How to Parse XML saved in Excel cell - python

I have an Excel file that contains XML data in each cell of a column. I want to parse the XML in each cell and save each result to a new file.
Here is my code:
import io
import pandas as pd
file_path = r'C:\Users\user\Documents\datasets\sample.xlsx'
df = pd.read_excel(file_path)
for i in range(len(df)):
    # newer pandas versions expect a path or file-like object, so wrap the cell's XML string
    parsed = pd.read_xml(io.StringIO(df['XML'].iloc[i]))
Here's a sample file and here's the desired output.

Instead of pandas, you could also look at openpyxl. It might make it easier to carve out the data you need.
You mention that you want to parse the XML, but you don't specify what you want to do with it. In any case, I would suggest the xmltodict library for parsing XML.
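As a rough sketch of the "parse each cell and save it to a new file" step, here is a version that uses only the standard library's xml.etree.ElementTree rather than xmltodict. The sample XML strings, the save_cells helper and the row_N.xml file names are made up for illustration; in practice the strings might come from something like df['XML'].tolist().

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical cell contents standing in for the values of the XML column.
xml_cells = [
    "<order id='1'><item>pen</item><qty>2</qty></order>",
    "<order id='2'><item>ink</item><qty>5</qty></order>",
]

def save_cells(cells, out_dir):
    """Parse each XML string and write it to its own file; return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for i, text in enumerate(cells):
        root = ET.fromstring(text)  # raises ParseError if the cell is not valid XML
        target = out_dir / f"row_{i}.xml"
        ET.ElementTree(root).write(target, encoding="utf-8", xml_declaration=True)
        written.append(target)
    return written

# A temporary directory is used here only so the sketch does not touch real files.
out_paths = save_cells(xml_cells, tempfile.mkdtemp())
```

Parsing before writing means a malformed cell fails loudly instead of producing a broken file.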

Related

How to convert a messy table in a PDF to a nested dataframe or CSV file?

I am a real beginner in Python, but for my job as a planner at a company I want to automatically fill in a form in Excel. The information, however, is in a PDF register file.
I tried many Python libraries and settled on the fitz module (PyMuPDF). It generates the best output because the data comes with coordinates (x and y axes). The issue now is how to order everything, since all the data ends up in just one column.
import fitz
import pandas as pd
doc = fitz.open('C:\\Users\\fhoqu\\OneDrive\\Desktop\\Sures19al21julio2022.pdf')
print(doc.page_count)
page1 = doc[0]
# each word is a tuple: (x0, y0, x1, y1, word, block_no, line_no, word_no)
words = page1.get_text("words")
df = pd.DataFrame(words)
# columns 0 and 1 hold the top-left corner (x0, y0) of each word
df.rename(columns={0:"axisX", 1:"axisY", 4:"Datos"}, inplace=True)
This is the original PDF:
but it generates a CSV file like this:
and I need a file ordered like this:
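One possible way to order the words, sketched under assumptions: group words whose top edge (y0) falls in the same horizontal band, then sort each group by x0. The sample tuples below are invented but follow the shape that get_text("words") returns; the words_to_rows helper and the y_tolerance value are hypothetical and would need tuning for a real document.

```python
from collections import defaultdict

# Hypothetical sample in the shape PyMuPDF returns for get_text("words"):
# (x0, y0, x1, y1, word, block_no, line_no, word_no)
words = [
    (200.0, 50.2, 230.0, 60.0, "Qty", 0, 0, 1),
    (50.0, 50.0, 90.0, 60.0, "Item", 0, 0, 0),
    (50.0, 70.1, 90.0, 80.0, "Bolt", 1, 0, 0),
    (200.0, 69.9, 215.0, 80.0, "12", 1, 0, 1),
]

def words_to_rows(words, y_tolerance=3.0):
    """Bucket words by their y0 coordinate, then sort each bucket left to right."""
    rows = defaultdict(list)
    for x0, y0, _x1, _y1, text, *_ in words:
        # Words whose y0 rounds to the same band are treated as one table row.
        rows[round(y0 / y_tolerance)].append((x0, text))
    # Rows are ordered top to bottom, cells within a row by x0.
    return [[text for _, text in sorted(cells)] for _, cells in sorted(rows.items())]

table = words_to_rows(words)
```

Fixed bands can split a row whose words straddle a band boundary, so this is a starting point rather than a robust table extractor. The resulting list of lists can be passed to pd.DataFrame or written out with the csv module.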

Conversion of YAML Data to Data Frame using yamltodb

I am trying to convert YAML data to a DataFrame through pandas with the yamltodb package, but the result shows only a single row under the header, containing only one piece of data. I also tried converting the YAML file to a JSON file and then using the normalize function, but that is not working either. I have attached a screenshot of the JSON function output along with the code. I need to categorize the data under batsman, bowler, runs, etc.
Just guessing, as I don’t know what your data actually looks like:
import pandas as pd
import yaml
with open('fName.yaml', 'r') as f:
    # safe_load avoids the Loader argument that yaml.load requires in current PyYAML
    df = pd.json_normalize(yaml.safe_load(f))
df.head()
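If the YAML nests the interesting records inside a list, json_normalize usually needs to be told where that list lives, otherwise you get a single row. A small sketch, assuming a made-up structure (the match and deliveries keys and the field names are hypothetical, since the real file isn't shown); the dict below stands in for what yaml.safe_load would return:

```python
import pandas as pd

# Hypothetical structure standing in for the parsed YAML document.
data = {
    "match": "demo",
    "deliveries": [
        {"batsman": "A", "bowler": "X", "runs": {"batsman": 1, "total": 1}},
        {"batsman": "B", "bowler": "X", "runs": {"batsman": 4, "total": 4}},
    ],
}

# record_path points at the list that should become rows;
# meta copies top-level fields onto every row.
df = pd.json_normalize(data, record_path="deliveries", meta=["match"])
```

Nested dicts inside each record are flattened into dotted column names such as runs.total.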

How to write the JSON structured data to a text file in Python?

I am trying to write my JSON-structured data to a JSON file. The js variable contains the JSON data like this:
[{"variable":"Latitude","min":26.845043,"Q1":31.1972475},{"variable":"Longitude","min":-122.315002,"Q1":-116.557795},{"variable":"Zip","min":20910.0,"Q1":32788.5}]
But when I write it to a file, the data gets stored differently. Could you please help me store the result exactly as it appears in js?
"[{\"variable\":\"Latitude\",\"min\":26.845043,\"Q1\":31.1972475},{\"variable\":\"Longitude\",\"min\":-122.315002,\"Q1\":-116.557795},{\"variable\":\"Zip\",\"min\":20910.0,\"Q1\":32788.5}]"
Code:
import csv
import json
import pandas as pd
df = pd.read_csv(r'C:\Users\spanda031\Downloads\DQ_RESUlT.csv')
js = df.to_json(orient="records")
print(js)
# Write JSON file
with open('C:\\Users\\spanda031\\Downloads\\data.json', 'w') as data_file:
    json.dump(js,data_file)
import pandas as pd
import json
df = pd.read_csv("temp.csv")
# it will dump json to file
df.to_json("filename.json", orient="records")
Output as filename.json:
[{"variable":"Latitude","min":26.84505,"Q1":31.19725},{"variable":"Longtitude","min":-122.315,"Q1":-116.558},{"variable":"Zip","min":20910.0,"Q1":32788.5}]
I think you're double-encoding your data. df.to_json converts the data to a JSON string. You then run json.dump, which encodes that already-encoded string as JSON again, wrapping your existing JSON in quote marks and escaping all the inner quote marks with backslashes. You end up with JSON-within-JSON, which is neither necessary nor desirable.
You should use one or the other of these methods, but not both together. It's probably easiest to use df.to_json to serialise the dataframe data accurately, and then just write the string directly to a file as text.
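To make the double-encoding concrete, here is a small standard-library sketch. json.dumps stands in for df.to_json(orient="records"), and the single-record list and temporary file path are assumptions for illustration only.

```python
import json
import os
import tempfile

# One illustrative record taken from the question's data.
records = [{"variable": "Latitude", "min": 26.845043, "Q1": 31.1972475}]

js = json.dumps(records)          # already JSON text, like df.to_json(orient="records")
double_encoded = json.dumps(js)   # encoding the string again adds quotes and backslashes

# Fix: write the existing JSON text to the file unchanged.
path = os.path.join(tempfile.mkdtemp(), "data.json")
with open(path, "w", encoding="utf-8") as data_file:
    data_file.write(js)

with open(path, encoding="utf-8") as data_file:
    loaded = json.load(data_file)  # parses back to a list of dicts
```

Decoding double_encoded once yields a string, not a list, which is why the file in the question looked wrong.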
Talk is cheap, so let me show you the code:
import json
import pandas as pd
df = pd.read_csv(r'C:\Users\spanda031\Downloads\DQ_RESUlT.csv')
# This is where the magic happens: to_dict returns Python objects, not a JSON string
js = df.to_dict(orient="records")
print(js)
# Write JSON file
with open('C:\\Users\\spanda031\\Downloads\\data.json', 'w') as data_file:
    json.dump(js, data_file)

pandas.read_excel() on strangely formatted Excel tables

Would pandas.read_excel() read something like this?
As you can see, there are a few lines of text before and after the table.
I cannot manually delete those unwanted lines of text.
You can try the code below:
import pandas as pd
data = pd.read_csv("/Users/manoj/demo.csv", encoding='utf-8', skiprows=5)
Refer to the pandas documentation; pandas.read_excel() accepts the same skiprows parameter.
If skiprows in pandas doesn't work, you could use openpyxl or some other xlsx engine to read the sheet into an array, then import that array into a pandas DataFrame. (xlsxwriter can only write files, not read them.)
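For text both before and after the table, skiprows can be combined with skipfooter. A minimal sketch using read_csv on an in-memory string; the sample lines and the row counts are assumptions, and the real file would need its own values. pandas.read_excel takes the same two parameters.

```python
import io
import pandas as pd

# Hypothetical file contents: two lines of text, the table, then one trailing line.
raw = "\n".join([
    "Report generated by a hypothetical tool",
    "Region: demo",
    "name,qty",
    "apple,3",
    "pear,5",
    "End of report",
])

# skipfooter is only supported by the Python parsing engine in read_csv.
df = pd.read_csv(io.StringIO(raw), skiprows=2, skipfooter=1, engine="python")
```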

How to add a link to an Excel file using Python

I'm generating a CSV file that is opened in Excel and converted to xlsx manually.
The CSV contains some paths to .txt files.
Is it possible to build the file paths in such a way that, when the CSV is converted to xlsx, they become clickable hyperlinks?
Thanks.
I would be interested to understand your workflow a bit better, but to try and help with your specific request:
The HYPERLINK solution proposed in the comments looks like a good one.
If you are able to implement that upstream in the CSV generation step, then great.
If not, and/or you are interested in automating the conversion process, consider using the pandas library:
1. Create a DataFrame object from a CSV using the pandas.read_csv method
2. Convert your paths to HYPERLINKs
3. Write back to xlsx using the pandas.DataFrame.to_excel method
E.g. if you have a file original.csv and the relevant column header is file_paths:
import pandas as pd
df = pd.read_csv('original.csv')
df['file_paths'] = '=HYPERLINK("' + df['file_paths'] + '")'
df.to_excel('new.xlsx', index=False)
Hope that helps!
Jon
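For the upstream route mentioned above (writing the HYPERLINK formulas at CSV generation time), a sketch with the standard csv module might look like this. The paths and the hyperlink_formula helper are hypothetical, and it assumes Excel evaluates cells beginning with = when it opens the CSV, which is worth checking on your own setup.

```python
import csv
import io

def hyperlink_formula(path):
    """Wrap a file path in an Excel HYPERLINK formula."""
    # Inside an Excel string literal, a double quote is escaped by doubling it.
    target = path.replace('"', '""')
    return f'=HYPERLINK("{target}")'

# Hypothetical paths standing in for the real .txt file locations.
paths = [r"C:\logs\run1.txt", r"C:\logs\run2.txt"]

buffer = io.StringIO()  # stands in for the CSV file being generated
writer = csv.writer(buffer)
writer.writerow(["file_paths"])
for p in paths:
    # csv.writer quotes the field because it contains double quotes.
    writer.writerow([hyperlink_formula(p)])

csv_text = buffer.getvalue()
```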
