Conversion of YAML Data to Data Frame using yamltodb - python

I am trying to convert YAML data to a data frame through pandas with the yamltodb package, but it shows only a single row enclosed with the header, and only one record appears. I tried converting the YAML file to a JSON file and then using the normalize function, but that did not work either. I have attached a screenshot of the JSON function's output. I need to categorize the data under batsman, bowler, runs, etc. My code and the output image are attached.

Just guessing, as I don't know what your data actually looks like:
import pandas as pd
import yaml

with open('fName.yaml', 'r') as f:
    # safe_load avoids executing arbitrary tags; json_normalize flattens nesting
    df = pd.json_normalize(yaml.safe_load(f))
df.head()
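For instance, with list-style YAML (the key names below are invented, since the real schema isn't shown in the question), json_normalize flattens each record into its own row with one column per key:

```python
import pandas as pd
import yaml

# Invented records; the real YAML isn't shown in the question
doc = """
- batsman: Kohli
  bowler: Starc
  runs: 4
- batsman: Rohit
  bowler: Starc
  runs: 6
"""
df = pd.json_normalize(yaml.safe_load(doc))
print(df.columns.tolist())  # -> ['batsman', 'bowler', 'runs']
```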

Related

How to Parse XML saved in Excel cell

I have an Excel file that contains XML data in each cell of a column. I want to parse the XML in each cell and save each result to a new file.
Here is my code:
import pandas as pd
import numpy as np
import xml.etree.cElementTree as et
file_path = r'C:\Users\user\Documents\datasets\sample.xlsx'
df = pd.read_excel(file_path)
for i in range(len(df)):
    pd.read_xml(df['XML'].iloc[i])
Here's a sample file and here's the desired output.
Instead of pandas, you could also look at openpyxl. This might make it easier for you to carve out the data that you need.
You mention that you want to parse the XML, but not what you want to do with it afterwards. For the parsing itself, I would suggest the xmltodict library.
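As a sketch of that parsing step with the standard library's ElementTree (xmltodict.parse would give you the same content back as nested dicts instead); the tag names here are invented, not taken from the question's file:

```python
import xml.etree.ElementTree as ET

# One cell's XML string, as it might sit in df['XML'] (structure invented)
xml_cell = "<person><name>Alice</name><age>30</age></person>"
root = ET.fromstring(xml_cell)
print(root.find("name").text)  # -> Alice
```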

Reading in a multiindex .csv file as returned from pandas using the ftable type in R

I have a multi-index (multi-column, to be exact) pandas data frame in Python that I saved using the .to_csv() method. Now I would like to continue my analysis in R, so I need to read in the .csv file. I know that R does not really support multi-index data frames the way pandas does, but it can handle ftables via the stats package. I tried read.ftable(), but I can't figure out how to set its arguments to import the .csv file correctly.
Here's some code to create a .csv file that has the same structure as my original data:
require(stats)
# create example .csv file with a multiindex as it would be saved when using pandas
fileConn<-file('test.csv')
long_string = paste("col_level_1,a,b,c,d\ncol_level_2,cat,dog,tiger,lion\ncol_level_3,foo,foo,foo,foo\nrow_level_1,,,,\n1,",
"\"0,525640810622065\",\"0,293400380474675\",\"0,591895790442417\",\"0,675403394728461\"\n2,\"0,253176104907883\",",
"\"0,107715459748816\",\"0,211636325794272\",\"0,618270276545688\"\n3,\"0,781049927692169\",\"0,72968971635063\",",
"\"0,913378426593516\",\"0,739497259262532\"\n4,\"0,498966730971063\",\"0,395825713762063\",\"0,252543611974303\",",
"\"0,240732390893718\"\n5,\"0,204075522469035\",\"0,227454178487449\",\"0,476571725142606\",\"0,804041968683541\"\n6,",
"\"0,281453400066927\",\"0,010059089264751\",\"0,873336799707968\",\"0,730105129502755\"\n7,\"0,834572206714808\",",
"\"0,668889079581709\",\"0,516135581764696\",\"0,999861473609101\"\n8,\"0,301692961056344\",\"0,702428450077691\",",
"\"0,211660363912457\",\"0,626178589354395\"\n9,\"0,22051883447221\",\"0,934567760412661\",\"0,757627523007149\",",
"\"0,721590060307143\"",sep="")
writeLines(long_string, fileConn)
close(fileConn)
When opening the .csv file in a reader of your choice, it should look like this:
How can I read this in using R?
I found one solution without using read.ftable(), based on this post. Note that this won't give you the data in ftable format:
headers <- read.csv(file='./test.csv',header=F,nrows=3,as.is=T,row.names=1)
dat <- read.table('./test.csv',skip=4,header=F,sep=',',row.names=1)
headers_collapsed <- apply(headers,2,paste,collapse='.')
colnames(dat) <- headers_collapsed
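For reference, the pandas side of this round trip: a sketch (with made-up values) of saving a three-level column MultiIndex with .to_csv() and reading it back in pandas, where the first three rows become the column index:

```python
import numpy as np
import pandas as pd
from io import StringIO

# Build a frame whose columns mirror the example file's three header rows
cols = pd.MultiIndex.from_tuples(
    [("a", "cat", "foo"), ("b", "dog", "foo")],
    names=["col_level_1", "col_level_2", "col_level_3"],
)
df = pd.DataFrame(np.arange(6).reshape(3, 2), columns=cols)
df.index.name = "row_level_1"

buf = StringIO()
df.to_csv(buf)  # writes the three header rows, then the index-name row
buf.seek(0)

# Reading it back: rows 0-2 form the column MultiIndex, column 0 the index
back = pd.read_csv(buf, header=[0, 1, 2], index_col=0)
```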

Using pandas to read from .csv file but it's cutting off the decimal places?

I am trying to read in temperature data from a .csv file using pandas. Here is a sample of the file:
My issue is that my code is not extracting the data to 1 decimal place. This is what I have at the moment:
# Import packages
import pandas as pd
# Open data file
data_file_name = "data_set.csv"
data_file = pd.read_csv(data_file_name, header=2).astype(int)
# Extract temperature data
target_data = data_file["Temperature"].astype(float)
print(target_data.loc[[0]])
After adding in the print statement to see if the first value is -23.5 as it should be, instead I get:
-23.0
Why isn't my code reading the data as a float to 1 d.p.?
I believe your issue is that you're reading in the data file with .astype(int), which converts everything in the CSV to an int, so the decimal can't be recovered by a later .astype(float). Try not specifying a type on the initial read_csv; pandas can normally infer the proper types automatically.
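A minimal sketch of the fix (the CSV here is inlined and invented, following the question's header=2 layout): dropping the blanket .astype(int) lets pandas keep the decimals:

```python
import pandas as pd
from io import StringIO

# Two metadata lines, then the header row, then the data (invented values)
csv_text = "meta\nmeta\nTemperature\n-23.5\n-22.1\n"
data_file = pd.read_csv(StringIO(csv_text), header=2)  # no .astype(int)
print(data_file["Temperature"].iloc[0])  # -> -23.5
```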

Options for creating csv structure I can work with

My task is to take an output from a machine, and convert that data to json. I am using python, but the issue is the structure of the output.
From my research online, csv usually has the first row with the keys and the values in the same order underneath. Example: https://e.nodegoat.net/CMS/upload/guide-import_person_csv_notepad.png
However, the output from my machine doesn't look like this.
Mine looks like:
Date:,10/10/2015
Name:,"Company name"
Location:,"Company location"
Serial num:,"Serial number"
So the machine I'm working with writes each result to a new .dat file instead of appending to a single CSV with one row of keys. Technically the data is comma-separated, but I'm not sure how to work with this layout.
How should I go about turning this kind of data into JSON? Should I restructure it into the default CSV layout first, or is there a way to work with it as-is without any cleanup? In either case, any direction is appreciated.
You can try transposing it with pandas:
import pandas as pd
from io import StringIO
data = '''\
Date:,10/10/2015
Name:,"Company name"
Location:,"Company location"
Serial num:,"Serial number"
'''
f = StringIO(data)
# header=None keeps the first row (Date:) as data instead of losing it as a header
df = pd.read_csv(f, header=None)
t = df.set_index(0).T
print(t['Location:'].iloc[0])
print(t['Serial num:'].iloc[0])
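From there, the JSON step the question actually asks about is one more line; a sketch with two of the key/value rows:

```python
import json
import pandas as pd
from io import StringIO

data = "Date:,10/10/2015\nName:,Company name\n"
t = pd.read_csv(StringIO(data), header=None).set_index(0).T
dumped = json.dumps(t.to_dict(orient="records"))
print(dumped)  # -> [{"Date:": "10/10/2015", "Name:": "Company name"}]
```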

How to write the JSON structured data to a text file in Python?

I am trying to write my JSON structured data to a JSON file. The string js contains the dataframe's JSON data like this:
[{"variable":"Latitude","min":26.845043,"Q1":31.1972475},{"variable":"Longitude","min":-122.315002,"Q1":-116.557795},{"variable":"Zip","min":20910.0,"Q1":32788.5}]
But when I write it to a file, the data gets stored differently. Could you please help me store the result exactly as it appears in js?
"[{\"variable\":\"Latitude\",\"min\":26.845043,\"Q1\":31.1972475},{\"variable\":\"Longitude\",\"min\":-122.315002,\"Q1\":-116.557795},{\"variable\":\"Zip\",\"min\":20910.0,\"Q1\":32788.5}]"
Code:
import csv
import json
import pandas as pd
df = pd.read_csv(r'C:\Users\spanda031\Downloads\DQ_RESUlT.csv')
js = df.to_json(orient="records")
print(js)
# Write JSON file
with open('C:\\Users\\spanda031\\Downloads\\data.json', 'w') as data_file:
    json.dump(js, data_file)
import pandas as pd
import json
df = pd.read_csv("temp.csv")
# dump the JSON straight to a file
df.to_json("filename.json", orient="records")
Output as filename.json:
[{"variable":"Latitude","min":26.84505,"Q1":31.19725},{"variable":"Longtitude","min":-122.315,"Q1":-116.558},{"variable":"Zip","min":20910.0,"Q1":32788.5}]
I think you're double-encoding your data: df.to_json converts the data to a JSON string, and json.dump then encodes that already-encoded string as JSON again, which wraps your existing JSON in quote marks and escapes all the inner quote marks with backslashes. You end up with JSON-within-JSON, which is neither necessary nor desirable.
You should use one or the other of these methods, but not both together. It's probably easiest to use df.to_json to serialise the dataframe accurately, and then write the resulting string directly to a file as text.
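A sketch of that single-encoding approach (the path and one-row frame are illustrative):

```python
import pandas as pd

df = pd.DataFrame([{"variable": "Latitude", "min": 26.845043}])
js = df.to_json(orient="records")  # serialise exactly once
with open("data.json", "w") as fh:
    fh.write(js)  # write the string as plain text, no second json.dump pass
```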
Talk is cheap, so let me show you the code:
import json
import pandas as pd

df = pd.read_csv(r'C:\Users\spanda031\Downloads\DQ_RESUlT.csv')
# this is where the magic happens! :)
js = df.to_dict(orient="records")
print(js)
# Write JSON file
with open('C:\\Users\\spanda031\\Downloads\\data.json', 'w') as data_file:
    json.dump(js, data_file)
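A quick self-contained check (one row inlined in place of the CSV) that the to_dict route avoids the double encoding:

```python
import json
import pandas as pd

df = pd.DataFrame([{"variable": "Zip", "min": 20910.0, "Q1": 32788.5}])
js = df.to_dict(orient="records")  # plain Python dicts, not a JSON string
dumped = json.dumps(js)            # encoded exactly once
print(dumped)  # no escaped quotes, no wrapping string
```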
