Convert Json into CSV - python

I have JSON in this format and want to convert it into a CSV file.
{
"extrapolationLevel": 1,
"columnNames": [
"name",
"usersession.country",
"application",
"usersession.osFamily",
"usersession.startTime",
"visuallyCompleteTime"
],
"values": [
[
"pdp",
"Serbia",
"Desktop",
"Windows",
1573215462076,
1503
],
]
}
I want to convert this JSON into CSV format. Here is my script:
import csv
import json
with open('response_1573222394875.json') as infile:
    Data = json.loads(infile.read())
with open("q.csv", "w") as outfile:
    f = csv.writer(outfile)
    f.writerow(["name","usersession.country","application","usersession.osFamily","usersession.startTime","visuallyCompleteTime"])
    f.writerow([Data["name"], Data["usersession.country"],
                Data["application"],
                Data["usersession.osFamily"],
                Data["usersession.startTime"],
                Data["visuallyCompleteTime"]])
Expected Output
name usersession.country application usersession.osFamily usersession.startTime visuallyCompleteTime
pdp Serbia Desktop Windows 1573215462076 1503
plp us APP Windows 1573215462076 1548
startpage uk Site Windows 1573215462076 1639
product india Desktop Windows 1573215462076 3194
pdp Vietnam APP Windows 1573215462076 3299
Can anyone help me here, please?

The keys of Data are "extrapolationLevel", "columnNames", and "values", nothing else. Data["usersession.country"], for example, doesn't make sense, because there's no dictionary present with that key. The values are just stored in lists. Here's all you need to do:
with open("q.csv", "w", newline="") as outfile:  # newline="" avoids blank lines on Windows
    f = csv.writer(outfile)
    f.writerow(Data["columnNames"])  # header row
    f.writerows(Data["values"])      # one CSV row per inner list
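A self-contained sketch of the same idea, in case it helps to try it without the original file. It writes to an in-memory buffer instead of q.csv; the shortened column list and the `io.StringIO` buffer are illustrative choices, and the second row is borrowed from the expected output in the question.

```python
import csv
import io
import json

# Sample payload in the same shape as the question's file (shortened).
payload = json.loads("""
{
  "extrapolationLevel": 1,
  "columnNames": ["name", "usersession.country", "application"],
  "values": [
    ["pdp", "Serbia", "Desktop"],
    ["plp", "us", "APP"]
  ]
}
""")

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(payload["columnNames"])  # header comes straight from the JSON
writer.writerows(payload["values"])      # each inner list becomes one row
csv_text = buffer.getvalue()
print(csv_text)
```

Swapping the buffer for `open("q.csv", "w", newline="")` should give the file version.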

import json
json_str = '''
{
"extrapolationLevel": 1,
"columnNames": [
"name",
"usersession.country",
"application",
"usersession.osFamily",
"usersession.startTime",
"visuallyCompleteTime"
],
"values": [
[
"pdp",
"Serbia",
"Desktop",
"Windows",
1573215462076,
1503
]
]
}
'''
data = json.loads(json_str)
csv_row = lambda v: ('"{}",' * len(v)).format(*v)[:-1] # remove trailing comma
print(csv_row(data["columnNames"]))
for value in data["values"]:
    print(csv_row(value))

Related

How to write the content of a json file to an excel file

I have a bit complicated looking json file that stores a dictionary.
Here is the dictionary if you want to code on your local machine.
{'https://www.linkedin.com/in/manashi-sherawat-mathur-phd-3a97b69':
    {'showallexperiences':
        [{'company_url': 'https://www.linkedin.com/company/1612/'},
         {'company_url': 'https://www.linkedin.com/search/results/all/?keywords=Independent+Pharma%2FBiotech+Professional'}],
     'showalleducation':
        [{'university_url': 'https://www.linkedin.com/company/3555/'}, {'university_url': None}]},
 'https://www.linkedin.com/in/baneshwar-singh-6143082b/':
    {'showallexperiences':
        [{'company_url': 'https://www.linkedin.com/company/166810/'},
         {'company_url': 'https://www.linkedin.com/company/166810/'}],
     'showalleducation':
        [{'university_url': 'https://www.linkedin.com/company/6737/'}, {'university_url': 'https://www.linkedin.com/company/5826549/'}]}}
The main keys of the dictionary are different Linkedin URLs and their values are also dictionaries each storing company and university URLs belonging to that link.
I need to write the content of that dictionary to an excel file.
In particular, the info for each LinkedIn URL (the main key) should be written on a separate row.
I would also like to number the columns for the different company and university URLs.
Can somebody please suggest any solution?
Try:
import pandas as pd
data = {
"https://www.linkedin.com/in/manashi-sherawat-mathur-phd-3a97b69": {
"showallexperiences": [
{"company_url": "https://www.linkedin.com/company/1612/"},
{
"company_url": "https://www.linkedin.com/search/results/all/?keywords=Independent+Pharma%2FBiotech+Professional"
},
],
"showalleducation": [
{"university_url": "https://www.linkedin.com/company/3555/"},
{"university_url": None},
],
},
"https://www.linkedin.com/in/baneshwar-singh-6143082b/": {
"showallexperiences": [
{"company_url": "https://www.linkedin.com/company/166810/"},
{"company_url": "https://www.linkedin.com/company/166810/"},
],
"showalleducation": [
{"university_url": "https://www.linkedin.com/company/6737/"},
{"university_url": "https://www.linkedin.com/company/5826549/"},
],
},
}
all_data = []
for k, v in data.items():
    # number the company URLs: exp1_company_url, exp2_company_url, ...
    company_urls = {
        f"exp{i}_company_url": e["company_url"]
        for i, e in enumerate(v["showallexperiences"], 1)
    }
    # number the university URLs: edu1_uni_url, edu2_uni_url, ...
    edu_urls = {
        f"edu{i}_uni_url": e["university_url"]
        for i, e in enumerate(v["showalleducation"], 1)
    }
    all_data.append({"linkedin_link": k, **company_urls, **edu_urls})
df = pd.DataFrame(all_data)
print(df.to_markdown(index=False))  # to_markdown needs the optional tabulate package
df.to_csv("data.csv", index=False)
Prints:
| linkedin_link | exp1_company_url | exp2_company_url | edu1_uni_url | edu2_uni_url |
|:---|:---|:---|:---|:---|
| https://www.linkedin.com/in/manashi-sherawat-mathur-phd-3a97b69 | https://www.linkedin.com/company/1612/ | https://www.linkedin.com/search/results/all/?keywords=Independent+Pharma%2FBiotech+Professional | https://www.linkedin.com/company/3555/ | |
| https://www.linkedin.com/in/baneshwar-singh-6143082b/ | https://www.linkedin.com/company/166810/ | https://www.linkedin.com/company/166810/ | https://www.linkedin.com/company/6737/ | https://www.linkedin.com/company/5826549/ |
and saves data.csv.
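If profiles have different numbers of experiences or schools, the rows end up with different keys. pandas fills the gaps automatically; the sketch below shows roughly the same numbering scheme with only the standard library, assuming made-up example.com URLs in place of the real data. The `fieldnames` union and the `restval=""` choice are illustrative, not part of the answer above.

```python
import csv
import io

# Hypothetical, shortened data in the same shape as the question's dictionary.
data = {
    "https://example.com/in/a": {
        "showallexperiences": [{"company_url": "https://example.com/c/1"}],
        "showalleducation": [{"university_url": None}],
    },
    "https://example.com/in/b": {
        "showallexperiences": [
            {"company_url": "https://example.com/c/2"},
            {"company_url": "https://example.com/c/3"},
        ],
        "showalleducation": [],
    },
}

rows = []
for link, info in data.items():
    row = {"linkedin_link": link}
    for i, exp in enumerate(info["showallexperiences"], 1):
        row[f"exp{i}_company_url"] = exp["company_url"]
    for i, edu in enumerate(info["showalleducation"], 1):
        row[f"edu{i}_uni_url"] = edu["university_url"]
    rows.append(row)

# Collect every column name that appears in any row, keeping first-seen order.
fieldnames = list(dict.fromkeys(key for row in rows for key in row))

buffer = io.StringIO()
# restval fills columns that a row does not have; None values are written as empty cells.
writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="", lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Because columns are collected in first-seen order, `exp2_company_url` lands after `edu1_uni_url` here; sort `fieldnames` if a fixed layout matters.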

Conversion from nested json to csv with pandas

I am trying to convert a nested json into a csv file, but I am struggling with the logic needed for the structure of my file: it's a json with 2 objects and I would like to convert into csv only one of them, which is a list with nesting.
I've found very helpful "flattening" json info in this blog post. I have been basically adapting it to my problem, but it is still not working for me.
My json file looks like this:
{
"tickets":[
{
"Name": "Liam",
"Location": {
"City": "Los Angeles",
"State": "CA"
},
"hobbies": [
"Piano",
"Sports"
],
"year" : 1985,
"teamId" : "ATL",
"playerId" : "barkele01",
"salary" : 870000
},
{
"Name": "John",
"Location": {
"City": "Los Angeles",
"State": "CA"
},
"hobbies": [
"Music",
"Running"
],
"year" : 1985,
"teamId" : "ATL",
"playerId" : "bedrost01",
"salary" : 550000
}
],
"count": 2
}
My code, so far, looks like this:
import json
from pandas.io.json import json_normalize
import argparse
def flatten_json(y):
    out = {}
    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '_')
                i += 1
        else:
            out[name[:-1]] = x
    flatten(y)
    return out
if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Converting json files into csv for Tableau processing')
    parser.add_argument(
        "-j", "--json", dest="json_file", help="PATH/TO/json file to convert", metavar="FILE", required=True)
    args = parser.parse_args()
    with open(args.json_file, "r") as inputFile:  # open json file
        json_data = json.loads(inputFile.read())  # load json content
    flat_json = flatten_json(json_data)
    # normalizing flat json
    final_data = json_normalize(flat_json)
    with open(args.json_file.replace(".json", ".csv"), "w") as outputFile:  # open csv file
        # saving DataFrame to csv
        final_data.to_csv(outputFile, encoding='utf8', index=False)
What I would like to obtain is 1 line per ticket in the csv, with headings:
Name,Location_City,Location_State,Hobbies_0,Hobbies_1,Year,TeamId,PlayerId,Salary.
I would really appreciate anything that can do the trick!
Thank you!
I actually wrote a package called cherrypicker recently to deal with this exact sort of thing since I had to do it so often!
I think the following code would give you exactly what you're after:
from cherrypicker import CherryPicker
import json
import pandas as pd
with open('file.json') as file:
    data = json.load(file)
picker = CherryPicker(data)
flat = picker['tickets'].flatten().get()
df = pd.DataFrame(flat)
print(df)
This gave me the output:
Location_City Location_State Name hobbies_0 hobbies_1 playerId salary teamId year
0 Los Angeles CA Liam Piano Sports barkele01 870000 ATL 1985
1 Los Angeles CA John Music Running bedrost01 550000 ATL 1985
You can install the package with:
pip install cherrypicker
...and there's more docs and guidance at https://cherrypicker.readthedocs.io.
As you already have a function to flatten a JSON object, you just have to flatten the tickets:
...
# requires: import pandas as pd
with open(args.json_file, "r") as inputFile:  # open json file
    json_data = json.loads(inputFile.read())  # load json content
final_data = pd.DataFrame([flatten_json(elt) for elt in json_data['tickets']])
...
With your sample data, final_data is as expected:
Location_City Location_State Name hobbies_0 hobbies_1 playerId salary teamId year
0 Los Angeles CA Liam Piano Sports barkele01 870000 ATL 1985
1 Los Angeles CA John Music Running bedrost01 550000 ATL 1985
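For reference, the per-ticket flattening can also be sketched without pandas. This is a rough standalone variant, not the asker's exact function: `flatten_json` is rewritten here to return its result recursively, the ticket data is shortened, and the CSV goes to an in-memory buffer for illustration.

```python
import csv
import io

def flatten_json(value, prefix=""):
    """Flatten nested dicts and lists into one dict with '_'-joined keys."""
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            flat.update(flatten_json(child, f"{prefix}{key}_"))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            flat.update(flatten_json(child, f"{prefix}{index}_"))
    else:
        flat[prefix[:-1]] = value  # drop the trailing underscore
    return flat

# Shortened version of the "tickets" list from the question.
tickets = [
    {"Name": "Liam", "Location": {"City": "Los Angeles", "State": "CA"},
     "hobbies": ["Piano", "Sports"], "salary": 870000},
    {"Name": "John", "Location": {"City": "Los Angeles", "State": "CA"},
     "hobbies": ["Music", "Running"], "salary": 550000},
]

# Flatten each ticket on its own, so every ticket becomes one row.
rows = [flatten_json(ticket) for ticket in tickets]

buffer = io.StringIO()
# Assumes every ticket flattens to the same keys as the first one.
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

The key point is the same as in the answer above: flatten each element of `tickets` separately rather than the whole document, otherwise everything collapses into a single row.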
There may be a simpler solution for this. But this should work!
import json
import pandas as pd
with open('file.json') as file:
    data = json.load(file)
df = pd.DataFrame(data['tickets'])
# pull the nested Location fields out into their own columns, row by row
df['location_city'] = df['Location'].apply(lambda loc: loc['City'])
df['location_state'] = df['Location'].apply(lambda loc: loc['State'])
# expand each hobbies list into hobbies_0, hobbies_1, ... columns
hobbies = pd.DataFrame(df['hobbies'].tolist(), index=df.index).add_prefix('hobbies_')
df = df.drop(columns=['Location', 'hobbies']).join(hobbies)
print(df)

Not getting expected output in python when converting a csv to json

I have an Excel file in which data is saved in CSV format. The data appears under column A, as shown below. (The CSV file is generated by LabVIEW code that I wrote to generate the data.) I have also attached an image of the CSV file for reference at the end of my question.
RPM,Load Current,Battery Output,Power Capacity
1200,30,12,37
1600,88,18,55
I want to create a JSON file in this format:
{
"power_capacity_data" :
{
"rpm" : ["1200","1600"],
"load_curr" : ["30","88"],
"batt_output" : ["12","18"],
"power_cap" : ["37","55"]
}
}
This is my code
import csv
import json
def main():
    #created a dictionary so that i can append data to it afterwards
    power_data = {"rpm":[],"load_curr":[],"batt_output":[],"power_cap":[]}
    with open('power1.lvm') as f:
        reader = csv.reader(f)
        #trying to append the data of column "RPM" to dictionary
        rowcount = 0
        for row in reader:
            if rowcount == 0:
                #trying to skip the first row
                rowcount = rowcount + 1
            else:
                power_data['rpm'].append(row[0])
                print(row)
    json_report = {}
    json_report['pwr_capacity_data'] = power_data
    with open('LVMJSON', "w") as f1:
        f1.write(json.dumps(json_report, sort_keys=False, indent=4, separators=(',', ': '),encoding="utf-8",ensure_ascii=False))
        f1.close()
if __name__ == "__main__":
    main()
The output JSON file that I am getting is this (please ignore the print(row) statement in my code):
{
"pwr_capacity_data":
{
"load_curr": [],
"rpm": [
"1200,30,12.62,37.88",
"1600,88,18.62,55.88"
],
"batt_output": [],
"power_cap": []
}
}
The whole row is getting saved in the list, but I just want the values under the RPM column to be saved. Can someone help me work out what I may be doing wrong? Thanks in advance. I have attached an image of the CSV file in case it helps.
You could use Python's defaultdict to make it a bit easier, along with a dictionary that maps each header name to its output key.
from collections import defaultdict
import csv
import json
power_data = defaultdict(list)
header_mappings = {
'RPM' : 'rpm',
'Load Current' : 'load_curr',
'Battery Output' : 'batt_output',
'Power Capacity' : 'power_cap'}
with open('power1.lvm', newline='') as f_input:
    csv_input = csv.DictReader(f_input)
    for row in csv_input:
        for key, value in row.items():
            power_data[header_mappings[key]].append(value)
with open('LVMJSON.json', 'w') as f_output:
    json.dump({'power_capacity_data' : power_data}, f_output, indent=2)
Giving you an output JSON file looking like:
{
"power_capacity_data": {
"batt_output": [
"12",
"18"
],
"power_cap": [
"37",
"55"
],
"load_curr": [
"30",
"88"
],
"rpm": [
"1200",
"1600"
]
}
}
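As a possible alternative, the rows can be transposed into columns with `zip`. This is only a sketch: the CSV text is inlined through `io.StringIO` instead of reading power1.lvm, and it assumes the file really is comma-separated with the four headers shown in the question.

```python
import csv
import io
import json

# Inline copy of the sample CSV from the question.
csv_text = (
    "RPM,Load Current,Battery Output,Power Capacity\n"
    "1200,30,12,37\n"
    "1600,88,18,55\n"
)

header_mappings = {
    "RPM": "rpm",
    "Load Current": "load_curr",
    "Battery Output": "batt_output",
    "Power Capacity": "power_cap",
}

reader = csv.reader(io.StringIO(csv_text))
header = next(reader)   # first row holds the column names
columns = zip(*reader)  # transpose the remaining rows into columns

# Pair each header with its column and rename it through the mapping.
power_data = {header_mappings[name]: list(col) for name, col in zip(header, columns)}
report = {"power_capacity_data": power_data}
print(json.dumps(report, indent=2))
```

Note that with no data rows the transposed result is empty, so `power_data` would have no keys at all; the defaultdict version above has the same behaviour.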

Python - parsed json output to CSV

I have parsed a json file in python and have the results printed on screen.
However, I would also like the results to be output to a csv file, exactly as they appear on screen.
Here is my code:
import json
with open('euroinc.json') as json_data:
    d = json.load(json_data)
    for p in d['results']:
        print(p['sedol']+','+p['company']+','+p['name']+','+ p['unitType']+','+p['perf48t60m']+','+p['perf36t48m']+','+p['perf24t36m']+','+p['perf12t24m']+','+p['perf12m']+','+p['initialCharge']+','+p['netAnnualCharge'])
Any help would be much appreciated!
Thanks,
Craig
Update: here is the json sample:
{
"results": [
{
"sector": "Europe Including UK",
"perf48t60m": "n/a",
"discountedCode": "",
"price_buy": "0",
"plusFund": false,
"unitType": "Accumulation",
"perf6m": "6.35%",
"perf36t48m": "11.29%",
"loaded": "Unbundled",
"fundSize": "2940.1",
"annualCharge": "1.07",
"netAnnualCharge": "1.07",
"sedol": "B7BW9Y0",
"perf24t36m": "0.25%",
"annualSaving": "0.00",
"updated": "06/09/2017",
"incomeFrequency": "N/a",
"perf60m": "n/a",
"perf12t24m": "12.97%",
"company": "BlackRock",
"initialCharge": "0.00",
"paymentType": "Dividend",
"perf3m": "0.32%",
"name": "BlackRock Global European Value (D2 GBP)",
"perf12m": "19.37%",
"price_change": "-39.00",
"yield": "0.00",
"price_sell": "6569.00",
"perf36m": "35.19%",
"numHoldings": "51"
},
{
"sector": "Europe Including UK",
"perf48t60m": "22.01%",
"discountedCode": "",
"price_buy": "0",
"plusFund": false,
"unitType": "Income",
"perf6m": "7.81%",
"perf36t48m": "9.61%",
"loaded": "Unbundled",
"fundSize": "566.1",
"annualCharge": "0.30",
"netAnnualCharge": "0.30",
"sedol": "B76VTR5",
"perf24t36m": "-3.95%",
"annualSaving": "0.00",
"updated": "06/09/2017",
"incomeFrequency": "Quarterly",
"perf60m": "77.38%",
"perf12t24m": "15.38%",
"company": "Vanguard",
"initialCharge": "0.00",
"paymentType": "Dividend",
"perf3m": "0.74%",
"name": "Vanguard SRI European Stock",
"perf12m": "19.69%",
"price_change": "-21.37",
"yield": "2.79",
"price_sell": "15800.81",
"perf36m": "32.65%",
"numHoldings": "502"
}
]
}
This will write a CSV file with a header. Note that the fieldnames and extrasaction parameters are required: fieldnames specifies the order of the columns, and extrasaction='ignore' prevents an error when the dictionaries contain extra entries.
#!python2
import json
import csv
with open('euroinc.json') as json_data:
    d = json.load(json_data)
# with open('out.csv','w',newline='') as f: # Python 3 version
with open('out.csv','wb') as f:
    w = csv.DictWriter(f,
                       fieldnames='sedol company name unitType perf48t60m perf36t48m perf24t36m perf12t24m perf12m initialCharge netAnnualCharge'.split(),
                       extrasaction='ignore')
    w.writeheader()
    # Ways to use a different header.
    # Note the direct write should use \r\n as that is the default 'excel' CSV dialect for line terminator.
    # f.write('A,B,C,D,E,F,G,H,I,J,K\r\n')
    # w.writerow(dict(zip(w.fieldnames,'col1 col2 col3 col4 col5 col6 col7 col8 col9 col10 col11'.split())))
    w.writerows(d['results'])
Output:
sedol,company,name,unitType,perf48t60m,perf36t48m,perf24t36m,perf12t24m,perf12m,initialCharge,netAnnualCharge
B7BW9Y0,BlackRock,BlackRock Global European Value (D2 GBP),Accumulation,n/a,11.29%,0.25%,12.97%,19.37%,0.00,1.07
B76VTR5,Vanguard,Vanguard SRI European Stock,Income,22.01%,9.61%,-3.95%,15.38%,19.69%,0.00,0.30
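To see what extrasaction changes, here is a small Python 3 sketch with a shortened, made-up subset of the fields. It writes to an in-memory buffer rather than out.csv; the three-column `fieldnames` list is an assumption for illustration.

```python
import csv
import io

# Two shortened records; "sector" is an extra key that is not in fieldnames.
results = [
    {"sedol": "B7BW9Y0", "company": "BlackRock", "unitType": "Accumulation",
     "sector": "Europe Including UK"},
    {"sedol": "B76VTR5", "company": "Vanguard", "unitType": "Income",
     "sector": "Europe Including UK"},
]
fieldnames = ["sedol", "company", "unitType"]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames,
                        extrasaction="ignore", lineterminator="\n")
writer.writeheader()
writer.writerows(results)  # "sector" is silently dropped
csv_text = buffer.getvalue()
print(csv_text)

# With the default extrasaction="raise", the same row is rejected.
strict = csv.DictWriter(io.StringIO(), fieldnames=fieldnames)
try:
    strict.writerow(results[0])
    raised = False
except ValueError:
    raised = True
print("default extrasaction raised ValueError:", raised)
```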
I assume p is a dictionary.
You could try:
for p in d['results']:
    # join every value of the record into one comma-separated line
    result = ','.join(str(value) for value in p.values())
    print(result)
For the CSV part you could try:
import csv
with open('your_csv.csv', 'w', newline='') as csv_file:
    csv_outp = csv.writer(csv_file, delimiter=',')
    for p in d['results']:
        csv_outp.writerow(p.values())
I hope this helps.

Saving multiple dictionaries-of-lists from multiple files to a dictionary, then writing to file

I have N files containing dictionaries-of-lists, saved as a.json, b.json...
{
"ELEC.GEN.OOG-AK-99.A": [
["2013", null],
["2012", 2.65844],
["2011", 2.7383]
],
"ELEC.GEN.AOR-AK-99.A": [
["2015", 217.30239],
["2014", 214.46868],
["2013", 197.32097]
],
"ELEC.GEN.HYC-AK-99.A": [
["2015", 1542.29841],
["2014", 1538.738],
["2013", 1345.665]
]}
I am unclear how to save them all to one large dictionary/json file, like so:
{
"a":
{
"ELEC.GEN.OOG-AK-99.A": [
["2013", null],
["2012", 2.65844],
["2011", 2.7383]
],
"ELEC.GEN.AOR-AK-99.A": [
["2015", 217.30239],
["2014", 214.46868],
["2013", 197.32097]
],
"ELEC.GEN.HYC-AK-99.A": [
["2015", 1542.29841],
["2014", 1538.738],
["2013", 1345.665]
]},
"b": {...},
...
}
This is data I requested that will be used in a javascript graph, and it is theoretically possible to preprocess it even more when streaming the requested data from its source, as well as maybe possible to work around the fact there are so many data files I need to request to get my graph working, but both those options seem very difficult.
I don't understand the best way to parse json-that-is-meant-for-javascript in python.
====
I have tried:
from collections import defaultdict
# load into memory
data = defaultdict(dict)
filelist = ["a.json", "b.json", ...]
for fn in filelist:
    with open(fn, 'rb') as f:
        # this brings up TypeError
        data[fn] = json.loads(f)
# write
out = "out.json"
with open(out, 'wb') as f:
    json.dump(data, f)
===
For json.loads() I get TypeError: expected string or buffer. For json.load() it works!
Loading from string:
>>> with open("a.json", "r") as f:
...     json.loads(f.read())
...
{u'Player2': 4, u'Player3': 10, u'Player1': 3}
>>>
Loading from file object:
>>> with open("a.json", "r") as f:
...     json.load(f)
...
{u'Player2': 4, u'Player3': 10, u'Player1': 3}
>>>
You are using json.loads instead of json.load to load a file. You should also open the file in text mode rather than binary mode, so change this:
with open(fn, 'rb') as f:
    data[fn] = json.loads(f)
to this:
with open(fn, 'r') as f:  # only r instead of rb
    data[fn] = json.load(f)  # load instead of loads
And again further down, when writing, open the file with w instead of wb.
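Putting the pieces together, a minimal sketch of the whole merge might look like the following. The helper name `merge_json_files`, the use of each file's stem ("a", "b") as the key, and the temporary files created for the demo are all assumptions for illustration; the sample values are taken from the question.

```python
import json
import tempfile
from pathlib import Path

def merge_json_files(paths):
    """Load each JSON file and store it under its file name without the extension."""
    merged = {}
    for path in paths:
        path = Path(path)
        with path.open("r", encoding="utf-8") as f:  # text mode, not 'rb'
            merged[path.stem] = json.load(f)         # load (file), not loads (string)
    return merged

with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    # Create two small demo files standing in for a.json and b.json.
    (tmp / "a.json").write_text(
        json.dumps({"ELEC.GEN.OOG-AK-99.A": [["2013", None], ["2012", 2.65844]]}),
        encoding="utf-8",
    )
    (tmp / "b.json").write_text(
        json.dumps({"ELEC.GEN.AOR-AK-99.A": [["2015", 217.30239]]}),
        encoding="utf-8",
    )

    merged = merge_json_files([tmp / "a.json", tmp / "b.json"])

    out_path = tmp / "out.json"
    with out_path.open("w", encoding="utf-8") as f:  # 'w', not 'wb'
        json.dump(merged, f)

    # Read the result back to confirm it survives a round trip.
    round_trip = json.loads(out_path.read_text(encoding="utf-8"))

print(sorted(merged))
```

JSON null comes back as Python None and is written out as null again, so the missing 2013 value is preserved.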
