Conversion from nested json to csv with pandas - python

I am trying to convert a nested json into a csv file, but I am struggling with the logic needed for the structure of my file: it's a json with 2 objects and I would like to convert into csv only one of them, which is a list with nesting.
I've found very helpful "flattening" json info in this blog post. I have been basically adapting it to my problem, but it is still not working for me.
My json file looks like this:
{
"tickets":[
{
"Name": "Liam",
"Location": {
"City": "Los Angeles",
"State": "CA"
},
"hobbies": [
"Piano",
"Sports"
],
"year" : 1985,
"teamId" : "ATL",
"playerId" : "barkele01",
"salary" : 870000
},
{
"Name": "John",
"Location": {
"City": "Los Angeles",
"State": "CA"
},
"hobbies": [
"Music",
"Running"
],
"year" : 1985,
"teamId" : "ATL",
"playerId" : "bedrost01",
"salary" : 550000
}
],
"count": 2
}
my code, so far, looks like this:
import json
from pandas.io.json import json_normalize
import argparse
def flatten_json(y):
    out = {}
    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '_')
                i += 1
        else:
            out[name[:-1]] = x
    flatten(y)
    return out
if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Converting json files into csv for Tableau processing')
    parser.add_argument(
        "-j", "--json", dest="json_file", help="PATH/TO/json file to convert", metavar="FILE", required=True)
    args = parser.parse_args()
    with open(args.json_file, "r") as inputFile: # open json file
        json_data = json.loads(inputFile.read()) # load json content
    flat_json = flatten_json(json_data)
    # normalizing flat json
    final_data = json_normalize(flat_json)
    with open(args.json_file.replace(".json", ".csv"), "w") as outputFile: # open csv file
        # saving DataFrame to csv
        final_data.to_csv(outputFile, encoding='utf8', index=False)
What I would like to obtain is 1 line per ticket in the csv, with headings:
Name,Location_City,Location_State,Hobbies_0,Hobbies_1,Year,TeamId,PlayerId,Salary.
I would really appreciate anything that does the trick!
Thank you!

I actually wrote a package called cherrypicker recently to deal with this exact sort of thing since I had to do it so often!
I think the following code would give you exactly what you're after:
from cherrypicker import CherryPicker
import json
import pandas as pd
with open('file.json') as file:
    data = json.load(file)
picker = CherryPicker(data)
flat = picker['tickets'].flatten().get()
df = pd.DataFrame(flat)
print(df)
This gave me the output:
Location_City Location_State Name hobbies_0 hobbies_1 playerId salary teamId year
0 Los Angeles CA Liam Piano Sports barkele01 870000 ATL 1985
1 Los Angeles CA John Music Running bedrost01 550000 ATL 1985
You can install the package with:
pip install cherrypicker
...and there's more docs and guidance at https://cherrypicker.readthedocs.io.

As you already have a function to flatten a JSON object, you just have to flatten each ticket (this also needs import pandas as pd at the top of the script):
...
with open(args.json_file, "r") as inputFile: # open json file
    json_data = json.loads(inputFile.read()) # load json content
final_data = pd.DataFrame([flatten_json(elt) for elt in json_data['tickets']])
...
With your sample data, final_data is as expected:
Location_City Location_State Name hobbies_0 hobbies_1 playerId salary teamId year
0 Los Angeles CA Liam Piano Sports barkele01 870000 ATL 1985
1 Los Angeles CA John Music Running bedrost01 550000 ATL 1985
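If pandas is not needed for anything else, the same idea can be sketched with the standard csv module. This is only an illustration under a few assumptions: tickets_to_csv is a hypothetical helper name, flatten_json is a recursive rewrite of the question's function, and the header is built as the union of all flattened keys so that tickets with different numbers of hobbies still line up.

```python
import csv
import io


def flatten_json(obj, prefix=""):
    """Flatten nested dicts and lists into one dict with '_'-joined keys."""
    out = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            out.update(flatten_json(value, f"{prefix}{key}_"))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            out.update(flatten_json(value, f"{prefix}{index}_"))
    else:
        # Drop the trailing '_' that was added for the next nesting level.
        out[prefix[:-1]] = obj
    return out


def tickets_to_csv(tickets):
    """Hypothetical helper: return CSV text with one line per ticket."""
    rows = [flatten_json(ticket) for ticket in tickets]
    # Union of keys in first-seen order, so rows may have different widths.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Sample data copied from the question.
sample = {
    "tickets": [
        {"Name": "Liam", "Location": {"City": "Los Angeles", "State": "CA"},
         "hobbies": ["Piano", "Sports"], "year": 1985, "teamId": "ATL",
         "playerId": "barkele01", "salary": 870000},
        {"Name": "John", "Location": {"City": "Los Angeles", "State": "CA"},
         "hobbies": ["Music", "Running"], "year": 1985, "teamId": "ATL",
         "playerId": "bedrost01", "salary": 550000},
    ],
    "count": 2,
}

csv_text = tickets_to_csv(sample["tickets"])
print(csv_text)
```

Only sample["tickets"] is passed in, which is what keeps the top-level count field out of the CSV.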

There may be a simpler solution for this. But this should work!
import json
import pandas as pd
with open('file.json') as file:
    data = json.load(file)
df = pd.DataFrame(data['tickets'])
# Pull the nested Location fields out row by row
df['location_city'] = df['Location'].apply(lambda loc: loc['City'])
df['location_state'] = df['Location'].apply(lambda loc: loc['State'])
# Expand the hobbies lists into one column per position (hobbies_0, hobbies_1, ...)
hobbies = pd.DataFrame(df['hobbies'].tolist(), index=df.index).add_prefix('hobbies_')
df = df.drop(columns=['Location', 'hobbies']).join(hobbies)
print(df)

Related

Extending Googles HelloAnalytics Python script to pull out the data in a csv [duplicate]

I am trying to build an API for my Google Analytics Account to export the data into a CSV. I have the Authentication code working, but I am struggling with now printing the data in the format I would like.
For the time being, I am only pulling dimension country, dimension city, and metric session. (However these will change when I get this working.) Right now, it prints:
Date Range(0)
ga:sessions: 2
ga:country:United States
ga:city:Los Angeles
...
However, I would like to have this in a line:
date Range sessions country city
0 2 USA Los Angeles
...
What code in Python do I need to use? Below is what I have.
def initialize_analyticsreporting():
    parser = argparse.ArgumentParser(
        formatter_class=argparse.RawDescriptionHelpFormatter,
        parents=[tools.argparser])
    flags = parser.parse_args([])
    http = credentials.authorize(httplib2.Http())
    service = build('analytics', 'v4', http=http, discoveryServiceUrl=('https://analyticsreporting.googleapis.com/$discovery/rest'))
def get_report(service):
    return service.reports().batchGet(
        body={
            'reportRequests':[
                {
                    "viewId": "ga:52783868",
                    "dimensions": [{
                        "name": "ga:country"},
                        {"name": "ga:city"}],
                    "metrics": [{
                        "expression": "ga:sessions"}],
                    "dateRanges": [{
                        "startDate": "2017-04-10",
                        "endDate": "2017-04-12"}]
                }
            ]
        }
    ).execute()
countries = []
cities = []
val = []
def print_reponse(response):
    for report in response.get('reports', []):
        columnHeader = report.get('columnHeader',{})
        dimensionHeaders=columnHeader.get('columnHeader',[])
        metricHeaders = columnHeader.get('metricHeader',{}).get('metricHeaderEntries',[])
        rows = report.get('data',{}).get('rows',[])
        for row in rows:
            dimensions = row.get('dimensions',[])
            dateRangeValues=row.get('metrics',[])
            for header, dimension in zip(dimensionHeaders,dimensions):
                print(header+':'+dimension)
            for i, values in enumerate(dateRangeValues):
                for metricHeader, value in zip(metricHeaders, values.get('values')):
                    print(metricHeader.get('name')+':'+value)
def main():
    analytics = initialize_analyticsreporting()
    response = get_report(service)
    print_reponse(response)
if __name__ == '__main__':
    main()
Following lrnzcig's suggestion, we can parse the data with pandas and then export it to a CSV file.
First, import pandas and json_normalize:
import pandas as pd
from pandas.io.json import json_normalize
Use this function to parse the data:
def parse_data(response):
    reports = response['reports'][0]
    # Copy the dimension names so the response itself is not modified
    columns = list(reports['columnHeader']['dimensions'])
    metricHeader = reports['columnHeader']['metricHeader']['metricHeaderEntries']
    for metric in metricHeader:
        columns.append(metric['name'])
    data = json_normalize(reports['data']['rows'])
    data_dimensions = pd.DataFrame(data['dimensions'].tolist())
    data_metrics = pd.DataFrame(data['metrics'].tolist())
    data_metrics = data_metrics.applymap(lambda x: x['values'])
    data_metrics = pd.DataFrame(data_metrics[0].tolist())
    result = pd.concat([data_dimensions, data_metrics], axis=1, ignore_index=True)
    # Label the columns with the dimension and metric names collected above
    result.columns = columns
    return result
The result will look like ...
ga:country ga:city ga:sessions
0 (not set) (not set) 64
1 Argentina (not set) 1
2 Australia Adelaide 3
3 Australia Brisbane 9
...
Finally, call function to_csv to save data as csv file
result.to_csv('result.csv')
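For comparison, the same reshaping can be sketched without pandas. The response layout used here (columnHeader, metricHeaderEntries, data.rows, metrics[].values) is the one the code above reads; report_to_rows is a hypothetical helper name, the sample response is hand-written rather than real Analytics output, and only the first date range of each row is used.

```python
import csv
import io


def report_to_rows(report):
    """Return a header row followed by one flat row per report row."""
    header = report.get("columnHeader", {})
    dimension_names = list(header.get("dimensions", []))
    metric_names = [
        entry["name"]
        for entry in header.get("metricHeader", {}).get("metricHeaderEntries", [])
    ]
    rows = [dimension_names + metric_names]
    for row in report.get("data", {}).get("rows", []):
        metrics = row.get("metrics", [])
        # Assumption: a single date range, so only metrics[0] is read.
        values = metrics[0].get("values", []) if metrics else []
        rows.append(list(row.get("dimensions", [])) + list(values))
    return rows


# Hand-written sample in the shape described above (not real data).
sample_response = {
    "reports": [{
        "columnHeader": {
            "dimensions": ["ga:country", "ga:city"],
            "metricHeader": {
                "metricHeaderEntries": [{"name": "ga:sessions", "type": "INTEGER"}]
            },
        },
        "data": {"rows": [
            {"dimensions": ["United States", "Los Angeles"],
             "metrics": [{"values": ["2"]}]},
            {"dimensions": ["Australia", "Adelaide"],
             "metrics": [{"values": ["3"]}]},
        ]},
    }]
}

buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(
    report_to_rows(sample_response["reports"][0]))
csv_text = buffer.getvalue()
print(csv_text)
```

Writing to a real file instead of io.StringIO only requires opening it with newline="" and passing the file object to csv.writer.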

Turn Python Strings into JS Array

I'm trying to parse names from a CSV and turn them into a JS array. This is my first attempt at using Python, and I'm having trouble getting the right structure for the JSON file. My code is below with the current and desired output; any pointers would be greatly appreciated.
import csv, json
csvPath = "forbes_pub_top_2000.csv"
jsonPath = "pub.json"
# Read CSV, filter Names, add to data
data = {}
with open(csvPath, 'r') as csv_file:
    csv_reader = csv.reader(csv_file)
    next(csv_reader)
    for line in csv_reader:
        company = line[2]
        data[company] = line[2]
# Add data to root node
root = {}
root["names"] = data
# Write data to JSON file
with open(jsonPath, 'w') as json_file:
    json_file.write(json.dumps(root, indent=4))
Current output:
{
"names": {
"ICBC": "ICBC",
"China Construction Bank": "China Construction Bank",
"Berkshire Hathaway": "Berkshire Hathaway",
"JPMorgan Chase": "JPMorgan Chase",
"Wells Fargo": "Wells Fargo",
"Agricultural Bank of China": "Agricultural Bank of China",
"Bank of America": "Bank of America",
"Bank of China": "Bank of China",
...
}
Desired Output:
{
"names": ["ICBC", "China Construction Bank", "Berkshire Hathaway", "JPMorgan Chase", "Wells Fargo", "Agricultural Bank of China", "Bank of America", "Bank of China", ... ]
}
Instead of this:
for line in csv_reader:
    company = line[2]
    data[company] = line[2]
do this:
for line in csv_reader:
    data.append(line[2])
You will also need to make data a list, not a dict:
data = []
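Put together, the corrected approach might look like the sketch below. It reads from an in-memory CSV so it can run on its own; the header and rows are invented for illustration (only the assumption that the company name sits in column index 2 comes from the question), and names_to_json is a hypothetical helper name.

```python
import csv
import io
import json


def names_to_json(csv_file, column=2):
    """Collect one column of a CSV into {"names": [...]} and return JSON text."""
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the header row, if there is one
    # Rows that are too short to have the column are skipped.
    names = [line[column] for line in reader if len(line) > column]
    return json.dumps({"names": names}, indent=4)


# Invented sample; the real file's other columns are unknown.
sample_csv = io.StringIO(
    "Rank,Country,Company\n"
    "1,China,ICBC\n"
    "2,China,China Construction Bank\n"
    "3,United States,Berkshire Hathaway\n"
)

json_text = names_to_json(sample_csv)
print(json_text)
```

A list serialises to a JSON array, whereas the dict in the original code serialised to a JSON object, which is why the output had the key/value shape.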

Python write multiple array values into csv

With my code, I read the values from JSON data and insert them into an array:
def retrive_json():
    with open('t_v1.json') as json_data:
        d = json.load(json_data)
    array = []
    for i in d['ride']:
        origin_lat = i['origin']['lat']
        origin_lng = i['origin']['lng']
        destination_lat = i['destination']['lat']
        destination_lng = i['destination']['lng']
        array.append([origin_lat,origin_lng,destination_lat,destination_lng])
    return array
The resulting array is:
[[39.72417, -104.99984, 39.77446, -104.9379], [39.77481, -104.93618, 39.6984, -104.9652]]
How can I write each element of each inner array into its own field in the CSV?
I have tried this:
wrt = csv.writer(open('t_.csv', 'w'), delimiter=',',lineterminator='\n')
for x in jjson:
    wrt.writerow([x])
but the values of each inner array all end up in a single field.
How can I solve this and write each value to its own field?
This is my JSON file:
{
"ride":[
{
"origin":{
"lat":39.72417,
"lng":-104.99984,
"eta_seconds":null,
"address":""
},
"destination":{
"lat":39.77446,
"lng":-104.9379,
"eta_seconds":null,
"address":null
}
},
{
"origin":{
"lat":39.77481,
"lng":-104.93618,
"eta_seconds":null,
"address":"10 Albion Street"
},
"destination":{
"lat":39.6984,
"lng":-104.9652,
"eta_seconds":null,
"address":null
}
}
]
}
Let's say we have this:
jsonstring = """{
"ride":[
{
"origin":{
"lat":39.72417,
"lng":-104.99984,
"eta_seconds":null,
"address":""
},
"destination":{
"lat":39.77446,
"lng":-104.9379,
"eta_seconds":null,
"address":null
}
},
{
"origin":{
"lat":39.77481,
"lng":-104.93618,
"eta_seconds":null,
"address":"10 Albion Street"
},
"destination":{
"lat":39.6984,
"lng":-104.9652,
"eta_seconds":null,
"address":null
}
}
]
}"""
Here is a pandas solution:
import pandas as pd
import json
# Load json to dataframe
df = pd.DataFrame(json.loads(jsonstring)["ride"])
# Create the new columns
df["o1"] = df["origin"].apply(lambda x: x["lat"])
df["o2"] = df["origin"].apply(lambda x: x["lng"])
df["d1"] = df["destination"].apply(lambda x: x["lat"])
df["d2"] = df["destination"].apply(lambda x: x["lng"])
#export
print(df.iloc[:,2:].to_csv(index=False, header=True))
#use below for file
#df.iloc[:,2:].to_csv("output.csv", index=False, header=True)
Returns:
o1,o2,d1,d2
39.72417,-104.99984,39.77446,-104.9379
39.77481,-104.93618,39.6984,-104.9652
Condensed answer:
import pandas as pd
import json
with open('data.json') as json_data:
    d = json.load(json_data)
df = pd.DataFrame(d["ride"])
df["o1"],df["o2"] = zip(*df["origin"].apply(lambda x: (x["lat"],x["lng"])))
df["d1"],df["d2"] = zip(*df["destination"].apply(lambda x: (x["lat"],x["lng"])))
df.iloc[:,2:].to_csv("t_.csv",index=False,header=False)
Or, maybe the most readable solution:
import json
from pandas.io.json import json_normalize
with open('data.json') as json_data:
    d = json.load(json_data)
df = json_normalize(d["ride"])
cols = ["origin.lat","origin.lng","destination.lat","destination.lng"]
df[cols].to_csv("output.csv",index=False,header=False)
This might help:
import json
import csv
def retrive_json():
    with open('data.json') as json_data:
        d = json.load(json_data)
    array = []
    for i in d['ride']:
        origin_lat = i['origin']['lat']
        origin_lng = i['origin']['lng']
        destination_lat = i['destination']['lat']
        destination_lng = i['destination']['lng']
        array.append([origin_lat,origin_lng,destination_lat,destination_lng])
    return array
res = retrive_json()
csv_cols = ["orgin_lat", "origin_lng", "dest_lat", "dest_lng"]
with open("output_csv.csv", 'w') as out:
    writer = csv.DictWriter(out, fieldnames=csv_cols)
    writer.writeheader()
    for each_list in res:
        d = dict(zip(csv_cols,each_list))
        writer.writerow(d)
Output csv generated is:
orgin_lat,origin_lng,dest_lat,dest_lng
39.72417,-104.99984,39.77446,-104.9379
39.77481,-104.93618,39.6984,-104.9652
To me it looks like you've got an array of arrays and you want the individual elements, so you'll want a nested for loop. Your current for loop gets each inner array; to split each array into its elements, you'll need to loop through those as well. I'd suggest something like this:
for x in jjson:
    for y in x:
        wrt.writerow([y])
Obviously you might want to adjust the bracketing etc.; this is just to give you an idea of how to solve your issue.
Let me know how it goes!
Why use the csv library at all?
array = [[1, 2, 3, 4], [5, 6, 7, 8]]
with open('test.csv', 'w') as csv_file:
    csv_file.write("# Header Info\n" \
                   "# Value1, Value2, Value3, Value4\n") # The header might be optional
    for row in array:
        # join() only accepts strings, so convert each value first
        csv_file.write(",".join(map(str, row)) + "\n")
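For reference, the behaviour in the question comes from wrapping each inner list in another list: writerow([x]) writes a row with a single field whose content is the whole list. Passing the inner list itself gives one field per value. A minimal sketch, using the array from the question and an in-memory buffer in place of the output file:

```python
import csv
import io

# The array produced by the question's retrive_json().
rows = [
    [39.72417, -104.99984, 39.77446, -104.9379],
    [39.77481, -104.93618, 39.6984, -104.9652],
]

buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=",", lineterminator="\n")
for row in rows:
    # Each inner list is already one CSV row: one element per field.
    writer.writerow(row)

csv_text = buffer.getvalue()
print(csv_text)
```

The loop can also be replaced by a single writer.writerows(rows) call.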

adding column 2 from a group of text files to 1 text file

I have a group of text files and I am looking to sequentially add the second column from each text file into a new text file. The files are tab delimited and of the following format:
name dave
age 35
job teacher
income 30000
I have generated a file with the 1st column of one of these files in the place of the second column to hopefully simplify the problem:
0 name
0 age
0 job
0 income
I have a large number of these files and would like to have them all in a tab delimited text file such as:
name dave mike sue
age 35 28 40
job teacher postman solicitor
income 30000 20000 40000
I have a text file containing just the names of all the files called all_libs.txt
so far I have written:
#make a sorted list of the file names
with open('all_libs.txt', 'r') as lib:
    people = list([line.rstrip() for line in lib])
people_s = sorted(people)
i=0
while i< len(people_s):
    with open(people_s[i]) as inf:
        for line in inf:
            parts = line.split() #split line into parts
            if len(parts) > 1: #if more than 1 discrete unit in parts
                with open("all_data.txt", 'a') as out_file: #append column2 to all_data
                    out_file.write((parts[1])+"\n")
    i=i+1 #go to the next file in the list
As each new file is opened I would like to add it as a new column rather than just appending as a new line. Would really appreciate any help? I realize something like SQL would probably make this easy but I have never used it and don't really have time to commit to the learning curve for SQL. Many thanks.
This is a very impractical way to store your data - each record is distributed over all the lines, so it's going to be hard to reconstruct the records when reading the file and (as you've seen) to add records.
You should be using a standard format like csv or (even better in a case like this) json:
For example, you could save them as CSV like this:
name,age,job,income
dave,35,teacher,30000
mike,28,postman,20000
sue,40,solicitor,40000
Reading this file:
>>> import csv
>>> with open("C:/Users/Tim/Desktop/people.csv", newline="") as infile:
...     reader = csv.DictReader(infile)
...     people = list(reader)
Now you have a list of people:
>>> people
[{'income': '30000', 'age': '35', 'name': 'dave', 'job': 'teacher'},
{'income': '20000', 'age': '28', 'name': 'mike', 'job': 'postman'},
{'income': '40000', 'age': '40', 'name': 'sue', 'job': 'solicitor'}]
which you can access easily:
>>> for item in people:
...     print("{0[name]} is a {0[job]}, earning {0[income]} per year".format(item))
...
dave is a teacher, earning 30000 per year
mike is a postman, earning 20000 per year
sue is a solicitor, earning 40000 per year
Adding new records now is only a matter of adding them to the end of your file:
>>> with open("C:/Users/Tim/Desktop/people.csv", "a", newline="") as outfile:
...     writer = csv.DictWriter(outfile,
...                             fieldnames=["name","age","job","income"])
...     writer.writerow({"name": "paul", "job": "musician", "income": 123456,
...                      "age": 70})
Result:
name,age,job,income
dave,35,teacher,30000
mike,28,postman,20000
sue,40,solicitor,40000
paul,70,musician,123456
Or you can save it as JSON:
>>> import json
>>> with open("C:/Users/Tim/Desktop/people.json", "w") as outfile:
...     json.dump(people, outfile, indent=1)
Result:
[
{
"income": "30000",
"age": "35",
"name": "dave",
"job": "teacher"
},
{
"income": "20000",
"age": "28",
"name": "mike",
"job": "postman"
},
{
"income": "40000",
"age": "40",
"name": "sue",
"job": "solicitor"
}
]
file_1 = """
name dave1
age 351
job teacher1
income 300001"""
file_2 = """
name dave2
age 352
job teacher2
income 300002"""
file_3 = """
name dave3
age 353
job teacher3
income 300003"""
template = """
0 name
0 age
0 job
0 income"""
Assume that the above is read from the files
_dict = {}
def concat():
    for cols in template.splitlines():
        if cols:
            _, col_name = cols.split()
            _dict[col_name] = []
    for each_file in [file_1, file_2, file_3]:
        data = each_file.splitlines()
        for line in data:
            if line:
                words = line.split()
                _dict[words[0]].append(words[1])
    _text = ""
    for key in _dict:
        _text += '\t'.join([key, '\t'.join(_dict[key]), '\n'])
    return _text
print(concat())
OUTPUT (on Python 3.7+, where dicts keep insertion order)
name dave1 dave2 dave3
age 351 352 353
job teacher1 teacher2 teacher3
income 300001 300002 300003
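The same column-wise merge can be sketched more compactly when the field names are taken from the data files themselves instead of a template. This is an illustration only: combine_columns is a hypothetical helper name, the file contents are passed in as strings (reading them from the names listed in all_libs.txt is left out), and it relies on dicts keeping insertion order (Python 3.7+).

```python
def combine_columns(file_texts):
    """Merge the second column of several tab-delimited texts, keyed by the first."""
    combined = {}
    for text in file_texts:
        for line in text.splitlines():
            parts = line.split("\t")
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            combined.setdefault(parts[0], []).append(parts[1])
    # One output line per field: the field name, then one value per input file.
    return "".join("\t".join([key] + values) + "\n"
                   for key, values in combined.items())


# In-memory stand-ins for the tab-delimited files from the question.
dave = "name\tdave\nage\t35\njob\tteacher\nincome\t30000\n"
mike = "name\tmike\nage\t28\njob\tpostman\nincome\t20000\n"
sue = "name\tsue\nage\t40\njob\tsolicitor\nincome\t40000\n"

merged = combine_columns([dave, mike, sue])
print(merged)
```

Collecting everything in memory first and writing once avoids reopening all_data.txt in append mode for every line, which is what made adding a column difficult in the original loop.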

python json to csv converting script?

Let me start by stating that I am new to python. I wrote a script that will convert a .json file to csv format. I managed to write a script to do the job, however I don't think that my script will work if the format of the json file was to change. My script assumes that the json file will be in the same format at all times.
<json file example>
{
"Order":
{
"order_id":"8251662",
"order_date":"2012-08-20 13:17:37",
"order_date_shipped":"0000-00-00 00:00:00",
"order_status":"fraudreview",
"order_ship_firstname":"pam",
"order_ship_lastname":"Gregorio",
"order_ship_address1":"1533 E. Dexter St",
"order_ship_address2":"",
"order_ship_city":"Covina",
"order_ship_state":"CA",
"order_ship_zip":"91746",
"order_ship_country":"US United States",
"order_ship_phone":"6268936923",
"order_ship_email":"pgregorio#brighton.com",
"order_bill_firstname":"pam",
"order_bill_lastname":"Gregorio",
"order_bill_address1":"1533 E. Dexter St",
"order_bill_address2":"",
"order_bill_city":"Covina",
"order_bill_state":"CA",
"order_bill_zip":"91746",
"order_bill_country":"US United States",
"order_bill_phone":"6268936923",
"order_bill_email":"pgregorio#brighton.com",
"order_gift_message":"",
"order_giftwrap":"0",
"order_gift_charge":"0",
"order_shipping":"Standard (Within 5-10 Business Days)",
"order_tax_charge":"62.83",
"order_tax_shipping":"0",
"order_tax_rate":"0.0875",
"order_shipping_charge":"7.5",
"order_total":"788.33",
"order_item_count":"12",
"order_tracking":"",
"order_carrier":"1"
},
"Items":
[
{
"item_id":"25379",
"item_date_shipped":"",
"item_code":"17345-J3553-J35532",
"item_quantity":"2","item_taxable":"YES",
"item_unit_price":"32","item_shipping":"0.67",
"item_addcharge_price":"0",
"item_description":" ABC Slide Bracelet: : Size: OS: Silver Sku: J35532",
"item_quantity_returned":"0",
"item_quantity_shipped":"0",
"item_quantity_canceled":"0",
"item_status":"pending",
"item_product_id":"17345",
"item_product_kit_id":"0",
"item_product_sku":"J35532",
"item_product_barcode":"881934310775",
"item_tracking":"",
"item_carrier":"0",
"item_source_orderid":""
},
{
"item_id":"25382",
"item_date_shipped":"",
"item_code":"17608-J3809-J3809C",
"item_quantity":"1",
"item_taxable":"YES",
"item_unit_price":"22",
"item_shipping":"0.23",
"item_addcharge_price":"0",
"item_description":" \"ABC Starter Bracelet 7 1\/4\"\"\": : Size: OS: Silver Sku: J3809C",
"item_quantity_returned":"0",
"item_quantity_shipped":"0",
"item_quantity_canceled":"0",
"item_status":"pending",
"item_product_id":"17608",
"item_product_kit_id":"0",
"item_product_sku":"J3809C",
"item_product_barcode":"881934594175",
"item_tracking":"",
"item_carrier":"0",
"item_source_orderid":""
},
{
"item_id":"25385",
"item_date_shipped":"",
"item_code":"17687-J9200-J92000",
"item_quantity":"2",
"item_taxable":"YES",
"item_unit_price":"12",
"item_shipping":"0.25",
"item_addcharge_price":"0",
"item_description":" ABC Cathedral Bead: : Size: OS: Silver Sku: J92000",
"item_quantity_returned":"0",
"item_quantity_shipped":"0",
"item_quantity_canceled":"0",
"item_status":"pending",
"item_product_id":"17687",
"item_product_kit_id":"0",
"item_product_sku":"J92000",
"item_product_barcode":"881934602832",
"item_tracking":"",
"item_carrier":"0",
"item_source_orderid":""
},
{
"item_id":"25388",
"item_date_shipped":"",
"item_code":"17766-J9240-J92402",
"item_quantity":"2",
"item_taxable":"YES",
"item_unit_price":"22",
"item_shipping":"0.46",
"item_addcharge_price":"0",
"item_description":" ABC Ice Diva Bead: : Size: OS: Silver Sku: J92402",
"item_quantity_returned":"0",
"item_quantity_shipped":"0",
"item_quantity_canceled":"0",
"item_status":"pending",
"item_product_id":"17766",
"item_product_kit_id":"0",
"item_product_sku":"J92402",
"item_product_barcode":"881934655838",
"item_tracking":"",
"item_carrier":"0",
"item_source_orderid":""
}
],
"FraudReasons":
[
{
"order_id":"11957",
"fraud_reason":"order total exceeds max amount"
},
{
"order_id":"11957",
"fraud_reason":"order exceeds max item count"
}
]
}
My script currently works fine with this json file but It wont work if there is only one item or one fraudreason. Here is the code to my script.
<script code>
#!/usr/bin/python
import simplejson as json
import optparse
import pycurl
import sys
import csv
json_data = open(file)
data = json.load(json_data)
json_data.close()
csv_file = '/tmp/' + str(options.orderId) + '.csv'
orders = data['Order']
items = data['Items']
frauds = data['FraudReasons']
o = csv.writer(open(csv_file, 'w'), lineterminator=',')
o.writerow([orders['order_id'],orders['order_date'],orders['order_date_shipped'],orders['order_status'],orders['order_ship_firstname'],orders['order_ship_lastname'],orders['order_ship_address1'],orders['order_ship_address2'],orders['order_ship_city'],orders['order_ship_state'],orders['order_ship_zip'],orders['order_ship_country'],orders['order_ship_phone'],orders['order_ship_email'],orders['order_bill_firstname'],orders['order_bill_lastname'],orders['order_bill_address1'],orders['order_bill_address2'],orders['order_bill_city'],orders['order_bill_state'],orders['order_bill_zip'],orders['order_bill_country'],orders['order_bill_phone'],orders['order_bill_email'],orders['order_gift_message'],orders['order_giftwrap'],orders['order_gift_charge'],orders['order_shipping'],orders['order_tax_charge'],orders['order_tax_shipping'],orders['order_tax_rate'],orders['order_shipping_charge'],orders['order_total'],orders['order_item_count'],orders['order_tracking'],orders['order_carrier']])
for item in items:
    o.writerow([item['item_id'],item['item_date_shipped'],item['item_code'],item['item_quantity'],item['item_taxable'],item['item_unit_price'],item['item_shipping'],item['item_addcharge_price'],item['item_description'],item['item_quantity_returned'],item['item_quantity_shipped'],item['item_quantity_canceled'],item['item_status'],item['item_product_id'],item['item_product_kit_id'],item['item_product_sku'],item['item_product_barcode'],item['item_tracking'],item['item_carrier'],item['item_source_orderid']])
for fraud in frauds:
    o.writerow([fraud['fraud_reason']],)
I also have not been able to figure out how to avoid hard-coding the labels. I hope someone can help me with this.
Thanks in advance.
You may want to use csv.DictWriter:
# It's considered best to stash the main logic of your script
# in a main() function like this.
def main(filename, options):
    with open(filename) as fi:
        data = json.load(fi)
    csv_file = '/tmp/' + str(options.orderId) + '.csv'
    orders = data['Order']
    items = data['Items']
    frauds = data['FraudReasons']
    # Here's one way to keep this maintainable if the JSON
    # format changes, and you don't care too much about the
    # order of the fields...
    orders_fields = sorted(orders.keys())
    item_fields = sorted(items[0].keys()) if items else ()
    fraud_fields = sorted(frauds[0].keys()) if frauds else ()
    csv_options = dict(lineterminator=',')
    with open(csv_file, 'w') as fo:
        o = csv.DictWriter(fo, orders_fields, **csv_options)
        o.writeheader()
        o.writerow(orders)
        fo.write('\n') # Optional, if you want to keep them separated.
        o = csv.DictWriter(fo, item_fields, **csv_options)
        o.writeheader()
        o.writerows(items)
        fo.write('\n') # Optional, if you want to keep them separated.
        o = csv.DictWriter(fo, fraud_fields, **csv_options)
        o.writeheader()
        o.writerows(frauds)
# If this script is run from the command line, just run
# main(). Here's the place to use `optparse`.
if __name__ == '__main__':
    main(...) # You'll need to fill in the main() arguments...
If you need to specify the order of fields, assign them to a tuple like this:
orders_fields = (
'order_id',
'order_date',
'order_date_shipped',
# ... etc.
)
You should ask the json-generated object (data) for the names of the fields. To retain the input order, tell json to use collections.OrderedDict instead of plain dict (requires Python 2.7):
import json
from collections import OrderedDict as ordereddict
with open('mydata.json') as f:
    data = json.load(f, object_pairs_hook=ordereddict)
orders = data['Order']
print orders.keys() # Will print the keys in the order they were read
You can then use orders.keys() instead of your hard-coded list, either with writerow or (simpler) with csv.DictWriter.
Note that this uses the default json, not simplejson, and requires Python 2.7 for the object_pairs_hook argument and the OrderedDict type.
Edit: Yeah, I see from the comments that you're stuck with 2.4. You can download an ordereddict from PyPI, and you can extend the JSONDecoder class and pass it with the cls argument (see here), instead of object_pairs_hook, but that's uglier and more work...
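On the question's other concern (a file with only one item or one fraud reason), one possible approach is to normalise each section to a list before writing it. The sketch below is written for Python 3 and rests on an assumption the question does not confirm: that a lone item arrives as a single JSON object rather than a one-element list. The names as_list and write_section are hypothetical, and the sample data is a trimmed, partly made-up version of the question's file.

```python
import csv
import io


def as_list(value):
    """Wrap a lone record in a list so callers can always iterate."""
    if value is None:
        return []
    if isinstance(value, dict):
        return [value]
    return list(value)


def write_section(out, records):
    """Write a header row plus one row per record; skip empty sections."""
    records = as_list(records)
    if not records:
        return
    # Field names come from the first record; every record in a section
    # is assumed to have the same keys.
    writer = csv.DictWriter(out, fieldnames=list(records[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)


# Trimmed sample: "Items" is a single object here, "FraudReasons" a list.
data = {
    "Order": {"order_id": "8251662", "order_status": "fraudreview"},
    "Items": {"item_id": "25379", "item_quantity": "2"},
    "FraudReasons": [
        {"order_id": "11957", "fraud_reason": "order total exceeds max amount"},
    ],
}

buffer = io.StringIO()
for section in ("Order", "Items", "FraudReasons"):
    write_section(buffer, data.get(section))
csv_text = buffer.getvalue()
print(csv_text)
```

Because the field names are read from the data, no labels are hard-coded, and a section that is missing from the file is simply skipped.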
