Custom parser for file similar to json python - python

I'm attempting to create a parser to translate a "custom" file into JSON so I can more easily manipulate its contents (for argument's sake, call the "custom" format a .qwerty).
I've already created a lexer which breaks the file down into individual lexemes (tokens), each structured as [token_type, token_value]. Now I am struggling to parse the lexemes into their correct dictionaries: it is difficult to insert data into a sub-sub-dictionary because the keys aren't constant, and it is equally difficult to insert data into arrays stored in dictionaries.
It should be noted that I am attempting to parse the tokens sequentially into an actual Python object, then dump that object as JSON.
An example of the file can be seen below, along with what the end result is meant to resemble.
FILE: ABC.querty
Dict_abc_1{
Dict_abc_2{
HeaderGUID="";
Version_TPI="999";
EncryptionType="0";
}
Dict_abc_3{
FamilyName="John Doe";
}
Dict_abc_4{
Array_abc{
{TimeStamp="2018-11-07 01:00:00"; otherinfo="";}
{TimeStamp="2018-11-07 01:00:00"; otherinfo="";}
{TimeStamp="2018-11-07 01:00:00"; otherinfo="";}
{TimeStamp="2018-11-07 02:53:57"; otherinfo="";}
{TimeStamp="2018-11-07 02:53:57"; otherinfo="";}
}
Dict_abc_5{
LastContact="2018-11-08 01:00:00";
BatteryStatus=99;
BUStatus=PowerOn;
LastCallTime="2018-11-08 01:12:46";
LastSuccessPoll="2018-11-08 01:12:46";
CallResult=Successful;
}
}
}
Code=999999;
FILE: ABC.json
{
"Dict_abc_1":{
"Dict_abc_2":{
"HeaderGUID":"",
"Version_TPI":"999",
"EncryptionType":"0"
},
"Dict_abc_3":{
"FamilyName":"John Doe"
},
"Dict_abc_4":{
"Array_abc":[
{"TimeStamp":"2018-11-07 01:00:00", "otherinfo":""},
{"TimeStamp":"2018-11-07 01:00:00", "otherinfo":""},
{"TimeStamp":"2018-11-07 01:00:00", "otherinfo":""},
{"TimeStamp":"2018-11-07 02:53:57", "otherinfo":""},
{"TimeStamp":"2018-11-07 02:53:57", "otherinfo":""}
],
"Dict_abc_5":{
"LastContact":"2018-11-08 01:00:00",
"BatteryStatus":99,
"BUStatus":"PowerOn",
"LastCallTime":"2018-11-08 01:12:46",
"LastSuccessPoll":"2018-11-08 01:12:46",
"CallResult":"Successful"
}
}
},
"Code":999999
}
Additional token information,
Token types can either be (with possible values)
IDENTIFIER contain the name of the variable identifier
VARIABLE containing actual data belonging to the parent IDENTIFIER
OPERATOR equal "="
OPEN_BRACKET equal "{"
CLOSE_BRACKET equal "}"
An example of ABC.querty's lexemes can be seen HERE
fundamental logical extract of main.py
def main():
    content = open_file(file_name)  ## read file
    lexer = Lexer(content)  ## create lexer class
    tokens = lexer.tokenize()  ## create lexemes as seen in pastebin
    parser = Parser(tokens).parse()  ## create parser class given tokens
    print(json.dumps(parser, sort_keys=True, indent=4, separators=(',', ': ')))
parser.py
import re

class Parser(object):
    def __init__(self, tokens):
        self.tokens = tokens
        self.token_index = 0
        self.json_object = {}
        self.current_object = {}
        self.path = [self.json_object]

    def parse(self):
        while self.token_index < len(self.tokens):
            token = self.getToken()
            token_type = token[0]
            token_value = token[1]
            print("%s \t %s" % (token_type, token_value))
            if token_type in "IDENTIFIER":
                self.increment()
                identifier_type = self.getToken()
                if identifier_type[0] in "OPEN_BRACKET":
                    identifier_two_type = self.getToken(1)
                    if identifier_two_type[0] in ["OPERATOR","IDENTIFIER"]:
                        ## make dict in current dict
                        pass
                    elif identifier_two_type[0] in "OPEN_BRACKET":
                        ## make array in current dict
                        pass
                elif identifier_type[0] in "OPERATOR":
                    ## insert data into current dict
                    pass
            if token_type in "CLOSE_BRACKET":
                identifier_type = self.getToken()
                if "OPEN_BRACKET" in identifier_type[0]:
                    #still in array of current dict
                    pass
                elif "IDENTIFIER" in identifier_type[0]:
                    self.changeDirectory()
                else:
                    #end script
                    pass
            self.increment()
        print(self.path)
        return self.json_object

    def changeDirectory(self):
        if len(self.path) > 0:
            self.path = self.path.pop()
            self.current_object = -1

    def increment(self):
        if self.token_index < len(self.tokens):
            self.token_index+=1

    def getToken(self, x=0):
        return self.tokens[self.token_index+x]
Additional parse information,
Currently, I was trying to store the current dictionary in a path array to allow me to insert into dictionaries and arrays within dictionaries.
Any suggestions or solutions are very much appreciated,
Thanks.

The last time I solved this kind of problem, I found a finite-state machine very helpful. For the step after tokenizing, I'd recommend an approach that I believe is called shift-reduce parsing in English. The principle: you go through the tokens and push them one by one onto a stack. After each push, you check the top of the stack against a set of rules, so that primitive tokens are combined into expressions, which may in turn become part of more complex expressions.
For example, take "FamilyName":"John Doe". The tokens are "FamilyName", : and "John Doe".
You push the first token onto the stack.
stack = ["FamilyName"].
Rule 1: str_obj -> E. So you create Expression(type='str', value="FamilyName"), and the stack is now stack = [Expression].
Then you push the next token.
stack = [Expression, ':']. No rule matches ':'. Move on.
stack = [Expression, ':', "John Doe"]. Rule 1 matches again, so the stack becomes stack = [Expression, ':', Expression]. Now another rule matches. Rule 2: E:E -> E. Apply it as Expression(type='kv_pair', value=(Expression, Expression)), and the stack becomes stack = [Expression].
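The two rules walked through above could be sketched roughly as follows. This is only an illustration: the Expression type, the reduce_stack and shift_reduce names, and the use of plain strings as tokens are assumptions made for the example, not part of the question's lexer.

```python
from collections import namedtuple

# Hypothetical node type standing in for the answer's "Expression".
Expression = namedtuple("Expression", ["type", "value"])


def reduce_stack(stack):
    """Apply Rule 1 and Rule 2 to the top of the stack until neither matches."""
    changed = True
    while changed:
        changed = False
        # Rule 1: str_obj -> E (wrap a raw string token, but not the ':' separator)
        if stack and isinstance(stack[-1], str) and stack[-1] != ":":
            stack[-1] = Expression("str", stack[-1])
            changed = True
        # Rule 2: E : E -> E (combine key, separator and value into one pair)
        elif (len(stack) >= 3
              and isinstance(stack[-3], Expression)
              and stack[-2] == ":"
              and isinstance(stack[-1], Expression)):
            value = stack.pop()
            stack.pop()  # discard the ':' separator
            key = stack.pop()
            stack.append(Expression("kv_pair", (key, value)))
            changed = True


def shift_reduce(tokens):
    stack = []
    for token in tokens:
        stack.append(token)  # shift: push the next token
        reduce_stack(stack)  # reduce: collapse whatever the rules allow
    return stack


result = shift_reduce(["FamilyName", ":", "John Doe"])
```

After the call, result holds a single kv_pair expression whose value is the pair of str expressions for the key and the value.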
If you describe all the rules this way, the whole file can be parsed like that. Hope it helps.
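For the token format in the question, a lighter alternative to a full rule set is to keep a stack of the containers that are currently open, which is close to the path list the question already uses. The sketch below is untested against the real lexer output: parse_tokens and the sample tokens are hypothetical, and it assumes the lexer emits no token for ";" (any token type it does not recognise, including OPERATOR, is simply skipped).

```python
def parse_tokens(tokens):
    """Build nested dicts and lists from [token_type, token_value] pairs."""
    root = {}
    stack = [root]      # stack[-1] is the innermost open container
    pending_key = None  # last IDENTIFIER seen, waiting for a value or a "{"
    for index, (kind, value) in enumerate(tokens):
        if kind == "IDENTIFIER":
            pending_key = value
        elif kind == "VARIABLE":
            stack[-1][pending_key] = value
            pending_key = None
        elif kind == "OPEN_BRACKET":
            # "{" directly followed by another "{" opens an array of records.
            next_kind = tokens[index + 1][0] if index + 1 < len(tokens) else None
            child = [] if next_kind == "OPEN_BRACKET" else {}
            parent = stack[-1]
            if isinstance(parent, list):
                parent.append(child)         # anonymous record inside an array
            else:
                parent[pending_key] = child  # named dict or array
            pending_key = None
            stack.append(child)
        elif kind == "CLOSE_BRACKET":
            stack.pop()
    return root


# Hypothetical tokens for a cut-down version of the example file.
tokens = [
    ["IDENTIFIER", "Outer"], ["OPEN_BRACKET", "{"],
    ["IDENTIFIER", "Name"], ["OPERATOR", "="], ["VARIABLE", "John Doe"],
    ["IDENTIFIER", "Items"], ["OPEN_BRACKET", "{"],
    ["OPEN_BRACKET", "{"], ["IDENTIFIER", "T"], ["OPERATOR", "="], ["VARIABLE", "1"], ["CLOSE_BRACKET", "}"],
    ["OPEN_BRACKET", "{"], ["IDENTIFIER", "T"], ["OPERATOR", "="], ["VARIABLE", "2"], ["CLOSE_BRACKET", "}"],
    ["CLOSE_BRACKET", "}"],
    ["CLOSE_BRACKET", "}"],
    ["IDENTIFIER", "Code"], ["OPERATOR", "="], ["VARIABLE", 999999],
]
parsed = parse_tokens(tokens)
```

Because the stack always ends with the container being filled, the keys never need to be known in advance: a closing bracket just pops back to the parent.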

Related

Python parse JSON object and ask for selection of values

I am new to programming in general. I am trying to make a Python script that helps with making part numbers. Here, for an example, computer memory modules.
I have a python script that needs to read the bmt object from a JSON file. Then ask the user to select from it then append the value to a string.
{"common": {
"bmt": {
"DR1": "DDR",
"DR2": "DDR2",
"DR3": "DDR3",
"DR4": "DDR4",
"DR5": "DDR5",
"DR6": "DDR6",
"FER": "FeRAM",
"GD1": "GDDR",
"GD2": "GDDR2",
"GD3": "GDDR3",
"GD4": "GDDR4",
"GD5": "GDDR5",
"GX5": "GDDR5X",
"GD6": "GDDR6",
"GX6": "GDDR6X",
"LP1": "LPDDR",
"LP2": "LPDDR2",
"LP3": "LPDDR3",
"LP4": "LPDDR4",
"LX4": "LPDDR4X",
"LP5": "LPDDR5",
"MLP": "mDDR",
"OPX": "Intel Optane/Micron 3D XPoint",
"SRM": "SRAM",
"WRM": "WRAM"
},
"modtype": {
"CM": "Custom Module",
"DD": "DIMM",
"MD": "MicroDIMM",
"SD": "SODIMM",
"SM": "SIMM",
"SP": "SIPP",
"UD": "UniDIMM"
}
}}
Example: There is a string called "code" that already has the value "MMD". The script asks the user to select from the listed values (e.g. "DR1"). If a selection is made (the user enters "DR1"), the script appends that selection to the string, so the new value would be "MMDDR1".
This code is to print the JSON. This is how far I have gotten
def enc(code):
    memdjson = json.loads("memd.json")
    print(memdjson)
How do I do this?
Repo with the rest of the code is here: https://github.com/CrazyblocksTechnologies/PyCIPN
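Try:
import json

def enc(code):
    # json.load reads from a file object; json.loads expects a JSON string.
    with open("memd.json") as f:
        memdjson = json.load(f)["common"]["bmt"]
    selected = input("Select a value from the following: \n{}\n\n".format(' '.join(memdjson.keys())))
    # Append the selected key (e.g. "DR1"), so "MMD" becomes "MMDDR1".
    return code + selected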
import json
import inquirer

with open(path_to_json_file) as f:
    file = json.load(f)
code = "MMD"
questions = [
    inquirer.List('input',
                  message="Select one of the following:",
                  choices=list(file["common"]["bmt"].keys()),
                  ),
]
answer = inquirer.prompt(questions)
# prompt() returns a dict keyed by question name, so pick out 'input'.
code += answer['input']
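Neither snippet above checks what the user typed. A minimal sketch that keeps asking until the input is one of the keys might look like this; the ask parameter and the upper-casing of the input are assumptions added here so the function can be exercised without a terminal, and the inline JSON is a two-entry stand-in for memd.json.

```python
import json


def enc(code, table, ask=input):
    """Ask until the user enters a key of `table`, then append it to `code`."""
    prompt = "Select one of: {}\n".format(" ".join(table))
    while True:
        # Assumption: keys are upper-case, as in the sample file.
        selected = ask(prompt).strip().upper()
        if selected in table:
            return code + selected
        print("Unknown selection: {!r}".format(selected))


# Inline stand-in for the "bmt" object that would come from memd.json.
memd = json.loads('{"common": {"bmt": {"DR1": "DDR", "DR2": "DDR2"}}}')
bmt = memd["common"]["bmt"]
```

In the real script the call would be enc("MMD", bmt), which falls back to the built-in input().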

A regex for CSV parsing? Python3 Re module

Is there a regex (Python re compatible) that I can use for parsing csv?
EDIT: I didn't realize there was a csv module in Python's standard library
Here's the regex: (?<!,\"\w)\s*,(?!\w\s*\",). It's python compatible and JavaScript compatible. Here's the full parsing script (as a python function):
def parseCSV(csvDoc, output_type="dict"):
    """
    To parse csv files.
    Arguments:
    csvDoc - The csv document to parse.
    output_type - the output type this
    function will return
    """
    from re import compile as c
    from json import dumps
    from numpy import array
    # This is where all the parsing happens
    csvparser = c('(?<!,\"\\w)\\s*,(?!\\w\\s*\",)')
    lines = str(csvDoc).split('\n')
    # All the lines are not empty
    necessary_lines = [line for line in lines if line != ""]
    All = array([csvparser.split(line) for line in necessary_lines])
    if output_type.lower() in ("dict", "json"):  # If you want JSON or dict
        # All the python dict keys required (At the top of the file or top row)
        top_line = list(All[0])
        main_table = {}  # The parsed data will be here
        main_table[top_line[0]] = {
            name[0]: {
                thing: name[
                    # The 'actual value' counterpart
                    top_line.index(thing)
                ] for thing in top_line[1:]  # The requirements
            } for name in All[1:]
        }
        return dumps(main_table, skipkeys=True, ensure_ascii=False, indent=1)
    elif output_type.lower() in ("list",
                                 "numpy",
                                 "array",
                                 "matrix",
                                 "np.array",
                                 "np.ndarray",
                                 "numpy.array",
                                 "numpy.ndarray"):
        return All
    else:
        # All the python dict keys required (At the top of the file or top row)
        top_line = list(All[0])
        main_table = {}  # The parsed data will be here
        main_table[top_line[0]] = {
            name[0]: {
                thing: name[
                    # The 'actual value' counterpart
                    top_line.index(thing)
                ] for thing in top_line[1:]  # The requirements
            } for name in All[1:]
        }
        return dumps(main_table, skipkeys=True, ensure_ascii=False, indent=1)
Dependencies: NumPy
All you need to do is chuck in the raw text of the csv file and then the function will return a json (or a 2-dimension list if you wish) in this format:
{"top-left-corner name":{
"foo":{"Item 1 left to foo":"Item 2 of the top row",
"Item 2 left to foo":"Item 3 of the top row",
...}
"bar":{...}
}
}
And here's an example of it:
CSV.csv
foo,bar,zbar
foo_row,foo1,,
barie,"2,000",,
and it outputs:
{
"foo": {
"foo_row": {
"bar": "foo1",
"zbar": ""
},
"barie": {
"bar": "\"2,000\"",
"zbar": ""
}
}
}
It should work if your CSV file is formatted correctly (the ones I tested were made by Apple's Numbers).
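As the EDIT in the question notes, the standard library already ships a csv module, and it handles quoted fields without a hand-written regex. The sketch below feeds it a string shaped like the CSV.csv example, trimmed to three columns per row (that trimming and the variable names are choices made for this illustration).

```python
import csv
import io

# Sample shaped like CSV.csv above, with exactly three fields per row.
sample = 'foo,bar,zbar\nfoo_row,foo1,\nbarie,"2,000",\n'

# csv.reader yields one list per row and strips the quotes around "2,000".
rows = list(csv.reader(io.StringIO(sample)))

# csv.DictReader uses the first row as keys for every following row.
records = list(csv.DictReader(io.StringIO(sample)))
```

Unlike the regex version, the quoted value comes back as 2,000 without the surrounding quote characters.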

Dictionary from a String with particular structure

I am using python 3 to read this file and convert it to a dictionary.
I have this string from a file and I would like to know how could be possible to create a dictionary from it.
[User]
Date=10/26/2003
Time=09:01:01 AM
User=teodor
UserText=Max Cor
UserTextUnicode=392039n9dj90j32
[System]
Type=Absolute
Dnumber=QS236
Software=1.1.1.2
BuildNr=0923875
Source=LAM
Column=OWKD
[Build]
StageX=12345
Spotter=2
ApertureX=0.0098743
ApertureY=0.2431899
ShiftXYZ=-4.234809e-002
[Text]
Text=Here is the Text files
DataBaseNumber=The database number is 918723
..... (There are more than 1000 lines per file) ...
On the text I have "Name=Something" and then I would like to convert it as follows:
{'Date':'10/26/2003',
'Time':'09:01:01 AM'
'User':'teodor'
'UserText':'Max Cor'
'UserTextUnicode':'392039n9dj90j32'.......}
The word between [ ] can be removed, like [User], [System], [Build], [Text], etc...
In some fields there is only the first part of the string:
[Colors]
Red=
Blue=
Yellow=
DarkBlue=
What you have is an ordinary properties file. You can use this example to read the values into map:
try (InputStream input = new FileInputStream("your_file_path")) {
    Properties prop = new Properties();
    prop.load(input);
    // prop.getProperty("User") == "teodor"
} catch (IOException ex) {
    ex.printStackTrace();
}
EDIT:
For a Python solution, refer to the answered question.
You can use configparser to read .ini or .properties files (the format you have).
import configparser
config = configparser.ConfigParser()
config.read('your_file_path')
# config['User'] == {'Date': '10/26/2003', 'Time': '09:01:01 AM'...}
# config['User']['User'] == 'teodor'
# config['System'] == {'Type': 'Absolute', ...}
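Since the question asks for one flat dictionary with the section names dropped, the sections can be merged after parsing. A small sketch, using an inline string instead of a file so it runs as-is: note that ConfigParser lowercases option names by default, so optionxform is overridden here to keep 'Date' rather than 'date', and interpolation is switched off on the assumption that values may contain a literal '%'.

```python
import configparser

# Inline stand-in for the file in the question.
text = """\
[User]
Date=10/26/2003
User=teodor
[Colors]
Red=
"""

config = configparser.ConfigParser(interpolation=None)
config.optionxform = str  # keep key case; the default lowercases option names
config.read_string(text)

# Merge every section into one dict; later sections win on duplicate keys.
flat = {}
for section in config.sections():
    flat.update(config[section])
```

With a real file, config.read('your_file_path') would replace the read_string call.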
Can easily be done in python. Assuming your file is named test.txt.
This will also work for lines with nothing after the = as well as lines with multiple =.
d = {}
with open('test.txt', 'r') as f:
    for line in f:
        line = line.strip()  # Remove any space or newline characters
        parts = line.split('=')  # Split around the `=`
        if len(parts) > 1:
            # Re-join with '=' so values that contain '=' are kept intact
            d[parts[0]] = '='.join(parts[1:])
print(d)
Output:
{
"Date": "10/26/2003",
"Time": "09:01:01 AM",
"User": "teodor",
"UserText": "Max Cor",
"UserTextUnicode": "392039n9dj90j32",
"Type": "Absolute",
"Dnumber": "QS236",
"Software": "1.1.1.2",
"BuildNr": "0923875",
"Source": "LAM",
"Column": "OWKD",
"StageX": "12345",
"Spotter": "2",
"ApertureX": "0.0098743",
"ApertureY": "0.2431899",
"ShiftXYZ": "-4.234809e-002",
"Text": "Here is the Text files",
"DataBaseNumber": "The database number is 918723"
}
I would suggest to do some cleaning to get rid of the [] lines.
After that you can split those lines by the "=" separator and then convert it to a dictionary.
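A sketch of that suggestion, under the assumption that the file content is already in a string; ini_to_dict is a name made up for this example. It skips blank lines and [Section] lines, then splits each remaining line at the first "=" only.

```python
def ini_to_dict(text):
    """Drop [Section] lines and turn every Name=Value line into a dict entry."""
    result = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or (line.startswith("[") and line.endswith("]")):
            continue  # blank line or section header
        key, sep, value = line.partition("=")  # split at the first '=' only
        if sep:
            result[key.strip()] = value.strip()
    return result


example = ini_to_dict("[User]\nDate=10/26/2003\nNote=a=b\n[Colors]\nRed=\n")
```

Lines such as Red= end up with an empty string as their value, and a value that itself contains "=" is kept whole.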

how can i take a specific element from this list in python?

I'm working with the Microsoft Azure face API and I want to get only the glasses response.
Here's my code:
########### Python 3.6 #############
import http.client, urllib.request, urllib.parse, urllib.error, base64, requests, json
###############################################
#### Update or verify the following values. ###
###############################################
# Replace the subscription_key string value with your valid subscription key.
subscription_key = '(MY SUBSCRIPTION KEY)'
# Replace or verify the region.
#
# You must use the same region in your REST API call as you used to obtain your subscription keys.
# For example, if you obtained your subscription keys from the westus region, replace
# "westcentralus" in the URI below with "westus".
#
# NOTE: Free trial subscription keys are generated in the westcentralus region, so if you are using
# a free trial subscription key, you should not need to change this region.
uri_base = 'https://westcentralus.api.cognitive.microsoft.com'
# Request headers.
headers = {
'Content-Type': 'application/json',
'Ocp-Apim-Subscription-Key': subscription_key,
}
# Request parameters.
params = {
'returnFaceAttributes': 'glasses',
}
# Body. The URL of a JPEG image to analyze.
body = {'url': 'https://upload.wikimedia.org/wikipedia/commons/c/c3/RH_Louise_Lillian_Gish.jpg'}
try:
    # Execute the REST API call and get the response.
    response = requests.request('POST', uri_base + '/face/v1.0/detect', json=body, data=None, headers=headers, params=params)
    print('Response:')
    parsed = json.loads(response.text)
    info = (json.dumps(parsed, sort_keys=True, indent=2))
    print(info)
except Exception as e:
    print('Error:')
    print(e)
and it returns a list like this:
[
{
"faceAttributes": {
"glasses": "NoGlasses"
},
"faceId": "0f0a985e-8998-4c01-93b6-8ef4bb565cf6",
"faceRectangle": {
"height": 162,
"left": 177,
"top": 131,
"width": 162
}
}
]
I want just the glasses attribute so it would just return either "Glasses" or "NoGlasses"
Thanks for any help in advance!
I think you're printing the whole response, when really you want to drill down and get the elements inside it. Note that info is the string produced by json.dumps, so index the parsed object instead. Try this:
print(parsed[0]["faceAttributes"]["glasses"])
I'm not sure how the API works, so I don't know what your specified params are actually doing, but this should work on this end.
EDIT: Thank you to @Nuageux for noting that this is indeed an array, and that you have to specify that the first object is the one you want.
I guess that you can get few elements in that list, so you could do this:
info = [
{
"faceAttributes": {
"glasses": "NoGlasses"
},
"faceId": "0f0a985e-8998-4c01-93b6-8ef4bb565cf6",
"faceRectangle": {
"height": 162,
"left": 177,
"top": 131,
"width": 162
}
}
]
for item in info:
    print(item["faceAttributes"]["glasses"])
# prints: NoGlasses
Did you try:
glasses = parsed[0]['faceAttributes']['glasses']
This looks more like a dictionary than a list. Dictionaries are defined using the { key: value } syntax, and can be referenced by the value for their key. In your code, you have faceAttributes as a key that for value contains another dictionary with a key glasses leading to the last value that you want.
Your info object is a list with one element: a dictionary. So in order to get at the values in that dictionary, you'll need to tell the list where the dictionary is (at the head of the list, so info[0]).
So your reference syntax will be:
#If you want to store it in a variable, like glass_var
glass_var = info[0]["faceAttributes"]["glasses"]
#Or if you want to print it directly
print(info[0]["faceAttributes"]["glasses"])
What's going on here? info[0] is the dictionary containing several keys, including faceAttributes,faceId and faceRectangle. faceRectangle and faceAttributes are both dictionaries in themselves with more keys, which you can reference to get their values.
Your printed tree there is showing all the keys and values of your dictionary, so you can reference any part of it using the right keys:
print(info[0]["faceId"]) #prints "0f0a985e-8998-4c01-93b6-8ef4bb565cf6"
print(info[0]["faceRectangle"]["left"]) #prints 177
print(info[0]["faceRectangle"]["width"]) #prints 162
If you have multiple entries in your info list, then you'll have multiple dictionaries, and you can get all the outputs as so:
for entry in info:  # Note: "entry" is just a variable name,
                    # this can be any name you want. Every
                    # iteration of entry is one of the
                    # dictionaries in info.
    print(entry["faceAttributes"]["glasses"])
Edit: I didn't see that info was a list of a dictionary, adapted for that fact.
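One caveat with the snippets above: in the question's code, info is the string returned by json.dumps, so the indexing has to be done on parsed (or on response.json(), which requests also provides). A small sketch that also copes with an empty result list; glasses_values and the shortened response text are made up for this illustration.

```python
import json


def glasses_values(faces):
    """Return the 'glasses' attribute of every detected face (possibly none)."""
    return [face.get("faceAttributes", {}).get("glasses") for face in faces]


# Shortened stand-in for response.text from the question.
response_text = (
    '[{"faceAttributes": {"glasses": "NoGlasses"},'
    ' "faceId": "0f0a985e-8998-4c01-93b6-8ef4bb565cf6"}]'
)
parsed = json.loads(response_text)
values = glasses_values(parsed)
first = values[0] if values else None  # None when no face was detected
```

Using .get() instead of direct indexing means a face entry without faceAttributes yields None rather than raising KeyError.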

python json to csv converting script?

Let me start by stating that I am new to python. I wrote a script that will convert a .json file to csv format. I managed to write a script to do the job, however I don't think that my script will work if the format of the json file was to change. My script assumes that the json file will be in the same format at all times.
<json file example>
{
"Order":
{
"order_id":"8251662",
"order_date":"2012-08-20 13:17:37",
"order_date_shipped":"0000-00-00 00:00:00",
"order_status":"fraudreview",
"order_ship_firstname":"pam",
"order_ship_lastname":"Gregorio",
"order_ship_address1":"1533 E. Dexter St",
"order_ship_address2":"",
"order_ship_city":"Covina",
"order_ship_state":"CA",
"order_ship_zip":"91746",
"order_ship_country":"US United States",
"order_ship_phone":"6268936923",
"order_ship_email":"pgregorio#brighton.com",
"order_bill_firstname":"pam",
"order_bill_lastname":"Gregorio",
"order_bill_address1":"1533 E. Dexter St",
"order_bill_address2":"",
"order_bill_city":"Covina",
"order_bill_state":"CA",
"order_bill_zip":"91746",
"order_bill_country":"US United States",
"order_bill_phone":"6268936923",
"order_bill_email":"pgregorio#brighton.com",
"order_gift_message":"",
"order_giftwrap":"0",
"order_gift_charge":"0",
"order_shipping":"Standard (Within 5-10 Business Days)",
"order_tax_charge":"62.83",
"order_tax_shipping":"0",
"order_tax_rate":"0.0875",
"order_shipping_charge":"7.5",
"order_total":"788.33",
"order_item_count":"12",
"order_tracking":"",
"order_carrier":"1"
},
"Items":
[
{
"item_id":"25379",
"item_date_shipped":"",
"item_code":"17345-J3553-J35532",
"item_quantity":"2","item_taxable":"YES",
"item_unit_price":"32","item_shipping":"0.67",
"item_addcharge_price":"0",
"item_description":" ABC Slide Bracelet: : Size: OS: Silver Sku: J35532",
"item_quantity_returned":"0",
"item_quantity_shipped":"0",
"item_quantity_canceled":"0",
"item_status":"pending",
"item_product_id":"17345",
"item_product_kit_id":"0",
"item_product_sku":"J35532",
"item_product_barcode":"881934310775",
"item_tracking":"",
"item_carrier":"0",
"item_source_orderid":""
},
{
"item_id":"25382",
"item_date_shipped":"",
"item_code":"17608-J3809-J3809C",
"item_quantity":"1",
"item_taxable":"YES",
"item_unit_price":"22",
"item_shipping":"0.23",
"item_addcharge_price":"0",
"item_description":" \"ABC Starter Bracelet 7 1\/4\"\"\": : Size: OS: Silver Sku: J3809C",
"item_quantity_returned":"0",
"item_quantity_shipped":"0",
"item_quantity_canceled":"0",
"item_status":"pending",
"item_product_id":"17608",
"item_product_kit_id":"0",
"item_product_sku":"J3809C",
"item_product_barcode":"881934594175",
"item_tracking":"",
"item_carrier":"0",
"item_source_orderid":""
},
{
"item_id":"25385",
"item_date_shipped":"",
"item_code":"17687-J9200-J92000",
"item_quantity":"2",
"item_taxable":"YES",
"item_unit_price":"12",
"item_shipping":"0.25",
"item_addcharge_price":"0",
"item_description":" ABC Cathedral Bead: : Size: OS: Silver Sku: J92000",
"item_quantity_returned":"0",
"item_quantity_shipped":"0",
"item_quantity_canceled":"0",
"item_status":"pending",
"item_product_id":"17687",
"item_product_kit_id":"0",
"item_product_sku":"J92000",
"item_product_barcode":"881934602832",
"item_tracking":"",
"item_carrier":"0",
"item_source_orderid":""
},
{
"item_id":"25388",
"item_date_shipped":"",
"item_code":"17766-J9240-J92402",
"item_quantity":"2",
"item_taxable":"YES",
"item_unit_price":"22",
"item_shipping":"0.46",
"item_addcharge_price":"0",
"item_description":" ABC Ice Diva Bead: : Size: OS: Silver Sku: J92402",
"item_quantity_returned":"0",
"item_quantity_shipped":"0",
"item_quantity_canceled":"0",
"item_status":"pending",
"item_product_id":"17766",
"item_product_kit_id":"0",
"item_product_sku":"J92402",
"item_product_barcode":"881934655838",
"item_tracking":"",
"item_carrier":"0",
"item_source_orderid":""
}
],
"FraudReasons":
[
{
"order_id":"11957",
"fraud_reason":"order total exceeds max amount"
},
{
"order_id":"11957",
"fraud_reason":"order exceeds max item count"
}
]
}
My script currently works fine with this JSON file, but it won't work if there is only one item or one fraud reason. Here is the code for my script.
<script code>
#!/usr/bin/python
import simplejson as json
import optparse
import pycurl
import sys
import csv
json_data = open(file)
data = json.load(json_data)
json_data.close()
csv_file = '/tmp/' + str(options.orderId) + '.csv'
orders = data['Order']
items = data['Items']
frauds = data['FraudReasons']
o = csv.writer(open(csv_file, 'w'), lineterminator=',')
o.writerow([orders['order_id'],orders['order_date'],orders['order_date_shipped'],orders['order_status'],orders['order_ship_firstname'],orders['order_ship_lastname'],orders['order_ship_address1'],orders['order_ship_address2'],orders['order_ship_city'],orders['order_ship_state'],orders['order_ship_zip'],orders['order_ship_country'],orders['order_ship_phone'],orders['order_ship_email'],orders['order_bill_firstname'],orders['order_bill_lastname'],orders['order_bill_address1'],orders['order_bill_address2'],orders['order_bill_city'],orders['order_bill_state'],orders['order_bill_zip'],orders['order_bill_country'],orders['order_bill_phone'],orders['order_bill_email'],orders['order_gift_message'],orders['order_giftwrap'],orders['order_gift_charge'],orders['order_shipping'],orders['order_tax_charge'],orders['order_tax_shipping'],orders['order_tax_rate'],orders['order_shipping_charge'],orders['order_total'],orders['order_item_count'],orders['order_tracking'],orders['order_carrier']])
for item in items:
    o.writerow([item['item_id'],item['item_date_shipped'],item['item_code'],item['item_quantity'],item['item_taxable'],item['item_unit_price'],item['item_shipping'],item['item_addcharge_price'],item['item_description'],item['item_quantity_returned'],item['item_quantity_shipped'],item['item_quantity_canceled'],item['item_status'],item['item_product_id'],item['item_product_kit_id'],item['item_product_sku'],item['item_product_barcode'],item['item_tracking'],item['item_carrier'],item['item_source_orderid']])
for fraud in frauds:
    o.writerow([fraud['fraud_reason']],)
I also have not been able to figure out how not to use the labels. I hope someone can help me with this.
Thanks in advance.
You may want to use csv.DictWriter:
import csv
import json

# It's considered best to stash the main logic of your script
# in a main() function like this.
def main(filename, options):
    with open(filename) as fi:
        data = json.load(fi)
    csv_file = '/tmp/' + str(options.orderId) + '.csv'
    orders = data['Order']
    items = data['Items']
    frauds = data['FraudReasons']
    # Here's one way to keep this maintainable if the JSON
    # format changes, and you don't care too much about the
    # order of the fields...
    orders_fields = sorted(orders.keys())
    item_fields = sorted(items[0].keys()) if items else ()
    fraud_fields = sorted(frauds[0].keys()) if frauds else ()
    csv_options = dict(lineterminator=',')
    with open(csv_file, 'w') as fo:
        o = csv.DictWriter(fo, orders_fields, **csv_options)
        o.writeheader()
        o.writerow(orders)
        fo.write('\n')  # Optional, if you want to keep them separated.
        o = csv.DictWriter(fo, item_fields, **csv_options)
        o.writeheader()
        o.writerows(items)
        fo.write('\n')  # Optional, if you want to keep them separated.
        o = csv.DictWriter(fo, fraud_fields, **csv_options)
        o.writeheader()
        o.writerows(frauds)

# If this script is run from the command line, just run
# main(). Here's the place to use `optparse`.
if __name__ == '__main__':
    main(...)  # You'll need to fill in the main() arguments...
If you need to specify the order of fields, assign them to a tuple like this:
orders_fields = (
'order_id',
'order_date',
'order_date_shipped',
# ... etc.
)
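The question says the script breaks when there is only one item or one fraud reason. If that is because the JSON then holds a single object instead of a list (an assumption; the question does not show such a file), the value can be normalised before writing. The sketch below is Python 3, and as_list, write_section and the sample data are hypothetical names and values for illustration.

```python
import csv
import io


def as_list(value):
    """Wrap a lone dict in a list so callers can always iterate."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def write_section(out, rows):
    """Write one CSV section, with a header built from the rows' own keys."""
    rows = as_list(rows)
    if not rows:
        return
    fields = []
    for row in rows:
        for key in row:
            if key not in fields:
                fields.append(key)  # keep first-seen order
    writer = csv.DictWriter(out, fieldnames=fields, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


data = {
    "Items": {"item_id": "25379", "item_status": "pending"},  # single object, not a list
    "FraudReasons": [
        {"order_id": "11957", "fraud_reason": "order total exceeds max amount"},
        {"order_id": "11957", "fraud_reason": "order exceeds max item count"},
    ],
}
buffer = io.StringIO()
write_section(buffer, data["Items"])
write_section(buffer, data["FraudReasons"])
output = buffer.getvalue()
```

Building the header from the union of keys, with restval filling the gaps, also keeps the writer working when some rows lack a field.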
You should ask the json-generated object (data) for the names of the fields. To retain the input order, tell json to use collections.OrderedDict instead of plain dict (requires python 2.7):
import json
from collections import OrderedDict as ordereddict
data = json.load(open('mydata.json'), object_pairs_hook=ordereddict)
orders = data['Order']
print orders.keys() # Will print the keys in the order they were read
You can then use orders.keys() instead of your hard-coded list, either with writerow or (simpler) with csv.DictWriter.
Note that this uses the default json, not simplejson, and requires python 2.7 for the object_pairs_hook argument and the OrderedDict type.
Edit: Yeah, I see from the comments that you're stuck with 2.4. You can download an ordereddict from PyPi, and you can extend the JSONDecoder class and pass it with the cls argument (see here), instead of object_pairs_hook, but that's uglier and more work...
