Open a JSON files and edit structure - python

I have produced a couple of json files after scraping a few elements. The structure for each file is as follows:
us.json
{'Pres': 'Biden', 'Vice': 'Harris', 'Secretary': 'Blinken'}
uk.json
{'1st Min': 'Johnson', 'Queen':'Elizabeth', 'Prince': 'Charles'}
I'd like to know how I could edit the structure of each dictionary inside the json file to get an output as it follows:
[
{"title": "Pres",
"name": "Biden"}
,
{"title": "Vice",
"name": "Harris"}
,
{"title": "Secretary",
"name": "Blinken"}
]
As far as I am able to think how to do it (I'm a beginner, studying only since a few weeks) I need first to run a loop to open each file, then I should generate a list of dictionaries and finally modify the dictionary to change the structure. This is what I got NOT WORKING as it overrides always with the same keys.
import os
import json
list_of_dicts = []
for filename in os.listdir("DOCS/Countries Data"):
with open(os.path.join("DOCS/Countries Data", filename), 'r', encoding='utf-8') as f:
text = f.read()
country_json = json.loads(text)
list_of_dicts.append(country_json)
for country in list_of_dicts:
newdict = country
lastdict = {}
for key in newdict:
lastdict = {'Title': key}
for value in newdict.values():
lastdict['Name'] = value
print(lastdict)
Extra bonus if you could also show me how to generate an ID mumber for each entry. Thank you very much

This look like task for list comprehension, I would do it following way
import json
us = '{"Pres": "Biden", "Vice": "Harris", "Secretary": "Blinken"}'
data = json.loads(us)
us2 = [{"title":k,"name":v} for k,v in data.items()]
us2json = json.dumps(us2)
print(us2json)
output
[{"title": "Pres", "name": "Biden"}, {"title": "Vice", "name": "Harris"}, {"title": "Secretary", "name": "Blinken"}]
data is dict, .items() provide key-value pairs, which I unpack into k and v (see tuple unpacking).

You can do this easily by writing a simple function like below
import uuid
def format_dict(data: dict):
return [dict(title=title, name=name, id=str(uuid.uuid4())) for title, name in data.items()]
where you can split the items as different objects and add a identifier for each using uuid.
Full code can be modified like this
import uuid
import os
import json
def format_dict(data: dict):
return [dict(title=title, name=name, id=str(uuid.uuid4())) for title, name in data.items()]
list_of_dicts = []
for filename in os.listdir("DOCS/Countries Data"):
with open(os.path.join("DOCS/Countries Data", filename), 'r', encoding='utf-8') as f:
country_json = json.load(f)
list_of_dicts.append(format_dict(country_json))
# list_of_dicts contains all file contents

Related

python get data json value max

How can I extract the T3 Period, Year and maximum value?
file.json
[
{"Fecha":"2022-08-01T00:00:00.000+02:00", "T3_TipoDato":"Avance", "T3_Periodo":"M08", "Anyo":2022, "value":10.4},
{"Fecha":"2022-07-01T00:00:00.000+02:00", "T3_TipoDato":"Definitivo", "T3_Periodo":"M07", "Anyo":2022, "value":10.8},
{"Fecha":"2022-06-01T00:00:00.000+02:00", "T3_TipoDato":"Definitivo", "T3_Periodo":"M06", "Anyo":2022, "value":10.2}
]
My code:
import json
with open("file.json") as f:
distros_dict = json.load(f)
print (distros_dict)
that is my proposition.
Load data from a file to a list.
Loop thru every dict in a list to edit it.
(At my example I, deleted two keys from every dict in list.)
import json
distros_dict = []
with open(f'file.json', "r", encoding='utf-8') as f:
distros_dict.extend(json.load(f))
for item in distros_dict:
item.pop('Fecha')
item.pop('T3_TipoDato')
distros_dict = sorted(distros_dict, key = lambda i: i['value'], reverse=True)[0]
Try this:
from json import load
with open("file.json") as f:
dictionary_max = max(load(f), key=lambda x: x["value"])
result = {
"T3_Periodo": dictionary_max["T3_Periodo"],
"Anyo": dictionary_max["Anyo"],
"value": dictionary_max["value"],
}
print(result)
output:
{'T3_Periodo': 'M07', 'Anyo': 2022, 'value': 10.8}

json to dictionary in python

This is my json file input.
{"Report":{"id":101,"type":"typeA","Replist":[{"rptid":"r001","subrpt":{"subid":74,"subname":"name1","subval":113},"RelsubList":[{"Relid":8,"Relsubdetails":{"Rel_subname":"name8","Rel_Subval":65}},{"Relid":5,"Relsubdetails":{"Rel_subname":"name5","Rel_Subval":40}}],"fldA":30,"fldB":23}]}}
...
I am writing python program to convert the input into the below format in my dictionary.
I am new to python.
Expected output:
out: {"id": "101", "type": "typeA", "rptid": "r001", "subrpt_subid": "74", "subrpt_subname": "name1", "subrpt_subval":"113","Relid":"8","Rel_subname":"name8","Rel_Subval":"65","Relid":"5","Rel_subname":"name5","Rel_Subval":"40","fldA":"30","fldB":"23"
I used the following logic to convert the output till subrpt.
Current output:
out: {'id': '101', 'type': 'typeA', 'rptid': 'r001', 'subrpt_subid': '74', 'subrpt_subname': 'name1', 'subrpt_subval': '113'}
But I am struggling to get the logic of RelsubList(it looks like it has both list and dictionary[{}] ).
please help me to get the logic for the same.
import json
list1 = []
dict1 = {}
dict2 = {}
data_file = "samp1.json"
file = open(data_file)
for line in file:
json_line = json.loads(line)
json_line = json_line["Report"]
dict1["id"]=str(json_line["id"])
dict1["type"] = str(json_line["type"])
json_line = json_line["Replist"]
dict1["rptid"]= str(json_line[0]["rptid"])
dict1["subrpt_subid"] = str(json_line[0]["subrpt"]["subid"])
dict1["subrpt_subname"] = str(json_line[0]["subrpt"]["subname"])
dict1["subrpt_subval"] = str(json_line[0]["subrpt"]["subval"])
print("out:", dict1)
Some of your logic is confusing to me, i.e. why are you doing json.loads(line) in every loop?
Anyway, the following should get you the logic for RealsubList:
import json
f = open("data.json")
data = json.load(f)
for line in data:
relsublist = data["Report"]["Replist"][0]["RelsubList"]
print(relsublist)
Results in:
[{'Relid': 8, 'Relsubdetails': {'Rel_subname': 'name8', 'Rel_Subval': 65}}, {'Relid': 5, 'Relsubdetails': {'Rel_subname': 'name5', 'Rel_Subval': 40}}]
The reason for the [0] index after ["Replist"] is Replist contains an array of nested dictionaries, so you need to call it out by index. In this case its only a single array, so it would be 0

Python : Normalize Json response (array)

I am new to JSON and Python,I am trying to achieve below
Need to parse below JSON
{
"id": "12345abc",
"codes": [
"BSVN1FKW3JKKNNMN",
"HJYYUKJJL999OJR",
"DFTTHJJJJ0099JUU",
"FGUUKHKJHJGJJYGJ"
],
"ctr": {
"source": "xyz",
"user_id": "1234"
}
}
Expected output:Normalized on "codes" value
ID~CODES~USER_ID
12345abc~BSVN1FKW3JKKNNMN~1234
12345abc~HJYYUKJJL999OJR~1234
12345abc~DFTTHJJJJ0099JUU~1234
12345abc~FGUUKHKJHJGJJYGJ~1234
Started with below ,but need help to get to my desired output.
The "codes" block can have n number of values separated by comma.
The below code is throwing an error "TypeError: string indices must be integers"
#!/usr/bin/python
import os
import json
import csv
f = open('rspns.csv','w')
writer = csv.writer(f,delimiter = '~')
headers = [‘ID’,’CODES’,’USER_ID’]
default = ''
writer.writerow(headers)
string = open('sample.json').read().decode('utf-8')
json_obj = json.loads(string)
#print json_obj['id']
#print json_obj['codes']
#print json_obj['codes'][0]
#print json_obj['codes'][1]
#print json_obj['codes’][2]
#print json_obj['codes’][3]
#print json_obj['ctr’][‘user_id']
for keyword in json_obj:
row = []
row.append(str(keyword['id']))
row.append(str(keyword['codes']))
row.append(str(keyword['ctr’][‘user_id']))
writer.writerow(row)
If your json_obj looks exactly like that , that is it is a dictionary, then the issue is that when you do -
for keyword in json_obj:
You are iterating over keys in json_obj, then if you try to access ['id'] for that key it should error out saying string indices must be integers .
You should first get the id and user_id before looping and then loop over json_obj['codes'] and then add the previously computed id and user_id along with the current value from codes list to the writer csv as a row.
Example -
import json
import csv
string = open('sample.json').read().decode('utf-8')
json_obj = json.loads(string)
with open('rspns.csv','w') as f:
writer = csv.writer(f,delimiter = '~')
headers = ['ID','CODES','USER_ID']
writer.writerow(headers)
id = json_obj['id']
user_id = json_obj['ctr']['user_id']
for code in json_obj['codes']:
writer.writerow([id,code,user_id])
You don't want to iterate through json_obj as that is a dictionary and iterating through will get the keys. The TypeError is caused by trying to index into the keys ('id', 'code', and 'ctr') -- which are strings -- as if they were a dictionary.
Instead, you want a separate row for each code in json_obj['codes'] and to use the json_obj dictionary for your lookups:
for code in json_obj['codes']:
row = []
row.append(json_obj['id'])
row.append(code)
row.append(json_obj['ctr’][‘user_id'])
writer.writerow(row)

Ordering with default dict append

I have a Python code, see below, which takes a JSON file in the structure:
{
"name":"Winking Entertainment",
"imports":"Translink Capital"
},
{
"name":"Wochacha",
"imports":"Sequoia Capital"
},
{
"name":"Wuhan Kindstar Diagnostics",
"imports":"Baird Venture Partners"
},
And aggregates repeat values in "imports" and turns the matching strings into a single array for that entry. (see snippet below)
import json
from collections import defaultdict
def map_names_to_imports(raw_data):
name_to_imports = defaultdict(list)
for row in raw_data:
name_to_imports[row['imports']].append(row['name'])
return name_to_imports
def reformat(name_to_imports):
output = []
for name, imports in name_to_imports.items():
new_dict = {
'name': name,
'imports': list(set(imports))
}
output.append(new_dict)
return output
def run(raw_data):
name_to_imports = map_names_to_imports(raw_data)
output = reformat(name_to_imports)
with open('clean-data2.json','wb') as f:
f.write(json.dumps(output))
if __name__ == '__main__':
raw_data = json.load(open('bricinvestors.json'))
run(raw_data)
The issue I am having is my Json file is not coming out the right way.
For some reason, name and imports are getting reversed. So my output looks like:
{"imports": ["SinoHub"], "name": "Iroquois Capital"}, {"imports": ["Qunar.com", "Lashou.com"], "name": "Tenaya Capital"}
In fact, I want to keep the {"name": "string", "imports": "string"} format -- and not the other way around.
What should I do?
Thanks.
If you're using Python 2.7+, you could use collections.OrderedDict as your input to json.loads(), instead of the standard Python dict. The standard library dict class doesn't guarantee the ordering of keys.
Building on dano's answer, you could use the OrderedDict.setdefault method instead of using a defaultdict:
import json
import collections
OrderedDict = collections.OrderedDict
def map_names_to_imports(raw_data):
name_to_imports = OrderedDict()
for row in raw_data:
name_to_imports.setdefault(row['imports'], []).append(row['name'])
return name_to_imports
def reformat(name_to_imports):
output = []
for name, imports in name_to_imports.items():
new_dict = OrderedDict([('name', name),
('imports', list(set(imports)))])
output.append(new_dict)
return output
def run(raw_data):
name_to_imports = map_names_to_imports(raw_data)
output = reformat(name_to_imports)
with open('clean-data2.json', 'wb') as f:
f.write(json.dumps(output))
if __name__ == '__main__':
raw_data = json.load(open('bricinvestors.json'),
object_pairs_hook=OrderedDict)
run(raw_data)
Final version, which is based in large part on #unutbu's answer.
import json
import collections
OrderedDict = collections.OrderedDict
def map_names_to_imports(raw_data):
name_to_imports = OrderedDict()
for row in raw_data:
name_to_imports.setdefault(row['imports'], []).append(row['name'])
return name_to_imports
def reformat(name_to_imports):
the_output = []
for name, imports in name_to_imports.items():
new_dict = OrderedDict([('name', name),
('imports', list(set(imports)))])
the_output.append(new_dict)
return the_output
def run(raw_data):
name_to_imports = map_names_to_imports(raw_data)
the_output = reformat(name_to_imports)
with open('data/clean-data2.json', 'w+', encoding='utf8') as f:
f.write(json.dumps(the_output))
if __name__ == '__main__':
raw_data = json.load(open('data/bricsinvestorsfirst.json'), object_pairs_hook=OrderedDict)
run(raw_data)

python json to csv converting script?

Let me start by stating that I am new to python. I wrote a script that will convert a .json file to csv format. I managed to write a script to do the job, however I don't think that my script will work if the format of the json file was to change. My script assumes that the json file will be in the same format at all times.
<json file example>
{
"Order":
{
"order_id":"8251662",
"order_date":"2012-08-20 13:17:37",
"order_date_shipped":"0000-00-00 00:00:00",
"order_status":"fraudreview",
"order_ship_firstname":"pam",
"order_ship_lastname":"Gregorio",
"order_ship_address1":"1533 E. Dexter St",
"order_ship_address2":"",
"order_ship_city":"Covina",
"order_ship_state":"CA",
"order_ship_zip":"91746",
"order_ship_country":"US United States",
"order_ship_phone":"6268936923",
"order_ship_email":"pgregorio#brighton.com",
"order_bill_firstname":"pam",
"order_bill_lastname":"Gregorio",
"order_bill_address1":"1533 E. Dexter St",
"order_bill_address2":"",
"order_bill_city":"Covina",
"order_bill_state":"CA",
"order_bill_zip":"91746",
"order_bill_country":"US United States",
"order_bill_phone":"6268936923",
"order_bill_email":"pgregorio#brighton.com",
"order_gift_message":"",
"order_giftwrap":"0",
"order_gift_charge":"0",
"order_shipping":"Standard (Within 5-10 Business Days)",
"order_tax_charge":"62.83",
"order_tax_shipping":"0",
"order_tax_rate":"0.0875",
"order_shipping_charge":"7.5",
"order_total":"788.33",
"order_item_count":"12",
"order_tracking":"",
"order_carrier":"1"
},
"Items":
[
{
"item_id":"25379",
"item_date_shipped":"",
"item_code":"17345-J3553-J35532",
"item_quantity":"2","item_taxable":"YES",
"item_unit_price":"32","item_shipping":"0.67",
"item_addcharge_price":"0",
"item_description":" ABC Slide Bracelet: : Size: OS: Silver Sku: J35532",
"item_quantity_returned":"0",
"item_quantity_shipped":"0",
"item_quantity_canceled":"0",
"item_status":"pending",
"item_product_id":"17345",
"item_product_kit_id":"0",
"item_product_sku":"J35532",
"item_product_barcode":"881934310775",
"item_tracking":"",
"item_carrier":"0",
"item_source_orderid":""
},
{
"item_id":"25382",
"item_date_shipped":"",
"item_code":"17608-J3809-J3809C",
"item_quantity":"1",
"item_taxable":"YES",
"item_unit_price":"22",
"item_shipping":"0.23",
"item_addcharge_price":"0",
"item_description":" \"ABC Starter Bracelet 7 1\/4\"\"\": : Size: OS: Silver Sku: J3809C",
"item_quantity_returned":"0",
"item_quantity_shipped":"0",
"item_quantity_canceled":"0",
"item_status":"pending",
"item_product_id":"17608",
"item_product_kit_id":"0",
"item_product_sku":"J3809C",
"item_product_barcode":"881934594175",
"item_tracking":"",
"item_carrier":"0",
"item_source_orderid":""
},
{
"item_id":"25385",
"item_date_shipped":"",
"item_code":"17687-J9200-J92000",
"item_quantity":"2",
"item_taxable":"YES",
"item_unit_price":"12",
"item_shipping":"0.25",
"item_addcharge_price":"0",
"item_description":" ABC Cathedral Bead: : Size: OS: Silver Sku: J92000",
"item_quantity_returned":"0",
"item_quantity_shipped":"0",
"item_quantity_canceled":"0",
"item_status":"pending",
"item_product_id":"17687",
"item_product_kit_id":"0",
"item_product_sku":"J92000",
"item_product_barcode":"881934602832",
"item_tracking":"",
"item_carrier":"0",
"item_source_orderid":""
},
{
"item_id":"25388",
"item_date_shipped":"",
"item_code":"17766-J9240-J92402",
"item_quantity":"2",
"item_taxable":"YES",
"item_unit_price":"22",
"item_shipping":"0.46",
"item_addcharge_price":"0",
"item_description":" ABC Ice Diva Bead: : Size: OS: Silver Sku: J92402",
"item_quantity_returned":"0",
"item_quantity_shipped":"0",
"item_quantity_canceled":"0",
"item_status":"pending",
"item_product_id":"17766",
"item_product_kit_id":"0",
"item_product_sku":"J92402",
"item_product_barcode":"881934655838",
"item_tracking":"",
"item_carrier":"0",
"item_source_orderid":""
},
],
"FraudReasons":
[
{
"order_id":"11957",
"fraud_reason":"order total exceeds max amount"
},
{
"order_id":"11957",
"fraud_reason":"order exceeds max item count"
}
]
}
My script currently works fine with this json file but It wont work if there is only one item or one fraudreason. Here is the code to my script.
<script code>
#!/usr/bin/python
import simplejson as json
import optparse
import pycurl
import sys
import csv
json_data = open(file)
data = json.load(json_data)
json_data.close()
csv_file = '/tmp/' + str(options.orderId) + '.csv'
orders = data['Order']
items = data['Items']
frauds = data['FraudReasons']
o = csv.writer(open(csv_file, 'w'), lineterminator=',')
o.writerow([orders['order_id'],orders['order_date'],orders['order_date_shipped'],orders['order_status'],orders['order_ship_firstname'],orders['order_ship_lastname'],orders['order_ship_address1'],orders['order_ship_address2'],orders['order_ship_city'],orders['order_ship_state'],orders['order_ship_zip'],orders['order_ship_country'],orders['order_ship_phone'],orders['order_ship_email'],orders['order_bill_firstname'],orders['order_bill_lastname'],orders['order_bill_address1'],orders['order_bill_address2'],orders['order_bill_city'],orders['order_bill_state'],orders['order_bill_zip'],orders['order_bill_country'],orders['order_bill_phone'],orders['order_bill_email'],orders['order_gift_message'],orders['order_giftwrap'],orders['order_gift_charge'],orders['order_shipping'],orders['order_tax_charge'],orders['order_tax_shipping'],orders['order_tax_rate'],orders['order_shipping_charge'],orders['order_total'],orders['order_item_count'],orders['order_tracking'],orders['order_carrier']])
for item in items:
o.writerow([item['item_id'],item['item_date_shipped'],item['item_code'],item['item_quantity'],item['item_taxable'],item['item_unit_price'],item['item_shipping'],item['item_addcharge_price'],item['item_description'],item['item_quantity_returned'],item['item_quantity_shipped'],item['item_quantity_canceled'],item['item_status'],item['item_product_id'],item['item_product_kit_id'],item['item_product_sku'],item['item_product_barcode'],item['item_tracking'],item['item_carrier'],item['item_source_orderid']])
for fraud in frauds:
o.writerow([fraud['fraud_reason']],)
I also have not been able to figure out how not to use the labels I hope someone can help me with this
thanks in advance.
You may want to use csv.DictWriter:
# It's considered best to stash the main logic of your script
# in a main() function like this.
def main(filename, options):
with open(filename) as fi:
data = json.load(fi)
csv_file = '/tmp/' + str(options.orderId) + '.csv'
order = data['Order']
items = data['Items']
frauds = data['FraudReasons']
# Here's one way to keep this maintainable if the JSON
# format changes, and you don't care too much about the
# order of the fields...
orders_fields = sorted(orders.keys())
item_fields = sorted(items[0].keys()) if items else ()
fraud_fields = sorted(fraud[0].keys()) if fraud else ()
csv_options = dict(lineterminator=',')
with open(csv_file, 'w') as fo:
o = csv.DictWriter(fo, order_fields, **csv_options)
o.writeheader()
o.writerow(orders)
fo.write('\n') # Optional, if you want to keep them separated.
o = csv.DictWriter(fo, item_fields, **csv_options)
o.writeheader()
o.writerows(items)
fo.write('\n') # Optional, if you want to keep them separated.
o = csv.DictWriter(fo, fraud_fields, **csv_options)
o.writeheader()
o.writerows(frauds)
# If this script is run from the command line, just run
# main(). Here's the place to use `optparse`.
if __name__ == '__main__':
main(...) # You'll need to fill in the main() arguments...
If you need to specify the order of fields, assign them to a tuple like this:
orders_fields = (
'order_id',
'order_date',
'order_date_shipped',
# ... etc.
)
You should ask the json-generated object (data) for the names of the fields. To retain the input order, tell json to use collections.OrderedDict instead of plain dict (requires python 2.7):
import json
from collections import OrderedDict as ordereddict
data = json.loads(open('mydata.json', object_pairs_hook=ordereddict)
orders = data['Order']
print orders.keys() # Will print the keys in the order they were read
You can then use orders.keys() instead of your hard-coded list, either with writerow or (simpler) with csv.DictWriter.
Note that this uses the default json, not simplejson, and requires python 2.7 for the ordered_pairs_hook argument and the OrderedDict type.
Edit: Yeah, I see from the comments that you're stuck with 2.4. You can download an ordereddict from PyPi, and you can extend the JSONDecoder class and pass it with the cls argument (see here), instead of object_pairs_hook, but that's uglier and more work...

Categories

Resources