I am new to JSON and Python, and I am trying to achieve the following.
I need to parse the JSON below:
{
"id": "12345abc",
"codes": [
"BSVN1FKW3JKKNNMN",
"HJYYUKJJL999OJR",
"DFTTHJJJJ0099JUU",
"FGUUKHKJHJGJJYGJ"
],
"ctr": {
"source": "xyz",
"user_id": "1234"
}
}
Expected output, normalized on the "codes" values:
ID~CODES~USER_ID
12345abc~BSVN1FKW3JKKNNMN~1234
12345abc~HJYYUKJJL999OJR~1234
12345abc~DFTTHJJJJ0099JUU~1234
12345abc~FGUUKHKJHJGJJYGJ~1234
I started with the code below, but I need help getting to my desired output.
The "codes" array can hold any number of comma-separated values.
The code below throws the error "TypeError: string indices must be integers".
#!/usr/bin/python
import os
import json
import csv
f = open('rspns.csv','w')
writer = csv.writer(f,delimiter = '~')
headers = ['ID','CODES','USER_ID']
default = ''
writer.writerow(headers)
string = open('sample.json').read().decode('utf-8')
json_obj = json.loads(string)
#print json_obj['id']
#print json_obj['codes']
#print json_obj['codes'][0]
#print json_obj['codes'][1]
#print json_obj['codes'][2]
#print json_obj['codes'][3]
#print json_obj['ctr']['user_id']
for keyword in json_obj:
    row = []
    row.append(str(keyword['id']))
    row.append(str(keyword['codes']))
    row.append(str(keyword['ctr']['user_id']))
    writer.writerow(row)
If your json_obj looks exactly like that, that is, it is a dictionary, then the issue is what happens when you do:
for keyword in json_obj:
You are iterating over the keys of json_obj, and each key is a string. When you then try to access ['id'] on such a key, it errors out with "string indices must be integers".
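To see the failure in isolation, here is a minimal Python 3 sketch. The dictionary literal is a shortened copy of the sample JSON from the question, and the names keys and error are only there for illustration:

```python
json_obj = {"id": "12345abc", "codes": ["BSVN1FKW3JKKNNMN"], "ctr": {"user_id": "1234"}}

# Iterating over a dict yields its keys, which are plain strings here.
keys = [keyword for keyword in json_obj]

# Indexing a string with another string raises TypeError.
try:
    keys[0]["id"]
    error = None
except TypeError as exc:
    error = exc
```

After this runs, keys holds the three key names and error holds the TypeError that the question reports.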
You should first get the id and user_id before the loop, then loop over json_obj['codes'] and write each code, together with the previously fetched id and user_id, as a row through the csv writer.
Example -
import json
import csv
string = open('sample.json').read().decode('utf-8')
json_obj = json.loads(string)
with open('rspns.csv','w') as f:
    writer = csv.writer(f,delimiter = '~')
    headers = ['ID','CODES','USER_ID']
    writer.writerow(headers)
    id = json_obj['id']
    user_id = json_obj['ctr']['user_id']
    for code in json_obj['codes']:
        writer.writerow([id,code,user_id])
You don't want to iterate through json_obj, as that is a dictionary and iterating through it will give you the keys. The TypeError is caused by trying to index into the keys ('id', 'codes', and 'ctr'), which are strings, as if they were dictionaries.
Instead, you want a separate row for each code in json_obj['codes'] and to use the json_obj dictionary for your lookups:
for code in json_obj['codes']:
    row = []
    row.append(json_obj['id'])
    row.append(code)
    row.append(json_obj['ctr']['user_id'])
    writer.writerow(row)
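For anyone on Python 3, the same idea can be sketched as below. This is only an illustration under a few assumptions: the helper name codes_to_rows is made up, the JSON is a shortened inline copy of the sample, and an in-memory buffer stands in for rspns.csv so the snippet runs without any files.

```python
import csv
import io
import json

def codes_to_rows(json_obj):
    """Return one [id, code, user_id] row per entry in json_obj['codes']."""
    record_id = json_obj["id"]
    user_id = json_obj["ctr"]["user_id"]
    return [[record_id, code, user_id] for code in json_obj["codes"]]

sample = json.loads('''{
  "id": "12345abc",
  "codes": ["BSVN1FKW3JKKNNMN", "HJYYUKJJL999OJR"],
  "ctr": {"source": "xyz", "user_id": "1234"}
}''')

buffer = io.StringIO()  # stands in for open('rspns.csv', 'w', newline='')
writer = csv.writer(buffer, delimiter="~", lineterminator="\n")
writer.writerow(["ID", "CODES", "USER_ID"])
writer.writerows(codes_to_rows(sample))
output = buffer.getvalue()
```

Keeping the row building in its own function makes it easy to check the rows before anything is written to disk.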
*New to Programming
Question: I need to use the below "Data" (two rows as arrays) queried from sql and use it to create the message structure below.
data from sql using fetchall()
Data = [[100,1,4,5],[101,1,4,6]]
##expected message structure
message = {
"name":"Tom",
"Job":"IT",
"info": [
{
"id_1":"100",
"id_2":"1",
"id_3":"4",
"id_4":"5"
},
{
"id_1":"101",
"id_2":"1",
"id_3":"4",
"id_4":"6"
},
]
}
I tried to create the method below to iterate over the rows and fill in the values. This was just a starting point, but it did not work either:
def create_message(data)
for row in data:
{
"id_1":str(data[0][0],
"id_2":str(data[0][1],
"id_3":str(data[0][2],
"id_4":str(data[0][3],
}
Latest Code
def create_info(data):
    info = []
    for row in data:
        temp_dict = {"id_1_tom":"","id_2_hell":"","id_3_trip":"","id_4_clap":""}
        for i in range(0,1):
            temp_dict["id_1_tom"] = str(row[i])
            temp_dict["id_2_hell"] = str(row[i+1])
            temp_dict["id_3_trip"] = str(row[i+2])
            temp_dict["id_4_clap"] = str(row[i+3])
        info.append(temp_dict)
    return info
Edit: Updated answer based on updates to the question and comment by original poster.
This function might work for the example you've given to get the desired output, based on the attempt you've provided:
def create_info(data):
    info = []
    for row in data:
        temp_dict = {}
        temp_dict['id_1_tom'] = str(row[0])
        temp_dict['id_2_hell'] = str(row[1])
        temp_dict['id_3_trip'] = str(row[2])
        temp_dict['id_4_clap'] = str(row[3])
        info.append(temp_dict)
    return info
For the input:
[[100, 1, 4, 5],[101,1,4,6]]
This function will return a list of dictionaries:
[{"id_1_tom":"100","id_2_hell":"1","id_3_trip":"4","id_4_clap":"5"},
{"id_1_tom":"101","id_2_hell":"1","id_3_trip":"4","id_4_clap":"6"}]
This can serve as the value for the key info in your dictionary message. Note that you would still have to construct the message dictionary.
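Building the surrounding message could look roughly like the sketch below. The function create_message and its name and job parameters are my own illustration (the values "Tom" and "IT" come from the expected structure in the question), and create_info is rewritten compactly with zip, so treat it as one possible shape and not the only way to do it:

```python
def create_info(data):
    """Turn each row into a dict of string values, one key per column."""
    keys = ["id_1_tom", "id_2_hell", "id_3_trip", "id_4_clap"]
    return [dict(zip(keys, (str(value) for value in row))) for row in data]

def create_message(data, name="Tom", job="IT"):
    """Wrap the per-row dictionaries in the outer message structure."""
    return {"name": name, "Job": job, "info": create_info(data)}

message = create_message([[100, 1, 4, 5], [101, 1, 4, 6]])
```

If the message has to be sent as text, json.dumps(message) would serialize it.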
The following situation:
movies.csv
movieId,title,genres
tags.csv
userId,movieId,tag,timestamp
I want to read the tags from tags.csv and append them to the "tags" list of the matching entry in the movie dictionary. A tag should only be appended when its movieId matches the movie's id, and the list should not contain duplicates.
Here is the code:
import csv
reader = csv.reader(open('movies1.csv'))
dict = {}
header = next(reader)
# Check file as empty
if header != None:
    for row in reader:
        key = row[0]
        value = {
            "id": row[0],
            "title": row[1][:-6],
            "year": row[1][-5:-1],
            "average_rating": 0,
            "ratings": [],
            "tags": [], #the list that should be filled with tags
            "genres": row[2].split('|')
        }
        dict[key] = value
tags={}
with open('tags1.csv', mode='r') as infile:
    reader = csv.reader(infile)
    header = next(reader)
    # Check file as empty
    if header != None:
        for col in reader:
            if col[1] == dict[key]['id']:
                dict[key]['tags'].append(col[2])
print(dict)
My result:
I get all the tags for the last movie. The rest of the tags are just empty.
What am I doing wrong?
So I made it work. I created a second dictionary for the tags and then looped over both of them:
for tag in tags:
    for movie in dict:
        if tags[tag]['movieId'] == dict[movie]['id']:
            if tags[tag]['tag'] not in dict[movie]['tags']:
                dict[movie]['tags'].append(tags[tag]['tag'])
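The original attempt only filled the last movie because key still held the last value from the movies loop when the tags loop ran. Since the movies are already keyed by their id, a direct lookup avoids comparing every tag against every movie. A rough Python 3 sketch follows; the inline CSV text is made-up sample data standing in for the two files, and the dictionary is called movies here so it does not shadow the built-in dict:

```python
import csv
import io

# Made-up sample data in the same column layout as movies.csv and tags.csv.
movies_csv = "movieId,title,genres\n1,Toy Story (1995),Animation|Comedy\n2,Jumanji (1995),Adventure\n"
tags_csv = "userId,movieId,tag,timestamp\n10,1,pixar,100\n11,1,pixar,101\n12,1,fun,102\n13,2,board game,103\n14,99,orphan,104\n"

movies = {}
for row in csv.DictReader(io.StringIO(movies_csv)):
    movies[row["movieId"]] = {"id": row["movieId"], "title": row["title"], "tags": []}

for row in csv.DictReader(io.StringIO(tags_csv)):
    movie = movies.get(row["movieId"])  # None when the tag refers to an unknown movie
    if movie is not None and row["tag"] not in movie["tags"]:
        movie["tags"].append(row["tag"])
```

With real files, io.StringIO(...) would be replaced by the opened file objects.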
The following code is giving me the error:
Traceback (most recent call last):
  File "AMZGetPendingOrders.py", line 66, in <module>
    item_list.append(item['SellerSKU'])
TypeError: string indices must be integers
The code:
from mws import mws
import time
import json
import xmltodict
access_key = 'xx' #replace with your access key
seller_id = 'yy' #replace with your seller id
secret_key = 'zz' #replace with your secret key
marketplace_usa = '00'
orders_api = mws.Orders(access_key, secret_key, seller_id)
orders = orders_api.list_orders(marketplaceids=[marketplace_usa], orderstatus=('Pending'), fulfillment_channels=('MFN'), created_after='2018-07-01')
#save as XML file
filename = 'c:order.xml'
with open(filename, 'w') as f:
    f.write(orders.original)
#ConvertXML to JSON
dictString = json.dumps(xmltodict.parse(orders.original))
#Write new JSON to file
with open("output.json", 'w') as f:
    f.write(dictString)
#Read JSON and parse our order number
with open('output.json', 'r') as jsonfile:
    data = json.load(jsonfile)
#initialize blank dictionary
id_list = []
for order in data['ListOrdersResponse']['ListOrdersResult']['Orders']['Order']:
    id_list.append(order['AmazonOrderId'])
#This "gets" the orderitem info - this code actually is similar to the initial Amazon "get" though it has fewer switches
orders_api = mws.Orders(access_key, secret_key, seller_id)
#opens and empties the orderitem.xml file
open('c:orderitem.xml', 'w').close()
#iterated through the list of AmazonOrderIds and writes the item information to orderitem.xml
for x in id_list:
    orders = orders_api.list_order_items(amazon_order_id = x)
    filename = 'c:orderitem.xml'
    with open(filename, 'a') as f:
        f.write(orders.original)
#ConvertXML to JSON
amz_items_pending = json.dumps(xmltodict.parse(orders.original))
#Write new JSON to file
with open("pending.json", 'w') as f:
    f.write(amz_items_pending)
#read JSON and parse item_no and qty
with open('pending.json', 'r') as jsonfile1:
    data1 = json.load(jsonfile1)
#initialize blank dictionary
item_list = []
for item in data1['ListOrderItemsResponse']['ListOrderItemsResult']['OrderItems']['OrderItem']:
    item_list.append(item['SellerSKU'])
#print(item)
#print(id_list)
#print(data1)
#print(item_list)
time.sleep(10)
I don't understand why Python thinks this is a list and not a dictionary. When I print id_list it looks like a dictionary (curly braces, single quotes, colons, etc)
print(data1) shows my dictionary
{
    'ListOrderItemsResponse':{
        '#xmlns':'https://mws.amazonservices.com/Orders/2013-09-01',
        'ListOrderItemsResult':{
            'OrderItems':{
                'OrderItem':{
                    'QuantityOrdered':'1',
                    'Title':'Delta Rothko Rolling Bicycle Stand',
                    'ConditionId':'New',
                    'IsGift':'false',
                    'ASIN':'B00XXXXTIK',
                    'SellerSKU':'9934638',
                    'OrderItemId':'49624373726506',
                    'ProductInfo':{
                        'NumberOfItems':'1'
                    },
                    'QuantityShipped':'0',
                    'ConditionSubtypeId':'New'
                }
            },
            'AmazonOrderId':'112-9XXXXXX-XXXXXXX'
        },
        'ResponseMetadata':{
            'RequestId':'8XXXXX8-0866-44a4-96f5-XXXXXXXXXXXX'
        }
    }
}
Any ideas?
Because 'OrderItem' is a single dictionary here, your for loop iterates over its keys:
{'QuantityOrdered': '1', 'Title': 'Delta Rothko Rolling Bicycle Stand', 'ConditionId': 'New', 'IsGift': 'false', 'ASIN': 'B00XXXXTIK', 'SellerSKU': '9934638', 'OrderItemId': '49624373726506', 'ProductInfo': {'NumberOfItems': '1'}, 'QuantityShipped': '0', 'ConditionSubtypeId': 'New'}
So the first value of item will be the string 'QuantityOrdered', and you are trying to index this string as if it were a dictionary.
You can just do:
item_list.append(data1['ListOrderItemsResponse']['ListOrderItemsResult']['OrderItems']['OrderItem']['SellerSKU'])
and avoid the for loop over the dictionary.
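I guess you are trying to iterate over the order items and collect their SellerSKU values. xmltodict gives you a single dict when there is only one OrderItem and a list when there are several, so normalize to a list before looping:
items = data1['ListOrderItemsResponse']['ListOrderItemsResult']['OrderItems']['OrderItem']
items = items if isinstance(items, list) else [items]
for item in items:
    item_list.append(item['SellerSKU'])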
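The "one item is a dict, several items are a list" behaviour can be handled in one place with a small helper. The sketch below uses plain dictionaries in the shape xmltodict produces, so it runs without the MWS API; as_list and seller_skus are hypothetical names, and the SKUs 'A1' and 'B2' are invented sample values:

```python
def as_list(value):
    """Wrap a lone item in a list; leave lists untouched; treat None as empty."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]

def seller_skus(result):
    """Collect SellerSKU from one or many OrderItem entries."""
    return [item["SellerSKU"] for item in as_list(result["OrderItems"]["OrderItem"])]

# One order item parses to a dict, several parse to a list of dicts.
single = {"OrderItems": {"OrderItem": {"SellerSKU": "9934638"}}}
several = {"OrderItems": {"OrderItem": [{"SellerSKU": "A1"}, {"SellerSKU": "B2"}]}}

single_skus = seller_skus(single)
several_skus = seller_skus(several)
```

As far as I know, xmltodict.parse also accepts a force_list argument (for example force_list=('OrderItem',)) that makes the parser always return a list for the named elements, which would remove the need for the helper.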
This is just a part of my json file which looks like:
"network_lo": "127.0.0.0",
"ec2_block_device_mapping_root": "/dev/sda1",
"selinux": "false",
"uptime_seconds": 127412,
"ec2_reservation_id": "r-cd786568",
"sshdsakey": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
"ec2_block_device_mapping_ami": "/dev/sda1",
"memorysize": "3.66 GB",
"swapsize": "0.00 kB",
"netmask": "255.255.255.192",
"uniqueid": "24wq0see",
"kernelmajversion": "3.2",
I have a Python script which downloads this file. I want to parse this file and remove a number of objects, like "swapsize" and "sshdsakey".
sqs = boto.sqs.connect_to_region("ap-west-1")
q = sqs.get_queue("deathvally")
m = q.read(visibility_timeout=15)
if m == None:
    print "No message!"
else:
    with open('download.json', 'w') as json_data:
        print m.get_body()
        json_data.write(m.get_body())
        json_data.close()
    # I want a logic here which can simply delete the specific json objects
    # Something like this is what i tried but didn't work...
    # clean_data = json.load(json_data)
    # for element in clean_data: ##
    #     del element['sshdsakey']
    # json_data.write(clean_data)
I basically need to parse the fetched json file and then remove the specific objects and then just write this new modified stuff in a file.
json.loads will decode a JSON string into a Python dictionary (although the format you provided is not valid JSON; there have to be curly braces on each side). You can then delete the unwanted keys with del, encode the dictionary back to a JSON string with json.dumps, and write the result to a file:
clean_data = json.loads(json_data.read())
del clean_data[your_key]
with open(your_file_to_write, 'w') as f:
    f.write(json.dumps(clean_data))
You can parse your json using loads from native json module.
Then delete an element from the dict using del
import json
keys_to_remove = ['sshdsakey', 'selinux']
json_str = '''{
"network_lo": "127.0.0.0",
"ec2_block_device_mapping_root": "/dev/sda1",
"selinux": "false",
"uptime_seconds": 127412,
"ec2_reservation_id": "r-cd786568",
"sshdsakey": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
}'''
data = json.loads(json_str)
for key in keys_to_remove:
    if key in data:
        del data[key]
print data
You need to first convert the JSON object string into a Python dict, delete the keys from it, and then write it to the output file.
import json
import boto.sqs
sqs = boto.sqs.connect_to_region("ap-west-1")
q = sqs.get_queue("deathvally")
m = q.read(visibility_timeout=15)
if m is None:
    print "No message!"
else:
    KEYS_TO_REMOVE = "swapsize", "sshdsakey", "etc"
    with open('download.json', 'w') as json_data:
        json_obj = json.loads(m.get_body())
        for key in KEYS_TO_REMOVE:
            try:
                del json_obj[key]
            except KeyError:
                pass
        json_data.write(json.dumps(json_obj, indent=4))
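The key-removal step can also be tried on its own, without SQS. Below is a small Python 3 sketch; strip_keys is a made-up helper name and body is a shortened, brace-wrapped version of the sample from the question. It uses dict.pop with a default, which does the same job as the try/except around del:

```python
import json

def strip_keys(json_text, keys_to_remove):
    """Parse JSON text, drop the given top-level keys, and re-serialize."""
    data = json.loads(json_text)
    for key in keys_to_remove:
        data.pop(key, None)  # the default avoids a KeyError when the key is absent
    return json.dumps(data, indent=4)

body = '{"selinux": "false", "swapsize": "0.00 kB", "sshdsakey": "XXXX", "uptime_seconds": 127412}'
cleaned = strip_keys(body, ("swapsize", "sshdsakey", "not_present"))
```

The returned string is what would then be written to download.json.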
I have the data below in JSON format. I started with the code below, which throws a KeyError.
I am not sure how to get all the data listed in the headers section.
I know I am not doing it right in json_obj['offers'][0]['pkg']['Info'], but I am not sure how to do it correctly.
How can I get to the different nodes like Info, PricingInfo, flt_Info, etc.?
{
"offerInfo":{
"siteID":"1",
"language":"en_US",
"currency":"USD"
},
"offers":{
"pkg":[
{
"offerDateRange":{
"StartDate":[
2015,
11,
8
],
"EndDate":[
2015,
11,
14
]
},
"Info":{
"Id":"111"
},
"PricingInfo":{
"BaseRate":1932.6
},
"flt_Info":{
"Carrier":"AA"
}
}
]
}
}
import os
import json
import csv
f = open('api.csv','w')
writer = csv.writer(f,delimiter = '~')
headers = ['Id' , 'StartDate', 'EndDate', 'Id', 'BaseRate', 'Carrier']
default = ''
writer.writerow(headers)
string = open('data.json').read().decode('utf-8')
json_obj = json.loads(string)
for pkg in json_obj['offers'][0]['pkg']['Info']:
    row = []
    row.append(json_obj['id']) # just to test,but I need column values listed in header section
    writer.writerow(row)
It looks like you're accessing the json incorrectly. After you have accessed json_obj['offers'], you accessed [0], but there is no array there. json_obj['offers'] gives you another dictionary.
For example, to get PricingInfo like you asked, access like this:
json_obj['offers']['pkg'][0]['PricingInfo']
or 11 from the StartDate like this:
json_obj['offers']['pkg'][0]['offerDateRange']['StartDate'][1]
And I believe you get the KeyError because you index the dictionary with [0], which is not one of its keys.
Try replacing this piece of code:
for pkg in json_obj['offers'][0]['pkg']['Info']:
    row = []
    row.append(json_obj['id']) # just to test,but I need column values listed in header section
    writer.writerow(row)
with this:
for pkg in json_obj['offers']['pkg']:
    row = []  # start a fresh row for each package
    row.append(pkg['Info']['Id'])
    year = pkg['offerDateRange']['StartDate'][0]
    month = pkg['offerDateRange']['StartDate'][1]
    day = pkg['offerDateRange']['StartDate'][2]
    StartDate = "%d-%d-%d" % (year,month,day)
    row.append(StartDate)
    writer.writerow(row)
Try this:
import json
string = open('data.json').read().decode('utf-8')
json_obj = json.loads(string)
pkg = json_obj["offers"]["pkg"][0]
print pkg["Info"]["Id"]
print '-'.join(str(n) for n in pkg["offerDateRange"]["StartDate"])
print '-'.join(str(n) for n in pkg["offerDateRange"]["EndDate"])
print pkg["Info"]["Id"]
print pkg["PricingInfo"]["BaseRate"]
print pkg["flt_Info"]["Carrier"]
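To go from printing to one CSV-ready row per package, something along these lines might work in Python 3. The function name offer_rows is hypothetical, the JSON is a compact copy of the sample from the question, and the date format (parts joined with "-") is just one choice:

```python
import json

def offer_rows(json_obj):
    """Build one flat row per entry in json_obj['offers']['pkg']."""
    rows = []
    for pkg in json_obj["offers"]["pkg"]:
        dates = pkg["offerDateRange"]
        rows.append([
            pkg["Info"]["Id"],
            "-".join(str(part) for part in dates["StartDate"]),
            "-".join(str(part) for part in dates["EndDate"]),
            pkg["PricingInfo"]["BaseRate"],
            pkg["flt_Info"]["Carrier"],
        ])
    return rows

sample = json.loads('''{
  "offerInfo": {"siteID": "1", "language": "en_US", "currency": "USD"},
  "offers": {"pkg": [{
    "offerDateRange": {"StartDate": [2015, 11, 8], "EndDate": [2015, 11, 14]},
    "Info": {"Id": "111"},
    "PricingInfo": {"BaseRate": 1932.6},
    "flt_Info": {"Carrier": "AA"}
  }]}
}''')

rows = offer_rows(sample)
```

Each row can then be passed to writer.writerow, or the whole list to writer.writerows.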