parsing JSON with missing fields

parsing JSON with missing fields - python

I have json array with very dynamic field and some of the array doesn't have all the fields.
Example :
[
{
"Name": "AFG LIMITED",
"Vendor ID": "008343",
"EGID": "67888",
"FID": "83748374"
},
{
"Name": "ABC LIMITED",
"Vendor ID": "008333",
"EGID": "67888",
"AID": "0000292"
"FID": "98979"
},
]
I need to extract particular key with header & pipe delimiter like :Name|Vendor ID|EGID|AID(only present in second array).if any key is not present then it should have null value
I was try to parse this with below code but it's breaking in the second line itself as AID is missing.
import json
with open("sample.json", "r") as rf:
decoded_data = json.load(rf)
# Check is the json object was loaded correctly
try:
for i in decoded_data:
print i["Name"],"|",i["Vendor ID"]"|",i["EGID"],"|",i["AId"]
except KeyError:
print(null)
output from above code:
AFG LIMITED|008343|67888|null

Related

How to convert python-request JSON results to csv?

I am trying to get my list of contacts from my WIX website using their API endpoint url and the requests module in python. I am totally stuck.
Here's my code so far:
import requests
auth_key = "my auth key"
r = requests.get("https://www.wixapis.com/crm/v1/contacts", headers={"Authorization": auth_key})
print(r.status_code)
dict = r.json()
contacts_list = dict["contacts"]
for i in contacts_list:
for key in i:
print(key, ':', i[key])
Here is what I get:
200
id : long id string 1
emails : [{'tag': 'UNTAGGED', 'email': 'sampleemail1#yahoo.com'}]
phones : []
addresses : [{'tag': 'UNTAGGED', 'countryCode': 'US'}]
metadata : {'createdAt': '2020-07-08T22:41:07.135Z', 'updatedAt': '2020-07-08T22:42:19.327Z'}
source : {'sourceType': 'SITE_MEMBERS'}
id : long id string 2
emails : [{'tag': 'UNTAGGED', 'email': 'sampleemail2#yahoo.com'}]
phones : []
addresses : []
metadata : {'createdAt': '2020-07-03T00:51:21.127Z', 'updatedAt': '2020-07-04T03:26:16.370Z'}
source : {'sourceType': 'SITE_MEMBERS'}
Process finished with exit code 0
Each line is a string. I need each row of the csv to be a new contact (There are two sample contacts). The columns should be the keys. I plan to use the csv module to writerow(Fields), where fields is a list of string (keys) such as Fields = [id, emails, phones, addresses, metadata, source]
All I really need is the emails in a single column of a csv though. Is there a way to maybe just get the email for each contact?

A CSV file with one column is basically just a text file with one item per line, but you can use the csv module to do it if you really want, as shown below.
I commented-out the 'python-requests' stuff and used some sample input for testing.
test_data = {
"contacts": [
{
"id": "long id string 1",
"emails": [
{
"tag": "UNTAGGED",
"email": "sampleemail1#yahoo.com"
}
],
"phones": [],
"addresses": [
{
"tag": "UNTAGGED",
"countryCode": "US"
}
],
"metadata": {
"createdAt": "2020-07-08T22:41:07.135Z",
"updatedAt": "2020-07-08T22:42:19.327Z"
},
"source": {
"sourceType": "SITE_MEMBERS"
}
},
{
"id": "long id string 2",
"emails": [
{
"tag": "UNTAGGED",
"email": "sampleemail2#yahoo.com"
}
],
"phones": [],
"addresses": [],
"metadata": {
"createdAt": "2020-07-03T00:51:21.127Z",
"updatedAt": "2020-07-04T03:26:16.370Z"
},
"source": {
"sourceType": "SITE_MEMBERS"
}
}
]
}
import csv
import json
import requests
auth_key = "my auth key"
output_filename = 'whatever.csv'
#r = requests.get("https://www.wixapis.com/crm/v1/contacts", headers={"Authorization": auth_key})
#print(r.status_code)
#json_obj = r.json()
json_obj = test_data # FOR TESTING PURPOSES
contacts_list = json_obj["contacts"]
with open(output_filename, 'w', newline='') as outp:
writer = csv.writer(outp)
writer.writerow(['email']) # Write csv header.
for contact in contacts_list:
email = contact['emails'][0]['email'] # Get the first one.
writer.writerow([email])
print('email csv file written')
Contents of whatever.csv file afterwards:
email
sampleemail1#yahoo.com
sampleemail2#yahoo.com

Update:
As pointed by #martineau, I just saw you can array in few values, you need to cater it. You may make them string with [].join() in the for loop
you can write it to csv like this using csv package.
import csv, json, sys
auth_key = "my auth key"
r = requests.get("https://www.wixapis.com/crm/v1/contacts", headers={"Authorization": auth_key})
print(r.status_code)
dict = r.json()
contacts_list = dict["contacts"]
output = csv.writer(sys.stdout)
#insert header(keys)
output.writerow(data[0].keys())
for i in contacts_list:
output.writerow(i.values())
At the end you can print and verify output

Reading json in python separated by newlines

I am trying to read some json with the following format. A simple pd.read_json() returns ValueError: Trailing data. Adding lines=True returns ValueError: Expected object or value. I've tried various combinations of readlines() and load()/loads() so far without success.
Any ideas how I could get this into a dataframe?
{
"content": "kdjfsfkjlffsdkj",
"source": {
"name": "jfkldsjf"
},
"title": "dsldkjfslj",
"url": "vkljfklgjkdlgj"
}
{
"content": "djlskgfdklgjkfgj",
"source": {
"name": "ldfjkdfjs"
},
"title": "lfsjdfklfldsjf",
"url": "lkjlfggdflkjgdlf"
}

The sample you have above isn't valid JSON. To be valid JSON these objects need to be within a JS array ([]) and be comma separated, as follows:
[{
"content": "kdjfsfkjlffsdkj",
"source": {
"name": "jfkldsjf"
},
"title": "dsldkjfslj",
"url": "vkljfklgjkdlgj"
},
{
"content": "djlskgfdklgjkfgj",
"source": {
"name": "ldfjkdfjs"
},
"title": "lfsjdfklfldsjf",
"url": "lkjlfggdflkjgdlf"
}]
I just tried on my machine. When formatted correctly, it works
>>> pd.read_json('data.json')
content source title url
0 kdjfsfkjlffsdkj {'name': 'jfkldsjf'} dsldkjfslj vkljfklgjkdlgj
1 djlskgfdklgjkfgj {'name': 'ldfjkdfjs'} lfsjdfklfldsjf lkjlfggdflkjgdlf

Another solution if you do not want to reformat your files.
Assuming your JSON is in a string called my_json you could do:
import json
import pandas as pd
splitted = my_json.split('\n\n')
my_list = [json.loads(e) for e in splitted]
df = pd.DataFrame(my_list)

Thanks for the ideas internet. None quite solved the problem in the way I needed (I had lots of newline characters in the strings themselves which meant I couldn't split on them) but they helped point the way. In case anyone has a similar problem, this is what worked for me:
with open('path/to/original.json', 'r') as f:
data = f.read()
data = data.split("}\n")
data = [d.strip() + "}" for d in data]
data = list(filter(("}").__ne__, data))
data = [json.loads(d) for d in data]
with open('path/to/reformatted.json', 'w') as f:
json.dump(data, f)
df = pd.read_json('path/to/reformatted.json')

If you can use jq then solution is simpler:
jq -s '.' path/to/original.json > path/to/reformatted.json

Python 3 Get JSON value

I am using rest with a python script to extract Name and Start Time from a response.
I can get the information but I can't combine data so that the information is on the same line in a CSV. When I go to export them to CSV they all go on new lines.
There is probably a much better way to extract data from a JSON List.
for item in driverDetails['Query']['Results']:
for data_item in item['XValues']:
body.append(data_item)
for key, value in data_item.items():
#driver = {}
#test = {}
#startTime = {}
if key == "Name":
drivers.append(value)
if key == "StartTime":
drivers.append(value)
print (drivers)
Code to write to CSV:
with open(logFileName, 'a') as outcsv:
# configure writer to write standard csv file
writer = csv.writer(outcsv, delimiter=',', quotechar="'",
quoting=csv.QUOTE_MINIMAL, lineterminator='\n',skipinitialspace=True)
for driver in drivers:
writer.writerow(driver)
Here is a sample of the response:
"Query": {
"Results": [
{
"XValues": [
{
"ReportScopeStartTime": "2018-06-18T23:00:00Z"
},
{
"ReportScopeEndTime": "2018-06-25T22:59:59Z"
},
{
"ID": "1400"
},
{
"Name": " John Doe"
},
{
"StartTime": "2018-06-19T07:16:10Z"
},
],
},
"XValues": [
{
"ReportScopeStartTime": "2018-06-18T23:00:00Z"
},
{
"ReportScopeEndTime": "2018-06-25T22:59:59Z"
},
{
"ID": "1401"
},
{
"Name": " Jane Smith"
},
{
"StartTime": "2018-06-19T07:16:10Z"
},
],
},
My ouput in csv:
John Doe
2018-06-19T07:16:10Z
Jane Smith
2018-06-19T07:16:10Z
Desired Outcome:
John Doe, 2018-06-19T07:16:10Z
Jane Smith, 2018-06-19T07:16:10Z

Just use normal dictionary access to get the values:
for item in driverDetails['Query']['Results']:
for data_item in item['XValues']:
body.append(data_item)
if "Name" in data_item:
drivers.append(data_item["Name"])
if "StartTime" in data_item:
drivers.append(data_item["StartTime"])
print (drivers)
If you know the items will already have the required fields then you won't even need the in tests.
writer.writerow() expects a sequence. You are calling it with a single string as a parameter so it will split the string into individual characters. Probably you want to keep the name and start time together so extract them as a tuple:
for item in driverDetails['Query']['Results']:
name, start_time = "", ""
for data_item in item['XValues']:
body.append(data_item)
if "Name" in data_item:
name = data_item["Name"]
if "StartTime" in data_item:
start_time = data_item["StartTime"]
drivers.append((name, start_time))
print (drivers)
Now instead of being a list of strings, drivers is a list of tuples: the name for every item that has a name and the start time but if an input item has a name and no start time that field could be empty. Your code to write the csv file should now do the expected thing.
If you want to get all or most of the values try gathering them together into a single dictionary, then you can pull out the fields you want:
for item in driverDetails['Query']['Results']:
fields = {}
for data_item in item['XValues']:
body.append(data_item)
fields.update(data_item)
drivers.append((fields["ID"], fields["Name"], fields["StartTime"]))
print (drivers)
Once you have the fields in a single dictionary you could even build the tuple with a loop:
drivers.append(tuple(fields[f] for f in ("ID", "Name", "StartTime", "ReportScopeStartTime", "ReportScopeEndTime")))
I think you should list the fields you want explicitly just to ensure that new fields don't surprise you.

Writing multiple json objects to a json file

I have a list of json objects that I would like to write to a json file. Example of my data is as follows:
{
"_id": "abc",
"resolved": false,
"timestamp": "2017-04-18T04:57:41 366000",
"timestamp_utc": {
"$date": 1492509461366
},
"sessionID": "abc",
"resHeight": 768,
"time_bucket": ["2017-year", "2017-04-month", "2017-16-week", "2017-04-18-day", "2017-04-18 16-hour"],
"referrer": "Standalone",
"g_event_id": "abc",
"user_agent": "abc"
"_id": "abc",
} {
"_id": "abc",
"resolved": false,
"timestamp": "2017-04-18T04:57:41 366000",
"timestamp_utc": {
"$date": 1492509461366
},
"sessionID": "abc",
"resHeight": 768,
"time_bucket": ["2017-year", "2017-04-month", "2017-16-week", "2017-04-18-day", "2017-04-18 16-hour"],
"referrer": "Standalone",
"g_event_id": "abc",
"user_agent": "abc"
}
I would like to wirte this to a json file. Here's the code that I am using for this purpose:
with open("filename", 'w') as outfile1:
for row in data:
outfile1.write(json.dumps(row))
But this gives me a file with only 1 long row of data. I would like to have a row for each json object in my original data. I know there are some other StackOverflow questions that are trying to address somewhat similar situation (by externally inserting '\n' etc.), but it hasn't worked in my case for some reason. I believe there has to be a pythonic way to do this.
How do I achieve this?

The format of the file you are trying to create is called JSON lines.
It seems, you are asking why the jsons are not separated with a newline. Because write method does not append the newline.
If you want implicit newlines you should better use print function:
with open("filename", 'w') as outfile1:
for row in data:
print(json.dumps(row), file=outfile1)

Use the indent argument to output json with extra whitespace. The default is to not output linebreaks or extra spaces.
with open('filename.json', 'w') as outfile1:
json.dump(data, outfile1, indent=4)
https://docs.python.org/3/library/json.html#basic-usage

Filter data to facebook graph with json-python

I 'm getting Facebook graph and data already shows what I need , but I could not filter the 'message' and 'id' or JSON , appreciate them , I leave my code:
import facebook
import json
import urllib.request
from urllib.request import urlopen
page_id = "MYPAGE"
access_token = 'MY-ACCESS-TOKEN'
api_endpoint = "https://graph.facebook.com/v2.5/"
fb_graph_url = page_id+"?fields=id,name,feed.since(2015-12-22).until(2015-12-25){comments.filter(stream)}&access_token="+access_token
html = api_endpoint + fb_graph_url
print(html,"\n")
data = urllib.request.urlopen(html)
read_page= data.read()
print(read_page)
print(data.read(),"\n")
data2=json.loads(read_page.decode())
#message=data2["feed"]["data"]
message=data2
for item in message['feed']['data'][1]['comments']['data']:
print(item['message'])
print(item['from']['name'])
print(message,"\n")
He shows me something like:
{
"id": "2825921296",
"name": "MY-PAGE",
"feed": {
"data": [
{
"id": "2825921296_5155340"
},
{
"id": "2825921296_5155340",
"comments": {
"data": [
{
"from": {
"name": "Carl Jhon",
"id": "282564921296"
},
"message": "Comment one",
"created_time": "2015-12-10T03:42:05+0000",
"id": "5153352885_5153353484206"
},
{
And my question is , How to display only the 'message' and 'name' of all it shows.
Thankl and I appreciate your response.

It looks like the variable "message" in your code (not in the JSON data) is a dictionary. Subsequently, you can access the name and the first message by adding:
print(message['feed']['data'][1]['comments']['data'][0]['message'])
print(message['name'])
You can access the nth message with:
print(message['feed']['data'][1]['comments']['data'][n]['message'])
To print all of the messages, including the name of the author, you could use the for loop like this:
for item in message['feed']['data'][1]['comments']['data']:
print(item['message'])
print(item['from']['name'])
Or you can output a specific number of messages and names (100 in this case):
if len(message['feed']['data'][1]['comments']['data'])>=100:
for i in range(100):
print(message['feed']['data'][1]['comments']['data'][i]['message'])
print(message['feed']['data'][1]['comments']['data'][i]['from']['name'])
In case the message contains emojis, you can either add # -*- coding: utf-8 -*- to the top of your script or take a look at this post

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

parsing JSON with missing fields - python

Related

How to convert python-request JSON results to csv?

Reading json in python separated by newlines

Python 3 Get JSON value

Writing multiple json objects to a json file

Filter data to facebook graph with json-python

Categories

Resources