Changing JSON to Dataframe in python [closed]

Closed 2 years ago.
I want to convert the JSON response from an API into a DataFrame, with each key under data becoming a column. Note that data also contains a nested dictionary (message); I want each of its keys to become an individual column as well.
{
'success': True,
'code': 200,
'data': [
{
'id': 342964769,
'type': 'ios',
'create_time': 1567591650,
'open_count': 2,
'environment': 'production',
'campaign_id': 12713145,
'project_id': 1758,
'error': 0,
'sent_count': 3,
'message': {
'timestamp': '1567591643',
'badge': '',
'alert': "'I pulled pints and cut turf here back in the day' - Mike Pence speaks to "
"small crowd in Doonbeg",
'sound': 'default',
'articleId': '38465289',
'category': 'news/',
'id': '342964769',
'pushId': 'fireabse-5d6f8cdb8c3e9',
'title': 'Independent.ie',
'content-available': '1',
'xpush': 'yes',
'cid': '12713145'
},
'error_message': None
}, {
'id': 342964771,
'type': 'android',
'create_time': 1567591650,
'open_count': 0,
'environment': 'production',
'campaign_id': 12713145,
'project_id': 1758,
'error': 0,
'sent_count': 0,
'message': None,
'error_message': None
}
]
}

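Is this what you want?
import pandas as pd

def dict_pop(d, *args):
    # Remove the key and return its value, or {} when the value is None or empty
    v = d.pop(*args)
    return v if v else {}

# Merge each record's message keys into the record; on a clash (e.g. 'id') the top-level value wins
resp = [{**dict_pop(i, 'message'), **i} for i in resp['data']]
resp = pd.DataFrame(resp)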

You can flatten the dictionaries by removing the message level and making each entry of the dictionary part of the parent dict:
import pandas as pd
import copy
data = {
'success': True,
'code': 200,
'data': [
{
'id': 342964769,
'type': 'ios',
'create_time': 1567591650,
'open_count': 2,
'environment': 'production',
'campaign_id': 12713145,
'project_id': 1758,
'error': 0,
'sent_count': 3,
'message': {
'timestamp': '1567591643',
'badge': '',
'alert': "'I pulled pints and cut turf here back in the day' - Mike Pence speaks to "
"small crowd in Doonbeg",
'sound': 'default',
'articleId': '38465289',
'category': 'news/',
'id': '342964769',
'pushId': 'fireabse-5d6f8cdb8c3e9',
'title': 'Independent.ie',
'content-available': '1',
'xpush': 'yes',
'cid': '12713145'
},
'error_message': None
}, {
'id': 342964771,
'type': 'android',
'create_time': 1567591650,
'open_count': 0,
'environment': 'production',
'campaign_id': 12713145,
'project_id': 1758,
'error': 0,
'sent_count': 0,
'message': None,
'error_message': None
}
]
}
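processed = []
for dat in data["data"]:
    new_dat = copy.deepcopy(dat)  # only important if the original data matters to you
    # Only a non-empty message dict is flattened; a None message stays behind as a "message" column
    if "message" in new_dat and new_dat["message"]:
        message = new_dat.pop("message")
        # Note: the "id" inside message (a string) overwrites the numeric top-level "id"
        new_dat.update(message)
    processed.append(new_dat)
df = pd.DataFrame(processed)
print(df.columns)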
Output:
Index(['id', 'type', 'create_time', 'open_count', 'environment', 'campaign_id',
'project_id', 'error', 'sent_count', 'error_message', 'timestamp',
'badge', 'alert', 'sound', 'articleId', 'category', 'pushId', 'title',
'content-available', 'xpush', 'cid', 'message'],
dtype='object')
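
One thing to watch in the loop above is that message carries its own id key, so update() replaces the numeric top-level id with the string from message. Below is a minimal sketch of a variant that prefixes the nested keys instead. The helper name flatten_record, the message_ prefix, and the shortened sample records are my own assumptions, not part of the original answer.

```python
def flatten_record(record, nested_key="message", sep="_"):
    """Return a copy of record with record[nested_key] lifted to prefixed top-level keys."""
    flat = {k: v for k, v in record.items() if k != nested_key}
    nested = record.get(nested_key) or {}  # a None or missing message is treated as empty
    for key, value in nested.items():
        flat[f"{nested_key}{sep}{key}"] = value
    return flat


# Shortened stand-ins for the two records in the question
sample = [
    {"id": 342964769, "type": "ios", "message": {"id": "342964769", "sound": "default"}},
    {"id": 342964771, "type": "android", "message": None},
]
rows = [flatten_record(r) for r in sample]
print(rows[0])  # {'id': 342964769, 'type': 'ios', 'message_id': '342964769', 'message_sound': 'default'}
print(rows[1])  # {'id': 342964771, 'type': 'android'}
```

Passing rows to pd.DataFrame should then give one column per key, with missing values for records that had no message.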

Related

JSON viewers don't accept my pattern even after dict going through json.dumps() + json.loads()

The result when printing after a = json.dumps(dicter) and print(json.loads(a)) is this:
{
'10432981': {
'tournament': {
'name': 'Club Friendly Games',
'slug': 'club-friendly-games',
'category': {
'name': 'World',
'slug': 'world',
'sport': {
'name': 'Football',
'slug': 'football',
'id': 1
},
'id': 1468,
'flag': 'international'
},
'uniqueTournament': {
'name': 'Club Friendly Games',
'slug': 'club-friendly-games',
'category': {
'name': 'World',
'slug': 'world',
'sport': {
'name': 'Football',
'slug': 'football',
'id': 1
},
'id': 1468,
'flag': 'international'
},
'userCount': 0,
'hasPositionGraph': False,
'id': 853,
'hasEventPlayerStatistics': False,
'displayInverseHomeAwayTeams': False
},
'priority': 0,
'id': 86
}
}
}
But when I try to open it in any JSON viewer, the viewer warns that the format is incorrect without specifying where the problem is.
If no error is raised when converting the dict to JSON, or when reading it back, why do the viewers report a failure?

JSON requires strings to be enclosed in double quotes ("). json.loads returns a Python dictionary, and printing a dictionary shows its Python representation (single quotes, False instead of false), which is not valid JSON. If you want valid JSON, use the string that json.dumps returns.
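
A small sketch of the difference, using a cut-down stand-in for the dictionary in the question (the flag: None entry is my own addition, included to show null):

```python
import json

# Cut-down stand-in for the question's dict; "flag": None is added for illustration
dicter = {"hasPositionGraph": False, "name": "Club Friendly Games", "flag": None}

as_json = json.dumps(dicter)      # a str holding valid JSON
round_trip = json.loads(as_json)  # parsed back into a Python dict

print(as_json)     # {"hasPositionGraph": false, "name": "Club Friendly Games", "flag": null}
print(round_trip)  # {'hasPositionGraph': False, 'name': 'Club Friendly Games', 'flag': None}
```

The first printed line is what a JSON viewer accepts; the second is Python's own display of the dict and is what the viewers reject.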

Remove item in JSON if key has value

I have tried everything I can possibly come up with, but the value won't go away.
I have a user as JSON, and if any entry in user['permissions'] has the key permission = "DELETE PAGE", I want to remove that entry (del user['permissions'][1] in this example).
I want to have a list of possible values such as "DELETE PAGE" and so on. If an entry's permission is in that list, delete that entry.
Then return the user's JSON without the items that were found.
I have tried del user['permissions'][x], .pop() and so on, but the entry is still there.
{
'id': 123,
'name': 'My name',
'description': 'This is who I am',
'permissions': [{
'id': 18814,
'holder': {
'type': 'role',
'parameter': '321',
'projectRole': {
'name': 'Admin',
'id': 1,
}
},
'permission': 'VIEW PAGE'
}, {
'id': 18815,
'holder': {
'type': 'role',
'parameter': '123',
'projectRole': {
'name': 'Moderator',
'id': 2
}
},
'permission': 'DELETE PAGE'
}]
}

Here's the code:
perm = a['permissions']
for p in perm[:]:  # iterate over a copy so that removing items does not skip any
    if p['permission'] == 'DELETE PAGE':
        perm.remove(p)
print(a)
Output:
{'id': 123, 'name': 'My name', 'description': 'This is who I am', 'permissions': [{'id': 18814, 'holder': {'type': 'role', 'parameter': '321', 'projectRole': {'name': 'Admin', 'id': 1}}, 'permission': 'VIEW PAGE'}]}
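
The question also asks for a list of values to strip. Here is a minimal sketch of that, building a new list rather than removing items during iteration (calling remove() on a list while looping over that same list can skip the element after each removal). The function name without_permissions, the extra "EDIT PAGE" value, and the third permission entry are hypothetical and only there for illustration.

```python
import copy


def without_permissions(user, blocked):
    """Return a copy of user whose permissions list omits the blocked permission names."""
    cleaned = copy.deepcopy(user)  # leave the caller's dict untouched
    cleaned["permissions"] = [
        p for p in cleaned["permissions"] if p["permission"] not in blocked
    ]
    return cleaned


# Shortened stand-in for the user in the question, with one extra made-up entry
user = {
    "id": 123,
    "permissions": [
        {"id": 18814, "permission": "VIEW PAGE"},
        {"id": 18815, "permission": "DELETE PAGE"},
        {"id": 18816, "permission": "DELETE PAGE"},
    ],
}
result = without_permissions(user, {"DELETE PAGE", "EDIT PAGE"})
print(result["permissions"])  # [{'id': 18814, 'permission': 'VIEW PAGE'}]
```

Using a set for blocked keeps the membership test cheap when the list of values grows.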

How can I make json data from requests into excel file?

This is my first time dealing with json data. So I'm not that familiar with the structure of json.
I got some data through "we the people" e-petition sites with following code:
url = "https://api.whitehouse.gov/v1/petitions.json?limit=3&offset=0&createdBefore=1573862400"
jdata_2 = requests.get(url).json()
However, I realized the result is not what pandas expects, because I got an error when I tried to load it with pandas before converting it to an Excel file:
df = pandas.read_json(jdata_2)
Obviously, I am missing a step that has to come before calling pandas.read_json().
I searched for an answer, but most questions are of the form "How can I convert JSON data into Excel data", which assumes you already have JSON data. In my case I fetched the data from the URL, so I thought I could turn those strings into JSON data and then convert that to Excel. I also tried json.dump(), but that didn't work either.
I know this is a naive question, but I'm not sure where to start. If anyone can show me how to deal with this, or link me to references I can study, I would really appreciate it.
Thank you in advance for your help.
This is the json data with the requests, and I pprint it with indent=4.
Input:
url = "https://api.whitehouse.gov/v1/petitions.json?limit=3&offset=0&createdBefore=1573862400"
pp = pprint.PrettyPrinter(indent=4)
pp.pprint(jdata_2)
Output :
{ 'metadata': { 'requestInfo': { 'apiVersion': 1,
'query': { 'body': None,
'createdAfter': None,
'createdAt': None,
'createdBefore': '1573862400',
'isPublic': 1,
'isSignable': None,
'limit': '3',
'mock': 0,
'offset': '0',
'petitionsDefaultLimit': '1000',
'publicThreshold': 149,
'responseId': None,
'signatureCount': None,
'signatureCountCeiling': None,
'signatureCountFloor': 0,
'signatureThreshold': None,
'signatureThresholdCeiling': None,
'signatureThresholdFloor': None,
'sortBy': 'DATE_REACHED_PUBLIC',
'sortOrder': 'ASC',
'status': None,
'title': None,
'url': None,
'websiteUrl': 'https://petitions.whitehouse.gov'},
'resource': 'petitions'},
'responseInfo': { 'developerMessage': 'OK',
'errorCode': '',
'moreInfo': '',
'status': 200,
'userMessage': ''},
'resultset': {'count': 1852, 'limit': 3, 'offset': 0}},
'results': [ { 'body': 'Please save kurdish people in syria \r\n'
'pleaee save north syria',
'created': 1570630389,
'deadline': 1573225989,
'id': '2798897',
'isPublic': True,
'isSignable': False,
'issues': [ { 'id': 326,
'name': 'Homeland Security & '
'Defense'}],
'petition_type': [ { 'id': 291,
'name': 'Call on Congress to '
'act on an issue'}],
'reachedPublic': 0,
'response': [],
'signatureCount': 149,
'signatureThreshold': 100000,
'signaturesNeeded': 99851,
'status': 'closed',
'title': 'Please save rojava north syria\r\n'
'please save kurdish people\r\n'
'please stop erdogan\r\n'
'plaease please',
'type': 'petition',
'url': 'https://petitions.whitehouse.gov/petition/please-save-rojava-north-syria-please-save-kurdish-people-please-stop-erdogan-plaease-please'},
{ 'body': 'Kane Friess was a 2 year old boy who was '
"murdered by his mom's boyfriend, Gyasi "
'Campbell. Even with expert statements from '
'forensic anthropologists, stating his injuries '
'wete the result of homicide. Mr. Campbell was '
'found guilty of involuntary manslaughter. This '
"is an outrage to Kane's Family and our "
'community.',
'created': 1566053365,
'deadline': 1568645365,
'id': '2782248',
'isPublic': True,
'isSignable': False,
'issues': [ { 'id': 321,
'name': 'Criminal Justice Reform'}],
'petition_type': [ { 'id': 281,
'name': 'Change an existing '
'Administration '
'policy'}],
'reachedPublic': 0,
'response': [],
'signatureCount': 149,
'signatureThreshold': 100000,
'signaturesNeeded': 99851,
'status': 'closed',
'title': "Kane's Law. Upon which the murder of a child, "
'regardless of circumstances, be seen as 1st '
'degree murder. A Federal Law.',
'type': 'petition',
'url': 'https://petitions.whitehouse.gov/petition/kanes-law-upon-which-murder-child-regardless-circumstances-be-seen-1st-degree-murder-federal-law'},
{ 'body': "Schumer and Pelosi's hatred and refusing to "
'work with President Donald J. Trump is holding '
'America hostage. We the people know securing '
'our southern border is a priority which will '
'not happen with these two in office. Lets '
'build the wall NOW!',
'created': 1547050064,
'deadline': 1549642064,
'id': '2722358',
'isPublic': True,
'isSignable': False,
'issues': [ {'id': 306, 'name': 'Budget & Taxes'},
{ 'id': 326,
'name': 'Homeland Security & '
'Defense'},
{'id': 29, 'name': 'Immigration'}],
'petition_type': [ { 'id': 291,
'name': 'Call on Congress to '
'act on an issue'}],
'reachedPublic': 0,
'response': [],
'signatureCount': 149,
'signatureThreshold': 100000,
'signaturesNeeded': 99851,
'status': 'closed',
'title': 'Remove Chuck Schumer and Nancy Pelosi from '
'office',
'type': 'petition',
'url': 'https://petitions.whitehouse.gov/petition/remove-chuck-schumer-and-nancy-pelosi-office'}]}
And this is the Error message I got
Input :
df = pandas.read_json(jdata_2)
Output :
ValueError: Invalid file path or buffer object type: <class 'dict'>

The error occurs because requests' .json() has already parsed the response into a dict, while pandas.read_json expects a JSON string, path, or file-like object. You can try the code below instead:
import json
import pandas as pd
import requests

URL = "https://api.whitehouse.gov/v1/petitions.json?limit=3&offset=0&createdBefore=1573862400"
# Fetch the JSON response from the URL
req = requests.get(URL)
text_data = req.text
json_dict = json.loads(text_data)
# Convert the "results" list of the JSON dictionary to a DataFrame
df = pd.DataFrame.from_dict(json_dict["results"])
Finally, save the DataFrame in Excel format (.xlsx):
df.to_excel("output.xlsx")
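
In the results shown in the question, issues and petition_type are lists of dictionaries, so each of those DataFrame cells would hold a Python list. How such cells are written to a spreadsheet depends on the writer engine, so it may be safer to turn them into plain strings first. A minimal sketch of that step follows; the helper name summarise_petition and the choice to keep only the name of each entry are my own assumptions.

```python
def summarise_petition(petition):
    """Copy a petition dict, replacing list-of-dict fields with comma-joined names."""
    row = dict(petition)  # shallow copy; the original dict is not modified
    for field in ("issues", "petition_type"):
        row[field] = ", ".join(item["name"] for item in petition.get(field, []))
    return row


# Shortened stand-in for one entry of json_dict["results"]
results = [
    {
        "id": "2722358",
        "signatureCount": 149,
        "issues": [{"id": 306, "name": "Budget & Taxes"}, {"id": 29, "name": "Immigration"}],
        "petition_type": [{"id": 291, "name": "Call on Congress to act on an issue"}],
    },
]
rows = [summarise_petition(p) for p in results]
print(rows[0]["issues"])  # Budget & Taxes, Immigration
```

The resulting rows list can be passed to pd.DataFrame in place of json_dict["results"].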

How can I print specific integer variables in a nested dictionary by using Python?

This is my first question :)
I loop over a nested dictionary to print specific values. I am using the following code.
for i in lizzo_top_tracks['tracks']:
    print('Track Name: ' + i['name'])
It works for string variables, but does not work for other variables. For example, when I use the following code for the date variable:
for i in lizzo_top_tracks['tracks']:
    print('Album Release Date: ' + i['release_date'])
I receive a message like this KeyError: 'release_date'
What should I do?
Here is a sample of my nested dictionary:
{'tracks': [{'album': {'album_type': 'album',
'artists': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/56oDRnqbIiwx4mymNEv7dS'},
'href': 'https://api.spotify.com/v1/artists/56oDRnqbIiwx4mymNEv7dS',
'id': '56oDRnqbIiwx4mymNEv7dS',
'name': 'Lizzo',
'type': 'artist',
'uri': 'spotify:artist:56oDRnqbIiwx4mymNEv7dS'}],
'external_urls': {'spotify': 'https://open.spotify.com/album/74gSdSHe71q7urGWMMn3qB'},
'href': 'https://api.spotify.com/v1/albums/74gSdSHe71q7urGWMMn3qB',
'id': '74gSdSHe71q7urGWMMn3qB',
'images': [{'height': 640,
'width': 640}],
'name': 'Cuz I Love You (Deluxe)',
'release_date': '2019-05-03',
'release_date_precision': 'day',
'total_tracks': 14,
'type': 'album',
'uri': 'spotify:album:74gSdSHe71q7urGWMMn3qB'}]}

The code you posted isn't syntactically correct; running it through a Python interpreter gives a syntax error on the last line. It looks like you lost a curly brace somewhere toward the end. :)
I went through it and fixed up the white space to make the structure easier to see; the way you had it formatted made it hard to see which keys were at which level of nesting, but with consistent indentation it becomes much clearer:
lizzo_top_tracks = {
    'tracks': [{
        'album': {
            'album_type': 'album',
            'artists': [{
                'external_urls': {
                    'spotify': 'https://open.spotify.com/artist/56oDRnqbIiwx4mymNEv7dS'
                },
                'href': 'https://api.spotify.com/v1/artists/56oDRnqbIiwx4mymNEv7dS',
                'id': '56oDRnqbIiwx4mymNEv7dS',
                'name': 'Lizzo',
                'type': 'artist',
                'uri': 'spotify:artist:56oDRnqbIiwx4mymNEv7dS'
            }],
            'external_urls': {
                'spotify': 'https://open.spotify.com/album/74gSdSHe71q7urGWMMn3qB'
            },
            'href': 'https://api.spotify.com/v1/albums/74gSdSHe71q7urGWMMn3qB',
            'id': '74gSdSHe71q7urGWMMn3qB',
            'images': [{'height': 640, 'width': 640}],
            'name': 'Cuz I Love You (Deluxe)',
            'release_date': '2019-05-03',
            'release_date_precision': 'day',
            'total_tracks': 14,
            'type': 'album',
            'uri': 'spotify:album:74gSdSHe71q7urGWMMn3qB'
        }
    }]
}
So the first (and only) value you get for i in lizzo_top_tracks['tracks'] is going to be this dictionary:
i = {
    'album': {
        'album_type': 'album',
        'artists': [{
            'external_urls': {
                'spotify': 'https://open.spotify.com/artist/56oDRnqbIiwx4mymNEv7dS'
            },
            'href': 'https://api.spotify.com/v1/artists/56oDRnqbIiwx4mymNEv7dS',
            'id': '56oDRnqbIiwx4mymNEv7dS',
            'name': 'Lizzo',
            'type': 'artist',
            'uri': 'spotify:artist:56oDRnqbIiwx4mymNEv7dS'
        }],
        'external_urls': {
            'spotify': 'https://open.spotify.com/album/74gSdSHe71q7urGWMMn3qB'
        },
        'href': 'https://api.spotify.com/v1/albums/74gSdSHe71q7urGWMMn3qB',
        'id': '74gSdSHe71q7urGWMMn3qB',
        'images': [{'height': 640, 'width': 640}],
        'name': 'Cuz I Love You (Deluxe)',
        'release_date': '2019-05-03',
        'release_date_precision': 'day',
        'total_tracks': 14,
        'type': 'album',
        'uri': 'spotify:album:74gSdSHe71q7urGWMMn3qB'
    }
}
The only key in this dictionary is 'album', the value of which is another dictionary that contains all the other information. If you want to print, say, the album release date and a list of the artists' names, you'd do:
for track in lizzo_top_tracks['tracks']:
    print('Album Release Date: ' + track['album']['release_date'])
    print('Artists: ' + str([artist['name'] for artist in track['album']['artists']]))
If these are dictionaries that you're building yourself, you might want to remove some of the nesting layers where there's only a single key, since they just make it harder to navigate the structure without giving you any additional information. For example:
lizzo_top_albums = [{
    'album_type': 'album',
    'artists': [{
        'external_urls': {
            'spotify': 'https://open.spotify.com/artist/56oDRnqbIiwx4mymNEv7dS'
        },
        'href': 'https://api.spotify.com/v1/artists/56oDRnqbIiwx4mymNEv7dS',
        'id': '56oDRnqbIiwx4mymNEv7dS',
        'name': 'Lizzo',
        'type': 'artist',
        'uri': 'spotify:artist:56oDRnqbIiwx4mymNEv7dS'
    }],
    'external_urls': {
        'spotify': 'https://open.spotify.com/album/74gSdSHe71q7urGWMMn3qB'
    },
    'href': 'https://api.spotify.com/v1/albums/74gSdSHe71q7urGWMMn3qB',
    'id': '74gSdSHe71q7urGWMMn3qB',
    'images': [{'height': 640, 'width': 640}],
    'name': 'Cuz I Love You (Deluxe)',
    'release_date': '2019-05-03',
    'release_date_precision': 'day',
    'total_tracks': 14,
    'type': 'album',
    'uri': 'spotify:album:74gSdSHe71q7urGWMMn3qB'
}]
This structure allows you to write the query the way you were originally trying to do it:
for album in lizzo_top_albums:
    print('Album Release Date: ' + album['release_date'])
    print('Artists: ' + str([artist['name'] for artist in album['artists']]))
Much simpler, right? :)
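
When the nesting comes from an API and cannot be changed, a small lookup helper can avoid the KeyError from the question. This is only a sketch: the function name dig and its default parameter are my own invention, and the track dictionary is a shortened stand-in for one entry of lizzo_top_tracks['tracks'].

```python
def dig(obj, *path, default=None):
    """Follow path (dict keys or list indexes) through obj; return default if a step is missing."""
    current = obj
    for step in path:
        try:
            current = current[step]
        except (KeyError, IndexError, TypeError):
            return default
    return current


# Shortened stand-in for one track from the question
track = {
    "album": {
        "name": "Cuz I Love You (Deluxe)",
        "release_date": "2019-05-03",
        "artists": [{"name": "Lizzo"}],
    }
}

print(dig(track, "album", "release_date"))        # 2019-05-03
print(dig(track, "album", "artists", 0, "name"))  # Lizzo
print(dig(track, "release_date", default="n/a"))  # n/a
```

The last call mirrors the failing lookup from the question: release_date is not a key of the track itself, so the default is returned instead of an exception being raised.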

Create 2 records from JSON Array having Structs

I have a JSON array which is in format below:
{
"id": "1",
"active": "True",
"gender": "female",
"coding": [
{
"system": "http://loinc.org",
"code": "8310-5",
"display": "Body temperature"
},
{
"system": "http://loinc.org",
"code": "8716-3",
"display": "Vital Signs grouping"
}
]
}
I need the output as two records. Is this possible? Can someone help me with the Python code?
{"id": "1","active": "True","gender": "female",{"system": "http://loinc.org","code": "8310-5","display": "Body temperature"},
{"id": "1","active": "True","gender": "female",{"system": "http://loinc.org","code": "8716-3","display": "Vital Signs grouping"}

I'm going to assume you want the codings in their own key since your question wasn't clear
import json
obj = json.loads(s) # where s is your json string
objs = [] # where we will store the results
for coding in obj['coding']:
    new_obj = obj.copy()
    new_obj['coding'] = coding # set the coding entry to one coding
    objs.append(new_obj)
Output of objs:
[{'active': 'True',
'coding': {'code': '8310-5',
'display': 'Body temperature',
'system': 'http://loinc.org'},
'gender': 'female',
'id': '1'},
{'active': 'True',
'coding': {'code': '8716-3',
'display': 'Vital Signs grouping',
'system': 'http://loinc.org'},
'gender': 'female',
'id': '1'}]
If you want just a flat dict then
objs = []
for coding in obj['coding']:
    new_obj = obj.copy()
    del new_obj['coding']
    new_obj.update(coding)
    objs.append(new_obj)
Now objs is:
[{'active': 'True',
'code': '8310-5',
'display': 'Body temperature',
'gender': 'female',
'id': '1',
'system': 'http://loinc.org'},
{'active': 'True',
'code': '8716-3',
'display': 'Vital Signs grouping',
'gender': 'female',
'id': '1',
'system': 'http://loinc.org'}]

You can do it like this:
import json
input_dict = json.loads(myjson)
base = input_dict.copy()
base.pop('coding')
output = [dict(base, **c) for c in input_dict['coding']]
print(output)
Output:
[{'active': 'True', 'code': '8310-5', 'display': 'Body temperature', 'gender': 'female', 'id': '1', 'system': 'http://loinc.org'},
{'active': 'True', 'code': '8716-3', 'display': 'Vital Signs grouping', 'gender': 'female', 'id': '1', 'system': 'http://loinc.org'}]
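
Both answers leave the input string (s or myjson) undefined, so here is a self-contained sketch of the flat variant that also prints each record as a JSON string. The generator name explode is my own choice, and myjson is assumed to hold the document from the question.

```python
import json


def explode(record, key):
    """Yield one flat dict per element of record[key], merged with the remaining fields."""
    base = {k: v for k, v in record.items() if k != key}
    for item in record[key]:
        yield {**base, **item}


# Assumed input: the JSON document from the question as a string
myjson = '''
{"id": "1", "active": "True", "gender": "female",
 "coding": [
   {"system": "http://loinc.org", "code": "8310-5", "display": "Body temperature"},
   {"system": "http://loinc.org", "code": "8716-3", "display": "Vital Signs grouping"}
 ]}
'''
records = list(explode(json.loads(myjson), "coding"))
for r in records:
    print(json.dumps(r))  # one JSON object per line, with double-quoted strings
```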
