JSON to Pandas: is there a more elegant solution? - python

I have some JSON, returned from an API call, that looks something like this:
{
    "result": {"code": "OK", "msg": ""},
    "report_name": "FAMOUS_DICTATORS",
    "columns": ["rank", "name", "deaths"],
    "data": [
        {"row": [1, "Mao Zedong", 63000000]},
        {"row": [2, "Jozef Stalin", 23000000]}
    ]
}
I'd like to convert the JSON into a Pandas DataFrame:
rank name deaths
1 Mao Zedong 63000000
2 Jozef Stalin 23000000
I wrote this and it works, but looks a bit ugly:
import pandas as pd
import json
columns = eval(r.content)['columns']
df = pd.DataFrame(columns = eval(r.content)['columns'])
for row in eval(r.content)['data']:
    df.loc[len(df)+1] = row['row']
Is there a more elegant/Pythonic way to do this (e.g. possibly using pandas.io.json.read_json)?

pandas' read_json is a tricky function to use. If you aren't certain that your JSON is valid, or that its structure maps cleanly onto a DataFrame, it is safer to break the data down yourself into something pandas can consume reliably.
In your case, I suggest reducing the data to a list of lists. Out of all that JSON, the only parts you really need are under the "data" and "columns" keys.
Try this:
import pandas as pd
import json

with open("test.json") as f:   # or js = r.json() for a live requests response
    js = json.load(f)

data = js["data"]
rows = [row["row"] for row in data]  # transform the 'row' keys to a list of lists
df = pd.DataFrame(rows, columns=js["columns"])
print(df)
This gives me the desired result:
rank name deaths
0 1 Mao Zedong 63000000
1 2 Jozef Stalin 23000000
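A note on the question's eval(r.content): with requests you can parse the body safely via r.json() instead. A minimal, self-contained sketch, with the question's payload inlined where r.json() would normally supply it:

```python
import pandas as pd

# The question's API response, inlined; with requests you would
# obtain this via js = r.json() rather than eval(r.content).
js = {
    "result": {"code": "OK", "msg": ""},
    "report_name": "FAMOUS_DICTATORS",
    "columns": ["rank", "name", "deaths"],
    "data": [
        {"row": [1, "Mao Zedong", 63000000]},
        {"row": [2, "Jozef Stalin", 23000000]},
    ],
}

# Pull each 'row' list out and build the frame in one step.
df = pd.DataFrame([d["row"] for d in js["data"]], columns=js["columns"])
print(df)
```

This replaces the row-by-row df.loc[len(df)+1] appends with a single constructor call, which is both faster and easier to read.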

See pandas.read_json (in older pandas, pandas.io.json.read_json):
pandas.read_json(path_or_buf=None, orient=None, typ='frame', dtype=True, convert_axes=True, convert_dates=True, keep_default_dates=True, numpy=False, precise_float=False, date_unit=None)
http://pandas.pydata.org/pandas-docs/dev/generated/pandas.io.json.read_json.html

Related

Convert Embedded JSON Dict To Panda DataFrame Where Column Headers Are Separate From Values

I'm trying to create a python pandas DataFrame out of a JSON dictionary. The embedding is tripping me up.
The column headers are in a different section of the JSON file to the values.
The json looks similar to below. There is one section of column headers and multiple sections of data.
I need each column filled with the data that relates to it. So value_one in each case will fill the column under header_one and so on.
I have come close, but can't seem to get it to spit out the dataframe as described.
{
    "my_data": {
        "column_headers": ["header_one", "header_two", "header_three"],
        "values": [
            {"data": ["value_one", "value_two", "value_three"]},
            {"data": ["value_one", "value_two", "value_three"]}
        ]
    }
}
Assuming your dictionary is my_dict, try:
>>> pd.DataFrame(data=[d["data"] for d in my_dict["my_data"]["values"]],
columns=my_dict["my_data"]["column_headers"])
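For completeness, here is the same one-liner run against the sample data above, as a self-contained sketch with the JSON inlined as a Python dict:

```python
import pandas as pd

# The question's sample, inlined as my_dict.
my_dict = {
    "my_data": {
        "column_headers": ["header_one", "header_two", "header_three"],
        "values": [
            {"data": ["value_one", "value_two", "value_three"]},
            {"data": ["value_one", "value_two", "value_three"]},
        ],
    }
}

# Each 'data' list becomes one row; 'column_headers' names the columns.
df = pd.DataFrame(data=[d["data"] for d in my_dict["my_data"]["values"]],
                  columns=my_dict["my_data"]["column_headers"])
print(df)
```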

Deeply nested json - a list within a dictionary to Pandas DataFrame

I'm trying to parse nested json results.
data = {
    "results": [{
        "components": [{
            "times": {
                "periods": [{
                    "fromDayOfWeek": 0,
                    "fromHour": 12,
                    "fromMinute": 0,
                    "toDayOfWeek": 4,
                    "toHour": 21,
                    "toMinute": 0,
                    "id": 156589,
                    "periodId": 20855
                }]
            }
        }]
    }]
}
I can get to and create dataframes for "results" and "components" lists, but cannot get to "periods" due to the "times" dict. So far I have this:
df = pd.json_normalize(data, record_path = ['results','components'])
Need a separate "periods" dataframe with the included column names and values. Would appreciate your help on this. Thank you!
The nesting here is four levels deep:
results > components > times > periods
json_normalize should be the correct approach:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.json_normalize.html
There can be x components per result and y periods per component, though that depth of nesting may be over-engineering.
The simplest way of reaching a nested value is chained lookups:
print(data['a']['b']['c']['d'])
In your case, note that "results" and "components" are lists, so each needs an index:
print(data['results'][0]['components'][0]['times']['periods'])
You can collect a specific property from every period with a small helper:
def get_property_from_periods(prop):
    property_list = []
    for component in data['results'][0]['components']:
        for period in component['times']['periods']:
            property_list.append(period[prop])
    return property_list
This gives you access to one property inside periods (fromDayOfWeek, fromHour, fromMinute, ...).
After converting the JSON values, transform them into a pandas DataFrame:
print(pd.DataFrame(data, columns=["columnA", "columnB"]))
If stuck:
How to Create a table with data from JSON output in Python
Python - How to convert JSON File to Dataframe
pandas documentation:
pandas.DataFrame.from_dict
pandas.json_normalize
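For the separate "periods" DataFrame the question asks for, json_normalize can walk the whole path in one call, since record_path accepts a list of keys; intermediate dicts such as "times" are traversed automatically. A sketch against the sample data:

```python
import pandas as pd

# The question's sample, trimmed to the relevant structure.
data = {
    "results": [{
        "components": [{
            "times": {
                "periods": [{
                    "fromDayOfWeek": 0, "fromHour": 12, "fromMinute": 0,
                    "toDayOfWeek": 4, "toHour": 21, "toMinute": 0,
                    "id": 156589, "periodId": 20855,
                }]
            }
        }]
    }]
}

# Walk results -> components -> times -> periods and flatten each period
# dict into one row, with its keys as column names.
periods = pd.json_normalize(data, record_path=["results", "components", "times", "periods"])
print(periods)
```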

How to convert Nested JSON to Pandas Dataframe

I have the following nested json:
import requests
headers = {
    'Content-Type': 'application/json'
}
r = requests.get("...API DATA...")
[
    {
        "ticker": "btcusd",
        "baseCurrency": "btc",
        "quoteCurrency": "usd",
        "priceData": [{
            "open": 3914.749407813885,
            "high": 3942.374263716895,
            "low": 3846.1755315352952,
            "close": 3849.1217299601617,
            "date": "2019-01-02T00:00:00+00:00",
            "tradesDone": 756.0,
            "volume": 339.68131616889997,
            "volumeNotional": 1307474.735327181
        }]
    }
]
I would like to convert it into Pandas data frame. I have been able to do this:
j = r.json()
df = pd.DataFrame.from_dict(j)
df
Output:
I would like to also expand 'priceData' in columns.
I have tried different methods, including json_normalize and json.loads, but there was always an error I could not understand.
Could please anyone show me how to do it in order to understand?
Thanks!
EDIT
priceData contains more than 1 element.
"ticker", "baseCurrency", and "quoteCurrency" are not essentially needed in the data frame, hence can be discarded.
Use json_normalize:
import pandas as pd
pd.json_normalize(data,'priceData',['ticker','baseCurrency','quoteCurrency'])
where,
data = [
    {
        "ticker": "btcusd",
        "baseCurrency": "btc",
        "quoteCurrency": "usd",
        "priceData": [{
            "open": 3914.749407813885,
            "high": 3942.374263716895,
            "low": 3846.1755315352952,
            "close": 3849.1217299601617,
            "date": "2019-01-02T00:00:00+00:00",
            "tradesDone": 756.0,
            "volume": 339.68131616889997,
            "volumeNotional": 1307474.735327181
        }]
    }
]
Edit: for multiple API calls, accumulate the records first and normalize once at the end:
final_data = []
for _ in range(3):  # however many times you want to call the API
    # ... get your data from each API call into `data` ...
    final_data.extend(data)
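Putting the edit together: collect every response into final_data, then normalize the whole list in one call. A self-contained sketch in which a fixed payload stands in for each API call:

```python
import pandas as pd

# Stand-in for one API response (the question's sample, trimmed).
page = [
    {
        "ticker": "btcusd",
        "baseCurrency": "btc",
        "quoteCurrency": "usd",
        "priceData": [{
            "open": 3914.749407813885, "high": 3942.374263716895,
            "low": 3846.1755315352952, "close": 3849.1217299601617,
            "date": "2019-01-02T00:00:00+00:00", "tradesDone": 756.0,
            "volume": 339.68131616889997, "volumeNotional": 1307474.735327181,
        }],
    }
]

final_data = []
for _ in range(3):  # pretend we called the API three times
    final_data.extend(page)

# One row per priceData record, with the three meta fields repeated on each row.
df = pd.json_normalize(final_data, "priceData",
                       ["ticker", "baseCurrency", "quoteCurrency"])
print(df)
```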

Why is a string integer read incorrectly with pandas.read_json?

I am not one for hyperbole, but I am really stumped by this error, and I am sure you will be too.
Here is a simple json object:
[
    {
        "id": "7012104767417052471",
        "session": -1332751885,
        "transactionId": "515934477",
        "ts": "2019-10-30 12:15:40 AM (+0000)",
        "timestamp": 1572394540564,
        "sku": "1234",
        "price": 39.99,
        "qty": 1,
        "ex": [{
            "expId": 1007519,
            "versionId": 100042440,
            "variationId": 100076318,
            "value": 1
        }]
    }
]
Now I saved the file into ex.json and then executed the following python code:
import pandas as pd
df = pd.read_json('ex.json')
When I view the dataframe, the value of my id has changed from "7012104767417052471" to "7012104767417052160".
Does anyone understand why Python does this? I tried it in Node.js and even Excel, and it looks fine everywhere else.
If I do this I get the right id:
with open('Siva.json') as data_file:
    data = json.load(data_file)
df = json_normalize(data)
But I want to understand why pandas doesn't parse this JSON correctly.
This is a known issue, open since 2018-04-04:
read_json reads large integers as strings incorrectly if dtype not explicitly mentioned #20608
As stated in the issue, explicitly designate the dtype to get the correct number:
import pandas as pd
df = pd.read_json('test.json', dtype={'id': 'int64'})
id session transactionId ts timestamp sku price qty ex
7012104767417052471 -1332751885 515934477 2019-10-30 12:15:40 AM (+0000) 2019-10-30 00:15:40.564 1234 39.99 1 [{'expId': 1007519, 'versionId': 100042440, 'variationId': 100076318, 'value': 1}]
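The mangled digits are plain float64 rounding: without an explicit dtype, the value passes through a double, and above 2**62 representable doubles are spaced 1024 apart, so the id snaps to the nearest one. That nearest value is exactly the wrong id the question observed:

```python
s = "7012104767417052471"  # the 19-digit id from the JSON

# Parsing the digits through a 64-bit float loses the low bits,
# because a double has only 53 bits of mantissa.
print(int(float(s)))
# -> 7012104767417052160, the same wrong id pandas showed
```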

Create a data frame from a complex nested dictionary?

I have a deeply nested JSON file saved in .txt format. I need to access some specific key pairs and create a data frame or another transformed JSON object for further use. Here is a small sample with 2 key pairs.
[
    {
        "ko_id": [819752],
        "concepts": [{
            "id": ["11A71731B880:http://ontology.intranet.com/Taxonomy/116#en"],
            "uri": ["http://ontology.intranet.com/Taxonomy/116"],
            "language": ["en"],
            "prefLabel": ["Client coverage & relationship management"]
        }]
    },
    {
        "ko_id": [819753],
        "concepts": [{
            "id": ["11A71731B880:http://ontology.intranet.com/Taxonomy/116#en"],
            "uri": ["http://ontology.intranet.com/Taxonomy/116"],
            "language": ["en"],
            "prefLabel": ["Client coverage & relationship management"]
        }]
    }
]
The following code loads the data as a list, but I need to access the data as dictionaries: I need the "ko_id", "uri" and "prefLabel" from each record and to put them into a pandas data frame or a dictionary for further analysis.
with open('sample_data.txt') as data_file:
    json_sample = js.load(data_file)
The following code gives me the exact values for the first element, but I do not actually know how to put it all together and build the algorithm that creates the dataframe.
print(sample_dict["ko_id"][0])
print(sample_dict["concepts"][0]["prefLabel"][0])
print(sample_dict["concepts"][0]["uri"][0])
for record in sample_dict:
    df = pd.DataFrame(record['concepts'])
    df['ko_id'] = record['ko_id']
    final_df = final_df.append(df)
You can pass the data to pandas.DataFrame using a generator:
import pandas as pd
import json as js
with open('sample_data.txt') as data_file:
    json_sample = js.load(data_file)

df = pd.DataFrame(data=((key["ko_id"][0],
                         key["concepts"][0]["prefLabel"][0],
                         key["concepts"][0]["uri"][0]) for key in json_sample),
                  columns=("ko_id", "prefLabel", "uri"))
Output:
>>> df
ko_id prefLabel uri
0 819752 Client coverage & relationship management http://ontology.intranet.com/Taxonomy/116
1 819753 Client coverage & relationship management http://ontology.intranet.com/Taxonomy/116
