JSON format for a dictionary with Pandas DataFrame as values - python

I need to return, from a web framework (Flask for instance), a few dataframes and a string in a single Json object. My code looks something like this:
import json
import pandas as pd

data1 = [['Alex', 10], ['Bob', 12], ['Clarke', 13]]
df1 = pd.DataFrame(data1, columns=['Name', 'Age'])
data2 = [['Cycle', 5], ['Run', 1], ['Hike', 7]]
df2 = pd.DataFrame(data2, columns=['Sport', 'Duration'])
test_value = {}
test_value["df1"] = df1.to_json(orient='records')
test_value["df2"] = df2.to_json(orient='records')
print(json.dumps(test_value))
This outputs:
{"df1": "[{\"Name\":\"Alex\",\"Age\":10},{\"Name\":\"Bob\",\"Age\":12},{\"Name\":\"Clarke\",\"Age\":13}]", "df2": "[{\"Sport\":\"Cycle\",\"Duration\":5},{\"Sport\":\"Run\",\"Duration\":1},{\"Sport\":\"Hike\",\"Duration\":7}]"}
So there are a number of escape characters in front of every key inside the values of "df1" and "df2". If, on the other hand, I look at test_value directly, I get:
{'df1': '[{"Name":"Alex","Age":10},{"Name":"Bob","Age":12},{"Name":"Clarke","Age":13}]', 'df2': '[{"Sport":"Cycle","Duration":5},{"Sport":"Run","Duration":1},{"Sport":"Hike","Duration":7}]'}
Which is not quite right. What I need is 'df1' to be in double quotes "df1". Short of doing a search and replace in the string, what is the way to achieve that?
I've even tried to create the string myself, doing something like this:
print('\{"test": "{0:.2f}"\}'.format(123))
but I get this error:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-266-1fa35152436c> in <module>
----> 1 print('\{"test": "{0:.2f}"\}'.format(123))
KeyError: '"test"'
which I really don't get :). That said, there must be a better way than a search and replace to turn 'df1' into "df1".
Ideas?
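As an aside, the KeyError in the format attempt happens because str.format treats every { as the start of a replacement field; backslashes do not escape braces. Literal braces are written by doubling them. A minimal sketch:

```python
# '{{' and '}}' produce literal braces; '{0:.2f}' stays a placeholder.
s = '{{"test": "{0:.2f}"}}'.format(123)
print(s)  # {"test": "123.00"}
```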

The data is being converted to JSON twice: once in to_json and again in json.dumps. The solution is to convert the values to plain dictionaries with DataFrame.to_dict and then serialize to JSON only once, with json.dumps:
test_value = {}
test_value["df1"] = df1.to_dict(orient='records')
test_value["df2"] = df2.to_dict(orient='records')
print(json.dumps(test_value))
{"df1": [{"Name": "Alex", "Age": 10},
         {"Name": "Bob", "Age": 12},
         {"Name": "Clarke", "Age": 13}],
 "df2": [{"Sport": "Cycle", "Duration": 5},
         {"Sport": "Run", "Duration": 1},
         {"Sport": "Hike", "Duration": 7}]}
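An equivalent route, if you would rather keep to_json for its handling of dates and NaN, is to parse each string back with json.loads so that the outer json.dumps is the only real serialization step. A sketch:

```python
import json
import pandas as pd

df1 = pd.DataFrame([['Alex', 10], ['Bob', 12], ['Clarke', 13]],
                   columns=['Name', 'Age'])

# to_json returns a str; json.loads turns it back into a list of dicts,
# so the final json.dumps serializes the whole structure only once.
test_value = {"df1": json.loads(df1.to_json(orient='records'))}
print(json.dumps(test_value))
```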


parse JSON file to CSV with key values null in python

Example
{"data":"value1","version":"value2","version1":"value3"}
{"data":"value1","version1":"value3"}
{"data":"value1","version1":"value3","hi":{"a":"true,"b":"false"}}
I have a JSON file and need to convert it to CSV; however, the rows do not all have the same columns, and some rows have nested attributes. How can I convert them in a Python script?
I tried JSON-to-CSV Python code, but it gives me an error.
One straightforward way to convert a JSON file to a CSV file in Python is with the pandas library.
import pandas as pd

data = [
    {
        "data": "value1",
        "version": "value2",
        "version1": "value3"
    },
    {
        "data": "value1",
        "version1": "value3"
    },
    {
        "data": "value1",
        "version1": "value3",
        "hi": {
            "a": "true",
            "b": "false"
        }
    }
]

df = pd.DataFrame(data)
df.to_csv('data.csv', index=False)
I have corrected the formatting of your JSON, since the original was giving errors.
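Note that pd.DataFrame leaves the nested "hi" dictionary in a single object column, so the CSV will contain the dict's repr. If you want the nested keys flattened into their own columns, pandas' json_normalize does that, joining nested names with dots. A sketch under that assumption:

```python
import pandas as pd

data = [
    {"data": "value1", "version": "value2", "version1": "value3"},
    {"data": "value1", "version1": "value3"},
    {"data": "value1", "version1": "value3",
     "hi": {"a": "true", "b": "false"}},
]

# Nested dicts become 'hi.a' / 'hi.b' columns; rows missing a key get NaN.
df = pd.json_normalize(data)
df.to_csv('data.csv', index=False)
```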
You could convert the JSON data to a flat list of lists with column names on the first line. Then process that to make the CSV output.
def flatDict(D, p=""):
    if not isinstance(D, dict):
        return {"": D}
    return {p + k + s: v for k, d in D.items() for s, v in flatDict(d, ".").items()}

def flatData(data):
    lines = [*map(flatDict, data)]
    names = dict.fromkeys(k for d in lines for k in d)
    return [[*names]] + [[*map(line.get, names)] for line in lines]
The flatDict function converts a nested dictionary structure to a single-level dictionary, with nested keys combined and brought up to the top level. This is done recursively, so it works for any depth of nesting.
The flatData function processes each record to make a list of flattened dictionaries (lines). The union of all keys in that list forms the list of column names (using a dictionary constructor to keep them in order of appearance). It returns the list of names followed by one list per record, mapping key names to that record's data where present (using the .get() method of dictionaries, which yields None for missing keys).
output:
E = [{"data": "value1", "version": "value2", "version1": "value3"},
     {"data": "value1", "version1": "value3"},
     {"data": "value1", "version1": "value3", "hi": {"a": "true", "b": "false"}}]

for line in flatData(E):
    print(line)

['data', 'version', 'version1', 'hi.a', 'hi.b']   # column names
['value1', 'value2', 'value3', None, None]        # data ...
['value1', None, 'value3', None, None]
['value1', None, 'value3', 'true', 'false']
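To write those rows out as an actual CSV, the csv module handles them directly (None entries become empty cells). A sketch reusing the two helpers above, writing to an in-memory buffer for illustration:

```python
import csv
import io

def flatDict(D, p=""):
    # Non-dict leaves become a single entry; nested keys are joined with '.'
    if not isinstance(D, dict):
        return {"": D}
    return {p + k + s: v for k, d in D.items() for s, v in flatDict(d, ".").items()}

def flatData(data):
    lines = [*map(flatDict, data)]
    names = dict.fromkeys(k for d in lines for k in d)
    return [[*names]] + [[*map(line.get, names)] for line in lines]

E = [{"data": "value1", "version": "value2", "version1": "value3"},
     {"data": "value1", "version1": "value3"},
     {"data": "value1", "version1": "value3", "hi": {"a": "true", "b": "false"}}]

buf = io.StringIO()  # swap for open("out.csv", "w", newline="") to write a file
csv.writer(buf).writerows(flatData(E))
print(buf.getvalue())
```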

Why is a string integer read incorrectly with pandas.read_json?

I am not one for hyperbole, but I am really stumped by this error, and I am sure you will be too.
Here is a simple json object:
[
{
"id": "7012104767417052471",
"session": -1332751885,
"transactionId": "515934477",
"ts": "2019-10-30 12:15:40 AM (+0000)",
"timestamp": 1572394540564,
"sku": "1234",
"price": 39.99,
"qty": 1,
"ex": [
{
"expId": 1007519,
"versionId": 100042440,
"variationId": 100076318,
"value": 1
}
]
}
]
Now I saved the file as ex.json and then executed the following Python code:
import pandas as pd
df = pd.read_json('ex.json')
When I look at the dataframe, the value of my id has changed from "7012104767417052471" to "7012104767417052160".
Does anyone understand why python does this? I tried it in Node.js and even Excel, and it looks fine everywhere else.
If I do this I get the right id:
import json
from pandas import json_normalize

with open('Siva.json') as data_file:
    data = json.load(data_file)
df = json_normalize(data)
But I want to understand why pandas doesn't process the JSON properly.
This is a known pandas issue, open since 2018-04-04:
read_json reads large integers as strings incorrectly if dtype not explicitly mentioned #20608
As stated in the issue, explicitly designate the dtype to get the correct number:
import pandas as pd
df = pd.read_json('test.json', dtype={'id': 'int64'})
id session transactionId ts timestamp sku price qty ex
7012104767417052471 -1332751885 515934477 2019-10-30 12:15:40 AM (+0000) 2019-10-30 00:15:40.564 1234 39.99 1 [{'expId': 1007519, 'versionId': 100042440, 'variationId': 100076318, 'value': 1}]
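The mangled value is exactly what 64-bit floating point produces: pandas parses the number through a double, whose 53-bit mantissa cannot represent every 19-digit integer. A quick check that needs no pandas at all:

```python
# Between 2**62 and 2**63 consecutive doubles are 1024 apart, so the
# value is rounded to the nearest representable multiple of 1024.
print(int(float("7012104767417052471")))  # 7012104767417052160
```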

List key values for Json data file

I have a very long JSON file that I need to make sense of in order to query the data I am interested in. To do this, I would like to extract all of the keys so I know what is available to query. Is there a quick way of doing this, or should I just write a parser that traverses the JSON file and extracts anything between either { and : or , and :?
given the example:
[{"Name": "key1", "Value": "value1"}, {"Name": "key2", "Value": "value2"}]
I am looking for the values:
"Name"
"Value"
That will depend on whether there's any nesting. But the basic pattern is something like this:
import json

with open("foo.json", "r") as fh:
    data = json.load(fh)

all_keys = set()
for datum in data:
    keys = set(datum.keys())
    all_keys.update(keys)
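If the file does turn out to contain nested objects, a small recursive walk can collect dotted key paths instead. A sketch (the "Meta" field is invented here just to show nesting):

```python
def all_key_paths(obj, prefix=""):
    """Yield every key path in a nested dict/list structure, dot-joined."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            path = f"{prefix}.{k}" if prefix else k
            yield path
            yield from all_key_paths(v, path)
    elif isinstance(obj, list):
        for item in obj:
            yield from all_key_paths(item, prefix)

data = [{"Name": "key1", "Value": "value1", "Meta": {"source": "a"}}]
print(sorted(set(all_key_paths(data))))  # ['Meta', 'Meta.source', 'Name', 'Value']
```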
This:
records = [{"Name": "key1", "Value": "value1"}, {"Name": "key2", "Value": "value2"}]
for val in records:
    print(val.keys())
(Note: the list is named records rather than dict so that it doesn't shadow the built-in dict type.)
gives you:
dict_keys(['Name', 'Value'])
dict_keys(['Name', 'Value'])

Load a dataframe from a single json object

I have the following json object:
{
"Name": "David",
"Gender": "M",
"Date": "2014-01-01",
"Address": {
"Street": "429 Ford",
"City": "Oxford",
"State": "DE",
"Zip": 1009
}
}
How would I load this into a pandas dataframe so that it orients itself as:
Name   Gender  Date        Address
David  M       2014-01-01  {...}
What I'm trying now is:
pd.read_json(file)
But it orients it as four records instead of one.
You should read it as a Series and then (optionally) convert to a DataFrame:
df = pd.DataFrame(pd.read_json(file, typ='series')).T
df.shape
#(1, 4)
If your JSON file is composed of one JSON object per line (not an array, and not a pretty-printed JSON object),
then you can use:
df = pd.read_json(file, lines=True)
and it will do what you want
if file contains:
{"Name": "David","Gender": "M","Date": "2014-01-01","Address": {"Street": "429 Ford","City": "Oxford","State": "DE","Zip": 1009}}
on one line, then you get a one-row dataframe with columns Name, Gender, Date, and Address (the Address dict kept in a single cell).
If you use
df = pd.read_json(file, orient='records')
you can load as 1 key per column, but the sub-keys will be split up into multiple rows.
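Another option, if you are happy to flatten the Address keys into their own columns rather than keep the dict in one cell, is pd.json_normalize (available at the top level since pandas 1.0). A sketch:

```python
import pandas as pd

record = {"Name": "David", "Gender": "M", "Date": "2014-01-01",
          "Address": {"Street": "429 Ford", "City": "Oxford",
                      "State": "DE", "Zip": 1009}}

# A single dict becomes a one-row frame; nested keys turn into
# 'Address.Street', 'Address.City', ... columns.
df = pd.json_normalize(record)
print(df.shape)  # (1, 7)
```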

Unpacking nested dictionary list within python

I would be very grateful if someone could suggest a more Pythonic way of handling the following issue:
Problem:
I have a JSON object parsed into a Python object (dict). The issue is that the JSON structure is a list of dictionaries (dict1), and each of these dictionaries contains a nested dictionary (dict2).
I would like to parse all the content of dict1 and merge the contents of dict2 into dict1.
Thereafter, I would like to parse this into pandas.
json_object = {
    "data": [
        {
            "complete": "true",
            "data_two": {
                "a": "5",
                "b": "6",
                "c": "6",
                "d": "8"
            },
            "time": "2016-10-17",
            "End_number": 2
        },
        {
            "complete": "true",
            "data_two": {
                "a": "11",
                "b": "21",
                "c": "31",
                "d": "41"
            },
            "time": "2016-10-17",
            "End_number": 1
        }
    ],
    "Location": "DE",
    "End Zone": 5
}
My attempt:
dataList = json_object['data']
Unpacked_Data = [(d['time'],d['End_number'], d['data_two'].keys(),d['data_two'].values()) for d in dataList]
Unpacked_Data is a list of tuples that now contains (time, end_number, [list of keys], [list of values]).
To use this in a pandas dataframe, I would then need to unpack the two lists within each tuple. Is there an easy way to unpack lists within a tuple?
Is there a better and more elegant/Pythonic way of approaching this problem?
Thanks,
12avi
One way (using pandas) is to start by putting everything into a dataframe, then apply pd.Series to it:
df = pd.DataFrame(Unpacked_Data)
unpacked0 = df[2].apply(lambda x: pd.Series(list(x)))
unpacked1 = df[3].apply(lambda x: pd.Series(list(x)))
pd.concat((df[[0, 1]], unpacked0, unpacked1), axis=1)
The other way is to use list comprehension and argument unpacking:
df = pd.DataFrame([[a,b,*c,*d] for a,b,c,d in Unpacked_Data])
However, the second method may not line up the columns the way you want if the packed lists are not all the same length.
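A more direct route that avoids the tuple stage entirely is to merge each inner dict into its parent with dictionary unpacking, then hand the flat dicts straight to pandas. A sketch using the data above:

```python
import pandas as pd

json_object = {
    "data": [
        {"complete": "true",
         "data_two": {"a": "5", "b": "6", "c": "6", "d": "8"},
         "time": "2016-10-17", "End_number": 2},
        {"complete": "true",
         "data_two": {"a": "11", "b": "21", "c": "31", "d": "41"},
         "time": "2016-10-17", "End_number": 1},
    ],
}

# {**d, **d['data_two']} lifts the inner keys into the outer record;
# the now-redundant 'data_two' entry is then filtered out.
flat = [{k: v for k, v in {**d, **d["data_two"]}.items() if k != "data_two"}
        for d in json_object["data"]]
df = pd.DataFrame(flat)
print(df.columns.tolist())  # ['complete', 'time', 'End_number', 'a', 'b', 'c', 'd']
```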
