I have a few columns in my data that are encoded in JSON format. I am trying to expand this data into several columns in a Python pandas dataframe. I am having problems with empty JSON objects: they prevent me from using the json.loads function to decode the JSON into several columns.
Here is my code:
from json import loads, dumps
import pandas as pd

json_columns = ['customDimensions','device','geoNetwork','hits','totals','trafficSource']
for json in json_columns:
    # Strip the surrounding brackets from list-wrapped values
    if df[json][0].startswith('['):
        df[json] = df[json].apply(lambda x: x[1:-1])
    # Swap single quotes for double quotes so the text resembles JSON
    df[json] = df[json].apply(lambda x: x.replace('\'',"\""))
    json_load = df[json].apply(loads)
    json_list = list(json_load)
    json_data = dumps(json_list)
    df = df.join(pd.read_json(json_data))
    df = df.drop(json,axis=1)
Here are some examples:
"customDimensions": [], "customMetrics": [], "customVariables": [],"experiment": []
I want these empty objects to come back as NULL when they are decoded into a pandas data frame. The hits column contains the empty JSON objects shown above, which results in the error below.
JSONDecodeError: Expecting value: line 1 column 79 (char 78)
I have shared a sample of my source data frame in case anyone needs to look at it.
URL: https://drive.google.com/file/d/142tOk03WxxPF30xE9G_UtJ2j0KbpHWc4/view?usp=sharing
Thanks
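One possible way to handle this, sketched under the assumption that the cells hold Python-literal-style strings (single quotes, True/False) like the sample above. The helper parse_cell, the prefixed column names and the two-row sample frame are hypothetical and are not taken from the shared file. ast.literal_eval is used in place of json.loads so that no quote replacement is needed, and empty lists or dicts are turned into None so they end up as NaN after expansion.

```python
import ast

import pandas as pd


def parse_cell(text):
    """Parse one cell; return None for empty or unparsable values."""
    try:
        value = ast.literal_eval(text)
    except (ValueError, SyntaxError, TypeError):
        return None
    if isinstance(value, list):
        # Unwrap single-element lists; an empty list becomes None.
        value = value[0] if value else None
    return value or None


# Hypothetical sample with the same shape as the columns described above.
df = pd.DataFrame({
    "id": [1, 2],
    "device": [
        "{'browser': 'Chrome', 'isMobile': False}",
        "{'browser': 'Safari', 'isMobile': True}",
    ],
    "customDimensions": ["[{'index': '4', 'value': 'EMEA'}]", "[]"],
})

for col in ["device", "customDimensions"]:
    parsed = df[col].apply(parse_cell)
    # Rows that parsed to None contribute an empty dict, i.e. a row of NaN.
    expanded = pd.json_normalize([p if isinstance(p, dict) else {} for p in parsed])
    expanded = expanded.add_prefix(col + ".")
    expanded.index = df.index
    df = df.drop(columns=col).join(expanded)

print(df)
```

Prefixing each expanded column with its source column name avoids clashes when two JSON columns share a key.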
Related
Hi, I am calling a web service from Python with the following code:
response = urllib.request.urlopen(req)
string = response.read().decode('utf-8')
json_obj = json.loads(string)
df = pd.DataFrame(json_obj)
print(df)
The result of this is:
                     Results
forecast             [2.1632421537363355]
index                [{'SaleDate': 1644278400000, 'OfferingGroupId': 0...
prediction_interval  [[-114.9747272420262, 119.30121154949884]]
What I am trying to do now is to get the data into a DataFrame like this:
Forecast            SaleDate    OfferingGroupId
2.1632421537363355  2022-02-08  0
I have tried so many different things that I have lost count.
Could you please help me with this?
You could first convert the json string to a dictionary (thanks @JonSG):
import json
response = urllib.request.urlopen(req)
string = response.read().decode('utf-8')
data = json.loads(string)
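or, if you make the request with the requests library rather than urllib (whose response object has no json method), use the json method of the response: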
data = response.json()
then use pandas.json_normalize where you can directly pass in the record and meta paths of your data to convert the dictionary to a pandas DataFrame object:
import pandas as pd
out = pd.json_normalize(data['Results'], record_path = ['index'], meta = ['forecast'])
Output:
        SaleDate  OfferingGroupId  forecast
0  1644278400000                0  2.163242
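To get closer to the layout asked for in the question (a readable date and a Forecast column), something like the following may work. It is only a sketch: the data dictionary is a hypothetical stand-in rebuilt from the printed output, trimmed to the two index fields that are visible, and it assumes that SaleDate is a Unix timestamp in milliseconds and that there is one forecast value per index record.

```python
import pandas as pd

# Hypothetical payload shaped like the response printed in the question.
data = {
    "Results": {
        "forecast": [2.1632421537363355],
        "index": [{"SaleDate": 1644278400000, "OfferingGroupId": 0}],
        "prediction_interval": [[-114.9747272420262, 119.30121154949884]],
    }
}

results = data["Results"]

# One row per record under "index".
out = pd.json_normalize(results["index"])

# Attach the forecasts positionally (assumes one forecast per index record).
out["Forecast"] = results["forecast"]

# Convert the millisecond timestamp into a datetime.
out["SaleDate"] = pd.to_datetime(out["SaleDate"], unit="ms")

out = out[["Forecast", "SaleDate", "OfferingGroupId"]]
print(out)
```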
I have a data frame with 2 columns, one of which consists of URLs.
Sample code:
# DataFrame.append was removed in pandas 2.0, so the rows are passed to the constructor
df = pd.DataFrame([
    {'name': 'sample_name', 'image': 'https://images.pexels.com/photos/736230/pexels-photo-736230.jpeg?auto=compress&cs=tinysrgb&dpr=1&w=500'},
    {'name': 'sample_name2', 'image': 'https://cdn.theatlantic.com/assets/media/img/mt/2017/10/Pict1_Ursinia_calendulifolia/lead_720_405.jpg?mod=1533691909'},
], columns=['name', 'image'])
I want to convert this dataframe to JSON directly. I used the DataFrame's to_json() method, but it mangles the URLs in the data frame.
Conversion to JSON:
json = df.to_json(orient='records')
When I print it, I see that the conversion inserts a '\' character before every '/' character in my URLs.
print(json)
Result:
[{"name":"sample_name","image":"https:\/\/images.pexels.com\/photos\/736230\/pexels-photo-736230.jpeg?auto=compress&cs=tinysrgb&dpr=1&w=500"},{"name":"sample_name2","image":"https:\/\/cdn.theatlantic.com\/assets\/media\/img\/mt\/2017\/10\/Pict1_Ursinia_calendulifolia\/lead_720_405.jpg?mod=1533691909"}]
I want the json to look like (no extra '\' in urls):
[{"name":"sample_name","image":"https://images.pexels.com/photos/736230/pexels-photo-736230.jpeg?auto=compress&cs=tinysrgb&dpr=1&w=500"},{"name":"sample_name2","image":"https://cdn.theatlantic.com/assets/media/img/mt/2017/10/Pict1_Ursinia_calendulifolia/lead_720_405.jpg?mod=1533691909"}]
I checked documentation of to_json() and other questions as well but couldn't find an answer to deal with it. How can I just convert my url strings to json, as they are in data frame?
Pandas uses ujson [PyPI] internally to encode the data to a JSON blob. By default ujson escapes forward slashes; this behaviour is controlled by its escape_forward_slashes option.
You can just json.dumps(…) the result of converting your dataframe to a dictionary with .to_dict:
>>> import json
>>> print(json.dumps(df.to_dict('records')))
[{"name": "sample_name", "image": "https://images.pexels.com/photos/736230/pexels-photo-736230.jpeg?auto=compress&cs=tinysrgb&dpr=1&w=500"}, {"name": "sample_name2", "image": "https://cdn.theatlantic.com/assets/media/img/mt/2017/10/Pict1_Ursinia_calendulifolia/lead_720_405.jpg?mod=1533691909"}]
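A self-contained sketch of the same idea is shown here so the two encodings can be compared side by side. The one-row frame and the example.com URL are hypothetical. Both strings decode to the same data, since "\/" is a valid JSON escape for "/", so the difference is purely cosmetic.

```python
import json

import pandas as pd

# Hypothetical one-row frame with a URL column.
df = pd.DataFrame({
    "name": ["sample_name"],
    "image": ["https://example.com/photos/1.jpeg"],
})

# pandas' built-in encoder.
via_pandas = df.to_json(orient="records")

# The standard library encoder leaves forward slashes untouched.
via_json = json.dumps(df.to_dict("records"))

print(via_pandas)
print(via_json)

# Either string parses back to identical Python data.
same_data = json.loads(via_pandas) == json.loads(via_json)
print(same_data)
```

Note that json.dumps only handles plain Python types, so columns holding timestamps or similar objects may need converting first, for example by passing default=str.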
I have downloaded a sample dataset from here that is a series of JSON objects.
{...}
{...}
I need to load them into a pandas dataframe. I tried the code below:
import pandas as pd
import json
filename = "sample-S2-records"
df = pd.DataFrame.from_records(map(json.loads, "sample-S2-records"))
But there seems to be a parsing error:
JSONDecodeError: Expecting value: line 1 column 1 (char 0)
What am I missing?
You can try pandas.read_json method:
import pandas as pd
data = pd.read_json('/path/to/file.json', lines=True)
print(data)
I have tested it with this file, it works fine
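For a quick check without the downloaded file, the same call can be tried on an in-memory sample. This is a minimal sketch: the two records are made up, and StringIO stands in for the file on disk.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the file: one JSON object per line.
raw = '{"id": 1, "name": "a"}\n{"id": 2, "name": "b"}\n'

# lines=True tells read_json to parse each line as a separate record.
df = pd.read_json(StringIO(raw), lines=True)
print(df)
```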
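The function needs a list of JSON objects. For example,
data = [json_obj_1, json_obj_2, ...]
The file does not use list syntax; it just holds a series of JSON objects, one per line. Note also that map(json.loads, "sample-S2-records") iterates over the characters of the file name, not the lines of the file, which is why parsing fails at the very first character. The following would solve the issue: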
import json
from io import StringIO
import pandas as pd
# Load content to a variable
with open('../sample-S2-records/sample-S2-records', 'r') as content_file:
    content = content_file.read().strip()
# Split content by new line
content = content.split('\n')
# Read each line which has a json obj and store json obj in a list
json_list = []
for each_line in content:
    json_list.append(json.loads(each_line))
# Serialize the list to a JSON string and load it (recent pandas versions expect a file-like object rather than a literal string)
df = pd.read_json(StringIO(json.dumps(json_list)))
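A slightly shorter variant of the same approach is sketched below. It assumes one JSON object per line and simply skips blank lines. The helper name load_json_lines and the temporary sample file are hypothetical. Since the parsed records are already Python dictionaries, they can be handed to the DataFrame constructor without being re-serialized.

```python
import json
import os
import tempfile

import pandas as pd


def load_json_lines(path):
    """Parse a file holding one JSON object per line into a DataFrame."""
    records = []
    with open(path, "r", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return pd.DataFrame(records)


# Demo on a small temporary file (made-up records, including a blank line).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.jsonl")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write('{"a": 1, "b": "x"}\n\n{"a": 2, "b": "y"}\n')
    df = load_json_lines(path)

print(df)
```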
I am using Python 3.6 and trying to load a JSON file (350 MB) into a pandas dataframe using the code below. However, I get the following error:
data_json_str = "[" + ",".join(data) + "]"
TypeError: sequence item 0: expected str instance, bytes found
How can I fix the error?
import pandas as pd
# read the entire file into a python array
with open('C:/Users/Alberto/nutrients.json', 'rb') as f:
    data = f.readlines()
# remove the trailing "\n" from each line
data = map(lambda x: x.rstrip(), data)
# each element of 'data' is an individual JSON object.
# i want to convert it into an *array* of JSON objects
# which, in and of itself, is one large JSON object
# basically... add square brackets to the beginning
# and end, and have all the individual business JSON objects
# separated by a comma
data_json_str = "[" + ",".join(data) + "]"
# now, load it into pandas
data_df = pd.read_json(data_json_str)
From your code, it looks like you're loading a JSON file which has JSON data on each separate line. read_json supports a lines argument for data like this:
data_df = pd.read_json('C:/Users/Alberto/nutrients.json', lines=True)
Note
Remove lines=True if you have a single JSON object instead of individual JSON objects on each line.
Using the json module you can parse the json into a python object, then create a dataframe from that:
import json
import pandas as pd
with open('C:/Users/Alberto/nutrients.json', 'r') as f:
    data = json.load(f)
df = pd.DataFrame(data)
If you open the file in binary mode ('rb'), you will get bytes. How about opening it in text mode instead:
with open('C:/Users/Alberto/nutrients.json', 'r') as f:
    data = f.readlines()
Also, as noted in this answer, you can use pandas directly:
df = pd.read_json('C:/Users/Alberto/nutrients.json', lines=True)
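The bytes-versus-str point can be reproduced in a few lines. This is an illustrative sketch with two made-up lines standing in for what readlines() returns in 'rb' mode. It assumes the file is UTF-8 encoded.

```python
import json

# Hypothetical lines as they would come back from a file opened with 'rb'.
lines_as_bytes = [b'{"a": 1}\n', b'{"a": 2}\n']

# Joining bytes with a str separator raises the TypeError from the question.
try:
    ",".join(lines_as_bytes)
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Decoding each line first (or opening the file in text mode) avoids it.
decoded = [line.decode("utf-8").rstrip() for line in lines_as_bytes]
data_json_str = "[" + ",".join(decoded) + "]"
records = json.loads(data_json_str)
print(records)
```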
If you want to convert it into an array of JSON objects, I think this will do what you want:
import json
data = []
with open('nutrients.json', errors='ignore') as f:
    for line in f:
        data.append(json.loads(line))
print(data[0])
The easiest way to read a JSON file using pandas is:
pd.read_json("sample.json", lines=True, orient='columns')
To deal with nested JSON like this:
[[{"Value1": 1}, {"value2": 2}], [{"value3": 3}, {"value4": 4}], ...]
use plain Python indexing:
value1 = df['column_name'][0][0].get('Value1')
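A runnable sketch of that lookup follows, using a made-up nested list in the shape shown above. The column name column_name comes from the answer; the flattening step at the end is an extra assumption about what one might want next, not part of the original suggestion.

```python
import pandas as pd

# Hypothetical nested data: each cell holds a list of small dictionaries.
nested = [
    [{"Value1": 1}, {"value2": 2}],
    [{"value3": 3}, {"value4": 4}],
]
df = pd.DataFrame({"column_name": nested})

# First row -> first dict in the list -> value stored under "Value1".
value1 = df["column_name"][0][0].get("Value1")
print(value1)

# Optional: merge the dicts of each row into one flat record per row.
flat = pd.DataFrame(
    [{key: val for d in row for key, val in d.items()} for row in nested]
)
print(flat)
```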
Please see the code below:
#call the pandas library
import pandas as pd
#set the file location as URL or filepath of the json file
url = 'https://www.something.com/data.json'
#load the json data from the file to a pandas dataframe
df = pd.read_json(url, orient='columns')
#display the top 10 rows from the dataframe (this is to test only)
df.head(10)
Please review the code and modify based on your need. I have added comments to explain each line of code. Hope this helps!
I'm trying to write Python code to build a nested JSON file from a flat table in a pandas data frame. I created a dictionary of dictionaries from the pandas dataframe. When I try to export the dict to JSON, I get an error that the unicode text is not a string. How can I convert dictionaries with unicode strings to JSON?
My current code is:
data = pandas.read_excel(inputExcel, sheetname = 'SCAT Teams', encoding = 'utf8')
columnList = tuple(data[0:])
for index, row in data.iterrows():
    dataRow = tuple(row)
    rowDict = dict(zip(dataRow[2:],columnList[2:]))
    memberId = str(tuple(row[1]))
    teamName = str(tuple(row[0]))
    memberDict1 = {memberId[1:2]:rowDict}
    memberDict2 = {teamName:memberDict1}
This produces a dict of dicts where each row looks like this:
'(1L,)': {'0': {(u'Doe',): (u'lastname',), (u'John',): (u'firstname',), (u'none',): (u'mobile',), (u'916-555-1234',): (u'phone',), (u'john.doe#wildlife.net',): (u'email',), (u'Anon',): (u'orgname',)}}}
But when I try to dump to JSON, the unicode text can't be parsed as strings, so I get this error:
TypeError: key (u'teamname',) is not a string
How can I convert my nested dicts to JSON without invoking the error?