Unicode text in pandas dataframe cannot parse to JSON

I'm trying to write Python code to build a nested JSON file from a flat table in a pandas data frame. I created a dictionary of dictionaries from the pandas dataframe. When I try to export the dict to JSON, I get an error that the unicode text is not a string. How can I convert dictionaries with unicode strings to JSON?
My current code is:
data = pandas.read_excel(inputExcel, sheetname = 'SCAT Teams', encoding = 'utf8')
columnList = tuple(data[0:])
for index, row in data.iterrows():
    dataRow = tuple(row)
    rowDict = dict(zip(dataRow[2:],columnList[2:]))
    memberId = str(tuple(row[1]))
    teamName = str(tuple(row[0]))
    memberDict1 = {memberId[1:2]:rowDict}
    memberDict2 = {teamName:memberDict1}
This produces a dict of dicts where each row looks like this:
'(1L,)': {'0': {(u'Doe',): (u'lastname',), (u'John',): (u'firstname',), (u'none',): (u'mobile',), (u'916-555-1234',): (u'phone',), (u'john.doe#wildlife.net',): (u'email',), (u'Anon',): (u'orgname',)}}}
But when I try to dump to JSON, the unicode text can't be parsed as strings, so I get this error:
TypeError: key (u'teamname',) is not a string
How can I convert my nested dicts to JSON without invoking the error?
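
One way to read the error: json.dumps only accepts str, int, float, bool or None as dictionary keys, so tuple keys such as (u'teamname',) are rejected. In the code above, zip() is also called with the row values first, which makes the cell values the keys and the column names the values. The following is a minimal sketch in Python 3, not a drop-in fix; the function name rows_to_nested and the sample columns and rows are made up for illustration. With pandas, list(data.columns) and data.itertuples(index=False) could supply the two arguments.

```python
import json


def rows_to_nested(columns, rows):
    """Build {team: {member_id: {column: value}}} using plain string keys."""
    nested = {}
    for row in rows:
        team_name = str(row[0])   # first column: team name
        member_id = str(row[1])   # second column: member id
        # Column names become the keys, cell values become the values.
        member_fields = dict(zip(columns[2:], row[2:]))
        nested.setdefault(team_name, {})[member_id] = member_fields
    return nested


# Hypothetical sample data standing in for the Excel sheet.
columns = ["teamname", "memberid", "firstname", "lastname"]
rows = [
    ("0", 1, "John", "Doe"),
    ("0", 2, "Jane", "Roe"),
]

nested = rows_to_nested(columns, rows)
json_text = json.dumps(nested, indent=2)
print(json_text)
```

Because every key is a plain string, json.dumps accepts the structure, and all members of a team end up under the same team key instead of overwriting each other on every loop iteration.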

Related

How can I extract columns from my JSON dataset?

My JSON looks like {"metadata"}, but the code below always returns an empty list []. I want to properly extract the keys.
with gzip.open(gzip_file) as file:
    parser = ijson.parse(file)
    objects = ijson.items(parser, 'meta.view.columns.item')
    columns = list(objects)

Empty JSON Objects

I have a few columns in my data that are encoded in JSON format. I am trying to convert this data into several columns in a Python pandas dataframe. I have problems with the empty JSON objects: they prevent me from using the json.loads function to decode the JSON into several columns.
Here is my code:
json_columns = ['customDimensions','device','geoNetwork','hits','totals','trafficSource']
for json in json_columns:
    if df[json][0].startswith('['):
        df[json] = df[json].apply(lambda x: x[1:-1])
    df[json] = df[json].apply(lambda x: x.replace('\'',"\""))
    json_load = df[json].apply(loads)
    json_list = list(json_load)
    json_data = dumps(json_list)
    df = df.join(pd.read_json(json_data))
    df = df.drop(json,axis=1)
Here are some examples:
"customDimensions": [], "customMetrics": [], "customVariables": [],"experiment": []
I want these empty objects to return NULL when they are decoded into a pandas data frame. The hits column has the empty JSON objects shown above, which results in the error below.
JSONDecodeError: Expecting value: line 1 column 79 (char 78)
I have shared a sample from my source data frame, in case anyone needs to have a look at it.
URL: https://drive.google.com/file/d/142tOk03WxxPF30xE9G_UtJ2j0KbpHWc4/view?usp=sharing
Thanks
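
A possible approach, sketched here under assumptions rather than tested against the shared file, is to wrap json.loads in a helper that returns None for empty or undecodable values. The helper name loads_or_none and the sample strings are hypothetical. Note also that replacing every single quote with a double quote can itself produce invalid JSON when a value contains an apostrophe, which may be one cause of the JSONDecodeError.

```python
import json


def loads_or_none(text):
    """Decode a JSON string; return None for empty or undecodable input."""
    if not isinstance(text, str) or not text.strip():
        return None
    try:
        value = json.loads(text)
    except json.JSONDecodeError:
        return None
    # Treat empty containers ([] or {}) as missing values.
    if isinstance(value, (list, dict)) and not value:
        return None
    return value


# Hypothetical cell values standing in for one JSON-encoded column.
cells = ['{"browser": "Chrome"}', '[]', '{}', '']
decoded = [loads_or_none(cell) for cell in cells]
print(decoded)
```

With pandas, the helper could be applied to a column with df[column].apply(loads_or_none), so that empty objects become missing values instead of raising.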

How to convert a csv format values inside a dict to JSON format in python?

I know how to convert a CSV file and extract the data. But this is something different: I get the values from the database in CSV format and I need to convert them to JSON format.
Current Output:
u{'content':'heading1,heading2,heading3'\nrowa1,rowa2,rowa3\nrowb1,rowb2,rowb3}
I expect to convert this into proper JSON format.

Take this example
data = "heading1,heading2,heading3;rowa1,rowa2,rowa3;rowb1,rowb2,rowb3;"
lines = data.split(';')
headers = lines[0].split(',')
output = [{h: d for h,d in zip(headers,line.split(','))} for line in lines[1:-1]]
print(output)
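
The example above splits on ';', whereas the content in the question is separated by newlines. As an alternative sketch, the standard csv module can parse the newline-separated text directly; the record dict below is an assumed reconstruction of the question's value, not the exact database output.

```python
import csv
import io
import json

# Assumed shape of the database value: CSV text stored under 'content'.
record = {
    "content": "heading1,heading2,heading3\nrowa1,rowa2,rowa3\nrowb1,rowb2,rowb3"
}

# DictReader uses the first line as the header for every following row.
reader = csv.DictReader(io.StringIO(record["content"], newline=""))
rows = [dict(row) for row in reader]

json_text = json.dumps(rows)
print(json_text)
```

Using csv.DictReader rather than str.split also handles quoted fields that contain commas.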

Convert DataFrame with url in string format to JSON properly

I have a data frame with 2 columns, one of which consists of URLs.
Sample code:
import pandas as pd
# DataFrame.append was removed in pandas 2.0, so the frame is built from a list of records
df = pd.DataFrame([
    {'name': 'sample_name', 'image': 'https://images.pexels.com/photos/736230/pexels-photo-736230.jpeg?auto=compress&cs=tinysrgb&dpr=1&w=500'},
    {'name': 'sample_name2', 'image': 'https://cdn.theatlantic.com/assets/media/img/mt/2017/10/Pict1_Ursinia_calendulifolia/lead_720_405.jpg?mod=1533691909'},
], columns=['name', 'image'])
I want to convert this dataframe to JSON directly. I used the to_json() method of DataFrame, but it mangles the URLs in the data frame.
Conversion to JSON:
json = df.to_json(orient='records')
When I print it, the conversion has inserted a '\' character before every '/' character in my URLs.
print(json)
Result:
[{"name":"sample_name","image":"https:\/\/images.pexels.com\/photos\/736230\/pexels-photo-736230.jpeg?auto=compress&cs=tinysrgb&dpr=1&w=500"},{"name":"sample_name2","image":"https:\/\/cdn.theatlantic.com\/assets\/media\/img\/mt\/2017\/10\/Pict1_Ursinia_calendulifolia\/lead_720_405.jpg?mod=1533691909"}]
I want the json to look like (no extra '\' in urls):
[{"name":"sample_name","image":"https://images.pexels.com/photos/736230/pexels-photo-736230.jpeg?auto=compress&cs=tinysrgb&dpr=1&w=500"},{"name":"sample_name2","image":"https://cdn.theatlantic.com/assets/media/img/mt/2017/10/Pict1_Ursinia_calendulifolia/lead_720_405.jpg?mod=1533691909"}]
I checked documentation of to_json() and other questions as well but couldn't find an answer to deal with it. How can I just convert my url strings to json, as they are in data frame?

Pandas uses ujson (available on PyPI) internally to encode the data to a JSON blob. By default, ujson escapes forward slashes; this behaviour is controlled by its escape_forward_slashes option.
Instead, you can convert your dataframe to a list of dictionaries with .to_dict and pass the result to json.dumps(…), which does not escape forward slashes:
>>> import json
>>> print(json.dumps(df.to_dict('records')))
[{"name": "sample_name", "image": "https://images.pexels.com/photos/736230/pexels-photo-736230.jpeg?auto=compress&cs=tinysrgb&dpr=1&w=500"}, {"name": "sample_name2", "image": "https://cdn.theatlantic.com/assets/media/img/mt/2017/10/Pict1_Ursinia_calendulifolia/lead_720_405.jpg?mod=1533691909"}]
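
It may also help to know that "\/" is a legal escape in JSON, so the output of to_json() is still valid and decodes to the original URL. The sketch below illustrates this with the standard json module only; the example.com URL is a placeholder, not taken from the question.

```python
import json

# JSON text with escaped forward slashes, in the style produced by to_json().
escaped = r'[{"name":"sample_name","image":"https:\/\/example.com\/photo.jpeg"}]'

# Any compliant parser turns "\/" back into "/".
records = json.loads(escaped)
print(records[0]["image"])

# Re-encoding with the standard library leaves the slashes unescaped.
unescaped = json.dumps(records)
print(unescaped)
```

So the escaping only matters if the raw JSON text is shown to people or compared as a string; consumers that parse it see the same URLs either way.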

Reading the 2nd entry of a .json

I am trying to read the density entry of a list of arrays within a .json file. Here's a small portion of the file from the beginning:
["time_tag","density","speed","temperature"],["2019-04-14 18:20:00.000","4.88","317.9","11495"],["2019-04-14 18:21:00.000","4.89","318.4","11111"]
This is the code I have thus far:
with open('plasma.json', 'r') as myfile:
    data = myfile.read()
obj = json.loads(data)
print(str(obj['density']))
It should print everything under the density column, but I'm getting an error saying that the file can't be opened.

First, your JSON file is not valid. If you want to read it with a single call obj = json.loads(data), the JSON file should be:
[["time_tag","density","speed","temperature"],["2019-04-14 18:20:00.000","4.88","317.9","11495"],["2019-04-14 18:21:00.000","4.89","318.4","11111"]]
Notice the extra square bracket, making it a single list of sublists.
That said, since obj is a list of lists, print(str(obj['density'])) cannot work: a list cannot be indexed by a string. You need to loop over the list to print what you want, or convert it to a dataframe first.
Looping directly
idx = obj[0].index('density')  # index of the density entry in the header (the first list in obj)
for row in obj[1:]:  # loop over all sublists except the first one
    print(row[idx])  # print the density value
Using a dataframe (pandas)
import pandas as pd
df = pd.DataFrame(obj[1:], columns=obj[0])  # convert to a dataframe, using the first row as column headers
print(df['density'])
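
If the file itself cannot be changed, one workaround, sketched here on the assumption that the whole file is a comma-separated run of lists like the sample, is to add the enclosing brackets in code before parsing. The raw string below stands in for the file contents.

```python
import json

# Sample text in the same shape as the question's file:
# lists separated by commas, with no enclosing brackets.
raw = (
    '["time_tag","density","speed","temperature"],'
    '["2019-04-14 18:20:00.000","4.88","317.9","11495"],'
    '["2019-04-14 18:21:00.000","4.89","318.4","11111"]'
)

# Wrap the text in brackets so it parses as one list of lists.
obj = json.loads("[" + raw + "]")

header, rows = obj[0], obj[1:]
idx = header.index("density")

# The values are stored as strings, so convert them to floats.
densities = [float(row[idx]) for row in rows]
print(densities)
```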

Are you sure your data is valid JSON and not CSV?
The snippet of data provided above looks more like a CSV file than a JSON file.
You will be able to read the density key of the csv with:
import csv
with open("plasma.csv", newline="") as f:  # close the file when done
    for row in csv.DictReader(f):
        print(row['density'])
Data formatted as csv
["time_tag","density","speed","temperature"]
["2019-04-14 18:20:00.000","4.88","317.9","11495"]
["2019-04-14 18:21:00.000","4.89","318.4","11111"]
Result
4.88
4.89
