Convert DataFrame with url in string format to JSON properly - python

I have a data frame with 2 columns, one of which consists of URLs.
Sample code:
import pandas as pd
# DataFrame.append was removed in pandas 2.0; build the frame from a list of records instead.
df = pd.DataFrame([
    {'name': 'sample_name', 'image': 'https://images.pexels.com/photos/736230/pexels-photo-736230.jpeg?auto=compress&cs=tinysrgb&dpr=1&w=500'},
    {'name': 'sample_name2', 'image': 'https://cdn.theatlantic.com/assets/media/img/mt/2017/10/Pict1_Ursinia_calendulifolia/lead_720_405.jpg?mod=1533691909'},
], columns=['name', 'image'])
I want to convert this dataframe to JSON directly. I used the to_json() method of DataFrame, but it mangles the URLs in the data frame.
Conversion to JSON:
json = df.to_json(orient='records')
When I print it, the conversion has inserted a '\' character before every '/' character in my URLs.
print(json)
Result:
[{"name":"sample_name","image":"https:\/\/images.pexels.com\/photos\/736230\/pexels-photo-736230.jpeg?auto=compress&cs=tinysrgb&dpr=1&w=500"},{"name":"sample_name2","image":"https:\/\/cdn.theatlantic.com\/assets\/media\/img\/mt\/2017\/10\/Pict1_Ursinia_calendulifolia\/lead_720_405.jpg?mod=1533691909"}]
I want the json to look like (no extra '\' in urls):
[{"name":"sample_name","image":"https://images.pexels.com/photos/736230/pexels-photo-736230.jpeg?auto=compress&cs=tinysrgb&dpr=1&w=500"},{"name":"sample_name2","image":"https://cdn.theatlantic.com/assets/media/img/mt/2017/10/Pict1_Ursinia_calendulifolia/lead_720_405.jpg?mod=1533691909"}]
I checked the documentation of to_json() and other questions as well, but couldn't find a way to deal with this. How can I convert my URL strings to JSON exactly as they are in the data frame?

pandas uses ujson [PyPI] internally to encode the data as a JSON blob. By default, ujson escapes forward slashes; this behaviour is controlled by its escape_forward_slashes option.
Instead, you can convert your dataframe to a list of dictionaries with .to_dict and pass the result to json.dumps(…):
>>> import json
>>> print(json.dumps(df.to_dict('records')))
[{"name": "sample_name", "image": "https://images.pexels.com/photos/736230/pexels-photo-736230.jpeg?auto=compress&cs=tinysrgb&dpr=1&w=500"}, {"name": "sample_name2", "image": "https://cdn.theatlantic.com/assets/media/img/mt/2017/10/Pict1_Ursinia_calendulifolia/lead_720_405.jpg?mod=1533691909"}]
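Note that "\/" is a legal escape sequence in JSON, so the escaped output is still valid and any conforming parser turns it back into a plain "/". A minimal sketch of that round trip using only the standard library; the string below is a shortened stand-in for the to_json() output in the question, and the names escaped, records and clean are just for illustration:

```python
import json

# Shortened stand-in for the escaped output shown in the question.
escaped = '[{"name":"sample_name","image":"https:\\/\\/images.pexels.com\\/photos\\/736230\\/pexels-photo-736230.jpeg"}]'

# Decoding turns every "\/" back into "/".
records = json.loads(escaped)

# Re-encoding with the standard library leaves "/" unescaped.
clean = json.dumps(records)
print(clean)
```

So if the JSON is only ever consumed by a JSON parser, the extra backslashes are harmless; re-encoding as above is only needed when the raw text itself has to look clean.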

Related

Empty JSON Objects

I have a few columns in my data that are encoded in JSON format. I am trying to convert this data into several columns in a Python pandas dataframe. I have problems with the empty JSON objects: they prevent me from using the json.loads function to decode the JSON into several columns.
Here is my code:
json_columns = ['customDimensions','device','geoNetwork','hits','totals','trafficSource']
for json in json_columns:
    if df[json][0].startswith('['):
        df[json] = df[json].apply(lambda x: x[1:-1])
    df[json] = df[json].apply(lambda x: x.replace('\'',"\""))
    json_load = df[json].apply(loads)
    json_list = list(json_load)
    json_data = dumps(json_list)
    df = df.join(pd.read_json(json_data))
    df = df.drop(json,axis=1)
Here are some examples:
"customDimensions": [], "customMetrics": [], "customVariables": [],"experiment": []
I want these empty objects to become NULL when they are decoded into a pandas data frame. The hits column contains the empty JSON objects shown above, which results in the error below.
JSONDecodeError: Expecting value: line 1 column 79 (char 78)
I have shared a sample from my source data frame in case anyone needs to have a look at it.
URL: https://drive.google.com/file/d/142tOk03WxxPF30xE9G_UtJ2j0KbpHWc4/view?usp=sharing
Thanks
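One possible way to get NULLs for empty objects, sketched under assumptions: decode each cell with a small helper that returns None when the text cannot be parsed and replaces empty lists or dicts with None. The helper names parse_or_none and empty_to_none, and the "hitNumber" key in the sample, are hypothetical and not part of the original data description.

```python
import json

def empty_to_none(value):
    """Recursively replace empty lists/dicts with None."""
    if isinstance(value, dict):
        if not value:
            return None
        return {key: empty_to_none(item) for key, item in value.items()}
    if isinstance(value, list):
        if not value:
            return None
        return [empty_to_none(item) for item in value]
    return value

def parse_or_none(text):
    """Decode a JSON string; return None if it cannot be decoded."""
    try:
        return empty_to_none(json.loads(text))
    except (TypeError, ValueError):  # JSONDecodeError is a subclass of ValueError
        return None

# Hypothetical cell content modelled on the example in the question.
sample = '{"customDimensions": [], "customMetrics": [], "hitNumber": "1"}'
decoded = parse_or_none(sample)
print(decoded)
```

With a dataframe, this could be applied per column, for example df[col].apply(parse_or_none), before expanding the decoded values into columns. It does not explain why a particular cell fails to parse; it only keeps one bad cell from aborting the whole conversion.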

Putting JSON API in Pandas Dataframe

I have a problem converting a JSON API into a pandas dataframe. I have the following structure of the json file:
{"place":{"AMS":[{"UTC":"14-11-2017 10:00","ValidUTC":"14-11-2017 00:00","Cardinality":"4",...},{"UTC":"14-11-2017 11:00",...}]}}
In my pandas DataFrame I want the columns UTC, ValidUTC, Cardinality, etc., so I tried the json_normalize function:
main_api = 'https://api.xxx'
url=main_api
json_data = requests.get(url).json()
df = json_normalize(json_data, 'place', ['AMS'])
and
main_api = 'https://api.xxx'
url=main_api
json_data = requests.get(url).json()
df = json_normalize(json_data, 'place')
df = json_normalize(json_data, 'AMS')
but neither seems to work. Does anyone have an idea how to convert this JSON into a pandas DataFrame correctly?
Refer to the question "JSON to pandas DataFrame", which describes normalizing JSON very well. Besides, you can pass a parsing function for the JSON columns.
Not sure if I recreated your input correctly (you should add the full json without ...).
data = {"place":{"AMS":[{"UTC":"14-11-2017 10:00","ValidUTC":"14-11-2017 00:00","Cardinality":"4"}, {"UTC":"15-11-2017 10:00","ValidUTC":"15-11-2017 00:00","Cardinality":"5"} ]}}
pd.json_normalize(data['place']['AMS'])
Output:
                UTC          ValidUTC Cardinality
0  14-11-2017 10:00  14-11-2017 00:00           4
1  15-11-2017 10:00  15-11-2017 00:00           5

Python dataframe to JSON add additional fields

I am looking to transform my dataframe to JSON:
Age Eye Gender
30 blue male
With my current code, I convert the dataframe to JSON and get the result below:
json_file = df.to_json(orient='records')
json_file
[{'age':'30'},{'eye':'blue'},{'gender':'male'}]
However, I want to add an additional layer that would state the id and name to the json data and then label it as 'info'.
{'id':'5231',
 'name':'Bob',
 'info': [
   {'age':'30'},{'eye':'blue'},{'gender':'male'}
 ]
}
How would I add the additional fields? I tried reading the docs, but I do not see a clear answer on how to add additional fields during the dataframe-to-JSON conversion.
Based on the data you provided this is your answer:
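import json
import pandas as pd
a = {'id': '5231',
     'name': 'Bob'}
df = pd.DataFrame({'Age': [30], 'Eye': ['blue'], 'Gender': ['male']})
# Parse the records back into Python objects so 'info' is a nested list, not a JSON string
a['info'] = json.loads(df.to_json(orient='records'))
result = json.dumps(a)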
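If the goal is the exact shape from the question, where each column becomes its own single-key object inside 'info', the row can be reshaped before serializing. A minimal sketch, assuming a single-row frame whose record is the plain dict below (a stand-in for df.to_dict(orient='records')[0]); the names record, payload and text are illustrative only:

```python
import json

# Stand-in for one row of the dataframe as a plain dict.
record = {'Age': 30, 'Eye': 'blue', 'Gender': 'male'}

payload = {
    'id': '5231',
    'name': 'Bob',
    # One single-key dict per column, with lower-cased keys and string values.
    'info': [{key.lower(): str(value)} for key, value in record.items()],
}

text = json.dumps(payload)
print(text)
```

Lower-casing the keys and converting values to strings is only done here to match the desired output in the question; drop either step if the original column names or numeric types should be kept.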

python - converting unicode in list to dataframe

I am using an API to get some data. The data returned is in Unicode (not a dictionary / json object).
# get data
data = []
for urls in api_call_list:
    data.append(requests.get(urls))
The data looks like this:
>>> data[0].text
u'Country;Celebrity;Song Volume;CPP;Index\r\nus;Taylor Swift;33100;0.83;0.20\r\n'
>>> data[1].text
u'Country;Celebrity;Song Volume;CPP;Index\r\nus;Rihanna;28100;0.76;0.33\r\n'
I want to put this in a DataFrame with Country, Celebrity, Song Volume, CPP and Index as column names.
First I tried to split it on \r\n like this:
x = [i.text.split('\r\n') for i in data]
and got:
[[u'Country;Celebrity;Song Volume;CPP;Index',
u'us;Taylor Swift;33100;0.83;0.20',
u''],
[u'Country;Celebrity;Song Volume;CPP;Index',
u'us;Rihanna;28100;0.76;0.33',
u'']]
Not sure where to go from here . . .
You can use pandas.read_csv to read the data as a list of data frames and then concatenate them:
# On Python 2, use "from io import BytesIO" and BytesIO instead of StringIO
from io import StringIO
import pandas as pd
data = [u'Country;Celebrity;Song Volume;CPP;Index\r\nus;Taylor Swift;33100;0.83;0.20\r\n',
        u'Country;Celebrity;Song Volume;CPP;Index\r\nus;Rihanna;28100;0.76;0.33\r\n']
pd.concat([pd.read_csv(StringIO(d), sep=";") for d in data])
Since your actual data is a list of responses, you may need to access the text first:
pd.concat([pd.read_csv(StringIO(d.text), sep=";") for d in data])

Unicode text in pandas dataframe cannot parse to JSON

I'm trying to write Python code to build a nested JSON file from a flat table in a pandas data frame. I created a dictionary of dictionaries from the pandas dataframe. When I try to export the dict to JSON, I get an error that the unicode text is not a string. How can I convert dictionaries with unicode strings to JSON?
My current code is:
data = pandas.read_excel(inputExcel, sheetname = 'SCAT Teams', encoding = 'utf8')
columnList = tuple(data[0:])
for index, row in data.iterrows():
    dataRow = tuple(row)
    rowDict = dict(zip(dataRow[2:],columnList[2:]))
    memberId = str(tuple(row[1]))
    teamName = str(tuple(row[0]))
    memberDict1 = {memberId[1:2]:rowDict}
    memberDict2 = {teamName:memberDict1}
This produces a dict of dicts in which each row looks like this:
'(1L,)': {'0': {(u'Doe',): (u'lastname',), (u'John',): (u'firstname',), (u'none',): (u'mobile',), (u'916-555-1234',): (u'phone',), (u'john.doe#wildlife.net',): (u'email',), (u'Anon',): (u'orgname',)}}}
But when I try to dump to JSON, the unicode text can't be parsed as strings, so I get this error:
TypeError: key (u'teamname',) is not a string
How can I convert my nested dicts to JSON without invoking the error?
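JSON object keys must be strings, and json.dumps raises a TypeError when a dict key is a tuple, which is what the error message reports. In the posted code, zip(dataRow[2:], columnList[2:]) also puts the cell values first, so the values end up as keys and the column names as values. A minimal sketch of building the nesting with plain string keys, assuming each row is available as a dict (for example from data.to_dict('records')); the column name "memberid" and the sample values are hypothetical:

```python
import json

# Hypothetical rows; in practice these could come from data.to_dict('records').
rows = [
    {"teamname": "Team A", "memberid": 1, "lastname": "Doe", "firstname": "John"},
    {"teamname": "Team A", "memberid": 2, "lastname": "Roe", "firstname": "Jane"},
]

nested = {}
for row in rows:
    team = str(row["teamname"])      # plain string key, not a tuple
    member = str(row["memberid"])
    # Column name -> cell value for all remaining columns.
    fields = {col: val for col, val in row.items()
              if col not in ("teamname", "memberid")}
    nested.setdefault(team, {})[member] = fields

text = json.dumps(nested)
print(text)
```

Using setdefault keeps all members of the same team under one key instead of overwriting the team entry on each row.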
