How to remove delimeted pipe from my json column and split them to different columns and their respective values - python

"description": ID|100|\nName|Sam|\nCity|New York City|\nState|New York|\nContact|1234567890|\nEmail|1234#yahoo.com|
This is how my code in json looks like. I wanted to convert this json file to excel sheet to split the nested column to separate columns and have used pandas for it, but couldn't achieve it. The output I want in my excel sheet is:
ID Name City State Contact Email
100 Sam New York City New York 1234567890 1234#yahoo.com
I want to remove those pipes and the solution should be in pandas. Please help me out with this.
The code I am trying:
I want output as:
The output on excel sheet:
[2]: https://i.stack.imgur.com/QjSUU.png
The list of dict column looks like:
"assignees": [{
"id": 1234,
"username": "xyz",
"name": "XYZ",
"state": "active",
"avatar_url": "aaaaaaaaaaaaaaa",
"web_url": "bbbbbbbbbbb"
},
{
"id": 5678,
"username": "abcd",
"name": "ABCD",
"state": "active",
"avatar_url": "hhhhhhhhhhh",
"web_url": "mmmmmmmmm"
}
],

This could be one way:
import pandas as pd
df = pd.read_json('Sample.json')
df2 = pd.DataFrame()
for i in df.index:
desc = df['description'][i]
attributes = desc.split("\n")
d = {}
for attrib in attributes:
if not(attrib.startswith('Name') or attrib.startswith('-----')):
kv = attrib.split("|")
d[kv[0]] = kv[1]
df2 = df2.append(d, ignore_index=True)
print(df2)
df2.to_csv("output.csv")
Output xls:

Related

convert json data to pandas dataframe in python (dictionary inside list )

I have json data like below:
{"name": "Monkey", "image": "https://media.npr.org/assets/img/2017/09/12/macaca_nigra_self-portrait-3e0070aa19a7fe36e802253048411a38f14a79f8-s800-c85.webp", "attributes": [{"trait_type": "Bones", "value": "Zombie"}, {"trait_type": "Clothes", "value": "Striped"}, {"trait_type": "Mouth", "value": "Bubblegum"}, {"trait_type": "Eyes", "value": "Black Sunglasses"}, {"trait_type": "Hat", "value": "Sushi"}, {"trait_type": "Background", "value": "Purple"}]}
I want to convert this json data as pandas dataframe only selecting the attributes as filter it as below:
Bones Clothes Mouth Eyes Hat Background
zombie striped bubblegum black sushi purple
Can any expert please help me to get the output as i mentioned
Thank you
There is probably a prettier solution but this does the job:
import json
import pandas as pd
with open('file.json') as f:
trait_types= []
values = []
data = json.load(f)
df = pd.DataFrame(data)
for key in data['attributes']:
trait_types.append(key['trait_type'])
values.append(key['value'])
df = pd.DataFrame({
'trait type': trait_types,
'value' : values})
print(df)

Extracting str from pandas dataframe using json

I read csv file into a dataframe named df
Each rows contains str below.
'{"id":2140043003,"name":"Olallo Rubio",...}'
I would like to extract "name" and "id" from each row and make a new dataframe to store the str.
I use the following codes to extract but it shows an error. Please let me know if there is any suggestions on how to solve this problem. Thanks
JSONDecodeError: Expecting ',' delimiter: line 1 column 32 (char 31)
text={
"id": 2140043003,
"name": "Olallo Rubio",
"is_registered": True,
"chosen_currency": 'Null',
"avatar": {
"thumb": "https://ksr-ugc.imgix.net/assets/019/223/259/16513215a3869caaea2d35d43f3c0c5f_original.jpg?w=40&h=40&fit=crop&v=1510685152&auto=format&q=92&s=653706657ccc49f68a27445ea37ad39a",
"small": "https://ksr-ugc.imgix.net/assets/019/223/259/16513215a3869caaea2d35d43f3c0c5f_original.jpg?w=160&h=160&fit=crop&v=1510685152&auto=format&q=92&s=0bd2f3cec5f12553e679153ba2b5d7fa",
"medium": "https://ksr-ugc.imgix.net/assets/019/223/259/16513215a3869caaea2d35d43f3c0c5f_original.jpg?w=160&h=160&fit=crop&v=1510685152&auto=format&q=92&s=0bd2f3cec5f12553e679153ba2b5d7fa"
},
"urls": {
"web": {
"user": "https://www.kickstarter.com/profile/2140043003"
},
"api": {
"user": "https://api.kickstarter.com/v1/users/2140043003?signature=1531480520.09df9a36f649d71a3a81eb14684ad0d3afc83e03"
}
}
}
def extract(text,*args):
list1=[]
for i in args:
list1.append(text[i])
return list1
print(extract(text,'name','id'))
# ['Olallo Rubio', 2140043003]
Here's what I came up with using pandas.json_normalize():
import pandas as pd
sample = [{
"id": 2140043003,
"name":"Olallo Rubio",
"is_registered": True,
"chosen_currency": None,
"avatar":{
"thumb":"https://ksr-ugc.imgix.net/assets/019/223/259/16513215a3869caaea2d35d43f3c0c5f_original.jpg?w=40&h=40&fit=crop&v=1510685152&auto=format&q=92&s=653706657ccc49f68a27445ea37ad39a",
"small":"https://ksr-ugc.imgix.net/assets/019/223/259/16513215a3869caaea2d35d43f3c0c5f_original.jpg?w=160&h=160&fit=crop&v=1510685152&auto=format&q=92&s=0bd2f3cec5f12553e679153ba2b5d7fa",
"medium":"https://ksr-ugc.imgix.net/assets/019/223/259/16513215a3869caaea2d35d43f3c0c5f_original.jpg?w=160&h=160&fit=crop&v=1510685152&auto=format&q=92&s=0bd2f3cec5f12553e679153ba2b5d7fa"
},
"urls":{
"web":{
"user":"https://www.kickstarter.com/profile/2140043003"
},
"api":{
"user":"https://api.kickstarter.com/v1/users/2140043003?signature=1531480520.09df9a36f649d71a3a81eb14684ad0d3afc83e03"
}
}
}]
# Create datafrane
df = pd.json_normalize(sample)
# Select columns into new dataframe.
df1 = df.loc[:, ["name", "id",]]
Check df1:
Input:
print(df1)
Output:
name id
0 Olallo Rubio 2140043003

How to handle a JSON that returns a list of dict-like objects in Pandas?

I am using an API from collegefootballdata.com to get data on scores and betting lines. I want to use betting lines to infer expected win % and then compare that to actual results (I feel like my team loses too many games where we are big favorites and want to test that.) This code retrieves one game for example purposes.
parameters = {
"gameId": 401112435,
"year": 2019
}
response = requests.get("https://api.collegefootballdata.com/lines", params=parameters)
The JSON output is this:
[
{
"awayConference": "ACC",
"awayScore": 28,
"awayTeam": "Virginia Tech",
"homeConference": "ACC",
"homeScore": 35,
"homeTeam": "Boston College",
"id": 401112435,
"lines": [
{
"formattedSpread": "Virginia Tech -4.5",
"overUnder": "57.5",
"provider": "consensus",
"spread": "4.5"
},
{
"formattedSpread": "Virginia Tech -4.5",
"overUnder": "57",
"provider": "Caesars",
"spread": "4.5"
},
{
"formattedSpread": "Virginia Tech -4.5",
"overUnder": "58",
"provider": "numberfire",
"spread": "4.5"
},
{
"formattedSpread": "Virginia Tech -4.5",
"overUnder": "56.5",
"provider": "teamrankings",
"spread": "4.5"
}
],
"season": 2019,
"seasonType": "regular",
"week": 1
}
]
I'm then loading into a pandas dataframe with:
def jstring(obj):
# create a formatted string of the Python JSON object
text = json.dumps(obj, sort_keys=True, indent=4)
return text
json_str = jstring(response.json())
df = pd.read_json(json_str)
This creates a dataframe with a "lines" column that contains the entire lines section of the JSON as a string. Ultimately, I want to use the "spread" value in the block where "provider" = "consensus". Everything else is extraneous for my purposes. I've tried exploding the column with
df = df.explode('lines')
which gives me 4 rows with something like this for each game (as expected):
{'formattedSpread': 'Virginia Tech -4.5', 'overUnder': '57.5', 'provider': 'consensus', 'spread': '4.5'}
Here is where I'm stuck. I want to keep only the rows where 'provider' = 'consensus', and further I need to have 'spread' to use as a separate variable/column in my analysis. I've tried exploding a 2nd time, df.split, df.replace to change { to [ and explode as a list, all to no avail. Any help is appreciated!!
This is probably what you're looking for -
EDIT: Handling special case.
import pandas as pd
import requests
params = {
"gameId": 401112435,
"year": 2019,
}
r = requests.get("https://api.collegefootballdata.com/lines", params=params)
df = pd.DataFrame(r.json()) # Create a DataFrame with a lines column that contains JSON
df = df.explode('lines') # Explode the DataFrame so that each line gets its own row
df = df.reset_index(drop=True) # After explosion, the indices are all the same - this resets them so that you can align the DataFrame below cleanly
def fill_na_lines(lines):
if pd.isna(lines):
return {k: None for k in ['provider', 'spread', 'formattedSpread', 'overUnder']}
return lines
df.lines = df.lines.apply(fill_na_lines)
lines_df = pd.DataFrame(df.lines.tolist()) # A separate lines DataFrame created from the lines JSON column
df = pd.concat([df, lines_df], axis=1) # Concatenating the two DataFrames along the vertical axis.
# Now you can filter down to whichever rows you need.
df = df[df.provider == 'consensus']
The documentation on joining DataFrames in different ways is probably useful.

How to convert JSON into a table using python?

HI I'm a beginner with python.
I have a csv file which I retrieve from my database. The table has several columns, one of which contains data in json. I have difficulty converting json data into a table to be saved in pdf.
First I load the csv. Then I take the column that has the data in json and I convert them.
df = pd.DataFrame(pd.read_csv('database.csv',sep=';'))
print(df.head())
for index, row in df.iterrows():
str = row["json_data"]
val = ast.literal_eval(str)
val1 = json.loads(str)
A sample of my json is that
{
"title0": {
"id": "1",
"ex": "90",
},
"title1": {
"name": "Alex",
"surname": "Boris",
"code": "RBAMRA4",
},
"title2": {
"company": "LA",
"state": "USA",
"example": "9090",
}
}
I'm trying to create a table like this
-------------------------------------------\
title0
--------------------------------------------\
id 1
ex 90
---------------------------------------------\
title 1
---------------------------------------------\
name Alex
surname Boris
code RBAMRA4
----------------------------------------------\
title2
----------------------------------------------\
company LA
state USA
example 9090
You can use the Python JSON library to achieve this.
import json
my_str = open("outfile").read()
val1 = json.loads(my_str)
for key in val1:
print("--------------------\n"+key+"\n--------------------")
for k in val1[key]:
print(k, val1[key][k])
Load the JSON data into the json.jsonloads function, this will deserialize the string and convert it to a python object (the whole object becomes a dict).
Then you loop through the dict the way you want.
--------------------
title0
--------------------
id 1
ex 90
--------------------
title1
--------------------
name Alex
surname Boris
code RBAMRA4
--------------------
title2
--------------------
company LA
state USA
example 9090
Read about parsing a dict, then you will understand the for loop.

Python - JSON to CSV table?

I was wondering how I could import a JSON file, and then save that to an ordered CSV file, with header row and the applicable data below.
Here's what the JSON file looks like:
[
{
"firstName": "Nicolas Alexis Julio",
"lastName": "N'Koulou N'Doubena",
"nickname": "N. N'Koulou",
"nationality": "Cameroon",
"age": 24
},
{
"firstName": "Alexandre Dimitri",
"lastName": "Song-Billong",
"nickname": "A. Song",
"nationality": "Cameroon",
"age": 26,
etc. etc. + } ]
Note there are multiple 'keys' (firstName, lastName, nickname, etc.). I would like to create a CSV file with those as the header, then the applicable info beneath in rows, with each row having a player's information.
Here's the script I have so far for Python:
import urllib2
import json
import csv
writefilerows = csv.writer(open('WCData_Rows.csv',"wb+"))
api_key = "xxxx"
url = "http://worldcup.kimonolabs.com/api/players?apikey=" + api_key + "&limit=1000"
json_obj = urllib2.urlopen(url)
readable_json = json.load(json_obj)
list_of_attributes = readable_json[0].keys()
print list_of_attributes
writefilerows.writerow(list_of_attributes)
for x in readable_json:
writefilerows.writerow(x[list_of_attributes])
But when I run that, I get a "TypeError: unhashable type:'list'" error. I am still learning Python (obviously I suppose). I have looked around online (found this) and can't seem to figure out how to do it without explicitly stating what key I want to print...I don't want to have to list each one individually...
Thank you for any help/ideas! Please let me know if I can clarify or provide more information.
Your TypeError is occuring because you are trying to index a dictionary, x with a list, list_of_attributes with x[list_of_attributes]. This is not how python works. In this case you are iterating readable_json which appears it will return a dictionary with each iteration. There is no need pull values out of this data in order to write them out.
The DictWriter should give you what your looking for.
import csv
[...]
def encode_dict(d, out_encoding="utf8"):
'''Encode dictionary to desired encoding, assumes incoming data in unicode'''
encoded_d = {}
for k, v in d.iteritems():
k = k.encode(out_encoding)
v = unicode(v).encode(out_encoding)
encoded_d[k] = v
return encoded_d
list_of_attributes = readable_json[0].keys()
# sort fields in desired order
list_of_attributes.sort()
with open('WCData_Rows.csv',"wb+") as csv_out:
writer = csv.DictWriter(csv_out, fieldnames=list_of_attributes)
writer.writeheader()
for data in readable_json:
writer.writerow(encode_dict(data))
Note:
This assumes that each entry in readable_json has the same fields.
Maybe pandas could do this - but I newer tried to read JSON
import pandas as pd
df = pd.read_json( ... )
df.to_csv( ... )
pandas.DataFrame.to_csv
pandas.io.json.read_json
EDIT:
data = ''' [
{
"firstName": "Nicolas Alexis Julio",
"lastName": "N'Koulou N'Doubena",
"nickname": "N. N'Koulou",
"nationality": "Cameroon",
"age": 24
},
{
"firstName": "Alexandre Dimitri",
"lastName": "Song-Billong",
"nickname": "A. Song",
"nationality": "Cameroon",
"age": 26,
}
]'''
import pandas as pd
df = pd.read_json(data)
print df
df.to_csv('results.csv')
result:
age firstName lastName nationality nickname
0 24 Nicolas Alexis Julio N'Koulou N'Doubena Cameroon N. N'Koulou
1 26 Alexandre Dimitri Song-Billong Cameroon A. Song
With pandas you can save it in csv, excel, etc (and maybe even directly in database).
And you can do some operations on data in table and show it as graph.

Categories

Resources