I want to convert a JSON file into a proper format.
I have a JSON file as shown below:
{
"fruit": "Apple",
"size": "Large",
"color": "Red",
"details":"|seedless:true|,|condition:New|"
},
{
"fruit": "Almond",
"size": "small",
"color": "brown",
"details":"|Type:dry|,|seedless:true|,|condition:New|"
}
As you can see, the data in details can vary.
I want to change it into:
{
"fruit": "Apple",
"size": "Large",
"color": "Red",
"seedless":"true",
"condition":"New",
},
{
"fruit": "Almond",
"size": "small",
"color": "brown",
"Type":"dry",
"seedless":"true",
"condition":"New",
}
I have tried doing it in Python using pandas as follows:
import json
import pandas as pd
import re
df = pd.read_json("data.json",lines=True)
#I tried to change the pattern of data in details column as
re1 = re.compile('r/|(.?):(.?)|/')
re2 = re.compile('r\"(.*?)\":\"(.*?)\"')
df.replace({'details' :re1}, {'details' : re2},inplace = True, regex = True);
But that gives output as "objects" in all the rows of the details column.
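Try this:
# data is the list of dictionaries loaded from the JSON file
for d in data:
    # take the packed string out of the record and split it into key/value pairs
    details = d.pop('details')
    d.update(dict(x.split(":") for x in details.split("|") if ":" in x))
print(data)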
[{'color': 'Red',
'condition': 'New',
'fruit': 'Apple',
'seedless': 'true',
'size': 'Large'},
{'Type': 'dry',
'color': 'brown',
'condition': 'New',
'fruit': 'Almond',
'seedless': 'true',
'size': 'small'}]
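The loop above can also be wrapped in a small helper. This is only a sketch: the name flatten_details and the field parameter are made up for illustration, and it assumes every key/value pair sits between pipes as in the question. It splits on the first colon only, so a value that itself contains a colon does not break the dict construction.

```python
def flatten_details(record, field="details"):
    """Return a copy of record with the packed field expanded into keys."""
    flat = {k: v for k, v in record.items() if k != field}
    packed = record.get(field, "")
    for chunk in packed.split("|"):
        # Chunks without a colon are separators (",") or empty strings.
        if ":" not in chunk:
            continue
        # Split on the first colon only, so values may contain colons.
        key, value = chunk.split(":", 1)
        flat[key.strip()] = value.strip()
    return flat


records = [
    {"fruit": "Apple", "size": "Large", "color": "Red",
     "details": "|seedless:true|,|condition:New|"},
    {"fruit": "Almond", "size": "small", "color": "brown",
     "details": "|Type:dry|,|seedless:true|,|condition:New|"},
]
flattened = [flatten_details(r) for r in records]
print(flattened)
```

Unlike the in-place loop, this leaves the original records untouched, which may be handy if you still need the raw details string later.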
You can convert the (list of) dictionaries to a pandas data frame.
import pandas as pd
# data is a list of dictionaries
data = [{
"fruit": "Apple",
"size": "Large",
"color": "Red",
"details":"|seedless:true|,|condition:New|"
},
{
"fruit": "Almond",
"size": "small",
"color": "brown",
"details":"|Type:dry,|seedless:true|,|condition:New|"
}]
# convert to data frame
df = pd.DataFrame(data)
# remove '|' from details (literal match, not a regex) and convert to list
df['details'] = df['details'].str.replace('|', '', regex=False).str.split(',')
# explode list => one row for each element
df = df.explode('details')
# split details into name/value pair
df[['name', 'value']] = df['details'].str.split(':', expand=True)
# drop details column
df = df.drop(columns='details')
print(df)
    fruit   size  color       name  value
0   Apple  Large    Red   seedless   true
0   Apple  Large    Red  condition    New
1  Almond  small  brown       Type    dry
1  Almond  small  brown   seedless   true
1  Almond  small  brown  condition    New
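If the goal is the wide layout from the question (one column per detail), a long frame like the one printed above could be pivoted back. A minimal sketch, assuming pandas 1.1 or later (needed for a list-valued index in pivot) and assuming each fruit/size/color combination appears only once; long_df and wide_df are illustrative names:

```python
import pandas as pd

# Long-format frame, rebuilt by hand so the example is self-contained.
long_df = pd.DataFrame({
    "fruit": ["Apple", "Apple", "Almond", "Almond", "Almond"],
    "size": ["Large", "Large", "small", "small", "small"],
    "color": ["Red", "Red", "brown", "brown", "brown"],
    "name": ["seedless", "condition", "Type", "seedless", "condition"],
    "value": ["true", "New", "dry", "true", "New"],
})

# One row per fruit, one column per detail name.
wide_df = (
    long_df
    .pivot(index=["fruit", "size", "color"], columns="name", values="value")
    .reset_index()
)
wide_df.columns.name = None  # drop the leftover "name" label on the columns
print(wide_df)
```

Details that a fruit does not have (Type for Apple here) come out as NaN.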
I was trying to get all the types inside colours -> numbers, but it doesn't work because the colours (e.g. green, blue) are nested under numeric keys, so I could not loop through them.
Here is the code I tried:
x=0
for colour in colours['rootdata']:
    print(colour[x][type])
    x+1
but it shows 'string indices must be integers'.
I'm able to get a single value like this (but this is not what I want):
colour_red = JsonResult['rootdata']['colours']['0']['type']
print(colour_red)
This is the simple JSON sample that I'm using:
{
"rootdata": {
"colours": {
"0": {
"type": "red"
},
"1": {
"type": "green"
},
"2": {
"type": "blue"
}
}
}
}
Try this:
my_dict = {
    "rootdata": {
        "colours": {"0": {"type": "red"}, "1": {"type": "green"}, "2": {"type": "blue"}}
    }
}
types = []
types_dict = {}
for k, v in my_dict["rootdata"]["colours"].items():
    types.append(v["type"])
    types_dict[k] = v["type"]
print(types)
# Result: ['red', 'green', 'blue']
print(types_dict)
# Result: {'0': 'red', '1': 'green', '2': 'blue'}
Regards...
You do not need a counter. You can obtain the colours with the following code:
for colour in colours['rootdata']['colours'].values():
    print(colour['type'])
This code works even if the numbers are not in sequential order in the JSON, and the order of the output does not really matter.
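If the order does matter, one option is to sort the entries by their numeric key before reading the types. A small sketch, assuming every key under colours is a string that int() can parse; the sample dict is deliberately shuffled, and result and ordered_types are illustrative names:

```python
# Same structure as the question's JSON, with the keys out of order.
result = {
    "rootdata": {
        "colours": {
            "2": {"type": "blue"},
            "0": {"type": "red"},
            "1": {"type": "green"},
        }
    }
}

colours = result["rootdata"]["colours"]
# Sort by the integer value of the key, not by the string itself,
# so that "10" would come after "9".
ordered_types = [
    entry["type"]
    for _, entry in sorted(colours.items(), key=lambda kv: int(kv[0]))
]
print(ordered_types)
```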
I have json data like below:
{"name": "Monkey", "image": "https://media.npr.org/assets/img/2017/09/12/macaca_nigra_self-portrait-3e0070aa19a7fe36e802253048411a38f14a79f8-s800-c85.webp", "attributes": [{"trait_type": "Bones", "value": "Zombie"}, {"trait_type": "Clothes", "value": "Striped"}, {"trait_type": "Mouth", "value": "Bubblegum"}, {"trait_type": "Eyes", "value": "Black Sunglasses"}, {"trait_type": "Hat", "value": "Sushi"}, {"trait_type": "Background", "value": "Purple"}]}
I want to convert this JSON data to a pandas DataFrame, selecting only the attributes, as below:
Bones   Clothes  Mouth      Eyes   Hat    Background
zombie  striped  bubblegum  black  sushi  purple
Can anyone please help me get the output I mentioned?
Thank you
There is probably a prettier solution, but this does the job:
import json
import pandas as pd
trait_types = []
values = []
with open('file.json') as f:
    data = json.load(f)
# collect each attribute's trait type and value
for attribute in data['attributes']:
    trait_types.append(attribute['trait_type'])
    values.append(attribute['value'])
df = pd.DataFrame({
    'trait type': trait_types,
    'value': values})
print(df)
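The code above gives one row per attribute. To get the single-row layout asked for in the question (one column per trait), a dict comprehension over the attributes may be enough. A sketch, with the JSON inlined instead of read from a file and the image URL left out for brevity; row and wide are illustrative names:

```python
import pandas as pd

data = {
    "name": "Monkey",
    "attributes": [
        {"trait_type": "Bones", "value": "Zombie"},
        {"trait_type": "Clothes", "value": "Striped"},
        {"trait_type": "Mouth", "value": "Bubblegum"},
        {"trait_type": "Eyes", "value": "Black Sunglasses"},
        {"trait_type": "Hat", "value": "Sushi"},
        {"trait_type": "Background", "value": "Purple"},
    ],
}

# Map each trait type to its value, then wrap the dict in a list
# so pandas builds exactly one row.
row = {attr["trait_type"]: attr["value"] for attr in data["attributes"]}
wide = pd.DataFrame([row])
print(wide)
```

Note that the values keep their original casing ("Zombie", not "zombie"); lowercasing would be an extra step.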
I read a CSV file into a DataFrame named df.
Each row contains a string like the one below:
'{"id":2140043003,"name":"Olallo Rubio",...}'
I would like to extract "name" and "id" from each row and store them in a new DataFrame.
I used the following code to extract them, but it shows an error. Please let me know if you have any suggestions on how to solve this problem. Thanks
JSONDecodeError: Expecting ',' delimiter: line 1 column 32 (char 31)
text={
"id": 2140043003,
"name": "Olallo Rubio",
"is_registered": True,
"chosen_currency": 'Null',
"avatar": {
"thumb": "https://ksr-ugc.imgix.net/assets/019/223/259/16513215a3869caaea2d35d43f3c0c5f_original.jpg?w=40&h=40&fit=crop&v=1510685152&auto=format&q=92&s=653706657ccc49f68a27445ea37ad39a",
"small": "https://ksr-ugc.imgix.net/assets/019/223/259/16513215a3869caaea2d35d43f3c0c5f_original.jpg?w=160&h=160&fit=crop&v=1510685152&auto=format&q=92&s=0bd2f3cec5f12553e679153ba2b5d7fa",
"medium": "https://ksr-ugc.imgix.net/assets/019/223/259/16513215a3869caaea2d35d43f3c0c5f_original.jpg?w=160&h=160&fit=crop&v=1510685152&auto=format&q=92&s=0bd2f3cec5f12553e679153ba2b5d7fa"
},
"urls": {
"web": {
"user": "https://www.kickstarter.com/profile/2140043003"
},
"api": {
"user": "https://api.kickstarter.com/v1/users/2140043003?signature=1531480520.09df9a36f649d71a3a81eb14684ad0d3afc83e03"
}
}
}
def extract(text, *args):
    list1 = []
    for i in args:
        list1.append(text[i])
    return list1
print(extract(text, 'name', 'id'))
# ['Olallo Rubio', 2140043003]
Here's what I came up with using pandas.json_normalize():
import pandas as pd
sample = [{
"id": 2140043003,
"name":"Olallo Rubio",
"is_registered": True,
"chosen_currency": None,
"avatar":{
"thumb":"https://ksr-ugc.imgix.net/assets/019/223/259/16513215a3869caaea2d35d43f3c0c5f_original.jpg?w=40&h=40&fit=crop&v=1510685152&auto=format&q=92&s=653706657ccc49f68a27445ea37ad39a",
"small":"https://ksr-ugc.imgix.net/assets/019/223/259/16513215a3869caaea2d35d43f3c0c5f_original.jpg?w=160&h=160&fit=crop&v=1510685152&auto=format&q=92&s=0bd2f3cec5f12553e679153ba2b5d7fa",
"medium":"https://ksr-ugc.imgix.net/assets/019/223/259/16513215a3869caaea2d35d43f3c0c5f_original.jpg?w=160&h=160&fit=crop&v=1510685152&auto=format&q=92&s=0bd2f3cec5f12553e679153ba2b5d7fa"
},
"urls":{
"web":{
"user":"https://www.kickstarter.com/profile/2140043003"
},
"api":{
"user":"https://api.kickstarter.com/v1/users/2140043003?signature=1531480520.09df9a36f649d71a3a81eb14684ad0d3afc83e03"
}
}
}]
# Create dataframe
df = pd.json_normalize(sample)
# Select columns into new dataframe.
df1 = df.loc[:, ["name", "id",]]
Check df1:
Input:
print(df1)
Output:
           name          id
0  Olallo Rubio  2140043003
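Since the question starts from a DataFrame whose rows hold JSON strings, the strings need to be parsed before json_normalize can be used. A sketch under some assumptions: the column name raw is hypothetical, the second row is made-up sample data, and every row is valid JSON (the JSONDecodeError quoted in the question suggests at least one row is not, and this sketch does not try to repair such rows).

```python
import json
import pandas as pd

# Hypothetical input: one JSON document per row, stored as text.
df = pd.DataFrame({"raw": [
    '{"id":2140043003,"name":"Olallo Rubio","is_registered":true}',
    '{"id":17,"name":"Example User","is_registered":false}',  # made-up row
]})

# Parse each string into a dict, then flatten the dicts into columns.
parsed = df["raw"].apply(json.loads)
users = pd.json_normalize(parsed.tolist())[["name", "id"]]
print(users)
```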
I have a generator being returned from:
data = public_client.get_product_trades(product_id='BTC-USD', limit=10)
How do I turn the data into a pandas DataFrame?
The method docstring reads:
"""{"Returns": [{
"time": "2014-11-07T22:19:28.578544Z",
"trade_id": 74,
"price": "10.00000000",
"size": "0.01000000",
"side": "buy"
}, {
"time": "2014-11-07T01:08:43.642366Z",
"trade_id": 73,
"price": "100.00000000",
"size": "0.01000000",
"side": "sell"
}]}"""
I have tried:
df = [x for x in data]
df = pd.DataFrame.from_records(df)
but it does not work, as I get the error:
AttributeError: 'str' object has no attribute 'keys'
When I print the above "x for x in data" I see the list of dicts, but the end looks strange. Could this be why?
print(list(data))
[{'time': '2020-12-30T13:04:14.385Z', 'trade_id': 116918468, 'price': '27853.82000000', 'size': '0.00171515', 'side': 'sell'},{'time': '2020-12-30T12:31:24.185Z', 'trade_id': 116915675, 'price': '27683.70000000', 'size': '0.01683711', 'side': 'sell'}, 'message']
It looks to be a list of dicts but the end value is a single string 'message'.
Based on the updated question:
df = pd.DataFrame(list(data)[:-1])
Or, more cleanly:
df = pd.DataFrame([x for x in data if isinstance(x, dict)])
print(df)
                       time   trade_id           price        size  side
0  2020-12-30T13:04:14.385Z  116918468  27853.82000000  0.00171515  sell
1  2020-12-30T12:31:24.185Z  116915675  27683.70000000  0.01683711  sell
Oh, and BTW, you'll still need to change those strings into something usable...
So e.g.:
df['time'] = pd.to_datetime(df['time'])
for k in ['price', 'size']:
    df[k] = pd.to_numeric(df[k])
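Putting the filtering and the type conversion together might look like the sketch below. fake_trades is a hypothetical stand-in for the generator returned by get_product_trades (it yields the two dicts and the trailing 'message' string shown in the question), and trades_to_frame is an illustrative helper name, not part of any client library.

```python
import pandas as pd


def fake_trades():
    """Hypothetical stand-in for the generator returned by the client."""
    yield {"time": "2020-12-30T13:04:14.385Z", "trade_id": 116918468,
           "price": "27853.82000000", "size": "0.00171515", "side": "sell"}
    yield {"time": "2020-12-30T12:31:24.185Z", "trade_id": 116915675,
           "price": "27683.70000000", "size": "0.01683711", "side": "sell"}
    yield "message"  # stray non-dict item, as seen in the question


def trades_to_frame(records):
    """Keep only dict items and convert the string columns to real types."""
    rows = [r for r in records if isinstance(r, dict)]
    frame = pd.DataFrame(rows)
    frame["time"] = pd.to_datetime(frame["time"])
    for column in ("price", "size"):
        frame[column] = pd.to_numeric(frame[column])
    return frame


trades = trades_to_frame(fake_trades())
print(trades.dtypes)
```

A generator can only be consumed once, so it is passed straight into the helper rather than printed first.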
You could access the values in the dictionary and build a dataframe from it (although not particularly clean):
dict_of_data = [{
"time": "2014-11-07T22:19:28.578544Z",
"trade_id": 74,
"price": "10.00000000",
"size": "0.01000000",
"side": "buy"
}, {
"time": "2014-11-07T01:08:43.642366Z",
"trade_id": 73,
"price": "100.00000000",
"size": "0.01000000",
"side": "sell"
}]
import pandas as pd
list_of_data = [list(dict_of_data[0].values()),list(dict_of_data[1].values())]
pd.DataFrame(list_of_data, columns=list(dict_of_data[0].keys())).set_index('time')
It's straightforward: just use the pd.DataFrame constructor:
#list_of_dicts = [{
# "time": "2014-11-07T22:19:28.578544Z",
# "trade_id": 74,
# "price": "10.00000000",
# "size": "0.01000000",
# "side": "buy"
# }, {
# "time": "2014-11-07T01:08:43.642366Z",
# "trade_id": 73,
# "price": "100.00000000",
# "size": "0.01000000",
# "side": "sell"
#}]
# or if you take it from 'data' (a generator cannot be sliced, so build a list first)
list_of_dicts = list(data)[:-1]
df = pd.DataFrame(list_of_dicts)
df
Out[4]:
                          time  trade_id         price        size  side
0  2014-11-07T22:19:28.578544Z        74   10.00000000  0.01000000   buy
1  2014-11-07T01:08:43.642366Z        73  100.00000000  0.01000000  sell
UPDATE
According to the question update, it seems you have JSON data that is still a string...
import json
data = json.loads(data)
data = data['Returns']
pd.DataFrame(data)
                          time  trade_id         price        size  side
0  2014-11-07T22:19:28.578544Z        74   10.00000000  0.01000000   buy
1  2014-11-07T01:08:43.642366Z        73  100.00000000  0.01000000  sell
I'm trying to wrangle some data to make a recommender system for an app. Of course, to do this I need a record of which users like which posts. I currently have that data in a JSON file that is formatted like this (numbers being post id, and letters being user ids):
{
"-1234": {
"abc": "abc",
"def": "def",
"ghi": "ghi"
},
"-5678": {
"jkl": "jkl",
"mno": "mno"
}
}
I'm trying to figure out how to get this into a pandas dataframe that would look like this:
example format
I've tried using a few online JSON to CSV converters out of laziness, which unsurprisingly didn't bring it into a usable format for me. I've also tried using "print(json_normalize(data))", which did not work either: it put each instance of a like into a separate column.
Any advice?
This is a solution optimized for the peculiarities in your dataset.
import pandas as pd
data = {
"-1234": {
"abc": "abc",
"def": "def",
"ghi": "ghi"
},
"-5678": {
"jkl": "jkl",
"mno": "mno"
}}
formatted = [{'PostID': d, 'User Like': list(data[d].keys())} for d in data]
df = pd.DataFrame.from_dict(formatted)
Output:
In my experience, for such simple formats, writing a quick-and-dirty loop is usually faster than finding a ready-made solution and customizing it. An example for the data you gave here:
import json
my_json=""" {
"-1234": {
"abc": "abc",
"def": "def",
"ghi": "ghi"
},
"-5678": {
"jkl": "jkl",
"mno": "mno"
}
}"""
parsed_json = json.loads(my_json)
print(parsed_json)
# result:
# {'-1234': {'abc': 'abc', 'def': 'def', 'ghi': 'ghi'},
# '-5678': {'jkl': 'jkl', 'mno': 'mno'}}
for key in parsed_json.keys():
    line = ''
    line += key
    line += ' | '
    for value in parsed_json[key].values():
        line += value + ', '
    line = line[:-2] # stripping the ', ' from the end of the line
    print(line)
# result:
# -1234 | abc, def, ghi
# -5678 | jkl, mno
Setup
Thanks Zaroth
import json
import pandas as pd
my_json=""" {
"-1234": {
"abc": "abc",
"def": "def",
"ghi": "ghi"
},
"-5678": {
"jkl": "jkl",
"mno": "mno"
}
}"""
parsed_json = json.loads(my_json)
Comprehension
pd.DataFrame(
    [(k, [*v]) for k, v in parsed_json.items()],
    columns=['PostID', 'User Like']
)
  PostID        User Like
0  -1234  [abc, def, ghi]
1  -5678       [jkl, mno]
OR
pd.DataFrame({
    'PostID': [*parsed_json],
    'User Like': [[*v] for v in parsed_json.values()]
})
import pandas as pd
data = {"-1234": {"abc": "abc","def": "def","ghi": "ghi"},"-5678": {"jkl": "jkl","mno": "mno"}}
key = []
val = []
# collect each post id together with the list of users who liked it
for k, v in data.items():
    key.append(k)
    val.append(list(v.values()))
pd.DataFrame(list(zip(key, val)), columns=['PostID', 'User Like'])
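Since the stated goal is a recommender system, a long layout with one row per (post, user) pair may be easier to feed into most tooling than a column of lists. A sketch, assuming pandas 1.1 or later (for ignore_index in explode); likes, posts and pairs are illustrative names:

```python
import pandas as pd

likes = {
    "-1234": {"abc": "abc", "def": "def", "ghi": "ghi"},
    "-5678": {"jkl": "jkl", "mno": "mno"},
}

# One row per post, with the liking users collected in a list.
posts = pd.DataFrame({
    "PostID": list(likes),
    "User Like": [list(users) for users in likes.values()],
})

# One row per (post, user) pair.
pairs = posts.explode("User Like", ignore_index=True)
print(pairs)
```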