I'm struggling to convert a JSON API response into a pandas DataFrame. I've read answers to similar questions and the documentation, but nothing has helped. My closest attempt is below:
import pandas as pd
import requests

r = requests.get('https://api.xxx')
data = r.text
df = pd.read_json(data, orient='records')
Which returns the following format:
0 {'type': 'bid', 'price': 6.193e-05, ...},
1 {'type': 'bid', 'price': 6.194e-05, ...},
2 {'type': 'bid', 'price': 6.149e-05, ...} etc
The original format of the data is:
{'abc': [{'type': 'bid',
'price': 6.194e-05,
'amount': 2321.37952545,
'tid': 8577050,
'timestamp': 1498649162},
{'type': 'bid',
'price': 6.194e-05,
'amount': 498.78993587,
'tid': 8577047,
'timestamp': 1498649151},
...]}
I'm happy to be directed to good documentation.
I think you need json_normalize:
import pandas as pd
import requests

r = requests.get('https://api.xxx')
data = r.json()  # parse the body into a dict; json_normalize does not accept a raw string
df = pd.json_normalize(data, 'abc')
print (df)
amount price tid timestamp type
0 2321.379525 0.000062 8577050 1498649162 bid
1 498.789936 0.000062 8577047 1498649151 bid
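Note that since pandas 1.0, json_normalize is available at the top level as pd.json_normalize; in older versions it has to be imported from pandas.io.json.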
For multiple keys, it is possible to use concat with a list comprehension and the DataFrame constructor:
d = {'abc': [{'type': 'bid', 'price': 6.194e-05, 'amount': 2321.37952545, 'tid': 8577050, 'timestamp': 1498649162}, {'type': 'bid', 'price': 6.194e-05, 'amount': 498.78993587, 'tid': 8577047, 'timestamp': 1498649151}],
'def': [{'type': 'bid', 'price': 6.194e-05, 'amount': 2321.37952545, 'tid': 8577050, 'timestamp': 1498649162}, {'type': 'bid', 'price': 6.194e-05, 'amount': 498.78993587, 'tid': 8577047, 'timestamp': 1498649151}]}
df = pd.concat([pd.DataFrame(v) for k,v in d.items()], keys=d)
print (df)
amount price tid timestamp type
abc 0 2321.379525 0.000062 8577050 1498649162 bid
1 498.789936 0.000062 8577047 1498649151 bid
def 0 2321.379525 0.000062 8577050 1498649162 bid
1 498.789936 0.000062 8577047 1498649151 bid
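The result has a MultiIndex whose first level is the source key, so one key's rows can be selected afterwards:
print (df.loc['abc'])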
I would really appreciate any help with the below. I am looking to create a set of values with one name compiling all duplicates, plus a second dict value totalling another value from a list of dicts. I have put together the code below as an example:
l = [{'id': 1, 'name': 'apple', 'price': '100', 'year': '2000', 'currency': 'eur'},
{'id': 2, 'name': 'apple', 'price': '150', 'year': '2071', 'currency': 'eur'},
{'id': 3, 'name': 'apple', 'price': '1220', 'year': '2076', 'currency': 'eur'},
{'id': 4, 'name': 'cucumber', 'price': '90000000', 'year': '2080', 'currency': 'eur'},
{'id': 5, 'name': 'pear', 'price': '1000', 'year': '2000', 'currency': 'eur'},
{'id': 6, 'name': 'apple', 'price': '150', 'year': '2022', 'currency': 'eur'},
{'id': 9, 'name': 'apple', 'price': '100', 'year': '2000', 'currency': 'eur'},
{'id': 10, 'name': 'grape', 'price': '150', 'year': '2022', 'currency': 'eur'},
]
new_list = []
for d in l:
    if d['name'] not in new_list:
        new_list.append(d['name'])
print(new_list)

price_list = []
for price in l:
    if price['price'] not in price_list:
        price_list.append(price['price'])
print(price_list)
The output I am hoping to achieve is:
[{'name': 'apple'}, {'price': <The total price for all apples>}]
Use a dictionary whose keys are the names and whose values are lists of prices, then total each list:
d = {}
for item in l:
    d.setdefault(item['name'], []).append(int(item['price']))

for name, prices in d.items():
    d[name] = sum(prices)

print(d)
Actually, I thought this was the same as yesterday's question, where you wanted the average. If you just want the total, you don't need the lists. Use a defaultdict containing integers, and just add the price to it.
from collections import defaultdict

d = defaultdict(int)
for item in l:
    d[item['name']] += int(item['price'])
print(d)
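For reference, a minimal sketch of the average variant mentioned above, reusing the lists-of-prices approach from the first snippet:
d = {}
for item in l:
    d.setdefault(item['name'], []).append(int(item['price']))
# average instead of total: divide each summed list by its length
averages = {name: sum(prices) / len(prices) for name, prices in d.items()}
print(averages)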
This method only requires one loop:
prices = {}
for item in l:
    prices.update({item['name']: prices.get(item['name'], 0) + int(item['price'])})
print(prices)
Just for fun I decided to also implement the functionality with the item and price dictionaries separated as asked in the question, which gave the following horrendous code:
prices = []
for item in l:
    # get indices of prices of corresponding items
    price_idx = [n+1 for n, x in enumerate(prices) if item['name'] == x.get('name') and n % 2 == 0]
    if not price_idx:
        prices.append({'name': item['name']})
        prices.append({'price': int(item['price'])})
    else:
        prices[price_idx[0]].update({'price': prices[price_idx[0]]['price'] + int(item['price'])})
print(prices)
And requires the following function to retrieve prices:
def get_price(name):
    for n, x in enumerate(prices):
        if n % 2 == 0 and x['name'] == name:
            return prices[n+1]['price']
Which honestly completely defeats the point of having a data structure. But if it answers your question, there you go.
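For example, with the sample list from the question:
print(get_price('apple'))  # 1720 (100 + 150 + 1220 + 150 + 100)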
This could be another one:
result = {}
for item in l:
    if item['name'] not in result:
        result[item['name']] = {'name': item['name'], 'price': 0}
    result[item['name']]['price'] += int(item['price'])
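If you then want output shaped like the list in the question, pull the per-name dicts out of result:
print(list(result.values()))
# [{'name': 'apple', 'price': 1720}, {'name': 'cucumber', 'price': 90000000},
#  {'name': 'pear', 'price': 1000}, {'name': 'grape', 'price': 150}]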
I have the following DataFrame:
a = [{'order': '789', 'name': 'A', 'date': 20220501, 'sum': 15.1}, {'order': '456', 'name': 'A', 'date': 20220501, 'sum': 19}, {'order': '704', 'name': 'B', 'date': 20220502, 'sum': 14.1}, {'order': '704', 'name': 'B', 'date': 20220502, 'sum': 22.9}, {'order': '700', 'name': 'B', 'date': 20220502, 'sum': 30.1}, {'order': '710', 'name': 'B', 'date': 20220502, 'sum': 10.5}]
df = pd.DataFrame(a)
print(df)
I need to count the distinct values in column order and add them to a new column order_count, grouping by columns name and date and summing the values in column sum.
I need to get the following result:
  name      date   sum  order_count
0    A  20220501  34.1            2
1    B  20220502  77.6            3
In your case do
out = df.groupby(['name','date'],as_index=False).agg({'sum':'sum','order':'nunique'})
Out[652]:
name date sum order
0 A 20220501 34.1 2
1 B 20220502 77.6 3
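The question names the count column order_count; if needed, rename it afterwards:
out = out.rename(columns={'order': 'order_count'})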
import pandas as pd

sums = df[['name', 'date', 'sum']].groupby(['name', 'date']).sum().reset_index()
counts = (df.groupby(['name', 'date'])['order']
            .nunique()
            .reset_index(drop=True)
            .rename('order_count'))
out = sums.join(counts)
I have a list of dictionaries:
mylist = [{'Date': '01/02/2020', 'Value': '13'},
{'Date': '01/03/2020', 'Value': '2'},
{'Date': '10/3/2020', 'Value': '4'},
{'Date': '12/25/2020', 'Value': '2'}]
I wanted to sum the Values of the Date from 01/01/2020 to 01/04/2020. I tried the following to select the rows within the date range:
import pandas as pd

dfmylist = pd.DataFrame(mylist)
dfmylist['Date'] = pd.to_datetime(dfmylist['Date'])
dfmylistnew = (dfmylist['Date'] > '01/01/2020') & (dfmylist['Date'] <= '01/04/2020')
dfmylistnew1 = dfmylist.loc[dfmylistnew]
dfmylistnew1
I got the output data frame:
Date Value
0 2020-01-02 13
1 2020-01-03 2
I want to get the sum Value from the above data frame, which is 15
I tried:
total = dfmylistnew1['Value'].sum()
but the output is 132, instead of 15
From your data, convert the values to the right types first. 'Value' holds strings, so .sum() concatenates them ('13' + '2' gives '132') instead of adding numbers:
mylist = [{'Date': '01/02/2020', 'Value': '13'},
{'Date': '01/03/2020', 'Value': '2'},
{'Date': '10/3/2020', 'Value': '4'},
{'Date': '12/25/2020', 'Value': '2'}]
df = pd.DataFrame(mylist).astype({'Date': 'datetime64[ns]', 'Value': 'int'})
total = df.loc[df['Date'].between('01/01/2020', '01/04/2020', inclusive='right'),
'Value'].sum()
print(total)
# Output
15
I have a pandas dataframe in which one column, custom, consists of dictionaries within a list. The list may be empty or have one or more dictionary objects within it. For example...
id custom
1 []
2 [{'key': 'impact', 'name': 'Impact', 'value': 'abc', 'type': 'string'}, {'key': 'proposed_status','name': 'PROPOSED Status [temporary]', 'value': 'pqr', 'type': 'string'}]
3 [{'key': 'impact', 'name': 'Impact', 'value': 'xyz', 'type': 'string'}]
I'm interested in extracting the data from the JSON into separate columns based on the dicts' 'key' and 'value' entries!
For example, the output df here will have additional columns impact and proposed_status:
id custom impact proposed_status
1 ... NA NA
2 ... abc pqr
3 ... xyz NA
Could the smart people of StackOverflow please guide me on the right way to solve this? Thanks!
The approach is explained in the comments:
df = pd.DataFrame({'id': [1, 2, 3],
'custom': [[],
[{'key': 'impact', 'name': 'Impact', 'value': 'abc', 'type': 'string'},
{'key': 'proposed_status',
'name': 'PROPOSED Status [temporary]',
'value': 'pqr',
'type': 'string'}],
[{'key': 'impact', 'name': 'Impact', 'value': 'xyz', 'type': 'string'}]]})
# expand out lists, reset_index() so join() will work
df2 = df.explode("custom").reset_index(drop=True)
# join to keep "id"
df2 = (df2.join(df2["custom"]
# expand embedded dict
.apply(pd.Series))
.loc[:,["id","key","value"]]
# empty lists generate spurious NaN rows; remove them
.dropna()
# turn key attribute into column
.set_index(["id","key"]).unstack(1)
# cleanup multi index columns
.droplevel(0, axis=1)
)
df.merge(df2, on="id", how="left")
   id                                             custom impact proposed_status
0   1                                                 []    NaN             NaN
1   2  [{'key': 'impact', 'name': 'Impact', 'value':...    abc             pqr
2   3  [{'key': 'impact', 'name': 'Impact', 'value':...    xyz             NaN
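For comparison, a shorter sketch of the same idea: build one {key: value} dict per row of custom and expand it into columns. This assumes every dict in the lists carries both 'key' and 'value' entries, as in the example data.
# empty lists become empty dicts, which turn into NaN in every extracted column
extracted = pd.DataFrame(
    [{d['key']: d['value'] for d in row} for row in df['custom']],
    index=df.index)
out = df.join(extracted)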
I have a dataframe like this
df['likes']
0 {'data': [{'id': '651703178310339', 'name': 'A...
1 {'data': [{'id': '798659570200808', 'name': 'B...
2 {'data': [{'id': '10200132902001105', 'name': ...
3 {'data': [{'id': '10151983313320836', 'name': ...
4 NaN
5 {'data': [{'id': '1551927888235503', 'name': '...
6 {'data': [{'id': '10204089171847031', 'name': ...
7 {'data': [{'id': '399992547089295', 'name': 'В...
8 {'data': [{'id': '10201813292573808', 'name': ...
9 NaN
Some cells have several 'id' elements:
df['likes'][0]
{'data': [{'id': '651703178310339', 'name': 'A'},
{'id': '10204089171847031', 'name': 'B'}],
'paging': {'cursors': {'after': 'MTAyMDQwODkxNzE4NDcwMzEZD',
'before': 'NjUxNzAzMTc4MzEwMzM5'}}}
Other cells have none (NaN). I want to get a new variable
df['number']
0 2
1 4
2 3
4 0
that contains the number of 'id' elements. df['likes'] was obtained from a dict. I tried to count 'id':
df['likes'].apply(lambda x: x.count('id'))
AttributeError: 'dict' object has no attribute 'count'
So I tried like this
df['likes'].apply(lambda x: len(x.keys()))
AttributeError: 'float' object has no attribute 'keys'
How to fix it?
I was asked to publish a full set of data; I am posting three rows so as not to take up much space:
df['likes']
0 {'data': [{'id': '651703178310339', 'name': 'A'},
{'id': '10204089171847031', 'name': 'B'}],
'paging': {'cursors': {'after': 'MTAyMDQwODkxNzE4NDcwMzEZD',
'before': 'NjUxNzAzMTc4MzEwMzM5'}}}
1 {'data': [{'id': '798659570200808', 'name': 'C'},
{'id': '574668895969867', 'name': 'D'},
{'id': '651703178310339', 'name': 'A'},
{'id': '1365088683555195', 'name': 'G'}],
'paging': {'cursors': {'after': 'MTM2NTA4ODY4MzU1NTE5NQZDZD',
'before': 'Nzk4NjU5NTcwMjAwODA4'}}}
2 NaN
Option 1:
In [120]: df.likes.apply(pd.Series)['data'].apply(lambda x: pd.Series(x).notnull()).sum(1)
Out[120]:
0 2.0
1 4.0
2 0.0
dtype: float64
Option 2:
In [146]: df['count'] = [sum('id' in d for d in x.get('data',[]))
if pd.notna(x) else 0
for x in df['likes']]
In [147]: df
Out[147]:
likes count
0 {'data': [{'id': '651703178310339', 'name': 'A... 2
1 {'data': [{'id': '798659570200808', 'name': 'C... 4
2 NaN 0
Data set:
In [137]: df.to_dict('records')
Out[137]:
[{'likes': {'data': [{'id': '651703178310339', 'name': 'A'},
{'id': '10204089171847031', 'name': 'B'}],
'paging': {'cursors': {'after': 'MTAyMDQwODkxNzE4NDcwMzEZD',
'before': 'NjUxNzAzMTc4MzEwMzM5'}}}},
{'likes': {'data': [{'id': '798659570200808', 'name': 'C'},
{'id': '574668895969867', 'name': 'D'},
{'id': '651703178310339', 'name': 'A'},
{'id': '1365088683555195', 'name': 'G'}],
'paging': {'cursors': {'after': 'MTM2NTA4ODY4MzU1NTE5NQZDZD',
'before': 'Nzk4NjU5NTcwMjAwODA4'}}}},
{'likes': nan}]
This almost works:
df['likes'].apply(lambda x: len(x['data']))
Note the error:
> AttributeError: 'float' object has no attribute 'keys'
That happens because you have some NaN values (which are represented as float NaN). So, filtering them out first:
df['likes'][df['likes'].notnull()].apply(lambda x: len(x['data']))
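If you also want 0 for the NaN rows, as in the desired df['number'], a small variant is to branch on the type instead of filtering:
# dict rows contribute len(x['data']); NaN rows (floats) contribute 0
df['number'] = df['likes'].apply(lambda x: len(x['data']) if isinstance(x, dict) else 0)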