Extract values from Dictionary inside a pandas dataframe (Python)

I am trying to extract values from a dictionary stored in a dataframe column, but have been unable to. None of the existing solutions matches my requirement, so I am asking for help.
instrument_token last_price change depth
0 17600770 180.75 20.500000 {'buy': [{'quantity': 1, 'price': 1, 'orders': 1},{'quantity': 0, 'price': 0.0, 'orders': 0}], 'sell': [{'quantity': 1, 'price': 1, 'orders': 1},{'quantity': 0, 'price': 0.0, 'orders': 0}]}
1 12615426 0.05 -50.000000 {'buy': [{'quantity': 2, 'price': 2, 'orders': 2},{'quantity': 0, 'price': 0.0, 'orders': 0}], 'sell': [{'quantity': 2, 'price': 2, 'orders': 2},{'quantity': 0, 'price': 0.0, 'orders': 0}]}
2 17543682 0.35 -89.062500 {'buy': [{'quantity': 3, 'price': 3, 'orders': 3},{'quantity': 0, 'price': 0.0, 'orders': 0}], 'sell': [{'quantity': 3, 'price': 3, 'orders': 3},{'quantity': 0, 'price': 0.0, 'orders': 0}]}
3 17565954 6.75 -10.000000 {'buy': [{'quantity': 4, 'price': 4, 'orders': 4},{'quantity': 0, 'price': 0.0, 'orders': 0}], 'sell': [{'quantity': 4, 'price': 4, 'orders': 4},{'quantity': 0, 'price': 0.0, 'orders': 0}]}
4 26077954 3.95 -14.130435 {'buy': [{'quantity': 5, 'price': 5, 'orders': 5},{'quantity': 0, 'price': 0.0, 'orders': 0}], 'sell': [{'quantity': 5, 'price': 5, 'orders': 5},{'quantity': 0, 'price': 0.0, 'orders': 0}]}
5 17599490 141.75 -2.241379 {'buy': [{'quantity': 6, 'price': 6, 'orders': 6},{'quantity': 0, 'price': 0.0, 'orders': 0}], 'sell': [{'quantity': 6, 'price': 6, 'orders': 6},{'quantity': 0, 'price': 0.0, 'orders': 0}]}
6 17566978 17.65 -1.671309 {'buy': [{'quantity': 7, 'price': 7, 'orders': 7},{'quantity': 0, 'price': 0.0, 'orders': 0}], 'sell': [{'quantity': 7, 'price': 7, 'orders': 7},{'quantity': 0, 'price': 0.0, 'orders': 0}]}
7 26075906 24.70 -16.554054 {'buy': [{'quantity': 8, 'price': 8, 'orders': 8},{'quantity': 0, 'price': 0.0, 'orders': 0}], 'sell': [{'quantity': 8, 'price': 8, 'orders': 8},{'quantity': 0, 'price': 0.0, 'orders': 0}]}
I am looking to convert it to the following:
instrument_token last_price change buy_price sell_price
0 17600770 180.75 20.500000 1 1
1 12615426 0.05 -50.000000 2 2
2 17543682 0.35 -89.062500 3 3
3 17565954 6.75 -10.000000 4 4
4 26077954 3.95 -14.130435 5 5
5 17599490 141.75 -2.241379 6 6
6 17566978 17.65 -1.671309 7 7
...
I am able to access the individual elements using a for loop, but unable to convert the dictionary into the buy_price and sell_price columns shown in the desired df above.

If you want the price only from the first element of each list, and not a sum, then do:
df["buy_price"]=df["depth"].str["buy"].str[0].str["price"]
df["sell_price"]=df["depth"].str["sell"].str[0].str["price"]
In case you wish to get a sum of all nested elements:
df["buy_price"]=df["depth"].str["buy"].apply(lambda x: sum(el["price"] for el in x))
df["sell_price"]=df["depth"].str["sell"].apply(lambda x: sum(el["price"] for el in x))
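If you want to try the first-element variant on a small frame, here is a minimal self-contained sketch. It assumes the depth column holds real dicts (not strings); the helper name first_price is only for illustration, and it uses a plain list comprehension instead of the .str accessor so that a missing or empty side gives None rather than an error.

```python
import pandas as pd

# Two rows of the sample data from the question.
df = pd.DataFrame({
    "instrument_token": [17600770, 12615426],
    "last_price": [180.75, 0.05],
    "depth": [
        {"buy": [{"quantity": 1, "price": 1, "orders": 1},
                 {"quantity": 0, "price": 0.0, "orders": 0}],
         "sell": [{"quantity": 1, "price": 1, "orders": 1},
                  {"quantity": 0, "price": 0.0, "orders": 0}]},
        {"buy": [{"quantity": 2, "price": 2, "orders": 2},
                 {"quantity": 0, "price": 0.0, "orders": 0}],
         "sell": [{"quantity": 2, "price": 2, "orders": 2},
                  {"quantity": 0, "price": 0.0, "orders": 0}]},
    ],
})


def first_price(depth, side):
    """Return the price of the first level on `side`, or None if there is none."""
    levels = depth.get(side) or []
    return levels[0]["price"] if levels else None


for side in ("buy", "sell"):
    df[f"{side}_price"] = [first_price(d, side) for d in df["depth"]]

result = df.drop(columns="depth")
print(result)
```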

Is this what you're looking for?
def get_prices(depth, tag):
    # Sum the 'price' field over every level listed under depth[tag].
    total = 0
    for item in depth[tag]:
        total += item['price']
    return int(total)
df['buy_price'] = df['depth'].apply(lambda depth: get_prices(depth, 'buy'))
df['sell_price'] = df['depth'].apply(lambda depth: get_prices(depth, 'sell'))
df.drop(columns='depth', inplace=True)
print(df)
Output:
instrument_token last_price change buy_price sell_price
0 17600770 180.75 20.500000 1 1
1 12615426 0.05 -50.000000 2 2
2 17543682 0.35 -89.062500 3 3
3 17565954 6.75 -10.000000 4 4
4 26077954 3.95 -14.130435 5 5
5 17599490 141.75 -2.241379 6 6
6 17566978 17.65 -1.671309 7 7
7 26075906 24.70 -16.554054 8 8

I use ast here to turn the string into a Python data structure. If the column already holds actual dictionaries, as in your case, you can drop the ast.literal_eval call.
Extract the prices and join them back to the original dataframe. The assumption, based on your output, is that you are only interested in the first dict in each sublist for buy and sell respectively.
import ast
import pandas as pd
res = [{f"{x}_price": ast.literal_eval(ent)[x][0]['price']
        for x in ("buy", "sell")}
       for ent in df.pop('depth')]
df.join(pd.DataFrame(res))
instrument_token last_price change buy_price sell_price
0 17600770 180.75 20.500000 1 1
1 12615426 0.05 -50.000000 2 2
2 17543682 0.35 -89.062500 3 3
3 17565954 6.75 -10.000000 4 4
4 26077954 3.95 -14.130435 5 5
5 17599490 141.75 -2.241379 6 6
6 17566978 17.65 -1.671309 7 7
7 26075906 24.70 -16.554054 8 8
For actual dictionaries:
res = [{f"{x}_price": ent[x][0]['price']
        for x in ("buy", "sell")}
       for ent in df.pop('depth')]
# merge back to df
result = df.join(pd.DataFrame(res))
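If you are not sure whether the column holds strings or real dicts (for example after a round trip through CSV), a small guard can handle both. This is only a sketch: the helper name as_dict and the two-row frame are made up for illustration.

```python
import ast

import pandas as pd


def as_dict(value):
    """Parse a string repr into a dict; pass real dicts through unchanged."""
    return ast.literal_eval(value) if isinstance(value, str) else value


df = pd.DataFrame({
    "instrument_token": [17600770, 12615426],
    "depth": [
        # first row: depth stored as a string
        "{'buy': [{'quantity': 1, 'price': 1, 'orders': 1}], "
        "'sell': [{'quantity': 1, 'price': 1, 'orders': 1}]}",
        # second row: depth stored as a real dict
        {"buy": [{"quantity": 2, "price": 2, "orders": 2}],
         "sell": [{"quantity": 2, "price": 2, "orders": 2}]},
    ],
})

# Only the first level of each side is used, as in the answer above.
res = [
    {f"{side}_price": as_dict(ent)[side][0]["price"] for side in ("buy", "sell")}
    for ent in df.pop("depth")
]
result = df.join(pd.DataFrame(res, index=df.index))
print(result)
```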

Related

python nested dictionary to pandas DataFrame

main_dict = {
'NSE:ACC': {'average_price': 0,
'buy_quantity': 0,
'depth': {'buy': [{'orders': 0, 'price': 0, 'quantity': 0},
{'orders': 0, 'price': 0, 'quantity': 0},
{'orders': 0, 'price': 0, 'quantity': 0},
{'orders': 0, 'price': 0, 'quantity': 0},
{'orders': 0, 'price': 0, 'quantity': 0}],
'sell': [{'orders': 0, 'price': 0, 'quantity': 0},
{'orders': 0, 'price': 0, 'quantity': 0},
{'orders': 0, 'price': 0, 'quantity': 0},
{'orders': 0, 'price': 0, 'quantity': 0},
{'orders': 0, 'price': 0, 'quantity': 0}]},
'instrument_token': 5633,
'last_price': 2488.9,
'last_quantity': 0,
'last_trade_time': '2022-09-23 15:59:10',
'lower_circuit_limit': 2240.05,
'net_change': 0,
'ohlc': {'close': 2555.7,
'high': 2585.5,
'low': 2472.2,
'open': 2575},
'oi': 0,
'oi_day_high': 0,
'oi_day_low': 0,
'sell_quantity': 0,
'timestamp': '2022-09-23 18:55:17',
'upper_circuit_limit': 2737.75,
'volume': 0},
}
I want to convert this dict to a pandas dataframe, for example:
symbol last_price net_change Open High Low Close
NSE:ACC 2488.9 0 2575 2585.5 2472.2 2555.7
I tried pd.DataFrame.from_dict(main_dict), but it does not work.
What would be the best way to do this?

I would first select the necessary data from your dict and then pass that as input to pd.DataFrame():
import pandas as pd
# Build one flat record per symbol, picking only the fields that are needed.
df_input = [{
    "symbol": symbol,
    "last_price": main_dict.get(symbol).get("last_price"),
    "net_change": main_dict.get(symbol).get("net_change"),
    "open": main_dict.get(symbol).get("ohlc").get("open"),
    "high": main_dict.get(symbol).get("ohlc").get("high"),
    "low": main_dict.get(symbol).get("ohlc").get("low"),
    "close": main_dict.get(symbol).get("ohlc").get("close")
} for symbol in main_dict]
df = pd.DataFrame(df_input)
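As an alternative sketch (not what the answer above does), pd.json_normalize can flatten the nested ohlc dict for you, so the fields do not have to be picked one by one. The main_dict here is a trimmed copy of the one in the question, and the final column selection is just one possible layout.

```python
import pandas as pd

# Trimmed copy of the question's dict; the real one has more fields.
main_dict = {
    "NSE:ACC": {
        "instrument_token": 5633,
        "last_price": 2488.9,
        "net_change": 0,
        "ohlc": {"close": 2555.7, "high": 2585.5, "low": 2472.2, "open": 2575},
    },
}

# One record per symbol, with the symbol itself added as a field.
records = [{"symbol": symbol, **quote} for symbol, quote in main_dict.items()]

# Nested keys become dotted column names such as 'ohlc.open'.
flat = pd.json_normalize(records)
flat = flat.rename(columns=lambda c: c.replace("ohlc.", ""))

df = flat[["symbol", "last_price", "net_change", "open", "high", "low", "close"]]
print(df)
```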

Nested Json in Pandas Column

I have a dataframe with nested JSON in one column.
df.depth
0 {'buy': [{'quantity': 51, 'price': 2275.85, 'o...
1 {'buy': [{'quantity': 1, 'price': 2275.85, 'or...
2 {'buy': [{'quantity': 1, 'price': 2275.85, 'or..
Each row holds 5 depth levels for both buy and sell:
df.depth[0]
{'buy': [{...}, {...}, {...}, {...}, {...}], 'sell': [{...}, {...}, {...}, {...}, {...}]}
The real JSON structure is as below:
{'buy': [{'quantity': 51, 'price': 2275.85, 'orders': 2}, {'quantity': 38, 'price': 2275.8, 'orders': 2}, {'quantity': 108, 'price': 2275.75, 'orders': 3}, {'quantity': 120, 'price': 2275.7, 'orders': 2}, {'quantity': 6, 'price': 2275.6, 'orders': 1}], 'sell': [{'quantity': 353, 'price': 2276.95, 'orders': 1}, {'quantity': 29, 'price': 2277.0, 'orders': 2}, {'quantity': 54, 'price': 2277.1, 'orders': 2}, {'quantity': 200, 'price': 2277.2, 'orders': 1}, {'quantity': 4, 'price': 2277.25, 'orders': 1}]}
I want to explode this into something like this:
Required Output:
depth.buy.quantity1 df.depth.buy.price1 ... depth.sell.quantity1 depth.sell.price1....
0 51 2275.85.... 353 2276
1 1 2275.85.... 352 2276
How can I do this?
Edit:
To help, I have added a demo dataframe:
a={'buy': [{'quantity': 51, 'price': 2275.85, 'orders': 2}, {'quantity': 38, 'price': 2275.8, 'orders': 2}, {'quantity': 108, 'price': 2275.75, 'orders': 3}, {'quantity': 120, 'price': 2275.7, 'orders': 2}, {'quantity': 6, 'price': 2275.6, 'orders': 1}], 'sell': [{'quantity': 353, 'price': 2276.95, 'orders': 1}, {'quantity': 29, 'price': 2277.0, 'orders': 2}, {'quantity': 54, 'price': 2277.1, 'orders': 2}, {'quantity': 200, 'price': 2277.2, 'orders': 1}, {'quantity': 4, 'price': 2277.25, 'orders': 1}]}
c=dict()
c['depth'] = a
df = pd.DataFrame([c,c])

You could try concat:
df = pd.concat([pd.concat([pd.DataFrame(x, index=[0]) for x in i], axis=1) for i in pd.json_normalize(df['depth'])['buy'].tolist()], ignore_index=True)
print(df)
Output:
quantity price orders quantity price orders ... quantity price orders quantity price orders
0 51 2275.85 2 38 2275.8 2 ... 120 2275.7 2 6 2275.6 1
1 51 2275.85 2 38 2275.8 2 ... 120 2275.7 2 6 2275.6 1
[2 rows x 15 columns]
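The concat above only covers the buy side and leaves the column names repeated. If you want both sides with names close to the ones in the question, a plain-Python flattening step is one option. This is a sketch on a shortened demo dict (2 levels instead of 5); the helper name flatten_depth and the exact column naming scheme are assumptions based on the required output.

```python
import pandas as pd

# Shortened version of the demo data: 2 levels per side instead of 5.
a = {
    "buy": [{"quantity": 51, "price": 2275.85, "orders": 2},
            {"quantity": 38, "price": 2275.8, "orders": 2}],
    "sell": [{"quantity": 353, "price": 2276.95, "orders": 1},
             {"quantity": 29, "price": 2277.0, "orders": 2}],
}
df = pd.DataFrame([{"depth": a}, {"depth": a}])


def flatten_depth(depth):
    """Turn one depth dict into a flat {column_name: value} mapping."""
    flat = {}
    for side, levels in depth.items():
        for level_no, level in enumerate(levels, start=1):
            for field, value in level.items():
                flat[f"depth.{side}.{field}{level_no}"] = value
    return flat


# One wide row per original row, aligned on the original index.
wide = pd.DataFrame([flatten_depth(d) for d in df["depth"]], index=df.index)
print(wide.columns.tolist())
```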

Pandas grouping and express as proportion

d = [{'name': 'tv', 'value': 10, 'amount': 35},
{'name': 'tv', 'value': 10, 'amount': 14},
{'name': 'tv', 'value': 15, 'amount': 23},
{'name': 'tv', 'value': 34, 'amount': 56},
{'name': 'radio', 'value': 90, 'amount': 35},
{'name': 'radio', 'value': 90, 'amount': 65},
{'name': 'radio', 'value': 100, 'amount': 50},
{'name': 'dvd', 'value': 0.5, 'amount': 35},
{'name': 'dvd', 'value': 0.2, 'amount': 40},
{'name': 'dvd', 'value': 0.5, 'amount': 15}
]
df = pd.DataFrame(d)
dff = df.groupby(['name', 'value']).agg('sum').reset_index()
dfff = dff.groupby(['name']).apply(lambda x: round((x['amount']/x['amount'].sum())*100))
print(dff)
print(dfff)
name value amount
0 dvd 0.2 40
1 dvd 0.5 50
2 radio 90.0 100
3 radio 100.0 50
4 tv 10.0 49
5 tv 15.0 23
6 tv 34.0 56
name
dvd 0 44.0
1 56.0
radio 2 67.0
3 33.0
tv 4 38.0
5 18.0
6 44.0
I now want to take this dataset and concatenate the rows grouped on the name variable. The amount variable should be expressed as a proportion.
The final dataset should look like below, where the value is the first term and amount expressed as a proportion is the second term.
name concatenated_values
0 dvd 0.2, 44%, 0.5, 56%
1 radio 90, 67%, 100, 33%
.
.
.

Use a custom lambda function that flattens the nested lists in GroupBy.apply:
dff = df.groupby(['name', 'value']).agg('sum').reset_index()
dff['amount'] = ((dff['amount'] / dff.groupby(['name'])['amount'].transform('sum') * 100)
                 .round().astype(int).astype(str) + '%')
f = lambda x: ', '.join(str(z) for y in x.to_numpy() for z in y)
d = dff.groupby('name')[['value','amount']].apply(f).reset_index(name='concatenated_values')
print(d)
name concatenated_values
0 dvd 0.2, 44%, 0.5, 56%
1 radio 90.0, 67%, 100.0, 33%
2 tv 10.0, 38%, 15.0, 18%, 34.0, 44%
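The output above shows 90.0 where the question asked for 90. If that matters, one possible variation is to build each "value, share%" pair as a string first, using the g format so whole numbers lose the trailing .0. This is a sketch of that idea, not the answer's method; the intermediate column name pair is made up.

```python
import pandas as pd

d = [{'name': 'tv', 'value': 10, 'amount': 35},
     {'name': 'tv', 'value': 10, 'amount': 14},
     {'name': 'tv', 'value': 15, 'amount': 23},
     {'name': 'tv', 'value': 34, 'amount': 56},
     {'name': 'radio', 'value': 90, 'amount': 35},
     {'name': 'radio', 'value': 90, 'amount': 65},
     {'name': 'radio', 'value': 100, 'amount': 50},
     {'name': 'dvd', 'value': 0.5, 'amount': 35},
     {'name': 'dvd', 'value': 0.2, 'amount': 40},
     {'name': 'dvd', 'value': 0.5, 'amount': 15}]
df = pd.DataFrame(d)

# Total amount per (name, value) pair.
dff = df.groupby(['name', 'value'], as_index=False)['amount'].sum()

# Share of each pair within its name group, in percent.
share = dff['amount'] / dff.groupby('name')['amount'].transform('sum') * 100

# 'g' drops the trailing '.0' from whole numbers; '.0f' rounds the percentage.
dff['pair'] = [f"{v:g}, {s:.0f}%" for v, s in zip(dff['value'], share)]

out = (dff.groupby('name')['pair']
          .agg(', '.join)
          .reset_index(name='concatenated_values'))
print(out)
```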

How to unpack an object of dictionaries to a range of Data Frames

I am creating a function that grabs data from an ERP system to display to the end user.
I want to unpack an object of dictionaries and create a range of Pandas DataFrames with them.
For example, I have:
troRows
{0: [{'productID': 134336, 'price': '10.0000', 'amount': '1', 'cost': 0}],
1: [{'productID': 142141, 'price': '5.5000', 'amount': '4', 'cost': 0}],
2: [{'productID': 141764, 'price': '5.5000', 'amount': '1', 'cost': 0}],
3: [{'productID': 81661, 'price': '4.5000', 'amount': '1', 'cost': 0}],
4: [{'productID': 146761, 'price': '5.5000', 'amount': '1', 'cost': 0}],
5: [{'productID': 143585, 'price': '5.5900', 'amount': '9', 'cost': 0}],
6: [{'productID': 133018, 'price': '5.0000', 'amount': '1', 'cost': 0}],
7: [{'productID': 146250, 'price': '13.7500', 'amount': '5', 'cost': 0}],
8: [{'productID': 149986, 'price': '5.8900', 'amount': '2', 'cost': 0},
{'productID': 149790, 'price': '4.9900', 'amount': '2', 'cost': 0},
{'productID': 149972, 'price': '5.2900', 'amount': '2', 'cost': 0},
{'productID': 149248, 'price': '2.0000', 'amount': '2', 'cost': 0},
{'productID': 149984, 'price': '4.2000', 'amount': '2', 'cost': 0},
Each time the function will need to unpack x number of dictionaries which may have different number of rows into a range of DataFrames.
So for example, this range of Dictionaries would return
DF0, DF1, DF2, DF3, DF4, DF5, DF6, DF7, DF8.
I can unpack a single Dictionary with:
pd.DataFrame(troRows[8])
which returns
amount cost price productID
0 2 0 5.8900 149986
1 2 0 4.9900 149790
2 2 0 5.2900 149972
3 2 0 2.0000 149248
4 2 0 4.2000 149984
How can I structure my code so that it does this for all the dictionaries for me?

Solution for a dictionary of DataFrames: use a dictionary comprehension, keeping the keys of the original dictionary as the keys of the result:
dfs = {k: pd.DataFrame(v) for k, v in troRows.items()}
print (dfs)
{0: amount cost price productID
0 1 0 10.0000 134336, 1: amount cost price productID
0 4 0 5.5000 142141, 2: amount cost price productID
0 1 0 5.5000 141764, 3: amount cost price productID
0 1 0 4.5000 81661, 4: amount cost price productID
0 1 0 5.5000 146761, 5: amount cost price productID
0 9 0 5.5900 143585, 6: amount cost price productID
0 1 0 5.0000 133018, 7: amount cost price productID
0 5 0 13.7500 146250, 8: amount cost price productID
0 2 0 5.8900 149986
1 2 0 4.9900 149790
2 2 0 5.2900 149972
3 2 0 2.0000 149248
4 2 0 4.2000 149984}
print (dfs[8])
amount cost price productID
0 2 0 5.8900 149986
1 2 0 4.9900 149790
2 2 0 5.2900 149972
3 2 0 2.0000 149248
4 2 0 4.2000 149984
Solutions for one DataFrame:
Use a list comprehension to flatten the nested lists and pass the result to the DataFrame constructor:
troRows = pd.Series([[{'productID': 134336, 'price': '10.0000', 'amount': '1', 'cost': 0}],
[{'productID': 142141, 'price': '5.5000', 'amount': '4', 'cost': 0}],
[{'productID': 141764, 'price': '5.5000', 'amount': '1', 'cost': 0}],
[{'productID': 81661, 'price': '4.5000', 'amount': '1', 'cost': 0}],
[{'productID': 146761, 'price': '5.5000', 'amount': '1', 'cost': 0}],
[{'productID': 143585, 'price': '5.5900', 'amount': '9', 'cost': 0}],
[{'productID': 133018, 'price': '5.0000', 'amount': '1', 'cost': 0}],
[{'productID': 146250, 'price': '13.7500', 'amount': '5', 'cost': 0}],
[{'productID': 149986, 'price': '5.8900', 'amount': '2', 'cost': 0},
{'productID': 149790, 'price': '4.9900', 'amount': '2', 'cost': 0},
{'productID': 149972, 'price': '5.2900', 'amount': '2', 'cost': 0},
{'productID': 149248, 'price': '2.0000', 'amount': '2', 'cost': 0},
{'productID': 149984, 'price': '4.2000', 'amount': '2', 'cost': 0}]])
df = pd.DataFrame([y for x in troRows for y in x])
Another way to flatten your data is to use chain.from_iterable:
from itertools import chain
df = pd.DataFrame(list(chain.from_iterable(troRows)))
print (df)
amount cost price productID
0 1 0 10.0000 134336
1 4 0 5.5000 142141
2 1 0 5.5000 141764
3 1 0 4.5000 81661
4 1 0 5.5000 146761
5 9 0 5.5900 143585
6 1 0 5.0000 133018
7 5 0 13.7500 146250
8 2 0 5.8900 149986
9 2 0 4.9900 149790
10 2 0 5.2900 149972
11 2 0 2.0000 149248
12 2 0 4.2000 149984
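The single flattened DataFrame loses track of which key each row came from. If you need that, one option is to carry the key along as a column while flattening. A minimal sketch on a cut-down troRows; the column name order is hypothetical, and converting price and amount to numbers is an extra step not in the answer above.

```python
import pandas as pd

# Cut-down version of troRows with two of the original keys.
troRows = {
    0: [{'productID': 134336, 'price': '10.0000', 'amount': '1', 'cost': 0}],
    8: [{'productID': 149986, 'price': '5.8900', 'amount': '2', 'cost': 0},
        {'productID': 149790, 'price': '4.9900', 'amount': '2', 'cost': 0}],
}

# Flatten, remembering the dictionary key each item belonged to.
rows = [{'order': key, **item} for key, items in troRows.items() for item in items]
df = pd.DataFrame(rows)

# price and amount arrive as strings; convert them for arithmetic.
df[['price', 'amount']] = df[['price', 'amount']].apply(pd.to_numeric)

# A per-key DataFrame can still be recovered by filtering.
df8 = df[df['order'] == 8].reset_index(drop=True)
print(df)
```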

Find keys occurence in list of objects in python

I'm trying to transform this list:
list = [
{'product_name': '4x6', 'quantity': 1, 'price': 0.29},
{'product_name': '4x6', 'quantity': 1, 'price': 0.29},
{'product_name': '4x6', 'quantity': 1, 'price': 0.29},
{'product_name': '4x6', 'quantity': 1, 'price': 0.29},
{'product_name': '4x4', 'quantity': 1, 'price': 0.29},
{'product_name': '4x4', 'quantity': 1, 'price': 0.29},
{'product_name': '4x4', 'quantity': 1, 'price': 0.29},
{'product_name': '4x4', 'quantity': 1, 'price': 0.29},
]
into per-product totals, grouped on the key 'product_name', like this:
list_final = [
{'product_name': '4x6', 'quantity': 4, 'price': 1.16},
{'product_name': '4x4', 'quantity': 4, 'price': 1.16},
]
I can't figure out how to find the occurrences of the key 'product_name' without nesting loops inside loops.
What I did:
for item in list:
    if item.product_name in data.keys():
        data[item.product_name]['qty'] += 1
        data[item.product_name]['price'] *= 2
    else:
        data.update({item.product_name: [{'qty': item['quantity'], 'price': item['price']}]})
But I can't find a solution that gives me the list I want.
How can I do this properly?

Here's a solution with OrderedDict that handles multiple products.
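from collections import OrderedDict
o = OrderedDict()
for x in data:  # data is the input list of dicts (named `list` in the question)
    p = x['product_name']
    if p not in o:
        o[p] = dict(x)  # copy, so the input dicts are not modified
    else:
        o[p].update({k: o[p][k] + x[k] for k in x.keys() - {'product_name'}})
list_final = list(o.values())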
A product is added to the inventory if it doesn't exist yet; otherwise its values are summed with the existing entry. This works on Python 3.
print(list_final)
[{'price': 1.16, 'product_name': '4x6', 'quantity': 4}]
For python2.x, change this
o[p].update({k : o[p][k] + x[k] for k in x.keys() - {'product_name'}})
To
o[p].update({k : o[p][k] + x[k] for k in set(x.keys()) - {'product_name'}})

Probably not the most readable, but here's a for loop-free implementation:
from functools import reduce  # reduce is not a builtin in Python 3
def transform(array):
    def inner(cumulator, row):
        product_name = row['product_name']
        bucket = cumulator.get(product_name, {'quantity': 0, 'price': 0})
        cumulator[product_name] = {
            'quantity': bucket['quantity'] + row['quantity'],
            'price': bucket['price'] + row['price'],
        }
        return cumulator
    return reduce(inner, array, {})
And then you just
transform(list)
# {'4x6': {'price': 1.16, 'quantity': 4}}

This might help:
l = [
{'product_name': '4x6', 'quantity': 1, 'price': 0.29},
{'product_name': '4x6', 'quantity': 1, 'price': 0.29},
{'product_name': '4x6', 'quantity': 1, 'price': 0.29},
{'product_name': '4x6', 'quantity': 1, 'price': 0.29},
{'product_name': '4x4', 'quantity': 1, 'price': 0.29},
{'product_name': '4x4', 'quantity': 1, 'price': 0.29},
{'product_name': '4x4', 'quantity': 1, 'price': 0.29},
{'product_name': '4x4', 'quantity': 1, 'price': 0.29},
]
from collections import defaultdict
count = defaultdict(lambda: {'quantity':0, 'price':0.0})
for d in l:
    count[d['product_name']]['quantity'] += 1
    count[d['product_name']]['price'] = d['price']
for prod_name, prod_info in count.items():
    print("product_name:", prod_name, "quantity: {quantity} price: {price}".format(**prod_info))
Output for your input:
product_name: 4x6 quantity: 4 price: 0.29
product_name: 4x4 quantity: 4 price: 0.29
Note: This also works with python2
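Note that the snippet above counts rows and keeps the last price, so it prints 0.29 rather than the 1.16 asked for in the question. If you want the totals in the list_final shape, a variation along these lines should do it. It is a sketch for Python 3.7+ (it relies on dicts keeping insertion order); the name items and the rounding to 2 decimals are my own choices, the latter to hide floating-point noise in the summed prices.

```python
from collections import defaultdict

items = [
    {'product_name': '4x6', 'quantity': 1, 'price': 0.29},
    {'product_name': '4x6', 'quantity': 1, 'price': 0.29},
    {'product_name': '4x6', 'quantity': 1, 'price': 0.29},
    {'product_name': '4x6', 'quantity': 1, 'price': 0.29},
    {'product_name': '4x4', 'quantity': 1, 'price': 0.29},
    {'product_name': '4x4', 'quantity': 1, 'price': 0.29},
    {'product_name': '4x4', 'quantity': 1, 'price': 0.29},
    {'product_name': '4x4', 'quantity': 1, 'price': 0.29},
]

# One running total per product; a single pass, no nested loops.
totals = defaultdict(lambda: {'quantity': 0, 'price': 0.0})
for item in items:
    bucket = totals[item['product_name']]
    bucket['quantity'] += item['quantity']
    bucket['price'] += item['price']

# Rebuild the list-of-dicts shape from the question.
list_final = [
    {'product_name': name, 'quantity': t['quantity'], 'price': round(t['price'], 2)}
    for name, t in totals.items()
]
print(list_final)
```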
