How to map dataframe columns to a dictionary of lists? - python

I have a dataframe with two columns where one category (area_id) encompasses the other (location_id). How can I get a dictionary of lists where the keys are the "area_id" values and the values are lists of the "location_id" values present in each "area_id"?
Concretely, given the dataframe:
df = pd.DataFrame(data={'area_id': ['area_1', 'area_1', 'area_1', 'area_2', 'area_2', 'area_3'],
'location_id': ['loc_a', 'loc_a', 'loc_b', 'loc_c', 'loc_d', 'loc_e']})
area_id location_id
0 area_1 loc_a
1 area_1 loc_a
2 area_1 loc_b
3 area_2 loc_c
4 area_2 loc_d
5 area_3 loc_e
I would like the following dictionary:
{'area_1': ['loc_a', 'loc_b'],
'area_2': ['loc_c', 'loc_d'],
'area_3': ['loc_e']}
Code below is a working solution, but I am wondering if there is a more elegant solution which avoids using a "for" loop:
res = {}
for _area in df['area_id'].unique():
    _locs = list(df[df['area_id'] == _area]['location_id'].unique())
    res[_area] = _locs
Thank you

Use:
df.drop_duplicates().groupby('area_id')['location_id'].agg(list).to_dict()
Output:
{'area_1': ['loc_a', 'loc_b'],
'area_2': ['loc_c', 'loc_d'],
'area_3': ['loc_e']}
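A possible variant, sketched here as an alternative rather than as part of the answer above: de-duplicate inside each group with unique() instead of calling drop_duplicates() on the whole frame. This may be preferable if the dataframe has other columns that would stop drop_duplicates() from removing repeated locations. The sample data is the one from the question.

```python
import pandas as pd

df = pd.DataFrame(data={
    'area_id': ['area_1', 'area_1', 'area_1', 'area_2', 'area_2', 'area_3'],
    'location_id': ['loc_a', 'loc_a', 'loc_b', 'loc_c', 'loc_d', 'loc_e'],
})

# unique() returns one array of distinct location_ids per area_id;
# apply(list) turns each array into a plain Python list.
res = df.groupby('area_id')['location_id'].unique().apply(list).to_dict()
print(res)
```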

Related

Merge dataframes from two dictionaries through a loop

Tried to keep this relatively simple but let me know if you need more information.
I have 2 dictionaries made up of three dataframes each, these have been produced through loops then added into a dictionary. They have the keys ['XAUUSD', 'EURUSD', 'GBPUSD'] in common:
trades_dict
{'XAUUSD': df_trades_1
'EURUSD': df_trades_2
'GBPUSD': df_trades_3}
prices_dict
{'XAUUSD': df_prices_1
'EURUSD': df_prices_2
'GBPUSD': df_prices_3}
I would like to merge the tables on the closest timestamps to produce 3 new dataframes, such that the XAUUSD trades dataframe is merged with the corresponding XAUUSD prices dataframe, and so on.
I have been able to join the dataframes in a loop using:
df_merge_list = []
for trades in trades_dict.values():
    for prices in prices_dict.values():
        df_merge = pd.merge_asof(trades, prices, left_on='transact_time', right_on='time', direction='backward')
        df_merge_list.append(df_merge)
However this produces a list of 9 dataframes, XAUUSD trades + XAUUSD price, XAUUSD trades + EURUSD price and XAUUSD trades + GBPUSD price etc.
Is there a way for me to join only the dataframes where the keys are identical? I'm assuming it will need to be something like this: if trades_dict.keys() == prices_dict.keys():
df_merge_list = []
for trades in trades_dict.values():
    for prices in prices_dict.values():
        if trades_dict.keys() == prices_dict.keys():
            df_merge = pd.merge_asof(trades, prices, left_on='transact_time', right_on='time', direction='backward')
            df_merge_list.append(df_merge)
but I'm getting the same result as above
Am I close? How can I do this for all instruments and only produce the 3 outputs I need? Any help is appreciated
Thanks in advance
"""
Pseudocode :
For each key in the list of keys in trades_dict :
    Pick that key's value (trades df) from trades_dict
    Using the same key, pick corresponding value (prices df) from prices_dict
    Merge both values (trades & prices dataframes)
"""
df_merge_list = []
for key in trades_dict.keys():
    trades = trades_dict[key]
    prices = prices_dict[key]  # using the same key to get corresponding prices
    df_merge = pd.merge_asof(trades, prices, left_on='transact_time', right_on='time', direction='backward')
    df_merge_list.append(df_merge)
What went wrong in the code posted in the question?
The nested for loop creates a cartesian product:
3 iterations in the outer loop multiplied by 3 iterations in the inner loop = 9 iterations.
trades_dict.keys() == prices_dict.keys() compares the full sets of keys, so it is True in all 9 iterations.
dict_a_all_keys == dict_b_all_keys is not the same as dict_a_key_1 == dict_b_key_1. So you could iterate through the keys of each dictionary and check whether they match in the nested loop, like this:
df_merge_list = []
for trades_key in trades_dict.keys():
    for prices_key in prices_dict.keys():
        if trades_key == prices_key:
            trades = trades_dict[trades_key]
            prices = prices_dict[trades_key]  # since trades_key is same as prices_key, they are interchangeable
            df_merge = pd.merge_asof(trades, prices, left_on='transact_time', right_on='time', direction='backward')
            df_merge_list.append(df_merge)
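If it helps to keep track of which merged dataframe belongs to which instrument, the same key-based lookup can be written as a dictionary comprehension. This is only a sketch: the small trades and prices frames below are made-up sample data (the qty and price columns are assumptions), and only keys present in both dictionaries are merged. Note that pd.merge_asof expects both frames to be sorted on their time columns.

```python
import pandas as pd

# Hypothetical sample data; only the column names transact_time and time
# come from the question.
trades_dict = {
    'XAUUSD': pd.DataFrame({
        'transact_time': pd.to_datetime(['2021-01-01 10:00:05', '2021-01-01 10:00:12']),
        'qty': [1, 2],
    }),
    'EURUSD': pd.DataFrame({
        'transact_time': pd.to_datetime(['2021-01-01 10:00:07']),
        'qty': [3],
    }),
}
prices_dict = {
    'XAUUSD': pd.DataFrame({
        'time': pd.to_datetime(['2021-01-01 10:00:00', '2021-01-01 10:00:10']),
        'price': [1900.0, 1901.0],
    }),
    'EURUSD': pd.DataFrame({
        'time': pd.to_datetime(['2021-01-01 10:00:00']),
        'price': [1.22],
    }),
}

# Keep only keys found in both dictionaries, in the order of trades_dict.
common_keys = [key for key in trades_dict if key in prices_dict]

# One merged dataframe per instrument, stored under the instrument's key.
merged = {
    key: pd.merge_asof(trades_dict[key], prices_dict[key],
                       left_on='transact_time', right_on='time',
                       direction='backward')
    for key in common_keys
}
print(merged['XAUUSD'])
```

With the result in a dictionary, merged['XAUUSD'] retrieves the XAUUSD trades joined to the most recent earlier XAUUSD price, which a plain list does not make obvious.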
You need to provide the exact dataframes with the correct column names in a reproducible form but you can use a dictionary like this:
import numpy as np
import pandas as pd
np.random.seed(42)
df_trades_1 = df_trades_2 = df_trades_3 = pd.DataFrame(np.random.rand(10, 2), columns = ['ID1', 'Val1'])
df_prices_1 = df_prices_2 = df_prices_3 = pd.DataFrame(np.random.rand(10, 2), columns = ['ID2', 'Val2'])
trades_dict = {'XAUUSD':df_trades_1, 'EURUSD':df_trades_2, 'GBPUSD':df_trades_3}
prices_dict = {'XAUUSD':df_prices_1, 'EURUSD':df_prices_2, 'GBPUSD':df_prices_3}
frames = {}
for t in trades_dict.keys():
    frames[t] = pd.concat([trades_dict[t], prices_dict[t]], axis=1)
frames['XAUUSD']
This would concatenate the two dataframes, making them both available under the same key:
ID1 Val1 ID2 Val2
0 0.374540 0.950714 0.611853 0.139494
1 0.731994 0.598658 0.292145 0.366362
2 0.156019 0.155995 0.456070 0.785176
3 0.058084 0.866176 0.199674 0.514234
4 0.601115 0.708073 0.592415 0.046450
5 0.020584 0.969910 0.607545 0.170524
6 0.832443 0.212339 0.065052 0.948886
7 0.181825 0.183405 0.965632 0.808397
8 0.304242 0.524756 0.304614 0.097672
9 0.431945 0.291229 0.684233 0.440152
You may need some error checking in case your keys don't match or the kind of join (left, right, inner etc.) depending upon your columns but that's the gist of it.

Extract values within the quotes signs into two separate columns with python

How can I extract the values within the quotes into two separate columns with Python? The dataframe is given below:
df = pd.DataFrame(["'FRH02';'29290'", "'FRH01';'29300'", "'FRT02';'29310'", "'FRH03';'29340'",
"'FRH05';'29350'", "'FRG02';'29360'"], columns = ['postcode'])
df
postcode
0 'FRH02';'29290'
1 'FRH01';'29300'
2 'FRT02';'29310'
3 'FRH03';'29340'
4 'FRH05';'29350'
5 'FRG02';'29360'
I would like to get an output like the one below:
postcode1 postcode2
FRH02 29290
FRH01 29300
FRT02 29310
FRH03 29340
FRH05 29350
FRG02 29360
I have tried several str.extract variants but haven't been able to figure this out. Thanks in advance.
Finishing Quang Hoang's solution that he left in the comments:
import pandas as pd
df = pd.DataFrame(["'FRH02';'29290'",
"'FRH01';'29300'",
"'FRT02';'29310'",
"'FRH03';'29340'",
"'FRH05';'29350'",
"'FRG02';'29360'"],
columns = ['postcode'])
# Remove the quotes and split the strings, which results in a Series made up of 2-element lists
postcodes = df['postcode'].str.replace("'", "").str.split(';')
# Unpack the transposed postcodes into 2 new columns
df['postcode1'], df['postcode2'] = zip(*postcodes)
# Delete the original column
del df['postcode']
print(df)
Output:
postcode1 postcode2
0 FRH02 29290
1 FRH01 29300
2 FRT02 29310
3 FRH03 29340
4 FRH05 29350
5 FRG02 29360
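You can use Series.str.split after removing the quotes:
p1 = []
p2 = []
# Strip the single quotes first, otherwise they remain in the output values
for row in df['postcode'].str.replace("'", "").str.split(';'):
    p1.append(row[0])
    p2.append(row[1])
df2 = pd.DataFrame()
df2["postcode1"] = p1
df2["postcode2"] = p2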
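Since the question mentions str.extract, here is a minimal sketch of how it could be used for this, assuming every value follows the 'X';'Y' pattern shown in the question. The named groups in the regular expression are my own choice; they become the column names of the result.

```python
import pandas as pd

df = pd.DataFrame(["'FRH02';'29290'", "'FRH01';'29300'", "'FRT02';'29310'"],
                  columns=['postcode'])

# Each named group captures the text between one pair of single quotes.
pattern = r"'(?P<postcode1>[^']*)';'(?P<postcode2>[^']*)'"

# str.extract returns one column per capture group, named after the group.
out = df['postcode'].str.extract(pattern)
print(out)
```

Rows that do not match the pattern would come back as NaN in both columns, so it may be worth checking for missing values if the data is less regular than the sample.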

Calculate Product of length of lists in dataframe and store in a new column

I have a dataframe, whose values are lists. How can I calculate product of lengths of all lists in a row, and store in a separate column? Maybe the following example will make it clear:
test_1 = ['Protocol', 'SCADA', 'SHM System']
test_2 = ['CM', 'Finances']
test_3 = ['RBA', 'PBA']
df = pd.DataFrame({'a':[test_1,test_2,test_3],'b':[test_2]*3, 'c':[test_3]*3, 'product of len(lists)':[12,8,8]})
This sample code shows that in the first row the product is 3 * 2 * 2 = 12, where the factors are the lengths of the lists in that row, and similarly for the other rows.
How can I compute these products and store them in a new column, for a dataframe whose values are all lists?
Thank you.
Try using DataFrame.applymap and DataFrame.product (in pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map):
df['product of len(lists)'] = df[['a', 'b', 'c']].applymap(len).product(axis=1)
[out]
a b c product of len(lists)
0 [Protocol, SCADA, SHM System] [CM, Finances] [RBA, PBA] 12
1 [CM, Finances] [CM, Finances] [RBA, PBA] 8
2 [RBA, PBA] [CM, Finances] [RBA, PBA] 8
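A possible alternative that avoids applymap altogether is to take the length of each list with Series.str.len, column by column. The sketch below reuses the data from the question; the name of the new column is kept the same as in the answer above.

```python
import pandas as pd

test_1 = ['Protocol', 'SCADA', 'SHM System']
test_2 = ['CM', 'Finances']
test_3 = ['RBA', 'PBA']
df = pd.DataFrame({'a': [test_1, test_2, test_3], 'b': [test_2] * 3, 'c': [test_3] * 3})

# Series.str.len works on list elements as well as strings, so applying it
# to each column gives a dataframe of list lengths.
lengths = df[['a', 'b', 'c']].apply(lambda col: col.str.len())

# Multiply the lengths across each row.
df['product of len(lists)'] = lengths.prod(axis=1)
print(df['product of len(lists)'].tolist())
```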

Create a dataframe from one dictionary and remove a specific character

I would like to know if it is possible to create a dataframe from a dictionary.
I have a dictionary like this:
dict= {'MO': ['N-2', 'N-8', 'N-7', 'N-6', 'N-9'], 'MO2': ['N0-6'], 'MO3': ['N-2']}
The result I want looks like this:
ID NUM
0 MO 'N-2', 'N-8', 'N-7', 'N-6', 'N-9'
1 MO2 'N0-6'
2 MO3 'N-2'
I tried to obtain this result, but the values in the NUM column are still wrapped in [] and I can't remove the brackets:
liste_id=list(dict.keys())
liste_num=list(dict.values())
df = pandas.DataFrame({'ID':liste_id,'NUM':liste_num})
Merge the values in the dictionary into a string before creating the dataframe; this ensures the arrays are of the same length (dicts below is the dictionary from the question, renamed to avoid shadowing the built-in dict):
pd.DataFrame([(key, ", ".join(value))
              for key, value in dicts.items()],
             columns = ['ID', 'NUM'])
ID NUM
0 MO N-2, N-8, N-7, N-6, N-9
1 MO2 N0-6
2 MO3 N-2
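The same idea can also be sketched with pandas string methods: build a Series from the dictionary and join each list with Series.str.join. The variable name dicts is an assumption carried over from the answer above.

```python
import pandas as pd

dicts = {'MO': ['N-2', 'N-8', 'N-7', 'N-6', 'N-9'], 'MO2': ['N0-6'], 'MO3': ['N-2']}

# A Series built from the dictionary has the keys as its index and the
# lists as its values; str.join concatenates each list into one string.
joined = pd.Series(dicts).str.join(", ")

# Move the index into an ID column and name the values column NUM.
df = joined.rename_axis('ID').reset_index(name='NUM')
print(df)
```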

Pandas: How to map the list of dictionary in a column as a new row

The dataframe below has to be converted into the format of "op_df":
ip_df=pd.DataFrame({'class':['I','II','III'],'details':[[{'sec':'A','assigned_to':'tom'},{'sec':'B','assigned_to':'sam'}],[{'sec':'B','assigned_to':'joe'}],[]]})
ip_df:
class details
0 I [{'sec':'A','assigned_to':'tom'},{'sec':'B','assigned_to':'sam'}]
1 II [{'sec':'B','assigned_to':'joe'}]
2 III []
The required output dataframe is supposed to be:
op_df:
class sec assigned_to
0 I A tom
1 I B sam
2 II B joe
3 III NaN NaN
How can I turn each dictionary in the "details" column into a new row, with the dictionary keys as column names and the dictionary values as the corresponding column values?
I have tried:
ip_df.join(ip_df['details'].apply(pd.Series))
but this does not produce a dataframe like "op_df".
I am sure there are better ways to do it, but I had to deconstruct your details list and create your dataframe as follows:
dict_values = {'class':['I','II','III'],'details':[[{'sec':'A','assigned_to':'tom'},{'sec':'B','assigned_to':'sam'}],[{'sec':'B','assigned_to':'joe'}],[]]}
all_values = []
for cl, detail in zip(dict_values['class'], dict_values['details']):
    if len(detail) > 0:
        for innerdict in detail:
            row = {'class': cl}
            for innerkey in innerdict.keys():
                row[innerkey] = innerdict[innerkey]
            all_values.append(row)
    else:
        row = {'class': cl}
        all_values.append(row)
op_df = pd.DataFrame(all_values)
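One of the "better ways" might be DataFrame.explode, which works directly on ip_df. The following is a sketch under the assumption that pandas 1.1 or later is available (for the ignore_index argument); the intermediate names exploded, records and expanded are my own.

```python
import pandas as pd

ip_df = pd.DataFrame({
    'class': ['I', 'II', 'III'],
    'details': [
        [{'sec': 'A', 'assigned_to': 'tom'}, {'sec': 'B', 'assigned_to': 'sam'}],
        [{'sec': 'B', 'assigned_to': 'joe'}],
        [],
    ],
})

# One row per dictionary; an empty list becomes a single row holding NaN.
exploded = ip_df.explode('details', ignore_index=True)

# Replace the NaN left by empty lists with an empty dict so every row
# can be expanded into columns.
records = [d if isinstance(d, dict) else {} for d in exploded['details']]
expanded = pd.DataFrame(records, index=exploded.index)

# Put the class column back next to the expanded dictionary keys.
op_df = exploded.drop(columns='details').join(expanded)
print(op_df)
```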
