I have four dictionaries created by splitting four data frames by group. I now need to join the data frames from each dictionary into a new dictionary, using the key and the common columns as the join conditions.
For example:
import pandas as pd
from functools import reduce

df_1 = pd.DataFrame({'Group': ['A', 'B', 'C'], 'ID': [1, 2, 3],
                     'count': [10, 20, 30], 'colors': ['red', 'white', 'blue']})
df_2 = pd.DataFrame({'Group': ['A', 'B', 'C'], 'ID': [1, 2, 3], 'time': [1.3, 2.5, 3]})
df_3 = pd.DataFrame({'Group': ['A', 'B', 'C'], 'ID': [1, 2, 3], 'order_num': [2, 4, 7]})
df_4 = pd.DataFrame({'Group': ['A', 'B', 'C'], 'ID': [1, 2, 3], 'result': ['g', 'b', 'b']})

dict1 = dict(tuple(df_1.groupby('Group')))
dict2 = dict(tuple(df_2.groupby('Group')))
dict3 = dict(tuple(df_3.groupby('Group')))
dict4 = dict(tuple(df_4.groupby('Group')))
Desired result, using a manual solution:
datA=[dict1['A'],dict2['A'],dict3['A'],dict4['A']]
datB=[dict1['B'],dict2['B'],dict3['B'],dict4['B']]
datC=[dict1['C'],dict2['C'],dict3['C'],dict4['C']]
final_dict = {'A': reduce(lambda left, right: pd.merge(left, right, on=['Group', 'ID']), datA),
              'B': reduce(lambda left, right: pd.merge(left, right, on=['Group', 'ID']), datB),
              'C': reduce(lambda left, right: pd.merge(left, right, on=['Group', 'ID']), datC)}
Any help with finding a scalable non-manual solution would be appreciated.
Is this dynamic enough?
# Put all your dicts into a dict of dicts
dict_dict = {str(i): dict_i for i, dict_i in enumerate([dict1, dict2, dict3, dict4])}

# Swap the order of the indices so groups are keys and the
# lists of grouped dfs are the items
dat_dicts = {group_key: [df_dict[group_key] for df_dict in dict_dict.values()]
             for group_key in list(dict_dict.values())[0].keys()}

# Apply the reduce on each group key to merge the dfs
merged_dat_df_dict = {group_key: reduce(lambda left, right:
                                        pd.merge(left, right, on=['Group', 'ID']),
                                        dat_df_list)
                      for group_key, dat_df_list in dat_dicts.items()}
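A condensed variant of the same idea, shown end to end with the sample frames from the question (a sketch; `final_dict` here plays the role of `merged_dat_df_dict`):

```python
import pandas as pd
from functools import reduce

df_1 = pd.DataFrame({'Group': ['A', 'B', 'C'], 'ID': [1, 2, 3],
                     'count': [10, 20, 30], 'colors': ['red', 'white', 'blue']})
df_2 = pd.DataFrame({'Group': ['A', 'B', 'C'], 'ID': [1, 2, 3], 'time': [1.3, 2.5, 3]})
df_3 = pd.DataFrame({'Group': ['A', 'B', 'C'], 'ID': [1, 2, 3], 'order_num': [2, 4, 7]})
df_4 = pd.DataFrame({'Group': ['A', 'B', 'C'], 'ID': [1, 2, 3], 'result': ['g', 'b', 'b']})

# Split each frame by group, as in the question
dicts = [dict(tuple(df.groupby('Group'))) for df in (df_1, df_2, df_3, df_4)]

# For each group key, merge the per-frame pieces on the shared columns
final_dict = {key: reduce(lambda left, right: pd.merge(left, right, on=['Group', 'ID']),
                          [d[key] for d in dicts])
              for key in dicts[0]}

print(final_dict['A'].columns.tolist())
# ['Group', 'ID', 'count', 'colors', 'time', 'order_num', 'result']
```

This scales to any number of input frames by extending the list passed to the comprehension.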
Trying to create a new column containing the key/value pairs extracted from a dict in one column, keeping only the keys that appear in the list in a second column.
Sample Data:
names name_dicts
['Mary', 'Joe'] {'Mary':123, 'Ralph':456, 'Joe':789}
Expected Result:
names name_dicts new_col
['Mary', 'Joe'] {'Mary':123, 'Ralph':456, 'Joe':789} {'Mary':123, 'Joe':789}
I have attempted to use ast.literal_eval to convert the name_dicts column to a column of true dictionaries. This function, where col is the df['name_dicts'] column, errored out with a "cannot convert string" error:
def get_name_pairs(col):
    for k, v in col.items():
        if k.isin(df['names']):
            return
Using a list comprehension and operator.itemgetter:
from operator import itemgetter

df['new_col'] = [dict(zip(l, itemgetter(*l)(d)))
                 for l, d in zip(df['names'], df['name_dicts'])]
output:
names name_dicts new_col
0 [Mary, Joe] {'Mary': 123, 'Ralph': 456, 'Joe': 789} {'Mary': 123, 'Joe': 789}
used input:
df = pd.DataFrame({'names': [['Mary', 'Joe']],
                   'name_dicts': [{'Mary': 123, 'Ralph': 456, 'Joe': 789}]})
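One caveat worth noting: operator.itemgetter returns a bare value rather than a tuple when called with a single key, so a row whose names list holds exactly one entry would break the zip in the comprehension above. A small guard, sketched as a helper (pick is a hypothetical name, not from the original answer):

```python
from operator import itemgetter

def pick(keys, d):
    """Return {k: d[k] for k in keys}, tolerating the single-key itemgetter case."""
    if not keys:
        return {}
    got = itemgetter(*keys)(d)
    if len(keys) == 1:
        got = (got,)  # itemgetter('x')(d) returns a bare value, not a 1-tuple
    return dict(zip(keys, got))

print(pick(['Mary'], {'Mary': 123, 'Ralph': 456}))       # {'Mary': 123}
print(pick(['Mary', 'Joe'], {'Mary': 123, 'Joe': 789}))  # {'Mary': 123, 'Joe': 789}
```

With this helper the list comprehension becomes `[pick(l, d) for l, d in zip(df['names'], df['name_dicts'])]`.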
You can apply a lambda with a dictionary comprehension at row level to get the values from the dict in the second column, based on the keys in the list in the first column:
# If the column values are stored as strings:
import ast
for col in df:
    df[col] = df[col].apply(ast.literal_eval)

df['new_col'] = df.apply(lambda x: {k: x['name_dicts'].get(k, 0) for k in x['names']},
                         axis=1)

# If you want to include only key/value pairs for keys that appear in
# both the list and the dictionary, replace the lambda above with:
# lambda x: {k: x['name_dicts'][k] for k in x['names'] if k in x['name_dicts']}
names ... new_col
0 [Mary, Joe] ... {'Mary': 123, 'Joe': 789}
[1 rows x 3 columns]
PS: ast.literal_eval runs without error on the sample data you posted for the code above.
Your function needs only a small change, and then you can use it with .apply():
import pandas as pd

df = pd.DataFrame({
    'names': [['Mary', 'Joe']],
    'name_dicts': [{'Mary': 123, 'Ralph': 456, 'Joe': 789}],
})

def filter_data(row):
    result = {}
    for key, val in row['name_dicts'].items():
        if key in row['names']:
            result[key] = val
    return result

df['new_col'] = df.apply(filter_data, axis=1)

print(df.to_string())
Result:
names name_dicts new_col
0 [Mary, Joe] {'Mary': 123, 'Ralph': 456, 'Joe': 789} {'Mary': 123, 'Joe': 789}
EDIT:
If you have the string "{'Mary':123, 'Ralph':456, 'Joe':789}" in name_dicts, then you can replace ' with " to get valid JSON, which you can convert to a dictionary using json.loads:
import json
df['name_dicts'] = df['name_dicts'].str.replace("'", '"').apply(json.loads)
Or parse it directly as a Python literal:
import ast
df['name_dicts'] = df['name_dicts'].apply(ast.literal_eval)
Or, as a last resort (avoid eval on untrusted data, since it executes arbitrary code):
df['name_dicts'] = df['name_dicts'].apply(eval)
Full code:
import pandas as pd

df = pd.DataFrame({
    'names': [['Mary', 'Joe']],
    'name_dicts': ["{'Mary':123, 'Ralph':456, 'Joe':789}"],  # strings
})

#import json
#df['name_dicts'] = df['name_dicts'].str.replace("'", '"').apply(json.loads)

#df['name_dicts'] = df['name_dicts'].apply(eval)

import ast
df['name_dicts'] = df['name_dicts'].apply(ast.literal_eval)

def filter_data(row):
    result = {}
    for key, val in row['name_dicts'].items():
        if key in row['names']:
            result[key] = val
    return result

df['new_col'] = df.apply(filter_data, axis=1)

print(df.to_string())
I have a pandas DataFrame with several columns containing dicts. I am trying to identify columns that contain at least 1 dict.
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'i': [0, 1, 2, 3],
    'd': [np.nan, {'p': 1}, {'q': 2}, np.nan],
    't': [np.nan, {'u': 1}, {'v': 2}, np.nan]
})

# Iterate over cols to find dicts
cdict = [i for i in df.columns if isinstance(df[i][0], dict)]
cdict
[]
How do I find the columns that contain dicts? My check above only looks at the first row of each column (which is NaN here), so it finds nothing. Is there a solution that does not require iterating over every cell of every column?
You can do:
s = df.applymap(lambda x:isinstance(x, dict)).any()
dict_cols = s[s].index.tolist()
print(dict_cols)
['d', 't']
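A note on pandas versions (an assumption about your environment): DataFrame.applymap was deprecated in pandas 2.1 in favour of DataFrame.map, which does the same elementwise mapping. A version-tolerant sketch of the same check:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'i': [0, 1, 2, 3],
    'd': [np.nan, {'p': 1}, {'q': 2}, np.nan],
    't': [np.nan, {'u': 1}, {'v': 2}, np.nan],
})

# DataFrame.map exists from pandas 2.1; fall back to applymap on older versions
mapper = getattr(df, 'map', df.applymap)
s = mapper(lambda x: isinstance(x, dict)).any()
print(s[s].index.tolist())
# ['d', 't']
```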
We can apply over the columns; this still iterates, but it makes use of apply:
df.apply(lambda x: [any(isinstance(y, dict) for y in x)], axis=0)
EDIT: I think using applymap is more direct. However, we can use our boolean result to get the column names:
any_dct = df.apply(lambda x: [any(isinstance(y, dict) for y in x)],
                   axis=0, result_type="expand")
df.iloc[:, any_dct.iloc[0, :].tolist()].columns.values
I have a list of dictionaries, where each dictionary holds a single key/value pair whose value is a data frame. I would like to extract all the data frames and combine them into one.
I have tried:
df = pd.DataFrame.from_dict(data)
for both the full data file and for each dictionary in the list.
This gives the following error:
ValueError: If using all scalar values, you must pass an index
I have also tried turning each dictionary into a list and then converting to a pd.DataFrame; I get:
KeyError: 0
Any ideas?
It should be doable with pd.concat(). Let's say you have a list of dictionaries l:
import numpy as np
import pandas as pd

l = [
    {'a': pd.DataFrame(np.arange(9).reshape((3, 3)))},
    {'b': pd.DataFrame(np.arange(9).reshape((3, 3)))},
    {'c': pd.DataFrame(np.arange(9).reshape((3, 3)))}
]
You can feed dataframes from each dict in the list to pd.concat():
df = pd.concat([df_ for dict_ in l for df_ in dict_.values()])
In my example all data frames have the same columns, so the result has shape 9 x 3. If your dataframes have different columns, the output will be malformed and will require extra processing.
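If you also want to keep track of which dictionary each block came from, pd.concat accepts a mapping, whose keys become an outer index level. A sketch, assuming the same list of single-pair dicts as above:

```python
import numpy as np
import pandas as pd

l = [
    {'a': pd.DataFrame(np.arange(9).reshape((3, 3)))},
    {'b': pd.DataFrame(np.arange(9).reshape((3, 3)))},
    {'c': pd.DataFrame(np.arange(9).reshape((3, 3)))},
]

# Merge the per-item dicts into one mapping, then let concat label each block
merged = {k: v for d in l for k, v in d.items()}
df = pd.concat(merged)

print(df.shape)                                         # (9, 3)
print(df.index.get_level_values(0).unique().tolist())   # ['a', 'b', 'c']
```

The outer index level makes it easy to recover any block later with `df.loc['a']`.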
This should work:
import pandas as pd

dict1 = {'d1': pd.DataFrame({'a': [1, 2, 3], 'b': ['one', 'two', 'three']})}
dict2 = {'d2': pd.DataFrame({'a': [4, 5, 6], 'b': ['four', 'five', 'six']})}
dict3 = {'d3': pd.DataFrame({'a': [7, 8, 9], 'b': ['seven', 'eight', 'nine']})}

# dicts list. You would start from here
dicts_list = [dict1, dict2, dict3]

# DataFrame.append was removed in pandas 2.0, so collect the frames
# and concatenate them once at the end
frames = [list(_dict.values())[0] for _dict in dicts_list]
df = pd.concat(frames)

# Resetting and dropping the old index
df = df.reset_index(drop=True)
print(df)
Just out of curiosity: why are your sub-dataframes wrapped in dictionaries in the first place? An easy way of creating a dataframe from dictionaries is to build a list of plain dictionaries and call pd.DataFrame(list_with_dicts); if the keys are the same across all dictionaries, it just works. Just a suggestion from my side. Something like this:
list_with_dicts = [{'a': 1, 'b': 2}, {'a': 5, 'b': 4}, ...]
# my_df -> DataFrame with columns [a, b] and two rows with the values in the dict.
my_df = pd.DataFrame(list_with_dicts)
Given a dataframe containing a numeric (float) series and a categorical ID (df), how can I create a dictionary of the form 'key': [], where the key is an ID from the dataframe and the list contains the differences between the numbers in the two dataframes? I have managed this using loops, but I am looking for a more pandas-like way of doing it.
import pandas as pd
from collections import defaultdict

df = pd.DataFrame({'a': [0.75435, 0.74897, 0.60949,
                         0.87438, 0.90885, 0.28547,
                         0.27327, 0.31078, 0.15576,
                         0.58139],
                   'id': list('aaaxxbbyyy')})
rl = pd.DataFrame({'b': [0.51, 0.30], 'id': ['aaa', 'bbb']})
interval = 0.1

d = defaultdict(list)
for index, row in rl.iterrows():
    # inclusive='neither' / 'both' replaces the boolean form removed in pandas 2.0
    before = df[df['a'].between(row['b'] - interval, row['b'], inclusive='neither')]
    after = df[df['a'].between(row['b'], row['b'] + interval, inclusive='both')]
    for x, b_row in before.iterrows():
        d[b_row['id']].append(b_row['a'] - row['b'])
    for x, a_row in after.iterrows():
        d[a_row['id']].append(a_row['a'] - row['b'])

for k, v in d.items():
    print('{k}\t{v}'.format(k=k, v=len(v)))
a 1
y 2
b 2
d
defaultdict(list,
{'a': [0.09948],
'b': [-0.01452, -0.02672],
'y': [0.07138, 0.01078]})
I am trying to create a dictionary of key:value pairs where the key is a column name of a dataframe and the value is a list of all the unique values in that column. Ultimately I want to be able to filter the key/value pairs in the dict based on conditions. This is what I have been able to do so far:
for col in col_list[1:]:
    _list = []
    _list.append(footwear_data[col].unique())
    list_name = ''.join([str(col), '_list'])

product_list = ['shoe', 'footwear']
color_list = []
size_list = []
Here product, color, and size are all column names, and the dict keys should be named accordingly (color_list etc.). Ultimately I will need to access each key/value list in the dictionary.
Expected output:
KEY VALUE
color_list : ["red","blue","black"]
size_list: ["9","XL","32","10 inches"]
Can someone please help me with this? A snapshot of the data is attached.
With a DataFrame like this:
import pandas as pd

df = pd.DataFrame([["Women", "Slip on", 7, "Black", "Clarks"],
                   ["Women", "Slip on", 8, "Brown", "Clarcks"],
                   ["Women", "Slip on", 7, "Blue", "Clarks"]],
                  columns=["Category", "Sub Category", "Size", "Color", "Brand"])
print(df)
Output:
Category Sub Category Size Color Brand
0 Women Slip on 7 Black Clarks
1 Women Slip on 8 Brown Clarcks
2 Women Slip on 7 Blue Clarks
You can convert your DataFrame into a dict and build your new dict by mapping over the columns of the DataFrame, like this example:
new_dict = {"color_list": list(df["Color"]), "size_list": list(df["Size"])}
# OR:
#new_dict = {"color_list": [k for k in df["Color"]], "size_list": [k for k in df["Size"]]}
print(new_dict)
Output:
{'color_list': ['Black', 'Brown', 'Blue'], 'size_list': [7, 8, 7]}
In order to have unique values, you can use set, like this example (note that set does not preserve order):
new_dict = {"color_list": list(set(df["Color"])), "size_list": list(set(df["Size"]))}
print(new_dict)
Output:
{'color_list': ['Brown', 'Blue', 'Black'], 'size_list': [8, 7]}
Or, as @Ami Tavory said in his answer, to get all the unique keys and values from your DataFrame you can simply do this:
new_dict = {k:list(df[k].unique()) for k in df.columns}
print(new_dict)
Output:
{'Brand': ['Clarks', 'Clarcks'],
'Category': ['Women'],
'Color': ['Black', 'Brown', 'Blue'],
'Size': [7, 8],
'Sub Category': ['Slip on']}
I am trying to create a dictionary of key:value pairs where key is the column name of a dataframe and value will be a list containing all the unique values in that column.
You could use a simple dictionary comprehension for that.
Say you start with
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 1], 'b': [1, 4, 5]})
Then the following comprehension solves it:
>>> {c: list(df[c].unique()) for c in df.columns}
{'a': [1, 2], 'b': [1, 4, 5]}
If I understand your question correctly, you may need set instead of list. In your code, you might apply set to the column's values to keep only the unique ones:
for col in col_list[1:]:
    list_name = ''.join([str(col), '_list'])
    unique_values = set(footwear_data[col])
Sample usage:
>>> a_list = [7, 8, 7, 9, 10, 9]
>>> set(a_list)
{8, 9, 10, 7}
Here is how I did it; let me know if it helps:
import pandas as pd

df = pd.read_csv("/path/to/csv/file")
colList = list(df)
dic = {}
for x in colList:
    _list = []
    _list.append(list(set(list(df[x]))))
    list_name = ''.join([str(x), '_list'])
    dic[str(x) + "_list"] = _list
print(dic)
Output:
{'Color_list': [['Blue', 'Orange', 'Black', 'Red']], 'Size_list': [['9', '8', '10 inches', 'XL', '7']], 'Brand_list': [['Clarks']], 'Sub_list': [['SO', 'FOR']], 'Category_list': [['M', 'W']]}
My CSV file:
Category,Sub,Size,Color,Brand
W,SO,7,Blue,Clarks
W,SO,7,Blue,Clarks
W,SO,7,Black,Clarks
W,SO,8,Orange,Clarks
W,FOR,8,Red,Clarks
M,FOR,9,Black,Clarks
M,FOR,10 inches,Blue,Clarks
M,FOR,XL,Blue,Clarks