I have a dictionary of several pandas dataframes. It looks like this:
key  Value
A    pandas dataframe here
B    pandas dataframe here
C    pandas dataframe here
I need to extract each dataframe from the dict as a separate object, with the dict key as its name. The desired output is as many separate dataframes as my dict has values:

A = first dataframe in the dict
B = second dataframe in the dict
Note that the dataframe names are the dict keys.
I tried this code, but without any success:

for key, value in my_dict_name.items():
    key = pd.DataFrame.from_dict(value)
Any help would be appreciated.
It is not recommended, but possible:
Thanks @Willem Van Onsem for the better explanation:
It is a quite severe anti-pattern, especially since it can override existing variables, and one can never exclude that scenario.
a = pd.DataFrame({'a': ['a']})
b = pd.DataFrame({'b': ['b']})
c = pd.DataFrame({'c': ['c']})
d = {'A': a, 'B': b, 'C': c}
print(d)
{'A':    a
0  a, 'B':    b
0  b, 'C':    c
0  c}

for k, v in d.items():
    globals()[k] = v

print(A)
   a
0  a
I think the best here is a MultiIndex, if the DataFrames share the same columns or index values; a dictionary of DataFrames is also perfectly OK.
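The MultiIndex suggestion can be sketched with pd.concat, which accepts the dictionary directly and promotes its keys to the outer index level (a and b below are toy frames for illustration):

```python
import pandas as pd

a = pd.DataFrame({'col': [1, 2]})
b = pd.DataFrame({'col': [3, 4]})
d = {'A': a, 'B': b}

# pd.concat uses the dict keys as the outer level of a MultiIndex
combined = pd.concat(d)
print(combined)
#      col
# A 0    1
#   1    2
# B 0    3
#   1    4

# each original frame stays recoverable by key, no globals() needed
print(combined.loc['A'])
```

This keeps everything in one object, so there is no risk of clobbering existing variable names.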
I have a dictionary:
dict = {"name1": ["name1_a", "name1_b"], "name2": ["name2_a", "name2_b", "name2_c"]}
Then I read in a .csv file as a dataframe that has the following structure:
df = pd.read_csv('file.csv')
Name       Value
"name1"    10
"name1_b"  30
"name2_c"  30
I need a function to iterate through the dataframe and the dictionary, in a way that it searches the dataframe for each name in the dictionary lists ("name1_a", "name1_b", etc). Once it finds a match, let's say for "name1_b", it should add the corresponding value (30) to "name1" in the dataframe. If the name doesn't exist in the dataframe (like "name2" in the example), it should create a new row and assign the value corresponding to the sum of "name2_a" + "name2_b", etc.
So the resulting dataframe should be like this (the value of "name1_b" was added to the value of "name1", and "name2" was created and assigned the value of "name2_c"):
Name       Value
"name1"    40
"name1_b"  30
"name2_c"  30
"name2"    30
Thanks for the help!
You could index df by name and create a separate dataframe that holds the values to be added to df. Some target keys in the dict won't be in df, so they will need to be added with a default. It's similar with the addend lists in the dict: some will not have values and will need a default.
Once those two are set up, you can loop through the addends, collect sums, and add those to df.
import pandas as pd

df = pd.DataFrame({"Name": ["name1", "name1_b", "name2_c"],
                   "Value": [10, 30, 30]})

# map of target:addends to apply to dataframe
mydict = {"name1": ["name1_a", "name1_b"], "name2": ["name2_a", "name2_b", "name2_c"]}

# index dataframe by name and default unknown values
df.set_index("Name", inplace=True)
unknowns = pd.DataFrame(index=mydict.keys() - df.index)
unknowns["Value"] = 0
df = pd.concat([df, unknowns])  # DataFrame.append was removed in pandas 2.0
del unknowns

# create dataframe of addends, defaulting unknown values
addends_df = pd.DataFrame(index={val for values in mydict.values()
                                 for val in values})
addends_df["Value"] = df["Value"]
addends_df.fillna(0, inplace=True)

# for each target, add the addends
for target, addends in mydict.items():
    df.loc[target] += addends_df.loc[addends].sum()
print(df)
You can try this: first make a key:value mapping out of the lists via a dict comprehension, then check which 'Name' values are present in dd and filter to those rows, replace the values of 'Name' with their targets using replace() inside assign(), append this new dataframe to the original one, and finally group by 'Name' and calculate the sum:
d = {"name1": ["name1_a", "name1_b"], "name2": ["name2_a", "name2_b", "name2_c"]}
dd = {i: k for k, v in d.items() for i in v}
df = (pd.concat([df, df[df['Name'].isin(dd)]  # append was removed in pandas 2.0
                       .assign(Name=lambda x: x['Name'].replace(dd))])
        .groupby('Name', as_index=False).sum())
OR
The same approach, but in separate steps:

d = {"name1": ["name1_a", "name1_b"], "name2": ["name2_a", "name2_b", "name2_c"]}
dd = {i: k for k, v in d.items() for i in v}
df1 = df[df['Name'].isin(dd)].copy()
df1['Name'] = df1['Name'].map(dd)
df = pd.concat([df, df1], ignore_index=True)
df = df.groupby('Name', as_index=False)['Value'].sum()

output of df:

      Name  Value
0    name1     40
1  name1_b     30
2    name2     30
3  name2_c     30
Note: don't assign anything to the name dict; it shadows the built-in dict function in Python.
Iterate through the dictionary items, mask the data frame with the matching key and its value list, and get the sum via .sum(). If the name already exists in the data frame, simply assign the value; otherwise create a new row.
dict_ = {"name1": ["name1_a", "name1_b"], "name2": ["name2_a", "name2_b", "name2_c"]}

for k, v in dict_.items():
    mask_list = v + [k]
    sum_value = df[df['Name'].isin(mask_list)]['Value'].sum()
    if k in df['Name'].unique():
        df.loc[df['Name'] == k, 'Value'] = sum_value
    else:
        df.loc[len(df.index)] = [k, sum_value]
I have the below dataframe:
And I have the below dictionary:
resource_ids_dict = {'Austria':1586023272, 'Bulgaria':1550004006, 'Croatia':1131119835, 'Denmark':1703440195,
'Finland':2005848983, 'France':1264698819, 'Germany':1907737079, 'Greece':2113941104,
'Italy':27898245, 'Netherlands':1832579427, 'Norway':1054291604, 'Poland':1188865122,
'Romania':270819662, 'Russia':2132391298, 'Serbia':1155274960, 'South Africa':635838568,
'Spain':52600180, 'Switzerland':842323896, 'Turkey':1716131192, 'UK':199152257}
I am using the above dictionary values to make calls to a vendor API. I then append all the return data into a dataframe df.
What I would like to do now is add a column after ID containing the dictionary keys whose dictionary values appear in ResourceSetID.
I have had a look on the web but haven't managed to find anything (probably due to my lack of accurate keyword searches). Surely this should be a one-liner? I want to avoid looping through the dataframe and the dictionary and mapping that way.
Use Series.map, but first it is necessary to swap the values with the keys in the dictionary:
d = {v:k for k, v in resource_ids_dict.items()}
#alternative
#d = dict(zip(resource_ids_dict.values(), resource_ids_dict.keys()))
df['new'] = df['ResourceSetID'].map(d)
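As a minimal runnable sketch (the df here is a stand-in with only two IDs, since the original dataframe was shown as an image):

```python
import pandas as pd

resource_ids_dict = {'Austria': 1586023272, 'Bulgaria': 1550004006}
df = pd.DataFrame({'ResourceSetID': [1550004006, 1586023272]})

# swap keys and values, then map each ID back to its country name
d = {v: k for k, v in resource_ids_dict.items()}
df['new'] = df['ResourceSetID'].map(d)
print(df)
#    ResourceSetID       new
# 0     1550004006  Bulgaria
# 1     1586023272   Austria
```

IDs without a matching dictionary value would come out as NaN, which map leaves in place by default.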
My searching was unable to find a solution for this one. I hope it is simple and just missed it.
I am trying to assign a dataframe variable based on a dictionary key. I want to loop through a dictionary of keys 0, 1, 2, 3... and save the dataframes as df_0, df_1, df_2, ... I am able to get the keys and values working and can assign one dataframe, but cannot find a way to assign new dataframes based on the keys.
I tried How to create a new dataframe with every iteration of for loop in Python but it didn't seem to work.
Here is what I tried:
docs_dict = {0: '2635_base', 1: '2635_tri'}

for keys, docs in docs_dict.items():
    print(keys, docs)
    df = pd.read_excel(Path(folder_loc[docs]) / file_name[docs], sheet_name=sheet_name[docs], skiprows=3)

Output: 0 2635_base 1 2635_tri from the print statement, and %whos DataFrame > df as expected.
What I would like to get is: df_0 and df_1 based on the excel files in other dictionaries which work fine.
df[keys] = pd.read_excel(Path(folder_loc[docs]) / file_name[docs], sheet_name=sheet_name[docs], skiprows=3)
produces a ValueError: Wrong number of items passed 26, placement implies 1
SOLVED thanks to RubenB for pointing me to How do I create a variable number of variables? and the answer by @rocky-li using globals():

for keys, docs in docs_dict.items():
    print(keys, docs)
    globals()['df_{}'.format(keys)] = pd.read_excel(...)

>> Output: dataframes df_0, df_1, ...
You might want to try a dict comprehension as such (substitute pd.read_excel(...docs...) with whatever you need to read the dataframe from disc):
docs_dict = {0: '2635_base', 1: '2635_tri'}
dfs_dict = {k: pd.read_excel(...docs...) for k, docs in docs_dict.items()}
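Each frame can then be looked up by its original key, e.g. dfs_dict[0]. A self-contained sketch, with a hypothetical load_doc function standing in for the pd.read_excel call:

```python
import pandas as pd

docs_dict = {0: '2635_base', 1: '2635_tri'}

# stand-in for pd.read_excel(Path(folder_loc[docs]) / file_name[docs], ...)
def load_doc(name):
    return pd.DataFrame({'source': [name]})

# one dataframe per dict entry, keyed exactly like docs_dict
dfs_dict = {k: load_doc(docs) for k, docs in docs_dict.items()}
print(dfs_dict[0]['source'][0])  # 2635_base
```

This keeps the frames addressable by key without touching globals().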
I want an empty column in pandas. For example, data['dict']. I want every element in this column to be an empty dictionary. For example:
>>> data['dict']
{}
{}
{}
{}
How do I write this code? Thank you very much.
Use a list comprehension.
For existing DataFrame:
df['dict'] = [{} for _ in range(len(df))]
For new object:
pd.DataFrame([{} for _ in range(100)])
One caution is that you lose some of the abilities of Pandas to vectorize operations when you use a complex Pandas data structure inside each (row, column) cell.
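For example, there is no vectorized way to update the per-row dicts; you fall back to a Python-level loop or .apply (a small sketch):

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3]})
# list comprehension gives each row its own distinct dict
df['dict'] = [{} for _ in range(len(df))]

# no vectorized dict update exists; iterate at Python speed instead
for d, x in zip(df['dict'], df['x']):
    d['x'] = x

print(df['dict'].tolist())
# [{'x': 1}, {'x': 2}, {'x': 3}]
```

Mutating the dicts in place works here because each cell holds a separate object, not copies of one shared dict.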
To avoid the problem of every cell sharing one copy of the same dict when assigning the values:

df['dict'] = df.apply(lambda x: {}, axis=1)
df
Out[730]:
0 1 2 dict
0 a b c {}
1 a NaN b {}
2 NaN t a {}
3 a d b {}
I have an Ordered Dictionary, where the keys are the worksheet names, and the values contain the worksheet items. Thus, the question: How do I use each of the keys and convert to an individual dataframe?
import pandas as pd

powerbipath = 'PowerBI_Ingestion.xlsx'
dfs = pd.read_excel(powerbipath, None)

new_list1 = []
for idx, eachdf in enumerate(dfs):
    eachdf = dfs[eachdf]
    new_list1.append(eachdf)
    eachdf = pd.DataFrame(new_list1[idx])
Examples I have seen only show how to convert from an ordered dictionary to 1 pandas dataframe. I want to convert to multiple dataframes. Thus, if there are 5 keys, there will be 5 dataframes.
You may want to do something like this (assuming your dictionary looks like d):
d = {'first': [1, 2], 'second': [3, 4]}

for i in d:
    df = pd.DataFrame(d.get(i), columns=[i])
    print(df)
Output looks like :
first
0 1
1 2
second
0 3
1 4
Here is a basic answer using one of these ideas:

keys = df["key_column"].unique()
df_array = {}
for k in keys:
    df_array[k] = df[df['key_column'] == k]
There might be more efficient way to do it though.
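One more idiomatic variant is to let groupby do the splitting in a single comprehension (the dfs below is a toy frame with the same key_column name):

```python
import pandas as pd

dfs = pd.DataFrame({'key_column': ['A', 'A', 'B'], 'val': [1, 2, 3]})

# groupby already yields (key, sub-frame) pairs, so no manual masking
df_array = {k: g for k, g in dfs.groupby('key_column')}
print(df_array['A'])
#   key_column  val
# 0          A    1
# 1          A    2
```

This scans the frame once instead of filtering it anew for every key.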