Below is my DataFrame named df2:
ID,'AI','DB','ML','Python','IR'
0,1,1,0,0,0
1,1,1,1,0,1
2,1,1,0,1,1
3,1,0,1,0,1
I want to create a dictionary whose keys are (index, column name) tuples and whose values are the corresponding cell values, something like:
{
(0,"AI"):1,(0,"DB"):1,(0,"ML"):0,(0,"Python"):0,(0,"IR"):0,
(1,"AI"):1,(1,"DB"):1,(1,"ML"):1,(1,"Python"):0,(1,"IR"):1,
(2,"AI"):1,(2,"DB"):1,(2,"ML"):0,(2,"Python"):1,(2,"IR"):1,
(3,"AI"):1,(3,"DB"):0,(3,"ML"):1,(3,"Python"):0,(3,"IR"):1,
}
My attempt so far:
your_dict = dict(zip(es, df2))
print(your_dict)
but it does not produce my desired output.
Create every combination of index and column labels, then build a dictionary whose keys are tuples of these labels and whose values are the DataFrame values at those positions.
import itertools
# assumes ID is the index of df2
{(x, y): df2.at[x, y] for x, y in itertools.product(df2.index, df2.columns)}
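As an alternative to the explicit product, here is a minimal sketch using stack(), assuming ID is the index of df2 (the frame below is rebuilt from the data in the question):

```python
import pandas as pd

# Rebuild the example frame from the question, with ID as the index
df2 = pd.DataFrame(
    {"AI": [1, 1, 1, 1], "DB": [1, 1, 1, 0], "ML": [0, 1, 0, 1],
     "Python": [0, 0, 1, 0], "IR": [0, 1, 1, 1]},
    index=pd.Index([0, 1, 2, 3], name="ID"),
)

# stack() moves the column labels into a second index level, so each
# (row label, column label) pair becomes one key of the resulting dict
result = df2.stack().to_dict()
```

Looking up result[(0, "AI")] then gives the cell at row 0, column "AI".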
Related
I'm new to pandas and I want to know if there is a way to map a column of lists in a dataframe to values stored in a dictionary.
Let's say I have the DataFrame 'df' and the dictionary 'dic'. I want to create a new column named 'Description' in the DataFrame that shows the description of each of the codes. The values in the new column should be stored in lists as well.
import pandas as pd
data = {'Codes':[['E0'],['E0','E1'],['E3']]}
df = pd.DataFrame(data)
dic = {'E0':'Error Code', 'E1':'Door Open', 'E2':'Door Closed'}
Most efficient would be to use a list comprehension.
df['Description'] = [[dic.get(x, None) for x in l] for l in df['Codes']]
output:
      Codes              Description
0      [E0]             [Error Code]
1  [E0, E1]  [Error Code, Door Open]
2      [E3]                   [None]
If you would rather skip non-matches, use an alternative list comprehension: [[dic[x] for x in l if x in dic] for l in df['Codes']]. You can then post-process to replace the resulting empty lists with NaN if needed. Note that this would probably be ambiguous if you have one non-match among several matches (which one is which?).
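A minimal sketch of that variant, dropping non-matches and then turning empty lists into NaN (the intermediate name matched is just for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Codes": [["E0"], ["E0", "E1"], ["E3"]]})
dic = {"E0": "Error Code", "E1": "Door Open", "E2": "Door Closed"}

# Keep only the codes that have a description
matched = [[dic[x] for x in codes if x in dic] for codes in df["Codes"]]

# Replace rows where nothing matched (empty lists) with NaN
df["Description"] = pd.Series(
    [m if m else np.nan for m in matched], index=df.index, dtype=object
)
```

Here the third row ends up as NaN because 'E3' has no entry in dic.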
I am trying to convert the dictionary keys to a string and make the values a list.
This is where I am, and I don't know what to do next:
# the squeeze=True argument of read_csv was removed in pandas 2.0; call .squeeze("columns") instead
dict_from_csv = pd.read_csv('Emissions.csv', header=None, index_col=0).squeeze("columns").to_dict()
keys = list(dict_from_csv.keys())
values = list(dict_from_csv.values())
keys
values
Do you want a single string representing all the keys? If so, you can do this:
keys = " ".join(str(k) for k in dict_from_csv)
And do you want a single list with all the values? If so (assuming dict_from_csv is a dict of dicts, as DataFrame.to_dict() returns), you can do this:
values = [val for key in dict_from_csv for val in dict_from_csv[key].values()]
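For illustration, here is a small sketch on a made-up nested dict. The contents of Emissions.csv are not shown in the question, so the keys and numbers below are purely hypothetical:

```python
# Hypothetical stand-in for the nested dict that DataFrame.to_dict() returns:
# {column label: {index label: value}}
dict_from_csv = {
    1: {"CO2": 10.5, "CH4": 2.0},
    2: {"CO2": 11.0, "CH4": 2.5},
}

# One string with all the outer keys (converted to str so join() accepts them)
keys = " ".join(str(k) for k in dict_from_csv)

# One flat list with all the inner values
values = [val for inner in dict_from_csv.values() for val in inner.values()]
```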
I have a dictionary:
dict = {"name1":["name1_a", "name1_b"], "name2":["name2_a", "name2_b", "name2_c"]}
Then I read in a .csv file as a dataframe that has the following structure:
df = pd.read_csv('file.csv')
Name       Value
"name1"    10
"name1_b"  30
"name2_c"  30
I need a function to iterate through the dataframe and the dictionary, in a way that it searches the dataframe for each name in the dictionary lists ("name1_a", "name1_b", etc). Once it finds a match, let's say for "name1_b", it should add the corresponding value (30) to "name1" in the dataframe. If the name doesn't exist in the dataframe (like "name2" in the example), it should create a new row and assign the value corresponding to the sum of "name2_a" + "name2_b", etc.
So the resulting dataframe should look like this (the value of "name1_b" was added to the value of "name1", and "name2" was created and assigned the value of "name2_c"):
Name       Value
"name1"    40
"name1_b"  30
"name2_c"  30
"name2"    30
Thanks for the help!
You could index df by name and create a separate DataFrame that holds the values to be added to df. Some target keys in the dict won't be in df, so they need to be added with a default value. It's similar for the addend lists in the dict: some addends will not have values and will need a default.
Once those two are set up, you can loop through the addends, collect the sums and add them to df.
import pandas as pd
df = pd.DataFrame({"Name": ["name1", "name1_b", "name2_c"],
                   "Value": [10, 30, 30]})
# map of target:addends to apply to dataframe
mydict = {"name1": ["name1_a", "name1_b"], "name2": ["name2_a", "name2_b", "name2_c"]}
# index dataframe by name and default unknown values
df = df.set_index("Name")
unknowns = pd.DataFrame({"Value": 0}, index=sorted(mydict.keys() - df.index))
# DataFrame.append was removed in pandas 2.0; pd.concat replaces it
df = pd.concat([df, unknowns])
del unknowns
# create dataframe of addends, defaulting unknown values
addends_df = pd.DataFrame(index=sorted({val for values in mydict.values()
                                        for val in values}))
addends_df["Value"] = df["Value"]
addends_df = addends_df.fillna(0)
# for each target, add the addends
for target, addends in mydict.items():
    df.loc[target] += addends_df.loc[addends].sum()
print(df)
You can first build a member-to-key mapping dd from the lists with a dict comprehension. Then filter the rows whose 'Name' is present in dd, replace those names with their keys using replace() and assign(), append the resulting rows to the original DataFrame, and finally group by 'Name' and sum:
d={"name1":["name1_a", "name1_b"], "name2":["name2_a", "name2_b", "name2_c"]}
dd={i:k for k,v in d.items() for i in v}
# DataFrame.append was removed in pandas 2.0; pd.concat replaces it
df=(pd.concat([df, df[df['Name'].isin(dd)]
                   .assign(Name=lambda x:x['Name'].replace(dd))])
      .groupby('Name',as_index=False).sum())
OR
The same approach but in separate steps:
d={"name1":["name1_a", "name1_b"], "name2":["name2_a", "name2_b", "name2_c"]}
dd={i:k for k,v in d.items() for i in v}
# copy() avoids a SettingWithCopyWarning when modifying the filtered rows
df1=df[df['Name'].isin(dd)].copy()
df1['Name']=df1['Name'].map(dd)
# DataFrame.append was removed in pandas 2.0; pd.concat replaces it
df=pd.concat([df,df1],ignore_index=True)
df=df.groupby('Name',as_index=False)['Value'].sum()
output of df:
      Name  Value
0    name1     40
1  name1_b     30
2    name2     30
3  name2_c     30
Note: don't use dict as a variable name in Python; doing so shadows the built-in dict type.
Iterate through the dictionary items, mask the DataFrame on the key plus its list of values, and get the total with .sum(). If the key already exists in the DataFrame, assign the total to it; otherwise create a new row.
dict_ = {"name1":["name1_a", "name1_b"], "name2":["name2_a", "name2_b", "name2_c"]}
for k,v in dict_.items():
    mask_list = v + [k]
    sum_value = df[df['Name'].isin(mask_list)]['Value'].sum()
    if k in df['Name'].unique():
        df.loc[df['Name'] == k, 'Value'] = sum_value
    else:
        df.loc[len(df.index)] = [k, sum_value]
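Since the question asks for a function, here is one possible sketch that wraps the mapping-and-sum idea without mutating the input. The function name add_group_totals and its parameter names are made up for illustration; the rows come back sorted by name rather than in the original order.

```python
import pandas as pd

def add_group_totals(df, groups):
    """Add each member's Value to its target name, creating missing targets."""
    # Reverse the dict: member name -> target name
    member_to_target = {m: t for t, members in groups.items() for m in members}
    # Per target, sum the values of the members that actually appear in df
    extra = (df.assign(Name=df["Name"].map(member_to_target))
               .dropna(subset=["Name"])
               .groupby("Name")["Value"].sum())
    # Align on name; names missing on either side count as 0
    totals = df.set_index("Name")["Value"].add(extra, fill_value=0)
    return totals.rename_axis("Name").reset_index(name="Value")

df = pd.DataFrame({"Name": ["name1", "name1_b", "name2_c"],
                   "Value": [10, 30, 30]})
groups = {"name1": ["name1_a", "name1_b"],
          "name2": ["name2_a", "name2_b", "name2_c"]}
result = add_group_totals(df, groups)
```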
I have the below dataframe:
And I have the below dictionary:
resource_ids_dict = {'Austria':1586023272, 'Bulgaria':1550004006, 'Croatia':1131119835, 'Denmark':1703440195,
'Finland':2005848983, 'France':1264698819, 'Germany':1907737079, 'Greece':2113941104,
'Italy':27898245, 'Netherlands':1832579427, 'Norway':1054291604, 'Poland':1188865122,
'Romania':270819662, 'Russia':2132391298, 'Serbia':1155274960, 'South Africa':635838568,
'Spain':52600180, 'Switzerland':842323896, 'Turkey':1716131192, 'UK':199152257}
I am using the above dictionary values to make calls to a vendor API. I then append all the return data into a dataframe df.
What I would like to do now is add a column after ID that contains the dictionary keys corresponding to the dictionary values found in ResourceSetID.
I have had a look on the web, but haven't managed to find anything (probably due to my lack of accurate keyword searches). Surely this should be a one-liner? I want to avoid looping through the dataframe and the dictionary and mapping that way.
Use Series.map, but first it is necessary to swap the keys and values in the dictionary:
d = {v:k for k, v in resource_ids_dict.items()}
#alternative
#d = dict(zip(resource_ids_dict.values(), resource_ids_dict.keys()))
df['new'] = df['ResourceSetID'].map(d)
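To place the new column directly after ID rather than at the end, DataFrame.insert can be used. A minimal sketch, assuming a small made-up frame in place of the API results (the sample rows and the column name 'Country' are hypothetical):

```python
import pandas as pd

# Two entries from the question's dictionary, plus a made-up result frame
resource_ids_dict = {"Austria": 1586023272, "Bulgaria": 1550004006}
df = pd.DataFrame({"ID": [1, 2, 3],
                   "ResourceSetID": [1550004006, 1586023272, 1550004006]})

# Swap keys and values so the IDs can be looked up
id_to_country = {v: k for k, v in resource_ids_dict.items()}

# insert() places the new column at the position right after "ID"
df.insert(df.columns.get_loc("ID") + 1, "Country",
          df["ResourceSetID"].map(id_to_country))
```

IDs that are missing from the dictionary would come out as NaN.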
My searching was unable to find a solution for this one. I hope it is simple and I just missed it.
I am trying to assign a dataframe variable based on a dictionary key. I want to loop through a dictionary with keys 0, 1, 2, 3... and save the dataframes as df_0, df_1, df_2 ... I am able to get the keys and values working and can assign one dataframe, but cannot find a way to assign new dataframes based on the keys.
I tried How to create a new dataframe with every iteration of for loop in Python but it didn't seem to work.
Here is what I tried:
docs_dict = {0: '2635_base', 1: '2635_tri'}
for keys, docs in docs_dict.items():
    print(keys, docs)
    df = pd.read_excel(Path(folder_loc[docs]) / file_name[docs], sheet_name=sheet_name[docs], skiprows=3)
Output: 0 2635_base 1 2635_tri from the print statement, and %whos DataFrame > df as expected.
What I would like to get is: df_0 and df_1 based on the excel files in other dictionaries which work fine.
df[keys] = pd.read_excel(Path(folder_loc[docs]) / file_name[docs], sheet_name=sheet_name[docs], skiprows=3)
produces a ValueError: Wrong number of items passed 26, placement implies 1
SOLVED thanks to RubenB for pointing me to How do I create a variable number of variables? and answer by #rocky-li using globals()
for keys, docs in docs_dict.items():
    print(keys, docs)
    globals()['df_{}'.format(keys)] = pd.read_excel(...)
>> Output: dataframes df_0, df_1, ...
You might want to try a dict comprehension as such (substitute pd.read_excel(...docs...) with whatever you need to read the dataframe from disc):
docs_dict = {0: '2635_base', 1: '2635_tri'}
dfs_dict = {k: pd.read_excel(...docs...) for k, docs in docs_dict.items()}
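A runnable sketch of the dict approach. The helper load_frame is a hypothetical stand-in for the pd.read_excel(...) call, so the example works without any files:

```python
import pandas as pd

docs_dict = {0: "2635_base", 1: "2635_tri"}

def load_frame(doc_name):
    # Hypothetical stand-in for pd.read_excel(...); returns a tiny frame
    return pd.DataFrame({"doc": [doc_name], "value": [len(doc_name)]})

# One dataframe per key, stored in a dict instead of separate variables
dfs_dict = {k: load_frame(docs) for k, docs in docs_dict.items()}

# Access by key instead of by a generated variable name such as df_0
first = dfs_dict[0]
```

Keeping the frames in a dict avoids writing to globals() and makes it easy to loop over all of them later with dfs_dict.items().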