I have a DataFrame A with a column 'col_1' whose values are A and B, and I am trying to map those values to the lists stored under keys A and B in a dictionary.
DataFrame A (posted as an image),
and I have a dictionary (also posted as an image),
and I want the output like this:
Dataframe:
col_1  Values
A      1
A      2
A      3
B      1
B      2
Any help will be highly appreciated
thanks
I tried to frame your problem properly:
df = pd.DataFrame({"col_1":["A","A","A","B","B"]})
Printing df gives us the dataframe shown in your image:
print(df)
col_1
0 A
1 A
2 A
3 B
4 B
Here is your dictionary:
dict1 = {"A":[1,2,3], "B":[1,2]}
I created an empty list, filled it with the combined key/value strings you asked for, and finally wrote the list into a new column called values:
values1 = []
for key, value_list in dict1.items():
    for item in value_list:
        value_item = key + " " + str(item)
        values1.append(value_item)
df["values"] = values1
Printing df results in:
df
col_1 values
0 A A 1
1 A A 2
2 A A 3
3 B B 1
4 B B 2
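For reference, the same result can be built without an explicit loop. This is a sketch assuming the reconstructed df and dict1 above; Series.explode flattens the dictionary's lists into one row per (key, value) pair:

```python
import pandas as pd

df = pd.DataFrame({"col_1": ["A", "A", "A", "B", "B"]})
dict1 = {"A": [1, 2, 3], "B": [1, 2]}

# explode() turns each list element into its own row,
# and the dictionary keys become the col_1 values
out = (pd.Series(dict1)
         .explode()
         .rename_axis("col_1")
         .reset_index(name="Values"))
print(out)
#   col_1 Values
# 0     A      1
# 1     A      2
# 2     A      3
# 3     B      1
# 4     B      2
```

Note this also avoids relying on the row order of df matching the dictionary's iteration order, which the loop above implicitly depends on.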
I have a dataframe df which has 4 columns: 'A', 'B', 'C', 'D'.
I have to search for a substring in each column and return the matching rows of the dataframe in their original order. For example, if the substring matches column B in rows 3, 4 and 5, then my final df would have
3 rows. For this I am using df[df['A'].str.contains('string_to_search')] and it works fine, but one of the columns stores each element as a list of strings, like column B:
A B C D
0 asdfg [asdfgh, cvb] asdfg nbcjsh
1 fghjk [ertyu] fghhjk yrewf
2 xcvb [qwerr, hjklk, bnm] cvbvb gjfsjgf
3 ertyu [qwert] ertyhhu ertkkk
so df[df['B'].str.contains('string_to_search')] does not work for column B. Please suggest how I can search in this column while maintaining the order of the complete dataframe.
Column B contains lists, so you need a membership test with the in operator:
df1 = df[df['B'].apply(lambda x: 'cvb' in x)]
print (df1)
A B C D
0 asdfg [asdfgh, cvb] asdfg nbcjsh
If you want to use str.contains, you can apply str.join first, which also makes it possible to search for substrings:
df1 = df[df['B'].str.join(' ').str.contains('er')]
print (df1)
A B C D
1 fghjk [ertyu] fghhjk yrewf
2 xcvb [qwerr, hjklk, bnm] cvbvb gjfsjgf
3 ertyu [qwert] ertyhhu ertkkk
If you want to search in all columns:
df2 = (df[df.assign(B=df['B'].str.join(' '))
            .apply(' '.join, axis=1)
            .str.contains('g')]
       )
print (df2)
A B C D
0 asdfg [asdfgh, cvb] asdfg nbcjsh
1 fghjk [ertyu] fghhjk yrewf
2 xcvb [qwerr, hjklk, bnm] cvbvb gjfsjgf
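Putting the pieces above together into one runnable sketch (the frame is reconstructed from the question's sample data):

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["asdfg", "fghjk", "xcvb", "ertyu"],
    "B": [["asdfgh", "cvb"], ["ertyu"], ["qwerr", "hjklk", "bnm"], ["qwert"]],
    "C": ["asdfg", "fghhjk", "cvbvb", "ertyhhu"],
    "D": ["nbcjsh", "yrewf", "gjfsjgf", "ertkkk"],
})

# Exact membership test in the list column
exact = df[df["B"].apply(lambda x: "cvb" in x)]

# Substring search after joining each list into one string
sub = df[df["B"].str.join(" ").str.contains("er")]

print(exact.index.tolist())  # [0]
print(sub.index.tolist())    # [1, 2, 3]
```

Both keep the original row order, since boolean indexing never reorders the frame.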
I would like to convert a dictionary of key-value pairs to an excel file with column names that match the values to the corresponding columns.
For example :
I have an excel file with column names as:
a,b,c,d,e,f,g and h.
I have a dictionary like:
{1:['c','d'],2:['a','h'],3:['a','b','b','f']}.
I need the output to be:
   a  b  c  d  e  f  g  h
1        1  1
2  1                    1
3  1  2        1
the 1,2,3 are the keys from the dictionary.
The rest of the columns could be either 0 or null.
I have tried splitting the dictionary and am getting
1 = ['c','d']
2 = ['a','h']
3 = ['a','b','b','f']
but, I don't know how to pass this to match with the excel file.
Your problem can be solved with pandas and collections (there may exist a more efficient solution):
import pandas as pd
from collections import Counter
d = {...} # Your dictionary
series = pd.Series(d) # Convert the dict into a Series
counts = series.apply(Counter) # Count items row-wise
counts = counts.apply(pd.Series) # Convert the counters to Series
table = counts.fillna(0).astype(int) # Fill the gaps and make the counts integer
print(table)
   a  b  c  d  f  h
1  0  0  1  1  0  0
2  1  0  0  0  0  1
3  1  2  0  0  1  0
It is not clear what type of output you expect, so I leave it to you to convert the DataFrame to the output of your choice.
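For instance, if the target is an Excel sheet with all eight columns a–h, the same pipeline can be finished with reindex and to_excel (the file name here is just an assumption, and to_excel needs an engine such as openpyxl installed):

```python
import pandas as pd
from collections import Counter

d = {1: ['c', 'd'], 2: ['a', 'h'], 3: ['a', 'b', 'b', 'f']}

table = (pd.Series(d)
           .apply(Counter)      # count items row-wise
           .apply(pd.Series)    # one column per counted item
           .fillna(0)
           .astype(int)
           .reindex(columns=list('abcdefgh'), fill_value=0))
print(table)
# table.to_excel("counts.xlsx")  # hypothetical output file
```

reindex adds the missing columns (e and g here) filled with 0, so the sheet's columns line up with the existing file's header.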
A simple solution based only on standard lists and dictionaries. It generates a 2D list, which is then easy to convert into a CSV file that can be loaded by Excel.
d = {1:['c','d'], 2:['a','h'], 3:['a','b','b','f']}
cols = dict((c, n) for n, c in enumerate('abcdefgh'))
rows = dict((k, n) for n, k in enumerate(d))  # keys are ints, so build this from d, not from the string '123'
table = [[0 for col in cols] for row in rows]
for row, values in d.items():
    for col in values:
        table[rows[row]][cols[col]] += 1
print(table)
# output:
# [[0,0,1,1,0,0,0,0], [1,0,0,0,0,0,0,1], [1,2,0,0,0,1,0,0]]
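To get that 2D list into a file Excel can open, one option is the stdlib csv module. A sketch (the file name is hypothetical, and it relies on dict insertion order, Python 3.7+):

```python
import csv

d = {1: ['c', 'd'], 2: ['a', 'h'], 3: ['a', 'b', 'b', 'f']}
cols = dict((c, n) for n, c in enumerate('abcdefgh'))

# Build the count table: one row per dictionary key, one column per letter
table = [[0] * len(cols) for _ in d]
for i, values in enumerate(d.values()):
    for col in values:
        table[i][cols[col]] += 1

# "counts.csv" is a hypothetical file name; Excel opens CSV files directly
with open("counts.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow([""] + list("abcdefgh"))  # header row with column names
    for key, counts in zip(d, table):
        writer.writerow([key] + counts)
```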
I need to add a description column to a dataframe that is built by grouping items from another dataframe.
grouped= df1.groupby('item')
list= grouped['total'].agg(np.sum)
list= list.reset_index()
to assign a description label to every item I've come up with this solution:
def des(item):
return df1['description'].loc[df1['item']== item].iloc[0]
list['description'] = list['item'].apply(des)
it works, but it takes an enormous amount of time to execute.
I'd like to do something like that
list = list.assign(description=df1['description'].loc[df1['item'] == list['item']])
or
list = list.assign(description=df1['description'].loc[df1['item'].isin(list['item'])])
These are very wrong, but I hope you get the idea. I'm hoping there is some pandas feature that does the trick more efficiently, but I can't find it.
Any ideas?
I think you need DataFrameGroupBy.agg with a dict of functions: sum for column total and first for description:
df = df1.groupby('item', as_index=False).agg({'total':'sum', 'description':'first'})
Also, don't use list as a variable name, because list is a Python builtin and assigning to it shadows the type.
Sample:
df1 = pd.DataFrame({'description': list('abcdef'),
                    'B': [4,5,4,5,5,4],
                    'total': [5,3,6,9,2,4],
                    'item': list('aaabbb')})
print (df1)
   B description item  total
0  4           a    a      5
1  5           b    a      3
2  4           c    a      6
3  5           d    b      9
4  5           e    b      2
5  4           f    b      4
df = df1.groupby('item', as_index=False).agg({'total':'sum', 'description':'first'})
print (df)
  item  total description
0    a     14           a
1    b     15           d
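If you do want to keep the original two-step groupby, a faster alternative to the row-wise apply(des) is to build the item-to-description lookup once and map it. A sketch on the same sample data:

```python
import pandas as pd

df1 = pd.DataFrame({'description': list('abcdef'),
                    'B': [4, 5, 4, 5, 5, 4],
                    'total': [5, 3, 6, 9, 2, 4],
                    'item': list('aaabbb')})

sums = df1.groupby('item', as_index=False)['total'].sum()

# Build an item -> first-description Series once, then map it in one pass
lookup = df1.drop_duplicates('item').set_index('item')['description']
sums['description'] = sums['item'].map(lookup)
print(sums)
#   item  total description
# 0    a     14           a
# 1    b     15           d
```

drop_duplicates keeps the first row per item, matching the .iloc[0] behaviour of the original des function, but the lookup is built once instead of scanning df1 for every row.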
I would like to create a new column for my dataframe named "Id" where the value is the row index +1.
I would like it to be like the example below:
ID Col1 ...
0 1 a ...
1 2 b ...
2 3 c ...
You can add one to the index and assign it to the id column:
df = pd.DataFrame({"Col1": list("abc")})
df["id"] = df.index + 1
df
#Col1 id
#0 a 1
#1 b 2
#2 c 3
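Note that df.index + 1 assumes the default RangeIndex. If the index might be something else, a range over the row count works regardless, and insert places the column first as in the example:

```python
import pandas as pd

df = pd.DataFrame({"Col1": list("abc")})

# Insert at position 0 so ID appears as the first column;
# range(1, len(df) + 1) is independent of whatever the index holds
df.insert(0, "ID", range(1, len(df) + 1))
print(df)
#    ID Col1
# 0   1    a
# 1   2    b
# 2   3    c
```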
I converted the prediction list to an array, then created a dataframe with the group totals and plotted the dataframe:
y_pred=model.predict(X_test)
#print(y_pred)
setosa=np.array([y_pred[i][0] for i in range(len(y_pred))])
versicolor=np.array([y_pred[i][1] for i in range(len(y_pred))])
virginica=np.array([y_pred[i][2] for i in range(len(y_pred))])
df2 = pd.DataFrame({'setosa': [len(setosa[setosa > .5])],
                    'versicolor': [len(versicolor[versicolor > .5])],
                    'virginica': [len(virginica[virginica > .5])]})
df2['id']=np.arange(0,1)
df2=df2.set_index('id')
df2.plot.bar()
plt.show()
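The three per-class list comprehensions can also be collapsed into a single vectorized comparison. A sketch, with a made-up stand-in for model.predict(X_test):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for model.predict(X_test):
# one row of class probabilities per sample
y_pred = np.array([[0.8, 0.1, 0.1],
                   [0.2, 0.7, 0.1],
                   [0.1, 0.2, 0.7],
                   [0.9, 0.05, 0.05]])

# Count, per class column, how many predictions exceed 0.5
counts = (y_pred > 0.5).sum(axis=0)
df2 = pd.DataFrame([counts], columns=['setosa', 'versicolor', 'virginica'])
print(df2)
#    setosa  versicolor  virginica
# 0       2           1          1
```

df2.plot.bar() then works on this frame exactly as in the snippet above.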
I have the following dataframe, where I am trying to create a new column 'C' that holds a cumulative dictionary built from columns 'A' and 'B'. Also, if '0' appears in column 'B', the entry with that key is deleted from 'C'.
df = DataFrame({'A': [1,2,3,2,3,2],
                'B': ['Hi','Hello','HiWorld','HelloWorld','0','0']})
for indx, row in df.iterrows():
    df['C'].append(dict(zip([row['A'], row['B']])))
I am looking for the following output in column C:
A B C
0 1 Hi {1:Hi}
1 2 Hello {1:Hi,2:Hello}
2 3 HiWorld {1:Hi,2:Hello,3:HiWorld}
3 2 HelloWorld {1:Hi,2:HelloWorld,3:HiWorld}
4 3 0 {1:Hi,2:HelloWorld}
5 2 0 {1:Hi}
I have tried potential solutions using cumsum, concat & series.shift(1) but hit a block. Then I came across dict & zip, which seems like a clean solution but doesn't work for me. Any suggestions?
Try this:
d = dict()
column = list()
for _, a, b in df.itertuples():
    if b != '0':
        d[a] = b
    else:
        d.pop(a, None)
    column.append(d.copy())
df['C'] = column
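Running this on the sample frame from the question reproduces the expected column. A self-contained version:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 2, 3, 2],
                   'B': ['Hi', 'Hello', 'HiWorld', 'HelloWorld', '0', '0']})

d = dict()
column = list()
for _, a, b in df.itertuples():
    if b != '0':
        d[a] = b          # add or overwrite the key from column A
    else:
        d.pop(a, None)    # a '0' in column B deletes that key
    column.append(d.copy())  # copy: each row gets a snapshot, not the live dict
df['C'] = column

print(df['C'].iloc[-1])  # {1: 'Hi'}
```

The d.copy() is the important detail: without it, every row would reference the same dictionary object and end up showing the final state.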