I created a pandas dataframe from a dictionary like this:
dictionary = {'cat': ['B1', 'B2', 'B3', 'B4', 'B5', 'B6', 'B7', 'B8', 'B9', 'B10'], 'Dog': ['c1', 'c2', 'c3'], 'Bird': ['d1', 'd2', 'd3', 'd4', 'd5']}
df = pd.DataFrame(dictionary.items(), columns=['ID_1','ID_match'])
But I get a table that looks like this:
And I would like it to look like this:
So far I have tried this:
df_2_1 = df.replace('', np.nan).set_index('ID_1').stack().reset_index(name='ID_match').drop(columns='level_1')
But the second column still holds each value as a list...
Can someone point me in the right direction?
Solution:
I just needed to expand the second column:
df.explode('ID_match')
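As a minimal sketch of what explode does here (the sample values are hypothetical and shorter than the lists in the question), each list in ID_match is unpacked into one row per element and the matching ID_1 value is repeated:

```python
import pandas as pd

# Hypothetical sample values, shorter than the lists in the question
dictionary = {
    'cat': ['B1', 'B2', 'B3'],
    'Dog': ['c1', 'c2'],
    'Bird': ['d1'],
}

# Each row initially holds a key and the whole list of matches
df = pd.DataFrame(list(dictionary.items()), columns=['ID_1', 'ID_match'])

# explode() emits one row per list element and repeats ID_1 alongside it
long_df = df.explode('ID_match').reset_index(drop=True)
print(long_df)
```

reset_index(drop=True) is optional. Without it, the exploded rows keep the index label of the row they came from.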
This solution should work. The first .iloc takes every other column starting with the first column, and the second takes every other column starting with the second column.
df1 = df.iloc[:,::2].melt()
df1 = df1['variable']
df2 = df.iloc[:,1::2].melt()
df2 = df2['value']
df3 = pd.DataFrame({'col1':df1, 'col2':df2})
I want to replace the elements of df2 with elements of df1 according to this rule: wherever df2 holds a 1, the next unused element from the same row of df1 goes in that position, and wherever df2 holds a 0, the '0' stays. For example, if the first row, first column of df2 is 1, the first row, first column element of df1 goes there. If the last column of some row of df2 is 1, the last element of that row of df1 goes there.
So I want to replace every 1 in df2 with a df1 element according to that rule. df3 should look like this:
abcde0000;
abcd0e000;
abcd00e00;...
We can use the apply function for this, but first you have to concatenate both frames along axis 1. I am using a dummy table with just three rows; the same approach works for any number of rows.
import pandas as pd
import numpy as np
# Dummy data
df1 = pd.DataFrame([['a','b','c','d','e'],['a','b','c','d','e'],['a','b','c','d','e']])
df2 = pd.DataFrame([[1,1,1,1,1,0,0,0,0],[1,1,1,1,0,1,0,0,0],[1,1,1,1,0,0,1,0,0]])
# Display the dataframes. display() is available in Jupyter notebooks; in a plain Python script use print() instead
display(df1)
display(df2)
# Concat DFs
df3 = pd.concat([df1,df2],axis=1)
display(df3)
# Define function for replacing
def replace(letters, indexes):
    seek = 0
    for i in range(len(indexes)):
        if indexes[i] == 1:
            indexes[i] = letters[seek]
            seek += 1
    return ''.join(list(map(str, indexes)))
# Applying replace function to dataframe
df4 = df3.apply(lambda x: replace(x[:5],x[5:]),axis=1)
# Display df4
display(df4)
The result is
0 abcde0000
1 abcd0e000
2 abcd00e00
dtype: object
I think this will solve your problem
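For comparison, here is a sketch of the same rule that does not concatenate the frames or modify the row in place. The helper name fill_ones is hypothetical, and the sketch assumes no row of df2 contains more 1s than the matching row of df1 has letters:

```python
import pandas as pd

# Same dummy data as above
df1 = pd.DataFrame([list('abcde')] * 3)
df2 = pd.DataFrame([[1, 1, 1, 1, 1, 0, 0, 0, 0],
                    [1, 1, 1, 1, 0, 1, 0, 0, 0],
                    [1, 1, 1, 1, 0, 0, 1, 0, 0]])

def fill_ones(letters, flags):
    """Replace each 1 in flags with the next unused letter; keep 0 as '0'."""
    remaining = iter(letters)
    return ''.join(next(remaining) if flag == 1 else '0' for flag in flags)

# Pair up the rows of both frames by position and build one string per row
result = pd.Series(
    [fill_ones(letters, flags)
     for letters, flags in zip(df1.to_numpy(), df2.to_numpy())],
    index=df2.index,
)
print(result)
```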
I have the following dataframe:
newItem = pd.DataFrame({'c1': range(10), 'c2': (1,90,100,50,30,10,50,30,90,1000)})
Which looks like this:
I want to sort the columns by descending order, and extract the i'th row to a new pandas series.
So my function looks like this:
def getLargestRow(dataFrame, indexAfterSort):
    numRows, numCols = dataFrame.shape
    seriesToReturn = pd.Series()
    dataFrame = dataFrame.sort_values(by=list(dataFrame.columns), ascending=False)
My problem is that I can't manage to put the row at position indexAfterSort of dataFrame into the series.
I've tried to use:
seriesToReturn = seriesToReturn.add(dataFrame.iloc[indexAfterSort])
But, confusingly, I got NaN values instead of the row values.
The dataframe after sort:
The output I receive (no matter what's the input for row index):
What am I missing here?
Thanks in advance.
It's a good idea to use built-in pandas functions for simple operations like sorting. Function sort_values is a good option here. This sorts the rows of the dataframe by c1 column:
seriesToReturn = newItem.sort_values('c1', ascending=False)
This returns a dataframe with both the c1 and c2 columns. To get the c2 column as a series, just use seriesToReturn = seriesToReturn['c2'].
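If the goal is the i'th row after sorting rather than a whole column, one possible sketch is to sort and then select by position with .iloc, which already returns the row as a Series. The snake_case names are my own. The NaN values in the question probably come from Series.add aligning on index labels: adding a row to an empty Series finds no matching labels, so every entry becomes NaN.

```python
import pandas as pd

newItem = pd.DataFrame({'c1': range(10),
                        'c2': (1, 90, 100, 50, 30, 10, 50, 30, 90, 1000)})

def get_largest_row(data_frame, index_after_sort):
    # Sort by every column, largest values first
    sorted_frame = data_frame.sort_values(by=list(data_frame.columns),
                                          ascending=False)
    # .iloc selects by position and returns the row as a Series
    return sorted_frame.iloc[index_after_sort]

row = get_largest_row(newItem, 0)
print(row)
```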
I have a fairly small DF. I want to add a column that, for each row, looks up the value in the column that row points to. So in the example below, the value should come from the column named 'PA1.13'
example = {'Honda Civic': [1],
'Toyota': [0],
'valuetolookup': ['Honda Civic'],
'Result should be': [1]
}
As you can see the column has two levels. I cannot seem to find how to make a second column level from scratch, but here I hope that I can work it out if someone wants to use my example code to solve it :-)
You can use a simple apply() to extract data like you want:
import pandas as pd
example = {'Honda Civic': [1,3],
'Toyota': [0,2],
'valuetolookup': ['Honda Civic','Toyota'],
'Result should be': [1,2]
}
df = pd.DataFrame(example)
# Inside apply, the value in the "valuetolookup" column is used as the column name to read from
df["Result"] = df.apply(lambda x : x[x["valuetolookup"]],axis=1)
I added another row to show you that you can use different columns to lookup :)
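If the frame grows, a vectorized variant may be worth trying. The sketch below is one possible approach, not the only one: it converts the names in valuetolookup to column positions with Index.get_indexer and then picks one cell per row with NumPy indexing. Note that get_indexer returns -1 for a name that is not a column, which would silently select the last column, so this assumes every lookup value is a valid column name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Honda Civic': [1, 3],
                   'Toyota': [0, 2],
                   'valuetolookup': ['Honda Civic', 'Toyota']})

# Position of the column named in each row's "valuetolookup"
col_positions = df.columns.get_indexer(df['valuetolookup'])
row_positions = np.arange(len(df))

# Pick one cell per row: (row 0, its column), (row 1, its column), ...
df['Result'] = df.to_numpy()[row_positions, col_positions]
print(df)
```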
I have 2 dataframes:
DF A:
and DF B:
For every row of DFA['item'], I need to check whether it contains any of the values in DFB['original']. If it does, I want to add a new column DFA['my'] holding the corresponding value from DFB['my'].
So here is the result I need:
I thought of converting DFB['original'] into a list and then using a regex, but that way I won't get the matching value from the 'my' column.
OK, maybe not the best solution, but it seems to work.
I did a Cartesian join and then kept the records in which 'item' contains 'original':
dfa['join'] = 1
dfb['join'] = 1
dfFull = dfa.merge(dfb, on='join').drop('join' , axis=1)
dfFull['match'] = dfFull.apply(lambda x: x['original'] in x['item'], axis=1)
dfFull[dfFull['match']]
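On pandas 1.2 or later, the same idea can be sketched with merge(how='cross') instead of a helper 'join' column. The data below is hypothetical, since the original tables were not shown. The final left merge brings the matched 'my' value back onto dfa and leaves NaN where nothing matched.

```python
import pandas as pd

# Hypothetical data; the original DF A and DF B were not shown
dfa = pd.DataFrame({'item': ['red apple pie', 'green pear', 'plain bread']})
dfb = pd.DataFrame({'original': ['apple', 'pear'],
                    'my': ['fruit_a', 'fruit_p']})

# Every combination of a dfa row with a dfb row
pairs = dfa.merge(dfb, how='cross')

# Keep the pairs where the 'original' text occurs inside 'item'
mask = [orig in item for orig, item in zip(pairs['original'], pairs['item'])]
matches = pairs.loc[mask, ['item', 'my']]

# Attach the matched 'my' value to dfa; unmatched items get NaN
result = dfa.merge(matches, on='item', how='left')
print(result)
```

A cross join builds len(dfa) * len(dfb) rows, so this is only practical when both frames are reasonably small.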
I have two DataFrames:
df = pd.DataFrame({'ID': ['bumgm001', 'lestj001',
'tanam001', 'hellj001', 'chacj001']})
df1 = pd.DataFrame({'playerID': ['bumgama01', 'lestejo01',
'tanakama01', 'hellije01', 'chacijh01'],
'retroID': ['bumgm001', 'lestj001', 'tanam001', 'hellj001', 'chacj001']})
OR
df            df1
ID            playerID      retroID
'bumgm001'    'bumgama01'   'bumgm001'
'lestj001'    'lestejo01'   'lestj001'
'tanam001'    'tanakama01'  'tanam001'
'hellj001'    'hellije01'   'hellj001'
'chacj001'    'chacijh01'   'chacj001'
Now, my actual DataFrames are a little more complicated than this, but I simplified it here so it's clearer what I'm trying to do.
I would like to take all of the ID's in df and replace them with the corresponding playerID's in df1.
My final df should look like this:
df
**ID**
'bumgama01'
'lestejo01'
'tanakama01'
'hellije01'
'chacijh01'
I have tried to do it using the following method:
for row in df.itertuples():  # row[1] == the retroID column
    playerID = df1.loc[df1['retroID']==row[1], 'playerID']
    df.loc[df['ID']==row[1], 'ID'].replace(to_replace=
        df.loc[df['ID']==row[1], 'ID'], value=playerID)
The code seems to run just fine. But my retroID's in df have been changed to NaN rather than the proper playerIDs.
This strikes me as a datatype problem, but I'm not familiar enough with Pandas to diagnose any further.
EDIT:
Unfortunately, I made my example too simplistic. I edited to better represent the issue I'm having. I'm trying to look up the item from one DataFrame in a second DataFrame, then I want to replace the item from the first Dataframe with an item from the corresponding row of the second Dataframe. The columns DO NOT have the same name.
You can use the second dataframe as a dictionary for replacement:
to_replace = df1.set_index('retroID')['playerID'].to_dict()
df['ID'] = df['ID'].replace(to_replace)
According to your example, this is what you want:
df['ID'] = df1['playerID']
If data is not in order (row 1 from df is not the same as row 1 from df1) then use
df['ID']=df1.set_index('retroID').reindex(df['ID'])['playerID'].values
Credit to Wen for second approach
Output
ID
0 bumgama01
1 lestejo01
2 tanakama01
3 hellije01
4 chacijh01
Let me know if it's correct
OK, I've figured out a solution. As it turns out, my problem was a type problem. I updated my code from:
for row in df.itertuples():  # row[1] == the retroID column
    playerID = df1.loc[df1['retroID']==row[1], 'playerID']
    df.loc[df['ID']==row[1], 'ID'].replace(to_replace=
        df.loc[df['ID']==row[1], 'ID'], value=playerID)
to:
for row in df.itertuples():  # row[1] == the retroID column
    playerID = df1.loc[df1['retroID']==row[1], 'playerID'].values[0]
    df.loc[df['ID']==row[1], 'ID'].replace(to_replace=
        df.loc[df['ID']==row[1], 'ID'], value=playerID)
This works because "playerID" is now a scalar (thanks to .values[0]) rather than a Series, which is what .loc returns when it is given a boolean mask.
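The difference between the two types can be seen in a small sketch on a two-row subset of the example data. The names selection and scalar are mine, and the last step writes the value back with a .loc assignment, which is an assumption about how to make the change persist, since replace returns a new Series rather than modifying df.

```python
import pandas as pd

df = pd.DataFrame({'ID': ['bumgm001', 'lestj001']})
df1 = pd.DataFrame({'playerID': ['bumgama01', 'lestejo01'],
                    'retroID': ['bumgm001', 'lestj001']})

# A boolean mask with .loc returns a Series, even for a single match
selection = df1.loc[df1['retroID'] == 'lestj001', 'playerID']
print(type(selection))

# .values[0] pulls out the first matching value as a plain scalar
scalar = selection.values[0]
print(scalar)

# Assigning through .loc writes the scalar into df itself
df.loc[df['ID'] == 'lestj001', 'ID'] = scalar
print(df)
```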