I need your help:
I have a dataframe df3:
I am using pivot_table:
df4 = df3.pivot_table(index = ['Number','Department','Task'], columns="Date", values="Score",fill_value = 'N/A')
The output df4 looks like:
Why is it not showing the rows where Task is empty?
What am I doing wrong?
I would like to create a dataframe like this:
I think it is necessary to replace the missing values before calling pivot_table:
cols = ['Number','Department','Task']
df3[cols] = df3[cols].fillna('N/A')
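To illustrate the idea, here is a minimal sketch with made-up data (the sample values below are assumptions, not the asker's real df3). By default pivot_table drops groups whose index keys contain NaN, so filling the key columns first keeps those rows:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the real df3 was not shown in the question.
df3 = pd.DataFrame({
    "Number": [1, 1, 2, 2],
    "Department": ["HR", "HR", "IT", "IT"],
    "Task": ["Audit", "Audit", np.nan, np.nan],
    "Date": ["2020-01", "2020-02", "2020-01", "2020-02"],
    "Score": [5, 6, 7, 8],
})
cols = ["Number", "Department", "Task"]

# Without filling: the rows whose Task is NaN are dropped from the result.
without_fill = df3.pivot_table(index=cols, columns="Date", values="Score")

# With filling: the missing Task becomes the label 'N/A' and the rows survive.
filled = df3.copy()
filled[cols] = filled[cols].fillna("N/A")
with_fill = filled.pivot_table(index=cols, columns="Date", values="Score")

print(without_fill)
print(with_fill)
```

The fill_value argument of pivot_table only fills empty cells of the result, so it does not help with missing values in the index columns.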
I have the following dataframe:
newItem = pd.DataFrame({'c1': range(10), 'c2': (1,90,100,50,30,10,50,30,90,1000)})
Which looks like this:
I want to sort the columns by descending order, and extract the i'th row to a new pandas series.
So my function looks like this:
def getLargestRow(dataFrame, indexAfterSort):
    numRows, numCols = dataFrame.shape
    seriesToReturn = pd.Series()
    dataFrame = dataFrame.sort_values(by=list(df.columns), ascending=False)
My problem is that I can't manage to copy row number indexAfterSort of dataFrame into the new series.
I've tried to use:
seriesToReturn = seriesToReturn.add(df.iloc[indexAfterSort])
But confusingly I got NaN values, instead of the row values.
The dataframe after sort:
The output I receive (no matter what's the input for row index):
What am I missing here?
Thanks in advance.
It's a good idea to use built-in pandas functions for simple operations like sorting. The sort_values function is a good option here. This sorts the rows of the dataframe by the c1 column in descending order:
seriesToReturn = newItem.sort_values('c1', ascending=False)
This returns a dataframe with both the c1 and c2 columns. To get the c2 column as a series, use seriesToReturn = seriesToReturn['c2'].
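If the goal is the i'th row after sorting, as the question describes, a possible sketch is to sort and then select the row by position with iloc, which already returns a Series. The function name get_row_after_sort is made up for this example. The NaN values in the question most likely come from Series.add aligning on index labels: an empty Series shares no labels with the row, so every result is NaN.

```python
import pandas as pd

newItem = pd.DataFrame({'c1': range(10),
                        'c2': (1, 90, 100, 50, 30, 10, 50, 30, 90, 1000)})

def get_row_after_sort(data_frame, index_after_sort):
    """Sort by all columns in descending order and return one row as a Series."""
    sorted_frame = data_frame.sort_values(by=list(data_frame.columns),
                                          ascending=False)
    # iloc selects by position, so this is the i'th row of the sorted frame.
    return sorted_frame.iloc[index_after_sort]

row = get_row_after_sort(newItem, 0)
print(row)
```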
I created a pandas dataframe from a dictionary like this:
dictionary={'cat': [B1, B2,B3,B4,B5,B6,B7,B8,B9,B10], 'Dog': [c1, c2,c3], 'Bird': [d1,d2,d3,d4,d5]}
df = pd.DataFrame(dictionary.items(), columns=['ID_1','ID_match'])
But I get a table looking like this:
And I would like it to look like this:
So far I have tried this:
df_2_1 = df.replace('', np.nan).set_index('ID_1').stack().reset_index(name='ID_match').drop(columns='level_1')
But I still get the second value as a list...
Can someone point me in the right direction?
Solution:
I just needed to expand the second column:
df.explode('ID_match')
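For reference, a small self-contained sketch of the explode approach. The string values stand in for the undefined names B1, c1, d1 from the question and are assumptions:

```python
import pandas as pd

# Placeholder strings instead of the question's undefined variables.
dictionary = {'cat': ['B1', 'B2', 'B3'], 'Dog': ['c1', 'c2'], 'Bird': ['d1']}

# One row per key, with the whole list stored in the ID_match column.
df = pd.DataFrame(list(dictionary.items()), columns=['ID_1', 'ID_match'])

# explode turns each list element into its own row and repeats ID_1;
# reset_index gives the rows a fresh 0..n-1 index.
result = df.explode('ID_match').reset_index(drop=True)
print(result)
```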
This solution should work. The first .iloc takes every other column starting with the first column, and the second takes every other column starting with the second column.
df1 = df.iloc[:,::2].melt()
df1 = df1['variable']
df2 = df.iloc[:,1::2].melt()
df2 = df2['value']
df3 = pd.DataFrame({'col1':df1, 'col2':df2})
I have a list of dataframes dfs = [df0, df1, ...]. Each of them has a date column of varying length (some dates might be in one dataframe but not in another).
What I'm trying to do is this:
pd.concat(dfs).groupby("date", as_index=False).sum()
But with date no longer being a column but an index (dfs = [df.set_index("date") for df in dfs]).
I've seen you can pass df.index to groupby (.groupby(df.index)) but df.index might not include all the dates.
How can I do this?
The goal here is to call .sum() on the groupby, so I'm not tied to using groupby or concat if there's an alternative method to do so.
If I understand correctly, you want something like this:
df = pd.concat(dfs)
df.groupby(df.index).sum()
Here's a small example:
tmp1 = pd.DataFrame({'date':['2019-09-01','2019-09-02','2019-09-03'],'value':[1,1,1]}).set_index('date')
tmp2 = pd.DataFrame({'date':['2019-09-01','2019-09-02','2019-09-04','2019-09-05'],'value':[2,2,2,2]}).set_index('date')
df = pd.concat([tmp1,tmp2])
df.groupby(df.index).sum()
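A variant of the same idea, sketched under the assumption that every dataframe uses an index named date: grouping by index level avoids referring to df.index explicitly, and because the grouping happens on the concatenated frame, all dates from all inputs are included.

```python
import pandas as pd

tmp1 = pd.DataFrame({'date': ['2019-09-01', '2019-09-02', '2019-09-03'],
                     'value': [1, 1, 1]}).set_index('date')
tmp2 = pd.DataFrame({'date': ['2019-09-01', '2019-09-02', '2019-09-04', '2019-09-05'],
                     'value': [2, 2, 2, 2]}).set_index('date')
dfs = [tmp1, tmp2]

# Group on the index level called 'date' of the concatenated frame.
result = pd.concat(dfs).groupby(level='date').sum()
print(result)
```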
I have two data frames: df1
and df2
Now I want to replace one of the rows of df1 (highlighted in red) with all the rows of df2. I tried the following code, but it didn't give the desired result:
df1[df1['Category_2']=='Specified Functionality'].update(df2)
I also tried:
df1[df1['Category_2']=='Specified Functionality'] = df2
Could anyone guide me where I am making the mistake?
You can insert the rows like this:
row = 13
df2 = df2.rename(columns={'Functionality': 'Category_2'})
df = pd.concat([df1[0:row], df2, df1[row+1:]]).reset_index(drop=True)
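Instead of hard-coding the row number, the position can be looked up from the condition used in the question. This is only a sketch: the two small frames below are invented stand-ins for the real df1 and df2, and it assumes exactly one row matches.

```python
import pandas as pd

# Hypothetical stand-ins for the real df1 and df2.
df1 = pd.DataFrame({'Category_1': ['A', 'B', 'C'],
                    'Category_2': ['Login', 'Specified Functionality', 'Reports']})
df2 = pd.DataFrame({'Category_1': ['B', 'B'],
                    'Functionality': ['Search', 'Export']})

# Align the column names so concat stacks the values in the same column.
df2 = df2.rename(columns={'Functionality': 'Category_2'})

# Position of the first row that matches the condition.
mask = df1['Category_2'] == 'Specified Functionality'
row = int(mask.to_numpy().argmax())

# Rows before the match, then df2, then rows after the match.
df = pd.concat([df1.iloc[:row], df2, df1.iloc[row + 1:]]).reset_index(drop=True)
print(df)
```

The attempts in the question do not change df1 because df1[mask] produces a new object, so update is applied to a temporary copy.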
I have two DataFrames:
df = pd.DataFrame({'ID': ['bumgm001', 'lestj001',
                          'tanam001', 'hellj001', 'chacj001']})
df1 = pd.DataFrame({'playerID': ['bumgama01', 'lestejo01',
                                 'tanakama01', 'hellije01', 'chacijh01'],
                    'retroID': ['bumgm001', 'lestj001', 'tanam001', 'hellj001', 'chacj001']})
OR
df            df1
ID            playerID       retroID
'bumgm001'    'bumgama01'    'bumgm001'
'lestj001'    'lestejo01'    'lestj001'
'tanam001'    'tanakama01'   'tanam001'
'hellj001'    'hellije01'    'hellj001'
'chacj001'    'chacijh01'    'chacj001'
Now, my actual DataFrames are a little more complicated than this, but I simplified it here so it's clearer what I'm trying to do.
I would like to take all of the ID's in df and replace them with the corresponding playerID's in df1.
My final df should look like this:
df
**ID**
'bumgama01'
'lestejo01'
'tanakama01'
'hellije01'
'chacijh01'
I have tried to do it using the following method:
for row in df.itertuples():  # row[1] == the retroID column
    playerID = df1.loc[df1['retroID'] == row[1], 'playerID']
    df.loc[df['ID'] == row[1], 'ID'].replace(
        to_replace=df.loc[df['ID'] == row[1], 'ID'], value=playerID)
The code seems to run just fine. But my retroID's in df have been changed to NaN rather than the proper playerIDs.
This strikes me as a datatype problem, but I'm not familiar enough with Pandas to diagnose any further.
EDIT:
Unfortunately, I made my example too simplistic. I edited to better represent the issue I'm having. I'm trying to look up the item from one DataFrame in a second DataFrame, then I want to replace the item from the first Dataframe with an item from the corresponding row of the second Dataframe. The columns DO NOT have the same name.
You can use the second dataframe as a dictionary for replacement:
to_replace = df1.set_index('retroID')['playerID'].to_dict()
df['ID'] = df['ID'].replace(to_replace)
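As an alternative sketch of the same lookup, Series.map can be used with a Series indexed by retroID. The rows of df1 are deliberately shuffled here to show that the row order does not matter. One difference to keep in mind: with map, IDs that are missing from df1 become NaN, while replace leaves them unchanged.

```python
import pandas as pd

df = pd.DataFrame({'ID': ['bumgm001', 'lestj001', 'tanam001',
                          'hellj001', 'chacj001']})
# Same data as in the question, but in a different row order.
df1 = pd.DataFrame({'playerID': ['chacijh01', 'bumgama01', 'hellije01',
                                 'lestejo01', 'tanakama01'],
                    'retroID': ['chacj001', 'bumgm001', 'hellj001',
                                'lestj001', 'tanam001']})

# Lookup table: retroID -> playerID (retroID values must be unique).
lookup = df1.set_index('retroID')['playerID']

df['ID'] = df['ID'].map(lookup)
print(df)
```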
According to your example, this is what you want:
df['ID'] = df1['playerID']
If the data is not in order (row 1 of df does not correspond to row 1 of df1), then use:
df['ID'] = df1.set_index('retroID').reindex(df['ID'])['playerID'].values
Credit to Wen for the second approach.
Output
ID
0 bumgama01
1 lestejo01
2 tanakama01
3 hellije01
4 chacijh01
Let me know if it's correct
OK, I've figured out a solution. As it turns out, my problem was a type problem. I updated my code from:
for row in df.itertuples():  # row[1] == the retroID column
    playerID = df1.loc[df1['retroID'] == row[1], 'playerID']
    df.loc[df['ID'] == row[1], 'ID'].replace(
        to_replace=df.loc[df['ID'] == row[1], 'ID'], value=playerID)
to:
for row in df.itertuples():  # row[1] == the retroID column
    playerID = df1.loc[df1['retroID'] == row[1], 'playerID'].values[0]
    df.loc[df['ID'] == row[1], 'ID'].replace(
        to_replace=df.loc[df['ID'] == row[1], 'ID'], value=playerID)
This works because playerID is now a scalar (thanks to .values[0]) rather than a Series, which is what df1.loc[...] returns when it is given a boolean mask.
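A short sketch of the difference, using two-row frames invented for this example: selecting with a boolean mask gives a one-element Series, and .values[0] pulls out the plain string. The direct assignment in the last step is my own simplification and not the code from the post.

```python
import pandas as pd

# Hypothetical two-row versions of the frames from the question.
df = pd.DataFrame({'ID': ['bumgm001', 'lestj001']})
df1 = pd.DataFrame({'playerID': ['lestejo01', 'bumgama01'],
                    'retroID': ['lestj001', 'bumgm001']})

# A boolean-mask lookup returns a Series, even when only one row matches.
lookup = df1.loc[df1['retroID'] == 'bumgm001', 'playerID']
print(type(lookup))

# .values[0] extracts the single matching value as a plain string.
player_id = lookup.values[0]

# Assigning the scalar writes it into every selected row of df.
df.loc[df['ID'] == 'bumgm001', 'ID'] = player_id
print(df)
```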