I have two data frames:
df1 = pd.read_excel("test1.xlsx")
df2 = pd.read_excel("test2.xlsx")
I am trying to assign values from df1 to df2 where a certain condition is met (where Col1 is equal to Col1, assign the values of ColY to ColX).
df1.loc[df1['Col1'] == df2['Col1'],'ColX'] = df2['ColY']
This results in an error because df2['ColY'] is the whole column. How do I assign values only for the rows that match?
You can use numpy.where:
import numpy as np
df1['ColX'] = np.where(df1['Col1'].eq(df2['Col1']), df2['ColY'], df1['ColX'])
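For illustration, here is a minimal sketch with made-up values for Col1, ColX and ColY (the comparison is row-by-row, so both frames need the same length and index):
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Col1': [1, 2, 3], 'ColX': [10, 20, 30]})
df2 = pd.DataFrame({'Col1': [1, 9, 3], 'ColY': [100, 200, 300]})

# Where Col1 matches row-by-row, take ColY from df2; otherwise keep df1's ColX.
df1['ColX'] = np.where(df1['Col1'].eq(df2['Col1']), df2['ColY'], df1['ColX'])
print(df1)
#    Col1  ColX
# 0     1   100
# 1     2    20
# 2     3   300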
Since you wanted to assign from df1 to df2, your code should have been:
df2.loc[df1['Col1'] == df2['Col1'], 'ColX'] = df1['ColY']
The code you wrote assigns values from df2 to df1, not from df1 to df2. It would also help if you could clarify which dataframe ColX and ColY belong to (or do both dataframes have them?). Your code is pretty much right; just swap df1 and df2 as shown above.
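As a quick check, here is that corrected line on small made-up frames (the Col1, ColX and ColY values are placeholders); the boolean mask assigns df1's ColY into df2's ColX only on the matching rows:
import pandas as pd

df1 = pd.DataFrame({'Col1': [1, 2, 3], 'ColY': [100, 200, 300]})
df2 = pd.DataFrame({'Col1': [1, 9, 3], 'ColX': [10, 20, 30]})

# Only rows 0 and 2 match, so only those ColX values are overwritten.
df2.loc[df1['Col1'] == df2['Col1'], 'ColX'] = df1['ColY']
print(df2)
#    Col1  ColX
# 0     1   100
# 1     9    20
# 2     3   300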
I want to find the top 1% of values in my dataframe and append them all to a list. Then I can check the first value in the list and use it as a filter on the dataframe. Any idea how to do that, or a simpler way to do it?
You can find the dataframe I use here:
https://raw.githubusercontent.com/srptwice/forstack/main/resultat_projet.csv
What I tried is to inspect my dataframe with a heatmap (from Seaborn) and use a filter like this:
df4 = df2[df2 > 50700]
You can use df.<column name>.quantile(<percentile>) to get the top percentage of a dataframe column. For example, the code below builds df2 from the rows of df where the bfly column is in the top 10% (above the 90th percentile):
import pandas as pd
df = pd.read_csv('./resultat_projet.csv')
df.columns = df.columns.str.replace(' ', '') # remove blank spaces in columns
df2 = df[df.bfly > df.bfly.quantile(0.9)]
print(df2)
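For the top 1% specifically, the same pattern with quantile(0.99) works, and .tolist() gives you the list described in the question (a sketch reusing the bfly column from above; the list keeps the original row order):
threshold = df.bfly.quantile(0.99)                        # 99th-percentile cut-off
top_values = df.loc[df.bfly > threshold, 'bfly'].tolist()

# The first element can then be reused as a filter, as in the question.
df_filtered = df[df.bfly > top_values[0]]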
The df is as shown below...
The code below can only rank one column in place. I would like to rank all columns and put the rank values in a separate df.
df['rank_2020-06-23'] = df['2020-06-23'].rank(pct=True)
print(df)
Something like that should work:
df_ranks=pd.concat([pd.DataFrame(df[col].rank(pct=True)) for col in df.columns], axis=1)
It simply applies your rank call in a list comprehension, storing each result in a dataframe to get a list of dataframes:
list_df_ranks=[pd.DataFrame(df[col].rank(pct=True)) for col in df.columns]
Then merging into one:
df_ranks=pd.concat(list_df_ranks, axis=1)
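As a quick check, here is the same one-liner on a small made-up frame with date-like column names as in the question:
import pandas as pd

df = pd.DataFrame({'2020-06-23': [3, 1, 2], '2020-06-24': [10, 30, 20]})

# Percentile-rank every column, then stitch the results back together side by side.
df_ranks = pd.concat([pd.DataFrame(df[col].rank(pct=True)) for col in df.columns], axis=1)
print(df_ranks)
#    2020-06-23  2020-06-24
# 0    1.000000    0.333333
# 1    0.333333    1.000000
# 2    0.666667    0.666667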
I have two data frames: df1
and df2
Now I want to replace one of the rows of df1 (highlighted in red) with all the values of df2. I tried the following code, but it didn't give the desired result. Here is the code:
df1[df1['Category_2']=='Specified Functionality'].update(df2)
I also tried:
df1[df1['Category_2']=='Specified Functionality'] = df2
Could anyone point out where I am making the mistake?
You can insert the rows like this:
row = 13
df2 = df2.rename(columns={'Functionality': 'Category_2'})
df = pd.concat([df1[0:row], df2, df1[row+1:]]).reset_index(drop=True)
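For context, here is a minimal sketch of that splice on made-up frames (row 1 instead of row 13, to keep it short): the row at the given position in df1 is dropped and every row of df2 is spliced in at that spot.
import pandas as pd

df1 = pd.DataFrame({'Category_2': ['A', 'Specified Functionality', 'C']})
df2 = pd.DataFrame({'Functionality': ['X', 'Y']})

row = 1  # position of the row to replace
df2 = df2.rename(columns={'Functionality': 'Category_2'})
df = pd.concat([df1[0:row], df2, df1[row+1:]]).reset_index(drop=True)
print(df)
#   Category_2
# 0          A
# 1          X
# 2          Y
# 3          C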
I have two DataFrames:
df = pd.DataFrame({'ID': ['bumgm001', 'lestj001',
                          'tanam001', 'hellj001', 'chacj001']})
df1 = pd.DataFrame({'playerID': ['bumgama01', 'lestejo01',
                                 'tanakama01', 'hellije01', 'chacijh01'],
                    'retroID': ['bumgm001', 'lestj001', 'tanam001', 'hellj001', 'chacj001']})
OR
df            df1
ID            playerID      retroID
'bumgm001'    'bumgama01'   'bumgm001'
'lestj001'    'lestejo01'   'lestj001'
'tanam001'    'tanakama01'  'tanam001'
'hellj001'    'hellije01'   'hellj001'
'chacj001'    'chacijh01'   'chacj001'
Now, my actual DataFrames are a little more complicated than this, but I simplified it here so it's clearer what I'm trying to do.
I would like to take all of the ID's in df and replace them with the corresponding playerID's in df1.
My final df should look like this:
df
ID
'bumgama01'
'lestejo01'
'tanakama01'
'hellije01'
'chacijh01'
I have tried to do it using the following method:
for row in df.itertuples():  # row[1] == the retroID column
    playerID = df1.loc[df1['retroID'] == row[1], 'playerID']
    df.loc[df['ID'] == row[1], 'ID'].replace(to_replace=
        df.loc[df['ID'] == row[1], 'ID'], value=playerID)
The code seems to run just fine, but my IDs in df have been changed to NaN rather than the proper playerIDs.
This strikes me as a datatype problem, but I'm not familiar enough with Pandas to diagnose any further.
EDIT:
Unfortunately, I made my example too simplistic, so I have edited it to better represent the issue I'm having. I'm trying to look up an item from one DataFrame in a second DataFrame, and then replace the item in the first DataFrame with an item from the corresponding row of the second DataFrame. The columns DO NOT have the same name.
You can use the second dataframe as a dictionary for replacement:
to_replace = df1.set_index('retroID')['playerID'].to_dict()
df['ID'] = df['ID'].replace(to_replace)
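For reference, with the example frames from the question, the mapping dict and the resulting df look like this:
# to_replace maps each retroID to its playerID:
# {'bumgm001': 'bumgama01', 'lestj001': 'lestejo01', 'tanam001': 'tanakama01',
#  'hellj001': 'hellije01', 'chacj001': 'chacijh01'}

print(df)
#            ID
# 0   bumgama01
# 1   lestejo01
# 2  tanakama01
# 3   hellije01
# 4   chacijh01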
According to your example, this is what you want:
df['ID'] = df1['playerID']
If the data is not in order (row 1 of df is not the same as row 1 of df1), then use:
df['ID']=df1.set_index('retroID').reindex(df['ID'])['playerID'].values
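For what it's worth, here is a commented breakdown of how that one-liner lines the values up (same frames as above):
lookup = df1.set_index('retroID')        # index df1 by the values that appear in df['ID']
aligned = lookup.reindex(df['ID'])       # reorder the lookup rows to match df['ID']
df['ID'] = aligned['playerID'].values    # take the aligned playerIDs, ignoring the index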
Credit to Wen for the second approach.
Output
ID
0 bumgama01
1 lestejo01
2 tanakama01
3 hellije01
4 chacijh01
Let me know if it's correct.
OK, I've figured out a solution. As it turns out, it was a type problem. I updated my code from:
for row in df.itertuples():  # row[1] == the retroID column
    playerID = df1.loc[df1['retroID'] == row[1], 'playerID']
    df.loc[df['ID'] == row[1], 'ID'].replace(to_replace=
        df.loc[df['ID'] == row[1], 'ID'], value=playerID)
to:
for row in df.itertuples():  # row[1] == the retroID column
    playerID = df1.loc[df1['retroID'] == row[1], 'playerID'].values[0]
    df.loc[df['ID'] == row[1], 'ID'].replace(to_replace=
        df.loc[df['ID'] == row[1], 'ID'], value=playerID)
This works because "playerID" is now a scalar (thanks to .values[0]) rather than a one-element Series, which was not a compatible replacement value for the DataFrame.
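To make the type difference concrete with the question's frames: the .loc lookup returns a one-element Series, and .values[0] unwraps it to a plain string:
sel = df1.loc[df1['retroID'] == 'bumgm001', 'playerID']
print(type(sel))       # <class 'pandas.core.series.Series'> (one element)
print(sel.values[0])   # bumgama01 -> a plain string that can be used as a scalar value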