This question already has answers here:
Adding a new pandas column with mapped value from a dictionary [duplicate]
(1 answer)
Pandas create new column with count from groupby
(5 answers)
Closed 2 years ago.
I'm looking to replace values in a Pandas column with their respective frequencies in the column.
I'm aware I can use value_counts to retrieve the frequency distribution for each value in the column. What I'm not sure about is how to replace every occurrence of a value with its respective frequency.
An example dataframe:
a b c
0 tiger 2 3
1 tiger 5 6
2 lion 8 9
Example output of df['a'].value_counts():
tiger 2
lion 1
Name: a, dtype: int64
Expected result when applied to column 'a':
a b c
0 2 2 3
1 2 5 6
2 1 8 9
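One way to do this (a sketch, not an answer from the original thread): map each value in the column to its count from value_counts, or equivalently use groupby with transform('count'):

```python
import pandas as pd

df = pd.DataFrame({'a': ['tiger', 'tiger', 'lion'],
                   'b': [2, 5, 8],
                   'c': [3, 6, 9]})

# Replace each value in 'a' with its frequency in the column
df['a'] = df['a'].map(df['a'].value_counts())

# Equivalent alternative:
# df['a'] = df.groupby('a')['a'].transform('count')
print(df)
```

Both versions leave columns b and c untouched and turn 'tiger' into 2 and 'lion' into 1.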
This question already has answers here:
Get the row(s) which have the max value in groups using groupby
(15 answers)
Closed 1 year ago.
I want to grab the maximum value of the score column for each distinct value in the first column.
Example:
f score
A 4
B 5
A 6
A 0
C 1
C 4
Y 2
Output:
f score
A 6
B 5
C 4
Y 2
Explanation: the max score where f[i] == A is 6, and where f[i] == C is 4.
You can use pandas.DataFrame.groupby and pandas.Series.max:
df.groupby('f', as_index=False, sort=False).max()
output:
f score
0 A 6
1 B 5
2 C 4
3 Y 2
NB: depending on the use case, sort=False can be passed to groupby to keep the original order of the groups (it makes no difference with this dataset).
This question already has answers here:
Add a sequential counter column on groups to a pandas dataframe
(4 answers)
Closed 1 year ago.
When using groupby(), how can I create a DataFrame with a new column containing an increasing index within each group? For example, if I have
df = pd.DataFrame({'a': [1, 1, 1, 2, 2, 2]})
df
a
0 1
1 1
2 1
3 2
4 2
5 2
How can I get a DataFrame where the index resets for each new group in the column? The association between a and the index is not important...I just need each row within a group of a to receive a unique index starting from 1.
a idx
0 1 1
1 1 2
2 1 3
3 2 1
4 2 2
5 2 3
The answer from the comments:
df['idx'] = df.groupby('a').cumcount() + 1
This question already has answers here:
How are iloc and loc different?
(6 answers)
Selection with .loc in python
(5 answers)
Closed 4 years ago.
If I have a pandas data frame like this:
A B C D E
1 3 4 2 5 1
2 5 4 2 4 4
3 5 1 8 1 3
4 1 1 9 9 4
5 3 6 4 1 1
and want to find the value at row label 3 and column D, how do I go about doing it?
In this case, with a row label of 3 and column D, how would I get a return of 1?
Or with a row label of 2 and column B, how would I get a return of 4?
You can use DataFrame.loc: df.loc[row, 'col_name'], e.g., df.loc[2, 'B'] returns 4.
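A minimal runnable sketch of that approach, rebuilding the example frame from the question:

```python
import pandas as pd

df = pd.DataFrame({'A': [3, 5, 5, 1, 3],
                   'B': [4, 4, 1, 1, 6],
                   'C': [2, 2, 8, 9, 4],
                   'D': [5, 4, 1, 9, 1],
                   'E': [1, 4, 3, 4, 1]},
                  index=[1, 2, 3, 4, 5])

# .loc selects by label: row label first, then column label
print(df.loc[3, 'D'])  # 1
print(df.loc[2, 'B'])  # 4
```

Note that .loc uses the index labels (here 1..5), not positional offsets; for positions use .iloc instead.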
This question already has answers here:
How to move pandas data from index to column after multiple groupby
(4 answers)
How to convert index of a pandas dataframe into a column
(9 answers)
Closed 4 years ago.
This is the original table:
A B C E
0 1 1 5 4
1 1 1 1 1
2 3 3 8 2
I wanted to apply some aggregate functions to this table which I did with:
df.sort_values('C').groupby(['A', 'B'], sort=False).agg({'C': 'sum', 'E': 'last'})
My new table looks like this:
A B C E
1 1 6 4
3 3 8 2
When I measure the column length of the original vs. the modified table with len(df.columns), the results differ, though: the original table returns 4 columns while the modified table returns 2.
My question: Why did this happen and how can I get to return 4 columns with the modified table?
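One likely explanation (a sketch, not an answer from the original thread): the groupby keys A and B moved into the (Multi)index of the result, so they no longer count as columns. Calling reset_index() afterwards, or passing as_index=False to groupby, moves them back:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 3],
                   'B': [1, 1, 3],
                   'C': [5, 1, 8],
                   'E': [4, 1, 2]})

out = (df.sort_values('C')
         .groupby(['A', 'B'], sort=False)
         .agg({'C': 'sum', 'E': 'last'})
         .reset_index())  # move the group keys back into columns

print(len(out.columns))  # 4
```

Without reset_index(), A and B live in the index and len(out.columns) is 2.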
This question already has answers here:
Pandas: How to fill null values with mean of a groupby?
(2 answers)
Pandas replace nan with mean value for a given grouping
(2 answers)
Closed 4 years ago.
I have a dataframe named nf with columns Type and Minute. For the null values of a specific type, I want to replace them with the mean of that specific type only.
ID Type Minute
1 A 2
2 A 5
3 B 7
4 B NaN
5 B 3
6 C 4
7 C 6
8 C NaN
9 A 8
10 C 2
For the above dataframe I want to replace each NaN in Minute with the mean of that specific Type. For example, for B I want to replace it with 5, as the other two values sum to 10 over 2 values, giving 5; similarly for C.
I have tried the mean function, but I don't know how to apply it per group.
Thanks for the help.
You can use GroupBy with transform('mean'):
df['Minute'] = df['Minute'].fillna(df.groupby('Type')['Minute'].transform('mean'))
transform aligns the result back to the original index for you, so you don't have to split the operation into two steps:
s = df.groupby('Type')['Minute'].mean()
df['Minute'] = df['Minute'].fillna(df['Type'].map(s))
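Applied to the example data from the question, both versions give the same result; a quick runnable check of the transform variant:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'ID': range(1, 11),
                   'Type': list('AABBBCCCAC'),
                   'Minute': [2, 5, 7, np.nan, 3, 4, 6, np.nan, 8, 2]})

# Fill each NaN with the mean Minute of its Type
df['Minute'] = df['Minute'].fillna(df.groupby('Type')['Minute'].transform('mean'))

# ID 4 (Type B) gets mean(7, 3) = 5.0; ID 8 (Type C) gets mean(4, 6, 2) = 4.0
print(df.loc[df['ID'].isin([4, 8]), 'Minute'].tolist())  # [5.0, 4.0]
```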