This question already has answers here:
Pandas: How to fill null values with mean of a groupby?
(2 answers)
Pandas replace nan with mean value for a given grouping
(2 answers)
Closed 4 years ago.
I have a dataframe named nf with columns Type and Minute. For the null values in a specific type, I want to fill in the mean of that specific type only.
ID  Type  Minute
1   A     2
2   A     5
3   B     7
4   B     NaN
5   B     3
6   C     4
7   C     6
8   C     NaN
9   A     8
10  C     2
For the above dataframe I want to replace each NaN in Minute with the mean of that Type. For example, for B I want to replace it with 5, since the other two values sum to 10 across 2 values; similarly for C.
I have tried the mean function, but I don't know how to apply it for a specific group.
Thanks for the help.
You can use groupby with transform('mean'):
df['Minute'] = df['Minute'].fillna(df.groupby('Type')['Minute'].transform('mean'))
transform performs the indexing for you, so you don't have to split the operation into 2 steps:
s = df.groupby('Type')['Minute'].mean()
df['Minute'] = df['Minute'].fillna(df['Type'].map(s))
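A minimal runnable sketch of the transform approach, built from the sample data in the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ID': range(1, 11),
    'Type': ['A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'A', 'C'],
    'Minute': [2, 5, 7, np.nan, 3, 4, 6, np.nan, 8, 2],
})

# transform('mean') broadcasts each group's mean back onto that group's
# rows, so fillna can align it with the original index
df['Minute'] = df['Minute'].fillna(df.groupby('Type')['Minute'].transform('mean'))
```

The NaN for B becomes 5.0 (mean of 7 and 3) and the NaN for C becomes 4.0 (mean of 4, 6, and 2), leaving the existing values untouched.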
This question already has answers here:
Get the row(s) which have the max value in groups using groupby
(15 answers)
Closed 1 year ago.
I want to grab the maximum value of the score column for each distinct value in the first column.
example
f score
A 4
B 5
A 6
A 0
C 1
C 4
Y 2
output
f score
A 6
B 5
C 4
Y 2
Explanation: the max score where f == A is 6, and where f == C is 4.
You can use pandas.DataFrame.groupby and pandas.Series.max:
df.groupby('f', as_index=False, sort=False).max()
output:
f score
0 A 6
1 B 5
2 C 4
3 Y 2
NB: depending on the use case, sort=False can be passed to groupby to keep the original order of the groups (which makes no difference with this dataset).
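For reference, a self-contained version of the above, using the sample data from the question:

```python
import pandas as pd

df = pd.DataFrame({'f': ['A', 'B', 'A', 'A', 'C', 'C', 'Y'],
                   'score': [4, 5, 6, 0, 1, 4, 2]})

# as_index=False keeps 'f' as a regular column; sort=False preserves
# the order in which the groups first appear
out = df.groupby('f', as_index=False, sort=False).max()
```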
This question already has answers here:
Remap values in pandas column with a dict, preserve NaNs
(11 answers)
Closed 2 years ago.
I have a dictionary and a dataframe:
dic = {"A":1,"B":2,"C":3,"D":4}
key
0 A
1 C
2 D
3 B
4 A
5 C
6 C
How can I populate the dataframe, using the dictionary, to generate a new dataframe as follows:
key value
0 A 1
1 C 3
2 D 4
3 B 2
4 A 1
5 C 3
6 C 3
I thought about using apply with a lambda, but without success.
Thank you!
You can map the column with the dictionary using Series.map:
df['value'] = df['key'].map(dic)
Note that pd.factorize(df['key'])[0] + 1 only numbers keys by order of first appearance (A→1, C→2, D→3, B→4 here), which does not match the dictionary, so use map when the values must come from the dict.
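A runnable sketch of the map approach with the sample data, which reproduces the expected output exactly:

```python
import pandas as pd

dic = {"A": 1, "B": 2, "C": 3, "D": 4}
df = pd.DataFrame({'key': ['A', 'C', 'D', 'B', 'A', 'C', 'C']})

# Series.map looks each key up in the dictionary; any key missing
# from the dict would become NaN
df['value'] = df['key'].map(dic)
```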
This question already has answers here:
Adding a new pandas column with mapped value from a dictionary [duplicate]
(1 answer)
Pandas create new column with count from groupby
(5 answers)
Closed 2 years ago.
I'm looking to replace values in a Pandas column with their respective frequencies in the column.
I'm aware I can use value_counts to retrieve the frequency distribution for each value in the column. What I'm not sure of is how to replace every occurrence of a value with its respective frequency.
An example dataframe:
a b c
0 tiger 2 3
1 tiger 5 6
2 lion 8 9
Example output of df['a'].value_counts():
tiger 2
lion 1
Name: a, dtype: int64
Expected result when applied to column 'a':
a b c
0 2 2 3
1 2 5 6
2 1 8 9
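One way to do this (a sketch, not taken from the linked answers) is to map each value in the column to its count from value_counts:

```python
import pandas as pd

df = pd.DataFrame({'a': ['tiger', 'tiger', 'lion'],
                   'b': [2, 5, 8],
                   'c': [3, 6, 9]})

# value_counts returns a Series indexed by the values themselves,
# so map replaces each occurrence in 'a' with its frequency
df['a'] = df['a'].map(df['a'].value_counts())

# an equivalent alternative, as in the linked groupby answer:
# df['a'] = df.groupby('a')['a'].transform('count')
```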
This question already has answers here:
How to move pandas data from index to column after multiple groupby
(4 answers)
How to convert index of a pandas dataframe into a column
(9 answers)
Closed 4 years ago.
This is the original table:
A B C E
0 1 1 5 4
1 1 1 1 1
2 3 3 8 2
I wanted to apply some aggregate functions to this table which I did with:
df.sort_values('C').groupby(['A', 'B'], sort=False).agg({'C': 'sum', 'E': 'last'})
My new table looks like this:
A B C E
1 1 6 4
3 3 8 2
When I measure the number of columns of the original vs. the modified table with len(df.columns), the results differ: the original table returns 4 columns and the modified table returns 2.
My question: why did this happen, and how can I get the modified table to return 4 columns?
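The grouped columns A and B moved into the row index (a MultiIndex), which is why len(df.columns) drops to 2. A sketch of two ways to get them back as regular columns, using the sample data:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 3], 'B': [1, 1, 3],
                   'C': [5, 1, 8], 'E': [4, 1, 2]})

# Option 1: reset_index moves the group keys out of the index
out = (df.sort_values('C')
         .groupby(['A', 'B'], sort=False)
         .agg({'C': 'sum', 'E': 'last'})
         .reset_index())

# Option 2: pass as_index=False so the keys stay as columns
out2 = (df.sort_values('C')
          .groupby(['A', 'B'], sort=False, as_index=False)
          .agg({'C': 'sum', 'E': 'last'}))
```

Either way, len(out.columns) is 4 again.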
This question already has answers here:
Cumulative Sum Function on Pandas Data Frame
(2 answers)
Closed 4 years ago.
I'm fairly new to working with pandas, and I've been trying to create a new dataframe where the price value at index n is the sum of the values from indices 0 to n.
For example, if I have:
dataframe df:
price
0 1
1 2
2 3
3 2
4 4
5 2
The resulting data frame should look like this:
dataframe df:
price
0 1
1 3
2 6
3 8
4 12
5 14
I can think of a messy way of doing it using nested for loops, but I'm trying to shy away from costly methods and do things in a more "sophisticated" way. I can't really think of a better method, though I know there has to be one. What is the smart way of getting this "sum" dataframe? Thank you.
I think what you're looking for is the cumulative sum, for which you can use df.cumsum:
df.cumsum()
Which returns:
price
0 1
1 3
2 6
3 8
4 12
5 14
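Note that cumsum returns a new object rather than modifying the dataframe in place; a minimal sketch assigning the running total back:

```python
import pandas as pd

df = pd.DataFrame({'price': [1, 2, 3, 2, 4, 2]})

# cumsum does not modify df in place; assign the result back
df['price'] = df['price'].cumsum()
```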