This question already has answers here:
How can I pivot a dataframe?
(5 answers)
Closed 3 years ago.
What I have is the following:
test_df = pd.DataFrame({"index":[1,2,3,1],"columns":[5,6,7,5],"values":[9,9,9,9]})
index columns values
0 1 5 9
1 2 6 9
2 3 7 9
3 1 5 9
I would like the "index" column as my index, the "columns" column as the columns, and the values aggregated in their respective cells, like this:
5 6 7
1 18 nan nan
2 nan 9 nan
3 nan nan 9
Thank you!
EDIT: Sorry, I made a mistake. The values column is also categorical, and I need counts of its individual values. So instead of 18 it should be something like [9:2, 10:0, 11:0] (assuming the possible value categories are 9, 10, 11).
What about:
test_df.pivot_table(values='values', index='index', columns='columns', aggfunc='sum')
Also, this is covered in the manual: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.pivot_table.html. In particular, have a closer look at the aggfunc parameter.
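For the categorical-counting version in the EDIT, here's one possible sketch using pd.crosstab plus a column reindex to bring in the zero-count categories. The category list [9, 10, 11] is taken from the EDIT's assumption:

```python
import pandas as pd

test_df = pd.DataFrame({"index": [1, 2, 3, 1],
                        "columns": [5, 6, 7, 5],
                        "values": [9, 9, 9, 9]})

categories = [9, 10, 11]  # assumed full set of value categories

# Count how often each value occurs per (index, columns) cell.
counts = pd.crosstab(test_df["index"],
                     [test_df["columns"], test_df["values"]])

# Add zero-count columns for categories that never occur in the data.
full_cols = pd.MultiIndex.from_product(
    [counts.columns.levels[0], categories],
    names=counts.columns.names)
counts = counts.reindex(columns=full_cols, fill_value=0)
print(counts)
```

For example, counts.loc[1, (5, 9)] is 2 (the two duplicated rows), while counts.loc[1, (5, 10)] is 0.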
This question already has answers here:
How can I pivot a dataframe?
(5 answers)
Closed 6 months ago.
I have a Pandas DataFrame in Python such as this:
Group Pre/post Value
0 A Pre 3
1 A Pre 5
2 A Post 13
3 A Post 15
4 B Pre 7
5 B Pre 8
6 B Post 17
7 B Post 18
And I'd like to turn it into a different table such as:
Group Pre Post
0 A 3 13
1 A 5 15
2 B 7 17
3 B 8 18
I tried pivoting with df.pivot(index='Group', columns='Pre/post', values='Value'), but since I have repeated index/column pairs and order is important, it raised ValueError: Index contains duplicate entries, cannot reshape.
Here is one way to do it: use list as the aggfunc in pivot_table to collect the duplicate values for each index/column pair into a list, then use explode to split the lists back into multiple rows.
df.pivot_table(index='Group', columns='Pre/post', values='Value', aggfunc=list
).reset_index().explode(['Post','Pre'], ignore_index=True)
Pre/post Group Post Pre
0 A 13 3
1 A 15 5
2 B 17 7
3 B 18 8
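An alternative sketch: number the repeats within each (Group, Pre/post) pair with cumcount so the index/column combination becomes unique, then pivot normally. This assumes (as the question implies) that the k-th Pre row in a group pairs with the k-th Post row:

```python
import pandas as pd

df = pd.DataFrame({"Group": ["A", "A", "A", "A", "B", "B", "B", "B"],
                   "Pre/post": ["Pre", "Pre", "Post", "Post",
                                "Pre", "Pre", "Post", "Post"],
                   "Value": [3, 5, 13, 15, 7, 8, 17, 18]})

# Number the repeats within each (Group, Pre/post) pair so the
# index/column combination becomes unique, then pivot normally.
df["occurrence"] = df.groupby(["Group", "Pre/post"]).cumcount()
out = (df.pivot(index=["Group", "occurrence"],
                columns="Pre/post", values="Value")
         .reset_index()
         .drop(columns="occurrence"))
print(out)
```

Because cumcount follows the original row order, the pairing of Pre and Post values is preserved.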
This question already has answers here:
How to reset index in a pandas dataframe? [duplicate]
(3 answers)
Closed 1 year ago.
I have a large dataset which I filtered by location. The end result is something like this:
column 1 column 2
0 a 1
106 b 2
178 c 3
I guessed that the index values jump around because the rows with the same location aren't consecutive in the original dataset. To reset the indices, I did df.reindex(index=np.arange(len(df))), and it ran... but broke everything else. The output is this:
column 1 column 2
0 a 1
1 NaN NaN
2 NaN NaN
I don't have any idea why this is happening, and how I can fix this. Thanks for any help provided!
Use reset_index:
>>> df.reset_index(drop=True)
column 1 column 2
0 a 1
1 b 2
2 c 3
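As to why reindex produced NaN: reindex aligns on the existing labels rather than relabelling. Label 0 exists in your filtered frame, but labels 1 and 2 don't, so those rows come back empty. A small sketch of the difference:

```python
import pandas as pd
import numpy as np

# A filtered frame with gappy index labels, as in the question.
df = pd.DataFrame({"column 1": ["a", "b", "c"],
                   "column 2": [1, 2, 3]}, index=[0, 106, 178])

# reindex aligns on the EXISTING labels: label 0 exists,
# but labels 1 and 2 do not, so those rows become NaN.
broken = df.reindex(index=np.arange(len(df)))

# reset_index(drop=True) relabels the rows 0..n-1 instead of aligning.
fixed = df.reset_index(drop=True)
print(fixed)
```

Note that reset_index returns a new DataFrame, so assign the result (or pass drop=True together with inplace=True).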
This question already has answers here:
Pandas: How to fill null values with mean of a groupby?
(2 answers)
Pandas replace nan with mean value for a given grouping
(2 answers)
Closed 4 years ago.
I have a dataframe named nf with columns Type and Minute. For the null values of a specific type, I want to fill in the mean of that specific type only.
ID Type Minute
1 A 2
2 A 5
3 B 7
4 B NAN
5 B 3
6 C 4
7 C 6
8 C NAN
9 A 8
10 C 2
For the above dataframe I want to replace the NaN in Minute with the mean of that specific type. For example, for B I want to replace it with 5, as the other two values sum to 10 across 2 values; similarly for C.
I have tried the mean function, but I don't know how to apply it for a specific type only.
Thanks for the help.
You can use GroupBy + 'mean' with transform:
df['Minute'] = df['Minute'].fillna(df.groupby('Type')['Minute'].transform('mean'))
transform performs the alignment for you, so you don't have to split the operation into two steps:
s = df.groupby('Type')['Minute'].mean()
df['Minute'] = df['Minute'].fillna(df['Type'].map(s))
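A self-contained run of the one-step version on the question's data (NaN values spelled with np.nan):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({"ID": range(1, 11),
                   "Type": ["A", "A", "B", "B", "B",
                            "C", "C", "C", "A", "C"],
                   "Minute": [2, 5, 7, np.nan, 3, 4, 6, np.nan, 8, 2]})

# transform('mean') broadcasts each group's mean back onto that group's
# rows, so fillna can align it position by position with the original.
df["Minute"] = df["Minute"].fillna(
    df.groupby("Type")["Minute"].transform("mean"))
print(df)
```

The B gap becomes (7 + 3) / 2 = 5 and the C gap becomes (4 + 6 + 2) / 3 = 4, as the question expects.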
This question already has answers here:
Cumulative Sum Function on Pandas Data Frame
(2 answers)
Closed 4 years ago.
I'm fairly new to working with pandas, and I've been trying to create a new dataframe where the price value at index n is the sum of the values from indices 0 to n.
For example, if I have:
dataframe df:
price
0 1
1 2
2 3
3 2
4 4
5 2
The resulting data frame should look like this:
dataframe df:
price
0 1
1 3
2 6
3 8
4 12
5 14
I can think of a messy way of doing it using nested for loops, but I'm trying to shy away from costly methods and do things in a more "sophisticated" way. I can't really think of a better method, but I know there has to be a better way. What is the smart way of getting this "sum" dataframe? Thank you.
I think what you're looking for is the cumulative sum, for which you can use df.cumsum:
df.cumsum()
Which returns:
price
0 1
1 3
2 6
3 8
4 12
5 14
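One small caveat worth adding: cumsum returns a new object rather than modifying in place, so assign it back (or to a new column) if you want to keep the running total:

```python
import pandas as pd

df = pd.DataFrame({"price": [1, 2, 3, 2, 4, 2]})

# cumsum returns a new Series/DataFrame; assign it back to keep it.
df["price"] = df["price"].cumsum()
print(df)
```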
This question already has answers here:
Deleting DataFrame row in Pandas based on column value
(18 answers)
Closed 7 years ago.
I have a pandas dataframe with duplicate ids. Below is my dataframe
id nbr type count
7 21 High 4
7 21 Low 6
8 39 High 2
8 39 Low 3
9 13 High 5
9 13 Low 7
How do I delete only the rows having the type Low?
You can also just slice your df using iloc:
df.iloc[::2]
This takes every second row, which works here only because the High/Low rows strictly alternate.
You can try it this way:
df = df[df.type != "Low"]
Another possible solution is to use drop_duplicates, which keeps the first row for each nbr:
df = df.drop_duplicates('nbr')
print(df)
id nbr type count
0 7 21 High 4
2 8 39 High 2
4 9 13 High 5
You can also do:
df.drop_duplicates('nbr', inplace=True)
That way you don't have to reassign it.
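Of the approaches above, only the boolean filter keeps exactly the rows you asked for regardless of row order; iloc[::2] and drop_duplicates both rely on High always preceding Low. A sketch, including df.query as an equivalent string-based spelling:

```python
import pandas as pd

df = pd.DataFrame({"id": [7, 7, 8, 8, 9, 9],
                   "nbr": [21, 21, 39, 39, 13, 13],
                   "type": ["High", "Low", "High", "Low", "High", "Low"],
                   "count": [4, 6, 2, 3, 5, 7]})

# Filtering on the column value depends only on the data itself,
# not on how the rows happen to be ordered.
out = df[df["type"] != "Low"].reset_index(drop=True)

# query() expresses the same filter as a string.
same = df.query("type != 'Low'").reset_index(drop=True)
print(out)
```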