Pandas - "Smart" Way of Summing Up Previous Values at Index 'n' [duplicate] - python

This question already has answers here:
Cumulative Sum Function on Pandas Data Frame
(2 answers)
Closed 4 years ago.
I'm fairly new to working with pandas, and I've been trying to create a new dataframe where the price value at index n is the sum of the values from indices 0 to n.
For example, if I have:
dataframe df:
price
0 1
1 2
2 3
3 2
4 4
5 2
The resulting data frame should look like this:
dataframe df:
price
0 1
1 3
2 6
3 8
4 12
5 14
I can think of a messy way of doing it using nested for loops, but I'm trying to avoid costly methods and do things a more "sophisticated" way. I can't think of a better method, though, and I know there has to be a better way. What is the smart way of getting this "sum" dataframe? Thank you.

I think what you're looking for is the cumulative sum, for which you can use df.cumsum:
df.cumsum()
Which returns:
price
0 1
1 3
2 6
3 8
4 12
5 14
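Here is a minimal self-contained version using the data from the question. Storing the result as a new column named price_total is only an illustrative choice, not part of the original answer. cumsum is vectorized and makes one pass over the column, so there is no need for nested loops.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"price": [1, 2, 3, 2, 4, 2]})

# DataFrame.cumsum computes a running total down every column
running_df = df.cumsum()

# Series.cumsum does the same for a single column; storing it as a new
# (illustratively named) column keeps the original prices alongside
df["price_total"] = df["price"].cumsum()

print(running_df)
print(df)
```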

Related

Pandas add a second level index to the columns using a list

I have a dataframe with column headings (and for my real data multi-level row indexes). I want to add a second level index to the columns based on a list I have.
import pandas as pd
data = {"apple": [7,5,6,4,7,5,8,6],
"strawberry": [3,5,2,1,3,0,4,2],
"banana": [1,2,1,2,2,2,1,3],
"chocolate" : [5,8,4,2,1,6,4,5],
"cake":[4,4,5,1,3,0,0,3]
}
df = pd.DataFrame(data)
food_cat = ["fv","fv","fv","j","j"]
I want something that looks like this:
I tried to use How to add a second level column header/index to dataframe by matching to dictionary values?, but I couldn't get it working (and it's not ideal, since I'd need to figure out how to build the dictionary automatically, which I don't have).
I also tried adding the list as a row in the dataframe and converting that row to a second level index as in this answer using
df.loc[len(df)] = food_cat
df = pd.MultiIndex.from_arrays(df.columns, df.iloc[len(df)-1])
but got the error
Check if lengths of all arrays are equal or not,
TypeError: Input must be a list / sequence of array-likes.
I also tried using df = pd.MultiIndex.from_arrays(df.columns, np.array(food_cat)) with import numpy as np but got the same error.
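I feel like this should be a simple task (it is for rows), and there are a lot of similar questions, but I struggled to find one I could adapt to my data.
pd.MultiIndex.from_arrays takes a single argument: a list (or other list-like) holding one array-like per level. In the attempts above, df.columns was passed as that argument, so each column-name string was treated as a level, and a string is not an array-like, hence the TypeError (the second positional parameter is sortorder, not another level). Wrap both arrays in one list, outer level first: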
df.columns = pd.MultiIndex.from_arrays([food_cat, df.columns])
df
      fv                              j
   apple  strawberry  banana  chocolate  cake
0      7           3       1          5     4
1      5           5       2          8     4
2      6           2       1          4     5
3      4           1       2          2     1
4      7           3       2          1     3
5      5           0       2          6     0
6      8           4       1          4     0
7      6           2       3          5     3
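A self-contained version of the answer using the question's data. The level names "category" and "food" are illustrative additions, not part of the original answer; the last step shows that selecting by a top-level label returns only that group's columns.

```python
import pandas as pd

data = {
    "apple": [7, 5, 6, 4, 7, 5, 8, 6],
    "strawberry": [3, 5, 2, 1, 3, 0, 4, 2],
    "banana": [1, 2, 1, 2, 2, 2, 1, 3],
    "chocolate": [5, 8, 4, 2, 1, 6, 4, 5],
    "cake": [4, 4, 5, 1, 3, 0, 0, 3],
}
df = pd.DataFrame(data)
food_cat = ["fv", "fv", "fv", "j", "j"]

# One list containing every level, outermost first; names are optional
# (and hypothetical here) labels for the two levels
df.columns = pd.MultiIndex.from_arrays(
    [food_cat, df.columns], names=["category", "food"]
)

# Selecting a top-level label returns a frame with just that group's columns
fruit_veg = df["fv"]
print(fruit_veg.head())
```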

Right way to reindex a dataframe? [duplicate]

This question already has answers here:
How to reset index in a pandas dataframe? [duplicate]
(3 answers)
Closed 1 year ago.
I have a large dataset which I filtered by location. The end result is something like this:
column 1 column 2
0 a 1
106 b 2
178 c 3
I guess the index values skip around because the rows with the same location aren't consecutive. To reset the indices, I did df.reindex(index = np.arange(len(df))), and it worked... but broke everything else. The output is this:
column 1 column 2
0 a 1
1 NaN NaN
2 NaN NaN
I don't have any idea why this is happening, and how I can fix this. Thanks for any help provided!
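Use reset_index instead. reindex looks rows up by their existing labels, so labels 1 and 2 (which no longer exist after filtering) come back as NaN, whereas reset_index discards the old labels and renumbers the rows: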
>>> df.reset_index(drop=True)
column 1 column 2
0 a 1
1 b 2
2 c 3
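A small sketch that reproduces the question's situation side by side, assuming the filtered frame kept the labels 0, 106 and 178 shown above. drop=True matters: without it, reset_index inserts the old labels as a new "index" column.

```python
import pandas as pd

# Filtered rows keep their original labels
df = pd.DataFrame(
    {"column 1": ["a", "b", "c"], "column 2": [1, 2, 3]},
    index=[0, 106, 178],
)

# Label lookup: only label 0 exists, so rows 1 and 2 are all NaN
reindexed = df.reindex(index=range(len(df)))

# Renumber 0..n-1 in the current order; drop=True discards the old labels
renumbered = df.reset_index(drop=True)

print(reindexed)
print(renumbered)
```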

pandas pivot table and aggregate [duplicate]

This question already has answers here:
How can I pivot a dataframe?
(5 answers)
Closed 3 years ago.
So what I have is the following:
test_df = pd.DataFrame({"index":[1,2,3,1],"columns":[5,6,7,5],"values":[9,9,9,9]})
index columns values
0 1 5 9
1 2 6 9
2 3 7 9
3 1 5 9
I would like the following: the index column as my index, the columns column as the columns, and the values aggregated in their respective cells, like this:
5 6 7
1 18 nan nan
2 nan 9 nan
3 nan nan 9
Thank you!
EDIT: Sorry, I made a mistake. The values column is also categorical, and I need a count for each individual value, so instead of 18 it should be something like [9:2,10:0,11:0] (assuming the possible value categories are 9, 10 and 11).
What about this?
test_df.pivot_table(values='values', index='index', columns='columns', aggfunc='sum')
This is covered in the documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.pivot_table.html. In particular, read up on the aggfunc parameter.
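The answer above covers the original sum but not the EDIT. The sketch below reproduces the sum and then shows one possible approach to the EDIT that is not from the original answer: counting each value per (index, columns) pair with groupby plus value_counts. The category list [9, 10, 11] is the assumption stated in the question.

```python
import pandas as pd

test_df = pd.DataFrame(
    {"index": [1, 2, 3, 1], "columns": [5, 6, 7, 5], "values": [9, 9, 9, 9]}
)

# Original request: sum per (index, columns) pair; missing pairs are NaN
summed = test_df.pivot_table(
    values="values", index="index", columns="columns", aggfunc="sum"
)

# EDIT: count each value category per (index, columns) pair
categories = [9, 10, 11]  # assumed possible categories, from the question
counts = (
    test_df.groupby(["index", "columns"])["values"]
    .value_counts()                              # indexed by (index, columns, values)
    .unstack(fill_value=0)                       # value categories become columns
    .reindex(columns=categories, fill_value=0)   # add categories never observed
)

print(summed)
print(counts)
```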

Convert multi-dim list in one column in python pandas

I'd like some help converting a multi-dimensional list into a single column of a pandas DataFrame.
I found help here on converting a multi-dimensional list into a frame with multiple columns, but I need all the data in one column.
Suppose I have the following list of lists:
x=[[1,2,3],[4,5,6]]
If I create a frame, I get:
frame = pd.DataFrame(x)
   0  1  2
0  1  2  3
1  4  5  6
But my desired outcome is:
0
1
2
3
4
5
6
with the zero as column header.
Of course I can get the result with a for loop, but I expect that to be slow. Is there a more pythonic/pandas way to do it?
Thanks for helping me.
You can use np.concatenate:
import numpy as np
import pandas as pd
x = [[1, 2, 3], [4, 5, 6]]
frame = pd.DataFrame(np.concatenate(x))
print(frame)
Output:
0
0 1
1 2
2 3
3 4
4 5
5 6
First flatten the nested lists, then pass the result to the DataFrame constructor:
df = pd.DataFrame([z for y in x for z in y])
Or:
from itertools import chain
df = pd.DataFrame(list(chain.from_iterable(x)))
print (df)
0
0 1
1 2
2 3
3 4
4 5
5 6
If you use NumPy, you can use the ravel() method:
pd.DataFrame(np.array(x).ravel())
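A self-contained comparison of the three answers. They produce the same single column for this input; the difference shows up with ragged input (inner lists of different lengths, a hypothetical case not in the question), which chain.from_iterable and np.concatenate flatten but np.array(...).ravel() does not, because it needs a rectangular array.

```python
from itertools import chain

import numpy as np
import pandas as pd

x = [[1, 2, 3], [4, 5, 6]]

# Three ways to flatten x into a single column labelled 0
by_concatenate = pd.DataFrame(np.concatenate(x))
by_chain = pd.DataFrame(list(chain.from_iterable(x)))
by_ravel = pd.DataFrame(np.array(x).ravel())  # requires equal-length inner lists

# Ragged input: chain (or np.concatenate) still works
ragged = [[1, 2], [3, 4, 5]]
ragged_frame = pd.DataFrame(list(chain.from_iterable(ragged)))

print(by_chain)
print(ragged_frame)
```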

Replace null by mean of specific variable [duplicate]

This question already has answers here:
Pandas: How to fill null values with mean of a groupby?
(2 answers)
Pandas replace nan with mean value for a given grouping
(2 answers)
Closed 4 years ago.
I have a dataframe named nf with the columns Type and Minute. Wherever Minute is null, I want to replace it with the mean of Minute for that row's Type only.
ID Type Minute
1 A 2
2 A 5
3 B 7
4 B NAN
5 B 3
6 C 4
7 C 6
8 C NAN
9 A 8
10 C 2
For the dataframe above, I want to replace each NaN in Minute with the mean for that specific Type. For example, for B I want to fill in 5, since its other two values sum to 10 over 2 values, and similarly for C.
I have tried the mean function, but I don't know how to apply it to a specific group.
Thanks for the help.
You can use GroupBy + 'mean' with transform:
df['Minute'] = df['Minute'].fillna(df.groupby('Type')['Minute'].transform('mean'))
transform broadcasts each group's mean back onto the original rows (aligned on df's index), so you don't have to split the operation into these two steps:
s = df.groupby('Type')['Minute'].mean()
df['Minute'] = df['Minute'].fillna(df['Type'].map(s))
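A runnable version of the transform answer using the table from the question (named df here rather than nf). The group means skip NaN, so B fills with (7 + 3) / 2 = 5 and C with (4 + 6 + 2) / 3 = 4.

```python
import numpy as np
import pandas as pd

# Data from the question
df = pd.DataFrame({
    "ID": range(1, 11),
    "Type": ["A", "A", "B", "B", "B", "C", "C", "C", "A", "C"],
    "Minute": [2, 5, 7, np.nan, 3, 4, 6, np.nan, 8, 2],
})

# One value per row: the mean of that row's Type group (NaN ignored)
group_mean = df.groupby("Type")["Minute"].transform("mean")

# Only the missing entries are replaced; existing values are untouched
df["Minute"] = df["Minute"].fillna(group_mean)

print(df)
```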
