How to slice a data frame by row number and aggregate in pandas

How can I use groupby on sequential rows? For example,
how can I calculate the sum of every seven rows, such as the sum of rows 1-7 and the sum of rows 8-14?
values
1 4
2 2
3 1
4 5
6 1
7 8
...

Create a helper array with np.arange the length of the DataFrame, integer-divide it by 7, and pass the result to groupby to aggregate with sum:
df = df.groupby(np.arange(len(df)) // 7).sum()
print (df)
values
0 21
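
As a minimal sketch of that approach, assuming a hypothetical 14-row frame holding the values 1 to 14 (not the asker's data), the helper array labels rows 0-6 as group 0 and rows 7-13 as group 1:

```python
import numpy as np
import pandas as pd

# Hypothetical data: 14 rows holding 1..14
df = pd.DataFrame({"values": range(1, 15)})

# Position-based group labels: seven 0s followed by seven 1s
chunk_ids = np.arange(len(df)) // 7

# One summed row per block of seven consecutive rows
weekly = df.groupby(chunk_ids).sum()
print(weekly)
```

Because the labels come from row positions rather than from the index, this should work for any index, and a trailing block with fewer than seven rows is summed as its own group.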

Related

Python sum aggregate numbers in a dataframe column into a new column

How can I add a new column that holds the running (cumulative) sum of the numbers column?
Index  numbers  new column
0      1        1
1      2        3
2      3        6
3      4        10
4      5        15
The solution for getting the new column described in the table is cumsum on the numbers column:
df['new column'] = df['numbers'].cumsum()

Incrementing row values without looping [duplicate]

I have a pandas dataframe defined as:
A B SUM_C
1 1 10
1 2 20
I would like to do a cumulative sum of SUM_C and add it as a new column to the same dataframe. In other words, my end goal is to have a dataframe that looks like below:
A B SUM_C CUMSUM_C
1 1 10 10
1 2 20 30
The question "Using cumsum in pandas on group()" shows how to generate a new dataframe where the column SUM_C is replaced with its cumulative sum. However, I want to add the cumulative sum as a new column to the existing dataframe.
Thank you
Just apply cumsum on the pandas.Series df['SUM_C'] and assign it to a new column:
df['CUMSUM_C'] = df['SUM_C'].cumsum()
Result:
df
Out[34]:
A B SUM_C CUMSUM_C
0 1 1 10 10
1 1 2 20 30
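
If the running total should restart for each group, the same assignment pattern works with a grouped cumsum. A sketch, assuming a hypothetical frame with two distinct values in column A (the CUMSUM_BY_A name is made up for illustration):

```python
import pandas as pd

# Hypothetical data with two groups in column A
df = pd.DataFrame({"A": [1, 1, 2, 2],
                   "B": [1, 2, 1, 2],
                   "SUM_C": [10, 20, 5, 7]})

# Running total over the whole column
df["CUMSUM_C"] = df["SUM_C"].cumsum()

# Running total that restarts for each value of A
df["CUMSUM_BY_A"] = df.groupby("A")["SUM_C"].cumsum()
print(df)
```

Both results keep the original index, so they align row by row when assigned back to the frame.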

Pandas groupby cumulative sum ignore current row

I know there are some questions about this topic (like Pandas: Cumulative sum of one column based on value of another); however, none of them fulfill my requirements.
Let's say I have a dataframe with Month and Cost columns.
I want to compute the cumulative sum of Cost grouped by Month, excluding the current row's value, in order to get the Desired column. By using groupby and cumsum I obtain the column CumSum.
The code to generate the dataframe is:
df = pd.DataFrame({'Month': [1,1,1,2,2,1,3],
'Cost': [5,8,10,1,3,4,1]})
IIUC you can use groupby.cumsum and then just subtract Cost:
df['cumsum_'] = df.groupby('Month').Cost.cumsum().sub(df.Cost)
print(df)
Month Cost cumsum_
0 1 5 0
1 1 8 5
2 1 10 13
3 2 1 0
4 2 3 1
5 1 4 23
6 3 1 0
Alternatively, shift Cost within each month and take the cumulative sum of the shifted values:
df['agg']=df.groupby('Month')['Cost'].shift().fillna(0)
df['Cumsum']=df.groupby('Month')['agg'].cumsum()
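
To check a candidate expression against the goal, one option is to compare it with a shift-then-accumulate version on the frame from the question. A sketch, where the column names prev_sum_a and prev_sum_b are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({"Month": [1, 1, 1, 2, 2, 1, 3],
                   "Cost": [5, 8, 10, 1, 3, 4, 1]})

# Variant A: grouped running total minus the current row
df["prev_sum_a"] = df.groupby("Month")["Cost"].cumsum() - df["Cost"]

# Variant B: shift within each month, then accumulate within each month
shifted = df.groupby("Month")["Cost"].shift(fill_value=0)
df["prev_sum_b"] = shifted.groupby(df["Month"]).cumsum()

print(df)
```

Month 1 has four rows here, which is what exposes expressions that only look at the single previous row instead of all earlier rows in the group.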

Sum of previous rows values

How can I sum the previous rows' values and the current row's value into a new column?
My current output:
index,value
0,1
1,2
2,3
3,4
4,5
My goal output is:
index,value,sum
0,1,1
1,2,3
2,3,6
3,4,10
4,5,15
I know that this is easy to do in Excel, but I'm looking for a solution with pandas.
My code:
import random, pandas
recordlist=[1,2,3,4,5]
df=pandas.DataFrame(recordlist, columns=["Values"])
Use cumsum:
df.assign(sum=df.value.cumsum())
value sum
index
0 1 1
1 2 3
2 3 6
3 4 10
4 5 15
Or
df['sum'] = df.value.cumsum()
df
value sum
index
0 1 1
1 2 3
2 3 6
3 4 10
4 5 15
If df is a Series:
pd.DataFrame(dict(value=df, sum=df.cumsum()))
As already shown in the previous answers, df.assign is a great function.
If you want a little more flexibility, you can pass a lambda function, like so (assuming index is a regular column):
df.assign(sum=lambda l: l['index'] + l['value'])
For a plain row-wise sum of the two columns, this can be shortened to
df.assign(sum=df['index'] + df['value'])
Note that sum (before the = sign) is not a function or variable, but the name of the new column. So this could also be df.assign(mylongersumlabel=..)
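
The lambda form matters mostly in method chains, where the intermediate frame has no name yet. A sketch, assuming the data from the question with value as a regular column; the filter step is made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({"value": [1, 2, 3, 4, 5]})

result = (
    df[df["value"] > 1]                          # hypothetical filter step
    .assign(sum=lambda d: d["value"].cumsum())   # d is the filtered frame
)
print(result)
```

Because sum is also a DataFrame method, read the new column back with result['sum'] rather than result.sum.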

Subtracting min value from previous value in pandas DataFrame

I want to subtract the minimum value of a column in a DataFrame from the value just above it. In R I would do this:
df <- data.frame(a=1:5, b=c(5,6,7,4,9))
df
a b
1 1 5
2 2 6
3 3 7
4 4 4
5 5 9
df$b[which.min(df$b)-1] - df$b[which.min(df$b)]
[1] 3
How can I do the same thing in pandas? More generally, how can I extract the row number in a pandas DataFrame where a certain condition is met?
You can use argmin to find the index of the minimum value (the first one if there are ties); then you can do the subtraction based on the location:
index = df.b.argmin()
df.b[index-1] - df.b[index]
# 3
In case the index is not consecutive numbers:
i_index = df.b.values.argmin()
df.b.iat[i_index-1] - df.b.iat[i_index]
# 3
Or less efficiently:
-df.b.diff()[df.b.argmin()]
# 3.0
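
For the more general part of the question, a position-based sketch may help. It assumes the same data as the R example but with a hypothetical non-consecutive index, and it guards against the minimum sitting in the first row, where position minus one would wrap around to the last row:

```python
import numpy as np
import pandas as pd

# Same data as the R example, with a hypothetical non-consecutive index
df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [5, 6, 7, 4, 9]},
                  index=[10, 20, 30, 40, 50])

pos = int(np.argmin(df["b"].to_numpy()))   # integer position of the minimum
label = df["b"].idxmin()                   # index label of the minimum

if pos > 0:
    # Value just above the minimum, minus the minimum itself
    diff = df["b"].iloc[pos - 1] - df["b"].iloc[pos]
else:
    diff = None   # the minimum is in the first row, nothing sits above it

# Row positions where a condition holds, here b > 5
positions = np.flatnonzero((df["b"] > 5).to_numpy())
print(pos, label, diff, positions)
```

Here idxmin gives the label for use with loc, while the NumPy position is the one to use with iloc or iat.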
