Rolling sum of groups by period - python

I have this dataframe:
import pandas as pd

lst = [['01012021','A',10], ['01012021','B',20], ['02012021','A',12], ['02012021','B',23]]
df2 = pd.DataFrame(lst, columns=['Date','FN','AuM'])
I would like to get the rolling sum by date and FN. The desired result looks like this:
lst = [['01012021','A',10,''], ['01012021','B',20,''], ['02012021','A',12,22], ['02012021','B',23,43]]
df2 = pd.DataFrame(lst, columns=['Date','FN','AuM','Roll2PeriodSum'])
Would you please help me?
Thank you

Solution if the datetimes are consecutive, so the Date column is not needed for counting per group:
df2['Roll2PeriodSum'] = (df2.groupby('FN').AuM
                            .rolling(2)
                            .sum()
                            .reset_index(level=0, drop=True))
print (df2)
Date FN AuM Roll2PeriodSum
0 01012021 A 10 NaN
1 01012021 B 20 NaN
2 02012021 A 12 22.0
3 02012021 B 23 43.0
Solution with datetimes, where the Date column is used for counting (a 2-day time window; this requires the dates to be sorted within each group):
df2['Date'] = pd.to_datetime(df2['Date'], format='%d%m%Y')
df = (df2.join(df2.set_index('Date')
                  .groupby('FN').AuM
                  .rolling('2D')
                  .sum()
                  .rename('Roll2PeriodSum'), on=['FN','Date']))
print (df)
Date FN AuM Roll2PeriodSum
0 2021-01-01 A 10 10.0
1 2021-01-01 B 20 20.0
2 2021-01-02 A 12 22.0
3 2021-01-02 B 23 43.0
If the first value per group should be NaN, as in the desired output, add min_periods=2:
df = (df2.join(df2.set_index('Date')
                  .groupby('FN').AuM
                  .rolling('2D', min_periods=2)
                  .sum()
                  .rename('Roll2PeriodSum'), on=['FN','Date']))
print (df)
Date FN AuM Roll2PeriodSum
0 2021-01-01 A 10 NaN
1 2021-01-01 B 20 NaN
2 2021-01-02 A 12 22.0
3 2021-01-02 B 23 43.0

Use groupby.rolling.sum:
df2['Roll2PeriodSum'] = (
    df2.assign(Date=pd.to_datetime(df2['Date'], format='%d%m%Y'))
       .groupby('FN').rolling(2)['AuM'].sum().droplevel(0)
)
print(df2)
# Output
Date FN AuM Roll2PeriodSum
0 01012021 A 10 NaN
1 01012021 B 20 NaN
2 02012021 A 12 22.0
3 02012021 B 23 43.0
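For a fixed window of two rows, the same per-group rolling sum can also be written with groupby.shift, which makes the row-based (rather than date-based) nature of the first solution explicit (a minimal sketch):
# each row plus the previous row of its FN group; the first row per group stays NaN
df2['Roll2PeriodSum'] = df2['AuM'] + df2.groupby('FN')['AuM'].shift()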


Convert a python df which is in pivot format to a proper row column format

I have the following dataframe:
id  a_1_1  a_1_2  a_1_3  a_1_4  b_1_1  b_1_2  b_1_3  c_1_1  c_1_2  c_1_3
1      10     20     30     40     90     80     70    NaN    NaN    NaN
2      33     34     35     36    NaN    NaN    NaN     11     12     13
and I want my result to be as follows:
id  col_name   1   2   3
1   a         10  20  30
1   b         90  80  70
2   a         33  34  35
2   c         11  12  13
I am trying to use the pd.melt function, but it is not yielding the correct result.
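For reproducibility, the sample frame can be constructed from the table above like this (a sketch; NaN markers normalized):
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2],
    'a_1_1': [10, 33], 'a_1_2': [20, 34], 'a_1_3': [30, 35], 'a_1_4': [40, 36],
    'b_1_1': [90, np.nan], 'b_1_2': [80, np.nan], 'b_1_3': [70, np.nan],
    'c_1_1': [np.nan, 11], 'c_1_2': [np.nan, 12], 'c_1_3': [np.nan, 13],
})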
IIUC, you can reshape using an intermediate MultiIndex after extracting the letter and last digit from the original column names:
(df.set_index('id')
   .pipe(lambda d: d.set_axis(pd.MultiIndex.from_frame(
       d.columns.str.extract(r'^([^_]+).*(\d+)'),
       names=['col_name', None]
   ), axis=1))
   .stack('col_name')
   .dropna(axis=1)  # assuming you don't want columns with NaNs
   .reset_index()
)
Variant using janitor's pivot_longer:
# pip install janitor
import janitor

(df
 .pivot_longer(index='id', names_to=('col_name', '.value'),
               names_pattern=r'([^_]+).*(\d+)')
 .pipe(lambda d: d.dropna(thresh=d.shape[1]-2))
 .dropna(axis=1)
)
output:
id col_name 1 2 3
0 1 a 10.0 20.0 30.0
1 1 b 90.0 80.0 70.0
2 2 a 33.0 34.0 35.0
3 2 c 11.0 12.0 13.0
Code:
df = df1.melt(id_vars=["id"],
              var_name="Col_name",
              value_name="Value").dropna()
df['Num'] = df['Col_name'].apply(lambda x: x[-1])
df['Col_name'] = df['Col_name'].apply(lambda x: x[0])
df = (df.pivot(index=['id', 'Col_name'], columns='Num', values='Value')
        .reset_index()
        .dropna(axis=1))
df
Output:
Num id Col_name 1 2 3
0 1 a 10.0 20.0 30.0
1 1 b 90.0 80.0 70.0
2 2 a 33.0 34.0 35.0
3 2 c 11.0 12.0 13.0
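If the middle _1_ token is constant across all columns (an assumption based on the sample data), pd.wide_to_long also works without extra dependencies (a sketch):
out = (pd.wide_to_long(df, stubnames=['a', 'b', 'c'], i='id', j='num',
                       sep='_1_', suffix=r'\d+')
         .stack()            # Series indexed by (id, num, letter)
         .unstack('num')     # the num values become the columns
         .dropna(axis=1)     # drop columns containing NaN, as above
         .reset_index()
         .rename(columns={'level_1': 'col_name'}))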

How to add values of quantity column and sum-product of Qty and price column based on Scrip name and Category using pandas data frame python

Sample dataframe:
Scrip Name  Category  Quantity  Price
a           Buy             10      8
b           Buy             20     15
b           Buy              5      5
b           Sell             3      4
c           Buy              5      5
c           Buy              6      7
c           Sell             5      5
Desired dataframe:
Scrip Name  buy_qty  buy_val (Quantity*Price)  sell_qty  sell_val
a                10                        80         0         0
b                25                       325         3        12
c                11                        67         5        25
Compute "Value"; then group by "Scrip Name" and "Category" and find the sum of "Quantity" and "Value" for each group. Then pivot the DataFrame. Finally, make some cosmetic changes to get the desired outcome:
out = (df.assign(Value=df['Quantity']*df['Price'])
         .groupby(['Scrip Name','Category'])[['Quantity','Value']]
         .sum()
         .reset_index()
         .pivot(index='Scrip Name', columns='Category', values=['Quantity', 'Value'])
         .fillna(0)
         .swaplevel(0, 1, axis=1)
      )
out.columns = ['_'.join(col) for col in out.columns]
Output:
            Buy_Quantity  Sell_Quantity  Buy_Value  Sell_Value
Scrip Name
a                   10.0            0.0       80.0         0.0
b                   25.0            3.0      325.0        12.0
c                   11.0            5.0       67.0        25.0
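The groupby/pivot steps can also be collapsed into a single pivot_table call (a sketch; the column flattening is the same as above):
out = (df.assign(Value=df['Quantity'] * df['Price'])
         .pivot_table(index='Scrip Name', columns='Category',
                      values=['Quantity', 'Value'],
                      aggfunc='sum', fill_value=0)
         .swaplevel(0, 1, axis=1))
out.columns = ['_'.join(col) for col in out.columns]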
Try:
df.groupby(by=["Scrip Name"]).count()

Combining two dataframes

I've tried merging two dataframes, but I can't seem to get it to work. Each time I merge, the rows where I expect values are all 0. Dataframe df1 already has some data in it, with some rows left blank. Dataframe df2 should populate those blank rows in df1 where the column names match, at each value of "TempBin" and "Month" in df1.
EDIT:
Both dataframes are built inside a for loop. df1 acts as my "storage"; df2 changes on each location iteration. So if df2 contained the results for LocationZP, I would also want that data inserted into the matching df1 rows. If I use df1 = df1.append(df2) in the for loop, all of the rows from df2 keep getting appended at the very end of df1 on each iteration.
df1:
Month TempBin LocationAA LocationXA LocationZP
1 0 7 1 2
1 1 98 0 89
1 2 12 23 38
1 3 3 14 17
1 4 7 9 14
1 5 1 8 99
13 0 0 0 0
13 1 0 0 0
13 2 0 0 0
13 3 0 0 0
13 4 0 0 0
13 5 0 0 0
df2:
Month TempBin LocationAA
13 0 11
13 1 22
13 2 33
13 3 44
13 4 55
13 5 66
desired output in df1:
Month TempBin LocationAA LocationXA LocationZP
1 0 7 1 2
1 1 98 0 89
1 2 12 23 38
1 3 3 14 17
1 4 7 9 14
1 5 1 8 99
13 0 11 0 0
13 1 22 0 0
13 2 33 0 0
13 3 44 0 0
13 4 55 0 0
13 5 66 0 0
import pandas as pd
df1 = pd.DataFrame({'Month': [1]*6 + [13]*6,
                    'TempBin': [0,1,2,3,4,5]*2,
                    'LocationAA': [7,98,12,3,7,1,0,0,0,0,0,0],
                    'LocationXA': [1,0,23,14,9,8,0,0,0,0,0,0],
                    'LocationZP': [2,89,38,17,14,99,0,0,0,0,0,0]})
df2 = pd.DataFrame({'Month': [13]*6,
                    'TempBin': [0,1,2,3,4,5],
                    'LocationAA': [11,22,33,44,55,66]})
df1 = pd.merge(df1, df2, on=["Month","TempBin","LocationAA"], how="left")
result:
Month TempBin LocationAA LocationXA LocationZP
1 0 7.0 1.0 2.0
1 1 98.0 0.0 89.0
1 2 12.0 23.0 38.0
1 3 3.0 14.0 17.0
1 4 7.0 9.0 14.0
1 5 1.0 8.0 99.0
13 0 NaN NaN NaN
13 1 NaN NaN NaN
13 2 NaN NaN NaN
13 3 NaN NaN NaN
13 4 NaN NaN NaN
13 5 NaN NaN NaN
Here's some code that worked for me:
# Merge two df into one dataframe on the columns "TempBin" and "Month" filling nan values with 0.
import pandas as pd

df1 = pd.DataFrame({'Month': [1]*6 + [13]*6,
                    'TempBin': [0,1,2,3,4,5]*2,
                    'LocationAA': [7,98,12,3,7,1,0,0,0,0,0,0],
                    'LocationXA': [1,0,23,14,9,8,0,0,0,0,0,0],
                    'LocationZP': [2,89,38,17,14,99,0,0,0,0,0,0]})
df2 = pd.DataFrame({'Month': [13]*6,
                    'TempBin': [0,1,2,3,4,5],
                    'LocationAA': [11,22,33,44,55,66]})
df_merge = pd.merge(df1, df2, how='left',
                    left_on=['TempBin', 'Month'],
                    right_on=['TempBin', 'Month'])
# add column LocationAA and fill it with the non-null value from LocationAA_x and LocationAA_y
# (this must happen before filling NaNs with 0, otherwise LocationAA_y is never null)
df_merge['LocationAA'] = df_merge.apply(
    lambda x: x['LocationAA_x'] if pd.isnull(x['LocationAA_y']) else x['LocationAA_y'],
    axis=1)
# remove columns LocationAA_x and LocationAA_y, then fill remaining NaN values with 0
df_merge.drop(['LocationAA_x', 'LocationAA_y'], axis=1, inplace=True)
df_merge.fillna(0, inplace=True)
print(df_merge)
Output:
    Month  TempBin  LocationXA  LocationZP  LocationAA
0       1        0           1           2         7.0
1       1        1           0          89        98.0
2       1        2          23          38        12.0
3       1        3          14          17         3.0
4       1        4           9          14         7.0
5       1        5           8          99         1.0
6      13        0           0           0        11.0
7      13        1           0           0        22.0
8      13        2           0           0        33.0
9      13        3           0           0        44.0
10     13        4           0           0        55.0
11     13        5           0           0        66.0
Let me know if there's something you don't understand in the comments :)
PS: Sorry for the extra comments. But I left them there for some more explanations.
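As a side note, the row-wise apply above can be replaced by a vectorized fillna that does the same coalescing much faster:
# take LocationAA_y where present, otherwise fall back to LocationAA_x
df_merge['LocationAA'] = df_merge['LocationAA_y'].fillna(df_merge['LocationAA_x'])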
You need to use append to get the desired output:
df1 = df1.append(df2)
and if you want to replace the Nulls to zeros add:
df1 = df1.fillna(0)
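Note that DataFrame.append was deprecated in pandas 1.4 and removed in 2.0; the equivalent with pd.concat is:
df1 = pd.concat([df1, df2], ignore_index=True)
df1 = df1.fillna(0)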
Here is another way using combine_first()
i = ['Month','TempBin']
df2.set_index(i).combine_first(df1.set_index(i)).reset_index()
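A related option, when df1 should keep its shape and only have matching cells overwritten, is DataFrame.update (a sketch; it writes the non-NA values of df2 into df1 at matching index labels and columns):
i = ['Month', 'TempBin']
df1 = df1.set_index(i)
df1.update(df2.set_index(i))  # in place; only columns present in df2 are touched
df1 = df1.reset_index()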

Save previous entry per group / id and date in a column

I have a dataframe in python, with the following sorted format:
df
Name Date Value
A 01.01.20 10
A 02.01.20 20
A 03.01.20 15
B 01.01.20 5
B 02.01.20 10
B 03.01.20 5
C 01.01.20 3
C 03.01.20 6
Not every Name has every date filled. How can I create a new column with the previous date's value (if it is missing, just pick the current value), so that it leads to:
Name  Date      Value  Previous
A     01.01.20     10        10
A     02.01.20     20        10
A     03.01.20     15        20
B     01.01.20      5         5
B     02.01.20     10         5
B     03.01.20      5        10
C     01.01.20      3         3
C     03.01.20      6         6
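For reference, the sample frame can be built from the table above like this (a sketch):
import pandas as pd

df = pd.DataFrame({
    'Name': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C'],
    'Date': ['01.01.20', '02.01.20', '03.01.20',
             '01.01.20', '02.01.20', '03.01.20',
             '01.01.20', '03.01.20'],
    'Value': [10, 20, 15, 5, 10, 5, 3, 6],
})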
Use DataFrameGroupBy.shift with Series.fillna:
df['Date'] = pd.to_datetime(df['Date'], format='%d.%m.%y')
df['Previous'] = df.groupby('Name')['Value'].shift().fillna(df['Value'])
print (df)
  Name       Date  Value  Previous
0    A 2020-01-01     10      10.0
1    A 2020-01-02     20      10.0
2    A 2020-01-03     15      20.0
3    B 2020-01-01      5       5.0
4    B 2020-01-02     10       5.0
5    B 2020-01-03      5      10.0
6    C 2020-01-01      3       3.0
7    C 2020-01-03      6       3.0
But if you need to shift by exactly 1 day, so that the last group (C, which has no 02.01.20 row) keeps its own value as in the desired output, the solution is different: first create a DatetimeIndex, then join the shifted Series back on Name and Date:
df['Date'] = pd.to_datetime(df['Date'], format='%d.%m.%y')
df = df.set_index('Date')
s = df.groupby('Name')['Value'].shift(freq='D').rename('Previous')
df = df.join(s, on=['Name','Date']).fillna({'Previous': df['Value']})
print (df)
           Name  Value  Previous
Date
2020-01-01    A     10      10.0
2020-01-02    A     20      10.0
2020-01-03    A     15      20.0
2020-01-01    B      5       5.0
2020-01-02    B     10       5.0
2020-01-03    B      5      10.0
2020-01-01    C      3       3.0
2020-01-03    C      6       6.0
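The 1-day shift can also be expressed with a plain merge, by advancing each row's date by one day and joining it back on Name and Date (a sketch; it assumes Date is still a regular datetime column rather than the index):
prev = (df.assign(Date=df['Date'] + pd.Timedelta(days=1))
          .rename(columns={'Value': 'Previous'}))
df = df.merge(prev[['Name', 'Date', 'Previous']], on=['Name', 'Date'], how='left')
df['Previous'] = df['Previous'].fillna(df['Value'])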

filter pandas data on specific index

I'd like to filter a dataframe based on specific index values.
I've read about query, but I haven't managed to make it work.
Here is the code which creates my pivot table. I'd like to filter on specific members:
df = pd.DataFrame(my_dataframe)
table = pd.pivot_table(df, index=["Date","member","Card"], columns=["Type"], values=["Heure"], aggfunc=[len])  #, fill_value=0)
table.to_excel(writer, sheet_name='TcD')
What should I do?
Thanks
You can use query, or select by a level of the MultiIndex with slicers:
df = pd.DataFrame({'Card': list('baaaaa'),
                   'Date': ['2017-10-01'] * 6,
                   'Heure': [1,3,5,7,1,0],
                   'Type': [5,5,5,9,5,9],
                   'member': list('aaabbb')})
print (df)
  Card        Date  Heure  Type member
0    b  2017-10-01      1     5      a
1    a  2017-10-01      3     5      a
2    a  2017-10-01      5     5      a
3    a  2017-10-01      7     9      b
4    a  2017-10-01      1     5      b
5    a  2017-10-01      0     9      b
table = pd.pivot_table(df, index=["Date","member","Card"],
                       columns="Type",
                       values="Heure",
                       aggfunc='size')
print (table)
Type                    5    9
Date       member Card
2017-10-01 a      a    2.0  NaN
                  b    1.0  NaN
           b      a    1.0  2.0
table1 = table.query('member == "a"')
print (table1)
Type                    5    9
Date       member Card
2017-10-01 a      a    2.0  NaN
                  b    1.0  NaN
idx = pd.IndexSlice
table1 = table.loc[idx[:, 'a', :], :]
print (table1)
Type                    5    9
Date       member Card
2017-10-01 a      a    2.0  NaN
                  b    1.0  NaN
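A cross-section with DataFrame.xs gives the same single-member selection (a sketch; drop_level=False keeps the member level in the result):
table1 = table.xs('a', level='member', drop_level=False)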
EDIT:
To filter by multiple values use:
table1 = table.query('member in ["a", "b"]')
print (table1)
Type                    5    9
Date       member Card
2017-10-01 a      a    2.0  NaN
                  b    1.0  NaN
           b      a    1.0  2.0
idx = pd.IndexSlice
table1 = table.loc[idx[:, ['a', 'b'], :], :]
print (table1)
Type                    5    9
Date       member Card
2017-10-01 a      a    2.0  NaN
                  b    1.0  NaN
           b      a    1.0  2.0
