A B C D
0 2002-01-12 10:00:00 John 19
1 2002-01-12 11:00:00 Africa 15
2 2002-01-12 12:00:00 Mary 30
3 2002-01-13 09:00:00 Billy 5
4 2002-01-13 11:00:00 Mira 6
5 2002-01-13 12:00:00 Hillary 50
6 2002-01-13 12:00:00 Romina 50
7 2002-01-14 10:00:00 George 30
8 2002-01-14 11:00:00 Denzel 12
9 2002-01-14 11:00:00 Michael 12
10 2002-01-14 12:00:00 Bisc 25
11 2002-01-16 10:00:00 Virgin 16
12 2002-01-16 11:00:00 Antonio 10
13 2002-01-16 12:00:00 Sito 5
I want to create two new columns, df['E'] and df['F'], knowing that the same A and B values always correspond to the same D value:
df['E']: percent change of the D value with respect to the previous D value.
df['F']: percent change between D and the previous 12:00:00 D value.
Output should be:
A B C D E F
0 2002-01-12 10:00:00 John 19 0 0
1 2002-01-12 11:00:00 Africa 15 -21.05 0
2 2002-01-12 12:00:00 Mary 30 100.00 0
3 2002-01-13 09:00:00 Billy 5 -83.33 -83.33
4 2002-01-13 11:00:00 Mira 6 20.00 -80.00
5 2002-01-13 12:00:00 Hillary 50 733.33 66.66
6 2002-01-13 12:00:00 Romina 50 733.33 66.66
7 2002-01-14 10:00:00 George 30 -40.00 -40.00
8 2002-01-14 11:00:00 Denzel 12 -60.00 -76.00
9 2002-01-14 11:00:00 Michael 12 -60.00 -76.00
10 2002-01-14 12:00:00 Bisc 25 108.33 -50.00
11 2002-01-16 10:00:00 Virgin 16 -36.00 -36.00
12 2002-01-16 11:00:00 Antonio 10 -37.50 -60.00
13 2002-01-16 12:00:00 Sito 5 -50.00 -80.00
Would it be possible to use map to get it?
I've tried:
x = df[df['B'].eq(time(12))].drop_duplicates(subset=['A']).set_index('A')['D'](100 * (df.D - df.D.shift(1)) / df.D.shift(1)).fillna(0)
df['F'] = df['A'].map(x)
Use:
from datetime import time
import numpy as np

df['E'] = df['D'].pct_change().mul(100).replace(0, np.nan).ffill().fillna(0).round(2)
s = df[df['B'].eq(time(12))].drop_duplicates(subset=['A']).set_index('A')['D']
df['F'] = (df['D'].div(df['A'].map(s.shift()))).sub(1).mul(100).round(2).fillna(0)
print (df)
A B C D E F
0 2002-01-12 10:00:00 John 19 0.00 0.00
1 2002-01-12 11:00:00 Africa 15 -21.05 0.00
2 2002-01-12 12:00:00 Mary 30 100.00 0.00
3 2002-01-13 09:00:00 Billy 5 -83.33 -83.33
4 2002-01-13 11:00:00 Mira 6 20.00 -80.00
5 2002-01-13 12:00:00 Hillary 50 733.33 66.67
6 2002-01-13 12:00:00 Romina 50 733.33 66.67
7 2002-01-14 10:00:00 George 30 -40.00 -40.00
8 2002-01-14 11:00:00 Denzel 12 -60.00 -76.00
9 2002-01-14 11:00:00 Michael 12 -60.00 -76.00
10 2002-01-14 12:00:00 Bisc 25 108.33 -50.00
11 2002-01-16 10:00:00 Virgin 16 -36.00 -36.00
12 2002-01-16 11:00:00 Antonio 10 -37.50 -60.00
13 2002-01-16 12:00:00 Sito 5 -50.00 -80.00
Explanation:
For column E, pct_change is used; the zeros produced by duplicated rows are replaced with NaN and forward filled.
For column F, D is divided by the previous day's 12:00:00 value, mapped in through column A.
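The replace/ffill step matters because duplicated rows (same A and B) produce a 0% change that should instead repeat the previous row's change. A minimal sketch of that effect on a toy Series:
import numpy as np
import pandas as pd

d = pd.Series([5, 6, 50, 50])  # last value duplicated, as in rows 5-6 above
pct = d.pct_change().mul(100).round(2)
print(pct.tolist())  # [nan, 20.0, 733.33, 0.0] - the duplicate yields 0
print(pct.replace(0, np.nan).ffill().fillna(0).tolist())  # [0.0, 20.0, 733.33, 733.33]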
This is a sample of my df, consisting of temperature and rain (mm) per city:
Datetime             Berlin_temperature  Dublin_temperature  London_temperature  Paris_temperature  Berlin_rain  Dublin_rain  London_rain  Paris_rain
2022-01-01 10:00:00  24                  24                  24                  24                 10           10           10           10
2022-01-01 11:00:00  24                  24                  24                  24                 10           10           10           10
2022-01-01 12:00:00  24                  24                  24                  24                 10           10           10           10
2022-01-01 13:00:00  24                  24                  24                  24                 10           10           10           10
I want to achieve the following output as a DataFrame:
Datetime             City    Temperature  Rainfall
2022-01-01 10:00:00  Berlin  24           10
2022-01-01 10:00:00  Dublin  24           10
2022-01-01 10:00:00  London  24           10
2022-01-01 10:00:00  Paris   24           10
2022-01-01 11:00:00  Berlin  24           10
2022-01-01 11:00:00  Dublin  24           10
2022-01-01 11:00:00  London  24           10
2022-01-01 11:00:00  Paris   24           10
2022-01-01 12:00:00  ...     ...          ...
At the moment I don't know how to achieve this by transposing or something similar. How would this be possible?
Use DataFrame.stack with a MultiIndex created by splitting the columns on _, but first convert Datetime to the index with DataFrame.set_index:
df1 = df.set_index('Datetime')
df1.columns = df1.columns.str.split('_', expand=True)
df1 = df1.stack(0).rename_axis(['Datetime','City']).reset_index()
print (df1)
Datetime City rain temperature
0 2022-01-01 10:00:00 Berlin 10 24
1 2022-01-01 10:00:00 Dublin 10 24
2 2022-01-01 10:00:00 London 10 24
3 2022-01-01 10:00:00 Paris 10 24
4 2022-01-01 11:00:00 Berlin 10 24
5 2022-01-01 11:00:00 Dublin 10 24
6 2022-01-01 11:00:00 London 10 24
7 2022-01-01 11:00:00 Paris 10 24
8 2022-01-01 12:00:00 Berlin 10 24
9 2022-01-01 12:00:00 Dublin 10 24
10 2022-01-01 12:00:00 London 10 24
11 2022-01-01 12:00:00 Paris 10 24
12 2022-01-01 13:00:00 Berlin 10 24
13 2022-01-01 13:00:00 Dublin 10 24
14 2022-01-01 13:00:00 London 10 24
15 2022-01-01 13:00:00 Paris 10 24
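If you also want the exact headers from the desired output, rename the stacked columns afterwards (a small follow-up; the target names are taken from the question):
df1 = df1.rename(columns={'temperature': 'Temperature', 'rain': 'Rainfall'})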
Using janitor.pivot_longer:
import janitor
out = df.pivot_longer(index='Datetime', names_to=('City', '.value'),
                      names_sep='_', sort_by_appearance=True)
Output:
Datetime City temperature rain
0 2022-01-01 10:00:00 Berlin 24 10
1 2022-01-01 10:00:00 Dublin 24 10
2 2022-01-01 10:00:00 London 24 10
3 2022-01-01 10:00:00 Paris 24 10
4 2022-01-01 11:00:00 Berlin 24 10
5 2022-01-01 11:00:00 Dublin 24 10
6 2022-01-01 11:00:00 London 24 10
7 2022-01-01 11:00:00 Paris 24 10
8 2022-01-01 12:00:00 Berlin 24 10
9 2022-01-01 12:00:00 Dublin 24 10
10 2022-01-01 12:00:00 London 24 10
11 2022-01-01 12:00:00 Paris 24 10
12 2022-01-01 13:00:00 Berlin 24 10
13 2022-01-01 13:00:00 Dublin 24 10
14 2022-01-01 13:00:00 London 24 10
15 2022-01-01 13:00:00 Paris 24 10
I'm trying to merge two df's, one df has a datetime column, and the other has just a date column. My application for this is to find yesterday's high price using an OHLC dataset. I've attached some starter code below, but I'll describe what I'm looking for.
Given this intraday dataset:
time current_intraday_high
0 2022-02-11 09:00:00 1
1 2022-02-11 10:00:00 2
2 2022-02-11 11:00:00 3
3 2022-02-11 12:00:00 4
4 2022-02-11 13:00:00 5
5 2022-02-14 09:00:00 6
6 2022-02-14 10:00:00 7
7 2022-02-14 11:00:00 8
8 2022-02-14 12:00:00 9
9 2022-02-14 13:00:00 10
10 2022-02-15 09:00:00 11
11 2022-02-15 10:00:00 12
12 2022-02-15 11:00:00 13
13 2022-02-15 12:00:00 14
14 2022-02-15 13:00:00 15
15 2022-02-16 09:00:00 16
16 2022-02-16 10:00:00 17
17 2022-02-16 11:00:00 18
18 2022-02-16 12:00:00 19
19 2022-02-16 13:00:00 20
...and this daily dataframe:
time daily_high
0 2022-02-11 5
1 2022-02-14 10
2 2022-02-15 15
3 2022-02-16 20
...how can I merge them together, and have each row of the intraday dataframe contain the previous (business) day's high price, like so:
time current_intraday_high yesterdays_high
0 2022-02-11 09:00:00 1 NaN
1 2022-02-11 10:00:00 2 NaN
2 2022-02-11 11:00:00 3 NaN
3 2022-02-11 12:00:00 4 NaN
4 2022-02-11 13:00:00 5 NaN
5 2022-02-14 09:00:00 6 5.0
6 2022-02-14 10:00:00 7 5.0
7 2022-02-14 11:00:00 8 5.0
8 2022-02-14 12:00:00 9 5.0
9 2022-02-14 13:00:00 10 5.0
10 2022-02-15 09:00:00 11 10.0
11 2022-02-15 10:00:00 12 10.0
12 2022-02-15 11:00:00 13 10.0
13 2022-02-15 12:00:00 14 10.0
14 2022-02-15 13:00:00 15 10.0
15 2022-02-16 09:00:00 16 15.0
16 2022-02-16 10:00:00 17 15.0
17 2022-02-16 11:00:00 18 15.0
18 2022-02-16 12:00:00 19 15.0
19 2022-02-16 13:00:00 20 15.0
(Note the NaNs at the top: we don't have any data for Feb 10, 2022 in the intraday dataset, and see how each row contains the intraday data plus the PREVIOUS day's max "high" price.)
Minimal reproducible example code below:
import pandas as pd
###################################################
# CREATE MOCK INTRADAY DATAFRAME
###################################################
intraday_date_time = [
"2022-02-11 09:00:00",
"2022-02-11 10:00:00",
"2022-02-11 11:00:00",
"2022-02-11 12:00:00",
"2022-02-11 13:00:00",
"2022-02-14 09:00:00",
"2022-02-14 10:00:00",
"2022-02-14 11:00:00",
"2022-02-14 12:00:00",
"2022-02-14 13:00:00",
"2022-02-15 09:00:00",
"2022-02-15 10:00:00",
"2022-02-15 11:00:00",
"2022-02-15 12:00:00",
"2022-02-15 13:00:00",
"2022-02-16 09:00:00",
"2022-02-16 10:00:00",
"2022-02-16 11:00:00",
"2022-02-16 12:00:00",
"2022-02-16 13:00:00",
]
intraday_date_time = pd.to_datetime(intraday_date_time)
intraday_df = pd.DataFrame(
{
"time": intraday_date_time,
"current_intraday_high": [x for x in range(1, 21)],
},
)
print(intraday_df)
# intraday_df.to_csv('intradayTEST.csv', index=True)
###################################################
# AGGREGATE/DOWNSAMPLE TO DAILY DATAFRAME
###################################################
# Aggregate to business days using intraday_df
agg_dict = {'current_intraday_high': 'max'}
daily_df = intraday_df.set_index('time').resample('B').agg(agg_dict).reset_index()
daily_df.rename(columns={"current_intraday_high": "daily_high"}, inplace=True)
print(daily_df)
# daily_df.to_csv('dailyTEST.csv', index=True)
###################################################
# MERGE THE TWO DATAFRAMES
###################################################
# Need to merge the daily dataset to the intraday dataset, such that,
# any row on the newly merged/joined/concat'd dataset will have:
# 1. The current intraday datetime in the 'time' column
# 2. The current 'intraday_high' value
# 3. The PREVIOUS DAY's 'daily_high' value
# This doesn't work as the daily_df just gets appended to the bottom
# of the intraday_df due to the datetimes/dates merging
merged_df = pd.merge(intraday_df, daily_df, how='outer', on='time')
print(merged_df)
pd.merge_asof allows you to easily do a merge like this.
yesterdays_high = (intraday_df.resample('B', on='time')['current_intraday_high'].max()
                   .shift()
                   .rename('yesterdays_high')
                   .reset_index())
merged_df = pd.merge_asof(intraday_df, yesterdays_high, on='time')
print(merged_df)
Output:
time current_intraday_high yesterdays_high
0 2022-02-11 09:00:00 1 NaN
1 2022-02-11 10:00:00 2 NaN
2 2022-02-11 11:00:00 3 NaN
3 2022-02-11 12:00:00 4 NaN
4 2022-02-11 13:00:00 5 NaN
5 2022-02-14 09:00:00 6 5.0
6 2022-02-14 10:00:00 7 5.0
7 2022-02-14 11:00:00 8 5.0
8 2022-02-14 12:00:00 9 5.0
9 2022-02-14 13:00:00 10 5.0
10 2022-02-15 09:00:00 11 10.0
11 2022-02-15 10:00:00 12 10.0
12 2022-02-15 11:00:00 13 10.0
13 2022-02-15 12:00:00 14 10.0
14 2022-02-15 13:00:00 15 10.0
15 2022-02-16 09:00:00 16 15.0
16 2022-02-16 10:00:00 17 15.0
17 2022-02-16 11:00:00 18 15.0
18 2022-02-16 12:00:00 19 15.0
19 2022-02-16 13:00:00 20 15.0
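It can help to look at the intermediate yesterdays_high frame built above: resampling to business days, taking the max and shifting produces one row per day holding the previous day's high, which merge_asof then fans out to every intraday timestamp:
print(yesterdays_high)
#         time  yesterdays_high
# 0 2022-02-11              NaN
# 1 2022-02-14              5.0
# 2 2022-02-15             10.0
# 3 2022-02-16             15.0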
Given your already existing code, you can map the shifted values:
intraday_df['yesterdays_high'] = (intraday_df['time']
                                  .dt.date
                                  .map(daily_df['daily_high']
                                       .set_axis(daily_df['time'].shift(-1)))
                                  )
If you don't have all days and really want to map the real previous business day:
intraday_df['yesterdays_high'] = (intraday_df['time']
                                  .dt.date
                                  .map(daily_df['daily_high']
                                       .set_axis(daily_df['time'].add(pd.offsets.BusinessDay())))
                                  )
Output:
time current_intraday_high yesterdays_high
0 2022-02-11 09:00:00 1 NaN
1 2022-02-11 10:00:00 2 NaN
2 2022-02-11 11:00:00 3 NaN
3 2022-02-11 12:00:00 4 NaN
4 2022-02-11 13:00:00 5 NaN
5 2022-02-14 09:00:00 6 5.0
6 2022-02-14 10:00:00 7 5.0
7 2022-02-14 11:00:00 8 5.0
8 2022-02-14 12:00:00 9 5.0
9 2022-02-14 13:00:00 10 5.0
10 2022-02-15 09:00:00 11 10.0
11 2022-02-15 10:00:00 12 10.0
12 2022-02-15 11:00:00 13 10.0
13 2022-02-15 12:00:00 14 10.0
14 2022-02-15 13:00:00 15 10.0
15 2022-02-16 09:00:00 16 15.0
16 2022-02-16 10:00:00 17 15.0
17 2022-02-16 11:00:00 18 15.0
18 2022-02-16 12:00:00 19 15.0
19 2022-02-16 13:00:00 20 15.0
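The trick in both variants is the same: build a lookup Series whose index is the day on which yesterday's high should apply, then map into it. A minimal sketch of the idea on hypothetical toy labels:
import pandas as pd

daily = pd.DataFrame({'day': ['Mon', 'Tue', 'Wed'], 'high': [5, 10, 15]})
# relabel each high with the NEXT day, so looking up 'Tue' returns Monday's high
lookup = daily['high'].set_axis(daily['day'].shift(-1))
print(pd.Series(['Mon', 'Tue', 'Wed']).map(lookup))  # NaN, 5.0, 10.0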
We can use .dt.date as an index to join the two frames on the same days. For the previous day's high price, we apply shift on daily_df:
intra_date = intraday_df['time'].dt.date
daily_date = daily_df['time'].dt.date
answer = intraday_df.set_index(intra_date).join(
daily_df.set_index(daily_date)['daily_high'].shift()
).reset_index(drop=True)
I would like to replace the NaN value of day 20211021 - HOUR 1 with the value of day 20211020 - HOUR 1, the value of day 20211021 - HOUR 2 with the value of day 20211020 - HOUR 2, and so on.
timestamp            data      year  month  day  hour  solar_total
2021-10-20 00:00:00  20211020  2021     10   20     1          0.0
2021-10-20 01:00:00  20211020  2021     10   20     2          0.0
2021-10-20 02:00:00  20211020  2021     10   20     3          0.0
2021-10-20 03:00:00  20211020  2021     10   20     4          0.0
2021-10-20 04:00:00  20211020  2021     10   20     5          0.0
2021-10-20 05:00:00  20211020  2021     10   20     6          0.0
2021-10-20 06:00:00  20211020  2021     10   20     7          0.0
2021-10-20 07:00:00  20211020  2021     10   20     8         65.0
2021-10-20 08:00:00  20211020  2021     10   20     9       1498.0
2021-10-20 09:00:00  20211020  2021     10   20    10       4034.0
2021-10-20 10:00:00  20211020  2021     10   20    11       6120.0
2021-10-20 11:00:00  20211020  2021     10   20    12       7450.0
2021-10-20 12:00:00  20211020  2021     10   20    13       7943.0
2021-10-20 13:00:00  20211020  2021     10   20    14       7821.0
2021-10-20 14:00:00  20211020  2021     10   20    15       7058.0
2021-10-20 16:00:00  20211020  2021     10   20    17       3664.0
2021-10-20 17:00:00  20211020  2021     10   20    18       1375.0
2021-10-20 18:00:00  20211020  2021     10   20    19         11.0
2021-10-20 19:00:00  20211020  2021     10   20    20          0.0
2021-10-20 20:00:00  20211020  2021     10   20    21          0.0
2021-10-20 21:00:00  20211020  2021     10   20    22          0.0
2021-10-20 22:00:00  20211020  2021     10   20    23          0.0
2021-10-20 23:00:00  20211020  2021     10   20    24          0.0
2021-10-21 00:00:00  20211021  2021     10   21     1          NaN
2021-10-21 01:00:00  20211021  2021     10   21     2          NaN
2021-10-21 02:00:00  20211021  2021     10   21     3          NaN
2021-10-21 03:00:00  20211021  2021     10   21     4          NaN
2021-10-21 04:00:00  20211021  2021     10   21     5          NaN
2021-10-21 05:00:00  20211021  2021     10   21     6          NaN
2021-10-21 06:00:00  20211021  2021     10   21     7          NaN
2021-10-21 07:00:00  20211021  2021     10   21     8          NaN
2021-10-21 08:00:00  20211021  2021     10   21     9          NaN
2021-10-21 09:00:00  20211021  2021     10   21    10          NaN
2021-10-21 10:00:00  20211021  2021     10   21    11          NaN
2021-10-21 11:00:00  20211021  2021     10   21    12          NaN
2021-10-21 12:00:00  20211021  2021     10   21    13          NaN
2021-10-21 13:00:00  20211021  2021     10   21    14          NaN
2021-10-21 14:00:00  20211021  2021     10   21    15          NaN
2021-10-21 15:00:00  20211021  2021     10   21    16          NaN
2021-10-21 16:00:00  20211021  2021     10   21    17          NaN
2021-10-21 17:00:00  20211021  2021     10   21    18          NaN
2021-10-21 18:00:00  20211021  2021     10   21    19          NaN
2021-10-21 19:00:00  20211021  2021     10   21    20          NaN
2021-10-21 20:00:00  20211021  2021     10   21    21          NaN
2021-10-21 21:00:00  20211021  2021     10   21    22          NaN
2021-10-21 22:00:00  20211021  2021     10   21    23          NaN
2021-10-21 23:00:00  20211021  2021     10   21    24          NaN
With no missing timestamps you would shift by a full day (24 rows); here the first day has only 23 rows, because 15:00 is missing on 2021-10-20, so shift by 23:
df['solar_total'] = df['solar_total'].fillna(df['solar_total'].shift(23))
timestamp            data      year  month  day  hour  solar_total
2021-10-20 00:00:00  20211020  2021     10   20     1            0
2021-10-20 01:00:00  20211020  2021     10   20     2            0
2021-10-20 02:00:00  20211020  2021     10   20     3            0
2021-10-20 03:00:00  20211020  2021     10   20     4            0
2021-10-20 04:00:00  20211020  2021     10   20     5            0
2021-10-20 05:00:00  20211020  2021     10   20     6            0
2021-10-20 06:00:00  20211020  2021     10   20     7            0
2021-10-20 07:00:00  20211020  2021     10   20     8           65
2021-10-20 08:00:00  20211020  2021     10   20     9         1498
2021-10-20 09:00:00  20211020  2021     10   20    10         4034
2021-10-20 10:00:00  20211020  2021     10   20    11         6120
2021-10-20 11:00:00  20211020  2021     10   20    12         7450
2021-10-20 12:00:00  20211020  2021     10   20    13         7943
2021-10-20 13:00:00  20211020  2021     10   20    14         7821
2021-10-20 14:00:00  20211020  2021     10   20    15         7058
2021-10-20 16:00:00  20211020  2021     10   20    17         3664
2021-10-20 17:00:00  20211020  2021     10   20    18         1375
2021-10-20 18:00:00  20211020  2021     10   20    19           11
2021-10-20 19:00:00  20211020  2021     10   20    20            0
2021-10-20 20:00:00  20211020  2021     10   20    21            0
2021-10-20 21:00:00  20211020  2021     10   20    22            0
2021-10-20 22:00:00  20211020  2021     10   20    23            0
2021-10-20 23:00:00  20211020  2021     10   20    24            0
2021-10-21 00:00:00  20211021  2021     10   21     1            0
2021-10-21 01:00:00  20211021  2021     10   21     2            0
2021-10-21 02:00:00  20211021  2021     10   21     3            0
2021-10-21 03:00:00  20211021  2021     10   21     4            0
2021-10-21 04:00:00  20211021  2021     10   21     5            0
2021-10-21 05:00:00  20211021  2021     10   21     6            0
2021-10-21 06:00:00  20211021  2021     10   21     7            0
2021-10-21 07:00:00  20211021  2021     10   21     8           65
2021-10-21 08:00:00  20211021  2021     10   21     9         1498
2021-10-21 09:00:00  20211021  2021     10   21    10         4034
2021-10-21 10:00:00  20211021  2021     10   21    11         6120
2021-10-21 11:00:00  20211021  2021     10   21    12         7450
2021-10-21 12:00:00  20211021  2021     10   21    13         7943
2021-10-21 13:00:00  20211021  2021     10   21    14         7821
2021-10-21 14:00:00  20211021  2021     10   21    15         7058
2021-10-21 15:00:00  20211021  2021     10   21    16         3664
2021-10-21 16:00:00  20211021  2021     10   21    17         1375
2021-10-21 17:00:00  20211021  2021     10   21    18           11
2021-10-21 18:00:00  20211021  2021     10   21    19            0
2021-10-21 19:00:00  20211021  2021     10   21    20            0
2021-10-21 20:00:00  20211021  2021     10   21    21            0
2021-10-21 21:00:00  20211021  2021     10   21    22            0
2021-10-21 22:00:00  20211021  2021     10   21    23            0
2021-10-21 23:00:00  20211021  2021     10   21    24          nan
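Note that a positional shift misaligns the hours after a gap: because 15:00 is missing on 2021-10-20, the filled values for 2021-10-21 from 15:00 onward actually come from one hour later on the previous day, and the last row stays NaN. If that matters, a more robust sketch (assuming timestamp is a datetime64 column with unique values) looks up the value exactly 24 hours earlier:
import pandas as pd

# map each row's timestamp minus one day to that earlier row's value;
# hours whose previous-day counterpart is also missing simply stay NaN
prev_day = (df['timestamp'] - pd.Timedelta(days=1)).map(
    df.set_index('timestamp')['solar_total'])
df['solar_total'] = df['solar_total'].fillna(prev_day)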
How would you transform rows into columns using my data? My current dataset looks like the 'Original df' shown below, and I want it to look like the 'New df'. Just to be clear, Appoint1 matches SDAP1 and RDAP1, and Appoint2 corresponds to SDAP2 and RDAP2.
Original df:
Name Appoint1 Appoint2 Appoint1t Appoint2t SDAP1 RDAP1 SDAP2 RDAP2
Sam 23.09.2017 24.09.2017 11:00:00 11:00:00 3 -9 6 8
Sarah 24.09.2017 27.09.2017 12:00:00 12:00:00 2 Nan 7 8
Steve 23.10.2017 31.10.2017 11:00:00 12:00:00 5 9 7 9
Mark 23.09.2017 11:00:00 0 3
James 23.09.2017 26.09.2017 11:00:00 4 7 1 4
New df:
Name Appointments Appointmenttime SDAP RDAP
Sam 23.09.2017 11:00:00 3 -9
Sam 24.09.2017 11:00:00 6 8
Sarah 24.09.2017 12:00:00 2 NaN
Sarah 27.09.2017 12:00:00 7 8
Steve 23.10.2017 11:00:00 5 9
Steve 31.10.2017 12:00:00 7 9
Mark 23.09.2017 11:00:00 0 3
James 23.09.2017 4 7
James 26.09.2017 11:00:00 1 4
Is it necessary to use wide_to_long? It seems much easier to use concat.
df1 = df[["Name","Appoint1","Appoint1t"]]
df2 = df[["Name","Appoint2","Appoint2t"]].rename(columns={"Appoint2": "Appoint1", "Appoint2t": "Appoint1t"})
print (pd.concat([df1,df2]).dropna().sort_index())
#
Name Appoint1 Appoint1t
0 Sam 23.09.2017 11:00:00
0 Sam 24.09.2017 11:00:00
1 Sarah 24.09.2017 12:00:00
1 Sarah 27.09.2017 12:00:00
2 Steve 23.10.2017 11:00:00
2 Steve 31.10.2017 12:00:00
3 Mark 23.09.2017 11:00:00
4 James 23.09.2017 11:00:00
4 James 26.09.2017 11:00:00
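To finish with the headers from the question's 'New df', reset the index and rename (a sketch; the final names are taken from the desired output):
out = (pd.concat([df1, df2]).dropna().sort_index()
       .reset_index(drop=True)
       .rename(columns={'Appoint1': 'Appointments', 'Appoint1t': 'Appointmenttime'}))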
Using wide_to_long on the appointment/time columns, after renaming them:
df = df[['Name', 'Appoint1', 'Appoint2', 'Appoint1t', 'Appoint2t']]
df.columns = ['Name', 'Appoint_1', 'Appoint_2', 'Time_1', 'Time_2']
print (pd.wide_to_long(df, stubnames=["Appoint", "Time"], i="Name", j="count", sep='_')
       .dropna().reset_index().drop("count", axis=1))
#
Name Appoint Time
0 Sam 23.09.2017 11:00:00
1 Sarah 24.09.2017 12:00:00
2 Steve 23.10.2017 11:00:00
3 Mark 23.09.2017 11:00:00
4 James 23.09.2017 11:00:00
5 Sam 24.09.2017 11:00:00
6 Sarah 27.09.2017 12:00:00
7 Steve 31.10.2017 12:00:00
8 James 26.09.2017 11:00:00
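The same idea extends to the SDAP/RDAP columns the question also asks about; a sketch on the first three rows of the sample, with all four stubs renamed to end in _1/_2 (values transcribed from the question, Sarah's missing RDAP kept as NaN):
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Sam', 'Sarah', 'Steve'],
    'Appoint1': ['23.09.2017', '24.09.2017', '23.10.2017'],
    'Appoint2': ['24.09.2017', '27.09.2017', '31.10.2017'],
    'Appoint1t': ['11:00:00', '12:00:00', '11:00:00'],
    'Appoint2t': ['11:00:00', '12:00:00', '12:00:00'],
    'SDAP1': [3, 2, 5], 'RDAP1': [-9, np.nan, 9],
    'SDAP2': [6, 7, 7], 'RDAP2': [8, 8, 9],
})
df.columns = ['Name', 'Appoint_1', 'Appoint_2', 'Time_1', 'Time_2',
              'SDAP_1', 'RDAP_1', 'SDAP_2', 'RDAP_2']
out = (pd.wide_to_long(df, stubnames=['Appoint', 'Time', 'SDAP', 'RDAP'],
                       i='Name', j='num', sep='_')
       .dropna(subset=['Appoint'])  # keep rows with an appointment, even if RDAP is NaN
       .reset_index()
       .drop(columns='num'))
print(out)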
A B C
0 2002-01-13 10:00:00 Jack 10
1 2002-01-13 10:00:00 Hellen 10
2 2002-01-13 12:00:00 Sibl 14
3 2002-01-13 12:00:00 Steve 14
4 2002-01-18 10:00:00 Ridley 38
5 2002-01-18 10:00:00 Scott 38
6 2002-01-18 12:00:00 Rambo 52
7 2002-01-18 12:00:00 Peter 52
8 2002-02-09 08:00:00 Brad 90
9 2002-02-09 08:00:00 Victoria 90
10 2002-02-09 14:00:00 Caroline 8
11 2002-02-09 14:00:00 Andrea 8
I want to create a new column df['D'] holding a 3-period simple moving average of C, grouped by the A datetime column. If possible, using convolve.
Output should be:
A B C D
0 2002-01-13 10:00:00 Jack 10
1 2002-01-13 10:00:00 Hellen 10
2 2002-01-13 12:00:00 Sibl 14
3 2002-01-13 12:00:00 Steve 14
4 2002-01-18 10:00:00 Ridley 38 20.66
5 2002-01-18 10:00:00 Scott 38 20.66
6 2002-01-18 12:00:00 Rambo 52 34.66
7 2002-01-18 12:00:00 Peter 52 34.66
8 2002-02-09 08:00:00 Brad 90 60.00
9 2002-02-09 08:00:00 Victoria 90 60.00
10 2002-02-09 14:00:00 Caroline 8 50.00
11 2002-02-09 14:00:00 Andrea 8 50.00
Let's try:
df['D'] = df.A.map(df.groupby('A')['C'].mean().rolling(3).mean())
Output:
A B C D
0 2002-01-13 10:00:00 Jack 10 NaN
1 2002-01-13 10:00:00 Hellen 10 NaN
2 2002-01-13 12:00:00 Sibl 14 NaN
3 2002-01-13 12:00:00 Steve 14 NaN
4 2002-01-18 10:00:00 Ridley 38 20.666667
5 2002-01-18 10:00:00 Scott 38 20.666667
6 2002-01-18 12:00:00 Rambo 52 34.666667
7 2002-01-18 12:00:00 Peter 52 34.666667
8 2002-02-09 08:00:00 Brad 90 60.000000
9 2002-02-09 08:00:00 Victoria 90 60.000000
10 2002-02-09 14:00:00 Caroline 8 50.000000
11 2002-02-09 14:00:00 Andrea 8 50.000000
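Since the question asks about convolve: the same result can be computed with np.convolve over the per-date means, using a 3-wide averaging kernel and padding the first two periods with NaN (a sketch, assuming df as above):
import numpy as np
import pandas as pd

daily = df.groupby('A')['C'].mean()      # one value per datetime in A
kernel = np.ones(3) / 3                  # 3-period averaging window
sma = np.convolve(daily.to_numpy(), kernel, mode='valid')
d = pd.Series(np.concatenate([[np.nan, np.nan], sma]), index=daily.index)
df['D'] = df['A'].map(d)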