Python Pandas Faster Rolling Calculation Alternative

Here is the raw data:
date name score
0 2021-01-02 A 100
1 2021-01-03 A 120
2 2021-01-04 A 130
3 2021-01-05 A 115
4 2021-01-06 A 120
5 2021-01-07 A 70
6 2021-01-08 A 60
7 2021-01-09 A 30
8 2021-01-10 A 10
9 2021-01-11 A 100
10 2021-01-02 B 50
11 2021-01-03 B 40
12 2021-01-04 B 80
13 2021-01-05 B 115
14 2021-01-06 B 100
15 2021-01-07 B 50
16 2021-01-08 B 20
17 2021-01-09 B 40
18 2021-01-10 B 120
19 2021-01-11 B 20
20 2021-01-02 C 80
21 2021-01-03 C 100
22 2021-01-04 C 120
23 2021-01-05 C 115
24 2021-01-06 C 90
25 2021-01-07 C 80
26 2021-01-08 C 150
27 2021-01-09 C 200
28 2021-01-10 C 30
29 2021-01-11 C 40
I would like to get the following output, with a new column containing the trailing 3-day average of score for each name. I would also like to add some new columns with logical calculations such as df.score.shift(1) <= 100.
date name score 3_day_average previous_score<=100
0 2021-01-02 A 100 NaN False
1 2021-01-03 A 120 NaN True
2 2021-01-04 A 130 116.666667 False
3 2021-01-05 A 115 121.666667 False
4 2021-01-06 A 120 121.666667 False
5 2021-01-07 A 70 101.666667 False
6 2021-01-08 A 60 83.333333 True
7 2021-01-09 A 30 53.333333 True
8 2021-01-10 A 10 33.333333 True
9 2021-01-11 A 100 46.666667 True
10 2021-01-02 B 50 NaN False
11 2021-01-03 B 40 NaN True
12 2021-01-04 B 80 56.666667 True
13 2021-01-05 B 115 78.333333 True
14 2021-01-06 B 100 98.333333 False
15 2021-01-07 B 50 88.333333 True
16 2021-01-08 B 20 56.666667 True
17 2021-01-09 B 40 36.666667 True
18 2021-01-10 B 120 60.000000 True
19 2021-01-11 B 20 60.000000 False
20 2021-01-02 C 80 NaN False
21 2021-01-03 C 100 NaN True
22 2021-01-04 C 120 100.000000 True
23 2021-01-05 C 115 111.666667 False
24 2021-01-06 C 90 108.333333 False
25 2021-01-07 C 80 95.000000 True
26 2021-01-08 C 150 106.666667 True
27 2021-01-09 C 200 143.333333 False
28 2021-01-10 C 30 126.666667 False
29 2021-01-11 C 40 90.000000 True
I'm currently using df.groupby('name') combined with df.apply and a custom function. How could I improve the execution time with a faster alternative? Thanks in advance!

Use rolling after groupby, and then DataFrameGroupBy.shift:
df['3_day_average'] = (df.groupby('name')['score']
                         .rolling(3)
                         .mean()
                         .reset_index(level=0, drop=True))
df['previous_score<=100'] = df.groupby('name')['score'].shift() <= 100
print(df.head(15))
date name score 3_day_average previous_score<=100
0 2021-01-02 A 100 NaN False
1 2021-01-03 A 120 NaN True
2 2021-01-04 A 130 116.666667 False
3 2021-01-05 A 115 121.666667 False
4 2021-01-06 A 120 121.666667 False
5 2021-01-07 A 70 101.666667 False
6 2021-01-08 A 60 83.333333 True
7 2021-01-09 A 30 53.333333 True
8 2021-01-10 A 10 33.333333 True
9 2021-01-11 A 100 46.666667 True
10 2021-01-02 B 50 NaN False
11 2021-01-03 B 40 NaN True
12 2021-01-04 B 80 56.666667 True
13 2021-01-05 B 115 78.333333 True
14 2021-01-06 B 100 98.333333 False
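An equivalent formulation, offered as a sketch, uses GroupBy.transform, which returns a result already aligned to df's index, so the reset_index step is not needed:
import pandas as pd

# transform keeps the original row index, so the result can be assigned directly
df['3_day_average'] = df.groupby('name')['score'].transform(lambda s: s.rolling(3).mean())
df['previous_score<=100'] = df.groupby('name')['score'].shift() <= 100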

import pandas as pd

data=[(0 ,'2021-01-02','A',100),
(1 ,'2021-01-03','A',120),
(2 ,'2021-01-04','A',130),
(3 ,'2021-01-05','A',115),
(4 ,'2021-01-06','A',120),
(5 ,'2021-01-07','A', 70),
(6 ,'2021-01-08','A', 60),
(7 ,'2021-01-09','A', 30),
(8 ,'2021-01-10','A', 10),
(9 ,'2021-01-11','A',100),
(10 ,'2021-01-02','B', 50),
(11 ,'2021-01-03','B', 40),
(12 ,'2021-01-04','B', 80),
(13 ,'2021-01-05','B',115),
(14 ,'2021-01-06','B',100),
(15 ,'2021-01-07','B', 50),
(16 ,'2021-01-08','B', 20),
(17 ,'2021-01-09','B', 40),
(18 ,'2021-01-10','B',120),
(19 ,'2021-01-11','B', 20),
(20 ,'2021-01-02','C', 80),
(21 ,'2021-01-03','C',100),
(22 ,'2021-01-04','C',120),
(23 ,'2021-01-05','C',115),
(24 ,'2021-01-06','C', 90),
(25 ,'2021-01-07','C', 80),
(26 ,'2021-01-08','C',150),
(27 ,'2021-01-09','C',200),
(28 ,'2021-01-10','C', 30),
(29 ,'2021-01-11','C', 40)]
header=['id','date','name','score']
df=pd.DataFrame(data,columns=header)
# compute the rolling mean within each name group, otherwise values leak across groups
df['3d_rolling_avg'] = (df.groupby('name')['score']
                          .rolling(window=3)
                          .mean()
                          .reset_index(level=0, drop=True))
df['shift'] = df.groupby('name')['score'].shift(1)
# NaN <= 100 evaluates to False, so the first row of each group is handled automatically
df['prev_score_lessthan_100'] = df['shift'] <= 100
print(df)
output:
    id        date name  score  3d_rolling_avg  shift  prev_score_lessthan_100
0    0  2021-01-02    A    100             NaN    NaN                    False
1    1  2021-01-03    A    120             NaN  100.0                     True
2    2  2021-01-04    A    130      116.666667  120.0                    False
3    3  2021-01-05    A    115      121.666667  130.0                    False
4    4  2021-01-06    A    120      121.666667  115.0                    False
5    5  2021-01-07    A     70      101.666667  120.0                    False
6    6  2021-01-08    A     60       83.333333   70.0                     True
7    7  2021-01-09    A     30       53.333333   60.0                     True
8    8  2021-01-10    A     10       33.333333   30.0                     True
9    9  2021-01-11    A    100       46.666667   10.0                     True
10  10  2021-01-02    B     50             NaN    NaN                    False
11  11  2021-01-03    B     40             NaN   50.0                     True
12  12  2021-01-04    B     80       56.666667   40.0                     True
13  13  2021-01-05    B    115       78.333333   80.0                     True
14  14  2021-01-06    B    100       98.333333  115.0                    False
15  15  2021-01-07    B     50       88.333333  100.0                     True
16  16  2021-01-08    B     20       56.666667   50.0                     True
17  17  2021-01-09    B     40       36.666667   20.0                     True
18  18  2021-01-10    B    120       60.000000   40.0                     True
19  19  2021-01-11    B     20       60.000000  120.0                    False
20  20  2021-01-02    C     80             NaN    NaN                    False
21  21  2021-01-03    C    100             NaN   80.0                     True
22  22  2021-01-04    C    120      100.000000  100.0                     True
23  23  2021-01-05    C    115      111.666667  120.0                    False
24  24  2021-01-06    C     90      108.333333  115.0                    False
25  25  2021-01-07    C     80       95.000000   90.0                     True
26  26  2021-01-08    C    150      106.666667   80.0                     True
27  27  2021-01-09    C    200      143.333333  150.0                    False
28  28  2021-01-10    C     30      126.666667  200.0                    False
29  29  2021-01-11    C     40       90.000000   30.0                     True

Related

Generating monthly level data using ffill and bfill on multiple columns of a log file

I have a log file in the following format:
Item  Month_end_date  old_price  new_price  row
A     2022-03-31      25         30         1
A     2022-06-30      30         40         2
A     2022-08-31      40         45         3
B     2022-04-30      80         70         4
Here, it's assumed (from the 1st row of the table above) that the price of item A from the start of the year was 25. I want to derive monthly prices from this table. The ideal output looks like the table below:
Item  Month_end_date  price
A     2022-01-31      25
A     2022-02-28      25
A     2022-03-31      30
A     2022-04-30      30
A     2022-05-31      30
A     2022-06-30      40
A     2022-07-31      40
A     2022-08-31      45
A     2022-09-30      45
A     2022-10-31      45
A     2022-11-30      45
A     2022-12-31      45
B     2022-01-31      80
B     2022-02-28      80
B     2022-03-31      80
B     2022-04-30      70
B     2022-05-31      70
B     2022-06-30      70
B     2022-07-31      70
B     2022-08-31      70
B     2022-09-30      70
B     2022-10-31      70
B     2022-11-30      70
B     2022-12-31      70
IIUC, you can reshape, fill in the missing periods and ffill/bfill per group:
(df
 .assign(**{'Month_end_date': pd.to_datetime(df['Month_end_date'])})
 .set_index(['Item', 'Month_end_date'])
 [['old_price', 'new_price']]
 .reindex(pd.MultiIndex.from_product([df['Item'].unique(),
                                      pd.date_range('2022-01-01', '2022-12-31', freq='M')],
                                     names=['Items', 'Month_end_date']))
 .stack(dropna=False)
 .groupby(level=0).apply(lambda g: g.ffill().bfill())
 .unstack()['new_price']
 .reset_index(name='price')
)
output:
Items Month_end_date price
0 A 2022-01-31 25.0
1 A 2022-02-28 25.0
2 A 2022-03-31 30.0
3 A 2022-04-30 30.0
4 A 2022-05-31 30.0
5 A 2022-06-30 40.0
6 A 2022-07-31 40.0
7 A 2022-08-31 45.0
8 A 2022-09-30 45.0
9 A 2022-10-31 45.0
10 A 2022-11-30 45.0
11 A 2022-12-31 45.0
12 B 2022-01-31 80.0
13 B 2022-02-28 80.0
14 B 2022-03-31 80.0
15 B 2022-04-30 70.0
16 B 2022-05-31 70.0
17 B 2022-06-30 70.0
18 B 2022-07-31 70.0
19 B 2022-08-31 70.0
20 B 2022-09-30 70.0
21 B 2022-10-31 70.0
22 B 2022-11-30 70.0
23 B 2022-12-31 70.0
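A more explicit per-item alternative, offered as a sketch (monthly_prices is a hypothetical helper; it forward-fills each record's new_price and falls back to the earliest old_price for months before the first record):
import pandas as pd

df['Month_end_date'] = pd.to_datetime(df['Month_end_date'])
months = pd.date_range('2022-01-31', '2022-12-31', freq='M')

def monthly_prices(g):
    g = g.set_index('Month_end_date').reindex(months)
    # carry each new_price forward; months before the first record
    # fall back to that record's old_price
    return g['new_price'].ffill().fillna(g['old_price'].bfill())

out = (pd.concat({item: monthly_prices(g) for item, g in df.groupby('Item')},
                 names=['Item', 'Month_end_date'])
         .reset_index(name='price'))
print(out)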

Python: Pandas merge three dataframes on date, keeping all dates [duplicate]

This question already has answers here: Merge multiple DataFrames Pandas (5 answers) and Pandas Merging 101 (8 answers). Closed 7 months ago.
I have three dataframes
Dataframe df1:
date A
0 2022-04-11 1
1 2022-04-12 2
2 2022-04-14 26
3 2022-04-16 2
4 2022-04-17 1
5 2022-04-20 17
6 2022-04-21 14
7 2022-04-22 1
8 2022-04-23 9
9 2022-04-24 1
10 2022-04-25 5
11 2022-04-26 2
12 2022-04-27 21
13 2022-04-28 9
14 2022-04-29 17
15 2022-04-30 5
16 2022-05-01 8
17 2022-05-07 1241217
18 2022-05-08 211
19 2022-05-09 1002521
20 2022-05-10 488739
21 2022-05-11 12925
22 2022-05-12 57
23 2022-05-13 8515098
24 2022-05-14 1134576
Dataframe df2:
date B
0 2022-04-12 8
1 2022-04-14 7
2 2022-04-16 2
3 2022-04-19 2
4 2022-04-23 2
5 2022-05-07 2
6 2022-05-08 5
7 2022-05-09 2
8 2022-05-14 1
Dataframe df3:
date C
0 2022-04-12 6
1 2022-04-13 1
2 2022-04-14 2
3 2022-04-20 3
4 2022-04-21 9
5 2022-04-22 25
6 2022-04-23 56
7 2022-04-24 49
8 2022-04-25 68
9 2022-04-26 71
10 2022-04-27 40
11 2022-04-28 44
12 2022-04-29 27
13 2022-04-30 34
14 2022-05-01 28
15 2022-05-07 9
16 2022-05-08 20
17 2022-05-09 24
18 2022-05-10 21
19 2022-05-11 8
20 2022-05-12 8
21 2022-05-13 14
22 2022-05-14 25
23 2022-05-15 43
24 2022-05-16 36
25 2022-05-17 29
26 2022-05-18 28
27 2022-05-19 17
28 2022-05-20 6
I would like to merge df1, df2, df3 into a single dataframe with columns date, A, B, C, such that date contains every date that appears in df1, df2, or df3 (without repetition), and if a particular date is missing from one of the dataframes, the respective column gets the value 0.0. So, I would like to have something like this:
date A B C
0 2022-04-11 1.0 0.0 0.0
1 2022-04-12 2.0 8.0 6.0
2 2022-04-13 0.0 0.0 1.0
...
I tried to use this method
merge1 = pd.merge(df1, df2, how='outer')
sorted_merge1 = merge1.sort_values(by=['date'], ascending=False)
full_merge = pd.merge(sorted_merge1, df3, how='outer')
However, it seems it skips the dates which are not common for all three dataframes.
Try this,
print(pd.merge(df1, df2, on='date', how='outer').merge(df3, on='date', how='outer').fillna(0))
Output:
date A B C
0 2022-04-11 1.0 0.0 0.0
1 2022-04-12 2.0 8.0 6.0
2 2022-04-14 26.0 7.0 2.0
3 2022-04-16 2.0 2.0 0.0
4 2022-04-17 1.0 0.0 0.0
5 2022-04-20 17.0 0.0 3.0
6 2022-04-21 14.0 0.0 9.0
7 2022-04-22 1.0 0.0 25.0
8 2022-04-23 9.0 2.0 56.0
9 2022-04-24 1.0 0.0 49.0
10 2022-04-25 5.0 0.0 68.0
11 2022-04-26 2.0 0.0 71.0
12 2022-04-27 21.0 0.0 40.0
13 2022-04-28 9.0 0.0 44.0
14 2022-04-29 17.0 0.0 27.0
15 2022-04-30 5.0 0.0 34.0
16 2022-05-01 8.0 0.0 28.0
17 2022-05-07 1241217.0 2.0 9.0
18 2022-05-08 211.0 5.0 20.0
19 2022-05-09 1002521.0 2.0 24.0
20 2022-05-10 488739.0 0.0 21.0
21 2022-05-11 12925.0 0.0 8.0
22 2022-05-12 57.0 0.0 8.0
23 2022-05-13 8515098.0 0.0 14.0
24 2022-05-14 1134576.0 1.0 25.0
25 2022-04-19 0.0 2.0 0.0
26 2022-04-13 0.0 0.0 1.0
27 2022-05-15 0.0 0.0 43.0
28 2022-05-16 0.0 0.0 36.0
29 2022-05-17 0.0 0.0 29.0
30 2022-05-18 0.0 0.0 28.0
31 2022-05-19 0.0 0.0 17.0
32 2022-05-20 0.0 0.0 6.0
Perform a merge chain and fill NaN with 0.
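For an arbitrary number of frames, the same merge chain can be written with functools.reduce, a minimal sketch:
from functools import reduce
import pandas as pd

dfs = [df1, df2, df3]
merged = reduce(lambda left, right: pd.merge(left, right, on='date', how='outer'), dfs)
# outer merges append non-matching rows at the end, so sort by date afterwards
merged = merged.fillna(0).sort_values('date', ignore_index=True)
print(merged)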

Pandas: Insert missing date values with multiple IDs

I have a pandas dataframe with 1.7 million rows, like this:
ID  date        value
10  2022-01-01  100
10  2022-01-02  150
10  2022-01-05  200
10  2022-01-07  150
10  2022-01-12  100
23  2022-02-01  490
23  2022-02-03  350
23  2022-02-04  333
23  2022-02-08  211
23  2022-02-09  100
I would like to insert the missing dates in the column date. Like this:
ID  date        value
10  2022-01-01  100
10  2022-01-02  150
10  2022-01-03  0
10  2022-01-04  0
10  2022-01-05  200
10  2022-01-06  0
10  2022-01-07  150
10  2022-01-08  0
10  2022-01-09  0
10  2022-01-10  0
10  2022-01-11  0
10  2022-01-12  100
23  2022-02-01  490
23  2022-02-02  0
23  2022-02-03  350
23  2022-02-04  333
23  2022-02-05  0
23  2022-02-06  0
23  2022-02-07  0
23  2022-02-08  211
23  2022-02-09  100
I used:
s = pd.MultiIndex.from_tuples([[x, d]
                               for x, y in df.groupby("Id")["Dt"]
                               for d in pd.date_range(min(y), max(df["Dt"]), freq="MS")],
                              names=["Id", "Dt"])
print(df.set_index(["Id", "Dt"]).reindex(s, fill_value=0).reset_index())
But it took too long. Is there a more performant way to do this?
You can try:
df['date'] = pd.to_datetime(df['date'])
df = (df.groupby('ID')['date']
        .apply(lambda d: pd.date_range(start=d.min(), end=d.max()).to_list())
        .explode().reset_index()
        .merge(df, on=['ID', 'date'], how='left'))
df['value'] = df['value'].fillna(0).astype(int)
Output:
ID date value
0 10 2022-01-01 100
1 10 2022-01-02 150
2 10 2022-01-03 0
3 10 2022-01-04 0
4 10 2022-01-05 200
5 10 2022-01-06 0
6 10 2022-01-07 150
7 10 2022-01-08 0
8 10 2022-01-09 0
9 10 2022-01-10 0
10 10 2022-01-11 0
11 10 2022-01-12 100
12 23 2022-02-01 490
13 23 2022-02-02 0
14 23 2022-02-03 350
15 23 2022-02-04 333
16 23 2022-02-05 0
17 23 2022-02-06 0
18 23 2022-02-07 0
19 23 2022-02-08 211
20 23 2022-02-09 100
Alternatively, use asfreq and fillna. Note that this variant does not group by ID, so the gap between the two IDs' date ranges is also filled, with the forward-filled ID attached to those rows:
# convert to datetime if needed
df["date"] = pd.to_datetime(df["date"])
df = df.set_index("date").asfreq("D").fillna({"value": 0}).ffill().reset_index()
>>> df
         date    ID  value
0  2022-01-01  10.0  100.0
1  2022-01-02  10.0  150.0
2  2022-01-03  10.0    0.0
3  2022-01-04  10.0    0.0
4  2022-01-05  10.0  200.0
5  2022-01-06  10.0    0.0
6  2022-01-07  10.0  150.0
7  2022-01-08  10.0    0.0
8  2022-01-09  10.0    0.0
9  2022-01-10  10.0    0.0
10 2022-01-11  10.0    0.0
11 2022-01-12  10.0  100.0
12 2022-01-13  10.0    0.0
13 2022-01-14  10.0    0.0
14 2022-01-15  10.0    0.0
15 2022-01-16  10.0    0.0
16 2022-01-17  10.0    0.0
17 2022-01-18  10.0    0.0
18 2022-01-19  10.0    0.0
19 2022-01-20  10.0    0.0
20 2022-01-21  10.0    0.0
21 2022-01-22  10.0    0.0
22 2022-01-23  10.0    0.0
23 2022-01-24  10.0    0.0
24 2022-01-25  10.0    0.0
25 2022-01-26  10.0    0.0
26 2022-01-27  10.0    0.0
27 2022-01-28  10.0    0.0
28 2022-01-29  10.0    0.0
29 2022-01-30  10.0    0.0
30 2022-01-31  10.0    0.0
31 2022-02-01  23.0  490.0
32 2022-02-02  23.0    0.0
33 2022-02-03  23.0  350.0
34 2022-02-04  23.0  333.0
35 2022-02-05  23.0    0.0
36 2022-02-06  23.0    0.0
37 2022-02-07  23.0    0.0
38 2022-02-08  23.0  211.0
39 2022-02-09  23.0  100.0
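To get the per-ID behavior the question asks for, the same idea can be applied within each group, a sketch assuming date is unique within each ID:
df['date'] = pd.to_datetime(df['date'])
out = (df.set_index('date')
         .groupby('ID')['value']
         .apply(lambda s: s.asfreq('D', fill_value=0))  # fill each ID's own date range
         .reset_index())
print(out)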

How to conditionally aggregate values of previous rows of Pandas DataFrame?

I have the following example Pandas DataFrame
UserID Total Date
1 20 2019-01-01
1 18 2019-01-04
1 22 2019-01-05
1 16 2019-01-07
1 17 2019-01-09
1 26 2019-01-11
1 30 2019-01-12
1 28 2019-01-13
1 28 2019-01-15
1 28 2019-01-16
2 22 2019-01-06
2 11 2019-01-07
2 23 2019-01-09
2 14 2019-01-13
2 19 2019-01-14
2 29 2019-01-15
2 21 2019-01-16
2 22 2019-01-18
2 30 2019-01-22
2 16 2019-01-23
3 27 2019-01-01
3 13 2019-01-04
3 12 2019-01-05
3 27 2019-01-06
3 26 2019-01-09
3 26 2019-01-10
3 30 2019-01-11
3 19 2019-01-12
3 27 2019-01-13
3 29 2019-01-14
4 29 2019-01-07
4 12 2019-01-09
4 25 2019-01-10
4 11 2019-01-11
4 19 2019-01-13
4 20 2019-01-14
4 33 2019-01-15
4 24 2019-01-18
4 22 2019-01-19
4 24 2019-01-21
My goal is to add a column named TotalPrev10Days, which is the sum of Total over the previous 10 days (for each UserID).
I did a basic implementation using nested loops and comparing the current date with a timedelta.
Here's my code:
from datetime import timedelta

users = set(df.UserID)  # get set of all unique user IDs
TotalPrev10Days = []
delta = timedelta(days=10)  # 10 day time delta to subtract from each row date
for user in users:  # looping over all user IDs
    user_df = df[df["UserID"] == user]  # dataframe with only the current user's data
    for row_index in user_df.index:  # looping over each row of the user's dataframe
        row_date = user_df["Date"][row_index]
        row_date_minus_10 = row_date - delta  # subtracting 10 days
        sum_prev_10_days = user_df[(user_df["Date"] < row_date) &
                                   (user_df["Date"] >= row_date_minus_10)]["Total"].sum()
        TotalPrev10Days.append(sum_prev_10_days)  # appending total to a list
df["TotalPrev10Days"] = TotalPrev10Days  # assigning list to new DataFrame column
While it works perfectly, it's very slow for large datasets.
Is there a faster, more Pandas-native approach to this problem?
IIUC, try (note the shift has to happen within each UserID group, otherwise values leak across users at group boundaries):
df["Date"] = pd.to_datetime(df["Date"])  # needed for the time-based window
df["TotalPrev10Days"] = (df.groupby("UserID")
                           .rolling("9D", on="Date")
                           .sum()["Total"]
                           .groupby(level="UserID")
                           .shift()
                           .fillna(0)
                           .droplevel(0))
>>> df
UserID Total Date TotalPrev10Days
0 1 20 2019-01-01 0.0
1 1 18 2019-01-04 20.0
2 1 22 2019-01-05 38.0
3 1 16 2019-01-07 60.0
4 1 17 2019-01-09 76.0
5 1 26 2019-01-11 93.0
6 1 30 2019-01-12 99.0
7 1 28 2019-01-13 129.0
8 1 28 2019-01-15 139.0
9 1 28 2019-01-16 145.0
10 2 22 2019-01-06 0.0
11 2 11 2019-01-07 22.0
12 2 23 2019-01-09 33.0
13 2 14 2019-01-13 56.0
14 2 19 2019-01-14 70.0
15 2 29 2019-01-15 89.0
16 2 21 2019-01-16 96.0
17 2 22 2019-01-18 106.0
18 2 30 2019-01-22 105.0
19 2 16 2019-01-23 121.0
20 3 27 2019-01-01 0.0
21 3 13 2019-01-04 27.0
22 3 12 2019-01-05 40.0
23 3 27 2019-01-06 52.0
24 3 26 2019-01-09 79.0
25 3 26 2019-01-10 105.0
26 3 30 2019-01-11 104.0
27 3 19 2019-01-12 134.0
28 3 27 2019-01-13 153.0
29 3 29 2019-01-14 167.0
30 4 29 2019-01-07 0.0
31 4 12 2019-01-09 29.0
32 4 25 2019-01-10 41.0
33 4 11 2019-01-11 66.0
34 4 19 2019-01-13 77.0
35 4 20 2019-01-14 96.0
36 4 33 2019-01-15 116.0
37 4 24 2019-01-18 149.0
38 4 22 2019-01-19 132.0
39 4 24 2019-01-21 129.0
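Note that the 9-day window shifted by one row is an approximation of the loop's definition (Date >= row_date - 10 days and Date < row_date). If the exact window matters, a time-based rolling with closed="left" expresses it directly, a sketch assuming a pandas version where closed is supported for grouped time-based windows:
df["Date"] = pd.to_datetime(df["Date"])
df["TotalPrev10Days"] = (df.groupby("UserID")
                           .rolling("10D", on="Date", closed="left")  # window [date-10d, date)
                           .sum()["Total"]
                           .fillna(0)   # the first row of each group has an empty window
                           .droplevel(0))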

How to join a table with each group of a dataframe in pandas

I have a dataframe like below. Each date is Monday of each week.
df = pd.DataFrame({'date': ['2020-04-20', '2020-05-11', '2020-05-18',
                            '2020-04-20', '2020-04-27', '2020-05-04', '2020-05-18'],
                   'name': ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'count': [23, 44, 125, 6, 9, 10, 122]})
date name count
0 2020-04-20 A 23
1 2020-05-11 A 44
2 2020-05-18 A 125
3 2020-04-20 B 6
4 2020-04-27 B 9
5 2020-05-04 B 10
6 2020-05-18 B 122
Neither 'A' nor 'B' covers the whole date range. Both of them have some missing dates, which means the count for that week is 0. Below are all the dates:
df_dates = pd.DataFrame({'date': ['2020-04-20', '2020-04-27', '2020-05-04', '2020-05-11', '2020-05-18']})
So what I need is like the dataframe below:
date name count
0 2020-04-20 A 23
1 2020-04-27 A 0
2 2020-05-04 A 0
3 2020-05-11 A 44
4 2020-05-18 A 125
5 2020-04-20 B 6
6 2020-04-27 B 9
7 2020-05-04 B 10
8 2020-05-11 B 0
9 2020-05-18 B 122
It seems like I need to join (merge) df_dates with df for each name group (A and B) and then fill the missing name and count values with 0's. Does anyone know how to achieve that, i.e. how to join another table with each group of a grouped table?
I tried and no luck...
pd.merge(df_dates, df.groupby('name'), how='left', on='date')
We can do a reindex with a manually created MultiIndex:
idx = pd.MultiIndex.from_product([df_dates.date, df.name.unique()], names=['date', 'name'])
s = df.set_index(['date', 'name']).reindex(idx, fill_value=0).reset_index().sort_values('name')
Out[136]:
date name count
0 2020-04-20 A 23
2 2020-04-27 A 0
4 2020-05-04 A 0
6 2020-05-11 A 44
8 2020-05-18 A 125
1 2020-04-20 B 6
3 2020-04-27 B 9
5 2020-05-04 B 10
7 2020-05-11 B 0
9 2020-05-18 B 122
Or
s = df.pivot(*df.columns).reindex(df_dates.date).fillna(0).reset_index().melt('date')
Out[145]:
date name value
0 2020-04-20 A 23.0
1 2020-04-27 A 0.0
2 2020-05-04 A 0.0
3 2020-05-11 A 44.0
4 2020-05-18 A 125.0
5 2020-04-20 B 6.0
6 2020-04-27 B 9.0
7 2020-05-04 B 10.0
8 2020-05-11 B 0.0
9 2020-05-18 B 122.0
If you just want to fill in the union of dates present in df, you can do:
(df.set_index(['date', 'name'])
   .unstack('date', fill_value=0)
   .stack()
   .reset_index()
)
Output:
name date count
0 A 2020-04-20 23
1 A 2020-04-27 0
2 A 2020-05-04 0
3 A 2020-05-11 44
4 A 2020-05-18 125
5 B 2020-04-20 6
6 B 2020-04-27 9
7 B 2020-05-04 10
8 B 2020-05-11 0
9 B 2020-05-18 122
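The join the question literally asks for can also be spelled out as a merge, a sketch assuming pandas >= 1.2 for how='cross':
out = (df_dates.merge(df[['name']].drop_duplicates(), how='cross')  # every date x every name
               .merge(df, on=['date', 'name'], how='left')
               .fillna({'count': 0}))
print(out)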
