Calculate moving average of residuals iteratively in pandas - python

I have a dataframe of this form
Residual = Actual - Pred
Actual Pred Residual
0 11 10 1
1 12 10 2
2 13 10 3
3 14 10 4
4 15 10 5
5 16 10 6
6 17 10 7
7 18 10 8
8 19 10 9
I want to calculate the 3-day moving average of the residuals and add it to the Pred column, then recalculate the residual and repeat the process for the next day, iteratively, as shown in the df below.
For example:
For index=3, the MA of the previous 3 days' residuals is (1+2+3)/3 = 2. We add this value to today's prediction, which becomes 12, and the new residual is 14-12 = 2.
Now, for index=4, we take the 3-day MA of the latest Residual_New values, i.e. (2+3+2)/3 ≈ 2.33. So Pred_New = 12.33 and Residual_New = 15-12.33 = 2.67, and so on.
Actual Pred Residual Pred_New Residual_New
0 11 10 1 10 1
1 12 10 2 10 2
2 13 10 3 10 3
3 14 10 4 10+2 2
4 15 10 5 10+2.33 2.67
5 16 10 6 ........
6 17 10 7 .......
7 18 10 8
8 19 10 9
How can I achieve this effectively in pandas?
Thanks
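Since each new residual depends on the previously recomputed residuals, a plain rolling() call can't produce this in one shot; a simple loop is the usual approach. A minimal sketch (column names taken from the example above, window size assumed to be 3):
import pandas as pd

df = pd.DataFrame({"Actual": range(11, 20), "Pred": [10] * 9})
df["Residual"] = df["Actual"] - df["Pred"]

window = 3
pred_new = df["Pred"].astype(float).tolist()
resid_new = df["Residual"].astype(float).tolist()

for i in range(window, len(df)):
    ma = sum(resid_new[i - window:i]) / window       # MA of the last 3 recomputed residuals
    pred_new[i] = df.at[i, "Pred"] + ma              # adjusted prediction
    resid_new[i] = df.at[i, "Actual"] - pred_new[i]  # recomputed residual

df["Pred_New"] = pred_new
df["Residual_New"] = resid_new
print(df)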

Related

Python Pandas - sumif in excel - criteria and range in same df column

I've been trying to code the Python equivalent of Excel's SUMIF.
Excel:
Sumif($A$1:$A$20,A1,$C$1:$C$20)
Pandas df:
A C Term
1 10 1
1 20 2
1 10 3
1 10 4
2 30 5
2 30 6
2 30 7
3 20 8
3 10 9
3 10 10
3 10 11
3 10 12
Output df - I want the output df with 'fwdSum' as follows:
A C Term fwdSum
1 10 1 50
1 20 2 50
1 10 3 50
1 10 4 50
2 30 5 90
2 30 6 90
2 30 7 90
3 20 8 60
3 10 9 60
3 10 10 60
3 10 11 60
3 10 12 60
I tried creating another df with groupby and sum, and then merging it back later.
Can anyone please suggest the best way to achieve this?
df['fwdSum'] = df.groupby('A')['C'].transform('sum')
print(df)
Prints:
A C Term fwdSum
0 1 10 1 50
1 1 20 2 50
2 1 10 3 50
3 1 10 4 50
4 2 30 5 90
5 2 30 6 90
6 2 30 7 90
7 3 20 8 60
8 3 10 9 60
9 3 10 10 60
10 3 10 11 60
11 3 10 12 60
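For comparison, the groupby-plus-merge route the asker mentioned would look roughly like this (a sketch; transform is the more direct option because it keeps the original shape without a merge):
sums = df.groupby('A', as_index=False)['C'].sum().rename(columns={'C': 'fwdSum'})
out = df[['A', 'C', 'Term']].merge(sums, on='A')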

Pandas Business Day Offset: Request for Simple Example

I have a dataframe, "df", with a datetime index. Here is a rough snapshot of its contents:
V1 V2 V3 V4 V5
1/12/2008 4 15 11 7 1
1/13/2008 5 2 8 7 1
1/14/2008 13 13 9 6 4
1/15/2008 14 15 12 9 3
1/16/2008 1 10 2 12 15
1/17/2008 10 5 9 9 1
1/18/2008 13 11 5 7 2
1/19/2008 2 6 7 9 6
1/20/2008 5 4 14 3 7
1/21/2008 11 11 4 7 15
1/22/2008 9 4 15 10 3
1/23/2008 2 13 13 10 3
1/24/2008 12 15 14 12 8
1/25/2008 1 4 2 6 15
Some of the days in the index are weekends and holidays.
I would like to move all dates, in the datetime index of "df", to their respective closest (US) business day (i.e. Mon-Friday, excluding holidays).
How would you recommend for me to do this? I am aware that Pandas has a "timeseries offset" facility for this. But, I haven't been able to find an example that walks a novice reader through this.
Can you help?
I am not familiar with this class, but after looking at the source code it seems fairly straightforward to achieve this. Keep in mind that it picks the next closest business day, meaning Saturday turns into Monday rather than Friday. Also, making your index non-unique will decrease performance on your DataFrame, so I suggest assigning these values to a new column.
The one prerequisite is that your index must be one of these three types: datetime, timedelta, or pd.tseries.offsets.Tick.
offset = pd.tseries.offsets.CustomBusinessDay(n=0)
df.assign(
    closest_business_day=df.index.to_series().apply(offset)
)
V1 V2 V3 V4 V5 closest_business_day
2008-01-12 4 15 11 7 1 2008-01-14
2008-01-13 5 2 8 7 1 2008-01-14
2008-01-14 13 13 9 6 4 2008-01-14
2008-01-15 14 15 12 9 3 2008-01-15
2008-01-16 1 10 2 12 15 2008-01-16
2008-01-17 10 5 9 9 1 2008-01-17
2008-01-18 13 11 5 7 2 2008-01-18
2008-01-19 2 6 7 9 6 2008-01-21
2008-01-20 5 4 14 3 7 2008-01-21
2008-01-21 11 11 4 7 15 2008-01-21
2008-01-22 9 4 15 10 3 2008-01-22
2008-01-23 2 13 13 10 3 2008-01-23
2008-01-24 12 15 14 12 8 2008-01-24
2008-01-25 1 4 2 6 15 2008-01-25
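Note that CustomBusinessDay only skips weekends unless it is given a holiday calendar; to also exclude US federal holidays (an assumption about what the asker means by holidays), something along these lines should work:
from pandas.tseries.holiday import USFederalHolidayCalendar

# pass a holiday calendar so holidays are rolled forward as well as weekends
offset = pd.tseries.offsets.CustomBusinessDay(n=0, calendar=USFederalHolidayCalendar())
df = df.assign(closest_business_day=df.index.to_series().apply(offset))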

How to rolling non-overlapping window in pandas

My dataframe looks like:
c1
0 10
1 11
2 12
3 13
4 14
5 15
6 16
7 17
I want to find the minimum for every 3 rows, which looks like:
c1 min
0 10 10
1 11 10
2 12 10
3 13 13
4 14 13
5 15 13
6 16 16
7 17 16
and the number of rows might not be divisible by 3. I can't achieve it with the rolling function.
If the index holds default values, use integer division by 3 and pass the result to GroupBy.transform with min:
df['min'] = df['c1'].groupby(df.index // 3).transform('min')
Or, for any index, generate a helper array with np.arange:
import numpy as np

df['min'] = df['c1'].groupby(np.arange(len(df)) // 3).transform('min')
print (df)
c1 min
0 10 10
1 11 10
2 12 10
3 13 13
4 14 13
5 15 13
6 16 16
7 17 16
You can also do this (it relies on c1 increasing within each block, so the first value of a block is also its minimum):
>>> df['min'] = df['c1'][::3]
>>> df.ffill().astype(int)
c1 min
0 10 10
1 11 10
2 12 10
3 13 13
4 14 13
5 15 13
6 16 16
7 17 16
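Another possible sketch (not from the answers above): NumPy's minimum.reduceat computes a per-block minimum directly, and np.repeat broadcasts it back, handling a short final block automatically:
import numpy as np

arr = df['c1'].to_numpy()
starts = np.arange(0, len(arr), 3)            # block start positions: 0, 3, 6, ...
block_min = np.minimum.reduceat(arr, starts)  # minimum of each block
sizes = np.diff(np.append(starts, len(arr)))  # block lengths (last one may be < 3)
df['min'] = np.repeat(block_min, sizes)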

Finding all simple cycles in undirected graphs

I am trying to implement a task of finding all simple cycles in undirected graph. Originally, the task was to find all cycles of fixed length (= 3), and I've managed to do it using the properties of adjacency matrices. But before using that approach I was also trying to use DFS and it worked correctly for really small input sizes, but for bigger inputs it was going crazy, ending with (nearly) infinite loops. I tried to fix the code, but then it just could not find all the cycles.
My code is attached below.
1. Please do not pay attention to the several global variables used. The working code using the other approach has already been submitted; this one is just for me to see how to make DFS work properly.
2. Yes, I searched for this problem before posting this question, but the options I managed to find either used a different approach or were only about detecting whether any cycle exists. Besides, I want to know if it is possible to fix my code.
Big thanks to anyone who can help.
num_res = 0
adj_list = []
cycles_list = []

def dfs(v, path):
    global num_res
    for node in adj_list[v]:
        if node not in path:
            dfs(node, path + [node])
        elif len(path) >= 3 and (node == path[-3]):
            if sorted(path[-3:]) not in cycles_list:
                cycles_list.append(sorted(path[-3:]))
                num_res += 1

if __name__ == "__main__":
    num_towns, num_pairs = [int(x) for x in input().split()]
    adj_list = [[] for x in range(num_towns)]
    adj_matrix = [[0 for x in range(num_towns)] for x in range(num_towns)]
    # EDGE LIST TO ADJACENCY LIST
    for i in range(num_pairs):
        cur_start, cur_end = [int(x) for x in input().split()]
        adj_list[cur_start].append(cur_end)
        adj_list[cur_end].append(cur_start)
    dfs(0, [0])
    print(num_res)
UPD: Works OK for the following inputs:
5 8
4 0
0 2
0 1
3 2
4 3
4 2
1 3
3 0
(output: 5)
6 15
5 4
2 0
3 1
5 1
4 1
5 3
1 0
4 0
4 3
5 2
2 1
3 0
3 2
5 0
4 2
(output: 20)
9 12
0 1
0 2
1 3
1 4
2 4
2 5
3 6
4 6
4 7
5 7
6 8
7 8
(output: 0)
For the following input it does NOT give any output and just keeps looping.
22 141
5 0
12 9
18 16
7 6
7 0
4 1
16 1
8 1
6 1
14 0
16 0
11 9
20 14
12 3
18 3
1 0
17 0
17 15
14 5
17 13
6 5
18 12
21 1
13 4
18 11
18 13
8 0
15 9
21 18
13 6
12 8
16 13
20 18
21 3
11 6
15 14
13 5
17 5
10 8
9 5
16 14
19 9
7 5
14 10
16 4
18 7
12 1
16 3
19 18
19 17
20 2
12 11
15 3
15 11
13 2
10 7
15 13
10 9
7 3
14 3
10 1
21 19
9 2
21 4
19 0
18 1
10 6
15 0
20 7
14 11
19 6
18 10
7 4
16 10
9 4
13 3
12 2
4 3
17 7
15 8
13 7
21 14
4 2
21 0
20 16
18 8
20 12
14 2
13 1
16 15
17 11
17 16
20 10
15 7
14 1
13 0
17 12
18 5
12 4
15 1
16 9
9 1
17 14
16 2
12 5
20 8
19 2
18 4
19 4
19 11
15 12
14 12
11 8
17 10
18 14
12 7
16 8
20 11
8 7
18 9
6 4
11 5
17 6
5 3
15 10
20 19
15 6
19 10
20 13
9 3
13 9
13 10
21 7
19 13
19 12
19 14
6 3
21 15
21 6
17 3
10 5
(output should be 343)
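For what it's worth, one way to keep the search from blowing up on dense inputs is to never build paths longer than 3 vertices, since only triangles are wanted. A minimal sketch (not the asker's exact fix; it counts each triangle once by canonicalising it as a sorted tuple):
def count_triangles(adj_list):
    triangles = set()
    for start in range(len(adj_list)):
        for a in adj_list[start]:
            for b in adj_list[a]:
                # close the triangle start - a - b - start
                if b != start and start in adj_list[b]:
                    triangles.add(tuple(sorted((start, a, b))))
    return len(triangles)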

Duplicating pandas dataframe vertically

I have the following dataframe:
Month Day season
0 4 15 current
1 4 16 current
2 4 17 current
3 4 18 current
4 4 19 current
5 4 20 current
I would like to duplicate it like so:
Month Day season
0 4 15 current
1 4 16 current
2 4 17 current
3 4 18 current
4 4 19 current
5 4 20 current
6 4 15 past
7 4 16 past
8 4 17 past
9 4 18 past
10 4 19 past
11 4 20 past
I can duplicate it using:
df.append([df]*2,ignore_index=True)
However, how do I duplicate it so that the season column has 'past' as the duplicated values instead of 'current'?
I think this would be a good case for assign, since it allows you to keep your functional programming style (I approve!):
In [144]: df.append([df.assign(season='past')]*2,ignore_index=True)
Out[144]:
Month Day season
0 4 15 current
1 4 16 current
2 4 17 current
3 4 18 current
4 4 19 current
5 4 20 current
6 4 15 past
7 4 16 past
8 4 17 past
9 4 18 past
10 4 19 past
11 4 20 past
12 4 15 past
13 4 16 past
14 4 17 past
15 4 18 past
16 4 19 past
17 4 20 past
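Note that DataFrame.append was later deprecated and removed in pandas 2.0, so in current pandas the same idea is usually written with pd.concat; a single copy is enough to match the 12-row output shown in the question:
import pandas as pd

# one original block plus one 'past' block
out = pd.concat([df, df.assign(season='past')], ignore_index=True)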
