I have a dataframe, "df", with a datetime index. Here is a rough snapshot of its contents:
V1 V2 V3 V4 V5
1/12/2008 4 15 11 7 1
1/13/2008 5 2 8 7 1
1/14/2008 13 13 9 6 4
1/15/2008 14 15 12 9 3
1/16/2008 1 10 2 12 15
1/17/2008 10 5 9 9 1
1/18/2008 13 11 5 7 2
1/19/2008 2 6 7 9 6
1/20/2008 5 4 14 3 7
1/21/2008 11 11 4 7 15
1/22/2008 9 4 15 10 3
1/23/2008 2 13 13 10 3
1/24/2008 12 15 14 12 8
1/25/2008 1 4 2 6 15
Some of the days in the index are weekends and holidays.
I would like to move every date in the datetime index of "df" to its closest (US) business day (i.e. Monday-Friday, excluding holidays).
How would you recommend I do this? I am aware that pandas has a "timeseries offset" facility for this, but I haven't been able to find an example that walks a novice reader through it.
Can you help?
I am not familiar with this class, but after looking at the source code it seems fairly straightforward to achieve. Keep in mind that it picks the next closest business day, meaning Saturday rolls forward to Monday rather than back to Friday. Also, making your index non-unique will degrade your DataFrame's performance, so I suggest assigning these values to a new column instead of replacing the index.
The one prerequisite is that your index is one of these three types: datetime, timedelta, or pd.tseries.offsets.Tick.
import pandas as pd

# n=0 leaves dates that are already business days unchanged and rolls
# non-business days forward to the next business day
offset = pd.tseries.offsets.CustomBusinessDay(n=0)

df.assign(
    closest_business_day=df.index.to_series().apply(lambda d: d + offset)
)
V1 V2 V3 V4 V5 closest_business_day
2008-01-12 4 15 11 7 1 2008-01-14
2008-01-13 5 2 8 7 1 2008-01-14
2008-01-14 13 13 9 6 4 2008-01-14
2008-01-15 14 15 12 9 3 2008-01-15
2008-01-16 1 10 2 12 15 2008-01-16
2008-01-17 10 5 9 9 1 2008-01-17
2008-01-18 13 11 5 7 2 2008-01-18
2008-01-19 2 6 7 9 6 2008-01-21
2008-01-20 5 4 14 3 7 2008-01-21
2008-01-21 11 11 4 7 15 2008-01-21
2008-01-22 9 4 15 10 3 2008-01-22
2008-01-23 2 13 13 10 3 2008-01-23
2008-01-24 12 15 14 12 8 2008-01-24
2008-01-25 1 4 2 6 15 2008-01-25
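Note that CustomBusinessDay with no extra arguments only skips weekends. Since the question also excludes US holidays, a minimal sketch (assuming pandas' built-in USFederalHolidayCalendar covers the holidays you care about) passes a calendar:

from pandas.tseries.holiday import USFederalHolidayCalendar

# skip weekends and US federal holidays when rolling forward
us_bday = pd.tseries.offsets.CustomBusinessDay(n=0, calendar=USFederalHolidayCalendar())
df.assign(closest_business_day=df.index.to_series().apply(lambda d: d + us_bday))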
I have a pandas.DataFrame of the following form. I'll show you a simple example (in reality, it consists of hundreds of millions of rows).
I want to replace the values in column '2' with a counter that increments each time the letter code in column '2' changes. The numbers in the remaining columns (1, 3, ...) should not change.
df=
index 1 2 3
0 0 a100 1
1 1.04 a100 2
2 32 a100 3
3 5.05 a105 4
4 1.01 a105 5
5 155 a105 6
6 3155.26 a105 7
7 354.12 a100 8
8 5680.13 a100 9
9 125.55 a100 10
10 13.32 a100 11
11 5656.33 a156 12
12 456.61 a156 13
13 23.52 a1235 14
14 35.35 a1235 15
15 350.20 a100 16
16 30. a100 17
17 13.50 a100 18
18 323.13 a231 19
19 15.11 a1111 20
20 11.22 a1111 21
Here is my expected result:
df=
index 1 2 3
0 0 0 1
1 1.04 0 2
2 32 0 3
3 5.05 1 4
4 1.01 1 5
5 155 1 6
6 3155.26 1 7
7 354.12 2 8
8 5680.13 2 9
9 125.55 2 10
10 13.32 2 11
11 5656.33 3 12
12 456.61 3 13
13 23.52 4 14
14 35.35 4 15
15 350.20 5 16
16 30 5 17
17 13.50 5 18
18 323.13 6 19
19 15.11 7 20
20 11.22 7 21
How do I solve this problem?
Create consecutive group IDs by comparing each value with its shifted neighbour using ne (not equal), taking the cumulative sum of the resulting booleans, and then subtracting 1:
# if the column label is the string '2'
df['2'] = df['2'].ne(df['2'].shift()).cumsum().sub(1)

# if the column label is the number 2
df[2] = df[2].ne(df[2].shift()).cumsum().sub(1)

print(df)
index 1 2 3
0 0 0.00 0 1
1 1 1.04 0 2
2 2 32.00 0 3
3 3 5.05 1 4
4 4 1.01 1 5
5 5 155.00 1 6
6 6 3155.26 1 7
7 7 354.12 2 8
8 8 5680.13 2 9
9 9 125.55 2 10
10 10 13.32 2 11
11 11 5656.33 3 12
12 12 456.61 3 13
13 13 23.52 4 14
14 14 35.35 4 15
15 15 350.20 5 16
16 16 30.00 5 17
17 17 13.50 5 18
18 18 323.13 6 19
19 19 15.11 7 20
20 20 11.22 7 21
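To see why this works, here is a breakdown of the intermediate steps on a small hypothetical series (not from the question):

import pandas as pd

s = pd.Series(['a100', 'a100', 'a105', 'a105', 'a100'])
s.ne(s.shift())                  # True at the start of each run: [True, False, True, False, True]
s.ne(s.shift()).cumsum()         # group IDs starting at 1:       [1, 1, 2, 2, 3]
s.ne(s.shift()).cumsum().sub(1)  # shift down to start at 0:      [0, 0, 1, 1, 2]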
I have a pandas dataframe like this:
pd.DataFrame({'week': ['2019-w01', '2019-w02', '2019-w03', '2019-w04',
                       '2019-w05', '2019-w06', '2019-w07', '2019-w08',
                       '2019-w9', '2019-w10', '2019-w11', '2019-w12'],
              'value': [11, 22, 33, 34, 57, 88, 2, 9, 10, 1, 76, 14]})
        week  value
0   2019-w01     11
1   2019-w02     22
2   2019-w03     33
3   2019-w04     34
4   2019-w05     57
5   2019-w06     88
6   2019-w07      2
7   2019-w08      9
8    2019-w9     10
9   2019-w10      1
10  2019-w11     76
11  2019-w12     14
What I need is shown below: I would like to assign a period ID to every 4-week interval.
week value period
0 2019-w01 11 1
1 2019-w02 22 1
2 2019-w03 33 1
3 2019-w04 34 1
4 2019-w05 57 2
5 2019-w06 88 2
6 2019-w07 2 2
7 2019-w08 9 2
8 2019-w9 10 3
9 2019-w10 1 3
10 2019-w11 76 3
11 2019-w12 14 3
What is the best way to achieve that? Thanks.
Try:
# extract the numeric week, integer-divide by 4, then shift down one row
# so weeks 1-4 land in period 1, weeks 5-8 in period 2, and so on
df['period'] = (pd.to_numeric(df['week'].str.split('-').str[-1]
                              .str.replace('w', '')) // 4
                ).shift(fill_value=0).add(1)
print(df)
week value period
0 2019-w01 11 1
1 2019-w02 22 1
2 2019-w03 33 1
3 2019-w04 34 1
4 2019-w05 57 2
5 2019-w06 88 2
6 2019-w07 2 2
7 2019-w08 9 2
8 2019-w9 10 3
9 2019-w10 1 3
10 2019-w11 76 3
11 2019-w12 14 3
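If the rows are already sorted with exactly one row per week and no gaps (an assumption the question doesn't state explicitly), a simpler alternative is to number the rows directly:

import numpy as np

# every block of 4 consecutive rows gets the same period ID, starting at 1
df['period'] = np.arange(len(df)) // 4 + 1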
I have created a dataframe from an Excel sheet, then filtered it to rows whose [Date_rank] column value is less than 10; the resulting dataframe is called filtered.
I've then used g = filtered.groupby('Well_name') to segregate the data by each well.
Now that I have the data grouped by Well_name, how can I find the standard deviation of [RandomNumber] within each group (giving me the stdev of both wells' RandomNumbers)? Perhaps it was not necessary to use the groupby function?
import pandas as pd

df = pd.read_csv('here.csv')
print(df)

filtered = df[df['Date_rank'] < 10]  # filter the dataframe to Date_rank less than 10
print(filtered)

g = filtered.groupby('Well_name')  # group the data to segregate by well name
Here is my data:
Well_name Date_rank RandomNumber
0 Velta 1 4
1 Velta 2 5
2 Velta 3 2
3 Velta 4 4
4 Velta 5 4
5 Velta 6 9
6 Velta 7 0
7 Velta 8 9
8 Velta 9 1
9 Velta 10 3
10 Velta 11 8
11 Velta 12 3
12 Velta 13 10
13 Velta 14 10
14 Velta 15 0
15 Ronnie 1 8
16 Ronnie 2 1
17 Ronnie 3 6
18 Ronnie 4 2
19 Ronnie 5 2
20 Ronnie 6 9
21 Ronnie 7 6
22 Ronnie 8 5
23 Ronnie 9 2
24 Ronnie 10 1
25 Ronnie 11 3
26 Ronnie 12 3
27 Ronnie 13 4
28 Ronnie 14 0
29 Ronnie 15 4
You should be able to solve the problem with groupby() as you stated. The code you should use is the following:
g = filtered.groupby('Well_name')['RandomNumber'].std()
Or using .agg():
g = filtered.groupby('Well_name').agg({'RandomNumber': 'std'})
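One detail worth knowing: GroupBy.std() computes the sample standard deviation (ddof=1) by default. If you want the population standard deviation instead, pass ddof=0:

g = filtered.groupby('Well_name')['RandomNumber'].std(ddof=0)  # population stdev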
I am trying to implement a task of finding all simple cycles in an undirected graph. Originally, the task was to find all cycles of fixed length (= 3), and I've managed to do it using properties of adjacency matrices. But before using that approach I also tried DFS, and it worked correctly for really small inputs, while for bigger inputs it blew up, ending in (nearly) infinite loops. I tried to fix the code, but then it simply could not find all the cycles.
My code is attached below.
1. Please do not pay attention to the several global variables used. The working code using the other approach has already been submitted; this one is just for me to see how to make DFS work properly.
2. Yes, I searched for this problem before posting this question, but everything I managed to find either used a different approach or was only about detecting whether any cycle exists at all. Besides, I want to know whether it is possible to fix my code.
Big thanks to anyone who can help.
num_res = 0
adj_list = []
cycles_list = []

def dfs(v, path):
    global num_res
    for node in adj_list[v]:
        if node not in path:
            # extend the simple path with an unvisited vertex
            dfs(node, path + [node])
        elif len(path) >= 3 and node == path[-3]:
            # this edge closes a 3-cycle with the last three path vertices
            if sorted(path[-3:]) not in cycles_list:
                cycles_list.append(sorted(path[-3:]))
                num_res += 1

if __name__ == "__main__":
    num_towns, num_pairs = [int(x) for x in input().split()]
    adj_list = [[] for x in range(num_towns)]
    adj_matrix = [[0 for x in range(num_towns)] for x in range(num_towns)]  # built but unused
    # EDGE LIST TO ADJACENCY LIST
    for i in range(num_pairs):
        cur_start, cur_end = [int(x) for x in input().split()]
        adj_list[cur_start].append(cur_end)
        adj_list[cur_end].append(cur_start)
    dfs(0, [0])
    print(num_res)
UPD: it works fine for the following inputs:
5 8
4 0
0 2
0 1
3 2
4 3
4 2
1 3
3 0
(output: 5)
6 15
5 4
2 0
3 1
5 1
4 1
5 3
1 0
4 0
4 3
5 2
2 1
3 0
3 2
5 0
4 2
(output: 20)
9 12
0 1
0 2
1 3
1 4
2 4
2 5
3 6
4 6
4 7
5 7
6 8
7 8
(output: 0)
For the following input it does NOT give any output and just keeps running through the loop:
22 141
5 0
12 9
18 16
7 6
7 0
4 1
16 1
8 1
6 1
14 0
16 0
11 9
20 14
12 3
18 3
1 0
17 0
17 15
14 5
17 13
6 5
18 12
21 1
13 4
18 11
18 13
8 0
15 9
21 18
13 6
12 8
16 13
20 18
21 3
11 6
15 14
13 5
17 5
10 8
9 5
16 14
19 9
7 5
14 10
16 4
18 7
12 1
16 3
19 18
19 17
20 2
12 11
15 3
15 11
13 2
10 7
15 13
10 9
7 3
14 3
10 1
21 19
9 2
21 4
19 0
18 1
10 6
15 0
20 7
14 11
19 6
18 10
7 4
16 10
9 4
13 3
12 2
4 3
17 7
15 8
13 7
21 14
4 2
21 0
20 16
18 8
20 12
14 2
13 1
16 15
17 11
17 16
20 10
15 7
14 1
13 0
17 12
18 5
12 4
15 1
16 9
9 1
17 14
16 2
12 5
20 8
19 2
18 4
19 4
19 11
15 12
14 12
11 8
17 10
18 14
12 7
16 8
20 11
8 7
18 9
6 4
11 5
17 6
5 3
15 10
20 19
15 6
19 10
20 13
9 3
13 9
13 10
21 7
19 13
19 12
19 14
6 3
21 15
21 6
17 3
10 5
(output should be 343)
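For the fixed-length-3 case, the unbounded DFS is the root of the blow-up: it enumerates every simple path out of the start vertex, which grows exponentially on dense graphs like the 22-vertex input above. As a point of comparison (my own sketch, not the adjacency-matrix solution that was submitted), a common-neighbour scan counts each triangle exactly once without enumerating paths:

def count_triangles(num_towns, edges):
    # adjacency sets give O(1) membership tests and fast intersections
    adj = [set() for _ in range(num_towns)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    count = 0
    for u in range(num_towns):
        for v in adj[u]:
            if v <= u:
                continue  # enforce u < v so each edge is scanned once
            # each common neighbour w with w > v closes a triangle u-v-w,
            # counted exactly once because u < v < w
            count += sum(1 for w in adj[u] & adj[v] if w > v)
    return count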
I have the following dataframe:
Month Day season
0 4 15 current
1 4 16 current
2 4 17 current
3 4 18 current
4 4 19 current
5 4 20 current
I would like to duplicate it like so:
Month Day season
0 4 15 current
1 4 16 current
2 4 17 current
3 4 18 current
4 4 19 current
5 4 20 current
6 4 15 past
7 4 16 past
8 4 17 past
9 4 18 past
10 4 19 past
11 4 20 past
I can duplicate it using:
df.append([df]*2, ignore_index=True)
However, how do I duplicate it so that the season column holds past for the duplicated rows instead of current?
I think this would be a good case for assign, since it allows you to keep your functional programming style (I approve!):
In [144]: df.append([df.assign(season='past')]*2,ignore_index=True)
Out[144]:
Month Day season
0 4 15 current
1 4 16 current
2 4 17 current
3 4 18 current
4 4 19 current
5 4 20 current
6 4 15 past
7 4 16 past
8 4 17 past
9 4 18 past
10 4 19 past
11 4 20 past
12 4 15 past
13 4 16 past
14 4 17 past
15 4 18 past
16 4 19 past
17 4 20 past
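A side note: DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so on current pandas the equivalent uses pd.concat. Appending a single modified copy (rather than two) also matches the 12-row result shown in the question; a minimal sketch:

import pandas as pd

# one copy of df plus one copy with season overwritten to 'past'
out = pd.concat([df, df.assign(season='past')], ignore_index=True)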