count groups of values with aggregated value - python

I have a dataset like this one:
DateTime             Value
2022-01-01 11:03:45      0
2022-01-01 11:03:50     40
2022-01-01 11:03:55     50
2022-01-01 11:04:00     60
2022-01-01 11:04:05      5
2022-01-01 11:04:10      4
2022-01-01 11:04:15      3
2022-01-01 11:04:20      0
2022-01-01 11:04:25      0
2022-01-01 11:04:30     40
2022-01-01 11:04:35     50
2022-01-01 11:04:40      4
2022-01-01 11:04:45      3
2022-01-01 11:04:50      0
2022-01-02 11:03:45      0
2022-01-02 11:03:50      5
2022-01-02 11:03:55     50
2022-01-02 11:04:00     60
2022-01-02 11:04:05      5
2022-01-02 11:04:10      4
2022-01-02 11:04:15      3
2022-01-02 11:04:20      0
2022-01-02 11:04:25     49
2022-01-02 11:04:30     40
2022-01-02 11:04:35     50
2022-01-02 11:04:40      4
2022-01-02 11:04:45      3
2022-01-02 11:04:50      0
As you can see, I have some timestamps with values. It is a measurement from a device that takes a sample every 5 seconds. This is only a subset of all the data. There are some groups with low values and some with high values. I define a value as high if it is greater than 10. If consecutive rows have high values, I consider them a group. What I would like to achieve:
count the number of groups per day
for each group, calculate its duration
I will show an example of my desired result below:
DateTime             Value  GroupId  Duration (in seconds)
2022-01-01 11:03:45      0      NaN      NaN
2022-01-01 11:03:50     40        1       15
2022-01-01 11:03:55     50        1       15
2022-01-01 11:04:00     60        1       15
2022-01-01 11:04:05      5      NaN      NaN
2022-01-01 11:04:10      4      NaN      NaN
2022-01-01 11:04:15      3      NaN      NaN
2022-01-01 11:04:20      0      NaN      NaN
2022-01-01 11:04:25      0      NaN      NaN
2022-01-01 11:04:30     40        2       10
2022-01-01 11:04:35     50        2       10
2022-01-01 11:04:40      4      NaN      NaN
2022-01-01 11:04:45      3      NaN      NaN
2022-01-01 11:04:50      0      NaN      NaN
2022-01-02 11:03:45      0      NaN      NaN
2022-01-02 11:03:50      5      NaN      NaN
2022-01-02 11:03:55     50        1       10
2022-01-02 11:04:00     60        1       10
2022-01-02 11:04:05      5      NaN      NaN
2022-01-02 11:04:10      4      NaN      NaN
2022-01-02 11:04:15      3      NaN      NaN
2022-01-02 11:04:20      0      NaN      NaN
2022-01-02 11:04:25     49        2       15
2022-01-02 11:04:30     40        2       15
2022-01-02 11:04:35     50        2       15
2022-01-02 11:04:40      4      NaN      NaN
2022-01-02 11:04:45      3      NaN      NaN
2022-01-02 11:04:50      0      NaN      NaN
I know how to read data in Pandas and do basic manipulation. Can you give me any hints on how to find those groups, how to measure their duration, and how to assign a number to them? Thanks!

For GroupId, create groups from consecutive values greater than 10 and number them with a per-day cumulative sum (GroupBy.cumsum). Then, per date and GroupId, get the maximal and minimal datetime and subtract them; last, add 5 seconds because a sample is taken every 5 seconds:
import pandas as pd

df['DateTime'] = pd.to_datetime(df['DateTime'])
# mask of "high" values
s = df['Value'].gt(10)
date = df['DateTime'].dt.date
# a new group starts wherever the mask changes; number groups per day, only on high rows
df['GroupId'] = s.ne(s.shift())[s].groupby(date).cumsum()
g = df.groupby([date, 'GroupId'])['DateTime']
# duration = span of each group plus one 5-second sample
df['Duration (in seconds)'] = (g.transform('max').sub(g.transform('min'))
                                .dt.total_seconds().add(5))
print(df)
DateTime Value GroupId Duration (in seconds)
0 2022-01-01 11:03:45 0 NaN NaN
1 2022-01-01 11:03:50 40 1.0 15.0
2 2022-01-01 11:03:55 50 1.0 15.0
3 2022-01-01 11:04:00 60 1.0 15.0
4 2022-01-01 11:04:05 5 NaN NaN
5 2022-01-01 11:04:10 4 NaN NaN
6 2022-01-01 11:04:15 3 NaN NaN
7 2022-01-01 11:04:20 0 NaN NaN
8 2022-01-01 11:04:25 0 NaN NaN
9 2022-01-01 11:04:30 40 2.0 10.0
10 2022-01-01 11:04:35 50 2.0 10.0
11 2022-01-01 11:04:40 4 NaN NaN
12 2022-01-01 11:04:45 3 NaN NaN
13 2022-01-01 11:04:50 0 NaN NaN
14 2022-01-02 11:03:45 0 NaN NaN
15 2022-01-02 11:03:50 5 NaN NaN
16 2022-01-02 11:03:55 50 1.0 10.0
17 2022-01-02 11:04:00 60 1.0 10.0
18 2022-01-02 11:04:05 5 NaN NaN
19 2022-01-02 11:04:10 4 NaN NaN
20 2022-01-02 11:04:15 3 NaN NaN
21 2022-01-02 11:04:20 0 NaN NaN
22 2022-01-02 11:04:25 49 2.0 15.0
23 2022-01-02 11:04:30 40 2.0 15.0
24 2022-01-02 11:04:35 50 2.0 15.0
25 2022-01-02 11:04:40 4 NaN NaN
26 2022-01-02 11:04:45 3 NaN NaN
27 2022-01-02 11:04:50 0 NaN NaN
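To also get the number of groups per day (the first requirement in the question): since groups are numbered 1..n within each day, the per-day maximum of GroupId is that count. A minimal sketch, assuming the frame produced above:
# groups are numbered consecutively per day, so the daily maximum equals the count
groups_per_day = df.groupby(df['DateTime'].dt.date)['GroupId'].max()
print(groups_per_day)
# 2022-01-01    2.0
# 2022-01-02    2.0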
Another idea: compute Duration by also including the row just before each group (via back-fill), so the extra 5 seconds need not be added:
df['DateTime'] = pd.to_datetime(df['DateTime'])
s = df['Value'].gt(10)
date = df['DateTime'].dt.date
df['GroupId'] = s.ne(s.shift())[s].groupby(date).cumsum()
# extend each group one row back, so max - min already spans a full sample
prev = df.groupby(date)['GroupId'].bfill(limit=1)
g = df.groupby([date, prev])['DateTime']
df['Duration (in seconds)'] = (g.transform('max').sub(g.transform('min'))
                                .dt.total_seconds()
                                .where(s))
print(df)
DateTime Value GroupId Duration (in seconds)
0 2022-01-01 11:03:45 0 NaN NaN
1 2022-01-01 11:03:50 40 1.0 15.0
2 2022-01-01 11:03:55 50 1.0 15.0
3 2022-01-01 11:04:00 60 1.0 15.0
4 2022-01-01 11:04:05 5 NaN NaN
5 2022-01-01 11:04:10 4 NaN NaN
6 2022-01-01 11:04:15 3 NaN NaN
7 2022-01-01 11:04:20 0 NaN NaN
8 2022-01-01 11:04:25 0 NaN NaN
9 2022-01-01 11:04:30 40 2.0 10.0
10 2022-01-01 11:04:35 50 2.0 10.0
11 2022-01-01 11:04:40 4 NaN NaN
12 2022-01-01 11:04:45 3 NaN NaN
13 2022-01-01 11:04:50 0 NaN NaN
14 2022-01-02 11:03:45 0 NaN NaN
15 2022-01-02 11:03:50 5 NaN NaN
16 2022-01-02 11:03:55 50 1.0 10.0
17 2022-01-02 11:04:00 60 1.0 10.0
18 2022-01-02 11:04:05 5 NaN NaN
19 2022-01-02 11:04:10 4 NaN NaN
20 2022-01-02 11:04:15 3 NaN NaN
21 2022-01-02 11:04:20 0 NaN NaN
22 2022-01-02 11:04:25 49 2.0 15.0
23 2022-01-02 11:04:30 40 2.0 15.0
24 2022-01-02 11:04:35 50 2.0 15.0
25 2022-01-02 11:04:40 4 NaN NaN
26 2022-01-02 11:04:45 3 NaN NaN
27 2022-01-02 11:04:50 0 NaN NaN

Related

Pandas dataframe expand rows in specific times

I have a dataframe:
df =
T1                 C1
01/01/2022 11:20    2
01/01/2022 15:40    8
01/01/2022 17:50    3
I want to expand it such that:
I will have the value at the specific given times
I will have a row for each round (hourly) timestamp
So if the times are given as
l = ['01/01/2022 15:46', '01/01/2022 11:28']
I will have:
df_new =
T1                 C1
01/01/2022 11:20 2
01/01/2022 11:28 2
01/01/2022 12:00 2
01/01/2022 13:00 2
01/01/2022 14:00 2
01/01/2022 15:00 2
01/01/2022 15:40 8
01/01/2022 15:46 8
01/01/2022 16:00 8
01/01/2022 17:00 8
01/01/2022 17:50 3
You can add the extra dates and ffill:
df['T1'] = pd.to_datetime(df['T1'])
extra = pd.date_range(df['T1'].min().ceil('H'), df['T1'].max().floor('H'), freq='1h')
(pd.concat([df, pd.DataFrame({'T1': extra})])
.sort_values(by='T1', ignore_index=True)
.ffill()
)
Output:
T1 C1
0 2022-01-01 11:20:00 2.0
1 2022-01-01 12:00:00 2.0
2 2022-01-01 13:00:00 2.0
3 2022-01-01 14:00:00 2.0
4 2022-01-01 15:00:00 2.0
5 2022-01-01 15:40:00 8.0
6 2022-01-01 16:00:00 8.0
7 2022-01-01 17:00:00 8.0
8 2022-01-01 17:50:00 3.0
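The question also asks for rows at the specific times in l; those can be concatenated the same way before sorting. A sketch under the same assumptions, reusing extra from above:
l = pd.to_datetime(['01/01/2022 15:46', '01/01/2022 11:28'])
out = (pd.concat([df, pd.DataFrame({'T1': extra}), pd.DataFrame({'T1': l})])
         .sort_values(by='T1', ignore_index=True)
         .ffill())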
Here is a way to do what your question asks that will ensure:
there are no duplicate times in T1 in the output, even if any of the times in the original are round hours
the results will be of the same type as the values in the C1 column of the input (in this case, integers not floats).
hours = pd.date_range(df.T1.min().ceil("H"), df.T1.max().floor("H"), freq="60min")
idx_new = df.set_index('T1').join(pd.DataFrame(index=hours), how='outer', sort=True).index
df_new = df.set_index('T1').reindex(index = idx_new, method='ffill').reset_index().rename(columns={'index':'T1'})
Output:
T1 C1
0 2022-01-01 11:20:00 2
1 2022-01-01 12:00:00 2
2 2022-01-01 13:00:00 2
3 2022-01-01 14:00:00 2
4 2022-01-01 15:00:00 2
5 2022-01-01 15:40:00 8
6 2022-01-01 16:00:00 8
7 2022-01-01 17:00:00 8
8 2022-01-01 17:50:00 3
Example of how round dates in the input are handled:
df = pd.DataFrame({
#'T1':pd.to_datetime(['01/01/2022 11:20','01/01/2022 15:40','01/01/2022 17:50']),
'T1':pd.to_datetime(['01/01/2022 11:00','01/01/2022 15:40','01/01/2022 17:00']),
'C1':[2,8,3]})
Input:
T1 C1
0 2022-01-01 11:00:00 2
1 2022-01-01 15:40:00 8
2 2022-01-01 17:00:00 3
Output (no duplicates):
T1 C1
0 2022-01-01 11:00:00 2
1 2022-01-01 12:00:00 2
2 2022-01-01 13:00:00 2
3 2022-01-01 14:00:00 2
4 2022-01-01 15:00:00 2
5 2022-01-01 15:40:00 8
6 2022-01-01 16:00:00 8
7 2022-01-01 17:00:00 3
Another possible solution, based on pandas.DataFrame.resample:
df['T1'] = pd.to_datetime(df['T1'])
(pd.concat([df, df.set_index('T1').resample('1H').asfreq().reset_index()])
.sort_values('T1').ffill().dropna().reset_index(drop=True))
Output:
T1 C1
0 2022-01-01 11:20:00 2.0
1 2022-01-01 12:00:00 2.0
2 2022-01-01 13:00:00 2.0
3 2022-01-01 14:00:00 2.0
4 2022-01-01 15:00:00 2.0
5 2022-01-01 15:40:00 8.0
6 2022-01-01 16:00:00 8.0
7 2022-01-01 17:00:00 8.0
8 2022-01-01 17:50:00 3.0
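Note that the dropna in this variant is needed because resample creates a bin edge at the floor hour before the first observation (11:00:00 here), and that row has nothing to forward-fill.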

Merge two dataframes in pandas with common info as columns or as cells

I have two pandas dataframes in python, df_main and df_aux.
df_main is a table which gathers events, with the datetime when it happened and a column "Description" which gives a codified location. It has the following structure:
Date                 Description
2022-01-01 13:45:23  A
2022-01-01 14:22:00  C
2022-01-01 16:15:33  D
2022-01-01 16:21:22  E
2022-01-02 13:21:56  B
2022-01-02 14:45:41  B
2022-01-02 15:11:34  C
df_aux is a table which gives the number of other events (let's say, for example, people walking by between Initial_Date and Final_Date) happening in each location (A, B, C, D), with 1-hour granularity. The structure of df_aux is as follows:
Initial_Date         Final_Date           A  B  C  D
2022-01-01 12:00:00  2022-01-01 12:59:59  2  0  1  2
2022-01-01 13:00:00  2022-01-01 13:59:59  3  2  4  5
2022-01-01 14:00:00  2022-01-01 14:59:59  2  2  7  0
2022-01-01 15:00:00  2022-01-01 15:59:59  5  2  2  0
2022-01-02 12:00:00  2022-01-02 12:59:59  1  1  0  3
2022-01-02 13:00:00  2022-01-02 13:59:59  5  5  0  3
2022-01-02 14:00:00  2022-01-02 14:59:59  2  3  2  1
2022-01-02 15:00:00  2022-01-02 15:59:59  3  4  1  0
So my problem is that I need to add a new column to df_main accounting for the number of people who walked by in the hour before the event. For example, for the first event, which happens at 13:45:23, we would go to df_aux and look up the previous hour (12:45:23), which falls in the first row, as 12:45:23 is between 12:00:00 and 12:59:59. In that time range, column A has a value of 2, so the new column in df_main, "People_prev_hour", takes the value 2.
Following the same logic, the full df_main would be,
Date                 Description  People_prev_hour
2022-01-01 13:45:23  A            2
2022-01-01 14:22:00  C            4
2022-01-01 16:15:33  D            0
2022-01-01 16:21:22  E            NaN
2022-01-02 13:21:56  B            1
2022-01-02 14:45:41  B            5
2022-01-02 15:11:34  F            NaN
Datetimes will always be complete between both dfs, but the Description column may not. As seen in the full df_main, two rows have as Description values E and F, which are not in df_aux. Therefore, in those cases a NaN must be present.
I can't think of a way of merging these two dfs into the desired output, as pd.merge uses common columns, and I don't manage to do anything with pd.melt or pd.pivot. Any help is much appreciated!
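For reference, a minimal construction of the two sample frames used by the answers below (values transcribed from the tables above; note the input table's last Description reads C, while the desired output shows F for that row):
import pandas as pd

df_main = pd.DataFrame({
    'Date': pd.to_datetime([
        '2022-01-01 13:45:23', '2022-01-01 14:22:00', '2022-01-01 16:15:33',
        '2022-01-01 16:21:22', '2022-01-02 13:21:56', '2022-01-02 14:45:41',
        '2022-01-02 15:11:34']),
    'Description': ['A', 'C', 'D', 'E', 'B', 'B', 'C']})

# hourly window starts for both days, end = start + 59min 59s
starts = pd.date_range('2022-01-01 12:00', periods=4, freq='H').append(
         pd.date_range('2022-01-02 12:00', periods=4, freq='H'))
df_aux = pd.DataFrame({
    'Initial_Date': starts,
    'Final_Date': starts + pd.Timedelta('59min 59s'),
    'A': [2, 3, 2, 5, 1, 5, 2, 3],
    'B': [0, 2, 2, 2, 1, 5, 3, 4],
    'C': [1, 4, 7, 2, 0, 0, 2, 1],
    'D': [2, 5, 0, 0, 3, 3, 1, 0]})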
First idea is to use merge_asof, because the hourly windows are non-overlapping intervals:
df1 = pd.merge_asof(df_main,
                    df_aux.assign(Initial_Date=df_aux['Initial_Date'] + pd.Timedelta(1, 'hour')),
                    left_on='Date',
                    right_on='Initial_Date')
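Adding one hour to Initial_Date shifts each window's start forward, so for every event merge_asof (default direction 'backward') picks the row whose original window covers the event time minus one hour.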
Then use indexing lookup:
import numpy as np

idx, cols = pd.factorize(df1['Description'])
# pick, for each row, the column named by its Description (NaN if the column is missing)
df_main['People_prev_hour'] = (df1.reindex(cols, axis=1)
                                  .to_numpy()[np.arange(len(df1)), idx])
print(df_main)
Date Description People_prev_hour
0 2022-01-01 13:45:23 A 2.0
1 2022-01-01 14:22:00 C 4.0
2 2022-01-01 16:15:33 D 0.0
3 2022-01-01 16:21:22 E NaN
4 2022-01-02 13:21:56 B 1.0
5 2022-01-02 14:45:41 B 5.0
6 2022-01-02 15:11:34 C 2.0
Another idea with IntervalIndex:
s = pd.IntervalIndex.from_arrays(df_aux.Initial_Date + pd.Timedelta(1, 'hour'),
df_aux.Final_Date + pd.Timedelta(1, 'hour'), 'both')
df1 = df_aux.set_index(s).loc[df_main.Date]
print (df1)
Initial_Date \
[2022-01-01 13:00:00, 2022-01-01 13:59:59] 2022-01-01 12:00:00
[2022-01-01 14:00:00, 2022-01-01 14:59:59] 2022-01-01 13:00:00
[2022-01-01 16:00:00, 2022-01-01 16:59:59] 2022-01-01 15:00:00
[2022-01-01 16:00:00, 2022-01-01 16:59:59] 2022-01-01 15:00:00
[2022-01-02 13:00:00, 2022-01-02 13:59:59] 2022-01-02 12:00:00
[2022-01-02 14:00:00, 2022-01-02 14:59:59] 2022-01-02 13:00:00
[2022-01-02 15:00:00, 2022-01-02 15:59:59] 2022-01-02 14:00:00
Final_Date A B C D
[2022-01-01 13:00:00, 2022-01-01 13:59:59] 2022-01-01 12:59:59 2 0 1 2
[2022-01-01 14:00:00, 2022-01-01 14:59:59] 2022-01-01 13:59:59 3 2 4 5
[2022-01-01 16:00:00, 2022-01-01 16:59:59] 2022-01-01 15:59:59 5 2 2 0
[2022-01-01 16:00:00, 2022-01-01 16:59:59] 2022-01-01 15:59:59 5 2 2 0
[2022-01-02 13:00:00, 2022-01-02 13:59:59] 2022-01-02 12:59:59 1 1 0 3
[2022-01-02 14:00:00, 2022-01-02 14:59:59] 2022-01-02 13:59:59 5 5 0 3
[2022-01-02 15:00:00, 2022-01-02 15:59:59] 2022-01-02 14:59:59 2 3 2 1
idx, cols = pd.factorize(df_main['Description'])
df_main['People_prev_hour'] = (df1.reindex(cols, axis=1)
                                  .to_numpy()[np.arange(len(df1)), idx])
print(df_main)
Date Description People_prev_hour
0 2022-01-01 13:45:23 A 2.0
1 2022-01-01 14:22:00 C 4.0
2 2022-01-01 16:15:33 D 0.0
3 2022-01-01 16:21:22 E NaN
4 2022-01-02 13:21:56 B 1.0
5 2022-01-02 14:45:41 B 5.0
6 2022-01-02 15:11:34 C 2.0

Adding a list of different length under a certain condition to a date time index pandas dataframe

How can I insert a list of values at a certain position of a dataframe with a datetime index, under the condition that the list should start after a certain value exceeds a given number? Example below:
import numpy as np
import pandas as pd

example_list = [2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5]
example_index = pd.date_range('2022-01-01', periods=120, freq='1min')
example_df = pd.DataFrame({'example values': np.arange(120)})
example_df.index = example_index
example_df
Output:
example values
2022-01-01 00:00:00 0
2022-01-01 00:01:00 1
2022-01-01 00:02:00 2
2022-01-01 00:03:00 3
2022-01-01 00:04:00 4
... ...
2022-01-01 01:55:00 115
2022-01-01 01:56:00 116
2022-01-01 01:57:00 117
2022-01-01 01:58:00 118
2022-01-01 01:59:00 119
I want to insert example_list as a new column called "example_values_2", starting at the position where the example values are > 20. Is this possible?
IIUC, you can find the index of this value and slice:
start = example_df['example values'].gt(20).argmax()
idx = example_df.index
example_df.loc[idx[start:start+len(example_list)], 'example_values_2'] = example_list
output:
example values example_values_2
... ... ...
2022-01-01 00:20:00 20 NaN
2022-01-01 00:21:00 21 2.0
2022-01-01 00:22:00 22 2.0
2022-01-01 00:23:00 23 2.0
2022-01-01 00:24:00 24 2.0
2022-01-01 00:25:00 25 3.0
2022-01-01 00:26:00 26 3.0
2022-01-01 00:27:00 27 3.0
2022-01-01 00:28:00 28 3.0
2022-01-01 00:29:00 29 4.0
2022-01-01 00:30:00 30 4.0
2022-01-01 00:31:00 31 4.0
2022-01-01 00:32:00 32 4.0
2022-01-01 00:33:00 33 5.0
2022-01-01 00:34:00 34 5.0
2022-01-01 00:35:00 35 5.0
2022-01-01 00:36:00 36 NaN
... ... ...
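A variant of the same idea: build the new column as a Series indexed by the target slice; assignment then aligns on the index and leaves NaN elsewhere (same start and idx as above):
example_df['example_values_2'] = pd.Series(
    example_list, index=idx[start:start + len(example_list)])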

remove certain numbers from two dataframes python

I have two dataframes
dt AAPL AMC AMZN ASO ATH ... SPCE SRNE TH TSLA VIAC WKHS
0 2021-04-12 36 28 6 20 1 ... 5 0 0 50 23 0
1 2021-04-13 46 15 5 16 6 ... 5 0 0 122 12 1
2 2021-04-14 12 4 1 5 2 ... 2 0 0 39 1 0
3 2021-04-15 30 23 3 14 2 ... 15 0 0 101 9 0
dt AAPL AMC AMZN ASO ATH ... SPCE SRNE TH TSLA VIAC WKHS
0 2021-04-12 41 28 4 33 10 ... 5 0 0 56 14 3
1 2021-04-13 76 22 7 12 29 ... 4 0 0 134 8 2
2 2021-04-14 21 15 2 7 16 ... 2 0 0 61 3 0
3 2021-04-15 54 43 9 2 31 ... 16 0 0 83 13 1
I want to remove numbers lower than 10 from both dataframes. If a cell is removed from one dataframe, the same cell should be removed from the other, and vice versa.
Appreciate your help
Use a mask:
## pre-requisite
df1 = df1.set_index('dt')
df2 = df2.set_index('dt')
## processing
mask = df1.lt(10) | df2.lt(10)
df1 = df1.mask(mask)
df2 = df2.mask(mask)
output:
>>> df1
AAPL AMC AMZN ASO ATH SPCE SRNE TH TSLA VIAC WKHS
dt
2021-04-12 36 28.0 NaN 20.0 NaN NaN NaN NaN 50 23.0 NaN
2021-04-13 46 15.0 NaN 16.0 NaN NaN NaN NaN 122 NaN NaN
2021-04-14 12 NaN NaN NaN NaN NaN NaN NaN 39 NaN NaN
2021-04-15 30 23.0 NaN NaN NaN 15.0 NaN NaN 101 NaN NaN
>>> df2
AAPL AMC AMZN ASO ATH SPCE SRNE TH TSLA VIAC WKHS
dt
2021-04-12 41 28.0 NaN 33.0 NaN NaN NaN NaN 56 14.0 NaN
2021-04-13 76 22.0 NaN 12.0 NaN NaN NaN NaN 134 NaN NaN
2021-04-14 21 NaN NaN NaN NaN NaN NaN NaN 61 NaN NaN
2021-04-15 54 43.0 NaN NaN NaN 16.0 NaN NaN 83 NaN NaN
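Masking fills the removed cells with NaN, which upcasts the integer columns to float (as visible above). If keeping integers matters, a possible variant is converting to pandas' nullable Int64 dtype first, so masked cells become <NA> instead; a sketch under the same setup:
df1 = df1.astype('Int64').mask(mask)
df2 = df2.astype('Int64').mask(mask)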

Add missing times in dataframe column with pandas

I have a dataframe like so:
df = pd.DataFrame({'time':['23:59:45','23:49:50','23:59:55','00:00:00','00:00:05','00:00:10','00:00:15'],
'X':[-5,-4,-2,5,6,10,11],
'Y':[3,4,5,9,20,22,23]})
As you can see, the times are given as strings and run across midnight. A value is given every 5 seconds!
My goal, however, is to add empty rows (filled with NaN, for example) so that there is a row every second. Finally, the time column should be converted to a timestamp and set as the index.
Could you please suggest a smart and elegant way to achieve my goal?
Here is what the output should look like:
X Y
time
23:59:45 -5.0 3.0
23:59:46 NaN NaN
23:59:47 NaN NaN
23:59:48 NaN NaN
... ... ...
00:00:10 10.0 22.0
00:00:11 NaN NaN
00:00:12 NaN NaN
00:00:13 NaN NaN
00:00:14 NaN NaN
00:00:15 11.0 23.0
Note: I do not need the dates.
Use to_timedelta with reindex by timedelta_range:
df['time'] = pd.to_timedelta(df['time'])
idx = pd.timedelta_range('0', '23:59:59', freq='S', name='time')
df = df.set_index('time').reindex(idx).reset_index()
print (df.head(10))
time X Y
0 00:00:00 5.0 9.0
1 00:00:01 NaN NaN
2 00:00:02 NaN NaN
3 00:00:03 NaN NaN
4 00:00:04 NaN NaN
5 00:00:05 6.0 20.0
6 00:00:06 NaN NaN
7 00:00:07 NaN NaN
8 00:00:08 NaN NaN
9 00:00:09 NaN NaN
If you need to replace the NaNs:
df = df.set_index('time').reindex(idx, fill_value=0).reset_index()
print (df.head(10))
time X Y
0 00:00:00 5 9
1 00:00:01 0 0
2 00:00:02 0 0
3 00:00:03 0 0
4 00:00:04 0 0
5 00:00:05 6 20
6 00:00:06 0 0
7 00:00:07 0 0
8 00:00:08 0 0
9 00:00:09 0 0
Another solution with resample, though it is possible some rows are missing at the end:
df = df.set_index('time').resample('S').first()
print (df.tail(10))
X Y
time
23:59:46 NaN NaN
23:59:47 NaN NaN
23:59:48 NaN NaN
23:59:49 NaN NaN
23:59:50 NaN NaN
23:59:51 NaN NaN
23:59:52 NaN NaN
23:59:53 NaN NaN
23:59:54 NaN NaN
23:59:55 -2.0 5.0
EDIT1:
import numpy as np

idx1 = pd.timedelta_range('23:59:45', '23:59:59', freq='S', name='time')
idx2 = pd.timedelta_range('0', '00:00:15', freq='S', name='time')
idx = np.concatenate([idx1, idx2])
df['time'] = pd.to_timedelta(df['time'])
df = df.set_index('time').reindex(idx).reset_index()
print (df.head(10))
time X Y
0 23:59:45 -5.0 3.0
1 23:59:46 NaN NaN
2 23:59:47 NaN NaN
3 23:59:48 NaN NaN
4 23:59:49 NaN NaN
5 23:59:50 NaN NaN
6 23:59:51 NaN NaN
7 23:59:52 NaN NaN
8 23:59:53 NaN NaN
9 23:59:54 NaN NaN
print (df.tail(10))
time X Y
21 00:00:06 NaN NaN
22 00:00:07 NaN NaN
23 00:00:08 NaN NaN
24 00:00:09 NaN NaN
25 00:00:10 10.0 22.0
26 00:00:11 NaN NaN
27 00:00:12 NaN NaN
28 00:00:13 NaN NaN
29 00:00:14 NaN NaN
30 00:00:15 11.0 23.0
EDIT:
Another solution: shift the times that belong to the next day into 1-day timedeltas:
df['time'] = pd.to_timedelta(df['time'])
# every backwards jump in time marks a midnight crossing; add 1 day from there on
a = pd.to_timedelta(df['time'].diff().dt.days.abs().cumsum().fillna(1).sub(1), unit='d')
df['time'] = df['time'] + a
print (df)
X Y time
0 -5 3 0 days 23:59:45
1 -4 4 0 days 23:49:50
2 -2 5 0 days 23:59:55
3 5 9 1 days 00:00:00
4 6 20 1 days 00:00:05
5 10 22 1 days 00:00:10
6 11 23 1 days 00:00:15
idx = pd.timedelta_range(df['time'].min(), df['time'].max(), freq='S', name='time')
df = df.set_index('time').reindex(idx).reset_index()
print (df.head(10))
time X Y
0 23:49:50 -4.0 4.0
1 23:49:51 NaN NaN
2 23:49:52 NaN NaN
3 23:49:53 NaN NaN
4 23:49:54 NaN NaN
5 23:49:55 NaN NaN
6 23:49:56 NaN NaN
7 23:49:57 NaN NaN
8 23:49:58 NaN NaN
9 23:49:59 NaN NaN
print (df.tail(10))
time X Y
616 1 days 00:00:06 NaN NaN
617 1 days 00:00:07 NaN NaN
618 1 days 00:00:08 NaN NaN
619 1 days 00:00:09 NaN NaN
620 1 days 00:00:10 10.0 22.0
621 1 days 00:00:11 NaN NaN
622 1 days 00:00:12 NaN NaN
623 1 days 00:00:13 NaN NaN
624 1 days 00:00:14 NaN NaN
625 1 days 00:00:15 11.0 23.0
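Since the question notes the dates are not needed, the artificial 1-day offset can be stripped again after reindexing by subtracting the whole days; a small sketch, assuming the reindexed frame above:
# drop the day component, keeping only the time-of-day part, and set as index
df['time'] = df['time'] - pd.to_timedelta(df['time'].dt.days, unit='d')
df = df.set_index('time')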
