I want to resample a data frame that has time series data at a 30 second interval down to a 1 second interval. For this I used:
test_data=test_data.groupby('entity_id').resample('S', fill_method='ffill')
The output is:
<pandas.core.resample.DatetimeIndexResamplerGroupby object at 0x1a1f64f588>
How can I convert this object to a data frame?
I have tried:
test_data = pd.DataFrame(test_data)
after running the last command, but it returns a data frame whose rows contain the index plus a list of all the other elements of that row.
Use the ffill method instead of the deprecated fill_method argument:
import pandas as pd

test_data = pd.DataFrame({
    'entity_id': ['a','a','a','a','b','b','b','c','d'],
    'data': range(9)},
    index=pd.date_range('2018-01-01', periods=9, freq='3S'))
print (test_data)
entity_id data
2018-01-01 00:00:00 a 0
2018-01-01 00:00:03 a 1
2018-01-01 00:00:06 a 2
2018-01-01 00:00:09 a 3
2018-01-01 00:00:12 b 4
2018-01-01 00:00:15 b 5
2018-01-01 00:00:18 b 6
2018-01-01 00:00:21 c 7
2018-01-01 00:00:24 d 8
test_data=test_data.groupby('entity_id')['data'].resample('S').ffill()
print (test_data)
entity_id
a 2018-01-01 00:00:00 0
2018-01-01 00:00:01 0
2018-01-01 00:00:02 0
2018-01-01 00:00:03 1
2018-01-01 00:00:04 1
2018-01-01 00:00:05 1
2018-01-01 00:00:06 2
2018-01-01 00:00:07 2
2018-01-01 00:00:08 2
2018-01-01 00:00:09 3
b 2018-01-01 00:00:12 4
2018-01-01 00:00:13 4
2018-01-01 00:00:14 4
2018-01-01 00:00:15 5
2018-01-01 00:00:16 5
2018-01-01 00:00:17 5
2018-01-01 00:00:18 6
c 2018-01-01 00:00:21 7
d 2018-01-01 00:00:24 8
Name: data, dtype: int64
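If you then need a regular data frame again (the original goal), reset_index turns the resulting MultiIndex Series back into columns; a minimal sketch continuing from the code above (the second index level is unnamed here, so it comes out as level_1):
test_data = test_data.reset_index()
test_data = test_data.rename(columns={'level_1': 'datetime'})
print(test_data.head())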
Related
I have a dataset with 15-minute observations for different stations over 20 years. I want to know the time range over which each station has data.
station_id           start_time             end_time  observation
         2  2000-01-02 01:00:00  2000-01-02 01:15:00           50
         2  2000-01-02 01:15:00  2000-01-02 01:30:00           15
         2  2000-02-02 01:30:00  2000-01-02 01:45:00            3
         3  2000-01-02 05:00:00  2000-01-02 05:15:00           10
         3  2000-01-02 05:15:00  2000-01-02 05:30:00            2
         3  2000-02-03 01:00:00  2000-01-02 01:15:00           15
         3  2000-02-04 01:00:00  2000-01-02 01:15:00           20
An example of what I want to have:
|station_id | start | end | years |days
| 2 |2000-01-02 01:00:00|2000-01-02 01:45:00| 1 | 1
| 3 |2000-01-02 05:00:00|2000-01-02 01:15:00| 1 | 1
Try using groupby, diff, abs, agg and assign:
df[['start_time', 'end_time']] = df[['start_time', 'end_time']].apply(pd.to_datetime)
# first start and last end per station
x = df.groupby('station_id').agg({'start_time': 'first', 'end_time': 'last'})
# absolute time span between those two timestamps
temp = x.diff(axis=1).abs()['end_time']
# split the span into whole years and leftover days
x = x.assign(years=temp.dt.days // 365, days=temp.dt.days % 365).reset_index()
print(x)
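Equivalently, the span can be computed by subtracting the two columns directly, which some readers find easier to follow than diff(axis=1); a small sketch of that variant, replacing the temp line above:
# same Timedelta per station as the diff/abs line
temp = (x['end_time'] - x['start_time']).abs()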
I have a time series dataset that can be created with the following code.
idx = pd.date_range("2018-01-01", periods=100, freq="H")
ts = pd.Series(idx)
dft = pd.DataFrame(ts, columns=["date"])
dft["data"] = ""
# .loc slices on the default RangeIndex are label-inclusive,
# so loc[0:4] covers the first five rows (avoids chained-assignment warnings)
dft.loc[0:4, "data"] = "a"
dft.loc[5:14, "data"] = "b"
dft.loc[15:19, "data"] = "c"
dft.loc[20:29, "data"] = "d"
dft.loc[30:39, "data"] = "a"
dft.loc[40:69, "data"] = "c"
dft.loc[70:84, "data"] = "b"
dft.loc[85:, "data"] = "c"
In the data column, the unique values are a, b, c and d. These values repeat in runs over different time windows, and I want to capture the first and last timestamp of each window. How can I do that?
Compute a grouper for your changing values using shift to compare consecutive rows, then use groupby+agg to get the min/max per group:
# increments every time the value in `data` changes, giving one id per run
group = dft.data.ne(dft.data.shift()).cumsum()
dft.groupby(group)['date'].agg(['min', 'max'])
output:
min max
data
1 2018-01-01 00:00:00 2018-01-01 04:00:00
2 2018-01-01 05:00:00 2018-01-01 14:00:00
3 2018-01-01 15:00:00 2018-01-01 19:00:00
4 2018-01-01 20:00:00 2018-01-02 05:00:00
5 2018-01-02 06:00:00 2018-01-02 15:00:00
6 2018-01-02 16:00:00 2018-01-03 21:00:00
7 2018-01-03 22:00:00 2018-01-04 12:00:00
8 2018-01-04 13:00:00 2018-01-05 03:00:00
Edit: combining with the original data:
dft.groupby(group).agg({'data': 'first', 'date': ['min', 'max']})
output:
data date
first min max
data
1 a 2018-01-01 00:00:00 2018-01-01 04:00:00
2 b 2018-01-01 05:00:00 2018-01-01 14:00:00
3 c 2018-01-01 15:00:00 2018-01-01 19:00:00
4 d 2018-01-01 20:00:00 2018-01-02 05:00:00
5 a 2018-01-02 06:00:00 2018-01-02 15:00:00
6 c 2018-01-02 16:00:00 2018-01-03 21:00:00
7 b 2018-01-03 22:00:00 2018-01-04 12:00:00
8 c 2018-01-04 13:00:00 2018-01-05 03:00:00
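If you prefer flat column names instead of the two-level header, you can rename them afterwards; a short follow-up sketch (the names value, start and end are just a suggestion):
out = dft.groupby(group).agg({'data': 'first', 'date': ['min', 'max']})
out.columns = ['value', 'start', 'end']  # flatten the MultiIndex columns
out = out.reset_index(drop=True)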
Sample Data
import numpy as np
import pandas as pd

date_rng = pd.date_range('2019-01-01', freq='s', periods=400)
df = pd.DataFrame(np.random.lognormal(.005, .5, size=len(date_rng)),
                  columns=['data1'],
                  index=date_rng)
Example input
data1
2019-01-01 00:00:00 1
2019-01-01 00:00:01 -2
2019-01-01 00:00:02 2
2019-01-01 00:00:03 3
2019-01-01 00:00:04 1
2019-01-01 00:00:05 2
2019-01-01 00:00:06 -1
2019-01-01 00:00:07 3
2019-01-01 00:00:08 4
2019-01-01 00:00:09 5
2019-01-01 00:00:10 7
2019-01-01 00:00:11 2
2019-01-01 00:00:12 4
2019-01-01 00:00:13 -1
2019-01-01 00:00:14 5
2019-01-01 00:00:15 3
2019-01-01 00:00:16 5
2019-01-01 00:00:17 -3
... ...
Expected output
data1 cumsum
2019-01-01 00:00:00 1 1
2019-01-01 00:00:01 -2 -1
2019-01-01 00:00:02 2 1
2019-01-01 00:00:03 3 4
2019-01-01 00:00:04 1 5 (reset cumsum at this point)
2019-01-01 00:00:05 2 2
2019-01-01 00:00:06 -1 1
2019-01-01 00:00:07 3 4
2019-01-01 00:00:08 4 8 (reset at this point)
2019-01-01 00:00:09 5 5 (reset at this point)
2019-01-01 00:00:10 7 7 (reset at this point)
2019-01-01 00:00:11 2 2
2019-01-01 00:00:12 4 6 (reset at this point)
2019-01-01 00:00:13 -1 -1
2019-01-01 00:00:14 5 4
2019-01-01 00:00:15 3 7 (reset at this point)
2019-01-01 00:00:16 1 1
2019-01-01 00:00:17 -3 -2
... ...
I want to calculate the cumulative sum of data1, with the condition that it resets once it is larger than 5 and exceeds 20% of the value at the last reset. For the first reset, only the larger-than-5 condition applies; after that, both conditions apply.
I have checked other answers on Stack Overflow but did not find a similar problem. Please advise me on how to solve this.
You can use a simple loop with an if condition and a bit of state to solve this problem:
ls = []  # store the running cumulative sums
cumsum = 0
last_reset = 0
for _, row in df.iterrows():
    cumsum = cumsum + row.data1
    ls.append(cumsum)
    # reset once the sum reaches 5 and exceeds 20% of the last reset value
    if cumsum >= 5 and cumsum > 0.2 * last_reset:
        last_reset = cumsum
        cumsum = 0
df['cumsum'] = ls
df
data1 cumsum
2019-01-01 00:00:00 1 1
2019-01-01 00:00:01 -2 -1
2019-01-01 00:00:02 2 1
2019-01-01 00:00:03 3 4
2019-01-01 00:00:04 1 5
2019-01-01 00:00:05 2 2
2019-01-01 00:00:06 -1 1
2019-01-01 00:00:07 3 4
2019-01-01 00:00:08 4 8
2019-01-01 00:00:09 5 5
2019-01-01 00:00:10 7 7
2019-01-01 00:00:11 2 2
2019-01-01 00:00:12 4 6
2019-01-01 00:00:13 -1 -1
2019-01-01 00:00:14 5 4
2019-01-01 00:00:15 3 7
2019-01-01 00:00:16 1 1
2019-01-01 00:00:17 -3 -2
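Because each reset depends on everything accumulated since the previous reset, the calculation is hard to vectorize; if speed matters, one option is to keep the same logic but loop over the underlying NumPy array instead of iterrows. A sketch under the same assumptions as the loop above:
import numpy as np

def reset_cumsum(values, threshold=5, frac=0.2):
    # cumulative sum that restarts after reaching `threshold`
    # and exceeding `frac` of the value at the last reset
    out = np.empty(len(values))
    cumsum = 0.0
    last_reset = 0.0
    for i, v in enumerate(values):
        cumsum += v
        out[i] = cumsum
        if cumsum >= threshold and cumsum > frac * last_reset:
            last_reset = cumsum
            cumsum = 0.0
    return out

df['cumsum'] = reset_cumsum(df['data1'].to_numpy())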
I created a dataframe with only a datetime column, at a 1 second interval for January 1, 2018, as shown in the code below.
i = pd.date_range(start='2018-01-01 00:00:00', end='2018-01-01 23:59:00', freq="1S")
ts = pd.DataFrame(index=i)
ts = ts.reset_index()
ts = ts.rename(columns={'index': 'datetime'})
df1:
datetime
0 2018-01-01 00:00:00
1 2018-01-01 00:00:01
2 2018-01-01 00:00:02
3 2018-01-01 00:00:03
4 2018-01-01 00:00:04
5 2018-01-01 00:00:05
6 2018-01-01 00:00:06
7 2018-01-01 00:00:07
8 2018-01-01 00:00:08
9 2018-01-01 00:00:09
10 2018-01-01 00:00:10
11 2018-01-01 00:00:11
12 2018-01-01 00:00:12
13 2018-01-01 00:00:13
14 2018-01-01 00:00:14
15 2018-01-01 00:00:15
16 2018-01-01 00:00:16
17 2018-01-01 00:00:17
18 2018-01-01 00:00:18
19 2018-01-01 00:00:19
20 2018-01-01 00:00:20
21 2018-01-01 00:00:21
22 2018-01-01 00:00:22
23 2018-01-01 00:00:23
24 2018-01-01 00:00:24
25 2018-01-01 00:00:25
26 2018-01-01 00:00:26
27 2018-01-01 00:00:27
28 2018-01-01 00:00:28
29 2018-01-01 00:00:29
I have another dataframe with a datetime column and other columns:
df2:
datetime a b c d e
0 2018-01-01 00:00:04 0.9
1 2018-01-01 00:00:06 0.6 0.7
2 2018-01-01 00:00:09 0.5 0.7 0.8
3 2018-01-01 00:00:16 2.3 3.6 4.9 5.0
4 2018-01-01 00:00:17 0.9 3.5 5.5
5 2018-01-01 00:00:23 0.1 0.6 0.0 1.7
6 2018-01-01 00:00:29 2.7 5.5 4.3
Now I am trying to map the datetime columns of df1 and df2 using a pandas outer join, and I would like my expected result to look like this:
datetime a b c d e
0 2018-01-01 00:00:00
1 2018-01-01 00:00:01
2 2018-01-01 00:00:02
3 2018-01-01 00:00:03
4 2018-01-01 00:00:04 0.9
5 2018-01-01 00:00:05
6 2018-01-01 00:00:06 0.6 0.7
7 2018-01-01 00:00:07
8 2018-01-01 00:00:08
9 2018-01-01 00:00:09 0.5 0.7 0.8
10 2018-01-01 00:00:10
11 2018-01-01 00:00:11
12 2018-01-01 00:00:12
13 2018-01-01 00:00:13
14 2018-01-01 00:00:14
15 2018-01-01 00:00:15
16 2018-01-01 00:00:16 2.3 3.6 4.9 5.0
17 2018-01-01 00:00:17 0.9 3.5 5.5
18 2018-01-01 00:00:18
19 2018-01-01 00:00:19
20 2018-01-01 00:00:20
21 2018-01-01 00:00:21
22 2018-01-01 00:00:22
23 2018-01-01 00:00:23 0.1 0.6 0.0 1.7
24 2018-01-01 00:00:24
25 2018-01-01 00:00:25
26 2018-01-01 00:00:26
27 2018-01-01 00:00:27
28 2018-01-01 00:00:28
29 2018-01-01 00:00:29 2.7 5.5 4.3
but my output looks like this
datetime a b c d e
0 2018-01-01 00:00:00
1 2018-01-01 00:00:01
2 2018-01-01 00:00:02
3 2018-01-01 00:00:03
4 2018-01-01 00:00:04
5 2018-01-01 00:00:05
6 2018-01-01 00:00:06
7 2018-01-01 00:00:07
8 2018-01-01 00:00:08
9 2018-01-01 00:00:09
10 2018-01-01 00:00:10
11 2018-01-01 00:00:11
12 2018-01-01 00:00:12
13 2018-01-01 00:00:13
14 2018-01-01 00:00:14
15 2018-01-01 00:00:15
16 2018-01-01 00:00:16
17 2018-01-01 00:00:17
18 2018-01-01 00:00:18
19 2018-01-01 00:00:19
20 2018-01-01 00:00:20
21 2018-01-01 00:00:21
22 2018-01-01 00:00:22
23 2018-01-01 00:00:23
24 2018-01-01 00:00:24
25 2018-01-01 00:00:25
26 2018-01-01 00:00:26
27 2018-01-01 00:00:27
28 2018-01-01 00:00:28
29 2018-01-01 00:00:29
30 2018-01-01 00:00:04 0.9
31 2018-01-01 00:00:06 0.6 0.7
32 2018-01-01 00:00:09 0.5 0.7 0.8
33 2018-01-01 00:00:16 2.3 3.6 4.9 5.0
34 2018-01-01 00:00:17 0.9 3.5 5.5
35 2018-01-01 00:00:23 0.1 0.6 0.0 1.7
36 2018-01-01 00:00:29 2.7 5.5 4.3
The code I am using to do that operation is:
test = pandas.merge(df1, df2, on = ['datetime'], how= 'outer')
I am not quite sure how to approach this issue and would appreciate some help.
Keep ts with a datetime index and use reindex, as @Scott Boston mentioned in the comments:
i = pd.date_range(start='2018-01-01 00:00:00', end='2018-01-01 23:59:00', freq="1S")
ts = pd.DataFrame(index=i)
df['datetime'] = pd.to_datetime(df['datetime'])
df.set_index('datetime').reindex(ts.index)
a b c d e
2018-01-01 00:00:00 NaN NaN NaN NaN NaN
2018-01-01 00:00:01 NaN NaN NaN NaN NaN
2018-01-01 00:00:02 NaN NaN NaN NaN NaN
2018-01-01 00:00:03 NaN NaN NaN NaN NaN
2018-01-01 00:00:04 0.9
2018-01-01 00:00:05 NaN NaN NaN NaN NaN
2018-01-01 00:00:06 0.6 0.7
2018-01-01 00:00:07 NaN NaN NaN NaN NaN
2018-01-01 00:00:08 NaN NaN NaN NaN NaN
2018-01-01 00:00:09 0.5 0.7 0.8
2018-01-01 00:00:10 NaN NaN NaN NaN NaN
2018-01-01 00:00:11 NaN NaN NaN NaN NaN
2018-01-01 00:00:12 NaN NaN NaN NaN NaN
2018-01-01 00:00:13 NaN NaN NaN NaN NaN
2018-01-01 00:00:14 NaN NaN NaN NaN NaN
2018-01-01 00:00:15 NaN NaN NaN NaN NaN
2018-01-01 00:00:16 2.3 3.6 4.9 5.0
2018-01-01 00:00:17 0.9 3.5 5.5
Option 2: concat
pd.concat([ts, df.set_index('datetime')], axis = 1)
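As a side note, the outer merge from the question should also line up once both datetime columns have the same datetime64 dtype; duplicated rows like those shown usually mean one side still holds strings. A minimal sketch of that route, assuming df1 and df2 as in the question:
df1['datetime'] = pd.to_datetime(df1['datetime'])
df2['datetime'] = pd.to_datetime(df2['datetime'])
test = pd.merge(df1, df2, on='datetime', how='outer').sort_values('datetime')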
I have this information, where "opid" is categorical:
datetime id nut opid user amount
2018-01-01 07:01:00 1531 3hrnd 1 mherrera 1
2018-01-01 07:05:00 9510 sd45f 1 svasqu 1
2018-01-01 07:06:00 8125 5s8fr 15 urubi 1
2018-01-01 07:08:15 6324 sd5d6 1 jgonza 1
2018-01-01 07:12:01 0198 tgfg5 1 julmaf 1
2018-01-01 07:13:50 6589 mbkg4 15 jdjiep 1
2018-01-01 07:16:10 9501 wurf4 15 polga 1
The result I'm looking for is something like this:
datetime opid amount
2018-01-01 07:00:00 1 3
2018-01-01 07:00:00 15 1
2018-01-01 07:10:00 1 1
2018-01-01 07:10:00 15 2
So basically I need to know how many of each "opid" occur every 10 minutes.
P.S. "amount" is always 1 and "opid" ranges from 1 to 15.
Using pd.Grouper:
df.set_index('datetime').groupby(['opid', pd.Grouper(freq='10min')]).amount.sum()
opid datetime
1 2018-01-01 07:00:00 3
2018-01-01 07:10:00 1
15 2018-01-01 07:00:00 1
2018-01-01 07:10:00 2
Name: amount, dtype: int64
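To get the flat layout shown in the question (datetime, opid and amount as columns), reset_index on the result; a short follow-up:
out = (df.set_index('datetime')
         .groupby(['opid', pd.Grouper(freq='10min')])
         .amount.sum()
         .reset_index())
out = out[['datetime', 'opid', 'amount']]  # reorder to match the requested layout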