pandas datetime: groupy hourly and every monday

pandas datetime: groupy hourly and every monday - python

I'm new to pandas / python:
I have a dataframe (events.number) indexed by a datetime object.
I'm trying to extract an event count hourly, on every Monday (or other particular weekday). I wrote:
hour_tally_monday = events.number.groupby(lambda x: (x.hour & x.weekday==0) ).count()
but this does not work correctly.
I can drop the "& x.weekday==1" and it works but presumably uses all the days in the frame. What's the right (simplest) syntax to just average over Mondays?

I think you need first filter dataframe with boolean indexing and then use groupby with size:
import pandas as pd
start = pd.to_datetime('2016-02-01')
end = pd.to_datetime('2016-02-25')
rng = pd.date_range(start, end, freq='12H')
events = pd.DataFrame({'number': [1] * 20 + [2] * 15 + [3] * 14}, index=rng)
print events
number
2016-02-01 00:00:00 1
2016-02-01 12:00:00 1
2016-02-02 00:00:00 1
2016-02-02 12:00:00 1
2016-02-03 00:00:00 1
2016-02-03 12:00:00 1
2016-02-04 00:00:00 1
2016-02-04 12:00:00 1
2016-02-05 00:00:00 1
2016-02-05 12:00:00 1
2016-02-06 00:00:00 1
2016-02-06 12:00:00 1
2016-02-07 00:00:00 1
...
...
filtered = events[events.index.weekday == 0]
print filtered
number
2016-02-01 00:00:00 1
2016-02-01 12:00:00 1
2016-02-08 00:00:00 1
2016-02-08 12:00:00 1
2016-02-15 00:00:00 2
2016-02-15 12:00:00 2
2016-02-22 00:00:00 3
2016-02-22 12:00:00 3
In version 0.18.1 you can use new method DatetimeIndex.weekday_name:
filtered = events[events.index.weekday_name == 'Monday']
print filtered
number
2016-02-01 00:00:00 1
2016-02-01 12:00:00 1
2016-02-08 00:00:00 1
2016-02-08 12:00:00 1
2016-02-15 00:00:00 2
2016-02-15 12:00:00 2
2016-02-22 00:00:00 3
2016-02-22 12:00:00 3
print filtered.groupby(filtered.index.hour).size()
0 4
12 4
dtype: int64

Related

How to add a new categorical column with numbering as per time Interval in Pandas

Value
2021-07-15 00:00:00 10
2021-07-15 06:00:00 10
2021-07-15 12:00:00 10
2021-07-15 18:00:00 10
2021-07-16 00:00:00 20
2021-07-16 06:00:00 10
2021-07-16 12:00:00 10
2021-07-16 18:00:00 20
I want to add a column such that when it
00:00:00 1
06:00:00 2
12:00:00 3
18:00:00 4
Eventually, I want something like this
Value Number
2021-07-15 00:00:00 10 1
2021-07-15 06:00:00 10 2
2021-07-15 12:00:00 10 3
2021-07-15 18:00:00 10 4
2021-07-16 00:00:00 20 1
2021-07-16 06:00:00 10 2
2021-07-16 12:00:00 10 3
2021-07-16 18:00:00 20 4
and so on
I want that Numbering column such that whenever it's 00:00:00 time it always says 1, whenever it's 06:00:00 time it always says 2, whenever it's 12:00:00 time it always says 3, whenever it's 18:00:00 time it always says 4. In this way, I will have a categorical column having only 1,2,3,4 values

Sorry, new here, so I don't have enough rep to comment. But #Keiku's solution is closer than you realise. If you replace .time by .hour, you get the hour of the day. Divide that by 6 to get 0-3 categories for 0:00 to 18:00. If you must have them in the range 1-4 specifically, simply add 1.
To borrow #Keiku's example code:
import pandas as pd
df = pd.DataFrame({
'2021-07-15 00:00:00 0.48',
'2021-07-15 06:00:00 80.00',
'2021-07-15 12:00:00 6.10',
'2021-07-15 18:00:00 1400.00',
'2021-07-16 00:00:00 1400.00'
}, columns=['value'])
df['date'] = pd.to_datetime(df['value'].str[:19])
df.sort_values(['date'], ascending=[True], inplace=True)
df['category'] = df['date'].dt.hour / 6 # + 1 if you want this to be 1-4

You can use pd.to_datetime to convert to datetime and .dt.time to extract the time. You can use pd.factorize for 1,2,3,4 categories.
import pandas as pd
df = pd.DataFrame({
'2021-07-15 00:00:00 0.48',
'2021-07-15 06:00:00 80.00',
'2021-07-15 12:00:00 6.10',
'2021-07-15 18:00:00 1400.00',
'2021-07-16 00:00:00 1400.00'
}, columns=['value'])
df
# value
# 0 2021-07-15 00:00:00 0.48
# 1 2021-07-15 06:00:00 80.00
# 2 2021-07-15 12:00:00 6.10
# 3 2021-07-16 00:00:00 1400.00
# 4 2021-07-15 18:00:00 1400.00
df['date'] = pd.to_datetime(df['value'].str[:19])
df.sort_values(['date'], ascending=[True], inplace=True)
df['time'] = df['date'].dt.time
df['index'], _ = pd.factorize(df['time'])
df['index'] += 1
df
# value date time index
# 0 2021-07-15 00:00:00 0.48 2021-07-15 00:00:00 00:00:00 1
# 1 2021-07-15 06:00:00 80.00 2021-07-15 06:00:00 06:00:00 2
# 2 2021-07-15 12:00:00 6.10 2021-07-15 12:00:00 12:00:00 3
# 4 2021-07-15 18:00:00 1400.00 2021-07-15 18:00:00 18:00:00 4
# 3 2021-07-16 00:00:00 1400.00 2021-07-16 00:00:00 00:00:00 1

Create regular time series from irregular interval with python

I wonder if is it possible to convert irregular time series interval to regular one without interpolating value from other column like this :
Index count
2018-01-05 00:00:00 1
2018-01-07 00:00:00 4
2018-01-08 00:00:00 15
2018-01-11 00:00:00 2
2018-01-14 00:00:00 5
2018-01-19 00:00:00 5
....
2018-12-26 00:00:00 6
2018-12-29 00:00:00 7
2018-12-30 00:00:00 8
And I expect the result to be something like this:
Index count
2018-01-01 00:00:00 0
2018-01-02 00:00:00 0
2018-01-03 00:00:00 0
2018-01-04 00:00:00 0
2018-01-05 00:00:00 1
2018-01-06 00:00:00 0
2018-01-07 00:00:00 4
2018-01-08 00:00:00 15
2018-01-09 00:00:00 0
2018-01-10 00:00:00 0
2018-01-11 00:00:00 2
2018-01-12 00:00:00 0
2018-01-13 00:00:00 0
2018-01-14 00:00:00 5
2018-01-15 00:00:00 0
2018-01-16 00:00:00 0
2018-01-17 00:00:00 0
2018-01-18 00:00:00 0
2018-01-19 00:00:00 5
....
2018-12-26 00:00:00 6
2018-12-27 00:00:00 0
2018-12-28 00:00:00 0
2018-12-29 00:00:00 7
2018-12-30 00:00:00 8
2018-12-31 00:00:00 0
So, far I just try resample from pandas but it only partially solved my problem.
Thanks in advance

Use DataFrame.reindex with date_range:
#if necessary
df.index = pd.to_datetime(df.index)
df = df.reindex(pd.date_range('2018-01-01','2018-12-31'), fill_value=0)
print (df)
count
2018-01-01 0
2018-01-02 0
2018-01-03 0
2018-01-04 0
2018-01-05 1
...
2018-12-27 0
2018-12-28 0
2018-12-29 7
2018-12-30 8
2018-12-31 0
[365 rows x 1 columns]

Convert datetime to the cloest time point

I have a dateset as below.
dummy
datetime
2015-10-25 06:00:00 1
2015-04-05 20:00:00 1
2015-11-24 00:00:00 1
2015-08-18 08:00:00 1
2015-10-21 12:00:00 1
I want to change the datetime to the cloest predefined time point, say 00:00:00 and 12:00:00
dummy
datetime
2015-10-25 00:00:00 1
2015-04-05 12:00:00 1
2015-11-24 00:00:00 1
2015-08-18 00:00:00 1
2015-10-21 12:00:00 1

Here is possible use DatetimeIndex.floor:
df.index = df.index.floor('12H')
print (df)
dummy
datetime
2015-10-25 00:00:00 1
2015-04-05 12:00:00 1
2015-11-24 00:00:00 1
2015-08-18 00:00:00 1
2015-10-21 12:00:00 1

pandas groupby time series by 10 min and also keep some columns

i have this information; where "opid" is categorical
datetime id nut opid user amount
2018-01-01 07:01:00 1531 3hrnd 1 mherrera 1
2018-01-01 07:05:00 9510 sd45f 1 svasqu 1
2018-01-01 07:06:00 8125 5s8fr 15 urubi 1
2018-01-01 07:08:15 6324 sd5d6 1 jgonza 1
2018-01-01 07:12:01 0198 tgfg5 1 julmaf 1
2018-01-01 07:13:50 6589 mbkg4 15 jdjiep 1
2018-01-01 07:16:10 9501 wurf4 15 polga 1
the result i'm looking for is something like this
datetime opid amount
2018-01-01 07:00:00 1 3
2018-01-01 07:00:00 15 1
2018-01-01 07:10:00 1 1
2018-01-01 07:10:00 15 2
so... basically i need to know how many of each "opid" are done every 10 min
P.D "amount" is always 1, "opid" is from 1 - 15

Using grouper:
df.set_index('datetime').groupby(['opid', pd.Grouper(freq='10min')]).amount.sum()
opid datetime
1 2018-01-01 07:00:00 3
2018-01-01 07:10:00 1
15 2018-01-01 07:00:00 1
2018-01-01 07:10:00 2
Name: amount, dtype: int64

Conditional selection before certain time of day - Pandas dataframe

I have the above dataframe (snippet) and want create a new dataframe which is a conditional selection where I keep only the rows that are timestamped with a time before 15:00:00.
I'm still somewhat new to Pandas / python and have been stuck on this for a while :(

You can use DataFrame.between_time:
start = pd.to_datetime('2015-02-24 11:00')
rng = pd.date_range(start, periods=10, freq='14h')
df = pd.DataFrame({'Date': rng, 'a': range(10)})
print (df)
Date a
0 2015-02-24 11:00:00 0
1 2015-02-25 01:00:00 1
2 2015-02-25 15:00:00 2
3 2015-02-26 05:00:00 3
4 2015-02-26 19:00:00 4
5 2015-02-27 09:00:00 5
6 2015-02-27 23:00:00 6
7 2015-02-28 13:00:00 7
8 2015-03-01 03:00:00 8
9 2015-03-01 17:00:00 9
df = df.set_index('Date').between_time('00:00:00', '15:00:00')
print (df)
a
Date
2015-02-24 11:00:00 0
2015-02-25 01:00:00 1
2015-02-25 15:00:00 2
2015-02-26 05:00:00 3
2015-02-27 09:00:00 5
2015-02-28 13:00:00 7
2015-03-01 03:00:00 8
If need exclude 15:00:00 add parameter include_end=False:
df = df.set_index('Date').between_time('00:00:00', '15:00:00', include_end=False)
print (df)
a
Date
2015-02-24 11:00:00 0
2015-02-25 01:00:00 1
2015-02-26 05:00:00 3
2015-02-27 09:00:00 5
2015-02-28 13:00:00 7
2015-03-01 03:00:00 8

You can check the hours of the date column and use it for subsetting:
df['date'] = pd.to_datetime(df['date']) # optional if the date column is of datetime type
df[df.date.dt.hour < 15]

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

pandas datetime: groupy hourly and every monday - python

Related

How to add a new categorical column with numbering as per time Interval in Pandas

Create regular time series from irregular interval with python

Convert datetime to the cloest time point

pandas groupby time series by 10 min and also keep some columns

Conditional selection before certain time of day - Pandas dataframe

Categories

Resources