pandas datetime: groupby hourly and every Monday - python

I'm new to pandas / python:
I have a dataframe (events.number) indexed by a datetime object.
I'm trying to extract an event count hourly, on every Monday (or other particular weekday). I wrote:
hour_tally_monday = events.number.groupby(lambda x: (x.hour & x.weekday==0) ).count()
but this does not work correctly.
I can drop the "& x.weekday==0" and it works, but then it presumably uses all the days in the frame. What's the right (simplest) syntax to just average over Mondays?

I think you first need to filter the DataFrame with boolean indexing and then use groupby with size:
import pandas as pd
start = pd.to_datetime('2016-02-01')
end = pd.to_datetime('2016-02-25')
rng = pd.date_range(start, end, freq='12h')  # lowercase 'h'; uppercase 'H' is deprecated in recent pandas
events = pd.DataFrame({'number': [1] * 20 + [2] * 15 + [3] * 14}, index=rng)
print(events)
number
2016-02-01 00:00:00 1
2016-02-01 12:00:00 1
2016-02-02 00:00:00 1
2016-02-02 12:00:00 1
2016-02-03 00:00:00 1
2016-02-03 12:00:00 1
2016-02-04 00:00:00 1
2016-02-04 12:00:00 1
2016-02-05 00:00:00 1
2016-02-05 12:00:00 1
2016-02-06 00:00:00 1
2016-02-06 12:00:00 1
2016-02-07 00:00:00 1
...
...
filtered = events[events.index.weekday == 0]
print(filtered)
number
2016-02-01 00:00:00 1
2016-02-01 12:00:00 1
2016-02-08 00:00:00 1
2016-02-08 12:00:00 1
2016-02-15 00:00:00 2
2016-02-15 12:00:00 2
2016-02-22 00:00:00 3
2016-02-22 12:00:00 3
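From version 0.18.1 you could use DatetimeIndex.weekday_name, but that attribute was removed in pandas 1.0. In current versions use the method DatetimeIndex.day_name() instead:
filtered = events[events.index.day_name() == 'Monday']
print(filtered)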
number
2016-02-01 00:00:00 1
2016-02-01 12:00:00 1
2016-02-08 00:00:00 1
2016-02-08 12:00:00 1
2016-02-15 00:00:00 2
2016-02-15 12:00:00 2
2016-02-22 00:00:00 3
2016-02-22 12:00:00 3
print(filtered.groupby(filtered.index.hour).size())
0 4
12 4
dtype: int64
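Since the question also asks about averaging over Mondays, here is a minimal sketch that puts the steps above together for Python 3 and then divides the per-hour totals by the number of distinct Mondays. The names mondays, hourly_counts, n_mondays and hourly_mean are only illustrative, and it assumes "average" means events per hour on a typical Monday:

```python
import pandas as pd

# Same toy data as above: one event every 12 hours.
rng = pd.date_range('2016-02-01', '2016-02-25', freq='12h')
events = pd.DataFrame({'number': [1] * 20 + [2] * 15 + [3] * 14}, index=rng)

# Keep only Mondays (dayofweek == 0), then count rows per hour of day.
mondays = events[events.index.dayofweek == 0]
hourly_counts = mondays.groupby(mondays.index.hour).size()

# Divide the totals by the number of distinct Mondays in the data
# to get the average number of events per hour on a single Monday.
n_mondays = mondays.index.normalize().nunique()
hourly_mean = hourly_counts / n_mondays

print(hourly_counts)
print(hourly_mean)
```

For another weekday, change the 0 in the filter (Monday is 0, Sunday is 6).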

Related

How to add a new categorical column with numbering as per time Interval in Pandas

Value
2021-07-15 00:00:00 10
2021-07-15 06:00:00 10
2021-07-15 12:00:00 10
2021-07-15 18:00:00 10
2021-07-16 00:00:00 20
2021-07-16 06:00:00 10
2021-07-16 12:00:00 10
2021-07-16 18:00:00 20
I want to add a column that maps each time of day to a number:
00:00:00 1
06:00:00 2
12:00:00 3
18:00:00 4
Eventually, I want something like this
Value Number
2021-07-15 00:00:00 10 1
2021-07-15 06:00:00 10 2
2021-07-15 12:00:00 10 3
2021-07-15 18:00:00 10 4
2021-07-16 00:00:00 20 1
2021-07-16 06:00:00 10 2
2021-07-16 12:00:00 10 3
2021-07-16 18:00:00 20 4
and so on
I want the Number column to always be 1 at 00:00:00, 2 at 06:00:00, 3 at 12:00:00 and 4 at 18:00:00. That way I will have a categorical column containing only the values 1, 2, 3 and 4.
Sorry, new here, so I don't have enough rep to comment. But @Keiku's solution is closer than you realise. If you replace .time by .hour, you get the hour of the day. Integer-divide that by 6 to get categories 0-3 for 0:00 to 18:00. If you must have them in the range 1-4 specifically, simply add 1.
To borrow @Keiku's example code:
import pandas as pd
df = pd.DataFrame([  # a list, not a set: recent pandas versions reject a set here
    '2021-07-15 00:00:00 0.48',
    '2021-07-15 06:00:00 80.00',
    '2021-07-15 12:00:00 6.10',
    '2021-07-15 18:00:00 1400.00',
    '2021-07-16 00:00:00 1400.00'
], columns=['value'])
df['date'] = pd.to_datetime(df['value'].str[:19])
df.sort_values(['date'], ascending=[True], inplace=True)
df['category'] = df['date'].dt.hour // 6  # integer division gives 0-3; add 1 if you want 1-4
You can use pd.to_datetime to convert to datetime and .dt.time to extract the time. You can use pd.factorize for 1,2,3,4 categories.
import pandas as pd
df = pd.DataFrame([  # a list keeps the row order; recent pandas versions reject a set here
    '2021-07-15 00:00:00 0.48',
    '2021-07-15 06:00:00 80.00',
    '2021-07-15 12:00:00 6.10',
    '2021-07-15 18:00:00 1400.00',
    '2021-07-16 00:00:00 1400.00'
], columns=['value'])
df
# value
# 0 2021-07-15 00:00:00 0.48
# 1 2021-07-15 06:00:00 80.00
# 2 2021-07-15 12:00:00 6.10
# 3 2021-07-15 18:00:00 1400.00
# 4 2021-07-16 00:00:00 1400.00
df['date'] = pd.to_datetime(df['value'].str[:19])
df.sort_values(['date'], ascending=[True], inplace=True)
df['time'] = df['date'].dt.time
# factorize numbers each distinct time from 0 in order of first appearance
df['index'], _ = pd.factorize(df['time'])
df['index'] += 1
df
# value date time index
# 0 2021-07-15 00:00:00 0.48 2021-07-15 00:00:00 00:00:00 1
# 1 2021-07-15 06:00:00 80.00 2021-07-15 06:00:00 06:00:00 2
# 2 2021-07-15 12:00:00 6.10 2021-07-15 12:00:00 12:00:00 3
# 3 2021-07-15 18:00:00 1400.00 2021-07-15 18:00:00 18:00:00 4
# 4 2021-07-16 00:00:00 1400.00 2021-07-16 00:00:00 00:00:00 1
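If the timestamps are already the index, as in the question, the hour-based idea can be applied to the index directly. This is only a sketch: the frame below is rebuilt from the question's sample, and it assumes every timestamp falls exactly on a 6-hour mark.

```python
import pandas as pd

# Rebuild the question's sample: one row every 6 hours over two days.
idx = pd.date_range('2021-07-15', periods=8, freq='6h')
df = pd.DataFrame({'Value': [10, 10, 10, 10, 20, 10, 10, 20]}, index=idx)

# hour is 0, 6, 12 or 18; integer-divide by 6 and shift to get 1..4.
df['Number'] = df.index.hour // 6 + 1
print(df)
```

Unlike factorize, this does not depend on which time of day happens to appear first in the data.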

Create regular time series from irregular interval with python

I wonder whether it is possible to convert a time series with irregular intervals into a regular one without interpolating values from the other column, like this:
Index count
2018-01-05 00:00:00 1
2018-01-07 00:00:00 4
2018-01-08 00:00:00 15
2018-01-11 00:00:00 2
2018-01-14 00:00:00 5
2018-01-19 00:00:00 5
....
2018-12-26 00:00:00 6
2018-12-29 00:00:00 7
2018-12-30 00:00:00 8
And I expect the result to be something like this:
Index count
2018-01-01 00:00:00 0
2018-01-02 00:00:00 0
2018-01-03 00:00:00 0
2018-01-04 00:00:00 0
2018-01-05 00:00:00 1
2018-01-06 00:00:00 0
2018-01-07 00:00:00 4
2018-01-08 00:00:00 15
2018-01-09 00:00:00 0
2018-01-10 00:00:00 0
2018-01-11 00:00:00 2
2018-01-12 00:00:00 0
2018-01-13 00:00:00 0
2018-01-14 00:00:00 5
2018-01-15 00:00:00 0
2018-01-16 00:00:00 0
2018-01-17 00:00:00 0
2018-01-18 00:00:00 0
2018-01-19 00:00:00 5
....
2018-12-26 00:00:00 6
2018-12-27 00:00:00 0
2018-12-28 00:00:00 0
2018-12-29 00:00:00 7
2018-12-30 00:00:00 8
2018-12-31 00:00:00 0
So far I have only tried resample from pandas, but it only partially solved my problem.
Thanks in advance
Use DataFrame.reindex with date_range:
# only needed if the index is not already a DatetimeIndex
df.index = pd.to_datetime(df.index)
df = df.reindex(pd.date_range('2018-01-01','2018-12-31'), fill_value=0)
print (df)
count
2018-01-01 0
2018-01-02 0
2018-01-03 0
2018-01-04 0
2018-01-05 1
...
2018-12-27 0
2018-12-28 0
2018-12-29 7
2018-12-30 8
2018-12-31 0
[365 rows x 1 columns]
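A small sketch of why resample-style approaches may only partially work here: asfreq (like resample) fills gaps only between the first and last dates present, while reindex against an explicit date_range also pads the start and end. The four-row frame and the two-week range are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {'count': [1, 4, 15, 2]},
    index=pd.to_datetime(['2018-01-05', '2018-01-07', '2018-01-08', '2018-01-11']),
)

# asfreq fills missing days only between the first and last existing dates.
inner = df.asfreq('D', fill_value=0)

# reindex against an explicit range also adds the days before and after the data.
full = df.reindex(pd.date_range('2018-01-01', '2018-01-14', freq='D'), fill_value=0)

print(len(inner), len(full))
```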

Convert datetime to the closest time point

I have a dataset as below.
dummy
datetime
2015-10-25 06:00:00 1
2015-04-05 20:00:00 1
2015-11-24 00:00:00 1
2015-08-18 08:00:00 1
2015-10-21 12:00:00 1
I want to change each datetime to the closest predefined time point, say 00:00:00 or 12:00:00:
dummy
datetime
2015-10-25 00:00:00 1
2015-04-05 12:00:00 1
2015-11-24 00:00:00 1
2015-08-18 00:00:00 1
2015-10-21 12:00:00 1
You can use DatetimeIndex.floor, which rounds each timestamp down to the previous 12-hour boundary and reproduces the expected output:
df.index = df.index.floor('12h')
print (df)
dummy
datetime
2015-10-25 00:00:00 1
2015-04-05 12:00:00 1
2015-11-24 00:00:00 1
2015-08-18 00:00:00 1
2015-10-21 12:00:00 1
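Note that the expected output in the question moves each timestamp back to the previous 00:00 or 12:00, which is what floor does. If the nearest point is really wanted, DatetimeIndex.round is the counterpart. A minimal comparison on some of the question's timestamps (the 06:00 row is left out because it sits exactly halfway between the two points):

```python
import pandas as pd

idx = pd.to_datetime([
    '2015-04-05 20:00:00',
    '2015-11-24 00:00:00',
    '2015-08-18 08:00:00',
    '2015-10-21 12:00:00',
])

floored = idx.floor('12h')  # always moves back to the previous 00:00 or 12:00
rounded = idx.round('12h')  # moves to whichever of 00:00 / 12:00 is nearer

print(pd.DataFrame({'floor': floored, 'round': rounded}, index=idx))
```

Here 20:00 floors to 12:00 on the same day but rounds to 00:00 on the next day, and 08:00 floors to 00:00 but rounds to 12:00.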

pandas groupby time series by 10 min and also keep some columns

I have this data, where "opid" is categorical:
datetime id nut opid user amount
2018-01-01 07:01:00 1531 3hrnd 1 mherrera 1
2018-01-01 07:05:00 9510 sd45f 1 svasqu 1
2018-01-01 07:06:00 8125 5s8fr 15 urubi 1
2018-01-01 07:08:15 6324 sd5d6 1 jgonza 1
2018-01-01 07:12:01 0198 tgfg5 1 julmaf 1
2018-01-01 07:13:50 6589 mbkg4 15 jdjiep 1
2018-01-01 07:16:10 9501 wurf4 15 polga 1
The result I'm looking for is something like this:
datetime opid amount
2018-01-01 07:00:00 1 3
2018-01-01 07:00:00 15 1
2018-01-01 07:10:00 1 1
2018-01-01 07:10:00 15 2
So basically I need to know how many of each "opid" occur in every 10-minute window.
P.S. "amount" is always 1, and "opid" ranges from 1 to 15.
Using pd.Grouper (this assumes the datetime column already has a datetime dtype):
df.set_index('datetime').groupby(['opid', pd.Grouper(freq='10min')]).amount.sum()
opid datetime
1 2018-01-01 07:00:00 3
2018-01-01 07:10:00 1
15 2018-01-01 07:00:00 1
2018-01-01 07:10:00 2
Name: amount, dtype: int64
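To get the flat datetime / opid / amount layout asked for in the question, one possible variation is to pass key= to pd.Grouper (so set_index is not needed), put the time bin first, and call reset_index. The frame below is rebuilt from the question's sample with only the columns that matter:

```python
import pandas as pd

df = pd.DataFrame({
    'datetime': pd.to_datetime([
        '2018-01-01 07:01:00', '2018-01-01 07:05:00', '2018-01-01 07:06:00',
        '2018-01-01 07:08:15', '2018-01-01 07:12:01', '2018-01-01 07:13:50',
        '2018-01-01 07:16:10',
    ]),
    'opid': [1, 1, 15, 1, 1, 15, 15],
    'amount': [1] * 7,
})

# key= lets Grouper bin a regular column into 10-minute windows.
result = (
    df.groupby([pd.Grouper(key='datetime', freq='10min'), 'opid'])['amount']
      .sum()
      .reset_index()
)
print(result)
```

Since "amount" is always 1, .size() on the same groupby would give the same counts.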

Conditional selection before certain time of day - Pandas dataframe

I have the above dataframe (snippet) and want to create a new dataframe by conditional selection, keeping only the rows whose timestamp has a time before 15:00:00.
I'm still somewhat new to pandas / Python and have been stuck on this for a while :(
You can use DataFrame.between_time:
start = pd.to_datetime('2015-02-24 11:00')
rng = pd.date_range(start, periods=10, freq='14h')
df = pd.DataFrame({'Date': rng, 'a': range(10)})
print (df)
Date a
0 2015-02-24 11:00:00 0
1 2015-02-25 01:00:00 1
2 2015-02-25 15:00:00 2
3 2015-02-26 05:00:00 3
4 2015-02-26 19:00:00 4
5 2015-02-27 09:00:00 5
6 2015-02-27 23:00:00 6
7 2015-02-28 13:00:00 7
8 2015-03-01 03:00:00 8
9 2015-03-01 17:00:00 9
df = df.set_index('Date').between_time('00:00:00', '15:00:00')
print (df)
a
Date
2015-02-24 11:00:00 0
2015-02-25 01:00:00 1
2015-02-25 15:00:00 2
2015-02-26 05:00:00 3
2015-02-27 09:00:00 5
2015-02-28 13:00:00 7
2015-03-01 03:00:00 8
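If you need to exclude 15:00:00, pass inclusive='left' (pandas 1.4 and later; older versions used include_end=False, which was removed in pandas 2.0):
df = df.between_time('00:00:00', '15:00:00', inclusive='left')  # df is already indexed by Date from the previous step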
print (df)
a
Date
2015-02-24 11:00:00 0
2015-02-25 01:00:00 1
2015-02-26 05:00:00 3
2015-02-27 09:00:00 5
2015-02-28 13:00:00 7
2015-03-01 03:00:00 8
You can check the hour of the date column and use it for subsetting:
df['date'] = pd.to_datetime(df['date'])  # not needed if the column is already of datetime type
df[df.date.dt.hour < 15]
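The hour comparison only works for cut-offs on the full hour. For a cut-off such as 15:30 you could compare the time-of-day part against a datetime.time instead. A minimal sketch on the same sample frame as in the between_time answer (before_three is just an illustrative name):

```python
from datetime import time

import pandas as pd

rng = pd.date_range('2015-02-24 11:00', periods=10, freq='14h')
df = pd.DataFrame({'Date': rng, 'a': range(10)})

# .dt.time yields datetime.time objects; rows at exactly 15:00:00 are excluded.
before_three = df[df['Date'].dt.time < time(15, 0)]
print(before_three)
```

This keeps Date as a regular column, so no set_index is needed.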
