Pandas count by time slots - python

I'm new to pandas and it's still hard to understand how to do this. In pure Python it becomes long and unreadable. I need to count rows day by day in 1-hour time slots (a row counts toward a slot if its begin-end range falls in that slot).
For example, for the data:
begin_time            end_time
2020-01-01 11:02:10   2020-01-01 12:33:05
2020-01-01 12:22:20   2020-01-01 13:01:51
2020-01-02 09:02:24   2020-01-02 11:33:46
we'll have:
# for 2020-01-01
time slot       count
11:00 - 12:00   1
12:00 - 13:00   2
13:00 - 14:00   1
...
Help will be gladly accepted. Thanks.

This question needs a bit more context, but if counting each row once by its begin time is enough, I think you're looking for (make sure begin_time is a datetime column first, e.g. via pd.to_datetime):
df.groupby(pd.Grouper(key='begin_time', freq='H'))['column_to_count'].count()
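The groupby above counts each row only under its begin time. If a row should instead count toward every hourly slot its begin-end range touches (which the sample output, with 12:00 - 13:00 at 2, suggests), a sketch of one way to do it is to expand each row into the hourly slots it spans and count those (sample data reconstructed from the question):

```python
import pandas as pd

df = pd.DataFrame({
    'begin_time': pd.to_datetime(['2020-01-01 11:02:10',
                                  '2020-01-01 12:22:20',
                                  '2020-01-02 09:02:24']),
    'end_time':   pd.to_datetime(['2020-01-01 12:33:05',
                                  '2020-01-01 13:01:51',
                                  '2020-01-02 11:33:46']),
})

# Expand each row into every hourly slot its begin-end range touches
slots = [pd.date_range(b.floor('H'), e.floor('H'), freq='H')
         for b, e in zip(df.begin_time, df.end_time)]

# Count how many rows fall into each slot
counts = pd.Series([slot for rng in slots for slot in rng]).value_counts().sort_index()
print(counts)
```

Each timestamp in the result marks the start of a one-hour slot, so for 2020-01-01 this reproduces the 11:00 → 1, 12:00 → 2, 13:00 → 1 counts from the question.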


Time difference in pandas (from string format to datetime)

I have the following column
Time
2:00
00:13
1:00
00:24
in object format (strings). These times refer to hours and minutes before a start time that I need to use: 8:00 (it might change; in this example it is 8:00).
Since the times in the Time column refer to hours/minutes ago, the expected result is:
Time
6:00
07:47
7:00
07:36
calculated as a time difference (e.g. 8:00 - 2:00).
However, I am having difficulty doing this calculation and transforming the result into a datetime (keeping only hours and minutes).
I hope you can help me.
Since the Time column contains only Hour:Minute, I suggest using timedelta instead of datetime:
# append ':00' so '2:00' parses as 2 hours, not 2 days
df['Time'] = pd.to_timedelta(df.Time + ':00')
df['Start_Time'] = pd.to_timedelta('8:00:00') - df['Time']
Output:
Time Start_Time
0 02:00:00 06:00:00
1 00:13:00 07:47:00
2 01:00:00 07:00:00
3 00:24:00 07:36:00
You can do it using pd.to_datetime:
ref = pd.to_datetime('08:00')  # define the reference hour here
s = ref - pd.to_datetime(df['Time'])
print(s)
0 06:00:00
1 07:47:00
2 07:00:00
3 07:36:00
Name: Time, dtype: timedelta64[ns]
This returns a Series, which can be converted to a DataFrame with s.to_frame(), for example.
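If the result should also be displayed with hours and minutes only, as in the expected output, the timedelta Series from the first answer can be formatted back into 'HH:MM' strings. A small sketch (the 1900-01-01 anchor date is arbitrary, used only to borrow datetime's strftime; the Result column name is an assumption):

```python
import pandas as pd

df = pd.DataFrame({'Time': ['2:00', '00:13', '1:00', '00:24']})

# Subtract the elapsed time from the 8:00 start, as timedeltas
delta = pd.to_timedelta('8:00:00') - pd.to_timedelta(df['Time'] + ':00')

# Add to an arbitrary anchor date so strftime can render 'HH:MM'
df['Result'] = (pd.Timestamp('1900-01-01') + delta).dt.strftime('%H:%M')
print(df)
```

This yields the 06:00, 07:47, 07:00, 07:36 values from the question while keeping the display limited to hours and minutes.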

How to subtract only the time (excluding date) in 2 Pandas columns?

I'm having a terrible time finding information on this. I am tracking several completion times every day to measure them against a goal completion time.
I am reading the completion date and time into a pandas dataframe and using df.map to map a dictionary of completion times to create a "goal time" column in a dataframe.
Sample Data:
Date Process
1/2/2020 10:20:00 AM Test 1
1/2/2020 10:25:00 AM Test 2
1/3/2020 10:15:00 AM Test 1
1/3/2020 10:00:00 AM Test 2
Using df.map() to create a column with the goal time:
goalmap = {
    'Test 1': dt.datetime.strptime('10:15', '%H:%M'),
    'Test 2': dt.datetime.strptime('10:30', '%H:%M')}
df['Goal Time'] = df['Process'].map(goalmap)
I am then trying to create a new "Delta" column that calculates the time difference between the two in minutes. Most of the issues I am running into relate to the data types. I got it to calculate a time difference by converting the Date column using pd.to_datetime, but because my 'Goal Time' column does not store a date, it calculates a massive delta (back to 1900). I've also tried parsing the time out of the Date column, to no avail.
What is the best way to calculate the difference between the time components only?
I recommend timedelta over datetime:
goalmap = {
    'Test 1': pd.to_timedelta('10:15:00'),
    'Test 2': pd.to_timedelta('10:30:00')}
df['Goal Time'] = df['Process'].map(goalmap)
df['Goal_Timestamp'] = df['Date'].dt.normalize() + df['Goal Time']
df['Meet_Goal'] = df['Date'] <= df['Goal_Timestamp']
Output:
Date Process Goal Time Goal_Timestamp Meet_Goal
0 2020-01-02 10:20:00 Test 1 10:15:00 2020-01-02 10:15:00 False
1 2020-01-02 10:25:00 Test 2 10:30:00 2020-01-02 10:30:00 True
2 2020-01-03 10:15:00 Test 1 10:15:00 2020-01-03 10:15:00 True
3 2020-01-03 10:00:00 Test 2 10:30:00 2020-01-03 10:30:00 True
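To also get the Delta column in minutes that the question asks for, the timedelta between the actual and goal timestamps can be divided by one minute. A minimal sketch building on the same goalmap approach (sample data reconstructed from the question; the Delta column name comes from the question, the sign convention is an assumption: positive means past the goal):

```python
import pandas as pd

df = pd.DataFrame({
    'Date': pd.to_datetime(['1/2/2020 10:20:00 AM', '1/2/2020 10:25:00 AM',
                            '1/3/2020 10:15:00 AM', '1/3/2020 10:00:00 AM']),
    'Process': ['Test 1', 'Test 2', 'Test 1', 'Test 2'],
})
goalmap = {'Test 1': pd.to_timedelta('10:15:00'),
           'Test 2': pd.to_timedelta('10:30:00')}
df['Goal Time'] = df['Process'].map(goalmap)

# Minutes past (positive) or before (negative) the goal time on the same day
goal_ts = df['Date'].dt.normalize() + df['Goal Time']
df['Delta'] = (df['Date'] - goal_ts) / pd.Timedelta(minutes=1)
print(df[['Date', 'Process', 'Delta']])
```

Dividing by pd.Timedelta(minutes=1) avoids the back-to-1900 artifact entirely, because both operands carry the same date before subtraction.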

Difference between arrival and departure

departure_day departure_time arrival_day arrival_time
1 00:00:00 3 01:00:00
1 10:00:00 1 02:00:00
6 15:00:00 1 06:00:00
I would like to have a variable with the difference between those two. 1 is Monday and 7 is Sunday. The trip can also wrap around the week, e.g. from day 6 to day 1 as in the last case.
I would like to convert it to days and hours in the end.
So far I have transformed them into DateTime variables, but I am struggling at the moment. Any tips on how to move forward?
Example output:
difference (in hours)
49
14
39
Please find below a code snippet for this:
from datetime import datetime
from dateutil.relativedelta import relativedelta

FMT = '%H:%M:%S'

def hours(dep_time, arr_time):
    arr = datetime.strptime(arr_time, FMT)
    dep = datetime.strptime(dep_time, FMT)
    return relativedelta(arr, dep)

def day(departure, arrival, dep_time, arr_time):
    if departure <= arrival and dep_time < arr_time:
        total_hours = abs(hours(dep_time, arr_time).hours) + (arrival - departure) * 24
    elif dep_time > arr_time and departure == arrival - 1:
        total_hours = 24 + hours(dep_time, arr_time).hours
    elif dep_time < arr_time and departure != arrival - 1:
        total_hours = abs(hours(dep_time, arr_time).hours) + (7 + arrival - departure) * 24
    elif dep_time > arr_time and departure != arrival - 1:
        total_hours = 24 + hours(dep_time, arr_time).hours + (6 + arrival - departure) * 24
    return total_hours
Note: I think your second row is incorrect. Please check.
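For completeness, the same computation can be done without the branching by placing both stamps on a single weekly timeline with pandas timedeltas and wrapping to the next week whenever arrival is not after departure. A hedged sketch (note it yields 160 hours, not 14, for the second row, consistent with the caveat that that row looks incorrect):

```python
import pandas as pd

df = pd.DataFrame({
    'departure_day': [1, 1, 6],
    'departure_time': ['00:00:00', '10:00:00', '15:00:00'],
    'arrival_day': [3, 1, 1],
    'arrival_time': ['01:00:00', '06:00:00', '06:00:00'],
})

# Position within the week: day 1 at 00:00 is hour 0, day 7 at 23:00 is hour 167
dep = pd.to_timedelta((df.departure_day - 1) * 24, unit='h') + pd.to_timedelta(df.departure_time)
arr = pd.to_timedelta((df.arrival_day - 1) * 24, unit='h') + pd.to_timedelta(df.arrival_time)

# Wrap to the next week when arrival falls at or before departure
diff = (arr - dep).mask(arr <= dep, arr - dep + pd.Timedelta(days=7))
df['difference_hours'] = diff / pd.Timedelta(hours=1)
print(df['difference_hours'])
```

The timedelta result also converts naturally to days and hours (e.g. via diff.dt.components), which the question asked for at the end.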

How to calculate delta time (hour) between two datetime from two csv file?

I have two csv files. The first one (e.g. "time.csv") contains IDs and a specific time (one datetime per ID).
ID datetime
1 2019-05-01 14:00
2 2019-05-02 12:00
3 2019-04-02 10:00
And the other csv file contains other features for each ID with a one-hour timestamp. One ID may have several rows, and I need to create a new column 'deltahour' that shows the difference between the current time and the datetime for that ID in "time.csv".
ID datetime deltahour
1 2019-05-01 08:00 6
1 2019-05-01 09:00 5
1 2019-05-01 10:00 4
.
.
1 2019-05-01 14:00 0
2 2019-05-02 08:00 4
2 2019-05-02 09:00 3
.
.
2 2019-05-01 12:00 0
How do I get this using Pandas? Thanks!
You can do it with merge, then just subtract the values:
df1['datetime'] = pd.to_datetime(df1.datetime)
df2['datetime'] = pd.to_datetime(df2.datetime)
df = df1.merge(df2, on='ID').assign(
    deltahour=lambda x: x['datetime_x'] - x['datetime_y'])
Try using pd.merge after making sure the datetime column is parsed as a date.
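To turn that subtraction into the integer deltahour column from the question, the resulting timedelta can be divided by one hour. A sketch with small sample frames reconstructed from the question (the frame and suffix names here are assumptions):

```python
import pandas as pd

# Reference time per ID, as in "time.csv"
time_df = pd.DataFrame({'ID': [1, 2],
                        'datetime': pd.to_datetime(['2019-05-01 14:00',
                                                    '2019-05-02 12:00'])})
# Hourly observations, several rows per ID
obs_df = pd.DataFrame({'ID': [1, 1, 2],
                       'datetime': pd.to_datetime(['2019-05-01 08:00',
                                                   '2019-05-01 09:00',
                                                   '2019-05-02 08:00'])})

merged = obs_df.merge(time_df, on='ID', suffixes=('', '_ref'))

# Whole hours between the per-ID reference time and each observation
merged['deltahour'] = ((merged['datetime_ref'] - merged['datetime'])
                       / pd.Timedelta(hours=1)).astype(int)
print(merged[['ID', 'datetime', 'deltahour']])
```

This reproduces the 6, 5, ..., 0 countdown per ID that the question shows, with deltahour reaching 0 at the reference time itself.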

Groupby, in dataframe based on the index value (hourly timestamp) - when the index is interrupted

I am working on a dataframe and I need to group rows based on the value of the index. The index is an hourly timestamp, but some specific hours are missing from the dataframe (because they do not satisfy a specific condition). So I need to group together all consecutive hours and start a new group whenever an hour is missing.
The data below describes what I want to achieve:
Timestamp Value
1/2/2017 1:00 231.903601
1/2/2017 2:00 228.225897
1/2/2017 7:00 211.998416
1/2/2017 8:00 227.219204
1/2/2017 9:00 229.203123
1/3/2017 6:00 237.907033
1/3/2017 7:00 206.684276
1/3/2017 8:00 228.4801
The output should be (Starting-ending date and the average value):
Timestamp Avg_Value
1/2/2017 1:00-1/2/2017 2:00 230.06
1/2/2017 7:00-1/2/2017 9:00 222.8
1/3/2017 6:00-1/3/2017 8:00 224.35
Could you please help me with a way to do this with Python dataframes?
Thank you.
First convert the column to a Timestamp.
Then form groups by taking the cumulative sum of a Series that checks whether the time difference from the previous row is not 1 hour. Use .agg to get the relevant calculations for each column.
import pandas as pd
df['Timestamp'] = pd.to_datetime(df.Timestamp, format='%m/%d/%Y %H:%M')
s = df.Timestamp.diff().bfill().dt.total_seconds().ne(3600).cumsum()
df.groupby(s).agg({'Timestamp': ['min', 'max'],
                   'Value': 'mean'}).rename_axis(None)
Output:
Timestamp Value
min max mean
0 2017-01-02 01:00:00 2017-01-02 02:00:00 230.064749
1 2017-01-02 07:00:00 2017-01-02 09:00:00 222.806914
2 2017-01-03 06:00:00 2017-01-03 08:00:00 224.357136
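To match the asked-for output exactly, with a single "start-end" label and the mean, the aggregation can use named columns plus string formatting. A sketch using the same grouping key (the start/end/Avg_Value names are assumptions):

```python
import pandas as pd

df = pd.DataFrame({
    'Timestamp': ['1/2/2017 1:00', '1/2/2017 2:00', '1/2/2017 7:00',
                  '1/2/2017 8:00', '1/2/2017 9:00', '1/3/2017 6:00',
                  '1/3/2017 7:00', '1/3/2017 8:00'],
    'Value': [231.903601, 228.225897, 211.998416, 227.219204,
              229.203123, 237.907033, 206.684276, 228.4801],
})
df['Timestamp'] = pd.to_datetime(df['Timestamp'], format='%m/%d/%Y %H:%M')

# New group each time the gap to the previous row is not exactly one hour
s = df['Timestamp'].diff().bfill().dt.total_seconds().ne(3600).cumsum()

out = df.groupby(s).agg(start=('Timestamp', 'min'),
                        end=('Timestamp', 'max'),
                        Avg_Value=('Value', 'mean'))
# Collapse start and end into one "start-end" label
out['Timestamp'] = (out['start'].dt.strftime('%m/%d/%Y %H:%M') + '-'
                    + out['end'].dt.strftime('%m/%d/%Y %H:%M'))
print(out[['Timestamp', 'Avg_Value']])
```

Named aggregation (pandas 0.25+) avoids the two-level column header of the dict form, which makes the follow-up string formatting simpler.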
