I have the following column
Time
2:00
00:13
1:00
00:24
in object format (strings). Each value is a number of hours and minutes before a reference time, which is 8:00 in this example (the reference time may change).
Since the values in the Time column are offsets before the reference time, the result I expect is
Time
6:00
07:47
7:00
07:36
calculated as a time difference (e.g. 8:00 - 2:00).
However, I am having difficulty doing this calculation and converting the result to a datetime (keeping only hours and minutes).
I hope you can help me.
Since the Time column contains only Hour:Minute values, I suggest using timedelta instead of datetime:
df['Time'] = pd.to_timedelta(df.Time+':00')
df['Start_Time'] = pd.to_timedelta('8:00:00') - df['Time']
Output:
Time Start_Time
0 02:00:00 06:00:00
1 00:13:00 07:47:00
2 01:00:00 07:00:00
3 00:24:00 07:36:00
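If the result should be shown as plain hour:minute text rather than as Timedelta values, one possible sketch follows. The DataFrame is rebuilt from the sample in the question, and the code assumes no offset is larger than the start time, since negative durations would not format correctly this way.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"Time": ["2:00", "00:13", "1:00", "00:24"]})

# Parse "H:MM" strings as durations by appending a seconds field.
ago = pd.to_timedelta(df["Time"] + ":00")
start = pd.to_timedelta("8:00:00") - ago

# Render each duration as zero-padded "HH:MM" text.
total_minutes = (start.dt.total_seconds() // 60).astype(int)
df["Start_Time"] = (
    (total_minutes // 60).astype(str).str.zfill(2)
    + ":"
    + (total_minutes % 60).astype(str).str.zfill(2)
)
print(df)
```

The Start_Time column then holds strings, so it is suitable for display but not for further time arithmetic.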
You can do it using pd.to_datetime:
ref = pd.to_datetime('08:00')  # define the reference time here
s = ref - pd.to_datetime(df['Time'])
print(s)
0 06:00:00
1 07:47:00
2 07:00:00
3 07:36:00
Name: Time, dtype: timedelta64[ns]
This returns a Series, which can be converted to a DataFrame with s.to_frame(), for example.
Related
I'm new to pandas and still find it hard to understand how to do this. In pure Python, it becomes hard and unreadable. I need to count rows day by day in 1-hour time slots (that is, how many begin-end ranges overlap each slot).
For ex., for data:
begin_time end_time
2020-01-01 11:02:10 2020-01-01 12:33:05
2020-01-01 12:22:20 2020-01-01 13:01:51
2020-01-02 09:02:24 2020-01-02 11:33:46
we'll have:
# for 2020-01-01
time slot count
11:00 - 12:00 1
12:00 - 13:00 2
13:00 - 14:00 1
...
Help will be gladly accepted. Thanks.
This question needs a bit more context but I think you're looking for
df.groupby([pd.Grouper(key='begin_time', freq='H')])['column_to_count'].count()
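Grouping on begin_time alone counts each row once, in the slot where it starts. If a row should instead be counted in every hourly slot its begin-end range touches, as the expected output in the question suggests, a rough sketch could look like this. The DataFrame is rebuilt from the sample data, and a range ending exactly on the hour is assumed to count for the slot that starts at that hour.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "begin_time": pd.to_datetime([
        "2020-01-01 11:02:10", "2020-01-01 12:22:20", "2020-01-02 09:02:24",
    ]),
    "end_time": pd.to_datetime([
        "2020-01-01 12:33:05", "2020-01-01 13:01:51", "2020-01-02 11:33:46",
    ]),
})

# For each row, list the start of every hourly slot the range touches.
slot_starts = [
    slot
    for begin, end in zip(df["begin_time"], df["end_time"])
    for slot in pd.date_range(begin.floor("h"), end.floor("h"), freq="h")
]

# Count how many rows touch each slot, ordered by slot start.
counts = pd.Series(slot_starts).value_counts().sort_index()
print(counts)
```

The index of counts is the start of each slot, so a per-day view can be obtained by grouping on counts.index.date.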
departure_day departure_time arrival_day arrival_time
1 00:00:00 3 01:00:00
1 10:00:00 1 02:00:00
6 15:00:00 1 06:00:00
I would like a variable that holds the difference between the two. Days are numbered from 1 (Monday) to 7 (Sunday), and the week can wrap around, for example from 6 to 1 as in the last row.
I would like to convert it to hours and days in the end.
So far I have transformed them also to DateTime variables but I am struggling at the moment. Any tips on how to move forward?
Example output:
difference (in hours)
49
14
39
Please find below the code snippet for this:
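from datetime import datetime
from dateutil.relativedelta import relativedelta

FMT = '%H:%M:%S'  # assumed format of departure_time / arrival_time

def day(departure, arrival, dep_time, arr_time):
    # Same or later weekday, later clock time.
    if departure <= arrival and dep_time < arr_time:
        total_hours = abs(hours(dep_time, arr_time).hours) + (arrival - departure) * 24
    # Arrival on the following weekday, earlier clock time.
    elif dep_time > arr_time and departure == arrival - 1:
        total_hours = 24 + hours(dep_time, arr_time).hours
    # Later clock time, arrival weekday earlier than departure weekday (wraps the week).
    elif dep_time < arr_time and departure != arrival - 1:
        total_hours = abs(hours(dep_time, arr_time).hours) + (7 + arrival - departure) * 24
    # Earlier clock time, any other weekday combination.
    elif dep_time > arr_time and departure != arrival - 1:
        total_hours = 24 + hours(dep_time, arr_time).hours + (6 + arrival - departure) * 24
    return total_hours

def hours(dep_time, arr_time):
    # Signed clock-time difference between arrival and departure.
    arr = datetime.strptime(arr_time, FMT)
    dep = datetime.strptime(dep_time, FMT)
    diff = relativedelta(arr, dep)
    return diff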
Note: I think your second row is incorrect. Please check.
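The weekday wrap-around can also be handled with modular arithmetic instead of separate branches. The sketch below is an alternative, not the code above: the function name hours_between and the '%H:%M:%S' format are assumptions, and it treats an arrival that is not later in the same week as falling in the following week.

```python
from datetime import datetime

FMT = "%H:%M:%S"  # assumed time format, matching the sample data
WEEK_HOURS = 7 * 24

def hours_between(dep_day, dep_time, arr_day, arr_time):
    """Hours from departure to the next matching arrival, wrapping at the week boundary."""
    dep = datetime.strptime(dep_time, FMT)
    arr = datetime.strptime(arr_time, FMT)
    # Position of each event in the week, in hours since Monday 00:00.
    dep_hours = (dep_day - 1) * 24 + dep.hour + dep.minute / 60
    arr_hours = (arr_day - 1) * 24 + arr.hour + arr.minute / 60
    # The modulo maps a negative difference onto the following week.
    return (arr_hours - dep_hours) % WEEK_HOURS

print(hours_between(1, "00:00:00", 3, "01:00:00"))  # 49.0
print(hours_between(1, "10:00:00", 1, "02:00:00"))  # 160.0
print(hours_between(6, "15:00:00", 1, "06:00:00"))  # 39.0
```

Under this assumption the second row comes out as 160 hours (almost a full week) rather than 14, which fits the note that the expected value for that row looks wrong.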
I am working on a dataframe and need to group rows based on the value of the index. The index is an hourly timestamp, but some hours are missing from the dataframe (because they do not satisfy a specific condition). So I need to group all consecutive hours together, and start a new group whenever an hour is missing.
The image below describes what I want to achieve:
Timestamp Value
1/2/2017 1:00 231.903601
1/2/2017 2:00 228.225897
1/2/2017 7:00 211.998416
1/2/2017 8:00 227.219204
1/2/2017 9:00 229.203123
1/3/2017 6:00 237.907033
1/3/2017 7:00 206.684276
1/3/2017 8:00 228.4801
The output should be (Starting-ending date and the average value):
Timestamp Avg_Value
1/2/2017 1:00-1/2/2017 2:00 230.06
1/2/2017 7:00-1/2/2017 9:00 222.8
1/3/2017 6:00-1/3/2017 8:00 224.35
Could you please help me with a way to do this with Python dataframes?
Thank you,
First convert to a Timestamp.
Then form groups by taking the cumulative sum of a Series that checks if the time difference is not 1 Hour. Use .agg to get the relevant calculations for each column.
import pandas as pd
df['Timestamp'] = pd.to_datetime(df.Timestamp, format='%m/%d/%Y %H:%M')
s = df.Timestamp.diff().bfill().dt.total_seconds().ne(3600).cumsum()
df.groupby(s).agg({'Timestamp': ['min', 'max'], 'Value': 'mean'}).rename_axis(None, axis=0)
Output:
Timestamp Value
min max mean
0 2017-01-02 01:00:00 2017-01-02 02:00:00 230.064749
1 2017-01-02 07:00:00 2017-01-02 09:00:00 222.806914
2 2017-01-03 06:00:00 2017-01-03 08:00:00 224.357136
I'm using rrule as shown here:
https://labix.org/python-dateutil#head-470fa22b2db72000d7abe698a5783a46b0731b57
I'm wondering if it somehow possible to create a rule where different times are specified for different weekdays
e.g. WEEKLY Thursday 6pm and Saturday 10am
Hope someone can help :)
A single rrule cannot specify different hours for different weekdays, but you could use an rrule.rruleset to combine rrules:
import datetime as DT
import dateutil.rrule as RR
today = DT.date.today()
aset = RR.rruleset()
aset.rrule(RR.rrule(RR.WEEKLY, byweekday=RR.TH, byhour=18, count=3, dtstart=today))
aset.rrule(RR.rrule(RR.WEEKLY, byweekday=RR.SA, byhour=10, count=3, dtstart=today))
for date in aset:
    print(date)
yields
2015-03-26 18:00:00
2015-03-28 10:00:00
2015-04-02 18:00:00
2015-04-04 10:00:00
2015-04-09 18:00:00
2015-04-11 10:00:00
This is my current code
class TimeSeries():
    def year(year):
        today = datetime.now()
        start_date = today+relativedelta(years=-1)
        mint, maxt = datetime.min.time(), datetime.max.time()
        for st in rrule(MONTHLY, count=24, bymonthday=(1,-1,), dtstart=start_date):
            yield st.combine(st, mint)
And this is output from this:
for y in TimeSeries().year():
    print(y)
2013-01-31 00:00:00
2013-02-01 00:00:00
2013-02-28 00:00:00
2013-03-01 00:00:00
2013-03-31 00:00:00
2013-04-01 00:00:00
2013-04-30 00:00:00
2013-05-01 00:00:00
2013-05-31 00:00:00
2013-06-01 00:00:00
2013-06-30 00:00:00
2013-07-01 00:00:00
2013-07-31 00:00:00
2013-08-01 00:00:00
2013-08-31 00:00:00
2013-09-01 00:00:00
2013-09-30 00:00:00
2013-10-01 00:00:00
2013-10-31 00:00:00
2013-11-01 00:00:00
2013-11-30 00:00:00
2013-12-01 00:00:00
2013-12-31 00:00:00
2014-01-01 00:00:00
The question is how I can make the sequence start at 2013-01-01 00:00:00, with each month end given as 2013-01-31 23:59:59, and so on.
The loop should also end at 2014-01-31 23:59:59 instead of 2014-01-01 00:00:00.
I would also like the start date and end date on one line:
2013-03-01 00:00:00 2013-03-31 23:59:59
2013-04-01 00:00:00 2013-04-30 23:59:59
...
...
2014-01-01 00:00:00 2014-01-31 23:59:59
Any suggestion?
First, are you really sure that you want 2013-03-31 23:59:59? Date intervals are traditionally specified as half-open intervals, just like ranges in Python. The reason is that 23:59:59 is not actually the end of a day.
Most obviously, 23:59:59.001 is later than that but on the same day. Python datetime objects include microseconds, so this isn't just a "meh, whatever" problem—if you, e.g., call now(), you can get a time that's incorrectly later than your "end of the day" on the same day.
Less obviously, on a day with a leap second, 23:59:60 is also later but on the same day.
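To make the difference concrete, here is a small sketch comparing a closed interval ending at 23:59:59 with a half-open interval ending at the start of the next month. The variable names are illustrative only.

```python
from datetime import datetime

month_start = datetime(2013, 3, 1)
next_month_start = datetime(2013, 4, 1)
closed_end = datetime(2013, 3, 31, 23, 59, 59)

# A timestamp half a second before midnight on the last day of March.
late = datetime(2013, 3, 31, 23, 59, 59, 500000)

# Closed interval [start, 23:59:59] misses it.
in_closed = month_start <= late <= closed_end
# Half-open interval [start, next start) includes it.
in_half_open = month_start <= late < next_month_start

print(in_closed, in_half_open)  # False True
```

With the half-open form, consecutive months also share a boundary, so no timestamp can fall between two intervals.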
But if you really want this, there are two obvious ways to get it:
You're already iterating dates instead of datetimes and combining the times in manually. And it's obvious when you're dealing with a day 1 vs. day -1, because the date's day member will be 1 or it won't be. So:
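from datetime import datetime
from dateutil.relativedelta import relativedelta
from dateutil.rrule import rrule, MONTHLY

class TimeSeries():
    def year(self):
        today = datetime.now()
        start_date = today + relativedelta(years=-1)
        mint, maxt = datetime.min.time(), datetime.max.time()
        for st in rrule(MONTHLY, count=24, bymonthday=(1, -1,), dtstart=start_date):
            # First of the month gets 00:00:00; last day gets 23:59:59.999999.
            yield st.combine(st, mint if st.day == 1 else maxt)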
Alternatively, instead of iterating both first and last days, just iterate first days, and subtract a second to get the last second of the previous month:
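from datetime import datetime, timedelta
from dateutil.relativedelta import relativedelta
from dateutil.rrule import rrule, MONTHLY

class TimeSeries():
    def year(self):
        today = datetime.now()
        start_date = today + relativedelta(years=-1)
        mint, maxt = datetime.min.time(), datetime.max.time()
        for st in rrule(MONTHLY, count=24, bymonthday=(1,), dtstart=start_date):
            dt = st.combine(st, mint)
            # Last second of the previous month, then the first instant of this one.
            yield dt - timedelta(seconds=1)
            yield dt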
As far as printing these in pairs… well, as written, that's an underspecified problem. The first value in your list is the second value in a pair—except when you run this on the 1st of a month. And likewise, the last date is the first value in a pair, except when you run this on the 31st. So, what do you want to do with them?
If this isn't obvious, look at your example. Your first value is 2013-01-31 00:00:00, but your first pair doesn't start with 2013-01-31.
There are many things you could want here:
Start with the first of the month a year ago, rather than the first first-or-last of the month that happened within the last year. And likewise for the end. So you would have 2013-01-01 in your list, and there would always be pairs.
Start with the first month that started within the last year, and likewise for the end. So you wouldn't get 2013-01-31 in your list, and there would always be pairs.
Use your current rule, and where there's not a pair, use None for the missing value.
etc.
Whatever rule you actually want can be coded up pretty easily. And then you'll probably want to yield in (start, end) tuples, so the print loop can just do this:
for start, end in TimeSeries().year():
    print(start, end)
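As one possible sketch of the first option (start with the first of the month that contains the date a year ago, so every value has a partner), the generator below yields (start, end) tuples. It uses only the standard library rather than rrule, the name month_ranges is made up for illustration, and the end of each month is 23:59:59 as requested in the question, with the caveats about that choice noted earlier.

```python
import calendar
from datetime import datetime

def month_ranges(first, months=12):
    """Yield (start, end) datetimes for `months` consecutive months from `first`."""
    year, month = first.year, first.month
    for _ in range(months):
        # monthrange returns (weekday of day 1, number of days in the month).
        last_day = calendar.monthrange(year, month)[1]
        start = datetime(year, month, 1)
        end = datetime(year, month, last_day, 23, 59, 59)
        yield start, end
        # Advance to the next month, rolling over the year after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)

for start, end in month_ranges(datetime(2013, 1, 15), months=13):
    print(start, end)
```

For the sample call, the first line printed is 2013-01-01 00:00:00 2013-01-31 23:59:59 and the last is 2014-01-01 00:00:00 2014-01-31 23:59:59.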