I have a pandas DataFrame time column like the following.
segments_data['time']
Out[1585]:
0 04:50:00
1 04:50:00
2 05:00:00
3 05:12:00
4 06:04:00
5 06:44:00
6 06:44:00
7 06:47:00
8 06:47:00
9 06:47:00
I want to add 5 hours and 30 minutes to the above time column.
I am doing the following in Python.
pd.DatetimeIndex(segments_data['time']) + pd.DateOffset(hours=5,minutes=30)
But it gives me an error:
TypeError: object of type 'datetime.time' has no len()
Please help.
As of pandas 0.25.3, this is as simple as:
df[column] = df[column] + pd.Timedelta(hours=1)
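Note that this arithmetic requires a datetime64 or timedelta64 column; with a column of datetime.time objects you have to convert first. A minimal sketch (data re-typed from the question, column names assumed):

```python
import pandas as pd

# Hypothetical frame with time-of-day values stored as strings
df = pd.DataFrame({"time": ["04:50:00", "05:00:00"]})

# Convert to datetime64 (the current date is assumed), after which
# Timedelta arithmetic is vectorised
df["time"] = pd.to_datetime(df["time"])
df["shifted"] = df["time"] + pd.Timedelta(hours=5, minutes=30)

print(df["shifted"].dt.time.astype(str).tolist())  # ['10:20:00', '10:30:00']
```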
You can try importing timedelta:
from datetime import datetime, timedelta
and then:
segments_data['time'] = pd.DatetimeIndex(segments_data['time']) + timedelta(hours=5,minutes=30)
Pandas does not support vectorised operations with datetime.time objects, and you do not need the standard-library datetime module to get efficient, vectorised operations.
You have a couple of options to vectorise your calculation: use a Pandas timedelta series if your times represent durations, or a Pandas datetime series if your times represent specific points in time.
The choice depends entirely on what your data represents.
timedelta series
df['time'] = pd.to_timedelta(df['time'].astype(str)) + pd.to_timedelta('05:30:00')
print(df['time'].head())
0 10:20:00
1 10:20:00
2 10:30:00
3 10:42:00
4 11:34:00
Name: 1, dtype: timedelta64[ns]
datetime series
df['time'] = pd.to_datetime(df['time'].astype(str)) + pd.DateOffset(hours=5, minutes=30)
print(df['time'].head())
0 2018-12-24 10:20:00
1 2018-12-24 10:20:00
2 2018-12-24 10:30:00
3 2018-12-24 10:42:00
4 2018-12-24 11:34:00
Name: 1, dtype: datetime64[ns]
Notice by default the current date is assumed.
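If only the time of day is needed afterwards, the assumed date can be dropped again with the .dt.time accessor. A small sketch (data re-typed from the question):

```python
import pandas as pd

# Shift two sample times by 5 hours 30 minutes, as above
s = pd.to_datetime(pd.Series(["04:50:00", "06:47:00"])) + pd.DateOffset(hours=5, minutes=30)

# .dt.time strips the (assumed, current) date, leaving datetime.time objects
times = s.dt.time
print([str(t) for t in times])  # ['10:20:00', '12:17:00']
```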
This is a gnarly way of doing it. The principal problem here is the lack of vectorised support for time objects, so you first need to convert the time to a datetime using combine, then apply the offset, and finally take the time component back:
In [28]:
import datetime as dt
df['new_time'] = df['time'].apply(lambda x: (dt.datetime.combine(dt.datetime(1, 1, 1), x) + dt.timedelta(hours=3, minutes=30)).time())
df
Out[28]:
time new_time
index
0 04:50:00 08:20:00
1 04:50:00 08:20:00
2 05:00:00 08:30:00
3 05:12:00 08:42:00
4 06:04:00 09:34:00
5 06:44:00 10:14:00
6 06:44:00 10:14:00
7 06:47:00 10:17:00
8 06:47:00 10:17:00
9 06:47:00 10:17:00
Related
I have the following dataframe, where date was set as the index column:
date        renormalized
2017-01-01  6
2017-01-08  5
2017-01-15  3
2017-01-22  3
2017-01-29  3
I want to append 00:00:00 to each of the datetimes in the index column, to make it like:
date                 renormalized
2017-01-01 00:00:00  6
2017-01-08 00:00:00  5
2017-01-15 00:00:00  3
2017-01-22 00:00:00  3
2017-01-29 00:00:00  3
It seems I'm stuck with no solution to make it happen. It would be great if anyone could help.
Thanks,
AL
When the time is 0 for all instances, pandas doesn't show the time by default (although each entry is a Timestamp, so it does have a time!). Your data is probably already normalized, and you can perform timedelta operations as usual.
You can see a target observation with df.index[0] for instance, or take a look at all the times with df.index.time.
You can use DatetimeIndex.strftime
df.index = pd.to_datetime(df.index).strftime('%Y-%m-%d %H:%M:%S')
print(df)
renormalized
date
2017-01-01 00:00:00 6
2017-01-08 00:00:00 5
2017-01-15 00:00:00 3
2017-01-22 00:00:00 3
2017-01-29 00:00:00 3
Or, if the index holds strings, you can simply concatenate:
df.index = df.index + ' 00:00:00'
I am looking to do something like in this thread. However, I only want to subtract the time component of the two datetime columns.
For eg., given this dataframe:
ts1 ts2
0 2018-07-25 11:14:00 2018-07-27 12:14:00
1 2018-08-26 11:15:00 2018-09-24 10:15:00
2 2018-07-29 11:17:00 2018-07-22 11:00:00
The expected output for ts2 - ts1 (time component only) should be:
ts1 ts2 ts_delta
0 2018-07-25 11:14:00 2018-07-27 12:14:00 1:00:00
1 2018-08-26 11:15:00 2018-09-24 10:15:00 -1:00:00
2 2018-07-29 11:17:00 2018-07-22 11:00:00 -0:17:00
So, for row 0: the time for ts2 is 12:14:00, the time for ts1 is 11:14:00. The expected output is just these two times subtracting (don't care about the days). In this case:
12:14:00 - 11:14:00 = 1:00:00.
How would I do this in one single line?
Since you only want the time difference and you're not working with timezone-aware datetime, the date does not matter. Therefore you don't have to change any dates or set some arbitrary reference date. Just work with what you have.
Subtract ts1's time component from ts2 as a timedelta, then convert the resulting datetime to a timedelta by subtracting ts2's date:
df["delta_time"] = (df["ts2"] - pd.to_timedelta(df["ts1"].dt.time.astype(str))) - df["ts2"].dt.floor("d")
df
ts1 ts2 delta_time
0 2018-07-25 11:14:00 2018-07-27 12:14:00 0 days 01:00:00
1 2018-08-26 11:15:00 2018-09-24 10:15:00 -1 days +23:00:00
2 2018-07-29 11:17:00 2018-07-22 11:00:00 -1 days +23:43:00
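For reference, a self-contained version of the same computation (data re-typed from the question):

```python
import pandas as pd

df = pd.DataFrame({
    "ts1": pd.to_datetime(["2018-07-25 11:14:00", "2018-07-29 11:17:00"]),
    "ts2": pd.to_datetime(["2018-07-27 12:14:00", "2018-07-22 11:00:00"]),
})

# Subtract ts1's time-of-day (as a timedelta) from ts2, then remove ts2's
# date part by subtracting its midnight
df["delta_time"] = (df["ts2"] - pd.to_timedelta(df["ts1"].dt.time.astype(str))) - df["ts2"].dt.floor("d")

print(df["delta_time"].tolist())  # [Timedelta('0 days 01:00:00'), Timedelta('-1 days +23:43:00')]
```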
You need to set both datetimes to a common date first.
One way is to use pandas.DateOffset:
o = pd.DateOffset(day=1, month=1, year=2022) # the exact numbers don't matter
# reset dates
ts1 = df['ts1'].add(o)
ts2 = df['ts2'].add(o)
# subtract
df['ts_delta'] = ts2.sub(ts1)
As one-liner:
df['ts_delta'] = df['ts2'].add((o:=pd.DateOffset(day=1, month=1, year=2022))).sub(df['ts1'].add(o))
Another way, using the difference between ts2-ts1 (with dates) and ts2-ts1 (dates only):
df['ts_delta'] = (df['ts2'].sub(df['ts1'])
                  - df['ts2'].dt.normalize().sub(df['ts1'].dt.normalize())
                  )
output:
ts1 ts2 ts_delta
0 2018-07-25 11:14:00 2018-07-27 12:14:00 0 days 01:00:00
1 2018-08-26 11:15:00 2018-09-24 10:15:00 -1 days +23:00:00
2 2018-07-29 11:17:00 2018-07-22 11:00:00 -1 days +23:43:00
NB: don't get confused by -1 days +23:00:00; this is actually the way to represent -1 hour.
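A quick sanity check (not from the thread) that this representation really equals minus one hour:

```python
import pandas as pd

# '-1 days +23:00:00' is pandas' normalized display for a negative timedelta
td = pd.Timedelta("-1 days +23:00:00")
print(td == pd.Timedelta(hours=-1))  # True: it is exactly minus one hour
print(td.total_seconds())            # -3600.0
```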
I've tried to simulate your problem in my local environment. Apparently pandas datetime64 types support add/subtract operations, so you don't actually need to access the underlying datetime objects to execute these operations.
I did my experiments as below;
import pandas as pd
df = pd.DataFrame({'a': ['2018-07-25 11:14:00', '2018-08-26 11:15:00', '2018-07-29 11:17:00'],
                   'b': ['2018-07-27 12:14:00', '2018-09-24 10:15:00', '2018-07-22 11:00:00']})
df['a'] = pd.to_datetime(df['a'])
df['b'] = pd.to_datetime(df['b'])
df['d'] = df['b'] - df['a']
and df is like;
a b d
0 2018-07-25 11:14:00 2018-07-27 12:14:00 2 days 01:00:00
1 2018-08-26 11:15:00 2018-09-24 10:15:00 28 days 23:00:00
2 2018-07-29 11:17:00 2018-07-22 11:00:00 -8 days +23:43:00
Try this: first strip the time, put both times on the same day, subtract, and take the absolute value.
l = lambda x: pd.to_datetime("01-01-1900 " + x)
df["ts_delta"] = (
df["ts2"].dt.time.astype(str).apply(l) - df["ts1"].dt.time.astype(str).apply(l)
).abs()
df
Output:
ts1 ts2 ts_delta
0 2018-07-25 11:14:00 2018-07-27 12:14:00 0 days 01:00:00
1 2018-08-26 11:15:00 2018-09-24 10:15:00 0 days 01:00:00
2 2018-07-29 11:17:00 2018-07-22 11:00:00 0 days 00:17:00
Use df['col_name'].dt.time to get the time from a datetime column. Let's assume your dataframe name is df. Note that plain datetime.time objects don't support subtraction, so convert the times to timedeltas first. In your case:
t1 = pd.to_timedelta(df['ts1'].dt.time.astype(str))
t2 = pd.to_timedelta(df['ts2'].dt.time.astype(str))
df['ts_delta'] = t2 - t1
For a single line:
df['ts_delta'] = pd.to_timedelta(df['ts2'].dt.time.astype(str)) - pd.to_timedelta(df['ts1'].dt.time.astype(str))
I hope it will resolve your issue. Happy Coding!
I want the time without the date in Pandas.
I want to keep the time as dtype datetime64[ns] and not as an object so that I can determine periods between times.
The closest I have gotten is as follows, but it gives back the date in a new column, not the time as needed as dtype datetime.
df_pres_mf['time'] = pd.to_datetime(df_pres_mf['time'], format ='%H:%M', errors = 'coerce') # returns date (1900-01-01) and actual time as a dtype datetime64[ns] format
df_pres_mf['just_time'] = df_pres_mf['time'].dt.date
df_pres_mf['normalised_time'] = df_pres_mf['time'].dt.normalize()
df_pres_mf.head()
Returns the date as 1900-01-01 and not the time that is needed.
Edit: Data
time
1900-01-01 11:16:00
1900-01-01 15:20:00
1900-01-01 09:55:00
1900-01-01 12:01:00
You could do it like Vishnudev suggested but then you would have dtype: object (or even strings, after using dt.strftime), which you said you didn't want.
What you are looking for doesn't exist, but the closest thing that I can get you is converting to timedeltas. Which won't seem like a solution at first but is actually very useful.
Convert it like this:
# sample df
df
>>
time
0 2021-02-07 09:22:00
1 2021-05-10 19:45:00
2 2021-01-14 06:53:00
3 2021-05-27 13:42:00
4 2021-01-18 17:28:00
df["timed"] = df.time - df.time.dt.normalize()
df
>>
time timed
0 2021-02-07 09:22:00 0 days 09:22:00 # this is just the time difference
1 2021-05-10 19:45:00 0 days 19:45:00 # since midnight, which is essentially the
2 2021-01-14 06:53:00 0 days 06:53:00 # same thing as regular time, except
3 2021-05-27 13:42:00 0 days 13:42:00 # that you can go over 24 hours
4 2021-01-18 17:28:00 0 days 17:28:00
this allows you to calculate periods between times like this:
# subtract the last time from the current
df["difference"] = df.timed - df.timed.shift()
df
Out[48]:
time timed difference
0 2021-02-07 09:22:00 0 days 09:22:00 NaT
1 2021-05-10 19:45:00 0 days 19:45:00 0 days 10:23:00
2 2021-01-14 06:53:00 0 days 06:53:00 -1 days +11:08:00 # <-- this is because the last
3 2021-05-27 13:42:00 0 days 13:42:00 0 days 06:49:00 # time was later than the current
4 2021-01-18 17:28:00 0 days 17:28:00 0 days 03:46:00 # (see below)
To get rid of negative differences, take the absolute value:
df["abs_difference"] = df.difference.abs()
df
>>
time timed difference abs_difference
0 2021-02-07 09:22:00 0 days 09:22:00 NaT NaT
1 2021-05-10 19:45:00 0 days 19:45:00 0 days 10:23:00 0 days 10:23:00
2 2021-01-14 06:53:00 0 days 06:53:00 -1 days +11:08:00 0 days 12:52:00 ### <<--
3 2021-05-27 13:42:00 0 days 13:42:00 0 days 06:49:00 0 days 06:49:00
4 2021-01-18 17:28:00 0 days 17:28:00 0 days 03:46:00 0 days 03:46:00
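If those timedeltas eventually need to be shown as plain HH:MM:SS strings, one option is manual formatting (a sketch, assuming non-negative values under 24 hours; the result is dtype object):

```python
import pandas as pd

timed = pd.Series(pd.to_timedelta(["09:22:00", "19:45:00"]))

def fmt(td):
    # Break a timedelta into hours, minutes, seconds and zero-pad each part
    total = int(td.total_seconds())
    h, rem = divmod(total, 3600)
    m, sec = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{sec:02d}"

print(timed.map(fmt).tolist())  # ['09:22:00', '19:45:00']
```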
Use proper formatting according to your date format and convert to datetime
df['time'] = pd.to_datetime(df['time'], format='%Y-%m-%d %H:%M:%S')
Format according to the preferred format
df['time'].dt.strftime('%H:%M')
Output
0 11:16
1 15:20
2 09:55
3 12:01
Name: time, dtype: object
I have this dataframe object:
Date
2018-12-14
2019-01-11
2019-01-25
2019-02-08
2019-02-22
2019-07-26
What I want, if it's possible, is to add for example: 3 months to the dates, and then 3 months to the new date (original date + 3 months) and repeat this x times. I am using pd.offsets.MonthOffset but this just adds the months one time and I need to do it more times.
I don't know if it is possible but any help would be perfect.
Thank you so much for taking your time.
The expected output (for adding 1 month, 2 times) is:
[[2019-01-14, 2019-02-11, 2019-02-25, 2019-03-08, 2019-03-22, 2019-08-26],[2019-02-14, 2019-03-11, 2019-03-25, 2019-04-08, 2019-04-22, 2019-09-26]]
I believe you need a loop with f-strings for the new column names:
for i in range(1,4):
    df[f'Date_added_{i}_months'] = df['Date'] + pd.offsets.MonthBegin(i)
print(df)
Date Date_added_1_months Date_added_2_months Date_added_3_months
0 2018-12-14 2019-01-01 2019-02-01 2019-03-01
1 2019-01-11 2019-02-01 2019-03-01 2019-04-01
2 2019-01-25 2019-02-01 2019-03-01 2019-04-01
3 2019-02-08 2019-03-01 2019-04-01 2019-05-01
4 2019-02-22 2019-03-01 2019-04-01 2019-05-01
5 2019-07-26 2019-08-01 2019-09-01 2019-10-01
Or:
for i in range(1,4):
    df[f'Date_added_{i}_months'] = df['Date'] + pd.offsets.MonthOffset(i)
print(df)
Date Date_added_1_months Date_added_2_months Date_added_3_months
0 2018-12-14 2019-01-14 2019-02-14 2019-03-14
1 2019-01-11 2019-02-11 2019-03-11 2019-04-11
2 2019-01-25 2019-02-25 2019-03-25 2019-04-25
3 2019-02-08 2019-03-08 2019-04-08 2019-05-08
4 2019-02-22 2019-03-22 2019-04-22 2019-05-22
5 2019-07-26 2019-08-26 2019-09-26 2019-10-26
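Note that pd.offsets.MonthOffset was removed in pandas 2.0; pd.DateOffset(months=i) gives the same day-preserving shift. A sketch with data re-typed from the question:

```python
import pandas as pd

df = pd.DataFrame({"Date": pd.to_datetime(["2018-12-14", "2019-01-11"])})

for i in range(1, 4):
    # DateOffset(months=i) moves forward i whole months, keeping the day
    df[f"Date_added_{i}_months"] = df["Date"] + pd.DateOffset(months=i)

print(df["Date_added_1_months"].dt.strftime("%Y-%m-%d").tolist())  # ['2019-01-14', '2019-02-11']
```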
I hope this helps
from dateutil.relativedelta import relativedelta

month_offset = [3, 6, 9]
for i in month_offset:
    df['Date_plus_' + str(i) + '_months'] = df['Date'].map(lambda x: x + relativedelta(months=i))
If your dates are date objects, it should be pretty easy. You can just create a relativedelta of 3 months (a plain timedelta doesn't support months) and add it to each date.
Alternatively, you can convert them to date objects with .strptime() and then do what you are suggesting. You can convert them back to a string with .strftime().
I have a dataframe which I want to split into 5 chunks (more generally n chunks), so that I can apply a groupby on the chunks.
I want the chunks to have equal time intervals but in general each group may contain different numbers of records.
Let's call the data
s = pd.Series(pd.date_range('2012-1-1', periods=100, freq='D'))
and the timeinterval ti = (s.max() - s.min())/n
So the first chunk should include all rows with dates between s.min() and s.min() + ti, the second, all rows with dates between s.min() + ti and s.min() + 2*ti, etc.
Can anyone suggest an easy way to achieve this? If somehow I could convert all my dates into seconds since the epoch, then I could do something like thisgroup = floor(thisdate/ti).
Is there an easy 'pythonic' or 'panda-ista' way to do this?
Thanks very much (and Merry Christmas!),
Robin
You can use numpy.array_split:
>>> import pandas as pd
>>> import numpy as np
>>> s = pd.Series(pd.date_range('2012-1-1', periods=10, freq='D'))
>>> np.array_split(s, 5)
[0 2012-01-01 00:00:00
1 2012-01-02 00:00:00
dtype: datetime64[ns], 2 2012-01-03 00:00:00
3 2012-01-04 00:00:00
dtype: datetime64[ns], 4 2012-01-05 00:00:00
5 2012-01-06 00:00:00
dtype: datetime64[ns], 6 2012-01-07 00:00:00
7 2012-01-08 00:00:00
dtype: datetime64[ns], 8 2012-01-09 00:00:00
9 2012-01-10 00:00:00
dtype: datetime64[ns]]
>>> np.array_split(s, 2)
[0 2012-01-01 00:00:00
1 2012-01-02 00:00:00
2 2012-01-03 00:00:00
3 2012-01-04 00:00:00
4 2012-01-05 00:00:00
dtype: datetime64[ns], 5 2012-01-06 00:00:00
6 2012-01-07 00:00:00
7 2012-01-08 00:00:00
8 2012-01-09 00:00:00
9 2012-01-10 00:00:00
dtype: datetime64[ns]]
The answer is as follows:
import numpy as np
import pandas as pd

s = pd.DataFrame(pd.date_range('2012-1-1', periods=20, freq='D'), columns=["date"])
n = 5
s["date"] = s["date"].astype(np.int64)  # view the datetimes as integer nanoseconds; may not be needed in future pandas releases
s["bin"] = np.floor((n - 0.001) * (s["date"] - s["date"].min()) / (s["date"].max() - s["date"].min()))
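A more idiomatic alternative (not from the thread) is pd.cut, which splits the timestamps into n equal-width intervals directly:

```python
import pandas as pd

s = pd.Series(pd.date_range("2012-1-1", periods=20, freq="D"))
n = 5

# labels=False returns the integer bin index for each timestamp
bins = pd.cut(s, bins=n, labels=False)
print(bins.tolist())  # four dates fall into each of the five bins
```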