I have a data file that contains event data such as the event's start date (Date), start time (KOTime), and event time (EveTime).
The following is a sample of the data.
df = pd.DataFrame()
df['Date'] = ['2018/08/12','2018/08/12','2018/08/12','2018/08/12','2018/08/12','2018/08/12']
df['KOTime'] = ['12:30:00','12:30:00','12:30:00','12:30:00','12:30:00','12:30:00']
df['EveTime'] = ['04:50:00','01:03:00','1900-01-03 05:22:00','1900-01-02 16:04:00','1900-01-01 10:28:00','1900-01-01 16:23:00']
EveTime is not consistently formatted in the raw data file, as the sample shows.
If EveTime is greater than 24 hours, it is shown as 1900-01-xx.
For example, the 3rd value of EveTime is shown as 1900-01-03 05:22:00.
It is supposed to be 2018/08/12 13:47:22.
I want to create a new column that combines Date and EveTime. The expected output is as follows:
2018/08/12 12:34:50
2018/08/12 12:31:03
2018/08/12 13:47:22
2018/08/12 13:34:04
2018/08/12 12:40:28
2018/08/12 12:46:23
Can anyone suggest how to get the format mentioned above?
I think you need to convert the values to timedeltas and add them to the datetime column:
import numpy as np
import pandas as pd
# day of month from the '1900-01-DD' prefix (NaN when there is no prefix)
num = pd.to_numeric(df['EveTime'].str[-11:-8], errors='coerce')
# days > 1 contribute day * 24 minutes, expressed here in seconds
td1 = pd.to_timedelta(np.where(num > 1, num, 0) * 24 * 60, unit='s')
# reinterpret the HH:MM fields of EveTime as MM:SS
td2 = pd.to_timedelta('00:' + df['EveTime'].str[-8:-3])
df['date'] = pd.to_datetime(df['Date'] + ' ' + df['KOTime']) + td1 + td2
print (df)
         Date    KOTime              EveTime                date
0  2018/08/12  12:30:00             04:50:00 2018-08-12 12:34:50
1  2018/08/12  12:30:00             01:03:00 2018-08-12 12:31:03
2  2018/08/12  12:30:00  1900-01-03 05:22:00 2018-08-12 13:47:22
3  2018/08/12  12:30:00  1900-01-02 16:04:00 2018-08-12 13:34:04
4  2018/08/12  12:30:00  1900-01-01 10:28:00 2018-08-12 12:40:28
5  2018/08/12  12:30:00  1900-01-01 16:23:00 2018-08-12 12:46:23
print (td1)
TimedeltaIndex(['00:00:00', '00:00:00', '01:12:00', '00:48:00', '00:00:00',
'00:00:00'],
dtype='timedelta64[ns]', freq=None)
print (td2)
0 00:04:50
1 00:01:03
2 00:05:22
3 00:16:04
4 00:10:28
5 00:16:23
Name: EveTime, dtype: timedelta64[ns]
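As a variation on the slicing above, here is a minimal sketch that pulls the fields out with a regular expression instead of fixed string positions. The pattern, the group names (day, minutes, seconds) and the EventDateTime column name are illustrative choices, not part of the original answer. It assumes the same rule as above: only day values greater than 1 add day * 24 minutes.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2018/08/12'] * 6,
    'KOTime': ['12:30:00'] * 6,
    'EveTime': ['04:50:00', '01:03:00', '1900-01-03 05:22:00',
                '1900-01-02 16:04:00', '1900-01-01 10:28:00',
                '1900-01-01 16:23:00'],
})

# Optional "1900-01-DD " prefix, then the two fields that really hold
# minutes and seconds (group names are hypothetical).
pattern = r'^(?:1900-01-(?P<day>\d{2}) )?(?P<minutes>\d{2}):(?P<seconds>\d{2}):\d{2}$'
parts = df['EveTime'].str.extract(pattern).astype(float)

# Same rule as the answer above: only day values > 1 add day * 24 minutes.
day = parts['day'].fillna(0)
extra_minutes = day.where(day > 1, 0) * 24

elapsed = (pd.to_timedelta(extra_minutes + parts['minutes'], unit='m')
           + pd.to_timedelta(parts['seconds'], unit='s'))
kickoff = pd.to_datetime(df['Date'] + ' ' + df['KOTime'],
                         format='%Y/%m/%d %H:%M:%S')
df['EventDateTime'] = kickoff + elapsed

print(df['EventDateTime'].dt.strftime('%Y/%m/%d %H:%M:%S').tolist())
```

The regular expression makes the assumed layout of EveTime explicit, so a value that does not match becomes NaN/NaT rather than being sliced at the wrong position.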
I have a dataframe as shown below:
df = pd.DataFrame({'person_id': [11,11,11,21,21],
'offset' :['-131 days','29 days','142 days','20 days','-200 days'],
'date_1': ['05/29/2017', '01/21/1997', '7/27/1989','01/01/2013','12/31/2016'],
'dis_date': ['05/29/2017', '01/24/1999', '7/22/1999','01/01/2015','12/31/1991'],
'vis_date':['05/29/2018', '01/27/1994', '7/29/2011','01/01/2018','12/31/2014']})
df['date_1'] = pd.to_datetime(df['date_1'])
df['dis_date'] = pd.to_datetime(df['dis_date'])
df['vis_date'] = pd.to_datetime(df['vis_date'])
I would like to shift all the dates of each subject based on that subject's offset.
Though my code works (credit - SO), I am looking for a more elegant approach. As you can see, I am repeating almost the same line three times.
df['offset_to_shift'] = pd.to_timedelta(df['offset'])  # the strings already carry the unit
# I am trying to make the lines below more elegant/efficient
df['shifted_date_1'] = df['date_1'] + df['offset_to_shift']
df['shifted_dis_date'] = df['dis_date'] + df['offset_to_shift']
df['shifted_vis_date'] = df['vis_date'] + df['offset_to_shift']
I expect my output to be as shown below.
Use DataFrame.add along with DataFrame.add_prefix and DataFrame.join:
cols = ['date_1', 'dis_date', 'vis_date']
df = df.join(df[cols].add(df['offset_to_shift'], axis=0).add_prefix('shifted_'))
Or, it is also possible to use pd.concat:
df = pd.concat([df, df[cols].add(df['offset_to_shift'], axis=0).add_prefix('shifted_')], axis=1)
Or, we can directly assign the new shifted columns to the dataframe:
df[['shifted_' + col for col in cols]] = df[cols].add(df['offset_to_shift'], axis=0)
Result:
# print(df)
   person_id     offset     date_1   dis_date   vis_date offset_to_shift shifted_date_1 shifted_dis_date shifted_vis_date
0         11  -131 days 2017-05-29 2017-05-29 2018-05-29       -131 days     2017-01-18       2017-01-18       2018-01-18
1         11    29 days 1997-01-21 1999-01-24 1994-01-27         29 days     1997-02-19       1999-02-22       1994-02-25
2         11   142 days 1989-07-27 1999-07-22 2011-07-29        142 days     1989-12-16       1999-12-11       2011-12-18
3         21    20 days 2013-01-01 2015-01-01 2018-01-01         20 days     2013-01-21       2015-01-21       2018-01-21
4         21  -200 days 2016-12-31 1991-12-31 2014-12-31       -200 days     2016-06-14       1991-06-14       2014-06-14
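For reuse, the join-based approach can be wrapped in a small helper. This is only a sketch: the function name shift_dates and its parameters are hypothetical, and the sample uses two of the date columns to keep it short.

```python
import pandas as pd


def shift_dates(frame, date_cols, offset_col, prefix='shifted_'):
    """Return frame with a shifted copy of each column in date_cols appended."""
    # axis=0 aligns the offsets with the rows, so each row gets its own shift.
    shifted = frame[date_cols].add(frame[offset_col], axis=0).add_prefix(prefix)
    return frame.join(shifted)


df = pd.DataFrame({
    'person_id': [11, 11, 11, 21, 21],
    'offset': ['-131 days', '29 days', '142 days', '20 days', '-200 days'],
    'date_1': ['05/29/2017', '01/21/1997', '7/27/1989', '01/01/2013', '12/31/2016'],
    'vis_date': ['05/29/2018', '01/27/1994', '7/29/2011', '01/01/2018', '12/31/2014'],
})
cols = ['date_1', 'vis_date']
for col in cols:
    df[col] = pd.to_datetime(df[col], format='%m/%d/%Y')

# The offset strings already include the unit ("days").
df['offset_to_shift'] = pd.to_timedelta(df['offset'])

out = shift_dates(df, cols, 'offset_to_shift')
print(out[['shifted_date_1', 'shifted_vis_date']])
```

Returning a new frame rather than modifying the input keeps the original columns available for comparison.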
I have a data frame with a field time of timestamps with dates, and another column period. How can I add a number of days to time based on period?
Current Output:
time period
------------------------------
2020-04-28 10:00:00 1
2020-04-27 12:34:56 3
Expected Output
time
---------------
2020-04-29 10:00:00
2020-04-30 12:34:56
If I try df['time'] = df['time'] + pd.DateOffset(df['period']) I get the error TypeError: `n` argument must be an integer, got <class 'pandas.core.series.Series'> because it passes the whole column into a function that expects an integer. How can this be accomplished?
Because days can be converted to timedeltas with to_timedelta, you can use:
df['time'] = df['time'] + pd.to_timedelta(df['period'], unit='d')
print (df)
time period
0 2020-04-29 10:00:00 1
1 2020-04-30 12:34:56 3
But if you want to add months, you need to use:
df['time'] = df['time'] + df['period'].apply(lambda x: pd.DateOffset(months=x))
print (df)
time period
0 2020-05-28 10:00:00 1
1 2020-07-27 12:34:56 3
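If you use month timedeltas instead, pandas works with a 'default month' of fixed average length, so the result differs. Note that recent pandas versions no longer accept unit='M' in to_timedelta, because a month is not a fixed duration: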
df['time'] = df['time'] + pd.to_timedelta(df['period'], unit='M')
print (df)
time period
0 2020-05-28 20:29:06 1
1 2020-07-27 20:02:14 3
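To see the two working approaches side by side, here is a small self-contained sketch built on the sample from the question. The variable names plus_days and plus_months are illustrative, and the month case uses a plain list comprehension instead of apply.

```python
import pandas as pd

df = pd.DataFrame({
    'time': pd.to_datetime(['2020-04-28 10:00:00', '2020-04-27 12:34:56']),
    'period': [1, 3],
})

# Days are a fixed duration, so a vectorized timedelta is enough.
plus_days = df['time'] + pd.to_timedelta(df['period'], unit='D')

# Calendar months vary in length, so build one DateOffset per row.
plus_months = pd.Series(
    [t + pd.DateOffset(months=int(p)) for t, p in zip(df['time'], df['period'])],
    index=df.index,
)

print(plus_days)
print(plus_months)
```

The day-based version stays vectorized. The month-based version works row by row, which is slower on large frames but respects calendar month lengths.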
I have a column in my dataframe which I want to convert to a Timestamp. However, it is in a bit of a strange format that I am struggling to manipulate. The column is in the format HHMMSS, but does not include the leading zeros.
For example for a time that should be '00:03:15' the dataframe has '315'. I want to convert the latter to a Timestamp similar to the former. Here is an illustration of the column:
message_time
25
35
114
1421
...
235347
235959
Thanks
Use Series.str.zfill to add leading zeros and then to_datetime:
s = df['message_time'].astype(str).str.zfill(6)
df['message_time'] = pd.to_datetime(s, format='%H%M%S')
print (df)
message_time
0 1900-01-01 00:00:25
1 1900-01-01 00:00:35
2 1900-01-01 00:01:14
3 1900-01-01 00:14:21
4 1900-01-01 23:53:47
5 1900-01-01 23:59:59
In my opinion, it is better here to create timedeltas with to_timedelta:
s = df['message_time'].astype(str).str.zfill(6)
df['message_time'] = pd.to_timedelta(s.str[:2] + ':' + s.str[2:4] + ':' + s.str[4:])
print (df)
message_time
0 00:00:25
1 00:00:35
2 00:01:14
3 00:14:21
4 23:53:47
5 23:59:59
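If the column holds integers, the same timedeltas can also be built with integer arithmetic instead of string padding. The sketch below is an alternative to the answer above, not a replacement for it; it assumes message_time is a non-negative integer column in HHMMSS form, and the names by_arithmetic and by_string are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'message_time': [25, 35, 114, 1421, 235347, 235959]})
t = df['message_time']

# Split HHMMSS into its parts with integer division and modulo.
by_arithmetic = (pd.to_timedelta(t // 10000, unit='h')
                 + pd.to_timedelta(t // 100 % 100, unit='m')
                 + pd.to_timedelta(t % 100, unit='s'))

# String-based version from the answer above, for comparison.
s = t.astype(str).str.zfill(6)
by_string = pd.to_timedelta(s.str[:2] + ':' + s.str[2:4] + ':' + s.str[4:])

print((by_arithmetic == by_string).all())
```

The arithmetic version avoids the round trip through strings, while the zfill version also works when the column already contains strings.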
I have tried many suggestions from here, but none of them solved my problem.
I have two columns with observations like this: 15:08:19
If I write
df.time_entry.describe()
it appears:
count 814262
unique 56765
top 15:03:00
freq 103
Name: time_entry, dtype: object
I've already run this code:
df['time_entry'] = pd.to_datetime(df['time_entry'],format= '%H:%M:%S', errors='ignore' ).dt.time
But rerunning the describe code still returns dtype: object.
What is the purpose of dt.time?
Just remove dt.time and your conversion from object to datetime will work perfectly fine.
df['time_entry'] = pd.to_datetime(df['time_entry'],format= '%H:%M:%S')
The problem is that you are using the datetime accessor (.dt) with the time property, which returns plain datetime.time objects (object dtype), so you are not able to subtract the two columns from each other. So just leave out .dt.time and it should work.
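A short sketch of the difference, using two made-up values in the same HH:MM:SS form (the variable names are illustrative):

```python
import pandas as pd

raw = pd.Series(['12:01:00', '15:03:00'], name='time_entry')

parsed = pd.to_datetime(raw, format='%H:%M:%S')
as_time = parsed.dt.time  # Series of datetime.time objects

print(parsed.dtype)   # a datetime64 dtype
print(as_time.dtype)  # object

# Subtraction is defined for the datetime64 Series.
print(parsed - parsed.iloc[0])
```

The datetime.time class itself does not support subtraction, which is why the arithmetic has to happen on the datetime64 column.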
Here is some data with 2 columns of strings
df = pd.DataFrame()
df['time_entry'] = ['12:01:00', '15:03:00', '16:43:00', '14:11:00']
df['time_entry2'] = ['13:03:00', '14:04:00', '19:23:00', '18:12:00']
print(df)
time_entry time_entry2
0 12:01:00 13:03:00
1 15:03:00 14:04:00
2 16:43:00 19:23:00
3 14:11:00 18:12:00
Convert both columns to datetime dtype
df['time_entry'] = pd.to_datetime(df['time_entry'], format='%H:%M:%S')
df['time_entry2'] = pd.to_datetime(df['time_entry2'], format='%H:%M:%S')
print(df)
time_entry time_entry2
0 1900-01-01 12:01:00 1900-01-01 13:03:00
1 1900-01-01 15:03:00 1900-01-01 14:04:00
2 1900-01-01 16:43:00 1900-01-01 19:23:00
3 1900-01-01 14:11:00 1900-01-01 18:12:00
print(df.dtypes)
time_entry datetime64[ns]
time_entry2 datetime64[ns]
dtype: object
(Optional) Specify timezone
df['time_entry'] = df['time_entry'].dt.tz_localize('US/Central')
df['time_entry2'] = df['time_entry2'].dt.tz_localize('US/Central')
Now subtract one column from the other and express the time difference in days (as a float). The three methods below are equivalent:
Method 1 gives Diff_days1
Method 2 gives Diff_days2
Method 3 gives Diff_days3
df['Diff_days1'] = (df['time_entry'] - df['time_entry2']).dt.total_seconds()/60/60/24
df['Diff_days2'] = (df['time_entry'] - df['time_entry2']) / np.timedelta64(1, 'D')
df['Diff_days3'] = (df['time_entry'].sub(df['time_entry2'])).dt.total_seconds()/60/60/24
print(df)
time_entry time_entry2 Diff_days1 Diff_days2 Diff_days3
0 1900-01-01 12:01:00 1900-01-01 13:03:00 -0.043056 -0.043056 -0.043056
1 1900-01-01 15:03:00 1900-01-01 14:04:00 0.040972 0.040972 0.040972
2 1900-01-01 16:43:00 1900-01-01 19:23:00 -0.111111 -0.111111 -0.111111
3 1900-01-01 14:11:00 1900-01-01 18:12:00 -0.167361 -0.167361 -0.167361
EDIT
If you're trying to access datetime attributes, then you can do so by using the time_entry column directly (not the time difference column). Here's an example
df['day1'] = df['time_entry'].dt.day
df['time1'] = df['time_entry'].dt.time
df['minute1'] = df['time_entry'].dt.minute
df['dayofweek1'] = df['time_entry'].dt.weekday
df['day2'] = df['time_entry2'].dt.day
df['time2'] = df['time_entry2'].dt.time
df['minute2'] = df['time_entry2'].dt.minute
df['dayofweek2'] = df['time_entry2'].dt.weekday
print(df[['day1', 'time1', 'minute1', 'dayofweek1',
'day2', 'time2', 'minute2', 'dayofweek2']])
day1 time1 minute1 dayofweek1 day2 time2 minute2 dayofweek2
0 1 12:01:00 1 0 1 13:03:00 3 0
1 1 15:03:00 3 0 1 14:04:00 4 0
2 1 16:43:00 43 0 1 19:23:00 23 0
3 1 14:11:00 11 0 1 18:12:00 12 0
I just want to extract from my df HH:MM. How do I do it?
Here's a description of the column in the df:
count 810
unique 691
top 2018-07-25 11:14:00
freq 5
Name: datetime, dtype: object
The string value includes a full time stamp. The goal is to parse each row's HH:MM into another df, and to loop back over and extract just the %Y-%m-%d into another df.
Assume the df looks like
print(df)
date_col
0 2018-07-25 11:14:00
1 2018-08-26 11:15:00
2 2018-07-29 11:17:00
#convert from string to datetime
df['date_col'] = pd.to_datetime(df['date_col'])
#to get date only
print(df['date_col'].dt.date)
0 2018-07-25
1 2018-08-26
2 2018-07-29
#to get time:
print(df['date_col'].dt.time)
0 11:14:00
1 11:15:00
2 11:17:00
#to get hour and minute
print(df['date_col'].dt.strftime('%H:%M'))
0 11:14
1 11:15
2 11:17
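First convert to datetime:
df['datetime'] = pd.to_datetime(df['datetime'])
Then you can do the following (df2 and df3 are assumed to be existing DataFrames with the same index; note that the method is strftime, not strptime):
df2['datetime'] = df['datetime'].dt.strftime('%H:%M')
df3['datetime'] = df['datetime'].dt.strftime('%Y-%m-%d')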
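Here is a self-contained sketch of that split, using the three sample rows shown earlier. The frame names times and dates and their column names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'datetime': ['2018-07-25 11:14:00',
                                '2018-08-26 11:15:00',
                                '2018-07-29 11:17:00']})

# Parse once, then format twice.
parsed = pd.to_datetime(df['datetime'])

times = pd.DataFrame({'time': parsed.dt.strftime('%H:%M')})
dates = pd.DataFrame({'date': parsed.dt.strftime('%Y-%m-%d')})

print(times)
print(dates)
```

Both new frames hold strings. If real date or time objects are needed instead, parsed.dt.date and parsed.dt.time return them.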
General solution (not pandas based)
import time
top = '2018-07-25 11:14:00'
time_struct = time.strptime(top, '%Y-%m-%d %H:%M:%S')
short_top = time.strftime('%H:%M', time_struct)
print(short_top)
Output
11:14