Split datetime column into separate date and time columns - python

I am trying to extract a Date and a Time from a Timestamp:
DateTime
31/12/2015 22:45
to be:
Date | Time |
31/12/2015| 22:45 |
However, when I use:
df['Date'] = pd.to_datetime(df['DateTime']).dt.date
I get:
2015-12-31
Similarly, with Time:
df['Time'] = pd.to_datetime(df['DateTime']).dt.time
gives
22:45:00
but if I try to format it I get an error:
df['Date'] = pd.to_datetime(df['DateTime'], format='%d/%m/%Y').dt.date
ValueError: unconverted data remains: 00:00

Try strftime
df['DateTime'] = pd.to_datetime(df['DateTime'])
df['Date'] = df['DateTime'].dt.strftime('%d/%m/%Y')
df['Time'] = df['DateTime'].dt.strftime('%H:%M')
DateTime Date Time
0 2015-12-31 22:45:00 31/12/2015 22:45
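The ValueError in the question appears to come from a format string that describes only the date while the data also contains a time. As a minimal sketch, assuming every value follows the day/month/year hour:minute layout shown in the question (the second sample row is made up for illustration), the full format can be passed explicitly:

```python
import pandas as pd

# Sample data; the second row is a made-up value for illustration.
df = pd.DataFrame({'DateTime': ['31/12/2015 22:45', '01/02/2016 07:05']})

# The format must describe the whole string, including the time part.
parsed = pd.to_datetime(df['DateTime'], format='%d/%m/%Y %H:%M')

# strftime returns strings in the requested layout.
df['Date'] = parsed.dt.strftime('%d/%m/%Y')
df['Time'] = parsed.dt.strftime('%H:%M')
print(df)
```

Note that Date and Time are plain strings here, which is what the requested output looks like.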

Option 1
Since you don't really need to operate on the dates per se, just split your column on space:
df = df.DateTime.str.split(expand=True)
df.columns = ['Date', 'Time']
df
Date Time
0 31/12/2015 22:45
Option 2
Alternatively, just drop the format specifier completely:
v = pd.to_datetime(df['DateTime'], errors='coerce')
df['Time'] = v.dt.time
df['Date'] = v.dt.floor('D')
df
Time Date
0 22:45:00 2015-12-31
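One caveat when dropping the format specifier: a day-first string such as 01/02/2016 is ambiguous. A hedged sketch, assuming the data is always day-first (the sample values are invented), is to pass dayfirst=True so that pandas does not read the day as the month:

```python
import pandas as pd

# Invented sample values in day/month/year order.
s = pd.Series(['01/02/2016 07:05', '31/12/2015 22:45'])

# dayfirst=True tells the parser to read the first number as the day.
v = pd.to_datetime(s, dayfirst=True)

months = v.dt.month.tolist()
days = v.dt.day.tolist()
print(v)
```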

If your DateTime column is already a datetime type, you shouldn't need to call pd.to_datetime on it.
Are you looking for a string ("12:34") or a timestamp (the concept of 12:34 in the afternoon)? If you're looking for the former, there are answers here that cover that. If you're looking for the latter, you can use the .dt.time and .dt.date accessors.
>>> pd.__version__
u'0.20.2'
>>> df = pd.DataFrame({'DateTime':pd.date_range(start='2018-01-01', end='2018-01-10')})
>>> df['date'] = df.DateTime.dt.date
>>> df['time'] = df.DateTime.dt.time
>>> df
DateTime date time
0 2018-01-01 2018-01-01 00:00:00
1 2018-01-02 2018-01-02 00:00:00
2 2018-01-03 2018-01-03 00:00:00
3 2018-01-04 2018-01-04 00:00:00
4 2018-01-05 2018-01-05 00:00:00
5 2018-01-06 2018-01-06 00:00:00
6 2018-01-07 2018-01-07 00:00:00
7 2018-01-08 2018-01-08 00:00:00
8 2018-01-09 2018-01-09 00:00:00
9 2018-01-10 2018-01-10 00:00:00
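To make the string-versus-timestamp distinction concrete, here is a small sketch (the column names day and time_str are illustrative, not from the question). It assumes a datetime64 column: .dt.date and .dt.time yield Python date and time objects, .dt.normalize() keeps a datetime64 column with the time set to midnight, and .dt.strftime yields strings.

```python
import datetime
import pandas as pd

df = pd.DataFrame(
    {'DateTime': pd.to_datetime(['2018-01-01 12:34', '2018-01-02 08:00'])}
)

# Python objects: datetime.date and datetime.time
df['date'] = df['DateTime'].dt.date
df['time'] = df['DateTime'].dt.time

# Still a datetime64 column, with the time part reset to 00:00:00
df['day'] = df['DateTime'].dt.normalize()

# Plain strings such as "12:34"
df['time_str'] = df['DateTime'].dt.strftime('%H:%M')
print(df)
```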

Related

convert the date column of the dataframe to the date type

I have the following dataframe:
import pandas as pd
from datetime import datetime
df = pd.DataFrame({'Id_sensor': [1, 2, 3, 4],
'Date_start': ['2018-01-04 00:00:00.0', '2018-01-04 00:00:10.0',
'2018-01-04 00:14:00.0', '2018-01-04'],
'Date_end': ['2018-01-05', '2018-01-06', '2017-01-06', '2018-01-05']})
The columns Date_start and Date_end are of type object. I would like to convert them to a date type and make both columns look the same. In other words, I want to fill in with zeros the time fields (hour, minute, second) that the Date_end column does not have.
I tried the following code:
df['Date_start'] = pd.to_datetime(df['Date_start'], format='%Y/%m/%d %H:%M:%S')
df['Date_end'] = pd.to_datetime(df['Date_end'], format='%Y/%m/%d %H:%M:%S')
My output:
Id_sensor Date_start Date_end
1 2018-01-04 00:00:00 2018-01-05
2 2018-01-04 00:00:10 2018-01-06
3 2018-01-04 00:14:00 2017-01-06
4 2018-01-04 00:00:00 2018-01-05
But I would like the output to be like this:
Id_sensor Date_start Date_end
1 2018-01-04 00:00:00 2018-01-05 00:00:00
2 2018-01-04 00:00:10 2018-01-06 00:00:00
3 2018-01-04 00:14:00 2017-01-06 00:00:00
4 2018-01-04 00:00:00 2018-01-05 00:00:00
What is actually happening is that both Series, df['Date_start'] and df['Date_end'], are of type datetime64[ns], but when the dataframe is displayed, a column whose time values are all zero is shown without the time part. If you need formatted output, you can convert them back to object (string) type and format them with dt.strftime:
df['Date_start'] = pd.to_datetime(df['Date_start']).dt.strftime('%Y/%m/%d %H:%M:%S')
df['Date_end'] = pd.to_datetime(df['Date_end']).dt.strftime('%Y/%m/%d %H:%M:%S')
print (df)
Outputs:
Id_sensor Date_start Date_end
0 1 2018/01/04 00:00:00 2018/01/05 00:00:00
1 2 2018/01/04 00:00:10 2018/01/06 00:00:00
2 3 2018/01/04 00:14:00 2017/01/06 00:00:00
3 4 2018/01/04 00:00:00 2018/01/05 00:00:00
You can first convert your columns to datetime datatype using to_datetime, and subsequently use dt.strftime to convert the columns to string datatype with your desired format:
import pandas as pd
from datetime import datetime
df = pd.DataFrame({
'Id_sensor': [1, 2, 3, 4],
'Date_start': ['2018-01-04 00:00:00.0', '2018-01-04 00:00:10.0',
'2018-01-04 00:14:00.0', '2018-01-04'],
'Date_end': ['2018-01-05', '2018-01-06', '2017-01-06', '2018-01-05']})
df['Date_start'] = pd.to_datetime(df['Date_start']).dt.strftime('%Y-%m-%d %H:%M:%S')
df['Date_end'] = pd.to_datetime(df['Date_end']).dt.strftime('%Y-%m-%d %H:%M:%S')
print(df)
# Output:
#
# Id_sensor Date_start Date_end
# 0 1 2018-01-04 00:00:00 2018-01-05 00:00:00
# 1 2 2018-01-04 00:00:10 2018-01-06 00:00:00
# 2 3 2018-01-04 00:14:00 2017-01-06 00:00:00
# 3 4 2018-01-04 00:00:00 2018-01-05 00:00:00
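As a small sketch of the point made in the answers above (using only the Date_end values from the question; the variable names are illustrative), the parsed column is already datetime64 with every time at midnight, and strftime only changes how it is rendered by turning it into strings:

```python
import pandas as pd

df = pd.DataFrame({'Date_end': ['2018-01-05', '2018-01-06', '2017-01-06']})

# Parsing date-only strings gives datetime64 values at midnight.
parsed = pd.to_datetime(df['Date_end'], format='%Y-%m-%d')
all_midnight = (parsed == parsed.dt.normalize()).all()

# strftime converts to strings, so the zeros become visible.
formatted = parsed.dt.strftime('%Y-%m-%d %H:%M:%S')
print(parsed.dtype, all_midnight)
print(formatted)
```

The trade-off is that the formatted column no longer supports datetime operations, so it may be worth keeping the datetime64 column and formatting only for display.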

Change date-time to a single digit hour ending for dataframe in Python

I have a CSV file that has a column that has values like:
10/23/2018 11:00:00 PM
I want to convert these values strictly by time and create a new column which takes the time of the entry (11:00:00 etc) and changes it into an hour ending time.
Example looks like:
11:00:00 PM to 12:00:00 AM = 24, 12:00:00 AM to 1:00:00 AM = 1, 1:00:00 AM to 2:00:00 AM = 2 .....etc
Looking for a simple way to calculate these by indexing them based off this conversion.
My first pseudo code idea is to do something like grabbing the column df['Date'] and finding out what the time is:
file = pd.read_csv()
def conv(n):
date_time = n.iloc[1,1] #Position of the date-time column in file
for i in date_time:
time = date_time[11:] #Point of the line where time begins
Unsure how to proceed.
You can also do this:
import pandas as pd
data ='''
10/23/2018 11:00:00 PM
10/23/2018 12:00:00 AM
'''.strip().split('\n')
df = pd.DataFrame(data, columns=['date'])
df['date'] = pd.to_datetime(df['date'])
#df['pad1hour'] = df['date'].dt.hour+1
#or
df['pad1hour'] = df['date'] + pd.Timedelta('1 hours')
# I prefer the second as you can add whatever interval e.g. '1 days 3 minutes'
print(df['pad1hour'].dt.time)
You should convert to a datetime with pd.to_datetime(df.your_col) (your format will be automatically parsed correctly, though you can specify it to improve the speed) and then you can use the .dt.hour accessor.
import pandas as pd
# Sample Data
df = pd.DataFrame({'date': pd.date_range('2018-01-01', '2018-01-03', freq='30min')})
df['hour'] = df.date.dt.hour+1
print(df.sample(20))
date hour
95 2018-01-02 23:30:00 24
66 2018-01-02 09:00:00 10
82 2018-01-02 17:00:00 18
80 2018-01-02 16:00:00 17
75 2018-01-02 13:30:00 14
83 2018-01-02 17:30:00 18
49 2018-01-02 00:30:00 1
47 2018-01-01 23:30:00 24
30 2018-01-01 15:00:00 16
52 2018-01-02 02:00:00 3
29 2018-01-01 14:30:00 15
86 2018-01-02 19:00:00 20
59 2018-01-02 05:30:00 6
65 2018-01-02 08:30:00 9
92 2018-01-02 22:00:00 23
8 2018-01-01 04:00:00 5
91 2018-01-02 21:30:00 22
10 2018-01-01 05:00:00 6
89 2018-01-02 20:30:00 21
51 2018-01-02 01:30:00 2
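Applied to the 12-hour strings from the question, the same idea can be sketched as follows. The explicit format string and the hour_ending column name are assumptions for illustration, and the third row is an added sample value.

```python
import pandas as pd

df = pd.DataFrame({'date': ['10/23/2018 11:00:00 PM',
                            '10/23/2018 12:00:00 AM',
                            '10/23/2018 1:00:00 AM']})

# %I is the 12-hour clock hour and %p the AM/PM marker.
parsed = pd.to_datetime(df['date'], format='%m/%d/%Y %I:%M:%S %p')

# dt.hour runs from 0 to 23, so adding 1 gives hour ending 1 to 24.
df['hour_ending'] = parsed.dt.hour + 1
print(df)
```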
Another way to do it, using timedelta with apply:
from datetime import timedelta
import pandas as pd
file = pd.read_csv()
Case One: If you want to keep the date
file['New datetime'] = file['Date_time'].apply(lambda x: pd.to_datetime(x) + timedelta(hours = 1))
Case Two: If you just want the time
file['New time'] = file['Date_time'].apply(lambda x: (pd.to_datetime(x) + timedelta(hours = 1)).time())
If you need the column as strings instead of time objects, you can convert it to readable strings with:
file['New time'] = file['New time'].astype(str)
Hope it helps.
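The row-by-row apply above can likely be replaced by a vectorised version. This is a sketch under assumptions: the Date_time column name follows the answer above, the sample rows come from the question, and the explicit format string is an addition.

```python
import pandas as pd

# Stand-in for the CSV contents.
file = pd.DataFrame({'Date_time': ['10/23/2018 11:00:00 PM',
                                   '10/23/2018 12:00:00 AM']})

# Parse the whole column once, then shift every value by one hour.
shifted = pd.to_datetime(file['Date_time'],
                         format='%m/%d/%Y %I:%M:%S %p') + pd.Timedelta(hours=1)

file['New datetime'] = shifted
# Time of day as strings, without going through Python time objects.
file['New time'] = shifted.dt.strftime('%H:%M:%S')
print(file)
```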

Create datetime column from month and day with year based on month

I have columnar data of dates of the form mm-dd as shown. I need to add the correct year (dates October to December are 2017 and dates after 1-1 are 2018) and make a datetime object. The code below works, but it's ugly. Is there a more Pythonic way to accomplish this?
import pandas as pd
from datetime import datetime
import io
data = '''Date
1-3
1-2
1-1
12-21
12-20
12-19
12-18'''
df = pd.read_csv(io.StringIO(data))
for i,s in enumerate(df.Date):
s = s.split('-')
if int(s[0]) >= 10:
s = s[0]+'-'+s[1]+'-17'
else:
s = s[0]+'-'+s[1]+'-18'
df.Date[i] = pd.to_datetime(s)
print(df.Date[i])
Prints:
2018-01-03 00:00:00
2018-01-02 00:00:00
2018-01-01 00:00:00
2017-12-21 00:00:00
2017-12-20 00:00:00
2017-12-19 00:00:00
2017-12-18 00:00:00
You can convert the dates to pandas datetime objects, then modify their year with datetime.replace. See the docs for more information.
You can use the below code:
df['Date'] = pd.to_datetime(df['Date'], format="%m-%d")
df['Date'] = df['Date'].apply(lambda x: x.replace(year=2017) if x.month in(range(10,13)) else x.replace(year=2018))
Output:
Date
0 2018-01-03
1 2018-01-02
2 2018-01-01
3 2017-12-21
4 2017-12-20
5 2017-12-19
6 2017-12-18
This is one way using pandas vectorised functionality:
import numpy as np
df['Date'] = pd.to_datetime(df['Date'] + \
np.where(df['Date'].str.split('-').str[0].astype(int).between(10, 12),
'-2017', '-2018'))
print(df)
Date
0 2018-01-03
1 2018-01-02
2 2018-01-01
3 2017-12-21
4 2017-12-20
5 2017-12-19
6 2017-12-18
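A variant of the vectorised idea, sketched under the question's rule that months 10 to 12 belong to 2017 and everything else to 2018. The Full column name and the 10-5 sample row are made up for illustration. Building the complete string before parsing means no placeholder year is ever involved.

```python
import pandas as pd

df = pd.DataFrame({'Date': ['1-3', '1-1', '12-21', '10-5']})

# Month number taken from the text before the dash.
month = df['Date'].str.split('-').str[0].astype(int)

# Year as a string Series: 2017 for October to December, else 2018.
year = month.ge(10).map({True: '2017', False: '2018'})

# Append the year, then parse with an explicit month-day-year format.
df['Full'] = pd.to_datetime(df['Date'] + '-' + year, format='%m-%d-%Y')
print(df)
```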

Splitting a datetime, python, pandas

Sorry I am new to asking questions on stackoverflow so I don't understand how to format properly.
So I'm given a pandas dataframe that contains a datetime column (date and time) and an associated column that contains some sort of value. The given dates and times are incremented by the hour. I would like to manipulate the dataframe to have them increment every 15 minutes, but retain the same value. How would I do that? Thanks!
I have tried :
df = df.asfreq('15Min',method='ffill')
But I get an error:
"TypeError: Cannot compare type 'Timestamp' with type 'long'"
current dataframe:
datetime value
00:00:00 1
01:00:00 2
new dataframe:
datetime value
00:00:00 1
00:15:00 1
00:30:00 1
00:45:00 1
01:00:00 2
01:15:00 2
01:30:00 2
01:45:00 2
Update:
The approved answer below works, but so does the initial code I tried above:
df = df.asfreq('15Min',method='ffill')
I was messing around with other dataframes and seemed to be having trouble with some null values, so I took care of that with fillna statements and everything worked.
You can use a TimedeltaIndex, but it is necessary to manually extend the range past the last value so that reindex fills the final hour correctly:
df['datetime'] = pd.to_timedelta(df['datetime'])
df = df.set_index('datetime')
tr = pd.timedelta_range(df.index.min(),
df.index.max() + pd.Timedelta(45*60, unit='s'), freq='15Min')
df = df.reindex(tr, method='ffill')
print (df)
value
00:00:00 1
00:15:00 1
00:30:00 1
00:45:00 1
01:00:00 2
01:15:00 2
01:30:00 2
01:45:00 2
Another solution uses resample and has the same problem: a new last value must be appended so that the final hour is filled, and it is then dropped again with iloc[:-1]:
df['datetime'] = pd.to_timedelta(df['datetime'])
df = df.set_index('datetime')
df.loc[df.index.max() + pd.Timedelta(1, unit='h')] = 1
df = df.resample('15Min').ffill().iloc[:-1]
print (df)
value
datetime
00:00:00 1
00:15:00 1
00:30:00 1
00:45:00 1
01:00:00 2
01:15:00 2
01:30:00 2
01:45:00 2
But if values are datetimes:
print (df)
datetime value
0 2018-01-01 00:00:00 1
1 2018-01-01 01:00:00 2
Using reindex:
df['datetime'] = pd.to_datetime(df['datetime'])
df = df.set_index('datetime')
tr = pd.date_range(df.index.min(),
df.index.max() + pd.Timedelta(45*60, unit='s'), freq='15Min')
df = df.reindex(tr, method='ffill')
Or, starting again from the original df, using resample:
df['datetime'] = pd.to_datetime(df['datetime'])
df = df.set_index('datetime')
df.loc[df.index.max() + pd.Timedelta(1, unit='h')] = 1
df = df.resample('15Min').ffill().iloc[:-1]
print (df)
value
datetime
2018-01-01 00:00:00 1
2018-01-01 00:15:00 1
2018-01-01 00:30:00 1
2018-01-01 00:45:00 1
2018-01-01 01:00:00 2
2018-01-01 01:15:00 2
2018-01-01 01:30:00 2
2018-01-01 01:45:00 2
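The reindex approach can be sketched in a self-contained form as follows. The step and period variables are illustrative names, and the sketch assumes the index is a DatetimeIndex with hourly rows as in the question. Computing the end bound from them avoids the hard-coded 45 minutes.

```python
import pandas as pd

df = pd.DataFrame({
    'datetime': pd.to_datetime(['2018-01-01 00:00', '2018-01-01 01:00']),
    'value': [1, 2],
}).set_index('datetime')

step = pd.Timedelta(minutes=15)   # target spacing
period = pd.Timedelta(hours=1)    # spacing of the original rows

# Extend the range so the last hour also gets its 15-minute rows.
new_index = pd.date_range(df.index.min(),
                          df.index.max() + period - step,
                          freq=step)

# Forward-fill each original value onto the new timestamps.
out = df.reindex(new_index, method='ffill')
print(out)
```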
You can use pandas.date_range:
pd.date_range('00:00:00', '01:00:00', freq='15min')

python/pandas - converting date and hour integers to datetime

I have a dataframe that has a date column and an hour column.
DATE HOUR
2015-1-1 1
2015-1-1 2
. .
. .
. .
2015-1-1 24
I want to convert these columns into a datetime format something like:
2015-12-26 01:00:00
You could first convert df.DATE to a datetime column and then add df.HOUR as a delta via timedelta64[h]:
In [10]: df
Out[10]:
DATE HOUR
0 2015-1-1 1
1 2015-1-1 2
2 2015-1-1 24
In [11]: pd.to_datetime(df.DATE) + df.HOUR.astype('timedelta64[h]')
Out[11]:
0 2015-01-01 01:00:00
1 2015-01-01 02:00:00
2 2015-01-02 00:00:00
dtype: datetime64[ns]
Or use pd.to_timedelta:
In [12]: pd.to_datetime(df.DATE) + pd.to_timedelta(df.HOUR, unit='h')
Out[12]:
0 2015-01-01 01:00:00
1 2015-01-01 02:00:00
2 2015-01-02 00:00:00
dtype: datetime64[ns]
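For reference, a self-contained sketch of the pd.to_timedelta variant. The DATETIME column name is made up, and the sample rows mirror the ones shown above. Hour 24 rolls over to midnight of the following day.

```python
import pandas as pd

df = pd.DataFrame({'DATE': ['2015-1-1', '2015-1-1', '2015-1-1'],
                   'HOUR': [1, 2, 24]})

# Parse the date, then add the hour column as a timedelta.
df['DATETIME'] = pd.to_datetime(df['DATE']) + pd.to_timedelta(df['HOUR'], unit='h')
print(df)
```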
