I have a dataframe that has a date column and an hour column.
DATE HOUR
2015-1-1 1
2015-1-1 2
. .
. .
. .
2015-1-1 24
I want to convert these columns into a datetime format something like:
2015-12-26 01:00:00
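You could first convert df.DATE to a datetime column and then add df.HOUR as a timedelta via timedelta64[h] (recent pandas releases may reject this cast; pd.to_timedelta is the more portable route):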
In [10]: df
Out[10]:
DATE HOUR
0 2015-1-1 1
1 2015-1-1 2
2 2015-1-1 24
In [11]: pd.to_datetime(df.DATE) + df.HOUR.astype('timedelta64[h]')
Out[11]:
0 2015-01-01 01:00:00
1 2015-01-01 02:00:00
2 2015-01-02 00:00:00
dtype: datetime64[ns]
Or use pd.to_timedelta:
In [12]: pd.to_datetime(df.DATE) + pd.to_timedelta(df.HOUR, unit='h')
Out[12]:
0 2015-01-01 01:00:00
1 2015-01-01 02:00:00
2 2015-01-02 00:00:00
dtype: datetime64[ns]
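As a self-contained sketch of the same idea (the sample values below are made up, and the column name DATETIME is only illustrative), note that an HOUR of 24 rolls over to midnight of the next day:

```python
import pandas as pd

# Made-up sample data following the question's layout; hours run from 1 to 24.
df = pd.DataFrame({
    'DATE': ['2015-01-01', '2015-01-01', '2015-01-01'],
    'HOUR': [1, 2, 24],
})

# Parse the date part, then add the hour column as a timedelta.
df['DATETIME'] = pd.to_datetime(df['DATE']) + pd.to_timedelta(df['HOUR'], unit='h')

print(df['DATETIME'].tolist())
```

If hour 24 should instead stay on the same calendar day, the hours would have to be shifted first (for example by subtracting 1). That is a convention choice, not something pandas decides for you.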
I have a dataframe including a datetime column for date and a column for hour.
like this:
min hour date
0 0 2020-12-01
1 5 2020-12-02
2 6 2020-12-01
I need a datetime column including both date and hour.
like this :
min hour date datetime
0 0 2020-12-01 2020-12-01 00:00:00
0 5 2020-12-02 2020-12-02 05:00:00
0 6 2020-12-01 2020-12-01 06:00:00
How can I do it?
Use pd.to_datetime and pd.to_timedelta:
In [393]: df['date'] = pd.to_datetime(df['date'])
In [396]: df['datetime'] = df['date'] + pd.to_timedelta(df['hour'], unit='h')
In [405]: df
Out[405]:
min hour date datetime
0 0 0 2020-12-01 2020-12-01 00:00:00
1 1 5 2020-12-02 2020-12-02 05:00:00
2 2 6 2020-12-01 2020-12-01 06:00:00
You could also try using apply and np.timedelta64:
df['datetime'] = df['date'] + df['hour'].apply(lambda x: np.timedelta64(x, 'h'))
print(df)
Output:
min hour date datetime
0 0 0 2020-12-01 2020-12-01 00:00:00
1 1 5 2020-12-02 2020-12-02 05:00:00
2 2 6 2020-12-01 2020-12-01 06:00:00
The first question doesn't make the column types clear, so I assumed they hold plain Python date objects (not pandas datetimes) and that the asker wants a datetime version.
If that is the case, the solution is similar to the previous one but uses a different constructor:
from datetime import datetime
df['datetime'] = df.apply(lambda x: datetime(x.date.year, x.date.month, x.date.day, int(x['hour']), int(x['min'])), axis=1)
I have the following dataframe:
import pandas as pd
from datetime import datetime
df = pd.DataFrame({'Id_sensor': [1, 2, 3, 4],
'Date_start': ['2018-01-04 00:00:00.0', '2018-01-04 00:00:10.0',
'2018-01-04 00:14:00.0', '2018-01-04'],
'Date_end': ['2018-01-05', '2018-01-06', '2017-01-06', '2018-01-05']})
The columns Date_start and Date_end are of type object. I would like to convert them to a datetime type and make both columns look the same, that is, fill the time fields (hour, minute, second) that Date_end lacks with zeros.
I tried the following code:
df['Date_start'] = pd.to_datetime(df['Date_start'], format='%Y/%m/%d %H:%M:%S')
df['Date_end'] = pd.to_datetime(df['Date_end'], format='%Y/%m/%d %H:%M:%S')
My output:
Id_sensor Date_start Date_end
1 2018-01-04 00:00:00 2018-01-05
2 2018-01-04 00:00:10 2018-01-06
3 2018-01-04 00:14:00 2017-01-06
4 2018-01-04 00:00:00 2018-01-05
But I would like the output to be like this:
Id_sensor Date_start Date_end
1 2018-01-04 00:00:00 2018-01-05 00:00:00
2 2018-01-04 00:00:10 2018-01-06 00:00:00
3 2018-01-04 00:14:00 2017-01-06 00:00:00
4 2018-01-04 00:00:00 2018-01-05 00:00:00
What is actually happening is that both Series, df['Date_start'] and df['Date_end'], are of type datetime64[ns]; when the dataframe is displayed, a column whose time values are all zero is shown without them. If you need a formatted output, you can convert the columns back to object (string) type and format them with dt.strftime:
df['Date_start'] = pd.to_datetime(df['Date_start']).dt.strftime('%Y/%m/%d %H:%M:%S')
df['Date_end'] = pd.to_datetime(df['Date_end']).dt.strftime('%Y/%m/%d %H:%M:%S')
print(df)
Outputs:
Id_sensor Date_start Date_end
0 1 2018/01/04 00:00:00 2018/01/05 00:00:00
1 2 2018/01/04 00:00:10 2018/01/06 00:00:00
2 3 2018/01/04 00:14:00 2017/01/06 00:00:00
3 4 2018/01/04 00:00:00 2018/01/05 00:00:00
You can first convert your columns to datetime datatype using to_datetime, and subsequently use dt.strftime to convert the columns to string datatype with your desired format:
import pandas as pd
from datetime import datetime
df = pd.DataFrame({
'Id_sensor': [1, 2, 3, 4],
'Date_start': ['2018-01-04 00:00:00.0', '2018-01-04 00:00:10.0',
'2018-01-04 00:14:00.0', '2018-01-04'],
'Date_end': ['2018-01-05', '2018-01-06', '2017-01-06', '2018-01-05']})
df['Date_start'] = pd.to_datetime(df['Date_start']).dt.strftime('%Y-%m-%d %H:%M:%S')
df['Date_end'] = pd.to_datetime(df['Date_end']).dt.strftime('%Y-%m-%d %H:%M:%S')
print(df)
# Output:
#
# Id_sensor Date_start Date_end
# 0 1 2018-01-04 00:00:00 2018-01-05 00:00:00
# 1 2 2018-01-04 00:00:10 2018-01-06 00:00:00
# 2 3 2018-01-04 00:14:00 2017-01-06 00:00:00
# 3 4 2018-01-04 00:00:00 2018-01-05 00:00:00
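One caveat, offered as a hedged sketch and not as part of the answers above: to my understanding pandas 2.0 and later infer a single format from the first value, so a column that mixes '2018-01-04 00:00:00.0' with a bare '2018-01-04' may raise unless a format such as 'ISO8601' is passed. The version check below is an assumption made for illustration so that the snippet also runs on older releases.

```python
import pandas as pd

raw = pd.Series(['2018-01-04 00:00:00.0', '2018-01-04 00:00:10.0', '2018-01-04'])

# Assumption: format='ISO8601' exists only in pandas >= 2.0;
# older versions parse each value individually without it.
if int(pd.__version__.split('.')[0]) >= 2:
    parsed = pd.to_datetime(raw, format='ISO8601')
else:
    parsed = pd.to_datetime(raw)

# The parsed values are real datetimes; midnight is stored even when not displayed.
is_datetime = pd.api.types.is_datetime64_any_dtype(parsed)

# strftime turns them into plain strings with the time part always spelled out.
as_text = parsed.dt.strftime('%Y-%m-%d %H:%M:%S')
print(is_datetime, as_text.tolist())
```

Keeping the parsed Series around for calculations and calling strftime only for display avoids losing the datetime type.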
I am trying to extract a Date and a Time from a Timestamp:
DateTime
31/12/2015 22:45
to be:
Date | Time |
31/12/2015| 22:45 |
however when I use:
df['Date'] = pd.to_datetime(df['DateTime']).dt.date
I get:
2015-12-31
Similarly, for Time:
df['Time'] = pd.to_datetime(df['DateTime']).dt.time
gives
22:45:00
but if I try to format it I get an error:
df['Date'] = pd.to_datetime(df['DateTime'], format='%d/%m/%Y').dt.date
ValueError: unconverted data remains: 00:00
Try strftime
df['DateTime'] = pd.to_datetime(df['DateTime'])
df['Date'] = df['DateTime'].dt.strftime('%d/%m/%Y')
df['Time'] = df['DateTime'].dt.strftime('%H:%M')
DateTime Date Time
0 2015-12-31 22:45:00 31/12/2015 22:45
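The ValueError in the question arises because the format '%d/%m/%Y' describes only the date, while the strings also contain a time. A minimal sketch that spells out the full format follows; the second sample row is made up to show a day-first date that would otherwise be ambiguous.

```python
import pandas as pd

# First row is from the question; the second is a made-up ambiguous date (1 Feb 2016).
df = pd.DataFrame({'DateTime': ['31/12/2015 22:45', '01/02/2016 07:05']})

# The format covers both the date and the time part, so nothing is left unconverted.
parsed = pd.to_datetime(df['DateTime'], format='%d/%m/%Y %H:%M')

# Render the two pieces back as strings in the layout the question asks for.
df['Date'] = parsed.dt.strftime('%d/%m/%Y')
df['Time'] = parsed.dt.strftime('%H:%M')
print(df)
```

Giving the format explicitly also removes any guessing about whether the day or the month comes first.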
Option 1
Since you don't really need to operate on the dates per se, just split your column on space:
df = df.DateTime.str.split(expand=True)
df.columns = ['Date', 'Time']
df
Date Time
0 31/12/2015 22:45
Option 2
Alternatively, just drop the format specifier completely:
v = pd.to_datetime(df['DateTime'], errors='coerce')
df['Time'] = v.dt.time
df['Date'] = v.dt.floor('D')
df
Time Date
0 22:45:00 2015-12-31
If your DateTime column is already a datetime type, you shouldn't need to call pd.to_datetime on it.
Are you looking for a string ("12:34") or a timestamp (the concept of 12:34 in the afternoon)? If you're looking for the former, there are answers here that cover that. If you're looking for the latter, you can use the .dt.time and .dt.date accessors.
>>> pd.__version__
u'0.20.2'
>>> df = pd.DataFrame({'DateTime':pd.date_range(start='2018-01-01', end='2018-01-10')})
>>> df['date'] = df.DateTime.dt.date
>>> df['time'] = df.DateTime.dt.time
>>> df
DateTime date time
0 2018-01-01 2018-01-01 00:00:00
1 2018-01-02 2018-01-02 00:00:00
2 2018-01-03 2018-01-03 00:00:00
3 2018-01-04 2018-01-04 00:00:00
4 2018-01-05 2018-01-05 00:00:00
5 2018-01-06 2018-01-06 00:00:00
6 2018-01-07 2018-01-07 00:00:00
7 2018-01-08 2018-01-08 00:00:00
8 2018-01-09 2018-01-09 00:00:00
9 2018-01-10 2018-01-10 00:00:00
I have data in a table as presented below:
YEAR DOY Hour
2015 1 0
2015 1 1
2015 1 2
2015 1 3
2015 1 4
2015 1 5
This is how I'm reading the file:
df = pd.read_table('data2015.lst', sep=r'\s+')
lines = len(df)
To convert it to a datetime object I do:
dates = []
for l in range(0, lines):
    date = str(df.iloc[l, 0])[:-2] + ' ' + str(df.iloc[l, 1])[:-2] + ' ' + str(df.iloc[l, 2])[:-2]
    d = pd.to_datetime(date, format='%Y %j %H')
    dates.append(d)
But this is taking a lot of time.
Is there some way to do it (more directly) without the loop?
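You can do it in one line when reading it (pd.datetime has been removed from recent pandas, so the standard-library datetime is used here; the date_parser argument is itself deprecated in newer releases):
from datetime import datetime
df = pd.read_csv('file.txt', sep=r'\s+', index_col='Timestamp',
                 parse_dates={'Timestamp': [0, 1, 2]},
                 date_parser=lambda x: datetime.strptime(x, '%Y %j %H'))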
Timestamp
2015-01-01 00:00:00
2015-01-01 01:00:00
2015-01-01 02:00:00
2015-01-01 03:00:00
2015-01-01 04:00:00
2015-01-01 05:00:00
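A vectorized alternative, sketched under the assumption that the three columns are integers named YEAR, DOY and Hour as in the question (the sample rows are made up): join the columns into one string per row and parse the whole Series in a single pd.to_datetime call with the same '%Y %j %H' format.

```python
import pandas as pd

# Made-up sample in the question's layout: year, day of year, hour.
df = pd.DataFrame({
    'YEAR': [2015, 2015, 2015],
    'DOY': [1, 1, 32],
    'Hour': [0, 1, 5],
})

# Build one "YYYY DOY HH" string per row without an explicit Python loop.
stamp = (df['YEAR'].astype(str) + ' '
         + df['DOY'].astype(str) + ' '
         + df['Hour'].astype(str))

# %j is the day of the year, so day 32 lands on 1 February.
df['Timestamp'] = pd.to_datetime(stamp, format='%Y %j %H')
print(df)
```

If the file yields floats such as 2015.0, casting with astype(int) before astype(str) would be needed first.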
I have two dates in pandas dataframes (df1.a_date & df2.another_date) read from CSV files. They match at the date level (YYYY-MM-DD) but not at the time (HH:MM:SS). Both are read in as dtype: object.
I need to merge the two dataframes on the dates, but since they aren't exact matches, I probably need to convert them first. Any ideas?
edit:
I've tried using datetime.date to construct a new date from the pandas.datetime, but that doesn't seem to work.
datetime.date(df.a_date.year, df.a_date.month, df.a_date.day)
pandas datetime objects don't have year, month, day accessors, though.
You can normalize a date column/DatetimeIndex index:
Note: In older pandas versions normalize isn't exposed on the dt accessor, so we wrap with DatetimeIndex; newer versions also offer df["datetime"].dt.normalize().
In [11]: df = pd.DataFrame(pd.date_range('2015-01-01 05:00', periods=3), columns=['datetime'])
In [12]: df
Out[12]:
datetime
0 2015-01-01 05:00:00
1 2015-01-02 05:00:00
2 2015-01-03 05:00:00
In [13]: df["date"] = pd.DatetimeIndex(df["datetime"]).normalize()
In [14]: df
Out[14]:
datetime date
0 2015-01-01 05:00:00 2015-01-01
1 2015-01-02 05:00:00 2015-01-02
2 2015-01-03 05:00:00 2015-01-03
This also works for a DatetimeIndex: use df.index rather than df[col_name].
Format the datetime to only include YYYY-MM-DD:
assuming d is a single datetime value:
'{:%Y-%m-%d}'.format(d)
Assume dft is your dataframe and its 'index' column contains datetimes:
In [1804]: dft.head()
Out[1804]:
index A
0 2013-01-01 00:00:00 1.193366
1 2013-01-01 00:01:00 1.013425
2 2013-01-01 00:02:00 1.281902
3 2013-01-01 00:03:00 -0.043788
4 2013-01-01 00:04:00 -1.610164
You could convert the column to contain just the date and save it in a different column, if you want. And operate on that:
In [1805]: dft['index'].apply(lambda v:v.date()).head()
Out[1805]:
0 2013-01-01
1 2013-01-01
2 2013-01-01
3 2013-01-01
4 2013-01-01
Name: index, dtype: object
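To tie this back to the merge in the question, here is a minimal sketch with made-up data: parse both columns, truncate them to midnight with dt.normalize(), and merge on the resulting key. The helper column name day and the value columns x and y are hypothetical.

```python
import pandas as pd

# Made-up frames whose timestamps share a calendar day but differ in time of day.
df1 = pd.DataFrame({'a_date': ['2015-01-01 05:00:00', '2015-01-02 17:30:00'],
                    'x': [1, 2]})
df2 = pd.DataFrame({'another_date': ['2015-01-01 23:15:00', '2015-01-02 00:00:01'],
                    'y': [10, 20]})

# Parse the object columns, then drop the time of day so both sides compare equal.
df1['day'] = pd.to_datetime(df1['a_date']).dt.normalize()
df2['day'] = pd.to_datetime(df2['another_date']).dt.normalize()

# Inner merge on the shared calendar day.
merged = df1.merge(df2, on='day')
print(merged)
```

Unlike the apply(lambda v: v.date()) route, normalize keeps the key as a datetime column, which may be convenient if further date arithmetic is needed after the merge.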