I'm trying to convert some date-time data with pandas.to_datetime(). It is not working: the dtype of df['Time'] is still object. What is wrong?
Please note that I have attached my time file.
My code:
import pandas as pd
import numpy as np
from datetime import datetime
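with open('time', 'r') as f:
    lines = f.readlines()
t = []
for line in lines:
    # keep the last 20 characters of the second whitespace-separated field
    time = line.split()[1][-20:]
    # date part + space + time part, dropping the ':' between them
    time2 = time[:11] + ' ' + time[12:21]
    t.append(time2)
df = pd.DataFrame(t)
df.columns = ['Time']
df['Time'] = pd.to_datetime(df['Time'])
print(df['Time'])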
Name: Time, Length: 16136, dtype: object
Please find the attached time data file here.
The file time contains some invalid data.
For example, line 8323 contains 8322 "5/Jul/2013::8:25:18 0530",
which differs from a normal line such as 8321 "15/Jul/2013:18:25:18 +0530":
8321 "15/Jul/2013:18:25:18 +0530"
8322 "5/Jul/2013::8:25:18 0530"
For a normal line, time2 becomes 15/Jul/2013 18:25:18, but for the invalid line it becomes "5/Jul/2013::8:25:18:
15/Jul/2013 18:25:18
"5/Jul/2013::8:25:18
As a result, not every line can be parsed as a datetime, and the whole column is left with dtype object instead of datetime64[ns]:
>>> pd.Series(pd.to_datetime(['15/Jul/2013 18:25:18', '15/Jul/2013 18:25:18']))
0 2013-07-15 18:25:18
1 2013-07-15 18:25:18
dtype: datetime64[ns]
>>> pd.Series(pd.to_datetime(['15/Jul/2013 18:25:18', '*5/Jul/2013 18:25:18']))
0 15/Jul/2013 18:25:18
1 *5/Jul/2013 18:25:18
dtype: object
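To locate the offending rows rather than only observing the object dtype, one option is to parse with an explicit format and errors='coerce', then look at which entries became NaT. This is a minimal sketch; the two-element Series is a hypothetical stand-in for the list t built from the file.

```python
import pandas as pd

# Hypothetical sample standing in for the strings extracted from the file.
s = pd.Series(['15/Jul/2013 18:25:18', '*5/Jul/2013 18:25:18'])

# With errors='coerce', entries that do not match the format become NaT
# instead of preventing the conversion.
parsed = pd.to_datetime(s, format='%d/%b/%Y %H:%M:%S', errors='coerce')

# The NaT mask selects the original strings that failed to parse.
bad = s[parsed.isna()]
print(bad)
```

The same mask applied to the list of raw lines should point at the malformed records in the file.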
If you take only the first 5 entries (which have the correct date format) from the file, you get what you expected:
...
df = pd.DataFrame(t[:5])
df.columns = ['Time']
df['Time'] = pd.to_datetime(df['Time'])
The above code yields:
0 2013-07-15 00:00:12
1 2013-07-15 00:00:18
2 2013-07-15 00:00:23
3 2013-07-15 00:00:27
4 2013-07-15 00:00:29
Name: Time, dtype: datetime64[ns]
UPDATE
Added a small example that shows why the dtype is object rather than datetime.
Related
I have a question similar to this one: Convert date column (string) to datetime and match the format. I want to convert a string like '12/7/21' to '2021-07-12' as a date object. I believe the answer given in the link above is wrong, and here is why:
# The suggested solution on Stackoverflow
>>> import pandas as pd
>>> df = pd.DataFrame({'Date':['15/7/21']})
>>> df['Date']
0 15/7/21
Name: Date, dtype: object
>>> pd.to_datetime(df['Date'].astype('datetime64'),format='%d/%m/%y')
0 2021-07-15
Name: Date, dtype: datetime64[ns]
The specified format is effectively ignored in the above code! If you simply change 15 to 12 and input '12/7/21', then 12 is treated as the month instead of the day:
>>> df = pd.DataFrame({'Date':['12/7/21']})
>>> df['Date']
0 12/7/21
Name: Date, dtype: object
>>> pd.to_datetime(df['Date'].astype('datetime64'),format='%d/%m/%y')
0 2021-12-07
Name: Date, dtype: datetime64[ns]
Does anyone know what's the best solution to this problem?
(In R you simply use lubridate::dmy(df$Date) and it works perfectly)
.astype('datetime64') first attempts to parse the string as MM/DD/YY. If it can't (because the first field is greater than 12), it falls back to parsing as DD/MM/YY. This is why you see seemingly inconsistent behaviour:
>>> import pandas as pd
>>> pd.Series('15/7/21').astype('datetime64')
0 2021-07-15
dtype: datetime64[ns]
>>> pd.Series('14/7/21').astype('datetime64')
0 2021-07-14
dtype: datetime64[ns]
>>> pd.Series('13/7/21').astype('datetime64')
0 2021-07-13
dtype: datetime64[ns]
>>> pd.Series('12/7/21').astype('datetime64')
0 2021-12-07
dtype: datetime64[ns]
The way to solve this is to pass the Series of strings directly to pd.to_datetime instead of converting to datetime64 first. So you can simply do
pd.to_datetime(df['Date'], format='%d/%m/%y')
without the .astype cast.
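As a quick check of the fix, here is a small sketch that parses both the ambiguous and the unambiguous string with the explicit day-first format. The two-row frame is made up for illustration.

```python
import pandas as pd

# '12/7/21' is ambiguous without a format; '15/7/21' is not.
df = pd.DataFrame({'Date': ['12/7/21', '15/7/21']})

# Passing the raw strings with an explicit format applies %d/%m/%y to every row.
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%y')
print(df['Date'])
```

Both rows should come out as dates in July 2021, because the format is applied to the strings themselves and no guessing takes place.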
Hello there stackoverflow community,
I would like to change the datetime format of a column, but it doesn't work and I don't know what I'm doing wrong.
After executing the following code:
df6['beginn'] = pd.to_datetime(df6['beginn'], unit='s', errors='ignore')
I got this output, and that's fine, but I would like to drop the time so that only %m/%d/%Y is left.
ID DATE
91060 2017-11-10 00:00:00
91061 2022-05-01 00:00:00
91062 2022-04-01 00:00:00
Name: beginn, Length: 91063, dtype: object
I've tried this one and many others
df6['beginn'] = df6['beginn'].dt.strftime('%m/%d/%Y')
and get the following output:
AttributeError: Can only use .dt accessor with datetimelike values.
But I don't understand why. Haven't I already converted the data with pd.to_datetime?
Appreciate any hint you can give me! Thanks a lot!
You have to use errors="ignore" because not all the dates you are parsing are in the correct format. If you use errors="coerce", as @phi has mentioned, then any dates that cannot be converted are set to NaT. The column's dtype is still converted to datetime64, so you can then format it as you like and deal with the NaT values however you want.
Example
A dataframe with one item in Date not written as Year/Month/Day (25th Month is wrong):
>>> df = pd.DataFrame({'ID': [91060, 91061, 91062, 91063], 'Date': ['2017/11/10', '2022/05/01', '2022/04/01', '2055/25/25']})
>>> df
ID Date
0 91060 2017/11/10
1 91061 2022/05/01
2 91062 2022/04/01
3 91063 2055/25/25
>>> df.dtypes
ID int64
Date object
dtype: object
Using errors="ignore":
>>> df['Date'] = pd.to_datetime(df['Date'], errors='ignore')
>>> df
ID Date
0 91060 2017/11/10
1 91061 2022/05/01
2 91062 2022/04/01
3 91063 2055/25/25
>>> df.dtypes
ID int64
Date object
dtype: object
Column Date is still of dtype object because not all the values could be converted. Running df['Date'] = df['Date'].dt.strftime("%m/%d/%Y") will result in the AttributeError.
Using errors="coerce":
>>> df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
>>> df
ID Date
0 91060 2017-11-10
1 91061 2022-05-01
2 91062 2022-04-01
3 91063 NaT
>>> df.dtypes
ID int64
Date datetime64[ns]
dtype: object
Invalid dates are set to NaT and the column is now of type datetime64, so you can format it:
>>> df['Date'] = df['Date'].dt.strftime("%m/%d/%Y")
>>> df
ID Date
0 91060 11/10/2017
1 91061 05/01/2022
2 91062 04/01/2022
3 91063 NaN
Note: When a datetime64 column is formatted with strftime, it is converted back to dtype object, so NaT values become NaN. The issue you are having comes from some dirty data that is not in the correct format.
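To find the dirty rows before deciding what to do with them, one possible approach is to keep the coerced result in a separate variable and use its NaT mask on the original frame. This sketch reuses the example data above and assumes the intended format is %Y/%m/%d.

```python
import pandas as pd

df = pd.DataFrame({'ID': [91060, 91061, 91062, 91063],
                   'Date': ['2017/11/10', '2022/05/01', '2022/04/01', '2055/25/25']})

# Parse into a separate Series so the original strings stay available.
parsed = pd.to_datetime(df['Date'], format='%Y/%m/%d', errors='coerce')

# Rows whose Date could not be parsed (NaT after coercion).
bad_rows = df[parsed.isna()]
print(bad_rows)
```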
import pandas as pd
import datetime
dictt={'s_time': ["06:30:00", "07:30:00","16:30:00"], 'f_time': ["10:30:00", "23:30:00","23:30:00"]}
df=pd.DataFrame(dictt)
In this case I want to convert these times into datetime objects so I can later use them for calculations or other things.
When I run df['s_time']=pd.to_datetime(df['s_time'],format='%H:%M:%S').dt.time
it gives this error:
time data '24:00:00' does not match format '%H:%M:%S' (match)
I don't know how to fix this.
"24:00:00" means "00:00:00"
If it's just "24:00:00" that's causing trouble, you can replace the "24:" prefix with "00:":
import pandas as pd
df = pd.DataFrame({'time': ["06:30:24", "07:24:00", "24:00:00"]})
# replace prefix "24:" with "00:"
df['time'] = df['time'].str.replace('^24:', '00:', regex=True)
# now to_datetime
df['time'] = pd.to_datetime(df['time'])
df['time']
0 2021-04-17 06:30:24
1 2021-04-17 07:24:00
2 2021-04-17 00:00:00
Name: time, dtype: datetime64[ns]
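If only the time of day is needed (as in the question, which ends with .dt.time), the same prefix replacement can be combined with an explicit format. This is a sketch under the same assumption that "24:00:00" is meant as midnight.

```python
import datetime
import pandas as pd

df = pd.DataFrame({'time': ["06:30:24", "07:24:00", "24:00:00"]})

# Map the out-of-range hour "24" to "00" so that %H can match it.
cleaned = df['time'].str.replace('^24:', '00:', regex=True)

# Parse with an explicit format, then keep only the time component.
times = pd.to_datetime(cleaned, format='%H:%M:%S').dt.time
print(times)
```

The result holds datetime.time objects, so no date (today's or 1900-01-01) is attached.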
1 to 24 hour clock (instead of 0 to 23)
If however your time notation goes from 1 to 24 hours (instead of 0 to 23), you can parse string to timedelta, subtract one hour and then cast to datetime:
df = pd.DataFrame({'time': ["06:30:24", "07:24:00", "24:00:00"]})
# to timedelta and subtract one hour
df['time'] = pd.to_timedelta(df['time']) - pd.Timedelta(hours=1)
# to string and then datetime:
df['time'] = pd.to_datetime(df['time'].astype(str).str.split(' ').str[-1])
df['time']
0 2021-04-17 05:30:24
1 2021-04-17 06:24:00
2 2021-04-17 23:00:00
Name: time, dtype: datetime64[ns]
Note: the underlying assumption here is that the date is irrelevant. If there also is a date, see the related question I linked in the comments section.
I have a column in a dataframe which has timestamps and their datatype is object (string):
data_log = pd.read_csv(DATA_LOG_PATH)
print(data_log['LocalTime'])
0 09:38:49
1 09:38:50
2 09:38:51
3 09:38:52
4 09:38:53
...
Name: LocalTime, Length: 872, dtype: object
Now I try to convert to datetime:
data_log['LocalTime'] = pd.to_datetime(data_log['LocalTime'], format='%H:%M:%S')
print(data_log['LocalTime'])
0 1900-01-01 09:38:49
1 1900-01-01 09:38:50
2 1900-01-01 09:38:51
3 1900-01-01 09:38:52
4 1900-01-01 09:38:53
...
Name: LocalTime, Length: 872, dtype: datetime64[ns]
How do I remove that date there? I just want the time in the format that I specified, but it adds the 1900-01-01 to every row.
You can get the time part of a datetime series with Series.dt.time
print(data_log['LocalTime'].dt.time)
This series will consist of Python standard library datetime.time objects.
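A short sketch of what that looks like in practice. The three-row frame is a hypothetical stand-in for data_log.

```python
import datetime
import pandas as pd

# Hypothetical stand-in for the LocalTime column read from the CSV.
log = pd.DataFrame({'LocalTime': ['09:38:49', '09:38:50', '09:38:51']})

parsed = pd.to_datetime(log['LocalTime'], format='%H:%M:%S')
times = parsed.dt.time

print(times.dtype)          # the Series falls back to object dtype
print(type(times.iloc[0]))  # each element is a datetime.time
```

Because the result is an object column, the .dt accessor is no longer available on it; keep the parsed datetime column around if you still need vectorised datetime operations.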
Starting from the datetime column that carries the 1900-01-01 date, you can do it in different ways:
data_log['LocalTime'] = pd.Series([lt.time() for lt in data_log['LocalTime']])
or using a lambda function (Series.apply takes no axis argument):
data_log['LocalTime'] = data_log.LocalTime.apply(lambda x: x.time())
To check the dtype of a specific column:
print(df['LocalTime'].dtypes)
The to_datetime function from pandas:
https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html
df['LocalTime'] = pd.to_datetime(df['timestamp'], unit='s')
where unit='s' defines the unit of the timestamp (seconds in this case).
To take time zones into account:
df['LocalTime'].dt.tz_localize('UTC').dt.tz_convert('Europe/Brussels')
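Put together, the epoch conversion and the time zone handling might look like the sketch below. The single epoch value is made up for illustration, and the timestamps are assumed to be seconds since the Unix epoch in UTC.

```python
import pandas as pd

# Hypothetical epoch timestamp in seconds (2020-09-13 12:26:40 UTC).
df = pd.DataFrame({'timestamp': [1600000000]})

# Interpret the integers as seconds since the epoch (naive datetimes).
df['LocalTime'] = pd.to_datetime(df['timestamp'], unit='s')

# Mark the naive values as UTC, then convert to the target zone.
local = df['LocalTime'].dt.tz_localize('UTC').dt.tz_convert('Europe/Brussels')
print(local)
```

tz_localize only attaches a zone to naive values, while tz_convert shifts the wall-clock time, which is why both steps are needed.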
What's the best way to do this? I thought about extracting the two separately then combining them? This doesn't seem like it should be the most efficient way?
df['date'] = df['datetime'].dt.date
df['hour'] = df['datetime'].dt.hour
df['dateAndHour'] = df['datetime'].dt.date.astype(str) + ' ' + df['datetime'].dt.hour.astype(str)
You can use strftime; the details depend on the format your date is in and how you want to combine them:
from datetime import datetime
import pandas as pd
df = pd.DataFrame({'date':[datetime.now()]})
df['date-hour'] = df.date.dt.strftime('%Y-%m-%d %H')
df
date date-hour
0 2020-11-18 11:03:38.390393 2020-11-18 11
Depends what you want to do with it, but one way to do this would be to use strftime to format the datetime column to %Y-%m-%d %H or similar:
>>> df
datetime
0 2020-01-01 12:15:00
1 2020-10-22 11:11:11
>>> df.datetime.dt.strftime("%Y-%m-%d %H")
0 2020-01-01 12
1 2020-10-22 11
Name: datetime, dtype: object
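Note that strftime returns strings. If the combined date and hour should remain a datetime column (for grouping or arithmetic, say), a different technique is to truncate with dt.floor instead. This is a sketch using the same two sample rows.

```python
import pandas as pd

df = pd.DataFrame({'datetime': pd.to_datetime(['2020-01-01 12:15:00',
                                               '2020-10-22 11:11:11'])})

# Round each timestamp down to the start of its hour; dtype stays datetime64.
df['dateAndHour'] = df['datetime'].dt.floor('h')
print(df)
```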