converting "H:M:S" string in pandas to datetime object - python

import pandas as pd
import datetime
dictt={'s_time': ["06:30:00", "07:30:00","16:30:00"], 'f_time': ["10:30:00", "23:30:00","23:30:00"]}
df=pd.DataFrame(dictt)
In this case I want to convert these times into datetime objects so I can use them later for calculations or other purposes.
When I run df['s_time']=pd.to_datetime(df['s_time'],format='%H:%M:%S').dt.time
it gives this error:
time data '24:00:00' does not match format '%H:%M:%S' (match)
I don't know how to fix this.

"24:00:00" means "00:00:00"
If it's just "24:00:00" that's causing trouble, you can replace the "24:" prefix with "00:":
import pandas as pd
df = pd.DataFrame({'time': ["06:30:24", "07:24:00", "24:00:00"]})
# replace prefix "24:" with "00:"
df['time'] = df['time'].str.replace('^24:', '00:', regex=True)
# now to_datetime
df['time'] = pd.to_datetime(df['time'])
df['time']
0 2021-04-17 06:30:24
1 2021-04-17 07:24:00
2 2021-04-17 00:00:00
Name: time, dtype: datetime64[ns]
1 to 24 hour clock (instead of 0 to 23)
If however your time notation goes from 1 to 24 hours (instead of 0 to 23), you can parse string to timedelta, subtract one hour and then cast to datetime:
df = pd.DataFrame({'time': ["06:30:24", "07:24:00", "24:00:00"]})
# to timedelta and subtract one hour
df['time'] = pd.to_timedelta(df['time']) - pd.Timedelta(hours=1)
# to string and then datetime:
df['time'] = pd.to_datetime(df['time'].astype(str).str.split(' ').str[-1])
df['time']
0 2021-04-17 05:30:24
1 2021-04-17 06:24:00
2 2021-04-17 23:00:00
Name: time, dtype: datetime64[ns]
Note: the underlying assumption here is that the date is irrelevant. If there also is a date, see the related question I linked in the comments section.
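If the goal is arithmetic on the times rather than a clock value, one option is to keep them as timedeltas, which accept "24:00:00" without any string replacement. The following is a minimal sketch, assuming the question's column names and a hypothetical "24:00:00" entry in f_time; the duration column is an illustrative name, not part of the original data.

```python
import pandas as pd

# Sample data modelled on the question; the "24:00:00" value is assumed
# here to reproduce the case that breaks format='%H:%M:%S'.
df = pd.DataFrame({
    "s_time": ["06:30:00", "07:30:00", "16:30:00"],
    "f_time": ["10:30:00", "23:30:00", "24:00:00"],
})

# to_timedelta treats "24:00:00" as a duration of one full day
start = pd.to_timedelta(df["s_time"])
finish = pd.to_timedelta(df["f_time"])

# Durations can be subtracted directly (hypothetical column name)
df["duration"] = finish - start
print(df["duration"])
```

This keeps the values usable for calculations, but it gives durations since midnight, not datetime.time objects.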

Related

Pandas Date Formatting (With Optional Milliseconds)

I'm getting data from an API and putting it into a Pandas DataFrame. The date column needs formatting into date/time, which I am doing. However the API sometimes returns dates without milliseconds which doesn't match the format pattern. This results in an error:
time data '2020-07-30T15:57:37Z' does not match format '%Y-%m-%dT%H:%M:%S.%fZ' (match)
In this example, how can I format the date column to date/time, so all dates are formatted with milliseconds?
import pandas as pd
dates = {
'date': ['2020-07-30T15:57:37Z', '2020-07-30T15:57:37.1Z']
}
df = pd.DataFrame(dates)
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%dT%H:%M:%S.%fZ')
print(df)
Parse the column twice: once with a format that includes milliseconds and once with a format that does not. Use errors='coerce' so that values that do not match return NaT instead of raising a ValueError.
with_miliseconds = pd.to_datetime(df['date'], format='%Y-%m-%dT%H:%M:%S.%fZ',errors='coerce')
without_miliseconds = pd.to_datetime(df['date'], format='%Y-%m-%dT%H:%M:%SZ',errors='coerce')
the results would be something like this:
with milliseconds:
0 NaT
1 2020-07-30 15:57:37.100
Name: date, dtype: datetime64[ns]
without milliseconds:
0 2020-07-30 15:57:37
1 NaT
Name: date, dtype: datetime64[ns]
Then you can fill the NaTs of one Series with the values of the other, because the two results complement each other.
with_miliseconds.fillna(without_miliseconds)
0 2020-07-30 15:57:37.000
1 2020-07-30 15:57:37.100
Name: date, dtype: datetime64[ns]
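The two-pass approach above can be wrapped in a small helper. This is only a sketch: parse_optional_millis is a hypothetical name, and it assumes every value matches one of the two formats from the question.

```python
import pandas as pd

def parse_optional_millis(values):
    """Parse UTC 'Z' timestamps whose fractional seconds may be missing.

    Hypothetical helper: values matching neither format come back as NaT.
    """
    s = pd.Series(values)
    with_ms = pd.to_datetime(s, format="%Y-%m-%dT%H:%M:%S.%fZ", errors="coerce")
    without_ms = pd.to_datetime(s, format="%Y-%m-%dT%H:%M:%SZ", errors="coerce")
    # Each pass fills the gaps left by the other
    return with_ms.fillna(without_ms)

parsed = parse_optional_millis(["2020-07-30T15:57:37Z", "2020-07-30T15:57:37.1Z"])
print(parsed)
```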
To get a consistent format in your output DataFrame, you could run a regex replacement on all values without milliseconds before building the DataFrame.
import re
dates = {'date': [re.sub(r'Z', '.0Z', date) if '.' not in date else date for date in dates['date']]}
Since only the dates containing a . have milliseconds, we can run the replacement on the others.
After that, everything else is the same as in your code.
Output:
date
0 2020-07-30 15:57:37.000
1 2020-07-30 15:57:37.100
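If the data is already in a DataFrame, the same padding idea can be done with vectorized string methods instead of a list comprehension. A minimal sketch, assuming (as above) that a "." appears only in values that carry fractional seconds; has_fraction and padded are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({"date": ["2020-07-30T15:57:37Z", "2020-07-30T15:57:37.1Z"]})

# True where the string already has fractional seconds
has_fraction = df["date"].str.contains(".", regex=False)

# Keep values that have a fraction; pad the rest with ".0" before the Z
padded = df["date"].where(
    has_fraction,
    df["date"].str.replace("Z", ".0Z", regex=False),
)

df["date"] = pd.to_datetime(padded, format="%Y-%m-%dT%H:%M:%S.%fZ")
print(df)
```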
Since your date strings look like standard ISO 8601, you can simply omit the format parameter. The parser takes into account that milliseconds are optional.
import pandas as pd
dates = {
'date': ['2020-07-30T15:57:37Z', '2020-07-30T15:57:37.1Z']
}
df = pd.DataFrame(dates)
df['date'] = pd.to_datetime(df['date'])
print(df)
date
0 2020-07-30 15:57:37+00:00
1 2020-07-30 15:57:37.100000+00:00

How to remove hours, minutes, seconds and UTC offset from pandas date column? I'm running with streamlit and pandas

How do I remove T00:00:00+05:30 after the year, month and day values in pandas? I tried converting the column to datetime, but it still shows the same result. I'm using pandas in Streamlit. I tried the code below:
df['Date'] = pd.to_datetime(df['Date'])
The output is the same as below:
Date
2019-07-01T00:00:00+05:30
2019-07-01T00:00:00+05:30
2019-07-02T00:00:00+05:30
2019-07-02T00:00:00+05:30
2019-07-02T00:00:00+05:30
2019-07-03T00:00:00+05:30
2019-07-03T00:00:00+05:30
2019-07-04T00:00:00+05:30
2019-07-04T00:00:00+05:30
2019-07-05T00:00:00+05:30
Can anyone help me remove T00:00:00+05:30 from the rows above?
If I understand correctly, you want to keep only the date part.
Convert date strings to datetime
df = pd.DataFrame(
    columns=['date'],
    data=["2019-07-01T02:00:00+05:30", "2019-07-02T01:00:00+05:30"]
)
date
0 2019-07-01T02:00:00+05:30
1 2019-07-02T01:00:00+05:30
df['date'] = pd.to_datetime(df['date'])
date
0 2019-07-01 02:00:00+05:30
1 2019-07-02 01:00:00+05:30
Remove the timezone
df['date'] = df['date'].dt.tz_localize(None)
date
0 2019-07-01 02:00:00
1 2019-07-02 01:00:00
Keep the date only
df['date'] = df['date'].dt.date
0 2019-07-01
1 2019-07-02
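Don't bother with apply to Python dates or string changes. The former will leave you with an object-dtype column and the latter is slow. Just round to the day frequency using the library function. Note that round goes to the nearest day, so times after noon move to the next day; use .dt.floor('D') or .dt.normalize() if you want to truncate instead.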
>>> pd.Series([pd.Timestamp('2000-01-05 12:01')]).dt.round('D')
0 2000-01-06
dtype: datetime64[ns]
If you have a timezone aware timestamp, convert to UTC with no time zone then round:
>>> pd.Series([pd.Timestamp('2019-07-01T00:00:00+05:30')]).dt.tz_convert(None) \
.dt.round('D')
0 2019-07-01
dtype: datetime64[ns]
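To make the difference between rounding and truncating concrete, here is a small sketch. It assumes you want the local calendar date, so it drops the offset with tz_localize(None) rather than converting to UTC; the second timestamp is made up for illustration.

```python
import pandas as pd

# Second value is a hypothetical afternoon timestamp
s = pd.Series(pd.to_datetime([
    "2019-07-01T00:00:00+05:30",
    "2019-07-02T13:45:00+05:30",
]))

# Drop the offset but keep the local wall-clock time
local = s.dt.tz_localize(None)

truncated = local.dt.normalize()  # midnight of the same local day
rounded = local.dt.round("D")     # nearest midnight, may be the next day

print(truncated)
print(rounded)
```

Both results stay datetime64, so they remain fast to sort and compare.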
If you want datetime.date objects instead of strings, you could also parse each string with .apply (for a column that is already datetime, the .dt.date accessor gives the same result):
import pandas as pd
import datetime
df = pd.DataFrame(
{"date": [
"2019-07-01T00:00:00+05:30",
"2019-07-01T00:00:00+05:30",
"2019-07-02T00:00:00+05:30",
"2019-07-02T00:00:00+05:30",
"2019-07-02T00:00:00+05:30",
"2019-07-03T00:00:00+05:30",
"2019-07-03T00:00:00+05:30",
"2019-07-04T00:00:00+05:30",
"2019-07-04T00:00:00+05:30",
"2019-07-05T00:00:00+05:30"]})
df["date"] = df["date"].apply(lambda x: datetime.datetime.fromisoformat(x).date())
print(df)

Python pandas - Convert string to datetime without a year using pandas.to_datetime()

I have a column in a dataframe which has timestamps and their datatype is object (string):
data_log = pd.read_csv(DATA_LOG_PATH)
print(data_log['LocalTime'])
0 09:38:49
1 09:38:50
2 09:38:51
3 09:38:52
4 09:38:53
...
Name: LocalTime, Length: 872, dtype: object
Now I try to convert to datetime:
data_log['LocalTime'] = pd.to_datetime(data_log['LocalTime'], format='%H:%M:%S')
print(data_log['LocalTime'])
0 1900-01-01 09:38:49
1 1900-01-01 09:38:50
2 1900-01-01 09:38:51
3 1900-01-01 09:38:52
4 1900-01-01 09:38:53
...
Name: LocalTime, Length: 872, dtype: datetime64[ns]
How do I remove that date there? I just want the time in the format that I specified, but it adds the 1900-01-01 to every row.
You can get the time part of a datetime series with Series.dt.time
print(data_log['LocalTime'].dt.time)
This series will consist of Python standard library datetime.time objects.
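Depending on what the times are needed for, there are a few representations to choose from. The sketch below uses a small made-up sample in place of the CSV; the as_time, as_text and as_delta column names are illustrative only.

```python
import pandas as pd

# Stand-in for the LocalTime column read from the CSV
data_log = pd.DataFrame({"LocalTime": ["09:38:49", "09:38:50", "09:38:51"]})

parsed = pd.to_datetime(data_log["LocalTime"], format="%H:%M:%S")

# datetime.time objects (object dtype)
data_log["as_time"] = parsed.dt.time
# Plain strings in the original format
data_log["as_text"] = parsed.dt.strftime("%H:%M:%S")
# Durations since midnight, convenient for arithmetic
data_log["as_delta"] = pd.to_timedelta(data_log["LocalTime"])

print(data_log.dtypes)
```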
Starting from the datetime column that carries 1900-01-01, you can do it in different ways:
data_log['LocalTime'] = pd.Series([lt.time() for lt in data_log['LocalTime']])
or using a lambda function:
data_log['LocalTime'] = data_log.LocalTime.apply(lambda x: x.time())
To check the dtype of a specific column:
print(data_log['LocalTime'].dtypes)
The to_datetime function from pandas:
https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html
df['LocalTime'] = pd.to_datetime(df['timestamp'], unit='s')
where unit='s' defines the unit of the timestamp (seconds in this case).
To take time zones into account:
df['LocalTime'].dt.tz_localize('UTC').dt.tz_convert('Europe/Brussels')
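The epoch conversion and the time zone step can be combined as follows. This is a sketch under the assumption that the timestamp column holds Unix seconds in UTC; the sample values and the utc and local column names are made up.

```python
import pandas as pd

# Hypothetical epoch-second values
df = pd.DataFrame({"timestamp": [0, 1_600_000_000]})

# utc=True yields timezone-aware values, so no separate tz_localize is needed
df["utc"] = pd.to_datetime(df["timestamp"], unit="s", utc=True)
df["local"] = df["utc"].dt.tz_convert("Europe/Brussels")

print(df)
```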

Cannot remove timestamp in datetime

I have a date column with dtype object, in the format 31-Mar-20. I tried to convert it with datetime.strptime into datetime64[D] with the format 2020-03-31, but nothing I have tried works, including some methods from this and this. It does turn my column into datetime64, but with a timestamp in it, which I don't want. I need it to be a datetime without a timestamp, in the format 2020-03-31. This is my code:
dates = [datetime.datetime.strptime(ts,'%d-%b-%y').strftime('%Y-%m-%d')
for ts in df['date']]
df['date']= pd.DataFrame({'date': dates})
df = df.sort_values(by=['date'])
This approach might work -
import pandas as pd
df = pd.DataFrame({'dates': ['20-Mar-2020', '21-Mar-2020', '22-Mar-2020']})
df
dates
0 20-Mar-2020
1 21-Mar-2020
2 22-Mar-2020
df['dates'] = pd.to_datetime(df['dates'], format='%d-%b-%Y').dt.date
df
dates
0 2020-03-20
1 2020-03-21
2 2020-03-22
df['date'] = pd.to_datetime(df['date'], format="%d-%b-%y")
This converts it to a datetime. When you look at df, it displays the values as 2020-03-31 like you want; however, these are all datetime objects, so if you extract one value with df['date'][0] you see Timestamp('2020-03-31 00:00:00').
If you want to convert them into dates, you can do
df['date'] = [df_datetime.date() for df_datetime in df['date'] ]
There is probably a better way of doing this step.
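One possible alternative for that last step is the .dt.date accessor. A minimal sketch, assuming two-digit years as in the question (31-Mar-20) and English month abbreviations; the sample rows and the date_only column are made up.

```python
import pandas as pd

df = pd.DataFrame({"date": ["31-Mar-20", "01-Feb-20", "15-Mar-20"]})

# Parse once, then sort while the column is still datetime64
df["date"] = pd.to_datetime(df["date"], format="%d-%b-%y")
df = df.sort_values(by="date").reset_index(drop=True)

# datetime.date objects without a time component (object dtype)
df["date_only"] = df["date"].dt.date
print(df)
```

Sorting before converting keeps the sort on the native datetime column.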

convert to datetime64 format with to_datetime()

I'm trying to convert some date-time data with pandas.to_datetime(). It is not working, and the dtype of df['Time'] is object. What is wrong?
Please note that I have attached my time file.
My Code
import pandas as pd
import numpy as np
from datetime import datetime
f = open('time','r')
lines = f.readlines()
t = []
for line in lines:
    time = line.split()[1][-20:]
    time2 = time[:11] + ' ' + time[12:21]
    t.append(time2)
df = pd.DataFrame(t)
df.columns = ['Time']
df['Time'] = pd.to_datetime(df['Time'])
print(df['Time'])
Name: Time, Length: 16136, dtype: object
please find the attach time data file here
The file time contains some invalid data.
For example, line 8323 contains 8322 "5/Jul/2013::8:25:18 0530",
which differs from normal lines such as 8321 "15/Jul/2013:18:25:18 +0530".
8321 "15/Jul/2013:18:25:18 +0530"
8322 "5/Jul/2013::8:25:18 0530"
For a normal line, time2 becomes 15/Jul/2013 18:25:18, but for the invalid line it becomes "5/Jul/2013::8:25:18:
15/Jul/2013 18:25:18
"5/Jul/2013::8:25:18
As a result, some lines are parsed to datetime and some are not, so the data are coerced to object dtype (to hold both datetimes and strings).
>>> pd.Series(pd.to_datetime(['15/Jul/2013 18:25:18', '15/Jul/2013 18:25:18']))
0 2013-07-15 18:25:18
1 2013-07-15 18:25:18
dtype: datetime64[ns]
>>> pd.Series(pd.to_datetime(['15/Jul/2013 18:25:18', '*5/Jul/2013 18:25:18']))
0 15/Jul/2013 18:25:18
1 *5/Jul/2013 18:25:18
dtype: object
If you take only the first 5 entries (which have the correct date format) from the file, you will get what you expected.
...
df = pd.DataFrame(t[:5])
df.columns = ['Time']
df['Time'] = pd.to_datetime(df['Time'])
The above code yields:
0 2013-07-15 00:00:12
1 2013-07-15 00:00:18
2 2013-07-15 00:00:23
3 2013-07-15 00:00:27
4 2013-07-15 00:00:29
Name: Time, dtype: datetime64[ns]
UPDATE
Added a small example that shows why the dtype is object and not datetime.
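To locate the malformed rows instead of slicing them away, one option is errors='coerce' with an explicit format. This is a sketch, not part of the original answer: the format string is an assumption based on the sample values, and newer pandas versions may raise an error on such input by default rather than return an object column.

```python
import pandas as pd

raw = pd.Series(["15/Jul/2013 18:25:18", "*5/Jul/2013 18:25:18"])

# Unparseable entries become NaT instead of breaking the conversion
parsed = pd.to_datetime(raw, format="%d/%b/%Y %H:%M:%S", errors="coerce")

# The original strings that failed to parse
bad_rows = raw[parsed.isna()]

print(parsed)
print(bad_rows)
```

The bad_rows Series keeps the original index, which makes it easy to trace the entries back to the source file.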
