I have a column of times that are not timestamps, and I would like to know the timedelta from each one to 00:30:00. However, I can only find methods for timestamps.
df['Time'] = ['22:30:00', '23:30:00', '00:15:00']
The intended result should look something like this:
df['Output'] = ['02:00:00', '01:00:00', '00:15:00']
This code converts the Time values from str to datetime (the date is automatically set to 1900-01-01). It then calculates the timedelta against standardTime, which is set to 1900-01-02 00:30:00.
import pandas as pd
from datetime import datetime, timedelta
df = pd.DataFrame()
df['Time'] = ['22:30:00', '23:30:00', '00:15:00']
standardTime = datetime(1900, 1, 2, 0, 30, 0)
df['Time'] = pd.to_datetime(df['Time'], format='%H:%M:%S')
df['Output'] = df['Time'].apply(lambda x: standardTime-x).astype(str).str[7:] # without astype(str).str[7:], the Output value include a day such as "0 days 01:00:00"
print(df)
# Time Output
#0 1900-01-01 22:30:00 02:00:00
#1 1900-01-01 23:30:00 01:00:00
#2 1900-01-01 00:15:00 00:15:00
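A variation that avoids the dummy 1900 dates, offered as a minimal sketch rather than the method above: parse the strings as timedeltas since midnight and take the difference modulo one day, so that times later than 00:30:00 wrap around to the next day. The names target and one_day are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'Time': ['22:30:00', '23:30:00', '00:15:00']})

target = pd.Timedelta('00:30:00')   # the reference time of day (assumed name)
one_day = pd.Timedelta(days=1)

# Time of day as an offset from midnight, then the wrapped difference.
# The modulo turns negative differences (e.g. -22:00:00) into the
# remaining time until the next 00:30:00 (02:00:00).
df['Output'] = (target - pd.to_timedelta(df['Time'])) % one_day
print(df)
```

The Output column stays a timedelta64 column here, so it can still be used in arithmetic; converting it to strings is only needed for display.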
You might want to use datetime.time objects as the data structure, but these cannot be subtracted from one another, so you can't conveniently get a timedelta from them.
On the other hand, datetime.datetime objects can be subtracted, so if you're always interested in positive deltas, you could construct a datetime object from your time representation using 1970-01-01 as date, and compare that to 1970-01-02T00:30.
For instance, if your times are stored as strings (as per your snippet):
import datetime as dt
def timedelta_to_0_30(time_string: str) -> dt.timedelta:
    # attach a dummy date so the value becomes a subtractable datetime
    time_string_as_datetime = dt.datetime.fromisoformat(f"1970-01-01T{time_string}")
    return dt.datetime(1970, 1, 2, 0, 30) - time_string_as_datetime
my_time_string = "22:30:00"
timedelta_to_0_30(my_time_string) # 2:00:00
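If the data is already held as datetime.time objects, the same idea can be sketched with datetime.combine. This is only an illustration under the assumption that the delta should always be positive and shorter than a day; time_until and anchor are hypothetical names, not part of the answer above.

```python
import datetime as dt

def time_until(start: dt.time, target: dt.time) -> dt.timedelta:
    """Positive timedelta from `start` to the next occurrence of `target`."""
    anchor = dt.date(1970, 1, 1)  # arbitrary dummy date, only used for subtraction
    delta = dt.datetime.combine(anchor, target) - dt.datetime.combine(anchor, start)
    if delta < dt.timedelta(0):
        # target is earlier in the day than start, so it falls on the next day
        delta += dt.timedelta(days=1)
    return delta

print(time_until(dt.time(22, 30), dt.time(0, 30)))  # 2:00:00
print(time_until(dt.time(0, 15), dt.time(0, 30)))   # 0:15:00
```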
Related
How can I remove T00:00:00+05:30 after the year, month and date values in pandas? I tried converting the column to datetime, but it still shows the same result. I'm using pandas in Streamlit. I tried the code below:
df['Date'] = pd.to_datetime(df['Date'])
The output is the same, as shown below:
Date
2019-07-01T00:00:00+05:30
2019-07-01T00:00:00+05:30
2019-07-02T00:00:00+05:30
2019-07-02T00:00:00+05:30
2019-07-02T00:00:00+05:30
2019-07-03T00:00:00+05:30
2019-07-03T00:00:00+05:30
2019-07-04T00:00:00+05:30
2019-07-04T00:00:00+05:30
2019-07-05T00:00:00+05:30
Can anyone help me remove T00:00:00+05:30 from the rows above?
If I understand correctly, you want to keep only the date part.
Convert date strings to datetime
df = pd.DataFrame(
    columns=['date'],
    data=["2019-07-01T02:00:00+05:30", "2019-07-02T01:00:00+05:30"]
)
date
0 2019-07-01T02:00:00+05:30
1 2019-07-02T01:00:00+05:30
df['date'] = pd.to_datetime(df['date'])
date
0 2019-07-01 02:00:00+05:30
1 2019-07-02 01:00:00+05:30
Remove the timezone
df['date'] = df['date'].dt.tz_localize(None)
date
0 2019-07-01 02:00:00
1 2019-07-02 01:00:00
Keep the date only
df['date'] = df['date'].dt.date
0 2019-07-01
1 2019-07-02
Don't bother with apply to Python dates or string changes. The former will leave you with an object type column and the latter is slow. Just round to the day frequency using the library function.
>>> pd.Series([pd.Timestamp('2000-01-05 12:01')]).dt.round('D')
0 2000-01-06
dtype: datetime64[ns]
If you have a timezone aware timestamp, convert to UTC with no time zone then round:
>>> pd.Series([pd.Timestamp('2019-07-01T00:00:00+05:30')]).dt.tz_convert(None) \
.dt.round('D')
0 2019-07-01
dtype: datetime64[ns]
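One thing worth checking before relying on this: round('D') goes to the nearest day, so any time after noon lands on the following date, and tz_convert(None) shifts to UTC first while tz_localize(None) keeps the local wall-clock time. The sketch below, using made-up sample values, compares round with floor on the local wall-clock time; which behaviour is wanted depends on the data.

```python
import pandas as pd

# Sample values are illustrative; the first one is deliberately after noon.
s = pd.to_datetime(pd.Series(['2019-07-01T13:00:00+05:30',
                              '2019-07-02T01:00:00+05:30']))

naive = s.dt.tz_localize(None)   # drop the offset, keep local wall-clock time

rounded = naive.dt.round('D')    # nearest midnight: 13:00 moves to the next day
floored = naive.dt.floor('D')    # previous midnight: the date part is kept

print(rounded.tolist())
print(floored.tolist())
```

Both results keep a datetime64 dtype, so the .dt accessor remains available afterwards.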
Pandas doesn't have a dedicated date dtype (the .dt.date accessor returns an object column of datetime.date values), but you could also use .apply to parse the strings directly if you want date objects instead of strings:
import pandas as pd
import datetime
df = pd.DataFrame(
{"date": [
"2019-07-01T00:00:00+05:30",
"2019-07-01T00:00:00+05:30",
"2019-07-02T00:00:00+05:30",
"2019-07-02T00:00:00+05:30",
"2019-07-02T00:00:00+05:30",
"2019-07-03T00:00:00+05:30",
"2019-07-03T00:00:00+05:30",
"2019-07-04T00:00:00+05:30",
"2019-07-04T00:00:00+05:30",
"2019-07-05T00:00:00+05:30"]})
df["date"] = df["date"].apply(lambda x: datetime.datetime.fromisoformat(x).date())
print(df)
import pandas as pd
import datetime
dictt={'s_time': ["06:30:00", "07:30:00","16:30:00"], 'f_time': ["10:30:00", "23:30:00","23:30:00"]}
df=pd.DataFrame(dictt)
In this case I want to convert these times into datetime objects so I can use them later for calculations and other things.
When I run df['s_time']=pd.to_datetime(df['s_time'],format='%H:%M:%S').dt.time
it gives this error:
time data '24:00:00' does not match format '%H:%M:%S' (match)
I don't know how to fix this.
"24:00:00" means "00:00:00"
If it's just "24:00:00" that's causing trouble, you can replace the "24:" prefix with "00:":
import pandas as pd
df = pd.DataFrame({'time': ["06:30:24", "07:24:00", "24:00:00"]})
# replace prefix "24:" with "00:"
df['time'] = df['time'].str.replace('^24:', '00:', regex=True)
# now to_datetime
df['time'] = pd.to_datetime(df['time'])
df['time']
0 2021-04-17 06:30:24
1 2021-04-17 07:24:00
2 2021-04-17 00:00:00
Name: time, dtype: datetime64[ns]
1 to 24 hour clock (instead of 0 to 23)
If however your time notation goes from 1 to 24 hours (instead of 0 to 23), you can parse string to timedelta, subtract one hour and then cast to datetime:
df = pd.DataFrame({'time': ["06:30:24", "07:24:00", "24:00:00"]})
# to timedelta and subtract one hour
df['time'] = pd.to_timedelta(df['time']) - pd.Timedelta(hours=1)
# to string and then datetime:
df['time'] = pd.to_datetime(df['time'].astype(str).str.split(' ').str[-1])
df['time']
0 2021-04-17 05:30:24
1 2021-04-17 06:24:00
2 2021-04-17 23:00:00
Name: time, dtype: datetime64[ns]
Note: the underlying assumption here is that the date is irrelevant. If there also is a date, see the related question I linked in the comments section.
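For the case where a date is present, one possible approach (a sketch, not necessarily what the linked question recommends) is to add the time as a timedelta to the parsed date, so that "24:00:00" rolls over to midnight of the following day. The 'date' column and its values are assumptions made up for this example.

```python
import pandas as pd

# Hypothetical input: a date column next to the time strings.
df = pd.DataFrame({'date': ['2021-04-17', '2021-04-17'],
                   'time': ['07:24:00', '24:00:00']})

# to_timedelta accepts "24:00:00" and reads it as a full day,
# so adding it to the date yields 00:00:00 on the next day.
df['datetime'] = pd.to_datetime(df['date']) + pd.to_timedelta(df['time'])
print(df)
```

This treats "24:00:00" as the end of the given day, which differs from the prefix replacement above, where it becomes the start of the same day.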
Can anyone help with this problem? I am trying to convert a Date column of object dtype to a datetime string format in Python, from 'YY-mm-dd' to 'YY/mm/dd 00:00'. The dataset is given below. I have tried every option I could think of, such as energy_df['Date']= pd.to_datetime(energy_df['Date']):
energy_df['Date'] = pd.to_datetime(energy_df['Date'])
energy_df['month'] = energy_df['Date'].dt.month.astype(int)
energy_df['day_of_month'] = energy_df['Date'].dt.day.astype(int)
energy_df['day_of_week'] = energy_df['Date'].dt.dayofweek.astype(int)
energy_df['hour_of_day'] = energy_df['Hours']
selected_columns = ['Date', 'day_of_week', 'hour_of_day', 'Avg Specific Humidity[g/Kg]']
energy_df = energy_df[selected_columns]
Dataset image:
Convert the 'date' column to dtype datetime, the 'hour' column to dtype timedelta, add them together, and format to string.
Ex:
import pandas as pd
# some dummy input...
df = pd.DataFrame({'date': ['2015-01-01', '2015-01-01', '2015-01-01'],
'hour': [1, 2, 3]})
# to datetime / timedelta...
df['datetime'] = pd.to_datetime(df['date']) + pd.to_timedelta(df['hour'], unit='h')
# and format to string...
df['timestamp'] = df['datetime'].dt.strftime('%Y/%m/%d %H:%M')
# will give you:
df
date hour datetime timestamp
0 2015-01-01 1 2015-01-01 01:00:00 2015/01/01 01:00
1 2015-01-01 2 2015-01-01 02:00:00 2015/01/01 02:00
2 2015-01-01 3 2015-01-01 03:00:00 2015/01/01 03:00
I have a dataset with 2 columns in a data frame: Date in YYYY-MM-DD format, and Hour in a format running from 0100 (for 1am) to 2300 (for 11pm).
Date Hour
2017-01-01 0200
2017-01-01 0400
etc
To get it ready for a time series model, I want to convert these into datetime objects and concatenate the columns. Example of the desired output: 2017-01-01 01:00:00, etc.
I have tried df['Date'] = pd.to_datetime(df['Date']), which converts that column into datetime objects, but I'm struggling with the Hour column. Please help.
This is one way. The trick is to note that pd.to_datetime is actually quite flexible: it accepts strings of the format "YYYY-MM-DD HHMM".
I assume here that your Hour is given as a string (otherwise leading zeros are not possible).
import pandas as pd
df = pd.DataFrame({'Date': ['2017-01-01', '2017-01-01'],
'Hour': ['0200', '0400']})
# as per #COLDSPEED's suggestion
df['DateTime'] = pd.to_datetime(df['Date'] + ' ' + df['Hour'])
print(df)
# Date Hour DateTime
# 0 2017-01-01 0200 2017-01-01 02:00:00
# 1 2017-01-01 0400 2017-01-01 04:00:00
print(df.dtypes)
# Date object
# Hour object
# DateTime datetime64[ns]
# dtype: object
Previous version with pd.DataFrame.apply is possible but inefficient:
df['DateTime'] = df.apply(lambda x: x['Date'] + ' ' + x['Hour'], axis=1)\
.apply(pd.to_datetime)
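If Hour turns out to be stored as integers rather than strings (so 0200 has become 200), the assumption above no longer holds. A minimal sketch for that case, assuming the values are always HHMM, pads them back to four digits and passes an explicit format; hhmm is just an illustrative name.

```python
import pandas as pd

# Hypothetical variant of the input where Hour was read as int.
df = pd.DataFrame({'Date': ['2017-01-01', '2017-01-01'],
                   'Hour': [200, 1400]})

# Restore the leading zeros: 200 -> "0200", 1400 -> "1400".
hhmm = df['Hour'].astype(str).str.zfill(4)

# An explicit format avoids any guessing about what "0200" means.
df['DateTime'] = pd.to_datetime(df['Date'] + ' ' + hhmm, format='%Y-%m-%d %H%M')
print(df)
```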
I am looking to add three columns to my current dataframe (utc_date, apac_date, and hour).
I successfully obtain two of the three columns, however hour should be corresponding to apac_date (17) but it is returning the hour for utc_date (9).
Any help would be greatly appreciated!
This is the starting dataframe:
import pandas as pd
from tzlocal import get_localzone
from pytz import timezone
raw_data = {
'id': ['123456'],
'start_date': [pd.Timestamp(2017, 9, 21, 5, 30, 0)]}  # pd.datetime was removed in pandas 2.0
df = pd.DataFrame(raw_data, columns = ['id', 'start_date'])
df
Result:
id start_date
123456 2017-09-21 05:30:00
Next, I convert the timezones for UTC and APAC based on the user's current region.
local_tz = get_localzone()
df['utc_date'] = df['start_date'].apply(lambda x: x.tz_localize(local_tz).astimezone(timezone('utc')))
df['apac_date'] = df['utc_date'].apply(lambda x: x.tz_localize('utc').astimezone(timezone('Asia/Hong_Kong')))
df
Result:
id start_date utc_date apac_date
123456 2017-09-21 05:30:00 2017-09-21 09:30:00+00:00 2017-09-21 17:30:00+08:00
Next, I retrieve the hour for the apac_date (it is giving me utc hour instead):
df['hour'] = df['apac_date'].apply(lambda x: int(x.strftime('%H')))
df
Result:
id start_date utc_date apac_date hour
123456 2017-09-21 05:30:00 2017-09-21 09:30:00+00:00 2017-09-21 17:30:00+08:00 9
Can you try using:
df['apac_date'] = df['utc_date'].apply(lambda x: x.tz_convert('Asia/Hong_Kong'))
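I got errors with your code above because it calls tz_localize() on a timestamp that is already timezone-aware. tz_localize() attaches a timezone to a naive timestamp, whereas tz_convert() converts an aware timestamp to another timezone.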
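The same steps can also be written without apply by using the .dt accessor, shown here as a rough sketch. The local timezone is hard-coded to 'America/New_York' purely as an assumption so that the example is reproducible (the question uses get_localzone() instead); it happens to match the 4-hour offset seen in the question's output.

```python
import pandas as pd

df = pd.DataFrame({'id': ['123456'],
                   'start_date': [pd.Timestamp(2017, 9, 21, 5, 30, 0)]})

local_tz = 'America/New_York'  # assumed stand-in for get_localzone()

# Naive -> aware: attach the local timezone, then convert to UTC.
df['utc_date'] = df['start_date'].dt.tz_localize(local_tz).dt.tz_convert('UTC')

# Aware -> aware: only tz_convert is needed here.
df['apac_date'] = df['utc_date'].dt.tz_convert('Asia/Hong_Kong')

# .dt.hour reads the hour in the column's own timezone.
df['hour'] = df['apac_date'].dt.hour
print(df)
```

With these inputs the hour column holds 17, the Hong Kong hour, rather than the UTC hour 9.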