Convert timezone and retrieve hour

I am looking to add three columns to my current dataframe (utc_date, apac_date, and hour).
I can produce two of the three columns, but hour should correspond to apac_date (17) and it is instead returning the hour for utc_date (9).
Any help would be greatly appreciated!
This is the starting dataframe:
import pandas as pd
from tzlocal import get_localzone
from pytz import timezone
raw_data = {
    'id': ['123456'],
    'start_date': [pd.Timestamp(2017, 9, 21, 5, 30, 0)]}  # pd.datetime was removed in pandas 1.0
df = pd.DataFrame(raw_data, columns = ['id', 'start_date'])
df
Result:
id start_date
123456 2017-09-21 05:30:00
Next, I convert the timezones for utc and apac based on the user's current region.
local_tz = get_localzone()
df['utc_date'] = df['start_date'].apply(lambda x: x.tz_localize(local_tz).astimezone(timezone('utc')))
df['apac_date'] = df['utc_date'].apply(lambda x: x.tz_localize('utc').astimezone(timezone('Asia/Hong_Kong')))
df
Result:
id start_date utc_date apac_date
123456 2017-09-21 05:30:00 2017-09-21 09:30:00+00:00 2017-09-21 17:30:00+08:00
Next, I retrieve the hour for the apac_date (it is giving me utc hour instead):
df['hour'] = df['apac_date'].apply(lambda x: int(x.strftime('%H')))
df
Result:
id start_date utc_date apac_date hour
123456 2017-09-21 05:30:00 2017-09-21 09:30:00+00:00 2017-09-21 17:30:00+08:00 9

Can you try using:
df['apac_date'] = df['utc_date'].apply(lambda x: x.tz_convert('Asia/Hong_Kong'))
Your code above raised an error for me because it calls tz_localize() on a timestamp that is already timezone-aware; tz_convert() is the method for moving an aware timestamp to another timezone.
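As a minimal sketch of the suggestion above, the whole pipeline can also be written with the vectorized .dt accessor instead of apply. The fixed zone America/New_York is an assumption standing in for get_localzone(), chosen only because it reproduces the 05:30 local to 09:30 UTC shift shown in the question.

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["123456"],
    "start_date": [pd.Timestamp(2017, 9, 21, 5, 30, 0)],
})

# Assumption: the user's local zone is America/New_York (UTC-4 on this date)
local_tz = "America/New_York"

# Localize the naive timestamps once, then only convert between zones
df["utc_date"] = df["start_date"].dt.tz_localize(local_tz).dt.tz_convert("UTC")
df["apac_date"] = df["utc_date"].dt.tz_convert("Asia/Hong_Kong")

# .dt.hour reads the wall-clock hour in the column's own timezone
df["hour"] = df["apac_date"].dt.hour

print(df["hour"].iloc[0])  # 17
```

Reading the hour through .dt.hour avoids the round trip through strftime and int.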

Related

Pandas - Datetime Manipulation

I have a dataframe like so:
CREATED_AT COUNT
'1990-01-01' '2022-01-01 07:30:00' 5
'1990-01-02' '2022-01-01 07:30:00' 10
...
Where the index is a date and the CREATED_AT column is a datetime that is the same value for all rows.
How can I update the CREATED_AT column so that it takes its date portion from the index?
The result should look like:
CREATED_AT COUNT
'1990-01-01' '1990-01-01 07:30:00' 5
'1990-01-02' '1990-01-02 07:30:00' 10
...
Attempts at this result in errors like:
cannot add DatetimeArray and DatetimeArray
You can use df.reset_index() to turn the index into a column and then do a simple manipulation to get the output you want, like this:
# Creating a test df
import pandas as pd
from datetime import datetime, timedelta, date
df = pd.DataFrame.from_dict({
"CREATED_AT": [datetime.now(), datetime.now() + timedelta(hours=1)],
"COUNT": [5, 10]
})
df_with_index = df.set_index(pd.Index([date.today() - timedelta(days=10), date.today() - timedelta(days=9)]))
# Creating the column with the result
df_result = df_with_index.reset_index()
df_result["NEW_CREATED_AT"] = pd.to_datetime(df_result["index"].astype(str) + ' ' + df_result["CREATED_AT"].dt.time.astype(str))
Result:
index CREATED_AT COUNT NEW_CREATED_AT
0 2022-11-11 2022-11-21 16:15:31.520960 5 2022-11-11 16:15:31.520960
1 2022-11-12 2022-11-21 17:15:31.520965 10 2022-11-12 17:15:31.520965
You can use:
# ensure CREATED_AT is a datetime
s = pd.to_datetime(df['CREATED_AT'])
# subtract the date to only get the time, add to the index
# ensuring the index is of datetime type
df['CREATED_AT'] = s.sub(s.dt.normalize()).add(pd.to_datetime(df.index))
If everything is already of datetime type, this simplifies to:
df['CREATED_AT'] = (df['CREATED_AT']
.sub(df['CREATED_AT'].dt.normalize())
.add(df.index)
)
Output:
CREATED_AT COUNT
1990-01-01 1990-01-01 07:30:00 5
1990-01-02 1990-01-02 07:30:00 10
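For reference, here is a self-contained sketch of the same idea that can be run end to end. It assumes the index holds date strings, as the question's display suggests; the names time_of_day and index_dates are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    {"CREATED_AT": ["2022-01-01 07:30:00", "2022-01-01 07:30:00"], "COUNT": [5, 10]},
    index=["1990-01-01", "1990-01-02"],
)

created = pd.to_datetime(df["CREATED_AT"])

# Time elapsed since midnight, as a timedelta Series
time_of_day = created - created.dt.normalize()

# Index dates as a datetime Series sharing the frame's index,
# so the addition below aligns row by row
index_dates = pd.Series(pd.to_datetime(df.index), index=df.index)

# datetime + timedelta is allowed; datetime + datetime is what raises the error
df["CREATED_AT"] = index_dates + time_of_day
print(df)
```

Adding a duration to a date is well defined, whereas adding two datetimes is not, which is where the "cannot add DatetimeArray and DatetimeArray" message comes from.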

I want to create a new column with ages according to birth dates but I'm not able to

I have a df with date of birth and I want to add another column with age. I want to understand what the problem is with this iteration I made. Why does this code put the age of the last user on every line instead of putting the age of each user?
import pandas as pd
import datetime as dt
df = pd.DataFrame(['9/26/1987 12:00:00 AM',
'9/21/1989 12:00:00 AM',
'2/23/1980 12:00:00 AM',
'7/19/1988 12:00:00 AM',
'1/23/1984 12:00:00 AM'], columns=['dob'])
df['Age'] = ""
for i in range(len(df)):
    df.replace(df.iloc[i,1], dt.date.today().year - dt.datetime.strptime(df.iloc[i,0], "%m/%d/%Y %H:%M:%S %p").year, inplace = True)
print(df)
Output:
dob Age
0 9/26/1987 12:00:00 AM 38
1 9/21/1989 12:00:00 AM 38
2 2/23/1980 12:00:00 AM 38
3 7/19/1988 12:00:00 AM 38
4 1/23/1984 12:00:00 AM 38
When using pandas, loops should be your last resort: vectorization is the purpose of the library. That being said:
I'm going to assume your dob column is already in a datetime format. If not, just call something like:
df['dob'] = pd.to_datetime(df['dob'])
Then get today's date from the datetime module and convert it to a type pandas can handle:
from datetime import date
df['Age'] = (pd.to_datetime(date.today()) - df['dob']).dt.days
Assigning to df['Age'] creates a new column, whereas df['dob'] references the column that already exists. pandas broadcasts the single value for today against every item in the 'dob' column, so there is no need for a loop, which would be slow in Python.
Note that .dt.days gives the age in days rather than years. You can drop ".dt.days" at the end if you want the exact Timedelta between today and dob.
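On the original question of why every row ends up with the last age: df.replace(old, new) replaces every cell equal to old, not just the cell at row i. The first pass turns every empty string into the first user's age, and each later pass finds that same number in row i and swaps it everywhere for the next user's age, so the final pass leaves the last user's age in all rows. The sketch below computes the age in whole years without a loop. The fixed reference date today is an assumption made so the result is reproducible; pd.Timestamp.today() could be used instead.

```python
import pandas as pd

df = pd.DataFrame({"dob": ["9/26/1987 12:00:00 AM",
                           "9/21/1989 12:00:00 AM",
                           "2/23/1980 12:00:00 AM",
                           "7/19/1988 12:00:00 AM",
                           "1/23/1984 12:00:00 AM"]})

# %I (12-hour clock) pairs with %p; %H would ignore the AM/PM marker
df["dob"] = pd.to_datetime(df["dob"], format="%m/%d/%Y %I:%M:%S %p")

# Assumption: a fixed reference date, for a reproducible example
today = pd.Timestamp("2022-08-01")

# True where this year's birthday is still ahead of the reference date
birthday_pending = (df["dob"].dt.month > today.month) | (
    (df["dob"].dt.month == today.month) & (df["dob"].dt.day > today.day)
)

# Year difference, minus one for anyone who has not had their birthday yet
df["Age"] = today.year - df["dob"].dt.year - birthday_pending.astype(int)

print(df["Age"].tolist())  # [34, 32, 42, 34, 38]
```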

Timedelta without Date

I have a column with times that are not timestamps and would like to know the timedelta until 00:30:00. However, I can only find methods for timestamps.
df['Time'] = ['22:30:00', '23:30:00', '00:15:00']
The intended result should look something like this:
df['Output'] = ['02:00:00', '01:00:00', '00:15:00']
This code converts the Time values from str to datetime (the date is automatically set to 1900-01-01), then calculates the timedelta against standardTime, which is set to 1900-01-02 00:30:00.
import pandas as pd
from datetime import datetime, timedelta
df = pd.DataFrame()
df['Time'] = ['22:30:00', '23:30:00', '00:15:00']
standardTime = datetime(1900, 1, 2, 0, 30, 0)
df['Time'] = pd.to_datetime(df['Time'], format='%H:%M:%S')
df['Output'] = df['Time'].apply(lambda x: standardTime-x).astype(str).str[7:] # without astype(str).str[7:], the Output value include a day such as "0 days 01:00:00"
print(df)
# Time Output
#0 1900-01-01 22:30:00 02:00:00
#1 1900-01-01 23:30:00 01:00:00
#2 1900-01-01 00:15:00 00:15:00
You might want to use datetime.time objects here, but these cannot be subtracted from one another, so you can't conveniently get a timedelta from them.
On the other hand, datetime.datetime objects can be subtracted. If you are only interested in positive deltas, you could construct a datetime from your time representation using 1970-01-01 as the date and subtract it from 1970-01-02T00:30.
For instance, if your times are stored as strings (as per your snippet):
import datetime as dt
def timedelta_to_0_30(time_string: str) -> dt.timedelta:
    time_string_as_datetime = dt.datetime.fromisoformat(f"1970-01-01T{time_string}")
    return dt.datetime(1970, 1, 2, 0, 30) - time_string_as_datetime
my_time_string = "22:30:00"
timedelta_to_0_30(my_time_string) # 2:00:00
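If the times live in a pandas column, the same wrap-around can be sketched without attaching a dummy date, by treating each time as a duration since midnight and taking the difference modulo one day. This is an alternative to the two answers above rather than a restatement of them; the names since_midnight, target and one_day are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Time": ["22:30:00", "23:30:00", "00:15:00"]})

# Interpret each HH:MM:SS string as time elapsed since midnight
since_midnight = pd.to_timedelta(df["Time"])

target = pd.Timedelta(minutes=30)   # 00:30:00
one_day = pd.Timedelta(days=1)

# A negative difference means the target falls on the next day;
# the modulo wraps it into the range [0, 24h)
df["Output"] = (target - since_midnight) % one_day

print(df["Output"].dt.total_seconds().tolist())  # [7200.0, 3600.0, 900.0]
```

The Output column stays a proper timedelta, so it can still be summed or compared; convert it to a string only for display.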

Convert a Date Object excel column to Datetime string by adding a given hour column

Can anyone help with this problem? I am trying to convert a Date object column to a datetime string using Python, from 'YY-mm-dd' to 'YY/mm/dd 00:00' format. The dataset is shown below. I have tried options such as:
energy_df['Date'] = pd.to_datetime(energy_df['Date'])
energy_df['month'] = energy_df['Date'].dt.month.astype(int)
energy_df['day_of_month'] = energy_df['Date'].dt.day.astype(int)
energy_df['day_of_week'] = energy_df['Date'].dt.dayofweek.astype(int)
energy_df['hour_of_day'] = energy_df['Hours']
selected_columns = ['Date', 'day_of_week', 'hour_of_day', 'Avg Specific Humidity[g/Kg]']
energy_df = energy_df[selected_columns]
Dataset image:
Convert the 'date' column to dtype datetime, the 'hour' column to dtype timedelta, add them together, and format to string.
Ex:
import pandas as pd
# some dummy input...
df = pd.DataFrame({'date': ['2015-01-01', '2015-01-01', '2015-01-01'],
'hour': [1, 2, 3]})
# to datetime / timedelta...
df['datetime'] = pd.to_datetime(df['date']) + pd.to_timedelta(df['hour'], unit='h')
# and format to string...
df['timestamp'] = df['datetime'].dt.strftime('%Y/%m/%d %H:%M')
# will give you:
df
date hour datetime timestamp
0 2015-01-01 1 2015-01-01 01:00:00 2015/01/01 01:00
1 2015-01-01 2 2015-01-01 02:00:00 2015/01/01 02:00
2 2015-01-01 3 2015-01-01 03:00:00 2015/01/01 03:00

Boolean column to determine if datetime index is between 8am and 9pm?

I have the following dataframe and was trying to create a new column of boolean values generated from my datetime index. The value should be 1 if the time is >= 08:00:00 and <= 21:00:00, and 0 if it is outside that range.
Timestamp Bath_County_Gen Wing_Gen Boolean
2020-09-23 00:00:00 -390.0 2954.0 0
2020-09-23 00:15:00 -363.33 3007.75 0
2020-09-23 00:30:00 -250.0 3049.0 0
2020-09-23 00:45:00 -220.0 3143.5 0
2020-09-23 01:00:00 -206.67 3193.33 0
2020-09-23 01:15:00 -185.0 3195.25 0
I tried the following but had no luck, and wasn't sure how else to set the boolean column value dynamically.
df['boolean'] = np.where(df.between_time('08:00:00', '21:00:00'), 1,0)
Thanks for the help!
After ensuring your "Timestamp" column is in datetime format, you can extract the hour of the day from it and perform the following operation:
df['Timestamp'] = pd.to_datetime(df['Timestamp']) # ensure it's datetime
df['is_between_8_and_21'] = df['Timestamp'].dt.hour.between(8, 21, inclusive='both') # extract the hour and check if it's between 8 and 21h (recent pandas expects a string here, not True)
now df will look like this:
Timestamp Bath_County_Gen Wing_Gen is_between_8_and_21
2020-09-23 00:00:00 -390.00 2954.00 False
2020-09-23 00:15:00 -363.33 3007.75 False
2020-09-23 00:30:00 -250.00 3049.00 False
Note that only the hour is compared, so 21:05 has hour 21 and is included when both bounds are inclusive.
EDIT
As you mention, your "Timestamp" is actually a DatetimeIndex. In that case, as you suggested, you can operate on the dataframe directly (pandas 1.4+ takes an inclusive argument; the older include_start/include_end flags were removed in pandas 2.0):
df.between_time('8:00', '21:00', inclusive='both')
From the Pandas documentation on .between_time(), it appears that if you specify the start_time and end_time as strings, they must be in a format as "08:25", or "21:51". If you want more fine-grained control to the second, you can use the alternative specification via datetime.time, so for example:
import datetime
start_time = datetime.time(8, 0, 0)
end_time = datetime.time(21, 0, 0)
df.between_time(start_time, end_time, inclusive='left') # keeps 08:00:00 but ensures 21 o'clock exactly is excluded
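Note that between_time() returns the matching rows rather than a boolean mask, which is likely why passing it to np.where did not work in the question. One way to turn it into a 0/1 column, sketched here with made-up sample rows, is to test which index labels survive the filter. The name in_window is illustrative.

```python
import pandas as pd

# Assumption: a small made-up sample with a DatetimeIndex
index = pd.to_datetime([
    "2020-09-23 00:00:00",
    "2020-09-23 08:00:00",
    "2020-09-23 15:15:00",
    "2020-09-23 21:00:00",
    "2020-09-23 21:05:00",
    "2020-09-23 23:15:00",
])
df = pd.DataFrame(
    {"Wing_Gen": [2954.0, 3007.75, 3049.0, 3143.5, 3193.33, 3195.25]},
    index=index,
)

# Index labels whose time of day lies in [08:00, 21:00];
# both ends are included by default
in_window = df.between_time("08:00", "21:00").index

# 1 where the row's timestamp is inside the window, 0 otherwise
df["Boolean"] = df.index.isin(in_window).astype(int)

print(df["Boolean"].tolist())  # [0, 1, 1, 1, 0, 0]
```

Unlike the hour-based check, this compares the full time of day, so 21:05 falls outside the window.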
Make sure that the Timestamp column has datetime format:
df["Timestamp"] = pd.to_datetime(df["Timestamp"])
Afterwards you can access the hour through each timestamp's .hour attribute. Full code (I added two rows for testing):
import pandas as pd
df = pd.DataFrame({
    "Timestamp": ["2020-09-23 00:00:00", "2020-09-23 00:15:00", "2020-09-23 00:30:00", "2020-09-23 00:45:00", "2020-09-23 01:00:00", "2020-09-23 01:15:00", "2020-09-23 15:15:00", "2020-09-23 23:15:00"]
})
df["Timestamp"] = pd.to_datetime(df["Timestamp"])
def is_between_8_and_21(ts):
    # 1 if the hour component is in 8..21 (so 21:59 still counts), else 0
    return 1 if 8 <= ts.hour <= 21 else 0
df["Boolean"] = df["Timestamp"].apply(is_between_8_and_21)
df
