Pandas: convert UTC to Local time using timezone, then drop timezone - python

I have a dataframe with columns:
time: time in UTC format
timezone: the corresponding timezone.
time timezone
0 2022-12-28T20:16:31.373Z Europe/Athens
1 2022-07-28T20:16:31.373Z Europe/Athens
2 2022-11-01T21:35:35.865Z Europe/Dublin
3 2022-08-03T19:44:07.611Z America/Los_Angeles
4 2022-08-02T12:44:44.360Z Europe/Minsk
I want to:
Convert UTC time to Local time (using timezone)
Remove the Timezone and just keep the datetime
It seems to me that this solution works, but I want to make sure that I am not missing something (e.g. that it doesn't handle daylight saving time).
import pandas as pd
# example dataframe
df = pd.DataFrame({
    'time': ['2022-12-28T20:16:31.373Z', '2022-07-28T20:16:31.373Z', '2022-11-01T21:35:35.865Z', '2022-08-03T19:44:07.611Z', '2022-08-02T12:44:44.360Z'],
    'timezone': ['Europe/Athens', 'Europe/Athens', 'Europe/Dublin', 'America/Los_Angeles', 'Europe/Minsk']
})
# function
def get_local_time(timestamp: pd.Timestamp, timezone: str) -> pd.Timestamp:
    timestamp = pd.to_datetime(timestamp).tz_convert(timezone).replace(tzinfo=None)
    return timestamp
df['local_time'] = df.apply(lambda row: get_local_time(row['time'], row['timezone']), axis=1).dt.round(freq='S')
print(df)
---
OUT:
time timezone local_time
0 2022-12-28T20:16:31.373Z Europe/Athens 2022-12-28 22:16:31
1 2022-07-28T20:16:31.373Z Europe/Athens 2022-07-28 23:16:31
2 2022-11-01T21:35:35.865Z Europe/Dublin 2022-11-01 21:35:36
3 2022-08-03T19:44:07.611Z America/Los_Angeles 2022-08-03 12:44:08
4 2022-08-02T12:44:44.360Z Europe/Minsk 2022-08-02 15:44:44
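On the daylight saving question: tz_convert applies the IANA rules for the named zone, which is why the two Europe/Athens rows above land on different offsets (UTC+2 in December, UTC+3 in July). A minimal sketch to check this, using three of the rows from the question; parsing the whole column once with utc=True is a variation on the original function, not a required change:

```python
import pandas as pd

df = pd.DataFrame({
    "time": [
        "2022-12-28T20:16:31.373Z",  # winter in Athens
        "2022-07-28T20:16:31.373Z",  # summer in Athens
        "2022-11-01T21:35:35.865Z",
    ],
    "timezone": ["Europe/Athens", "Europe/Athens", "Europe/Dublin"],
})

# Parse the column once; utc=True yields timezone-aware UTC timestamps.
utc = pd.to_datetime(df["time"], utc=True)

# Convert each value with its own zone, then drop the tz info.
# tz_localize(None) keeps the local wall-clock time and removes the offset.
df["local_time"] = pd.Series(
    [ts.tz_convert(tz).tz_localize(None) for ts, tz in zip(utc, df["timezone"])],
    index=df.index,
)

# UTC offsets that were actually applied, to confirm the DST handling.
offsets = [ts.tz_convert(tz).utcoffset() for ts, tz in zip(utc, df["timezone"])]

print(df)
print(offsets)
```

If the offsets for the two Athens rows differ by one hour, daylight saving time is being taken into account.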

Related

Converting a datetime column with UTC timezone to another TimeZone based on a column of the Dataframe

I have a Pandas dataframe which has two columns, with column names 'DateTimeInUTC' and 'TimeZone'. 'DateTimeInUTC' is the date and time of the instance in UTC, and 'TimeZone' is the time zone of the location of the instance.
An instance in the dataframe could be like this:
DateTimeInUTC: '2019-12-31 07:00:00'
TimeZone: 'US/Eastern'
I want to add another column to the dataframe with dType: datetime64 which converts 'DateTimeInUTC' to the specified time zone in that instance.
I tried using the method Pandas.tz_convert() but it takes the timezone as an argument and not as another column in the dataframe
EDIT: My best solution so far is to split the dataframe by timezone using pandas select statements and then apply the timezone on each dataframe with the same timezone, and then concatenate all the dataframes
The Better solution:
I was able to improve my own solution substantially:
timezones = weatherDf['TimeZone'].unique()
for timezone in timezones:
    weatherDf.loc[weatherDf['TimeZone'] == timezone, 'DateTimeInTimeZone'] = weatherDf.loc[weatherDf['TimeZone'] == timezone, 'DateTimeInUTC'].dt.tz_localize('UTC').dt.tz_convert(timezone).dt.tz_localize(None)
This solution converted around 7 million instances on my system in 3.6 seconds
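The same per-timezone idea can be tried on a tiny frame to check the results by eye. This is only a sketch: the three sample rows and the snake_case names are made up for illustration, and the pieces are collected and concatenated instead of being written back through .loc:

```python
import pandas as pd

# Illustrative data: naive datetimes that are known to be in UTC.
weather_df = pd.DataFrame({
    "DateTimeInUTC": pd.to_datetime(
        ["2019-12-31 07:00:00", "2019-12-31 07:00:00", "2019-07-01 07:00:00"]
    ),
    "TimeZone": ["America/New_York", "Europe/Athens", "America/New_York"],
})

converted_parts = []
# One vectorized conversion per distinct time zone.
for tz_name, utc_values in weather_df.groupby("TimeZone")["DateTimeInUTC"]:
    local = (
        utc_values.dt.tz_localize("UTC")  # mark the naive values as UTC
        .dt.tz_convert(tz_name)           # shift to the zone of this group
        .dt.tz_localize(None)             # drop the tz info, keep wall-clock time
    )
    converted_parts.append(local)

# Each piece keeps its original row index, so assignment aligns the rows.
weather_df["DateTimeInTimeZone"] = pd.concat(converted_parts)
print(weather_df)
```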
My previous solution:
This solution works, but it is probably not optimal:
let's assume weatherDf is my dataframe, which has these columns: DateTimeInUTC and TimeZone
timezones = weatherDf['TimeZone'].unique()
weatherDfs = []
for timezone in timezones:
    tempDf = weatherDf[weatherDf['TimeZone'] == timezone]
    tempDf['DateTimeInTimeZone'] = tempDf['DateTimeInUTC'].dt.tz_convert(timezone)
    weatherDfs.append(tempDf)
weatherDfConverted = pd.concat(weatherDfs)
This solution converted around 7 million instances on my system in around 40 seconds
Approach with groupby():
import pytz
import random
import time
tic = time.perf_counter()
ltz = len(pytz.all_timezones) - 1
length = 7 * 10 ** 6
pd.options.display.max_columns = None
pd.options.display.max_colwidth = None
# generate the dummy data
df = pd.DataFrame({'DateTimeInUTC': pd.date_range('01.01.2000', periods=length, freq='T', tz='UTC'),
                   'TimeZone': [pytz.all_timezones[random.randint(0, ltz)] for tz in range(length)]})
toc = time.perf_counter()
print(f"Generated the df in {toc - tic:0.4f} seconds\n")
tic = time.perf_counter()
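# group_keys=False keeps the original row index on the result so it aligns with df on assignment
# (on pandas 2.x, apply otherwise prepends the group key to the index)
df['Converted'] = df.groupby('TimeZone', group_keys=False)['DateTimeInUTC'].apply(lambda x: x.dt.tz_convert(x.name).dt.tz_localize(None))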
print(df)
toc = time.perf_counter()
print(f"\nConverted the df in {toc - tic:0.4f} seconds")
Output:
Generated the df in 6.3333 seconds
DateTimeInUTC TimeZone Converted
0 2000-01-01 00:00:00+00:00 Asia/Qyzylorda 2000-01-01 05:00:00
1 2000-01-01 00:01:00+00:00 America/Moncton 1999-12-31 20:01:00
2 2000-01-01 00:02:00+00:00 America/Cordoba 1999-12-31 21:02:00
3 2000-01-01 00:03:00+00:00 Africa/Dakar 2000-01-01 00:03:00
4 2000-01-01 00:04:00+00:00 Pacific/Wallis 2000-01-01 12:04:00
... ... ... ...
6999995 2013-04-23 02:35:00+00:00 America/Guyana 2013-04-22 22:35:00
6999996 2013-04-23 02:36:00+00:00 America/St_Vincent 2013-04-22 22:36:00
6999997 2013-04-23 02:37:00+00:00 MST7MDT 2013-04-22 20:37:00
6999998 2013-04-23 02:38:00+00:00 Antarctica/McMurdo 2013-04-23 14:38:00
6999999 2013-04-23 02:39:00+00:00 America/Atikokan 2013-04-22 21:39:00
[7000000 rows x 3 columns]
Converted the df in 4.1579 seconds

How to convert UTC datetime to local datetime (Australia/Melbourne) in Python

I have a data frame with "Date" column in UTC format.
Date
2021-10-14T06:57:00.000+0000
2021-09-05T08:30:00.000+0000
2021-10-20T04:34:00.000+0000
2021-10-19T21:49:00.000+0000
2021-09-30T20:53:00.000+0000
I tried this, but it didn't work:
df['Date'] = df['Date'].substr(replace(to_iso8601(from_iso8601_timestamp(Date) AT TIME ZONE 'Australia/Melbourne'), 'T', ' '), 1, 16) Date_local
I am unable to convert the UTC time to the local time zone (Australia/Melbourne).
Any help would be highly appreciated.
Use pandas functionality: pd.to_datetime, then tz_convert.
# input strings to datetime data type:
df['Date'] = pd.to_datetime(df['Date'])
# UTC is already set (aware datetime); just convert:
df['Date'] = df['Date'].dt.tz_convert('Australia/Melbourne')
df['Date']
Out[2]:
0 2021-10-14 17:57:00+11:00
1 2021-09-05 18:30:00+10:00
2 2021-10-20 15:34:00+11:00
3 2021-10-20 08:49:00+11:00
4 2021-10-01 06:53:00+10:00
Name: Date, dtype: datetime64[ns, Australia/Melbourne]
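If the goal is a local value without the +11:00 / +10:00 suffix, the offset can be dropped after the conversion. A small sketch under that assumption, using two of the sample strings from the question; the column names Date_local and Date_local_str are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2021-10-14T06:57:00.000+0000", "2021-09-05T08:30:00.000+0000"],
})

# Parse as UTC, then convert to Melbourne local time (DST-aware).
melbourne = pd.to_datetime(df["Date"], utc=True).dt.tz_convert("Australia/Melbourne")

# Naive datetimes: same wall-clock time, no offset attached.
df["Date_local"] = melbourne.dt.tz_localize(None)

# Or a plain 'YYYY-MM-DD HH:MM' string, if text output is what is needed.
df["Date_local_str"] = melbourne.dt.strftime("%Y-%m-%d %H:%M")

print(df)
```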

Adding a datetime column in pandas dataframe from minute values

I have a data frame with a time column holding minutes from 0-1339, meaning the 1440 minutes of a day. I want to add a datetime column representing the day 2021-3-21, including hh and mm, like this: 1980-03-01 11:00. I tried the following code:
from datetime import datetime, timedelta
date = datetime.date(2021, 3, 21)
days = date - datetime.date(1900, 1, 1)
df['datetime'] = pd.to_datetime(df['time'],format='%H:%M:%S:%f') + pd.to_timedelta(days, unit='d')
But the error seems like descriptor 'date' requires a 'datetime.datetime' object but received a 'int'
Is there any other way to solve this problem or fixing this code? Please help to figure this out.
>>df
time
0
1
2
3
..
1339
I want to convert these minutes to the format 1980-03-01 11:00, where I use the date 2021-3-21 and convert the minutes to the hh:mm part. The dataframe will look like this:
>df
datetime time
2021-3-21 00:00 0
2021-3-21 00:01 1
2021-3-21 00:02 2
...
How can I format my data in this way?
Let's try pd.to_timedelta instead, to turn the minutes in time into a duration, and then add that to a Timestamp:
df['datetime'] = (
    pd.Timestamp('2021-3-21') + pd.to_timedelta(df['time'], unit='m')
)
df.head():
time datetime
0 0 2021-03-21 00:00:00
1 1 2021-03-21 00:01:00
2 2 2021-03-21 00:02:00
3 3 2021-03-21 00:03:00
4 4 2021-03-21 00:04:00
Complete Working Example with Sample Data:
import numpy as np
import pandas as pd
df = pd.DataFrame({'time': np.arange(0, 1440)})
df['datetime'] = (
    pd.Timestamp('2021-3-21') + pd.to_timedelta(df['time'], unit='m')
)
print(df)

Pandas dataframe not including time of day when converting from UNIX

I am retrieving data from an API which is timestamped in UNIX millisecond time and am trying to save this data to a CSV file. The data is in daily intervals but represented in UNIX millisecond time as mentioned.
I am using pandas functions to convert from milliseconds to datetime, but it is still not saving the data with the time-of-day part. My code is as follows:
ticker = 'tBTCUSD'
r = requests.get(url, params = params)
data = pd.DataFrame(r.json())
data.set_index([0], inplace = True)
data.index = pd.to_datetime(data.index, unit = 'ms' )
data.to_csv('bitfinex_{}_usd_{}.csv'.format(ticker[1:-3].lower(), '1D'), mode='a', header=False)
It saves the data as 2020-08-21 instead of 2020-08-21 00:00:00. When I poll the API on say, an hourly or 15-minutely basis, that still includes the time but on daily intervals it doesn't. I was wondering if there is a step that I am missing to convert the time accordingly from UNIX millisecond to a %Y-%m-%d %H:%M:%S %Z format?
You can always explicitly specify the format:
data.index = pd.to_datetime(data.index, unit='ms').strftime('%Y-%m-%d %H:%M:%S UTC')
print(data)
1 2 3 4 5
0
2020-09-10 00:00:00 UTC 10241.000000 10333.862868 10516.00000 10233.087967 3427.178984
2020-09-09 00:00:00 UTC 10150.000000 10240.000000 10359.00000 10010.000000 2406.147398
2020-09-08 00:00:00 UTC 10400.000000 10148.000000 10464.00000 9882.400000 6761.138356
2020-09-07 00:00:00 UTC 10275.967600 10397.000000 10430.00000 9913.800000 6301.951492
2020-09-06 00:00:00 UTC 10197.000000 10276.000000 10365.07422 10031.000000 2755.663001
... ... ... ... ... ...
2020-05-18 00:00:00 UTC 9668.200000 9714.825163 9944.00000 9450.000000 9201.536549
2020-05-17 00:00:00 UTC 9386.000000 9668.200000 9883.50000 9329.700000 9663.262087
2020-05-16 00:00:00 UTC 9307.600000 9387.952090 9580.00000 9222.000000 4157.691762
2020-05-15 00:00:00 UTC 9791.000000 9311.200000 9848.90000 9130.200000 11340.269781
2020-05-14 00:00:00 UTC 9311.967387 9790.954158 9938.70000 9266.200000 12867.687617
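Another option, if the index should stay a DatetimeIndex in memory, is to pass date_format to to_csv so that only the written text changes. A minimal sketch; the close column and its prices are placeholder data, not values from the API, and the CSV is returned as a string here instead of being written to a file:

```python
import pandas as pd

# Two daily timestamps in UNIX milliseconds (2020-08-21 and 2020-08-22, midnight UTC).
data = pd.DataFrame(
    {"close": [11500.0, 11600.0]},  # placeholder values
    index=pd.to_datetime([1597968000000, 1598054400000], unit="ms"),
)

# Without date_format, pandas chooses how to render the datetime index.
csv_default = data.to_csv(header=False)

# With date_format, the time of day is always written out.
csv_with_time = data.to_csv(header=False, date_format="%Y-%m-%d %H:%M:%S")

print(csv_default)
print(csv_with_time)
```

Passing a file path as the first argument to to_csv, as in the question, writes the same text to disk.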

Python - Local Time

I have a dataframe that has entries like this, where the times are in UTC:
start_date_time timezone
1 2017-01-01 14:00:00 America/Los_Angeles
2 2017-01-01 14:00:00 America/Denver
3 2017-01-01 14:00:00 America/Phoenix
4 2017-01-01 14:30:00 America/Los_Angeles
5 2017-01-01 14:30:00 America/Los_Angeles
I need to be able to group by date (local date, not UTC date) and I need to be able to create indicators for whether the event happened between certain times (local times, not UTC times).
I have successfully done the above in R by:
Creating a time variable in each of the timezones
Converting those to strings
Pulling each of the string date/time variables into one column, which one I pull depends on the appropriate timezone
Then, splitting that column to get a string date column and a string time column
I can then convert everything back to datetime objects for comparisons. e.g. now I can say if something happened between 2 and 3pm and it will correctly identify everything that happened between 2 and 3pm locally.
I have tried a bunch in python and have the dates as
2017-01-02 04:30:00-08:00
but I can't figure out how to go from there to
2017-01-01 20:30:00
Thanks!
Your example is incorrect. Your timezone is eight hours behind UTC, which means you need to add eight hours to 4:30AM which is 12:30PM UTC time.
The datetime object function astimezone(...) will do the conversion for you. For ease of use, I recommend pytz.
However in pure python:
import datetime as dt
local_tz = dt.timezone(dt.timedelta(hours=-8))
utc = dt.timezone.utc
d = dt.datetime(2017, 1, 2, 4, 30, 0, 0, local_tz)
print(d, d.astimezone(utc))
Will print:
2017-01-02 04:30:00-08:00 2017-01-02 12:30:00+00:00
Here's an example using pytz to lookup time zones:
import datetime as dt
import pytz
dates = [("2017-01-01 14:00:00", "America/Los_Angeles"),
("2017-01-01 14:00:00", "America/Denver"),
("2017-01-01 14:00:00", "America/Phoenix"),
("2017-01-01 14:30:00", "America/Los_Angeles"),
]
for d, tz_str in dates:
    start = dt.datetime.strptime(d, "%Y-%m-%d %H:%M:%S")
    start = start.replace(tzinfo=pytz.utc)
    local_tz = pytz.timezone(tz_str) # convert to desired timezone
    print(start, local_tz.zone, "\t", start.astimezone(local_tz))
This produces:
2017-01-01 14:00:00+00:00 America/Los_Angeles 2017-01-01 06:00:00-08:00
2017-01-01 14:00:00+00:00 America/Denver 2017-01-01 07:00:00-07:00
2017-01-01 14:00:00+00:00 America/Phoenix 2017-01-01 07:00:00-07:00
2017-01-01 14:30:00+00:00 America/Los_Angeles 2017-01-01 06:30:00-08:00
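To get from there to a value without the offset (the 2017-01-01 20:30:00 style asked for), the tzinfo can be dropped after the conversion. A sketch using the standard-library zoneinfo module (Python 3.9+) in place of pytz; the helper name utc_to_naive_local is hypothetical, and the 6-to-7 local hour window is just an example indicator:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library since Python 3.9


def utc_to_naive_local(utc_string: str, tz_name: str) -> datetime:
    """Parse a naive UTC string and return the naive local wall-clock time."""
    utc_dt = datetime.strptime(utc_string, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    # astimezone applies the zone's rules; replace(tzinfo=None) drops the offset.
    return utc_dt.astimezone(ZoneInfo(tz_name)).replace(tzinfo=None)


rows = [
    ("2017-01-01 14:00:00", "America/Los_Angeles"),
    ("2017-01-01 14:00:00", "America/Denver"),
    ("2017-01-01 14:00:00", "America/Phoenix"),
    ("2017-01-01 14:30:00", "America/Los_Angeles"),
]

local_times = [utc_to_naive_local(d, tz) for d, tz in rows]

for (d, tz), local in zip(rows, local_times):
    # local.date() is the local calendar date to group by;
    # the last column flags events between 06:00 and 07:00 local time.
    print(d, tz, local, local.date(), 6 <= local.hour < 7)
```

Because the results are naive datetimes in local time, grouping by local.date() and comparing local.hour both work on local values, not UTC ones.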
