I've got some date and time data as a string that is formatted like this, in UTC:
,utc_date_and_time, api_calls
0,2022-10-20 00:00:00,12
1,2022-10-20 00:05:00,14
2,2022-10-20 00:10:00,17
Is there a way to create another column that always represents the same moment, but in London (Europe/London) local time?
,utc_date_and_time, api_calls, london_date_and_time
0,2022-10-20 00:00:00,12,2022-10-20 01:00:00
1,2022-10-20 00:05:00,14,2022-10-20 01:05:00
2,2022-10-20 00:10:00,17,2022-10-20 01:10:00
I want to write code that displays the London time for any date in the year, but I'm worried that my code will break when the UK clocks change between GMT and BST.
With pandas, you'd convert to datetime, specify UTC, and then call tz_convert:
df
Out[9]:
utc_date_and_time api_calls
0 2022-10-20 00:00:00 12
1 2022-10-20 00:05:00 14
2 2022-10-20 00:10:00 17
df["utc_date_and_time"] = pd.to_datetime(df["utc_date_and_time"], utc=True)
df["london_date_and_time"] = df["utc_date_and_time"].dt.tz_convert("Europe/London")
df
Out[12]:
utc_date_and_time api_calls london_date_and_time
0 2022-10-20 00:00:00+00:00 12 2022-10-20 01:00:00+01:00
1 2022-10-20 00:05:00+00:00 14 2022-10-20 01:05:00+01:00
2 2022-10-20 00:10:00+00:00 17 2022-10-20 01:10:00+01:00
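On the worry about the clocks changing: tz_convert looks up the offset in the tz database for each individual timestamp, so it switches from BST (+01:00) to GMT (+00:00) by itself. A minimal sketch, using hypothetical UTC timestamps around the UK change at 2022-10-30 01:00 UTC rather than the question's data:

```python
import pandas as pd

# Hypothetical UTC timestamps spanning the UK clock change on 2022-10-30 (01:00 UTC)
utc_times = pd.Series(pd.date_range("2022-10-30 00:00", periods=4, freq="30min", tz="UTC"))

# Same call as above; the London offset is looked up per timestamp
london = utc_times.dt.tz_convert("Europe/London")

print(pd.DataFrame({"utc": utc_times, "london": london}))
```

The London wall-clock times 01:00 and 01:30 appear twice, first at +01:00 and then at +00:00, but each row still maps to a distinct UTC instant, so nothing breaks at the transition.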
In vanilla Python >= 3.9, you'd let zoneinfo handle the conversion:
from datetime import datetime
from zoneinfo import ZoneInfo
t = "2022-10-20 00:00:00"
# to datetime, set UTC
dt = datetime.fromisoformat(t).replace(tzinfo=ZoneInfo("UTC"))
# to london time
dt_london = dt.astimezone(ZoneInfo("Europe/London"))
print(dt_london)
2022-10-20 01:00:00+01:00
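Applied to the whole table from the question, the same zoneinfo conversion reproduces the desired london_date_and_time column. A sketch using only the standard library, assuming the CSV text is available as a string (in practice you would open the file) and that a tz database is installed (on Windows this may require the tzdata package):

```python
import csv
import io
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

csv_text = """\
,utc_date_and_time, api_calls
0,2022-10-20 00:00:00,12
1,2022-10-20 00:05:00,14
2,2022-10-20 00:10:00,17
"""

london = ZoneInfo("Europe/London")
# skipinitialspace strips the space in " api_calls"
reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)

header = next(reader)
rows = [header + ["london_date_and_time"]]
for row in reader:
    # The input column is UTC, so attach UTC explicitly before converting
    utc_dt = datetime.fromisoformat(row[1]).replace(tzinfo=timezone.utc)
    local = utc_dt.astimezone(london)
    rows.append(row + [local.strftime("%Y-%m-%d %H:%M:%S")])

for row in rows:
    print(",".join(row))
```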
You should use the UTC time zone from the datetime module:
from datetime import datetime, timezone
datetime.now(timezone.utc).isoformat()
Outputs:
2022-10-25T15:27:08.874057+00:00
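Once you have a timezone-aware UTC datetime like this, converting it to London time is a single astimezone call. A minimal sketch, assuming Python 3.9+ for zoneinfo:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

now_utc = datetime.now(timezone.utc)                        # aware, offset +00:00
now_london = now_utc.astimezone(ZoneInfo("Europe/London"))  # +00:00 (GMT) or +01:00 (BST)

print(now_utc.isoformat())
print(now_london.isoformat())
```

Both objects represent the same instant, so now_utc == now_london is True; only the wall-clock fields and the offset differ.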
I have a dataframe with columns:
time: time in UTC format
timezone: the corresponding timezone.
time timezone
0 2022-12-28T20:16:31.373Z Europe/Athens
1 2022-07-28T20:16:31.373Z Europe/Athens
2 2022-11-01T21:35:35.865Z Europe/Dublin
3 2022-08-03T19:44:07.611Z America/Los_Angeles
4 2022-08-02T12:44:44.360Z Europe/Minsk
I want to:
Convert the UTC time to local time (using the timezone column)
Remove the timezone information and keep just the datetime
It seems to me that this solution works, but I want to make sure that I am not missing something (e.g. that it doesn't deal correctly with daylight saving time).
import pandas as pd
# example dataframe
df = pd.DataFrame({
'time' : ['2022-12-28T20:16:31.373Z', '2022-07-28T20:16:31.373Z', '2022-11-01T21:35:35.865Z', '2022-08-03T19:44:07.611Z', '2022-08-02T12:44:44.360Z'],
'timezone': ['Europe/Athens', 'Europe/Athens', 'Europe/Dublin', 'America/Los_Angeles', 'Europe/Minsk']
})
# function: convert one UTC timestamp string to naive local wall-clock time
def get_local_time(timestamp: str, timezone: str) -> pd.Timestamp:
    timestamp = pd.to_datetime(timestamp).tz_convert(timezone).replace(tzinfo=None)
    return timestamp
df['local_time'] = df.apply(lambda row: get_local_time(row['time'], row['timezone']), axis=1).dt.round(freq='S')
print(df)
---
OUT:
time timezone local_time
0 2022-12-28T20:16:31.373Z Europe/Athens 2022-12-28 22:16:31
1 2022-07-28T20:16:31.373Z Europe/Athens 2022-07-28 23:16:31
2 2022-11-01T21:35:35.865Z Europe/Dublin 2022-11-01 21:35:36
3 2022-08-03T19:44:07.611Z America/Los_Angeles 2022-08-03 12:44:08
4 2022-08-02T12:44:44.360Z Europe/Minsk 2022-08-02 15:44:44
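The row-wise apply handles daylight saving time correctly, because tz_convert with a named zone such as Europe/Athens uses that zone's full rule history (note the +2 h offset in December versus +3 h in July in the output above). For larger dataframes, a vectorized alternative is to parse the column once and convert each timezone group in one call. A sketch, rebuilding the same example df so the block runs on its own:

```python
import pandas as pd

df = pd.DataFrame({
    'time': ['2022-12-28T20:16:31.373Z', '2022-07-28T20:16:31.373Z', '2022-11-01T21:35:35.865Z',
             '2022-08-03T19:44:07.611Z', '2022-08-02T12:44:44.360Z'],
    'timezone': ['Europe/Athens', 'Europe/Athens', 'Europe/Dublin', 'America/Los_Angeles', 'Europe/Minsk'],
})

utc = pd.to_datetime(df['time'], utc=True)  # parse all rows once, as UTC

# Convert each timezone group, then drop the tz info to keep the local wall-clock time
pieces = [
    utc.loc[idx].dt.tz_convert(tz).dt.tz_localize(None)
    for tz, idx in df.groupby('timezone').groups.items()
]
df['local_time'] = pd.concat(pieces)  # realigned to df by index label

print(df)
```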
I get the following error for my timestamp field when converting from UTC to CET due to the transition to daylight savings last Saturday/Sunday:
AmbiguousTimeError: Cannot infer dst time from 2020-07-31 11:17:18+00:00, try using the 'ambiguous' argument
# converting timestamp fields to CET (Europe/Berlin)
df['timestamp_berlin_time'] = df['timestamp'].dt.tz_localize('Europe/Berlin')
I tried the following snippet:
df['timestamp_berlin_time'] = df['timestamp'].dt.tz_localize('CET',ambiguous='infer')
but this gives me then this error:
AmbiguousTimeError: 2020-07-31 11:17:18+00:00
Data sample:
0 2020-07-31 11:17:18+00:00
1 2020-07-31 11:17:18+00:00
2 2020-08-31 16:26:42+00:00
3 2020-10-20 07:28:46+00:00
4 2020-10-01 22:11:33+00:00
Name: timestamp, dtype: datetime64[ns, UTC]
If your input is UTC but the time zone isn't set yet (naive timestamps), you can localize to UTC first, e.g.:
df['timestamp'] = df['timestamp'].dt.tz_localize('UTC')
If your input is already UTC-aware (as in your data sample, dtype datetime64[ns, UTC]), you can simply use tz_convert, e.g.:
s = pd.Series(pd.to_datetime(['2020-10-25 00:40:03.925000',
'2020-10-25 01:40:03.925000',
'2020-10-25 02:40:03.925000'], utc=True))
s.dt.tz_convert('Europe/Berlin')
# 0 2020-10-25 02:40:03.925000+02:00
# 1 2020-10-25 02:40:03.925000+01:00
# 2 2020-10-25 03:40:03.925000+01:00
# dtype: datetime64[ns, Europe/Berlin]
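Applied to timestamps like those in the question's data sample, both steps look like this. A sketch that starts from naive strings (hypothetical, mirroring the sample values) so that the tz_localize('UTC') step is visible; with the question's already UTC-aware column, only the tz_convert line is needed:

```python
import pandas as pd

# Naive strings mirroring the question's sample (no tz information yet)
ts = pd.Series(pd.to_datetime(['2020-07-31 11:17:18', '2020-08-31 16:26:42',
                               '2020-10-20 07:28:46', '2020-10-01 22:11:33']))

ts_utc = ts.dt.tz_localize('UTC')                  # declare: these wall times are UTC
ts_berlin = ts_utc.dt.tz_convert('Europe/Berlin')  # each UTC instant maps to exactly one Berlin time

print(ts_berlin)
```

Converting from UTC can never be ambiguous; ambiguity only arises when localizing naive local times.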
If your input timestamps represent local time (here: Europe/Berlin time zone), you can try to infer the DST transition based on order:
s = pd.Series(pd.to_datetime(['2020-10-25 02:40:03.925000',
'2020-10-25 02:40:03.925000',
'2020-10-25 03:40:03.925000']))
s.dt.tz_localize('Europe/Berlin', ambiguous='infer')
# 0 2020-10-25 02:40:03.925000+02:00
# 1 2020-10-25 02:40:03.925000+01:00
# 2 2020-10-25 03:40:03.925000+01:00
# dtype: datetime64[ns, Europe/Berlin]
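When the row order can't be relied upon, tz_localize also accepts an explicit boolean array for ambiguous (True meaning DST), and a nonexistent argument for wall times that are skipped when the clocks go forward. A minimal sketch with hypothetical timestamps:

```python
import numpy as np
import pandas as pd

# Fall-back day: 02:40 occurs twice in Europe/Berlin on 2020-10-25
s = pd.Series(pd.to_datetime(['2020-10-25 02:40:03.925000',
                              '2020-10-25 02:40:03.925000',
                              '2020-10-25 03:40:03.925000']))
# True = treat as DST (CEST, +02:00), False = standard time (CET, +01:00)
amb = s.dt.tz_localize('Europe/Berlin', ambiguous=np.array([True, False, False]))

# Spring-forward day: 02:30 does not exist in Europe/Berlin on 2020-03-29
gap = pd.Series(pd.to_datetime(['2020-03-29 02:30:00']))
shifted = gap.dt.tz_localize('Europe/Berlin', nonexistent='shift_forward')

print(amb)
print(shifted)  # moved to the first valid time, 03:00+02:00
```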
Note: CET is not a time zone in a geographical sense. pytz can handle some of these names for historical reasons, but don't count on it. In any case, they might give you static UTC offsets, which is not what you want if you expect DST transitions to be included; a geographical name such as 'Europe/Berlin' is the safer choice.
The starting date format I currently have is 2019-09-04 16:00 UTC+3 and I'm trying to convert it into a datetime format of 2019-09-04 16:00:00+0300.
The format I thought would work was format='%Y-%m-%d %H:%M %Z%z', but when I run it I get the error message ValueError: Cannot parse both %Z and %z.
Does anyone know the correct format to use, or should I be trying a different method altogether? Thanks.
Edit
Sorry, I had a hard time putting into words what I am looking to do; hopefully I can clarify.
I'm looking to convert all the dates and times in a dataframe column to the datetime format.
This is the method I was trying to use, which gave me an error:
df['datepicker'] = pd.to_datetime(df['datepicker'], format='%Y-%m-%d %H:%M %Z%z')
And here is a sample of the data I currently have.
datepicker
2019-09-07 16:00 UTC+2
2019-09-04 18:30 UTC+4
2019-09-06 17:00 UTC±0
2019-09-10 16:00 UTC+1
2019-09-04 18:00 UTC+3
And this is what I'm looking to convert them into, a timestamp format.
datepicker
2019-09-07 16:00:00+0200
2019-09-04 18:30:00+0400
2019-09-06 17:00:00+0000
2019-09-10 16:00:00+0100
2019-09-04 18:00:00+0300
pandas.to_datetime should parse this happily if you tweak the strings slightly:
import pandas as pd
df = pd.DataFrame({"datepicker":[ "2019-09-07 16:00 UTC+2", "2019-09-04 18:30 UTC+4",
"2019-09-06 17:00 UTC±0", "2019-09-10 16:00 UTC+1",
"2019-09-04 18:00 UTC+3"]})
df['datetime'] = pd.to_datetime(df['datepicker'].str.replace('±', '+'))
# df['datetime']
# 0 2019-09-07 16:00:00-02:00
# 1 2019-09-04 18:30:00-04:00
# 2 2019-09-06 17:00:00+00:00
# 3 2019-09-10 16:00:00-01:00
# 4 2019-09-04 18:00:00-03:00
# Name: datetime, dtype: object
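Note that due to the mixed UTC offsets, the column's data type is 'object' (datetime objects). If you wish, you can also convert to UTC straight away, to get a column of dtype datetime64[ns, UTC]. Be careful with the sign, though: the 'UTC+N' notation is parsed POSIX-style, so 'UTC+2' comes out as -02:00 above, and the UTC values below are shifted the same way (16:00 UTC+2 becomes 18:00 UTC instead of 14:00 UTC). Rewriting 'UTC+2' as '+02:00' before parsing avoids this.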
df['UTC'] = pd.to_datetime(df['datepicker'].str.replace('±', '+'), utc=True)
# df['UTC']
# 0 2019-09-07 18:00:00+00:00
# 1 2019-09-04 22:30:00+00:00
# 2 2019-09-06 17:00:00+00:00
# 3 2019-09-10 17:00:00+00:00
# 4 2019-09-04 21:00:00+00:00
# Name: UTC, dtype: datetime64[ns, UTC]
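One way to avoid the sign inversion is to rewrite the 'UTC+N' suffix as an ISO 8601 offset before parsing and to pass an explicit format, so the offset is read literally. A sketch, assuming every value ends in ' UTC' followed by +, - or ± and a whole number of hours (offsets such as UTC+5:30 would need a wider pattern):

```python
import pandas as pd

df = pd.DataFrame({"datepicker": ["2019-09-07 16:00 UTC+2", "2019-09-04 18:30 UTC+4",
                                  "2019-09-06 17:00 UTC±0", "2019-09-10 16:00 UTC+1",
                                  "2019-09-04 18:00 UTC+3"]})

# "UTC±0" -> "UTC+0", then " UTC+2" -> "+02:00" (whole hours only, by assumption)
iso = (df["datepicker"]
       .str.replace("±", "+", regex=False)
       .str.replace(r" UTC([+-])(\d{1,2})$",
                    lambda m: f"{m.group(1)}{int(m.group(2)):02d}:00",
                    regex=True))

# With an explicit %z format the offset sign is taken literally
df["UTC"] = pd.to_datetime(iso, format="%Y-%m-%d %H:%M%z", utc=True)
print(df["UTC"])
```

Dropping utc=True would keep the individual offsets, at the cost of an object-dtype column as described above.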
When I define the time zone as below, it works as you expect:
from datetime import datetime, timedelta, timezone
utc_plus_3 = timezone(timedelta(hours=+3))
dt = datetime(2019, 1, 1, 12, 0, 0, tzinfo=utc_plus_3)
timestampStr = dt.strftime("%Y-%m-%d %H:%M %Z%z")
print(timestampStr)
With the output of (%Z prints the zone name UTC+03:00, %z the offset +0300):
2019-01-01 12:00 UTC+03:00+0300
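The snippet above formats a datetime rather than parsing the question's strings. For parsing with the standard library, using only %z, after rewriting 'UTC+3' as '+0300', avoids the %Z/%z conflict. A sketch; parse_utc_label is a hypothetical helper name, and it assumes whole-hour offsets:

```python
import re
from datetime import datetime


def parse_utc_label(text: str) -> datetime:
    """Parse strings like '2019-09-04 16:00 UTC+3' or '... UTC±0' (hypothetical helper)."""
    m = re.fullmatch(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}) UTC([+\-±])(\d{1,2})", text)
    if m is None:
        raise ValueError(f"unrecognised format: {text!r}")
    stamp, sign, hours = m.groups()
    sign = "-" if sign == "-" else "+"  # treat '±0' as '+0'
    # '+3' becomes '+0300', which %z accepts
    return datetime.strptime(f"{stamp} {sign}{int(hours):02d}00", "%Y-%m-%d %H:%M %z")


dt = parse_utc_label("2019-09-04 16:00 UTC+3")
print(dt.strftime("%Y-%m-%d %H:%M:%S%z"))  # 2019-09-04 16:00:00+0300
```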
Using Pandas 1.0.0, how can I change the time of a datetime dataframe column to midnight in one line of code?
e.g.:
from
START_DATETIME
2017-02-13 09:13:33
2017-03-11 23:11:35
2017-03-12 00:44:32
...
to
START_DATETIME
2017-02-13 00:00:00
2017-03-11 00:00:00
2017-03-12 00:00:00
...
My attempt:
df['START_DATETIME'] = df['START_DATETIME'].apply(lambda x: pd.Timestamp(x).replace(hour=0, minute=0, second=0))
but this produces
START_DATETIME
2017-02-13
2017-03-11
2017-03-12
...
Your method already converted the datetime values correctly to midnight, i.e. their time is 00:00:00. Pandas just doesn't display the time part when every value is exactly midnight, because showing the same 00:00:00 everywhere would be redundant. After assigning the result back to START_DATETIME, printing a single cell shows:
print(df.loc[0, 'START_DATETIME'])
Output:
2017-02-13 00:00:00
Besides, to set the time to 00:00:00, it is simpler (and vectorized) to use dt.normalize or dt.floor:
df['START_DATETIME'] = pd.to_datetime(df['START_DATETIME']).dt.normalize()
or
df['START_DATETIME'] = pd.to_datetime(df['START_DATETIME']).dt.floor('D')
If you want to force pandas to show 00:00:00 in the series output, you need to convert START_DATETIME to str after converting:
pd.to_datetime(df['START_DATETIME']).dt.floor('D').dt.strftime('%Y-%m-%d %H:%M:%S')
Out[513]:
0 2017-02-13 00:00:00
1 2017-03-11 00:00:00
2 2017-03-12 00:00:00
Name: START_DATETIME, dtype: object
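A quick check of the normalize approach on the question's sample data: unlike the strftime version, the column keeps its datetime64 dtype, and a single cell shows the 00:00:00 time even though the column display omits it.

```python
import pandas as pd

df = pd.DataFrame({"START_DATETIME": ["2017-02-13 09:13:33", "2017-03-11 23:11:35", "2017-03-12 00:44:32"]})
df["START_DATETIME"] = pd.to_datetime(df["START_DATETIME"]).dt.normalize()

print(df)                           # column display shows dates only
print(df.loc[0, "START_DATETIME"])  # 2017-02-13 00:00:00
```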
You can do:
import pandas as pd
df=pd.DataFrame({"START_DATETIME":
["2017-02-13 09:13:33","2017-03-11 23:11:35","2017-03-12 00:44:32"]})
# you should convert it to datetime first,
# in case it isn't already:
df["START_DATETIME"]=pd.to_datetime(df["START_DATETIME"])
df["START_DATETIME_DT"]=df["START_DATETIME"].dt.strftime("%Y-%m-%d 00:00:00")
Outputs:
START_DATETIME START_DATETIME_DT
0 2017-02-13 09:13:33 2017-02-13 00:00:00
1 2017-03-11 23:11:35 2017-03-11 00:00:00
2 2017-03-12 00:44:32 2017-03-12 00:00:00
I have a dataframe that has entries like this, where the times are in UTC:
start_date_time timezone
1 2017-01-01 14:00:00 America/Los_Angeles
2 2017-01-01 14:00:00 America/Denver
3 2017-01-01 14:00:00 America/Phoenix
4 2017-01-01 14:30:00 America/Los_Angeles
5 2017-01-01 14:30:00 America/Los_Angeles
I need to be able to group by date (local date, not UTC date) and I need to be able to create indicators for whether the event happened between certain times (local times, not UTC times).
I have successfully done the above in R by:
Creating a time variable in each of the timezones
Converting those to strings
Pulling each of the string date/time variables into one column, which one I pull depends on the appropriate timezone
Then, splitting that column to get a string date column and a string time column
I can then convert everything back to datetime objects for comparisons. For example, I can now check whether something happened between 2 and 3pm, and it will correctly identify everything that happened between 2 and 3pm local time.
I have tried a bunch of things in Python and have the dates as
2017-01-02 04:30:00-08:00
but I can't figure out how to go from there to
2017-01-01 20:30:00
Thanks!
Your example is incorrect. Your time zone is eight hours behind UTC, which means you need to add eight hours to 04:30, giving 12:30 UTC.
The datetime method astimezone(...) will do the conversion for you. For ease of use, I recommend pytz.
However, in pure Python:
import datetime as dt
local_tz = dt.timezone(dt.timedelta(hours=-8))
utc = dt.timezone.utc
d = dt.datetime(2017, 1, 2, 4, 30, 0, 0, local_tz)
print(d, d.astimezone(utc))
Will print:
2017-01-02 04:30:00-08:00 2017-01-02 12:30:00+00:00
Here's an example using pytz to lookup time zones:
import datetime as dt
import pytz
dates = [("2017-01-01 14:00:00", "America/Los_Angeles"),
("2017-01-01 14:00:00", "America/Denver"),
("2017-01-01 14:00:00", "America/Phoenix"),
("2017-01-01 14:30:00", "America/Los_Angeles"),
]
for d, tz_str in dates:
    start = dt.datetime.strptime(d, "%Y-%m-%d %H:%M:%S")
    start = start.replace(tzinfo=pytz.utc)  # the input strings are UTC
    local_tz = pytz.timezone(tz_str)  # look up the desired time zone
    print(start, local_tz.zone, "\t", start.astimezone(local_tz))  # convert and print
This produces:
2017-01-01 14:00:00+00:00 America/Los_Angeles 2017-01-01 06:00:00-08:00
2017-01-01 14:00:00+00:00 America/Denver 2017-01-01 07:00:00-07:00
2017-01-01 14:00:00+00:00 America/Phoenix 2017-01-01 07:00:00-07:00
2017-01-01 14:30:00+00:00 America/Los_Angeles 2017-01-01 06:30:00-08:00
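For the original goal (grouping by local date and flagging events in a local time window), the same per-timezone conversion can be done column-wise in pandas. A sketch using the question's sample rows; the 06:00 to 07:00 window and the column names local_time, local_date and in_window are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    "start_date_time": ["2017-01-01 14:00:00", "2017-01-01 14:00:00", "2017-01-01 14:00:00",
                        "2017-01-01 14:30:00", "2017-01-01 14:30:00"],
    "timezone": ["America/Los_Angeles", "America/Denver", "America/Phoenix",
                 "America/Los_Angeles", "America/Los_Angeles"],
}, index=[1, 2, 3, 4, 5])

utc = pd.to_datetime(df["start_date_time"], utc=True)  # the input times are UTC

# Convert each timezone group, then drop tz info to keep the local wall-clock time
pieces = [
    utc.loc[idx].dt.tz_convert(tz).dt.tz_localize(None)
    for tz, idx in df.groupby("timezone").groups.items()
]
df["local_time"] = pd.concat(pieces)  # realigned to df by index label

df["local_date"] = df["local_time"].dt.date
hour = df["local_time"].dt.hour
df["in_window"] = (hour >= 6) & (hour < 7)  # hypothetical window: 06:00 <= local time < 07:00

print(df)
print(df.groupby("local_date").size())
```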