Why does converting timezones (and to unix timestamps) behave inconsistently in Pandas?

I'm parsing and manipulating some dates and times which, for reasons of interoperability with other systems, also need to be stored as UNIX (epoch) timestamps. In doing so, I'm seeing some weird behavior from pandas' Timestamp.tz_convert(), and then in its Timestamp.strftime() behavior in casting to epoch time, that makes me doubt my understanding of what should be going on.
The times I'm working with are in the US/Eastern timezone, but of course, epoch time is UTC, so my approach had been to cast to UTC since most conversions to/from UNIX timestamps assume that a tz-naive DateTime is in UTC. Let's leave aside whether doing that conversion is absolutely necessary to get valid timestamps; here's what I'm seeing that's problematic:
1. Using Timestamp.tz_convert() to change the timezone representation of a timestamp (i.e., a universal point in time) also changes the UNIX timestamp when you convert using Timestamp.strftime().
2. The differences in those timestamps don't even correspond to the proper hour differences between US-Eastern and GMT.
Here's some basic interactive-mode python to illustrate:
>>> import pytz
>>> from pytz import timezone
>>> import pandas as pd
>>> dtest = pd.to_datetime("Sunday, July 28, 2018 10:00 AM", infer_datetime_format=True).replace(tzinfo=timezone('America/New_York')) # okay, this should uniquely represent a point in time
>>> dtest
Timestamp('2018-07-28 10:00:00-0400', tz='America/New_York') # yup, that's the time - 10AM at GMT-0400.
>>> dtest2 = dtest.tz_convert('UTC') # convert to UTC
>>> dtest2
Timestamp('2018-07-28 14:00:00+0000', tz='UTC') # yup, same point in time, just different time zone now
>>> dtest.strftime('%s') # let's convert to unix time - this looks right
'1532786400'
>>> dtest2.strftime('%s') # should be the same, but it's not. WTF?
'1532804400'
The timestamps look like they are describing things equivalently: one is 10 AM at GMT-0400, the other is 2 PM at GMT+0000, a difference of 4 hours of clock time, as expected. They're both, of course, timezone-aware. But then converting them to UNIX timestamps yields
(A) different numbers, and even worse,
(B) numbers that differ by 5 hours (18000 seconds = 5 * 60 * 60) rather than 4, so I can't even assume that strftime() is merely ignoring timezone.
I'm using https://www.epochconverter.com/ to validate any timestamps as I sanity-check this, so that's a possible point of being misled. But according to that site,
1532786400 = 2018-07-28T10:00 -0400, and
1532804400 (that last result) = 2018-07-28T15:00 -0400, or 7pm GMT, a difference of 5 hours.
There are lots of questions about creating pandas Timestamps FROM a UNIX timestamp, but very little on converting TO epoch time. I can think of 2 possible explanations:
(1) tz_convert() is pulling some environment variable on my system that says I'm GMT -0500 and using that in the conversion process, in spite of that being irrelevant to converting between timezone-aware timestamps, and in so doing is actually changing the underlying point in time being represented. Or:
(2) Timestamp.strftime() is bugged and either ignoring the timezone parameter of a tz-aware timestamp or doing something truly bizarre when asked for a '%s' formatting parameter.
All advice greatly appreciated.
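One hedged observation on the symptoms above: '%s' is not among the strftime directives that Python documents, so it falls through to the platform's C library, which works from the wall-clock fields and the machine's local timezone rather than from the tzinfo. That would explain why relabelling the same instant in UTC changes the result. Below is a minimal sketch of a conversion that does respect the offset. It uses only the standard library and a fixed -04:00 offset as a stand-in for US/Eastern in July (an assumption, made to keep the example self-contained); pandas' Timestamp.timestamp() should behave the same way.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed -04:00 offset stands in for US/Eastern during July 2018.
EDT = timezone(timedelta(hours=-4))

local = datetime(2018, 7, 28, 10, 0, tzinfo=EDT)   # 10:00 at UTC-4
as_utc = local.astimezone(timezone.utc)            # same instant, shown as 14:00 UTC

# timestamp() accounts for the offset, so both views give the same epoch value.
epoch_local = int(local.timestamp())
epoch_utc = int(as_utc.timestamp())

print(epoch_local, epoch_utc)  # 1532786400 1532786400
```

Both values match the first number in the question, which is the one epochconverter.com confirmed as 10:00 at -0400.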

Related

If a timestamp is anchored in UTC, why isn't Python's `fromtimestamp` timezone-aware?

To my knowledge:
Python's datetime can be "naive" (if no timezone info is available) or "timezone-aware". In contrast, a timestamp is well-defined to be anchored in UTC, i.e. a timestamp of 0 corresponds to 1970-01-01 00:00:00+00:00 (regardless of your location).
Question: Why does datetime.fromtimestamp() return a naive datetime object though it has a well-defined input?
MWE
from datetime import datetime, timezone
timestamp = 0
# output: "1970-01-01 00:00:00+00:00", i.e. providing the timezone information,
# the resulting datetime is timezone-aware and accurate
print(datetime.fromtimestamp(timestamp, tz=timezone.utc))
# output: "1970-01-01 01:00:00" (for me running it in CET+0100 timezone), i.e.
# the interpretation is aware of my local time shift, but the resulting datetime
# is naive though it could be timezone-aware and thus not well-defined anymore
#
# I would have wished for/expected: "1970-01-01 01:00:00+01:00"
print(datetime.fromtimestamp(timestamp))
Why do I care?
The point is that we lose information in a dangerous way: we switch from a well-defined object to one that is only well-defined if we know the timezone of the PC it was read on. It could do better, in my opinion. As implemented, it is easy to mess things up without noticing.
But maybe I got the whole concept wrong :) That is why I am asking...
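For what it's worth, an aware local datetime can be had by asking for UTC first and then calling astimezone() with no argument, which attaches the system's local offset. A small sketch (the printed local time depends on the machine it runs on):

```python
from datetime import datetime, timezone

timestamp = 0

# Start from an aware UTC datetime, then convert to the system's local zone.
# astimezone() with no argument returns an aware datetime in local time.
aware_local = datetime.fromtimestamp(timestamp, tz=timezone.utc).astimezone()

print(aware_local)                 # e.g. 1970-01-01 01:00:00+01:00 on a CET machine
print(aware_local.tzinfo is None)  # False: the offset travels with the object
print(aware_local.timestamp())     # 0.0, the round trip loses nothing
```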

Timestamp with utc and any other timezone is coming out to be same with arrow

import arrow
# Note: in arrow >= 1.0, timestamp is a method, so call .timestamp() there.
print(arrow.utcnow())
print(arrow.utcnow().timestamp)
print(arrow.utcnow().to('Asia/Kolkata'))
print(arrow.utcnow().to('Asia/Kolkata').timestamp)
I need the timestamp (as an int) for the 'Asia/Kolkata' timezone, which is +5:30 from UTC.
arrow.utcnow() and arrow.utcnow().to('Asia/Kolkata') come out different, and the second is 5:30 ahead of the first, as expected.
However, arrow.utcnow().timestamp and arrow.utcnow().to('Asia/Kolkata').timestamp still come out the same.
I am sure I am missing something very basic here, but can anyone explain this?

I think "timestamp", by definition, is always in UTC:
The Unix time (or Unix epoch or POSIX time or Unix timestamp) is a
system for describing points in time, defined as the number of seconds
elapsed since midnight proleptic Coordinated Universal Time (UTC) of
January 1, 1970, not counting leap seconds.
If you take your local clock time and relabel it as UTC without shifting it (that is, 5pm Kolkata time becomes 5pm UTC), then you can get a timestamp that corresponds to the local clock time. Example:
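import arrow
# Note: in arrow >= 1.0, timestamp is a method, so call .timestamp() there.
print(arrow.utcnow())
print(arrow.utcnow().timestamp)
kolkata = arrow.utcnow().to('Asia/Kolkata')
# replace() swaps the tzinfo label without shifting the clock reading
print(kolkata.replace(tzinfo='UTC').timestamp)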
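The same behaviour can be reproduced with the standard library, which may make the point clearer: converting between zones leaves the timestamp alone, while relabelling the clock reading as UTC shifts it by the offset. This sketch uses a fixed +05:30 offset as a stand-in for 'Asia/Kolkata' and an arbitrary fixed instant, both assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

IST = timezone(timedelta(hours=5, minutes=30))  # stand-in for Asia/Kolkata

utc_now = datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc)  # arbitrary instant
kolkata = utc_now.astimezone(IST)                            # 17:30 at +05:30

# Same instant, so the same timestamp.
same_instant = kolkata.timestamp() == utc_now.timestamp()

# Keep the clock reading (17:30) but relabel it as UTC: a different instant.
relabelled = kolkata.replace(tzinfo=timezone.utc)
shift_seconds = relabelled.timestamp() - utc_now.timestamp()

print(same_instant)   # True
print(shift_seconds)  # 19800.0, i.e. 5 h 30 min
```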

Timestamps are UTC; this is also described in the Arrow docs:
timestamp
Returns a timestamp representation of the Arrow object, in UTC time.
Arrow will let you convert a non-UTC timestamp into arrow time but it won't let you shoot yourself in the foot by generating non-UTC timestamps for you.
classmethod fromtimestamp(timestamp, tzinfo=None)
Constructs an Arrow object from a timestamp, converted to the given timezone.
Parameters:
timestamp – an int or float timestamp, or a str that converts to either.
tzinfo – (optional) a tzinfo object. Defaults to local time.
Timestamps should always be UTC. If you have a non-UTC timestamp:
arrow.Arrow.utcfromtimestamp(1367900664).replace(tzinfo='US/Pacific')
<Arrow [2013-05-07T04:24:24-07:00]>
The full docs are here:
http://arrow.readthedocs.io/en/latest/

Django/python - dispelling confusion regarding dates and timezone-awareness

I've been working extensively with dates in python/django. In order to solve various use-cases I've been blindly trying a variety of different approaches until one of them worked, without learning the logic behind how the various functions work.
Now it's crunch time. I'd like to ask a couple of questions regarding the intricacies of dates and timezones in django/python.
How do I interpret a datetime object that already has a timezone?
To clarify, let's say I do the following:
>>> generate_a_datetime()
datetime.datetime(2015, 12, 2, 0, 0, tzinfo=<DstTzInfo 'Canada/Eastern' LMT-1 day, 18:42:00 STD>)
>>>
The console output seems ambiguous to me:
Q1) This datetime object says that it is 2015-12-02. What is the generate_a_datetime function telling me? Is it saying that a man standing in eastern Canada looking at his calendar sees "2015-12-02"? Or does it mean "This is 2015-12-02 UTC... but don't forget to adjust this to the eastern-Canada timezone!"?
django.utils.timezone.make_aware confuses me.
For example:
>>> import datetime
>>> from django.utils import timezone
>>> import pytz
>>> tz = pytz.timezone('Canada/Eastern')
>>> now_unaware = datetime.datetime.now()
>>> now_aware_with_django = timezone.make_aware(now_unaware, tz)
>>> now_aware_with_datetime = now_unaware.replace(tzinfo=tz)
>>> now_unaware
datetime.datetime(2015, 12, 2, 22, 1, 19, 564003)
>>> now_aware_with_django
datetime.datetime(2015, 12, 2, 22, 1, 19, 564003, tzinfo=<DstTzInfo 'Canada/Eastern' EST-1 day, 19:00:00 STD>)
>>> now_aware_with_datetime
datetime.datetime(2015, 12, 2, 22, 1, 19, 564003, tzinfo=<DstTzInfo 'Canada/Eastern' LMT-1 day, 18:42:00 STD>)
>>>
The objects now_aware_with_django and now_aware_with_datetime seem to behave similarly, but their console output suggests they are different.
Q2) What is the difference between now_aware_with_django and now_aware_with_datetime?
Q3) How do I know if I need to use timezone.make_aware or datetime.replace?
Naive datetimes vs. UTC datetimes
UTC means there is no change to the time value. "Naive" seems to mean that the time has no timezone associated with it.
Q4) What is the difference between naive and UTC datetimes? It seems like they are exactly the same - neither imposing any transformation upon the actual time value.
Q5) How do I know when I want to use naive times, and when I want to use UTC times?
If I could get an answer to all 5 questions that would be positively splendid. Thanks very much!

Q1) This datetime object says that it is 2015-12-02. What is the generate_a_datetime function telling me? Is it saying that a man standing in eastern Canada looking at his calendar sees "2015-12-02"? Or does it mean "This is 2015-12-02 UTC... but don't forget to adjust this to the eastern-Canada timezone!"?
The first interpretation was correct. The timezone-aware datetime is already "adjusted" for you, and the tzinfo is just telling you which timezone it is specified in.
Q2) What is the difference between now_aware_with_django and now_aware_with_datetime?
For the first case, make_aware interprets the naive datetime as wall-clock time in the timezone you pass in (with a pytz timezone it goes through pytz's localize), so the result carries the offset actually in effect on that date: EST, UTC-5.
For the second case, you're also saying that the naive one was already in the timezone you're providing, but replace just tacks on the tzinfo without looking at the date. With a pytz timezone that leaves you with the zone's earliest recorded offset, local mean time (the "LMT-1 day, 18:42:00" in your output, i.e. UTC-5:18), which is almost never what you want.
Q3) How do I know if I need to use timezone.make_aware or datetime.replace?
If you know which timezone your naive datetime is in, use make_aware (or localize in pytz, which does the same job). Using replace(tzinfo=...) is only safe with fixed-offset tzinfo objects such as pytz.utc. If what you want is to convert an already-aware datetime into a different timezone, that is a third operation: astimezone.
Note: usually if you have any naive datetimes hanging around in the first place, something went wrong earlier on, and that is where you should catch it. Try to get them tz aware at the boundary of your app - I'll say more about this in Q5.
Q4) What is the difference between naive and UTC datetimes? It seems like they are exactly the same - neither imposing any transformation upon the actual time value.
A naive datetime is just a datetime which doesn't tell you what timezone it's in. It's not necessarily UTC, it could be anything. It's similar to bytestrings and unicode - you have to know what the encoding is to say what the decoded bytes are saying. For a naive datetime, you have to know what timezone it's in before you can say what time it actually represents. So in this sense, a UTC datetime provides more information than a naive datetime.
UTC is coordinated universal time, blame the French for the weird acronym. Time zones are usually defined as differing from UTC by an integer number of hours, and for all practical purposes you can think of UTC as the timezone which differs from UTC by 0 hours. And it's like GMT without any daylight savings nonsense.
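The difference shows up as soon as you ask either object which instant it represents. A short illustration with the standard library (the date is arbitrary):

```python
from datetime import datetime, timezone

naive = datetime(2015, 12, 2, 22, 1, 19)
aware_utc = datetime(2015, 12, 2, 22, 1, 19, tzinfo=timezone.utc)

print(naive.utcoffset())      # None: no claim about which zone this is
print(aware_utc.utcoffset())  # 0:00:00: explicitly UTC

# Ordering a naive datetime against an aware one is an error, not a silent guess.
try:
    naive < aware_utc
    comparable = True
except TypeError:
    comparable = False

print(comparable)  # False
```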
Q5) How do I know when I want to use naive times, and when I want to use UTC times?
There are differences of opinion on this. My recommendation is to always work with everything in UTC inside your app (and only store UTC in the databases too!). When any datetime data enters your app, however it enters your app, make sure it's correctly converted to UTC. This also means that anywhere inside your app that uses datetime.now() (which is a naive datetime with the "missing" tzinfo which should be the local timezone of the machine) instead uses datetime.utcnow() (which is a naive datetime in UTC) or even better datetime.now(tz=pytz.utc) (which is timezone aware).
Only change into local timezone at the "display" end of your app. You can usually do this with template tags, or even with clientside js.
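A rough sketch of that pattern, using only the standard library. The render helper and the fixed -05:00 display offset are hypothetical; a real app would look up the user's zone in a proper timezone database.

```python
from datetime import datetime, timedelta, timezone

def render(dt_utc, display_tz):
    """Convert a stored UTC datetime to the viewer's zone only for display."""
    return dt_utc.astimezone(display_tz).strftime("%Y-%m-%d %H:%M %z")

# Inside the app: always aware, always UTC.
created_at = datetime.now(timezone.utc)

# At the display edge: hypothetical fixed offset standing in for the user's zone.
viewer_tz = timezone(timedelta(hours=-5))
stored = datetime(2015, 12, 3, 3, 1, tzinfo=timezone.utc)

print(render(stored, viewer_tz))  # 2015-12-02 22:01 -0500
```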

Make Datetime Timezone Aware From UTC Offset and DST Bit

I am currently battling the cruel beast that is timezone localization in my django application, and having some trouble... I want to make naive datetimes timezone aware, based on a location. I have a database of zip codes that have the UTC offset in hours, as well as a 0 or 1 depending on if the zip codes adhere to DST. How might I use this data to accurately apply a timezone to my datetimes? Ideally the datetime would respond to changes in DST, rather than just always simply following the UTC offset.

With pytz it's not hard to convert the datetimes as you describe; the only complication is getting tzinfo instances corresponding to the time zone descriptions in your database.
The problem is that real timezones are more complicated than just offset + DST. For example, different regions adopted DST at different points in history, and different regions in the world can make the DST switch at different points in the year.
If your usage is only for the US, and only concerns future (not historical) dates, then there are a couple options that should yield accurate results (though note the caveat below):
1. Just create your own concrete tzinfo subclass that uses the offset and DST flag from your database. For example, the Python documentation gives sample code for "a complete implementation of current DST rules for major US time zones."
2. Map from the offset / DST to the corresponding pytz tzinfo object. Since there are only a handful of possible combinations in the US, just figure out which timezone name corresponds and use that.
TZ_MAP = {
...
(-5, 1): pytz.timezone('US/Eastern')
...
}
tz = TZ_MAP[(offset, is_dst)]
Once you have the tzinfo instance the conversion is simple, but note that dealing with DST involves inherent ambiguities. For example, when the clock is turned back at 2am, all the times between 1am and 2am occur twice in the local timezone. Assuming you don't know which one you actually mean, you can either pick one arbitrarily, or raise an exception.
# with no is_dst argument, pytz falls back to its default (is_dst=False) if there is ambiguity
aware_dt = tz.localize(naive_dt)
# with is_dst=None, pytz will raise an exception if there is ambiguity
aware_dt = tz.localize(naive_dt, is_dst=None)

Python: strftime() UTC Offset Not working as Expected in Windows

Every time I use:
time.strftime("%z")
I get:
Eastern Daylight Time
However, I would like the UTC offset in the form +HHMM or -HHMM. I have even tried:
time.strftime("%Z")
Which still yields:
Eastern Daylight Time
I have read several other posts related to strftime() and %z always seems to return the UTC offset in the proper +HHMM or -HHMM format. How do I get strftime() to output in the +HHMM or -HHMM format for python 3.3?
Edit: I'm running Windows 7

In 2.x, if you look at the docs for time.strftime, they don't even mention %z. It's not guaranteed to exist at all, much less to be consistent across platforms. In fact, as footnote 1 implies, it's left up to the C strftime function. In 3.x, on the other hand, they do mention %z, and the footnote that explains that it doesn't work the way you'd expect is not easy to see; that's an open bug.
However, in 2.6+ (including all 3.x versions), datetime.strftime is guaranteed to support %z as "UTC offset in the form +HHMM or -HHMM (empty string if the object is naive)." So, that makes for a pretty easy workaround: use datetime instead of time. Exactly how to change things depends on what exactly you're trying to do. With python-dateutil's tz module, datetime.now(tz.tzlocal()).strftime('%z') is the way to get just the local timezone formatted as a GMT offset, but if you're trying to format a complete time the details will be a little different.
If you look at the source, time.strftime basically just checks the format string for valid-for-the-platform specifiers and calls the native strftime function, while datetime.strftime has a bunch of special handling for different specifiers, including %z; in particular, it will replace the %z with a formatted version of utcoffset before passing things on to strftime. The code has changed a few times since 2.7, and even been radically reorganized once, but the same difference is basically there even in the pre-3.5 trunk.
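To illustrate the datetime behaviour described above, here is a small sketch with a fixed -04:00 offset (chosen arbitrarily so that the output does not depend on the machine):

```python
from datetime import datetime, timedelta, timezone

aware = datetime(2013, 7, 1, 12, 0, tzinfo=timezone(timedelta(hours=-4)))
naive = datetime(2013, 7, 1, 12, 0)

aware_offset = aware.strftime("%z")  # %z is filled in by datetime itself
naive_offset = naive.strftime("%z")  # empty string for a naive object

print(repr(aware_offset))  # '-0400'
print(repr(naive_offset))  # ''
```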

For a proper solution, see abarnert’s answer.
You can use time.altzone, which returns the DST offset in seconds west of UTC, so it is negative east of Greenwich. For example, I’m on CEST at the moment (UTC+2), so I get this:
>>> time.altzone
-7200
And to put it in your desired format:
>>> '{}{:0>2}{:0>2}'.format('-' if time.altzone > 0 else '+', abs(time.altzone) // 3600, abs(time.altzone // 60) % 60)
'+0200'
As abarnert mentioned in the comments, time.altzone gives the offset when DST is active, while time.timezone gives it when DST is not active. To figure out which to use, you can do what J.F. Sebastian suggested in his answer to a different question. So you can get the correct offset like this:
time.altzone if time.daylight and time.localtime().tm_isdst > 0 else time.timezone
As also suggested by him, you can use the following in Python 3 to get the desired format using datetime.timezone:
>>> from datetime import datetime, timezone
>>> datetime.now(timezone.utc).astimezone().strftime('%z')
'+0200'
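The pieces above can be gathered into one helper. This is only a sketch: format_offset and local_utc_offset are hypothetical names, and whole-minute offsets are assumed.

```python
import time

def format_offset(seconds_west):
    """Format an offset given in seconds west of UTC (the sign convention
    used by time.timezone and time.altzone) as +HHMM or -HHMM."""
    sign = "-" if seconds_west > 0 else "+"
    hours, minutes = divmod(abs(seconds_west) // 60, 60)
    return "{}{:02d}{:02d}".format(sign, hours, minutes)

def local_utc_offset():
    """Pick altzone when DST is currently in effect, otherwise timezone."""
    is_dst = time.daylight and time.localtime().tm_isdst > 0
    return format_offset(time.altzone if is_dst else time.timezone)

print(format_offset(-7200))   # +0200
print(format_offset(18000))   # -0500
print(local_utc_offset())     # depends on the machine's settings
```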

Use time.timezone to get the time offset in seconds.
Format it using:
("-" if time.timezone > 0 else "+") + time.strftime("%H:%M", time.gmtime(abs(time.timezone)))
to convert it to +/-HH:MM format.
BTW, isn't this supposed to be a bug, according to the strftime docs?
Also, I thought this SO answer might help you convert from a zone offset string to HH:MM format, but since "%z" is not working as expected, I feel it's moot.
NOTE: time.timezone does not account for daylight saving time; it is always the non-DST offset.

It will come as no surprise that this bug persists in the latest Windows version currently available, Win 10 Version 1703 (Creators). However, time marches on and there is a lovely date-and-time library called pendulum that does what the question asks for. Sébastien Eustace (principal author of the product?) has shown me this.
>>> pendulum.now().strftime('%z')
'-0400'
pendulum keeps the timezone with the date-time object. As the output above suggests, now() with no argument uses the local timezone. There are many other possibilities, amongst them these:
>>> pendulum.now(tz='Europe/Paris').strftime('%z')
'+0200'
>>> pendulum.create(year=2016, month=11, day=5, hour=16, minute=23, tz='America/Winnipeg').strftime('%z')
'-0500'
>>> pendulum.now(tz='America/Winnipeg').strftime('%z')
'-0500'
