Django/python - dispelling confusion regarding dates and timezone-awareness


I've been working extensively with dates in python/django. In order to solve various use-cases I've been blindly trying a variety of different approaches until one of them worked, without learning the logic behind how the various functions work.
Now it's crunch time. I'd like to ask a couple of questions regarding the intricacies of dates and timezones in django/python.
How do I interpret a datetime object that already has a timezone?
To clarify, let's say I do the following:
>>> generate_a_datetime()
datetime.datetime(2015, 12, 2, 0, 0, tzinfo=<DstTzInfo 'Canada/Eastern' LMT-1 day, 18:42:00 STD>)
>>>
The console output seems ambiguous to me:
Q1) This datetime object says that it is 2015-12-02. What is the generate_a_datetime function telling me? Is it saying that "a man standing in eastern Canada looking at his calendar sees 2015-12-02"? Or does it mean "this is 2015-12-02 UTC, but don't forget to adjust it to the eastern-Canada timezone"?
django.utils.timezone.make_aware confuses me.
For example:
>>> import datetime
>>> from django.utils import timezone
>>> import pytz
>>> tz = pytz.timezone('Canada/Eastern')
>>> now_unaware = datetime.datetime.now()
>>> now_aware_with_django = timezone.make_aware(now_unaware, tz)
>>> now_aware_with_datetime = now_unaware.replace(tzinfo=tz)
>>> now_unaware
datetime.datetime(2015, 12, 2, 22, 1, 19, 564003)
>>> now_aware_with_django
datetime.datetime(2015, 12, 2, 22, 1, 19, 564003, tzinfo=<DstTzInfo 'Canada/Eastern' EST-1 day, 19:00:00 STD>)
>>> now_aware_with_datetime
datetime.datetime(2015, 12, 2, 22, 1, 19, 564003, tzinfo=<DstTzInfo 'Canada/Eastern' LMT-1 day, 18:42:00 STD>)
>>>
The objects now_aware_with_django and now_aware_with_datetime seem to behave similarly, but their console output suggests they are different.
Q2) What is the difference between now_aware_with_django and now_aware_with_datetime?
Q3) How do I know if I need to use timezone.make_aware or datetime.replace?
Naive datetimes vs. UTC datetimes
UTC means there is no change to the time value. "Naive" seems to mean that the time has no timezone associated with it.
Q4) What is the difference between naive and UTC datetimes? It seems like they are exactly the same - neither imposing any transformation upon the actual time value.
Q5) How do I know when I want to use naive times, and when I want to use UTC times?
If I could get an answer to all 5 questions that would be positively splendid. Thanks very much!

Q1) This datetime object says that it is 2015-12-02. What is the generate_a_datetime function telling me? Is it saying that "a man standing in eastern Canada looking at his calendar sees 2015-12-02"? Or does it mean "this is 2015-12-02 UTC, but don't forget to adjust it to the eastern-Canada timezone"?
The first interpretation is correct. The timezone-aware datetime is already "adjusted" for you, and the tzinfo is just telling you which timezone it is specified in.
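A small sketch of that reading, using only the standard library. The fixed UTC-5 offset is an assumption standing in for Canada/Eastern in December (EST); a real application would use a proper timezone object.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-5 offset stands in for Canada/Eastern in December.
eastern_standard = timezone(timedelta(hours=-5), "EST")

# Midnight on the wall clock in eastern Canada.
local_midnight = datetime(2015, 12, 2, 0, 0, tzinfo=eastern_standard)

# The same instant expressed in UTC: the wall clock changes, the instant does not.
as_utc = local_midnight.astimezone(timezone.utc)

print(local_midnight.isoformat())  # 2015-12-02T00:00:00-05:00
print(as_utc.isoformat())          # 2015-12-02T05:00:00+00:00
print(local_midnight == as_utc)    # True
```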
Q2) What is the difference between now_aware_with_django and now_aware_with_datetime?
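Both calls keep the wall-clock time of the naive datetime and attach a timezone to it; neither converts from your machine's local timezone. The difference is in which UTC offset gets attached. For the first case, with a pytz timezone, timezone.make_aware calls tz.localize(), which looks up the offset in force on that date (EST, UTC-5:00, in your output).
For the second case, replace(tzinfo=tz) just tacks on the tzinfo object as-is. A pytz timezone that has not been through localize() defaults to the first offset in the zone's history, usually local mean time (LMT, UTC-5:18 here), so now_aware_with_datetime is 18 minutes away from the instant you meant.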
Q3) How do I know if I need to use timezone.make_aware or datetime.replace?
Well, since they do different things, you need to know what you're trying to do to know which to use. If you know which timezone your naive datetime is in, use make_aware (or localize in pytz, which is what make_aware calls for pytz timezones); it picks the correct offset for that date. Avoid replace(tzinfo=...) with pytz timezones, because it attaches the LMT offset; it is only safe with fixed-offset tzinfo objects such as pytz.utc. If you want to convert an aware datetime into a different timezone, use astimezone.
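To see why the two results in the question differ, the arithmetic can be reproduced with fixed offsets. This is a sketch, not pytz itself: the two timezone objects below are assumptions that mimic the offsets shown in the console output (EST is UTC-5:00, LMT is UTC-5:18).

```python
from datetime import datetime, timedelta, timezone

naive = datetime(2015, 12, 2, 22, 1, 19)

# Assumed stand-ins for the two tzinfo states printed in the question.
est = timezone(timedelta(hours=-5), "EST")               # offset shown for make_aware
lmt = timezone(timedelta(hours=-5, minutes=-18), "LMT")  # offset shown for replace

with_est = naive.replace(tzinfo=est)
with_lmt = naive.replace(tzinfo=lmt)

# Same wall clock, different instants once converted to UTC.
print(with_est.astimezone(timezone.utc))  # 2015-12-03 03:01:19+00:00
print(with_lmt.astimezone(timezone.utc))  # 2015-12-03 03:19:19+00:00
print(with_lmt - with_est)                # 0:18:00
```

The 18-minute gap is exactly the difference between the two offsets in the reprs.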
Note: usually if you have any naive datetimes hanging around in the first place, something went wrong earlier on and you should catch it there. Try to make them tz-aware at the boundary of your app - I'll say more about this in Q5.
Q4) What is the difference between naive and UTC datetimes? It seems like they are exactly the same - neither imposing any transformation upon the actual time value.
A naive datetime is just a datetime which doesn't tell you what timezone it's in. It's not necessarily UTC, it could be anything. It's similar to bytestrings and unicode - you have to know what the encoding is to say what the decoded bytes are saying. For a naive datetime, you have to know what timezone it's in before you can say what time it actually represents. So in this sense, a UTC datetime provides more information than a naive datetime.
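A short illustration of how Python itself keeps the two kinds apart (standard library only; the date is arbitrary):

```python
from datetime import datetime, timezone

naive = datetime(2015, 12, 2, 22, 0)
aware_utc = datetime(2015, 12, 2, 22, 0, tzinfo=timezone.utc)

print(naive.tzinfo)        # None
print(aware_utc.tzinfo)    # UTC
print(naive == aware_utc)  # False: a naive value never equals an aware one

# Ordering the two is refused outright with a TypeError.
try:
    naive < aware_utc
    mixed_comparison_allowed = True
except TypeError:
    mixed_comparison_allowed = False

print(mixed_comparison_allowed)  # False
```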
UTC is Coordinated Universal Time; blame the French for the weird acronym. Time zones are usually defined as differing from UTC by a whole number of hours (a few use half-hour or 45-minute offsets), and for all practical purposes you can think of UTC as the timezone which differs from UTC by 0 hours. It's like GMT without any daylight saving nonsense.
Q5) How do I know when I want to use naive times, and when I want to use UTC times?
There are differences of opinion on this. My recommendation is to work with everything in UTC inside your app (and store only UTC in the database too!). When any datetime data enters your app, however it enters, make sure it's correctly converted to UTC. This also means that anywhere inside your app that uses datetime.now() (a naive datetime whose "missing" tzinfo is implicitly the local timezone of the machine) should instead use datetime.utcnow() (a naive datetime in UTC; deprecated since Python 3.12) or, even better, datetime.now(tz=pytz.utc) (which is timezone-aware).
Only change into local timezone at the "display" end of your app. You can usually do this with template tags, or even with clientside js.
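A minimal sketch of that split between internal UTC and display-time conversion, using the standard library. The viewer's timezone is an assumption (a fixed UTC+1 offset) made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Inside the app: always an aware datetime in UTC.
created_at = datetime.now(timezone.utc)

# At the display edge: convert to whatever the viewer needs.
# Assumption: a fixed UTC+1 offset stands in for the viewer's timezone.
viewer_tz = timezone(timedelta(hours=1))
shown = created_at.astimezone(viewer_tz)

print(created_at.utcoffset())  # 0:00:00
print(shown.utcoffset())       # 1:00:00
print(shown == created_at)     # True, still the same instant
```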

Related

If a timestamp is anchored in UTC, why isn't Python's `fromtimestamp` timezone-aware?

To my knowledge:
Python's datetime can be "naive" (if no timezone info is available) or "timezone-aware". In contrast, a timestamp is well-defined to be anchored in UTC, i.e. a timestamp of 0 corresponds to 1970-01-01 00:00:00+00:00 (regardless of your location).
Question: Why does datetime.fromtimestamp() return a naive datetime object though it has a well-defined input?
MWE
from datetime import datetime, timezone
timestamp = 0
# output: "1970-01-01 00:00:00+00:00", i.e. providing the timezone information,
# the resulting datetime is timezone-aware and accurate
print(datetime.fromtimestamp(timestamp, tz=timezone.utc))
# output: "1970-01-01 01:00:00" (for me running it in CET+0100 timezone), i.e.
# the interpretation is aware of my local time shift, but the resulting datetime
# is naive though it could be timezone-aware and thus not well-defined anymore
#
# I would have wished for/expected: "1970-01-01 01:00:00+01:00"
print(datetime.fromtimestamp(timestamp))
Why do I care?
The point is that we lose information in a dangerous way, i.e. we switch from a well-defined object to one that is only well-defined if we know the timezone of the PC it was read on, though it could do better, imo. The way it is implemented, it is easy to mess things up without noticing.
But maybe I got the whole concept wrong :) That is why I am asking...
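For what it's worth, an aware local datetime can be obtained in one extra step. A minimal sketch using only the standard library; the offset in the result depends on the machine it runs on.

```python
from datetime import datetime, timezone

timestamp = 0

# Build the aware UTC value first, then let astimezone() with no argument
# convert it to the system's local timezone, keeping it aware.
local_aware = datetime.fromtimestamp(timestamp, tz=timezone.utc).astimezone()

print(local_aware.tzinfo is not None)  # True
print(local_aware.timestamp())         # 0.0
```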

Why does converting timezones (and to unix timestamps) behave inconsistently in Pandas?

I'm parsing and manipulating some dates and times which, for reasons of interoperability with other systems, also need to be stored as UNIX (epoch) timestamps. In doing so, I'm seeing some weird behavior from pandas' Timestamp.tz_convert(), and then in its Timestamp.strftime() behavior in casting to epoch time, that makes me doubt my understanding of what should be going on.
The times I'm working with are in the US/Eastern timezone, but of course, epoch time is UTC, so my approach had been to cast to UTC since most conversions to/from UNIX timestamps assume that a tz-naive DateTime is in UTC. Let's leave aside whether doing that conversion is absolutely necessary to get valid timestamps; here's what I'm seeing that's problematic:
1. Using Timestamp.tz_convert() to change the timezone representation of a timestamp (i.e., a universal point in time) also changes the UNIX timestamp when you convert using Timestamp.strftime().
2. The differences in those timestamps don't even correspond to the proper hour differences between US-Eastern and GMT.
Here's some basic interactive-mode python to illustrate:
>>> import pytz
>>> from pytz import timezone
>>> import pandas as pd
>>> dtest = pd.to_datetime("Sunday, July 28, 2018 10:00 AM", infer_datetime_format=True).replace(tzinfo=timezone('America/New_York')) # okay, this should uniquely represent a point in time
>>> dtest
Timestamp('2018-07-28 10:00:00-0400', tz='America/New_York') # yup, that's the time - 10AM at GMT-0400.
>>> dtest2 = dtest.tz_convert('UTC') # convert to UTC
>>> dtest2
Timestamp('2018-07-28 14:00:00+0000', tz='UTC') # yup, same point in time, just different time zone now
>>> dtest.strftime('%s') # let's convert to unix time - this looks right
'1532786400'
>>> dtest2.strftime('%s') # should be the same, but it's not. WTF?
'1532804400'
The timestamps look like they are describing things equivalently: one is 10 AM at GMT-0400, the other is 2 PM at GMT+0000, a difference of 4 hours of clock time, as expected. They're both, of course, timezone-aware. But then converting them to UNIX timestamps yields
(A) different numbers, and even worse,
(B) numbers that differ by 5 hours (18000 seconds = 5 * 60 * 60) rather than 4, so I can't even assume that strftime() is merely ignoring timezone.
I'm using https://www.epochconverter.com/ to validate any timestamps as I sanity-check this, so that's a possible point of being misled. But according to that site,
1532786400 = 2018-07-28T10:00 -0400, and
1532804400 (that last result) = 2018-07-28T15:00 -0400, or 7pm GMT, a difference of 5 hours.
There are lots of questions on the subject of casting pandas Timestamps FROM a UNIX timestamp, but very little on questions casting TO epoch time. I can think of 2 possible explanations:
(1) tz_convert() is pulling some environment variable on my system that says I'm GMT -0500 and using that in the conversion process, in spite of that being irrelevant to converting between timezone-aware timestamps, and in so doing is actually changing the underlying point in time being represented. Or:
(2) Timestamp.strftime() is bugged and either ignoring the timezone parameter of a tz-aware timestamp or doing something truly bizarre when asked for a '%s' formatting parameter.
All advice greatly appreciated.
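The second explanation is the more likely one, though it is not a pandas bug as such: '%s' is not a documented Python format code. It is passed through to the platform's C library, which reads only the wall-clock fields and interprets them in the machine's local timezone, ignoring tzinfo. A more dependable route is the timestamp() method, which pandas Timestamp objects also provide. The sketch below uses plain datetime objects and a fixed UTC-4 offset as a stand-in for US/Eastern in July (both are assumptions made for illustration):

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-4 offset stands in for US/Eastern in July (EDT).
edt = timezone(timedelta(hours=-4), "EDT")

local = datetime(2018, 7, 28, 10, 0, tzinfo=edt)
as_utc = local.astimezone(timezone.utc)

# timestamp() respects tzinfo, so both views of the instant agree.
print(int(local.timestamp()))   # 1532786400
print(int(as_utc.timestamp()))  # 1532786400
```

The five-hour gap in the question is consistent with the 14:00 wall-clock value being read as Eastern standard time (UTC-5) rather than daylight time.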

Make Datetime Timezone Aware From UTC Offset and DST Bit

I am currently battling the cruel beast that is timezone localization in my django application, and having some trouble... I want to make naive datetimes timezone aware, based on a location. I have a database of zip codes that have the UTC offset in hours, as well as a 0 or 1 depending on if the zip codes adhere to DST. How might I use this data to accurately apply a timezone to my datetimes? Ideally the datetime would respond to changes in DST, rather than just always simply following the UTC offset.

With pytz it's not hard to convert the datetimes as you describe; the only complication is getting tzinfo instances corresponding to the time zone descriptions in your database.
The problem is that real timezones are more complicated than just offset + DST. For example, different regions adopted DST at different points in history, and different regions in the world can make the DST switch at different points in the year.
If your usage is only for the US, and only concerns future (not historical) dates, then there are a couple options that should yield accurate results (though note the caveat below):
1. Just create your own concrete tzinfo subclass that uses the offset and DST flag from your database. For example, the Python documentation gives sample code for "a complete implementation of current DST rules for major US time zones."
2. Map from the offset / DST to the corresponding pytz tzinfo object. Since there are only a handful of possible combinations in the US, just figure out which timezone name corresponds and use that.
TZ_MAP = {
...
(-5, 1): pytz.timezone('US/Eastern')
...
}
tz = TZ_MAP[(offset, is_dst)]
Once you have the tzinfo instance the conversion is simple, but note that dealing with DST involves inherent ambiguities. For example, when the clock is turned back at 2am, all the times between 1am and 2am occur twice in the local timezone. Assuming you don't know which one you actually mean, you can either pick one arbitrarily, or raise an exception.
# with no is_dst argument, pytz will guess if there is ambiguity
aware_dt = tz.localize(naive_dst)
# with is_dst=None, pytz will raise an exception if there is ambiguity
aware_dt = tz.localize(naive_dst, is_dst=None)
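The same ambiguity can be seen without pytz. The sketch below swaps in the standard-library zoneinfo module (Python 3.9+, and it assumes the system has timezone data installed), where the fold attribute plays a role similar to is_dst. The date is the 2015 US fall-back, chosen for illustration.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

eastern = ZoneInfo("America/New_York")

# 01:30 on 2015-11-01 happened twice: once in EDT, once in EST.
naive_dt = datetime(2015, 11, 1, 1, 30)

first = naive_dt.replace(tzinfo=eastern)           # fold=0, the earlier one
second = naive_dt.replace(tzinfo=eastern, fold=1)  # fold=1, the later one

print(first.utcoffset() == timedelta(hours=-4))   # True (EDT)
print(second.utcoffset() == timedelta(hours=-5))  # True (EST)
```

Unlike pytz with is_dst=None, zoneinfo does not raise on an ambiguous time; it silently uses fold=0 unless told otherwise.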

Python pytz: convert local time to utc. Localize doesn't seem to convert

I have a database that stores datetime as UTC. I need to look up info from a particular time, but the date and time are given in a local time, let's say 'Europe/Copenhagen'. I'm given these as:
year = 2012; month = 12; day = 2; hour = 13; min = 1;
So, I need to convert these to UTC so I can look them up in the database. I want to do this using pytz. I am looking at localize:
local_tz = timezone('Europe/Copenhagen')
t = local_tz.localize(datetime.datetime(year, month, day, hour, min))
But I'm confused about localize(). Is this assuming that year, etc., are given to me in local time? Or is it assuming that they are given in UTC and now it has converted them to local time?
print t gives me:
2012-12-02 13:01:00+01:00
So it seems that it assumed that the original year, etc was in utc; hours is now 13+1 instead of 13. So what should I do instead? I have read the pytz documentation and this does not make it clearer to me. It mentions a lot that things are tricky so I'm not sure whether pytz is actually solving these issues. And, I don't always know if the examples are showing me things that work or things that won't work.
I tried normalize:
print local_tz.normalize(t)
That gives me the same result as print t.
EDIT: With the numbers given above for year etc. it should match up with information in the database for 2012-12-2 12:01. (since Copenhagen is utc+1 on that date)

localize() attaches the timezone to a naive datetime.datetime instance in the local timezone.
If you have datetime values in a local timezone, localize to that timezone, then use .astimezone() to cast the value to UTC:
>>> localdt = local_tz.localize(datetime.datetime(year, month, day, hour, min))
>>> localdt.astimezone(pytz.UTC)
datetime.datetime(2012, 12, 2, 12, 1, tzinfo=<UTC>)
Note that you don't need to do this for comparisons: datetime objects with a timezone can be compared directly, and they'll both be normalized to UTC for the test:
>>> localdt.astimezone(pytz.UTC) == localdt
True
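The arithmetic behind that result, sketched with the standard library only. The fixed UTC+1 offset is an assumption standing in for Europe/Copenhagen in December; it is not a substitute for localize() around DST changes.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC+1 offset stands in for Europe/Copenhagen in December.
cet = timezone(timedelta(hours=1), "CET")

# Attaching the timezone leaves the wall clock at 13:01.
localdt = datetime(2012, 12, 2, 13, 1, tzinfo=cet)

# Converting to UTC moves the wall clock, not the instant.
utcdt = localdt.astimezone(timezone.utc)

print(localdt)           # 2012-12-02 13:01:00+01:00
print(utcdt)             # 2012-12-02 12:01:00+00:00
print(localdt == utcdt)  # True
```

So "13:01:00+01:00" reads as 13:01 on a Copenhagen clock, which is 12:01 UTC, the value expected in the database.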

If you know the incoming time representation is in the Europe/Copenhagen timezone, you can create it as timezone-aware to begin with:
local_tz = timezone('Europe/Copenhagen')
t = local_tz.localize(datetime.datetime(year, month, day, hour, min))
You can then "convert" this to UTC with:
t_utc = t.astimezone(pytz.UTC)
but this might not be necessary, depending on how sane your database drivers are. t and t_utc represent the same point-in-time and well-behaving code should treat them interchangeably. The (year, month, day, hour, minute, second, …) tuple is merely a human-readable representation of this point-in-time in a specific time zone and calendar system.

Python - Setting a datetime in a specific timezone (without UTC conversions)

Just to be clear, this is python 2.6, I am using pytz.
This is for an application that only deals with US timezones, I need to be able to anchor a date (today), and get a unix timestamp (epoch time) for 8pm and 11pm in PST only.
This is driving me crazy.
> pacific = pytz.timezone("US/Pacific")
> datetime(2011,2,11,20,0,0,0,pacific)
datetime.datetime(2011, 2, 11, 20, 0, tzinfo=<DstTzInfo 'US/Pacific' PST-1 day, 16:00:00 STD>)
> datetime(2011,2,11,20,0,0,0,pacific).strftime("%s")
'1297454400'
zsh> date -d '#1297454400'
Fri Feb 11 12:00:00 PST 2011
So, even though I am setting up a timezone, and creating the datetime with that time zone, it is still creating it as UTC and then converting it. This is more of a problem since UTC will be a day ahead when I am trying to do the calculations.
Is there an easy (or at least sensical) way to generate a timestamp for 8pm PST today?
(to be clear, I do understand the value of using UTC in most situations, like database timestamps, or for general storage. This is not one of those situations, I specifically need a timestamp for evening in PST, and UTC should not have to enter into it.)

There are at least two issues:
1. You shouldn't pass a timezone with a non-fixed UTC offset such as "US/Pacific" as the tzinfo parameter directly. Use the pytz.timezone("US/Pacific").localize() method instead.
2. .strftime('%s') is not portable: it ignores tzinfo and always uses the local timezone. Use datetime.timestamp() or its analogs on older Python versions instead.
To make a timezone-aware datetime in the given timezone:
#!/usr/bin/env python
from datetime import datetime
import pytz # $ pip install pytz
tz = pytz.timezone("US/Pacific")
aware = tz.localize(datetime(2011, 2, 11, 20), is_dst=None)
To get POSIX timestamp:
timestamp = (aware - datetime(1970, 1, 1, tzinfo=pytz.utc)).total_seconds()
(On Python 2.6, see the totimestamp() function for how to emulate the .total_seconds() method.)
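On Python 3 the same calculation can be checked with the standard library alone. This is a sketch: the fixed UTC-8 offset is an assumption standing in for US/Pacific in February (PST), so it would be wrong for dates that fall in daylight time.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-8 offset stands in for US/Pacific in February.
pst = timezone(timedelta(hours=-8), "PST")
aware = datetime(2011, 2, 11, 20, tzinfo=pst)

# Subtracting the aware epoch gives seconds since 1970-01-01 00:00 UTC.
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
timestamp = (aware - epoch).total_seconds()

print(timestamp)          # 1297483200.0
print(aware.timestamp())  # 1297483200.0
```

That is 28800 seconds (8 hours) after the 1297454400 value in the question, which was 20:00 UTC rather than 20:00 PST.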

Create a tzinfo object utc for the UTC time zone, then try this:
#XXX: WRONG (for any timezone with a non-fixed utc offset), DON'T DO IT
datetime(2011,2,11,20,0,0,0,pacific).astimezone(utc).strftime("%s")
Edit: As pointed out in the comments, putting the timezone into the datetime constructor isn't always robust. The preferred method according to the pytz documentation would be:
pacific.localize(datetime(2011,2,11,20,0,0,0)).astimezone(utc).strftime("%s")
Also note from the comments that strftime("%s") isn't reliable, it ignores the time zone information (even UTC) and assumes the time zone of the system it's running on. It relies on an underlying C library implementation and doesn't work at all on some systems (e.g. Windows).
