Python: How to check if date is ambiguous? - python

Is there a simple way to determine whether a datetime object is ambiguous, meaning that the particular wall-clock time occurs twice in a given timezone because of a regular clock change?
Here is an example of an ambiguous datetime object. In the New York timezone, at 2 AM on 5 November 2017, clocks were set back to 1 AM.
import datetime
import dateutil.tz
# definition of time zone
eastern_time = dateutil.tz.gettz('America/New_York')
# definition of ambiguous datetime object
dt = datetime.datetime(2017, 11, 5, 1, 30, 0, tzinfo=eastern_time)
I expect something like this.
>>> suggested_module.is_dt_ambiguous(dt)
True

You're looking for the datetime_ambiguous function, which applies during a backward transition (such as end of DST in the fall).
dateutil.tz.datetime_ambiguous(dt, eastern_time)
You might also be interested in datetime_exists, which applies during a forward transition (such as start of DST in the spring).
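For readers without dateutil, the same two checks can be approximated with the standard library. The following is a minimal sketch, assuming Python 3.9+ (for zoneinfo) and an available time zone database. The helper names wall_time_exists and wall_time_is_ambiguous are hypothetical and not part of any library, and the approach relies on the fold attribute from PEP 495 rather than on dateutil's own implementation.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def wall_time_exists(naive_dt, tz):
    """Return True if the naive wall-clock time occurs at least once in tz."""
    local = naive_dt.replace(tzinfo=tz)
    # A skipped (spring-forward) time does not survive a round trip through UTC.
    round_trip = local.astimezone(timezone.utc).astimezone(tz)
    return round_trip.replace(tzinfo=None) == naive_dt


def wall_time_is_ambiguous(naive_dt, tz):
    """Return True if the naive wall-clock time occurs twice in tz."""
    local = naive_dt.replace(tzinfo=tz)
    first = local.replace(fold=0).utcoffset()
    second = local.replace(fold=1).utcoffset()
    # The two folds disagree in both gaps and overlaps, so also require existence.
    return first != second and wall_time_exists(naive_dt, tz)


eastern = ZoneInfo("America/New_York")
print(wall_time_is_ambiguous(datetime(2017, 11, 5, 1, 30), eastern))  # fall-back overlap
print(wall_time_exists(datetime(2017, 3, 12, 2, 30), eastern))        # spring-forward gap
```

Both helpers take a naive datetime plus a zone, so the caller decides which zone the wall-clock time should be interpreted in.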

Related

How to convert Zulu time format to datetime format in PySpark?

I am trying to convert a column that contains Zulu formatted timestamps to a typical datetime format. This is an example of the format the dates are in: 1533953335000.
So far, I have been using this:
from pyspark.sql import functions as f
from pyspark.sql import types as t
df=df.withColumn('DTMZ',f.date_format(df.DTMZ.cast(dataType=t.TimestampType()), "yyyy-MM-dd"))
df=df.withColumn('DTMZ', f.to_date(df.DTMZ.cast(dataType=t.TimestampType())))
My output when I use the above code is: 50579-01-17
I am hoping to be able to view these dates in a typical readable format.
Could anyone help me out with this?
A datetime in Python, as in most languages and databases, is a binary type; it has no format. Python has no separate timestamp type. The value 1533953335000 looks like a UNIX timestamp, an offset from 1970-01-01 00:00:00 UTC. Normally that offset is in seconds, but in this case it appears to be in milliseconds.
You can convert a UNIX timestamp to a datetime using datetime.fromtimestamp. 1533953335000 must be divided by 1000 first because fromtimestamp expects seconds:
>>> from datetime import datetime
>>> datetime.fromtimestamp(1533953335000/1000)
datetime.datetime(2018, 8, 11, 5, 8, 55)
This returns a local time. Zulu is sometimes used to refer to UTC/+00:00 for historical reasons:
The time zone using UTC is sometimes denoted UTC±00:00 or by the letter Z—a reference to the equivalent nautical time zone (GMT), which has been denoted by a Z since about 1950. Time zones were identified by successive letters of the alphabet and the Greenwich time zone was marked by a Z as it was the point of origin. The letter also refers to the "zone description" of zero hours, which has been used since 1920 (see time zone history). Since the NATO phonetic alphabet word for Z is "Zulu", UTC is sometimes known as "Zulu time". This is especially true in aviation, where "Zulu" is the universal standard
To get UTC time, the datetime.utcfromtimestamp function is used (note that it is deprecated since Python 3.12 in favor of fromtimestamp with an explicit timezone):
>>> datetime.utcfromtimestamp(1533953335000/1000)
datetime.datetime(2018, 8, 11, 2, 8, 55)
Notice that the two results differ by the local UTC offset (three hours here), yet neither says which zone it is in. Both methods return naive objects, i.e. objects with no timezone information, so there's no way to tell whether a value is local or UTC, summer or winter time. The timezone must be passed explicitly to get an "aware" object:
>>> from datetime import timezone
>>> datetime.fromtimestamp(1533953335000/1000,timezone.utc)
datetime.datetime(2018, 8, 11, 2, 8, 55, tzinfo=datetime.timezone.utc)
Other languages and databases have explicit timezone-aware types, e.g. datetimeoffset in SQL Server.
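The year 50579 in the question is consistent with the millisecond value being read as seconds: 1533953335000 seconds is roughly 48,609 years after 1970. A minimal sketch of the conversion in plain Python follows. The helper name epoch_ms_to_utc is hypothetical, and it assumes the column really holds milliseconds since the UNIX epoch. In PySpark the analogous step would presumably be to divide the column by 1000 before casting it to a timestamp.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_ms_to_utc(milliseconds):
    """Convert milliseconds since the UNIX epoch to an aware UTC datetime."""
    # Adding a timedelta avoids float rounding from dividing by 1000.
    return EPOCH + timedelta(milliseconds=milliseconds)


moment = epoch_ms_to_utc(1533953335000)
print(moment.isoformat())            # aware UTC timestamp
print(moment.strftime("%Y-%m-%d"))   # date only, as in the question
```

Returning an aware object keeps the "Z" meaning attached to the value, so later conversions to a local zone are unambiguous.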

Calculate hours difference between timezone and UTC based timezone name in following format: America/New_York

I have a library which returns the timezone of a location in the format: America/New_York. Based on that timezone name, I want to calculate the hours between that time zone and UTC, taking into account daylight savings time and all. I'm using Python.
My first idea was to use the Google python library and search for 'America/New_York time', but that only gave me back a list of URLs which I could visit to get the info myself. It would be great if I could get, directly in my program, the current time that Google shows when I manually search for 'America/New_York time'.
I'm sure this question has been asked before, but I am new to stack overflow and python so help is appreciated.
Thanks!
The offset from UTC depends on the date (since daylight saving time may or may not be in effect). So you need to provide a datetime for the comparison.
ZoneInfo.utcoffset will return a timedelta object directly.
>>> from zoneinfo import ZoneInfo
>>> from datetime import datetime
>>> ZoneInfo("America/New_York").utcoffset(datetime(2021, 10, 23)) #EDT
datetime.timedelta(days=-1, seconds=72000)
>>> ZoneInfo("America/New_York").utcoffset(datetime(2021, 11, 15)) #EST
datetime.timedelta(days=-1, seconds=68400)
>>> ZoneInfo("Asia/Tokyo").utcoffset(datetime(2021, 10, 23))
datetime.timedelta(seconds=32400)
Not a complete answer, but maybe you could build a dictionary that maps these names to the usual three-letter abbreviations. With that you can then use datetime and pytz to do the rest. This is feasible only if the set of possible names in the current format is small.
>>> from datetime import datetime, timedelta
>>> from datetime import timezone
>>> from zoneinfo import ZoneInfo
>>> dt1 = datetime(2020, 11, 1, 8, tzinfo=timezone.utc)
>>> dt2 = datetime(2020, 11, 1, 8, tzinfo=ZoneInfo("America/New_York"))
>>> dt2 - dt1
datetime.timedelta(seconds=18000)
>>>
Note that the difference will be four or five hours depending on whether daylight saving time is in effect or not.
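The answers above can be combined into a small helper that returns the offset in hours for a zone name at a given instant. This is only a sketch: the function name utc_offset_hours is hypothetical, it assumes Python 3.9+ with a time zone database available, and it expects an aware datetime (a naive one would be interpreted as the machine's local time).

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def utc_offset_hours(zone_name, when=None):
    """Return the UTC offset of zone_name, in hours, at the instant `when`."""
    if when is None:
        when = datetime.now(timezone.utc)  # default to the current instant
    local = when.astimezone(ZoneInfo(zone_name))
    return local.utcoffset().total_seconds() / 3600


print(utc_offset_hours("America/New_York", datetime(2021, 10, 23, 12, tzinfo=timezone.utc)))
print(utc_offset_hours("America/New_York", datetime(2021, 11, 15, 12, tzinfo=timezone.utc)))
print(utc_offset_hours("Asia/Tokyo"))
```

Returning a float rather than an int keeps zones with half-hour offsets representable.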

Given a UTC time, get a specified timezone's midnight

Note this is not quite the same as this question. That question assumes the time you want is "now", which is not the same as for an arbitrary point in time.
I have a UTC, aware, datetime object, call it point_in_time (e.g. datetime(2017, 3, 12, 16, tzinfo=tz.tzutc())).
I have a timezone, call it location (e.g. 'US/Pacific'), because I care about where it is, but its hours offset from UTC may change throughout the year with daylight savings and whatnot.
I want to
1) get the date of point_in_time if I'm standing in location,
2) get midnight of that date if I'm standing in location.
===
I tried to simply use .astimezone(timezone('US/Pacific')) and then .replace(hour=0, ...) to move to midnight, but as you might notice about my example point_in_time, the midnight for that date is on the other side of a daylight savings switch!
The result was that I got a time representing UTC datetime(2017, 3, 12, 7), instead of a time representing UTC datetime(2017, 3, 12, 8), which is the true midnight.
EDIT:
I'm actually thinking the difference between mine and the linked question is that I'm looking for the most recent midnight in the past. That question's answer seems to be able to give a midnight that could be in the past or future, perhaps?
Your example highlights the perils of doing datetime arithmetic in a local time zone.
You can probably achieve this using pytz's normalize() function, but here's the method that occurs to me:
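from datetime import datetime
import pytz
point_in_time = datetime(2017, 3, 12, 16, tzinfo=pytz.utc)
pacific = pytz.timezone("US/Pacific")
# convert to Pacific time to find the local date
pacific_time = point_in_time.astimezone(pacific)
# take naive midnight on that date, then let pytz pick the correct offset for it
pacific_midnight_naive = pacific_time.replace(hour=0, minute=0, second=0, microsecond=0, tzinfo=None)
pacific_midnight_aware = pacific.localize(pacific_midnight_naive)
pacific_midnight_aware.astimezone(pytz.utc) # datetime(2017, 3, 12, 8)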
In other words, you first convert to Pacific time to figure out the right date; then you convert again from midnight on that date to get the correct local time.
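The same two-step idea (find the local date, then resolve midnight on that date) can also be sketched with the standard library instead of pytz. This assumes Python 3.9+ with zoneinfo and an aware input datetime; the function name local_midnight_utc is hypothetical.

```python
from datetime import datetime, time, timezone
from zoneinfo import ZoneInfo


def local_midnight_utc(point_in_time, zone_name):
    """Return, in UTC, the start of the local day that contains point_in_time."""
    tz = ZoneInfo(zone_name)
    local_date = point_in_time.astimezone(tz).date()
    # Attach the zone to midnight on that date; the offset is resolved for
    # midnight itself, not for the original time of day.
    local_midnight = datetime.combine(local_date, time.min, tzinfo=tz)
    return local_midnight.astimezone(timezone.utc)


point_in_time = datetime(2017, 3, 12, 16, tzinfo=timezone.utc)
print(local_midnight_utc(point_in_time, "US/Pacific"))
```

Because the result is always the start of the local day containing the input, it is the most recent local midnight at or before that instant, which is what the question's EDIT asks for.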
Named timezones such as "US/Pacific" are by definition daylight-savings aware. If you wish to use a fixed non-daylight-savings-aware offset from GMT you can use the timezones "Etc/GMT+*", where * is the desired offset. For example for US Pacific Standard Time you would use "Etc/GMT+8":
import pandas as pd
point_in_time = pd.to_datetime('2017-03-12 16:00:00').tz_localize('UTC')
# not what you want
local_time = point_in_time.tz_convert("US/Pacific")
(local_time - pd.Timedelta(hours=local_time.hour)).tz_convert('UTC')
# Timestamp('2017-03-12 07:00:00+0000', tz='UTC')
# what you want
local_time = point_in_time.tz_convert("Etc/GMT+8")
(local_time - pd.Timedelta(hours=local_time.hour)).tz_convert('UTC')
# Timestamp('2017-03-12 08:00:00+0000', tz='UTC')
See the docs at http://pvlib-python.readthedocs.io/en/latest/timetimezones.html for more info.
EDIT: Now that I think about it, midnight PST will always be 8 AM UTC, so you could simplify this as
if point_in_time.hour >= 8:
    local_midnight = point_in_time - pd.Timedelta(hours=point_in_time.hour - 8)
else:
    local_midnight = point_in_time - pd.Timedelta(hours=point_in_time.hour + 16)

Python datetime.datetime from time.struct_time difference

I use feedparser to grab the entries from some RSS feeds.
The entries have a published_parsed field, which feedparser parses into a time.struct_time.
I use this function to convert the time.struct_time into a datetime.datetime:
import datetime, time
def publishParsedToDatetime(structTime):
    return datetime.datetime.fromtimestamp(time.mktime(structTime))
Input (struct_time):
time.struct_time(tm_year=2015, tm_mon=8, tm_mday=1, tm_hour=20, tm_min=28, tm_sec=33, tm_wday=5, tm_yday=213, tm_isdst=0)
Output (datetime):
2015-08-01 21:28:33
I see a problem that could be timezone related: there is a 1 hour difference between the struct_time and the datetime values.
The struct_time value is UTC.
But the datetime.datetime value is neither UTC, nor my current timezone (CET, Central European Time, we observe Summertime, so we have UTC + 2hrs at the moment).
How can this be explained?
Actually, as explained in the documentation for datetime.fromtimestamp, it converts to local time by default:
Return the local date and time corresponding to the POSIX timestamp, such as is returned by time.time(). If optional argument tz is None or not specified, the timestamp is converted to the platform’s local date and time, and the returned datetime object is naive
The 1 hour difference is then explained by the field tm_isdst=0, which tells mktime not to apply daylight saving time (even though your local time zone is currently observing it).
To see this more clearly, we construct two test cases
import time, datetime
# this is how your time object was constructed before
tm_isdst = 0
t = time.mktime((2015, 8, 1, 20, 28, 33, 5, 213, tm_isdst))
print("Old conversion: {0}".format(datetime.datetime.fromtimestamp(t)))
# this is what happens if you let mktime "divine" a time zone
tm_isdst = -1
t = time.mktime((2015, 8, 1, 20, 28, 33, 5, 213, tm_isdst))
print("New conversion: {0}".format(datetime.datetime.fromtimestamp(t)))
The output of this is as follows:
Old conversion: 2015-08-01 21:28:33
New conversion: 2015-08-01 20:28:33
The problem then, you see, is that the structTime object being passed to your publishParsedToDatetime has tm_isdst=0 but the time stamp you wanted to parse was for a DST time zone.
As you have already noted in another comment, the proper solution to this is probably to always use UTC in your back-end code, and only do time zone handling when showing the time to the user, or when reading user input.
calendar.timegm takes a UTC timetuple as input and returns its timestamp.
In contrast, time.mktime takes a local timetuple as input and returns its (UTC) timestamp. All timestamps represent seconds since the Epoch, 1970-1-1 00:00:00 UTC.
utcfromtimestamp takes a timestamp as input and converts it to a naive (i.e. timezone-unaware) UTC datetime.
fromtimestamp takes the same timestamp and converts it to the corresponding naive local datetime.
Since your timetuples (e.g. structTime) are UTC timetuples, you should use calendar.timegm, not time.mktime, to find the correct timestamp.
Once you have the correct timestamp, fromtimestamp will return the corresponding naive local datetime.
import time
import calendar
import datetime as DT
timetuple = (2015, 8, 1, 20, 28, 33, 5, 213, 0)
timestamp = calendar.timegm(timetuple)
naive_local_date = DT.datetime.fromtimestamp(timestamp)
print('Naive local: {}'.format(naive_local_date))
yields
Naive local: 2015-08-01 22:28:33
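If the goal is to keep everything in UTC, as suggested earlier in this thread, the timestamp from calendar.timegm can be turned into an aware UTC datetime instead of a naive local one. A minimal sketch, assuming the struct_time really is in UTC; the function name utc_struct_time_to_datetime is hypothetical.

```python
import calendar
import time
from datetime import datetime, timezone


def utc_struct_time_to_datetime(struct_time_utc):
    """Convert a UTC time.struct_time to a timezone-aware UTC datetime."""
    timestamp = calendar.timegm(struct_time_utc)  # interprets the tuple as UTC
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)


published = time.struct_time((2015, 8, 1, 20, 28, 33, 5, 213, 0))
aware = utc_struct_time_to_datetime(published)
print(aware)               # stays at 20:28:33 UTC regardless of the machine's zone
print(aware.astimezone())  # converted to the machine's local zone for display
```

Keeping the value aware means the conversion to local time happens only at display time, and the stored value never depends on tm_isdst.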

How can I convert data into "ddmmyyyy hh:mm:ss"?

All that I have is the number 223 (which is the number of days from Jan 01, 2012) and the time at which the event occurred (for example: 09, 55, 56.38 = (hh, mm, ss)).
In Excel I can get a serial number - in the format 10-Aug-12 09:55:56 - with these four numbers.
In Python I've been having some trouble doing the same. Does anyone have any idea which functions I could use?
The timedelta class in Python's datetime module lets you create a time difference, in days (or other units), which you can add to/subtract from datetime objects to get a new date.
(Note: the datetime module contains a datetime class. Yup.)
So, you could construct a datetime object using 1st Jan 2012 and the hour, minute and second values you've got, and then add a timedelta of 223 days to it.
To get that datetime as a string in your desired format, the strftime method on the datetime object is your friend.
Putting it all on one line:
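from datetime import datetime, timedelta
# timedelta(223) lands on 11 Aug 2012; use timedelta(222) if 223 is the day of the year (1 Jan = day 1)
# %b is the portable directive for the abbreviated month name (%h is a platform-specific alias)
serial_number = (datetime(2012, 1, 1, 9, 55, 56) + timedelta(223)).strftime('%d-%b-%y %H:%M:%S')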
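Adding timedelta(223) to 1 January 2012 gives 11 August, whereas the question's Excel result is 10-Aug-12. That suggests 223 is a 1-based day of the year, which is the assumption the following sketch makes. The function name day_number_to_datetime is hypothetical; the sketch also keeps the fractional seconds from the question and formats the result as "ddmmyyyy hh:mm:ss".

```python
from datetime import datetime, timedelta


def day_number_to_datetime(year, day_of_year, hour, minute, second):
    """Build a datetime from a 1-based day of the year and a time of day."""
    start_of_year = datetime(year, 1, 1)
    # Day 1 is 1 January, so only day_of_year - 1 days are added.
    return start_of_year + timedelta(days=day_of_year - 1, hours=hour,
                                     minutes=minute, seconds=second)


event = day_number_to_datetime(2012, 223, 9, 55, 56.38)
print(event.strftime("%d%m%Y %H:%M:%S"))    # format asked for in the title
print(event.strftime("%d-%b-%y %H:%M:%S"))  # Excel-like format from the question
```

If 223 is instead meant as days elapsed since 1 January, passing day_of_year + 1 (or dropping the "- 1") gives the other interpretation.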
