How do I quickly compare dates from parsed XML in Python?

I'm playing with a function in Python 3 that queries small blocks of XML from the eBird API, parsing them with minidom. The function locates and compares dates from two requested blocks of XML, returning the most recent. The code below does its job, but I wanted to ask if there was a simpler way of doing this (the for loops seem unnecessary since each bit of XML will only ever have one date, and comparing pieces of the returned string bit by bit seems clunky). Is there a faster way to produce the same result?
from xml.dom import minidom
import requests
def report(owl):
    #GETS THE MOST RECENT OBSERVATION FROM BOTH USA AND CANADA
    usa_xml = requests.get('http://ebird.org/ws1.1/data/obs/region_spp/recent?rtype=country&r=US&sci=surnia%20ulula&back=30&maxResults=1&includeProvisional=true')
    canada_xml = requests.get('http://ebird.org/ws1.1/data/obs/region_spp/recent?rtype=country&r=CA&sci=surnia%20ulula&back=30&maxResults=1&includeProvisional=true')
    usa_parsed = minidom.parseString(usa_xml.text)
    canada_parsed = minidom.parseString(canada_xml.text)
    #COMPARES THE RESULTS AND RETURNS THE MOST RECENT
    usa_raw_date = usa_parsed.getElementsByTagName('obs-dt')
    canada_raw_date = canada_parsed.getElementsByTagName('obs-dt')
    for date in usa_raw_date:
        usa_date = str(date.childNodes[0].nodeValue)
    for date in canada_raw_date:
        canada_date = str(date.childNodes[0].nodeValue)
    if int(usa_date[0:4]) > int(canada_date[0:4]):
        most_recent = usa_date
    elif int(usa_date[5:7]) > int(canada_date[5:7]):
        most_recent = usa_date
    elif int(usa_date[8:10]) > int(canada_date[8:10]):
        most_recent = usa_date
    elif int(usa_date[11:13]) > int(canada_date[11:13]):
        most_recent = usa_date
    elif int(usa_date[14:16]) > int(canada_date[14:16]):
        most_recent = usa_date
    else:
        most_recent = canada_date
    return most_recent

Use datetime.datetime.strptime() to parse the dates into datetime.datetime objects (this needs import datetime), then use max() to return the greater value (the most recent):
    usa_date = datetime.datetime.strptime(
        usa_raw_date[-1].childNodes[0].nodeValue, '%Y-%m-%d %H:%M')
    canada_date = datetime.datetime.strptime(
        canada_raw_date[-1].childNodes[0].nodeValue, '%Y-%m-%d %H:%M')
    return max(usa_date, canada_date)
Running this now against the URLs you provided, that results in:
>>> usa_date = datetime.datetime.strptime(
... usa_raw_date[-1].childNodes[0].nodeValue, '%Y-%m-%d %H:%M')
>>> canada_date = datetime.datetime.strptime(
... canada_raw_date[-1].childNodes[0].nodeValue, '%Y-%m-%d %H:%M')
>>> usa_date, canada_date
(datetime.datetime(2014, 5, 5, 11, 0), datetime.datetime(2014, 5, 11, 18, 0))
>>> max(usa_date, canada_date)
datetime.datetime(2014, 5, 11, 18, 0)
This returns a datetime.datetime object; if returning a string is important to you, format the result back into a string before returning it:
    return max(usa_date, canada_date).strftime('%Y-%m-%d %H:%M')
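As a self-contained illustration of the same idea, here is a minimal sketch that skips the network request and works on the timestamp strings directly. The helper name most_recent and the sample values are assumptions made for this example; the format string is the one used above.

```python
from datetime import datetime

# Format of the obs-dt values shown above, e.g. '2014-05-11 18:00'
OBS_FORMAT = '%Y-%m-%d %H:%M'

def most_recent(*date_strings):
    """Return the latest of the given timestamp strings (hypothetical helper)."""
    # key= parses each string for the comparison; max() still returns the original string
    return max(date_strings, key=lambda s: datetime.strptime(s, OBS_FORMAT))

print(most_recent('2014-05-05 11:00', '2014-05-11 18:00'))  # 2014-05-11 18:00
```

Because max() accepts any number of arguments, the same helper would also cover a third region without extra comparison code.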

Related

datetime year out of range

I passed filenames such as abcef-1591282803 into this function and the function worked fine:
def parse_unixtimestamp(filename):
    ts = re.findall(r'\d{10}', filename)[0]
    return ts
However, I then modified the function so that it also works when the timestamp has 13 digits instead of 10, as in file20-name-1591282803734:
def parse_unixtimestamp(filename):
    ts = re.findall(r'\d{13}|\d{10}', filename)[0]
    return ts
It didn't throw an error up to this point. But in the 13-digit case I get ts values like 1591282803734. When I pass such a value to this function to get the year:
def get_dateparts(in_unixtimestamp):
    dt_object = datetime.fromtimestamp(int(in_unixtimestamp))
    date_string = dt_object.strftime("%Y-%m-%d")
    year_string = dt_object.strftime("%Y")
    month_string = dt_object.strftime("%m")
    day_string = dt_object.strftime("%d")
    logger.info(f'year_string: {year_string}')
    result = {"date_string": date_string, "year_string": year_string, "month_string": month_string,
              "day_string": day_string}
    return result
I get an error that:
year 52395 is out of range
I don't get this error when the Unix timestamp returned by parse_unixtimestamp has only 10 digits. How can I modify this function so that it works in both cases?

datetime.fromtimestamp requires the time in seconds. If your string has 13 digits instead of 10 (i.e. it is in milliseconds), divide it by 1000 first:
>>> datetime.fromtimestamp(1591282803734/1000)
datetime.datetime(2020, 6, 4, 11, 0, 3, 734000)
To do this in your function, you could check the length of your timestamp before calling datetime.fromtimestamp.
Change your function as follows:
def get_dateparts(in_unixtimestamp):
    if len(in_unixtimestamp) == 13:
        dt_object = datetime.fromtimestamp(int(in_unixtimestamp)/1000)
    else:
        dt_object = datetime.fromtimestamp(int(in_unixtimestamp))
    date_string = dt_object.strftime("%Y-%m-%d")
    year_string = dt_object.strftime("%Y")
    month_string = dt_object.strftime("%m")
    day_string = dt_object.strftime("%d")
    logger.info(f'year_string: {year_string}')
    result = {"date_string": date_string,
              "year_string": year_string,
              "month_string": month_string,
              "day_string": day_string}
    return result
>>> get_dateparts(parse_unixtimestamp("abcef-1591282803"))
{'date_string': '2020-06-04',
'year_string': '2020',
'month_string': '06',
'day_string': '04'}
>>> get_dateparts(parse_unixtimestamp("file20-name-1591282803734"))
{'date_string': '2020-06-04',
'year_string': '2020',
'month_string': '06',
'day_string': '04'}
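The length check can also be isolated in a small helper so the rest of the function does not need to branch. This is only a sketch: the name timestamp_to_datetime is made up for illustration, and passing tz=timezone.utc is an assumption added here so the result does not depend on the machine's local time zone.

```python
from datetime import datetime, timezone

def timestamp_to_datetime(ts):
    """Convert a 10-digit (seconds) or 13-digit (milliseconds) Unix timestamp string."""
    value = int(ts)
    if len(str(ts)) == 13:
        value /= 1000  # milliseconds -> seconds
    # tz=timezone.utc is an assumption; drop it to get local time as in the answer above
    return datetime.fromtimestamp(value, tz=timezone.utc)

print(timestamp_to_datetime("1591282803").strftime("%Y-%m-%d"))     # 2020-06-04
print(timestamp_to_datetime("1591282803734").strftime("%Y-%m-%d"))  # 2020-06-04
```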

Python/Discord.py remove milliseconds from time.time()

I wrote this for my Python Discord bot (basically a voice activity tracker). Everything works fine, but I want to remove the milliseconds from total_time and get something in the format '%H:%M:%S'.
Is this possible?
Here's a part of the code:
if(before.channel == None):
    join_time = round(time.time())
    userdata["join_time"] = join_time
elif(after.channel == None):
    if(userdata["join_time"] == None): return
    userdata = voice_data[guild_id][new_user]
    leave_time = time.time()
    passed_time = leave_time - userdata["join_time"]
    userdata["total_time"] += passed_time
    userdata["join_time"] = None
And here's the output:
{
"total_time": 7.4658853358879,
}

You can use a datetime.timedelta object, with some caveats.
>>> import datetime as dt
>>> data = {"total_time": 7.4658853358879}
>>> data["total_time"] = str(dt.timedelta(seconds=int(data["total_time"])))
>>> data
{'total_time': '0:00:07'}
If your time is one day or more, or less than zero, the string format starts including days:
>>> str(dt.timedelta(days=1))
'1 day, 0:00:00'
>>> str(dt.timedelta(seconds=-1))
'-1 day, 23:59:59'
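If the days prefix is not wanted, the string can be built by hand with divmod instead. The following is a minimal sketch; format_hms is a hypothetical helper name, and it assumes the duration is non-negative.

```python
def format_hms(total_seconds):
    """Format a non-negative duration in seconds as HH:MM:SS, dropping the fraction."""
    hours, remainder = divmod(int(total_seconds), 3600)
    minutes, seconds = divmod(remainder, 60)
    # Hours are not wrapped at 24, so long sessions stay in a single field
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"

print(format_hms(7.4658853358879))  # 00:00:07
print(format_hms(90061))            # 25:01:01
```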

Parse Timestamp String with Quarter to Python datetime

I am searching for a way to parse a rather unusual timestamp string into a Python datetime object. The problem is that the string includes the corresponding quarter, which does not seem to be supported by datetime.strptime. The format of the string is YYYY/qq/mm/dd/HH/MM, e.g. 1970/Q1/01/01/00/00. I am looking for a function that parses a string in this format and also checks that the quarter is correct for the date.

Question: Datetime String with Quarter to Python datetime
This implements an OOP solution that extends Python's datetime with a directive: %Q.
Possible values: Q1|Q2|Q3|Q4, for example:
data_string = '1970/Q1/01/01/00/00'
# '%Y/%Q/%m/%d/%H/%M'
Note: This depends on the module _strptime class TimeRE and may fail if the internal implementation changes!
from datetime import datetime
class Qdatetime(datetime):
    re_compile = None
    @classmethod
    def _strptime(cls):
        import _strptime
        _class = _strptime.TimeRE
        if not 'strptime_compile' in _class.__dict__:
            setattr(_class, 'strptime_compile', getattr(_class, 'compile'))
            setattr(_class, 'compile', cls.compile)
    def compile(self, format):
        import _strptime
        self = _strptime._TimeRE_cache
        # Add directive %Q
        if not 'Q' in self:
            self.update({'Q': r"(?P<Q>Q[1-4])"})
        Qdatetime.re_compile = self.strptime_compile(format)
        return Qdatetime.re_compile
    def validate(self, quarter):
        # 1970, 1, 1 is the lowest date used in timestamp
        month = [1, 4, 7, 10][quarter - 1]
        day = [31, 30, 30, 31][quarter - 1]
        q_start = datetime(self.year, month, 1).timestamp()
        q_end = datetime(self.year, month + 2, day).timestamp()
        dtt = self.timestamp()
        return dtt >= q_start and dtt <= q_end
    @property
    def quarter(self): return self._quarter
    @quarter.setter
    def quarter(self, data):
        found_dict = Qdatetime.re_compile.match(data).groupdict()
        self._quarter = int(found_dict['Q'][1])
    @property
    def datetime(self):
        return datetime(self.year, self.month, self.day,
                        hour=self.hour, minute=self.minute, second=self.second)
    def __str__(self):
        return 'Q{} {}'.format(self.quarter, super().__str__())
    @classmethod
    def strptime(cls, data_string, _format):
        cls._strptime()
        dt = super().strptime(data_string, _format)
        dt.quarter = data_string
        if not dt.validate(dt.quarter):
            raise ValueError("time data '{}' does not match quarter 'Q{}'"\
                             .format(data_string, dt.quarter))
        return dt
Usage:
for data_string in ['1970/Q1/01/01/00/00',
                    '1970/Q3/12/31/00/00',
                    '1970/Q2/05/05/00/00',
                    '1970/Q3/07/01/00/00',
                    '1970/Q4/12/31/00/00',
                    ]:
    try:
        d = Qdatetime.strptime(data_string, '%Y/%Q/%m/%d/%H/%M')
    except ValueError as e:
        print(e)
    else:
        print(d, d.datetime)
Output:
Q1 1970-01-01 00:00:00 1970-01-01 00:00:00
time data '1970/Q3/12/31/00/00' does not match quarter 'Q3'
Q2 1970-05-05 00:00:00 1970-05-05 00:00:00
Q3 1970-07-01 00:00:00 1970-07-01 00:00:00
Q4 1970-12-31 00:00:00 1970-12-31 00:00:00
Tested with Python: 3.6 - verified with Python 3.8 source
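If depending on the private _strptime module is a concern, the quarter can also be split off and checked by hand. This is a different, simpler technique than the subclass above, not a drop-in replacement: it returns a plain datetime, and the function name parse_quarter_timestamp is an assumption made for this sketch.

```python
from datetime import datetime

def parse_quarter_timestamp(data_string):
    """Parse 'YYYY/Qn/mm/dd/HH/MM' and check that Qn matches the month."""
    # Split into year, quarter token, and the remaining 'mm/dd/HH/MM' part
    year, quarter, rest = data_string.split('/', 2)
    if len(quarter) != 2 or quarter[0] != 'Q' or quarter[1] not in '1234':
        raise ValueError("invalid quarter {!r} in {!r}".format(quarter, data_string))
    dt = datetime.strptime(year + '/' + rest, '%Y/%m/%d/%H/%M')
    expected = (dt.month - 1) // 3 + 1  # months 1-3 -> Q1, 4-6 -> Q2, ...
    if int(quarter[1]) != expected:
        raise ValueError("time data {!r} does not match quarter {!r}".format(
            data_string, quarter))
    return dt

for s in ['1970/Q1/01/01/00/00', '1970/Q3/12/31/00/00']:
    try:
        print(parse_quarter_timestamp(s))
    except ValueError as e:
        print(e)
```

Deriving the quarter from the month avoids comparing timestamps, so the check does not depend on the time of day or the local time zone.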

How to workout if a datetime is older than x months in Python

I want to find out if an entry has been updated in the last 6 months.
This is what I have tried:
def is_old(self):
    """
    Is older than 6 months (since last update)
    """
    time_threshold = datetime.date.today() - datetime.timedelta(6*365/12)
    if self.last_update < time_threshold:
        return False
    return True
But I get the error:
if self.last_update < time_threshold:
TypeError: can't compare datetime.datetime to datetime.date

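Pass the number of days with the days keyword. The first positional argument of timedelta is already days, so this does not change the result, but it makes the intent explicit: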
>>> import datetime
>>> datetime.date.today() - datetime.timedelta(days=30)
datetime.date(2014, 5, 26)
>>> datetime.date.today() - datetime.timedelta(days=180)
datetime.date(2013, 12, 27)
>>> datetime.date.today() - datetime.timedelta(days=6*365/12)
datetime.date(2013, 12, 25)
As for your actual error, TypeError: can't compare datetime.datetime to datetime.date, last_update is a datetime, so convert it with .date() before comparing:
def is_old(self):
    time_threshold = datetime.date.today() - datetime.timedelta(days=6*365/12)
    #The following code can be simplified, i shall let you figure that out yourself.
    if self.last_update and self.last_update.date() < time_threshold:
        return False
    return True

Your database field last_update is a datetime field and you are comparing it against a date, hence the error. Instead of datetime.date.today(), use datetime.datetime.now(). Better still, use django.utils.timezone, which respects the TIME_ZONE in settings:
import datetime
from django.utils import timezone
def is_old(self):
    """
    Is older than 6 months (since last update)
    """
    time_threshold = timezone.now() - datetime.timedelta(6*365/12)
    return bool(self.last_update > time_threshold)

You can use the external module dateutil:
from datetime import date
from dateutil.relativedelta import relativedelta
def is_old(last_update):
    time_threshold = date.today() - relativedelta(months=6)
    return last_update < time_threshold
That assumes last_update is a date object.
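If installing dateutil is not an option, calendar-month arithmetic can be approximated with the standard library alone. The sketch below is an assumption-laden illustration, not dateutil's implementation: months_before is a hypothetical helper, the day is clamped to the end of shorter months, and is_old here takes the current time as an explicit argument so it can be tested.

```python
import calendar
from datetime import datetime

def months_before(moment, months):
    """Return `moment` shifted back by whole calendar months (hypothetical helper)."""
    # Count months from year 0 so the subtraction can cross year boundaries
    index = moment.year * 12 + (moment.month - 1) - months
    year, month = divmod(index, 12)
    month += 1
    # Clamp the day, e.g. 31 August minus six months -> 28/29 February
    day = min(moment.day, calendar.monthrange(year, month)[1])
    return moment.replace(year=year, month=month, day=day)

def is_old(last_update, now, months=6):
    """True if last_update lies more than `months` calendar months before now."""
    return last_update < months_before(now, months)

print(months_before(datetime(2014, 8, 31), 6))                  # 2014-02-28 00:00:00
print(is_old(datetime(2013, 12, 1), datetime(2014, 6, 25)))     # True
```

Both arguments must be of the same kind (two datetime objects, or two date objects), otherwise the comparison raises the TypeError from the question.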

Python pass tzinfo to naive datetime without pytz

I've been struggling for way too long on dates/timezones in Python and was thinking someone could give me a hand here.
Basically I want to convert to UTC, taking DST changes into account.
I've created the following tzinfo class from one of the Python tutorials (not 100% accurate, I know, but it doesn't need to be):
from datetime import tzinfo, timedelta, datetime
ZERO = timedelta(0)
HOUR = timedelta(hours=1)
def first_sunday_on_or_after(dt):
    days_to_go = 6 - dt.weekday()
    if days_to_go:
        dt += timedelta(days_to_go)
    return dt
DSTSTART_2007 = datetime(1, 3, 8, 2)
DSTEND_2007 = datetime(1, 11, 1, 1)
DSTSTART_1987_2006 = datetime(1, 4, 1, 2)
DSTEND_1987_2006 = datetime(1, 10, 25, 1)
DSTSTART_1967_1986 = datetime(1, 4, 24, 2)
DSTEND_1967_1986 = DSTEND_1987_2006
class USTimeZone(tzinfo):
    def __init__(self, hours, reprname, stdname, dstname):
        self.stdoffset = timedelta(hours=hours)
        self.reprname = reprname
        self.stdname = stdname
        self.dstname = dstname
    def __repr__(self):
        return self.reprname
    def tzname(self, dt):
        if self.dst(dt):
            return self.dstname
        else:
            return self.stdname
    def utcoffset(self, dt):
        return self.stdoffset + self.dst(dt)
    def dst(self, dt):
        if dt is None or dt.tzinfo is None:
            # An exception may be sensible here, in one or both cases.
            # It depends on how you want to treat them. The default
            # fromutc() implementation (called by the default astimezone()
            # implementation) passes a datetime with dt.tzinfo is self.
            return ZERO
        assert dt.tzinfo is self
        # Find start and end times for US DST. For years before 1967, return
        # ZERO for no DST.
        if 2006 < dt.year:
            dststart, dstend = DSTSTART_2007, DSTEND_2007
        elif 1986 < dt.year < 2007:
            dststart, dstend = DSTSTART_1987_2006, DSTEND_1987_2006
        elif 1966 < dt.year < 1987:
            dststart, dstend = DSTSTART_1967_1986, DSTEND_1967_1986
        else:
            return ZERO
        start = first_sunday_on_or_after(dststart.replace(year=dt.year))
        end = first_sunday_on_or_after(dstend.replace(year=dt.year))
        # Can't compare naive to aware objects, so strip the timezone from
        # dt first.
        if start <= dt.replace(tzinfo=None) < end:
            return HOUR
        else:
            return ZERO
On the other side I have an arbitrary date object in EST, and I want to know the number of hours they differ by taking into account DST.
I've tried something like this:
>>> Eastern = ustimezone.USTimeZone(-5, "Eastern", "EST", "EDT")
>>> x = datetime.date.today() # I actually get an arbitrary date but this is for the example
>>> x_dt = datetime.datetime.combine(x, datetime.time())
>>> x_dt_tz = x_dt.astimezone(Eastern)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: astimezone() cannot be applied to a naive datetime
I've seen several posts that say to use localize from the pytz module, but unfortunately I am not able to use additional modules, so pytz is not an option.
Does anyone know how I can turn this naive datetime into a timezone-aware object without using pytz?

For what it's worth, the answer @skyl provided is more-or-less equivalent to what pytz does.
Here is the relevant pytz source. It just calls replace on the datetime object with the tzinfo kwarg:
def localize(self, dt, is_dst=False):
    '''Convert naive time to local time'''
    if dt.tzinfo is not None:
        raise ValueError('Not naive datetime (tzinfo is already set)')
    return dt.replace(tzinfo=self)

Use x_dt.replace(tzinfo=Eastern) (found from this Google Groups thread).
x_dt.replace(tzinfo=Eastern).utcoffset() returns datetime.timedelta(-1, 72000), which corresponds to -4 hours, i.e. DST is taken into account (from a comment on the question).
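The replace-then-convert pattern can be tried without the USTimeZone class. The sketch below uses the standard library's datetime.timezone with a fixed UTC-5 offset as a stand-in for Eastern; that is an assumption for illustration only and it does not handle DST, which is what the tzinfo subclass from the question is for.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC-5 offset standing in for Eastern Standard Time (no DST handling)
EST = timezone(timedelta(hours=-5), "EST")

naive = datetime(2014, 1, 15, 12, 0)
aware = naive.replace(tzinfo=EST)        # attach the zone; clock fields are unchanged
as_utc = aware.astimezone(timezone.utc)  # astimezone() now works on the aware value

print(aware.utcoffset())  # -1 day, 19:00:00
print(as_utc)             # 2014-01-15 17:00:00+00:00
```

Note that replace only labels the existing wall-clock time with a zone; it is astimezone that actually shifts the fields.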
