dateutil: default to the last occurrence of a recognized part, not the next - python

I am using dateutil.parser.parse to parse date strings which might contain partial information. If some information is not present, parse can take a default keyword argument from which it fills in any missing fields. If default is omitted, the current date at midnight is used.
For a case like dateutil.parser.parse("Thursday"), this means it returns the date of the next Thursday (or today, if today is a Thursday). However, I need it to return the date of the last Thursday (including today, if today happens to be a Thursday).
So, assuming today == datetime.datetime(2018, 2, 20) (a Tuesday), I would like to get all of these asserts to be true:
from dateutil import parser
from datetime import datetime
def parse(date_str, default=None):
    # this needs to be modified
    return parser.parse(date_str, default=default)
today = datetime(2018, 2, 20)
assert parse("Tuesday", default=today) == today  # True
assert parse("Thursday", default=today) == datetime(2018, 2, 15)  # False
assert parse("Jan 31", default=today) == datetime(2018, 1, 31)  # True
assert parse("December 10", default=today) == datetime(2017, 12, 10)  # False
Is there an easy way to achieve this? With the current parse function only the first and third assert would pass.

Here's your modified code (code.py):
#!/usr/bin/env python3
import sys
from dateutil import parser
from datetime import datetime, timedelta
today = datetime(2018, 2, 20)
data = [
    ("Tuesday", today, today),
    ("Thursday", datetime(2018, 2, 15), today),
    ("Jan 31", datetime(2018, 1, 31), today),
    ("December 10", datetime(2017, 12, 10), today),
]
def parse(date_str, default=None):
    # this needs to be modified
    return parser.parse(date_str, default=default)
def _days_in_year(year):
    # Feb 29 only exists in leap years
    try:
        datetime(year, 2, 29)
    except ValueError:
        return 365
    return 366
def parse2(date_str, default=None):
    dt = parser.parse(date_str, default=default)
    if default is not None:
        weekday_strs = [day_str.lower() for day_tuple in parser.parserinfo.WEEKDAYS for day_str in day_tuple]
        if date_str.lower() in weekday_strs:
            # compare full dates, so a weekday that wrapped into next week is rolled back too
            if dt.date() > default.date():
                dt -= timedelta(days=7)
        else:
            # compare against the default passed in, not the global "today"
            if (dt.month > default.month) or ((dt.month == default.month) and (dt.day > default.day)):
                dt -= timedelta(days=_days_in_year(dt.year))
    return dt
def print_stats(parse_func):
    print("\nPrinting stats for \"{:s}\"".format(parse_func.__name__))
    for triple in data:
        d = parse_func(triple[0], default=triple[2])
        print(" [{:s}] [{:s}] [{:s}] [{:s}]".format(triple[0], str(d), str(triple[1]), "True" if d == triple[1] else "False"))
if __name__ == "__main__":
    print("Python {:s} on {:s}\n".format(sys.version, sys.platform))
    print_stats(parse)
    print_stats(parse2)
Notes:
I changed the structure of the code "a bit" to parametrize it, so that if a change is needed (e.g. a new example to be added), the changes should be minimal
Instead of asserts, I added a function (print_stats) that prints the results (instead of raising AssertionError and exiting the program if things don't match)
Takes an argument (parse_func) which is a function that does the parsing (e.g. parse)
Uses some globally declared data (data) together with the (above) function
data - is a list of triples, where each triple contains:
Text to be converted
Expected datetime ([Python 3.Docs]: datetime Objects) to be yielded by the conversion
default argument to be passed to the parsing function (parse_func)
parse2 function (an improved version of parse):
Accepts 2 types of date strings:
Weekday name
Month / Day (unordered)
Does the regular parsing, and if the converted object comes after the one passed as the default argument (that is determined by comparing the appropriate attributes of the 2 objects), it subtracts a period (take a look at [Python 3.Docs]: timedelta Objects):
"Thursday" comes after "Tuesday", so it subtracts the number of days in a week (7)
"December 10" comes after "February 20", so it subtracts the number of days in the year*
weekday_strs: I'd better explain it by example:
>>> parser.parserinfo.WEEKDAYS
[('Mon', 'Monday'), ('Tue', 'Tuesday'), ('Wed', 'Wednesday'), ('Thu', 'Thursday'), ('Fri', 'Friday'), ('Sat', 'Saturday'), ('Sun', 'Sunday')]
>>> [day_str.lower() for day_tuple in parser.parserinfo.WEEKDAYS for day_str in day_tuple]
['mon', 'monday', 'tue', 'tuesday', 'wed', 'wednesday', 'thu', 'thursday', 'fri', 'friday', 'sat', 'saturday', 'sun', 'sunday']
Flattens parser.parserinfo.WEEKDAYS
Converts strings to lowercase (for simplifying comparisons)
_days_in_year* - as you probably guessed, returns the number of days in a year (I couldn't simply subtract 365 because leap years might mess things up):
>>> dt = datetime(2018, 3, 1)
>>> dt
datetime.datetime(2018, 3, 1, 0, 0)
>>> dt - timedelta(365)
datetime.datetime(2017, 3, 1, 0, 0)
>>> dt = datetime(2016, 3, 1)
>>> dt
datetime.datetime(2016, 3, 1, 0, 0)
>>> dt - timedelta(365)
datetime.datetime(2015, 3, 2, 0, 0)
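A caveat on the year rollback: the gap between the same calendar date in two consecutive years depends on whether a Feb 29 falls between them, not only on whether dt.year is a leap year. Subtracting _days_in_year(dt.year) can therefore still land a day off, for example for a January date in the year right after a leap year. A minimal sketch of an alternative, assuming it is acceptable to map Feb 29 to Feb 28 when the previous year has no leap day (one_year_earlier is a hypothetical helper, not part of dateutil):

```python
from datetime import datetime, timedelta

def one_year_earlier(dt):
    """Return dt moved back one calendar year (hypothetical helper)."""
    try:
        return dt.replace(year=dt.year - 1)
    except ValueError:
        # dt is Feb 29 and the previous year has no such day
        return dt.replace(year=dt.year - 1, day=28)

jan = datetime(2017, 1, 15)
print(jan - timedelta(days=365))  # 2016-01-16 00:00:00 (one day off)
print(one_year_earlier(jan))      # 2016-01-15 00:00:00
```

Inside parse2, this could stand in for the timedelta subtraction in the month/day branch.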
Output:
(py35x64_test) E:\Work\Dev\StackOverflow\q048884480>"e:\Work\Dev\VEnvs\py35x64_test\Scripts\python.exe" code.py
Python 3.5.4 (v3.5.4:3f56838, Aug 8 2017, 02:17:05) [MSC v.1900 64 bit (AMD64)] on win32
Printing stats for "parse"
[Tuesday] [2018-02-20 00:00:00] [2018-02-20 00:00:00] [True]
[Thursday] [2018-02-22 00:00:00] [2018-02-15 00:00:00] [False]
[Jan 31] [2018-01-31 00:00:00] [2018-01-31 00:00:00] [True]
[December 10] [2018-12-10 00:00:00] [2017-12-10 00:00:00] [False]
Printing stats for "parse2"
[Tuesday] [2018-02-20 00:00:00] [2018-02-20 00:00:00] [True]
[Thursday] [2018-02-15 00:00:00] [2018-02-15 00:00:00] [True]
[Jan 31] [2018-01-31 00:00:00] [2018-01-31 00:00:00] [True]
[December 10] [2017-12-10 00:00:00] [2017-12-10 00:00:00] [True]
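For the weekday case, the rollback can also be computed directly from the default date with modular arithmetic, without parsing first. A minimal sketch, assuming the target weekday is already known as an integer in datetime's convention (Monday is 0); last_weekday is a hypothetical name:

```python
from datetime import datetime, timedelta

def last_weekday(default, weekday):
    """Most recent date on or before `default` that falls on `weekday` (Mon=0 .. Sun=6)."""
    days_back = (default.weekday() - weekday) % 7
    return default - timedelta(days=days_back)

today = datetime(2018, 2, 20)   # a Tuesday
print(last_weekday(today, 1))   # 2018-02-20 00:00:00 (Tuesday: today itself)
print(last_weekday(today, 3))   # 2018-02-15 00:00:00 (Thursday of the previous week)
```

The modulo keeps days_back in the range 0 to 6, so the result is never in the future and is today itself when the weekdays match.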

Related

I need to find the difference in years between today and a list of dates

I am trying to write a simple function to compare a list of dates to today in order to find dates that are more than 13 years old. For the first part of the challenge I’m just trying to write the function that will compare the dates to find the difference between today and each date in the birthdays list.
import datetime
birthdays = [
datetime.datetime(2012, 4, 29),
datetime.datetime(2006, 8, 9),
datetime.datetime(1978, 5, 16),
datetime.datetime(1981, 8, 15),
datetime.datetime(2001, 7, 4),
datetime.datetime(1999, 12, 30)
]
today = datetime.datetime.today()
def is_over_13(dt):
    Diff = (today - dt)
    Return (diff)
Is_over_13(birthdays)
The problem I seem to be facing is that I’m trying to compare a datetime object to a list. So my thinking is that I need to be able to do something to the list to make it compatible to compare? Or the other way round.
I’m learning Python and this is step 1 of a specific code challenge so I can’t use lambda or panda or libraries. I’m expected to do it using functional chaining or comprehensions.
Thanks in advance
I think you need a loop.
for dt in birthdays:
    diff = today - dt
    print(diff)
You can loop over the list of datetimes and compare each one with today.
This function returns True if at least one date is more than 13 years old.
def is_it_over_13(birthday_list):
    for date in birthday_list:
        if abs(today - date) > datetime.timedelta(days=13*365):
            return True
    # only give up after every date has been checked
    return False
import datetime
birthdays = [
datetime.datetime(2012, 4, 29),
datetime.datetime(2006, 8, 9),
datetime.datetime(1978, 5, 16),
datetime.datetime(1981, 8, 15),
datetime.datetime(2001, 7, 4),
datetime.datetime(1999, 12, 30)
]
def is_over_13(dt):
    today = datetime.datetime.today()
    diff = (today - dt).days
    # 4745 == 13*365
    if diff > 4745:
        return dt
    # falls through and returns None for dates that are not old enough
for i in birthdays:
    print(is_over_13(i))
You have a few tweaks to make.
import datetime
birthdays = [
datetime.datetime(2012, 4, 29),
datetime.datetime(2006, 8, 9),
datetime.datetime(1978, 5, 16),
datetime.datetime(1981, 8, 15),
datetime.datetime(2001, 7, 4),
datetime.datetime(1999, 12, 30)
]
today = datetime.datetime.today()
def is_over_13(dt):
    diff = (today - dt)  # you have capital D, need to match variable name with lower case.
    return diff  # Return should be return
for birthday in birthdays:  # loop over each day to check
    print(is_over_13(birthday))  # you have capital I, needs to match your function name with lower case i
Also, your function name says is_over_13, which implies a True or False return value, I think. You might want to take the diff value, compare it to 13 years, and return True or False.
This might look like
import datetime
birthdays = [
datetime.datetime(2012, 4, 29),
datetime.datetime(2006, 8, 9),
datetime.datetime(1978, 5, 16),
datetime.datetime(1981, 8, 15),
datetime.datetime(2001, 7, 4),
datetime.datetime(1999, 12, 30)
]
today = datetime.datetime.today()
DAYS_IN_YEAR = 365
YEARS = 13
def is_over_13(dt):
    diff = (today - dt).days  # get the days for your diff
    return diff > (DAYS_IN_YEAR * YEARS)  # return if greater than 13 or not
for birthday in birthdays:
    print(is_over_13(birthday))
This will get you started. You need a loop. In this case, I used a comprehension:
import datetime
birthdays = [
datetime.datetime(2012, 4, 29),
datetime.datetime(2006, 8, 9),
datetime.datetime(1978, 5, 16),
datetime.datetime(1981, 8, 15),
datetime.datetime(2001, 7, 4),
datetime.datetime(1999, 12, 30)
]
today = datetime.datetime.today()
>>> [f'{today.date()} - {bd.date()}>13: {((today-bd).days)>365.2425*13}' for bd in birthdays]
['2022-01-05 - 2012-04-29>13: False', '2022-01-05 - 2006-08-09>13: True', '2022-01-05 - 1978-05-16>13: True', '2022-01-05 - 1981-08-15>13: True', '2022-01-05 - 2001-07-04>13: True', '2022-01-05 - 1999-12-30>13: True']
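Because of leap years, a timedelta does not express differences in years; it stores days, seconds and microseconds. The comprehension above uses a common, but less than perfect, shortcut: it treats a year as 365.2425 days (the mean Gregorian year) and compares the day difference against 13 of those. That is usually accurate, but it can be off by a day right around a birthday.
To get 100% accuracy, compare calendar dates instead: add the age to the birth year, and if the birthday is Feb 29 and the target year is not a leap year, fall back to Feb 28 of that year.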
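To do that, use this function:
def is_older_than(bd, age, day=None):
    if day is None:
        # evaluate "today" at call time, not once at definition time
        day = datetime.datetime.today()
    try:
        age_date = bd.replace(year=bd.year + age)
    except ValueError:
        # bd is Feb 29 and the target year is not a leap year
        age_date = bd.replace(year=bd.year + age, day=28)
    return age_date <= day
Then rerun the comprehension, which is now accurate to the day: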
>>> [f'{today.date()} - {bd.date()}>13: {is_older_than(bd, 13)}' for bd in birthdays]
['2022-01-05 - 2012-04-29>13: False', '2022-01-05 - 2006-08-09>13: True', '2022-01-05 - 1978-05-16>13: True', '2022-01-05 - 1981-08-15>13: True', '2022-01-05 - 2001-07-04>13: True', '2022-01-05 - 1999-12-30>13: True']
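If the goal is the difference in whole years itself rather than a yes/no answer, it can be computed from the calendar fields directly, which sidesteps day-count approximations. A minimal sketch (age_in_years is a hypothetical helper; the reference date is fixed to 2022-01-05 here only so the output is reproducible):

```python
import datetime

def age_in_years(bd, today):
    """Completed years between bd and today."""
    # Subtract one if this year's birthday has not happened yet.
    # Note: a Feb 29 birthday counts from Mar 1 in non-leap years here.
    before_birthday = (today.month, today.day) < (bd.month, bd.day)
    return today.year - bd.year - before_birthday

birthdays = [
    datetime.datetime(2012, 4, 29),
    datetime.datetime(2006, 8, 9),
    datetime.datetime(1978, 5, 16),
]
today = datetime.datetime(2022, 1, 5)
ages = [age_in_years(bd, today) for bd in birthdays]
print(ages)  # [9, 15, 43]
# Keep only those who have reached their 13th birthday.
reached_13 = [bd for bd in birthdays if age_in_years(bd, today) >= 13]
```

Since this uses only comprehensions and the standard library, it should fit the constraints stated in the question.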

How to get all dates for a given year and calendar week in Python?

Say you have a year "2017" and an isocalendar week "13". How would you get all dates corresponding to this year and calendar week efficiently (with use of a library like datetime or calendar) as a list in Python?
Not sure this is what you want. I played around with calendar module and this is the best I managed to do, let me know:
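import calendar
def get_week(y, w):
    cal = calendar.Calendar()  # weeks start on Monday by default
    weeks = [
        tuple(week)
        for row in cal.yeardatescalendar(y) for month in row for week in month
        if week[0].year == y
    ]
    # weeks spanning two months appear twice, so deduplicate before sorting
    weeks = sorted(set(weeks))
    return list(weeks[w-1])
get_week(2017, 13)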
Output:
[datetime.date(2017, 3, 27),
datetime.date(2017, 3, 28),
datetime.date(2017, 3, 29),
datetime.date(2017, 3, 30),
datetime.date(2017, 3, 31),
datetime.date(2017, 4, 1),
datetime.date(2017, 4, 2)]
Or with datetime.isocalendar() function:
import datetime as dt
def get_week(y, w):
    first = next(
        (dt.date(y, 1, 1) + dt.timedelta(days=i)
         for i in range(367)
         if (dt.date(y, 1, 1) + dt.timedelta(days=i)).isocalendar()[1] == w))
    return [first + dt.timedelta(days=i) for i in range(7)]
get_week(2017, 13)
This ought to do it.
import datetime
def get_week(year: int, week: int):
    start = datetime.datetime(year=year, month=1, day=1)
    # Week-1 correction:
    # source: https://www.calendar-week.org/
    # "In Europe, the first calendar week of the year is the week that
    # contains four days of the new year."
    if start.weekday() >= 4:
        # Jan 1 is Fri/Sat/Sun: week 1 starts on the following Monday
        start += datetime.timedelta(days=7 - start.weekday())
    else:
        # otherwise week 1 starts on the Monday on or before Jan 1
        start -= datetime.timedelta(days=start.weekday())
    start += datetime.timedelta(days=7 * (week - 1))
    return [start + datetime.timedelta(days=i) for i in range(7)]
This returns a list of datetime objects. To get their string representation:
print([str(d.date()) for d in get_week(2017, 2)])
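On Python 3.8 and later, the standard library can do the ISO lookup directly via date.fromisocalendar, which also covers years whose first ISO week starts in the previous December. A minimal sketch (iso_week_dates is a hypothetical name):

```python
import datetime

def iso_week_dates(year, week):
    """All seven dates (Monday..Sunday) of the given ISO year and week."""
    monday = datetime.date.fromisocalendar(year, week, 1)
    return [monday + datetime.timedelta(days=i) for i in range(7)]

print(iso_week_dates(2017, 13)[0])   # 2017-03-27
print(iso_week_dates(2017, 13)[-1])  # 2017-04-02
```

An invalid week number for the given year raises ValueError rather than silently returning dates from a neighbouring year.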

Convert integer date format to human readable format

I have a date time format where dates are represented as integers from 1/1/1900 .
For example: 1 is 1/1/1900 and 42998 is 20/9/2017.
How can I convert this format to a human readable format like dd/mm/yyyy ? I checked datetime documentation but I did not find any way to do this. I want to do this either on python 3 or 2.7.
Thanks for any suggestion.
You can define your dates as offsets from your basetime and construct a datetime:
In[22]:
import datetime as dt
dt.datetime(1900,1,1) + dt.timedelta(42998)
Out[22]: datetime.datetime(2017, 9, 22, 0, 0)
Once it's a datetime object you can convert it to a str via strftime using whatever format you desire. Note the -1 below: it is needed because the value 1, not 0, corresponds to the base date 1/1/1900:
In[24]:
(dt.datetime(1900,1,1) + dt.timedelta(42998-1)).strftime('%d/%m/%Y')
Out[24]: '21/09/2017'
So you can define a user func to do this:
In[27]:
def dtToStr(val):
base = dt.datetime(1900,1,1)
return (base + dt.timedelta(val-1)).strftime('%d/%m/%Y')
dtToStr(42998)
Out[27]: '21/09/2017'
import datetime
base_date = datetime.datetime(1900, 1, 1)
convert = lambda x: base_date + datetime.timedelta(days=x-1)
>>> convert(42998)
datetime.datetime(2017, 9, 21, 0, 0)
You can use a datetime object to do that. Subtract 1 from the value, because 1 (not 0) corresponds to 1/1/1900.
import datetime
d = datetime.datetime(1900, 1, 1, 0, 0)
d + datetime.timedelta(days = 42998 - 1)
>> datetime.datetime(2017, 9, 21, 0, 0)
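Note that the question's own example (42998 is 20/9/2017) does not match counting from 1/1/1900 as day 1, which gives 21/9/2017. If the integers are spreadsheet serial numbers in Excel's 1900 date system (an assumption; the question does not say where they come from), the mismatch is expected: that system counts a nonexistent 29 February 1900, so for serials from 61 (1 March 1900) onward the effective base is 30 December 1899. A minimal sketch under that assumption (serial_to_str and EXCEL_EPOCH are hypothetical names):

```python
import datetime

# Assumed epoch for Excel-style serials >= 61; earlier serials come out one day
# early because of the fictitious 1900-02-29.
EXCEL_EPOCH = datetime.date(1899, 12, 30)

def serial_to_str(serial):
    """Format a spreadsheet-style day serial as dd/mm/yyyy."""
    return (EXCEL_EPOCH + datetime.timedelta(days=serial)).strftime('%d/%m/%Y')

print(serial_to_str(42998))  # 20/09/2017
```

Which base applies depends on the system that produced the integers, so it is worth checking a few known values before settling on either one.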

Python - Convert Month Name to Integer

How can I convert 'Jan' to an integer using Datetime? When I try strptime, I get an error time data 'Jan' does not match format '%m'
You have an abbreviated month name, so use %b:
>>> from datetime import datetime
>>> datetime.strptime('Jan', '%b')
datetime.datetime(1900, 1, 1, 0, 0)
>>> datetime.strptime('Aug', '%b')
datetime.datetime(1900, 8, 1, 0, 0)
>>> datetime.strptime('Jan 15 2015', '%b %d %Y')
datetime.datetime(2015, 1, 15, 0, 0)
%m is for a numeric month.
However, if all you wanted to do was map an abbreviated month to a number, just use a dictionary. You can build one from calendar.month_abbr:
import calendar
abbr_to_num = {name: num for num, name in enumerate(calendar.month_abbr) if num}
Demo:
>>> import calendar
>>> abbr_to_num = {name: num for num, name in enumerate(calendar.month_abbr) if num}
>>> abbr_to_num['Jan']
1
>>> abbr_to_num['Aug']
8
This is straightforward enough that you could consider just using a dictionary, then you have fewer dependencies anyway.
months = dict(Jan=1, Feb=2, Mar=3, ...)
print(months['Jan'])
>>> 1
Off the cuff-
Did you try %b?
from calendar import month_abbr
month = "Jun"
for k, v in enumerate(month_abbr):
    if v == month:
        month = k
        break
print(month)
6
This gives you the month number, 6.
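If the input may be a full month name or use different capitalization, the dictionary approach extends naturally. A minimal sketch, assuming English month names (calendar.month_abbr and calendar.month_name follow the current locale); name_to_num and month_number are hypothetical names:

```python
import calendar

# Map lowercase abbreviations and full names to month numbers.
name_to_num = {}
for num in range(1, 13):
    name_to_num[calendar.month_abbr[num].lower()] = num
    name_to_num[calendar.month_name[num].lower()] = num

def month_number(name):
    """Return 1..12 for a month name or abbreviation, ignoring case."""
    return name_to_num[name.strip().lower()]

print(month_number('Jan'))     # 1
print(month_number('august'))  # 8
```

An unknown name raises KeyError, which is usually preferable to silently returning a wrong month.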

How do I find missing dates in a list of sorted dates?

In Python how do I find all the missing days in a sorted list of dates?
using sets
>>> from datetime import date, timedelta
>>> d = [date(2010, 2, 23), date(2010, 2, 24), date(2010, 2, 25),
date(2010, 2, 26), date(2010, 3, 1), date(2010, 3, 2)]
>>> date_set = set(d[0] + timedelta(x) for x in range((d[-1] - d[0]).days))
>>> missing = sorted(date_set - set(d))
>>> missing
[datetime.date(2010, 2, 27), datetime.date(2010, 2, 28)]
>>>
Sort the list of dates and iterate over it, remembering the previous entry. If the difference between the previous and current entry is more than one day, you have missing days.
Here's one way to implement it:
from datetime import date, timedelta
from itertools import tee
def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    next(b, None)  # advance the second iterator by one
    return zip(a, b)
def missing_dates(dates):
    for prev, curr in pairwise(sorted(dates)):
        i = prev
        # yield every day strictly between two consecutive entries
        while i + timedelta(1) < curr:
            i += timedelta(1)
            yield i
dates = [date(2010, 1, 8),
         date(2010, 1, 2),
         date(2010, 1, 5),
         date(2010, 1, 1),
         date(2010, 1, 7)]
for missing in missing_dates(dates):
    print(missing)
Output:
2010-01-03
2010-01-04
2010-01-06
Sorting an unsorted input costs O(n*log(n)), where n is the number of dates in the list; the scan itself is linear in the number of days spanned. As your list is already sorted, the whole thing runs in linear time.
>>> from datetime import datetime, timedelta
>>> date_list = [datetime(2010, 2, 23),datetime(2010, 2, 24),datetime(2010, 2, 25),datetime(2010, 2, 26),datetime(2010, 3, 1),datetime(2010, 3, 2)]
>>>
>>> date_set=set(date_list) # for faster membership tests than list
>>> one_day = timedelta(days=1)
>>>
>>> test_date = date_list[0]
>>> missing_dates=[]
>>> while test_date < date_list[-1]:
...     if test_date not in date_set:
...         missing_dates.append(test_date)
...     test_date += one_day
...
>>> print(missing_dates)
[datetime.datetime(2010, 2, 27, 0, 0), datetime.datetime(2010, 2, 28, 0, 0)]
This also works for datetime.date objects, but the OP says the list is datetime.datetime objects
USING A FOR LOOP
The imports you'll need:
import datetime
from datetime import date, timedelta
Let's say you have a sorted list called dates with several missing dates in it.
First select the first and last date:
start_date = dates[0]
end_date = dates[-1]
Then count the number of days between these two dates:
numdays = (end_date - start_date).days
Then create a new list with all dates between start_date and end_date:
all_dates = []
for x in range(0, numdays + 1):
    all_dates.append(start_date + datetime.timedelta(days=x))
Then check which dates are in all_dates but not in dates, by using a for loop with range, and add these dates to dates_missing:
dates_missing = []
for i in range(0, len(all_dates)):
    if all_dates[i] not in dates:
        dates_missing.append(all_dates[i])
Now you'll have a list called dates_missing with all the missing dates.
Put the dates in a set and then iterate from the first date to the last using datetime.timedelta(), checking for containment in the set each time.
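That description can be sketched as a small function. This is one possible reading of it, assuming a non-empty list of datetime.date objects (find_missing is a hypothetical name):

```python
from datetime import date, timedelta

def find_missing(dates):
    """Return the days between the first and last date that are not in `dates`."""
    present = set(dates)  # set membership tests are O(1) on average
    current, last = min(present), max(present)
    missing = []
    while current < last:
        if current not in present:
            missing.append(current)
        current += timedelta(days=1)
    return missing

d = [date(2010, 2, 23), date(2010, 2, 24), date(2010, 2, 25),
     date(2010, 2, 26), date(2010, 3, 1), date(2010, 3, 2)]
print(find_missing(d))  # [datetime.date(2010, 2, 27), datetime.date(2010, 2, 28)]
```

Using min and max instead of the first and last elements means the input does not have to be sorted.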
Here's an example for a pandas dataframe with a date column. If it's an index, then change df.Date to df.index.
import pandas as pd
df.Date = pd.to_datetime(df.Date) # ensure datetime format of date column
min_dt = df.Date.min() # get lowest date
max_dt = df.Date.max() # get highest date
dt_range = pd.date_range(min_dt, max_dt) # get all requisite dates in range
missing_dts = dt_range[~dt_range.isin(df.Date)] # list missing ("d in df.Date" would test the index labels, not the values)
print("There are {n} missing dates".format(n=len(missing_dts)))
import datetime
DAY = datetime.timedelta(days=1)
# missing dates: a list of [start_date, end)
missing = [(d1+DAY, d2) for d1, d2 in zip(dates, dates[1:]) if (d2 - d1) > DAY]
def date_range(start_date, end, step=DAY):
    d = start_date
    while d < end:
        yield d
        d += step
missing_dates = [d for d1, d2 in missing for d in date_range(d1, d2)]
Using a list comprehension
>>> from datetime import date, timedelta
>>> d = [date(2010, 2, 23),date(2010, 2, 24),date(2010, 2, 25),date(2010, 2, 26),date(2010, 3, 1),date(2010, 3, 2)]
>>> date_set=set(d)
>>> missing = [x for x in (d[0]+timedelta(x) for x in range((d[-1]-d[0]).days)) if x not in date_set]
>>> missing
[datetime.date(2010, 2, 27), datetime.date(2010, 2, 28)]
A straightforward way to do this in Python is shown below. Efficiency only becomes a concern if the list spans multiple years and the code has to respond immediately to user interaction.
Get missing dates from one list (sorted or not)
Create a function that yields all the dates from start_date to end_date, and use it.
import datetime
def get_dates(start_date, end_date):
    span_between_dates = (end_date - start_date).days
    # +1 is to make start and end dates inclusive.
    for index in range(span_between_dates + 1):
        yield start_date + datetime.timedelta(index)
# Edit my_date_strings as per your requirement.
my_date_strings = ['2017-03-05', '2017-03-07']
# parse the strings so that date arithmetic works
my_date_list = [datetime.datetime.strptime(s, '%Y-%m-%d').date() for s in my_date_strings]
start_date = min(my_date_list)
end_date = max(my_date_list)
for current_date in get_dates(start_date, end_date):
    if current_date not in my_date_list:
        print(current_date)
Get missing or overlapping dates between two date ranges.
The get_dates function and my_date_list from above must be defined.
my_other_date_list = []  # your other date range
start_date = min(my_date_list)
end_date = max(my_date_list)
for current_date in get_dates(start_date, end_date):
    if (current_date in my_date_list) and (current_date in my_other_date_list):
        print('overlapping dates between 2 lists:')
        print(current_date)
    elif (current_date in my_date_list) and (current_date not in my_other_date_list):
        print('missing dates:')
        print(current_date)
