How can I get the maximum length of datetime.strftime? - python

Currently I'm working on a command line program and there I print out dates.
I do this with datetime.datetime.strftime:
import datetime
d = datetime.datetime(2012,12,12)
date_str = d.strftime(config.output_str)
Where config.output_str is a format string that can be set by the user.
Is there a way to tell how long the string date_str will be at maximum?
Especially if a format string like u'%d %B %Y' is used, where the length of the month (%B) depends on the language of the user?

If you are not setting the locale with the locale module, then Python uses the C locale and you can predict the maximum length produced. All strings will be in English and the maximum length per format character is known.
Parse the string yourself, count the non-format characters and map format characters to the maximum length for that field.
If you were to use locale, you'll need to calculate the max length per language. You can automate the locale-dependent fields by looping over the months, weekdays, and AM/PM and measuring the max length for the %a, %A, %b, %B, %c, %p, %x and %X formats. I'd do that on the fly as needed.
The rest of the formats do not vary by locale and have a documented maximum length (the examples in the strptime table are typical, you can rely on those documenting the field length).
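The measuring step described above could be sketched roughly as follows. This is only an illustration: the helper name max_field_widths and the choice of 2012 as the sample year are assumptions, not part of the answer, and it measures only the single-field directives under whatever LC_TIME locale is currently active.

```python
import datetime

def max_field_widths(directives=("%a", "%A", "%b", "%B", "%p")):
    """Return the widest output of each directive under the current locale.

    Hypothetical helper for illustration only.
    """
    # 2012 is a leap year, so 366 consecutive days cover every month and
    # every weekday; sampling 09:00 and 21:00 covers both AM and PM.
    first = datetime.datetime(2012, 1, 1, 9, 0)
    samples = []
    for offset in range(366):
        morning = first + datetime.timedelta(days=offset)
        samples.append(morning)
        samples.append(morning.replace(hour=21))
    return {d: max(len(s.strftime(d)) for s in samples) for d in directives}

widths = max_field_widths()
print(widths)
```

Under the default C locale the widest %A and %B values are "Wednesday" and "September". After a locale.setlocale() call the same function would report the widths for that locale instead.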

Here is the solution I wrote to solve this, for those who are interested.
I use the given format string format_str to figure out how long the output can get.
I assume that only the month and the day can vary in length.
The function loops over the months to see which has the longest form, and then loops over the days with the previously found month.
import datetime
def max_date_len(format_str):
    def date_len(date):
        return len(date.strftime(format_str))
    def find_max_index(lst):
        return max(range(len(lst)), key=lst.__getitem__)
    # run through all months and add 1 to the index since we need a month
    # between 1 and 12
    max_month = 1 + find_max_index([date_len(datetime.datetime(2012, month, 12, 12, 12)) for month in range(1, 13)])
    # run through all days of the week from day 10 to 16 since
    # this covers all weekdays and double-digit days
    return max([date_len(datetime.datetime(2012, max_month, day, 12, 12)) for day in range(10, 17)])

Related

Generate randomly formatted date strings for machine learning

For an NLP project in Python I need to generate random dates for model training. In particular, the date format must be random and consistent with a set of language locales. The formats include those with only numbers, formats with (partially) written-out day and month names, and various common punctuation.
My best solution so far is the following algorithm:
1) generate a datetime() object with random values (nice solution here)
2) randomly select a locale, e.g. pick one of ['en_US','fr_FR','it_IT','de_DE'], where in this case the list is well known and short, so not a problem.
3) randomly select a format string for strftime(), e.g. ['%Y-%m-%d','%d %B %Y',...]. In my case the list should reflect the date formats likely to occur in the documents that will be exposed to the NLP model in the future.
4) generate a string with strftime()
Especially for 3) I do not know a better approach than hardcoding the list of formats I saw manually in the training documents. I could not yet find a function that would turn OCR dates into a format string, so that I could extend the list when yet-unseen date formats come up.
Do you have any suggestions on how to come up with better randomly formatted dates, or how to improve this approach?
USE random.randrange() AND datetime.timedelta() TO GENERATE A RANDOM DATE BETWEEN TWO DATES
Call datetime.date(year, month, day) to create a date object for the given year, month, and day. Do this twice to define the start and end dates. Subtract the start date from the end date to get a datetime.timedelta, and read its days attribute to get the number of days between the two dates. Call random.randrange(days) to get a random integer from 0 up to, but not including, days. Pass that integer to datetime.timedelta(days=n) and add the result to the start date.
import datetime
import random
start_date = datetime.date(2020, 1, 1)
end_date = datetime.date(2020, 2, 1)
time_between_dates = end_date - start_date
days_between_dates = time_between_dates.days
random_number_of_days = random.randrange(days_between_dates)
random_date = start_date + datetime.timedelta(days=random_number_of_days)
print(random_date)
Here is my solution. All of the locales need to be available on your computer to avoid an error.
import random
from datetime import datetime, timedelta
import locale
LOCALE = ['en_US','fr_FR','it_IT','de_DE'] # all need to be available on your computer to avoid an error
DATE_FORMAT = ['%Y-%m-%d','%d %B %Y']
def gen_datetime(min_year=1900, max_year=datetime.now().year):
    # generate a random datetime between min_year and max_year and format it as a string
    start = datetime(min_year, 1, 1)
    years = max_year - min_year + 1
    end = start + timedelta(days=365 * years)
    format_date = DATE_FORMAT[random.randint(0, len(DATE_FORMAT)-1)]
    locale_date = LOCALE[random.randint(0, len(LOCALE)-1)]
    locale.setlocale(locale.LC_ALL, locale_date) # raises locale.Error if the locale is not available on your computer
    return (start + (end - start) * random.random()).strftime(format_date)
date = gen_datetime()
print(date)
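One possible variation on the approach above, shown only as a sketch: it uses random.choice instead of index arithmetic, touches only LC_TIME, and restores the previous locale afterwards so the rest of the program is not affected. The function name random_date_string and its parameters are assumptions made for this illustration, and the example call uses the always-available "C" locale.

```python
import locale
import random
from datetime import datetime

def random_date_string(formats, locales, min_year=1900, max_year=2020):
    """Format a random moment using a random locale and format string.

    Hypothetical variation for illustration; the locale names passed in
    must be installed on the machine, otherwise locale.Error is raised.
    """
    start = datetime(min_year, 1, 1)
    end = datetime(max_year + 1, 1, 1)  # exclusive upper bound
    moment = start + (end - start) * random.random()
    previous = locale.setlocale(locale.LC_TIME)  # query the current setting
    try:
        locale.setlocale(locale.LC_TIME, random.choice(locales))
        return moment.strftime(random.choice(formats))
    finally:
        # Restore the earlier locale so the rest of the program is unaffected.
        locale.setlocale(locale.LC_TIME, previous)

print(random_date_string(["%Y-%m-%d", "%d %B %Y"], ["C"]))
```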

How do I parse a date without zero padding, in the format (1 or 2-digit year)-(Month abbreviation)?

I need to parse a few dates that are roughly in the format (1 or 2-digit year)-(Month abbreviation), for example:
5-Jun (June 2005)
13-Jan (January 2013)
I tried using strptime with the format %b-%y but it did not consistently produce the desired date. Per the documentation, this is because some years in my dataset are not zero-padded.
Further, when I tested the dateutil parser (please see below for my code) on the string "5-Jun", I got "2019-06-05" instead of the desired result (June 2005), even though I set yearfirst=True when calling parse.
from dateutil.parser import parse
parsed = parse("5-Jun",yearfirst=True)
print(parsed)
It is easier if single-digit years are padded with a 0, since the string can then be converted directly using a format. A regular expression is used here to replace any single-digit number with its zero-padded value. I've used regex from here.
Sample code:
import re
from datetime import datetime
match_condn = r'\b([0-9])\b'
replace_str = r'0\1'
datetime.strptime(re.sub(match_condn, replace_str, '15-Jun'), '%y-%b').strftime("%B %Y")
Output:
June 2015
One approach is to use str.zfill
Ex:
import datetime
d = ["5-Jun", "13-Jan"]
for date in d:
    date, month = date.split("-")
    date = date.zfill(2)
    print(datetime.datetime.strptime(date+"-"+month, "%y-%b").strftime("%B %Y"))
Output:
June 2005
January 2013
Ah. I see from @Rakesh's answer what your data is about. I thought you needed to parse the full name of the month. So you had your two terms %b and %y backwards, and then you had the problem with the single-digit years. I get it now. Here's a much simpler way to get what you want if you can assume your dates are always in one of the two formats you indicate:
import time
inp = "5-Jun"
t = time.strptime(("0" + inp)[-6:], "%y-%b")
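To see why the slice handles both inputs, here is a small sketch that wraps the same expression in a function. The name year_and_month is made up for this illustration, and it assumes three-letter English month abbreviations as in the question.

```python
import time

def year_and_month(raw):
    """Parse '(1 or 2-digit year)-(month abbreviation)'. Hypothetical helper."""
    # Prepend "0", then keep the last six characters:
    # "05-Jun" stays as is, while "013-Jan" is trimmed back to "13-Jan".
    padded = ("0" + raw)[-6:]
    parsed = time.strptime(padded, "%y-%b")
    return parsed.tm_year, parsed.tm_mon

for raw in ["5-Jun", "13-Jan"]:
    print(raw, year_and_month(raw))
```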

Converting string to datetime with milliseconds and timezone - Python

I have the following python snippet:
from datetime import datetime
timestamp = '05/Jan/2015:17:47:59:000-0800'
datetime_object = datetime.strptime(timestamp, '%d/%m/%y:%H:%M:%S:%f-%Z')
print datetime_object
However when I execute the code, I'm getting the following error:
ValueError: time data '05/Jan/2015:17:47:59:000-0800' does not match format '%d/%m/%y:%H:%M:%S:%f-%Z'
what's wrong with my matching expression?
EDIT 2: According to this post, strptime doesn't support %z in Python 2 (despite what the documentation suggests); Python 3.2 and later do support it. To get around this on Python 2, you can just ignore the timezone offset:
from datetime import datetime
timestamp = '05/Jan/2015:17:47:59:000-0800'
# only take the first 24 characters of `timestamp` by using [:24]
dt_object = datetime.strptime(timestamp[:24], '%d/%b/%Y:%H:%M:%S:%f')
print(dt_object)
Gives the following output:
$ python date.py
2015-01-05 17:47:59
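EDIT: Where strptime supports %z (Python 3.2+), your datetime.strptime argument should be '%d/%b/%Y:%H:%M:%S:%f%z'. The sign is part of what %z matches, so there is no literal '-' before it.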
With strptime(), %y refers to
Year without century as a zero-padded decimal number
I.e. 01, 99, etc.
If you want to use the full 4-digit year, you need to use %Y
Similarly, if you want to use the 3-letter month, you need to use %b, not %m
I haven't looked at the rest of the string, but there are possibly more mismatches. You can find out how each section can be defined in the table at https://docs.python.org/2/library/datetime.html#strftime-and-strptime-behavior
And the UTC offset is lowercase %z (uppercase %Z is the time zone name).
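Putting the corrections together, a minimal sketch assuming Python 3.2 or later, where strptime accepts %z. The variable name dt is chosen here for illustration.

```python
from datetime import datetime, timedelta

timestamp = '05/Jan/2015:17:47:59:000-0800'
# %z consumes the sign together with the digits, so no literal '-' precedes it.
dt = datetime.strptime(timestamp, '%d/%b/%Y:%H:%M:%S:%f%z')
print(dt)
# The result is timezone-aware and carries the -08:00 offset.
print(dt.utcoffset() == timedelta(hours=-8))
```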

Python: Get times from file and compare

First part: I have a Python script that opens a file, gets a starting timecode string in the format:
136:17:30:00 (DOY:HH:MM:SS)
And a stopping timecode string in the format:
137:01:30:00 (DOY:HH:MM:SS)
The user enters the start year (2017), and the stopyear (2017).
I would like to know how to convert the starting and stopping timecodes (along with the year string) into a datetime object. These are considered to be the experiment start and stop times.
Second part: The user will enter a "recording" start time and a "recording" stop time in the formats:
2017:136:18:00:00 (YEAR:DOY:HH:MM:SS)
2017:137:01:00:00 (YEAR:DOY:HH:MM:SS)
This is considered to be record start/stop times.
I'd like to convert into a datetime object as well. I suppose it will be the same mechanism as the first part.
I want to first validate that the record stop/start times are within the experiment stop/start times, and be able to subtract stop time from start time.
Some of the Python documentation is confusing to me as to how to do this. Any ideas? Thanks in advance.
Python's datetime.strptime() is used for converting string representations of dates into datetime objects. It lets you specify %j for the day of the year as follows:
from datetime import datetime
print(datetime.strptime("136:17:30:00", "%j:%H:%M:%S").replace(year=2017))
print(datetime.strptime("2017:136:17:30:00", "%Y:%j:%H:%M:%S"))
This would display:
2017-05-16 17:30:00
2017-05-16 17:30:00
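.replace() is used to give the datetime object a year, otherwise it would default to 1900. Note that in this form the day of the year is resolved against 1900, which is not a leap year, so for a leap year the result is one day off for dates after February. Including the year in the parsed string, as in the second line, avoids that.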
In summary:
%j - Day of the year as a zero-padded decimal number.
%Y - Year with century as a decimal number.
%H - Hour (24-hour clock) as a zero-padded decimal number.
%M - Minute as a zero-padded decimal number.
%S - Second as a zero-padded decimal number.
If you convert your start and end codes into datetime objects, you can then subtract them to give you the time between as follows:
dt_start = datetime.strptime("136:17:30:00", "%j:%H:%M:%S")
dt_end = datetime.strptime("137:01:30:00", "%j:%H:%M:%S")
print(dt_end - dt_start)
Giving you:
8:00:00
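The validation step from the question (checking that the recording lies inside the experiment window) could look roughly like this. The helper name parse_timecode is an assumption for this sketch; it puts the year into the string before parsing so that %j is resolved against the right year.

```python
from datetime import datetime

def parse_timecode(code, year=None):
    """Parse 'DOY:HH:MM:SS', or 'YEAR:DOY:HH:MM:SS' when year is None.

    Hypothetical helper for illustration.
    """
    if year is not None:
        # Put the year into the string so %j is resolved against that year.
        code = "{}:{}".format(year, code)
    return datetime.strptime(code, "%Y:%j:%H:%M:%S")

exp_start = parse_timecode("136:17:30:00", year=2017)
exp_stop = parse_timecode("137:01:30:00", year=2017)
rec_start = parse_timecode("2017:136:18:00:00")
rec_stop = parse_timecode("2017:137:01:00:00")

# Chained comparison: the recording must lie inside the experiment window.
is_valid = exp_start <= rec_start <= rec_stop <= exp_stop
print(is_valid)
print(rec_stop - rec_start)
```

With the values from the question, the recording falls inside the experiment window and lasts seven hours.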
I'm not sure exactly what you want, or what you are doing with the year. I hope the following helps, however:
time1='136:17:30:00'
time2='137:01:30:00'
#yr=2017
from datetime import timedelta
def objMaker(time):
    # split DOY:HH:MM:SS and build a timedelta from the parts
    arr=time.split(':')
    obj=timedelta(days=int(arr[0]),hours=int(arr[1]),minutes=int(arr[2]),seconds=int(arr[3]))
    return obj
obj1=objMaker(time1)
obj2=objMaker(time2)

Attempting to insert an integer from a list into datetime object

What I am trying to accomplish is very simple: creating a loop from a range (pretty self explanatory below) that will insert the month into the datetime object. I know %d requires an integer, and I know that 'month' type is int...so I'm kind of stuck as to why I can't substitute my month variable. Here is my code:
all_months=range(1,13)
for month in all_months:
    month_start = (datetime.date(2010,'%d',1))%month
    next_month_begin= datetime.date(2010,'%d',1)%(month+1)
    month_end=next_month_begin - timedelta(days=1)
    print month_start
    print month_end
What am I doing wrong?
All help appreciated! Thanks
There are a few things that you need to fix here.
EDIT: First, be careful with your range, since you are using month+1 to create next_month_begin, you do not want this to be greater than 12 or you will get an error.
Next, when you try to create the date object, you are passing the month in as a string when you use (datetime.date(2010,'%d',1))%month. Your code is probably throwing this error: TypeError: an integer is required.
You need to give it the integer representing the month, not a string of the integer (there is a difference between 1 and '1'). This is also a simple fix: since you have a variable named month that is already an integer, just use that instead of making a string. So your code should be something like:
month_start = datetime.date(2010,month,1)
I think you can figure out how to apply this to your next_month_begin assignment.
The last problem is that you need to write datetime.timedelta so that Python looks in the datetime module for timedelta -- your program would currently give you an error saying that timedelta is not defined.
Let me know if you have any problems applying these fixes. Be sure to include the error you are getting as well.
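For reference, here is one way the fixes above might fit together, including the December case mentioned in the EDIT. This is a sketch rather than the only way to do it; the helper name month_bounds is made up for the example.

```python
import datetime

def month_bounds(year, month):
    """Return (first_day, last_day) of a month. Hypothetical helper."""
    month_start = datetime.date(year, month, 1)
    if month == 12:
        # December has no month 13, so roll over to January of the next year.
        next_month_begin = datetime.date(year + 1, 1, 1)
    else:
        next_month_begin = datetime.date(year, month + 1, 1)
    month_end = next_month_begin - datetime.timedelta(days=1)
    return month_start, month_end

for month in range(1, 13):
    month_start, month_end = month_bounds(2010, month)
    print(month_start, month_end)
```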
You've got other answers, but here's a way to get the last day of the month. Adding 31 days will get you into the next month regardless of the number of days in the current month, then moving back to the first and subtracting a day will give the ending date.
import datetime
for month in range(1,13):
    month_start = datetime.date(2010,month,1)
    into_next_month = month_start + datetime.timedelta(days=31)
    month_end = into_next_month.replace(day=1) - datetime.timedelta(days=1)
    print(month_start, month_end)
month is a variable and you can use it to create the datetime object. I think you want to do the following:
month_start = datetime.date(2010, month, 1)
next_month_begin = datetime.date(2010, month+1, 1)
That will work for months 1 to 11 (for December, month+1 is 13, which datetime.date() rejects), because datetime.date() requires 3 integer arguments. '%d' % month would instead format the integer month as a string. '%04d' % 3, for example, would format the number 3 with 4 digits and leading zeros. But it's important to know that even the (nearly unformatted) string "3" is different from the number 3 in Python.
And you can't write datetime.date(...) % 3, because the % operator only works this way on a format string, like the previous '%04d' % 3 example, and not on a date object.
But other types might also accept the % operator (not including datetime objects). For example, integers accept the % operator to get the remainder of a division: 3 % 2 # returns 1. But there, the meaning of % is completely different, because the meaning of the operator depends on the types involved. For example, try 3 + 2 and "3" + "2". There, the meaning of + differs (integer addition vs. string concatenation), because the types are different too.
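The distinctions above can be checked directly in a short script. The variable names are chosen just for this illustration.

```python
import datetime

month = 3
as_text = '%d' % month     # string formatting: produces the str '3'
padded = '%04d' % month    # zero-padded to four characters: '0003'
remainder = 3 % 2          # integer modulo: 1
total = 3 + 2              # integer addition: 5
joined = "3" + "2"         # string concatenation: '32'

try:
    # Passing the formatted string where an integer is required fails.
    datetime.date(2010, as_text, 1)
    raised = False
except TypeError:
    raised = True

print(type(as_text).__name__, padded, remainder, total, joined, raised)
```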
Check out the calendar module (http://docs.python.org/library/calendar.html).
It has batteries included for this sort of thing...
You could just do:
from calendar import Calendar
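def start_and_end_days(year, month):
    cal = Calendar()
    # itermonthdates yields datetime.date objects, including the days from
    # adjacent months that complete the first and last weeks, so filter by month
    month_days = [day for day in cal.itermonthdates(year, month) if day.month == month]
    first_day = month_days[0]
    last_day = month_days[-1]
    return (first_day, last_day)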
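The calendar module also offers monthrange, which gives the number of days in a month directly. A short sketch of that alternative follows; the function name first_and_last_day is an assumption for this example.

```python
import calendar
import datetime

def first_and_last_day(year, month):
    """Return the first and last date of a month. Hypothetical helper."""
    # monthrange returns (weekday of the 1st, number of days in the month).
    _, days_in_month = calendar.monthrange(year, month)
    first_day = datetime.date(year, month, 1)
    last_day = datetime.date(year, month, days_in_month)
    return first_day, last_day

print(first_and_last_day(2010, 2))
```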
