Why is a datetime string format not reversible? - python

I expected datetime.strftime and datetime.strptime calls to be reversible, such that calling
datetime.strptime(datetime.now().strftime(fmt), fmt)
would give the closest reconstruction of now() (given the information preserved by the format).
However, this is not the case when formatting a date to a string with a YYYY-Week# format:
>>> yyyy_u = datetime.datetime(1992, 5, 17).strftime('%Y-%U')
>>> print(yyyy_u)
1992-20
Parsing the string back to a date does not give the expected result:
>>> datetime.datetime.strptime(yyyy_u, '%Y-%U')
datetime.datetime(1992, 1, 1, 0, 0)
I would have expected the response to be the first day of week 20 in 1992 (17 May 1992).
Is this a failure of the %U format option or more generally are datetime.strftime and datetime.strptime calls not meant to be reversible?

From the Python docs regarding strptime() behaviour:
When used with the strptime() method, %U and %W are only used in calculations when the day of the week and the year are specified.
So the day of the week must be specified along with the week number and the year, for example with the format %Y-%U-%w:
datetime.datetime.strptime('1992-20-0', '%Y-%U-%w') gives the first day of week 20 of 1992 (Sunday, 17 May 1992).
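As a minimal sketch of the point above, the round trip from the question becomes reversible once the format also carries the day of the week. The variable names are only illustrative.

```python
import datetime

# Sunday 17 May 1992 is the first day of week 20 under %U (weeks start on Sunday).
original = datetime.datetime(1992, 5, 17)

# %w (weekday as a number, 0 = Sunday) lets strptime use the week number
# instead of silently ignoring it.
fmt = '%Y-%U-%w'
text = original.strftime(fmt)
restored = datetime.datetime.strptime(text, fmt)

print(text)      # 1992-20-0
print(restored)  # 1992-05-17 00:00:00
```

The round trip still only recovers what the format stores, so a time of day would be lost with this format.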

Related

How did datetime decide '22' in %y is 2022 and not 1922?

Example code
from datetime import datetime
a = '27-10-80'
b = '27-10-22'
c = '27-10-50'
x = datetime.strptime(a, '%d-%m-%y')
print(x)
y = datetime.strptime(b, '%d-%m-%y')
print(y)
z = datetime.strptime(c, '%d-%m-%y')
print(z)
Output:
1980-10-27 00:00:00
2022-10-27 00:00:00
2050-10-27 00:00:00
How did datetime decide the year generated from the string when using the '%y' format? Why did 80 become 1980 while 50 became 2050?
The Python documentation for the time module indicates that strftime/strptime follow the semantics of the platform C library:
Most of the functions defined in this module call platform C library functions with the same name.
It also documents the behaviour of %y:
Function strptime() can parse 2-digit years when given %y format code. When 2-digit years are parsed, they are converted according to the POSIX and ISO C standards: values 69–99 are mapped to 1969–1999, and values 0–68 are mapped to 2000–2068.
We can confirm this behaviour by looking up STRPTIME(3), for %y:
the year within the current century. When a century is not otherwise specified, values in the range 69-99 refer to years in the twentieth century (1969 to 1999 inclusive); values in the range 00-68 refer to years in the twenty-first century (2000 to 2068 inclusive). Leading zeros are permitted but not required.
emphasis mine.
No reasoning is given for that cutoff, but it is plausible that it stems from the Unix epoch being 1970, with a year of safety margin at the boundary.
This is also documented in a comment in _strptime.py:
if group_key == 'y':
    year = int(found_dict['y'])
    # Open Group specification for strptime() states that a %y
    #value in the range of [00, 68] is in the century 2000, while
    #[69,99] is in the century 1900
    if year <= 68:
        year += 2000
    else:
        year += 1900
From the documentation of the time module:
Function strptime() can parse 2-digit years when given %y format code. When 2-digit years are parsed, they are converted according to the POSIX and ISO C standards: values 69–99 are mapped to 1969–1999, and values 0–68 are mapped to 2000–2068.
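The 68/69 cutoff described above can be checked directly. If that fixed window does not suit the data (birth dates, for example), one option is to shift the parsed year afterwards. The helper below is a hypothetical sketch, not part of the standard library; its name and the latest_year parameter are assumptions made for illustration.

```python
from datetime import datetime

# The boundary: 68 lands in the 2000s, 69 in the 1900s.
for two_digit in ('68', '69'):
    print(two_digit, '->', datetime.strptime(two_digit, '%y').year)
# 68 -> 2068
# 69 -> 1969

def parse_two_digit_year(text, fmt, latest_year):
    """Parse text with a %y format, then move any year after latest_year back a century.

    Hypothetical helper. Assumes latest_year is between 2000 and 2068, so the
    result always falls in the 100 years ending at latest_year.
    """
    parsed = datetime.strptime(text, fmt)
    if parsed.year > latest_year:
        parsed = parsed.replace(year=parsed.year - 100)
    return parsed

print(parse_two_digit_year('27-10-50', '%d-%m-%y', 2022))  # 1950-10-27 00:00:00
print(parse_two_digit_year('27-10-22', '%d-%m-%y', 2022))  # 2022-10-27 00:00:00
```

With latest_year set to 2022, '50' is read as 1950 rather than 2050, while '22' and '80' keep the years shown in the question.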

Converting worded date format to datetime format in pandas

Today one of my scripts gave an error because of an invalid datetime format in its input. The script expects the datetime input as '%m/%d/%Y', but it received an entirely different format. For example, the date should have been 5/2/2022 but it was May 2, 2022. To add a bit more information for clarity, the input comes from a Google Sheet and the entire date is in a single cell (rather than different cells for month, day and year).
Is there a way to convert this kind of worded format to the desired datetime format before the script starts any kind of processing?
If the cell contains the full month name, try this:
>>> pd.to_datetime(df["Date"], format="%B %d, %Y")
0 2022-05-02
Name: Date, dtype: datetime64[ns]
According to the Python docs:
%B: "Month as locale’s full name".
%d: "Day of the month as a zero-padded decimal number". (When parsing, a day without the leading zero is also accepted, which is why "2" works in this case.)
%Y: "Year with century as a decimal number."
Now, if you want to transform this date to the format you initially expected, just transform the series using .dt.strftime:
>>> pd.to_datetime(df["Date"], format="%B %d, %Y").dt.strftime("%m/%d/%Y")
0 05/02/2022
Name: Date, dtype: object
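If the values need cleaning before they reach pandas, the same two format strings can be tried one after the other with the standard library. This is a minimal sketch for a single cell value; the function name normalize_date is hypothetical, and it assumes English month names (the default C locale).

```python
from datetime import datetime

def normalize_date(text):
    """Return text as MM/DD/YYYY, accepting either '5/2/2022' or 'May 2, 2022'."""
    for fmt in ('%m/%d/%Y', '%B %d, %Y'):
        try:
            return datetime.strptime(text.strip(), fmt).strftime('%m/%d/%Y')
        except ValueError:
            # This format did not match; try the next one.
            continue
    raise ValueError(f'unrecognized date format: {text!r}')

print(normalize_date('May 2, 2022'))  # 05/02/2022
print(normalize_date('5/2/2022'))     # 05/02/2022
```

Raising on unknown formats, rather than guessing, keeps a new unexpected format from slipping through silently.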

Creating a datetime object by only providing the month or only providing the day of the week

Creating a datetime object in Python by only providing the month preserves the month information:
>>> import datetime
>>> datetime.datetime.strptime('Feb', '%b')
datetime.datetime(1900, 2, 1, 0, 0)
>>> datetime.datetime.strptime('Feb', '%b').strftime('%B')
'February'
Since no year or day is provided, Python uses the defaults 1900 and 01, respectively, resulting in datetime.datetime(1900, 2, 1, 0, 0).
However, if a day of the week is provided:
>>> datetime.datetime.strptime('Tue', '%a')
datetime.datetime(1900, 1, 1, 0, 0)
>>> datetime.datetime.strptime('Tue', '%a').strftime('%A')
'Monday'
I understand that 1900-01-01 was Monday, but why isn't Python creating an object datetime.datetime(1900, 1, 2, 0, 0) which was the first Tuesday after 1900-01-01, similar to what it does with February in the first example?
It seems like the initial information (i.e. that the day was Tuesday) is lost without any warning or error. Is there a fundamental difference between creating a datetime object by only providing the month and only providing the day of the week?
As per docs:
For the datetime.strptime() class method, the default value is 1900-01-01T00:00:00.000: any components not specified in the format string will be pulled from the default value.
And the important part:
Similar to %U and %W, %V is only used in calculations when the day of
the week and the ISO year (%G) are specified in a strptime() format
string. Also note that %G and %Y are not interchangeable.
I guess %a falls under the same condition, although the docs do not say so explicitly.
You could use %G (ISO 8601 year) and %V (ISO 8601 week) alongside %a, and then it works:
>>> datetime.datetime.strptime('1900 01 Tue', '%G %V %a').strftime('%Y-%m-%d %a')
'1900-01-02 Tue'
>>> datetime.datetime.strptime('1900 02 Tue', '%G %V %a').strftime('%Y-%m-%d %a')
'1900-01-09 Tue'
I believe this works because the format now specifies which week is meant.
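If the goal is the behaviour the question expected (the first Tuesday on or after the default date), it can be computed explicitly. The sketch below uses time.strptime, whose result keeps the parsed weekday in tm_wday (Monday = 0); the helper name first_weekday_on_or_after is hypothetical.

```python
import datetime
import time

def first_weekday_on_or_after(start, weekday_name):
    """Return the first date on or after start that falls on weekday_name (e.g. 'Tue')."""
    # time.strptime keeps the parsed weekday; Monday is 0 and Sunday is 6.
    target = time.strptime(weekday_name, '%a').tm_wday
    offset = (target - start.weekday()) % 7
    return start + datetime.timedelta(days=offset)

default = datetime.date(1900, 1, 1)  # strptime's default date, a Monday
print(first_weekday_on_or_after(default, 'Tue'))  # 1900-01-02
print(first_weekday_on_or_after(default, 'Mon'))  # 1900-01-01
```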

How to get Year-Week format in ISO calendar format?

I am trying to get the current date in ISO calendar format as follows, along with zero padding on the week:
2019/W06
I tried the following, but prefer something using strftime as it is much easier to read.
print(str(datetime.datetime.today().isocalendar()[0]) + '/W' + str(datetime.datetime.today().isocalendar()[1]))
2019/W6
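Use the following code:
print(datetime.datetime.now().strftime('%G/W%V'))
%G - The ISO 8601 year with century, i.e. the year that contains the greater part of the ISO week (%V). Unlike %Y, it stays consistent with %V around 1 January.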
%V - The ISO 8601 week number of the current year (01 to 53), where
week 1 is the first week that has at least 4 days in the current year,
and with Monday as the first day of the week.
https://docs.python.org/3.7/library/datetime.html#strftime-and-strptime-behavior
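Solution with strftime:
If you want the zero padding:
datetime.date.today().strftime("%G/W%V")
Output:
2019/W06
If you don't want it:
datetime.date.today().strftime("%G/W%-V")
Output:
2019/W6
Note that "%V" returns the ISO week number and "%G" the matching ISO year, which can differ from "%Y" in the first and last days of a year. The "-" flag, which removes the leading zero, is a platform-specific extension (it works with glibc on Linux, for example, but not on Windows).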
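Because the ISO year can differ from the calendar year around 1 January, the ISO year rather than the calendar year belongs next to the ISO week number. A portable sketch that avoids platform-specific strftime codes is to build the string from isocalendar(); the helper name iso_year_week is only illustrative.

```python
import datetime

def iso_year_week(day):
    """Format a date as ISO year and zero-padded ISO week, e.g. '2019/W06'."""
    iso_year, iso_week, _ = day.isocalendar()
    return f'{iso_year}/W{iso_week:02d}'

print(iso_year_week(datetime.date(2019, 2, 6)))    # 2019/W06
# Monday 31 December 2018 already belongs to ISO week 1 of 2019.
print(iso_year_week(datetime.date(2018, 12, 31)))  # 2019/W01
```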

Format date without dash?

In Python, we can convert a date to a string by:
>>> import datetime
>>> datetime.date(2002, 12,4).isoformat()
'2002-12-04'
How can we format the output to be '20021204', i.e. without dashes?
There are two functions, but I don't know how to specify the format:
date.strftime(format)
Return a string representing the date,
controlled by an explicit format string. Format codes referring to
hours, minutes or seconds will see 0 values. For a complete list of
formatting directives, see section strftime() and strptime() Behavior.
and
date.__format__(format)
Same as date.strftime(). This makes it
possible to specify format string for a date object when using
str.format(). See section strftime() and strptime() Behavior.
You are using the wrong tool for the job; use strftime:
>>> datetime.date(2002, 12,4).strftime("%Y%m%d")
'20021204'
For details on using strftime and strptime, refer to the section strftime() and strptime() Behavior.
For your particular case, I will quote the relevant excerpt:
%Y Year with century as a decimal number. 1970, 1988, 2001, 2013
%m Month as a zero-padded decimal number. 01, 02, ..., 12
%d Day of the month as a zero-padded decimal number. 01, 02, ..., 31
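Alternatively, you could remove the hyphens from the ISO format string. In Python 2, str.translate(None, '-') does this; in Python 3, build the translation table with str.maketrans:
>>> str(datetime.date(2002, 12, 4)).translate(str.maketrans('', '', '-'))
'20021204'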
You can use '%m%d%Y' as your format (note that this gives month-day-year order rather than the requested '20021204'):
>>> d=datetime.date(2002, 12,4)
>>> d.strftime('%m%d%Y')
'12042002'
Or, with your original code, you can use str.replace:
>>> datetime.date(2002, 12,4).isoformat().replace('-','')
'20021204'
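Since the question also mentions date.__format__, here is a short sketch showing that the same format string works through format(), str.format() and f-strings, and that the compact string parses back to the original date. The variable names are only illustrative.

```python
import datetime

d = datetime.date(2002, 12, 4)

compact = d.strftime('%Y%m%d')
via_format = format(d, '%Y%m%d')             # calls date.__format__
via_str_format = '{:%Y%m%d}'.format(d)
via_fstring = f'{d:%Y%m%d}'

# The format keeps year, month and day, so parsing restores the same date.
restored = datetime.datetime.strptime(compact, '%Y%m%d').date()

print(compact)   # 20021204
print(restored)  # 2002-12-04
```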
