Python: Convert 2-digit / 4-digit year [duplicate] - python

I am trying to convert the date format '31-Dec-09' to '2009-12-31' in Python with the following code:
df = ['19-Jan-19', '4-Jan-19']
f = [datetime.datetime.strptime(x,'%d-%mmm-%y').strftime('%Y-%m-%d') for x in df]
print(f)
Getting the following error:
time data '19-Jan-19' does not match format '%d-%mmm-%y'
I have read somewhere that matching a two-digit year '00' would require a lowercase y instead of an upper case one. Either way, I am getting the same error so I guess something else is wrong. Any help?

Your %y and %Y patterns are fine, the issue is that you used %mmm here. The datetime.strptime() method patterns are all single letter patterns, and %mmm is seen as the %m pattern followed by two literal m characters. %m matches a numeric month (1 or 2 digits, zero-padding optional). So 19-1mm-19 would match, but 19-Jan-19 does not because the month is not numeric and the two literal m characters are missing.
The correct pattern to use is '%d-%b-%y' here, where %b matches an abbreviated month name.
Demo:
>>> import datetime
>>> df = ['19-Jan-19', '4-Jan-19']
>>> [datetime.datetime.strptime(x,'%d-%b-%y').strftime('%Y-%m-%d') for x in df]
['2019-01-19', '2019-01-04']

Since the month is given as an abbreviated name, you should use %b instead of %mmm, which is not a valid format code in datetime.
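To make the fix above concrete, here is a minimal sketch that wraps the conversion in a helper. The name to_iso_dates is made up for illustration, and %b assumes the process locale uses English month abbreviations. Note that strptime maps two-digit years 69-99 to 1969-1999 and 00-68 to 2000-2068, which is why '09' becomes 2009.

```python
from datetime import datetime

def to_iso_dates(values, fmt="%d-%b-%y"):
    """Convert strings such as '31-Dec-09' to 'YYYY-MM-DD' (hypothetical helper)."""
    # %d accepts one or two digits, %b an abbreviated month name,
    # %y a two-digit year.
    return [datetime.strptime(v, fmt).strftime("%Y-%m-%d") for v in values]

print(to_iso_dates(["19-Jan-19", "4-Jan-19", "31-Dec-09"]))
```

If the input can contain years before 1969 written with two digits, the pivot will put them in the wrong century, so a four-digit year (%Y) is safer when you control the format.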

Related

What format code should I use for this timestamp with strptime in Python?

I have a .txt file that contains the string "2020-08-13T20:41:15.4227628Z"
What format code should I use in strptime function in Python 3.7? I tried the following but the '8' at end just before 'Z' is not a valid weekday
from datetime import datetime
timestamp_str = "2020-08-13T20:41:15.4227628Z"
timestamp = datetime.strptime(timestamp_str, '%Y-%m-%dT%H:%M:%S.%f%uZ')
ValueError: time data '2020-08-13T20:41:15.4227628Z' does not match format '%Y-%m-%dT%H:%M:%S.%f%uZ'
Your timestamp's format mostly follows ISO 8601, except for the 7-digit fractional seconds.
The 7th digit is 1/10th of a microsecond; normally you'd have 3, 6 or 9 digits of resolution (milli-, micro- or nanoseconds respectively).
The Z denotes UTC.
In Python, you can parse this format conveniently as I show here.
The 7 digits following the . are fractional seconds with a resolution of 100 nanoseconds, one digit more than %f accepts (it takes at most 6). You may have a platform-specific format (defined by strftime(3)) available to use in place of %f, but if not, your best bet is to drop the trailing digit before attempting to parse the remaining string as a timestamp.
import re
# Keep six fractional digits, drop any further digits, keep the trailing Z.
regex = r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{6})\d*(Z)"
m = re.match(regex, timestamp_str)
if m is not None:
    timestamp_str = "".join(m.groups())
timestamp = datetime.strptime(timestamp_str, '%Y-%m-%dT%H:%M:%S.%fZ')
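Putting the two answers together, a rough sketch could trim the fraction to six digits and also attach UTC, since the Z is otherwise parsed as a plain literal. The helper name parse_utc_timestamp is an assumption for illustration, not part of any library.

```python
import re
from datetime import datetime, timezone

def parse_utc_timestamp(text):
    """Parse e.g. '2020-08-13T20:41:15.4227628Z' (hypothetical helper)."""
    # Keep at most six fractional digits; %f rejects a seventh.
    trimmed = re.sub(r"(\.\d{6})\d+", r"\1", text)
    parsed = datetime.strptime(trimmed, "%Y-%m-%dT%H:%M:%S.%fZ")
    # The literal Z means UTC, so attach that time zone explicitly.
    return parsed.replace(tzinfo=timezone.utc)

print(parse_utc_timestamp("2020-08-13T20:41:15.4227628Z").isoformat())
```

Truncating (rather than rounding) the extra digit loses at most 0.9 microseconds, which is usually acceptable for log timestamps.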

re.findall only finding half the patterns [duplicate]

I'm using re.findall to parse the year and month from a string, however it is only outputting patterns from half the string. Why is this?
date_string = '2011-1-1_2012-1-3,2015-3-1_2015-3-3'
find_year_and_month = re.findall('[1-2][0-9][0-9][0-9]-[1-12]', date_string)
print(find_year_and_month)
and my output is this:
['2011-1', '2012-1']
This is the correct output for those dates, but why am I only getting matches for half the string?
[1-12] doesn't do what you think it does. It is a character class that matches a single character: anything in the range 1 to 1, or a 2. The 3 in 2015-3 is neither, so those dates never match.
See this question for some replacement regex options, like ([1-9]|1[0-2]): How to represent regex number ranges (e.g. 1 to 12)?
If you want an interactive tool for experimenting with regexes, I personally recommend Regexr.
Adjust your regex pattern as shown below:
import re
date_string = '2011-1-1_2012-1-3,2015-3-1_2015-3-3'
find_year_and_month = re.findall('([1-2][0-9]{3}-(?:1[0-2]|[1-9]))', date_string)
print(find_year_and_month)
The output:
['2011-1', '2012-1', '2015-3', '2015-3']
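A short sketch of why the character class fails and how the alternation fixes it. The names MONTH and YEAR_MONTH are just for illustration.

```python
import re

# A character class matches exactly one character: [1-12] is the set {'1', '2'}.
print(re.findall(r"[1-12]", "0123456789"))

# Months 1-12 need alternation; try the two-digit branch first so that
# '12' is not cut short to '1'.
MONTH = r"(?:1[0-2]|[1-9])"
YEAR_MONTH = re.compile(r"[12][0-9]{3}-" + MONTH)

date_string = "2011-1-1_2012-1-3,2015-3-1_2015-3-3"
print(YEAR_MONTH.findall(date_string))
```

Because the month group is non-capturing, findall returns the whole match for each date instead of only the captured part.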

Python : Matching a custom date format in a string

I have a string like this:
>>> string = "bla_bla-whatever_2018.02.09_11.34.09_more_bla-123"
I need to extract the date 2018.02.09_11.34.09 from it. It will always be in this format.
So I tried:
>>> match = re.search(r'\d{4}\.\d{2}\.\d{2}_\d{2}\.\d{2}\.\d{2}', string)
It correctly extracts out the date from that string:
>>> match.group()
'2018.02.09_11.34.09'
But then when I try to create a datetime object from this string, it doesn't work:
>>> datetime.datetime.strptime(match.group(), '%Y.%m.%d_%H.%I.%S')
ValueError: time data '2018.02.09_11.34.09' does not match format '%Y.%m.%d_%H.%I.%S'
What am I doing wrong?
You need to replace the format specifier %I with %M, for minutes:
%Y.%m.%d_%H.%M.%S
%I denotes hour in 12-hour format so from (0)1..12, whereas based on your example, you have 34 as the value, which presumably is in minutes (%M).
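For completeness, a minimal sketch that combines the search from the question with the corrected format string. The helper name extract_datetime and the constant STAMP_RE are made up for this example.

```python
import re
from datetime import datetime

STAMP_RE = re.compile(r"\d{4}\.\d{2}\.\d{2}_\d{2}\.\d{2}\.\d{2}")

def extract_datetime(text):
    """Return the first embedded timestamp as a datetime, or None (hypothetical helper)."""
    match = STAMP_RE.search(text)
    if match is None:
        return None
    # %H is the 24-hour hour, %M the minute, %S the second.
    return datetime.strptime(match.group(), "%Y.%m.%d_%H.%M.%S")

print(extract_datetime("bla_bla-whatever_2018.02.09_11.34.09_more_bla-123"))
```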

How to change date format of a date inside a string? [duplicate]

I have a string "Hello please change the date from 04/24/2017 by putting month in the middle"
class Paragraph:
    @staticmethod
    def change_date_format(paragraph):
        return None
print(Paragraph.change_date_format('Hello please change the date from 04/24/2017 by putting month in the middle'))
I need to change this to "Hello please change the date from 24/04/2017 by putting month in the middle"
import re
def change_date_format(paragraph):
    return re.sub(r'(\d+)/(\d+)/(\d+)',
                  lambda m: '{}/{}/{}'.format(m.group(2), m.group(1), m.group(3)),
                  paragraph)
This finds a sequence of three groups of digits separated by slashes (like 04/27/2017), then uses format to swap the first and second group.
datetime can do this when you pass it the date string on its own (not the whole paragraph):
import datetime
def change_date_format(date):
    return datetime.datetime.strptime(date, '%m/%d/%Y').strftime('%d/%m/%Y')
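The two answers can be combined: let re.sub find date-shaped text inside the paragraph and let datetime do the conversion, so that only real calendar dates are rewritten. This is a sketch under the assumption that dates are always written as two digits, two digits, four digits; DATE_RE is a name chosen for this example.

```python
import re
from datetime import datetime

DATE_RE = re.compile(r"\b\d{2}/\d{2}/\d{4}\b")

def change_date_format(paragraph):
    """Rewrite every valid MM/DD/YYYY date in the text as DD/MM/YYYY."""
    def swap(match):
        try:
            parsed = datetime.strptime(match.group(), "%m/%d/%Y")
        except ValueError:
            # Not a real calendar date: leave the text untouched.
            return match.group()
        return parsed.strftime("%d/%m/%Y")
    return DATE_RE.sub(swap, paragraph)

print(change_date_format("Hello please change the date from 04/24/2017 by putting month in the middle"))
```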

Python regex to match dates

What regular expression in Python do I use to match dates like this: "11/12/98"?
Instead of using regex, it is generally better to parse the string as a datetime.datetime object:
In [140]: datetime.datetime.strptime("11/12/98","%m/%d/%y")
Out[140]: datetime.datetime(1998, 11, 12, 0, 0)
In [141]: datetime.datetime.strptime("11/12/98","%d/%m/%y")
Out[141]: datetime.datetime(1998, 12, 11, 0, 0)
You could then access the day, month, and year (and hour, minutes, and seconds) as attributes of the datetime.datetime object:
In [142]: date = datetime.datetime.strptime("11/12/98","%m/%d/%y")
In [143]: date.year
Out[143]: 1998
In [144]: date.month
Out[144]: 11
In [145]: date.day
Out[145]: 12
To test if a sequence of digits separated by forward-slashes represents a valid date, you could use a try..except block. Invalid dates will raise a ValueError:
In [159]: try:
   .....:     datetime.datetime.strptime("99/99/99","%m/%d/%y")
   .....: except ValueError as err:
   .....:     print(err)
   .....:
time data '99/99/99' does not match format '%m/%d/%y'
If you need to search a longer string for a date,
you could use regex to search for digits separated by forward-slashes:
In [146]: import re
In [152]: match = re.search(r'(\d+/\d+/\d+)','The date is 11/12/98')
In [153]: match.group(1)
Out[153]: '11/12/98'
Of course, invalid dates will also match:
In [154]: match = re.search(r'(\d+/\d+/\d+)','The date is 99/99/99')
In [155]: match.group(1)
Out[155]: '99/99/99'
To check that match.group(1) returns a valid date string, you could then parse it using datetime.datetime.strptime as shown above.
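The search-then-validate idea can be sketched as a single function. The name find_valid_dates and the default format are assumptions for illustration.

```python
import re
from datetime import datetime

def find_valid_dates(text, fmt="%m/%d/%y"):
    """Return datetimes for slash-separated candidates that really are dates."""
    found = []
    for candidate in re.findall(r"\d+/\d+/\d+", text):
        try:
            found.append(datetime.strptime(candidate, fmt))
        except ValueError:
            continue  # e.g. '99/99/99' matches the regex but is not a date
    return found

print(find_valid_dates("Dates: 11/12/98 and 99/99/99"))
```

The regex only narrows down where a date might be; strptime decides whether it actually is one.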
I find the RE below works fine for dates in the following formats:
14-11-2017
14.11.2017
14|11|2017
It accepts years from 2000 to 2099.
Do not forget the $ at the end; without it, input with trailing characters such as 14-11-20177 is also accepted.
import re
date="13-11-2017"
x=re.search(r"^([1-9]|1[0-9]|2[0-9]|3[0-1])[-.|]([1-9]|1[0-2])[-.|]20[0-9][0-9]$",date)
x.group()
output = '13-11-2017'
I built my solution on top of @aditya Prakash's approach:
print(re.search("^([1-9]|0[1-9]|1[0-9]|2[0-9]|3[0-1])(\.|-|/)([1-9]|0[1-9]|1[0-2])(\.|-|/)([0-9][0-9]|19[0-9][0-9]|20[0-9][0-9])$|^([0-9][0-9]|19[0-9][0-9]|20[0-9][0-9])(\.|-|/)([1-9]|0[1-9]|1[0-2])(\.|-|/)([1-9]|0[1-9]|1[0-9]|2[0-9]|3[0-1])$",'01/01/2018'))
The first part (^([1-9]|0[1-9]|1[0-9]|2[0-9]|3[0-1])(\.|-|/)([1-9]|0[1-9]|1[0-2])(\.|-|/)([0-9][0-9]|19[0-9][0-9]|20[0-9][0-9])$) can handle the following formats:
01.10.2019
1.1.2019
1.1.19
12/03/2020
01.05.1950
The second part (^([0-9][0-9]|19[0-9][0-9]|20[0-9][0-9])(\.|-|/)([1-9]|0[1-9]|1[0-2])(\.|-|/)([1-9]|0[1-9]|1[0-9]|2[0-9]|3[0-1])$) can basically do the same, but in inverse order, where the year comes first, followed by month, and then day.
2020/02/12
As delimiters it allows ., /, and -. For years it allows everything from 1900 to 2099; two-digit years are also accepted.
If you have suggestions for improvement please let me know in the comments, so I can update the answer.
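One possible improvement, offered as a sketch rather than a drop-in replacement: long patterns like the one above are easier to maintain with re.VERBOSE and named groups, and a backreference can require the same separator in both positions. This version covers only the day-first order; the group names are my own choice.

```python
import re

# Assumption: day-first dates only; the separator must be the same both times.
DATE_RE = re.compile(
    r"""
    ^(?P<day>0?[1-9]|[12][0-9]|3[01])   # 1-31, optional leading zero
    (?P<sep>[./-])                      # one of . / -
    (?P<month>0?[1-9]|1[0-2])           # 1-12, optional leading zero
    (?P=sep)                            # same separator again
    (?P<year>(?:19|20)?[0-9]{2})$       # 1900-2099 or a two-digit year
    """,
    re.VERBOSE,
)

for sample in ["01.10.2019", "1.1.19", "12/03/2020", "01.10/2019"]:
    match = DATE_RE.match(sample)
    print(sample, match.groupdict() if match else None)
```

The last sample mixes . and / and is rejected, which the original pattern would accept.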
Using this regular expression you can validate different kinds of Date/Time samples, just a little change is needed.
^\d\d\d\d/(0?[1-9]|1[0-2])/(0?[1-9]|[12][0-9]|3[01]) (00|[0-9]|1[0-9]|2[0-3]):([0-9]|[0-5][0-9]):([0-9]|[0-5][0-9])$ -->validate this: 2018/7/12 13:00:00
For your format you can change it to:
^(0?[1-9]|[12][0-9]|3[01])/(0?[1-9]|1[0-2])/\d\d$ --> validates this: 11/12/98
I use something like this
>>> import datetime
>>> regex = datetime.datetime.strptime
>>>
>>> # TEST
>>> assert regex('2020-08-03', '%Y-%m-%d')
>>>
>>> assert regex('2020-08', '%Y-%m-%d')
ValueError: time data '2020-08' does not match format '%Y-%m-%d'
>>> assert regex('08/03/20', '%m/%d/%y')
>>>
>>> assert regex('08-03-2020', '%m/%d/%y')
ValueError: time data '08-03-2020' does not match format '%m/%d/%y'
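If you would rather get True or False back than an exception, the same idea can be wrapped in a small function. The name is_valid_date is hypothetical.

```python
from datetime import datetime

def is_valid_date(text, fmt):
    """Return True if text parses with fmt, False otherwise (hypothetical helper)."""
    try:
        datetime.strptime(text, fmt)
    except ValueError:
        return False
    return True

print(is_valid_date("2020-08-03", "%Y-%m-%d"))
print(is_valid_date("2020-08", "%Y-%m-%d"))
```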
Well, from my understanding, simply for matching this format in a given string, I prefer this regular expression:
pattern='[0-9|/]+'
To match the format more strictly, the following works:
pattern='(?:[0-9]{2}/){2}[0-9]{2}'
Personally, I cannot agree with unutbu's answer since sometimes we use regular expression for "finding" and "extract", not only "validating".
Sometimes we need to get the date from a string.
One example with grouping:
import re
record = '1518-09-06 00:57 some-alphanumeric-character'
pattern_date_time = r'([0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}) .+'
match = re.match(pattern_date_time, record)
if match is not None:
    date = match.group(1)  # first capture group: the date and time
    print(date)  # outputs 1518-09-06 00:57
As the question title asks for a regex that finds many dates, I would like to propose a new solution, although there are many solutions already.
To find all dates in a string that fall in this century (2000-2099), the following worked for me:
dates = re.findall(r'([1-9]|1[0-9]|2[0-9]|3[0-1]|0[0-9])(\.|-|\/)([1-9]|1[0-2]|0[0-9])(\.|-|\/)(20[0-9][0-9])', dates_ele)
dates = [''.join(d) for d in dates]
This regex can find multiple dates in the same string, like bla Bla 8.05/2020 \n BLAH bla15/05-2020 blaa. As you can see, instead of / the date can use . or -, and the two separators do not have to be the same.
Some explaining
More specifically, it finds dates in day, month, year order. Day is a one-digit integer, or a zero followed by a one-digit integer, or 1 or 2 followed by a one-digit integer, or a 3 followed by 0 or 1. Month is a one-digit integer, or a zero followed by a one-digit integer, or 1 followed by 0, 1, or 2. Year is the number 20 followed by any number between 00 and 99.
Useful notes
You can add more separators by adding a | alternative at the end of both (\.|-|\/) groups. For example, to add --, use (\.|-|\/|--).
To allow years outside this range, change (20[0-9][0-9]) to ([0-9][0-9][0-9][0-9]).
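As a variation, here is a sketch that uses non-capturing groups so that findall returns the matched dates directly and the ''.join step is not needed. It keeps the 2000-2099 year limit of the pattern above, and longer alternatives are tried before the single-digit ones; DATE_RE is a name picked for the example.

```python
import re

# Non-capturing groups (?:...) make findall return whole matches, not tuples.
# Assumption: years are limited to 2000-2099, as in the pattern above.
DATE_RE = re.compile(
    r"(?:0[1-9]|[12][0-9]|3[01]|[1-9])"   # day
    r"[./-]"                              # separator
    r"(?:0[1-9]|1[0-2]|[1-9])"            # month
    r"[./-]"                              # separator
    r"20[0-9]{2}"                         # year
)

text = "bla Bla 8.05/2020 \n BLAH bla15/05-2020 blaa"
print(DATE_RE.findall(text))
```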
I use something like this :
string="text 24/02/2021 ... 24-02-2021 ... 24_02_2021 ... 24|02|2021 text"
new_string = re.sub(r"[0-9]{1,4}[\_|\-|\/|\|][0-9]{1,2}[\_|\-|\/|\|][0-9]{1,4}", ' ', string)
print(new_string)
out : text ... ... ... text
If you don't want to handle the ValueError raised by the datetime approach, you can use re. You should also check that the day of the month is at most 31 and the month number is at most 12:
from re import search as re_search
date_input = '31.12.1998'
re_search(r'^(3[01]|[12][0-9]|0[1-9])\.(1[0-2]|0[1-9])\.[0-9]{4}$', date_input)
@unutbu gave a good datetime-based answer earlier.
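One caveat worth showing: a range check like this still accepts impossible dates such as 31.02.1998. A rough sketch that keeps the no-exception interface but lets datetime make the final call could look like this; PATTERN and parse_or_none are names invented for the example.

```python
import re
from datetime import datetime

PATTERN = re.compile(r"^(3[01]|[12][0-9]|0[1-9])\.(1[0-2]|0[1-9])\.[0-9]{4}$")

def parse_or_none(text):
    """Return a datetime for DD.MM.YYYY input, or None if it is not a real date."""
    if PATTERN.match(text) is None:
        return None
    try:
        return datetime.strptime(text, "%d.%m.%Y")
    except ValueError:
        # The shape was right, but the calendar disagrees (e.g. 31 February).
        return None

print(parse_or_none("31.12.1998"))
print(parse_or_none("31.02.1998"))
```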
In case anyone wants to match this type of date "24 November 2008"
you can use
import re
date = "24 November 2008"
regex = re.compile(r"\d+\s\w+\s\d+")
matchDate = regex.findall(date)
print(matchDate)
Or
import re
date = "24 November 2008"
matchDate = re.findall(r"\d+\s\w+\s\d+", date)
print(matchDate)
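Since \w+ matches any word, not just month names, it may help to confirm each hit with strptime. The following is only a sketch: find_long_dates is a hypothetical name, the pattern is slightly tightened to letters and a four-digit year, and %B assumes English month names in the current locale.

```python
import re
from datetime import datetime

def find_long_dates(text):
    """Return datetimes for 'DD Month YYYY' dates found in text (hypothetical helper)."""
    results = []
    for candidate in re.findall(r"\d{1,2}\s[A-Za-z]+\s\d{4}", text):
        try:
            # %B is the full month name in the current locale.
            results.append(datetime.strptime(candidate, "%d %B %Y"))
        except ValueError:
            pass  # e.g. '24 Bananas 2008' looks right but is not a date
    return results

print(find_long_dates("Released on 24 November 2008, not 24 Bananas 2008"))
```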
This regular expression for matching dates in the format "22/10/2021" works for me:
import re
date = "WHATEVER 22/10/2029 WHATEVER"
match = re.search(r"(0?[1-9]|[12][0-9]|3[01])/(0?[1-9]|1[0-2])/([0-9]{4})", date)
print(match)
OUTPUT = <re.Match object; span=(9, 19), match='22/10/2029'>
The string passed to re.search, (0?[1-9]|[12][0-9]|3[01])/(0?[1-9]|1[0-2])/([0-9]{4}), is the regular expression, which I built based on this page.
