Hi, I have written a regex to check whether a string contains characters like - or . or / or : or AM or PM or a space. The following regex works for that, but I want the check to fail if the string contains any letter other than A, M or P.
import re
Datere = re.compile("[-./\:?AMP ]+")
FD = { 'Date' : lambda date : bool(re.search(Datere,date)),}
def Validate(date):
    for k,v in date.iteritems():
        print k,v
        print FD.get(k)(v)
Output:
Validate({'Date':'12/12/2010'})
Date 12/12/2010
True
Validate({'Date':'12/12/2010 12:30 AM'})
Date 12/12/2010
True
Validate({'Date':'12/12/2010 ZZ'})
Date 12/12/2010
True (Expecting False)
Edited:
Validate({'Date':'12122010'})
Date 12122010
False (Expecting False)
How can I detect that the string contains letters other than A, P and M? Any suggestions? Thanks a lot.
Give this a try:
^[-./\:?AMP \d]*$
The changes to your regex are
It's anchored with ^ and $ which means that the whole line should match and not partially
the \d is added to the character class to allow digits
Now the regex basically reads as a list of the symbols that are allowed on one line
If you want the empty string not to match then change the * to a +
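As a quick check of the anchored character class, here is a minimal sketch. The helper name only_allowed_chars and the sample list are made up for illustration, and dropping the ? from the class is my assumption about what the asker needs.

```python
import re

# Assumption: the allowed characters are digits, space, '-', '.', '/', ':'
# and the letters A, M and P. Anything else makes the whole string fail.
ALLOWED = re.compile(r'^[-./:AMP \d]+$')

def only_allowed_chars(text):
    """Return True when every character of text is in the allowed set."""
    return ALLOWED.match(text) is not None

samples = ['12/12/2010', '12/12/2010 12:30 AM', '12/12/2010 ZZ', '12122010']
results = {s: only_allowed_chars(s) for s in samples}
for s, ok in results.items():
    print(s, ok)
```

Note that a digits-only string such as '12122010' still passes, because this pattern only restricts the character set and does not require any separator.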
You could use an expression like this instead:
^[-0-9./:AMP ]+$
^ and $ anchor the expression at the beginning and end of string, making sure there is nothing else in it (except an optional new line after $).
The way you approach this is too naive to deal with garbled input like '-30/A-MP/2012/12', '-30/A-MP/20PA12/12'.
If you want to validate your dates robustly, how about:
import datetime
date = '12-12-2012 10:45 AM'
formats = ("%d-%m-%Y %I:%M %p", "%d/%m/%Y %I:%M %p")  # ... plus any other formats you accept
for fmt in formats:
    try:
        valid_date = datetime.datetime.strptime(date, fmt)
        break  # stop at the first format that parses
    except ValueError as e:
        print(e)
You would have to define all possible formats, but you get full datetime objects (or time or date objects, which work similarly), and you can be sure they are valid. For a full explanation of the available format specifiers: http://docs.python.org/library/time.html#time.strftime
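The same idea can be wrapped in a small helper that returns the first successful parse. This is only a sketch: the name parse_date and the third format in the list are assumptions for illustration, not part of the answer above.

```python
from datetime import datetime

# Hypothetical list of accepted layouts; extend it to cover your inputs.
FORMATS = ("%d-%m-%Y %I:%M %p", "%d/%m/%Y %I:%M %p", "%d/%m/%Y")

def parse_date(text, formats=FORMATS):
    """Return a datetime for the first format that matches, or None."""
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # try the next format
    return None

print(parse_date('12-12-2012 10:45 AM'))
print(parse_date('12/12/2010 ZZ'))
print(parse_date('-30/A-MP/2012/12'))
```

Returning None instead of raising keeps the calling code simple when all you need is a yes/no validity check.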
Kind of elaborate, but does the trick.
import re
Datere = re.compile("""
^(?:\d\d[-./\:]){2} ## dd_SEP_dd
\d{4}\s* ## year may be followed by spaces
(?:\d\d[-./\:]\d\d\s+(?:AM|PM))? ## hh_SEP_mm spaces followed by AM/PM and this is optional
\s*$""",re.X)
FD = { 'Date' : lambda date : bool(re.search(Datere,date)),}
def Validate(date):
    for k,v in date.iteritems():
        print k,v
        print FD.get(k)(v)
print Validate({'Date':'12/12/2010'})
print Validate({'Date':'12/12/2010 12:30 AM'})
print Validate({'Date':'12/12/2010 ZZ'})
Related
I have to find a date inside a string with a regular expression in Python:
astring ='L2A_T21HUB_A023645_20210915T135520'
I'm trying to get the part before the T, which has the shape xxxxxxxx where every x is a digit.
desiredOutput = '20210915'
I'm new to regex, so I have no idea how to solve this.
If the astring's format is consistent, meaning it will always have the same shape with respect to the date, you can split the string by '_' and get the last substring and get the date from there as such:
astring ='L2A_T21HUB_A023645_20210915T135520'
date_split = astring.split("_") # --> ['L2A', 'T21HUB', 'A023645', '20210915T135520']
desiredOutput = date_split[3][:8] # --> [3] = '20210915T135520' [:8] gets first 8 chars
print(desiredOutput) # --> 20210915
If you wanted an actual datetime object
>>> from datetime import datetime
>>> astring = 'L2A_T21HUB_A023645_20210915T135520'
>>> date_str = astring.split('_')[-1]
>>> datetime.strptime(date_str, '%Y%m%dT%H%M%S')
datetime.datetime(2021, 9, 15, 13, 55, 20)
From that, you can use datetime.strftime to reformat to a new string, or you can use split('T')[0] to get the string you want.
The trouble with Regex is that there can be unexpected patterns that match your expected pattern and throw things off. However, if you know that only the date portion will ever have 8 sequential digits, you can do this:
import re
date_patt = re.compile(r'\d{8}')
date = date_patt.search(astring).group(0)
You can develop more robust patterns based on your knowledge of the formatting of the incoming strings. For instance, if you know that the date will always follow an underscore, you could use a look-behind assertion:
date_patt = re.compile(r'(?<=\_)\d{8}') # look for '_' before the date, but don't capture
Hope this helps. Regex can be finicky and may take some tweaking, but hope this sets you in the right direction.
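Putting the look-behind pattern to work on the sample string might look like the sketch below. The extra look-ahead for the trailing T is my own assumption about the input format, not something stated in the question.

```python
import re

astring = 'L2A_T21HUB_A023645_20210915T135520'

# Eight digits preceded by '_' and followed by 'T'. Neither the underscore
# nor the 'T' is part of the match. The 'T' check is an added assumption.
date_patt = re.compile(r'(?<=_)\d{8}(?=T)')

match = date_patt.search(astring)
date = match.group(0) if match else None
print(date)
```

Checking for None before calling group() avoids an AttributeError when a string has no date in it.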
I have a datetime string and, I want to separate the hour format from the minute format and then print both of them on a separate line.
Here is the code:
import re
from datetime import datetime
ct = datetime.now().strftime("%H:%M")
time = "12:30"
minus = datetime.strptime(time,"%H:%M") - datetime.strptime(ct,"%H:%M")
minus = datetime.strptime(str(minus),"%H:%M:%S").strftime("%H:%M")
# print(minus)
regex2 = re.compile(r'(\d)+:(\d)+')
match = regex2.search(minus)
print(match.group(0))
If the variable minus gives an output: 01:22
Then I want it 01 and 22 to be printed on different lines.
Output Should be like:
01
22
You don't need regex for that.
You can use the timedelta directly:
from datetime import datetime
ct = datetime.now().strftime("%H:%M")
time = "10:30"
minus = datetime.strptime(time,"%H:%M") - datetime.strptime(ct,"%H:%M")
## extract whatever values you want from this delta:
## eg:
seconds = minus.seconds
hours = minus.seconds//3600
minutes = (minus.seconds//60)%60
print(hours)
print(minutes)
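For the zero-padded output the question asks for (01 and 22), a variant using divmod and format specifiers could look like this. The fixed start and end times are hypothetical values chosen so the result is reproducible.

```python
from datetime import datetime

# Fixed example times (hypothetical) instead of datetime.now().
start = datetime.strptime("11:08", "%H:%M")
end = datetime.strptime("12:30", "%H:%M")
delta = end - start

# delta.seconds ignores the days component, so a negative difference
# wraps around as if on a 24-hour clock.
hours, remainder = divmod(delta.seconds, 3600)
minutes = remainder // 60

hours_text = f"{hours:02d}"
minutes_text = f"{minutes:02d}"
print(hours_text)
print(minutes_text)
```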
Ignoring the fact that your script fails due to a runtime error at line 6 (which prevents any output from being generated from the regex part of the script), to capture the digits surrounding the colon, you need to put the repetition marker (i.e. the plus sign) inside the capturing markers (i.e. the parentheses).
So revising your regex:
regex2 = re.compile(r'(\d)+:(\d)+')
as follows:
regex2 = re.compile(r'(\d+):(\d+)')
Will achieve that.
The second thing you need to do is print the matching groups (as opposed to the entire matched text - i.e. group(0)).
Printing the first and second matching groups is achieved as follows:
print(match.group(1))
print(match.group(2))
The difference between your regular expression and mine is in what each group ends up capturing.
In your case, (\d)+ repeats a group that holds a single digit. A repeated group keeps only its last repetition, so each group retains just the final digit it matched.
In my case, (\d+) is a single group containing one or more digits. I use this pattern twice, once on each side of the colon.
So, for input like 01:22, your regex produces the groups "1" and "2", whereas mine produces "01" and "22", which I think is what you are trying to achieve.
Note that for input like "0:122", your regular expression would produce "0" and "2", losing most of the digits. With my RE, the input "0:122" yields "0" and "122", thereby correctly telling you where the colon was (i.e. between the "0" and the "122").
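Running both patterns side by side shows what each group holds after a match. Keep in mind that Python's re module retains only the last repetition of a repeated capturing group. The variable names here are just for illustration.

```python
import re

text = "01:22"

repeated_group = re.search(r'(\d)+:(\d)+', text)  # quantifier outside the group
whole_group = re.search(r'(\d+):(\d+)', text)     # quantifier inside the group

print(repeated_group.groups())
print(whole_group.groups())

# Printing each part on its own line, as the question asks:
print(whole_group.group(1))
print(whole_group.group(2))
```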
Use the regex below:
'([^:]+)'
See regex explanation here: https://regex101.com/r/EgXlcD/46
import re
from datetime import datetime
ct = datetime.now().strftime("%H:%M")
time = "10:30"
minus = datetime.strptime(time,"%H:%M") - datetime.strptime(ct,"%H:%M")
minus = datetime.strptime(str(minus),"%H:%M:%S").strftime("%H:%M")
#print(minus)
matches = re.findall(r'([^:]+)',minus)
for match in matches:
    print(match)
I have a string like this:
>>> string = "bla_bla-whatever_2018.02.09_11.34.09_more_bla-123"
I need to extract the date 2018.02.09_11.34.09 from it. It will always be in this format.
So I tried:
>>> match = re.search(r'\d{4}\.\d{2}\.\d{2}_\d{2}\.\d{2}\.\d{2}', string)
It correctly extracts out the date from that string:
>>> match.group()
'2018.02.09_11.34.09'
But then when I try to create a datetime object from this string, it doesn't work:
>>> datetime.datetime.strptime(match.group(), '%Y.%m.%d_%H.%I.%S')
ValueError: time data '2018.02.09_11.34.09' does not match format '%Y.%m.%d_%H.%I.%S'
What am I doing wrong?
You need to replace the format specifier %I with %M, for minutes:
%Y.%m.%d_%H.%M.%S
%I denotes hour in 12-hour format so from (0)1..12, whereas based on your example, you have 34 as the value, which presumably is in minutes (%M).
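With the corrected format string, the extraction and the parse from the question can be checked end to end. This is a sketch that reuses the question's own sample string; the name parsed is mine.

```python
import re
from datetime import datetime

string = "bla_bla-whatever_2018.02.09_11.34.09_more_bla-123"
match = re.search(r'\d{4}\.\d{2}\.\d{2}_\d{2}\.\d{2}\.\d{2}', string)

# %H = hour on a 24-hour clock, %M = minute, %S = second
parsed = datetime.strptime(match.group(), '%Y.%m.%d_%H.%M.%S')
print(parsed)
```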
I have an array of strings representing dates like '2015-6-03' and I want to convert these to the format '2015-06-03'.
Instead of doing the replacement with an ugly loop, I'd like to use a regular expression. Something along the lines of:
str.replace('(-){1}(\d){1}(-){1}', '-0{my digit here}-')
Is something like this possible?
You don't have to retrieve the digit from the match. You can replace the hyphen before a single-digit month with -0.
Like this:
re.sub(r'-(?=\d-)', '-0', text)
Note that (?=\d-) is a lookahead assertion, marked by the special sequence ?= after the opening parenthesis. It checks that a digit and a hyphen follow, but does not consume them. That's why only the hyphen gets replaced.
Test:
import re
text = '2015-09-03 2015-6-03 2015-1-03 2015-10-03'
re.sub(r'-(?=\d-)', '-0', text)
Result:
'2015-09-03 2015-06-03 2015-01-03 2015-10-03'
Yes, a regex will accomplish what you want:
(\d+)-(\d)-(\d+)
To do the replacement, capture the year and day as well and write them back around the zero-padded month:
import re
target = "2015-6-05"
out = re.sub(r'(\d+)-(\d)-(\d+)', r'\1-0\2-\3', target)
No need for regex, you can load it as datetime object and format the string as requested when you print it:
import datetime
s = '2015-6-03'
date_obj = datetime.datetime.strptime(s, '%Y-%m-%d')
print("%d-%02d-%02d" % (date_obj.year, date_obj.month, date_obj.day))
OUTPUT
2015-06-03
Something along the lines of...
import re
def replaceRegex(what, pattern, filler):
    regex = re.compile(pattern)
    match = regex.match(what)
    if match is not None:
        start, end = match.span()  # 'from' is a reserved word, so use other names
        return what.replace(what[start:end], filler)
    else:
        return None
Might help you.
What regular expression in Python do I use to match dates like this: "11/12/98"?
Instead of using regex, it is generally better to parse the string as a datetime.datetime object:
In [140]: datetime.datetime.strptime("11/12/98","%m/%d/%y")
Out[140]: datetime.datetime(1998, 11, 12, 0, 0)
In [141]: datetime.datetime.strptime("11/12/98","%d/%m/%y")
Out[141]: datetime.datetime(1998, 12, 11, 0, 0)
You could then access the day, month, and year (and hour, minutes, and seconds) as attributes of the datetime.datetime object:
In [142]: date = datetime.datetime.strptime("11/12/98","%m/%d/%y")
In [143]: date.year
Out[143]: 1998
In [144]: date.month
Out[144]: 11
In [145]: date.day
Out[145]: 12
To test if a sequence of digits separated by forward-slashes represents a valid date, you could use a try..except block. Invalid dates will raise a ValueError:
In [159]: try:
   .....:     datetime.datetime.strptime("99/99/99","%m/%d/%y")
   .....: except ValueError as err:
   .....:     print(err)
   .....:
   .....:
time data '99/99/99' does not match format '%m/%d/%y'
If you need to search a longer string for a date,
you could use regex to search for digits separated by forward-slashes:
In [146]: import re
In [152]: match = re.search(r'(\d+/\d+/\d+)','The date is 11/12/98')
In [153]: match.group(1)
Out[153]: '11/12/98'
Of course, invalid dates will also match:
In [154]: match = re.search(r'(\d+/\d+/\d+)','The date is 99/99/99')
In [155]: match.group(1)
Out[155]: '99/99/99'
To check that match.group(1) returns a valid date string, you could then parse it using datetime.datetime.strptime as shown above.
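Combining the two steps, regex to find candidates and strptime to validate them, could look like this sketch. The function name find_valid_dates and the default format are assumptions for illustration.

```python
import re
from datetime import datetime

def find_valid_dates(text, fmt="%m/%d/%y"):
    """Return datetimes for every digits/digits/digits run that parses with fmt."""
    found = []
    for candidate in re.findall(r'\d+/\d+/\d+', text):
        try:
            found.append(datetime.strptime(candidate, fmt))
        except ValueError:
            pass  # looks like a date but is not a valid one
    return found

print(find_valid_dates('Valid: 11/12/98, invalid: 99/99/99'))
```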
I find the RE below works fine for dates in the following formats:
14-11-2017
14.11.2017
14|11|2017
It accepts years from 2000 to 2099.
Please do not forget the $ at the end; without it, input with trailing characters such as 14-11-20177 is also accepted.
import re
date="13-11-2017"
x=re.search(r"^([1-9]|1[0-9]|2[0-9]|3[0-1])(\.|-|\|)([1-9]|1[0-2])(\.|-|\|)20[0-9][0-9]$",date)
x.group()
output = '13-11-2017'
I built my solution on top of @aditya Prakash's approach:
print(re.search("^([1-9]|0[1-9]|1[0-9]|2[0-9]|3[0-1])(\.|-|/)([1-9]|0[1-9]|1[0-2])(\.|-|/)([0-9][0-9]|19[0-9][0-9]|20[0-9][0-9])$|^([0-9][0-9]|19[0-9][0-9]|20[0-9][0-9])(\.|-|/)([1-9]|0[1-9]|1[0-2])(\.|-|/)([1-9]|0[1-9]|1[0-9]|2[0-9]|3[0-1])$",'01/01/2018'))
The first part (^([1-9]|0[1-9]|1[0-9]|2[0-9]|3[0-1])(\.|-|/)([1-9]|0[1-9]|1[0-2])(\.|-|/)([0-9][0-9]|19[0-9][0-9]|20[0-9][0-9])$) can handle the following formats:
01.10.2019
1.1.2019
1.1.19
12/03/2020
01.05.1950
The second part (^([0-9][0-9]|19[0-9][0-9]|20[0-9][0-9])(\.|-|/)([1-9]|0[1-9]|1[0-2])(\.|-|/)([1-9]|0[1-9]|1[0-9]|2[0-9]|3[0-1])$) can basically do the same, but in inverse order, where the year comes first, followed by month, and then day.
2020/02/12
As delimiters it allows ., /, -. As years it allows everything from 1900-2099, also giving only two numbers is fine.
If you have suggestions for improvement please let me know in the comments, so I can update the answer.
Using this regular expression you can validate different kinds of Date/Time samples, just a little change is needed.
^\d\d\d\d/(0?[1-9]|1[0-2])/(0?[1-9]|[12][0-9]|3[01]) (00|[0-9]|1[0-9]|2[0-3]):([0-9]|[0-5][0-9]):([0-9]|[0-5][0-9])$ -->validate this: 2018/7/12 13:00:00
For your format, you can change it to:
^(0?[1-9]|[12][0-9]|3[01])/(0?[1-9]|1[0-2])/\d\d$ --> validates this: 11/12/98
I use something like this
>>> import datetime
>>> regex = datetime.datetime.strptime
>>>
>>> # TEST
>>> assert regex('2020-08-03', '%Y-%m-%d')
>>>
>>> assert regex('2020-08', '%Y-%m-%d')
ValueError: time data '2020-08' does not match format '%Y-%m-%d'
>>> assert regex('08/03/20', '%m/%d/%y')
>>>
>>> assert regex('08-03-2020', '%m/%d/%y')
ValueError: time data '08-03-2020' does not match format '%m/%d/%y'
Well, from my understanding, simply for matching this format in a given string, I prefer this regular expression:
pattern='[0-9|/]+'
to match the format in a more strict way, the following works:
pattern='(?:[0-9]{2}/){2}[0-9]{2}'
Personally, I cannot agree with unutbu's answer since sometimes we use regular expression for "finding" and "extract", not only "validating".
Sometimes we need to get the date from a string.
One example with grouping:
import re
record = '1518-09-06 00:57 some-alphanumeric-charecter'
pattern_date_time = r'([0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}) .+'
match = re.match(pattern_date_time, record)
if match is not None:
    date = match.group(1)
    print(date)  # outputs 1518-09-06 00:57
As the question title asks for a regex that finds many dates, I would like to propose a new solution, although there are many solutions already.
To find all dates in a string that fall within this millennium (2000 - 2999), the following worked for me:
dates = re.findall('([1-9]|1[0-9]|2[0-9]|3[0-1]|0[0-9])(.|-|\/)([1-9]|1[0-2]|0[0-9])(.|-|\/)(20[0-9][0-9])',dates_ele)
dates = [''.join(dates[i]) for i in range(len(dates))]
This regex is able to find multiple dates in the same string, like bla Bla 8.05/2020 \n BLAH bla15/05-2020 blaa. As one can see, instead of / the date can have . or -, and the two separators need not be the same.
Some explaining
More specifically, it can find dates of the format day, month, year. Day is a one-digit integer, or a zero followed by a one-digit integer, or 1 or 2 followed by a one-digit integer, or a 3 followed by 0 or 1. Month is a one-digit integer, or a zero followed by a one-digit integer, or 1 followed by 0, 1, or 2. Year is the number 20 followed by any number between 00 and 99.
Useful notes
One can add more date splitting symbols by adding | symbol at the end of both (.|-|\/). For example for adding -- one would do (.|-|\/|--)
To have years outside of this millennium one has to modify (20[0-9][0-9]) to ([0-9][0-9][0-9][0-9])
I use something like this :
string="text 24/02/2021 ... 24-02-2021 ... 24_02_2021 ... 24|02|2021 text"
new_string = re.sub(r"[0-9]{1,4}[\_|\-|\/|\|][0-9]{1,2}[\_|\-|\/|\|][0-9]{1,4}", ' ', string)
print(new_string)
out : text ... ... ... text
If you don't want to deal with the ValueError exception raised by the datetime-based methods, you can use re. You should probably also check that the day of the month is at most 31 and the month number is at most 12:
from re import search as re_search
date_input = '31.12.1998'
re_search(r'^(3[01]|[12][0-9]|0[1-9])\.(1[0-2]|0[1-9])\.[0-9]{4}$', date_input)
@unutbu gave a good datetime-based answer earlier.
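A range-checking regex still accepts dates that do not exist, such as 31 February. If that matters, one option is to keep the regex as a cheap shape check and let strptime decide about the calendar, with the exception caught inside a helper. The names SHAPE and is_real_date are hypothetical.

```python
import re
from datetime import datetime

SHAPE = re.compile(r'^(3[01]|[12][0-9]|0[1-9])\.(1[0-2]|0[1-9])\.[0-9]{4}$')

def is_real_date(text):
    """True only if text has the dd.mm.yyyy shape and names an existing day."""
    if SHAPE.match(text) is None:
        return False
    try:
        datetime.strptime(text, '%d.%m.%Y')
    except ValueError:
        return False
    return True

for sample in ('31.12.1998', '31.02.1998', '32.12.1998'):
    print(sample, is_real_date(sample))
```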
In case anyone wants to match this type of date "24 November 2008"
you can use
import re
date = "24 November 2008"
regex = re.compile(r"\d+\s\w+\s\d+")
matchDate = regex.findall(date)
print(matchDate)
Or
import re
date = "24 November 2008"
matchDate = re.findall(r"\d+\s\w+\s\d+", date)
print(matchDate)
This regular expression for matching dates in this format "22/10/2021" works for me :
import re
date = "WHATEVER 22/10/2029 WHATEVER"
match = re.search("([0-9]|1[0-9]|2[0-9]|3[0-5])/([0-9]|1[0-9]|2[0-9]|3[0-5])/([0-9][0-9][0-9][0-9])", date)
print(match)
OUTPUT = <re.Match object; span=(9, 19), match='22/10/2029'>
You can see in the fourth line that there is this string ([0-9]|1[0-9]|2[0-9]|3[0-5])/([0-9]|1[0-9]|2[0-9]|3[0-5])/([0-9][0-9][0-9][0-9]); this is the regular expression that I made based on this page.