I have a date format like yyyymmdd. I need to get the current date in that format using a regular expression.
Example of date format - 20191211 (for 11th December 2019)
Currently I am using the following regex - ([12]\d{3}(0[1-9]|1[0-2])(0[1-9]|[12]\d|3[01]))
The code I am using -
from datetime import datetime
prefix_regex = r"([12]\d{3}(0[1-9]|1[0-2])(0[1-9]|[12]\d|3[01]))"
now = datetime.now()
date_after_match = now.strftime(prefix_regex)
print(date_after_match)
Output - ([12]\d{3}(0[1-9]|1[0-2])(0[1-9]|[12]\d|3[01]))
I know this is possible if instead of the regex I simply use "%Y%m%d" like -
prefix_regex = "%Y%m%d"
now = datetime.now()
date_after_match = now.strftime(prefix_regex)
print (date_after_match)
Output - 20191211. This is the desired output from the regex expression.
But, I need to use a regex. Is there a way to do it?
This works for me using re:
import re
datestr = '20191211'
print(re.findall(pattern=r"\d" * 8, string=datestr)[0])
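A regex can only recognize or validate text; it cannot produce the current date on its own. Here is a minimal sketch, assuming the goal is a yyyymmdd string that is then checked against the pattern from the question. The function name today_yyyymmdd is hypothetical.

```python
import re
from datetime import datetime

# Pattern from the question: year 1000-2999, month 01-12, day 01-31.
DATE_RE = re.compile(r"[12]\d{3}(0[1-9]|1[0-2])(0[1-9]|[12]\d|3[01])")


def today_yyyymmdd(now=None):
    """Format a datetime as yyyymmdd, then confirm it fits the regex."""
    now = now or datetime.now()
    text = now.strftime("%Y%m%d")  # strftime does the formatting
    if DATE_RE.fullmatch(text) is None:  # the regex only validates
        raise ValueError("unexpected date string: " + text)
    return text


print(today_yyyymmdd(datetime(2019, 12, 11)))  # 20191211
```

Called without an argument, the function uses datetime.now().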
I have to find a date inside a string with a regular expression in Python:
astring = 'L2A_T21HUB_A023645_20210915T135520'
I'm trying to get the part before the T, which has the shape xxxxxxxx where every x is a digit.
desiredOutput = '20210915'
I'm new to regex, so I have no idea how to solve this.
If the astring's format is consistent, meaning it will always have the same shape with respect to the date, you can split the string by '_' and get the last substring and get the date from there as such:
astring ='L2A_T21HUB_A023645_20210915T135520'
date_split = astring.split("_")  # --> ['L2A', 'T21HUB', 'A023645', '20210915T135520']
desiredOutput = date_split[3][:8] # --> [3] = '20210915T135520' [:8] gets first 8 chars
print(desiredOutput) # --> 20210915
If you wanted an actual datetime object
>>> from datetime import datetime
>>> astring = 'L2A_T21HUB_A023645_20210915T135520'
>>> date_str = astring.split('_')[-1]
>>> datetime.strptime(date_str, '%Y%m%dT%H%M%S')
datetime.datetime(2021, 9, 15, 13, 55, 20)
From that, you can use datetime.strftime to reformat to a new string, or you can use split('T')[0] to get the string you want.
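Both options mentioned above can be sketched side by side. This is only an illustration on the sample string from the question; the variable names via_strftime and via_split are made up here.

```python
from datetime import datetime

astring = 'L2A_T21HUB_A023645_20210915T135520'
stamp = astring.split('_')[-1]  # '20210915T135520'

# Option 1: parse to a datetime, then reformat with strftime.
parsed = datetime.strptime(stamp, '%Y%m%dT%H%M%S')
via_strftime = parsed.strftime('%Y%m%d')

# Option 2: plain string handling, no parsing or validation.
via_split = stamp.split('T')[0]

print(via_strftime, via_split)  # 20210915 20210915
```

The first option also rejects impossible dates, because strptime raises ValueError for them; the second does not check anything.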
The trouble with Regex is that there can be unexpected patterns that match your expected pattern and throw things off. However, if you know that only the date portion will ever have 8 sequential digits, you can do this:
import re
date_patt = re.compile(r'\d{8}')
date = date_patt.search(astring).group(0)
You can develop more robust patterns based on your knowledge of the formatting of the incoming strings. For instance, if you know that the date will always follow an underscore, you could use a look-behind assertion:
date_patt = re.compile(r'(?<=_)\d{8}') # look for '_' before the date, but don't include it in the match
Hope this helps. Regex can be finicky and may take some tweaking, but hope this sets you in the right direction.
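To guard against eight digits that only look like a date, the regex can be combined with strptime. This is a rough sketch under the assumption that the date always follows an underscore and is followed by T plus six time digits; the helper name extract_date is hypothetical.

```python
import re
from datetime import datetime

# Eight digits preceded by '_' and followed by 'T' plus six digits.
# The look-behind and look-ahead are checked but not part of the match.
DATE_PATT = re.compile(r"(?<=_)\d{8}(?=T\d{6})")


def extract_date(name):
    """Return the yyyymmdd part of name, or None if absent or invalid."""
    match = DATE_PATT.search(name)
    if match is None:
        return None
    try:
        datetime.strptime(match.group(0), "%Y%m%d")  # reject e.g. month 13
    except ValueError:
        return None
    return match.group(0)


print(extract_date('L2A_T21HUB_A023645_20210915T135520'))  # 20210915
```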
I am trying to write a regex for a Python script that matches the second group only if the first group is a match.
I am trying to grab the dates if the text looks like this:
Cancel Date: 08/09/19
Cancellation Date: 08/05/19
It should not grab the date if the text is anything else other than what is mentioned above.
e.g Due date: 12/34/12 should not match or grab the dates.
Current regex solution:
(Cancel Date:|Cancellation Date:)[\s\n\r\t]*(\d{1,2}/\d{1,2}/\d{2})
I am using regex.search().group(2) to grab the date, but I keep getting a 'NoneType' attribute error where the dates should be. Any help or an alternative solution is appreciated.
I am capturing the regex in a config file with xml format.
This seems to work for me. Did you properly escape your regex-pattern, or make it a raw-string?
import re as regex
strings = [
    "Cancel Date: 08/09/19",
    "Cancellation Date: 08/05/19",
    "Due date: 12/34/12"
]
pattern = r"Cancel(lation)? Date:\s*(\d{1,2}/\d{1,2}/\d{2})"
for string in strings:
    match = regex.match(pattern, string)
    if match is None:
        print("No Match")
    else:
        print(match.group(2))
Output:
08/09/19
08/05/19
No Match
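The 'NoneType' error in the question appears whenever .group() is called on a failed search, because re.search returns None when nothing matches. Here is a small sketch of a guard around that, using a non-capturing group for the label so the date is group 1; the function name cancel_date is hypothetical.

```python
import re

# (?:...) groups the two labels without capturing, so the date is group 1.
CANCEL_RE = re.compile(
    r"(?:Cancel|Cancellation) Date:\s*(\d{1,2}/\d{1,2}/\d{2})"
)


def cancel_date(text):
    """Return the date after a cancel label, or None when there is none."""
    match = CANCEL_RE.search(text)
    if match is None:  # check before calling .group()
        return None
    return match.group(1)


for line in ["Cancel Date: 08/09/19", "Due date: 12/34/12"]:
    print(cancel_date(line))
```

If the pattern is stored in an XML config file, as in the question, it is worth printing the loaded string to confirm that the backslashes survived.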
I am using python 2.7 version
I have a string as below:
flow_id="Livongo_Weekly_Enrollment_2019_03_19"
I get flow_id from a function, and I want to extract Livongo_Weekly_Enrollment from it. I want something like this:
if flow_id contains "YYYY_MM_DD"
clean_flow_id = re.sub(r'\d{4}[_]\d{2}[_]\d{2}', '', flow_id)
but the output of clean_flow_id is 'Livongo_Weekly_Enrollment_'.
I don't want the trailing '_'.
To conclude: first I need to check whether the string contains a date like "YYYY_MM_DD", and if it does, extract the name as shown above. Can anyone please help?
I cannot find a function or library to match flow_id against the date format ("YYYY_MM_DD").
import re
flow_id="Livongo_Weekly_Enrollment_2019_03_19"
if flow_id
Expected:
clean_flow_id = 'Livongo_Weekly_Enrollment'
I expect clean_flow_id to be Livongo_Weekly_Enrollment, but the actual output is Livongo_Weekly_Enrollment_.
Could you please help me find a function to match the date format in flow_id?
Maybe this simple expression returns the desired output:
(.*?)_[0-9]{4}_[0-9]{2}_[0-9]{2}\b
With re.sub you can simply replace it with \1.
Test with re.sub
import re
expression = r"(.*?)_[0-9]{4}_[0-9]{2}_[0-9]{2}\b"
string = """
Livongo_Weekly_Enrollment_2019_03_19
Livongo_Weekly_Enrollment_2019_01_01
Livongo_Weekly_Enrollment_2019_3_1
Livongo_Weekly_Enrollment_2019_03_1
Livongo_Weekly_Enrollment_2019_03_111
"""
print(re.sub(expression, r"\1", string))
Output
Livongo_Weekly_Enrollment
Livongo_Weekly_Enrollment
Livongo_Weekly_Enrollment_2019_3_1
Livongo_Weekly_Enrollment_2019_03_1
Livongo_Weekly_Enrollment_2019_03_111
Test with re.findall
import re
expression = r"(.*?)_[0-9]{4}_[0-9]{2}_[0-9]{2}\b"
string = """
Livongo_Weekly_Enrollment_2019_03_19
Livongo_Weekly_Enrollment_2019_01_01
Livongo_Weekly_Enrollment_2019_3_1
Livongo_Weekly_Enrollment_2019_03_1
Livongo_Weekly_Enrollment_2019_03_111
"""
print(re.findall(expression, string))
Output
['Livongo_Weekly_Enrollment', 'Livongo_Weekly_Enrollment']
If you wish to explore, simplify, or modify the expression, it is explained in the top-right panel of regex101.com. If you'd like, you can also step through how it matches against some sample inputs.
Test if there is a date
import re
expression = r"^(?=.*\d{4}_\d{2}_\d{2}).*$"
string = "Livongo_Weekly_Enrollment_2019_03_19"
if re.search(expression, string):
    print(f"There is at least one date in the {string}")
else:
    print(f"Sorry! {string} has no date.")
Output
There is at least one date in the Livongo_Weekly_Enrollment_2019_03_19
You can use re.search in your if statement:
test = re.search(r'(.*)_(\d{4}_\d{2}_\d{2})', flow_id)
if test:
    your_match = test[1]
test[0] is the whole matched text
test[1] is the first parenthesized group (the name)
test[2] is the second parenthesized group (the date, as a string)
--
Edit
def get_clean_flow_id(filename):
    test = re.search(r'(.*)_\d{4}[_]\d{2}[_]\d{2}', filename)
    if test:
        return test[1]
    else:
        return None
Returns:
>>> get_clean_flow_id("Livongo_Weekly_Enrollment_2019_03_19.csv")
'Livongo_Weekly_Enrollment'
>>> get_clean_flow_id("Omada_weekly_fle_20190319120301.csv")
>>> get_clean_flow_id("tivity_weekly_fle_20190319120301.json")
>>>
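A pattern such as \d{4}_\d{2}_\d{2} also accepts impossible dates like 2019_13_45. Here is a rough sketch that strips the suffix only when it parses as a real date, assuming the date is always the last part of flow_id; the name split_flow_id is hypothetical.

```python
import re
from datetime import datetime

# Name, underscore, then YYYY_MM_DD at the very end of the string.
SUFFIX_RE = re.compile(r"^(.*)_(\d{4}_\d{2}_\d{2})$")


def split_flow_id(flow_id):
    """Return (name, date) if flow_id ends in a valid date, else (flow_id, None)."""
    match = SUFFIX_RE.match(flow_id)
    if match is None:
        return flow_id, None
    name, stamp = match.groups()
    try:
        day = datetime.strptime(stamp, "%Y_%m_%d").date()
    except ValueError:  # digits in the right shape, but not a calendar date
        return flow_id, None
    return name, day


print(split_flow_id("Livongo_Weekly_Enrollment_2019_03_19"))
```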
Using python, I need to extract an ID and Date from this filename:
export-foobar-54321-2015_02_18_23_30_00.csv.gz
Where:
ID = 54321
Date = 2015_02_18
So far, I can match the file name with this regular expression:
export-foobar-[0-9]{5}\-[0-9]{4}_[0-9]{2}_[0-9]{2}_[0-9]{2}_[0-9]{2}_[0-9]{2}.csv.gz
What I'd like as my final printed output is:
ID = 54321
Date = 02-18-2015
Being new to Python, I tried the following, but I'm not sure how to print what I need. I have this so far:
>>> import re
>>> filename='export-generic-33605-2015_02_18_23_30_00.csv.gz'
>>> matches=re.search("export-foobar-[0-9]{5}\-[0-9]{4}_[0-9]{2}_[0-9]{2}_[0-9]{2}_[0-9]{2}_[0-9]{2}.csv.gz",filename)
>>> print(matches)
<_sre.SRE_Match object at 0x7f2ee3616718>
Any help printing out what I need, and then formatting the date as MM-DD-YYYY, would be appreciated.
Use capturing groups, and replace foobar in your regex with generic, or use [^-]+ instead of generic if you don't know the actual value.
>>> import re
>>> filename='export-generic-33605-2015_02_18_23_30_00.csv.gz'
>>> matches=re.search(r"export-generic-([0-9]{5})-([0-9]{4}_[0-9]{2}_[0-9]{2})_[0-9]{2}_[0-9]{2}_[0-9]{2}\.csv\.gz",filename).groups()
>>> Id, Date = matches
>>> Id
'33605'
>>> Date
'2015_02_18'
>>> date = re.sub(r'^([^_]+)_([^_]+)_([^_]+)$', r'\2-\3-\1', Date)
>>> date
'02-18-2015'
You can use the following to capture the digits of interest and rearrange the date. generic in your regexp was changed to \w+ so that any run of word characters matches.
import re
filename = 'export-foobar-54321-2015_02_18_23_30_00.csv.gz'
matches=re.search(r"export-\w+-([0-9]{5})-([0-9]{4})_([0-9]{2})_([0-9]{2})_[0-9]{2}_[0-9]{2}_[0-9]{2}\.csv\.gz",filename).groups()
Id, Year, Month, Day = matches
Date = '-'.join([Month, Day, Year])
print(Id) # 54321
print(Date) # 02-18-2015
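As an alternative to rearranging the pieces by hand, the date can be parsed with strptime and reformatted with strftime. This is a minimal sketch using named groups, assuming the file names always follow the layout from the question; the function name parse_export_name is hypothetical.

```python
import re
from datetime import datetime

# [^-]+ accepts any middle word (foobar, generic, ...); dots are escaped.
FILE_RE = re.compile(
    r"export-[^-]+-(?P<id>\d{5})-(?P<stamp>\d{4}_\d{2}_\d{2})"
    r"_\d{2}_\d{2}_\d{2}\.csv\.gz"
)


def parse_export_name(filename):
    """Return (id, 'MM-DD-YYYY') for a matching file name, else None."""
    match = FILE_RE.fullmatch(filename)
    if match is None:
        return None
    # strptime raises ValueError if the digits are not a real date.
    day = datetime.strptime(match.group("stamp"), "%Y_%m_%d")
    return match.group("id"), day.strftime("%m-%d-%Y")


print(parse_export_name('export-foobar-54321-2015_02_18_23_30_00.csv.gz'))
```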
Hi, I have written a regex to check whether the string contains characters such as - or . or / or : or AM or PM or a space. The following regex works for that, but I want the check to fail if the string contains letters other than A, M, and P.
import re
Datere = re.compile(r"[-./\:?AMP ]+")
FD = {'Date': lambda date: bool(re.search(Datere, date))}
def Validate(date):
    for k, v in date.iteritems():
        print k, v
        print FD.get(k)(v)
Output:
Validate({'Date':'12/12/2010'})
Date 12/12/2010
True
Validate({'Date':'12/12/2010 12:30 AM'})
Date 12/12/2010 12:30 AM
True
Validate({'Date':'12/12/2010 ZZ'})
Date 12/12/2010 ZZ
True (Expecting False)
Edited:
Validate({'Date':'12122010'})
Date 12122010
False (Expecting False)
How can I detect that the string contains letters other than A, P, and M? Any suggestions? Thanks a lot.
Give this a try:
^[-./\:?AMP \d]*$
The changes to your regex are:
It's anchored with ^ and $, which means that the whole line must match, not just part of it.
The \d is added to the character class to allow digits.
Now the regex basically reads as a list of the symbols that are allowed on one line.
If you want the empty string not to match, change the * to a +.
You could use an expression like this instead:
^[-0-9./:AMP ]+$
^ and $ anchor the expression at the beginning and end of string, making sure there is nothing else in it (except an optional new line after $).
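The same whitelist check can be sketched with re.fullmatch, which requires the entire string to match and so needs no explicit anchors. The function name only_allowed_chars is hypothetical.

```python
import re

# Digits, the separators - . / : and space, and the letters A, M, P.
ALLOWED = re.compile(r"[-0-9./:AMP ]+")


def only_allowed_chars(text):
    """True if text is non-empty and made up solely of allowed characters."""
    return ALLOWED.fullmatch(text) is not None


for sample in ['12/12/2010', '12/12/2010 12:30 AM', '12/12/2010 ZZ']:
    print(sample, only_allowed_chars(sample))
```

Unlike $, fullmatch does not tolerate a trailing newline. Note that a digits-only string such as '12122010' still passes, since this checks characters rather than date structure.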
The way you approach this is too naive to deal with garbled input like '-30/A-MP/2012/12', '-30/A-MP/20PA12/12'.
If you want to validate your dates robustly, how about:
import datetime
date = '12-12-2012 10:45 AM'
formats = ("%d-%m-%Y %I:%M %p", "%d/%m/%Y %I:%M %p")  # ... plus any other formats you accept
for fmt in formats:
    try:
        valid_date = datetime.datetime.strptime(date, fmt)
        break  # stop at the first format that parses
    except ValueError as e:
        print(e)
You would have to define all possible formats, but you will get full datetime objects (or time or date objects; they work similarly), and you can be sure they are valid. For a full explanation of the available format specifiers, see http://docs.python.org/library/time.html#time.strftime
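The try-each-format idea can be wrapped in a small helper that returns the first successful parse. This is a sketch only: the name parse_date is hypothetical, and the third format in FORMATS is an assumption added to cover the date-only inputs from the question.

```python
from datetime import datetime

# Accepted layouts; the last entry (date only) is an assumed addition.
FORMATS = ("%d-%m-%Y %I:%M %p", "%d/%m/%Y %I:%M %p", "%d/%m/%Y")


def parse_date(text, formats=FORMATS):
    """Return a datetime for the first format that fits, else None."""
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # try the next format
    return None


print(parse_date('12-12-2012 10:45 AM'))  # 2012-12-12 10:45:00
print(parse_date('12/12/2010 ZZ'))        # None
```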
Kind of elaborate, but does the trick.
import re
Datere = re.compile(r"""
    ^(?:\d\d[-./\:]){2}                 ## dd_SEP_dd_SEP
    \d{4}\s*                            ## year, may be followed by spaces
    (?:\d\d[-./\:]\d\d\s+(?:AM|PM))?    ## optional hh_SEP_mm, spaces, then AM/PM
    \s*$""", re.X)
FD = {'Date': lambda date: bool(re.search(Datere, date))}
def Validate(date):
    for k, v in date.iteritems():
        print k, v
        print FD.get(k)(v)
Validate({'Date': '12/12/2010'})
Validate({'Date': '12/12/2010 12:30 AM'})
Validate({'Date': '12/12/2010 ZZ'})