I am trying to extract a date from text using a regular expression [duplicate] - python

This question already has answers here:
How do I parse an ISO 8601-formatted date?
(29 answers)
Closed 4 years ago.
I am trying to extract the date from the text '2025-03-21T12:54:41Z' using a Python regular expression.
date=re.match('(\d{4})[/.-](\d{2})[/.-](\d{2})$', date[0])
print(date)
This prints None.
I also tried this code:
date_reg_exp = re.compile('\d{4}(?P<sep>[-/])\d{2}(?P=sep)\d{2}')
matches_list=date_reg_exp.findall(date[0])
for match in matches_list:
    print(match)
This prints only -.
Please help.

Your regular expression is wrong because it has a $ at the end. $ asserts that this is the end of the string.
The regex engine matches your string with the regex and after matching the last two digits, expects a $ - end of the string. However, your string still has T12:54:41Z before the end, so the regex does not match.
To fix this, remove $:
>>> re.match('(\d{4})[/.-](\d{2})[/.-](\d{2})', '2025-03-21T12:54:41Z')
<_sre.SRE_Match object; span=(0, 10), match='2025-03-21'>
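
As for the second attempt printing only -: when a pattern contains exactly one capturing group, re.findall returns the text of that group rather than the whole match, and here the only group is the named separator. A minimal sketch, assuming the input is the string from the question (the names text, date_re, groups_only and full_matches are illustrative, not from the original post):

```python
import re

text = '2025-03-21T12:54:41Z'
date_re = re.compile(r'\d{4}(?P<sep>[-/])\d{2}(?P=sep)\d{2}')

# With a single capturing group, findall returns only that group's text.
groups_only = date_re.findall(text)

# finditer yields match objects; group(0) is the whole matched text.
full_matches = [m.group(0) for m in date_re.finditer(text)]

print(groups_only)    # ['-']
print(full_matches)   # ['2025-03-21']
```

Using finditer keeps the backreference that forces both separators to be the same, while still giving access to the full date.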

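Instead of anchoring the end of your regexp with $, which only matches at the end of the string (or just before a trailing newline), try anchoring the beginning with ^. Note that re.match already starts at the beginning of the string, so the ^ is optional here: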
import re
date='2025-03-21T12:54:41Z'
date=re.match('^(\d{4})[/.-](\d{2})[/.-](\d{2})', date)
print(date)
Output in Python 3:
<_sre.SRE_Match object; span=(0, 10), match='2025-03-21'>
Output in Python 2:
<_sre.SRE_Match object at 0x7fd191ac1ae0>
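
If the input is always a timestamp of exactly this shape, the standard library can parse it without a regular expression. The sketch below is one possible alternative, not what either answer proposes; the format string assumes the layout shown in the question, including the literal trailing Z, and the variable names are illustrative:

```python
from datetime import datetime

stamp = '2025-03-21T12:54:41Z'

# strptime raises ValueError if the text does not fit the format exactly.
parsed = datetime.strptime(stamp, '%Y-%m-%dT%H:%M:%SZ')

# Keep only the calendar date and render it back as YYYY-MM-DD.
date_text = parsed.date().isoformat()
print(date_text)  # 2025-03-21
```

Unlike the regex, this also rejects impossible dates such as month 13.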

Related

\b boundary regex not working as expected [duplicate]

This question already has answers here:
Python regular expression not matching
(3 answers)
Closed 1 year ago.
I was trying to use the \b regex to match whole words but I couldn't get it to work.
match = re.match(r'\bcat\b', 'the cat is sleeping')
print(match) # prints None
With this piece of code, I was expecting to get a match on cat, but it returns None. I tried running the code on my local machine, and also on an online python shell.
re.match only matches at the beginning of the string. Since cat is not at the start of your string, there is no match.
Use re.search in this case, which scans the whole string for the first match:
>>> re.search(r'\bcat\b', 'the cat is sleeping')
<_sre.SRE_Match object; span=(4, 7), match='cat'>
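
A small side-by-side sketch of the difference, using the string from the question (the variable names and the extra 'concatenate' input are illustrative additions):

```python
import re

text = 'the cat is sleeping'
pattern = re.compile(r'\bcat\b')

# match() only tries position 0, where the text is 'the', so it fails.
at_start = pattern.match(text)

# search() scans forward and finds the word at index 4.
anywhere = pattern.search(text)

# \b still does its job: 'cat' inside a longer word is not a whole word.
inside_word = pattern.search('concatenate')

print(at_start)         # None
print(anywhere.span())  # (4, 7)
print(inside_word)      # None
```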

Why the positive lookahead not working with /$ in the end [duplicate]

This question already has answers here:
Why the positive lookahead not working for this regex [closed]
(3 answers)
Closed 3 years ago.
I want to match issues/ or settings/general/, but in the second case /general should not be included in the match. I tried using a positive lookahead for the second case, but it does not seem to work. This is what I came up with:
^(issues|settings(?=/general))/$
It's because the /general part of the string is not consumed.
After having checked that settings is correctly followed by /general, the cursor is still at the end of settings, so the matching will continue from this point on.
So the slash is correctly matched, but not the end of line.
As suggested by Wiktor, you'd be better off using groups if you want to extract a part of the string.
Here's a proposition:
^(issues|settings)/general/$
Trying it out:
>>> result = re.match("^(issues|settings)/general/$", "issues/general/")
>>> result
<re.Match object; span=(0, 15), match='issues/general/'>
>>> result.group(1)
'issues'
If you really want to avoid groups though, you can also include /$ inside the lookahead, and so the regex becomes ^(issues|settings(?=/general/$)):
>>> re.match("^(issues|settings(?=/general/$))", "issues/general/")
<re.Match object; span=(0, 6), match='issues'>
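
To see the "lookahead does not consume" behaviour directly, here is a short sketch comparing the pattern from the question with the variant that moves /$ inside the lookahead. The test strings and variable names are illustrative:

```python
import re

original = re.compile(r'^(issues|settings(?=/general))/$')
anchored_lookahead = re.compile(r'^(issues|settings(?=/general/$))')

# 'issues/' works with the original pattern: no lookahead is involved.
r1 = original.match('issues/')

# After 'settings' the lookahead succeeds, '/' is consumed, but then $
# is tested in front of 'general/', so the overall match fails.
r2 = original.match('settings/general/')

# With the end anchor inside the lookahead, only 'settings' is consumed.
r3 = anchored_lookahead.match('settings/general/')

# A different suffix fails the lookahead, so nothing matches.
r4 = anchored_lookahead.match('settings/other/')

print(r1.group(1), r2, r3.group(0), r4)
```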

Searching for an exact match that contains brackets [duplicate]

This question already has an answer here:
Python re - escape coincidental parentheses in regex pattern
(1 answer)
Closed 5 years ago.
I am reading in lines from a file, each of which is formatted like this:
array_name[0]
array_name[1]
How can I do an exact match on this string in python? I've tried this:
if re.match(line, "array_name[0]"):
but it seems to match all the time, without taking the bracketed parts ([0], [1], etc.) into account.
re.escape from the re module is a useful tool for automatically escaping characters that the regex engine considers special. From the docs:
re.escape(pattern)
Escape all the characters in pattern except ASCII
letters and numbers. This is useful if you want to match an arbitrary
literal string that may have regular expression metacharacters in it.
In [1]: re.escape("array_name[0]")
Out[1]: 'array_name\\[0\\]'
Also, you've reversed the order of your arguments. You'll need your pattern to come first, followed by the text you want to match:
re.match(re.escape("array_name[0]"), line)
Example:
In [2]: re.match(re.escape("array_name[0]"), 'array_name[0] in a line')
Out[2]: <_sre.SRE_Match object; span=(0, 13), match='array_name[0]'>
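
Note that re.match only anchors the start, so the example above also accepts lines with trailing text. If the whole line must equal the target, re.fullmatch is one option. The sketch below, with made-up sample lines, also shows what the unescaped pattern really means: [0] is a character class containing the single character 0.

```python
import re

target = 'array_name[0]'
lines = ['array_name[0]', 'array_name[1]', 'array_name0',
         'array_name[0] in a line']

# Unescaped, '[0]' is a character class, so the pattern means 'array_name0'.
unescaped = [s for s in lines if re.fullmatch(target, s)]

# Escaped, the brackets are literal and the whole line must match.
escaped = [s for s in lines if re.fullmatch(re.escape(target), s)]

# For a fixed literal, plain string equality gives the same result.
plain = [s for s in lines if s == target]

print(unescaped)  # ['array_name0']
print(escaped)    # ['array_name[0]']
print(plain)      # ['array_name[0]']
```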

Python Regular Expression newline is matched [duplicate]

This question already has answers here:
REGEX - Differences between `^`, `$` and `\A`, `\Z`
(1 answer)
Checking whole string with a regex
(5 answers)
Closed 5 years ago.
I want to match a string that contains alphanumerics and some special characters, but not a newline. However, whenever my string ends with a newline, the pattern still matches. I checked the documentation for flags but none of them looked relevant.
The following is a sample code in Python 3.6.2 REPL
>>> import re
>>> s = "do_not_match\n"
>>> p = re.compile(r"^[a-zA-Z\+\-\/\*\%\_\>\<=]*$")
>>> p.match(s)
<_sre.SRE_Match object; span=(0, 12), match='do_not_match'>
The expected result is that it shouldn't match as I have newline at the end.
https://regex101.com/r/qyRw5s/1
I am a bit confused on what I am missing here.
The problem is that $ matches at the end of the string before the newline (if any).
If you don't want to match the newline at the end, use \Z instead of $ in your regex.
See the re module's documentation:
'$'
Matches the end of the string or just before the newline at the end of the string,
\Z
Matches only at the end of the string.
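
A quick check of the three behaviours, as a sketch: the character class below is a simplified equivalent of the one in the question (fewer backslashes, same characters), and re.fullmatch is shown as a further option beyond what the answer mentions.

```python
import re

s = 'do_not_match\n'
charset = r'[a-zA-Z+\-/*%_><=]*'

# $ also matches just before a trailing newline, so this succeeds.
with_dollar = re.match('^' + charset + '$', s)

# \Z matches only at the very end of the string, so this fails.
with_Z = re.match('^' + charset + r'\Z', s)

# fullmatch requires the pattern to cover the entire string.
with_fullmatch = re.fullmatch(charset, s)

print(with_dollar.span())  # (0, 12)
print(with_Z)              # None
print(with_fullmatch)      # None
```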

How to use regex with optional characters in python?

Say I have a string
"3434.35353"
and another string
"3593"
How do I make a single regular expression that is able to match both without me having to set the pattern to something else if the other fails? I know \d+ would match the 3593, but it would not do anything for the 3434.35353, but (\d+\.\d+) would only match the one with the decimal and return no matches found for the 3593.
I expect m.group(1) to return:
"3434.35353"
or
"3593"
You can put a ? after a group of characters to make it optional.
You want a dot followed by any number of digits \.\d+, grouped together (\.\d+), optionally (\.\d+)?. Stick that in your pattern:
import re
print(re.match(r"(\d+(\.\d+)?)", "3434.35353").group(1))
3434.35353
print(re.match(r"(\d+(\.\d+)?)", "3434").group(1))
3434
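
The same idea can be written with a non-capturing group, so that group(1) is the only group and always holds the whole number. This is a small variation on the answer above, not a different technique; number_re and results are illustrative names:

```python
import re

# (?:...) groups the optional decimal part without creating group 2.
number_re = re.compile(r'(\d+(?:\.\d+)?)')

results = [number_re.match(s).group(1) for s in ('3434.35353', '3593')]
print(results)  # ['3434.35353', '3593']
```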
This regex should work:
\d+(\.\d+)?
It matches one or more digits (\d+) optionally followed by a dot and one or more digits ((\.\d+)?).
Use the "one or zero" quantifier, ?. Your regex becomes: (\d+(\.\d+)?).
See Chapter 8 of the TextWrangler manual for more details about the different quantifiers available, and how to use them.
Use (?:<characters>|), replacing <characters> with the string you want to make optional. I tested this in the Python shell and got the following result:
>>> s = re.compile('python(?:3|)')
>>> s
re.compile('python(?:3|)')
>>> re.match(s, 'python')
<re.Match object; span=(0, 6), match='python'>
>>> re.match(s, 'python3')
<re.Match object; span=(0, 7), match='python3'>
Read up on the Python RegEx library. The link answers your question and explains why.
However, to match one or more digits with an optional decimal part, you can use
re.compile(r"(\d+(\.\d+)?)")
In this example, the ? after the (\.\d+) capture group specifies that this portion is optional.
Example
