I want to match issues/ or settings/general/, but in the second case /general should not be included in the match. I tried a positive lookahead for the second case, but it does not seem to work. This is what I came up with:
^(issues|settings(?=/general))/$
It's because the /general part of the string is not consumed.
After having checked that settings is correctly followed by /general, the cursor is still at the end of settings, so the matching will continue from this point on.
So the slash is correctly matched, but not the end of line.
As suggested by Wiktor, you'd be better off using groups if you want to extract a part of the string.
Here's a proposition:
^(issues|settings)/general/$
Trying it out:
>>> result = re.match("^(issues|settings)/general/$", "issues/general/")
>>> result
<re.Match object; span=(0, 15), match='issues/general/'>
>>> result.group(1)
'issues'
If you really want to avoid groups though, you can also include /$ inside the lookahead, and so the regex becomes ^(issues|settings(?=/general/$)):
>>> re.match("^(issues|settings(?=/general/$))", "issues/general/")
<re.Match object; span=(0, 6), match='issues'>
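The difference between the two patterns can be checked side by side. This is only a minimal sketch: the names `broken` and `fixed` are illustrative, and the sample paths are the ones mentioned in the question.

```python
import re

# Pattern from the question: the lookahead checks "/general" but consumes nothing,
# so "/$" must then match immediately after "settings".
broken = re.compile(r"^(issues|settings(?=/general))/$")

# Variant from the answer: the rest of the string is verified inside the lookahead.
fixed = re.compile(r"^(issues|settings(?=/general/$))")

print(broken.match("issues/"))            # matches: "/" and end of string follow "issues"
print(broken.match("settings/general/"))  # None: "general/" is left over after the "/"
print(fixed.match("settings/general/").group(1))  # settings
```

Note that `fixed` does not anchor the `issues` branch at the end, so it also accepts any string that merely starts with `issues`.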
I was trying to use the \b regex to match whole words but I couldn't get it to work.
match = re.match(r'\bcat\b', 'the cat is sleeping')
print(match) # prints None
With this piece of code, I was expecting to get a match on cat, but it returns None. I tried running the code on my local machine, and also on an online python shell.
re.match only matches at the beginning of the string. Since cat is not at the start of the string, there is no match.
You need to use re.search in this case.
re.search(r'\bcat\b', 'the cat is sleeping')
<_sre.SRE_Match object; span=(4, 7), match='cat'>
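A short side-by-side run makes the difference concrete. This is a sketch using the sentence from the question; the variable names and the extra "concatenate" input are only illustrative.

```python
import re

text = "the cat is sleeping"
pattern = r"\bcat\b"

# re.match only tries the pattern at index 0, where the text has "the".
print(re.match(pattern, text))          # None

# re.search scans forward until the pattern matches somewhere.
found = re.search(pattern, text)
print(found.span(), found.group())      # (4, 7) cat

# \b still enforces whole words: "concatenate" contains "cat" but not as a word.
print(re.search(pattern, "concatenate"))  # None
```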
I am trying to extract the date from this '2025-03-21T12:54:41Z' text using python regular expression.
date=re.match('(\d{4})[/.-](\d{2})[/.-](\d{2})$', date[0])
print(date)
This gives None as output.
I also tried this code:
date_reg_exp = re.compile('\d{4}(?P<sep>[-/])\d{2}(?P=sep)\d{2}')
matches_list=date_reg_exp.findall(date[0])
for match in matches_list:
    print match
This gives output as - only
Please help
Your regular expression is wrong because it has a $ at the end. $ asserts that this is the end of the string.
The regex engine matches your string with the regex and after matching the last two digits, expects a $ - end of the string. However, your string still has T12:54:41Z before the end, so the regex does not match.
To fix this, remove $:
>>> re.match('(\d{4})[/.-](\d{2})[/.-](\d{2})', '2025-03-21T12:54:41Z')
<_sre.SRE_Match object; span=(0, 10), match='2025-03-21'>
Instead of the $ anchor at the end of your regexp, which asserts the end of the string, try using ^ at the beginning (re.match already anchors at the start, so the ^ is optional):
import re
date='2025-03-21T12:54:41Z'
date=re.match('^(\d{4})[/.-](\d{2})[/.-](\d{2})', date)
print(date)
Output in python3:
<_sre.SRE_Match object; span=(0, 10), match='2025-03-21'>
Python2:
<_sre.SRE_Match object at 0x7fd191ac1ae0>
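Both attempts from the question can be reproduced in one place. The sketch below assumes the input is the plain string shown in the question (the name `stamp` is illustrative), and it also shows why the second attempt printed only the separator: when a pattern contains exactly one group, `findall` returns that group's text rather than the whole match.

```python
import re

stamp = "2025-03-21T12:54:41Z"

# No trailing "$": the time portion after the date no longer blocks the match.
m = re.match(r"(\d{4})[/.-](\d{2})[/.-](\d{2})", stamp)
print(m.group(0))   # 2025-03-21
print(m.groups())   # ('2025', '03', '21')

# With exactly one group in the pattern, findall returns that group's text,
# which is why the second attempt in the question printed only "-".
sep_pattern = re.compile(r"\d{4}(?P<sep>[-/])\d{2}(?P=sep)\d{2}")
print(sep_pattern.findall(stamp))                           # ['-']
print([mo.group(0) for mo in sep_pattern.finditer(stamp)])  # ['2025-03-21']
```

Using `finditer` and `group(0)` keeps the named separator group while still returning the full date.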
I'm new to Python regex and am learning the lookbehind assertion.
I found the following strange. Could someone tell me how it works?
import regex as re
re.search('(\d*)(?<=a)(\.)','1a.')
<regex.Match object; span=(2, 3), match='.'>
re.search('(\d+)(?<=a)(\.)','1a.')
outputs nothing
Why doesn't the second one match anything?
The first pattern:
re.search('(\d*)(?<=a)(\.)', '1a.')
says to find zero or more digits, followed by a dot. Right before the dot, it has a positive lookbehind, which asserts the previous character was an a. In this case, Python will match zero digits, followed by a single dot. The lookbehind fires true, because the preceding character was in fact an a.
However, the second pattern:
re.search('(\d+)(?<=a)(\.)','1a.')
matches one or more digits, followed by the lookbehind and the matching dot. In this case, Python is compelled to match the number 1. But then the lookbehind must fail: if the last character matched was a digit, it cannot be the letter a. So there is no match possible in the second case. Even if we were to remove (?<=a) from the second pattern, it would still fail because we are not accounting for the letter a.
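The same behaviour can be reproduced with the standard-library `re` module instead of the third-party `regex` package used in the question; that swap is an assumption of this sketch, as is the third pattern, which is just one possible way to account for the letter a.

```python
import re  # the standard-library module; fixed-width lookbehind works here too

text = "1a."

# \d* may match zero digits, so the match can start right at the dot (index 2),
# where the preceding character is "a".
star = re.search(r"(\d*)(?<=a)(\.)", text)
print(star.span(), star.groups())   # (2, 3) ('', '.')

# \d+ must consume "1"; the lookbehind then sees "1", not "a", and fails.
print(re.search(r"(\d+)(?<=a)(\.)", text))   # None

# Consuming the "a" explicitly lets the digits, the letter and the dot line up.
plus = re.search(r"(\d+)a(\.)", text)
print(plus.groups())   # ('1', '.')
```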
I am reading in lines from a file, each of which is formatted like this:
array_name[0]
array_name[1]
How can I do an exact match on this string in python? I've tried this:
if re.match(line, "array_name[0]")
but it seems to match all the time without taking the parts in bracket ([0], [1], etc.) into account
re.escape from the re module is a useful tool for automatically escaping characters that the regex engine considers special. From the docs:
re.escape(pattern)
Escape all the characters in pattern except ASCII
letters and numbers. This is useful if you want to match an arbitrary
literal string that may have regular expression metacharacters in it.
In [1]: re.escape("array_name[0]")
Out[1]: 'array_name\\[0\\]'
Also, you've reversed the order of your arguments. You'll need your pattern to come first, followed by the text you want to match:
re.match(re.escape("array_name[0]"), line)
Example:
In [2]: re.match(re.escape("array_name[0]"), 'array_name[0] in a line')
Out[2]: <_sre.SRE_Match object; span=(0, 13), match='array_name[0]'>
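As the example shows, `re.match` with an escaped pattern still accepts extra text after the match. If the whole line has to be exactly the literal, one option is `re.fullmatch`; the sketch below assumes that is what "exact match" means here, and the names `target` and `exact` are illustrative.

```python
import re

target = "array_name[0]"

# Unescaped, "[0]" is a character class that matches the single character "0".
print(re.match(target, "array_name0"))      # matches
print(re.match(target, "array_name[0]"))    # None

# Escaped and anchored with fullmatch, only the exact text is accepted.
exact = re.compile(re.escape(target))
for line in ["array_name[0]", "array_name[1]", "array_name[0] in a line"]:
    print(line, bool(exact.fullmatch(line)))
```

Lines read from a file usually end with a newline, so they would need to be stripped before `fullmatch`. For a purely literal comparison, `line.strip() == target` does the same job without a regex.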
Say I have a string
"3434.35353"
and another string
"3593"
How do I make a single regular expression that is able to match both without me having to set the pattern to something else if the other fails? I know \d+ would match the 3593, but it would not do anything for the 3434.35353, but (\d+\.\d+) would only match the one with the decimal and return no matches found for the 3593.
I expect m.group(1) to return:
"3434.35353"
or
"3593"
You can put a ? after a group of characters to make it optional.
You want a dot followed by any number of digits \.\d+, grouped together (\.\d+), optionally (\.\d+)?. Stick that in your pattern:
import re
print(re.match(r"(\d+(\.\d+)?)", "3434.35353").group(1))
3434.35353
print(re.match(r"(\d+(\.\d+)?)", "3434").group(1))
3434
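A variation on the same idea, sketched for Python 3: making the inner group non-capturing with `(?:...)` leaves a single capture group, so `group(1)` is always the full number. The name `number` and the "12." input are illustrative.

```python
import re

# (?:...) groups without capturing, so group(1) is always the whole number.
number = re.compile(r"(\d+(?:\.\d+)?)")

for text in ["3434.35353", "3593"]:
    print(number.match(text).group(1))

# Without digits after the dot, the optional part is skipped.
print(number.match("12.").group(1))   # 12
```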
This regex should work:
\d+(\.\d+)?
It matches one or more digits (\d+) optionally followed by a dot and one or more digits ((\.\d+)?).
Use the "one or zero" quantifier, ?. Your regex becomes: (\d+(\.\d+)?).
See Chapter 8 of the TextWrangler manual for more details about the different quantifiers available, and how to use them.
Use (?:<characters>|), replacing <characters> with the string to make optional. I tested this in the Python shell and got the following result:
>>> s = re.compile('python(?:3|)')
>>> s
re.compile('python(?:3|)')
>>> re.match(s, 'python')
<re.Match object; span=(0, 6), match='python'>
>>> re.match(s, 'python3')
<re.Match object; span=(0, 7), match='python3'>
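For a single optional piece, the empty alternative and the ? quantifier behave the same way. The comparison below is a sketch with illustrative names and one extra input ("python2") that is not from the answer.

```python
import re

alternation = re.compile(r"python(?:3|)")
optional = re.compile(r"python3?")

for text in ["python", "python3", "python2"]:
    a = alternation.match(text)
    b = optional.match(text)
    # Both spellings stop at the same place for every input.
    print(text, a.group(0), b.group(0))
```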
Read up on the Python RegEx library. The link answers your question and explains why.
However, to match a digit followed by more digits with an optional decimal, you can use
re.compile("(\d+(\.\d+)?)")
In this example, the ? after the (\.\d+) capture group specifies that this portion is optional.
Example