How to match a full string, instead of partial string? [duplicate] - python

This question already has answers here:
Order of regular expression operator (..|.. ... ..|..)
(1 answer)
Checking whole string with a regex
(5 answers)
Closed 2 years ago.
pattern = (1|2|3|4|5|6|7|8|9|10|11|12)
str = '11'
This only matches '1', not '11'. How to match the full '11'? I changed it to:
pattern = (?:1|2|3|4|5|6|7|8|9|10|11|12)
It is the same.
I am testing here first:
https://regex101.com/

It is matching 1 instead of 11 because you have 1 before 11 in your alternation. If you use re.findall then it will match 1 twice for input string 11.
However to match numbers from 1 to 12 you can avoid alternation and use:
\b[1-9]|1[0-2]?\b
It is safer to use word boundary to avoid matching within word digits.
RegEx Demo

Regex always matches left before right.
On an alternation you'd put the longest first.
However, factoring should take precedense.
(1|2|3|4|5|6|7|8|9|10|11|12)
then it turns into
1[012]?|[2-9]
https://regex101.com/r/qmlKr0/1
I purposely didn't add boundary parts as
everybody has their own preference.

do you mean this solution?
[\d]+

Related

Python Regex: alternation gives empty matches [duplicate]

This question already has answers here:
Why do some regex engines match .* twice in a single input string?
(1 answer)
Reference - What does this regex mean?
(1 answer)
Closed 2 years ago.
I was doing some regex which simplifies to this code:
>>> import re
>>> re.sub(r'^.*$|', "xyz", "abc")
xyzxyz
I was expecting it to replace abc with xyz as the RE ^.*$ matches the whole string, the engine should just return that and exit. So I ran the same regex with re.findall().
>>> re.findall(r'^.*$|', 'abcd')
['abcd', '']
in the docs it says:
A|B, where A and B can be arbitrary REs. As the target string is scanned, REs separated by '|'
are tried from left to right. When one pattern completely matches,
that branch is accepted. This means that once A matches, B will not be
tested further, even if it would produce a longer overall match.
but than why is the regex matching an empty string?

How to find values in specific format including parenthesis using regular expression in python [duplicate]

This question already has answers here:
Regular expression to return text between parenthesis
(11 answers)
Closed 2 years ago.
I have long string S, and I want to find value (numeric) in the following format "Value(**)", where ** is values I want to extract.
For example, S is "abcdef Value(34) Value(56) Value(13)", then I want to extract values 34, 56, 13 from S.
I tried to use regex as follows.
import re
regex = re.compile('\Value(.*'))
re.findall(regex, S)
But the code yields the result I did not expect.
Edit. I edited some mistakes.
You should escape the parentheses, correct the typo of Value (as opposed to Values), use a lazy repeater *? instead of *, add the missing right parenthesis, and capture what's enclosed in the escaped parentheses with a pair of parentheses:
regex = re.compile(r'Value\((.*?)\)')
Only one of your numbers follows the word 'Value', so you can extract anything inside parentheses. You also need to escape the parentheses which are special characters.
regex = re.compile('\(.*?\)')
re.findall(regex, S)
Output:
['(34)', '(56)', '(13)']
I think what you're looking for is a capturing group that can return multiple matches. This string is: (\(\d{2}\))?. \d matches an digit and {2} matches exactly 2 digits. {1,2} will match 1 or 2 digits ect. ? matches 0 to unlimited number of times. Each group can be indexed and combined into a list. This is a simple implementation and will return the numbers within parentheses.
eg. 'asdasd Value(102222), fgdf(20), he(77)' will match 20 and 77 but not 102222.

Regex: How to find substring that does NOT contain a certain word [duplicate]

This question already has answers here:
Regular expressions: Ensuring b doesn't come between a and c
(4 answers)
Closed 3 years ago.
I have this string;
string = "STARTcandyFINISH STARTsugarFINISH STARTpoisonFINISH STARTBlobpoisonFINISH STARTpoisonBlobFINISH"
I would like to match and capture all substrings that appear in between START and FINISH but only if the word "poison" does NOT appear in that substring. How do I exclude this word and capture only the desired substrings?
re.findall(r'START(.*?)FINISH', string)
Desired captured groups:
candy
sugar
Using a tempered dot, we can try:
string = "STARTcandyFINISH STARTsugarFINISH STARTpoisonFINISH STARTBlobpoisonFINISH STARTpoisonBlobFINISH"
matches = re.findall(r'START((?:(?!poison).)*?)FINISH', string)
print(matches)
This prints:
['candy', 'sugar']
For an explanation of how the regex pattern works, we can have a closer look at:
(?:(?!poison).)*?
This uses a tempered dot trick. It will match, one character at a time, so long as what follows is not poison.

regular expression findall errors [duplicate]

This question already has answers here:
My regex is matching too much. How do I make it stop? [duplicate]
(5 answers)
Closed 4 years ago.
I run the following script
a = r'[abc] [abc] [y78]'
paaa = re.compile(r'\[ab.*]')
paaa.findall(a)
I obtained
['[abc] [abc] [y78]']
Why the '[abc]' is missing? The '[abc]' clearly matches the pattern as well. Is there any bug in the python3 re.findall function?
Clarification:
Sorry the paaa should be paaa = re.compile(r'\[ab.*\]')
What I am looking for is something which will return
['[abc]', '[abc]', '[abc] [abc]', '[abc] [abc] [y78]']
Basically, any substring matches the pattern.
The repeated . in [ab.*] is greedy - it'll match as many characters as it can such that those characters are followed by a ]. So, everything in between the first [ and the last ] are matched.
Use lazy repetition instead, with .*?:
a = r'[abc] [abc] [y78]'
paaa = re.compile(r'\[ab.*?]')
print(paaa.findall(a))
['[abc]', '[abc]']
You should escape the right square bracket as well, and use non-greedy repeater *? in your regex:
import re
a = r'[abc] [abc] [y78]'
paaa = re.compile(r'\[ab.*?\]')
print(paaa.findall(a))
This outputs:
['[abc]', '[abc]']

Match regex multiple times [duplicate]

This question already has answers here:
How to find overlapping matches with a regexp?
(4 answers)
Closed 9 years ago.
Is it possible to construct a regex that matches a pattern multiple times?
For example searching for ff in fff would give two matches. Their starting position would be 0 and 1 respectively.
Yes, it is possible. You can use positive lookahead for this.
>>> import re
>>> [m.start() for m in re.finditer(r'f(?=f)', 'fff')]
[0, 1]
Yes. Use findall(string[, pos[, endpos]])
Similar to the findall() function, using the compiled pattern, but
also accepts optional pos and endpos parameters that limit the search
region like for match().
i.e. Each time you will begin search from the m.start() of the previous match + 1.

Categories

Resources