Match regex multiple times [duplicate] - python

This question already has answers here:
How to find overlapping matches with a regexp?
(4 answers)
Closed 9 years ago.
Is it possible to construct a regex that matches a pattern multiple times, including overlapping occurrences?
For example, searching for ff in fff would give two matches, starting at positions 0 and 1 respectively.

Yes, it is possible. You can use positive lookahead for this.
>>> import re
>>> [m.start() for m in re.finditer(r'f(?=f)', 'fff')]
[0, 1]
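The lookahead above is tailored to ff. Here is a minimal sketch of the same idea for an arbitrary pattern, assuming the pattern contains no inline global flags. Note that overlapping_starts is a made-up helper name for illustration, not part of the re module:

```python
import re

def overlapping_starts(pattern, text):
    """Return the start index of every (possibly overlapping) match.

    Hypothetical helper, shown only to illustrate the lookahead trick.
    """
    # Wrapping the whole pattern in a lookahead makes each match zero-width,
    # so the scan advances one character at a time instead of skipping past
    # the text that was just matched.
    lookahead = re.compile(f'(?=(?:{pattern}))')
    return [m.start() for m in lookahead.finditer(text)]

print(overlapping_starts('ff', 'fff'))      # [0, 1]
print(overlapping_starts('faf', 'fafaf'))   # [0, 2]
```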

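Yes. Use the compiled pattern's findall(string[, pos[, endpos]]) method. From the docs:
Similar to the findall() function, using the compiled pattern, but
also accepts optional pos and endpos parameters that limit the search
region like for match().
That is, each new search begins at m.start() + 1 of the previous match. Since findall() returns strings rather than match objects, the pattern's search() method, which accepts the same pos and endpos arguments, is the one that provides m.start().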
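A rough sketch of that restart-from-start-plus-one loop, using the compiled pattern's search() with its pos argument. The function name find_overlapping is hypothetical, chosen only for this example:

```python
import re

def find_overlapping(pattern, text):
    """Collect (start, matched_text) pairs, allowing matches to overlap.

    Hypothetical helper; it restarts the search one character after the
    start of the previous match.
    """
    compiled = re.compile(pattern)
    results = []
    pos = 0
    while pos <= len(text):
        m = compiled.search(text, pos)
        if m is None:
            break
        results.append((m.start(), m.group()))
        # Resume just after the START of this match, not after its end,
        # so that an overlapping occurrence is not skipped.
        pos = m.start() + 1
    return results

print(find_overlapping('ff', 'fff'))      # [(0, 'ff'), (1, 'ff')]
print(find_overlapping('faf', 'fafaf'))   # [(0, 'faf'), (2, 'faf')]
```

Unlike the lookahead version, this keeps the pattern itself unchanged, at the cost of an explicit loop.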

Related

Is there any way using regular expression to capture this group. result should be 2 [duplicate]

This question already has answers here:
How to use regex to find all overlapping matches
(5 answers)
Closed 2 years ago.
import re
pattern = r'faf'
string = 'fafaf'
print(len(re.findall(pattern, string)))
This gives 1, but the required answer is 2.
You want to use a positive lookahead r"(?=(<pattern>))" to find overlapping patterns:
import re
pattern = r"(?=(faf))"
string = "fafaf"
print(len(re.findall(pattern, string)))
You can test regexes here: https://regex101.com/r/3FxCok/1

Python Regex: alternation gives empty matches [duplicate]

This question already has answers here:
Why do some regex engines match .* twice in a single input string?
(1 answer)
Reference - What does this regex mean?
(1 answer)
Closed 2 years ago.
I was doing some regex which simplifies to this code:
>>> import re
>>> re.sub(r'^.*$|', "xyz", "abc")
'xyzxyz'
I expected it to replace abc with xyz: since the RE ^.*$ matches the whole string, the engine should return that match and stop. So I ran the same regex with re.findall().
>>> re.findall(r'^.*$|', 'abcd')
['abcd', '']
The docs say:
A|B, where A and B can be arbitrary REs. As the target string is scanned, REs separated by '|'
are tried from left to right. When one pattern completely matches,
that branch is accepted. This means that once A matches, B will not be
tested further, even if it would produce a longer overall match.
But then why is the regex matching an empty string?
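A likely explanation, sketched here under the assumption of Python 3.7 or later: the left-to-right rule applies within a single match attempt. Once ^.*$ has matched at position 0, findall() and sub() keep scanning from the end of that match. At that position ^ can no longer match, so the empty right-hand alternative matches instead and yields a second, empty match. Inspecting the spans makes this visible:

```python
import re

pattern = re.compile(r'^.*$|')

# The first match covers the whole string; the second is empty and sits
# at the very end, where only the empty alternative can match.
spans = [m.span() for m in pattern.finditer('abcd')]
print(spans)  # [(0, 4), (4, 4)]

# Dropping the empty alternative leaves only the full-string match.
fixed = re.sub(r'^.*$', 'xyz', 'abc')
print(fixed)  # xyz
```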

How to match a full string, instead of partial string? [duplicate]

This question already has answers here:
Order of regular expression operator (..|.. ... ..|..)
(1 answer)
Checking whole string with a regex
(5 answers)
Closed 2 years ago.
pattern = r'(1|2|3|4|5|6|7|8|9|10|11|12)'
str = '11'
This only matches '1', not '11'. How do I match the full '11'? I changed it to:
pattern = r'(?:1|2|3|4|5|6|7|8|9|10|11|12)'
The result is the same.
I am testing here first:
https://regex101.com/
It is matching 1 instead of 11 because you have 1 before 11 in your alternation. If you use re.findall then it will match 1 twice for input string 11.
However, to match numbers from 1 to 12 you can avoid the long alternation and use:
\b(?:1[0-2]?|[2-9])\b
It is safer to use word boundaries to avoid matching digits inside a longer word or number.
Regex always tries the left alternative before the right one.
In an alternation you would put the longest alternative first.
However, factoring should take precedence.
(1|2|3|4|5|6|7|8|9|10|11|12)
then turns into
1[012]?|[2-9]
https://regex101.com/r/qmlKr0/1
I purposely didn't add boundary parts as
everybody has their own preference.
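To make the ordering effect concrete, here is a small sketch comparing the original alternation with the factored form. The variable names are illustrative, and the boundary variant is just one possible preference:

```python
import re

naive = re.compile(r'1|2|3|4|5|6|7|8|9|10|11|12')
factored = re.compile(r'1[0-2]?|[2-9]')
bounded = re.compile(r'\b(?:1[0-2]?|[2-9])\b')

# match() stops at the first alternative that succeeds, which is '1'.
naive_prefix = naive.match('11').group()
# fullmatch() forces the whole string to match, so the engine backtracks
# through the alternatives until '11' fits.
naive_full = naive.fullmatch('11').group()
# The factored pattern prefers the longer form without reordering anything.
factored_prefix = factored.match('11').group()
# Word boundaries keep 13 and 120 from producing partial matches.
found = bounded.findall('a 7, 11, 13 and 120')

print(naive_prefix, naive_full, factored_prefix, found)
# 1 11 11 ['7', '11']
```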
Do you mean this solution?
[\d]+

Find substrings matching a pattern allowing overlaps [duplicate]

This question already has answers here:
How to use regex to find all overlapping matches
(5 answers)
Closed 3 years ago.
So I have strings that form concatenated 1's and 0's with length 12. Here are some examples:
100010011100
001111110000
001010100011
I want to isolate the sections of each string that start with 1, continue with one or more zeros, and end with 1.
So for the first string, I would want ['10001','1001'].
For the second string, I would want nothing returned.
For the third string, I would want ['101','101','10001'].
I've tried using a combination of positive lookahead and positive lookbehind, but it isn't working. This is what I've come up with so far: [(?<=1)0][0(?=1)]
For a non-regex approach, you can split the string on 1. The matches you want are any elements in the resulting list with a 0 in it, excluding the first and last elements of the array.
Code:
myStrings = [
    "100010011100",
    "001111110000",
    "001010100011"
]
for s in myStrings:
    matches = ["1"+z+"1" for i, z in enumerate(s.split("1")[:-1]) if (i>0) and ("0" in z)]
    print(matches)
Output:
#['10001', '1001']
#[]
#['101', '101', '10001']
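I suggest writing a simple regex: r'10+1'. Then use Python logic to find each match with the compiled pattern's search() method. After each match, start the next search at the final 1 of that match (match.end() - 1), since that 1 may begin the next match.
A consuming pattern like this cannot find overlapping matches in a single pass, so search repeatedly: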
import re

def parse(s):
    pattern = re.compile(r'(10+1)')
    match = pattern.search(s)
    while match:
        yield match[0]
        # Resume at the closing 1, which may open the next match.
        match = pattern.search(s, match.end()-1)
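For comparison, the capturing-lookahead trick from the linked duplicate appears to handle these strings in a single findall() call. This is a sketch of that alternative, not part of the answer above:

```python
import re

strings = ["100010011100", "001111110000", "001010100011"]

# The lookahead itself consumes nothing, so the scan moves forward one
# character at a time; the inner group captures the text that would match.
overlap = re.compile(r'(?=(10+1))')
results = [overlap.findall(s) for s in strings]

for r in results:
    print(r)
# ['10001', '1001']
# []
# ['101', '101', '10001']
```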

Return overlapping patterns as groups in Python [duplicate]

This question already has answers here:
How to use regex to find all overlapping matches
(5 answers)
Closed 6 years ago.
I have been searching the internet, and Stack Overflow in particular, for a long time, but I cannot find a regex explanation for finding multiple intersecting or non-intersecting substrings.
Suppose my original string is:
aabcdacdacdfghcdacds
and the substring to be fetched is:
cdacd
I wish to find intersecting or non-intersecting substrings as groups.
This means I want a regex that returns three groups from the original string:
group 1: (cdacd)
group 2: (cdacd)
group 3: (cdacd)
Notice that group 1 and group 2 in aabcdacdacdfghcdacds share an overlapping cd.
Please advise.
Try it like this:
In [1]: import re
In [2]: re.findall('(?=(cdacd))', 'aabcdacdacdfghcdacds')
Out[2]: ['cdacd', 'cdacd', 'cdacd']
From the Python docs (search for (?=...)):
Matches if ... matches next, but doesn’t consume any of the string. This is called a lookahead assertion. For example, Isaac (?=Asimov) will match 'Isaac ' only if it’s followed by 'Asimov'.
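If the positions of the groups matter as well, a small variation with finditer() should report where each captured occurrence starts. This is a sketch; the variable names are illustrative:

```python
import re

text = 'aabcdacdacdfghcdacds'

# Group 1 lives inside the lookahead, so each match is zero-width but the
# group still records the span of the captured substring.
hits = [(m.start(1), m.group(1)) for m in re.finditer(r'(?=(cdacd))', text)]
print(hits)  # [(3, 'cdacd'), (6, 'cdacd'), (14, 'cdacd')]
```

The first two occurrences start at 3 and 6, which shows the shared cd at indices 6 and 7.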
