Excluding words using regex without excluding its variants [duplicate] - python

This question already has answers here:
Find substring in string but only if whole words?
(8 answers)
Closed 4 years ago.
I am trying to exclude lines containing the word ‘define’ without excluding other forms of the word such as ‘defined’ or ‘defining’, but the regex below also rejects those forms.
Regex :
^((?!define).)*$

Use word boundaries around the word define:
^((?!\bdefine\b).)*$
You could also write this pattern as:
^(?!.*\bdefine\b).*$
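As a rough check, the two patterns above can be run against a few sample lines in Python. The sample sentences and the names tempered, anchored and results are made up for illustration; only the two regexes come from the answer.

```python
import re

# The two patterns from the answer. \b restricts the exclusion to the
# whole word 'define', so 'defined' and 'defining' are still allowed.
tempered = re.compile(r'^((?!\bdefine\b).)*$')
anchored = re.compile(r'^(?!.*\bdefine\b).*$')

# Hypothetical sample lines, not from the original question.
samples = [
    'please define the term',
    'the term is defined here',
    'defining terms is hard',
]

# For each line, record whether each pattern accepts it.
results = {
    line: (bool(tempered.match(line)), bool(anchored.match(line)))
    for line in samples
}

for line, outcome in results.items():
    print(line, outcome)
```

Only the first sample contains 'define' as a whole word, so it should be the only line both patterns reject.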

Related

One character in multiple groups [duplicate]

This question already has answers here:
How to use regex to find all overlapping matches
(5 answers)
Closed 3 years ago.
I have a string like:
s = 'ababbabbba'
I'm trying to match all patterns matching any number of b's between a's. This is what I expect the patterns to be for s above:
['aba', 'abba', 'abbba']
This is what I've tried:
import re
re.findall('ab+a', s)
Which gives:
['aba', 'abbba']
I think that happens because any single a can only be part of a single group. Whereas my requirement would make the middle a's be part of two groups. Reading through the re documentation, I can't find any way to do this.
The solution is to wrap the pattern in a lookahead containing a capture group, so that no characters are consumed and matches can overlap:
re.findall('(?=(ab+a))', s)
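A short sketch comparing the two calls on the string from the question. The variable names consuming and overlapping are illustrative only.

```python
import re

s = 'ababbabbba'

# A normal match consumes each 'a', so the closing 'a' of one match
# cannot also open the next one.
consuming = re.findall('ab+a', s)

# A lookahead matches an empty string at each position, and the capture
# group inside it reports the text that would have matched there.
overlapping = re.findall('(?=(ab+a))', s)

print(consuming)
print(overlapping)
```

Because the lookahead itself has zero width, the scan advances one character at a time and every starting 'a' gets a chance to match.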

Python regex with multiple matches in the same string [duplicate]

This question already has answers here:
My regex is matching too much. How do I make it stop? [duplicate]
(5 answers)
Python non-greedy regexes
(7 answers)
Closed 3 years ago.
import re
test = '<tag>part1</tag><tag can have random stuff here>part2</tag>'
print(re.findall("<tag.*>(.*)</tag>", test))
It outputs:
['part2']
The text can have any amount of "parts". I want to return all of them, not only the last one. What's the best way to do it?
You could change your .* to be .*? so that they are non-greedy. That will make your original example work:
import re
test = '<tag>part1</tag><tag can have random stuff here>part2</tag>'
print(re.findall(r'<tag.*?>(.*?)</tag>', test))
Output:
['part1', 'part2']
Though it would probably be best to not try to parse this with just regex, but instead use a proper HTML parser library.
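As one possible illustration of the parser route, here is a minimal sketch using the standard library's html.parser. The class name TagTextCollector is made up for this example, and a third-party parser would work just as well; this is not the only way to do it.

```python
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Collects the text found inside each <tag>...</tag> element."""

    def __init__(self):
        super().__init__()
        self.parts = []       # finished text of each <tag> element
        self._buffer = None   # text pieces of the element being read

    def handle_starttag(self, tag, attrs):
        if tag == 'tag':
            self._buffer = []

    def handle_data(self, data):
        # Only keep text while we are inside a <tag> element.
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == 'tag' and self._buffer is not None:
            self.parts.append(''.join(self._buffer))
            self._buffer = None


test = '<tag>part1</tag><tag can have random stuff here>part2</tag>'
collector = TagTextCollector()
collector.feed(test)
collector.close()
print(collector.parts)
```

The parser treats the extra words in the second opening tag as attributes, so they do not need any special handling.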

python split regex into multiple lines [duplicate]

This question already has answers here:
How to split long regular expression rules to multiple lines in Python
(6 answers)
Closed 4 years ago.
Just a simple question. Let's say I have a very long regex.
regex = "(foo|foo|foo|foo|bar|bar|bar)"
Now I want to split this regex across multiple lines. I tried
regex = "(foo|foo|foo|foo|\
bar|bar|bar)"
but this doesn't seem to work; I get different output. Any ideas?
Close the string on each line and let Python join the adjacent string literals:
regex = "(foo|foo|foo|foo" \
"|bar|bar|bar)"
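A small sketch checking that the split literals produce the same pattern as the one-line version. It also shows re.VERBOSE as an alternative that the answer above does not mention; the names single_line, joined and verbose are illustrative.

```python
import re

single_line = "(foo|foo|foo|foo|bar|bar|bar)"

# Adjacent string literals are joined into one string. Parentheses
# around them avoid the need for a trailing backslash.
joined = ("(foo|foo|foo|foo"
          "|bar|bar|bar)")

# Alternative: in re.VERBOSE mode, whitespace and # comments inside the
# pattern are ignored, so the regex can span several lines.
verbose = re.compile(r"""
    (foo|foo|foo|foo   # first group of alternatives
    |bar|bar|bar)      # second group of alternatives
""", re.VERBOSE)

print(joined == single_line)
print(verbose.fullmatch("bar") is not None)
```

With a backslash inside the string literal, any indentation on the continuation line becomes part of the pattern, which is one plausible cause of the different output described in the question.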

Regex: match any word (including foobar), but not foo [duplicate]

This question already has answers here:
Matching all words except one
(8 answers)
Regex: match everything but a specific pattern
(6 answers)
Closed 4 years ago.
I'm very new to regex and I've looked but haven't found the syntax I'm looking for.
I want to match any word (including foobar), but not foo. However, everything I've found catches foobar along with foo.
What's the correct way to do this? I'm working in Python, if that matters.
(?!^foo$)^\w+$
The negative lookahead (?!^foo$) rejects a string that is exactly foo, while any other word still matches.
^ and $ assert the start and end of the string, respectively. \w+ means match one or more of any word character.
And an example:
https://regex101.com/r/nfxyso/2
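A quick sketch of the pattern in Python. The test words are made up, and the second pattern, which picks words out of a longer sentence, is a variation added here rather than part of the answer.

```python
import re

# Pattern from the answer: the whole string must be a word other than 'foo'.
whole_string = re.compile(r'(?!^foo$)^\w+$')

words = ['foo', 'foobar', 'bar']  # hypothetical test words
accepted = [w for w in words if whole_string.match(w)]
print(accepted)

# Variation (an assumption, not from the answer): find every word in a
# sentence except the standalone word 'foo'.
in_sentence = re.findall(r'\b(?!foo\b)\w+', 'foo foobar bar')
print(in_sentence)
```

In the variation, \b after foo inside the lookahead is what lets foobar through, since no word boundary follows the first three letters.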

Why does Python 3 re.findall fail to find all matches here? [duplicate]

This question already has answers here:
How to use regex to find all overlapping matches
(5 answers)
How to find overlapping matches with a regexp?
(4 answers)
Closed 5 years ago.
I'm using Python 3's re.findall to find all occurrences of a substring within a string:
import re
full_string = 'ABCDCDC'
sub_pattern = 'CDC'
re.findall(sub_pattern, full_string)
re.findall only finds ['CDC']. However, the pattern CDC occurs twice in the full string.
Why isn't re.findall finding all occurrences of CDC here?
What's needed so that re correctly finds all occurrences?
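For context, re.findall returns non-overlapping matches, and the two occurrences of CDC here share the middle C. A minimal sketch of the lookahead approach suggested by the linked questions follows; the names plain, overlapping and starts are illustrative.

```python
import re

full_string = 'ABCDCDC'
sub_pattern = 'CDC'

# Non-overlapping scan: after matching at index 2, the search resumes at
# index 5, so the occurrence starting at index 4 is never seen.
plain = re.findall(sub_pattern, full_string)

# Zero-width lookahead with a capture group: nothing is consumed, so
# every starting position is tried.
overlapping = re.findall('(?=(' + re.escape(sub_pattern) + '))', full_string)

# Start index of each overlapping occurrence.
starts = [m.start()
          for m in re.finditer('(?=' + re.escape(sub_pattern) + ')', full_string)]

print(plain)
print(overlapping)
print(starts)
```

re.escape is used so the substring is treated literally even if it contains regex metacharacters.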
