Python regex - Substring match [duplicate] - python

This question already has answers here:
Check if a word is in a string in Python
(14 answers)
Closed 8 years ago.
I have a pattern
pattern = "hello"
and a string
str = "good morning! hello helloworld"
I would like to search for pattern in str such that it is matched only as a whole word, i.e. it should not match the substring hello in helloworld. If str does not contain hello as a word, the result should be False.
I am looking for a regex pattern.

\b matches start or end of a word.
So the pattern would be pattern = re.compile(r'\bhello\b')
Assuming you are only looking for one match, re.search() returns None or a match object (calling .group() on it returns the exact string matched).
For multiple matches you need re.findall(), which returns a list of matches (an empty list if there are none).
Full code:
import re
str1 = "good morning! hello helloworld"
str2 = ".hello"
pattern = re.compile(r'\bhello\b')
try:
    # re.search returns None when nothing matches, so .group() raises AttributeError
    match = re.search(pattern, str1).group()
    print(match)
except AttributeError:
    print('No match')

You can use word boundaries around the pattern you are searching for if you are looking to use a regular expression for this task.
>>> import re
>>> pattern = re.compile(r'\bhello\b', re.I)
>>> mystring = 'good morning! hello helloworld'
>>> bool(pattern.search(mystring))
True
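If the word to look for arrives in a variable rather than as a literal, the same word-boundary idea can be wrapped in a small helper. This is only a minimal sketch: the name contains_word is made up for illustration, and it assumes the word starts and ends with a letter, digit, or underscore, since \b only marks a boundary next to such characters.

```python
import re

def contains_word(word, text):
    """Return True if `word` occurs in `text` as a whole word (hypothetical helper)."""
    # re.escape keeps characters such as '.' or '+' inside `word` literal.
    pattern = rf"\b{re.escape(word)}\b"
    return re.search(pattern, text) is not None

print(contains_word("hello", "good morning! hello helloworld"))  # True
print(contains_word("hello", "helloworld"))                      # False
```

Returning a plain bool avoids the try/except around .group() when only a yes/no answer is needed.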

Related

How can I add a string inside a string?

The problem is simple: I'm given a random string and a random pattern, and I'm told to find all the possible combinations of that pattern that occur in the string and mark them with [target] and [endtarget] at the beginning and end.
For example:
given the following text: "XuyZB8we4"
and the following pattern: "XYZAB"
The expected output would be: "[target]X[endtarget]uy[target]ZB[endtarget]8we4".
I already got the part that identifies all the words, but I can't find a way of placing the [target] and [endtarget] strings after and before the pattern (called in the code match).
import re
def tagger(text, search):
    place_s = "[target]"
    place_f = "[endtarget]"
    pattern = re.compile(rf"[{search}]+")
    matches = pattern.finditer(text)
    for match in matches:
        print(match)
    return test_string
test_string = "alsikjuyZB8we4 aBBe8XAZ piarBq8 Bq84Z "
pattern = "XYZAB"
print(tagger(test_string, pattern))
I also tried the for with the sub method, but I couldn't get it to work.
for match in matches:
    re.sub(match.group(0), place_s + match.group(0) + place_f, text)
return text
re.sub allows you to pass backreferences to matched groups within your pattern. So enclose your pattern in parentheses, or create a named group, and re.sub will replace all matches in the entire string at once with your desired replacements:
In [10]: re.sub(r'([XYZAB]+)', r'[target]\1[endtarget]', test_string)
Out[10]: 'alsikjuy[target]ZB[endtarget]8we4 a[target]BB[endtarget]e8[target]XAZ[endtarget] piar[target]B[endtarget]q8 [target]B[endtarget]q84[target]Z[endtarget] '
With this approach, re.finditer is not needed at all.
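Folded back into the tagger function from the question, the single re.sub call might look like the sketch below. The re.escape call is an addition not present in the original code; it is there on the assumption that search could contain characters such as ] or ^ that are special inside a character class. An empty search string is not handled.

```python
import re

def tagger(text, search):
    # Build a character class from the search characters and capture each run.
    pattern = re.compile(f"([{re.escape(search)}]+)")
    # \1 refers to the captured run, so every match is wrapped in one pass.
    return pattern.sub(r"[target]\1[endtarget]", text)

print(tagger("XuyZB8we4", "XYZAB"))
# [target]X[endtarget]uy[target]ZB[endtarget]8we4
```

Note that re.sub returns a new string and leaves text unchanged, which is why calling it in a loop without using the return value has no visible effect.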

python - regex why does `findall` find nothing, but `search` works? [duplicate]

This question already has answers here:
re.findall behaves weird
(3 answers)
Closed 5 years ago.
>>> reg = re.compile(r'^\d{1,3}(,\d{3})*$')
>>> str = '42'
>>> reg.search(str).group()
'42'
>>> reg.findall(str)
['']
>>>
Why does reg.findall find nothing, but reg.search works in this piece of code above?
When you have capture groups (wrapped in parentheses) in the regex, findall returns what the groups captured rather than the whole match. In your case the group never takes part in the match, so findall reports an empty string for it. You can make the group non-capturing with ?: if you want the whole match returned. re.search, on the other hand, returns a match object, and .group() with no arguments gives the whole match regardless of capture groups. These behaviours are reflected in the documentation:
re.findall:
Return all non-overlapping matches of pattern in string, as a list of
strings. The string is scanned left-to-right, and matches are returned
in the order found. If one or more groups are present in the pattern,
return a list of groups; this will be a list of tuples if the pattern
has more than one group.
re.search:
Scan through string looking for the first location where the regular
expression pattern produces a match, and return a corresponding
MatchObject instance. Return None if no position in the string matches
the pattern; note that this is different from finding a zero-length
match at some point in the string.
import re
reg = re.compile(r'^\d{1,3}(?:,\d{3})*$')
s = '42'
reg.search(s).group()
# '42'
reg.findall(s)
# ['42']
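To make the difference easier to see, here is a small sketch that runs both variants on a longer input. The sample string '1,234,567' is an assumption chosen for illustration; it is not from the question.

```python
import re

capturing = re.compile(r'^\d{1,3}(,\d{3})*$')
non_capturing = re.compile(r'^\d{1,3}(?:,\d{3})*$')

s = '1,234,567'

# With a capture group, findall returns the group's text
# (for a repeated group, the last repetition).
print(capturing.findall(s))       # [',567']

# With a non-capturing group, findall returns the whole match.
print(non_capturing.findall(s))   # ['1,234,567']

# finditer yields match objects, so group(0) is always the whole match.
whole = [m.group(0) for m in capturing.finditer(s)]
print(whole)                      # ['1,234,567']
```

Using finditer is one way to keep the capture group in the pattern and still get the full match back.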

Working With Python Regex [duplicate]

This question already has answers here:
Checking whole string with a regex
(5 answers)
Closed 5 years ago.
I am trying to compile a regex on python but am having limited success. I am doing the following
import re
pattern = re.compile("([a-zA-Z0-9_])([a-zA-Z0-9_-]*)")
m = pattern.match("gb,&^(#)")
if m:
    print(1)
else:
    print(2)
I am expecting the code above to print 2, but instead it prints 1. The regex should match strings as follows:
The first letter is alphanumeric or an underscore. All characters after that can be alphanumeric, an underscore, or a dash and there can be 0 or more characters after the first.
I was thinking that this thing should fail as soon as it sees the comma, but it is not.
What am I doing wrong here?
import re
# Without $ at the end, match() succeeds as soon as a prefix of the string satisfies the regex.
pattern = re.compile("^([a-zA-Z0-9_])([a-zA-Z0-9_-]*)$")
m = pattern.match("gb,&^(#)")
if m:
    print(1)
else:
    print(2)
# Your original pattern (no $) matches any string that merely starts with an alphanumeric character or an underscore.
pat = re.compile("^([a-zA-Z0-9_])([a-zA-Z0-9_-]*)")
if pat.match("_#"):
    print('match')
else:
    print('no match 1')
This example should also help you understand the explanation by @Wiktor.
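As an alternative to anchoring with ^ and $, re.fullmatch (available since Python 3.4) requires the pattern to cover the entire string. The sketch below assumes that is what the question wants; the helper name is_valid is made up for illustration.

```python
import re

# No anchors needed: fullmatch only succeeds if the whole string matches.
pattern = re.compile(r"[a-zA-Z0-9_][a-zA-Z0-9_-]*")

def is_valid(name):
    """Hypothetical helper: True if `name` matches the pattern in full."""
    return pattern.fullmatch(name) is not None

print(is_valid("gb,&^(#)"))  # False
print(is_valid("gb"))        # True
print(is_valid("gb\n"))      # False
```

One difference worth knowing: $ also matches just before a trailing newline, so an anchored match() would accept "gb\n", while fullmatch rejects it.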

Regular expression return only first occurence [duplicate]

This question already has answers here:
List of all words matching regular expression
(4 answers)
Closed 6 years ago.
I don't understand why my regular expression extraction doesn't return all occurrences (https://regex101.com/r/1yWpq6/1):
import re
s = """
_('hello')
foo
_('world')
bar
"""
print(re.search(r"_\('(.*)'\)", s, re.MULTILINE).groups())
produces
('hello',)
I expect ('hello', 'world').
Why is only the first occurrence returned?
print(re.findall(r"_\('(.*)'\)", s, re.MULTILINE))
out:
['hello', 'world']
re.search(pattern, string, flags=0)
Scan through string looking for a location where the regular expression pattern produces a match, and return a corresponding match object.
re.findall(pattern, string, flags=0)
Return all non-overlapping matches of pattern in string, as a list of strings.
You need to use re.findall() like this:
print(re.findall(r"_\('(.*)'\)", s))
Output:
['hello', 'world']
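The following sketch puts search and findall side by side on the string from the question. The one_line input and the lazy .*? variant are additions for illustration, on the assumption that two calls could appear on the same line.

```python
import re

s = """
_('hello')
foo
_('world')
bar
"""

pattern = re.compile(r"_\('(.*)'\)")

# search stops at the first match; findall collects every match.
first = pattern.search(s).group(1)
every = pattern.findall(s)
print(first)  # hello
print(every)  # ['hello', 'world']

# Assumed extra case: two calls on one line. The greedy .* runs to the
# last closing quote on the line, while the lazy .*? stops at the first.
one_line = "_('a') _('b')"
greedy = pattern.findall(one_line)
lazy = re.findall(r"_\('(.*?)'\)", one_line)
print(greedy)  # ["a') _('b"]
print(lazy)    # ['a', 'b']
```

The pattern works across lines here only because . does not match a newline by default; re.MULTILINE has no effect on a pattern without ^ or $.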

RegEx Python not working [duplicate]

This question already has answers here:
What is the difference between re.search and re.match?
(9 answers)
Closed 8 years ago.
My Reg-Ex pattern is not working, why?
string = "../../example/tobematched/nonimportant.html"
pattern = "example\/([a-z]+)\/"
test = re.match(pattern, string)
# None
http://www.regexr.com/39mpu
re.match() only matches at the beginning of the string. You need to use re.search(), which looks for the first location where the regular expression pattern produces a match and returns a corresponding match object.
>>> import re
>>> s = "../../example/tobematched/nonimportant.html"
>>> re.search(r'example/([a-z]+)/', s).group(1)
'tobematched'
Try this:
test = re.search(pattern, string)
re.match only tries to match at the start of the string, and your string starts with "../../", so it gives None as the result.
Grab the result from test.group().
To give you the answer in short:
search ⇒ finds something anywhere in the string and returns a match object.
match ⇒ finds something only at the beginning of the string and returns a match object.
Both return None when nothing is found. That is the reason you have to use
foo = re.search(pattern, bar)
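A short sketch of the difference, using the string from the question. The None check and the variable names are additions for illustration.

```python
import re

s = "../../example/tobematched/nonimportant.html"
pattern = re.compile(r"example/([a-z]+)/")

# match only tries position 0, where the string starts with "../../".
at_start = pattern.match(s)
print(at_start)  # None

# search scans forward until the pattern fits somewhere.
found = pattern.search(s)
if found is not None:
    print(found.group(1))  # tobematched
    print(found.start())   # 6
```

Checking for None before calling .group() avoids an AttributeError when the pattern does not occur at all.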
