This question already has answers here:
How to use regex for words
(4 answers)
Closed 9 years ago.
I am trying to match text containing a word (let's say 'word'). I am using the following regex:
r = re.compile(r'\bword\b')
When I try this regex I get the following results:
r.match('a word a') > None
r.match(' word ') > None
r.match('word') > match
Shouldn't all three strings match?
From the docs:
re.search(pattern, string, flags=0) Scan through string looking for a
location where the regular expression pattern produces a match, and
return a corresponding MatchObject instance. Return None if no
position in the string matches the pattern; note that this is
different from finding a zero-length match at some point in the
string.
re.match(pattern, string, flags=0) If zero or more characters at the
beginning of string match the regular expression pattern, return a
corresponding MatchObject instance. Return None if the string does not
match the pattern; note that this is different from a zero-length
match.
Note that even in MULTILINE mode, re.match() will only match at the
beginning of the string and not at the beginning of each line.
If you want to locate a match anywhere in string, use search() instead
So, just do r.search(...) and you should get what you want.
Related
What's the difference between pandas.Series.str.contains and pandas.Series.str.match? Why is the case below?
s1 = pd.Series(['house and parrot'])
s1.str.contains(r"\bparrot\b", case=False)
I got True, but when i do
s1.str.match(r"\bparrot\b", case=False)
I got False. Why is the case?
The documentation for str.contains() states:
Test if pattern or regex is contained within a string of a Series or
Index.
The documentation for str.match() states:
Determine if each string matches a regular expression.
The difference in these two methods is that str.contains() uses: re.search, while str.match() uses re.match.
As per documentation of re.match()
If zero or more characters at the beginning of string match the
regular expression pattern, return a corresponding match object.
Return None if the string does not match the pattern; note that this
is different from a zero-length match.
So parrot does not match the first character of the string so your expression returns False. House does match the first character so it finds house and returns true.
This question already has answers here:
re.findall behaves weird
(3 answers)
Closed 5 years ago.
>>> reg = re.compile(r'^\d{1,3}(,\d{3})*$')
>>> str = '42'
>>> reg.search(str).group()
'42'
>>> reg.findall(str)
['']
>>>
python regex
Why does reg.findall find nothing, but reg.search works in this piece of code above?
When you have capture groups (wrapped with parenthesis) in the regex, findall will return the match of the captured group; And in your case the captured group matches an empty string; You can make it non capture with ?: if you want to return the whole match; re.search ignores capture groups on the other hand. These are reflected in the documentation:
re.findall:
Return all non-overlapping matches of pattern in string, as a list of
strings. The string is scanned left-to-right, and matches are returned
in the order found. If one or more groups are present in the pattern,
return a list of groups; this will be a list of tuples if the pattern
has more than one group.
re.search:
Scan through string looking for the first location where the regular
expression pattern produces a match, and return a corresponding
MatchObject instance. Return None if no position in the string matches
the pattern; note that this is different from finding a zero-length
match at some point in the string.
import re
reg = re.compile(r'^\d{1,3}(?:,\d{3})*$')
s = '42'
reg.search(s).group()
# '42'
reg.findall(s)
# ['42']
This question already has answers here:
Checking whole string with a regex
(5 answers)
Closed 6 years ago.
Here's my code...
import re
l=["chap","chap11","chapa","chapb","chapc","chap3","chap2","chapf","chap4","chap55","chapf","chap33","chap54","chapgk"]
for i in l:
matchobj=re.match(r'chap[0-9]',i,re.M|re.I)
if matchobj:
print(i)
as I have mentioned chap[0-9].. so it should only those strings which follow only one integer after chap
so I should get the following output..
chap3
chap2
chap4
but I am getting the following output...
chap11
chap3
chap2
chap4
chap55
chap33
chap54
match matches your pattern at the beginning of the string. Append e.g. end of string '$' or word boundary '\b' to your pattern:
matchobj=re.match(r'chap\d$',i,re.M|re.I)
# \d (digit) is shortcut for [0-9]
From the docs on re.match:
If zero or more characters at the beginning of string match the regular expression pattern, return a corresponding MatchObject instance.
You should add a dollar sign to the end of your regex expression. The dollar ($) means the end of the string, and for future reference, the carat (^) signifies the beginning.
import re
l=["chap","chap11","chapa","chapb","chapc","chap3","chap2","chapf","chap4","chap55","chapf","chap33","chap54","chapgk"]
for i in l:
matchobj=re.match(r'chap[0-9]$',i,re.M|re.I)
if matchobj:
print(i)
Output
chap3
chap2
chap4
This question already has answers here:
What is the difference between re.search and re.match?
(9 answers)
Closed 8 years ago.
My Reg-Ex pattern is not working, why?
string = "../../example/tobematched/nonimportant.html"
pattern = "example\/([a-z]+)\/"
test = re.match(pattern, string)
# None
http://www.regexr.com/39mpu
re.match() matches from the beginning of the string, you need to use re.search() which looks for the first location where the regular expression pattern produces a match and returns a corresponding MatchObject instance.
>>> import re
>>> s = "../../example/tobematched/nonimportant.html"
>>> re.search(r'example/([a-z]+)/', s).group(1)
'tobematched'
Try this.
test = re.search(pattern, string)
Match matches the whole string from the start, so it will give None as the result.
Grab the result from test.group().
To give you the answer in short:
search ⇒ finds something anywhere in the string and return a match object.
match ⇒ finds something at the beginning of the string and return a match object.
That is the reason you have to use
foo = re.search(pattern, bar)
This question already has an answer here:
Python regex alternation
(1 answer)
Closed 8 years ago.
I am trying to find the characters immediately before and after a regex match in a given string. This is the code.
>>>import re
>>>s='dafddadffdbdasbffsbbfdbabbfsdfadsfdfddf' #completely garbage test string
>>>re.findall('.{0,5}(abb).{0,5}',s)
['abb']
The test string has an occurence of 'abb' here ...fdbabbfsd... I am under the impression that the special character . matches any character other than \n and the {m,n} Causes the resulting RE to match from m to n repetitions of the preceding RE, attempting to match as many repetitions as possible as stated here
So I expect my re to return ['bbfdbabbfsdfa'] and not just ['abb']. What am I missing?
It's because of the capturing group. Just move the parentheses:
re.findall('(.{0,5}abb.{0,5})',s)
findall only matches groups, so everything you want to match needs to be in the parentheses.
According to re.findall documentation:
Return all non-overlapping matches of pattern in string, as a list of
strings. The string is scanned left-to-right, and matches are returned
in the order found. If one or more groups are present in the pattern,
return a list of groups; this will be a list of tuples if the pattern
has more than one group.
So by surrounding whole pattern as a group or removing group will give you what you want.
>>> re.findall('(.{0,5}abb.{0,5})',s) # Entire pattern as a group
['bbfdbabbfsdfa']
>>> re.findall('.{0,5}abb.{0,5}',s) # No capturing group
['bbfdbabbfsdfa']