Regular expression returns only first occurrence [duplicate] - python

This question already has answers here:
List of all words matching regular expression
(4 answers)
Closed 6 years ago.
I don't understand why my regular expression extraction doesn't return all occurrences (https://regex101.com/r/1yWpq6/1):
import re
s = """
_('hello')
foo
_('world')
bar
"""
print(re.search('_\(\'(.*)\'\)', s, re.MULTILINE).groups())
produces
('hello',)
I expect ('hello', 'world').
Why is only the first occurrence returned?

print(re.findall('_\(\'(.*)\'\)', s, re.MULTILINE))
out:
['hello', 'world']
re.search(pattern, string, flags=0)
Scan through string looking for a location where the regular expression pattern produces a match, and return a corresponding match object.
re.findall(pattern, string, flags=0)
Return all non-overlapping matches of pattern in string, as a list of strings.

You need to use re.findall() like this:
print(re.findall('_\(\'(.*)\'\)', s))
Output:
>>> import re
>>>
>>> print(re.findall('_\(\'(.*)\'\)', s))
['hello', 'world']
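To make the difference concrete, here is a minimal sketch that runs re.search, re.findall and re.finditer over the same input. The raw-string pattern is just an equivalent spelling of the one in the question, and the variable names are illustrative only.

```python
import re

s = """
_('hello')
foo
_('world')
bar
"""

# Raw string: same pattern as in the question, without escaping the quotes.
pattern = re.compile(r"_\('(.*)'\)")

# search() stops at the first match; groups() lists that one match's groups.
first = pattern.search(s)
print(first.groups())        # ('hello',)

# findall() scans the whole string and returns every group-1 capture.
all_words = pattern.findall(s)
print(all_words)             # ['hello', 'world']

# finditer() yields match objects, which helps when positions are needed too.
spans = [(m.group(1), m.start()) for m in pattern.finditer(s)]
print(spans)
```

re.MULTILINE is not needed here: it only changes how ^ and $ behave, and the pattern uses neither.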

Related

python - regex why does `findall` find nothing, but `search` works? [duplicate]

This question already has answers here:
re.findall behaves weird
(3 answers)
Closed 5 years ago.
>>> reg = re.compile(r'^\d{1,3}(,\d{3})*$')
>>> str = '42'
>>> reg.search(str).group()
'42'
>>> reg.findall(str)
['']
>>>
Why does reg.findall find nothing, but reg.search works in this piece of code above?
When the regex contains capture groups (wrapped in parentheses), findall returns what the groups captured instead of the whole match. In your case the group (,\d{3})* never takes part in the match, so findall reports it as an empty string. You can make the group non-capturing with ?: if you want the whole match returned. re.search, on the other hand, returns a match object, and calling .group() on it without arguments gives the whole match regardless of capture groups. This is reflected in the documentation:
re.findall:
Return all non-overlapping matches of pattern in string, as a list of
strings. The string is scanned left-to-right, and matches are returned
in the order found. If one or more groups are present in the pattern,
return a list of groups; this will be a list of tuples if the pattern
has more than one group.
re.search:
Scan through string looking for the first location where the regular
expression pattern produces a match, and return a corresponding
MatchObject instance. Return None if no position in the string matches
the pattern; note that this is different from finding a zero-length
match at some point in the string.
import re
reg = re.compile(r'^\d{1,3}(?:,\d{3})*$')
s = '42'
reg.search(s).group()
# '42'
reg.findall(s)
# ['42']
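As a small sketch of the same point, the snippet below runs both variants of the pattern on two inputs. The second input, '1,234,567', is an assumed example that is not in the question; it shows that a repeated capture group keeps only its last repetition.

```python
import re

capturing = re.compile(r'^\d{1,3}(,\d{3})*$')
non_capturing = re.compile(r'^\d{1,3}(?:,\d{3})*$')

for text in ('42', '1,234,567'):
    # With a capture group, findall() reports the group, not the whole match.
    print(text, capturing.findall(text))
    # With (?:...), findall() reports the whole match.
    print(text, non_capturing.findall(text))

# search().group() is the whole match either way; group(1) is the capture.
m = capturing.search('1,234,567')
print(m.group(), m.group(1))
```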

Is the behavior of Python vs Perl regex for caret and dollar different? [duplicate]

This question already has an answer here:
Python regular expression re.match, why this code does not work? [duplicate]
(1 answer)
Closed 6 years ago.
In Python:
Given a string 've, I can match the start of the string with a caret:
>>> import re
>>> s = u"'ve"
>>> re.match(u"^[\'][a-z]", s)
<_sre.SRE_Match object at 0x1109ee030>
So it matches even though the substring after the single quote is longer than one character.
But for the dollar (matching end of string):
>>> import re
>>> s = u"'ve"
>>> re.match(u"[a-z]$", s)
>>>
In Perl, it seems the end of the string can be matched with:
$s =~ /[\p{IsAlnum}]$/
Is $s =~ /[\p{IsAlnum}]$/ the same as re.match(u"[a-z]$", s)?
Why do the caret and the dollar behave differently? And do they behave differently in Python and Perl?
re.match is implicitly anchored at the start of the string. Quoting the documentation:
re.match(pattern, string, flags=0)
If zero or more characters at the beginning of string match the regular expression pattern, return a corresponding MatchObject instance.
Try re.search instead.
>>> import re
>>> s = u"'ve"
>>> re.search(u"[a-z]$", s)
<_sre.SRE_Match object at 0x7fea24df3780>
>>>
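A short sketch of how the three entry points treat the same string may help. It assumes Python 3.4 or later for re.fullmatch, and the raw-string patterns are simplified spellings of the ones in the question.

```python
import re

s = "'ve"

# match() only tries position 0, where the character is a quote, not a letter.
print(re.match(r"[a-z]$", s))               # None

# search() may start anywhere, so it finds the final 'e'.
print(re.search(r"[a-z]$", s).group())      # e

# match() does not require the pattern to reach the end of the string.
print(re.match(r"'[a-z]", s).group())       # 'v

# fullmatch() requires the whole string to match.
print(re.fullmatch(r"'[a-z]", s))           # None
print(re.fullmatch(r"'[a-z]+", s).group())  # 've
```

So the caret and the dollar themselves behave as in Perl; the difference comes from re.match only attempting a match at the start of the string.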

Python regex - Substring match [duplicate]

This question already has answers here:
Check if a word is in a string in Python
(14 answers)
Closed 8 years ago.
I have a pattern
pattern = "hello"
and a string
str = "good morning! hello helloworld"
I would like to search for pattern in str such that it is matched only as a whole word, i.e. it should not match the substring hello in helloworld. If str does not contain hello, it should return False.
I am looking for a regex pattern.
\b matches at the start or end of a word.
So the pattern would be pattern = re.compile(r'\bhello\b')
Assuming you are only looking for one match, re.search() returns None or a match object (calling .group() on it returns the exact string matched).
For multiple matches you need re.findall(), which returns a list of matches (an empty list if there are none).
Full code:
import re
str1 = "good morning! hello helloworld"
str2 = ".hello"
pattern = re.compile(r'\bhello\b')
try:
    match = re.search(pattern, str1).group()
    print(match)
except AttributeError:
    print('No match')
You can use word boundaries around the pattern you are searching for if you are looking to use a regular expression for this task.
>>> import re
>>> pattern = re.compile(r'\bhello\b', re.I)
>>> mystring = 'good morning! hello helloworld'
>>> bool(pattern.search(mystring))
True
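If the word to look for is held in a variable, the pattern can be built at run time. The following is a minimal sketch; the helper name contains_word and its ignore_case parameter are made up for illustration and are not part of the re module.

```python
import re

def contains_word(word, text, ignore_case=False):
    """Return True if `word` occurs in `text` delimited by word boundaries."""
    flags = re.IGNORECASE if ignore_case else 0
    # re.escape() keeps characters such as '.' or '+' in `word` literal.
    pattern = re.compile(r'\b' + re.escape(word) + r'\b', flags)
    return pattern.search(text) is not None

print(contains_word("hello", "good morning! hello helloworld"))  # True
print(contains_word("hello", "helloworld"))                      # False
print(contains_word("hello", ".hello"))                          # True
```

Note that \b is defined relative to word characters, so this sketch is only meant for search terms that start and end with a letter, digit or underscore.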

RegEx Python not working [duplicate]

This question already has answers here:
What is the difference between re.search and re.match?
(9 answers)
Closed 8 years ago.
My Reg-Ex pattern is not working, why?
string = "../../example/tobematched/nonimportant.html"
pattern = "example\/([a-z]+)\/"
test = re.match(pattern, string)
# None
http://www.regexr.com/39mpu
re.match() only matches at the beginning of the string. You need re.search(), which looks for the first location where the regular expression pattern produces a match and returns a corresponding match object.
>>> import re
>>> s = "../../example/tobematched/nonimportant.html"
>>> re.search(r'example/([a-z]+)/', s).group(1)
'tobematched'
Try this.
test = re.search(pattern, string)
re.match only tries to match at the start of the string, so it gives None as the result here.
Grab the result from test.group() (or test.group(1) for the captured directory name).
To give you the answer in short:
search ⇒ finds something anywhere in the string and returns a match object.
match ⇒ finds something at the beginning of the string and returns a match object.
That is the reason you have to use
foo = re.search(pattern, bar)
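Because both functions return None when nothing matches, it is safer to check the result before calling .group(). Here is a minimal sketch; the helper name extract_segment is hypothetical, and the forward slashes are left unescaped because they have no special meaning in Python regular expressions.

```python
import re

def extract_segment(path):
    """Return the directory name right after 'example/', or None."""
    m = re.search(r'example/([a-z]+)/', path)
    return m.group(1) if m else None

string = "../../example/tobematched/nonimportant.html"

print(re.match(r'example/([a-z]+)/', string))  # None: string starts with '..'
print(extract_segment(string))                 # tobematched
print(extract_segment("no/such/dir.html"))     # None
```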

Split string based on regexp without consuming characters [duplicate]

This question already has answers here:
Non-consuming regular expression split in Python
(2 answers)
Closed 8 years ago.
I would like to split a string like the following
text="one,two;three.four:"
into the list
textOut=["one", ",two", ";three", ".four", ":"]
I have tried with
import re
textOut = re.split(r'(?=[.:,;])', text)
But this does not split anything.
I would use re.findall here instead of re.split:
>>> from re import findall
>>> text = "one,two;three.four:"
>>> findall(r"(?:^|\W)\w*", text)
['one', ',two', ';three', '.four', ':']
>>>
Below is a breakdown of the Regex pattern used above:
(?:     # The start of a non-capturing group
^|\W    # The start of the string or a non-word character (symbol)
)       # The end of the non-capturing group
\w*     # Zero or more word characters (characters that are not symbols)
I don't know what else can occur in your string, but will this do the trick?
>>> s='one,two;three.four:'
>>> [x for x in re.findall(r'[.,;:]?\w*', s) if x]
['one', ',two', ';three', '.four', ':']
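For completeness, the lookahead split from the question fails only on older interpreters. Assuming Python 3.7 or later, where re.split supports patterns that can match an empty string, a sketch like the following produces the requested list directly.

```python
import re

text = "one,two;three.four:"

# The lookahead matches the empty position in front of each delimiter,
# so the delimiter stays attached to the piece that follows it.
parts = re.split(r'(?=[.:,;])', text)
print(parts)  # ['one', ',two', ';three', '.four', ':']

# If the text can start with a delimiter, the first piece is an empty
# string; filtering drops it.
parts_nonempty = [p for p in re.split(r'(?=[.:,;])', ",a;b") if p]
print(parts_nonempty)  # [',a', ';b']
```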
