Regex can't escape question mark? [duplicate] - python

This question already has an answer here:
match trailing slash with Python regex
(1 answer)
Closed 8 years ago.
I can't match the question mark character although I escaped it.
I tried escaping with multiple backslashes and also using re.escape().
What am I missing?
Code:
import re
text = 'test?'
result = ''
result = re.match(r'\?',text)
print ("input: "+text)
print ("found: "+str(result))
Output:
input: test?
found: None

re.match only matches a pattern at the beginning of the string. From the docs:
If zero or more characters at the beginning of string match the regular expression pattern, return a corresponding match object.
so, either:
>>> re.match(r'.*\?', text).group(0)
'test?'
or re.search
>>> re.search(r'\?', text).group(0)
'?'
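As a minimal sketch of the difference, using the same text as the question (the variable names at_start, anywhere and escaped are only illustrative): the escaping was already correct, and only the choice of function matters.

```python
import re

text = 'test?'

# re.match anchors at position 0, where the character is 't', so it fails.
at_start = re.match(r'\?', text)

# re.search scans forward and finds the '?' at index 4.
anywhere = re.search(r'\?', text)

# re.escape('?') produces the same escaped pattern that was written by hand.
escaped = re.escape('?')

print(at_start)         # None
print(anywhere.span())  # (4, 5)
print(escaped)          # \?
```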

Related

Python Regex: alternation gives empty matches [duplicate]

This question already has answers here:
Why do some regex engines match .* twice in a single input string?
(1 answer)
Reference - What does this regex mean?
(1 answer)
Closed 2 years ago.
I was doing some regex which simplifies to this code:
>>> import re
>>> re.sub(r'^.*$|', "xyz", "abc")
'xyzxyz'
I was expecting it to replace abc with xyz: since the RE ^.*$ matches the whole string, I assumed the engine would return that match and stop. So I ran the same regex with re.findall().
>>> re.findall(r'^.*$|', 'abcd')
['abcd', '']
in the docs it says:
A|B, where A and B can be arbitrary REs. As the target string is scanned, REs separated by '|'
are tried from left to right. When one pattern completely matches,
that branch is accepted. This means that once A matches, B will not be
tested further, even if it would produce a longer overall match.
But then why is the regex matching an empty string?
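A sketch of what seems to be happening, assuming Python 3.7 or later (where an empty match may directly follow a non-empty one): the alternation is evaluated afresh at every position the scan reaches. After ^.*$ has consumed the whole string, the scan continues at the end of the string; ^ cannot match there, so the empty second branch does. Printing the match spans makes this visible.

```python
import re

# Each match as (start, end): the empty alternative matches at the end of the string.
spans = [m.span() for m in re.finditer(r'^.*$|', 'abcd')]
print(spans)  # [(0, 4), (4, 4)]

# Dropping the empty alternative leaves a single match to replace.
replaced = re.sub(r'^.*$', 'xyz', 'abc')
print(replaced)  # xyz
```

So the quoted rule about A|B still holds for each individual match attempt; it just does not stop the scan from starting a new attempt further along.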

regular expression findall errors [duplicate]

This question already has answers here:
My regex is matching too much. How do I make it stop? [duplicate]
(5 answers)
Closed 4 years ago.
I run the following script
a = r'[abc] [abc] [y78]'
paaa = re.compile(r'\[ab.*]')
paaa.findall(a)
I obtained
['[abc] [abc] [y78]']
Why is '[abc]' missing? '[abc]' clearly matches the pattern as well. Is there a bug in the Python 3 re.findall function?
Clarification:
Sorry the paaa should be paaa = re.compile(r'\[ab.*\]')
What I am looking for is something which will return
['[abc]', '[abc]', '[abc] [abc]', '[abc] [abc] [y78]']
Basically, any substring matches the pattern.
The repeated . in \[ab.*] is greedy: it matches as many characters as it can, as long as they are still followed by a ]. So everything between the first [ and the last ] is matched.
Use lazy repetition instead, with .*?:
a = r'[abc] [abc] [y78]'
paaa = re.compile(r'\[ab.*?]')
print(paaa.findall(a))
['[abc]', '[abc]']
You should escape the right square bracket as well, and use the non-greedy quantifier *? in your regex:
import re
a = r'[abc] [abc] [y78]'
paaa = re.compile(r'\[ab.*?\]')
print(paaa.findall(a))
This outputs:
['[abc]', '[abc]']
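For comparison, here is a small sketch that puts the greedy and lazy versions side by side, plus a negated character class as a third option (that variant is my addition, not something from the answers above). Note that findall only returns non-overlapping matches, so none of these produces the four-item list asked for in the clarification.

```python
import re

a = r'[abc] [abc] [y78]'

greedy = re.findall(r'\[ab.*\]', a)        # .* runs to the last ']'
lazy = re.findall(r'\[ab.*?\]', a)         # .*? stops at the first ']'
negated = re.findall(r'\[ab[^\]]*\]', a)   # [^\]]* cannot cross a ']'

print(greedy)   # ['[abc] [abc] [y78]']
print(lazy)     # ['[abc]', '[abc]']
print(negated)  # ['[abc]', '[abc]']
```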

Working With Python Regex [duplicate]

This question already has answers here:
Checking whole string with a regex
(5 answers)
Closed 5 years ago.
I am trying to compile a regex in Python but am having limited success. I am doing the following:
import re
pattern = re.compile("([a-zA-Z0-9_])([a-zA-Z0-9_-]*)")
m = pattern.match("gb,&^(#)")
if m:
    print(1)
else:
    print(2)
I am expecting the above to print 2, but instead it prints 1. The regex should match strings as follows:
The first letter is alphanumeric or an underscore. All characters after that can be alphanumeric, an underscore, or a dash and there can be 0 or more characters after the first.
I was thinking that this thing should fail as soon as it sees the comma, but it is not.
What am I doing wrong here?
import re
# Without $ at the end, match() accepts any string whose beginning satisfies the regex.
pattern = re.compile("^([a-zA-Z0-9_])([a-zA-Z0-9_-]*)$")
m = pattern.match("gb,&^(#)")
if m:
    print(1)
else:
    print(2)  # printed: the comma prevents a match up to the end of the string
# Your original pattern only requires the string to start with an alphanumeric character or an underscore.
pat = re.compile("^([a-zA-Z0-9_])([a-zA-Z0-9_-]*)")
if pat.match("_#"):
    print('match')  # printed: "_" alone satisfies the pattern
else:
    print('no match 1')
This should also help you understand the explanation by @Wiktor, with an example.
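Another option, sketched here under the assumption that Python 3.4+ is available, is re.fullmatch, which requires the whole string to match and so needs no ^ or $. The names NAME and is_valid are hypothetical and only used for illustration.

```python
import re

# Same pattern as in the question, without anchors (NAME is a hypothetical name).
NAME = re.compile(r'[a-zA-Z0-9_][a-zA-Z0-9_-]*')

def is_valid(s):
    """Return True only if the entire string fits the pattern."""
    return NAME.fullmatch(s) is not None

print(is_valid("gb,&^(#)"))  # False
print(is_valid("gb"))        # True
print(is_valid("-gb"))       # False
```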

My regular expression is not getting matched exactly in python [duplicate]

This question already has answers here:
Checking whole string with a regex
(5 answers)
Closed 6 years ago.
Here's my code...
import re
l=["chap","chap11","chapa","chapb","chapc","chap3","chap2","chapf","chap4","chap55","chapf","chap33","chap54","chapgk"]
for i in l:
    matchobj = re.match(r'chap[0-9]', i, re.M|re.I)
    if matchobj:
        print(i)
I used chap[0-9], so it should match only those strings that have exactly one digit after chap.
So I should get the following output:
chap3
chap2
chap4
but I am getting the following output...
chap11
chap3
chap2
chap4
chap55
chap33
chap54
re.match only anchors your pattern at the beginning of the string, so anything may follow the match. Append an end-of-string anchor '$' or a word boundary '\b' to your pattern:
matchobj=re.match(r'chap\d$',i,re.M|re.I)
# \d (digit) is shortcut for [0-9]
From the docs on re.match:
If zero or more characters at the beginning of string match the regular expression pattern, return a corresponding MatchObject instance.
You should add a dollar sign to the end of your regex. The dollar sign ($) matches the end of the string and, for future reference, the caret (^) matches the beginning.
import re
l=["chap","chap11","chapa","chapb","chapc","chap3","chap2","chapf","chap4","chap55","chapf","chap33","chap54","chapgk"]
for i in l:
    matchobj = re.match(r'chap[0-9]$', i, re.M|re.I)
    if matchobj:
        print(i)
Output
chap3
chap2
chap4
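One detail worth knowing, shown as a small sketch on a shortened version of the list: '$' also matches just before a trailing newline, whereas re.fullmatch (Python 3.4+) insists on the entire string. The names list and single_digit variable are illustrative only.

```python
import re

names = ["chap", "chap11", "chapa", "chap3", "chap2", "chap55"]

# fullmatch needs no '$': the whole string must be 'chap' plus one digit.
single_digit = [n for n in names if re.fullmatch(r'chap[0-9]', n, re.I)]
print(single_digit)  # ['chap3', 'chap2']

# '$' tolerates a trailing newline; fullmatch does not.
print(bool(re.match(r'chap[0-9]$', 'chap3\n')))     # True
print(bool(re.fullmatch(r'chap[0-9]', 'chap3\n')))  # False
```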

RegEx Python not working [duplicate]

This question already has answers here:
What is the difference between re.search and re.match?
(9 answers)
Closed 8 years ago.
My Reg-Ex pattern is not working, why?
import re
string = "../../example/tobematched/nonimportant.html"
pattern = r"example\/([a-z]+)\/"
test = re.match(pattern, string)
# None
http://www.regexr.com/39mpu
re.match() only matches from the beginning of the string. You need re.search(), which looks for the first location where the regular expression pattern produces a match and returns a corresponding MatchObject instance.
>>> import re
>>> s = "../../example/tobematched/nonimportant.html"
>>> re.search(r'example/([a-z]+)/', s).group(1)
'tobematched'
Try this.
test = re.search(pattern, string)
re.match only matches at the start of the string, and this string starts with "../", so it returns None.
Grab the result from test.group().
To give you the answer in short:
search ⇒ finds something anywhere in the string and returns a match object.
match ⇒ finds something at the beginning of the string and returns a match object.
That is the reason you have to use
foo = re.search(pattern, bar)
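Since both functions return None when nothing is found, calling .group() directly on the result can raise AttributeError. A minimal sketch of a guard, where first_group is a hypothetical helper name and not part of the re module:

```python
import re

def first_group(pattern, text):
    """Return group 1 of the first match anywhere in text, or None."""
    m = re.search(pattern, text)
    return m.group(1) if m else None

path = "../../example/tobematched/nonimportant.html"
print(first_group(r'example/([a-z]+)/', path))    # tobematched
print(first_group(r'example/([a-z]+)/', "none"))  # None
```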
