Working With Python Regex [duplicate] - python

This question already has answers here:
Checking whole string with a regex
(5 answers)
Closed 5 years ago.
I am trying to compile a regex in Python but am having limited success. I am doing the following:
import re
pattern = re.compile("([a-zA-Z0-9_])([a-zA-Z0-9_-]*)")
m = pattern.match("gb,&^(#)")
if m: print(1)
else: print(2)
I am expecting the code above to print 2, but instead it prints 1. The regex should match strings as follows:
The first letter is alphanumeric or an underscore. All characters after that can be alphanumeric, an underscore, or a dash and there can be 0 or more characters after the first.
I expected the match to fail as soon as it reaches the comma, but it does not.
What am I doing wrong here?

import re
# Without $ at the end, the pattern only has to match the beginning of the string.
pattern = re.compile("^([a-zA-Z0-9_])([a-zA-Z0-9_-]*)$")
m = pattern.match("gb,&^(#)")
if m:
    print(1)
else:
    print(2)
# This is your pattern: it matches any string that starts with an alphanumeric character or an underscore.
pat = re.compile("^([a-zA-Z0-9_])([a-zA-Z0-9_-]*)")
if pat.match("_#"):
    print('match')
else:
    print('no match 1')
This example should also help you understand the explanation by @Wiktor.
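If the goal is to validate the whole string, a minimal sketch (assuming Python 3.4 or later, where re.fullmatch is available) could look like the following. The names NAME_RE and is_valid_name are hypothetical and only used for illustration; the character classes are the ones from the question.

```python
import re

# Same character classes as in the question; fullmatch needs no ^ or $.
NAME_RE = re.compile(r"[a-zA-Z0-9_][a-zA-Z0-9_-]*")

def is_valid_name(s):
    """Hypothetical helper: True only if the entire string matches."""
    return NAME_RE.fullmatch(s) is not None

print(is_valid_name("gb,&^(#)"))  # False: the comma is not allowed
print(is_valid_name("gb-1_x"))    # True
print(is_valid_name("-gb"))       # False: a dash cannot come first
```

One difference worth knowing: a pattern ending in $ still accepts a single trailing newline ("gb\n"), while fullmatch rejects it.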


I'm stuck with regular expressions [duplicate]

This question already has answers here:
Regexp to remove specific number of occurrences of character only
(2 answers)
How to only match a single instance of a character?
(3 answers)
Closed 11 months ago.
I'm stuck with regular expressions in Python...
#!/usr/bin/python3
import re
combi="ABBAEAADCA"
one_a = len(re.findall('[^A](A)[^A]', combi))
print("A:"+str(one_a))
I am trying to make the variable one_a contain the number of A's that appear alone (3), but this pattern does not count those at the beginning and end of the string, so I tried:
one_a = len(re.findall('\A(A)[^A]', combi))
print("A ini:"+str(one_a))
one_a += len(re.findall('[^A](A)[^A]', combi))
print("A_cen:"+str(one_a))
one_a += len(re.findall('[^A](A)\Z', combi))
print("A_end:"+str(one_a))
But that didn't work either; in this particular case the variable should end up with the value 3.
I would appreciate knowing what I am missing or what mistake I am making.
Thank you very much
A negated character class [^A] matches (and consumes) a single character, so it needs an actual character to be present, whereas \A only asserts the start of the string.
To get the A's that stand alone, you can use negative lookarounds asserting that there is no A directly to the left and right:
(?<!A)A(?!A)
import re
combi="ABBAEAADCA"
one_a = len(re.findall('(?<!A)A(?!A)', combi))
print("A:"+str(one_a))
Output
A:3
You can combine start of string (^) and end of string ($) with regular character classes through the alternation (|) operator.
re.findall(r'(?:^|[^A])A(?:$|[^A])', combi)
This gives you all substrings where A is surrounded by start of string and end of string, start of string and not-A, not-A and end of string, or not-A and not-A.
>>> re.findall(r'(?:^|[^A])A(?:$|[^A])', combi)
['AB', 'BAE', 'CA']
Applying len to this list gives you the count of single A's.
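The two approaches agree on the string from the question, but they can differ on other inputs, because the character-class version consumes the neighbouring characters while the lookarounds do not. A small comparison sketch (the names LOOKAROUND, CONSUMING and count_single are made up for this illustration, and "ABA" is an extra test string that is not in the question):

```python
import re

LOOKAROUND = re.compile(r"(?<!A)A(?!A)")          # asserts neighbours, consumes only the A
CONSUMING = re.compile(r"(?:^|[^A])A(?:$|[^A])")  # consumes the neighbouring characters too

def count_single(pattern, text):
    """Hypothetical helper: number of non-overlapping matches of pattern in text."""
    return len(pattern.findall(text))

for text in ("ABBAEAADCA", "ABA"):
    print(text, count_single(LOOKAROUND, text), count_single(CONSUMING, text))
# ABBAEAADCA 3 3
# ABA 2 1
```

In "ABA" the first match of the consuming pattern is 'AB', so the B is no longer available as the left neighbour of the final A and that A is missed. The lookaround version does not have this problem.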

Need to find '$word;' pattern in string [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 2 years ago.
I have a big text file and I have to find all words that start with '$' and end with ';', like $word;.
import re
text = "$h;BREWERY$h_end;You've built yourself a brewery."
x = re.findall("$..;", text)
print(x)
I want my output to be ['$h;', '$h_end;']. How can I do that?
To find all words that start with '$' and end with ';', like $word;, I would do:
import re
text = "$h;BREWERY$h_end;You've built yourself a brewery."
result = re.findall(r'\$[^;]+;', text)
print(result)
Output:
['$h;', '$h_end;']
Note that $ needs to be escaped (\$) as it is one of the special characters (unescaped, it asserts the end of the string). Then I match 1 or more occurrences of anything but ; and finally ;.
You may use
\$\w+;
Details:
\$ - a $ char
\w+ - 1+ letters, digits, _ (=word) chars
; - a semi-colon.
Python demo:
import re
text = "$h;BREWERY$h_end;You've built yourself a brewery."
x = re.findall(r"\$\w+;", text)
print(x) # => ['$h;', '$h_end;']
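The two suggested patterns give the same result on the sample text but are not interchangeable. A short sketch of the difference, using a made-up input string (the "cost $5 today;" part is an assumption added for illustration, not part of the question):

```python
import re

# Hypothetical input: the first "$...;" span contains a space.
text = "cost $5 today; $h;BREWERY$h_end;"

# Unescaped $ is the end-of-string anchor, so nothing can follow it.
print(re.findall(r"$..;", text))       # []

# [^;]+ accepts anything up to the next semicolon, spaces included.
print(re.findall(r"\$[^;]+;", text))   # ['$5 today;', '$h;', '$h_end;']

# \w+ accepts only letters, digits and underscores.
print(re.findall(r"\$\w+;", text))     # ['$h;', '$h_end;']
```

Which one is appropriate depends on what counts as a "word" in the file; \w+ is the stricter choice.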

regular expression findall errors [duplicate]

This question already has answers here:
My regex is matching too much. How do I make it stop? [duplicate]
(5 answers)
Closed 4 years ago.
I run the following script:
import re
a = r'[abc] [abc] [y78]'
paaa = re.compile(r'\[ab.*]')
paaa.findall(a)
and I obtain
['[abc] [abc] [y78]']
Why is '[abc]' missing? '[abc]' clearly matches the pattern as well. Is there a bug in the Python 3 re.findall function?
Clarification:
Sorry, paaa should be paaa = re.compile(r'\[ab.*\]')
What I am looking for is something that will return
['[abc]', '[abc]', '[abc] [abc]', '[abc] [abc] [y78]']
Basically, any substring that matches the pattern.
The repeated . in \[ab.*] is greedy: it matches as many characters as it can, as long as they are followed by a ]. So everything between the first [ and the last ] is matched.
Use lazy repetition instead, with .*?:
import re
a = r'[abc] [abc] [y78]'
paaa = re.compile(r'\[ab.*?]')
print(paaa.findall(a))
Output:
['[abc]', '[abc]']
You can escape the right square bracket as well for clarity, and you should use the non-greedy quantifier *? in your regex:
import re
a = r'[abc] [abc] [y78]'
paaa = re.compile(r'\[ab.*?\]')
print(paaa.findall(a))
This outputs:
['[abc]', '[abc]']
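Neither greedy nor lazy findall returns overlapping matches, so it cannot produce "any substring that matches the pattern" as asked in the clarification. A brute-force sketch of that idea follows; the helper name all_matching_substrings is hypothetical, and the approach tries every start/end pair, so it is only reasonable for short strings.

```python
import re

def all_matching_substrings(pattern, text):
    """Hypothetical helper: every substring of text that the pattern matches in full."""
    compiled = re.compile(pattern)
    found = []
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            # pos/endpos restrict the match to the span [start, end)
            if compiled.fullmatch(text, start, end):
                found.append(text[start:end])
    return found

a = "[abc] [abc] [y78]"
print(all_matching_substrings(r"\[ab.*\]", a))
# ['[abc]', '[abc] [abc]', '[abc] [abc] [y78]', '[abc]', '[abc] [y78]']
```

Note that this also reports '[abc] [y78]', which starts at the second bracket and is not in the list from the question but does match the pattern.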

My regular expression is not getting matched exactly in python [duplicate]

This question already has answers here:
Checking whole string with a regex
(5 answers)
Closed 6 years ago.
Here's my code...
import re
l=["chap","chap11","chapa","chapb","chapc","chap3","chap2","chapf","chap4","chap55","chapf","chap33","chap54","chapgk"]
for i in l:
    matchobj=re.match(r'chap[0-9]',i,re.M|re.I)
    if matchobj:
        print(i)
Since the pattern is chap[0-9], I expected it to match only those strings that have exactly one digit after chap,
so I should get the following output:
chap3
chap2
chap4
but I am getting the following output...
chap11
chap3
chap2
chap4
chap55
chap33
chap54
match matches your pattern at the beginning of the string. Append e.g. end of string '$' or word boundary '\b' to your pattern:
matchobj=re.match(r'chap\d$',i,re.M|re.I)
# \d (digit) is shortcut for [0-9]
From the docs on re.match:
If zero or more characters at the beginning of string match the regular expression pattern, return a corresponding MatchObject instance.
You should add a dollar sign to the end of your regex. The dollar ($) means the end of the string and, for future reference, the caret (^) signifies the beginning.
import re
l=["chap","chap11","chapa","chapb","chapc","chap3","chap2","chapf","chap4","chap55","chapf","chap33","chap54","chapgk"]
for i in l:
    matchobj=re.match(r'chap[0-9]$',i,re.M|re.I)
    if matchobj:
        print(i)
Output
chap3
chap2
chap4
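As an alternative to appending $, a minimal sketch using re.fullmatch (assuming Python 3.4 or later) is shown below. The names CHAPTER_RE and single_digit are made up for this example, and the list is a shortened version of the one in the question.

```python
import re

names = ["chap", "chap11", "chapa", "chap3", "chap2", "chap4", "chap55"]

# fullmatch succeeds only if the whole string is "chap" plus exactly one digit.
CHAPTER_RE = re.compile(r"chap[0-9]", re.I)
single_digit = [n for n in names if CHAPTER_RE.fullmatch(n)]
print(single_digit)  # ['chap3', 'chap2', 'chap4']

# With re.M, $ also matches just before an embedded newline,
# so the $ version accepts this string while fullmatch does not.
print(bool(re.match(r"chap[0-9]$", "chap3\nextra", re.M | re.I)))  # True
print(bool(CHAPTER_RE.fullmatch("chap3\nextra")))                   # False
```

For single-line inputs like the ones in the question both approaches give the same result; the re.M flag is not needed there.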

Regex can't escape question mark? [duplicate]

This question already has an answer here:
match trailing slash with Python regex
(1 answer)
Closed 8 years ago.
I can't match the question mark character although I escaped it.
I tried escaping with multiple backslashes and also using re.escape().
What am I missing?
Code:
import re
text = 'test?'
result = ''
result = re.match(r'\?',text)
print ("input: "+text)
print ("found: "+str(result))
Output:
input: test?
found: None
re.match only matches a pattern at the beginning of the string; as the docs say:
If zero or more characters at the beginning of string match the regular expression pattern, return a corresponding match object.
So either use re.match with a pattern that also covers the leading characters:
>>> re.match(r'.*\?', text).group(0)
'test?'
or use re.search:
>>> re.search(r'\?', text).group(0)
'?'
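The difference can be checked side by side with a short sketch using only standard re calls and the text value from the question:

```python
import re

text = "test?"

# match is anchored at position 0, where the character is 't', not '?'.
print(re.match(r"\?", text))             # None

# search scans the whole string and finds the '?' at index 4.
m = re.search(r"\?", text)
print(m.group(0), m.start())             # ? 4

# re.escape produces the same escaped pattern automatically.
print(re.escape("?"))                    # \?
print(re.search(re.escape("?"), text).group(0))  # ?
```

So the escaping in the question was already correct; only the choice of re.match prevented the match.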
