python regular expression doesn't work [duplicate] - python

This question already has answers here:
What is the difference between re.search and re.match?
(9 answers)
Closed 6 years ago.
I have this line from Android's logcat:
"Could not find class android.app.Notification$Action$Builder, referenced from method b.a"
and I'm trying to apply a regular expression in Python to extract android.app.Notification$Action$Builder and b.a.
I use this code:
regexp = '\'([\w\d\.\$\:\-\[\]\<\>]+).*\s([\w\d\.\$\:\-\[\]\<\>]+)'
match = re.match(r'%s' % regexp, msg, re.M | re.I)
I tested the regular expression online and it works as expected, but it never matches in Python. Can someone give me some suggestions?
Thanks

re.match() matches only at the start of a string. Use re.search() instead; see "search() vs. match()" in the re documentation.
Note that you appear to misunderstand what a raw string literal is; r'%s' % string does not produce a special, different object. r'..' is just notation; it still produces a regular string object. Put the r on the original string literal instead (and if you use double quotes, you don't need to escape the single quote it contains):
regexp = r"'([\w\d\.\$\:\-\[\]\<\>]+).*\s([\w\d\.\$\:\-\[\]\<\>]+)"
For this specific regex it doesn't otherwise matter to the pattern produced.
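A quick check (a sketch assuming Python 3; the names are illustrative) that the r prefix is only notation: formatting a value into r'%s' yields an ordinary str equal to that value, a raw literal equals the same text written with doubled backslashes, and applying r'%s' afterwards cannot turn an already-interpreted escape such as a TAB back into backslash plus t.

```python
import re

pattern = r"'(\w+)"   # raw literal: the backslash is kept as-is
same = "'(\\w+)"      # ordinary literal with the backslash doubled

# r'%s' % value just produces an ordinary str equal to value.
formatted = r'%s' % pattern
print(type(formatted) is str)        # True: there is no separate "raw string" type
print(formatted == pattern == same)  # True

# A value that already contains a TAB stays a TAB; the r prefix cannot undo it.
tab_value = '\t'
still_tab = r'%s' % tab_value
print(len(still_tab))                # 1
```

This is why the r belongs on the literal where the backslashes are first written.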
Note that the pattern doesn't actually capture what you want to capture. Apart from the escaped ' at the start (which doesn't appear in your text at all), it won't work because it doesn't require dots and dollars to be part of the name. As a result, you capture Could and b.a instead, the first and last words in the text.
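The claim above can be checked directly. This sketch (Python 3, with the question's character class rewritten as a raw string named name for brevity) shows that the version with the leading quote finds nothing in this message, and that the version without it captures the first and last words.

```python
import re

msg = "Could not find class android.app.Notification$Action$Builder, referenced from method b.a"

# The character class from the question, written as a raw string.
name = r"[\w\d\.\$\:\-\[\]\<\>]+"

# With the leading quote: msg contains no "'", so there is no match at all.
with_quote = re.search(r"'(%s).*\s(%s)" % (name, name), msg)
print(with_quote)              # None

# Without the quote: the groups grab the first and last words, not the names.
without_quote = re.search(r"(%s).*\s(%s)" % (name, name), msg)
print(without_quote.groups())  # ('Could', 'b.a')
```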
I'd anchor on the words class and method instead, and perhaps require there to be dots in the class name:
regexp = r'class\s+((?:[\w\d\$\:\-\[\]\<\>]+\.)+[\w\d\$\:\-\[\]\<\>]+).*method ([\w\d.\$\:\-\[\]\<\>]+)'
Demo:
>>> import re
>>> regexp = r'class\s+((?:[\w\d\$\:\-\[\]\<\>]+\.)+[\w\d\$\:\-\[\]\<\>]+).*method ([\w\d.\$\:\-\[\]\<\>]+)'
>>> msg = "Could not find class android.app.Notification$Action$Builder, referenced from method b.a"
>>> re.search(regexp, msg, re.M | re.I)
<_sre.SRE_Match object at 0x1023072d8>
>>> re.search(regexp, msg, re.M | re.I).groups()
('android.app.Notification$Action$Builder', 'b.a')
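For contrast, a sketch of what re.match does with the same corrected pattern: because the message starts with "Could" rather than "class", the match anchored at position 0 fails, while re.search scans forward. re.M is dropped here since it only changes how ^ and $ behave and this pattern uses neither. On Python 3.7+ the match object's repr reads <re.Match object; ...> rather than the <_sre.SRE_Match object ...> shown above.

```python
import re

regexp = r'class\s+((?:[\w\d\$\:\-\[\]\<\>]+\.)+[\w\d\$\:\-\[\]\<\>]+).*method ([\w\d.\$\:\-\[\]\<\>]+)'
msg = "Could not find class android.app.Notification$Action$Builder, referenced from method b.a"

# re.match anchors at index 0, where the text starts with "Could", not "class".
print(re.match(regexp, msg, re.I))  # None

# re.search scans forward and finds "class ..." in the middle of the line.
m = re.search(regexp, msg, re.I)
print(m.groups())  # ('android.app.Notification$Action$Builder', 'b.a')
```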

Related

Don't raw strings treat backslashes as a literal character? [duplicate]

This question already has answers here:
Raw string and regular expression in Python
(4 answers)
Closed 2 years ago.
I have a question about the backslashes when using the re module in python. Consider the code:
import re
message = 'My phone number is 345-298-2372'
num_reg = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
match = num_reg.search(message)
print(match.group())
In the code above, a raw string is passed to re.compile(), but the backslash is still not treated as a literal character: \d remains a placeholder for a digit. Why use a raw string, then?
The documentation for re and raw strings answers this question well.
So in your example the argument passed to re.compile() still contains the original \. This is desirable with re because it has its own escape sequences, which may or may not conflict with Python's escape sequences. It is usually much more convenient to use r'foo' for regexes so that you don't have to double-escape the regex special characters.
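A concrete case where Python's and re's escapes collide is \b: in an ordinary literal it is the backspace character, while re uses \b for a word boundary. This short Python 3 sketch (sample text is made up) shows the ordinary literal silently matching nothing and the raw literal working.

```python
import re

text = 'My cat sat'

# In an ordinary literal, '\b' is the backspace character (chr(8)),
# so this pattern searches for backspaces and finds nothing.
print(re.search('\bcat\b', text))           # None

# In a raw literal the backslash survives, and re sees its word-boundary escape.
print(re.search(r'\bcat\b', text).group())  # cat

# The raw literal and the doubled-backslash literal are the same two-character string.
print(r'\d' == '\\d', len(r'\d'))           # True 2
```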
Without the raw string, for the escape character to make it to re for processing you would need to use:
import re
message = 'My phone number is 345-298-2372'
num_reg = re.compile('\\d\\d\\d-\\d\\d\\d-\\d\\d\\d\\d')
match = num_reg.search(message)
print(match.group())
You may also want to look at the regex quantifier/repetition syntax, as it generally makes patterns more readable:
import re
message = 'My phone number is 345-298-2372'
num_reg = re.compile(r'\d{3}-\d{3}-\d{4}')
match = num_reg.search(message)
print(match.group())

How can I use a variable as regex in python? [duplicate]

This question already has answers here:
How to use a variable inside a regular expression?
(12 answers)
Closed 4 years ago.
I use re to find a word in a file and store it as lattice_type.
Now I want to use the word stored in lattice_type to build another regex.
I tried using the name of the variable this way:
pnt_grp=re.match(r'+ lattice_type + (.*?) .*',line, re.M|re.I)
Here I search for lattice_type = and store group(1) in lattice_type:
latt=open(cell_file,"r")
for types in latt:
    line = types
    latt_type = re.match(r'lattice_type = (.*)', line, re.M|re.I)
    if latt_type:
        lattice_type=latt_type.group(1)
Here is where I want to use the variable containing the word to find it in another file, but I ran into problems:
pg=open(parameters,"r")
for lines in pg:
    line=lines
    pnt_grp=re.match(r'+ lattice_type + (.*?) .*',line, re.M|re.I)
    if pnt_grp:
        print(pnt_grp(1))
The r prefix is only needed when defining a string literal with a lot of backslashes, because both regex syntax and Python string syntax attach meaning to backslashes. r'..' is just an alternative syntax that makes it easier to work with regex patterns; you don't have to use r'..' raw string literals. See "The Backslash Plague" in the Python Regular Expression HOWTO for more information.
All of that means you certainly don't need the r prefix when you already have a string value. A regex pattern is just a string value, and you can use normal string formatting or concatenation techniques:
pnt_grp = re.match(lattice_type + '(.*?) .*', line, re.M|re.I)
I didn't use r in the string literal above, because there are no \ backslashes in the expression there to cause issues.
You may need to use the re.escape() function on your lattice_type value, if there is a possibility of that value containing regular expression meta-characters such as . or ? or [, etc. re.escape() escapes such metacharacters so that only literal text is matched:
pnt_grp = re.match(re.escape(lattice_type) + '(.*?) .*', line, re.M|re.I)
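A self-contained sketch of the same idea (Python 3.6+ for the rf-string). The helper find_after, the sample value 'a.b', and the line format '<value>  <token>' are all hypothetical, chosen so the escaped and unescaped cases differ: without re.escape, the . in the value matches any character. Note that it reads the captured text with .group(1); a match object is not callable.

```python
import re

def find_after(value, line, escape=True):
    """Hypothetical helper: return the token after `value` at the start of `line`, or None.

    Assumes lines look like '<value>  <token> ...'.
    """
    prefix = re.escape(value) if escape else value
    # rf-string: raw for the regex backslashes, f for interpolating the prefix.
    # Literal braces in such a pattern (e.g. a {3} quantifier) would need doubling.
    m = re.match(rf'{prefix}\s+(\S+)', line, re.I)
    return m.group(1) if m else None

print(find_after('a.b', 'a.b  first'))                 # first
print(find_after('a.b', 'axb  second'))                # None: the '.' was escaped
print(find_after('a.b', 'axb  second', escape=False))  # second: '.' matched 'x'
```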

Search and replace --.sub(replacement, string[, count=0])-does not replace special character \ [duplicate]

This question already has an answer here:
Search and replace --.sub(replacement, string[, count=0])-does not work with special characters
(1 answer)
Closed 6 years ago.
I have a string and I want to replace special characters with HTML code. The code is as follows:
s= '\nAxes.axvline\tAdd a vertical line across the axes.\nAxes.axvspan\tAdd a vertical span (rectangle) across the axes.\nSpectral\nAxes.acorr'
p = re.compile('(\\t)')
s= p.sub('<\span>', s)
p = re.compile('(\\n)')
s = p.sub('<p>', s)
This code replaces \t in the string with <\\span> rather than with <\span> as asked by the code.
I have tested the regex pattern on regex101.com and it works. I cannot understand why the code is not working.
My objective is to use the output as HTML code. The '<\\span>' string is not recognized as a tag by HTML, so it is useless. I must find a way to replace the \t in the text with <\span> and not with <\\span>. Is this impossible in Python? I posted a similar question earlier, but that question did not specifically address the problem I raise here, nor did it make clear that my objective is to use the corrected text as HTML code. The answer I received did not work, possibly because the person responding was unaware of these facts.
No, it does work. It's just that you printed the repr of it. Were you testing this in the python shell?
In the python shell:
>>> '\\'
'\\'
>>> print('\\')
\
>>> print(repr('\\'))
'\\'
>>>
The shell displays the returned value (if it's not None) using the repr function. To avoid this, use the print function: it returns None (so the shell shows nothing extra) and prints the str() form of its argument instead of its repr.
Note that in this case, you don't need regex. You just do a simple replace:
s = s.replace('\n', '<p>').replace('\t', r'<\span>')
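A sketch (Python 3.7+, shortened sample string) of three ways to get exactly one backslash into the output. re.sub processes backslash escapes in a string replacement, so the regex level needs \\ to emit one literal backslash; a callable replacement is used verbatim; str.replace needs no regex at all. On Python 3.7+ an unknown escape such as \s in a replacement string raises re.error instead of being passed through.

```python
import re

s = 'Axes.axvline\tAdd a vertical line across the axes.\nAxes.acorr'

# re.sub interprets "\\" in the replacement as one literal backslash.
via_regex = re.sub(r'\t', r'<\\span>', s)

# A callable replacement is inserted verbatim, with no escape processing.
via_callable = re.sub(r'\t', lambda m: r'<\span>', s)

# No regex needed for a fixed substring.
via_replace = s.replace('\t', r'<\span>')

print(via_regex == via_callable == via_replace)  # True
print(repr(via_regex))  # repr displays the backslash doubled
print(via_regex)        # print shows the single backslash actually stored

# Python 3.7+: "\s" in a replacement string is an unknown escape and an error.
try:
    re.sub(r'\t', r'<\span>', s)
except re.error as exc:
    print('re.error:', exc)
```

As an aside, an HTML closing tag is written with a forward slash, </span>, which needs no escaping at all.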
And, for your regex, you should prefix your strings with r:
compiled_regex = re.compile(r'[a-z]+\s?') # for example
matchobj = compiled_regex.search('in this normal string')
othermatchobj = compiled_regex.search('in this other string')
Note that if you're not using your compiled regex more than once, you can do this in one step:
matchobj = re.search(r'[a-z]+\s?', '<- the pattern -> the string to search in')
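A runnable version of the two forms above (Python 3). The module-level re.search compiles the pattern internally and the re module caches recently used patterns, so the one-step form is fine for occasional use, while re.compile makes reuse explicit.

```python
import re

# Compile once, reuse for several searches.
compiled_regex = re.compile(r'[a-z]+\s?')
first = compiled_regex.search('in this normal string')
second = compiled_regex.search('in this other string')

# One-step form: the pattern is compiled (and cached) by re.search itself.
one_step = re.search(r'[a-z]+\s?', '<- the pattern -> the string to search in')

print(repr(first.group()), repr(second.group()), repr(one_step.group()))
# 'in ' 'in ' 'the '
```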
Regex are super powerful though. Don't give up!

matching parentheses in python regular expression [duplicate]

This question already has answers here:
What is the difference between re.search and re.match?
(9 answers)
Closed 1 year ago.
I have something like
store(s) ending a line, like "1 store(s)".
I want to match it using Python regular expression.
I tried something like re.match('store\(s\)$', text)
but it's not working.
This is the code I tried:
import re
s = '1 store(s)'
if re.match('store\(s\)$', s):
    print('match')
In more or less direct reply to your comment
Try this
import re
s = '1 store(s)'
if re.match('store\(s\)$',s):
    print('match')
The solution is to use re.search instead of re.match: the latter only tries to match the regexp at the beginning of the string, while the former looks for a substring anywhere in the string that matches the expression.
Python offers two different primitive operations based on regular expressions: match checks for a match only at the beginning of the string, while search checks for a match anywhere in the string (this is what Perl does by default).
That's straight from the docs, but it does come up a lot.
Have you considered re.match(r'(.*)store\(s\)$', text)?
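The options discussed in these answers, side by side on the question's string (Python 3.4+ for re.fullmatch, which is included only to show what "match the whole string" actually looks like, in contrast to re.match).

```python
import re

s = '1 store(s)'

print(re.match(r'store\(s\)$', s))                      # None: match anchors at index 0
print(re.search(r'store\(s\)$', s).group())             # store(s)
print(repr(re.match(r'(.*)store\(s\)$', s).group(1)))   # '1 '
print(re.fullmatch(r'\d+ store\(s\)', s) is not None)   # True: whole string must match
```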

Python regex confused by brackets ([])? [duplicate]

This question already has answers here:
What is the difference between re.search and re.match?
(9 answers)
Closed 3 years ago.
Is python confused, or is the programmer?
I've got a lot of lines of this:
some_dict[0x2a] = blah
some_dict[0xab] = blah, blah
What I'd like to do is to convert the hex codes into all uppercase to look like this:
some_dict[0x2A] = blah
some_dict[0xAB] = blah, blah
So I decided to call in the regular expressions. Normally, I'd just do this using my editor's regexps (xemacs), but the need to convert to uppercase pushes one into Lisp. ....ok... how about Python?
So I whip together a short script which doesn't work. I've condensed the code into this example, which doesn't work either. It looks to me like Python's regexps are getting confused by the brackets in the code. Is it me or Python?
import fileinput
import sys
import re
this = "0x2a"
that = "[0x2b]"
for line in [this, that]:
    found = re.match("0x([0-9,a-f]{2})", line)
    if found:
        print("Found: %s" % found.group(0))
(I'm using the () grouping constructs so I don't capitalize the 'x' in '0x'.)
This example only prints the 0x2a value, not the 0x2b. Is this correct behavior?
I can easily work around this by changing the match expression to:
found = re.match("\[0x([0-9,a-f]{2}\])", line)
but I'm just wondering if someone can give me some insight into what's going on here.
Running Python 2.6.2 on Linux.
re.match matches from the start of the string. Use re.search instead to "match the first occurrence anywhere in the string". The key passage is the "search() vs. match()" section of the re documentation.
I don't think you need the comma within the brackets. That is, this:
found = re.match("0x([0-9,a-f]{2})", line)
also lets a literal comma count as one of the two hex digits, so it can match text you don't intend. I think you want:
found = re.match("0x([0-9a-f]{2})", line)
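A small demonstration with made-up input (Python 3). It uses re.search so that the separate start-of-string issue with re.match doesn't hide the effect: inside [...] a comma is just another allowed character.

```python
import re

# With the comma in the class, ",," counts as two "hex digits".
with_comma = re.search("0x([0-9,a-f]{2})", "x = 0x,,")
print(with_comma.group(1))  # ,,

# Without it, the same text no longer matches.
without_comma = re.search("0x([0-9a-f]{2})", "x = 0x,,")
print(without_comma)        # None
```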
You're using a partial pattern, so you can't use re.match, which only matches at the beginning of the input string. You need re.search, which can find the pattern anywhere in the string.
>>> that = "[0x2b]"
>>> m = re.search("0x([0-9,a-f]{2})", that)
>>> m.group()
'0x2b'
You'll want to change
found = re.match("0x([0-9,a-f]{2})", line)
to
found = re.search("0x([0-9,a-f]{2})", line)
re.match will match only from the beginning of the string, which fails in the "[0x2b]" case.
re.search will match anywhere in the string, and thus ignore the leading "[" in the "[0x2b]" case.
See search() vs. match() for details.
You want to use re.search; the "search() vs. match()" section of the re documentation explains why.
If you use re.sub, and pass a callable as the replacement string, it will also do the uppercasing for you:
>>> that = 'some_dict[0x2a] = blah'
>>> m = re.sub("0x([0-9,a-f]{2})", lambda x: "0x"+x.group(1).upper(), that)
>>> m
'some_dict[0x2A] = blah'
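Putting the pieces together, a minimal sketch of the whole conversion (Python 3; the function name upper_hex is hypothetical). It uses re.sub with a callable, so re.match vs. re.search never comes up, and drops the stray comma from the character class. Only the two digits are uppercased, so the 0x prefix stays lowercase.

```python
import re

# Two lowercase hex digits after "0x"; [0-9a-f] without a stray comma.
HEX = re.compile(r'0x([0-9a-f]{2})')

def upper_hex(line):
    """Uppercase the digits of every 0xNN literal in line, keeping the '0x' prefix."""
    return HEX.sub(lambda m: '0x' + m.group(1).upper(), line)

lines = [
    'some_dict[0x2a] = blah',
    'some_dict[0xab] = blah, blah',
]
for line in lines:
    print(upper_hex(line))
# some_dict[0x2A] = blah
# some_dict[0xAB] = blah, blah
```

To rewrite files in place, one option (untested here) is to iterate over fileinput.input(files, inplace=True), which redirects print() output back into the file being read; since those lines keep their trailing newline, print with end=''.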
