Python regex confused by brackets ([])? [duplicate] - python

This question already has answers here:
What is the difference between re.search and re.match?
(9 answers)
Closed 3 years ago.
Is python confused, or is the programmer?
I've got a lot of lines of this:
some_dict[0x2a] = blah
some_dict[0xab] = blah, blah
What I'd like to do is to convert the hex codes into all uppercase to look like this:
some_dict[0x2A] = blah
some_dict[0xAB] = blah, blah
So I decided to call in the regular expressions. Normally, I'd just do this using my editor's regexps (xemacs), but the need to convert to uppercase pushes one into Lisp. ....ok... how about Python?
So I whip together a short script which doesn't work. I've condensed the code into this example, which doesn't work either. It looks to me like Python's regexps are getting confused by the brackets in the code. Is it me or Python?
import fileinput
import sys
import re
this = "0x2a"
that = "[0x2b]"
for line in [this, that]:
    found = re.match("0x([0-9,a-f]{2})", line)
    if found:
        print("Found: %s" % found.group(0))
(I'm using the () grouping constructs so I don't capitalize the 'x' in '0x'.)
This example only prints the 0x2a value, not the 0x2b. Is this correct behavior?
I can easily work around this by changing the match expression to:
found = re.match("\[0x([0-9,a-f]{2}\])", line)
but I'm just wondering if someone can give me some insight into what's going on here.
Running Python 2.6.2 on Linux.

re.match matches from the start of the string. Use re.search instead to "match the first occurrence anywhere in the string". The key bit about this in the docs is here.

I don't think you need the comma within the brackets. i.e.:
found = re.match("0x([0-9,a-f]{2})", line)
tells python to look for commas which it might be mistakenly matching. I think you want
found = re.match("0x([0-9a-f]{2})", line)
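To see what the stray comma does, here is a small check. The sample input "0x,," is made up for illustration. Inside a character class a comma is an ordinary literal character, not a separator, so [0-9,a-f] also accepts commas.

```python
import re

# Inside [...] a comma is an ordinary character, not a separator.
with_comma = re.compile(r"0x([0-9,a-f]{2})")
without_comma = re.compile(r"0x([0-9a-f]{2})")

sample = "0x,,"  # made-up input that is clearly not a hex literal

print(bool(with_comma.search(sample)))          # True: the commas are accepted
print(bool(without_comma.search(sample)))       # False
print(without_comma.search("[0x2b]").group(1))  # 2b
```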

You're using a partial pattern, so you can't use re.match, which only matches at the very start of the input string. You need to use re.search, which scans the string and returns the first match wherever it occurs.
>>> that = "[0x2b]"
>>> m = re.search("0x([0-9,a-f]{2})", that)
>>> m.group()
'0x2b'

You'll want to change
found = re.match("0x([0-9,a-f]{2})", line)
to
found = re.search("0x([0-9,a-f]{2})", line)
re.match will match only from the beginning of the string, which fails in the "[0x2b]" case.
re.search will match anywhere in the string, and thus ignore the leading "[" in the "[0x2b]" case.
See search() vs. match() for details.
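A quick sketch of the difference, using the two strings from the question. re.fullmatch is included only for comparison, since it is the call that requires the whole string to match. It needs Python 3.4 or later, so it is not available on the asker's Python 2.6.

```python
import re

pattern = re.compile(r"0x([0-9a-f]{2})")

for text in ["0x2a", "[0x2b]"]:
    at_start = pattern.match(text)   # must match at position 0
    anywhere = pattern.search(text)  # scans forward for the first match
    whole = pattern.fullmatch(text)  # must match the entire string
    print(text, bool(at_start), bool(anywhere), bool(whole))

# 0x2a True True True
# [0x2b] False True False
```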

You want to use re.search. This explains why.

If you use re.sub and pass a callable as the replacement, it will also do the uppercasing for you:
>>> that = 'some_dict[0x2a] = blah'
>>> m = re.sub("0x([0-9,a-f]{2})", lambda x: "0x"+x.group(1).upper(), that)
>>> m
'some_dict[0x2A] = blah'
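Building on that, here is a minimal sketch for the original task of converting many lines. The helper name uppercase_hex is hypothetical, and the pattern assumes any run of hex digits after 0x should be uppercased, not just two.

```python
import re

# Assumption: every "0x" followed by hex digits is a literal to uppercase.
HEX_LITERAL = re.compile(r"0x([0-9a-fA-F]+)")

def uppercase_hex(line):
    """Uppercase the digits of every 0x... literal, leaving the 'x' alone."""
    return HEX_LITERAL.sub(lambda m: "0x" + m.group(1).upper(), line)

lines = ["some_dict[0x2a] = blah", "some_dict[0xab] = blah, blah"]
for line in lines:
    print(uppercase_hex(line))

# some_dict[0x2A] = blah
# some_dict[0xAB] = blah, blah
```

Because the replacement is a function, the "0x" prefix is rebuilt by hand and only the captured digits are changed.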

Related

Extract Digits from String [duplicate]

This question already has an answer here:
How can I find all matches to a regular expression in Python?
(1 answer)
Closed 4 years ago.
I'm trying to extract digits from a unicode string. The strings look like "raised by 64 backers" and "raised by 2062 backers". I tried many different things, but the following code is the only one that actually worked.
backers = browser.find_element_by_xpath('//span[@gogo-test="backers"]').text
match = re.search(r'(\d+)', backers)
print(match.group(0))
Since I'm not sure how often I'll need to extract substrings from strings, and I don't want to be creating tons of extra variables and lines of code, I'm wondering if there's a shorter way to accomplish this?
I know I could do something like this.
def extract_digits(string):
    return re.search(r'(\d+)', string)
But I was hoping for a one liner, so that I could structure the script without using an additional function like so.
backers = ...
title = ...
description = ...
...
I'd like to do something similar to the following, but it doesn't work as intended.
backers = re.search(r'(\d+)', browser.find_element_by_xpath('//span[@gogo-test="backers"]').text)
And the output looks like this.
<_sre.SRE_Match object at 0x000000000542FD50>
Any way to deal with this?!
As an option, you can skip regex and use Python's built-in str.isdigit() (no additional imports needed):
digit = [sub for sub in browser.find_element_by_xpath('//span[@gogo-test="backers"]').text.split() if sub.isdigit()][0]
You can try this:
number = re.findall(r'\b\d+\b', 'raised by 64 backers')
output:
['64']
So the method could be like this:
def extract_digits(string):
    return re.findall(r'\b\d+\b', string)
EDIT: since you want everything in one line, try this:
import re
backers = re.findall(r'\b\d+\b', browser.find_element_by_xpath('//span[@gogo-test="backers"]').text)[0]
PS:
search ⇒ find something anywhere in the string and return a match object
findall ⇒ find something anywhere in the string and return a list.
Documentation:
Scan through string looking for the first location where the regular
expression pattern produces a match, and return a corresponding
MatchObject instance. Return None if no position in the string matches
the pattern; note that this is different from finding a zero-length
match at some point in the string.
Documentation link: docs.python.org/2/library/re.html
So to do the same with search use this:
backers = re.search(r'(\d+)', browser.find_element_by_xpath('//span[@gogo-test="backers"]').text).group(0)
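One caveat with both one-liners: if the text contains no digits, re.search returns None (so .group(0) raises AttributeError) and re.findall returns an empty list (so [0] raises IndexError). A small sketch of a guard, assuming a helper is acceptable after all. The name first_number and its default parameter are hypothetical.

```python
import re

def first_number(text, default=None):
    """Return the first run of digits in text as an int, or default if none."""
    match = re.search(r"\d+", text)
    return int(match.group(0)) if match else default

print(first_number("raised by 64 backers"))    # 64
print(first_number("raised by 2062 backers"))  # 2062
print(first_number("no backers yet"))          # None
```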

Search and replace --.sub(replacement, string[, count=0])-does not replace special character \ [duplicate]

This question already has an answer here:
Search and replace --.sub(replacement, string[, count=0])-does not work with special characters
(1 answer)
Closed 6 years ago.
I have a string and I want to replace special characters with html code. The code is as follows:
s= '\nAxes.axvline\tAdd a vertical line across the axes.\nAxes.axvspan\tAdd a vertical span (rectangle) across the axes.\nSpectral\nAxes.acorr'
p = re.compile('(\\t)')
s= p.sub('<\span>', s)
p = re.compile('(\\n)')
s = p.sub('<p>', s)
This code replaces \t in the string with <\\span> rather than with <\span> as asked by the code.
I have tested the regex pattern on regex101.com and it works. I cannot understand why the code is not working.
My objective is to use the output as HTML code. The '<\\span>' string is not recognized as a tag by HTML and thus it is useless. I must find a way to replace the \t in the text with <\span> and not with <\\span>. Is this impossible in Python? I posted a similar question earlier, but that question did not specifically address the problem that I raise here, nor did it make clear my objective to use the corrected text as HTML code. The answer I received did not work, possibly because the person responding was unaware of these facts.
No, it does work. It's just that you printed the repr of it. Were you testing this in the python shell?
In the python shell:
>>> '\\'
'\\'
>>> print('\\')
\
>>> print(repr('\\'))
'\\'
>>>
The shell displays the returned value (if it's not None) using the repr function. To avoid this, you can use the print function, which returns None (so the shell displays nothing extra) and doesn't call repr on the string.
Note that in this case, you don't need regex. You just do a simple replace:
s = s.replace('\n', '<p>').replace('\t', '<\span>')
And, for your regex, you should prefix your strings with r:
compiled_regex = re.compile(r'[a-z]+\s?') # for example
matchobj = compiled_regex.search('in this normal string')
othermatchobj = compiled_regex.search('in this other string')
Note that if you're not using your compile regex more than once, you can do this in one step
matchobj = re.search(r'[a-z]+\s?', '<- the pattern -> the string to search in')
Regex are super powerful though. Don't give up!
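To make the repr point concrete, here is a short sketch using part of the string from the question. It uses str.replace rather than re.sub, partly because recent Python versions (3.7 and later) reject a backslash followed by an ASCII letter in an re.sub replacement string as a bad escape.

```python
s = "Axes.axvline\tAdd a vertical line across the axes."
out = s.replace("\t", "<\\span>")  # "<\\span>" holds a single backslash

print(len("<\\span>"))  # 7: '<', one backslash, 's', 'p', 'a', 'n', '>'
print(repr(out))        # repr doubles the backslash for display only
print(out)              # print shows the actual characters
```

The doubled backslash appears only in the repr form. The string itself contains one.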

python regular expression doesn't work [duplicate]

This question already has answers here:
What is the difference between re.search and re.match?
(9 answers)
Closed 6 years ago.
I have this android logcat's log:
"Could not find class android.app.Notification$Action$Builder, referenced from method b.a"
and I'm trying to apply a regular expression, in python, to extract android.app.Notification$Action$Builder and b.a.
I use this code:
regexp = '\'([\w\d\.\$\:\-\[\]\<\>]+).*\s([\w\d\.\$\:\-\[\]\<\>]+)'
match = re.match(r'%s' % regexp, msg, re.M | re.I)
I tested the regular expression online and it works as expected, but it never matches in Python. Can someone give me some suggestions?
Thanks
re.match() matches only at the start of a string. Use re.search() instead, see match() vs. search().
Note that you appear to misunderstand what a raw string literal is; r'%s' % string does not produce a special, different object. r'..' is just notation, it still produces a regular string object. Put the r on the original string literal instead (and if you use double quotes you do not need to escape the single quote it contains):
regexp = r"'([\w\d\.\$\:\-\[\]\<\>]+).*\s([\w\d\.\$\:\-\[\]\<\>]+)"
For this specific regex it doesn't otherwise matter to the pattern produced.
Note that the pattern doesn't actually capture what you want to capture. Apart from the escaped ' at the start (which doesn't appear in your text at all), it won't work because it doesn't require dots and dollars to be part of the name. As such, you would capture Could and b.a instead, the first and last words in the message.
I'd anchor on the words class and method instead, and perhaps require there to be dots in the class name:
regexp = r'class\s+((?:[\w\d\$\:\-\[\]\<\>]+\.)+[\w\d\$\:\-\[\]\<\>]+).*method ([\w\d.\$\:\-\[\]\<\>]+)'
Demo:
>>> import re
>>> regexp = r'class\s+((?:[\w\d\$\:\-\[\]\<\>]+\.)+[\w\d\$\:\-\[\]\<\>]+).*method ([\w\d.\$\:\-\[\]\<\>]+)'
>>> msg = "Could not find class android.app.Notification$Action$Builder, referenced from method b.a"
>>> re.search(regexp, msg, re.M | re.I)
<_sre.SRE_Match object at 0x1023072d8>
>>> re.search(regexp, msg, re.M | re.I).groups()
('android.app.Notification$Action$Builder', 'b.a')
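The same anchoring idea can be sketched with named groups, which makes the extraction a little easier to read. The group names cls and method and the simplified character class [\w.$] are assumptions for illustration. The simplified class drops the extra punctuation (: - [ ] < >) that the pattern above allows.

```python
import re

# Assumption: class and method names consist of word characters, dots and dollars.
LOG_PATTERN = re.compile(r"class\s+(?P<cls>[\w.$]+).*method\s+(?P<method>[\w.$]+)")

msg = ("Could not find class android.app.Notification$Action$Builder, "
       "referenced from method b.a")

found = LOG_PATTERN.search(msg)
if found:
    print(found.group("cls"))     # android.app.Notification$Action$Builder
    print(found.group("method"))  # b.a
```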

Python split by regular expression

In Python, I am extracting emails from a string like so:
split = re.split(" ", string)
emails = []
pattern = re.compile("^[a-zA-Z0-9_\.-]+@[a-zA-Z0-9-]+.[a-zA-Z0-9-\.]+$")
for bit in split:
    result = pattern.match(bit)
    if result is not None:
        emails.append(bit)
And this works, as long as there is a space in between the emails. But this might not always be the case. For example:
Hello, foo@foo.com
would return:
foo@foo.com
but, take the following string:
I know my best friend mailto:foo@foo.com!
This would return null. So the question is: how can I make it so that a regex is the delimiter to split? I would want to get
foo@foo.com
in all cases, regardless of punctuation next to it. Is this possible in Python?
By "splitting by regex" I mean that if the program encounters the pattern in a string, it will extract that part and put it into a list.
I'd say you're looking for re.findall:
>>> email_reg = re.compile(r'[a-zA-Z0-9_.-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+')
>>> email_reg.findall('I know my best friend mailto:foo@foo.com!')
['foo@foo.com']
Notice that findall can handle more than one email address:
>>> email_reg.findall('Text text foo@foo.com, text text, baz@baz.com!')
['foo@foo.com', 'baz@baz.com']
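One edge case worth checking: because the last character class allows a dot, a sentence-ending period right after an address would be swallowed into the match. A possible variation, assuming addresses use the usual @ separator and that each domain label must contain at least one character after its dot, is sketched below. The helper name extract_emails is hypothetical, and this is not a full email validator.

```python
import re

# Each domain label must follow a dot and be non-empty, so a trailing
# period or other punctuation is left out of the match.
EMAIL = re.compile(r"[A-Za-z0-9_.+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

def extract_emails(text):
    """Return every email-like substring found in text."""
    return EMAIL.findall(text)

print(extract_emails("I know my best friend mailto:foo@foo.com!"))
print(extract_emails("Write to foo@foo.com. Or baz@baz.com!"))
```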
Use re.search or re.findall.
You also need to escape your expression properly (. needs to be escaped outside of character classes, not inside) and remove/replace the anchors ^ and $ (for example with \b), e.g.:
r"\b[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+\b"
The problem I see in your regex is your use of ^, which matches the start of the string, and $, which matches the end of the string. If you remove them and run it with your sample test cases, it will work:
>>> re.findall(r"[A-Za-z0-9._-]+@[A-Za-z0-9-]+\.[A-Za-z0-9.-]+", "I know my best friend mailto:foo@foo.com!")
['foo@foo.com']
>>> re.findall(r"[A-Za-z0-9._-]+@[A-Za-z0-9-]+\.[A-Za-z0-9.-]+", "Hello, foo@foo.com")
['foo@foo.com']
>>>

matching parentheses in python regular expression [duplicate]

This question already has answers here:
What is the difference between re.search and re.match?
(9 answers)
Closed 1 year ago.
I have a line that ends with
store(s)
for example "1 store(s)".
I want to match it using Python regular expression.
I tried something like re.match('store\(s\)$', text)
but it's not working.
This is the code I tried:
import re
s = '1 store(s)'
if re.match('store\(s\)$', s):
    print('match')
In more or less direct reply to your comment
Try this
import re
s = '1 store(s)'
if re.search(r'store\(s\)$', s):
    print('match')
The solution is to use re.search instead of re.match: the latter only tries to match at the beginning of the string, while the former looks for a substring anywhere in the string that matches the expression.
Python offers two different primitive
operations based on regular
expressions: match checks for a match
only at the beginning of the string,
while search checks for a match
anywhere in the string (this is what
Perl does by default)
Straight from the docs, but it does come up a lot.
Have you considered re.match('(.*)store\(s\)$', text)?
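If the text to match is a fixed literal like store(s), re.escape can build the escaped pattern so the parentheses do not have to be escaped by hand. A small sketch with the string from the question (the variable names are illustrative):

```python
import re

literal = "store(s)"
pattern = re.escape(literal) + "$"  # escape the parentheses, keep the end anchor

s = "1 store(s)"
print(bool(re.match(pattern, s)))   # False: the string starts with "1 "
print(bool(re.search(pattern, s)))  # True: found at the end of the string
```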
