I want to extract (abc)(def) using a regex,
but I ended up with the error below.
import re
def main():
    str = "-->(abc)(def)<--"
match = re.search("\-->(.*?)\<--" , str).group(1)
print match
The error is:
Traceback (most recent call last):
File "test.py", line 7, in <module>
match = re.search("\-->(.*?)\<--" , str).group()
File "/usr/lib/python2.7/re.py", line 146, in search
return _compile(pattern, flags).search(string)
TypeError: expected string or buffer
Corrected:
import re
def main():
    my_string = "-->(abc)(def)<--"
    # A raw string avoids the unnecessary backslash escapes
    match = re.search(r"-->(.*?)<--", my_string).group(1)
    print(match)
    # (abc)(def)
main()
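Note that I renamed str to my_string: str is the name of a built-in type, and built-in names should not be reused for your own variables. Judging by the traceback ("in <module>"), the assignment to str happened inside main() while re.search was called at module level, so it received the built-in str type instead of a string, hence the TypeError. You may still be able to optimize the regex with lookarounds; the lazy star (.*?) can be very inefficient on some inputs.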
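As a minimal sketch of the same extraction with a guard for the no-match case (the helper name extract_between_arrows is hypothetical, not from the question), and a demonstration of how passing the built-in str type triggers the TypeError:

```python
import re


def extract_between_arrows(text):
    """Return the text between '-->' and '<--', or None if absent."""
    match = re.search(r"-->(.*?)<--", text)
    return match.group(1) if match else None


print(extract_between_arrows("-->(abc)(def)<--"))  # (abc)(def)

# Passing the built-in type `str` instead of a string reproduces the error.
try:
    re.search(r"-->(.*?)<--", str)
except TypeError as exc:
    print("TypeError:", exc)
```

Returning None instead of calling .group(1) directly avoids an AttributeError when the markers are missing.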
Related
I am trying to parse the output from an SSH session using the Paramiko module. Paramiko's channel.recv() returns the output as bytes. I then converted it to a string using bytes.decode("utf-8"). No matter what encoding I use, the regex call always raises a "TypeError: expected string or bytes-like object" exception.
import re
bytes = b"optical temp=10950"
bytes = bytes.decode("utf-8")
pattern = re.compile("(?<=temp=).*")
temp = re.search(bytes, pattern)
Traceback:
Traceback (most recent call last):
File "main.py", line 7, in <module>
temp = re.search(bytes, pattern)
File "/usr/lib/python3.8/re.py", line 201, in search
return _compile(pattern, flags).search(string)
TypeError: expected string or bytes-like object
Your code is almost functional.
re.search takes the pattern first, then the string to be searched:
import re
bytes = b"optical temp=10950"
bytes = bytes.decode("utf-8")
pattern = re.compile("(?<=temp=).*")
temp = re.search(pattern, bytes)
#OR
temp = pattern.search(bytes)
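A minimal sketch of the corrected call wrapped in a helper, assuming the value after temp= is always an integer (the name read_temperature and the \d+ pattern are assumptions; the byte string stands in for what channel.recv() would return):

```python
import re

# Compile once; the lookbehind keeps "temp=" out of the match itself.
TEMP_PATTERN = re.compile(r"(?<=temp=)\d+")


def read_temperature(text):
    """Return the integer after 'temp=', or None if it is not present."""
    match = TEMP_PATTERN.search(text)  # pattern object first, string as argument
    return int(match.group()) if match else None


raw = b"optical temp=10950"  # stand-in for the bytes read from the channel
print(read_temperature(raw.decode("utf-8")))  # 10950
```

Calling the search method on the compiled pattern removes any doubt about argument order.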
I have a string that contains commas both inside and outside of a parentheses block:
foo(bat,foo),bat
How can I use regex to replace the comma not inside parentheses?
foo(bat,foo)bat
Do you really want to use re, or is any way of achieving your goal OK?
In the latter case, here is a way to do it:
mystring = 'foo(bat,foo),bat'
''.join(si + ',' if '(' in si else si for si in mystring.split(','))
#'foo(bat,foo)bat'
Assuming that there are no nested parentheses and no invalid pairings of parentheses, a comma is outside a pair of parentheses if and only if an even number of ( and ) symbols follow it. Equivalently, it is not followed by a ) without a ( in between, which a negative lookahead can check:
,(?![^(]*\))
If there are nested parentheses, it becomes a context-free grammar and you cannot capture this with a regular expression alone. You are better off just using split methods.
Example:
import re
ori_str = "foo(bat,foo),bat foo(bat,foo),bat"
rep_str = re.sub(r',(?![^(]*\))', '', ori_str)
print(rep_str)
# foo(bat,foo)bat foo(bat,foo)bat
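To make the nesting caveat concrete, here is a small sketch (the helper name strip_outer_commas is hypothetical) that applies the same lookahead to a flat input and to a nested one. On the nested input the comma after b is removed even though it sits inside the outer parentheses:

```python
import re

# Same negative lookahead as above: a comma not followed by ')' before any '('.
OUTSIDE_COMMA = re.compile(r",(?![^(]*\))")


def strip_outer_commas(text):
    return OUTSIDE_COMMA.sub("", text)


print(strip_outer_commas("foo(bat,foo),bat"))  # foo(bat,foo)bat
print(strip_outer_commas("a,(b,(c,d),e),f"))   # a(b(c,d),e)f  <- inner comma lost
```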
Suppose we want to remove all commas outside of all blocks and leave nested blocks unmodified.
First, let's add validation for strings that contain unclosed or unopened blocks:
def validate_string(string):
    left_parts_count = len(string.split('('))
    right_parts_count = len(string.split(')'))
    diff = left_parts_count - right_parts_count
    if diff == 0:
        return
    if diff < 0:
        raise ValueError('Invalid string: "{string}". '
                         'Number of closed '
                         'but not opened blocks: {diff}.'
                         .format(string=string,
                                 diff=-diff))
    raise ValueError('Invalid string: "{string}". '
                     'Number of opened '
                     'but not closed blocks: {diff}.'
                     .format(string=string,
                             diff=diff))
Then we can do the job without regular expressions, using only str methods:
def remove_commas_outside_of_parentheses(string):
    # if you don't need string validation
    # then remove this line and string validator
    validate_string(string)
    left_parts = string.split('(')
    if len(left_parts) == 1:
        # no opened blocks found,
        # remove all commas
        return string.replace(',', '')
    left_outer_part = left_parts[0]
    left_outer_part = left_outer_part.replace(',', '')
    left_unopened_parts = left_parts[-1].split(')')
    right_outer_part = left_unopened_parts[-1]
    right_outer_part = right_outer_part.replace(',', '')
    return '('.join([left_outer_part] +
                    left_parts[1:-1] +
                    [')'.join(left_unopened_parts[:-1]
                              + [right_outer_part])])
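It may look a bit messy, but it works when there is a single outer block (nested blocks inside it are fine). Commas between two separate top-level blocks, as in 'a(b,c),d,(e,f)', are left in place.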
Tests
>>>remove_commas_outside_of_parentheses('foo,bat')
foobat
>>>remove_commas_outside_of_parentheses('foo,(bat,foo),bat')
foo(bat,foo)bat
>>>remove_commas_outside_of_parentheses('bar,baz(foo,(bat,foo),bat),bar,baz')
barbaz(foo,(bat,foo),bat)barbaz
"broken" ones:
>>>remove_commas_outside_of_parentheses('(')
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "<input>", line 4, in remove_commas_outside_of_parentheses
File "<input>", line 17, in validate_string
ValueError: Invalid string: "(". Number of opened but not closed blocks: 1.
>>>remove_commas_outside_of_parentheses(')')
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "<input>", line 4, in remove_commas_outside_of_parentheses
File "<input>", line 12, in validate_string
ValueError: Invalid string: ")". Number of closed but not opened blocks: 1.
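An alternative sketch, not the approach above: walk the string once while tracking nesting depth, and drop a comma only when the depth is zero. The function name remove_top_level_commas is hypothetical. This variant also handles several separate top-level blocks and rejects a ) that appears before its (.

```python
def remove_top_level_commas(text):
    """Remove commas that are not enclosed by any parentheses."""
    depth = 0
    kept = []
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("closing parenthesis without an opening one")
        if ch == "," and depth == 0:
            continue  # top-level comma: skip it
        kept.append(ch)
    if depth != 0:
        raise ValueError("unclosed parenthesis")
    return "".join(kept)


print(remove_top_level_commas("bar,baz(foo,(bat,foo),bat),bar,baz"))
# barbaz(foo,(bat,foo),bat)barbaz
print(remove_top_level_commas("a(b,c),d,(e,f),g"))
# a(b,c)d(e,f)g
```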
I'm trying to break up a long regex into smaller chunks. Is it possible/good practice to change A to B?
A:
line = re.sub(r'\$\{([0-9]+)\}|\$([0-9]+)|\$\{(\w+?\=\w?+)\}|[^\\]\$(\w[^-]+)|[^\\]\$\{(\w[^-]+)\}',replace,line)
B:
line = re.sub(r'\$\{([0-9]+)\}|'
r'\$([0-9]+)|'
r'\$\{(\w+?\=\w?+)\}|'
r'[^\\]\$(\w[^-]+)|'
r'[^\\]\$\{(\w[^-]+)\}',replace,line)
Edit:
I receive the following error when running this in Python 2:
def main():
    while(1):
        line = raw_input("(%s)$ " % ncmd)
        line = re.sub(r'''
\$\{([0-9]+)\}|
\$([0-9]+)|
\$\{(\w+?\=\w?+)\}|
[^\\]\$(\w[^-]+)|
[^\\]\$\{(\w[^-]+)\}
''',replace,line,re.VERBOSE)
        print '>> ' + line
Error:
(1)$ abc
Traceback (most recent call last):
File "Test.py", line 4, in <module>
main()
File "Test.py", line 2, in main
[^\\]\$\{(\w[^-]+)\}''',replace,line,re.VERBOSE)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 151, in sub
return _compile(pattern, flags).sub(repl, string, count)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 242, in _compile
raise error, v # invalid expression
sre_constants.error: multiple repeat
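You can use a triple-quoted (multi-line) string and set the re.VERBOSE flag, which allows you to break a regex pattern over multiple lines. Pass the flag as flags=re.VERBOSE, because the fourth positional argument of re.sub is count, not flags. The "multiple repeat" error itself comes from \w?+ in the third alternative, which stacks two quantifiers (Python 3.11 and later accept ?+ as a possessive quantifier); \w+? is presumably what was meant:
line = re.sub(r'''
\$\{([0-9]+)\}|
\$([0-9]+)|
\$\{(\w+?\=\w+?)\}|
[^\\]\$(\w[^-]+)|
[^\\]\$\{(\w[^-]+)\}
''', replace, line, flags=re.VERBOSE)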
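You can even include comments directly inside the string (the flag is passed by keyword, since the fourth positional argument of re.sub is count):
line = re.sub(r'''
\$\{([0-9]+)\}|         # Pattern 1
\$([0-9]+)|             # Pattern 2
\$\{(\w+?\=\w+?)\}|     # Pattern 3 (\w+? rather than \w?+, which raises "multiple repeat")
[^\\]\$(\w[^-]+)|       # Pattern 4
[^\\]\$\{(\w[^-]+)\}    # Pattern 5
''', replace, line, flags=re.VERBOSE)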
Lastly, it should be noted that you can likewise activate the verbose flag by using re.X or by placing (?x) at the start of your Regex pattern.
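A runnable sketch of the verbose style, using a shortened two-alternative pattern and a hypothetical replace callback (neither is the asker's real code). Compiling the pattern with re.VERBOSE up front sidesteps a common pitfall: in re.sub the fourth positional argument is count, so the flag must be passed as flags=... or given to re.compile.

```python
import re

# Hypothetical pattern, shortened from the one in the question.
VARIABLE = re.compile(r"""
    \$\{([0-9]+)\}   # braces form, e.g. ${1}
  | \$([0-9]+)       # bare form, e.g. $2
""", re.VERBOSE)


def replace(match):
    """Hypothetical callback: wrap the captured number in a marker."""
    number = match.group(1) or match.group(2)
    return "<arg %s>" % number


def expand(line):
    return VARIABLE.sub(replace, line)


print(expand("echo ${1} $2"))  # echo <arg 1> <arg 2>

# Same idea with the inline (?x) flag instead of re.VERBOSE.
print(re.sub(r"(?x) \$ ([0-9]+)", r"<arg \1>", "echo $2"))  # echo <arg 2>
```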
You can also split the expression over multiple lines by relying on implicit concatenation of adjacent string literals, like the following:
line = re.sub(r"\$\{([0-9]+)\}|\$([0-9]+)|"
              r"\$\{(.+-.+)\}|"
              r"\$\{(\w+?\=\w+?)\}|"
              r"\$(\w[^-]+)|\$\{(\w[^-]+)\}", replace, line)
I have a string that looks like a path, from which I am trying to extract 022014-101 with a regular expression I got from here.
str1 = "Test 123 <C:\User\Test\xyz\022014-101\more\stuff\022014\1> Text"
Actually, I am retrieving the string from a text file, so I don't have to escape it, but for testing purposes I used this string instead:
str1 = "<C:\\User\\Test\\xyz\\022014-101\\more\\stuff\\022014\\1>"
Here is the code I tried to match the first occurring 022014-101:
import re
p = re.compile('(?<=\\)[\d]{6}[^\\]*')
m = p.match(str1)
print m.group(0) #Line 6
It gave me this error:
Traceback (most recent call last):
File "test12.py", line 6, in <module>
print m.group(0)
AttributeError: 'NoneType' object has no attribute 'group'
How can I get the desired output 022014-101?
EDIT:
That did it:
import re
m = re.search(r'(?<=\\)[\d]{6}[^\\]*', str1)
print m.group(0)
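For context on why the change helps: re.match only tries the pattern at the very start of the string, while re.search scans for the first position where it matches, and the raw string keeps the backslashes in the pattern intact. A small sketch, using a raw-string copy of the test path:

```python
import re

# Raw string so the backslashes in the sample path are kept literally.
str1 = r"Test 123 <C:\User\Test\xyz\022014-101\more\stuff\022014\1> Text"

# Six digits preceded by a backslash, then everything up to the next backslash.
pattern = re.compile(r"(?<=\\)\d{6}[^\\]*")

print(pattern.match(str1))   # None: the string does not start with a match

found = pattern.search(str1)
print(found.group(0))        # 022014-101
```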
I am working through some example code which I've found on What's the most efficient way to find one of several substrings in Python?. I've changed the code to:
import re
to_find = re.compile("hello|there")
search_str = "blah fish cat dog haha"
match_obj = to_find.search(search_str)
#the_index = match_obj.start()
which_word_matched = ""
which_word_matched = match_obj.group()
Since there is now no match, I get:
Traceback (most recent call last):
File "<console>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
What is the standard way in Python to handle the no-match case, so as to avoid this error?
match_obj = to_find.search(search_str)
if match_obj:
    # do things with match_obj
    which_word_matched = match_obj.group()
Other handling will go in an else block if you need to do something even when there's no match.
Your match_obj is None because the regular expression did not match. Test for it explicitly:
which_word_matched = match_obj.group() if match_obj else ''
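Both answers can be folded into a small helper. This is only a sketch; the name first_word_found and the default argument are hypothetical:

```python
import re

TO_FIND = re.compile("hello|there")


def first_word_found(text, default=""):
    """Return the first of the searched words found in text, else default."""
    match_obj = TO_FIND.search(text)
    # search() returns None when nothing matches, so test before calling group().
    return match_obj.group() if match_obj else default


print(repr(first_word_found("blah fish cat dog haha")))  # ''
print(repr(first_word_found("well hello there")))        # 'hello'
```

Testing with "if match_obj is not None" is an equally valid spelling; a match object is always truthy, so the shorter form behaves the same.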