I am using a regex and I get the error:
Traceback (most recent call last):
File "tokennet.py", line 825, in <module>
RunIt(ContentToRun,Content[0])
File "tokennet.py", line 401, in RunIt
if re.search(r'\b'+word+r'\b', str1) and re.search(r'\b'+otherWord+r'\b', str1) and word != otherWord:
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 142, in search
return _compile(pattern, flags).search(string)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 242, in _compile
raise error, v # invalid expression
sre_constants.error: nothing to repeat
I've looked around, and it seems this error is associated with *, but not sure why I'm getting it. What do I have to do to str1 to stop getting it? str1 is one line in a massive text file, and when I print str1 to see what line in particular is bugging, it looks like a normal line...
I suggest using re.escape(word), since your variable word may contain regex special characters. I think the error is caused by such a character: if word starts with a quantifier like * or +, the pattern becomes \b*..., and the quantifier has nothing to repeat. re.escape(variable_name) escapes any special characters in the variable so that they are matched literally.
if re.search(r'\b'+re.escape(word)+r'\b', str1) and re.search(r'\b'+re.escape(otherWord)+r'\b', str1) and word != otherWord:
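To make the failure and the fix concrete, here is a minimal sketch. The token `*args`, the sample line, and the helper name `contains_token` are made up for illustration. Note that \b only matches between a word character and a non-word character, so for a token that starts with a symbol the sketch swaps \b for lookarounds; that substitution is an assumption about what you want, not part of the suggestion above.

```python
import re

word = "*args"                  # hypothetical token that starts with a quantifier
line = "def run(*args): pass"   # hypothetical input line

# Unescaped: the pattern becomes \b*args\b, and * has nothing to repeat.
try:
    re.search(r'\b' + word + r'\b', line)
    raw_failed = False
except re.error:
    raw_failed = True


def contains_token(token, text):
    """Return True if `token` occurs in `text` literally, not inside a longer word."""
    # re.escape turns characters such as * + ? ( ) [ ] into literals.
    # (?<!\w) and (?!\w) stand in for \b so tokens starting with a symbol still match.
    pattern = r'(?<!\w)' + re.escape(token) + r'(?!\w)'
    return re.search(pattern, text) is not None


print(raw_failed)                    # True
print(contains_token(word, line))    # True
print(contains_token("arg", line))   # False ("arg" is only part of "args")
```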
I would like to extract the error message displayed in the last line. I was able to do it by splitting.
test_str = """Traceback (most recent call last):
File "/usr/local/Cellar/python@3.9/3.9.2_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/unittest/case.py", line 59, in testPartExecutor
yield
File "/usr/local/Cellar/python@3.9/3.9.2_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/unittest/case.py", line 593, in run
self._callTestMethod(testMethod)
File "/usr/local/Cellar/python@3.9/3.9.2_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/unittest/case.py", line 550, in _callTestMethod
method()
File "/Users/abcd/efgh/ijkl/bin/../tests/abd/dummy/dummy1.py", line 95, in test_abc
assert False, "FAILING FOR A REASON"
AssertionError: FAILING FOR A REASON """
last_line = test_str.split("\n")[-1]
print(last_line.split(":")[-1])
How can I achieve the same with a regular expression?
It can contain any type of error, AssertionError, AttributeError, TypeError, SyntaxError etc.
Why do you want a regex for that? It doesn't add any value here.
print(re.split(":", last_line, 1)[-1])
Perhaps somewhat more usefully,
matched = re.match(r'^\s*([^:\s]+):(.*)', last_line)
if matched:
    error, message = matched.groups()
\s matches any whitespace character, * says repeat zero or more times; [^:\s] matches any character which isn't whitespace or a colon, and we repeat that up through just before the first colon.
In slightly higher-level terms, this finds a line which contains a colon after the first stretch of non-space (and non-colon) characters, and extracts the token before the colon, and everything after it, as two separate groups. (The second group will contain any whitespace after the colon, too; maybe add \s* before the opening parenthesis to trim any leading spaces.)
But again, if you can do this without regex, do that; using one just makes this slower and more complicated.
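As a sketch of the regex-free route, the last line can be split on its first colon with str.partition. The function name `split_error_line` and the sample text are hypothetical, and the sketch assumes the text is non-empty and ends with a "Name: message" line.

```python
def split_error_line(traceback_text):
    """Return (exception_name, message) from the final line of a traceback string."""
    # Assumes the last non-blank line has the form "Name: message".
    last_line = traceback_text.strip().splitlines()[-1]
    # partition splits on the FIRST colon only, so colons inside the message survive.
    name, _, message = last_line.partition(":")
    return name.strip(), message.strip()


sample = (
    'Traceback (most recent call last):\n'
    '  File "x.py", line 1, in <module>\n'
    'ValueError: bad value: 3 '
)
print(split_error_line(sample))   # ('ValueError', 'bad value: 3')
```

Unlike split(":")[-1], this keeps the whole message when the message itself contains a colon.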
If you want to pull out a traceback from a longer piece of text, try
re.search(r'''^Traceback \(most recent call last\):
(?:  File "[^\n]+", line \d+, in [^\n]+
    [^\n]+
)*([^:]+):([^\n]+)''', many_lines, re.MULTILINE)
but this is untested and probably needs some modification to work on samples which are not exactly like the one you shared (in particular, errors from code which isn't in a file will look slightly different; but I also notice your sample traceback does not look like a standard one, because the spacing patterns are slightly different - this regex targets the standard format).
Perhaps notice how we use a '''...''' triple-quoted string to embed literal newlines into the regular expression; we still use the r sigil to mark this as a raw string, so that we don't have to double all the backslashes.
The re.MULTILINE flag is a modifier which changes the meaning of the ^ anchor; with the flag, it matches at the beginning of any line (whereas without the flag, it only matches at the beginning of the input string).
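A small sketch of what the flag changes; the `log` string is invented for illustration. It also shows that re.match stays anchored to the start of the input even with the flag, so re.search is the call to use when the traceback sits in the middle of a longer text.

```python
import re

log = "INFO starting\nTraceback (most recent call last):\nValueError: boom"

# Without the flag, ^ only matches at the very start of the string.
no_flag = re.search(r'^Traceback', log)

# With re.MULTILINE, ^ also matches right after each newline.
with_flag = re.search(r'^Traceback', log, re.MULTILINE)

# re.match only tries position 0, regardless of the flag.
anchored = re.match(r'^Traceback', log, re.MULTILINE)

print(no_flag)                 # None
print(with_flag is not None)   # True
print(anchored)                # None
```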
You could match the first line, followed by all lines that do not start with either Traceback or a word ending in Error, to prevent overmatching.
^Traceback \(most recent call last\):(?:\r?\n(?![^\S\r\n]*(?:Traceback|\w*Error)\b).*)*\r?\n[^\S\r\n]*(\w*Error:.*)
^ Start of string
Traceback \(most recent call last\): Match Traceback (most recent call last):
(?: Non capture group
\r?\n Match a newline
(?![^\S\r\n]*(?:Traceback|\w*Error)\b) Assert that the line does not start with Traceback or a word ending in Error
.* Match the whole line
)* Close the non capture group and repeat 0+ times
\r?\n[^\S\r\n]* Match a newline, optional whitespace chars without a newline
(\w*Error:.*) Capture group 1, match optional word chars and Error: followed by the rest of the line
For example:
import re
test_str = """Traceback (most recent call last):
File "/usr/local/Cellar/python@3.9/3.9.2_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/unittest/case.py", line 59, in testPartExecutor
yield
File "/usr/local/Cellar/python@3.9/3.9.2_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/unittest/case.py", line 593, in run
self._callTestMethod(testMethod)
File "/usr/local/Cellar/python@3.9/3.9.2_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/unittest/case.py", line 550, in _callTestMethod
method()
File "/Users/abcd/efgh/ijkl/bin/../tests/abd/dummy/dummy1.py", line 95, in test_abc
assert False, "FAILING FOR A REASON"
AssertionError: FAILING FOR A REASON """
pattern = r"^Traceback \(most recent call last\):(?:\r?\n(?![^\S\r\n]*(?:Traceback|\w*Error)\b).*)*\r?\n[^\S\r\n]*(\w*Error:.*)"
match = re.match(pattern, test_str)
if match:
    print(match.group(1))
Output
AssertionError: FAILING FOR A REASON
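If the tracebacks are embedded in a longer log, the same pattern could be compiled with re.MULTILINE and applied with findall. This is only a sketch: the `log` text is invented, and exceptions whose names do not end in Error (KeyboardInterrupt, for instance) would not be captured by this pattern.

```python
import re

pattern = re.compile(
    r"^Traceback \(most recent call last\):"
    r"(?:\r?\n(?![^\S\r\n]*(?:Traceback|\w*Error)\b).*)*"
    r"\r?\n[^\S\r\n]*(\w*Error:.*)",
    re.MULTILINE,  # let ^ match at the start of every line, not just the input
)

# Hypothetical log containing two tracebacks separated by ordinary lines.
log = """INFO start
Traceback (most recent call last):
  File "a.py", line 1, in <module>
    f()
ValueError: bad value
INFO retry
Traceback (most recent call last):
  File "b.py", line 2, in <module>
    g()
KeyError: 'k'
"""

# With a single capture group, findall returns the captured text of each match.
errors = pattern.findall(log)
print(errors)   # ['ValueError: bad value', "KeyError: 'k'"]
```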
import re
test_str = """Traceback (most recent call last):
File "/usr/local/Cellar/python@3.9/3.9.2_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/unittest/case.py", line 59, in testPartExecutor
yield
File "/usr/local/Cellar/python@3.9/3.9.2_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/unittest/case.py", line 593, in run
self._callTestMethod(testMethod)
File "/usr/local/Cellar/python@3.9/3.9.2_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/unittest/case.py", line 550, in _callTestMethod
method()
File "/Users/abcd/efgh/ijkl/bin/../tests/abd/dummy/dummy1.py", line 95, in test_abc
assert False, "FAILING FOR A REASON"
AssertionError: FAILING FOR A REASON """
match = re.search(r'(?<=Error:.)(.*)(?=.$)', test_str)
print(match.group(1))
# FAILING FOR A REASON
I am trying to match strings that can be typed on a standard English keyboard.
So it should include letters, digits, and all symbols present on the keyboard.
Corresponding regex: "[a-zA-Z0-9\t ./,<>?;:\"'`!@#$%^&*()\[\]{}_+=|\\-]+"
I verified this regex on regexr.com.
In Python, when matching I get the following error:
>>> a=re.match("+how to block a website in edge",pattern)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\local\Anaconda3-4.1.1-Windows-x86_64\envs\tf_1.2\lib\re.py", line 163, in match
return _compile(pattern, flags).match(string)
File "C:\local\Anaconda3-4.1.1-Windows-x86_64\envs\tf_1.2\lib\re.py", line 293, in _compile
p = sre_compile.compile(pattern, flags)
File "C:\local\Anaconda3-4.1.1-Windows-x86_64\envs\tf_1.2\lib\sre_compile.py", line 536, in compile
p = sre_parse.parse(p, flags)
File "C:\local\Anaconda3-4.1.1-Windows-x86_64\envs\tf_1.2\lib\sre_parse.py", line 829, in parse
p = _parse_sub(source, pattern, 0)
File "C:\local\Anaconda3-4.1.1-Windows-x86_64\envs\tf_1.2\lib\sre_parse.py", line 437, in _parse_sub
itemsappend(_parse(source, state, nested + 1))
File "C:\local\Anaconda3-4.1.1-Windows-x86_64\envs\tf_1.2\lib\sre_parse.py", line 638, in _parse
source.tell() - here + len(this))
sre_constants.error: nothing to repeat at position 0
This error message is not about the position of the arguments. Yes, in the question above they are in the wrong order, but that is only half of the problem.
I hit this problem once when I had something like this:
re.search('**myword', '/path/to/**myword')
I wanted '**' to be matched literally without having to write the '\' escapes by hand. That is what the re.escape() function is for. This is the correct code:
re.search(re.escape('**myword'), '/path/to/**myword')
The problem here is that a special character (a quantifier such as * or +) sits at the very beginning of the pattern, so there is nothing before it to repeat.
You have your arguments for re.match backward: it should be
re.match(pattern, "+how to block a website in edge")
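A short sketch of both call orders side by side. The character class is the one from the question written as a raw string, on the assumption that it is meant to include @; the variable names are illustrative.

```python
import re

# Assumed character class: letters, digits, tab, space and keyboard symbols.
pattern = r"[a-zA-Z0-9\t ./,<>?;:\"'`!@#$%^&*()\[\]{}_+=|\\-]+"
text = "+how to block a website in edge"

# Correct order: pattern first, then the string to test.
ok = re.fullmatch(pattern, text) is not None

# Swapped order: the text is compiled as a regex, and its leading + has nothing to repeat.
try:
    re.match(text, pattern)
    swapped_failed = False
except re.error:
    swapped_failed = True

print(ok)               # True
print(swapped_failed)   # True
```

re.fullmatch is used here so that the whole string has to consist of allowed characters; re.match would accept any string that merely starts with one.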
This is a list of characters I need for a regex match:
A-Za-z0-9_-\[]{}^`|
However, some of them, like \, [], ^ and |, are regex syntax characters. When I tried using this pattern, I got this error:
Traceback (most recent call last):
File "C:\Users\dell\Desktop\test.py", line 8, in <module>
if re.match("^[A-Za-z0-9_-\[]{}^`|]*$", weird_input):
File "C:\Python27\lib\re.py", line 137, in match
return _compile(pattern, flags).match(string)
File "C:\Python27\lib\re.py", line 242, in _compile
raise error, v # invalid expression
error: bad character range
Is there any way I could include those characters?
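You need to escape them with a backslash. In your pattern, _-\[ is parsed as a range from _ to [, and since _ sorts after [ the range is invalid, hence "bad character range". With the special characters escaped:
import re
# A plain raw string works on both Python 2 and 3; the ur'' prefix is Python 2 only.
p = re.compile(r'[A-Za-z0-9_\-\\\[\]\{\}^`\|]+')
test_str = u"test"
re.match(p, test_str)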
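An alternative sketch that avoids hand-escaping: build the character class from a plain string of allowed characters and let re.escape handle the backslashes. The names `extra`, `allowed`, and `is_valid` are hypothetical, and re.fullmatch requires Python 3.4 or later.

```python
import re
import string

# Literal symbols from the question: _ - \ [ ] { } ^ ` |
extra = "_-\\[]{}^`|"
allowed = string.ascii_letters + string.digits + extra

# re.escape protects \, ], ^ and - so none of them is read as class syntax.
pattern = re.compile("[" + re.escape(allowed) + "]*")


def is_valid(text):
    """Return True if every character of `text` is in the allowed set."""
    return pattern.fullmatch(text) is not None


print(is_valid("nick_[away]^`|\\{}-"))   # True
print(is_valid("bad name!"))             # False (space and ! are not allowed)
```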
I have a module file containing the following functions:
def replace(filename):
    match = re.sub(r'[^\s^\w]risk', 'risk', filename)
    return match

def count_words(newstring):
    from collections import defaultdict
    word_dict=defaultdict(int)
    for line in newstring:
        words=line.lower().split()
        for word in words:
            word_dict[word]+=1
    for word in word_dict:
        if'risk'==word:
            return word, word_dict[word]
When I do this in IDLE:
>>> mylist = open('C:\\Users\\ahn_133\\Desktop\\Python Project\\test10.txt').read()
>>> newstrings=replace(mylist)
>>> newone=count_words(newstrings)
test10.txt just contains words for testing like:
#
risk risky riskier risk. risk?
#
I get the following error:
Traceback (most recent call last):
File "<pyshell#134>", line 1, in <module>
newPH = replace(newPassage)
File "C:\Users\ahn_133\Desktop\Python Project\text_modules.py", line 56, in replace
match = re.sub(r'[^\s^\w]risk', 'risk', filename)
File "C:\Python27\lib\re.py", line 151, in sub
return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or buffer
Is there any way to run both functions without saving newstrings into a file, opening it using readlines(), and then running the count_words function?
To run a module, just run python modulename.py or python.exe modulename.py, or double-click the icon.
But I guess your problem isn't really what your question title states, so you should learn how to debug Python.
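For what it is worth, here is a minimal sketch of chaining the two functions in memory. It assumes the TypeError comes from passing a list (for example the result of readlines()) to re.sub, which only accepts a string, and it rewrites count_words to take the whole text as one string; the `target` parameter and the use of Counter are my additions, not part of the original module.

```python
import re
from collections import Counter


def replace(text):
    """Drop a single punctuation character that directly precedes 'risk'."""
    # re.sub needs a string; passing a list raises TypeError.
    return re.sub(r'[^\s^\w]risk', 'risk', text)


def count_words(text, target='risk'):
    """Count whitespace-separated, lower-cased words and return (target, count)."""
    counts = Counter(text.lower().split())
    return target, counts[target]


# Contents described for test10.txt, held in memory instead of re-read from disk.
sample = "#\nrisk risky riskier risk. risk?\n#\n"
cleaned = replace(sample)
print(count_words(cleaned))   # ('risk', 1)
```

Only the bare word counts here, because "risk." and "risk?" still carry their trailing punctuation after splitting on whitespace.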