How to extract any specific Error message from Traceback? - python

I would like to extract the error message displayed in the last line. I was able to do it by splitting:
test_str = """Traceback (most recent call last):
File "/usr/local/Cellar/python@3.9/3.9.2_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/unittest/case.py", line 59, in testPartExecutor
yield
File "/usr/local/Cellar/python@3.9/3.9.2_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/unittest/case.py", line 593, in run
self._callTestMethod(testMethod)
File "/usr/local/Cellar/python@3.9/3.9.2_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/unittest/case.py", line 550, in _callTestMethod
method()
File "/Users/abcd/efgh/ijkl/bin/../tests/abd/dummy/dummy1.py", line 95, in test_abc
assert False, "FAILING FOR A REASON"
AssertionError: FAILING FOR A REASON """
last_line = test_str.split("\n")[-1]
print(last_line.split(":")[-1])
How can I achieve the same with a regular expression?
It can contain any type of error, AssertionError, AttributeError, TypeError, SyntaxError etc.

Why do you want a regex for that? It doesn't add any value here.
print(re.split(":", last_line, maxsplit=1)[-1])
Perhaps somewhat more usefully,
matched = re.match(r'^\s*([^:\s]+):(.*)', last_line)
if matched:
    error, message = matched.groups()
\s matches any whitespace character, * says repeat zero or more times; [^:\s] matches any character which isn't whitespace or a colon, and we repeat that up through just before the first colon.
In slightly higher-level terms, this finds a line which contains a colon after the first stretch of non-space (and non-colon) characters, and extracts the token before the colon, and everything after it, as two separate groups. (The second group will contain any whitespace after the colon, too; maybe add \s* before the opening parenthesis to trim any leading spaces.)
But again, if you can do this without regex, do that; using one just makes this slower and more complicated.
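As a minimal sketch of both routes side by side (the function names and the sample string are made up for illustration, and it assumes the error message fits on the last non-empty line):

```python
import re

def split_error_line(traceback_text):
    """Return (error_type, message) from the last non-empty line, without a regex."""
    last_line = traceback_text.strip().splitlines()[-1]
    # partition() splits on the first colon only, so colons inside the message survive.
    error_type, _, message = last_line.partition(":")
    return error_type.strip(), message.strip()

def split_error_line_re(traceback_text):
    """Same idea with the regular expression discussed above."""
    last_line = traceback_text.strip().splitlines()[-1]
    matched = re.match(r'^\s*([^:\s]+):\s*(.*)', last_line)
    if matched:
        return matched.group(1), matched.group(2).strip()
    return None

# Hypothetical sample, not the traceback from the question.
sample = (
    "Traceback (most recent call last):\n"
    '  File "x.py", line 1, in <module>\n'
    "TypeError: bad operand: 1 "
)
print(split_error_line(sample))
print(split_error_line_re(sample))
```

Both calls return the error name and the message as a pair, and neither depends on which exception class was raised.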
If you want to pull out a traceback from a longer piece of text, try
re.search(r'''^Traceback \(most recent call last\):
(?:  File "[^\n]+", line \d+, in [^\n]+
    [^\n]+
)*([^:]+):([^\n]+)''', many_lines, re.MULTILINE)
but this is untested and probably needs some modification to work on samples which are not exactly like the one you shared (in particular, errors from code which isn't in a file will look slightly different; but I also notice your sample traceback does not look like a standard one, because the spacing patterns are slightly different - this regex targets the standard format).
Perhaps notice how we use a '''...''' triple-quoted string to embed literal newlines into the regular expression; we still use the r sigil to mark this as a raw string, so that we don't have to double all the backslashes.
The re.MULTILINE flag is a modifier which changes the meaning of the ^ anchor; with the flag, it matches at the beginning of any line (whereas without the flag, it only matches at the beginning of the input string).
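A small sketch of what the flag changes, using a made-up three-line string. It also shows that re.match stays anchored at the start of the input even with the flag, so re.search is the call to use when the traceback sits in the middle of a longer text:

```python
import re

# Hypothetical input: a log line followed by a (shortened) traceback.
text = "some log output\nTraceback (most recent call last):\nValueError: bad"

# Without re.MULTILINE, ^ only matches at the very start of the string.
no_flag = re.search(r'^Traceback', text)

# With re.MULTILINE, ^ also matches right after each newline.
with_flag = re.search(r'^Traceback', text, re.MULTILINE)

# re.match only ever tries position 0, flag or no flag.
anchored = re.match(r'^Traceback', text, re.MULTILINE)

print(no_flag)                 # None
print(with_flag is not None)   # True
print(anchored)                # None
```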

You could match the first line, followed by all lines that do not start with either Traceback or a word ending on Error to prevent overmatching.
^Traceback \(most recent call last\):(?:\r?\n(?![^\S\r\n]*(?:Traceback|\w*Error)\b).*)*\r?\n[^\S\r\n]*(\w*Error:.*)
^ Start of string
Traceback \(most recent call last\): Match Traceback (most recent call last):
(?: Non capture group
\r?\n Match a newline
(?![^\S\r\n]*(?:Traceback|\w*Error)\b) Assert that the line does not start with Traceback or a word ending on Error
.* Match the whole line
)* Close the non capture group and repeat 0+ times
\r?\n[^\S\r\n]* Match a newline, optional whitespace chars without a newline
(\w*Error:.*) Capture group 1, match optional word chars and Error: followed by the rest of the line
For example:
import re
test_str = """Traceback (most recent call last):
File "/usr/local/Cellar/python@3.9/3.9.2_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/unittest/case.py", line 59, in testPartExecutor
yield
File "/usr/local/Cellar/python@3.9/3.9.2_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/unittest/case.py", line 593, in run
self._callTestMethod(testMethod)
File "/usr/local/Cellar/python@3.9/3.9.2_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/unittest/case.py", line 550, in _callTestMethod
method()
File "/Users/abcd/efgh/ijkl/bin/../tests/abd/dummy/dummy1.py", line 95, in test_abc
assert False, "FAILING FOR A REASON"
AssertionError: FAILING FOR A REASON """
pattern = r"^Traceback \(most recent call last\):(?:\r?\n(?![^\S\r\n]*(?:Traceback|\w*Error)\b).*)*\r?\n[^\S\r\n]*(\w*Error:.*)"
match = re.match(pattern, test_str)
if match:
    print(match.group(1))
Output
AssertionError: FAILING FOR A REASON
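If the input holds more than one traceback, the same pattern can be applied with re.MULTILINE and findall. This is only a sketch: the log text below is invented, and it assumes each traceback ends in a line whose exception name ends in Error.

```python
import re

PATTERN = re.compile(
    r"^Traceback \(most recent call last\):"
    r"(?:\r?\n(?![^\S\r\n]*(?:Traceback|\w*Error)\b).*)*"
    r"\r?\n[^\S\r\n]*(\w*Error:.*)",
    re.MULTILINE,  # let ^ match at the start of any line in the log
)

# Hypothetical log with two tracebacks separated by ordinary log lines.
log = (
    "INFO starting\n"
    "Traceback (most recent call last):\n"
    '  File "a.py", line 1, in <module>\n'
    "    int('x')\n"
    "ValueError: invalid literal\n"
    "INFO retrying\n"
    "Traceback (most recent call last):\n"
    '  File "b.py", line 2, in <module>\n'
    "    None.foo\n"
    "AttributeError: no attribute\n"
)

# With a single capture group, findall returns a list of the group 1 strings.
errors = PATTERN.findall(log)
print(errors)
```

The negative lookahead is what keeps the first match from running on into the second traceback.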

import re
test_str = """Traceback (most recent call last):
File "/usr/local/Cellar/python@3.9/3.9.2_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/unittest/case.py", line 59, in testPartExecutor
yield
File "/usr/local/Cellar/python@3.9/3.9.2_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/unittest/case.py", line 593, in run
self._callTestMethod(testMethod)
File "/usr/local/Cellar/python@3.9/3.9.2_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/unittest/case.py", line 550, in _callTestMethod
method()
File "/Users/abcd/efgh/ijkl/bin/../tests/abd/dummy/dummy1.py", line 95, in test_abc
assert False, "FAILING FOR A REASON"
AssertionError: FAILING FOR A REASON """
match = re.search(r'(?<=Error:.)(.*)(?=.$)', test_str)
print(match.group(1))
# FAILING FOR A REASON

Related

How to match this part of the string with regex in Python without getting look-behind requires fixed-width pattern?

I want to extract the table name from a CREATE TABLE statement, for example this one:
"CREATE OR REPLACE TEMPORARY TABLE IF NOT EXISTS author ("
The name here is author. If you look into the official MariaDB documentation, you will notice that OR REPLACE, TEMPORARY and IF NOT EXISTS are optional parameters.
The regex I've come up with:
r"(?<=(create)\s*(or replace\s*)?(temporary\s*)?(table)\s*(if not exists\s*)?)(\w+)(?=(\s*)?(\())"
There is also no upper limit on how many spaces may appear between the words, but at least one is required.
When I try this regex on https://regexr.com/ it works with these examples (with the flags case insensitive, multiline and global):
CREATE TABLE book (
CREATE OR REPLACE TABLE IF NOT EXISTS author(
CREATE OR REPLACE TEMPORARY TABLE IF NOT EXISTS publisher (
However, if I try to do this in Python:
import re
firstRow = "CREATE OR REPLACE TABLE IF NOT EXISTS author ("
res = re.search(r"(?<=(create)\s(or replace\s)?(temporary\s)?(table)\s(if not exists\s)?)(\w+)(?=(\s)?(\())", firstRow, re.IGNORECASE)
it throws the following error message:
Traceback (most recent call last):
File "test.py", line 116, in <module>
res = re.sub(r"(?<=(create)\s(or replace\s)?(temporary\s)?(table)\s(if not exists\s)?)(\w+)(?=(\s)?(\())", firstRow, re.IGNORECASE)
File "C:\Users\stefa\AppData\Local\Programs\Python\Python38-32\lib\re.py", line 210, in sub
return _compile(pattern, flags).sub(repl, string, count)
File "C:\Users\stefa\AppData\Local\Programs\Python\Python38-32\lib\re.py", line 304, in _compile
p = sre_compile.compile(pattern, flags)
File "C:\Users\stefa\AppData\Local\Programs\Python\Python38-32\lib\sre_compile.py", line 768, in compile
code = _code(p, flags)
File "C:\Users\stefa\AppData\Local\Programs\Python\Python38-32\lib\sre_compile.py", line 607, in _code
_compile(code, p.data, flags)
File "C:\Users\stefa\AppData\Local\Programs\Python\Python38-32\lib\sre_compile.py", line 182, in _compile
raise error("look-behind requires fixed-width pattern")
re.error: look-behind requires fixed-width pattern
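It works on regexr.com because the JavaScript engine is selected there, and JavaScript supports unbounded quantifiers inside a lookbehind assertion. Python's re module does not: a lookbehind must match text of a fixed width, which is what the error message is telling you.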
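A short sketch of that restriction in Python's re module. The two patterns are cut-down, illustrative versions of the one in the question, not the full expression:

```python
import re

# Variable-width lookbehind: the optional group means the width is not fixed.
try:
    re.compile(r"(?<=table\s(if not exists\s)?)\w+")
    compile_error = None
except re.error as exc:
    compile_error = str(exc)
print(compile_error)

# A lookbehind of fixed width (here always "table" plus one whitespace) compiles fine.
fixed = re.compile(r"(?<=table\s)\w+", re.IGNORECASE)
found = fixed.search("CREATE TABLE book (").group(0)
print(found)  # book
```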
You don't need any lookarounds at all as you are already using a capture group, so you can match what is around the table name.
As all the \s* are optional, you might consider using word boundaries \b to prevent partial word matches.
\bcreate\b\s*(?:or replace\s*)?(?:\btemporary\s*)?\btable\s*(?:\bif not exists\s*)?\b(\w+)\s*\(
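For illustration, here is how the capture-group approach might look in Python. The pattern is an assumed variant, not the exact one above: each optional clause sits in its own group and \s+ is used between words, since the question says at least one space is required. The helper name table_name is hypothetical.

```python
import re

# Assumed variant of the pattern: no lookarounds, the table name is capture group 1.
CREATE_TABLE_RE = re.compile(
    r"\bcreate\s+(?:or\s+replace\s+)?(?:temporary\s+)?table\s+"
    r"(?:if\s+not\s+exists\s+)?(\w+)\s*\(",
    re.IGNORECASE,
)

def table_name(statement):
    """Return the table name from a CREATE TABLE line, or None if it does not match."""
    m = CREATE_TABLE_RE.search(statement)
    return m.group(1) if m else None

for stmt in (
    "CREATE TABLE book (",
    "CREATE OR REPLACE TABLE IF NOT EXISTS author(",
    "CREATE OR REPLACE TEMPORARY TABLE IF NOT EXISTS publisher (",
):
    print(table_name(stmt))
```

Because nothing here is a lookbehind, the fixed-width restriction never comes into play.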

Python Regex Error : nothing to repeat at position 0

I am trying to match strings that can be typed on a normal English keyboard.
So it should include letters, digits, and all the symbols present on the keyboard.
Corresponding regex: "[a-zA-Z0-9\t ./,<>?;:\"'`!@#$%^&*()\[\]{}_+=|\\-]+"
I verified this regex on regexr.com.
In Python, on matching, I get the following error:
>>> a=re.match("+how to block a website in edge",pattern)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\local\Anaconda3-4.1.1-Windows-x86_64\envs\tf_1.2\lib\re.py", line 163, in match
return _compile(pattern, flags).match(string)
File "C:\local\Anaconda3-4.1.1-Windows-x86_64\envs\tf_1.2\lib\re.py", line 293, in _compile
p = sre_compile.compile(pattern, flags)
File "C:\local\Anaconda3-4.1.1-Windows-x86_64\envs\tf_1.2\lib\sre_compile.py", line 536, in compile
p = sre_parse.parse(p, flags)
File "C:\local\Anaconda3-4.1.1-Windows-x86_64\envs\tf_1.2\lib\sre_parse.py", line 829, in parse
p = _parse_sub(source, pattern, 0)
File "C:\local\Anaconda3-4.1.1-Windows-x86_64\envs\tf_1.2\lib\sre_parse.py", line 437, in _parse_sub
itemsappend(_parse(source, state, nested + 1))
File "C:\local\Anaconda3-4.1.1-Windows-x86_64\envs\tf_1.2\lib\sre_parse.py", line 638, in _parse
source.tell() - here + len(this))
sre_constants.error: nothing to repeat at position 0
This error message is not about the order of the arguments. Yes, in the question above they are in the wrong order, but that is only half of the problem.
I ran into this error once with something like this:
re.search('**myword', '/path/to/**myword')
I wanted '**' to be matched literally without having to add a '\' by hand everywhere. That is what the re.escape() function is for. This is the working code:
re.search(re.escape('**myword'), '/path/to/**myword')
The problem here is that a special character (a quantifier such as * or +) sits at the very beginning of the pattern, so there is nothing before it to repeat.
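A quick sketch of the failure and the re.escape() fix, using the same strings as above (the variable names are just for illustration):

```python
import re

needle = "**myword"
haystack = "/path/to/**myword"

# Used as a raw pattern, the leading * is a quantifier with nothing in front of it.
try:
    re.search(needle, haystack)
    error_message = None
except re.error as exc:
    error_message = str(exc)
print(error_message)

# re.escape() backslash-escapes the special characters so they match literally.
safe = re.escape(needle)
print(safe)   # \*\*myword
found = re.search(safe, haystack).group(0)
print(found)  # **myword
```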
You have your arguments for re.match backward: it should be
re.match(pattern, "+how to block a website in edge")
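To make the argument order concrete, here is a sketch with a character class along the lines of the one in the question (the `@` in the symbol run is an assumption, and re.fullmatch is used so the whole string has to consist of allowed characters):

```python
import re

# Pattern modelled on the question's character class; treat it as illustrative.
keyboard_chars = r"[a-zA-Z0-9\t ./,<>?;:\"'`!@#$%^&*()\[\]{}_+=|\\-]+"
text = "+how to block a website in edge"

# Correct order: pattern first, then the string to test.
matched = re.fullmatch(keyboard_chars, text) is not None
print(matched)  # True

# Swapped order: the text gets compiled as a pattern, and its leading + has nothing to repeat.
try:
    re.match(text, keyboard_chars)
    swapped_error = None
except re.error as exc:
    swapped_error = str(exc)
print(swapped_error)
```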

Except "error: nothing to repeat" or "error: multiple repeat" in try/except block [duplicate]

I am using a regex and I get the error:
Traceback (most recent call last):
File "tokennet.py", line 825, in <module>
RunIt(ContentToRun,Content[0])
File "tokennet.py", line 401, in RunIt
if re.search(r'\b'+word+r'\b', str1) and re.search(r'\b'+otherWord+r'\b', str1) and word != otherWord:
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 142, in search
return _compile(pattern, flags).search(string)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 242, in _compile
raise error, v # invalid expression
sre_constants.error: nothing to repeat
I've looked around, and it seems this error is associated with *, but not sure why I'm getting it. What do I have to do to str1 to stop getting it? str1 is one line in a massive text file, and when I print str1 to see what line in particular is bugging, it looks like a normal line...
I suggest using re.escape(word), since your variable word may contain regex special characters. I think the error occurs because of special characters inside the variable. re.escape(variable_name) escapes any special characters it contains.
if re.search(r'\b'+re.escape(word)+r'\b', str1) and re.search(r'\b'+re.escape(otherWord)+r'\b', str1) and word != otherWord:
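A sketch of the same check wrapped in a helper. The function name both_words_present and the sample words are hypothetical; the logic mirrors the condition above.

```python
import re

def both_words_present(word, other_word, line):
    """True if two different words both occur in line, each treated as literal text."""
    if word == other_word:
        return False

    def has(w):
        # re.escape() neutralises characters such as * + ? ( ) inside the word.
        return re.search(r'\b' + re.escape(w) + r'\b', line) is not None

    return has(word) and has(other_word)

print(both_words_present("a+b", "math", "a+b is math"))  # True

# Without escaping, a word that starts with * reproduces the error from the question.
try:
    re.search(r'\b' + "*star" + r'\b', "a *star here")
    raw_error = None
except re.error as exc:
    raw_error = str(exc)
print(raw_error)
```

One caveat: \b only matches between a word character and a non-word character, so a word that begins or ends with punctuation (such as *star) no longer raises once escaped, but may also not be found. If that matters, the \b anchors would need to be replaced with something that suits the data.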

Python Regex match: include Regex syntax

This is a list of characters I need for a regex match:
A-Za-z0-9_-\[]{}^`|
However, some of them, such as \, [ ], ^ and |, are regex syntax characters. When I tried using this pattern, I got this error:
Traceback (most recent call last):
File "C:\Users\dell\Desktop\test.py", line 8, in <module>
if re.match("^[A-Za-z0-9_-\[]{}^`|]*$", weird_input):
File "C:\Python27\lib\re.py", line 137, in match
return _compile(pattern, flags).match(string)
File "C:\Python27\lib\re.py", line 242, in _compile
raise error, v # invalid expression
error: bad character range
Is there any way I could include those characters?
You need to escape them with a backslash, like this:
import re
p = re.compile(r'[A-Za-z0-9_\-\\\[\]\{\}^`\|]+')
test_str = u"test"
re.match(p, test_str)
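As a sketch of why the original class fails and what a working one can look like: in `_-\[` the hyphen is read as a range from `_` to `[`, and since `_` sorts after `[` the range is reversed, hence "bad character range". The anchored pattern and the sample strings below are assumptions for illustration.

```python
import re

# The class from the question: the hyphen between _ and \[ forms a reversed range.
try:
    re.compile(r"^[A-Za-z0-9_-\[]{}^`|]*$")
    range_error = None
except re.error as exc:
    range_error = str(exc)
print(range_error)

# Escaping the hyphen, the backslash and the brackets makes each one a literal member.
# ^ is literal here because it is not the first character of the class.
allowed = re.compile(r'^[A-Za-z0-9_\-\\\[\]{}^`|]*$')

ok = bool(allowed.match(r"nick_[away]^`|\-{}"))
rejected = bool(allowed.match("has space"))
print(ok)        # True
print(rejected)  # False
```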
