I'm working on a script that will run another script and automatically generate a formatted Stack Overflow question based on the problem the script is having. I'd like to be able to provide accurate code samples and tracebacks while avoiding all the copying and pasting. Here's what I have so far:
# script.py
# usage: python3 script.py > stack_overflow_question.txt
from os.path import basename
from traceback import format_exc
def stack_overflow_question_generator(q_params):
    "takes a source script & descriptions and returns a Stack Overflow question"
    # get the code from the source script & pad each line with spaces
    with open(q_params['source']) as f:
        source = f.readlines()
    code = ''.join(4 * ' ' + i for i in source)
    # run the source script and get the traceback object
    try:
        trace_obj = None
        try:
            source_module = q_params['source'][:q_params['source'].index('.')]
            __import__(source_module)
        except Exception as e:
            raise RuntimeError from e
    except RuntimeError:
        trace_obj = format_exc()
    # raise a warning if the source script doesn't raise an error
    if not trace_obj:
        no_error_message = 'No exception raised by ' + q_params['source']
        raise UserWarning(no_error_message)
    # break the trace up into lines
    trace_lines = trace_obj.split('\n')
    # find the first line in the trace from the source and target it & its index
    for i, t in enumerate(trace_lines):
        if q_params['source'] in t:
            target_line = t
            index_one = i + 1
            break
    # find the first line of the outer trace and target its index
    break_trace = ' '.join(['The above exception was the direct',
                            'cause of the following exception:'])
    index_two = trace_lines.index(break_trace) - 1
    # find the source script name in the target line & rebuild it with that name
    quotes_index = [i for i, x in enumerate(target_line) if x == '"'][:2]
    source_name = basename(target_line[quotes_index[0] + 1:quotes_index[1]])
    filename = ''.join([' File "', source_name, target_line[quotes_index[1]:]])
    # format the source traceback as a string & pad each line with spaces
    trace = '\n'.join(4 * ' ' + i
                      for i in [trace_lines[0],
                                filename] + trace_lines[index_one:index_two])
    # build and return the question formatted as a string
    return '\n'.join([q_params['issue'], code, q_params['error'], trace,
                      q_params['request']])
# Example: set source script & other question params (issue, error, request):
question_params = {
    'source': 'script.py',
    'issue': ' '.join(['I\'m working on a script that will run another script',
                       'and automatically generate a formatted Stack Overflow',
                       'question based on the problem the script is having.',
                       'I\'d like to be able to provide accurate code samples',
                       'and tracebacks while avoiding all the copying and',
                       'pasting. Here\'s what I have so far:\n']),
    'error': ' '.join(['However, when the source script doesn\'t actually',
                       'raise an exception, the question-generator script',
                       'can\'t really format the question appropriately and',
                       'just raises a warning:\n']),
    'request': ' '.join(['Is there a better way for this script to generate',
                         'questions when the original script doesn\'t raise',
                         'any errors but still has some issue? I\'d also love',
                         'to see any implementations that accomplish the task',
                         'more efficiently or that improve the quality of the',
                         'question. (Also, apologies if this question is',
                         'better suited to stack overflow meta or code review;',
                         'I figured it made more sense here since I\'m trying',
                         'to build a tool in python for formatting output from',
                         'a python script.)'])
}
# Generate & print formatted SO question
print(stack_overflow_question_generator(question_params))
However, when the source script doesn't actually raise an exception, the question-generator script can't really format the question appropriately and just raises a warning:
Traceback (most recent call last):
File "script.py", line 19, in stack_overflow_question_generator
__import__(source_module)
File "/Users/alecrasmussen/script.py", line 86, in <module>
print(stack_overflow_question_generator(question_params))
File "/Users/alecrasmussen/script.py", line 28, in stack_overflow_question_generator
raise UserWarning(no_error_message)
UserWarning: No exception raised by script.py
Is there a better way for this script to generate questions when the original script doesn't raise any errors but still has some issue? I'd also love to see any implementations that accomplish the task more efficiently or that improve the quality of the question. (Also, apologies if this question is better suited to stack overflow meta or code review; I figured it made more sense here since I'm trying to build a tool in python for formatting output from a python script.)
Here is my problem. We have an Excel-based report in which business users enter comments into two separate fields and select a code from a drop-down. We then have a manual process that collects those files and pushes the comments and codes to a Snowflake table for use in various reports.
I am trying to improve the process with a Python script that will collect the files, copy them to a staging_folder location, then read in the data from the sheet, append it all together, do some cleanup and push to Snowflake. The plan is that this would be completely automated - but this is where we run into issues.
Initial step works perfectly. I have a loop that grabs the files based on the previous business day date, copies them to a staging folder. There are typically 32 files each day.
Next step reads those files to append to a dataframe. Here is the function that is loading the Excel files in my Python script.
def load_files():
    file_list = glob.glob(file_path + r'\*')
    df = pd.DataFrame()
    print("Importing data to Pandas DF...")
    for file in file_list:
        try:
            wb = load_workbook(file)
            ws = wb["Daily Outs"]
            data = ws.values
            cols = next(data)[1:]
            data = list(data)
            idx = [r[0] for r in data]
            data = (islice(r, 1, None) for r in data)
            data_1 = pd.DataFrame(data, index=idx, columns=cols)
            df = df.append(data_1, sort=False)
            print(file + " Imported to Df...")
        except Exception as e:
            print("Error: " + e + " When attempting to open file: " + file)
            # error_notify(e)
    print(df.head(10))
    return df
The problem is when we have files that have some sort of corruption. The files when opened manually will show an error like the one below.
I thought with my try, except code above this would catch an error like this and alert me with the error_notify(e) function. However, we get a result where the Python script crashes with an error like this: zipfile.BadZipFile: File is not a zip file
During handling of the above exception, another exception occurred.
There is more to the error, but I only copied and pasted this part in some communication with some folks in the office. It is impossible to replicate the error on our own. I have no idea how the files get corrupted in this way, except that there are multiple people accessing the files throughout the day.
The way to make the file readable is completely manual - we must open the file, get that error, hit yes, and save the file over the existing one. Then re-launch the script. But since the try, except isn't catching it and alerting us to the failure, we have to run the script manually to see if it works or not.
Two questions - am I doing something incorrect in my try, except command? I am admittedly weak in error catching so my first thought is there is more I can do there to make that work. Secondly, is there a Python way to get past that error in the Excel workbook files?
Here is the error text:
Traceback (most recent call last):
File "G:/Replenishment/Reporting/00 - I&A Replenishment/02 - Service Level/Daily Outs Comment Capture/Python/daily_outs_missed_files.py", line 48, in load_files
wb = load_workbook(file)
File "C:\ProgramData\Anaconda3\lib\site-packages\openpyxl\reader\excel.py", line 314, in load_workbook
data_only, keep_links)
File "C:\ProgramData\Anaconda3\lib\site-packages\openpyxl\reader\excel.py", line 124, in __init__
self.archive = _validate_archive(fn)
File "C:\ProgramData\Anaconda3\lib\site-packages\openpyxl\reader\excel.py", line 96, in _validate_archive
archive = ZipFile(filename, 'r')
File "C:\ProgramData\Anaconda3\lib\zipfile.py", line 1222, in __init__
self._RealGetContents()
File "C:\ProgramData\Anaconda3\lib\zipfile.py", line 1289, in _RealGetContents
raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "G:/Replenishment/Reporting/00 - I&A Replenishment/02 - Service Level/Daily Outs Comment Capture/Python/daily_outs_missed_files.py", line 123, in <module>
main()
File "G:/Replenishment/Reporting/00 - I&A Replenishment/02 - Service Level/Daily Outs Comment Capture/Python/daily_outs_missed_files.py", line 86, in main
df_output = df_clean()
File "G:/Replenishment/Reporting/00 - I&A Replenishment/02 - Service Level/Daily Outs Comment Capture/Python/daily_outs_missed_files.py", line 68, in df_clean
df = load_files()
File "G:/Replenishment/Reporting/00 - I&A Replenishment/02 - Service Level/Daily Outs Comment Capture/Python/daily_outs_missed_files.py", line 61, in load_files
print("Error: " + e + " When attempting to open file: " + file)
TypeError: can only concatenate str (not "BadZipFile") to str
The structure of your try/except looks correct. All user-defined exceptions in Python should be classes based on Exception. See BaseException and Exception in the Python documentation:
"Exception (..) All user-defined exceptions should also be derived from this class". See also the exception class hierarchy tree at the end of that section of the Python docs.
If your Python script "crashes" anyway, then either a library raised an exception that is not based on the Exception class, something that "should not" happen, or a second exception was raised inside the except block itself. In your traceback the original error is zipfile.BadZipFile, which does derive from Exception, and the final TypeError comes from the print call in the handler, where a string is concatenated with the exception object. You could look at the traceback and try catching the offending exception type separately, or find which part of the source code and which library is the cause, fix it and submit a PR. Here are two examples, a bad and a good way of deriving your own exceptions:
class MyBadError(BaseException):
    """
    my bad exception, do not make yours that way
    """
    pass
instead of the recommended
class MyGoodError(Exception):
    """
    exception based on the Exception
    """
    pass
Where and what exactly fails is still a bit of a mystery, but the problem with the exception in your traceback is not new; see the zipfile.BadZipfile issue in the pandas discussion. Note that xlrd, which pandas uses to read Excel workbook data, is currently "no-maintainer-ware" according to its authors, and in case of any issues the recommendation is to use openpyxl instead or to fix the issues yourself (the pandas maintainers wash their hands of it, yet happily keep xlrd as a dependency). I suggest you catch BadZipFile separately from all other exceptions, as a known corruption error; see the Python error handling tutorial for example code (you have probably seen it already; this is for other readers). If that does not work, I can trace it through the source code of your libraries / Python modules to the exact offending section and find the culprit, if you reach out directly.
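A minimal sketch of catching the corruption error separately. It assumes the per-file reading is passed in as a hypothetical `loader` callable (for instance a small function wrapping openpyxl's `load_workbook`); `load_all` and `loader` are names made up for this illustration and are not part of the original script. The messages are built with f-strings so that formatting the exception cannot itself fail inside the handler.

```python
import zipfile


def load_all(paths, loader):
    """Call loader(path) for every path; collect failures instead of crashing.

    `loader` stands in for whatever reads one workbook; it is a
    hypothetical parameter, not part of the original script.
    """
    loaded, failed = [], []
    for path in paths:
        try:
            loaded.append(loader(path))
        except zipfile.BadZipFile as e:
            # Known corruption case: record it and move on to the next file.
            print(f"Corrupt workbook skipped: {path} ({e})")
            failed.append(path)
        except Exception as e:
            # Anything else is unexpected, but should not stop the batch.
            print(f"Error: {e} when attempting to open file: {path}")
            failed.append(path)
    return loaded, failed
```

The `failed` list could then be handed to something like the `error_notify` function mentioned in the question, so that corrupt files are reported in one go after the run.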
I have a log file which is auto updated and looks like this:
...
[23:32:19.586] PULL START
[23:32:19.637] PULL RESP NONE
[23:32:22.576] Rx - +CMS ERROR: 29
[23:32:22.686] STAT - TRY 2
[23:32:22.797] Tx - AT+CMGF=1
[23:32:23.008] Rx - OK
[23:32:23.017] Tx - at+cmgs="number"
[23:32:23.428] Rx - >
[23:32:23.438] Tx - message
[23:32:24.675] PULL START
[23:32:24.714] PULL RESP NONE
[23:32:26.663] Rx - +CMS ERROR: 29
[23:32:26.681] STAT - 68$$"+CMS ERROR: 29"
[23:32:26.695] SEND - RESPONSE, TRANS ID = xxxxxxxx, RESP CODE = xx, MESSAGE = +CMS ERROR: 29
and I have a list to compare against, which looks like this:
[
'+CMS ERROR: 8',
'+CMS ERROR: 28',
'+CMS ERROR: 29',
'+CMS ERROR: 50',
'+CMS ERROR: 226',
]
All I want to do is this: if the last line of the log file contains a +CMS ERROR: XX string that matches one from the list, terminate the program that writes the log.
Note that the log file keeps updating at random times as long as the program is running, and my script will re-check the log file every second. If the latest update (the last line written to the log file) does not contain any of the strings in the list, nothing should be terminated.
Is it possible to do that in Python, for example with a regex? Please help.
So there are three major parts to this script.
Read and parse a log
Conditionally kill a process
Repeat every x seconds
The first part is easy. Let's call it should_act.
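def should_act():
    errors = ['+CMS ERROR: 8',
              '+CMS ERROR: 28',
              '+CMS ERROR: 29',
              '+CMS ERROR: 50',
              '+CMS ERROR: 226']
    line = ''  # stays empty if the log file has no lines yet
    with open("path/to/logfile.log") as f:
        for line in f:
            pass  # after the loop, line holds the last line of the file
    return any(error in line for error in errors)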
The second part isn't too bad either. Let's call that act.
def act():
    pid = YOUR_PROCESS_ID
    subprocess.run(['taskkill', '/PID', str(pid)])
    # or alternatively taskkill /IM YOUR_IMAGE_NAME works too.
The third part creates some problems, but ultimately isn't too bad either. There are lots of ways to do this, the best being to schedule it outside of the application. taskschd.msc is the best way to do this on Windows, and cron is the best way in general.
Doing this in application has a bunch of answers, some better than others. I'll let you select from those solutions and instead advise that you use the OS to schedule the script to run every x seconds.
import subprocess
# the two code blocks above
if __name__ == "__main__":
    if should_act():
        act()
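If the check has to live inside the script rather than in the OS scheduler, a minimal polling sketch could look like the following. `last_line`, `watch`, `interval` and `max_checks` are names introduced for this illustration only; the one-second default mirrors the interval mentioned in the question.

```python
import time


def last_line(path):
    """Return the last line of a text file, or '' if the file is empty."""
    line = ""
    with open(path) as f:
        for line in f:
            pass
    return line


def watch(path, errors, interval=1.0, max_checks=None):
    """Poll the log until its last line contains one of `errors`.

    Returns True when a match is seen, or False if `max_checks` polls pass
    without one. max_checks=None means poll forever.
    """
    checks = 0
    while max_checks is None or checks < max_checks:
        line = last_line(path)
        if any(error in line for error in errors):
            return True
        checks += 1
        time.sleep(interval)
    return False
```

Used together with the functions above, this would be roughly `if watch("path/to/logfile.log", errors): act()`.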
You want to continuously watch the file? Exactly akin to what the Unix command tail -f does? Then before a code pointer, I suggest picking the right tool for the job. Outsource. If you want an in-process solution, take a look at Watchdog. If you are comfortable reading from a subprocess, consider any of the solutions in A Windows equivalent of the Unix tail command
Meanwhile, if you absolutely must open the file afresh each time, seek to the end first for efficiency:
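import os

with open('mylog.txt', 'rb') as logf:  # binary mode: text-mode files cannot seek relative to the end
    logf.seek(0, os.SEEK_END)
    logf.seek(max(logf.tell() - 1024, 0))  # back up at most 1024 bytes from the end
    last_line = logf.readlines()[-1].decode()
for exit_error in exit_error_strings:
    if exit_error in last_line:
        raise SystemExit  # just exit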
Now, this assumes that no log line will be more than 1024 characters. If that's not a safe assumption, then obviously choose a value that is safe or add additional logic as appropriate.
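One possible form of that additional logic, as a sketch only: read a block from the end and keep doubling it until it contains a complete line. `read_last_line` and `block_size` are hypothetical names, and UTF-8 is an assumption about the log's encoding.

```python
import os


def read_last_line(path, block_size=1024):
    """Return the last non-empty line of a file, decoded as UTF-8."""
    with open(path, "rb") as f:  # binary mode allows seeking from the end
        f.seek(0, os.SEEK_END)
        file_size = f.tell()
        read_size = min(block_size, file_size)
        while True:
            f.seek(file_size - read_size)
            chunk = f.read(read_size).rstrip(b"\r\n")
            # Done once the chunk holds a line break or covers the whole file.
            if b"\n" in chunk or read_size == file_size:
                break
            read_size = min(read_size * 2, file_size)
    return chunk.split(b"\n")[-1].decode("utf-8", errors="replace")
```

This keeps the cheap case cheap (one small read) while still coping with an unusually long final line.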
Regarding regular expressions, they are often more expensive (computationally, memory) than you think, but if you've measured, you could also do something like:
import re
exit_error_re = re.compile(r'\+CMS ERROR: \d\d')
...
if exit_error_re.search(last_line):
# do something
Obviously, set the regex as appropriate for your needs.
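As a sketch of tailoring the pattern to the list from the question: the regex can be built from the exact strings, with a guard so that a longer code is not mistaken for a listed one (a plain substring test for '+CMS ERROR: 8' would also match a hypothetical '+CMS ERROR: 81'). `is_exit_error` is a name introduced here for illustration.

```python
import re

exit_error_strings = [
    '+CMS ERROR: 8',
    '+CMS ERROR: 28',
    '+CMS ERROR: 29',
    '+CMS ERROR: 50',
    '+CMS ERROR: 226',
]

# re.escape protects the literal "+"; (?!\d) rejects a longer code that
# merely starts with one of the listed codes.
exit_error_re = re.compile(
    '(?:' + '|'.join(re.escape(s) for s in exit_error_strings) + r')(?!\d)'
)


def is_exit_error(line):
    """Return True if the line contains one of the listed error codes."""
    return exit_error_re.search(line) is not None
```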
You can do this by reading the file into a list of lines and looking at the last one. You can put it in a loop so that it re-checks automatically.
For this example I chose error number 8.
import os

filename = 'log.txt'
size = os.stat(filename).st_size
with open(filename, 'rb') as f:
    f.seek(size // 2)  # skip the first half so that large files are read faster
    array_list = f.readlines()
if array_list and b'+CMS ERROR: 8' in array_list[-1]:
    pass  # Your code here
I can't seem to generate this exception with python code with a genuine error.
I used the code from this question to check my work. Here it is, modified only slightly:
import py_compile
def check(python_file):
    try:
        file = open(python_file, 'r')
        py_compile.compile(python_file, doraise=True)
    except py_compile.PyCompileError:
        print("<"+python_file+"> does not contain syntactically correct Python code")
    else:
        print("Compiled " + python_file + " with no issues.")
check("example.py")
The file example.py contains just:
print ("This is fine.")
prant ("This should be an error.")
'prant' instead of 'print' would be a simple syntax error, and if I run 'python example.py' then I see:
This is fine.
Traceback (most recent call last):
File "example.py", line 2, in <module>
prant ("This should be an error.")
NameError: name 'prant' is not defined
If I call the script at the top 'compiler.py' and then run 'python compiler.py' it will say there are no issues.
I have verified that compiler.py will complain about syntactic correctness if there are unmatched parentheses or quotes, so it does catch some problems. But I would like to be able to detect when a file has errors in the same way that running 'python example.py' or whatever would do. Basically, if it has an error when running it with 'python', I'd like to be able to detect that.
Is there a way to do this? And why is PyCompileError not being thrown when there is a syntax error?
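One observation that may help here: `prant(...)` is syntactically valid (it is simply a call to an undefined name), so compiling succeeds and the NameError only appears at run time. A minimal sketch, assuming it is acceptable to actually execute the file, side effects included, in a child interpreter; the reworked `check` below is an illustration and not the code from the linked question:

```python
import subprocess
import sys


def check(python_file):
    """Run the file in a child interpreter and report whether it failed."""
    result = subprocess.run(
        [sys.executable, python_file],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # The last stderr line is normally the exception summary.
        stderr_lines = result.stderr.strip().splitlines()
        summary = stderr_lines[-1] if stderr_lines else "unknown error"
        print(f"<{python_file}> failed: {summary}")
        return False
    print(f"Ran {python_file} with no issues.")
    return True
```

Because the file is really executed, this catches both syntax errors and uncaught runtime errors, at the cost of running whatever the script does.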
I am trying to learn how to use Python-click. I was not able to use a help parameter with one of my options so I finally gave up and changed the code to not include help for that option. However, despite closing and restarting Python and now rebooting my computer the error message associated with trying to use the help parameter is still appearing.
Code:
import click

def something():
    pass

@click.command()
@click.argument('dest_dir', type=click.Path(exists=True, readable=True,
                                            resolve_path=True, dir_okay=True),
                help='Location of directory where results will be saved')
@click.option('--use_terms', is_flag=True,
              help='Process strings based on terms or phrases')
@click.option('--use_query', is_flag=True,
              help='Process string based on search query')
@click.option('--search_phrase', '-s', multiple=True)
def do_process(dest_dir,use_terms,use_query,*search_phrase):
    """ testing setting parameters for snip tables"""
    outref = open('e:\\myTemp\\testq.txt')
    ms = dest_dir + '\n'
    if use_terms:
        ms += use_term + '\n'
    else:
        ms += use_query + '\n'
    for each in search_phrase:
        x = something()
        ms += each + '\n'
    outref.writelines(ms)
    outref.close()

if __name__ == "__main__":
    do_process()
Originally, for the last @click.option I had
@click.option('--search_phrase', '-s', multiple=True, help='The search phrase to use')
I kept getting an error message that I could not solve, about an unknown parameter help. I ditched it and changed to what is above, and now I am getting a similar error.
I then shut down Python, closed my module, restarted Python, opened and ran my code again, and I am still getting this error message.
Traceback:
Traceback (most recent call last):
File "C:\Program Files\PYTHON\snipTables\test_snip_click.py", line 14, in <module>
@click.option('--search_phrase', '-s', multiple=True)
File "C:\Program Files\PYTHON\lib\site-packages\click\decorators.py", line 148, in decorator
_param_memo(f, ArgumentClass(param_decls, **attrs))
File "C:\Program Files\PYTHON\lib\site-packages\click\core.py", line 1618, in __init__
Parameter.__init__(self, param_decls, required=required, **attrs)
TypeError: __init__() got an unexpected keyword argument 'help'
So then I shut down Python IDLE, saved and closed my code, restarted Python, and reopened my code, but I am still getting the same traceback. Notice that the traceback now shows the line of code I switched to after beating my head hard against the monitor and giving up.
I am getting ready to reboot but am really curious as to the cause.
I rebooted and am still getting the same error.
Renaming the file and running it again did not change the outcome: same traceback.
The problem is that click does not accept a help string for an argument (click.argument); only options take one. It is interesting behavior. The help text for the argument is the docstring of the function that processes the argument and options.
The error message will always show up associated with the last option. So the correct code for this example would be
import click

def something():
    pass

@click.command()
@click.argument('dest_dir', type=click.Path(exists=True, readable=True,
                                            resolve_path=True, dir_okay=True))
# Notice no help string here
@click.option('--use_terms', is_flag=True,
              help='Process strings based on terms or phrases')
@click.option('--use_query', is_flag=True,
              help='Process string based on search query')
@click.option('--search_phrase', '-s', multiple=True)
def do_process(dest_dir, use_terms, use_query, search_phrase):
    """ testing setting parameters for snip tables"""
    outref = open('e:\\myTemp\\testq.txt', 'w')  # open for writing
    ms = dest_dir + '\n'
    if use_terms:
        ms += str(use_terms) + '\n'  # flags are booleans, so convert before joining
    else:
        ms += str(use_query) + '\n'
    for each in search_phrase:  # click passes multiple=True options as a tuple
        x = something()
        ms += each + '\n'
    outref.writelines(ms)
    outref.close()

if __name__ == "__main__":
    do_process()
This runs fine. The problem I was originally having was that click did not do a good job of explaining the source of the error. Above, even though I had removed the help string from the option, the traceback attributed the help string from the argument to the last option that was parsed.
Maybe you renamed the source file and are running an old version that was compiled under the previous name?
Try deleting the *.pyc files.
I write a lot of Python code that uses external libraries. Frequently I will write a bug, and when I run the code I get a big long traceback in the Python console. 99.999999% of the time it's due to a coding error in my code, not because of a bug in the package. But the traceback goes all the way to the line of error in the package code, and either it takes a lot of scrolling through the traceback to find the code I wrote, or the traceback is so deep into the package that my own code doesn't even appear in the traceback.
Is there a way to "black-box" the package code, or somehow only show traceback lines from my code? I'd like the ability to specify to the system which directories or files I want to see traceback from.
To print your own stack trace, you need to handle all unhandled exceptions yourself; this is where sys.excepthook comes in handy.
The signature for this function is sys.excepthook(type, value, traceback) and its job is:
This function prints out a given traceback and exception to sys.stderr.
So as long as you can work with the traceback and extract only the portion you care about, you should be fine. Testing frameworks do this very frequently; they have custom assert functions which usually do not appear in the traceback. In other words, they skip the frames that belong to the test framework. Also, in those cases, the tests are usually started by the test framework as well.
You end up with a traceback that looks like this:
[ custom assert code ] + ... [ code under test ] ... + [ test runner code ]
How to identify your code.
You can add a global to your code:
__mycode = True
Then to identify the frames:
def is_mycode(tb):
    # dict.has_key() no longer exists in Python 3; use the `in` operator
    frame_globals = tb.tb_frame.f_globals
    return '__mycode' in frame_globals
How to extract your frames.
skip the frames that don't matter to you (e.g. custom assert code)
identify how many frames are part of your code -> length
extract length frames
def mycode_traceback_levels(tb):
    length = 0
    while tb and is_mycode(tb):
        tb = tb.tb_next
        length += 1
    return length
Example handler.
import sys
import traceback

def handle_exception(type, value, tb):
    # 1. skip custom assert code, e.g.
    # while tb and is_custom_assert_code(tb):
    #     tb = tb.tb_next
    # 2. only display your code
    length = mycode_traceback_levels(tb)
    print(''.join(traceback.format_exception(type, value, tb, length)))
install the handler:
sys.excepthook = handle_exception
What next?
You could adjust length to add one or more levels if you still want some info about where the failure is outside of your own code.
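The adjustment of length described above could be sketched for Python 3 as follows. `make_handler` and its `extra_levels` parameter are hypothetical names introduced for this illustration; the helper functions repeat the ones shown earlier so that the block is self-contained.

```python
import sys
import traceback

__mycode = True  # marker global; present only in modules you consider "yours"


def is_mycode(tb):
    """True if the frame's module defines the marker global."""
    return '__mycode' in tb.tb_frame.f_globals


def mycode_traceback_levels(tb):
    """Count the leading traceback entries that belong to your own code."""
    length = 0
    while tb and is_mycode(tb):
        tb = tb.tb_next
        length += 1
    return length


def make_handler(extra_levels=0):
    """Build an excepthook that shows your frames plus `extra_levels` more."""
    def handle_exception(exc_type, value, tb):
        limit = mycode_traceback_levels(tb) + extra_levels
        lines = traceback.format_exception(exc_type, value, tb, limit)
        print(''.join(lines), end='', file=sys.stderr)
    return handle_exception
```

Installing it with `sys.excepthook = make_handler(extra_levels=1)` would then show your own frames plus the first frame inside the library, which is often enough to see which library call failed.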
see also https://gist.github.com/dnozay/b599a96dc2d8c69b84c6
As others suggested, you could use sys.excepthook:
This function prints out a given traceback and exception to sys.stderr.
When an exception is raised and uncaught, the interpreter calls sys.excepthook with three arguments, the exception class, exception instance, and a traceback object. In an interactive session this happens just before control is returned to the prompt; in a Python program this happens just before the program exits. The handling of such top-level exceptions can be customized by assigning another three-argument function to sys.excepthook.
(emphasis mine)
It's possible to filter a traceback extracted by extract_tb (or similar functions from the traceback module) based on specified directories.
Two functions that can help:
import sys
from os.path import join, abspath
from traceback import extract_tb, format_list, format_exception_only

def spotlight(*show):
    ''' Return a function to be set as new sys.excepthook.
        It will SHOW traceback entries for files from these directories. '''
    show = tuple(join(abspath(p), '') for p in show)

    def _check_file(name):
        return name and name.startswith(show)

    def _print(type, value, tb):
        show = (fs for fs in extract_tb(tb) if _check_file(fs.filename))
        fmt = format_list(show) + format_exception_only(type, value)
        print(''.join(fmt), end='', file=sys.stderr)

    return _print

def shadow(*hide):
    ''' Return a function to be set as new sys.excepthook.
        It will HIDE traceback entries for files from these directories. '''
    hide = tuple(join(abspath(p), '') for p in hide)

    def _check_file(name):
        return name and not name.startswith(hide)

    def _print(type, value, tb):
        show = (fs for fs in extract_tb(tb) if _check_file(fs.filename))
        fmt = format_list(show) + format_exception_only(type, value)
        print(''.join(fmt), end='', file=sys.stderr)

    return _print
They both use the traceback.extract_tb. It returns "a list of “pre-processed” stack trace entries extracted from the traceback object"; all of them are instances of traceback.FrameSummary (a named tuple). Each traceback.FrameSummary object has a filename field which stores the absolute path of the corresponding file. We check if it starts with any of the directory paths provided as separate function arguments to determine if we'll need to exclude the entry (or keep it).
Here's an Example:
The enum module from the standard library doesn't allow reusing keys,
import enum
enum.Enum('Faulty', 'a a', module=__name__)
yields
Traceback (most recent call last):
File "/home/vaultah/so/shadows/main.py", line 23, in <module>
enum.Enum('Faulty', 'a a', module=__name__)
File "/home/vaultah/cpython/Lib/enum.py", line 243, in __call__
return cls._create_(value, names, module=module, qualname=qualname, type=type, start=start)
File "/home/vaultah/cpython/Lib/enum.py", line 342, in _create_
classdict[member_name] = member_value
File "/home/vaultah/cpython/Lib/enum.py", line 72, in __setitem__
raise TypeError('Attempted to reuse key: %r' % key)
TypeError: Attempted to reuse key: 'a'
We can restrict stack trace entries to our code (in /home/vaultah/so/shadows/main.py).
import sys, enum
sys.excepthook = spotlight('/home/vaultah/so/shadows')
enum.Enum('Faulty', 'a a', module=__name__)
and
import sys, enum
sys.excepthook = shadow('/home/vaultah/cpython/Lib')
enum.Enum('Faulty', 'a a', module=__name__)
give the same result:
File "/home/vaultah/so/shadows/main.py", line 22, in <module>
enum.Enum('Faulty', 'a a', module=__name__)
TypeError: Attempted to reuse key: 'a'
There's a way to exclude all site directories (where 3rd party packages are installed - see site.getsitepackages)
import sys, site, jinja2
sys.excepthook = shadow(*site.getsitepackages())
jinja2.Template('{%}')
# jinja2.exceptions.TemplateSyntaxError: unexpected '}'
# Generates ~30 lines, but will only display 4
Note: Don't forget to restore sys.excepthook from sys.__excepthook__. Unfortunately, you won't be able to "patch-restore" it using a context manager.
traceback.extract_tb(tb) returns a list of frame entries in the form (filename, line number, function name, source text); you can work with those to format the traceback. See also https://pymotw.com/2/sys/exceptions.html
import sys
import traceback

def handle_exception(ex_type, ex_info, tb):
    print(ex_type, ex_info, traceback.extract_tb(tb))

sys.excepthook = handle_exception