How to log everything in Python without instrumentation

I want to log everything:
Function entered + values of parameters + function exited
Result of every assignment or operation
etc.
Is it possible to log "everything" in a Python execution without instrumenting the code?
Since things are executing in a VM, it should be possible to configure this at the VM level (hopefully?).
I'm using PyCharm, but I could do it via the command line if necessary.
There's this existing question: How to do logging at function entry, inside and exit in Python, but it doesn't address how to log the result of variable assignments.

You would need to use the trace module and/or perhaps the pdb module. They may not give you everything you need, but they would be a starting point; both are built on the interpreter's tracing hook, sketched below. The logging module doesn't operate at as low a level as you seem to want.
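Here is a minimal sketch of that hook (the logger name and demo function are illustrative), using sys.settrace, which both trace and pdb are built on. The global tracer logs every function entry with its arguments; the local tracer logs the frame's locals line by line, so an assignment shows up as a changed value on the next event, and the return value is logged on exit:
import sys
import logging

logging.basicConfig(level=logging.DEBUG, format="%(message)s")
log = logging.getLogger("tracer")

def trace_lines(frame, event, arg):
    # local trace function: called for every 'line' and 'return' event in a frame
    if event == "line":
        # these are the locals *before* the line runs; an assignment
        # shows up as a changed value on the next event
        log.debug("  %s:%d locals=%r",
                  frame.f_code.co_filename, frame.f_lineno, frame.f_locals)
    elif event == "return":
        log.debug("exit %s -> %r", frame.f_code.co_name, arg)
    return trace_lines

def trace_calls(frame, event, arg):
    # global trace function: called on every function entry ('call' event);
    # at that point frame.f_locals holds the arguments
    if event == "call":
        log.debug("enter %s args=%r", frame.f_code.co_name, frame.f_locals)
        return trace_lines  # opt this frame in to line-by-line tracing
    return None

sys.settrace(trace_calls)

def demo(x):
    y = x * 2
    return y + 1

demo(21)
sys.settrace(None)  # stop tracing
Note that tracing every line is very slow, so this is for diagnosis, not production.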

Related

How can I change the default pager for Python's help() utility?

I'm currently doing some work on a server (Ubuntu) without admin rights or any contact with the administrator. When using help(command) in the Python command line, I get an error.
Here's an example:
>>> help(someCommand)
/bin/sh: most: command not found
So, this error indicates that the most pager is not currently installed. However, the server I'm working on has the more and less pagers installed. So, how can I change the default pager configuration for this Python utility?
This one is annoyingly difficult to research, but I think I found it.
The built-in help generates its messages using the standard library pydoc module (the module is also intended to be usable as a standalone script). In that documentation, we find:
When printing output to the console, pydoc attempts to paginate the output for easier reading. If the PAGER environment variable is set, pydoc will use its value as a pagination program.
So, presumably, PAGER has been set to most on your system. Assuming it won't break anything else, just unset or change it. (It still pages without a value set, even on Windows; I assume it has a built-in fallback.)
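For example, assuming nothing else on the system relies on PAGER, you can point it at a pager the server does have from inside Python, before the first help() call (pydoc appears to choose its pager once, on first use, so set it early in the session):
import os

os.environ["PAGER"] = "less"  # or "more"; pydoc consults this variable

help(os.path)                 # should now page with less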
You can make a custom most script that just invokes less (or even more).
The steps would be:
Set up a script called most, the contents of which are:
#!/bin/sh
less "$@"  # i.e. pass every argument (everything except argument 0) through to less
Put that script in a location that is on your PATH
Then most filename should just run less on that file, and that command should get called from within your Python interpreter.
To be honest though, I'd just use Karl's approach.
You can view the various pager options in the pydoc source, in the getpager() function. That function can be replaced to return whatever pager is desired. For example:
import pydoc

# getpager() normally probes PAGER and the available pagers, and returns a
# function that takes the text to display; this override forces less
pydoc.getpager = lambda: lambda text: pydoc.pipepager(text, 'less')

Using PyCharm to debug a custom GNU Radio block work function written in Python

I have found that the PyCharm debugger will not stop at breakpoints set in the work functions of custom Python GR blocks. However, it will stop at breakpoints set in a block's constructor.
Is there a way to stop at breakpoints in a work function? I know that the function is being run because when I put print statements in the work function, I see the results of the print when the flowgraph is run.
That's to be expected: work() is called from a C++ context; your debugger simply doesn't "control" the execution.
I'm sure it's possible to attach breakpoints somehow, but I don't know of a specific way. It would be relatively involved, because you'd need to instrument Python in a GNU Radio-specific way, I guess.
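One workaround that may help (an assumption on my part, not verified against GNU Radio) is PyCharm's remote-debug server: create a "Python Debug Server" run configuration, install the matching pydevd-pycharm package, and attach explicitly from inside the work function, so execution suspends there even though the call came from C++:
# inside your block's work(), before the code you want to inspect;
# host and port must match the PyCharm "Python Debug Server" run configuration
import pydevd_pycharm
pydevd_pycharm.settrace('localhost', port=5678,
                        stdoutToServer=True, stderrToServer=True)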

In Python, why use logging instead of print?

For simple debugging in a complex project is there a reason to use the python logger instead of print? What about other use-cases? Is there an accepted best use-case for each (especially when you're only looking for stdout)?
I've always heard that this is a "best practice" but I haven't been able to figure out why.
The logging package has a lot of useful features:
Easy to see where and when (even what line no.) a logging call is being made from.
You can log to files, sockets, pretty much anything, all at the same time.
You can differentiate your logging based on severity.
Print doesn't have any of these.
Also, if your project is meant to be imported by other Python tools, it's bad practice for your package to print things to stdout, since the user likely won't know where the print messages are coming from. With logging, users of your package can choose whether or not they want to propagate logging messages from your tool.
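As a minimal sketch of those features (the file name is illustrative), one basicConfig call puts the level, source file, and line number in every record and sends it to the console and a file at the same time:
import logging

logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s %(levelname)s %(filename)s:%(lineno)d %(message)s",
    handlers=[
        logging.StreamHandler(),         # the console...
        logging.FileHandler("app.log"),  # ...and a file, simultaneously
    ],
)

log = logging.getLogger(__name__)
log.debug("fine-grained detail, easy to silence later")
log.warning("something looks off")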
One of the biggest advantages of proper logging is that you can categorize messages and turn them on or off depending on what you need. For example, it might be useful to turn on debugging level messages for a certain part of the project, but tone it down for other parts, so as not to be taken over by information overload and to easily concentrate on the task for which you need logging.
Also, logs are configurable. You can easily filter them, send them to files, format them, add timestamps, and any other things you might need on a global basis. Print statements are not easily managed.
Print statements are sort of the worst of both worlds, combining the negative aspects of an online debugger with those of diagnostic instrumentation: you have to modify the program, but you get no lasting, useful code out of it.
An online debugger lets you inspect the state of a running program, and the nice thing about a real debugger is that you don't have to modify the source, neither before nor after the debugging session: you just load the program into the debugger, tell the debugger where you want to look, and you're all set.
Instrumenting the application takes some work up front, modifying the source code in some way, but the resulting diagnostic output can have enormous amounts of detail, and can be turned on or off to a very specific degree. The Python logging module can show not just the message logged, but also the file and function that called it, a traceback if there was one, the actual time the message was emitted, and so on. More than that, diagnostic instrumentation need never be removed: it's just as valid and useful when the program is finished and in production as it was the day it was added, but its output can be stuck in a log file where it's not likely to annoy anyone, or the log level can be turned down to keep all but the most urgent messages out.
Anticipating the need for a debugger is really no harder than using ipython while you're testing and becoming familiar with the commands it uses to control the built-in pdb debugger.
When you find yourself thinking that a print statement might be easier than using pdb (as it often is), you'll find that using a logger leaves your program in a much easier state to work on than if you add and later remove print statements.
I have my editor configured to highlight print statements as syntax errors, and logging statements as comments, since that's about how I regard them.
In brief, the advantages of using a logging library over print come down to the following:
Control what’s emitted
Define what types of information you want to include in your logs
Configure how it looks when it’s emitted
Most importantly, set the destination for your logs
In detail, segmenting log events by severity level is a good way to sift out the log messages that are most relevant at a given time. A log event's severity level also tells you how worried you should be when you see a particular message; the standard division is into debug, info, warning, error, and critical. Timing can be everything when you're trying to understand what went wrong with an application. You want to know the answers to questions like:
“Was this happening before or after my database connection died?”
“Exactly when did that request come in?”
Furthermore, it is easy to see where a log event occurred, via its line number and filename or method name, and even which thread it came from.
There's also a full-featured logging library for Python named loguru.
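As a quick sketch of its flavor (the file name and rotation size are illustrative), loguru logs timestamps, levels, and source locations out of the box, and a rotating file sink is one call:
from loguru import logger

logger.add("app.log", rotation="10 MB", level="DEBUG")  # file sink with rotation

logger.debug("fine-grained detail")
logger.warning("something looks off")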
If you use logging then the person responsible for deployment can configure the logger to send it to a custom location, with custom information. If you only print, then that's all they get.
Logging essentially creates a searchable plain-text database of print-style outputs with other metadata (timestamp, log level, line number, process, etc.).
This is pure gold: I can run egrep over the log file after the Python script has run.
I can tune my egrep pattern to pick out exactly what I am interested in and ignore the rest. This reduction of cognitive load, and the freedom to pick my egrep pattern later by trial and error, is the key benefit for me.
tail -f mylogfile.log | egrep "key_word1|key_word2"
Now throw in other cool things that print can't do (sending to socket, setting debug levels, logrotate, adding meta data etc.), you have every reason to prefer logging over plain print statements.
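As a sketch of that metadata side (the format string is just one choice; mylogfile.log matches the tail example above), each record below carries a timestamp, level, process id, thread name, and source location, all of which become grep handles:
import logging

logging.basicConfig(
    filename="mylogfile.log",
    format="%(asctime)s %(levelname)s pid=%(process)d %(threadName)s "
           "%(filename)s:%(lineno)d %(message)s",
    level=logging.INFO,
)
logging.info("key_word1: connection established")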
I tend to use print statements because they're lazy and easy, and adding logging needs some boilerplate code. But we have yasnippets (emacs), ultisnips (vim), and other templating tools, so why give up logging for plain print statements?
I would add to all the other advantages mentioned that the print function in its standard configuration is buffered: when output is redirected, a print may not be flushed until the output buffer fills.
This is true for any program launched from a non-interactive shell (CodeBuild or GitLab CI, for instance) or whose output is redirected.
If the program is killed for any reason (kill -9, a hard reset of the machine, ...), you may be missing some lines of output if you used print.
The logging library, however, flushes each record written to stderr or stdout immediately as it is emitted.

Seeing what gets written to stderr in Python web site using FastCGI

I am working on a website, hosted on DreamHost, using Python. For a while, I was using their default setup, which runs Python scripts using CGI. It worked fine, but I was worried that if I get a lot of traffic, it would run slow and use a lot of memory, so I switched it over to FastCGI using this module.
Overall, it still works fine, but there is one major annoyance: I can't seem to be able to see anything that gets written to the standard error stream. If anything goes wrong, my usual source of useful clues for what to do about it no longer works. Before, I used to see stuff sent to standard error in my Apache error log. Now, it just seems to disappear.
I tried making a test script, and writing strings using sys.stderr.write (from various places), and environ["wsgi.errors"].write (from within my app, where environ is the first parameter passed to the app by the WSGI/FastCGI wrapper). Either way, I couldn't find them. Does anyone know why, or how to access this data?
Keep in mind that this is my first time ever using FastCGI, so please let me know if I am making a bad choice by using this fcgi module.
If something in your system is capturing file descriptor two (the "real" stderr), you can assign sys.stderr to any open, writable file object, or to a file-like object (it basically just needs to implement write), including a StringIO instance (io.StringIO), whose value you can get at any time (before it's closed) with a call to its .getvalue() method.
To capture any uncaught exception just before it terminates your code, assign to sys.excepthook a function of yours that collects the information and emits it in any way of your choice; or, to get and emit anything that was written to sys.stderr even without an exception (if that's what you want; I'm not sure from your question), use atexit to register your grab-info-and-emit-it function.
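A minimal sketch of both ideas together (the log path is illustrative; adjust it to somewhere your FastCGI process can write):
import atexit
import io
import sys
import traceback

_captured = io.StringIO()
sys.stderr = _captured  # anything written to sys.stderr now lands here

def _dump_captured():
    # at interpreter exit, emit whatever was captured somewhere visible
    text = _captured.getvalue()
    if text:
        with open("/tmp/fastcgi-stderr.log", "a") as f:
            f.write(text)

atexit.register(_dump_captured)

def _log_uncaught(exc_type, exc_value, exc_tb):
    # route uncaught exceptions into the same capture buffer
    traceback.print_exception(exc_type, exc_value, exc_tb, file=_captured)

sys.excepthook = _log_uncaught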

Embedded Python - Blocking operations in time module

I'm developing my own Python code interpreter using the Python C API, as described in the Python documentation. I've taken a look at the Python source code and tried to follow the same steps that the standard interpreter carries out when executing a .py file. These steps (the sequence of C API function calls) are basically:
PyRun_AnyFileExFlags()
PyRun_SimpleFileExFlags()
PyRun_FileExFlags()
PyArena_New()
PyParser_ASTFromFile()
run_mod()
PyAST_Compile()
PyEval_EvalCode()
PyEval_EvalCodeEx()
PyThreadState_GET()
PyFrame_New()
PyEval_EvalFrameEx()
The only difference in my code is that I do the AST compilation, frame creation, etc. manually, and then I call PyEval_EvalFrame.
With this, I am able to execute an arbitrary .py file with my program, as if it were the normal Python interpreter. My problem comes when the code my program is executing makes use of the time module: all time module operations block on the GIL! For example, if the Python code calls time.sleep(1), this call blocks and never completes.
Obviously I am doing something wrong that blocks the GIL (and therefore blocks the time module), but I don't know how to correct it. The last statement in my code where I have control is PyEval_EvalFrameEx, and from that point on, everything runs "as in the regular Python interpreter", I think.
Has anybody had a similar problem? What am I doing wrong that blocks the time module?
You need to provide more detail.
How does your interpreter's behavior differ from the standard interpreter?
If you just want to run arbitrary source files, why not call one of the higher-level interfaces, like PyRun_SimpleFile? Did your code call Py_Initialize?
