Seeing what gets written to stderr in a Python web site using FastCGI - python

I am working on a website, hosted on DreamHost, using Python. For a while, I was using their default setup, which runs Python scripts using CGI. It worked fine, but I was worried that if I got a lot of traffic, it would run slowly and use a lot of memory, so I switched it over to FastCGI using this module.
Overall, it still works fine, but there is one major annoyance: I can't seem to see anything that gets written to the standard error stream. If anything goes wrong, my usual source of useful clues about what to do no longer works. Before, I used to see stuff sent to standard error in my Apache error log. Now, it just seems to disappear.
I tried making a test script, and writing strings using sys.stderr.write (from various places), and environ["wsgi.errors"].write (from within my app, where environ is the first parameter passed to the app by the WSGI/FastCGI wrapper). Either way, I couldn't find them. Does anyone know why, or how to access this data?
Keep in mind that this is my first time ever using FastCGI, so please let me know if I am making a bad choice by using this fcgi module.

If something in your system is capturing file-descriptor two (the "real" stderr), you can assign sys.stderr to any open, writable file object, or to a file-like object (it basically just needs to implement write) -- including a cStringIO.StringIO instance, whose value you can get at any time (before it's closed) with a call to its .getvalue() method.
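For illustration, a minimal sketch of that idea (Python 2, with a made-up log path):

import sys
from cStringIO import StringIO

captured = StringIO()
sys.stderr = captured          # anything written to sys.stderr now lands in the buffer

sys.stderr.write("some debugging output\n")

# ... later, before the buffer is closed, copy it somewhere you can actually read:
with open("/home/username/stderr.log", "a") as log:   # path is just an example
    log.write(captured.getvalue())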
To capture any uncaught exception just before it terminates your code, assign to sys.excepthook a function of yours in which you get the information and emit it in any way of your choice; or, to get and emit anything that was written to sys.stderr even without an exception (if that's what you want -- I'm not sure, from your question), use atexit to register your grab-info-and-emit-it function.
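A rough sketch of both hooks (the log path and function names are made up):

import atexit
import sys
import traceback

def emit_uncaught(exc_type, exc_value, exc_tb):
    # runs instead of the default traceback printer for any uncaught exception
    with open("/home/username/errors.log", "a") as log:   # illustrative path
        traceback.print_exception(exc_type, exc_value, exc_tb, file=log)

sys.excepthook = emit_uncaught

def emit_captured_stderr():
    # runs at interpreter shutdown; `captured` is the StringIO from the sketch above
    with open("/home/username/errors.log", "a") as log:
        log.write(captured.getvalue())

atexit.register(emit_captured_stderr)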

Related

How to customize Flyway so that it can handle CSV files as input as well?

Has someone implemented CSV handling for Flyway? It was requested some time ago (Flyway specific migration with csv files). Flyway now mentions it as a possibility for the MigrationResolver and MigrationExecutor, but it does not seem to be implemented.
I've tried to do it myself with Flyway 4.2, but I'm not very good with Java. I got as far as creating my own jar from the sample and making it accessible to Flyway. But how does Flyway distinguish when to use the SqlMigrator and when to use my CsvMigrator? I thought I had to register my own prefix/suffix (as the question above suggests), but FlywayConfiguration seems to be read-only; at least I did not see any API calls for doing this :(.
How do I connect the different resolvers to the different migration file types? (.sql to the SQL migration, and .csv/.py to CSV loading and Python script execution)
After some blood, sweat and tears, it looks like I came up with something on this. I can't make the whole code available because it uses a proprietary file format, but here are the main ideas:
implement ConfigurationAware as well, and use your setFlywayConfiguration implementation to catalog the extra files you want to handle (e.g. .csv). This is executed only once during the run.
during this cataloging I could not use the scanner or LoadableResources; there's some Java magic I do not understand. All the classes and methods seem to be available and accessible, even when using .getMethods() at runtime... but actually calling them during a run throws java.lang.NoSuchMethodError and java.lang.NoClassDefFoundError. I've wasted a whole day on this - don't do that, just copy-paste the code from org.flywaydb.core.internal.util.scanner.filesystem.FileSystemScanner.
use Set<String> instead of LoadableResources[]; it's way easier to work with, especially since there's no access to LoadableResources anyway and working with the array was a nightmare.
the Python/shell call goes into execute(). Some tips:
any exception or faulty exit code needs to be translated into an SQLException.
the build enforces Java 1.6, so new ProcessBuilder(cmd).inheritIO() cannot be used. If you want to print STDOUT/STDERR, look at the solutions in "ProcessBuilder: Forwarding stdout and stderr of started processes without blocking the main thread".
to compile Flyway including your custom module, clone the whole Flyway repo from git, edit the main pom.xml to include your module as well, and compile with: "mvn install -P-CommercialDBTest -P-CommandlinePlatformAssemblies -DskipTests=true" (I found this in another Stack Overflow question).
what I haven't done yet is the checksum part; I don't know yet what that wants.

Python fabric put statistics

When I put a file on a remote server (using put()), is there any way I can see the upload information or statistics printed to stdout?
There's no such way according to the documentation. You could, however, try the project tools.
There's also the option to play with fabric's local function, but that of course breaks the whole host concept.
There's also no way to make fabric more verbose than the default (except for debugging). This makes sense, because fabric doesn't really handle the terminal escape sequences needed to rewrite lines in place, and displaying statistics would print way too many lines. It would actually be a nice feature - detecting line deletions within fabric and applying them (just throwing the idea out for a potential pull request).
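If you do go the local route, a rough sketch (Fabric 1.x API; rsync and its --progress flag stand in for whatever transfer tool you prefer):

from fabric.api import env, local, task

@task
def put_with_stats(src, dest):
    # rsync --progress prints per-file transfer statistics to stdout,
    # which put() won't do; user/host come from Fabric's env
    local("rsync --progress %s %s@%s:%s" % (src, env.user, env.host, dest))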

subprocess.call does not wait for the process to complete

Per the Python documentation, subprocess.call should block and wait for the subprocess to complete. In this code I am trying to convert a few xls files to a new format by calling LibreOffice on the command line. I assumed that subprocess.call is blocking, but it seems I need to add an artificial delay after each call, otherwise I miss a few files in the out directory.
What am I doing wrong, and why do I need the delay?
from subprocess import call

for i in range(0, len(sorted_files)):
    args = ['libreoffice', '-headless', '-convert-to', 'xls',
            "%s/%s.xls" % (sorted_files[i]['filename'], sorted_files[i]['filename']),
            '-outdir', 'out']
    call(args)
    var = raw_input("Enter something: ")  # if I comment this line out I don't get all the files in the out directory
EDIT: It might be hard to find the answer among the comments below, so: I ended up using unoconv for document conversion, which is blocking and easy to work with from a script.
It's possible (likely, even) that libreoffice is implemented as some sort of daemon/intermediary process. The "daemon" will (effectively1) parse the command line and then farm the work off to some other process, possibly detaching it so that it can exit immediately. (Based on the -invisible option in the documentation, I strongly suspect that this is indeed the case here.)
If this is the case, then your subprocess.call does do what it is advertised to do -- It waits for the daemon to complete before moving on. However, it doesn't do what you want which is to wait for all of the work to be completed. The only option you have in that scenario is to look to see if the daemon has a -wait option or similar.
1 It is likely that we don't have an actual daemon here, only something which behaves similarly. See the comments by abernert.
The problem is that the soffice command-line tool (which libreoffice is either just a link to, or a further wrapper around) is just a "controller" for the real program soffice.bin. It finds a running copy of soffice.bin and/or creates one, tells it to do some work, and then quits.
So, call is doing exactly the right thing: it waits for libreoffice to quit.
But you don't want to wait for libreoffice to quit, you want to wait for soffice.bin to finish doing the work that libreoffice asked it to do.
It looks like what you're trying to do isn't possible to do directly. But it's possible to do indirectly.
The docs say that headless mode:
… allows using the application without user interface.
This special mode can be used when the application is controlled by external clients via the API.
In other words, the app doesn't quit after running some UNO strings/doing some conversions/whatever else you specify on the command line; it sits around waiting for more UNO commands from outside, while the launcher just exits as soon as it has sent the appropriate commands to the app.
You probably have to use the above-mentioned external control API (UNO) directly.
See Scripting LibreOffice for the basics (although there's more info there about internal scripting than external), and the API documentation for details and examples.
But there may be an even simpler answer: unoconv is a simple command-line tool written using the UNO API that does exactly what you want. It starts up LibreOffice if necessary, sends it some commands, waits for the results, and then quits. So if you just use unoconv instead of libreoffice, call is all you need.
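For example, a sketch of the loop from the question rewritten to go through unoconv (the file layout and the out directory mirror the question and are only illustrative):

from subprocess import call

for entry in sorted_files:
    name = entry['filename']
    # unoconv drives soffice.bin over UNO and returns only when the
    # conversion has finished, so call() really does wait for the work
    call(['unoconv', '-f', 'xls', '-o', 'out', "%s/%s.xls" % (name, name)])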
Also notice that unoconv is written in Python, and is designed to be used as a module. If you just import it, you can write your own (simpler, and use-case-specific) code to replace the "Main entrance" code, and not use subprocess at all. (Or, of course, you can tear apart the module and use the relevant code yourself, or just use it as a very nice piece of sample code for using UNO from Python.)
Also, the unoconv page linked above lists a variety of other similar tools, some that work via UNO and some that don't, so if it doesn't work for you, try the others.
If nothing else works, you could consider, e.g., creating a sentinel file and using a filesystem watch, so at least you'll be able to detect exactly when it's finished its work, instead of having to guess at a timeout. But that's a real last-ditch workaround that you shouldn't even consider until eliminating all of the other options.
If libreoffice is using an intermediary (daemon) as mentioned by @mgilson, then one solution is to find out what program it's invoking, and then invoke that program directly yourself.

In python, why use logging instead of print?

For simple debugging in a complex project, is there a reason to use the Python logger instead of print? What about other use cases? Is there an accepted best use case for each (especially when you're only looking for stdout)?
I've always heard that this is a "best practice" but I haven't been able to figure out why.
The logging package has a lot of useful features:
Easy to see where and when (even what line no.) a logging call is being made from.
You can log to files, sockets, pretty much anything, all at the same time.
You can differentiate your logging based on severity.
Print doesn't have any of these.
Also, if your project is meant to be imported by other Python tools, it's bad practice for your package to print things to stdout, since the user likely won't know where the print messages are coming from. With logging, users of your package can choose whether or not they want to propagate logging messages from your tool.
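A minimal sketch of those features in use (the format string and file name are just examples):

import logging

# timestamps, severity and source location on every line, optionally to a file
logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s %(levelname)s %(filename)s:%(lineno)d %(message)s",
    filename="app.log",   # drop this line to log to stderr instead
)

logging.debug("loaded %d records", 42)
logging.warning("disk usage at %d%%", 91)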
One of the biggest advantages of proper logging is that you can categorize messages and turn them on or off depending on what you need. For example, it might be useful to turn on debugging level messages for a certain part of the project, but tone it down for other parts, so as not to be taken over by information overload and to easily concentrate on the task for which you need logging.
Also, logs are configurable. You can easily filter them, send them to files, format them, add timestamps, and any other things you might need on a global basis. Print statements are not easily managed.
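For instance, a sketch of per-module levels (the logger names are hypothetical):

import logging

logging.basicConfig(level=logging.INFO)                  # default for everything
logging.getLogger("myapp.db").setLevel(logging.DEBUG)    # verbose for the part under study
logging.getLogger("myapp.ui").setLevel(logging.WARNING)  # quieter for the rest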
Print statements are sort of the worst of both worlds, combining the negative aspects of an online debugger with diagnostic instrumentation. You have to modify the program, but you don't get more useful code out of it.
An online debugger allows you to inspect the state of a running program. But the nice thing about a real debugger is that you don't have to modify the source, neither before nor after the debugging session: you just load the program into the debugger, tell the debugger where you want to look, and you're all set.
Instrumenting the application might take some work up front, modifying the source code in some way, but the resulting diagnostic output can have enormous amounts of detail, and can be turned on or off to a very specific degree. The python logging module can show not just the message logged, but also the file and function that called it, a traceback if there was one, the actual time that the message was emitted, and so on. More than that, diagnostic instrumentation need never be removed: it's just as valid and useful when the program is finished and in production as it was the day it was added; but it can have its output stuck in a log file where it's not likely to annoy anyone, or the log level can be turned down to keep all but the most urgent messages out.
Anticipating the need or use for a debugger is really no harder than using ipython while you're testing, and becoming familiar with the commands it uses to control the built-in pdb debugger.
When you find yourself thinking that a print statement might be easier than using pdb (as it often is), you'll find that using a logger leaves your program in a much easier-to-work-on state than if you add and later remove print statements.
I have my editor configured to highlight print statements as syntax errors, and logging statements as comments, since that's about how I regard them.
In brief, the advantages of using a logging library over print are the following:
Control what’s emitted
Define what types of information you want to include in your logs
Configure how it looks when it’s emitted
Most importantly, set the destination for your logs
In detail, segmenting log events by severity level is a good way to sift through which log messages may be most relevant at a given time. A log event's severity level also gives you an indication of how worried you should be when you see a particular message; for instance, logging is divided into the levels debug, info, warning, error, and critical. Timing can be everything when you're trying to understand what went wrong with an application. You want to know the answers to questions like:
“Was this happening before or after my database connection died?”
“Exactly when did that request come in?”
Furthermore, it is easy to see where a log event occurred, via the line number, the filename or method name, and even which thread it came from.
Here's a functional logging library for Python named loguru.
If you use logging then the person responsible for deployment can configure the logger to send it to a custom location, with custom information. If you only print, then that's all they get.
Logging essentially creates a searchable plain-text database of print outputs with other metadata (timestamp, log level, line number, process, etc.).
This is pure gold, I can run egrep over the log file after the python script has run.
I can tune my egrep pattern search to pick exactly what I am interested in and ignore the rest. This reduction of cognitive load and freedom to pick my egrep pattern later on by trial and error is the key benefit for me.
tail -f mylogfile.log | egrep "key_word1|key_word2"
Now throw in the other cool things that print can't do (sending to a socket, setting debug levels, logrotate, adding metadata, etc.), and you have every reason to prefer logging over plain print statements.
I tend to use print statements because it's lazy and easy, but adding logging only needs some boilerplate code, and hey, we have yasnippets (emacs), ultisnips (vim) and other templating tools, so why give up logging for plain print statements!?
I would add to all the other mentioned advantages that the print function, in its standard configuration, is buffered: when output is redirected, the flush may happen only when the buffer fills or the program exits.
This is true for any program launched in a non-interactive shell (codebuild or gitlab-ci, for instance) or whose output is redirected.
If for any reason the program is killed (kill -9, hard reset of the computer, …), you may be missing some lines of output if you used print.
The logging library, however, flushes its stream handlers after every record, so messages logged to stderr or stdout appear immediately.
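A small sketch of the difference (run it with output redirected to a file; the sleep just gives you time to kill the process mid-run):

import logging
import sys
import time

logging.basicConfig(stream=sys.stderr, level=logging.INFO)

print("this may sit in the stdout buffer while output is redirected")
logging.info("this is flushed by the StreamHandler as soon as it is emitted")

time.sleep(60)   # kill -9 the process here and the print line may be lost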

Send emacs buffer to arbitrary Python process

I like the python-send-buffer command; however, I very often use Python embedded in applications, or launch Python via a custom package management system (to launch Python with certain dependencies). In other words, I can't just run "python" and get a useful Python instance (something that python-send-buffer relies on).
What I would like to achieve is:
in any Python interpreter (or application that allows you to evaluate Python code), import a magic_emacs_python_server.py module (appending to sys.path as necessary)
In emacs, run magic-emacs-python-send-buffer
This would evaluate the buffer in the remote Python instance.
Seems like it should be pretty simple: the Python module listens on a socket, in a thread. It evaluates in the main thread, and returns the repr() of the result (or maybe captures stdout/stderr, or maybe both). The emacs module would just send text to the socket, wait for a string in response, and display it in a buffer.
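Something along these lines, as a very rough sketch (the module name and port are made up, and for brevity it evaluates in the listener thread rather than handing the work to the main thread as described above):

# magic_emacs_python_server.py (hypothetical)
import socket
import threading
import traceback

def _serve(port=9999):
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("127.0.0.1", port))
    srv.listen(1)
    while True:
        conn, _ = srv.accept()
        source = conn.recv(65536).decode("utf-8")
        try:
            try:
                result = repr(eval(source, globals()))   # single expressions
            except SyntaxError:
                exec(source, globals())                  # whole buffers / statements
                result = "OK"
        except Exception:
            result = traceback.format_exc()
        conn.sendall(result.encode("utf-8"))
        conn.close()

def start(port=9999):
    # import this module in the embedded interpreter and call start()
    t = threading.Thread(target=_serve, args=(port,))
    t.daemon = True
    t.start()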
Sounds so simple that something like this must exist already... IPython has ipy_vimserver, but this is the wrong way around. There is also swank; while it seems very Lisp-specific, there is a Javascript backend which looks very much like what I want... but searching finds almost nothing, other than some vague (possibly true) claims that SLIME doesn't work nicely with non-Lisp languages.
In short:
Does a project exist to send code from an emacs buffer to an existing Python process?
If not, how would you recommend I write such a thing (not being very familiar with elisp) - SWANK? IPython's server code? Simple TCP server from scratch?
comint provides most of the infrastructure for stuff like this. There are a bunch of good examples, like this or this.
It allows you to run a command, and provides things like comint-send-string to easily implement send-region type commands.
dbr/remoterepl on Github is a crude proof-of-concept of what I described in the question.
It lacks any kind of polish, but it mostly works - you import the replify.py module in the target interpreter, then evaluate emacs-remote-repl.el after fixing the stupid hardcoded path to client.py.
Doesn't shell-command give you what you are looking for? You could write a wrapper script or adjust the #! and sys.path appropriately.
