Handling logs in multiprocess tornado apps - python

import logging.handlers
import os

import tornado.httpserver

### Start 4 subprocesses ###
server = tornado.httpserver.HTTPServer(app)
server.bind(8000)
server.start(4)  # fork 4 subprocesses
### Logger using TimedRotatingFileHandler within each app ###
timefilehandler = logging.handlers.TimedRotatingFileHandler(
    filename=os.path.join(dirname, logname + '.log'),
    when='MIDNIGHT',
    interval=1,
    encoding='utf-8'
)
Using tornado with multiple subprocesses and a logger resulted in multiple log files suffixed like this (if using the file name as the logger name):
service_0.log
service_1.log
service_2.log
service_3.log
Is it possible to have all the subprocesses write to one place in Tornado? Or would it be better to use a log aggregation tool, since it is quite inconvenient to check the logs one by one? Any ideas? Thanks in advance.

You can't safely have different (sub)processes write to a single log file with the standard handlers. If you want to solve that, you should use a log aggregator, where the different Tornado servers log to a common endpoint (either in the cloud or locally). If you're not inclined to use a third-party solution, you can write one in Tornado.
Look into https://docs.python.org/3/library/logging.handlers.html to see if there's anything you like.
Or you can grep 4 files at the same time.
P.S. IIRC, using subprocesses via start(4) is not recommended for production, so I would suggest you run 4 separate processes on different ports and use the port in the log name as well, as sketched below.
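A rough sketch of that last suggestion (not the poster's code): one Tornado process per port, with the port baked into the log filename. dirname and logname mirror the question's snippet, and the handler class and trivial app here are only illustrative.
import logging
import logging.handlers
import os
import sys

import tornado.httpserver
import tornado.ioloop
import tornado.web

dirname = "."          # placeholder: log directory, as in the question's snippet
logname = "service"    # placeholder: log name, as in the question's snippet

class MainHandler(tornado.web.RequestHandler):
    def get(self):
        self.write("ok")

def run(port):
    # One log file per process, keyed by port: service_8000.log, service_8001.log, ...
    handler = logging.handlers.TimedRotatingFileHandler(
        filename=os.path.join(dirname, "%s_%d.log" % (logname, port)),
        when="MIDNIGHT",
        interval=1,
        encoding="utf-8",
    )
    logging.getLogger().addHandler(handler)
    logging.getLogger().setLevel(logging.INFO)

    app = tornado.web.Application([(r"/", MainHandler)])
    server = tornado.httpserver.HTTPServer(app)
    server.listen(port)                      # one port per process, no start(4)
    tornado.ioloop.IOLoop.current().start()

if __name__ == "__main__":
    run(int(sys.argv[1]))                    # e.g. python service.py 8000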

Related

Generate a consolidated log from multiple remote processes

I have a script that spawns another script on several hosts on a network. These scripts generate output that I want to capture. So the only option I can see right now is this:
Log each process' output in a separate file (like 20130308.hostname.log, etc).
Is there a way to generate a consolidated log out of all the processes? By consolidated I mean something like this:
host1:
outputline1
outputline2
host2:
outputline1
outputline2
outputline3
host3:
...
I want to be able to open one file - and check up on what happened on a particular host.
If I understood your question correctly, you want to have several processes log into one file. There are better options than the one you suggest; please check the Python logging cookbook. The 'multiple modules' and 'multiple handlers' sections should solve your issue.
You can use a centralized logging system like syslog for aggregating all logging messages into a central log. Python's logging module provides a SysLogHandler for that.
For aggregating individual logfiles into one logfile manually you will have to write a script performing the aggregation and merging.
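A minimal sketch of the SysLogHandler approach, assuming a local syslog daemon listening on /dev/log (pass an ('address', 514) tuple instead to reach a remote collector); the logger name is made up for illustration.
import logging
import logging.handlers

handler = logging.handlers.SysLogHandler(address='/dev/log')
handler.setFormatter(logging.Formatter('%(name)s: %(levelname)s %(message)s'))

log = logging.getLogger('host1.worker')   # hypothetical logger name
log.setLevel(logging.INFO)
log.addHandler(handler)
log.info('outputline1')                   # ends up in the central syslog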

Where to see stdout if I use mod_wsgi to serve application?

Sometimes I need to check the output of a Python web application.
If I execute the application directly, I can see it in the terminal.
But I have no idea how to check that under mod_wsgi. Will it appear in a separate Apache log? Or do I need to add some code for logging?
Instead of print "message", you could use sys.stderr.write("message")
For logging to stderr with a StreamHandler:
import logging
import sys
handler = logging.StreamHandler(stream=sys.stderr)
log = logging.getLogger(__name__)
log.setLevel(logging.INFO)
log.addHandler(handler)
log.info("Message")
wsgilog is simple and can redirect standard out to a log file automatically for you. It's a breeze to set up, and I haven't had any real problems with it.
No WSGI application component which claims to be portable should write to standard output. That is, an application should not use the Python print statement without directing output to some alternate stream. An application should also not write directly to sys.stdout. (ModWSGI Wiki)
So don't... Instead, I recommend using a log aggregation tool like Sentry. It is useful while developing and a must-have in production.
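A minimal sketch of hooking stdlib logging into Sentry using the current sentry_sdk package (which postdates this answer); the DSN is a placeholder and the integration levels are just one reasonable configuration.
import logging

import sentry_sdk
from sentry_sdk.integrations.logging import LoggingIntegration

sentry_sdk.init(
    dsn="https://examplePublicKey@o0.ingest.sentry.io/0",  # placeholder DSN
    integrations=[
        LoggingIntegration(
            level=logging.INFO,         # record INFO and up as breadcrumbs
            event_level=logging.ERROR,  # send ERROR records as Sentry events
        )
    ],
)

logging.getLogger(__name__).error("This ends up in Sentry, not on stdout")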

Streaming text logfiles into RabbitMQ, then reconstructing at other end?

Requirements
We have several servers (20-50) - Solaris 10 and Linux (SLES) - running a mix of different applications, each generating a bunch of log events into textfiles. We need to capture these to a separate monitoring box, where we can do analysis/reporting/alerts.
Current Approach
Currently, we use SSH with a remote "tail -f" to stream the logfiles from the servers onto the monitoring box. However, this is somewhat brittle.
New Approach
I'd like to replace this with RabbitMQ. The servers would publish their log events into this, and each monitoring script/app could then subscribe to the appropriate queue.
Ideally, we'd like the applications themselves to dump events directly into the RabbitMQ queue.
However, assuming that's not an option in the short term (we may not have source for all the apps), we need a way to basically "tail -f" the logfiles from disk. I'm most comfortable in Python, so I was looking at a Pythonic way of doing that - the consensus seems to be to just use a loop with readline() and sleep() to emulate "tail -f".
Questions
Is there an easier way of "tail -f"-ing a whole bunch of textfiles directly onto a RabbitMQ stream? Something built in, or an extension we could leverage? Any other tips/advice here?
If we do write a Python wrapper to capture all the logfiles and publish them - I'd ideally like a single Python script to concurrently handle all the logfiles, rather than manually spinning up a separate instance for each logfile. How should we tackle this? Are there considerations in terms of performance, CPU usage, throughput, concurrency etc.?
We need to subscribe to the queues, and then possibly dump the events back to disk and reconstruct the original logfiles. Any tips/advice on this? And we'd also like a single Python script we could startup to handle reconstructing all of the logfiles - rather than 50 separate instances of the same script - is that easily achievable?
Cheers,
Victor
PS: We did have a look at Facebook's Scribe, as well as Flume, and both seem a little heavyweight for our needs.
You seem to be describing centralized syslog with RabbitMQ as the transport. If you could live with syslog, take a look at syslog-ng. Otherwise, you might save some time by using parts of logstash (http://logstash.net/).
If possible, you can make the application publish the events asynchronously to RabbitMQ instead of writing them to log files. I have done this in Java.
But sometimes it is not possible to make the app log the way you want.
1) You can write a file tailer in Python which publishes to AMQP (see the sketch after this list). I don't know of anything off the shelf that plugs a file in as an input to RabbitMQ. Have a look at http://code.activestate.com/recipes/436477-filetailpy/ and http://www.perlmonks.org/?node_id=735039 for tailing files.
2) You can create a Python daemon which tails all the given files, either as separate processes or in a round-robin fashion.
3) An approach similar to 2 can help you with the reconstruction side as well. You can probably have a single queue for each log file.
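A minimal sketch of option 1, assuming the pika client library and a broker on localhost; the logfile path and the queue name 'log_lines' are made up for illustration.
import time

import pika

LOGFILE = '/var/log/myapp/app.log'   # hypothetical path

connection = pika.BlockingConnection(pika.ConnectionParameters(host='localhost'))
channel = connection.channel()
channel.queue_declare(queue='log_lines', durable=True)

with open(LOGFILE) as f:
    f.seek(0, 2)                     # start at the end of the file, like tail -f
    while True:
        line = f.readline()
        if not line:
            time.sleep(0.5)          # nothing new yet; poll again shortly
            continue
        channel.basic_publish(
            exchange='',
            routing_key='log_lines',
            body=line.rstrip('\n'),
        )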
If you are talking about application logging (as opposed to e.g. access logs such as Apache webserver logs), you can use a handler for stdlib logging which writes to AMQP middleware.
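A rough sketch of such a handler, again assuming pika (the answer does not name a client library); the class and queue names are illustrative, and a production version would want reconnection handling.
import logging

import pika

class AMQPHandler(logging.Handler):
    """Publish formatted log records to a RabbitMQ queue."""

    def __init__(self, host='localhost', queue='app_logs'):
        logging.Handler.__init__(self)
        self._connection = pika.BlockingConnection(pika.ConnectionParameters(host=host))
        self._channel = self._connection.channel()
        self._channel.queue_declare(queue=queue, durable=True)
        self._queue = queue

    def emit(self, record):
        try:
            self._channel.basic_publish(
                exchange='',
                routing_key=self._queue,
                body=self.format(record),
            )
        except Exception:
            self.handleError(record)

logging.getLogger().addHandler(AMQPHandler())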

How to add contextual information to log lines from multiprocessing workers?

I have a pool of worker processes (using multiprocessing.Pool) and want to log from these to a single log file. I am aware of logging servers, syslog, etc. but they all seem to require some changes to how my app is installed, monitored, logs processed etc. which I would like to avoid.
I am using CPython 2.6 on Linux.
Finally I stumbled into a solution which almost works for me. The basic idea is that you start a log listener process, set up a queue between it and the worker processes, and the workers log into the queue (using QueueHandler), and the listener then formats and serializes the log lines to a file.
This is all working so far according to the solution linked above.
But then I wanted to have the workers log some contextual information, for example a job token, for every log line. In the pool.apply_async() call I can pass in the contextual info I want to be logged. Note that I am only interested in the contextual information while the worker is doing the specific job; when it is idle there should not be any contextual information if the worker wants to log something. So basically the log listener has a log format specified as something like:
"%(job_token)s %(process)d %(asctime)s %(msg)"
and the workers are supposed to provide job_token as contextual info in the log record (the other format specifiers are standard).
I have looked at custom log filters. With a custom filter I can create a filter when the job starts and apply it to the root logger, but I am using third-party modules which create their own loggers (typically at module import time), and my custom filter is not applied to them.
Is there a way to make this work in the above setup? Or is there some alternative way to make this work (remember that I would still prefer a single log file, no separate log servers, job-specific contextual information for worker log lines)?
Filters can be applied to handlers as well as loggers - so you can just apply the filter to your QueueHandler. If this handler is attached to the root logger in your processes, then any logging by third party modules should also be handled by the handler, so you should get the context in those logged events, too.
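A minimal sketch of that suggestion, assuming the stdlib QueueHandler from Python 3.2+ (on CPython 2.6 the logutils backport provides the same class); the job_token attribute matches the format string in the question.
import logging
import logging.handlers

class JobTokenFilter(logging.Filter):
    """Stamp every record passing through the handler with the current job token."""

    def __init__(self, job_token):
        logging.Filter.__init__(self)
        self.job_token = job_token

    def filter(self, record):
        record.job_token = self.job_token
        return True

def setup_worker_logging(queue, job_token):
    # The filter lives on the QueueHandler, and the handler is attached to the
    # root logger, so records from third-party loggers that propagate to root
    # pick up the token as well.
    handler = logging.handlers.QueueHandler(queue)
    handler.addFilter(JobTokenFilter(job_token))
    root = logging.getLogger()
    root.addHandler(handler)
    root.setLevel(logging.INFO)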

Django and fcgi - logging question

I have a site running in Django. Frontend is lighttpd and is using fcgi to host django.
I start my fcgi processes as follows:
python2.6 /<snip>/manage.py runfcgi maxrequests=10 host=127.0.0.1 port=8000 pidfile=django.pid
For logging, I have a RotatingFileHandler defined as follows:
from logging.handlers import RotatingFileHandler
file_handler = RotatingFileHandler(filename, maxBytes=10*1024*1024, backupCount=5, encoding='utf-8')
The logging is working. However, it looks like the files are rotating when they do not even get up to 10 KB, let alone 10 MB. My guess is that each fcgi instance is only handling 10 requests and then re-spawning. Each respawn of fcgi creates a new file. I can confirm that fcgi is starting up under a new process id every so often (it is hard to tell the exact timing, but it is under a minute).
Is there any way to get around this issue? I would like all fcgi instances to log to one file until it reaches the size limit, at which point a log file rotation would take place.
As Alex stated, logging is thread-safe, but the standard handlers cannot be safely used to log from multiple processes into a single file.
ConcurrentLogHandler uses file locking to allow for logging from within multiple processes.
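A rough sketch of that idea, assuming the third-party ConcurrentLogHandler package (pip install ConcurrentLogHandler); the module and class names are given here as the package documents them, so verify them against the version you install, and the log filename is hypothetical.
import logging

from cloghandler import ConcurrentRotatingFileHandler

handler = ConcurrentRotatingFileHandler(
    'django_app.log',            # shared by all fcgi processes (hypothetical name)
    maxBytes=10 * 1024 * 1024,   # same 10 MB limit as in the question
    backupCount=5,
)
log = logging.getLogger()
log.setLevel(logging.INFO)
log.addHandler(handler)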
In your shoes I'd switch to a TimedRotatingFileHandler -- I'm surprised that the size-based rotating file handler is giving you this problem (as it should be impervious to which processes are producing the log entries), but the timed version (though not controlled on exactly the parameter you prefer) should solve it. Or, write your own, more solid, rotating file handler (you can take a lot from the standard library sources) that ensures varying processes are not a problem (as they should never be).
As you appear to be using the default file opening mode of append ("a") rather than write ("w"), if a process re-spawns it should append to the existing file and then roll over when the size limit is reached. So I am not sure that what you are seeing is caused by re-spawning FCGI processes. (This of course assumes that the filename remains the same when the process re-spawns.)
Although the logging package is thread-safe, it does not handle concurrent access to the same file from multiple processes - because there is no standard way to do it in the stdlib. My normal advice is to set up a separate daemon process which implements a socket server and logs events received across it to file - the other processes then just implement a SocketHandler to communicate with the logging daemon. Then all events will get serialised to disk properly. The Python documentation contains a working socket server which could serve as a basis for this need.
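A minimal sketch of the client side of that setup, assuming a listener such as the socket server example from the logging cookbook is already running on localhost.
import logging
import logging.handlers

socket_handler = logging.handlers.SocketHandler(
    'localhost',
    logging.handlers.DEFAULT_TCP_LOGGING_PORT,  # 9020 by default
)
root = logging.getLogger()
root.setLevel(logging.INFO)
root.addHandler(socket_handler)
root.info('This record is pickled and sent to the logging daemon')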
