Log rotation - Python and Windows

(I have searched and not found a duplicate for this question but happy to be proved otherwise).
I need to rotate a log from within some Python code. The code is running on Windows (Server 2008 R2).
Initially I used TimedRotatingFileHandler (from Python's logging.handlers package), but it doesn't work as we need because of what I understand to be an issue it has with multi-processing (subprocess.check_call is used to kick off another application).
I have checked out ConcurrentLogHandler, which looks like it might do the job, but I'm a bit concerned that it hasn't been updated since 2013 (although issues have been raised more recently).
UPDATE: an open bug (since 2013) indicates that ConcurrentLogHandler does not work with Python 2.7/Windows. On logging, the code just hangs.
Is there a best practice Windows solution I should be using?

Maybe I'm missing something, but Python's logging module comes with a RotatingFileHandler:
import logging
import time
from logging.handlers import RotatingFileHandler

#----------------------------------------------------------------------
def create_rotating_log(path):
    """
    Creates a rotating log
    """
    logger = logging.getLogger("Rotating Log")
    logger.setLevel(logging.INFO)

    # add a rotating handler
    handler = RotatingFileHandler(path, maxBytes=20,
                                  backupCount=5)
    logger.addHandler(handler)

    for i in range(10):
        logger.info("This is test log line %s" % i)
        time.sleep(1.5)

#----------------------------------------------------------------------
if __name__ == "__main__":
    log_file = r"c:\path\to\test.log"
    create_rotating_log(log_file)
This worked fine for me with Python 2.7 on Windows 7. Here are a couple of links that go into more detail:
http://www.blog.pythonlibrary.org/2014/02/11/python-how-to-create-rotating-logs/
https://docs.python.org/2/library/logging.handlers.html#rotatingfilehandler

OK - this is what I ended up doing.
Because, as far as the logging is concerned, things are only multithreaded (the additional process does not write to my log), I have ended up hand-cranking this. I'm not thrilled by the approach: create a threading lock object, close the logging (logging.shutdown - not thrilled about that either, as the docs say to call it on program exit ...), move the file and start up logging again, then release the lock.
It is all in a try/except block so that if something goes wrong the lock is released.
Testing indicates that this does what is required.
Does calling logging.shutdown in this kind of context have some repercussions I'm not aware of?!
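For reference, here is a minimal sketch of the hand-cranked approach described above, assuming the same lock is acquired by any code that tears down or rebuilds the logging configuration; the path and format are illustrative only:
import logging
import os
import threading
import time

rotate_lock = threading.Lock()
LOG_PATH = r"c:\path\to\app.log"  # illustrative path

def configure_logging():
    handler = logging.FileHandler(LOG_PATH)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    root = logging.getLogger()
    root.addHandler(handler)
    root.setLevel(logging.INFO)

def rotate_log():
    """Close the handlers, move the file aside, then start logging again."""
    with rotate_lock:
        root = logging.getLogger()
        try:
            logging.shutdown()                 # flush and close all handlers
            for h in list(root.handlers):      # drop the now-closed handlers
                root.removeHandler(h)
            os.rename(LOG_PATH, LOG_PATH + time.strftime(".%Y%m%d%H%M%S"))
        finally:
            configure_logging()                # resume logging either way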

QueueHandler, which is mentioned in the comments on the original question, has been available since Python 3.2.
https://docs.python.org/ko/3/library/logging.handlers.html#queuehandler
https://docs.python.org/3/howto/logging-cookbook.html#logging-to-a-single-file-from-multiple-processes
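To illustrate, here is a minimal sketch of the QueueHandler/QueueListener pattern with a rotating file, using a plain queue.Queue for the single-process, multi-threaded case (for separate processes you would share a multiprocessing.Queue instead); the file name and sizes are illustrative:
import logging
import logging.handlers
import queue

log_queue = queue.Queue(-1)  # unbounded queue shared by all producers

# Only the listener touches the file, so rotation happens in one place.
file_handler = logging.handlers.RotatingFileHandler(
    "app.log", maxBytes=10 * 1024 * 1024, backupCount=5)
listener = logging.handlers.QueueListener(log_queue, file_handler)
listener.start()

logger = logging.getLogger("app")
logger.setLevel(logging.INFO)
logger.addHandler(logging.handlers.QueueHandler(log_queue))

logger.info("Routed through the queue to the rotating log file")
listener.stop()  # flush and stop the listener thread at shutdown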
The Python documentation also suggests using SocketHandler to send all logs to a socket server running in a dedicated process that does the file writing.
https://docs.python.org/3/howto/logging-cookbook.html#network-logging

Why does the tornado logger come with a StreamHandler attached?

I just discovered this very strange behaviour of the logging module in Spyder:
import logging
logging.getLogger("hurricane").handlers
Out[2]: [] # expected
logging.getLogger("tornado").handlers
Out[3]: [<StreamHandler <stderr> (NOTSET)>] # Where does that StreamHandler come from?!
Note that these are the first lines from a freshly started interpreter. So I haven't imported tornado or any other package except logging. Yet, unlike any other logger I tried to get, it comes with a StreamHandler attached.
Why?
Related question: How to prevent Python Tornado from logging to stdout/console?
I think Tornado uses logging for its normal-operation request logging. This is not a great idea: an application is supposed to work exactly as before if all logging is disabled (aside from the logging output itself, of course), but numerous web server authors use the logging functionality as a convenience to do part of their normal operation rather than just using it for diagnostics. I believe you can turn this off using a logging=none configuration option; see this page for more information.
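As a hedged sketch (the logger names below are the ones Tornado is known to use, but check your version), you can also silence the request logging or drop the auto-attached handler using the standard logging module alone:
import logging

# Silence Tornado's per-request access log entirely
logging.getLogger("tornado.access").disabled = True

# Or remove the StreamHandler that was attached automatically
tornado_logger = logging.getLogger("tornado")
for handler in list(tornado_logger.handlers):
    tornado_logger.removeHandler(handler)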

How to disable discord.py logger?

So I've tried to add a logger to my discord bot to see the logs in a file, not just in the console, because obviously it's irritating when I restart the app and find out that the logs I need to check have already been destroyed. I set it up like this:
logging.basicConfig(filename='CoronaLog.log', level=logging.DEBUG, format='%(levelname)s %(asctime)s %(message)s')
I then learned the hard way that the discord.py library has its own logger installed, so now my logs look like one big mess. Is there any way to disable discord.py's logging, or at least output it to another file?
EDIT: I've tried creating two loggers, so it would look like this:
logging.basicConfig(filename='discord.log', level=logging.DEBUG, format='%(levelname)s %(asctime)s %(message)s')
nonDiscordLog = logging.getLogger('discord')
handler = logging.FileHandler(filename='CoronaLog.log', encoding='utf-8', mode='w')
handler.setFormatter(logging.Formatter('%(levelname)s %(asctime)s:%(name)s: %(message)s'))
nonDiscordLog.addHandler(handler)
So the discord log would be written, as the basic config says, to the discord.log file, and when executed like this:
nonDiscordLog.info("execution took %s seconds \n" % (time.time() - startTime))
it would log into the CoronaLog.log file. However, it didn't really change anything.
discord.py is, in this regard, terribly unintuitive for everyone but beginners; anyway, after consulting the docs you will find that this behaviour can be avoided with:
import discord
client = discord.Client(intents=discord.Intents.all())
# Your regular code here
client.run(__YOURTOKEN__, log_handler=None)
Of course, you can supplement your own logger instead of None but to answer your question exactly, this is how you can disable discord.py's logging.
There's actually a bit more that you can do with the default logging, and you can read all about it in the official docs.
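As an alternative to disabling it, a minimal sketch of the "supply your own handling" route is to give the 'discord' logger its own file and stop it propagating to the root logger that basicConfig set up, so the two files stay separate (file names taken from the question above):
import logging

discord_logger = logging.getLogger("discord")
discord_logger.setLevel(logging.INFO)
discord_logger.propagate = False  # keep discord records out of CoronaLog.log

discord_handler = logging.FileHandler("discord.log", encoding="utf-8", mode="w")
discord_handler.setFormatter(
    logging.Formatter("%(levelname)s %(asctime)s:%(name)s: %(message)s"))
discord_logger.addHandler(discord_handler)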
https://discordpy.readthedocs.io/en/latest/logging.html says:
"discord.py logs errors and debug information via the logging python module. It is strongly recommended that the logging module is configured, as no errors or warnings will be output if it is not set up. Configuration of the logging module can be as simple as:
import logging
logging.basicConfig(level=logging.INFO)
Placed at the start of the application. This will output the logs from discord as well as other libraries that use the logging module directly to the console."
Maybe try configuring logging a different way, because when logging starts up it appears to initialize discord.py's logging as well. Maybe try:
import logging
# Setup logging...
import discord
Maybe if you import it afterwards, it won't set it up.

syslog.syslog vs SysLogHandler

I'm looking at how to log to syslog from within my Python app, and I found there are two ways of doing it:
Using syslog.syslog() routines
Using the logger module SysLogHandler
Which is the better option to use, and what are the advantages/disadvantages of each? I really don't know which one I should use.
syslog.syslog() can only be used to send messages to the local syslogd. SysLogHandler can be used as part of a comprehensive, configurable logging subsystem, and can log to remote machines.
The logging module is a more comprehensive solution that can potentially handle all of your log messages, and is very flexible. For instance, you can set up multiple handlers for your logger and each can be set to log at a different level. You can have a SysLogHandler for sending errors to syslog, a FileHandler for debugging logs, and an SMTPHandler to email the really critical messages to ops. You can also define a hierarchy of loggers within your modules, and each one has its own level, so you can enable/disable messages from specific modules, such as:
import logging
logger = logging.getLogger('package.stable_module')
logger.setLevel(logging.WARNING)
And in another module:
import logging
logger = logging.getLogger('package.buggy_module')
logger.setLevel(logging.DEBUG)
The log messages in both of these modules will be sent, depending on the level, to the 'package' logger and ultimately to the handlers you've defined. You can also add handlers directly to the module loggers, and so on. If you've followed along this far and are still interested, then I recommend jumping to the logging tutorial for more details.
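For example, a minimal sketch of the multi-handler setup described above; the syslog address, file name, email addresses and level thresholds are illustrative assumptions:
import logging
import logging.handlers

logger = logging.getLogger("package")
logger.setLevel(logging.DEBUG)

# Errors and above go to the local syslog daemon ("/dev/log" assumes Linux)
syslog_handler = logging.handlers.SysLogHandler(address="/dev/log")
syslog_handler.setLevel(logging.ERROR)

# Everything goes to a debug file
file_handler = logging.FileHandler("debug.log")
file_handler.setLevel(logging.DEBUG)

# Only the really critical messages get emailed to ops
smtp_handler = logging.handlers.SMTPHandler(
    mailhost="localhost",
    fromaddr="app@example.com",
    toaddrs=["ops@example.com"],
    subject="Critical error")
smtp_handler.setLevel(logging.CRITICAL)

for handler in (syslog_handler, file_handler, smtp_handler):
    logger.addHandler(handler)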
So far, one disadvantage of logging.handlers.SysLogHandler has not been mentioned: you can't set options like LOG_ODELAY or LOG_NOWAIT or LOG_PID. On the other hand, LOG_CONS and LOG_PERROR can be achieved by adding more handlers, and LOG_NDELAY is effectively already set, because the connection opens when the handler is instantiated.

Python multiprocessing logging - why multiprocessing.get_logger

I've been struggling with multiprocessing logging for some time, and for many reasons.
One of my reasons is: why another get_logger?
Of course I've seen this question, and it seems the logger that multiprocessing.get_logger returns does some "process-shared locks" magic to make log handling smooth.
So, today I looked into the multiprocessing code of Python 2.7 (/multiprocessing/util.py), and found that this logger is just a plain logging.Logger, and there's barely any magic around it.
Here's the description in Python documentation, right before the
get_logger function:
Some support for logging is available. Note, however, that the logging
package does not use process shared locks so it is possible (depending
on the handler type) for messages from different processes to get
mixed up.
So when you use the wrong logging handler, even the get_logger logger may go wrong?
I've used a program that uses get_logger for logging for some time.
It prints logs to a StreamHandler and (it seems) they never get mixed up.
Now my theory is:
multiprocessing.get_logger doesn't do process-shared locks at all
StreamHandler works for multiprocessing, but FileHandler doesn't
the major purpose of this get_logger logger is to track processes'
life-cycles, and to provide an easy-to-get, ready-to-use logger
that already logs the process's name/id and the like
Here's the question:
Is my theory right?
How/Why/When do you use this get_logger?
Yes, I believe you're right that multiprocessing.get_logger() doesn't do process-shared locks - as you say, the docs even state this. Despite all the upvotes, it looks like the question you link to is flawed in stating that it does (to give it the benefit of the doubt, it was written over a decade ago - so perhaps that was the case at one point).
Why does multiprocessing.get_logger() exist then? The docs say that it:
Returns the logger used by multiprocessing. If necessary, a new one will be created.
When first created the logger has level logging.NOTSET and no default handler. Messages sent to this logger will not by default propagate to the root logger.
i.e. by default the multiprocessing module will not produce any log output, since its logger's level is set to NOTSET and no handler is attached.
If you were to have a problem with your code that you suspected to be an issue with multiprocessing, that lack of log output wouldn't be helpful for debugging, and that's what multiprocessing.get_logger() exists for - it returns the logger used by the multiprocessing module itself so that you can override the default logging configuration to get some logs from it and see what it's doing.
Since you asked for how to use multiprocessing.get_logger(), you'd call it like so and configure the logger in the usual fashion, for example:
logger = multiprocessing.get_logger()
formatter = logging.Formatter('[%(levelname)s/%(processName)s] %(message)s')
handler = logging.StreamHandler()
handler.setFormatter(formatter)
logger.addHandler(handler)
logger.setLevel(logging.INFO)
# now run your multiprocessing code
That said, you may actually want to use multiprocessing.log_to_stderr() instead for convenience - as per the docs:
This function performs a call to get_logger() but in addition to returning the logger created by get_logger, it adds a handler which sends output to sys.stderr using format '[%(levelname)s/%(processName)s] %(message)s'
i.e. it saves you needing to set up quite so much logging config yourself, and you can instead start debugging your multiprocessing issue with just:
logger = multiprocessing.log_to_stderr()
logger.setLevel(logging.INFO)
# now run your multiprocessing code
To reiterate though, that's just a normal module logger that's being configured and used, i.e. there's nothing special or process-safe about it. It just lets you see what's happening inside the multiprocessing module itself.
This answer is not about get_logger specifically, but perhaps you can use the approach suggested in this post? Note that the QueueHandler/QueueListener classes are available for earlier Python versions via the logutils package (available on PyPI, too).

Django and fcgi - logging question

I have a site running in Django. Frontend is lighttpd and is using fcgi to host django.
I start my fcgi processes as follows:
python2.6 /<snip>/manage.py runfcgi maxrequests=10 host=127.0.0.1 port=8000 pidfile=django.pid
For logging, I have a RotatingFileHandler defined as follows:
file_handler = RotatingFileHandler(filename, maxBytes=10*1024*1024, backupCount=5,encoding='utf-8')
The logging is working. However, it looks like the files are rotating before they even get up to 10Kb, let alone 10Mb. My guess is that each fcgi instance is only handling 10 requests and then re-spawning. Each respawn of fcgi creates a new file. I can confirm that fcgi is starting up under a new process id every so often (it's hard to tell the timing exactly, but it's under a minute).
Is there any way to get around this issue? I would like all fcgi instances to log to one file until it reaches the size limit, at which point a log file rotation would take place.
As Alex stated, logging is thread-safe, but the standard handlers cannot be safely used to log from multiple processes into a single file.
ConcurrentLogHandler uses file locking to allow for logging from within multiple processes.
In your shoes I'd switch to a TimedRotatingFileHandler -- I'm surprised that the size-based rotating file handler is giving this problem (as it should be impervious to which processes are producing the log entries), but the timed version (though not controlled on exactly the parameter you prefer) should solve it. Or, write your own, more solid, rotating file handler (you can take a lot from the standard library sources) that ensures varying processes are not a problem (as they should never be).
As you appear to be using the default file opening mode of append ("a") rather than write ("w"), if a process re-spawns it should append to the existing file and then roll over when the size limit is reached. So I am not sure that what you are seeing is caused by re-spawning FCGI processes. (This of course assumes that the filename remains the same when the process re-spawns.)
Although the logging package is thread-safe, it does not handle concurrent access to the same file from multiple processes, because there is no standard way to do that in the stdlib. My normal advice is to set up a separate daemon process which implements a socket server and logs the events it receives to file; the other processes then just use a SocketHandler to communicate with the logging daemon. All events will then get serialised to disk properly. The Python documentation contains a working socket server which could serve as a basis for this need.
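For the client side of that pattern, a minimal sketch (the daemon itself can be based on the socket server example in the logging cookbook; the host, port and logger name are illustrative):
import logging
import logging.handlers

# Each web/FCGI process sends its records to the logging daemon over TCP;
# only the daemon writes (and rotates) the log file.
socket_handler = logging.handlers.SocketHandler(
    "localhost", logging.handlers.DEFAULT_TCP_LOGGING_PORT)

logger = logging.getLogger("myapp")
logger.setLevel(logging.DEBUG)
logger.addHandler(socket_handler)

logger.info("This record is pickled and sent to the logging daemon")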
