I am using QtCore.QThread (from PyQt4).
To log, I am also using the following formatter:
logging.Formatter('%(levelname)-8s %(asctime)s %(threadName)-15s %(message)s')
The resulting log is:
DEBUG 2012-10-01 03:59:31,479 Dummy-3 my_message
My problem is that I want to know more explicitly which thread is logging... Dummy-3 is not the most meaningful name to me.
Is there a way to set a name on a QtCore.QThread that the logging module can use (as a LogRecord attribute), so that the logs are more meaningful?
Thanks!
If the threading module is available, the logging module will use threading.current_thread().name to set the threadName LogRecord attribute.
But the docs for threading.current_thread say that a dummy thread object will be used if the current thread was not created by the threading module (hence the "Dummy-x" name).
I suppose it would be possible to monkey-patch threading.current_thread to reset the name to something more appropriate. But surely a much better approach would be to make use of the extra dictionary when logging a message:
logging.Formatter('%(levelname)-8s %(asctime)s %(qthreadname)-15s %(message)s')
...
extras = {'qthreadname': get_qthreadname()}
logging.warning(message, extra=extras)
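Here get_qthreadname() is not a standard function; a minimal sketch of such a helper, assuming PyQt4 and that the thread has been given an object name, might look like this:

from PyQt4 import QtCore

def get_qthreadname():
    # hypothetical helper: report the current QThread's objectName(),
    # falling back to a repr of the thread if no name was set
    thread = QtCore.QThread.currentThread()
    name = str(thread.objectName())
    return name or repr(thread)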
I had exactly your problem: I have a Qt GUI app running in the main thread and several Workers running in separate threads.
I started using the extras = {'qthreadname': get_qthreadname()} approach suggested by Ekhumoro, but the problem was the integration with other libraries that use logging. If you don't provide the extra dictionary, logging will throw an exception, as described in the docs and summarized here:
If you choose to use these attributes in logged messages, you need to exercise some care. In the above example, for instance, the Formatter has been set up with a format string which expects ‘clientip’ and ‘user’ in the attribute dictionary of the LogRecord. If these are missing, the message will not be logged because a string formatting exception will occur. So in this case, you always need to pass the extra dictionary with these keys.
While this might be annoying, this feature is intended for use in specialized circumstances, such as multi-threaded servers where the same code executes in many contexts, and interesting conditions which arise are dependent on this context (such as remote client IP address and authenticated user name, in the above example). In such circumstances, it is likely that specialized Formatters would be used with particular Handlers.
Instead of having specialized Formatters and Handlers, I have found a different approach. A QThread is also a thread, and you can always get a reference to the current thread and set its name. Here is a snippet of my code:
import threading
from PyQt4 import QtCore

# ... your code ...
worker = MyWorker()
worker_thread = QtCore.QThread()
worker_thread.setObjectName('MyThread')
worker.moveToThread(worker_thread)
# ... your code ...

class MyWorker(QtCore.QObject):
    # your code
    def start(self):
        # give the Python-level thread the same name as the QThread
        threading.current_thread().name = QtCore.QThread.currentThread().objectName()
        # your code
Now all the log messages arriving from the QThread are properly identified.
I hope it can help you!
From the Qt5 documentation, you can call setObjectName() to modify the thread name:
To choose the name that your thread will be given (as identified by the command ps -L on Linux, for example), you can call setObjectName() before starting the thread.
If you don't call setObjectName(), the name given to your thread will be the class name of the runtime type of your thread object (for example, "RenderThread" in the case of the Mandelbrot Example, as that is the name of the QThread subclass).
Unfortunately, it also notes:
this is currently not available with release builds on Windows.
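A minimal usage sketch (the PyQt5 import and thread name are placeholders; whether the OS-level thread name is actually applied depends on the Qt version and platform):

from PyQt5 import QtCore

thread = QtCore.QThread()
# set the name before start() so tools like "ps -L" can show it on Linux
thread.setObjectName('RenderThread')
thread.start()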
In an application I want to collect messages related to some dedicated part of the processing, and then show these messages later at user request. I would like to report severity (e.g. info, warning), time of message, etc., so for this I considered to use the Python standard logging module to collect the messages and related information. However, I don't want these messages to go to a console or file.
Is there a way to create a Python logger, using logging, where the messages are kept internally (in memory) only, until read out by the application? I would expect the code to start something like:
log = logging.getLogger('my_logger')
# ... some config of log for internal only; not to console
log.error('Just some error')
# ... some code to get/clear messages in log until now
I have tried to look in logging — Logging facility for Python, but most examples are for immediate output to a file or the console, so an example of internal logging, or a reference, would be appreciated.
You should just use another handler. You could use a StreamHandler over an io.StringIO that would simply log to memory:
import io
import logging

log = logging.getLogger('my_logger')
memlog = io.StringIO()
log.addHandler(logging.StreamHandler(memlog))
All logging sent to log can be found in memlog.getvalue()
Of course, this is just a simple Handler that concatenates everything into one single string (although for Python versions >= 3.2 each record is terminated with a \n by default). For more specific requirements, you could have a look at a QueueHandler or implement a dedicated Handler, as sketched below.
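As a sketch of the "dedicated Handler" route (the class and method names here are invented for illustration), a handler can simply keep the formatted records in a Python list until the application reads them:

import logging

class MemoryListHandler(logging.Handler):
    # keep formatted records in memory until the application asks for them
    def __init__(self):
        logging.Handler.__init__(self)
        self.records = []

    def emit(self, record):
        self.records.append(self.format(record))

    def pop_all(self):
        # return the collected messages and clear the buffer
        messages, self.records = self.records, []
        return messages

log = logging.getLogger('my_logger')
handler = MemoryListHandler()
handler.setFormatter(logging.Formatter('%(levelname)s %(asctime)s %(message)s'))
log.addHandler(handler)
log.error('Just some error')
# later, at user request:
# for line in handler.pop_all(): print(line)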
References: logging.handlers in the Python Standard Library reference manual.
Second Edit: After a bit of digging, the question changed from how to log an exception with local variables to how to prevent celery from sending the second log message, which does not have the local vars. After the attempt below, I noticed that I was actually always receiving two emails, one with local vars per frame and the other without.
First Edit: I've managed to sort of get the local variables by adding a custom on_failure override (using annotations for all tasks), like so:
def include_f_locals(self, exc, task_id, args, kwargs, einfo):
    import logging
    logger = logging.getLogger('celery')
    logger.error(exc, exc_info=einfo)

CELERY_ANNOTATIONS = {'*': {'on_failure': include_f_locals}}
But the problem now is that the error arrives 3 times, once through the celery logger and twice through root (although I'm not propagating 'celery' logger in my logging settings)
Original question:
I have a django/celery project to which I recently added a sentry handler as the root logger with level 'ERROR'. This works fine for most errors and exceptions occurring in django, except the ones coming from celery workers.
What happens is that sentry receives an exception with the traceback and the locals of the daemon, but does not include the f_locals (local vars) of each frame in the stack. And these do appear in normal python/django exceptions.
I guess I could try to catch all exceptions and log with exc_info manually. But this is less than ideal.
Funnily enough, all my troubles went away when I simply upgraded raven to a version after 5.0 (specifically after 5.1).
I'm not sure which changes caused exceptions to be logged properly (with f_locals correctly appearing in Sentry), but the fact remains that raven < 5.0 did not work for me.
Also, there is no need to do any fancy CELERY_ANNOTATIONS legwork (as above): simply defining a Sentry handler for the correct logger seems to be enough to catch exceptions, as well as messages at other logging levels.
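For illustration only (the handler class path and logger names are assumptions based on the standard raven/Django integration, not taken from the answer above), attaching a Sentry handler to the 'celery' logger in Django's LOGGING dict might look roughly like this:

LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'handlers': {
        'sentry': {
            'level': 'ERROR',
            # class path as used by the raven Django integration
            'class': 'raven.contrib.django.raven_compat.handlers.SentryHandler',
        },
    },
    'loggers': {
        'celery': {
            'handlers': ['sentry'],
            'level': 'ERROR',
            'propagate': False,
        },
    },
}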
I'm looking at how to log to syslog from within my Python app, and I found there are two ways of doing it:
Using syslog.syslog() routines
Using the logging module's SysLogHandler
Which is the best option to use, and what are the advantages/disadvantages of each? I really don't know which one I should use.
syslog.syslog() can only be used to send messages to the local syslogd. SysLogHandler can be used as part of a comprehensive, configurable logging subsystem, and can log to remote machines.
The logging module is a more comprehensive solution that can potentially handle all of your log messages, and is very flexible. For instance, you can set up multiple handlers for your logger, each set to log at a different level. You can have a SysLogHandler for sending errors to syslog, a FileHandler for debug logs, and an SMTPHandler to email the really critical messages to ops. You can also define a hierarchy of loggers within your modules, each with its own level, so you can enable/disable messages from specific modules, such as:
import logging
logger = logging.getLogger('package.stable_module')
logger.setLevel(logging.WARNING)
And in another module:
import logging
logger = logging.getLogger('package.buggy_module')
logger.setLevel(logging.DEBUG)
The log messages in both of these modules will be sent, depending on the level, to the 'package' logger and ultimately to the handlers you've defined, as in the sketch below. You can also add handlers directly to the module loggers, and so on. If you've followed along this far and are still interested, then I recommend jumping to the logging tutorial for more details.
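As a rough sketch of that kind of setup (the addresses, file name, and level thresholds are placeholders), the handlers described above could be combined on one logger like this:

import logging
import logging.handlers

logger = logging.getLogger('package')
logger.setLevel(logging.DEBUG)

# errors and above go to the local syslog daemon (Linux socket path)
syslog_handler = logging.handlers.SysLogHandler(address='/dev/log')
syslog_handler.setLevel(logging.ERROR)

# everything from DEBUG up goes to a file
file_handler = logging.FileHandler('debug.log')
file_handler.setLevel(logging.DEBUG)

# really critical messages are emailed to ops (placeholder addresses)
smtp_handler = logging.handlers.SMTPHandler(
    mailhost='localhost',
    fromaddr='app@example.com',
    toaddrs=['ops@example.com'],
    subject='Critical error')
smtp_handler.setLevel(logging.CRITICAL)

for handler in (syslog_handler, file_handler, smtp_handler):
    logger.addHandler(handler)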
One disadvantage of logging.handlers.SysLogHandler that hasn't been mentioned yet: you can't set options like LOG_ODELAY, LOG_NOWAIT or LOG_PID. On the other hand, LOG_CONS and LOG_PERROR can be achieved by adding more handlers, and LOG_NDELAY is effectively the default, because the connection opens when the handler is instantiated.
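By contrast, the syslog module does let you pass those options when opening the log; a minimal sketch (the ident string is arbitrary, keyword arguments require Python 3.2+):

import syslog

# LOG_PID adds the process id to each message; LOG_NDELAY opens the
# connection immediately rather than on the first message
syslog.openlog(ident='myapp',
               logoption=syslog.LOG_PID | syslog.LOG_NDELAY,
               facility=syslog.LOG_USER)
syslog.syslog(syslog.LOG_WARNING, 'something happened')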
I've been struggling with multiprocessing logging for some time, for many reasons.
One of my reasons is: why another get_logger?
Of course I've seen this question, and it seems the logger that multiprocessing.get_logger returns does some "process-shared locks" magic to make logging handling smooth.
So, today I looked into the multiprocessing code of Python 2.7 (/multiprocessing/util.py), and found that this logger is just a plain logging.Logger, and there's barely any magic around it.
Here's the description in the Python documentation, right before the get_logger function:
Some support for logging is available. Note, however, that the logging
package does not use process shared locks so it is possible (depending
on the handler type) for messages from different processes to get
mixed up.
So if you use the wrong logging handler, even the get_logger logger's messages may get mixed up?
I've used a program that uses get_logger for logging for some time.
It prints logs to a StreamHandler and (seemingly) never gets mixed up.
Now my theory is:
multiprocessing.get_logger doesn't do process-shared locks at all
StreamHandler works for multiprocessing, but FileHandler doesn't
the major purpose of this get_logger logger is to track processes'
life-cycles, and to provide an easy-to-get, ready-to-use logger
that already logs the process's name/id and similar info
Here's the question:
Is my theory right?
How/Why/When do you use this get_logger?
Yes, I believe you're right that multiprocessing.get_logger() doesn't do process-shared locks - as you say, the docs even state this. Despite all the upvotes, it looks like the question you link to is flawed in stating that it does (to give it the benefit of the doubt, it was written over a decade ago - so perhaps that was the case at one point).
Why does multiprocessing.get_logger() exist then? The docs say that it:
Returns the logger used by multiprocessing. If necessary, a new one will be created.
When first created the logger has level logging.NOTSET and no default handler. Messages sent to this logger will not by default propagate to the root logger.
i.e. by default the multiprocessing module will not produce any log output, since its logger's level is set to NOTSET and it has no handler.
If you were to have a problem with your code that you suspected to be an issue with multiprocessing, that lack of log output wouldn't be helpful for debugging, and that's what multiprocessing.get_logger() exists for - it returns the logger used by the multiprocessing module itself so that you can override the default logging configuration to get some logs from it and see what it's doing.
Since you asked for how to use multiprocessing.get_logger(), you'd call it like so and configure the logger in the usual fashion, for example:
logger = multiprocessing.get_logger()
formatter = logging.Formatter('[%(levelname)s/%(processName)s] %(message)s')
handler = logging.StreamHandler()
handler.setFormatter(formatter)
logger.addHandler(handler)
logger.setLevel(logging.INFO)
# now run your multiprocessing code
That said, you may actually want to use multiprocessing.log_to_stderr() instead for convenience - as per the docs:
This function performs a call to get_logger() but in addition to returning the logger created by get_logger, it adds a handler which sends output to sys.stderr using format '[%(levelname)s/%(processName)s] %(message)s'
i.e. it saves you needing to set up quite so much logging config yourself, and you can instead start debugging your multiprocessing issue with just:
logger = multiprocessing.log_to_stderr()
logger.setLevel(logging.INFO)
# now run your multiprocessing code
To reiterate though, that's just a normal module logger that's being configured and used, i.e. there's nothing special or process-safe about it. It just lets you see what's happening inside the multiprocessing module itself.
This answer is not about get_logger specifically, but perhaps you can use the approach suggested in this post? Note that the QueueHandler/QueueListener classes are available for earlier Python versions via the logutils package (available on PyPI, too).
I have a pool of worker processes (using multiprocessing.Pool) and want to log from these to a single log file. I am aware of logging servers, syslog, etc. but they all seem to require some changes to how my app is installed, monitored, logs processed etc. which I would like to avoid.
I am using CPython 2.6 on Linux.
Finally I stumbled into a solution which almost works for me. The basic idea is that you start a log listener process, set up a queue between it and the worker processes, and the workers log into the queue (using QueueHandler), and the listener then formats and serializes the log lines to a file.
This is all working so far according to the solution linked above.
But then I wanted to have the workers log some contextual information, for example a job token, for every log line. In the pool.apply_async() call I can pass in the contextual info I want to be logged. Note that I am only interested in the contextual information while the worker is doing the specific job; when it is idle, there should not be any contextual information if the worker wants to log something. So basically the log listener has a log format specified as something like:
"%(job_token)s %(process)d %(asctime)s %(message)s"
and the workers are supposed to provide job_token as contextual info in the log record (the other format specifiers are standard).
I have looked at custom log filters. With a custom filter I can create a filter when the job starts and apply it to the root logger, but I am using 3rd-party modules which create their own loggers (typically at module import time), and my custom filter is not applied to them.
Is there a way to make this work in the above setup? Or is there some alternative way to make this work (remember that I would still prefer a single log file, no separate log servers, job-specific contextual information for worker log lines)?
Filters can be applied to handlers as well as loggers - so you can just apply the filter to your QueueHandler. If this handler is attached to the root logger in your processes, then any logging by third party modules should also be handled by the handler, so you should get the context in those logged events, too.
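A minimal sketch of that idea (the class and token names are invented for illustration; QueueHandler comes from logging.handlers on Python >= 3.2, or from the logutils package on older versions such as the 2.6 mentioned above):

import logging
import logging.handlers

class ContextFilter(logging.Filter):
    # inject the current job token into every record passing through the handler
    def __init__(self, job_token=''):
        logging.Filter.__init__(self)
        self.job_token = job_token

    def filter(self, record):
        record.job_token = self.job_token
        return True

# in each worker process, after receiving the shared queue:
# queue_handler = logging.handlers.QueueHandler(log_queue)
# context_filter = ContextFilter()
# queue_handler.addFilter(context_filter)
# logging.getLogger().addHandler(queue_handler)
#
# when a job starts, update the token; because third-party loggers propagate
# to the root logger, their records pass through the same handler and filter:
# context_filter.job_token = current_job_token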