How to prevent duplicate exception logging from a celery task - python

Second edit: After a bit of digging, the question changed from how to log an exception with local variables to how to prevent celery from sending a second log message which does not have the local vars. After the attempt below, I noticed that I was always receiving two emails: one with local vars per frame and one without.
First edit: I've managed to sort of get the local variables, by adding a custom on_failure override (using annotations for all tasks) like so:
def include_f_locals(self, exc, task_id, args, kwargs, einfo):
    import logging
    logger = logging.getLogger('celery')
    logger.error(exc, exc_info=einfo)

CELERY_ANNOTATIONS = {'*': {'on_failure': include_f_locals}}
But the problem now is that the error arrives three times: once through the celery logger and twice through root (although I'm not propagating the 'celery' logger in my logging settings).
Original question:
I have a django/celery project to which I recently added a sentry handler on the root logger with level 'ERROR'. This works fine for most errors and exceptions occurring in django, except the ones coming from celery workers.
What happens is that sentry receives an exception with the traceback and the locals of the daemon, but does not include the f_locals (local vars) of each frame in the stack. And these do appear in normal python/django exceptions.
I guess I could try to catch all exceptions and log with exc_info manually. But this is less than ideal.
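For completeness, the manual workaround I was considering would look roughly like this (a sketch only: `app` is assumed to be the Celery instance and do_actual_work is a placeholder for the real task body):
import logging

logger = logging.getLogger('celery')

@app.task  # `app` assumed to be the Celery() instance
def my_task(*args, **kwargs):
    try:
        do_actual_work(*args, **kwargs)  # placeholder for the real task body
    except Exception as exc:
        # Log with exc_info so the handler receives the full traceback (and frame locals).
        logger.error('Task failed: %s', exc, exc_info=True)
        raise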

Funnily enough, all my troubles went away when I simply upgraded raven to a version after 5.0 (specifically after 5.1).
I'm not sure which changes caused exceptions to be logged properly (with f_locals correctly appearing in sentry), but the fact remains that raven < 5.0 did not work for me.
Also, there is no need to do any fancy CELERY_ANNOTATIONS legwork (as above): simply defining a sentry handler for the correct logger seems to be enough to catch exceptions, as well as messages at other logging levels.
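For anyone looking for a starting point, here is a minimal sketch of what such a Django LOGGING configuration might look like, assuming the raven Django integration (raven.contrib.django.raven_compat) is installed; the handler names, levels, and choice of loggers are illustrative, not a definitive recipe:
# Illustrative Django LOGGING config: route ERROR-level records,
# including those from the celery workers, to Sentry via raven.
LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'handlers': {
        'console': {
            'level': 'INFO',
            'class': 'logging.StreamHandler',
        },
        'sentry': {
            'level': 'ERROR',
            'class': 'raven.contrib.django.raven_compat.handlers.SentryHandler',
        },
    },
    'root': {
        'level': 'INFO',
        'handlers': ['console', 'sentry'],
    },
    'loggers': {
        'celery': {
            'level': 'ERROR',
            'handlers': ['sentry'],
            'propagate': False,  # avoid the duplicate delivery through root
        },
    },
}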

Related

Changing log levels per request in Flask

I have a Flask codebase with the standard library's logging module used throughout. I'd like to be able to increase and decrease the log level on a per-request basis. This way I could get trace or debug level information logged just on the specific requests I'm interested in debugging by setting a query string param or something similar to that.
I know that Flask has a before_request and after_request decorator that could be used to raise and lower the log level. However, it would be a bad deal if an unhandled exception is thrown and prevented the log level from being reset to its default level. Is there a reliable way to set and restore the log level per request?
I am also not very familiar with how the logging module interacts with threads and Flask. If a multi-threaded application server is running my Flask app, is the logger shared across threads? I don't want to lower the log level in one request and have another, unrelated request execute with the first request's logger.
You can use the Flask logger to create a child logger:
@app.before_request
def before_request():
    if 'trace' in request.args:
        trace_logger = app.logger.getChild('trace')
        trace_logger.setLevel(logging.DEBUG)
        g._trace_logger = trace_logger
It will re-use the parent's configuration. Specifically, it'll use the handlers and formatters you've configured but not the parent's log level or filters. The logger is also scoped to the request via the g object so it will not be available in other requests, making this design thread-safe.
Next, write a helper function:
def trace(*args, **kwargs):
    logger = getattr(g, '_trace_logger', None)
    if logger is not None:
        logger.debug(*args, **kwargs)
You can use this function as if it were a logger call throughout your codebase.
If you are not running in trace mode the function is a no-op.
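For illustration, here is a hypothetical view showing how the trace() helper might be called; the route, fetch_orders(), and the template name are made up:
from flask import render_template, request

@app.route('/orders')
def list_orders():
    trace('query args: %r', dict(request.args))  # emitted only when ?trace is present
    orders = fetch_orders()                      # placeholder for real query logic
    trace('fetched %d orders', len(orders))
    return render_template('orders.html', orders=orders)
Requesting /orders?trace=1 emits the debug lines through the request-scoped child logger, while a plain /orders leaves trace() as a no-op.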

Invalid transaction persisting across requests

Summary
One of our threads in production hit an error and is now producing InvalidRequestError: This session is in 'prepared' state; no further SQL can be emitted within this transaction. errors on every request it serves that runs a query, for the rest of its life! It's been doing this for days now! How is this possible, and how can we prevent it going forward?
Background
We are using a Flask app on uWSGI (4 processes, 2 threads), with Flask-SQLAlchemy providing us DB connections to SQL Server.
The problem seemed to start when one of our threads in production was tearing down its request, inside this Flask-SQLAlchemy method:
@teardown
def shutdown_session(response_or_exc):
    if app.config['SQLALCHEMY_COMMIT_ON_TEARDOWN']:
        if response_or_exc is None:
            self.session.commit()
    self.session.remove()
    return response_or_exc
...and somehow managed to call self.session.commit() when the transaction was invalid. This resulted in sqlalchemy.exc.InvalidRequestError: Can't reconnect until invalid transaction is rolled back getting output to stdout, in defiance of our logging configuration, which makes sense since it happened during the app context tearing down, which is never supposed to raise exceptions. I'm not sure how the transaction got to be invalid without response_or_exc getting set, but that's actually the lesser problem AFAIK.
The bigger problem is, that's when the "'prepared' state" errors started, and haven't stopped since. Every time this thread serves a request that hits the DB, it 500s. Every other thread seems to be fine: as far as I can tell, even the thread that's in the same process is doing OK.
Wild guess
The SQLAlchemy mailing list has an entry about the "'prepared' state" error saying it happens if a session started committing and hasn't finished yet, and something else tries to use it. My guess is that the session in this thread never got to the self.session.remove() step, and now it never will.
I still feel like that doesn't explain how this session is persisting across requests though. We haven't modified Flask-SQLAlchemy's use of request-scoped sessions, so the session should get returned to SQLAlchemy's pool and rolled back at the end of the request, even the ones that are erroring (though admittedly, probably not the first one, since that raised during the app context tearing down). Why are the rollbacks not happening? I could understand it if we were seeing the "invalid transaction" errors on stdout (in uwsgi's log) every time, but we're not: I only saw it once, the first time. But I see the "'prepared' state" error (in our app's log) every time the 500s occur.
Configuration details
We've turned off expire_on_commit in the session_options, and we've turned on SQLALCHEMY_COMMIT_ON_TEARDOWN. We're only reading from the database, not writing yet. We're also using Dogpile-Cache for all of our queries (using the memcached lock since we have multiple processes, and actually, 2 load-balanced servers). The cache expires every minute for our major query.
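For concreteness, a minimal sketch of what that configuration looks like with Flask-SQLAlchemy; the database URI and variable names here are illustrative placeholders, not our real settings:
from flask import Flask
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
app.config['SQLALCHEMY_DATABASE_URI'] = 'mssql+pyodbc://...'  # placeholder DSN for SQL Server
app.config['SQLALCHEMY_COMMIT_ON_TEARDOWN'] = True            # later removed, see updates below
db = SQLAlchemy(app, session_options={'expire_on_commit': False})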
Updated 2014-04-28: Resolution steps
Restarting the server seems to have fixed the problem, which isn't entirely surprising. That said, I expect to see it again until we figure out how to stop it. benselme (below) suggested writing our own teardown callback with exception handling around the commit, but I feel like the bigger problem is that the thread was messed up for the rest of its life. The fact that this didn't go away after a request or two really makes me nervous!
Edit 2016-06-05:
A PR that solves this problem was merged on May 26, 2016: Flask PR 1822.
Edit 2015-04-13:
Mystery solved!
TL;DR: Be absolutely sure your teardown functions succeed, by using the teardown-wrapping recipe in the 2014-12-11 edit!
Started a new job also using Flask, and this issue popped up again, before I'd put in place the teardown-wrapping recipe. So I revisited this issue and finally figured out what happened.
As I thought, Flask pushes a new request context onto the request context stack every time a new request comes down the line. This is used to support request-local globals, like the session.
Flask also has a notion of "application" context which is separate from request context. It's meant to support things like testing and CLI access, where HTTP isn't happening. I knew this, and I also knew that that's where Flask-SQLA puts its DB sessions.
During normal operation, both a request and an app context are pushed at the beginning of a request, and popped at the end.
However, it turns out that when pushing a request context, the request context checks whether there's an existing app context, and if one's present, it doesn't push a new one!
So if the app context isn't popped at the end of a request due to a teardown function raising, not only will it stick around forever, it won't even have a new app context pushed on top of it.
That also explains some magic I hadn't understood in our integration tests. You can INSERT some test data, then run some requests and those requests will be able to access that data despite you not committing. That's only possible since the request has a new request context, but is reusing the test application context, so it's reusing the existing DB connection. So this really is a feature, not a bug.
That said, it does mean you have to be absolutely sure your teardown functions succeed, using something like the teardown-function wrapper below. That's a good idea even without that feature to avoid leaking memory and DB connections, but is especially important in light of these findings. I'll be submitting a PR to Flask's docs for this reason. (Here it is)
Edit 2014-12-11:
One thing we ended up putting in place was the following code (in our application factory), which wraps every teardown function to make sure it logs the exception and doesn't raise further. This ensures the app context always gets popped successfully. Obviously this has to go after you're sure all teardown functions have been registered.
from functools import wraps

# Flask specifies that teardown functions should not raise.
# However, they might not have their own error handling,
# so we wrap them here to log any errors and prevent errors from
# propagating.
def wrap_teardown_func(teardown_func):
    @wraps(teardown_func)
    def log_teardown_error(*args, **kwargs):
        try:
            teardown_func(*args, **kwargs)
        except Exception as exc:
            app.logger.exception(exc)
    return log_teardown_error

if app.teardown_request_funcs:
    for bp, func_list in app.teardown_request_funcs.items():
        for i, func in enumerate(func_list):
            app.teardown_request_funcs[bp][i] = wrap_teardown_func(func)
if app.teardown_appcontext_funcs:
    for i, func in enumerate(app.teardown_appcontext_funcs):
        app.teardown_appcontext_funcs[i] = wrap_teardown_func(func)
Edit 2014-09-19:
OK, it turns out --reload-on-exception isn't a good idea if 1) you're using multiple threads and 2) terminating a thread mid-request could cause trouble. I thought uWSGI would wait for all requests for that worker to finish, like uWSGI's "graceful reload" feature does, but it seems that's not the case. We started having problems where a thread would acquire a dogpile lock in Memcached, then get terminated when uWSGI reloads the worker due to an exception in a different thread, meaning the lock is never released.
Removing SQLALCHEMY_COMMIT_ON_TEARDOWN solved part of our problem, though we're still getting occasional errors during app teardown during session.remove(). It seems these are caused by SQLAlchemy issue 3043, which was fixed in version 0.9.5, so hopefully upgrading to 0.9.5 will allow us to rely on the app context teardown always working.
Original:
How this happened in the first place is still an open question, but I did find a way to prevent it: uWSGI's --reload-on-exception option.
Our Flask app's error handling ought to be catching just about anything, so it can serve a custom error response, which means only the most unexpected exceptions should make it all the way to uWSGI. So it makes sense to reload the whole app whenever that happens.
We'll also turn off SQLALCHEMY_COMMIT_ON_TEARDOWN, though we'll probably commit explicitly rather than writing our own callback for app teardown, since we're writing to the database so rarely.
A surprising thing is that there's no exception handling around that self.session.commit(). And a commit can fail, for example if the connection to the DB is lost. So the commit fails, the session is not removed, and the next time that particular thread handles a request it still tries to use that now-invalid session.
Unfortunately, Flask-SQLAlchemy doesn't offer any clean way to supply your own teardown function. One way would be to set SQLALCHEMY_COMMIT_ON_TEARDOWN to False and then write your own teardown function.
It should look like this:
@app.teardown_appcontext
def shutdown_session(response_or_exc):
    try:
        if response_or_exc is None:
            sqla.session.commit()
    finally:
        sqla.session.remove()
    return response_or_exc
Now, you will still have your failing commits, and you'll have to investigate that separately... But at least your thread should recover.

Suspend on exceptions caused inside Django app in PyCharm

I'm debugging a django application and want to suspend code execution at the point where an exception occurs, with the cursor pointing to the problematic place in the code. Django's pretty HTML debug page would be helpful too, but it isn't mandatory. My IDE is PyCharm.
If I set PyCharm to suspend on termination of an exception, I never catch anything, because django handles the exception with its HTML debug page and the exceptions never terminate the process. Setting DEBUG_PROPAGATE_EXCEPTIONS = True in settings.py makes the HTML debug page disappear, but execution still does not terminate.
If I set PyCharm to suspend when an exception is raised, I have to step past all the exceptions raised inside Python internals such as copy.py, decimal.py, gettext.py, etc., which is inconvenient (there are so many of them that I never reach the exceptions caused by my own code).
If I use the "temporary" setup to suspend on exceptions raised after a given breakpoint (which I place at the last line of settings.py), then the django server does not start.
Thanks in advance for your help.
This should happen automatically in PyCharm. What you need to do is set no breakpoints, but run as debug (click on the green bug icon). When the exception occurs, execution should automatically halt.

Python multiprocessing logging - why multiprocessing.get_logger

I've been struggling with multiprocessing logging for some time, for many reasons.
One of my reasons is: why another get_logger?
Of course I've seen this question, and it seems the logger that multiprocessing.get_logger returns does some "process-shared lock" magic to make log handling smooth.
So, today I looked into the multiprocessing code of Python 2.7 (/multiprocessing/util.py), and found that this logger is just a plain logging.Logger, and there's barely any magic around it.
Here's the description in the Python documentation, right before the get_logger function:
Some support for logging is available. Note, however, that the logging
package does not use process shared locks so it is possible (depending
on the handler type) for messages from different processes to get
mixed up.
So when you use the wrong logging handler, even the get_logger logger may get mixed up?
I've been using a program that uses get_logger for logging for some time.
It prints logs via a StreamHandler and (it seems) they never get mixed up.
Now my theory is:
multiprocessing.get_logger doesn't do process-shared locks at all
StreamHandler works for multiprocessing, but FileHandler doesn't
the major purpose of this get_logger logger is to track processes' life-cycles, and to provide an easy-to-get, ready-to-use logger that already logs the process's name/id and that kind of stuff
Here's the question:
Is my theory right?
How/Why/When do you use this get_logger?
Yes, I believe you're right that multiprocessing.get_logger() doesn't do process-shared locks - as you say, the docs even state this. Despite all the upvotes, it looks like the question you link to is flawed in stating that it does (to give it the benefit of the doubt, it was written over a decade ago - so perhaps that was the case at one point).
Why does multiprocessing.get_logger() exist then? The docs say that it:
Returns the logger used by multiprocessing. If necessary, a new one will be created.
When first created the logger has level logging.NOTSET and no default handler. Messages sent to this logger will not by default propagate to the root logger.
i.e. by default the multiprocessing module will not produce any log output, since its logger's level is set to NOTSET and it has no handler.
If you were to have a problem with your code that you suspected to be an issue with multiprocessing, that lack of log output wouldn't be helpful for debugging, and that's what multiprocessing.get_logger() exists for - it returns the logger used by the multiprocessing module itself so that you can override the default logging configuration to get some logs from it and see what it's doing.
Since you asked for how to use multiprocessing.get_logger(), you'd call it like so and configure the logger in the usual fashion, for example:
import logging
import multiprocessing

logger = multiprocessing.get_logger()
formatter = logging.Formatter('[%(levelname)s/%(processName)s] %(message)s')
handler = logging.StreamHandler()
handler.setFormatter(formatter)
logger.addHandler(handler)
logger.setLevel(logging.INFO)
# now run your multiprocessing code
That said, you may actually want to use multiprocessing.log_to_stderr() instead for convenience - as per the docs:
This function performs a call to get_logger() but in addition to returning the logger created by get_logger, it adds a handler which sends output to sys.stderr using format '[%(levelname)s/%(processName)s] %(message)s'
i.e. it saves you needing to set up quite so much logging config yourself, and you can instead start debugging your multiprocessing issue with just:
logger = multiprocessing.log_to_stderr()
logger.setLevel(logging.INFO)
# now run your multiprocessing code
To reiterate though, that's just a normal module logger that's being configured and used, i.e. there's nothing special or process-safe about it. It just lets you see what's happening inside the multiprocessing module itself.
This answer is not about get_logger specifically, but perhaps you can use the approach suggested in this post? Note that the QueueHandler/QueueListener classes are available for earlier Python versions via the logutils package (available on PyPI, too).
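For reference, here is a minimal sketch of that QueueHandler/QueueListener approach, assuming Python 3.2+ where both classes live in logging.handlers (on earlier versions the equivalents come from logutils); the worker function and formatting are illustrative:
import logging
import logging.handlers
import multiprocessing

def worker_init(queue):
    # In each worker process, route all log records to the shared queue.
    root = logging.getLogger()
    root.addHandler(logging.handlers.QueueHandler(queue))
    root.setLevel(logging.INFO)

def do_work(i):
    logging.getLogger(__name__).info('working on item %s', i)
    return i * i

if __name__ == '__main__':
    queue = multiprocessing.Queue()
    # The listener runs in the main process and owns the real handler,
    # so records from all workers are written out by a single process.
    console = logging.StreamHandler()
    console.setFormatter(logging.Formatter('[%(levelname)s/%(processName)s] %(message)s'))
    listener = logging.handlers.QueueListener(queue, console)
    listener.start()
    pool = multiprocessing.Pool(2, initializer=worker_init, initargs=(queue,))
    pool.map(do_work, range(5))
    pool.close()
    pool.join()
    listener.stop()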

How to set the name of a QThread in pyqt?

I am using QtCore.QThread (from PyQt4).
To log, I am also using the following formatter:
logging.Formatter('%(levelname)-8s %(asctime)s %(threadName)-15s %(message)s')
The resulting log is :
DEBUG 2012-10-01 03:59:31,479 Dummy-3 my_message
My problem is that I want to know more explicitly which thread is logging... Dummy-3 is not the most explicit name to me....
Is there a way to set a name on a QtCore.QThread that will be usable by the logging module (as a LogRecord attribute), in order to get more meaningful logs?
Thanks!
If the threading module is available, the logging module will use threading.current_thread().name to set the threadName LogRecord attribute.
But the docs for threading.current_thread say that a dummy thread object will be used if the current thread was not created by the threading module (hence the "Dummy-x" name).
I suppose it would be possible to monkey-patch threading.current_thread to reset the name to something more appropriate. But surely a much better approach would be to make use of the extra dictionary when logging a message:
logging.Formatter('%(levelname)-8s %(asctime)s %(qthreadname)-15s %(message)s')
...
extras = {'qthreadname': get_qthreadname()}
logging.warning(message, extra=extras)
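For completeness, the get_qthreadname() helper above is left undefined; a plausible implementation, assuming the threads were given names with setObjectName() (as in the next answer), could be:
from PyQt4 import QtCore

def get_qthreadname():
    # Returns the current QThread's object name; empty if setObjectName() was never called.
    return QtCore.QThread.currentThread().objectName()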
I had exactly your problem: I have a Qt GUI app running in the main thread and several Workers running in separate threads.
I started using the extras = {'qthreadname': get_qthreadname()} approach suggested by Ekhumoro, but the problem was the integration with other libraries that use logging. If you don't provide the extra dictionary, logging will throw an exception, as described in the docs and summarized here:
If you choose to use these attributes in logged messages, you need to exercise some care. In the above example, for instance, the Formatter has been set up with a format string which expects ‘clientip’ and ‘user’ in the attribute dictionary of the LogRecord. If these are missing, the message will not be logged because a string formatting exception will occur. So in this case, you always need to pass the extra dictionary with these keys.
While this might be annoying, this feature is intended for use in specialized circumstances, such as multi-threaded servers where the same code executes in many contexts, and interesting conditions which arise are dependent on this context (such as remote client IP address and authenticated user name, in the above example). In such circumstances, it is likely that specialized Formatters would be used with particular Handlers.
Instead of having specialized Formatters and Handlers, I found a different approach. A QThread is also a thread, and you can always get a reference to the current thread and set its name. Here is a snippet of my code:
import threading
#
# your code
worker = MyWorker()
worker_thread = QThread()
worker_thread.setObjectName('MyThread')
worker.moveToThread(worker_thread)
#
# your code

class MyWorker(QtCore.QObject):
    # your code
    def start(self):
        # Give the Python-level thread the same name as the QThread, so that
        # %(threadName)s in log records shows 'MyThread' instead of 'Dummy-N'.
        threading.current_thread().name = QtCore.QThread.currentThread().objectName()
        # your code
Now all the log messages arriving from the QThread are properly identified.
I hope this helps!
From the Qt5 documentation, you can call setObjectName() to modify the thread name:
To choose the name that your thread will be given (as identified by the command ps -L on Linux, for example), you can call setObjectName() before starting the thread.
If you don't call setObjectName(), the name given to your thread will be the class name of the runtime type of your thread object (for example, "RenderThread" in the case of the Mandelbrot Example, as that is the name of the QThread subclass).
Unfortunately, it also notes:
this is currently not available with release builds on Windows.
