Appengine Appstats giving deadlock - python

i have a Flask app on gae, it is working correctly. I am trying to add Appstats support, but once i enable it, i have a deadlock.
This deadlock is apparently happening when i try to setup a werkzeug LocalProxy with the logged user ndb model (it is called current_user, like it's done in Flask-Login, to give you more details).
The error is:
RuntimeError: Deadlock waiting for <Future 104c02f50 created by get_async(key.py:545) for tasklet get(context.py:612) suspended generator get(context.py:645); pending>
The LocalProxy object is setup using this syntax (as per Werkzeug doc):
current_user = LocalProxy(lambda: _get_user())
And _get_user() makes a simple synchronous query ndb.query.
Thanks in advance for any help.

I ran into this issue today. In my case it seems to be that the request to get a users details is triggering appstats. Appstats is then going through the call stack and storing details of all the local variables in each stack frame.
The session itself is in one of these stack frames, so appstats tries to print it out and triggers the user fetching code again.
Came up with 2 "solutions", though neither of them are great.
Disable appstats altogether.
Disable logging of local variables in appstats.
I've gone for the latter at the moment. appstats allows you to configure various settings in your appengine_config.py file. I was able to avoid logging of local variable details (which stops the code from triggering the bug) by adding this:
appstats_MAX_LOCALS = 0

Related

Connecting to MongoDb (or any other db server) with Pyramid

What is the difference between connecting to the MongoDb server with the following two lines in the models.py module and then import models.py inside views.py:
from pymongo import MongoClient
db = MongoClient()['name']
versus adding db to request as described here or here?
I just started playing round with Pyramid and MongoDb, I used the first approach and it works well. Then I found out that people use the second approach.
Am I doing something wrong?
There's nothing wrong with what you're doing, but it's less future proof in case your app is going to become complex. The pattern your using is what sometimes is called "using a module as a singleton". The first time your module is imported, the code runs, creating a module level object that can be used from any other code that imports from this module. There's nothing wrong with this, it's a normal python pattern and is the reason you don't see much in the way of singleton boilerplate in python land.
However, in a complex app, it can become useful to control exactly when something happens, regardless of who imports what when. When we create the client at config time as per the docs example, you know that it gets created when the config (server startup) block is running as opposed to whenever any code imports your module, and you know from then on that it's available through your registry, which is accessible everywhere in a Pyramid app through the request object. This is the normal Pyramid best practise: set up all your one-time shared across requests machinery in the server start up code where you create your configurator, and (probably) attach them to the configurator or its registry.
This is the same reason we hook things into request lifecycle callbacks, it allows us to know where and when some piece of per-request code executes, and to make sure that a clean up helper always fires at the end of the request lifecycle. So for DB access, we create the shared machinery in config startup, and at the beginning of a request we create the per-connection code, cleaning up afterwards at the end of the request. For an SQL db, this would mean starting the transaction, and then committing or rolling back at the end.
So it might not matter at all for your app right now, but it's good practise for growing code bases. Most of the Pyramid design decisions were made for complex code situations.

Invalid transaction persisting across requests

Summary
One of our threads in production hit an error and is now producing InvalidRequestError: This session is in 'prepared' state; no further SQL can be emitted within this transaction. errors, on every request with a query that it serves, for the rest of its life! It's been doing this for days, now! How is this possible, and how can we prevent it going forward?
Background
We are using a Flask app on uWSGI (4 processes, 2 threads), with Flask-SQLAlchemy providing us DB connections to SQL Server.
The problem seemed to start when one of our threads in production was tearing down its request, inside this Flask-SQLAlchemy method:
#teardown
def shutdown_session(response_or_exc):
if app.config['SQLALCHEMY_COMMIT_ON_TEARDOWN']:
if response_or_exc is None:
self.session.commit()
self.session.remove()
return response_or_exc
...and somehow managed to call self.session.commit() when the transaction was invalid. This resulted in sqlalchemy.exc.InvalidRequestError: Can't reconnect until invalid transaction is rolled back getting output to stdout, in defiance of our logging configuration, which makes sense since it happened during the app context tearing down, which is never supposed to raise exceptions. I'm not sure how the transaction got to be invalid without response_or_exec getting set, but that's actually the lesser problem AFAIK.
The bigger problem is, that's when the "'prepared' state" errors started, and haven't stopped since. Every time this thread serves a request that hits the DB, it 500s. Every other thread seems to be fine: as far as I can tell, even the thread that's in the same process is doing OK.
Wild guess
The SQLAlchemy mailing list has an entry about the "'prepared' state" error saying it happens if a session started committing and hasn't finished yet, and something else tries to use it. My guess is that the session in this thread never got to the self.session.remove() step, and now it never will.
I still feel like that doesn't explain how this session is persisting across requests though. We haven't modified Flask-SQLAlchemy's use of request-scoped sessions, so the session should get returned to SQLAlchemy's pool and rolled back at the end of the request, even the ones that are erroring (though admittedly, probably not the first one, since that raised during the app context tearing down). Why are the rollbacks not happening? I could understand it if we were seeing the "invalid transaction" errors on stdout (in uwsgi's log) every time, but we're not: I only saw it once, the first time. But I see the "'prepared' state" error (in our app's log) every time the 500s occur.
Configuration details
We've turned off expire_on_commit in the session_options, and we've turned on SQLALCHEMY_COMMIT_ON_TEARDOWN. We're only reading from the database, not writing yet. We're also using Dogpile-Cache for all of our queries (using the memcached lock since we have multiple processes, and actually, 2 load-balanced servers). The cache expires every minute for our major query.
Updated 2014-04-28: Resolution steps
Restarting the server seems to have fixed the problem, which isn't entirely surprising. That said, I expect to see it again until we figure out how to stop it. benselme (below) suggested writing our own teardown callback with exception handling around the commit, but I feel like the bigger problem is that the thread was messed up for the rest of its life. The fact that this didn't go away after a request or two really makes me nervous!
Edit 2016-06-05:
A PR that solves this problem has been merged on May 26, 2016.
Flask PR 1822
Edit 2015-04-13:
Mystery solved!
TL;DR: Be absolutely sure your teardown functions succeed, by using the teardown-wrapping recipe in the 2014-12-11 edit!
Started a new job also using Flask, and this issue popped up again, before I'd put in place the teardown-wrapping recipe. So I revisited this issue and finally figured out what happened.
As I thought, Flask pushes a new request context onto the request context stack every time a new request comes down the line. This is used to support request-local globals, like the session.
Flask also has a notion of "application" context which is separate from request context. It's meant to support things like testing and CLI access, where HTTP isn't happening. I knew this, and I also knew that that's where Flask-SQLA puts its DB sessions.
During normal operation, both a request and an app context are pushed at the beginning of a request, and popped at the end.
However, it turns out that when pushing a request context, the request context checks whether there's an existing app context, and if one's present, it doesn't push a new one!
So if the app context isn't popped at the end of a request due to a teardown function raising, not only will it stick around forever, it won't even have a new app context pushed on top of it.
That also explains some magic I hadn't understood in our integration tests. You can INSERT some test data, then run some requests and those requests will be able to access that data despite you not committing. That's only possible since the request has a new request context, but is reusing the test application context, so it's reusing the existing DB connection. So this really is a feature, not a bug.
That said, it does mean you have to be absolutely sure your teardown functions succeed, using something like the teardown-function wrapper below. That's a good idea even without that feature to avoid leaking memory and DB connections, but is especially important in light of these findings. I'll be submitting a PR to Flask's docs for this reason. (Here it is)
Edit 2014-12-11:
One thing we ended up putting in place was the following code (in our application factory), which wraps every teardown function to make sure it logs the exception and doesn't raise further. This ensures the app context always gets popped successfully. Obviously this has to go after you're sure all teardown functions have been registered.
# Flask specifies that teardown functions should not raise.
# However, they might not have their own error handling,
# so we wrap them here to log any errors and prevent errors from
# propagating.
def wrap_teardown_func(teardown_func):
#wraps(teardown_func)
def log_teardown_error(*args, **kwargs):
try:
teardown_func(*args, **kwargs)
except Exception as exc:
app.logger.exception(exc)
return log_teardown_error
if app.teardown_request_funcs:
for bp, func_list in app.teardown_request_funcs.items():
for i, func in enumerate(func_list):
app.teardown_request_funcs[bp][i] = wrap_teardown_func(func)
if app.teardown_appcontext_funcs:
for i, func in enumerate(app.teardown_appcontext_funcs):
app.teardown_appcontext_funcs[i] = wrap_teardown_func(func)
Edit 2014-09-19:
Ok, turns out --reload-on-exception isn't a good idea if 1.) you're using multiple threads and 2.) terminating a thread mid-request could cause trouble. I thought uWSGI would wait for all requests for that worker to finish, like uWSGI's "graceful reload" feature does, but it seems that's not the case. We started having problems where a thread would acquire a dogpile lock in Memcached, then get terminated when uWSGI reloads the worker due to an exception in a different thread, meaning the lock is never released.
Removing SQLALCHEMY_COMMIT_ON_TEARDOWN solved part of our problem, though we're still getting occasional errors during app teardown during session.remove(). It seems these are caused by SQLAlchemy issue 3043, which was fixed in version 0.9.5, so hopefully upgrading to 0.9.5 will allow us to rely on the app context teardown always working.
Original:
How this happened in the first place is still an open question, but I did find a way to prevent it: uWSGI's --reload-on-exception option.
Our Flask app's error handling ought to be catching just about anything, so it can serve a custom error response, which means only the most unexpected exceptions should make it all the way to uWSGI. So it makes sense to reload the whole app whenever that happens.
We'll also turn off SQLALCHEMY_COMMIT_ON_TEARDOWN, though we'll probably commit explicitly rather than writing our own callback for app teardown, since we're writing to the database so rarely.
A surprising thing is that there's no exception handling around that self.session.commit. And a commit can fail, for example if the connection to the DB is lost. So the commit fails, session is not removed and next time that particular thread handles a request it still tries to use that now-invalid session.
Unfortunately, Flask-SQLAlchemy doesn't offer any clean possibility to have your own teardown function. One way would be to have the SQLALCHEMY_COMMIT_ON_TEARDOWN set to False and then writing your own teardown function.
It should look like this:
#app.teardown_appcontext
def shutdown_session(response_or_exc):
try:
if response_or_exc is None:
sqla.session.commit()
finally:
sqla.session.remove()
return response_or_exc
Now, you will still have your failing commits, and you'll have to investigate that separately... But at least your thread should recover.

event, exception logging for django

Ok, here is my confusion/problem:
I develop in localhost and there you could raise exceptions and could see the logs in command line easily.
Then I deploy the code on test, stage and production server, that is where the problem begins, it is not easy to see logs or debug errors and exceptions. For normal errors I guess django-toolbar could be enabled, but I do get some silent exceptions which dont crash but the process manipulates to failure because of that. For example, I have payment integration, and few days ago the payments were failing on return (callback) on our site, but nothing was crashing, just that payment process failed message was coming, but the payment gateway vendor was working fine, then I had to look for some failure instances which could lead to this problem and figured out that one db save operation was not saving because the variable was not there.
Now my question, is Sentry (https://github.com/getsentry/sentry) an answer for that? Or is there any other option for this?
Please do ask if any further clarification is needed for my requirement.
Sentry is an option, but honestly is too limited (I tried it a month ago or so), it's intended to track exceptions, but in the real world we should track important informations and events too.
If you didn't setup an application logging, I suggest you to do it, by following this example.
In my app I defined several loggers, for different purposes, the python logging configuration via dictionary (the one used by Django), is very powerful and you have a full control over how things get logged, for example you can write logs to a file, to a database, send an email, call a third party api or whatever. If your app is running in a load balanced environment (so you have several machines running your app), you can use services like Loggly to aggregate the logs coming from your instances in a single place (and since it uses RSYSLOG, it aggregates not only your Django app logs, but also all the logs of your underlying OS).
I suggest you to use also New Relic, which keeps track of a lot of stuff automatically: query executed and time, template loading time, errors and a lot of other useful statistics.

Logging execution time on Pyramid

I'm trying to log how much time my webapp takes to answer each request.
Right now I have a metaclass for Handlers that wraps each action and calculates the time passed between entering the method and exiting. This works fine, except that the logged times do not include the time spent rendering the templates... How could I do it?
This is the purpose of a tween. It is middleware that wraps the Pyramid application, so it has access to both the ingress and egress of a request within Pyramid. Note that there is already the debug toolbar which displays how long the entire request took. This is also a good application for WSGI middleware of which I'm sure a package already exists or you could easily write your own.
http://docs.pylonsproject.org/projects/pyramid/en/1.3-branch/narr/hooks.html#registering-tweens

500 Error without anything in the apache logs

I am currently developing an application based on flask. It runs fine spawning the server manually using app.run(). I've tried to run it through mod_wsgi now. Strangely, I get a 500 error, and nothing in the logs. I've investigated a bit and here are my findings.
Inserting a line like print >>sys.stderr, "hello" works as expected. The message shows up in the error log.
When calling a method without using a template it works just fine. No 500 Error.
Using a simple template works fine too.
BUT as soon as I trigger a database access inside the template (for example looping over a query) I get the error.
My gut tells me that it's SQLAlchemy which emits an error, and maybe some logging config causes the log to be discarded at some point in the application.
Additionally, for testing, I am using SQLite. This, as far as I can recall, can only be accessed from one thread. So if mod_wsgi spawns more threads, it may break the app.
I am a bit at a loss, because it only breaks running behind mod_wsgi, which also seems to swallow my errors. What can I do to make the errors bubble up into the apache error_log?
For reference, the code can be seen on this github permalink.
Turns out I was not completely wrong. The exception was indeed thrown by sqlalchemy. And as it's streamed to stdout by default, mod_wsgi silently ignored it (as far as I can tell).
To answer my main question: How to see the errors produced by the WSGI app?
It's actually very simple. Redirect your logs to stderr. The only thing you need to do, is add the following to your WSGI script:
import logging, sys
logging.basicConfig(stream=sys.stderr)
Now, this is the most mundane logging config. As I haven't put anything into place yet for my application this will do. But, I guess, once the application matures you will have a more sophisticated logging config anyways, so this won't bite you.
But for quick and dirty debugging, this will do just fine.
I had a similar problem: occasional "Internal Server Error" without logs. When you use mod_wsgi you should remove "app.run()" because this will always start a local WSGI server which we do not want if we deploy that application to mod_wsgi. See docs. I do not know if this is your case, but I hope this can help.
If you put this into your config.py it will help dramatically in propagating errors up to the apache error log:
PROPAGATE_EXCEPTIONS = True

Categories

Resources