Why is Twisted Manhole ConnectionDone an error? - python

I'm using twisted manhole (https://github.com/HoverHell/pyaux/blob/master/pyaux/runlib.py#L126), and I also send errors caught by Twisted into python logging (https://github.com/HoverHell/pyaux/blob/master/pyaux/twisted_aux.py#L9).
However, as a result, the log gets ConnectionDone() errors, which aren't very interesting as errors.
What would be appropriate to change to avoid getting this (and possibly some other) not-exactly-errors? Filtering for twisted.python.failure.Failure cases, perhaps? And where is the ConnectionDone() even raised from, and why?

A ConnectionDone() instance is given to the connectionLost() callback after the connection has been closed. You should be seeing this when the client side decides to close the connection.
You definitely don't want to filter the Failure out. You can think of a Failure as an asynchronous analogue of an Exception. The usual way to suppress one particular kind of exception is something like:
from twisted.internet import error
...

def connectionLost(self, reason):
    if reason.check(error.ConnectionDone):
        # this is normal, ignore it
        pass
    else:
        # do whatever you have been doing for logging
        pass
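Since the question routes Twisted errors into Python logging through a log observer (as in the linked twisted_aux.py), the same check can also be applied at the observer level instead of inside connectionLost. A rough sketch, assuming the classic twisted.python.log observer API; the logger name and observer function are illustrative:

import logging

from twisted.internet import error
from twisted.python import log

pylog = logging.getLogger("twisted")


def forward_errors(event):
    # Only error events carry a Failure worth logging.
    if not event.get("isError"):
        return
    failure = event.get("failure")
    if failure is not None and failure.check(error.ConnectionDone):
        # Clean close of a manhole connection; not worth logging as an error.
        return
    pylog.error(log.textFromEventDict(event))


log.addObserver(forward_errors)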

Related

Python 3 exception handling and catching

I'm designing a workflow engine for a very specific task and I'm thinking about exception handling.
I've got a main process that calls a few functions. Most of those functions call other, more specific functions, and so on. There are a few libraries involved, so there are a lot of specific errors that can occur: IOError, OSError, AuthenticationException, ...
I have to stop the workflow when an error occurs and log it so I can continue from that point when the error is resolved.
Example of what I mean:
def workflow_runner():
    download_file()
    ...
    # (more calls with their own exceptions)
    ...

def download_file():
    ftps = open_ftp_connection()
    ftps.get(filename)
    ...
    # (more calls with their own exceptions)
    ...

def open_ftp_connection():
    ftps = ftplib.FTP_TLS()
    try:
        ftps.connect(domain, port)
        ftps.login(username, password)
    except ftplib.all_errors as e:
        print(e)
        raise
    return ftps
Your basic, run of the mill, modular functions.
My question is this:
What's considered the best way of doing top to bottom error handling in Python 3?
To raise every exception to the top and thus put "try except" over each function call up the stack?
To handle every exception when it happens, log and raise and have no "try except" at the "top"?
Some better alternative?
Would it be better to just finish and raise the error on the spot or catch it in the "download_file" and/or "workflow_runner" functions?
I ask because if I end up catching everything at the top I feel like I might end up with:
except AError
except BError
...
except A4Error
It depends… You catch an exception at the point where you can do something about it. That differs between different functions and different exception types. A piece of code calls a subsystem (generically speaking any function), and it knows that subsystem may raise exception A, B or C. It now needs to decide what exceptions it expects and/or what it can do about each one of them. In the end it may decide to catch A and B exceptions, but it wouldn't make sense for it to catch C exceptions because it can't do anything about them. This now means this piece of code may raise C exceptions, and its callers need to be aware of that and make the same kinds of decisions.
So different exceptions are caught at different layers, as appropriate.
In more concrete terms, say you have a system which consists of an HTTP object that downloads stuff from remote servers, a job manager that wrangles a bunch of these HTTP objects and stores their results in a database, and a top-level coordinator that starts and stops the job managers. The HTTP objects may obviously raise all sorts of HTTP exceptions when network requests fail, and the job managers may raise exceptions when something's wrong with the database. You will probably let the job managers worry about HTTP errors like 404, but not about something fundamental like a ComputerDoesntHaveANetworkInterface error; equally, a DatabaseIsUnreachable exception is nothing a job manager can do anything about, and should probably lead to the termination of the application.
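To make the layering concrete, here is a hedged sketch of the idea; the names (fetch_url, run_job, store_result, DatabaseUnavailableError) are illustrative and not from the question:

import logging
import urllib.error
import urllib.request


class DatabaseUnavailableError(Exception):
    """Raised by the storage layer when the result store cannot be reached."""


def fetch_url(url):
    # Lowest layer: let network errors propagate; callers decide what to do.
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read()


def store_result(payload):
    raise DatabaseUnavailableError("pretend the database is down")


def run_job(url):
    # Middle layer: a 404 is something this layer can handle (skip the job),
    # but a database failure is not, so it is allowed to propagate upward.
    try:
        payload = fetch_url(url)
    except urllib.error.HTTPError as exc:
        if exc.code == 404:
            logging.warning("nothing at %s, skipping", url)
            return
        raise
    store_result(payload)


def workflow_runner(urls):
    # Top layer: log once and stop the workflow on anything fatal.
    try:
        for url in urls:
            run_job(url)
    except DatabaseUnavailableError:
        logging.exception("stopping workflow; resume once the DB is back")
        raise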

Python SimpleHTTPServer keeps going down and I don't know why

This is my first time working with SimpleHTTPServer, and honestly my first time working with web servers in general, and I'm having a frustrating problem. I'll start up my server (via SSH) and then I'll go try to access it and everything will be fine. But I'll come back a few hours later and the server won't be running anymore. And by that point the SSH session has disconnected, so I can't see if there were any error messages. (Yes, I know I should use something like screen to save the shell messages -- trying that right now, but I need to wait for it to go down again.)
I thought it might just be that my code was throwing an exception, since I had no error handling, but I added what should be a pretty catch-all try/except block, and I'm still experiencing the issue. (I feel like this is probably not the best way to handle errors, but I'm new at this... so let me know if there's a better approach.)
class MyRequestHandler(SimpleHTTPServer.SimpleHTTPRequestHandler):
    # (this is the only function my request handler has)
    def do_GET(self):
        if 'search=' in self.path:
            try:
                pass  # (my code that does stuff)
            except Exception as e:
                # (log the error to a file)
                return
        else:
            SimpleHTTPServer.SimpleHTTPRequestHandler.do_GET(self)
Does anyone have any advice for things to check, or ways to diagnose the issue? Most likely, I guess, is that my code is just crashing somewhere else... but if there's anything in particular I should know about the way SimpleHTTPServer operates, let me know.
I've never had SimpleHTTPServer running for an extended period of time; usually I just use it to transfer a couple of files in an ad-hoc manner. But I guess it wouldn't be so bad, as long as your security constraints are enforced elsewhere (i.e. a firewall) and you don't need much scale.
The SSH session is ending, which is killing your tasks (both foreground and background tasks). There are two solutions to this:
Like you've already mentioned use a utility such as screen to prevent your session from ending.
If you really want this to run for an extended period of time, you should look into your operating system's documentation on how to start/stop/enable services (nowadays most of the cool kids are using systemd, but you might also find yourself using SysVinit or some other init system)
EDIT:
This link is in the comments, but I thought I should put it here as it answers this question pretty well
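If you want to confirm that diagnosis, here is a hedged sketch (module names are from the question; the port and log filename are assumed) that logs any exception escaping the serve loop to a file. If the process disappears and this log stays empty, the server was almost certainly killed from outside, for example by the SSH session ending, rather than by a crash in the handler code:

import logging
import SimpleHTTPServer
import SocketServer

logging.basicConfig(filename='server.log', level=logging.INFO)

httpd = SocketServer.TCPServer(("", 8000), SimpleHTTPServer.SimpleHTTPRequestHandler)
try:
    logging.info("serving on port 8000")
    httpd.serve_forever()
except BaseException:
    # A crash inside the serve loop gets a traceback in server.log; a kill
    # signal (e.g. SIGHUP from the SSH session ending) never reaches this.
    logging.exception("serve_forever() exited with an exception")
    raise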

keep a continuous mongo connection active using pymongo

I have a consumer reading from Kafka, which has a continuous stream of events, and every so often I have to write to a Mongo collection, for which I need to keep a Mongo connection open continuously. My solution to this, which I feel is fairly hacky, is to re-initialize the connection every 5 minutes or so to avoid a network timeout. This covers periods in which there are no events from Kafka and the connection is idle.
Can anyone suggest a better way to do this? I'm pretty sure this is the wrong way to go about maintaining a continuous connection to Mongo.
I'm using the pymongo client.
I have a MongoAdapter class which has helper methods:
from pymongo import MongoClient
import pymongo
import time

class MongoAdapter:
    def __init__(self, databaseName, userid, password, host):
        self.databaseName = databaseName
        self.userid = userid
        self.password = password
        self.host = host
        self.connection = MongoClient(host=self.host, maxPoolSize=100,
                                      socketTimeoutMS=1000, connectTimeoutMS=1000)
        self.getDatabase()

    def getDatabase(self):
        try:
            if self.connection[self.databaseName].authenticate(self.userid, self.password):
                print "authenticated true"
                self.database = self.connection[self.databaseName]
        except pymongo.errors.OperationFailure:
            print "Error: Please check Database Name, UserId, Password"
and I use the class in the following way to re-connect:
adapter_reinit_threshold = 300  # every 300 seconds, instantiate a new mongo connection
adapter_config_time = time.time()
while True:
    if (time.time() - adapter_config_time) > adapter_reinit_threshold:
        adapter = MongoAdapter(config.db_name, config.db_user,
                               config.db_password, config.db_host)  # re-connect
        adapter_config_time = time.time()  # update adapter_config_time
The reason I went ahead and did it this way was that I thought the old, unused objects (with open connections) would be garbage collected and their connections closed. Although this method works fine, I want to know if there's a cleaner way to do it and what the pitfalls of this approach might be.
From the documentation of pymongo.mongo_client.MongoClient
If an operation fails because of a network error, ConnectionFailure is
raised and the client reconnects in the background. Application code
should handle this exception (recognizing that the operation failed)
and then continue to execute.
I don't think you need to implement your own re-connection method.
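For illustration, a minimal sketch of the approach the documentation describes: keep one long-lived client and retry the individual operation when a network error surfaces. The database, collection, and helper names here are illustrative, not from the question:

import time

from pymongo import MongoClient
from pymongo.errors import ConnectionFailure

client = MongoClient(host="localhost", maxPoolSize=100)
events = client["mydb"]["events"]


def save_event(event, retries=3):
    for attempt in range(retries):
        try:
            return events.insert_one(event)
        except ConnectionFailure:
            # The client reconnects in the background; back off briefly and retry.
            time.sleep(2 ** attempt)
    raise RuntimeError("could not write event after %d attempts" % retries)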

Invalid transaction persisting across requests

Summary
One of our threads in production hit an error and is now producing InvalidRequestError: This session is in 'prepared' state; no further SQL can be emitted within this transaction. errors, on every request with a query that it serves, for the rest of its life! It's been doing this for days, now! How is this possible, and how can we prevent it going forward?
Background
We are using a Flask app on uWSGI (4 processes, 2 threads), with Flask-SQLAlchemy providing us DB connections to SQL Server.
The problem seemed to start when one of our threads in production was tearing down its request, inside this Flask-SQLAlchemy method:
@teardown
def shutdown_session(response_or_exc):
    if app.config['SQLALCHEMY_COMMIT_ON_TEARDOWN']:
        if response_or_exc is None:
            self.session.commit()
    self.session.remove()
    return response_or_exc
...and somehow managed to call self.session.commit() when the transaction was invalid. This resulted in sqlalchemy.exc.InvalidRequestError: Can't reconnect until invalid transaction is rolled back getting output to stdout, in defiance of our logging configuration, which makes sense since it happened while the app context was being torn down, and teardown is never supposed to raise exceptions. I'm not sure how the transaction got to be invalid without response_or_exc getting set, but that's actually the lesser problem AFAIK.
The bigger problem is, that's when the "'prepared' state" errors started, and haven't stopped since. Every time this thread serves a request that hits the DB, it 500s. Every other thread seems to be fine: as far as I can tell, even the thread that's in the same process is doing OK.
Wild guess
The SQLAlchemy mailing list has an entry about the "'prepared' state" error saying it happens if a session started committing and hasn't finished yet, and something else tries to use it. My guess is that the session in this thread never got to the self.session.remove() step, and now it never will.
I still feel like that doesn't explain how this session is persisting across requests though. We haven't modified Flask-SQLAlchemy's use of request-scoped sessions, so the session should get returned to SQLAlchemy's pool and rolled back at the end of the request, even the ones that are erroring (though admittedly, probably not the first one, since that raised during the app context tearing down). Why are the rollbacks not happening? I could understand it if we were seeing the "invalid transaction" errors on stdout (in uwsgi's log) every time, but we're not: I only saw it once, the first time. But I see the "'prepared' state" error (in our app's log) every time the 500s occur.
Configuration details
We've turned off expire_on_commit in the session_options, and we've turned on SQLALCHEMY_COMMIT_ON_TEARDOWN. We're only reading from the database, not writing yet. We're also using Dogpile-Cache for all of our queries (using the memcached lock since we have multiple processes, and actually, 2 load-balanced servers). The cache expires every minute for our major query.
Updated 2014-04-28: Resolution steps
Restarting the server seems to have fixed the problem, which isn't entirely surprising. That said, I expect to see it again until we figure out how to stop it. benselme (below) suggested writing our own teardown callback with exception handling around the commit, but I feel like the bigger problem is that the thread was messed up for the rest of its life. The fact that this didn't go away after a request or two really makes me nervous!
Edit 2016-06-05:
A PR that solves this problem has been merged on May 26, 2016.
Flask PR 1822
Edit 2015-04-13:
Mystery solved!
TL;DR: Be absolutely sure your teardown functions succeed, by using the teardown-wrapping recipe in the 2014-12-11 edit!
Started a new job also using Flask, and this issue popped up again, before I'd put in place the teardown-wrapping recipe. So I revisited this issue and finally figured out what happened.
As I thought, Flask pushes a new request context onto the request context stack every time a new request comes down the line. This is used to support request-local globals, like the session.
Flask also has a notion of "application" context which is separate from request context. It's meant to support things like testing and CLI access, where HTTP isn't happening. I knew this, and I also knew that that's where Flask-SQLA puts its DB sessions.
During normal operation, both a request and an app context are pushed at the beginning of a request, and popped at the end.
However, it turns out that when pushing a request context, the request context checks whether there's an existing app context, and if one's present, it doesn't push a new one!
So if the app context isn't popped at the end of a request due to a teardown function raising, not only will it stick around forever, it won't even have a new app context pushed on top of it.
That also explains some magic I hadn't understood in our integration tests. You can INSERT some test data, then run some requests and those requests will be able to access that data despite you not committing. That's only possible since the request has a new request context, but is reusing the test application context, so it's reusing the existing DB connection. So this really is a feature, not a bug.
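A small illustration of that reuse, written as a hypothetical test snippet (it relies on the private _app_ctx_stack attribute, which existed in the Flask versions current at the time of this post):

from flask import Flask, _app_ctx_stack

app = Flask(__name__)

with app.app_context():
    outer = _app_ctx_stack.top
    with app.test_request_context('/'):
        # No new app context was pushed; the request reuses the outer one.
        assert _app_ctx_stack.top is outer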
That said, it does mean you have to be absolutely sure your teardown functions succeed, using something like the teardown-function wrapper below. That's a good idea even without that feature to avoid leaking memory and DB connections, but is especially important in light of these findings. I'll be submitting a PR to Flask's docs for this reason. (Here it is)
Edit 2014-12-11:
One thing we ended up putting in place was the following code (in our application factory), which wraps every teardown function to make sure it logs the exception and doesn't raise further. This ensures the app context always gets popped successfully. Obviously this has to go after you're sure all teardown functions have been registered.
# Flask specifies that teardown functions should not raise.
# However, they might not have their own error handling,
# so we wrap them here to log any errors and prevent errors from
# propagating.
from functools import wraps

def wrap_teardown_func(teardown_func):
    @wraps(teardown_func)
    def log_teardown_error(*args, **kwargs):
        try:
            teardown_func(*args, **kwargs)
        except Exception as exc:
            app.logger.exception(exc)
    return log_teardown_error

if app.teardown_request_funcs:
    for bp, func_list in app.teardown_request_funcs.items():
        for i, func in enumerate(func_list):
            app.teardown_request_funcs[bp][i] = wrap_teardown_func(func)
if app.teardown_appcontext_funcs:
    for i, func in enumerate(app.teardown_appcontext_funcs):
        app.teardown_appcontext_funcs[i] = wrap_teardown_func(func)
Edit 2014-09-19:
Ok, turns out --reload-on-exception isn't a good idea if 1.) you're using multiple threads and 2.) terminating a thread mid-request could cause trouble. I thought uWSGI would wait for all requests for that worker to finish, like uWSGI's "graceful reload" feature does, but it seems that's not the case. We started having problems where a thread would acquire a dogpile lock in Memcached, then get terminated when uWSGI reloads the worker due to an exception in a different thread, meaning the lock is never released.
Removing SQLALCHEMY_COMMIT_ON_TEARDOWN solved part of our problem, though we're still getting occasional errors during app teardown during session.remove(). It seems these are caused by SQLAlchemy issue 3043, which was fixed in version 0.9.5, so hopefully upgrading to 0.9.5 will allow us to rely on the app context teardown always working.
Original:
How this happened in the first place is still an open question, but I did find a way to prevent it: uWSGI's --reload-on-exception option.
Our Flask app's error handling ought to be catching just about anything, so it can serve a custom error response, which means only the most unexpected exceptions should make it all the way to uWSGI. So it makes sense to reload the whole app whenever that happens.
We'll also turn off SQLALCHEMY_COMMIT_ON_TEARDOWN, though we'll probably commit explicitly rather than writing our own callback for app teardown, since we're writing to the database so rarely.
A surprising thing is that there's no exception handling around that self.session.commit. And a commit can fail, for example if the connection to the DB is lost. So the commit fails, the session is not removed, and the next time that particular thread handles a request it still tries to use that now-invalid session.
Unfortunately, Flask-SQLAlchemy doesn't offer any clean way to plug in your own teardown function. One option is to set SQLALCHEMY_COMMIT_ON_TEARDOWN to False and then write your own teardown function.
It should look like this:
@app.teardown_appcontext
def shutdown_session(response_or_exc):
    try:
        if response_or_exc is None:
            sqla.session.commit()
    finally:
        sqla.session.remove()
    return response_or_exc
Now, you will still have your failing commits, and you'll have to investigate that separately... But at least your thread should recover.

Tornado: Can I run code after calling self.finish() in an asynchronous RequestHandler?

I'm using Tornado. I have a bunch of asynchronous request handlers. Most of them do their work asynchronously, and then report the result of that work back to the user. But I have one handler whose job it is to simply tell the user that their request is going to be processed at some point in the future. I finish the HTTP connection and then do more work. Here's a trivialized example:
class AsyncHandler(tornado.web.RequestHandler):
    @tornado.web.asynchronous
    def get(self, *args, **kwargs):
        # first just tell the user to go away
        self.write("Your request is being processed.")
        self.finish()
        # now do work
        ...
My question is: is this a legitimate use of Tornado? Will the code after the self.finish() run reliably? I've never had a problem with it before, but now I'm seeing a problem with it in one of my development environments (not all of them). There are a number of work-arounds here that I've already identified, but I want to make sure I'm not missing something fundamental to the request-lifecycle in Tornado. There doesn't SEEM to be a reason why I wouldn't be able to run code after calling self.finish(), but maybe I'm wrong.
Thanks!
Yes, you can.
You have to define the on_finish method of your RequestHandler. This method runs after the request has finished and the response has been sent to the client.
RequestHandler.on_finish()
Called after the end of a request.
Override this method to perform cleanup, logging, etc. This method is
a counterpart to prepare. on_finish may not produce any output, as it
is called after the response has been sent to the client.
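For illustration, a minimal sketch of the on_finish approach; the handler and helper names are made up:

import tornado.web


def do_background_work():
    pass  # placeholder for the deferred processing


class AckHandler(tornado.web.RequestHandler):
    def get(self):
        # Respond right away; for a synchronous get(), Tornado calls finish()
        # automatically after this method returns.
        self.write("Your request is being processed.")

    def on_finish(self):
        # Called by Tornado after the response has been sent to the client.
        # It must not produce any output.
        do_background_work()


application = tornado.web.Application([(r"/ack", AckHandler)])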
Yes, your code after self.finish() will run reliably. But you can't call self.finish() twice; that will raise an exception. You can use self.finish() to close the connection before all the work on the server is done.
But, as Cole Maclean said, don't do heavy work after finish().
Look for another way to run heavy tasks in the background.
