Twisted server crashes unexpectedly while running Django - python

I am running a Django application on Twisted using the django-on-twisted scripts from this site.
All requests are served by an nginx server, which reverse proxies relevant requests to Twisted. I have a URL set up for an API, which basically just receives GET requests and does some processing on the GET parameters before sending a response. However, when a specific client hits the API, the Twisted server just shuts down. Pasted below is the nginx log:
the.ip.of.client - - [21/Apr/2012:11:30:36 -0400] "GET /api/url/?get=params&more=params HTTP/1.1" 499 0 "-" "Java/1.6.0_24"
The Twisted logs show nothing, but Twisted stops working at this point. From the error code 499, I am assuming that the client closed the connection unexpectedly, which I have no problem with. Whether the client receives the response or not is not important to me. Here is the relevant Django view:
def api_url(request):
    if request.GET:
        get_param = request.GET.get('get', [''])[0]
        more_param = request.GET.get('more', [''])[0]
        # some processing here based on the GET params
        return HttpResponse('OK')
    else:
        raise Http404
The request from the client is valid and does not affect the processing adversely; I have tested it from the shell. When I tried it on the Django development server, it crashed in the same way, without leaving any trace of having received the request. Everything works perfectly well when testing from the browser, and the Twisted server works well for all the regular use cases. This is the first time I am facing an issue with it. Any help or pointers will be appreciated.

There is no 499 HTTP code in any RFC; nginx defines the 499 code itself. It occurs when a client sends a request and then closes the connection without waiting for the response. If there are a lot of 499s in your access_log, it is mostly caused by slow back-ends (too slow for your users to wait for), and you may have to optimize your website's performance.
http://forum.nginx.org/read.php?2,213789,213794#msg-213794

You say the problem comes from a client hitting a particular URL (is it reproducible?).
Since it works for you with gunicorn but not with django-on-twisted, either the script is not working properly or twisted.web2 is the issue.
Please try $ sh init.sh yourdjangoproject stand.
You can also try modifying run.py to catch SystemExit:
import pdb

try:
    pass  # __main__ stuff here
except (KeyboardInterrupt, SystemExit):
    pdb.set_trace()

Related

Django - strange browser behavior causes broken pipe

I have noticed for a long time a strange behavior that causes my server to do extra work. It happens with the Safari browser: whenever you touch the address bar and start to edit the existing URL, the browser sends the same GET request to the server and closes the connection before the server returns the response. When you finish editing the address and hit enter, it sends the new request.
This behavior can happen multiple times while you edit the URL in the address bar.
This causes the server to fully process the request, and when it returns the result it raises a broken pipe error.
This happens in both cases, with DEBUG = True and False, so I can see it on the local debug server and I can see the request happening on my nginx production server.
Is there a way to identify such a request, so as to not serve results and save the server processing power?
Thanks

nginx gives 502 instead of 500 for unhandled exception in Flask

I am running a Flask application using a uWSGI socket + nginx configuration.
This may be a stupid question, but the issue I am facing is that whenever an exception is raised in the Flask code (an unhandled exception, for example 1/0), nginx gives a 502 instead of a 500. I wanted to know whether the exception is not propagated to nginx as a 500 over the uWSGI unix socket by default, or whether I need to specify this explicitly. Somewhere I read that Flask doesn't raise a 500 error message automatically for exceptions. Any comments will be helpful. Thanks.
Look at the 502 spec:
The HyperText Transfer Protocol 502 Bad Gateway server error response
code indicates that the server, while acting as a gateway or proxy,
received an invalid response from the upstream server.
If your app sends anything in response while hitting an exception, it is most likely garbage, and nginx (correctly) raises the 502, meaning "I won't (as opposed to can't) talk to the backend".
If you want 500s, you have to catch any possible underlying exception and wrap it in a valid response, which nginx will then pass through.
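A minimal sketch of such a catch-all handler in Flask (the route and message are illustrative, not from the original question):

from flask import Flask, jsonify

app = Flask(__name__)

@app.errorhandler(Exception)
def handle_unexpected_error(exc):
    # Wrap any unhandled exception in a well-formed 500 response
    # so the upstream proxy sees valid HTTP instead of garbage.
    return jsonify(error='internal server error'), 500

@app.route('/boom')
def boom():
    return str(1 / 0)  # ZeroDivisionError, now returned as a clean 500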
Finally I was able to solve this issue, thanks to Jeff Storey and joppich.
My proxy server was not able to read the response from my backend server, which was causing this issue. I added an interceptor to catch all exceptions and propagate them to nginx via uWSGI, together with the nginx directive proxy_intercept_errors on;. Big thanks Jeff and joppich, I appreciate your help.
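For reference, the nginx side of that might look roughly like this sketch (the socket path and error page are illustrative; for a uWSGI socket the directive is uwsgi_intercept_errors, while proxy_pass setups use proxy_intercept_errors):

location / {
    include uwsgi_params;
    uwsgi_pass unix:/tmp/myapp.sock;   # illustrative socket path
    uwsgi_intercept_errors on;         # let nginx replace upstream error bodies
    error_page 500 502 /50x.html;      # nginx serves its own page for these codes
}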

Socket.io POST Requests from Socket.IO-Client-Swift

I am running socket.io on an Apache server through Python Flask. We're integrating it into an iOS app (using the Socket.IO-Client-Swift library) and we're having a weird issue.
From the client-side code in the app (written in Swift), I can view the actual connection log (client-side in Xcode) and see the connection established from the client's IP and the requests being made. The client never receives any information back (even when using a global event response handler) from the socket server.
I wrote a very simple test script in JavaScript on an HTML page, sent requests that way, and received the proper responses back. With that said, it seems likely to be an issue with iOS. I've found these articles (but none of them helped fix the problem):
https://github.com/nuclearace/Socket.IO-Client-Swift/issues/95
https://github.com/socketio/socket.io-client-swift/issues/359
My next thought is to extend socket.io's logging to find out exactly what data is being POSTed to the socket namespace. Is there a way to log exactly what data is coming into the server? (Bear in mind that the 'on' hook I've set up on the server side is not getting any data; I've tried to log from there, but it doesn't appear to even get that far.)
I found mod_dumpio for Linux to log all POST requests but I'm not sure how well it will play with multi-threading and a socket server.
Any ideas on how to get the exact data being posted so we can at least troubleshoot the syntax and make sure the data isn't being malformed when it's sent to the server?
Thanks!
Update
When testing locally, we got it working (it was a setting in the Swift code where the namespace wasn't being declared properly). This works fine now on localhost but we are having the exact same issues when emitting to the Apache server.
We are not using mod_wsgi (as far as I know; I'm relatively new to mod_wsgi, apologies for any ignorance). We used to have a .wsgi file that called the main app script, but we had to change that because mod_wsgi is not compatible with Flask-SocketIO (as stated in the uWSGI Web Server section here). The way I am running the script now is with supervisord running the .py file as a daemon (chosen specifically so it will autostart if the server crashes).
Locally, it worked great once we installed the eventlet module through pip. When I ran pip freeze on my virtual environment on the server, eventlet was installed. I uninstalled and reinstalled it just to see if that cleared anything up and that did nothing. No other Python modules that are on my local copy seem to be something that would affect this.
One other thing to keep in mind is that in the function that initializes the app, we change the port to port 80:
socketio.run(app, host='0.0.0.0', port=80)
because we have other API functions that run through a domain that is pointing to the server in this app. I'm not sure if that would affect anything but it doesn't seem to matter on the local version.
I'm at a dead end again and am trying to find anything that could help. Thanks for your assistance!
Another Update
I'm not exactly sure what was happening yet, but we went ahead and rewrote some of the code, making sure to pay extra attention to the namespace declarations within each socket event 'on' function. It's working fine now. As I get more details, I will post them here, as I figure this will be useful for others who have the same problem. This thread also has some really valuable information on how to go about debugging/logging these types of issues, although we never fully figured out the answer to the original question.
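For illustration, declaring the namespace explicitly on every handler and emit in Flask-SocketIO looks roughly like this (the event names and namespace are made up for the example):

from flask import Flask
from flask_socketio import SocketIO, emit

app = Flask(__name__)
socketio = SocketIO(app)

@socketio.on('my_event', namespace='/chat')
def handle_my_event(data):
    # Reply on the same namespace the client connected to;
    # a mismatched namespace silently drops the events.
    emit('my_response', {'ok': True}, namespace='/chat')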
I assume you have verified that Apache does get the POST requests. That should be your first test, if Apache does not log the POST requests coming from iOS, then you have a different kind of problem.
If you do get the POST requests, then you can add some custom code in the middleware used by Flask-SocketIO and print the request data forwarded by Apache's mod_wsgi. This is in the file flask_socketio/__init__.py. The relevant portion is this:
class _SocketIOMiddleware(socketio.Middleware):
    # ...
    def __call__(self, environ, start_response):
        # log what you need from environ here
        environ['flask.app'] = self.flask_app
        return super(_SocketIOMiddleware, self).__call__(environ, start_response)
You can find out what's in environ in the WSGI specification. In particular, the body of the request is available in environ['wsgi.input'], which is a file-like object you read from.
Keep in mind that once you read the payload, this file will be consumed, so the WSGI server will not be able to read it again. Seeking the file back to the position it was at before the read may work on some WSGI implementations. A safer hack I've seen people use to avoid this problem is to read the whole payload into a buffer, then replace environ['wsgi.input'] with a brand new StringIO or BytesIO object.
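A sketch of that buffering trick, which could be called from the __call__ method shown above (the helper name is made up):

import io

def log_and_restore_body(environ):
    # Read the whole request body so it can be logged...
    body = environ['wsgi.input'].read()
    print('request body: %r' % body)
    # ...then put a fresh stream back so the WSGI server can still read it.
    environ['wsgi.input'] = io.BytesIO(body)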
Are you using flask-socketio on the server side? If you are, there is a lot of debugging available in the constructor.
socketio = SocketIO(app, async_mode=async_mode, logger=True, engineio_logger=True)

nginx + uwsgi 502 Bad Gateway python

I'm running a script in Python that takes a long time to process. The thing is, if the function takes too long to run, I guess nginx has a timeout in its configuration, and that causes some kind of error and prevents the function from running to completion.
I just want to know where I can increase the value of the timeout, because I've tried some directives in the nginx conf file, such as:
uwsgi_connect_timeout 75;
uwsgi_send_timeout 75;
uwsgi_read_timeout 75;
keepalive_timeout 650;
but none of these worked.
Thanks in advance.
The problem with just extending the timeout is that no matter how much longer you set it, you will run into limitations somewhere along the line: either with the web server, the browser, or your geocode calls. If it is something that routinely fails n times in a request, then you can't really make any guarantees.
So rather than having the client request hang on a long-running process (and by extension risking a server timeout), why don't you use something like Celery to run those geocode tasks, and on the client side, submit the request via JavaScript and poll the server for the answer via AJAX until it gets a response?
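A minimal sketch of that pattern, assuming Redis as the Celery broker/backend and Flask for concreteness; the task body, routes, and return value are all illustrative:

from celery import Celery
from flask import Flask, jsonify

celery_app = Celery('tasks',
                    broker='redis://localhost:6379/0',
                    backend='redis://localhost:6379/0')

@celery_app.task
def geocode(address):
    # ... the slow work that used to block the web request ...
    return {'address': address, 'lat': 0.0, 'lng': 0.0}

app = Flask(__name__)

@app.route('/geocode/<address>', methods=['POST'])
def start_geocode(address):
    task = geocode.delay(address)  # enqueue and return immediately
    return jsonify(task_id=task.id), 202

@app.route('/status/<task_id>')
def poll_geocode(task_id):
    result = celery_app.AsyncResult(task_id)
    if result.ready():
        return jsonify(state='done', result=result.get())
    return jsonify(state='pending')

The client then POSTs once to start the job and polls /status/<task_id> until the state is done, so no HTTP request ever outlives the proxy timeout.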
I also had the Bad Gateway error in an nginx + uWSGI configuration, and for the sake of people who google this question: it might be a missing uWSGI Python plugin. Please see: uWSGI configuration issue: uwsgi fails without any error message.
I tried everything written in the above responses, as well as suggestions from other places, but they did not work.
My solution was changing my socket in both the uwsgi.conf and nginx.conf files.
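For example (an illustrative socket path; the point is that both sides must name the exact same socket):

# uwsgi.ini
[uwsgi]
socket = /tmp/myapp.sock
chmod-socket = 664

# nginx.conf, inside the server block
location / {
    include uwsgi_params;
    uwsgi_pass unix:/tmp/myapp.sock;   # must match the uwsgi socket above
}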

104, 'Connection reset by peer' socket error, or When does closing a socket result in a RST rather than FIN?

We're developing a Python web service and a client web site in parallel. When we make an HTTP request from the client to the service, one call consistently raises a socket.error in socket.py, in read:
(104, 'Connection reset by peer')
When I listen in with wireshark, the "good" and "bad" responses look very similar:
Because of the size of the OAuth header, the request is split into two packets. The service responds to both with ACK
The service sends the response, one packet per header (HTTP/1.0 200 OK, then the Date header, etc.). The client responds to each with ACK.
(Good request) the server sends a FIN, ACK. The client responds with a FIN, ACK. The server responds ACK.
(Bad request) the server sends a RST, ACK, the client doesn't send a TCP response, the socket.error is raised on the client side.
Both the web service and the client are running on a Gentoo Linux x86-64 box running glibc-2.6.1. We're using Python 2.5.2 inside the same virtual_env.
The client is a Django 1.0.2 app that is calling httplib2 0.4.0 to make requests. We're signing requests with the OAuth signing algorithm, with the OAuth token always set to an empty string.
The service is running Werkzeug 0.3.1, which is using Python's wsgiref.simple_server. I ran the WSGI app through wsgiref.validator with no issues.
It seems like this should be easy to debug, but when I trace through a good request on the service side, it looks just like the bad request: in the socket._socketobject.close() function, delegate methods are turned into dummy methods, and when the send or sendto (can't remember which) method is switched off, the FIN or RST is sent and the client starts processing.
"Connection reset by peer" seems to place blame on the service, but I don't trust httplib2 either. Can the client be at fault?
** Further debugging - Looks like server on Linux **
I have a MacBook, so I tried running the service on one and the client website on the other. The Linux client calls the OS X server without the bug (FIN ACK). The OS X client calls the Linux service with the bug (RST ACK, and a (54, 'Connection reset by peer')). So, it looks like it's the service running on Linux. Is it x86_64? A bad glibc? wsgiref? Still looking...
** Further testing - wsgiref looks flaky **
We've gone to production with Apache and mod_wsgi, and the connection resets have gone away. See my answer below, but my advice is to log the connection reset and retry. This will let your server run OK in development mode, and solidly in production.
I've had this problem. See The Python "Connection Reset By Peer" Problem.
You have (most likely) run afoul of small timing issues based on the Python Global Interpreter Lock.
You can (sometimes) correct this with a time.sleep(0.01) placed strategically.
"Where?" you ask. Beats me. The idea is to provide some better thread concurrency in and around the client requests. Try putting it just before you make the request so that the GIL is reset and the Python interpreter can clear out any pending threads.
Don't use wsgiref for production. Use Apache and mod_wsgi, or something else.
We continue to see these connection resets, sometimes frequently, with wsgiref (the backend used by the werkzeug test server, and possibly others like the Django test server). Our solution was to log the error, retry the call in a loop, and give up after ten failures. httplib2 tries twice, but we needed a few more. They seem to come in bunches as well - adding a 1 second sleep might clear the issue.
We've never seen a connection reset when running through Apache and mod_wsgi. I don't know what they do differently, (maybe they just mask them), but they don't appear.
When we asked the local dev community for help, someone confirmed that they see a lot of connection resets with wsgiref that go away on the production server. There's a bug there, but it is going to be hard to find it.
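That retry loop might look roughly like this sketch (the function name, logging, and URL handling are illustrative):

import socket
import time
import httplib2

def request_with_retries(url, attempts=10):
    http = httplib2.Http()
    for attempt in range(attempts):
        try:
            return http.request(url)  # (response, content) on success
        except socket.error as exc:
            # errno 104: connection reset by peer
            print('connection reset (%s), retrying...' % exc)
            time.sleep(1)  # resets seem to come in bunches
    raise RuntimeError('gave up after %d attempts' % attempts)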
Normally, you'd get an RST if you do a close which doesn't linger (i.e. in which data can be discarded by the stack if it hasn't been sent and ACK'd) and a normal FIN if you allow the close to linger (i.e. the close waits for the data in transit to be ACK'd).
Perhaps all you need to do is set your socket to linger, so that you remove the race condition between a non-lingering close on the socket and the ACKs arriving?
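Setting SO_LINGER in Python looks like this (the 10-second linger value is arbitrary):

import socket
import struct

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# l_onoff=1, l_linger=10: close() now waits up to 10 seconds for
# in-flight data to be ACKed instead of discarding it with an RST
sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                struct.pack('ii', 1, 10))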
I had the same issue, however, with an upload of a very large file from a python-requests client posting to an nginx + uWSGI backend.
What ended up being the cause was that the backend had a cap on the maximum upload file size that was lower than what the client was trying to send.
The error never showed up in our uWSGI logs, since this limit was actually one imposed by nginx.
Upping the limit in nginx removed the error.
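For reference, that cap in nginx is client_max_body_size (it defaults to 1 MB); the value here is illustrative:

client_max_body_size 100m;   # in the http, server, or location block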
