nginx + uwsgi 502 Bad Gateway python

I'm running a Python script that takes a long time to process. If the function takes too long to run, I guess nginx has a timeout in its configuration that is meant to prevent some kind of errors, and it stops the function from running to completion.
I just want to know where I can increase the value of the timeout. I've tried some directives in the nginx conf file, such as:
uwsgi_connect_timeout 75;
uwsgi_send_timeout 75;
uwsgi_read_timeout 75;
keepalive_timeout 650;
but none of these worked.
Thanks in advance.

The problem with just extending the timeout is that no matter how long you set it, you will run into a limit somewhere along the line: the web server, the browser, or your geocode calls. If something routinely fails n times in a request, you can't really make any guarantees.
So rather than having the client request hang on a long-running process (and by extension risk a server timeout), why don't you use something like celery to run those geocode tasks? On the client side, submit the request via JavaScript and poll the server via AJAX until it gets a response.
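A minimal sketch of that submit-then-poll pattern, using only the standard library. ThreadPoolExecutor stands in for Celery here, and the names submit_job, poll_job and slow_geocode are hypothetical; in a real deployment the two functions would sit behind HTTP endpoints that the JavaScript client calls.

```python
import time
import uuid
from concurrent.futures import ThreadPoolExecutor

# Stand-in for a Celery worker pool (assumption: real code would use Celery tasks).
_executor = ThreadPoolExecutor(max_workers=2)
_jobs = {}  # job id -> Future


def slow_geocode(address):
    """Hypothetical long-running task; the sleep simulates slow work."""
    time.sleep(0.2)
    return {"address": address, "status": "done"}


def submit_job(address):
    """Start the task in the background and return a job id immediately."""
    job_id = uuid.uuid4().hex
    _jobs[job_id] = _executor.submit(slow_geocode, address)
    return job_id


def poll_job(job_id):
    """Report the job state without blocking; the client calls this repeatedly."""
    future = _jobs[job_id]
    if not future.done():
        return {"state": "pending"}
    return {"state": "finished", "result": future.result()}
```

The request that starts the work returns right away with a job id, so no single HTTP request has to outlive the nginx or uWSGI timeouts.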

I also had a Bad Gateway error with an nginx + uWSGI configuration. For the sake of people who google this question: the cause might be a missing uWSGI Python plugin. Please see: uWSGI configuration issue: uwsgi fails without any error message.

I tried everything written in the above response as well as other places but they did not work.
My solution was changing my socket in both the uwsgi.conf and nginx.conf files.

Related

Socket.io POST Requests from Socket.IO-Client-Swift

I am running socket.io on an Apache server through Python Flask. We're integrating it into an iOS app (using the Socket.IO-Client-Swift library) and we're having a weird issue.
From the client side code in the app (written in Swift), I can view the actual connection log (client-side in XCode) and see the connection established from the client's IP and the requests being made. The client never receives the information back (or any information back; even when using a global event response handler) from the socket server.
I wrote a very simple test script in JavaScript on an HTML page, sent requests that way, and received the proper responses back. With that said, it seems likely to be an issue with iOS. I've found these articles (but none of them helped fix the problem):
https://github.com/nuclearace/Socket.IO-Client-Swift/issues/95
https://github.com/socketio/socket.io-client-swift/issues/359
My next thought is to extend the logging of socket.io to find out exactly what data is being POSTed to the socket namespace. Is there a way to log exactly what data is coming into the server? (Bear in mind that the 'on' hook I've set up on the server side is not getting any data; I've tried to log it from there but it doesn't appear to even get that far.)
I found mod_dumpio for Linux to log all POST requests but I'm not sure how well it will play with multi-threading and a socket server.
Any ideas on how to get the exact data being posted so we can at least troubleshoot the syntax and make sure the data isn't being malformed when it's sent to the server?
Thanks!
Update
When testing locally, we got it working (it was a setting in the Swift code where the namespace wasn't being declared properly). This works fine now on localhost but we are having the exact same issues when emitting to the Apache server.
We are not using mod_wsgi (as far as I know; I'm relatively new to mod_wsgi, apologies for any ignorance). We used to have a .wsgi file that called the main app script to run but we had to change that because mod_wsgi is not compatible with Flask SocketIO (as stated in the uWSGI Web Server section here). The way I am running the script now is by using supervisord to run the .py file as a daemon (using that specifically so it will autostart in the event of a server crash).
Locally, it worked great once we installed the eventlet module through pip. When I ran pip freeze on my virtual environment on the server, eventlet was installed. I uninstalled and reinstalled it just to see if that cleared anything up and that did nothing. No other Python modules that are on my local copy seem to be something that would affect this.
One other thing to keep in mind is that in the function that initializes the app, we change the port to port 80:
socketio.run(app,host='0.0.0.0',port=80)
because we have other API functions that run through a domain that is pointing to the server in this app. I'm not sure if that would affect anything but it doesn't seem to matter on the local version.
I'm at a dead end again and am trying to find anything that could help. Thanks for your assistance!
Another Update
I'm not exactly sure what was happening yet, but we went ahead and rewrote some of the code, paying special attention to the namespace declarations within each socket event 'on' function. It's working fine now. As I get more details, I will post them here, as I figure this will be useful for others who have the same problem. This thread also has some really valuable information on how to go about debugging/logging these types of issues, although we never fully figured out the answer to the original question.

I assume you have verified that Apache does get the POST requests. That should be your first test, if Apache does not log the POST requests coming from iOS, then you have a different kind of problem.
If you do get the POST requests, then you can add some custom code in the middleware used by Flask-SocketIO and print the request data forwarded by Apache's mod_wsgi. This is in the file flask_socketio/__init__.py. The relevant portion is this:
class _SocketIOMiddleware(socketio.Middleware):
    # ...
    def __call__(self, environ, start_response):
        # log what you need from environ here
        environ['flask.app'] = self.flask_app
        return super(_SocketIOMiddleware, self).__call__(environ, start_response)
You can find out what's in environ in the WSGI specification. In particular, the body of the request is available in environ['wsgi.input'], which is a file-like object you read from.
Keep in mind that once you read the payload, this file will be consumed, so the WSGI server will not be able to read from it again. Seeking the file back to the position it was before the read may work on some WSGI implementations. A safer hack I've seen people do to avoid this problem is to read the whole payload into a buffer, then replace environ['wsgi.input'] with a brand new StringIO or BytesIO object.
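The buffer-and-replace trick could look roughly like the sketch below. BodyLoggingMiddleware is a hypothetical name, not part of Flask-SocketIO, and the sketch assumes the request carries a CONTENT_LENGTH header (chunked bodies without one are treated as empty).

```python
import io


class BodyLoggingMiddleware:
    """Hypothetical WSGI middleware that logs the request body, then restores it."""

    def __init__(self, app, log=print):
        self.app = app
        self.log = log

    def __call__(self, environ, start_response):
        # CONTENT_LENGTH may be missing or empty; treat that as "no body".
        try:
            length = int(environ.get("CONTENT_LENGTH") or 0)
        except ValueError:
            length = 0
        body = environ["wsgi.input"].read(length) if length > 0 else b""
        self.log("%s %s body=%r" % (
            environ.get("REQUEST_METHOD"), environ.get("PATH_INFO"), body))
        # Replace the consumed stream so the wrapped app can read it again.
        environ["wsgi.input"] = io.BytesIO(body)
        return self.app(environ, start_response)
```

Wrapping the WSGI callable in this middleware leaves the application's view of the request unchanged, because it reads from the fresh BytesIO object rather than the already consumed stream.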

Are you using flask-socketio on the server side? If you are, there is a lot of debugging available in the constructor.
socketio = SocketIO(app, async_mode=async_mode, logger=True, engineio_logger=True)

flask application timeout with amazon load balancer

I'm trying to use a Flask application behind an Amazon Load Balancer and the Flask threads keep timing out. It appears that the load balancer is sending a Connection: keep-alive header and this is causing the Flask process to never return (or to take a long time). With gunicorn in front, the processes are killed and new ones started. We also tried using uWSGI and simply exposing the Flask app directly (no wrapper). All result in the Flask process just not responding.
I see nothing in the Flask docs which would make it ignore this header. I'm at a loss as to what else I can do with Flask to fix the problem.
Curl and direct connections to the machine work fine, only those via the load balancer are causing the problem. The load balancer itself doesn't appear to be doing anything wrong and we use it successfully with several other stacks.

The solution I have now is using gunicorn as a wrapper around the flask application. For the worker_class I am using eventlet with several workers. This combination seems stable and responsive. Gunicorn is also configured for HTTPS.
I assume it is a defect in Flask that causes the problem and this is an effective workaround.

Did you remember to set session.permanent = True and app.permanent_session_lifetime?

The easiest way to force all connections to close is to make sure you are using HTTP/1.0 and not adding the header Connection: Keep-Alive to the response.
Please check out werkzeug.http.remove_hop_by_hop_headers().
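To illustrate the idea without depending on Werkzeug, here is a rough sketch that uses the standard library's wsgiref.util.is_hop_by_hop to drop hop-by-hop response headers such as Connection and Keep-Alive. The wrapper name strip_hop_by_hop is hypothetical, and this is not how Werkzeug itself implements its helper.

```python
from wsgiref.util import is_hop_by_hop


def strip_hop_by_hop(app):
    """Wrap a WSGI app so hop-by-hop response headers (e.g. Connection) are dropped."""
    def wrapped(environ, start_response):
        def filtered_start_response(status, headers, exc_info=None):
            # Keep only end-to-end headers; header names are matched case-insensitively.
            kept = [(name, value) for name, value in headers
                    if not is_hop_by_hop(name)]
            if exc_info is not None:
                return start_response(status, kept, exc_info)
            return start_response(status, kept)
        return app(environ, filtered_start_response)
    return wrapped
```

With a Flask application object named app (an assumption), this could be applied as app.wsgi_app = strip_hop_by_hop(app.wsgi_app).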

Do you need an HTTP load balancer? Using a layer 4 balancer might just as well solve your problem, because it does not interfere with higher protocol levels.

Python -- ConnectionError: Max retries exceeded

I occasionally get this error when my server (call it Server A) makes requests to a resource on another one of my servers (call it Server B):
ConnectionError: HTTPConnectionPool(host='some_ip', port=some_port): Max retries exceeded with url: /some_url/ (Caused by : [Errno 111] Connection refused)
The message in the exception is
message : None: Max retries exceeded with url: /some_url/ (Caused by redirect)
which I include because it has that extra piece of information (caused by redirect).
As I said, I control both servers involved in this request, so I can make changes to either and/or both. Also, the error appears to be intermittent, in that it doesn't happen every time.
Potentially relevant information -- Server A is a Python server running apache, and Server B is a NodeJS server. I am not exactly a web server wizard, so beyond that, I'm not exactly sure what information would be relevant.
Does anyone know exactly what this error means, or how to go about investigating a fix? Or, does anyone know which server is likely to be the problem, the one making the request, or the one receiving it?
Edit: The error has begun happening with our calls to external web resources also.

You are getting a Connection Refused on "some_ip" and port. That's likely caused by one of the following:
- No server is actually listening on that port/IP combination
- Firewall settings that send Connection Refused (a less likely cause)
- A misconfigured (more likely) or busy server that cannot handle requests
I believe you are getting that error when Server A tries to connect to Server B. Assuming it's Linux or some Unix derivative, what does netstat -ln -tcp show on the server? (See man netstat to understand the flags; what we are doing here is finding out which programs are listening on which ports.) If that does show your Server B listening, run iptables -L -n to show the firewall rules. If nothing is wrong there, the most probable cause is a badly configured listen queue (see http://www.linuxjournal.com/files/linuxjournal.com/linuxjournal/articles/023/2333/2333s2.html or search for "listen backlog").
This is most likely a configuration issue on your Server B. (Note: a redirect loop that is not handled correctly, as mentioned in another answer, could end up making the server busy, so solving that could solve your problem as well.)
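If you would rather check from Server A itself, a small probe like the sketch below tells you whether anything accepts TCP connections on a given host and port. The function name port_is_open is hypothetical; it only tests reachability and says nothing about why a connection is refused.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError (errno 111 on Linux) and timeouts.
        return False
```

Running it in a loop while the intermittent error occurs can show whether Server B stops accepting connections at those moments.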

If you're using gevent on your python server, you might need to upgrade the version. It looks like there's just some bug with gevent's DNS resolution.
This is a discussion from the requests library: https://github.com/kennethreitz/requests/issues/1202#issuecomment-13881265

This looks like a redirect loop on the Node side.
You mention that Server B is the Node server. You can accidentally create a redirect loop if you set up the routes incorrectly. For example, if you are using express on Server B, you might have two routes, and assuming you keep your route logic in a separate module:
var routes = require(__dirname + '/routes/router')(app);
//... express setup stuff like app.use & app.configure
app.post('/apicall1', routes.apicall1);
app.post('/apicall2', routes.apicall2);
Then your routes/router.js might look like:
module.exports = Routes;
function Routes(app){
    var self = this;
    if (!(self instanceof Routes)) return new Routes(app);
    //... do stuff with app if you like
}
Routes.prototype.apicall1 = function(req, res){
    res.redirect('/apicall2');
}
Routes.prototype.apicall2 = function(req, res){
    res.redirect('/apicall1');
}
That example is obvious, but you might have a redirect loop hidden in a bunch of conditions in some of those routes. I'd start with the edge cases, like what happens at the end of the conditionals within the routes in question, what is the default behavior if the call for example doesn't have the right parameters and what is the exception behavior?
As an aside, you can use something like node-validator (https://github.com/chriso/node-validator) to help determine and handle incorrect request or post parameters:
// Inside routes/router.js:
var check = require('validator').check;
function Routes(app){ /* setup stuff */ }
Routes.prototype.apicall1 = function(req, res){
    try{
        check(req.params.csrftoken, 'Invalid CSRF').len(6,255);
        // Handle it here, invoke appropriate business logic or model,
        // or redirect, but be careful! res.redirect('/secure/apicall2');
    }catch(e){
        // Here you could log the error, but don't accidentally create a redirect loop;
        // send an appropriate response instead
        res.send(401);
    }
}
To help determine whether it is a redirect loop, you can do one of several things. You can use curl to hit the URL with the same post parameters (assuming it is a POST; otherwise you can just use Chrome, which will error out in the console if it notices a redirect loop), or you can write to stdout or syslog on the Node server from inside the offending route(s).
Hope that helps. It's a good thing you mentioned the "caused by redirect" part; I think that is the problem.
The example situation above uses express to describe the situation, but of course the problem can exist using just connect, other frameworks, or even your own handler code as well if you aren't using any frameworks or libraries at all. Either way, I'd make it a habit to put in good parameter checking and always test your edge cases, I've run myself into this problem exactly when I've been in a hurry in the past.
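Since Server A is the Python side, you could also follow the redirects by hand from there and look for a repeat. The sketch below keeps the HTTP part out of the picture: fetch is a hypothetical callable that takes a URL and returns (status_code, location_or_None), and in practice it would wrap whatever HTTP client you use with automatic redirects turned off. It compares Location values as plain strings, so relative and absolute forms of the same URL are not treated as equal.

```python
def find_redirect_loop(start_url, fetch, max_hops=20):
    """Follow redirects by hand.

    Returns the looping chain if a URL repeats, None if the chain ends in a
    non-redirect response, or the chain seen so far if max_hops is exhausted.
    """
    seen = []
    url = start_url
    for _ in range(max_hops):
        if url in seen:
            # Return the cycle, starting at the first repeated URL.
            return seen[seen.index(url):] + [url]
        seen.append(url)
        status, location = fetch(url)
        if status not in (301, 302, 303, 307, 308) or not location:
            return None  # chain ended in a normal response
        url = location
    return seen  # too many hops without a repeat: report the chain for inspection
```

For the two express routes in the example, this would report the chain /apicall1, /apicall2, /apicall1.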

Twisted server crashes unexpectedly while running django

I am running a django application on twisted using the django-on-twisted scripts from this site.
All requests are served by an nginx server which reverse proxies relevant requests to twisted. I have a url setup for an API, which basically just receives get requests and does some processing on the get parameters before sending a response. However, when a specific client is hitting the api, the twisted server just shuts down. Pasted below is the Nginx log:
the.ip.of.client - - [21/Apr/2012:11:30:36 -0400] "GET /api/url/?get=params&more=params HTTP/1.1" 499 0 "-" "Java/1.6.0_24"
The twisted logs show nothing, but twisted stops working at this point. From the error code 499, I am assuming that the client closed the connection unexpectedly, which I have no problem with. Whether the client receives the response or not is not important to me. Here is the relevant django view:
def api_url(request):
    if request.GET:
        get_param = request.GET.get('get', [''])[0]
        more_param = request.GET.get('more', [''])[0]
        #some processing here based on the get params
        return HttpResponse('OK')
    else:
        raise Http404
The request from the client is a valid request and does not affect the processing in an adverse way. I have tested it from the shell. When I tried it on the django development server, it crashed in the same way too without leaving any traces of receiving the request. Everything works perfectly well when testing it from the browser. Also, the twisted server works well for all the regular use cases. This is the first time I am facing an issue with it. Any help or pointers will be appreciated.

There is no 499 HTTP status code in the RFCs; nginx defines the 499 code itself.
When a client sends a request and closes the connection without waiting for the response, nginx logs a 499. If there are a lot of 499s in your access_log, it's mostly caused by slow back ends (too slow for your users to wait). You may have to optimize your website performance.
http://forum.nginx.org/read.php?2,213789,213794#msg-213794

You say the problem comes from a client hitting a particular URL (is it reproducible?).
Since it works for you with gunicorn but not django-on-twisted, either the script is not working properly or twisted.web2 is the issue.
Please try $ sh init.sh yourdjangoproject stand.
You can also try modifying run.py to catch SystemExit:
import pdb

try:
    # __main__ stuff here.
    pass
except (KeyboardInterrupt, SystemExit):
    # Drop into the debugger instead of exiting silently.
    pdb.set_trace()

104, 'Connection reset by peer' socket error, or When does closing a socket result in a RST rather than FIN?

We're developing a Python web service and a client web site in parallel. When we make an HTTP request from the client to the service, one call consistently raises a socket.error in socket.py, in read:
(104, 'Connection reset by peer')
When I listen in with wireshark, the "good" and "bad" responses look very similar:
- Because of the size of the OAuth header, the request is split into two packets. The service responds to both with ACK.
- The service sends the response, one packet per header (HTTP/1.0 200 OK, then the Date header, etc.). The client responds to each with ACK.
- (Good request) The server sends a FIN, ACK. The client responds with a FIN, ACK. The server responds with ACK.
- (Bad request) The server sends a RST, ACK. The client doesn't send a TCP response, and the socket.error is raised on the client side.
Both the web service and the client are running on a Gentoo Linux x86-64 box running glibc-2.6.1. We're using Python 2.5.2 inside the same virtual_env.
The client is a Django 1.0.2 app that is calling httplib2 0.4.0 to make requests. We're signing requests with the OAuth signing algorithm, with the OAuth token always set to an empty string.
The service is running Werkzeug 0.3.1, which is using Python's wsgiref.simple_server. I ran the WSGI app through wsgiref.validator with no issues.
It seems like this should be easy to debug, but when I trace through a good request on the service side, it looks just like the bad request, in the socket._socketobject.close() function, turning delegate methods into dummy methods. When the send or sendto (can't remember which) method is switched off, the FIN or RST is sent, and the client starts processing.
"Connection reset by peer" seems to place blame on the service, but I don't trust httplib2 either. Can the client be at fault?
** Further debugging - Looks like server on Linux **
I have a MacBook, so I tried running the service on one and the client website on the other. The Linux client calls the OS X server without the bug (FIN ACK). The OS X client calls the Linux service with the bug (RST ACK, and a (54, 'Connection reset by peer')). So, it looks like it's the service running on Linux. Is it x86_64? A bad glibc? wsgiref? Still looking...
** Further testing - wsgiref looks flaky **
We've gone to production with Apache and mod_wsgi, and the connection resets have gone away. See my answer below, but my advice is to log the connection reset and retry. This will let your server run OK in development mode, and solidly in production.

I've had this problem. See The Python "Connection Reset By Peer" Problem.
You have (most likely) run afoul of small timing issues based on the Python Global Interpreter Lock.
You can (sometimes) correct this with a time.sleep(0.01) placed strategically.
"Where?" you ask. Beats me. The idea is to provide some better thread concurrency in and around the client requests. Try putting it just before you make the request so that the GIL is reset and the Python interpreter can clear out any pending threads.

Don't use wsgiref for production. Use Apache and mod_wsgi, or something else.
We continue to see these connection resets, sometimes frequently, with wsgiref (the backend used by the werkzeug test server, and possibly others like the Django test server). Our solution was to log the error, retry the call in a loop, and give up after ten failures. httplib2 tries twice, but we needed a few more. They seem to come in bunches as well - adding a 1 second sleep might clear the issue.
We've never seen a connection reset when running through Apache and mod_wsgi. I don't know what they do differently (maybe they just mask the resets), but they don't appear.
When we asked the local dev community for help, someone confirmed that they see a lot of connection resets with wsgiref that go away on the production server. There's a bug there, but it is going to be hard to find it.
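The log-and-retry workaround described above can be sketched as follows. The helper name call_with_retries is hypothetical, and the sleep parameter is injectable only so the wait can be skipped in tests; the defaults mirror the ten attempts and one second pause mentioned in the answer.

```python
import logging
import socket
import time

log = logging.getLogger(__name__)


def call_with_retries(func, attempts=10, delay=1.0, sleep=time.sleep):
    """Call func(); on a socket error, log it, wait, and retry up to `attempts` times."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except socket.error as exc:  # alias of OSError on Python 3
            log.warning("attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt == attempts:
                raise  # give up after the last failure
            sleep(delay)
```

The callable passed in would be whatever makes the HTTP request, for example a small function wrapping the httplib2 call.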

Normally, you'd get an RST if you do a close which doesn't linger (i.e. in which data can be discarded by the stack if it hasn't been sent and ACK'd) and a normal FIN if you allow the close to linger (i.e. the close waits for the data in transit to be ACK'd).
Perhaps all you need to do is set your socket to linger so that you remove the race condition between a non lingering close done on the socket and the ACKs arriving?
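In Python, turning on linger for a socket could look like the sketch below. The helper name enable_linger is hypothetical, and the "ii" struct layout (two C ints for l_onoff and l_linger) is an assumption that holds on Linux and most POSIX systems but not on Windows. Whether this removes the reset in the wsgiref case is untested here.

```python
import socket
import struct


def enable_linger(sock, seconds=5):
    """Make close() wait up to `seconds` for unsent data to be delivered."""
    # struct linger is two C ints: l_onoff (enable flag) and l_linger (timeout).
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                    struct.pack("ii", 1, seconds))
    # Read the option back so the caller can confirm what the OS accepted.
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                          struct.calcsize("ii"))
    return struct.unpack("ii", raw)
```

Note that enabling linger with a timeout of 0 has the opposite effect: close() then discards unsent data and resets the connection.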

I had the same issue when uploading a very large file with a python-requests client posting to an nginx+uwsgi backend.
The cause turned out to be that the backend had a cap on the maximum upload file size that was lower than what the client was trying to send.
The error never showed up in our uwsgi logs since this limit was actually one imposed by nginx.
Upping the limit in nginx removed the error.
