Python -- ConnectionError: Max retries exceeded - python

I occasionally get this error when my server (call it Server A) makes requests to a resource on another one of my servers (call it Server B):
ConnectionError: HTTPConnectionPool(host='some_ip', port=some_port): Max retries exceeded with url: /some_url/ (Caused by : [Errno 111] Connection refused)
The message in the exception is
message : None: Max retries exceeded with url: /some_url/ (Caused by redirect)
which I include because it has that extra piece of information (caused by redirect).
As I said, I control both servers involved in this request, so I can make changes to either and/or both. Also, the error appears to be intermittent, in that it doesn't happen every time.
Potentially relevant information -- Server A is a Python server running apache, and Server B is a NodeJS server. I am not exactly a web server wizard, so beyond that, I'm not exactly sure what information would be relevant.
Does anyone know exactly what this error means, or how to go about investigating a fix? Or, does anyone know which server is likely to be the problem, the one making the request, or the one receiving it?
Edit: The error has begun happening with our calls to external web resources also.

You are getting a Connection refused error on "some_ip" and port. That is likely caused by one of the following:
- No server is actually listening on that port/IP combination.
- Firewall settings reject the connection with Connection refused (a less likely cause).
- The server is misconfigured (more likely) or too busy to handle requests.
I believe you get that error when Server A tries to connect to Server B. Assuming Server B runs Linux or some other Unix derivative, what does netstat -ln -tcp show on that server? (See man netstat for the flags; the goal is to find out which programs are listening on which ports.) If that does show Server B listening, run iptables -L -n to show the firewall rules. If nothing is wrong there, the most probable cause is a badly configured listen queue (see http://www.linuxjournal.com/files/linuxjournal.com/linuxjournal/articles/023/2333/2333s2.html, or search for "listen backlog").
This is most likely a configuration issue on Server B. (Note: a redirect loop, as someone mentioned above, could end up keeping the server busy if it is not handled correctly, so fixing that could solve your problem as well.)
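To check the "is anything listening?" case from Server A itself, a small probe can help. This is only a sketch using the standard library; port_is_open is a made-up helper name, and the host and port you pass in are whatever Server B actually uses.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError (Errno 111) as well as timeouts
        # and unreachable hosts.
        return False
```

Calling it in a loop while the error is occurring should show whether the refusals are constant (nothing listening, or a firewall) or intermittent (more consistent with a busy server or a full listen queue).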

If you're using gevent on your Python server, you might need to upgrade it. It looks like there's a bug in gevent's DNS resolution.
See this discussion in the requests issue tracker: https://github.com/kennethreitz/requests/issues/1202#issuecomment-13881265

This looks like a redirect loop on the Node side.
You mention that Server B is the Node server. You can accidentally create a redirect loop if you set up the routes incorrectly. For example, if you are using express on Server B, you might have two routes like these (assuming you keep your route logic in a separate module):
var routes = require(__dirname + '/routes/router')(app);
//... express setup stuff like app.use & app.configure
app.post('/apicall1', routes.apicall1);
app.post('/apicall2', routes.apicall2);
Then your routes/router.js might look like:
module.exports = Routes;
function Routes(app){
  var self = this;
  if (!(self instanceof Routes)) return new Routes(app);
  //... do stuff with app if you like
}
Routes.prototype.apicall1 = function(req, res){
  res.redirect('/apicall2');
}
Routes.prototype.apicall2 = function(req, res){
  res.redirect('/apicall1');
}
That example is obvious, but you might have a redirect loop hidden in a bunch of conditions in some of those routes. I'd start with the edge cases: what happens at the end of the conditionals within the routes in question? What is the default behavior if the call doesn't have the right parameters, for example? And what is the exception behavior?
As an aside, you can use something like node-validator (https://github.com/chriso/node-validator) to help detect and handle incorrect request or POST parameters:
// Inside routes/router.js:
var check = require('validator').check;
function Routes(app){ /* setup stuff */ }
Routes.prototype.apicall1 = function(req, res){
  try{
    check(req.params.csrftoken, 'Invalid CSRF').len(6,255);
    // Handle it here, invoke appropriate business logic or model,
    // or redirect, but be careful! res.redirect('/secure/apicall2');
  }catch(e){
    // Here you could log the error, but don't accidentally create a redirect loop;
    // send an appropriate response instead
    res.send(401);
  }
}
To help determine whether it is a redirect loop, you can do one of several things. You can use curl to hit the URL with the same POST parameters (assuming it is a POST; otherwise you can just use Chrome, which will report an error in the console if it notices a redirect loop). Or you can write to stdout or syslog from inside the offending route(s) on the Node server.
Hope that helps. Good thing you mentioned the "caused by redirect" part; I think that is the problem.
The example above uses express, but of course the problem can also exist when using just connect, other frameworks, or your own handler code if you aren't using any frameworks or libraries at all. Either way, I'd make it a habit to put in good parameter checking and always test your edge cases. I've run into exactly this problem myself when I was in a hurry.
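If you would rather check for a loop from the Python side (Server A), something like the following could work. It is a rough sketch, not what requests does internally: find_redirect_loop is a hypothetical helper, it only handles plain http:// URLs, and it sends no request body, so a POST route that needs parameters would need extra work.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)


def find_redirect_loop(url, method="GET", max_hops=10):
    """Follow redirects by hand.

    Returns the chain of URLs (ending with the repeated one) if a loop
    is found, otherwise None.
    """
    seen = []
    for _ in range(max_hops):
        if url in seen:
            return seen + [url]
        seen.append(url)
        parts = urlsplit(url)
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn = http.client.HTTPConnection(parts.hostname, parts.port or 80, timeout=5)
        try:
            conn.request(method, target)
            resp = conn.getresponse()
            resp.read()  # drain the body so the connection closes cleanly
            status = resp.status
            location = resp.getheader("Location")
        finally:
            conn.close()
        if status not in REDIRECT_CODES or not location:
            return None  # a normal response: no loop on this path
        url = urljoin(url, location)  # Location may be relative
    return None
```

With the two express routes above, starting at /apicall1 would return a chain of /apicall1, /apicall2, /apicall1, which tells you exactly which routes bounce off each other.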

Related

Ignore a request in Flask

I want to not answer a request handled by Flask. I don't want to return an error code, data, or any answer at all.
What I am trying to accomplish is this: there is an endpoint that takes sensor data and does not return any information. The clients POST the data to this endpoint, but they do not wait for an answer before shutting down (I have no control over the clients). So I'm seeing the following error: "[Errno 10053] An established connection was aborted by the software in your host machine". So I asked myself why I even respond to these requests.
I can think of two reasons to do something like this:
- You have a "friend" that you want to prevent from accessing your site, or
- You have the misguided notion that this will help prevent (D)DoS attacks.
Generally speaking, you can't actually "ignore a request totally" unless you know the IP address that the traffic is coming from; in that case you can instruct your OS, network card, router, switch, load balancer, or maybe even your ISP to filter out the traffic coming from that IP.
Otherwise, you're out of luck because of how the Internet works.
HTTP works over TCP*. Specifically, the client process looks something like this:
1. Translate the DNS name (e.g. google.com) to an IP address (e.g. 216.58.218.174).
2. Open a TCP connection to 216.58.218.174:80 (using Google for the example).
3. Send the HTTP header over to Google:
   GET / HTTP/1.1
4. Read the response.
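The client steps listed above can be sketched with a raw socket. This is only an illustration of the sequence, not how a real HTTP client should be written; fetch_raw is a hypothetical name, and the Host and Connection headers are added here because HTTP/1.1 servers generally expect them.

```python
import socket


def fetch_raw(host, port=80, path="/"):
    """Perform the client side of a plain HTTP GET by hand."""
    # Step 1: translate the name to an IP address.
    ip = socket.gethostbyname(host)
    # Step 2: open a TCP connection to that address.
    with socket.create_connection((ip, port), timeout=5) as sock:
        # Step 3: send the HTTP request header.
        request = (
            f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "Connection: close\r\n"
            "\r\n"
        )
        sock.sendall(request.encode("ascii"))
        # Step 4: read the response until the server closes the connection.
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)
```

By the time the server sees step 3, the TCP connection from step 2 already exists, which is why the server side cannot simply pretend the request never happened.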
Once that TCP/IP connection has been created to your server, at the very least you're going to have to terminate the connection.
There's really no good way to do this from within Python itself, and certainly not within Flask.
As you've updated your question, it turns out you really don't have to change anything: Flask is already handling the error behind the scenes. It may be routing the message to a specific logger, which you could configure if you really don't want to see the messages, but it's not really important.
The only thing you may want to do, if producing your response is expensive (like tying up the database with a query that takes several seconds), is look into streaming your response instead, which will fail much more cheaply.
*Mostly. Sure, you can do it over UDP, but you probably aren't.

Broken pipe error and connection reset by peer 104

I'm using Bottle to implement my own server, with an implementation not far from the simple "hello world" example here. My own implementation (without the routing section, of course) is:
import bottle
bottleApp = bottle.app()
bottleApp.run(host='0.0.0.0', port=80, debug=True)
My server keeps becoming unresponsive, and then I get this in the browser: Connection reset by peer, broken pipe errno 32
The logs give me almost exactly the same stack traces such as in question.
Here are my own logs:
What I have tried so far, without success:
Wrapping the server run line in try/except, as shown here in the answer by "mhawke".
This stopped the error messages in the logs, apparently because I caught them in the except clause. The problem is that catching the exception like that means we have already been thrown out of the run method, and I want to catch it in a way that does not bring my server down.
I don't know if that's possible without touching Bottle's internal implementation files.
Adding this before the server run line:
from signal import signal, SIGPIPE, SIG_DFL
signal(SIGPIPE,SIG_DFL)
As suggested here, but it didn't seem to have any impact on the Broken pipe/connection reset errors or on server responsiveness.
I also thought of trying the second answer here, but I have no idea where to put that code in the context of the Bottle server.
This sounds like a permissions issue or a firewall.
If you really need to listen on port 80, then you need to run with a privileged account. You will also probably need to open port 80 for TCP traffic.
I can see you're using something that appears to be POSIX (Linux/Unix/OS X). If you post which OS you are using, I can edit this answer to be more specific about how to open the firewall and execute privileged commands (probably sudo, but who knows).
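Separately from the port 80 point, the question's goal of surviving a client that disconnects mid-response can be illustrated at the socket level. This is a sketch only and not how Bottle is implemented; safe_send is a hypothetical helper, and the errno numbers in the comment are the Linux values.

```python
import socket


def safe_send(sock, data):
    """Send data; return False instead of raising if the peer has gone away."""
    try:
        sock.sendall(data)
        return True
    except (BrokenPipeError, ConnectionResetError):
        # errno 32 (EPIPE) or errno 104 (ECONNRESET): the client closed
        # its end, so drop this one connection and keep serving the others.
        return False
```

Note that this relies on Python's default of ignoring SIGPIPE, which is what turns a write to a closed connection into an exception. Setting signal(SIGPIPE, SIG_DFL), as tried in the question, restores the OS default, under which such a write terminates the process instead.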

nginx + uwsgi 502 Bad Gateway python

I'm running a Python script that takes a long time to process. If the function takes too long to run, I guess nginx has a timeout in its configuration that is meant to prevent some kind of error, and it prevents the function from running to completion.
I just want to know where I can increase the value of the timeout. I've tried some directives in the nginx conf file, such as:
uwsgi_connect_timeout 75;
uwsgi_send_timeout 75;
uwsgi_read_timeout 75;
keepalive_timeout 650;
but none of these worked.
Thanks in advance.
The problem with just extending the timeout is that no matter how long you set it, you will run into limitations somewhere along the line, whether with the web server, the browser, or your geocode calls. If it is something that routinely fails n times in a request, then you can't really make any guarantees.
So rather than having the client request hang on a long-running process (and, by extension, risking a server timeout), why not use something like celery to run those geocode tasks? On the client side, submit the request via JavaScript and poll the server via AJAX until it gets a response.
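The submit-then-poll pattern suggested above can be sketched with the standard library alone. To be clear, this is not celery's API: a thread pool stands in for the task queue, and submit_job and poll_job are hypothetical names for what the two HTTP endpoints would call.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor

# Stand-in for a real task queue such as celery (assumption for this sketch).
_executor = ThreadPoolExecutor(max_workers=4)
_jobs = {}  # job id -> Future


def submit_job(func, *args):
    """Start func in the background and return a job id immediately."""
    job_id = uuid.uuid4().hex
    _jobs[job_id] = _executor.submit(func, *args)
    return job_id


def poll_job(job_id):
    """Return a small status dict that a polling endpoint could serialize."""
    future = _jobs[job_id]
    if not future.done():
        return {"status": "pending"}
    if future.exception() is not None:
        return {"status": "failed", "error": str(future.exception())}
    return {"status": "done", "result": future.result()}
```

The first request returns the job id right away, so nginx never waits on the long-running work; the browser then polls with that id until the status is no longer "pending".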
I also had a Bad Gateway error with an nginx + uWSGI configuration. For the sake of people who google this question: the cause might be a missing uWSGI Python plugin. Please see: "uWSGI configuration issue: uwsgi fails without any error message".
I tried everything in the responses above, as well as suggestions from other places, but none of them worked.
My solution was changing my socket in both the uwsgi.conf and nginx.conf files.

django + orbited/stomp

I'm using a Django server together with an orbited/stomp server to write something like a chat. Assume that some users are connected to orbited. When one of them disconnects from orbited, how can I notify the rest? I've tried the following code (JavaScript on the client side; maybe this is already wrong and the server should do the push, right?):
function end()
{
stomp.send('user killed', '/channel');
}
together with
stomp.onclose = end;
but this doesn't work at all. Then I used
window.onbeforeunload = end;
but again there was no visible effect. I also replaced end() with a different function, which just does an AJAX POST to the Django server. But then stomp.onclose again does nothing, and window.onbeforeunload gives me a broken pipe.
So these were attempts to implement the "client leaves a message before quitting" idea, but they failed.
I'm not even sure whether I'm doing this right. Is there a way to notify orbited/stomp users that a user has left? All ideas would be appreciated.
EDIT: Maybe there's another way. I've read that it is possible to configure the orbited server to make an HTTP callback to the application with the user's key when someone's connection closes. Unfortunately, there was no explanation of how to do that. Does anyone know the answer?
It seems that orbited is not suited for this kind of thing (I talked with orbited's creator). I switched to hookbox and it works fine.

104, 'Connection reset by peer' socket error, or When does closing a socket result in a RST rather than FIN?

We're developing a Python web service and a client web site in parallel. When we make an HTTP request from the client to the service, one call consistently raises a socket.error in socket.py, in read:
(104, 'Connection reset by peer')
When I listen in with wireshark, the "good" and "bad" responses look very similar:
1. Because of the size of the OAuth header, the request is split into two packets. The service responds to both with ACK.
2. The service sends the response, one packet per header (HTTP/1.0 200 OK, then the Date header, etc.). The client responds to each with ACK.
3. (Good request) The server sends a FIN, ACK. The client responds with a FIN, ACK. The server responds with ACK.
4. (Bad request) The server sends a RST, ACK; the client doesn't send a TCP response, and the socket.error is raised on the client side.
Both the web service and the client are running on a Gentoo Linux x86-64 box running glibc-2.6.1. We're using Python 2.5.2 inside the same virtual_env.
The client is a Django 1.0.2 app that is calling httplib2 0.4.0 to make requests. We're signing requests with the OAuth signing algorithm, with the OAuth token always set to an empty string.
The service is running Werkzeug 0.3.1, which is using Python's wsgiref.simple_server. I ran the WSGI app through wsgiref.validator with no issues.
It seems like this should be easy to debug, but when I trace through a good request on the service side, it looks just like the bad request, in the socket._socketobject.close() function, turning delegate methods into dummy methods. When the send or sendto (can't remember which) method is switched off, the FIN or RST is sent, and the client starts processing.
"Connection reset by peer" seems to place blame on the service, but I don't trust httplib2 either. Can the client be at fault?
** Further debugging - Looks like server on Linux **
I have a MacBook, so I tried running the service on one and the client website on the other. The Linux client calls the OS X server without the bug (FIN ACK). The OS X client calls the Linux service with the bug (RST ACK, and a (54, 'Connection reset by peer')). So, it looks like it's the service running on Linux. Is it x86_64? A bad glibc? wsgiref? Still looking...
** Further testing - wsgiref looks flaky **
We've gone to production with Apache and mod_wsgi, and the connection resets have gone away. See my answer below, but my advice is to log the connection reset and retry. This will let your server run OK in development mode, and solidly in production.
I've had this problem. See The Python "Connection Reset By Peer" Problem.
You have (most likely) run afoul of small timing issues based on the Python Global Interpreter Lock.
You can (sometimes) correct this with a time.sleep(0.01) placed strategically.
"Where?" you ask. Beats me. The idea is to provide some better thread concurrency in and around the client requests. Try putting it just before you make the request so that the GIL is reset and the Python interpreter can clear out any pending threads.
Don't use wsgiref for production. Use Apache and mod_wsgi, or something else.
We continue to see these connection resets, sometimes frequently, with wsgiref (the backend used by the werkzeug test server, and possibly others like the Django test server). Our solution was to log the error, retry the call in a loop, and give up after ten failures. httplib2 tries twice, but we needed a few more. They seem to come in bunches as well - adding a 1 second sleep might clear the issue.
We've never seen a connection reset when running through Apache and mod_wsgi. I don't know what they do differently (maybe they just mask them), but the resets don't appear.
When we asked the local dev community for help, someone confirmed that they see a lot of connection resets with wsgiref that go away on the production server. There's a bug there, but it is going to be hard to find it.
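The log-and-retry approach described above (retry in a loop, give up after ten failures, sleep a second between attempts) might look roughly like this. It is a sketch written for current Python 3, where a reset surfaces as ConnectionResetError rather than the socket.error of Python 2.5; call_with_retries and its parameters are hypothetical names.

```python
import logging
import time


def call_with_retries(func, max_failures=10, delay=1.0,
                      retry_on=(ConnectionResetError,)):
    """Call func(); on a connection reset, log it, sleep, and try again."""
    for attempt in range(1, max_failures + 1):
        try:
            return func()
        except retry_on as exc:
            logging.warning("attempt %d/%d failed: %s",
                            attempt, max_failures, exc)
            if attempt == max_failures:
                raise  # give up after the last allowed failure
            time.sleep(delay)
```

Wrap only the single HTTP call in func, so that a retry repeats the request and nothing else; that matters if the request is not idempotent.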
Normally, you'd get an RST if you do a close which doesn't linger (i.e. in which data can be discarded by the stack if it hasn't been sent and ACK'd) and a normal FIN if you allow the close to linger (i.e. the close waits for the data in transit to be ACK'd).
Perhaps all you need to do is set your socket to linger so that you remove the race condition between a non lingering close done on the socket and the ACKs arriving?
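Setting a socket to linger, as suggested above, could be sketched like this. It assumes the Linux/Unix layout of struct linger (two C ints); Windows uses a different layout. enable_linger and get_linger are hypothetical helper names, and whether you can reach the underlying socket at all depends on the server framework.

```python
import socket
import struct


def enable_linger(sock, seconds=5):
    """Ask close() to wait up to `seconds` for unsent data to be delivered."""
    # struct linger { int l_onoff; int l_linger; } on Linux/Unix.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                    struct.pack("ii", 1, seconds))


def get_linger(sock):
    """Return (enabled, seconds) as currently set on the socket."""
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                          struct.calcsize("ii"))
    onoff, seconds = struct.unpack("ii", raw)
    return bool(onoff), seconds
```

Be careful with the timeout value: enabling linger with a timeout of 0 has the opposite effect and makes close() send an RST straight away.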
I had the same issue, but when uploading a very large file with a python-requests client posting to an nginx+uwsgi backend.
The cause turned out to be that the backend had a cap on the maximum upload size that was lower than what the client was trying to send.
The error never showed up in our uwsgi logs, since this limit was actually imposed by nginx.
Upping the limit in nginx removed the error.
