PyAPNS SSL3_WRITE_PENDING error - python

I'm using PyAPNS module and Bottle framework in demo of my app to send push notifications to all registered devices.
At the beginning everything works fine; I've followed the manual for PyAPNS. But after my service has been running in the background on the server for some time, I start to receive this error:
SSLError: [Errno 1] _ssl.c:1217: error:1409F07F:SSL routines:SSL3_WRITE_PENDING:bad write retry
After restarting the service everything works fine again. What should I do about that? Or how should I run such a service in the background? (For now I'm just running it in another screen session.)

I had the same issue as you did when using this library. (I'm assuming you are in fact using https://github.com/simonwhitaker/PyAPNs, which is what I'm using; there is at least one other lib out there with a similar name, but I don't think you'd be using that.)
AFAIK, when you're using the simple notification service the APNS server might hang up on you for reasons including an incorrect token or a malformed request. Your connection might also get broken if your network drops out. The PyAPNS code doesn't handle such a hangup very gracefully right now: it attempts to re-use the socket even after it has been closed. My experience with the SSL3_WRITE_PENDING error was that I would always see an error such as "error: [Errno 110] Connection timed out" on the socket first, and then get the SSL3_WRITE_PENDING error when PyAPNS tried to re-use the socket.
If you are seeing the server hang up on you and you want to know why, it helps to use the enhanced version of APNS, so that the server will write back info about what you did wrong.
As it happens, there is currently a pull request (https://github.com/simonwhitaker/PyAPNs/pull/23/files) that both moves PyAPNS to use enhanced APNS AND handles disconnections more gracefully. You'll see I commented on that pull request and have created my own fork of PyAPNS that handles disconnections in the way that suited my use case the best.
So you can use the code from the pull request to find out why the APNS server is hanging up on you. And/or you could use it to simplify your failure recovery, so you just retry the send if an exception is thrown rather than having to re-create the APNS object.
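Until that lands, here is a minimal retry sketch along those lines, assuming the stock PyAPNs API (APNs, Payload, gateway_server.send_notification); the certificate paths, token and retry count are placeholders:

import socket
import ssl
import time

from apns import APNs, Payload

def send_with_retry(token_hex, payload, retries=3):
    for attempt in range(retries):
        # Re-create the connection on each attempt; PyAPNS will otherwise
        # try to re-use a socket the APNS server has already closed.
        apns = APNs(use_sandbox=True, cert_file='cert.pem', key_file='key.pem')
        try:
            apns.gateway_server.send_notification(token_hex, payload)
            return
        except (ssl.SSLError, socket.error):
            time.sleep(1)  # brief pause before reconnecting
    raise RuntimeError('giving up after %d attempts' % retries)

send_with_retry('deadbeef' * 8, Payload(alert='Hello', sound='default', badge=1))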
Hopefully the pull request will be merged to master soon (possibly including my changes as well).

Related

Broken pipe error and connection reset by peer 104

I'm using the Bottle server to implement my own server, with an implementation not far from the simple "hello world" here; my own implementation is (without the routing section, of course):
bottleApp = bottle.app()
bottleApp.run(host='0.0.0.0', port=80, debug=True)
My server keeps becoming unresponsive, and then I get in the browser: connection reset by peer, broken pipe (errno 32).
My own logs show almost exactly the same stack traces as in that question.
What I tried so far, without success:
Wrapping the server run line with try/except, along the lines of mhawke's answer shown here.
This stopped the error messages in the logs, apparently because I caught them in the except clause. But the problem is that catching the exception that way means we have been thrown out of the run method's context, and I want to handle it in a way that will not cause my server to fall (a minimal sketch of this approach appears after this list).
I don't know if that's possible without touching Bottle's internal implementation files.
Adding this before server run line:
# Restore the OS default SIGPIPE handling instead of Python's
# (which surfaces broken pipes as exceptions):
from signal import signal, SIGPIPE, SIG_DFL
signal(SIGPIPE, SIG_DFL)
As suggested here, but it seems that it didn't have any impact on the broken pipe / connection reset errors or on server responsiveness.
I thought of trying the second answer here as well, but I have no idea where to put that code in the context of the Bottle server.
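For reference, a minimal sketch of the restart-the-server idea from the first attempt above; the exception types and the print are placeholders, since I don't know exactly which errors escape run():

import bottle

bottleApp = bottle.app()

while True:
    try:
        # Re-enter the server loop after a crash so one bad request
        # doesn't take the whole process down.
        bottleApp.run(host='0.0.0.0', port=80, debug=True)
    except Exception as exc:  # placeholder; narrow to the errors you actually see
        print('server loop died: %s, restarting' % exc)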
This sounds like a permissions issue or a firewall.
If you really need to listen on port 80, then you need to run with a privileged account. You will probably also need to open port 80 for TCP traffic.
I can see you're using something that appears to be POSIX (Linux/Unix/OS X). If you post which OS you are using I can edit this answer to be more specific about how to open the firewall and execute privileged commands (probably sudo, but who knows).

Recover from dropped connection in redis pub/sub

I am running a client that connects to a Redis DB. The client is on a WiFi connection and will drop the connection at times. Unfortunately, when this happens, the program just keeps running without throwing any type of warning.
import redis

r = redis.StrictRedis(host=XX, password=YY...)
ps = r.pubsub()
ps.subscribe("12345")
for items in ps.listen():
    if items['type'] == 'message':
        data = items['data']
Ideally, what I am looking for is to catch an event when the connection is lost, try to reestablish the connection, do some error correcting, and then get things back up and running. Should this be done in the Python program? Should I have an external watchdog?
Unfortunately, one has to ping Redis to check whether it is available. If you try to put a value into Redis storage, it will raise a ConnectionError if the connection is lost, but the listen() generator will not terminate automatically when the connection is lost.
I think that hacking into the Redis connection pool could help; give it a try.
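For example, here is a minimal reconnect sketch, assuming redis-py raises redis.exceptions.ConnectionError once the broken link is finally noticed; the host, channel name and handle() callback are placeholders:

import time

import redis

def listen_forever(channel="12345"):
    while True:
        try:
            r = redis.StrictRedis(host='localhost')
            r.ping()  # raises ConnectionError if the server is unreachable
            ps = r.pubsub()
            ps.subscribe(channel)
            for item in ps.listen():
                if item['type'] == 'message':
                    handle(item['data'])  # hypothetical message handler
        except redis.exceptions.ConnectionError:
            time.sleep(1)  # back off, then rebuild the connection and resubscribe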
P.S. It is very insecure to connect to Redis in an untrusted network environment.
This is an old, old question, but I linked one of my own questions to it and happened to run across it again. It turned out there was a bug in the redis library that caused the client to enter an infinite loop attempting to reconnect if it lost the connection to the redis server. I debugged the issue and PR'd the change; it was merged a long time ago now. Once it surfaced, the maintainer also knew of a second location that had the same issue.
This problem shouldn't occur anymore.
To fully answer the question: I can't remember which error it is, given the time since I fixed this, but there is now a specific error returned that you can catch and reconnect on.

Python RabbitMQ and kombu: heartbeats

I have a Twisted application (Python 2.7) that uses the kombu module to communicate with a RabbitMQ message server.
We're having problems with connections being closed (probably firewall related) and I'm trying to implement the heartbeat_check() method to handle this. I've got a heartbeat value of 10 set on the connection, and I've got a Twisted LoopingCall that calls heartbeat_check(rate=2) every 5 seconds.
However, once things get rolling I get an exception thrown on every other call to heartbeat_check() (based on the logging I've got in the function that LoopingCall invokes, which includes the call to heartbeat_check()). I've tried all kinds of variations of the heartbeat and rate values and nothing seems to help. When I debug into heartbeat_check() it looks like some minor message traffic is being sent back and forth; am I supposed to respond to that in my message consumer?
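For reference, a minimal sketch of the setup I'm describing, with a placeholder broker URL and the real reconnect logic elided:

from kombu import Connection
from twisted.internet import reactor
from twisted.internet.task import LoopingCall

conn = Connection('amqp://guest:guest@localhost//', heartbeat=10)
conn.connect()

def check():
    try:
        # Sends/validates AMQP heartbeat frames; raises if the broker
        # has gone silent for too long.
        conn.heartbeat_check(rate=2)
    except Exception:
        pass  # reconnect logic would go here

LoopingCall(check).start(5.0)
reactor.run()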
Any hints or pointers would be very helpful, thanks in advance!!
Doug

Why does my web server sometimes only transmit part of the response?

I have a piece of custom built web server code. It was written using the evnet module.
It seems to cut the message short when it is requested from a remote client, but when I use it on the same machine it delivers the full message. I can't figure out what the problem could be or how to diagnose it. I tested it using a web browser, curl and nc; it never delivered the full-length message when requesting from remote clients.
Here's a simplified version of my webserver that still exhibits the problem. I am doing this on Ubuntu 11.04 with Python 2.7.1
You are closing the socket immediately after calling send() to send a bunch of data. If there is data still buffered, it will be thrown away when you close the socket.
Instead, you should call shutdown(SHUT_WR) on the socket to tell the remote end that you are finished sending. This is called a TCP "half-close". In response, the other end will close its side, and you will get a notification that the socket is no longer readable. Then, and only then, should you close the socket handle.
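Here's a minimal sketch of that sequence on a plain socket (shown with the standard socket module rather than evnet, whose API I won't guess at):

import socket

def finish_response(sock, data):
    sock.sendall(data)              # sendall, unlike send, pushes every byte
    sock.shutdown(socket.SHUT_WR)   # TCP half-close: "no more data from me"
    while sock.recv(4096):          # drain until the peer closes its side
        pass
    sock.close()                    # now it is safe to release the handle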

104, 'Connection reset by peer' socket error, or When does closing a socket result in a RST rather than FIN?

We're developing a Python web service and a client web site in parallel. When we make an HTTP request from the client to the service, one call consistently raises a socket.error in socket.py, in read:
(104, 'Connection reset by peer')
When I listen in with wireshark, the "good" and "bad" responses look very similar:
1. Because of the size of the OAuth header, the request is split into two packets. The service responds to both with an ACK.
2. The service sends the response, one packet per header (HTTP/1.0 200 OK, then the Date header, etc.). The client responds to each with an ACK.
3. (Good request) The server sends a FIN, ACK. The client responds with a FIN, ACK. The server responds with an ACK.
4. (Bad request) The server sends a RST, ACK; the client sends no TCP response; the socket.error is raised on the client side.
Both the web service and the client are running on a Gentoo Linux x86-64 box running glibc-2.6.1. We're using Python 2.5.2 inside the same virtual_env.
The client is a Django 1.0.2 app that is calling httplib2 0.4.0 to make requests. We're signing requests with the OAuth signing algorithm, with the OAuth token always set to an empty string.
The service is running Werkzeug 0.3.1, which is using Python's wsgiref.simple_server. I ran the WSGI app through wsgiref.validator with no issues.
It seems like this should be easy to debug, but when I trace through a good request on the service side, it looks just like the bad request: in the socket._socketobject.close() function, delegate methods are turned into dummy methods. When the send or sendto (can't remember which) method is switched off, the FIN or RST is sent and the client starts processing.
"Connection reset by peer" seems to place blame on the service, but I don't trust httplib2 either. Can the client be at fault?
** Further debugging - looks like it's the server on Linux **
I have a MacBook, so I tried running the service on one and the client website on the other. The Linux client calls the OS X server without the bug (FIN ACK). The OS X client calls the Linux service with the bug (RST ACK, and a (54, 'Connection reset by peer')). So, it looks like it's the service running on Linux. Is it x86_64? A bad glibc? wsgiref? Still looking...
** Further testing - wsgiref looks flaky **
We've gone to production with Apache and mod_wsgi, and the connection resets have gone away. See my answer below, but my advice is to log the connection reset and retry. This will let your server run OK in development mode, and solidly in production.
I've had this problem. See The Python "Connection Reset By Peer" Problem.
You have (most likely) run afoul of small timing issues based on the Python Global Interpreter Lock.
You can (sometimes) correct this with a time.sleep(0.01) placed strategically.
"Where?" you ask. Beats me. The idea is to provide some better thread concurrency in and around the client requests. Try putting it just before you make the request so that the GIL is reset and the Python interpreter can clear out any pending threads.
Don't use wsgiref for production. Use Apache and mod_wsgi, or something else.
We continue to see these connection resets, sometimes frequently, with wsgiref (the backend used by the werkzeug test server, and possibly others like the Django test server). Our solution was to log the error, retry the call in a loop, and give up after ten failures. httplib2 tries twice, but we needed a few more. They seem to come in bunches as well - adding a 1 second sleep might clear the issue.
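A minimal sketch of that log-and-retry loop, assuming httplib2; the ten-attempt cap and one-second sleep are just what worked for us:

import logging
import socket
import time

import httplib2

def request_with_retry(url, attempts=10):
    h = httplib2.Http()
    for attempt in range(attempts):
        try:
            return h.request(url)  # returns (response, content)
        except socket.error as exc:
            logging.warning('connection reset (%s), attempt %d', exc, attempt + 1)
            time.sleep(1)  # resets seem to come in bunches; back off briefly
    raise RuntimeError('giving up after %d attempts' % attempts)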
We've never seen a connection reset when running through Apache and mod_wsgi. I don't know what they do differently, (maybe they just mask them), but they don't appear.
When we asked the local dev community for help, someone confirmed that they see a lot of connection resets with wsgiref that go away on the production server. There's a bug there, but it is going to be hard to find it.
Normally, you'd get an RST if you do a close which doesn't linger (i.e. in which data can be discarded by the stack if it hasn't been sent and ACK'd) and a normal FIN if you allow the close to linger (i.e. the close waits for the data in transit to be ACK'd).
Perhaps all you need to do is set your socket to linger, so that you remove the race condition between a non-lingering close done on the socket and the ACKs arriving?
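For example, a minimal sketch with the standard socket module; the five-second linger timeout is an arbitrary choice:

import socket
import struct

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# struct linger { int l_onoff; int l_linger; }: enable lingering, and
# block close() for up to 5 seconds while unsent data drains.
sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack('ii', 1, 5))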
I had the same issue, though with an upload of a very large file from a python-requests client POSTing to an nginx + uwsgi backend.
What ended up being the cause was that the backend had a cap on the max file size for uploads lower than what the client was trying to send.
The error never showed up in our uwsgi logs, since this limit was actually one imposed by nginx.
Upping the limit in nginx removed the error.
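For anyone hitting the same thing: the nginx directive involved is client_max_body_size, which defaults to 1m; a sketch of raising it (the 100m value is arbitrary):

server {
    # Requests with bodies larger than this are rejected by nginx itself,
    # so they never reach uwsgi or its logs.
    client_max_body_size 100m;
}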
