I have an application that has been in production for a couple of years on Google App Engine (GAE). At a given entry point of my application I get data from a 3rd party API using requests, as such:
@app.route('/example')
def example():
    response = requests.get(url, params=PARAMS, headers=HEADERS)
Suddenly (2 days ago), this piece of code stopped working. Basically, what happens is that the request hangs, and after 30 seconds the worker dies with the error message:
upstream prematurely closed connection while reading response header from upstream, client...
Nonetheless, the API responds immediately when I execute the code on my local computer. The exact same behavior occurs when using curl in a shell.
Any idea what might be causing this, or how to debug it? Could this be related to some DNS problem? I don't even know whether the problem lies with the API or whether GAE is somehow blocking the request.
Thanks in advance!
I have some theories:
(1) Rate Limiting -- If you are doing this API call a lot, they may be blocking your IP address.
(2) 1st Gen GAE headers -- 1st gen GAE sends special headers with every request. This makes it obvious that a computer is initiating the request and not a person. It is possible that the third party blocks requests with these headers.
(3) Blocking Cloud IPs -- Google publishes a full list of its IP addresses and the third party may be blocking all of them.
Given that it works on your own computer with curl, I suspect (1) is the answer. If they were doing (2) or (3), they would likely also block curl.
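One quick way to narrow this down is to set an explicit timeout on the request instead of letting the GAE worker hang for the full 30 seconds, and to distinguish a hang from an explicit rejection. A minimal sketch (the helper name and classification are mine, not from the original code):

```python
import requests

def fetch_or_classify(url, timeout=5):
    """Fail fast and report *why* the call failed, instead of letting
    the worker hang until it is killed."""
    try:
        resp = requests.get(url, timeout=timeout)
        resp.raise_for_status()
        return "ok", resp
    except requests.exceptions.Timeout:
        # A hang from GAE only (but not locally) points at the third
        # party silently dropping traffic from cloud IP ranges.
        return "timeout", None
    except requests.exceptions.HTTPError as exc:
        # An explicit 403 or 429 would confirm blocking or rate limiting.
        return "http_error", exc.response
```

If you get a clean `Timeout` from GAE while the same URL answers instantly locally, that strongly suggests the third party is filtering by source IP rather than by headers.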
Related
I initiate a request client-side, then I change my mind and call xhr.abort().
How does Django react to this? Does it terminate the thread somehow? If not, how do I get Django to stop wasting time trying to respond to the aborted request? How do I handle it gracefully?
Due to how HTTP works, and the fact that you usually have a frontend in front of your Django/gunicorn app processes (nginx, uWSGI, etc.), your HTTP cancel request is buffered by nginx. The gunicorn workers don't get a signal; they just finish processing and then write their output to the HTTP socket. If that socket has been closed, this raises an error (which is caught as a closed connection, and the worker moves on).
So it's easy to DoS a server if you can find a way to spawn many of these requests.
But to answer your question it depends on the backend, with gunicorn it will keep going until the timeout.
Just think of the Web as a platform for building easy-to-use, distributed, loosely coupled systems, with no guarantee about the availability of resources, as the 404 status code suggests.
I think that creating tightly coupled solutions such as your idea goes against web principles and the usage of REST. xhr.abort() is client-side programming; it's completely different from server-side. It's bad practice to try to tie client-side technology to server-side internal behavior.
Not only is this a waste of resources, but there is also no guarantee about the processing status of the request on the web server. It may lead to data inconsistency, too.
If your request generates no server-side effects for which the client can be held responsible, it is better just to ignore it, since these kinds of requests do not change server state and the response is usually cached for better performance.
If your request could cause changes in server state or data, then for the sake of data consistency you can check whether the changes have taken effect using an API. If they have, try to roll them back using another API.
The real question is whether Google App Engine guarantees it will complete an HTTP request even if the connection no longer exists (e.g., it was terminated, or the Internet connection was lost).
Say we have a Python script running on Google App Engine:
db.put(status = "Outputting")
print very_very_very_long_string_like_1GB
db.put(status = "done")
If the client decides to close the connection in the middle (too much data coming...), will status = "done" be executed? Or will the instance be killed and all following code be ignored?
If the client breaks the connection, the request will continue to execute, unless it reaches the deadline of 60 seconds.
GAE uses a pending queue to queue up requests. If the client drops the connection while the request is already in the queue or being executed, it will not be aborted. AFAIK all other HTTP servers behave the same way.
This will be a real problem when you make requests that change state (PUT, POST, DELETE) on mobile networks. On Edge networks we see about 1% of large requests (uploads, ~500 KB) dropped in the middle of request execution (execution takes about 1 s): e.g. the server gets the data and processes it, but the client does not receive the response, triggering it to retry. This can produce duplicate data in the DB, breaking the integrity of that data.
To alleviate this you will need to make your web methods idempotent: repeating the same method with the same arguments does not change state. The easiest way to achieve this would be one of:
Hash relevant data and compare it to existing hashes. In your case it would be the string you are trying to save (very_very_very_long_string_like_1GB). You can do this server-side.
The client provides a unique request-scoped ID, and the server checks whether this ID has already been used.
I need to get json data and I'm using urllib2:
request = urllib2.Request(url)
request.add_header('Accept-Encoding', 'gzip')
opener = urllib2.build_opener()
connection = opener.open(request)
data = connection.read()
but although the data isn't that big, it is too slow.
Is there a way to speed it up? I can use 3rd party libraries too.
Accept-Encoding: gzip means that the client is ready to accept gzip-encoded content if the server chooses to send it. The rest of the request goes down the socket, through your operating system's TCP/IP stack, and out over the physical layer.
If the server supports ETags, then you can send an If-None-Match header to check that the content has not changed, and rely on your cache. An example is given here.
You cannot do much with clients only to improve your HTTP request speed.
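A conditional GET with If-None-Match might look like the sketch below, using the stdlib's urllib.request rather than the older urllib2; it only helps if the server actually emits ETags:

```python
import urllib.error
import urllib.request

def fetch_if_changed(url, etag=None):
    """Conditional GET: returns (etag, body); body is None when the
    server answers 304 Not Modified, meaning the cached copy is valid."""
    req = urllib.request.Request(url)
    if etag:
        req.add_header("If-None-Match", etag)
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.headers.get("ETag"), resp.read()
    except urllib.error.HTTPError as exc:
        if exc.code == 304:  # content unchanged: skip the re-download
            return etag, None
        raise
```

On a 304 you pay only for the headers, not the body, which is where the real speedup comes from.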
You're dependent on a number of different things here that may not be within your control:
1. Latency/bandwidth of your connection
2. Latency/bandwidth of the server's connection
3. Load of the server application and its individual processes
Items 2 and 3 are probably where the problem lies, and you won't be able to do much about it. Is the content cacheable? This will depend on your own application needs and the HTTP headers (e.g. ETags, Cache-Control, Last-Modified) that are returned from the server. The server may only update every day, in which case you might be better off only requesting data every hour.
There is unlikely to be an issue with urllib. If you have network issues and performance problems, consider using tools like Wireshark to investigate at the network level. I have very strong doubts that this is related to Python in any way.
If you are making lots of requests, look into threading. Having about 10 workers making requests can speed things up - you don't grind to a halt if one of them takes too long getting a connection.
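That can be as simple as a thread pool; fetch() below is a stand-in for the real download call:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    # Stand-in for the real download, e.g. urllib.request.urlopen(url).read().
    # Returning the URL's length keeps the sketch self-contained.
    return len(url)

urls = ["http://example.com/a", "http://example.com/bb", "http://example.com/ccc"]

# About 10 workers: one slow connection no longer blocks the others,
# and pool.map still yields results in the original input order.
with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(fetch, urls))
```

The pool keeps slow requests from serializing the whole batch while preserving the order of results.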
I am coding a Python (2.6) interface to a web service. I need to communicate via HTTP so that:
Cookies are handled automatically,
The requests are asynchronous,
The order in which the requests are sent is respected (the order in which the responses to these requests are received does not matter).
I have tried what could easily be derived from the built-in libraries, running into different problems:
Using httplib and urllib2, the requests are synchronous unless I use threads, in which case the order is not guaranteed to be respected,
Using asyncore, there was no library to automatically deal with cookies sent by the web service.
After some googling, it seems that there are many examples of Python scripts or libraries that match 2 out of the 3 criteria, but not all 3 of them. I am thinking of reading through the cookielib sources and adapting what I need of it to asyncore (or only to my application in an ad hoc manner), but it seems strange that nothing like this exists yet, as I guess I am not the only one interested. If anyone knows of pointers on this problem, it would be greatly appreciated.
Thank you.
Edit to clarify :
What I am doing is a local proxy that interfaces my IRC client with a webchat. It creates a socket that listens for IRC connections, then upon receiving one, it logs in to the webchat via HTTP. I don't have access to the behavior of the webchat, and it uses cookies for session IDs. When the client sends several IRC requests to my Python proxy, I have to forward them to the webchat's server via HTTP and with cookies. I also want to do this asynchronously (I don't want to wait for the HTTP response before I send the next request), and currently what happens is that the order in which the HTTP requests are sent is not the order in which the IRC commands were received.
I hope this clarifies the question, and I will of course detail more if it doesn't.
Using httplib and urllib2, the requests are synchronous unless I use thread, in which case the order is not guaranteed to be respected
How would you know that the order has been respected unless you get your response back from the first connection before you send the request on the second connection? After all, you don't care what order the responses come in, so it's very possible that the responses come back in the order you expect but that your requests were processed in the wrong order!
The only way you can guarantee the ordering is by waiting for confirmation that the first request has successfully arrived (e.g. you start receiving the response to it) before beginning the second request. You can do this by not launching the second thread until you reach the response-handling part of the first thread.
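One way to implement that with plain threads is to chain events, so that each request may only go on the wire after the previous one has been sent. Here send_fn is a hypothetical function that returns once the request has been written to the socket (response handling can continue asynchronously):

```python
import threading

def send_in_order(requests_to_send, send_fn):
    """Send requests from concurrent threads, but guarantee wire order:
    request i+1 is only started once request i has been handed to
    send_fn (which returns after the request has gone out)."""
    first = threading.Event()
    first.set()  # the first request may go immediately
    prev = first
    results = [None] * len(requests_to_send)

    def worker(i, req, wait_for, done):
        wait_for.wait()            # previous request is on the wire
        results[i] = send_fn(req)  # send ours
        done.set()                 # unblock the next request

    threads = []
    for i, req in enumerate(requests_to_send):
        done = threading.Event()
        threads.append(threading.Thread(target=worker, args=(i, req, prev, done)))
        prev = done
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Each thread is free to keep processing its own response after setting its event, so the responses can still arrive and be handled in any order.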
We're developing a Python web service and a client web site in parallel. When we make an HTTP request from the client to the service, one call consistently raises a socket.error in socket.py, in read:
(104, 'Connection reset by peer')
When I listen in with wireshark, the "good" and "bad" responses look very similar:
Because of the size of the OAuth header, the request is split into two packets. The service responds to both with ACK
The service sends the response, one packet per header (HTTP/1.0 200 OK, then the Date header, etc.). The client responds to each with ACK.
(Good request) the server sends a FIN, ACK. The client responds with a FIN, ACK. The server responds ACK.
(Bad request) the server sends a RST, ACK, the client doesn't send a TCP response, the socket.error is raised on the client side.
Both the web service and the client are running on a Gentoo Linux x86-64 box running glibc-2.6.1. We're using Python 2.5.2 inside the same virtualenv.
The client is a Django 1.0.2 app that is calling httplib2 0.4.0 to make requests. We're signing requests with the OAuth signing algorithm, with the OAuth token always set to an empty string.
The service is running Werkzeug 0.3.1, which is using Python's wsgiref.simple_server. I ran the WSGI app through wsgiref.validator with no issues.
It seems like this should be easy to debug, but when I trace through a good request on the service side, it looks just like the bad request, in the socket._socketobject.close() function, turning delegate methods into dummy methods. When the send or sendto (can't remember which) method is switched off, the FIN or RST is sent, and the client starts processing.
"Connection reset by peer" seems to place blame on the service, but I don't trust httplib2 either. Can the client be at fault?
** Further debugging - Looks like server on Linux **
I have a MacBook, so I tried running the service on one and the client website on the other. The Linux client calls the OS X server without the bug (FIN ACK). The OS X client calls the Linux service with the bug (RST ACK, and a (54, 'Connection reset by peer')). So, it looks like it's the service running on Linux. Is it x86_64? A bad glibc? wsgiref? Still looking...
** Further testing - wsgiref looks flaky **
We've gone to production with Apache and mod_wsgi, and the connection resets have gone away. See my answer below, but my advice is to log the connection reset and retry. This will let your server run OK in development mode, and solidly in production.
I've had this problem. See The Python "Connection Reset By Peer" Problem.
You have (most likely) run afoul of small timing issues based on the Python Global Interpreter Lock.
You can (sometimes) correct this with a time.sleep(0.01) placed strategically.
"Where?" you ask. Beats me. The idea is to provide some better thread concurrency in and around the client requests. Try putting it just before you make the request so that the GIL is reset and the Python interpreter can clear out any pending threads.
Don't use wsgiref for production. Use Apache and mod_wsgi, or something else.
We continue to see these connection resets, sometimes frequently, with wsgiref (the backend used by the werkzeug test server, and possibly others like the Django test server). Our solution was to log the error, retry the call in a loop, and give up after ten failures. httplib2 tries twice, but we needed a few more. They seem to come in bunches as well - adding a 1 second sleep might clear the issue.
We've never seen a connection reset when running through Apache and mod_wsgi. I don't know what they do differently, (maybe they just mask them), but they don't appear.
When we asked the local dev community for help, someone confirmed that they see a lot of connection resets with wsgiref that go away on the production server. There's a bug there, but it is going to be hard to find it.
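The retry-and-give-up workaround we settled on looks roughly like this (the attempt count and sleep are tunable; socket.error is an alias of OSError on modern Python):

```python
import socket
import time

def call_with_retry(fn, max_attempts=10, backoff=1.0):
    """Retry a call that sporadically fails with 'Connection reset by
    peer'. Sleeps between tries, since the resets seem to come in
    bunches; re-raises after max_attempts failures."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except socket.error:
            if attempt == max_attempts:
                raise
            time.sleep(backoff)
```

Note this is only safe for calls that are idempotent (or made idempotent), since the server may have processed a request whose response was reset.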
Normally, you'd get an RST if you do a close which doesn't linger (i.e. in which data can be discarded by the stack if it hasn't been sent and ACK'd) and a normal FIN if you allow the close to linger (i.e. the close waits for the data in transit to be ACK'd).
Perhaps all you need to do is set your socket to linger so that you remove the race condition between a non lingering close done on the socket and the ACKs arriving?
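Setting SO_LINGER on the socket, so that close() blocks until pending data has been sent and ACKed, would look like the sketch below. Whether it actually removes the RST depends on where in the stack the reset originates:

```python
import socket
import struct

def enable_linger(sock, timeout_secs=5):
    """Make close() wait up to timeout_secs for unsent data to be sent
    and ACKed, instead of discarding it (which can provoke an RST)."""
    # The option value is a C `struct linger { int l_onoff; int l_linger; }`.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                    struct.pack("ii", 1, timeout_secs))

server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_linger(server_sock, 5)
server_sock.close()
```

With wsgiref you would need to reach the underlying socket yourself, which is another argument for just moving to a production server.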
I had the same issue, although with an upload of a very large file using a python-requests client posting to an nginx + uWSGI backend.
What ended up being the cause was that the backend had a cap on the max file size for uploads lower than what the client was trying to send.
The error never showed up in our uwsgi logs since this limit was actually one imposed by nginx.
Upping the limit in nginx removed the error.
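In nginx's case the relevant knob is client_max_body_size, which defaults to 1 MB; requests with a larger body are rejected by nginx itself before they ever reach the backend, which is why nothing showed up in the uwsgi logs:

```nginx
# In the http, server, or location context of nginx.conf.
# The default is 1m; bodies larger than this get a 413 from nginx
# directly, so the failure never appears in the backend's logs.
client_max_body_size 20m;
```

Pick a value comfortably above your largest legitimate upload.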