Assuming that I can verify that a bunch of POST requests are in fact logically independent, how can I set up HTTP pipelining using python-requests and force it to allow POST requests in the pipeline?
Does someone have a working example?
P.S. for extra points, how to handle errors for outstanding requests if pipeline suddenly breaks?
P.P.S. grequests is not an option in this case.
Pipelining requests can be done with the builtin httplib, but only by accessing the connection and response objects below their public interface. This snippet demonstrates.
Edit: updated version for Python3: https://github.com/urllib3/urllib3/issues/52#issuecomment-109756116
The requests library does not support HTTP pipelining.
You can approximate pipelining by using grequests which makes it easier to run many requests in parallel, but each parallel request would still use a new TCP connection.
(requests does pool connections, keeping the TCP connection open if the remote server permits this, but that only helps for sequential connections, and request and response still have to alternate).
Related
As you may know HTTP/1.1 can allow you leave the socket open between HTTP requests leveraging the famous Keep-Alive connection. But, what less people exploit is the feature of just launch a burst of multiple sequential HTTP/1.1 requests without wait for the response in the middle time, Then the responses should return to you the same order paying the latency time just one time. (This consumption pattern is encouraged in Redis clients for example).
I know this pattern has been improved in HTTP/2 with the multiplexing feature but my concern right now is if I can use that pipelining pattern with the tornado library exploiting its async features, or may be other library capable?
No, Tornado does not support HTTP/1.1 pipelining. It won't start serving the second request until the response to the first request has been written.
I have just begun to look at tornado and asynchronous web servers. In many examples for tornado, longer requests are handled by something like:
make a call to tornado webserver
tornado makes async web call to an api
let tornado keep taking requests while callback waits to be called
handle response in callback. server to user.
So for hypothetical purposes say users are making a request to tornado server at /retrive. /retrieve will make a request to an internal api myapi.com/retrieve_posts_for_user_id/ or w/e. the api request could take a second to run while getting requests, then when it finally returns tornado servers up the response. First of all is this flow the 'normal' way to use tornado? Many of the code examples online would suggest so.
Secondly, (this is where my mind is starting to get boggled) assuming that the above flow is the standard flow, should myapi.com be asyncronous? If its not async and the requests can take seconds apiece wouldn't it create the same bottleneck a blocking server would? Perhaps an example of a normal use case for tornado or any async would help to shed some light on this issue? Thank you.
Yes, as I understand your question, that is a normal use-case for Tornado.
If all requests to your Tornado server would make requests to myapi.com, and myapi.com is blocking, then yes, myapi.com would still be the bottleneck. However, if only some requests have to be handled by myapi.com, then Tornado would still be a win, as it can keep handling such requests while waiting for responses for the requests to myapi.com. But regardless, if myapi.com can't handle the load, then putting a Tornado server in front of it won't magically fix that. The difference is that your Tornado server will still be able to respond to requests even when myapi.com is busy.
I need to get json data and I'm using urllib2:
request = urllib2.Request(url)
request.add_header('Accept-Encoding', 'gzip')
opener = urllib2.build_opener()
connection = opener.open(request)
data = connection.read()
but although the data aren't so big it is too slow.
Is there a way to speed it up? I can use 3rd party libraries too.
Accept-Encoding:gzip means that the client is ready to gzip Encoded content if the Server is ready to send it first. The rest of the request goes down the sockets and to over your Operating Systems TCP/IP stack and then to physical layer.
If the Server supports ETags, then you can send a If-None-Match header to ensure that content has not changed and rely on the cache. An example is given here.
You cannot do much with clients only to improve your HTTP request speed.
You're dependant on a number of different things here that may not be within your control:
Latency/Bandwidth of your connection
Latency/Bandwidth of server connection
Load of server application and its individual processes
Items 2 and 3 are probably where the problem lies and you won't be able to do much about it. Is the content cache-able? This will depend on your own application needs and HTTP headers (e.g. ETags, Cache-Control, Last-Modified) that are returned from the server. The server may only up date every day in which case you might be better off only requesting data every hour.
There is unlikely an issue with urllib. If you have network issues and performance problems: consider using tools like Wireshark to investigate on the network level. I have very strong doubts that this is related to Python in any way.
If you are making lots of requests, look into threading. Having about 10 workers making requests can speed things up - you don't grind to a halt if one of them takes too long getting a connection.
The title may be a bit vague, but here's my goal: I have a frontend webserver which takes incoming HTTP requests, does some preprocessing on them, and then passes the requests off to my real webserver to get the HTTP response, which is then passed back to the client.
Currently, my frontend is built off of BaseHTTPServer.HTTPServer and the backend is CherryPy.
So the question is: Is there a way to take these HTTP requests / client connections and insert them into a CherryPy server to get the HTTP response? One obvious solution is to run an instance of the CherryPy backend on a local port or using UNIX domain sockets, and then the frontend webserver establishes a connection with the backend and relays any requests/responses. Obviously, this isn't ideal due to the overhead.
What I'd really like is for the CherryPy backend to not bind to any port, but just sit there waiting for the frontend to pass the client's socket (as well as the modified HTTP Request info), at which point it does its normal CherryPy magic and returns the request directly to the client.
I've been perusing the CherryPy source to find some way to accomplish this, and currently am attempting to modify wsgiserver.CherryPyWSGIServer, but it's getting pretty hairy and is probably not the best approach.
Is your main app a wsgi application? If so, you could write some middleware that wraps around it and does all the request wrangling before passing on to the main application.
If this this is possible it would avoid you having to run two webservers and all the problems you are encountering.
Answered the Upgrade question at Handling HTTP/1.1 Upgrade requests in CherryPy. Not sure if that addresses this one or not.
I am coding a python (2.6) interface to a web service. I need to communicate via http so that :
Cookies are handled automatically,
The requests are asynchronous,
The order in which the requests are sent is respected (the order in which the responses to these requests are received does not matter).
I have tried what could be easily derived from the build-in libraries, facing different problems :
Using httplib and urllib2, the requests are synchronous unless I use thread, in which case the order is not guaranteed to be respected,
Using asyncore, there was no library to automatically deal with cookies send by the web service.
After some googling, it seems that there are many examples of python scripts or libraries that match 2 out of the 3 criteria, but not the 3 of them. I am thinking of reading through the cookielib sources and adapting what I need of it to asyncore (or only to my application in a ad hoc manner), but it seems strange that nothing like this exists yet, as I guess I am not the only one interested. If anyone knows of pointers about this problem, it would be greatly appreciated.
Thank you.
Edit to clarify :
What I am doing is a local proxy that interfaces my IRC client with a webchat. It creates a socket that listens to IRC connections, then upon receiving one, it logs in the webchat via http. I don't have access to the behaviour of the webchat, and it uses cookies for session IDs. When client sends several IRC requests to my python proxy, I have to forward them to the webchat's server via http and with cookies. I also want to do this asynchronously (I don't want to wait for the http response before I send the next request), and currently what happens is that the order in which the http requests are sent is not the order in which the IRC commands were received.
I hope this clarifies the question, and I will of course detail more if it doesn't.
Using httplib and urllib2, the
requests are synchronous unless I use
thread, in which case the order is not
guaranteed to be respected
How would you know that the order has been respected unless you get your response back from the first connection before you send the response to the second connection? After all, you don't care what order the responses come in, so it's very possible that the responses come back in the order you expect but that your requests were processed in the wrong order!
The only way you can guarantee the ordering is by waiting for confirmation that the first request has successfully arrived (eg. you start receiving the response for it) before beginning the second request. You can do this by not launching the second thread until you reach the response handling part of the first thread.