Why do long HTTP round-trip times stall my Tornado AsyncHTTPClient?

I'm using Tornado to send requests in rapid, periodic succession (every 0.1s or even 0.01s) to a server. For this, I'm using AsyncHTTPClient.fetch with a callback to handle the response.
Here's a simplified example to show what I mean:
from functools import partial
from tornado import gen, locks, httpclient
from datetime import timedelta, datetime

# usually many of these running on the same thread, maybe requesting the same server
@gen.coroutine
def send_request(url, interval):
    wakeup_condition = locks.Condition()
    # using this to allow requests to send immediately
    http_client = httpclient.AsyncHTTPClient(max_clients=1000)
    for i in range(300):
        req_time = datetime.now()
        current_callback = partial(handle_response, req_time)
        http_client.fetch(url, current_callback, method='GET')
        yield wakeup_condition.wait(timeout=timedelta(seconds=interval))

def handle_response(req_time, response):
    resp_time = datetime.now()
    write_to_log(req_time, resp_time, resp_time - req_time)  # opens the log and writes to it
When I tested it against a local server, it worked fine: the requests were sent on time and the round-trip time was minimal.
However, when I test it against a remote server with larger round-trip times (especially under higher request loads), the request timing drifts by multiple seconds: the wait between requests becomes much longer than the desired interval.
How come? I thought the async code wouldn't be affected by the round-trip time, since it isn't blocking while waiting for the response. Is there any known solution to this?

After some tinkering and tcpdumping, I've concluded that two things were really slowing down my coroutine. With these two corrected, stalling has gone down drastically and the timeout in yield wakeup_condition.wait(timeout=timedelta(seconds=interval)) is much better respected:
The computer I'm running on doesn't seem to be caching DNS, and for AsyncHTTPClient DNS resolution seems to be a blocking network call. As such, every coroutine sending requests has to wait for DNS to resolve. The Tornado docs say:
tornado.httpclient in the default configuration blocks on DNS
resolution but not on other network access (to mitigate this use
ThreadedResolver or a tornado.curl_httpclient with a
properly-configured build of libcurl).
...and in the AsyncHTTPClient docs:
To select curl_httpclient, call AsyncHTTPClient.configure at startup:
AsyncHTTPClient.configure("tornado.curl_httpclient.CurlAsyncHTTPClient")
I ended up implementing my own thread that resolves and caches DNS, however, and issuing requests directly to the IP address resolved the issue.
The URL I was using was HTTPS; changing to an HTTP URL improved performance. For my use case that's not always possible, but it's good to be able to isolate part of the issue.
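The resolve-and-cache approach described above could look roughly like the sketch below. It is not the poster's actual code: DnsCache, its ttl parameter, and rewrite are hypothetical names, and it is written for Python 3 using only the standard library. It caches socket.gethostbyname results and rewrites a URL to use the IP address, returning the original hostname so it can be sent as the Host header.

```python
import socket
import time
from urllib.parse import urlsplit, urlunsplit


class DnsCache:
    """Cache hostname -> IP lookups for ttl seconds (hypothetical helper)."""

    def __init__(self, ttl=60.0, resolve=socket.gethostbyname, clock=time.monotonic):
        self._ttl = ttl
        self._resolve = resolve  # blocking resolver, injectable for testing
        self._clock = clock
        self._entries = {}       # hostname -> (ip, expiry time)

    def lookup(self, host):
        """Return a cached IP, resolving (and blocking) only on a miss or expiry."""
        now = self._clock()
        entry = self._entries.get(host)
        if entry is None or entry[1] <= now:
            entry = (self._resolve(host), now + self._ttl)
            self._entries[host] = entry
        return entry[0]

    def rewrite(self, url):
        """Return (url_with_ip, original_hostname) for use as URL and Host header."""
        parts = urlsplit(url)
        ip = self.lookup(parts.hostname)
        netloc = ip if parts.port is None else "%s:%d" % (ip, parts.port)
        return urlunsplit(parts._replace(netloc=netloc)), parts.hostname
```

A cache miss still blocks, so one option is to call lookup from a background thread (or once at startup) so that the IOLoop only ever sees cache hits. Requesting an HTTPS URL by IP address will generally fail certificate hostname checks unless that is handled separately.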

Related

Mocking "sleep"

I am writing an application that would asynchronously trigger some events. The test looks like this: set everything up, sleep for some time, check that the event has triggered.
However because of that waiting the test takes quite a time to run - I'm waiting for about 10 seconds on every test. I feel that my tests are slow - there are other places where I can speed them up, but this sleeping seems the most obvious place to speed it up.
What would be the correct way to eliminate that sleep? Is there some way to cheat datetime or something like that?
The application is a tornado-based web app and async events are triggered with IOLoop, so I don't have a way to directly trigger it myself.
Edit: more details.
The test is a kind of integration test, where I am willing to mock the 3rd party code, but don't want to directly trigger my own code.
The test is to verify that a certain message is sent using websocket and is processed correctly in the browser. Message is sent after a certain timeout which is started at the moment the client connects to the websocket handler. The timeout value is taken as a difference between datetime.now() at the moment of connection and a value in database. The value is artificially set to be datetime.now() - 5 seconds before using selenium to request the page. Since loading the page requires some time and could be a bit random on different machines I don't think reducing the 5 seconds time gap would be wise. Loading the page after timeout will produce a different result (no websocket message should be sent).
So the problem is to somehow force Tornado's IOLoop to send the message at any moment after the websocket is connected: if that happened 0.5 seconds after setting the database value, there are 4.5 seconds left to wait, and I want to try to eliminate that delay.
Two obvious places to mock are IOLoop itself and datetime.now(). The question now is which one I should monkey-patch, and how.

If you want to mock sleep then you must not use it directly in your application's code. I would create a class method like System.sleep() and use this in your application. System.sleep() can then be mocked.
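A minimal sketch of that idea, assuming Python 3 and the standard unittest.mock module. System follows the name used above; wait_then_fire is a hypothetical stand-in for application code that sleeps before triggering an event.

```python
import time
from unittest import mock


class System:
    """Thin wrapper so that tests have a single place to patch."""

    @staticmethod
    def sleep(seconds):
        time.sleep(seconds)


def wait_then_fire(delay, fire):
    # Application code calls System.sleep, never time.sleep directly.
    System.sleep(delay)
    fire()


fired = []
# Inside the with block System.sleep is a mock, so no real time passes.
with mock.patch.object(System, "sleep") as fake_sleep:
    wait_then_fire(10, lambda: fired.append("event"))
```

The test can then check both that the event fired and that the code asked for the expected delay, without actually waiting for it.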

Use the built-in Tornado testing tools. Each test gets its own IOLoop, and you use self.stop and self.wait to get results from it, e.g. (from the docs):
client = AsyncHTTPClient(self.io_loop)
# call self.stop on fetch completion
client.fetch("http://www.tornadoweb.org/", self.stop)
response = self.wait()

Building an HTTP API for continuously running python process

TL;DR: I have a beautifully crafted, continuously running piece of Python code controlling and reading out a physics experiment. Now I want to add an HTTP API.
I have written a module which controls the hardware using USB. I can script several types of autonomously operating experiments, but I'd like to control my running experiment over the internet. I like the idea of an HTTP API, and have implemented a proof-of-concept using Flask's development server.
The experiment runs as a single process claiming the USB connection and periodically (every 16 ms) all data is read out. This process can write hardware settings and commands, and reads data and command responses.
I have a few problems choosing the 'correct' way to communicate with this process. It works if the HTTP server only has a single worker. Then, I can use Python's multiprocessing.Pipe for communication. Using more-or-less low-level sockets (or things like zeromq) should work, even for request/response, but I have to implement some sort of protocol: send {'cmd': 'set_voltage', 'value': 900} instead of calling hardware.set_voltage(900) (which I can use in the stand-alone scripts). I can use some sort of RPC, but as far as I know they all (SimpleXMLRPCServer, Pyro) use some sort of event loop for the 'server', in this case the process running the experiment, to process requests. But I can't have an event loop waiting for incoming requests; it should be reading out my hardware! I googled around quite a bit, but however I try to rephrase my question, I end up with Celery as the answer, which mostly fires off one job after another, but isn't really about communicating with a long-running process.
I'm confused. I can get this to work, but I fear I'll be reinventing a few wheels. I just want to launch my app in the terminal, open a web browser from anywhere, and monitor and control my experiment.
Update: The following code is a basic example of using the module:
from pysparc.muonlab.muonlab_ii import MuonlabII

muonlab = MuonlabII()
muonlab.select_lifetime_measurement()
muonlab.set_pmt1_voltage(900)
muonlab.set_pmt1_threshold(500)
lifetimes = []
while True:
    data = muonlab.read_lifetime_data()
    if data:
        print "Muon decays detected with lifetimes", data
        lifetimes.extend(data)
The module lives at https://github.com/HiSPARC/pysparc/tree/master/pysparc/muonlab.
My current implementation of the HTTP API lives at https://github.com/HiSPARC/pysparc/blob/master/bin/muonlab_with_http_api.
I'm pretty happy with the module (with lots of tests) but the HTTP API runs using Flask's single-threaded development server (which the documentation and the internet tell me is a bad idea) and passes dictionaries through a Pipe as some sort of IPC. I'd love to be able to do something like this in the above script:
while True:
    data = muonlab.read_lifetime_data()
    if data:
        print "Muon decays detected with lifetimes", data
        lifetimes.extend(data)
    process_remote_requests()
where process_remote_requests is a fairly short function to call the muonlab instance or return data. Then, in my Flask views, I'd have something like:
muonlab = RemoteMuonlab()

@app.route('/pmt1_voltage', methods=['GET', 'PUT'])
def get_data():
    if request.method == 'PUT':
        voltage = request.form['voltage']
        muonlab.set_pmt1_voltage(voltage)
    else:
        voltage = muonlab.get_pmt1_voltage()
    return jsonify(voltage=voltage)
Getting the measurement data from the app is perhaps less of a problem, since I could store that in SQLite or something else that handles concurrent access.

But... you do have an IO loop; it runs every 16ms.
You can use BaseHTTPServer.HTTPServer in such a case; just set the timeout attribute to something small. Basically:
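from time import sleep
from SimpleXMLRPCServer import SimpleXMLRPCServer

class XmlRpcApi:
    def do_something(self):
        print "doing something"

server = SimpleXMLRPCServer(("localhost", 8000))
server.register_instance(XmlRpcApi())
server.timeout = 0  # handle_request() returns at once if no request is pending
while True:
    sleep(0.016)
    do_normal_thing()
    server.handle_request()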
Edit: Python has a built-in server, also built on BaseHTTPServer, capable of serving a Flask app. Since flask.Flask() happens to be a WSGI-compliant application, your process_remote_requests() should look like this:
import wsgiref.simple_server

# app here is just your Flask() application!
remote_server = wsgiref.simple_server.make_server('localhost', 8000, app)
# as before, set timeout to zero so that you can go right back
# to your event loop if there are no requests to handle
remote_server.timeout = 0

def process_remote_requests():
    remote_server.handle_request()
This works well enough if you have only short running requests; but if you need to handle requests that may possibly take longer than your event loop's normal polling interval, or if you need to handle more requests than you have polls per unit of time, then you can't use this approach, exactly.
You don't necessarily need to fork off another process, though. You can potentially get by using a pool of workers in another thread. Roughly:
import threading
import wsgiref.simple_server

remote_server = wsgiref.simple_server.make_server('localhost', 8000, app)
POOL_SIZE = 10  # or some other value.
pool = [threading.Thread(target=remote_server.serve_forever) for dummy in xrange(POOL_SIZE)]
for thread in pool:
    thread.daemon = True
    thread.start()

while True:
    pass  # normal experiment processing here; don't handle requests in this thread.
However, this approach has one major shortcoming: you now have to deal with concurrency! It's not safe to manipulate your program state as freely as you could with the above loop, since you might be concurrently manipulating that same state in the main thread (or another HTTP server thread). It's up to you to know when this is valid, wrapping each resource with some sort of mutex lock or whatever is appropriate.
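One alternative to locking each resource is to let only the main loop touch the hardware and have server threads hand it commands through a queue. The sketch below swaps that technique in for the mutex idea; it assumes Python 3, and FakeHardware, commands, and submit are hypothetical names used only for illustration. Error handling is left out: if a command raises, the caller never gets a reply.

```python
import queue
import threading


class FakeHardware:
    """Stand-in for the real hardware object (hypothetical)."""

    def __init__(self):
        self.voltage = 0

    def set_voltage(self, value):
        self.voltage = value
        return value


commands = queue.Queue()  # shared between server threads and the main loop


def submit(method, *args):
    """Call from any server thread; blocks until the main loop has replied."""
    reply = queue.Queue(maxsize=1)
    commands.put((method, args, reply))
    return reply.get()


def process_remote_requests(hardware):
    """Call from the main loop; runs all queued commands and never blocks."""
    while True:
        try:
            method, args, reply = commands.get_nowait()
        except queue.Empty:
            return
        # Only the main thread reaches this line, so no lock is needed.
        reply.put(getattr(hardware, method)(*args))
```

A Flask view running in a worker thread would call something like submit("set_voltage", 900), while the experiment loop calls process_remote_requests(hardware) once per iteration.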

How to get actual time TCP request is made and response is received by OS

I'm load testing a server so I have a client spitting out lots of HTTP requests (hundreds, possibly thousands per second). I want to measure how long it takes for the server to respond. Currently I'm measuring this response time as follows:
import requests, time
start_time = time.time()
response = requests.get('https://testserver.mydomain.com/service')
response_time = time.time() - start_time
I'm worried however that when the client is making too many requests per second, then the http request is not actually sent to the server at start_time but rather spends some time kicking around the client machine in some queue or something of that nature. How can I get a more accurate start_time?
(Note that I have modified ulimits and some other settings on both the client and server to handle a high number of concurrent requests. My question isn't so much about how to get a system to handle many concurrent requests, but rather about how to measure when the request is actually made.)

If your main objective is just to load test the server, you should probably consider using an existing tool for the job, e.g. ApacheBench. For more complicated tests, you could try Multi-Mechanize. These tools are specifically designed for load testing and likely have less overhead than libraries like Requests, which you use in your example.
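If you do keep a Python client, one way to see whether requests are queueing inside the client is to record when each request was scheduled separately from when it actually started. The sketch below assumes Python 3 and a thread pool as the load generator; timed_call, run_load, and the send callable are hypothetical. It only measures waiting inside the Python process, not time spent in the OS network stack.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def timed_call(send, enqueued_at):
    """Run send() and report how long the task waited before it started."""
    started_at = time.perf_counter()
    send()
    finished_at = time.perf_counter()
    return {
        "queue_wait": started_at - enqueued_at,        # time spent waiting for a worker
        "response_time": finished_at - started_at,     # time spent inside send()
    }


def run_load(send, count, workers):
    """Submit count calls to a pool of workers and collect their timings in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(timed_call, send, time.perf_counter())
            for _ in range(count)
        ]
        return [future.result() for future in futures]
```

Here send would be something like lambda: requests.get(url). If queue_wait grows as the load rises, the client rather than the server is the bottleneck.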

Various timeouts for python httplib

I'm implementing a little service that fetches web pages from various servers. I need to be able to configure different types of timeouts. I've tried mucking around with the settimeout method of sockets but it's not exactly as I'd like it. Here are the problems.
I need to specify a timeout for the initial DNS lookup. I understand this is done when I instantiate the HTTPConnection at the beginning.
My code is written in such a way that I first .read a chunk of data (around 10 MB) and if the entire payload fits in this, I move on to other parts of the code. If it doesn't fit in this, I directly stream the payload out to a file rather than into memory. When this happens, I do an unbounded .read() to get the data and if the remote side sends me, say, a byte of data every second, the connection just keeps waiting receiving one byte every second. I want to be able to disconnect with a "you're taking too long". A thread based solution would be the last resort.

httplib is too straightforward for what you are looking for.
I would recommend taking a look at http://pycurl.sourceforge.net/ and the http://curl.haxx.se/libcurl/c/curl_easy_setopt.html#CURLOPTTIMEOUT option.
The http://curl.haxx.se/libcurl/c/curl_easy_setopt.html#CURLOPT_NOSIGNAL option sounds also interesting:
Consider building libcurl with c-ares support to enable asynchronous DNS lookups, which enables nice timeouts for name resolves without signals.

Have you tried requests?
You can set timeouts conveniently http://docs.python-requests.org/en/latest/user/quickstart/#timeouts
>>> requests.get('http://github.com', timeout=0.001)
EDIT:
I missed the part 2 of the question. For that you could use this:
import signal
import requests

class TimeoutException(Exception):
    pass

def get_timeout(url, dns_timeout=10, load_timeout=60):
    def timeout_handler(signum, frame):
        raise TimeoutException()
    signal.signal(signal.SIGALRM, timeout_handler)
    signal.alarm(load_timeout)  # trigger the alarm after load_timeout seconds
    try:
        response = requests.get(url, timeout=dns_timeout)
    except TimeoutException:
        return "you're taking too long"
    finally:
        signal.alarm(0)  # cancel the pending alarm once the request is done
    return response
and in your code use the get_timeout function.
If you need the timeout to be available for other functions you could create a decorator.
Above code from http://pguides.net/python-tutorial/python-timeout-a-function/.
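For the slow-trickle case in the question (one byte per second), a signal-free option is to read in chunks and check an overall deadline between chunks. This is a minimal sketch, assuming Python 3 and any file-like object with a read(n) method; TooSlow and read_with_deadline are hypothetical names. A single read that blocks indefinitely still needs a socket timeout on top of this.

```python
import time


class TooSlow(Exception):
    """Raised when the whole download exceeds its time budget."""


def read_with_deadline(stream, max_seconds, chunk_size=64 * 1024, clock=time.monotonic):
    """Read stream to the end, giving up once max_seconds have elapsed in total."""
    deadline = clock() + max_seconds
    chunks = []
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty result means end of stream
            return b"".join(chunks)
        chunks.append(chunk)
        # Checked after every chunk, so a steady trickle cannot go on forever.
        if clock() > deadline:
            raise TooSlow("download exceeded %s seconds" % max_seconds)
```

The same loop could write each chunk to a file instead of collecting it in memory.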

app engine python urlfetch timing out

I have two App Engine applications running that I want to communicate through a RESTful interface. Once the data of one is updated, it calls a webhook on the second, which then retrieves a fresh copy of the data for its own system.
Inside 'site1' I have:
from google.appengine.api import urlfetch
url = 'http://www.site2.com/data_updated'
result = urlfetch.fetch(url)
Inside the handler for data_updated on 'site2' I have:
url = 'http://www.site1.com/get_new_data'
result = urlfetch.fetch(url)
There is very little data being passed between the two sites but I receive the following error. I've tried increasing the deadline to 10 seconds but this still doesn't work.
DeadlineExceededError: ApplicationError: 5
Can anyone provide any insight into what might be happening?
Thanks - Richard

App Engine's urlfetch doesn't always behave as expected; you have about 10 seconds to fetch the URL. Assuming the URL you're trying to fetch is up and running, you should be able to catch the DeadlineExceededError by adding from google.appengine.runtime import apiproxy_errors and then wrapping the urlfetch call in a try/except block using except apiproxy_errors.DeadlineExceededError:.
Relevant answer here.

Changing the method
from
result = urlfetch.fetch(url)
to
result = urlfetch.fetch(url, deadline=2, method=urlfetch.POST)
has fixed the Deadline errors.
From the urlfetch documentation:
deadline
The maximum amount of time to wait for a response from the
remote host, as a number of seconds. If the remote host does not
respond in this amount of time, a DownloadError is raised.
Time spent waiting for a request does not count toward the CPU quota
for the request. It does count toward the request timer. If the app
request timer expires before the URL Fetch call returns, the call is
canceled.
The deadline can be up to a maximum of 60 seconds for request handlers
and 10 minutes for tasks queue and cron job handlers. If deadline is
None, the deadline is set to 5 seconds.

Have you tried manually querying the URLs (www.site2.com/data_updated and www.site1.com/get_new_data) with curl or otherwise to make sure that they're responding within the time limit? Even if the amount of data that needs to be transferred is small, maybe there's a problem with the handler that's causing a delay in returning the results.

The amount of data being transferred is not the problem here, the latency is.
If the app you are talking to is often taking > 10 secs to respond, you will have to use a "proxy callback" server on another cloud platform (EC2, etc.). If you can hold off for a while, the new backend instances are supposed to relax the urlfetch time limits somewhat.
If the average response time is < 10 secs, and only a relatively few are failing, just retry a few times. I hope for your sake the calls are idempotent (i.e. so that a retry doesn't have adverse effects). If not, you might be able to roll your own layer on top - it's a bit painful but it works ok, it's what we do.
J
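The "just retry a few times" advice could be sketched as below. This assumes Python 3; fetch_with_retries and its parameters are hypothetical, with fetch standing in for a zero-argument wrapper around the real call (for example around urlfetch.fetch) and retry_on for whichever deadline exception that call raises. As noted above, it is only safe when the call is idempotent.

```python
import time


def fetch_with_retries(fetch, attempts=3, delay=0.5, retry_on=(Exception,), sleep=time.sleep):
    """Call fetch() up to attempts times, pausing delay seconds between tries."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: let the last error propagate
            sleep(delay)
```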

The GAE doc now states the deadline can be 60 sec:
result = urlfetch.fetch(url, deadline=60, method=urlfetch.POST)
