Right way to "timeout" a Request in Tornado - python

I managed to code a rather silly bug that would make one of my request handlers run a very slow DB query.
The interesting bit is that I noticed that even long after siege completed, Tornado was still churning through requests (sometimes 90s later). (Comment: I'm not 100% sure of the inner workings of Siege, but I'm fairly sure it closed the connection.)
My question in two parts:
- Does Tornado cancel request handlers when client closes the connection?
- Is there a way to timeout request handlers in Tornado?
I read through the code and can't seem to find anything. Even though my request handlers were running asynchronously, in the bug above the number of pending requests piled up to a level where it slowed down the app; it would have been better to close out the connections.

Tornado does not automatically close the request handler when the client drops the connection. However, you can override on_connection_close to be alerted when the client drops, which allows you to cancel the request on your end. A context manager (or a decorator) can handle setting a timeout for the request: in the __enter__ of the context manager, use tornado.ioloop.IOLoop.add_timeout to schedule a method that times out the request, then cancel that callback in the __exit__ block. Here's an example demonstrating both of those ideas:
import time
import contextlib
from tornado.ioloop import IOLoop
import tornado.web
from tornado import gen


@gen.coroutine
def async_sleep(timeout):
    yield gen.Task(IOLoop.instance().add_timeout, time.time() + timeout)


@contextlib.contextmanager
def auto_timeout(self, timeout=2):  # Seconds
    handle = IOLoop.instance().add_timeout(time.time() + timeout, self.timed_out)
    try:
        yield handle
    except Exception as e:
        print("Caught %s" % e)
    finally:
        IOLoop.instance().remove_timeout(handle)
        if not self._timed_out:
            self.finish()
        else:
            raise Exception("Request timed out")  # Don't continue on past this point


class TimeoutableHandler(tornado.web.RequestHandler):
    def initialize(self):
        self._timed_out = False

    def timed_out(self):
        self._timed_out = True
        self.write("Request timed out!\n")
        self.finish()  # Connection to client closes here.
        # You might want to do other clean up here.


class MainHandler(TimeoutableHandler):
    @gen.coroutine
    def get(self):
        with auto_timeout(self):  # We'll time out after 2 seconds spent in this block.
            self.sleeper = async_sleep(5)
            yield self.sleeper
            print("writing")  # get will abort before we reach here if we timed out.
            self.write("hey\n")

    def on_connection_close(self):
        # This isn't the greatest way to cancel a future, since it will not actually
        # stop the work being done asynchronously. You'll need to cancel that some
        # other way. Should be pretty straightforward with a DB connection (close
        # the cursor/connection, maybe?)
        self.sleeper.set_exception(Exception("cancelled"))


application = tornado.web.Application([
    (r"/test", MainHandler),
])
application.listen(8888)
IOLoop.instance().start()

Another solution to this problem is to use gen.with_timeout:
import time
import logging

import tornado.web
from tornado import gen
from tornado.util import TimeoutError

logger = logging.getLogger(__name__)


class MainHandler(tornado.web.RequestHandler):
    @gen.coroutine
    def get(self):
        try:
            # I'm using gen.sleep here but you can use any future in this place
            yield gen.with_timeout(time.time() + 2, gen.sleep(5))
            self.write("This will never be reached!!")
        except TimeoutError as te:
            logger.warning(te.__repr__())
            self.timed_out()

    def timed_out(self):
        self.write("Request timed out!\n")
I liked the way the contextlib solution handled it, but I was always getting logging leftovers.
The native coroutine solution would be:
async def get(self):
    try:
        await gen.with_timeout(time.time() + 2, gen.sleep(5))
        self.write("This will never be reached!!")
    except TimeoutError as te:
        logger.warning(te.__repr__())
        self.timed_out()
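For reference, gen.with_timeout also accepts a datetime.timedelta as a relative deadline, which avoids mixing time.time() with the IOLoop's own clock on newer Tornado versions. A minimal, hedged sketch of the same handler written that way (handler name and logger are illustrative, and TimeoutError may live in tornado.gen on older releases):

# Minimal sketch: a timedelta is interpreted as "relative to now" by
# with_timeout, sidestepping any mismatch with the IOLoop clock.
import datetime
import logging

import tornado.web
from tornado import gen
from tornado.util import TimeoutError

logger = logging.getLogger(__name__)


class MainHandler(tornado.web.RequestHandler):
    async def get(self):
        try:
            await gen.with_timeout(datetime.timedelta(seconds=2), gen.sleep(5))
            self.write("This will never be reached!!")
        except TimeoutError as te:
            logger.warning(repr(te))
            self.write("Request timed out!\n")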

Related

Python how can I do a multithreading/asynchronous HTTP server with twisted

I wrote a server following this tutorial:
https://twistedmatrix.com/documents/14.0.0/web/howto/web-in-60/asynchronous-deferred.html
But it seems to be good only for delaying a process, not for actually processing 2 or more requests concurrently. My full code is:
from twisted.internet.task import deferLater
from twisted.web.resource import Resource
from twisted.web.server import Site, NOT_DONE_YET
from twisted.internet import reactor, threads
from time import sleep


class DelayedResource(Resource):
    def _delayedRender(self, request):
        print 'Sorry to keep you waiting.'
        request.write("<html><body>Sorry to keep you waiting.</body></html>")
        request.finish()

    def make_delay(self, request):
        print 'Sleeping'
        sleep(5)
        return request

    def render_GET(self, request):
        d = threads.deferToThread(self.make_delay, request)
        d.addCallback(self._delayedRender)
        return NOT_DONE_YET


def main():
    root = Resource()
    root.putChild("social", DelayedResource())
    factory = Site(root)
    reactor.listenTCP(8880, factory)
    print 'started httpserver...'
    reactor.run()


if __name__ == '__main__':
    main()
But when I pass 2 requests, the console output looks like:
Sleeping
Sorry to keep you waiting.
Sleeping
Sorry to keep you waiting.
But if it were concurrent, it should look like:
Sleeping
Sleeping
Sorry to keep you waiting.
Sorry to keep you waiting.
So the question is: how do I make Twisted not wait until a response is finished before processing the next request?
Also, make_delay in real life is a large function with heavy logic. Basically I spawn a lot of threads, make requests to other URLs, and collect the results into the response, so it can take some time and is not easily ported.
Twisted processes everything in one event loop. If something blocks the execution, it also blocks Twisted. So you have to prevent blocking calls.
In your case you have time.sleep(5). It is blocking. You found the better way to do it in Twisted already: deferLater(). It returns a Deferred that will continue execution after the given time and releases the event loop so other things can be done in the meantime (see the sketch after this answer). In general, anything that returns a Deferred is good.
If you have to do heavy work that for some reason cannot be deferred, you should use deferToThread() to execute this work in a thread. See https://twistedmatrix.com/documents/15.5.0/core/howto/threading.html for details.
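A minimal sketch of the question's DelayedResource rewritten with deferLater, as suggested above: the five-second wait is scheduled on the reactor instead of blocking a thread, so other requests keep being served while it runs.

# Sketch of the deferLater-based resource described above: the delay is
# scheduled on the reactor, so nothing blocks while waiting.
from twisted.internet import reactor
from twisted.internet.task import deferLater
from twisted.web.resource import Resource
from twisted.web.server import NOT_DONE_YET


class DelayedResource(Resource):
    isLeaf = True

    def _delayedRender(self, request):
        request.write(b"<html><body>Sorry to keep you waiting.</body></html>")
        request.finish()

    def render_GET(self, request):
        # deferLater fires its Deferred after 5 seconds without tying up a
        # thread; the callback then writes and finishes the response.
        d = deferLater(reactor, 5, lambda: request)
        d.addCallback(self._delayedRender)
        return NOT_DONE_YET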
You can use greenlets in your code (like threads).
You need to install the geventreactor - https://gist.github.com/yann2192/3394661
and use reactor.deferToGreenlet().
Also, in your long-running calculation code you need to call gevent.sleep() to switch context to another greenlet:
msecs = 5 * 1000
timeout = 100
for _ in xrange(0, msecs, timeout):
    sleep(timeout)
    gevent.sleep()

Tornado with ThreadPoolExecutor

I have a setup that uses Tornado as an HTTP server with a custom-made HTTP framework. The idea is to have a single Tornado handler, and every request that arrives should just be submitted to a ThreadPoolExecutor, leaving Tornado free to listen for new requests. Once a thread finishes processing a request, a callback is called that sends the response to the client on the same thread where the IO loop is being executed.
Stripped down, the code looks something like this. Base HTTP server class:
class HttpServer():
    def __init__(self, router, port, max_workers):
        self.router = router
        self.port = port
        self.max_workers = max_workers

    def run(self):
        raise NotImplementedError()
Tornado-backed implementation of HttpServer:
class TornadoServer(HttpServer):
    def run(self):
        executor = futures.ThreadPoolExecutor(max_workers=self.max_workers)

        def submit(callback, **kwargs):
            future = executor.submit(Request(**kwargs))
            future.add_done_callback(callback)
            return future

        application = web.Application([
            (r'(.*)', MainHandler, {
                'submit': submit,
                'router': self.router
            })
        ])

        application.listen(self.port)
        ioloop.IOLoop.instance().start()
Main handler that handles all Tornado requests (only GET is implemented, but the others would be the same):
class MainHandler(web.RequestHandler):
    def initialize(self, submit, router):
        self.submit = submit
        self.router = router

    def worker(self, request):
        responder, kwargs = self.router.resolve(request)
        response = responder(**kwargs)
        return response

    def on_response(self, response):
        # when this is called response should already have result
        if isinstance(response, Future):
            response = response.result()
        # response is my own class, just write returned content to client
        self.write(response.data)
        self.flush()
        self.finish()

    def _on_response_ready(self, response):
        # schedule response processing in ioloop, to be on ioloop thread
        ioloop.IOLoop.current().add_callback(
            partial(self.on_response, response)
        )

    @web.asynchronous
    def get(self, url):
        self.submit(
            self._on_response_ready,  # callback
            url=url, method='post', original_request=self.request
        )
Server is started with something like:
router = Router()
server = TornadoServer(router, 1111, max_workers=50)
server.run()
So, as you can see, the main handler just submits every request to the thread pool, and when processing is done, the callback (_on_response_ready) is called, which schedules the request finish to be executed on the IO loop (to make sure it is done on the same thread where the IO loop is being executed).
This works. At least it looks like it does.
My problem here is performance with regard to max_workers in the ThreadPoolExecutor.
All handlers are IO-bound; there is no computation going on (they are mostly waiting for the DB or external services), so with 50 workers I would expect 50 concurrent requests to finish approximately 50 times faster than 50 concurrent requests with only one worker.
But that is not the case. What I see is an almost identical requests-per-second rate whether I have 50 workers in the thread pool or 1 worker.
For measuring, I have used Apache-Bench with something like:
ab -n 100 -c 10 http://localhost:1111/some_url
Does anybody have an idea what I am doing wrong? Did I misunderstand how Tornado or ThreadPoolExecutor works? Or the combination?
The momoko wrapper for Postgres remedies this issue, as suggested by kwarunek; a sketch of what that could look like follows. If you want to solicit further debugging advice from outside collaborators, it would help to post timestamped debug logs from a test task that does sleep(10) before each DB access.
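A hedged sketch (not taken from the thread) of the momoko-based setup, assuming momoko 2.x's Pool/connect/execute API; the DSN, port and query are placeholders. The point is that the Postgres call yields a future instead of occupying a pool thread:

# Hedged sketch only: assumes momoko 2.x (Pool(dsn=..., size=..., ioloop=...),
# Pool.connect(), Pool.execute()). DSN, port and query are placeholders.
import momoko
from tornado import gen, ioloop, web


class DbHandler(web.RequestHandler):
    @gen.coroutine
    def get(self):
        # execute() returns a future; the IOLoop stays free while Postgres
        # works, so no worker threads are needed for DB-bound handlers.
        cursor = yield self.application.db.execute("SELECT 1;")
        self.write(str(cursor.fetchall()))


if __name__ == "__main__":
    io_loop = ioloop.IOLoop.instance()
    application = web.Application([(r"/", DbHandler)])
    application.db = momoko.Pool(dsn="dbname=mydb user=me", size=10, ioloop=io_loop)
    # Block once at startup until the pool is connected, then serve.
    io_loop.run_sync(application.db.connect)
    application.listen(1111)
    io_loop.start()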

concurrency of heavy tasks in tornado

my code:
import tornado.tcpserver
import tornado.ioloop
import tornado.gen
import tornado.iostream
import itertools
import socket
import time


class Talk():
    def __init__(self, id):
        self.id = id

    @tornado.gen.coroutine
    def on_connect(self):
        try:
            while "connection alive":
                self.said = yield self.stream.read_until(b"\n")
                response = yield tornado.gen.Task(self.task)  ### LINE 1
                yield self.stream.write(response)             ### LINE 2
        except tornado.iostream.StreamClosedError:
            print('error: socket closed')
            return

    @tornado.gen.coroutine
    def task(self):
        if self.id == 1:
            time.sleep(3)  # sometimes request is heavy blocking
        return b"response"

    @tornado.gen.coroutine
    def on_disconnect(self):
        yield []


class Server(tornado.tcpserver.TCPServer):
    def __init__(self, io_loop=None, ssl_options=None, max_buffer_size=None):
        tornado.tcpserver.TCPServer.__init__(self,
                                             io_loop=io_loop,
                                             ssl_options=ssl_options,
                                             max_buffer_size=max_buffer_size)
        self.talk_id_alloc = itertools.count(1)
        return

    @tornado.gen.coroutine
    def handle_stream(self, stream, address):
        talk_id = next(self.talk_id_alloc)
        talk = Talk(talk_id)
        stream.set_close_callback(talk.on_disconnect)
        stream.socket.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        stream.socket.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
        talk.stream = stream
        yield talk.on_connect()
        return


Server().listen(8888)
tornado.ioloop.IOLoop.instance().start()
problem:
I need Tornado as a TCP server - it looks like a good choice for handling many requests with low computation.
however:
- 99% of requests will last less than 0.05 sec, but
- 1% of them can last even 3 sec (special cases).
- a single response must be returned at once, not partially.
what is the best approach here?
how do I achieve code where LINE #1 never blocks for more than 0.1 sec?
yield tornado.gen.with_timeout(
    datetime.timedelta(seconds=0.1), tornado.gen.Task(self.task))
doesn't work for me - it does nothing.
tornado.ioloop.IOLoop.current().add_timeout(
    datetime.timedelta(seconds=0.1),
    lambda: result.set_exception(TimeoutError("Timeout")))
does nothing either.
Looking for better solutions:
- the task could detect whether it needs heavy computation (an API call, ...) - using a timeout? - then run/fork it to another thread or even process, and send the Tornado server an exception: "receive me later from the results queue" (consumer/producer).
- I don't want the case where a timeout kills a heavy task without saving its results and the task is reopened within a special wrapper - so should the consumer/producer pattern be used for all tasks?
- add a new ioloop when the current one is blocked - but how do I detect blocking?
I don't see any solution in Tornado.
The task in line #1 could be simple (~99%) or complicated, which can require:
I/O:
- disk/DB access
- RAM/Redis access
network:
- API call
CPU:
- algorithms, regex
(the worst task will do all of the above).
I know what kind of task it is (its weight) only when I start doing it,
so the appropriate thing is to use a task queue in separate threads.
I don't want to delay the simple/quick tasks.
So if you manage to cancel the heavy tasks, I recommend cancelling them with a timeout and then spawning them off to another thread; a sketch of that follows below. Performance-wise this is not ideal (GIL), but you prevent Tornado from blocking - which is your ultimate goal.
A nice write-up about how this can be done can be found here: http://lbolla.info/blog/2013/01/22/blocking-tornado.
If you want to go further you could use something like Celery, where you can offload to other processes transparently - though this is much heavier.
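A hedged sketch of the thread-offload idea applied to the question's task(): the potentially blocking body runs on a concurrent.futures.ThreadPoolExecutor, and gen.with_timeout bounds how long the coroutine waits for it. The 0.1 s limit and pool size are illustrative, and it assumes a Tornado version whose with_timeout accepts a plain concurrent.futures.Future.

# Hedged sketch: offload the blocking work to a thread and stop waiting for
# it after 0.1 s. Assumes gen.with_timeout accepts a concurrent.futures.Future
# (recent Tornado); TimeoutError may live in tornado.gen on older versions.
import datetime
from concurrent.futures import ThreadPoolExecutor

from tornado import gen
from tornado.util import TimeoutError

_pool = ThreadPoolExecutor(max_workers=4)


@gen.coroutine
def run_with_deadline(blocking_fn, deadline=0.1):
    future = _pool.submit(blocking_fn)  # runs in a worker thread
    try:
        result = yield gen.with_timeout(
            datetime.timedelta(seconds=deadline), future)
    except TimeoutError:
        # The worker thread keeps running (threads can't be killed); we just
        # stop waiting here and answer the client some other way.
        result = b"busy\n"
    raise gen.Return(result)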

Implementing and testing WebSocket server connection timeout

I am implementing a WebSockets server in Tornado 3.2. The client connecting to the server won't be a browser.
For cases in which there is back-and-forth communication between server and client, I would like to add a max. time the server will wait for a client response before closing the connection.
This is roughly what I've been trying:
import datetime
import tornado


class WSHandler(WebSocketHandler):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.timeout = None

    def _close_on_timeout(self):
        if self.ws_connection:
            self.close()

    def open(self):
        initialize()

    def on_message(self, message):
        # Remove previous timeout, if one exists.
        if self.timeout:
            tornado.ioloop.IOLoop.instance().remove_timeout(self.timeout)
            self.timeout = None

        if is_last_message:
            self.write_message(message)
            self.close()
        else:
            # Add a new timeout.
            self.timeout = tornado.ioloop.IOLoop.instance().add_timeout(
                datetime.timedelta(milliseconds=1000), self._close_on_timeout)
            self.write_message(message)
Am I being a klutz and is there a much simpler way of doing this? I can't even seem to schedule a simple print statement via add_timeout above.
I also need some help testing this. This is what I have so far:
from tornado.websocket import websocket_connect
from tornado.testing import AsyncHTTPTestCase, gen_test
import time


class WSTests(AsyncHTTPTestCase):
    @gen_test
    def test_long_response(self):
        ws = yield websocket_connect('ws://address', io_loop=self.io_loop)

        # First round trip.
        ws.write_message('First message.')
        result = yield ws.read_message()
        self.assertEqual(result, 'First response.')

        # Wait longer than the timeout.
        # The test is in its own IOLoop, so a blocking sleep should be okay?
        time.sleep(1.1)

        # Expect either write or read to fail because of a closed socket.
        ws.write_message('Second message.')
        result = yield ws.read_message()
        self.assertNotEqual(result, 'Second response.')
The client has no problem writing to and reading from the socket. This is presumably because the add_timeout isn't firing.
Does the test need to yield somehow to allow the timeout callback on the server to run? I would have thought not since the docs say the tests run in their own IOLoop.
Edit
This is the working version, per Ben's suggestions.
import datetime
import tornado


class WSHandler(WebSocketHandler):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.timeout = None

    def _close_on_timeout(self):
        if self.ws_connection:
            self.close()

    def open(self):
        initialize()

    def on_message(self, message):
        # Remove previous timeout, if one exists.
        if self.timeout:
            tornado.ioloop.IOLoop.current().remove_timeout(self.timeout)
            self.timeout = None

        if is_last_message:
            self.write_message(message)
            self.close()
        else:
            # Add a new timeout.
            self.timeout = tornado.ioloop.IOLoop.current().add_timeout(
                datetime.timedelta(milliseconds=1000), self._close_on_timeout)
            self.write_message(message)
The test:
from tornado.websocket import websocket_connect
from tornado.testing import AsyncHTTPTestCase, gen_test
import time
class WSTests(AsyncHTTPTestCase):
#gen_test
def test_long_response(self):
ws = yield websocket_connect('ws://address', io_loop=self.io_loop)
# First round trip.
ws.write_message('First message.')
result = yield ws.read_message()
self.assertEqual(result, 'First response.')
# Wait a little more than the timeout.
yield gen.Task(self.io_loop.add_timeout, datetime.timedelta(seconds=1.1))
# Expect either write or read to fail because of a closed socket.
ws.write_message('Second message.')
result = yield ws.read_message()
self.assertEqual(result, None)
The timeout-handling code in your first example looks correct to me.
For testing, each test case gets its own IOLoop, but there is only one IOLoop for both the test and anything else it runs, so you must use add_timeout instead of time.sleep() here as well to avoid blocking the server.
Hey Ben, I know this question was resolved long ago, but I wanted to share the solution I made with anyone reading this.
It's basically based on yours, but it solves the problem with an external service that can easily be integrated into any websocket handler using composition instead of inheritance:
import datetime
import tornado.ioloop


class TimeoutWebSocketService():
    _default_timeout_delta_ms = 10 * 60 * 1000  # 10 min

    def __init__(self, websocket, ioloop=None, timeout=None):
        # Timeout
        self.ioloop = ioloop or tornado.ioloop.IOLoop.current()
        self.websocket = websocket
        self._timeout = None
        self._timeout_delta_ms = timeout or TimeoutWebSocketService._default_timeout_delta_ms

    def _close_on_timeout(self):
        self._timeout = None
        if self.websocket.ws_connection:
            self.websocket.close()

    def refresh_timeout(self, timeout=None):
        timeout = timeout or self._timeout_delta_ms
        if timeout > 0:
            # Clean last timeout, if one exists
            self.clean_timeout()
            # Add a new timeout (must be None from clean).
            self._timeout = self.ioloop.add_timeout(
                datetime.timedelta(milliseconds=timeout), self._close_on_timeout)

    def clean_timeout(self):
        if self._timeout is not None:
            # Remove previous timeout, if one exists.
            self.ioloop.remove_timeout(self._timeout)
            self._timeout = None
In order to use the service, it's as easy as creating a new TimeoutWebSocketService instance (optionally with the timeout in ms, as well as the ioloop where it should be executed) and calling refresh_timeout to either set the timeout for the first time or reset an already existing timeout, or clean_timeout to stop the timeout service.
class BaseWebSocketHandler(WebSocketHandler):
    def prepare(self):
        self.timeout_service = TimeoutWebSocketService(self, timeout=(1000 * 60))
        # Optionally start the service here
        self.timeout_service.refresh_timeout()
        # rest of prepare method

    def on_message(self, message):
        self.timeout_service.refresh_timeout()

    def on_close(self):
        self.timeout_service.clean_timeout()
Thanks to this approach, you can control exactly when and under which conditions you want to restart the timeout, which might differ from app to app. For example, you might only want to refresh the timeout if the user meets certain conditions, or if the message is the expected one, as in the sketch below.
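As an illustration (not part of the original service), a handler that only refreshes on expected messages might look like this; _is_expected is a placeholder predicate:

# Hypothetical sketch of a conditional refresh: only messages that pass an
# application-specific check keep the connection alive.
class PickyWebSocketHandler(BaseWebSocketHandler):
    def on_message(self, message):
        if self._is_expected(message):
            self.timeout_service.refresh_timeout()
            self.write_message(message)
        # otherwise do not refresh; the pending timeout will close the socket

    def _is_expected(self, message):
        return message.startswith('ping')  # illustrative condition only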
I hope people enjoy this solution!

Timeout for python requests.get entire response

I'm gathering statistics on a list of websites and I'm using requests for it for simplicity. Here is my code:
import requests

data = []
websites = ['http://google.com', 'http://bbc.co.uk']

for w in websites:
    r = requests.get(w, verify=False)
    data.append((r.url, len(r.content), r.elapsed.total_seconds(),
                 str([(l.status_code, l.url) for l in r.history]),
                 str(r.headers.items()), str(r.cookies.items())))
Now, I want requests.get to timeout after 10 seconds so the loop doesn't get stuck.
This question has been of interest before too but none of the answers are clean.
I hear that maybe not using requests is a good idea, but then how would I get the nice things requests offers (the ones in the tuple)?
Set the timeout parameter:
r = requests.get(w, verify=False, timeout=10) # 10 seconds
The code above will cause the call to requests.get() to time out if the connection or the delays between reads take more than ten seconds (behavior as of requests 2.25.1). See: https://requests.readthedocs.io/en/stable/user/advanced/#timeouts
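In the context of the question's loop, it also helps to catch the exception the timeout raises, so one slow site is skipped instead of aborting the whole statistics run. A small sketch (the 10-second value mirrors the question; the tuple is shortened for brevity):

import requests

data = []
websites = ['http://google.com', 'http://bbc.co.uk']

for w in websites:
    try:
        r = requests.get(w, verify=False, timeout=10)
    except requests.exceptions.Timeout:
        print('skipping %s: no response within 10 seconds' % w)
        continue
    data.append((r.url, len(r.content), r.elapsed.total_seconds()))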
What about using eventlet? If you want to timeout the request after 10 seconds, even if data is being received, this snippet will work for you:
import requests
import eventlet
eventlet.monkey_patch()

with eventlet.Timeout(10):
    requests.get("http://ipv4.download.thinkbroadband.com/1GB.zip", verify=False)
UPDATE: https://requests.readthedocs.io/en/master/user/advanced/#timeouts
In new version of requests:
If you specify a single value for the timeout, like this:
r = requests.get('https://github.com', timeout=5)
The timeout value will be applied to both the connect and the read timeouts. Specify a tuple if you would like to set the values separately:
r = requests.get('https://github.com', timeout=(3.05, 27))
If the remote server is very slow, you can tell Requests to wait forever for a response, by passing None as a timeout value and then retrieving a cup of coffee.
r = requests.get('https://github.com', timeout=None)
My old (probably outdated) answer, which was posted a long time ago:
There are other ways to overcome this problem:
1. Use the TimeoutSauce internal class
From: https://github.com/kennethreitz/requests/issues/1928#issuecomment-35811896
import requests
from requests.adapters import TimeoutSauce


class MyTimeout(TimeoutSauce):
    def __init__(self, *args, **kwargs):
        connect = kwargs.get('connect', 5)
        read = kwargs.get('read', connect)
        super(MyTimeout, self).__init__(connect=connect, read=read)


requests.adapters.TimeoutSauce = MyTimeout
This code should cause us to set the read timeout as equal to the
connect timeout, which is the timeout value you pass on your
Session.get() call. (Note that I haven't actually tested this code, so
it may need some quick debugging, I just wrote it straight into the
GitHub window.)
2. Use a fork of requests from kevinburke: https://github.com/kevinburke/requests/tree/connect-timeout
From its documentation: https://github.com/kevinburke/requests/blob/connect-timeout/docs/user/advanced.rst
If you specify a single value for the timeout, like this:
r = requests.get('https://github.com', timeout=5)
The timeout value will be applied to both the connect and the read
timeouts. Specify a tuple if you would like to set the values
separately:
r = requests.get('https://github.com', timeout=(3.05, 27))
kevinburke has requested it to be merged into the main requests project, but it hasn't been accepted yet.
timeout = int(seconds)
Since requests >= 2.4.0, you can use the timeout argument, i.e:
requests.get('https://duckduckgo.com/', timeout=10)
Note:
timeout is not a time limit on the entire response download; rather,
an exception is raised if the server has not issued a response for
timeout seconds ( more precisely, if no bytes have been received on the
underlying socket for timeout seconds). If no timeout is specified
explicitly, requests do not time out.
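If it matters which phase timed out, the two cases can be told apart; both exception classes are part of requests' public exceptions module:

import requests

try:
    requests.get('https://duckduckgo.com/', timeout=(3.05, 10))
except requests.exceptions.ConnectTimeout:
    print('could not establish a connection in time')
except requests.exceptions.ReadTimeout:
    print('server accepted the connection but was too slow to reply')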
To create a timeout you can use signals.
The best way to solve this case is probably to:
- Set an exception as the handler for the alarm signal
- Call the alarm signal with a ten-second delay
- Call the function inside a try-except-finally block.
The except block is reached if the function timed out.
In the finally block you abort the alarm, so it's not signaled later.
Here is some example code:
import signal
from time import sleep


class TimeoutException(Exception):
    """ Simple Exception to be called on timeouts. """
    pass


def _timeout(signum, frame):
    """ Raise a TimeoutException.

    This is intended for use as a signal handler.
    The signum and frame arguments passed to this are ignored.
    """
    # Raise TimeoutException with system default timeout message
    raise TimeoutException()


# Set the handler for the SIGALRM signal:
signal.signal(signal.SIGALRM, _timeout)
# Send the SIGALRM signal in 10 seconds:
signal.alarm(10)

try:
    # Do our code:
    print('This will take 11 seconds...')
    sleep(11)
    print('done!')
except TimeoutException:
    print('It timed out!')
finally:
    # Abort the sending of the SIGALRM signal:
    signal.alarm(0)
There are some caveats to this:
- It is not threadsafe; signals are always delivered to the main thread, so you can't put this in any other thread.
- There is a slight delay between the scheduling of the signal and the execution of the actual code. This means that the example would time out even if it only slept for ten seconds.
But, it's all in the standard Python library! Except for the sleep function import, it's only one import. If you are going to use timeouts in many places, you can easily put the TimeoutException, _timeout and the signaling in a function and just call that. Or you can make a decorator and put it on functions; see the answer linked below.
You can also set this up as a "context manager" so you can use it with the with statement:
import signal


class Timeout():
    """ Timeout for use with the `with` statement. """

    class TimeoutException(Exception):
        """ Simple Exception to be called on timeouts. """
        pass

    def _timeout(signum, frame):
        """ Raise a TimeoutException.

        This is intended for use as a signal handler.
        The signum and frame arguments passed to this are ignored.
        """
        raise Timeout.TimeoutException()

    def __init__(self, timeout=10):
        self.timeout = timeout
        signal.signal(signal.SIGALRM, Timeout._timeout)

    def __enter__(self):
        signal.alarm(self.timeout)

    def __exit__(self, exc_type, exc_value, traceback):
        signal.alarm(0)
        return exc_type is Timeout.TimeoutException


# Demonstration:
from time import sleep

print('This is going to take maximum 10 seconds...')
with Timeout(10):
    sleep(15)
    print('No timeout?')
print('Done')
One possible down side with this context manager approach is that you can't know if the code actually timed out or not.
Sources and recommended reading:
The documentation on signals
This answer on timeouts by @David Narayan. He has organized the above code as a decorator.
Try this request with timeout & error handling:
import requests

try:
    url = "http://google.com"
    r = requests.get(url, timeout=10)
except requests.exceptions.Timeout as e:
    print(e)
The connect timeout is the number of seconds Requests will wait for your client to establish a connection to a remote machine (corresponding to the connect() call on the socket). It's a good practice to set connect timeouts to slightly larger than a multiple of 3, which is the default TCP packet retransmission window.
Once your client has connected to the server and sent the HTTP request, the read timeout starts. It is the number of seconds the client will wait for the server to send a response. (Specifically, it's the number of seconds that the client will wait between bytes sent from the server. In 99.9% of cases, this is the time before the server sends the first byte.)
If you specify a single value for the timeout, the timeout value will be applied to both the connect and the read timeouts, like below:
r = requests.get('https://github.com', timeout=5)
Specify a tuple if you would like to set the values separately for connect and read:
r = requests.get('https://github.com', timeout=(3.05, 27))
If the remote server is very slow, you can tell Requests to wait forever for a response, by passing None as a timeout value and then retrieving a cup of coffee.
r = requests.get('https://github.com', timeout=None)
https://docs.python-requests.org/en/latest/user/advanced/#timeouts
Most other answers are incorrect
Despite all the answers, I believe that this thread still lacks a proper solution and no existing answer presents a reasonable way to do something which should be simple and obvious.
Let's start by saying that as of 2022, there is still absolutely no way to do it properly with requests alone. It is a conscious design decision by the library's developers.
Solutions utilizing the timeout parameter simply do not accomplish what they intend to do. The fact that it "seems" to work at first glance is purely incidental:
The timeout parameter has absolutely nothing to do with the total execution time of the request. It merely controls the maximum amount of time that can pass before the underlying socket receives any data. With an example timeout of 5 seconds, the server can just as well send 1 byte of data every 4 seconds and it will be perfectly okay, but it won't help you very much.
Answers with stream and iter_content are somewhat better, but they still do not cover everything in a request. You do not actually receive anything from iter_content until after the response headers are sent, which falls under the same issue - even if you use 1 byte as a chunk size for iter_content, reading the full response headers could take a totally arbitrary amount of time and you can never actually get to the point where you read any response body from iter_content.
Here are some examples that completely break both timeout and stream-based approach. Try them all. They all hang indefinitely, no matter which method you use.
server.py
import socket
import time

server = socket.socket()
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, True)
server.bind(('127.0.0.1', 8080))
server.listen()

while True:
    try:
        sock, addr = server.accept()
        print('Connection from', addr)
        sock.send(b'HTTP/1.1 200 OK\r\n')
        # Send some garbage headers very slowly but steadily.
        # Never actually complete the response.
        while True:
            sock.send(b'a')
            time.sleep(1)
    except:
        pass
demo1.py
import requests
requests.get('http://localhost:8080')
demo2.py
import requests
requests.get('http://localhost:8080', timeout=5)
demo3.py
import requests
requests.get('http://localhost:8080', timeout=(5, 5))
demo4.py
import requests

with requests.get('http://localhost:8080', timeout=(5, 5), stream=True) as res:
    for chunk in res.iter_content(1):
        break
The proper solution
My approach utilizes Python's sys.settrace function. It is dead simple. You do not need to use any external libraries or turn your code upside down. Unlike most other answers, this actually guarantees that the code executes in specified time. Be aware that you still need to specify the timeout parameter, as settrace only concerns Python code. Actual socket reads are external syscalls which are not covered by settrace, but are covered by the timeout parameter. Due to this fact, the exact time limit is not TOTAL_TIMEOUT, but a value which is explained in comments below.
import requests
import sys
import time


# This function serves as a "hook" that executes for each Python statement
# down the road. There may be some performance penalty, but as downloading
# a webpage is mostly I/O bound, it's not going to be significant.
def trace_function(frame, event, arg):
    if time.time() - start > TOTAL_TIMEOUT:
        raise Exception('Timed out!')  # Use whatever exception you consider appropriate.
    return trace_function


# The following code will terminate at most after TOTAL_TIMEOUT + the highest
# value specified in the `timeout` parameter of `requests.get`.
# In this case 10 + 6 = 16 seconds.
# For most cases though, it's going to terminate no later than TOTAL_TIMEOUT.
TOTAL_TIMEOUT = 10

start = time.time()
sys.settrace(trace_function)

try:
    res = requests.get('http://localhost:8080', timeout=(3, 6))  # Use whatever timeout values you consider appropriate.
except:
    raise
finally:
    sys.settrace(None)  # Remove the time constraint and continue normally.

# Do something with the response
Condensed
import requests, sys, time

TOTAL_TIMEOUT = 10


def trace_function(frame, event, arg):
    if time.time() - start > TOTAL_TIMEOUT:
        raise Exception('Timed out!')
    return trace_function


start = time.time()
sys.settrace(trace_function)

try:
    res = requests.get('http://localhost:8080', timeout=(3, 6))
except:
    raise
finally:
    sys.settrace(None)
That's it!
Set stream=True and use r.iter_content(1024). Yes, eventlet.Timeout just somehow doesn't work for me.
from time import time

from requests import get, exceptions

try:
    start = time()
    timeout = 5
    with get(config['source']['online'], stream=True, timeout=timeout) as r:
        r.raise_for_status()
        content = bytes()
        content_gen = r.iter_content(1024)
        while True:
            if time() - start > timeout:
                raise TimeoutError('Time out! ({} seconds)'.format(timeout))
            try:
                content += next(content_gen)
            except StopIteration:
                break
        data = content.decode().split('\n')
        if len(data) in [0, 1]:
            raise ValueError('Bad requests data')
except (exceptions.RequestException, ValueError, IndexError, KeyboardInterrupt,
        TimeoutError) as e:
    print(e)
    with open(config['source']['local']) as f:
        data = [line.strip() for line in f.readlines()]
The discussion is here https://redd.it/80kp1h
This may be overkill, but the Celery distributed task queue has good support for timeouts.
In particular, you can define a soft time limit that just raises an exception in your process (so you can clean up) and/or a hard time limit that terminates the task when the time limit has been exceeded.
Under the covers, this uses the same signals approach as referenced in your "before" post, but in a more usable and manageable way. And if the list of web sites you are monitoring is long, you might benefit from its primary feature -- all kinds of ways to manage the execution of a large number of tasks.
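A hedged sketch of what that could look like with Celery's time limits; the broker URL and the limit values are placeholders, not from the question. The soft limit raises SoftTimeLimitExceeded inside the task so it can clean up, and the hard limit terminates it outright.

# Hedged sketch: soft_time_limit raises SoftTimeLimitExceeded inside the task,
# time_limit kills it. Broker URL and the limits are placeholders.
import requests
from celery import Celery
from celery.exceptions import SoftTimeLimitExceeded

app = Celery('stats', broker='redis://localhost:6379/0')


@app.task(soft_time_limit=10, time_limit=15)
def fetch(url):
    try:
        r = requests.get(url)
        return (r.url, len(r.content), r.elapsed.total_seconds())
    except SoftTimeLimitExceeded:
        return (url, None, None)  # timed out; record it and move on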
I believe you can use multiprocessing and not depend on a 3rd party package:
import multiprocessing
import requests


def call_with_timeout(func, args, kwargs, timeout):
    manager = multiprocessing.Manager()
    return_dict = manager.dict()

    # define a wrapper of `return_dict` to store the result.
    def function(return_dict):
        return_dict['value'] = func(*args, **kwargs)

    p = multiprocessing.Process(target=function, args=(return_dict,))
    p.start()

    # Force a max. `timeout` or wait for the process to finish
    p.join(timeout)

    # If the process is still active, it didn't finish: raise TimeoutError
    if p.is_alive():
        p.terminate()
        p.join()
        raise TimeoutError
    else:
        return return_dict['value']


call_with_timeout(requests.get, args=(url,), kwargs={'timeout': 10}, timeout=60)
The timeout passed to kwargs is the timeout to get any response from the server, the argument timeout is the timeout to get the complete response.
Despite the question being about requests, I find this very easy to do with pycurl CURLOPT_TIMEOUT or CURLOPT_TIMEOUT_MS.
No threading or signaling required:
import pycurl
import StringIO
import traceback

url = 'http://www.example.com/example.zip'
timeout_ms = 1000

raw = StringIO.StringIO()
c = pycurl.Curl()
c.setopt(pycurl.TIMEOUT_MS, timeout_ms)  # total timeout in milliseconds
c.setopt(pycurl.WRITEFUNCTION, raw.write)
c.setopt(pycurl.NOSIGNAL, 1)
c.setopt(pycurl.URL, url)
c.setopt(pycurl.HTTPGET, 1)
try:
    c.perform()
except pycurl.error:
    traceback.print_exc()  # error generated on timeout
    pass  # or just pass if you don't want to print the error
In case you're using the option stream=True you can do this:
import time
import requests

r = requests.get(
    'http://url_to_large_file',
    timeout=1,  # relevant only for underlying socket
    stream=True)

with open('/tmp/out_file.txt', 'wb') as f:
    start_time = time.time()
    for chunk in r.iter_content(chunk_size=1024):
        if chunk:  # filter out keep-alive new chunks
            f.write(chunk)
        if time.time() - start_time > 8:
            raise Exception('Request took longer than 8s')
The solution does not need signals or multiprocessing.
Just one more solution (from http://docs.python-requests.org/en/master/user/advanced/#streaming-uploads):
Before downloading, you can find out the content size:

TOO_LONG = 10 * 1024 * 1024  # 10 Mb
big_url = "http://ipv4.download.thinkbroadband.com/1GB.zip"
r = requests.get(big_url, stream=True)
print(r.headers['content-length'])
# 1073741824

if int(r.headers['content-length']) < TOO_LONG:
    # download content:
    content = r.content
But be careful: a sender can set an incorrect value in the 'content-length' response field.
timeout = (connection timeout, data read timeout), or give a single argument (timeout=1):
import requests

try:
    req = requests.request('GET', 'https://www.google.com', timeout=(1, 1))
    print(req)
except requests.ReadTimeout:
    print("READ TIME OUT")
This code works for socket errors 11004 and 10060:
# -*- encoding:UTF-8 -*-
__author__ = 'ACE'

import requests
from PyQt4.QtCore import *
from PyQt4.QtGui import *


class TimeOutModel(QThread):
    Existed = pyqtSignal(bool)
    TimeOut = pyqtSignal()

    def __init__(self, fun, timeout=500, parent=None):
        """
        :param fun: function or lambda
        :param timeout: ms
        """
        super(TimeOutModel, self).__init__(parent)
        self.fun = fun

        self.timeer = QTimer(self)
        self.timeer.setInterval(timeout)
        self.timeer.timeout.connect(self.time_timeout)
        self.Existed.connect(self.timeer.stop)
        self.timeer.start()

        self.setTerminationEnabled(True)

    def time_timeout(self):
        self.timeer.stop()
        self.TimeOut.emit()
        self.quit()
        self.terminate()

    def run(self):
        self.fun()


bb = lambda: requests.get("http://ipv4.download.thinkbroadband.com/1GB.zip")

a = QApplication([])
z = TimeOutModel(bb, 500)
print 'timeout'
a.exec_()
Well, I tried many solutions on this page and still faced instabilities, random hangs, and poor connection performance.
I'm now using curl and I'm really happy about its "max time" functionality and about the overall performance, even with such a poor implementation:

import commands
content = commands.getoutput('curl -m6 -Ss "http://mywebsite.xyz"')

Here, I defined a 6-second max time parameter, covering both connection and transfer time.
I'm sure curl has a nice Python binding, if you prefer to stick to pythonic syntax :)
There is a package called timeout-decorator that you can use to time out any python function.
import time
import timeout_decorator


@timeout_decorator.timeout(5)
def mytest():
    print("Start")
    for i in range(1, 10):
        time.sleep(1)
        print("{} seconds have passed".format(i))
It uses the signals approach that some answers here suggest. Alternatively, you can tell it to use multiprocessing instead of signals (e.g. if you are in a multi-thread environment).
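If I read the package's options correctly, the multiprocessing variant is selected with a use_signals flag; treat this as an assumption and verify it against the timeout-decorator documentation:

# Assumed flag of the timeout-decorator package: use_signals=False switches to
# a multiprocessing-based timeout, useful outside the main thread.
import time
import timeout_decorator


@timeout_decorator.timeout(5, use_signals=False)
def mytest():
    print("Start")
    for i in range(1, 10):
        time.sleep(1)
        print("{} seconds have passed".format(i))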
If it comes to that, create a watchdog thread that messes up requests' internal state after 10 seconds, e.g.:
closes the underlying socket, and ideally
triggers an exception if requests retries the operation
Note that depending on the system libraries you may be unable to set deadline on DNS resolution.
I'm using requests 2.2.1 and eventlet didn't work for me. Instead I was able to use the gevent timeout, since gevent is already used in my service for gunicorn.
import gevent
import gevent.monkey
gevent.monkey.patch_all(subprocess=True)

import requests

try:
    with gevent.Timeout(5):
        ret = requests.get(url)
        print ret.status_code, ret.content
except gevent.timeout.Timeout as e:
    print "timeout: {}".format(e.message)
Please note that gevent.timeout.Timeout is not caught by general Exception handling.
So either explicitly catch gevent.timeout.Timeout,
or pass in a different exception to be used, like so: with gevent.Timeout(5, requests.exceptions.Timeout): (although no message is passed when this exception is raised); see the sketch below.
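Spelled out, the variant described above looks like this (the URL is a placeholder):

import gevent
import gevent.monkey
gevent.monkey.patch_all(subprocess=True)

import requests

url = "https://github.com"  # placeholder
try:
    # gevent.Timeout raises the given exception class instead of its own.
    with gevent.Timeout(5, requests.exceptions.Timeout):
        ret = requests.get(url)
    print(ret.status_code)
except requests.exceptions.Timeout:
    print("timed out after 5 seconds")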
The biggest problem is that if the connection can't be established, the requests package waits too long and blocks the rest of the program.
There are several ways to tackle the problem, but when I looked for a one-liner similar to requests, I couldn't find anything. That's why I built a wrapper around requests called reqto ("requests timeout"), which supports proper timeouts for all standard methods from requests.
pip install reqto
The syntax is identical to requests
import reqto
response = reqto.get(f'https://pypi.org/pypi/reqto/json',timeout=1)
# Will raise an exception on Timeout
print(response)
Moreover, you can set up a custom timeout function
def custom_function(parameter):
    print(parameter)

response = reqto.get(f'https://pypi.org/pypi/reqto/json', timeout=5,
                     timeout_function=custom_function,
                     timeout_args="Timeout custom function called")
# Will call timeout_function instead of raising an exception on Timeout
print(response)
An important note is that the import line
import reqto
needs to come earlier than all other imports working with requests, threading, etc., due to the monkey_patch which runs in the background.
I came up with a more direct solution that is admittedly ugly but fixes the real problem. It goes a bit like this:
resp = requests.get(some_url, stream=True)
resp.raw._fp.fp._sock.settimeout(read_timeout)
# This will load the entire response even though stream is set
content = resp.content
You can read the full explanation here
