Timeout for requests.post not working in Python - python

I have a python script using requests.post :
try:
    r = requests.post(url, json=data, timeout=10)
except requests.Timeout:
    print("timeout")
(I have also tried with except Timeout: and except requests.exceptions.Timeout)
This code should print "timeout" after around 10 seconds if the server is down, right?
However, it doesn't. My script waits indefinitely, as if timeout were None.
Do you know why?
Thanks
EDIT
Here is the whole code:
import requests
from twisted.internet import task, reactor
import json
import sys
import os
timeout = 30 # 30 sec timeout to loop PostParam
url = os.getenv('URL', "http://127.0.0.1:5000")
def PostParams():
    # Subscribe to MQTT Broker
    data = subscribing_broker()
    # Iterate in the JSON payload to get the different units
    for unit in data:
        try:
            # Make the POST request - data as JSON - blocking call - timeout 10s
            req_result = requests.post(url, json=unit, timeout=10)
            # Get the answer in json
            pred = req_result.json()
            # Publish to MQTT
            publishing_broker(pred, pub_topic)
        # Connection timeout, continue (Timeout was never imported, so use the qualified name)
        except requests.exceptions.Timeout:
            print("Connection timed out, passing")
            pass
# Infinite loop - Timeout is 30 sec
loop = task.LoopingCall(PostParams)
loop.start(timeout)
reactor.run()

Related

Delay between when packets returned and when asynchronous python code receives them?

I have about 130 asynchronous GET requests being sent using httpx and asyncio in python, via a proxy which I created myself on AWS.
In the python script, I have printed the time just before each request is sent and can see that they are all sent within less than 70ms. However, I have timed the duration of the requests by getting the current time immediately after and some requests take up to 30 seconds! The distribution seems pretty level over this time, so I am getting back about 3-5 requests every second for 30 seconds.
I used tcpdump and wireshark to look at the packets coming back, and it seems that all the application data is coming back within 4 seconds (including the tcp handshakes) so I don't understand the reason for the delay in python.
The tcp teardowns are happening up to 35 seconds later so maybe this could be the reason for the delay? Does httpx wait for the connection to close (FIN and ACK) before the httpx.get() is unblocked and the request can be read?
What can I try to speed this up?
Here is a simplified version of my code:
import asyncio
from datetime import datetime
import httpx
from utils import store_data, get_proxy_addr
CLIENT = None
async def get_and_store_thing_data(thing):
    t0 = datetime.now()
    res = await CLIENT.get('https://www.placetogetdata.com', params={'thing': thing})
    t1 = datetime.now()
    # It's this line that shows the time is anywhere from 0-30 seconds for the
    # request to return
    print(f'time taken: {t1-t0}')
    data = res.json()
    store_data(data)
    return data
def get_tasks(things):
    tasks = []
    for thing in things:
        task = get_and_store_thing_data(thing)
        tasks.append(task)
    return tasks
async def run_tasks(tasks, proxy_addr):
    global CLIENT
    CLIENT = httpx.AsyncClient(proxies={'https://': proxy_addr})
    try:
        # gather accepts bare coroutines; asyncio.wait no longer does on Python 3.11+
        await asyncio.gather(*tasks)
    finally:
        await CLIENT.aclose()
def run(things):
    proxy_addr = get_proxy_addr()
    tasks = get_tasks(things)
    asyncio.run(run_tasks(tasks, proxy_addr))

ReadTimeout: HTTPSConnectionPool(host='', port=443): Read timed out. (read timeout=10)

I'm scraping a site and sometimes when running the script I get this error:
ReadTimeout: HTTPSConnectionPool(host='...', port=443): Read timed out. (read timeout=10)
My code:
url = 'mysite.com'
all_links_page = []
page_one = requests.get(url, headers=getHeaders(), timeout=10)
sleep(2)
if page_one.status_code == requests.codes.ok:
    soup_one = BeautifulSoup(page_one.content.decode('utf-8'), 'lxml')
    page_links_one = soup_one.select("ul.product_list")
    for links_one in page_links_one:
        for li in links_one.select("li"):
            all_links_page.append(li.a.get("href").strip())
The answers I found were not satisfactory.
Increasing the timeout helped: I set it straight to 120 seconds. It turned out that the response from the server comes within 40 seconds.
Why do you have the timeout parameter in there? I would just eliminate the timeout parameter. You get that error because you set it to 10, which says: if you don't receive a response from the server within 10 seconds, raise an error. So it's not necessarily the server calling you out. If no timeout is specified explicitly, requests does not time out (at least on your end).
page_one = requests.get(url, headers=headers) #< --- don't use the timeout parameter
This exception might occur because of the timeout or the available memory:
The response from the server takes longer than the specified timeout. To solve it, set a higher timeout.
The file you are trying to read is large and the socket buffer is not big enough to handle it. You can try increasing the buffer size based on your machine's capacity.
import urllib3, socket
from urllib3.connection import HTTPConnection
HTTPConnection.default_socket_options = (
    HTTPConnection.default_socket_options + [
        (socket.SOL_SOCKET, socket.SO_SNDBUF, 1000000), # 1 MB in bytes
        (socket.SOL_SOCKET, socket.SO_RCVBUF, 1000000)
    ])

Python - How to capture gevent socket timeout exception

import gevent.monkey
gevent.monkey.patch_socket()
import requests
from gevent.pool import Pool
import socket
urls = ["http://www.iraniansingles.com"]
def check_urls(urls):
    pool = Pool(1)
    for url in urls:
        pool.spawn(fetch, url)
    pool.join()
def fetch(url):
    print(url)
    try:
        resp = requests.get(url, verify=False, timeout=5.0)
        print(resp.status_code)
    except socket.timeout:
        print("SocketTimeout")
check_urls(urls)
If I remove the first 2 lines, my program prints SocketTimeout. But with the monkey patch, my program waits forever.
Can someone tell me how to capture that socket timeout exception with the monkey patch?
The problem was that gevent's default socket timeout is None, so we have to set the default socket timeout manually.
from gevent import socket
socket.setdefaulttimeout(5)

Timeout for python requests.get entire response

I'm gathering statistics on a list of websites and I'm using requests for it for simplicity. Here is my code:
data=[]
websites=['http://google.com', 'http://bbc.co.uk']
for w in websites:
    r = requests.get(w, verify=False)
    data.append( (r.url, len(r.content), r.elapsed.total_seconds(), str([(l.status_code, l.url) for l in r.history]), str(r.headers.items()), str(r.cookies.items())) )
Now, I want requests.get to time out after 10 seconds so the loop doesn't get stuck.
This question has been of interest before too, but none of the answers are clean.
I hear that maybe not using requests is a good idea, but then how should I get the nice things requests offers (the ones in the tuple)?
Set the timeout parameter:
r = requests.get(w, verify=False, timeout=10) # 10 seconds
Changes in version 2.25.1
The code above will cause the call to requests.get() to time out if establishing the connection, or the delay between reads, takes more than ten seconds. See: https://requests.readthedocs.io/en/stable/user/advanced/#timeouts
What about using eventlet? If you want to timeout the request after 10 seconds, even if data is being received, this snippet will work for you:
import requests
import eventlet
eventlet.monkey_patch()
with eventlet.Timeout(10):
    requests.get("http://ipv4.download.thinkbroadband.com/1GB.zip", verify=False)
UPDATE: https://requests.readthedocs.io/en/master/user/advanced/#timeouts
In new version of requests:
If you specify a single value for the timeout, like this:
r = requests.get('https://github.com', timeout=5)
The timeout value will be applied to both the connect and the read timeouts. Specify a tuple if you would like to set the values separately:
r = requests.get('https://github.com', timeout=(3.05, 27))
If the remote server is very slow, you can tell Requests to wait forever for a response, by passing None as a timeout value and then retrieving a cup of coffee.
r = requests.get('https://github.com', timeout=None)
My old (probably outdated) answer (which was posted long time ago):
There are other ways to overcome this problem:
1. Use the TimeoutSauce internal class
From: https://github.com/kennethreitz/requests/issues/1928#issuecomment-35811896
import requests
from requests.adapters import TimeoutSauce
class MyTimeout(TimeoutSauce):
    def __init__(self, *args, **kwargs):
        connect = kwargs.get('connect', 5)
        read = kwargs.get('read', connect)
        super(MyTimeout, self).__init__(connect=connect, read=read)
requests.adapters.TimeoutSauce = MyTimeout
This code should cause us to set the read timeout as equal to the
connect timeout, which is the timeout value you pass on your
Session.get() call. (Note that I haven't actually tested this code, so
it may need some quick debugging, I just wrote it straight into the
GitHub window.)
2. Use a fork of requests from kevinburke: https://github.com/kevinburke/requests/tree/connect-timeout
From its documentation: https://github.com/kevinburke/requests/blob/connect-timeout/docs/user/advanced.rst
If you specify a single value for the timeout, like this:
r = requests.get('https://github.com', timeout=5)
The timeout value will be applied to both the connect and the read
timeouts. Specify a tuple if you would like to set the values
separately:
r = requests.get('https://github.com', timeout=(3.05, 27))
kevinburke has requested it to be merged into the main requests project, but it hasn't been accepted yet.
timeout = int(seconds)
Since requests >= 2.4.0, you can use the timeout argument, i.e:
requests.get('https://duckduckgo.com/', timeout=10)
Note:
timeout is not a time limit on the entire response download; rather,
an exception is raised if the server has not issued a response for
timeout seconds ( more precisely, if no bytes have been received on the
underlying socket for timeout seconds). If no timeout is specified
explicitly, requests do not time out.
To create a timeout you can use signals.
The best way to solve this case is probably to:
1. Set an exception as the handler for the alarm signal.
2. Call the alarm signal with a ten-second delay.
3. Call the function inside a try-except-finally block.
4. The except block is reached if the function timed out.
5. In the finally block you abort the alarm, so it's not signaled later.
Here is some example code:
import signal
from time import sleep
class TimeoutException(Exception):
    """ Simple Exception to be called on timeouts. """
    pass
def _timeout(signum, frame):
    """ Raise an TimeoutException.
    This is intended for use as a signal handler.
    The signum and frame arguments passed to this are ignored.
    """
    # Raise TimeoutException with system default timeout message
    raise TimeoutException()
# Set the handler for the SIGALRM signal:
signal.signal(signal.SIGALRM, _timeout)
# Send the SIGALRM signal in 10 seconds:
signal.alarm(10)
try:
    # Do our code:
    print('This will take 11 seconds...')
    sleep(11)
    print('done!')
except TimeoutException:
    print('It timed out!')
finally:
    # Abort the sending of the SIGALRM signal:
    signal.alarm(0)
There are some caveats to this:
It is not thread-safe: signals are always delivered to the main thread, so you can't put this in any other thread.
There is a slight delay between scheduling the signal and executing the actual code. This means that the example would time out even if it only slept for ten seconds.
But it's all in the standard Python library! Except for the sleep function import, it's only one import. If you are going to use timeouts in many places, you can easily put the TimeoutException, _timeout and the signaling in a function and just call that. Or you can make a decorator and put it on functions; see the answer linked below.
You can also set this up as a "context manager" so you can use it with the with statement:
import signal
class Timeout():
    """ Timeout for use with the `with` statement. """
    class TimeoutException(Exception):
        """ Simple Exception to be called on timeouts. """
        pass
    def _timeout(signum, frame):
        """ Raise an TimeoutException.
        This is intended for use as a signal handler.
        The signum and frame arguments passed to this are ignored.
        """
        raise Timeout.TimeoutException()
    def __init__(self, timeout=10):
        self.timeout = timeout
        signal.signal(signal.SIGALRM, Timeout._timeout)
    def __enter__(self):
        signal.alarm(self.timeout)
    def __exit__(self, exc_type, exc_value, traceback):
        signal.alarm(0)
        return exc_type is Timeout.TimeoutException
# Demonstration:
from time import sleep
print('This is going to take maximum 10 seconds...')
with Timeout(10):
    sleep(15)
    print('No timeout?')
print('Done')
One possible down side with this context manager approach is that you can't know if the code actually timed out or not.
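One way around that downside, sketched here under assumptions rather than taken from the original answer, is to have the context manager record whether it fired. The class name Deadline and its timed_out attribute are hypothetical; the sketch uses signal.setitimer so fractional seconds work, and it shares the caveats above (Unix only, main thread only).

```python
import signal
import time


class Deadline:
    """Context manager that swallows its own timeout but records that it happened."""

    class Expired(Exception):
        """Raised by the SIGALRM handler when the time budget is used up."""

    def __init__(self, seconds):
        self.seconds = seconds
        self.timed_out = False

    def _on_alarm(self, signum, frame):
        raise Deadline.Expired()

    def __enter__(self):
        # Remember the previous handler so it can be restored on exit.
        self._previous = signal.signal(signal.SIGALRM, self._on_alarm)
        signal.setitimer(signal.ITIMER_REAL, self.seconds)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        signal.setitimer(signal.ITIMER_REAL, 0)         # cancel the pending alarm
        signal.signal(signal.SIGALRM, self._previous)   # restore the old handler
        self.timed_out = exc_type is Deadline.Expired
        return self.timed_out                           # suppress only our own exception


with Deadline(0.2) as d:
    time.sleep(1)

print("timed out:", d.timed_out)
```

After the with block, the caller can branch on d.timed_out instead of guessing whether the body ran to completion.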
Sources and recommended reading:
The documentation on signals
This answer on timeouts by @David Narayan. He has organized the above code as a decorator.
Try this request with timeout & error handling:
import requests
try:
    url = "http://google.com"
    r = requests.get(url, timeout=10)
except requests.exceptions.Timeout as e:
    print(e)
The connect timeout is the number of seconds Requests will wait for your client to establish a connection to a remote machine (corresponding to the connect() call on the socket). It's a good practice to set connect timeouts to slightly larger than a multiple of 3, which is the default TCP packet retransmission window.
Once your client has connected to the server and sent the HTTP request, the read timeout starts. It is the number of seconds the client will wait for the server to send a response. (Specifically, it's the number of seconds that the client will wait between bytes sent from the server. In 99.9% of cases, this is the time before the server sends the first byte.)
If you specify a single value for the timeout, it will be applied to both the connect and the read timeouts, as below:
r = requests.get('https://github.com', timeout=5)
Specify a tuple if you would like to set the values separately for connect and read:
r = requests.get('https://github.com', timeout=(3.05, 27))
If the remote server is very slow, you can tell Requests to wait forever for a response, by passing None as a timeout value and then retrieving a cup of coffee.
r = requests.get('https://github.com', timeout=None)
https://docs.python-requests.org/en/latest/user/advanced/#timeouts
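To make the two phases concrete without depending on Requests itself, here is a minimal standard-library sketch. The helper names silent_server and probe are hypothetical, and the local server that accepts a connection and then stays silent is only a stand-in for a slow remote host. The connect timeout bounds create_connection(), and the read timeout bounds each recv() afterwards.

```python
import socket
import threading


def silent_server():
    """Hypothetical test server: accepts one connection and never replies."""
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))          # port 0 lets the OS pick a free port
    srv.listen(1)

    def serve():
        conn, _ = srv.accept()
        threading.Event().wait(2)       # hold the connection open, send nothing
        conn.close()
        srv.close()

    threading.Thread(target=serve, daemon=True).start()
    return srv.getsockname()[1]


def probe(host, port, connect_timeout, read_timeout):
    """Return the phase that failed: 'connect', 'read', or 'ok'."""
    try:
        sock = socket.create_connection((host, port), timeout=connect_timeout)
    except OSError:                     # includes socket.timeout and refused connections
        return "connect"
    with sock:
        sock.settimeout(read_timeout)   # from here on, bounds each recv() call
        sock.sendall(b"GET / HTTP/1.0\r\n\r\n")
        try:
            sock.recv(1024)
        except socket.timeout:
            return "read"
    return "ok"


port = silent_server()
result = probe("127.0.0.1", port, connect_timeout=3.05, read_timeout=0.3)
print(result)
```

Against this silent server the connection succeeds immediately and the failure is reported in the read phase, which is the situation the tuple form timeout=(connect, read) lets you tune separately.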
Most other answers are incorrect
Despite all the answers, I believe that this thread still lacks a proper solution and no existing answer presents a reasonable way to do something which should be simple and obvious.
Let's start by saying that as of 2022, there is still absolutely no way to do it properly with requests alone. It is a conscious design decision by the library's developers.
Solutions utilizing the timeout parameter simply do not accomplish what they intend to do. The fact that it "seems" to work at first glance is purely incidental:
The timeout parameter has absolutely nothing to do with the total execution time of the request. It merely controls the maximum amount of time that can pass before underlying socket receives any data. With an example timeout of 5 seconds, server can just as well send 1 byte of data every 4 seconds and it will be perfectly okay, but won't help you very much.
Answers with stream and iter_content are somewhat better, but they still do not cover everything in a request. You do not actually receive anything from iter_content until after response headers are sent, which falls under the same issue - even if you use 1 byte as a chunk size for iter_content, reading full response headers could take a totally arbitrary amount of time and you can never actually get to the point in which you read any response body from iter_content.
Here are some examples that completely break both timeout and stream-based approach. Try them all. They all hang indefinitely, no matter which method you use.
server.py
import socket
import time
server = socket.socket()
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, True)
server.bind(('127.0.0.1', 8080))
server.listen()
while True:
    try:
        sock, addr = server.accept()
        print('Connection from', addr)
        sock.send(b'HTTP/1.1 200 OK\r\n')
        # Send some garbage headers very slowly but steadily.
        # Never actually complete the response.
        while True:
            sock.send(b'a')
            time.sleep(1)
    except:
        pass
demo1.py
import requests
requests.get('http://localhost:8080')
demo2.py
import requests
requests.get('http://localhost:8080', timeout=5)
demo3.py
import requests
requests.get('http://localhost:8080', timeout=(5, 5))
demo4.py
import requests
with requests.get('http://localhost:8080', timeout=(5, 5), stream=True) as res:
    for chunk in res.iter_content(1):
        break
The proper solution
My approach utilizes Python's sys.settrace function. It is dead simple. You do not need to use any external libraries or turn your code upside down. Unlike most other answers, this actually guarantees that the code executes in specified time. Be aware that you still need to specify the timeout parameter, as settrace only concerns Python code. Actual socket reads are external syscalls which are not covered by settrace, but are covered by the timeout parameter. Due to this fact, the exact time limit is not TOTAL_TIMEOUT, but a value which is explained in comments below.
import requests
import sys
import time
# This function serves as a "hook" that executes for each Python statement
# down the road. There may be some performance penalty, but as downloading
# a webpage is mostly I/O bound, it's not going to be significant.
def trace_function(frame, event, arg):
    if time.time() - start > TOTAL_TIMEOUT:
        raise Exception('Timed out!') # Use whatever exception you consider appropriate.
    return trace_function
# The following code will terminate at most after TOTAL_TIMEOUT + the highest
# value specified in `timeout` parameter of `requests.get`.
# In this case 10 + 6 = 16 seconds.
# For most cases though, it's gonna terminate no later than TOTAL_TIMEOUT.
TOTAL_TIMEOUT = 10
start = time.time()
sys.settrace(trace_function)
try:
    res = requests.get('http://localhost:8080', timeout=(3, 6)) # Use whatever timeout values you consider appropriate.
except:
    raise
finally:
    sys.settrace(None) # Remove the time constraint and continue normally.
# Do something with the response
Condensed
import requests, sys, time
TOTAL_TIMEOUT = 10
def trace_function(frame, event, arg):
    if time.time() - start > TOTAL_TIMEOUT:
        raise Exception('Timed out!')
    return trace_function
start = time.time()
sys.settrace(trace_function)
try:
    res = requests.get('http://localhost:8080', timeout=(3, 6))
except:
    raise
finally:
    sys.settrace(None)
That's it!
Set stream=True and use r.iter_content(1024). Yes, eventlet.Timeout just somehow doesn't work for me.
from time import time
from requests import get, exceptions
try:
    start = time()
    timeout = 5
    with get(config['source']['online'], stream=True, timeout=timeout) as r:
        r.raise_for_status()
        content = bytes()
        content_gen = r.iter_content(1024)
        while True:
            if time()-start > timeout:
                raise TimeoutError('Time out! ({} seconds)'.format(timeout))
            try:
                content += next(content_gen)
            except StopIteration:
                break
    data = content.decode().split('\n')
    if len(data) in [0, 1]:
        raise ValueError('Bad requests data')
except (exceptions.RequestException, ValueError, IndexError, KeyboardInterrupt,
        TimeoutError) as e:
    print(e)
    with open(config['source']['local']) as f:
        data = [line.strip() for line in f.readlines()]
The discussion is here https://redd.it/80kp1h
This may be overkill, but the Celery distributed task queue has good support for timeouts.
In particular, you can define a soft time limit that just raises an exception in your process (so you can clean up) and/or a hard time limit that terminates the task when the time limit has been exceeded.
Under the covers, this uses the same signals approach as referenced in your "before" post, but in a more usable and manageable way. And if the list of web sites you are monitoring is long, you might benefit from its primary feature -- all kinds of ways to manage the execution of a large number of tasks.
I believe you can use multiprocessing and not depend on a 3rd party package:
import multiprocessing
import requests
def call_with_timeout(func, args, kwargs, timeout):
    manager = multiprocessing.Manager()
    return_dict = manager.dict()
    # define a wrapper of `return_dict` to store the result.
    def function(return_dict):
        return_dict['value'] = func(*args, **kwargs)
    p = multiprocessing.Process(target=function, args=(return_dict,))
    p.start()
    # Force a max. `timeout` or wait for the process to finish
    p.join(timeout)
    # If the process is still alive, it didn't finish: raise TimeoutError
    if p.is_alive():
        p.terminate()
        p.join()
        raise TimeoutError
    else:
        return return_dict['value']
call_with_timeout(requests.get, args=(url,), kwargs={'timeout': 10}, timeout=60)
The timeout passed to kwargs is the timeout to get any response from the server, the argument timeout is the timeout to get the complete response.
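For comparison, here is a thread-based sketch of the same idea using only concurrent.futures. It is not the approach from the answer above, and the name call_with_deadline is hypothetical. The trade-off is important: a thread cannot be killed, so after the deadline the caller stops waiting but the worker keeps running until its own per-read timeout fires. That is why the inner timeout should still be passed.

```python
import concurrent.futures
import time


def call_with_deadline(func, *args, deadline, **kwargs):
    """Run func in a worker thread and stop waiting after `deadline` seconds."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(func, *args, **kwargs)
    try:
        # Raises concurrent.futures.TimeoutError if the result is not ready in time.
        return future.result(timeout=deadline)
    finally:
        # Do not block on the worker: a timed-out call keeps running in the background.
        pool.shutdown(wait=False)


try:
    call_with_deadline(time.sleep, 0.5, deadline=0.1)
    outcome = "completed"
except concurrent.futures.TimeoutError:
    outcome = "deadline exceeded"

print(outcome)
```

With requests this would be called along the lines of call_with_deadline(requests.get, url, timeout=10, deadline=60), where timeout is forwarded to requests.get and deadline bounds how long the caller waits.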
Despite the question being about requests, I find this very easy to do with pycurl CURLOPT_TIMEOUT or CURLOPT_TIMEOUT_MS.
No threading or signaling required:
import pycurl
import StringIO
import traceback
url = 'http://www.example.com/example.zip'
timeout_ms = 1000
raw = StringIO.StringIO()
c = pycurl.Curl()
c.setopt(pycurl.TIMEOUT_MS, timeout_ms) # total timeout in milliseconds
c.setopt(pycurl.WRITEFUNCTION, raw.write)
c.setopt(pycurl.NOSIGNAL, 1)
c.setopt(pycurl.URL, url)
c.setopt(pycurl.HTTPGET, 1)
try:
    c.perform()
except pycurl.error:
    traceback.print_exc() # error generated on timeout
    pass # or just pass if you don't want to print the error
In case you're using the option stream=True you can do this:
r = requests.get(
    'http://url_to_large_file',
    timeout=1, # relevant only for underlying socket
    stream=True)
with open('/tmp/out_file.txt', 'wb') as f:
    start_time = time.time()
    for chunk in r.iter_content(chunk_size=1024):
        if chunk: # filter out keep-alive new chunks
            f.write(chunk)
        if time.time() - start_time > 8:
            raise Exception('Request took longer than 8s')
The solution does not need signals or multiprocessing.
Just one more solution (got it from http://docs.python-requests.org/en/master/user/advanced/#streaming-uploads)
Before downloading the body you can find out the content size:
TOO_LONG = 10*1024*1024 # 10 Mb
big_url = "http://ipv4.download.thinkbroadband.com/1GB.zip"
r = requests.get(big_url, stream=True)
print (r.headers['content-length'])
# 1073741824
if int(r.headers['content-length']) < TOO_LONG:
    # download content:
    content = r.content
But be careful, a sender can put an incorrect value in the 'content-length' response header.
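Because the header cannot be trusted, a complementary guard is to count the bytes actually received and stop once a cap is exceeded. The sketch below is a hypothetical helper (read_capped is not part of requests); it accepts any iterable of byte chunks, so with a streamed response it could be fed r.iter_content(8192).

```python
def read_capped(chunks, limit):
    """Join byte chunks, raising ValueError once more than `limit` bytes have arrived."""
    received = bytearray()
    for chunk in chunks:
        received.extend(chunk)
        if len(received) > limit:
            raise ValueError("body exceeded {} bytes".format(limit))
    return bytes(received)


# Stand-in for a streamed body: any iterable of bytes objects works.
small_body = read_capped([b"ab", b"cd"], limit=10)
print(small_body)
```

This bounds the download size regardless of what 'content-length' claims; it does not bound the time taken, so it is best combined with one of the deadline techniques above.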
timeout = (connection timeout, data read timeout), or pass a single value (timeout=1):
import requests
try:
    req = requests.request('GET', 'https://www.google.com',timeout=(1,1))
    print(req)
except requests.ReadTimeout:
    print("READ TIME OUT")
This code works for socket errors 11004 and 10060:
# -*- encoding:UTF-8 -*-
__author__ = 'ACE'
import requests
from PyQt4.QtCore import *
from PyQt4.QtGui import *
class TimeOutModel(QThread):
    Existed = pyqtSignal(bool)
    TimeOut = pyqtSignal()
    def __init__(self, fun, timeout=500, parent=None):
        """
        @param fun: function or lambda
        @param timeout: ms
        """
        super(TimeOutModel, self).__init__(parent)
        self.fun = fun
        self.timeer = QTimer(self)
        self.timeer.setInterval(timeout)
        self.timeer.timeout.connect(self.time_timeout)
        self.Existed.connect(self.timeer.stop)
        self.timeer.start()
        self.setTerminationEnabled(True)
    def time_timeout(self):
        self.timeer.stop()
        self.TimeOut.emit()
        self.quit()
        self.terminate()
    def run(self):
        self.fun()
bb = lambda: requests.get("http://ipv4.download.thinkbroadband.com/1GB.zip")
a = QApplication([])
z = TimeOutModel(bb, 500)
print 'timeout'
a.exec_()
Well, I tried many solutions on this page and still faced instabilities, random hangs, and poor connection performance.
I'm now using curl and I'm really happy with its "max time" functionality and with the overall performance, even with such a poor implementation:
content=commands.getoutput('curl -m6 -Ss "http://mywebsite.xyz"')
Here, I defined a 6-second max time parameter, covering both connection and transfer time.
I'm sure curl has a nice Python binding, if you prefer to stick to Pythonic syntax :)
There is a package called timeout-decorator that you can use to time out any python function.
import time
import timeout_decorator
@timeout_decorator.timeout(5)
def mytest():
    print("Start")
    for i in range(1,10):
        time.sleep(1)
        print("{} seconds have passed".format(i))
It uses the signals approach that some answers here suggest. Alternatively, you can tell it to use multiprocessing instead of signals (e.g. if you are in a multi-thread environment).
If it comes to that, create a watchdog thread that messes up requests' internal state after 10 seconds, e.g.:
closes the underlying socket, and ideally
triggers an exception if requests retries the operation
Note that depending on the system libraries you may be unable to set deadline on DNS resolution.
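The watchdog idea can be sketched with plain sockets, leaving requests' internals out of it. Everything here is an illustrative assumption: recv_with_watchdog is a hypothetical helper, and the local listener that never sends anything stands in for a stalled server. A threading.Timer shuts the socket down at the deadline, which makes the blocked recv() return as if the peer had closed the connection.

```python
import socket
import threading
import time


def recv_with_watchdog(sock, deadline):
    """Read until EOF, but let a timer thread shut the socket down at `deadline`."""
    watchdog = threading.Timer(deadline, sock.shutdown, [socket.SHUT_RDWR])
    watchdog.start()
    chunks = []
    try:
        while True:
            data = sock.recv(4096)
            if not data:            # EOF: the peer closed, or the watchdog fired
                break
            chunks.append(data)
    finally:
        watchdog.cancel()           # harmless if the timer has already run
    return b"".join(chunks)


# Demonstration against a local listener that never sends any data.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))     # port 0 lets the OS pick a free port
listener.listen(1)
client = socket.create_connection(listener.getsockname())

started = time.monotonic()
body = recv_with_watchdog(client, deadline=0.3)
elapsed = time.monotonic() - started

client.close()
listener.close()
print(len(body), "bytes after", round(elapsed, 2), "s")
```

The caller cannot tell a watchdog shutdown from a genuine end of stream by the return value alone, so real code would also record that the timer fired.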
I'm using requests 2.2.1 and eventlet didn't work for me. I was able to use gevent's Timeout instead, since gevent is already used in my service for gunicorn.
import gevent
import gevent.monkey
gevent.monkey.patch_all(subprocess=True)
import requests
try:
    with gevent.Timeout(5):
        ret = requests.get(url)
        print ret.status_code, ret.content
except gevent.timeout.Timeout as e:
    print "timeout: {}".format(e.message)
Please note that gevent.timeout.Timeout is not caught by general Exception handling.
So either explicitly catch gevent.timeout.Timeout
or pass in a different exception to be used like so: with gevent.Timeout(5, requests.exceptions.Timeout): although no message is passed when this exception is raised.
The biggest problem is that if the connection can't be established, the requests package waits too long and blocks the rest of the program.
There are several ways how to tackle the problem but when I looked for a oneliner similar to requests, I couldn't find anything. That's why I built a wrapper around requests called reqto ("requests timeout"), which supports proper timeout for all standard methods from requests.
pip install reqto
The syntax is identical to requests
import reqto
response = reqto.get(f'https://pypi.org/pypi/reqto/json',timeout=1)
# Will raise an exception on Timeout
print(response)
Moreover, you can set up a custom timeout function
def custom_function(parameter):
print(parameter)
response = reqto.get(f'https://pypi.org/pypi/reqto/json',timeout=5,timeout_function=custom_function,timeout_args="Timeout custom function called")
#Will call timeout_function instead of raising an exception on Timeout
print(response)
Important note is that the import line
import reqto
needs to be earlier import than all other imports working with requests, threading, etc. due to monkey_patch which runs in the background.
I came up with a more direct solution that is admittedly ugly but fixes the real problem. It goes a bit like this:
resp = requests.get(some_url, stream=True)
resp.raw._fp.fp._sock.settimeout(read_timeout)
# This will load the entire response even though stream is set
content = resp.content
You can read the full explanation here

Read timeout using either urllib2 or any other http library

I have code for reading an url like this:
from urllib2 import Request, urlopen
req = Request(url)
for key, val in headers.items():
    req.add_header(key, val)
res = urlopen(req, timeout = timeout)
# This line blocks
content = res.read()
The timeout works for the urlopen() call. But then the code gets to the res.read() call where I want to read the response data and the timeout isn't applied there. So the read call may hang almost forever waiting for data from the server. The only solution I've found is to use a signal to interrupt the read() which is not suitable for me since I'm using threads.
What other options are there? Is there a HTTP library for Python that handles read timeouts? I've looked at httplib2 and requests and they seem to suffer the same issue as above. I don't want to write my own nonblocking network code using the socket module because I think there should already be a library for this.
Update: None of the solutions below are doing it for me. You can see for yourself that setting the socket or urlopen timeout has no effect when downloading a large file:
from urllib2 import urlopen
url = 'http://iso.linuxquestions.org/download/388/7163/http/se.releases.ubuntu.com/ubuntu-12.04.3-desktop-i386.iso'
c = urlopen(url)
c.read()
At least on Windows with Python 2.7.3, the timeouts are being completely ignored.
It's not possible for any library to do this without using some kind of asynchronous timer through threads or otherwise. The reason is that the timeout parameter used in httplib, urllib2 and other libraries sets the timeout on the underlying socket. And what this actually does is explained in the documentation.
SO_RCVTIMEO
Sets the timeout value that specifies the maximum amount of time an input function waits until it completes. It accepts a timeval structure with the number of seconds and microseconds specifying the limit on how long to wait for an input operation to complete. If a receive operation has blocked for this much time without receiving additional data, it shall return with a partial count or errno set to [EAGAIN] or [EWOULDBLOCK] if no data is received.
The bolded part is key. A socket.timeout is only raised if not a single byte has been received for the duration of the timeout window. In other words, this is a timeout between received bytes.
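This behaviour is easy to observe with a local experiment. The sketch below is illustrative only: trickle_server is a hypothetical helper that sends one byte every 0.1 seconds, and the client uses a 0.5 second socket timeout. The whole transfer takes longer than the timeout, yet no socket.timeout is raised, because each recv() gets a fresh window.

```python
import socket
import threading
import time


def trickle_server(count, interval):
    """Hypothetical test server: sends `count` single bytes, `interval` seconds apart."""
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))       # port 0 lets the OS pick a free port
    srv.listen(1)

    def serve():
        conn, _ = srv.accept()
        with conn:
            for _ in range(count):
                conn.sendall(b"x")
                time.sleep(interval)
        srv.close()

    threading.Thread(target=serve, daemon=True).start()
    return srv.getsockname()[1]


port = trickle_server(count=8, interval=0.1)
sock = socket.create_connection(("127.0.0.1", port))
sock.settimeout(0.5)                 # shorter than the whole transfer

started = time.monotonic()
received = b""
while True:
    data = sock.recv(1024)           # each recv() waits at most 0.5 s on its own
    if not data:                     # empty result means the server closed the connection
        break
    received += data
elapsed = time.monotonic() - started
sock.close()

print(len(received), "bytes in", round(elapsed, 2), "s, no socket.timeout raised")
```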
A simple function using threading.Timer could be as follows.
import httplib
import socket
import threading
def download(host, path, timeout = 10):
    content = None
    http = httplib.HTTPConnection(host)
    http.request('GET', path)
    response = http.getresponse()
    timer = threading.Timer(timeout, http.sock.shutdown, [socket.SHUT_RD])
    timer.start()
    try:
        content = response.read()
    except httplib.IncompleteRead:
        pass
    timer.cancel() # cancel on triggered Timer is safe
    http.close()
    return content
>>> host = 'releases.ubuntu.com'
>>> content = download(host, '/15.04/ubuntu-15.04-desktop-amd64.iso', 1)
>>> print content is None
True
>>> content = download(host, '/15.04/MD5SUMS', 1)
>>> print content is None
False
Other than checking for None, it's also possible to catch the httplib.IncompleteRead exception not inside the function, but outside of it. The latter case will not work though if the HTTP request doesn't have a Content-Length header.
I found in my tests (using the technique described here) that a timeout set in the urlopen() call also affects the read() call:
import urllib2 as u
c = u.urlopen('http://localhost/', timeout=5.0)
s = c.read(1<<20)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/socket.py", line 380, in read
data = self._sock.recv(left)
File "/usr/lib/python2.7/httplib.py", line 561, in read
s = self.fp.read(amt)
File "/usr/lib/python2.7/httplib.py", line 1298, in read
return s + self._file.read(amt - len(s))
File "/usr/lib/python2.7/socket.py", line 380, in read
data = self._sock.recv(left)
socket.timeout: timed out
Maybe it's a feature of newer versions? I'm using Python 2.7 on a 12.04 Ubuntu straight out of the box.
One possible (imperfect) solution is to set the global socket timeout, explained in more detail here:
import socket
import urllib2
# timeout in seconds
socket.setdefaulttimeout(10)
# this call to urllib2.urlopen now uses the default timeout
# we have set in the socket module
req = urllib2.Request('http://www.voidspace.org.uk')
response = urllib2.urlopen(req)
However, this only works if you're willing to globally modify the timeout for all users of the socket module. I'm running the request from within a Celery task, so doing this would mess up timeouts for the Celery worker code itself.
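One way to narrow the blast radius, sketched here as an assumption rather than a fix from the original answer, is to set the default only for the duration of one block and restore the previous value afterwards. The helper name default_socket_timeout is hypothetical. The setting is still process-wide while the block runs, so sockets created by other threads in that window pick it up too.

```python
import contextlib
import socket


@contextlib.contextmanager
def default_socket_timeout(seconds):
    """Temporarily set the module-wide default socket timeout, then restore it."""
    previous = socket.getdefaulttimeout()
    socket.setdefaulttimeout(seconds)
    try:
        yield
    finally:
        socket.setdefaulttimeout(previous)


before = socket.getdefaulttimeout()
with default_socket_timeout(10):
    inside = socket.getdefaulttimeout()   # sockets created here default to 10 seconds
after = socket.getdefaulttimeout()
print(before, inside, after)
```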
I'd be happy to hear any other solutions...
I'd expect this to be a common problem, and yet - no answers to be found anywhere... Just built a solution for this using timeout signal:
import urllib2
import socket
timeout = 10
socket.setdefaulttimeout(timeout)
import time
import signal
def timeout_catcher(signum, _):
    raise urllib2.URLError("Read timeout")
signal.signal(signal.SIGALRM, timeout_catcher)
def safe_read(url, timeout_time):
    signal.setitimer(signal.ITIMER_REAL, timeout_time)
    url = 'http://uberdns.eu'
    content = urllib2.urlopen(url, timeout=timeout_time).read()
    signal.setitimer(signal.ITIMER_REAL, 0)
    # you should also catch any exceptions going out of urlopen here,
    # set the timer to 0, and pass the exceptions on.
    return content
The credit for the signal part of the solution goes here btw: python timer mystery
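The same signal idea can be packaged as a context manager so the timer is always disarmed. A minimal Python 3 sketch follows; ReadTimeout and deadline() are hypothetical names, and it assumes a Unix platform and that the code runs in the main thread, since SIGALRM handlers can only be installed there:

```python
import contextlib
import signal


class ReadTimeout(Exception):
    """Raised when the guarded block runs past its deadline."""


@contextlib.contextmanager
def deadline(seconds):
    """Raise ReadTimeout if the with-block takes longer than `seconds`."""
    def on_alarm(signum, frame):
        raise ReadTimeout("operation exceeded %.2f seconds" % seconds)

    previous = signal.signal(signal.SIGALRM, on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        yield
    finally:
        # Disarm the timer and restore the earlier handler in every case.
        signal.setitimer(signal.ITIMER_REAL, 0)
        if previous is not None:
            signal.signal(signal.SIGALRM, previous)
```

Wrapping both the urlopen() call and the read() in `with deadline(10):` then bounds the total time of the request rather than each socket operation.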
Any asynchronous network library should let you enforce a total timeout on any I/O operation. For example, here's a gevent version:
#!/usr/bin/env python2
import gevent
import gevent.monkey # $ pip install gevent
gevent.monkey.patch_all()
import urllib2
with gevent.Timeout(2): # enforce total timeout
    response = urllib2.urlopen('http://localhost:8000')
    encoding = response.headers.getparam('charset')
    print response.read().decode(encoding)
And here's the asyncio equivalent:
#!/usr/bin/env python3.5
import asyncio
import aiohttp # $ pip install aiohttp
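async def fetch_text(url):
    # aiohttp.get() was removed in later aiohttp releases; use a ClientSession.
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            return await response.text()
text = asyncio.get_event_loop().run_until_complete(
    asyncio.wait_for(fetch_text('http://localhost:8000'), timeout=2))
print(text)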
The test http server is defined here.
The pycurl.TIMEOUT option applies to the whole request:
#!/usr/bin/env python3
"""Test that pycurl.TIMEOUT does limit the total request timeout."""
import sys
import pycurl
timeout = 2 #NOTE: it does limit both the total *connection* and *read* timeouts
c = pycurl.Curl()
c.setopt(pycurl.CONNECTTIMEOUT, timeout)
c.setopt(pycurl.TIMEOUT, timeout)
c.setopt(pycurl.WRITEFUNCTION, sys.stdout.buffer.write)
c.setopt(pycurl.HEADERFUNCTION, sys.stderr.buffer.write)
c.setopt(pycurl.NOSIGNAL, 1)
c.setopt(pycurl.URL, 'http://localhost:8000')
c.setopt(pycurl.HTTPGET, 1)
c.perform()
The code raises the timeout error in ~2 seconds. I've tested the total read timeout with a server that sends the response in multiple chunks, where the delay between chunks is shorter than the timeout:
$ python -mslow_http_server 1
where slow_http_server.py is:
#!/usr/bin/env python
"""Usage: python -mslow_http_server [<read_timeout>]
Return an http response with *read_timeout* seconds between parts.
"""
import time
try:
    from BaseHTTPServer import BaseHTTPRequestHandler, HTTPServer, test
except ImportError: # Python 3
    from http.server import BaseHTTPRequestHandler, HTTPServer, test
def SlowRequestHandlerFactory(read_timeout):
    class HTTPRequestHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            n = 5
            data = b'1\n'
            self.send_response(200)
            self.send_header("Content-type", "text/plain; charset=utf-8")
            self.send_header("Content-Length", n*len(data))
            self.end_headers()
            for i in range(n):
                self.wfile.write(data)
                self.wfile.flush()
                time.sleep(read_timeout)
    return HTTPRequestHandler
if __name__ == "__main__":
    import sys
    read_timeout = int(sys.argv[1]) if len(sys.argv) > 1 else 5
    test(HandlerClass=SlowRequestHandlerFactory(read_timeout),
         ServerClass=HTTPServer)
I've tested the total connection timeout with http://google.com:22222.
This isn't the behavior I see. I get a URLError when the call times out:
from urllib2 import Request, urlopen
req = Request('http://www.google.com')
res = urlopen(req,timeout=0.000001)
# Traceback (most recent call last):
# File "<stdin>", line 1, in <module>
# ...
# raise URLError(err)
# urllib2.URLError: <urlopen error timed out>
Can't you catch this error and then avoid trying to read res?
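When I try to use res.read() after this I get NameError: name 'res' is not defined. Is something like this what you need?
try:
    res = urlopen(req,timeout=3.0)
except Exception:
    # urlopen() failed or timed out, so res was never assigned
    print 'Doh!'
else:
    # only read when urlopen() actually returned a response
    print 'yay!'
    print res.read()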
I suppose the way to implement a timeout manually is via multiprocessing, no? If the job hasn't finished you can terminate it.
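A rough Python 3 sketch of that multiprocessing idea follows. run_with_timeout() and _worker() are hypothetical names, and the "fork" start method is an assumption that limits this to Unix. The job runs in a child process and is terminated if no result arrives in time:

```python
import multiprocessing


def _worker(conn, func, args):
    """Run func(*args) in the child and send the outcome back over the pipe."""
    try:
        conn.send((True, func(*args)))
    except Exception as exc:
        conn.send((False, repr(exc)))
    finally:
        conn.close()


def run_with_timeout(func, args=(), timeout=5.0):
    """Return func(*args), or raise TimeoutError after `timeout` seconds."""
    ctx = multiprocessing.get_context("fork")  # assumption: Unix only
    parent_conn, child_conn = ctx.Pipe(duplex=False)
    proc = ctx.Process(target=_worker, args=(child_conn, func, args))
    proc.start()
    child_conn.close()  # the parent only reads
    try:
        if not parent_conn.poll(timeout):
            raise TimeoutError("job did not finish within %s seconds" % timeout)
        ok, value = parent_conn.recv()
        if not ok:
            raise RuntimeError(value)
        return value
    finally:
        # Kill the job if it is still running, then reap the child.
        if proc.is_alive():
            proc.terminate()
        proc.join()
        parent_conn.close()
```

The job passed in would be whatever performs the urlopen() and read(). The result has to be picklable to travel back through the pipe, and starting a process per request is fairly heavy.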
Had the same issue with socket timeout on the read statement. What worked for me was putting both the urlopen and the read inside a try statement. Hope this helps!
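In Python 3 terms, putting both calls inside one try statement could look like the sketch below. fetch_or_none() is a made-up name, and returning None on failure is just one possible policy:

```python
import http.client
import socket
import urllib.error
import urllib.request


def fetch_or_none(url, timeout):
    """Open the URL and read the body; return None if either step fails."""
    try:
        # Both the connection and the read sit inside the same try block,
        # so a timeout in either place is handled the same way.
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except (urllib.error.URLError, socket.timeout,
            http.client.IncompleteRead):
        return None
```

A connect failure typically surfaces as URLError, while a timeout during the read surfaces as socket.timeout, which is why both are listed.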
