Read timeout using either urllib2 or any other http library - python
I have code for reading an url like this:
from urllib2 import Request, urlopen
req = Request(url)
for key, val in headers.items():
    req.add_header(key, val)
res = urlopen(req, timeout = timeout)
# This line blocks
content = res.read()
The timeout works for the urlopen() call. But then the code gets to the res.read() call where I want to read the response data and the timeout isn't applied there. So the read call may hang almost forever waiting for data from the server. The only solution I've found is to use a signal to interrupt the read() which is not suitable for me since I'm using threads.
What other options are there? Is there an HTTP library for Python that handles read timeouts? I've looked at httplib2 and requests and they seem to suffer from the same issue as above. I don't want to write my own nonblocking network code using the socket module because I think there should already be a library for this.
Update: None of the solutions below are doing it for me. You can see for yourself that setting the socket or urlopen timeout has no effect when downloading a large file:
from urllib2 import urlopen
url = 'http://iso.linuxquestions.org/download/388/7163/http/se.releases.ubuntu.com/ubuntu-12.04.3-desktop-i386.iso'
c = urlopen(url)
c.read()
At least on Windows with Python 2.7.3, the timeouts are being completely ignored.
It's not possible for any library to do this without using some kind of asynchronous timer through threads or otherwise. The reason is that the timeout parameter used in httplib, urllib2 and other libraries sets the timeout on the underlying socket. And what this actually does is explained in the documentation.
SO_RCVTIMEO
Sets the timeout value that specifies the maximum amount of time an input function waits until it completes. It accepts a timeval structure with the number of seconds and microseconds specifying the limit on how long to wait for an input operation to complete. If a receive operation has blocked for this much time without receiving additional data, it shall return with a partial count or errno set to [EAGAIN] or [EWOULDBLOCK] if no data is received.
The key condition there is "without receiving additional data". A socket.timeout is only raised if not a single byte has been received for the duration of the timeout window. In other words, this is a timeout between received bytes.
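To see this "timeout between received bytes" behaviour without a real HTTP server, here is a minimal Python 3 sketch (the code in this thread is Python 2). It uses a local socket pair. The names trickle, read_all and demo are made up for illustration, and the delays are arbitrary assumptions:

```python
import socket
import threading
import time


def trickle(sock, payload, delay):
    """Send payload one byte at a time, sleeping `delay` seconds after each byte."""
    for i in range(len(payload)):
        sock.sendall(payload[i:i + 1])
        time.sleep(delay)
    sock.close()


def read_all(sock, per_recv_timeout):
    """Read until EOF. The timeout applies to each recv() call on its own."""
    sock.settimeout(per_recv_timeout)
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:  # empty bytes means the peer closed the connection
            break
        chunks.append(data)
    return b"".join(chunks)


def demo(payload=b"hello", delay=0.2, per_recv_timeout=0.5):
    """Return (body, elapsed_seconds) for a sender that trickles slowly."""
    a, b = socket.socketpair()
    sender = threading.Thread(target=trickle, args=(a, payload, delay))
    sender.start()
    start = time.monotonic()
    body = read_all(b, per_recv_timeout)
    elapsed = time.monotonic() - start
    sender.join()
    b.close()
    return body, elapsed
```

With the default arguments the whole transfer takes longer than per_recv_timeout, yet no socket.timeout is raised, because every individual recv() gets a byte before its own timer expires.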
A simple function using threading.Timer could be as follows.
import httplib
import socket
import threading
def download(host, path, timeout = 10):
    content = None
    http = httplib.HTTPConnection(host)
    http.request('GET', path)
    response = http.getresponse()
    timer = threading.Timer(timeout, http.sock.shutdown, [socket.SHUT_RD])
    timer.start()
    try:
        content = response.read()
    except httplib.IncompleteRead:
        pass
    timer.cancel() # cancel on triggered Timer is safe
    http.close()
    return content
>>> host = 'releases.ubuntu.com'
>>> content = download(host, '/15.04/ubuntu-15.04-desktop-amd64.iso', 1)
>>> print content is None
True
>>> content = download(host, '/15.04/MD5SUMS', 1)
>>> print content is None
False
Other than checking for None, it's also possible to catch the httplib.IncompleteRead exception outside the function rather than inside it. That variant will not work, though, if the HTTP response doesn't have a Content-Length header.
I found in my tests (using the technique described here) that a timeout set in the urlopen() call also affects the read() call:
import urllib2 as u
c = u.urlopen('http://localhost/', timeout=5.0)
s = c.read(1<<20)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/socket.py", line 380, in read
    data = self._sock.recv(left)
  File "/usr/lib/python2.7/httplib.py", line 561, in read
    s = self.fp.read(amt)
  File "/usr/lib/python2.7/httplib.py", line 1298, in read
    return s + self._file.read(amt - len(s))
  File "/usr/lib/python2.7/socket.py", line 380, in read
    data = self._sock.recv(left)
socket.timeout: timed out
Maybe it's a feature of newer versions? I'm using Python 2.7 on a 12.04 Ubuntu straight out of the box.
One possible (imperfect) solution is to set the global socket timeout, explained in more detail here:
import socket
import urllib2
# timeout in seconds
socket.setdefaulttimeout(10)
# this call to urllib2.urlopen now uses the default timeout
# we have set in the socket module
req = urllib2.Request('http://www.voidspace.org.uk')
response = urllib2.urlopen(req)
However, this only works if you're willing to globally modify the timeout for all users of the socket module. I'm running the request from within a Celery task, so doing this would mess up timeouts for the Celery worker code itself.
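One way to narrow the window in which the global default is changed is to set it only around the call and restore it afterwards. The sketch below is Python 3, and temporary_default_timeout is a hypothetical helper name, not part of the socket module. It does not remove the underlying problem: while the block is active the default is still process-wide, so sockets created by other threads in that window inherit it too.

```python
import contextlib
import socket


@contextlib.contextmanager
def temporary_default_timeout(seconds):
    """Set socket's global default timeout for the duration of a with-block."""
    previous = socket.getdefaulttimeout()
    socket.setdefaulttimeout(seconds)
    try:
        yield
    finally:
        # Restore whatever was configured before, even if the body raised.
        socket.setdefaulttimeout(previous)
```

Usage would look like `with temporary_default_timeout(10): ...` wrapped around the urlopen() call.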
I'd be happy to hear any other solutions...
I'd expect this to be a common problem, and yet - no answers to be found anywhere... Just built a solution for this using timeout signal:
import urllib2
import socket
timeout = 10
socket.setdefaulttimeout(timeout)
import time
import signal
def timeout_catcher(signum, _):
    raise urllib2.URLError("Read timeout")
signal.signal(signal.SIGALRM, timeout_catcher)
def safe_read(url, timeout_time):
    # arm a one-shot timer; SIGALRM fires after timeout_time seconds
    signal.setitimer(signal.ITIMER_REAL, timeout_time)
    try:
        return urllib2.urlopen(url, timeout=timeout_time).read()
    finally:
        # always set the timer back to 0, whether the read succeeded or
        # raised; any exception from urlopen is passed on to the caller
        signal.setitimer(signal.ITIMER_REAL, 0)
content = safe_read('http://uberdns.eu', timeout)
The credit for the signal part of the solution goes here btw: python timer mystery
Any asynchronous network library should let you enforce a total timeout on any I/O operation. For example, here's a gevent version:
#!/usr/bin/env python2
import gevent
import gevent.monkey # $ pip install gevent
gevent.monkey.patch_all()
import urllib2
with gevent.Timeout(2): # enforce total timeout
    response = urllib2.urlopen('http://localhost:8000')
    encoding = response.headers.getparam('charset')
    print response.read().decode(encoding)
And here's the asyncio equivalent:
#!/usr/bin/env python3.5
import asyncio
import aiohttp # $ pip install aiohttp
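async def fetch_text(url):
    # aiohttp.get() is gone from current aiohttp releases; use a ClientSession
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            return await response.text()
text = asyncio.get_event_loop().run_until_complete(
    asyncio.wait_for(fetch_text('http://localhost:8000'), timeout=2))
print(text)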
The test http server is defined here.
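The total-timeout behaviour of asyncio.wait_for can be tried without aiohttp or a server. In this stdlib-only sketch, slow_body is a made-up stand-in for a response that arrives in delayed parts, and the function names and delays are assumptions for illustration:

```python
import asyncio


async def slow_body(parts, delay):
    """Pretend to receive `parts`, waiting `delay` seconds before each one."""
    chunks = []
    for part in parts:
        await asyncio.sleep(delay)
        chunks.append(part)
    return b"".join(chunks)


async def fetch_with_total_timeout(parts, delay, total_timeout):
    """Return the joined body, or None if the whole operation ran too long."""
    try:
        return await asyncio.wait_for(slow_body(parts, delay), timeout=total_timeout)
    except asyncio.TimeoutError:
        return None


def run(parts, delay, total_timeout):
    """Synchronous entry point for the sketch."""
    return asyncio.run(fetch_with_total_timeout(parts, delay, total_timeout))
```

The limit here covers the whole coroutine, so a body that keeps trickling in is still cut off once total_timeout has elapsed.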
pycurl.TIMEOUT option works for the whole request:
#!/usr/bin/env python3
"""Test that pycurl.TIMEOUT does limit the total request timeout."""
import sys
import pycurl
timeout = 2 #NOTE: it does limit both the total *connection* and *read* timeouts
c = pycurl.Curl()
c.setopt(pycurl.CONNECTTIMEOUT, timeout)
c.setopt(pycurl.TIMEOUT, timeout)
c.setopt(pycurl.WRITEFUNCTION, sys.stdout.buffer.write)
c.setopt(pycurl.HEADERFUNCTION, sys.stderr.buffer.write)
c.setopt(pycurl.NOSIGNAL, 1)
c.setopt(pycurl.URL, 'http://localhost:8000')
c.setopt(pycurl.HTTPGET, 1)
c.perform()
The code raises the timeout error in ~2 seconds. I've tested the total read timeout against a server that sends the response in multiple chunks, with the delay between chunks shorter than the timeout:
$ python -mslow_http_server 1
where slow_http_server.py:
#!/usr/bin/env python
"""Usage: python -mslow_http_server [<read_timeout>]
Return an http response with *read_timeout* seconds between parts.
"""
import time
try:
    from BaseHTTPServer import BaseHTTPRequestHandler, HTTPServer, test
except ImportError: # Python 3
    from http.server import BaseHTTPRequestHandler, HTTPServer, test
def SlowRequestHandlerFactory(read_timeout):
    class HTTPRequestHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            n = 5
            data = b'1\n'
            self.send_response(200)
            self.send_header("Content-type", "text/plain; charset=utf-8")
            self.send_header("Content-Length", n*len(data))
            self.end_headers()
            for i in range(n):
                self.wfile.write(data)
                self.wfile.flush()
                time.sleep(read_timeout)
    return HTTPRequestHandler
if __name__ == "__main__":
    import sys
    read_timeout = int(sys.argv[1]) if len(sys.argv) > 1 else 5
    test(HandlerClass=SlowRequestHandlerFactory(read_timeout),
         ServerClass=HTTPServer)
I've tested the total connection timeout with http://google.com:22222.
This isn't the behavior I see. I get a URLError when the call times out:
from urllib2 import Request, urlopen
req = Request('http://www.google.com')
res = urlopen(req,timeout=0.000001)
# Traceback (most recent call last):
# File "<stdin>", line 1, in <module>
# ...
# raise URLError(err)
# urllib2.URLError: <urlopen error timed out>
Can't you catch this error and then avoid trying to read res?
When I try to use res.read() after this I get NameError: name 'res' is not defined. Is something like this what you need:
try:
    res = urlopen(req,timeout=3.0)
except Exception:
    print 'Doh!'
else:
    # only read when urlopen() succeeded, so res is always defined here
    print res.read()
finally:
    print 'yay!'
I suppose the way to implement a timeout manually is via multiprocessing, no? If the job hasn't finished you can terminate it.
Had the same issue with socket timeout on the read statement. What worked for me was putting both the urlopen and the read inside a try statement. Hope this helps!
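A Python 3 sketch of that idea follows (urllib.request replaces urllib2 there). The function name fetch is hypothetical. As far as I can tell, a timeout while connecting usually arrives wrapped in URLError, while a timeout while waiting for or reading the response surfaces as a plain socket.timeout, so the sketch catches both. Note this still relies on the per-read socket timeout and is not a limit on the total download time.

```python
import socket
import urllib.error
import urllib.request


def fetch(url, timeout):
    """Return the response body as bytes, or None if the request failed."""
    try:
        # Both the open and the read sit inside the same try block.
        with urllib.request.urlopen(url, timeout=timeout) as res:
            return res.read()
    except (urllib.error.URLError, socket.timeout) as exc:
        print("request failed:", exc)
        return None
```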
Related
Timeout for requests.post not working in Python
I have a python script using requests.post :
try:
    r = request.post(url, json=data, timeout=10)
except requests.Timeout:
    print("timeout")
(I have also tried with except Timeout: and except requests.exceptions.Timeout)
This code should print "Timeout" after around 10 seconds if the server is down, right? However, it doesn't. My script is waiting indefinitely, as if timeout was None.
Do you know why? Thanks
EDIT
Here is the whole code:
import requests
from twisted.internet import task, reactor
import json
import sys
import os
timeout = 30 # 30 sec timeout to loop PostParam
url = os.getenv('URL',"http://127.0.0.1:5000")
def PostParams():
    # Subscribe to MQTT Broker
    data = subscribing_broker()
    # Iterate in the JSON payload to get the different units
    for unit in data:
        try:
            # Make the POST request - data as JSON - blocking call - timeout 10s
            req_result = requests.post(url, json=unit, timeout=10)
            # Get the answer in json
            pred = req_result.json()
            # Publish to MQTT
            publishing_broker(pred, pub_topic )
        # Connection timeout, continue
        except Timeout:
            print("Connection timed out, passing")
            pass
# Infinite loop - Timeout is 30 sec
loop = task.LoopingCall(PostParams)
loop.start(timeout)
reactor.run()
How do I exit a wsgiserver that was started on its own thread?
I have a project that I'm working on where I hope to be able to:
- start a wsgiserver on its own thread
- do stuff (some of which involves interacting with the wsgiserver)
- close the thread
- end the program
I can do the first two steps, but I'm having trouble with the last two. I've provided a simpler version of my project that exhibits the issue I have where I can do the first two steps from above, just not the last two.
A couple questions:
- How do I get the thread to stop the wsgi server?
- Do I just need to pull out the wsgiserver code and start it on its own process?
Some details of my project that may head off some questions:
- My project currently spins up other processes that are intended to talk to my wsgi server. I can spin everything up and get my processes to talk to my server, but I'm not able to get a graceful shutdown.
- This code sample is intended to provide a 'relatively simple' sample that can be more easily reviewed.
- There are remnants of failed attempts at solving this in the code; hopefully, they aren't too distracting.
#Simple echo program
#listens on port 3000 and returns anything posted by http to that port
#installing required libraries
#download/install Microsoft Visual C++ 9.0 for Python
#https://www.microsoft.com/en-us/download/details.aspx?id=44266
#pip install greenlet
#pip install gevent
import sys
import threading
import urllib
import urllib2
import time
import traceback
from gevent.pywsgi import WSGIServer, WSGIHandler
from gevent import socket
server = ""
def request_error(start_response):
    global server
    # Send error to atm - must provide start_response
    start_response('500', [])
    #server.stop()
    return ['']
def handle_transaction(env, start_response):
    global server
    try:
        result = env['wsgi.input'].read()
        print("Received: " + result)
        sys.stdout.flush()
        start_response('200 OK', [])
        if (result.lower()=="exit"):
            #server.stop()
            return result
        else:
            return result
    except:
        return request_error(start_response)
class ErrorCapturingWSGIHandler(WSGIHandler):
    def read_requestline(self):
        result = None
        try:
            result = WSGIHandler.read_requestline(self)
        except:
            protocol_error()
            raise # re-raise error, to not change WSGIHandler functionality
        return result
class ErrorCapturingWSGIServer(WSGIServer):
    handler_class = ErrorCapturingWSGIHandler
def start_server():
    global server
    server = ErrorCapturingWSGIServer(
        ('', 3000), handle_transaction, log=None)
    server.serve_forever()
def main():
    global server
    #start server on it's own thread
    print("Echoing...")
    commandServerThread = threading.Thread(target=start_server)
    commandServerThread.start()
    #now that the server is started, send data
    req = urllib2.Request("http://127.0.0.1:3000", data='ping')
    response = urllib2.urlopen(req)
    reply = response.read()
    print(reply)
    #take a look at the threading info
    print(threading.active_count())
    #try to exit
    req = urllib2.Request("http://127.0.0.1:3000", data='exit')
    response = urllib2.urlopen(req)
    reply = response.read()
    print(reply)
    #Now that I'm done, exit
    #sys.exit(0)
    return
if __name__ == '__main__':
    main()
urlopen with timeout fails behind proxy
python 2.7.3 under linux: getting strange behaviour when trying to use the timeout parameter
from urllib2 import urlopen, Request, HTTPError, URLError
url = "http://speedtest.website-solution.net/speedtest/random350x350.jpg"
try:
    #f = urlopen(url, timeout=30) #never works - always times out
    f = urlopen(url) #always works fine, returns after < 2 secs
    print("opened")
    f.close()
    print("closed")
except IOError as e:
    print(e)
    pass
EDIT: Digging into this more, it seems lower level.. the following code has the same issue:
s = socket.socket()
s.settimeout(30)
s.connect(("speedtest.website-solution.net", 80)) #times out
print("opened socket")
s.close()
It's running behind a socks proxy. Running using tsocks python test.py. Wonder if that can be screwing up the socket timeout for some reason? Seems strange that timeout=None works fine though.
OK.. figured it out. This is indeed related to the proxy. No idea why, but the following code seems to fix it:
Source: https://code.google.com/p/socksipy-branch/
Put this at the start of the code:
import urllib2
from urllib2 import urlopen, Request, HTTPError, URLError
import httplib
import socks
import socket
socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, "192.168.56.1", 101)
socks.wrapmodule(urllib2)
Now everything works fine..
Timeout for python requests.get entire response
I'm gathering statistics on a list of websites and I'm using requests for it for simplicity. Here is my code:
data=[]
websites=['http://google.com', 'http://bbc.co.uk']
for w in websites:
    r= requests.get(w, verify=False)
    data.append( (r.url, len(r.content), r.elapsed.total_seconds(), str([(l.status_code, l.url) for l in r.history]), str(r.headers.items()), str(r.cookies.items())) )
Now, I want requests.get to timeout after 10 seconds so the loop doesn't get stuck.
This question has been of interest before too but none of the answers are clean.
I hear that maybe not using requests is a good idea but then how should I get the nice things requests offer (the ones in the tuple).
Set the timeout parameter:
r = requests.get(w, verify=False, timeout=10) # 10 seconds
Changes in version 2.25.1
The code above will cause the call to requests.get() to time out if the connection or the delay between reads takes more than ten seconds. See: https://requests.readthedocs.io/en/stable/user/advanced/#timeouts
What about using eventlet? If you want to timeout the request after 10 seconds, even if data is being received, this snippet will work for you:
import requests
import eventlet
eventlet.monkey_patch()
with eventlet.Timeout(10):
    requests.get("http://ipv4.download.thinkbroadband.com/1GB.zip", verify=False)
UPDATE: https://requests.readthedocs.io/en/master/user/advanced/#timeouts
In new version of requests:
If you specify a single value for the timeout, like this:
r = requests.get('https://github.com', timeout=5)
The timeout value will be applied to both the connect and the read timeouts. Specify a tuple if you would like to set the values separately:
r = requests.get('https://github.com', timeout=(3.05, 27))
If the remote server is very slow, you can tell Requests to wait forever for a response, by passing None as a timeout value and then retrieving a cup of coffee.
r = requests.get('https://github.com', timeout=None)
My old (probably outdated) answer (which was posted long time ago):
There are other ways to overcome this problem:
1. Use the TimeoutSauce internal class
From: https://github.com/kennethreitz/requests/issues/1928#issuecomment-35811896
import requests
from requests.adapters import TimeoutSauce
class MyTimeout(TimeoutSauce):
    def __init__(self, *args, **kwargs):
        connect = kwargs.get('connect', 5)
        read = kwargs.get('read', connect)
        super(MyTimeout, self).__init__(connect=connect, read=read)
requests.adapters.TimeoutSauce = MyTimeout
This code should cause us to set the read timeout as equal to the connect timeout, which is the timeout value you pass on your Session.get() call. (Note that I haven't actually tested this code, so it may need some quick debugging, I just wrote it straight into the GitHub window.)
2. Use a fork of requests from kevinburke: https://github.com/kevinburke/requests/tree/connect-timeout
From its documentation: https://github.com/kevinburke/requests/blob/connect-timeout/docs/user/advanced.rst
If you specify a single value for the timeout, like this:
r = requests.get('https://github.com', timeout=5)
The timeout value will be applied to both the connect and the read timeouts. Specify a tuple if you would like to set the values separately:
r = requests.get('https://github.com', timeout=(3.05, 27))
kevinburke has requested it to be merged into the main requests project, but it hasn't been accepted yet.
timeout = int(seconds)
Since requests >= 2.4.0, you can use the timeout argument, i.e:
requests.get('https://duckduckgo.com/', timeout=10)
Note: timeout is not a time limit on the entire response download; rather, an exception is raised if the server has not issued a response for timeout seconds (more precisely, if no bytes have been received on the underlying socket for timeout seconds). If no timeout is specified explicitly, requests do not time out.
To create a timeout you can use signals.
The best way to solve this case is probably to
- Set an exception as the handler for the alarm signal
- Call the alarm signal with a ten second delay
- Call the function inside a try-except-finally block.
- The except block is reached if the function timed out.
- In the finally block you abort the alarm, so it's not signaled later.
Here is some example code:
import signal
from time import sleep
class TimeoutException(Exception):
    """ Simple Exception to be called on timeouts. """
    pass
def _timeout(signum, frame):
    """ Raise a TimeoutException.
    This is intended for use as a signal handler.
    The signum and frame arguments passed to this are ignored.
    """
    # Raise TimeoutException with system default timeout message
    raise TimeoutException()
# Set the handler for the SIGALRM signal:
signal.signal(signal.SIGALRM, _timeout)
# Send the SIGALRM signal in 10 seconds:
signal.alarm(10)
try:
    # Do our code:
    print('This will take 11 seconds...')
    sleep(11)
    print('done!')
except TimeoutException:
    print('It timed out!')
finally:
    # Abort the sending of the SIGALRM signal:
    signal.alarm(0)
There are some caveats to this:
- It is not threadsafe, signals are always delivered to the main thread, so you can't put this in any other thread.
- There is a slight delay after the scheduling of the signal and the execution of the actual code. This means that the example would time out even if it only slept for ten seconds.
But, it's all in the standard python library! Except for the sleep function import it's only one import. If you are going to use timeouts in many places, you can easily put the TimeoutException, _timeout and the signaling in a function and just call that. Or you can make a decorator and put it on functions, see the answer linked below.
You can also set this up as a "context manager" so you can use it with the with statement:
import signal
class Timeout():
    """ Timeout for use with the `with` statement. """
    class TimeoutException(Exception):
        """ Simple Exception to be called on timeouts. """
        pass
    def _timeout(signum, frame):
        """ Raise a TimeoutException.
        This is intended for use as a signal handler.
        The signum and frame arguments passed to this are ignored.
        """
        raise Timeout.TimeoutException()
    def __init__(self, timeout=10):
        self.timeout = timeout
        signal.signal(signal.SIGALRM, Timeout._timeout)
    def __enter__(self):
        signal.alarm(self.timeout)
    def __exit__(self, exc_type, exc_value, traceback):
        signal.alarm(0)
        return exc_type is Timeout.TimeoutException
# Demonstration:
from time import sleep
print('This is going to take maximum 10 seconds...')
with Timeout(10):
    sleep(15)
    print('No timeout?')
print('Done')
One possible down side with this context manager approach is that you can't know if the code actually timed out or not.
Sources and recommended reading:
- The documentation on signals
- This answer on timeouts by @David Narayan. He has organized the above code as a decorator.
Try this request with timeout & error handling:
import requests
try:
    url = "http://google.com"
    r = requests.get(url, timeout=10)
except requests.exceptions.Timeout as e:
    print e
The connect timeout is the number of seconds Requests will wait for your client to establish a connection to a remote machine (corresponding to the connect() call on the socket). It's a good practice to set connect timeouts to slightly larger than a multiple of 3, which is the default TCP packet retransmission window.
Once your client has connected to the server and sent the HTTP request, the read timeout starts. It is the number of seconds the client will wait for the server to send a response. (Specifically, it's the number of seconds that the client will wait between bytes sent from the server. In 99.9% of cases, this is the time before the server sends the first byte).
If you specify a single value for the timeout, the value will be applied to both the connect and the read timeouts, like below:
r = requests.get('https://github.com', timeout=5)
Specify a tuple if you would like to set the values separately for connect and read:
r = requests.get('https://github.com', timeout=(3.05, 27))
If the remote server is very slow, you can tell Requests to wait forever for a response, by passing None as a timeout value and then retrieving a cup of coffee.
r = requests.get('https://github.com', timeout=None)
https://docs.python-requests.org/en/latest/user/advanced/#timeouts
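The single-value versus tuple rule described above can be written down as a tiny pure function. This is only an illustration of the documented behaviour, not the code Requests itself uses, and split_timeout is a made-up name:

```python
def split_timeout(timeout):
    """Return (connect, read) following the rule quoted above.

    A single number applies to both phases, a 2-tuple sets them
    separately, and None means "wait forever" for both.
    """
    if timeout is None:
        return (None, None)
    if isinstance(timeout, tuple):
        connect, read = timeout  # raises ValueError unless exactly two items
        return (connect, read)
    return (timeout, timeout)
```

Neither of the two values is a limit on the total download time; the read value is still only the maximum wait between bytes.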
Most other answers are incorrect
Despite all the answers, I believe that this thread still lacks a proper solution and no existing answer presents a reasonable way to do something which should be simple and obvious.
Let's start by saying that as of 2022, there is still absolutely no way to do it properly with requests alone. It is a conscious design decision by the library's developers.
Solutions utilizing the timeout parameter simply do not accomplish what they intend to do. The fact that it "seems" to work at first glance is purely incidental:
The timeout parameter has absolutely nothing to do with the total execution time of the request. It merely controls the maximum amount of time that can pass before the underlying socket receives any data. With an example timeout of 5 seconds, the server can just as well send 1 byte of data every 4 seconds and it will be perfectly okay, but won't help you very much.
Answers with stream and iter_content are somewhat better, but they still do not cover everything in a request. You do not actually receive anything from iter_content until after response headers are sent, which falls under the same issue - even if you use 1 byte as a chunk size for iter_content, reading full response headers could take a totally arbitrary amount of time and you can never actually get to the point in which you read any response body from iter_content.
Here are some examples that completely break both timeout and stream-based approach. Try them all. They all hang indefinitely, no matter which method you use.
server.py
import socket
import time
server = socket.socket()
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, True)
server.bind(('127.0.0.1', 8080))
server.listen()
while True:
    try:
        sock, addr = server.accept()
        print('Connection from', addr)
        sock.send(b'HTTP/1.1 200 OK\r\n')
        # Send some garbage headers very slowly but steadily.
        # Never actually complete the response.
        while True:
            sock.send(b'a')
            time.sleep(1)
    except:
        pass
demo1.py
import requests
requests.get('http://localhost:8080')
demo2.py
import requests
requests.get('http://localhost:8080', timeout=5)
demo3.py
import requests
requests.get('http://localhost:8080', timeout=(5, 5))
demo4.py
import requests
with requests.get('http://localhost:8080', timeout=(5, 5), stream=True) as res:
    for chunk in res.iter_content(1):
        break
The proper solution
My approach utilizes Python's sys.settrace function. It is dead simple. You do not need to use any external libraries or turn your code upside down. Unlike most other answers, this actually guarantees that the code executes in specified time. Be aware that you still need to specify the timeout parameter, as settrace only concerns Python code. Actual socket reads are external syscalls which are not covered by settrace, but are covered by the timeout parameter. Due to this fact, the exact time limit is not TOTAL_TIMEOUT, but a value which is explained in comments below.
import requests
import sys
import time
# This function serves as a "hook" that executes for each Python statement
# down the road. There may be some performance penalty, but as downloading
# a webpage is mostly I/O bound, it's not going to be significant.
def trace_function(frame, event, arg):
    if time.time() - start > TOTAL_TIMEOUT:
        raise Exception('Timed out!') # Use whatever exception you consider appropriate.
    return trace_function
# The following code will terminate at most after TOTAL_TIMEOUT + the highest
# value specified in `timeout` parameter of `requests.get`.
# In this case 10 + 6 = 16 seconds.
# For most cases though, it's gonna terminate no later than TOTAL_TIMEOUT.
TOTAL_TIMEOUT = 10
start = time.time()
sys.settrace(trace_function)
try:
    res = requests.get('http://localhost:8080', timeout=(3, 6)) # Use whatever timeout values you consider appropriate.
except:
    raise
finally:
    sys.settrace(None) # Remove the time constraint and continue normally.
# Do something with the response
Condensed
import requests, sys, time
TOTAL_TIMEOUT = 10
def trace_function(frame, event, arg):
    if time.time() - start > TOTAL_TIMEOUT:
        raise Exception('Timed out!')
    return trace_function
start = time.time()
sys.settrace(trace_function)
try:
    res = requests.get('http://localhost:8080', timeout=(3, 6))
except:
    raise
finally:
    sys.settrace(None)
That's it!
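The same sys.settrace idea can be wrapped in a reusable helper and tried without requests or a server. In this sketch TotalTimeout, call_with_trace_deadline and spin are hypothetical names, and spin is just a pure-Python busy loop standing in for the request. As the answer notes, tracing only sees Python-level events, so a blocking socket read is not interrupted until it returns:

```python
import sys
import time


class TotalTimeout(Exception):
    """Raised by the trace hook once the overall deadline has passed."""


def call_with_trace_deadline(func, total_timeout, *args, **kwargs):
    """Call func(*args, **kwargs) and abort it from a trace hook after total_timeout seconds."""
    deadline = time.monotonic() + total_timeout

    def tracer(frame, event, arg):
        if time.monotonic() > deadline:
            raise TotalTimeout("exceeded %.2f s" % total_timeout)
        return tracer  # keep tracing line events inside this frame

    previous = sys.gettrace()
    sys.settrace(tracer)  # applies to frames entered from now on
    try:
        return func(*args, **kwargs)
    finally:
        sys.settrace(previous)  # restore whatever hook was installed before


def spin(seconds):
    """Busy-wait in pure Python so the trace hook keeps firing."""
    end = time.monotonic() + seconds
    while time.monotonic() < end:
        pass
    return "done"
```

Using time.monotonic() rather than time.time() is a deliberate swap here, so that wall-clock adjustments do not affect the deadline.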
Set stream=True and use r.iter_content(1024). Yes, eventlet.Timeout just somehow doesn't work for me.
try:
    start = time()
    timeout = 5
    with get(config['source']['online'], stream=True, timeout=timeout) as r:
        r.raise_for_status()
        content = bytes()
        content_gen = r.iter_content(1024)
        while True:
            if time()-start > timeout:
                raise TimeoutError('Time out! ({} seconds)'.format(timeout))
            try:
                content += next(content_gen)
            except StopIteration:
                break
        data = content.decode().split('\n')
        if len(data) in [0, 1]:
            raise ValueError('Bad requests data')
except (exceptions.RequestException, ValueError, IndexError, KeyboardInterrupt, TimeoutError) as e:
    print(e)
    with open(config['source']['local']) as f:
        data = [line.strip() for line in f.readlines()]
The discussion is here https://redd.it/80kp1h
This may be overkill, but the Celery distributed task queue has good support for timeouts. In particular, you can define a soft time limit that just raises an exception in your process (so you can clean up) and/or a hard time limit that terminates the task when the time limit has been exceeded. Under the covers, this uses the same signals approach as referenced in your "before" post, but in a more usable and manageable way. And if the list of web sites you are monitoring is long, you might benefit from its primary feature -- all kinds of ways to manage the execution of a large number of tasks.
I believe you can use multiprocessing and not depend on a 3rd party package:
import multiprocessing
import requests
def call_with_timeout(func, args, kwargs, timeout):
    manager = multiprocessing.Manager()
    return_dict = manager.dict()
    # define a wrapper of `return_dict` to store the result.
    def function(return_dict):
        return_dict['value'] = func(*args, **kwargs)
    p = multiprocessing.Process(target=function, args=(return_dict,))
    p.start()
    # Force a max. `timeout` or wait for the process to finish
    p.join(timeout)
    # If the process is still alive, it didn't finish: raise TimeoutError
    if p.is_alive():
        p.terminate()
        p.join()
        raise TimeoutError
    else:
        return return_dict['value']
call_with_timeout(requests.get, args=(url,), kwargs={'timeout': 10}, timeout=60)
The timeout passed to kwargs is the timeout to get any response from the server, the argument timeout is the timeout to get the complete response.
Despite the question being about requests, I find this very easy to do with pycurl CURLOPT_TIMEOUT or CURLOPT_TIMEOUT_MS.
No threading or signaling required:
import traceback
import pycurl
import StringIO
url = 'http://www.example.com/example.zip'
timeout_ms = 1000
raw = StringIO.StringIO()
c = pycurl.Curl()
c.setopt(pycurl.TIMEOUT_MS, timeout_ms) # total timeout in milliseconds
c.setopt(pycurl.WRITEFUNCTION, raw.write)
c.setopt(pycurl.NOSIGNAL, 1)
c.setopt(pycurl.URL, url)
c.setopt(pycurl.HTTPGET, 1)
try:
    c.perform()
except pycurl.error:
    traceback.print_exc() # error generated on timeout
    pass # or just pass if you don't want to print the error
In case you're using the option stream=True you can do this:
import time
import requests
r = requests.get(
    'http://url_to_large_file',
    timeout=1, # relevant only for underlying socket
    stream=True)
with open('/tmp/out_file.txt', 'wb') as f:
    start_time = time.time()
    for chunk in r.iter_content(chunk_size=1024):
        if chunk: # filter out keep-alive new chunks
            f.write(chunk)
        if time.time() - start_time > 8:
            raise Exception('Request took longer than 8s')
The solution does not need signals or multiprocessing.
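The deadline check in that loop can be factored into a small generator that wraps any iterator of chunks, for example the one returned by r.iter_content(). The sketch below is stdlib-only; iter_with_deadline and slow_chunks are made-up names, and slow_chunks merely simulates a slow download. The check only runs between chunks, so a single blocked read is still bounded only by the per-read timeout passed to requests.get.

```python
import time


def iter_with_deadline(chunks, total_timeout):
    """Yield items from `chunks` until `total_timeout` seconds have elapsed."""
    deadline = time.monotonic() + total_timeout
    for chunk in chunks:
        if time.monotonic() > deadline:
            raise TimeoutError("download exceeded %.1f s" % total_timeout)
        yield chunk


def slow_chunks(count, delay):
    """Simulated download: `count` one-byte chunks, `delay` seconds apart."""
    for _ in range(count):
        time.sleep(delay)
        yield b"x"
```

With requests this would be used as `for chunk in iter_with_deadline(r.iter_content(1024), 8): ...`.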
Just one more solution (got it from http://docs.python-requests.org/en/master/user/advanced/#streaming-uploads)
Before downloading the body you can find out the content size:
TOO_LONG = 10*1024*1024 # 10 MB
big_url = "http://ipv4.download.thinkbroadband.com/1GB.zip"
r = requests.get(big_url, stream=True)
print (r.headers['content-length'])
# 1073741824
if int(r.headers['content-length']) < TOO_LONG:
    # download content:
    content = r.content
But be careful, a sender can set up an incorrect value in the 'content-length' response field.
timeout = (connection timeout, data read timeout) or give a single argument (timeout=1)
import requests
try:
    req = requests.request('GET', 'https://www.google.com',timeout=(1,1))
    print(req)
except requests.ReadTimeout:
    print("READ TIME OUT")
This code works for socket errors 11004 and 10060:
# -*- encoding:UTF-8 -*-
__author__ = 'ACE'
import requests
from PyQt4.QtCore import *
from PyQt4.QtGui import *
class TimeOutModel(QThread):
    Existed = pyqtSignal(bool)
    TimeOut = pyqtSignal()
    def __init__(self, fun, timeout=500, parent=None):
        """
        @param fun: function or lambda
        @param timeout: ms
        """
        super(TimeOutModel, self).__init__(parent)
        self.fun = fun
        self.timeer = QTimer(self)
        self.timeer.setInterval(timeout)
        self.timeer.timeout.connect(self.time_timeout)
        self.Existed.connect(self.timeer.stop)
        self.timeer.start()
        self.setTerminationEnabled(True)
    def time_timeout(self):
        self.timeer.stop()
        self.TimeOut.emit()
        self.quit()
        self.terminate()
    def run(self):
        self.fun()
bb = lambda: requests.get("http://ipv4.download.thinkbroadband.com/1GB.zip")
a = QApplication([])
z = TimeOutModel(bb, 500)
print 'timeout'
a.exec_()
Well, I tried many solutions on this page and still faced instabilities, random hangs and poor connection performance. I'm now using curl and I'm really happy with its "max time" functionality and its overall performance, even with such a crude implementation:

    import commands

    content = commands.getoutput('curl -m6 -Ss "http://mywebsite.xyz"')

Here, I set a max time of 6 seconds, covering both connection and transfer time. I'm sure curl has a nice Python binding, if you prefer to stick to Pythonic syntax :)
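The commands module only exists in Python 2. On Python 3, a rough equivalent could use subprocess.run(), whose timeout argument kills the child process if it runs too long. This is a sketch, not a drop-in replacement: run_with_deadline, fetch_with_curl and SLEEPER are hypothetical names, and it assumes curl is on the PATH.

```python
import subprocess
import sys

# Stand-in for a hung transfer, handy for trying the helper without a network.
SLEEPER = [sys.executable, '-c', 'import time; time.sleep(30)']


def run_with_deadline(argv, max_seconds):
    """Run `argv`, returning its stdout as bytes, or None if it ran past `max_seconds`."""
    try:
        done = subprocess.run(argv, stdout=subprocess.PIPE, timeout=max_seconds)
    except subprocess.TimeoutExpired:
        return None  # subprocess.run() has already killed the child
    return done.stdout


def fetch_with_curl(url, max_seconds=6):
    """Hypothetical wrapper: curl enforces --max-time, subprocess adds a safety margin."""
    argv = ['curl', '--max-time', str(max_seconds), '-sS', url]
    return run_with_deadline(argv, max_seconds + 1)
```

Passing the arguments as a list avoids shell quoting problems with unusual URLs.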
There is a package called timeout-decorator that you can use to time out any Python function:

    import time
    import timeout_decorator

    @timeout_decorator.timeout(5)
    def mytest():
        print("Start")
        for i in range(1, 10):
            time.sleep(1)
            print("{} seconds have passed".format(i))

It uses the signals approach that some answers here suggest. Alternatively, you can tell it to use multiprocessing instead of signals (e.g. if you are in a multi-threaded environment).
If it comes to that, create a watchdog thread that messes up requests' internal state after 10 seconds, e.g. one that closes the underlying socket, and ideally triggers an exception if requests retries the operation. Note that, depending on the system libraries, you may be unable to set a deadline on DNS resolution.
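The watchdog idea can be sketched on a bare socket, leaving requests' internals aside. Everything here is an assumption for illustration: recv_with_watchdog is a hypothetical helper, and whether shutdown() from another thread wakes a blocked recv() depends on the platform (it is expected to on Linux).

```python
import socket
import threading


def recv_with_watchdog(sock, max_seconds, bufsize=4096):
    """Blocking recv() that a watchdog thread interrupts after `max_seconds`.

    Returns the received bytes, or b'' if the watchdog fired first.
    """
    def abort():
        try:
            sock.shutdown(socket.SHUT_RDWR)  # wakes the blocked recv()
        except OSError:
            pass  # socket already closed or disconnected

    watchdog = threading.Timer(max_seconds, abort)
    watchdog.daemon = True
    watchdog.start()
    try:
        return sock.recv(bufsize)
    except OSError:
        return b''  # some platforms report the shutdown as an error instead
    finally:
        watchdog.cancel()
```

Note that b'' is also what a normal end of stream looks like, so a real implementation would need a flag to tell the two cases apart.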
I'm using requests 2.2.1 and eventlet didn't work for me. Instead I was able to use gevent's timeout, since gevent is already used in my service for gunicorn.

    import gevent
    import gevent.monkey
    gevent.monkey.patch_all(subprocess=True)

    import requests

    try:
        with gevent.Timeout(5):
            ret = requests.get(url)
            print ret.status_code, ret.content
    except gevent.timeout.Timeout as e:
        print "timeout: {}".format(e.message)

Please note that gevent.timeout.Timeout is not caught by general Exception handling. So either explicitly catch gevent.timeout.Timeout or pass in a different exception to be used, like so:

    with gevent.Timeout(5, requests.exceptions.Timeout):

although no message is passed when this exception is raised.
The biggest problem is that if the connection can't be established, the requests package waits too long and blocks the rest of the program. There are several ways to tackle the problem, but when I looked for a one-liner similar to requests, I couldn't find anything. That's why I built a wrapper around requests called reqto ("requests timeout"), which supports a proper timeout for all standard methods from requests.

    pip install reqto

The syntax is identical to requests:

    import reqto

    response = reqto.get('https://pypi.org/pypi/reqto/json', timeout=1)
    # Will raise an exception on timeout
    print(response)

Moreover, you can set up a custom timeout function:

    def custom_function(parameter):
        print(parameter)

    response = reqto.get('https://pypi.org/pypi/reqto/json', timeout=5, timeout_function=custom_function, timeout_args="Timeout custom function called")
    # Will call timeout_function instead of raising an exception on timeout
    print(response)

An important note: the line import reqto needs to come before all other imports that work with requests, threading, etc., because of the monkey patching that runs in the background.
I came up with a more direct solution that is admittedly ugly but fixes the real problem. It goes a bit like this:

    resp = requests.get(some_url, stream=True)
    resp.raw._fp.fp._sock.settimeout(read_timeout)
    # This will load the entire response even though stream is set
    content = resp.content

You can read the full explanation here
How to handle timeouts with httplib (python 2.6)?
I'm using httplib to access an API over HTTPS and need to build in exception handling in the event that the API is down. Here's an example connection:

    connection = httplib.HTTPSConnection('non-existent-api.com', timeout=1)
    connection.request('POST', '/request.api', xml, headers={'Content-Type': 'text/xml'})
    response = connection.getresponse()

This should time out, so I was expecting an exception to be raised, but response.read() just returns an empty string. How can I know if there was a timeout? Even better, what's the best way to gracefully handle the problem of a 3rd-party API being down?
"Even better, what's the best way to gracefully handle the problem of a 3rd-party api being down?"

What do you mean by the API being down: that it returns HTTP 404, 500, ..., or that it can't be reached at all? First of all, I don't think you can know whether a web service is down before trying to access it, so for the first case I'd recommend something like this:

    import httplib

    conn = httplib.HTTPConnection('www.google.com')  # HTTP rather than HTTPS, to keep it simple
    conn.request('HEAD', '/')  # just send an HTTP HEAD request
    res = conn.getresponse()
    if res.status == 200:
        print "ok"
    else:
        print "problem : the query returned %s because %s" % (res.status, res.reason)

And for checking whether the API is unreachable, I think you're better off with a try/except:

    import httplib
    import socket

    try:
        # I don't think you need the timeout unless you also want to measure the response time ...
        conn = httplib.HTTPSConnection('www.google.com')
        conn.connect()
    except (httplib.HTTPException, socket.error) as ex:
        print "Error: %s" % ex

You can combine the two approaches if you want something more general. Hope this helps.
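To tell a timeout apart from other failures, the timeout exception can be caught separately before the more general errors. Below is a Python 3 sketch of that idea, where httplib is named http.client; probe and its return values are hypothetical, chosen only for this example.

```python
import http.client
import socket


def probe(host, port, timeout):
    """Return 'ok', 'http-error', 'timeout' or 'unreachable' for a HEAD request."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request('HEAD', '/')
        status = conn.getresponse().status
    except socket.timeout:
        # Must come first: socket.timeout is a subclass of OSError.
        return 'timeout'
    except (http.client.HTTPException, OSError):
        return 'unreachable'  # refused, DNS failure, malformed reply, ...
    finally:
        conn.close()
    return 'ok' if status < 400 else 'http-error'
```

A caller could then retry on 'timeout' and fail fast on 'unreachable', or treat both the same, depending on how the 3rd-party API usually misbehaves.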
Older versions of urllib and httplib don't expose a timeout parameter (httplib and urllib2 only gained one in Python 2.6). There you have to import socket and set a default timeout:

    import socket

    socket.setdefaulttimeout(10)  # or whatever timeout you want
This is what I found to work correctly with httplib2. Posting it as it might still help someone:

    import httplib2, socket

    def check_url(url):
        h = httplib2.Http(timeout=0.1)  # 100 ms timeout
        try:
            resp = h.request(url, 'HEAD')
        except (httplib2.HttpLib2Error, socket.error) as ex:
            print "Request timed out for ", url
            return False
        return int(resp[0]['status']) < 400