Python - How to capture gevent socket timeout exception

import gevent.monkey
gevent.monkey.patch_socket()
import requests
from gevent.pool import Pool
import socket

urls = ["http://www.iraniansingles.com"]

def check_urls(urls):
    pool = Pool(1)
    for url in urls:
        pool.spawn(fetch, url)
    pool.join()

def fetch(url):
    print url
    try:
        resp = requests.get(url, verify=False, timeout=5.0)
        print resp.status_code
    except socket.timeout:
        print "SocketTimeout"

check_urls(urls)
If I remove the first two lines, my program prints SocketTimeout. But with the monkey patch, my program waits forever.
Can someone tell me how to capture that socket timeout exception with the monkey patch in place?

The problem was that gevent's default timeout is set to None, so we have to set the default socket timeout manually:
from gevent import socket
socket.setdefaulttimeout(5)
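A sketch of how that fix slots into the script above (my own integration, hedged: under the monkey patch the timeout may surface wrapped in a requests exception rather than as a bare socket.timeout, so it is safest to catch both):

import gevent.monkey
gevent.monkey.patch_socket()

from gevent import socket
socket.setdefaulttimeout(5)   # gevent's default is None, so set one explicitly

import requests

def fetch(url):
    try:
        resp = requests.get(url, verify=False, timeout=5.0)
        print resp.status_code
    except (socket.timeout, requests.exceptions.RequestException) as e:
        # the raw socket error or the requests wrapper, depending on where it fires
        print "SocketTimeout/error: %s" % e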

Related

Timeout for requests.post not working in Python

I have a Python script using requests.post:
try:
    r = requests.post(url, json=data, timeout=10)
except requests.Timeout:
    print("timeout")
(I have also tried with except Timeout: and except requests.exceptions.Timeout)
This code should print "Timeout" after around 10 seconds if the server is down, right?
However, it doesn't. My script waits indefinitely, as if timeout were None.
Do you know why?
Thanks
EDIT
Here is the whole code:
import requests
from twisted.internet import task, reactor
import json
import sys
import os

timeout = 30  # 30 sec interval for the PostParams loop
url = os.getenv('URL', "http://127.0.0.1:5000")

def PostParams():
    # Subscribe to MQTT broker
    data = subscribing_broker()
    # Iterate over the JSON payload to get the different units
    for unit in data:
        try:
            # Make the POST request - data as JSON - blocking call - timeout 10s
            req_result = requests.post(url, json=unit, timeout=10)
            # Get the answer as JSON
            pred = req_result.json()
            # Publish to MQTT
            publishing_broker(pred, pub_topic)
        # Connection timeout, continue
        except Timeout:
            print("Connection timed out, passing")
            pass

# Infinite loop - runs every 30 sec
loop = task.LoopingCall(PostParams)
loop.start(timeout)
reactor.run()
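Worth noting (my observation, not from the thread): requests only raises requests.exceptions.Timeout for connect/read timeouts; a server that is down and actively refusing connections surfaces as requests.exceptions.ConnectionError, which an except Timeout clause never catches. A hedged sketch of the POST wrapped to handle both, with the exception classes imported explicitly (post_unit is a hypothetical helper, not from the original code):

import requests
from requests.exceptions import Timeout, ConnectionError

def post_unit(url, unit):
    # hypothetical helper for illustration only
    try:
        # (connect timeout, read timeout) - both phases are bounded
        return requests.post(url, json=unit, timeout=(5, 10)).json()
    except Timeout:
        print("Connection timed out, passing")
    except ConnectionError as e:
        print("Connection failed: %s" % e)
    return None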

Exception Handling with requests_futures in python

I am trying to use requests_futures (https://github.com/ross/requests-futures) for asynchronous requests, which seems to work fine. The only problem is that it doesn't throw any exceptions for me (e.g. a Timeout exception). The code I used is:
from concurrent.futures import ThreadPoolExecutor
from requests_futures.sessions import FuturesSession

session = FuturesSession(executor=ThreadPoolExecutor(max_workers=10))

def callback(sess, resp):
    # Print the IP address in the callback
    print 'IP', resp.text

proxy = {'http': 'http://176.194.189.57:8080'}
try:
    future = session.get('http://api.ipify.org', background_callback=callback, timeout=5, proxies=proxy)
except Exception as e:
    print "Error %s" % e
# future2 = session.get('http://api.ipify.org', background_callback=callback, timeout=5)
The first session.get() should throw an Exception as it isn't a valid proxy.
For the exception to be raised, you have to call the result() method of the future object you just created.
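A minimal sketch of that, reusing the session, callback, and proxy from the question: result() blocks until the request finishes and re-raises any exception (proxy error, timeout) that occurred in the worker thread.

future = session.get('http://api.ipify.org',
                     background_callback=callback,
                     timeout=5, proxies=proxy)
try:
    response = future.result()  # re-raises the ProxyError / Timeout from the worker
    print 'Status', response.status_code
except Exception as e:
    print "Error %s" % e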

Async http request with python3

Is there any way to do async in Python 3 the way node.js does?
I want a minimal example; I've tried the code below, but it still runs synchronously.
import urllib.request

class MyHandler(urllib.request.HTTPHandler):
    @staticmethod
    def http_response(request, response):
        print(response.code)
        return response

opener = urllib.request.build_opener(MyHandler())
try:
    opener.open('http://www.google.com/')
    print('exit')
except Exception as e:
    print(e)
If the async mode works, the print('exit') should display first.
Can anyone help?
Using threading (based on your own code):
import urllib.request
import threading

class MyHandler(urllib.request.HTTPHandler):
    @staticmethod
    def http_response(request, response):
        print(response.code)
        return response

opener = urllib.request.build_opener(MyHandler())
try:
    thread = threading.Thread(target=opener.open, args=('http://www.google.com',))
    thread.start()  # begin thread execution
    print('exit')
    # other program actions
    thread.join()  # ensure the thread has finished before the program terminates
except Exception as e:
    print(e)
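A variant of the same idea using concurrent.futures (my own sketch, not part of the answer), which also makes it easy to get the response object back and to surface any timeout or URLError:

import urllib.request
from concurrent.futures import ThreadPoolExecutor

opener = urllib.request.build_opener()

with ThreadPoolExecutor(max_workers=1) as executor:
    # opener.open(url, data=None, timeout=5) runs in the pool thread
    future = executor.submit(opener.open, 'http://www.google.com', None, 5)
    print('exit')               # printed while the request is still in flight
    response = future.result()  # blocks here; re-raises URLError / timeout
    print(response.code)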

urlopen with timeout fails behind proxy

Python 2.7.3 under Linux: I'm getting strange behaviour when trying to use the timeout parameter:
from urllib2 import urlopen, Request, HTTPError, URLError

url = "http://speedtest.website-solution.net/speedtest/random350x350.jpg"
try:
    #f = urlopen(url, timeout=30)  # never works - always times out
    f = urlopen(url)  # always works fine, returns after < 2 secs
    print("opened")
    f.close()
    print("closed")
except IOError as e:
    print(e)
    pass
EDIT:
Digging into this more, it seems to be lower level; the following code has the same issue:
import socket

s = socket.socket()
s.settimeout(30)
s.connect(("speedtest.website-solution.net", 80))  # times out
print("opened socket")
s.close()
It's running behind a SOCKS proxy, launched with tsocks python test.py. Could that be interfering with the socket timeout for some reason? It seems strange that timeout=None works fine, though.
OK, figured it out. This is indeed related to the proxy. No idea why, but the following code seems to fix it:
Source: https://code.google.com/p/socksipy-branch/
Put this at the start of the code:
import urllib2
from urllib2 import urlopen, Request, HTTPError, URLError
import httplib
import socks
import socket
socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, "192.168.56.1", 101)
socks.wrapmodule(urllib2)
Now everything works fine..
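For reference, the current PySocks fork spells the same setup slightly differently; a hedged sketch of my own, not from the original answer (same proxy address, plus an explicit socket timeout):

import socket
import socks  # pip install PySocks

# route all new sockets through the SOCKS5 proxy, then set the timeout as usual
socks.set_default_proxy(socks.SOCKS5, "192.168.56.1", 101)
socket.socket = socks.socksocket
socket.setdefaulttimeout(30)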

Read timeout using either urllib2 or any other http library

I have code for reading a URL like this:
from urllib2 import Request, urlopen

req = Request(url)
for key, val in headers.items():
    req.add_header(key, val)
res = urlopen(req, timeout=timeout)
# This line blocks
content = res.read()
The timeout works for the urlopen() call. But then the code gets to the res.read() call, where I want to read the response data, and the timeout isn't applied there. So the read call may hang almost forever waiting for data from the server. The only solution I've found is to use a signal to interrupt the read(), which is not suitable for me since I'm using threads.
What other options are there? Is there an HTTP library for Python that handles read timeouts? I've looked at httplib2 and requests and they seem to suffer from the same issue. I don't want to write my own non-blocking network code using the socket module because I think there should already be a library for this.
Update: None of the solutions below are doing it for me. You can see for yourself that setting the socket or urlopen timeout has no effect when downloading a large file:
from urllib2 import urlopen
url = 'http://iso.linuxquestions.org/download/388/7163/http/se.releases.ubuntu.com/ubuntu-12.04.3-desktop-i386.iso'
c = urlopen(url)
c.read()
At least on Windows with Python 2.7.3, the timeouts are being completely ignored.
It's not possible for any library to do this without using some kind of asynchronous timer through threads or otherwise. The reason is that the timeout parameter used in httplib, urllib2 and other libraries sets the timeout on the underlying socket. And what this actually does is explained in the documentation.
SO_RCVTIMEO
Sets the timeout value that specifies the maximum amount of time an input function waits until it completes. It accepts a timeval structure with the number of seconds and microseconds specifying the limit on how long to wait for an input operation to complete. If a receive operation has blocked for this much time without receiving additional data, it shall return with a partial count or errno set to [EAGAIN] or [EWOULDBLOCK] if no data is received.
The key phrase is "without receiving additional data": a socket.timeout is only raised if not a single byte has been received for the duration of the timeout window. In other words, this is a timeout between received bytes.
A simple function using threading.Timer could be as follows.
import httplib
import socket
import threading

def download(host, path, timeout=10):
    content = None
    http = httplib.HTTPConnection(host)
    http.request('GET', path)
    response = http.getresponse()
    timer = threading.Timer(timeout, http.sock.shutdown, [socket.SHUT_RD])
    timer.start()
    try:
        content = response.read()
    except httplib.IncompleteRead:
        pass
    timer.cancel()  # cancel on a triggered Timer is safe
    http.close()
    return content

>>> host = 'releases.ubuntu.com'
>>> content = download(host, '/15.04/ubuntu-15.04-desktop-amd64.iso', 1)
>>> print content is None
True
>>> content = download(host, '/15.04/MD5SUMS', 1)
>>> print content is None
False
Other than checking for None, it's also possible to catch the httplib.IncompleteRead exception outside of the function rather than inside it. That approach will not work, though, if the HTTP response doesn't have a Content-Length header.
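A sketch of that caller-side variant (my own illustration; download_raising is a hypothetical copy of download() with the internal try/except removed so the exception propagates):

try:
    content = download_raising(host, '/15.04/ubuntu-15.04-desktop-amd64.iso', 1)
except httplib.IncompleteRead as e:
    content = e.partial  # IncompleteRead carries whatever bytes arrived before the shutdown
    print 'read timed out after %d bytes' % len(content)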
I found in my tests (using the technique described here) that a timeout set in the urlopen() call also affects the read() call:
import urllib2 as u
c = u.urlopen('http://localhost/', timeout=5.0)
s = c.read(1<<20)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/socket.py", line 380, in read
    data = self._sock.recv(left)
  File "/usr/lib/python2.7/httplib.py", line 561, in read
    s = self.fp.read(amt)
  File "/usr/lib/python2.7/httplib.py", line 1298, in read
    return s + self._file.read(amt - len(s))
  File "/usr/lib/python2.7/socket.py", line 380, in read
    data = self._sock.recv(left)
socket.timeout: timed out
Maybe it's a feature of newer versions? I'm using Python 2.7 on a 12.04 Ubuntu straight out of the box.
One possible (imperfect) solution is to set the global socket timeout, explained in more detail here:
import socket
import urllib2
# timeout in seconds
socket.setdefaulttimeout(10)
# this call to urllib2.urlopen now uses the default timeout
# we have set in the socket module
req = urllib2.Request('http://www.voidspace.org.uk')
response = urllib2.urlopen(req)
However, this only works if you're willing to globally modify the timeout for all users of the socket module. I'm running the request from within a Celery task, so doing this would mess up timeouts for the Celery worker code itself.
I'd be happy to hear any other solutions...
I'd expect this to be a common problem, and yet no answers are to be found anywhere. I just built a solution for this using a timeout signal:
import urllib2
import socket

timeout = 10
socket.setdefaulttimeout(timeout)

import time
import signal

def timeout_catcher(signum, _):
    raise urllib2.URLError("Read timeout")

signal.signal(signal.SIGALRM, timeout_catcher)

def safe_read(url, timeout_time):
    signal.setitimer(signal.ITIMER_REAL, timeout_time)
    content = urllib2.urlopen(url, timeout=timeout_time).read()
    signal.setitimer(signal.ITIMER_REAL, 0)
    # you should also catch any exceptions coming out of urlopen here,
    # reset the timer to 0, and pass the exceptions on
    return content
The credit for the signal part of the solution goes here btw: python timer mystery
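A sketch of what that comment suggests, with the timer reset moved into a finally block so it always runs (the restructuring is mine, not from the original answer):

def safe_read(url, timeout_time):
    signal.setitimer(signal.ITIMER_REAL, timeout_time)
    try:
        return urllib2.urlopen(url, timeout=timeout_time).read()
    finally:
        # always disarm the timer, whether the read succeeded, timed out,
        # or raised some other urllib2/socket error
        signal.setitimer(signal.ITIMER_REAL, 0)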
Any asynchronous network library should allow you to enforce a total timeout on any I/O operation. For example, here's a gevent example:
#!/usr/bin/env python2
import gevent
import gevent.monkey  # $ pip install gevent
gevent.monkey.patch_all()

import urllib2

with gevent.Timeout(2):  # enforce total timeout
    response = urllib2.urlopen('http://localhost:8000')
    encoding = response.headers.getparam('charset')
    print response.read().decode(encoding)
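To handle the expiry instead of letting it propagate, the Timeout raised by the context manager can be caught like any other exception; a small sketch of my own (assumes the monkey patching above has already run):

try:
    with gevent.Timeout(2):  # raises gevent.Timeout if the block overruns
        response = urllib2.urlopen('http://localhost:8000')
        body = response.read()
except gevent.Timeout:
    body = None
    print 'request exceeded its 2 second budget'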
And here's asyncio equivalent:
#!/usr/bin/env python3.5
import asyncio
import aiohttp  # $ pip install aiohttp

async def fetch_text(url):
    response = await aiohttp.get(url)
    return await response.text()

text = asyncio.get_event_loop().run_until_complete(
    asyncio.wait_for(fetch_text('http://localhost:8000'), timeout=2))
print(text)
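The module-level aiohttp.get() used above was later removed from aiohttp; with a current release the same idea goes through a ClientSession, roughly (a sketch of my own):

import asyncio
import aiohttp

async def fetch_text(url):
    # ClientSession replaces the old module-level aiohttp.get()
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            return await response.text()

text = asyncio.run(
    asyncio.wait_for(fetch_text('http://localhost:8000'), timeout=2))
print(text)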
The test http server is defined here.
pycurl.TIMEOUT option works for the whole request:
#!/usr/bin/env python3
"""Test that pycurl.TIMEOUT does limit the total request timeout."""
import sys
import pycurl
timeout = 2 #NOTE: it does limit both the total *connection* and *read* timeouts
c = pycurl.Curl()
c.setopt(pycurl.CONNECTTIMEOUT, timeout)
c.setopt(pycurl.TIMEOUT, timeout)
c.setopt(pycurl.WRITEFUNCTION, sys.stdout.buffer.write)
c.setopt(pycurl.HEADERFUNCTION, sys.stderr.buffer.write)
c.setopt(pycurl.NOSIGNAL, 1)
c.setopt(pycurl.URL, 'http://localhost:8000')
c.setopt(pycurl.HTTPGET, 1)
c.perform()
The code raises the timeout error in ~2 seconds. I've tested the total read timeout against a server that sends the response in multiple chunks, with the delay between chunks smaller than the timeout (so only a whole-request limit can trigger it):
$ python -mslow_http_server 1
where slow_http_server.py:
#!/usr/bin/env python
"""Usage: python -mslow_http_server [<read_timeout>]

Return an http response with *read_timeout* seconds between parts.
"""
import time

try:
    from BaseHTTPServer import BaseHTTPRequestHandler, HTTPServer, test
except ImportError:  # Python 3
    from http.server import BaseHTTPRequestHandler, HTTPServer, test

def SlowRequestHandlerFactory(read_timeout):
    class HTTPRequestHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            n = 5
            data = b'1\n'
            self.send_response(200)
            self.send_header("Content-type", "text/plain; charset=utf-8")
            self.send_header("Content-Length", n*len(data))
            self.end_headers()
            for i in range(n):
                self.wfile.write(data)
                self.wfile.flush()
                time.sleep(read_timeout)
    return HTTPRequestHandler

if __name__ == "__main__":
    import sys
    read_timeout = int(sys.argv[1]) if len(sys.argv) > 1 else 5
    test(HandlerClass=SlowRequestHandlerFactory(read_timeout),
         ServerClass=HTTPServer)
I've tested the total connection timeout with http://google.com:22222.
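On the error-handling side (my reading of pycurl, hedged): when the time limit is exceeded, perform() raises pycurl.error carrying the libcurl error code, so the timeout can be caught explicitly:

try:
    c.perform()
except pycurl.error as e:
    code, message = e.args
    if code == pycurl.E_OPERATION_TIMEDOUT:
        print('request timed out:', message)
    else:
        raise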
This isn't the behavior I see. I get a URLError when the call times out:
from urllib2 import Request, urlopen

req = Request('http://www.google.com')
res = urlopen(req, timeout=0.000001)
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
#   ...
#   raise URLError(err)
# urllib2.URLError: <urlopen error timed out>
Can't you catch this error and then avoid trying to read res?
When I try to use res.read() after this I get NameError: name 'res' is not defined. Is something like this what you need:
try:
    res = urlopen(req, timeout=3.0)
except:
    print 'Doh!'
finally:
    print 'yay!'
    print res.read()
I suppose the way to implement a timeout manually is via multiprocessing, no? If the job hasn't finished you can terminate it.
Had the same issue with socket timeout on the read statement. What worked for me was putting both the urlopen and the read inside a try statement. Hope this helps!
