I have recently run into an issue at work where an internal website intermittently fails to load due to an interrupted system call. We are using urllib2 to access the website. I can't share the exact code, but here is basically how we do it:
import json
import errno
import urllib2

payload = {'userName': user_name,
           'emailAddress': email_address,
           'password': password}
headers = {'Accept': 'application/json',
           'Content-Type': 'application/json',
           'Authorization': token}
values = json.dumps(payload)
req = urllib2.Request(url, values, headers)
while True:  # retry until the request succeeds
    try:
        response = urllib2.urlopen(req, timeout=30)
        break
    except IOError, e:
        if e.errno != errno.EINTR:
            print e.errno
            raise
We log the errno and the raised exception. The exception is:
IOError: <urlopen error [Errno 4] Interrupted system call>
And the errno is None. I expected it to be 4.
Is there a better way to catch this error in Python 2.7? I am aware of PEP 475, but we cannot upgrade to Python 3 right now.
The <urlopen error [Errno 4] Interrupted system call> message indicates that it is actually a URLError from urllib2, which subclasses IOError but handles its arguments completely differently. That is why the errno and strerror attributes are never initialized. urllib2 both passes plain strings as the reason:
raise URLError("qop '%s' is not supported." % qop)
and wraps exceptions from other sources:
try:
    h.request(req.get_method(), req.get_selector(), req.data, headers)
except socket.error, err: # XXX what error?
    h.close()
    raise URLError(err)
This is why you will not find errno in the usual place:
>>> try:
...     urlopen('http://asdf')
... except URLError, e:
...     pass
...
>>> e
URLError(gaierror(-2, 'Name or service not known'),)
>>> e.errno
>>> e.reason
gaierror(-2, 'Name or service not known')
>>> e.reason.errno
-2
That worked in this case, but the reason attribute can also be a plain string or a socket.error, which has (had) its own problems with errno.
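For example, a socket.error raised with a single string argument carries no usable errno either (an illustrative snippet, not from the original post):

>>> import socket
>>> err = socket.error('timed out')
>>> err.errno is None
True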
The definition of URLError in urllib2.py:
class URLError(IOError):
    # URLError is a sub-type of IOError, but it doesn't share any of
    # the implementation. need to override __init__ and __str__.
    # It sets self.args for compatibility with other EnvironmentError
    # subclasses, but args doesn't have the typical format with errno in
    # slot 0 and strerror in slot 1. This may be better than nothing.
    def __init__(self, reason):
        self.args = reason,
        self.reason = reason

    def __str__(self):
        return '<urlopen error %s>' % self.reason
So, long story short, it's a horrible mess. You have to check e.reason for several cases:
- Is it just a string? If so, there will be no errno anywhere.
- Is it a socket.error? Then handle that type's quirks: its errno attribute can also be unset or None, since it too can be raised with a single string argument.
- Is it a subclass of IOError or OSError (which subclass EnvironmentError)? Read its errno attribute and hope for the best.
This can be, and probably is, overly cautious for your case, but it is good to understand the edge cases. Tornado had similar issues and uses a utility function to get the errno out of an exception, but unfortunately that function does not work with URLErrors.
What could cover at least some cases:
while True:  # or some fixed number of retries
    try:
        response = urllib2.urlopen(req, timeout=30)
        break
    except URLError, e:
        if getattr(e.reason, 'errno', None) == errno.EINTR:
            # Retry
            continue
        raise  # not an interrupted system call, so propagate it
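If you also need to cope with the plain-string and socket.error cases from the checklist above, a more defensive extraction helper might look like this (a hypothetical sketch, not part of the original code):

def extract_errno(exc):
    # Best-effort errno extraction from a URLError (or a plain IOError).
    reason = getattr(exc, 'reason', exc)
    if isinstance(reason, basestring):
        return None  # a plain-string reason carries no errno at all
    # socket.error and other EnvironmentError subclasses may expose .errno,
    # but it can still be None if they were raised with a lone string argument.
    return getattr(reason, 'errno', None)

The retry condition above would then become extract_errno(e) == errno.EINTR.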
Related
In case of a connection error, I want Python to wait and retry. Here's the relevant code, where "link" is some URL:
import time
import requests
import urllib.request
import urllib.parse
from random import randint

try:
    r = requests.get(link)
except ConnectionError or TimeoutError:
    print("Will retry again in a little bit")
    time.sleep(randint(2500, 3000))
    r = requests.get(link)
Except I still periodically get a connection error, and I never see the text "Will retry again in a little bit", so I know the code is not retrying. What am I doing wrong? I'm pasting parts of the error output below in case I'm misreading it. TIA!
TimeoutError: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
During handling of the above exception, another exception occurred:
requests.packages.urllib3.exceptions.ProtocolError: ('Connection aborted.', TimeoutError(10060, 'A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond', None, 10060, None))
During handling of the above exception, another exception occurred:
requests.exceptions.ConnectionError: ('Connection aborted.', TimeoutError(10060, 'A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond', None, 10060, None))
For me, using a custom User-Agent in the request fixes this issue, presumably because the server rejects the default python-requests User-Agent. With this method you spoof a real browser.
Works:
url = "https://www.nasdaq.com/market-activity/stocks/amd"
headers = {'User-Agent': 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.6) Gecko/20070802 SeaMonkey/1.1.4'}
response = requests.get(url, headers=headers)
Doesn't work:
url = "https://www.nasdaq.com/market-activity/stocks/amd"
response = requests.get(url)
The second request is not inside a try block so exceptions are not caught. Also in the try-except block you're not catching other exceptions that may occur.
You could use a loop to attempt a connection two times, and break if the request is successful.
for _ in range(2):
    try:
        r = requests.get(link)
        break
    except (ConnectionError, TimeoutError):
        print("Will retry again in a little bit")
    except Exception as e:
        print(e)
    time.sleep(randint(2500, 3000))
I think you should use
except (ConnectionError, TimeoutError) as e:
    print("Will retry again in a little bit")
    time.sleep(randint(2500, 3000))
    r = requests.get(link)
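The tuple matters because except ConnectionError or TimeoutError: first evaluates the expression ConnectionError or TimeoutError. A class object is truthy, so the expression reduces to just ConnectionError, and TimeoutError is never caught:

>>> ConnectionError or TimeoutError
<class 'ConnectionError'>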
See this similar question, or check the docs.
I had the same problem. It turns out that urllib3 relies on socket.py, which raises an OSError, so you need to catch that:
try:
    r = requests.get(link)
except OSError as e:
    print("There was an error: {}".format(e))
I am trying to use requests_futures (https://github.com/ross/requests-futures) for asynchronous requests, which seems to work fine. The only problem is that it doesn't raise any exceptions for me (e.g. a timeout exception). The code I used is:
from concurrent.futures import ThreadPoolExecutor
from requests_futures.sessions import FuturesSession

session = FuturesSession(executor=ThreadPoolExecutor(max_workers=10))

def callback(sess, resp):
    # Print the IP address in the callback
    print 'IP', resp.text

proxy = {'http': 'http://176.194.189.57:8080'}
try:
    future = session.get('http://api.ipify.org', background_callback=callback,
                         timeout=5, proxies=proxy)
except Exception as e:
    print "Error %s" % e
# future2 = session.get('http://api.ipify.org', background_callback=callback, timeout=5)
The first session.get() should throw an exception, since the proxy isn't valid.
For the exception to be raised, you have to call the result() method of the future object you just created; the exception is captured in the future and only re-raised there.
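A minimal sketch, reusing the names from the snippet above:

try:
    response = future.result()  # blocks; re-raises any exception from the worker thread
    print 'Status', response.status_code
except Exception as e:
    print "Error %s" % e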
I want to grab the HTTP status code once it raises a URLError exception:
I tried this, but it didn't help:
except URLError, e:
    logger.warning('It seems like the server is down. Code:' + str(e.code))
You shouldn't check for a status code after catching URLError, since that exception can be raised in situations where there's no HTTP status code available, for example when you're getting connection refused errors.
Use HTTPError to check for HTTP specific errors, and then use URLError to check for other problems:
try:
    urllib2.urlopen(url)
except urllib2.HTTPError, e:
    print e.code
except urllib2.URLError, e:
    print e.args
Of course, you'll probably want to do something more clever than just printing the error codes, but you get the idea.
Not sure why you are getting this error. If you are using urllib2 this should help:
import urllib2
from urllib2 import URLError

try:
    urllib2.urlopen(url)
except URLError, e:
    print e.code  # note: code is only set when this is actually an HTTPError
The error being thrown is:
error: [Errno 110] Connection timed out
I'm not sure what to catch here:
try:
    smtpObj = smtplib.SMTP('smtp.example.com')
    smtpObj.starttls()
    smtpObj.login('user', 'pass')
    smtpObj.sendmail(sender, receivers, message)
    print "Successfully sent email"
except smtplib.SMTPException('Error: unable to send email"'):
    pass
except smtplib.socket.error('Error: could not connect to server'):
    pass
Thanks.
You need to provide the exception class, not an instance thereof. That is to say, the code should look like
try:
    smtpObj = smtplib.SMTP('smtp.example.com')
    smtpObj.starttls()
    smtpObj.login('user', 'pass')
    smtpObj.sendmail(sender, receivers, message)
    print "Successfully sent email"
except smtplib.SMTPException:  # the class itself, not an instance
    pass
except smtplib.socket.error:
    pass
The second exception, smtplib.socket.error, seems to be the applicable one to catch for that error. It is usually accessed directly from the socket module: import socket, then catch socket.error.
Note that I said that was what the code "should" look like, and that's a bit of an exaggeration: when using try/except, you want to include as little code as possible in the try block, especially when you are catching errors as general as socket.error.
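For example, a narrower version might separate connecting from sending; this is just a sketch reusing the names from the question:

import socket
import smtplib

try:
    # Connecting is the step that raises socket.error.
    smtpObj = smtplib.SMTP('smtp.example.com')
except socket.error:
    print "Error: could not connect to server"
else:
    try:
        smtpObj.starttls()
        smtpObj.login('user', 'pass')
        smtpObj.sendmail(sender, receivers, message)
        print "Successfully sent email"
    except smtplib.SMTPException:
        print "Error: unable to send email"
    finally:
        smtpObj.quit()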
I believe socket.error should work but if you post the code you're using, we can help you better. smtplib.SMTPConnectError should also be of interest.
Try something like this:
import sys
import socket
import smtplib

try:
    server = smtplib.SMTP("something.com")
except (socket.error, smtplib.SMTPConnectError):
    print >> sys.stderr, "Error connecting"
    sys.exit(-1)
OSError is the base class for smtplib.SMTPConnectError, socket.timeout, TimeoutError, etc. (as of Python 3.4, when smtplib's exceptions were reparented onto OSError). Therefore, you should catch OSError if you want to handle them all:
try:
    ...
except OSError:
    ...
See: https://bugs.python.org/issue20903
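You can check the hierarchy yourself on Python 3.4+, where that change landed:

>>> import smtplib, socket
>>> issubclass(smtplib.SMTPConnectError, OSError)
True
>>> issubclass(socket.timeout, OSError)
True
>>> issubclass(TimeoutError, OSError)
True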
I have the following code to do a postback to a remote URL:
request = urllib2.Request('http://www.example.com', postBackData,
                          {'User-Agent': 'My User Agent'})
try:
    response = urllib2.urlopen(request)
except urllib2.HTTPError, e:
    checksLogger.error('HTTPError = ' + str(e.code))
except urllib2.URLError, e:
    checksLogger.error('URLError = ' + str(e.reason))
except httplib.HTTPException, e:
    checksLogger.error('HTTPException')
The postBackData is created from a dictionary encoded with urllib.urlencode; checksLogger is a logger set up with logging.
I have had a problem where this code runs while the remote server is down and the process exits (this is on customer servers, so I don't know what the stack trace / error is at this time). I'm assuming this is because there is an exception and/or error that is not being handled. So are there any other exceptions that might be triggered that I'm not handling above?
Add a generic exception handler:
request = urllib2.Request('http://www.example.com', postBackData,
                          {'User-Agent': 'My User Agent'})
try:
    response = urllib2.urlopen(request)
except urllib2.HTTPError, e:
    checksLogger.error('HTTPError = ' + str(e.code))
except urllib2.URLError, e:
    checksLogger.error('URLError = ' + str(e.reason))
except httplib.HTTPException, e:
    checksLogger.error('HTTPException')
except Exception:
    import traceback
    checksLogger.error('generic exception: ' + traceback.format_exc())
From the urlopen entry on the docs page, it looks like you just need to catch URLError. If you really want to hedge your bets against problems within the urllib code, you can also catch Exception as a fall-back. Do not use a bare except:, since that will catch SystemExit and KeyboardInterrupt as well.
Edit: What I mean to say is that you're already catching the errors it's supposed to throw. If it's throwing something else, that's probably because the urllib code failed to catch something it should have caught and wrapped in a URLError. Even the stdlib tends to miss simple things like AttributeError. Catching Exception as a fall-back (and logging what it caught) will help you figure out what's happening without trapping SystemExit and KeyboardInterrupt.
$ grep "raise" /usr/lib64/python/urllib2.py
IOError); for HTTP errors, raises an HTTPError, which can also be
raise AttributeError, attr
raise ValueError, "unknown url type: %s" % self.__original
# XXX raise an exception if no one else should try to handle
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
perform the redirect. Otherwise, raise HTTPError if no-one
raise HTTPError(req.get_full_url(), code, msg, headers, fp)
raise HTTPError(req.get_full_url(), code,
raise HTTPError(req.get_full_url(), 401, "digest auth failed",
raise ValueError("AbstractDigestAuthHandler doesn't know "
raise URLError('no host given')
raise URLError('no host given')
raise URLError(err)
raise URLError('unknown url type: %s' % type)
raise URLError('file not on local host')
raise IOError, ('ftp error', 'no host given')
raise URLError(msg)
raise IOError, ('ftp error', msg), sys.exc_info()[2]
raise GopherError('no host given')
There is also the possibility of exceptions in urllib2 dependencies, or of exceptions caused by genuine bugs.
You are best off logging all uncaught exceptions to a file via a custom sys.excepthook. The key rule of thumb here is never to catch exceptions you aren't planning to correct, and logging is not a correction. So don't catch them just to log them.
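A minimal sketch of such a hook (the log file name and logger setup here are assumptions, not from the original post):

import sys
import logging

logging.basicConfig(filename='uncaught.log')
logger = logging.getLogger(__name__)

def log_uncaught(exc_type, exc_value, exc_traceback):
    # Log the full traceback, then fall through to the default hook.
    logger.error('Uncaught exception',
                 exc_info=(exc_type, exc_value, exc_traceback))
    sys.__excepthook__(exc_type, exc_value, exc_traceback)

sys.excepthook = log_uncaught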
You can catch all exceptions and log what gets caught:

import sys
import traceback

def formatExceptionInfo(maxTBlevel=5):
    cla, exc, trbk = sys.exc_info()
    excName = cla.__name__
    try:
        excArgs = exc.__dict__["args"]
    except KeyError:
        excArgs = "<no args>"
    excTb = traceback.format_tb(trbk, maxTBlevel)
    return (excName, excArgs, excTb)

try:
    x = x + 1
except:
    print formatExceptionInfo()
(Code from http://www.linuxjournal.com/article/5821)
Also read the documentation on sys.exc_info.
I catch:
- httplib.HTTPException
- urllib2.HTTPError
- urllib2.URLError
I believe this covers everything, including socket errors.