Can't catch exceptions with urllib2 - python

I have a script that prints the response from an API, but I can't seem to catch any exceptions. I think I've gone through every question asked on this topic without any luck.
How can I check whether the script will catch any errors/exceptions?
I'm testing the script on a site I know returns 403 Forbidden, but the error doesn't show.
My script:
import urllib2
url_se = 'http://www.example.com'
opener = urllib2.build_opener()
opener.addheaders = [('User-agent', 'API to File')]
try:
    request = opener.open(url_se)
except urllib2.HTTPError, e:
    print e.code
except urllib2.URLError, e:
    print e.args
except Exception:
    import traceback
    print 'Generic exception ' + traceback.format_exc()
response = request.read()
print response
Is this the right approach? What's the best practice for catching exceptions with urllib2?

There is a bug in your program. If any exception occurs in the try block, the variable request is never assigned, so the later line response = request.read() fails with a NameError.
Correct it as follows:
import urllib2
url_se = 'http://www.example.com'
opener = urllib2.build_opener()
opener.addheaders = [('User-agent', 'API to File')]
try:
    request = opener.open(url_se)
    response = request.read()
    print response
except urllib2.HTTPError, e:
    print e.code
except urllib2.URLError, e:
    print e.args
except Exception as e:
    import traceback
    print 'Generic exception ' + traceback.format_exc()
Test it on your machine and you will see the exceptions.
Catching exceptions individually only makes sense if you are doing something specific with each of them; if you only want to log the exceptions, a single universal except block will do the job.
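One way to check that the handlers fire, without depending on a live site, is to pass the function that opens the URL as a parameter and substitute a stub that raises the error. The following is a minimal sketch in Python 3 (where urllib2 became urllib.request and urllib.error). The names fetch, fake_forbidden, fake_unreachable and fake_ok are illustrative assumptions and not part of the original script.

```python
import io
import urllib.error
import urllib.request


def fetch(url, open_url=urllib.request.urlopen):
    """Return ('ok', body), ('http', status_code) or ('url', reason)."""
    try:
        response = open_url(url)
        # Read inside the try block so `response` is always defined here.
        body = response.read()
    except urllib.error.HTTPError as e:
        # The server answered, but with an error status such as 403.
        return ("http", e.code)
    except urllib.error.URLError as e:
        # No HTTP response at all (DNS failure, refused connection, ...).
        return ("url", e.reason)
    return ("ok", body)


def fake_forbidden(url):
    """Hypothetical stub that behaves like a server answering 403."""
    raise urllib.error.HTTPError(url, 403, "Forbidden", {}, io.BytesIO(b""))


def fake_unreachable(url):
    """Hypothetical stub that behaves like a host that cannot be reached."""
    raise urllib.error.URLError("host unreachable")


def fake_ok(url):
    """Hypothetical stub that returns a readable, response-like object."""
    return io.BytesIO(b"hello")


print(fetch("http://www.example.com", open_url=fake_forbidden))
print(fetch("http://www.example.com", open_url=fake_unreachable))
print(fetch("http://www.example.com", open_url=fake_ok))
```

Because HTTPError is listed before URLError, the 403 case is reported with its status code instead of being absorbed by the more general handler.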

Related

How can I catch a connection refused error in a proper way?

In my code, I make some POST requests. How can I catch a connection refused error in that call?
try:
    headers = {'content-type': 'application/x-www-form-urlencoded; charset=UTF-8'}
    response = requests.request("POST", local_wallet_api + "v1/wallet/get_public_keys", headers=headers)
    res = json.loads(response.text)
except Exception as e:
    if e.errno == errno.ECONNREFUSED:
        print("connection refused")
        sys.exit(141)
I've tried the above code, but it does not work: it says e has no errno attribute. Is there a proper way to handle this kind of error?
from requests.exceptions import ConnectionError
try:
    headers = {'content-type': 'application/x-www-form-urlencoded; charset=UTF-8'}
    response = requests.request("POST", local_wallet_api + "v1/wallet/get_public_keys", headers=headers)
    res = json.loads(response.text)
except ConnectionError:
    sys.exit(141)
You can use requests.exceptions.RequestException as your exception.
Example:
except requests.exceptions.RequestException as e:
    # handle the exception here
For the list of requests exceptions, check the requests.exceptions documentation.
You can get the errno via e.args[0].reason.errno.
Also use this except clause:
except requests.exceptions.ConnectionError as e:
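The errno lookup is easier to see with the standard library, where a refused connection surfaces as a urllib.error.URLError whose reason attribute holds the underlying OSError. The sketch below assumes Python 3 and urllib rather than requests. The helper name is_connection_refused is hypothetical, and the errors are constructed by hand so the example runs without a network.

```python
import errno
import urllib.error


def is_connection_refused(exc):
    """Return True if a URLError wraps a 'connection refused' OSError."""
    reason = getattr(exc, "reason", None)
    # For socket-level failures, URLError.reason is the original OSError.
    return isinstance(reason, OSError) and reason.errno == errno.ECONNREFUSED


# Simulate what urlopen raises when nothing listens on the target port.
refused = urllib.error.URLError(
    ConnectionRefusedError(errno.ECONNREFUSED, "Connection refused")
)
# A different socket-level failure, for comparison.
timed_out = urllib.error.URLError(TimeoutError("timed out"))

print(is_connection_refused(refused))
print(is_connection_refused(timed_out))
```

With requests, catching requests.exceptions.ConnectionError directly is usually enough, since that class already narrows the failure to the connection stage.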

404 error received for working url using python urllib2

I am trying to fetch the following URL: ow dot ly/LApK30cbLKj. It works in a browser, but I am getting an HTTP 404 error:
my_url = 'ow' + '.ly/LApK30cbLKj' # SO won't accept an ow.ly url
headers = {'User-Agent' : user_agent }
request = urllib2.Request(my_url,"", headers)
response = None
try:
    response = urllib2.urlopen(request)
except urllib2.HTTPError, e:
    print '+++HTTPError = ' + str(e.code)
Is there something I can do to get this URL with an HTTP 200 status, as I do when I visit it in a browser?
Your example works for me, except that you need to add http://
my_url = 'http://ow' + '.ly/LApK30cbLKj'
You need to specify the URL's protocol. When you visit the URL in a browser, the browser assumes HTTP by default. urllib2 doesn't do that for you, so you need to add http:// at the beginning of the URL; otherwise this error is raised:
ValueError: unknown url type: ow.ly/LApK30cbLKj
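The missing-scheme error can be reproduced without touching the network, because the URL is validated when the request object is built. The sketch below uses Python 3's urllib.request (the successor of urllib2). The helper ensure_scheme is a hypothetical name, and its "://" check is a deliberately simple assumption, not a full URL parser.

```python
import urllib.request


def ensure_scheme(url, default="http"):
    """Prefix `url` with a scheme if it does not already carry one."""
    if "://" in url:
        return url
    return default + "://" + url


try:
    urllib.request.Request("ow.ly/LApK30cbLKj")
    error_message = None
except ValueError as e:
    # Raised while the Request is built, before any network access.
    error_message = str(e)

print(error_message)
print(ensure_scheme("ow.ly/LApK30cbLKj"))
print(ensure_scheme("https://www.example.com"))
```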
As @enjoi mentioned, I used requests:
import requests
result = None
try:
    result = requests.get(agen_cont.source_url)
except requests.exceptions.Timeout as e:
    print '+++timeout exception: '
    print e
except requests.exceptions.TooManyRedirects as e:
    print '+++ too many redirects exception: '
    print e
except requests.exceptions.RequestException as e:
    print '+++ request exception: '
    print e
except Exception:
    import traceback
    print '+++generic exception: ' + traceback.format_exc()
if result:
    final_url = result.url
    print final_url
    response = result.content

How to check HTTP errors for more than two URLs?

Question: I have 3 URLs: testurl1, testurl2 and testurl3. I'd like to try testurl1 first; if I get a 404 error, try testurl2; and if that also returns 404, try testurl3. How can I achieve this? So far I've tried the code below, but it only works for two URLs. How do I add support for a third URL?
from urllib2 import Request, urlopen
from urllib2 import URLError, HTTPError
def checkfiles():
    req = Request('http://testurl1')
    try:
        response = urlopen(req)
        url1=('http://testurl1')
    except (HTTPError, URLError):
        url1 = ('http://testurl2')
    print url1
    finalURL='wget '+url1+'/testfile.tgz'
    print finalURL
checkfiles()
Another job for a plain old for loop:
for url in testurl1, testurl2, testurl3:
    req = Request(url)
    try:
        response = urlopen(req)
    except HTTPError as err:
        if err.code == 404:
            continue
        raise
    else:
        # do what you want with successful response here (or outside the loop)
        break
else:
    # They ALL errored out with HTTPError code 404. Handle this?
    raise err
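The same fallback loop can be written as a small Python 3 function. This is only a sketch: first_available and fake_open are hypothetical names, and the stub stands in for real servers so the behaviour can be checked offline. One difference from Python 2 is that the name bound by "except ... as err" is deleted when the except block ends, so the last error has to be stored explicitly before it can be re-raised after the loop.

```python
import io
import urllib.error
import urllib.request


def first_available(urls, open_url=urllib.request.urlopen):
    """Return (url, response) for the first URL that does not answer 404."""
    last_error = None
    for url in urls:
        try:
            response = open_url(url)
        except urllib.error.HTTPError as err:
            if err.code == 404:
                # Python 3 unbinds `err` after this block, so keep a reference.
                last_error = err
                continue
            raise  # any other HTTP error is unexpected here
        else:
            return url, response
    if last_error is None:
        raise ValueError("no URLs given")
    # Every URL answered 404: surface the last error to the caller.
    raise last_error


def fake_open(url):
    """Hypothetical stub: only testurl3 exists, the others answer 404."""
    if url.endswith("testurl3"):
        return io.BytesIO(b"payload")
    raise urllib.error.HTTPError(url, 404, "Not Found", {}, io.BytesIO(b""))


urls = ["http://testurl1", "http://testurl2", "http://testurl3"]
url, response = first_available(urls, open_url=fake_open)
print(url)
```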
Hmm, maybe something like this?
from urllib2 import Request, urlopen
from urllib2 import URLError, HTTPError
def checkfiles():
    try:
        url1 = 'http://testurl1'
        response = urlopen(Request(url1))
    except (HTTPError, URLError):
        try:
            url1 = 'http://testurl2'
            response = urlopen(Request(url1))
        except (HTTPError, URLError):
            # last resort: fall back to the third URL without checking it
            url1 = 'http://testurl3'
    print url1
    finalURL = 'wget ' + url1 + '/testfile.tgz'
    print finalURL
checkfiles()

In Python, how do I use urllib to see if a website is 404 or 200?

How can I get the status code from the response headers using urllib?
The getcode() method (added in Python 2.6) returns the HTTP status code that was sent with the response, or None if the URL is not an HTTP URL.
>>> a=urllib.urlopen('http://www.google.com/asdfsf')
>>> a.getcode()
404
>>> a=urllib.urlopen('http://www.google.com/')
>>> a.getcode()
200
You can use urllib2 as well:
import urllib2
req = urllib2.Request('http://www.python.org/fish.html')
try:
    resp = urllib2.urlopen(req)
except urllib2.HTTPError as e:
    if e.code == 404:
        # do something...
        pass
    else:
        # ...
        pass
except urllib2.URLError as e:
    # Not an HTTP-specific error (e.g. connection refused)
    # ...
    pass
else:
    # 200
    body = resp.read()
Note that HTTPError is a subclass of URLError which stores the HTTP status code.
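The subclass relationship is the reason the HTTPError handler has to come before the URLError handler. A short Python 3 sketch makes this visible without any network access; describe is a hypothetical helper, and the exceptions are constructed by hand purely for illustration.

```python
import io
import urllib.error

print(issubclass(urllib.error.HTTPError, urllib.error.URLError))


def describe(exc):
    """Classify an exception, testing the more specific class first."""
    if isinstance(exc, urllib.error.HTTPError):
        return "HTTP status %d" % exc.code
    if isinstance(exc, urllib.error.URLError):
        return "URL error: %s" % exc.reason
    return "other"


not_found = urllib.error.HTTPError(
    "http://www.python.org/fish.html", 404, "Not Found", {}, io.BytesIO(b"")
)
print(describe(not_found))
print(describe(urllib.error.URLError("connection refused")))
# An HTTPError also passes the URLError check, which is why order matters.
print(isinstance(not_found, urllib.error.URLError))
```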
For Python 3:
import urllib.request, urllib.error
url = 'http://www.google.com/asdfsf'
try:
    conn = urllib.request.urlopen(url)
except urllib.error.HTTPError as e:
    # Return code error (e.g. 404, 501, ...)
    # ...
    print('HTTPError: {}'.format(e.code))
except urllib.error.URLError as e:
    # Not an HTTP-specific error (e.g. connection refused)
    # ...
    print('URLError: {}'.format(e.reason))
else:
    # 200
    # ...
    print('good')
import urllib2
try:
    fileHandle = urllib2.urlopen('http://www.python.org/fish.html')
    data = fileHandle.read()
    fileHandle.close()
except urllib2.URLError, e:
    print 'you got an error with the code', e

Overriding urllib2.HTTPError or urllib.error.HTTPError and reading response HTML anyway

I receive an 'HTTP Error 500: Internal Server Error' response, but I still want to read the data inside the error HTML.
With Python 2.6, I normally fetch a page using:
import urllib2
url = "http://google.com"
data = urllib2.urlopen(url)
data = data.read()
When attempting to use this on the failing URL, I get the exception urllib2.HTTPError:
urllib2.HTTPError: HTTP Error 500: Internal Server Error
How can I fetch such error pages (with or without urllib2), even though they return Internal Server Errors?
Note that with Python 3, the corresponding exception is urllib.error.HTTPError.
The HTTPError is a file-like object. You can catch it and then read its contents.
try:
    resp = urllib2.urlopen(url)
    contents = resp.read()
except urllib2.HTTPError, error:
    contents = error.read()
If you mean you want to read the body of the 500 response:
request = urllib2.Request(url, data, headers)
try:
    resp = urllib2.urlopen(request)
    print resp.read()
except urllib2.HTTPError, error:
    print "ERROR: ", error.read()
In your case, you don't need to build the request. Just do:
try:
    resp = urllib2.urlopen(url)
    print resp.read()
except urllib2.HTTPError, error:
    print "ERROR: ", error.read()
So you don't override urllib2.HTTPError; you just handle the exception.
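For Python 3, the same idea can be sketched with urllib.request and urllib.error. The names read_body and fake_server_error are hypothetical; the stub raises a hand-built HTTPError carrying a body so the example can be run without a failing server.

```python
import io
import urllib.error
import urllib.request


def read_body(url, open_url=urllib.request.urlopen):
    """Return (status, body); the body is read even for error responses."""
    try:
        response = open_url(url)
        # `status` is available on responses to HTTP(S) URLs.
        return response.status, response.read()
    except urllib.error.HTTPError as error:
        # HTTPError is file-like: read() yields the error page the server sent.
        return error.code, error.read()


def fake_server_error(url):
    """Hypothetical stub that behaves like a server answering 500 with a body."""
    body = io.BytesIO(b"<h1>Internal Server Error</h1>")
    raise urllib.error.HTTPError(url, 500, "Internal Server Error", {}, body)


status, contents = read_body("http://www.example.com", open_url=fake_server_error)
print(status, contents)
```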
import urllib2
alist = ['http://someurl.com']
def testUrl():
    errList = []
    for URL in alist:
        try:
            urllib2.urlopen(URL)
        except urllib2.URLError, err:
            # collect every failing URL instead of returning on the first one
            errList.append(URL + " " + str(err.reason))
    return "\n".join(errList)
testUrl()
