I have
import urllib2
try:
    urllib2.urlopen("some url")
except urllib2.HTTPError:
    <whatever>
but what I end up with is catching any kind of HTTP error. I want to catch it only if the specified webpage doesn't exist (404?).
Python 3
from urllib.error import HTTPError
Python 2
from urllib2 import HTTPError
Just catch HTTPError, handle it, and if it's not Error 404, simply use raise to re-raise the exception.
See the Python tutorial.
Here is a complete example for Python 2:
import urllib2
from urllib2 import HTTPError

try:
    urllib2.urlopen("some url")
except HTTPError as err:
    if err.code == 404:
        <whatever>
    else:
        raise
For Python 3.x
import urllib.request
import urllib.error

try:
    urllib.request.urlretrieve(url, fullpath)
except urllib.error.HTTPError as err:
    print(err.code)
Tim's answer seems misleading to me, especially when urllib2 does not return the expected code. For example, this error will be fatal (believe it or not, it is not an uncommon one when downloading URLs):
AttributeError: 'URLError' object has no attribute 'code'
A fast, but maybe not the best, solution would be code using a nested try/except block:

import urllib2

try:
    urllib2.urlopen("some url")
except urllib2.HTTPError as err:
    try:
        if err.code == 404:
            pass  # handle the missing page here
        else:
            raise
    except:
        ...
More information on the topic of nested try/except blocks: Are nested try/except blocks in Python a good programming practice?
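As a flatter alternative, here is a sketch (keeping the Python 2 flavour of the examples above) that catches the URLError base class and uses getattr, so a plain URLError without a .code attribute does not trigger the AttributeError described above:

import urllib2

try:
    urllib2.urlopen("some url")
except urllib2.URLError as err:
    # HTTPError is a subclass of URLError; a plain URLError has no .code,
    # so getattr avoids the AttributeError without a nested try/except
    if getattr(err, "code", None) == 404:
        pass  # handle the missing page here
    else:
        raise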
If from urllib.error import HTTPError doesn't work, try using from requests.exceptions import HTTPError.

Sample:

from requests.exceptions import HTTPError

try:
    <access some url>
except HTTPError:
    # Handle the error as usual
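Note that requests does not raise HTTPError on its own; you have to call raise_for_status() on the response. A minimal sketch (the URL is a placeholder):

import requests
from requests.exceptions import HTTPError

try:
    r = requests.get("https://example.com/missing", timeout=5)
    r.raise_for_status()  # raises HTTPError for 4xx/5xx responses
except HTTPError as err:
    if err.response.status_code == 404:
        print("Page not found")
    else:
        raise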
I'm working with Python requests and testing URLs from https://badssl.com/ certificate section and all the invalid URLs are returning errors except for https://revoked.badssl.com/ and https://pinning-test.badssl.com/. They are responding with 200 status codes. I would like someone to explain why this is happening, despite the pages exhibiting errors such as NET::ERR_CERT_REVOKED and NET::ERR_SSL_PINNED_KEY_NOT_IN_CERT_CHAIN for the former and latter respectively.
import requests

def check_connection():
    # note: `or` returns its first truthy operand, so this always tests the first URL
    url = 'https://revoked.badssl.com/'  # or 'https://pinning-test.badssl.com/'
    try:
        r = requests.get(url)
        r.raise_for_status()
        print(r)
    except requests.exceptions.HTTPError as errh:
        print("Http Error:", errh)
    except requests.exceptions.ConnectionError as errc:
        print("Error Connecting:", errc)
    except requests.exceptions.Timeout as errt:
        print("Timeout Error:", errt)
    except requests.exceptions.RequestException as err:
        # the base class must come last, or the handlers above are unreachable
        print("OOps: Something Else", err)

check_connection()
You're not getting an analog to "NET::ERR_CERT_REVOKED" message because requests is just an HTTP request tool; it's not a browser. If you want to query an OCSP responder to see if a server certificate has been revoked, you can use the ocsp module to do that. There's an example here.
The answer is going to be similar for "NET::ERR_SSL_PINNED_KEY_NOT_IN_CERT_CHAIN"; the requests module isn't the sort of high-level tool that implements certificate pinning. In fact, even the development builds of major browsers don't implement this; there's some interesting discussion about this issue in https://github.com/chromium/badssl.com/issues/15.
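For illustration, here is a rough sketch of an OCSP revocation check using the cryptography package's x509.ocsp module (one possible interpretation of "the ocsp module"); cert, issuer, and responder_url are assumed inputs, not something requests provides:

import requests
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.x509 import ocsp

def certificate_status(cert, issuer, responder_url):
    # build a DER-encoded OCSP request for the server certificate
    builder = ocsp.OCSPRequestBuilder().add_certificate(cert, issuer, hashes.SHA1())
    der_request = builder.build().public_bytes(serialization.Encoding.DER)
    resp = requests.post(
        responder_url,
        data=der_request,
        headers={"Content-Type": "application/ocsp-request"},
    )
    # certificate_status is an OCSPCertStatus: GOOD, REVOKED, or UNKNOWN
    return ocsp.load_der_ocsp_response(resp.content).certificate_status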
I was opening a URL using urllib.request.urlopen. The following exception was raised:

http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
HTTPError
I went through the documentation of the urllib library:
exception urllib.error.HTTPError

Though being an exception (a subclass of URLError), an HTTPError can also function as a non-exceptional file-like return value (the same thing that urlopen() returns). This is useful when handling exotic HTTP errors, such as requests for authentication.

code
An HTTP status code as defined in RFC 2616. This numeric value corresponds to a value found in the dictionary of codes as found in http.server.BaseHTTPRequestHandler.responses.

reason
This is usually a string explaining the reason for this error.

headers
The HTTP response headers for the HTTP request that caused the HTTPError.
But in my case there was no error code or string giving the reason for the exception.
If you catch the exception you can see the reason, like this:
import urllib.request
import urllib.error

try:
    urllib.request.urlopen(req)
except urllib.error.HTTPError as e:
    print(e.reason)
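Since the caught HTTPError carries all the attributes quoted from the documentation above, a quick sketch (with a placeholder URL) that prints them:

import urllib.request
import urllib.error

try:
    urllib.request.urlopen("http://example.com/missing")  # placeholder URL
except urllib.error.HTTPError as e:
    print(e.code)     # numeric status code, e.g. 404
    print(e.reason)   # reason string, e.g. "Not Found"
    print(e.headers)  # response headers of the failed request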
I am trying to request weather API data, and I can pull some of the data, but eventually the code throws an HTTPError despite the try/except that I already have there. What am I writing wrong in my try/except?
I have tried to put the HTTPError in parentheses and catch it with HTTPError as he to give me back the error as a variable so I could read it. I've tried from urllib.error import HTTPError. Nothing works.
from urllib.error import HTTPError

for city in cities:
    current_city = owm.get_current(city, **settings)
    try:
        print(f'Current city is {current_city["name"]} and the city number is: {current_city["id"]}')
    except HTTPError:
        print("Ooops")
    print("------------")
Here is the error message:
HTTPError: HTTP Error 404: Not Found
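For what it's worth, the call that actually performs the HTTP request (owm.get_current) sits outside the try block above, so the handler never gets a chance to run. A sketch of the same loop with the call moved inside (owm, cities, and settings are the asker's objects):

from urllib.error import HTTPError

for city in cities:
    try:
        # the request itself must be inside the try block to be caught
        current_city = owm.get_current(city, **settings)
        print(f'Current city is {current_city["name"]} and the city number is: {current_city["id"]}')
    except HTTPError:
        print("Ooops")
    print("------------")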
I am new to this, so please help me. I am using urllib.request to open and read webpages. Can someone tell me how my code can handle redirects, timeouts, and badly formed URLs?
I have sort of found a way for timeouts; I am not sure if it is correct, though. Is it? All opinions are welcome! Here it is:
from socket import timeout
import logging
import urllib.request
from urllib.error import HTTPError, URLError

try:
    text = urllib.request.urlopen(url, timeout=10).read().decode('utf-8')
except (HTTPError, URLError) as error:
    logging.error('Data of %s not retrieved because %s\nURL: %s', name, error, url)
except timeout:
    logging.error('socket timed out - URL %s', url)
Please help me as I am new to this. Thanks!
Take a look at the urllib error page.
So for the following behaviours:
Redirect: HTTP code 302, so that's an HTTPError with a code. You could also use the HTTPRedirectHandler instead of failing.
Timeouts: You have that correct.
Badly formed URLs: That's a URLError.
Here's the code I would use:
from socket import timeout
import urllib.request
import urllib.error

try:
    text = urllib.request.urlopen("http://www.google.com", timeout=0.1).read()
except urllib.error.HTTPError as error:
    print(error)
except urllib.error.URLError as error:
    print(error)
except timeout as error:
    print(error)
I couldn't find a redirecting URL, so I'm not exactly sure how to check whether the HTTPError is a redirect; one way to surface redirects is sketched below.
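A sketch of that idea (the test URL http://httpbin.org/redirect/1 is an assumption): install a redirect handler that refuses to follow 3xx responses, so urlopen raises them as HTTPError and you can inspect the code:

import urllib.request
import urllib.error

class NoRedirect(urllib.request.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # returning None makes urlopen raise HTTPError for 3xx

opener = urllib.request.build_opener(NoRedirect())
try:
    opener.open("http://httpbin.org/redirect/1")
except urllib.error.HTTPError as err:
    print(err.code in (301, 302, 303, 307, 308))  # True for a redirect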
You might find the requests package is a bit easier to use (it's suggested on the urllib page).
Using the requests package I was able to find a better solution. The only exceptions you need to handle are:
import requests

try:
    r = requests.get(url, timeout=5)
except requests.exceptions.Timeout:
    pass  # maybe set up for a retry, or continue in a retry loop
except requests.exceptions.TooManyRedirects:
    pass  # tell the user their URL was bad and try a different one
except requests.exceptions.ConnectionError:
    pass  # the connection could not be completed
except requests.exceptions.RequestException:
    raise  # catastrophic error, bail
And to get the text of that page, all you need to do is:
r.text
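For the retry idea mentioned in the Timeout branch, here is a minimal sketch (the attempt count and timeout are placeholders, not values from the original answer):

import requests

def fetch_with_retry(url, attempts=3):
    # retry only on timeouts; any other exception propagates
    for attempt in range(attempts):
        try:
            return requests.get(url, timeout=5)
        except requests.exceptions.Timeout:
            if attempt == attempts - 1:
                raise  # out of retries, give up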
I am making a Python app and I want to read a file from the net.
This is the code that I am using to read it:

urllib2.urlopen("http://example.com/check.txt").read()

Everything works great, but when I point it to a URL that does not exist, it gives an HTTP 404: Not Found error, and that is normal.
The problem is that the app is designed to work on Windows, so it will be compiled.
On Windows, when the app tries to get a file from a URL that does not exist, the app crashes and shows an error window, plus it creates a log that contains the HTTP 404: Not Found error.
I tried to avoid this error but failed. This is the full code:
import urllib2

file = urllib2.urlopen("http://example.com/check.txt")
try:
    file.read()
except urllib2.URLError:
    print "File Not Found"
else:
    print "File is found"
Please, if you know how to avoid this error, help me.
You should apply the try..except around the urlopen, not the read.
Try this
import urllib2

try:
    fh = urllib2.urlopen('http://example.com/check.txt')
    print fh.read()
except urllib2.HTTPError, e:
    print e.code
except urllib2.URLError, e:
    print e.reason  # URLError has no .code attribute, only .reason