Python - urllib2: how to escape HTTP errors

I am making a Python app and I want to read a file from the net.
This is the code that I am using to read it:
urllib2.urlopen("http://example.com/check.txt").read()
Everything works great, but when I point it to a URL that does not exist, it gives an HTTP 404: Not Found error, and that is normal.
The problem is that the app is designed to work on Windows, so it will be compiled.
On Windows, when the app tries to get a file from a URL that does not exist, the app crashes with an error window and creates a log containing the HTTP 404: Not Found error.
I tried to catch this error but I failed. This is the full code:
import urllib2
file = urllib2.urlopen("http://example.com/check.txt")
try:
    file.read()
except urllib2.URLError:
    print "File Not Found"
else:
    print "File is found"
Please, if you know how to catch this error, help me.

You should apply the try..except around the urlopen, not the read.
Try this
import urllib2

try:
    fh = urllib2.urlopen('http://example.com/check.txt')
    print fh.read()
except urllib2.HTTPError, e:
    print e.code
except urllib2.URLError, e:
    print e.reason
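Applied to the original check, a minimal sketch (assuming the same URL as the question) could look like this; note that URLError covers failures where no HTTP response arrives at all, and it carries a reason rather than a code:
import urllib2

try:
    data = urllib2.urlopen("http://example.com/check.txt").read()
    print "File is found"
except urllib2.HTTPError, e:
    # Raised for 404 and other HTTP status errors
    print "File Not Found (HTTP %d)" % e.code
except urllib2.URLError, e:
    # Raised for DNS failures, refused connections and the like
    print "Could not reach server:", e.reason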

Related

Why are 'https://revoked.badssl.com/' and 'https://pinning-test.badssl.com/' returning a 200 response using Python requests?

I'm working with Python requests and testing URLs from the https://badssl.com/ certificate section. All of the invalid URLs return errors except for https://revoked.badssl.com/ and https://pinning-test.badssl.com/, which respond with 200 status codes. I would like someone to explain why this is happening, despite the pages exhibiting the errors NET::ERR_CERT_REVOKED and NET::ERR_SSL_PINNED_KEY_NOT_IN_CERT_CHAIN (for the former and latter respectively) in a browser.
import requests

def check_connection():
    url = 'https://revoked.badssl.com/' or 'https://pinning-test.badssl.com/'
    try:
        r = requests.get(url)
        r.raise_for_status()
        print(r)
    except requests.exceptions.RequestException as err:
        print("OOps: Something Else", err)
    except requests.exceptions.HTTPError as errh:
        print("Http Error:", errh)
    except requests.exceptions.ConnectionError as errc:
        print("Error Connecting:", errc)
    except requests.exceptions.Timeout as errt:
        print("Timeout Error:", errt)

check_connection()
You're not getting an analog to "NET::ERR_CERT_REVOKED" message because requests is just an HTTP request tool; it's not a browser. If you want to query an OCSP responder to see if a server certificate has been revoked, you can use the ocsp module to do that. There's an example here.
The answer is going to be similar for "NET::ERR_SSL_PINNED_KEY_NOT_IN_CERT_CHAIN"; the requests module isn't the sort of high-level tool that implements certificate pinning. In fact, even the development builds of major browsers don't implement this; there's some interesting discussion about this issue in https://github.com/chromium/badssl.com/issues/15.
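To make that concrete, here is a minimal sketch (not part of the answer above): requests validates only the certificate chain, hostname and expiry during the TLS handshake, performs no OCSP/CRL revocation lookup and no key pinning, so both hosts complete the handshake and return an ordinary 200.
import requests

# Both URLs succeed because requests does not check revocation or pinning.
for url in ("https://revoked.badssl.com/", "https://pinning-test.badssl.com/"):
    response = requests.get(url, timeout=10)
    print(url, response.status_code)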

When the URL is invalid, requests.get throws ConnectionError and the Python script terminates

As the title explains, the Python script runs fine as long as I have a valid URL; as soon as I switch it to an invalid URL, the script exits with a long error message. I ultimately want the program to keep checking for a connection.
Here is some sample code:
Works fine with a valid URL:
import requests

request = requests.get('http://www.example.com')
if request.status_code == 200:
    print('Web site exists')
else:
    print('Web site does not exist')
Does not work as expected with an invalid URL:
import requests

request = requests.get('http://www.1337example.com')
if request.status_code == 200:
    print('Web site exists')
else:
    print('Web site does not exist')
The reason it errors out is that no connection could be made at all, so there is no response from which Python could read a status code; hence the exception.
A try/except block fixes that: the code in the try block runs, and the code in the except block runs if the try block raises the named error.
import requests

try:
    request = requests.get('http://www.example.com')
    print('Web site exists')
except requests.exceptions.ConnectionError:
    print('Web site does not exist')
Edit: Added the specific exception
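Since the goal is to keep checking for a connection, a simple polling loop could look like the sketch below (the URL and retry interval are placeholders, not from the original answer):
import time

import requests

url = 'http://www.example.com'  # placeholder target

while True:
    try:
        response = requests.get(url, timeout=5)
        if response.status_code == 200:
            print('Web site exists')
            break
        print('Web site responded with status', response.status_code)
    except requests.exceptions.ConnectionError:
        print('Connection failed, retrying...')
    time.sleep(10)  # wait before the next attempt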

Bad URLs in Python 3.4.3

I am new to this, so please help me. I am using urllib.request to open and read webpages. Can someone tell me how my code can handle redirects, timeouts and badly formed URLs?
I have sort of found a way to handle timeouts, but I am not sure it is correct. Is it? All opinions are welcome! Here it is:
import logging
from socket import timeout
from urllib.error import HTTPError, URLError
import urllib.request

# url and name are defined elsewhere in the script
try:
    text = urllib.request.urlopen(url, timeout=10).read().decode('utf-8')
except (HTTPError, URLError) as error:
    logging.error('Data of %s not retrieved because %s\nURL: %s', name, error, url)
except timeout:
    logging.error('socket timed out - URL %s', url)
Please help me as I am new to this. Thanks!
Take a look at the urllib error page.
So, for the following behaviours:
Redirect: urlopen follows HTTP 302 redirects automatically through the default HTTPRedirectHandler; a redirect that cannot be handled surfaces as an HTTPError with a code. You can also install your own HTTPRedirectHandler instead of failing.
Timeouts: You have that correct.
Badly formed URLs: That's a URLError.
Here's the code I would use:
from socket import timeout
import urllib.error
import urllib.request

try:
    text = urllib.request.urlopen("http://www.google.com", timeout=0.1).read()
except urllib.error.HTTPError as error:
    print(error)
except urllib.error.URLError as error:
    print(error)
except timeout as error:
    print(error)
I can't find a redirecting URL to test with, so I'm not exactly sure how to check whether the HTTPError is a redirect.
You might find the requests package is a bit easier to use (it's suggested on the urllib page).
Using the requests package I was able to find a better solution. The only exceptions you need to handle are:
import requests

try:
    r = requests.get(url, timeout=5)
except requests.exceptions.Timeout:
    # Maybe set up for a retry, or continue in a retry loop
    pass
except requests.exceptions.TooManyRedirects:
    # Tell the user their URL was bad and try a different one
    pass
except requests.exceptions.ConnectionError:
    # Connection could not be completed
    pass
except requests.exceptions.RequestException as e:
    # Catastrophic error: bail
    raise
And to get the text of that page, all you need to do is:
r.text
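Putting those pieces together, one possible sketch (the URL is a placeholder) that also turns HTTP error statuses into exceptions via raise_for_status():
import requests

url = 'http://www.example.com'  # placeholder

try:
    r = requests.get(url, timeout=5)
    r.raise_for_status()  # turn 4xx/5xx responses into HTTPError
    print(r.text)         # the text of the page
except requests.exceptions.Timeout:
    print('request timed out')
except requests.exceptions.TooManyRedirects:
    print('too many redirects - check the URL')
except requests.exceptions.RequestException as e:
    print('request failed:', e)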

How to access the XML response when the site also issues an HTTP error code

The following URL returns the expected response in the browser:
http://ws.audioscrobbler.com/2.0/?method=user.getinfo&user=notonfile99&api_key=8e9de6bd545880f19d2d2032c28992b4
<lfm status="failed">
<error code="6">No user with that name was found</error>
</lfm>
But I am unable to access the XML in Python via the following code, because an HTTPError exception ("HTTP Error 400: Bad Request") is raised:
import urllib2
urllib2.urlopen('http://ws.audioscrobbler.com/2.0/?method=user.getinfo&user=notonfile99&api_key=8e9de6bd545880f19d2d2032c28992b4')
I see that I can work around this by using urlretrieve rather than urlopen, but then the response gets written to disk.
Is there a way, using just the Python 2.7 standard library, to get hold of the XML response without having to read it from disk and clean up afterwards?
I see that this question has been asked before in a PHP context, but I don't know how to apply the answer to Python:
DOMDocument load on a page returning 400 Bad Request status
Copying from here: http://www.voidspace.org.uk/python/articles/urllib2.shtml#httperror
The exception that is thrown contains the full body of the error page:
#!/usr/bin/python2
import urllib2

try:
    resp = urllib2.urlopen('http://ws.audioscrobbler.com/2.0/?method=user.getinfo&user=notonfile99&api_key=8e9de6bd545880f19d2d2032c28992b4')
except urllib2.HTTPError, e:
    print e.code
    print e.read()
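To actually get at the XML, here is a sketch (not part of the quoted answer) that feeds the error body to xml.etree.ElementTree from the standard library:
import urllib2
import xml.etree.ElementTree as ET

url = ('http://ws.audioscrobbler.com/2.0/?method=user.getinfo'
       '&user=notonfile99&api_key=8e9de6bd545880f19d2d2032c28992b4')
try:
    body = urllib2.urlopen(url).read()
except urllib2.HTTPError, e:
    body = e.read()  # the 400 response still carries the full XML document

root = ET.fromstring(body)
print root.get('status')                 # "failed"
error = root.find('error')
if error is not None:
    print error.get('code'), error.text  # "6", "No user with that name was found"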

Why does 'url' not work as a variable here?

I originally had the variable cpanel named url and the code would not return anything. Any idea why? It doesn't seem to be used by anything else, but there's gotta be something I'm overlooking.
import urllib2

cpanel = 'http://www.tas-tech.com/cpanel'
req = urllib2.Request(cpanel)
try:
    handle = urllib2.urlopen(req)
except IOError, e:
    if hasattr(e, 'code'):
        if e.code != 401:
            print 'We got another error'
            print e.code
        else:
            print e.headers
            print e.headers['www-authenticate']
Note that urllib2.Request has a parameter named url, but that really shouldn't be the source of the problem; it works as expected:
>>> import urllib2
>>> url = "http://www.google.com"
>>> req = urllib2.Request(url)
>>> urllib2.urlopen(req).code
200
Note that your code above functions identically when you switch cpanel for url. So the problem must have been elsewhere.
I'm pretty sure that /cpanel (if it is the hosting control panel) actually redirects (302) you to http://www.tas-tech.com:2082/ or something like that. You should just update your thing to deal with the redirect (or better yet, just send the request to the real address).
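If you want to see the redirect yourself instead of letting urllib2 follow it, one possible sketch (the :2082 target above is the answerer's guess, not confirmed) is to disable redirect handling so the 3xx response surfaces as an HTTPError:
import urllib2

class NoRedirect(urllib2.HTTPRedirectHandler):
    # Returning None lets urllib2 fall through to its default error handler,
    # so 3xx responses are raised as HTTPError instead of being followed.
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None

opener = urllib2.build_opener(NoRedirect)
try:
    opener.open('http://www.tas-tech.com/cpanel')
except urllib2.HTTPError, e:
    print e.code                     # e.g. 302
    print e.headers.get('Location')  # where the server wanted to send us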
