requests.HTTPError uncaught after a requests.get() 404 response - python

I'm having a slight problem with the requests library.
Say for example I have a statement like this in Python:
try:
    request = requests.get('google.com/admin')  # Should return 404
except requests.HTTPError, e:
    print 'HTTP ERROR %s occurred' % e.code
For some reason the exception is not being caught. I've checked the API documentation for requests but it's a bit slim. Is there anyone who has more experience with the library that might be able to help me out?

The interpreter is your friend:
import requests
requests.get('google.com/admin')
# MissingSchema: Invalid URL u'google.com/admin': No schema supplied
Also, to see the available requests exceptions:
import requests.exceptions
dir(requests.exceptions)
Also notice that by default requests doesn't raise an exception if the status is not 200:
In [9]: requests.get('https://google.com/admin')
Out[9]: <Response [503]>
There is a raise_for_status() method that does this:
In [10]: resp = requests.get('https://google.com/admin')
In [11]: resp
Out[11]: <Response [503]>
In [12]: resp.raise_for_status()
...
HTTPError: 503 Server Error: Service Unavailable
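Putting those pieces together, a minimal sketch of what the asker was after (the 404 only becomes an HTTPError once you call raise_for_status() inside the try):
import requests

try:
    response = requests.get('http://google.com/admin')  # note the scheme
    response.raise_for_status()  # raises requests.HTTPError for 4xx/5xx
except requests.HTTPError as e:
    print('HTTP ERROR %s occurred' % e.response.status_code)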

Running your code in Python 2.7.5:
import requests

try:
    response = requests.get('google.com/admin')  # Should return 404
except requests.HTTPError, e:
    print 'HTTP ERROR %s occurred' % e.code
    print e
Results in:
File "C:\Python27\lib\site-packages\requests\models.py", line 291, in prepare_url
raise MissingSchema("Invalid URL %r: No schema supplied" % url)
requests.exceptions.MissingSchema: Invalid URL u'google.com/admin': No schema supplied
To get your code to pick up this exception you need to add:
except requests.exceptions.MissingSchema as e:
    print 'Missing schema occurred'
    print e
Note also that what is actually missing is the URL scheme (http:// or https://); requests just spells it "schema" in the exception name.
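If you'd rather recover than report the error, one option is to prepend a scheme when none is supplied; a sketch, where get_with_scheme is a hypothetical helper and not part of requests:
import requests
from urlparse import urlparse  # Python 3: from urllib.parse import urlparse

def get_with_scheme(url):
    # Hypothetical helper: prepend http:// when the URL carries
    # no scheme, then fetch it with requests
    if not urlparse(url).scheme:
        url = 'http://' + url
    return requests.get(url)

print(get_with_scheme('google.com/admin').status_code)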

Related

Sending the failed JSON objects into a file when the post requests fail

I have a JSON post request that I am sending to an API for each row in a dataframe. I want to write the unsuccessful JSON objects to another text file so I can re-process them once the entire dataframe has been looped over.
This is the sample code that I have currently for checking different kinds of exceptions:
for i in df.index:
    print "This is a JSON object."
    payload = '''{"individualInfo":[%s]}''' % (df.loc[i].to_json(orient='columns'))
    print payload
    try:
        r = requests.post(api_url, data=payload, timeout=(0.2, 20))
        print r.json()
        print r.raise_for_status()
    except requests.exceptions.HTTPError as errh:
        print "HTTP Error: %s" % errh
    except requests.exceptions.ConnectionError as errc:
        print "Error Connecting: %s" % errc
    except requests.exceptions.Timeout as errt:
        print "Timeout error: %s" % errt
I want each payload to be thrown into 2 different files based on whether it got posted successfully or not.
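A minimal sketch of one approach (success.txt and failed.txt are illustrative names; df and api_url come from the question):
import requests

with open('success.txt', 'a') as ok_file, open('failed.txt', 'a') as bad_file:
    for i in df.index:
        payload = '''{"individualInfo":[%s]}''' % (df.loc[i].to_json(orient='columns'))
        try:
            r = requests.post(api_url, data=payload, timeout=(0.2, 20))
            r.raise_for_status()  # turn 4xx/5xx into HTTPError
            ok_file.write(payload + '\n')
        except requests.exceptions.RequestException:
            # RequestException is the base class for HTTPError,
            # ConnectionError and Timeout, so one handler covers them all
            bad_file.write(payload + '\n')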

404 error received for working url using python urllib2

I am trying to fetch the following URL: ow dot ly/LApK30cbLKj. It works in a browser, but I am getting an HTTP 404 error:
my_url = 'ow' + '.ly/LApK30cbLKj'  # SO won't accept an ow.ly url
headers = {'User-Agent': user_agent}
request = urllib2.Request(my_url, "", headers)
response = None
try:
    response = urllib2.urlopen(request)
except urllib2.HTTPError, e:
    print '+++HTTPError = ' + str(e.code)
Is there something I can do to get this URL with an HTTP 200 status, as I do when I visit it in a browser?
Your example works for me, except you need to add http://
my_url = 'http://ow' + '.ly/LApK30cbLKj'
You need to specify the URL's protocol. When you visit the URL in a browser, the default protocol is HTTP; urllib2, however, doesn't assume that for you. You need to add http:// to the beginning of the URL, otherwise this error is raised:
ValueError: unknown url type: ow.ly/LApK30cbLKj
As #enjoi mentioned, I used requests:
import requests

result = None
try:
    result = requests.get(agen_cont.source_url)
except requests.exceptions.Timeout as e:
    print '+++timeout exception: '
    print e
except requests.exceptions.TooManyRedirects as e:
    print '+++ too many redirects exception: '
    print e
except requests.exceptions.RequestException as e:
    print '+++ request exception: '
    print e
except Exception:
    import traceback
    print '+++generic exception: ' + traceback.format_exc()

if result:
    final_url = result.url
    print final_url
    response = result.content

How to use exceptions for different cases with python requests

I have this code
try:
    response = requests.post(url, data=json.dumps(payload))
except (ConnectionError, HTTPError):
    msg = "Connection problem"
    raise Exception(msg)
Now I want the following:
if status_code == 401, login() and then retry the request
if status_code == 400, handle the response as normal
if status_code == 500, it's a server problem: retry the request and, if that still fails, raise an exception
These are status codes, and I don't know how to mix status codes with exceptions. I also don't know which codes are covered by HTTPError.
requests has a method called raise_for_status, available on your Response object, which will raise an HTTPError exception if the response's status code is in the 4xx or 5xx range.
Documentation for raise_for_status is in the requests API reference.
So, what you can do is, after you make your call:
response = requests.post(url, data=json.dumps(payload))
you then call raise_for_status:
response.raise_for_status()
Now, you are already catching this exception, which is great, so all you have to do is check which status code you have in your error. This is available to you in two ways: you can get it from the exception object, or from the response object. Here is an example:
from requests import get
from requests.exceptions import HTTPError

try:
    r = get('http://google.com/asdf')
    r.raise_for_status()
except HTTPError as e:
    # Get the code from the exception object like this
    print(e.response.status_code)
    # Or from the response object, which is still available as r
    print(r.status_code)
So, with the above in mind, you can now use the status codes in your conditional statements
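For example, here is a sketch of the three cases from the question (login() is the asker's own re-authentication function; post_with_handling and the single-retry policy are illustrative):
import json
import requests

def post_with_handling(url, payload):
    # Illustrative helper; login() is the asker's re-authentication function
    for attempt in range(2):
        response = requests.post(url, data=json.dumps(payload))
        if response.status_code == 401:
            login()      # re-authenticate, then retry once
            continue
        if response.status_code == 500:
            continue     # server problem: retry once
        return response  # 400 and everything else is handled as a normal response
    if response.status_code == 500:
        raise Exception("Server problem, retry failed")
    return response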
https://docs.python.org/2/library/urllib2.html#urllib2.URLError
code
    An HTTP status code as defined in RFC 2616. This numeric value corresponds to a value found in the dictionary of codes as found in BaseHTTPServer.BaseHTTPRequestHandler.responses.
You can get the error code from an HTTPError via its code member, like so:
try:
    # ...
except HTTPError as ex:
    status_code = ex.code

In Python, how do I use urllib to see if a website is 404 or 200?

How do I get the HTTP status code of a response through urllib?
The getcode() method (added in Python 2.6) returns the HTTP status code that was sent with the response, or None if the URL is not an HTTP URL.
>>> a=urllib.urlopen('http://www.google.com/asdfsf')
>>> a.getcode()
404
>>> a=urllib.urlopen('http://www.google.com/')
>>> a.getcode()
200
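Since urllib (unlike urllib2) doesn't raise an exception on a 404, you can branch on the code directly, as in this Python 2 sketch:
import urllib

a = urllib.urlopen('http://www.google.com/asdfsf')
if a.getcode() == 404:
    print('not found')
elif a.getcode() == 200:
    print('ok')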
You can use urllib2 as well:
import urllib2

req = urllib2.Request('http://www.python.org/fish.html')
try:
    resp = urllib2.urlopen(req)
except urllib2.HTTPError as e:
    if e.code == 404:
        pass  # do something...
    else:
        pass  # ...
except urllib2.URLError as e:
    pass  # Not an HTTP-specific error (e.g. connection refused)
else:
    # 200
    body = resp.read()
Note that HTTPError is a subclass of URLError and stores the HTTP status code; catch it before URLError, since an except URLError clause would match HTTPError too.
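You can verify that relationship in the interpreter:
import urllib2

print(issubclass(urllib2.HTTPError, urllib2.URLError))  # True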
For Python 3:
import urllib.request, urllib.error

url = 'http://www.google.com/asdfsf'
try:
    conn = urllib.request.urlopen(url)
except urllib.error.HTTPError as e:
    # Return code error (e.g. 404, 501, ...)
    print('HTTPError: {}'.format(e.code))
except urllib.error.URLError as e:
    # Not an HTTP-specific error (e.g. connection refused)
    print('URLError: {}'.format(e.reason))
else:
    # 200
    print('good')
import urllib2

try:
    fileHandle = urllib2.urlopen('http://www.python.org/fish.html')
    data = fileHandle.read()
    fileHandle.close()
except urllib2.URLError, e:
    print 'you got an error with the code', e
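For completeness, the same check with the third-party requests library (covered in the questions above) is a one-liner:
import requests

print(requests.get('http://www.google.com/asdfsf').status_code)  # e.g. 404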

Overriding urllib2.HTTPError or urllib.error.HTTPError and reading response HTML anyway

I receive an 'HTTP Error 500: Internal Server Error' response, but I still want to read the data inside the error HTML.
With Python 2.6, I normally fetch a page using:
import urllib2
url = "http://google.com"
data = urllib2.urlopen(url)
data = data.read()
When attempting to use this on the failing URL, I get the exception urllib2.HTTPError:
urllib2.HTTPError: HTTP Error 500: Internal Server Error
How can I fetch such error pages (with or without urllib2) while they are returning Internal Server Errors?
Note that with Python 3, the corresponding exception is urllib.error.HTTPError.
The HTTPError is a file-like object. You can catch it and then read its contents.
try:
    resp = urllib2.urlopen(url)
    contents = resp.read()
except urllib2.HTTPError, error:
    contents = error.read()
If you mean you want to read the body of the 500:
request = urllib2.Request(url, data, headers)
try:
    resp = urllib2.urlopen(request)
    print resp.read()
except urllib2.HTTPError, error:
    print "ERROR: ", error.read()
In your case, you don't need to build up the request. Just do
try:
    resp = urllib2.urlopen(url)
    print resp.read()
except urllib2.HTTPError, error:
    print "ERROR: ", error.read()
So, you don't override urllib2.HTTPError; you just handle the exception.
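As the question notes, the Python 3 equivalent is urllib.error.HTTPError; the same pattern there looks like this (a direct translation, standard library only, with an illustrative URL):
import urllib.request
import urllib.error

url = 'http://example.com/'  # illustrative URL
try:
    resp = urllib.request.urlopen(url)
    contents = resp.read()
except urllib.error.HTTPError as error:
    # the HTTPError object is file-like, so the error page body is readable
    contents = error.read()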
import urllib2

alist = ['http://someurl.com']

def testUrl():
    # collect every URL that failed, together with its error reason
    errList = []
    for URL in alist:
        try:
            urllib2.urlopen(URL)
        except urllib2.URLError, err:
            errList.append(URL + " " + str(err.reason))
    return "".join(errList)

testUrl()
