There is a json() method in the requests library.
I want to handle its exceptions in case something suddenly goes wrong: for example, the server responded with something other than JSON, or there are problems with encodings, and so on.
I dug into the documentation, then into the sources, and found that requests itself is tied to two libraries at once:
try:
    import simplejson as json
except ImportError:
    import json
Accordingly, there are two possible exception types, one from each library.
And I am writing a general-purpose library, which I would like to work regardless of the system it runs on, the libraries installed, etc.
How do I correctly write, in this case,
try:
    answer = requests.get(url, params=param, headers=headers).json()
except (???, ???, ???):
    do_my_function(incorrect_answer)
so that it does not depend on which library was actually imported, along with its specific exceptions?
Since you've already dug into the source, you should follow it all the way. The documentation can also help.
In the [documentation][1] for requests' json() it says:
In case the JSON decoding fails, r.json() raises an exception. For example, if the response gets a 204 (No Content), or if the response contains invalid JSON, attempting r.json() raises ValueError: No JSON object could be decoded.
We must catch the ValueError exception:
try:
    answer = requests.get(url, params=param, headers=headers).json()
except ValueError:
    do_my_function(incorrect_answer)
If you look at the json and simplejson sources, you can see that both packages use a more precise exception inherited from ValueError: JSONDecodeError. Thus, you can do this:
try:
    from simplejson import JSONDecodeError
except ImportError:
    from json import JSONDecodeError

...

try:
    answer = requests.get(url, params=param, headers=headers).json()
except JSONDecodeError:
    do_my_function(incorrect_answer)
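Because both packages derive their decode error from ValueError, the fallback is easy to verify with the stdlib json module alone; a minimal sketch (the function name is mine):

```python
import json

def safe_loads(text):
    """Parse JSON, returning None on invalid input.

    json.JSONDecodeError (and simplejson's equivalent) subclasses
    ValueError, so catching ValueError works whichever backend
    requests happened to import.
    """
    try:
        return json.loads(text)
    except ValueError:
        return None
```

The same reasoning is why catching plain ValueError, as above, is the most portable option.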
[1]: https://requests.readthedocs.io/en/master/user/quickstart/#json-response-content
I am new to this so please help me. I am using urllib.request to open and read webpages. Can someone tell me how my code can handle redirects, timeouts, and badly formed URLs?
I have sort of found a way to handle timeouts, but I am not sure if it is correct. Is it? All opinions are welcome! Here it is:
import logging
import urllib.request
from socket import timeout
from urllib.error import HTTPError, URLError

try:
    text = urllib.request.urlopen(url, timeout=10).read().decode('utf-8')
except (HTTPError, URLError) as error:
    logging.error('Data of %s not retrieved because %s\nURL: %s', name, error, url)
except timeout:
    logging.error('socket timed out - URL %s', url)
Please help me as I am new to this. Thanks!
Take a look at the urllib error page.
So for the following behaviours:
Redirect: HTTP code 302, so that's an HTTPError with a code. You could also use the HTTPRedirectHandler to follow the redirect instead of failing.
Timeouts: You have that correct.
Badly formed URLs: That's a URLError.
Here's the code I would use:
from socket import timeout
import urllib.error
import urllib.request

try:
    text = urllib.request.urlopen("http://www.google.com", timeout=0.1).read()
except urllib.error.HTTPError as error:
    print(error)
except urllib.error.URLError as error:
    print(error)
except timeout as error:
    print(error)
I can't find a URL that redirects, so I'm not exactly sure how to check whether the HTTPError is a redirect.
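One way to make a 3xx visible is to disable the default redirect handling, so urlopen raises HTTPError for the redirect response itself; a sketch under that assumption (the class and helper names are mine):

```python
import urllib.error
import urllib.request

class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Disable automatic redirect handling so a 3xx surfaces as HTTPError."""
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # None means "don't follow"; urlopen then raises HTTPError

def is_redirect(error):
    """True if an HTTPError carries a 3xx redirect status."""
    return 300 <= error.code < 400

opener = urllib.request.build_opener(NoRedirect)
# opener.open('http://example.com/old-path') would now raise HTTPError
# for a 301/302 response, and is_redirect(error) would return True.
```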
You might find the requests package is a bit easier to use (it's suggested on the urllib page).
Using the requests package I was able to find a better solution. The only exceptions you need to handle are:
import requests

try:
    r = requests.get(url, timeout=5)
except requests.exceptions.Timeout:
    pass  # Maybe set up for a retry, or continue in a retry loop
except requests.exceptions.TooManyRedirects:
    pass  # Tell the user their URL was bad and try a different one
except requests.exceptions.ConnectionError:
    pass  # Connection could not be completed
except requests.exceptions.RequestException:
    raise  # catastrophic error: bail
And to get the text of that page, all you need to do is:
r.text
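Tying those handlers together, one possible shape is a small retry helper; fetch_text and its retry policy are my assumptions, not part of the requests API:

```python
import requests

def fetch_text(url, retries=3):
    """Fetch a page's text, retrying on timeouts (hypothetical helper)."""
    for _ in range(retries):
        try:
            r = requests.get(url, timeout=5)
            r.raise_for_status()  # turn HTTP 4xx/5xx into an HTTPError
            return r.text
        except requests.exceptions.Timeout:
            continue  # transient; try again
        except requests.exceptions.RequestException:
            raise  # anything else is fatal here
    return None
```

Because every requests exception derives from RequestException, the final except clause is a reliable catch-all.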
I'm looking for a way, in Python, to get the HTTP reason phrase from the status code. Is there a way that is better than building the dictionary yourself?
Something like:
>>> print http.codemap[404]
'Not found'
In Python 2.7, the httplib module has what you need:
>>> import httplib
>>> httplib.responses[400]
'Bad Request'
Constants are also available:
httplib.responses[httplib.NOT_FOUND]
httplib.responses[httplib.OK]
Update
#shao.lo added a useful comment below. The http.client module can be used:
# For Python 3 use
import http.client
http.client.responses[http.client.NOT_FOUND]
This problem is solved from Python 3.5 onward: the http library now has a built-in enum called HTTPStatus. You can easily get HTTP status details such as the status code and description. Here is example code for HTTPStatus.
Python 3:
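A minimal, self-contained illustration (the attribute names come from the stdlib http module):

```python
from http import HTTPStatus

# Each member bundles the numeric code, reason phrase, and a description.
status = HTTPStatus.NOT_FOUND
print(status.value)        # 404
print(status.phrase)       # Not Found
print(status.description)  # Nothing matches the given URI

# You can also look a member up by its numeric code:
print(HTTPStatus(200).phrase)  # OK
```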
You could use HTTPStatus(<error-code>).phrase to obtain the description for a given code value. E.g. like this:
try:
    response = opener.open(url, data)
except HTTPError as e:
    print(f'In func my_open got HTTP {e.code} {HTTPStatus(e.code).phrase}')
Would give for example:
In func my_open got HTTP 415 Unsupported Media Type
Which answers the question. However, a much more direct solution can be obtained from HTTPError.reason:
try:
    response = opener.open(url, data)
except HTTPError as e:
    print(f'In func my_open got HTTP {e.code} {e.reason}')
A 404 error indicates that the page may have been moved or taken offline. It differs from a 400 error, which indicates that the request itself was malformed.
I'm currently doing a lot of work with BigQuery, and am using a lot of try... except.... It looks like just about every error I get back from BigQuery is an apiclient.errors.HttpError, but with different strings attached, e.g.:
<HttpError 409 when requesting https://www.googleapis.com/bigquery/v2/projects/some_id/datasets/some_dataset/tables?alt=json returned "Already Exists: Table some_id:some_dataset.some_table">
<HttpError 404 when requesting https://www.googleapis.com/bigquery/v2/projects/some_id/jobs/sdfgsdfg?alt=json returned "Not Found: Job some_id:sdfgsdfg">
among many others. Right now the only way I see to handle these is to run regexes on the error messages, but this is messy and definitely not ideal. Is there a better way?
BigQuery is a REST API; the errors it returns follow standard HTTP error conventions.
In python, an HttpError has a resp.status field that returns the HTTP status code.
As you show above, 409 is 'conflict', 404 is 'not found'.
For example:
import time

from googleapiclient.errors import HttpError

try:
    ...
except HttpError as err:
    # If the error is a rate limit or connection error,
    # wait and try again.
    if err.resp.status in [403, 500, 503]:
        time.sleep(5)
    else:
        raise
The response body is also a JSON object; an even better way is to parse the JSON and read the error reason field:
if err.resp.get('content-type', '').startswith('application/json'):
    reason = json.loads(err.content).get('error').get('errors')[0].get('reason')
This can be:
notFound, duplicate, accessDenied, invalidQuery, backendError, resourcesExceeded, invalid, quotaExceeded, rateLimitExceeded, timeout, etc.
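A sketch of that parsing with a retry decision on top; which reasons count as retryable is my assumption, not an official list:

```python
import json

# Assumed-retryable reasons; adjust for your own workload.
RETRYABLE = {'backendError', 'rateLimitExceeded', 'quotaExceeded', 'timeout'}

def error_reason(content):
    """Pull the first 'reason' out of a standard Google API error body."""
    payload = json.loads(content)
    return payload.get('error', {}).get('errors', [{}])[0].get('reason')

def should_retry(content):
    """Decide whether to retry based on the parsed reason."""
    return error_reason(content) in RETRYABLE
```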
Google Cloud now provides exception handlers:
from google.api_core.exceptions import AlreadyExists, NotFound

try:
    ...
except AlreadyExists:
    ...
except NotFound:
    ...
This should prove more exact in catching the details of the error.
Please reference this source code to find other exceptions to utilize: http://google-cloud-python.readthedocs.io/en/latest/_modules/google/api_core/exceptions.html
The following URL returns the expected response in the browser:
http://ws.audioscrobbler.com/2.0/?method=user.getinfo&user=notonfile99&api_key=8e9de6bd545880f19d2d2032c28992b4
<lfm status="failed">
<error code="6">No user with that name was found</error>
</lfm>
But I am unable to access the XML in Python via the following code, as an HTTPError exception is raised ("HTTP Error 400: Bad Request"):
import urllib2
urllib2.urlopen('http://ws.audioscrobbler.com/2.0/?method=user.getinfo&user=notonfile99&api_key=8e9de6bd545880f19d2d2032c28992b4')
I see that I can work around this by using urlretrieve rather than urlopen, but then the response gets written to disk.
Is there a way, using just the Python 2.7 standard library, to get hold of the XML response without having to read it from disk and do housekeeping?
I see that this question has been asked before in a PHP context, but I don't know how to apply the answer to Python:
DOMDocument load on a page returning 400 Bad Request status
Copying from here: http://www.voidspace.org.uk/python/articles/urllib2.shtml#httperror
The exception that is thrown contains the full body of the error page:
#!/usr/bin/python2
import urllib2

try:
    resp = urllib2.urlopen('http://ws.audioscrobbler.com/2.0/?method=user.getinfo&user=notonfile99&api_key=8e9de6bd545880f19d2d2032c28992b4')
except urllib2.HTTPError, e:
    print e.code
    print e.read()
I'm having a little trouble creating a script that works with URLs. I'm using urllib.urlopen() to get the content of a desired URL, but some of these URLs require authentication, and urlopen prompts me to type in my username and then my password.
What I need is to ignore every URL that requires authentication: just skip it and continue. Is there a way to do this?
I was wondering about catching the HTTPError exception, but in fact the exception is handled inside urlopen(), so that's not working.
Thanks for every reply.
You are right about the urllib2.HTTPError exception:
exception urllib2.HTTPError
Though being an exception (a subclass of URLError), an HTTPError can also function as a non-exceptional file-like return value (the same thing that urlopen() returns). This is useful when handling exotic HTTP errors, such as requests for authentication.
code
An HTTP status code as defined in RFC 2616. This numeric value corresponds to a value found in the dictionary of codes as found in BaseHTTPServer.BaseHTTPRequestHandler.responses.
The code attribute of the exception can be used to verify that authentication is required - code 401.
>>> try:
...     conn = urllib2.urlopen('http://www.example.com/admin')
...     # read conn and process data
... except urllib2.HTTPError, x:
...     print 'Ignoring', x.code
...
Ignoring 401
>>>
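For reference, in Python 3 the same check moves to urllib.error; a minimal sketch (the helper names are mine):

```python
import urllib.error
import urllib.request

def should_skip(error):
    """True when an HTTPError means the URL demands authentication (401)."""
    return error.code == 401

def fetch_or_none(url):
    """Open a URL, returning None for pages that require authentication."""
    try:
        return urllib.request.urlopen(url)
    except urllib.error.HTTPError as x:
        if should_skip(x):
            print('Ignoring', url)
            return None
        raise
```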