Cannot read urllib error message once it is read() - python

My problem is with error handling of the python urllib error object. I am unable to read the error message while still keeping it intact in the error object, for it to be consumed later.
response = urllib.request.urlopen(request) # request that will raise an error
response.read()
response.read() # is empty now
# Also tried seek(0), that does not work either.
So this how I intend to use it, but when the Exception bubbles up, the.read() second time is empty.
try:
response = urllib.request.urlopen(request)
except urllib.error.HTTPError as err:
self.log.exception(err.read())
raise err
I tried making a deepcopy of the err object,
import copy
try:
response = urllib.request.urlopen(request)
except urllib.error.HTTPError as err:
err_obj_copy = copy.deepcopy(err)
self.log.exception(
"Method:{}\n"
"URL:{}\n"
"Data:{}\n"
"Details:{}\n"
"Headers:{}".format(method, url, data, err_obj_copy.read(), headers))
raise err
but copy is unable to make a deepcopy and throws an error -
TypeError: __init__() missing 5 required positional arguments: 'url', 'code', 'msg', 'hdrs', and 'fp'.
How do I read the error message, while still keeping it intact in the object?
I do know how to do it using requests, but I am stuck with legacy code and need to make it work with urllib

This is what I did. Worked for me.
When reading the error for the first time, save it to a variable like this: msg = response.read().decode('utf8'). You can then create a new HTTPError instance, with the message, and propagate it.
resp = urllib.request.urlopen(request)
msg = resp.read().decode('utf8')
self.log.exception(msg)
raise HTTPError(resp.url, resp.code, resp.reason, resp.headers, io.BytesIO(bytes(msg, 'utf8')))

The error object may read from the network. Network is not seekable -- you can't go back in the general case.
You could replace err with a new HTTPError instance that reads from a buffer (like io.BytesIO()) instead of the network e.g., (not tested):
content = err.read()
self.log.exception(content)
raise HTTPError(err.url, err.code, err.reason, err.headers, io.BytesIO(content))
Though I'm not sure that you should -- handle the error in a single place instead e.g., reraise a more application specific exception or leave the logging to an upstream handler.

Related

When to use `raise_for_status` vs `status_code` testing

I have always used:
r = requests.get(url)
if r.status_code == 200:
# my passing code
else:
# anything else, if this even exists
Now I was working on another issue and decided to allow for other errors and am instead now using:
try:
r = requests.get(url)
r.raise_for_status()
except requests.exceptions.ConnectionError as err:
# eg, no internet
raise SystemExit(err)
except requests.exceptions.HTTPError as err:
# eg, url, server and other errors
raise SystemExit(err)
# the rest of my code is going here
With the exception that various other errors could be tested for at this level, is one method any better than the other?
Response.raise_for_status() is just a built-in method for checking status codes and does essentially the same thing as your first example.
There is no "better" here, just about personal preference with flow control. My preference is toward try/except blocks for catching errors in any call, as this informs the future programmer that these conditions are some sort of error. If/else doesn't necessarily indicate an error when scanning code.
Edit: Here's my quick-and-dirty pattern.
import time
import requests
from requests.exceptions import HTTPError
url = "https://theurl.com"
retries = 3
for n in range(retries):
try:
response = requests.get(url)
response.raise_for_status()
break
except HTTPError as exc:
code = exc.response.status_code
if code in [429, 500, 502, 503, 504]:
# retry after n seconds
time.sleep(n)
continue
raise
However, in most scenarios, I subclass requests.Session, make a custom HTTPAdapter that handles exponential backoffs, and the above lives in an overridden requests.Session.request method. An example of that can be seen here.
Almost always, raise_for_status() is better.
The main reason is that there is a bit more to it than testing status_code == 200, and you should be making best use of tried-and-tested code rather than creating your own implementation.
For instance, did you know that there are actually five different 'success' codes defined by the HTTP standard? Four of those 'success' codes will be misinterpreted as failure by testing for status_code == 200.
If you are not sure, follow the Ian Goldby's answer.
...however please be aware that raise_for_status() is not some magical or exceptionally smart solution - it's a very simple function that decodes the response body and throws an exception for HTTP codes 400-599, distinguishing client-side and server-side errors (see its code here).
And especially the client-side error responses may contain valuable information in the response body that you may want to process. For example a HTTP 400 Bad Request response may contain the error reason.
In such a case it may be better to not use raise_for_status() but instead cover all the cases by yourself.
Example code
try:
r = requests.get(url)
# process the specific codes from the range 400-599
# that you are interested in first
if r.status_code == 400:
invalid_request_reason = r.text
print(f"Your request has failed because: {invalid_request_reason}")
return
# this will handle all other errors
elif r.status_code > 400:
print(f"Your request has failed with status code: {r.status_code}")
return
except requests.exceptions.ConnectionError as err:
# eg, no internet
raise SystemExit(err)
# the rest of my code is going here
Real-world use case
PuppetDB's API using the Puppet Query Language (PQL) responds with a HTTP 400 Bad Request to a syntactically invalid query with a very precise info where is the error.
Request query:
nodes[certname] { certname == "bastion" }
Body of the HTTP 400 response:
PQL parse error at line 1, column 29:
nodes[certname] { certname == "bastion" }
^
Expected one of:
[
false
true
#"[0-9]+"
-
'
"
#"\s+"
See my Pull Request to an app that uses this API to make it show this error message to a user here, but note that it doesn't exactly follow the example code above.
Better is somewhat subjective; both can get the job done. That said, as a relatively inexperienced programmer I prefer the Try / Except form.
For me, the T / E reminds me that requests don't always give you what you expect (in a way that if / else doesn't - but that could just be me).
raise_for_status() also lets you easily implement as many or as few different actions for the different error types (.HTTPError, .ConnectionError) as you need.
In my current project, I've settled on the form below, as I'm taking the same action regardless of cause, but am still interested to know the cause:
try:
...
except requests.exceptions.RequestException as e:
raise SystemExit(e) from None
Toy implementation:
import requests
def http_bin_repsonse(status_code):
sc = status_code
try:
url = "http://httpbin.org/status/" + str(sc)
response = requests.post(url)
response.raise_for_status()
p = response.content
except requests.exceptions.RequestException as e:
print("placeholder for save file / clean-up")
raise SystemExit(e) from None
return response, p
response, p = http_bin_repsonse(403)
print(response.status_code)

What exactly is 'e' and what does e.code() or e.read() do?

try:
response = urllib2.urlopen(request)
except urllib2.URLError as e:
response = json.loads(e.read())
return error(e.code(),response['errors'][0]['message'])
response = json.loads(response.read())
if 'errors' in response:
return error(response['ErrorCode'],response['Error'])
Here is the piece of code I am working with and you can help me out referring to this piece.
e is the caught exception, here an instance of the urllib2.URLError class or a subclass thereof. The code expects that instance to define the e.read() and e.code() methods.
However, it has some errors. It actually assumes it is catching the urllib2.HTTPError exception, a subclass of URLError. The exception handler certainly will catch such exceptions, but it could also be the base URLError, in which case there won't be an e.read() method! It also tries to call HTTPError.code, which is not a method but an attribute.
The HTTPError exception is thrown for HTTP error codes, so only when there was a response from the server. e.read() lets you read the response body from the socket, and e.code is the HTTP code the server responded with. From the documentation:
Though being an exception (a subclass of URLError), an HTTPError can also function as a non-exceptional file-like return value (the same thing that urlopen() returns). This is useful when handling exotic HTTP errors, such as requests for authentication.
For the code to work in all cases, it would have to be corrected to:
try:
response = urllib2.urlopen(request)
except urllib2.HTTPError as e:
response = json.loads(e.read())
return error(e.code, response['errors'][0]['message'])
perhaps with an additional except urllib2.URLError as e: block to handle errors that don't involve a HTTP response.

How to handle requests exception in Python [duplicate]

try:
r = requests.get(url, params={'s': thing})
except requests.ConnectionError, e:
print(e)
Is this correct? Is there a better way to structure this? Will this cover all my bases?
Have a look at the Requests exception docs. In short:
In the event of a network problem (e.g. DNS failure, refused connection, etc), Requests will raise a ConnectionError exception.
In the event of the rare invalid HTTP response, Requests will raise an HTTPError exception.
If a request times out, a Timeout exception is raised.
If a request exceeds the configured number of maximum redirections, a TooManyRedirects exception is raised.
All exceptions that Requests explicitly raises inherit from requests.exceptions.RequestException.
To answer your question, what you show will not cover all of your bases. You'll only catch connection-related errors, not ones that time out.
What to do when you catch the exception is really up to the design of your script/program. Is it acceptable to exit? Can you go on and try again? If the error is catastrophic and you can't go on, then yes, you may abort your program by raising SystemExit (a nice way to both print an error and call sys.exit).
You can either catch the base-class exception, which will handle all cases:
try:
r = requests.get(url, params={'s': thing})
except requests.exceptions.RequestException as e: # This is the correct syntax
raise SystemExit(e)
Or you can catch them separately and do different things.
try:
r = requests.get(url, params={'s': thing})
except requests.exceptions.Timeout:
# Maybe set up for a retry, or continue in a retry loop
except requests.exceptions.TooManyRedirects:
# Tell the user their URL was bad and try a different one
except requests.exceptions.RequestException as e:
# catastrophic error. bail.
raise SystemExit(e)
As Christian pointed out:
If you want http errors (e.g. 401 Unauthorized) to raise exceptions, you can call Response.raise_for_status. That will raise an HTTPError, if the response was an http error.
An example:
try:
r = requests.get('http://www.google.com/nothere')
r.raise_for_status()
except requests.exceptions.HTTPError as err:
raise SystemExit(err)
Will print:
404 Client Error: Not Found for url: http://www.google.com/nothere
One additional suggestion to be explicit. It seems best to go from specific to general down the stack of errors to get the desired error to be caught, so the specific ones don't get masked by the general one.
url='http://www.google.com/blahblah'
try:
r = requests.get(url,timeout=3)
r.raise_for_status()
except requests.exceptions.HTTPError as errh:
print ("Http Error:",errh)
except requests.exceptions.ConnectionError as errc:
print ("Error Connecting:",errc)
except requests.exceptions.Timeout as errt:
print ("Timeout Error:",errt)
except requests.exceptions.RequestException as err:
print ("OOps: Something Else",err)
Http Error: 404 Client Error: Not Found for url: http://www.google.com/blahblah
vs
url='http://www.google.com/blahblah'
try:
r = requests.get(url,timeout=3)
r.raise_for_status()
except requests.exceptions.RequestException as err:
print ("OOps: Something Else",err)
except requests.exceptions.HTTPError as errh:
print ("Http Error:",errh)
except requests.exceptions.ConnectionError as errc:
print ("Error Connecting:",errc)
except requests.exceptions.Timeout as errt:
print ("Timeout Error:",errt)
OOps: Something Else 404 Client Error: Not Found for url: http://www.google.com/blahblah
Exception object also contains original response e.response, that could be useful if need to see error body in response from the server. For example:
try:
r = requests.post('somerestapi.com/post-here', data={'birthday': '9/9/3999'})
r.raise_for_status()
except requests.exceptions.HTTPError as e:
print (e.response.text)
Here's a generic way to do things which at least means that you don't have to surround each and every requests call with try ... except:
Basic version
# see the docs: if you set no timeout the call never times out! A tuple means "max
# connect time" and "max read time"
DEFAULT_REQUESTS_TIMEOUT = (5, 15) # for example
def log_exception(e, verb, url, kwargs):
# the reason for making this a separate function will become apparent
raw_tb = traceback.extract_stack()
if 'data' in kwargs and len(kwargs['data']) > 500: # anticipate giant data string
kwargs['data'] = f'{kwargs["data"][:500]}...'
msg = f'BaseException raised: {e.__class__.__module__}.{e.__class__.__qualname__}: {e}\n' \
+ f'verb {verb}, url {url}, kwargs {kwargs}\n\n' \
+ 'Stack trace:\n' + ''.join(traceback.format_list(raw_tb[:-2]))
logger.error(msg)
def requests_call(verb, url, **kwargs):
response = None
exception = None
try:
if 'timeout' not in kwargs:
kwargs['timeout'] = DEFAULT_REQUESTS_TIMEOUT
response = requests.request(verb, url, **kwargs)
except BaseException as e:
log_exception(e, verb, url, kwargs)
exception = e
return (response, exception)
NB
Be aware of ConnectionError which is a builtin, nothing to do with the class requests.ConnectionError*. I assume the latter is more common in this context but have no real idea...
When examining a non-None returned exception, requests.RequestException, the superclass of all the requests exceptions (including requests.ConnectionError), is not "requests.exceptions.RequestException" according to the docs. Maybe it has changed since the accepted answer.**
Obviously this assumes a logger has been configured. Calling logger.exception in the except block might seem a good idea but that would only give the stack within this method! Instead, get the trace leading up to the call to this method. Then log (with details of the exception, and of the call which caused the problem)
*I looked at the source code: requests.ConnectionError subclasses the single class requests.RequestException, which subclasses the single class IOError (builtin)
**However at the bottom of this page you find "requests.exceptions.RequestException" at the time of writing (2022-02)... but it links to the above page: confusing.
Usage is very simple:
search_response, exception = utilities.requests_call('get',
f'http://localhost:9200/my_index/_search?q={search_string}')
First you check the response: if it's None something funny has happened and you will have an exception which has to be acted on in some way depending on context (and on the exception). In Gui applications (PyQt5) I usually implement a "visual log" to give some output to the user (and also log simultaneously to the log file), but messages added there should be non-technical. So something like this might typically follow:
if search_response == None:
# you might check here for (e.g.) a requests.Timeout, tailoring the message
# accordingly, as the kind of error anyone might be expected to understand
msg = f'No response searching on |{search_string}|. See log'
MainWindow.the().visual_log(msg, log_level=logging.ERROR)
return
response_json = search_response.json()
if search_response.status_code != 200: # NB 201 ("created") may be acceptable sometimes...
msg = f'Bad response searching on |{search_string}|. See log'
MainWindow.the().visual_log(msg, log_level=logging.ERROR)
# usually response_json will give full details about the problem
log_msg = f'search on |{search_string}| bad response\n{json.dumps(response_json, indent=4)}'
logger.error(log_msg)
return
# now examine the keys and values in response_json: these may of course
# indicate an error of some kind even though the response returned OK (status 200)...
Given that the stack trace is logged automatically you often need no more than that...
Advanced version when json object returned
(... potentially sparing a great deal of boilerplate!)
To cross the Ts, when a json object is expected to be returned:
If, as above, an exception gives your non-technical user a message "No response", and a non-200 status "Bad response", I suggest that
a missing expected key in the response's JSON structure should give rise to a message "Anomalous response"
an out-of-range or strange value to a message "Unexpected response"
and the presence of a key such as "error" or "errors", with value True or whatever, to a message "Error response"
These may or may not prevent the code from continuing.
... and in fact to my mind it is worth making the process even more generic. These next functions, for me, typically cut down 20 lines of code using the above requests_call to about 3, and make most of your handling and your log messages standardised. More than a handful of requests calls in your project and the code gets a lot nicer and less bloated:
def log_response_error(response_type, call_name, deliverable, verb, url, **kwargs):
# NB this function can also be used independently
if response_type == 'No': # exception was raised (and logged)
if isinstance(deliverable, requests.Timeout):
MainWindow.the().visual_log(f'Time out of {call_name} before response received!', logging.ERROR)
return
else:
if isinstance(deliverable, BaseException):
# NB if response.json() raises an exception we end up here
log_exception(deliverable, verb, url, kwargs)
else:
# if we get here no exception has been raised, so no stack trace has yet been logged.
# a response has been returned, but is either "Bad" or "Anomalous"
response_json = deliverable.json()
raw_tb = traceback.extract_stack()
if 'data' in kwargs and len(kwargs['data']) > 500: # anticipate giant data string
kwargs['data'] = f'{kwargs["data"][:500]}...'
added_message = ''
if hasattr(deliverable, 'added_message'):
added_message = deliverable.added_message + '\n'
del deliverable.added_message
call_and_response_details = f'{response_type} response\n{added_message}' \
+ f'verb {verb}, url {url}, kwargs {kwargs}\nresponse:\n{json.dumps(response_json, indent=4)}'
logger.error(f'{call_and_response_details}\nStack trace: {"".join(traceback.format_list(raw_tb[:-1]))}')
MainWindow.the().visual_log(f'{response_type} response {call_name}. See log.', logging.ERROR)
def check_keys(req_dict_structure, response_dict_structure, response):
# so this function is about checking the keys in the returned json object...
# NB both structures MUST be dicts
if not isinstance(req_dict_structure, dict):
response.added_message = f'req_dict_structure not dict: {type(req_dict_structure)}\n'
return False
if not isinstance(response_dict_structure, dict):
response.added_message = f'response_dict_structure not dict: {type(response_dict_structure)}\n'
return False
for dict_key in req_dict_structure.keys():
if dict_key not in response_dict_structure:
response.added_message = f'key |{dict_key}| missing\n'
return False
req_value = req_dict_structure[dict_key]
response_value = response_dict_structure[dict_key]
if isinstance(req_value, dict):
# if the response at this point is a list apply the req_value dict to each element:
# failure in just one such element leads to "Anomalous response"...
if isinstance(response_value, list):
for resp_list_element in response_value:
if not check_keys(req_value, resp_list_element, response):
return False
elif not check_keys(req_value, response_value, response): # any other response value must be a dict (tested in next level of recursion)
return False
elif isinstance(req_value, list):
if not isinstance(response_value, list): # if the req_value is a list the reponse must be one
response.added_message = f'key |{dict_key}| not list: {type(response_value)}\n'
return False
# it is OK for the value to be a list, but these must be strings (keys) or dicts
for req_list_element, resp_list_element in zip(req_value, response_value):
if isinstance(req_list_element, dict):
if not check_keys(req_list_element, resp_list_element, response):
return False
if not isinstance(req_list_element, str):
response.added_message = f'req_list_element not string: {type(req_list_element)}\n'
return False
if req_list_element not in response_value:
response.added_message = f'key |{req_list_element}| missing from response list\n'
return False
# put None as a dummy value (otherwise something like {'my_key'} will be seen as a set, not a dict
elif req_value != None:
response.added_message = f'required value of key |{dict_key}| must be None (dummy), dict or list: {type(req_value)}\n'
return False
return True
def process_json_requests_call(verb, url, **kwargs):
# "call_name" is a mandatory kwarg
if 'call_name' not in kwargs:
raise Exception('kwarg "call_name" not supplied!')
call_name = kwargs['call_name']
del kwargs['call_name']
required_keys = {}
if 'required_keys' in kwargs:
required_keys = kwargs['required_keys']
del kwargs['required_keys']
acceptable_statuses = [200]
if 'acceptable_statuses' in kwargs:
acceptable_statuses = kwargs['acceptable_statuses']
del kwargs['acceptable_statuses']
exception_handler = log_response_error
if 'exception_handler' in kwargs:
exception_handler = kwargs['exception_handler']
del kwargs['exception_handler']
response, exception = requests_call(verb, url, **kwargs)
if response == None:
exception_handler('No', call_name, exception, verb, url, **kwargs)
return (False, exception)
try:
response_json = response.json()
except BaseException as e:
logger.error(f'response.status_code {response.status_code} but calling json() raised exception')
# an exception raised at this point can't truthfully lead to a "No response" message... so say "bad"
exception_handler('Bad', call_name, e, verb, url, **kwargs)
return (False, response)
status_ok = response.status_code in acceptable_statuses
if not status_ok:
response.added_message = f'status code was {response.status_code}'
log_response_error('Bad', call_name, response, verb, url, **kwargs)
return (False, response)
check_result = check_keys(required_keys, response_json, response)
if not check_result:
log_response_error('Anomalous', call_name, response, verb, url, **kwargs)
return (check_result, response)
Example call (NB with this version, the "deliverable" is either an exception or a response which delivers a json structure):
success, deliverable = utilities.process_json_requests_call('get',
f'{ES_URL}{INDEX_NAME}/_doc/1',
call_name=f'checking index {INDEX_NAME}',
required_keys={'_source':{'status_text': None}})
if not success: return False
# here, we know the deliverable is a response, not an exception
# we also don't need to check for the keys being present:
# the generic code has checked that all expected keys are present
index_status = deliverable.json()['_source']['status_text']
if index_status != 'successfully completed':
# ... i.e. an example of a 200 response, but an error nonetheless
msg = f'Error response: ES index {INDEX_NAME} does not seem to have been built OK: cannot search'
MainWindow.the().visual_log(msg)
logger.error(f'index |{INDEX_NAME}|: deliverable.json() {json.dumps(deliverable.json(), indent=4)}')
return False
So the "visual log" message seen by the user in the case of missing key "status_text", for example, would be "Anomalous response checking index XYZ. See log." (and the log would give a more detailed technical message, constructed automatically, including the stack trace but also details of the missing key in question).
NB
mandatory kwarg: call_name; optional kwargs: required_keys, acceptable_statuses, exception_handler.
the required_keys dict can be nested to any depth
finer-grained exception-handling can be accomplished by including a function exception_handler in kwargs (though don't forget that requests_call will have logged the call details, the exception type and __str__, and the stack trace).
in the above I also implement a check on key "data" in any kwargs which may be logged. This is because a bulk operation (e.g. to populate an index in the case of Elasticsearch) can consist of enormous strings. So curtail to the first 500 characters, for example.
PS Yes, I do know about the elasticsearch Python module (a "thin wrapper" around requests). All the above is for illustration purposes.

How to get response on error in SUDS?

If i have some bad authorization data (for example wrong password) SUDS rises exception (400, u'Bad Request') from which i cant get anything, but in teh log is response, which contains data that password is wrong, but how to get this response? I tried like this:
except Exception as e:
print str(e)
print self._client.last_received()
It prints:
(400, u'Bad Request')
None
But in log there is long xml which contains <SOAP-ENV:Reason><SOAP-ENV:Text xml:lang="en">Sender not authorized</SOAP-ENV:Text></SOAP-ENV:Reason>
I am pulling this out of a comment and into an answer because of the code block.
import suds.client
try:
auth_url = "https://url.to.my.service/authenticator?wsdl"
auth_client = suds.client.Client(auth_url)
cookie = auth_client.service.authenticate(user,password)
except Exception as e:
print str(e)
print auth_client.last_received()
Using this code, I receive the appropriate response from my service if I pass an invalid password:
Server raised fault: 'error.pwd.incorrect'
None
And an appropriate response if I pass an invalid user id:
Server raised fault: 'error.uid.missing'
None
Something you may want to consider doing, is changing your except statement to catch suds.WebFault instead of the generic exception. There may be something else that is occurring and triggering your exception block.
One other thing that may help with your issue, is to pass faults=True in your Client() call.
The Client can be configured to throw web faults as WebFault or to
return a tuple (, )
The code I posted above would look like this:
auth_client = suds.client.Client(auth_url, faults=True)

How to keep a program running if there is an Traceback error

I made a simple script for amusment that takes the latest comment from http://www.reddit.com/r/random/comments.json?limit=1 and speaks through espeak. I ran into a problem however. If Reddit fails to give me the json data, which it commonly does, the script stops and gives a traceback. This is a problem, as it stops the script. Is there any sort of way to retry to get the json if it fails to load. I am using requests if that means anything
If you need it, here is the part of the code that gets the json data
url = 'http://www.reddit.com/r/random/comments.json?limit=1'
r = requests.get(url)
quote = r.text
body = json.loads(quote)['data']['children'][0]['data']['body']
subreddit = json.loads(quote)['data']['children'][0]['data']['subreddit']
For the vocabulary, the actual error you're having is an exception that has been thrown at some point in a program because of a detected runtime error, and the traceback is the program thread that tells you where the exception has been thrown.
Basically, what you want is an exception handler:
try:
url = 'http://www.reddit.com/r/random/comments.json?limit=1'
r = requests.get(url)
quote = r.text
body = json.loads(quote)['data']['children'][0]['data']['body']
subreddit = json.loads(quote)['data']['children'][0]['data']['subreddit']
except Exception as err:
print err
so that you jump over the part that needs the thing that couldn't work. Have a look at that doc as well: HandlingExceptions - Python Wiki
As pss suggests, if you want to retry after the url failed to load:
done = False
while not done:
try:
url = 'http://www.reddit.com/r/random/comments.json?limit=1'
r = requests.get(url)
except Exception as err:
print err
done = True
quote = r.text
body = json.loads(quote)['data']['children'][0]['data']['body']
subreddit = json.loads(quote)['data']['children'][0]['data']['subreddit']
N.B.: That solution may not be optimal, since if you're offline or the URL is always failing, it'll do an infinite loop. If you retry too fast and too much, Reddit may also ban you.
N.B. 2: I'm using the newest Python 3 syntax for exception handling, which may not work with Python older than 2.7.
N.B. 3: You may also want to choose a class other than Exception for the exception handling, to be able to select what kind of error you want to handle. It mostly depends on your app design, and given what you say, you might want to handle requests.exceptions.ConnectionError, but have a look at request's doc to choose the right one.
Here's what you may want, but please think this through and adapt it to your use case:
import requests
import time
import json
def get_reddit_comments():
retries = 5
while retries != 0:
try:
url = 'http://www.reddit.com/r/random/comments.json?limit=1'
r = requests.get(url)
break # if the request succeeded we get out of the loop
except requests.exceptions.ConnectionError as err:
print("Warning: couldn't get the URL: {}".format(err))
time.delay(1) # wait 1 second between two requests
retries -= 1
if retries == 0: # if we've done 5 attempts, we fail loudly
return None
return r.text
def use_data(quote):
if not quote:
print("could not get URL, despites multiple attempts!")
return False
data = json.loads(quote)
if 'error' in data.keys():
print("could not get data from reddit: error code #{}".format(quote['error']))
return False
body = data['data']['children'][0]['data']['body']
subreddit = data['data']['children'][0]['data']['subreddit']
# … do stuff with your data here
if __name__ == "__main__":
quote = get_reddit_comments()
if not use_data(quote):
print("Fatal error: Couldn't handle data receipt from reddit.")
sys.exit(1)
I hope this snippet will help you correctly design your program. And now that you've discovered exceptions, please always remember that exceptions are for handling things that shall stay exceptional. If you throw an exception at some point in one of your programs, always ask yourself if this is something that should happen when something unexpected happens (like a webpage not loading), or if it's an expected error (like a page loading but giving you an output that is not expected).

Categories

Resources