I'm trying to implement a method that makes a few attempts to download an image from a URL. To do so, I'm using the requests library. An example of my code is:
while attempts < nmr_attempts:
    try:
        attempts += 1
        response = requests.get(self.basis_url, params=query_params, timeout=response_timeout)
    except Exception as e:
        pass
Each attempt should spend no more than response_timeout seconds on the request. However, the timeout argument seems to have no effect: the call blocks for longer than the value I pass.
How can I limit the maximum blocking time of the requests.get() call?
Thanks in advance
Can you try following (get rid of try-except block) and see if it helps? except Exception is probably suppressing the exception that requests.get throws.
while attempts < nmr_attempts:
    attempts += 1  # without this the loop never terminates
    response = requests.get(self.basis_url, params=query_params, timeout=response_timeout)
Or with your original code, you can catch requests.exceptions.ReadTimeout exception. Such as:
while attempts < nmr_attempts:
    try:
        attempts += 1
        response = requests.get(self.basis_url, params=query_params, timeout=response_timeout)
    except requests.exceptions.ReadTimeout as e:
        do_something()
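One thing worth knowing here: according to the requests documentation, `timeout` is not a limit on the whole download. It bounds the connection attempt and each wait for data from the server, so a slow but steady response can take longer than `response_timeout` in total. Below is a minimal sketch of the retry loop with separate connect and read timeouts, catching `requests.exceptions.Timeout` (the parent of both `ConnectTimeout` and `ReadTimeout`). The name `fetch_with_retries`, its parameters, and the injectable `get` argument are assumptions made for illustration, not part of the original code.

```python
import requests


def fetch_with_retries(url, params=None, attempts=3, timeout=(3.05, 10), get=requests.get):
    """Try a GET up to `attempts` times; return the response, or None.

    `timeout` is (connect_timeout, read_timeout) in seconds. Hypothetical
    helper: `get` is injectable so the loop can be exercised without a network.
    """
    for attempt in range(1, attempts + 1):
        try:
            return get(url, params=params, timeout=timeout)
        except requests.exceptions.Timeout as exc:
            # Covers both ConnectTimeout and ReadTimeout.
            print(f"attempt {attempt} timed out: {exc}")
    return None
```

Other errors (DNS failures, refused connections) are deliberately not caught here, so they stay visible instead of being swallowed.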
I have a function for a scraper that connects to a webpage and checks the response code (anything in the 200 range is fine, anything else is not). The function retries the connection when it hits a connection error or an SSLError, again and again until the try limit has been reached. Within my try block, I validate the response with an if/else statement. If the response is ok, it is returned; otherwise, the else branch should print the response code and also execute the except block. Should this be done by manually raising an exception and handling it in the except block, like this?
# Try to get the url at least ten times, in case it times out
def retries(scraper, json, headers, record_url, tries=10):
    for i in range(tries):
        try:
            response = scraper.post("https://webpageurl/etc", json=json, headers=headers)
            if response.ok:
                print('OK!')
                return response
            else:
                print(str(response))
                raise Exception("Bad Response Code")
        except (Exception, ConnectionError, SSLError):
            if i < tries - 1:
                sleep(randint(1, 2))
                continue
            else:
                return 'Con error'
Since this is not really an exception, you could just check for a problem condition after the try .. except block:
# Try to get the url at least ten times, in case it times out
def retries(scraper, json, headers, record_url, tries=10):
    problem = False
    for i in range(tries):
        try:
            response = scraper.post("https://webpageurl/etc", json=json, headers=headers)
            if response.ok:
                print('OK!')
                return response
            else:
                print(str(response))
                problem = True
        except (Exception, ConnectionError, SSLError):
            problem = True
        if problem:
            if i < tries - 1:
                sleep(randint(1, 2))
                continue
            else:
                return 'Con error'
Since your try..except is within the for loop, it would continue regardless, I assumed that was by design.
In the example above, I removed the "Bad Response Code" text, since you made no use of it anyway (you don't access the exception, and don't reraise it), but of course instead of a flag, you could also pass a specific problem code, or even a message.
The advantage of the if after the except is that no exception has to be raised to achieve the same, which is a much more expensive operation.
However, I'm not saying the above is always preferred - the advantage of an exception is that you could reraise it as needed and you can catch it outside the function instead of inside, if the exception would require it.
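To illustrate the remark about passing a message instead of a flag, here is a minimal sketch. The helper name `post_with_retries`, the zero-argument `send` callable, and the `(response, problem)` return shape are assumptions made for this example; the only thing it expects from a response is `.ok` and `.status_code`.

```python
import time


def post_with_retries(send, tries=3, pause=1.0, sleep=time.sleep):
    """Call `send()` until it returns a response whose `.ok` is true.

    Hypothetical helper: returns (response, None) on success, or
    (None, last_problem) once all tries are used up.
    """
    problem = None
    for i in range(tries):
        try:
            response = send()
        except Exception as exc:  # narrow this to the errors you expect
            problem = f"connection problem: {exc}"
        else:
            if response.ok:
                return response, None
            problem = f"bad status code: {response.status_code}"
        if i < tries - 1:
            sleep(pause)  # no pause after the final attempt
    return None, problem
```

With the scraper from the question, `send` could be something like `lambda: scraper.post(url, json=json, headers=headers)`. The caller then sees why the last attempt failed instead of a bare 'Con error' string.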
I want to catch an Exception in a for loop, and print a message in all items of the for loop, but at the end, still throw an Exception, so that I know the script failed. How to do that?
I'm trying this for now:
for key, value in marketplaces.items():
    try:
        r = requests.post(value, data=json.dumps({'Content': message(key, res)}), verify=False)
    except:
        r = requests.post(value, data=json.dumps({'Content': "Sorry, an error occurred"}), verify=False)
But with this code, I only catch the exception and post an error message to all items in the for loop, but don't know that the script failed. I want to know this and throw an exception after posting an error message to all items in the for loop.
Any suggestions? Thank you!
The idea is to set a variable inside the except clause and check it after the loop is done. The example below is roughly what you need.
Note that there could be more than one failure, so we use a list to store all the errors, and report them all at the end.
Also note that the correct way to catch an exception in Python is to catch a specific exception -- in this case, it suffices to catch the requests.exceptions.RequestException base class.
The code below stores the exception string e in the list of errors and afterwards, we check if there are any errors, and raise a new exception with the entire list of errors.
In this case, we're just raising the most generic exception type for requests exceptions -- requests.exceptions.RequestException (i.e. the base class once again). This is because there are various possible errors which could have occurred during the requests. Depending on your use case, you might prefer another more standard exception type, such as RuntimeError -- see the docs for the standard exception types.
import json
import requests

errors = []
for key, value in marketplaces.items():
    try:
        r = requests.post(value, data=json.dumps({'Content': message(key, res)}), verify=False)
    except requests.exceptions.RequestException as e:
        r = requests.post(value, data=json.dumps({'Content': "Sorry, an error occurred"}), verify=False)
        errors.append(str(e))
if errors:
    raise requests.exceptions.RequestException(errors)
Alternative:
Though the above is a fairly common approach that suits many use cases, it may be convoluted in some situations. Hence, I'm proposing a much simpler and more trivial alternative, which may also suit your needs, or anyone else who stumbles across this. Simply print out an error message each time an exception is raised (i.e. each time the code enters the except clause), like this:
import json
import sys
import requests

for key, value in marketplaces.items():
    try:
        r = requests.post(value, data=json.dumps({'Content': message(key, res)}), verify=False)
    except requests.exceptions.RequestException as e:
        r = requests.post(value, data=json.dumps({'Content': "Sorry, an error occurred"}), verify=False)
        print(str(e), file=sys.stderr)
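To see why catching the base class is enough, the short check below prints whether a few of the specific requests exceptions inherit from `RequestException`. It needs no network; the alias `rexc` is just a local name chosen for this sketch.

```python
import requests.exceptions as rexc

# Each of these specific errors inherits from RequestException,
# so a single `except RequestException` clause catches all of them.
for exc_type in (rexc.ConnectionError, rexc.HTTPError, rexc.Timeout, rexc.TooManyRedirects):
    print(exc_type.__name__, issubclass(exc_type, rexc.RequestException))
```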
Hope this helps!
I have always used:
r = requests.get(url)
if r.status_code == 200:
    # my passing code
    pass
else:
    # anything else, if this even exists
    pass
Now I was working on another issue and decided to allow for other errors and am instead now using:
try:
    r = requests.get(url)
    r.raise_for_status()
except requests.exceptions.ConnectionError as err:
    # eg, no internet
    raise SystemExit(err)
except requests.exceptions.HTTPError as err:
    # eg, url, server and other errors
    raise SystemExit(err)
# the rest of my code is going here
With the exception that various other errors could be tested for at this level, is one method any better than the other?
Response.raise_for_status() is just a built-in method for checking status codes and does essentially the same thing as your first example.
There is no "better" here, just about personal preference with flow control. My preference is toward try/except blocks for catching errors in any call, as this informs the future programmer that these conditions are some sort of error. If/else doesn't necessarily indicate an error when scanning code.
Edit: Here's my quick-and-dirty pattern.
import time
import requests
from requests.exceptions import HTTPError

url = "https://theurl.com"
retries = 3
for n in range(retries):
    try:
        response = requests.get(url)
        response.raise_for_status()
        break
    except HTTPError as exc:
        code = exc.response.status_code
        if code in [429, 500, 502, 503, 504]:
            # retry after n seconds
            time.sleep(n)
            continue
        raise
However, in most scenarios, I subclass requests.Session, make a custom HTTPAdapter that handles exponential backoffs, and the above lives in an overridden requests.Session.request method. An example of that can be seen here.
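The linked example is not reproduced here. As a rough stand-in, the sketch below does not subclass anything: it mounts a stock `HTTPAdapter` configured with urllib3's `Retry` on a plain `Session`, which is a simpler, swapped-in technique rather than the author's own implementation. The function name `make_retrying_session` and the default values are assumptions.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_retrying_session(total=3, backoff_factor=0.5):
    """Return a Session whose adapters retry selected status codes with backoff."""
    retry = Retry(
        total=total,                    # overall cap on retries per request
        backoff_factor=backoff_factor,  # pauses grow exponentially between attempts
        status_forcelist=[429, 500, 502, 503, 504],
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session
```

Note that, by default, `Retry` only retries on status codes for idempotent methods such as GET, so a POST would need extra configuration.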
Almost always, raise_for_status() is better.
The main reason is that there is a bit more to it than testing status_code == 200, and you should be making best use of tried-and-tested code rather than creating your own implementation.
For instance, did you know that the HTTP standard defines several different 'success' codes (the whole 2xx range, such as 201 Created and 204 No Content)? All of them except 200 itself will be misinterpreted as failure by testing for status_code == 200.
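To make the difference concrete, the sketch below builds bare `requests.Response` objects by hand (no network involved) and compares the two checks. Constructing responses this way is only a testing convenience assumed for this example, and `make_response` and `passes_raise_for_status` are hypothetical helper names.

```python
import requests


def make_response(status_code):
    """Build a bare Response offline, just to exercise the status checks."""
    response = requests.Response()
    response.status_code = status_code
    return response


def passes_raise_for_status(response):
    """Return True if raise_for_status() does not raise for this response."""
    try:
        response.raise_for_status()
    except requests.exceptions.HTTPError:
        return False
    return True


# Columns: status code, result of `== 200`, result of raise_for_status().
for code in (200, 201, 204, 404, 500):
    r = make_response(code)
    print(code, r.status_code == 200, passes_raise_for_status(r))
```

For 201 and 204 the two columns disagree: the equality test rejects them while raise_for_status() lets them through.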
If you are not sure, follow Ian Goldby's answer.
...however, please be aware that raise_for_status() is not some magical or exceptionally smart solution - it's a very simple function that decodes the response's reason phrase and throws an exception for HTTP codes 400-599, distinguishing client-side and server-side errors (see its code here).
And especially the client-side error responses may contain valuable information in the response body that you may want to process. For example a HTTP 400 Bad Request response may contain the error reason.
In such a case it may be better to not use raise_for_status() but instead cover all the cases by yourself.
Example code
try:
    r = requests.get(url)
    # process the specific codes from the range 400-599
    # that you are interested in first
    if r.status_code == 400:
        invalid_request_reason = r.text
        print(f"Your request has failed because: {invalid_request_reason}")
        return
    # this will handle all other errors
    elif r.status_code > 400:
        print(f"Your request has failed with status code: {r.status_code}")
        return
except requests.exceptions.ConnectionError as err:
    # eg, no internet
    raise SystemExit(err)
# the rest of my code is going here
Real-world use case
PuppetDB's API using the Puppet Query Language (PQL) responds to a syntactically invalid query with an HTTP 400 Bad Request whose body says precisely where the error is.
Request query:
nodes[certname] { certname == "bastion" }
Body of the HTTP 400 response:
PQL parse error at line 1, column 29:
nodes[certname] { certname == "bastion" }
^
Expected one of:
[
false
true
#"[0-9]+"
-
'
"
#"\s+"
See my Pull Request to an app that uses this API to make it show this error message to a user here, but note that it doesn't exactly follow the example code above.
Better is somewhat subjective; both can get the job done. That said, as a relatively inexperienced programmer I prefer the Try / Except form.
For me, the T / E reminds me that requests don't always give you what you expect (in a way that if / else doesn't - but that could just be me).
raise_for_status() also lets you easily implement as many or as few different actions for the different error types (.HTTPError, .ConnectionError) as you need.
In my current project, I've settled on the form below, as I'm taking the same action regardless of cause, but am still interested to know the cause:
try:
    ...
except requests.exceptions.RequestException as e:
    raise SystemExit(e) from None
Toy implementation:
import requests

def http_bin_response(status_code):
    sc = status_code
    try:
        url = "http://httpbin.org/status/" + str(sc)
        response = requests.post(url)
        response.raise_for_status()
        p = response.content
    except requests.exceptions.RequestException as e:
        print("placeholder for save file / clean-up")
        raise SystemExit(e) from None
    return response, p

response, p = http_bin_response(403)
print(response.status_code)
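If the different error types do need different actions, as mentioned earlier in this answer, one option is to branch on the exception class. This is a minimal sketch; `describe_failure` and its messages are hypothetical, and the order of the checks is a design choice of this example.

```python
import requests


def describe_failure(exc):
    """Map a requests exception to a short, cause-specific message."""
    # Timeout is checked first: ConnectTimeout is both a ConnectionError
    # and a Timeout, and here it should be reported as a timeout.
    if isinstance(exc, requests.exceptions.Timeout):
        return "timed out"
    if isinstance(exc, requests.exceptions.ConnectionError):
        return "could not connect"
    if isinstance(exc, requests.exceptions.HTTPError):
        # Compare with None explicitly: a Response with an error status is falsy.
        status = exc.response.status_code if exc.response is not None else "unknown"
        return f"server answered with status {status}"
    return f"request failed: {exc}"
```

It could be called from a single `except requests.exceptions.RequestException as e:` clause, so the handling stays in one place while the message still reflects the cause.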
I made a simple script for amusement that takes the latest comment from http://www.reddit.com/r/random/comments.json?limit=1 and speaks it through espeak. I ran into a problem, however: if Reddit fails to give me the JSON data, which it commonly does, the script stops and gives a traceback. Is there any way to retry getting the JSON if it fails to load? I am using requests, if that matters.
If you need it, here is the part of the code that gets the json data
url = 'http://www.reddit.com/r/random/comments.json?limit=1'
r = requests.get(url)
quote = r.text
body = json.loads(quote)['data']['children'][0]['data']['body']
subreddit = json.loads(quote)['data']['children'][0]['data']['subreddit']
On vocabulary: the error you're seeing is an exception, which is raised at some point in a program when a runtime error is detected, and the traceback is the call stack printed to show you where the exception was raised.
Basically, what you want is an exception handler:
try:
    url = 'http://www.reddit.com/r/random/comments.json?limit=1'
    r = requests.get(url)
    quote = r.text
    body = json.loads(quote)['data']['children'][0]['data']['body']
    subreddit = json.loads(quote)['data']['children'][0]['data']['subreddit']
except Exception as err:
    print(err)
so that you jump over the part that needs the thing that couldn't work. Have a look at that doc as well: HandlingExceptions - Python Wiki
As pss suggests, if you want to retry after the url failed to load:
done = False
while not done:
    try:
        url = 'http://www.reddit.com/r/random/comments.json?limit=1'
        r = requests.get(url)
    except Exception as err:
        print(err)
    else:
        done = True  # only stop once the request has succeeded
quote = r.text
body = json.loads(quote)['data']['children'][0]['data']['body']
subreddit = json.loads(quote)['data']['children'][0]['data']['subreddit']
N.B.: That solution may not be optimal, since if you're offline or the URL is always failing, it'll do an infinite loop. If you retry too fast and too much, Reddit may also ban you.
N.B. 2: I'm using the newest Python 3 syntax for exception handling, which may not work with Python older than 2.7.
N.B. 3: You may also want to choose a class other than Exception for the exception handling, to be able to select what kind of error you want to handle. It mostly depends on your app design, and given what you say, you might want to handle requests.exceptions.ConnectionError, but have a look at the requests documentation to choose the right one.
Here's what you may want, but please think this through and adapt it to your use case:
import json
import sys
import time

import requests

def get_reddit_comments():
    retries = 5
    while retries != 0:
        try:
            url = 'http://www.reddit.com/r/random/comments.json?limit=1'
            r = requests.get(url)
            break  # if the request succeeded we get out of the loop
        except requests.exceptions.ConnectionError as err:
            print("Warning: couldn't get the URL: {}".format(err))
            time.sleep(1)  # wait 1 second between two requests
            retries -= 1
    if retries == 0:  # if we've done 5 attempts, we fail loudly
        return None
    return r.text

def use_data(quote):
    if not quote:
        print("could not get URL, despite multiple attempts!")
        return False
    data = json.loads(quote)
    if 'error' in data.keys():
        print("could not get data from reddit: error code #{}".format(data['error']))
        return False
    body = data['data']['children'][0]['data']['body']
    subreddit = data['data']['children'][0]['data']['subreddit']
    # … do stuff with your data here
    return True

if __name__ == "__main__":
    quote = get_reddit_comments()
    if not use_data(quote):
        print("Fatal error: Couldn't handle data receipt from reddit.")
        sys.exit(1)
I hope this snippet will help you correctly design your program. And now that you've discovered exceptions, please always remember that exceptions are for handling things that shall stay exceptional. If you throw an exception at some point in one of your programs, always ask yourself if this is something that should happen when something unexpected happens (like a webpage not loading), or if it's an expected error (like a page loading but giving you an output that is not expected).
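On the earlier warning about retrying too fast: a common way to be gentler on the server is to lengthen the pause after each failure. Below is a minimal, standard-library-only sketch of exponential backoff; `retry_with_backoff`, its parameters, and the default delays are assumptions chosen for illustration.

```python
import time


def retry_with_backoff(func, attempts=5, base_delay=1.0, max_delay=30.0,
                       exceptions=(Exception,), sleep=time.sleep):
    """Call func() up to `attempts` times, doubling the pause after each failure."""
    for attempt in range(attempts):
        try:
            return func()
        except exceptions:
            if attempt == attempts - 1:
                raise  # out of attempts: let the last error propagate
            # Pauses of base_delay, 2*base_delay, 4*base_delay, ... capped at max_delay.
            sleep(min(base_delay * 2 ** attempt, max_delay))
```

For the Reddit case, `func` could be `lambda: requests.get(url)` with `exceptions=(requests.exceptions.ConnectionError,)`, so that only connection problems are retried.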
I might be approaching this the wrong way, but I've got a POST request going out:
response = requests.post(full_url, json.dumps(data))
Which could potentially fail for a number of reasons, some being related to the data, some being temporary failures, which due to a poorly designed endpoint may well return as the same error (server does unpredictable things with invalid data). To catch these temporary failures and let others pass I thought the best way to go about this would be to retry once and then continue if the error is raised again. I believe I could do it with a nested try/except, but it seems like bad practice to me (what if I want to try twice before giving up?)
That solution would be:
try:
    response = requests.post(full_url, json.dumps(data))
except RequestException:
    try:
        response = requests.post(full_url, json.dumps(data))
    except:
        continue
Is there a better way to do this? Alternately is there a better way in general to deal with potentially faulty HTTP responses?
last_exc = None
for _ in range(2):
    try:
        response = requests.post(full_url, json.dumps(data))
        break
    except RequestException as exc:
        last_exc = exc  # keep it: a bare raise after the loop fails in Python 3
else:
    raise last_exc  # both tries failed
If you need a function for this:
def multiple_tries(func, times, exceptions):
    last_exc = None
    for _ in range(times):
        try:
            return func()
        except exceptions as exc:  # unexpected exceptions propagate on their own
            last_exc = exc
    raise last_exc  # reraises if attempts are unsuccessful
Use like this:
func = lambda:requests.post(full_url, json.dumps(data))
response = multiple_tries(func, 2, RequestException)
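The same idea can also be packaged as a decorator, which avoids building a lambda at each call site. This is only a sketch: the name `retry` and its arguments are hypothetical. It keeps a reference to the last caught exception because, in Python 3, a bare `raise` outside an `except` block has no active exception to re-raise.

```python
import functools


def retry(times, exceptions):
    """Decorator: call the function up to `times` times, retrying on `exceptions`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            last_exc = None
            for _ in range(times):
                try:
                    return func(*args, **kwargs)
                except exceptions as exc:
                    last_exc = exc  # remember it; other exception types propagate
            if last_exc is None:
                raise ValueError("times must be at least 1")
            raise last_exc  # every attempt failed
        return wrapper
    return decorator
```

A function decorated with `@retry(2, RequestException)` would then be tried twice before the last `RequestException` reaches the caller.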