Thinking of best practices: is it good to use try/except when trying to get a response from requests.get(url) or from Selenium's webdriver.get(url)?
Perhaps a more general question: when is try/except meant to be used, apart from file handling?
Thank you.
For example:
import requests

try:
    response = requests.get('https://www.google.com')
    print(response.status_code)
except Exception as e:
    print(f'error while opening url - {e}')
I would say it is good practice, even though the exception might never come up, especially when dealing with stable and widely used sites such as Google's.
But in the case that the site you are requesting is down or not responding, in my experience try/except comes in handy and speeds up the process of finding the cause of an error.
This is good practice to have. From experience, if you leave the GET request outside of a try block and loop through a list of URLs, the whole run will fall over at the first failing request.
Also, if a request fails, you can handle that in the except block: for example, if your IP is blocked from accessing a website, you can output the URL and either retry with proxies or write the URL out for future handling, as in the sketch below.
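A minimal sketch of that pattern (the URL list and output file name are illustrative): the loop keeps going when one URL fails, and failed URLs are written out for later handling:

import requests

urls = ['https://www.google.com', 'https://example.com']  # illustrative list

for url in urls:
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()  # treat 4xx/5xx responses as errors too
        print(f'{url} -> {response.status_code}')
    except requests.RequestException as e:
        print(f'error while opening url {url} - {e}')
        # record the failure so it can be retried later (e.g. with proxies)
        with open('failed_urls.txt', 'a') as f:
            f.write(url + '\n')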
I am running a job which makes many requests to retrieve data from an API. In order to make the requests, I am using the requests module and an iteration over this code:
logger.debug("Some log message")
response = requests.get(
    url=self._url,
    headers=self.headers,
    auth=self.auth,
)
logger.debug("Some other log message")
This usually produces the following logs:
[...] Some log message
[2019-08-27 03:00:57,201 - DEBUG - connectionpool.py:393] https://my.url.com:port "GET /some/important/endpoint?$skiptoken='12345' HTTP/1.1" 401 0
[2019-08-27 03:00:57,601 - DEBUG - connectionpool.py:393] https://my.url.com:port "GET /some/important/endpoint?$skiptoken='12345' HTTP/1.1" 200 951999
[...] Some other log message
On very rare occasions, however, the job never terminates, and the logs say:
[...] Some log message
[2019-08-27 03:00:57,201 - DEBUG - connectionpool.py:393] https://my.url.com:port "GET /some/important/endpoint?$skiptoken='12345' HTTP/1.1" 401 0
The remaining log messages are never printed and the call never returns. I am not able to reproduce the issue: when I made the request that never returned manually, it gave me the desired response.
Questions:
Why does urllib3 always print a log with a status code 401 before printing a log with status code 200? Is this always the case or caused by an issue with the authentication or with the API server?
In the rare case of the second log snippet, is my assumption correct that the application is stuck making a request which never returns? Or:
a) Could requests.get throw an exception which results in the other log statements never being printed, and then "magically" gets caught somewhere in my code?
b) Is there a different possibility which I have not realised?
Additional Information:
Python 2.7.13 (we are already in the middle of upgrading to Python 3, but this needs to be solved before that is completed)
requests 2.21.0
urllib3 1.24.3
auth is passed a requests.auth.HTTPDigestAuth(username, password)
My code has no try/except block, which is why I wrote "magically" in Question 2.a. This is because we would prefer the job to fail "loudly".
I am iterating over a generator yielding URLs in order to make multiple requests
The job is run by Jenkins 2.95 on a schedule
When everything runs successfully, the job makes around 300 requests in about 5 minutes
I am running two Python scripts in one job, in parallel, both running the same code but against different endpoints
Update
Answer to Q1:
This seems to be expected behaviour for HTTP Digest Auth.
See this github issue and Wikipedia.
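To illustrate the flow (httpbin's digest-auth endpoint is used here as a stand-in for the real API): with HTTPDigestAuth, requests first receives the 401 challenge and only then retries with an Authorization header, which is why urllib3 logs two lines per call:

import requests
from requests.auth import HTTPDigestAuth

# httpbin's digest-auth endpoint is a public stand-in for the real API
response = requests.get(
    'https://httpbin.org/digest-auth/auth/user/passwd',
    auth=HTTPDigestAuth('user', 'passwd'),
    timeout=10,
)
print(response.status_code)  # 200, after the 401 challenge round-trip
print(response.history)      # should contain the intermediate 401 response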
To answer your questions:
1. It seems like a problem with your API. To make sure, can you run a curl command and see?
curl -i https://my.url.com:port/some/important/endpoint?$skiptoken='12345'
2. It never terminates, probably because the API is not responding. Add a timeout to avoid this kind of hang:
response = requests.get(
    url=self._url,
    headers=self.headers,
    auth=self.auth,
    timeout=60,
)
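As a side note, requests also accepts a (connect, read) timeout tuple if you want to fail fast on connection problems while still allowing a slow response body; a minimal sketch, mirroring the snippet above:

response = requests.get(
    url=self._url,
    headers=self.headers,
    auth=self.auth,
    timeout=(3.05, 60),  # (connect timeout, read timeout) in seconds
)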
Hope this helps.
As Vithulan already answered, you should always set a timeout value when doing network calls - unless you don't care about your process staying stuck forever, that is...
Now, with regard to error handling etc.:
a) Could requests.get throw an exception which results in the other log statements never being printed, and then "magically" gets caught somewhere in my code?
It's indeed possible that some other try/except block higher up in the call stack swallows the exception, but only you can tell. Note that if that's the case, you have some very ill-behaved code: a try/except should 1/ only target the exact exception(s) it's supposed to handle, 2/ contain as little code as possible within the try block, to avoid catching similar errors from another part of the code, and 3/ never silence an exception (in other words, it should at least log the exception and traceback).
Note that you might as well just have a deactivated logger FWIW ;-)
This being said, and until you have made sure you don't have such an issue, you can still get more debugging info by logging requests exceptions in your function:
logger.debug("Some log message")
try:
response = requests.get(
url=self._url,
headers=self.headers,
auth=self.auth,
timeout=SOME_TIMEOUT_VALUE
)
except Exception as e:
# this will log the full traceback too
logger.exception("oops, call to %s failed : %s", self._url, e)
# make sure we don't swallow the exception
raise
logger.debug("Some other log message")
Now, a fact of life is that HTTP requests can fail for such an awful lot of reasons that you should actually expect them to fail, so you may want to have some retry mechanism. Also, the fact that the call to requests.get didn't raise doesn't mean the call succeeded - you still have to check the response status code (or use response.raise_for_status()).
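A sketch of such a retry mechanism (the retry count and backoff are illustrative, not from the answer); note the raise_for_status() call that turns 4xx/5xx responses into exceptions:

import logging
import time

import requests

logger = logging.getLogger(__name__)
MAX_RETRIES = 3  # illustrative value

def get_with_retries(url, **kwargs):
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            response = requests.get(url, timeout=60, **kwargs)
            response.raise_for_status()  # 4xx/5xx -> requests.HTTPError
            return response
        except requests.RequestException:
            logger.exception("attempt %d/%d for %s failed", attempt, MAX_RETRIES, url)
            if attempt == MAX_RETRIES:
                raise  # give up loudly after the last attempt
            time.sleep(2 ** attempt)  # simple exponential backoff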
EDIT:
As mentioned in my question, my code has no try/except block because we would like the entire job to terminate if any problem occurs.
A try/except block doesn't prevent you from terminating the job: just re-raise the exception (possibly after X retries), raise a new one instead, or call sys.exit() (which actually works by raising an exception). And it lets you collect useful debugging info etc.; cf. my example code.
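For instance, a sketch of the sys.exit() variant (the URL and exit code are illustrative):

import sys

import requests

url = 'https://example.com/endpoint'  # illustrative

try:
    response = requests.get(url, timeout=60)
    response.raise_for_status()
except requests.RequestException as e:
    print('call to %s failed: %s' % (url, e))
    sys.exit(1)  # non-zero exit status, so e.g. Jenkins marks the job as failed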
If there is an issue with the logger, this would then only occur on rare occasions. I cannot imagine a scenario where the same code is run but the loggers are sometimes activated and sometimes not.
I was talking about another logger higher up in the call stack. But this was only for completeness; I really think you just have a request that never returns for lack of a timeout.
Do you know why I am noticing the issue I talk about in Question 1?
Nope, and that's actually something I'd immediately investigate since, AFAICT, for the same request you should have either only the 401 or only the 200.
According to the RFC:
10.4.2 401 Unauthorized

The request requires user authentication. The response MUST include a WWW-Authenticate header field (section 14.47) containing a challenge applicable to the requested resource. The client MAY repeat the request with a suitable Authorization header field (section 14.8).

If the request already included Authorization credentials, then the 401 response indicates that authorization has been refused for those credentials. If the 401 response contains the same challenge as the prior response, and the user agent has already attempted authentication at least once, then the user SHOULD be presented the entity that was given in the response, since that entity might include relevant diagnostic information.
So unless requests does something weird with auth headers (which is not the case as far as I remember, but...), you should only have one single response logged.
EDIT 2:
I wanted to say that if an exception is thrown but not explicitly caught by my code, it should terminate the job (which was the case in some tests I ran)
If an exception reaches the top of the call stack without being handled, the runtime will indeed terminate the process - but you have to be sure that no handler up the call stack kicks in and swallows the exception. Testing the function in isolation will not exhibit this issue, so you'd have to check the full call stack.
This being said:
The fact that it does not terminate suggests to me that no exception is thrown.
That's indeed the most likely, but only you can make sure it's really the case (we don't know the full code, the logger configuration etc).
I'm developing a REST API for a web service.
For some days now I've been experiencing a big problem that is blocking me.
When the code raises an exception (during development), the Django server responds only after 5, 8 or 10 minutes... with the error that occurred.
To understand what is happening, I started the server in debug mode using PyCharm... and, clicking on pause during the long wait, found the code looping here in python2.7/SocketServer.py:
def _eintr_retry(func, *args):
    """restart a system call interrupted by EINTR"""
    while True:
        try:
            return func(*args)
        except (OSError, select.error) as e:
            if e.args[0] != errno.EINTR:
                raise
What can I do? I'm pretty desperate!
Sometimes that happens in Django debug mode, because Django generates a nice page with a traceback and a list of all local variables for every stack frame.
When a Django model has an inefficient __str__ (__unicode__, since you're using Python 2) method, this can result in Django loading many thousands of objects from the database just to display the traceback.
In my experience this is the only reason for very long pauses on exceptions in Django. Try running with DEBUG = False, or check which model has an inefficient __str__ method that hits the database.
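As an illustration (the models here are hypothetical, not from the question), a __unicode__ like this issues extra queries every time an object is displayed, and the DEBUG traceback page renders every local variable of every stack frame:

# -*- coding: utf-8 -*-
from django.db import models

class Order(models.Model):
    customer = models.ForeignKey('Customer')  # hypothetical related model

    def __unicode__(self):
        # Each call hits the database -- including once per local variable
        # rendered on Django's DEBUG traceback page.
        return u"Order of %s (%d items)" % (
            self.customer.name,     # extra query to fetch the FK target
            self.item_set.count(),  # extra COUNT query
        )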
I'm sending some arguments through to a Splash endpoint from a Scrapy crawler, to be used by the Splash script that will be run. Sometimes errors may occur in that script. For runtime errors I use pcall to wrap the questionable script, so I can gracefully catch and return runtime errors. For syntax errors, however, this doesn't work, and a 400 error is thrown instead. I set my crawler to handle such errors with the handle_httpstatus_list attribute, so my parse callback is called on such an error, where I can inspect what went wrong gracefully.
So far so good, except that since I couldn't handle the syntax error gracefully in the Lua script, I wasn't able to return some of the input Splash args that the callback expects to access. Right now I'm calling response._splash_args() on the SplashJsonResponse object that I do have, and this allows me to access those values. However, this is essentially a protected method, which means it might not be a long-term solution.
Is there a better way to access the Splash args in the response when you can't rely on the associated splash script running at all?
I have a method on an API Gateway which integrates with a Python Lambda function. I would like my API Gateway to return a 500 response if an exception is raised in my Python Lambda function where the errorMessage field doesn't match one of the regexes in the Integration Responses for the method.
I would only like to return a 200 if no exception is raised and the Lambda returns without failure.
With the setup in the picture above - any exception raised which does not match (.|\n)*'type':\s*'.*Error'(.|\n)* for the 400 response will give a 200 response.
How can I return 500 if any exception is raised which does not match an already configured response, while still returning 200 when the Python function returns successfully? Do I just have to wrap everything in my code which could possibly raise an exception (with try/except) and generate a predictable errorMessage string?
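A sketch of that wrapping approach (the handler and error type names are illustrative): catch everything in the handler and re-raise with an errorMessage the Lambda Error Regexes can match:

def handler(event, context):
    try:
        return do_work(event)  # hypothetical business logic
    except ValueError as e:
        # errorMessage contains 'type': 'ValidationError', so it matches the
        # 400 regex (.|\n)*'type':\s*'.*Error'(.|\n)*
        raise Exception("{'type': 'ValidationError', 'message': '%s'}" % e)
    except Exception as e:
        # predictable marker that a 500 Integration Response regex can match
        raise Exception("{'type': 'InternalError', 'message': '%s'}" % e)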
EDIT:
I am currently 'achieving this' by having an Integration Response for 500 with a Lambda Error Regex of (.|\n)*. Is this a reasonable way to catch unhandled exceptions as 500 errors?
EDIT:
It turns out this configuration gives 500 errors even when the Python function returns without an exception.
I would expect the Lambda Error Regexes to only attempt to match if an actual exception is raised by the Python function (as they only check the errorMessage field, according to the documentation), causing it to fall back on the default method response of 201 (in the updated scenario). Oddly, it returns 400 when testing with the API GW console 'Test', but gives 500 (with the body mapping defined in the 500 Integration Response) when trying from anywhere else.
The only way I can think of to return all unhandled exceptions as 500 is to raise Exception('Success') instead of returning from my Python function, and then have the 201 Lambda Error Regex match 'Success'... which I'd really rather not do.
Your current approach is reasonable given the limitations of API Gateway's error response handling mechanism.
The API Gateway team has received feedback from multiple customers indicating the shortcomings of our error handling support. We are actively working on improving this and hope to deliver an improved error handling experience in the future.