I have a microservice with a job that needs to happen only if a different server is up.
For a few weeks it worked great: if the server was down, the microservice slept a bit without doing the job (as it should), and if the server was up, the job was done.
The server is never down for more than a few minutes (for sure! the server is highly monitored), so the job is skipped 2-3 times tops.
Today I entered my Docker container and noticed in the logs that the job hadn't even tried to run for a few weeks now (bad choice not to monitor, I know), indicating, I assume, that some kind of deadlock happened.
I also assume that the problem is with my exception handling. I could use some advice; I work alone.
def is_server_healthy():
    url = "url"  # correct url for health check path
    try:
        res = requests.get(url)
    except Exception as ex:
        LOGGER.error(f"Can't health check!{ex}")
    finally:
        pass
    return res

def init():
    while True:
        LOGGER.info(f"Sleeping for {SLEEP_TIME} Minutes")
        time.sleep(SLEEP_TIME*ONE_MINUTE)
        res = is_server_healthy()
        if res.status_code == 200:
            my_api.DoJob()
            LOGGER.info(f"Server is: {res.text}")
        else:
            LOGGER.info(f"Server is down... {res.status_code}")
(The names of the variables were changed to simplify the question.)
The health check is simple enough: it returns "up" if the server is up. Anything else is considered down, so unless status 200 and the body "up" come back, I consider the server to be down.
When your server is down you get an uncaught error:
NameError: name 'res' is not defined
Why? See:
def is_server_healthy():
    url = "don't care"
    try:
        raise Exception()  # simulate fail
    except Exception as ex:
        print(f"Can't health check!{ex}")
    finally:
        pass
    return res  ## name is not known ;o)

res = is_server_healthy()

if res.status_code == 200:  # here, next exception bound to happen
    my_api.DoJob()
    LOGGER.info(f"Server is: {res.text}")
else:
    LOGGER.info(f"Server is down... {res.status_code}")
Even if you had declared the name, the code would try to access an attribute that's not there:
if res.status_code == 200:  # here - object has no attribute 'status_code'
    my_api.DoJob()
    LOGGER.info(f"Server is: {res.text}")
else:
    LOGGER.info(f"Server is down... {res.status_code}")
It would try to access a member that's simply not there => exception, and the process is gone.
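A minimal sketch of a safer version (my take, not your original code): return None when the request fails and let the caller treat None as "down". The requests.RequestException class and the timeout value are my assumptions:

def is_server_healthy():
    url = "url"  # the real health-check path goes here
    try:
        # a timeout keeps a hung server from blocking the loop forever
        return requests.get(url, timeout=10)
    except requests.RequestException as ex:
        LOGGER.error(f"Can't health check! {ex}")
        return None  # caller must handle "no response at all"

def init():
    while True:
        LOGGER.info(f"Sleeping for {SLEEP_TIME} Minutes")
        time.sleep(SLEEP_TIME * ONE_MINUTE)
        res = is_server_healthy()
        if res is not None and res.status_code == 200 and res.text == "up":
            my_api.DoJob()
            LOGGER.info(f"Server is: {res.text}")
        else:
            LOGGER.info("Server is down or unreachable...")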
You are probably better off using some system-specific way to call your script once every minute (cron jobs, Task Scheduler) than idling in a while True: with sleep.
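For instance, a one-shot variant of the sketch above (same assumed helpers) that a cron entry or Task Scheduler job could invoke every minute:

if __name__ == "__main__":
    res = is_server_healthy()
    if res is not None and res.status_code == 200 and res.text == "up":
        my_api.DoJob()
    else:
        LOGGER.info("Server is down, skipping this run")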
Related
I have a Python script that calls the Google Analytics API once for each day I'm trying to get data for. However, on some calls I'm apparently receiving nothing, or I'm handling errors incorrectly. Here is the function I'm using to call the API.
def run_query(hour_in_dim, start_date, sessions_writer, connection_error_count, pageToken=None):
    # Try to run api request for one day. Wait 10 seconds if "service is currently unavailable."
    try:
        traffic_results = get_api_query(analytics, start_date, start_date, pageToken)
    except HttpError as err:
        if err.resp.status in [503]:
            print("Sleeping, api service temporarily unavailable.")
            time.sleep(10)
            run_query(hour_in_dim, start_date, sessions_writer, connection_error_count, pageToken)
        else:
            raise
    except ConnectionResetError:
        connection_error_count += 1
        time.sleep(10)
        if connection_error_count > 2:
            raise
        else:
            run_query(hour_in_dim, start_date, sessions_writer, connection_error_count, pageToken)

    # TODO: solve random occurances of "UnboundLocalError: local variable 'traffic_results' referenced before assignment"
    dimensions_ga = traffic_results['reports'][0]['columnHeader']['dimensions']
    rows = traffic_results['reports'][0]['data']['rows']
The UnboundLocalError is coming from the second line from the bottom, where I reference traffic_results and try to use it to assign the dimensions_ga variable.
I believe the problem is that I was using recursion instead of a loop. I used the sample code provided here:
https://developers.google.com/analytics/devguides/reporting/core/v3/errors
I also changed "except HttpError, error:" to "except HttpError as error:" for Python 3.
I'm not sure of the best way to test this, as the error is not manually reproducible.
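Here is a hedged sketch of the loop-based rewrite I have in mind (untested, since the error isn't reproducible on demand; get_api_query, analytics, and the parameter names come from my code above, and I'm assuming HttpError comes from googleapiclient.errors): the loop only exits once traffic_results is bound, so it can't be referenced before assignment.

import time
from googleapiclient.errors import HttpError  # assumption: the client library in use

def run_query(hour_in_dim, start_date, sessions_writer, connection_error_count, pageToken=None):
    # Retry in a loop instead of recursing; traffic_results is only used after
    # the loop exits, and the loop only exits when the request succeeded.
    while True:
        try:
            traffic_results = get_api_query(analytics, start_date, start_date, pageToken)
            break  # success: traffic_results is now bound
        except HttpError as err:
            if err.resp.status in [503]:
                print("Sleeping, api service temporarily unavailable.")
                time.sleep(10)  # then loop and retry
            else:
                raise
        except ConnectionResetError:
            connection_error_count += 1
            if connection_error_count > 2:
                raise
            time.sleep(10)  # then loop and retry

    dimensions_ga = traffic_results['reports'][0]['columnHeader']['dimensions']
    rows = traffic_results['reports'][0]['data']['rows']
    # ... continue with sessions_writer as before (omitted here)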
I have created a web bot that iterates over a website, e.g. example.com/?id=int where int is some integer. The function gets the result as raw HTML using the requests library, then hands it to parseAndWrite to extract a div and save its value in a SQLite db:
def archive(initial_index, final_index):
    while True:
        try:
            for i in range(initial_index, final_index):
                res = requests.get('https://www.example.com/?id='+str(i))
                parseAndWrite(res.text)
                print(i, ' archived')
        except requests.exceptions.ConnectionError:
            print("[-] Connection lost. ")
            continue
        except:
            exit(1)
        break

archive(1, 10000)
My problem is that, after some time, the loop doesn't continue to 10000 but repeats itself from a random value, resulting in many duplicate records in the database. What is causing this inconsistency?
I think your two loops are nested in the wrong order. The outer while loop is supposed to retry any URLs that cause connection errors, but you've put it outside the for loop that iterates over the URL numbers. That means you always start over from the initial index whenever an error happens.
Try swapping the loops, and you'll only repeat one URL until it works:
def archive(initial_index, final_index):
    for i in range(initial_index, final_index):
        while True:
            try:
                res = requests.get('https://www.example.com/?id='+str(i))
                parseAndWrite(res.text)
                print(i, ' archived')
            except requests.exceptions.ConnectionError:
                print("[-] Connection lost. ")
                continue
            except:
                exit(1)
            break

archive(1, 10000)
A general rule for a try statement is to execute as little code as possible inside it: only put in the code you expect to produce the error you want to catch; all other code goes before or after the statement.
Don't catch errors you don't know what to do with. Exiting the program is rarely the right thing to do; that will happen anyway if no one else catches the exception, so give your caller the chance to handle it.
And finally, don't build URLs yourself; let the requests library do that for you. The base URL is https://www.example.com/; the id parameter and its value can be passed via a dict to requests.get.
Your outer loop will iterate over the various parameters used to construct the URL; the inner loop will retry the request until it succeeds. Once the inner loop terminates, you can use the response to call parseAndWrite.
def archive(initial_index, final_index):
    base_url = 'https://www.example.com/'
    for i in range(initial_index, final_index + 1):
        while True:
            try:
                res = requests.get(base_url, params={'id': i})
            except requests.exceptions.ConnectionError:
                print("[-] Connection lost, trying again")
                continue
            else:
                break
        parseAndWrite(res.text)
        print('{} archived'.format(i))

archive(1, 10000)
You might also consider letting requests handle the retries for you. See Can I set max_retries for requests.request? for a start.
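For reference, here is a rough sketch of that approach (not taken from the linked answer; the Retry parameters are arbitrary examples, not recommendations): mount an HTTPAdapter with a urllib3 Retry policy on a Session so requests retries and backs off for you.

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
retries = Retry(total=5,                  # give up after 5 attempts
                backoff_factor=1,         # exponential backoff between attempts
                status_forcelist=[500, 502, 503, 504])
session.mount('https://', HTTPAdapter(max_retries=retries))

res = session.get('https://www.example.com/', params={'id': 1}, timeout=10)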
If any connection error occurs, you restart at initial_index. Instead, you could retry the current index again and again, until the connection succeeds:
def archive(initial_index, final_index):
    for i in range(initial_index, final_index):
        while True:
            try:
                response = requests.get(f'https://www.example.com/?id={i}')
                parseAndWrite(response.text)
                print(f'{i} archived')
            except requests.exceptions.ConnectionError:
                print("[-] Connection lost. ")
            else:
                break

archive(1, 10000)
I made a simple script for amusement that takes the latest comment from http://www.reddit.com/r/random/comments.json?limit=1 and speaks it through espeak. I ran into a problem, however: if Reddit fails to give me the JSON data, which it commonly does, the script stops and gives a traceback. This is a problem, as it stops the script. Is there any way to retry getting the JSON if it fails to load? I am using requests, if that means anything.
If you need it, here is the part of the code that gets the JSON data:
url = 'http://www.reddit.com/r/random/comments.json?limit=1'
r = requests.get(url)
quote = r.text
body = json.loads(quote)['data']['children'][0]['data']['body']
subreddit = json.loads(quote)['data']['children'][0]['data']['subreddit']
Regarding vocabulary: the actual error you're having is an exception that has been thrown at some point in the program because of a detected runtime error, and the traceback is the call-stack listing that tells you where the exception was thrown.
Basically, what you want is an exception handler:
try:
    url = 'http://www.reddit.com/r/random/comments.json?limit=1'
    r = requests.get(url)
    quote = r.text
    body = json.loads(quote)['data']['children'][0]['data']['body']
    subreddit = json.loads(quote)['data']['children'][0]['data']['subreddit']
except Exception as err:
    print(err)
so that you skip over the part that depends on the thing that couldn't work. Have a look at this doc as well: HandlingExceptions - Python Wiki
As pss suggests, if you want to retry after the url failed to load:
done = False
while not done:
    try:
        url = 'http://www.reddit.com/r/random/comments.json?limit=1'
        r = requests.get(url)
        done = True
    except Exception as err:
        print(err)

quote = r.text
body = json.loads(quote)['data']['children'][0]['data']['body']
subreddit = json.loads(quote)['data']['children'][0]['data']['subreddit']
N.B.: That solution may not be optimal, since if you're offline or the URL is always failing, it'll do an infinite loop. If you retry too fast and too much, Reddit may also ban you.
N.B. 2: I'm using the newest Python 3 syntax for exception handling, which may not work with Python older than 2.7.
N.B. 3: You may also want to choose a class other than Exception for the exception handling, to be able to select what kind of error you want to handle. It mostly depends on your app design, and given what you say, you might want to handle requests.exceptions.ConnectionError, but have a look at request's doc to choose the right one.
Here's what you may want, but please think this through and adapt it to your use case:
import requests
import time
import json
import sys

def get_reddit_comments():
    retries = 5
    while retries != 0:
        try:
            url = 'http://www.reddit.com/r/random/comments.json?limit=1'
            r = requests.get(url)
            break  # if the request succeeded we get out of the loop
        except requests.exceptions.ConnectionError as err:
            print("Warning: couldn't get the URL: {}".format(err))
            time.sleep(1)  # wait 1 second between two requests
            retries -= 1
    if retries == 0:  # if we've done 5 attempts, we fail loudly
        return None
    return r.text

def use_data(quote):
    if not quote:
        print("could not get URL, despite multiple attempts!")
        return False
    data = json.loads(quote)
    if 'error' in data.keys():
        print("could not get data from reddit: error code #{}".format(data['error']))
        return False
    body = data['data']['children'][0]['data']['body']
    subreddit = data['data']['children'][0]['data']['subreddit']
    # … do stuff with your data here

if __name__ == "__main__":
    quote = get_reddit_comments()
    if not use_data(quote):
        print("Fatal error: Couldn't handle data receipt from reddit.")
        sys.exit(1)
I hope this snippet helps you design your program correctly. And now that you've discovered exceptions, always remember that exceptions are for handling things that should stay exceptional. When you catch an exception somewhere in your program, always ask yourself whether it covers something genuinely unexpected (like a webpage not loading) or an expected error (like a page loading but giving you output you didn't expect).
My goal:
To go through a list of websites to check them using Requests. This is being done in apply_job.
My problem:
When job_pool.next is called, a few websites are in error: instead of giving an error, they just hang and don't even raise a TimeoutError. That's why I am using a 10-second timeout on the next call. This timeout works well, but once the TimeoutError exception arises, subsequent next calls keep raising it even though the following websites are fine. It seems that it doesn't move on to the next item and just loops over the same one.
I tried with imap and imap_unordered; it made no difference.
My code here:
def run_check(websites):
    """ Run check on the given websites """
    import multiprocessing
    from multiprocessing.pool import ThreadPool

    pool = ThreadPool(processes=JOB_POOL_SIZE)
    try:
        job_pool = pool.imap_unordered(apply_job, websites)
        try:
            while True:
                try:
                    res = job_pool.next(10)
                except multiprocessing.TimeoutError:
                    logging.error("Timeout Error")
                    res = 'No Res'
                csv_callback(res)
        except StopIteration:
            pass
        pool.terminate()
    except Exception, e:
        logging.error("Run_check Error: %s" % e)
        raise
I use res = requests.get(url, timeout=10) to check the websites. This timeout doesn't work for this issue.
To test, here are the websites that cause the problem (not every time, but very often): http://www.kddecorators.netfirms.com, http://www.railcar.netfirms.com.
I can't figure out what is different about these websites, but my guess is that they keep sending a byte once in a while, so it isn't considered a real timeout even though they are unusable.
If anyone has an idea, it would be greatly appreciated; I have been stuck on this one for a few days now. I even tried futures and async, but they don't raise the exception I need.
Thanks guys!
Your intuition that passing a timeout to next would abort the job is wrong. It only aborts the waiting; the particular job keeps running, and the next time you wait, you wait for the same job again. To achieve a timeout on the actual jobs you should look at the requests documentation. Note that there is no reliable way to terminate another thread, so if you absolutely cannot make your jobs terminate within a reasonable time frame, you can switch to a process-based pool and forcefully terminate the processes (e.g. using signal.alarm).
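For illustration, a rough sketch of the process-based variant (my own, with a hypothetical check_site standing in for apply_job, and an arbitrary pool size and batch deadline): worker processes, unlike threads, can be forcefully terminated when the deadline expires.

import multiprocessing
import requests

def check_site(url):
    # hypothetical stand-in for apply_job; the per-request timeout still applies
    try:
        return url, requests.get(url, timeout=10).status_code
    except requests.RequestException as err:
        return url, 'ERROR: %s' % err

def run_check(websites):
    pool = multiprocessing.Pool(processes=4)
    async_result = pool.map_async(check_site, websites)
    try:
        # a deadline on the whole batch; hung workers can't block it forever
        results = async_result.get(timeout=60)
        pool.close()
        return results
    except multiprocessing.TimeoutError:
        pool.terminate()  # processes, unlike threads, can be killed
        return None
    finally:
        pool.join()

if __name__ == '__main__':
    print(run_check(['http://www.kddecorators.netfirms.com',
                     'http://www.railcar.netfirms.com']))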
I found a solution for my issue: I used eventlet and its Timeout function.
def apply_job(account_info):
    """ Job for the Thread """
    try:
        account_id = account_info['id']
        account_website = account_info['website']
        url = account_website
        result = "ERROR: GreenPool Timeout"
        with Timeout(TIMEOUT*2, False):
            url, result = tpool.execute(website.try_url, account_website)
        return (account_id, account_website, url, result)
    except Exception, e:
        logging.error("Apply_job Error: %s" % e)

def start_db(res):
    update_db(res)
    csv_file.csv_callback(res)

def spawn_callback(result):
    res = result.wait()
    tpool.execute(start_db, res)

def run_check(websites):
    """ Run check on the given websites """
    print str(len(websites)) + " items found\n"
    pool = eventlet.GreenPool(100)
    for i, account_website in enumerate(websites):
        res = pool.spawn(apply_job, account_website)
        res.link(spawn_callback)
    pool.waitall()
This solution works well because it times out the whole execution of the function website.try_url in the call url, result = tpool.execute(website.try_url, account_website).
I'm using RPC to fetch multiple URLs asynchronously. I'm using a global variable to track completion and have noticed that the global has radically different contents before and after the RPC calls complete.
Feels like I'm missing something obvious... Is it possible for the rpc.wait() to result in the app context being loaded on a new instance when the callbacks are made?
Here's the basic pattern...
aggregated_results = {}

def aggregateData(sid):
    # local variable tracking results
    aggregated_results[sid] = []

    # create a bunch of asynchronous url fetches to get all of the route data
    rpcs = []
    for r in routes:
        rpc = urlfetch.create_rpc()
        rpc.callback = create_callback(rpc, sid)
        urlfetch.make_fetch_call(rpc, url)
        rpcs.append(rpc)

    # all of the schedule URLs have been fetched. now wait for them to finish
    for rpc in rpcs:
        rpc.wait()

    # look at results
    try:
        if len(aggregated_results[sid]) == 0:
            logging.debug("We couldn't find results for transaction")
    except KeyError as e:
        logging.error('aggregation error: %s' % e.message)
        logging.debug(aggregated_results)

    return aggregated_results[sid]

def magic_callback(rpc, sid):
    # do some work to parse the result
    # of the urlfetch call...
    # <hidden>
    #
    try:
        if len(aggregated_results[sid]) == 0:
            aggregated_results[sid] = [stop]
        else:
            done = False
            for i, s in enumerate(aggregated_results[sid]):
                if stop.time <= s.time:
                    aggregated_results[sid].insert(i, stop)
                    done = True
                    break
            if not done:
                aggregated_results[sid].append(stop)
    except KeyError as e:
        logging.error('aggregation error: %s' % e.message)
The KeyError is thrown both inside the callback and at the end, when processing all of the results. Neither of those should happen.
When I print out the contents of the dictionary, the sid is in fact gone, but there are other entries for other requests that are being processed. In some cases, more entries than I see when the respective request starts.
This pattern is called from a web request handler, not in the background.
It's as if the callbacks occur on a different instance.
The sid key in this case is a combination of strings that includes a time string, and I'm confident it is unique.