Making sure loop continues to make API calls even if one failed - python

If I request data from an API on a Raspberry Pi in a while/for loop in Python, appending the data to a CSV, and one iteration fails due to something like a faulty Wi-Fi connection that comes and goes, what is a foolproof method of getting an indication that an error occurred and having it keep trying again, either immediately or after some rest period?

Use try/except to catch the exception, e.g.:
while True:
    try:
        my_function_that_sometimes_fails()
    except Exception as e:
        print(e)
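To also get the visible error indication and the rest period the question asks about, here is a minimal sketch along the same lines (fetch_row, append_to_csv and the sleep durations are placeholders of mine, not anything from the original post):
import time

while True:
    try:
        row = fetch_row()           # placeholder for your API call
        append_to_csv(row)          # placeholder for the CSV write
    except Exception as e:
        print(f"request failed: {e}; retrying in 60 s")   # visible indication of the error
        time.sleep(60)              # rest period before trying again
        continue
    time.sleep(10)                  # normal interval between successful calls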

I guess the retry package (and decorator) will suit your needs. You can specify what kind of exception it should catch and how many times it should retry before giving up completely. You can also specify the amount of time to wait between tries.
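A minimal sketch of how that might look (the decorator arguments shown here, tries, delay and backoff, are my assumption about the retry package's API, so double-check them against its docs):
import requests
from retry import retry  # pip install retry

# Retry only on connection problems, up to 5 times, waiting 2 s, 4 s, 8 s, ... between tries.
@retry(requests.exceptions.ConnectionError, tries=5, delay=2, backoff=2)
def fetch(url):
    return requests.get(url, timeout=10)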

Related

Refresh webpage if got connection time out selenium python

I am using rotating proxies and sometimes one of them randomly doesn't work and I get "ERR_TIMED_OUT" / unable to reach the server, and the script just crashes without continuing. Is it possible to refresh the webpage automatically in Selenium when this happens (so the proxy will rotate)? I thought about catching the exception and then calling driver.refresh(), but how can I catch it across the entire code without a try-except around every instruction? Is there another solution? Thanks!
You can use event_firing_webdriver:
https://www.selenium.dev/selenium/docs/api/py/webdriver_support/selenium.webdriver.support.event_firing_webdriver.html
You can decorate the get() method and execute the try/except there (with a refresh on the timeout exception).
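A rough sketch of that idea (assuming Chrome, and assuming the timeout surfaces as an exception whose message contains ERR_TIMED_OUT; note that EventFiringWebDriver re-raises after the listener runs, so you may still want a small retry loop around get()):
from selenium import webdriver
from selenium.webdriver.support.events import EventFiringWebDriver, AbstractEventListener


class RefreshOnTimeout(AbstractEventListener):
    # Called by EventFiringWebDriver whenever a wrapped call raises.
    def on_exception(self, exception, driver):
        if "ERR_TIMED_OUT" in str(exception):
            driver.refresh()  # refresh so the proxy rotates


driver = EventFiringWebDriver(webdriver.Chrome(), RefreshOnTimeout())
driver.get("https://example.com")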

Try Except for selenium webdriver or requests scripts

Thinking of best practices, is it good to use try/except when trying to get a response from requests.get(url) or when using selenium webdriver.get(url)?
Maybe a more general question: when is try/except meant to be used, apart from file handling?
Thank you.
for example:
import requests
try:
    respond = requests.get('https://www.google.com')
    print(respond.status_code)
except Exception as e:
    print(f'error while open url - {e}')
I would say it is good practice, even though it might never come up, especially when dealing with stable and widely used sites such as Google's.
But in the case that the site you are trying to request is down or not responding, in my experience try/except comes in handy and speeds up the process of finding the cause of an error.
This is good practice to have. From experience, if you leave the GET request outside of a try and loop through a list of URLs, the whole loop ends up falling over on the first failure.
Also, if a request fails, you can then handle that in the exception: for example, if your IP is blocked from accessing a website, you can output the URL and either retry with proxies or write the URL out for future handling, as in the sketch below.
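For instance, a hedged sketch of that loop (the URL list and the failure handling are illustrative):
import requests

urls = ['https://www.google.com', 'https://example.com']  # illustrative list
failed = []

for url in urls:
    try:
        response = requests.get(url, timeout=10)
        print(url, response.status_code)
    except requests.exceptions.RequestException as e:
        # one bad URL no longer stops the whole loop
        print(f'error while opening {url} - {e}')
        failed.append(url)  # keep it for a later retry, with proxies if needed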

Python 3 exception handling and catching

I'm designing a workflow engine for a very specific task and I'm thinking about exception handling.
I've got a main process that calls a few functions. Most of those functions call other more specific functions and so on. There are a few libraries involved so there are a lot of specific errors that can occur. IOError, OSError, AuthenticationException ...
I have to stop the workflow when an error occurs and log it so I can continue from that point when the error is resolved.
Example of what I mean:
def workflow_runner():
    download_file()
    ...
    # (more calls with their own exceptions)
    ...

def download_file():
    ftps = open_ftp_connection()
    ftps.get(filename)
    ...
    # (more calls with their own exceptions)
    ...

def open_ftp_connection():
    ftps = ftplib.FTP_TLS()
    try:
        ftps.connect(domain, port)
        ftps.login(username, password)
    except ftplib.all_errors as e:
        print(e)
        raise
    return ftps
Your basic, run of the mill, modular functions.
My question is this:
What's considered the best way of doing top to bottom error handling in Python 3?
To raise every exception to the top and thus put "try except" over each function call up the stack?
To handle every exception when it happens, log and raise and have no "try except" at the "top"?
Some better alternative?
Would it be better to just finish and raise the error on the spot or catch it in the "download_file" and/or "workflow_runner" functions?
I ask because if I end up catching everything at the top I feel like I might end up with:
except AError
except BError
...
except A4Error
It depends… You catch an exception at the point where you can do something about it. That differs between different functions and different exception types. A piece of code calls a subsystem (generically speaking any function), and it knows that subsystem may raise exception A, B or C. It now needs to decide what exceptions it expects and/or what it can do about each one of them. In the end it may decide to catch A and B exceptions, but it wouldn't make sense for it to catch C exceptions because it can't do anything about them. This now means this piece of code may raise C exceptions, and its callers need to be aware of that and make the same kinds of decisions.
So different exceptions are caught at different layers, as appropriate.
In more concrete terms, say you have some system which consists of some HTTP object which downloads stuff from remote servers, a job manager which wrangles a bunch of these HTTP objects and stores their results in a database, and a top-level coordinator that starts and stops the job managers. The HTTP objects may obviously raise all sorts of HTTP exceptions when network requests fail, and the job managers may raise exceptions when something's wrong with the database. You will probably let the job managers worry about HTTP errors like 404, but not about something fundamental like a ComputerDoesntHaveANetworkInterface error; equally, a DatabaseIsUnreachable exception is nothing a job manager can do anything about, and should probably lead to the termination of the application.
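A compressed sketch of that layering (every class and exception name here is invented for illustration, not taken from any real system):
import requests

class DatabaseIsUnreachable(Exception):
    """Illustrative: a fundamental failure the job manager cannot fix."""

def store_result(response):
    ...  # imagine this raises DatabaseIsUnreachable when the database is down

class JobManager:
    def run_job(self, url):
        try:
            response = requests.get(url, timeout=10)
            response.raise_for_status()   # turn 404 and friends into exceptions
        except requests.exceptions.RequestException as e:
            # HTTP failures are something the job manager can deal with:
            # log the job and move on.
            print(f"job failed for {url}: {e}")
            return
        # DatabaseIsUnreachable is deliberately *not* caught here; only the
        # coordinator gets to decide whether the whole application stops.
        store_result(response)

def coordinator(urls):
    manager = JobManager()
    try:
        for url in urls:
            manager.run_job(url)
    except DatabaseIsUnreachable:
        print("database unreachable, shutting down")
        raise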

Python threading passing statuses

Basically what I'm trying to do is fetch a couple of websites using proxies and process the data. The problem is that the requests rarely fail in a convincing way; setting socket timeouts wasn't very helpful either because they often didn't work.
So what I did is:
q = Queue()
s = ['google.com', 'ebay.com']  # And so on
for item in s:
    q.put(item)

def worker():
    item = q.get()
    data = fetch(item)  # This is the buggy part
    # Process the data, yadayada

for i in range(workers):
    t = InterruptableThread(target=worker)
    t.start()

# Somewhere else
if WorkerHasLivedLongerThanTimeout:
    worker.terminate()
(InterruptableThread class)
The problem is that I only want to kill threads which are still stuck on the fetching part. Also, I want the item to return to the queue. Ie:
def worker():
    self.status = 0
    item = q.get()
    data = fetch(item)  # This is the buggy part
    self.status = 1  # Don't kill me now, bro!
    # Process the data, yadayada

# Somewhere else
if WorkerHasLivedLongerThanTimeout and worker.status != 1:
    q.put(worker.item)
    worker.terminate()
How can this be done?
edit: breaking news; see the update below.
I decided recently that I wanted to do something pretty similar, and what came out of it was the pqueue_fetcher module. It ended up being mainly a learning endeavour: I learned, among other things, that it's almost certainly better to use something like twisted than to try to kill Python threads with any sort of reliability.
That being said, there's code in that module that more or less answers your question. It basically consists of a class whose objects can be set up to get locations from a priority queue and feed them into a fetch function that's supplied at object instantiation. If the location's resources get successfully received before their thread is killed, they get forwarded on to the results queue; otherwise they're returned to the locations queue with a downgraded priority. Success is determined by a passed-in function that defaults to bool.
Along the way I ended up creating the terminable_thread module, which just packages the most mature variation I could find of the code you linked to as InterruptableThread. It also adds a fix for 64-bit machines, which I needed in order to use that code on my ubuntu box. terminable_thread is a dependency of pqueue_fetcher.
Probably the biggest stumbling block I hit is that raising an asynchronous exception, as terminable_thread and the InterruptableThread you mentioned do, can have some weird results. In the test suite for pqueue_fetcher, the fetch function blocks by calling time.sleep. I found that if a thread is terminate()d while blocking like that, and the sleep call is the last (or not even the last) statement in a nested try block, execution will actually bounce to the except clause of the outer try block, even if the inner one has an except clause matching the raised exception. I'm still sort of shaking my head in disbelief, but there's a test case in pqueue_fetcher that reenacts this. I believe "leaky abstraction" is the correct term here.
I wrote a hacky workaround that just does some random thing (in this case getting a value from a generator) to break up the "atomicity" (not sure if that's actually what it is) of that part of the code. This workaround can be overridden via the fission parameter to pqueue_fetcher.Fetcher. It (i.e. the default one) seems to work, but certainly not in any way that I would consider particularly reliable or portable.
So my call, after discovering this interesting piece of data, was to avoid using this technique (i.e. calling ctypes.pythonapi.PyThreadState_SetAsyncExc) altogether from then on.
In any case, this still won't work if you need to guarantee that any request whose entire data set has been received (and, e.g., acknowledged to the server) gets forwarded on to results. To be sure of that, you have to guarantee that the bit that does the last network transaction and the forwarding is guarded from being interrupted, without guarding the entire retrieval operation from being interrupted (since that would prevent timeouts from working). And to do that you basically need to rewrite the retrieval operation (i.e. the socket code) to be aware of whichever exception you're going to raise with terminable_thread.Thread.raise_exc.
I've yet to learn twisted, but being the Premier Python Asynchronous Networking Framework©™®, I expect it must have some elegant or at least workable way of dealing with such details. I'm hoping it provides a parallel way to implement fetching from non-network sources (e.g. a local filestore, or a DB, or an etc.), since I'd like to build an app that can glean data from a variety of sources in a medium-agnostic way.
Anyhow, if you're still intent on trying to work out a way to manage the threads yourself, you can perhaps learn from my efforts. Hope this helps.
Update (this just in):
I've realized that the tests that I thought had stabilized have actually not, and are giving inconsistent results. This appears to be related to the issues mentioned above with exception handling and the use of the fission function. I'm not really sure what's going on with it, and don't plan to investigate in the immediate future unless I end up having a need to actually do things this way.
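If you'd rather sidestep killing threads entirely, which is where both the answer above and its update end up pointing, a hedged alternative is to put the timeout on the network call itself and re-queue items that fail; a rough sketch (using requests for the fetch is my substitution, not part of the original code):
import threading
from queue import Queue

import requests

q = Queue()
results = Queue()
for item in ['google.com', 'ebay.com']:
    q.put(item)

def worker():
    while True:
        item = q.get()
        try:
            # the timeout lives on the request itself, so no thread ever needs killing
            data = requests.get('http://' + item, timeout=30).text
        except requests.exceptions.RequestException:
            q.put(item)          # hand the item back for another attempt
        else:
            results.put((item, data))
        finally:
            q.task_done()

for _ in range(4):
    threading.Thread(target=worker, daemon=True).start()

q.join()  # returns once every item has succeeded; items that keep failing retry forever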

how to time-out gracefully while downloading with python

I'm downloading a huge set of files with the following code in a loop:
try:
    urllib.urlretrieve(url2download, destination_on_local_filesystem)
except KeyboardInterrupt:
    break
except:
    print "Timed-out or got some other exception: " + url2download
If the server times out on URL url2download when the connection is just initiating, the last exception is handled properly. But sometimes the server responds and downloading starts, yet the server is so slow that it'll take hours for even one file, and eventually it returns something like:
Enter username for Clients Only at albrightandomalley.com:
Enter password for in Clients Only at albrightandomalley.com:
and just hangs there (although no username/password is asked for if the same link is downloaded through the browser).
My intention in this situation would be to skip this file and go to the next one. The question is: how to do that? Is there a way in Python to specify how long it's OK to spend downloading one file, and if more time has already been spent, to interrupt and move on?
Try:
import socket
socket.setdefaulttimeout(30)
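Combined with the loop from the question, that would look roughly like this (Python 2, to match the original snippet; urls_to_download is illustrative, and note the timeout only fires when the connection stalls completely, not when data keeps trickling in slowly):
import socket
import urllib

socket.setdefaulttimeout(30)   # every socket urllib opens from now on times out after 30 s

for url2download in urls_to_download:          # urls_to_download is illustrative
    try:
        urllib.urlretrieve(url2download, destination_on_local_filesystem)
    except KeyboardInterrupt:
        break
    except:
        # a stalled connection now surfaces here as IOError/socket.timeout
        print "Timed-out or got some other exception: " + url2download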
If you're not limited to what's shipped with python out of the box, then the urlgrabber module might come in handy:
import urlgrabber
urlgrabber.urlgrab(url2download, destination_on_local_filesystem,
                   timeout=30.0)
There's a discussion of this here. Caveats (in addition to the ones they mention): I haven't tried it, and they're using urllib2, not urllib (would that be a problem for you?) (Actually, now that I think about it, this technique would probably work for urllib, too).
There's a more general question about timing out a function call:
How to limit execution time of a function call in Python
I've used the method described in my answer there to write a wait-for-text function with a timeout, to attempt an auto-login. If you'd like similar functionality you can reference the code here:
http://code.google.com/p/psftplib/source/browse/trunk/psftplib.py
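The general idea from that linked question is signal-based; a minimal sketch of it (Unix-only, main thread only; the names here are mine, not taken from psftplib):
import signal

class FunctionTimedOut(Exception):
    pass

def _alarm_handler(signum, frame):
    raise FunctionTimedOut()

def run_with_timeout(func, args=(), seconds=60):
    # Run func(*args); raise FunctionTimedOut if it takes longer than `seconds`.
    # Relies on SIGALRM, so it only works on Unix and from the main thread.
    old_handler = signal.signal(signal.SIGALRM, _alarm_handler)
    signal.alarm(seconds)
    try:
        return func(*args)
    finally:
        signal.alarm(0)                      # cancel any pending alarm
        signal.signal(signal.SIGALRM, old_handler)
You could then call run_with_timeout(urllib.urlretrieve, (url2download, destination_on_local_filesystem), seconds=300) to skip any file that takes more than five minutes.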
