I am looping over a list and performing some action on each member of the list.
If a member takes too much time (1 second in this case), I intend to skip it. However, the block inside the try statement always runs to completion and never times out. I don't understand why.
from eventlet import *

for rule in data:
    # Timeout block
    t = Timeout(1)
    try:
        f = expr2bdd(expr(rule))
        solutions = satisfy_all(f, count=True)
        each_rule["solution"] = solutions
    except:
        pass
    finally:
        t.cancel()
Eventlet is a concurrent networking library...
It's not clear what the expr2bdd and satisfy_all functions do, but most likely they only do CPU calculations and no disk/network IO. In that case there is no point at which Eventlet gets a chance to run and fire the timeout exception.
If you have control over the expr2bdd and satisfy_all functions and they contain any kind of loops, place eventlet.sleep(0) at each iteration. That's the Eventlet idiom for "yield control to other coroutines", and that's where the timeout will fire.
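For illustration, here is a hedged sketch of that idiom applied to a CPU-bound loop. The real satisfy_all is a library function; this body, and the enumerate_assignments/check helpers, are made up purely to show where the sleep goes:

import eventlet

def enumerate_assignments(f):
    return range(1000)            # hypothetical stand-in for the real candidate generator

def check(f, assignment):
    return assignment % 7 == 0    # hypothetical stand-in for the real CPU-bound test

def satisfy_all(f, count=True):
    solutions = 0
    for assignment in enumerate_assignments(f):
        if check(f, assignment):
            solutions += 1
        eventlet.sleep(0)         # yield to the hub; a pending Timeout can fire here
    return solutions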
If you don't have control over said functions, the second best option is to run them in a separate process, which you can forcefully kill. On a POSIX-compatible OS (e.g. Linux, *BSD, OS X), you can use os.fork to run a piece of code in a separate process. For maximum portability, use subprocess.Popen([sys.executable, ...]) or multiprocessing.Process. The latter gives a higher level API, mainly easier data exchange (serialization), at the cost of some performance overhead, which may be negligible in your case. In any case, the basic pattern is this: in a thread or eventlet coroutine, start the second process and then .communicate()/join() on it, using eventlet.Timeout or Thread.join() with a timeout. If the timeout fires, call p.terminate() or p.kill() to stop the calculations. A sketch of that pattern follows.
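A minimal sketch of that pattern, reusing the question's own names (data, each_rule, expr, expr2bdd and satisfy_all come from the question; the solve helper and the Queue-based result passing are my assumptions):

from multiprocessing import Process, Queue

def solve(rule, results):
    # expr2bdd/satisfy_all are the question's (non-yielding) functions
    f = expr2bdd(expr(rule))
    results.put(satisfy_all(f, count=True))

for rule in data:
    results = Queue()
    p = Process(target=solve, args=(rule, results))
    p.start()
    p.join(timeout=1)        # wait at most one second for this rule
    if p.is_alive():         # took too long: kill the worker and skip the rule
        p.terminate()
        p.join()
        continue
    each_rule["solution"] = results.get()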
Related
I'm using Python's multiprocessing module to handle a number of functions concurrently. Each spawned process's function gets some initial input arguments and a Pipe connection to send its results back. For various reasons, I must use individual processes like this, i.e. tools like the Pool.map_async() methods are off the table.
Occasionally, I need to terminate a process which takes too long to finish.
According to the Process documentation:
Warning: If this method is used when the associated process is using a
pipe or queue then the pipe or queue is liable to become corrupted and
may become unusable by other process. Similarly, if the process has
acquired a lock or semaphore etc. then terminating it is liable to
cause other processes to deadlock.
I'm not worried about the first part, as each process gets its own pipe object, but how do I determine whether a process has 'acquired a lock or semaphore', and/or terminate it in a way that's safe for the remainder of my program?
As a side note: It might be worthwhile to check why your subprocesses are taking 'too long to finish'.
As for the warning, it relates to when you 'lock' resources for use. For example:
# function to withdraw from account
def withdraw(balance, lock):
    for _ in range(10000):
        lock.acquire()
        balance.value = balance.value - 1
        lock.release()
source: https://www.geeksforgeeks.org/synchronization-pooling-processes-python/
If you terminate the subprocess after it has performed lock.acquire() and before it has performed lock.release(), you end up with a deadlock situation.
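A minimal, self-contained sketch of that failure mode (the function names are made up for illustration): the first child is terminated while it holds a shared lock, so the second child blocks on lock.acquire() forever.

from multiprocessing import Process, Lock
import time

def hold_and_die(lock):
    lock.acquire()              # grab the shared lock...
    time.sleep(60)              # ...and get terminated before releasing it

def wants_lock(lock):
    lock.acquire()              # blocks forever: the lock was never released
    print('never reached')

if __name__ == '__main__':
    lock = Lock()
    p1 = Process(target=hold_and_die, args=(lock,))
    p1.start()
    time.sleep(1)               # give p1 time to grab the lock
    p1.terminate()              # kill p1 while it still holds the lock
    p2 = Process(target=wants_lock, args=(lock,))
    p2.start()
    p2.join(timeout=5)          # p2 is still stuck on lock.acquire()
    print('p2 deadlocked:', p2.is_alive())
    p2.terminate()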
So the question is: do you use any multiprocessing.Lock or multiprocessing.Semaphore objects, shared with other processes, in the processes that you want to terminate?
I hope this helps in understanding whether it is safe to terminate your subprocess/thread.
EDIT: By the way, you should also consider using kill() instead of terminate().
On *nix, you can try sending SIGINT to the process instead of terminating/killing it, and catch the KeyboardInterrupt exception in the child for the cleanup:
from multiprocessing import Process
import os
import time
import signal

def f():
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        print('Do the emergency cleanup here')

p = Process(target=f)
p.start()
time.sleep(1)  # give the child a moment to enter the try block
os.kill(p.pid, signal.SIGINT)
p.join()
Q : "Safe way to terminate a python subprocess?"
Well, if there were one, you would never have run into this.
If your actual needs can justify the costs of doing it right and well, the best workaround is to learn from the masters of reliable processing and design robust, resilient, self-healing systems - as Ms. Margaret Hamilton (MIT) pioneered in NASA's Apollo Moon landing programme, designing the AGC (Apollo Guidance Computer) so right and so well that it could survive its own deadlocking risks, preventing the Eagle lander from crashing into the Moon's surface.
The best inspirations come from distributed computing, available to Pythonistas who design safe and self-healing autonomous components on top of robust, independent many-node-to-many-node communication-plane frameworks such as ZeroMQ or nanomsg.
So I have this library that I use, and within one of my functions I call a function from that library which happens to take a really long time. At the same time I have another thread running that checks various conditions; if a condition is met, I want to cancel the execution of the library function.
Right now I'm checking the conditions at the start of the function, but if the conditions happen to change while the library function is running, I don't need its results and want to return from it.
Basically this is what I have now.
def my_function():
    if condition_checker.condition_met():
        return
    library.long_running_function()
Is there a way to run the condition check every second or so and return from my_function when the condition is met?
I've thought about decorators, coroutines, I'm using 2.7 but if this can only be done in 3.x I'd consider switching, it's just that I can't figure out how.
You cannot terminate a thread. Either the library supports cancellation by design, where it internally would have to check for a condition every once in a while to abort if requested, or you have to wait for it to finish.
What you can do is call the library in a subprocess rather than a thread, since processes can be terminated through signals. Python's multiprocessing module provides a threading-like API for spawning forks and handling IPC, including synchronization.
Or spawn a separate subprocess via subprocess.Popen if forking is too heavy on your resources (e.g. memory footprint through copying of the parent process).
I can't think of any other way, unfortunately.
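A hedged sketch of the multiprocessing approach, reusing the question's library and condition_checker names (the run_library_call helper, the Queue for the result, and the one-second polling interval are my own assumptions):

import time
from multiprocessing import Process, Queue

def run_library_call(results):
    results.put(library.long_running_function())    # the question's library call

def my_function():
    results = Queue()
    worker = Process(target=run_library_call, args=(results,))
    worker.start()
    while worker.is_alive():
        if condition_checker.condition_met():        # re-check the condition every second
            worker.terminate()                        # forcefully stop the library call
            worker.join()
            return None
        time.sleep(1)
    return results.get()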
Generally, I think you want to run your long_running_function in a separate thread, and have it occasionally report its information to the main thread.
This post gives a similar example within a wxpython program.
Presuming you are doing this outside of wxpython, you should be able to replace the wx.CallAfter and wx.Publisher with threading.Thread and PubSub.
It would look something like this:
import threading
import time

def myfunction():
    # subscribe to the long_running_function
    while True:
        # subscribe to the long_running_function and get the published data
        if condition_met:
            # publish a stop command
            break
        time.sleep(1)

def long_running_function():
    for loop in loops:
        # subscribe to the main thread and check for a stop command; if found, break
        # do an iteration
        # publish some data
        pass

# launches your long_running_function in the background without blocking flow
threading.Thread(group=None, target=long_running_function, args=()).start()
myfunction()
I haven't used pubsub a ton so I can't quickly whip up the code but it should get you there.
As an alternative, do you know the stop criteria before you launch the long_running_function? If so, you can just pass it as an argument and check whether it is met internally.
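If the stop criterion can be checked cooperatively, a threading.Event is a simple substitute for a full pub/sub setup. A hedged sketch (do_iteration is a made-up placeholder for one unit of the real work):

import threading
import time

stop_event = threading.Event()

def do_iteration():
    time.sleep(0.1)                  # placeholder for one unit of real work

def long_running_function(stop_event):
    while not stop_event.is_set():   # check the stop flag on every iteration
        do_iteration()

worker = threading.Thread(target=long_running_function, args=(stop_event,))
worker.start()
time.sleep(2)                        # ... later, once the stop condition is met ...
stop_event.set()                     # ask the worker to stop cooperatively
worker.join()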
I have a python program which operates an external program and starts a timeout thread. Timeout thread should countdown for 10 minutes and if the script, which operates the external program isn't finished in that time, it should kill the external program.
My thread seems to work fine at first glance; my main script and the thread run simultaneously with no issues. But if a pop-up window appears in the external program, it stops my scripts, so that even the countdown thread stops counting, therefore totally failing at its job.
I assume the issue is that the script calls a blocking function in API for the external program, which is blocked by the pop up window. I understand why it blocks my main program, but don't understand why it blocks my countdown thread. So, one possible solution might be to run a separate script for the countdown, but I would like to keep it as clean as possible and it seems really messy to start a script for this.
I have searched everywhere for a clue, but I didn't find much. There was a reference to the gevent library here:
background function in Python
, but it seems like such a basic task that I don't want to include an external library for this.
I also found a solution which uses a Windows multimedia timer here, but I've never worked with that before and am afraid the code won't be flexible enough. The script is Windows-only, but it should work on all Windows versions from XP on.
For Unix I found signal.alarm which seems to do exactly what I want, but it's not available for Windows. Any alternatives for this?
Any ideas on how to work with this in the most simplified manner?
This is the simplified thread I'm creating (run in IDLE to reproduce the issue):
import threading
import time

class timeToKill():
    def __init__(self, minutesBeforeTimeout):
        self.stop = threading.Event()
        self.countdownFrom = minutesBeforeTimeout * 60

    def startCountdown(self):
        self.countdownThread = threading.Thread(target=self.countdown, args=(self.countdownFrom,))
        self.countdownThread.start()

    def stopCountdown(self):
        self.stop.set()
        self.countdownThread.join()

    def countdown(self, seconds):
        for second in range(seconds):
            if self.stop.is_set():
                break
            else:
                print(second)
                time.sleep(1)

timeout = timeToKill(1)
timeout.startCountdown()
raw_input("Blocking call, waiting for input:\n")
One possible explanation for a function call blocking another Python thread is that CPython uses a global interpreter lock (GIL) and the blocking API call doesn't release it. (NOTE: CPython releases the GIL on blocking I/O calls, therefore your raw_input() example should work as is.)
If you can't make the buggy API call release the GIL, then you could use a process instead of a thread, e.g. multiprocessing.Process instead of threading.Thread (the API is the same). Different processes are not limited by the GIL.
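A hedged sketch of that swap applied to the question's countdown (the countdown is moved into a module-level function and a __main__ guard is added, as multiprocessing on Windows requires; the 60-second figure is just an example):

import multiprocessing
import time

def countdown(seconds, stop):
    # runs in its own process, so a blocking call in the parent cannot freeze it
    for second in range(seconds):
        if stop.is_set():
            break
        print(second)
        time.sleep(1)

if __name__ == '__main__':
    stop = multiprocessing.Event()                     # shared stop flag
    p = multiprocessing.Process(target=countdown, args=(60, stop))
    p.start()
    raw_input("Blocking call, waiting for input:\n")   # blocks the parent only
    stop.set()                                         # cancel the countdown
    p.join()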
For quick and dirty "threading" I usually resort to the subprocess module. It is quite robust and OS independent. It does not give as fine-grained control as the thread and queue modules, but for external calls to programs it generally does nicely. Note that shell=True must be used with caution.
import subprocess
import time

# this can be any command
p1 = subprocess.Popen(["python", "SUBSCRIPTS/TEST.py", "0"], shell=True)
# p1 now runs in the background, asynchronously;
# if you want to kill it after some time, poll and kill it as shown below

# here do some other tasks/computations
time.sleep(10)

currentStatus = p1.poll()
if currentStatus is None:  # then it is still running
    try:
        p1.kill()  # maybe try os.kill(p1.pid, 2) if p1.kill() does not work
    except OSError:
        # process finished on its own in the meantime - maybe do nothing?
        pass
I'm writing a web UI for data analysis tasks.
Here's the way it's supposed to work:
After a user specifies parameters like dataset and learning rate, I create a new task record, then an executor for this task is started asynchronously (the executor may take a long time to run), and the user is redirected to some other page.
After searching for an async library for python, I started with eventlet, here's what I wrote in a flask view function:
db.save(task)
eventlet.spawn(executor, task)
return redirect("/show_tasks")
With the code above, the executor didn't execute at all.
What may be the problem of my code? Or maybe I should try something else?
While you have been given direct solutions, I will try to answer your first question and explain why your code does not work as expected.
Disclosure: I currently maintain Eventlet. This answer contains a number of simplifications to fit into a reasonable size.
Brief introduction to cooperative multithreading
There are two ways to do multithreading and Eventlet exploits the cooperative approach. At the core is the Greenlet library, which basically allows you to create independent "execution contexts". One can think of such a context as the frozen state of all local variables plus a pointer to the next instruction. Basically, multithreading = contexts + scheduler. Greenlet provides the contexts, so we need a scheduler: something that decides which context should occupy the CPU right now. It turns out that to make those decisions we also need to run some code, which means a separate context (green thread). This special green thread is called a Hub in the Eventlet code base. The scheduler maintains an ordered set of contexts that need to run ASAP - the run queue - and a set of contexts that are waiting for something (e.g. network IO or a time-limited sleep) to finish.
But since we are doing cooperative multitasking, one context will execute indefinitely unless it explicitly yields to another. That would be a very sad style of programming, and also by definition incompatible with existing libraries (pointing at they-know-who); so what Eventlet does is provide green versions of common modules, changed in such a way that they switch to the Hub instead of blocking everything. Then some time may be spent in other green threads or in the Hub's wait-for-external-events implementation, in which case the Hub switches back to the green thread that originated the event - and it continues execution.
End. Now back to your problem.
What eventlet.spawn actually does: it creates a new execution context. Basically, it allocates an object in memory. It also tells the scheduler to put this context into the run queue, so at the first possible moment the Hub will switch to the newly spawned function. Your code does not provide such a moment. There is no place where you explicitly give up execution to other green threads; in Eventlet this is usually done via eventlet.sleep(). And since you don't use green versions of common modules, there is no chance to yield implicitly while other code waits. The most appropriate (if not the only) place would be your WSGI server's accept loop: it should give other green threads a chance to run while waiting for the next request. The eventlet.monkey_patch() mentioned in the first answer is just a convenient way to replace all (or a subset of) common modules with their corresponding green versions.
Unwanted opinion on overall design
In a separate section, so it can be skipped easily. If you are building error-resistant software, you usually want to limit the execution time of spawned threads (including but not limited to "green" ones) and processes, and at least report (log) or react to their unhandled errors. In the provided code, your spawned green thread may technically run the next moment or five minutes later (again, because nobody yields the CPU), or fail with an unhandled exception. Luckily, Eventlet provides solutions for both problems: Timeout and with_timeout() allow you to limit waiting time (remember, if the code does not yield, you can't possibly limit it), and GreenThread.link() lets you catch all exceptions. It may be tempting (it was for me) to reraise those exceptions in the "main" code, and link() allows that easily, but consider that the exceptions would be raised from sleep and IO calls - the places where you yield to the Hub. This may produce some really counterintuitive tracebacks.
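For concreteness, a hedged sketch of those two tools working together (executor here is a stand-in for the real task runner and the 60-second limit is arbitrary; eventlet.spawn, eventlet.with_timeout and GreenThread.link are the real Eventlet APIs):

import eventlet

task = 'example-task'    # placeholder for the real task object

def executor(task):
    eventlet.sleep(5)    # placeholder work; it must yield for a timeout to be enforceable
    return 'done'

def on_finish(gt, task):
    try:
        result = gt.wait()   # re-raises any unhandled exception from executor
        print('task %r finished: %r' % (task, result))
    except Exception as exc:
        print('task %r failed: %r' % (task, exc))

# give up on the executor after 60 seconds, and always report its outcome
gt = eventlet.spawn(eventlet.with_timeout, 60, executor, task, timeout_value=None)
gt.link(on_finish, task)

eventlet.sleep(6)        # in a standalone script, yield so the green thread can actually run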
You'll need to patch some system libraries in order to make eventlet work. Here is a minimal working example (also as gist):
#!/usr/bin/env python

from flask import Flask
import time
import eventlet

eventlet.monkey_patch()

app = Flask(__name__)
app.debug = True

def background():
    """ do something in the background """
    print('[background] working in the background...')
    time.sleep(2)
    print('[background] done.')
    return 42

def callback(gt, *args, **kwargs):
    """ this function is called when results are available """
    result = gt.wait()
    print("[cb] %s" % result)

@app.route('/')
def index():
    greenth = eventlet.spawn(background)
    greenth.link(callback)
    return "Hello World"

if __name__ == '__main__':
    app.run()
More on that:
http://eventlet.net/doc/patching.html#monkey-patch
One of the challenges of writing a library like Eventlet is that the built-in networking libraries don’t natively support the sort of cooperative yielding that we need.
Eventlet may indeed be suitable for your purposes, but it doesn't just fit in with any old application; Eventlet requires that it be in control of all your application's I/O.
You may be able to get away with either
Starting Eventlet's main loop in another thread, or even
Not using Eventlet and just spawning your task in another thread.
Celery may be another option.
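As a rough sketch of the "just spawn your task in another thread" option above (the executor body, the example parameters and the /start route are made up; db.save is the question's own call):

import threading
from flask import Flask, redirect

app = Flask(__name__)

def executor(task):
    pass    # placeholder for the long-running analysis job

@app.route('/start')
def start_task():
    task = {'dataset': 'example', 'learning_rate': 0.01}   # made-up parameters
    # db.save(task) would go here, exactly as in the question
    worker = threading.Thread(target=executor, args=(task,))
    worker.daemon = True    # don't let this worker block interpreter shutdown
    worker.start()          # runs concurrently; the view returns immediately
    return redirect('/show_tasks')

if __name__ == '__main__':
    app.run()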
What is the best way to continuously repeat the execution of a given function at a fixed interval while being able to terminate the executor (thread or process) immediately?
Basically I know two approaches:
use multiprocessing and function with infinite cycle and time.sleep at the end. Processing is terminated with process.terminate() in any state.
use threading and constantly recreate timers at the end of the thread function. Processing is terminated by timer.cancel() while sleeping.
(both “in any state” and “while sleeping” are fine, even though the latter may not be immediate). The problem is that I have to use both multiprocessing and threading, as the latter appears not to work on ARM (some fuzzy interaction of the Python interpreter and vim; outside of vim everything is fine) (I was using the second approach there; I have not tried threading+cycle; no code is currently left), and the former spawns way too many processes which I would like not to see unless really required. This leads to the problem of having to code two different approaches, while threading-with-cycle is just a few more imports plus drop-in replacements for all the multiprocessing stuff wrapped in if/else (except that there is no thread.terminate()). Is there a better way to do the job?
Currently used code is here (currently with cycle for both jobs), but I do not think it will be of much use in answering the question.
Update: The reason I am using this solution is functions that display file status (and some other things, like the branch) of version control systems in the vim statusline. These statuses must be updated, but updating them immediately cannot be done without using hooks, and I have no idea how to set hooks temporarily and remove them on vim quit without possibly spoiling the user's configuration. Thus the standard solution is a cache expiring after N seconds. But when the cache has expired I need to do an expensive shell call, and the delay is noticeable - the more noticeable the heavier the IO load is. What I am implementing now is updating the values for viewed buffers every N seconds in a separate process, so the delays bother that process and not me. Threads are likely to also work, because the GIL does not affect calls to external programs.
I'm not clear on why a single long-lived thread that loops infinitely over the tasks wouldn't work for you? Or why you end up with many processes in the multiprocess option?
My immediate reaction would have been a single thread with a queue to feed it things to do. But I may be misunderstanding the problem.
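For what it's worth, a hedged sketch of that single-worker-plus-queue idea (refresh_status is a made-up stand-in for the expensive shell call, and the sentinel-based shutdown is just one possible convention):

import threading
import time
try:
    import queue             # Python 3
except ImportError:
    import Queue as queue    # Python 2

def refresh_status(buffer_name):
    pass                     # made-up stand-in for the expensive VCS shell call

def worker(tasks):
    while True:
        buffer_name = tasks.get()    # block until there is something to do
        if buffer_name is None:      # sentinel: time to shut down
            break
        refresh_status(buffer_name)

tasks = queue.Queue()
t = threading.Thread(target=worker, args=(tasks,))
t.daemon = True
t.start()

for _ in range(3):
    tasks.put('some_buffer')    # feed it work every N seconds (here, every 5)
    time.sleep(5)
tasks.put(None)                 # ask the worker to exit
t.join()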
I do not know how to do this simply and/or cleanly in Python, but I was wondering whether you could take advantage of an existing system scheduler, e.g. crontab on *nix systems.
There is a Python API for it, and it might satisfy your needs.
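If that route appeals, one option is the third-party python-crontab package (my suggestion; the answer above does not name a specific library), which lets you manage crontab entries from Python:

# pip install python-crontab   (third-party package, assumed here)
from crontab import CronTab

cron = CronTab(user=True)                                       # current user's crontab
job = cron.new(command='python /path/to/update_status.py',      # made-up script path
               comment='refresh VCS status cache')
job.minute.every(1)                                             # run once a minute
cron.write()                                                    # persist the new entry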