Asynchronous Timer Implementation in Python

I need to implement an asynchronous timer to 'watch' the execution of a list of functions until the timer expires. The problem is that each function call is blocking, so how can I keep tracking the timer if a function takes too long to come back?
functions = [func_1, func_2, func_3, func_n]
timer = Timer(30)  # a 30-second timer, just for example

while not timer.expired():
    for func in functions:
        func()  # what if this function runs for a minute?
I would like to avoid multithreading and multiprocessing as far as possible, but if multiprocessing/threading is the only way out, then please provide those solutions as well.
What are the different ways in which asynchronous behaviour can be achieved in Python?

If the functions you call are blocking due to IO, you can use the asyncio module to turn them into non-blocking calls. At that point you wrap them in a future and set a timeout for their completion. Keep in mind that the timeout only covers the IO waits.
If the functions are blocking due to CPU-bound work (tight loops, long calculations), there is no way to achieve this without using processes.
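A minimal sketch of that future-plus-timeout idea, assuming the functions can be rewritten as async def coroutines (all names here are placeholders from the question):

import asyncio

async def watch(coros, timeout=30):
    # Gather everything into one future and give the batch a single deadline.
    # wait_for cancels the gathered tasks when the timer expires; note the
    # cancellation can only interrupt the functions while they await IO.
    try:
        return await asyncio.wait_for(asyncio.gather(*coros), timeout)
    except asyncio.TimeoutError:
        return None

# asyncio.run(watch([func_1(), func_2(), func_3()]))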

Related

Does Python 3 asyncio use a work-stealing scheduler like Rust Tokio?

Does Python 3 asyncio use a work-stealing scheduler like Rust Tokio? What's the behavior of the default scheduler? Is it documented somewhere?
"Work-stealing" is a property of multi-threaded executors. Python asyncio's executor (event loop) is single-threaded, so it's by definition not work-stealing. The behavior of the asyncio event loop wrt threads is documented (among other places) in the Concurrency and Multithreading section of the documentation.
As for the algorithm used for scheduling, it's intentionally unspecified, but the stdlib implementation uses:
a deque to store callbacks that are ready to run (those scheduled with call_soon() or create_task()) as well as those associated with file descriptors that are ready to read/write, and
a binary heap to store callbacks scheduled for a particular time, ordered by the absolute time at which they're supposed to fire. This covers callbacks scheduled with loop.call_later() and loop.call_at(), but also continuations of coroutines suspended by asyncio.sleep(), which internally uses loop.call_at().
At each loop iteration the loop waits for activity on the file descriptors associated with coroutines, setting the wait's timeout so it wakes up in time for the nearest time-based callback in case nothing interesting happens earlier. It then invokes the ready callbacks and any time-based callbacks scheduled to run at the current or an earlier time. This repeats until the event loop is instructed to stop.
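For illustration, a tiny sketch of the two scheduling paths described above (ready deque vs. time heap); the prints are just placeholders:

import asyncio

async def main():
    loop = asyncio.get_running_loop()
    # Goes on the ready deque and runs on the next loop iteration.
    loop.call_soon(print, "ready callback")
    # Goes on the time heap, keyed by absolute fire time.
    loop.call_later(0.5, print, "timed callback, ~0.5s later")
    await asyncio.sleep(1)  # itself a time-heap entry under the hood

asyncio.run(main())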

Which is more efficient? threading.Thread vs threading.Timer

This is more out of theoretical curiosity than an actual problem I am having.
Say you want to run some code at a regular interval. What are the pros and cons of using a Timer versus a thread plus time.sleep, in terms of CPU consumption?
The two approaches below do the same thing. I am aware that the Thread approach does not give an exact one-second interval, but rather adds a delay after each execution, which can matter if task_function takes a long time. I am also aware that there are many other ways to solve this problem, but let's focus on the threading package.
Timer approach
import threading
import time

def task_function():
    print(time.time())

def task():
    task_function()
    threading.Timer(1, task).start()

task()
Thread approach
def task_function():
    while True:
        print(time.time())
        time.sleep(1)

threading.Thread(target=task_function).start()
I read somewhere that starting a thread is quite resource intensive. So if you had some code you wanted to run every 0.1 seconds, wouldn't the Timer approach be sub-optimal, since a new thread has to be started so often?
If the code must repeat on an interval, use the plain Thread (to be clear, Timer is just a thin wrapper around a Thread in the first place; it's implemented as a subclass). Spawning a new thread (via Timer) 10x a second is wasteful, and gains you nothing in any event.
You should make the worker thread a daemon thread though, unless you really want it to keep the process alive indefinitely.
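For completeness, a sketch of that suggestion (the question's Thread approach plus the daemon flag):

import threading
import time

def task_function():
    while True:
        print(time.time())
        time.sleep(1)

# daemon=True means this thread won't keep the process alive on exit
worker = threading.Thread(target=task_function, daemon=True)
worker.start()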

Handling per worker timeout with Python multiprocessing

I've been having some trouble with Python's multiprocessing module. I need to test a function with different parameters, and as that function does a lot of calculations, using all the cores is most desirable. I ended up using pool.map(), which suits my needs. The problem is that sometimes my function never ends, so pool.map stays blocked forever waiting for a returned value. I don't know why this happens. I've done a lot of tests without multiprocessing, just passing one argument after another in a for loop, and it always ends.
Anyway, what I want to do now is to specify a timeout for each worker / execution of the function, but I need a variable inside the function to be returned in case that timeout is reached; that would be the status of the function just before the timeout happens. My code is too big to post here, but here's a simple, equivalent example:
import time
from multiprocessing import Pool

def func(x):
    secsPassed = 0
    for _ in xrange(x):
        time.sleep(1)
        secsPassed += 1
    return secsPassed

pool = Pool(4)
results = pool.map(func, [3, 10, 50, 20, 300])
So I'd like each execution to take at most 30 seconds, and I'd also like to know the value of secsPassed just before func gets interrupted. I'm using Python 2.7, and I can make changes to func or use another tool aside from Pool.map if necessary.
Thanks in advance.
This question has been asked several times in the past.
multiprocessing.Pool has not been designed for such use case.
Forcing one of the workers to commit suicide leads to undefined behaviour, which can range from the pool remaining stuck forever to your program crashing.
There are libraries which can solve your problem. pebble allows you to set a timeout for your workers and will stop them if the time limit is exceeded.
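A sketch following pebble's documented map-with-timeout pattern (details may vary by version); workers that exceed the limit are stopped, and their slot in the result iterator raises TimeoutError:

from concurrent.futures import TimeoutError
from pebble import ProcessPool

with ProcessPool(max_workers=4) as pool:
    future = pool.map(func, [3, 10, 50, 20, 300], timeout=30)
    iterator = future.result()
    while True:
        try:
            print(next(iterator))  # secsPassed for workers that finished
        except StopIteration:
            break                  # all items consumed
        except TimeoutError:
            print("a worker exceeded the 30 second limit")

Note that a worker stopped on timeout cannot hand back its partial secsPassed; to observe progress you would need to report it out-of-band, for example via a multiprocessing.Queue the worker updates as it goes.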

Not able to timeout using python eventlet library (eventlet.timeout.Timeout)

I am looping over a list and performing some action on each member of the list.
If a member takes too much time (1 second in this case), I intend to skip it. However, the block inside the try statement always runs to completion and never times out, and I don't understand why.
from eventlet import Timeout

for rule in data:
    # Timeout block
    t = Timeout(1)
    try:
        f = expr2bdd(expr(rule))
        solutions = satisfy_all(f, count=True)
        rule["solution"] = solutions
    except Timeout:
        pass
    finally:
        t.cancel()
Eventlet is a concurrent networking library...
It's not clear what the expr2bdd and satisfy_all functions do, but most likely they only do CPU calculations and no disk/network IO. In that case there is no point where Eventlet gets a chance to run and fire the timeout exception.
If you have control over the expr2bdd and satisfy_all functions and they contain any kind of loop, place eventlet.sleep(0) in each iteration. That's the Eventlet idiom for "yield control to other coroutines", and it is the point where the timeout can fire.
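A sketch of that idiom, assuming you can edit the inner loop (generate_candidates and check are hypothetical stand-ins for the real work):

import eventlet

def satisfy_all(f, count=True):
    solutions = 0
    for candidate in generate_candidates(f):  # hypothetical inner loop
        eventlet.sleep(0)  # yield to the hub so a pending Timeout can fire
        if check(candidate):                  # hypothetical per-item work
            solutions += 1
    return solutions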
If you don't have control over those functions, the second best option is to run them in a separate process which you can forcefully kill. On a POSIX-compatible OS (e.g. Linux, *BSD, OSX), you can use os.fork to run a piece of code in a separate process. For maximum portability, use subprocess.Popen([sys.executable, ...]) or multiprocessing.Process. The latter gives a higher-level API, mainly easier data exchange (serialization), at the cost of performance overhead, which may be negligible in your case. In any case, the basic pattern is this: in a thread or eventlet coroutine, start the second process and then .communicate()/join() on it, using eventlet.Timeout or Thread.join() with a timeout. If the timeout fires, call p.terminate() or p.kill() to stop the current calculations.
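A sketch of that pattern with multiprocessing.Process, reusing the names from the question and assuming each rule is picklable (error handling omitted):

import multiprocessing

def worker(rule, queue):
    f = expr2bdd(expr(rule))               # the original CPU-bound work
    queue.put(satisfy_all(f, count=True))

for rule in data:
    queue = multiprocessing.Queue()
    p = multiprocessing.Process(target=worker, args=(rule, queue))
    p.start()
    p.join(timeout=1)          # wait at most 1 second
    if p.is_alive():
        p.terminate()          # forcefully stop the stuck calculation
        p.join()
    else:
        rule["solution"] = queue.get()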

Python: set a function timeout without using signal or threads?

Is there a way to have a function raise an error if it takes longer than a certain amount of time to return? I want to do this without using signal (because I am not in the main thread) and without spawning more threads, which is cumbersome.
If your function is looping through a lot of things, you could check the elapsed time during each iteration of the loop... but if it's blocked on something for a long period, then you need some other thread to handle the timing while the thread you're timing is blocked.
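For the loop case, a sketch of that elapsed-time check (do_work is a hypothetical per-item function):

import time

def process_items(items, timeout=5.0):
    deadline = time.monotonic() + timeout
    results = []
    for item in items:
        if time.monotonic() > deadline:
            raise TimeoutError("timed out after %.1f seconds" % timeout)
        results.append(do_work(item))  # hypothetical per-item work
    return results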
