How does asyncio.sleep work with negative values? - python

I decided to implement sleep sort (https://rosettacode.org/wiki/Sorting_algorithms/Sleep_sort) using Python's asyncio when I made a strange discovery: it works with negative values (and returns immediately with 0)!
Here is the code (you can run it here https://repl.it/DYTZ):
import asyncio
import random

async def sleepy(value):
    return await asyncio.sleep(value, result=value)

async def main(input_values):
    result = []
    for sleeper in asyncio.as_completed(map(sleepy, input_values)):
        result.append(await sleeper)
    print(result)

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    input_values = list(range(-5, 6))
    random.shuffle(input_values)
    loop.run_until_complete(main(input_values))
The code takes 5 seconds to execute, as expected, but the result is always [0, -5, -4, -3, -2, -1, 1, 2, 3, 4, 5]. I can understand 0 returning immediately, but how are the negative values coming back in the right order?

Well, looking at the source:
delay == 0 is special-cased to return immediately, it doesn't even try to sleep.
Non-zero delay calls events.get_event_loop(). Since there are no calls to events.set_event_loop_policy(policy) in asyncio.tasks, it would seem to fall back on the default unless it's already been set somewhere else, and the default is asyncio.DefaultEventLoopPolicy.
This is not defined in events.py, because it differs between Windows and UNIX.
Either way, sleep calls loop.create_future(). That's defined a few inheritances back, over in base_events.BaseEventLoop. It's just a simple call to the Future() constructor, no significant logic.
From the instance of Future it delegates back to the loop, as follows:
future._loop.call_later(delay,
                        futures._set_result_unless_cancelled,
                        future, result)
That one is also in BaseEventLoop, and still doesn't directly handle the delay number: it calls self.call_at, adding the current time to the delay.
call_at schedules and returns an events.TimerHandle, and the callback is to tell the Future it's done. The return value is only relevant if the task is to be cancelled, which it is automatically at the end for cleanup. The scheduling is the important bit.
_scheduled is sorted via heapq - everything goes on there in sorted order, and timers sort by their _when. This is key.
Every time it checks, it strips out all cancelled scheduled things, then runs all remaining scheduled callbacks, in order, until it hits one that's not ready.
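As a rough sketch of that ordering (this is not asyncio's actual code; the scheduled and ready names are made up for illustration), entries whose "when" is already in the past pop out of the heap first, most overdue first:
import heapq
import time

# Minimal sketch: keep (when, callback) entries in a heap and pop
# everything whose "when" has already passed, earliest first.
now = time.monotonic()
scheduled = []
for delay in (-5, 3, -1, 0.5, -3):
    heapq.heappush(scheduled, (now + delay, f"callback({delay})"))

ready = []
while scheduled and scheduled[0][0] <= now:
    ready.append(heapq.heappop(scheduled)[1])

print(ready)  # the negative delays come out first, most negative first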
TL;DR:
Sleeping with asyncio for a negative duration schedules tasks to be "ready" in the past. This means that they go to the top of the list of scheduled tasks, and are run as soon as the event loop checks. Effectively, 0 comes first because it doesn't even schedule, but everything else registers to the scheduler as "running late" and is handled immediately in order of how late it is.

If you take a look at the asyncio source, sleep special-cases 0 and returns immediately.
if delay == 0:
    yield
    return result
If you continue through the source, you'll see that any other value gets passed through to the event loop's call_later method. Looking at how call_later is implemented for the default loop (BaseEventLoop), you'll see that call_later passes a time to call_at.
self.call_at(self.time() + delay, callback, *args)
The reason the values are returned in order is that the times computed for negative delays occur before those computed for positive delays.
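To see this concretely, you can print the absolute deadlines that call_at would receive (a small sketch; the delays are just the question's input range):
import asyncio

async def demo():
    loop = asyncio.get_running_loop()
    now = loop.time()
    # call_later computes an absolute deadline of loop.time() + delay, so
    # negative delays give deadlines already in the past, which sort ahead
    # of every positive delay in the loop's timer heap.
    for delay in range(-5, 6):
        print(delay, '->', now + delay)

asyncio.run(demo())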

Related

Python Concurrency: Wait for one out of few futures and cancel the rest

I am usually using concurrent.futures.ThreadPoolExecutor for executing tasks concurrently in Python.
There is a function that is very lengthy and non-deterministic in terms of the time it takes to execute (it gets a proxy, sends an HTTP request, etc.).
I want to call it a few times (let's say 2), and here is where it gets complicated for me:
When one of the tasks finishes, I would like to check its return value, and if it's True, carry on with the code path; I don't care anymore about the second task and there is no need to wait for it.
But if the return value is False, I would like to wait for the second task to finish and then continue with the code path.
I tried to look in several places here on SO, like this Python concurrency question, but still couldn't understand how to do it precisely.
You can use concurrent.futures.wait with the return_when=FIRST_COMPLETED option, which waits on multiple futures but returns as soon as any of them completes. Even simpler, though, is concurrent.futures.as_completed, which gives you an iterator that yields the futures as they complete or are cancelled.
f1 = executor.submit(job_1)
f2 = executor.submit(job_2)
for f in concurrent.futures.as_completed((f1, f2)):
    if not f.cancelled() and f.result():
        # a job completed and returned True, so skip the rest
        break
else:
    # handle case where none of the tasks succeeded
    pass
# normal code path
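For completeness, here is a rough sketch of the wait()-based alternative mentioned above (using the same hypothetical job_1/job_2 callables as the snippet before it):
import concurrent.futures
from concurrent.futures import FIRST_COMPLETED

with concurrent.futures.ThreadPoolExecutor() as executor:
    futures = [executor.submit(job_1), executor.submit(job_2)]
    done, not_done = concurrent.futures.wait(futures, return_when=FIRST_COMPLETED)
    if any(f.result() for f in done):
        # the first finished task returned True: cancel the rest and move on
        for f in not_done:
            f.cancel()
    else:
        # otherwise wait for the remaining task(s) before continuing
        concurrent.futures.wait(not_done)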

Multiprocessing with shared queue and end criteria

I've got this original function that I want to switch to multiprocess:
def optimal(t0, tf, frequences, delay, ratio=0):
    First = True  # First
    for s in delay:
        delay = 0  # delay between signals
        timelines = list()
        for i in range(len(frequences)):
            timelines.append(time_builder(frequences[i], t0 + delay, tf))
            delay += s

        trio_overlap = trio_combination(timelines, ratio)

        valid = True
        for items in trio_overlap.values():
            if len(list(set(items))) == len(items):
                continue
            else:
                valid = False

        if not valid:
            continue

        overlap = duo_combination(timelines)
        optimal = ...  # depending on conditions

    return optimal
If valid is True after the test, it computes an optimization parameter called optim_param and tries to minimize it. If it gets under a certain threshold (optim_param < 0.3), I break out of the loop and take this value as my answer.
My problem is that as I develop my model, the complexity is starting to rise, and single-threaded computation takes too long. I would like to run the computation in parallel. Since each process will have to compare the result obtained with an s value against the current optimal, I tried to implement a Queue.
It's my first time doing multiprocessing, and even if I think I'm on the right track, I kinda feel like my code is messy and incomplete. Could I get some help?
Thanks :D
Instead of manually creating a process for each case, consider using Pool.imap_unordered. The trick is how to cleanly shut down once a passable result is obtained: you can implement this by passing a generator that checks a flag every cycle and exits early once it is set. The main program reads from the iterator, maintains the best result seen, and sets the flag when it is good enough. The final trick is to slow down the (internal) thread reading from the generator, to prevent a large backlog of scheduled tasks that would have to be waited on (or, uncleanly, killed) after the good result is obtained. Given the number of processes in the pool, that pacing can be achieved with a semaphore.
Here's an example (with trivial analysis) to demonstrate:
import multiprocessing, threading, os

def interrupted(data, sem, interrupt):
    # Yield items from data, pausing on the semaphore so the pool's feeder
    # thread can't race too far ahead; stop once the interrupt flag is truthy.
    for x in data:
        yield x
        sem.acquire()
        if interrupt:
            break

def analyze(x):
    return x**2

np = os.cpu_count()
pool = multiprocessing.Pool(np)
sem = threading.Semaphore(np - 1)
token = []  # mutable "interrupt" flag

vals = pool.imap_unordered(analyze, interrupted(range(-10, 10), sem, token))
pool.close()  # optional: to let processes exit faster

best = None
for res in vals:
    if best is None or res < best:
        best = res
        if best < 5:
            token.append(None)  # make it truthy
    sem.release()
pool.join()

print(best)
There are of course other ways to share the semaphore and interrupt flag with the generator; this way uses an ugly data type but has the virtue of using no global variables (or even closures).

Round robin scheduler - call functions periodically at appropriate times

I have a list of functions (1..N) and function i needs to be called every X_i seconds (X_i would be large such as 1000+ s). Each X_i doesn't have to be unique, i.e. it is possible that X_i == X_j.
Provided I generate a list of (function_i, X_i), how can I simply execute these functions at their appropriate times in the future, sleeping between calls? I have used APScheduler before, but it runs tasks in parallel and I need the functions to run one after the other.
I can write my own iterator which returns the current function that needs to be executed and blocks until the next one but I'd rather use a library if one exists?
EDIT: N is about 200 at the moment.
threading module
The threading module lets you start a new thread, which will not be affected by other threads' sleep statements. This requires N threads, so if N is extremely huge, let me know and I will try to think of an alternative solution.
You can create N threads and set each one on a timed loop, like so:
import threading, time

def looper(function, delay):  # Creates a function that will loop the given function
    def inner():  # Will keep looping once invoked
        while True:
            function()  # Call the function; you can optionally add args
            time.sleep(delay)  # Swap this line and the one before it to wait before running rather than after
    return inner  # The function that should be called to start the loop is returned

def start(functions, delays):  # Call this with the two lists to start the loops
    for function, delay in zip(functions, delays):  # Goes through the respective pairs
        thread = threading.Thread(target=looper(function, delay))  # This thread will start the looper
        thread.start()

start([lambda: print("hi"), lambda: print("bye")], [0.2, 0.3])
You can try it online here; just hit run, and then hit run again when you want to kill it. (Thanks to @DennisMitchell for the online interpreter.)

Python asyncio task ordering

I have a question about how the event loop in python's asyncio module manages outstanding tasks. Consider the following code:
import asyncio

@asyncio.coroutine
def a():
    for i in range(0, 3):
        print('a.' + str(i))
        yield

@asyncio.coroutine
def b():
    for i in range(0, 3):
        print('b.' + str(i))
        yield

@asyncio.coroutine
def c():
    for i in range(0, 3):
        print('c.' + str(i))
        yield

tasks = [
    asyncio.Task(a()),
    asyncio.Task(b()),
    asyncio.Task(c()),
]

loop = asyncio.get_event_loop()
loop.run_until_complete(asyncio.wait(tasks))
Running this will print:
a.0
b.0
c.0
a.1
b.1
c.1
a.2
b.2
c.2
Notice that it always prints out 'a' then 'b' then 'c'. I'm guessing that no matter how many iterations each coroutine goes through it will always print in that order. So you'd never see something like
b.100
c.100
a.100
Coming from a node.js background, this tells me that the event loop here is maintaining a queue internally that it uses to decide which task to run next. It initially puts a() at the front of the queue, then b(), then c() since that's the order of the tasks in the list passed to asyncio.wait(). Then whenever it hits a yield statement it puts that task at the end of the queue. I guess in a more realistic example, say if you were doing an async http request, it would put a() back on the end of the queue after the http response came back.
Can I get an amen on this?
Currently your example doesn't include any blocking I/O code. Try this to simulate some tasks:
import asyncio

@asyncio.coroutine
def coro(tag, delay):
    for i in range(1, 8):
        print(tag, i)
        yield from asyncio.sleep(delay)

loop = asyncio.get_event_loop()

print("---- await 0 seconds :-) --- ")
tasks = [
    asyncio.Task(coro("A", 0)),
    asyncio.Task(coro("B", 0)),
    asyncio.Task(coro("C", 0)),
]
loop.run_until_complete(asyncio.wait(tasks))

print("---- simulate some blocking I/O --- ")
tasks = [
    asyncio.Task(coro("A", 0.1)),
    asyncio.Task(coro("B", 0.3)),
    asyncio.Task(coro("C", 0.5)),
]
loop.run_until_complete(asyncio.wait(tasks))

loop.close()
As you can see, coroutines are scheduled as needed, and not in order.
DISCLAIMER: For at least v3.9 with the default implementation this appears to be true. However, the inner workings of the event loop are not a public interface and thus may change in new versions. Additionally, asyncio allows the BaseEventLoop implementation to be substituted, which may change this behavior.
When a Task object is created, it calls loop.call_soon to register its _step method as a callback. The _step method actually does the work of calling your coroutine with calls to send() and processing the results.
In BaseEventLoop, loop.call_soon places the _step callback at the end of a _ready list of callbacks. Each run of the event loop iterates the list of _ready callbacks in FIFO order and calls them. Thus, for the initial run of tasks, they are executed in the order they were created.
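A quick way to see that FIFO behaviour from the outside (a minimal sketch, not the Task machinery itself):
import asyncio

async def main():
    loop = asyncio.get_running_loop()
    order = []
    # call_soon appends each callback to the loop's ready queue, so they run FIFO.
    for name in ("a", "b", "c"):
        loop.call_soon(order.append, name)
    await asyncio.sleep(0)  # yield once so the loop runs the queued callbacks
    print(order)  # ['a', 'b', 'c']

asyncio.run(main())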
When a task awaits or yields a future, the point at which its _wakeup method gets put back into the queue really depends on the nature of that future.
Also, note that other callbacks can be registered in between creation of tasks. While it is true that if TaskA is created before TaskB, the initial run of TaskA will happen before TaskB, there could still be other callbacks that get run in between.
Lastly, the above behavior applies to the default Task class that comes with asyncio. It's possible, however, to specify a custom task factory and use an alternative task implementation, which could also change this behavior.
(This is a follow up to D-Rock's answer, was too long to be a comment.)
The execution order of callbacks is guaranteed in the asyncio documentation in a few places.
The loop.call_soon() docs guarantee the execution order:
Callbacks are called in the order in which they are registered. Each callback will be called exactly once.
The Future.add_done_callback() docs specify that callbacks are scheduled via loop.call_soon(), and thus have this guaranteed FIFO order.
And asyncio.Task is described as a subclass of asyncio.Future, and so has the same behaviour for add_done_callback().
So I think it's pretty safe to rely on FIFO ordering of asyncio callbacks, at least when using vanilla asyncio.
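For example, done callbacks registered on the same future fire in their registration order (a small sketch):
import asyncio

async def main():
    fut = asyncio.get_running_loop().create_future()
    order = []
    # add_done_callback schedules each callback via loop.call_soon when the
    # future completes, so they run in the order they were registered.
    fut.add_done_callback(lambda f: order.append("first"))
    fut.add_done_callback(lambda f: order.append("second"))
    fut.set_result(None)
    await asyncio.sleep(0)  # let the loop run the scheduled callbacks
    print(order)  # ['first', 'second']

asyncio.run(main())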

Python, sleep some code not all

I have a situation where, at some point in my code, I want to trigger a number of timers. The code will keep running, but at some point these functions will trigger and remove an item from a given list, similar (though not exactly) to the code below. The problem is that I want these functions to wait a certain amount of time, and the only way I know how is to use sleep, but that stops all of the code when I need the first function to keep running. So how can I set a function aside without making everything wait for it? If the answer involves threading, please know that I have very little experience with it and like explanations with pictures and small words.
from time import sleep
from datetime import datetime

def func():
    x = 1
    for i in range(20):
        if i % 4 == 0:
            func2()
            print("START", datetime.now())
            x += 1
        else:
            print("continue")

def func2():
    print("go")
    sleep(10)
    print("func 2--------------------------------------", datetime.now())

func()
You need to use threading. http://docs.python.org/2/library/threading.html
You can start functions in their own threads.
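A minimal sketch of that idea, modeled on the question's "remove an item from a list after a delay" use case (the delayed_remove name and the values are made up for illustration):
import threading
import time

def delayed_remove(items, value, delay):
    # Runs in its own thread, so the sleep does not block the rest of the program.
    time.sleep(delay)
    items.remove(value)

items = [1, 2, 3]
threading.Thread(target=delayed_remove, args=(items, 2, 10)).start()
print("main code keeps running while the timer thread sleeps")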
I used a background function; it keeps running in the background even while you move on to something else.
You need to import threading, and also time if you want to use time.sleep():
import threading
import time
I had a function where I wanted to sleep some code in the background; here is an example:
# This is the one that will sleep, but since you passed args to the Thread,
# it will not make mainFunction sleep as well.
def backgroundFunction(obj):
    theObj = obj
    time.sleep(120)
    # updates Food to 5 after 2 minutes
    obj["Food"] = 5
    return

def mainFunction():
    obj = {"Food": 4, "Water": 3}
    # Make sure there is a comma in args().
    t1 = threading.Thread(target=backgroundFunction, args=(obj,))
    t1.start()
    return
If you used t1 = threading.Thread(target=backgroundFunction(obj)), it would not run in the background, so don't do that unless you want mainFunction to sleep as well.
Depending on the situation, another option might be an event queue based system. That avoids threads, so it can be simpler.
The idea is that instead of using sleep(20), you calculate when the event should fire, using datetime.now() + timedelta(seconds=20). You then put that in a sorted list.
Regularly, perhaps each time through the main loop of your program, you check the first element in the list; if the time has passed, you remove it and call the relevant function.
To add an event:
pending_events.append((datetime.now() + timedelta(seconds=20), e))
pending_events.sort()
Then, as part of your main loop:
for ...:  # your main loop
    # handle timed events:
    while pending_events and pending_events[0][0] < datetime.now():
        the_time, e = pending_events.pop(0)
        handle_event(e, the_time)
    ...  # rest of your main loop
This relies on your main loop regularly calling the event-handling code, and on the event-handling code not taking much time to handle the event. Depending on what the main loop and the events are doing, this may come naturally or it may be some effort or it may rule out this method...
Notes:
You only need to check the first element in the list, because the list is sorted in time order; checking the first element checks the earliest one and you don't need to check the others until that one has passed.
Instead of a sorted list, you can use a heapq, which is more complicated but faster (sketched after these notes); in practice, you'd need a lot of pending events to notice any difference.
If the event is to be "every 20s" rather than "after 20s", use the_time + timedelta(seconds=20) to schedule each subsequent event; that way, the delay in getting to and processing the event won't be added.
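If you do switch to heapq, the bookkeeping looks roughly like this (a sketch only, reusing the same hypothetical e and handle_event names as above):
import heapq
from datetime import datetime, timedelta

pending_events = []

# to add an event:
heapq.heappush(pending_events, (datetime.now() + timedelta(seconds=20), e))

# in the main loop:
while pending_events and pending_events[0][0] < datetime.now():
    the_time, e = heapq.heappop(pending_events)
    handle_event(e, the_time)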
