python2.7 multiprocess pool async vs sync

Lots of docs say: 'apply' is for sync while 'apply_async' is for async.
And I read the source code of multiprocessing (in the file multiprocessing/pool.py), which says:
def apply(self, func, args=(), kwds={}):
    assert self._state == RUN
    return self.apply_async(func, args, kwds).get()
...
def apply_async(self, func, args=(), kwds={}, callback=None):
    assert self._state == RUN
    ...
    return result
It seems that apply just calls apply_async; the only difference is their return values.
So my question is:
What's the real difference between sync and async here, and why?

The huge difference is the .get() at the end of:
return self.apply_async(func, args, kwds).get()
apply_async() on its own does not block the caller: the call to apply_async() returns at once, and gives you back an AsyncResult object. Such objects have (among others) a .get() method, which blocks until the invoked process finishes running func(*args, **kwds) and returns its result.
Since apply() blocks until the result is ready, it's impossible to get more than one worker process working simultaneously if apply() is all you use. Sometimes that's what you want, but not usually. Using apply_async() instead, you can fire off as many tasks as you like in parallel, and retrieve their results later.

The difference is in the get() call. apply() blocks until the submitted call is done; apply_async() returns immediately with an ApplyResult object on which you must call get() to obtain the return value. Furthermore, the async version supports a callback which is executed when the work finishes, allowing event-driven operation.
If you submit multiple tasks with apply_async, their completion order is not guaranteed to match the submission order. Check out the map() function if you need results in order.
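To make the difference concrete, here is a small sketch (the square function is hypothetical, not from the question):
from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == '__main__':
    pool = Pool(4)

    # apply() blocks until one worker has run square(3) and returned
    print(pool.apply(square, (3,)))

    # apply_async() returns AsyncResult objects immediately; the workers run
    # in parallel and each .get() blocks only until that result is ready
    async_results = [pool.apply_async(square, (i,)) for i in range(10)]
    print([r.get() for r in async_results])

    # map() blocks like apply(), but distributes the work and keeps the order
    print(pool.map(square, range(10)))

    pool.close()
    pool.join()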

Related

Running Image Manipulation in run_in_executor. Adapting to multiprocessing

Hey, so I run lots of image manipulation on an API built using FastAPI with async endpoints. I would like to run the image manipulation asynchronously, so I used run_in_executor, which I believe runs it in a separate thread. However, I was told that using Python multiprocessing is better instead. Does moving have any advantages?
import asyncio
import functools
from app.exceptions.errors import ManipulationError

def executor(function):
    @functools.wraps(function)
    def decorator(*args, **kwargs):
        try:
            partial = functools.partial(function, *args, **kwargs)
            loop = asyncio.get_event_loop()
            return loop.run_in_executor(None, partial)
        except Exception:
            raise ManipulationError("Unable To Manipulate Image")
    return decorator
I made this decorator to wrap my blocking funcs so they run in an executor.
Two questions:
a) Does moving to multiprocessing have any advantages?
b) How would I do so?
a) Does moving to multiprocessing have any advantages?
Yes, it utilizes multiple cores in case of CPU-bound processing.
b) How would I do so?
By passing an instance of ProcessPoolExecutor to run_in_executor. (The None value you're passing now means use the default executor provided by asyncio, which is a ThreadPoolExecutor.) For example (untested):
import asyncio
import concurrent.futures
import functools

_pool = concurrent.futures.ProcessPoolExecutor()

def executor(function):
    @functools.wraps(function)
    def decorator(*args):
        loop = asyncio.get_event_loop()
        return loop.run_in_executor(_pool, function, *args)
    return decorator
This will also require that all arguments to the function be serializable, so that they can be transferred to the subprocess.
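For instance, the same idea can be sketched without the decorator; heavy_transform here is a hypothetical stand-in for one of the CPU-bound image functions:
import asyncio
import concurrent.futures

_pool = concurrent.futures.ProcessPoolExecutor()

def heavy_transform(pixels):
    # hypothetical CPU-bound stand-in for the image manipulation;
    # the arguments and the return value must be picklable
    return [p * p for p in pixels]

async def main():
    loop = asyncio.get_running_loop()
    # runs in a subprocess instead of the default thread pool
    result = await loop.run_in_executor(_pool, heavy_transform, list(range(10)))
    print(result)

if __name__ == '__main__':
    asyncio.run(main())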

How to run a coroutine twice in Python?

I have a wrapper function which might run a coroutine several times:
async def _request_wraper(self, courutine, attempts=5):
    for i in range(1, attempts):
        try:
            task_result = await asyncio.ensure_future(courutine)
            return task_result
        except SOME_ERRORS:
            do_smth()
            continue
The coroutine might be created from different async functions, which may accept different numbers of required/optional arguments.
On the second loop iteration, I get the error --> cannot reuse already awaited coroutine
I have tried to make a copy of the coroutine, but that is not possible with copy and deepcopy.
What would be a possible solution to run a coroutine twice?
As you already found out, you can't await a coroutine many times. It simply doesn't make sense: coroutines aren't functions.
It seems what you're really trying to do is retry an async function call with arbitrary arguments. You can use arbitrary argument lists (*args) and the keyword argument equivalent (**kwargs) to capture all arguments and pass them to the function.
async def retry(async_function, *args, attempts=5, **kwargs):
    for i in range(attempts):
        try:
            return await async_function(*args, **kwargs)
        except Exception:
            pass  # (you should probably handle the error here instead of ignoring it)
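A usage sketch, with a hypothetical flaky coroutine function; note that the coroutine function and its arguments are passed separately, so a fresh coroutine object is created on every attempt:
import asyncio
import random

async def flaky_fetch(url, timeout=1.0):
    # hypothetical coroutine function that fails some of the time
    if random.random() < 0.5:
        raise ConnectionError("temporary failure")
    await asyncio.sleep(timeout)
    return "data from " + url

async def main():
    # retry() awaits a brand-new coroutine on each attempt,
    # so nothing is awaited twice
    result = await retry(flaky_fetch, "http://example.com", timeout=0.1, attempts=5)
    print(result)

asyncio.run(main())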

Python multithreading function argument

I was writing some multithreading code and had a syntax issue in my code, and found that the code was not executing in parallel but rather sequentially. I fixed the issue by passing the function and its argument to submit() separately instead of calling the function myself, but I couldn't figure out why Python was behaving that way and couldn't find documentation for it. Anyone know why?
import time
from concurrent.futures import ThreadPoolExecutor

def do_work(i):
    print("{} {} - Command started".format(i, time.time()))
    time.sleep(1)

count = 0
executor = ThreadPoolExecutor(max_workers=2)
while count < 5:
    print("Starting work")
    executor.submit(do_work(count))
    print("Work submitted")
    count += 1
Fixed this line to make it go parallel.
executor.submit(do_work, count)
You were telling Python to execute the function do_work(), and to then pass whatever that function returned to executor.submit():
executor.submit(do_work(count))
It might be easier for you to see this if you used a variable to hold the result of do_work(). The following is functionally equivalent to the above:
do_work_result = do_work(count)
executor.submit(do_work_result)
In Python, functions are first-class objects; using just the name do_work you are referencing the function object. Only adding (...) to an expression that produces a function object (or another callable object type) causes something to be executed.
In the form
executor.submit(do_work, count)
you do not call the function. You are passing in the function object itself as the first argument, and count as the second argument. The executor.submit() function accepts callable objects and their arguments to then later on run those functions in parallel, with the arguments provided.
This allows the ThreadPoolExecutor to take that function reference and the single argument and only call the function in a new thread, later on.
Because you were calling the function yourself, each call had to finish before the next one started, so the work ran sequentially before anything was even submitted. And because do_work() returns None, you were passing those None references to executor.submit(), and would have seen a TypeError exception later on telling you that 'NoneType' object is not callable. That happens because the thread pool executor tried to call None(), which doesn't work because, indeed, None is not callable.
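A compact sketch of both calls (reusing the question's do_work and executor setup):
from concurrent.futures import ThreadPoolExecutor
import time

def do_work(i):
    print("{} {} - Command started".format(i, time.time()))
    time.sleep(1)

executor = ThreadPoolExecutor(max_workers=2)

# Wrong: do_work(0) runs here, in the main thread, and submit() receives its
# return value, None; the worker later fails with
# TypeError: 'NoneType' object is not callable when it tries None()
bad_future = executor.submit(do_work(0))

# Right: the function object and its argument are passed separately,
# and a worker thread calls do_work(1) later
good_future = executor.submit(do_work, 1)
good_future.result()  # blocks until that worker call has finished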
Under the hood, the library essentially does this:
def submit(self, fn, *args, **kwargs):
    # record the function to be called as a work item, with other information
    w = _WorkItem(..., fn, args, kwargs)
    self._work_queue.put(w)
so a work item referencing the function and its arguments is added to a queue. Worker threads are created which take items from that queue; when an item is taken (in another thread), its _WorkItem.run() method is called, which runs your function:
result = self.fn(*self.args, **self.kwargs)
Only then is the (...) call syntax used. Because there are multiple threads, the code is executed concurrently.
You do want to read up on how pure Python code can't run in parallel, only concurrently: Does Python support multithreading? Can it speed up execution time?
Your do_work() functions only run 'faster' because time.sleep() doesn't have to do any actual work, apart from telling the kernel to not give any execution time to the thread the sleep was executed on, for the requested amount of time. You end up with a bunch of threads all asleep. If your workers had to execute Python instructions, then the total time spent on running these functions concurrently or sequentially would not differ all that much.

use tornado coroutine in call stack

I am new to Tornado and have some questions about Tornado's coroutines.
If I have a call stack that looks like:
func_a => func_b => func_c => func_d
and func_d is an asynchronous function where I use yield and the @gen.coroutine decorator,
just like this:
@gen.coroutine
def redis_data(self, id):
    ret = yield asyn_function()
    raise gen.Return(ret)
Must I use yield and @gen.coroutine with func_c, func_b and func_a?
Yes, all your coroutine's callers must also be coroutines, and they must yield the result of your coroutine.
Why? No coroutine can do I/O without executing a yield statement. Look at your code: might it need to talk to the server? Then it must yield. So must its caller, and so on up the chain, so that ultimately you have yielded to the event loop. Otherwise the loop cannot make progress and the I/O does not complete.
This is both a technical requirement of coroutine code, and an advantage of coroutines over threads. You always know by looking at your code when you can be interrupted:
https://glyph.twistedmatrix.com/2014/02/unyielding.html
For more on refactoring coroutines, see:
http://emptysqua.re/blog/refactoring-tornado-coroutines/
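For illustration, a minimal sketch of the whole chain might look like this (the asyn_function stub and the function bodies are hypothetical, standing in for the question's code):
from tornado import gen, ioloop

@gen.coroutine
def asyn_function():
    # stand-in for the real asynchronous call from the question
    yield gen.sleep(0.1)
    raise gen.Return("redis data")

@gen.coroutine
def func_d():
    # the only function that actually performs asynchronous I/O
    ret = yield asyn_function()
    raise gen.Return(ret)

@gen.coroutine
def func_c():
    result = yield func_d()  # every caller must also be a coroutine and yield
    raise gen.Return(result)

@gen.coroutine
def func_b():
    result = yield func_c()
    raise gen.Return(result)

@gen.coroutine
def func_a():
    result = yield func_b()
    raise gen.Return(result)

print(ioloop.IOLoop.current().run_sync(func_a))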

intervals with pythons threading

I made this setInterval, kind of like JavaScript's, only for Python, and it's a little different. The problem is I can't figure out how to cancel it, since when I cancel it, it has already made a new one and continues to run that one after I canceled the original thread.
def setInterval(sec, func, *args, **kw):
    def inner():
        func(*args, **kw)
        setInterval(sec, func, *args, **kw)  # This is where it sends it again
    task = threading.Timer(sec, inner)
    task.daemon = True
    task.start()
    return task
As you can see, it works; however, I have no way of canceling the thread because the original executes again and creates a new one before it can be canceled. How would I set this up so that if the thread is canceled it won't create any copies of the original? I've tried adding some keywords, then sending the thread back in and having it cancel if it is of type threading.Timer, but it doesn't seem to work, since it already makes a copy of the original before it can be canceled. Any ideas or suggestions? I'm just trying to think of a way so it knows that it's a copy of the original and then doesn't execute, but I'm not sure how I would actually do that. Is there anything I could do so it terminates/cancels the original thread and doesn't start a new copy of it before it has a chance to be canceled? Here is my attempt at doing it, in case you want to tell me what I'm doing wrong.
def setInterval(sec=None, func=None, *args, **kw):
    start = True if type(func) != threading.Timer else func.cancel()  # doesn't work anyways
    def inner():
        func(*args, **kw)
        setInterval(sec, func, *args, **kw)
    if start == True:
        task = threading.Timer(sec, inner)
        task.daemon = True
        task.start()
        return task

def clearInterval(task):
    setInterval(func=task)

myInterval = setInterval(10, print, "Hello, world!")
clearInterval(myInterval)  # cancels original, continues to make copies
Cancelling a Timer before it runs prevents it from calling its callback function. But if it's already woken up, it's too late to cancel it. And, since your code creates a new Timer inside the callback, that new Timer won't know it's supposed to be canceled.
So, you need to cooperate with the callback function, by giving it access to some flag that lets it know it's been canceled (e.g., with a mutable global or closure variable, function attribute, or default parameter value). And of course that variable needs to be synchronized, meaning you read it under a Lock, or you've got a race condition. (In CPython, thanks to the GIL, the worst-case scenario of that race should be occasionally running one extra time, but in a different implementation, it's possible that the timer thread (where the callback is running) could never see the updated value.)
However, this is going to get complicated. You're probably better off first extending the Timer class to a RepeatingTimer. Then, if you really want to you can wrap that in trivial setInterval/clearInterval functions.
There are plenty of recipes on ActiveState and packages on PyPI that add a repeat flag or equivalent, and also do other nice things like use a single thread instead of creating a new thread for every interval. But if you want to know how to do this yourself, it's pretty easy. In fact, the threading docs have a link to the threading.py source because it's meant to be useful as example code, and you can see how trivial Timer is, so you could even re-implement it yourself.
But let's do it with subclassing instead.
class RepeatableTimer(threading.Timer):
    def __init__(self, interval,
                 function, args=None, kwargs=None, repeat=False):
        super(RepeatableTimer, self).__init__(interval, function, args, kwargs)
        self.repeat = repeat
        self.lock = threading.Lock()

    def cancel(self):
        with self.lock:
            self.repeat = False
        super(RepeatableTimer, self).cancel()

    def run(self):
        while True:
            self.finished.clear()
            super(RepeatableTimer, self).run()
            with self.lock:
                if not self.repeat:
                    break
I think it would actually be simpler to implement it from scratch, because then you don't have to worry about resetting the Event, but anyway, this works.
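For comparison, a from-scratch sketch (a hypothetical RepeatingTimer built directly on Thread and Event, not part of the answer above) could look like this:
import threading

class RepeatingTimer(threading.Thread):
    """Call function(*args, **kwargs) every interval seconds until cancelled."""
    def __init__(self, interval, function, args=None, kwargs=None):
        super(RepeatingTimer, self).__init__()
        self.interval = interval
        self.function = function
        self.args = args if args is not None else ()
        self.kwargs = kwargs if kwargs is not None else {}
        self.finished = threading.Event()

    def cancel(self):
        self.finished.set()

    def run(self):
        # wait() returns False when the timeout expires and True once cancel()
        # has set the event, so there is no Event to reset and no lock needed
        while not self.finished.wait(self.interval):
            self.function(*self.args, **self.kwargs)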
If you want to, e.g., extend this so instead of a repeat flag there's a times integer (which is -1 for "repeat forever") as in JavaScript, that's trivial.
Anyway, now you can wrap this in setInterval and clearInterval functions:
def setInterval(sec=None, func=None, *args, **kw):
    task = RepeatableTimer(sec, func, args, kw, repeat=True)
    task.daemon = True
    task.start()
    return task

def clearInterval(task):
    task.cancel()
Although note that clearInterval = RepeatableTimer.cancel would work just as well. (In Python 2.x, this would be an unbound method vs. a function, but it would still work the same, other than giving different error messages if you called it with the wrong args. In 3.x, there is no difference at all.)
If you really want to do the whole mess with making clearInterval call setInterval, you can, but let's at least clean it up a bit—use isinstance instead of type, and don't try to set a flag that you use in a second if when you can just do it all in a single if:
def setInterval(sec=None, func=None, *args, **kw):
    if isinstance(func, RepeatableTimer):
        func.cancel()
    else:
        task = RepeatableTimer(sec, func, args, kw, repeat=True)
        task.daemon = True
        task.start()
        return task

def clearInterval(task):
    setInterval(func=task)
But I don't see what you think that's buying you.
Anyway, here's a test to verify that it works:
myInterval = setInterval(1, print, "Hello, world!")
time.sleep(3)
clearInterval(myInterval)
time.sleep(5)
This should usually print "Hello, world!" 2 or 3 times, occasionally 4, but never 7 or 8 like an uncancellable timer would.
