Hey, so I run a lot of image manipulation on an API built with FastAPI (async). I would like to run the image manipulation asynchronously, so I used run_in_executor, which I believe runs it in a separate thread. However, I was told that using Python multiprocessing would be better instead. Does moving have any advantages?
import asyncio
import functools
from app.exceptions.errors import ManipulationError
def executor(function):
    @functools.wraps(function)
    def decorator(*args, **kwargs):
        try:
            partial = functools.partial(function, *args, **kwargs)
            loop = asyncio.get_event_loop()
            return loop.run_in_executor(None, partial)
        except Exception:
            raise ManipulationError("Unable To Manipulate Image")
    return decorator
I made this decorator to wrap my blocking functions so they run in an executor.
Two questions:
a) Does moving to multiprocessing have any advantages?
b) How would I do so?
a) Does moving to multiprocessing have any advantages?
Yes, it utilizes multiple cores in case of CPU-bound processing.
b) How would I do so
By passing an instance of ProcessPoolExecutor to run_in_executor. (The None value you're passing now means use the default executor provided by asyncio, which is a ThreadPoolExecutor.) For example (untested):
import concurrent.futures

_pool = concurrent.futures.ProcessPoolExecutor()

def executor(function):
    @functools.wraps(function)
    def decorator(*args):
        loop = asyncio.get_event_loop()
        return loop.run_in_executor(_pool, function, *args)
    return decorator
This will also require that all arguments to the function be serializable, so that they can be transferred to the subprocess.
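For example, a hypothetical sketch of what that might look like (blur_image, the file paths, and the Pillow calls are my own illustration, not from the question). With a process pool both the function and its arguments are pickled, so keep the worker function at module level and pass picklable values such as file paths or bytes rather than open file handles. Note that pickle locates functions by their module-level name, so if a decorator replaces that name with a wrapper, pickling the original function can fail; calling run_in_executor directly, as below, sidesteps that.

from PIL import Image, ImageFilter

def blur_image(input_path, output_path):
    # CPU-bound work that runs inside the worker process
    img = Image.open(input_path)
    img.filter(ImageFilter.GaussianBlur(4)).save(output_path)

async def handle_request():
    loop = asyncio.get_event_loop()
    # Offload to the process pool and await the result without blocking the event loop
    await loop.run_in_executor(_pool, blur_image, "in.png", "out.png")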
I have a wrapper function which might run a coroutine several times:
async def _request_wraper(self, courutine, attempts=5):
    for i in range(1, attempts):
        try:
            task_result = await asyncio.ensure_future(courutine)
            return task_result
        except SOME_ERRORS:
            do_smth()
            continue
The coroutine might be created from different async functions, which may accept different numbers of required/optional arguments.
On the second loop iteration I get the error: cannot reuse already awaited coroutine.
I have tried to make a copy of the coroutine, but that is not possible with copy or deepcopy.
What could be a possible solution to run the coroutine twice?
As you already found out, you can't await a coroutine many times. It simply doesn't make sense, coroutines aren't functions.
It seems what you're really trying to do is retry an async function call with arbitrary arguments. You can use arbitrary argument lists (*args) and the keyword argument equivalent (**kwargs) to capture all arguments and pass them to the function.
async def retry(async_function, *args, attempts=5, **kwargs):
    for i in range(attempts):
        try:
            return await async_function(*args, **kwargs)
        except Exception:
            pass  # (you should probably handle the error here instead of ignoring it)
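Called like this (a hypothetical sketch; fetch_json, session, and the URL are my own illustration, assuming an aiohttp-style client): instead of passing an already-created coroutine object, you pass the async function plus its arguments, so a fresh coroutine is created on every attempt.

async def fetch_json(session, url):
    async with session.get(url) as resp:
        return await resp.json()

# Instead of: await self._request_wraper(fetch_json(session, url))
# call:       await retry(fetch_json, session, url, attempts=3)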
I was writing some multithreading code and had a mistake in it that caused the code to execute sequentially rather than in parallel. I fixed it by passing the function and its argument to submit() separately instead of calling the function inside submit(), but I couldn't figure out why Python was behaving that way and couldn't find documentation for it. Anyone know why?
import time
from concurrent.futures import ThreadPoolExecutor

def do_work(i):
    print("{} {} - Command started".format(i, time.time()))
    time.sleep(1)

count = 0
executor = ThreadPoolExecutor(max_workers=2)

while count < 5:
    print("Starting work")
    executor.submit(do_work(count))
    print("Work submitted")
    count += 1
I fixed this line to make it run in parallel:
executor.submit(do_work, count)
You were telling Python to execute the function do_work(), and to then pass whatever that function returned to executor.submit():
executor.submit(do_work(count))
It might be easier for you to see this if you used a variable to hold the result of do_work(). The following is functionally equivalent to the above:
do_work_result = do_work(count)
executor.submit(do_work_result)
In Python, functions are first-class objects; using just the name do_work you are referencing the function object. Only adding (...) to an expression that produces a function object (or another callable object type) causes something to be executed.
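A quick illustration of that point (my own sketch, not from the original post):

def do_work(i):
    return i * 2

f = do_work     # no call: f is just another reference to the function object
print(f)        # <function do_work at 0x...>
print(f(21))    # 42 -- only the trailing (...) actually calls the function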
In the form
executor.submit(do_work, count)
you do not call the function. You are passing in the function object itself as the first argument, and count as the second argument. The executor.submit() function accepts callable objects and their arguments to then later on run those functions in parallel, with the arguments provided.
This allows the ThreadPoolExecutor to take that function reference and the single argument and only call the function in a new thread, later on.
Because you were calling the function yourself, you had to wait for each call to complete before the next one could even be submitted, so everything ran sequentially. And because do_work() returns None, you were passing those None values to executor.submit(), and would have seen a TypeError exception later on telling you that 'NoneType' object is not callable. That happens because the thread pool executor tried to call None(), which doesn't work because, indeed, None is not callable.
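You can reproduce that with a small sketch (my own, not the poster's code); the exception is captured in the returned future and only re-raised when you ask for the result:

from concurrent.futures import ThreadPoolExecutor

def do_work(i):
    print(i)    # returns None

with ThreadPoolExecutor(max_workers=2) as executor:
    future = executor.submit(do_work(0))   # do_work runs here, in the main thread
    future.result()                        # TypeError: 'NoneType' object is not callable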
Under the hood, the library essentially does this:
def submit(self, fn, *args, **kwargs):
    # record the function to be called as a work item, with other information
    w = _WorkItem(..., fn, args, kwargs)
    self._work_queue.put(w)
so a work item referencing the function and arguments is added to a queue. Worker threads take items from that queue; when an item is taken (in another thread, or in a child process for a process pool), its _WorkItem.run() method is called, which runs your function:
result = self.fn(*self.args, **self.kwargs)
Only then the (...) call syntax is used. Because there are multiple threads, the code is executed concurrently.
You do want to read up on how pure Python code can't run in parallel, only concurrently: Does Python support multithreading? Can it speed up execution time?
Your do_work() functions only run 'faster' because time.sleep() doesn't have to do any actual work, apart from telling the kernel to not give any execution time to the thread the sleep was executed on, for the requested amount of time. You end up with a bunch of threads all asleep. If your workers had to execute Python instructions, then the total time spent on running these functions concurrently or sequentially would not differ all that much.
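Putting it all together, here is a corrected version of the loop (my own sketch): the function object and its argument are handed to submit(), and the returned futures let you wait for all the work to finish.

import time
from concurrent.futures import ThreadPoolExecutor

def do_work(i):
    print("{} {} - Command started".format(i, time.time()))
    time.sleep(1)

with ThreadPoolExecutor(max_workers=2) as executor:
    futures = [executor.submit(do_work, count) for count in range(5)]
    for future in futures:
        future.result()   # re-raises any exception from the worker thread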
I am new to tornado and have some questions about tornado's coroutine.
if i have a call stack looks like:
func_a => func_b => func_c => func_d
and func_d is an asynchronous function where I use yield and the @gen.coroutine decorator,
just like this:
@gen.coroutine
def redis_data(self, id):
    ret = yield asyn_function()
    raise gen.Return(ret)
Must I use yield and @gen.coroutine with func_c, func_b and func_a?
Yes, all your coroutine's callers must also be coroutines, and they must yield the result of your coroutine.
Why? No coroutine can do I/O without executing a yield statement. Look at your code: might it need to talk to the server? Then it must yield. So must its caller, and so on up the chain, so that ultimately you have yielded to the event loop. Otherwise the loop cannot make progress and the I/O does not complete.
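Concretely, a minimal sketch of the chain might look like this (the Handler class name is my own placeholder, func_a/func_b/func_c come from the question, and asyn_function is assumed to exist): every caller is itself a coroutine and yields the coroutine it calls, all the way down to redis_data().

from tornado import gen

class Handler(object):
    @gen.coroutine
    def redis_data(self, id):
        ret = yield asyn_function()
        raise gen.Return(ret)

    @gen.coroutine
    def func_c(self, id):
        ret = yield self.redis_data(id)
        raise gen.Return(ret)

    @gen.coroutine
    def func_b(self, id):
        ret = yield self.func_c(id)
        raise gen.Return(ret)

    @gen.coroutine
    def func_a(self, id):
        ret = yield self.func_b(id)
        raise gen.Return(ret)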
This is both a technical requirement of coroutine code, and an advantage of coroutines over threads. You always know by looking at your code when you can be interrupted:
https://glyph.twistedmatrix.com/2014/02/unyielding.html
For more on refactoring coroutines, see:
http://emptysqua.re/blog/refactoring-tornado-coroutines/
I made this setInterval, kind of like the one in JavaScript, only for Python, and it's a little different. The problem is that I can't seem to figure out how to cancel it: when I cancel it, it just makes a new timer and continues to run that one after I've canceled the original thread.
import threading

def setInterval(sec, func, *args, **kw):
    def inner():
        func(*args, **kw)
        setInterval(sec, func, *args, **kw)  # This is where it sends it again
    task = threading.Timer(sec, inner)
    task.daemon = True
    task.start()
    return task
As you can see, it works; however, I have no way of canceling the thread, because the original executes again and creates a new one before it can be canceled. How would I set this up so that if the thread is canceled it won't create any copies of itself? I've tried adding some keyword arguments, passing the timer back in, and canceling it if it is an instance of threading.Timer, but that doesn't seem to work, since it has already made a copy of the original before it could be canceled. Any ideas or suggestions? I'm trying to think of a way for it to know that it's a copy of the original and then not execute, but I'm not sure how I would actually do that. Is there anything I could do so it terminates/cancels the original thread and doesn't start a new copy before it has a chance to be canceled? Here is my attempt at doing it, in case you want to tell me what I'm doing wrong.
def setInterval(sec=None, func=None, *args, **kw):
    start = True if type(func) != threading.Timer else func.cancel()  # doesn't work anyways
    def inner():
        func(*args, **kw)
        setInterval(sec, func, *args, **kw)
    if start == True:
        task = threading.Timer(sec, inner)
        task.daemon = True
        task.start()
        return task

def clearInterval(task):
    setInterval(func=task)

myInterval = setInterval(10, print, "Hello, world!")
clearInterval(myInterval)  # cancels original, continues to make copies
Cancelling a Timer before it runs prevents it from calling its callback function. But if it's already woken up, it's too late to cancel it. And, since your code creates a new Timer inside the callback, that new Timer won't know it's supposed to be canceled.
So, you need to cooperate with the callback function, by giving it access to some flag that lets it know it's been canceled (e.g., with a mutable global or closure variable, function attribute, or default parameter value). And of course that variable needs to be synchronized, meaning you read it under a Lock, or you've got a race condition. (In CPython, thanks to the GIL, the worst-case scenario of that race should be occasionally running one extra time, but in a different implementation, it's possible that the timer thread (where the callback is running) could never see the updated value.)
However, this is going to get complicated. You're probably better off first extending the Timer class to a RepeatingTimer. Then, if you really want to you can wrap that in trivial setInterval/clearInterval functions.
There are plenty of recipes on ActiveState and packages on PyPI that add a repeat flag or equivalent, and also do other nice things like use a single thread instead of creating a new thread for every interval. But if you want to know how to do this yourself, it's pretty easy. In fact, the threading docs have a link to the threading.py source because it's meant to be useful as example code, and you can see how trivial Timer is, so you could even re-implement it yourself.
But let's do it with subclassing instead.
class RepeatableTimer(threading.Timer):
    def __init__(self, interval, function, args=None, kwargs=None, repeat=False):
        super(RepeatableTimer, self).__init__(interval, function, args, kwargs)
        self.repeat = repeat
        self.lock = threading.Lock()

    def cancel(self):
        with self.lock:
            self.repeat = False
        super(RepeatableTimer, self).cancel()

    def run(self):
        while True:
            self.finished.clear()
            super(RepeatableTimer, self).run()
            with self.lock:
                if not self.repeat:
                    break
I think it would actually be simpler to implement it from scratch, because then you don't have to worry about resetting the Event, but anyway, this works.
If you want to, e.g., extend this so instead of a repeat flag there's a times integer (which is -1 for "repeat forever") as in JavaScript, that's trivial.
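For reference, here's a sketch of that variant (my own, written as its own Timer subclass rather than on top of RepeatableTimer): times=-1 means "repeat forever", and any positive number stops after that many runs.

class CountingTimer(threading.Timer):
    def __init__(self, interval, function, args=None, kwargs=None, times=-1):
        super(CountingTimer, self).__init__(interval, function, args, kwargs)
        self.times = times
        self.lock = threading.Lock()

    def cancel(self):
        with self.lock:
            self.times = 0
        super(CountingTimer, self).cancel()

    def run(self):
        while True:
            self.finished.clear()
            super(CountingTimer, self).run()   # one Timer cycle: wait, then call the function
            with self.lock:
                if self.times > 0:
                    self.times -= 1
                if self.times == 0:
                    break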
Anyway, now you can wrap this in setInterval and clearInterval functions:
def setInterval(sec=None, func=None, *args, **kw):
    task = RepeatableTimer(sec, func, args, kw, repeat=True)
    task.daemon = True
    task.start()
    return task

def clearInterval(task):
    task.cancel()
Although note that clearInterval = RepeatableTimer.cancel would work just as well. (In Python 2.x, this would be an unbound method vs. a function, but it would still work the same, other than giving different error messages if you called it with the wrong args. In 3.x, there is no difference at all.)
If you really want to do the whole mess with making clearInterval call setInterval, you can, but let's at least clean it up a bit—use isinstance instead of type, and don't try to set a flag that you use in a second if when you can just do it all in a single if:
def setInterval(sec=None, func=None, *args, **kw):
    if isinstance(func, RepeatableTimer):
        func.cancel()
    else:
        task = RepeatableTimer(sec, func, args, kw, repeat=True)
        task.daemon = True
        task.start()
        return task

def clearInterval(task):
    setInterval(func=task)
def clearInterval(task):
setInterval(func=task)
But I don't see what you think that's buying you.
Anyway, here's a test to verify that it works:
import time

myInterval = setInterval(1, print, "Hello, world!")
time.sleep(3)
clearInterval(myInterval)
time.sleep(5)
This should usually print "Hello, world!" 2 or 3 times, occasionally 4, but never 7 or 8 like an uncancellable timer would.