Making a Python multiprocessing Pool awaitable - python

I have an application that needs to perform some processor-intensive work based on a websocket stream. I want to parallelize the CPU-intensive bits with multiprocessing, but I still need the async interface to handle the streaming parts of the application. To solve this problem I was hoping to make an awaitable version of multiprocessing.AsyncResult (the result of a multiprocessing.pool.Pool.submit_async action). However, I've run into some strange behavior.
My new awaitable pool result (which is a subclass of asyncio.Future) works fine as long as the pool result comes back before I start awaiting it. However, if I try to await the pool result before it has come back, then the program appears to stall indefinitely on the await statement.
I've checked the async iterator results with next(future.async()) and the iterator returns the future instance itself before the pool processing completes and raises a StopIterationError after, as I'd expect.
Code is below.
import multiprocessing
import multiprocessing.pool
import asyncio
import time
class Awaitable_Multiprocessing_Pool(multiprocessing.pool.Pool):
def __init__(self, *args, **kwargs):
multiprocessing.pool.Pool.__init__(self, *args, **kwargs)
def apply_awaitably(self, func, args = list(), kwargs = dict()):
return Awaitable_Multiprocessing_Pool_Result(
self,
func,
args,
kwargs)
class Awaitable_Multiprocessing_Pool_Result(asyncio.Future):
def __init__(self, pool, func, args = list(), kwargs = dict()):
asyncio.Future.__init__(self)
self.pool_result = pool.apply_async(
func,
args,
kwargs,
self.set_result,
self.set_exception)
def result(self):
return self.pool_result.get()
def done(self):
return self.pool_result.ready()
def dummy_processing_fun():
import time
print('start processing')
time.sleep(4)
print('finished processing')
return 'result'
if __name__ == '__main__':
async def main():
ah = Async_Handler(1)
pool = Awaitable_Multiprocessing_Pool(2)
while True:
future = pool.apply_awaitably(dummy_processing_fun, [])
# print(next(future.__await__())) # would show same as print(future)
# print(await future) # would stall indefinitely because pool result isn't in
time.sleep(10) # NOTE: you may have to make this longer to account for pool startup time on the first iteration
# print(next(future.__await__())) # would raise StopIteration
print(await future) # prints 'result'
asyncio.run(main())
Am I missing something obvious here? I think I have all the essential elements of an awaitable working correctly in part because of the fact that I can await successfully in some circumstances. Anyone have any insight?

I am not sure why you make it so complex... How about the following code?
from concurrent.futures import ProcessPoolExecutor
import asyncio
import time
def dummy_processing_fun():
import time
print('start processing')
time.sleep(4)
print('finished processing')
return 'result'
if __name__ == '__main__':
async def main():
pool = ProcessPoolExecutor(2)
while True:
future = pool.submit(dummy_processing_fun)
future = asyncio.wrap_future(future)
# print(next(future.__await__())) # would show same as print(future)
# print(await future) # would stall indefinitely because pool result isn't in
# time.sleep(5)
# print(next(future.__await__())) # would raise StopIteration
print(await future) # prints 'result'
asyncio.run(main())

Related

How to wrap a stuck function in a timer block? [duplicate]

There is a socket related function call in my code, that function is from another module thus out of my control, the problem is that it blocks for hours occasionally, which is totally unacceptable, How can I limit the function execution time from my code? I guess the solution must utilize another thread.
An improvement on #rik.the.vik's answer would be to use the with statement to give the timeout function some syntactic sugar:
import signal
from contextlib import contextmanager
class TimeoutException(Exception): pass
#contextmanager
def time_limit(seconds):
def signal_handler(signum, frame):
raise TimeoutException("Timed out!")
signal.signal(signal.SIGALRM, signal_handler)
signal.alarm(seconds)
try:
yield
finally:
signal.alarm(0)
try:
with time_limit(10):
long_function_call()
except TimeoutException as e:
print("Timed out!")
I'm not sure how cross-platform this might be, but using signals and alarm might be a good way of looking at this. With a little work you could make this completely generic as well and usable in any situation.
http://docs.python.org/library/signal.html
So your code is going to look something like this.
import signal
def signal_handler(signum, frame):
raise Exception("Timed out!")
signal.signal(signal.SIGALRM, signal_handler)
signal.alarm(10) # Ten seconds
try:
long_function_call()
except Exception, msg:
print "Timed out!"
Here's a Linux/OSX way to limit a function's running time. This is in case you don't want to use threads, and want your program to wait until the function ends, or the time limit expires.
from multiprocessing import Process
from time import sleep
def f(time):
sleep(time)
def run_with_limited_time(func, args, kwargs, time):
"""Runs a function with time limit
:param func: The function to run
:param args: The functions args, given as tuple
:param kwargs: The functions keywords, given as dict
:param time: The time limit in seconds
:return: True if the function ended successfully. False if it was terminated.
"""
p = Process(target=func, args=args, kwargs=kwargs)
p.start()
p.join(time)
if p.is_alive():
p.terminate()
return False
return True
if __name__ == '__main__':
print run_with_limited_time(f, (1.5, ), {}, 2.5) # True
print run_with_limited_time(f, (3.5, ), {}, 2.5) # False
I prefer a context manager approach because it allows the execution of multiple python statements within a with time_limit statement. Because windows system does not have SIGALARM, a more portable and perhaps more straightforward method could be using a Timer
from contextlib import contextmanager
import threading
import _thread
class TimeoutException(Exception):
def __init__(self, msg=''):
self.msg = msg
#contextmanager
def time_limit(seconds, msg=''):
timer = threading.Timer(seconds, lambda: _thread.interrupt_main())
timer.start()
try:
yield
except KeyboardInterrupt:
raise TimeoutException("Timed out for operation {}".format(msg))
finally:
# if the action ends in specified time, timer is canceled
timer.cancel()
import time
# ends after 5 seconds
with time_limit(5, 'sleep'):
for i in range(10):
time.sleep(1)
# this will actually end after 10 seconds
with time_limit(5, 'sleep'):
time.sleep(10)
The key technique here is the use of _thread.interrupt_main to interrupt the main thread from the timer thread. One caveat is that the main thread does not always respond to the KeyboardInterrupt raised by the Timer quickly. For example, time.sleep() calls a system function so a KeyboardInterrupt will be handled after the sleep call.
Here: a simple way of getting the desired effect:
https://pypi.org/project/func-timeout
This saved my life.
And now an example on how it works: lets say you have a huge list of items to be processed and you are iterating your function over those items. However, for some strange reason, your function get stuck on item n, without raising an exception. You need to other items to be processed, the more the better. In this case, you can set a timeout for processing each item:
import time
import func_timeout
def my_function(n):
"""Sleep for n seconds and return n squared."""
print(f'Processing {n}')
time.sleep(n)
return n**2
def main_controller(max_wait_time, all_data):
"""
Feed my_function with a list of itens to process (all_data).
However, if max_wait_time is exceeded, return the item and a fail info.
"""
res = []
for data in all_data:
try:
my_square = func_timeout.func_timeout(
max_wait_time, my_function, args=[data]
)
res.append((my_square, 'processed'))
except func_timeout.FunctionTimedOut:
print('error')
res.append((data, 'fail'))
continue
return res
timeout_time = 2.1 # my time limit
all_data = range(1, 10) # the data to be processed
res = main_controller(timeout_time, all_data)
print(res)
Doing this from within a signal handler is dangerous: you might be inside an exception handler at the time the exception is raised, and leave things in a broken state. For example,
def function_with_enforced_timeout():
f = open_temporary_file()
try:
...
finally:
here()
unlink(f.filename)
If your exception is raised here(), the temporary file will never be deleted.
The solution here is for asynchronous exceptions to be postponed until the code is not inside exception-handling code (an except or finally block), but Python doesn't do that.
Note that this won't interrupt anything while executing native code; it'll only interrupt it when the function returns, so this may not help this particular case. (SIGALRM itself might interrupt the call that's blocking--but socket code typically simply retries after an EINTR.)
Doing this with threads is a better idea, since it's more portable than signals. Since you're starting a worker thread and blocking until it finishes, there are none of the usual concurrency worries. Unfortunately, there's no way to deliver an exception asynchronously to another thread in Python (other thread APIs can do this). It'll also have the same issue with sending an exception during an exception handler, and require the same fix.
You don't have to use threads. You can use another process to do the blocking work, for instance, maybe using the subprocess module. If you want to share data structures between different parts of your program then Twisted is a great library for giving yourself control of this, and I'd recommend it if you care about blocking and expect to have this trouble a lot. The bad news with Twisted is you have to rewrite your code to avoid any blocking, and there is a fair learning curve.
You can use threads to avoid blocking, but I'd regard this as a last resort, since it exposes you to a whole world of pain. Read a good book on concurrency before even thinking about using threads in production, e.g. Jean Bacon's "Concurrent Systems". I work with a bunch of people who do really cool high performance stuff with threads, and we don't introduce threads into projects unless we really need them.
The only "safe" way to do this, in any language, is to use a secondary process to do that timeout-thing, otherwise you need to build your code in such a way that it will time out safely by itself, for instance by checking the time elapsed in a loop or similar. If changing the method isn't an option, a thread will not suffice.
Why? Because you're risking leaving things in a bad state when you do. If the thread is simply killed mid-method, locks being held, etc. will just be held, and cannot be released.
So look at the process way, do not look at the thread way.
I would usually prefer using a contextmanager as suggested by #josh-lee
But in case someone is interested in having this implemented as a decorator, here's an alternative.
Here's how it would look like:
import time
from timeout import timeout
class Test(object):
#timeout(2)
def test_a(self, foo, bar):
print foo
time.sleep(1)
print bar
return 'A Done'
#timeout(2)
def test_b(self, foo, bar):
print foo
time.sleep(3)
print bar
return 'B Done'
t = Test()
print t.test_a('python', 'rocks')
print t.test_b('timing', 'out')
And this is the timeout.py module:
import threading
class TimeoutError(Exception):
pass
class InterruptableThread(threading.Thread):
def __init__(self, func, *args, **kwargs):
threading.Thread.__init__(self)
self._func = func
self._args = args
self._kwargs = kwargs
self._result = None
def run(self):
self._result = self._func(*self._args, **self._kwargs)
#property
def result(self):
return self._result
class timeout(object):
def __init__(self, sec):
self._sec = sec
def __call__(self, f):
def wrapped_f(*args, **kwargs):
it = InterruptableThread(f, *args, **kwargs)
it.start()
it.join(self._sec)
if not it.is_alive():
return it.result
raise TimeoutError('execution expired')
return wrapped_f
The output:
python
rocks
A Done
timing
Traceback (most recent call last):
...
timeout.TimeoutError: execution expired
out
Notice that even if the TimeoutError is thrown, the decorated method will continue to run in a different thread. If you would also want this thread to be "stopped" see: Is there any way to kill a Thread in Python?
Using simple decorator
Here's the version I made after studying above answers. Pretty straight forward.
def function_timeout(seconds: int):
"""Wrapper of Decorator to pass arguments"""
def decorator(func):
#contextmanager
def time_limit(seconds_):
def signal_handler(signum, frame): # noqa
raise TimeoutException(f"Timed out in {seconds_} seconds!")
signal.signal(signal.SIGALRM, signal_handler)
signal.alarm(seconds_)
try:
yield
finally:
signal.alarm(0)
#wraps(func)
def wrapper(*args, **kwargs):
with time_limit(seconds):
return func(*args, **kwargs)
return wrapper
return decorator
How to use?
#function_timeout(seconds=5)
def my_naughty_function():
while True:
print("Try to stop me ;-p")
Well of course, don't forget to import the function if it is in a separate file.
Here's a timeout function I think I found via google and it works for me.
From:
http://code.activestate.com/recipes/473878/
def timeout(func, args=(), kwargs={}, timeout_duration=1, default=None):
'''This function will spwan a thread and run the given function using the args, kwargs and
return the given default value if the timeout_duration is exceeded
'''
import threading
class InterruptableThread(threading.Thread):
def __init__(self):
threading.Thread.__init__(self)
self.result = default
def run(self):
try:
self.result = func(*args, **kwargs)
except:
self.result = default
it = InterruptableThread()
it.start()
it.join(timeout_duration)
if it.isAlive():
return it.result
else:
return it.result
The method from #user2283347 is tested working, but we want to get rid of the traceback messages. Use pass trick from Remove traceback in Python on Ctrl-C, the modified code is:
from contextlib import contextmanager
import threading
import _thread
class TimeoutException(Exception): pass
#contextmanager
def time_limit(seconds):
timer = threading.Timer(seconds, lambda: _thread.interrupt_main())
timer.start()
try:
yield
except KeyboardInterrupt:
pass
finally:
# if the action ends in specified time, timer is canceled
timer.cancel()
def timeout_svm_score(i):
#from sklearn import svm
#import numpy as np
#from IPython.core.display import display
#%store -r names X Y
clf = svm.SVC(kernel='linear', C=1).fit(np.nan_to_num(X[[names[i]]]), Y)
score = clf.score(np.nan_to_num(X[[names[i]]]),Y)
#scoressvm.append((score, names[i]))
display((score, names[i]))
%%time
with time_limit(5):
i=0
timeout_svm_score(i)
#Wall time: 14.2 s
%%time
with time_limit(20):
i=0
timeout_svm_score(i)
#(0.04541284403669725, '计划飞行时间')
#Wall time: 16.1 s
%%time
with time_limit(5):
i=14
timeout_svm_score(i)
#Wall time: 5h 43min 41s
We can see that this method may need far long time to interrupt the calculation, we asked for 5 seconds, but it work out in 5 hours.
This code works for Windows Server Datacenter 2016 with python 3.7.3 and I didn't tested on Unix, after mixing some answers from Google and StackOverflow, it finally worked for me like this:
from multiprocessing import Process, Lock
import time
import os
def f(lock,id,sleepTime):
lock.acquire()
print("I'm P"+str(id)+" Process ID: "+str(os.getpid()))
lock.release()
time.sleep(sleepTime) #sleeps for some time
print("Process: "+str(id)+" took this much time:"+str(sleepTime))
time.sleep(sleepTime)
print("Process: "+str(id)+" took this much time:"+str(sleepTime*2))
if __name__ == '__main__':
timeout_function=float(9) # 9 seconds for max function time
print("Main Process ID: "+str(os.getpid()))
lock=Lock()
p1=Process(target=f, args=(lock,1,6,)) #Here you can change from 6 to 3 for instance, so you can watch the behavior
start=time.time()
print(type(start))
p1.start()
if p1.is_alive():
print("process running a")
else:
print("process not running a")
while p1.is_alive():
timeout=time.time()
if timeout-start > timeout_function:
p1.terminate()
print("process terminated")
print("watching, time passed: "+str(timeout-start) )
time.sleep(1)
if p1.is_alive():
print("process running b")
else:
print("process not running b")
p1.join()
if p1.is_alive():
print("process running c")
else:
print("process not running c")
end=time.time()
print("I am the main process, the two processes are done")
print("Time taken:- "+str(end-start)+" secs") #MainProcess terminates at approx ~ 5 secs.
time.sleep(5) # To see if on Task Manager the child process is really being terminated, and it is
print("finishing")
The main code is from this link:
Create two child process using python(windows)
Then I used .terminate() to kill the child process. You can see that the function f calls 2 prints, one after 5 seconds and another after 10 seconds. However, with a 7 seconds sleep and the terminate(), it does not show the last print.
It worked for me, hope it helps!

Python ThreadPoolExecutor terminate all threads

I am running a piece of python code in which multiple threads are run through threadpool executor. Each thread is supposed to perform a task (fetch a webpage for example). What I want to be able to do is to terminate all threads, even if one of the threads fail. For instance:
with ThreadPoolExecutor(self._num_threads) as executor:
jobs = []
for path in paths:
kw = {"path": path}
jobs.append(executor.submit(start,**kw))
for job in futures.as_completed(jobs):
result = job.result()
print(result)
def start(*args,**kwargs):
#fetch the page
if(success):
return True
else:
#Signal all threads to stop
Is it possible to do so? The results returned by threads are useless to me unless all of them are successful, so if even one of them fails, I would like to save some execution time of the rest of the threads and terminate them immediately. The actual code obviously is doing relatively lengthy tasks with a couple of failure points.
If you are done with threads and want to look into processes, then this peace of code here looks very promising and simple, almost the same syntax as thread, but with the multiprocessing module.
When the timeout flag expires the process is terminated, very convenient.
import multiprocessing
def get_page(*args, **kwargs):
# your web page downloading code goes here
def start_get_page(timeout, *args, **kwargs):
p = multiprocessing.Process(target=get_page, args=args, kwargs=kwargs)
p.start()
p.join(timeout)
if p.is_alive():
# stop the downloading 'thread'
p.terminate()
# and then do any post-error processing here
if __name__ == "__main__":
start_get_page(timeout, *args, **kwargs)
I have created an answer for a similar question I had, which I think will work for this question.
Terminate executor using ThreadPoolExecutor from concurrent.futures module
from concurrent.futures import ThreadPoolExecutor, as_completed
from time import sleep
NUM_REQUESTS = 100
def long_request(id):
sleep(1)
# Simulate bad response
if id == 10:
return {"data": {"valid": False}}
else:
return {"data": {"valid": True}}
def check_results(results):
valid = True
for result in results:
valid = result["data"]["valid"]
return valid
def main():
futures = []
responses = []
num_requests = 0
with ThreadPoolExecutor(max_workers=10) as executor:
for request_index in range(NUM_REQUESTS):
future = executor.submit(long_request, request_index)
# Future list
futures.append(future)
for future in as_completed(futures):
is_responses_valid = check_results(responses)
# Cancel all future requests if one invalid
if not is_responses_valid:
executor.shutdown(wait=False)
else:
# Append valid responses
num_requests += 1
responses.append(future.result())
return num_requests
if __name__ == "__main__":
requests = main()
print("Num Requests: ", requests)
In my code I used multiprocessing
import multiprocessing as mp
pool = mp.Pool()
for i in range(threadNumber):
pool.apply_async(publishMessage, args=(map_metrics, connection_parameters...,))
pool.close()
pool.terminate()
This is how I would do it:
import concurrent.futures
def start(*args,**kwargs):
#fetch the page
if(success):
return True
else:
return False
with concurrent.futures.ProcessPoolExecutor() as executor:
results = [executor.submit(start, {"path": path}) for path in paths]
concurrent.futures.wait(results, timeout=10, return_when=concurrent.futures.FIRST_COMPLETED)
for f in concurrent.futures.as_completed(results):
f_success = f.result()
if not f_success:
executor.shutdown(wait=False, cancel_futures=True) # shutdown if one fails
else:
#do stuff here
If any result is not True, everything will be shut down immediately.
You can try to use StoppableThread from func-timeout.
But terminating threads is strongly discouraged. And if you need to kill a thread, you probably have a design problem. Look at alternatives: asyncio coroutines and multiprocessing with legal cancel/terminating functionality.

Start asyncio loop from running loop

I need to create a Runner class that can asynchronously runs multiple jobs and return the results one all jobs are completed.
Here is the dummy code snippet which I already have
import asyncio
class Runner:
def __init__(self):
self.jobs_loop = asyncio.new_event_loop()
async def slow_process(self, *args):
... # run slow proces with args
await asyncio.sleep(5)
return "result"
def batch_run(self, count, *args):
results = self.jobs_loop.run_until_complete(asyncio.gather(
*[self.slow_process(*args) for _ in range(count)],
loop=self.jobs_loop,
return_exceptions=True
))
return results
runner = Runner()
print(runner.batch_run(10))
When the batch_run is called, it schedules count jobs and blocks until finished.
This, however, doesn't work when called from a running event loop raising
RuntimeError: Cannot run the event loop while another loop is running
How can I make batch_run work when called from existing event loop (e.g. Jupyter Console) as well as being callable from outside of the event loop.
EDIT
Ideally the function would look like this
def batch_run(self, count, *args):
loop = asyncio.get_event_loop()
jobs = asyncio.gather(...)
if not loop.is_running():
return loop.run_until_complete(jobs)
else:
future = asyncio.ensure_future(jobs, loop=loop)
# loop.wait_until_completed(future)
return future.result()
if only loop.wait_until_completed existed.

How to best perform Multiprocessing within requests with the python Tornado server?

I am using the I/O non-blocking python server Tornado. I have a class of GET requests which may take a significant amount of time to complete (think in the range of 5-10 seconds). The problem is that Tornado blocks on these requests so that subsequent fast requests are held up until the slow request completes.
I looked at: https://github.com/facebook/tornado/wiki/Threading-and-concurrency and came to the conclusion that I wanted some combination of #3 (other processes) and #4 (other threads). #4 on its own had issues and I was unable to get reliable control back to the ioloop when there was another thread doing the "heavy_lifting". (I assume that this was due to the GIL and the fact that the heavy_lifting task has high CPU load and keeps pulling control away from the main ioloop, but thats a guess).
So I have been prototyping how to solve this by doing "heavy lifting" tasks within these slow GET requests in a separate process and then place a callback back into the Tornado ioloop when the process is done to finish the request. This frees up the ioloop to handle other requests.
I have created a simple example demonstrating a possible solution, but am curious to get feedback from the community on it.
My question is two-fold: How can this current approach be simplified? What pitfalls potentially exist with it?
The Approach
Utilize Tornado's builtin asynchronous decorator which allows a request to stay open and for the ioloop to continue.
Spawn a separate process for "heavy lifting" tasks using python's multiprocessing module. I first attempted to use the threading module but was unable to get any reliable relinquishing of control back to the ioloop. It also appears that mutliprocessing would also take advantage of multicores.
Start a 'watcher' thread in the main ioloop process using the threading module who's job it is to watch a multiprocessing.Queue for the results of the "heavy lifting" task when it completes. This was needed because I needed a way to know that the heavy_lifting task had completed while being able to still notify the ioloop that this request was now finished.
Be sure that the 'watcher' thread relinquishes control to the main ioloop loop often with time.sleep(0) calls so that other requests continue to get readily processed.
When there is a result in the queue then add a callback from the "watcher" thread using tornado.ioloop.IOLoop.instance().add_callback() which is documented to be the only safe way to call ioloop instances from other threads.
Be sure to then call finish() in the callback to complete the request and hand over a reply.
Below is some sample code showing this approach. multi_tornado.py is the server implementing the above outline and call_multi.py is a sample script that calls the server in two different ways to test the server. Both tests call the server with 3 slow GET requests followed by 20 fast GET requests. The results are shown for both running with and without the threading turned on.
In the case of running it with "no threading" the 3 slow requests block (each taking a little over a second to complete). A few of the 20 fast requests squeeze through in between some of the slow requests within the ioloop (not totally sure how that occurs - but could be an artifact that I am running both the server and client test script on the same machine). The point here being that all of the fast requests are held up to varying degrees.
In the case of running it with threading enabled the 20 fast requests all complete first immediately and the three slow requests complete at about the same time afterwards as they have each been running in parallel. This is the desired behavior. The three slow requests take 2.5 seconds to complete in parallel - whereas in the non threaded case the three slow requests take about 3.5 seconds in total. So there is about 35% speed up overall (I assume due to multicore sharing). But more importantly - the fast requests were immediately handled in leu of the slow ones.
I do not have a lot experience with multithreaded programming - so while this seemingly works here I am curious to learn:
Is there a simpler way to accomplish this? What monster's may lurk within this approach?
(Note: A future tradeoff may be to just run more instances of Tornado with a reverse proxy like nginx doing load balancing. No matter what I will be running multiple instances with a load balancer - but I am concerned about just throwing hardware at this problem since it seems that the hardware is so directly coupled to the problem in terms of the blocking.)
Sample Code
multi_tornado.py (sample server):
import time
import threading
import multiprocessing
import math
from tornado.web import RequestHandler, Application, asynchronous
from tornado.ioloop import IOLoop
# run in some other process - put result in q
def heavy_lifting(q):
t0 = time.time()
for k in range(2000):
math.factorial(k)
t = time.time()
q.put(t - t0) # report time to compute in queue
class FastHandler(RequestHandler):
def get(self):
res = 'fast result ' + self.get_argument('id')
print res
self.write(res)
self.flush()
class MultiThreadedHandler(RequestHandler):
# Note: This handler can be called with threaded = True or False
def initialize(self, threaded=True):
self._threaded = threaded
self._q = multiprocessing.Queue()
def start_process(self, worker, callback):
# method to start process and watcher thread
self._callback = callback
if self._threaded:
# launch process
multiprocessing.Process(target=worker, args=(self._q,)).start()
# start watching for process to finish
threading.Thread(target=self._watcher).start()
else:
# threaded = False just call directly and block
worker(self._q)
self._watcher()
def _watcher(self):
# watches the queue for process result
while self._q.empty():
time.sleep(0) # relinquish control if not ready
# put callback back into the ioloop so we can finish request
response = self._q.get(False)
IOLoop.instance().add_callback(lambda: self._callback(response))
class SlowHandler(MultiThreadedHandler):
#asynchronous
def get(self):
# start a thread to watch for
self.start_process(heavy_lifting, self._on_response)
def _on_response(self, delta):
_id = self.get_argument('id')
res = 'slow result {} <--- {:0.3f} s'.format(_id, delta)
print res
self.write(res)
self.flush()
self.finish() # be sure to finish request
application = Application([
(r"/fast", FastHandler),
(r"/slow", SlowHandler, dict(threaded=False)),
(r"/slow_threaded", SlowHandler, dict(threaded=True)),
])
if __name__ == "__main__":
application.listen(8888)
IOLoop.instance().start()
call_multi.py (client tester):
import sys
from tornado.ioloop import IOLoop
from tornado import httpclient
def run(slow):
def show_response(res):
print res.body
# make 3 "slow" requests on server
requests = []
for k in xrange(3):
uri = 'http://localhost:8888/{}?id={}'
requests.append(uri.format(slow, str(k + 1)))
# followed by 20 "fast" requests
for k in xrange(20):
uri = 'http://localhost:8888/fast?id={}'
requests.append(uri.format(k + 1))
# show results as they return
http_client = httpclient.AsyncHTTPClient()
print 'Scheduling Get Requests:'
print '------------------------'
for req in requests:
print req
http_client.fetch(req, show_response)
# execute requests on server
print '\nStart sending requests....'
IOLoop.instance().start()
if __name__ == '__main__':
scenario = sys.argv[1]
if scenario == 'slow' or scenario == 'slow_threaded':
run(scenario)
Test Results
By running python call_multi.py slow (the blocking behavior):
Scheduling Get Requests:
------------------------
http://localhost:8888/slow?id=1
http://localhost:8888/slow?id=2
http://localhost:8888/slow?id=3
http://localhost:8888/fast?id=1
http://localhost:8888/fast?id=2
http://localhost:8888/fast?id=3
http://localhost:8888/fast?id=4
http://localhost:8888/fast?id=5
http://localhost:8888/fast?id=6
http://localhost:8888/fast?id=7
http://localhost:8888/fast?id=8
http://localhost:8888/fast?id=9
http://localhost:8888/fast?id=10
http://localhost:8888/fast?id=11
http://localhost:8888/fast?id=12
http://localhost:8888/fast?id=13
http://localhost:8888/fast?id=14
http://localhost:8888/fast?id=15
http://localhost:8888/fast?id=16
http://localhost:8888/fast?id=17
http://localhost:8888/fast?id=18
http://localhost:8888/fast?id=19
http://localhost:8888/fast?id=20
Start sending requests....
slow result 1 <--- 1.338 s
fast result 1
fast result 2
fast result 3
fast result 4
fast result 5
fast result 6
fast result 7
slow result 2 <--- 1.169 s
slow result 3 <--- 1.130 s
fast result 8
fast result 9
fast result 10
fast result 11
fast result 13
fast result 12
fast result 14
fast result 15
fast result 16
fast result 18
fast result 17
fast result 19
fast result 20
By running python call_multi.py slow_threaded (the desired behavior):
Scheduling Get Requests:
------------------------
http://localhost:8888/slow_threaded?id=1
http://localhost:8888/slow_threaded?id=2
http://localhost:8888/slow_threaded?id=3
http://localhost:8888/fast?id=1
http://localhost:8888/fast?id=2
http://localhost:8888/fast?id=3
http://localhost:8888/fast?id=4
http://localhost:8888/fast?id=5
http://localhost:8888/fast?id=6
http://localhost:8888/fast?id=7
http://localhost:8888/fast?id=8
http://localhost:8888/fast?id=9
http://localhost:8888/fast?id=10
http://localhost:8888/fast?id=11
http://localhost:8888/fast?id=12
http://localhost:8888/fast?id=13
http://localhost:8888/fast?id=14
http://localhost:8888/fast?id=15
http://localhost:8888/fast?id=16
http://localhost:8888/fast?id=17
http://localhost:8888/fast?id=18
http://localhost:8888/fast?id=19
http://localhost:8888/fast?id=20
Start sending requests....
fast result 1
fast result 2
fast result 3
fast result 4
fast result 5
fast result 6
fast result 7
fast result 8
fast result 9
fast result 10
fast result 11
fast result 12
fast result 13
fast result 14
fast result 15
fast result 19
fast result 20
fast result 17
fast result 16
fast result 18
slow result 2 <--- 2.485 s
slow result 3 <--- 2.491 s
slow result 1 <--- 2.517 s
If you're willing to use concurrent.futures.ProcessPoolExecutor instead of multiprocessing, this is actually very simple. Tornado's ioloop already supports concurrent.futures.Future, so they'll play nicely together out of the box. concurrent.futures is included in Python 3.2+, and has been backported to Python 2.x.
Here's an example:
import time
from concurrent.futures import ProcessPoolExecutor
from tornado.ioloop import IOLoop
from tornado import gen
def f(a, b, c, blah=None):
print "got %s %s %s and %s" % (a, b, c, blah)
time.sleep(5)
return "hey there"
#gen.coroutine
def test_it():
pool = ProcessPoolExecutor(max_workers=1)
fut = pool.submit(f, 1, 2, 3, blah="ok") # This returns a concurrent.futures.Future
print("running it asynchronously")
ret = yield fut
print("it returned %s" % ret)
pool.shutdown()
IOLoop.instance().run_sync(test_it)
Output:
running it asynchronously
got 1 2 3 and ok
it returned hey there
ProcessPoolExecutor has a more limited API than multiprocessing.Pool, but if you don't need the more advanced features of multiprocessing.Pool, it's worth using because the integration is so much simpler.
multiprocessing.Pool can be integrated into the tornado I/O loop, but it's a bit messy. A much cleaner integration can be done using concurrent.futures (see my other answer for details), but if you're stuck on Python 2.x and can't install the concurrent.futures backport, here is how you can do it strictly using multiprocessing:
The multiprocessing.Pool.apply_async and multiprocessing.Pool.map_async methods both have an optional callback parameter, which means that both can potentially be plugged into a tornado.gen.Task. So in most cases, running code asynchronously in a sub-process is as simple as this:
import multiprocessing
import contextlib
from tornado import gen
from tornado.gen import Return
from tornado.ioloop import IOLoop
from functools import partial
def worker():
print "async work here"
#gen.coroutine
def async_run(func, *args, **kwargs):
result = yield gen.Task(pool.apply_async, func, args, kwargs)
raise Return(result)
if __name__ == "__main__":
pool = multiprocessing.Pool(multiprocessing.cpu_count())
func = partial(async_run, worker)
IOLoop().run_sync(func)
As I mentioned, this works well in most cases. But if worker() throws an exception, callback is never called, which means the gen.Task never finishes, and you hang forever. Now, if you know that your work will never throw an exception (because you wrapped the whole thing in a try/except, for example), you can happily use this approach. However, if you want to let exceptions escape from your worker, the only solution I found was to subclass some multiprocessing components, and make them call callback even if the worker sub-process raised an exception:
from multiprocessing.pool import ApplyResult, Pool, RUN
import multiprocessing
class TornadoApplyResult(ApplyResult):
def _set(self, i, obj):
self._success, self._value = obj
if self._callback:
self._callback(self._value)
self._cond.acquire()
try:
self._ready = True
self._cond.notify()
finally:
self._cond.release()
del self._cache[self._job]
class TornadoPool(Pool):
def apply_async(self, func, args=(), kwds={}, callback=None):
''' Asynchronous equivalent of `apply()` builtin
This version will call `callback` even if an exception is
raised by `func`.
'''
assert self._state == RUN
result = TornadoApplyResult(self._cache, callback)
self._taskqueue.put(([(result._job, None, func, args, kwds)], None))
return result
...
if __name__ == "__main__":
pool = TornadoPool(multiprocessing.cpu_count())
...
With these changes, the exception object will be returned by the gen.Task, rather than the gen.Task hanging indefinitely. I also updated my async_run method to re-raise the exception when its returned, and made some other changes to provide better tracebacks for exceptions thrown in the worker sub-processes. Here's the full code:
import multiprocessing
from multiprocessing.pool import Pool, ApplyResult, RUN
from functools import wraps
import tornado.web
from tornado.ioloop import IOLoop
from tornado.gen import Return
from tornado import gen
class WrapException(Exception):
def __init__(self):
exc_type, exc_value, exc_tb = sys.exc_info()
self.exception = exc_value
self.formatted = ''.join(traceback.format_exception(exc_type, exc_value, exc_tb))
def __str__(self):
return '\n%s\nOriginal traceback:\n%s' % (Exception.__str__(self), self.formatted)
class TornadoApplyResult(ApplyResult):
def _set(self, i, obj):
self._success, self._value = obj
if self._callback:
self._callback(self._value)
self._cond.acquire()
try:
self._ready = True
self._cond.notify()
finally:
self._cond.release()
del self._cache[self._job]
class TornadoPool(Pool):
def apply_async(self, func, args=(), kwds={}, callback=None):
''' Asynchronous equivalent of `apply()` builtin
This version will call `callback` even if an exception is
raised by `func`.
'''
assert self._state == RUN
result = TornadoApplyResult(self._cache, callback)
self._taskqueue.put(([(result._job, None, func, args, kwds)], None))
return result
#gen.coroutine
def async_run(func, *args, **kwargs):
""" Runs the given function in a subprocess.
This wraps the given function in a gen.Task and runs it
in a multiprocessing.Pool. It is meant to be used as a
Tornado co-routine. Note that if func returns an Exception
(or an Exception sub-class), this function will raise the
Exception, rather than return it.
"""
result = yield gen.Task(pool.apply_async, func, args, kwargs)
if isinstance(result, Exception):
raise result
raise Return(result)
def handle_exceptions(func):
""" Raise a WrapException so we get a more meaningful traceback"""
#wraps(func)
def inner(*args, **kwargs):
try:
return func(*args, **kwargs)
except Exception:
raise WrapException()
return inner
# Test worker functions
#handle_exceptions
def test2(x):
raise Exception("eeee")
#handle_exceptions
def test(x):
print x
time.sleep(2)
return "done"
class TestHandler(tornado.web.RequestHandler):
#gen.coroutine
def get(self):
try:
result = yield async_run(test, "inside get")
self.write("%s\n" % result)
result = yield async_run(test2, "hi2")
except Exception as e:
print("caught exception in get")
self.write("Caught an exception: %s" % e)
finally:
self.finish()
app = tornado.web.Application([
(r"/test", TestHandler),
])
if __name__ == "__main__":
pool = TornadoPool(4)
app.listen(8888)
IOLoop.instance().start()
Here's how it behaves for the client:
dan#dan:~$ curl localhost:8888/test
done
Caught an exception:
Original traceback:
Traceback (most recent call last):
File "./mutli.py", line 123, in inner
return func(*args, **kwargs)
File "./mutli.py", line 131, in test2
raise Exception("eeee")
Exception: eeee
And if I send two simultaneous curl requests, we can see they're handled asynchronously on the server-side:
dan#dan:~$ ./mutli.py
inside get
inside get
caught exception inside get
caught exception inside get
Edit:
Note that this code becomes simpler with Python 3, because it introduces an error_callback keyword argument to all asynchronous multiprocessing.Pool methods. This makes it much easier to integrate with Tornado:
class TornadoPool(Pool):
def apply_async(self, func, args=(), kwds={}, callback=None):
''' Asynchronous equivalent of `apply()` builtin
This version will call `callback` even if an exception is
raised by `func`.
'''
super().apply_async(func, args, kwds, callback=callback,
error_callback=callback)
#gen.coroutine
def async_run(func, *args, **kwargs):
""" Runs the given function in a subprocess.
This wraps the given function in a gen.Task and runs it
in a multiprocessing.Pool. It is meant to be used as a
Tornado co-routine. Note that if func returns an Exception
(or an Exception sub-class), this function will raise the
Exception, rather than return it.
"""
result = yield gen.Task(pool.apply_async, func, args, kwargs)
raise Return(result)
All we need to do in our overridden apply_async is call the parent with the error_callback keyword argument, in addition to the callback kwarg. No need to override ApplyResult.
We can get even fancier by using a MetaClass in our TornadoPool, to allow its *_async methods to be called directly as if they were coroutines:
import time
from functools import wraps
from multiprocessing.pool import Pool
import tornado.web
from tornado import gen
from tornado.gen import Return
from tornado import stack_context
from tornado.ioloop import IOLoop
from tornado.concurrent import Future
def _argument_adapter(callback):
def wrapper(*args, **kwargs):
if kwargs or len(args) > 1:
callback(Arguments(args, kwargs))
elif args:
callback(args[0])
else:
callback(None)
return wrapper
def PoolTask(func, *args, **kwargs):
""" Task function for use with multiprocessing.Pool methods.
This is very similar to tornado.gen.Task, except it sets the
error_callback kwarg in addition to the callback kwarg. This
way exceptions raised in pool worker methods get raised in the
parent when the Task is yielded from.
"""
future = Future()
def handle_exception(typ, value, tb):
if future.done():
return False
future.set_exc_info((typ, value, tb))
return True
def set_result(result):
if future.done():
return
if isinstance(result, Exception):
future.set_exception(result)
else:
future.set_result(result)
with stack_context.ExceptionStackContext(handle_exception):
cb = _argument_adapter(set_result)
func(*args, callback=cb, error_callback=cb)
return future
def coro_runner(func):
""" Wraps the given func in a PoolTask and returns it. """
#wraps(func)
def wrapper(*args, **kwargs):
return PoolTask(func, *args, **kwargs)
return wrapper
class MetaPool(type):
""" Wrap all *_async methods in Pool with coro_runner. """
def __new__(cls, clsname, bases, dct):
pdct = bases[0].__dict__
for attr in pdct:
if attr.endswith("async") and not attr.startswith('_'):
setattr(bases[0], attr, coro_runner(pdct[attr]))
return super().__new__(cls, clsname, bases, dct)
class TornadoPool(Pool, metaclass=MetaPool):
pass
# Test worker functions
def test2(x):
print("hi2")
raise Exception("eeee")
def test(x):
print(x)
time.sleep(2)
return "done"
class TestHandler(tornado.web.RequestHandler):
#gen.coroutine
def get(self):
try:
result = yield pool.apply_async(test, ("inside get",))
self.write("%s\n" % result)
result = yield pool.apply_async(test2, ("hi2",))
self.write("%s\n" % result)
except Exception as e:
print("caught exception in get")
self.write("Caught an exception: %s" % e)
raise
finally:
self.finish()
app = tornado.web.Application([
(r"/test", TestHandler),
])
if __name__ == "__main__":
pool = TornadoPool()
app.listen(8888)
IOLoop.instance().start()
If your get requests are taking that long then tornado is the wrong framework.
I suggest you use nginx to route the fast gets to tornado and the slower ones to a different server.
PeterBe has an interesting article where he runs multiple Tornado servers and sets one of them to be 'the slow one' for handling the long running requests see: worrying-about-io-blocking I would try this method.

Asynchronous method call in Python?

I was wondering if there's any library for asynchronous method calls in Python. It would be great if you could do something like
#async
def longComputation():
<code>
token = longComputation()
token.registerCallback(callback_function)
# alternative, polling
while not token.finished():
doSomethingElse()
if token.finished():
result = token.result()
Or to call a non-async routine asynchronously
def longComputation()
<code>
token = asynccall(longComputation())
It would be great to have a more refined strategy as native in the language core. Was this considered?
Something like:
import threading
thr = threading.Thread(target=foo, args=(), kwargs={})
thr.start() # Will run "foo"
....
thr.is_alive() # Will return whether foo is running currently
....
thr.join() # Will wait till "foo" is done
See the documentation at https://docs.python.org/library/threading.html for more details.
You can use the multiprocessing module added in Python 2.6. You can use pools of processes and then get results asynchronously with:
apply_async(func[, args[, kwds[, callback]]])
E.g.:
from multiprocessing import Pool
def f(x):
return x*x
if __name__ == '__main__':
pool = Pool(processes=1) # Start a worker processes.
result = pool.apply_async(f, [10], callback) # Evaluate "f(10)" asynchronously calling callback when finished.
This is only one alternative. This module provides lots of facilities to achieve what you want. Also it will be really easy to make a decorator from this.
As of Python 3.5, you can use enhanced generators for async functions.
import asyncio
import datetime
Enhanced generator syntax:
#asyncio.coroutine
def display_date(loop):
end_time = loop.time() + 5.0
while True:
print(datetime.datetime.now())
if (loop.time() + 1.0) >= end_time:
break
yield from asyncio.sleep(1)
loop = asyncio.get_event_loop()
# Blocking call which returns when the display_date() coroutine is done
loop.run_until_complete(display_date(loop))
loop.close()
New async/await syntax:
async def display_date(loop):
end_time = loop.time() + 5.0
while True:
print(datetime.datetime.now())
if (loop.time() + 1.0) >= end_time:
break
await asyncio.sleep(1)
loop = asyncio.get_event_loop()
# Blocking call which returns when the display_date() coroutine is done
loop.run_until_complete(display_date(loop))
loop.close()
It's not in the language core, but a very mature library that does what you want is Twisted. It introduces the Deferred object, which you can attach callbacks or error handlers ("errbacks") to. A Deferred is basically a "promise" that a function will have a result eventually.
You can implement a decorator to make your functions asynchronous, though that's a bit tricky. The multiprocessing module is full of little quirks and seemingly arbitrary restrictions – all the more reason to encapsulate it behind a friendly interface, though.
from inspect import getmodule
from multiprocessing import Pool
def async(decorated):
r'''Wraps a top-level function around an asynchronous dispatcher.
when the decorated function is called, a task is submitted to a
process pool, and a future object is returned, providing access to an
eventual return value.
The future object has a blocking get() method to access the task
result: it will return immediately if the job is already done, or block
until it completes.
This decorator won't work on methods, due to limitations in Python's
pickling machinery (in principle methods could be made pickleable, but
good luck on that).
'''
# Keeps the original function visible from the module global namespace,
# under a name consistent to its __name__ attribute. This is necessary for
# the multiprocessing pickling machinery to work properly.
module = getmodule(decorated)
decorated.__name__ += '_original'
setattr(module, decorated.__name__, decorated)
def send(*args, **opts):
return async.pool.apply_async(decorated, args, opts)
return send
The code below illustrates usage of the decorator:
#async
def printsum(uid, values):
summed = 0
for value in values:
summed += value
print("Worker %i: sum value is %i" % (uid, summed))
return (uid, summed)
if __name__ == '__main__':
from random import sample
# The process pool must be created inside __main__.
async.pool = Pool(4)
p = range(0, 1000)
results = []
for i in range(4):
result = printsum(i, sample(p, 100))
results.append(result)
for result in results:
print("Worker %i: sum value is %i" % result.get())
In a real-world case I would ellaborate a bit more on the decorator, providing some way to turn it off for debugging (while keeping the future interface in place), or maybe a facility for dealing with exceptions; but I think this demonstrates the principle well enough.
Just
import threading, time
def f():
print "f started"
time.sleep(3)
print "f finished"
threading.Thread(target=f).start()
My solution is:
import threading
class TimeoutError(RuntimeError):
pass
class AsyncCall(object):
def __init__(self, fnc, callback = None):
self.Callable = fnc
self.Callback = callback
def __call__(self, *args, **kwargs):
self.Thread = threading.Thread(target = self.run, name = self.Callable.__name__, args = args, kwargs = kwargs)
self.Thread.start()
return self
def wait(self, timeout = None):
self.Thread.join(timeout)
if self.Thread.isAlive():
raise TimeoutError()
else:
return self.Result
def run(self, *args, **kwargs):
self.Result = self.Callable(*args, **kwargs)
if self.Callback:
self.Callback(self.Result)
class AsyncMethod(object):
def __init__(self, fnc, callback=None):
self.Callable = fnc
self.Callback = callback
def __call__(self, *args, **kwargs):
return AsyncCall(self.Callable, self.Callback)(*args, **kwargs)
def Async(fnc = None, callback = None):
if fnc == None:
def AddAsyncCallback(fnc):
return AsyncMethod(fnc, callback)
return AddAsyncCallback
else:
return AsyncMethod(fnc, callback)
And works exactly as requested:
#Async
def fnc():
pass
You could use eventlet. It lets you write what appears to be synchronous code, but have it operate asynchronously over the network.
Here's an example of a super minimal crawler:
urls = ["http://www.google.com/intl/en_ALL/images/logo.gif",
"https://wiki.secondlife.com/w/images/secondlife.jpg",
"http://us.i1.yimg.com/us.yimg.com/i/ww/beta/y3.gif"]
import eventlet
from eventlet.green import urllib2
def fetch(url):
return urllib2.urlopen(url).read()
pool = eventlet.GreenPool()
for body in pool.imap(fetch, urls):
print "got body", len(body)
Something like this works for me, you can then call the function, and it will dispatch itself onto a new thread.
from thread import start_new_thread
def dowork(asynchronous=True):
if asynchronous:
args = (False)
start_new_thread(dowork,args) #Call itself on a new thread.
else:
while True:
#do something...
time.sleep(60) #sleep for a minute
return
You can use concurrent.futures (added in Python 3.2).
import time
from concurrent.futures import ThreadPoolExecutor
def long_computation(duration):
for x in range(0, duration):
print(x)
time.sleep(1)
return duration * 2
print('Use polling')
with ThreadPoolExecutor(max_workers=1) as executor:
future = executor.submit(long_computation, 5)
while not future.done():
print('waiting...')
time.sleep(0.5)
print(future.result())
print('Use callback')
executor = ThreadPoolExecutor(max_workers=1)
future = executor.submit(long_computation, 5)
future.add_done_callback(lambda f: print(f.result()))
print('waiting for callback')
executor.shutdown(False) # non-blocking
print('shutdown invoked')
The newer asyncio running method in Python 3.7 and later is using asyncio.run() instead of creating loop and calling loop.run_until_complete() as well as closing it:
import asyncio
import datetime
async def display_date(delay):
loop = asyncio.get_running_loop()
end_time = loop.time() + delay
while True:
print("Blocking...", datetime.datetime.now())
await asyncio.sleep(1)
if loop.time() > end_time:
print("Done.")
break
asyncio.run(display_date(5))
Is there any reason not to use threads? You can use the threading class.
Instead of finished() function use the isAlive(). The result() function could join() the thread and retrieve the result. And, if you can, override the run() and __init__ functions to call the function specified in the constructor and save the value somewhere to the instance of the class.
The native Python way for asynchronous calls in 2021 with Python 3.9 suitable also for Jupyter / Ipython Kernel
Camabeh's answer is the way to go since Python 3.3.
async def display_date(loop):
end_time = loop.time() + 5.0
while True:
print(datetime.datetime.now())
if (loop.time() + 1.0) >= end_time:
break
await asyncio.sleep(1)
loop = asyncio.get_event_loop()
# Blocking call which returns when the display_date() coroutine is done
loop.run_until_complete(display_date(loop))
loop.close()
This will work in Jupyter Notebook / Jupyter Lab but throw an error:
RuntimeError: This event loop is already running
Due to Ipython's usage of event loops we need something called nested asynchronous loops which is not yet implemented in Python. Luckily there is nest_asyncio to deal with the issue. All you need to do is:
!pip install nest_asyncio # use ! within Jupyter Notebook, else pip install in shell
import nest_asyncio
nest_asyncio.apply()
(Based on this thread)
Only when you call loop.close() it throws another error as it probably refers to Ipython's main loop.
RuntimeError: Cannot close a running event loop
I'll update this answer as soon as someone answered to this github issue.
You can use process. If you want to run it forever use while (like networking) in you function:
from multiprocessing import Process
def foo():
while 1:
# Do something
p = Process(target = foo)
p.start()
if you just want to run it one time, do like that:
from multiprocessing import Process
def foo():
# Do something
p = Process(target = foo)
p.start()
p.join()

Categories

Resources