I am prototyping a FastAPI app with an endpoint that will launch a long-running process using the subprocess module. The obvious solution is to use concurrent.futures and ProcessPoolExecutor, but I am unable to get the behavior I want. Code sample:
import asyncio
from concurrent.futures import ProcessPoolExecutor
import subprocess as sb
import time
import random
pool = ProcessPoolExecutor(5)
def long_task(s):
    print("started")
    time.sleep(random.randrange(5, 15))
    sb.check_output(["touch", str(s)])
    print("done")

async def async_task():
    loop = asyncio.get_event_loop()
    print("started")
    tasks = [loop.run_in_executor(pool, long_task, i) for i in range(10)]
    while True:
        print("in async task")
        done, _ = await asyncio.wait(tasks, timeout=1)
        for task in done:
            await task
        await asyncio.sleep(1)

def main():
    loop = asyncio.get_event_loop()
    loop.run_until_complete(async_task())

if __name__ == "__main__":
    main()
This sample works fine on the surface, but the spawned processes do not get stopped after execution completes - I still see all of the Python worker processes in ps aux | grep python. Shouldn't awaiting a completed task stop its process? In the end I do not care much about the result of the execution; it just needs to happen in the background and exit cleanly, without any hanging processes.
You must close the ProcessPoolExecutor when you are done using it, either by explicitly calling its shutdown() method or by using it as a context manager. I used the context manager approach.
I don't know what subprocess.check_output does, so I commented it out.
I also replaced your infinite loop with a single call to asyncio.gather, which will yield until all of the executor tasks have finished.
I'm on Windows, so to observe the creation/deletion of Processes I watched the Windows Task Manager. The program creates 5 subprocesses and closes them again when the ProcessPool context manager exits.
import asyncio
from concurrent.futures import ProcessPoolExecutor
# import subprocess as sb
import time
import random
def long_task(s):
    print("started")
    time.sleep(random.randrange(5, 15))
    # sb.check_output(["touch", str(s)])
    print("done", s)

async def async_task():
    loop = asyncio.get_event_loop()
    print("started")
    with ProcessPoolExecutor(5) as pool:
        tasks = [loop.run_in_executor(pool, long_task, i) for i in range(10)]
        await asyncio.gather(*tasks)
    print("Completely done")

def main():
    asyncio.run(async_task())

if __name__ == "__main__":
    main()
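For completeness, the same cleanup can also be done without a context manager by calling shutdown() explicitly. A minimal sketch of that variant, assuming the same long_task as above:

import asyncio
from concurrent.futures import ProcessPoolExecutor

async def async_task():
    loop = asyncio.get_event_loop()
    pool = ProcessPoolExecutor(5)
    try:
        tasks = [loop.run_in_executor(pool, long_task, i) for i in range(10)]
        await asyncio.gather(*tasks)
    finally:
        pool.shutdown(wait=True)  # wait for pending work, then let the worker processes exit

if __name__ == "__main__":
    asyncio.run(async_task())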
Related
I want to start a new Process (Pricefeed) from my Executor class and then have the Executor class keep running in its own event loop (the shoot method). In my current attempt, the asyncio loop gets blocked on the line p.join(). However, without that line, my code just exits. How do I do this properly?
Note: fh.run() blocks as well.
import asyncio
from multiprocessing import Process, Queue
from cryptofeed import FeedHandler
from cryptofeed.defines import L2_BOOK
from cryptofeed.exchanges.ftx import FTX
class Pricefeed(Process):
    def __init__(self, queue: Queue):
        super().__init__()
        self.coin_symbol = 'SOL-USD'
        self.fut_symbol = 'SOL-USD-PERP'
        self.queue = queue

    async def _book_update(self, feed, symbol, book, timestamp, receipt_timestamp):
        self.queue.put(book)

    def run(self):
        fh = FeedHandler()
        fh.add_feed(FTX(symbols=[self.fut_symbol, self.coin_symbol], channels=[L2_BOOK],
                        callbacks={L2_BOOK: self._book_update}))
        fh.run()

class Executor:
    def __init__(self):
        self.q = Queue()

    async def shoot(self):
        print('in shoot')
        for i in range(5):
            msg = self.q.get()
            print(msg)
            await asyncio.sleep(1)  # do some stuff

    async def run(self):
        asyncio.create_task(self.shoot())
        p = Pricefeed(self.q)
        p.start()
        p.join()

async def main():
    g = Executor()
    await g.run()

if __name__ == '__main__':
    asyncio.run(main())
Since you're using a queue to communicate, this is a somewhat tricky problem. To answer your first question as to why removing join makes the program work: join blocks until the process finishes, and in asyncio you can't do anything blocking in a function marked async or it will freeze the event loop. To do this properly you'll need to run your process with the asyncio event loop's run_in_executor method, which will run things in a process pool and return an awaitable that is compatible with the asyncio event loop.
Secondly, to properly share your queue you'll need to use a multiprocessing Manager, which creates shared state that can be used by multiple processes; Managers directly support creation of a shared queue. Using these two bits of knowledge you can adapt your code to something like the following, which works:
import asyncio
import functools
import time
from multiprocessing import Manager
from concurrent.futures import ProcessPoolExecutor
def run_pricefeed(queue):
    i = 0
    while True:  # simulate putting an item on the queue every 250ms
        queue.put(f'test-{i}')
        i += 1
        time.sleep(.25)

class Executor:
    async def shoot(self, queue):
        print('in shoot')
        for i in range(5):
            while not queue.empty():
                msg = queue.get(block=False)
                print(msg)
            await asyncio.sleep(1)  # do some stuff

    async def run(self):
        with ProcessPoolExecutor() as pool:
            with Manager() as manager:
                queue = manager.Queue()
                asyncio.create_task(self.shoot(queue))
                await asyncio.get_running_loop().run_in_executor(pool, functools.partial(run_pricefeed, queue))

async def main():
    g = Executor()
    await g.run()

if __name__ == '__main__':
    asyncio.run(main())
This code has a drawback in that you need to empty the queue in a non-blocking fashion from your asyncio process and wait for a while for new items to come in before emptying it again, effectively implementing a polling mechanism. If you don't wait after emptying, you'll wind up with blocking code and you will freeze the event loop again. This isn't as good as just blocking on the queue until an item arrives, but it may suit your needs. If possible, I would avoid asyncio here and use multiprocessing entirely, for example by implementing queue processing as a separate process.
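A minimal sketch of that multiprocessing-only alternative, reusing the run_pricefeed producer from above - the consumer runs in its own process, so it can simply block on queue.get without an event loop to worry about:

from multiprocessing import Process, Queue

def consume(queue):
    for _ in range(5):
        msg = queue.get()  # blocking is fine here, there is no event loop to freeze
        print(msg)

if __name__ == '__main__':
    queue = Queue()
    producer = Process(target=run_pricefeed, args=(queue,), daemon=True)
    consumer = Process(target=consume, args=(queue,))
    producer.start()
    consumer.start()
    consumer.join()  # the daemon producer is cleaned up automatically on exit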
I would like to combine asyncio and multiprocessing, as I have a task where one part is IO-bound and another is CPU-bound. I first tried to use loop.run_in_executor(), but I couldn't get it to work properly. Instead I went with creating two processes, where one uses asyncio and the other doesn't.
The code is such that I have a class with some non-blocking functions and one blocking. I have an asyncio.Queue to pass information between the non-blocking parts and a multiprocessing.Queue to pass information between the non-blocking and the blocking functions.
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor
import asyncio
import time
class TestClass:
    def __init__(self):
        m = mp.Manager()
        self.blocking_queue = m.Queue()

    async def run(self):
        loop = asyncio.get_event_loop()
        self.non_blocking_queue = asyncio.Queue()  # asyncio Queue must be declared within event loop
        task1 = loop.create_task(self.non_blocking1())
        task2 = loop.create_task(self.non_blocking2())
        task3 = loop.create_task(self.print_msgs())
        await asyncio.gather(task1, task2)
        task3.cancel()

    def blocking(self):
        i = 0
        while i < 5:
            time.sleep(0.6)
            i += 1
            print("Blocking ", i)
            line = self.blocking_queue.get()
            print("Blocking: ", line)
        print("blocking done")

    async def non_blocking1(self):
        for i in range(5):
            await self.non_blocking_queue.put("Hello")
            await asyncio.sleep(0.4)

    async def non_blocking2(self):
        for i in range(5):
            await self.non_blocking_queue.put("World")
            await asyncio.sleep(0.5)

    async def print_msgs(self):
        while True:
            line = await self.non_blocking_queue.get()
            self.blocking_queue.put(line)
            print(line)

test_class = TestClass()

with ProcessPoolExecutor() as pool:
    pool.submit(test_class.blocking)
    pool.submit(asyncio.run(test_class.run()))

print("done")
About half the time I run this, it works fine and prints out the text from both the blocking and the non-blocking queues. The other half of the time it only prints out the results of the non-blocking queue; it looks like the blocking process isn't started at all. It doesn't simply alternate every other run - it might work five times in a row and then fail five times in a row.
What might cause such a problem? And is there a better way to do this, using both multiprocessing and asyncio?
Running the async task "inside" the other process works for me, e.g.:
def runfn(fn):
    return asyncio.run(fn())

with ProcessPoolExecutor() as pool:
    pool.submit(test_class.blocking)
    pool.submit(runfn, test_class.run)
Presumably there is some state inside asyncio, or inside the task, that needs to stay consistent and gets broken when the coroutine is run in another process.
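A likely contributing factor: pool.submit(asyncio.run(test_class.run())) evaluates asyncio.run(...) in the parent process first and only submits its return value, whereas submit expects the callable and its arguments separately. Schematically:

# pool.submit(f(x))  - f(x) is evaluated here, in the parent, and only its result is submitted
# pool.submit(f, x)  - f and x are pickled and f(x) is called inside a worker process
pool.submit(asyncio.run(test_class.run()))  # event loop runs in the parent
pool.submit(runfn, test_class.run)          # event loop runs in a worker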
I have a function that continuously monitors an API. Basically, the function gets the data, parses it, then appends it to a file. Then it waits for 15 minutes and does the same thing over and over.
What I want is to run this loop in the background so I don't block the rest of my code from executing.
If you are using asyncio (I assume you are, given the asyncio tag), a scheduled operation can be performed using a task.
import asyncio
loop = asyncio.get_event_loop()
async def check_api():
    while True:
        # Do API check, helps if this is using async methods
        await asyncio.sleep(15 * 60)  # 15 minutes (in seconds)

loop.create_task(check_api())

...  # Rest of your application

loop.run_forever()
If your API check is not async (or the library you are using to interact with it is not async), you can use an Executor to run the operation in a separate thread or process while still maintaining the asyncio API.
For example:
from concurrent.futures import ThreadPoolExecutor
executor = ThreadPoolExecutor()
def call_api():
    ...

async def check_api():
    while True:
        await loop.run_in_executor(executor, call_api)
        await asyncio.sleep(15 * 60)  # 15 minutes (in seconds)
Note that asyncio does not automatically make your code parallel; it is co-operative multitasking, so all of your coroutines need to cooperate by using await. A long-running synchronous operation will still block the event loop and every other task, and in that case an Executor will help.
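On Python 3.9+, asyncio.to_thread is a convenient shorthand for the thread-executor pattern above. A minimal sketch, assuming the same synchronous call_api as before:

import asyncio

async def check_api():
    while True:
        await asyncio.to_thread(call_api)  # runs the blocking call in a worker thread
        await asyncio.sleep(15 * 60)  # 15 minutes (in seconds)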
This is very broad, but you could take a look at the multiprocessing or threading Python modules.
For running a thread in the background it would look something like this:
from threading import Thread
def background_task():
    # your code here
    ...

t = Thread(target=background_task)
t.start()
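If the background loop should not keep the program alive on exit, it is common to mark the thread as a daemon and, when a clean stop matters, signal it with a threading.Event. A small sketch of that pattern (the names are illustrative):

from threading import Thread, Event

stop_event = Event()

def background_task():
    while not stop_event.is_set():
        # your periodic work here
        stop_event.wait(timeout=1)  # sleep, but wake up early if a stop is requested

t = Thread(target=background_task, daemon=True)
t.start()
# ... rest of your program ...
stop_event.set()  # ask the thread to finish
t.join()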
Try multithreading:
import threading
import time

def background():
    while True:
        number = int(len(oilrigs)) * 49  # example work; 'oilrigs' and 'money' come from the answerer's own app
        number += money
        time.sleep(1)

def foreground():
    # what you want to run in the foreground
    ...

b = threading.Thread(name='background', target=background)
f = threading.Thread(name='foreground', target=foreground)
b.start()
f.start()
Try multithreading:
import threading

def background():
    # the loop you want to run in the background
    ...

b = threading.Thread(target=background)
b.start()
Consider the following program.
import asyncio
import multiprocessing
from multiprocessing import Queue
from concurrent.futures.thread import ThreadPoolExecutor
import sys
def main():
    executor = ThreadPoolExecutor()
    loop = asyncio.get_event_loop()
    # comment the following line and the shutdown will work smoothly
    asyncio.ensure_future(print_some(executor))
    try:
        loop.run_forever()
    except KeyboardInterrupt:
        print("shutting down")
        executor.shutdown()
        loop.stop()
        loop.close()
        sys.exit()

async def print_some(executor):
    print("Waiting...Hit CTRL+C to abort")
    queue = Queue()
    loop = asyncio.get_event_loop()
    some = await loop.run_in_executor(executor, queue.get)
    print(some)

if __name__ == '__main__':
    main()
All I want is a graceful shutdown when I hit CTRL+C. However, the executor thread seems to prevent that (even though I do call shutdown()).
You need to send a poison pill to make the workers stop listening on the queue.get call. Worker threads in the ThreadPoolExecutor pool will block Python from exiting if they have active work. There's a comment in the source code that describes the reasoning for this behavior:
# Workers are created as daemon threads. This is done to allow the interpreter
# to exit when there are still idle threads in a ThreadPoolExecutor's thread
# pool (i.e. shutdown() was not called). However, allowing workers to die with
# the interpreter has two undesirable properties:
# - The workers would still be running during interpreter shutdown,
# meaning that they would fail in unpredictable ways.
# - The workers could be killed while evaluating a work item, which could
# be bad if the callable being evaluated has external side-effects e.g.
# writing to a file.
#
# To work around this problem, an exit handler is installed which tells the
# workers to exit when their work queues are empty and then waits until the
# threads finish.
Here's a complete example that exits cleanly:
import asyncio
import multiprocessing
from multiprocessing import Queue
from concurrent.futures.thread import ThreadPoolExecutor
import sys
def main():
    executor = ThreadPoolExecutor()
    loop = asyncio.get_event_loop()
    # comment the following line and the shutdown will work smoothly
    fut = asyncio.ensure_future(print_some(executor))
    try:
        loop.run_forever()
    except KeyboardInterrupt:
        print("shutting down")
        queue.put(None)  # Poison pill
        loop.run_until_complete(fut)
        executor.shutdown()
        loop.stop()
        loop.close()

async def print_some(executor):
    print("Waiting...Hit CTRL+C to abort")
    loop = asyncio.get_event_loop()
    some = await loop.run_in_executor(executor, queue.get)
    print(some)

queue = None

if __name__ == '__main__':
    queue = Queue()
    main()
The run_until_complete(fut) call is needed to avoid a warning about a pending task hanging around when the asyncio event loop exits. If you don't care about that, you can leave that call out.
It seems an asyncio.Queue can only be fed from the same thread that reads it? For instance:
import asyncio
from threading import Thread
import time
q = asyncio.Queue()
def produce():
    for i in range(100):
        q.put_nowait(i)
        time.sleep(0.1)

async def consume():
    while True:
        i = await q.get()
        print('consumed', i)

Thread(target=produce).start()
asyncio.get_event_loop().run_until_complete(consume())
only prints
consumed 0
and then hangs. What am I missing?
You can't call asyncio methods from another thread directly.
Either use loop.call_soon_threadsafe:
loop.call_soon_threadsafe(q.put_nowait, i)
Or asyncio.run_coroutine_threadsafe:
future = asyncio.run_coroutine_threadsafe(q.put(i), loop)
where loop is the loop returned by asyncio.get_event_loop() in your main thread.
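For reference, a minimal adaptation of the example in the question using call_soon_threadsafe - the loop is obtained in the main thread and the producer schedules the put on it instead of touching the queue directly:

import asyncio
from threading import Thread
import time

q = asyncio.Queue()
loop = asyncio.get_event_loop()

def produce():
    for i in range(100):
        # hand the put over to the loop's thread instead of calling the queue directly
        loop.call_soon_threadsafe(q.put_nowait, i)
        time.sleep(0.1)

async def consume():
    while True:
        i = await q.get()
        print('consumed', i)

Thread(target=produce, daemon=True).start()
loop.run_until_complete(consume())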