Multiprocess management in Python with aiomultiprocess

I have a problem with multiprocessing in Python. I need to create async processes, which run for an undefined time, and the number of processes is also undefined. As soon as a new request arrives, a new process must be created with the specifications from the request. We use ZeroMQ for messaging. There is also a process which is started at the beginning and only ends when the whole script terminates.
Now I am searching for a way to await all processes while still being able to add additional ones.
asyncio.gather() was my first idea, but it needs the complete list of awaitables before it is called.
class Object:
    def __init__(self, var):
        self.var = var

    async def run(self):
        ...  # do async things

class object_controller:
    def __init__(self):
        self.ctx = zmq.Context()
        self.socket = self.ctx.socket(zmq.PULL)
        self.socket.connect("tcp://127.0.0.1:5558")
        self.static_process = AStaticProcess()
        self.sp = aiomultiprocess.Process(target=self.static_process.run)
        self.sp.start()
        # here I need a good way to await this process

    def process(self, var):
        object = Object(var)
        process = aiomultiprocess.Process(target=object.run)
        process.start()

    def listener(self):
        while True:
            msg = self.socket.recv_pyobj()
            # here I need to find a way to start and await this process while being able to
            # receive additional requests, which result in additional processes that also need to be awaited
This is some code which hopefully explains my problem. I need a kind of Collector which awaits the Processes.
After initialization, there is no interaction between the object and the controller, only over ZeroMQ (between the static process and the variable processes). There is also no return value.

If you need to start up processes while concurrently waiting for new ones, then instead of explicitly calling await to know when the processes finish, let them execute in the background using asyncio.create_task(). This returns a Task object, which has an add_done_callback method you can use to do some work when the process completes:
class Object:
    def __init__(self, var):
        self.var = var

    async def run(self):
        ...  # do async things

class object_controller:
    def __init__(self):
        self.ctx = zmq.Context()
        self.socket = self.ctx.socket(zmq.PULL)
        self.socket.connect("tcp://127.0.0.1:5558")
        self.static_process = AStaticProcess()
        self.sp = aiomultiprocess.Process(target=self.static_process.run)
        self.sp.start()
        t = asyncio.create_task(self.sp.join())
        t.add_done_callback(self.handle_proc_finished)

    def process(self, var):
        object = Object(var)
        process = aiomultiprocess.Process(target=object.run)
        process.start()

    def listener(self):
        while True:
            msg = self.socket.recv_pyobj()
            process = aiomultiprocess.Process(...)
            process.start()
            t = asyncio.create_task(process.join())
            t.add_done_callback(self.handle_other_proc_finished)

    def handle_proc_finished(self, task):
        # do something
        ...

    def handle_other_proc_finished(self, task):
        # do something else
        ...
If you want to avoid using callbacks, you can also pass create_task a coroutine you define yourself, which waits for the process to finish and does whatever needs to be done afterward.
self.sp.start()
asyncio.create_task(wait_for_proc(self.sp))

async def wait_for_proc(proc):
    await proc.join()
    # do other stuff

You need to create a list of tasks or a Future object for the processes. Also, you cannot add a process to the event loop while awaiting other tasks.
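
For illustration only, here is a minimal sketch of that idea (not from the original thread): incoming requests are simulated with an asyncio.Queue instead of ZeroMQ, and every started process contributes a join() task to a growing set that a collector coroutine awaits, so running processes can finish while new ones are still being added.

import asyncio
import aiomultiprocess

async def work(seconds):
    await asyncio.sleep(seconds)  # stand-in for the real async work

async def collector(requests: asyncio.Queue):
    pending = set()  # join() tasks for every process started so far
    while True:
        # start a process for each request that has arrived so far
        while not requests.empty():
            seconds = requests.get_nowait()
            proc = aiomultiprocess.Process(target=work, args=(seconds,))
            proc.start()
            pending.add(asyncio.create_task(proc.join()))
        if not pending:
            break  # nothing running and nothing queued
        # wait briefly for any process to finish, then loop to pick up new requests
        done, pending = await asyncio.wait(pending, timeout=0.5)
        for _ in done:
            print("a process finished")

async def main():
    requests = asyncio.Queue()
    for seconds in (1, 2, 3):
        requests.put_nowait(seconds)
    await collector(requests)

if __name__ == "__main__":
    asyncio.run(main())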

Related

asyncio network operation in thread?

I have a Python asyncio script that needs to run a long-running task in a thread. During the operation of the thread, it needs to make network connections to another server. Is there any problem calling network/socket write functions in a thread, as opposed to doing it in the main thread?
I know that in the Twisted library, for example, one must always do network operations in the main thread. Are there any such limitations in asyncio? And if so, how does one get around this problem?
Here's my sample code:
import asyncio
import threading

#
# global servers dict keeps track of connected instances of each protocol
#
servers = {}

class SomeOtherServer(asyncio.Protocol):
    def __init__(self):
        self.transport = None

    def connection_made(self, transport):
        self.transport = transport
        servers["SomeOtherServer"] = self

    def connection_lost(self, exc):
        self.transport = None

class MyServer(asyncio.Protocol):
    def __init__(self):
        self.transport = None

    def connection_made(self, transport):
        self.transport = transport
        servers["MyServer"] = self

    def connection_lost(self, exc):
        self.transport = None

    def long_running_task(self, data):
        # some long running operations here, then write data to other server
        # other_server is also an instance of some sort of asyncio.Protocol
        # is it ok to call this like this, even though this method is running in a thread?
        other_server = servers["SomeOtherServer"]
        other_server.transport.write(data)

    def data_received(self, data):
        task_thread = threading.Thread(target=self.long_running_task, args=[data])
        task_thread.start()

async def main():
    global loop
    loop = asyncio.get_running_loop()
    other_server_obj = await loop.create_server(lambda: SomeOtherServer(), "localhost", 9001)
    my_server_obj = await loop.create_server(lambda: MyServer(), "localhost", 9002)
    async with other_server_obj, my_server_obj:
        while True:
            await asyncio.sleep(3600)

asyncio.run(main())
Note that data_received will set up and call long_running_task in a thread, and long_running_task makes a network connection to another server, and does so in the task thread, not the main thread. Is this ok, or is there some other way this must be done?
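
As a hedged side note (not from the original post): asyncio objects, including transports, are not thread-safe, and the documented way to hand work back to the event loop from another thread is loop.call_soon_threadsafe. A minimal sketch of how long_running_task could be adjusted, assuming the global loop set in main above:

# Hypothetical rewrite of MyServer.long_running_task: the blocking work still
# happens in the worker thread, but the transport.write is scheduled back onto
# the event loop, because asyncio transports are not thread-safe.
def long_running_task(self, data):
    # ... long running, blocking operations here ...
    other_server = servers["SomeOtherServer"]
    # schedule the write on the loop's thread instead of calling it directly
    loop.call_soon_threadsafe(other_server.transport.write, data)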

Starting a new process from an asyncio loop

I want to start a new Process (Pricefeed) from my Executor class and then have the Executor class keep running in its own event loop (the shoot method). In my current attempt, the asyncio loop gets blocked on the line p.join(). However, without that line, my code just exits. How do I do this properly?
Note: fh.run() blocks as well.
import asyncio
from multiprocessing import Process, Queue

from cryptofeed import FeedHandler
from cryptofeed.defines import L2_BOOK
from cryptofeed.exchanges.ftx import FTX

class Pricefeed(Process):
    def __init__(self, queue: Queue):
        super().__init__()
        self.coin_symbol = 'SOL-USD'
        self.fut_symbol = 'SOL-USD-PERP'
        self.queue = queue

    async def _book_update(self, feed, symbol, book, timestamp, receipt_timestamp):
        self.queue.put(book)

    def run(self):
        fh = FeedHandler()
        fh.add_feed(FTX(symbols=[self.fut_symbol, self.coin_symbol], channels=[L2_BOOK],
                        callbacks={L2_BOOK: self._book_update}))
        fh.run()

class Executor:
    def __init__(self):
        self.q = Queue()

    async def shoot(self):
        print('in shoot')
        for i in range(5):
            msg = self.q.get()
            print(msg)
            await asyncio.sleep(1)  # do some stuff

    async def run(self):
        asyncio.create_task(self.shoot())
        p = Pricefeed(self.q)
        p.start()
        p.join()

async def main():
    g = Executor()
    await g.run()

if __name__ == '__main__':
    asyncio.run(main())
Since you're using a queue to communicate this is a somewhat tricky problem. To answer your first question as to why removing join makes the program work, join blocks until the process finishes. In asyncio you can't do anything blocking in a function marked async or it will freeze the event loop. To do this properly you'll need to run your process with the asyncio event loop's run_in_executor method which will run things in a process pool and return an awaitable that is compatible with the asyncio event loop.
Secondly, you'll need to use a multiprocessing Manager which creates shared state that can be used by multiple processes to properly share your queue. Managers directly support creation of a shared queue. Using these two bits of knowledge you can adapt your code to something like the following which works:
import asyncio
import functools
import time

from multiprocessing import Manager
from concurrent.futures import ProcessPoolExecutor

def run_pricefeed(queue):
    i = 0
    while True:  # simulate putting an item on the queue every 250ms
        queue.put(f'test-{i}')
        i += 1
        time.sleep(.25)

class Executor:
    async def shoot(self, queue):
        print('in shoot')
        for i in range(5):
            while not queue.empty():
                msg = queue.get(block=False)
                print(msg)
            await asyncio.sleep(1)  # do some stuff

    async def run(self):
        with ProcessPoolExecutor() as pool:
            with Manager() as manager:
                queue = manager.Queue()
                asyncio.create_task(self.shoot(queue))
                await asyncio.get_running_loop().run_in_executor(pool, functools.partial(run_pricefeed, queue))

async def main():
    g = Executor()
    await g.run()

if __name__ == '__main__':
    asyncio.run(main())
This code has a drawback: you need to empty the queue in a non-blocking fashion from your asyncio process and wait a while for new items to come in before emptying it again, effectively implementing a polling mechanism. If you don't wait after emptying, you'll wind up with blocking code and you will freeze the event loop again. This isn't as good as simply blocking until the queue has an item in it, but it may suit your needs. If possible, I would avoid asyncio here and use multiprocessing entirely, for example by implementing the queue processing as a separate process.
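
As an alternative to that polling loop (a sketch, not part of the original answer), the blocking queue.get itself can be handed to a thread via run_in_executor, so the coroutine simply awaits the next item instead of checking queue.empty():

class Executor:
    async def shoot(self, queue):
        print('in shoot')
        loop = asyncio.get_running_loop()
        for i in range(5):
            # queue.get blocks in a worker thread, not in the event loop
            msg = await loop.run_in_executor(None, queue.get)
            print(msg)
            await asyncio.sleep(1)  # do some stuff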

Wrapping a Queue in Future

I am writing a Tornado webserver in Python 3.7 to display the status of processes run by the multiprocessing library.
The following code works, but I'd like to be able to do it using Tornado's built-in library instead of hacking in the threading library. I haven't figured out how to do it without blocking Tornado during queue.get. I think the correct solution is to wrap the get calls in some sort of future. I've tried for hours, but haven't figured out how to do this.
Inside of my multiprocessing script:
class ProcessToMonitor(multiprocessing.Process):
    def __init__(self):
        multiprocessing.Process.__init__(self)
        self.queue = multiprocessing.Queue()

    def run(self):
        while True:
            # do stuff
            self.queue.put(value)
Then, in my Tornado script
class MyWebSocket(tornado.websocket.WebSocketHandler):
    connections = set()

    def open(self):
        self.connections.add(self)

    def close(self):
        self.connections.remove(self)

    @classmethod
    def emit(self, message):
        [client.write_message(message) for client in self.connections]

def worker():
    ptm = ProcessToMonitor()
    ptm.start()
    while True:
        message = ptm.queue.get()
        MyWebSocket.emit(message)

if __name__ == '__main__':
    app = tornado.web.Application([
        (r'/', MainHandler),  # Not shown
        (r'/websocket', MyWebSocket)
    ])
    app.listen(8888)

    threading.Thread(target=worker).start()

    ioloop = tornado.ioloop.IOLoop.current()
    ioloop.start()
queue.get isn't a blocking function; it just waits until there's an item in the queue in case the queue is empty. I can see from your code that queue.get fits perfectly for your use case inside a while loop.
I think you're probably using it incorrectly. You'll have to make the worker function a coroutine (async/await syntax):
async def worker():
    ...
    while True:
        message = await queue.get()
        ...
However, if you don't want to wait for an item and would like to proceed immediately, its alternative is queue.get_nowait.
One thing to note here is that queue.get_nowait will raise an exception called QueueEmpty if the queue is empty, so you'll need to handle that exception.
Example:
while True:
    try:
        message = queue.get_nowait()
    except QueueEmpty:
        # wait for some time before the next iteration,
        # otherwise this loop will keep running for no reason
        continue
    MyWebSocket.emit(message)
As you can see, you'll have to pause the while loop for some time if the queue is empty, to prevent it from overwhelming the system.
So why not use queue.get in the first place?
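
Since the question is actually about a multiprocessing.Queue, whose get really does block the calling thread, another option (a sketch under that assumption, not part of the answer above) is to wrap the blocking get in an executor so Tornado can await it as a future. This assumes Tornado 5+ running on the asyncio event loop:

from concurrent.futures import ThreadPoolExecutor
from tornado.ioloop import IOLoop

executor = ThreadPoolExecutor(max_workers=1)

async def worker():
    ptm = ProcessToMonitor()
    ptm.start()
    while True:
        # the blocking multiprocessing queue.get runs in a thread pool,
        # and the coroutine awaits it without blocking the IOLoop
        message = await IOLoop.current().run_in_executor(executor, ptm.queue.get)
        MyWebSocket.emit(message)

# in __main__, instead of threading.Thread(target=worker).start():
# IOLoop.current().spawn_callback(worker)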

Threads persisting with irc.bot.SingleServerIRCBot (using with twitch)

What is the correct way to send a disconnect signal to a thread containing a SingleServerIRCBot?
I am instantiating bots that connect to twitch with
import threading
import irc.bot

class MyBot(irc.bot.SingleServerIRCBot):
    ...

bot = MyBot(...)

threads = []
t = threading.Thread(target=bot.start)
threads.append(t)
t.start()
When the stream no longer exists, no matter what I've tried, I haven't been able to get the thread to successfully end. How should I go about sending a signal to the thread that tells it to exit the channel, kill the bot, and then end itself?
The code for the .start method can be found here https://github.com/jaraco/irc/blob/master/irc/bot.py#L331
My first thought is to override that method with a while loop that has an exit condition. I haven't had any luck with that so far though.
Furthermore, there is a .die method here https://github.com/jaraco/irc/blob/master/irc/bot.py#L269 but how can I call that method when the thread is executing an infinite loop?
Trying to kill the threads directly ends up with them persisting, and eventually throwing errors about the total number of threads that my process is running.
Edit for the bounty: I would also accept an answer that describes a better way to handle multiple IRC bots at once.
I don't think you could (or should) kill a thread directly, but you could stop the task running on that thread. Then the thread would be inactive and you could remove it from the threads list, if you like. I'm not familiar with SingleServerIRCBot, but I'll use the class below as an example.
import time

class MyTask:
    def __init__(self):
        self._active = True

    def start(self):
        while self._active:
            print('running')
            time.sleep(1)

    def die(self):
        self._active = False
In Python 3, threads have a _target attribute, from which we can access the target function/method. We could use this attribute to access the target's object and call the die method (e.g. thread._target.__self__.die()). However, I think it would be best to subclass Thread and store the target object in an attribute, since _target is a private attribute, and also for compatibility reasons.
import threading

class MyThread(threading.Thread):
    def __init__(self, target, args=()):
        super(MyThread, self).__init__()
        self.target = target
        self.args = args

    def run(self):
        self.target.start(*self.args)

    def stop_task(self):
        self.target.die()
Using this class we would pass a MyTask object as a target, and the start method would be called from MyThread.run. Now we can use MyThread.stop_task to stop the task running on this thread.
o = MyTask()
t = MyThread(target=o)
t.start()
t.stop_task()
time.sleep(1.1)
print(t.is_alive())
Note that I'm waiting 1.1 sec to test if the thread is alive. That's because the target (MyTask.start) will take up to one second to stop. This method doesn't kill the thread, but calls MyTask.die and waits for the task to finish. If you want to end the task immediately (and lose any resources used by the task) you could use a Process and end it with .terminate. You should also choose multiprocessing over multithreading if your task performs more CPU operations than IO operations, because processes are not limited by the GIL.
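
A minimal sketch of that alternative, reusing the MyTask class from above; terminate() ends the child process immediately rather than waiting for die() to take effect:

from multiprocessing import Process
import time

task = MyTask()
p = Process(target=task.start)
p.start()

time.sleep(2)     # let the task run for a while
p.terminate()     # end the process immediately; the task gets no chance to clean up
p.join()
print(p.is_alive())  # False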
After studying the source code, I noticed that .die() calls sys.exit, so we can't use it to terminate the task, because it would stop the whole program. The reason seems to be that .start() calls the parent object's .start(), which then calls the .process_forever() method of a Reactor object. This method runs Reactor.process_once() in an infinite loop with no break condition.
A possible solution is to subclass SingleServerIRCBot and use a boolean variable to break the loop. This class should override .start() and .die() in order to stop the bot running on a thread. The .die() method would set the flag to False, and .start() would call Reactor.process_once() in a loop.
import irc.bot

class MyBot(irc.bot.SingleServerIRCBot):
    def __init__(self, channel, nickname, server, port=6667):
        super(MyBot, self).__init__([(server, port)], nickname, nickname)
        self.channel = channel
        self._active = True

    def start(self):
        self._connect()
        while self._active:
            self.reactor.process_once(timeout=0.2)

    def die(self, msg="Bye, cruel world!"):
        self.connection.disconnect(msg)
        self._active = False
Now we can stop the bot either by calling .stop_task() on the thread running the bot, or by calling the .die() method of the bot directly.
host, port = 'irc.freenode.net', 6667
nick = 'My-Bot'
channel = '#python'
bot = MyBot(channel, nick, host, port)
t = MyThread(bot)
t.start()
t.stop_task()
#bot.die()

Python - Combining multiprocessing and asyncio

I'm trying to combine multiprocessing with asyncio. The program has two main components - one which streams/generates content, and another that consumes it.
What I want to do is to create multiple processes in order to exploit multiple CPU cores - one for the stream listener/generator, another for the consumer, and a simple one to shut down everything when the consumer has stopped.
My approach so far has been to create the processes, and start them. Each such process creates an async task. Once all processes have started, I run the asyncio tasks. What I have so far (stripped down) is:
def consume_task(loop, consumer):
    loop.create_task(consume_queue(consumer))

def stream_task(loop, listener, consumer):
    loop.create_task(create_stream(listener, consumer))

def shutdown_task(loop, listener):
    loop.create_task(shutdown(consumer))

async def shutdown(consumer):
    print("Shutdown task created")
    while not consumer.is_stopped():
        print("No activity")
        await asyncio.sleep(5)
    print("Shutdown initiated")
    loop.stop()

async def create_stream(listener, consumer):
    stream = Stream(auth, listener)
    print("Stream created")
    stream.filter(track=KEYWORDS, is_async=True)
    await asyncio.sleep(EVENT_DURATION)
    print("Stream finished")
    consumer.stop()

async def consume_queue(consumer):
    await consumer.run()

loop = asyncio.get_event_loop()

p_stream = Process(target=stream_task, args=(loop, listener, consumer, ))
p_consumer = Process(target=consume_task, args=(loop, consumer, ))
p_shutdown = Process(target=shutdown_task, args=(loop, consumer, ))
p_stream.start()
p_consumer.start()
p_shutdown.start()

loop.run_forever()
loop.close()
The problem is that everything hangs (or does it block?) - no tasks are actually running. My solution was to change the first three functions to:
def consume_task(loop, consumer):
    loop.create_task(consume_queue(consumer))
    loop.run_forever()

def stream_task(loop, listener, consumer):
    loop.create_task(create_stream(listener, consumer))
    loop.run_forever()

def shutdown_task(loop, listener):
    loop.create_task(shutdown(consumer))
    loop.run_forever()
This does actually run. However, the consumer and the listener objects are not able to communicate. As a simple example, when the create_stream function calls consumer.stop(), the consumer does not stop. Even when I change a consumer class variable, the changes are not made - case in point, the shared queue remains empty. This is how I am creating the instances:
queue = Queue()
consumer = PrintConsumer(queue)
listener = QueuedListener(queue, max_time=EVENT_DURATION)
Please note that if I do not use processes, but only asyncio tasks, everything works as expected, so I do not think it's a reference issue:
loop = asyncio.get_event_loop()
stream_task(loop, listener, consumer)
consume_task(loop, consumer)
shutdown_task(loop, listener)
loop.run_forever()
loop.close()
Is it because they are running in different processes? How should I go about fixing this issue?
Found the problem! Multiprocessing creates copies of instances. The solution is to create a Manager, which shares the instances between processes.
EDIT [11/2/2020]:
import asyncio
from multiprocessing import Process, Manager

"""
These two functions will be created as separate processes.
"""
def task1(loop, shared_list):
    output = loop.run_until_complete(asyncio.gather(async1(shared_list)))

def task2(loop, shared_list):
    output = loop.run_until_complete(asyncio.gather(async2(shared_list)))

"""
These two functions will be called (in different processes) asynchronously.
"""
async def async1(shared_list):
    pass

async def async2(shared_list):
    pass

"""
Create the manager and, from it, a list that is shared by functions
running in different processes.
"""
manager = Manager()  # Manager() already starts the manager's server process
shared_list = manager.list()

loop = asyncio.get_event_loop()  # the event loop

"""
Create two processes.
"""
process1 = Process(target=task1, args=(loop, shared_list, ))
process2 = Process(target=task2, args=(loop, shared_list, ))

"""
Start the two processes and wait for them to finish.
"""
process1.start()
process2.start()

output1 = process1.join()
output2 = process2.join()

"""
Clean up
"""
loop.close()
manager.shutdown()
