Can I use a `multiprocessing.Queue` for communication within a process? - python

I'm using queues for inter-thread communication. I'm using multiprocessing.Queue() instead of queue.Queue() because the multiprocessing version exposes an underlying file descriptor which can be waited on with select.select - which means I can block waiting for an object in the queue or a packet to arrive on a network interface from the same thread.
But when I try to get an object from the queue, I get this:
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/queues.py", line 234, in _feed
    obj = _ForkingPickler.dumps(obj)
  File "/usr/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
TypeError: can't pickle _thread.lock objects
Is there a way to do this? Or am I stuck using queue.Queue() and having a separate thread select.select() on the sockets and put the results into the queue?
Edit: I think this is the minimal reproducible example:
import multiprocessing
import threading
queue = multiprocessing.Queue()
class Msg():
    def __init__(self):
        self.lock = threading.Lock()
def source():
    queue.put(Msg())
def sink():
    obj = queue.get()
    print("Got")
threading.Thread(target=sink).start()
source()
The problem is that the object I'm putting into the queue has a threading.Lock object as a field (at several levels of composition deep).

TL;DR: threading.Lock instances cannot be pickled, and pickle is used to serialize any object that is put on a multiprocessing.Queue instance. There is also very little value in passing an object to another thread via a multiprocessing.Queue, since the receiving thread gets what is effectively a new instance of that object, unless creating a copy is part of your goal. So if you do pass the object via such a queue, the lock cannot be part of the object's state and you need an alternate approach (see below).
The (much) Longer Answer
First, as your error message states, threading.Lock instances cannot be serialized with pickle. This is easily demonstrated:
>>> import pickle
>>> import threading
>>> lock = threading.Lock()
>>> serialized_lock = pickle.dumps(lock)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: cannot pickle '_thread.lock' object
Second, when you put an object on a multiprocessing.Queue instance, the object is serialized with pickle, and so you get the above exception.
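As a minimal sketch of what that serialization implies, the snippet below sends a plain list through a multiprocessing.Queue within a single process. The list is only a stand-in payload (an assumption made to keep the example picklable); what comes out of the queue is an equal but distinct object.

```python
import multiprocessing

# A plain list stands in for the message; any picklable object behaves the same way.
original = [1, 2, 3]

q = multiprocessing.Queue()
q.put(original)        # pickled in the background by the queue's feeder thread
received = q.get()     # unpickled into a brand-new object

same_value = (received == original)
same_object = (received is original)
print(same_value, same_object)

# Release the queue's pipe and wait for its feeder thread to finish.
q.close()
q.join_thread()
```

Because the receiver always works on a copy, any lock stored on the original object would not protect the copy anyway.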
But while your posting constitutes a minimal, complete example, it does not represent a realistic program that does anything useful. What are you actually trying to accomplish? Let's suppose you were able to serialize a lock and therefore pass an instance of Msg via a queue. Presumably the lock is there to serialize some code that updates the object's state. But since the received object is a different instance of Msg than the one that was put on the queue, the only meaningful use of this lock would be if the sink thread created additional threads that operated on this instance. So let's conjecture there is an attribute, x, that needs to be incremented in multiple threads. This requires a lock since the += operator is not atomic. Since the required lock cannot be part of the object's state if the object is passed via a queue, you have to create the lock separately. This is just one of many possible approaches:
import multiprocessing
import threading
queue = multiprocessing.Queue()
class Msg():
    def __init__(self):
        self.x = 0
    def set_lock(self, lock):
        self.lock = lock
    def compute(self):
        with self.lock:
            self.x += 1
def source():
    queue.put(Msg())
def sink():
    msg = queue.get()
    msg.set_lock(threading.Lock())
    t = threading.Thread(target=msg.compute)
    t.start()
    msg.compute()
    t.join()
    print(msg.x)
threading.Thread(target=sink).start()
source()
Prints:
2
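Another possible approach, sketched here under the assumption that a fresh lock on the receiving side is acceptable, is to leave the lock out of the pickled state and recreate it on unpickling. The __getstate__ and __setstate__ hooks are standard pickle protocol methods; the Msg class below is a hypothetical variant of the one above, and pickle.dumps/pickle.loads stand in for roughly the round trip a multiprocessing.Queue performs.

```python
import pickle
import threading

class Msg:
    def __init__(self):
        self.x = 0
        self.lock = threading.Lock()

    def __getstate__(self):
        # Copy the instance state and leave the unpicklable lock out of it.
        state = self.__dict__.copy()
        del state["lock"]
        return state

    def __setstate__(self, state):
        # Restore the pickled attributes and create a fresh lock for the copy.
        self.__dict__.update(state)
        self.lock = threading.Lock()

    def compute(self):
        with self.lock:
            self.x += 1

original = Msg()
original.compute()
clone = pickle.loads(pickle.dumps(original))  # stand-in for a trip through the queue
clone.compute()
print(original.x, clone.x)
```

The copy gets its own lock, so this only helps when the threads that need mutual exclusion all work on the received instance.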
If you are not using a multiprocessing.Queue for object passing (a queue.Queue does not pickle its items), then there is no problem having the lock as part of the object's initial state:
import queue
import socket
import os
import select
import threading
import time
class PollableQueue(queue.Queue):
    def __init__(self):
        super().__init__()
        # Create a pair of connected sockets
        if os.name == 'posix':
            self._putsocket, self._getsocket = socket.socketpair()
        else:
            # Compatibility on non-POSIX systems
            server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            server.bind(('127.0.0.1', 0))
            server.listen(1)
            self._putsocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            self._putsocket.connect(server.getsockname())
            self._getsocket, _ = server.accept()
            server.close()
    def fileno(self):
        # Lets select.select() wait on the queue like on any socket
        return self._getsocket.fileno()
    def put(self, item):
        super().put(item)
        self._putsocket.send(b'x')
    def get(self):
        self._getsocket.recv(1)
        return super().get()
class Msg:
    def __init__(self, q, socket):
        # An instance of this class could be passed via a queue.Queue.
        # A multiprocessing.Lock could also be used but is not
        # necessary if we are doing threading:
        self.lock = threading.Lock() # to be used by some method not shown
        self.q = q
        self.socket = socket
    def consume(self):
        while True:
            # Wait on the instance's own queue and socket
            can_read, _, _ = select.select([self.q, self.socket], [], [])
            for r in can_read:
                item = r.get() if isinstance(r, queue.Queue) else r.recv(3).decode()
                print('Got:', item, 'from', type(r))
# Example code that performs polling:
if __name__ == '__main__':
    q = PollableQueue()
    write_socket, read_socket = socket.socketpair()
    msg = Msg(q, read_socket)
    t = threading.Thread(target=msg.consume, daemon=True)
    t.start()
    # Feed data to the queues
    q.put('abc')
    write_socket.send(b'cde')
    write_socket.send(b'fgh')
    q.put('ijk')
    # Give consumer time to get all the items:
    time.sleep(1)
Prints (the interleaving of queue and socket items may vary with thread timing):
Got: abc from <class '__main__.PollableQueue'>
Got: ijk from <class '__main__.PollableQueue'>
Got: cde from <class 'socket.socket'>
Got: fgh from <class 'socket.socket'>

Related

Can I dynamically register objects to proxy with a multiprocessing BaseManager?

There are plenty of examples of using a multiprocessing BaseManager-derived class to register a method for returning a queue handle proxy, that clients can then use to pull/put from the queue.
This is great, but I have a different scenario - what if the number of queues that I need to proxy changes in response to outside events? What I really want is to proxy a method that returns a specific queue given a UID.
I tried this out but I couldn't get it to work, it appears that the only things that are available are what is registered with the class before the object is instantiated. I'm unable to BaseManager.register("my-new-queue", lambda: queue.Queue) once I've already instantiated an instance of that class and caused it to run.
Is there any way around this? It feels to me like we should be able to dynamically handle this
The registration is most important in the "server" process where the callable will actually get called. Registering a callable in a "client" process only adds that typeid (the string you pass to register) as a method to the manager class. The rub is that running the server blocks, preventing you from registering new callables, and it occurs in another process making it further difficult to modify the registry.
I've been tinkering with this for a little while, and in my opinion managers are cursed. I think the fix here would also have answered your prior question (aside from our discussion in the comments). Basically, Python tries to be a little bit secure by not sending the authkey parameter around with proxied objects, but it sometimes stumbles (particularly with nested proxies). The fix is to set the default authkey for the process, mp.current_process().authkey = b'abracadabra', which is used as the fallback when authkey=None (https://bugs.python.org/issue7503).
Here's my full testing script which is derived from the remote manager example from the docs. Basically I create a shared dict to hold shared queues:
#server process
from multiprocessing.managers import BaseManager, DictProxy
from multiprocessing import current_process
from queue import Queue
queues = {} #dict[uuid, Queue]
class QueueManager(BaseManager):
    pass
QueueManager.register('new_queue', callable=Queue)
QueueManager.register('get_queues', callable=lambda:queues, proxytype=DictProxy)
m = QueueManager(address=('localhost', 50000), authkey=b'abracadabra')
current_process().authkey = b'abracadabra'
s = m.get_server()
s.serve_forever()
#process A
from multiprocessing.managers import BaseManager
from multiprocessing import current_process
class QueueManager(BaseManager):
    pass
QueueManager.register('new_queue')
QueueManager.register('get_queues')
m = QueueManager(address=('localhost', 50000), authkey=b'abracadabra')
m.connect()
current_process().authkey = b'abracadabra'
queues_dict = m.get_queues()
queues_dict['my_uuid'] = m.new_queue()
queues_dict['my_uuid'].put("this is a test")
#process B
from multiprocessing.managers import BaseManager
from multiprocessing import current_process
class QueueManager(BaseManager):
    pass
QueueManager.register('new_queue')
QueueManager.register('get_queues')
m = QueueManager(address=('localhost', 50000), authkey=b'abracadabra')
m.connect()
current_process().authkey = b'abracadabra'
queues_dict = m.get_queues()
print(queues_dict['my_uuid'].get())
EDIT:
Regarding the comments: "get_queue take the UUID and return the specific queue" the modification is simple, and does not involve nested proxies thereby avoiding the digest auth issue:
#server process
from multiprocessing.managers import BaseManager
from collections import defaultdict
from queue import Queue
queues = defaultdict(Queue)
class QueueManager(BaseManager): pass
QueueManager.register('get_queue', callable=lambda uuid:queues[uuid])
m = QueueManager(address=('localhost', 50000), authkey=b'abracadabra')
s = m.get_server()
s.serve_forever()
#process A
from multiprocessing.managers import BaseManager
class QueueManager(BaseManager): pass
QueueManager.register('get_queue')
m = QueueManager(address=('localhost', 50000), authkey=b'abracadabra')
m.connect()
m.get_queue("my_uuid").put("this is a test")
#process B
from multiprocessing.managers import BaseManager
class QueueManager(BaseManager): pass
QueueManager.register('get_queue')
m = QueueManager(address=('localhost', 50000), authkey=b'abracadabra')
m.connect()
print(m.get_queue("my_uuid").get())
Aaron's answer is perhaps the simplest way here: you share a dictionary and store the queues in that shared dictionary. However, it does not solve the problem of not being able to update the methods on a manager once it has started. Therefore, here is a more complete solution, less verbose than its alternative, where you can update the registry even after the server has started:
from queue import Queue
from multiprocessing.managers import SyncManager, Server, State, dispatch
from multiprocessing.context import ProcessError
class UpdateServer(Server):
    public = ['shutdown', 'create', 'accept_connection', 'get_methods',
              'debug_info', 'number_of_objects', 'dummy', 'incref', 'decref', 'update_registry']
    def update_registry(self, c, registry):
        with self.mutex:
            self.registry.update(registry)
    def get_server(self):
        if self._state.value != State.INITIAL:
            if self._state.value == State.STARTED:
                raise ProcessError("Already started server")
            elif self._state.value == State.SHUTDOWN:
                raise ProcessError("Manager has shut down")
            else:
                raise ProcessError(
                    "Unknown state {!r}".format(self._state.value))
        return self._Server(self._registry, self._address,
                            self._authkey, self._serializer)
class UpdateManager(SyncManager):
    _Server = UpdateServer
    def update_registry(self):
        assert self._state.value == State.STARTED, 'server not yet started'
        conn = self._Client(self._address, authkey=self._authkey)
        try:
            dispatch(conn, None, 'update_registry', (type(self)._registry, ), {})
        finally:
            conn.close()
class MyQueue:
    def __init__(self):
        self.initialized = False
        self.q = None
    def initialize(self):
        self.q = Queue()
    def __call__(self):
        if not self.initialized:
            self.initialize()
            self.initialized = True
        return self.q
if __name__ == '__main__':
    # Create an object of wrapper class, note that we do not initialize the queue right away (it's unpicklable)
    queue = MyQueue()
    manager = UpdateManager()
    manager.start()
    # If you register new typeids, then call update_registry. The method_to_typeid parameter maps the __call__ method to
    # return a proxy of the queue instead since Queues are not picklable
    UpdateManager.register('uuid', queue, method_to_typeid={'__call__': 'Queue'})
    manager.update_registry()
    # Once the object is stored in the manager process, now we can safely initialize the queue and share
    # it among processes. Initialization is implicit when we call uuid() if it's not already initialized
    q = manager.uuid()
    q.put('bye')
    print(q.get())
Here, UpdateServer and UpdateManager add support for an update_registry method, which informs the server of any new typeids registered with the manager. MyQueue is simply a wrapper class that returns the newly registered queue when called directly. While it is functionally similar to registering lambda: queue, the wrapper is necessary because lambda functions are not picklable and the server is being started in a new process here (rather than calling server.serve_forever() in the main process, but you can do that too if you want).
So, you can now register typeids even after the manager process is running, just make sure to call the update_registry function right after. This function call will even work if you are starting the server in the main process itself (by using serve_forever, like in Aaron's answer) and connecting to it from another process using manager.connect.

How to run Python custom objects in separate processes, all working on a shared events queue?

I have 4 different Python custom objects and an events queue. Each object has a method that allows it to retrieve an event from the shared events queue, process it if the type is the desired one, and then put a new event on the same events queue, allowing other processes to process it.
Here's an example.
import multiprocessing as mp
class CustomObject:
    def __init__(self, events_queue: mp.Queue) -> None:
        self.events_queue = events_queue
    def process_events_queue(self) -> None:
        event = self.events_queue.get()
        if type(event) == SpecificEventDataTypeForThisClass:
            # do something and create a new_event
            self.events_queue.put(new_event)
        else:
            self.events_queue.put(event)
    # there are other methods specific to each object
These 4 objects have specific tasks to do, but they all share this same structure. Since I need to "simulate" the production condition, I want them all to run at the same time, independently of each other.
Here's just an example of what I want to do, if possible.
import multiprocessing as mp
import CustomObject
if __name__ == '__main__':
    events_queue = mp.Queue()
    data_provider = mp.Process(target=CustomObject, args=(events_queue,))
    portfolio = mp.Process(target=CustomObject, args=(events_queue,))
    engine = mp.Process(target=CustomObject, args=(events_queue,))
    broker = mp.Process(target=CustomObject, args=(events_queue,))
    while True:
        data_provider.process_events_queue()
        portfolio.process_events_queue()
        engine.process_events_queue()
        broker.process_events_queue()
My idea is to run each object in a separate process, allowing them to communicate with events shared through the events_queue. So my question is, how can I do that?
The problem is that obj = mp.Process(target=CustomObject, args=(events_queue,)) returns a Process instance and I can't access the CustomObject methods from it. Also, is there a smarter way to achieve what I want?
Processes require a function to run, which defines what the process is actually doing. Once this function exits (and there are no non-daemon threads) the process is done. This is similar to how Python itself always executes a __main__ script.
If you do mp.Process(target=CustomObject, args=(events_queue,)) that just tells the process to call CustomObject - which instantiates it once and then is done. This is not what you want, unless the class actually performs work when instantiated - which is a bad idea for other reasons.
Instead, you must define a main function or method that handles what you need: "communicate with events shared through the events_queue". This function should listen to the queue and take action depending on the events received.
A simple implementation looks like this:
import os, time
from multiprocessing import Queue, Process
class Worker:
    # separate input and output for simplicity
    def __init__(self, commands: Queue, results: Queue):
        self.commands = commands
        self.results = results
    # our main function to be run by a process
    def main(self):
        # each process should handle more than one command
        while True:
            value = self.commands.get()
            # pick a well-defined signal to detect "no more work"
            if value is None:
                self.results.put(None)
                break
            # do whatever needs doing
            result = self.do_stuff(value)
            print(os.getpid(), ':', self, 'got', value, 'put', result)
            time.sleep(0.2) # pretend we do something
            # pass on more work if required
            self.results.put(result)
    # placeholder for what needs doing
    def do_stuff(self, value):
        raise NotImplementedError
This is a template for a class that just keeps on processing events. The do_stuff method must be overridden in a subclass to define what actually happens.
class AddTwo(Worker):
    def do_stuff(self, value):
        return value + 2
class TimesThree(Worker):
    def do_stuff(self, value):
        return value * 3
class Printer(Worker):
    def do_stuff(self, value):
        print(value)
This already defines fully working process payloads: Process(target=TimesThree(in_queue, out_queue).main) schedules the main method in a process, listening for and responding to commands.
Running this mainly requires connecting the individual components:
if __name__ == '__main__':
    # bookkeeping of resources we create
    processes = []
    start_queue = Queue()
    # connect our workers via queues
    queue = start_queue
    for element in (AddTwo, TimesThree, Printer):
        instance = element(queue, Queue())
        # we run the main method in processes
        processes.append(Process(target=instance.main))
        queue = instance.results
    # start all processes
    for process in processes:
        process.start()
    # send input, but do not wait for output
    start_queue.put(1)
    start_queue.put(248124)
    start_queue.put(-256)
    # send shutdown signal
    start_queue.put(None)
    # wait for processes to shutdown
    for process in processes:
        process.join()
Note that you do not need classes for this. You can also compose functions for a similar effect, as long as everything is pickle-able:
import os, time
from multiprocessing import Queue, Process
def main(commands, results, do_stuff):
    while True:
        value = commands.get()
        if value is None:
            results.put(None)
            break
        result = do_stuff(value)
        print(os.getpid(), ':', do_stuff, 'got', value, 'put', result)
        time.sleep(0.2)
        results.put(result)
def times_two(value):
    return value * 2
if __name__ == '__main__':
    in_queue, out_queue = Queue(), Queue()
    worker = Process(target=main, args=(in_queue, out_queue, times_two))
    worker.start()
    for message in (1, 3, 5, None):
        in_queue.put(message)
    while True:
        reply = out_queue.get()
        if reply is None:
            break
        print('result:', reply)

Python Multiprocessing - TypeError: Pickling an AuthenticationString object is disallowed for security reasons

I'm having the following Problem. I want to implement a web crawler, so far this worked but it was so slow, that I tried to use multiprocessing for fetching the URLs.
Unfortunately I'm not very experienced at this field.
After some reading the easiest way seemed to me to use the map method from multiprocessing.pool for this.
But I constantly get the following error:
TypeError: Pickling an AuthenticationString object is disallowed for security reasons
I found very few cases with the same error and they unfortunately did not help me.
I created a stripped version of my code which can reproduce the error:
import multiprocessing
class TestCrawler:
    def __init__(self):
        self.m = multiprocessing.Manager()
        self.queue = self.m.Queue()
        for i in range(50):
            self.queue.put(str(i))
        self.pool = multiprocessing.Pool(6)
    def mainloop(self):
        self.process_next_url(self.queue)
        while True:
            self.pool.map(self.process_next_url, (self.queue,))
    def process_next_url(self, queue):
        url = queue.get()
        print(url)
c = TestCrawler()
c.mainloop()
I would be very thankful about any help or suggestion!
Question: But I constantly get the following error:
The error you're getting is misleading. The cause is this line:
self.queue = self.m.Queue()
Move the Queue instantiation outside the class TestCrawler.
This leads to another error:
NotImplementedError: pool objects cannot be passed between processes or pickled
The cause is this line:
self.pool = multiprocessing.Pool(6)
Both errors occur because pool.map pickles the bound method self.process_next_url, and with it the whole instance, including members (the manager and the pool) that cannot be pickled.
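One way to see where such errors come from: pool.map has to pickle the callable it sends to the workers, and pickling a bound method pickles the instance it belongs to, with every attribute. The sketch below uses a hypothetical Crawler class whose threading.Lock attribute is only a stand-in for an unpicklable member such as a manager or a pool; the names are illustrative and not taken from the question.

```python
import pickle
import threading

class Crawler:
    def __init__(self, with_lock):
        self.urls = ["a", "b"]
        if with_lock:
            # Stand-in for an unpicklable member such as a Pool or a Manager.
            self.resource = threading.Lock()

    def process(self, url):
        return url.upper()

# A bound method of a fully picklable instance survives the round trip.
restored = pickle.loads(pickle.dumps(Crawler(with_lock=False).process))
result = restored("x")

# With an unpicklable attribute on the instance, pickling the method fails.
try:
    pickle.dumps(Crawler(with_lock=True).process)
    error = None
except TypeError as exc:
    error = exc
print(result, type(error).__name__)
```

Keeping unpicklable resources out of the instance (or creating them locally where they are used) avoids the problem.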
Note: endless loop!
The following while loop never terminates and will overload your system.
Furthermore, each pool.map(...) call here submits only a single task, so only one worker process is busy at a time:
while True:
    self.pool.map(self.process_next_url, (self.queue,))
I suggest reading the documentation examples that demonstrate the use of a pool.
Change to the following:
import multiprocessing as mp
class TestCrawler:
    def __init__(self, tasks):
        # Assign the global task queue to a class member
        self.queue = tasks
        for i in range(50):
            self.queue.put(str(i))
    def mainloop(self):
        # Instantiate the pool locally
        pool = mp.Pool(6)
        for n in range(50):
            # .map requires an iterable of arguments, so pass None
            pool.map(self.process_next_url, (None,))
    # None is passed
    def process_next_url(self, dummy):
        url = self.queue.get()
        print(url)
if __name__ == "__main__":
    # Create the Queue as a global
    tasks = mp.Manager().Queue()
    # Pass the Queue to your class TestCrawler
    c = TestCrawler(tasks)
    c.mainloop()
This example starts 5 processes that share the 50 tasks (URLs) between them:
import multiprocessing as mp
class TestCrawler2:
    def __init__(self, tasks):
        self.tasks = tasks
    def start(self):
        pool = mp.Pool(5)
        pool.map(self.process_url, self.tasks)
    def process_url(self, url):
        print('self.process_url({})'.format(url))
if __name__ == "__main__":
    tasks = ['url{}'.format(n) for n in range(50)]
    TestCrawler2(tasks).start()
Tested with Python: 3.4.2

How does one pass semaphore locks and shared objects into threads?

How does one pass semaphore locks into threading.Thread objects in Python?
I am using locks to allow different threading.Thread classes to modify shared resources in their run functions. However, I'm unsure how to pass in the lock which is declared in a "main" file while the two threading objects are defined in separate files. For example:
Let's say I have one class producing data in producer.py:
import threading
class producer(threading.Thread):
    def __init__(self, lock, shared):
        self.lock = lock # Must be linked with global_lock
        self.shared = shared # Must also be linked with global_shared
        self.sock = # Declare listening socket
    def run(self):
        self.lock.acquire()
        self.shared = sock.recv_msg() # Get data from socket
        self.lock.release()
And another class consuming data in consumer.py:
import threading
class consumer(threading.Thread):
    def __init__(self, lock, shared):
        self.lock = lock # Must be linked with global_lock
        self.shared = shared # Must also be linked with global_shared
    def run(self):
        self.lock.acquire()
        print self.shared
        self.shared = None
        self.lock.release()
Then, in main.py:
import threading
from producer import producer
from consumer import consumer
global_lock = threading.Lock()
global_shared = {}
producer_thread = producer(global_lock, global_shared)
consumer_thread = consumer(global_lock, global_shared)
producer_thread.start()
consumer_thread.start()
I understand that I could use global variables and declare them at the beginning of the run functions in each class but this is undesired since the purpose of these classes are to be reusable.
How can I pass a shared lock between two threading.Thread objects such that their .run() functions use the shared lock and shared object. Please help. Thank you in advance!

Killing child processes created in class __init__ in Python

(New to Python and OO - I apologize in advance if I'm being stupid here)
I'm trying to define a Python 3 class such that when an instance is created two subprocesses are also created. These subprocesses do some work in the background (sending and listening for UDP packets). The subprocesses also need to communicate with each other and with the instance (updating instance attributes based on what is received from UDP, among other things).
I am creating my subprocesses with os.fork because I don't understand how to use the subprocess module to send multiple file descriptors to child processes - maybe this is part of my problem.
The problem I am running into is how to kill the child processes when the instance is destroyed. My understanding is I shouldn't use destructors in Python because stuff should get cleaned up and garbage collected automatically by Python. In any case, the following code leaves the children running after it exits.
What is the right approach here?
import os
from time import sleep
class A:
    def __init__(self):
        sfp, pts = os.pipe() # senderFromParent, parentToSender
        pfs, stp = os.pipe() # parentFromSender, senderToParent
        pfl, ltp = os.pipe() # parentFromListener, listenerToParent
        sfl, lts = os.pipe() # senderFromListener, listenerToSender
        pid = os.fork()
        if pid:
            # parent
            os.close(sfp)
            os.close(stp)
            os.close(lts)
            os.close(ltp)
            os.close(sfl)
            self.pts = os.fdopen(pts, 'w') # allow creator of A inst to
            self.pfs = os.fdopen(pfs, 'r') # send and receive messages
            self.pfl = os.fdopen(pfl, 'r') # to/from sender and
        else:                              # listener processes
            # sender or listener
            os.close(pts)
            os.close(pfs)
            os.close(pfl)
            pid = os.fork()
            if pid:
                # sender
                os.close(ltp)
                os.close(lts)
                sender(self, sfp, stp, sfl)
            else:
                # listener
                os.close(stp)
                os.close(sfp)
                os.close(sfl)
                listener(self, ltp, lts)
def sender(a, sfp, stp, sfl):
    sfp = os.fdopen(sfp, 'r') # receive messages from parent
    stp = os.fdopen(stp, 'w') # send messages to parent
    sfl = os.fdopen(sfl, 'r') # received messages from listener
    while True:
        # send UDP packets based on messages from parent and process
        # responses from listener (some responses passed back to parent)
        print("Sender alive")
        sleep(1)
def listener(a, ltp, lts):
    ltp = os.fdopen(ltp, 'w') # send messages to parent
    lts = os.fdopen(lts, 'w') # send messages to sender
    while True:
        # listen for and process incoming UDP packets, sending some
        # to sender and some to parent
        print("Listener alive")
        sleep(1)
a = A()
Running the above produces:
Sender alive
Listener alive
Sender alive
Listener alive
...
Actually, you should use destructors. Python objects have a __del__ method, which is called just before the object is garbage-collected.
In your case, you should define
    def __del__(self):
        ...
within your class A that sends the appropriate kill signals to your child processes. Don't forget to store the child PIDs in your parent process, of course.
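A minimal sketch of that idea, assuming a POSIX system where os.fork is available: the Parent class, its child_pids attribute and its close method are hypothetical names used only for illustration, and the children just sleep instead of handling UDP traffic. An explicit close method does the cleanup, with __del__ as a fallback, since Python does not guarantee that __del__ runs for objects still alive at interpreter exit.

```python
import os
import signal
import time

class Parent:
    """Hypothetical stand-in for class A: forks workers and remembers their PIDs."""

    def __init__(self, n_children=2):
        self.child_pids = []
        for _ in range(n_children):
            pid = os.fork()
            if pid == 0:
                # Child: pretend to work until killed, and never fall back
                # into the parent's code path.
                try:
                    while True:
                        time.sleep(1)
                finally:
                    os._exit(0)
            self.child_pids.append(pid)

    def close(self):
        # Terminate and reap every child; safe to call more than once.
        while self.child_pids:
            pid = self.child_pids.pop()
            try:
                os.kill(pid, signal.SIGTERM)
                os.waitpid(pid, 0)
            except (ProcessLookupError, ChildProcessError):
                pass  # the child is already gone

    def __del__(self):
        # Fallback only: not guaranteed to run at interpreter exit.
        self.close()

parent = Parent()
pids = list(parent.child_pids)
parent.close()
```

Calling close explicitly (or from a try/finally block) makes the cleanup deterministic instead of depending on when the garbage collector runs.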
As suggested here, you can create a child process using multiprocessing module with flag daemon=True.
Example:
from multiprocessing import Process
p = Process(target=f, args=('bob',))
p.daemon = True
p.start()
There's no point trying to reinvent the wheel. subprocess does all you want and more, though multiprocessing will simplify the process, so we'll use that.
You can use multiprocessing.Pipe to create a pair of connection objects and send messages back and forth between two processes. A pipe is duplex by default, so both ends can send and receive if that's what you need. You can use multiprocessing.Manager to create a shared Namespace between processes (sharing state between listener, sender and parent). There is a caveat when using a manager's list, dict or Namespace: changes made to a mutable object stored in one of them are not propagated to other processes until the object is reassigned to the managed object.
eg.
namespace.attr = {}
# a change made through a fetched copy is not propagated to other processes
attr = namespace.attr
attr["key"] = "value"
# reassign the modified object to push the change to other processes
namespace.attr = attr
If you need more than one process to write to the same attribute, then you will need synchronisation to prevent concurrent modification by one process from wiping out changes made by another process.
Example code:
from multiprocessing import Process, Pipe, Manager
class Reader:
    def __init__(self, writer_conn, namespace):
        self.writer_conn = writer_conn
        self.namespace = namespace
    def read(self):
        self.namespace.msgs_recv = 0
        with self.writer_conn:
            try:
                while True:
                    obj = self.writer_conn.recv()
                    self.namespace.msgs_recv += 1
                    print("Reader got:", repr(obj))
            except EOFError:
                print("Reader has no more data to receive")
class Writer:
    def __init__(self, reader_conn, namespace):
        self.reader_conn = reader_conn
        self.namespace = namespace
    def write(self, msgs):
        self.namespace.msgs_sent = 0
        with self.reader_conn:
            for msg in msgs:
                self.reader_conn.send(msg)
                self.namespace.msgs_sent += 1
def create_child_processes(reader, writer, msgs):
    p_write = Process(target=Writer.write, args=(writer, msgs))
    p_write.start()
    # This is very important, otherwise reader will hang after writer has finished.
    # The order of this statement, coming after p_write.start() but before
    # p_read.start(), is also important. Look up file descriptors and how they
    # are inherited by child processes on Unix, and how any valid fd to the
    # write side of a pipe will keep all read ends open
    writer.reader_conn.close()
    p_read = Process(target=Reader.read, args=(reader,))
    p_read.start()
    return p_read, p_write
def run_mp_pipe():
    manager = Manager()
    namespace = manager.Namespace()
    read_conn, write_conn = Pipe()
    reader = Reader(read_conn, namespace)
    writer = Writer(write_conn, namespace)
    p_read, p_write = create_child_processes(reader, writer,
                                             msgs=["hello", "world", {"key", "value"}])
    print("starting")
    p_write.join()
    p_read.join()
    print("done")
    print(namespace)
    assert namespace.msgs_sent == namespace.msgs_recv
if __name__ == "__main__":
    run_mp_pipe()
Output:
starting
Reader got: 'hello'
Reader got: 'world'
Reader got: {'key', 'value'}
Reader has no more data to receive
done
Namespace(msgs_recv=3, msgs_sent=3)
