Why does Python's multiprocessing Queue have a buffer and a Pipe?

Context
I have been looking at the source code for multiprocessing.Queue (Python 2.7) and have some questions.
A deque is used as a buffer: any item put on the Queue is appended to the deque, but get() reads from a pipe.
We can see that during put(), if the feeder thread has not been started yet, it is started.
The thread pops objects off the deque and sends them down the write side of the pipe.
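Roughly, the mechanism looks like this (a paraphrased sketch, not the actual CPython source; names loosely follow Lib/multiprocessing/queues.py):

def put(self, obj):
    # Lazily start the feeder thread on the first put()
    if self._thread is None:
        self._start_thread()
    with self._notempty:
        self._buffer.append(obj)   # buffer is a collections.deque
        self._notempty.notify()

# The feeder thread then, in essence, does:
def _feed(buffer, notempty, send):
    while True:
        with notempty:
            while not buffer:
                notempty.wait()
        send(buffer.popleft())     # pickle and write into the pipe's write end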
Questions
So, why use a deque and a pipe?
Couldn't one just use a deque (or any other data structure with FIFO behavior) and synchronize push and pop?
Likewise couldn't one also just use a Pipe, wrapping send and recv?
Maybe there is something here that I am missing but the feeder thread popping items and putting them on the Pipe seems like overkill.

The multiprocessing.Queue is a port of the standard Queue, capable of working across multiple processes; therefore, it tries to reproduce the same behaviour.
A deque is a double-ended queue with fast insertion/extraction on both sides and, in theory, unbounded size. It is very well suited for representing a stack or a queue. It does not work across different processes, though.
A Pipe works more like a socket and allows you to transfer data across processes. Pipes are operating-system objects and their implementation differs from OS to OS. Moreover, pipes have a limited size: if you fill a pipe, the next call to send will block until the other side drains it.
If you want to expose a Queue capable of working across multiple processes in a similar fashion to the standard one, you need the following features.
A buffer capable of storing messages, in arrival order, which have not been consumed yet.
A channel capable of transferring such messages across different processes.
Atomic put and get methods that leave the user in control of when to block the program flow.
The use of a deque, a thread, and a pipe is one of the simplest ways to deliver these features, but it is not the only one.
I personally prefer the use of bare pipes to let processes communicate as it gives me more control on my application.
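As an illustration of that bare-pipe style, a minimal sketch with one producer and one consumer, so no locking is needed (the names are made up):

import multiprocessing

def worker(conn):
    conn.send('hello from the child')   # may block once the pipe fills up
    conn.close()

if __name__ == '__main__':
    parent_conn, child_conn = multiprocessing.Pipe()
    p = multiprocessing.Process(target=worker, args=(child_conn,))
    p.start()
    print(parent_conn.recv())           # blocks until data arrives
    p.join()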

A deque can only live in one process's memory, so using it by itself to pass data between processes is impossible.
You could use just a Pipe, but then you would need to protect it with locks, and I guess this is why the deque was introduced.
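For illustration, a hypothetical lock-protected wrapper around one pipe end (not the stdlib implementation; the names are made up):

import multiprocessing

class LockedPipe(object):
    """One end of a pipe shared by several processes, guarded by a lock."""
    def __init__(self, conn, lock):
        self.conn = conn
        self.lock = lock

    def put(self, obj):
        with self.lock:              # serialize concurrent send() calls
            self.conn.send(obj)

    def get(self):
        with self.lock:              # serialize concurrent recv() calls
            return self.conn.recv()

reader, writer = multiprocessing.Pipe(duplex=False)
lock = multiprocessing.Lock()
shared_writer = LockedPipe(writer, lock)  # hand this to several producers

Note that put() here can block while holding the lock once the pipe's OS buffer fills up; buffering items in a deque and draining it from a feeder thread is how multiprocessing.Queue keeps put() from blocking.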

Related

Put large ndarrays fast to multiprocessing.Queue

When trying to put a large ndarray to a Queue in a Process, I encounter the following problem:
First, here is the code:
import numpy
import multiprocessing
from ctypes import c_bool
import time

def run(acquisition_running, data_queue):
    while acquisition_running.value:
        length = 65536
        data = numpy.ndarray(length, dtype='float')
        data_queue.put(data)
        time.sleep(0.1)

if __name__ == '__main__':
    acquisition_running = multiprocessing.Value(c_bool)
    data_queue = multiprocessing.Queue()
    process = multiprocessing.Process(
        target=run, args=(acquisition_running, data_queue))
    acquisition_running.value = True
    process.start()
    time.sleep(1)
    acquisition_running.value = False
    process.join()
    print('Finished')

    number_items = 0
    while not data_queue.empty():
        data_item = data_queue.get()
        number_items += 1
    print(number_items)
If I use length=10 or so, everything works fine. I get 9 items transmitted through the Queue.
If I increase to length=1000, on my computer process.join() blocks, although the function run() is already done. I can comment out the line with process.join() and will see that only 2 items were put in the Queue, so apparently putting data into the Queue became very slow.
My plan is actually to transport four ndarrays, each with length 65536. With a Thread this worked very fast (<1 ms). Is there a way to improve the speed of transmitting data between processes?
I used Python 3.4 on a Windows machine, but with Python 3.4 on Linux I get the same behavior.
"Is there a way to improve speed of transmitting data for processes?"
Surely, given the right problem to solve. Currently, you are just filling a buffer without emptying it simultaneously. Congratulations, you have just built yourself a so-called deadlock. The corresponding quote from the documentation is:
Bear in mind that a process that has put items in a queue will wait before terminating until all the buffered items are fed by the "feeder" thread to the underlying pipe.
But, let's approach this slowly. First of all, "speed" is not your problem! I understand that you are just experimenting with Python's multiprocessing. The most important insight when reading your code is that the flow of communication between parent and child and especially the event handling does not really make sense. If you have a real-world problem that you are trying to solve, you definitely cannot solve it this way. If you do not have a real-world problem, then you first need to come up with a good one before you should start writing code ;-). Eventually, you will need to understand the communication primitives an operating system provides for inter-process communication.
Explanation for what you are observing:
Your child process generates about 10 * length * size(float) bytes of data (considering that the child can perform about 10 iterations while the parent sleeps for about 1 s before setting acquisition_running to False). While your parent process sleeps, the child puts that amount of data into a queue. You need to appreciate that a queue is a complex construct. You do not need to understand every bit of it. But one thing really is important: a queue for inter-process communication uses some kind of buffer* that sits between parent and child. Buffers usually have a limited size. You are writing to this buffer from within the child without simultaneously reading from it in the parent. That is, the buffer contents steadily grow while the parent is just sleeping. By increasing length you run into the situation where the queue buffer is full and the child process cannot write to it anymore. However, the child process cannot terminate before it has written all its data. At the same time, the parent process waits for the child to terminate.
You see? One entity waits for the other. The parent waits for the child to terminate and the child waits for the parent to make some space. Such a situation is called deadlock. It cannot resolve itself.
Regarding the details, the buffer situation is a little more complex than described above. Your child process has spawned an additional thread that tries to push the buffered data through a pipe to the parent. The buffer of this pipe is actually the limiting entity. It is defined by the operating system and, at least on Linux, is usually not larger than 65536 bytes.
In other words, the essential part is: the parent does not read from the pipe while the child is still trying to write to it. In every meaningful scenario where pipes are used, reading and writing happen in a rather simultaneous fashion, so that one process can quickly react to input provided by another process. You are doing the exact opposite: you put your parent to sleep and thereby render it unresponsive to input from the child, resulting in a deadlock.
(*) "When a process first puts an item on the queue a feeder thread is started which transfers objects from a buffer into the pipe", from https://docs.python.org/2/library/multiprocessing.html
If you have really big arrays, you might want to only pass their pickled state -- or a better alternative might be to use multiprocessing.Array or multiprocessing.sharedctypes.RawArray to make a shared memory array (for the latter, see http://briansimulator.org/sharing-numpy-arrays-between-processes/). You have to worry about conflicts, as you'll have an array that's not bound by the GIL -- and needs locks. However, you only need to send array indices to access the shared array data.
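A minimal sketch of the RawArray approach (assuming float64 data and a single writer; the names producer and done_queue are illustrative):

import multiprocessing
import numpy
from multiprocessing.sharedctypes import RawArray

def producer(shared, done_queue):
    # Wrap the shared buffer in a numpy view; writes go straight to shared memory
    arr = numpy.frombuffer(shared, dtype=numpy.float64)
    arr[:] = numpy.arange(len(arr))
    done_queue.put(len(arr))        # only a small int travels through the pipe

if __name__ == '__main__':
    length = 65536
    shared = RawArray('d', length)  # 'd' = C double; no lock around it
    done_queue = multiprocessing.Queue()
    p = multiprocessing.Process(target=producer, args=(shared, done_queue))
    p.start()
    n = done_queue.get()            # cheap notification, not the data itself
    p.join()
    result = numpy.frombuffer(shared, dtype=numpy.float64)
    print(n, result[:3])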
One thing you could do to resolve the issue, in tandem with the excellent answer from JPG, is to drain your Queue before joining the process.
So do this instead:
process.start()
# Read from the queue before joining, so the feeder thread can flush its buffer
data_item = data_queue.get()
process.join()
While this does not fully replicate the behavior in the code (it drops the item counting), you get the idea ;)
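For completeness, a sketch that keeps the item counting from the original code (assuming Python 3, where the Empty exception lives in the queue module):

import queue

process.start()
time.sleep(1)
acquisition_running.value = False
number_items = 0
# Drain while the child is alive or data is still buffered, so the
# feeder thread can flush and the child can terminate.
while process.is_alive() or not data_queue.empty():
    try:
        data_item = data_queue.get(timeout=0.1)
        number_items += 1
    except queue.Empty:
        pass
process.join()
print(number_items)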
Another option: convert the array/list to a string before putting it on the queue:
q.put(str(your_array))

Concurrently searching a graph in Python 3

I'd like to create a small p2p application that concurrently processes incoming data from other known / trusted nodes (it mostly stores it in an SQLite database). In order to recognize these nodes, upon connecting, each node introduces itself and my application then needs to check whether it knows this node directly or maybe indirectly through another node. Hence, I need to do a graph search which obviously needs processing time and which I'd like to outsource to a separate process (or even multiple worker processes? See my 2nd question below). Also, in some cases it is necessary to adjust the graph, add new edges or vertices.
Let's say I have 4 worker processes accepting and handling incoming connections via asynchronous I/O. What's the best way for them to access (read / modify) the graph? A single queue obviously doesn't do the trick for read access because I need to pass the search results back somehow.
Hence, one way to do it would be another queue which would be filled by the graph searching process and which I could add to the event loop. The event loop could then pass the results to a handler. However, this event/callback-based approach would make it necessary to also always pass the corresponding sockets to the callbacks and thus to the Queue – which is nasty because sockets are not picklable. (Let alone the fact that callbacks lead to spaghetti code.)
Another idea that's just crossed my mind might be to create a pipe to the graph process for each incoming connection and then, on the graph's side, do asynchronous I/O as well. However, in order to avoid callbacks, if I understand correctly, I would need an async I/O library making use of yield from (i.e. tulip / PEP 3156). Are there other options?
Regarding async I/O on the graph's side: This is certainly the best way to handle many incoming requests at once but doing graph lookups is a CPU intensive task, thus could profit from using multiple worker threads or processes. The problem is: Multiple threads allow shared data but Python's GIL somewhat negates the performance benefit. Multiple processes on the other hand don't have this problem but how can I share and synchronize data between them? (For me it seems quite impossible to split up a graph.) Is there any way to solve this problem in a nice way? Also, does it make sense in terms of performance to mix asynchronous I/O with multithreading / multiprocessing?
Answering your last question: it does! But, IMHO, the real question is: does it make sense to mix events and threads? You can check this article about hybrid concurrency models: http://bibliotecadigital.sbc.org.br/download.php?paper=3027
My tip: start with just one process and an event loop, like in the tulip model. I'll try to explain how you can use tulip to have events + async I/O (and threads or other processes) without any callbacks at all.
You could have something like accept = yield from check_incoming(), where check_incoming is a tulip coroutine; inside this function you could use loop.run_in_executor() to run your graph search in a thread/process pool (more on this below). run_in_executor() returns a Future, on which you can also yield from tasks.wait([future_returned_by_run_in_executor], loop=self). The next step would be result = future_returned_by_run_in_executor.result(), and finally return True or False.
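A sketch of that flow using asyncio (the library tulip turned into), with Python 3.4-style generator coroutines; knows_node is a hypothetical stand-in for the real graph search:

import asyncio
from concurrent.futures import ProcessPoolExecutor

def knows_node(graph_data, node_id):
    # Hypothetical CPU-bound search; must be picklable for a process pool
    return node_id in graph_data

@asyncio.coroutine
def check_incoming(loop, executor, graph_data, node_id):
    # Off-load the search to the pool so the event loop stays responsive
    future = loop.run_in_executor(executor, knows_node, graph_data, node_id)
    result = yield from future
    return result

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    executor = ProcessPoolExecutor(max_workers=2)
    graph_data = {'node-a', 'node-b'}        # toy stand-in for the graph
    accepted = loop.run_until_complete(
        check_incoming(loop, executor, graph_data, 'node-a'))
    print(accepted)                          # True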
The process pool requires that only picklable objects can be executed and returned. This requirement is not a problem, but it implies that the graph operation must be self-contained in a function and must obtain the graph instance somehow. The thread pool has the GIL problem, since you mentioned CPU-bound tasks, which can lead to 'acquiring-GIL conflicts', although this was improved in the new Python 3.x GIL. Both solutions have limitations.
So, instead of a pool, you can have another single process with its own event loop just to manage all the graph work, and connect both processes with a Unix domain socket, for instance.
This second process, just like the first one, must also accept incoming connections (but now they are from a known source) and can use a thread pool just like I said earlier, but it won't "conflict" with the first event-loop process (the one that handles external clients), only with the second event loop. Threads sharing the same graph instance require some locking/unlocking.
Hope it helped!

Is multiprocessing the right tool for me?

I need to write a very specific data processing daemon.
Here is how I thought it could work with multiprocessing :
Process #1: One process to fetch some vital meta data, they can be fetched every second, but those data must be available in process #2. Process #1 writes the data, and Process #2 reads them.
Process #2: Two processes which will fetch the real data based on what has been received in process #1. Fetched data will be stored into a (big) queue to be processed "later"
Process #3: Two (or more) processes which poll the queue created in Process #2 and process those data. Once done, a new queue is filled up to be used in Process #4
Process #4 : Two processes which will read the queue filled by Process(es) #3 and send the result back over HTTP.
The idea behind all these different processes is to specialize them as much as possible and to make them as independent as possible.
All those processes will be wrapped into a main daemon which is implemented here:
http://www.jejik.com/articles/2007/02/a_simple_unix_linux_daemon_in_python/
I am wondering if what I have imagined is relevant/stupid/overkill/etc., especially if I run daemonic multiprocessing.Process(es) within a main parent process which will itself be daemonized.
Furthermore, I am a bit concerned about potential locking problems. In theory, processes that read and write data use different variables/structures, so that should avoid a few problems, but I am still concerned.
Maybe using multiprocessing for my context is not the right thing to do. I would love to get your feedback about this.
Notes :
I can not use Redis as a data structure server
I thought about using ZeroMQ for IPC but I would avoid using another extra library if multiprocessing can do the job as well.
Thanks in advance for your feedback.
Generally, your division into different workers with different tasks, as well as your plan to let them communicate, already looks good. However, one thing you should be aware of is whether each processing step is I/O- or CPU-bound. If you are I/O-bound, I'd go for the threading module whenever you can: the memory footprint of your application will be smaller and the communication between threads can be more efficient, as shared memory is allowed. Only if you need additional CPU power should you go for multiprocessing. In your system, you can use both (it looks like process #3 (or more) will do some heavy computing, while the other workers will predominantly be I/O-bound).
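As a rough sketch of that pipeline shape with multiprocessing queues (the stage names and payloads are made up):

import multiprocessing

def fetcher(raw_queue):
    # Stand-in for process #2: fetch data and queue it for processing
    for i in range(5):
        raw_queue.put({'id': i})
    raw_queue.put(None)              # sentinel: no more data

def processor(raw_queue, result_queue):
    # Stand-in for process #3: consume, process, pass along
    while True:
        item = raw_queue.get()
        if item is None:
            result_queue.put(None)   # forward the sentinel downstream
            break
        item['processed'] = True
        result_queue.put(item)

if __name__ == '__main__':
    raw_queue = multiprocessing.Queue()
    result_queue = multiprocessing.Queue()
    stages = [
        multiprocessing.Process(target=fetcher, args=(raw_queue,)),
        multiprocessing.Process(target=processor, args=(raw_queue, result_queue)),
    ]
    for p in stages:
        p.start()
    while True:                      # stand-in for process #4: ship results
        result = result_queue.get()
        if result is None:
            break
        print(result)
    for p in stages:
        p.join()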

What's the advantage of queues over pipes when communicating between processes?

What would be the advantage(s) (if any) of using 2 Queues over a Pipe to communicate between processes?
I am planning on using the multiprocessing python module.
The big win is that queues are process- and thread-safe. Pipes are not: if two different processes try to read from or write to the same end of a pipe, bad things happen. Queues are also at a somewhat higher level of abstraction than pipes, which may or may not be an advantage in your specific case.
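For example, any number of producers can safely share one Queue with no extra locking (a minimal sketch; the worker function is made up):

import multiprocessing

def worker(q, n):
    q.put(n)   # safe from any number of processes concurrently

if __name__ == '__main__':
    q = multiprocessing.Queue()
    procs = [multiprocessing.Process(target=worker, args=(q, n))
             for n in range(4)]
    for p in procs:
        p.start()
    results = sorted(q.get() for _ in procs)
    for p in procs:
        p.join()
    print(results)   # [0, 1, 2, 3]

Doing the same with a single Pipe end would require wrapping every send() in a shared lock.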
A queue holds its messages and retains them until they are consumed, regardless of whether the pipe or connection is broken in the meantime; with a raw pipe/connection, a broken connection means goodbye to the message, with an error.

Implementing a buffer-like structure in Python

I'm trying to write a small WSGI application which will put some objects on an external queue after each request. I want to do this in batches, i.e. have the webserver put the object into a buffer-like structure in memory, and have another thread and/or process send these objects to the queue in batches, when the buffer is big enough or after a certain timeout, and then clear the buffer. I don't want to succumb to NIH syndrome, but I also don't want to bother with threading code myself; however, I could not find suitable existing code for this job. Any suggestions?
Examine https://docs.python.org/library/queue.html to see if it meets your needs.
Since you write "thread and/or process", see also multiprocessing.Queue and multiprocessing.JoinableQueue from 2.6. Those are interprocess variants of Queue.
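A minimal sketch of that batching pattern with the stdlib queue module (assuming Python 3; flush_batch is a hypothetical stand-in for the call to your external queue):

import queue
import threading
import time

def flush_batch(batch):
    # Hypothetical stand-in for the external-queue send
    print('flushing %d items' % len(batch))

def batcher(buf, batch_size=10, timeout=5.0):
    batch = []
    deadline = time.time() + timeout
    while True:
        remaining = max(0.0, deadline - time.time())
        try:
            batch.append(buf.get(timeout=remaining))
        except queue.Empty:
            pass  # the timer expired
        if len(batch) >= batch_size or (batch and time.time() >= deadline):
            flush_batch(batch)
            batch = []
            deadline = time.time() + timeout
        elif not batch:
            deadline = time.time() + timeout  # nothing to flush; reset timer

buf = queue.Queue()
threading.Thread(target=batcher, args=(buf,), daemon=True).start()
# Each WSGI request handler then just calls: buf.put(obj)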
Use a buffered stream if you are using Python 3.0.
