I use a list of processes, each with its own queue. Another thread fills these queues one after the other, and the processes fetch the data from them. The problem is that after a while the queues raise an Empty exception from within the processes, while the thread gets a Full exception. When I check the queue size, it is consistent with the exceptions.
To make it worse, this behavior can only be reproduced as part of a large code base; I can't generate a small program that reproduces it.
Has anyone had similar issues with multiprocessing queues being inconsistent across processes?
Edit
To add more detail to the description of the pipeline: I have multiple worker objects; each worker has an input queue (multiprocessing.Queue), a worker queue (multiprocessing.Queue), an output queue (threading.Queue), a worker process (multiprocessing.Process), and a manager thread (threading.Thread).
Serving all these workers, I have a single feeder thread (threading.Thread) that adds sample identifiers to the input queues of all workers, one by one. The sample identifiers are very small (file paths), so the feeder thread can keep up with the processes.
The worker gets the sample identifiers from the input queue, reads those samples, processes them, and puts them into the worker queue one by one. The manager thread reads the data off the worker queue and puts it into the output queue, because multiprocessing.Queue is slower to read from.
All .get() and .put() calls have timeouts, and I keep track of how long it takes to get new data from this pipeline. I also have mechanisms for closing the pipeline and reopening it, by joining all processes and threads (even for the queues) and then recreating all of them from scratch. When everything is working, the main process goes over the workers and reads the data off their output queues one by one; most of the time it takes only a few ms to read new data.
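For concreteness, a rough sketch of one such worker unit (illustrative names only, not my actual code; process_fn stands in for the per-sample work):

import multiprocessing as mp
import queue
import threading

def _work(input_queue, worker_queue, process_fn, timeout):
    # Runs in the worker process: read a sample id, load/process the
    # sample, hand it to the manager thread via the worker queue.
    while True:
        sample_id = input_queue.get(timeout=timeout)
        worker_queue.put(process_fn(sample_id), timeout=timeout)

class Worker:
    def __init__(self, process_fn, timeout=5.0):
        self.input_queue = mp.Queue()       # sample identifiers (file paths)
        self.worker_queue = mp.Queue()      # processed samples
        self.output_queue = queue.Queue()   # cheap reads for the main process
        self.process = mp.Process(
            target=_work,
            args=(self.input_queue, self.worker_queue, process_fn, timeout))
        self.manager = threading.Thread(
            target=self._manage, args=(timeout,), daemon=True)

    def _manage(self, timeout):
        # Drain the slow multiprocessing.Queue into a threading.Queue so
        # the main process can read with low latency.
        while True:
            self.output_queue.put(self.worker_queue.get(timeout=timeout))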
This whole pipeline exists twice in my code (used for machine learning with TensorFlow). One instance is used for training and is created near the beginning of the program; the other is used for testing. The second instance is created after a while of training; it goes over my whole dataset and then resets. When the second instance is run for the second time, it gets stuck after 1000 samples or so. When it is stuck and I break into debug mode in the main process, I see that the input queue is full and the worker and output queues are empty. When I then break inside one of the worker processes, I see that its input queue is empty. It seems like the worker process somehow sees a different input queue than it should. Note that this is not some race issue, because the result is stable.
Edit 2
I zeroed in on the point where the program hangs: it seems to be json.loads() on data read from a file. This means the problem is different from what I originally described: the processes hang, rather than seeing an empty queue.
Code for opening the file:

import json

with open(json_path, 'r') as f:
    data = f.read()
json_data = json.loads(data)  # <== program hangs at this line
I tried using signal.alarm to pinpoint where inside json.loads() the program hangs, but the handler's exception is never raised. The problem is reproduced with a single multiprocessing.Process as well, but not when all processing is done in the main process.
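For reference, the alarm-based probe looked roughly like this (a sketch of the approach, not the actual code; the 30-second timeout is arbitrary):

import signal

def _alarm_handler(signum, frame):
    # Raising here should surface a traceback showing where the
    # interpreter was stuck when the alarm fired.
    raise TimeoutError("json.loads() did not return in time")

signal.signal(signal.SIGALRM, _alarm_handler)
signal.alarm(30)  # fire if the next call takes longer than 30 s
try:
    json_data = json.loads(data)
finally:
    signal.alarm(0)  # cancel the pending alarm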
Does this ring a bell with anyone?
Related
In the following snippet, as I understand it, a pool of two processes is created, and the main script then enters an infinite loop, continuously checking for messages and delegating the task to some function action_fn whenever it finds one.
from multiprocessing import Pool

p = Pool(processes=2)
while True:
    message = receive_message_from_queue()
    if message is not None:
        # Do some task
        p.map_async(action_fn, [temp_iterables])
What would happen here if there are 100 messages in the queue? Will Python create 100 processes? Or will only two messages be processed at any given time? Also, in a case such as this, what is the way to kill a process when its task is done and recreate it when there is a new message?
The Pool of Workers is a design pattern which aims to separate the service logic from the business logic.
Service logic means all the logic needed to support a given task, such as data storage and retrieval, metrics, logging, and error handling.
Business logic refers instead to the components which do the "actual job", such as enriching or transforming the data, generating statistics, etc.
It is usually implemented by adopting the Publisher/Subscriber design pattern, where one or more workers listen to a queue of jobs which is fed from the service side.
Most Pool implementations require the user to set a static number of workers at declaration time. Some more advanced ones allow the number of workers to be changed dynamically.
Jobs can be scheduled in a non-blocking (asynchronous) fashion, allowing the service to continue its execution flow, or in a blocking (synchronous) mode, stopping execution until the results are ready.
In your specific example, you are declaring a Pool with 2 workers. Assuming you are using the multiprocessing.Pool class, the interpreter will start 2 processes which will wait for new jobs. When you call map_async, the iterable gets split into multiple chunks which are enqueued in the Pool's internal queue. The workers pick up the chunks in the order they arrive, run the action_fn function against them, and publish the results on a second results queue which gets consumed by the service.
Multiple calls to map_async result in more chunks getting appended to the internal queue. Virtually, the queue is infinite in size. Practically, if you manage to fill it up, the subsequent call to map_async will block until the workers free some space for new jobs to be enqueued.
You don't need to "kill the process when it is done", as the Pool manages the workflow for you transparently. Concretely, the process never dies. It simply picks the next task from the queue and executes it until no more tasks are available. At that point, it sleeps until either new tasks are scheduled or the Pool itself is terminated.
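To make this concrete, here is a minimal stand-alone demo (hypothetical, not your code): no matter how many jobs are queued, only the two pool workers ever execute them, and they are reused rather than recreated:

from multiprocessing import Pool
import os
import time

def action_fn(message):
    time.sleep(0.01)        # stand-in for "Do some task"
    return os.getpid()      # report which worker handled the job

if __name__ == '__main__':
    with Pool(processes=2) as p:
        results = [p.map_async(action_fn, [i]) for i in range(100)]
        pids = {r.get()[0] for r in results}
    print(pids)  # at most 2 distinct PIDs: the 100 jobs queued up, 2 at a time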
The problem:
When sending 1000 tasks to apply_async, they run in parallel on all 48 CPUs, but sometimes fewer and fewer CPUs stay busy, until only one CPU is left running, and only when that last one finishes its task do all the CPUs start running again, each with a new task. It shouldn't need to wait for any "task batch" like this.
My (simplified) code:
from multiprocessing import Pool
pool = Pool(47)
tasks = [pool.apply_async(json2features, (j,)) for j in jsons]
feats = [t.get() for t in tasks]
jsons = [...] is a list of about 1000 JSONs already loaded into memory and parsed into objects.
json2features(json) does some CPU-heavy work on a json and returns an array of numbers.
This function may take between 1 second and 15 minutes to run, and because of this I sort the jsons using a heuristic, so that the longest tasks are (hopefully) first in the list and thus start first.
The json2features function also prints when a task is finished and how long it took. It all runs on an Ubuntu server with 48 cores, and like I said above, it starts out great, using all 47 workers. Then, as tasks get completed, fewer and fewer cores run, which would sound perfectly OK, were it not that after the last core finishes (when I see its print to stdout), all the CPUs start running again on new tasks, meaning it wasn't really the end of the list. It may do the same thing again, and then again, until the actual end of the list.
Sometimes it can be using just one core for 5 minutes, and when that task is finally done, it starts using all cores again on new tasks. (So it's not stuck on some IPC overhead.)
I was suspicious that the problem was that a worker doesn't get released until get is called on its result, so I tried the following code:
from multiprocessing import Pool
pool = Pool(47)
tasks = [pool.apply_async(print, (i,)) for i in range(1000)]
# feats = [t.get() for t in tasks]
And it does print all 1000 numbers, even though get isn't called.
I have run out of ideas as to what the problem might be.
Is this really the normal behavior of Pool?
Thanks a lot!
The multiprocessing.Pool relies on a single os.pipe to deliver the tasks to the workers.
Usually on Unix, the default pipe size ranges from 4 to 64 KB. If the JSONs you are delivering are large, you might get the pipe clogged at any given point in time.
This means that, while one of the workers is busy reading the large JSON from the pipe, all the other workers will starve.
It is generally a bad practice to share large data via IPC as it leads to bad performance. This is even underlined in the multiprocessing programming guidelines.
Avoid shared state
As far as possible one should try to avoid shifting large amounts of data between processes.
Instead of reading the JSON files in the main process, just send the workers their file names and let them open and read the content. You will surely notice an improvement in performance, because you move the JSON loading phase into the concurrent domain as well.
Note that the same is true for the results: a single os.pipe is used to return the results to the main process as well. If one or more workers clog the results pipe, then all the processes will end up waiting for the main one to drain it. Large results should be written to files as well. You can then leverage multithreading in the main process to quickly read the results back from those files.
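A minimal sketch of that suggestion, assuming the JSONs live on disk, json2features is the function from the question, and json_paths is a hypothetical list of their file paths:

from multiprocessing import Pool
import json

def json2features_from_path(json_path):
    # Only the short path string travels over the Pool's pipe; the
    # large JSON is loaded and parsed inside the worker itself.
    with open(json_path, 'r') as f:
        j = json.load(f)
    return json2features(j)

if __name__ == '__main__':
    with Pool(47) as pool:
        feats = pool.map(json2features_from_path, json_paths)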
I have a requirement to retrieve and replay a trace file (in Python) containing transactions from different processes. We have to simulate the original scenarios in the trace file, so we first separate the trace file into different pieces, each of which contains only the transactions from a single process, and replay them in parallel. Furthermore, in order to maintain the same ordering of the different processes' transactions as reflected in the original trace, a series of multiprocessing.Event primitives are inserted at specific points of each piece for interprocess synchronization.
Our program mainly handles 2 steps asynchronously, retrieving and replaying, and since the trace file is very big, we process it chunk by chunk:
The main (parent) process retrieves the trace file chunk by chunk. For each chunk, the different pieces are generated and the multiprocessing.Event primitives are inserted for interprocess synchronization in the later replay. The main process maintains an Event() list; whenever a process-intertwined point is detected, the list grows via eventList.append(Event()), and the newly appended Event() is referenced by an eventList[i].set / eventList[i].wait pair inserted at specific points in 2 pieces. As soon as the first chunk is processed, child processes are spawned and the pieces are distributed to their respective child processes for replaying. Later retrieved chunks are put on a queue for the respective children.
Child processes are spawned after the first chunk is retrieved by the parent. Each child process replays one single piece, which contains the transactions from the same original process.
The problem here is that the child processes are spawned just after the first chunk is retrieved, so the eventList at that point in time is copied to the child processes. That is fine for the synchronized replay of the first chunk, but the main process then continues with the second chunk, and the Events appended to eventList for the second chunk are never seen by the child processes, so when the second chunk is later replayed, the program fails.
I realize that the multiprocessing.Manager-backed list can share memory between different processes, but it seems the list cannot accommodate Event(): the exception "RuntimeError: Semaphore objects should only be shared between processes through inheritance" pops up when appending an Event() to a Manager().list(). I also tried auto-generating a new Manager().Event() whenever one is needed, but that seems to degrade performance dramatically.
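A minimal repro of that exception (a sketch; the names mirror the description above):

from multiprocessing import Manager, Event

manager = Manager()
eventList = manager.list()
eventList.append(Event())  # raises the RuntimeError quoted above: the Event
                           # cannot be pickled into the managed list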
Does somebody know how I can make the child processes see such changes to the multiprocessing.Event list in the parent? Or do you have any other method for achieving this?
Thanks in advance.
While using the multiprocessing module in Python, is there a way to prevent the OS from switching away from a process to another process for a certain time?
I have ~50 different child processes spawned to retrieve data from a database (one process per table in the DB), and after querying and filtering the data, I try to write the output to an Excel file.
Since all the processes follow similar steps, they all reach the writing step at similar times, and of course, since I am writing to a single file, I have a lock that prevents multiple processes from writing to the file.
The problem, though, is that the writing seems to take very long compared to when I wrote the same amount of data in a single process (slower by at least 10x).
I am guessing one of the reasons could be that while writing, the CPU is constantly switching to other processes, which are all stuck at the mutex lock, only to come back to the one process that is actually active. I am guessing the context switching is a significant waste of time, since there are a lot of processes to switch back and forth between.
I was wondering if there is a way to lock a process such that, for a certain part of the code, no context switching between processes happens.
Or any other suggestions to speed up this process?
Don't use locking, and don't write from multiple processes; let the child processes return the output to the parent (e.g. via standard output), and have it wait for the processes to join in order to read it. I'm not 100% sure about the multiprocessing API, but you could just have the parent process sleep, wait for a SIGCHLD, and only then read data from an exited child's standard output and write it to your output file.
This way only one process writes to the file, and you don't need any busy looping or the like. It will be much simpler and much more efficient.
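A sketch of that single-writer idea, using multiprocessing.Pool return values instead of raw SIGCHLD handling (query_table, fetch_and_filter, table_names, and write_excel are hypothetical stand-ins for the code described in the question):

from multiprocessing import Pool

def query_table(table_name):
    # Hypothetical worker: query and filter one table, then RETURN the
    # rows to the parent instead of writing them to the file itself.
    rows = fetch_and_filter(table_name)  # assumed helper
    return table_name, rows

if __name__ == '__main__':
    with Pool() as pool:
        results = pool.map(query_table, table_names)
    # Only the parent touches the file, so no lock and no contention.
    write_excel('output.xlsx', results)  # assumed helper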
You can raise the priority of your process (go to Task Manager, right-click on the process, and raise its priority). However, the OS will context-switch no matter what; your process has no better claim than other processes on the OS.
I'm fairly familiar with the python multiprocessing module, but I'm unsure of how to implement this setup. My project has this basic flow:
request serial device -> gather response -> parse response -> assert response -> repeat
It is right now a sequential operation that loops over these steps until it has gathered the desired number of asserted responses. I was hoping to speed this task up by having a 'master process' do the first two operations, and then pass off the parsing and assertion tasks to a queue of worker processes. However, this is only beneficial if the master process is ALWAYS running. I'm guaranteed to be working on a multi-core machine.
Is there any way to have a process in the multiprocessing module always have focus / keep running, so I can achieve this?
From what I can gather (assuming you don't have a stringent requirement that the master is always logging the data from the serial device), you just want the master to be ready to give any worker a chunk of data and to receive data from any worker as soon as that worker is ready.
To achieve this, use two queues and multiprocessing:
Multiprocessing Queue in Python
How to use multiprocessing queue in Python?
This should be sufficient for your needs if time(parse data) >> time(gather data).
Here's one way to implement your workflow:
- Have two multiprocessing.Queue objects: tasks_queue and results_queue. The tasks_queue will hold device outputs, and results_queue will hold the results of the assertions.
- Have a pool of workers, where each worker pulls device output from tasks_queue, parses it, asserts, and puts the result of the assertion on the results_queue.
- Have another process continuously polling the device and putting device output on the tasks_queue.
- Have one last process continuously polling results_queue, and ending the overall program when the desired number of results (successful assertions) is reached.
Total number of processes (multiprocessing.Process objects) is 2 + k, where k is the number of workers in the pool.
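A rough sketch of that layout, assuming hypothetical helpers request_serial_device(), parse_response(), and assert_response() for the steps named in the question (here, for simplicity, the parent process plays the role of the final results poller):

from multiprocessing import Process, Queue

def worker(tasks_queue, results_queue):
    # Pull raw device output, parse and assert it, publish the verdict.
    for raw in iter(tasks_queue.get, None):       # None = shutdown sentinel
        results_queue.put(assert_response(parse_response(raw)))

def poller(tasks_queue, n_samples, n_workers):
    # Continuously poll the device and feed the workers.
    for _ in range(n_samples):
        tasks_queue.put(request_serial_device())
    for _ in range(n_workers):
        tasks_queue.put(None)                     # one sentinel per worker

if __name__ == '__main__':
    k, desired = 4, 100                           # arbitrary example numbers
    tasks_queue, results_queue = Queue(), Queue()
    procs = [Process(target=worker, args=(tasks_queue, results_queue))
             for _ in range(k)]
    procs.append(Process(target=poller, args=(tasks_queue, 10 * desired, k)))
    for p in procs:
        p.start()
    passed = 0
    while passed < desired:                       # stop once enough assertions pass
        if results_queue.get():
            passed += 1
    for p in procs:
        p.terminate()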