(Language is Python 3)
I am writing a program with the multiprocessing module and using Pool. I need a variable that is shared between all of the processes. The parent process will initialize this variable and pass it as an argument to p.map(), and I want the child processes to change it. The reason is that the first part of each child process's work should be done in parallel (computational work that doesn't need any other process's data), but the second part needs to be done in order, one process after another, because they are writing to a file and the contents of that file should be in order. I want each process to wait until the others are done before moving on. I will record the "progress" of the entire program with the variable, e.g. when the first process is done writing to the file, it will increment the variable by one; that is the signal to the next process in line to begin writing to the file. But I need some sort of waituntil() function to make the processes wait until the Value variable indicates that it is their "turn" to write to the file.
Here are my two problems:
I need a variable that the child processes can edit, and the child processes can actually get the value of that variable. What type of variable should I use? Should I use Value, Manager, or something else?
I need the processes to wait until the variable described above equals a certain value, signaling that it is their turn to write to the file. Is there any sort of waituntil() function that I can use?
What you are looking for is called Synchronization.
There are multitudes of different synchronization primitives to choose from.
You should never attempt to write synchronization primitives on your own, as it is non-trivial to do correctly!
In your case either an Event or a Condition might be suitable.
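For reference, here is a minimal sketch of the Condition-based approach. The names init_worker and work, the pool size, and the task count are made up for illustration, and since synchronization primitives cannot be passed through map() itself, they are handed to the workers via the Pool initializer:
import multiprocessing as mp

def init_worker(shared_turn, shared_cond):
    # Pool workers can't receive Value/Condition objects through map() arguments,
    # so they are installed as globals here via the Pool initializer.
    global turn, cond
    turn = shared_turn
    cond = shared_cond

def work(index):
    # ... first part: parallel computation that needs no other process's data ...
    with cond:
        # Block until the shared counter says it is this task's turn to write.
        cond.wait_for(lambda: turn.value == index)
        # ... second part: write this task's output to the file, in order ...
        turn.value += 1
        cond.notify_all()   # wake the other workers so they re-check the counter

if __name__ == "__main__":
    turn = mp.Value("i", 0)   # shared progress counter
    cond = mp.Condition()     # lets workers wait for / signal changes to it
    with mp.Pool(4, initializer=init_worker, initargs=(turn, cond)) as pool:
        pool.map(work, range(8))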
I'm writing software in Python (3.7) that involves one main GUI thread and multiple child processes that are each operating as state machines.
I'd like the child processes to publish their current state machine state name so the main GUI thread can check on what state the state machines are in.
I want to find a way to do this such that if the main process and the child process were trying to read/write to the state variable at the same time, the main process would immediately (with no locking/waiting) get a slightly out-of-date state, and the child process would immediately (with no locking/waiting) write the current state to the state variable.
Basically, I want to make sure the child process doesn't get any latency/jitter due to simultaneous access of the state variable, and I don't care if the GUI gets a slightly outdated value.
I looked into:
- using a queue.Queue with a maxsize of 1, but the behavior of queue.Queue is to block if the queue runs out of space. It would work for my purposes if it behaved like a collections.deque and silently made the oldest value walk the plank when a new one came in with no available space.
- using a multiprocessing.Value, but from the documentation it sounds like you need to acquire a lock to access or write the value, and that's what I want to avoid: no locking/blocking for simultaneous reads/writes. It says something vague about how, if you don't use the lock, it won't be 'process-safe', but I don't really know what that means. What bad things would happen exactly without using a lock?
What's the best way to accomplish this? Thanks!
For some reason, I had forgotten that it's possible to put into a queue in a non-blocking way!
The solution I found is to use a multiprocessing.Queue with maxsize=1, and use non-blocking writes on the producer (child process) side. Here's a short version of what I did:
Initializing in parent process:
import multiprocessing as mp
import queue
publishedValue = mp.Queue(maxsize=1)
In repeatedly scheduled GUI function ("consumer"):
try:
    # Attempt to get an updated published value (store it for the GUI to display)
    latestValue = publishedValue.get(block=False)
except queue.Empty:
    # No new published value available
    pass
In child "producer" process:
try:
    # Clear current value in case GUI hasn't already consumed it
    publishedValue.get(block=False)
except queue.Empty:
    # Published value has already been consumed, no problem
    pass
try:
    # Publish new value (newState stands in for whatever the child wants to expose)
    publishedValue.put(newState, block=False)
except queue.Full:
    # Can't publish the value right now because the slot is still occupied
    pass
Note that this does require that the child process can repeatedly attempt to re-publish the value if the non-blocking put fails, otherwise the consumer might completely miss a published value (as opposed to simply getting it a bit late).
I think this may be possible in a bit more concise way (and probably with less overhead) with non-blocking writes to a multiprocessing.Value object instead of a queue, but the docs don't make it obvious (to me) how to do that.
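For what it's worth, one way to try the Value route (my own guess at an approach, not something the docs spell out) is to create the Value with lock=False so that reads and writes of .value never wait on a lock. This only works cleanly for fixed-size data such as an integer state code, and without the lock there is no guarantee against stale or inconsistent reads, so it is only appropriate when an occasionally out-of-date value is acceptable:
import multiprocessing as mp
import time

def child_loop(state):
    # Producer side: plain assignment, nothing to acquire, nothing can block.
    for code in range(5):
        state.value = code        # e.g. a numeric code for the current state
        time.sleep(0.1)

if __name__ == "__main__":
    # lock=False returns a raw shared ctypes object with no lock wrapper.
    state = mp.Value("i", 0, lock=False)
    p = mp.Process(target=child_loop, args=(state,))
    p.start()
    for _ in range(5):
        print("GUI sees state", state.value)   # may briefly lag behind the child
        time.sleep(0.1)
    p.join()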
Hope this helps someone.
I have a function that is invoked by potentially multiple processes created with multiprocessing. I want to ensure not serialization but single execution by the original process; that is, only the main process should perform some logic, and the others should do nothing.
One option is to use an RLock with blocking=False, but this does not guarantee that the main process will perform the execution. I don't want to differentiate on current_process().name because it just doesn't feel right, and as far as I understand the name is arbitrary and not necessarily unique anyway.
Is there a more elegant way to ensure this? In MPI I used to do it with the id.
We have used parallel processing by having some functions called by runInParallel, which you will find in this answer: https://stackoverflow.com/a/7207336/720484
All of these functions are supposed to have access to a single global variable which they should read.
This global variable is actually an instance of a class. This instance contains a member variable/attribute and all of the processes read and write to it.
However, this is not what happens. The object (class instance) seems to be replicated, and its attributes are independent in each process. So if one process changes a value, the change is not visible to the other processes.
Is this the expected behavior?
How to overcome it?
Thank you
All child processes will inherit that instance at the moment of forking from the parent process. Any changes made to the instance in a child or in the parent after the fork will NOT be seen by the other processes.
This is how processes work on Linux: every process has its own memory, protected from other processes (unless you intentionally share it). It is not Python-specific.
What you are looking for is called IPC (Inter-Process Communication) in general. There are multiple ways in which processes can communicate with each other. You might want to use pipes or shared memory.
In Python, read this: https://docs.python.org/2/library/multiprocessing.html#sharing-state-between-processes
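For example, a Manager can host the shared state in a separate server process so that every worker sees (and mutates) the same data. Here is a minimal sketch, with a managed dict standing in for your class instance and a made-up worker function:
import multiprocessing as mp

def worker(shared):
    # Mutations go through the manager's server process,
    # so every other process sees them.
    shared["counter"] = shared.get("counter", 0) + 1

if __name__ == "__main__":
    with mp.Manager() as manager:
        shared = manager.dict({"counter": 0})
        procs = [mp.Process(target=worker, args=(shared,)) for _ in range(4)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        # Note: the read-modify-write in worker() is not atomic, so concurrent
        # increments can still race; wrap it in a manager.Lock() if that matters.
        print(shared["counter"])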
I'm trying to speed up a Python program. I noticed that there is a thread always running that scans the inputs from an external resource, and when it gets something, it calls another function that parses the input data and returns understandable information (the parsing function also uses other functions).
A simple model of the scanning() function
def scanning(x):
    alpha = GetSomething(x)
    if alpha != 0:
        print(Parsing(alpha))
So my idea is to convert this thread into a process that will run in parallel with the main process, and when it gets something, it will send it using a Queue to the main process which should then call the parsing function.
My questions are: is it possible to keep the scanning() function as it is and use it inside a process (even if it calls other functions)?
If not, what are the required modifications to the structure of the scanning() function so that it can be used conveniently with the multiprocessing module?
What is the proper way to multiprocess a function that calls other functions in Python?
Short answer: yes, it is possible.
To understand why, you need to understand one thing about multiprocessing: it does not move the invoked function into a separate process; it creates a full replica of your entire process, including its code, loaded modules, and any global data that were initialized before you forked your processes.
So if your code has some sub-functions defined, they will be available to your function after it has been split into a separate process, along with any data that were pre-initialized. Any modifications to values, functions and namespaces of your main process after forking will not affect the forked process at all; you need to use special tools to communicate between processes.
So, let's suppose you have the following abstract code:
import SomeModule
define SomeFunction()
assign SomeValue

define ChildProcess():
    call SomeFunction()
    increase SomeValue
    do ChildProcessStuff

start ChildProcess()
decrease SomeValue
do MainProcessStuff
For both the main and the spawned process, your code executes identically until the line start ChildProcess(). After this line your process splits into two, which are fully identical at first but have different points of execution. The main process goes past this line and proceeds straight to do MainProcessStuff, while your child process will never reach that line. Instead, it gets a replica of the entire namespace and starts executing ChildProcess() as if it were called like a normal function followed by an exit().
Note how both the main and the child process have access to SomeValue. Also note how their changes to it are independent, as they are making them in different namespaces (and therefore to different SomeValues). This would not be the case with the threading module, which does not split the namespace, and it's an important distinction.
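A concrete, runnable version of the same point might look like this (the names are made up, but the behaviour is exactly what the pseudocode above describes):
import multiprocessing as mp

some_value = 0

def child_process():
    global some_value
    some_value += 1                      # changes only the child's copy
    print("child sees:", some_value)     # prints 1

if __name__ == "__main__":
    p = mp.Process(target=child_process)
    p.start()
    p.join()
    print("parent sees:", some_value)    # still prints 0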
Also note that the main process never executes the code in ChildProcess, but it retains a reference to it, which can be used to track its progress, terminate it prematurely, etc.
You might also be interested in more in-depth information about Python threads and processes here.
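As for the second half of the plan, sending scanned results back to the main process through a Queue, a minimal sketch could look like the following. It assumes scanning runs in a loop, reuses GetSomething and Parsing from the question, and uses some_input as a placeholder for whatever the scanner reads from:
import multiprocessing

def scanning(x, q):
    # Same logic as before, but results are sent back instead of printed.
    while True:
        alpha = GetSomething(x)
        if alpha != 0:
            q.put(alpha)

if __name__ == "__main__":
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=scanning, args=(some_input, q), daemon=True)
    p.start()
    while True:
        alpha = q.get()          # blocks until the scanner sends something
        print(Parsing(alpha))    # the main process does the parsing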
I have a Python function that creates and stores an object instance in a global list, and this function is called by a thread. While the thread runs, the list is filled up as it should be, but when the thread exits the list is empty and I have no idea why. Any help would be appreciated.
simulationResults = []
def run(width1, height1, seed1, prob1):
    global simulationResults
    instance = Life(width1, height1, seed1, prob1)
    instance.run()
    simulationResults.append(instance)
this is called in my main by:
for i in range(1, nsims + 1):
    simulations.append(multiprocessing.Process(target=run, args=(width, height, seed, prob)))
    simulations[(len(simulations) - 1)].start()

for i in simulations:
    i.join()
multiprocessing is based on processes, not threads. The important difference: Each process has a separate memory space, while threads share a common memory space. When first created, a process may (depending on OS, spawn method, etc.) be able to read the same values the parent process has, but if it writes to them, only the local values are changed, not the parent's copy. Only threads can rely on being able to access an arbitrary single shared global variable and have it behave as expected.
I'd suggest looking at either multiprocessing.Pool and its various methods to dispatch tasks and retrieve their results later, or if you must use raw Processes, look at the various ways to exchange data between processes; you can't just assign to a global variable, because globals stop being shared when the new Process is forked/spawned.
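For instance, with a Pool the results come back to the parent as return values. A sketch, assuming the Life instances (or some summary of them) can be pickled, and reusing width, height, seed, prob and nsims from the question:
import multiprocessing

def run(width1, height1, seed1, prob1):
    instance = Life(width1, height1, seed1, prob1)
    instance.run()
    return instance              # return it instead of appending to a global

if __name__ == "__main__":
    params = [(width, height, seed, prob) for _ in range(nsims)]
    with multiprocessing.Pool() as pool:
        simulationResults = pool.starmap(run, params)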
In your code you are creating new processes rather than threads. When a process is created, it gets its own copies of the variables in the main process, and they are independent of each other. I think for your case it makes sense to use processes rather than threads, because it would allow you to utilise multiple cores, as opposed to threads, which will be limited to a single core due to the GIL.
You will have to use inter-process communication techniques to communicate between processes. But since in your case the processes are not persistent daemons, it would make sense to have each process write its simulationResults to its own unique file and then read them back in the main process.
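A sketch of that file-based variant, with hypothetical file names and again assuming the Life instances can be pickled:
import multiprocessing
import pickle

def run(width1, height1, seed1, prob1, index):
    instance = Life(width1, height1, seed1, prob1)
    instance.run()
    # Each process writes its own result file instead of touching a global.
    with open(f"simulation_{index}.pkl", "wb") as f:
        pickle.dump(instance, f)

def collect(nsims):
    # Called in the parent after all processes have been joined.
    results = []
    for i in range(nsims):
        with open(f"simulation_{i}.pkl", "rb") as f:
            results.append(pickle.load(f))
    return results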