I'm trying to understand how the Process class from the multiprocessing package works.
For this, I wrote a little example where an object with a certain value is created and then that value is changed in a subprocess:
from multiprocessing import Process

class Foo:
    def __init__(self):
        self.value = "foo"

    def run(self):
        p = Process(target=self.change_value)
        p.start()
        p.join()

    def change_value(self):
        self.value = "bar"
        print "inside: " + self.value

if __name__ == '__main__':
    foo = Foo()
    foo.run()
    print "outside: " + foo.value
But this code gives me the following result:
>> inside: bar
>> outside: foo
Can someone explain to me why it prints the old property value ("foo") outside the process, even though the second print is executed later?
And how can I get the actual value of that property ("bar") from outside the process?
This is because multiprocessing.Process spawns a completely new, separate instance of the Python environment in a new process. You will notice in the Task Manager that a new python.exe process appears as soon as you start the Process. Unless you use special objects such as Pipe and Queue, it does not share memory with the process it was started from.
A little more about the internal work that is done:
You call p.start(). This pickles the Process object p and spawns a new instance of the Python interpreter with its own global state, etc. It does not share memory with the original process. Instead, the pickled p is unpickled in the new process and the work is done there.
print "inside: " + self.value: This is called by the newly spawned process, so the change is reflected here.
print "outside: " + foo.value: This is called in the original process, which knows nothing about the memory of the spawned process and has no access to it. Thus foo is not changed in the original process.
What I guess you intended to use
Most likely the class you are looking for is threading.Thread. It offers the same interface as Process, but it shares the global state and the Python environment with the thread it was started from. Any change made to an object in a spawned Thread can be read from outside, as sketched below.
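A minimal sketch of that (the same Foo as in the question, with Process swapped for Thread and Python 3 print syntax) shows the change becoming visible outside:
from threading import Thread

class Foo:
    def __init__(self):
        self.value = "foo"

    def run(self):
        t = Thread(target=self.change_value)
        t.start()
        t.join()

    def change_value(self):
        self.value = "bar"                 # same object, same memory as the main thread

if __name__ == '__main__':
    foo = Foo()
    foo.run()
    print("outside: " + foo.value)         # prints "bar"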
Related
Given an instance method that mutates an instance variable, running this method in a ProcessPoolExecutor does run the method but does not mutate the instance variable.
from concurrent.futures import ProcessPoolExecutor

class A:
    def __init__(self):
        self.started = False

    def method(self):
        print("Started...")
        self.started = True

if __name__ == "__main__":
    a = A()
    with ProcessPoolExecutor() as executor:
        executor.submit(a.method)
    assert a.started
Started...
Traceback (most recent call last):
File "/path/to/file", line 19, in <module>
assert a.started
AssertionError
Are only pure functions allowed in ProcessPoolExecutor?
For Windows
Multiprocessing does not share its state with child processes on Windows systems. This is because the default way to start child processes on Windows is spawn. From the documentation for the spawn start method:
The parent process starts a fresh python interpreter process. The child process will only inherit those resources necessary to run the process object’s run() method. In particular, unnecessary file descriptors and handles from the parent process will not be inherited. Starting a process using this method is rather slow compared to using fork or forkserver
Therefore, when you pass any objects to child processes, they are actually copied, and do not have the same memory address as in the parent process. A simple way to demonstrate this through your example would be to print the objects in the child process and the parent process:
from concurrent.futures import ProcessPoolExecutor

class A:
    def __init__(self):
        self.started = False

    def method(self):
        print("Started...")
        print(f'Child proc: {self}')
        self.started = True

if __name__ == "__main__":
    a = A()
    print(f'Parent proc: {a}')
    with ProcessPoolExecutor() as executor:
        executor.submit(a.method)
Output
Parent proc: <__main__.A object at 0x0000028F44B40FD0>
Started...
Child proc: <__mp_main__.A object at 0x0000019D2B8E64C0>
As you can see, the two objects reside at different places in memory. Altering one does not affect the other at all. This is why you don't see any change to a.started in the parent process.
Once you understand this, your question becomes how to share the same object, rather than copies, with the child processes. There are numerous ways to go about this, and questions about sharing complex objects like a have already been asked and answered on Stack Overflow.
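For example, one possible sketch (not the only option) is to keep the flag in a multiprocessing.Manager namespace so that parent and child both talk to the same proxy object; the names ns and the module-level method below are just illustrative:
from concurrent.futures import ProcessPoolExecutor
from multiprocessing import Manager

def method(ns):
    print("Started...")
    ns.started = True                  # the write goes through the manager process

if __name__ == "__main__":
    with Manager() as manager:
        ns = manager.Namespace()
        ns.started = False
        with ProcessPoolExecutor() as executor:
            executor.submit(method, ns).result()
        assert ns.started              # passes: both sides share the same proxy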
For UNIX
Much the same applies to the other start methods available on UNIX-based systems (I am not sure of the default for concurrent.futures on macOS). For example, the multiprocessing documentation describes fork as follows:
The parent process uses os.fork() to fork the Python interpreter. The child process, when it begins, is effectively identical to the parent process. All resources of the parent are inherited by the child process. Note that safely forking a multithreaded process is problematic.
So fork creates child processes that share the entire memory space of the parent process at start. However, it uses copy-on-write to do so. This means that if you attempt to modify an object shared this way from within the child process, a duplicate of that particular object is created so as not to disturb the parent process, and that duplicate is local to the child process (much like what spawn does at start).
Hence the answer still stands: if you plan to modify the objects passed to the child process, or if you are not on a UNIX system, you will need to share the objects yourself so that they point to the same underlying memory.
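A rough sketch of that copy-on-write behaviour (Unix only; the class and names are just illustrative): the child starts out seeing the parent's object, but its modification lands in the child's private copy and never reaches the parent.
from multiprocessing import Process, set_start_method

class A:
    def __init__(self):
        self.started = False

def child(a):
    a.started = True                       # triggers a copy local to the child
    print(f'child sees:  {a.started}')     # True

if __name__ == "__main__":
    set_start_method("fork")               # Unix only
    a = A()
    p = Process(target=child, args=(a,))
    p.start()
    p.join()
    print(f'parent sees: {a.started}')     # still False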
Further reading on start methods.
I'm trying to figure out how Lock works under the hood. I run this code on macOS, which uses "spawn" as the default method to start new processes.
from multiprocessing import Process, Lock, set_start_method
from time import sleep

def f(lock, i):
    lock.acquire()
    print(id(lock))
    try:
        print('hello world', i)
        sleep(3)
    finally:
        lock.release()

if __name__ == '__main__':
    # set_start_method("fork")
    lock = Lock()
    for num in range(3):
        p = Process(target=f, args=(lock, num))
        p.start()
        p.join()
Output:
140580736370432
hello world 0
140251759281920
hello world 1
140398066042624
hello world 2
The Lock works in my code. However, the ids of the lock confuse me. Since the ids are different, is it still the same lock, or are there multiple locks that somehow communicate secretly? Does id() still mean the same thing under multiprocessing? I quote: "CPython implementation detail: id is the address of the object in memory."
If I use the "fork" method, set_start_method("fork"), it prints identical ids, which makes total sense to me.
id is implemented as, but not required to be, the memory location of the given object. When using fork, the child process does not get its own copy of memory until it modifies something (copy-on-write), so the memory location does not change because it "is" the same object.
When using spawn, an entirely new process is created and the __main__ file is imported as a library into the new process's namespace, so all of your functions, classes, and module-level variables are accessible (minus any modifications made by code guarded by if __name__ == "__main__":). Python then creates a connection between the processes (a pipe) through which it sends which function to call and the arguments to call it with. Everything passing through this pipe must be pickled and then unpickled.
Locks specifically are re-created on unpickling by asking the operating system for a lock with a specific name (the name was generated in the parent process when the lock was created, and is sent across via pickle). This is how the two locks stay synchronized: they are backed by an object the operating system controls. Python then stores this lock, along with some other data (the PyObject, as it were), in the memory of the new process. Calling id now returns the location of this struct, which is different because it was created by a different process in a different chunk of memory.
Here's a quick example to convince you that a "spawn"-ed lock is still synchronized:
from multiprocessing import Process, Lock, set_start_method

def foo(lock):
    with lock:
        print(f'child process lock id: {id(lock)}')

if __name__ == "__main__":
    set_start_method("spawn")
    lock = Lock()
    print(f'parent process lock id: {id(lock)}')
    lock.acquire()  # lock the lock so the child has to wait
    p = Process(target=foo, args=(lock,))
    p.start()
    input('press enter to unlock the lock')
    lock.release()
    p.join()
The different "id's" are the different PyObject locations, but have little to do with the underlying mutex. I am not aware that there's a direct way to inspect the underlying lock which the operating system manages.
I know that child processes won't see changes made after a fork/spawn, and that on Windows processes don't inherit globals unless they use shared memory. But what I have is a situation where the children can't see changes made to a global variable in shared memory before the fork/spawn.
Simple demonstration:
from multiprocessing import Process, Value

global foo
foo = Value('i', 1)

def printfoo():
    global foo
    with foo.get_lock():
        print(foo.value)

if __name__ == '__main__':
    with foo.get_lock():
        foo.value = 2
    Process(target=printfoo).start()
On Linux and macOS, this displays the expected 2. On Windows, it displays 1, even though the modification to the global Value is made before the call to Process. How can I make the change visible to the child process on Windows, too?
The problem here is that your child process creates a new shared value, rather than using the one the parent created. Your parent process needs to explicitly send the Value to the child, for example, as an argument to the target function:
from multiprocessing import Process, Value

def use_shared_value(val):
    val.value = 2

if __name__ == '__main__':
    val = Value('i', 1)
    p = Process(target=use_shared_value, args=(val,))
    p.start()
    p.join()
    print(val.value)
(Unfortunately, I don't have a Windows Python install to test this on.)
Child processes cannot inherit globals on Windows, regardless of whether those globals are initialized to multiprocessing.Value instances. multiprocessing.Value does not change the fact that the child re-executes your file, and re-executing the Value construction doesn't use the shared resources the parent allocated.
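A quick way to see that re-execution for yourself (a small sketch, not tied to the question's code): under the spawn start method, anything at module level runs again in the child, because the child re-imports the file.
from multiprocessing import Process

print("module level code running")   # printed once by the parent and once more by the child

def work():
    print("child doing work")

if __name__ == '__main__':
    p = Process(target=work)          # on Windows (spawn), the child re-imports this file
    p.start()
    p.join()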
I want to pass a state attribute (bool) to a second process. The main process initializes this process and passes the attribute in question via the constructor. Depending on this attribute, the second process should print different values.
The class TestClass is located in a separate file.
This is the subprocess in a second file (let's call it SubProcess.py), which doesn't print my desired results.
from multiprocessing import Process, Value
import time

# this class is executed in a second process and reacts to changes
# in the main process
class TestClass(Process):
    def __init__(self, value):
        Process.__init__(self)
        self.__current_state = value

    def run(self):
        while True:
            if bool(self.__current_state):
                print("Hello World")
            else:
                print("Not Hello World")
            time.sleep(0.5)
This is the main process, which is executed:
import time
from multiprocessing import Value
from SubProcess import TestClass

# main procedure
value_to_pass = Value('i', False).value
test_obj = TestClass(value_to_pass)
test_obj.start()

while True:
    if bool(value_to_pass):
        value_to_pass = Value('i', False).value
    else:
        value_to_pass = Value('i', True).value
    # wait some time
    time.sleep(0.4)
In the end I would like to have an alternating output of Not Hello World and Hello World, which would confirm that the state argument is being passed.
At the moment it only prints according to my initialization of value_to_pass; the value obviously never changes.
Using global attributes doesn't fit my requirements, because the code is split across different files. Furthermore, using object attributes only works if I use threads. Later I will use a Raspberry Pi to handle multiple sensors, so I'm forced to use multiple processes.
Thank you!
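One hedged sketch of how this is commonly done (not a drop-in fix for the asker's files): hand TestClass the shared Value object itself rather than its .value, and toggle that one object from the main process.
SubProcess.py:
from multiprocessing import Process
import time

class TestClass(Process):
    def __init__(self, shared_state):
        Process.__init__(self)
        self.__current_state = shared_state          # keep the Value, not .value

    def run(self):
        while True:
            if bool(self.__current_state.value):
                print("Hello World")
            else:
                print("Not Hello World")
            time.sleep(0.5)
Main file:
import time
from multiprocessing import Value
from SubProcess import TestClass

if __name__ == '__main__':
    state = Value('i', False)                        # one shared flag for both processes
    TestClass(state).start()
    while True:
        with state.get_lock():
            state.value = not state.value            # toggle the shared flag
        time.sleep(0.4)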
I am using the multiprocessing module in Python. Here is a sample of the code I am using:
import multiprocessing as mp

def function(fun_var1, fun_var2):
    b = fun_var1 + fun_var2
    # and more computationally intensive stuff happens here
    return b
    # my program freezes after the return command

class Worker(mp.Process):
    def __init__(self, queue_obj, func_var1, func_var2):
        mp.Process.__init__(self)
        self.queue_obj = queue_obj
        self.func_var1 = func_var1
        self.func_var2 = func_var2

    def run(self):
        self.var = function(self.func_var1, self.func_var2)
        self.queue_obj.put(self.var)

if __name__ == '__main__':
    mp.freeze_support()
    queue_list = []
    processes = []
    result = []
    for i in range(2):
        queue_list.append(mp.Queue())
        processes.append(Worker(queue_list[i], var1, var2))  # var1, var2 defined elsewhere
        processes[i].start()
    for i in range(2):
        processes[i].join()
        result.append(queue_list[i].get())
During runtime of the program, two instances of the Worker class are generated and work simultaneously. One instance finishes after about 2 minutes; the other would take about 7 minutes. The first instance returns its results fine. However, the second instance freezes the program when the function() called within the run() method returns its value. No error is thrown; the program just does not continue to execute. The console also indicates that it is busy but does not display the >>> prompt.
I am completely clueless why this behavior occurs. The same code works fine for slightly different inputs in the two Worker instances. The only difference I can make out is that the workloads are more equal when it executes correctly. Could the time difference cause trouble? Does anyone have experience with this kind of behavior?
Also note that if I run a serial setup of the program in which function() is just called twice by the main program, the code executes flawlessly. Could there be some timeout involved in the Worker instance that makes it impossible for function() to return its value to it? The return value of function() is actually a fairly small list; it contains about 100 float values.
Any suggestions are welcome!
This is a bit of an educated guess without actually seeing what's going on in your worker, but is it possible that your child process has put items into the Queue that haven't been consumed? The documentation has a warning about this:
Warning
As mentioned above, if a child process has put items on a queue (and
it has not used JoinableQueue.cancel_join_thread), then that process
will not terminate until all buffered items have been flushed to the
pipe.
This means that if you try joining that process you may get a deadlock
unless you are sure that all items which have been put on the queue
have been consumed. Similarly, if the child process is non-daemonic
then the parent process may hang on exit when it tries to join all its
non-daemonic children.
Note that a queue created using a manager does not have this issue.
See Programming guidelines.
It might be worth trying to create your Queue objects using multiprocessing.Manager().Queue() and seeing if the issue goes away.
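As a rough sketch of that suggestion (the Worker shape mirrors the question, but the inputs are placeholders), a manager-backed queue lets you join the children before draining the queues without risking the feeder-thread deadlock described in the warning above:
import multiprocessing as mp

def function(fun_var1, fun_var2):
    return fun_var1 + fun_var2                 # stand-in for the long computation

class Worker(mp.Process):
    def __init__(self, queue_obj, func_var1, func_var2):
        mp.Process.__init__(self)
        self.queue_obj = queue_obj
        self.func_var1 = func_var1
        self.func_var2 = func_var2

    def run(self):
        self.queue_obj.put(function(self.func_var1, self.func_var2))

if __name__ == '__main__':
    mp.freeze_support()
    with mp.Manager() as manager:
        queues = [manager.Queue() for _ in range(2)]
        workers = [Worker(q, i, i + 1) for i, q in enumerate(queues)]
        for w in workers:
            w.start()
        results = []
        for w, q in zip(workers, queues):
            w.join()                           # safe with manager-backed queues
            results.append(q.get())
        print(results)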