Multiprocessing issue: function calling [duplicate] - python

This question already has answers here:
Appending to the same list from different processes using multiprocessing
from multiprocessing import Process

a = []

def one():
    for i in range(3):
        a.append(i)

def main():
    p1 = Process(target=one)
    p1.start()

if __name__ == '__main__':
    main()
    print('After calling from Multi-process')
    print(a)
    one()
    print('Calling outside Multi-process')
    print(a)
Output:
After calling from Multi-process:
[]
Calling outside Multi-process:
[0, 1, 2]
Why are the elements not appended to a when the function one is called via Process?

multiprocessing.Process creates a sub-process in which all relevant memory is copied and then modified separately; that is, the child does not share the memory locations of global variables with the parent.
If you really want this to work, you can use threading instead of multiprocessing. Threads do share the global memory rather than working on separate copies of the global variables.
Do from threading import Thread and p1=Thread(target=one) instead, as in the sketch below.
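A minimal sketch of that change, assuming the rest of the original script stays the same (the join() call and print text are added here so the appends are visible before printing):

from threading import Thread

a = []

def one():
    for i in range(3):
        a.append(i)

if __name__ == '__main__':
    p1 = Thread(target=one)
    p1.start()
    p1.join()                  # wait for the thread so its appends are visible
    print('After calling from the thread')
    print(a)                   # [0, 1, 2] -- threads share the module-level list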

I checked the documentation and came to an understanding that may be helpful to you.
The Python documentation describes the start methods used by the Process class in the multiprocessing package as follows:
Contexts and start methods
Depending on the platform, multiprocessing supports three ways to start a process. These start methods are
spawn
The parent process starts a fresh python interpreter process. The child process will only inherit those resources necessary to run the process object’s run() method. In particular, unnecessary file descriptors and handles from the parent process will not be inherited. Starting a process using this method is rather slow compared to using fork or forkserver.
Available on Unix and Windows. The default on Windows and macOS.
fork
The parent process uses os.fork() to fork the Python interpreter. The child process, when it begins, is effectively identical to the parent process. All resources of the parent are inherited by the child process. Note that safely forking a multithreaded process is problematic.
Available on Unix only. The default on Unix.
forkserver
When the program starts and selects the forkserver start method, a server process is started. From then on, whenever a new process is needed, the parent process connects to the server and requests that it fork a new process. The fork server process is single threaded so it is safe for it to use os.fork(). No unnecessary resources are inherited.
Available on Unix platforms which support passing file descriptors over Unix pipes.
From this description we can see that, on Unix, the Process class uses os.fork by default to run the method specified by target.
os.fork gives the child process its own copy of the parent's memory (in practice via copy-on-write), so the child starts out with its own copies of all the parent's Python objects.
Therefore, the list a operated on by the one method running in the child process lives in the child's own memory space, and modifying it does not change the list a owned by the parent process.
To verify this, we can simply modify the one method (note the added import os), for example:

import os

def one():
    for i in range(3):
        a.append(i)
        print('pid:', os.getpid(), 'ppid:', os.getppid(), 'list:', a)
Then we run this script again, and we will get the following results:
After calling from Multi-process
[] // Print directly in the parent process
pid: 6990 ppid: 1419 list: [0] // Call the `one` method in the parent process
pid: 6990 ppid: 1419 list: [0, 1] // Call the `one` method in the parent process
pid: 6990 ppid: 1419 list: [0, 1, 2] // Call the `one` method in the parent process
Calling outside Multi-process
[0, 1, 2] // Print directly in the parent process
pid: 6991 ppid: 6990 list: [0] // Call the `one` method in the child process
pid: 6991 ppid: 6990 list: [0, 1] // Call the `one` method in the child process
pid: 6991 ppid: 6990 list: [0, 1, 2] // Call the `one` method in the child process
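If the goal is for the child's appends to actually be visible in the parent, one option (a sketch added here for illustration, not part of the original question) is a manager-backed list, which lives in a separate manager process and is accessed through a picklable proxy:

from multiprocessing import Process, Manager

def one(shared):
    for i in range(3):
        shared.append(i)

if __name__ == '__main__':
    with Manager() as manager:
        a = manager.list()                     # proxy to a list held by the manager process
        p1 = Process(target=one, args=(a,))
        p1.start()
        p1.join()                              # wait for the child before reading
        print(list(a))                         # [0, 1, 2]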

Related

ProcessPoolExecutor does not mutate instance variable when submitting instance method

Given an instance method that mutates an instance variable, running this method in the ProcessPoolExecutor does run the method but does not mutate the instance variable.
from concurrent.futures import ProcessPoolExecutor

class A:
    def __init__(self):
        self.started = False

    def method(self):
        print("Started...")
        self.started = True

if __name__ == "__main__":
    a = A()
    with ProcessPoolExecutor() as executor:
        executor.submit(a.method)
    assert a.started
Started...
Traceback (most recent call last):
File "/path/to/file", line 19, in <module>
assert a.started
AssertionError
Are only pure functions allowed in ProcessPoolExecutor?
For Windows
Multiprocessing does not share its state with child processes on Windows systems. This is because the default way to start child processes on Windows is spawn. From the documentation for the spawn start method:
The parent process starts a fresh python interpreter process. The child process will only inherit those resources necessary to run the process object’s run() method. In particular, unnecessary file descriptors and handles from the parent process will not be inherited. Starting a process using this method is rather slow compared to using fork or forkserver
Therefore, when you pass any objects to child processes, they are actually copied, and do not have the same memory address as in the parent process. A simple way to demonstrate this through your example would be to print the objects in the child process and the parent process:
from concurrent.futures import ProcessPoolExecutor

class A:
    def __init__(self):
        self.started = False

    def method(self):
        print("Started...")
        print(f'Child proc: {self}')
        self.started = True

if __name__ == "__main__":
    a = A()
    print(f'Parent proc: {a}')
    with ProcessPoolExecutor() as executor:
        executor.submit(a.method)
Output
Parent proc: <__main__.A object at 0x0000028F44B40FD0>
Started...
Child proc: <__mp_main__.A object at 0x0000019D2B8E64C0>
As you can see, the two objects reside at different places in memory. Altering one does not affect the other at all, which is why you don't see any change to a.started in the parent process.
Once you understand this, your question becomes how to share the same object, rather than copies, with the child processes. There are numerous ways to go about this, and questions on how to share complex objects like a have already been asked and answered on Stack Overflow; one simple option is sketched below.
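If the only state you need back is the flag itself, a minimal sketch (an illustrative workaround, not a quote of any library API) is to return the new value from the child and assign it in the parent via the Future:

from concurrent.futures import ProcessPoolExecutor

class A:
    def __init__(self):
        self.started = False

    def method(self):
        print("Started...")
        return True                        # send the new state back instead of mutating self

if __name__ == "__main__":
    a = A()
    with ProcessPoolExecutor() as executor:
        future = executor.submit(a.method)
        a.started = future.result()        # update the parent's copy from the child's result
    assert a.started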
For UNIX
The same can be said for the other start methods available on UNIX-based systems (I am not sure what the default is for concurrent.futures on macOS). For example, the multiprocessing documentation explains fork as:
The parent process uses os.fork() to fork the Python interpreter. The child process, when it begins, is effectively identical to the parent process. All resources of the parent are inherited by the child process. Note that safely forking a multithreaded process is problematic.
So fork creates child processes that share the entire memory space of the parent process at start. However, it uses copy-on-write to do so. This means that if you modify any shared object from within the child process, a duplicate of that particular object is created so that the parent process is not affected, localizing that object to the child process (much like what spawn does at start).
Hence the answer still stands: if you plan to modify the objects passed to the child process, or if you are not on a UNIX system, you will need to share the objects explicitly so that they point to the same memory.
Further reading on start methods.
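If you want to check or pin down which start method your program uses, a small illustrative sketch (not from the original answer) is:

import multiprocessing as mp

if __name__ == "__main__":
    # "fork" is only available on Unix; "spawn" is the default on Windows and macOS
    mp.set_start_method("spawn")
    print(mp.get_start_method())    # -> spawn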

How to safely destroy a joined multiprocessing object in a list?

I'm using a list to keep track of all the processes being spawned in a system and want to implement a cleanup mechanism to remove the ones that are stopped. Here is a basic example:
import multiprocessing as mp

class WorkerManager:
    def __init__(self):
        self.process_list = []

    def spawn_process(self):
        # foo and arg1 are defined elsewhere in the real code
        process = mp.Process(target=foo, args=(arg1,))
        self.process_list.append(process)
        process.start()
If I loop through the list and call terminate() on the processes that are not alive, I still see the objects there. How can I safely remove the references?
Note: although it's not shown in the sample code above, I am properly calling join() after each worker is done.
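A minimal sketch of one way to do the cleanup (added here for illustration; it assumes the finished workers have already completed their work):

def cleanup(self):
    alive = []
    for p in self.process_list:
        if p.is_alive():
            alive.append(p)
        else:
            p.join()                 # reap the finished process (a no-op if already joined)
    self.process_list = alive        # drop references so the Process objects can be garbage-collected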

How to fork and join multiple subprocesses with a global timeout in Python?

I want to execute some tasks in parallel in multiple subprocesses and time out if the tasks were not completed within some delay.
A first approach consists of forking and joining the subprocesses individually, with remaining timeouts computed with respect to the global timeout, as suggested in this answer. It works fine for me.
A second approach, which I want to use here, consists of creating a pool of subprocesses and waiting with the global timeout, as suggested in this answer.
However, I have a problem with the second approach: after feeding the pool of subprocesses with tasks that have multiprocessing.Event() objects, waiting for their completion raises this exception:
RuntimeError: Condition objects should only be shared between processes through inheritance
Here is the Python code snippet:
import multiprocessing.pool
import time

class Worker:
    def __init__(self):
        self.event = multiprocessing.Event()  # commenting this removes the RuntimeError

    def work(self, x):
        time.sleep(1)
        return x * 10

if __name__ == "__main__":
    pool_size = 2
    timeout = 5
    with multiprocessing.pool.Pool(pool_size) as pool:
        result = pool.map_async(Worker().work, [4, 5, 2, 7])
        print(result.get(timeout))  # raises the RuntimeError
In the "Programming guidlines" section of the multiprocessing — Process-based parallelism documentation, there is this paragraph:
Better to inherit than pickle/unpickle
When using the spawn or forkserver start methods many types from multiprocessing need to be picklable so that child processes can use them. However, one should generally avoid sending shared objects to other processes using pipes or queues. Instead you should arrange the program so that a process which needs access to a shared resource created elsewhere can inherit it from an ancestor process.
So multiprocessing.Event() caused a RuntimeError because it is not picklable, as demonstrated by the following Python code snippet:
import multiprocessing
import pickle
pickle.dumps(multiprocessing.Event())
which raises the same exception:
RuntimeError: Condition objects should only be shared between processes through inheritance
A solution is to use a proxy object:
A proxy is an object which refers to a shared object which lives (presumably) in a different process.
because:
An important feature of proxy objects is that they are picklable so they can be passed between processes.
multiprocessing.Manager().Event() creates a shared threading.Event() object and returns a proxy for it, so replacing this line:
self.event = multiprocessing.Event()
by the following line in the Python code snippet of the question solves the problem:
self.event = multiprocessing.Manager().Event()
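For completeness, a sketch of the snippet from the question with only that line changed (the expected output is what the timings imply, not taken from the original answer):

import multiprocessing.pool
import time

class Worker:
    def __init__(self):
        # the manager returns a picklable proxy to an Event held in the manager process
        self.event = multiprocessing.Manager().Event()

    def work(self, x):
        time.sleep(1)
        return x * 10

if __name__ == "__main__":
    pool_size = 2
    timeout = 5
    with multiprocessing.pool.Pool(pool_size) as pool:
        result = pool.map_async(Worker().work, [4, 5, 2, 7])
        print(result.get(timeout))  # [40, 50, 20, 70]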

what will process created by fork() do in python?

I wrote the following code, but I don't understand how it works very well:
import os
import time

NUM = 8

def timec():
    x = 1000000
    while x > 0:
        x -= 1

pid_children = []
start_time = time.time()

for i in range(NUM):
    pid = os.fork()
    if pid == 0:
        timec()
        os._exit(0)
    else:
        pid_children.append(pid)

for j in pid_children:
    os.waitpid(j, 0)

print(time.time() - start_time)
I cannot understand where the child process starts or where it will finish.
And another question is will the waitpid() method wait for the child process to finish its work, or will it just return as soon as it is called?
When os.fork() is called, the program splits into two completely separate processes. In the child, os.fork() returns 0. In the parent, os.fork() returns the process id of the child.
The key distinction about os.fork() is that it does not create a new thread that shares the memory of the original thread, but instead creates an entirely new process. The new process has a copy of the memory of its parent. Updates in the parent are not reflected in the child, and updates in the child are not reflected in the parent. They each have their own state.
Given that context, here are the answers to your specific questions:
Where do the child processes start?
pid = os.fork()
Each child starts executing at the line immediately after this call, with pid == 0, so it runs timec() and then calls os._exit(0). Only the parent continues the loop, which means exactly NUM (8) child processes are created.
Where do the child processes end?
Every child exits at:
os._exit(0)
Because of that call, a child never reaches the next iteration of the loop or the code after it, so no grandchildren are forked. In the parent, pid_children ends up holding the pids of all 8 children.
What does waitpid do?
os.waitpid(pid, 0) will block until the process with pid pid has completed, so the final loop makes the parent wait for all of its children before printing the elapsed time.
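To make the fork return values concrete, here is a minimal sketch (Unix only, added for illustration):

import os

pid = os.fork()
if pid == 0:
    # child: fork() returned 0
    print("child pid:", os.getpid(), "parent:", os.getppid())
    os._exit(0)            # the child ends here and never reaches the code below
else:
    # parent: fork() returned the child's pid
    os.waitpid(pid, 0)     # blocks until that child has exited
    print("child", pid, "has finished")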
os.fork() documentation
os.waitpid() documentation

python multiprocessing JoinableQueue PicklingError

Sorry... it seems that I asked a popular question, but I cannot find anything helpful for my case on Stack Overflow :P
So my code does the following things:
Step 1. The parent process writes Task objects into a multiprocessing.JoinableQueue.
Step 2. Child processes (more than one) get the Task objects from the JoinableQueue and execute them.
My module structure is:

A.py
    class Task(object)
    class WorkerPool(object)
    class Worker(multiprocessing.Process)
        def run()        # step 2 above is executed here
    class TestGroup()
        def loadTest()   # step 1 above is executed here, i.e. Task objects are put on the queue
What I understand is that when mp.JoinableQueue is used, the objects put on it should be picklable; I got the meaning of "picklable" from https://docs.python.org/2/library/pickle.html#what-can-be-pickled-and-unpickled
My questions are:
1. Is the Task object picklable in my case?
I got the error below when the code puts Task objects onto the JoinableQueue:
File "/usr/lib/python2.6/multiprocessing/queues.py", line 242, in _feed
2014-06-23 03:18:43 INFO TestGroup: G1 End load test: object1
2014-06-23 03:18:43 INFO TestGroup: G1 End load test: object2
2014-06-23 03:18:43 INFO TestGroup: G1 End load test: object3
send(obj)
PicklingError: Can't pickle : attribute lookup pysphere.resources.VimService_services_types.DynamicData_Holder failed
2. What's the general usage of mp.JoinableQueue? In my case, I need to use join() and task_done().
3. When I use Queue.Queue instead of mp.JoinableQueue, the pickling error is gone. However, checking the log, I found that all the child processes keep working on the first object in the queue. What's the possible reason for this?
The multiprocessing module in Python starts multiple processes to run your tasks. Since processes do not share memory, they need to be able to communicate using serialized data. multiprocessing uses the pickle module to do the serialization, thus the requirement that the objects you are passing to the tasks be picklable.
1) Your Task object seems to contain an instance from pysphere.resources.VimService_services_types. This is probably a reference to a system resource, such as an open file or connection. It cannot be serialized or passed from one process to another, and therefore it causes the pickling error.
What you can do with mp.JoinableQueue is pass the arguments the task needs, and have the task create the service itself so that it is local to that process.
For example:
queue = mp.JoinableQueue()
# not queue.put(task): the child process will create the Task itself
queue.put(task_args)

def f(queue):
    task_args = queue.get()
    task = Task(task_args)   # the non-serializable service is created locally in the child
    ...
    # you can't send the Task object back unless you've closed all non-serializable parts;
    # put plain, picklable results (e.g. task.result) on another queue instead
    queue.task_done()

process = mp.Process(target=f, args=(queue,))
...
2) Queue.Queue is meant for threads within a single process; it uses shared memory and locks to provide atomic operations between threads. When you start a new process with multiprocessing, the child is created as a copy of the parent, so each child ends up working on its own copy of the queue rather than on a shared one. Every child therefore sees the same items that were in the queue at the time it was created, which is why they all appear to keep working on the first object, and items put on the queue afterwards are never passed between processes.
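As for the general usage of mp.JoinableQueue (question 2), a minimal producer/consumer sketch with task_done() and join() might look like this (names and the sentinel convention are illustrative):

import multiprocessing as mp

def worker(queue):
    while True:
        args = queue.get()
        if args is None:              # sentinel: no more work for this worker
            queue.task_done()
            break
        print('processing', args)
        queue.task_done()             # mark this item as finished

if __name__ == '__main__':
    queue = mp.JoinableQueue()
    procs = [mp.Process(target=worker, args=(queue,)) for _ in range(2)]
    for p in procs:
        p.start()
    for item in [1, 2, 3, 4]:
        queue.put(item)
    for _ in procs:
        queue.put(None)               # one sentinel per worker
    queue.join()                      # blocks until every put item has been task_done()
    for p in procs:
        p.join()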
