Using Python's multiprocessing module, the following contrived example runs with minimal memory requirements:
import multiprocessing

# completely_unrelated_array = range(2**25)

def foo(x):
    for x in xrange(2**28):
        pass
    print x**2

P = multiprocessing.Pool()
for x in range(8):
    multiprocessing.Process(target=foo, args=(x,)).start()
Uncomment the creation of completely_unrelated_array and you'll find that each spawned process allocates memory for a copy of it! This is a minimal example of a much larger project that I can't figure out how to work around; multiprocessing seems to make a copy of everything that is global. I don't need a shared memory object, I simply need to pass in x and process it without the memory overhead of the entire program.
Side observation: What's interesting is that print id(completely_unrelated_array) inside foo gives the same value, suggesting that somehow that might not be copies...
Because of the nature of os.fork(), any variables in the global namespace of your __main__ module will be inherited by the child processes (assuming you're on a POSIX platform), so you'll see the memory usage in the children reflect that as soon as they're created. I'm not sure if all that memory is really being allocated, though; as far as I know, that memory is shared until you actually try to change it in the child, at which point a new copy is made.

Windows, on the other hand, doesn't use os.fork() - it re-imports the main module in each child, and pickles any local variables you want sent to the children. So, on Windows you can actually avoid the large global ending up copied in the child by defining it only inside an if __name__ == "__main__": guard, because everything inside that guard will run only in the parent process:
import time
import multiprocessing

def foo(x):
    for x in range(2**28):
        pass
    print(x**2)

if __name__ == "__main__":
    completely_unrelated_array = list(range(2**25))  # This will only be defined in the parent on Windows
    P = multiprocessing.Pool()
    for x in range(8):
        multiprocessing.Process(target=foo, args=(x,)).start()
Now, in Python 2.x, you can only create new multiprocessing.Process objects by forking if you're using a Posix platform. But on Python 3.4, you can specify how the new processes are created, by using contexts. So, we can specify the "spawn" context, which is the one Windows uses, to create our new processes, and use the same trick:
# Note that this is Python 3.4+ only
import time
import multiprocessing

def foo(x):
    for x in range(2**28):
        pass
    print(x**2)

if __name__ == "__main__":
    completely_unrelated_array = list(range(2**23))  # Again, this only exists in the parent
    ctx = multiprocessing.get_context("spawn")  # Use process spawning instead of fork
    P = ctx.Pool()
    for x in range(8):
        ctx.Process(target=foo, args=(x,)).start()
If you need 2.x support, or want to stick with using os.fork() to create new Process objects, I think the best you can do to get the reported memory usage down is to immediately delete the offending object in the child:
import time
import multiprocessing
import gc

def foo(x):
    init()
    for x in range(2**28):
        pass
    print(x**2)

def init():
    global completely_unrelated_array
    completely_unrelated_array = None
    del completely_unrelated_array
    gc.collect()

if __name__ == "__main__":
    completely_unrelated_array = list(range(2**23))
    P = multiprocessing.Pool(initializer=init)
    for x in range(8):
        multiprocessing.Process(target=foo, args=(x,)).start()
    time.sleep(100)
What is important here is which platform you are targeting.
On Unix systems, processes are created using copy-on-write (COW) memory. So even though each process gets a copy of the full memory of the parent process, that memory is only actually allocated on a per-page basis (4 KiB) when it is modified.
So if you are only targeting these platforms you don't have to change anything.
If you are targeting platforms without COW forking, you may want to use Python 3.4 and its new start methods, spawn and forkserver; see the documentation.
These methods create processes which share nothing (or only limited state) with the parent, and all memory passing is explicit.
But note that the spawned process will import your module, so all global data will be explicitly copied and no copy-on-write is possible. To prevent this you have to reduce the scope of the data.
import multiprocessing as mp
import numpy as np

def foo(x):
    import time
    time.sleep(60)

if __name__ == "__main__":
    mp.set_start_method('spawn')
    # Not global, so spawned children will not have this allocated.
    # With the fork method the children would still have this memory
    # mapped, but it could be copy-on-write.
    completely_unrelated_array = np.ones((5000, 10000))
    P = mp.Pool()
    for x in range(3):
        mp.Process(target=foo, args=(x,)).start()
e.g. top output with spawn:
%MEM TIME+ COMMAND
29.2 0:00.52 python3
0.5 0:00.00 python3
0.5 0:00.00 python3
0.5 0:00.00 python3
and with fork:
%MEM TIME+ COMMAND
29.2 0:00.52 python3
29.1 0:00.00 python3
29.1 0:00.00 python3
29.1 0:00.00 python3
Note how the total is more than 100%, due to copy-on-write.
I'm trying to figure out how Lock works under the hood. I run this code on macOS, which uses "spawn" as the default method to start new processes.
from multiprocessing import Process, Lock, set_start_method
from time import sleep

def f(lock, i):
    lock.acquire()
    print(id(lock))
    try:
        print('hello world', i)
        sleep(3)
    finally:
        lock.release()

if __name__ == '__main__':
    # set_start_method("fork")
    lock = Lock()
    for num in range(3):
        p = Process(target=f, args=(lock, num))
        p.start()
        p.join()
Output:
140580736370432
hello world 0
140251759281920
hello world 1
140398066042624
hello world 2
The Lock works in my code. However, the ids of the lock confuse me. Since the ids are different, are they still the same lock, or are there multiple locks that somehow communicate secretly? Does id() still hold in multiprocessing? I quote: "CPython implementation detail: id is the address of the object in memory."
If I use the "fork" method, set_start_method("fork"), it prints identical ids, which makes total sense to me.
id is implemented as, but not required to be, the memory location of the given object. When using fork, the separate process does not get its own copy of that memory until it modifies something (copy-on-write), so the memory location does not change because it "is" the same object.

When using spawn, an entirely new process is created and the __main__ file is imported as a library into the local namespace, so all your functions, classes, and module-level variables are accessible (sans any modifications that result from if __name__ == "__main__":). Then Python creates a connection between the processes (a pipe) over which it can send which function to call and the arguments to call it with. Everything passing through this pipe must be pickled, then unpickled.

Locks specifically are re-created when unpickling by asking the operating system for a lock with a specific name (the name was created in the parent process when the lock was constructed, then sent across via pickle). This is how the two locks are synchronized: they are backed by an object the operating system controls. Python then stores this lock along with some other data (the PyObject, as it were) in the memory of the new process. Calling id now gets the location of this struct, which is different because it was created by a different process in a different chunk of memory.
here's a quick example to convince you that a "spawn'ed" lock is still synchronized:
from multiprocessing import Process, Lock, set_start_method

def foo(lock):
    with lock:
        print(f'child process lock id: {id(lock)}')

if __name__ == "__main__":
    set_start_method("spawn")
    lock = Lock()
    print(f'parent process lock id: {id(lock)}')
    lock.acquire()  # lock the lock so child has to wait
    p = Process(target=foo, args=(lock,))
    p.start()
    input('press enter to unlock the lock')
    lock.release()
    p.join()
The different ids are the different PyObject locations, but have little to do with the underlying mutex. I am not aware of a direct way to inspect the underlying lock which the operating system manages.
I know that child processes won't see changes made after a fork/spawn, and Windows processes don't inherit globals not using shared memory. But what I have is a situation where the children can't see changes to a global variable in shared memory made before the fork/spawn.
Simple demonstration:
from multiprocessing import Process, Value

global foo
foo = Value('i', 1)

def printfoo():
    global foo
    with foo.get_lock():
        print(foo.value)

if __name__ == '__main__':
    with foo.get_lock():
        foo.value = 2
    Process(target=printfoo).start()
On Linux and MacOS, this displays the expected 2. On Windows, it displays 1, even though the modification to the global Value is made before the call to Process. How can I make the change visible to the child process on Windows, too?
The problem here is that your child process creates a new shared value, rather than using the one the parent created. Your parent process needs to explicitly send the Value to the child, for example, as an argument to the target function:
from multiprocessing import Process, Value

def use_shared_value(val):
    val.value = 2

if __name__ == '__main__':
    val = Value('i', 1)
    p = Process(target=use_shared_value, args=(val,))
    p.start()
    p.join()
    print(val.value)
(Unfortunately, I don't have a Windows Python install to test this on.)
Child processes cannot inherit globals on Windows, regardless of whether those globals are initialized to multiprocessing.Value instances. multiprocessing.Value does not change the fact that the child re-executes your file, and re-executing the Value construction doesn't use the shared resources the parent allocated.
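If many tasks need the same Value, another option is to hand it to each worker once through a Pool initializer, since initializer arguments travel over the process-creation path (the allowed "inheritance" route) rather than the task queue. A minimal sketch; the names init_worker and bump are mine, not from the question:

```python
from multiprocessing import Pool, Value

shared_val = None  # installed per worker by the initializer below

def init_worker(val):
    # Runs once in each worker process; stashes the shared Value globally.
    global shared_val
    shared_val = val

def bump(_):
    with shared_val.get_lock():
        shared_val.value += 1

if __name__ == '__main__':
    val = Value('i', 0)
    with Pool(2, initializer=init_worker, initargs=(val,)) as pool:
        pool.map(bump, range(10))
    print(val.value)  # 10: every increment landed in the one shared Value
```

Passing the Value directly as a pool task argument would instead raise a RuntimeError, because synchronized objects may only be shared at process creation.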
import random
import os
from multiprocessing import Process

num = random.randint(0, 100)

def show_num():
    print("pid:{}, num is {}".format(os.getpid(), num))

if __name__ == '__main__':
    print("pid:{}, num is {}".format(os.getpid(), num))
    p = Process(target=show_num)
    p.start()
    p.join()
    print('Parent Process Stop')
The above code shows the basic usage of creating a process. If I run this script in a Windows environment, the variable num is different in the parent and child processes. However, num is the same when the script runs in a Linux environment.
I understand that their mechanisms for creating processes are different; for example, Windows doesn't have a fork method.
But can someone give me a more detailed explanation of the difference?
Thank you very much.
The difference explaining the behavior described in your post is exactly what you mentioned: the start method used for creating the process. On Unix-style OSs, the default is fork. On Windows, the only available option is spawn.
fork
As described in the Overview section of this Wiki page (in a slightly different order):
The fork operation creates a separate address space for the child. The
child process has an exact copy of all the memory segments of the
parent process.
The child process calls the exec system call to overlay itself with the
other program: it ceases execution of its former program in favor of
the other.
This means that, when using fork, the child process already has the variable num in its address space and uses it. random.randint(0, 100) is not called again.
spawn
As the multiprocessing docs describe:
The parent process starts a fresh python interpreter process.
In this fresh interpreter process, the module from which the child is spawned is executed. Oversimplified, this does python.exe your_script.py a second time. Hence, a new variable num is created in the child process by assigning the return value of another call to random.randint(0, 100) to it. Therefore it is very likely that the content of num differs between the processes. This is, by the way, also the reason why you absolutely need to safeguard the instantiation and start of a process with the if __name__ == '__main__' idiom when using spawn as the start method, otherwise you end up with:
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
You can use spawn in POSIX OSs as well, to mimic the behavior you have seen on Windows:
import random
import os
from multiprocessing import Process, set_start_method
import platform

num = random.randint(0, 100)

def show_num():
    print("pid:{}, num is {}".format(os.getpid(), num))

if __name__ == '__main__':
    print(platform.system())
    # change the start method for new processes to spawn
    set_start_method("spawn")
    print("pid:{}, num is {}".format(os.getpid(), num))
    p = Process(target=show_num)
    p.start()
    p.join()
    print('Parent Process Stop')
Output:
Linux
pid:26835, num is 41
pid:26839, num is 13
Parent Process Stop
I am running a program which loads 20 GB data to the memory at first. Then I will do N (> 1000) independent tasks where each of them may use (read only) part of the 20 GB data. I am now trying to do those tasks via multiprocessing. However, as this answer says, the entire global variables are copied for each process. In my case, I do not have enough memory to perform more than 4 tasks as my memory is only 96 GB. I wonder if there is any solution to this kind of problem so that I can fully use all my cores without consuming too much memory.
In Linux, forked processes have a copy-on-write view of the parent address space. Forking is lightweight, and the same program runs in both the parent and the child, except that the child takes a different execution path. As a small example,
import os

var = "unchanged"
pid = os.fork()

if pid:
    print('parent:', os.getpid(), var)
    os.waitpid(pid, 0)
else:
    print('child:', os.getpid(), var)
    var = "changed"

# show parent and child views
print(os.getpid(), var)
Results in
parent: 22642 unchanged
child: 22643 unchanged
22643 changed
22642 unchanged
Applying this to multiprocessing, in this example I load data into a global variable. Since Python pickles the data sent to the process pool, I make sure it pickles something small, like an index, and have the worker get the global data itself.
import multiprocessing as mp
import os

my_big_data = "well, bigger than this"

def worker(index):
    """get char in big data"""
    return my_big_data[index]

if __name__ == "__main__":
    pool = mp.Pool(os.cpu_count())
    for c in pool.imap_unordered(worker, range(len(my_big_data)), chunksize=1):
        print(c)
Windows does not have a fork-and-exec model for running programs. It has to start a new instance of the python interpreter and clone all relevant data to the child. This is a heavy lift!
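On platforms without fork, one workaround is to have each worker load the big data itself exactly once, via a Pool initializer, so that only a small index still travels through the task pickle. A sketch under stated assumptions: load_data, init_worker, and the tiny stand-in dataset are mine, standing in for whatever loads the real 20 GB:

```python
import multiprocessing as mp

_worker_data = None  # filled in once per worker by the initializer

def load_data():
    # Stand-in for loading the real dataset (e.g. reading or mmap-ing a file).
    return "well, bigger than this"

def init_worker():
    # Runs once in each worker process, so the data is loaded per worker,
    # not pickled per task.
    global _worker_data
    _worker_data = load_data()

def worker(index):
    """get char in big data, loaded locally in this worker"""
    return _worker_data[index]

if __name__ == "__main__":
    data = load_data()  # the parent's own copy
    with mp.Pool(2, initializer=init_worker) as pool:
        print("".join(pool.map(worker, range(len(data)))))
```

The trade-off is that each worker holds its own copy, so this helps only when the worker count times the data size fits in memory, or when load_data can memory-map a file instead of reading it in.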
In Python 2.7, is there a way to identify whether the current forked/spawned process is a child process instance (as opposed to having been started as a regular process)? My goal is to set a global variable differently if it's a child process (e.g. create a pool with size 0 for the child, else a pool with some number greater than 0).
I can't pass a parameter into the function (being called to execute in the child process), as even before the function is invoked the process would have been initialized, and hence the global variable set (especially for a spawned process).
Also, I am not in a position to use freeze_support (unless of course I have misunderstood how to use it), as my application is running in a web service container (Flask). Hence there's no main method.
Any help will be much appreciated.
Sample code that goes into an infinite loop if you run it on Windows:
from multiprocessing import Pool, freeze_support

p = Pool(5)  # This should be created only in the parent process and not the child process

def f(x):
    return x*x

if __name__ == '__main__':
    freeze_support()
    print(p.map(f, [1, 2, 3]))
I would suggest restructuring your program to something more like my example code below. You mentioned that you don't have a main function, but you can create a wrapper that handles your pool:
from multiprocessing import Pool, freeze_support

def f(x):
    return x*x

def handle_request():
    p = Pool(5)  # pool will only be in the parent process
    print(p.map(f, [1, 2, 3]))
    p.close()  # remember to clean up the resources you use
    p.join()
    return

if __name__ == '__main__':
    freeze_support()  # do you really need this?
    # start your web service here and make it use `handle_request` as the callback
    # when a request needs to be serviced
It sounds like you are having a bit of an XY problem. You shouldn't be making a pool of processes global. It's just bad. You're giving your subprocesses access to their own process objects, which allows you to accidentally do bad things, like make a child process join itself. If you create your pool within a wrapper that is called for each request, then you don't need to worry about a global variable.
In the comments, you mentioned that you want a persistent pool. There is indeed some overhead to creating a pool on each request, but it's far safer than having a global pool. Also, you now have the capability to handle multiple requests simultaneously, assuming your web service handles each request in its own thread/process, without multiple requests trampling on each other by trying to use the same pool. I would strongly suggest you try this approach, and if it doesn't meet your performance specifications, you can look at optimizing it in other ways (i.e., without a global pool) to meet your spec.
One other note: multiprocessing.freeze_support() only needs to be called if you intend to bundle your scripts into a Windows executable. Don't use it if you are not doing that.
Move the pool creation into the main section to only create a multiprocessing pool once, any only in the main process:
from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    p = Pool(5)
    print(p.map(f, [1, 2, 3]))
This works because the only process that executes under the __main__ name is the original process. Spawned processes run with the __mp_main__ module name.
create a pool with size 0 for child
The child processes should never start a new multiprocessing pool. Only handle your processes from a single entry point.
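For completeness, modern Python does offer a supported way to ask whether the current process is a multiprocessing child. A sketch using multiprocessing.parent_process(), which exists only in Python 3.8+ and so is not available in the 2.7 the question targets:

```python
import multiprocessing as mp

def role():
    # parent_process() returns None in the original process and a handle
    # to the parent in any process multiprocessing started (Python 3.8+).
    return "child" if mp.parent_process() is not None else "parent"

def report(q):
    # Runs in the child; sends back what the child thinks it is.
    q.put(role())

if __name__ == '__main__':
    q = mp.Queue()
    p = mp.Process(target=report, args=(q,))
    p.start()
    print("in parent:", role())  # "parent"
    print("in child:", q.get())  # "child"
    p.join()
```

Even with this check available, the advice above stands: don't use it to conjure up per-process global pools; keep pool creation at a single entry point in the parent.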