I have a class variable declared as a list that I want to update from a method declared within that class. However, since this method processes a large amount of data, I am using multiprocessing to invoke it, and hence I need to put a lock on the class variable before updating it. I am unable to figure out how to acquire such a lock and update the class variable. If it matters, I am only creating one object of the said class at any given time.
Because of Python's GIL you typically reach for multiprocessing rather than threads, but separate processes do not share memory, so by default it only suits completely independent tasks.
But you can still make it happen by using multiprocessing's shared Array/Value:
from https://docs.python.org/2/library/multiprocessing.html#sharing-state-between-processes
from multiprocessing import Process, Value, Array

def f(n, a):
    n.value = 3.1415927
    for i in range(len(a)):
        a[i] = -a[i]

if __name__ == '__main__':
    num = Value('d', 0.0)
    arr = Array('i', range(10))

    p = Process(target=f, args=(num, arr))
    p.start()
    p.join()

    print num.value
    print arr[:]
Now, as you asked, you need to ensure that different processes won't access the same variable at the same time, and for that you use a Lock. Helpfully, all the shared variables available in the multiprocessing module come paired with a Lock.
To access the lock:
num.acquire() # get the lock
# do stuff
num.release() # don't forget to release it
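Since you want to update a class variable from a method, note that the synchronized wrapper also exposes its lock as a context manager via get_lock(), which is harder to forget to release. A minimal sketch, assuming a made-up Counter class with a shared integer:

from multiprocessing import Process, Value

class Counter(object):
    def __init__(self):
        # 'i' -> C int; Value() comes with its own lock by default
        self.total = Value('i', 0)

    def add(self, amount):
        # get_lock() returns the lock paired with the shared value;
        # the with-block acquires it and releases it even on errors
        with self.total.get_lock():
            self.total.value += amount

def worker(counter):
    for _ in range(1000):
        counter.add(1)

if __name__ == '__main__':
    c = Counter()
    procs = [Process(target=worker, args=(c,)) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(c.total.value)  # 4000 if the increments were properly locked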
I hope this helps.
If you're using the multiprocessing module (as opposed to multithreading, which is different), then unless I'm mistaken, the multiple processes forked don't share memory and each process would have its own copy of your class. This would mean that a lock would not be necessary, but it would also mean that the class attribute is not shared like you want it to be.
The multiprocessing module does offer several ways to allow communication between processes, including shared array objects. Perhaps this is what you're looking for.
Depending on what you're doing, you might also consider using the master-worker pattern, where you create a worker class with methods to manipulate your data, spawn several processes to run this class, and then dispatch datasets to the workers from your main process using the Queue class from the multiprocessing module.
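For instance, a rough sketch of that master-worker layout could look like the following (process_item is a made-up stand-in for your real computation):

from multiprocessing import Process, Queue

def process_item(item):
    # placeholder for the real per-item computation
    return item * item

def worker(task_queue, result_queue):
    # each worker pulls tasks until it sees the None sentinel
    for item in iter(task_queue.get, None):
        result_queue.put(process_item(item))

if __name__ == '__main__':
    tasks, results = Queue(), Queue()
    workers = [Process(target=worker, args=(tasks, results)) for _ in range(4)]
    for w in workers:
        w.start()
    for item in range(20):
        tasks.put(item)
    for _ in workers:          # one sentinel per worker shuts them down
        tasks.put(None)
    collected = [results.get() for _ in range(20)]
    for w in workers:
        w.join()
    print(collected)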
At first let me show you the current setup I have:
import multiprocessing.pool
from contextlib import closing
import os

def big_function(param):
    process(another_module.global_variable[param])

def dispatcher():
    # sharing a read-only global variable, taking advantage of Unix's
    # copy-on-write policy
    # https://stackoverflow.com/questions/19366259/
    another_module.global_variable = huge_list

    # send indices
    params = range(len(another_module.global_variable))

    with closing(multiprocessing.pool.Pool(processes=os.cpu_count())) as p:
        multiprocessing_result = list(p.imap_unordered(big_function, params))
    return multiprocessing_result
Here I use a shared variable, updated before creating the process pool, which contains the huge data. That indeed gained me a speedup, so the data seems not to be pickled now. Also, this variable belongs to the scope of an imported module (if that's important).
When I tried to create setup like this:
another_module.global_variable = []

p = multiprocessing.pool.Pool(processes=os.cpu_count())

def dispatcher():
    # sharing a read-only global variable, taking advantage of Unix's
    # copy-on-write policy
    # https://stackoverflow.com/questions/19366259/
    another_module.global_variable = huge_list

    # send indices
    params = range(len(another_module.global_variable))

    multiprocessing_result = list(p.imap_unordered(big_function, params))
    return multiprocessing_result
p "remembered" that global shared list was empty and refused to use new data when was called from inside the dispatcher.
Now here is the problem: processing ~600 data objects on 8 cores with the first setup above, my parallel computation runs 8 sec, while single-threaded it works 12 sec.
This is what I think: because multiprocessing pickles data, and I need to re-create the processes each time, I also need to pickle the function big_function(), so I lose time on that. The situation with the data was partially solved by using the global variable (but I still need to recreate the pool on each update of it).
What can I do about big_function() (which depends on many other functions from other modules, numpy, etc.)? Can I create os.cpu_count() copies of it once and for all, and somehow feed new data into them and receive results, reusing the workers?
Just to go over the 'remembering' issue:

another_module.global_variable = []

p = multiprocessing.pool.Pool(processes=os.cpu_count())

def dispatcher():
    another_module.global_variable = huge_list
    params = range(len(another_module.global_variable))
    multiprocessing_result = list(p.imap_unordered(big_function, params))
    return multiprocessing_result
What seems to be the problem is the moment when you create the Pool instance.
Why is that?
It's because when you create an instance of Pool, it sets up the workers (by default as many as there are CPU cores), and they are all started (forked) at that moment. That means the workers get a copy of the parent's global state (another_module.global_variable among everything else), and with the copy-on-write policy, when you later update the value of another_module.global_variable you change it only in the parent's process. The workers still reference the old value. That is why you have a problem with it.
Here are a couple of links that give more explanation on this: this and this.
Here is a small snippet where you can swap the line where the global variable's value is changed with the line where the process is started, and check what is printed in the child process.
from __future__ import print_function
import multiprocessing as mp

glob = dict()
glob[0] = [1, 2, 3]

def printer(a):
    print(globals())
    print(a, glob[0])

if __name__ == '__main__':
    p = mp.Process(target=printer, args=(1,))
    p.start()
    glob[0] = 'test'
    p.join()
This is Python 2.7 code, but it works on Python 3.6 too.
What would be the solution for this issue?
Well, go back to the first solution: update the value of the imported module's variable and then create the pool of processes.
Now for the real issue: the lack of speedup.
Here is the interesting part from the documentation on how functions are pickled:
Note that functions (built-in and user-defined) are pickled by “fully
qualified” name reference, not by value. This means that only the
function name is pickled, along with the name of the module the
function is defined in. Neither the function’s code, nor any of its
function attributes are pickled. Thus the defining module must be
importable in the unpickling environment, and the module must contain
the named object, otherwise an exception will be raised.
This means that pickling your function should not be a time-wasting process, or at least not by itself. What causes the lack of speedup is that each of the ~600 data objects in the list you pass to the imap_unordered call is sent individually to a worker process. Once again, the underlying implementation of multiprocessing.Pool may be the cause of this issue.
If you go deeper into the multiprocessing.Pool implementation, you will see that two threads using a Queue handle communication between the parent and all the child (worker) processes. Because of this, and because all the processes constantly require arguments for the function and constantly return responses, you end up with a very busy parent process. That is why 'a lot' of time is spent doing 'dispatching' work, passing data to and from the worker processes.
What to do about this?
Try to increase the number of data objects that are processed in a worker process at any one time. In your example, you pass one data object after another, and you can be sure that each worker process is processing exactly one data object at any time. Why not increase the number of data objects you pass to a worker process? That way you can make each process busier, handling 10, 20 or even more data objects at once. From what I can see, imap_unordered has a chunksize argument. It's set to 1 by default. Try increasing it. Something like this:
import multiprocessing.pool
from contextlib import closing
import os

def big_function(param):
    process(another_module.global_variable[param])

def dispatcher():
    # sharing a read-only global variable, taking advantage of Unix's
    # copy-on-write policy
    # https://stackoverflow.com/questions/19366259/
    another_module.global_variable = huge_list

    # send indices
    params = range(len(another_module.global_variable))

    with closing(multiprocessing.pool.Pool(processes=os.cpu_count())) as p:
        # chunksize=10 hands the indices to the workers in batches of 10
        multiprocessing_result = list(p.imap_unordered(big_function, params, chunksize=10))
    return multiprocessing_result
A couple of pieces of advice:
I see that you create params as a list of indexes that you use to pick a particular data object in big_function. Instead, you could create tuples that represent the first and last index of a slice and pass those to big_function, so that each call handles a whole range of data objects. This is another way of increasing the chunk of work per call, and an alternative to the chunksize approach above; see the sketch after this list.
Unless you explicitly want Pool(processes=os.cpu_count()), you can omit the argument; by default it uses the number of CPU cores.
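As a rough sketch of that range-tuple idea (illustrative only; process, another_module and huge_list come from your own code):

import multiprocessing.pool
import os
from contextlib import closing

def big_function(index_range):
    # each call now handles a whole slice of the data
    start, end = index_range
    return [process(another_module.global_variable[i]) for i in range(start, end)]

def dispatcher(batch_size=20):
    another_module.global_variable = huge_list
    n = len(another_module.global_variable)
    # split the index space into (start, end) tuples
    params = [(i, min(i + batch_size, n)) for i in range(0, n, batch_size)]
    with closing(multiprocessing.pool.Pool(processes=os.cpu_count())) as p:
        chunked_results = p.imap_unordered(big_function, params)
        # flatten the per-chunk lists back into one result list
        return [r for chunk in chunked_results for r in chunk]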
Sorry for the length of the answer and any typos that might have sneaked in.
I am currently creating a class which is supposed to execute some methods in a multi-threaded way, using the multiprocessing module. I execute the real computation using a Pool of n workers. Now I want to assign each of the currently n active workers an index between 0 and n for some other calculation. To do this, I wanted to use a shared Queue to assign the indices in such a way that at any time no two workers have the same id. To share the same Queue inside the class between the different threads, I wanted to store it inside a Manager.Namespace(). But in doing this, I got some problems with the Queue. Therefore, I created a minimal version of my problem and ended up with something like this:
from multiprocess import Process, Queue, Manager, Pool, cpu_count

class A(object):
    def __init__(self):
        manager = Manager()
        self.ns = manager.Namespace()
        self.ns.q = manager.Queue()

    def foo(self):
        for i in range(10):
            print(i)
            self.ns.q.put(i)
            print(self.ns.q.get())
            print(self.ns.q.qsize())

a = A()
a.foo()
In this code, the execution hangs at the second print statement (on the q.get() call); therefore I think no data is actually written to the Queue. When I remove the Namespace-related parts, the code works flawlessly. Is this the intended behaviour of the multiprocessing objects, and am I doing something wrong? Or is this some kind of bug?
Yes, you should not use a Namespace here. When you put a Queue object into manager.Namespace(), each process gets a new Queue instance, and the writers/readers of those newly created queue objects have no connection with the parent process, so no message will be received by the worker processes. Share the Queue directly instead.
By the way, you mention "thread" many times, but in the context of the multiprocess module, a worker is a process, not a thread.
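A minimal sketch of that fix, keeping the question's class A but storing the manager's Queue directly on the instance (the question imports from the third-party multiprocess package; the standard-library multiprocessing behaves the same way here):

from multiprocess import Manager

class A(object):
    def __init__(self):
        manager = Manager()
        # keep a direct reference to the Queue proxy; no Namespace in between
        self.q = manager.Queue()

    def foo(self):
        for i in range(10):
            self.q.put(i)
            print(self.q.get())
            print(self.q.qsize())

a = A()
a.foo()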
I'm kind of new to multiprocessing. However, assume that we have a program as below. The program seems to work fine. Now to the question. In my opinion we will have 4 instances of SomeKindOfClass with the same name (a) at the same time. How is that possible? Moreover, is there a potential risk with this kind of programming?
from multiprocessing.dummy import Pool
import numpy
from theFile import SomeKindOfClass

n = 8
allOutputs = numpy.zeros(n)

def work(index):
    a = SomeKindOfClass()
    a.theSlowFunction()
    allOutputs[index] = a.output

pool = Pool(processes=4)
pool.map(work, range(0, n))
The name a is only local in scope within your work function, so there is no conflict of names here. Internally Python keeps track of each class instance with a unique identifier. If you wanted to check this, you could look up the object id using the id function:
print(id(a))
I don't see any issues with your code.
Actually, you will have 8 instances of SomeKindOfClass (one for each worker), but only 4 will ever be active at the same time.
multiprocessing vs multiprocessing.dummy
Your program will only work if you continue to use the multiprocessing.dummy module, which is just a wrapper around the threading module. You are still using "python threads" (not separate processes). "Python threads" share the same global state; "Processes" don't. Python threads also share the same GIL, so they're still limited to running one python bytecode statement at a time, unlike processes, which can all run python code simultaneously.
If you were to change your import to from multiprocessing import Pool, you would notice that the allOutputs array remains unchanged after all the workers finish executing (you would also likely get an error because you're creating the pool in the global scope; you should probably put that inside a main() function). This is because multiprocessing makes a new copy of the entire global state when it creates a new process. When a worker modifies the global allOutputs, it is modifying a copy of that initial global state. When the process ends, nothing is returned to the main process, and the global state of the main process remains unchanged.
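To see that contrast concretely, here is a small, self-contained toy (not from the question; the worker just writes into a module-level list). With the thread-backed dummy pool the parent sees the writes; with real processes it does not:

import multiprocessing
import multiprocessing.dummy

results = [0] * 4

def work(index):
    results[index] = index + 1  # mutate the "global" list

if __name__ == '__main__':
    with multiprocessing.dummy.Pool(2) as p:   # threads
        p.map(work, range(4))
    print(results)   # [1, 2, 3, 4]: threads share the parent's memory

    results[:] = [0] * 4
    with multiprocessing.Pool(2) as p:         # processes
        p.map(work, range(4))
    print(results)   # [0, 0, 0, 0]: each worker changed only its own copy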
Sharing State Between Processes
Unlike threads, processes don't share the same memory.
If you want to share state between processes, you have to explicitly declare shared variables and pass them to each process, or use pipes or some other method to allow the worker processes to communicate with each other or with the main process.
There are several ways to do this, but perhaps the simplest is to use the Manager class:
import multiprocessing

def worker(args):
    index, array = args
    a = SomeKindOfClass()
    a.some_expensive_function()
    array[index] = a.output

def main():
    n = 8
    manager = multiprocessing.Manager()
    array = manager.list([0] * n)
    pool = multiprocessing.Pool(4)
    pool.map(worker, [(i, array) for i in range(n)])
    print(array)

if __name__ == '__main__':
    main()
You can declare class instances inside the pool workers, because each instance gets its own place in memory, so they don't conflict. The problem is if you declare a class instance first and then try to pass that one instance into multiple pool workers: then each worker has a pointer to the same place in memory, and it will fail (this can be handled, just not this way).
Basically pool workers must not have overlapping memory anywhere. As long as the workers don't try to share memory somewhere, or perform operations that may result in collisions (like printing to the same file), there shouldn't be any problem.
Make sure whatever they're supposed to do (like something you want printed to a file, or added to a broader namespace somewhere) is returned as a result at the end, which you then iterate through.
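For instance, a sketch of that return-and-collect style might look like this (the Worker class and its methods are made up; substitute your own computation):

from multiprocessing import Pool

class Worker(object):
    def __init__(self, value):
        self.value = value

    def compute(self):
        return self.value ** 2

def work(value):
    # build the instance inside the worker and return its output
    return Worker(value).compute()

if __name__ == '__main__':
    with Pool(4) as pool:
        outputs = pool.map(work, range(8))
    # collect/print/write the results in the parent, not in the workers
    for out in outputs:
        print(out)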
If you are using multiprocessing you shouldn't worry: processes don't share memory (by default). So there is no risk in having several independent objects of class SomeKindOfClass; each of them will live in its own process. How does it work? Python runs your program, and after that it starts 4 child processes. That's why it's very important to have the if __name__ == '__main__' guard before pool.map(work, range(0, n)). Otherwise (at least on platforms that spawn rather than fork) you will get an infinite loop of process creation.
Problems could arise if SomeKindOfClass keeps state on disk - for example, writes something to a file or reads from it.
In Python 2.7, is there a way to identify whether the current forked/spawned process is a child process instance (as opposed to having been started as a regular process)? My goal is to set a global variable differently if it's a child process (e.g. create a pool of size 0 for a child, and a pool with some number greater than 0 otherwise).
I can't pass a parameter into the function (the one being called to execute in the child process), as even before the function is invoked the process would have been initialized, and hence the global variable would already have been set (especially for a spawned process).
Also, I am not in a position to use freeze_support (unless of course I have misunderstood how to use it), as my application runs in a web service container (Flask). Hence there's no main method.
Any help will be much appreciated.
Sample code that goes into an infinite loop if you run it on Windows:
from multiprocessing import Pool, freeze_support

p = Pool(5)  # This should be created only in the parent process and not the child process

def f(x):
    return x*x

if __name__ == '__main__':
    freeze_support()
    print(p.map(f, [1, 2, 3]))
I would suggest restructuring your program to something more like my example code below. You mentioned that you don't have a main function, but you can create a wrapper that handles your pool:
from multiprocessing import Pool, freeze_support

def f(x):
    return x*x

def handle_request():
    p = Pool(5)  # pool will only be in the parent process
    print(p.map(f, [1, 2, 3]))
    p.close()  # remember to clean up the resources you use
    p.join()
    return

if __name__ == '__main__':
    freeze_support()  # do you really need this?
    # start your web service here and make it use `handle_request` as the callback
    # when a request needs to be serviced
It sounds like you are having a bit of an XY problem. You shouldn't be making a pool of processes global. It's just bad. You're giving your subprocesses access to their own process objects, which allows you to accidentally do bad things, like make a child process join itself. If you create your pool within a wrapper that is called for each request, then you don't need to worry about a global variable.
In the comments, you mentioned that you want a persistent pool. There is indeed some overhead to creating a pool on each request, but it's far safer than having a global pool. You also gain the ability to handle multiple requests simultaneously, assuming your web service handles each request in its own thread/process, without multiple requests trampling on each other by trying to use the same pool. I would strongly suggest you try this approach, and if it doesn't meet your performance requirements, you can look at optimizing it in other ways (i.e., still without a global pool) to meet your spec.
One other note: multiprocessing.freeze_support() only needs to be called if you intend to bundle your scripts into a Windows executable. Don't use it if you are not doing that.
Move the pool creation into the main section so that the multiprocessing pool is created only once, and only in the main process:
from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    p = Pool(5)
    print(p.map(f, [1, 2, 3]))
This works because the only process that runs under the __main__ name is the original process. Spawned processes run under the __mp_main__ module name.
create a pool with size 0 for child
The child processes should never start a new multiprocessing pool. Only handle your processes from a single entry point.
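If you really do need to detect at runtime whether code is running in the parent or in a worker, one possible check (a sketch, not necessarily the best design) is multiprocessing.current_process(), whose name is 'MainProcess' only in the original process:

import multiprocessing

def where_am_i(x):
    # worker processes get names like 'PoolWorker-1' / 'ForkPoolWorker-1'
    return (multiprocessing.current_process().name, x * x)

if __name__ == '__main__':
    print(multiprocessing.current_process().name)  # 'MainProcess'
    p = multiprocessing.Pool(2)
    print(p.map(where_am_i, [1, 2, 3]))
    p.close()
    p.join()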
I have an object that connects to a remote websocket server. I need to run a parallel process at the same time, but I don't want to create a new connection to the server. Since threads are the easier way to do this, that is what I have been using so far. However, I have been getting huge latency because of the GIL. Can I achieve the same thing as with threads, but with multiple processes in parallel?
This is the code that I have:
class WebSocketApp(object):
    def on_open(self):
        # Create another thread to make sure the commands are always being read
        print "Creating thread..."
        try:
            thread.start_new_thread( self.read_commands,() )
        except:
            print "Error: Unable to start thread"
Is there an equivalent way to do this with multiprocesses?
Thanks!
The direct equivalent is
import multiprocessing

class WebSocketApp(object):
    def on_open(self):
        # Create another process to make sure the commands are always being read
        print "Creating process..."
        try:
            multiprocessing.Process(target=self.read_commands,).start()
        except:
            print "Error: Unable to start process"
However, this doesn't address the "shared memory" aspect, which has to be handled a little differently than it is with threads, where you can just use global variables. You haven't really specified what objects you need to share between processes, so it's hard to say exactly what approach you should take. The multiprocessing documentation does cover ways to deal with shared state, however. Do note that in general it's better to avoid shared state if possible and just explicitly pass state between the processes, either as an argument to the Process constructor or via something like a Queue.
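As a rough illustration of that explicit-passing style in the context of your class (read_commands, on_message and on_close are invented names here, standing in for whatever callbacks your websocket library gives you):

from multiprocessing import Process, Queue

def read_commands(command_queue):
    # runs in the child: consume commands until the None sentinel arrives
    for command in iter(command_queue.get, None):
        print('handling %r' % command)

class WebSocketApp(object):
    def on_open(self):
        self.commands = Queue()
        self.reader = Process(target=read_commands, args=(self.commands,))
        self.reader.start()

    def on_message(self, message):
        # hand the work to the child instead of sharing self with it
        self.commands.put(message)

    def on_close(self):
        self.commands.put(None)   # tell the reader process to stop
        self.reader.join()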
You sure can, use something along the lines of:
from multiprocessing import Process

class WebSocketApp(object):
    def on_open(self):
        # Create another process to make sure the commands are always being read
        print "Creating process..."
        try:
            p = Process(target = WebSocketApp.read_commands, args = (self, )) # Add other arguments to this tuple
            p.start()
        except:
            print "Error: Unable to start process"
It is important to note, however, that as soon as the object is sent to the other process, the two self objects in the different processes diverge and represent different objects. If you wish to communicate, you will need to use something like the Queue or Pipe included in the multiprocessing module.
You may need to keep a reference to every process you start (p in this case) in your main thread, in order to be able to signal that your program is terminating (a still-running child process will otherwise appear to hang the parent when the parent exits), but that depends on the nature of your program.
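For example, a Pipe gives you a simple two-way channel between the parent and the child; a small sketch (the command strings are made up):

from multiprocessing import Process, Pipe

def read_commands(conn):
    # child end of the pipe: echo every command back until 'quit'
    while True:
        command = conn.recv()
        if command == 'quit':
            break
        conn.send('ack: %s' % command)

if __name__ == '__main__':
    parent_conn, child_conn = Pipe()
    p = Process(target=read_commands, args=(child_conn,))
    p.start()
    parent_conn.send('hello')
    print(parent_conn.recv())   # 'ack: hello'
    parent_conn.send('quit')
    p.join()                    # keep the reference so we can join at shutdown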
If you wish to keep the object the same, you can do one of a few things:
Make all of your object properties either single values or arrays, and then do something similar to this:
from multiprocessing import Process, Value, Array

class WebSocketApp(object):
    def __init__(self):
        self.my_value = Value('d', 0.3)
        self.my_array = Array('i', [4, 10, 4])

    # -- Snip --
These values should then work as shared memory. The types are quite restrictive, though (you must declare their C types up front).
A different approach is to use a Manager:
from multiprocessing import Process, Manager

class WebSocketApp(object):
    def __init__(self):
        self.my_manager = Manager()
        self.my_list = self.my_manager.list()
        self.my_dict = self.my_manager.dict()

    # -- Snip --
And then self.my_list and self.my_dict act as a shared-memory list and dictionary respectively.
However, the types for both of these approaches can be restrictive, so you may have to roll your own technique with a Queue and a Semaphore. But it depends on what you're doing.
Check out the multiprocessing documentation for more information.