How to control the maximum number of concurrently running processes? - python

There are 5 files: main.py, worker.py, cat.py, dog.py and rabbit.py. cat, dog and rabbit inherit from Worker and implement worker_run().
In main.py I prepare 3 processes to execute, but I don't know how to limit the number of processes running at the same time (e.g. 2 processes).
I have tried multiprocessing.Pool, but it seems to only support functions defined outside a class (?).
main.py:
from multiprocessing import Process
from cat import *
from dog import *
from rabbit import *
p1 = cat()
p2 = dog()
p3 = rabbit()
p1.start()
p2.start()
p3.start()
p1.join()
p2.join()
p3.join()
worker.py:
import multiprocessing

class Worker(multiprocessing.Process):
    def __init__(self):
        multiprocessing.Process.__init__(self)
        print "Init"
        self.value = None

    def run(self):
        print "Running"
        self.worker_run()

    # abc.abstractmethod
    def worker_run(self):
        """ implement """
        return
cat.py:
from worker import *

class cat(Worker):
    def worker_run(self):
        for i in range(10000):
            print "cat run"
dog.py:
from worker import *

class dog(Worker):
    def worker_run(self):
        for i in range(10000):
            print "dog run"
rabbit.py:
from worker import *

class rabbit(Worker):
    def worker_run(self):
        for i in range(10000):
            print "rabbit run"

If you want to let at most two methods run concurrently and block the third one until one of the others has finished, you can use a Semaphore.
You must pass the semaphore to the worker objects so that they can acquire it.
In your main file you create the semaphore and pass it to the objects:
from multiprocessing import Process, Semaphore
from cat import *
from dog import *
from rabbit import *
semaphore = Semaphore(2) # at most 2 processes running concurrently
p1 = cat(semaphore)
p2 = dog(semaphore)
p3 = rabbit(semaphore)
p1.start()
p2.start()
p3.start()
p1.join()
p2.join()
p3.join()
You can then modify the Worker class to acquire the semaphore before running worker_run:
class Worker(multiprocessing.Process):
    def __init__(self, semaphore):
        multiprocessing.Process.__init__(self)
        print "Init"
        self.value = None
        self.semaphore = semaphore

    def run(self):
        with self.semaphore:
            print "Running"
            self.worker_run()

    # abc.abstractmethod
    def worker_run(self):
        """ implement """
        return
This should ensure that at most 2 worker_run methods are running concurrently.
In fact, I believe you are making things more complex than they ought to be. You do not have to subclass Process; you can achieve exactly the same functionality using the target argument:
from multiprocessing import Process, Semaphore
from cat import Cat
from dog import Dog
from rabbit import Rabbit
semaphore = Semaphore(2)
cat = Cat()
dog = Dog()
rabbit = Rabbit()
def run(animal, sema):
    with sema:
        animal.worker_run()
cat_proc = Process(target=run, args=(cat, semaphore))
dog_proc = Process(target=run, args=(dog, semaphore))
rabbit_proc = Process(target=run, args=(rabbit, semaphore))
cat_proc.start()
dog_proc.start()
rabbit_proc.start()
cat_proc.join()
dog_proc.join()
rabbit_proc.join()
In fact with a little change you can get rid of the Semaphore and simply use the Pool object:
from multiprocessing import Pool
from cat import Cat
from dog import Dog
from rabbit import Rabbit
cat = Cat()
dog = Dog()
rabbit = Rabbit()
def run(animal):
    animal.worker_run()
pool = Pool(2)
pool.map(run, [cat, dog, rabbit])
The problem you had is that you cannot pass a method as the target argument, or as the callable to Pool.map, because methods cannot be pickled (see What can be pickled and unpickled?). The multiprocessing module uses the pickle protocol to communicate between processes, so everything it handles must be picklable.
In particular, the standard workaround for unpicklable methods is to use a module-level function to which you explicitly pass the instance as the first argument, as I did above. This is exactly what happens with method calls anyway, but there the interpreter does it automatically; in this case you have to handle it explicitly.
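As a quick illustration of that limitation (a minimal sketch of my own, assuming Python 2, which the print syntax above suggests; under Python 3 bound methods can usually be pickled by reference):

import pickle

class Foo(object):
    def bar(self):
        return 42

foo = Foo()
pickle.dumps(foo)      # the instance itself pickles fine
pickle.dumps(foo.bar)  # raises TypeError: can't pickle instancemethod objects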


How to share data between two processes?

How can I share values from one process with another?
Apparently I can do that through multithreading but not multiprocessing.
Multithreading is slow for my program.
I cannot show my exact code so I made this simple example.
from multiprocessing import Process
from threading import Thread
import time
class exp:
    def __init__(self):
        self.var1 = 0

    def func1(self):
        self.var1 = 5
        print(self.var1)

    def func2(self):
        print(self.var1)

if __name__ == "__main__":
    # multithreading
    obj1 = exp()
    t1 = Thread(target=obj1.func1)
    t2 = Thread(target=obj1.func2)
    print("multithreading")
    t1.start()
    time.sleep(1)
    t2.start()
    time.sleep(3)

    # multiprocessing
    obj = exp()
    p1 = Process(target=obj.func1)
    p2 = Process(target=obj.func2)
    print("multiprocessing")
    p1.start()
    time.sleep(2)
    p2.start()
Expected output:
multithreading
5
5
multiprocessing
5
5
Actual output:
multithreading
5
5
multiprocessing
5
0
I know there have been a couple of close votes against this question, but the supposed duplicate's answer does not really explain why the OP's program does not work as is, and the offered solution is not what I would propose. Hence:
Let's analyze what is happening. The obj = exp() instance is created by the main process. The execution of exp.func1 occurs in a different process/address space, and therefore the obj object must be serialized/deserialized into the address space of that process. In that new address space self.var1 comes across with the initial value of 0 and is then set to 5, but only the copy of obj that lives in the address space of process p1 is modified; the copy that exists in the main process is untouched. Then when you start process p2, another copy of obj is sent from the main process to the new process, still with self.var1 equal to 0.
The solution is to make self.var1 an instance of multiprocessing.Value, a special variable that lives in shared memory accessible to all processes. See the docs.
from multiprocessing import Process, Value

class exp:
    def __init__(self):
        self.var1 = Value('i', 0, lock=False)

    def func1(self):
        self.var1.value = 5
        print(self.var1.value)

    def func2(self):
        print(self.var1.value)

if __name__ == "__main__":
    # multiprocessing
    obj = exp()
    p1 = Process(target=obj.func1)
    p2 = Process(target=obj.func2)
    print("multiprocessing")
    p1.start()
    # No need to sleep, just wait for p1 to complete
    # before starting p2:
    # time.sleep(2)
    p1.join()
    p2.start()
    p2.join()
Prints:
multiprocessing
5
5
Note
Using shared memory for this particular problem is much more efficient than using a managed class, which is referenced by the "close" comment.
The assignment of 5 to self.var1.value is an atomic operation and does not need to be serialized. But if:
1. we were performing a non-atomic operation (one requiring multiple steps) such as self.var1.value += 1, and
2. multiple processes were performing this non-atomic operation in parallel, then:
3. we should create the value with a lock: self.var1 = Value('i', 0, lock=True), and
4. update the value under control of the lock: with self.var1.get_lock(): self.var1.value += 1 (see the sketch below).
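A minimal sketch of that locked-increment pattern (my own illustration; the worker count and iteration count are arbitrary):

from multiprocessing import Process, Value

def bump(counter, n):
    # each += is a read-modify-write, so it is guarded by the shared lock
    for _ in range(n):
        with counter.get_lock():
            counter.value += 1

if __name__ == "__main__":
    counter = Value('i', 0, lock=True)  # lock=True is also the default
    workers = [Process(target=bump, args=(counter, 10000)) for _ in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(counter.value)  # reliably 40000 because every increment held the lock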
There are several ways to do that: you can use shared memory, a FIFO, or message passing.
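For example, here is a minimal message-passing sketch (my own illustration, not the OP's code) in which the first child sends its result over a multiprocessing.Queue instead of mutating shared state:

from multiprocessing import Process, Queue

def func1(q):
    var1 = 5
    print(var1)
    q.put(var1)          # send the value to whoever wants it

def func2(q):
    var1 = q.get()       # receive the value produced by func1's process
    print(var1)

if __name__ == "__main__":
    q = Queue()
    p1 = Process(target=func1, args=(q,))
    p2 = Process(target=func2, args=(q,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()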

Change class object in Python multiprocessing

I feel like I am missing something very simple, but after reading the docs of the multiprocessing package I still cannot figure out how to achieve this. All I want is to set a class object (property) in a separate process and return it back to the main process. What I tried:
from multiprocessing import Process, Queue
class B:
    def __init__(self):
        self.attr = 'hello'

def worker(queue):
    b = B()
    setattr(b.__class__, 'prop', property(lambda b: b.attr))
    assert b.prop
    queue.put(b)

queue = Queue()
p = Process(target=worker, args=(queue,))
p.start()
res = queue.get()
p.join()
assert hasattr(res, 'prop')
So the property "prop" just disappears. What is the proper way to return it? I am using Windows 10.

Keep static class members in python multiprocessing

I'm trying to keep a "static" multiprocessing Queue defined on a class across multiple Processes, but it appears that this context is not copied to the newly spawned process. Is there a way to keep it without storing it on the derived process instances (i.e. without self.q = A.q)?
main.py
from class_b import B
if __name__ == "__main__":
    b = B()
    b.start()
    while True:
        pass
class_a.py
from multiprocessing import Process, Queue
class A(Process):
    q = Queue()

    def __init__(self) -> None:
        super().__init__(daemon=True)
class_b.py
from multiprocessing import Process
from class_a import A
class B(Process):
    def __init__(self):
        super().__init__(daemon=True)
        print(A.q)

    def run(self):
        print(A.q)
console
<multiprocessing.queues.Queue object at 0x000001F77851B280>
<multiprocessing.queues.Queue object at 0x0000023C420C2580>
When you import class_a to access A.q, the spawned child process imports it again and re-executes the class body, so each process ends up building its own Queue and there are two copies. You should instead create the queue as a local in "main" and pass it into B.
from class_b import B
from multiprocessing import Queue
if __name__ == "__main__":
    q = Queue()
    b = B(q)
    b.start()
    while True:
        pass
Then make B store that reference for itself:
from multiprocessing import Process
class B(Process):
    def __init__(self, q):
        super().__init__(daemon=True)
        print(q)
        self.q = q

    def run(self):
        print(self.q)

How can you code a nested concurrency in python?

My code has the following scheme:
class A():
    def evaluate(self):
        b = B()
        for i in range(30):
            b.run()

class B():
    def run(self):
        pass

if __name__ == '__main__':
    a = A()
    for i in range(10):
        a.evaluate()
And I want to have two levels of concurrency: the first on the evaluate method and the second on the run method (nested concurrency). The question is how to introduce this concurrency using the Pool class of the multiprocessing module. Should I explicitly pass the number of cores? The solution should not create more processes than multiprocessing.cpu_count().
Note: assume that the number of cores is greater than 10.
Edit:
I have seen a lot of comments saying that Python does not have true concurrency because of the GIL. That is true for Python multithreading, but it is not quite correct for multiprocessing (look here); I have also timed it, as did this article, and the results show that it can run faster than sequential execution.
Your comment touches on a possible solution. In order to have "nested" concurrency you could have two separate pools. This would result in a "flat" program structure instead of a nested one. Additionally, it decouples A from B: A now knows nothing about B, it just publishes to a generic queue. The example below uses a single process per stage to illustrate wiring up concurrent workers that communicate across queues, but each stage could easily be replaced with a pool (a sketch of that follows the example):
import multiprocessing as mp
class A():
    def __init__(self, in_q, out_q):
        self.in_q = in_q
        self.out_q = out_q

    def evaluate(self):
        """
        Reads from input does work and process output
        """
        while True:
            job = self.in_q.get()
            for i in range(30):
                self.out_q.put(i)

class B():
    def __init__(self, in_q):
        self.in_q = in_q

    def run(self):
        """
        Loop over queue and process items, optionally configure
        with another queue to "sink" the processing pipeline
        """
        while True:
            job = self.in_q.get()

if __name__ == '__main__':
    # create the queues to wire up our concurrent worker pools
    A_q = mp.Queue()
    AB_q = mp.Queue()

    a = A(in_q=A_q, out_q=AB_q)
    b = B(in_q=AB_q)

    p = mp.Process(target=a.evaluate)
    p.start()

    p2 = mp.Process(target=b.run)
    p2.start()

    for i in range(10):
        A_q.put(i)

    p.join()
    p2.join()
This is a common pattern in golang.
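If you prefer pools over hand-wired processes, here is a minimal two-pool sketch of the same idea (my own illustration, assuming as the question does that there are more than 10 cores; it runs the two levels one after the other rather than as a streaming pipeline):

import multiprocessing as mp

def evaluate(i):
    # outer level: produce the 30 "run" jobs for input i
    return [(i, j) for j in range(30)]

def run(job):
    # inner level: process a single (i, j) pair
    i, j = job
    return i * j

if __name__ == '__main__':
    n_cores = mp.cpu_count()
    # split the available cores between the two levels (an arbitrary split,
    # keeping the total number of worker processes <= cpu_count())
    outer_workers = max(1, min(10, n_cores // 2))
    inner_workers = max(1, n_cores - outer_workers)

    with mp.Pool(outer_workers) as outer, mp.Pool(inner_workers) as inner:
        # level 1: evaluate the 10 inputs concurrently
        batches = outer.map(evaluate, range(10))
        # level 2: run all of the produced jobs concurrently
        results = inner.map(run, [job for batch in batches for job in batch])

    print(len(results))  # 300 results (10 evaluations x 30 runs each)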

Python - start two processes to run indefinitely

I have constructed a simple example script that defines three separate processes using multiprocessing in Python. My objective is to have one parent process that spawns two smaller worker processes that will collect and process data.
Currently, my implementation looks like this:
from Queue import Queue,Empty
from multiprocessing import Process
import time
import hashlib
class FillQueue(Process):
    def __init__(self, q):
        Process.__init__(self)
        self.q = q

    def run(self):
        i = 0
        while i is not 5:
            print 'putting'
            self.q.put('foo')
            i += 1
        self.q.put('|STOP|')

class ConsumeQueue(Process):
    def __init__(self, q):
        Process.__init__(self)
        self.q = q

    def run(self):
        print 'Consume'
        while True:
            try:
                value = self.q.get(False)
                print value
                if value == '|STOP|':
                    print 'done'
                    break
            except Empty:
                print 'Nothing to process atm'

class Ripper(Process):
    q = Queue()

    def __init__(self):
        self.fq = FillQueue(self.q)
        self.cq = ConsumeQueue(self.q)
        self.fq.daemon = True
        self.cq.daemon = True

    def run(self):
        try:
            self.fq.start()
            self.cq.start()
        except KeyboardInterrupt:
            print 'exit'

if __name__ == '__main__':
    r = Ripper()
    r.start()
As it runs presently, the output from the script on CLI looks like this:
putting
putting
putting
putting
putting
Consume
foo
foo
foo
foo
foo
|STOP|
done
Obviously, the way I am starting my two processes is blocking, since the consumer doesn't even begin to process the items in the queue until the filler has finished adding them.
How should I rewrite this so that both processes begin immediately and neither blocks, so the consumer simply falls through to the Empty except block while there is no work to process, but exits completely when it receives the stop message?
EDIT: typo, had the start and run methods mixed up
You seem to be starting multiple processes using multiprocessing.Process.
However, you are using Queue.Queue, which is only thread-safe and is not designed to be shared between processes.
shevek's answer is valid as well, but as a start, you should replace Queue.Queue with multiprocessing.Queue.
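To make that swap concrete, here is a minimal, self-contained producer/consumer sketch of my own (assuming Python 2, as in the question) where the queue really does carry items between processes:

from multiprocessing import Process, Queue

def fill(q):
    # runs in a child process; the items genuinely cross the process boundary
    for i in range(5):
        q.put('foo %d' % i)
    q.put('|STOP|')

if __name__ == '__main__':
    q = Queue()   # a plain Queue.Queue here would leave each process with its own copy
    p = Process(target=fill, args=(q,))
    p.start()
    while True:
        value = q.get()
        print value
        if value == '|STOP|':
            print 'done'
            break
    p.join()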
try this:
from Queue import Empty
from multiprocessing import Process, Queue
import time
import hashlib
class FillQueue(object):
    def __init__(self, q):
        self.q = q

    def run(self):
        i = 0
        while i < 5:
            print 'putting'
            self.q.put('foo %d' % i)
            i += 1
            time.sleep(.5)
        self.q.put('|STOP|')

class ConsumeQueue(object):
    def __init__(self, q):
        self.q = q

    def run(self):
        while True:
            try:
                value = self.q.get(False)
                print value
                if value == '|STOP|':
                    print 'done'
                    break
            except Empty:
                print 'Nothing to process atm'
                time.sleep(.2)

if __name__ == '__main__':
    q = Queue()
    f = FillQueue(q)
    c = ConsumeQueue(q)

    p1 = Process(target=f.run)
    p1.start()
    p2 = Process(target=c.run)
    p2.start()

    p1.join()
    p2.join()
I think your program works fine. The CPU runs only one thing at a time, each for a short time slice, and the time required to put all of your items in the queue is very short, so there is no reason the filler cannot finish within a single time slice.
If you add some delays in the filler, I think you will see that it actually works as you expect; a sketch of that change follows.
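A minimal sketch of that suggestion (my own variation on the question's FillQueue, assuming Python 2), with a short sleep between puts so the 'putting' and consuming lines visibly interleave:

from Queue import Empty
from multiprocessing import Process, Queue
import time

class FillQueue(Process):
    def __init__(self, q):
        Process.__init__(self)
        self.q = q

    def run(self):
        for i in range(5):
            print 'putting'
            self.q.put('foo')
            time.sleep(0.5)      # give the consumer a chance to run in between
        self.q.put('|STOP|')

if __name__ == '__main__':
    q = Queue()
    f = FillQueue(q)
    f.start()
    while True:
        try:
            value = q.get(False)
            print value
            if value == '|STOP|':
                print 'done'
                break
        except Empty:
            print 'Nothing to process atm'
            time.sleep(0.2)
    f.join()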
