I want to share a flag object which will contain multiple multiprocessing.Event objects to communicate with multiple Python processes created by multiprocessing.
Will this work?
You could do something like the following. Class Foo internally creates a list of multiprocessing.Event objects (in this case only one, for demo purposes), and an instance of Foo is passed to two processes:
from multiprocessing import Process, Event
import time

class Foo:
    def __init__(self):
        self.lst = []
        self.lst.append(Event())

def worker1(foo):
    event = foo.lst[0]
    t = time.time_ns()
    print('Waiting for event to be set...')
    event.wait()
    print('Wait satisfied, elapsed time =', (time.time_ns() - t) / 1_000_000_000.0)

def worker2(foo):
    event = foo.lst[0]
    time.sleep(2)
    event.set()

def main():
    foo = Foo()
    p1 = Process(target=worker1, args=(foo,))
    p2 = Process(target=worker2, args=(foo,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()

if __name__ == '__main__':
    main()
Prints:
Waiting for event to be set...
Wait satisfied, elapsed time = 2.0112978
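If you want one Event per worker process, which is what the question describes, a minimal sketch along the same lines might look like this (the names worker and n_workers are just illustrative):

from multiprocessing import Process, Event
import time

class Foo:
    def __init__(self, n):
        # One Event per worker process
        self.lst = [Event() for _ in range(n)]

def worker(foo, i):
    # Each worker waits only on its own event
    foo.lst[i].wait()
    print(f'worker {i} released')

def main():
    n_workers = 3
    foo = Foo(n_workers)
    procs = [Process(target=worker, args=(foo, i)) for i in range(n_workers)]
    for p in procs:
        p.start()
    # Release the workers one at a time from the main process
    for e in foo.lst:
        time.sleep(1)
        e.set()
    for p in procs:
        p.join()

if __name__ == '__main__':
    main()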
I am new to Python multiprocessing; here is some background about the code below. I am trying to create three processes: one to add an element to the list, one to modify an element in the list, and one to print the list.
The three processes should ideally be using the same list in shared memory, created using a manager.
The problem I face is that testprocess2 is not able to set the value to 0; basically, it is not able to alter the list.
from multiprocessing import Process, Manager, Lock
from time import sleep
import random

class Trade:
    def __init__(self, id):
        self.exchange = None
        self.order_id = id

class testprocess2(Process):
    def __init__(self, trades, lock):
        super().__init__(args=(trades, lock))
        self.trades = trades
        self.lock = lock

    def run(self):
        while True:
            # lock.acquire()
            print("Altering")
            for idx in range(len(self.trades)):
                self.trades[idx].order_id = 0
            # lock.release()
            sleep(1)

class testprocess1(Process):
    def __init__(self, trades, lock):
        super().__init__(args=(trades, lock))
        self.trades = trades
        self.lock = lock

    def run(self):
        while True:
            print("start")
            for idx in range(len(self.trades)):
                print(self.trades[idx].order_id)
            sleep(1)

class testprocess(Process):
    def __init__(self, trades, lock):
        super().__init__(args=(trades, lock))
        self.trades = trades
        self.lock = lock

    def run(self):
        while True:
            # lock.acquire()
            n = random.randint(0, 9)
            print("adding random {}".format(n))
            self.trades.append(Trade(n))
            # lock.release()
            # print(trades)
            sleep(5)

if __name__ == "__main__":
    with Manager() as manager:
        records = manager.list([Trade(5)])
        lock = Lock()
        p1 = testprocess(records, lock)
        p1.start()
        p2 = testprocess1(records, lock)
        p2.start()
        p3 = testprocess2(records, lock)
        p3.start()
        p1.join()
        p2.join()
        p3.join()
Strictly speaking your managed list is not in shared memory and it is very important to understand what is going on. The actual list holding your Trade instances resides in a process that is created when you execute the Manager() call. When you then execute records = manager.list([Trade(5)]), records is not a direct reference to that list because, as I said, we are not dealing with shared memory. It is instead a special proxy object that implements the same methods as a list but when you, for example, invoke append on this proxy object, it takes the argument you are trying to append and serializes it and transmits it to the manager's process via either a socket or pipe where it gets de-serialized and appended to the actual list. In short, operations on the proxy object are turned into remote method calls.
Now for your problem. You are trying to reset the order_id attribute with the following statement:
self.trades[idx].order_id = 0
Since we are dealing with a remote list via a proxy object, the above statement unfortunately becomes the equivalent of:
trade = self.trades[idx] # fetch object from the remote list
trade.order_id = 0 # reset the order_id to 0 on the local copy
What is missing is updating the list with the newly updated trade object:
self.trades[idx] = trade
So your single update statement really needs to be replaced with the above 3-statement sequence.
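To see this behavior in isolation, here is a minimal, self-contained sketch; the Point class and its x attribute are just illustrative, not part of the original code:

from multiprocessing import Manager

class Point:
    def __init__(self, x):
        self.x = x

if __name__ == '__main__':
    with Manager() as manager:
        points = manager.list([Point(1)])

        # Mutating the fetched copy does NOT update the managed list:
        p = points[0]
        p.x = 99
        print(points[0].x)   # still 1

        # Reassigning the element sends the updated object back to the manager:
        points[0] = p
        print(points[0].x)   # now 99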
I have also taken the liberty to modify your code in several ways.
The PEP8 Style Guide for Python Code recommends that class names be capitalized.
Since all of your process classes are identical in how they are constructed (i.e. have identical __init__ methods), I have created an abstract base class, TestProcess, that these classes inherit from. All they have to do is provide a run method.
I have made these process classes daemon classes. That means that they will terminate automatically when the main process terminates. I did this for demo purposes so that the program does not loop endlessly. The main process will terminate after 15 seconds.
You do not need to pass the trades and lock arguments to the __init__ method of the Process class. If you were not deriving your classes from Process and you just wanted, for example, your newly created process to run a function foo that takes arguments trades and lock, then you would specify p1 = Process(target=foo, args=(trades, lock)). That is the real purpose of the args argument, i.e. to be used together with the target argument. See the documentation for the threading.Thread class for details. I see very little value in actually deriving your classes from multiprocessing.Process (by not doing so there is better opportunity for reuse). But since you did, you are already setting the instance attributes self.trades and self.lock in your __init__ method, and they will be used when your run method is invoked implicitly by your calling the start method. There is nothing further you need to do. See the two additional code examples at the end.
from multiprocessing import Process, Manager, Lock
from time import sleep
import random
from abc import ABC, abstractmethod

class Trade:
    def __init__(self, id):
        self.exchange = None
        self.order_id = id

class TestProcess(Process, ABC):
    def __init__(self, trades, lock):
        Process.__init__(self, daemon=True)
        self.trades = trades
        self.lock = lock

    @abstractmethod
    def run(self):
        pass

class TestProcess2(TestProcess):
    def run(self):
        while True:
            # lock.acquire()
            print("Altering")
            for idx in range(len(self.trades)):
                trade = self.trades[idx]
                trade.order_id = 0
                # We must tell the managed list that it has been updated!!!:
                self.trades[idx] = trade
            # lock.release()
            sleep(1)

class TestProcess1(TestProcess):
    def run(self):
        while True:
            print("start")
            for idx in range(len(self.trades)):
                print(f'index = {idx}, order id = {self.trades[idx].order_id}')
            sleep(1)

class TestProcess(TestProcess):
    def run(self):
        while True:
            # lock.acquire()
            n = random.randint(0, 9)
            print("adding random {}".format(n))
            self.trades.append(Trade(n))
            # lock.release()
            # print(trades)
            sleep(5)

if __name__ == "__main__":
    with Manager() as manager:
        records = manager.list([Trade(5)])
        lock = Lock()
        p1 = TestProcess(records, lock)
        p1.start()
        p2 = TestProcess1(records, lock)
        p2.start()
        p3 = TestProcess2(records, lock)
        p3.start()
        sleep(15)  # run for 15 seconds
Using classes not derived from multiprocessing.Process
from multiprocessing import Process, Manager, Lock
from time import sleep
import random
from abc import ABC, abstractmethod

class Trade:
    def __init__(self, id):
        self.exchange = None
        self.order_id = id

class TestProcess(ABC):
    def __init__(self, trades, lock):
        self.trades = trades
        self.lock = lock

    @abstractmethod
    def process(self):
        pass

class TestProcess2(TestProcess):
    def process(self):
        while True:
            # lock.acquire()
            print("Altering")
            for idx in range(len(self.trades)):
                trade = self.trades[idx]
                trade.order_id = 0
                # We must tell the managed list that it has been updated!!!:
                self.trades[idx] = trade
            # lock.release()
            sleep(1)

class TestProcess1(TestProcess):
    def process(self):
        while True:
            print("start")
            for idx in range(len(self.trades)):
                print(f'index = {idx}, order id = {self.trades[idx].order_id}')
            sleep(1)

class TestProcess(TestProcess):
    def process(self):
        while True:
            # lock.acquire()
            n = random.randint(0, 9)
            print("adding random {}".format(n))
            self.trades.append(Trade(n))
            # lock.release()
            # print(trades)
            sleep(5)

if __name__ == "__main__":
    with Manager() as manager:
        records = manager.list([Trade(5)])
        lock = Lock()
        tp = TestProcess(records, lock)
        p1 = Process(target=tp.process, daemon=True)
        p1.start()
        tp1 = TestProcess1(records, lock)
        p2 = Process(target=tp1.process, daemon=True)
        p2.start()
        tp2 = TestProcess2(records, lock)
        p3 = Process(target=tp2.process, daemon=True)
        p3.start()
        sleep(15)  # run for 15 seconds
Using functions instead of classes derived from multiprocessing.Process
from multiprocessing import Process, Manager, Lock
from time import sleep
import random

class Trade:
    def __init__(self, id):
        self.exchange = None
        self.order_id = id

def testprocess2(trades, lock):
    while True:
        # lock.acquire()
        print("Altering")
        for idx in range(len(trades)):
            trade = trades[idx]
            trade.order_id = 0
            # We must tell the managed list that it has been updated!!!:
            trades[idx] = trade
        # lock.release()
        sleep(1)

def testprocess1(trades, lock):
    while True:
        print("start")
        for idx in range(len(trades)):
            print(f'index = {idx}, order id = {trades[idx].order_id}')
        sleep(1)

def testprocess(trades, lock):
    while True:
        # lock.acquire()
        n = random.randint(0, 9)
        print("adding random {}".format(n))
        trades.append(Trade(n))
        # lock.release()
        # print(trades)
        sleep(5)

if __name__ == "__main__":
    with Manager() as manager:
        records = manager.list([Trade(5)])
        lock = Lock()
        p1 = Process(target=testprocess, args=(records, lock), daemon=True)
        p1.start()
        p2 = Process(target=testprocess1, args=(records, lock), daemon=True)
        p2.start()
        p3 = Process(target=testprocess2, args=(records, lock), daemon=True)
        p3.start()
        sleep(15)  # run for 15 seconds
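A side note on the commented-out lock calls: the fetch-modify-store on the managed list is not atomic, so if the adder and the altering process may run concurrently you probably do want the lock. A minimal sketch of the altering function with the lock used as a context manager (same function as above, just guarded):

from time import sleep

def testprocess2(trades, lock):
    while True:
        print("Altering")
        with lock:  # acquire() / release() handled by the context manager
            for idx in range(len(trades)):
                trade = trades[idx]
                trade.order_id = 0
                trades[idx] = trade
        sleep(1)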
How can I share values from one process with another?
Apparently I can do that through multithreading but not multiprocessing.
Multithreading is slow for my program.
I cannot show my exact code so I made this simple example.
from multiprocessing import Process
from threading import Thread
import time

class exp:
    def __init__(self):
        self.var1 = 0

    def func1(self):
        self.var1 = 5
        print(self.var1)

    def func2(self):
        print(self.var1)

if __name__ == "__main__":
    # multithreading
    obj1 = exp()
    t1 = Thread(target=obj1.func1)
    t2 = Thread(target=obj1.func2)
    print("multithreading")
    t1.start()
    time.sleep(1)
    t2.start()
    time.sleep(3)

    # multiprocessing
    obj = exp()
    p1 = Process(target=obj.func1)
    p2 = Process(target=obj.func2)
    print("multiprocessing")
    p1.start()
    time.sleep(2)
    p2.start()
Expected output:
multithreading
5
5
multiprocessing
5
5
Actual output:
multithreading
5
5
multiprocessing
5
0
I know there have been a couple of close votes against this question, but the supposed duplicate question's answer does not really explain why the OP's program does not work as is, and the offered solution is not what I would propose. Hence:
Let's analyze what is happening. The creation of obj = exp() is done by the main process. The execution of exp.func1 occurs in a different process/address space and therefore the obj object must be serialized/de-serialized into the address space of that process. In that new address space self.var1 comes across with the initial value of 0 and is then set to 5, but only the copy of the obj object that is in the address space of process p1 is being modified; the copy of that object that exists in the main process has not been modified. Then when you start process p2, another copy of obj from the main process is sent to the new process, but still with self.var1 having a value of 0.
The solution is for self.var1 to be an instance of multiprocessing.Value, which is a special variable that exists in shared memory accessible to all processes. See the docs.
from multiprocessing import Process, Value

class exp:
    def __init__(self):
        self.var1 = Value('i', 0, lock=False)

    def func1(self):
        self.var1.value = 5
        print(self.var1.value)

    def func2(self):
        print(self.var1.value)

if __name__ == "__main__":
    # multiprocessing
    obj = exp()
    p1 = Process(target=obj.func1)
    p2 = Process(target=obj.func2)
    print("multiprocessing")
    p1.start()
    # No need to sleep, just wait for p1 to complete
    # before starting p2:
    # time.sleep(2)
    p1.join()
    p2.start()
    p2.join()
Prints:
multiprocessing
5
5
Note
Using shared memory for this particular problem is much more efficient than using a managed class, which is referenced by the "close" comment.
The assignment of 5 to self.var1.value is an atomic operation and does not need to be serialized. But if we were performing a non-atomic operation (one that requires multiple steps), such as self.var1.value += 1, and multiple processes were performing that non-atomic operation in parallel, then we should create the value with a lock, self.var1 = Value('i', 0, lock=True), and update the value under control of that lock: with self.var1.get_lock(): self.var1.value += 1.
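For example, a minimal sketch of that pattern, with several worker processes all incrementing the same shared counter (the names incrementer, counter and n_procs are just illustrative):

from multiprocessing import Process, Value

def incrementer(counter, n):
    for _ in range(n):
        # += on a Value is not atomic, so guard it with the Value's lock
        with counter.get_lock():
            counter.value += 1

if __name__ == '__main__':
    counter = Value('i', 0, lock=True)  # lock=True is the default
    n_procs = 4
    procs = [Process(target=incrementer, args=(counter, 1000)) for _ in range(n_procs)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(counter.value)  # 4000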
There are several ways to do that: you can use shared memory, a FIFO, or message passing.
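For instance, a minimal sketch of the message-passing approach using a multiprocessing.Queue to send a value from a child process back to its parent (the names producer and q are just illustrative):

from multiprocessing import Process, Queue

def producer(q):
    # Compute a value in the child process and send it to the parent
    q.put(5)

if __name__ == '__main__':
    q = Queue()
    p = Process(target=producer, args=(q,))
    p.start()
    print(q.get())  # receives 5 from the child process
    p.join()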
I am new to multiprocessing in Python, and so far all the examples I've seen are of this kind (with one or more methods in the file and then 'main'):
from multiprocessing import Process

def f1(a):
    # do something

def f2(b):
    # do something

if __name__ == '__main__':
    f1(a1)
    p = Process(target=f2, args=(b2,))
    p.start()
    p.join()
If I instead have a method that calls 2 functions in another file that should run concurrently, like in the following lines,

def function():
    # do something
    file2.f1(a)  # first concurrent method
    file2.f2(b)  # second concurrent method

how should I do it?
Can anyone give a simple example? I tried it this way, but it starts the whole program again after the first loop:
def function():
    # do something
    for i in range(3):
        p1 = Process(target=file2.f1, args=(a))  # first concurrent method
        p2 = Process(target=file2.f2, args=(b))  # second concurrent method
        p1.start()
        p2.start()
        p1.join()
        p2.join()
The issue seems to be that the args argument is incorrectly defined; it should be a tuple and not a single variable:
def function():
    # do something
    for i in range(3):
        p1 = Process(target=file2.f1, args=(a, ))  # first concurrent method
        p2 = Process(target=file2.f2, args=(b, ))  # second concurrent method
        p1.start()
        p2.start()
        p1.join()
        p2.join()
If the order of execution is flexible, you can use the Pool class to trigger multiple calls:
from multiprocessing.pool import Pool

if __name__ == '__main__':
    pool = Pool()
    # map_async passes each element of the iterable directly as the argument,
    # so pass the bare arguments rather than 1-tuples:
    pool.map_async(f1, [a] * 3)
    pool.map_async(f2, [b] * 3)
    pool.close()
    pool.join()
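If the goal is instead to run two different functions concurrently (as in the question), apply_async may be a better fit than map_async; here is a minimal sketch, assuming f1 and f2 each take a single argument as in the question (the bodies are just placeholders):

from multiprocessing import Pool

def f1(a):
    return a * 2      # placeholder work

def f2(b):
    return b + 1      # placeholder work

if __name__ == '__main__':
    with Pool() as pool:
        r1 = pool.apply_async(f1, (10,))   # schedule f1 without blocking
        r2 = pool.apply_async(f2, (20,))   # schedule f2 without blocking
        print(r1.get(), r2.get())          # wait for and collect both results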
Currently I have 3 processes A, B, C created under the main process. However, I would like to start B and C in process A. Is that possible?
process.py
from multiprocessing import Process
import time

procs = {}

def test():
    print(procs)
    procs['B'].start()
    procs['C'].start()
    time.sleep(8)
    procs['B'].terminate()
    procs['C'].terminate()
    procs['B'].join()
    procs['C'].join()

def B():
    while True:
        print('+'*10)
        time.sleep(1)

def C():
    while True:
        print('-'*10)
        time.sleep(1)

procs['A'] = Process(target=test)
procs['B'] = Process(target=B)
procs['C'] = Process(target=C)
main.py
from process import *
print(procs)
procs['A'].start()
procs['A'].join()
And I got the error:
AssertionError: can only start a process object created by current process
Is there any alternative way to start processes B and C in A? Or to let A send a signal asking the master process to start B and C?
I would recommend using Event objects to do the synchronization. They let you trigger actions across processes. For instance:
from multiprocessing import Process, Event
import time

procs = {}

def test():
    print(procs)
    # Will let the main process know that it needs
    # to start the subprocesses
    procs['B'][1].set()
    procs['C'][1].set()
    time.sleep(3)
    # This will trigger the shutdown of the subprocesses.
    # This is cleaner than using terminate as it allows
    # you to clean up the processes if needed.
    procs['B'][1].set()
    procs['C'][1].set()

def B():
    # Event will be set once again when this process
    # needs to finish
    event = procs["B"][1]
    event.clear()
    while not event.is_set():
        print('+' * 10)
        time.sleep(1)

def C():
    # Event will be set once again when this process
    # needs to finish
    event = procs["C"][1]
    event.clear()
    while not event.is_set():
        print('-' * 10)
        time.sleep(1)

if __name__ == '__main__':
    procs['A'] = (Process(target=test), None)
    procs['B'] = (Process(target=B), Event())
    procs['C'] = (Process(target=C), Event())

    procs['A'][0].start()
    # Wait for events to be set before starting the subprocesses
    procs['B'][1].wait()
    procs['B'][0].start()
    procs['C'][1].wait()
    procs['C'][0].start()

    # Join all the subprocesses in the process that created them.
    procs['A'][0].join()
    procs['B'][0].join()
    procs['C'][0].join()
Note that this code is not really clean; only one event is needed in this case, but you should get the main idea.
Also, process A is not really needed anymore; you could consider using callbacks instead. See for instance the concurrent.futures module if you want to chain some async actions.
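For instance, a minimal sketch of that callback idea with concurrent.futures; the functions a, b, c and the coordinating threading.Event are illustrative, not from the original code:

import threading
import time
from concurrent.futures import ProcessPoolExecutor

def a():
    time.sleep(1)          # A does its work first
    return 'A done'

def b():
    print('+' * 10)

def c():
    print('-' * 10)

if __name__ == '__main__':
    executor = ProcessPoolExecutor()
    bc_submitted = threading.Event()

    def start_b_and_c(future_a):
        # Runs in the main process once A's future completes,
        # and only then schedules B and C.
        print(future_a.result())
        executor.submit(b)
        executor.submit(c)
        bc_submitted.set()

    executor.submit(a).add_done_callback(start_b_and_c)
    bc_submitted.wait()    # make sure B and C have been submitted
    executor.shutdown()    # then wait for all the work to finish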
I have constructed a simple example script that defines three separate processes using multiprocessing in Python. My objective is to have one parent process that spawns two smaller processes that will collect and process data.
Currently, my implementation looks like this:
from Queue import Queue, Empty
from multiprocessing import Process
import time
import hashlib

class FillQueue(Process):
    def __init__(self, q):
        Process.__init__(self)
        self.q = q

    def run(self):
        i = 0
        while i is not 5:
            print 'putting'
            self.q.put('foo')
            i += 1
        self.q.put('|STOP|')

class ConsumeQueue(Process):
    def __init__(self, q):
        Process.__init__(self)
        self.q = q

    def run(self):
        print 'Consume'
        while True:
            try:
                value = self.q.get(False)
                print value
                if value == '|STOP|':
                    print 'done'
                    break
            except Empty:
                print 'Nothing to process atm'

class Ripper(Process):
    q = Queue()

    def __init__(self):
        self.fq = FillQueue(self.q)
        self.cq = ConsumeQueue(self.q)
        self.fq.daemon = True
        self.cq.daemon = True

    def run(self):
        try:
            self.fq.start()
            self.cq.start()
        except KeyboardInterrupt:
            print 'exit'

if __name__ == '__main__':
    r = Ripper()
    r.start()
As it runs presently, the output from the script on CLI looks like this:
putting
putting
putting
putting
putting
Consume
foo
foo
foo
foo
foo
|STOP|
done
Obviously, the way I am starting my two threads is blocking, since the consumer doesn't even begin to process the items in the queue until the filler finishes adding items.
How should I rewrite this to make both threads begin immediately and not block, so the consumer will simply pass to the Empty except block while there is no work to process, but will exit completely when it receives the stop message?
EDIT: typo, had the start and run methods mixed up
You seem to be starting multiple processes using multiprocessing.Process.
However, you are using Queue.Queue, which is only thread-safe and not designed to be used by multiple processes.
shevek's answer is valid as well, but as a start, you should replace Queue.Queue with multiprocessing.Queue.
try this:
from Queue import Empty
from multiprocessing import Process, Queue
import time
import hashlib

class FillQueue(object):
    def __init__(self, q):
        self.q = q

    def run(self):
        i = 0
        while i < 5:
            print 'putting'
            self.q.put('foo %d' % i)
            i += 1
            time.sleep(.5)
        self.q.put('|STOP|')

class ConsumeQueue(object):
    def __init__(self, q):
        self.q = q

    def run(self):
        while True:
            try:
                value = self.q.get(False)
                print value
                if value == '|STOP|':
                    print 'done'
                    break
            except Empty:
                print 'Nothing to process atm'
                time.sleep(.2)

if __name__ == '__main__':
    q = Queue()
    f = FillQueue(q)
    c = ConsumeQueue(q)

    p1 = Process(target=f.run)
    p1.start()
    p2 = Process(target=c.run)
    p2.start()

    p1.join()
    p2.join()
I think your program works fine. The CPU processes only one thing at a time, for a short time slice, but the time required to put all your items in the queue is very short, so there is no reason the filler cannot do it all in one time slice.
If you add some delays in the filler, I think you should see that it actually works as you expect.