I am new to Python multiprocessing. Some background on the code below: I am trying to create three processes, one to add an element to a list, one to modify the elements in the list, and one to print the list.
Ideally the three processes use the same list in shared memory, created using a Manager.
The problem I face is that testprocess2 is not able to set order_id to 0; basically, it is not able to alter the list.
from multiprocessing import Process, Manager, Lock
from time import sleep
import random

class Trade:
    def __init__(self, id):
        self.exchange = None
        self.order_id = id

class testprocess2(Process):
    def __init__(self, trades, lock):
        super().__init__(args=(trades, lock))
        self.trades = trades
        self.lock = lock

    def run(self):
        while True:
            # lock.acquire()
            print("Altering")
            for idx in range(len(self.trades)):
                self.trades[idx].order_id = 0
            # lock.release()
            sleep(1)

class testprocess1(Process):
    def __init__(self, trades, lock):
        super().__init__(args=(trades, lock))
        self.trades = trades
        self.lock = lock

    def run(self):
        while True:
            print("start")
            for idx in range(len(self.trades)):
                print(self.trades[idx].order_id)
            sleep(1)

class testprocess(Process):
    def __init__(self, trades, lock):
        super().__init__(args=(trades, lock))
        self.trades = trades
        self.lock = lock

    def run(self):
        while True:
            # lock.acquire()
            n = random.randint(0, 9)
            print("adding random {}".format(n))
            self.trades.append(Trade(n))
            # lock.release()
            # print(trades)
            sleep(5)

if __name__ == "__main__":
    with Manager() as manager:
        records = manager.list([Trade(5)])
        lock = Lock()
        p1 = testprocess(records, lock)
        p1.start()
        p2 = testprocess1(records, lock)
        p2.start()
        p3 = testprocess2(records, lock)
        p3.start()
        p1.join()
        p2.join()
        p3.join()
Strictly speaking your managed list is not in shared memory and it is very important to understand what is going on. The actual list holding your Trade instances resides in a process that is created when you execute the Manager() call. When you then execute records = manager.list([Trade(5)]), records is not a direct reference to that list because, as I said, we are not dealing with shared memory. It is instead a special proxy object that implements the same methods as a list but when you, for example, invoke append on this proxy object, it takes the argument you are trying to append and serializes it and transmits it to the manager's process via either a socket or pipe where it gets de-serialized and appended to the actual list. In short, operations on the proxy object are turned into remote method calls.
Now for your problem. You are trying to reset the order_id attribute with the following statement:
self.trades[idx].order_id = 0
Since we are dealing with a remote list via a proxy object, the above statement unfortunately becomes the equivalent of:
trade = self.trades[idx] # fetch object from the remote list
trade.order_id = 0 # reset the order_id to 0 on the local copy
What is missing is updating the list with the newly updated trade object:
self.trades[idx] = trade
So your single update statement really needs to be replaced with the above 3-statement sequence.
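The difference between mutating the fetched copy and writing it back can be observed in a single process. The following is a minimal sketch, assuming the same Trade class as in the question; the helper name demo_proxy_update is hypothetical and exists only for this illustration.

```python
from multiprocessing import Manager


class Trade:
    def __init__(self, id):
        self.exchange = None
        self.order_id = id


def demo_proxy_update():
    """Return order_id after an in-place attempt and after a write-back."""
    with Manager() as manager:
        records = manager.list([Trade(5)])

        # This mutates only a local copy fetched through the proxy.
        records[0].order_id = 0
        after_in_place = records[0].order_id

        # Fetch, modify, then store the object back into the managed list.
        trade = records[0]
        trade.order_id = 0
        records[0] = trade
        after_write_back = records[0].order_id

    return after_in_place, after_write_back


if __name__ == "__main__":
    print(demo_proxy_update())
```

The first value stays at 5 because the assignment never reaches the manager's process; only the explicit write-back changes what later reads see.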
I have also taken the liberty to modify your code in several ways.
The PEP 8 Style Guide for Python Code recommends the CapWords convention for class names, so I have capitalized them.
Since all of your process classes are identical in how they are constructed (i.e. have identical __init__ methods), I have created an abstract base class, TestProcess, that these classes inherit from. All they have to do is provide a run method.
I have made these processes daemon processes. That means that they will terminate automatically when the main process terminates. I did this for demo purposes so that the program does not loop endlessly. The main process will terminate after 15 seconds.
You do not need to pass the trades and lock arguments to the __init__ method of the Process class. If you were not deriving your classes from Process and just wanted, for example, your newly created process to run a function foo that takes arguments trades and lock, then you would specify p1 = Process(target=foo, args=(trades, lock)). That is the real purpose of the args argument: to be used with the target argument. See the documentation for the threading.Thread class, whose interface Process mirrors, for details. I see very little value in deriving your classes from multiprocessing.Process (by not doing so there is better opportunity for reuse). But since you did, your __init__ method already sets the instance attributes self.trades and self.lock, which are then used when your run method is invoked implicitly by calling the start method. There is nothing further you need to do. See the two additional code examples at the end.
from multiprocessing import Process, Manager, Lock
from time import sleep
import random
from abc import ABC, abstractmethod

class Trade:
    def __init__(self, id):
        self.exchange = None
        self.order_id = id

class TestProcess(Process, ABC):
    def __init__(self, trades, lock):
        Process.__init__(self, daemon=True)
        self.trades = trades
        self.lock = lock

    @abstractmethod
    def run(self):
        pass

class TestProcess2(TestProcess):
    def run(self):
        while True:
            # lock.acquire()
            print("Altering")
            for idx in range(len(self.trades)):
                trade = self.trades[idx]
                trade.order_id = 0
                # We must tell the managed list that it has been updated!!!:
                self.trades[idx] = trade
            # lock.release()
            sleep(1)

class TestProcess1(TestProcess):
    def run(self):
        while True:
            print("start")
            for idx in range(len(self.trades)):
                print(f'index = {idx}, order id = {self.trades[idx].order_id}')
            sleep(1)

# Note: this rebinds the name TestProcess to the concrete subclass below.
class TestProcess(TestProcess):
    def run(self):
        while True:
            # lock.acquire()
            n = random.randint(0, 9)
            print("adding random {}".format(n))
            self.trades.append(Trade(n))
            # lock.release()
            # print(trades)
            sleep(5)

if __name__ == "__main__":
    with Manager() as manager:
        records = manager.list([Trade(5)])
        lock = Lock()
        p1 = TestProcess(records, lock)
        p1.start()
        p2 = TestProcess1(records, lock)
        p2.start()
        p3 = TestProcess2(records, lock)
        p3.start()
        sleep(15) # run for 15 seconds
Using classes not derived from multiprocessing.Process
from multiprocessing import Process, Manager, Lock
from time import sleep
import random
from abc import ABC, abstractmethod

class Trade:
    def __init__(self, id):
        self.exchange = None
        self.order_id = id

class TestProcess(ABC):
    def __init__(self, trades, lock):
        self.trades = trades
        self.lock = lock

    @abstractmethod
    def process(self):
        pass

class TestProcess2(TestProcess):
    def process(self):
        while True:
            # lock.acquire()
            print("Altering")
            for idx in range(len(self.trades)):
                trade = self.trades[idx]
                trade.order_id = 0
                # We must tell the managed list that it has been updated!!!:
                self.trades[idx] = trade
            # lock.release()
            sleep(1)

class TestProcess1(TestProcess):
    def process(self):
        while True:
            print("start")
            for idx in range(len(self.trades)):
                print(f'index = {idx}, order id = {self.trades[idx].order_id}')
            sleep(1)

# Note: this rebinds the name TestProcess to the concrete subclass below.
class TestProcess(TestProcess):
    def process(self):
        while True:
            # lock.acquire()
            n = random.randint(0, 9)
            print("adding random {}".format(n))
            self.trades.append(Trade(n))
            # lock.release()
            # print(trades)
            sleep(5)

if __name__ == "__main__":
    with Manager() as manager:
        records = manager.list([Trade(5)])
        lock = Lock()
        tp = TestProcess(records, lock)
        p1 = Process(target=tp.process, daemon=True)
        p1.start()
        tp1 = TestProcess1(records, lock)
        p2 = Process(target=tp1.process, daemon=True)
        p2.start()
        tp2 = TestProcess2(records, lock)
        p3 = Process(target=tp2.process, daemon=True)
        p3.start()
        sleep(15) # run for 15 seconds
Using functions instead of classes derived from multiprocessing.Process
from multiprocessing import Process, Manager, Lock
from time import sleep
import random

class Trade:
    def __init__(self, id):
        self.exchange = None
        self.order_id = id

def testprocess2(trades, lock):
    while True:
        # lock.acquire()
        print("Altering")
        for idx in range(len(trades)):
            trade = trades[idx]
            trade.order_id = 0
            # We must tell the managed list that it has been updated!!!:
            trades[idx] = trade
        # lock.release()
        sleep(1)

def testprocess1(trades, lock):
    while True:
        print("start")
        for idx in range(len(trades)):
            print(f'index = {idx}, order id = {trades[idx].order_id}')
        sleep(1)

def testprocess(trades, lock):
    while True:
        # lock.acquire()
        n = random.randint(0, 9)
        print("adding random {}".format(n))
        trades.append(Trade(n))
        # lock.release()
        # print(trades)
        sleep(5)

if __name__ == "__main__":
    with Manager() as manager:
        records = manager.list([Trade(5)])
        lock = Lock()
        p1 = Process(target=testprocess, args=(records, lock), daemon=True)
        p1.start()
        p2 = Process(target=testprocess1, args=(records, lock), daemon=True)
        p2.start()
        p3 = Process(target=testprocess2, args=(records, lock), daemon=True)
        p3.start()
        sleep(15) # run for 15 seconds
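The lock is passed to every worker in the versions above, but the acquire and release calls are commented out. If the fetch, modify and write-back sequence should not interleave with appends from another process, one option is to hold the lock in a with block. The following is only a sketch and not part of the original answer: zero_order_ids and add_trade are hypothetical helper names, and a plain list with a threading.Lock stands in for manager.list() and multiprocessing.Lock() so that it runs in a single process.

```python
import threading


class Trade:
    def __init__(self, id):
        self.exchange = None
        self.order_id = id


def zero_order_ids(trades, lock):
    """Reset every order_id while holding the lock for the whole pass."""
    with lock:  # released automatically, even if an exception is raised
        for idx in range(len(trades)):
            trade = trades[idx]   # fetch (a copy, if trades is a proxy)
            trade.order_id = 0    # modify the local object
            trades[idx] = trade   # write it back


def add_trade(trades, lock, order_id):
    """Append a new Trade under the same lock."""
    with lock:
        trades.append(Trade(order_id))


if __name__ == "__main__":
    # Stand-ins (assumption): a plain list and a threading.Lock replace
    # manager.list() and multiprocessing.Lock() for a one-process demo.
    trades = [Trade(5)]
    lock = threading.Lock()
    add_trade(trades, lock, 7)
    zero_order_ids(trades, lock)
    print([t.order_id for t in trades])
```

The with statement is equivalent to the acquire/release pair in the commented-out lines, with the release guaranteed on every exit path.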
Related
I have two threads with while loops in them. The first processes data that the second needs to work on in parallel. I need to share a variable.
Let's introduce a dummy input:
data = iter([1,2,3,4,5,6,7,8,9])
My first class of Thread:
import threading
from queue import Queue
import time

class Thread1(threading.Thread):
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.queue = queue

    _download = {}

    def run(self):
        i = 0
        while True:
            _download[i] = next(data)
            self.queue.put(next(data))
            time.sleep(1)
            i += 1
My second class of Thread:
class Thread2(threading.Thread):
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.queue = queue

    def run(self):
        while True:
            self.queue.get()
            time.sleep(3)
with the main method:
q = Queue(maxsize=10)
t = Thread1(q)
s = Thread2(q)
t.start()
s.start()
I illustrated the two alternatives for the case. I can access the queue variable from the second thread, but I also want the second thread to access the dictionary.
What can I do to access the dictionary from Thread2 as well?
Which choice should I opt for?
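Because threads in one process share memory, one possible approach is to create the dictionary once and hand the same object to both threads, guarding access with a lock. The sketch below is only an illustration under that assumption: the class names Producer and Consumer, the run_demo helper and the None sentinel are hypothetical and not taken from the question, and the loops are finite so the example terminates.

```python
import threading
from queue import Queue


class Producer(threading.Thread):
    def __init__(self, data, queue, downloads, lock):
        super().__init__()
        self.data = data
        self.queue = queue
        self.downloads = downloads  # same dict object the consumer holds
        self.lock = lock

    def run(self):
        for i, item in enumerate(self.data):
            with self.lock:
                self.downloads[i] = item
            self.queue.put(i)       # tell the consumer which key is ready
        self.queue.put(None)        # sentinel: no more items


class Consumer(threading.Thread):
    def __init__(self, queue, downloads, lock):
        super().__init__()
        self.queue = queue
        self.downloads = downloads
        self.lock = lock
        self.seen = []

    def run(self):
        while True:
            key = self.queue.get()
            if key is None:
                break
            with self.lock:
                self.seen.append(self.downloads[key])


def run_demo(data):
    q = Queue(maxsize=10)
    downloads = {}
    lock = threading.Lock()
    t = Producer(data, q, downloads, lock)
    s = Consumer(q, downloads, lock)
    t.start()
    s.start()
    t.join()
    s.join()
    return downloads, s.seen


if __name__ == "__main__":
    print(run_demo([1, 2, 3, 4, 5, 6, 7, 8, 9]))
```

The queue carries only the keys, so the consumer knows when an entry exists, while the dictionary itself is reached through the shared reference.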
I want to share a flag object that will contain multiple multiprocessing.Event objects, to communicate with multiple Python processes created by multiprocessing.
Will this work?
You could do something like the following. Class Foo internally creates a list of multiprocessing.Event objects (in this case only 1 object for demo purposes) and an instance of Foo is passed to two processes:
from multiprocessing import Process, Event
import time

class Foo:
    def __init__(self):
        self.lst = []
        self.lst.append(Event())

def worker1(foo):
    event = foo.lst[0]
    t = time.time_ns()
    print('Waiting for event to be set...')
    event.wait()
    print('Wait satisfied, elapsed time =', (time.time_ns() - t) / 1_000_000_000.0)

def worker2(foo):
    event = foo.lst[0]
    time.sleep(2)
    event.set()

def main():
    foo = Foo()
    p1 = Process(target=worker1, args=(foo,))
    p2 = Process(target=worker2, args=(foo,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()

if __name__ == '__main__':
    main()
Prints:
Waiting for event to be set...
Wait satisfied, elapsed time = 2.0112978
With some help I could run a process in Python. Now I want to share a value between the two tasks. I can set the value inside __init__, but I can't change it inside the run method.
And by the way: how do I kill the process when the main process stops?
from multiprocessing import Process, Value
import serial
import time

class P(Process):
    def __init__(self, num):
        num.value = 15
        super(P, self).__init__()

    def run(self):
        while True:
            num.value = num.value + 1
            print("run simple process")
            time.sleep(0.5)

def main():
    while True:
        print("run main")
        print (num.value)
        time.sleep(2.5)

if __name__ == "__main__":
    num = Value('d', 0.0)
    p = P(num)
    p.start()
    #p.join()
    main()
In your simplified case you only passed num to the constructor at initialization time.
To be able to access that value in the process's other methods, store it as state on the process instance:
class P(Process):
    def __init__(self, num):
        self.num = num
        self.num.value = 15
        super(P, self).__init__()

    def run(self):
        while True:
            self.num.value += 1
            print("run simple process")
            time.sleep(0.5)
For more "serious" cases, consider using Managers and synchronization primitives.
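As a hedged illustration of the two remaining points (synchronizing the update and stopping the child with the main process), the sketch below holds the Value's own lock via get_lock() around the increment and marks the process as a daemon. The bump helper and the times parameter are hypothetical additions so that the example terminates; they are not part of the answer above.

```python
from multiprocessing import Process, Value
import time


def bump(num, times, delay=0.0):
    """Increment the shared value `times` times, holding its lock each time."""
    for _ in range(times):
        with num.get_lock():    # makes the read-modify-write atomic
            num.value += 1
        time.sleep(delay)


class P(Process):
    def __init__(self, num, times):
        # daemon=True: the child is terminated when the main process exits
        super().__init__(daemon=True)
        self.num = num
        self.times = times

    def run(self):
        bump(self.num, self.times, delay=0.1)


if __name__ == "__main__":
    num = Value('d', 15.0)
    p = P(num, 5)
    p.start()
    p.join()
    print(num.value)
```

Without the join() call, a daemon process would simply be terminated as soon as the main process finishes, which is one way to address the "kill the process when the main process stops" part of the question.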
I need to pass each object in a large list to a function. After the function completes I no longer need the object passed to the function and would like to delete the object to save memory. If I were working with a single process I would do the following:
result = []
while len(mylist) > 0:
    result.append(myfunc(mylist.pop()))
As I loop over mylist I pop off each object in the list such that the object is no longer stored in mylist after it's passed to my function. How do I achieve this same effect in parallel using multiprocessing?
A simple consumer example:
import multiprocessing
import time
import random

class Consumer(multiprocessing.Process):
    def __init__(self, task_queue, result_queue):
        multiprocessing.Process.__init__(self)
        self.task_queue = task_queue
        self.result_queue = result_queue

    def run(self):
        while True:
            task = self.task_queue.get()
            if task is None:
                # Poison pill means shutdown
                self.task_queue.task_done()
                break
            answer = task.process()
            self.task_queue.task_done()
            self.result_queue.put(answer)
        return

class Task(object):
    def process(self):
        time.sleep(0.1) # pretend to take some time to do the work
        return random.randint(0, 100)

if __name__ == '__main__':
    # Establish communication queues
    tasks = multiprocessing.JoinableQueue()
    results = multiprocessing.Queue()
    # Start consumers
    num_consumers = multiprocessing.cpu_count() * 2
    consumers = [Consumer(tasks, results) for i in range(num_consumers)]
    for consumer in consumers:
        consumer.start()
    # Enqueue jobs
    num_jobs = 10
    for _ in range(num_jobs):
        tasks.put(Task())
    # Add a poison pill for each consumer
    for _ in range(num_consumers):
        tasks.put(None)
    # Wait for all tasks to finish
    tasks.join()
    # Start printing results
    while num_jobs:
        result = results.get()
        print('Result:', result)
        num_jobs -= 1
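A shorter alternative that stays closer to the pop loop in the question is to feed a pool from a generator that pops items off the list. This is a sketch under assumptions, not the method of the answer above: drain, process_and_drain and square are hypothetical names, and imap may pull items from the generator ahead of the workers, so the list is emptied as tasks are submitted rather than strictly as each one finishes.

```python
from multiprocessing import Pool


def drain(items):
    """Yield objects popped from the end of `items` until it is empty."""
    while items:
        yield items.pop()


def process_and_drain(pool, func, items):
    """Apply func to every popped item; results follow pop order."""
    return list(pool.imap(func, drain(items)))


def square(x):
    # Stand-in for myfunc from the question
    return x * x


if __name__ == "__main__":
    mylist = [1, 2, 3, 4]
    with Pool(2) as pool:
        result = process_and_drain(pool, square, mylist)
    print(result, mylist)
```

Once the results have been collected, mylist is empty in the parent process, which mirrors the single-process loop; each object is still pickled and sent to a worker, so the workers hold their own temporary copies.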
I'm trying to understand the basics of threading and concurrency. I want a simple case where two threads repeatedly try to access one shared resource.
The code:
import threading

class Thread(threading.Thread):
    def __init__(self, t, *args):
        threading.Thread.__init__(self, target=t, args=args)
        self.start()

count = 0
lock = threading.Lock()

def increment():
    global count
    lock.acquire()
    try:
        count += 1
    finally:
        lock.release()

def bye():
    while True:
        increment()

def hello_there():
    while True:
        increment()

def main():
    hello = Thread(hello_there)
    goodbye = Thread(bye)
    while True:
        print(count)

if __name__ == '__main__':
    main()
So, I have two threads, both trying to increment the counter. I thought that if thread 'A' called increment(), the lock would be established, preventing 'B' from accessing until 'A' has released.
Running the code makes it clear that this is not the case. You get all of the random data race-ish increments.
How exactly is the lock object used?
Additionally, I've tried putting the locks inside of the thread functions, but still no luck.
You can see that your locks are pretty much working as you are using them, if you slow down the process and make them block a bit more. You had the right idea, where you surround critical pieces of code with the lock. Here is a small adjustment to your example to show you how each waits on the other to release the lock.
import threading
import time
import inspect

class Thread(threading.Thread):
    def __init__(self, t, *args):
        threading.Thread.__init__(self, target=t, args=args)
        self.start()

count = 0
lock = threading.Lock()

def incre():
    global count
    # Name of the function that called incre()
    caller = inspect.getouterframes(inspect.currentframe())[1][3]
    print("Inside %s()" % caller)
    print("Acquiring lock")
    with lock:
        print("Lock Acquired")
        count += 1
        time.sleep(2)

def bye():
    while count < 5:
        incre()

def hello_there():
    while count < 5:
        incre()

def main():
    hello = Thread(hello_there)
    goodbye = Thread(bye)

if __name__ == '__main__':
    main()
Sample output:
...
Inside hello_there()
Acquiring lock
Lock Acquired
Inside bye()
Acquiring lock
Lock Acquired
...
import threading

# global variable x
x = 0

def increment():
    """
    function to increment global variable x
    """
    global x
    x += 1

def thread_task():
    """
    task for thread
    calls increment function 100000 times.
    """
    for _ in range(100000):
        increment()

def main_task():
    global x
    # setting global variable x as 0
    x = 0
    # creating threads
    t1 = threading.Thread(target=thread_task)
    t2 = threading.Thread(target=thread_task)
    # start threads
    t1.start()
    t2.start()
    # wait until threads finish their job
    t1.join()
    t2.join()

if __name__ == "__main__":
    for i in range(10):
        main_task()
        print("Iteration {0}: x = {1}".format(i,x))
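For comparison, here is a sketch of the same counter with the increment protected by a threading.Lock. It is an illustrative variant, not part of the code above: the parameter n and the return value of main_task are additions made so that the result can be checked.

```python
import threading

x = 0
lock = threading.Lock()


def increment():
    global x
    with lock:      # only one thread at a time runs the read-modify-write
        x += 1


def thread_task(n):
    for _ in range(n):
        increment()


def main_task(n=100000):
    global x
    x = 0
    t1 = threading.Thread(target=thread_task, args=(n,))
    t2 = threading.Thread(target=thread_task, args=(n,))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return x


if __name__ == "__main__":
    for i in range(3):
        print("Iteration {0}: x = {1}".format(i, main_task()))
```

With the lock held around x += 1, every iteration ends with x equal to twice n, because no increment can be lost between the read and the write.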