My question is about queues and ThreadPoolExecutor. If I understand the Python docs for Queue correctly, I can have code somewhat like the following and not have to worry about needing another lock in class B to control which thread is adding items to the queue, since Queue implements multi-producer, multi-consumer semantics?
from concurrent.futures import ThreadPoolExecutor
import queue
import threading

class A:
    def __init__(self, max_worker=1):
        self.pool = ThreadPoolExecutor(max_worker)
        self.buffer = {}
        self._lock = threading.RLock()

    def add_record_id(self, id, item):
        with self._lock:
            self.buffer[id].add(id, item, self.pool)

class B:
    def __init__(self):
        self.q = queue.Queue()

    def add(self, id, item, pool):
        if id >= 0:
            self.q.put(item)
            pool.submit(background_remover)  # background_remover defined elsewhere
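As a sanity check, here is a minimal standalone sketch (the names are illustrative, not from my real code) of the multi-producer behavior the docs describe: several pool threads call q.put() concurrently with no caller-side lock.

import queue
from concurrent.futures import ThreadPoolExecutor

q = queue.Queue()

def produce(i):
    q.put(i)  # Queue.put is internally synchronized; no extra lock needed

with ThreadPoolExecutor(max_workers=4) as pool:
    for i in range(100):
        pool.submit(produce, i)

print(q.qsize())  # 100 -- no lost items despite concurrent puts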
I am new to Python multiprocessing; here is some background about the code below. I am trying to create three processes: one to add an element to the list, one to modify an element in the list, and one to print the list.
The three processes ideally use the same list that is in shared memory, created using a Manager.
The problem I face is that testprocess2 is not able to set the value to 0, basically, it is not able to alter the list.
from multiprocessing import Process, Manager, Lock
from time import sleep
import random

class Trade:
def __init__(self, id):
self.exchange = None
self.order_id = id
class testprocess2(Process):
def __init__(self, trades, lock):
super().__init__(args=(trades, lock))
self.trades = trades
self.lock = lock
def run(self):
while True:
# lock.acquire()
print("Altering")
for idx in range(len(self.trades)):
self.trades[idx].order_id = 0
# lock.release()
sleep(1)
class testprocess1(Process):
def __init__(self, trades, lock):
super().__init__(args=(trades, lock))
self.trades = trades
self.lock = lock
def run(self):
while True:
print("start")
for idx in range(len(self.trades)):
print(self.trades[idx].order_id)
sleep(1)
class testprocess(Process):
def __init__(self, trades, lock):
super().__init__(args=(trades, lock))
self.trades = trades
self.lock = lock
def run(self):
while True:
# lock.acquire()
n = random.randint(0, 9)
print("adding random {}".format(n))
self.trades.append(Trade(n))
# lock.release()
# print(trades)
sleep(5)
if __name__ == "__main__":
with Manager() as manager:
records = manager.list([Trade(5)])
lock = Lock()
p1 = testprocess(records, lock)
p1.start()
p2 = testprocess1(records, lock)
p2.start()
p3 = testprocess2(records, lock)
p3.start()
p1.join()
p2.join()
p3.join()
Strictly speaking your managed list is not in shared memory and it is very important to understand what is going on. The actual list holding your Trade instances resides in a process that is created when you execute the Manager() call. When you then execute records = manager.list([Trade(5)]), records is not a direct reference to that list because, as I said, we are not dealing with shared memory. It is instead a special proxy object that implements the same methods as a list but when you, for example, invoke append on this proxy object, it takes the argument you are trying to append and serializes it and transmits it to the manager's process via either a socket or pipe where it gets de-serialized and appended to the actual list. In short, operations on the proxy object are turned into remote method calls.
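To make the proxy concrete, a quick sketch:

from multiprocessing import Manager

if __name__ == '__main__':
    with Manager() as manager:
        records = manager.list([1, 2, 3])
        print(type(records))   # <class 'multiprocessing.managers.ListProxy'>, not list
        records.append(4)      # pickled and sent to the manager process, appended there
        print(list(records))   # [1, 2, 3, 4] -- fetched back from the manager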
Now for your problem. You are trying to reset the order_id attribute with the following statement:
self.trades[idx].order_id = 0
Since we are dealing with a remote list via a proxy object, the above statements unfortunately become the equivalent of:
trade = self.trades[idx] # fetch object from the remote list
trade.order_id = 0 # reset the order_id to 0 on the local copy
What is missing is updating the list with the newly updated trade object:
self.trades[idx] = trade
So your single update statement really needs to be replaced with the above 3-statement sequence.
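Here is a minimal runnable demonstration of the difference, using the Trade class from your question:

from multiprocessing import Manager

class Trade:
    def __init__(self, id):
        self.exchange = None
        self.order_id = id

if __name__ == '__main__':
    with Manager() as manager:
        trades = manager.list([Trade(5)])
        trades[0].order_id = 0     # mutates a local copy only; the managed list is unchanged
        print(trades[0].order_id)  # still 5
        trade = trades[0]          # fetch the object from the remote list
        trade.order_id = 0         # modify the local copy
        trades[0] = trade          # write it back to the managed list
        print(trades[0].order_id)  # now 0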
I have also taken the liberty to modify your code in several ways.
The PEP8 Style Guide for Python Code recommends that class names be capitalized.
Since all of your process classes are constructed identically (i.e. have identical __init__ methods), I have created an abstract base class, TestProcessBase, from which they all inherit. All they have to do is provide a run method.
I have made these processes daemon processes, which means they terminate automatically when the main process terminates. I did this for demo purposes, so that the program does not loop endlessly; the main process terminates after 15 seconds.
You do not need to pass the trades and lock arguments to the __init__ method of the Process class. If you were not deriving your classes from Process and just wanted your newly created process to run a function foo taking arguments trades and lock, then you would specify p1 = Process(target=foo, args=(trades, lock)). That is the real purpose of the args argument: to be used together with the target argument. See the documentation for the threading.Thread class for details. I see very little value in deriving your classes from multiprocessing.Process (by not doing so there is better opportunity for reuse). But since you did, you are already setting instance attributes self.trades and self.lock in your __init__ method, and these will be used when your run method is invoked implicitly by your call to the start method. There is nothing further you need to do. See the two additional code examples at the end.
from multiprocessing import Process, Manager, Lock
from time import sleep
import random
from abc import ABC, abstractmethod
class Trade:
def __init__(self, id):
self.exchange = None
self.order_id = id
class TestProcessBase(Process, ABC):
    def __init__(self, trades, lock):
        Process.__init__(self, daemon=True)
        self.trades = trades
        self.lock = lock

    @abstractmethod
    def run(self):
        pass
class TestProcess2(TestProcessBase):
def run(self):
while True:
# lock.acquire()
print("Altering")
for idx in range(len(self.trades)):
trade = self.trades[idx]
trade.order_id = 0
# We must tell the managed list that it has been updated!!!:
self.trades[idx] = trade
# lock.release()
sleep(1)
class TestProcess1(TestProcessBase):
def run(self):
while True:
print("start")
for idx in range(len(self.trades)):
print(f'index = {idx}, order id = {self.trades[idx].order_id}')
sleep(1)
class TestProcess(TestProcessBase):
def run(self):
while True:
# lock.acquire()
n = random.randint(0, 9)
print("adding random {}".format(n))
self.trades.append(Trade(n))
# lock.release()
# print(trades)
sleep(5)
if __name__ == "__main__":
with Manager() as manager:
records = manager.list([Trade(5)])
lock = Lock()
p1 = TestProcess(records, lock)
p1.start()
p2 = TestProcess1(records, lock)
p2.start()
p3 = TestProcess2(records, lock)
p3.start()
sleep(15) # run for 15 seconds
Using classes not derived from multiprocessing.Process
from multiprocessing import Process, Manager, Lock
from time import sleep
import random
from abc import ABC, abstractmethod
class Trade:
def __init__(self, id):
self.exchange = None
self.order_id = id
class TestProcessBase(ABC):
    def __init__(self, trades, lock):
        self.trades = trades
        self.lock = lock

    @abstractmethod
    def process(self):
        pass
class TestProcess2(TestProcessBase):
def process(self):
while True:
# lock.acquire()
print("Altering")
for idx in range(len(self.trades)):
trade = self.trades[idx]
trade.order_id = 0
# We must tell the managed list that it has been updated!!!:
self.trades[idx] = trade
# lock.release()
sleep(1)
class TestProcess1(TestProcessBase):
def process(self):
while True:
print("start")
for idx in range(len(self.trades)):
print(f'index = {idx}, order id = {self.trades[idx].order_id}')
sleep(1)
class TestProcess(TestProcessBase):
def process(self):
while True:
# lock.acquire()
n = random.randint(0, 9)
print("adding random {}".format(n))
self.trades.append(Trade(n))
# lock.release()
# print(trades)
sleep(5)
if __name__ == "__main__":
with Manager() as manager:
records = manager.list([Trade(5)])
lock = Lock()
tp = TestProcess(records, lock)
p1 = Process(target=tp.process, daemon=True)
p1.start()
tp1 = TestProcess1(records, lock)
p2 = Process(target=tp1.process, daemon=True)
p2.start()
tp2 = TestProcess2(records, lock)
p3 = Process(target=tp2.process, daemon=True)
p3.start()
sleep(15) # run for 15 seconds
Using functions instead of classes derived from multiprocessing.Process
from multiprocessing import Process, Manager, Lock
from time import sleep
import random
class Trade:
def __init__(self, id):
self.exchange = None
self.order_id = id
def testprocess2(trades, lock):
while True:
# lock.acquire()
print("Altering")
for idx in range(len(trades)):
trade = trades[idx]
trade.order_id = 0
# We must tell the managed list that it has been updated!!!:
trades[idx] = trade
# lock.release()
sleep(1)
def testprocess1(trades, lock):
while True:
print("start")
for idx in range(len(trades)):
print(f'index = {idx}, order id = {trades[idx].order_id}')
sleep(1)
def testprocess(trades, lock):
while True:
# lock.acquire()
n = random.randint(0, 9)
print("adding random {}".format(n))
trades.append(Trade(n))
# lock.release()
# print(trades)
sleep(5)
if __name__ == "__main__":
with Manager() as manager:
records = manager.list([Trade(5)])
lock = Lock()
p1 = Process(target=testprocess, args=(records, lock), daemon=True)
p1.start()
p2 = Process(target=testprocess1, args=(records, lock), daemon=True)
p2.start()
p3 = Process(target=testprocess2, args=(records, lock), daemon=True)
p3.start()
sleep(15) # run for 15 seconds
Let's say there is a four-core CPU and I am running four processes. To each process I assigned a queue. But how do I run all the queues concurrently?
import multiprocessing

class ParentProcess(multiprocessing.Process):
def __init__(self, queue):
multiprocessing.Process.__init__(self)
self.queue = queue
def run(self):
while not self.queue.empty():
data = self.queue.get()
print(data)
def main():
numProcs = 4
queueList = [multiprocessing.Queue() for i in range(numProcs)]
for index, queue in enumerate(queueList):
deals = [1,2,3]
for deal in deals:
queue.put(deal)
p = ParentProcess(queue)
p.start()
        p.join()

if __name__ == '__main__':
    main()
Here each process runs one by one, but I need to run them in parallel and work on the elements of each queue in parallel.
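A likely cause: p.join() inside the loop blocks until each process has finished before the next one is even created. A minimal sketch of one way around this, keeping the ParentProcess class from the question and starting every process before joining any of them:

import multiprocessing

def main():
    num_procs = 4
    procs = []
    for _ in range(num_procs):
        queue = multiprocessing.Queue()
        for deal in [1, 2, 3]:
            queue.put(deal)
        p = ParentProcess(queue)  # ParentProcess as defined in the question
        p.start()                 # start every worker before joining any of them
        procs.append(p)
    for p in procs:
        p.join()                  # now wait for all of them together

if __name__ == '__main__':
    main()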
I've written this script to read data from a txt file and process it. But it seems that if I give it a big file and a high number of threads, the further it gets through the list, the slower the script becomes.
Is there a way to avoid waiting for all the threads to finish and start a new one whenever a thread is done with the work?
Also it seems that when it finishes processing, the script doesn't exit.
import threading, Queue, time
class Work(threading.Thread):
def __init__(self, jobs):
threading.Thread.__init__(self)
self.Lock = threading.Lock()
self.jobs = jobs
def myFunction(self):
#simulate work
self.Lock.acquire()
print("Firstname: "+ self.firstname + " Lastname: "+ self.lastname)
self.Lock.release()
time.sleep(3)
def run(self):
while True:
self.item = self.jobs.get().rstrip()
self.firstname = self.item.split(":")[0]
self.lastname = self.item.split(":")[1]
self.myFunction()
self.jobs.task_done()
def main(file):
jobs = Queue.Queue()
myList = open(file, "r").readlines()
MAX_THREADS = 10
pool = [Work(jobs) for i in range(MAX_THREADS)]
for thread in pool:
thread.start()
for item in myList:
jobs.put(item)
for thread in pool:
thread.join()
if __name__ == '__main__':
main('list.txt')
The script probably seems to take longer on larger inputs because there is a 3-second pause between each batch of printing.
The issue with the script not finishing is that, since you are using a Queue, you need to call join() on the Queue, not on the individual threads. To make sure that the script returns when the jobs have stopped running, you should also set daemon = True.
The Lock will also not work in the current code because each Work instance calls threading.Lock() and therefore gets its own lock; all the jobs need to share the same lock.
If you want to use this in Python 3 (which you should), the Queue module has been renamed to queue.
import threading, Queue, time
lock = threading.Lock() # One lock
class Work(threading.Thread):
def __init__(self, jobs):
threading.Thread.__init__(self)
self.daemon = True # set daemon
self.jobs = jobs
def myFunction(self):
#simulate work
lock.acquire() # All jobs share the one lock
print("Firstname: "+ self.firstname + " Lastname: "+ self.lastname)
        lock.release()
time.sleep(3)
def run(self):
while True:
self.item = self.jobs.get().rstrip()
self.firstname = self.item.split(":")[0]
self.lastname = self.item.split(":")[1]
self.myFunction()
self.jobs.task_done()
def main(file):
jobs = Queue.Queue()
with open(file, 'r') as fp: # Close the file when we're done
myList = fp.readlines()
MAX_THREADS = 10
pool = [Work(jobs) for i in range(MAX_THREADS)]
for thread in pool:
thread.start()
for item in myList:
jobs.put(item)
jobs.join() # Join the Queue
if __name__ == '__main__':
main('list.txt')
Simpler example (based on an example from the Python docs)
import threading
import time
from Queue import Queue # Py2
# from queue import Queue # Py3
lock = threading.Lock()
def worker():
while True:
item = jobs.get()
if item is None:
break
firstname, lastname = item.split(':')
lock.acquire()
print("Firstname: " + firstname + " Lastname: " + lastname)
lock.release()
time.sleep(3)
jobs.task_done()
jobs = Queue()
pool = []
MAX_THREADS = 10
for i in range(MAX_THREADS):
thread = threading.Thread(target=worker)
thread.start()
pool.append(thread)
with open('list.txt') as fp:
for line in fp:
jobs.put(line.rstrip())
# block until all tasks are done
jobs.join()
# stop workers
for i in range(MAX_THREADS):
jobs.put(None)
for thread in pool:
thread.join()
I am trying to find a way to compare different objects (inherited from the Thread class) in a way that preserves parallelism (real-time processing).
Every worker has three fields (message, count, n), and I update count on every iteration. Let's say that I have three worker threads. I need to compare the workers in my server based on the count field: how can I access and compare Worker.count for every worker in a way that keeps the parallelism?
from Queue import Queue
from threading import Thread
import time
class Worker(Thread):
def __init__(self, message, n):
Thread.__init__(self)
self.message = message
self.count= 0
self.n = n
def run(self):
while True:
print(self.message)
self.count+=1
time.sleep(self.n)
class Comparator(Thread):
def __init__(self, message, n):
Thread.__init__(self)
self.message = message
self.n = n
def run(self):
while True:
            max_count = max(x.count for x in threads)  # how can I access the other threads?
            print("max", max_count)
time.sleep(self.n)
thread1 = Worker("Test-1", 1)
thread2 = Worker("Test-2", 3)
s = Comparator("Test-3", 2)
s.start()
s.join()
threads = [thread1, thread2]
for g in threads:
g.start()
for worker in threads:
# wait for workers
worker.join()
NOTE: Using a shared object here is not a good solution for me; using a Queue(), for example, is not what I want. I need to do the comparison based on a field in the object that I update on the go (for simplicity, I use max()).
You can pass the threads list to the Comparator __init__() method:
[...]
class Comparator(Thread):
    def __init__(self, message, n, threads):
        Thread.__init__(self)
        self.message = message
        self.n = n
        self.threads = threads

    def run(self):
        while True:
            max_count = max(x.count for x in self.threads)  # avoid shadowing the builtin max
            print("max", max_count)
            time.sleep(self.n)
[...]
threads = [thread1, thread2]
s = Comparator("Test-3", 2, threads)