How can you code nested concurrency in Python?

My code has the following scheme:
class A():
    def evaluate(self):
        b = B()
        for i in range(30):
            b.run()

class B():
    def run(self):
        pass

if __name__ == '__main__':
    a = A()
    for i in range(10):
        a.evaluate()
And I want to have two levels of concurrency: the first one is on the evaluate method and the second one is on the run method (nested concurrency). The question is how to introduce this concurrency using the Pool class of the multiprocessing module. Should I pass the number of cores explicitly? The solution should not create more processes than multiprocessing.cpu_count().
Note: assume that the number of cores is greater than 10.
Edit:
I have seen a lot of comments saying that Python does not have true concurrency because of the GIL. That is true for Python multi-threading, but for multiprocessing it is not quite correct (look here). I have also timed it, as this article did, and the results show that it can run faster than sequential execution.

Your comment touches on a possible solution. In order to have "nested" concurrency you could have two separate pools. This results in a "flat" program structure instead of a nested one. It also decouples A from B: A now knows nothing about B, it just publishes to a generic queue. The example below uses a single process per stage to illustrate wiring up concurrent workers communicating across an asynchronous queue, but each stage could easily be replaced with a pool:
import multiprocessing as mp

class A():
    def __init__(self, in_q, out_q):
        self.in_q = in_q
        self.out_q = out_q

    def evaluate(self):
        """
        Reads from the input queue, does work and publishes the output
        """
        while True:
            job = self.in_q.get()
            for i in range(30):
                self.out_q.put(i)

class B():
    def __init__(self, in_q):
        self.in_q = in_q

    def run(self):
        """
        Loop over the queue and process items; optionally configure
        with another queue to "sink" the processing pipeline
        """
        while True:
            job = self.in_q.get()

if __name__ == '__main__':
    # create the queues to wire up our concurrent worker pools
    A_q = mp.Queue()
    AB_q = mp.Queue()
    a = A(in_q=A_q, out_q=AB_q)
    b = B(in_q=AB_q)
    p = mp.Process(target=a.evaluate)
    p.start()
    p2 = mp.Process(target=b.run)
    p2.start()
    for i in range(10):
        A_q.put(i)
    p.join()
    p2.join()
This is a common pattern in golang.
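If either stage needs more throughput, the same wiring scales to a small pool of worker processes per stage. This is a minimal sketch assuming the A and B classes above are in scope; the worker counts (3 and 2) are arbitrary illustrations chosen to stay below multiprocessing.cpu_count(), and the workers loop forever as above, so a real program would also need a shutdown sentinel per worker:

import multiprocessing as mp

if __name__ == '__main__':
    A_q = mp.Queue()
    AB_q = mp.Queue()

    # several processes per stage, all competing for items on the same input queue
    a_workers = [mp.Process(target=A(in_q=A_q, out_q=AB_q).evaluate) for _ in range(3)]
    b_workers = [mp.Process(target=B(in_q=AB_q).run) for _ in range(2)]
    for w in a_workers + b_workers:
        w.start()

    for i in range(10):
        A_q.put(i)

    # join() would block here because the workers never exit in this sketch;
    # send one sentinel per worker and break out of run()/evaluate() on it instead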

Related

How to share data between two processes?

How can I share values from one process with another?
Apparently I can do that through multithreading but not multiprocessing.
Multithreading is slow for my program.
I cannot show my exact code so I made this simple example.
from multiprocessing import Process
from threading import Thread
import time

class exp:
    def __init__(self):
        self.var1 = 0

    def func1(self):
        self.var1 = 5
        print(self.var1)

    def func2(self):
        print(self.var1)

if __name__ == "__main__":
    # multithreading
    obj1 = exp()
    t1 = Thread(target=obj1.func1)
    t2 = Thread(target=obj1.func2)
    print("multithreading")
    t1.start()
    time.sleep(1)
    t2.start()
    time.sleep(3)

    # multiprocessing
    obj = exp()
    p1 = Process(target=obj.func1)
    p2 = Process(target=obj.func2)
    print("multiprocessing")
    p1.start()
    time.sleep(2)
    p2.start()
Expected output:
multithreading
5
5
multiprocessing
5
5
Actual output:
multithreading
5
5
multiprocessing
5
0
I know there have been a couple of close votes against this question, but the supposed duplicate question's answer does not really explain why the OP's program does not work as is, and the offered solution is not what I would propose. Hence:
Let's analyze what is happening. The creation of obj = exp() is done by the main process. The execution of exp.func1 occurs in a different process/address space, and therefore the obj object must be serialized/de-serialized into the address space of that process. In that new address space self.var1 comes across with the initial value of 0 and is then set to 5, but only the copy of the obj object that is in the address space of process p1 is being modified; the copy of that object that exists in the main process has not been modified. Then when you start process p2, another copy of obj from the main process is sent to the new process, but still with self.var1 having a value of 0.
The solution is for self.var1 to be an instance of multiprocessing.Value, which is a special variable that exists in shared memory accessible to all processes. See the docs.
from multiprocessing import Process, Value

class exp:
    def __init__(self):
        self.var1 = Value('i', 0, lock=False)

    def func1(self):
        self.var1.value = 5
        print(self.var1.value)

    def func2(self):
        print(self.var1.value)

if __name__ == "__main__":
    # multiprocessing
    obj = exp()
    p1 = Process(target=obj.func1)
    p2 = Process(target=obj.func2)
    print("multiprocessing")
    p1.start()
    # No need to sleep, just wait for p1 to complete
    # before starting p2:
    # time.sleep(2)
    p1.join()
    p2.start()
    p2.join()
Prints:
multiprocessing
5
5
Note
Using shared memory for this particular problem is much more efficient than using a managed class, which is referenced by the "close" comment.
The assignment of 5 to self.var1.value is an atomic operation and does not need to be a serialized operation. But if:
We were performing a non-atomic operation (requires multiple steps) such as self.var1.value += 1 and:
Multiple processes were performing this non-atomic operation in parallel, then:
We should create the value with a lock: self.var1 = Value('i', 0, lock=True) and:
Update the value under control of the lock: with self.var1.get_lock(): self.var1.value += 1
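As a hedged sketch of those last points (the increment function and counts below are illustrative, not part of the original question), the locked update could look like this:

from multiprocessing import Process, Value

def increment(counter, n):
    # each += is a read-modify-write, so it must run under the lock
    for _ in range(n):
        with counter.get_lock():
            counter.value += 1

if __name__ == "__main__":
    counter = Value('i', 0, lock=True)  # lock=True is also the default
    workers = [Process(target=increment, args=(counter, 1000)) for _ in range(4)]
    for p in workers:
        p.start()
    for p in workers:
        p.join()
    print(counter.value)  # 4000 with the lock; possibly less without it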
There are several ways to do that: you can use shared memory, a FIFO, or message passing.
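A hedged sketch of the message-passing option, using multiprocessing.Pipe to send the value back to the parent instead of sharing memory (the function name mirrors the example above but is otherwise illustrative):

from multiprocessing import Process, Pipe

def func1(conn):
    var1 = 5
    conn.send(var1)  # pass the value back over the pipe
    conn.close()

if __name__ == "__main__":
    parent_conn, child_conn = Pipe()
    p = Process(target=func1, args=(child_conn,))
    p.start()
    print(parent_conn.recv())  # prints 5
    p.join()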

Is it possible to set maxtasksperchild for a threadpool?

After encountering some probable memory leaks in a long-running multi-threaded script, I found out about maxtasksperchild, which can be used in a multiprocessing Pool like this:
import multiprocessing

with multiprocessing.Pool(processes=32, maxtasksperchild=x) as pool:
    pool.imap(function, stuff)
Is something similar possible for the ThreadPool (multiprocessing.pool.ThreadPool)?
As the answer by noxdafox says, there is no way to do it via the parent class, but you can use the threading module to control the maximum number of tasks per child. Since multiprocessing.pool.ThreadPool is built on threads anyway, the threading module gives you similar behaviour:
import threading

def split_processing(yourlist, num_splits=4):
    '''
    yourlist   = list which you want to pass to the function for threading.
    num_splits = controls how many threads are used and how many items each gets.
    '''
    split_size = len(yourlist) // num_splits
    threads = []
    for i in range(num_splits):
        start = i * split_size
        end = len(yourlist) if i + 1 == num_splits else (i + 1) * split_size
        threads.append(threading.Thread(target=function, args=(yourlist, start, end)))
        threads[-1].start()

    # wait for all threads to finish
    for t in threads:
        t.join()
Let's say yourlist has 100 items. Then:
if num_splits = 10, then threads = 10 and each thread gets 10 tasks.
if num_splits = 5, then threads = 5 and each thread gets 20 tasks.
if num_splits = 50, then threads = 50 and each thread gets 2 tasks.
And so on.
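The snippet above assumes a worker named function that takes the list plus a start and end index; that name is a placeholder, so here is a minimal hypothetical example of what it could look like, followed by a call to split_processing:

def function(yourlist, start, end):
    # process only this thread's slice of the list
    for item in yourlist[start:end]:
        print(item)

split_processing(list(range(100)), num_splits=10)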
Looking at the multiprocessing.pool.ThreadPool implementation, it becomes evident that the maxtasksperchild parameter is not propagated to the parent multiprocessing.Pool class. The multiprocessing.pool.ThreadPool implementation has never been completed, hence it lacks a few features (as well as tests and documentation).
The pebble package implements a ThreadPool which supports restarting workers after a given number of tasks have been processed.
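A hedged sketch of that approach, under the assumption that pebble's ThreadPool takes a max_tasks argument and exposes a schedule method returning a future, as its documentation describes (check the pebble docs for the exact signatures):

from pebble import ThreadPool

def work(x):
    return x * 2

# assumption: max_tasks restarts a worker thread after it has processed that many tasks
pool = ThreadPool(max_workers=4, max_tasks=10)
futures = [pool.schedule(work, args=(i,)) for i in range(100)]
results = [f.result() for f in futures]
pool.close()
pool.join()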
I wanted a ThreadPool that will run a new task as soon as another task in the pool completes (i.e. maxtasksperchild=1). I decided to write a small "ThreadPool" class that creates a new thread for every task. As soon as a task in the pool completes, another thread is created for the next value in the iterable passed to the map method. The map method blocks until all values in the passed iterable have been processed and their threads have returned.
import threading

class ThreadPool():
    def __init__(self, processes=20):
        self.processes = processes
        self.threads = [Thread() for _ in range(0, processes)]

    def get_dead_threads(self):
        dead = []
        for thread in self.threads:
            if not thread.is_alive():
                dead.append(thread)
        return dead

    def is_thread_running(self):
        return len(self.get_dead_threads()) < self.processes

    def map(self, func, values):
        attempted_count = 0
        values_iter = iter(values)
        # loop until all values have been attempted to be processed and
        # all threads are finished running
        while (attempted_count < len(values) or self.is_thread_running()):
            for thread in self.get_dead_threads():
                try:
                    # run thread with the next value
                    value = next(values_iter)
                    attempted_count += 1
                    thread.run(func, value)
                except StopIteration:
                    break

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, exc_tb):
        pass

class Thread():
    def __init__(self):
        self.thread = None

    def run(self, target, *args, **kwargs):
        self.thread = threading.Thread(target=target,
                                       args=args,
                                       kwargs=kwargs)
        self.thread.start()

    def is_alive(self):
        if self.thread:
            return self.thread.is_alive()
        else:
            return False
You can use it like this:
def run_job(value, mp_queue=None):
    # do something with value
    value += 1

with ThreadPool(processes=2) as pool:
    pool.map(run_job, [1, 2, 3, 4, 5])

Python 3 Limit count of active threads (finished threads do not quit)

I want to limit the number of active threads. What I have seen is that a finished thread stays alive and does not exit itself, so the number of active threads keeps growing until an error occurs.
The following code starts only 8 threads at a time, but they stay alive even when they have finished, so the number keeps growing:
import threading
import time

class ThreadEx(threading.Thread):
    __thread_limiter = None
    __max_threads = 2

    @classmethod
    def max_threads(cls, thread_max):
        ThreadEx.__max_threads = thread_max
        ThreadEx.__thread_limiter = threading.BoundedSemaphore(value=ThreadEx.__max_threads)

    def __init__(self, target=None, args: tuple = ()):
        super().__init__(target=target, args=args)
        if not ThreadEx.__thread_limiter:
            ThreadEx.__thread_limiter = threading.BoundedSemaphore(value=ThreadEx.__max_threads)

    def run(self):
        ThreadEx.__thread_limiter.acquire()
        try:
            # success = self._target(*self._args)
            # if success: return True
            super().run()
        except:
            pass
        finally:
            ThreadEx.__thread_limiter.release()

def call_me(test1, test2):
    print(test1 + test2)
    time.sleep(1)

ThreadEx.max_threads(8)

for i in range(0, 99):
    t = ThreadEx(target=call_me, args=("Thread count: ", str(threading.active_count())))
    t.start()
Due to the for loop, the number of threads keeps growing to 99.
I know that a thread has done its work because call_me has been executed and threading.active_count() was printed.
Does somebody know how I can make sure that a finished thread does not stay alive?
This may be a silly answer, but to me it looks like you are trying to reinvent ThreadPool.
from multiprocessing.pool import ThreadPool
from time import sleep

p = ThreadPool(8)

def call_me(test1):
    print(test1)
    sleep(1)

for i in range(0, 99):
    p.apply_async(call_me, args=(i,))

p.close()
p.join()
This will ensure that only 8 concurrent threads are running your function at any point in time. And if you want a bit more performance, you can import Pool from multiprocessing and use that. The interface is exactly the same, but your pool will now be subprocesses instead of threads, which usually gives a performance boost as the GIL does not get in the way.
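For reference, a hedged sketch of the process-based variant that paragraph suggests; only the import and the __main__ guard change, the interface stays the same:

from multiprocessing import Pool
from time import sleep

def call_me(test1):
    print(test1)
    sleep(1)

if __name__ == '__main__':
    p = Pool(8)  # 8 worker processes instead of 8 threads
    for i in range(0, 99):
        p.apply_async(call_me, args=(i,))
    p.close()
    p.join()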
I have changed the class according to the help from Hannu.
I post it here for reference; maybe it's useful for others who come across this post:
import threading
from multiprocessing.pool import ThreadPool
import time

class MultiThread():
    __thread_pool = None

    @classmethod
    def begin(cls, max_threads):
        MultiThread.__thread_pool = ThreadPool(max_threads)

    @classmethod
    def end(cls):
        MultiThread.__thread_pool.close()
        MultiThread.__thread_pool.join()

    def __init__(self, target=None, args: tuple = ()):
        self.__target = target
        self.__args = args

    def run(self):
        try:
            result = MultiThread.__thread_pool.apply_async(self.__target, args=self.__args)
            return result.get()
        except:
            pass

def call_me(test1, test2):
    print(test1 + test2)
    time.sleep(1)
    return 0

MultiThread.begin(8)

for i in range(0, 99):
    t = MultiThread(target=call_me, args=("Thread count: ", str(threading.active_count())))
    t.run()

MultiThread.end()
The maximum number of threads at any given time is 8, determined by the begin method.
The run method also returns the result of the function you passed in, if it returns something.
Hope that helps.

python - multiprocessing with queue

Here is my code below. I put strings in the queue and hope dowork2 will do some work and return characters on shared_queue,
but I always get nothing at while not shared_queue.empty().
Please give me some pointers, thanks.
import time
import multiprocessing as mp

class Test(mp.Process):
    def __init__(self, **kwargs):
        mp.Process.__init__(self)
        self.daemon = False
        print('dosomething')

    def run(self):
        manager = mp.Manager()
        queue = manager.Queue()
        shared_queue = manager.Queue()
        # shared_list = manager.list()
        pool = mp.Pool()
        results = []
        results.append(pool.apply_async(self.dowork2, (queue, shared_queue)))
        while True:
            time.sleep(0.2)
            t = time.time()
            queue.put('abc')
            queue.put('def')
            l = ''
            while not shared_queue.empty():
                l = l + shared_queue.get()
            print(l)
            print('%.4f' % (time.time() - t))
        pool.close()
        pool.join()

    def dowork2(queue, shared_queue):
        while True:
            path = queue.get()
            shared_queue.put(path[-1:])

if __name__ == '__main__':
    t = Test()
    t.start()
    # t.join()
    # t.run()
I managed to get it to work by moving your dowork2 outside the class. If you declare dowork2 as a function before the Test class and call it as
results.append(pool.apply_async(dowork2, (queue, shared_queue)))
it works as expected. I am not 100% sure, but it probably goes wrong because your Test class is already subclassing Process; when your pool creates a subprocess and initialises the same class in that subprocess, something gets overridden somewhere.
Overall I wonder if Pool is really what you want to use here. Your worker seems to be in an infinite loop, indicating that you do not expect a return value from the worker, only the results on the return queue. If this is the case, you can remove the Pool.
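A hedged sketch of that restructuring, with dowork2 declared as a module-level function before the class (the rest follows the original code, trimmed slightly):

import time
import multiprocessing as mp

def dowork2(queue, shared_queue):
    # module-level function: picklable, so it can be submitted to the pool
    while True:
        path = queue.get()
        shared_queue.put(path[-1:])

class Test(mp.Process):
    def run(self):
        manager = mp.Manager()
        queue = manager.Queue()
        shared_queue = manager.Queue()
        pool = mp.Pool()
        pool.apply_async(dowork2, (queue, shared_queue))
        while True:
            time.sleep(0.2)
            queue.put('abc')
            queue.put('def')
            l = ''
            while not shared_queue.empty():
                l = l + shared_queue.get()
            print(l)

if __name__ == '__main__':
    Test().start()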
I also managed to get it to work while keeping your worker function within the class, when I scrapped the Pool and replaced it with another subprocess:
foo = mp.Process(group=None, target=self.dowork2, args=(queue, shared_queue))
foo.start()
# results.append(pool.apply_async(Test.dowork2, (queue, shared_queue)))
while True:
    ....
(you need to add self to your worker, though, or declare it as a static method:)
def dowork2(self, queue, shared_queue):
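For completeness, a hedged sketch of the staticmethod alternative just mentioned; the body is the original dowork2, only the decorator and the dropped self change:

import multiprocessing as mp

class Test(mp.Process):
    # ... __init__ and run as in the question ...

    @staticmethod
    def dowork2(queue, shared_queue):
        # no self parameter needed when declared static
        while True:
            path = queue.get()
            shared_queue.put(path[-1:])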

Python - start two processes to run indefinitely

I have constructed a simple example script that defines three separate processes using multiprocessing in Python. My objective is to have one parent thread that spawns two smaller threads that will collect and process data.
Currently, my implementation looks like this:
from Queue import Queue, Empty
from multiprocessing import Process
import time
import hashlib

class FillQueue(Process):
    def __init__(self, q):
        Process.__init__(self)
        self.q = q

    def run(self):
        i = 0
        while i is not 5:
            print 'putting'
            self.q.put('foo')
            i += 1
        self.q.put('|STOP|')

class ConsumeQueue(Process):
    def __init__(self, q):
        Process.__init__(self)
        self.q = q

    def run(self):
        print 'Consume'
        while True:
            try:
                value = self.q.get(False)
                print value
                if value == '|STOP|':
                    print 'done'
                    break
            except Empty:
                print 'Nothing to process atm'

class Ripper(Process):
    q = Queue()

    def __init__(self):
        self.fq = FillQueue(self.q)
        self.cq = ConsumeQueue(self.q)
        self.fq.daemon = True
        self.cq.daemon = True

    def run(self):
        try:
            self.fq.start()
            self.cq.start()
        except KeyboardInterrupt:
            print 'exit'

if __name__ == '__main__':
    r = Ripper()
    r.start()
As it runs presently, the output from the script on CLI looks like this:
putting
putting
putting
putting
putting
Consume
foo
foo
foo
foo
foo
|STOP|
done
Obviously, the way I am starting my two threads is blocking, since the consumer doesn't even begin to process the items in the queue until the filler finishes adding items.
How should I rewrite this to make both threads begin immediately and not block, so the consumer will simply pass to the Empty except block while there is no work to process, but will exit completely when it receives the stop message?
EDIT: typo, had the start and run methods mixed up
You seem to be starting multiple processes using multiprocessing.Process.
However, you are using Queue.Queue which is only threadsafe, and not designed to be used by multiple processes.
shevek's answer is valid as well, but as a start, you should replace Queue.Queue with multiprocessing.Queue.
try this:
from Queue import Empty
from multiprocessing import Process, Queue
import time
import hashlib

class FillQueue(object):
    def __init__(self, q):
        self.q = q

    def run(self):
        i = 0
        while i < 5:
            print 'putting'
            self.q.put('foo %d' % i)
            i += 1
            time.sleep(.5)
        self.q.put('|STOP|')

class ConsumeQueue(object):
    def __init__(self, q):
        self.q = q

    def run(self):
        while True:
            try:
                value = self.q.get(False)
                print value
                if value == '|STOP|':
                    print 'done'
                    break
            except Empty:
                print 'Nothing to process atm'
                time.sleep(.2)

if __name__ == '__main__':
    q = Queue()
    f = FillQueue(q)
    c = ConsumeQueue(q)

    p1 = Process(target=f.run)
    p1.start()
    p2 = Process(target=c.run)
    p2.start()

    p1.join()
    p2.join()
I think your program works fine. The CPU processes only one thing at a time, for a short time slice, but the time required to put all your items in the queue is very short, so there is no reason the filler cannot do it within a single slice.
If you add some delays in the filler, I think you will see that it actually works as you expect.
