We are trying to share data between two processes/threads, but have not been able to accomplish this. We are looking for an easy (and elegant) way to do it.
This is our current code.
Goal: after the second thread/process is done, the listHolder in instance B must contain 2 items.
from multiprocessing import Process

class A():
    def __init__(self):
        self.name = "MyNameIsBlah"

class B():
    def __init__(self):
        # Contains a list of A objects. Is empty at first.
        self.listHolder = []

    def add(self, obj):
        self.listHolder.append(obj)

    def remove(self, obj):
        self.listHolder.remove(obj)

def process(list):
    # Create our second instance of A in the separate process/thread.
    secondItem = A()
    # Add our new instance to the list, so that we can access it outside of our process/thread.
    list.append(secondItem)

# Create a new instance of B, which is the manager. Its listHolder is empty here.
manager = B()

# Create a new instance of A, which is our first item.
firstItem = A()

# Add our first item to the manager. The listHolder now contains one item.
manager.add(firstItem)

# Start a new separate process.
p = Process(target=process, args=(manager.listHolder,))

# Now start the process.
p.start()

# We now want to access our second item here from the listHolder, which was created in the separate process/thread.
print len(manager.listHolder)  # gives 1
print manager.listHolder[1]    # gives an IndexError
Expected output: 2 A instances in listHolder.
Got output: 1 A instance in listHolder.
How can we access the objects in the manager from a separate process/thread, so that the two functions can run simultaneously in a non-blocking way?
Currently we are trying to accomplish this with processes, but if threads can accomplish the goal more easily, that is not a problem. Python 2.7 is used.
Update 1:
@James Mills suggested using ".join()". However, this blocks the main thread until the second Process is done. I tried it, but the Process used in this example never stops executing (while True): it acts as a timer, which must be able to iterate over a list and remove objects from it.
Does anyone have a suggestion for how to accomplish this and fix the current cPickle error?
If James Mills' answer doesn't work for you, here's a write-up of how to use queues to explicitly send data back and forth to a worker process:
#!/usr/bin/env python

import logging, multiprocessing, sys


def myproc(arg):
    return arg*2

def worker(inqueue, outqueue):
    logger = multiprocessing.get_logger()
    logger.info('start')
    while True:
        job = inqueue.get()
        logger.info('got %s', job)
        outqueue.put( myproc(job) )

def beancounter(inqueue):
    while True:
        print 'done:', inqueue.get()

def main():
    logger = multiprocessing.log_to_stderr(
        level=logging.INFO,
    )
    logger.info('setup')

    data_queue = multiprocessing.Queue()
    out_queue = multiprocessing.Queue()
    for num in range(5):
        data_queue.put(num)

    worker_p = multiprocessing.Process(
        target=worker, args=(data_queue, out_queue),
        name='worker',
    )
    worker_p.start()

    bean_p = multiprocessing.Process(
        target=beancounter, args=(out_queue,),
        name='beancounter',
    )
    bean_p.start()

    worker_p.join()
    bean_p.join()
    logger.info('done')


if __name__ == '__main__':
    main()
from: Django multiprocessing and empty queue after put
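Applied to the original question above, the same queue idea would mean building the second A instance in the child process, sending it back over a Queue, and appending it in the parent. A minimal, hedged sketch of that approach (make_item is an illustrative name, and A is trimmed to its name attribute):

from multiprocessing import Process, Queue

class A(object):
    def __init__(self):
        self.name = "MyNameIsBlah"

def make_item(out_q):
    # build the second A instance in the child and ship it back (A must be picklable)
    out_q.put(A())

if __name__ == '__main__':
    listHolder = [A()]                 # first item, added in the parent
    out_q = Queue()
    p = Process(target=make_item, args=(out_q,))
    p.start()
    listHolder.append(out_q.get())     # blocks only until the child has put its item
    p.join()
    print(len(listHolder))             # 2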
Another example of using multiprocessing Manager to handle the data is here:
http://johntellsall.blogspot.com/2014/05/code-multiprocessing-producerconsumer.html
One of the simplest ways of sharing state between processes is to use the multiprocessing.Manager class to synchronize data between processes (it internally uses a Queue):
Example:
from multiprocessing import Process, Manager

def f(d, l):
    d[1] = '1'
    d['2'] = 2
    d[0.25] = None
    l.reverse()

if __name__ == '__main__':
    manager = Manager()

    d = manager.dict()
    l = manager.list(range(10))

    p = Process(target=f, args=(d, l))
    p.start()
    p.join()

    print d
    print l
Output:
bash-4.3$ python -i foo.py
{0.25: None, 1: '1', '2': 2}
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
>>>
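Applied back to the original question, the same Manager can back the listHolder so that an append made in the child process shows up in the parent. A rough sketch under that assumption (worker is an illustrative name, A is reduced to what is needed here):

from multiprocessing import Process, Manager

class A(object):
    def __init__(self):
        self.name = "MyNameIsBlah"

def worker(shared_list):
    shared_list.append(A())        # append in the child is visible to the parent

if __name__ == '__main__':
    manager = Manager()
    listHolder = manager.list()    # a proxy list shared between processes
    listHolder.append(A())         # first item, added in the parent
    p = Process(target=worker, args=(listHolder,))
    p.start()
    p.join()
    print(len(listHolder))         # 2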
Note: Please be careful with the types of objects you are sharing and attaching to your Process classes, as you may end up with pickling issues. See: Python multiprocessing pickling error
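As a tiny illustration of that warning (Python 3, not from the linked post): under the spawn start method, which is the default on Windows, the whole Process object, including its target, is pickled before the child starts, so an unpicklable target such as a lambda fails at start():

import multiprocessing as mp

if __name__ == '__main__':
    mp.set_start_method('spawn')            # the only start method available on Windows
    p = mp.Process(target=lambda: None)     # a lambda cannot be pickled by reference
    p.start()                               # raises a PicklingError because the lambda can't be pickled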
Related
I have a function in which I create a pool of processes. Moreover, I use multiprocessing.Value() and multiprocessing.Lock() in order to manage some shared values between the processes.
I want to do the same thing with an array of objects in order to share it between the processes, but I don't know how to do it. I will only read from that array.
This is the function:
from multiprocessing import Value, Pool, Lock, cpu_count

def predict(matches_path, unknown_path, files_path, imtodetect_path, num_query_photos, use_top3, uid, workbook, excel_file_path, modelspath, email_address):
    shared_correct_matched_imgs = Value('i', 0)
    shared_unknown_matched_imgs = Value('i', 0)
    shared_tot_imgs = Value('i', 0)
    counter = Value('i', 0)
    shared_lock = Lock()
    num_workers = cpu_count()

    feature = load_feature(modelspath)

    pool = Pool(initializer=init_globals,
                initargs=[counter, shared_tot_imgs, shared_correct_matched_imgs, shared_unknown_matched_imgs,
                          shared_lock], processes=num_workers)
    for img in glob.glob(os.path.join(imtodetect_path, '*g')):
        pool.apply_async(predict_single_img, (img, imtodetect_path, excel_file_path, files_path, use_top3, uid, matches_path, unknown_path, num_query_photos, index, modelspath))
        index += increment
    pool.close()
    pool.join()
The array is created with the instruction feature = load_feature(modelspath). This is the array that I want to share.
In init_globals I initialize the shared values:
def init_globals(counter, shared_tot_imgs, shared_correct_matched_imgs, shared_unknown_matched_imgs, shared_lock):
    global cnt, tot_imgs, correct_matched_imgs, unknown_matched_imgs, lock
    cnt = counter
    tot_imgs = shared_tot_imgs
    correct_matched_imgs = shared_correct_matched_imgs
    unknown_matched_imgs = shared_unknown_matched_imgs
    lock = shared_lock
The easy way of providing shared static data is simply to make it a global variable accessible to the function you want to call. If you're using an operating system which supports "fork", it is very straightforward to use global variables in child processes, as long as they're constant (if you modify them, the changes won't be reflected in the other processes):
import multiprocessing as mp
from random import randint

shared = ['some', 'shared', 'data', f'{randint(0, 10**6)}']

def foo():
    print(' '.join(shared))

if __name__ == "__main__":
    mp.set_start_method("fork")
    # defining "shared" here would be valid also
    p = mp.Process(target=foo)
    p.start()
    p.join()
    print(' '.join(shared))  # the same random number means "shared" is the same object
This won't work when using "spawn" as the start method (the only one available on Windows), because the memory of the parent is not shared in any way with the child, so the child must "import" the main file to gain access to whatever the target function is (this is also why you can run into problems with decorators). If you define your data outside the if __name__ == "__main__": block, it will sort of work, but each process ends up with its own separate copy of the data, which can be undesirable if it's big, slow to create, or can change each time it's created.
import multiprocessing as mp
from random import randint

shared = ['some', 'shared', 'data', f'{randint(0, 10**6)}']

def foo():
    print(' '.join(shared))

if __name__ == "__main__":
    mp.set_start_method("spawn")
    p = mp.Process(target=foo)
    p.start()
    p.join()
    print(' '.join(shared))  # a different number means a different copy of "shared" (with roughly a one-in-a-million chance of matching)
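If the data does need to be identical (or is expensive to rebuild) under spawn, the usual alternative is to pass it explicitly; for a Pool that is typically done via initializer/initargs, so each worker receives one pickled copy up front instead of once per task. A rough sketch along those lines (init_worker and use_data are made-up names, not from the question):

import multiprocessing as mp

feature = None  # filled in inside each worker by the initializer

def init_worker(shared_feature):
    global feature
    feature = shared_feature        # one pickled copy per worker

def use_data(i):
    return (i, len(feature))        # read-only access to the shared data

if __name__ == "__main__":
    mp.set_start_method("spawn")
    data = ["some", "large", "read-only", "structure"]
    pool = mp.Pool(processes=mp.cpu_count(),
                   initializer=init_worker, initargs=(data,))
    print(pool.map(use_data, range(5)))
    pool.close()
    pool.join()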
My code has the following scheme:
class A():
    def evaluate(self):
        b = B()
        for i in range(30):
            b.run()

class B():
    def run(self):
        pass

if __name__ == '__main__':
    a = A()
    for i in range(10):
        a.evaluate()
And I want to have two levels of concurrency: the first one is on the evaluate method and the second one is on the run method (nested concurrency). The question is how to introduce this concurrency using the Pool class of the multiprocessing module. Should I explicitly pass the number of cores? The solution should not create more processes than multiprocessing.cpu_count().
Note: assume that the number of cores is greater than 10.
Edit:
I have seen a lot of comments saying that Python does not have true concurrency because of the GIL. This is true for Python multi-threading, but it is not quite correct for multiprocessing (look here). I have also timed it, as this article did, and the results show that it can run faster than sequential execution.
Your comment touches on a possible solution. In order to have "nested" concurrency you could have two separate pools. This would result in a "flat" program structure instead of a nested one. Additionally, it decouples A from B: A now knows nothing about B, it just publishes to a generic queue. The example below uses a single process per stage to illustrate wiring up concurrent workers communicating across an asynchronous queue, but each could easily be replaced with a pool:
import multiprocessing as mp

class A():
    def __init__(self, in_q, out_q):
        self.in_q = in_q
        self.out_q = out_q

    def evaluate(self):
        """
        Reads from the input queue, does work, and publishes the output.
        """
        while True:
            job = self.in_q.get()
            for i in range(30):
                self.out_q.put(i)

class B():
    def __init__(self, in_q):
        self.in_q = in_q

    def run(self):
        """
        Loop over the queue and process items; optionally configure
        another queue to "sink" the processing pipeline.
        """
        while True:
            job = self.in_q.get()

if __name__ == '__main__':
    # create the queues to wire up our concurrent worker pools
    A_q = mp.Queue()
    AB_q = mp.Queue()

    a = A(in_q=A_q, out_q=AB_q)
    b = B(in_q=AB_q)

    p = mp.Process(target=a.evaluate)
    p.start()

    p2 = mp.Process(target=b.run)
    p2.start()

    for i in range(10):
        A_q.put(i)

    p.join()
    p2.join()
This is a common pattern in golang.
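As a rough sketch of the "replace each single process with a pool of workers" idea, while keeping the total number of processes at or below multiprocessing.cpu_count(): a_worker and b_worker below are stand-ins for A.evaluate and B.run, and the None "poison pills" are one conventional way to shut the loops down. This is only an illustration, not the original answer's code:

import multiprocessing as mp

def a_worker(in_q, out_q):
    # stage 1: expand each job into 30 sub-jobs (stands in for A.evaluate)
    while True:
        job = in_q.get()
        if job is None:
            break
        for i in range(30):
            out_q.put((job, i))

def b_worker(in_q):
    # stage 2: consume sub-jobs (stands in for B.run)
    while True:
        sub_job = in_q.get()
        if sub_job is None:
            break

if __name__ == '__main__':
    n_total = mp.cpu_count()
    n_a = max(1, n_total // 2)              # roughly split the core budget
    n_b = max(1, n_total - n_a)             # between the two stages

    a_q, ab_q = mp.Queue(), mp.Queue()
    a_procs = [mp.Process(target=a_worker, args=(a_q, ab_q)) for _ in range(n_a)]
    b_procs = [mp.Process(target=b_worker, args=(ab_q,)) for _ in range(n_b)]
    for p in a_procs + b_procs:
        p.start()

    for i in range(10):                     # the 10 evaluate() calls
        a_q.put(i)
    for _ in a_procs:
        a_q.put(None)                       # stop stage 1 once the jobs are done
    for p in a_procs:
        p.join()

    for _ in b_procs:
        ab_q.put(None)                      # then stop stage 2
    for p in b_procs:
        p.join()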
Here is my code below. I put strings into a queue and expect dowork2 to do some work and return characters in shared_queue, but I always get nothing at while not shared_queue.empty().
Please give me some pointers, thanks.
import time
import multiprocessing as mp

class Test(mp.Process):

    def __init__(self, **kwargs):
        mp.Process.__init__(self)
        self.daemon = False
        print('dosomething')

    def run(self):
        manager = mp.Manager()
        queue = manager.Queue()
        shared_queue = manager.Queue()
        # shared_list = manager.list()
        pool = mp.Pool()
        results = []
        results.append(pool.apply_async(self.dowork2, (queue, shared_queue)))
        while True:
            time.sleep(0.2)
            t = time.time()
            queue.put('abc')
            queue.put('def')
            l = ''
            while not shared_queue.empty():
                l = l + shared_queue.get()
            print(l)
            print('%.4f' % (time.time() - t))
        pool.close()
        pool.join()

    def dowork2(queue, shared_queue):
        while True:
            path = queue.get()
            shared_queue.put(path[-1:])

if __name__ == '__main__':
    t = Test()
    t.start()
    # t.join()
    # t.run()
I managed to get it to work by moving your dowork2 outside the class. If you declare dowork2 as a function before the Test class and call it as
results.append(pool.apply_async(dowork2, (queue, shared_queue)))
it works as expected. I am not 100% sure but it probably goes wrong because your Test class is already subclassing Process. Now when your pool creates a subprocess and initialises the same class in the subprocess, something gets overridden somewhere.
Overall I wonder if Pool is really what you want to use here. Your worker seems to be in an infinite loop indicating you do not expect a return value from the worker, only the result in the return queue. If this is the case, you can remove Pool.
I also managed to get it to work while keeping your worker function within the class, when I scrapped the Pool and replaced it with another subprocess:
foo = mp.Process(group=None, target=self.dowork2, args=(queue, shared_queue))
foo.start()
# results.append(pool.apply_async(Test.dowork2, (queue, shared_queue)))
while True:
    ....
(you need to add self to your worker, though, or declare it as a static method:)
def dowork2(self, queue, shared_queue):
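The static-method variant of that would look roughly like this inside the Test class (only the decorator changes; the body stays as in the question):

    @staticmethod
    def dowork2(queue, shared_queue):
        while True:
            path = queue.get()
            shared_queue.put(path[-1:])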
I'm trying to inherit a subclass from multiprocessing.Process, which will have a queue for each instance, so that the queue can be used to catch the return value of the target.
But the problem is that multiprocessing.Process.start() uses a Popen object (https://github.com/python/cpython/blob/master/Lib/multiprocessing/process.py) to create a new process and run the target inside it. Is there a way to overload this without redefining the entire Process module?
This is what I'm trying to do:
import multiprocessing
import Queue

class Mprocessor(multiprocessing.Process):
    def __init__(self, **kwargs):
        multiprocessing.Process.__init__(self, **kwargs)
        self._ret = Queue.Queue()

    def run(self):
        self._ret.put(multiprocessing.Process.run(self))

    def getReturn(self):
        if self._ret.empty():
            return None
        return self._ret.get()
Here I try to create a multiprocessing.Queue inside the class.
I override the 'run' method so that when it is executed, the return value(s) of the target are put inside the queue.
I have a 'getReturn' method which is called in the main function using the Mprocessor class. This method should only be called when 'Mprocessor.is_alive()' (which is defined for multiprocessing.Process) returns False.
But this mechanism is not working, because when I call 'Mprocessor.start()' it creates a subprocess which runs the target in its own environment.
I want to know if there's a way to use the queue in the start method to get the return value, and avoid having the target take a queue argument to communicate.
I wanted to generalize this module.
I don't want my methods to be defined to have a queue to get the return value.
I want to have a module that can be applied to any function, because I am planning to have a manager method which takes a dict["process_name/ID" : methods/targets] and a dict["process_name/ID" : [argument_list]], creates a process for each of these targets, and returns a dict["process_name/ID" : (return tuple,)].
Any ideas will be welcomed.
EDIT
Manager function:
def Processor_call(func=None, func_args=None):
    if sorted(func.keys()) != sorted(func_args.keys()):
        print "Names in func dict and args dict don't match"
        return None

    process_list = multiprocessing.Queue()
    for i in func.keys():
        p = Mprocessor(name=i, target=func[i], args=tuple(func_args[i]))
        process_list.put(p)
        p.start()

    return_dict = {}
    while not process_list.empty():
        process_wait = process_list.get()
        if not process_wait.is_alive():
            process_wait.join()
            if process_wait.exitcode == 0:
                return_dict[process_wait.name] = process_wait.getReturn()
            else:
                print "Error in process %s, status not available" % process_wait.name
        else:
            process_list.put(process_wait)
    return return_dict
EDIT: The target function should look like this.
def sum(a, b):
    return a + b
I don't want to pass a queue into the function and return values through the queue.
I want to make a common module so that any existing method can use multiprocessing without any change to its definition, so the interface with other modules is maintained.
I don't want a function to be designed only to be run as a process; I want a common interface so that other modules can also use the function as a normal method, without having to read from a queue to get the return value.
Comment: ... so that I'll get the return value from the process started from start method
This will work for me, for instance:
class Mprocessor
import multiprocessing
import time

class Mprocessor(multiprocessing.Process):
    def __init__(self, queue, **kwargs):
        multiprocessing.Process.__init__(self, **kwargs)
        self._ret = queue

    def run(self):
        return_value = self._target(*self._args)
        self._ret.put((self.name, return_value))
        time.sleep(0.25)
        exit(0)
Start processes and wait for return values
def Processor_call(func=None, func_args=None):
    print('func=%s, func_args=%s' % (func, func_args))

    ret_q = multiprocessing.Manager().Queue()
    process_list = []
    for i in func.keys():
        p = Mprocessor(name=i, target=func[i], args=(func_args[i],), queue=ret_q)
        p.start()
        process_list.append(p)
        time.sleep(0.1)

    print('Block __main__ until all process terminated')
    for p in process_list:
        p.join()

    print('Aggregate alle return values')
    return_dict = {}
    while not ret_q.empty():
        p_name, value = ret_q.get()
        return_dict[p_name] = value
    return return_dict
__main__
if __name__ == '__main__':
rd = Processor_call({'f1':f1, 'f2':f1}, {'f1':1, 'f2':2})
print('rd=%s' % rd)
Output:
func={'f1': <function f1 at 0x...>, 'f2': <function f1 at 0x...>}, func_args={'f1': 1, 'f2': 2}
pid:4501 start 2
pid:4501 running
pid:4500 start 1
pid:4500 running
Block __main__ until all process terminated
pid:4501 running
pid:4500 running
pid:4501 running
pid:4500 running
pid:4501 Terminate
pid:4500 Terminate
Aggregate alle return values
rd={'f1': 1, 'f2': 2}
Tested with Python:3.4.2 and 2.7.9
Question: Is it possible to inherit multiprocessing.Process to communicate with the main process?
Yes, it's possible. But not by using a class object, as your process uses its own copy of the class object.
You have to use a global Queue object and pass it to your process.
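A minimal sketch of that last point: create the Queue in the parent and hand it to the process as an argument (worker is only an illustrative name):

import multiprocessing

def worker(ret_q):
    ret_q.put('result from the child')     # the child writes into the shared queue

if __name__ == '__main__':
    ret_q = multiprocessing.Queue()
    p = multiprocessing.Process(target=worker, args=(ret_q,))
    p.start()
    print(ret_q.get())                     # the parent reads the value back
    p.join()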
I have a method defined inside a class as follows:
from Queue import Queue
from threading import Thread, Lock

class Example(object):

    def f(self):
        things = []  # list of some objects to process
        my_set = set()
        my_dict = {}
        my_list = []

        def update(q, t_lock, my_set, my_dict, my_list):
            while True:
                thing = q.get()
                new_set, new_dict, new_list = otherclass.method(thing)
                with t_lock:
                    my_set = set_overlay(my_set, new_set)
                    my_dict.update(new_dict)
                    my_list += new_list
                q.task_done()

        q = Queue()
        num_threads = 10
        thread_lock = Lock()

        for i in range(num_threads):
            worker = Thread(target=update, args=(q, thread_lock, my_set, my_dict, my_list))
            worker.setDaemon(True)
            worker.start()

        for t in things:
            q.put(t)

        q.join()
As you can see I'm trying to update the variables my_set, my_dict, and my_list with some results from the threaded update method defined inside the f method. I can pass them all to the update function, but this only works for mutable datatypes.
I updated this to use threading.Lock() as user maxywb suggested, but I would like to keep the same questions below open to be answered now that the lock is included.
My questions are:
Is this threadsafe? Can I guarantee that no updates to any of the variables will be lost?
What if I wanted to throw an immutable variable into the mix, like an int that gets added to from the results of otherclass.method(thing)? How would I go about doing that?
Is there a better way to do this/architect this? The idea here is to update variables local to a class method from a thread, while sharing a (hopefully) threadsafe reference to that variable across threads.
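On question 2, one common workaround (not from this thread) is to keep the running total in a mutable container that the threads update in place under the lock, since rebinding a plain int inside update would be lost. A small sketch:

from threading import Thread, Lock

total = [0]               # a one-element list stands in for a shared, mutable int
lock = Lock()

def add_to_total(n):
    with lock:
        total[0] += n     # mutate the container in place; rebinding an int would be lost

threads = [Thread(target=add_to_total, args=(i,)) for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(total[0])           # 45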