How to share data between two processes? - python

How can I share values from one process with another?
Apparently I can do that through multithreading but not multiprocessing.
Multithreading is slow for my program.
I cannot show my exact code so I made this simple example.
from multiprocessing import Process
from threading import Thread
import time

class exp:
    def __init__(self):
        self.var1 = 0

    def func1(self):
        self.var1 = 5
        print(self.var1)

    def func2(self):
        print(self.var1)

if __name__ == "__main__":
    # multithreading
    obj1 = exp()
    t1 = Thread(target=obj1.func1)
    t2 = Thread(target=obj1.func2)
    print("multithreading")
    t1.start()
    time.sleep(1)
    t2.start()
    time.sleep(3)

    # multiprocessing
    obj = exp()
    p1 = Process(target=obj.func1)
    p2 = Process(target=obj.func2)
    print("multiprocessing")
    p1.start()
    time.sleep(2)
    p2.start()
Expected output:
multithreading
5
5
multiprocessing
5
5
Actual output:
multithreading
5
5
multiprocessing
5
0

I know there have been a couple of close votes against this question, but the supposed duplicate question's answer does not really explain why the OP's program does not work as is, and the offered solution is not what I would propose. Hence:
Let's analyze what is happening. The creation of obj = exp() is done by the main process. The execution of exp.func1 occurs in a different process/address space, and therefore the obj object must be serialized/de-serialized to the address space of that process. In that new address space self.var1 comes across with the initial value of 0 and is then set to 5, but only the copy of the obj object that is in the address space of process p1 is being modified; the copy of that object that exists in the main process has not been modified. Then when you start process p2, another copy of obj from the main process is sent to the new process, but still with self.var1 having a value of 0.
The solution is for self.var1 to be an instance of multiprocessing.Value, which is a special variable that exists in shared memory accessible to all processes. See the docs.
from multiprocessing import Process, Value

class exp:
    def __init__(self):
        self.var1 = Value('i', 0, lock=False)

    def func1(self):
        self.var1.value = 5
        print(self.var1.value)

    def func2(self):
        print(self.var1.value)

if __name__ == "__main__":
    # multiprocessing
    obj = exp()
    p1 = Process(target=obj.func1)
    p2 = Process(target=obj.func2)
    print("multiprocessing")
    p1.start()
    # No need to sleep, just wait for p1 to complete
    # before starting p2:
    #time.sleep(2)
    p1.join()
    p2.start()
    p2.join()
Prints:
multiprocessing
5
5
Note
Using shared memory for this particular problem is much more efficient than using a managed class, which is referenced by the "close" comment.
The assignment of 5 to self.var1.value is an atomic operation and does not need to be serialized. But if we were performing a non-atomic operation (one requiring multiple steps), such as self.var1.value += 1, and multiple processes were performing that non-atomic operation in parallel, then we should create the value with a lock, self.var1 = Value('i', 0, lock=True), and update the value under control of the lock: with self.var1.get_lock(): self.var1.value += 1
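To make that concrete, here is a minimal sketch (my own illustration, not part of the answer above) of several worker processes incrementing a shared counter under the Value's lock:
from multiprocessing import Process, Value

def bump(counter, n):
    # += is a read-modify-write sequence, so guard it with the Value's lock
    for _ in range(n):
        with counter.get_lock():
            counter.value += 1

if __name__ == "__main__":
    counter = Value('i', 0, lock=True)  # lock=True is also the default
    workers = [Process(target=bump, args=(counter, 1000)) for _ in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(counter.value)  # reliably 4000 because each += runs under the lock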

There are several ways to do that: you can use shared memory, a FIFO, or message passing.
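As an illustration of the message-passing option, here is a minimal sketch using multiprocessing.Pipe (the function and variable names are just for the example):
from multiprocessing import Process, Pipe

def producer(conn):
    conn.send(5)        # hand the value to the other process
    conn.close()

def consumer(conn):
    print(conn.recv())  # prints 5

if __name__ == "__main__":
    parent_conn, child_conn = Pipe()
    p1 = Process(target=producer, args=(child_conn,))
    p2 = Process(target=consumer, args=(parent_conn,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()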

Related

Update and Read the same variable using python Ray

I'm just studying multiprocessing in Python. I have some code that updates the value of a variable in one process, and other processes read the value of this variable. This is working as I expected.
Now I just want to know if there is some way to do the same using the Ray library, to improve execution speed if I need to run lots of processes reading it.
from multiprocessing import Process, Manager

def write_to_dict(d, value):
    while True:
        value = value + 1
        d['key'] = value

def read_from_dict(d):
    while True:
        read = d['key']
        print(read)

if __name__ == '__main__':
    manager = Manager()
    shared_dict = manager.dict()
    p1 = Process(target=write_to_dict, args=(shared_dict, 0))
    p2 = Process(target=read_from_dict, args=(shared_dict,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
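The Ray part of the question is left open here, but as a rough sketch of the equivalent (the class and method names below are my own illustrative choices): a Ray actor can hold the shared value, and other tasks or processes read it through remote method calls.
import ray

@ray.remote
class SharedCounter:
    def __init__(self):
        self.value = 0

    def increment(self):
        self.value += 1
        return self.value

    def read(self):
        return self.value

if __name__ == '__main__':
    ray.init()
    counter = SharedCounter.remote()       # the actor lives in its own process
    ray.get(counter.increment.remote())    # one writer updates the value
    print(ray.get(counter.read.remote()))  # any number of readers can query it; prints 1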

Multiprocessing, shared object which contains events

I want to share a flag object which will contain multiple multiprocessing.Event objects, to communicate with multiple Python processes created by multiprocessing.
Will this work?
You could do something like the following. Class Foo internally creates a list of multiprocessing.Event objects (in this case only 1 object for demo purposes) and an instance of Foo is passed to two processes:
from multiprocessing import Process, Event
import time

class Foo:
    def __init__(self):
        self.lst = []
        self.lst.append(Event())

def worker1(foo):
    event = foo.lst[0]
    t = time.time_ns()
    print('Waiting for event to be set...')
    event.wait()
    print('Wait satisfied, elapsed time =', (time.time_ns() - t) / 1_000_000_000.0)

def worker2(foo):
    event = foo.lst[0]
    time.sleep(2)
    event.set()

def main():
    foo = Foo()
    p1 = Process(target=worker1, args=(foo,))
    p2 = Process(target=worker2, args=(foo,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()

if __name__ == '__main__':
    main()
Prints:
Waiting for event to be set...
Wait satisfied, elapsed time = 2.0112978

How can you code a nested concurrency in python?

My code has the following scheme:
class A():
    def evaluate(self):
        b = B()
        for i in range(30):
            b.run()

class B():
    def run(self):
        pass

if __name__ == '__main__':
    a = A()
    for i in range(10):
        a.evaluate()
And I want to have two levels of concurrency: the first is on the evaluate method and the second is on the run method (nested concurrency). The question is how to introduce this concurrency using the Pool class of the multiprocessing module. Should I explicitly pass the number of cores? The solution should not create more processes than multiprocessing.cpu_count().
Note: assume that the number of cores is greater than 10.
Edit:
I have seen a lot of comments saying that Python does not have true concurrency due to the GIL. This is true for Python multi-threading, but for multiprocessing it is not quite correct; look here. I have also timed it, as this article did, and the results show that it can run faster than sequential execution.
Your comment touches on a possible solution. In order to have "nested" concurrency you could have 2 separate pools. This would result in a "flat" program structure instead of a nested one. Additionally, it decouples A from B: A now knows nothing about B, it just publishes to a generic queue. The example below uses a single process per stage to illustrate wiring up concurrent workers communicating across an asynchronous queue, but each could easily be replaced with a pool:
import multiprocessing as mp

class A():
    def __init__(self, in_q, out_q):
        self.in_q = in_q
        self.out_q = out_q

    def evaluate(self):
        """
        Reads from the input queue, does work, and publishes output
        """
        while True:
            job = self.in_q.get()
            for i in range(30):
                self.out_q.put(i)

class B():
    def __init__(self, in_q):
        self.in_q = in_q

    def run(self):
        """
        Loop over the queue and process items; optionally configure
        with another queue to "sink" the processing pipeline
        """
        while True:
            job = self.in_q.get()

if __name__ == '__main__':
    # create the queues to wire up our concurrent worker pools
    A_q = mp.Queue()
    AB_q = mp.Queue()
    a = A(in_q=A_q, out_q=AB_q)
    b = B(in_q=AB_q)
    p = mp.Process(target=a.evaluate)
    p.start()
    p2 = mp.Process(target=b.run)
    p2.start()
    for i in range(10):
        A_q.put(i)
    p.join()
    p2.join()
This is a common pattern in golang.
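If you do want to stay with Pool objects, here is a rough sketch of the same flat, two-stage idea (the staging functions and pool sizes are my own assumptions, not part of the answer above): one pool runs the 10 evaluate calls and merely produces sub-jobs, and a second pool processes the 300 sub-jobs, so the total number of workers stays within multiprocessing.cpu_count().
import multiprocessing as mp

def evaluate(i):
    # stage 1 (A.evaluate in the question): produce the 30 sub-jobs
    # instead of running them directly
    return [(i, j) for j in range(30)]

def run_b(job):
    # stage 2 (B.run in the question): process one sub-job
    i, j = job
    return i * j

if __name__ == '__main__':
    n = mp.cpu_count()
    # two separate pools, sized so that together they stay within cpu_count()
    with mp.Pool(processes=min(10, max(1, n // 2))) as stage1, \
         mp.Pool(processes=max(1, n - n // 2)) as stage2:
        batches = stage1.map(evaluate, range(10))
        sub_jobs = [job for batch in batches for job in batch]
        results = stage2.map(run_b, sub_jobs)
    print(len(results))  # 300 processed sub-jobs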

Access data between two threading processes

We are trying to access data between two threads, but are unable to accomplish this. We are looking for an easy (and elegant) way.
This is our current code.
Goal: after the second thread/process is done, the listHolder in instance B must contain 2 items.
class A:
    self.name = "MyNameIsBlah"

class B:
    # Contains a list of A objects. Is now empty.
    self.listHolder = []

    def add(self, obj):
        self.listHolder.append(obj)

    def remove(self, obj):
        self.listHolder.remove(obj)

def process(list):
    # Create our second instance of A in the process/thread
    secondItem = A()
    # Add our new instance to the list, so that we can access it outside of our process/thread.
    list.append(secondItem)

# Create new instance of B which is the manager. Our listHolder is empty here.
manager = B()

# Create new instance of A which is our first item
firstItem = A()

# Add our first item to the manager. Our listHolder now contains one item.
manager.add(firstItem)

# Start a new separate process.
p = Process(target=process, args=(manager.listHolder,))

# Now start the thread
p.start()

# We now want to access our second item here from the listHolder, which was initiated in the separate process/thread.
print len(manager.listHolder)   # prints 1
print manager.listHolder[1]     # ERROR
Expected output: 2 A instances in listHolder.
Got output: 1 A instance in listHolder.
How can we access our objects in the manager from a separate process/thread, so the two functions run simultaneously in a non-blocking way?
Currently we are trying to accomplish this with processes, but if threads can accomplish this goal in an easier way, then that's not a problem. Python 2.7 is used.
Update 1:
@James Mills replied with using ".join()". However, this will block the main thread until the second process is done. I tried using this, but the process used in this example will never stop execution (while True). It will act as a timer, which must be able to iterate over a list and remove objects from the list.
Does anyone have a suggestion on how to accomplish this and fix the current cPickle error?
If James Mills' answer doesn't work for you, here's a writeup of how to use queues to explicitly send data back and forth to a worker process:
#!/usr/bin/env python
import logging, multiprocessing, sys

def myproc(arg):
    return arg*2

def worker(inqueue, outqueue):
    logger = multiprocessing.get_logger()
    logger.info('start')
    while True:
        job = inqueue.get()
        logger.info('got %s', job)
        outqueue.put( myproc(job) )

def beancounter(inqueue):
    while True:
        print 'done:', inqueue.get()

def main():
    logger = multiprocessing.log_to_stderr(
        level=logging.INFO,
    )
    logger.info('setup')
    data_queue = multiprocessing.Queue()
    out_queue = multiprocessing.Queue()
    for num in range(5):
        data_queue.put(num)
    worker_p = multiprocessing.Process(
        target=worker, args=(data_queue, out_queue),
        name='worker',
    )
    worker_p.start()
    bean_p = multiprocessing.Process(
        target=beancounter, args=(out_queue,),
        name='beancounter',
    )
    bean_p.start()
    worker_p.join()
    bean_p.join()
    logger.info('done')

if __name__ == '__main__':
    main()
from: Django multiprocessing and empty queue after put
Another example of using multiprocessing Manager to handle the data is here:
http://johntellsall.blogspot.com/2014/05/code-multiprocessing-producerconsumer.html
One of the simplest ways of Sharing state between processes is to use the multiprocessing.Manager class to synchronize data between processes (which internally uses a Queue):
Example:
from multiprocessing import Process, Manager

def f(d, l):
    d[1] = '1'
    d['2'] = 2
    d[0.25] = None
    l.reverse()

if __name__ == '__main__':
    manager = Manager()
    d = manager.dict()
    l = manager.list(range(10))
    p = Process(target=f, args=(d, l))
    p.start()
    p.join()
    print d
    print l
Output:
bash-4.3$ python -i foo.py
{0.25: None, 1: '1', '2': 2}
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
>>>
Note: Please be careful with the types of objects you are sharing and attaching to your Process classes, as you may end up with pickling issues. See: Python multiprocessing pickling error
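As a small, hypothetical illustration of that pitfall (not taken from the linked answer): on start methods that pickle the target, an instance carrying an unpicklable attribute such as an open file handle will fail when handed to a Process.
from multiprocessing import Process

class Holder(object):
    def __init__(self):
        # open file objects cannot be pickled
        self.log = open('holder.log', 'w')

    def work(self):
        self.log.write('working\n')

if __name__ == '__main__':
    h = Holder()
    p = Process(target=h.work)
    # On spawn-based platforms (e.g. Windows) this raises a pickling
    # error, because h must be serialized into the child process.
    p.start()
    p.join()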

Python - start two processes to run indefinitely

I have a simple example script constructed that defines three separate processes using multiprocessing in python. My objective is to have one parent thread that spawns two smaller threads that will collect and process data.
Currently, my implementation looks like this:
from Queue import Queue,Empty
from multiprocessing import Process
import time
import hashlib

class FillQueue(Process):
    def __init__(self,q):
        Process.__init__(self)
        self.q = q

    def run(self):
        i = 0
        while i is not 5:
            print 'putting'
            self.q.put('foo')
            i+=1
        self.q.put('|STOP|')

class ConsumeQueue(Process):
    def __init__(self,q):
        Process.__init__(self)
        self.q = q

    def run(self):
        print 'Consume'
        while True:
            try:
                value = self.q.get(False)
                print value
                if value == '|STOP|':
                    print 'done'
                    break;
            except Empty:
                print 'Nothing to process atm'

class Ripper(Process):
    q = Queue()

    def __init__(self):
        self.fq = FillQueue(self.q)
        self.cq = ConsumeQueue(self.q)
        self.fq.daemon = True
        self.cq.daemon = True

    def run(self):
        try:
            self.fq.start()
            self.cq.start()
        except KeyboardInterrupt:
            print 'exit'

if __name__ == '__main__':
    r = Ripper()
    r.start()
As it runs presently, the output from the script on CLI looks like this:
putting
putting
putting
putting
putting
Consume
foo
foo
foo
foo
foo
|STOP|
done
Obviously, the way I am starting my two threads is blocking, since the consumer doesn't even begin to process the items in the queue until the filler finishes adding items.
How should I rewrite this to make both threads begin immediately and not block, so the consumer will simply pass to the Empty except block while there is no work to process, but will exit completely when it receives the stop message?
EDIT: typo, had the start and run methods mixed up
You seem to be starting multiple processes using multiprocessing.Process.
However, you are using Queue.Queue which is only threadsafe, and not designed to be used by multiple processes.
shevek's answer is valid as well, but as a start, you should replace Queue.Queue with multiprocessing.Queue.
try this:
from Queue import Empty
from multiprocessing import Process, Queue
import time
import hashlib

class FillQueue(object):
    def __init__(self, q):
        self.q = q

    def run(self):
        i = 0
        while i < 5:
            print 'putting'
            self.q.put('foo %d' % i )
            i+=1
            time.sleep(.5)
        self.q.put('|STOP|')

class ConsumeQueue(object):
    def __init__(self, q):
        self.q = q

    def run(self):
        while True:
            try:
                value = self.q.get(False)
                print value
                if value == '|STOP|':
                    print 'done'
                    break;
            except Empty:
                print 'Nothing to process atm'
                time.sleep(.2)

if __name__ == '__main__':
    q = Queue()
    f = FillQueue(q)
    c = ConsumeQueue(q)
    p1 = Process(target=f.run)
    p1.start()
    p2 = Process(target=c.run)
    p2.start()
    p1.join()
    p2.join()
I think your program works fine. The CPU processes only one thing at a time, for a short time. However, the time required to put all your stuff in the queue is very short. So there is no reason that the filler cannot do this in one time slice.
If you add some delays in the filler, I think you should see that it actually works as you expect.
