I'm just studying multiprocessing in Python. I have some code that updates the value of a variable in one process, and other processes read the value of this variable. This works as I expected.
Now I would like to know whether there is a way to do the same using the Ray library, to improve execution speed when I need to run lots of processes reading that value.
from multiprocessing import Process, Manager

def write_to_dict(d, value):
    while True:
        value = value + 1
        d['key'] = value

def read_from_dict(d):
    while True:
        read = d['key']
        print(read)

if __name__ == '__main__':
    manager = Manager()
    shared_dict = manager.dict()
    p1 = Process(target=write_to_dict, args=(shared_dict, 0))
    p2 = Process(target=read_from_dict, args=(shared_dict,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
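As for the Ray part of the question: the usual way to hold shared, mutable state in Ray is an actor that all tasks talk to. Below is a minimal sketch of the same writer/reader idea; the actor and task names are made up, and the loops are bounded here so the example terminates:

import ray

@ray.remote
class SharedValue:
    # Actor that owns the shared value; every task talks to the same actor.
    def __init__(self):
        self.value = 0

    def increment(self):
        self.value += 1

    def get(self):
        return self.value

@ray.remote
def writer(shared):
    for _ in range(100):
        shared.increment.remote()

@ray.remote
def reader(shared):
    for _ in range(10):
        print(ray.get(shared.get.remote()))

if __name__ == '__main__':
    ray.init()
    shared = SharedValue.remote()
    # Launch both tasks and wait for them to finish.
    ray.get([writer.remote(shared), reader.remote(shared)])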
How can I share values from one process with another?
Apparently I can do that through multithreading but not multiprocessing.
Multithreading is slow for my program.
I cannot show my exact code, so I made this simple example.
from multiprocessing import Process
from threading import Thread
import time

class exp:
    def __init__(self):
        self.var1 = 0

    def func1(self):
        self.var1 = 5
        print(self.var1)

    def func2(self):
        print(self.var1)

if __name__ == "__main__":
    # multithreading
    obj1 = exp()
    t1 = Thread(target=obj1.func1)
    t2 = Thread(target=obj1.func2)
    print("multithreading")
    t1.start()
    time.sleep(1)
    t2.start()
    time.sleep(3)

    # multiprocessing
    obj = exp()
    p1 = Process(target=obj.func1)
    p2 = Process(target=obj.func2)
    print("multiprocessing")
    p1.start()
    time.sleep(2)
    p2.start()
Expected output:
multithreading
5
5
multiprocessing
5
5
Actual output:
multithreading
5
5
multiprocessing
5
0
I know there have been a couple of close votes against this question, but the supposed duplicate question's answer does not really explain why the OP's program does not work as-is, and the offered solution is not what I would propose. Hence:
Let's analyze what is happening. The creation of obj = exp() is done by the main process. The execution of exp.func1 occurs in a different process/address space, and therefore the obj object must be serialized/de-serialized into the address space of that process. In that new address space, self.var1 comes across with the initial value of 0 and is then set to 5, but only the copy of the obj object that is in the address space of process p1 is being modified; the copy of that object that exists in the main process has not been modified. Then, when you start process p2, another copy of obj from the main process is sent to the new process, still with self.var1 having a value of 0.
The solution is for self.var1 to be an instance of multiprocessing.Value, which is a special variable that exists in shared memory accessible to all processes. See the docs.
from multiprocessing import Process, Value

class exp:
    def __init__(self):
        self.var1 = Value('i', 0, lock=False)

    def func1(self):
        self.var1.value = 5
        print(self.var1.value)

    def func2(self):
        print(self.var1.value)

if __name__ == "__main__":
    # multiprocessing
    obj = exp()
    p1 = Process(target=obj.func1)
    p2 = Process(target=obj.func2)
    print("multiprocessing")
    p1.start()
    # No need to sleep, just wait for p1 to complete
    # before starting p2:
    #time.sleep(2)
    p1.join()
    p2.start()
    p2.join()
Prints:
multiprocessing
5
5
Note
Using shared memory for this particular problem is much more efficient than using a managed class, which is referenced by the "close" comment.
The assignment of 5 to self.var1.value is an atomic operation and does not need to be a serialized operation. But if:
1. we were performing a non-atomic operation (one requiring multiple steps) such as self.var1.value += 1, and
2. multiple processes were performing this non-atomic operation in parallel, then
3. we should create the value with a lock: self.var1 = Value('i', 0, lock=True), and
4. update the value under control of the lock: with self.var1.get_lock(): self.var1.value += 1 (see the sketch below).
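For illustration, here is a minimal sketch of that locked increment; the worker function and iteration counts are made up:

from multiprocessing import Process, Value

def incrementer(counter, n):
    for _ in range(n):
        # += on a Value is read-modify-write, i.e. not atomic,
        # so hold the lock for the whole update.
        with counter.get_lock():
            counter.value += 1

if __name__ == '__main__':
    counter = Value('i', 0, lock=True)  # lock=True is also the default
    procs = [Process(target=incrementer, args=(counter, 10000)) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(counter.value)  # reliably 40000, because every update holds the lock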
There are several ways to do that: you can use shared memory, a FIFO, or message passing.
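As an illustration of the message-passing option, here is a minimal sketch using a multiprocessing.Queue; the producer/consumer functions are made up:

from multiprocessing import Process, Queue

def producer(q):
    # Send each new value to the consumer instead of mutating shared state.
    for i in range(5):
        q.put(i)
    q.put(None)  # sentinel: tells the consumer to stop

def consumer(q):
    while True:
        item = q.get()
        if item is None:
            break
        print("received", item)

if __name__ == '__main__':
    q = Queue()
    p1 = Process(target=producer, args=(q,))
    p2 = Process(target=consumer, args=(q,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()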
I'm making use of multiprocessing, but I get the error "MongoClient opened before fork." for every process. I did some research and concluded that I'm now creating multiple MongoClients (one per subprocess), but I didn't find a real solution. Every process makes use of a MongoDB connection (I'm using pymongo as the connector). Can someone help me?
Code:
import pymongo
from multiprocessing import Process

def func1():
    while True:
        col1.insert_one({...})
        ...

def func2():
    while True:
        col2.insert_one({...})
        ...

if __name__ == "__main__":
    # MongoDB
    myclient = pymongo.MongoClient("mongodb://localhost:27017/")
    mydb = myclient["testdb"]
    col1 = mydb["col1"]
    col2 = mydb["col2"]

    # Multiprocessing
    p1 = Process(target=func1)
    p2 = Process(target=func2)
    p1.start()
    p2.start()
    p1.join()
    p2.join()
Have each process open its own MongoDB connection(s).
Heed the warning in get_mongo_client(); if you want something that is safe to call from anywhere, you will need to "tag" _mongo_client with the PID of the current process and discard the object if it has the wrong PID (a sketch of that variant follows the code below).
import pymongo
from multiprocessing import Process

_mongo_client = None  # Global, one per process

def get_mongo_client():
    # Make sure not to call this within the master process, or things
    # will break again.
    global _mongo_client
    if _mongo_client is None:
        _mongo_client = pymongo.MongoClient("mongodb://localhost:27017/")
    return _mongo_client

def get_mongo_col(collection, database="testdb"):
    client = get_mongo_client()
    return client[database][collection]

def func1():
    col1 = get_mongo_col("col1")
    while True:
        col1.insert_one({})
        # ...

def func2():
    col2 = get_mongo_col("col2")
    while True:
        col2.insert_one({})
        # ...

def main():
    # Multiprocessing
    p1 = Process(target=func1)
    p2 = Process(target=func2)
    p1.start()
    p2.start()
    p1.join()
    p2.join()

if __name__ == "__main__":
    main()
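The PID-tagging variant mentioned above might look roughly like this; a minimal sketch, assuming the same connection string, and not tested against every pymongo setup:

import os
import pymongo

_mongo_client = None
_mongo_client_pid = None

def get_mongo_client():
    # Recreate the client whenever we find ourselves in a process other than
    # the one that created it (e.g. after a fork), so a connection is never
    # reused across the fork boundary.
    global _mongo_client, _mongo_client_pid
    pid = os.getpid()
    if _mongo_client is None or _mongo_client_pid != pid:
        _mongo_client = pymongo.MongoClient("mongodb://localhost:27017/")
        _mongo_client_pid = pid
    return _mongo_client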
Why is the while loop ignored in work1? I would like to update the value of string to another value in a loop and output this value in process work2. I have also tried a Queue, but the problem is that I have only one variable that I would like to update in work1 and access in work2.
from multiprocessing import Process, Manager, Value
from ctypes import c_char_p
import time

def work1(string):
    i = 2
    string.value = i
    # while True:
    #     print("work1")
    #     string.value = i + 1
    #     time.sleep(2)

def work2(string):
    while True:
        print("Value set in work1 " + str(string.value))
        time.sleep(2)

if __name__ == '__main__':
    manager = Manager()
    string = manager.Value(int, 0)
    p1 = Process(target=work1, args=(string,))
    p1.start()
    p1.join()
    p2 = Process(target=work2, args=(string,))
    p2.start()
    p2.join()
That is because you didn't make your program parallel with two processes; instead, the two processes run in tandem. What you need to do is start both processes before any join, like in my modification below:
from multiprocessing import Process, Manager, Value
from ctypes import c_char_p
import time

def work1(string):
    i = 2
    string.value = i
    while True:
        i = i + 1
        string.value = i
        print("work1 set value to " + str(string.value))
        time.sleep(2)

def work2(string):
    while True:
        print("Value set in work1 " + str(string.value))
        time.sleep(2)

if __name__ == '__main__':
    manager = Manager()
    string = manager.Value(int, 0, lock=False)
    p1 = Process(target=work1, args=(string,))
    p2 = Process(target=work2, args=(string,))
    p1.start()
    p2.start()
    p2.join()
    p1.join()
Indeed, if you write the code the original way (joining p1 before starting p2), that join never returns because of the infinite while loop in work1.
Based on this pretty useful tutorial, I have tried to make a simple implementation of Python multiprocessing to measure its effectiveness. The modules multi1, multi2 and multi3 each contain an ODE integration and export the calculated values to a csv (what they do does not matter; they are just there so the script does some work).
import multiprocessing
import multi1
import multi2
import multi3
import time

t0 = time.time()

if __name__ == '__main__':
    p1 = multiprocessing.Process(target = multi1.main(), args=())
    p2 = multiprocessing.Process(target = multi2.main(), args=())
    p3 = multiprocessing.Process(target = multi3.main(), args=())
    p1.start()
    p2.start()
    p3.start()
    p1.join()
    p2.join()
    p3.join()

    t1 = time.time()

    multi1.main()
    multi2.main()
    multi3.main()

    t2 = time.time()

    print t1-t0
    print t2-t1
The problem is that the printed times are equal, so the multiprocessing didn't speed up the process. Why?
You called main in the main thread, and passed the return value (probably None) as the target, so no actual work is done in your worker processes. Remove the call parens, so you pass the function itself without calling it, e.g.:
p1 = multiprocessing.Process(target=multi1.main, args=())
p2 = multiprocessing.Process(target=multi2.main, args=())
p3 = multiprocessing.Process(target=multi3.main, args=())
This is the same basic problem seen in the threaded case.
I've just tested Python multiprocessing for reading a file or a global variable, but something strange happens.
For example:
import multiprocessing

a = 0

def test(lock, name):
    global a
    with lock:
        for i in range(10):
            a = a + 1
        print "in process %d : %d" % (name, a)

def main():
    lock = multiprocessing.Lock()
    p1 = multiprocessing.Process(target=test, args=(lock, 1))
    p2 = multiprocessing.Process(target=test, args=(lock, 2))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
    print "in main process : %d" % a

if __name__=='__main__':
    main()
The program reads a global variable, but the output is:
in process 1 : 10
in process 2 : 10
in main process : 0
It seems that the sub-processes cannot read and edit the global variable properly. Also, if I change the program to read a file, each sub-process reads the file completely, ignoring the lock.
Why does this happen, and how can I solve this problem?
Global variables are not shared between processes. When you create and start a new Process(), that process runs inside a separate "cloned" copy of the current Python interpreter. Updating the variable from within a Process() only updates it in the particular process where the update happens.
To share data between Python processes, we need a multiprocessing.Pipe(), a multiprocessing.Queue(), a multiprocessing.Value(), a multiprocessing.Array() or one of the other multiprocessing-safe containers.
Here's an example based on your code:
import multiprocessing

def worker(lock, counter, name):
    with lock:
        for i in range(10):
            counter.value += 1
        print "In process {}: {}".format(name, counter.value)

def main():
    lock = multiprocessing.Lock()
    counter = multiprocessing.Value('i', 0)
    p1 = multiprocessing.Process(target=worker, args=(lock, counter, 1))
    p2 = multiprocessing.Process(target=worker, args=(lock, counter, 2))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
    print "In main process: {}".format(counter.value)

if __name__=='__main__':
    main()
This gives me:
In process 1: 10
In process 2: 20
In main process: 20
Now, if you really want to use a global variable, you can use a multiprocessing.Manager(), but I think the first method is preferable since the Manager is a "heavier" solution. Here's an example:
import multiprocessing

manager = multiprocessing.Manager()
counter = manager.Value('i', 0)

def worker(lock, name):
    global counter
    with lock:
        for i in range(10):
            counter.value += 1
        print "In process {}: {}".format(name, counter.value)

def main():
    global counter
    lock = multiprocessing.Lock()
    p1 = multiprocessing.Process(target=worker, args=(lock, 1))
    p2 = multiprocessing.Process(target=worker, args=(lock, 2))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
    print "In main process: {}".format(counter.value)

if __name__=='__main__':
    main()