Python: sharing a singleton object between child processes

I know that processes do not share the same context in Python. But what about singleton objects? I was able to get the child process to share the same internal object as the parent process, but I am unable to understand how. Is there something wrong with the code below?
This could be a follow up to this stackoverflow question.
This is the code I have:
Singleton.py:
import os

class MetaSingleton(type):
    _instances = {}

    def __call__(cls, *args, **kwargs):
        if cls not in cls._instances:
            cls._instances[cls] = super(MetaSingleton, cls).__call__(*args, **kwargs)
        return cls._instances[cls]

class Singleton:
    __metaclass__ = MetaSingleton

    def __init__(self):
        self.key = "KEY TEST"
        print "inside init"

    def getKey(self):
        return self.key

    def setKey(self, key1):
        self.key = key1
process_singleton.py:
import os
from Singleton import Singleton

def callChildProc():
    singleton = Singleton()
    print ("singleton key: %s" % singleton.getKey())

def taskRun():
    singleton = Singleton()
    singleton.setKey("TEST2")
    for i in range(1, 10):
        print ("In parent process, singleton key: %s" % singleton.getKey())
    try:
        pid = os.fork()
    except OSError, e:
        print("Could not create a child process: %s" % e)
    if pid == 0:
        print("In the child process that has the PID %d" % (os.getpid()))
        callChildProc()
        exit()
    print("Back to the parent process")

taskRun()

On forking systems, child processes have a copy-on-write view of the parent's memory space. Both processes use virtual memory, and right after the fork their virtual address spaces point to the same physical RAM. On a write, the physical page is copied and the virtual memory remapped, so that bit of memory is no longer shared. This deferred copy is usually faster than cloning the whole memory space at the fork.
The result is that neither parent nor child sees the other side's changes. Since you set up the singleton before the fork, both parent and child see the same value.
Here is a quick example where I use time.sleep to control when parent and child make their private changes:
import multiprocessing as mp
import time

def proc():
    global foo
    time.sleep(1)
    print('child foo should be 1, the value before the fork:', foo)
    foo = 3  # child private copy

foo = 1  # the value both see at fork
p = mp.Process(target=proc)
p.start()
foo = 2  # parent private copy
time.sleep(2)
print('parent foo should be 2, not the 3 set by child:', foo)
When run:
child foo should be 1, the value before the fork: 1
parent foo should be 2, not the 3 set by child: 2
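If you do want the parent to see an update made by the child, you need an explicit sharing mechanism rather than relying on the post-fork copies. Here is a minimal sketch (not part of the original answer; the names are illustrative) using multiprocessing.Value:

import multiprocessing as mp

def child(shared_key):
    # Writes through the shared Value are visible to the parent
    shared_key.value = 42

if __name__ == '__main__':
    shared_key = mp.Value('i', 1)  # 'i' = signed int, initial value 1
    p = mp.Process(target=child, args=(shared_key,))
    p.start()
    p.join()
    print('parent sees the child update:', shared_key.value)  # 42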

Related

python multiprocessing manager connect creates another object

I would like to create a shared object among processes. First I created a server process which spawned a process for the class ProcessClass. Then I created another process from which I want to connect to the shared object.
But the connection from that other process created its own instance of ProcessClass.
So what do I need to do to access this remote shared object?
Here is my test code.
from multiprocessing.managers import BaseManager
from multiprocessing import Process

class ProcessClass:
    def __init__(self):
        self._state = False

    def set(self):
        self._state = True

    def get(self):
        return self._state

class MyManager(BaseManager):
    pass

def another_process():
    MyManager.register('my_object')
    m = MyManager(address=('', 50000))
    m.connect()
    proxy = m.my_object()
    print(f'state from another process: {proxy.get()}')

def test_spawn_and_terminate_process():
    MyManager.register('my_object', ProcessClass)
    m = MyManager(address=('', 50000))
    m.start()
    proxy = m.my_object()
    proxy.set()
    print(f'state from main process: {proxy.get()}')
    p = Process(target=another_process)
    p.start()
    p.join()
    print(f'state from main process: {proxy.get()}')

if __name__ == '__main__':
    test_spawn_and_terminate_process()
Output is
python test_communication.py
state from main process: True
state from another process: False
state from main process: True
Your code is working as it is supposed to. If you look at the documentation for multiprocessing.managers.SyncManager you will see that there is, for example, a method dict() to create a shareable dictionary. Would you expect that calling this method multiple times would return the same dictionary over and over again or new instances of sharable dictionaries?
What you need to do is ensure that a single instance is reused for successive invocations of proxy = m.my_object(), and the way to do that is to first define the following function:
singleton = None

def get_singleton_process_instance():
    global singleton
    if singleton is None:
        singleton = ProcessClass()
    return singleton
Then you need to make a one-line change in function test_spawn_and_terminate_process:
def test_spawn_and_terminate_process():
    #MyManager.register('my_object', ProcessClass)
    MyManager.register('my_object', get_singleton_process_instance)
This ensures that, to satisfy requests for 'my_object', the manager always invokes get_singleton_process_instance() (returning the singleton) instead of ProcessClass(), which would return a new instance.
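With that change, both proxies resolve to the same underlying instance held by the manager's server process, so the middle line of the output should now read (expected result, not re-run here):
state from another process: True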

how to update attribute from start() and run() in multiprocessing

I create a subclass of multiprocessing.Process.
Calling p.run() updates instance.ret_value from long_runtime_proc, but with p.start() the ret_value is not updated even though long_runtime_proc is called and runs.
How can I get ret_value with p.start()?
class myProcess (multiprocessing.Process):
    def __init__(self, pid, name, ret_value=0):
        multiprocessing.Process.__init__(self)
        self.id = pid
        self.ret_value = ret_value

    def run(self):
        self.ret_value = long_runtime_proc(self.id)
Calling Process.run() directly does not start a new process, i.e. the code in Process.run() is executed in the same process that invoked it. That's why changes to self.ret_value are effective. However, you are not supposed to call Process.run() directly.
When you start the subprocess with Process.start(), a new child process is created and the code in Process.run() is executed in this new process. When you assign the return value of long_runtime_proc to self.ret_value, this occurs in the child process, not the parent, and thus the parent's ret_value is not updated.
What you probably need to do is to use a pipe or a queue to send the return value to the parent process. See the documentation for details. Here is an example using a queue:
import time
import multiprocessing

def long_runtime_proc():
    '''Simulate a long running process'''
    time.sleep(10)
    return 1234

class myProcess(multiprocessing.Process):
    def __init__(self, result_queue):
        self.result_queue = result_queue
        super(myProcess, self).__init__()

    def run(self):
        self.result_queue.put(long_runtime_proc())

q = multiprocessing.Queue()
p = myProcess(q)
p.start()
ret_value = q.get()
p.join()
With this code, ret_value ends up being assigned the value taken off the queue, which will be 1234.
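The same result can be obtained with a Pipe, the other option mentioned above. This is a rough sketch under that assumption, not part of the original answer:

import time
import multiprocessing

def long_runtime_proc():
    '''Simulate a long running process (same as above)'''
    time.sleep(10)
    return 1234

class MyPipeProcess(multiprocessing.Process):
    def __init__(self, conn):
        self.conn = conn
        super(MyPipeProcess, self).__init__()

    def run(self):
        # Send the result back to the parent over the pipe
        self.conn.send(long_runtime_proc())
        self.conn.close()

if __name__ == '__main__':
    parent_conn, child_conn = multiprocessing.Pipe()
    p = MyPipeProcess(child_conn)
    p.start()
    ret_value = parent_conn.recv()  # blocks until the child sends
    p.join()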

Make Singleton class in Multiprocessing

I create a Singleton class using a metaclass. It works well with multiple threads and creates only one instance of the MySingleton class, but with multiprocessing it always creates a new instance.
import multiprocessing

class SingletonType(type):
    # meta class for making a class singleton
    def __call__(cls, *args, **kwargs):
        try:
            return cls.__instance
        except AttributeError:
            cls.__instance = super(SingletonType, cls).__call__(*args, **kwargs)
            return cls.__instance

class MySingleton(object):
    # singleton class
    __metaclass__ = SingletonType

    def __init__(*args, **kwargs):
        print "init called"

def task():
    # create singleton class instance
    a = MySingleton()

# create two process
pro_1 = multiprocessing.Process(target=task)
pro_2 = multiprocessing.Process(target=task)

# start process
pro_1.start()
pro_2.start()
My output:
init called
init called
I need the MySingleton class's __init__ method to be called only once.
Each of your child processes runs its own instance of the Python interpreter, hence the SingletonType in one process doesn't share its state with those in another process. This means that a true singleton that only exists in one of your processes will be of little use, because you won't be able to use it in the other processes: while you can manually share data between processes, that is limited to only basic data types (for example dicts and lists).
Instead of relying on singletons, simply share the underlying data between the processes:
#!/usr/bin/env python3

import multiprocessing
import os

def log(s):
    print('{}: {}'.format(os.getpid(), s))

class PseudoSingleton(object):

    def __init__(*args, **kwargs):
        if not shared_state:
            log('Initializing shared state')
            with shared_state_lock:
                shared_state['x'] = 1
                shared_state['y'] = 2
            log('Shared state initialized')
        else:
            log('Shared state was already initialized: {}'.format(shared_state))

def task():
    a = PseudoSingleton()

if __name__ == '__main__':
    # We need the __main__ guard so that this part is only executed in
    # the parent
    log('Communication setup')
    shared_state = multiprocessing.Manager().dict()
    shared_state_lock = multiprocessing.Lock()

    # create two process
    log('Start child processes')
    pro_1 = multiprocessing.Process(target=task)
    pro_2 = multiprocessing.Process(target=task)
    pro_1.start()
    pro_2.start()

    # Wait until processes have finished
    # See https://stackoverflow.com/a/25456494/857390
    log('Wait for children')
    pro_1.join()
    pro_2.join()

    log('Done')
This prints
16194: Communication setup
16194: Start child processes
16194: Wait for children
16200: Initializing shared state
16200: Shared state initialized
16201: Shared state was already initialized: {'x': 1, 'y': 2}
16194: Done
However, depending on your problem setting there might be better solutions using other mechanisms of inter-process communication. For example, the Queue class is often very useful.
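As a rough illustration of that suggestion (not part of the original answer), worker processes can pull work items from a multiprocessing.Queue instead of sharing a singleton:

import multiprocessing

def worker(q):
    # Each child pulls items until it receives the sentinel value None
    while True:
        item = q.get()
        if item is None:
            break
        print('worker', multiprocessing.current_process().name, 'got', item)

if __name__ == '__main__':
    q = multiprocessing.Queue()
    workers = [multiprocessing.Process(target=worker, args=(q,)) for _ in range(2)]
    for w in workers:
        w.start()
    for item in range(4):
        q.put(item)
    for _ in workers:
        q.put(None)  # one sentinel per worker
    for w in workers:
        w.join()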

Python multiprocessing subclass initialization

Is it okay to initialize the state of a multiprocessing.Process subclass in the __init__() method? Or will this result in duplicate resource utilization when the process forks? Take this example:
from multiprocessing import Process, Pipe
import time

class MyProcess(Process):
    def __init__(self, conn, bar):
        super().__init__()
        self.conn = conn
        self.bar = bar
        self.databuffer = []

    def foo(self, baz):
        return self.bar * baz

    def run(self):
        '''Process mainloop'''
        running = True
        i = 0
        while running:
            self.databuffer.append(self.foo(i))
            if self.conn.poll():
                m = self.conn.recv()
                if m == 'get':
                    self.conn.send((i, self.databuffer))
                elif m == 'stop':
                    running = False
            i += 1
            time.sleep(0.1)

if __name__ == '__main__':
    conn, child_conn = Pipe()
    p = MyProcess(child_conn, 5)
    p.start()
    time.sleep(2)

    # Touching the instance does not affect the process which has forked.
    p.bar = 1
    print(p.databuffer)

    time.sleep(2)
    conn.send('get')
    i, data = conn.recv()
    print(i, data)

    conn.send('stop')
    p.join()
As I note in the code, you cannot communicate with the process via the instance p, only via the Pipe. So if I do a bunch of setup in the __init__ method, such as creating file handles, how is this duplicated when the process forks?
Does this mean that subclassing multiprocessing.Process in the same way you would a threading.Thread is a bad idea?
Note that my processes are long-running and meant to handle blocking IO.
This is easy to test. In __init__, add the following:
self.file = open('does_it_open.txt', 'w')
Then run:
$ strace -f python youprogram.py 2> test.log
$ grep does_it_open test.log
open("does_it_open.txt", O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0666) = 6
That means that at least on my system, copying your code and adding that call, the file was opened once, and only once.
For more about the wizardry that is strace, check out this fantastic blog post.
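A quick way to observe the same split without strace (a sketch, not part of the original answer) is to print the PID in both __init__ and run(): __init__ runs once in the parent, while run() executes in the child created by start().

import os
from multiprocessing import Process

class PidDemo(Process):
    def __init__(self):
        super().__init__()
        # Executed once, in the parent process
        print('__init__ in PID', os.getpid())

    def run(self):
        # Executed in the child process created by start()
        print('run in PID', os.getpid())

if __name__ == '__main__':
    p = PidDemo()
    p.start()
    p.join()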

Python sharing class instance among threads

I have a class that loads into memory all the resources needed by my application (mostly images).
Several threads then need to access these resources through this class.
I don't want every instance to reload all the resources, so I thought I would use the Singleton pattern.
I did it like this:
import threading

class DataContainer(object):
    _instance = None
    _lock = threading.Lock()
    _initialised = True

    def __new__(cls, *args, **kwargs):
        with cls._lock:
            if not cls._instance:
                cls._initialised = False
                cls._instance = object.__new__(cls, *args, **kwargs)
        return cls._instance

    def __init__(self, map_name = None):
        # instance has already been created
        if self._initialised:
            return
        self._initialised = True
        # load images
This works fine as long as I am not using multiple threads. But with multiple threads every thread gets a different instance: using 4 threads, each of them creates a new instance.
I want all threads to use the same instance of this class, so the resources are only loaded into memory once.
I also tried to do this in the same module where the class is defined, but outside the class definition:
def getDataContainer():
    global dataContainer
    return dataContainer

dataContainer = DataContainer()
but every thread still has its own instance.
I am new to Python, so if this is the wrong approach please let me know.
I appreciate any help.
To expand on @Will's comment, if a "shared object" is created by the parent and then passed to each thread, all threads will share the same object.
(With processes, see the multiprocessing.Manager class, which directly supports sharing state, including modifications.)
import threading, time

class SharedObj(object):
    image = 'beer.jpg'

class DoWork(threading.Thread):
    def __init__(self, shared, *args, **kwargs):
        super(DoWork, self).__init__(*args, **kwargs)
        self.shared = shared

    def run(self):
        print threading.current_thread(), 'start'
        time.sleep(1)
        print 'shared', self.shared.image, id(self.shared)
        print threading.current_thread(), 'done'

myshared = SharedObj()
threads = [ DoWork(shared=myshared, name='a'),
            DoWork(shared=myshared, name='b'),
            ]

for t in threads:
    t.start()
for t in threads:
    t.join()

print 'DONE'
Output:
<DoWork(a, started 140381090318080)> start
<DoWork(b, started 140381006067456)> start
shared beer.jpg shared140381110335440
<DoWork(b, started 140381006067456)> done
beer.jpg 140381110335440
<DoWork(a, started 140381090318080)> done
DONE
Note that the thread IDs are different, but they both use the same SharedObj instance, at memory address ending in 440.
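For the process-based variant mentioned above, here is a minimal sketch (not taken from the original answer; the names are illustrative) that uses multiprocessing.Manager to share a mutable dict between processes:

import multiprocessing

def worker(shared):
    # Modifications made through the manager proxy are visible to the parent
    shared['image'] = 'wine.jpg'

if __name__ == '__main__':
    manager = multiprocessing.Manager()
    shared = manager.dict(image='beer.jpg')
    p = multiprocessing.Process(target=worker, args=(shared,))
    p.start()
    p.join()
    print(shared['image'])  # wine.jpg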
