I am trying to get an attribute from my Proxy class, but I don't quite understand the implementation (as per https://docs.python.org/3.9/library/multiprocessing.html?highlight=multiprocessing#multiprocessing.managers.BaseManager.register).
I understand I need to pass exposed and method_to_typeid to the .register() method, because if I don't, I only have access to "public" methods and not attributes.
Here is my code:
from multiprocessing import Process
from multiprocessing.managers import BaseManager


class CustomManager(BaseManager):
    # nothing
    pass


class TestClass:
    def __init__(self):
        self._items = []

    @property
    def items(self):
        return self._items

    def fill_items(self):
        self._items.append(1)


if __name__ == "__main__":
    CustomManager.register(
        'TestClass',
        TestClass,
        exposed=('items', 'fill_items'),
        method_to_typeid={'items': 'list'}
    )
    manager = CustomManager()
    manager.start()
    shared_object = manager.TestClass()
    p = Process(target=shared_object.fill_items)
    p.start()
    p.join()
    print(shared_object.items)
    # print(shared_object.items())
I would expect this to return my list, but it returns a reference to the method:
Output:
<bound method items of <AutoProxy[TestClass] object, typeid 'TestClass' at 0x7feb38056670>>
But when I try to call it as a method, i.e. shared_object.items(), I get:
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/managers.py", line 824, in _callmethod
raise convert_to_error(kind, result)
TypeError: 'list' object is not callable
Which makes sense, because it is an attribute that contains a list value, not a method. But then why, when I access it as an attribute, do I get a reference to the method instead of the value?
I tried following the official documentation and checked already-asked questions, for most of which the solution was to add a custom NamespaceProxy, but it looks like now, instead of implementing our own NamespaceProxy, the correct way is to just pass the two extra args to the .register() method.
The solution here is just to use threading instead of multiprocessing. ChatGPT got pretty close to the implementation I needed but could not resolve the issue without changing the implementation of my classes. In the end it makes more sense to use threads anyway, because:
threads share memory, as opposed to processes
my script is I/O bound, which suggests using threads, whereas CPU-bound scripts should use processes
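For reference, a minimal sketch of the threading version, using the same TestClass as above; no manager or proxy is needed because threads share the parent's memory:

from threading import Thread


class TestClass:
    def __init__(self):
        self._items = []

    @property
    def items(self):
        return self._items

    def fill_items(self):
        self._items.append(1)


if __name__ == "__main__":
    shared_object = TestClass()
    t = Thread(target=shared_object.fill_items)
    t.start()
    t.join()
    print(shared_object.items)  # [1]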
Related
I have read a little about multiprocessing and pickling problems, and I have also read that there are some solutions, but I don't really know how they can help in my situation.
I am building a test runner where I use multiprocessing to call modified test class methods. The methods are modified by a metaclass so that I can have setUp- and tearDown-style methods run before and after each test.
Here is my parent metaclass:
from typing import Tuple


class MetaTestCase(type):
    def __new__(cls, name: str, bases: Tuple, attrs: dict):
        def replaced_func(fn):
            def new_test(*args, **kwargs):
                args[0].before()
                result = fn(*args, **kwargs)
                args[0].after()
                return result
            return new_test

        # If a method is found and its name starts with 'test', replace it
        for i in attrs:
            if callable(attrs[i]) and attrs[i].__name__.startswith('test'):
                attrs[i] = replaced_func(attrs[i])

        return super(MetaTestCase, cls).__new__(cls, name, bases, attrs)
I am using this class, which applies the metaclass, as a base class:
class TestCase(metaclass=MetaTestCase):
    def before(self) -> None:
        """Overridable, execute before test part."""
        pass

    def after(self) -> None:
        """Overridable, execute after test part."""
        pass
And then I use this in my test suite class:
class TestApi(TestCase):
    def before(self):
        print('before')

    def after(self):
        print('after')

    def test_api_one(self):
        print('test')
Sadly, when I try to execute that test with multiprocessing.Process, it fails with:
AttributeError: Can't pickle local object 'MetaTestCase.__new__.<locals>.replaced_func.<locals>.new_test'
Here is how I create and execute the Process:
module = importlib.import_module('tests.api.test_api') # Finding and importing module
object = getattr(module, 'TestApi') # Getting Class from module
process = Process(target=getattr(object, 'test_api_one')) # Calling class method
process.start()
process.join()
I tried to use pathos.helpers.mp.Process; it passes the pickling phase, I guess, but has some problem with a tuple that I don't understand:
Process Process-1:
Traceback (most recent call last):
result = fn(*args, **kwargs)
IndexError: tuple index out of range
Is there any simple solution so that I can pickle that object and run the test successfully along with my modified test class?
As for your original question of why you are getting the pickling error, this answer summarizes the problem and offers solutions (similar to those already provided here).
Now as to why you are receiving the IndexError, this is because you are not passing an instance of the class to the function (the self argument). A quick fix would be to do this (also, please don't use object as a variable name):
module = importlib.import_module('tests.api.test_api') # Finding and importing module
obj = getattr(module, 'TestApi')
test_api = obj() # Instantiate!
# Pass the instance explicitly! Alternatively, you can also do target=test_api.test_api_one
process = Process(target=getattr(obj, 'test_api_one'), args=(test_api, ))
process.start()
process.join()
Of course, you can also opt to make the methods of the class classmethods or staticmethods, and pass the target function as obj.method_name.
Also, as a quick side note, the use of a metaclass for the use case shown in the example seems like overkill. Are you sure you can't do what you want with class decorators instead (which might also be compatible with the standard library's multiprocessing)?
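For illustration, here is a hedged sketch of that class-decorator idea; the with_hooks name and the wrapping logic are mine, not from the question. functools.wraps preserves each method's original __name__, which should keep this compatible with the standard library's multiprocessing, since Python 3 pickles a bound method as the instance plus the method's name:

import functools
from multiprocessing import Process


def with_hooks(cls):
    """Class decorator: wrap every test* method with before()/after() calls."""
    def replaced_func(fn):
        @functools.wraps(fn)
        def new_test(self, *args, **kwargs):
            self.before()
            result = fn(self, *args, **kwargs)
            self.after()
            return result
        return new_test

    for name, attr in list(vars(cls).items()):
        if callable(attr) and name.startswith('test'):
            setattr(cls, name, replaced_func(attr))
    return cls


@with_hooks
class TestApi:
    def before(self):
        print('before')

    def after(self):
        print('after')

    def test_api_one(self):
        print('test')


if __name__ == '__main__':
    test_api = TestApi()
    p = Process(target=test_api.test_api_one)  # bound method of an instance
    p.start()
    p.join()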
https://docs.python.org/3/library/pickle.html#what-can-be-pickled-and-unpickled
"The following types can be pickled... functions (built-in and user-defined) accessible from the top level of a module (using def, not lambda);"
It sounds like you cannot pickle locally defined functions. This makes sense based on other pickle behavior I've seen. Essentially, pickling a function just records instructions telling the Python interpreter how to find the function definition again, which usually means a module name and a function name, so that the multiprocessing Process can import the correct function.
There's no way for another process to import your replaced_func function because it's only locally defined.
You could try defining it outside of the metaclass, which would make it importable by other processes.
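A small demonstration of that rule (illustrative names only): a module-level function pickles fine, while a function created inside another function does not, because pickle only records where to re-import it:

import pickle


def top_level():
    return "ok"


def make_nested():
    def nested():
        return "ok"
    return nested


print(len(pickle.dumps(top_level)) > 0)   # works: stored as a module + name reference
try:
    pickle.dumps(make_nested())
except (AttributeError, pickle.PicklingError) as exc:
    print(exc)   # e.g. "Can't pickle local object 'make_nested.<locals>.nested'"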
I was asked to develop a consistent way to run (train, make predictions, etc.) any ML model from the command line. I also need to periodically check the DB for requests related to training, like abort requests. To minimize the effect that checking the DB has on training, I want to create a separate process for fetching requests from the DB.
So I created an abstract class RunnerBaseClass, which requires its child classes to implement _train() for each ML model, and which runs _train() alongside _check_db() using the multiprocessing module when you call run().
I also want to get rid of the need for the boilerplate
if __name__ == '__main__':
...
code, and have argument parsing, instance creation, and the call to run() happen automatically.
So I created a class decorator @autorun which calls the run() method of the class when the script is run directly from the command line. When run, the decorator successfully calls run(), but there seems to be a problem creating a subprocess from the class's method, and the following error occurs:
Traceback (most recent call last):
File "run.py", line 4, in <module>
class Runner(RunnerBaseClass):
File "/Users/yongsinp/Downloads/runner_base.py", line 27, in class_decorator
instance.run()
File "/Users/yongsinp/Downloads/runner_base.py", line 16, in run
db_check_process.start()
File "/Users/yongsinp/miniforge3/envs/py3.8/lib/python3.8/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/Users/yongsinp/miniforge3/envs/py3.8/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "/Users/yongsinp/miniforge3/envs/py3.8/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
return Popen(process_obj)
File "/Users/yongsinp/miniforge3/envs/py3.8/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
super().__init__(process_obj)
File "/Users/yongsinp/miniforge3/envs/py3.8/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "/Users/yongsinp/miniforge3/envs/py3.8/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
reduction.dump(process_obj, fp)
File "/Users/yongsinp/miniforge3/envs/py3.8/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <class '__main__.Runner'>: attribute lookup Runner on __main__ failed
Here's minimal code that can be used to reproduce the error.
runner_base.py:
from abc import ABC, abstractmethod
from multiprocessing import Process


class RunnerBaseClass(ABC):
    @abstractmethod
    def _train(self) -> None:
        ...

    def _check_db(self):
        print("Checking DB")

    def run(self) -> None:
        db_check_process = Process(target=self._check_db)
        db_check_process.start()
        self._train()
        db_check_process.join()


def autorun(env_name: str):
    def class_decorator(class_):
        instance = class_()
        if env_name == '__main__':
            instance.run()
        return instance
    return class_decorator
run.py:
from runner_base import RunnerBaseClass, autorun


@autorun(__name__)
class Runner(RunnerBaseClass):
    def _train(self) -> None:
        print("Training")
I have looked up the cause of this error and can simply fix it by not using the decorator, or by turning the method into a function.
runner_base.py:
from abc import ABC, abstractmethod
from multiprocessing import Process


class RunnerBaseClass(ABC):
    @abstractmethod
    def _train(self) -> None:
        ...

    def run(self) -> None:
        db_check_process = Process(target=check_db)
        db_check_process.start()
        self._train()
        db_check_process.join()


def autorun(env_name: str):
    def class_decorator(class_):
        instance = class_()
        if env_name == '__main__':
            instance.run()
        return instance
    return class_decorator


def check_db():
    print("Checking DB")
I can just use the function instead of the method and be done with it, but I don't like the idea of passing configurations and objects for inter-process communication (like a Queue) to the function, which I wouldn't have to do when using a method. So, is there a way for me to keep _check_db() a method and still use the @autorun decorator?
(I am aware of using dill and other modules, but I'd like to stick with the builtin ones if possible.)
There might be a couple of misunderstandings here.
I can just use the function instead of the method and be done with it, but I don't like the idea of passing configurations and an object for communication in between processes to the function which I don't have to when using a method
It's understandable why you might think this, but your logic for using a method rather than a function is flawed if you are planning to modify objects of Runner in either the child or the parent process. When you spawn processes using start method "spawn" (the default on Windows and macOS), the child processes don't have access to the parent's memory space. Therefore, if you create an object of Runner and pass it to a process, that process will have a duplicate of that object with a different memory address than the one present in the parent. Any modifications made to these objects will not be propagated across processes. The same goes for start method "fork" (the default on Unix), the only difference being that it uses copy-on-write: rather than at start, the duplicate will only be created if you attempt to modify the object in the child process.
So just keep in mind that sharing objects like you are trying to do only makes sense if you aim to use the objects as read-only (like passing configurations and data from one process to another), and don't care about whether the changes made to them are reflected in the other processes. If you also want them to be writable, you can simply use managers like this answer mentions. Keep in mind that using managers will negatively impact your code's performance (as communication will require all data to be serialized).
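As a small illustration of that copy behaviour (Config and fill are hypothetical names, not from the question's code), a mutation made in the child is not visible in the parent:

from multiprocessing import Process


class Config:
    def __init__(self):
        self.items = []


def fill(cfg):
    cfg.items.append(1)      # mutates the child's duplicate only


if __name__ == '__main__':
    cfg = Config()
    p = Process(target=fill, args=(cfg,))
    p.start()
    p.join()
    print(cfg.items)         # [] in the parent, under both "spawn" and "fork"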
This brings us to the next question: can you even pass complex objects to another process?
Multiprocessing uses pickle to transfer data from one process to another. This means that any object passed as an argument must be picklable. Whether or not pickle can serialize complex objects like instances of Runner is then very much dependent on the instance attributes the object has. In your case, the problem isn't with pickling your instance, it's with the fact that you are attempting to do so when the class Runner hasn't even been added to the top-level module namespace yet. To check this, change your decorator to print whether the class exists in the global attributes before it attempts to create an instance:
def autorun(env_name: str):
    def class_decorator(class_):
        print(class_.__name__ in globals())
        instance = class_()
        if env_name == '__main__':
            instance.run()
        return instance
    return class_decorator
Output:
False
In general, attributes not defined at the top level of a module are not picklable with pickle, and this is why your code fails with a pickling error. Additionally, you also won't be able to use ABC as a base class, since that can't be pickled either.
So what's the solution?
I recommend looking outside the builtins to achieve what you want or, as you mentioned, changing the method _check_db() into a function. Apart from that, there is also a rather unintuitive workaround that you can use.
Method 1
If you do decide to use something better, like multiprocess, which uses dill rather than pickle, your code will look like this:
from multiprocess import Process


class RunnerBaseClass:
    def _train(self) -> None:
        ...

    def _check_db(self):
        print("Checking DB")

    def run(self) -> None:
        db_check_process = Process(target=self._check_db)
        db_check_process.start()
        self._train()
        db_check_process.join()


def autorun(env_name: str):
    def class_decorator(class_):
        instance = class_()
        if env_name == '__main__':
            instance.run()
        return instance
    return class_decorator


@autorun(__name__)
class Runner(RunnerBaseClass):
    def _train(self) -> None:
        print("Training")
Output
Training
Checking DB
Method 2
The second method relies on changing the decorator to create an instance of the passed class's parent class instead, and attaching the decorator to a child of Runner. Consider this code:
from multiprocessing import Process


class RunnerBaseClass:
    def _train(self) -> None:
        ...

    def _check_db(self):
        print("Checking DB")

    def run(self) -> None:
        db_check_process = Process(target=self._check_db)
        db_check_process.start()
        self._train()
        db_check_process.join()


def autorun(env_name: str):
    def class_decorator(class_):
        # Create an instance of the parent class
        instance = class_.__bases__[0]()
        if env_name == '__main__':
            instance.run()
        return instance
    return class_decorator


class Runner(RunnerBaseClass):
    def _train(self) -> None:
        print("Training")


@autorun(__name__)
class RunnerChild(Runner):
    pass
Here, we attach the decorator to RunnerChild, a child of class Runner. The decorator then creates an instance of RunnerChild's parent class and executes run(). By the time that happens, the Runner class has already been added to the top-level module namespace and can therefore be pickled.
Output
Training
Checking DB
I'm not sure if this is what you wanted, but you don't have to use decorators or the boilerplate at all.
After defining your class, you can simply write the code that creates an object and runs its methods at module level, outside the class; you don't need the if __name__ == "__main__" part.
And if you want to run specific functions from the command line, you could use the sys module and read arguments with sys.argv[1]. For example, if you wanted to train with one command and make predictions with another, you could get the value of sys.argv and, using an if statement, check what the value is. If it is, for example, "train", you could invoke a train function, or whatever else you need to do.
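A minimal sketch of that sys.argv dispatch (the train and predict functions here are placeholders):

import sys


def train():
    print("training...")


def predict():
    print("predicting...")


if len(sys.argv) > 1 and sys.argv[1] == "train":
    train()
elif len(sys.argv) > 1 and sys.argv[1] == "predict":
    predict()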
I am a bit puzzled right now about the following:
import weakref


class A:
    def __init__(self, p):
        self.p = p

    def __repr__(self):
        return f"{type(self).__name__}(p={self.p!r})"


a = A(1)
proxy_a = weakref.proxy(a)

print(repr(proxy_a))
# '<weakproxy at 0x7f2ea2fc1b80 to A at 0x7f2ea2fee610>'
print(proxy_a.__repr__())
# 'A(p=1)'
Why does repr(proxy_a) return a representation of the proxy while proxy_a.__repr__() returns the representation of the original object? Shouldn't the two calls boil down to the same thing? And which __repr__ implementation is actually called by using repr(proxy_a)?
repr(proxy_a) calls the default C implementation of repr for the weakref.proxy object, while proxy_a.__repr__() is forwarded to the version defined on the a object.
Yes, I would expect them to execute the same code, but then again, don't we also expect the proxy to forward attribute lookups and method calls to the proxied object? At the same time, I would want to be able to tell that a proxy is a proxy object, so the repr(proxy_a) result makes sense too. So it is not even clear what the right behaviour should be.
Information on this is very scarce, but it looks like weakref.proxy objects do not replace the original objects in a completely transparent way, contrary to common expectations.
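A short self-contained illustration of the point: built-ins like repr() look up __repr__ on the proxy's type, while explicit attribute access is forwarded to the referent:

import weakref


class A:
    def __repr__(self):
        return "A()"


a = A()
proxy_a = weakref.proxy(a)
print(repr(proxy_a))                    # <weakproxy at ... to A at ...>: type-level lookup
print(type(proxy_a).__repr__(proxy_a))  # the same thing, spelled out explicitly
print(proxy_a.__repr__())               # 'A()': attribute access is forwarded to a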
I added a few print lines to make things clearer. Note, in the last line, that it is possible to access the weakly referenced object and its methods through the __self__ attribute of a bound method obtained from the proxy.
import weakref


class A:
    def __init__(self, p):
        self.__p = p

    def __repr__(self):
        return f"{type(self).__name__}(p={self.__p!r})"


a = A(1)
proxy_a = weakref.proxy(a)

print(repr(proxy_a))
# '<weakproxy at 0x7f2ea2fc1b80 to A at 0x7f2ea2fee610>'
print(proxy_a.__repr__())
# 'A(p=1)'
print(proxy_a.__repr__)
# <bound method A.__repr__ of A(p=1)>
print(type(proxy_a))
# <class 'weakproxy'>
print(type(proxy_a.__repr__.__self__))
# <class '__main__.A'>
print(proxy_a.__repr__.__self__.__repr__())
# A(p=1)
See also this complete thread python-dereferencing-weakproxy and some of the entries in this (old) thread from the Python bug tracker, where weakref.proxy and method delegation are mentioned in several places.
I have implemented the factory method pattern to parametrize the base class of the product class:
def factory(ParentClass):
    class Wrapper(ParentClass):
        _attr = "foo"

        def wrapped_method(self):
            """Do things to be done in `ParentClass`."""
            return self._attr
    return Wrapper
I need to share Wrapper objects with a process spawned using the multiprocessing module by means of a multiprocessing.Queue.
Since multiprocessing.Queue uses pickle to store the objects (see the note in the pickle documentation), and Wrapper is not defined at the top level, I get the following error:
PicklingError: Can't pickle <class 'Wrapper'>: attribute lookup Wrapper failed
I used the workaround in this answer and I get another error:
AttributeError: ("type object 'ParentClass' has no attribute 'Wrapper'", <main._NestedClassGetter object at 0x8c7fe4c>, (<class 'ParentClass'>, 'Wrapper'))
Is there a solution to share these sort of objects among processes?
According to the Pickle documentation, the workaround linked in the question could be modified to:
class _NestedClassGetter(object):
    """
    From: http://stackoverflow.com/a/11493777/741316

    When called with the containing class as the first argument,
    and the name of the nested class as the second argument,
    returns an instance of the nested class.
    """
    def __call__(self, factory_method, base):
        nested_class = factory_method(base)

        # make an instance of a simple object (this one will do), for which we
        # can change the __class__ later on.
        nested_instance = _NestedClassGetter()

        # set the class of the instance; __init__ will never be called on
        # the class, but the original state will be set later on by pickle.
        nested_instance.__class__ = nested_class
        return nested_instance
and the __reduce__ method to:
def __reduce__(self):
    state = self.__dict__.copy()
    return (_NestedClassGetter(),
            (factory, ParentClass), state,)
Thanks to @dano for his comment.
The best solution is to restructure your code to not have dynamically declared classes, but assuming that isn't the case, you can do a little more work to pickle them.
Add this method to your Wrapper class:
def __reduce__(self):
    r = super(Wrapper, self).__reduce__()
    return (wrapper_unpickler,
            ((factory, ParentClass, r[0]) + r[1][1:])) + r[2:]
Add this function to your module:
def wrapper_unpickler(factory, cls, reconstructor, *args):
    return reconstructor(*((factory(cls),) + args))
Essentially, you are swapping the dynamically generated Wrapper class for the factory function + wrapped class when pickling, and then, when unpickling, dynamically generating the Wrapper class again (passing the wrapped type to the factory) and swapping the wrapped class for the Wrapper.
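Putting the pieces together, a hedged end-to-end sketch (assuming a plain, module-level ParentClass; this is my own assembly of the snippets above, not code from the question):

import pickle


class ParentClass:
    pass


def wrapper_unpickler(factory, cls, reconstructor, *args):
    return reconstructor(*((factory(cls),) + args))


def factory(ParentClass):
    class Wrapper(ParentClass):
        _attr = "foo"

        def wrapped_method(self):
            return self._attr

        def __reduce__(self):
            r = super(Wrapper, self).__reduce__()
            return (wrapper_unpickler,
                    ((factory, ParentClass, r[0]) + r[1][1:])) + r[2:]
    return Wrapper


if __name__ == "__main__":
    obj = factory(ParentClass)()
    clone = pickle.loads(pickle.dumps(obj))  # no PicklingError
    print(clone.wrapped_method())            # foo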
I have a project where I'm trying to use weakrefs with callbacks, and I don't understand what I'm doing wrong. I have created a simplified test that shows the exact behavior I'm confused by.
Why is it that in this test test_a works as expected, but the weakref for self.MyCallbackB disappears between class initialization and the call to test_b? I thought that as long as the instance (a) exists, the reference to self.MyCallbackB should exist, but it doesn't.
import weakref


class A(object):
    def __init__(self):
        def MyCallbackA():
            print 'MyCallbackA'
        self.MyCallbackA = MyCallbackA

        self._testA = weakref.proxy(self.MyCallbackA)
        self._testB = weakref.proxy(self.MyCallbackB)

    def MyCallbackB(self):
        print 'MyCallbackB'

    def test_a(self):
        self._testA()

    def test_b(self):
        self._testB()


if __name__ == '__main__':
    a = A()
    a.test_a()
    a.test_b()
You want a WeakMethod.
An explanation why your solution doesn't work can be found in the discussion of the recipe:
Normal weakref.refs to bound methods don't quite work the way one expects, because bound methods are first-class objects; weakrefs to bound methods are dead-on-arrival unless some other strong reference to the same bound method exists.
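These answers predate it, but in Python 3.4+ the standard library ships this directly as weakref.WeakMethod, which simulates a weak reference to a bound method; a brief sketch:

import weakref


class A:
    def callback(self):
        print('callback')


a = A()
ref = weakref.WeakMethod(a.callback)
ref()()        # re-creates the bound method and calls it while a is alive
del a
print(ref())   # None once the instance has been collected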
According to the documentation for the weakref module:
In the following, the term referent means the object which is referred to by a weak reference.
A weak reference to an object is not enough to keep the object alive: when the only remaining references to a referent are weak references, garbage collection is free to destroy the referent and reuse its memory for something else.
What's happening with MyCallbackA is that you are holding a reference to it in the instances of A, thanks to:
self.MyCallbackA = MyCallbackA
Now, there is no reference to the bound method MyCallbackB in your code. It is held only in a.__class__.__dict__ as an unbound method. Basically, a bound method is created (and returned to you) when you do self.methodName. (AFAIK, a bound method works like a property, using a read-only descriptor, at least for new-style classes. I am sure something similar, i.e. without descriptors, happens for old-style classes; I'll leave it to someone more experienced to verify that claim.) So, self.MyCallbackB dies as soon as the weakref is created, because there is no strong reference to it!
My conclusions are based on:
import weakref


# trace is called when the object is deleted! - see the weakref docs.
def trace(x):
    print "Del MycallbackB"


class A(object):
    def __init__(self):
        def MyCallbackA():
            print 'MyCallbackA'
        self.MyCallbackA = MyCallbackA

        self._testA = weakref.proxy(self.MyCallbackA)

        print "Create MyCallbackB"
        # To fix it, do -
        # self.MyCallbackB = self.MyCallbackB
        # The name on the LHS could be anything, even foo!
        self._testB = weakref.proxy(self.MyCallbackB, trace)
        print "Done playing with MyCallbackB"

    def MyCallbackB(self):
        print 'MyCallbackB'

    def test_a(self):
        self._testA()

    def test_b(self):
        self._testB()


if __name__ == '__main__':
    a = A()
    # print a.__class__.__dict__["MyCallbackB"]
    a.test_a()
Output
Create MyCallbackB
Del MycallbackB
Done playing with MyCallbackB
MyCallbackA
Note:
I tried verifying this for old-style classes. It turned out that print a.test_a.__get__ outputs
<method-wrapper '__get__' of instancemethod object at 0xb7d7ffcc>
for both new- and old-style classes. So it may not really be a descriptor, just something descriptor-like. In any case, the point is that a bound-method object is created when you access an instance method through self, and unless you maintain a strong reference to it, it will be deleted.
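A quick check of that last point (the single-argument print calls work on both Python 2 and 3): every attribute access builds a new bound-method object, so a weakref taken on one of them is dead immediately unless you keep a strong reference around:

import weakref


class A(object):
    def method(self):
        pass


a = A()
print(a.method is a.method)   # False: two distinct bound-method objects
r = weakref.ref(a.method)
print(r())                    # None: the temporary bound method is already gone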
The other answers address the why in the original question, but either don't provide a workaround or refer to external sites.
After working through several other posts on Stack Exchange on this topic, many of which are marked as duplicates of this question, I finally came to a succinct workaround. When I know the nature of the object I'm dealing with, I use the weakref module; when I might instead be dealing with a bound method (as happens in my code when using event callbacks), I now use the following WeakRef class as a direct replacement for weakref.ref(). I've tested this with Python 2.4 through Python 2.7 inclusive, but not on Python 3.x.
import weakref


class WeakRef:
    def __init__(self, item):
        try:
            # Bound method: hold weak references to the function and the instance.
            self.method = weakref.ref(item.im_func)
            self.instance = weakref.ref(item.im_self)
        except AttributeError:
            # Not a bound method: fall back to a plain weak reference.
            self.reference = weakref.ref(item)
        else:
            self.reference = None

    def __call__(self):
        if self.reference is not None:
            return self.reference()
        instance = self.instance()
        if instance is None:
            return None
        method = self.method()
        return getattr(instance, method.__name__)
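A usage sketch under Python 2 (matching the im_func/im_self attributes used above; Button and on_click are illustrative names only, and the WeakRef class from above is assumed to be defined):

class Button(object):
    def on_click(self):
        print('clicked')


button = Button()
callback = WeakRef(button.on_click)  # holds weakrefs to the instance and the function
callback()()                         # re-binds and calls on_click while button is alive
del button
print(callback())                    # None once the instance is gone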