Python multiprocessing: Can't pickle local object

I have read a little about multiprocessing and pickling problems, and I have also read that there are some solutions, but I don't really know how they can help in my situation.
I am building a test runner where I use multiprocessing to call modified test class methods. The methods are modified by a metaclass so that I can have setUp- and tearDown-style hooks run before and after each test.
Here is my Parent Metaclass:
from typing import Tuple

class MetaTestCase(type):
    def __new__(cls, name: str, bases: Tuple, attrs: dict):
        def replaced_func(fn):
            def new_test(*args, **kwargs):
                args[0].before()
                result = fn(*args, **kwargs)
                args[0].after()
                return result
            return new_test

        # If an attribute is callable and its name starts with 'test', replace it
        for i in attrs:
            if callable(attrs[i]) and attrs[i].__name__.startswith('test'):
                attrs[i] = replaced_func(attrs[i])
        return super(MetaTestCase, cls).__new__(cls, name, bases, attrs)
And here is the base class that uses the metaclass:
class TestCase(metaclass=MetaTestCase):
    def before(self) -> None:
        """Overridable, execute before test part."""
        pass

    def after(self) -> None:
        """Overridable, execute after test part."""
        pass
And then I use it in my test suite class:
class TestApi(TestCase):
    def before(self):
        print('before')

    def after(self):
        print('after')

    def test_api_one(self):
        print('test')
Sadly, when I try to execute that test with multiprocessing.Process, it fails with:
AttributeError: Can't pickle local object 'MetaTestCase.__new__.<locals>.replaced_func.<locals>.new_test'
Here is how I create and execute Process:
module = importlib.import_module('tests.api.test_api') # Finding and importing module
object = getattr(module, 'TestApi') # Getting Class from module
process = Process(target=getattr(object, 'test_api_one')) # Calling class method
process.start()
process.join()
I tried to use pathos.helpers.mp.Process; it passes the pickling phase, I guess, but then fails with a tuple error that I don't understand:
Process Process-1:
Traceback (most recent call last):
    result = fn(*args, **kwargs)
IndexError: tuple index out of range
Is there any simple solution so that I can pickle that object and run the test successfully along with my modified test class?

As for your original question of why you are getting the pickling error, this answer summarizes the problem and offers solutions (similar to those already provided here).
Now as to why you are receiving the IndexError, this is because you are not passing an instance of the class to the function (the self argument). A quick fix would be to do this (also, please don't use object as a variable name):
module = importlib.import_module('tests.api.test_api') # Finding and importing module
obj = getattr(module, 'TestApi')
test_api = obj() # Instantiate!
# Pass the instance explicitly! Alternatively, you can also do target=test_api.test_api_one
process = Process(target=getattr(obj, 'test_api_one'), args=(test_api, ))
process.start()
process.join()
Of course, you can also opt to make the methods of the class classmethods and pass the target function as obj.method_name.
Also, as a quick side note, using a metaclass for the use case shown in the example seems like overkill. Are you sure you can't do what you want with class decorators instead (which might also be compatible with the standard library's multiprocessing)?

https://docs.python.org/3/library/pickle.html#what-can-be-pickled-and-unpickled
"The following types can be pickled... functions (built-in and user-defined) accessible from the top level of a module (using def, not lambda);"
It sounds like you cannot pickle locally defined functions. This makes sense based on other pickle behavior I've seen: essentially, a pickle is just instructions to the Python interpreter for how to find the function definition. That usually means a module name and a function name (for example), so that the multiprocessing child process can import the correct function.
There's no way for another process to import your replaced_func function because it's only locally defined.
You could try defining it outside of the metaclass, which would make it importable by other processes.
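For instance, here is one way to sidestep the local wrapper entirely. This is a sketch, not the poster's code: it reuses the class names from the question but drops the metaclass, and a module-level runner does the before/test/after sequencing instead, so nothing locally defined ever needs to be pickled.

```python
from multiprocessing import Process

class TestCase:
    def before(self) -> None:
        """Overridable, execute before test part."""

    def after(self) -> None:
        """Overridable, execute after test part."""

class TestApi(TestCase):
    def before(self):
        print('before')

    def after(self):
        print('after')

    def test_api_one(self):
        print('test')

def run_test(instance, test_name):
    # Module-level, so it is importable (and thus picklable) by the child.
    # Only the instance and the method *name* cross the process boundary.
    instance.before()
    try:
        getattr(instance, test_name)()
    finally:
        instance.after()

if __name__ == '__main__':
    process = Process(target=run_test, args=(TestApi(), 'test_api_one'))
    process.start()
    process.join()
```

Pickling TestApi() only records the class by module and name, and run_test is looked up the same way, so this works with both the fork and spawn start methods.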

Related

How to access property attribute from multiprocess Proxy class in Python?

I am trying to get an attribute from my proxy class, but I don't quite understand the implementation (as per https://docs.python.org/3.9/library/multiprocessing.html?highlight=multiprocessing#multiprocessing.managers.BaseManager.register).
I understand I need to pass exposed and method_to_typeid to the .register() method, because if I don't, I only have access to "public" methods and not attributes.
Here is my code:
from multiprocessing import Process
from multiprocessing.managers import BaseManager

class CustomManager(BaseManager):
    # nothing
    pass

class TestClass:
    def __init__(self):
        self._items = []

    @property
    def items(self):
        return self._items

    def fill_items(self):
        self._items.append(1)

if __name__ == "__main__":
    CustomManager.register(
        'TestClass',
        TestClass,
        exposed=('items', 'fill_items'),
        method_to_typeid={'items': 'list'}
    )
    manager = CustomManager()
    manager.start()
    shared_object = manager.TestClass()
    p = Process(target=shared_object.fill_items)
    p.start()
    p.join()
    print(shared_object.items)
    # print(shared_object.items())
I would expect this to return my list, but instead it returns a reference to the method:
Output:
<bound method items of <AutoProxy[TestClass] object, typeid 'TestClass' at 0x7feb38056670>>
But when I try to call it as a method, i.e. shared_object.items(), I get:
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/managers.py", line 824, in _callmethod
raise convert_to_error(kind, result)
TypeError: 'list' object is not callable
Which makes sense, because it is an attribute containing a list value, not a method. But then why, when I access it as an attribute, do I get its reference instead of the value?
I tried following the official documentation and checked already-asked questions, for most of which the solution was to add a NamespaceProxy, but it looks like now, instead of implementing our own NamespaceProxy, the correct way is to just pass the two extra args to the .register() method.
The solution here is just to use threading instead of multiprocessing. ChatGPT got pretty close to the implementation I needed but could not resolve the issue without changing the implementation of my classes. In the end it makes more sense to use threads anyway, because:
- threads share memory, as opposed to processes
- my script is I/O bound, which suggests using threads, whereas CPU-bound scripts should use processes
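For reference, a minimal sketch of the threaded version (the TestClass from the question is repeated here so the sketch is self-contained). With threads the object lives in one shared memory space, so the property can be read directly; no manager, proxy, or pickling is involved:

```python
import threading

class TestClass:
    def __init__(self):
        self._items = []

    @property
    def items(self):
        return self._items

    def fill_items(self):
        self._items.append(1)

obj = TestClass()
# The thread mutates the very same object the main thread holds
t = threading.Thread(target=obj.fill_items)
t.start()
t.join()
print(obj.items)  # -> [1]
```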

Using a class decorator to automatically run a method with a child process

I was asked to develop a consistent way to run (train, make predictions, etc.) any ML model from the command line. I also need to periodically check the DB for training-related requests, like abort requests. To minimize the effect that checking the DB has on training, I want to create a separate process for fetching requests from the DB.
So I created an abstract class RunnerBaseClass which requires its child classes to implement _train() for each ML model, and it will run _train() with _check_db() using the multiprocessing module when you call run().
I also want to get rid of the need for the boilerplate

if __name__ == '__main__':
    ...

code, and make argument parsing, instance creation, and calling the run() method happen automatically.
So I created a class decorator @autorun which calls the run() method of the class when the script is run directly from the command line. When run, the decorator successfully calls run(), but there seems to be a problem creating a subprocess with the class's method, and the following error occurs:
Traceback (most recent call last):
  File "run.py", line 4, in <module>
    class Runner(RunnerBaseClass):
  File "/Users/yongsinp/Downloads/runner_base.py", line 27, in class_decorator
    instance.run()
  File "/Users/yongsinp/Downloads/runner_base.py", line 16, in run
    db_check_process.start()
  File "/Users/yongsinp/miniforge3/envs/py3.8/lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/Users/yongsinp/miniforge3/envs/py3.8/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/Users/yongsinp/miniforge3/envs/py3.8/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/Users/yongsinp/miniforge3/envs/py3.8/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/Users/yongsinp/miniforge3/envs/py3.8/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/Users/yongsinp/miniforge3/envs/py3.8/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/Users/yongsinp/miniforge3/envs/py3.8/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <class '__main__.Runner'>: attribute lookup Runner on __main__ failed
Here's a minimal code that can be used to reproduce the error.
runner_base.py:
from abc import ABC, abstractmethod
from multiprocessing import Process

class RunnerBaseClass(ABC):
    @abstractmethod
    def _train(self) -> None:
        ...

    def _check_db(self):
        print("Checking DB")

    def run(self) -> None:
        db_check_process = Process(target=self._check_db)
        db_check_process.start()
        self._train()
        db_check_process.join()

def autorun(env_name: str):
    def class_decorator(class_):
        instance = class_()
        if env_name == '__main__':
            instance.run()
        return instance
    return class_decorator
run.py:
from runner_base import RunnerBaseClass, autorun

@autorun(__name__)
class Runner(RunnerBaseClass):
    def _train(self) -> None:
        print("Training")
I have looked up the cause of this error and can fix it simply by not using the decorator, or by turning the method into a function.
runner_base.py:
from abc import ABC, abstractmethod
from multiprocessing import Process

class RunnerBaseClass(ABC):
    @abstractmethod
    def _train(self) -> None:
        ...

    def run(self) -> None:
        db_check_process = Process(target=check_db)
        db_check_process.start()
        self._train()
        db_check_process.join()

def autorun(env_name: str):
    def class_decorator(class_):
        instance = class_()
        if env_name == '__main__':
            instance.run()
        return instance
    return class_decorator

def check_db():
    print("Checking DB")
I can just use the function instead of the method and be done with it, but I don't like the idea of passing configurations and objects for inter-process communication (like a Queue) to the function, which I don't have to do when using a method. So, is there a way for me to keep _check_db() a method and still use the @autorun decorator?
(I am aware of dill and other modules, but I'd like to stick with the builtins if possible.)
There might be a couple of misunderstandings here.
I can just use the function instead of the method and be done with it, but I don't like the idea of passing configurations and an object for communication in between processes to the function which I don't have to when using a method
It's understandable why you might think this, but your logic for using a method rather than a function is flawed if you plan to modify objects of Runner in either the child or the parent process. When you spawn processes using the start method "spawn" (the default on Windows and macOS), the child processes don't have access to the parent's memory space. Therefore, if you create an object of Runner and pass it to a process, that process will have a duplicate of that object with a different memory address than the one in the parent. Any modifications made to these objects will not be propagated across processes. The same goes for the start method "fork" (the default on Unix), the only difference being that fork uses copy-on-write: rather than at start, the duplicate is only created if you attempt to modify the object in the child process.
So just keep in mind that sharing objects like you are trying to do only makes sense if you aim to use the objects as read-only (like passing configurations and data from one process to another), and don't care about whether the changes made to them are reflected in the other processes. If you also want them to be writable, you can simply use managers like this answer mentions. Keep in mind that using managers will negatively impact your code's performance (as communication will require all data to be serialized).
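To illustrate the manager approach with a small sketch (not the poster's code): a manager-backed list lives in a server process, and each process talks to it through a proxy, so writes made in the child really are visible to the parent:

```python
from multiprocessing import Manager, Process

def fill(shared):
    # Runs in the child; the proxy forwards the append to the manager's server process
    shared.append(1)

def demo():
    with Manager() as manager:
        items = manager.list()
        p = Process(target=fill, args=(items,))
        p.start()
        p.join()
        # Convert the proxy to a plain list to read the final contents
        return list(items)

if __name__ == '__main__':
    print(demo())  # -> [1]
```

Note that every append and read goes through serialized messages to the manager process, which is the performance cost mentioned above.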
This brings us to the next question: can you even pass complex objects to another process?
Multiprocessing uses pickle to transfer data from one process to another, which means that any object passed as an argument must be picklable. Whether pickle can serialize complex objects like instances of Runner depends very much on the instance attributes the object has. In your case, the problem isn't with pickling your instance; it's that you are attempting to do so before the class Runner has even been added to the top level of the module. To check this, change your decorator to print whether the class exists in the global attributes before it attempts to create an instance:
def autorun(env_name: str):
    def class_decorator(class_):
        print(class_.__name__ in globals())
        instance = class_()
        if env_name == '__main__':
            instance.run()
        return instance
    return class_decorator
Output:
False
In general, attributes not defined at the top level of a module are not picklable with pickle, and this is why your code fails with a pickling error. Additionally, you also won't be able to use ABC, since that can't be pickled either.
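The top-level rule is easy to demonstrate in isolation (a small sketch, unrelated to the poster's classes):

```python
import pickle

def make_class():
    class Local:  # defined inside a function, so not importable by name
        pass
    return Local

try:
    pickle.dumps(make_class()())
    failed = False
except (pickle.PicklingError, AttributeError):
    # pickle stores classes by module + qualified name; 'make_class.<locals>.Local'
    # cannot be looked up from the top level, so serialization is refused
    failed = True
print(failed)  # -> True
```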
So what's the solution?
I recommend you look outside the builtins to achieve what you want, or, as you mentioned, change the method _check_db into a function. Apart from that, there is also a rather unintuitive workaround you can use.
Method 1
If you do decide to use something better, like multiprocess, which uses dill rather than pickle, your code becomes:

from multiprocess import Process

class RunnerBaseClass:
    def _train(self) -> None:
        ...

    def _check_db(self):
        print("Checking DB")

    def run(self) -> None:
        db_check_process = Process(target=self._check_db)
        db_check_process.start()
        self._train()
        db_check_process.join()

def autorun(env_name: str):
    def class_decorator(class_):
        instance = class_()
        if env_name == '__main__':
            instance.run()
        return instance
    return class_decorator

@autorun(__name__)
class Runner(RunnerBaseClass):
    def _train(self) -> None:
        print("Training")
Output
Training
Checking DB
Method 2
The second method relies on you changing the decorator to create an instance of the passed class's parent class instead, and attach it to a child of Runner. Consider this code:
from multiprocessing import Process

class RunnerBaseClass:
    def _train(self) -> None:
        ...

    def _check_db(self):
        print("Checking DB")

    def run(self) -> None:
        db_check_process = Process(target=self._check_db)
        db_check_process.start()
        self._train()
        db_check_process.join()

def autorun(env_name: str):
    def class_decorator(class_):
        # Create an instance of the parent class
        instance = class_.__bases__[0]()
        if env_name == '__main__':
            instance.run()
        return instance
    return class_decorator

class Runner(RunnerBaseClass):
    def _train(self) -> None:
        print("Training")

@autorun(__name__)
class RunnerChild(Runner):
    pass
Here, we attach the decorator to RunnerChild, a child of the class Runner. The decorator then creates an instance of RunnerChild's parent class and executes run(). By the time this happens, the Runner class has already been added to the top level of the module and can therefore be pickled.
Output
Training
Checking DB
I'm not sure if this is what you wanted, but you don't have to use decorators or the boilerplate at all.
After defining your class, you can simply write the code that creates an object and runs its methods below it; you don't need the if __name__ == "__main__" part.
And if you want to run specific functions from the command line, you could use the sys module and read arguments with sys.argv[1]. For example, if you wanted to train in one command and then make predictions in another, you could check the value of sys.argv with an if statement: if it is, say, "train", you invoke a train function, or whatever else you need to do.
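A minimal sketch of that sys.argv dispatch (the command names and handlers here are invented for illustration):

```python
import sys

def train():
    return "training"

def predict():
    return "predicting"

# Map the first command-line argument to a handler; sys.argv[0] is the script path itself
COMMANDS = {"train": train, "predict": predict}

def dispatch(argv):
    if len(argv) < 2 or argv[1] not in COMMANDS:
        raise SystemExit(f"usage: {argv[0]} train|predict")
    return COMMANDS[argv[1]]()

if __name__ == "__main__":
    print(dispatch(sys.argv))
```

Running e.g. python run.py train would then print training.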

unitTest a python 3 metaclass

I have a metaclass that sets a class property my_new_property when the class is created. The file is named my_meta and the code is:
def remote_function():
    # Get some data from a request to another site
    return 'remote_response'

class MyMeta(type):
    def __new__(cls, *args, **kwargs):
        print("It is in")
        obj = super().__new__(cls, *args, **kwargs)
        new_value = remote_function()
        setattr(obj, 'my_new_property', new_value)
        return obj
The functionality to set the property works fine. However, in a test file tests.py containing only one line of code:

from my_meta import MyMeta

the metaclass code is executed. As a consequence, it executes the real remote_function.
The question is: since the metaclass code runs merely by importing from the test file, how can I mock remote_function?
Importing the file as you show us won't trigger execution of the metaclass code.
However, importing any file (including the one where the metaclass is defined) that contains a class using this metaclass will run the code in the metaclass __new__ method, because executing a class statement does just that: it calls the metaclass to create the new class object.
So the recommendation is: do not have your metaclass __new__ or __init__ methods trigger side effects, like accessing remote resources, unless that can be done in a seamless and innocuous way. Not only testing, but importing your app's modules in a Python shell will also trigger the behavior.
You could instead have a method on the metaclass that initializes the remote value, and explicitly call that "remote_init" method when you are about to actually use the value, as in:
class MyMeta(type):
    def __new__(cls, *args, **kwargs):
        print("It is in")
        obj = super().__new__(cls, *args, **kwargs)
        # note: no call to remote_function() here any more
        return obj

    def remote_init(cls):
        if hasattr(cls, "my_new_property"):
            return
        cls.my_new_property = remote_function()
The remote_init method, being defined in the metaclass, behaves just like a class method of the instantiated classes, but won't be visible (to dir or attribute retrieval) from the class instances.
This is the safest thing to do.
If you want to avoid the explicit step, which is understandable, you could use a setting in a configuration file, plus a test inside remote_function, to decide whether to trigger the actual networking code or just return a local dummy value. You then make the configuration differ for testing/staging/production.
And, finally, you could just move remote_function to another module, import that first, patch it out with unittest.mock.patch, and then import the module containing the metaclass: when it runs and calls the function, it will get the patched version. This will work for your tests, but won't fix the problem of triggering side effects on other occasions (like other tests that load this module).
Of course, for this to work you have to import the module containing the metaclass (and any classes defined with it) inside your test function, in a region where mock.patch is active, not at the top of the file. There is no problem with importing things inside test methods to gain control over the importing process itself.
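A self-contained sketch of that patch-then-import ordering. Everything lives in one file here for brevity; in the real layout, remote_function would sit in its own module and the class statement below would be replaced by the import of the metaclass-using module:

```python
from unittest import mock

def remote_function():
    # stands in for the real networked call
    return 'remote_response'

class MyMeta(type):
    def __new__(cls, *args, **kwargs):
        obj = super().__new__(cls, *args, **kwargs)
        obj.my_new_property = remote_function()
        return obj

# The metaclass runs at class-definition time, so the patch must be
# active *while the class is being created* (or while its module is imported).
with mock.patch(__name__ + '.remote_function', return_value='dummy'):
    class MyClass(metaclass=MyMeta):
        pass

print(MyClass.my_new_property)  # -> dummy
```

Once the with block exits, remote_function is restored, but the class keeps the value it was given at creation time.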

Factory method pattern conflicts with use of multiprocessing queue

I have implemented the factory method pattern to parametrize the base class of the product class:
def factory(ParentClass):
    class Wrapper(ParentClass):
        _attr = "foo"

        def wrapped_method(self):
            """Do things to be done in `ParentClass`."""
            return self._attr

    return Wrapper
I need to share Wrapper objects with a process spawned using the multiprocessing module by means of a multiprocessing.Queue.
Since multiprocessing.Queue uses Pickle to store the objects (see note at Pickle documentation), and Wrapper is not defined at the top level, I get the following error:
PicklingError: Can't pickle <class 'Wrapper'>: attribute lookup Wrapper failed
I used the workaround in this answer and I get another error:
AttributeError: ("type object 'ParentClass' has no attribute 'Wrapper'", <main._NestedClassGetter object at 0x8c7fe4c>, (<class 'ParentClass'>, 'Wrapper'))
Is there a solution to share these sort of objects among processes?
According to the Pickle documentation, the workaround linked in the question could be modified to:
class _NestedClassGetter(object):
    """
    From: http://stackoverflow.com/a/11493777/741316

    When called with the factory method as the first argument, and the
    base class as the second argument, returns an instance of the
    dynamically generated class.
    """
    def __call__(self, factory_method, base):
        nested_class = factory_method(base)
        # make an instance of a simple object (this one will do), for which we
        # can change the __class__ later on.
        nested_instance = _NestedClassGetter()
        # set the class of the instance; the __init__ will never be called on
        # the class, but the original state will be set later on by pickle.
        nested_instance.__class__ = nested_class
        return nested_instance

and the __reduce__ method to:

def __reduce__(self):
    state = self.__dict__.copy()
    return (_NestedClassGetter(),
            (factory, ParentClass), state,)
Thanks to @dano for his comment.
The best solution is to restructure your code to not have dynamically declared classes, but assuming that isn't the case, you can do a little more work to pickle them.
Add this function to your module:

def wrapper_unpickler(factory, cls, reconstructor, *args):
    return reconstructor(*((factory(cls),) + args))

And add this method to your Wrapper class:

def __reduce__(self):
    r = super(Wrapper, self).__reduce__()
    return (wrapper_unpickler,
            ((factory, ParentClass, r[0]) + r[1][1:])) + r[2:]
Essentially, you are swapping the dynamically generated Wrapper class for the factory function + wrapped class when pickling; when unpickling, the Wrapper class is dynamically generated again (by passing the wrapped type to the factory) and the instance's class is swapped back to the new Wrapper.
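Putting the pieces together, here is a hedged end-to-end sketch of the idea (class names follow the question; the __reduce__ body is written inline in the factory so it closes over the right base class) showing the round trip through pickle:

```python
import pickle

class ParentClass:
    """Stand-in for the real base class passed to the factory."""

def wrapper_unpickler(factory, cls, reconstructor, *args):
    # Rebuild the dynamic class, then let pickle's normal reconstructor run
    return reconstructor(*((factory(cls),) + args))

def factory(ParentClass):
    class Wrapper(ParentClass):
        _attr = "foo"

        def wrapped_method(self):
            return self._attr

        def __reduce__(self):
            # Swap the unpicklable dynamic class for (factory, base) on the wire
            r = super().__reduce__()
            return (wrapper_unpickler,
                    (factory, ParentClass, r[0]) + r[1][1:]) + r[2:]
    return Wrapper

obj = factory(ParentClass)()
obj.extra = 42  # instance state survives the round trip
clone = pickle.loads(pickle.dumps(obj))
print(clone.wrapped_method(), clone.extra)  # -> foo 42
```

The same mechanism is what lets a multiprocessing.Queue carry these objects: only factory, the base class, and the instance state are serialized, all of which are importable or plain data.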

How do you pass a class method as param of deferred.defer on GAE

Here are my attempts:
deferred.defer(class1().method1, class2.method2, arg)
deferred.defer(class1().method1, class2.method2(), arg)
Both of these fail with the error:
Can't pickle <type 'instancemethod'>: it's not found as __builtin__.instancemethod
In another post about how to pickle an instance method, Steven Bethard's solution was suggested: http://bytes.com/topic/python/answers/552476-why-cant-you-pickle-instancemethods (towards the bottom of the page)
The code has lost its formatting and I have been unable to successfully use the code to solve my deferred problem.
The function passed to deferred.defer must be a directly importable, global function. This is because the deferred handler will almost surely run in a different interpreter instance, so the function in question must be importable from there.
If class1 in your code refers to an actual class name, the simplest way to tackle the problem is to wrap the call to its method inside a global function and pass that to defer:
def deferred_method_call(*args, **kwargs):
    class1.method1(*args, **kwargs)

deferred.defer(deferred_method_call, ...)

On the other hand, if class1 is just the name of a variable that points to the actual class, you should pass the class as a parameter to your function:

def deferred_method_call(class_, *args, **kwargs):
    class_.method1(*args, **kwargs)

deferred.defer(deferred_method_call, class1, ...)
This works because class objects (instances of type) are picklable and can be passed as arguments to deferred functions.
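A hedged sketch of the second variant, generalized to take the method name as well (Class2 and its method are invented for illustration; outside GAE the wrapper can be exercised directly):

```python
def deferred_method_call(class_, method_name, *args, **kwargs):
    # Module-level and importable, so deferred can pickle a reference to it;
    # the class object and the method *name* are both picklable arguments.
    return getattr(class_, method_name)(*args, **kwargs)

class Class2:
    @classmethod
    def method2(cls, arg):
        return arg * 2

# On GAE this would be queued as:
#   deferred.defer(deferred_method_call, Class2, 'method2', 21)
print(deferred_method_call(Class2, 'method2', 21))  # -> 42
```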

Categories

Resources