pickling lru_cached function on object - python

As part of parallelizing some existing code (with multiprocessing), I ran into a situation where something similar to the class below needs to be pickled.
Starting from:
import pickle
from functools import lru_cache

class Test:
    def __init__(self):
        self.func = lru_cache(maxsize=None)(self._inner_func)

    def _inner_func(self, x):
        # In reality this will be slow-running
        return x
calling
t = Test()
pickle.dumps(t)
returns
_pickle.PicklingError: Can't pickle <functools._lru_cache_wrapper object at 0x00000190454A7AC8>: it's not the same object as __main__.Test._inner_func
which I don't really understand. By the way, I also tried a variation where _inner_func was itself named func; that didn't change things.

If anybody is interested, this can be solved by using __getstate__ and __setstate__ like this:
from functools import lru_cache
from copy import copy

class Test:
    def __init__(self):
        self.func = lru_cache(maxsize=None)(self._inner_func)

    def _inner_func(self, x):
        # In reality this will be slow-running
        return x

    def __getstate__(self):
        result = copy(self.__dict__)
        result["func"] = None
        return result

    def __setstate__(self, state):
        self.__dict__ = state
        self.func = lru_cache(maxsize=None)(self._inner_func)
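With that in place, an instance round-trips cleanly; the cache is simply rebuilt (empty) on unpickling. A quick check:
import pickle

t = Test()
data = pickle.dumps(t)   # no PicklingError anymore
t2 = pickle.loads(data)
print(t2.func(3))        # 3; t2 gets a fresh, empty cache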

As detailed in the comments, the pickle module has issues when dealing with decorators. See this question for more details:
Pickle and decorated classes (PicklingError: not the same object)
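For context, here is a minimal sketch of the mechanism (illustrative only, run as a script so the class lives in __main__): pickle serializes function-like objects by qualified name, and the lru_cache wrapper copies __qualname__ from the method it wraps, so pickle's name lookup finds the undecorated method and notices the mismatch:
import pickle
from functools import lru_cache

class Test:
    def _inner_func(self, x):
        return x

wrapper = lru_cache(maxsize=None)(Test._inner_func)
print(wrapper.__qualname__)         # 'Test._inner_func', copied from the method
print(wrapper is Test._inner_func)  # False, hence "not the same object"
pickle.dumps(wrapper)               # raises the same PicklingError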

Use methodtools.lru_cache to avoid creating a new cached function in __init__:
import pickle
from methodtools import lru_cache

class Test:
    @lru_cache(maxsize=None)
    def func(self, x):
        # In reality this will be slow-running
        return x

if __name__ == '__main__':
    t = Test()
    print(pickle.dumps(t))
It requires installing methodtools from PyPI:
pip install methodtools

Related

Dynamically add function to class through decorator

I'm trying to find a way to dynamically add methods to a class through a decorator.
The decorator I have looks like:
from functools import wraps

def deco(target):
    def decorator(function):
        @wraps(function)
        def wrapper(self, *args, **kwargs):
            return function(*args, id=self.id, **kwargs)
        setattr(target, function.__name__, wrapper)
        return function
    return decorator
class A:
    pass

# in another module
@deco(A)
def compute(id: str):
    return do_compute(id)

# in another module
@deco(A)
def compute2(id: str):
    return do_compute2(id)

# in another module
a = A()
a.compute()   # this should work
a.compute2()  # this should work
My hope is that the decorator adds the compute() function to class A, so any object of A has the compute() method.
However, in my test, this only works if I explicitly import compute into the module where an object of A is created. I think I'm missing something obvious, but don't know how to fix it. Appreciate any help!
I think this will be quite a bit simpler using a decorator implemented as a class:
class deco:
    def __init__(self, cls):
        self.cls = cls

    def __call__(self, f):
        setattr(self.cls, f.__name__, f)
        return self.cls

class A:
    def __init__(self, val):
        self.val = val

@deco(A)
def compute(a_instance):
    print(a_instance.val)

A(1).compute()
A(2).compute()
outputs
1
2
But just because you can do it does not mean you should. This can become a debugging nightmare, and it will probably give a hard time to any static code analyser or linter (PyCharm, for example, complains with Unresolved attribute reference 'compute' for class 'A').
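If you do go this route, one possible way to quiet static analysers is to declare the dynamically attached method as a stub under TYPE_CHECKING. This is a sketch of my own, not from the original answer:
from typing import TYPE_CHECKING

class A:
    def __init__(self, val):
        self.val = val

    if TYPE_CHECKING:
        # Only seen by type checkers; the real method is attached by @deco(A).
        def compute(self) -> None: ...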
Why doesn't it work out of the box when we split it to different modules (more specifically, when compute is defined in another module)?
Assume the following:
a.py
print('importing deco and A')

class deco:
    def __init__(self, cls):
        self.cls = cls

    def __call__(self, f):
        setattr(self.cls, f.__name__, f)
        return self.cls

class A:
    def __init__(self, val):
        self.val = val
b.py
print('defining compute')

from a import A, deco

@deco(A)
def compute(a_instance):
    print(a_instance.val)
main.py
from a import A
print('running main')
A(1).compute()
A(2).compute()
If we execute main.py we get the following:
importing deco and A
running main
Traceback (most recent call last):
A(1).compute()
AttributeError: 'A' object has no attribute 'compute'
Something is missing: defining compute is never printed. Even worse, compute is never defined, let alone bound to A.
Why? Because nothing triggered the execution of b.py. Just because it sits there does not mean it gets executed.
We can force its execution by importing it. It feels kind of abusive to me, but it works, because importing a file has a side effect: it executes every piece of code that is not guarded by if __name__ == '__main__':, much like importing a package executes its __init__.py file.
main.py
from a import A
import b
print('running main')
A(1).compute()
A(2).compute()
outputs
importing deco and A
defining compute
running main
1
2

How do I properly decorate a `classmethod` with `functools.lru_cache`?

I tried to decorate a classmethod with functools.lru_cache. My attempt failed:
import functools

class K:
    @functools.lru_cache(maxsize=32)
    @classmethod
    def mthd(i, stryng: str):
        return stryng

obj = K()
The error message comes from functools.lru_cache:
TypeError: the first argument must be callable
A class method is, itself, not callable. (What is callable is the object returned by the class method's __get__ method.)
As such, you want the function decorated by lru_cache to be turned into a class method instead.
@classmethod
@functools.lru_cache(maxsize=32)
def mthd(cls, stryng: str):
    return stryng
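A quick check of that claim, as a sketch with hypothetical names (on the Python versions where the question's error occurs), accessing the raw classmethod object via the class __dict__:
class C:
    @classmethod
    def m(cls):
        pass

cm = C.__dict__['m']                  # the raw classmethod object
print(callable(cm))                   # False: not callable by itself
print(callable(cm.__get__(None, C)))  # True: its __get__ returns a callable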
The selected answer is totally correct, but here is another option: if you want to bind the cache storage to each class, instead of sharing a single storage among all its subclasses, there is methodtools:
import functools
import methodtools

class K:
    @classmethod
    @functools.lru_cache(maxsize=1)
    def mthd(cls, s: str):
        print('functools', s)
        return s

    @methodtools.lru_cache(maxsize=1)  # note that methodtools wraps classmethod
    @classmethod
    def mthd2(cls, s: str):
        print('methodtools', s)
        return s

class L(K):
    pass

K.mthd('1')
L.mthd('2')
K.mthd2('1')
L.mthd2('2')

K.mthd('1')   # functools shares the storage
L.mthd('2')
K.mthd2('1')  # methodtools doesn't share the storage
L.mthd2('2')
Then the result is
$ python example.py
functools 1
functools 2
methodtools 1
methodtools 2
functools 1
functools 2

multiprocessing for class method

I want to use multiprocessing for a class method. I found out from this answer that Pool in multiprocessing cannot pickle class methods directly, but there is a workaround: define a function outside the class and add an additional argument to it (a similar suggestion is also on this blog). Hence, I tried to achieve that with the following simple program, which has MyClass whose method fun I want to parallelize. However, I am not getting any results (and no error is raised). It seems I am missing something, but I feel I am almost there! Any fix is really appreciated.
import multiprocessing

class MyClass:
    def __init__(self):
        pass

    def fun(self, myList):
        print myList

def unwrap_fun(obj, myList):
    return obj.fun(myList)

obj = MyClass()
mlp = multiprocessing.Pool(processes=multiprocessing.cpu_count())
mlp.imap_unordered(unwrap_fun, (obj, range(1, 10)))
You should call close() and join() from your main process. Try this:
import multiprocessing

class MyClass:
    def fun(self, myList):
        print myList

def unwrap_fun(myList):
    obj = MyClass()
    return obj.fun(myList)

if __name__ == '__main__':
    mlp = multiprocessing.Pool(processes=multiprocessing.cpu_count())
    mlp.imap_unordered(unwrap_fun, range(1, 10))
    mlp.close()
    mlp.join()
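For reference, an equivalent sketch in Python 3, which I am adding beyond the original answer: the pool can be used as a context manager, and since imap_unordered is lazy, the results have to be consumed for the work to actually run:
import multiprocessing

class MyClass:
    def fun(self, my_list):
        print(my_list)

def unwrap_fun(my_list):
    return MyClass().fun(my_list)

if __name__ == '__main__':
    # The with-block tears the pool down automatically on exit.
    with multiprocessing.Pool(processes=multiprocessing.cpu_count()) as mlp:
        # list() drains the lazy iterator so all tasks actually execute.
        list(mlp.imap_unordered(unwrap_fun, range(1, 10)))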

Python: How to import package twice?

Is there a way to import a package twice in the same python session, under the same name, but at different scope, in a multi-threaded environment?
I would like to import the package, then override some of its functions to change its behavior only when used in specific class.
For instance, is it possible to achieve something like this ?
import mod

class MyClass:
    mod = __import__('mod')

    def __init__(self):
        mod.function = new_function  # override module function

    def method(self):
        mod.function()  # call new_function

mod.function()  # call original function
It might seem weird, but in this case a user deriving the class wouldn't have to change their code to use the improved package.
To import a module as a copy:
def freshimport(name):
    import sys, importlib
    if name in sys.modules:
        del sys.modules[name]
    mod = importlib.import_module(name)
    sys.modules[name] = mod
    return mod
Test:
import mymodule as m1
m2 = freshimport('mymodule')
assert m1.func is not m2.func
Note:
importlib.reload will not do the job, as it always "thoughtfully" updates the old module:
import importlib
import mymodule as m1
print(id(m1.func))
m2 = importlib.reload(m1)
print(id(m1.func))
print(id(m2.func))
Sample output:
139681606300944
139681606050680
139681606050680
It looks like a job for a context manager
import modul

def newfunc():
    print('newfunc')

class MyClass:
    def __enter__(self):
        self._f = modul.func
        modul.func = newfunc
        return self

    def __exit__(self, type, value, tb):
        modul.func = self._f

    def method(self):
        modul.func()

modul.func()
with MyClass() as obj:
    obj.method()
    modul.func()
modul.func()
outputs
func
newfunc
newfunc
func
where modul.py contains
def func():
    print('func')
NOTE: this solution suits single-threaded applications only (unspecified in the OP)

Python LRU Cache Decorator Per Instance

Using the LRU Cache decorator found here:
http://code.activestate.com/recipes/578078-py26-and-py30-backport-of-python-33s-lru-cache/
from lru_cache import lru_cache

class Test:
    @lru_cache(maxsize=16)
    def cached_method(self, x):
        return x + 5
I can create a decorated class method with this, but it ends up creating a global cache that applies to all instances of class Test. However, my intent was to create a per-instance cache. So if I were to instantiate 3 Tests, I would have 3 LRU caches rather than 1 LRU cache shared by all 3 instances.
The only indication I have that this is happening is that, when calling cache_info() on the decorated methods of the different class instances, they all return the same cache statistics (which is extremely unlikely to occur given they are being called with very different arguments):
CacheInfo(hits=8379, misses=759, maxsize=128, currsize=128)
CacheInfo(hits=8379, misses=759, maxsize=128, currsize=128)
CacheInfo(hits=8379, misses=759, maxsize=128, currsize=128)
Is there a decorator or trick that would allow me to easily cause this decorator to create a cache for each class instance?
Assuming you don't want to modify the code (e.g., because you want to be able to just port to 3.3 and use the stdlib functools.lru_cache, or use functools32 out of PyPI instead of copying and pasting a recipe into your code), there's one obvious solution: Create a new decorated instance method with each instance.
class Test:
    def cached_method(self, x):
        return x + 5

    def __init__(self):
        self.cached_method = lru_cache(maxsize=16)(self.cached_method)
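A quick check that the caches are now independent (assuming the recipe's lru_cache exposes the stdlib-style cache_info()):
a, b = Test(), Test()
a.cached_method(1)
a.cached_method(1)
print(a.cached_method.cache_info())  # hits=1, misses=1
print(b.cached_method.cache_info())  # hits=0, misses=0: a separate cache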
How about this: a function decorator that wraps the method with lru_cache the first time it's called on each instance?
from functools import lru_cache, wraps

def instance_method_lru_cache(*cache_args, **cache_kwargs):
    def cache_decorator(func):
        @wraps(func)
        def cache_factory(self, *args, **kwargs):
            print('creating cache')
            instance_cache = lru_cache(*cache_args, **cache_kwargs)(func)
            instance_cache = instance_cache.__get__(self, self.__class__)
            setattr(self, func.__name__, instance_cache)
            return instance_cache(*args, **kwargs)
        return cache_factory
    return cache_decorator
Use it like this:
class Foo:
    @instance_method_lru_cache()
    def times_2(self, bar):
        return bar * 2

foo1 = Foo()
foo2 = Foo()

print(foo1.times_2(2))
# creating cache
# 4
foo1.times_2(2)
# 4
print(foo2.times_2(2))
# creating cache
# 4
foo2.times_2(2)
# 4
Here's a gist on GitHub with some inline documentation.
These days, methodtools will work
from methodtools import lru_cache

class Test:
    @lru_cache(maxsize=16)
    def cached_method(self, x):
        return x + 5
You need to install methodtools
pip install methodtools
If you are still using py2, then functools32 is also required:
pip install functools32
