I conceptually want to do something easy: save a python object that I can access from another (different) program later.
But the problem is that is has a wrapper around it (the f(x) below) that is not being referenced in new environments.
After spending a whole 12 hours, I feel even more confused than when I started. I think "pickle"-ing or "dill" etc... is what I am supposed to do. But I am running up against the pickling problem. But reading online is getting me no where. (btw, i tried shap.save but it is having the same realm of problems and uses pickle anyways).
import shap, pickle
model = ... (some tensorflow function)
def f(X):
...
return model.predict(...).flatten()
explainer = shap.KernelExplainer(f, X.iloc[:50, :])
with open(f"/tmp/{file}.pkl", 'wb') as fil:
# explainer.save(fil)
pickle.dump(explainer, fil)
This does not work because it "cannot find
attribute 'f'". These look like the most promising articles I could find but I could not implement for my scenario.
http://gael-varoquaux.info/programming/decoration-in-python-done-right-decorating-and-pickling.html
Unable to load files using pickle and multiple modules
https://github.com/slundberg/shap/issues/295
Python: Can't pickle type X, attribute lookup failed
*** please provide suggestions on terminology in the comments for me to improve how I ask question because I do not know how to word my prompt.
Usually, in any of programming language, you don't save the object but instead, you save the attributes of an object into a file. Then you should make a function to read the file and fill the value into your object.
You can make your object into an installable package and import the package as an object into your program.
import pickle
class A(Object):
def __init__(self, var1, var2):
self.var1 = var1
sefl.var2 = var2
def dump_obj(self, fn):
pickle.dump(self, fn, pickle.HIGHEST_PROTOCOL)
in a file that you want to load your object and feed the data into your object
import pickle
import file_with_object_A
def load_A(fn):
with open(fn, 'rb') as file:
new_a = pickle.load(fn)
# or alternatively, you can load each attribute into a variable and feed it to your object
var1, var2 = file.readline(fn)
my_new_A = A(var1, var2)
Remember to import the file that you detail the function f(X) into your new code file, either in a way of the package or keep this file contains f(X) in the same directory with the current working file.
You can check out this answer Saving an Object (Data persistence)
Related
This is probably a relatively common concern but I haven't found any real answer so far.
Let's use an example to illustrate why Pickle is not satisfying here: I have a file which contains a class that takes a long time (tens of hours) to instantiate. Once instantiated, I then save it to a pickle file:
import pickle
import time
from foo.bar import baz
class MyClass:
def __init__(self):
self.message = baz.very_long_computation()
obj = MyClass()
with open('./obj.pickle', 'wb') as f:
pickle.dump(obj)
From a different file in the same repository, I want to load the instantiated object and use it, without having to wait:
import pickle
with open('./obj.pickle', 'rb') as f:
obj = pickle.load(f)
print(obj.message)
What can happen sometimes is that I change e.g. the baz module's location in the folder structure (without necessarily modify its content), for instance I move it from foo.bar.baz to qux.bar.baz.
Now, if I try to load obj.pickle again, Python will complain with a ModuleNotFoundError: No module named foo.
It seems that whenever I make the slightest refactoring, I have to completely re-instantiate MyClass, involving this long waiting time. In practice this is terribly annoying, hence my two questions:
is there any alternative to Pickle that wouldn't force re-instantiating?
is there a way to avoid this problem while still using Pickle?
Note: I understand that in the above example, I would simply have to save self.message into a text file and call it a day. In reality, it would be complex to persist the object's content in such a manner.
I have a problem loading objects via numpy.load after renaming a module.
Here's a simple example showing the problem.
Imagine having a class defined in mymodule.py:
class MyClass(object):
a = "ciao"
b = [1, 2, 3]
def __init__(self, value=2):
self.value = value
from a python session I can simply create an instance and save it:
import numpy as np
import mymodule
instance = mymodule.MyClass()
np.save("dump.npy", instance)
Loading the file works nicely (even from a fresh session started in the same folder):
np.load("dump.npy")
If I now rename the module:
mv mymodule.py mymodule2.py
the loading fails. This is expected, but I was hoping that by importing the module before loading:
import mymodule2 as mymodule
the object definition could be found ... but it does not work.
This means that:
1. I do not understand how it works
2. I am forced to keep a symbolic link to the renamed file in a project I am partially refactoring.
Is there anything else I can do do avoid the symbolic link solution ? and to avoid having the same problem in the future ?
Thanks a lot,
marco
[this is my first question here, sorry If I am doing something wrong]
NumPy uses pickle for arrays with objects, but adds a header on top of it. Therefore, you'll need to do a bit more than coding a custom Unpickler:
import pickle
from numpy.lib.format import read_magic, _check_version, _read_array_header
class RenamingUnpickler(pickle.Unpickler):
def find_class(self, module, name):
if module == 'mymodule':
module = 'mymodule2'
return super().find_class(module, name)
with open('dump.npy', 'rb') as fp:
version = read_magic(fp)
_check_version(version)
dtype = _read_array_header(fp, version)[2]
assert dtype.hasobject
print(RenamingUnpickler(fp).load())
Previously I defined an ElectrodePositionsModel class in the module gselu.py in the package gselu, and pickled the ElectrodePositionsModel objects into some files.
Some time later it was decided to refactor the project and the package name gselu was changed to ielu
When I attempt to unpickle the old pickle files with pickle.load(), the process fails with the error, 'module' object has no attribute 'ElectrodePositionsModel'. What I understand of the Unpicklers behavior is that this is because the pickle thinks it has stored an instance of gselu.gselu.ElectrodePositionsModel, and tries to therefore import this class from this module. When it doesn't exist, it gives up.
I think that I am supposed to add something to the module's init.py to tell it where the gselu.gselu.ElectrodePositionsModel is, but I can't get the pickle.load() function to give me any error message other than 'module' has no attribute 'ElectrodePositionsModel' and I can't figure out where I am supposed to provide the correct path to find it. The code that does the unpickling is in the same module file (gselu.py) as the ElectrodePositionsModel class.
When I load the pickle file in an ipython session and manually import ElectrodePositionsModel, it loads correctly.
How do I tell the pickler where to load this module?
I realise this questions is old, but I just ran into a similar problem.
What I did to solve it was to take the old code and unpickle the data using that.
Then instead of pickling directly the custom classes, I pickled CustomClass.__dict__ which only contained raw python.
This data could then easily be imported in the new module by doing
a = NewNameOfCustomClass()
a.__dict__ = pickle.load('olddata.p', 'rb')
This method works if your custom class only has standard variables (such as builtins or numpy arrays, etc).
I have a user-defined class 'myclass' that I store on file with the pickle module, but I am having problem unpickling it. I have about 20 distinct instances of the same structure, that I save in distinct files. When I read each file, the code works on some files and not on others, when I get the error:
'module' object has no attribute 'myclass'
I have generated some files today and some other yesterday, and my code only works on the files generated today (I have NOT changed class definition between yesterday and today).
I was wondering if maybe my method is not robust, if I am not doing things as I should do, for example maybe I cannot pickled user-defined class, and if this is introducing some randomness in the process.
Another issue could be that the files that I generated yesterday were generated on a different machine --- because I work on an academic cluster, I have some login nodes and some computing nodes, that differs by architecture. So I generated yesterday files on the computing nodes, and today files on the login nodes, and I am reading everything on the login nodes.
As suggested in some of the comments, I have installed dill and loaded it with import dill as pickle. Now I can read the files from computing nodes to login nodes of the same cluster. But if I try to read the files generated on the computing node of one cluster, on the login node of another cluster I cannot. I get KeyError: 'ClassType' in _load_type(name) in dill.py
Can it be because the python version is different? I have generated the files with python2.7 and I read them with python3.3.
EDIT:
I can read the pickled files, if I use everywhere python 2.7. Sadly, part of my code, written in python 3, is not automatically compatible with python 2.7 :(
Can you from mymodule import myclass? Pickling does not pickle the class, just a reference to it. To load a pickled object python must be able to find the class that was to be used to create the object.
eg.
import pickle
class A(object):
pass
obj = A()
pickled = pickle.dumps(obj)
_A = A; del A # hide class
try:
pickle.loads(pickled)
except AttributeError as e:
print(e)
A = _A # unhide class
print(pickle.loads(pickled))
Following the suggestion here, my package (or the directory containing my modules) is located at C:/Python34/Lib/site-packages. The directory contains an __init__.py and sys.path contains a path to the directory as shown.
Still I am getting the following error:
Traceback (most recent call last):
File "C:/Python34/Lib/site-packages/toolkit/window.py", line 6, in <module>
from catalogmaker import Catalog
File "C:\Python34\Lib\site-packages\toolkit\catalogmaker.py", line 1, in <module>
from patronmaker import Patron
File "C:\Python34\Lib\site-packages\toolkit\patronmaker.py", line 4, in <module>
class Patron:
File "C:\Python34\Lib\site-packages\toolkit\patronmaker.py", line 11, in Patron
patrons = pickle.load(f)
ImportError: No module named 'Patron'
I have a class in patronmaker.py named 'Patron' but no module named Patron so I am not sure what the last statement in the error message means. I very much appreciate your thoughts on what I am missing.
Python Version 3.4.1 on a Windows 32 bits machine.
You are saving all patron instances (i.e. self) to the Patron class attribute Patron.patrons. Then you are trying to pickle a class attribute from within the class. This can choke pickle, however I believe dill should be able to handle it. Is it really necessary to save all the class instances to a list in Patrons? It's a bit of an odd thing to do…
pickle serializes classes by reference, and doesn't play well with __main__ for many objects. In dill, you don't have to serialize classes by reference, and it can handle issues with __main__, much better. Get dill here: https://github.com/uqfoundation
Edit:
I tried your code (with one minor change) and it worked.
dude#hilbert>$ python patronmaker.py
Then start python…
>>> import dill
>>> f = open('patrons.pkl', 'rb')
>>> p = dill.load(f)
>>> p
[Julius Caeser, Kunte Kinta, Norton Henrich, Mother Teresa]
The only change I made was to uncomment the lines at the end of patronmaker.py so that it saved some patrons…. and I also replaced import pickle with import dill as pickle everywhere.
So, even by downloading and running your code, I can't produce an error with dill. I'm using the latest dill from github.
Additional Edit:
Your traceback above is from an ImportError. Did you install your module? If you didn't use setup.py to install it, or if you don't have your module on your PYTHONPATH, then you won't find your module regardless of how you are serializing things.
Even more edits:
Looking at your code, you should be using the singleton pattern for patrons… it should not be inside the class Patron. The block of code at the class level to load the patrons into Patron.patrons is sure to cause problems… and probably bound to be the source of some form of errors. I also see that you are pickling the attribute Patrons.patrons (not even the class itself) from inside the Patrons class -- this is madness -- don't do it. Also notice that when you are trying to obtain the patrons, you use Patron.patrons… this is calling the class object and not an instance. Move patrons outside of the class, and use the singleton directly as a list of patrons. Also you should typically be using the patrons instance, so if you wanted to have each patron know who all the other patrons are, p = Patron('Joe', 'Blow'), then p.patrons to get all patrons… but you'd need to write a Patrons.load method that reads the singleton list of patrons… you could also use a property to make the load give you something that looks like an attribute.
If you build a singleton of patrons (as a list)… or a "registry" of patrons (as a dict) if you like, then just check if a patrons pickle file exists… to load to the registry… and don't do it from inside the Patrons class… things should go much better. Your code currently is trying to load a class instance on a class definition while it builds that class object. That's bad...
Also, don't expect people to go downloading your code and debugging it for you, when you don't present a minimal test case or sufficient info for how the traceback was created.
You may have hit on a valid pickling error in dill for some dark corner case, but I can't tell b/c I can't reproduce your error. However, I can tell that you need some refactoring.
And just to be explicit:
Move your patrons initializing mess from Patrons into a new file patrons.py
import os
import dill as pickle
#Initialize patrons with saved pickle data
if os.path.isfile('patrons.pkl'):
with open("patrons.pkl", 'rb') as f:
patrons = pickle.load(f)
else: patrons = []
Then in patronmaker.py, and everywhere else you need the singleton…
import dill as pickle
import os.path
import patrons as the
class Patron:
def __init__(self, lname, fname):
self.lname = lname.title()
self.fname = fname.title()
self.terrCheckedOutHistory = {}
#Add any created Patron to patrons list
the.patrons.append(self)
#Preserve this person via pickle
with open('patrons.pkl', 'wb') as f:
pickle.dump(the.patrons, f)
And you should be fine unless your code is hitting one of the cases that attributes on modules can't be serialized because they were added dynamically (see https://github.com/uqfoundation/dill/pull/47), which should definitely make pickle fail, and in some cases dill too… probably with an AtrributeError on the module. I just can't reproduce this… and I'm done.