Following the suggestion here, my package (or the directory containing my modules) is located at C:/Python34/Lib/site-packages. The directory contains an __init__.py and sys.path contains a path to the directory as shown.
Still I am getting the following error:
Traceback (most recent call last):
  File "C:/Python34/Lib/site-packages/toolkit/window.py", line 6, in <module>
    from catalogmaker import Catalog
  File "C:\Python34\Lib\site-packages\toolkit\catalogmaker.py", line 1, in <module>
    from patronmaker import Patron
  File "C:\Python34\Lib\site-packages\toolkit\patronmaker.py", line 4, in <module>
    class Patron:
  File "C:\Python34\Lib\site-packages\toolkit\patronmaker.py", line 11, in Patron
    patrons = pickle.load(f)
ImportError: No module named 'Patron'
I have a class in patronmaker.py named 'Patron' but no module named Patron so I am not sure what the last statement in the error message means. I very much appreciate your thoughts on what I am missing.
Python version 3.4.1 on a 32-bit Windows machine.
You are saving all patron instances (i.e. self) to the Patron class attribute Patron.patrons. Then you are trying to pickle a class attribute from within the class. This can choke pickle; however, I believe dill should be able to handle it. Is it really necessary to save all the class instances to a list in Patron? It's a bit of an odd thing to do…
pickle serializes classes by reference, and doesn't play well with __main__ for many objects. In dill, you don't have to serialize classes by reference, and it can handle issues with __main__ much better. Get dill here: https://github.com/uqfoundation
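To make "by reference" concrete, here is a minimal sketch using only the standard library. The Fraction class is just a stand-in chosen for illustration, not part of the question's code. It lists the class references stored in a pickle: the payload names the module and the class but carries none of the class's code, which is why the class has to be importable when the pickle is loaded.

```python
import pickle
import pickletools
from fractions import Fraction

# Protocol 2 records class references with the GLOBAL opcode,
# whose argument is reported by pickletools as "<module> <name>".
payload = pickle.dumps(Fraction(1, 3), protocol=2)

references = [
    arg
    for opcode, arg, _pos in pickletools.genops(payload)
    if opcode.name == "GLOBAL"
]
print(references)

# Loading works only because 'fractions.Fraction' can be imported here.
restored = pickle.loads(payload)
```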
Edit:
I tried your code (with one minor change) and it worked.
dude@hilbert>$ python patronmaker.py
Then start python…
>>> import dill
>>> f = open('patrons.pkl', 'rb')
>>> p = dill.load(f)
>>> p
[Julius Caeser, Kunte Kinta, Norton Henrich, Mother Teresa]
The only change I made was to uncomment the lines at the end of patronmaker.py so that it saved some patrons…. and I also replaced import pickle with import dill as pickle everywhere.
So, even by downloading and running your code, I can't produce an error with dill. I'm using the latest dill from github.
Additional Edit:
Your traceback above is from an ImportError. Did you install your module? If you didn't use setup.py to install it, or if you don't have your module on your PYTHONPATH, then you won't find your module regardless of how you are serializing things.
Even more edits:
Looking at your code, you should be using the singleton pattern for patrons… it should not be inside the class Patron. The block of code at the class level that loads the patrons into Patron.patrons is sure to cause problems and is probably the source of some form of error. I also see that you are pickling the attribute Patron.patrons (not even the class itself) from inside the Patron class. This is madness; don't do it. Also notice that when you are trying to obtain the patrons, you use Patron.patrons… this accesses the class object and not an instance. Move patrons outside of the class, and use the singleton directly as a list of patrons. Also, you should typically be using a Patron instance, so if you wanted each patron to know who all the other patrons are, you would create p = Patron('Joe', 'Blow') and then use p.patrons to get all patrons… but you'd need to write a Patron.load method that reads the singleton list of patrons… you could also use a property to make the load give you something that looks like an attribute.
If you build a singleton of patrons (as a list)… or a "registry" of patrons (as a dict) if you like, then just check if a patrons pickle file exists… to load to the registry… and don't do it from inside the Patron class… things should go much better. Your code currently is trying to load a class instance on a class definition while it builds that class object. That's bad...
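The property idea can be sketched roughly as follows. This is a minimal illustration and not the asker's code: the module-level _registry list and the cut-down Patron class are assumptions made for the example, and saving to disk is left out entirely.

```python
# Hypothetical module-level registry shared by every Patron instance.
_registry = []


class Patron:
    def __init__(self, lname, fname):
        self.lname = lname.title()
        self.fname = fname.title()
        # Register the new instance outside of the class object itself.
        _registry.append(self)

    @property
    def patrons(self):
        """Return a copy of the registry, so it reads like an attribute."""
        return list(_registry)


p = Patron("blow", "joe")
q = Patron("doe", "jane")
everyone = p.patrons
```

Returning a copy keeps callers from mutating the registry by accident; whether that is desirable depends on how the rest of the program uses the list.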
Also, don't expect people to go downloading your code and debugging it for you, when you don't present a minimal test case or sufficient info for how the traceback was created.
You may have hit on a valid pickling error in dill for some dark corner case, but I can't tell b/c I can't reproduce your error. However, I can tell that you need some refactoring.
And just to be explicit:
Move your patrons initializing mess from Patron into a new file patrons.py
import os
import dill as pickle

# Initialize patrons with saved pickle data
if os.path.isfile('patrons.pkl'):
    with open("patrons.pkl", 'rb') as f:
        patrons = pickle.load(f)
else:
    patrons = []
Then in patronmaker.py, and everywhere else you need the singleton…
import dill as pickle
import os.path
import patrons as the

class Patron:
    def __init__(self, lname, fname):
        self.lname = lname.title()
        self.fname = fname.title()
        self.terrCheckedOutHistory = {}
        # Add any created Patron to patrons list
        the.patrons.append(self)
        # Preserve this person via pickle
        with open('patrons.pkl', 'wb') as f:
            pickle.dump(the.patrons, f)
And you should be fine unless your code is hitting one of the cases where attributes on modules can't be serialized because they were added dynamically (see https://github.com/uqfoundation/dill/pull/47), which should definitely make pickle fail, and in some cases dill too… probably with an AttributeError on the module. I just can't reproduce this… and I'm done.
This is probably a relatively common concern but I haven't found any real answer so far.
Let's use an example to illustrate why Pickle is not satisfying here: I have a file which contains a class that takes a long time (tens of hours) to instantiate. Once instantiated, I then save it to a pickle file:
import pickle
import time
from foo.bar import baz

class MyClass:
    def __init__(self):
        self.message = baz.very_long_computation()

obj = MyClass()
with open('./obj.pickle', 'wb') as f:
    pickle.dump(obj, f)  # pickle.dump needs the target file object
From a different file in the same repository, I want to load the instantiated object and use it, without having to wait:
import pickle

with open('./obj.pickle', 'rb') as f:
    obj = pickle.load(f)
print(obj.message)
What can happen sometimes is that I change e.g. the baz module's location in the folder structure (without necessarily modifying its content), for instance I move it from foo.bar.baz to qux.bar.baz.
Now, if I try to load obj.pickle again, Python will complain with a ModuleNotFoundError: No module named 'foo'.
It seems that whenever I make the slightest refactoring, I have to completely re-instantiate MyClass, involving this long waiting time. In practice this is terribly annoying, hence my two questions:
is there any alternative to Pickle that wouldn't force re-instantiating?
is there a way to avoid this problem while still using Pickle?
Note: I understand that in the above example, I would simply have to save self.message into a text file and call it a day. In reality, it would be complex to persist the object's content in such a manner.
I have a problem loading objects via numpy.load after renaming a module.
Here's a simple example showing the problem.
Imagine having a class defined in mymodule.py:
class MyClass(object):
    a = "ciao"
    b = [1, 2, 3]

    def __init__(self, value=2):
        self.value = value
from a python session I can simply create an instance and save it:
import numpy as np
import mymodule
instance = mymodule.MyClass()
np.save("dump.npy", instance)
Loading the file works nicely (even from a fresh session started in the same folder):
np.load("dump.npy")
If I now rename the module:
mv mymodule.py mymodule2.py
the loading fails. This is expected, but I was hoping that by importing the module before loading:
import mymodule2 as mymodule
the object definition could be found ... but it does not work.
This means that:
1. I do not understand how it works
2. I am forced to keep a symbolic link to the renamed file in a project I am partially refactoring.
Is there anything else I can do to avoid the symbolic link solution, and to avoid having the same problem in the future?
Thanks a lot,
marco
[this is my first question here, sorry If I am doing something wrong]
NumPy uses pickle for arrays with objects, but adds a header on top of it. Therefore, you'll need to do a bit more than coding a custom Unpickler:
import pickle
from numpy.lib.format import read_magic, _check_version, _read_array_header

class RenamingUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if module == 'mymodule':
            module = 'mymodule2'
        return super().find_class(module, name)

with open('dump.npy', 'rb') as fp:
    version = read_magic(fp)
    _check_version(version)
    dtype = _read_array_header(fp, version)[2]
    assert dtype.hasobject
    print(RenamingUnpickler(fp).load())
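As for why "import mymodule2 as mymodule" does not help: the alias only binds a name in the importing namespace, while unpickling looks the module up by its recorded name through the import system. The sketch below illustrates this with the standard json module and a made-up alias, demo_alias, chosen purely for the example.

```python
import importlib
import sys
import json as demo_alias  # 'demo_alias' is an arbitrary, hypothetical alias

# The alias is only a local name; the import system knows the real name.
alias_is_registered = "demo_alias" in sys.modules
real_name_is_registered = "json" in sys.modules

# Registering the module object under an extra key is what makes a
# lookup by that name (and therefore unpickling) succeed.
sys.modules["demo_alias"] = demo_alias
found = importlib.import_module("demo_alias")
del sys.modules["demo_alias"]  # remove the temporary entry again
```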
Previously I defined an ElectrodePositionsModel class in the module gselu.py in the package gselu, and pickled the ElectrodePositionsModel objects into some files.
Some time later it was decided to refactor the project and the package name gselu was changed to ielu.
When I attempt to unpickle the old pickle files with pickle.load(), the process fails with the error, 'module' object has no attribute 'ElectrodePositionsModel'. What I understand of the Unpickler's behavior is that this is because the pickle thinks it has stored an instance of gselu.gselu.ElectrodePositionsModel, and therefore tries to import this class from this module. When it doesn't exist, it gives up.
I think that I am supposed to add something to the module's __init__.py to tell it where the gselu.gselu.ElectrodePositionsModel is, but I can't get the pickle.load() function to give me any error message other than 'module' has no attribute 'ElectrodePositionsModel' and I can't figure out where I am supposed to provide the correct path to find it. The code that does the unpickling is in the same module file (gselu.py) as the ElectrodePositionsModel class.
When I load the pickle file in an ipython session and manually import ElectrodePositionsModel, it loads correctly.
How do I tell the pickler where to load this module?
I realise this question is old, but I just ran into a similar problem.
What I did to solve it was to take the old code and unpickle the data using that.
Then instead of pickling directly the custom classes, I pickled CustomClass.__dict__ which only contained raw python.
This data could then easily be imported in the new module by doing
a = NewNameOfCustomClass()
with open('olddata.p', 'rb') as f:  # pickle.load takes an open file, not a path
    a.__dict__ = pickle.load(f)
This method works if your custom class only has standard variables (such as builtins or numpy arrays, etc).
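A minimal end-to-end sketch of this approach, under the assumption that the instance attributes are plain Python data. OldName and NewName are hypothetical stand-ins for the class before and after the refactoring; in practice step 1 would run against the old code base and step 2 against the new one.

```python
import pickle


class OldName:  # stands in for the class under its old import path
    def __init__(self):
        self.positions = [1.0, 2.5]
        self.label = "grid"


class NewName:  # stands in for the refactored class
    def __init__(self):
        self.positions = []
        self.label = ""


# Step 1 (old code): pickle only the attribute dict, not the instance.
state_bytes = pickle.dumps(vars(OldName()))

# Step 2 (new code): load the dict into a freshly created instance.
restored = NewName()
restored.__dict__.update(pickle.loads(state_bytes))
```

Because only built-in types are pickled, the payload contains no reference to either class, so later renames cannot break it.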
(There are many similar and more generic questions, been trying the solutions from them after reading through them, can't get them working so asking here as a more situation-specific version of what I'm seeing)
I think I am really misunderstanding how Python does OOP due to my more C#/C++ background. So here's what I'm trying to do right this moment.
I'm starting with two modules to set up the rest of my project, partially as a sanity-check and proof-of-concept. One module logs things to a file as I go while also storing data from multiple modules (to eventually package them all and dump them on request) Doing all this in PyCharm and mentioning the error warnings it suggests by the way, and using Python 2.7
Module 1:
src\helpers\logHelpers.py
class LogHelpers:
    class log:
        def classEnter():
            #doing stuff
    def __init__(self):
        self.myLog = LogHelpers.log() #forgot to mention this was here initially
    [..] various logging functions and variables to summarize what's happening
__builtin__.mylogger = LogHelpers
Module 2:
src\ULTs\myULTs.py
mylogger.myLog.classEnter()
(both the modules and the root src\ have an empty __init__.py file in them)
So according to the totally awesome response here ( Python - Visibility of global variables in imported modules ) at this stage this should be working, but 'mylogger' becomes an 'unresolved reference'
So that was one approach. I also tried the more straight forward global one ( Python: How to make a cross-module variable? )
Module 1:
src\helpers\logHelpers.py
class LogHelpers:
    class log:
        def classEnter(self):
            #doing stuff
    def __init__(self):
        self.myLog = LogHelpers.log() #forgot to mention this was here initially
    [..] various logging functions and variables to summarize what's happening
mylogger = LogHelpers
__init__.py
__all__ = ['LogHelpers', hexlogger]
from .logHelpers import *
Module 2:
src\ULTs\myULTs.py
from helpers import mylogger
mylogger.myLog.classEnter()
This version gets a "parameter 'self' unfilled" error on the classEnter, which various reports seem to indicate means that mylogger is un-initialized (misleading error code but that's what it seems to mean)
And then I tried this..
Module 1:
src\helpers\logHelpers.py
class LogHelpers:
    class log:
        def classEnter(self):
            #doing stuff
    def __init__(self):
        self.myLog = LogHelpers.log() #forgot to mention this was here initially
    [..] various logging functions and variables to summarize what's happening
__mylogger = LogHelpers
__init__.py
__all__ = ['LogHelpers', hexlogger]
from .logHelpers import *
Module 2:
src\ULTs\myULTs.py
from helpers import mylogger
def someFunction(self):
    global mylogger
    mylogger.myLog.classEnter()
And this version gets the 'Global variable is undefined at the module level' error when I hover over global mylogger.
Then there is the idea of each other module tracking its own instance of a class apparently, if I end up having to I can go with that method and coordinate them.. but that's kind of a hack considering what I'm trying to do.
That's kind of where I'm at, that's the gist of what I'm trying to do... I'm reading through as many similar questions as I can but all of them seem to come back to these kinds of solutions (which don't seem to be working) or saying 'don't do that' (which is generally good advice but I'm not really grokking the preferred Pythony way of keeping multiple ongoing non-static classes organized for a large project - other than shoving them all in one directory)
Thoughts? (How badly am I mangling Python here?)
[EDIT] Based on feedback tried a mini version that eliminated the inner classes completely:
Ok, so did a local mini-class based on what you said:
class testClass:
    def __init__(self):
        self.testVar = 2
    def incrementVar(self):
        self.testVar += 1

myClass = testClass()
Set it up via __init__.py
__all__ = [myClass]
from .logHelpers import myClass
Went to other module and
from helpers import myClass
class Test_LogHelpers(unittest.TestCase):
    def test_mini(self):
        myClass.incrementVar()
Ran it directly instead of looking at PyCharm, no Global anything.. NameError: name 'myClass' is not defined
So still at square one :( (and still need to store state)
[EDIT] Adding Traceback:
Traceback (most recent call last):
  File "C:\Program Files (x86)\JetBrains\PyCharm Community Edition 3.4.1\helpers\pycharm\utrunner.py", line 124, in <module>
    module = loadSource(a[0])
  File "C:\Program Files (x86)\JetBrains\PyCharm Community Edition 3.4.1\helpers\pycharm\utrunner.py", line 40, in loadSource
    module = imp.load_source(moduleName, fileName)
  File "C:\[...mylocation...]\py\src\ULTs\LogHelpers_ULT.py", line 3, in <module>
    from helpers import myClass
  File "C:\[...mylocation...]\py\src\helpers\__init__.py", line 7, in <module>
    __all__ = [myClass]
NameError: name 'myClass' is not defined
============================================================================
kk, I got it working with the miniclass. I don't know why the other approach / approaches was not working, but this seemed to fix things.
(Resources: http://docs.python-guide.org/en/latest/writing/structure/ , http://mikegrouchy.com/blog/2012/05/be-pythonic-__init__py.html )
**logHelpers.py**
[... some static logging functionality ...]
class testClass:
    def __init__(self):
        self.testVar = 2
    def incrementVar(self, source):
        self.testVar += 1
        mylogger.myLog.info(source + " called, new val: " + str(self.testVar))

myClass = testClass()
**test_LogHelpers_ULT.py**
import unittest
from helpers.logHelpers import myClass

class Test_LogHelpers(unittest.TestCase):
    def test_mini(self):
        myClass.incrementVar("LogHelpers")
For some reason skipping the __init__.py (and leaving it blank) and going for the explicit importation worked. It also maintained state - I created a duplicate of the test file and my log output correctly had '3' for the first file to call the helper, and '4' for the second file to call the helper.
Thanks Daniel Roseman for the help and suggestions, they had me look a bit more in the right direction. If you can spot why the previous stuff wasn't working it would be much appreciated, just to add to my understanding of this language, but I'm gonna go ahead and mark your answer as 'Answered' since it had some very useful feedback.
Before I start, note that the PyCharm warnings are not actual Python errors: if you ran your code, you would probably get more useful feedback (remember static analysis of a dynamic language like Python can only get you so far, many things can't be resolved until you actually run the code).
Firstly, it's really not clear why you have nested classes here. The outer class seems completely useless; you should remove it.
The reason for the error message about "self" is that you have defined an instance method, which can only be called on an instance of log. You could make mylogger (absolutely no need for the double-underscore prefix) an instance: mylogger = log() - and then import that, or import the class and instantiate it where it is used.
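The difference between calling the method on the class and on an instance can be seen in a small sketch. It assumes Python 3 (the question uses 2.7, where the error message is worded differently), and Log and class_enter are illustrative names rather than the asker's actual code.

```python
class Log:
    def class_enter(self):
        return "entered"


mylogger = Log()             # an instance, as recommended above
ok = mylogger.class_enter()  # 'self' is filled in automatically

try:
    Log.class_enter()        # called on the class: nothing fills 'self'
except TypeError as exc:
    error = str(exc)
```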
So in your first snippet, the error message is quite clear: you have not defined mylogger. Using my recommendation above, you can do from helpers import mylogger and then directly call mylogger.classEnter().
Finally, I can't see what that global statement is doing in someFunction. There's no need to declare a name as global unless you plan to reassign it within your scope and have that reassignment reflected in the global scope. You're not doing that here, so no need for global.
By the way, you should also question whether you even need the inner log class. Generally speaking, classes are only useful when you need to store some kind of state in the object. Here, as your docstring says, you have a collection of utility methods. So why put them in a class? Just make them top-level functions inside the logHelpers module (incidentally, Python style prefers lower_case_with_underscore for module names, so it should be "log_helpers.py").
I've recently changed my program's directory layout: before, I had all my modules inside the "main" folder. Now, I've moved them into a directory named after the program, and placed an __init__.py there to make a package.
Now I have a single .py file in my main directory that is used to launch my program, which is much neater.
Anyway, trying to load in pickled files from previous versions of my program is failing. I'm getting, "ImportError: No module named tools" - which I guess is because my module was previously in the main folder, and now it's in whyteboard.tools, not simply plain tools. However, the code that is importing in the tools module lives in the same directory as it, so I doubt there's a need to specify a package.
So, my program directory looks something like this:
whyteboard-0.39.4
-->whyteboard.py
-->README.txt
-->CHANGELOG.txt
---->whyteboard/
---->whyteboard/__init__.py
---->whyteboard/gui.py
---->whyteboard/tools.py
whyteboard.py launches a block of code from whyteboard/gui.py, that fires up the GUI. This pickling problem definitely wasn't happening before the directory re-organizing.
As pickle's docs say, in order to save and restore a class instance (actually a function, too), you must respect certain constraints:
"pickle can save and restore class instances transparently, however the class definition must be importable and live in the same module as when the object was stored"
whyteboard.tools is not "the same module as" tools (even though it can be imported by import tools by other modules in the same package, it ends up in sys.modules as sys.modules['whyteboard.tools']: this is absolutely crucial, otherwise the same module imported by one in the same package vs one in another package would end up with multiple and possibly conflicting entries!).
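The same bookkeeping can be observed with any package from the standard library. In this sketch json.decoder merely plays the role of whyteboard.tools: the class records its full dotted module name, and that full name is the key in sys.modules.

```python
import sys
from json.decoder import JSONDecoder

# The module path a pickle would record for instances of this class.
recorded_module = JSONDecoder.__module__

# The module is registered under its full dotted name, not the short
# name 'decoder' that sibling modules inside the package might use.
full_name_registered = "json.decoder" in sys.modules

print(recorded_module, full_name_registered)
```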
If your pickle files are in a good/advanced format (as opposed to the old ascii format that's the default only for compatibility reasons), migrating them once you perform such changes may in fact not be quite as trivial as "editing the file" (which is binary &c...!), despite what another answer suggests. I suggest that, instead, you make a little "pickle-migrating script": let it patch sys.modules like this...:
import sys
from whyteboard import tools
sys.modules['tools'] = tools
and then cPickle.load each file, del sys.modules['tools'], and cPickle.dump each loaded object back to file: that temporary extra entry in sys.modules should let the pickles load successfully, then dumping them again should be using the right module-name for the instances' classes (removing that extra entry should make sure of that).
This can be done with a custom "unpickler" that uses find_class():
import io
import pickle
class RenameUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        renamed_module = module
        if module == "tools":
            renamed_module = "whyteboard.tools"
        return super(RenameUnpickler, self).find_class(renamed_module, name)

def renamed_load(file_obj):
    return RenameUnpickler(file_obj).load()

def renamed_loads(pickled_bytes):
    file_obj = io.BytesIO(pickled_bytes)
    return renamed_load(file_obj)
Then you'd need to use renamed_load() instead of pickle.load() and renamed_loads() instead of pickle.loads().
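For several renames at once, the same find_class() hook can be driven by a dictionary. The following is a self-contained sketch rather than a drop-in tool: the module names demo_old_tools and demo_new_tools, the Pen class, and the fake modules placed in sys.modules are all assumptions made so the example can run without any files on disk.

```python
import io
import pickle
import sys
import types

# Hypothetical module names used only for this demonstration.
OLD_NAME, NEW_NAME = "demo_old_tools", "demo_new_tools"


class Pen:
    # Pretend this class was defined in the old module.
    __module__ = OLD_NAME
    __qualname__ = "Pen"

    def __init__(self, colour):
        self.colour = colour


# Register a fake "old" module so an instance can be pickled from it.
old_module = types.ModuleType(OLD_NAME)
old_module.Pen = Pen
sys.modules[OLD_NAME] = old_module
payload = pickle.dumps(Pen("red"))

# Simulate the refactoring: the class now lives under the new name only.
del sys.modules[OLD_NAME]
Pen.__module__ = NEW_NAME
new_module = types.ModuleType(NEW_NAME)
new_module.Pen = Pen
sys.modules[NEW_NAME] = new_module


class MappingUnpickler(pickle.Unpickler):
    """Redirect module names listed in `renames` before the lookup."""

    renames = {OLD_NAME: NEW_NAME}

    def find_class(self, module, name):
        return super().find_class(self.renames.get(module, module), name)


# A plain load still looks for the old module name and fails.
try:
    pickle.loads(payload)
    plain_load_failed = False
except ImportError:
    plain_load_failed = True

pen = MappingUnpickler(io.BytesIO(payload)).load()
```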
Happened to me, solved it by adding the new location of the module to sys.path before loading pickle:
import pickle
import sys

sys.path.append('path/to/whyteboard')
with open("pickled_file", "rb") as f:
    obj = pickle.load(f)
pickle serializes classes by reference, so if you change where the class lives, it will not unpickle because the class will not be found. If you use dill instead of pickle, then you can serialize classes by reference or directly (by directly serializing the class instead of its import path). You can simulate this pretty easily by just changing the class definition after a dump and before a load.
Python 2.7.8 (default, Jul 13 2014, 02:29:54)
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>>
>>> class Foo(object):
...     def bar(self):
...         return 5
...
>>> f = Foo()
>>>
>>> _f = dill.dumps(f)
>>>
>>> class Foo(object):
...     def bar(self, x):
...         return x
...
>>> g = Foo()
>>> f_ = dill.loads(_f)
>>> f_.bar()
5
>>> g.bar(4)
4
This is the normal behavior of pickle, unpickled objects need to have their defining module importable.
You should be able to change the modules path (i.e. from tools to whyteboard.tools) by editing the pickled files, as they are normally simple text files.
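Whether a pickle is hand-editable depends on the protocol it was written with, which the following sketch checks. Fraction is only a stand-in object from the standard library. Protocol 0 output is ASCII text with the module and class name on readable lines, whereas protocol 2 and later (including every default protocol in Python 3) are binary, so a text editor is not a safe tool for those.

```python
import pickle
from fractions import Fraction

value = Fraction(1, 3)  # stand-in object, chosen only for illustration

text_payload = pickle.dumps(value, protocol=0)
binary_payload = pickle.dumps(value, protocol=2)

# Protocol 0 is plain ASCII; the class reference is a readable line pair.
readable = text_payload.decode("ascii")
has_text_reference = "cfractions\nFraction\n" in readable

# Protocol 2 and later start with a non-ASCII PROTO opcode byte (0x80).
starts_with_proto_byte = binary_payload[:1] == b"\x80"

print(has_text_reference, starts_with_proto_byte)
```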
When you load a pickle file that contains a class reference, the class must be importable under the same module path it had when the pickle was saved. If you want to use the pickle somewhere else, you have to tell Python where that class or other object lives; adding the folder to sys.path as below can save the day:
import sys
sys.path.append('path/to/folder containing the python module')
For people like me needing to update lots of pickle dumps, here's a function implementing @Alex Martelli's excellent advice:
import sys
from types import ModuleType
import pickle
# import torch

def update_module_path_in_pickled_object(
    pickle_path: str, old_module_path: str, new_module: ModuleType
) -> None:
    """Update a python module's dotted path in a pickle dump if the
    corresponding file was renamed.

    Implements the advice in https://stackoverflow.com/a/2121918.

    Args:
        pickle_path (str): Path to the pickled object.
        old_module_path (str): The old.dotted.path.to.renamed.module.
        new_module (ModuleType): from new.location import module.
    """
    sys.modules[old_module_path] = new_module
    dic = pickle.load(open(pickle_path, "rb"))
    # dic = torch.load(pickle_path, map_location="cpu")
    del sys.modules[old_module_path]
    pickle.dump(dic, open(pickle_path, "wb"))
    # torch.save(dic, pickle_path)
In my case, the dumps were PyTorch model checkpoints. Hence the commented-out torch.load/save().
Example
from new.location import new_module
for pickle_path in ('foo.pkl', 'bar.pkl'):
    update_module_path_in_pickled_object(
        pickle_path, "old.module.dotted.path", new_module
    )