I have a problem loading objects via numpy.load after renaming a module.
Here's a simple example showing the problem.
Imagine having a class defined in mymodule.py:
class MyClass(object):
    a = "ciao"
    b = [1, 2, 3]
    def __init__(self, value=2):
        self.value = value
from a python session I can simply create an instance and save it:
import numpy as np
import mymodule
instance = mymodule.MyClass()
np.save("dump.npy", instance)
Loading the file works nicely (even from a fresh session started in the same folder):
np.load("dump.npy")
If I now rename the module:
mv mymodule.py mymodule2.py
the loading fails. This is expected, but I was hoping that by importing the module before loading:
import mymodule2 as mymodule
the object definition could be found ... but it does not work.
This means that:
1. I do not understand how it works
2. I am forced to keep a symbolic link to the renamed file in a project I am partially refactoring.
Is there anything else I can do to avoid the symbolic link solution, and to avoid having the same problem in the future?
Thanks a lot,
marco
[this is my first question here, sorry If I am doing something wrong]
NumPy uses pickle for arrays with objects, but adds a header on top of it. Therefore, you'll need to do a bit more than coding a custom Unpickler:
import pickle
from numpy.lib.format import read_magic, _check_version, _read_array_header
class RenamingUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if module == 'mymodule':
            module = 'mymodule2'
        return super().find_class(module, name)
with open('dump.npy', 'rb') as fp:
    version = read_magic(fp)
    _check_version(version)
    dtype = _read_array_header(fp, version)[2]
    assert dtype.hasobject
    print(RenamingUnpickler(fp).load())
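To see why import mymodule2 as mymodule does not help, here is a small sketch that uses plain pickle and two in-memory modules instead of NumPy and real files. The names old_mod_demo, new_mod_demo, make_module and RENAMED are made up for the illustration. The pickle stores only the module name and class name, and on load that name is looked up through the import system (sys.modules), not through whatever alias you bound in your own session. So either the lookup is redirected (the custom Unpickler) or the old name is registered in sys.modules.

```python
import io
import pickle
import sys
import types

CLASS_SOURCE = (
    "class MyClass(object):\n"
    "    def __init__(self, value=2):\n"
    "        self.value = value\n"
)

def make_module(name):
    """Create an in-memory module defining MyClass and register it (hypothetical helper)."""
    module = types.ModuleType(name)
    exec(CLASS_SOURCE, module.__dict__)
    sys.modules[name] = module
    return module

# 1. Pickle an instance while the class lives in "old_mod_demo".
old_mod = make_module("old_mod_demo")
data = pickle.dumps(old_mod.MyClass(value=5))

# 2. Simulate the rename: the old module is gone, a new one defines the class.
del sys.modules["old_mod_demo"]
new_mod = make_module("new_mod_demo")

# 3. A plain load fails because the pickle still refers to "old_mod_demo".
try:
    pickle.loads(data)
    plain_load_failed = False
except ImportError:  # ModuleNotFoundError is a subclass of ImportError
    plain_load_failed = True

# 4a. Redirect the lookup with a custom Unpickler.
class RenamingUnpickler(pickle.Unpickler):
    RENAMED = {"old_mod_demo": "new_mod_demo"}  # old name -> new name

    def find_class(self, module, name):
        return super().find_class(self.RENAMED.get(module, module), name)

restored = RenamingUnpickler(io.BytesIO(data)).load()

# 4b. Alternatively, register the new module under the old name.
sys.modules["old_mod_demo"] = new_mod
aliased = pickle.loads(data)

print(plain_load_failed, restored.value, type(restored).__module__, aliased.value)
```

The sys.modules alias in step 4b is the less invasive option when you cannot control how the file is read, but it is process-wide, so it is probably best kept in one clearly marked compatibility spot.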
I have my own repository created in BitBucket.
In that repository, I have a file named core.py and an __init__.py file
I tried to import the core module, and I fixed all the requirements that were needed.
Now I am finally able to import the module (which is just one big class) in IPython, but when I call:
obj = MyClass()
I get an error:
name 'MyClass()' is not defined
even though it seems the module was imported.
Let me know if more information is needed.
As you stated in your comment, you are importing core.py:
from mintigocloudstorage import core
That means, you also have to tell your script where to find your class:
obj = core.MyClass()
If the import was successful, as you say, Python should now be able to locate your class definition.
Alternatively you can also import your class:
from mintigocloudstorage.core import MyClass
obj = MyClass()
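The same distinction can be tried with the standard library, using the json package and its decoder submodule as stand-ins for mintigocloudstorage and core (the substitution is only for illustration). Importing the submodule binds the submodule's name, not the names of the classes inside it:

```python
from json import decoder  # binds the name "decoder" only

try:
    JSONDecoder()  # the class name itself has not been bound yet
except NameError as exc:
    error_message = str(exc)

# Works: reach the class through the module that was imported.
via_module = decoder.JSONDecoder()

# Also works: bind the class name directly.
from json.decoder import JSONDecoder
direct = JSONDecoder()

print(error_message)
print(type(via_module) is type(direct))
```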
I was writing some code in Python and got stuck on a question. It seems trivial, but I can't get past it. When I import a module and use it as below:
import math
print math.sqrt(9)
Here I see math (the module) as a class which has a method sqrt(). If that is the case, how can I use the class directly without creating an object of it? I am basically unable to understand the distinction between class and object here.
Modules are more like objects, not like classes. You don't "instantiate" a module, there's only one instance of each module and you can access it using the import statement.
Specifically, modules are objects of type 'module':
>>> import math
>>> type(math)
<type 'module'>
Each module is going to have a different set of variables and methods.
Modules are instantiated by Python, whenever they are first imported. Modules that have been instantiated are stored in sys.modules:
>>> import sys
>>> 'math' in sys.modules
False
>>> import math
>>> 'math' in sys.modules
True
>>> sys.modules['math'] is math
True
AFAIK all Python modules (math and a million more) are instantiated when you import them. How many times are they instantiated, you ask? Just once! All modules are singletons.
Just stating that isn't enough, so let's dive into it.
Create a Python module (a module is basically any file ending with the ".py" extension), say "p.py", containing the following code:
In p.py
print "Instantiating p.py module. Please wait..."
# your good pythonic optimized functions and classes go here
print "Instantiating of p.py module is complete."
and in q.py try importing it
import p
and when you run q.py you will see..
Instantiating p.py module. Please wait...
Instantiating of p.py module is complete.
Now have you created an instance of it ? NO! But still you have it up and running ready to be used.
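The "just once" part can be checked with a small Python 3 sketch. It writes a throwaway module (the name p_demo is made up here) into a temporary directory, imports it twice, and counts how often the module body printed its message:

```python
import contextlib
import importlib
import io
import os
import sys
import tempfile

SOURCE = 'print("executing p_demo body")\n'

with tempfile.TemporaryDirectory() as tmp:
    # Write the throwaway module to disk.
    with open(os.path.join(tmp, "p_demo.py"), "w") as fh:
        fh.write(SOURCE)

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    captured = io.StringIO()
    try:
        # Capture anything the module body prints while it is imported.
        with contextlib.redirect_stdout(captured):
            first = importlib.import_module("p_demo")
            second = importlib.import_module("p_demo")  # served from sys.modules
    finally:
        sys.path.remove(tmp)

runs = captured.getvalue().count("executing p_demo body")
print("body executed", runs, "time(s); same object:", first is second)
```

The second import does not re-run the file; it just hands back the object already stored in sys.modules.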
In your case math is not a class. When you import math the whole module math is imported. You can see it like the inclusion of a library (the concept of it).
If you only want the name sqrt in your namespace (rather than the name math), you can do something like this:
from math import sqrt
print sqrt(9)
This way only the name sqrt is bound in your program's namespace. Note that Python still loads the whole math module behind the scenes; the only thing that changes is which names you can refer to directly.
Here I see math (the module) as a class which has a method sqrt(). If that is the case, how can I use the class directly without creating an object of it? I am basically unable to understand the distinction between class and object here.
When you import a module, the module object is created. Just like when you use open('file.txt') a file object will be created.
You can use a class without creating an object from it by referencing the class name:
class A:
    value = 2 + 2
A.value
Class A is itself an object of the class type, the built-in class used to create classes. Everything in Python is an object.
When you call the class, A(), that's how you create an object from it. Objects are also created by statements: import creates a module object, def creates a function object, class creates a class object (which in turn creates other objects), and so on.
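A quick way to convince yourself of this is to ask each of these things for its type. A minimal Python 3 sketch (the function f is just a placeholder):

```python
import math

class A:
    value = 2 + 2

def f():
    pass

# A class attribute is reachable without creating an instance.
class_attr = A.value

# Every one of these is an object with its own type.
kinds = {
    "A": type(A).__name__,      # the class object, created by the class statement
    "A()": type(A()).__name__,  # an instance, created by calling the class
    "math": type(math).__name__,  # the module object, created by import
    "f": type(f).__name__,      # the function object, created by def
}

for label, kind in kinds.items():
    print(label, "is an object of type", kind)
```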
Let's say I have the following two classes in module a:
class Real(object):
    ...
    def print_stuff(self):
        print 'real'
class Fake(Real):
    def print_stuff(self):
        print 'fake'
Module b uses the Real class:
from a import Real
Real().print_stuff()
How do I monkey patch so that when b imports Real it's actually swapped with Fake?
I tried this in my initialize script, but it doesn't work:
if env == 'dev':
    from a import Real, Fake
    Real = Fake
My purpose is to use the Fake class in development mode.
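You can use patch from the mock module (in Python 3 it ships as unittest.mock). Here is an example:
from mock import patch  # Python 3: from unittest.mock import patch
from yourpackage import b
with patch('yourpackage.b.Real') as fake_real:
    # While the block is active, Real() inside b returns this Fake instance
    fake_real.return_value = Fake()
    foo = b.someClass()
    foo.somemethod()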
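For a version that runs without any package layout, here is a sketch in Python 3 using unittest.mock.patch.object on an in-memory module. The module b_demo and the function use_real are made up for the illustration, and print_stuff returns a string instead of printing so the result can be inspected. The patch is temporary and is undone when the with block exits:

```python
import types
from unittest.mock import patch

class Real:
    def print_stuff(self):
        return "real"

class Fake(Real):
    def print_stuff(self):
        return "fake"

# Stand-in for module b, which holds its own reference to Real.
b_demo = types.ModuleType("b_demo")
b_demo.Real = Real

def use_real():
    # Looks Real up on the module each time it is called.
    return b_demo.Real().print_stuff()

with patch.object(b_demo, "Real", Fake):
    inside = use_real()   # sees Fake while the patch is active

outside = use_real()      # the original class is restored afterwards

print(inside, outside)
```

Note that the name being replaced is the one in the module that uses the class (b here), not the one in the module that defines it.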
The issue is that when you do -
from a import Real, Fake
You are basically importing those two classes into your initialize script's namespace and creating Real and Fake names in the initialize script's namespace. Then you make the name Real in initialize script point to Fake , but that does not change anything in the actual a module.
If the initialize script is another .py module/script that runs at the start of your original program, then you can use the following:
if env == 'dev':
    import a
    a.Real = a.Fake
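Note that this makes a.Real refer to the Fake class whenever Real is looked up on module a after the above line has executed. A module that already ran from a import Real before this line keeps its reference to the original class, so the initialize script has to run before such modules are imported.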
Though I would suggest that a better way would be to do this in your a module itself, by making it possible to check the env in that module, as -
if <someothermodule>.env == 'dev':
    Real = Fake
As was asked in the comments -
Doesn't import a also import into initialize script's namespace? What's the difference between importing modules and classes?
When you import just the class with from a import Real, you create a variable named Real in your own module's namespace (the module doing the import). Pointing that variable at something new only changes that namespace; the original class in its original module object is not affected.
But when you do import a, you import the module object a itself. While importing, the module object is also cached in the sys.modules dictionary, so any later import of a from any other module gets this cached object. (from a import something also imports a internally and caches it in sys.modules, but that detail is not important here.)
Then, when you do a.Real = <something>, you change the Real attribute of the module object a so that it points to something else. This mutates the module a directly, so the change is also visible when a is imported from some other module.
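The difference between rebinding a local name and mutating the module can be shown in a few lines. This is only a sketch in Python 3: the module a_demo is built in memory as a stand-in for a, and print_stuff returns a string so the result can be inspected:

```python
import sys
import types

SOURCE = (
    "class Real(object):\n"
    "    def print_stuff(self):\n"
    "        return 'real'\n"
    "class Fake(Real):\n"
    "    def print_stuff(self):\n"
    "        return 'fake'\n"
)

# Build an in-memory stand-in for module a and register it for import.
a_demo = types.ModuleType("a_demo")
exec(SOURCE, a_demo.__dict__)
sys.modules["a_demo"] = a_demo

# Attempt 1: rebinding a name that was imported with from ... import.
from a_demo import Real, Fake
Real = Fake                                   # changes only this namespace
after_rebind = a_demo.Real().print_stuff()    # module is untouched

# Attempt 2: mutating the module object itself.
a_demo.Real = a_demo.Fake
from a_demo import Real as late_import        # any import from now on sees Fake
after_patch = late_import().print_stuff()

print(after_rebind, after_patch)
```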
Following the suggestion here, my package (or the directory containing my modules) is located at C:/Python34/Lib/site-packages. The directory contains an __init__.py and sys.path contains a path to the directory as shown.
Still I am getting the following error:
Traceback (most recent call last):
File "C:/Python34/Lib/site-packages/toolkit/window.py", line 6, in <module>
from catalogmaker import Catalog
File "C:\Python34\Lib\site-packages\toolkit\catalogmaker.py", line 1, in <module>
from patronmaker import Patron
File "C:\Python34\Lib\site-packages\toolkit\patronmaker.py", line 4, in <module>
class Patron:
File "C:\Python34\Lib\site-packages\toolkit\patronmaker.py", line 11, in Patron
patrons = pickle.load(f)
ImportError: No module named 'Patron'
I have a class in patronmaker.py named 'Patron' but no module named Patron so I am not sure what the last statement in the error message means. I very much appreciate your thoughts on what I am missing.
Python Version 3.4.1 on a Windows 32 bits machine.
You are saving all patron instances (i.e. self) to the Patron class attribute Patron.patrons. Then you are trying to pickle a class attribute from within the class. This can choke pickle, however I believe dill should be able to handle it. Is it really necessary to save all the class instances to a list in Patrons? It's a bit of an odd thing to do…
pickle serializes classes by reference, and doesn't play well with __main__ for many objects. In dill, you don't have to serialize classes by reference, and it handles issues with __main__ much better. Get dill here: https://github.com/uqfoundation
Edit:
I tried your code (with one minor change) and it worked.
dude#hilbert>$ python patronmaker.py
Then start python…
>>> import dill
>>> f = open('patrons.pkl', 'rb')
>>> p = dill.load(f)
>>> p
[Julius Caeser, Kunte Kinta, Norton Henrich, Mother Teresa]
The only change I made was to uncomment the lines at the end of patronmaker.py so that it saved some patrons…. and I also replaced import pickle with import dill as pickle everywhere.
So, even by downloading and running your code, I can't produce an error with dill. I'm using the latest dill from github.
Additional Edit:
Your traceback above is from an ImportError. Did you install your module? If you didn't use setup.py to install it, or if you don't have your module on your PYTHONPATH, then you won't find your module regardless of how you are serializing things.
Even more edits:
Looking at your code, you should be using the singleton pattern for patrons; it should not live inside the class Patron. The block of code at class level that loads the patrons into Patron.patrons is sure to cause problems, and is probably the source of your errors. I also see that you are pickling the attribute Patron.patrons (not even the class itself) from inside the Patron class. This is madness; don't do it. Also notice that when you try to obtain the patrons, you use Patron.patrons, which accesses the class object and not an instance. Move patrons outside of the class, and use the singleton directly as a list of patrons. You should typically be working with a Patron instance, so if you want each patron to know who all the other patrons are, you would create p = Patron('Joe', 'Blow') and then use p.patrons to get all patrons. For that you'd need to write a Patron.load method that reads the singleton list of patrons; you could also use a property to make the load give you something that looks like an attribute.
If you build a singleton of patrons (as a list), or a "registry" of patrons (as a dict) if you like, then just check whether a patrons pickle file exists and, if so, load it into the registry. Don't do it from inside the Patron class, and things should go much better. Your code is currently trying to load a class instance inside a class definition while Python is still building that class object. That's bad...
Also, don't expect people to go downloading your code and debugging it for you, when you don't present a minimal test case or sufficient info for how the traceback was created.
You may have hit on a valid pickling error in dill for some dark corner case, but I can't tell b/c I can't reproduce your error. However, I can tell that you need some refactoring.
And just to be explicit:
Move your patrons initializing mess from Patron into a new file patrons.py:
import os
import dill as pickle
#Initialize patrons with saved pickle data
if os.path.isfile('patrons.pkl'):
    with open("patrons.pkl", 'rb') as f:
        patrons = pickle.load(f)
else:
    patrons = []
Then in patronmaker.py, and everywhere else you need the singleton…
import dill as pickle
import os.path
import patrons as the
class Patron:
    def __init__(self, lname, fname):
        self.lname = lname.title()
        self.fname = fname.title()
        self.terrCheckedOutHistory = {}
        #Add any created Patron to patrons list
        the.patrons.append(self)
        #Preserve this person via pickle
        with open('patrons.pkl', 'wb') as f:
            pickle.dump(the.patrons, f)
And you should be fine, unless your code is hitting one of the cases where attributes on modules can't be serialized because they were added dynamically (see https://github.com/uqfoundation/dill/pull/47). That should definitely make pickle fail, and in some cases dill too, probably with an AttributeError on the module. I just can't reproduce this… and I'm done.
Is there a way to load a module twice in the same python session?
To fill this question with an example: Here is a module:
Mod.py
x = 0
Now I would like to import that module twice, like creating two instances of a class to have actually two copies of x.
To already answer the questions in the comments, "why anyone would want to do that if they could just create a class with x as a variable":
You are correct, but there exists some huge amount of source that would have to be rewritten, and loading a module twice would be a quick fix^^.
Yes, you can load a module twice:
import mod
import sys
del sys.modules["mod"]
import mod as mod2
Now, mod and mod2 are two instances of the same module.
That said, I doubt this is ever useful. Use classes instead -- eventually it will be less work.
Edit: In Python 2.x, you can also use the following code to "manually" import a module:
import imp
def my_import(name):
    file, pathname, description = imp.find_module(name)
    code = compile(file.read(), pathname, "exec", dont_inherit=True)
    file.close()
    module = imp.new_module(name)
    exec code in module.__dict__
    return module
This solution might be more flexible than the first one. You no longer have to "fight" the import mechanism since you are (partly) rolling your own one. (Note that this implementation doesn't set the __file__, __path__ and __package__ attributes of the module -- if these are needed, just add code to set them.)
Deleting an entry from sys.modules will not necessarily work (e.g. it will fail when importing recurly twice, if you want to work with multiple recurly accounts in the same app etc.)
Another way to accomplish this is:
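>>> import importlib.util
>>> spec = importlib.util.find_spec(module_name)
>>> instance_one = importlib.util.module_from_spec(spec)
>>> instance_two = importlib.util.module_from_spec(spec)
>>> spec.loader.exec_module(instance_one)  # run the module body in each copy
>>> spec.loader.exec_module(instance_two)
>>> instance_one == instance_two
False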
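Applied to the x = 0 example from the question, the importlib approach might look like the sketch below (Python 3). The file name mod_demo.py and the helper load_copy are made up for the illustration, and the module is written to a temporary directory so the snippet is self-contained. Note that module_from_spec only creates an empty module object; the body runs when exec_module is called:

```python
import importlib.util
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mod_demo.py")
    with open(path, "w") as fh:
        fh.write("x = 0\n")

    def load_copy():
        """Return a fresh, independent module object built from the file."""
        spec = importlib.util.spec_from_file_location("mod_demo", path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # executes the module body
        return module

    copy_one = load_copy()
    copy_two = load_copy()

# Each copy has its own namespace, so changing x in one leaves the other alone.
copy_one.x = 42
print(copy_one is copy_two, copy_one.x, copy_two.x)
```

Neither copy is registered in sys.modules, so a regular import of the same module elsewhere in the program would create yet another, separate object.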
You could use the __import__ function.
module1 = __import__("module")
module2 = __import__("module")
Edit: As it turns out, this does not import two separate versions of the module, instead module1 and module2 will point to the same object, as pointed out by Sven.