Saving a Python object along with an external function used by it - python

I have a class which takes a function as one of its init arguments:
class A(object):
    def __init__(self, some_var, some_function):
        self.some_function = some_function
        self.x = self.some_function(some_var)
I can create a function, pass it to an instance of the object, and save it using pickle:
import pickle as pcl

def some_function(x):
    return x*x

a = A(some_var=2, some_function=some_function)
pcl.dump(a, open('a_obj.p', 'wb'))
Now I want to open this object in some other code. However, I don't want to include the def some_function(x): code in each file which uses this specific saved object.
So, what's the best Python practice to pass an external function as an argument to a Python object and then save the object, such that the external function is "implemented" inside the object instance and doesn't have to be written in every file which uses the saved object?
Edit: Let me clarify, I don't want to save the function. I want to save only the object. Is there any elegant way to "combine" the external function inside the object, so I can pass it as an argument and it then "becomes" part of this object's instance?

The easiest way to do what you are asking is with the dill module.
You can dump an instance of an object like this:
import dill

def f(x):
    return x*x

class A(object):
    def __init__(self, some_var, some_function):
        self.some_function = some_function
        self.x = self.some_function(some_var)

a = A(2, f)
a.x
# returns:
# 4

with open('a.dill', 'wb') as fp:
    dill.dump(a, fp)
Then, in a new Python session, you can load it back in using:
import dill

with open('a.dill', 'rb') as fp:
    a = dill.load(fp)

a.x
# returns:
# 4

a.some_function(4)
# returns:
# 16

If you really, really wanted to do this, it would be possible with the marshal module, on which pickle is based. Serializing function bodies with pickle is intentionally not supported, partly for security reasons.
There is also a lot of info you would probably find useful in this question:
Is there an easy way to pickle a python function (or otherwise serialize its code)?
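For illustration, a minimal sketch of the marshal route (the function name square is my own example; note that marshal captures only the code object, not closures, default arguments, or the function's globals):

```python
import marshal
import types

def square(x):
    return x * x

# marshal can serialize the raw code object...
blob = marshal.dumps(square.__code__)

# ...which can later be rebuilt into a callable, possibly in another process
code = marshal.loads(blob)
restored = types.FunctionType(code, globals(), "square")
print(restored(5))  # 25
```

This is fragile across Python versions (marshal's format is not guaranteed to be stable), which is one reason dill is usually the more practical choice.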

Related

Create instances from just a class without anything else [duplicate]

Is there a way to circumvent the constructor __init__ of a class in python?
Example:
class A(object):
    def __init__(self):
        print "FAILURE"
    def Print(self):
        print "YEHAA"
Now I would like to create an instance of A. It could look like this, however this syntax is not correct.
a = A
a.Print()
EDIT:
An even more complex example:
Suppose I have an object C, whose purpose is to store one single parameter and do some computations with it. The parameter, however, is not passed as such, but is embedded in a huge parameter file. It could look something like this:
class C(object):
    def __init__(self, ParameterFile):
        self._Parameter = self._ExtractParamterFile(ParameterFile)
    def _ExtractParamterFile(self, ParameterFile):
        # does some complex magic to extract the right parameter
        return the_extracted_parameter
Now I would like to dump and load an instance of that object C. However, when I load this object, I only have the single variable self._Parameter and I cannot call the constructor, because it is expecting the parameter file.
@staticmethod
def Load(file):
    f = open(file, "rb")
    oldObject = pickle.load(f)
    f.close()
    # somehow create newObject without calling __init__
    newObject._Parameter = oldObject._Parameter
    return newObject
In other words, it is not possible to create an instance without passing the parameter file. In my "real" case, however, it is not a parameter file but some huge chunk of data that I certainly do not want to carry around in memory or even store to disk.
And since I want to return an instance of C from the method Load I do somehow have to call the constructor.
OLD EDIT:
A more complex example, which explains why I am asking the question:
class B(object):
    def __init__(self, name, data):
        self._Name = name
        # do something with data, but do NOT save data in a variable
    @staticmethod
    def Load(self, file, newName):
        f = open(file, "rb")
        s = pickle.load(f)
        f.close()
        newS = B(???)
        newS._Name = newName
        return newS
As you can see, since data is not stored in a class variable I cannot pass it to __init__. Of course I could simply store it, but what if the data is a huge object, which I do not want to carry around in memory all the time or even save it to disc?
You can circumvent __init__ by calling __new__ directly. Then you can create an object of the given type and call an alternative method for __init__. This is the sort of thing that pickle would do.
However, first I'd like to stress very much that it is something that you shouldn't do and whatever you're trying to achieve, there are better ways to do it, some of which have been mentioned in the other answers. In particular, it's a bad idea to skip calling __init__.
When objects are created, more or less this happens:
a = A.__new__(A, *args, **kwargs)
a.__init__(*args, **kwargs)
You could skip the second step.
Here's why you shouldn't do this: The purpose of __init__ is to initialize the object, fill in all the fields and ensure that the __init__ methods of the parent classes are also called. With pickle it is an exception because it tries to store all the data associated with the object (including any fields/instance variables that are set for the object), and so anything that was set by __init__ the previous time would be restored by pickle, there's no need to call it again.
If you skip __init__ and use an alternative initializer, you'd have a sort of code duplication - there would be two places where the instance variables are filled in, and it's easy to miss one of them in one of the initializers, or accidentally have the two fill the fields differently. This opens the door to subtle bugs that aren't trivial to trace (you'd have to know which initializer was called), and the code becomes more difficult to maintain. Not to mention that you'd be in an even bigger mess if you're using inheritance - the problems would go up the inheritance chain, because you'd have to use this alternative initializer everywhere up the chain.
Also by doing so you'd be more or less overriding Python's instance creation and making your own. Python already does that for you pretty well, no need to go reinventing it and it will confuse people using your code.
Here's what to do instead: use a single __init__ method that is called for all possible instantiations of the class and initializes all instance variables properly. For different modes of initialization, use either of these two approaches:
Support different signatures for __init__ that handle your cases by using optional arguments.
Create several class methods that serve as alternative constructors. Make sure they all create instances of the class in the normal way (i.e. calling __init__), as shown by Roman Bodnarchuk, while performing additional work or whatever. It's best if they pass all the data to the class (and __init__ handles it), but if that's impossible or inconvenient, you can set some instance variables after the instance was created and __init__ is done initializing.
If __init__ has an optional step (e.g. like processing that data argument, although you'd have to be more specific), you can either make it an optional argument or make a normal method that does the processing... or both.
Use the classmethod decorator for your Load method:
class B(object):
    def __init__(self, name, data):
        self._Name = name
        # store data
    @classmethod
    def Load(cls, file, newName):
        f = open(file, "rb")
        s = pickle.load(f)
        f.close()
        return cls(newName, s)
So you can do:
loaded_obj = B.Load('filename.txt', 'foo')
Edit:
Anyway, if you still want to omit __init__ method, try __new__:
>>> class A(object):
...     def __init__(self):
...         print '__init__'
...
>>> A()
__init__
<__main__.A object at 0x800f1f710>
>>> a = A.__new__(A)
>>> a
<__main__.A object at 0x800f1fd50>
Taking your question literally, I would use metaclasses:
class MetaSkipInit(type):
    def __call__(cls):
        return cls.__new__(cls)

class B(object):
    __metaclass__ = MetaSkipInit
    def __init__(self):
        print "FAILURE"
    def Print(self):
        print "YEHAA"

b = B()
b.Print()
This can be useful e.g. for copying constructors without polluting the parameter list.
But to do this properly would be more work and care than my proposed hack.
Not really. The purpose of __init__ is to initialize a newly created object, and by default it really doesn't do anything. If the __init__ method is not doing what you want, and it's not your own code to change, you can choose to switch it out. For example, taking your class A, we could do the following to avoid calling that __init__ method:
def emptyinit(self):
    pass

A.__init__ = emptyinit
a = A()
a.Print()
This dynamically swaps out the class's __init__ method, replacing it with an empty one. Note that this is probably NOT a good thing to do, as it does not call the super class's __init__ method.
You could also subclass it to create your own class that does everything the same, except overriding the __init__ method to do what you want it to (perhaps nothing).
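A minimal sketch of that subclassing idea (Python 3 here, and I've made the parent raise instead of print so the skip is observable):

```python
class A(object):
    def __init__(self):
        raise RuntimeError("FAILURE")

    def Print(self):
        return "YEHAA"

class SilentA(A):
    def __init__(self):
        # deliberately do NOT call super().__init__()
        pass

a = SilentA()       # A() alone would raise RuntimeError
print(a.Print())    # YEHAA
```

The subclass inherits everything from A but overrides __init__ with a no-op, so the parent initializer never runs.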
Perhaps, however, you simply wish to call the method from the class without instantiating an object. If that is the case, you should look into the @classmethod and @staticmethod decorators. They allow for just that type of behavior.
In your code you have put the @staticmethod decorator, which does not take a self argument. Perhaps what would be better for the purpose is a @classmethod, which might look more like this:
@classmethod
def Load(cls, file, newName):
    # Get the data
    data = getdata()
    # Create an instance of B with the data
    return cls(newName, data)
UPDATE: Rosh's excellent answer pointed out that you CAN avoid calling __init__ by implementing __new__, which I was actually unaware of (although it makes perfect sense). Thanks Rosh!
I was reading the Python Cookbook and there's a section talking about this; the example given uses __new__ to bypass __init__():
>>> class A:
...     def __init__(self, a):
...         self.a = a
...
>>> test = A('a')
>>> test.a
'a'
>>> test_noinit = A.__new__(A)
>>> test_noinit.a
Traceback (most recent call last):
  File "", line 1, in
    test_noinit.a
AttributeError: 'A' object has no attribute 'a'
>>>
However, I think this only works in Python 3. Below is the same thing running under 2.7:
>>> class A:
...     def __init__(self, a):
...         self.a = a
...
>>> test = A.__new__(A)
Traceback (most recent call last):
  File "", line 1, in
    test = A.__new__(A)
AttributeError: class A has no attribute '__new__'
>>>
As I said in my comment you could change your __init__ method so that it allows creation without giving any values to its parameters:
def __init__(self, p0, p1, p2):
    # some logic
would become:
def __init__(self, p0=None, p1=None, p2=None):
    if p0 and p1 and p2:
        # some logic
or:
def __init__(self, p0=None, p1=None, p2=None, init=True):
    if init:
        # some logic
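Putting those pieces together, here is one way the whole pattern can look (a sketch in Python 3; _ExtractParameterFile and Save are illustrative stand-ins, and Load pickles only the instance's state dict rather than the object itself, so nothing heavy is re-processed on load):

```python
import os
import pickle
import tempfile

class C(object):
    def __init__(self, parameter_file=None):
        # optional argument: a bare C() skips the expensive extraction
        if parameter_file is not None:
            self._Parameter = self._ExtractParameterFile(parameter_file)

    def _ExtractParameterFile(self, parameter_file):
        # stand-in for the "complex magic" on the huge file
        return parameter_file.upper()

    def Save(self, path):
        # persist only the cheap instance state, not the huge input data
        with open(path, "wb") as f:
            pickle.dump(self.__dict__, f)

    @classmethod
    def Load(cls, path):
        with open(path, "rb") as f:
            state = pickle.load(f)
        new = cls()                  # __init__ runs, but does no heavy work
        new.__dict__.update(state)
        return new

c = C("parameters.txt")
path = os.path.join(tempfile.mkdtemp(), "c.pkl")
c.Save(path)
loaded = C.Load(path)
print(loaded._Parameter)  # PARAMETERS.TXT
```

Load goes through the normal constructor (so nothing about instance creation is circumvented) and then fills in the saved fields afterwards.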

Using a classmethod to retrieve or load data on init

I have a time-consuming database lookup (downloads data from online) which I want to avoid doing constantly, so I would like to pickle the data if I don't already have it.
This data is being used by the class which has this classmethod.
Is this a 'proper' or expected use of a classmethod? I feel like I could fairly easily refactor it to be an instance method but it feels like it should be a classmethod due to what it's doing. Below is a mockup of the relevant parts of the class.
import os
import pickle

class Example:
    def __init__(self):
        self.records = self.get_records()

    @classmethod
    def get_records(cls):
        """
        If the records aren't already downloaded from the server,
        get them and add to a pickle file.
        Otherwise, just load the pickle file.
        """
        if not os.path.exists('records.pkl'):
            # Slow request
            records = get_from_server()
            with open('records.pkl', 'wb') as rec_file:
                pickle.dump(records, rec_file)
        else:
            with open('records.pkl', 'rb') as rec_file:
                records = pickle.load(rec_file)
        return records

    def use_records(self):
        for item in self.records:
            ...
Is there also an easy way to refactor this so that I can re-retrieve the data on request, even if the pickle file exists? Is that as simple as adding another argument to the classmethod?
Thanks for any help.

Variable behaviour in python - making more efficient code

Trying to do some optimization here on a class. We're trying not to change the class definitions too much. In essence, we are instantiating ClassA N times, but one of its methods has a nasty file read.
for x in range(0, N):
    cl = ClassA()
    cl.dostuff(x)
The class looks like this:
class ClassA:
    def dostuff(self, x):
        # open nasty file here
        nastyfile = open()
        # do something else
We could bring that file read out of the class and put it before the loop, since the file will not change. But is there a way to ensure that we only ever open the nasty file once across all instances of the class? I.e. on the first instantiation of the class, it would be read in once and then be available to all future instances without having to read it in again. Is there a way to do this without really changing the structure of the existing code base too much?
One question relates to the interpreter: is Python smart enough to cache variables such as nastyfile so that we can keep the code as it is, or is the quick and dirty solution the following:
nastyfile = open()
for x in range(0, 1):
    cl = ClassA()
    cl.dostuff(x)
Looking for a pythonic way to do this.
You could encapsulate opening the file in a classmethod:
class ClassA():
    @classmethod
    def open_nasty_file(cls):
        cls.nasty_file = open('file_path', 'file_mode')

    def do_stuff(self):
        if not hasattr(self, 'nasty_file'):
            self.open_nasty_file()
This approach relies on the fact that attribute look-ups will try finding the attribute on the class if not found on the instance.
You could put this check/instantiation in the __init__ function if you want it opened when the first instance is instantiated.
Note that this method will leave the file open, so it will need to be closed at some point.
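A matching classmethod can handle that cleanup; here is a small runnable sketch of the whole idea (the path argument and the use of open(path) without a mode are illustrative choices of mine):

```python
import os
import tempfile

class ClassA:
    nasty_file = None  # shared by all instances of the class

    @classmethod
    def open_nasty_file(cls, path):
        cls.nasty_file = open(path)

    @classmethod
    def close_nasty_file(cls):
        # close the shared handle once all instances are done with it
        if cls.nasty_file is not None:
            cls.nasty_file.close()
            cls.nasty_file = None

    def do_stuff(self, path):
        if type(self).nasty_file is None:   # opened at most once
            self.open_nasty_file(path)
        return type(self).nasty_file.name
```

Every instance reads type(self).nasty_file, so the first call opens the file and all later instances reuse the same handle until close_nasty_file() is called.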
You could have a class method that opens the file when the first instance asks for it. I've wrapped it in a lock so that it is thread safe.
import threading

class ClassA:
    _nasty_file = None
    _nasty_file_lock = threading.Lock()

    def dostuff(self, x):
        # open nasty file here
        nastyfile = self.get_nasty_file()
        # do something else

    @classmethod
    def get_nasty_file(cls):
        with cls._nasty_file_lock:
            if cls._nasty_file is None:
                with open('nastyfile') as fp:
                    cls._nasty_file = fp.read()
            return cls._nasty_file
Instances can access and modify class attributes by themselves. So you can just set up an attribute on the class and provide it with a default (None) value, and then check for that value before doing anything in dostuff. Example:
class A():
    nastyfileinfo = None

    def dostuff(self, x):
        if A.nastyfileinfo: print('nastyfileinfo already exists:', A.nastyfileinfo)
        if not A.nastyfileinfo:
            print('Adding nastyfileinfo')
            A.nastyfileinfo = 'This is really nasty'  ## open()
        print('>>>nastyfileinfo:', A.nastyfileinfo)
        ## Continue doing your other stuff involving x

for j in range(0, 10):
    A().dostuff(j)
nastyfileinfo is also visible as an attribute of each instance, so you can reference it with instance.nastyfileinfo. However, if you modify it through the instance, the change will only apply to that one specific instance, whereas if you modify it on the class, all other instances will see it (provided they didn't shadow nastyfileinfo with their own instance attribute).
instants = []
for j in range(0, 10):
    instants.append(A())

for instance in instants:
    print(instance.nastyfileinfo)

instants[5].dostuff(5)

for instance in instants:
    print(instance.nastyfileinfo)

Issue when I try to use serialized objects in Python: when I get them back they have lost their type

I'm working on a computing program where, after each step, I serialize data to be able to restore from a backup in case of a crash.
In the last step I have to write reports through xlsxwriter.
The problem: when I un-serialize the data (a list of beams) I can't use their methods directly.
Here is the class used to serialize and un-serialize data:
import pickle

class Serializer(object):
    @staticmethod
    def serialize(obj):
        return pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)

    @staticmethod
    def unserialize(serializedDatas):
        return pickle.loads(serializedDatas)

    @staticmethod
    def serializeToFile(obj, file):
        return pickle.dump(obj, file, protocol=pickle.HIGHEST_PROTOCOL)

    @staticmethod
    def unserializeFile(file):
        return pickle.load(file)
Here is my beam class:
from TypeControl import TypeControl
from Point import Point
from Section import Section
import copy
import numpy as np

class Beam(TypeControl):
    def __init__(self, number, Name, Section, pt1, pt2):
        super(self.__class__, self).__init__([], Name)
        self.get().append(TypeControl(pt1, pt1.getName()))
        self.get().append(TypeControl(pt2, pt2.getName()))
        self.get().append(TypeControl(copy.deepcopy(Section), Section.getName()))
        self.get().append(TypeControl(number, "Number"))

    def getPointA(self):
        return self.get()[0].get()
    def getPointB(self):
        return self.get()[1].get()
    def getSection(self):
        return self.get()[2].get()
    def getNumber(self):
        return self.get()[3].get()

    def getLength(self):
        vect = self.getPointA() - self.getPointB()
        return np.sqrt(vect.getx()*vect.getx() + vect.gety()*vect.gety() + vect.getz()*vect.getz())
TypeControl is a class used to ensure the type of a data will not change during the execution of the program.
Now let's talk about my problem:
- the problem occurs randomly
- when it works, b.getName() returns a name; when it doesn't, the call behaves like a "pass" instruction and doesn't raise any error
- I've used the debugger to get more information:
b is supposed to be my Beam object:
(Pdb) b.getName()
*** The specified object '.getName()' is not a function or was not found along sys.path.
I expected a name; getName() is a method of the TypeControl class, so it should work.
(Pdb) Beam.getName(b)
'ExtColumnHP2_M2'
You might think that I'm confusing objects and using the wrong one, and that the class-level call only works due to inheritance or something else.
Look at the following:
(Pdb) b
This returns nothing in the shell; I expected something like:
<class 'Beam.Beam'>
The question is: what is the type of b?
(Pdb) type(b)
<class 'Beam.Beam'>
So the type is correct, and yet:
(Pdb) b.__dict__
*** The specified object '.__dict__' is not a function or was not found along sys.path.
In C++, to solve my problem I would have tried: b = (Beam)b
Does someone know why I lose the methods linked to this object?
Can someone give me a solution so I can use b as a normal object?
Why was everything working well before the serialization?
The obvious solution would be to do something like b = Beam(b), but that does not seem very smart.

Test if some field has been initialized in python

I am trying to write a test in Python that checks if a method in a class that I am writing sets the attribute value for a dataset in some Hdf file. The logic is the following: An instance of the class is constructed by passing an instance of h5py.File, then one method creates a dataset inside this file. In the next step I have another method that sets certain attributes for this dataset.
What I am trying to test is if my class method create_attributes(self,attributes) sets the field hdf_file[dset_name].attrs[attr_name] to some value that is passed in the variable attributes. However, I would like to avoid to actually create a Hdf file. So far I have tried to mock an instance of a hdf file and work with that. The minimal working code example would be the following:
import mock
import h5py

class TestSomething:
    @mock.patch('h5py.File')
    def test_if_attr_is_initialized(self, mock_hdf):
        # Here I would like to call a function that basically executes
        # the following line:
        mock_hdf['test_dset'].attrs['test_field'] = 'value'
        # Then I want to check if the attribute field has been assigned
        assert mock_hdf['test_dset'].attrs['test_field'] == 'value'
Can anybody help me find the correct way to check whether or not the attribute in the hdf file is set correctly? Any help would be greatly appreciated; I am a complete newbie to all the mocking techniques.
Edit:
In the following I am providing a minimal code example for both the class and the respective test, as requested by wwii:
import h5py

class HdfWriter():
    def __init__(self, hdf_file):
        self.hdf_file = hdf_file

    def create_attrs(self, attributes):
        dset_name = attributes.keys()[0]
        attrs = attributes[dset_name]
        for key in attrs:
            self.hdf_file[dset_name].attrs[key] = attrs[key]
Please note that with a real hdf file I would first have to create a dataset, but I would like to leave that for another test. The following test should just check whether, for a hypothetical hdf file which has the dataset test_dset, the attributes for this dataset are written:
import mock
import h5py
import HdfWriter as hw

class TestSomething:
    @mock.patch('h5py.File')
    def test_if_attr_is_initialized(self, mock_hdf):
        writer = hw.HdfWriter(mock_hdf)
        attr = {'test_dset': {'test_field': 'test_value'}}
        writer.create_attrs(attr)
        assert writer.hdf_file['test_dset'].attrs['test_field'] == 'test_value'
Mocking h5py.File
class HdfWriter():
    def __init__(self, hdf_file):
        self.hdf_file = hdf_file

    def create_attrs(self, attributes):
        dset_name = attributes.keys()[0]
        attrs = attributes[dset_name]
        for key in attrs:
            self.hdf_file[dset_name].attrs[key] = attrs[key]
For the purpose of the create_attrs method, hdf_file behaves as a dictionary that returns an object that also behaves like a dictionary. The docs explain pretty clearly how to mock a dictionary.
You need a mock that has an attrs attribute that behaves like a dictionary:
import mock

attrs_d = {}

def setattrs(name, value):
    ## print 'setattrs', name, value
    attrs_d[name] = value

def getattrs(name):
    ## print 'getattrs', name
    return attrs_d[name]

dset_mock = mock.MagicMock()
dset_mock.attrs.__setitem__.side_effect = setattrs
dset_mock.attrs.__getitem__.side_effect = getattrs
You need a mock for hdf_file that behaves like a dictionary and will return the mock object created above.
hdf_d = {'test_dset': dset_mock}

def getitem(name):
    ## print 'getitem', name
    return hdf_d[name]

def setitem(name, value):
    hdf_d[name] = value

mock_hdf = mock.MagicMock()
mock_hdf.__getitem__.side_effect = getitem
mock_hdf.__setitem__.side_effect = setitem
hdf_d, as implemented, only works for the key 'test_dset'. Depending on your needs, it may be better for getitem to just return the dataset mock regardless of the name argument.
def test_if_attr_is_initialized(mock_hdf):
    writer = HdfWriter(mock_hdf)
    attr = {'test_dset': {'test_field': 'test_value'}}
    writer.create_attrs(attr)
    print writer.hdf_file['test_dset'].attrs['test_field'], '==', attr['test_dset']['test_field']
    assert writer.hdf_file['test_dset'].attrs['test_field'] == 'test_value'

test_if_attr_is_initialized(mock_hdf)
>>>
test_value == test_value
>>>
This should suffice to test create_attrs but it may not be optimal - maybe someone will chime in with some refinements.
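One such refinement, assuming Python 3 and its unittest.mock: back the mocks with plain dicts instead of hand-written side-effect functions, which removes most of the boilerplate.

```python
from unittest import mock

# a fake dataset whose .attrs is simply a real dict
dset = mock.MagicMock()
dset.attrs = {}

# a fake hdf_file whose item lookup is delegated to a real dict
mock_hdf = mock.MagicMock()
mock_hdf.__getitem__.side_effect = {"test_dset": dset}.__getitem__

class HdfWriter:
    def __init__(self, hdf_file):
        self.hdf_file = hdf_file

    def create_attrs(self, attributes):
        dset_name = list(attributes.keys())[0]
        for key, value in attributes[dset_name].items():
            self.hdf_file[dset_name].attrs[key] = value

writer = HdfWriter(mock_hdf)
writer.create_attrs({"test_dset": {"test_field": "test_value"}})
print(writer.hdf_file["test_dset"].attrs["test_field"])  # test_value
```

Since dset.attrs is an ordinary dict, the test can assert on its contents directly without wiring up __setitem__/__getitem__ side effects by hand.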

Categories

Resources