I have two classes, one of which inherits from the other. I want to know how to cast to (or create a new variable of) the subclass. I have searched around a bit, and 'downcasting' like this mostly seems to be frowned upon. There are some slightly dodgy workarounds, such as setting instance.__class__, though this doesn't seem like a nice way to go.
eg.
http://www.gossamer-threads.com/lists/python/python/871571
http://code.activestate.com/lists/python-list/311043/
Sub-question: is downcasting really that bad? If so, why?
I have a simplified code example below. Basically, I have some code that creates a Peak object after doing some analysis of x, y data. Outside this code I know that the data is 'PSD' (power spectral density) data, so it has some extra attributes. How do I downcast from Peak to Psd_Peak?
"""
Two classes
"""
import numpy as np

class Peak(object):
    """
    Object for holding information about a peak
    """
    def __init__(self,
                 index,
                 xlowerbound=None,
                 xupperbound=None,
                 xvalue=None,
                 yvalue=None
                 ):
        self.index = index  # peak index is index of x and y value in psd_array
        self.xlowerbound = xlowerbound
        self.xupperbound = xupperbound
        self.xvalue = xvalue
        self.yvalue = yvalue

class Psd_Peak(Peak):
    """
    Object for holding information about a peak in psd spectrum
    Holds a few other values over and above the Peak object.
    """
    def __init__(self,
                 index,
                 xlowerbound=None,
                 xupperbound=None,
                 xvalue=None,
                 yvalue=None,
                 depth=None,
                 ampest=None
                 ):
        super(Psd_Peak, self).__init__(index,
                                       xlowerbound,
                                       xupperbound,
                                       xvalue,
                                       yvalue)
        self.depth = depth
        self.ampest = ampest
        self.depthresidual = None
        self.depthrsquared = None

def peakfind(xdata, ydata):
    '''
    Does some stuff.... returns a peak.
    '''
    return Peak(1,
                0,
                1,
                .5,
                10)

# Find a peak in the data.
p = peakfind(np.random.rand(10), np.random.rand(10))
# Actually the data I used was PSD -
# so I want to add some more values to the object
p_psd = ????????????
edit
Thanks for the contributions. I'm afraid I was feeling rather downcast (geddit?), since the answers thus far seem to suggest I spend time hard-coding converters from one class type to another. I have come up with a more automatic way of doing this: basically looping through the attributes of the class and transferring them from one object to the other. How does this smell to people? Is it a reasonable thing to do, or does it spell trouble ahead?
def downcast_convert(ancestor, descendent):
    """
    automatic downcast conversion.....
    (NOTE - not type-safe -
    if ancestor isn't a super class of descendent, it may well break)
    """
    # .items() works on both Python 2 and Python 3 (.iteritems() is Python 2 only)
    for name, value in vars(ancestor).items():
        #print "setting descendent", name, ": ", value, "ancestor", name
        setattr(descendent, name, value)
    return descendent
You don't actually "cast" objects in Python. Instead you generally convert them -- take the old object, create a new one, throw the old one away. For this to work, the class of the new object must be designed to take an instance of the old object in its __init__ method and do the appropriate thing (sometimes, if a class can accept more than one kind of object when creating it, it will have alternate constructors for that purpose).
You can indeed change the class of an instance by pointing its __class__ attribute to a different class, but that class may not work properly with the instance. Furthermore, this practice is IMHO a "smell" indicating that you should probably be taking a different approach.
In practice, you almost never need to worry about types in Python. (With obvious exceptions: for example, trying to add two objects. Even in such cases, the checks are as broad as possible; here, Python would check for a numeric type, or a type that can be converted to a number, rather than a specific type.) Thus it rarely matters what the actual class of an object is, as long as it has the attributes and methods that whatever code is using it needs.
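As a rough illustration of the "alternate constructor" idea above, here is a minimal sketch. The Peak class is trimmed to three fields for brevity, and the names PsdPeak and from_peak are hypothetical, chosen only for this example; they are not part of the original code.

```python
class Peak:
    """Trimmed-down stand-in for the Peak class in the question."""
    def __init__(self, index, xvalue=None, yvalue=None):
        self.index = index
        self.xvalue = xvalue
        self.yvalue = yvalue


class PsdPeak(Peak):
    def __init__(self, index, xvalue=None, yvalue=None, depth=None, ampest=None):
        super().__init__(index, xvalue, yvalue)
        self.depth = depth
        self.ampest = ampest

    @classmethod
    def from_peak(cls, peak, depth=None, ampest=None):
        """Hypothetical alternate constructor: build a new PsdPeak from a Peak."""
        return cls(peak.index, peak.xvalue, peak.yvalue, depth=depth, ampest=ampest)


p = Peak(1, xvalue=0.5, yvalue=10)
# The old object is left untouched; a new, fully initialised object is created.
p_psd = PsdPeak.from_peak(p, depth=111, ampest=222)
```

The conversion goes through the subclass's own __init__, so any invariants it sets up still hold, which is the main advantage over reassigning __class__.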
See the following example. Also, be sure to obey the LSP (Liskov Substitution Principle).
class ToBeCastedObj:
    def __init__(self, *args, **kwargs):
        pass  # whatever you want to state

    # original methods
    # ...

class CastedObj(ToBeCastedObj):
    def __init__(self, *args, **kwargs):
        pass  # whatever you want to state

    @classmethod
    def cast(cls, to_be_casted_obj):
        casted_obj = cls()
        # note: both objects now share the same attribute dictionary
        casted_obj.__dict__ = to_be_casted_obj.__dict__
        return casted_obj

    # new methods you want to add
    # ...
This isn't a downcasting problem (IMHO). peakfind() creates a Peak object, which can't be downcast because it's not a Psd_Peak object, and later you want to create a Psd_Peak object from it. In something like C++, you'd likely rely on the default copy constructor, but that's not going to work, even in C++, because your Psd_Peak class requires more parameters in its constructor. In any case, Python doesn't have a copy constructor, so you end up with the rather verbose (fred=fred, jane=jane) stuff.
A good solution may be to create an object factory, pass the type of Peak object you want to peakfind(), and let it create the right one for you.
def peak_factory(peak_type, index, *args, **kw):
    """Create Peak objects
    peak_type   Type of peak object wanted
                (you could list types)
    index       index
                (you could list params for the various types)
    """
    # optionally sanity check parameters here
    # create object of desired type and return
    return peak_type(index, *args, **kw)

def peakfind(peak_type, xdata, ydata, **kw):
    # do some stuff...
    return peak_factory(peak_type,
                        1,
                        0,
                        1,
                        .5,
                        10,
                        **kw)

# Find a peak in the data.
p = peakfind(Psd_Peak, np.random.rand(10), np.random.rand(10), depth=111, ampest=222)
I have a class with several named attributes. I would like to be able to pass one of the class's attributes to the class itself and be able to determine specifically which attribute was passed.
Below is a trivial example of how I was doing it (using the "is" operator), until I discovered that special cached variable IDs are used for integer values between -5 and 256.
class AClass:
    def __init__(self, one, two, three):
        self.one = one
        self.two = two
        self.three = three

    def getIndex(self, attribute):
        if attribute is self.one:
            return 1
        elif attribute is self.two:
            return 2
        elif attribute is self.three:
            return 3

    def setByIndex(self, i, value):
        if i == 1:
            self.one = value
        elif i == 2:
            self.two = value
        elif i == 3:
            self.three = value

    def setByAttrib(self, attribute, value):
        i = self.getIndex(attribute)
        self.setByIndex(i, value)

object = AClass(0, 0, 0)
object.setByAttrib(object.three, 10)
In the above example, the intention is to set object.three to 10. However, since all attributes are pointing to the cached location of integer 0, the getIndex function would evaluate true on any of them, and object.one (which appears first) will get set to 10. If the object was initialized with values 257, 257, 257, functionality would presumably be as intended.
So the question is, is there a way to either:
a) force the system to assign non-cached, unique memory locations for these attributes (even if they are set between -5 and 256), or
b) use some other method to check if an attribute passed as an argument is uniquely itself?
EDIT:
Since it was asked a couple of times, one of the reasons I'm using this paradigm is the lack of pointers in Python. In the example above, the setByIndex function could be doing some complicated work on the attribute. Rather than write multiple identical functions for each variable (e.g. setOne, setTwo, setThree), I can write a single generic function that retrieves and sets by index (the index is basically acting like a pointer). Yes, I could pass the attribute value as an argument, return the new value, and do the assignment in the scope where the specific attribute is known, but I am already returning a value. Yes, I could return a list, but it adds more complexity.
I do realize that there are better ways to implement what I need (e.g. key-value pairs for the attributes and index numbers), but it would be a lot of work to implement (thousands of changes). If there was a way to use the variable ID as my unique identifier and continue to use the "is" operator (or similar), I wouldn't need to change too much. That is not looking possible, though. I appreciate the comments and responses.
I wouldn't worry about the memory locations; they are simply an implementation detail here. It's really about function design, so if you want to set object.three, then do exactly that. Otherwise, you can create a mapping to an index if you want to:
class MyClass:
    def __init__(self, *args):
        self.one, self.two, self.three, *_ = args

    # get an object by its index
    def get_by_index(self, index):
        # here's how you could create such a mapping
        opts = dict(zip((1, 2, 3), ('one', 'two', 'three')))
        try:
            return getattr(self, opts[index])
        except KeyError as e:
            # the keys are ints, so convert them before joining
            raise ValueError(f"Improper alias for attribute, select one of {', '.join(map(str, opts))}") from e

    # if you want to set by an index, then do that this way
    def set_by_index(self, index, val):
        opts = dict(zip((1, 2, 3), ('one', 'two', 'three')))
        try:
            setattr(self, opts[index], val)
        except KeyError as e:
            raise ValueError(f"Improper alias for attribute, select one of {', '.join(map(str, opts))}") from e

# otherwise, just set the attribute by the name
a = MyClass(0, 0, 0)
a.three = 55
The thing is, you're right: "is" will see the three 0's as the same object, because the data was never copied in the first place. one, two and three point to the same data because they were assigned the same data. Once you assign the attribute again, you have effectively rebound that attribute to a new value rather than updating an existing one.
The point is: don't worry about where the memory is for this implementation; just set the attribute explicitly.
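If a single generic setter is still wanted, one option is to pass the attribute's name rather than its value, so identity never comes into play. This is only a sketch; set_by_name is a hypothetical helper, not something from the original code.

```python
class AClass:
    def __init__(self, one, two, three):
        self.one = one
        self.two = two
        self.three = three

    def set_by_name(self, name, value):
        """Hypothetical generic setter keyed on the attribute name."""
        if name not in ("one", "two", "three"):
            raise ValueError(f"unknown attribute: {name!r}")
        # any shared, complicated work on the value could happen here
        setattr(self, name, value)


obj = AClass(0, 0, 0)
obj.set_by_name("three", 10)  # only 'three' changes, even though all values were 0
```

The name acts as the "pointer" here, and it stays unambiguous no matter which values the attributes currently hold.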
In the snippet below, how do I avoid computing the following numpy variables mask, zbar, te , ro and rvol in the procedures Get_mask, Get_K_Shell_Te etc? These variables are large arrays and I have to define at least six more procedures that reuse them. It looks like what I am doing is not a good idea and is slow.
import numpy as np

# this computes various quantities related to the shell in a object oriented way
class Shell_Data:
    def __init__(self, data):
        self.data = data

    def Get_mask(self):
        zbar=self.data['z2a1']
        y=self.data['y']*1000
        mask= np.logical_and(zbar >= 16 ,zbar<= 19 )
        return self.mask

    def Get_Shell_Te(self):
        self.mask =self.Get_mask()
        te =self.data['te'][self.mask]
        ro =self.data['ro'][self.mask]
        rvol =self.data['rvol'][self.mask]
        self.Shell_Te=np.sum(te*ro/rvol)/(np.sum(ro/rvol))
        print "Shell Temperature= %0.3f eV" % (self.Shell_Te)
        return self.Shell_Te

    def Get_Shell_ro(self):
        mask =self.Get_mask()
        te =self.data['te'][mask]
        ro =self.data['ro'][mask]
        rvol =self.data['rvol'][mask]
        radk =self.data['radk'][mask]
        self.Shell_ro=np.sum(radk*ro/rvol)/np.sum(radk/rvol)
        return self.Shell_ro
zbar depends on self.data. If you update self.data, you likely have to re-compute it.
If you can make your data immutable, you can compute these values once, e.g. in the constructor.
If you want to avoid calculating the mask data until it's actually required, you can cache the value, like so:
class Shell_Data(...):
    def __init__(self,...):
        self.cached_mask = None
        ...

    # @property makes an access to self.mask
    # actually return the result of a call to self.mask()
    @property
    def mask(self):
        if self.cached_mask is None:  # Not yet calculated.
            self.cached_mask = self.computeMask()
        return self.cached_mask

    def computeMask(self):
        zbar = ...
        ...
        return np.logical_and(...)

    def someComputation(self):
        # The first access to self.mask will compute it.
        te = self.data['te'][self.mask]
        # The second access will just reuse the same result.
        ro = self.data['ro'][self.mask]
        ...
If you have to mutate self.data, you can cache the computed data, and re-calculate it only when self.data changes. E.g. if you had a setData() method for that, you could recalculate the mask in it, or just set self.cached_mask to None.
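The invalidation idea can be sketched roughly as follows. To keep the example self-contained, plain Python lists stand in for the numpy arrays, and the names ShellData, set_data and mask_computations are assumptions made for illustration only.

```python
class ShellData:
    def __init__(self, data):
        self._data = data
        self._cached_mask = None
        self.mask_computations = 0  # counter kept only to show when work happens

    def set_data(self, data):
        """Replace the data and drop the cached mask so it is rebuilt lazily."""
        self._data = data
        self._cached_mask = None

    @property
    def mask(self):
        if self._cached_mask is None:  # not yet calculated, or invalidated
            self.mask_computations += 1
            self._cached_mask = [16 <= z <= 19 for z in self._data["z2a1"]]
        return self._cached_mask


shell = ShellData({"z2a1": [15, 17, 20]})
first = shell.mask    # computed here
second = shell.mask   # reused from the cache
shell.set_data({"z2a1": [18, 18, 30]})
third = shell.mask    # recomputed, because set_data() cleared the cache
```

This only stays correct if every change to the data goes through set_data(); mutating the underlying container in place would leave a stale mask behind.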
(Also, read about instance variables again.
Every method receives the parameter named self, the instance of the object for which it is called. You can access all its instance variables as self.something, exactly the way you access instance variables (and methods) when an object is not called self. If you set an instance variable in one method, you can just access it in another (e.g. self.mask); there is no need to return it. If you return something from a method, it's likely not worth storing as an instance variable such as self.mask.)
I apologize in advance if there is an obvious solution to this question or it is a duplicate.
I have a class as follows:
class Kernel(object):
    """ creates kernels with the necessary input data """
    def __init__(self, Amplitude, random = None):
        self.Amplitude = Amplitude
        self.random = random
        if random != None:
            self.dims = list(random.shape)

    def Gaussian(self, X, Y, sigmaX, sigmaY, muX=0.0, muY=0.0):
        """ return a 2 dimensional Gaussian kernel """
        kernel = np.zeros([X, Y])
        theta = [self.Amplitude, muX, muY, sigmaX, sigmaY]
        for i in range(X):
            for j in range(Y):
                kernel[i][j] = integrate.dblquad(lambda x, y: G2(x + float(i) - (X-1.0)/2.0, \
                                                 y + float(j) - (Y-1.0)/2.0, theta), \
                                                 -0.5, 0.5, lambda y: -0.5, lambda y: 0.5)[0]
        return kernel
It just basically creates a bunch of convolution kernels (I've only included the first).
I want to add an instance (method?) to this class so that I can use something like
conv = Kernel(1.5)
conv.Gaussian(9, 9, 2, 2).kershow()
and have the array pop up using Matplotlib. I know how to write this instance and plot it with Matplotlib, but I don't know how to write this class so that for each method I would like to have this additional ability (i.e. .kershow()), I may call it in this manner.
I think I could use decorators ? But I've never used them before. How can I do this?
The name of the thing you're looking for is function or method chaining.
Strings are a really good example of this in Python. Because a string is immutable, each string method returns a new string. So you can call string methods on the return values, rather than storing the intermediate value. For example:
lower = ' THIS IS MY NAME: WAYNE '.lower()
without_left_padding = lower.lstrip()
without_right_padding = without_left_padding.rstrip()
title_cased = without_right_padding.title()
Instead you could write:
title_cased = ' THIS IS MY NAME: WAYNE '.lower().lstrip().rstrip().title()
Of course really you'd just do .strip().title(), but this is an example.
So if you want a .kershow() option, then you'll need to include that method on whatever you return. In your case, numpy arrays don't have a .kershow method, so you'll need to return something that does.
Your options are mostly:
A subclass of numpy arrays
A class that wraps the numpy array
I'm not sure what is involved with subclassing the numpy array, so I'll stick with the latter as an example. Either you can use the kernel class, or create a second class.
Alex provided an example of using your kernel class, but alternatively you could have another class like this:
class KernelPlotter(object):
    def __init__(self, kernel):
        self.kernel = kernel

    def kershow(self):
        # do the plotting here
        pass
Then you would pretty much follow your existing code, but rather than return kernel you would do return KernelPlotter(kernel).
Which option you choose really depends on what makes sense for your particular problem domain.
There's another sister to function chaining called a fluent interface that's basically function chaining but with the goal of making the interface read like English. For example you might have something like:
Kernel(with_amplitude=1.5).create_gaussian(with_x=9, and_y=9, and_sigma_x=2, and_sigma_y=2).show_plot()
Though obviously there can be some problems when writing your code this way.
Here's how I would do it:
class Kernel(object):
    def __init__ ...

    def Gaussian(...):
        self.kernel = ...
        ...
        return self  # not kernel

    def kershow(self):
        do_stuff_with(self.kernel)
Basically the Gaussian method doesn't return a numpy array, it just stores it in the Kernel object to be used elsewhere in the class. In particular kershow can now use it. The return self is optional but allows the kind of interface you wanted where you write
conv.Gaussian(9, 9, 2, 2).kershow()
instead of
conv.Gaussian(9, 9, 2, 2)
conv.kershow()
This is a design principle question for classes dealing with mathematical/physical equations where the user is allowed to set any parameter upon which the remaining are being calculated.
In this example I would like to be able to let the frequency be set as well while avoiding circular dependencies.
For example:
from traits.api import HasTraits, Float, Property
from scipy.constants import c, h

class Photon(HasTraits):
    wavelength = Float  # would like to do Property, but that would be circular?
    frequency = Property(depends_on = 'wavelength')
    energy = Property(depends_on = ['wavelength, frequency'])

    def _get_frequency(self):
        return c/self.wavelength

    def _get_energy(self):
        return h*self.frequency
I'm also aware of an update trigger timing problem here, because I don't know the sequence in which the updates will be triggered:
Wavelength is being changed
That triggers an update of both dependent entities: frequency and energy
But energy needs frequency to be updated first, so that energy has the value fitting the new wavelength!
(The answer to be accepted should also address this potential timing problem.)
So, what's the best design pattern to get around these inter-dependency problems?
In the end I want the user to be able to update either wavelength or frequency, and frequency/wavelength and energy shall be updated accordingly.
Problems of this kind of course arise in basically all classes that try to deal with equations.
Let the competition begin! ;)
Thanks to Adam Hughes and Warren Weckesser from the Enthought mailing list I realized what I was missing in my understanding.
Properties do not really exist as an attribute. I now look at them as something like a 'virtual' attribute that completely depends on what the writer of the class does at the time a _getter or _setter is called.
So when I would like to be able to set wavelength AND frequency by the user, I only need to understand that frequency itself does not exist as an attribute and that instead at _setting time of the frequency I need to update the 'fundamental' attribute wavelength, so that the next time the frequency is required, it is calculated again with the new wavelength!
I also need to thank the user sr2222 who made me think about the missing caching. I realized that the dependencies I set up by using the keyword 'depends_on' are only required when using the 'cached_property' Trait. If the cost of calculation is not that high or it's not executed that often, the _getters and _setters take care of everything that one needs and one does not need to use the 'depends_on' keyword.
Here is the streamlined solution I was looking for, which allows me to set either wavelength or frequency without circular loops:
class Photon(HasTraits):
    wavelength = Float
    frequency = Property
    energy = Property

    def _wavelength_default(self):
        return 1.0

    def _get_frequency(self):
        return c/self.wavelength

    def _set_frequency(self, freq):
        self.wavelength = c/freq

    def _get_energy(self):
        return h*self.frequency
One would use this class like this:
photon = Photon(wavelength = 1064)
or
photon = Photon(frequency = 300e6)
to set the initial values and to get the energy now, one just uses it directly:
print(photon.energy)
Please note that the _wavelength_default method takes care of the case where the user initializes the Photon instance without providing an initial value. This method is used only on the first access of wavelength, to determine its value. If I did not do this, the first access of frequency would result in a division by zero.
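The same "one fundamental attribute, everything else virtual" idea can be sketched without Traits, using Python's built-in property. This is only an illustration under stated assumptions: the constants C and H are hard-coded here instead of being imported from scipy.constants, and the constructor signature is made up for the example.

```python
C = 299792458.0      # speed of light in m/s (assumed constant for this sketch)
H = 6.62607015e-34   # Planck constant in J*s (assumed constant for this sketch)


class Photon:
    def __init__(self, wavelength=1.0, frequency=None):
        # wavelength is the only value that is actually stored
        self.wavelength = wavelength
        if frequency is not None:
            self.frequency = frequency  # routed through the setter below

    @property
    def frequency(self):
        return C / self.wavelength

    @frequency.setter
    def frequency(self, freq):
        # setting the frequency really updates the fundamental attribute
        self.wavelength = C / freq

    @property
    def energy(self):
        return H * self.frequency


photon = Photon(frequency=300e6)
```

Because frequency and energy are recomputed from wavelength on every access, there is no update ordering to worry about; the trade-off is that nothing is cached.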
I would recommend teaching your application what can be derived from what. For example, a typical case is that you have a set of n variables, and any one of them can be derived from the rest. (You can model more complicated cases as well, of course, but I wouldn't do it until you actually run into such cases.)
This can be modeled like this:
import numbers

# assumed exception types; the original sketch used them without defining them
class UnknownVariable(KeyError):
    pass

class DuplicateVariable(KeyError):
    pass

# variable_derivations is a dictionary: variable_id -> function
# each function produces this variable's value given all the other variables as kwargs
class SimpleDependency:
    _registry = {}

    def __init__(self, variable_derivations):
        # every variable that has a derivation must already be registered
        unknown_variable_ids = variable_derivations.keys() - self._registry.keys()
        if unknown_variable_ids:
            raise UnknownVariable(next(iter(unknown_variable_ids)))
        self.variable_derivations = variable_derivations

    def register_variable(self, variable, variable_id):
        if variable_id in self._registry:
            raise DuplicateVariable(variable_id)
        self._registry[variable_id] = variable

    def update(self, updated_variable_id, new_value):
        if updated_variable_id not in self.variable_derivations:
            raise UnknownVariable(updated_variable_id)
        self._registry[updated_variable_id].assign(new_value)
        other_variable_ids = self.variable_derivations.keys() - {updated_variable_id}
        for variable_id in other_variable_ids:
            function = self.variable_derivations[variable_id]
            # pass every variable except the one being derived
            arguments = {var_id: self._registry[var_id]
                         for var_id in self.variable_derivations
                         if var_id != variable_id}
            self._registry[variable_id].assign(function(**arguments))

# numbers.Real is abstract: the remaining arithmetic methods must be
# implemented before this class can actually be instantiated
class FloatVariable(numbers.Real):
    def __init__(self, variable_id, variable_value = 0):
        self.variable_id = variable_id
        self.value = variable_value

    def assign(self, value):
        self.value = value

    def __float__(self):
        return self.value
This is just a sketch, I didn't test or think through every possible issue.
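For the simplest case, two variables where each can be derived from the other, the idea can be reduced to something like the following runnable sketch. The class name DerivedGroup and its interface are hypothetical and deliberately much smaller than the registry design above.

```python
class DerivedGroup:
    """Hypothetical helper: a group of values where each can be derived from the rest."""

    def __init__(self, values, derivations):
        unknown = derivations.keys() - values.keys()
        if unknown:
            raise KeyError(f"no initial value for: {sorted(unknown)}")
        self.values = dict(values)
        self.derivations = derivations

    def update(self, name, new_value):
        if name not in self.values:
            raise KeyError(name)
        self.values[name] = new_value
        # derive every other variable from a snapshot that already holds the new value
        snapshot = dict(self.values)
        for other, derive in self.derivations.items():
            if other != name:
                self.values[other] = derive(**snapshot)


C = 299792458.0  # speed of light in m/s (assumed constant for this sketch)

photon = DerivedGroup(
    values={"wavelength": 1.0, "frequency": C},
    derivations={
        # each function picks the keyword it needs and ignores the rest
        "wavelength": lambda frequency, **_: C / frequency,
        "frequency": lambda wavelength, **_: C / wavelength,
    },
)
photon.update("frequency", 300e6)
```

With more than two variables the snapshot may contain stale values for the variables that have not been re-derived yet, so a real implementation would need an explicit evaluation order.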
I have a class that holds the size and position of something I draw to the screen. I am using sqlalchemy with a sqlite database to persist these objects. However, the position is a 2D value (x and y) and I'd like to have a convenient way to access this as
MyObject.pos # preferred, simpler interface
# instead of:
MyObject.x
MyObject.y # inconvenient
I can use properties, but this solution isn't optimal since I cannot query based on the properties:
session.query(MyObject).filter(MyObject.pos==some_pos).all()
Is there some way to use collections or association proxies to get the behavior I want?
If you are using PostGIS (Geometry extended version of postgres), you can take advantage of that using GeoAlchemy, which allows you to define Column types in terms of geometric primitives available in PostGIS. One such data type is Point, which is just what it sounds like.
PostGIS is a bit more difficult to set up than vanilla PostgreSQL, but if you actually intend to do queries based on actual geometric terms, it's well worth the extra (mostly one time) trouble.
Another solution, using plain SQLAlchemy is to define your own column types with the desired semantics, and translate them at compile time to more primitive types supported by your database.
Actually, you could use a property, but not with the builtin property decorator. You'd have to work a little harder and create your own custom descriptor.
You probably want a point class. A decent option is actually to use a namedtuple, since you don't have to worry about proxying assignment of individual coordinates. The property gets assigned all or nothing.
Point = collections.namedtuple('Point', 'x y')
This would let us at least compare point values. The next step in writing the descriptor is to work through its methods. There are two methods to think about, __get__ and __set__, and with __get__ there are two situations: when it is called on an instance, you should handle actual point values, and when it is called on the class, you should turn it into a column expression.
What to return in that last case is a bit tricky. What we want is something that, when compared to a point, returns a column expression that equates the individual columns with the individual coordinates. We'll make one more class for that.
class PointColumnProxy(object):
    def __init__(self, x, y):
        ''' these x and y's are the actual sqlalchemy columns '''
        self.x, self.y = x, y

    def __eq__(self, pos):
        return sqlalchemy.and_(self.x == pos.x,
                               self.y == pos.y)
All that's left is to define the actual descriptor class.
class PointProperty(object):
    def __init__(self, x, y):
        ''' x and y are the names of the coordinate attributes '''
        self.x = x
        self.y = y

    def __set__(self, instance, value):
        assert type(value) == Point
        setattr(instance, self.x, value.x)
        setattr(instance, self.y, value.y)

    def __get__(self, instance, owner):
        if instance is not None:
            return Point(x=getattr(instance, self.x),
                         y=getattr(instance, self.y))
        else:  # called on the Class
            return PointColumnProxy(getattr(owner, self.x),
                                    getattr(owner, self.y))
which could be used thusly:
Base = sqlalchemy.ext.declarative.declarative_base()

class MyObject(Base):
    x = Column(Float)
    y = Column(Float)
    pos = PointProperty('x', 'y')
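The two situations for __get__ (called on an instance versus called on the class) can be seen in isolation with a database-free sketch. Everything here is an assumption for illustration: the Sprite class is made up, and the tuple returned for class-level access is just a stand-in for the column proxy used above.

```python
import collections

Point = collections.namedtuple("Point", "x y")


class PointProperty:
    def __init__(self, x_name, y_name):
        self.x_name = x_name
        self.y_name = y_name

    def __get__(self, instance, owner):
        if instance is None:
            # accessed on the class: return a stand-in for a column expression
            return ("class-level access", self.x_name, self.y_name)
        # accessed on an instance: return the actual point value
        return Point(getattr(instance, self.x_name), getattr(instance, self.y_name))

    def __set__(self, instance, value):
        x, y = value  # all-or-nothing assignment of both coordinates
        setattr(instance, self.x_name, x)
        setattr(instance, self.y_name, y)


class Sprite:
    pos = PointProperty("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


s = Sprite(1.0, 2.0)
s.pos = Point(3.0, 4.0)   # goes through __set__ and updates s.x and s.y
class_level = Sprite.pos  # goes through __get__ with instance=None
```

In the SQLAlchemy version, the class-level branch is where the query-time comparison object is produced, which is what makes filtering on MyObject.pos possible.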
Define your table with a PickleType column type. It will then automatically persist Python objects, as long as they are pickleable. A tuple is pickleable.
mytable = Table("mytable", metadata,
Column('pos', PickleType(),
...
)