Override .T (transpose) in subclass of numpy ndarray - python

I have a three-dimensional dataset where the 1st dimension gives the type of the variable and the 2nd and 3rd dimensions are spatial indexes. I am attempting to make this data more user-friendly by creating a subclass of ndarray containing the data, but with sensibly named attributes that point to the appropriate variable dimension. One of the variable types is temperature, which I would like to represent with the attribute .T. I attempt to set it like this:
self.T = self[8,:,:]
However, this clashes with the underlying numpy attribute for transposing an array. Normally, overriding a class attribute is trivial; however, in this case I get an exception when I try to rewrite the attribute. The following is a minimal example of the same problem:
import numpy as np

class foo(np.ndarray):
    def __new__(cls, input_array):
        obj = np.asarray(input_array).view(cls)
        obj.T = 100.0
        return obj

foo([1,2,3,4])
results in:
Traceback (most recent call last):
  File "tmp.py", line 9, in <module>
    foo([1,2,3,4])
  File "tmp.py", line 6, in __new__
    obj.T = 100.0
AttributeError: attribute 'T' of 'numpy.ndarray' objects is not writable
I have tried using setattr(obj, 'T', 100.0) to set the attribute, but the result is the same.
Obviously, I could just give up and name my attribute .temperature, or something else. However, .T will be much more elegant in the subsequent mathematical expressions that will be written with these data objects. How can I force Python/NumPy to let me override this attribute?

For the np.matrix subclass, T is defined in np.matrixlib.defmatrix as:

@property
def T(self):
    """
    Returns the transpose of the matrix.
    ....
    """
    return self.transpose()

T is not a conventional attribute that lives in a __dict__ or __slots__. In fact, you can see this immediately because the result of T changes if you modify the shape or contents of an array.
Since ndarray is a class written in C, it has special descriptors for the dynamic attributes it exposes. T is one of these dynamic attributes, defined as a PyGetSetDef structure. You can't override it by simple assignment, because there is nothing to assign to, but you can make a descriptor that overrides it at the class level.
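You can verify this in the interpreter: on the class, T is a getset descriptor rather than a stored value:

>>> import numpy as np
>>> np.ndarray.T
<attribute 'T' of 'numpy.ndarray' objects>
>>> type(np.ndarray.T)
<class 'getset_descriptor'>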
As @hpaulj's answer suggests, the simplest solution may be to use a property to implement the descriptor protocol for you:

import numpy as np

class foo(np.ndarray):
    @property
    def T(self):
        return self[8, :, :]
More complicated alternatives would be to make your own descriptor type, or even to extend the class in C and write your own PyGetSetDef structure. It all depends on what you are trying to achieve.
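For illustration, a minimal sketch of the hand-rolled descriptor alternative (the class name TemperatureDescriptor is made up for this example):

import numpy as np

class TemperatureDescriptor(object):
    """Descriptor that shadows ndarray.T at the class level."""
    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj[8, :, :]

class foo(np.ndarray):
    T = TemperatureDescriptor()

    def __new__(cls, input_array):
        return np.asarray(input_array).view(cls)

Because attribute lookup finds foo's class dictionary before ndarray's, this descriptor wins over the built-in getset descriptor.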

Following Mad Physicist and hpaulj's lead, the solution to my minimal working example is:
import numpy as np

class foo(np.ndarray):
    def __new__(cls, input_array):
        obj = np.asarray(input_array).view(cls)
        return obj

    @property
    def T(self):
        return 100.0

x = foo([1,2,3,4])
print("T is", x.T)
Which results in:
T is 100.0
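Applied back to the dataset in the question, a minimal sketch (the class name DataCube is hypothetical; index 8 along the first axis is the temperature slot from the question):

import numpy as np

class DataCube(np.ndarray):
    def __new__(cls, input_array):
        return np.asarray(input_array).view(cls)

    @property
    def T(self):
        # axis 0 selects the variable type; slot 8 holds temperature
        return self[8, :, :]

Note that this shadows ndarray's transpose on DataCube instances; the .transpose() method is still available if you need it.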

Related

Initializing a subclass on condition from the parent class initialisation

I'm working with matrices for a project I'm writing in Python. I know that a lot of libraries already exist for manipulating matrices but I'm writing my own so I know exactly what's going on under the hood.
So I have a Matrix base class and a Vector subclass. Both work as expected individually, but I'd like a Matrix to become a Vector if it is initialized with a single row or column.
I tried something like self = Vector(...) when the Matrix is initialized with the right size. But that doesn't seem to affect the object. I also thought of calling the __init__() method of the Vector class but that doesn't suffice because what I want most importantly are the Vector's methods.
Is there a pythonic way of dealing with a situation like this?
This can be done, although it might not be the best way to do it. After all, if the Matrix class is instantiated, one expects the result to be a Matrix instance.
One way of achieving that is to customize the constructor of the Matrix class:
class Matrix:
    def __new__(cls, nrows, ncols):
        if nrows == 1:
            inst = super(Matrix, cls).__new__(Vector)
        else:
            inst = super(Matrix, cls).__new__(cls)
        inst.nrows = nrows
        inst.ncols = ncols
        return inst

    def __repr__(self):
        return '{}(nrows={}, ncols={})'.format(
            self.__class__.__name__, self.nrows, self.ncols)
Demo:
>>> Matrix(2, 5)
Matrix(nrows=2, ncols=5)
>>> Matrix(1, 5)
Vector(nrows=1, ncols=5)
Mind that instances are actually created inside the __new__() method, while __init__() is used for initializing the newly created instance.
Also, as mentioned in a comment below by @Blckknght, creating a Vector instance through the Matrix class can lead to unwanted surprises, such as the Vector's __init__() method not getting called (it would have to be called manually).
Depending on your use case, though, it might thus be better to keep things clean and just use a factory for instance creation:
class Matrix:
    def __init__(self, nrows, ncols):
        self.nrows = nrows
        self.ncols = ncols

    def __repr__(self):
        return '{}(nrows={}, ncols={})'.format(
            self.__class__.__name__, self.nrows, self.ncols)

class Vector(Matrix):
    pass

def make_matrix(nrows, ncols):
    if nrows == 1:
        return Vector(nrows, ncols)
    return Matrix(nrows, ncols)
Demo:
>>> make_matrix(1, 5)
Vector(nrows=1, ncols=5)
>>> make_matrix(2, 5)
Matrix(nrows=2, ncols=5)
Of course make_matrix() could also be implemented as a (class/static) method of the Matrix class, but that would make the parent class more tightly coupled with one of its child classes...
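For completeness, a sketch of that classmethod variant (the method name create is made up):

class Matrix:
    def __init__(self, nrows, ncols):
        self.nrows = nrows
        self.ncols = ncols

    @classmethod
    def create(cls, nrows, ncols):
        # dispatch to Vector for single-row matrices
        if nrows == 1:
            return Vector(nrows, ncols)
        return cls(nrows, ncols)

class Vector(Matrix):
    pass

Matrix.create(1, 5) then returns a Vector, at the cost of the tighter coupling mentioned above.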

Method changes both instances even if applied to only one of them

I'm struggling to understand why my simple code behaves like this. I create two instances, a and b, that take an array as argument. Then I define a method to change one instance's array, but both get changed. Any idea why this happens, and how can I avoid the method changing the other instance?
import numpy as np

class Test:
    def __init__(self, arg):
        self.arg = arg

    def change(self, i, j, new):
        self.arg[i][j] = new

array = np.array([[11,12,13]])
a = Test(array)
b = Test(array)

# prints the same, as expected
print(a.arg)
print(b.arg)
print()

a.change(0,0,3)

# still prints the same, even though I did
# not change b.arg
print(a.arg)
print(b.arg)
Because you assigned the same array object to both instances: a.arg and b.arg are references to one array, not copies of it. You can use np.array(x, copy=True) or x.copy() to generate a new array object:
array = np.array([[11,12,13]])
a = Test(array.copy())
b = Test(np.array(array, copy=True))
Alternatively, if your arg is always a np.array, you could make the copy in the __init__ method (as noted by roganjosh in the comments):

class Test:
    def __init__(self, arg):
        self.arg = np.array(arg, copy=True)
    ...
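With either fix in place, mutating one instance no longer affects the other:

>>> a = Test(array.copy())
>>> b = Test(array.copy())
>>> a.change(0, 0, 3)
>>> a.arg
array([[ 3, 12, 13]])
>>> b.arg
array([[11, 12, 13]])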

extending built-in python dict class

I want to create a class that would extend dict's functionalities. This is my code so far:
class Masks(dict):
    def __init__(self, positive=[], negative=[]):
        self['positive'] = positive
        self['negative'] = negative
I want the constructor to take two predefined arguments: a list of positive and a list of negative masks. With this class I can run
m = Masks()
and a new Masks dictionary object is created - that's fine. But I'd like to be able to create these mask objects just like I can create dicts:
d = dict(one=1, two=2)
But this fails with Masks:
>>> n = Masks(one=1, two=2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: __init__() got an unexpected keyword argument 'two'
I should probably call the parent's __init__ somewhere in Masks.__init__. I tried accepting **kwargs and passing them to the parent constructor, but something still went wrong. Could someone point out what I should add here?
You must call the superclass __init__ method. And if you want to be able to use the Masks(one=1, ..) syntax then you have to use **kwargs:
In [1]: class Masks(dict):
   ...:     def __init__(self, positive=(), negative=(), **kwargs):
   ...:         super(Masks, self).__init__(**kwargs)
   ...:         self['positive'] = list(positive)
   ...:         self['negative'] = list(negative)
   ...:
In [2]: m = Masks(one=1, two=2)
In [3]: m['one']
Out[3]: 1
A general note: do not subclass built-ins!!!
It seems like an easy way to extend them, but it has a lot of pitfalls that will bite you at some point.
A safer way to extend a built-in is to use delegation, which gives better control over the subclass's behaviour and avoids many of the pitfalls of inheriting from built-ins. (Note that by implementing __getattr__ it is possible to avoid explicitly reimplementing many methods; see the sketch below.)
Inheritance should be used as a last resort when you want to pass the object into some code that does explicit isinstance checks.
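A minimal sketch of the delegation approach for this case (the internal attribute name _data is arbitrary):

class Masks(object):
    def __init__(self, positive=(), negative=(), **kwargs):
        self._data = dict(kwargs)
        self._data['positive'] = list(positive)
        self._data['negative'] = list(negative)

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

    def __getattr__(self, name):
        # fall back to the wrapped dict for keys(), items(), get(), ...
        return getattr(self._data, name)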
Since all you want is a regular dict with predefined entries, you can use a factory function.
def mask(*args, **kw):
    """Create a mask dict using the same signature as dict(),
    defaulting 'positive' and 'negative' to empty lists.
    """
    d = dict(*args, **kw)
    d.setdefault('positive', [])
    d.setdefault('negative', [])
    return d
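For example:

>>> m = mask(one=1, two=2)
>>> m['one']
1
>>> m['positive']
[]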

How to implement an array-like property wrapper in python?

I have a class in Python that acts as a front-end to a C library. This library performs simulations and handles very large arrays of data. It passes back a ctypes array, and my wrapper converts it into a proper numpy.ndarray.
class SomeClass(object):
    @property
    def arr(self):
        return numpy.array(self._lib.get_arr())
To make sure that memory problems don't occur, I keep the ndarray data separate from the library data, so changing the ndarray does not change the true array held by the library. I can, however, pass along a new array of the same shape and overwrite the library's held array.
@arr.setter
def arr(self, new_arr):
    self._lib.set_arr(new_arr.ctypes)
So, I can interact with the array like so:
x = SomeClass()
a = x.arr
a[0] += 1
x.arr = a
My desire is to simplify this even more by allowing the syntax to simply be x.arr[0] += 1, which would be more readable and use fewer variables. I am not exactly sure how to go about creating such a wrapper (I have very little experience writing wrapper classes/functions) that mimics a property but allows item access as in my example.
How would I go about making such a wrapper class? Is there a better way to accomplish this goal? If you have any advice or reading that could help I would appreciate it very much.
This could work. Array is a proxy for the NumPy/C array:

class Array(object):
    def __init__(self):
        # self._lib = ...  (set up the library handle here)
        self.np_array = numpy.array(self._lib.get_arr())

    def __getitem__(self, key):
        # re-read from the library so the view is current
        self.np_array = numpy.array(self._lib.get_arr())
        return self.np_array.__getitem__(key)

    def __setitem__(self, key, value):
        self.np_array.__setitem__(key, value)
        # push the modified array back to the library
        self._lib.set_arr(self.np_array.ctypes)

    def __getattr__(self, name):
        """Delegate to the NumPy array."""
        try:
            return getattr(self.np_array, name)
        except AttributeError:
            raise AttributeError(
                "'Array' object has no attribute {}".format(name))
Should behave like this:
>>> a = Array()
>>> a[1]
1
>>> a[1] = 10
>>> a[1]
10
The 10 should end up in your C array too.
I think your property should return an instance of a list-like class that knows about self._lib and updates it during normal operations: append, __setitem__, __getitem__, etc.
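Putting the pieces together, the property on the front-end class could hand out the proxy - a sketch, assuming Array is modified to take the library handle in its constructor:

import numpy

class SomeClass(object):
    def __init__(self, lib):
        self._lib = lib
        self._arr_proxy = Array(lib)  # assumes Array.__init__ accepts lib

    @property
    def arr(self):
        return self._arr_proxy

With that, x.arr[0] += 1 reads the current array from the library via the proxy's __getitem__, modifies it, and writes it back via __setitem__.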

How do I downcast in python

I have two classes - one which inherits from the other. I want to know how to cast to (or create a new variable of) the subclass. I have searched around a bit, and mostly 'downcasting' like this seems to be frowned upon; there are some slightly dodgy workarounds, like setting instance.__class__, though this doesn't seem like a nice way to go.
eg.
http://www.gossamer-threads.com/lists/python/python/871571
http://code.activestate.com/lists/python-list/311043/
Sub-question: is downcasting really that bad? If so, why?
I have a simplified code example below - basically I have some code that creates a Peak object after having done some analysis of x, y data. Outside this code I know that the data is PSD (power spectral density) data, so it has some extra attributes. How do I downcast from Peak to Psd_Peak?
"""
Two classes
"""
import numpy as np
class Peak(object) :
"""
Object for holding information about a peak
"""
def __init__(self,
index,
xlowerbound = None,
xupperbound = None,
xvalue= None,
yvalue= None
):
self.index = index # peak index is index of x and y value in psd_array
self.xlowerbound = xlowerbound
self.xupperbound = xupperbound
self.xvalue = xvalue
self.yvalue = yvalue
class Psd_Peak(Peak) :
"""
Object for holding information about a peak in psd spectrum
Holds a few other values over and above the Peak object.
"""
def __init__(self,
index,
xlowerbound = None,
xupperbound = None,
xvalue= None,
yvalue= None,
depth = None,
ampest = None
):
super(Psd_Peak, self).__init__(index,
xlowerbound,
xupperbound,
xvalue,
yvalue)
self.depth = depth
self.ampest = ampest
self.depthresidual = None
self.depthrsquared = None
def peakfind(xdata,ydata) :
'''
Does some stuff.... returns a peak.
'''
return Peak(1,
0,
1,
.5,
10)
# Find a peak in the data.
p = peakfind(np.random.rand(10),np.random.rand(10))
# Actually the data i used was PSD -
# so I want to add some more values tot he object
p_psd = ????????????
edit
Thanks for the contributions.... I'm afraid I was feeling rather downcast (geddit?) since the answers thus far seem to suggest I spend time hand-coding converters from one class type to another. I have come up with a more automatic way of doing this: basically looping through the attributes of the class and transferring them one by one. How does this smell to people - is it a reasonable thing to do, or does it spell trouble ahead?
def downcast_convert(ancestor, descendent):
    """
    automatic downcast conversion.....
    (NOTE - not type-safe -
    if ancestor isn't a super class of descendent, it may well break)
    """
    for name, value in vars(ancestor).items():  # .iteritems() on Python 2
        # print("setting descendent", name, ":", value)
        setattr(descendent, name, value)
    return descendent
You don't actually "cast" objects in Python. Instead you generally convert them -- take the old object, create a new one, throw the old one away. For this to work, the class of the new object must be designed to take an instance of the old object in its __init__ method and do the appropriate thing (sometimes, if a class can accept more than one kind of object when creating it, it will have alternate constructors for that purpose).
You can indeed change the class of an instance by pointing its __class__ attribute to a different class, but that class may not work properly with the instance. Furthermore, this practice is IMHO a "smell" indicating that you should probably be taking a different approach.
In practice, you almost never need to worry about types in Python. (With obvious exceptions: for example, trying to add two objects. Even in such cases, the checks are as broad as possible; here, Python would check for a numeric type, or a type that can be converted to a number, rather than a specific type.) Thus it rarely matters what the actual class of an object is, as long as it has the attributes and methods that whatever code is using it needs.
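For the classes in this question, that conversion could be expressed as an alternate constructor on the subclass - a sketch (the name from_peak is made up; it assumes the Psd_Peak definition from the question):

class Psd_Peak(Peak):
    # ... __init__ exactly as in the question ...

    @classmethod
    def from_peak(cls, peak, depth=None, ampest=None):
        """Build a Psd_Peak from an existing Peak plus the extra PSD fields."""
        return cls(peak.index,
                   peak.xlowerbound,
                   peak.xupperbound,
                   peak.xvalue,
                   peak.yvalue,
                   depth=depth,
                   ampest=ampest)

p_psd = Psd_Peak.from_peak(p, depth=111, ampest=222)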
See the following example. Also, be sure to obey the LSP (Liskov Substitution Principle).
class ToBeCastedObj:
    def __init__(self, *args, **kwargs):
        pass  # whatever you want to state

    # original methods
    # ...

class CastedObj(ToBeCastedObj):
    def __init__(self, *args, **kwargs):
        pass  # whatever you want to state

    @classmethod
    def cast(cls, to_be_casted_obj):
        casted_obj = cls()
        casted_obj.__dict__ = to_be_casted_obj.__dict__
        return casted_obj

    # new methods you want to add
    # ...
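Usage then looks like:

>>> obj = ToBeCastedObj()
>>> obj.x = 1
>>> casted = CastedObj.cast(obj)
>>> casted.x
1
>>> isinstance(casted, CastedObj)
True

Note that cast() makes the two objects share one __dict__, so later attribute changes on either are visible on both.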
This isn't a downcasting problem (IMHO). peakfind() creates a Peak object - it can't be downcast, because it's not a Psd_Peak object - and later you want to create a Psd_Peak object from it. In something like C++, you'd likely rely on the default copy constructor - but that won't work, even in C++, because your Psd_Peak class requires more parameters in its constructor. In any case, Python doesn't have a copy constructor, so you end up with the rather verbose (fred=fred, jane=jane) stuff.
A good solution may be to create an object factory: pass the type of Peak object you want to peakfind() and let it create the right one for you.
def peak_factory(peak_type, index, *args, **kw):
    """Create Peak objects

    peak_type   Type of peak object wanted
                (you could list types)
    index       index
                (you could list params for the various types)
    """
    # optionally sanity check parameters here
    # create object of desired type and return
    return peak_type(index, *args, **kw)

def peakfind(peak_type, xdata, ydata, **kw):
    # do some stuff...
    return peak_factory(peak_type,
                        1,
                        0,
                        1,
                        .5,
                        10,
                        **kw)
# Find a peak in the data.
p = peakfind(Psd_Peak, np.random.rand(10), np.random.rand(10), depth=111, ampest=222)
