I am trying to create a subclass of numpy.ndarray. It is very simple: just a numpy array with some extra attributes and methods that manipulate those attributes. For the most part it works fine; however, I have a problem when using reductions like np.sum.
First off, I have read both Subclassing ndarray and Zero-Rank Arrays.
It seems that when I create a subclass of ndarray it behaves differently with respect to zero-rank array -> scalar conversion.
In this example I just use the simplest possible derived class, one that doesn't actually do anything:
import numpy as np

class XArray(np.ndarray):
    pass

x = np.eye(2)
y = x.view(type=XArray)
print type(np.sum(x)), type(np.sum(y))
<type 'numpy.float64'> <type '__main__.XArray'>
The former is a numpy scalar, the latter is a zero-rank array of my subclass. Overriding __new__ and __array_finalize__ as documented in the array subclassing guide doesn't change this behavior.
First, my problem: this breaks object oriented-ness. XArray instances cannot be substituted for ndarray instances transparently without breaking lots of code.
I can fix this by overriding the __array_wrap__ method:
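class XArray(np.ndarray):
    def __array_wrap__(self, obj):
        if len(obj.shape) == 0:
            # zero-rank result: unwrap it into a numpy scalar
            return obj[()]
        else:
            # anything else: defer to the base class, passing self explicitly
            return np.ndarray.__array_wrap__(self, obj)

a = np.sum(np.eye(2).view(XArray))
print type(a)
<type 'numpy.float64'>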
I am fine with this, except for two questions:
Is this the right place to do this special case? I can't figure out where this conversion is happening for normal numpy arrays, so I can't tell where it should happen to my derived class.
Is this enough to make my subclass work, or am I going to continue having compatibility problems? Should I just abandon the idea of subclassing ndarray?
The goal here is to be 100% compatible with regular numpy arrays. It is OK and expected that some operations will lose the derived type information and return an ndarray base class. I am fine with that; I just can't have code written to operate on ndarrays break.
I have three questions, listed below.
1. For example, when we write arr = np.array([1,2,3]), what happens internally? Does the array function call the numpy.ndarray constructor to create this object, or is some other mechanism used?
2. The docs say: "New arrays can be constructed using the routines detailed in Array creation routines, and also by using the low-level ndarray constructor:"
ndarray(shape[, dtype, buffer, offset, …])
Does this internally call an __init__ constructor as in normal Python? If not, how is the constructor written so that it creates an ndarray object?
3. Do "routine" and "function" mean the same thing?
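For what it's worth, here is a small sketch of the two creation paths these questions mention. It uses only the np.array routine and the low-level np.ndarray(shape, dtype, buffer) constructor quoted above; the variable names are made up for illustration. As far as I understand the subclassing guide, ndarray does its setup in __new__ rather than in a Python-level __init__, but the sketch does not depend on that.

```python
import numpy as np

# The usual route: a creation routine (a plain function) returns an ndarray.
arr = np.array([1, 2, 3], dtype=np.int64)
print(type(arr) is np.ndarray)

# The low-level constructor allocates memory but does not initialize it,
# so the contents of `raw` are arbitrary until something is assigned.
raw = np.ndarray(shape=(3,), dtype=np.int64)
raw[:] = 0

# With buffer=..., the constructor wraps existing memory instead of allocating.
wrapped = np.ndarray(shape=(3,), dtype=np.int64, buffer=arr)
wrapped[0] = 10  # writes through to `arr`, because both share one buffer
print(arr[0])
```

In everyday code the creation routines are the ones to reach for; the constructor is mostly useful when you already have a buffer to wrap.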
If I have a custom python class, and use that in a numpy.ndarray, my array ends up with dtype 'O' (object), which is fine:
import numpy

class Test(object):
    """Dummy class
    """
    def __init__(self, value):
        self.value = value

    def __float__(self):
        return float(self.value)

arr = numpy.array([], dtype=Test)
This gives me array([], dtype=object), but how can I unwrap the dtype to check that the underlying type is Test?
This is easy when there are elements in the array, since I can use isinstance on any of the members, but when the array is empty, I am stumped. I hope that the underlying type is stored in the dtype somewhere...
You can't. Arrays aren't meant to hold non-primitive types efficiently; an object array is really no different from a (terribly slow) list. In fact, once you go object, you can put anything you want into the array:
numpy.array((Test(1), []), dtype=Test)  # works fine, dtype is object; the explicit dtype=Test does not fail, but it is not enforced either
As you can see, unless the elements are primitives that numpy can convert, no type enforcement is done.
Though I would not recommend an array at all, if you can guarantee that the array contains a single type, then
type(arr[0])
is really your only option (it depends on the shape, of course, and needs at least one element).
type(arr.reshape(-1)[0])
is shape independent, assuming the array is not heterogeneous. reshape(-1) normally returns a view, so it doesn't take extra memory.
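To make the limitation concrete, here is a minimal sketch of inspecting the element types of an object array. The helper name element_types is made up for this example, and Test is a trimmed copy of the class from the question. For an empty array the result is an empty set, which is exactly the case where nothing can be recovered from the dtype.

```python
import numpy as np

class Test(object):
    """Trimmed copy of the dummy class from the question."""
    def __init__(self, value):
        self.value = value

def element_types(arr):
    """Return the set of Python types stored in an object array (hypothetical helper)."""
    # reshape(-1) flattens any shape, so the loop sees every element
    return {type(item) for item in arr.reshape(-1)}

empty = np.array([], dtype=object)

full = np.empty((2, 1), dtype=object)
full[0, 0] = Test(1.0)
full[1, 0] = Test(2.0)

print(element_types(empty))
print(element_types(full))
```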
I'm creating a numpy array which is to be filled with objects of a particular class I've made. I'd like to initialize the array such that it will only ever contain objects of that class. For example, here's what I'd like to do, and what happens if I do it.
class Kernel:
    pass
>>> L = np.empty(4,dtype=Kernel)
TypeError: data type not understood
I can do this:
>>> L = np.empty(4,dtype=object)
and then assign each element of L as a Kernel object (or any other type of object). It would be so neat were I able to have an array of Kernels, though, from both a programming point of view (type checking) and a mathematical one (operations on sets of functions).
Is there any way for me to specify the data type of a numpy array using an arbitrary class?
If your Kernel class has a predictable amount of member data, then you could define a dtype for it instead of a class. For example, if it's parameterized by 9 floats and an int, you could do
kerneldt = np.dtype([('myintname', np.int32), ('myfloats', np.float64, 9)])
arr = np.empty(dims, dtype=kerneldt)  # dims is the desired array shape
You'll have to do some coercion to turn the records into objects of class Kernel every time you want to call methods on a single kernel, but that's one way to store the actual data in a NumPy array. If you only want to store references, then the object dtype is the best you can do without subclassing ndarray.
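A rough sketch of that coercion, assuming a Kernel that really is just one int plus nine floats. The attribute names order and coeffs and the methods from_record and store are invented for this example; only the dtype comes from the snippet above.

```python
import numpy as np

kerneldt = np.dtype([('myintname', np.int32), ('myfloats', np.float64, 9)])

class Kernel(object):
    """Hypothetical kernel: one int parameter and nine float parameters."""
    def __init__(self, order, coeffs):
        self.order = int(order)
        self.coeffs = np.array(coeffs, dtype=np.float64)  # private copy

    @classmethod
    def from_record(cls, record):
        # Build a Kernel object from one element of the structured array.
        return cls(record['myintname'], record['myfloats'])

    def store(self, arr, index):
        # Write this kernel's data back into the structured array.
        arr['myintname'][index] = self.order
        arr['myfloats'][index] = self.coeffs

arr = np.zeros(4, dtype=kerneldt)
Kernel(3, np.arange(9.0)).store(arr, 1)

k = Kernel.from_record(arr[1])
print(k.order, k.coeffs)
```

The array holds only raw numbers, so the Kernel objects are throwaway wrappers: changes made to k are not visible in arr until store is called again.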
It has to be a Numpy scalar type:
http://docs.scipy.org/doc/numpy/reference/arrays.scalars.html#arrays-scalars-built-in
or a subclass of ndarray:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.html#numpy.ndarray
As far as I know, enforcing a single type for elements in a numpy.ndarray has to be done manually (unless the array contains Numpy scalars): there is no built-in checking mechanism (your array has dtype=object). If you really want to enforce a single type, you have to subclass ndarray and implement the checks in the appropriate methods (__setitem__, etc.).
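As an illustration of what "implement the checks" could look like, here is a minimal sketch that covers only __setitem__. The names Kernel and KernelArray are placeholders, and this is not a complete solution: methods such as fill() or in-place operations do not go through __setitem__ and would need their own checks.

```python
import numpy as np

class Kernel(object):
    """Hypothetical stand-in for the asker's class."""
    def __init__(self, name):
        self.name = name

class KernelArray(np.ndarray):
    """Object array that only accepts Kernel instances on item assignment."""

    def __new__(cls, shape):
        # dtype=object: the array stores references to Python objects
        return np.ndarray.__new__(cls, shape, dtype=object)

    def __setitem__(self, index, value):
        # Check every object being assigned, whether one item or a sequence.
        for item in np.asarray(value, dtype=object).ravel():
            if not isinstance(item, Kernel):
                raise TypeError("expected Kernel, got %s" % type(item).__name__)
        np.ndarray.__setitem__(self, index, value)

kernels = KernelArray(3)
kernels[0] = Kernel("gauss")
kernels[1:] = [Kernel("box"), Kernel("tent")]

try:
    kernels[0] = 3.0
except TypeError as exc:
    print("rejected:", exc)
```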
If you want to implement operations on a set of functions (Kernel objects), you might be able to do so by defining the proper operations directly in your Kernel class. This is what I did for my uncertainties.py module, which handles numpy.ndarrays of numbers with uncertainties.
Basically I have a class which subclasses ndarray and has additional information. When I call np.asarray() on my object, it returns just the numpy array and destroys my additional information.
My question is then this: Is there a way in Python to change how np.asarray() acts on my subclass of ndarray from within my subclass? I don't want to change numpy of course, and I do not want to go through every instance where np.asarray() is called to take care of this.
Thanks in advance!
Chris
Short answer: No. Numpy's asarray() doesn't check whether, for example, a special method exists on the class of its argument, so it provides no way to override its behaviour.
Long answer: It's not possible from your subclass, but you can hotpatch the numpy module in your module level code to replace the asarray function with your own wrapper. This is a very hacky solution and I don't recommend it, but it may work for you.
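import numpy as np

_real_asarray = np.asarray

def _new_asarray(a, dtype=None, order=None):
    # MyClass stands for your ndarray subclass
    if isinstance(a, MyClass):
        # special handling here; as a placeholder, return the instance unchanged
        return a
    else:
        return _real_asarray(a, dtype, order)

np.asarray = _new_asarray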
No. Numpy's asarray() is coded to instantiate a regular numpy array, and you can't change that without editing asarray() or changing the caller's code to call your special method instead of asarray().
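One related point, in case you do control some of the calling code: numpy also ships np.asanyarray, which passes ndarray subclasses through instead of converting them to the base class. The sketch below only shows the difference; MyClass and its info attribute are placeholders for your subclass and its extra data, and it does nothing for third-party code that calls np.asarray.

```python
import numpy as np

class MyClass(np.ndarray):
    """Placeholder subclass carrying one extra attribute."""
    def __array_finalize__(self, obj):
        # copy the extra information from the source array, if it has any
        self.info = getattr(obj, 'info', None)

a = np.arange(3).view(MyClass)
a.info = 'extra'

plain = np.asarray(a)    # base-class ndarray, the extra attribute is gone
kept = np.asanyarray(a)  # subclass instance passed through as-is

print(type(plain).__name__, type(kept).__name__)
```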
[Python 3]
I like ndarray but I find it annoying to use.
Here's one problem I face. I want to write a class Array that inherits much of the functionality of ndarray but can be instantiated in only one way: as a zero-filled array of a certain size. I was hoping to write:
class Array(numpy.ndarray):
    def __init__(self, size):
        pass  # What to do here?
I'd like to call super().__init__ with some parameters to create a zero-filled array, but it won't work since ndarray uses a global function numpy.zeros (rather than a constructor) to create a zero-filled array.
Questions:
Why does ndarray use global (module) functions instead of constructors in many cases? It is a big annoyance if I'm trying to reuse them in an object-oriented setting.
What's the best way to define class Array that I need? Should I just manually populate ndarray with zeroes, or is there any way to reuse the zeros function?
Why does ndarray use global (module) functions instead of constructors in many cases?
To be compatible/similar to Matlab, where functions like zeros or ones originally came from.
Global factory functions are quick to write and easy to understand. What should the semantics of a constructor be, e.g. how would you express a simple zeros or empty or ones with one single constructor? In fact, such factory functions are quite common, also in other programming languages.
What's the best way to define class Array that I need?
import numpy

class Array(numpy.ndarray):
    def __new__(cls, size):
        # pass cls (not Array) so that further subclasses get their own type
        result = numpy.ndarray.__new__(cls, size)
        result.fill(0)
        return result

arr = Array(5)

def test(a):
    print type(a), a

test(arr)
test(arr[2:4])
test(arr.view(int))
arr[2:4] = 5.5
test(arr)
test(arr[2:4])
test(arr.view(int))
Note that this is Python 2, but it would require only small modifications to work with Python 3.
If you don't like the ndarray interface, then don't inherit from it. You can define your own interface and delegate the rest to ndarray and numpy.
import functools
import numpy as np

class Array(object):
    def __init__(self, size):
        self._array = np.zeros(size)

    def __getattr__(self, attr):
        # delegate to the wrapped ndarray first
        try:
            return getattr(self._array, attr)
        except AttributeError:
            # extend interface to all functions from numpy
            f = getattr(np, attr, None)
            if callable(f):
                return functools.partial(f, self._array)
            else:
                raise AttributeError(attr)

    def allzero(self):
        return np.allclose(self._array, 0)

a = Array(10)
# ndarray has no 'sometrue()' method; np.sometrue() does the same as the 'any()' method it does have
# (np.sometrue is deprecated in recent NumPy releases, so this line may need another module-level function there)
assert a.sometrue() == a.any() == False
assert a.allzero()
try:
    a.non_existent
except AttributeError:
    pass
else:
    assert 0
Inheriting from ndarray is a little bit tricky. ndarray does not define its own __init__(self, ...) method (construction happens in __new__), so there is nothing useful to call from a subclass, but there are reasons for that. Please see the numpy documentation on subclassing.
By the way, could you be more specific about your particular needs? It's still quite easy to cook up a class (utilizing ndarray) for your own needs, but a subclass of ndarray that passes through all the numpy machinery is quite a different issue.
It seems that I can't comment on my own post, odd.
@Philipp: It will be called by Python, but not by numpy. There are three ways to instantiate an ndarray, and guidelines for handling all of them are given in that doc.
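As I read the subclassing guide, the three ways are explicit construction, view casting, and new-from-template (slicing, for example), and __array_finalize__ is the one hook that sees all of them. A minimal sketch follows; the class name Tagged, its tag attribute, and the 'from-view' default are invented for illustration.

```python
import numpy as np

class Tagged(np.ndarray):
    """Hypothetical subclass carrying one extra attribute, `tag`."""

    def __new__(cls, shape, tag=None):
        # 1. Explicit construction: Tagged(shape, tag=...)
        obj = np.ndarray.__new__(cls, shape)
        obj.tag = tag
        return obj

    def __array_finalize__(self, obj):
        # Runs for every new instance, however it was created.
        if obj is None:
            # Explicit construction; __new__ sets `tag` right afterwards.
            return
        # 2. View casting (obj is some other array), or
        # 3. new-from-template such as slicing (obj is a Tagged instance).
        self.tag = getattr(obj, 'tag', 'from-view')

explicit = Tagged(3, tag='explicit')
viewed = np.zeros(3).view(Tagged)
sliced = explicit[1:]

print(explicit.tag, viewed.tag, sliced.tag)
```

An __init__ defined on such a subclass would run only for the first of the three cases, which is why the guide steers you toward __new__ and __array_finalize__ instead.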