Selecting one element from each innermost dimension with numpy - python

I have a three dimensional numpy source array and a two-dimensional numpy array of indexes.
For example:
src = np.array([[[1,2,3],[4,5,6]],
[[7,8,9],[10,11,12]]])
idx = np.array([[0,1],
[1,2]])
I'd like to get a 2d array, where each element represents the indexed value in the innermost dimension in that position:
array([[1,5],
[8,12]])
How do I do this with numpy?

You can try np.take, here is the documentation.
However, you should count the index of the array after flattening all the elements. For example you should use
src = np.array([[[1,2,3],[4,5,6]],
[[7,8,9],[10,11,12]]])
idx = np.array([[0,4],
[7,11]])
# Wanted result
res = np.take(src, idx)
where src was regarded as [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
You can also try np.take_along_axis, here is the documentation.
Using this method need your src and idx in same dimension, therefore, you should first unsqueezed the src and squeeze the res.
# Unsqueezed the last dim
idx = np.expand_dims(idx, axis=-1)
# Squeeze the last dim
res = np.take_along_axis(src, idx, axis=2).squeeze(-1)

You can use the np.choose method with a little reshaping:
np.choose(idx.reshape((1, 2, 2)), src.transpose()).reshape((2, 2))
>>>> array([[ 1, 8],
[ 5, 12]])

Direct indexing:
src[np.arange(2)[:, None], np.arange(2), idx]

Related

Complex indexing of a multidimensional array with indices of lower dimensional arrays in Python

Problem:
I have a numpy array of 4 dimensions:
x = np.arange(1000).reshape(5, 10, 10, 2 )
If we print it:
I want to find the indices of the 6 largest values of the array in the 2nd axis but only for the 0th element in the last axis (red circles in the image):
indLargest2ndAxis = np.argpartition(x[...,0], 10-6, axis=2)[...,10-6:]
These indices have a shape of (5,10,6) as expected.
I want to obtain the values of the array for these indices in the 2nd axis but now for the 1st element in the last axis (yellow circles in the image). They should have a shape of (5,10,6). Without vectorizing, this could be done with:
np.array([ [ [ x[i, j, k, 1] for k in indLargest2ndAxis[i,j]] for j in range(10) ] for i in range(5) ])
However, I would like to achieve it vectorizing. I tried indexing with:
x[indLargest2ndAxis, 1]
But I get IndexError: index 5 is out of bounds for axis 0 with size 5. How can I manage this indexing combination in a vectorized way?
Ah, I think I now get what you are after. Fancy indexing is documented here in detail. Be warned though that - in its full generality - this is quite heavy stuff. In a nutshell, fancy indexing allows you to take elements from a source array (according to some idx) and place them into a new array (fancy indexing allways returns a copy):
source = np.array([10.5, 21, 42])
idx = np.array([0, 1, 2, 1, 1, 1, 2, 1, 0])
# this is fancy indexing
target = source[idx]
expected = np.array([10.5, 21, 42, 21, 21, 21, 42, 21, 10.5])
assert np.allclose(target, expected)
What is nice about this is that you can control the shape of the resulting array using the shape of the index array:
source = np.array([10.5, 21, 42])
idx = np.array([[0, 1], [1, 2]])
target = source[idx]
expected = np.array([[10.5, 21], [21, 42]])
assert np.allclose(target, expected)
assert target.shape == (2,2)
Where things get a little more interesting is if source has more than one dimension. In this case, you need to specify the indices of each axis so that numpy knows which elements to take:
source = np.arange(4).reshape(2,2)
idxA = np.array([0, 1])
idxB = np.array([0, 1])
# this will take (0,0) and (1,1)
target = source[idxA, idxB]
expected = np.array([0, 3])
assert np.allclose(target, expected)
Observe that, again, the shape of target matches the shape of the index used. What is awesome about fancy indexing is that index shapes are broadcasted if necessary:
source = np.arange(4).reshape(2,2)
idxA = np.array([0, 0, 1, 1]).reshape((4,1))
idxB = np.array([0, 1]).reshape((1,2))
target = source[idxA, idxB]
expected = np.array([[0, 1],[0, 1],[2, 3],[2, 3]])
assert np.allclose(target, expected)
At this point, you can understand where your exception comes from. Your source.ndim is 4; however, you try to index it with a 2-tuple (indLargest2ndAxis, 1). Numpy will interpret this as you trying to index the first axis using indLargest2ndAxis, the second axis using 1, and all other axis using :. Clearly, this doesn't work. All values of indLargest2ndAxis would have to be between 0 and 4 (inclusive), since they would have to refer to positions along the first axis of x.
What my suggestion of x[..., indLargest2ndAxis, 1] does is tell numpy that you wish to index the last two axes of x, i.e., you wish to index the third axis using indLargest2ndAxis, the fourth axis using 1, and : for anything else.
This will produce a result since all elements of indLargest2ndAxis are in [0, 10), but will produce a shape of (5, 10, 5, 10, 6) (which is not what you want). Being a bit hand-wavy, the first part of the shape (5, 10) comes from the ellipsis (...), aka. select everything, the middle part (5, 10, 6) comes from indLargest2ndAxis selecting elements along the third axis of x according to the shape of indLargest2ndAxis and the final part (which you don't see because it is squeezed) comes from selecting index 1 along the fourth axis.
Moving on to your actual problem, you can entirely dodge the fancy indexing bullet and do the following:
x = np.arange(1000).reshape(5, 10, 10, 2)
order = x[..., 0]
values = x[..., 1]
idx = np.argpartition(order, 4)[..., 4:]
result = np.take_along_axis(values, idx, axis=-1)
Edit: Of course, you can also use fancy indexing; however, it is more cryptic and doesn't scale as nicely to different shapes:
x = np.arange(1000).reshape(5, 10, 10, 2)
indLargest2ndAxis = np.argpartition(x[..., 0], 4)[..., 4:]
result = x[np.arange(5)[:, None, None], np.arange(10)[None, :, None], indLargest2ndAxis, 1]

Setting results of torch.gather(...) calls

I have a 2D pytorch tensor of shape n by m. I want to index the second dimension using a list of indices (which could be done with torch.gather) then then also set new values to the result of the indexing.
Example:
data = torch.tensor([[0,1,2], [3,4,5], [6,7,8]]) # shape (3,3)
indices = torch.tensor([1,2,1], dtype=torch.long).unsqueeze(-1) # shape (3,1)
# data tensor:
# tensor([[0, 1, 2],
# [3, 4, 5],
# [6, 7, 8]])
I want to select the specified indices per row (which would be [1,5,7] but then also set these values to another number - e.g. 42
I can select the desired columns row wise by doing:
data.gather(1, indices)
tensor([[1],
[5],
[7]])
data.gather(1, indices)[:] = 42 # **This does NOT work**, since the result of gather
# does not use the same storage as the original tensor
which is fine, but I would like to change these values now, and have the change also affect the data tensor.
I can do what I want to achieve using this, but it seems to be very un-pythonic:
max_index = torch.max(indices)
for i in range(0, max_index + 1):
mask = (indices == i).nonzero(as_tuple=True)[0]
data[mask, i] = 42
print(data)
# tensor([[ 0, 42, 2],
# [ 3, 4, 42],
# [ 6, 42, 8]])
Any hints on how to do that more elegantly?
What you are looking for is torch.scatter_ with the value option.
Tensor.scatter_(dim, index, src, reduce=None) → Tensor
Writes all values from the tensor src into self at the indices specified in the index tensor. For each value in src, its output index is specified by
its index in src for dimension != dim and by the corresponding value
in index for dimension = dim.
With 2D tensors as input and dim=1, the operation is:
self[i][index[i][j]] = src[i][j]
No mention of the value parameter though...
With value=42, and dim=1, this will have the following effect on data:
data[i][index[i][j]] = 42
Here applied in-place:
>>> data.scatter_(index=indices, dim=1, value=42)
>>> data
tensor([[ 0, 42, 2],
[ 3, 4, 42],
[ 6, 42, 8]])

How to concatenate an array into first position of n-d array in numpy?

I need to add extra element to first position of n-d array in numpy.
Here is the code:
tmp_boxes3d = [None,6]
tmp_scores = [None]
Now I need to add tmp_scores element into first position of tmp_boxes3d.
For better understanding
boxes = np.array([[1,2,3,4],[5,6,7,8]])
scores = np.array([0,0])
boxes_scores = [[0,1,2,3,4],[0,5,6,7,8]] # result
So I need to do for [None,6] shape.
Can anyone help me with this?
Try np.concatenate
boxes = np.array([[1,2,3,4],[5,6,7,8]])
scores = np.array([0,0])
np.concatenate((scores[:, np.newaxis], boxes), axis=1)
array([[0, 1, 2, 3, 4],
[0, 5, 6, 7, 8]])
Or, np.hstack
np.hstack((scores[:, np.newaxis], boxes))

numpy 3d to 2d transformation based on 2d mask array

If I have an ndarray like this:
>>> a = np.arange(27).reshape(3,3,3)
>>> a
array([[[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8]],
[[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]],
[[18, 19, 20],
[21, 22, 23],
[24, 25, 26]]])
I know I can get the maximum along a certain axis using np.max(axis=...):
>>> a.max(axis=2)
array([[ 2, 5, 8],
[11, 14, 17],
[20, 23, 26]])
Alternatively, I could get the indices along that axis which correspond to the maximum values from:
>>> indices = a.argmax(axis=2)
>>> indices
array([[2, 2, 2],
[2, 2, 2],
[2, 2, 2]])
My question -- Given the array indices and the array a, is there an elegant way to reproduce the array the array returned by a.max(axis=2)?
This would probably work:
import itertools as it
import numpy as np
def apply_mask(field,indices):
data = np.empty(indices.shape)
#It seems highly likely that there is a more numpy-approved way to do this.
idx = [range(i) for i in indices.shape]
for idx_tup,zidx in zip(it.product(*idx),indices.flat):
data[idx_tup] = field[idx_tup+(zidx,)]
return data
But, it seems pretty hacky/inefficient. It also doesn't allow for me to use this with any axis other than the "last" axis. Is there a numpy function (or some use of magical numpy indexing) to make this work? The naive a[:,:,a.argmax(axis=2)] doesn't work.
UPDATE:
It seems the following also works (and is a little nicer):
import numpy as np
def apply_mask(field,indices):
data = np.empty(indices.shape)
for idx_tup,zidx in np.ndenumerate(indices):
data[idx_tup] = field[idx_tup+(zidx,)]
return data
I would like to do this because I would like to extract the indices based on the data in 1 array (typically using argmax(axis=...)) and use those indices to pull data out of a bunch of other (equivalently shaped) arrays. I'm open to alternative ways to accomplish this (e.g. using boolean masked arrays). However, I like the "safety" that I get using these "index" arrays. With this I am guaranteed to have the right number of elements to create a new array which looks like a 2d "slice" through the 3d field.
Here is some magic numpy indexing that will do what you want, but unfortunately it's pretty unreadable.
def apply_mask(a, indices, axis):
magic_index = [np.arange(i) for i in indices.shape]
magic_index = np.ix_(*magic_index)
magic_index = magic_index[:axis] + (indices,) + magic_index[axis:]
return a[magic_index]
or equally unreadable:
def apply_mask(a, indices, axis):
magic_index = np.ogrid[tuple(slice(i) for i in indices.shape)]
magic_index.insert(axis, indices)
return a[magic_index]
I use index_at() to create the full index:
import numpy as np
def index_at(idx, shape, axis=-1):
if axis<0:
axis += len(shape)
shape = shape[:axis] + shape[axis+1:]
index = list(np.ix_(*[np.arange(n) for n in shape]))
index.insert(axis, idx)
return tuple(index)
a = np.random.randint(0, 10, (3, 4, 5))
axis = 1
idx = np.argmax(a, axis=axis)
print a[index_at(idx, a.shape, axis=axis)]
print np.max(a, axis=axis)

How to make a multidimension numpy array with a varying row size?

I would like to create a two dimensional numpy array of arrays that has a different number of elements on each row.
Trying
cells = numpy.array([[0,1,2,3], [2,3,4]])
gives an error
ValueError: setting an array element with a sequence.
We are now almost 7 years after the question was asked, and your code
cells = numpy.array([[0,1,2,3], [2,3,4]])
executed in numpy 1.12.0, python 3.5, doesn't produce any error and
cells contains:
array([[0, 1, 2, 3], [2, 3, 4]], dtype=object)
You access your cells elements as cells[0][2] # (=2) .
An alternative to tom10's solution if you want to build your list of numpy arrays on the fly as new elements (i.e. arrays) become available is to use append:
d = [] # initialize an empty list
a = np.arange(3) # array([0, 1, 2])
d.append(a) # [array([0, 1, 2])]
b = np.arange(3,-1,-1) #array([3, 2, 1, 0])
d.append(b) #[array([0, 1, 2]), array([3, 2, 1, 0])]
While Numpy knows about arrays of arbitrary objects, it's optimized for homogeneous arrays of numbers with fixed dimensions. If you really need arrays of arrays, better use a nested list. But depending on the intended use of your data, different data structures might be even better, e.g. a masked array if you have some invalid data points.
If you really want flexible Numpy arrays, use something like this:
numpy.array([[0,1,2,3], [2,3,4]], dtype=object)
However this will create a one-dimensional array that stores references to lists, which means that you will lose most of the benefits of Numpy (vector processing, locality, slicing, etc.).
This isn't well supported in Numpy (by definition, almost everywhere, a "two dimensional array" has all rows of equal length). A Python list of Numpy arrays may be a good solution for you, as this way you'll get the advantages of Numpy where you can use them:
cells = [numpy.array(a) for a in [[0,1,2,3], [2,3,4]]]
Another option would be to store your arrays as one contiguous array and also store their sizes or offsets. This takes a little more conceptual thought around how to operate on your arrays, but a surprisingly large number of operations can be made to work as if you had a two dimensional array with different sizes. In the cases where they can't, then np.split can be used to create the list that calocedrus recommends. The easiest operations are ufuncs, because they require almost no modification. Here are some examples:
cells_flat = numpy.array([0, 1, 2, 3, 2, 3, 4])
# One of these is required, it's pretty easy to convert between them,
# but having both makes the examples easy
cell_lengths = numpy.array([4, 3])
cell_starts = numpy.insert(cell_lengths[:-1].cumsum(), 0, 0)
cell_lengths2 = numpy.diff(numpy.append(cell_starts, cells_flat.size))
assert np.all(cell_lengths == cell_lengths2)
# Copy prevents shared memory
cells = numpy.split(cells_flat.copy(), cell_starts[1:])
# [array([0, 1, 2, 3]), array([2, 3, 4])]
numpy.array([x.sum() for x in cells])
# array([6, 9])
numpy.add.reduceat(cells_flat, cell_starts)
# array([6, 9])
[a + v for a, v in zip(cells, [1, 3])]
# [array([1, 2, 3, 4]), array([5, 6, 7])]
cells_flat + numpy.repeat([1, 3], cell_lengths)
# array([1, 2, 3, 4, 5, 6, 7])
[a.astype(float) / a.sum() for a in cells]
# [array([ 0. , 0.16666667, 0.33333333, 0.5 ]),
# array([ 0.22222222, 0.33333333, 0.44444444])]
cells_flat.astype(float) / np.add.reduceat(cells_flat, cell_starts).repeat(cell_lengths)
# array([ 0. , 0.16666667, 0.33333333, 0.5 , 0.22222222,
# 0.33333333, 0.44444444])
def complex_modify(array):
"""Some complicated function that modifies array
pretend this is more complex than it is"""
array *= 3
for arr in cells:
complex_modify(arr)
cells
# [array([0, 3, 6, 9]), array([ 6, 9, 12])]
for arr in numpy.split(cells_flat, cell_starts[1:]):
complex_modify(arr)
cells_flat
# array([ 0, 3, 6, 9, 6, 9, 12])
In numpy 1.14.3, using append:
d = [] # initialize an empty list
a = np.arange(3) # array([0, 1, 2])
d.append(a) # [array([0, 1, 2])]
b = np.arange(3,-1,-1) #array([3, 2, 1, 0])
d.append(b) #[array([0, 1, 2]), array([3, 2, 1, 0])]
what you get an list of arrays (that can be of different lengths) and you can do operations like d[0].mean(). On the other hand,
cells = numpy.array([[0,1,2,3], [2,3,4]])
results in an array of lists.
You may want to do this:
a1 = np.array([1,2,3])
a2 = np.array([3,4])
a3 = np.array([a1,a2])
a3 # array([array([1, 2, 3]), array([3, 4])], dtype=object)
type(a3) # numpy.ndarray
type(a2) # numpy.ndarray
Slightly off-topic, but not as much as one would think because of eager mode which is now the default:
If you are using Tensorflow, you can do:
a = tf.ragged.constant([[0, 1, 2, 3]])
b = tf.ragged.constant([[2, 3, 4]])
c = tf.concat([a, b], axis=0)
And you can then do all the mathematical operations still, like tf.math.reduce_mean, etc.
np.array([[0,1,2,3], [2,3,4]], dtype=object) returns an "array" of lists.
a = np.array([np.array([0,1,2,3]), np.array([2,3,4])], dtype=object) returns an array of arrays. It allows already for operations such as a+1.
Building up on this, the functionality can be enhanced by subclassing.
import numpy as np
class Arrays(np.ndarray):
def __new__(cls, input_array, dims=None):
obj = np.array(list(map(np.array, input_array))).view(cls)
return obj
def __getitem__(self, ij):
if isinstance(ij, tuple) and len(ij) > 1:
# handle twodimensional slicing
if isinstance(ij[0],slice) or hasattr(ij[0], '__iter__'):
# [1:4,:] or [[1,2,3],[1,2]]
return Arrays(arr[ij[1]] for arr in self[ij[0]])
return self[ij[0]][ij[1]] # [1,:] np.array
return super(Arrays, self).__getitem__(ij)
def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
axis = kwargs.pop('axis', None)
dimk = [len(arg) if hasattr(arg, '__iter__') else 1 for arg in inputs]
dim = max(dimk)
pad_inputs = [([i]*dim if (d<dim) else i) for d,i in zip(dimk, inputs)]
result = [np.ndarray.__array_ufunc__(self, ufunc, method, *x, **kwargs) for x in zip(*pad_inputs)]
if method == 'reduce':
# handle sum, min, max, etc.
if axis == 1:
return np.array(result)
else:
# repeat over remaining axis
return np.ndarray.__array_ufunc__(self, ufunc, method, result, **kwargs)
return Arrays(result)
Now this works:
a = Arrays([[0,1,2,3], [2,3,4]])
a[0:1,0:-1]
# Arrays([[0, 1, 2]])
np.sin(a)
# Arrays([array([0. , 0.84147098, 0.90929743, 0.14112001]),
# array([ 0.90929743, 0.14112001, -0.7568025 ])], dtype=object)
a + 2*a
# Arrays([array([0, 3, 6, 9]), array([ 6, 9, 12])], dtype=object)
To get nanfunctions working, this can be done
# patch for nanfunction that cannot handle the object-ndarrays along with second axis=-1
def nanpatch(func):
def wrapper(a, axis=None, **kwargs):
if isinstance(a, Arrays):
rowresult = [func(x, **kwargs) for x in a]
if axis == 1:
return np.array(rowresult)
else:
# repeat over remaining axis
return func(rowresult)
# otherwise keep the original version
return func(a, axis=axis, **kwargs)
return wrapper
np.nanmean = nanpatch(np.nanmean)
np.nansum = nanpatch(np.nansum)
np.nanmin = nanpatch(np.nanmin)
np.nanmax = nanpatch(np.nanmax)
np.nansum(a)
# 15
np.nansum(a, axis=1)
# array([6, 9])

Categories

Resources