Numpy get values from np.argmin indices
I want to find the location of the minima along a given axis in a rank-3 NumPy array. I have obtained these locations with np.argmin; however, I'm not sure how to "apply" them to the original matrix to get the actual minima.
For example:
import numpy as np
a = np.random.randn(10, 5, 2)
min_loc = a.argmin(axis = 0) # this gives an array of shape (5, 2)
Now, the problem is: how do I get the actual minima using min_loc? I have tried a[min_loc], which gives me an array of shape (5, 2, 5, 2). What's the logic behind this shape? How can I use this auxiliary matrix to get a sensible result of shape (5, 2)?
Note that a.min(axis = 0) is not the solution I'm looking for. I need a solution via argmin.
a[min_loc] does integer array indexing on the first dimension only, i.e. for each index i in min_loc it picks up the whole (5, 2)-shaped subarray a[i]. Since min_loc itself has shape (5, 2), and each integer in it selects another (5, 2)-shaped subarray, you end up with a (5, 2, 5, 2) array. For the same reason, a[np.array([0, 3])] has shape (2, 5, 2) and a[np.array([[0], [3]])] has shape (2, 1, 5, 2): you only provide the index for the 1st dimension.
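A quick way to check this shape logic is to print the shapes directly. This is a sketch with a fresh random a, so the values themselves are arbitrary; only the shapes matter:

```python
import numpy as np

a = np.random.randn(10, 5, 2)
min_loc = a.argmin(axis=0)  # shape (5, 2), values in range(10)

# Indexing only the first axis: each integer in min_loc selects a whole a[i] of shape (5, 2)
print(a[min_loc].shape)               # (5, 2, 5, 2)
print(a[np.array([0, 3])].shape)      # (2, 5, 2)
print(a[np.array([[0], [3]])].shape)  # (2, 1, 5, 2)

# Each (5, 2) block of a[min_loc] is simply a[min_loc[p, q]]
assert np.array_equal(a[min_loc][1, 0], a[min_loc[1, 0]])
```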
For your use case, you do not want to pick up a subarray for each index in min_loc; you want a single element. For instance, if min_loc = [[5, ...], ...], the first element should have the full index 5, 0, 0 instead of 5, :, :. This is exactly what advanced indexing does: by providing an integer array as the index for each dimension, you pick up the elements at those specific positions. You can construct the indices for the 2nd and 3rd dimensions from the (5, 2) shape with np.indices:
j, k = np.indices(min_loc.shape)
a[min_loc, j, k]
# [[-1.82762089 -0.80927253]
# [-1.06147046 -1.70961507]
# [-0.59913623 -1.10963768]
# [-2.57382762 -0.77081778]
# [-1.6918745 -1.99800825]]
where j, k are coordinates for the 2nd and 3rd dimensions:
j
#[[0 0]
# [1 1]
# [2 2]
# [3 3]
# [4 4]]
k
#[[0 1]
# [0 1]
# [0 1]
# [0 1]
# [0 1]]
Or, as @hpaulj commented, use np.take_along_axis:
np.take_along_axis(a, min_loc[None], axis=0)
# [[[-0.93515242 -2.29665325]
# [-1.30864779 -1.483428 ]
# [-1.24262879 -0.71030707]
# [-1.40322789 -1.35580273]
# [-2.10997209 -2.81922197]]]
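Note that np.take_along_axis keeps the leading length-1 axis, so its result has shape (1, 5, 2) rather than (5, 2). The two printed outputs above appear to come from different random draws of a, which would explain why their values differ. A small sketch (using a seeded generator, an assumption made only for reproducibility) that drops that axis and checks both approaches against each other:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((10, 5, 2))
min_loc = a.argmin(axis=0)

# take_along_axis needs an index array with the same ndim as a, hence min_loc[None]
taken = np.take_along_axis(a, min_loc[None], axis=0)
print(taken.shape)  # (1, 5, 2)

# Drop the leading length-1 axis to get the (5, 2) result
minima = taken[0]  # equivalently taken.squeeze(axis=0)

# Same values as the np.indices-based advanced indexing, and as a.min(axis=0)
j, k = np.indices(min_loc.shape)
assert np.array_equal(minima, a[min_loc, j, k])
assert np.array_equal(minima, a.min(axis=0))
```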
I was recently given a task (during an exam, not funny) to create a function returning the cumulative sum along a given dimension (input: a 2D array), without using np.cumsum, of course. To be honest, I find this quite hard to even start with.
The function should look like this:
def cumsum_2d(array : np.ndarray, dim : int = 0) -> np.ndarray:
and then the result is supposed to be compared with the result of the actual np.cumsum.
I would be grateful for even a basic outline or a general idea of what to do.
Here is another approach that doesn't use ufunc.accumulate or functools.reduce.
It works by inserting an extra dimension, broadcasting the array along that dimension, and then doing a sum where it only considers indices less than or equal to the current index along the summation dimension.
It's morally similar to a brute-force approach where you make a bunch of copies of the array, set the elements you don't want to zero, and then do the sum.
import numpy as np
def cumsum_2d(array: np.ndarray, dim: int = 0):
    # Make sure the dim argument is non-negative
    dim = dim % array.ndim
    # Calculate the new shape with an extra copy of dim
    shape_new = list(array.shape)
    shape_new.insert(dim + 1, array.shape[dim])
    # Insert the new dimension and broadcast the array along that dimension
    array = np.broadcast_to(np.expand_dims(array, dim + 1), shape_new)
    # Save the indices of the array
    indices = np.indices(array.shape)
    # Sum along the requested dimension, considering only the elements whose index
    # is less than or equal to the current index
    return np.sum(array, axis=dim, where=indices[dim] <= indices[dim + 1])
a = np.random.random((4, 5))
assert np.array_equal(cumsum_2d(a, 1), np.cumsum(a, 1))
assert np.array_equal(cumsum_2d(a, 0), np.cumsum(a, 0))
assert np.array_equal(cumsum_2d(a, -1), np.cumsum(a, -1))
assert np.array_equal(cumsum_2d(a, -2), np.cumsum(a, -2))
Note that this function should work for arrays of any rank, not just two-dimensional ones.
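For the 2-D case, the "copies with unwanted elements set to zero" idea can also be written as a matrix product with a triangular matrix of ones: multiplying by np.tril(ones) from the left keeps only rows 0..i for output row i. This is a swapped-in technique, sketched under the assumption that matrix multiplication is allowed; cumsum_2d_tri is a hypothetical name:

```python
import numpy as np

def cumsum_2d_tri(array: np.ndarray, dim: int = 0) -> np.ndarray:
    # Hypothetical helper: 2-D arrays only
    if array.ndim != 2 or dim not in (-2, -1, 0, 1):
        raise ValueError("expected a 2-D array and dim in {-2, -1, 0, 1}")
    dim = dim % 2
    n = array.shape[dim]
    # Lower-triangular ones: row i has ones in columns 0..i
    tri = np.tril(np.ones((n, n), dtype=array.dtype))
    if dim == 0:
        # out[i, j] = sum over k <= i of array[k, j]
        return tri @ array
    # out[i, j] = sum over k <= j of array[i, k]; tri.T is upper-triangular
    return array @ tri.T

m = np.arange(12).reshape(3, 4)
print(cumsum_2d_tri(m, 0))
```

This does O(n^2) work per column (or row), so it is a teaching device rather than an efficient cumsum.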
This approach is fairly "from scratch". It does use functools.reduce(), which I assume must be permitted.
import functools
import numpy as np
def cumsum_2d(array: np.ndarray, dim: int = 0) -> np.ndarray:
    if not isinstance(dim, int) or not 0 <= dim <= 1:
        raise ValueError(f'"dim": expected integer 0 or 1, got {dim}.')
    elif not array.ndim == 2:
        raise ValueError(
            f"{array.ndim} dimensional array not allowed - 2 dimensional arrays expected."
        )
    # Transpose so that the summation always runs over rows (axis 0)
    array = array.T if dim == 1 else array
    # Row i of the result is the sum of rows 0..i
    result = [
        functools.reduce(lambda x, y: x + y, array[: i + 1]) for i in range(len(array))
    ]
    result = np.array(result)
    # Undo the transpose for dim == 1
    result = result.T if dim == 1 else result
    return result
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
dim = 1
print(f"For dim = {dim} and a= \n{a}:")
print(f"...got: \n{cumsum_2d(a, dim)}")
print(f"...expected: \n{np.cumsum(a, dim)}")
This has the result:
# For dim = 1 and a=
# [[1 2 3]
# [4 5 6]
# [7 8 9]]:
# ...got:
# [[ 1 3 6]
# [ 4 9 15]
# [ 7 15 24]]
# ...expected:
# [[ 1 3 6]
# [ 4 9 15]
# [ 7 15 24]]
Trying with dim = 2 raises a ValueError per the function definition; this mimics the AxisError raised by np.cumsum under similar circumstances:
ValueError: "dim": expected integer 0 or 1, got 2.
Lastly, trying with a non-2-D array also raises a customised ValueError as programmed, so the user doesn't silently get unexpected behaviour.
b = np.array([[[1, 2, 3], [1, 2, 3]], [[4, 5, 6], [1, 2, 3]], [[7, 8, 9], [1, 2, 3]]])
cumsum_2d(b, dim)
Result:
ValueError: 3 dimensional array not allowed - 2 dimensional arrays expected.
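The list comprehension above re-reduces every prefix, which is O(n^2) additions. If the standard library is allowed, itertools.accumulate produces all prefix sums in a single pass. This is a sketch of that variant (cumsum_2d_accumulate is a hypothetical name, not part of the answer above):

```python
import itertools
import numpy as np

def cumsum_2d_accumulate(array: np.ndarray, dim: int = 0) -> np.ndarray:
    if dim not in (0, 1):
        raise ValueError(f'"dim": expected integer 0 or 1, got {dim}.')
    if array.ndim != 2:
        raise ValueError(f"{array.ndim} dimensional array not allowed - 2 dimensional arrays expected.")
    rows = array.T if dim == 1 else array
    # Iterating a 2-D array yields its rows; accumulate yields rows[0], rows[0] + rows[1], ...
    result = np.array(list(itertools.accumulate(rows)))
    return result.T if dim == 1 else result

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(cumsum_2d_accumulate(a, 1))
```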
Problem:
I have a numpy array of 4 dimensions:
x = np.arange(1000).reshape(5, 10, 10, 2 )
If we print it: [image of the printed array, with the relevant elements circled, not available]
I want to find the indices of the 6 largest values of the array in the 2nd axis but only for the 0th element in the last axis (red circles in the image):
indLargest2ndAxis = np.argpartition(x[...,0], 10-6, axis=2)[...,10-6:]
These indices have a shape of (5,10,6) as expected.
I want to obtain the values of the array for these indices in the 2nd axis but now for the 1st element in the last axis (yellow circles in the image). They should have a shape of (5,10,6). Without vectorizing, this could be done with:
np.array([ [ [ x[i, j, k, 1] for k in indLargest2ndAxis[i,j]] for j in range(10) ] for i in range(5) ])
However, I would like to vectorize this. I tried indexing with:
x[indLargest2ndAxis, 1]
But I get IndexError: index 5 is out of bounds for axis 0 with size 5. How can I manage this indexing combination in a vectorized way?
Ah, I think I now get what you are after. Fancy indexing is documented in detail in the NumPy documentation. Be warned, though, that in its full generality this is quite heavy stuff. In a nutshell, fancy indexing allows you to take elements from a source array (according to some idx) and place them into a new array (fancy indexing always returns a copy):
source = np.array([10.5, 21, 42])
idx = np.array([0, 1, 2, 1, 1, 1, 2, 1, 0])
# this is fancy indexing
target = source[idx]
expected = np.array([10.5, 21, 42, 21, 21, 21, 42, 21, 10.5])
assert np.allclose(target, expected)
What is nice about this is that you can control the shape of the resulting array using the shape of the index array:
source = np.array([10.5, 21, 42])
idx = np.array([[0, 1], [1, 2]])
target = source[idx]
expected = np.array([[10.5, 21], [21, 42]])
assert np.allclose(target, expected)
assert target.shape == (2,2)
Where things get a little more interesting is if source has more than one dimension. In this case, you need to specify the indices of each axis so that numpy knows which elements to take:
source = np.arange(4).reshape(2,2)
idxA = np.array([0, 1])
idxB = np.array([0, 1])
# this will take (0,0) and (1,1)
target = source[idxA, idxB]
expected = np.array([0, 3])
assert np.allclose(target, expected)
Observe that, again, the shape of target matches the shape of the index used. What is awesome about fancy indexing is that index shapes are broadcast if necessary:
source = np.arange(4).reshape(2,2)
idxA = np.array([0, 0, 1, 1]).reshape((4,1))
idxB = np.array([0, 1]).reshape((1,2))
target = source[idxA, idxB]
expected = np.array([[0, 1],[0, 1],[2, 3],[2, 3]])
assert np.allclose(target, expected)
At this point, you can understand where your exception comes from. x.ndim is 4; however, you try to index it with a 2-tuple (indLargest2ndAxis, 1). NumPy will interpret this as you trying to index the first axis using indLargest2ndAxis, the second axis using 1, and all other axes using :. Clearly, this doesn't work: all values of indLargest2ndAxis would have to be between 0 and 4 (inclusive), since they would have to refer to positions along the first axis of x.
What my suggestion of x[..., indLargest2ndAxis, 1] does is tell numpy that you wish to index the last two axes of x, i.e., you wish to index the third axis using indLargest2ndAxis, the fourth axis using 1, and : for anything else.
This will produce a result, since all elements of indLargest2ndAxis are in [0, 10), but with a shape of (5, 10, 5, 10, 6), which is not what you want. Being a bit hand-wavy, the first part of the shape, (5, 10), comes from the ellipsis (...), i.e. "select everything"; the remaining part, (5, 10, 6), comes from indLargest2ndAxis selecting elements along the third axis of x according to its own shape; and the scalar index 1 on the fourth axis contributes no axis of its own.
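Both behaviours described above can be reproduced directly with the arrays from the question. A short sketch; the exact error message depends on which out-of-range index NumPy checks first, so it is not asserted here:

```python
import numpy as np

x = np.arange(1000).reshape(5, 10, 10, 2)
indLargest2ndAxis = np.argpartition(x[..., 0], 10 - 6, axis=2)[..., 10 - 6:]

# (indLargest2ndAxis, 1) indexes axes 0 and 1; values >= 5 are out of range for axis 0
raised = False
try:
    x[indLargest2ndAxis, 1]
except IndexError as err:
    raised = True
    print(err)

# (..., indLargest2ndAxis, 1) indexes axes 2 and 3:
# (5, 10) from the ellipsis, then the index shape (5, 10, 6)
print(x[..., indLargest2ndAxis, 1].shape)  # (5, 10, 5, 10, 6)
```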
Moving on to your actual problem, you can entirely dodge the fancy indexing bullet and do the following:
x = np.arange(1000).reshape(5, 10, 10, 2)
order = x[..., 0]
values = x[..., 1]
idx = np.argpartition(order, 4)[..., 4:]
result = np.take_along_axis(values, idx, axis=-1)
Edit: Of course, you can also use fancy indexing; however, it is more cryptic and doesn't scale as nicely to different shapes:
x = np.arange(1000).reshape(5, 10, 10, 2)
indLargest2ndAxis = np.argpartition(x[..., 0], 4)[..., 4:]
result = x[np.arange(5)[:, None, None], np.arange(10)[None, :, None], indLargest2ndAxis, 1]
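To confirm that the take_along_axis version, the fancy-indexing version and the question's nested loop agree, they can be compared directly. The last variant is an assumption on my part, not from the answer: np.indices(..., sparse=True) builds the broadcastable aranges from idx.shape, which avoids hard-coding 5 and 10:

```python
import numpy as np

x = np.arange(1000).reshape(5, 10, 10, 2)
idx = np.argpartition(x[..., 0], 4)[..., 4:]  # shape (5, 10, 6)

via_take = np.take_along_axis(x[..., 1], idx, axis=-1)
via_fancy = x[np.arange(5)[:, None, None], np.arange(10)[None, :, None], idx, 1]
via_loop = np.array([[[x[i, j, k, 1] for k in idx[i, j]] for j in range(10)] for i in range(5)])

# Shape-agnostic fancy indexing: I has shape (5, 1, 1), J has shape (1, 10, 1)
I, J, _ = np.indices(idx.shape, sparse=True)
via_generic = x[I, J, idx, 1]

assert via_take.shape == (5, 10, 6)
assert np.array_equal(via_take, via_fancy)
assert np.array_equal(via_take, via_loop)
assert np.array_equal(via_take, via_generic)
```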
Suppose I have the 3x4 matrix shown below:
a=[[1,2,3,4], [5,6,7,8], [9,10,11,12]]
I want to find 7 and store its coordinates (2,3) in a variable.
Is there a built-in function for this?
In MATLAB, [row, col] = find(a==7) gives row=2, col=3.
I'm curious how to do this in Python.
After setting the value you want to find in the matrix,
val = 7
here is a nice one-liner:
array = [(ix,iy) for ix, row in enumerate(a) for iy, i in enumerate(row) if i == val]
Output of print(array):
[(1, 2)]
Note that the one-liner will catch all instances of the number 7 in the matrix, not just one. Also note that the indexes start at 0, so row 2 is displayed as 1 and column 3 is displayed as 2. If, say, you have more than one instance of 7 and want the actual (1-based) row and column numbers, this may be helpful:
a=[[1,7,7,4], [5,6,7,8], [9,10,11,7]]
val = 7
array = [(ix+1,iy+1) for ix, row in enumerate(a) for iy, i in enumerate(row) if i == val]
print(array)
Output:
[(1, 2), (1, 3), (2, 3), (3, 4)]
To do it similarly to MATLAB, you can use NumPy:
import numpy as np
a = [[1,2,3,4], [5,6,7,8], [9,10,11,12]]
a = np.array(a)
rows, cols = np.where(a == 7)
print(rows[0], cols[0])
np.where finds all the 7s in the matrix, so it returns rows and cols as arrays (one entry per match).
And it counts rows/cols starting at 0, so you may have to add 1 to get the same result as MATLAB.
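If you prefer one (row, col) pair per match instead of separate row and column arrays, np.argwhere returns exactly that, as an array of shape (number_of_matches, a.ndim). A small sketch:

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# One row per match, each row being the (row, col) index of that match
coords = np.argwhere(a == 7)
print(coords)  # [[1 2]]

# Add 1 to get MATLAB-style 1-based indices
row, col = coords[0] + 1
print(row, col)  # 2 3
```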
I would use NumPy's where function. Here's another post that displays its use nicely. I'd apply it to your use case like so:
import numpy as np
arr = np.array([[1, 2, 3],[4, 100, 6],[100, 8, 9]])
positions = np.where(arr == 100)
# positions = (array([1, 2], dtype=int64), array([1, 0], dtype=int64))
# zip(*positions) pairs each row index with its column index
positions = [tuple(int(c) for c in coord) for coord in zip(*positions)]
# positions = [(1, 1), (2, 0)]
Note that this solution allows for the possibility that the desired value occurs more than once.
I am working with matrices of shape (x, y, z), and would like to index numerous values from such a matrix simultaneously,
i.e., if A[0,0,0] = 5
and A[1,1,1] = 10
A[[1,1,1], [5,5,5]] = [5, 10]
However, indexing like this seems to return huge chunks of the matrix.
Does anyone know how I can accomplish this? I have a large array of indices (n, x, y, z) that I need to use to index into A.
Thanks
You are trying to use 1 as the first index three times and 5 as the index into the second dimension (again three times). This gives you the row A[1,5,:] repeated three times.
A = np.random.rand(6,6,6)
B = A[[1,1,1], [5,5,5]]
# [[ 0.17135991, 0.80554887, 0.38614418, 0.55439258, 0.66504806, 0.33300839],
# [ 0.17135991, 0.80554887, 0.38614418, 0.55439258, 0.66504806, 0.33300839],
# [ 0.17135991, 0.80554887, 0.38614418, 0.55439258, 0.66504806, 0.33300839]]
B.shape
# (3, 6)
Instead, you will want to specify [1,5] for each axis of your matrix.
A[[1,5], [1,5], [1,5]]  # -> [A[1,1,1], A[5,5,5]]
Advanced indexing works like this:
A[I, J, K][n] == A[I[n], J[n], K[n]]
with A, I, J, and K all arrays. That's not the full, general rule, but it's what the rules simplify down to for what you need.
For example, if you want output[0] == A[0, 0, 0] and output[1] == A[1, 1, 1], then your I, J, and K arrays should look like np.array([0, 1]). Lists also work:
A[[0, 1], [0, 1], [0, 1]]
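For the "large array of indices" case in the question, assuming the indices are stored as an (n, 3) array with one (i, j, k) coordinate per row, you can transpose it and split the columns into the I, J, K arrays of the rule above. A sketch with hypothetical coordinates:

```python
import numpy as np

A = np.random.rand(6, 6, 6)

# Hypothetical (n, 3) index array: each row is one (i, j, k) coordinate
coords = np.array([[0, 0, 0], [1, 1, 1], [5, 2, 3]])

# coords.T has shape (3, n); tuple(...) turns it into (I, J, K) for A[I, J, K]
values = A[tuple(coords.T)]
print(values.shape)  # (3,)

assert values[2] == A[5, 2, 3]
```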
I have an M-dimensional np.ndarray, where M <= N. Beyond this condition, the array may have any shape. I want to convert this array to N-dimensional, with dimensions 0 through M kept the same and dimensions M through N set to 1.
I can almost accomplish this by copying the array using np.array and supplying the ndmin argument. However, this places the extra axes in the 'first' rather than the 'last' positions:
>>> a3d = np.zeros((2,3,4))
>>> a5d = np.array(a3d, ndmin = 5)
>>> a5d.shape
(1, 1, 2, 3, 4) #actual shape
(2, 3, 4, 1, 1) #desired shape
Is there a way to specify where the added dimensions should go? Is there an alternate function I can use here which can result in my desired output?
Obviously, in the example above I could manipulate the array after the fact to put the axes in the order I want, but since the original array could have anywhere from 0 to 5 dimensions (and I want to keep the original dimensions in their original order), I can't think of a way to do that without a tedious series of checks on the original shape.
I'd use .reshape ...
>>> a3d = a3d.reshape(a3d.shape + (1, 1))
>>> a3d.shape
(2, 3, 4, 1, 1)
If you want to pad up to a certain dimensionality:
>>> a3d = np.zeros((2,3,4))
>>> ndim = 5
>>> padded_shape = (a3d.shape + (1,)*ndim)[:ndim]
>>> new_a3d = a3d.reshape(padded_shape)
>>> new_a3d.shape
(2, 3, 4, 1, 1)
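The same padding can be wrapped in a small function. This sketch (pad_trailing_dims is a hypothetical name) uses np.expand_dims, which accepts a tuple of axes in NumPy 1.18 and later, and also shows the equivalent indexing trick of appending None axes after an Ellipsis:

```python
import numpy as np

def pad_trailing_dims(a: np.ndarray, ndim: int) -> np.ndarray:
    # Hypothetical helper: append length-1 axes until a.ndim == ndim
    if a.ndim > ndim:
        raise ValueError(f"array already has {a.ndim} > {ndim} dimensions")
    # New axes go at positions a.ndim, ..., ndim - 1, i.e. at the end
    return np.expand_dims(a, tuple(range(a.ndim, ndim)))

a3d = np.zeros((2, 3, 4))
print(pad_trailing_dims(a3d, 5).shape)  # (2, 3, 4, 1, 1)

# Equivalent view via indexing: keep all existing axes, then add None (new) axes
via_index = a3d[(...,) + (None,) * (5 - a3d.ndim)]
print(via_index.shape)  # (2, 3, 4, 1, 1)
```

Both return views of the original data rather than copies, and both work for a 0-dimensional input.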
Just set
a5d = np.array(a3d)
a5d.shape = a3d.shape + (1, 1)
print(a5d.shape)
(2, 3, 4, 1, 1)
This works because adding length-1 axes doesn't change the number of elements, so the new shape describes the same physical size as the old one.