In a related question I learned that if I have an array of shape MxMxN, and I want to select based on a boolean matrix of shape MxM, I can simply do
data[select, ...]
and be done with it. Unfortunately, now I have my data in a different order:
import numpy as np
data = np.arange(36).reshape((3, 4, 3))
select = np.random.choice([0, 1], size=9).reshape((3, 3)).astype(bool)
Each element of data at index (i0, i1, i2) should be selected if select[i0, i2] is True.
How can I proceed with my selection without having to do something inefficient like
data.flatten()[np.repeat(select[:, None, :], 4, axis=1).flatten()]
One way would be to simply use np.broadcast_to to broadcast without actual replication and use that broadcasted mask directly for masking required elements -
mask = np.broadcast_to(select[:,None,:], data.shape)
out = data[mask]
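Another way, and probably a faster one, would be to get the indices of the True entries and then index with those. The result is a 2D array of shape (number of selected (i0, i2) pairs, data.shape[1]), with one row per selected pair running along axis=1, so the element order differs from the flat 1D output of the mask approach. The implementation would look something like this -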
idx = np.argwhere(select)
out = data[idx[:,0], :, idx[:,1]]
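A minimal sketch comparing the two approaches with the question's baseline. The fixed select mask is an assumption chosen only to make the result reproducible; the names out_mask, out_idx and baseline are illustrative.

```python
import numpy as np

data = np.arange(36).reshape((3, 4, 3))
# Fixed mask (assumed here instead of a random one) for reproducibility.
select = np.array([[True, False, False],
                   [False, True, True],
                   [False, False, False]])

# Approach 1: broadcast the mask to data's shape (a read-only view, no copy).
mask = np.broadcast_to(select[:, None, :], data.shape)
out_mask = data[mask]                      # 1D, in C order over (i0, i1, i2)

# Approach 2: integer indices of the True entries, full slice along axis 1.
idx = np.argwhere(select)
out_idx = data[idx[:, 0], :, idx[:, 1]]    # shape (n_selected, 4)

# Baseline from the question, for comparison.
baseline = data.flatten()[np.repeat(select[:, None, :], 4, axis=1).flatten()]
```

Both approaches pick the same elements. out_mask matches the baseline element for element, while out_idx holds the same values grouped into one row per selected (i0, i2) pair.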
It is similar to some questions around SO, but I don't quite understand the trick to get what I want.
I have two arrays,
arr of shape (x, y, z)
indexes of shape (x, y) which hold indexes of interest for z.
For each value of indexes I want to get the actual value in arr where:
arr.x == indexes.x
arr.y == indexes.y
arr.z == indexes[x,y]
This would give an array of shape (x, y), the same as the shape of indexes.
For example:
arr = np.arange(99)
arr = arr.reshape(3,3,11)
indexes = np.asarray([
[0,2,2],
[1,2,3],
[3,2,10]])
# indexes.shape == (3,3)
# Example for the first element to be computed
first_element = arr[0,0,indexes[0,0]]
With the above indexes, the expected array would look like:
expected_result = np.asarray([
[0,13,24],
[34,46,58],
[69,79,98]])
I tried elements = np.take(arr, indexes, axis=z)
but it gives an array of shape (x, y, x, y)
I also tried things like elements = arr[indexes, indexes,:] but I don't get what I wish.
I saw a few answers involving transposing indexes and transforming it into tuples but I don't understand how it would help.
Note: I'm a bit new to numpy so I don't fully understand indexing yet.
How would you solve this in NumPy style?
This can be done using np.take_along_axis
import numpy as np
#sample data
np.random.seed(0)
arr = np.arange(3*4*2).reshape(3, 4, 2) # 3d array
idx = np.random.randint(0, 2, (3, 4)) # array of indices
out = np.squeeze(np.take_along_axis(arr, idx[..., np.newaxis], axis=-1), axis=-1)
In this code, the array of indices gets one more axis so that it has the same number of dimensions as arr and can be broadcast against it. Since the return value of np.take_along_axis has the same shape as the expanded array of indices, we then remove this extra dimension with np.squeeze. Passing axis=-1 ensures that only the trailing axis is dropped, even if another dimension happens to have length 1.
Another option is to use np.choose, but in this case the axis along which you are making selections must be moved to be the first axis of the array:
out = np.choose(idx, np.moveaxis(arr, -1, 0))
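As a quick check, here is a sketch that applies both options to the arr and indexes from the question and compares them with the expected result stated there. The names via_take, via_choose and expected are illustrative.

```python
import numpy as np

arr = np.arange(99).reshape(3, 3, 11)
indexes = np.array([[0, 2, 2],
                    [1, 2, 3],
                    [3, 2, 10]])

# take_along_axis needs indices with the same number of dimensions as arr,
# so add a trailing axis and drop it again afterwards.
via_take = np.take_along_axis(arr, indexes[..., np.newaxis], axis=-1)[..., 0]

# choose picks, per position, from the arrays stacked along axis 0.
via_choose = np.choose(indexes, np.moveaxis(arr, -1, 0))

# Expected result as given in the question.
expected = np.array([[0, 13, 24],
                     [34, 46, 58],
                     [69, 79, 98]])
```

Note that np.choose has a limit on the number of choices, so np.take_along_axis is likely the safer option when the selection axis is long.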
The solution here should work for you: Indexing 3d numpy array with 2d array
Adapted to your code:
ax_0 = np.arange(arr.shape[0])[:,None]
ax_1 = np.arange(arr.shape[1])[None,:]
new_array = arr[ax_0, ax_1, indexes]
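To see why this works, the sketch below prints nothing but records the shapes involved: the three index arrays broadcast to a common (3, 3) shape, and the element at position (i, j) of the result is arr[i, j, indexes[i, j]]. The np.ogrid variant is an equivalent way to build the first two index arrays and is shown only as an alternative.

```python
import numpy as np

arr = np.arange(99).reshape(3, 3, 11)
indexes = np.array([[0, 2, 2],
                    [1, 2, 3],
                    [3, 2, 10]])

ax_0 = np.arange(arr.shape[0])[:, None]   # shape (3, 1): row index i
ax_1 = np.arange(arr.shape[1])[None, :]   # shape (1, 3): column index j

# All three index arrays broadcast to one common shape.
common_shape = np.broadcast(ax_0, ax_1, indexes).shape
new_array = arr[ax_0, ax_1, indexes]

# np.ogrid builds the same open-mesh index arrays in one call.
i, j = np.ogrid[:arr.shape[0], :arr.shape[1]]
same = arr[i, j, indexes]
```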
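You can perform such an operation with np.take_along_axis. One way to apply it is to flatten the leading two axes of both the input and the indices, so that the selection happens along the last axis of a 2D array. (Below, indices refers to the question's indexes array.)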
The operation you are looking to perform is:
out[i, j] = arr[i, j, indices[i, j]]
To apply np.take_along_axis in this form, we reshape both arr and indices, i.e. map (i, j) to a single index k. The following operation then takes place:
out[k] = arr[k, indices[k]] # indexing along axis=1
The actual usage here comes down to:
>>> put = np.take_along_axis(arr.reshape(9, 11), indices.reshape(9, 1), axis=1)
>>> put
array([[ 0],
       [13],
       [24],
       [34],
       [46],
       [58],
       [69],
       [79],
       [98]])
Then reshape back to the shape of indices:
>>> put.reshape(indices.shape)
array([[ 0, 13, 24],
       [34, 46, 58],
       [69, 79, 98]])
Let's say I have a 4d array A with shape (D0, D1, D2, D3). I have a 1d array B with shape (D0,), which includes the indices I need at axis 2.
The trivial way to implement what I need:
output_lis = []
for i in range(D0):
    output_lis.append(A[i, :, B[i], :])
# output = np.concatenate(output_lis, axis=0)  # wrong: concatenate joins along an existing axis. Thanks to @Mad Physicist; use stack instead.
output = np.stack(output_lis, axis=0)  # shape: [D0, D1, D3]
So, my question is how to implement it with numpy API in a fast way?
Use fancy indexing to step along two dimensions in lockstep. In this case, arange provides the sequence i, while B provides the sequence B[i]:
A[np.arange(D0), :, B, :]
The shape of this array is indeed (D0, D1, D3), unlike the result of the concatenate version of your for loop, which has shape (D0*D1, D3).
To get the same result from your example, use stack (which adds a new axis), rather than concatenate (which uses an existing axis):
output = np.stack(output_lis, axis=0)
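A small sketch checking that the fancy-indexing expression matches the stacked loop. The sizes D0..D3 and the random contents of A and B are hypothetical values picked for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
D0, D1, D2, D3 = 2, 3, 4, 5                 # hypothetical sizes
A = rng.integers(0, 100, size=(D0, D1, D2, D3))
B = rng.integers(0, D2, size=D0)            # one axis-2 index per i

# Loop version from the question, stacked along a new leading axis.
looped = np.stack([A[i, :, B[i], :] for i in range(D0)], axis=0)

# Vectorized version: arange(D0) and B advance in lockstep.
vectorized = A[np.arange(D0), :, B, :]

# For contrast: concatenate merges the pieces along the existing axis 0.
concatenated = np.concatenate([A[i, :, B[i], :] for i in range(D0)], axis=0)
```

Because the two index arrays are separated by a slice, NumPy places the dimension they produce first, which here happens to be exactly the desired (D0, D1, D3) layout.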
I need to select only the non-zero 3d portions of a 3d binary array (or alternatively the true values of a boolean array). Currently I am able to do so with a series of 'for' loops that use np.any. This works but seems awkward and slow, so I am investigating a more direct way to accomplish the task.
I am rather new to numpy, so the approaches that I have tried include a) using np.nonzero, which returns indices that I am at a loss to understand what to do with for my purposes, b) boolean array indexing, and c) boolean masks. I can generally understand each of those approaches for simple 2d arrays, but am struggling to understand the differences between the approaches, and cannot get them to return the right values for a 3d array.
Here is my current function that returns a 3D array with nonzero values:
def real_size(arr3):
    true_0 = []
    true_1 = []
    true_2 = []
    print(f'The input array shape is: {arr3.shape}')
    for zero_ in range(0, arr3.shape[0]):
        if arr3[zero_].any():
            true_0.append(zero_)
    for one_ in range(0, arr3.shape[1]):
        if arr3[:, one_, :].any():
            true_1.append(one_)
    for two_ in range(0, arr3.shape[2]):
        if arr3[:, :, two_].any():
            true_2.append(two_)
    arr4 = arr3[min(true_0):max(true_0) + 1, min(true_1):max(true_1) + 1, min(true_2):max(true_2) + 1]
    print(f'The nonzero area is: {arr4.shape}')
    return arr4
# Then use it on a small test array:
test_array = np.zeros([2, 3, 4], dtype = int)
test_array[0:2, 0:2, 0:2] = 1
#The function call works and prints out as expected:
non_zero = real_size(test_array)
>> The input array shape is: (2, 3, 4)
>> The nonzero area is: (2, 2, 2)
# So, the array is correct, but likely not the best way to get there:
non_zero
>> array([[[1, 1],
           [1, 1]],
          [[1, 1],
           [1, 1]]])
The code works appropriately, but I am using this on much larger and more complex arrays, and don't think this is an appropriate approach. Any thoughts on a more direct method to make this work would be greatly appreciated. I am also concerned about errors and the results if the input array has for example two separate non-zero 3d areas within the original array.
To clarify the problem, I need to return one or more 3D portions as one or more 3d arrays beginning with an original larger array. The returned arrays should not include extraneous zeros (or false values) in any given exterior plane in three dimensional space. Just getting the indices of the nonzero values (or vice versa) doesn't by itself solve the problem.
Assuming you want to eliminate all rows, columns, etc. that contain only zeros, you could do the following:
nz = (test_array != 0)
non_zero = test_array[nz.any(axis=(1, 2))][:, nz.any(axis=(0, 2))][:, :, nz.any(axis=(0, 1))]
An alternative solution using np.nonzero:
i = [np.unique(_) for _ in np.nonzero(test_array)]
non_zero = test_array[i[0]][:, i[1]][:, :, i[2]]
This can also be generalized to arbitrary dimensions, but requires a bit more work (only showing the first approach here):
def real_size(arr):
    nz = (arr != 0)
    result = arr
    axes = np.arange(arr.ndim)
    for axis in range(arr.ndim):
        # True where the slice at this position along `axis` has any nonzero entry
        keep = nz.any(axis=tuple(np.delete(axes, axis)))
        result = result[(slice(None),)*axis + (keep,)]
    return result
non_zero = real_size(test_array)
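One difference worth noting: the approaches above drop every all-zero plane, including planes that lie between two nonzero regions, whereas the function in the question keeps one contiguous block from the first to the last nonzero index. If the contiguous behaviour is what is wanted, a possible sketch is to take the minimum and maximum of the nonzero coordinates. The function name bounding_box_crop and the gappy test array are hypothetical.

```python
import numpy as np

def bounding_box_crop(arr):
    """Return the smallest contiguous sub-array containing all nonzero entries."""
    coords = np.argwhere(arr)                 # shape (n_nonzero, ndim)
    if coords.size == 0:                      # nothing nonzero: empty result
        return arr[(slice(0, 0),) * arr.ndim]
    lo = coords.min(axis=0)
    hi = coords.max(axis=0) + 1               # slices are end-exclusive
    return arr[tuple(slice(a, b) for a, b in zip(lo, hi))]

test_array = np.zeros([2, 3, 4], dtype=int)
test_array[0:2, 0:2, 0:2] = 1
cropped = bounding_box_crop(test_array)

# An array with all-zero planes between two nonzero entries.
gappy = np.zeros((1, 1, 5), dtype=int)
gappy[0, 0, [0, 4]] = 1
boxed = bounding_box_crop(gappy)              # keeps the interior zeros

# The plane-dropping approach removes the interior zero planes as well.
nz = (gappy != 0)
dropped = gappy[nz.any(axis=(1, 2))][:, nz.any(axis=(0, 2))][:, :, nz.any(axis=(0, 1))]
```

This returns a single box around everything nonzero. Splitting the input into several separate regions, as the question also mentions, would need a labelling step first; scipy.ndimage.label together with scipy.ndimage.find_objects is one possible route, not shown here.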
I would like to iterate through a subset of dimensions of a numpy array and compare the resulting array elements (which are arrays of the remaining dimension(s)).
The code below does this:
import numpy
def min(h,m):  # note: this shadows the built-in min
    return h*60+m
exclude_times_default=[min(3,00),min(6,55)]
d=exclude_times_default
exclude_times_wkend=[min(3,00),min(9,00)]
w=exclude_times_wkend
exclude_times=numpy.array([[[min(3,00),min(6,20)],d,d,d,d,d,[min(3,00),min(6,20)],d,d,[min(3,00),min(6,20)]],
[d,d,d,d,[min(3,00),min(9,30)],[min(3,00),min(9,30)],d,d,d,d],
[[min(20,00),min(7,15)],[min(3,00),min(23,15)],[min(3,00),min(7,15)],[min(3,00),min(7,15)],[min(3,00),min(23,15)],[min(3,00),min(23,15)],d,d,d,d]])
num_level=exclude_times.shape[0]
num_wind=exclude_times.shape[1]
for level in range(num_level):
    for window in range(num_wind):
        if (exclude_times[level,window,:]==d).all():
            print("Default")
            exclude_times[level][window]=w
        print(level,window,exclude_times[level][window])
The solution does not look very elegant to me, just wondering if there are more elegant solutions.
You can get a 2D mask pinpointing all the window/level combinations set to default like this:
mask = (exclude_times == d[None, None, :]).all(axis=-1)
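The expression d[None, None, :] introduces two new axes into a view of d to make it broadcast to the shape of exclude_times properly. This requires d to be a NumPy array rather than the plain list from the question, so convert it first with d = np.asarray(d) (and likewise for w). Another way to do that would be with an explicit reshape: np.reshape(d, (1, 1, -1)) or d.reshape(1, 1, -1). There are many other ways as well. Because broadcasting aligns trailing axes, the plain comparison exclude_times == d also gives the same result.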
The .all(axis=-1) operation reduces the 3D boolean mask along the last axis, giving you a 2D mask indexed by level and window.
To count the number of default entries, use np.count_nonzero:
nnz = np.count_nonzero(mask)
To count the defaults for each window:
np.count_nonzero(mask, axis=0)
To count the defaults for each level:
np.count_nonzero(mask, axis=1)
Remember, the axis parameter is the one you reduce, not the one(s) you keep.
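Assigning w to the default elements is simpler than it may look. Indexing with the 2D mask selects along the first two axes and keeps the last one, so exclude_times[mask] has shape (nnz, 2). Reading it gives a copy, but assigning through it writes into the original array, and w broadcasts across the selected rows.
Note that a mask of shape (levels, windows, 1), such as mask[:, :, None], does not work as an index here in current NumPy, because a boolean index must match the shape of the axes it covers. The assignment is simply:
exclude_times[mask] = w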
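A compact sketch of the masking, counting and assignment steps on a small stand-in array. The 2 x 3 x 2 array and the other entry are made-up values for illustration, d and w are defined as NumPy arrays, and the assignment goes through the 2D mask directly, relying on w broadcasting across the selected rows.

```python
import numpy as np

d = np.array([180, 415])       # default window: 3:00 to 6:55, in minutes
w = np.array([180, 540])       # weekend window: 3:00 to 9:00
other = np.array([180, 380])   # hypothetical non-default entry

# Small stand-in for exclude_times: 2 levels x 3 windows x 2 times.
exclude_times = np.array([[other, d, d],
                          [d, other, d]])

# True where a (level, window) entry equals the default pair.
mask = (exclude_times == d).all(axis=-1)      # shape (2, 3)

nnz = np.count_nonzero(mask)                  # total number of defaults
per_window = np.count_nonzero(mask, axis=0)   # reduce over levels
per_level = np.count_nonzero(mask, axis=1)    # reduce over windows

# exclude_times[mask] has shape (nnz, 2); w broadcasts over its rows.
exclude_times[mask] = w
```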
Given a 2D numpy array, e.g.:
import numpy as np
data = np.array([
[11,12,13],
[21,22,23],
[31,32,33],
[41,42,43],
])
I need to modify a sub-array in place, based on two masking vectors for the desired rows and columns:
rows = np.array([False, False, True, True], dtype=bool)
cols = np.array([True, True, False], dtype=bool)
Such that, e.g.:
print(data)
#[[11,12,13],
# [21,22,23],
# [0,0,33],
# [0,0,43]]
Now that you know how to access the rows/cols you want, just assign the value you want to your subarray. It's a tad trickier, though:
mask = rows[:,None]*cols[None,:]
data[mask] = 0
The reason is that when we access the subarray as data[rows][:,cols] (as illustrated in your previous question), boolean indexing returns a copy rather than a view: data[rows] is already a new array, so assigning into data[rows][:,cols] modifies that temporary copy and leaves data unchanged.
Instead, here we construct a 2D boolean array by broadcasting your two 1D arrays rows and cols against each other. The mask array now has the shape (len(rows), len(cols)). We can use mask to directly access the original items of data and set them to a new value. Note that data[mask] on its own returns a 1D array, which was not the answer you wanted in your previous question.
To construct the mask, we could have used the & operator instead of * (because we're dealing with boolean arrays), or the simpler np.outer function:
mask = np.outer(rows,cols)
Edit: props to @Marcus Jones for the np.outer solution.
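The sketch below contrasts the chained indexing that silently does nothing with the 2D mask assignment, and adds np.ix_ as a further alternative not mentioned above. np.ix_ accepts boolean vectors and keeps the 2D shape of the sub-array. The variable names are illustrative.

```python
import numpy as np

data = np.array([[11, 12, 13],
                 [21, 22, 23],
                 [31, 32, 33],
                 [41, 42, 43]])
rows = np.array([False, False, True, True])
cols = np.array([True, True, False])

# Chained boolean indexing assigns into a temporary copy: nothing changes.
untouched = data.copy()
untouched[rows][:, cols] = 0

# A single 2D mask indexes the array itself, so the assignment sticks.
masked = data.copy()
masked[np.outer(rows, cols)] = 0

# np.ix_ builds an open mesh from the two vectors; same effect in place.
via_ix = data.copy()
via_ix[np.ix_(rows, cols)] = 0

# Reading with np.ix_ returns the 2D sub-array rather than a flat 1D result.
sub = data[np.ix_(rows, cols)]
```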