Manually project coordinates similar to gluLookAt in Python

I'm trying to implement a viewing matrix and projection, similar to gluLookAt, to get the view position of each 3D coordinate. I have implemented something that seems close to working, but the result is reversed.
For example, the following code gets the correct position (when I don't actually change the coordinates). But if I change the up vector to point towards X instead of Y, I get reversed coordinates.
import numpy as np

def normalize_vector(vector):
    return vector / (np.linalg.norm(vector))

def get_lookat_matrix(position_vector, front_vector, up_vector):
    m1 = np.zeros([4, 4], dtype=np.float32)
    m2 = np.zeros([4, 4], dtype=np.float32)
    z = normalize_vector(-front_vector)
    x = normalize_vector(np.cross(up_vector, z))
    y = np.cross(z, x)
    m1[:3, 0] = x
    m1[:3, 1] = y
    m1[:3, 2] = z
    m1[3, 3] = 1.0
    m2[0, 0] = m2[1, 1] = m2[2, 2] = 1.0
    m2[:3, 3] = -position_vector
    m2[3, 3] = 1.0
    return np.matmul(m1, m2)

def get_projection_matrix(near, far):
    aspect = 1.0
    fov = 1.0  # 90 Degrees
    m = np.zeros([4, 4], dtype=np.float32)
    m[0, 0] = fov/aspect
    m[1, 1] = fov
    m[2, 2] = (-far)/(far-near)
    m[2, 3] = (-near*far)/(far-near)
    m[3, 2] = -1.0
    return m

position_vector = np.array([0, 0, 0], dtype=np.float32)
front_vector = np.array([0, 0, -1], dtype=np.float32)
up_vector = np.array([0, 1, 0], dtype=np.float32)
viewing_matrix = get_lookat_matrix(position_vector=position_vector, front_vector=front_vector, up_vector=up_vector)
print("viewing_matrix\n", viewing_matrix, "\n\n")
projection_matrix = get_projection_matrix(near=0.1, far=100.0)
point = np.array([1, 0, -10, 1], dtype=np.float32)
# assumed completion: the right-hand side of this line was cut off in this copy
projected_point = np.matmul(np.matmul(projection_matrix, viewing_matrix), point)
# Normalize
projected_point /= projected_point[3]
This happens with many other changes of the coordinates as well. I'm not sure where I'm going wrong.

gluLookAt defines a 4*4 viewing transformation matrix, for the use of OpenGL.
A "mathematical" 4*4 matrix looks like this:
  c0  c1  c2  c3          c0  c1  c2  c3
[ Xx  Yx  Zx  Tx ]      [  0   4   8  12 ]
[ Xy  Yy  Zy  Ty ]      [  1   5   9  13 ]
[ Xz  Yz  Zz  Tz ]      [  2   6  10  14 ]
[  0   0   0   1 ]      [  3   7  11  15 ]
But the memory image of a 4*4 OpenGL matrix looks like this:
[ Xx, Xy, Xz, 0, Yx, Yy, Yz, 0, Zx, Zy, Zz, 0, Tx, Ty, Tz, 1 ]
See The OpenGL Shading Language 4.6, 5.4.2 Vector and Matrix Constructors, page 101
and OpenGL ES Shading Language 3.20 Specification, 5.4.2 Vector and Matrix Constructors, page 100:
To initialize a matrix by specifying vectors or scalars, the components are assigned to the matrix elements in column-major order.
mat4(float, float, float, float, // first column
float, float, float, float, // second column
float, float, float, float, // third column
float, float, float, float); // fourth column
Note that, in contrast to a mathematical matrix, where the columns are written from top to bottom (which feels natural), when initializing an OpenGL matrix the columns are written from left to right. This has the benefit that the x, y, z components of an axis or of the translation are in direct succession in memory, which is a big advantage when accessing the axis vectors or the translation vector of the matrix.
See also Data Type (GLSL) - Matrix constructors.
This means you have to "swap" columns and rows (transpose) of the matrix:
def get_lookat_matrix(position_vector, front_vector, up_vector):
    m1 = np.zeros([4, 4], dtype=np.float32)
    m2 = np.zeros([4, 4], dtype=np.float32)
    z = normalize_vector(-front_vector)
    x = normalize_vector(np.cross(up_vector, z))
    y = np.cross(z, x)
    # the axes are stored as rows instead of columns
    m1[0, :3] = x
    m1[1, :3] = y
    m1[2, :3] = z
    m1[3, 3] = 1.0
    m2[0, 0] = m2[1, 1] = m2[2, 2] = 1.0
    # the translation is stored in the last row instead of the last column
    m2[3, :3] = -position_vector
    m2[3, 3] = 1.0
    return np.matmul(m1, m2)

def get_projection_matrix(near, far):
    aspect = 1.0
    fov = 1.0  # 1 / tan(90° / 2), i.e. a 90 degree field of view
    m = np.zeros([4, 4], dtype=np.float32)
    m[0, 0] = fov/aspect
    m[1, 1] = fov
    m[2, 2] = -(far+near)/(far-near)
    m[3, 2] = (-2.0*near*far)/(far-near)
    m[2, 3] = -1.0
    return m
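To check the conventions end to end, here is a minimal sketch (not either answerer's code) that uses the column-vector convention throughout: points are columns, the view matrix holds the camera axes as rows and the translation in the last column, and the product is projection @ view @ point. The helper names look_at, perspective and project are hypothetical. With up along +Y the test point lands at NDC x = 0.1; with up along +X it lands at NDC y = +0.1, while storing the axes as columns (the question's layout) mirrors it to y = -0.1.

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

def look_at(eye, front, up):
    """gluLookAt-style view matrix for column vectors: p_view = V @ p_world."""
    z = normalize(-np.asarray(front, dtype=np.float64))  # camera looks down -z
    x = normalize(np.cross(up, z))
    y = np.cross(z, x)
    rot = np.eye(4)
    rot[:3, :3] = np.stack([x, y, z])   # camera axes as ROWS (inverse rotation)
    trans = np.eye(4)
    trans[:3, 3] = -np.asarray(eye, dtype=np.float64)  # translation in last COLUMN
    return rot @ trans                   # translate first, then rotate

def perspective(f, aspect, near, far):
    """OpenGL-style projection; f = 1 / tan(fovy / 2), so f = 1.0 means 90 degrees."""
    m = np.zeros((4, 4))
    m[0, 0] = f / aspect
    m[1, 1] = f
    m[2, 2] = -(far + near) / (far - near)
    m[2, 3] = -2.0 * far * near / (far - near)
    m[3, 2] = -1.0
    return m

def project(point, eye, front, up, near=0.1, far=100.0):
    """World point -> normalized device coordinates."""
    p = np.append(np.asarray(point, dtype=np.float64), 1.0)
    clip = perspective(1.0, 1.0, near, far) @ look_at(eye, front, up) @ p
    return clip[:3] / clip[3]            # perspective divide

ndc_y_up = project([1, 0, -10], [0, 0, 0], [0, 0, -1], [0, 1, 0])
ndc_x_up = project([1, 0, -10], [0, 0, 0], [0, 0, -1], [1, 0, 0])

# Same rolled camera, but with the axes stored as columns (the question's layout)
V_cols = look_at([0, 0, 0], [0, 0, -1], [1, 0, 0])
V_cols[:3, :3] = V_cols[:3, :3].T.copy()
clip = perspective(1.0, 1.0, 0.1, 100.0) @ V_cols @ np.array([1.0, 0.0, -10.0, 1.0])
ndc_cols = clip[:3] / clip[3]

print(ndc_y_up)   # x ≈ 0.1, y ≈ 0
print(ndc_x_up)   # x ≈ 0, y ≈ 0.1: world +X now points "up" on screen
print(ndc_cols)   # x ≈ 0, y ≈ -0.1: the mirrored result
```

Storing the axes as columns gives the transpose of the rotation, which for an orthonormal basis is its inverse, so the camera roll is applied in the wrong direction.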

There's a minor change you must make:
m[2, 2] = -(far+near)/(far-near)  # instead of m[2, 2] = (-far)/(far-near)
m[2, 3] = (-2.0*near*far)/(far-near)  # instead of m[2, 3] = (-near*far)/(far-near)
The big thing is the row/column order of your matrices.
As @Rabbid76 pointed out, column-major order is preferred. GLSL provides a function to transpose a matrix, and you can also request the transposition when the matrix is passed to the GPU with the glUniformMatrix family of commands.
Let's see how to work with row-major order matrices, as your code does.
The goal, for now on the CPU, is to get finalPoint = matrixMultiply(C, P), with C the combined matrix and P the point coordinates. matrixMultiply is whatever function you use for matrix multiplication. Remember that the order matters: A·B is not the same as B·A.
Because C is a 4x4 matrix and P is 1x4, C·P is not possible; it must be P·C.
Notice that with column order P is 4x1, and then C·P is the right operation.
Let's call L the look-at matrix (its proper name is the view matrix). It is formed by an orientation matrix O and a translation matrix T. With column order, L = O·T.
A property of the transpose is (A·B)^t = B^t · A^t.
So, with row order you get O·T = Oc^t · Tc^t = (Tc · Oc)^t, where the subscript c stands for column order. But what we want is (Oc · Tc)^t. Notice the change in the order of multiplication?
So, if you work with row-major order matrices, the order in which they are multiplied is swapped.
The combined view and projection matrix must also be swapped.
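A quick numerical check of the transpose argument, using random 4x4 stand-ins for O, T and the projection (placeholders, not real camera matrices): applying the matrices to a column vector in the order P·O·T gives the same numbers as applying their transposes to a row vector in the reversed order.

```python
import numpy as np

rng = np.random.default_rng(0)
O = rng.standard_normal((4, 4))   # stand-in for the orientation matrix (column order)
T = rng.standard_normal((4, 4))   # stand-in for the translation matrix
P = rng.standard_normal((4, 4))   # stand-in for the projection matrix
p = rng.standard_normal(4)        # a homogeneous point

# Column order: L = O·T, combined C = P·L, applied as C·p
col_result = P @ O @ T @ p

# Row order: every matrix is stored transposed and the point is a row vector,
# so the product order flips: p·T^t·O^t·P^t = (P·O·T·p)^t
row_result = p @ T.T @ O.T @ P.T

print(np.allclose(col_result, row_result))  # True
```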
Thus replace:
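return np.matmul(m2, m1)  # was return np.matmul(m1, m2)
# was: projected_point = np.matmul(np.matmul(projection_matrix, viewing_matrix), point)
# (both projected_point expressions are reconstructed from the explanation above; they were cut off in this copy)
projected_point = np.matmul(point, np.matmul(viewing_matrix, projection_matrix))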
Despite all of the above, I recommend working with column-major order. That's best for OpenGL, and you'll better understand any maths and tutorials you find on OpenGL.


Map an array of N 2D coordinates into an array of N 2D grids/images

My input is a set of N 2D coordinates in the form of a numpy array, e.g.:
import numpy as np
import matplotlib.pyplot as plt
X = np.array([
    [-5, 1],
    [3, 7.2],
    [0, -7],
    [9, -8],
    [0, 0.1]
])
Here N would be 5 coordinates.
We can bin each point into a resolution X resolution 2D grid on the 2D field, for which we know min_X and max_X for both dimensions:
resolution = 100
min_X = -10
max_X = 10
bins = np.linspace(min_X, max_X, resolution + 1)
X_binned = np.digitize(X, bins) - 1
array([[25, 55],
[65, 86],
[50, 15],
[95, 10],
[50, 50]], dtype=int64)
So for example here we have a 100 X 100 grid on the [-10,10] X [-10,10] field, and point (-5, 1) maps to cell (25,55).
Now I would like a set of 2D all-zero grids, in the form of an N X resolution X resolution X 1 numpy array (or a set of "images"). Each grid i should contain a 1 at the location of coordinate i.
This is what I'm doing, but it doesn't seem very NumPy-like, and it is an O(N) Python loop (N would eventually be around 100K):
X_images = np.zeros((X.shape[0], resolution, resolution, 1), dtype=np.uint8)
for i, (j, k) in enumerate(X_binned):
    X_images[i, resolution - 1 - k, j, 0] = 1
To show that it works, see e.g. the first point (-5, 1), which maps to where it should be on the X_images[0] grid/image.
So is there a better way, if not more efficient then at least more elegant?
You can set them all at once by indexing X_images with the list of rows and cols:
X_images = np.zeros((X.shape[0], resolution, resolution, 1), dtype=np.uint8)
i = np.arange(X.shape[0])
j = X_binned[:, 0]
k = resolution - 1 - X_binned[:, 1]
X_images[i, k, j] = 1
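A small self-contained check on the question's sample data that the fancy-indexing version matches the loop. Integer index arrays passed side by side are paired element-wise, so each image gets exactly one cell; for contrast, the sketch also shows np.ix_, which forms every combination of its index arrays instead.

```python
import numpy as np

X = np.array([[-5, 1], [3, 7.2], [0, -7], [9, -8], [0, 0.1]])
resolution, min_X, max_X = 100, -10, 10
bins = np.linspace(min_X, max_X, resolution + 1)
X_binned = np.digitize(X, bins) - 1
N = X.shape[0]

# Loop version from the question
loop = np.zeros((N, resolution, resolution, 1), dtype=np.uint8)
for i, (j, k) in enumerate(X_binned):
    loop[i, resolution - 1 - k, j, 0] = 1

# Vectorized: the three index arrays are paired element-wise
vec = np.zeros_like(loop)
vec[np.arange(N), resolution - 1 - X_binned[:, 1], X_binned[:, 0], 0] = 1

# np.ix_ builds an open mesh, i.e. the Cartesian product of the index arrays
mesh = np.zeros_like(loop)
mesh[np.ix_(np.arange(N), resolution - 1 - X_binned[:, 1], X_binned[:, 0], [0])] = 1

print(np.array_equal(loop, vec))   # True
print(vec.sum(axis=(1, 2, 3)))     # one cell per image
print(mesh.sum(axis=(1, 2, 3)))    # many cells per image
```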
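Using np.ix_() looks straightforward, but note that np.ix_ builds an open mesh: indexing with it selects every combination of the given index arrays, so each image gets a 1 at every pairing of the rows and columns of all N points rather than a single 1, and the question's row flip (resolution - 1 - k) is not applied either:
X_images = np.zeros((X_binned.shape[0], resolution, resolution, 1), dtype=np.uint8)
X_images[np.ix_(np.arange(X_binned.shape[0]), *X_binned.T, [0])] = 1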

Needing to assess smaller 3D arrays in larger 3D array with Numpy

I have to take a random integer 50x50x50 array and determine which contiguous 3x3x3 cube within it has the largest sum.
It seems like a lot of the splitting functions in NumPy don't work well unless the larger array divides evenly into the smaller cubes. Working through the thought process, I made a 48x48x48 cube filled in order from 0 to 110,591. I was then thinking of reshaping it to a 4D array with the following code and assessing which of the subarrays had the largest sum. When I run this code, though, it splits the array in an order that is not what I want: I want the first subarray to be the 3x3x3 cube from the corner of the 48x48x48 cube. Is there a syntax I can add to make this happen?
import numpy as np
arr1 = np.arange(0,110592)
arr2=np.reshape(arr1, (48,48,48))
arr3 = np.reshape(arr2, (4096, 3,3,3))
array([[[[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8]],
[[ 9, 10, 11],
[ 12, 13, 14],
[ 15, 16, 17]],
[[ 18, 19, 20],
[ 21, 22, 23],
[ 24, 25, 26]]],
desired output:
array([[[[ 0, 1, 2],
[ 48, 49, 50],
[ 96, 97, 98]],
etc etc
There's a simple (kind of) solution to your original problem of finding the maximum 3x3x3 subcube in a 50x50x50 cube that's based on changing the input array's strides. This solution is completely vectorized (meaning no looping), and so should get the best possible performance out of Numpy:
import numpy as np
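def cubecube(arr, cshape):
    # outer axes step between window origins, inner axes step inside a window;
    # both use the original strides, so the windows overlap
    strides = (*arr.strides, *arr.strides)
    shape = (*np.array(arr.shape) - cshape + 1, *cshape)
    return np.lib.stride_tricks.as_strided(arr, shape=shape, strides=strides)

def maxcube(arr, cshape):
    cc = cubecube(arr, cshape)
    # sum over the inner (window) axes: one sum per window origin
    ccsums = cc.sum(axis=tuple(range(-arr.ndim, 0)))
    ix = np.unravel_index(np.argmax(ccsums), ccsums.shape)[:arr.ndim]
    return ix, cc[ix]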
The maxcube function takes an array and the shape of the subcubes, and returns a tuple of (first-index-of-largest-cube, largest-cube). Here's an example of how to use maxcube:
shape = (50, 50, 50)
cshape = (3, 3, 3)
# set up a 50x50x50 array
arr = np.arange(np.prod(shape)).reshape(shape)
# set one of the subcubes as the largest
arr[37, 26, 11] = 999999
ix, cube = maxcube(arr, cshape)
print('first index of largest cube: {}'.format(ix))
print('largest cube:\n{}'.format(cube))
which outputs:
first index of largest cube: (37, 26, 11)
largest cube:
[[[999999 93812 93813]
[ 93861 93862 93863]
[ 93911 93912 93913]]
[[ 96311 96312 96313]
[ 96361 96362 96363]
[ 96411 96412 96413]]
[[ 98811 98812 98813]
[ 98861 98862 98863]
[ 98911 98912 98913]]]
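On NumPy 1.20 or newer, np.lib.stride_tricks.sliding_window_view builds the same overlapping-window view as cubecube without hand-computed strides, and it returns a read-only view by default, which avoids accidentally writing through aliased memory. A minimal sketch on random data (the seed and value range are arbitrary choices):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

rng = np.random.default_rng(0)
arr = rng.integers(0, 100, size=(50, 50, 50))

# Every overlapping 3x3x3 window as a read-only view, shape (48, 48, 48, 3, 3, 3)
windows = sliding_window_view(arr, (3, 3, 3))
sums = windows.sum(axis=(3, 4, 5))                  # shape (48, 48, 48)
ix = np.unravel_index(np.argmax(sums), sums.shape)  # corner of the best cube
i, j, k = ix
best = arr[i:i + 3, j:j + 3, k:k + 3]
print(ix, sums[ix], best.sum())
```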
In depth explanation
A cube of cubes
What you have is a 48x48x48 cube, but what you want is a cube of smaller cubes. One can be converted to the other by altering its strides. For a 48x48x48 array of dtype int64, the stride will originally be set as (48*48*8, 48*8, 8). The first value of each non-overlapping 3x3x3 subcube can be iterated over with a stride of (3*48*48*8, 3*48*8, 3*8). Combine these strides to get the strides of the cube of cubes:
# Set up a 48x48x48 array, like in OP's example
arr = np.arange(48**3).reshape(48,48,48)
shape = (16,16,16,3,3,3)
strides = (3*48*48*8, 3*48*8, 3*8, 48*48*8, 48*8, 8)
# restride into a 16x16x16 array of 3x3x3 cubes
arr2 = np.lib.stride_tricks.as_strided(arr, shape=shape, strides=strides)
arr2 is a view of arr (meaning that they share data, so no copy needs to be made) with a shape of (16,16,16,3,3,3). The ijk-th 3x3x3 cube in arr can be accessed by passing the indices to arr2:
i,j,k = 0,0,0
print(arr2[i,j,k])
[[[ 0 1 2]
[ 48 49 50]
[ 96 97 98]]
[[2304 2305 2306]
[2352 2353 2354]
[2400 2401 2402]]
[[4608 4609 4610]
[4656 4657 4658]
[4704 4705 4706]]]
You can get the sums of all of the subcubes by just summing across the inner axes:
sumOfSubcubes = arr2.sum(axis=(3,4,5))
This will yield a 16x16x16 array in which each value is the sum of a non-overlapping 3x3x3 subcube from your original array. This solves the specific problem about the 48x48x48 array that the OP asked about. Restriding can also be used to find all of the overlapping 3x3x3 cubes, as in the cubecube function above.
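For the non-overlapping case, the layout asked for in the question can also be obtained without touching strides: split each axis of length 48 into (16, 3) with reshape, then move the three block axes to the front with transpose. A minimal sketch:

```python
import numpy as np

arr = np.arange(48**3).reshape(48, 48, 48)
# arr[3a + b, 3c + d, 3e + f] becomes blocks[a, c, e, b, d, f]
blocks = arr.reshape(16, 3, 16, 3, 16, 3).transpose(0, 2, 4, 1, 3, 5)
print(blocks.shape)        # (16, 16, 16, 3, 3, 3)
print(blocks[0, 0, 0, 0])  # [[0 1 2] [48 49 50] [96 97 98]] as rows, the desired first slab
sums = blocks.sum(axis=(3, 4, 5))   # one sum per non-overlapping 3x3x3 cube
```

Both reshape and transpose return views here, so no data is copied until the sum.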
Your thought process with the 48x48x48 cube goes in the right direction insofar as there are 48³ different contiguous 3x3x3 cubes within the 50x50x50 array, though I don't understand why you would want to reshape it.
What you could do is add all 27 values of each 3x3x3 cube into a 48x48x48 array by going through all 27 combinations of shifted slices, and then find the maximum. The found entry gives you the index tuple coordinate_index of the cube corner that is closest to the origin of your original array.
import numpy as np

array_shape = np.array((50, 50, 50), dtype=int)
cube_dim = np.array((3, 3, 3), dtype=int)
# random integers in [0, 100); the value range is chosen for illustration
original_array = np.random.randint(0, 100, size=tuple(array_shape))
reduced_shape = array_shape - cube_dim + 1
sum_array = np.zeros(reduced_shape, dtype=int)
# add the 27 shifted (48, 48, 48) slices, one per offset inside the cube
for i in range(cube_dim[0]):
    for j in range(cube_dim[1]):
        for k in range(cube_dim[2]):
            sum_array += original_array[
                i:i + reduced_shape[0],
                j:j + reduced_shape[1],
                k:k + reduced_shape[2]]
flat_index = np.argmax(sum_array)
coordinate_index = np.unravel_index(flat_index, reduced_shape)
This method should be faster than looping over each of the 48³ index combinations to find the desired cube, as it uses in-place summation, but in turn it requires more memory. I'm not sure about it, but defining a (48³, 27) array with slices and using np.sum over the second axis could be even faster.
You can easily change the above code to find a cuboid with arbitrary side lengths instead.
This is a solution without many numpy functions, just numpy.sum. First define a cubic array and then the size cs of the cube you are going to sum within.
Just change cs to adjust the cube size and find other solutions. Following @Divakar's suggestion, I have used a 4x4x4 array, and I also store the location of the cube (just the vertex at the cube's origin).
import numpy as np

a = np.random.randint(0, 9, (4, 4, 4))
cs = 2  # cube size
my_sum = 0
idx = None
# every cube origin for which the whole cs x cs x cs cube fits inside a
for i in range(a.shape[0] - cs + 1):
    for j in range(a.shape[1] - cs + 1):
        for k in range(a.shape[2] - cs + 1):
            cube_sum = np.sum(a[i:i+cs, j:j+cs, k:k+cs])
            if cube_sum > my_sum:
                my_sum = cube_sum
                idx = (i, j, k)
print(my_sum, idx)  # e.g. 42 (0, 0, 0) for the array shown below
This 3D array a is
[[[5 0 3 3]
[7 3 5 2]
[4 7 6 8]
[8 1 6 7]]
[[7 8 1 5]
[8 4 3 0]
[3 5 0 2]
[3 8 1 3]]
[[3 3 7 0]
[1 0 4 7]
[3 2 7 2]
[0 0 4 5]]
[[5 6 8 4]
[1 4 8 1]
[1 7 3 6]
[7 2 0 3]]]
And you get my_sum = 42 and idx = (0, 0, 0) for cs = 2. And my_sum = 112 and idx = (1, 0, 0) for cs = 3
Here is a cumsum based fast solution:
import numpy as np
nd = 3
cs = 3
N = 50
# create indices [cs-1:, ...], [:, cs-1:, ...], ...
fromcsm = *zip(*np.where(np.identity(nd, bool), np.s_[cs-1:], np.s_[:])),
# create indices [cs:, ...], [:, cs:, ...], ...
fromcs = *zip(*np.where(np.identity(nd, bool), np.s_[cs:], np.s_[:])),
# create indices [:cs, ...], [:, :cs, ...], ...
tocs = *zip(*np.where(np.identity(nd, bool), np.s_[:cs], np.s_[:])),
# create indices [:-cs, ...], [:, :-cs, ...], ...
tomcs = *zip(*np.where(np.identity(nd, bool), np.s_[:-cs], np.s_[:])),
# create indices [cs-1, ...], [:, cs-1, ...], ...
atcsm = *zip(*np.where(np.identity(nd, bool), cs-1, np.s_[:])),
def windowed_sum(a):
    out = a.copy()
    for i, (fcsm, fcs, tcs, tmcs, acsm) \
            in enumerate(zip(fromcsm, fromcs, tocs, tomcs, atcsm)):
        out[fcs] -= out[tmcs]
        out[acsm] = out[tcs].sum(axis=i)
        out = out[fcsm].cumsum(axis=i)
    return out
This returns the sums over all the sub cubes. We can then use argmax and unravel_index to get the offset of the maximum cube. Example:
a = np.random.randint(0,9,(N,N,N))
s = windowed_sum(a)
idx = np.unravel_index(np.argmax(s,), s.shape)
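The cumsum idea may be easier to follow in a stripped-down form. The sketch below is not the answer's windowed_sum (the name box_sums is hypothetical), but it relies on the same prefix-sum principle: a box sum is separable, so a 1-D sliding sum is applied along each axis in turn, and each 1-D sliding sum is a difference of two prefix sums.

```python
import numpy as np

def box_sums(a, cs):
    """Sum of every cs-wide window along all axes (one result per window corner)."""
    out = a
    for axis in range(a.ndim):
        prefix = np.cumsum(out, axis=axis)
        pad = [(0, 0)] * a.ndim
        pad[axis] = (1, 0)                       # leading zero so prefix[0] == 0
        prefix = np.pad(prefix, pad)
        n = prefix.shape[axis]
        upper = np.take(prefix, np.arange(cs, n), axis=axis)
        lower = np.take(prefix, np.arange(0, n - cs), axis=axis)
        out = upper - lower                      # sums of windows [s, s + cs)
    return out

rng = np.random.default_rng(1)
a = rng.integers(0, 9, size=(50, 50, 50))
s = box_sums(a, 3)                               # shape (48, 48, 48)
idx = np.unravel_index(np.argmax(s), s.shape)
print(s.shape, idx)
```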

Tensorflow scan multiple matrix rows with offset

I want to scan a matrix analogous to Tensorflow's tf.scan(), but using multiple rows at a time. So given a [n, m] matrix, I want to be able to iterate the m rows (with n elements) from i + j to m giving m - j slices of shape [i - j, n].
How can this be achieved?
I know how tf.scan does something like this, returning the accumulated value of each iteration. But I don't think shifting the matrix as multiple inputs solves this, since the values that have an offset cannot be precomputed.
To give an example for n = 3 and m = 5, let's say I have a matrix that looks like the following:
# [[1 0 0]
# [1 1 0]
# [0 0 0] row 3
# [0 0 0] row 4
# [0 0 0]] row 5
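import tensorflow as tf

matrix_shape = [5, 3]
matrix_idx = tf.constant([[0, 0], [1, 0], [1, 1]])
# assumed completion (the rest of this call was cut off): ones at the three indices
matrix = tf.scatter_nd(matrix_idx, tf.ones([3], dtype=tf.int32), matrix_shape)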
I want to apply the following function from row 3 to row 5:
# [[ 1 0 0] ┌ a
# [ 1 1 0] ├ b
# [ 6 4 2] <─┴ output / current line
# [16 12 6]
# [46 34 18]]
def compute(x):
    a = x[0]
    b = x[1]
    return (a + b + 1) * 2
Does Tensorflow have a function specific to this problem?
The following code I wrote does exactly what I wanted.
The important part here is the return value of the function used by tf.scan, which gives back not only the current computation c but also the row from the previous step b. It is therefore important to later cut off this excess from the computation by selecting only the latter tensor of the returned tuple with [1].
#!/usr/bin/env python3
import tensorflow as tf

def compute(x, _):
    a = x[0]
    b = x[1]
    c = (a + b + 1) * 2
    return (b, c)

matrix_shape = tf.constant([3, 3])
init_data = [[1, 0, 0], [1, 1, 0]]
# assumed completion (cut off in this copy): the two starting rows
initializer = (tf.constant(init_data[0]), tf.constant(init_data[1]))
matrix = tf.zeros(matrix_shape, dtype=tf.int32)
computation = tf.scan(compute, matrix, initializer)[1]
result = tf.concat((tf.constant(init_data), computation), axis=0)
with tf.Session() as sess:
    print(sess.run(result))  # assumed body; the original was cut off
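To make the accumulator mechanics concrete without TensorFlow, here is a plain NumPy sketch. The scan helper is a hypothetical stand-in that mimics how tf.scan threads the state (the returned tuple) from one step to the next; it is not TensorFlow's implementation.

```python
import numpy as np

def compute(state, _row):
    a, b = state               # the two previous rows
    c = (a + b + 1) * 2        # the new row
    return (b, c)              # slide the two-row window forward by one row

def scan(fn, elems, initializer):
    """Hypothetical stand-in for tf.scan: collect every intermediate state."""
    state, outputs = initializer, []
    for elem in elems:
        state = fn(state, elem)
        outputs.append(state)
    return outputs

init = np.array([[1, 0, 0], [1, 1, 0]])
states = scan(compute, np.zeros((3, 3), dtype=int), (init[0], init[1]))
result = np.vstack([init] + [c for _b, c in states])  # keep only the "c" part
print(result)
```

The result reproduces the table from the question: rows [6 4 2], [16 12 6] and [46 34 18] follow the two initial rows.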
Since I still lack experience: might this solution be bad for performance, because the function returns a tuple and therefore doesn't use TensorFlow's speed optimizations?

How do I overwrite a row vector in a numpy array?

I am trying to normalize each row vector of numpy array x, but I'm facing 2 problems.
I'm unable to update the row vectors of x.
Is it possible to avoid the for loop with any numpy functions?
import numpy as np
x = np.array([[0, 3, 4] , [1, 6, 4]])
c = x ** 2
for i in range(0, len(x)):
    print(x[i]/np.sqrt(c[i].sum()))  #prints [0. 0.6 0.8]
    x[i] = x[i]/np.sqrt(c[i].sum())
    print(x[i])  #prints [0 0 0]
print(x)  #prints [[0 0 0] [0 0 0]] and wasn't updated
I've just recently started out with numpy, so any assistance would be greatly appreciated!
I'm unable to update the row vectors of x
Your np.array has no dtype argument, so NumPy infers an integer dtype from the integer input (int32 or int64, depending on the platform), and assigning floats into it truncates them. If you wish to store floats in the array, add a float dtype:
x = np.array([[0, 3, 4], [1, 6, 4]], dtype=float)
To see this, compare
x = np.array([[0, 3, 4], [1, 6, 4]], dtype=float)
print(type(x[0][0]))  # <class 'numpy.float64'>
x = np.array([[0, 3, 4], [1, 6, 4]])
print(type(x[0][0]))  # an integer type, e.g. <class 'numpy.int64'> (numpy.int32 on some platforms)
Is it possible to avoid the for loop with any numpy functions?
This is how I would do it:
norm1, norm2 = np.linalg.norm(x[0]), np.linalg.norm(x[1])
print(x[0] / norm1)
print(x[1] / norm2)
You can use:
x/np.sqrt((x*x).sum(axis=1))[:, None]
In [9]: x = np.array([[0, 3, 4] , [1, 6, 4]])
In [10]: x/np.sqrt((x*x).sum(axis=1))[:, None]
array([[0. , 0.6 , 0.8 ],
[0.13736056, 0.82416338, 0.54944226]])
For the first question:
x = np.array([[0,3,4],[1,6,4]],dtype=np.float32)
For the second question:
Given 2-dimensional array
x = np.array([[0, 3, 4] , [1, 6, 4]])
Row-wise L2 norm of that array can be calculated with:
norm = np.linalg.norm(x, axis = 1)
[5. 7.28010989]
You cannot divide array x of shape (2, 3) by norm of shape (2,); the following trick enables it by adding an extra dimension to norm:
# Divide by adding extra dimension
x = x / norm[:, None]
[[0. 0.6 0.8 ]
[0.13736056 0.82416338 0.54944226]]
This solves both your questions
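A compact sketch tying both answers together: assigning a float row into an integer array silently truncates it, while a float copy can be normalized in place for all rows at once (keepdims=True keeps the norms as a column so they broadcast against the rows).

```python
import numpy as np

x = np.array([[0, 3, 4], [1, 6, 4]])     # integer input -> integer dtype
row = x[0] / np.linalg.norm(x[0])         # [0.  0.6 0.8]
x_int = x.copy()
x_int[0] = row                            # floats assigned into an int array are truncated
print(x_int[0])                           # [0 0 0]

x_f = x.astype(float)                     # float copy, so rows can hold fractions
x_f /= np.linalg.norm(x_f, axis=1, keepdims=True)   # normalize every row in place, no loop
print(x_f)
print(np.linalg.norm(x_f, axis=1))        # each row now has unit length
```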

Numpy: Get rectangle area just the size of mask

I have an image and a mask, both numpy arrays. I get the mask through GraphSegmentation (cv2.ximgproc.segmentation), so the area isn't a rectangle, but it is not split into separate pieces. I'd like to get the rectangle that just encloses the masked area, but I don't know an efficient way to do it.
In other words, unmasked pixels have value 0 and masked pixels have values above 0, so I want to get a rectangle where...
top = the smallest index along axis 0 whose value > 0
bottom = the largest index along axis 0 whose value > 0
left = the smallest index along axis 1 whose value > 0
right = the largest index along axis 1 whose value > 0
image = src[top : bottom + 1, left : right + 1]
My code is below
import cv2
import numpy as np

segmentation = cv2.ximgproc.segmentation.createGraphSegmentation()
src = cv2.imread('image_file')
segment = segmentation.processImage(src)
# labels run from 0 to np.max(segment), so include the last one
for i in range(np.max(segment) + 1):
    dst = np.array(src)
    dst[segment != i] = 0
    cv2.imwrite('output_file', dst)
If you prefer pure NumPy, you can achieve this using np.where and np.meshgrid:
i, j = np.where(mask)
indices = np.meshgrid(np.arange(min(i), max(i) + 1),
                      np.arange(min(j), max(j) + 1),
                      indexing='ij')
sub_image = image[tuple(indices)]  # index with a tuple, not a list (required by recent NumPy)
np.where returns a tuple of arrays specifying, pairwise, the indices in each axis for each non-zero element of mask. We then create arrays of all the row and column indices we will want using np.arange, and use np.meshgrid to generate two grid-shaped arrays that index the part of the image we're interested in. Note that we specify matrix-style indexing using indexing='ij' to avoid having to transpose the result (the default is Cartesian-style indexing).
Essentially, meshgrid constructs indices so that:
image[tuple(indices)][a, b] == image[indices[0][a, b], indices[1][a, b]]
Start with the following:
>>> image = np.arange(12).reshape((4, 3))
>>> image
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11]])
Let's say we want to extract the [[3,4],[6,7]] sub-matrix, which is the bounding rectangle for the following mask:
>>> mask = np.array([[0,0,0],[0,1,0],[1,0,0],[0,0,0]])
>>> mask
array([[0, 0, 0],
[0, 1, 0],
[1, 0, 0],
[0, 0, 0]])
Then, applying the above method:
>>> i, j = np.where(mask)
>>> indices = np.meshgrid(np.arange(min(i), max(i) + 1), np.arange(min(j), max(j) + 1), indexing='ij')
>>> image[tuple(indices)]
array([[3, 4],
[6, 7]])
Here, indices[0] is a matrix of row indices, while indices[1] is the corresponding matrix of column indices:
>>> indices[0]
array([[1, 1],
[2, 2]])
>>> indices[1]
array([[0, 1],
[0, 1]])
I think using np.amax and np.amin and cropping the image is much faster.
i, j = np.where(mask)
indices = np.meshgrid(np.arange(min(i), max(i) + 1),
                      np.arange(min(j), max(j) + 1),
                      indexing='ij')
sub_image = image[tuple(indices)]
Time taken: 50 msec
where = np.array(np.where(mask))
x1, y1 = np.amin(where, axis=1)
x2, y2 = np.amax(where, axis=1)
sub_image = image[x1:(x2+1), y1:(y2+1)]
Time taken: 5.6 msec
I don't get Hans's results when running the two methods (using NumPy 1.18.5). In any case, there is a much more efficient method, where you take the arg-max along each dimension
i, j = np.where(mask)
y, x = np.meshgrid(
    np.arange(min(i), max(i) + 1),
    np.arange(min(j), max(j) + 1),
    indexing='ij')
sub_image = image[y, x]
Took 38 ms
where = np.array(np.where(mask))
y1, x1 = np.amin(where, axis=1)
y2, x2 = np.amax(where, axis=1) + 1
sub_image = image[y1:y2, x1:x2]
Took 35 ms
maskx = np.any(mask, axis=0)
masky = np.any(mask, axis=1)
x1 = np.argmax(maskx)
y1 = np.argmax(masky)
x2 = len(maskx) - np.argmax(maskx[::-1])
y2 = len(masky) - np.argmax(masky[::-1])
sub_image = image[y1:y2, x1:x2]
Took 2 ms
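The any/argmax approach can be wrapped into a small reusable function. This is a sketch with a hypothetical helper name; it adds an explicit empty-mask check (argmax on an all-False row vector would otherwise return 0 and produce a bogus crop) and uses basic slicing, so the result is a view rather than a copy.

```python
import numpy as np

def crop_to_mask(image, mask):
    """Crop image to the bounding box of the non-zero pixels of a 2-D mask.

    Hypothetical helper; returns None when the mask has no non-zero pixel.
    Works for (H, W) and (H, W, channels) images because only the first
    two axes are sliced.
    """
    rows = np.any(mask, axis=1)              # True for every row touching the mask
    cols = np.any(mask, axis=0)              # True for every column touching the mask
    if not rows.any():
        return None
    y1 = np.argmax(rows)                     # first True row
    y2 = len(rows) - np.argmax(rows[::-1])   # one past the last True row
    x1 = np.argmax(cols)
    x2 = len(cols) - np.argmax(cols[::-1])
    return image[y1:y2, x1:x2]               # basic slicing: a view, not a copy

image = np.arange(12).reshape(4, 3)
mask = np.array([[0, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 0]])
print(crop_to_mask(image, mask))   # [[3 4]
                                   #  [6 7]]
```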

