Speed up numpy integer-array indexing for depth

Suppose I have an array
[[0 2 1]
 [1 0 1]
 [2 1 1]]
and I want to convert it into a tensor of the form
[[[1 0 0]
  [0 1 0]
  [0 0 0]]
 [[0 0 1]
  [1 0 1]
  [0 1 1]]
 [[0 1 0]
  [0 0 0]
  [1 0 0]]]
where each depth layer (index i) is a binary mask showing where i appears in the input.
I have written code for this that works correctly but is far too slow to be useful. Can I replace the loop in this function with a vectorized operation?
def im2segmap(im, depth):
    # one binary mask per class value, stacked along the first axis
    tensor = np.zeros((depth, im.shape[0], im.shape[1]))
    for c in range(depth):
        rows, cols = np.argwhere(im == c).T
        tensor[c, rows, cols] = 1
    return tensor

Use broadcasting -
(a==np.arange(num_classes)[:,None,None]).astype(int)
Or with builtin outer comparison -
(np.equal.outer(range(num_classes),a)).astype(int)
If you need an integer dtype, use uint8 to save memory. Better still, skip the int conversion altogether and keep the boolean result for a further speedup.
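To make the dtype advice concrete, here is a minimal sketch that compares the memory footprint of the three variants on the 3x3 example. The variable names (mask_bool, mask_u8, mask_i64) are made up for illustration.

```python
import numpy as np

a = np.array([[0, 2, 1], [1, 0, 1], [2, 1, 1]])
num_classes = 3

# Broadcasting: compare (3, 3) input against a (num_classes, 1, 1) column of class ids
mask_bool = a == np.arange(num_classes)[:, None, None]  # boolean, no conversion
mask_u8 = mask_bool.astype(np.uint8)                    # 1 byte per element
mask_i64 = mask_bool.astype(np.int64)                   # 8 bytes per element

print(mask_bool.nbytes, mask_u8.nbytes, mask_i64.nbytes)  # 27 27 216
```

The boolean and uint8 results take the same space; the boolean one simply avoids the extra copy made by astype.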
Sample run -
In [42]: a = np.array([[0,2,1],[1,0,1],[2,1,1]])
In [43]: num_classes = 3 # or depth
In [44]: (a==np.arange(num_classes)[:,None,None]).astype(int)
Out[44]:
array([[[1, 0, 0],
        [0, 1, 0],
        [0, 0, 0]],
       [[0, 0, 1],
        [1, 0, 1],
        [0, 1, 1]],
       [[0, 1, 0],
        [0, 0, 0],
        [1, 0, 0]]])
To have the depth/num_classes as the third dim, extend the input array and then compare against the range array -
(a[...,None]==np.arange(num_classes)).astype(int)
(np.equal.outer(a, range(num_classes))).astype(int)
(np.equal.outer(a, range(num_classes))).astype(np.uint8) # smaller dtype, less memory
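As a sanity check, the broadcasting approach can be compared against a plain loop. This is a sketch under assumptions: the function names im2segmap_loop and im2segmap_vec and the channels_last flag are hypothetical, not part of the original question or answer.

```python
import numpy as np

def im2segmap_loop(im, depth):
    # Reference implementation: one boolean-mask assignment per class
    tensor = np.zeros((depth, im.shape[0], im.shape[1]), dtype=np.uint8)
    for c in range(depth):
        tensor[c][im == c] = 1
    return tensor

def im2segmap_vec(im, depth, channels_last=False):
    # Vectorized: a single broadcast comparison against the class ids
    if channels_last:
        return (im[..., None] == np.arange(depth)).astype(np.uint8)
    return (im == np.arange(depth)[:, None, None]).astype(np.uint8)

rng = np.random.default_rng(0)
im = rng.integers(0, 4, size=(5, 7))   # random labels in 0..3

first = im2segmap_vec(im, 4)
last = im2segmap_vec(im, 4, channels_last=True)

assert np.array_equal(first, im2segmap_loop(im, 4))
assert np.array_equal(last, np.moveaxis(first, 0, -1))
print(first.shape, last.shape)  # (4, 5, 7) (5, 7, 4)
```

The two layouts hold the same masks; only the position of the depth axis differs.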

Related

How to convert a boolean array into a matrix?

I am a beginner, and I want to know whether it is possible to convert a boolean array into a matrix in NumPy.
For example, we have a boolean array a like this:
a = [[False],
     [True],
     [True],
     [False],
     [True]]
And we turn it into the following matrix:
m = [[0, 0, 0, 0, 0],
     [0, 1, 0, 0, 0],
     [0, 0, 1, 0, 0],
     [0, 0, 0, 0, 0],
     [0, 0, 0, 0, 1]]
That is, the array should become the diagonal of the matrix.

You can use np.diagflat which creates a two-dimensional array with the flattened input as a diagonal:
np.diagflat(np.array(a, dtype=int))
#[[0 0 0 0 0]
# [0 1 0 0 0]
# [0 0 1 0 0]
# [0 0 0 0 0]
# [0 0 0 0 1]]
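A small runnable sketch of the same idea, also showing an equivalent spelling with np.diag on the flattened input. The names m and m_alt are chosen here for illustration.

```python
import numpy as np

a = np.array([[False], [True], [True], [False], [True]])

# diagflat flattens the (5, 1) input and places it on the main diagonal
m = np.diagflat(a.astype(int))

# Equivalent: flatten explicitly, then build the diagonal matrix
m_alt = np.diag(a.ravel().astype(int))

assert m.shape == (5, 5)
assert np.array_equal(m, m_alt)
print(m)
```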

Create new array based on the value and shapes of current array

Suppose I have an array of indices [2,1,1,2,0]. I would like to create a new 5x3 array whose first row has a 1 only at index 2, whose second row has a 1 only at index 1, and so on.
For example:
[[0, 0, 1], #2
 [0, 1, 0], #1
 [0, 1, 0], #1
 [0, 0, 1], #2
 [1, 0, 0]] #0
How can I vectorize this procedure without using a for loop?

import numpy as np
in_index = [2,1,1,2,0]
len_in_index = len(in_index)
result = np.zeros((len_in_index, 3), dtype=int)
for i in range(len_in_index):
    result[i, in_index[i]] = 1
print(result)
Output:
[[0 0 1]
 [0 1 0]
 [0 1 0]
 [0 0 1]
 [1 0 0]]
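The loop can be replaced by integer-array indexing. A minimal sketch, assuming the number of columns is known in advance (num_cols is a name introduced here):

```python
import numpy as np

in_index = np.array([2, 1, 1, 2, 0])
num_cols = 3  # assumed number of columns, as in the example

# Pair each row number with its target column and set those cells in one step
result = np.zeros((in_index.size, num_cols), dtype=int)
result[np.arange(in_index.size), in_index] = 1

# Alternative: pick rows of an identity matrix
alt = np.eye(num_cols, dtype=int)[in_index]

assert np.array_equal(result, alt)
print(result)
# [[0 0 1]
#  [0 1 0]
#  [0 1 0]
#  [0 0 1]
#  [1 0 0]]
```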

Pad elements in ndarray using unique padding for each element

I am quite new to Python and have read lots of SO questions on this topic, but none of them answers my needs.
I end up with an ndarray:
[[1, 2, 3],
 [4, 5, 6]]
Now I want to pad each element (i.e. each row, e.g. [1, 2, 3]) with a padding tailored just for that element. Of course I could do it in a for loop and append each result to a new ndarray, but isn't there a faster and cleaner way to apply this over the whole ndarray at once?
I imagined it could work like:
myArray = np.array([[1, 2, 3],
                    [4, 5, 6]])
paddings = [(1, 2),
            (2, 1)]
myArray = np.pad(myArray, paddings, 'constant')
But of course this pads whole axes rather than individual rows, and just outputs:
[[0 0 0 0 0 0]
 [0 0 1 2 3 0]
 [0 0 4 5 6 0]
 [0 0 0 0 0 0]
 [0 0 0 0 0 0]]
Which is not what I need. The target result would be:
[[0 1 2 3 0 0]
 [0 0 4 5 6 0]]
How can I achieve this using numpy?

Here is a loop-based solution that first creates a zeros array sized from the input array and the paddings. Explanation in comments:
In [192]: myArray
Out[192]:
array([[1, 2, 3],
       [4, 5, 6]])
In [193]: paddings
Out[193]:
array([[1, 2],
       [2, 1]])
# calculate desired shape (assumes each row's left + right padding adds up to the same total); needed for initializing `padded_arr`
In [194]: target_shape = (myArray.shape[0], myArray.shape[1] + paddings[0].sum())
In [195]: padded_arr = np.zeros(target_shape, dtype=np.int32)
In [196]: padded_arr
Out[196]:
array([[0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0]], dtype=int32)
After this, we can use a for loop to fill in the sequences from myArray at the offsets given by paddings:
In [199]: for idx in range(paddings.shape[0]):
     ...:     start = paddings[idx, 0]  # left padding; slicing by width also works when the right padding is 0
     ...:     padded_arr[idx, start:start + myArray.shape[1]] = myArray[idx]
     ...:
In [200]: padded_arr
Out[200]:
array([[0, 1, 2, 3, 0, 0],
       [0, 0, 4, 5, 6, 0]], dtype=int32)
The reason we have to resort to a loop-based solution is that numpy.pad() takes one (before, after) pair per axis, so it cannot apply a different padding to each row, even with all the additional modes and keyword arguments it provides.
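If the loop itself becomes a bottleneck, the fill step can also be written with integer-array indexing. This is a sketch rather than part of the original answer; it assumes every row pads out to the same total length, and the names my_array, cols and rows are introduced here.

```python
import numpy as np

my_array = np.array([[1, 2, 3], [4, 5, 6]])
paddings = np.array([[1, 2], [2, 1]])  # (left, right) padding per row

n_rows, width = my_array.shape
total = width + paddings.sum(axis=1)
# Assumption: all rows end up with the same padded length
assert (total == total[0]).all(), "rows must pad to the same length"

padded = np.zeros((n_rows, total[0]), dtype=my_array.dtype)

# Column index of every source element: left padding + offset within the row
cols = paddings[:, [0]] + np.arange(width)   # shape (n_rows, width)
rows = np.arange(n_rows)[:, None]            # shape (n_rows, 1), broadcasts with cols
padded[rows, cols] = my_array

print(padded)
# [[0 1 2 3 0 0]
#  [0 0 4 5 6 0]]
```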

numpy 2d boolean array count consecutive True sizes

I'm interested in finding out individual sizes of the 'True' patches in a boolean array. For instance in the boolean matrix:
[[1, 0, 0, 0],
 [0, 1, 1, 0],
 [0, 1, 0, 0],
 [0, 1, 0, 0]]
The output would be:
[[1, 0, 0, 0],
 [0, 4, 4, 0],
 [0, 4, 0, 0],
 [0, 4, 0, 0]]
I'm aware that I can do this recursively, but I'm also under the impression that Python array operations are costly at large scale. Is there a library function available for this?

Here's a quick and simple complete solution:
import numpy as np
from scipy import ndimage
A = np.array([
    [1, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0]
])
# labeled is a version of A with labeled clusters
# (by default only edge-sharing neighbors are connected, not diagonal ones):
#
# [[1 0 0 0]
#  [0 2 2 0]
#  [0 2 0 0]
#  [0 2 0 0]]
#
# clusters holds the number of different clusters: 2
labeled, clusters = ndimage.label(A)
# sizes is an array of cluster sizes: [0, 1, 4]
sizes = ndimage.sum(A, labeled, index=range(clusters + 1))
# ndimage.sum returns a float array here, so we'll convert sizes to int
sizes = sizes.astype(int)
# get an array with the same shape as labeled and the
# appropriate values from sizes by indexing one array
# with the other. See the `numpy` indexing docs for details
labeledBySize = sizes[labeled]
print(labeledBySize)
Output:
[[1 0 0 0]
 [0 4 4 0]
 [0 4 0 0]
 [0 4 0 0]]
The trickiest line above is the "fancy" numpy indexing:
labeledBySize = sizes[labeled]
in which one array is used to index the other. See the numpy indexing docs (section "Index arrays") for details on why this works.
I also wrote a version of the above code as a single compact function that you can try out yourself online. It includes a test case based on a random array.
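The index-array step can be tried in isolation with plain NumPy. A minimal sketch, assuming the label array is already available (it is hard-coded here rather than computed with SciPy) and using np.bincount as a stand-in for the size computation; this is not the author's linked version.

```python
import numpy as np

# Label array as produced for the example above (0 = background)
labeled = np.array([
    [1, 0, 0, 0],
    [0, 2, 2, 0],
    [0, 2, 0, 0],
    [0, 2, 0, 0],
])

# Count how often each label occurs; position k holds the size of cluster k
sizes = np.bincount(labeled.ravel())
sizes[0] = 0  # background cells should map to 0, not to the background count

# Index arrays: every cell of `labeled` is replaced by sizes[cell]
labeled_by_size = sizes[labeled]
print(labeled_by_size)
# [[1 0 0 0]
#  [0 4 4 0]
#  [0 4 0 0]
#  [0 4 0 0]]
```

The result has the shape of the index array (labeled), and each entry is looked up in the 1-D sizes array.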

numpy 3D array shape

I created a numpy array of shape (4,3,2); I expected the following code to print an array shaped 4 X 3 or 3 X 4:
import numpy as np
x = np.zeros((4,3,2), np.int32)
print(x[:][:][0])
However, I got
[[0 0]
 [0 0]
 [0 0]]
Looks like a 2 X 3? I am really confused about numpy matrices now. Shouldn't I get
[[0 0 0]
 [0 0 0]
 [0 0 0]
 [0 0 0]]
instead? How do I map a 3D image into a numpy 3D matrix?
For example, in MATLAB, the shape (m, n, k) means (row, col, slice) in the context of a (2D/3D) image.
Thanks a lot

x[:] slices all elements along the first dimension, so x[:] gives the same result as x and x[:][:][0] is thus equivalent to x[0].
To select index 0 along the last dimension, you can do:
x[..., 0]
#array([[0, 0, 0],
#       [0, 0, 0],
#       [0, 0, 0],
#       [0, 0, 0]], dtype=int32)
or:
x[:,:,0]
#array([[0, 0, 0],
#       [0, 0, 0],
#       [0, 0, 0],
#       [0, 0, 0]], dtype=int32)

You need to specify all slices at the same time in a tuple, like so:
x[:, :, 0]
If you do x[:][:][0], you are actually indexing the first dimension three times. The first two create a view of the entire array, and the third creates a view of index 0 along the first dimension.
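The difference is easy to check from the resulting shapes. A short sketch (the names chained and last are introduced here):

```python
import numpy as np

x = np.zeros((4, 3, 2), np.int32)

# Chained indexing: each [:] returns a view of the whole array, so this is x[0]
chained = x[:][:][0]
assert chained.shape == (3, 2)
assert np.shares_memory(chained, x[0])

# A single index tuple addresses all three axes at once
last = x[:, :, 0]
assert last.shape == (4, 3)
assert x[..., 0].shape == (4, 3)

print(chained.shape, last.shape)  # (3, 2) (4, 3)
```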
