Imagine we have a segmentation problem with 5 classes (0, 1, 2, 3, 4), and that we have the following 3D mask volumes (a.k.a. 3D numpy arrays):
# Ground truth mask
y_true = np.array([[[2, 1, 4], [0, 1, 1], [2, 1, 0]],
                   [[2, 2, 2], [0, 1, 0], [0, 1, 1]],
                   [[2, 4, 4], [2, 1, 4], [2, 1, 1]]])
# Predicted mask
y_pred = np.array([[[2, 0, 4], [0, 2, 1], [2, 0, 0]],
                   [[2, 4, 0], [0, 1, 2], [0, 4, 1]],
                   [[2, 0, 4], [1, 1, 4], [2, 2, 1]]])
How can I compute the Hausdorff distance between them? I've looked into MONAI's implementation; however, I couldn't figure out the meaning of the compute_hausdorff_distance output.
I implemented a one-hot encoder, since MONAI requires the inputs to be one-hot encoded.
def one_hot_encode(array):
    # Index the 5x5 identity matrix with the labels: each label becomes
    # a one-hot row, appended as a new trailing axis of length 5
    return np.eye(5)[array].astype(int)
Now we have that:
# Ground truth mask
y_true = [[[[0 0 1 0 0]
            [0 1 0 0 0]
            [0 0 0 0 1]]
           ...
           [[0 0 1 0 0]
            [0 1 0 0 0]
            [0 1 0 0 0]]]]
# Predicted mask
y_pred = [[[[0 0 1 0 0]
            [1 0 0 0 0]
            [0 0 0 0 1]]
           ...
           [[0 0 1 0 0]
            [0 0 1 0 0]
            [0 1 0 0 0]]]]
The output of Monai's implementation is:
>>> compute_hausdorff_distance(one_hot_encode(y_pred), one_hot_encode(y_true), include_background=True)
[[1.         1.         1.        ]
 [2.         1.41421356 3.        ]
 [2.23606798 1.         1.        ]]
Looking at it I can understand it is computing the Euclidean distance, and it looks like it is treating labels as positions, but shouldn't the output be of shape 3x3x3, just like the masks?
Also, Scipy's implementation only works for 2D masks/arrays. Would it be right to compute the Hausdorff distance slice-wise, i.e., slice by slice, and afterwards average all the slice-wise Hausdorff distances obtained? Or does this approach violate the Hausdorff distance principle for 3D data?
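For what it's worth, scipy.spatial.distance.directed_hausdorff operates on (n_points, n_dims) coordinate arrays rather than on masks, so it can handle 3D data directly if each class mask is first converted into a voxel point cloud with np.argwhere; no slice-wise averaging is needed (which would indeed not give the true 3D Hausdorff distance). A minimal sketch, assuming a symmetric per-class Hausdorff distance is what's wanted (hausdorff_per_class is a hypothetical helper, not MONAI's API):
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def hausdorff_per_class(y_true, y_pred, num_classes=5):
    distances = {}
    for c in range(num_classes):
        # (z, y, x) coordinates of the voxels belonging to class c
        pts_true = np.argwhere(y_true == c)
        pts_pred = np.argwhere(y_pred == c)
        if len(pts_true) == 0 or len(pts_pred) == 0:
            # Hausdorff distance is undefined against an empty set
            distances[c] = np.inf
            continue
        # Symmetric Hausdorff = max of the two directed distances
        d_fwd = directed_hausdorff(pts_true, pts_pred)[0]
        d_bwd = directed_hausdorff(pts_pred, pts_true)[0]
        distances[c] = max(d_fwd, d_bwd)
    return distances
As for the output shape: if I read MONAI's conventions correctly, it expects channel-first (batch, channel, spatial...) tensors and returns one distance per batch element and per class, not per voxel, which would explain the 3x3 result above.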
Related
I have two 2D arrays and I would like to return all values that differ in the second array while keeping the existing dimensions.
Something like diff = arr2[np.nonzero(arr2 - arr1)] works to give me the differing elements, but how do I keep the dimensions and relative positions of the elements?
Example Input:
arr1 = [[0 1 2]    arr2 = [[0 1 2]
        [3 4 5]            [3 5 5]
        [6 7 8]]           [6 7 8]]
Expected output:
diff = [[0 0 0]
[0 5 0]
[0 0 0]]
How about the following:
import numpy as np
arr1 = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
arr2 = np.array([[0, 1, 2], [3, 5, 5], [6, 7, 8]])
diff = arr2 * ((arr2 - arr1) != 0)
print(diff)
# [[0 0 0]
# [0 5 0]
# [0 0 0]]
EDIT: Surprisingly to me, the following first version of my answer (since corrected by the OP) might be faster:
diff = arr2 * np.abs(np.sign(arr2 - arr1))
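A quick way to check such timing claims is the standard timeit module; a sketch with an arbitrarily chosen array size:
import timeit
import numpy as np

rng = np.random.default_rng(0)
arr1 = rng.integers(0, 10, size=(1000, 1000))
arr2 = rng.integers(0, 10, size=(1000, 1000))

t1 = timeit.timeit(lambda: arr2 * ((arr2 - arr1) != 0), number=100)
t2 = timeit.timeit(lambda: arr2 * np.abs(np.sign(arr2 - arr1)), number=100)
print(f"comparison-based: {t1:.3f}s, sign-based: {t2:.3f}s")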
If they are numpy arrays, you could do
ans = ar1 * 0
ans[ar1 != ar2] = ar2[ar1 != ar2]
ans
# array([[0, 0, 0],
# [0, 5, 0],
# [0, 0, 0]])
Without numpy, you can use map
list(map(lambda a, b: list(map(lambda x, y: y if x != y else 0, a, b)), arr1, arr2))
# [[0, 0, 0], [0, 5, 0], [0, 0, 0]]
Data
import numpy as np
arr1 = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
arr2 = [[0, 1, 2], [3, 5, 5], [6, 7, 8]]
ar1 = np.array(arr1)
ar2 = np.array(arr2)
I am surprised no one has proposed the numpy.where method:
diff = np.where(arr1 != arr2, arr2, 0)
Literally: where arr1 and arr2 differ, take the values of arr2; otherwise take 0.
Output:
array([[0, 0, 0],
[0, 5, 0],
[0, 0, 0]])
np.copyto
You can check for inequality between the two arrays, then use np.copyto with np.zeros_like.
out = np.zeros_like(arr2)  # np.zeros(arr2.shape) would default to float64
np.copyto(out, arr2, where=arr1 != arr2)
print(out)
# [[0 0 0]
#  [0 5 0]
#  [0 0 0]]
np.where
You can use np.where and specify x, y args.
out = np.where(arr1 != arr2, arr2, 0)
# [[0 0 0]
# [0 5 0]
# [0 0 0]]
I have a 2D array of size 100x128, consisting of 1's and 0's of dtype int64. I need to convert each row into 16 symbols of 8 bits each, with the corresponding decimal value.
For example, on a smaller scale, consider this 5x8 case with 2-bit symbols:
[[1 0 0 1 0 1 0 1]
 [0 1 1 0 1 0 1 1]
 [1 0 1 0 1 1 0 0]
 [1 0 1 1 0 1 1 0]
 [1 1 0 0 0 1 0 1]]
I need to convert this to
[[2 1 1 1]
 [1 2 2 3]
 [2 2 3 0]
 [2 3 1 2]
 [3 0 1 1]]
So in my case, I need to first divide each row of 128 bits into 8-bit chunks and get the decimal value of the corresponding binary number, which should give me a 100x16 2D array.
I have tried first reshaping the (100, 128) 2D array into a (100, 16, 8) 3D array, but I am not sure how to do the binary-to-decimal conversion treating the third dimension as binary strings.
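For reference, that reshape idea can be completed with a matrix product against powers of two along the last axis; a minimal sketch assuming big-endian bit order (most significant bit first) and random data just to fix the shapes:
import numpy as np

arr = np.random.randint(0, 2, size=(100, 128))  # stand-in for the real data

chunks = arr.reshape(100, 16, 8)      # 16 symbols of 8 bits per row
weights = 2 ** np.arange(8)[::-1]     # [128, 64, ..., 2, 1], MSB first
symbols = chunks @ weights            # decimal value of each 8-bit chunk
print(symbols.shape)                  # (100, 16)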
You can reshape, take the matrix product of every 2-bit sequence with the corresponding values of 2**np.arange(n_bits) (which can be done with the @ operator), then reshape again:
n_bits = 2
pows = (2 ** np.arange(n_bits))[::-1]
(np.reshape(a, (-1, n_bits)) @ pows).reshape(-1, a.shape[1] // n_bits)
array([[2, 1, 1, 1],
[1, 2, 2, 3],
[2, 2, 3, 0],
[2, 3, 1, 2],
[3, 0, 1, 1]])
Assuming you are using numpy you could do something like:
bits_count = 2
bit_weights = 2 ** np.arange(bits_count)[::-1]
data = np.array([[1, 0, 0, 1, 0, 1, 0, 1],
                 [0, 1, 1, 0, 1, 0, 1, 1],
                 [1, 0, 1, 0, 1, 1, 0, 0],
                 [1, 0, 1, 1, 0, 1, 1, 0],
                 [1, 1, 0, 0, 0, 1, 0, 1]])
(data.reshape((data.shape[0], -1, bits_count)) * bit_weights).sum(axis=2)
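As a side note, for the actual 100x128 case with 8-bit symbols, numpy's built-in np.packbits can do the packing directly; a sketch assuming big-endian bit order within each byte (np.packbits's default):
import numpy as np

arr = np.random.randint(0, 2, size=(100, 128)).astype(np.uint8)

# Pack each 8-bit chunk along the last axis into a single byte
symbols = np.packbits(arr.reshape(100, 16, 8), axis=-1).squeeze(-1)
print(symbols.shape)  # (100, 16)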
I am quite new to Python and have read lots of SO questions on this topic; however, none of them answers my needs.
I end up with an ndarray:
[[1, 2, 3],
 [4, 5, 6]]
Now I want to pad each element (e.g. [1, 2, 3]) with a tailored padding just for that element. Of course I could do it in a for loop and append each result to a new ndarray but isn't there a faster and cleaner way I could apply this over the whole ndarray at once?
I imagined it could work like:
myArray = np.array([[1, 2, 3],
                    [4, 5, 6]])
paddings = [(1, 2),
            (2, 1)]
myArray = np.pad(myArray, paddings, 'constant')
But of course this just outputs:
[[0 0 0 0 0 0]
 [0 0 1 2 3 0]
 [0 0 4 5 6 0]
 [0 0 0 0 0 0]
 [0 0 0 0 0 0]]
Which is not what I need. The target result would be:
[[0 1 2 3 0 0]
[0 0 4 5 6 0]]
How can I achieve this using numpy?
Here is a loop-based solution that first creates a zeros array sized according to the input array and the paddings. Explanation in the comments:
In [192]: myArray
Out[192]:
array([[1, 2, 3],
[4, 5, 6]])
In [193]: paddings
Out[193]:
array([[1, 2],
[2, 1]])
# calculate desired shape; needed for initializing `padded_arr`
# (assumes every row's padding sums to the same total)
In [194]: target_shape = (myArray.shape[0], myArray.shape[1] + paddings[0].sum())
In [195]: padded_arr = np.zeros(target_shape, dtype=np.int32)
In [196]: padded_arr
Out[196]:
array([[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0]], dtype=int32)
After this, we can use a for loop to slot fill the sequences from myArray, based on the values from paddings:
In [199]: for idx in range(paddings.shape[0]):
...: padded_arr[idx, paddings[idx, 0]:-paddings[idx, 1]] = myArray[idx]
...:
In [200]: padded_arr
Out[200]:
array([[0, 1, 2, 3, 0, 0],
[0, 0, 4, 5, 6, 0]], dtype=int32)
The reason we have to resort to a loop-based solution is that numpy.pad() doesn't yet support this sort of per-row padding, even with all the additional modes and keyword arguments it already provides.
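That said, for this particular case the Python-level loop can be avoided with fancy indexing, since each row's values stay contiguous; a sketch under the same assumption that every row's padding sums to the same total:
import numpy as np

myArray = np.array([[1, 2, 3], [4, 5, 6]])
paddings = np.array([[1, 2], [2, 1]])

n_rows, n_cols = myArray.shape
out = np.zeros((n_rows, n_cols + paddings[0].sum()), dtype=myArray.dtype)

# Each row's values start right after its left padding
rows = np.arange(n_rows)[:, None]
cols = paddings[:, 0][:, None] + np.arange(n_cols)
out[rows, cols] = myArray

print(out)
# [[0 1 2 3 0 0]
#  [0 0 4 5 6 0]]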
Suppose I have an array
[[0 2 1]
[1 0 1]
[2 1 1]]
and I want to convert it into a tensor of the form
[[[1 0 0]
[0 1 0]
[0 0 0]]
[[0 0 1]
[1 0 1]
[0 1 1]]
[[0 1 0]
[0 0 0]
[1 0 0]]]
Each depth layer (index i) is a binary mask showing where i appears in the input.
I have written code for this which works correctly but is too slow for any use. Can I replace the loop in this function with another vectorized operation?
def im2segmap(im, depth):
    # Allocate depth-first so tensor[c] is the binary mask for class c
    tensor = np.zeros((depth, im.shape[0], im.shape[1]))
    for c in range(depth):
        rows, cols = np.argwhere(im == c).T
        tensor[c, rows, cols] = 1
    return tensor
Use broadcasting -
(a==np.arange(num_classes)[:,None,None]).astype(int)
Or with builtin outer comparison -
(np.equal.outer(range(num_classes),a)).astype(int)
Use uint8 if you have to use an int dtype, or keep it boolean by skipping the int conversion altogether for a further boost.
Sample run -
In [42]: a = np.array([[0,2,1],[1,0,1],[2,1,1]])
In [43]: num_classes = 3 # or depth
In [44]: (a==np.arange(num_classes)[:,None,None]).astype(int)
Out[44]:
array([[[1, 0, 0],
[0, 1, 0],
[0, 0, 0]],
[[0, 0, 1],
[1, 0, 1],
[0, 1, 1]],
[[0, 1, 0],
[0, 0, 0],
[1, 0, 0]]])
To have the depth/num_classes as the third dim, extend the input array and then compare against the range array -
(a[...,None]==np.arange(num_classes)).astype(int)
(np.equal.outer(im, range(num_classes))).astype(int)
(np.equal.outer(im, range(num_classes))).astype(np.uint8)  # lower precision
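Incidentally, the same channels-last result can be produced with the np.eye indexing trick from the first question; a small sketch assuming the labels are valid row indices:
import numpy as np

a = np.array([[0, 2, 1], [1, 0, 1], [2, 1, 1]])
num_classes = 3

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing np.eye with the label array one-hot encodes it
one_hot = np.eye(num_classes, dtype=np.uint8)[a]  # shape (3, 3, 3), channels last
print(np.array_equal(one_hot, (a[..., None] == np.arange(num_classes)).astype(np.uint8)))
# True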
I'm interested in finding the individual sizes of the 'True' patches in a boolean array. For instance, in the boolean matrix:
[[1, 0, 0, 0],
[0, 1, 1, 0],
[0, 1, 0, 0],
[0, 1, 0, 0]]
The output would be:
[[1, 0, 0, 0],
[0, 4, 4, 0],
[0, 4, 0, 0],
[0, 4, 0, 0]]
I'm aware that I can do this recursively, but I'm also under the impression that Python-level array operations are costly at large scale. Is there an available library function for this?
Here's a quick and simple complete solution:
import numpy as np
import scipy.ndimage as mnts  # scipy.ndimage.measurements is deprecated
A = np.array([
[1, 0, 0, 0],
[0, 1, 1, 0],
[0, 1, 0, 0],
[0, 1, 0, 0]
])
# labeled is a version of A with labeled clusters:
#
# [[1 0 0 0]
# [0 2 2 0]
# [0 2 0 0]
# [0 2 0 0]]
#
# clusters holds the number of different clusters: 2
labeled, clusters = mnts.label(A)
# sizes is an array of cluster sizes: [0, 1, 4]
sizes = mnts.sum(A, labeled, index=range(clusters + 1))
# mnts.sum always outputs a float array, so we'll convert sizes to int
sizes = sizes.astype(int)
# get an array with the same shape as labeled and the
# appropriate values from sizes by indexing one array
# with the other. See the `numpy` indexing docs for details
labeledBySize = sizes[labeled]
print(labeledBySize)
output:
[[1 0 0 0]
[0 4 4 0]
[0 4 0 0]
[0 4 0 0]]
The trickiest line above is the "fancy" numpy indexing:
labeledBySize = sizes[labeled]
in which one array is used to index the other. See the numpy indexing docs (section "Index arrays") for details on why this works.
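To see the mechanics on a toy example (values chosen arbitrarily): indexing a 1D array with an integer array looks up each element and returns a result with the index array's shape:
import numpy as np

sizes = np.array([10, 20, 30])
labels = np.array([[0, 2],
                   [1, 0]])
print(sizes[labels])
# [[10 30]
#  [20 10]]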
I also wrote a version of the above code as a single compact function, including a test case based on a random array.