how to match one axis with multiple numpy arrays - python

Background:
I have an RGB image with three dimensions (W, H, C), where C = 3. I want to mask a few colors, such as (0,0,255) and (0,255,255), in this image. The problem becomes matching the last axis of the image against a list of colors I defined.
color_list = [[255,0,0], [255,255,0], [255,0,255]] # just an example
It is easy to do it with one color,
mask = np.all(image == [255,0,0], axis = 2)
But I have to run a for loop if I have multiple colors.
masks = [np.all(image == color, axis = 2) for color in color_list]
mask = np.any(masks, axis=0)
Question:
Any elegant way to get the mask with multiple colors?

One way is to use broadcasting, which is more efficient since the loop runs in C. The idea is to make the arrays comparable. It might look difficult at the beginning, but once you know how it works, this is all you will use [conditions apply]....
import numpy as np
x = np.array([[[255, 0, 0],[ 0, 255, 0], [ 0, 255, 0], [ 0, 255, 0]], [[255, 0, 0],[ 0, 255, 0], [ 0, 255, 0], [ 0, 255, 0]]])
print(x.shape)
# (2, 4, 3)
color_list = np.array([[255,0,0], [255,255,0], [255,0,255]])
print(color_list.shape)
# (3, 3)
# make array compatible
x = x[:, :, np.newaxis, :]
### Analogy for interpreting broadcasting
# Here repeating is for analogy and does not mean it will allocate new copy of memory
# element-wise comparison, possible due to broadcasting
# shape of x is (2, 4, 1, 3)
# By broadcasting conceptually x will be repeated along axis=2 this will make (2, 4, 3, 3)
# color_list will be repeated over (2, 4) making it (2, 4, 3, 3) and they will have same shape also the final shape after == will be (2, 4, 3, 3)
# x already has shape (2, 4, 1, 3) from the reshape above, so compare it directly
f1 = np.all(x == color_list, axis=3)
#array([[[ True, False, False],
# [False, False, False],
# [False, False, False],
# [False, False, False]],
#
# [[ True, False, False],
# [False, False, False],
# [False, False, False],
# [False, False, False]]])
mask = np.any(f1, axis=2)
We have a target array with shape (W, H, C) == (2, 4, 3), and we need to find the length-3 arrays listed in color_list == [[255,0,0], [255,255,0], [255,0,255]].
Ideally we would like to do a cross comparison: if one side has M entries and the other side has N, we want M * N results. This amounts to repeating each of the M entries N times and comparing. That may not seem possible at first glance, but NumPy provides broadcasting, which conceptually repeats the entries like your for loop (it is memory efficient, since the comparison operands are not actually copied).
So we need to broadcast, but the two arrays are not compatible as they stand. Under the broadcasting rules, shapes are compared right to left, and each pair of dimensions must be equal or one of them must be 1.
color_list has shape (3, 3) and x has shape (2, 4, 3). We add a new axis to x to make it compatible for broadcasting: x[:, :, np.newaxis, :] has shape (2, 4, 1, 3). Now both are compatible and we can compare.
Reducing with np.all along the last axis (the color channel, axis=3) and then with np.any along the second-to-last axis (axis=2) gives a (W, H) boolean array in which each entry is True if that pixel's color triple is in color_list.
The same technique can be used to calculate the distance matrix when two arrays of points are given, as in "Fast way to calculate min distance between two numpy arrays of 3D points".
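The broadcasting steps described above can be wrapped in a small helper. This is a minimal sketch, not the answer's exact code: the function name color_mask and the sample image are made up for illustration, and the result is checked against the loop version from the question.

```python
import numpy as np

def color_mask(image, colors):
    """Return a 2D boolean mask, True where a pixel equals any listed color."""
    image = np.asarray(image)
    colors = np.asarray(colors)
    # (W, H, 1, C) == (N, C) broadcasts to (W, H, N, C)
    matches = image[:, :, np.newaxis, :] == colors
    # every channel must match (last axis), for at least one color (next axis)
    return matches.all(axis=-1).any(axis=-1)

# Hypothetical 2 x 3 image used only for this illustration
image = np.array([[[255, 0, 0], [0, 255, 0], [255, 255, 0]],
                  [[0, 0, 0], [255, 0, 255], [0, 255, 0]]])
color_list = [[255, 0, 0], [255, 255, 0], [255, 0, 255]]

mask = color_mask(image, color_list)

# Loop version from the question, for comparison
loop_mask = np.any([np.all(image == c, axis=2) for c in color_list], axis=0)
print(np.array_equal(mask, loop_mask))
```

Note that the intermediate boolean array has shape (W, H, N, C), so for large images and long color lists the loop may use less memory.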

Related

Setting some 2d array labels to zero in python

My goal is to set some labels in 2d array to zero without using a for loop. Is there a faster numpy way to do this without the for loop? The ideal scenario would be temp_arr[labeled_im not in labels] = 0, but it's not really working the way I'd like it to.
labeled_array = np.array([[1,2,3],
[4,5,6],
[7,8,9]])
labels = [2,4,5,6,8]
temp_arr = np.zeros((labeled_array.shape)).astype(int)
for label in labels:
    temp_arr[labeled_array == label] = label
>> temp_arr
[[0 2 0]
[4 5 6]
[0 8 0]]
The for loop gets quite slow when there are a lot of iterations to go through, so it is important to improve the execution time with numpy.
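You can use temp_arr = np.where(np.isin(labeled_array, labels), labeled_array, 0) with labels as a list or array. The difference for such a small array does not seem to be significant. Note that np.isin does not handle a set as intended: the set is converted to an object array with a single element, so the set variant timed below does not perform the same membership test. Convert the set with list(labels) first.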
import numpy as np
import time
labeled_array = np.array([[1,2,3],
[4,5,6],
[7,8,9]])
labels = [2,4,5,6,8]
start = time.time()
temp_arr_0 = np.zeros((labeled_array.shape)).astype(int)
for label in labels:
    temp_arr_0[labeled_array == label] = label
end = time.time()
print(f"Loop takes {end - start}")
start = time.time()
temp_arr_1 = np.where(np.isin(labeled_array, labels), labeled_array, 0)
end = time.time()
print(f"np.where takes {end - start}")
labels = {2,4,5,6,8}
start = time.time()
temp_arr_2 = np.where(np.isin(labeled_array, labels), labeled_array, 0)
end = time.time()
print(f"np.where with set takes {end - start}")
outputs
Loop takes 5.3882598876953125e-05
np.where takes 0.00010514259338378906
np.where with set takes 3.314018249511719e-05
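Before trusting the timings, it is worth checking that each variant returns the same array. The sketch below compares passing a set directly to np.isin with converting it to a list first; the variable names from_set and from_list are illustrative. The NumPy documentation notes that a set passed as test_elements is converted to an object array with one element, so the first variant is not expected to match the labels.

```python
import numpy as np

labeled_array = np.array([[1, 2, 3],
                          [4, 5, 6],
                          [7, 8, 9]])
label_set = {2, 4, 5, 6, 8}

# Passing the set directly: NumPy wraps it as a single object element
from_set = np.where(np.isin(labeled_array, label_set), labeled_array, 0)

# Converting to a list first gives the intended element-wise membership test
from_list = np.where(np.isin(labeled_array, list(label_set)), labeled_array, 0)

print(from_set)
print(from_list)
```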
In the case the labels are unique in labels (and memory isn't a concern), here's a way to go.
As the very first step, we convert labels to a ndarray
labels = np.array(labels)
Then, we produce two broadcastable arrays from labeled_array and labels
labeled_row = labeled_array.ravel()[np.newaxis, :]
labels_col = labels[:, np.newaxis]
The above code block produces respectively a row array of shape (1,9)
array([[1, 2, 3, 4, 5, 6, 7, 8, 9]])
and a column array of shape (5,1)
array([[2],
[4],
[5],
[6],
[8]])
Now the two shapes are broadcastable (see this page), so we can perform elementwise comparison, e.g.
mask = labeled_row == labels_col
which returns a (5,9)-shaped boolean mask
array([[False, True, False, False, False, False, False, False, False],
[False, False, False, True, False, False, False, False, False],
[False, False, False, False, True, False, False, False, False],
[False, False, False, False, False, True, False, False, False],
[False, False, False, False, False, False, False, True, False]])
If the assumption above is fulfilled, you'll have a number of True values per row equal to the number of times the corresponding label appears in your labeled_array. Nonetheless, you can also have all-False rows, e.g. when a label in labels never appears in your labeled_array.
To find out which labels actually appeared in your labeled_array, you can use np.nonzero on the boolean mask
indices = np.nonzero(mask)
which returns a tuple containing the row and column indices of the non-zero (i.e. True) elements
(array([0, 1, 2, 3, 4], dtype=int64), array([1, 3, 4, 5, 7], dtype=int64))
By construction, the first element of the tuple above tells you which labels actually appeared in your labeled_array, e.g.
appeared_labels = labels[indices[0]]
(note that you can have repeated elements in appeared_labels if that specific label appeared more than once in your labeled_array).
We can now build and fill the output array:
out = np.zeros(labeled_array.size, dtype=int)
out[indices[1]] = labels[indices[0]]
and bring it back to the original shape
out = out.reshape(*labeled_array.shape)
array([[0, 2, 0],
[4, 5, 6],
[0, 8, 0]])
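Put together, the steps above might look like the following sketch. The function name keep_labels is hypothetical, and the code assumes, as stated above, that the entries of labels are unique and that the (n_labels, n_pixels) boolean mask fits in memory.

```python
import numpy as np

def keep_labels(labeled_array, labels):
    """Zero out every entry of labeled_array whose value is not in labels."""
    labels = np.asarray(labels)
    flat = labeled_array.ravel()
    # (n_labels, 1) == (1, n_pixels) broadcasts to (n_labels, n_pixels)
    mask = labels[:, np.newaxis] == flat[np.newaxis, :]
    label_idx, pixel_idx = np.nonzero(mask)
    out = np.zeros(flat.size, dtype=labeled_array.dtype)
    # copy each matched label into its pixel position
    out[pixel_idx] = labels[label_idx]
    return out.reshape(labeled_array.shape)

labeled_array = np.array([[1, 2, 3],
                          [4, 5, 6],
                          [7, 8, 9]])
result = keep_labels(labeled_array, [2, 4, 5, 6, 8])
print(result)
```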

how to average a tensor axis with specified mask in tensorflow

For example:
I have an input tensor (input), shaped (?,10), dtype=float32; the first dimension is batch_size.
And a mask tensor (mask), shaped (?,10). mask[sample_number] looks like [True,True,False,...] and marks the positions to include.
And a label tensor (avg_label), shaped (?,), which holds the correct mean value of the masked positions for each sample.
I want to train the model, but can't find a good way to get the output.
The tf.reduce_... functions (e.g. tf.reduce_mean) don't seem to support a masking argument.
I tried tf.boolean_mask, but it flattens the output into a single dimension, dropping the sample_number dimension, so it cannot differentiate among the samples.
I considered tf.where, like:
masked=tf.where(mask,input,tf.zeros(tf.shape(input)))
avg_out=tf.reduce_mean(masked,axis=1)
loss=tf.pow(avg_out-avg_label,2)
But the code above does not work, because setting the False positions to 0 changes the average. If I use np.nan, the result is always nan. I wonder if there is a value representing absence when doing reduce operations.
How can I do this?
You can use tf.ragged.boolean_mask to keep the dimensionality.
tf.reduce_mean(tf.ragged.boolean_mask(x, mask=mask), axis=1)
You can use tf.boolean_mask.
In [17]: tensor = tf.constant([[1, 2], [3, 4], [5, 6]])
In [18]: mask = np.array([[True, False], [False, True], [True, False]])
In [19]: masked = tf.boolean_mask(tensor, mask)
In [20]: masked.eval()
Out[20]: array([1, 4, 5], dtype=int32)
In [21]: tf.reduce_mean(masked).eval()
Out[21]: 3
For the False masked values you can use tf.logical_not to toggle the mask.
You can write your own mean function by just counting the non-vanishing entries in your mask
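A per-sample version of that idea, written in NumPy so the arithmetic is easy to check: sum the selected values and divide by the number of selected positions. The name masked_mean and the choice to return 0 for rows with nothing selected are assumptions made for this sketch. The same arithmetic can be expressed in TensorFlow with tf.cast and tf.reduce_sum.

```python
import numpy as np

def masked_mean(values, mask, axis=1):
    """Mean over `axis` that counts only the positions where mask is True."""
    weights = mask.astype(values.dtype)
    total = (values * weights).sum(axis=axis)
    count = weights.sum(axis=axis)
    # Assumption: rows with no selected entries yield 0 instead of nan
    return total / np.maximum(count, 1)

values = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]], dtype=np.float32)
mask = np.array([[True, True, False],
                 [False, True, False]])

avg = masked_mean(values, mask)
print(avg)  # one mean per sample
```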
Why not just
import tensorflow as tf
import numpy as np
B, H, W, C = 5, 224, 224, 3
data = np.random.randn(B, H, W, C).astype(np.float32)
mask = np.random.randint(2, size=(B, H, W, C)).astype(np.float32)
expected = (data * mask).sum(axis=(1, 2, 3), keepdims=True)
expected = expected / mask.sum(axis=(1, 2, 3), keepdims=True)
data_op = tf.convert_to_tensor(data)
mask_op = tf.convert_to_tensor(mask)
actual_op = tf.reduce_sum(tf.multiply(data_op, mask_op), axis=[1, 2, 3], keepdims=True) / tf.reduce_sum(mask_op, axis=[1, 2, 3], keepdims=True)
with tf.Session() as sess:
    actual = sess.run(actual_op)
np.testing.assert_allclose(actual, expected)

Numpy: find indices conditioned on values in two different arrays (coming from R)

I have a volume represented by a 3D ndarray, X, with values between, say, 0 and 255, and I have another 3D ndarray, Y, that is an arbitrary mask of the first array, with values of either 0 or 1.
I want to find the indices of a random sample of 50 voxels that are both greater than zero in X, the 'image', and equal to 1 in Y, the 'mask'.
My experience is with R, where the following would work:
idx <- sample(which(X>0 & Y==1), 50)
Maybe the advantage in R is that I can index 3D arrays linearly, because just using a single index in numpy gives me a 2D matrix, for example.
I guess it probably involves numpy.random.choice, but it doesn't seem like I can use that conditionally, let alone conditioned on two different arrays. Is there another approach I should be using instead?
Here's one way -
N = 50 # number of samples needed (50 for your actual case)
# Get mask based on conditionals
mask = (X>0) & (Y==1)
# Get corresponding linear indices (easier to random sample in next step)
idx = np.flatnonzero(mask)
# Get random sample
rand_idx = np.random.choice(idx, N)
# Format into three columnar output (each col for each dim/axis)
out = np.c_[np.unravel_index(rand_idx, X.shape)]
If you need random sample without replacement, use np.random.choice() with optional arg replace=False.
Sample run -
In [34]: np.random.seed(0)
...: X = np.random.randint(0,4,(2,3,4))
...: Y = np.random.randint(0,2,(2,3,4))
In [35]: N = 5 # number of samples needed (50 for your actual case)
...: mask = (X>0) & (Y==1)
...: idx = np.flatnonzero(mask)
...: rand_idx = np.random.choice(idx, N)
...: out = np.c_[np.unravel_index(rand_idx, X.shape)]
In [37]: mask
Out[37]:
array([[[False, True, True, False],
[ True, False, True, False],
[ True, False, True, True]],
[[False, True, True, False],
[False, False, False, True],
[ True, True, True, True]]], dtype=bool)
In [38]: out
Out[38]:
array([[1, 0, 1],
[0, 0, 1],
[0, 0, 2],
[1, 1, 3],
[1, 1, 3]])
Correlate the output out against the places of True values in mask for a quick verification.
If you don't want to flatten for getting the linear indices and directly get the indices per dim/axis, we can do it like so -
i0,i1,i2 = np.where(mask)
rand_idx = np.random.choice(len(i0), N)
out = np.c_[i0,i1,i2][rand_idx]
For performance, index first and then concatenate with np.c_ at the last step -
out = np.c_[i0[rand_idx], i1[rand_idx], i2[rand_idx]]
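A sketch of the without-replacement variant, using NumPy's Generator API (np.random.default_rng) instead of the legacy np.random.choice. The small X and Y arrays are made up for illustration, and the sample size is capped at the number of candidate voxels so the draw cannot fail.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical small volume and mask, built deterministically
X = np.arange(24).reshape(2, 3, 4) % 4   # "image" values 0..3
Y = np.arange(24).reshape(2, 3, 4) % 2   # "mask" values 0 or 1

mask = (X > 0) & (Y == 1)
idx = np.flatnonzero(mask)               # linear indices of candidate voxels

n = min(5, idx.size)                     # cannot draw more than are available
picked = rng.choice(idx, size=n, replace=False)

# One row per sampled voxel, one column per axis
out = np.column_stack(np.unravel_index(picked, X.shape))
print(out)
```

Every row of out indexes a position where mask is True, and no position appears twice.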

vectorized/broadcasted Dot product of numpy arrays with different dimensions

The Problem:
I want to calculate the dot product of a very large set of data. I am able to do this in a nested for-loop, but this is way too slow.
Here is a small example:
import numpy as np
points = np.array([[0.5, 2, 3, 5.5, 8, 11], [1, 2, -1.5, 0.5, 4, 5]])
lines = np.array([[0, 2, 4, 6, 10, 10, 0, 0], [0, 0, 0, 0, 0, 4, 4, 0]])
x1 = lines[0][0:-1]
y1 = lines[1][0:-1]
L1 = np.asarray([x1, y1])
# calculate the relative length of the projection
# of each point onto each line
a = np.diff(lines)
b = points[:,:,None] - L1[:,None,:]
print(a.shape)
print(b.shape)
[rows, cols, pages] = np.shape(b)
Z = np.zeros((cols, pages))
for k in range(cols):
    for l in range(pages):
        Z[k][l] = a[0][l]*b[0][k][l] + a[1][l]*b[1][k][l]
N = np.linalg.norm(a, axis=0)**2
relativeProjectionLength = np.squeeze(np.asarray(Z/N))
In this example, the first two dimensions of both a and b represent the x- and y-coordinates that I need for the dot product.
The shape of a is (2,7) and b has (2,6,7). Since the dot product reduces the first dimension I would expect the result to be of the shape (6,7). How can I calculate this without the slow loops?
What I have tried:
I think that numpy.dot with correct broadcasting could do the job, however I have trouble setting up the dimensions correctly.
a = a[:, None, :]
Z = np.dot(a,b)
This gives me the following error:
shapes (2,1,7) and (2,6,7) not aligned: 7 (dim 2) != 6 (dim 1)
You can use np.einsum -
np.einsum('ij,ikj->kj',a,b)
Explanation :
Keep the last axes aligned for the two inputs.
Sum-reduce the first from those.
Let the rest stay, which is the second axis of b.
Usual rules on whether to use einsum or stick to a loopy-dot based method apply here.
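To check the einsum expression against the nested loop from the question, a quick sketch with random arrays of the same shapes (the random data is an assumption, and a broadcast multiply-and-sum is included as a third equivalent form):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((2, 7))       # same shape as a in the question
b = rng.standard_normal((2, 6, 7))    # same shape as b in the question

# Reference: the nested loop from the question
Z_loop = np.zeros((6, 7))
for k in range(6):
    for l in range(7):
        Z_loop[k, l] = a[0, l] * b[0, k, l] + a[1, l] * b[1, k, l]

# einsum: align the last axes, sum over the first, keep b's middle axis
Z_einsum = np.einsum('ij,ikj->kj', a, b)

# The same reduction written as a broadcast multiply followed by a sum
Z_bcast = (a[:, None, :] * b).sum(axis=0)

print(np.allclose(Z_loop, Z_einsum), np.allclose(Z_loop, Z_bcast))
```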
numpy.dot does not reduce the first dimension. From the docs:
For N dimensions it is a sum product over the last axis of a and the second-to-last of b:
dot(a, b)[i,j,k,m] = sum(a[i,j,:] * b[k,:,m])
That is exactly what the error is telling you: it is attempting to match axis 2 in the first vector to axis 1 in the second.
You can fix this using numpy.rollaxis or better yet numpy.moveaxis. Instead of a = a[:, None, :], do
a = np.moveaxis(a, 0, -1)
b = np.moveaxis(b, 0, -2)
Z = np.dot(a, b)
Better yet, you can construct your arrays to have the correct shape up front. For example, transpose lines and do a = np.diff(lines, axis=0).
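A shape check on the moveaxis approach, using random arrays with the question's shapes (an assumption for illustration). With these shapes np.dot pairs every line with every line, so the result is (7, 6, 7) and not (6, 7). The sketch below extracts the matching entries with an einsum diagonal and compares them to the direct einsum.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((2, 7))
b = rng.standard_normal((2, 6, 7))

a2 = np.moveaxis(a, 0, -1)    # shape (7, 2)
b2 = np.moveaxis(b, 0, -2)    # shape (6, 2, 7)

# Sum product over the last axis of a2 and the second-to-last axis of b2
full = np.dot(a2, b2)         # shape (7, 6, 7): full[l, k, m]
print(full.shape)

# Keep only the entries where both line indices agree (l == m)
Z = np.einsum('lkl->kl', full)

print(np.allclose(Z, np.einsum('ij,ikj->kj', a, b)))
```

Since np.dot computes 7 * 6 * 7 products here and only 6 * 7 of them are kept, the direct einsum is the cheaper route for large inputs.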

Numpy inverse mask

I want to inverse the true/false value in my numpy masked array.
So in the example below i don't want to mask out the second value in the data array, I want to mask out the first and third value.
Below is just an example. My masked array is created by a longer process than runs before. So I can not change the mask array itself. Is there another way to inverse the values?
import numpy
data = numpy.array([[ 1, 2, 5 ]])
mask = numpy.array([[0,1,0]])
numpy.ma.masked_array(data, mask)
import numpy
data = numpy.array([[ 1, 2, 5 ]])
mask = numpy.array([[0,1,0]])
numpy.ma.masked_array(data, ~mask) #note this probably wont work right for non-boolean (T/F) values
#or
numpy.ma.masked_array(data, numpy.logical_not(mask))
for example
>>> a = numpy.array([False,True,False])
>>> ~a
array([ True, False, True], dtype=bool)
>>> numpy.logical_not(a)
array([ True, False, True], dtype=bool)
>>> a = numpy.array([0,1,0])
>>> ~a
array([-1, -2, -1])
>>> numpy.logical_not(a)
array([ True, False, True], dtype=bool)
For boolean arrays, NumPy's '~' operator gives the same result as 'logical_not'. For example
import numpy
data = numpy.array([[ 1, 2, 5 ]])
mask = numpy.array([[False,True,False]])
result = data[~mask]
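For the integer mask from the question, a minimal sketch of two ways to invert it safely before building the masked array (the variable names are illustrative):

```python
import numpy as np

data = np.array([[1, 2, 5]])
mask = np.array([[0, 1, 0]])            # integer mask, as in the question

# logical_not treats any nonzero value as True, so it works on integers
inverted = np.logical_not(mask)

# Alternatively, convert to bool first so that ~ flips True/False
inverted_alt = ~mask.astype(bool)

masked = np.ma.masked_array(data, mask=inverted)
print(masked)               # the first and third values are masked
print(masked.compressed())  # only the unmasked values remain
```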
