Remove empty 'rows' and 'columns' from 3D numpy pixel array - python

I essentially want to crop an image with numpy. I have a 3-dimensional numpy.ndarray object, i.e.:
[ [ [0,0,0,0], [255,255,255,255], ....],
[ [0,0,0,0], [255,255,255,255], ....] ]
where I want to remove whitespace, which, in context, is known to be either entire rows or entire columns of [0,0,0,0].
Letting each pixel just be a number for this example, I'm trying to essentially do this:
Given this: *EDIT: chose a slightly more complex example to clarify
[ [0,0,0,0,0,0]
[0,0,1,1,1,0]
[0,1,1,0,1,0]
[0,0,0,1,1,0]
[0,0,0,0,0,0]]
I'm trying to create this:
[ [0,1,1,1],
[1,1,0,1],
[0,0,1,1] ]
I can brute force this with loops, but intuitively I feel like numpy has a better means of doing this.

In general, you'd want to look into scipy.ndimage.label and scipy.ndimage.find_objects to extract the bounding box of contiguous regions fulfilling a condition.
However, in this case, you can do it fairly easily with "plain" numpy.
I'm going to assume you have a nrows x ncols x nbands array here. The other convention of nbands x nrows x ncols is also quite common, so have a look at the shape of your array.
With that in mind, you might do something similar to:
mask = im == 0
all_white = mask.all(axis=2)  # True where every band of the pixel is 0
rows = np.flatnonzero((~all_white).sum(axis=1))
cols = np.flatnonzero((~all_white).sum(axis=0))
crop = im[rows.min():rows.max()+1, cols.min():cols.max()+1, :]
For your 2D example, it would look like:
import numpy as np
im = np.array([[0,0,0,0,0,0],
[0,0,1,1,1,0],
[0,1,1,0,1,0],
[0,0,0,1,1,0],
[0,0,0,0,0,0]])
mask = im == 0
rows = np.flatnonzero((~mask).sum(axis=1))
cols = np.flatnonzero((~mask).sum(axis=0))
crop = im[rows.min():rows.max()+1, cols.min():cols.max()+1]
print(crop)
Let's break down the 2D example a bit.
In [1]: import numpy as np
In [2]: im = np.array([[0,0,0,0,0,0],
...: [0,0,1,1,1,0],
...: [0,1,1,0,1,0],
...: [0,0,0,1,1,0],
...: [0,0,0,0,0,0]])
Okay, now let's create a boolean array that meets our condition:
In [3]: mask = im == 0
In [4]: mask
Out[4]:
array([[ True, True, True, True, True, True],
[ True, True, False, False, False, True],
[ True, False, False, True, False, True],
[ True, True, True, False, False, True],
[ True, True, True, True, True, True]], dtype=bool)
Also, note that the ~ operator functions as logical_not on boolean arrays:
In [5]: ~mask
Out[5]:
array([[False, False, False, False, False, False],
[False, False, True, True, True, False],
[False, True, True, False, True, False],
[False, False, False, True, True, False],
[False, False, False, False, False, False]], dtype=bool)
With that in mind, to find rows where all elements are false, we can sum across columns:
In [6]: (~mask).sum(axis=1)
Out[6]: array([0, 3, 3, 2, 0])
If no elements are True, we'll get a 0.
And similarly to find columns where all elements are false, we can sum across rows:
In [7]: (~mask).sum(axis=0)
Out[7]: array([0, 1, 2, 2, 3, 0])
Now all we need to do is find the first and last of these that are not zero. np.flatnonzero is a bit easier than nonzero, in this case:
In [8]: np.flatnonzero((~mask).sum(axis=1))
Out[8]: array([1, 2, 3])
In [9]: np.flatnonzero((~mask).sum(axis=0))
Out[9]: array([1, 2, 3, 4])
Then, you can easily slice out the region based on min/max nonzero elements:
In [10]: rows = np.flatnonzero((~mask).sum(axis=1))
In [11]: cols = np.flatnonzero((~mask).sum(axis=0))
In [12]: im[rows.min():rows.max()+1, cols.min():cols.max()+1]
Out[12]:
array([[0, 1, 1, 1],
[1, 1, 0, 1],
[0, 0, 1, 1]])
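For the 3D pixel array in the question, the same steps can be sketched as follows. This is only a minimal sketch: it assumes an nrows x ncols x nbands layout and that "empty" means every band of a pixel is 0. The function name crop_empty_border and the sample image are made up for illustration.

```python
import numpy as np

def crop_empty_border(im):
    """Crop away outer rows/columns whose pixels are zero in every band.

    Hypothetical helper; assumes `im` has shape (nrows, ncols, nbands)
    and contains at least one non-empty pixel.
    """
    # True where a pixel has at least one non-zero band
    filled = (im != 0).any(axis=2)
    rows = np.flatnonzero(filled.any(axis=1))
    cols = np.flatnonzero(filled.any(axis=0))
    return im[rows.min():rows.max() + 1, cols.min():cols.max() + 1, :]

# 4 x 5 image with 4 bands; only a 2 x 3 block is filled
im = np.zeros((4, 5, 4), dtype=np.uint8)
im[1:3, 1:4, :] = 255
cropped = crop_empty_border(im)
print(cropped.shape)  # (2, 3, 4)
```

Like the code above, this raises an error on an image with no non-empty pixels, because rows.min() is then called on an empty array.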

One way of implementing this for arbitrary dimensions would be:
import numpy as np
def trim(arr, mask):
    bounding_box = tuple(
        slice(np.min(indexes), np.max(indexes) + 1)
        for indexes in np.where(mask))
    return arr[bounding_box]
A slightly more flexible solution (where you could indicate which axis to act on) is available in FlyingCircus (Disclaimer: I am the main author of the package).
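As a usage sketch, here is trim applied to the 2D example from the question, and to a made-up 3D pixel array where the mask is reduced over the band axis first (that reduction is an assumption about what "empty" means, not part of the function itself):

```python
import numpy as np

def trim(arr, mask):
    # Same function as above: one slice per mask dimension
    bounding_box = tuple(
        slice(np.min(indexes), np.max(indexes) + 1)
        for indexes in np.where(mask))
    return arr[bounding_box]

im = np.array([[0, 0, 0, 0, 0, 0],
               [0, 0, 1, 1, 1, 0],
               [0, 1, 1, 0, 1, 0],
               [0, 0, 0, 1, 1, 0],
               [0, 0, 0, 0, 0, 0]])
trimmed = trim(im, im != 0)
print(trimmed)

# Illustrative 3D case: a 2D mask only trims the first two axes,
# so all bands of the remaining pixels are kept
px = np.zeros((4, 5, 4), dtype=np.uint8)
px[1:3, 1:4, :] = 255
trimmed_px = trim(px, (px != 0).any(axis=2))
print(trimmed_px.shape)  # (2, 3, 4)
```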

You could use the np.nonzero function to find your non-zero values, then slice the non-zero elements from your original array and reshape them to what you want:
import numpy as np
n = np.array([ [0,0,0,0,0,0],
[0,0,1,1,1,0],
[0,0,1,1,1,0],
[0,0,1,1,1,0],
[0,0,0,0,0,0]])
elems = n[n.nonzero()]
In [415]: elems
Out[415]: array([1, 1, 1, 1, 1, 1, 1, 1, 1])
In [416]: elems.reshape(3,3)
Out[416]:
array([[1, 1, 1],
[1, 1, 1],
[1, 1, 1]])
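Note that the reshape only works when the non-zero values fill a complete rectangle. For the array in the question, which has zeros inside the region, one alternative is to keep every row and column that contains at least one non-zero value. A minimal sketch using np.ix_ (this is a swapped-in technique, not part of the answer above):

```python
import numpy as np

n = np.array([[0, 0, 0, 0, 0, 0],
              [0, 0, 1, 1, 1, 0],
              [0, 1, 1, 0, 1, 0],
              [0, 0, 0, 1, 1, 0],
              [0, 0, 0, 0, 0, 0]])

# Only 8 non-zero values, so they cannot be reshaped into the 3 x 4 result
print(n[n.nonzero()].size)  # 8

keep_rows = n.any(axis=1)  # rows with at least one non-zero value
keep_cols = n.any(axis=0)  # columns with at least one non-zero value
cropped = n[np.ix_(keep_rows, keep_cols)]
print(cropped)
```

Unlike the min/max slicing approach, this also drops all-zero rows and columns that lie inside the region, which may or may not be what you want.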


Compare matrix values columnwise with the corresponding mean

Having a matrix with d features and n samples, I would like to compare each feature of a sample (row) against the mean of the column corresponding to that feature and then assign a corresponding label 1 or 0.
Eg. for a matrix X = [x11, x12; x21, x22] I compute the mean of the two columns (mu1, mu2) and then I keep on comparing (x11, x21 with mu1 and so on) to check whether these are greater or smaller than mu and to then assign a label to them according to the if statement (see below).
I have the mean vector for each column i.e. of length d.
I am currently using for-loops; however, these are not computationally efficient.
X_copy = X_train.copy()  # copy, so that X_train itself is not overwritten
mu = np.mean(X_train, axis=0)
for i in range(X_train.shape[0]):
    for j in range(X_train.shape[1]):
        if X_train[i, j] < mu[j]:  # less than the column mean, assign 0
            X_copy[i, j] = 0
        else:
            X_copy[i, j] = 1  # greater than or equal to the column mean, assign 1
Is there any better alternative?
I don't have much experience with python hence thank you for understanding.
Compare the array directly with the vector of column means; the mean vector is then compared against each row of the original array. Then convert the data type of the result to int:
>>> X_train = np.random.rand(3, 4)
>>> X_train
array([[0.4789953 , 0.84095907, 0.53538172, 0.04880835],
[0.64554335, 0.50904539, 0.34069036, 0.5290601 ],
[0.84664389, 0.63984867, 0.66111495, 0.89803495]])
>>> (X_train >= X_train.mean(0)).astype(int)
array([[0, 1, 1, 0],
[0, 0, 0, 1],
[1, 0, 1, 1]])
Update:
There is a broadcasting mechanism for operations between numpy arrays. For example, when an array is compared with a number, the number is broadcast across all elements of the array and compared with them one by one:
>>> X_train > 0.5
array([[False, True, True, False],
[ True, True, False, True],
[ True, True, True, True]])
>>> X_train > np.full(X_train.shape, 0.5) # Equivalent effect.
array([[False, True, True, False],
[ True, True, False, True],
[ True, True, True, True]])
Similarly, you can compare a vector with a 2D array, as long as the length of the vector is the same as that of the last dimension of the array (here, the number of columns):
>>> mu = X_train.mean(0)
>>> X_train > mu
array([[False, True, True, False],
[False, False, False, True],
[ True, False, True, True]])
>>> X_train > np.tile(mu, (X_train.shape[0], 1)) # Equivalent effect.
array([[False, True, True, False],
[False, False, False, True],
[ True, False, True, True]])
How do you compare along other axes? That is harder to explain briefly, so here is the official numpy explanation, which I hope will get you started: Broadcasting
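As a small sketch of the other-axis case: comparing each value with the mean of its row needs the row means shaped as a column, which keepdims=True provides. The array values here are made up for illustration.

```python
import numpy as np

X = np.array([[1.0, 8.0, 3.0],
              [4.0, 2.0, 9.0]])

# Column means have shape (3,), which lines up with the last axis of X
above_col_mean = (X >= X.mean(axis=0)).astype(int)

# Row means need shape (2, 1) to line up with the rows of X;
# keepdims=True keeps the reduced axis with length 1
row_mean = X.mean(axis=1, keepdims=True)
above_row_mean = (X >= row_mean).astype(int)

print(above_col_mean)
print(above_row_mean)
```

Without keepdims=True the row means would have shape (2,), which cannot be broadcast against the (2, 3) array.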

Pytorch differences between two tensors

I have two tensors like this:
1st tensor
[[0,0],[0,1],[0,2],[1,3],[1,4],[2,1],[2,4]]
2nd tensor
[[0,1],[0,2],[1,4],[2,4]]
I want the result tensor to be like this:
[[0,0],[1,3],[2,1]] # differences between 1st tensor and 2nd tensor
I have tried to use set, list, torch.where,.. and couldn't find any good way to achieve this. Is there any way to get the different rows between two different sizes of tensors? (need to be efficient)
You can perform a pairwise comparison to see which elements of the first tensor are present in the second tensor.
a = torch.as_tensor([[0,0],[0,1],[0,2],[1,3],[1,4],[2,1],[2,4]])
b = torch.as_tensor([[0,1],[0,2],[1,4],[2,4]])
# Expand a to (7, 1, 2) to broadcast to all b
a_exp = a.unsqueeze(1)
# c: (7, 4, 2)
c = a_exp == b
# Since we want to know whether all components of the vector are equal, we reduce over the last dim
# c: (7, 4)
c = c.all(-1)
print(c)
# Out: row i compares the i-th element of a against all elements of b
# Therefore, if an entire row is False, that element of a is not present in b
tensor([[False, False, False, False],
[ True, False, False, False],
[False, True, False, False],
[False, False, False, False],
[False, False, True, False],
[False, False, False, False],
[False, False, False, True]])
non_repeat_mask = ~c.any(-1)
# Apply the mask to a
print(a[non_repeat_mask])
tensor([[0, 0],
[1, 3],
[2, 1]])
If you're feeling fancy, you can do it as a one-liner :)
a[~a.unsqueeze(1).eq(b).all(-1).any(-1)]
In case someone is looking for a solution for 1-D tensors, this is an adaptation of @Guillem's solution:
a = torch.tensor(list(range(0, 10)))
b = torch.tensor(list(range(5,15)))
a[~a.unsqueeze(1).eq(b).any(1)]
outputs:
tensor([0, 1, 2, 3, 4])
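Here is another solution for when you want the symmetric difference (values that appear in only one of the two tensors) rather than just the values of the first that are missing from the second. Be careful when using it, because the order of the arguments doesn't matter here: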
combined = torch.cat((a, b))
uniques, counts = combined.unique(return_counts=True)
difference = uniques[counts == 1]
outputs
tensor([ 0, 1, 2, 3, 4, 10, 11, 12, 13, 14])
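The same counting idea can be sketched for the 2-D tensors from the question by passing dim=0, so that each row counts as one element. This assumes that neither tensor contains duplicate rows; rows that appear only in b would also be returned, and the result comes back sorted rather than in the original order.

```python
import torch

a = torch.tensor([[0, 0], [0, 1], [0, 2], [1, 3], [1, 4], [2, 1], [2, 4]])
b = torch.tensor([[0, 1], [0, 2], [1, 4], [2, 4]])

combined = torch.cat((a, b))
# dim=0 makes unique() treat each row as a single element
uniques, counts = combined.unique(dim=0, return_counts=True)
# Rows seen exactly once appear in only one of the two tensors
difference = uniques[counts == 1]
print(difference)
```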

Efficient way to restrict a numpy boolean selector to the first few true values

I have a numpy boolean selector array which I can apply to array a. (not actually random in the problem domain, this is just convenient for the example). But I actually want to select using only the first n True entries of selector (up to n=3 in the example). So given selector plus a parameter n, how do I generate select_first_few, using numpy operations, thus avoiding an iterative loop?
>>> import numpy as np
>>> selector = np.random.random(10) > 0.5
>>> a = np.arange(10)
>>> selector
array([ True, False, True, True, True, False, True, False, True,
False])
>>> chosen, others = a[selector], a[~selector]
>>> chosen
array([0, 2, 3, 4, 6, 8])
>>> others
array([1, 5, 7, 9])
>>> select_first_few = np.array([ True, False, True, True, False, False, False, False, False,
... False])
>>> chosen_few, tough_luck = a[select_first_few], a[~select_first_few]
>>> chosen_few
array([0, 2, 3])
>>> tough_luck
array([1, 4, 5, 6, 7, 8, 9])
Approach #1
One approach would be using cumsum and argmax to get the extent and then slice thereafter to set False -
In [40]: n = 3
In [41]: selector
Out[41]:
array([ True, False, True, True, True, False, True, False, True,
False])
In [42]: selector[(selector.cumsum()>n).argmax():] = 0
In [43]: selector # your select_first_few mask
Out[43]:
array([ True, False, True, True, False, False, False, False, False,
False])
Then, use this new selector to select and de-select elements off the input array.
Approach #2
Another approach would be to mask-the-mask -
n = 3
C = np.count_nonzero(selector)
newmask = np.zeros(C, dtype=bool)
newmask[:n] = 1
selector[selector] = newmask
Sample run -
In [62]: selector
Out[62]:
array([ True, False, True, True, True, False, True, False, True,
False])
In [63]: n = 3
...: C = np.count_nonzero(selector)
...: newmask = np.zeros(C, dtype=bool)
...: newmask[:n] = 1
...: selector[selector] = newmask
In [64]: selector
Out[64]:
array([ True, False, True, True, False, False, False, False, False,
False])
Or make it shorter with on-the-fly concatenation of booleans -
n = 3
C = np.count_nonzero(selector)
selector[selector] = np.r_[np.ones(n,dtype=bool),np.zeros(C-n,dtype=bool)]
Approach #3
The simplest one -
selector &= selector.cumsum()<=n
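A small sketch of Approach #3 wrapped as a non-mutating helper (the name first_n_true is hypothetical), using the selector from the question:

```python
import numpy as np

def first_n_true(selector, n):
    """Hypothetical helper: keep only the first n True entries of a boolean mask."""
    # cumsum counts the True values seen so far; entries past the n-th are dropped
    return selector & (selector.cumsum() <= n)

selector = np.array([True, False, True, True, True,
                     False, True, False, True, False])
a = np.arange(10)

few = first_n_true(selector, 3)
print(a[few])   # [0 2 3]
print(a[~few])  # [1 4 5 6 7 8 9]

# With fewer than n True values the mask comes back unchanged
print(first_n_true(selector, 10).sum())  # 6
```

That last case is worth checking for the other approaches too: when the selector has n or fewer True values, Approach #1 as written clears the whole mask, because argmax of an all-False array is 0.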
Get all the chosen indices and slice them.
Then use a list comprehension to retrieve the data at those chosen indices.
import numpy as np
selector = np.random.random(10) > 0.5
data = np.arange(10)
# np.where returns a tuple of index arrays; take the one for the only axis
chosen_indices = np.where(selector)[0]
# select the first 3 chosen
chosen_few_indices = chosen_indices[:3]
chosen_few = [data[i] for i in chosen_few_indices]
# if you are also interested in the data that was not chosen
not_chosen_indices = list(set(range(len(data))) - set(chosen_indices.tolist()))
# proceed ...

Getting a grid of a matrix via logical indexing in Numpy

I'm trying to rewrite a function using numpy which is originally in MATLAB. There's a logical indexing part which is as follows in MATLAB:
X = reshape(1:16, 4, 4).';
idx = [true, false, false, true];
X(idx, idx)
ans =
1 4
13 16
When I try to make it in numpy, I can't get the correct indexing:
X = np.arange(1, 17).reshape(4, 4)
idx = [True, False, False, True]
X[idx, idx]
# Output: array([6, 1, 1, 6])
What's the proper way of getting a grid from the matrix via logical indexing?
You could also write:
>>> X[np.ix_(idx,idx)]
array([[ 1, 4],
[13, 16]])
In [1]: X = np.arange(1, 17).reshape(4, 4)
In [2]: idx = np.array([True, False, False, True]) # note that here idx has to
# be an array (not a list)
# or boolean values will be
# interpreted as integers
In [3]: X[idx][:,idx]
Out[3]:
array([[ 1, 4],
[13, 16]])
In numpy this is called fancy indexing. To get the items you want you should use a 2D array of indices.
You can use an outer product to turn your 1D idx into a proper 2D boolean index array. An outer operation, applied to two 1D sequences, combines each element of one sequence with each element of the other. Recalling that True*True=True and False*True=False, np.multiply.outer(), which for 1D inputs is the same as np.outer(), can give you the 2D indices:
idx_2D = np.outer(idx,idx)
#array([[ True, False, False, True],
# [False, False, False, False],
# [False, False, False, False],
# [ True, False, False, True]], dtype=bool)
Which you can use:
X[idx_2D]
array([ 1, 4, 13, 16])
In your real code you can use X[np.outer(idx,idx)] directly, but it does not save memory; it works the same as if you included a del idx_2D after doing the slice. Note that the result is flattened to 1D, so it needs a reshape to get the grid back.
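To compare the approaches side by side, here is a short sketch on the array from the question. It shows what np.ix_ builds from the boolean mask and that the outer-product mask gives the same values once reshaped.

```python
import numpy as np

X = np.arange(1, 17).reshape(4, 4)
idx = np.array([True, False, False, True])

# np.ix_ converts each boolean mask to integer indices and shapes them
# so that they broadcast against each other into a grid
rows, cols = np.ix_(idx, idx)
print(rows.shape, cols.shape)  # (2, 1) (1, 2)
grid = X[rows, cols]

# A 2D boolean mask selects the same values, but flattened
flat = X[np.outer(idx, idx)]
reshaped = flat.reshape(idx.sum(), idx.sum())

print(grid)
print(flat)
```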

Check if values in a set are in a numpy array in python

I want to check if a NumPyArray has values in it that are in a set, and if so set that area in an array = 1. If not set a keepRaster = 2.
numpyArray = #some imported array
repeatSet= ([3, 5, 6, 8])
confusedRaster = numpyArray[numpy.where(numpyArray in repeatSet)]= 1
Yields:
<type 'exceptions.TypeError'>: unhashable type: 'numpy.ndarray'
Is there a way to loop through it?
for numpyArray
if numpyArray in repeatSet
confusedRaster = 1
else
keepRaster = 2
To clarify and ask for a bit further help:
What I am trying to get at, and am currently doing, is putting a raster input into an array. I need to read values in the 2-d array and create another array based on those values. If the array value is in a set then the value will be 1. If it is not in a set then the value will be derived from another input, but I'll say 77 for now. This is what I'm currently using. My test input has about 1500 rows and 3500 columns. It always freezes at around row 350.
for rowd in range(0, width):
    for cold in range(0, height):
        if numpyarray.item(rowd, cold) in repeatSet:
            confusedArray[rowd][cold] = 1
        else:
            if numpyarray.item(rowd, cold) == 0:
                confusedArray[rowd][cold] = 0
            else:
                confusedArray[rowd][cold] = 2
In versions 1.4 and higher, numpy provides the in1d function.
>>> test = np.array([0, 1, 2, 5, 0])
>>> states = [0, 2]
>>> np.in1d(test, states)
array([ True, False, True, False, True], dtype=bool)
You can use that as a mask for assignment.
>>> test[np.in1d(test, states)] = 1
>>> test
array([1, 1, 1, 5, 1])
Here are some more sophisticated uses of numpy's indexing and assignment syntax that I think will apply to your problem. Note the use of bitwise operators to replace if-based logic:
>>> repeat_set = [3, 5, 6, 8]
>>> numpy_array = numpy.arange(9).reshape((3, 3))
>>> confused_array = numpy.arange(9).reshape((3, 3)) % 2
>>> mask = numpy.in1d(numpy_array, repeat_set).reshape(numpy_array.shape)
>>> mask
array([[False, False, False],
[ True, False, True],
[ True, False, True]], dtype=bool)
>>> ~mask
array([[ True, True, True],
[False, True, False],
[False, True, False]], dtype=bool)
>>> numpy_array == 0
array([[ True, False, False],
[False, False, False],
[False, False, False]], dtype=bool)
>>> numpy_array != 0
array([[False, True, True],
[ True, True, True],
[ True, True, True]], dtype=bool)
>>> confused_array[mask] = 1
>>> confused_array[~mask & (numpy_array == 0)] = 0
>>> confused_array[~mask & (numpy_array != 0)] = 2
>>> confused_array
array([[0, 2, 2],
[1, 2, 1],
[1, 2, 1]])
Another approach would be to use numpy.where, which creates a brand new array, using values from the second argument where mask is true, and values from the third argument where mask is false. (As with assignment, the argument can be a scalar or an array of the same shape as mask.) This might be a bit more efficient than the above, and it's certainly more terse:
>>> numpy.where(mask, 1, numpy.where(numpy_array == 0, 0, 2))
array([[0, 2, 2],
[1, 2, 1],
[1, 2, 1]])
Here is one possible way of doing what you want:
numpyArray = np.array([1, 8, 35, 343, 23, 3, 8]) # could be n-Dimensional array
repeatSet = np.array([3, 5, 6, 8])
mask = (numpyArray[...,None] == repeatSet[None,...]).any(axis=-1)
print(mask)
# Output: [False  True False False False  True  True]
In recent versions of numpy you can combine np.isin and np.where to achieve this result. np.isin returns a boolean array that is True where the values of the input are equal to one of the specified test elements (see doc), while np.where lets you create a new array that holds one value where the condition is True and another value where it is False.
Example
I'll make an example with a random array but using the specific values you provided.
import numpy as np
repeatSet = ([2, 5, 6, 8])
arr = np.array([[1,5,1],
[0,1,0],
[0,0,0],
[2,2,2]])
out = np.where(np.isin(arr, repeatSet), 1, 77)
> out
array([[77, 1, 77],
[77, 77, 77],
[77, 77, 77],
[ 1, 1, 1]])
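The three-way rule from the question's loop (1 if the value is in the set, 0 if it is 0, otherwise 2) can also be sketched with np.isin and np.select. The raster values here are made up, and np.select is a swapped-in alternative to nesting np.where, not something the answers above use.

```python
import numpy as np

repeat_set = {3, 5, 6, 8}
raster = np.arange(9).reshape(3, 3)

# np.isin expects a sequence, so a Python set is converted to a list first
in_set = np.isin(raster, list(repeat_set))

# Conditions are checked in order; the first one that is True wins
confused = np.select([in_set, raster == 0], [1, 0], default=2)
print(confused)
```

This works on the whole array at once, so it avoids the Python-level double loop over rows and columns.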
