removing entries from a numpy array - python

I have a multidimensional numpy array with the shape (4, 2000). Each column in the array is a 4D element where the first two elements represent 2D positions.
Now, I have an image mask with the same shape as an image which is binary and tells me which pixels are valid or invalid. An entry of 0 in the mask highlights pixels that are invalid.
What I would like to do is filter my first array based on this mask, i.e. remove entries whose position elements correspond to invalid pixels in the image. This can be done by looking up the corresponding entries in the mask and marking for deletion those columns that correspond to a 0 entry in the mask.
So, something like:
import numpy as np
# Let mask be a 2D array of 0 and 1s
array = np.random.rand(4, 2000)
for i in range(2000):
    current = array[:, i]
    if mask[current[0], current[1]] <= 0:
        # Somehow remove this entry from my array.
        pass
If possible, I would like to do this without looping as I have in my incomplete code.

You could select the x and y coordinates from array like this:
xarr, yarr = array[0, :], array[1, :]
Then form a boolean array of shape (2000,) which is True wherever the mask is 1:
idx = mask[xarr, yarr].astype(bool)
mask[xarr, yarr] is using so-called "integer array indexing".
All it means here is that the ith element of idx equals mask[xarr[i], yarr[i]].
Then select those columns from array:
result = array[:, idx]
import numpy as np
mask = np.random.randint(2, size=(500,500))
array = np.random.randint(500, size=(4, 2000))
xarr, yarr = array[0, :], array[1, :]
idx = mask[xarr, yarr].astype(bool)
result = array[:, idx]
cols = []
for i in range(2000):
    current = array[:, i]
    if mask[current[0], current[1]] > 0:
        cols.append(i)
expected = array[:, cols]
assert np.allclose(result, expected)
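The question builds its array with np.random.rand, so the positions may be floats and may fall outside the image. A minimal sketch of the same idea that handles both cases is below; the helper name filter_columns_by_mask and the choice to floor positions to pixel indices are assumptions, not part of the original answer.

```python
import numpy as np

def filter_columns_by_mask(array, mask):
    """Keep the columns of `array` whose (row, col) position hits a nonzero mask pixel."""
    # Assumption: flooring a float position gives the pixel it falls in.
    rows = np.floor(array[0]).astype(int)
    cols = np.floor(array[1]).astype(int)
    # Positions outside the image are treated as invalid.
    in_bounds = (
        (rows >= 0) & (rows < mask.shape[0]) &
        (cols >= 0) & (cols < mask.shape[1])
    )
    keep = np.zeros(array.shape[1], dtype=bool)
    keep[in_bounds] = mask[rows[in_bounds], cols[in_bounds]] > 0
    return array[:, keep]

mask = np.array([[1, 0],
                 [0, 1]])
array = np.array([[0.2, 0.9, 1.5, 1.1, 5.0],
                  [0.7, 1.3, 0.4, 1.8, 0.0],
                  [10., 11., 12., 13., 14.],
                  [20., 21., 22., 23., 24.]])
result = filter_columns_by_mask(array, mask)
print(result.shape)
```

Only the first and fourth columns land on valid pixels here; the last column is dropped because its position lies outside the mask.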

I'm not sure if I'm reading the question right. Let's try again!
You have an array with 2 dimensions and you want to remove all columns that have masked data. Again, apologies if I've read this wrong.
import numpy.ma as ma
a = ma.array([[1,2,3,4,5],[6,7,8,9,10]], mask=[[0,0,0,1,0],[0,0,1,0,0]])
a[:,~a.mask.any(0)] # this is where the action happens
a.mask.any(0) builds a Boolean array that is True for every column containing a masked value. It is inverted (the '~' operator; recent NumPy versions reject '-' on Boolean arrays) because we want the columns without masked values, and that array is then used as an index to drop the masked columns.
This gives me an array:
[[1 2 5],[6 7 10]]
In other words, the array has had all columns with masked data anywhere removed. Hope I got it right this time.
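As a cross-check, here is a small sketch that compares Boolean indexing against numpy.ma.compress_cols, which suppresses whole columns of a 2D array that contain masked values. The variable names keep, by_index and by_helper are only illustrative.

```python
import numpy as np
import numpy.ma as ma

a = ma.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]],
             mask=[[0, 0, 0, 1, 0], [0, 0, 1, 0, 0]])

# True for columns that contain no masked value.
keep = ~a.mask.any(axis=0)

by_index = a[:, keep]            # still a masked array, with nothing masked
by_helper = ma.compress_cols(a)  # plain ndarray with the masked columns dropped

print(by_helper)
```

Both routes should keep columns 0, 1 and 4; the helper returns a regular ndarray, while indexing keeps the masked-array type.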

Related

How to change numpy array based on mask array?

I have an array data_set of size (172800, 3) and a mask array of size (172800,) consisting of 1s and 0s. I would like to replace values in data_set, based on the values (0 or 1) in the mask array, with a value I define, e.g. [0,0,0] or [128,16,128].
I have tried the np.place function, but the problem there is the incorrect size of the mask array.
I have also tried the more pythonic way:
data_set[mask] = [0,0,0] worked fine, but for some reason only for the first 2 elements.
data_set[mask]= [0,0,0]
data_set = np.place(data_set, mask, [0,0,0])
My expected output is to change the value of element in data_set matrix to [0,0,0] if the mask value is 1.
ex.
data_set = [[134,123,90] , [234,45,65] , [32,233,45]]
mask = [ 1, 0, 1]
output = [[0,0,0] , [234, 45,65] , [0,0,0]]
When you index your data with an integer mask, numpy assumes you are giving it a list of indices, so a mask of 0s and 1s only ever touches rows 0 and 1. Use a boolean array, or convert your mask to a list of indices:
import numpy as np
data_set = np.array([[134,123,90] , [234,45,65] , [32,233,45]])
mask = np.array([1, 0, 1])
val = np.zeros(data_set.shape[1])
data_set[mask.astype(bool),:] = val
# or
data_set[np.where(mask),:] = val
The first one converts your array of ints to an array of bools, while the second one creates a list of indexes where the mask is not zero.
You can set val to whatever value you need as long as it matches the remaining dimension of the dataset (in this case, 3).
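To see why the integer mask only changed the first two elements, here is a small sketch contrasting the two kinds of indexing on the example data. The names by_int, by_bool and by_where are illustrative, and the np.where variant is an alternative that leaves the original array untouched.

```python
import numpy as np

data_set = np.array([[134, 123, 90], [234, 45, 65], [32, 233, 45]])
mask = np.array([1, 0, 1])

# Integer indexing: the mask is read as row numbers 1, 0, 1,
# so only rows 0 and 1 are overwritten.
by_int = data_set.copy()
by_int[mask] = [0, 0, 0]

# Boolean indexing: rows where the mask is nonzero (0 and 2) are overwritten.
by_bool = data_set.copy()
by_bool[mask.astype(bool)] = [0, 0, 0]

# Non-mutating alternative: broadcast the mask against the rows.
by_where = np.where(mask[:, None] == 1, [0, 0, 0], data_set)

print(by_int)
print(by_bool)
```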

Maintain 2D structure after logical indexing a numpy array

I am tracking a dynamically changing mask that rolls using some input shift. This mask stores values that determine where I can trust values in another array of the same shape. An example of how the mask changes over each iteration is below. I have a large stack of logical checks that determine how to set the rolled parts of the mask to zero based on whether the x and y values of the shift are equal to 0 or are positive or negative. Here I just hardcoded it all for clarity.
import numpy as np
mask = np.full((8,8), 10)
#Iteration 1
mask = np.roll(mask, (0, 1), axis = (0,1))
mask[:, :1] = 0
#logical indexing happens here
mask += 1
print (mask)
#Iteration 2
mask = np.roll(mask, (1, 0), axis = (0,1))
mask[:1, :] = 0
#logical indexing happens here
mask +=1
print (mask)
#Iteration 3
mask = np.roll(mask, (2, -1), axis = (0,1))
mask[:, -1:] = 0
mask[:2, :] = 0
#logical indexing happens here
mask +=1
print (mask)
After each iteration and before the mask is increased by one, I need to index into and pull the values of a second array where the mask is above some threshold (10 in this case). Since I am rolling and setting values, I always know that the part of the mask that fulfills this condition can be broadcast into a 2d array. A simplified example of what I am doing now is below where arr2 is a flattened array.
import numpy as np
arr1 = np.arange(0, 64, 1).reshape((8,8))
mask = np.full((8,8), 10)
mask[:, 0] = 0
arr2 = arr1[mask >= 10]
How can I keep arr2 as a 2d array where the mask is above the set threshold?
I do not know a priori what the shift will be that is applied to the mask so I have to rely on the values in the mask to determine the shape of the resulting array. My arrays are much larger than this example and the shifts are between -5 and 5 so I know I won't get close to setting the entire array below the threshold. The idea is that after ~10 iterations, some parts of the array become trustworthy again and can be useful information after the logical index.
The answer here is a workaround that became obvious once it had simmered in my mind for a while. Since I know that the resulting area will be rectangular, I can count how many entries along one row and one column meet my condition. Continuing my earlier example, I just add a couple of lines that count these entries along a row and a column passing through the valid region (index 4 here).
import numpy as np
#Initializing array
arr1 = np.arange(0, 64, 1).reshape((8,8))
#mask array
mask = np.full((8,8), 10)
#Setting some rows and cols to zero to simulate my roll functionality
mask[:, 0] = 0
mask[:2, :] = 0
#Summing across a row and col where condition is met
sizex = np.sum(mask[4, :] >= 10)
sizey = np.sum(mask[:, 4] >= 10)
#Using the mask to index into the original array and reshaping
arr2 = arr1[mask >= 10].reshape((sizey, sizex))
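A variant that avoids hardcoding row and column 4 is sketched below. It assumes, as stated above, that the entries passing the threshold form one full rectangle; the names valid, rows and cols are illustrative.

```python
import numpy as np

arr1 = np.arange(0, 64, 1).reshape((8, 8))
mask = np.full((8, 8), 10)
mask[:, 0] = 0
mask[:2, :] = 0

valid = mask >= 10
# Rows and columns that contain at least one valid entry.
rows = np.flatnonzero(valid.any(axis=1))
cols = np.flatnonzero(valid.any(axis=0))

# np.ix_ builds an open mesh, so the result keeps its 2D shape.
arr2 = arr1[np.ix_(rows, cols)]
print(arr2.shape)
```

If the valid region were not a complete rectangle, this would also pull in entries below the threshold, so the assumption matters.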

How to numpy-ify a two dimensional conditional lookup?

I'm trying to vectorize or otherwise make faster (likely using numpy) a lookup/matching for loop. I've looked into np.vectorize, numpy indexing and np.where, but can't find the right implementation/combination to fit my needs.
Code in Question:
Sx = np.zeros((Np+1, 2*N+1))
rows, cols = prepped_array.shape[0], prepped_array.shape[1]
for ind1 in range(rows):
    for ind2 in range(cols):
        if prepped_array[ind1][ind2][0] != -1:
            Sx[ind1, ind2] = M[prepped_array[ind1][ind2][0], prepped_array[ind1][ind2][1]]
prepped_array is a lookup table (initialized to all [-1, -1]) where values have been replaced where they should be changed in Sx.
M is transformed input that we want to map into the Sx array.
Any ideas/pointers? Thanks!
You can use a boolean mask for indexing the Sx and prepped_array and then use two index arrays, derived from the prepped_array, to index into the M array. The code might speak clearer than the previous sentence:
mask = prepped_array[:, :, 0] != -1
Sx[mask] = M[tuple(prepped_array[mask].T)]
Let's look at the involved steps:
mask = prepped_array[:, :, 0] != -1 creates a 2D boolean array indicating where the condition is met.
prepped_array[mask] creates a 2D array in which the entries of the former 3rd dimension now lie along the 2nd dimension; the first dimension has one row for each True entry in mask.
tuple(prepped_array[mask].T) creates two 1D arrays which can be used to further index into other arrays: the first array denotes row indices and the second array denotes column indices.
So Sx[mask] = M[tuple(prepped_array[mask].T)] maps the indices contained in prepped_array to the array M using the previous two 1D index arrays.
Sx[mask] finally references those elements in Sx for which the condition on prepped_array[:, :, 0] is met.
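The steps above can be checked on a toy example against the original double loop. The shapes and values below are made up for illustration; only the two vectorized lines come from the answer.

```python
import numpy as np

# Toy stand-ins for the question's M and prepped_array.
M = np.arange(12).reshape(3, 4)
prepped_array = np.full((2, 3, 2), -1)
prepped_array[0, 1] = [2, 3]
prepped_array[1, 0] = [0, 1]
prepped_array[1, 2] = [1, 2]

# Reference result using the original nested loops.
Sx_loop = np.zeros((2, 3))
for i in range(prepped_array.shape[0]):
    for j in range(prepped_array.shape[1]):
        r, c = prepped_array[i, j]
        if r != -1:
            Sx_loop[i, j] = M[r, c]

# Vectorized version from the answer.
mask = prepped_array[:, :, 0] != -1
Sx_vec = np.zeros((2, 3))
Sx_vec[mask] = M[tuple(prepped_array[mask].T)]

print(Sx_vec)
```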

Numpy 2D spatial mask to be filled with specific values from a 2D array to form a 3D structure

I'm quite new to programming in general, but I could not figure this problem out until now.
I've got a two-dimensional numpy array mask, let's say mask.shape is (3800,3500), which is filled with 0s and 1s representing the spatial resolution of a 2D image, where a 1 represents a visible pixel and 0 represents background.
I've got a second two-dimensional array data, where data.shape is (909,x) and x is exactly the number of 1s in the first array. I now want to replace each 1 in the first array with a vector of length 909 from the second array, resulting in a final 3D array of shape (3800,3500,909), which is basically a 2D x by y image where select pixels have a spectrum of 909 values in the z direction.
I tried
mask_vector = mask.flatten
ones = np.ones((909,1))
mask_909 = mask_vector.dot(ones) #results in a 13300000 by 909 2d array
count = 0
for i in mask_vector:
    if i == 1:
        mask_909[i,:] = data[:,count]
        count += 1
result = mask_909.reshape((3800,3500,909))
This results in a viable 3D array giving a 2D picture when doing plt.imshow(result.mean(axis=2))
But the values are still only 1s and 0s not the wanted spectral data in z direction.
I also tried using np.where but broadcasting fails as the two 2D arrays have clearly different shapes.
Has anybody got a solution? I am sure that there must be an easy way...
Basically, you simply need to use np.where to locate the 1s in your mask array. Then initialize your result array to zero and replace the third dimension with your data using the outputs of np.where:
import numpy as np
m, n, k = 380, 350, 91
mask = np.round(np.random.rand(m, n))
x = np.sum(mask == 1)
data = np.random.rand(k, x)
result = np.zeros((m, n, k))
row, col = np.where(mask == 1)
result[row,col] = data.transpose()
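A tiny version with known values makes the pixel ordering visible. This sketch uses a boolean mask directly instead of np.where, which should be equivalent, and it assumes the columns of data are ordered like the 1s of the mask in row-major order; the question does not state that ordering.

```python
import numpy as np

mask = np.array([[1, 0, 1],
                 [0, 1, 0]])
k = 4                     # spectrum length (909 in the question)
x = int(mask.sum())       # number of visible pixels
# Column j holds the spectrum of the j-th visible pixel (row-major order).
data = np.arange(k * x).reshape(k, x)

result = np.zeros(mask.shape + (k,))
# A 2D boolean index selects whole spectra along the last axis.
result[mask == 1] = data.T

print(result[0, 2])
```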

How can I create a masked array with columns filtered out by column sum in numpy?

I'm using python and numpy. I have an array with a number of columns, which I want to mask (not remove, in order to preserve indices) depending on whether the column sum is below a threshold. Here is what I have:
x_frequencies = np.sum(X, axis=0)
cutoff = np.percentile(x_frequencies, q=99)
mask = np.sum(X, axis=0) < cutoff
print(X.shape)
print(mask.shape)
print(mask[0].shape)
X_filtered = X[:,mask] # Error here
and the output for this is
(22987, 29308)
(1, 29308)
(1, 29308)
# Stacktrace
IndexError: invalid index shape
So I have two questions: firstly, how can I do what I'm intending to do; secondly, how can I get a 1d array out of mask (i.e. one with shape (29308,))? I've tried reshape and flatten, and neither changes the shape.
Edit: X is a scipy.sparse.csr.csr_matrix
SOLVED: Had to convert mask from a matrix into an array:
mask = np.array(np.sum(X, axis=0) < cutoff).flatten()
Thank you to Divakar for asking about the type of X; I'm new to scipy/numpy and didn't know the difference between the matrix and array types
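The behaviour can be reproduced without scipy, since summing a csr_matrix along an axis returns an np.matrix, and matrices always stay 2D. In this sketch np.asmatrix stands in for that result, and the small dense X is a made-up example.

```python
import numpy as np

# Stand-in for the (1, n) matrix that comparing a sparse column sum produces.
m = np.asmatrix(np.array([[True, False, True]]))

flat_matrix = m.flatten()           # still a matrix, shape (1, 3)
flat_array = np.asarray(m).ravel()  # plain 1D array, shape (3,)

# The 1D boolean array works as a column selector on a dense array.
X = np.arange(6).reshape(2, 3)
X_filtered = X[:, flat_array]

print(flat_matrix.shape, flat_array.shape)
```

Converting to an ndarray first is what lets flatten or ravel actually drop the extra dimension.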
