I am tracking a dynamically changing mask that rolls using some input shift. This mask stores values that determine where I can trust values in another array of the same shape. An example of how the mask changes over each iteration is below. I have a large stack of logical checks that determine how to set the rolled parts of the mask to zero based on whether the x and y values of the shift are equal to 0 or are positive or negative. Here I just hardcoded it all for clarity.
import numpy as np
mask = np.full((8, 8), 10)
# Iteration 1
mask = np.roll(mask, (0, 1), axis=(0, 1))
mask[:, :1] = 0
# logical indexing happens here
mask += 1
print(mask)
# Iteration 2
mask = np.roll(mask, (1, 0), axis=(0, 1))
mask[:1, :] = 0
# logical indexing happens here
mask += 1
print(mask)
# Iteration 3
mask = np.roll(mask, (2, -1), axis=(0, 1))
mask[:, -1:] = 0
mask[:2, :] = 0
# logical indexing happens here
mask += 1
print(mask)
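The per-sign checks described above can probably be collapsed into slices: a positive shift wraps values into the leading rows or columns, and a negative shift wraps them into the trailing ones. A minimal sketch, assuming the mask is always rolled along both axes at once as in the example; roll_and_zero is a hypothetical helper name, not a NumPy function:

```python
import numpy as np

def roll_and_zero(mask, dy, dx):
    """Roll mask by (dy, dx) and zero the rows/columns that wrapped around.

    Hypothetical helper replacing the if/else stack on the signs of the shift.
    """
    out = np.roll(mask, (dy, dx), axis=(0, 1))  # np.roll returns a new array
    # Positive shifts wrap into the leading rows/cols, negative into the trailing ones.
    if dy > 0:
        out[:dy, :] = 0
    elif dy < 0:
        out[dy:, :] = 0
    if dx > 0:
        out[:, :dx] = 0
    elif dx < 0:
        out[:, dx:] = 0
    return out

mask = np.full((8, 8), 10)
for dy, dx in [(0, 1), (1, 0), (2, -1)]:
    mask = roll_and_zero(mask, dy, dx)
    mask += 1
print(mask)
```

Replaying the three hardcoded iterations through the helper should give the same final mask, and a zero shift along an axis leaves that axis untouched.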
After each iteration, and before the mask is incremented, I need to pull the values of a second array wherever the mask is at or above some threshold (10 in this case). Because I only roll and zero whole rows and columns, I know the region that satisfies this condition is always a rectangle, so it could be reshaped into a 2D array. A simplified example of what I am doing now is below; the problem is that arr2 comes out as a flattened 1D array.
import numpy as np
arr1 = np.arange(0, 64, 1).reshape((8,8))
mask = np.full((8,8), 10)
mask[:, 0] = 0
arr2 = arr1[mask >= 10]
How can I keep arr2 as a 2D array containing only the entries where the mask meets the threshold?
I do not know a priori what the shift will be that is applied to the mask so I have to rely on the values in the mask to determine the shape of the resulting array. My arrays are much larger than this example and the shifts are between -5 and 5 so I know I won't get close to setting the entire array below the threshold. The idea is that after ~10 iterations, some parts of the array become trustworthy again and can be useful information after the logical index.
The answer here is a workaround, and it became obvious once it had simmered in my mind for a while. Since I know the resulting area will be rectangular, I can just count how many entries along one row and one column meet my condition, as long as that row and column pass through the trusted region. Continuing my example from before, I just add a couple of lines to compute those counts.
import numpy as np
#Initializing array
arr1 = np.arange(0, 64, 1).reshape((8,8))
#mask array
mask = np.full((8,8), 10)
#Setting some rows and cols to zero to simulate my roll functionality
mask[:, 0] = 0
mask[:2, :] = 0
#Summing across a row and col where condition is met
sizex = np.sum(mask[4, :] >= 10)
sizey = np.sum(mask[:, 4] >= 10)
#Using the mask to index into the original array and reshaping
arr2 = arr1[mask >= 10].reshape((sizey, sizex))
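Counting along a single row and column only works if that row and column actually cross the trusted region. A sketch of an alternative that avoids picking them: take every row and every column that contains at least one trusted value and combine them with np.ix_. This assumes, as stated above, that the trusted region is always a full rectangle, which holds when only whole rows and columns are ever zeroed.

```python
import numpy as np

arr1 = np.arange(0, 64, 1).reshape((8, 8))
mask = np.full((8, 8), 10)
mask[:, 0] = 0
mask[:2, :] = 0

valid = mask >= 10
rows = valid.any(axis=1)  # rows containing at least one trusted value
cols = valid.any(axis=0)  # columns containing at least one trusted value
# np.ix_ builds an open mesh from the two boolean masks, so the result stays 2D
arr2 = arr1[np.ix_(rows, cols)]
print(arr2.shape)  # (6, 7)
```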
Related
I need to do an element-by-element comparison of a 6x3 array with a 2x2 array and return an array the size of the bigger one, with True or False in each element depending on whether the values match. Elements of the bigger array that cannot be compared (e.g. column 3 and rows 3 to 6) should be filled with NaN.
Here's my pseudo code:
import numpy as np

first_arr_rows, first_arr_cols = 6, 3  # This is an example and will be dynamically initialized
sec_arr_rows, sec_arr_cols = 2, 2  # This is an example and will be dynamically initialized
# example inputs (assumed for illustration)
arr1 = np.arange(first_arr_rows * first_arr_cols).reshape(first_arr_rows, first_arr_cols)
arr2 = np.arange(sec_arr_rows * sec_arr_cols).reshape(sec_arr_rows, sec_arr_cols)
if (sec_arr_cols <= first_arr_cols) and (sec_arr_rows <= first_arr_rows):
    compared = arr1[:sec_arr_rows, :sec_arr_cols] == arr2[:sec_arr_rows, :sec_arr_cols]
    # the above statement creates a 2x2 boolean array
    new_cols = np.zeros((first_arr_rows, first_arr_cols - sec_arr_cols))
    new_rows = np.zeros((first_arr_rows - sec_arr_rows, compared.shape[1]))
    compared = np.append(compared, new_rows, axis=0)  # now float, 6x2
    compared = np.append(compared, new_cols, axis=1)  # 6x3
    compared[sec_arr_rows:, :] = np.nan  # rows that cannot be compared
    compared[:, sec_arr_cols:] = np.nan  # columns that cannot be compared
Is there a simpler, more efficient way in Python to achieve this?
Here is my solution, assuming the first array is always bigger than the second (see the comments for the general solution, e.g. when the second array is bigger along some dimension):
import numpy as np
a = np.arange(18).reshape(6, 3) # 6x3 array
b = np.arange(4).reshape(2, 2) # 2x2 array
# create a resulting array of `nan` values
# in general case, desired shape is
# np.max([a.shape, b.shape], axis=0)
result = np.full(a.shape, np.nan)
# our selection has the shape of the smaller array
# in general case:
# tuple(map(slice, np.min([a.shape, b.shape], axis=0)))
selection = (slice(b.shape[0]), slice(b.shape[1]))
# compare values within the selection (stored as 1.0/0.0 in the float array)
result[selection] = a[selection] == b[selection]
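The general case mentioned in the comments can be wrapped into a small function. A sketch, assuming both arrays have the same number of dimensions; compare_padded is a hypothetical name and the b used below is made-up data that is shorter but wider than a:

```python
import numpy as np

def compare_padded(a, b):
    """Element-wise a == b on the overlapping region, NaN everywhere else.

    Either array may be the larger one along any axis (same ndim assumed).
    """
    out_shape = tuple(np.max([a.shape, b.shape], axis=0))
    result = np.full(out_shape, np.nan)
    # The overlap is the element-wise minimum of the two shapes.
    selection = tuple(map(slice, np.min([a.shape, b.shape], axis=0)))
    result[selection] = a[selection] == b[selection]
    return result

a = np.arange(18).reshape(6, 3)
b = np.array([[0, 1, 9, 9, 9],
              [0, 4, 5, 9, 9]])  # fewer rows but more columns than a
print(compare_padded(a, b))
```

Here the result has shape (6, 5): the top-left 2x3 block holds 1.0/0.0 for match/no match and everything else is NaN.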
First I create my array
myarray = np.random.randint(0, 11, size=20)  # random_integers is deprecated; randint excludes the upper bound
Then, I want to set 20% of the elements in the array to 0 (or some other number). How should I do this? Apply a mask?
You can calculate the indices with np.random.choice, limiting the number of chosen indices to the percentage:
indices = np.random.choice(np.arange(myarray.size), replace=False,
size=int(myarray.size * 0.2))
myarray[indices] = 0
For others looking for the answer in case of nd-array, as proposed by user holi:
my_array = np.random.rand(8, 50)
indices = np.random.choice(my_array.shape[1]*my_array.shape[0], replace=False, size=int(my_array.shape[1]*my_array.shape[0]*0.2))
We sample from the dim1*dim2 flat positions, then convert these flat indices to per-axis indices with np.unravel_index and apply them to our array:
my_array[np.unravel_index(indices, my_array.shape)] = 0
The array is now masked.
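With NumPy's newer Generator API the same idea works for any number of dimensions without np.unravel_index: draw distinct flat positions and assign through the .flat iterator. A sketch; the seed and the 8x50 shape are only for illustration:

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seed only for reproducibility
my_array = rng.random((8, 50))

frac = 0.2
n_zero = int(my_array.size * frac)
# Draw distinct flat positions, then write through the flat iterator,
# which addresses the array in C order whatever its number of dimensions.
flat_idx = rng.choice(my_array.size, size=n_zero, replace=False)
my_array.flat[flat_idx] = 0
```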
Use np.random.permutation as a random index generator and take the first 20% of the indices.
myarray = np.random.randint(0, 11, size=20)  # random_integers is deprecated; randint excludes the upper bound
n = len(myarray)
random_idx = np.random.permutation(n)
frac = 20 # [%]
zero_idx = random_idx[:round(n*frac/100)]
myarray[zero_idx] = 0
If you want to pick the 20% at random using only the standard library's random module:
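import random

# wrapped in a function because the snippet ends with a return
def zero_random_fifth(myarray):
    random_list = []
    array_len = len(myarray)
    while len(random_list) < (array_len / 5):
        # randrange excludes the upper bound, so every index is valid
        random_int = random.randrange(array_len)
        if random_int not in random_list:
            random_list.append(random_int)
    for position in random_list:
        myarray[position] = 0
    return myarray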
This guarantees that exactly 20% of the positions (rounded up when the length is not a multiple of 5) are set to 0: if the RNG draws the same index more than once, the duplicate is skipped instead of counted.
Assume your input numpy array is A and p=0.2. The following are a couple of ways to achieve this.
Exact Masking
ones = np.ones(A.size, dtype=A.dtype)  # match A's dtype so the in-place multiply also works for integer A
idx = int(min(p*A.size, A.size))
ones[:idx] = 0
A *= np.reshape(np.random.permutation(ones), A.shape)
Approximate Masking
This is commonly done in several denoising objectives, most notably masked language modeling in Transformer pre-training. Each element is kept independently with probability 0.8, so on average 20% of the elements are set to zero, but the exact count varies from run to run:
A *= np.random.binomial(size=A.shape, n=1, p=0.8)
Another alternative, which corresponds to p = 0.5 (each element is zeroed with probability 1/2, not 20%):
A *= np.random.randint(0, 2, A.shape)
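For the approximate approach with an arbitrary p, comparing uniform draws against p gives a keep-mask directly. A sketch using the Generator API; the array size and seed are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(seed=1)
A = rng.random((200, 500)) + 1.0  # shifted to [1, 2) so no element starts at zero
p = 0.2

# Each element is kept independently with probability 1 - p; True/False act as 1/0.
A *= rng.random(A.shape) >= p
print((A == 0).mean())  # close to 0.2, but not exactly
```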
I'm using Python and NumPy. I have an array with a number of columns, which I want to mask (not remove, in order to preserve indices) depending on whether the column sum is below a threshold. Here is what I have:
x_frequencies = np.sum(X, axis=0)
cutoff = np.percentile(x_frequencies, q=99)
mask = np.sum(X, axis=0) < cutoff
print(X.shape)
print(mask.shape)
print(mask[0].shape)
X_filtered = X[:,mask] # Error here
and the output for this is
(22987, 29308)
(1, 29308)
(1, 29308)
# Stacktrace
IndexError: invalid index shape
So I have two questions: first, how can I do what I'm intending to do; second, how can I get a 1D array out of mask (i.e. one with shape (29308,))? I've tried reshape and flatten, and neither changes the shape.
Edit: X is a scipy.sparse.csr.csr_matrix
SOLVED: Had to convert mask from a matrix into an array:
mask = np.array(np.sum(X, axis=0) < cutoff).flatten()
Thank you to Divakar for asking about the type of X; I'm new to scipy/numpy and didn't know the difference between the matrix and array types
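To see why the original indexing fails: summing a scipy.sparse matrix along an axis returns a 2D np.matrix of shape (1, n), and reshaping or flattening an np.matrix always keeps two dimensions, so it has to be converted to a plain ndarray first. A sketch on a tiny made-up stand-in for X; kept_cols is an illustrative extra that maps each kept column back to its original index, which may help if preserving indices matters:

```python
import numpy as np
from scipy import sparse

# Small stand-in for the real X (made-up data, same type as in the question)
X = sparse.csr_matrix(np.array([[1, 0, 5, 0],
                                [0, 0, 7, 1],
                                [2, 0, 9, 0]]))

col_sums = X.sum(axis=0)                 # np.matrix with shape (1, 4)
col_sums = np.asarray(col_sums).ravel()  # plain ndarray with shape (4,)
cutoff = np.percentile(col_sums, q=99)
mask = col_sums < cutoff                 # 1D boolean array, shape (4,)

X_filtered = X[:, mask]                  # column selection with a 1D boolean mask
kept_cols = np.flatnonzero(mask)         # original index of each kept column
```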
I was hoping I would solve this before I finished the post, but here goes:
I have an array array1 with shape (4808L, 5135L) and I am trying to select a rectangular subset of it. Specifically, I am trying to select all the values in rows 4460:4807 and columns 2718:2967.
To start I create a mask of the same shape as array1 like:
mask = np.zeros(array1.shape[:2], dtype = "uint8")
mask[array1== 399] = 255
Then I am trying to find the index of the points where mask = 255:
true_points = np.argwhere(mask)
top_left = true_points.min(axis=0)
# take the largest points and use them as the bottom right of your crop
bottom_right = true_points.max(axis=0)
cmask = mask[top_left[0]:bottom_right[0]+1, top_left[1]:bottom_right[1]+1]
Where:
top_left = array([4460, 2718], dtype=int64)
bottom_right = array([4807, 2967], dtype=int64)
cmask looks correct. Then using top_left and bottom_right I am trying to subset array1 using:
crop_array = array1[top_left[0]:bottom_right[0]+1, top_left[1]:bottom_right[1]+1]
This results in a crop_array with the same shape as cmask, but the values are not what I expect. Since cmask[0][0] = 0, I would expect crop_array[0][0] to be zero as well.
How do I populate crop_array with the values from array1 while retaining the structure of cmask?
Thanks in advance.
If I understood your question correctly, you're looking for the .copy() method. An example matching your indices and variables:
import numpy as np
array1 = np.random.rand(4808,5135)
crop_array = array1[4460:, 2718:2968].copy()  # rows 4460-4807 and columns 2718-2967, inclusive
assert np.array_equal(array1[4460:, 2718:2968], crop_array), (
    'Equality Failed'
)
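Slicing array1 with the bounding box returns every value inside that rectangle, including pixels that are not 399, which are exactly the positions where cmask is 0. If the goal is for crop_array to be zero wherever cmask is zero, one option is to combine the crop with the cropped mask via np.where. A sketch on a small made-up array containing a non-rectangular blob of 399s:

```python
import numpy as np

# Small stand-in for array1 (made-up data): a non-rectangular blob of 399s
array1 = np.arange(36).reshape(6, 6)
array1[2, 3] = array1[3, 2] = array1[3, 3] = array1[4, 3] = 399

mask = np.zeros(array1.shape[:2], dtype="uint8")
mask[array1 == 399] = 255

true_points = np.argwhere(mask)
top_left = true_points.min(axis=0)
bottom_right = true_points.max(axis=0)
rows = slice(top_left[0], bottom_right[0] + 1)
cols = slice(top_left[1], bottom_right[1] + 1)

cmask = mask[rows, cols]
# The bounding box also contains pixels that are not 399; zero them using cmask.
crop_array = np.where(cmask == 255, array1[rows, cols], 0)
print(crop_array)
```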
I have a NumPy array with shape (4, 2000). Each column is a 4-element vector whose first two elements represent a 2D position.
Now, I have an image mask with the same shape as an image which is binary and tells me which pixels are valid or invalid. An entry of 0 in the mask highlights pixels that are invalid.
What I would like to do is filter my first array based on this mask, i.e. remove the columns whose position elements correspond to invalid pixels in the image. This can be done by looking up each position in the mask and marking for deletion every column that maps to a 0 entry.
So, something like:
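import numpy as np
# Let mask be a 2D array of 0s and 1s, e.g.:
mask = np.random.randint(2, size=(500, 500))
array = np.random.rand(4, 2000)
for i in range(2000):
    current = array[:, i]
    # positions must be integers to index into mask
    if mask[int(current[0]), int(current[1])] <= 0:
        pass  # Somehow remove this entry from my array.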
If possible, I would like to do this without looping as I have in my incomplete code.
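You could select the x and y coordinates from array like this (cast to int so they can be used as indices, in case the positions are stored as floats):
xarr, yarr = array[0, :].astype(int), array[1, :].astype(int)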
Then form a boolean array of shape (2000,) which is True wherever the mask is 1:
idx = mask[xarr, yarr].astype(bool)
mask[xarr, yarr] is using so-called "integer array indexing".
All it means here is that the ith element of idx equals mask[xarr[i], yarr[i]].
Then select those columns from array:
result = array[:, idx]
import numpy as np
# random 0/1 mask and integer positions for a quick check
mask = np.random.randint(2, size=(500,500))
array = np.random.randint(500, size=(4, 2000))
# vectorized version
xarr, yarr = array[0, :], array[1, :]
idx = mask[xarr, yarr].astype(bool)
result = array[:, idx]
# reference loop that keeps the columns whose position is valid
cols = []
for i in range(2000):
    current = array[:, i]
    if mask[current[0], current[1]] > 0:
        cols.append(i)
expected = array[:, cols]
assert np.allclose(result, expected)
I'm not sure if I'm reading the question right. Let's try again!
You have an array with 2 dimensions and you want to remove all columns that have masked data. Again, apologies if I've read this wrong.
import numpy.ma as ma
a = ma.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]], mask=[[0, 0, 0, 1, 0], [0, 0, 1, 0, 0]])
a[:, ~a.mask.any(0)]  # this is where the action happens
a.mask.any(0) returns a Boolean array marking every column that contains at least one masked value. It is inverted with the '~' operator because we want the columns without masked values (modern NumPy rejects '-' on Boolean arrays), and that array is then used as an index to drop the masked columns.
This gives me an array:
[[1 2 5],[6 7 10]]
In other words, every column containing masked data anywhere has been removed. Hope I got it right this time.