How to change numpy array based on mask array? - python

I have an array data_set of shape (172800, 3) and a mask array of shape (172800,) consisting of 1s and 0s. I would like to replace rows of data_set, based on the values (0 or 1) in the mask array, with a value I define, e.g. [0,0,0] or [128,16,128].
I have tried the np.place function, but there the problem is the incorrect size of the mask array.
I have also tried the more Pythonic way:
data_set[mask] = [0,0,0], which ran without error but for some reason changed only the first 2 elements.
data_set[mask]= [0,0,0]
data_set = np.place(data_set, mask, [0,0,0])
The expected output sets each element (row) of the data_set matrix to [0,0,0] where the mask value is 1.
ex.
data_set = [[134,123,90] , [234,45,65] , [32,233,45]]
mask = [ 1, 0, 1]
output = [[0,0,0] , [234, 45,65] , [0,0,0]]

When you index your data with an integer mask, NumPy assumes you are giving it a list of indices rather than a mask. Use a boolean array, or convert your mask to a list of indices:
import numpy as np
data_set = np.array([[134,123,90] , [234,45,65] , [32,233,45]])
mask = np.array([1, 0, 1])
val = np.zeros(data_set.shape[1])
data_set[mask.astype(bool),:] = val
# or
data_set[np.where(mask)[0],:] = val
The first one converts your array of ints to an array of bools, while the second one creates a list of indexes where the mask is not zero.
You can set val to whatever value you need as long as it matches the remaining dimension of the dataset (in this case, 3).
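To make the difference between the two kinds of indexing concrete, here is a minimal sketch using the example from the question. The names picked_rows, replaced and alternative are introduced only for illustration. It also suggests why the original attempt seemed to affect only the first two elements: a mask of 0s and 1s used as integer indices can only ever address rows 0 and 1.

```python
import numpy as np

data_set = np.array([[134, 123, 90], [234, 45, 65], [32, 233, 45]])
mask = np.array([1, 0, 1])

# Integer indexing: the 0s and 1s are read as row numbers, so this picks
# rows 1, 0, 1, and an assignment through it would only touch rows 0 and 1.
picked_rows = data_set[mask]

# Boolean indexing: True selects the row, False leaves it alone.
bool_mask = mask.astype(bool)
replaced = data_set.copy()
replaced[bool_mask] = [128, 16, 128]

# Non-mutating alternative: broadcast the mask over the columns.
alternative = np.where(bool_mask[:, None], [128, 16, 128], data_set)

print(replaced)
```

The np.where variant leaves data_set untouched and returns a new array, which may be preferable when the original data is still needed.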

Related

Compare two unequal size numpy arrays and fill the exclusion elements with nan

I need to do an element-by-element match of a 6x3 array with a 2x2 array. Return the bigger array with a True or False in the corresponding elements based on a match or no match. For the elements in the bigger array that cannot be compared e.g. column 3 and rows 3 to 6, I need to fill with NaN.
Here's my pseudo code:
first_arr_rows, first_arr_cols = 6, 3  # This is an example and will be dynamically initialized
sec_arr_rows, sec_arr_cols = 2, 2  # This is an example and will be dynamically initialized
if (sec_arr_cols <= first_arr_cols) and (sec_arr_rows <= first_arr_rows):
    compared = arr1[:sec_arr_rows, :sec_arr_cols] == arr2[:sec_arr_rows, :sec_arr_cols]
    # the above statement creates a 2x2 array
    new_cols = np.zeros((first_arr_rows, first_arr_cols - sec_arr_cols))
    new_rows = np.zeros((first_arr_rows - sec_arr_rows, compared.shape[1]))
    compared = np.append(compared, new_rows, axis=0)
    compared = np.append(compared, new_cols, axis=1)
    # rows and columns beyond the smaller array cannot be compared
    compared[sec_arr_rows:, :] = np.nan
    compared[:, sec_arr_cols:] = np.nan
Is there a simpler, more efficient way in Python to achieve this?
Here is my solution, assuming the first array is always bigger than the second (see the comments for the general solution, e.g. when the second array is bigger along some dimension):
import numpy as np
a = np.arange(18).reshape(6, 3) # 6x3 array
b = np.arange(4).reshape(2, 2) # 2x2 array
# create a resulting array of `nan` values
# in general case, desired shape is
# np.max([a.shape, b.shape], axis=0)
result = np.full(a.shape, np.nan)
# our selection have a shape of the smaller array
# in general case:
# tuple(map(slice, np.min([a.shape, b.shape], axis=0)))
selection = (slice(b.shape[0]), slice(b.shape[1]))
# compare values according the selection
result[selection] = a[selection] == b[selection]
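The general case mentioned in the comments above could be sketched as follows. The function name compare_padded is hypothetical and not part of the original answer; it assumes both inputs have the same number of dimensions.

```python
import numpy as np

def compare_padded(a, b):
    """Compare a and b on their overlapping region; fill the rest with NaN."""
    # The result is as large as the larger array along each axis.
    out_shape = tuple(int(n) for n in np.max([a.shape, b.shape], axis=0))
    result = np.full(out_shape, np.nan)
    # The comparable region is as large as the smaller array along each axis.
    overlap = tuple(slice(int(n)) for n in np.min([a.shape, b.shape], axis=0))
    # True/False are stored as 1.0/0.0 because the result array is float.
    result[overlap] = a[overlap] == b[overlap]
    return result

a = np.arange(18).reshape(6, 3)  # 6x3 array
b = np.arange(4).reshape(2, 2)   # 2x2 array
result = compare_padded(a, b)
print(result)
```

Because NaN only exists for floating-point dtypes, matches come back as 1.0 and mismatches as 0.0 rather than as booleans.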

How to have an array with no pair of elements closer by a distance

I want to remove elements from a numpy vector that are closer to each other than a distance d. (I don't want any pair in the array or list to have a distance smaller than d between them, but I don't want to remove the pair completely either.)
For example, if my array is:
array([[0. ],
[0.9486833],
[1.8973666],
[2.8460498],
[0.9486833]], dtype=float32)
All I need is to remove either the element with index 1 or the one with index 4, not both of them.
I also need the indices of the elements from the original array that remain in the resulting one.
Since the original array is a TensorFlow 2.0 tensor, I would prefer a solution that does not require converting it to NumPy as above. For speed, I would also rather not add another package and stay with NumPy or SciPy.
Thanks.
Here's a solution, using only a list. Note that this modifies the original list, so if you want to keep the original, copy.deepcopy it.
THRESHOLD = 0.1
def wrangle(l):
    for i in range(len(l)):
        for j in range(len(l)-1, i, -1):
            if abs(l[i] - l[j]) < THRESHOLD:
                l.pop(j)
using numpy:
import numpy as np
a = np.array([[0. ],
[0.9486833],
[1.8973666],
[2.8460498],
[0.9486833]])
threshold = 1.0
# The indices of the items smaller than a certain threshold, but larger than 0.
smaller_than = np.flatnonzero(np.logical_and(a < threshold, a > 0))
# Get the first index smaller than threshold
first_index = smaller_than[0]
# Recreate array without this index (bit cumbersome)
new_array = a[np.arange(len(a)) != first_index]
I'm pretty sure this is really easy to recreate in tensorflow, but I don't know how.
If your array is really only 1-d you can flatten it and do something like this:
import numpy as np
import tensorflow as tf
a = tf.constant(np.array([[0. ],
                          [0.9486833],
                          [1.8973666],
                          [2.8460498],
                          [0.9486833]], dtype=np.float32))
d = 0.1
flat_a = tf.reshape(a, [-1])  # flatten
a1 = tf.expand_dims(flat_a, 1)
a2 = tf.expand_dims(flat_a, 0)
distance_map = tf.math.abs(a1 - a2)
too_small = tf.cast(tf.math.less(distance_map, d), tf.int32)
# 1 at indices i,j if the distance between elements at i and j is less than d, 0 otherwise
upper_triangular_part = tf.linalg.band_part(too_small, 0, -1) - tf.linalg.band_part(too_small, 0,0)
remove = tf.reduce_sum(upper_triangular_part, axis=0)
remove = tf.cast(tf.math.greater(remove, 0), tf.float32)
# 1. at indices where the element should be removed, 0. otherwise
output = flat_a - remove * flat_a
You can access the indices through the remove tensor. If you need the extra dimension you can just use tf.expand_dims at the end of this.
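For comparison, the same pairwise-distance idea can be sketched in plain NumPy. This is a NumPy translation rather than TensorFlow code, and the names kept_indices and kept_values are illustrative. Unlike the version above, it drops the unwanted elements instead of zeroing them, and it returns the surviving indices directly.

```python
import numpy as np

a = np.array([[0.0],
              [0.9486833],
              [1.8973666],
              [2.8460498],
              [0.9486833]], dtype=np.float32)
d = 0.1

flat_a = a.ravel()
# Entry (i, j) holds |a[i] - a[j]|.
distance_map = np.abs(flat_a[:, None] - flat_a[None, :])
# Keep only the strict upper triangle, so each pair (i, j) with i < j
# is counted once and an element is never compared with itself.
close_to_earlier = np.triu(distance_map < d, k=1)
# Element j is removed if it is too close to any element before it.
remove = close_to_earlier.any(axis=0)

kept_indices = np.flatnonzero(~remove)
kept_values = flat_a[kept_indices]
print(kept_indices, kept_values)
```

Note that the full distance matrix needs memory quadratic in the number of elements, so this approach suits vectors of moderate size.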

Inserting a mini-array into a larger array at intervals (not changing size)

I'm trying to insert a mini array into a larger array without resizing it, i.e. overwriting values of the larger array with those of the mini array.
I have a mini array, xx.
I have a larger array, XX.
Every Y elements, the next elements should be replaced with the mini array's values,
all the way to the end.
I've tried to do it through indexing (code below).
mesh_array = np.zeros(shape=(100,100), dtype=np.uint8)
mini_square = np.ones(shape=(2,2), dtype=np.uint8)
flattened_array = np.ravel(mesh_array)
flattened_minisquare = np.ravel(mini_square)
flattened_array[1:-1:10] = flattened_minisquare
Expected result is that every 10 elements, it will replace the following ones with the flattened_minisquare values.
[0,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,0,1,1,1,1,0...]
The error message that I get:
"ValueError: could not broadcast input array from shape (4) into shape (1000)"
There may be better ways, but one way is to approach this problem as follows:
import numpy as np
mesh_array = np.zeros(shape=(100,100), dtype=np.uint8)
mini_square = np.ones(shape=(2,2), dtype=np.uint8)
flattened_array = np.ravel(mesh_array)
flattened_minisquare = np.ravel(mini_square)
Now we can construct an array holding one repetition of the pattern: zeros, followed by the minisquare values. If the expected output given in the question is correct, this array has length 13 in this case: 9 elements of the original array plus 4 from the minisquare.
stepsize = 10
temp = np.zeros(stepsize + len(flattened_minisquare) - 1)
temp[-len(flattened_minisquare):] = flattened_minisquare
We also create a mask for the values which are not filled by the minisquare.
mask = np.copy(temp)
mask[-len(flattened_minisquare):] = np.ones_like(flattened_minisquare)
mask = ~mask.astype(bool)
Now, just use np.resize to expand both the mask and the temp array, and then finally use the mask to fill the values back from the old array.
out = np.resize(temp, len(flattened_array))
final_mask = np.resize(mask, len(flattened_array))
out[final_mask] = flattened_array[final_mask]
print(out)
#[0. 0. 0. ... 0. 0. 0.]
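An alternative sketch that writes into a copy of the flattened array and keeps its uint8 dtype is to compute each element's position within the repeating period. It assumes, as in the expected output of the question, 9 untouched elements followed by the 4 minisquare values; the names gap, period and offsets are illustrative.

```python
import numpy as np

mesh_array = np.zeros(shape=(100, 100), dtype=np.uint8)
mini_square = np.ones(shape=(2, 2), dtype=np.uint8)

flat = mesh_array.ravel().copy()   # copy so mesh_array itself is untouched
mini = mini_square.ravel()

gap = 9                            # untouched elements before each insertion
period = gap + mini.size           # 13 in this example
offsets = np.arange(flat.size) % period
inside = offsets >= gap            # True where a minisquare value belongs
# Map each selected position to the matching minisquare element.
flat[inside] = mini[offsets[inside] - gap]

print(flat[:26])
```

A truncated final block is handled naturally, because each position looks up its own minisquare element rather than relying on whole blocks.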

How to find index and max value for overlapping numpy arrays

I have two NumPy arrays with the same shape: one with values and one with "zones". I need to find the max value, and its index, among the values in valuearr that overlap zone 3 in zonearr:
import numpy as np
valuearr = np.array([[10,11,12,13],
[21,22,23,24],
[31,32,33,34],
[41,42,43,44]])
zonearr = np.array([ [0,0,1,1],
[0,0,1,1],
[3,3,0,0],
[3,3,0,0]])
I'm trying:
valuearr[np.argwhere(zonearr==3)].max()
44
When it should be 42.
To get the index I try
ind = np.unravel_index(np.argmax(valuearr[np.argwhere(zonearr==3)], axis=None), valuearr.shape)
which of course doesn't work, since the max value is not 44, and it also gives an error:
builtins.ValueError: index 19 is out of bounds for array with size 16
You can use a masked array to do what you want.
With:
import numpy as np
valuearr = np.array([[10,11,12,13],
[21,22,23,24],
[31,32,33,34],
[41,42,43,44]])
zonearr = np.array([ [0,0,1,1],
[0,0,1,1],
[3,3,0,0],
[3,3,0,0]], dtype=int)
First mask out all the values where zonearr is not equal to 3:
masked = np.ma.masked_array(valuearr, mask = (zonearr!=3))
Then find the position of the maximum value with argmax:
idx_1d = np.argmax(masked)
Finally, convert it into a 2d index:
idx_2d = np.unravel_index(idx_1d, valuearr.shape)
and print:
print(idx_2d, valuearr[idx_2d])
which gives:
(3, 1) 42
Try the code below:
np.max(valuearr[np.where(zonearr==3)])
It fetches the indices of the elements of zonearr that equal 3, then takes the maximum of valuearr at those indices.
To obtain the index of the element 42 (as in your example), use the code below:
np.argwhere(valuearr==np.max(valuearr[np.where(zonearr==3)]))
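One caveat with searching the whole of valuearr for the maximum value is that it would also return positions outside zone 3 if the same value happened to occur there. A minimal sketch that stays inside the zone pairs the selected values with their coordinates; the names zone_mask, coords, max_value and max_index are illustrative. It also shows why the original attempt returned 44: np.argwhere yields an (N, 2) array of coordinates, and using that as an index selects whole rows of valuearr.

```python
import numpy as np

valuearr = np.array([[10, 11, 12, 13],
                     [21, 22, 23, 24],
                     [31, 32, 33, 34],
                     [41, 42, 43, 44]])
zonearr = np.array([[0, 0, 1, 1],
                    [0, 0, 1, 1],
                    [3, 3, 0, 0],
                    [3, 3, 0, 0]])

zone_mask = zonearr == 3
coords = np.argwhere(zone_mask)   # (N, 2) row/column pairs, row-major order
values = valuearr[zone_mask]      # the N values, in the same order

best = values.argmax()
max_value = values[best]
max_index = tuple(int(i) for i in coords[best])
print(max_index, max_value)
```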

removing entries from a numpy array

I have a multidimensional numpy array with the shape (4, 2000). Each column in the array is a 4D element where the first two elements represent 2D positions.
Now, I have an image mask with the same shape as an image which is binary and tells me which pixels are valid or invalid. An entry of 0 in the mask highlights pixels that are invalid.
What I would like to do is filter my first array based on this mask, i.e. remove entries whose position elements correspond to invalid pixels in the image. This can be done by looking up the corresponding entries in the mask and marking for deletion those columns that correspond to a 0 entry in the mask.
So, something like:
import numpy as np
# Let mask be a 2D array of 0 and 1s
array = np.random.rand(4, 2000)
for i in range(2000):
    current = array[:, i]
    if mask[current[0], current[1]] <= 0:
        # Somehow remove this entry from my array.
        pass
If possible, I would like to do this without looping as I have in my incomplete code.
You could select the x and y coordinates from array like this:
xarr, yarr = array[0, :], array[1, :]
Then form a boolean array of shape (2000,) which is True wherever the mask is 1:
idx = mask[xarr, yarr].astype(bool)
mask[xarr, yarr] is using so-called "integer array indexing".
All it means here is that the ith element of idx equals mask[xarr[i], yarr[i]].
Then select those columns from array:
result = array[:, idx]
A complete example, checked against an explicit loop:
import numpy as np
mask = np.random.randint(2, size=(500,500))
array = np.random.randint(500, size=(4, 2000))
xarr, yarr = array[0, :], array[1, :]
idx = mask[xarr, yarr].astype(bool)
result = array[:, idx]
# reference result, built column by column
cols = []
for i in range(2000):
    current = array[:, i]
    if mask[current[0], current[1]] > 0:
        cols.append(i)
expected = array[:, cols]
assert np.allclose(result, expected)
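If the positions are stored as floats, as with np.random.rand in the question, they cannot be used as indices directly. A minimal sketch follows, assuming the positions are in pixel units and that truncating them to the containing pixel is acceptable; the 500x500 mask size and the scaling by 500 are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
mask = rng.integers(0, 2, size=(500, 500))   # illustrative 0/1 mask
array = rng.random((4, 2000)) * 500          # float positions in [0, 500)

# Truncate the float positions to integer pixel coordinates (assumption).
rows = array[0].astype(int)
cols = array[1].astype(int)

keep = mask[rows, cols] > 0                  # True where the pixel is valid
result = array[:, keep]
print(result.shape)
```

If rounding to the nearest pixel is wanted instead of truncation, np.rint followed by astype(int) could be used, with care that the rounded values stay inside the mask bounds.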
I'm not sure if I'm reading the question right. Let's try again!
You have an array with 2 dimensions and you want to remove all columns that have masked data. Again, apologies if I've read this wrong.
import numpy.ma as ma
a = ma.array([[1,2,3,4,5],[6,7,8,9,10]], mask=[[0,0,0,1,0],[0,0,1,0,0]])
a[:, ~a.mask.any(0)] # this is where the action happens
a.mask.any(0) identifies, as a Boolean array, all columns that contain a masked value. It's negated (the '~' operator) because we want the inverse, and that array is then used to drop the masked columns via indexing.
This gives me an array:
[[1 2 5],[6 7 10]]
In other words, all columns containing masked data anywhere have been removed from the array. Hope I got it right this time.
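As a cross-check, numpy.ma also provides compress_cols, which suppresses every column of a 2-D masked array that contains a masked value. The sketch below compares it with indexing by the negated column mask (written with ~); the names by_indexing and by_helper are illustrative.

```python
import numpy as np
import numpy.ma as ma

a = ma.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]],
             mask=[[0, 0, 0, 1, 0], [0, 0, 1, 0, 0]])

# Drop columns by indexing with the negated "any masked value" column mask.
by_indexing = a[:, ~a.mask.any(axis=0)]

# Library helper with the same effect; it returns a plain ndarray.
by_helper = ma.compress_cols(a)

print(by_helper)
```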
