Check equality of two axes in multidimensional numpy array - python

I am given a 3-dimensional numpy array of shape (n, m, k). I'd like to view it as a 2-dimensional n x m matrix whose entries are vectors of size k. For two such arrays of shape (n, m, k), I'd like to check whether entry (x, y, :) in the first array is equal to entry (x, y, :) in the second. Is there a way to do this in numpy without using loops?
I thought about something like A == B conditioned on the first and second axes.

You can use an elementwise comparison and ndarray.all together with its axis argument:
import numpy as np
a = np.arange(27).reshape(3,3,3)
b = np.zeros_like(a)
b[0,1,2] = a[0,1,2]
b[1,2,0] = a[1,2,0]
b[2,1,:] = a[2,1,:] # set to the same 3-vector at n=2, m=1
(a == b).all(axis=2) # check whether all elements of last axis are equal
# array([[False, False, False],
#        [False, False, False],
#        [False,  True, False]])
As you can see, for n=2 and m=1 we get the same 3-vector in a and b.
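If you also need the (x, y) positions where the k-vectors match, the resulting (n, m) boolean array can be passed to np.argwhere. A small sketch reusing a and b from above (only the b[2,1,:] assignment kept, for brevity):
import numpy as np

a = np.arange(27).reshape(3, 3, 3)
b = np.zeros_like(a)
b[2, 1, :] = a[2, 1, :]

equal = (a == b).all(axis=2)   # shape (3, 3), True where the 3-vectors match
print(np.argwhere(equal))      # [[2 1]]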

Related

Optimal way to modify value of a numpy array based on condition

I have a numpy.ndarray of the form
import numpy as np
my_array = np.array([[True, True, False], [True, False, True]])
In this example it is a matrix with two rows and three columns, but my_array should be thought of as having an arbitrary 2D shape. On the other hand, I have a numpy.ndarray representing a vector W whose length equals the number of rows of my_array; this vector holds float values, for example W = np.array([10., 1.5]). Additionally I have a list WT of two-tuples with the same length as W, for example WT = [(0, 20.), (0, 1.)]. These tuples represent mathematical intervals (a, b).
I want to modify the column values of my_array based on the following condition: given a column, we change its i-th element to False (or keep it False if it already was) whenever the i-th element of W does not belong to the mathematical interval given by the i-th two-tuple of WT. For example, the first column of my_array is [True, True], so we have to check whether 10. belongs to (0, 20) and 1.5 belongs to (0, 1); the resulting column should be [True, False].
I have a for loop, but I think there is a smarter way to do this.
Obs: I don't need to change values from False to True.
I made this implementation:
import numpy as np
my_array = np.array([[True, True, False], [True, False, True]])
W = np.array([10.0, 1.5])
WT = np.array([[0, 20], [0, 1]])
# i[r] is True iff W[r] lies inside the open interval WT[r]
i = (W > WT[:, 0]) & (W < WT[:, 1])
print("my_array before", my_array)
my_array[:, 0] = i
print("my_array after", my_array)
It will update the first column according to your condition; a sketch for applying it to every column at once follows below.
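If the same per-row test should be applied to every column while keeping existing False values (as the question requires), a minimal sketch using broadcasting, reusing the i computed above:
import numpy as np

my_array = np.array([[True, True, False], [True, False, True]])
W = np.array([10.0, 1.5])
WT = np.array([[0, 20], [0, 1]])

# Per-row condition: True iff W[r] lies in the open interval WT[r]
i = (W > WT[:, 0]) & (W < WT[:, 1])

# AND every column with the row condition; True can become False,
# but never the other way around.
my_array &= i[:, None]
print(my_array)
# [[ True  True False]
#  [False False False]]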

np.meshgrid throws DeprecationWarning or MemoryError for large inputs

For a clustering problem I am trying to create the ideal similarity matrix. That is, I have a one-dimensional array of cluster labels and need to create a two-dimensional binary or boolean matrix with an entry of 1 iff the two corresponding data points belong to the same cluster.
To do so I use np.meshgrid but it only works for smaller examples. Here's an MWE:
With an array of size 5 it works as desired:
arr = np.random.randint(0, 10, size=5)
print(arr)
mesh_grid = np.meshgrid(arr, arr, sparse=True)
mesh_grid[0] == mesh_grid[1]
gives
[9 0 9 0 7]
array([[ True, False, False, False, False],
       [False,  True, False, False, False],
       [False, False,  True, False, False],
       [False, False, False,  True, False],
       [False, False, False, False,  True]])
However, with an array of size 60000 it does not work:
arr = np.random.randint(0, 10, size=60000)
mesh_grid = np.meshgrid(arr, arr, sparse=True)
mesh_grid[0] == mesh_grid[1]
gives
DeprecationWarning: elementwise comparison failed; this will raise an error in the future.
mesh_grid[0] == mesh_grid[1]
Setting sparse=False throws a MemoryError. And based on this answer I assume the DeprecationWarning must be due to memory too.
Question: How can I solve this or is there another more efficient way to obtain the desired matrix?
If, for example, your array contains only 10 different elements (0, 1, 2, ...), then you only need to compare your array against those 10 values rather than against the whole matrix.
So you can do the following operations:
import numpy as np
# Number of different elements
n = 3
# Generate the random label vector (1D)
arr = np.random.randint(0, n, size=10)
# Column vector containing each possible label (2D, shape (n, 1))
num = np.arange(n)[:, None]
# Broadcast the two vectors to obtain an n x 10 boolean matrix:
# uni[i, j] is True iff arr[j] == i
uni = arr[None, :] == num
# For each element pick the row of uni that matches its label
res = uni[arr]  # 10 x 10 similarity matrix
You can use np.unique() to extract the unique values of arr in case your labels are not consecutive integers starting at 0; a sketch follows below.
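To make the np.unique remark concrete, here is a small sketch (with made-up labels for illustration): return_inverse maps arbitrary labels to codes 0..n-1, and comparing those codes by broadcasting yields the same-cluster matrix directly:
import numpy as np

# Made-up cluster labels that are not 0..n-1
arr = np.array([7, 42, 7, 13, 42])

# return_inverse gives, for every element, the index of its label among the
# sorted unique labels, i.e. an integer code in the range 0..n-1
_, inv = np.unique(arr, return_inverse=True)

# Two points belong to the same cluster iff their codes are equal
res = inv[:, None] == inv[None, :]
print(res.astype(int))
# [[1 0 1 0 0]
#  [0 1 0 0 1]
#  [1 0 1 0 0]
#  [0 0 0 1 0]
#  [0 0 0 0 1]]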

How to properly index to an array of changing size due to masking in python

This is a problem I've run into during development, and it's a hard question to phrase, so it's best explained with a simple example:
Imagine you have 4 random number generators which generate an array of size 4:
[rng-0, rng-1, rng-2, rng-3]
   |      |      |      |
[val0,  val1,  val2,  val3]
Our goal is to loop through "generations" of arrays populated by these RNGs, and iteratively mask out the RNG which outputted the maximum value.
So an example might be starting out with:
mask = [False, False, False, False], arr = [0, 10, 1, 3], and so we would mask out rng-1.
Then the next iteration could be: mask = [False, True, False, False], arr = [2, 1, 9] (before it gets asked: yes, arr HAS to decrease in size with each rng that is masked out). In this case it is clear that rng-3 should be masked out (i.e. mask[3] = True), but since arr is now a different size than mask, it is tricky to get the right index for setting the mask (the max of arr is at index 2 of arr, but the corresponding generator is at index 3). This problem grows more and more difficult as more generators get masked out (in my case I'm dealing with a mask of size ~30).
If it helps, here is python version of the example:
rng = np.random.RandomState(42)
mask = np.zeros(10, dtype=bool)  # True if generator is being masked
for _ in range(mask.size):
    arr = rng.randint(100, size=(~mask).sum())
    unadjusted_max_value_idx = arr.argmax()
    adjusted_max_value_idx = unadjusted_max_value_idx + ????
    mask[adjusted_max_value_idx] = True
Any ideas for a good way to map the index of the max value in arr to the corresponding index in the mask? (i.e. going from unadjusted_max_value_idx to adjusted_max_value_idx)
# use a helper list
import numpy as np
rng = np.random.RandomState(42)
mask = np.zeros(10, dtype=bool)  # True if generator is being masked
ndxLst = list(range(mask.size))
maskHistory = []
for _ in range(mask.size):
    arr = rng.randint(100, size=(~mask).sum())
    unadjusted_max_value_idx = arr.argmax()
    # pop() removes the chosen entry from the helper list and returns it;
    # that entry is the index in the full-size mask
    adjusted_max_value_idx = ndxLst.pop(unadjusted_max_value_idx)
    mask[adjusted_max_value_idx] = True
    maskHistory.append(adjusted_max_value_idx)
print(maskHistory)
print(mask)
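An alternative sketch (not part of the answer above): np.flatnonzero(~mask) lists the full-size indices of the still-active generators, so indexing it with arr.argmax() gives the adjusted index without maintaining a helper list:
import numpy as np

rng = np.random.RandomState(42)
mask = np.zeros(10, dtype=bool)  # True if generator is being masked
for _ in range(mask.size):
    arr = rng.randint(100, size=(~mask).sum())
    active = np.flatnonzero(~mask)            # full-size indices still in play
    adjusted_max_value_idx = active[arr.argmax()]
    mask[adjusted_max_value_idx] = True
print(mask)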

Reduce boolean values in python ndarray using AND

I have a numpy array of shape [3, 1000, 3] with boolean values inside. The first 3 is the batch size, and the values of one batch look like this:
[[False, False, False]
 [False,  True,  True]
 [False, False,  True]
 [ True,  True,  True]
 ...
]
size (1000, 3)
I want to apply a logical AND across each triplet to end up with this new array:
[[False]
 [False]
 [False]
 [ True]
 ...
]
size (3, 1000)
Looking through numpy I didn't find anything useful. I've also tried importing operator and applying reduce(operator.and_, array), but it doesn't work.
Any idea to solve this?
You can easily do this using np.all.
This will check if all values along the last dimension are True:
y = np.all(arr, axis=-1)
y.shape # (3, 1000)
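For completeness, a small sketch (with a made-up random input of the stated shape) showing np.all next to np.logical_and.reduce, which is the vectorized counterpart of the reduce(operator.and_, ...) idea from the question:
import numpy as np

# Made-up boolean input with the shape described in the question
arr = np.random.rand(3, 1000, 3) > 0.5

y = np.all(arr, axis=-1)
print(y.shape)  # (3, 1000)

# Equivalent, closer in spirit to reduce(operator.and_, ...)
y2 = np.logical_and.reduce(arr, axis=-1)
assert np.array_equal(y, y2)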

Modify numpy array section in-place using boolean indexing

Given a 2D numpy array, e.g.:
import numpy as np
data = np.array([
    [11, 12, 13],
    [21, 22, 23],
    [31, 32, 33],
    [41, 42, 43],
])
I need modify in place a sub-array based on two masking vectors for the desired rows and columns;
rows = np.array([False, False, True, True], dtype=bool)
cols = np.array([True, True, False], dtype=bool)
Such that, for example:
print data
# [[11, 12, 13],
#  [21, 22, 23],
#  [ 0,  0, 33],
#  [ 0,  0, 43]]
Now that you know how to access the rows/cols you want, just assign the value you want to your subarray. It's a tad trickier, though:
mask = rows[:,None]*cols[None,:]
data[mask] = 0
The reason is that when we access the subarray as data[rows][:,cols] (as illustrated in your previous question), we're chaining two indexing operations that each return a copy rather than a view, so the assignment never makes it back to the original data.
Instead, here we construct a 2D boolean array by broadcasting your two 1D arrays rows and cols against each other. The mask array now has the shape (len(rows), len(cols)). We can use mask to access the original items of data directly and set them to a new value. Note that when you do data[mask], you get a 1D array, which was not the answer you wanted in your previous question.
To construct the mask, we could have used the & operator instead of * (because we're dealing with boolean arrays), or the simpler np.outer function:
mask = np.outer(rows,cols)
Edit: props to @Marcus Jones for the np.outer solution.
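Another option worth mentioning (not part of the original answer): np.ix_ also accepts boolean vectors and builds an open-mesh index you can assign to in place:
import numpy as np

data = np.array([
    [11, 12, 13],
    [21, 22, 23],
    [31, 32, 33],
    [41, 42, 43],
])
rows = np.array([False, False, True, True], dtype=bool)
cols = np.array([True, True, False], dtype=bool)

# np.ix_ turns the boolean vectors into index arrays that broadcast against
# each other, so the assignment reaches the original data directly.
data[np.ix_(rows, cols)] = 0
print(data)
# [[11 12 13]
#  [21 22 23]
#  [ 0  0 33]
#  [ 0  0 43]]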
