Index with ndarray/tensor - Python

I have a tensor A with shape (NB, N, 2, 2).
If I have a list B of length NB containing the indices I want to keep from tensor A, how should I do that?
That is to say, I want to keep 1 (out of N) element per batch, based on the indices in B.
I can get it done with a for loop that selects the i-th element of B for batch i of A. But is there a vectorized way to do it?
I tried A[B] or A[B.unsqueeze(1)], both had index errors. And A[:, B] would return NB elements for every batch.
Example:
A = Tensor([[[a 2x2 mat AAA1], [a 2x2 mat BBB1], [a 2x2 mat CCC1], [a 2x2 mat DDD1]],
[[a 2x2 mat AAA2], [a 2x2 mat BBB2], [a 2x2 mat CCC2], [a 2x2 mat DDD2]],
[[a 2x2 mat AAA3], [a 2x2 mat BBB3], [a 2x2 mat CCC3], [a 2x2 mat DDD3]]
])
B = [1, 3, 0]
Expected output:
Tensor([[[a 2x2 mat BBB1]],
[[a 2x2 mat DDD2]],
[[a 2x2 mat AAA3]]
])

torch.gather comes to the rescue.
Prepare your index tensor like this:
# A.shape = (NB, N, 2, 2)
B = torch.tensor([1, 3, 0])  # must have length NB
B = B[:, None, None, None].repeat(1,  # batch dim: one index per batch, keep as is
                                  1,  # indexing dim: kept at size 1
                                  2,  # these two must match A's trailing dims (2, 2)
                                  2)
And finally, use gather like this
torch.gather(A, 1, B) # indexing along '1'-th dim
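A minimal self-contained sketch (shapes and index values taken from the example above) that checks the gather result against the per-batch loop mentioned in the question:
import torch

NB, N = 3, 4
A = torch.arange(NB * N * 2 * 2, dtype=torch.float32).reshape(NB, N, 2, 2)
B = torch.tensor([1, 3, 0])  # one index per batch, length NB

# expand B to (NB, 1, 2, 2) so it can drive gather along dim 1
idx = B[:, None, None, None].repeat(1, 1, 2, 2)
out = torch.gather(A, 1, idx)  # shape (NB, 1, 2, 2)

# loop baseline: pick A[i, B[i]] for every batch i
baseline = torch.stack([A[i, B[i]] for i in range(NB)]).unsqueeze(1)
print(torch.equal(out, baseline))  # True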

Related

Indexing and retrieving numpy rows from a multidimensional array

I have a source multidimensional array of shape (a, b, c, c, d) which stores vectors of size d, and another array of shape (a, b, e, 2) that stores e pairs of indices. Each pair of values indexes into axes 2 and 3 of the data array (both of size c). Note that both arrays share the same a, b dimension sizes.
What I want to do is use these indices to retrieve rows of size d from the first array. The output array should then have shape (a, b, e, d), i.e. e vectors of size d along the a, b dimensions.
a, b, c, d = 3,5,7,9
e = 11
data = np.random.rand(a,b,c,c,d)
inds = np.random.randint(0,c, size=(a,b,e,2))
res = data[:, :, inds[:,:,:,0], inds[:,:,:,1],:]
print(' - Obtained shape:', res.shape)
print(' - Desired shape:', (a,b,e,d))
# - Obtained shape: (3, 5, 3, 5, 11, 9)
# - Desired shape: (3, 5, 11, 9)
The only way I can think of right now is enforcing full fancy indexing by generating range-like indices for the leading dimensions:
import numpy as np
rng = np.random.default_rng()
a, b, c, d = 3, 5, 7, 9
e = 11
data = rng.uniform(size=(a, b, c, c, d))
inds = rng.integers(0, c, size=(a, b, e, 2))
# open (ogrid) index meshes broadcast against the fancy indices,
# so full index arrays never have to be materialized
aind, bind, _ = np.ogrid[:a, :b, :e]
res = data[aind, bind, inds[..., 0], inds[..., 1], :]
print(' - Obtained shape:', res.shape)
print(' - Desired shape:', (a, b, e, d))
Random check to see that the values are correct too:
sample_index_pos = (1, 1, 8) # <-> (a, b, e)
c_inds = inds[sample_index_pos] # <-> (c, c)
expected = data[sample_index_pos[:2] + tuple(c_inds)]
have = res[sample_index_pos]
print(np.array_equal(expected, have))
# True

NumPy: Concatenating 1D array to 3D array

Suppose I have a 5x10x3 array, which I interpret as 5 'sub-arrays', each consisting of 10 rows and 3 columns. I also have a separate 1D array of length 5, which I call b.
I am trying to insert a new column into each sub-array, where the column inserted into the ith (i=0,1,2,3,4) sub-array is a 10x1 vector where each element is equal to b[i].
For example:
import numpy as np
np.random.seed(777)
A = np.random.rand(5,10,3)
b = np.array([2,4,6,8,10])
A[0] should gain a fourth column in which every entry equals b[0] = 2, A[1] a fourth column in which every entry equals b[1] = 4, and similarly for the other 'sub-arrays'.
What about this?
# Make an array B with the same number of dimensions as A
B = np.tile(b, (1, 10, 1)).transpose(2, 1, 0) # shape: (5, 10, 1)
# Concatenate both
np.concatenate([A, B], axis=-1) # shape: (5, 10, 4)
One method would be np.pad:
np.pad(A, ((0,0),(0,0),(0,1)), 'constant', constant_values=[[[],[]],[[],[]],[[],b[:, None,None]]])
# array([[[9.36513084e-01, 5.33199169e-01, 1.66763960e-02, 2.00000000e+00],
# [9.79060284e-02, 2.17614285e-02, 4.72452812e-01, 2.00000000e+00],
# etc.
Or (more typing but probably faster):
i,j,k = A.shape
res = np.empty((i,j,k+1), np.result_type(A, b))
res[...,:-1] = A
res[...,-1] = b[:, None]
Or dstack after broadcast_to:
np.dstack([A, np.broadcast_to(b[:, None], A.shape[:2])])
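For completeness, a small self-contained check (using the seed and arrays from the question) that the tile/concatenate and preallocate-and-fill variants agree, and that each sub-array really gets a column of b[i]:
import numpy as np

np.random.seed(777)
A = np.random.rand(5, 10, 3)
b = np.array([2, 4, 6, 8, 10])

# tile/concatenate variant
B = np.tile(b, (1, 10, 1)).transpose(2, 1, 0)   # shape (5, 10, 1)
res1 = np.concatenate([A, B], axis=-1)          # shape (5, 10, 4)

# preallocate-and-fill variant
i, j, k = A.shape
res2 = np.empty((i, j, k + 1), np.result_type(A, b))
res2[..., :-1] = A
res2[..., -1] = b[:, None]

print(res1.shape)                          # (5, 10, 4)
print(np.array_equal(res1, res2))          # True
print(np.all(res1[..., -1] == b[:, None])) # True: sub-array i's new column is b[i]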

How to calculate hamming distance between 1d and 2d array without loop

A is a 1d array with shape 100, B is a 2d array with shape (50000, 100). I want to calculate hamming distance between A and B, and get an array X with shape 50000.
I can do it with a loop:
X = np.empty(50000, dtype=int)
for i in range(50000):
    X[i] = np.count_nonzero(A != B[i, :])
I'd like to know whether I can skip the loop or do something else to make it faster.
You can directly compare A and B with A != B, which will broadcast due to the different number of dimensions A and B have, and then you can use np.count_nonzero per row with axis=1:
np.count_nonzero(A != B, axis=1)
A = np.array([1,2])
B = np.array([[1,2],[3,2],[1,3],[2,4]])
np.count_nonzero(A != B, axis=1)
# array([0, 1, 1, 2])
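A sketch at the sizes from the question (random binary data, just to illustrate the shapes and confirm the result matches the loop):
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(0, 2, size=100)
B = rng.integers(0, 2, size=(50000, 100))

X = np.count_nonzero(A != B, axis=1)   # shape (50000,)

# loop baseline for comparison
X_loop = np.array([np.count_nonzero(A != B[i, :]) for i in range(50000)])
print(X.shape, np.array_equal(X, X_loop))  # (50000,) True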

Numpy array and column extracted from a matrix, different shape

I'm trying to do an integration with numpy:
A = n.trapz(B,C)
but I have some issues with B and C shapes
B is an array initialized with the numpy zeros function:
B = np.zeros((N,1))
C is a column extracted from a matrix D, also created with numpy:
D = np.zeros((N,2))
C = D[:,0]
the problem is that:
n.shape(B) # (N,1)
n.shape(C) # (N,)
how can I manage this?
Try
B = np.zeros(N)
np.trapz(B, C)
Also, np.trapz accepts multi-dimensional arrays, so arrays of shape (N, 1) are fine; you just need to specify the axis along which to integrate (axis 0 here, since the samples run down the column):
B = np.zeros((N, 1))
C = D[:, 0]
np.trapz(B, C.reshape(N, 1), axis=0)
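As a quick sketch (with an arbitrary integrand in place of the all-zeros B, just to show the two calls agree):
import numpy as np

N = 100
D = np.zeros((N, 2))
D[:, 0] = np.linspace(0, 1, N)   # x values stored in the first column
C = D[:, 0]
B = C ** 2                       # integrand: x^2 on [0, 1]

flat = np.trapz(B, C)                                      # 1-D arrays
col = np.trapz(B.reshape(N, 1), C.reshape(N, 1), axis=0)   # (N, 1) columns, integrate along axis 0
print(flat, col)   # both ~ 1/3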

numpy array each element multiplication with matrix

I have a matrix
A = [[ 1. 1.]
[ 1. 1.]]
and two arrays (a and b), each containing 20 float numbers. How can I apply the transformation

[x']     [x]
[y'] = A [y]

to every pair (a[i], b[i])?
Is this correct? m = A * [a, b]
Matrix multiplication with NumPy arrays can be done with np.dot.
If X has shape (i,j) and Y has shape (j,k) then np.dot(X,Y) will be the matrix product and have shape (i,k). The last axis of X and the second-to-last axis of Y is multiplied and summed over.
Now, if a and b have shape (20,), then np.vstack([a,b]) has shape (2, 20):
In [66]: np.vstack([a,b]).shape
Out[66]: (2, 20)
You can think of np.vstack([a, b]) as a 2x20 matrix with the values of a on the first row, and the values of b on the second row.
Since A has shape (2,2), we can perform the matrix multiplication
m = np.dot(A, np.vstack([a,b]))
to arrive at an array of shape (2, 20).
The first row of m contains the x' values, the second row contains the y' values.
NumPy also has a matrix subclass of ndarray (a special kind of NumPy array) which has convenient syntax for doing matrix multiplication with 2D arrays. If we define A to be a matrix (rather than a plain ndarray which is what np.array(...) creates), then matrix multiplication can be done with the * operator.
I show both ways (with A being a plain ndarray and A2 being a matrix) below:
import numpy as np
A = np.array([[1.,1.],[1.,1.]])
A2 = np.matrix([[1.,1.],[1.,1.]])
a = np.random.random(20)
b = np.random.random(20)
c = np.vstack([a,b])
m = np.dot(A, c)
m2 = A2 * c
assert np.allclose(m, m2)
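For what it's worth, on Python 3.5+ the @ operator gives the same matrix product on plain ndarrays, so the matrix subclass isn't needed for the convenient syntax. Continuing the snippet above:
m3 = A @ c          # same as np.dot(A, c) for 2-D arrays
assert np.allclose(m, m3)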
