How to calculate hamming distance between 1d and 2d array without loop - python

A is a 1d array with shape 100, B is a 2d array with shape (50000, 100). I want to calculate hamming distance between A and B, and get an array X with shape 50000.
I can do it with a loop:
for i in range(50000):
    X[i] = np.count_nonzero(A != B[i, :])
Can I skip the loop, or otherwise make this faster?

You can directly compare A and B with A != B, which will broadcast due to the different number of dimensions A and B have, and then you can use np.count_nonzero per row with axis=1:
np.count_nonzero(A != B, axis=1)
A = np.array([1,2])
B = np.array([[1,2],[3,2],[1,3],[2,4]])
np.count_nonzero(A != B, axis=1)
# array([0, 1, 1, 2])
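As a quick sanity check, the broadcasted comparison can be verified against the explicit loop on random data. This is a minimal sketch; the array sizes and the random binary data are assumptions chosen for illustration, not taken from the question.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(0, 2, size=100)           # 1d array, shape (100,)
B = rng.integers(0, 2, size=(500, 100))    # 2d array, shape (500, 100)

# Loop version: one Hamming distance per row of B
X_loop = np.empty(B.shape[0], dtype=np.intp)
for i in range(B.shape[0]):
    X_loop[i] = np.count_nonzero(A != B[i, :])

# Vectorized version: A is broadcast against every row of B
X_vec = np.count_nonzero(A != B, axis=1)

print(X_vec.shape)                     # (500,)
print(np.array_equal(X_loop, X_vec))   # True
```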

Related

Index with ndarray/ tensor

I have a tensor A with shape (NB, N, 2, 2).
If I have a list B, consisting of indices with length NB that I want to keep in tensor A, how should I do that?
That is to say, I want to keep 1 (out of N) element per batch, based on the indices in B.
I can do it with a for loop over the batch index i, picking the B[i]-th element of A[i]. But is there a vectorized way to do it?
I tried A[B] and A[B.unsqueeze(1)]; both raised index errors. And A[:, B] returns NB elements for every batch.
Example:
A = Tensor([[[a 2x2 mat AAA1], [a 2x2 mat BBB1], [a 2x2 mat CCC1], [a 2x2 mat DDD1]],
[[a 2x2 mat AAA2], [a 2x2 mat BBB2], [a 2x2 mat CCC2], [a 2x2 mat DDD2]],
[[a 2x2 mat AAA3], [a 2x2 mat BBB3], [a 2x2 mat CCC3], [a 2x2 mat DDD3]]
])
B = [1, 3, 0]
Expected output:
Tensor([[[a 2x2 mat BBB1]],
[[a 2x2 mat DDD2]],
[[a 2x2 mat AAA3]]
])
torch.gather comes to the rescue.
Prepare your index tensor like this:
# A.shape = (NB, N, 2, 2)
B = torch.tensor([1, 3, 0])  # should be of length NB
B = B[:, None, None, None].repeat(1,  # your actual indices in batch dim
                                  1,  # indexing dim to be kept 1
                                  2,  # these two must be repeated
                                  2)
And finally, use gather like this:
torch.gather(A, 1, B)  # indexing along dim 1
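For comparison, the same per-batch selection can be sketched in NumPy, where np.take_along_axis plays a role similar to torch.gather. This is an analogue rather than the answer's own method, and the array contents below are made up for illustration. Here the index array is broadcast over the trailing 2x2 dimensions, so no repeat step is needed.

```python
import numpy as np

NB, N = 3, 4
A = np.arange(NB * N * 2 * 2).reshape(NB, N, 2, 2)
B = np.array([1, 3, 0])                      # one index per batch

# Index array of shape (NB, 1, 1, 1); it broadcasts over the trailing 2x2 dims
picked = np.take_along_axis(A, B[:, None, None, None], axis=1)
print(picked.shape)                          # (3, 1, 2, 2)

# Integer-array indexing gives the same elements without the kept axis
picked_flat = A[np.arange(NB), B]
print(picked_flat.shape)                     # (3, 2, 2)
```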

Swap 2 numpy arrays based on condition from different arrays

I have 4 arrays: A, B, C, D. A and B have shape (n,n), and C and D have shape (n,n,m). Wherever an element of A is greater than the corresponding element of B, I want the matching length-m vector to be taken from C. In essence:
C_new = np.where(A > B, C, D)
D_new = np.where(A < B, D, C)
However, this raises a ValueError (operands could not be broadcast together with shapes).
Can I use np.where here instead of looping through each element?
Edit: example:
A = np.ones((2,2))
B = 2*np.eye(2)
C = np.ones((2,2,3))
D = np.zeros((2,2,3))
# Cnew = np.where(A > B, C,D)-> ValueError: operands could not be broadcast together with shapes (2,2) (2,2,3) (2,2,3)
Cnew should be zeros at indices (0,0) and (1,1).
You need to add a new axis at the end of the condition in order for it to broadcast correctly:
C_new = np.where((A > B)[..., np.newaxis], C, D)
D_new = np.where((A < B)[..., np.newaxis], D, C)
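Applied to the arrays from the question, the extra axis turns the (2, 2) condition into shape (2, 2, 1), which broadcasts against the (2, 2, 3) operands. A minimal sketch, with the name cond introduced here only for illustration:

```python
import numpy as np

A = np.ones((2, 2))
B = 2 * np.eye(2)
C = np.ones((2, 2, 3))
D = np.zeros((2, 2, 3))

cond = (A > B)[..., np.newaxis]    # shape (2, 2, 1), broadcasts against (2, 2, 3)
C_new = np.where(cond, C, D)

print(cond.shape)      # (2, 2, 1)
print(C_new[0, 0])     # [0. 0. 0.]
print(C_new[0, 1])     # [1. 1. 1.]
```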

NumPy: Concatenating 1D array to 3D array

Suppose I have a 5x10x3 array, which I interpret as 5 'sub-arrays', each consisting of 10 rows and 3 columns. I also have a separate 1D array of length 5, which I call b.
I am trying to insert a new column into each sub-array, where the column inserted into the ith (i=0,1,2,3,4) sub-array is a 10x1 vector where each element is equal to b[i].
For example:
import numpy as np
np.random.seed(777)
A = np.random.rand(5,10,3)
b = np.array([2,4,6,8,10])
A[0] should look like:
A[1] should look like:
And similarly for the other 'sub-arrays'.
(Notice b[0]=2 and b[1]=4)
What about this?
# Make an array B with the same number of dimensions as A
B = np.tile(b, (1, 10, 1)).transpose(2, 1, 0) # shape: (5, 10, 1)
# Concatenate both
np.concatenate([A, B], axis=-1) # shape: (5, 10, 4)
One method would be np.pad:
np.pad(A, ((0,0),(0,0),(0,1)), 'constant', constant_values=[[[],[]],[[],[]],[[],b[:, None,None]]])
# array([[[9.36513084e-01, 5.33199169e-01, 1.66763960e-02, 2.00000000e+00],
# [9.79060284e-02, 2.17614285e-02, 4.72452812e-01, 2.00000000e+00],
# etc.
Or (more typing but probably faster):
i,j,k = A.shape
res = np.empty((i,j,k+1), np.result_type(A, b))
res[...,:-1] = A
res[...,-1] = b[:, None]
Or dstack after broadcast_to:
np.dstack([A, np.broadcast_to(b[:, None], A.shape[:2])])
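A small check that these approaches agree, using the arrays from the question. This sketch covers the tile/transpose, preallocation, and dstack variants; the result names r1, r2, and r3 are introduced here only for the comparison.

```python
import numpy as np

np.random.seed(777)
A = np.random.rand(5, 10, 3)
b = np.array([2, 4, 6, 8, 10])

# Variant 1: tile and transpose, then concatenate
B = np.tile(b, (1, 10, 1)).transpose(2, 1, 0)          # shape (5, 10, 1)
r1 = np.concatenate([A, B], axis=-1)

# Variant 2: preallocate and fill
i, j, k = A.shape
r2 = np.empty((i, j, k + 1), np.result_type(A, b))
r2[..., :-1] = A
r2[..., -1] = b[:, None]

# Variant 3: dstack after broadcast_to
r3 = np.dstack([A, np.broadcast_to(b[:, None], A.shape[:2])])

print(r1.shape)                                        # (5, 10, 4)
print(np.array_equal(r1, r2), np.array_equal(r1, r3))  # True True
```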

Numpy array and column extracted from a matrix, different shape

I'm trying to do an integration with numpy:
A = np.trapz(B, C)
but I have some issues with the shapes of B and C.
B is an array initialized with NumPy's zeros function:
B = np.zeros((N, 1))
C is a column extracted from a matrix D, which is also initialized with NumPy:
D = np.zeros((N, 2))
C = D[:, 0]
The problem is that:
np.shape(B) # (N,1)
np.shape(C) # (N,)
How can I handle this?
Try
B = np.zeros(N)
np.trapz(B, C)
Also, np.trapz accepts multi-dimensional arrays, so arrays of shape (N, 1) are OK; you just need to specify an axis to handle them properly.
B = np.zeros((N, 1))
C = D[:, 0]
np.trapz(B, C.reshape(N, 1), axis=0)  # integrate along the length-N axis
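A minimal sketch of both variants with non-trivial data, so the result can be checked by hand. The values of N, B, and D below are assumptions for illustration: B is constant 1 and the first column of D runs from 0 to 1, so the integral should come out as 1. Newer NumPy releases expose the same function as np.trapezoid, so the sketch picks whichever name is available. The (N, 1) variant here integrates along axis 0, the axis of length N.

```python
import numpy as np

# Use np.trapezoid where it exists, otherwise fall back to np.trapz
trapezoid = getattr(np, "trapezoid", None) or np.trapz

N = 5
D = np.zeros((N, 2))
D[:, 0] = np.linspace(0.0, 1.0, N)   # x values in the first column
C = D[:, 0]                          # shape (N,)
B = np.ones((N, 1))                  # shape (N, 1)

# Option 1: flatten B so both arguments are 1d
area_1d = trapezoid(B.ravel(), C)

# Option 2: keep B as (N, 1), reshape C to match, integrate along axis 0
area_2d = trapezoid(B, C.reshape(N, 1), axis=0)

print(area_1d)         # 1.0 (up to rounding)
print(area_2d.shape)   # (1,)
```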

numpy array each element multiplication with matrix

I have a matrix
A = [[ 1. 1.]
[ 1. 1.]]
and two arrays (a and b), each containing 20 floats. How can I multiply them using the formula:
$$\begin{pmatrix} x' \\ y' \end{pmatrix} = A \begin{pmatrix} x \\ y \end{pmatrix}$$
Is this correct? m = A * [a, b]
Matrix multiplication with NumPy arrays can be done with np.dot.
If X has shape (i,j) and Y has shape (j,k) then np.dot(X,Y) will be the matrix product and have shape (i,k). The last axis of X and the second-to-last axis of Y are multiplied and summed over.
Now, if a and b have shape (20,), then np.vstack([a,b]) has shape (2, 20):
In [66]: np.vstack([a,b]).shape
Out[66]: (2, 20)
You can think of np.vstack([a, b]) as a 2x20 matrix with the values of a on the first row, and the values of b on the second row.
Since A has shape (2,2), we can perform the matrix multiplication
m = np.dot(A, np.vstack([a,b]))
to arrive at an array of shape (2, 20).
The first row of m contains the x' values, the second row contains the y' values.
NumPy also has a matrix subclass of ndarray (a special kind of NumPy array) which has convenient syntax for doing matrix multiplication with 2D arrays. If we define A to be a matrix (rather than a plain ndarray which is what np.array(...) creates), then matrix multiplication can be done with the * operator.
I show both ways (with A being a plain ndarray and A2 being a matrix) below:
import numpy as np
A = np.array([[1.,1.],[1.,1.]])
A2 = np.matrix([[1.,1.],[1.,1.]])
a = np.random.random(20)
b = np.random.random(20)
c = np.vstack([a,b])
m = np.dot(A, c)
m2 = A2 * c
assert np.allclose(m, m2)
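On Python 3.5+ with a reasonably recent NumPy, the @ operator performs the same matrix product on plain ndarrays, so np.matrix is not required for the shorter syntax. A short sketch with made-up input values (a and b below are illustrative stand-ins, not data from the question):

```python
import numpy as np

A = np.array([[1., 1.], [1., 1.]])
a = np.arange(20, dtype=float)         # stand-in for the 20 x values
b = np.arange(20, dtype=float) * 2.0   # stand-in for the 20 y values

xy = np.vstack([a, b])                 # shape (2, 20)
m = A @ xy                             # same result as np.dot(A, xy)

x_new, y_new = m                       # first row is x', second row is y'
print(m.shape)                         # (2, 20)
print(np.allclose(m, np.dot(A, xy)))   # True
print(np.allclose(x_new, a + b))       # True, since A is all ones
```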
