I've seen many posts on how to get the closest value in a numpy array, how to get the closest coordinate in a 2D array, etc. But none of them seem to solve what I am looking for.
The problem is, I have a 2D numpy array as such:
[[77.62881735 12.91172607]
[77.6464534 12.9230648]
[77.65330961 12.92020244]
[77.63142413 12.90909731]]
And I have one numpy array like this:
[77.64000112 12.91602265]
Now I want to find the coordinate in the 2D numpy array that is closest to the coordinates in the 1D array.
That said, I am a beginner at this, so any input is appreciated.
I assume you mean Euclidean distance. Try this:
import numpy as np
a = np.array([[77.62881735, 12.91172607],
[77.6464534, 12.9230648],
[77.65330961,12.92020244],
[77.63142413 ,12.90909731]])
b = np.array([77.64000112, 12.91602265])
idx_min = np.sum( (a-b)**2, axis=1, keepdims=True).argmin(axis=0)
idx_min, a[idx_min]
Output:
(array([1], dtype=int64), array([[77.6464534, 12.9230648]]))
You need to implement your own "distance" function.
My example implements Euclidean distance for simplicity:
import numpy as np
import math
def compute_distance(coord1, coord2):
    # Euclidean distance between two 2D points
    return math.sqrt(pow(coord1[0] - coord2[0], 2) + pow(coord1[1] - coord2[1], 2))
gallery = np.asarray([[77.62881735, 12.91172607],
[77.6464534, 12.9230648],
[77.65330961, 12.92020244],
[77.63142413, 12.90909731]])
query = np.asarray([77.64000112, 12.91602265])
distances = [compute_distance(i, query) for i in gallery]
min_coord = gallery[np.argmin(distances)]
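The two answers above can be condensed into one vectorized call. The following is a minimal sketch using np.linalg.norm with axis=1; it reuses the gallery and query names from the answer above, while distances, idx and closest are names chosen here for illustration. It treats the two columns as plain Cartesian coordinates.

```python
import numpy as np

gallery = np.array([[77.62881735, 12.91172607],
                    [77.6464534, 12.9230648],
                    [77.65330961, 12.92020244],
                    [77.63142413, 12.90909731]])
query = np.array([77.64000112, 12.91602265])

# Euclidean distance from every row of gallery to query (broadcast over rows)
distances = np.linalg.norm(gallery - query, axis=1)

# Position of the closest row, and the row itself
idx = int(np.argmin(distances))
closest = gallery[idx]
print(idx, closest)
```

Because the square root is monotonic, taking argmin of the squared distances (as in the first answer) selects the same row.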
Related
I have two numpy arrays, with just the 3-dimensional coordinates of two molecules.
I need to implement the following equation, and I'm having problems in the subtraction of each coordinate of one of the arrays by the second, and then square it.
I have tried the following, but since I'm still learning I feel that I am making some major mistake. The simple code I use is:
a = [math.sqrt(1/3*((i[:,0]-j[:,0])**2) + ((i[:,1] - j[:,1])**2) + ((i[:,2]-j[:,2])**2) for i, j in zip(coordenates_2, coordenates_1))]
Since these are numpy arrays, you can do this easily, as in the following example:
import numpy as np
x1 = np.random.randn(3,3,3)
x2 = np.random.randn(3,3,3)
res = np.sqrt(np.mean(np.power(x1-x2,2)))
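Note that np.mean in the example above averages over every element of the arrays. Since the equation from the question is not shown here, the sketch below assumes it is a root-mean-square deviation over atoms, with each molecule stored as an (N, 3) array. The names coords_1, coords_2, sq_dist and rmsd are illustrative, and the sample values are made up.

```python
import numpy as np

# Two hypothetical molecules with N = 2 atoms each, shape (N, 3)
coords_1 = np.array([[0.0, 0.0, 0.0],
                     [0.0, 0.0, 0.0]])
coords_2 = np.array([[1.0, 0.0, 0.0],
                     [0.0, 2.0, 0.0]])

# Squared distance between corresponding atoms: sum over x, y, z
sq_dist = np.sum((coords_1 - coords_2) ** 2, axis=1)

# Root of the mean over atoms (assumed form of the missing equation)
rmsd = np.sqrt(sq_dist.mean())
print(rmsd)
```

If the intended formula divides by a different factor, only the last step changes; the per-atom subtraction and squaring stay the same.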
In python, is there a vectorized efficient way to calculate the cosine distance of a sparse array u to a sparse matrix v, resulting in an array of elements [1, 2, ..., n] corresponding to cosine(u,v[0]), cosine(u,v[1]), ..., cosine(u, v[n])?
Not natively. You can however use the library scipy that can compute the cosine distance between two vectors for you: http://docs.scipy.org/doc/scipy-0.17.0/reference/generated/scipy.spatial.distance.cosine.html. You can build a version that takes a matrix using this as a stepping stone.
Add the vector onto the end of the matrix, calculate a pairwise distance matrix using sklearn.metrics.pairwise_distances() and then extract the relevant column/row.
So for vector v (with shape (D,)) and matrix m (with shape (N,D)) do:
import numpy as np
from sklearn.metrics import pairwise_distances
# append v as the last row of the matrix
new_m = np.concatenate([m, v[None, :]], axis=0)
distance_matrix = pairwise_distances(new_m, metric="cosine")
# last row holds the distances from v to every row of m
distances = distance_matrix[-1, :-1]
Not ideal, but better than iterating!
This method can be extended if you are querying more than one vector. To do this, a list of vectors can be concatenated instead.
I think there is a way using the definition and the numpy library:
Definition: cosine_distance(u, v) = 1 - (u . v) / (||u|| * ||v||)
import numpy as np
#just creating random data
u = np.random.random(100)
v = np.random.random((100,100))
#dot product: for every row in v, multiply u and sum the elements
u_dot_v = np.sum(u*v,axis = 1)
#find the norm of u and each row of v
mod_u = np.sqrt(np.sum(u*u))
mod_v = np.sqrt(np.sum(v*v,axis = 1))
#just apply the definition
final = 1 - u_dot_v/(mod_u*mod_v)
#verify with the cosine function from scipy
from scipy.spatial.distance import cosine
final2 = np.array([cosine(u,i) for i in v])
I found the definition of cosine distance here: https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.cosine.html#scipy.spatial.distance.cosine
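Since the question asks about sparse inputs, the same definition can be applied to scipy.sparse objects without converting the whole matrix to a dense array. This is a sketch under stated assumptions: u is a single row of shape (1, D), v has shape (N, D), and sparse_cosine_distances is a hypothetical helper name, not a scipy function.

```python
import numpy as np
from scipy import sparse

def sparse_cosine_distances(u, v):
    """Cosine distance between sparse row u (1, D) and each row of sparse v (N, D)."""
    u = sparse.csr_matrix(u)
    v = sparse.csr_matrix(v)
    # Dot product of u with every row of v; the (N, 1) result is small enough to densify
    dots = (v @ u.T).toarray().ravel()
    # Euclidean norms of u and of each row of v
    norm_u = np.sqrt(u.multiply(u).sum())
    norm_v = np.sqrt(np.asarray(v.multiply(v).sum(axis=1)).ravel())
    return 1.0 - dots / (norm_u * norm_v)

u = sparse.csr_matrix([[1.0, 0.0, 0.0]])
v = sparse.csr_matrix([[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0],
                       [1.0, 1.0, 0.0]])
result = sparse_cosine_distances(u, v)
print(result)
```

Rows of v that are entirely zero have norm 0 and produce a division by zero, so they would need separate handling.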
In scipy.spatial.distance.cosine()
http://docs.scipy.org/doc/scipy-0.17.0/reference/generated/scipy.spatial.distance.cosine.html
The code below worked for me; you have to provide the correct signature to np.vectorize:
import numpy as np
from scipy.spatial.distance import cosine
def cosine_distances(embedding_matrix, extracted_embedding):
    return cosine(embedding_matrix, extracted_embedding)
# the signature makes vectorize loop over the rows of the first argument
cosine_distances = np.vectorize(cosine_distances, signature='(m),(d)->()')
cosine_distances(corpus_embeddings, extracted_embedding)
In my case
corpus_embeddings is a (10000,128) matrix
extracted_embedding is a 128-dimensional vector
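The NumPy documentation describes np.vectorize as essentially a for loop, provided for convenience rather than performance. For dense inputs, one alternative is scipy.spatial.distance.cdist with metric="cosine". The sketch below uses random stand-in data with fewer rows than the (10000, 128) matrix mentioned above; only the names corpus_embeddings and extracted_embedding come from the answer.

```python
import numpy as np
from scipy.spatial.distance import cdist, cosine

# Random stand-ins for the real embeddings (assumed dense float arrays)
rng = np.random.default_rng(0)
corpus_embeddings = rng.random((1000, 128))
extracted_embedding = rng.random(128)

# cdist expects two 2D arrays, so the query vector becomes a (1, 128) array
distances = cdist(corpus_embeddings, extracted_embedding[None, :], metric="cosine").ravel()

# Spot check against the single-pair function used in the answer above
check = [cosine(row, extracted_embedding) for row in corpus_embeddings[:5]]
print(np.allclose(distances[:5], check))
```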
Consider a 3D histogram or, for simplicity, a 3D numpy array of shape (X, Y, Z):
import numpy as np
array = np.random.random((100,100,100))
What is the best way, using numpy or scipy, to obtain the indexes of the array's values that satisfy a sphere condition?
(index_x**2 + index_y**2 + index_z**2) <= radius**2
Obviously, in the latter condition, the array center is (0, 0, 0). In general the condition will be
((index_x-center_x)**2 + (index_y-center_y)**2 +(index_z-center_z)**2) <= radius**2
The problem is easy to solve using simply a python loop, but I need that to be optimized.
many thanks for your help
You can first build the index arrays efficiently with ogrid() and then find the indexes that satisfy your condition with nonzero().
The matching indexes are obtained with nonzero() like so:
indexes = numpy.transpose((x**2+y**2+z**2 <= radius**2).nonzero()) # transpose() might be unnecessary: it depends on your needs
where the indexes arrays are obtained efficiently with ogrid():
x, y, z = numpy.ogrid[:100, :100, :100]
or, for an arbitrary shape for your input data array:
x, y, z = numpy.ogrid[tuple(slice(None, dim) for dim in data.shape)]
To make @EOL's nice approach more general, one can define a center within the shape of the array:
array = np.random.random((100,100,100))
center = (30,10,25)
radius = 5.0
x, y, z = np.ogrid[-center[0]:array.shape[0]-center[0], -center[1]:array.shape[1]-center[1], -center[2]:array.shape[2]-center[2]]
indexes = np.transpose((x**2+y**2+z**2 <= radius**2).nonzero())
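The ogrid approach can be wrapped into a small function that works for any array shape and center. This is a sketch only: sphere_indices is a hypothetical helper name, and it returns both the boolean mask and the index list so that the selected values can be read with array[mask].

```python
import numpy as np

def sphere_indices(shape, center, radius):
    """Return (mask, indexes) for the grid points within radius of center."""
    # One open grid per axis; they broadcast against each other
    grids = np.ogrid[tuple(slice(0, n) for n in shape)]
    # Squared distance of every grid point from the center
    dist_sq = sum((g - c) ** 2 for g, c in zip(grids, center))
    mask = dist_sq <= radius ** 2
    return mask, np.argwhere(mask)

array = np.random.random((5, 5, 5))
mask, indexes = sphere_indices(array.shape, center=(2, 2, 2), radius=1.0)
values = array[mask]   # array values inside the sphere
print(len(indexes))
```

With radius 1.0 on an integer grid, the sphere contains the center and its six face neighbours.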
I have many 100x100 grids, is there an efficient way using numpy to calculate the median for every grid point and return just one 100x100 grid with the median values? Presently, I'm using a for loop to run through each grid point, calculating the median and then combining them into one grid at the end. I'm sure there's a better way to do this using numpy. Any help would be appreciated! Thanks!
Create a 100x100xN array (or stack the grids together if that's not possible) and use np.median with the correct axis to do it in one go:
import numpy as np
a = np.random.rand(100,100)
b = np.random.rand(100,100)
c = np.random.rand(100,100)
d = np.dstack((a,b,c))
result = np.median(d,axis=2)
How many grids are there?
One option would be to create a 3D array that is 100x100xnumGrids and compute the median across the 3rd dimension.
Use the axis parameter of median:
import numpy as np
data = np.random.rand(100, 5, 5)
print(np.median(data, axis=0))
print(np.median(data[:, 0, 0]))
print(np.median(data[:, 1, 0]))
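When the grids are held in a list of unknown length, np.stack can combine them along a new axis before taking the median. A minimal sketch, with constant-valued grids standing in for real data so the result is easy to check; the names grids, stacked and median_grid are illustrative.

```python
import numpy as np

# Three stand-in 100x100 grids filled with 1.0, 5.0 and 3.0
grids = [np.full((100, 100), value) for value in (1.0, 5.0, 3.0)]

# Stack along a new leading axis: shape (N, 100, 100)
stacked = np.stack(grids, axis=0)

# Median across the N grids for every grid point: shape (100, 100)
median_grid = np.median(stacked, axis=0)
print(median_grid.shape, median_grid[0, 0])
```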
I'm looking for dynamically growing vectors in Python, since I don't know their length in advance. In addition, I would like to calculate distances between these sparse vectors, preferably using the distance functions in scipy.spatial.distance (although any other suggestions are welcome). Any ideas how to do this? (Initially, it doesn't need to be efficient.)
Thanks a lot in advance!
You can use regular python lists (which are dynamic) as vectors. Trivial example follows.
from scipy.spatial.distance import sqeuclidean
a = [1,2,3]
b = [0,0,0]
print(sqeuclidean(a, b)) # 14
As per aganders3's suggestion, do note that you can also use numpy arrays if needed:
import numpy
a = numpy.array([1,2,3])
If the sparse part of your question is crucial, I'd use scipy for that, since it has support for sparse matrices. You can define a 1xn matrix and use it as a vector. This works (the parameter is the size of the matrix, filled with zeroes by default):
sqeuclidean(scipy.sparse.coo_matrix((1,3)),scipy.sparse.coo_matrix((1,3))) # 0
There are many kinds of sparse matrices, some dictionary-based (see comment). You can define a row sparse matrix from a list like this:
scipy.sparse.csr_matrix([1,2,3])
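Whether the functions in scipy.spatial.distance accept sparse matrices directly may depend on the SciPy version, so the sketch below assumes they expect dense 1-D inputs and converts each sparse row first. It also shows the same squared distance computed with sparse operations only. The variable names are illustrative.

```python
import numpy as np
from scipy import sparse
from scipy.spatial.distance import sqeuclidean

a_sparse = sparse.csr_matrix([1, 2, 3])   # 1x3 sparse row built from a list
b_sparse = sparse.csr_matrix((1, 3))      # 1x3 sparse row of zeros

# Option 1: convert to dense 1-D vectors, then use scipy.spatial.distance
a_dense = a_sparse.toarray().ravel()
b_dense = b_sparse.toarray().ravel()
dist_dense = sqeuclidean(a_dense, b_dense)

# Option 2: stay sparse and sum the squared differences
diff = a_sparse - b_sparse
dist_sparse = diff.multiply(diff).sum()

print(dist_dense, dist_sparse)
```

Option 2 avoids building dense vectors, which matters when the vectors are long and mostly zero.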
Here is how you can do it in numpy:
import numpy as np
a = np.array([1, 2, 3])
b = np.array([0, 0, 0])
c = np.sum(((a - b) ** 2)) # 14