I am attempting to write a program which constructs a matrix and performs a singular value decomposition on it. I am evaluating the function ax^2 +bx + 1 on a grid. I then make a uniform meshgrid of a and b. The rows of the matrix correspond to different quadratic coefficients, while each column corresponds to a grid point at which the function is evaluated.
The MATLAB code is here:
% Collect data
x = linspace(-1,1,100);
[a,b] = meshgrid(0:0.1:1,0:0.1:1);
D=zeros(numel(x),numel(a));
sz = size(D)
% Build “Dose” matrix
for i=1:numel(a)
D(:,i) = a(i)*x.^2+b(i)*x+1;
end
% Do the SVD:
[U,S,V]=svd(D,'econ');
D_reconstructed = U*S*V';
plot(diag(S))
scatter3(a(:),b(:),V(:,1))
This is my attempt at a solution:
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(-1, 1, 100)
def f(x, a, b):
    return a*x*x + b*x + 1
a, b = np.mgrid[0:1:0.1,0:1:0.1]
#a = b = np.arange(0,1,0.01)
D = np.zeros((x.size, a.size))
for i in range(a.size):
    D[i] = a[i]*x*x +b[i]*x +1
U, S, V = np.linalg.svd(D)
plt.plot(np.diag(S))
fig = plt.figure()
ax = plt.axes(projection="3d")
ax.scatter(a, b, V[0])
but I always get broadcasting errors which I am not sure how to fix.
Firstly, in MATLAB you're assigning to D(:,i), but in python you're assigning to D[i]. The latter is equivalent to D[i, ...] which is in your case D[i, :]. Instead you seem to need D[:, i].
Secondly, in MATLAB using a linear index into a 2d array (namely a and b) will give you flattened views. If you do that with numpy you get slices of an array instead, just as I mentioned with D[i].
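To make the indexing difference concrete, here is a small sketch (the array a below is a hypothetical 2-by-3 example, not the one from the question). Indexing a 2d NumPy array with a single integer returns a whole row, while indexing the flattened array returns a single element, which is closer to MATLAB's linear indexing. Note that MATLAB flattens column by column, so order="F" would be needed to reproduce its element order exactly.

```python
import numpy as np

# Hypothetical small 2d array, chosen only for illustration
a = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

row = a[1]                       # a single integer index selects a whole row
elem = a.ravel()[1]              # linear index into the flattened (row-major) array
elem_f = a.ravel(order="F")[1]   # column-major flattening, the order MATLAB uses

print(row.shape)   # (3,)
print(elem)        # 1
print(elem_f)      # 3
```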
You can do away with the loop with broadcasting and getting your desired 2d array by .ravelling (or reshaping) your a and b arrays:
x = np.linspace(-1, 1, 100)[:, None] # inject trailing singleton for broadcasting
a, b = np.mgrid[0:1:0.1, 0:1:0.1]
D = a.ravel() * x**2 + b.ravel() * x + 1
The way this works is that x has shape (100, 1) after we inject a trailing singleton (in MATLAB trailing singletons are implied, in numpy leading ones), and both a.ravel() and b.ravel() have shape (10*10,) which is compatible with (1, 10*10), making broadcasting possible into shape (100, 10*10). You could also replace the calls to ravel with
a, b = np.mgrid[...].reshape(2, -1)
which is a trick I sometimes use, but this is harder to read if you're unfamiliar with the pattern.
Side note: it's better to use example data where dimensions end up being of different size so that you notice if something ends up being transposed.
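Putting the pieces together, the whole MATLAB script could be ported roughly as follows. This is a sketch under a few assumptions: the step 11j is used so that the grid has 11 points per axis like MATLAB's 0:0.1:1 (np.mgrid[0:1:0.1] stops before 1 and gives only 10), full_matrices=False plays the role of 'econ', and plotting is left out. Keep in mind that np.linalg.svd returns the singular values as a 1d array and the third factor already transposed.

```python
import numpy as np

x = np.linspace(-1, 1, 100)[:, None]        # shape (100, 1)
a, b = np.mgrid[0:1:11j, 0:1:11j]           # 11 points per axis, endpoints included
a, b = a.ravel(), b.ravel()                 # shape (121,) each

D = a * x**2 + b * x + 1                    # broadcasts to shape (100, 121)

# Economy-size SVD: U is (100, 100), S is (100,), Vh is (100, 121)
U, S, Vh = np.linalg.svd(D, full_matrices=False)

# S is a vector, so scale the columns of U instead of building diag(S)
D_reconstructed = (U * S) @ Vh

print(D.shape, U.shape, S.shape, Vh.shape)
print(np.allclose(D, D_reconstructed))      # True
```

With this convention, S can be plotted directly with plt.plot(S), and the first right singular vector (MATLAB's V(:,1)) is Vh[0].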
I understand that there are a lot of answers on this topic, but I have gone through all of them and did not find one that fits my case.
I'm sure the error is a trivial one, but I still cannot find a solution.
I want to take individual elements from an array created with numpy.linspace.
import numpy
#Porosity range
phi = numpy.linspace(0.1, 1, num=10)
mu = [1, 10, 100, 1000]
Here is an example of what it looks like outside a loop, where it works:
mu_total3 = mu[0]*phi[2]+ mu[1]*(1 - phi[2])
print(mu_total3)
7.3
What I want is the following:
for x in phi:
    mu_total = mu[0]*phi[x]+ mu[1]*(1 - phi[x])
    print(mu_total)
NumPy specialises in vectorised operations, that is, taking one or two arrays and applying an operation to all of their elements. For Python lists you might write:
zs = []
for x, y in zip(xs, ys):
    z = x + 2*y
    zs.append(z)
print(zs)
Whereas with a numpy array you can write:
zs = xs + 2*ys
print(zs)
Applied to your code that becomes:
mu_totals = mu[0]*phi + mu[1]*(1 - phi)
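For comparison, here is a small sketch of both variants side by side. The loop in the question fails because x is already an element of phi (a float), not an index, so it cannot be used as phi[x]. Iterating over the values directly, or dropping the loop altogether, gives the same numbers.

```python
import numpy as np

phi = np.linspace(0.1, 1, num=10)   # porosity range
mu = [1, 10, 100, 1000]

# Loop version: p is the value itself, so no indexing into phi is needed
looped = []
for p in phi:
    looped.append(mu[0] * p + mu[1] * (1 - p))

# Vectorised version: one expression over the whole array
mu_totals = mu[0] * phi + mu[1] * (1 - phi)

print(np.allclose(looped, mu_totals))  # True
print(round(mu_totals[2], 10))         # 7.3
```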
I have two arrays of x-y coordinates, and I would like to find the minimum Euclidean distance between each point in one array with all the points in the other array. The arrays are not necessarily the same size. For example:
xy1=numpy.array(
[[ 243, 3173],
[ 525, 2997]])
xy2=numpy.array(
[[ 682, 2644],
[ 277, 2651],
[ 396, 2640]])
My current method loops through each coordinate xy in xy1 and calculates the distances between that coordinate and the other coordinates.
mindist=numpy.zeros(len(xy1))
minid=numpy.zeros(len(xy1))
for i,xy in enumerate(xy1):
dists=numpy.sqrt(numpy.sum((xy-xy2)**2,axis=1))
mindist[i],minid[i]=dists.min(),dists.argmin()
Is there a way to eliminate the for loop and somehow do element-by-element calculations between the two arrays? I envision generating a distance matrix for which I could find the minimum element in each row or column.
Another way to look at the problem. Say I concatenate xy1 (length m) and xy2 (length p) into xy (length n), and I store the lengths of the original arrays. Theoretically, I should then be able to generate a n x n distance matrix from those coordinates from which I can grab an m x p submatrix. Is there a way to efficiently generate this submatrix?
(Months later)
scipy.spatial.distance.cdist( X, Y )
gives all pairs of distances,
for X and Y 2 dim, 3 dim ...
It also does 22 different norms, detailed in its documentation.
# cdist example: (nx,dim) (ny,dim) -> (nx,ny)
import sys
import numpy as np
from scipy.spatial.distance import cdist
#...............................................................................
dim = 10
nx = 1000
ny = 100
metric = "euclidean"
seed = 1
# change these params in sh or ipython: run this.py dim=3 ...
for arg in sys.argv[1:]:
    exec( arg )
np.random.seed(seed)
np.set_printoptions( precision=2, threshold=100, edgeitems=10, suppress=True )
title = "%s dim %d nx %d ny %d metric %s" % (
    __file__, dim, nx, ny, metric )
print( "\n" + title )
#...............................................................................
X = np.random.uniform( 0, 1, size=(nx,dim) )
Y = np.random.uniform( 0, 1, size=(ny,dim) )
dist = cdist( X, Y, metric=metric ) # -> (nx, ny) distances
#...............................................................................
print( "scipy.spatial.distance.cdist: X %s Y %s -> %s" % (
    X.shape, Y.shape, dist.shape ))
print( "dist average %.3g +- %.2g" % (dist.mean(), dist.std()) )
# cdist returns a (1, 1) array here, so take its single element before formatting
print( "check: dist[0,3] %.3g == cdist( [X[0]], [Y[3]] ) %.3g" % (
    dist[0,3], cdist( [X[0]], [Y[3]] )[0,0] ))
# (trivia: how do pairwise distances between uniform-random points in the unit cube
# depend on the metric ? With the right scaling, not much at all:
# L1 / dim ~ .33 +- .2/sqrt dim
# L2 / sqrt dim ~ .4 +- .2/sqrt dim
# Lmax / 2 ~ .4 +- .2/sqrt dim )
To compute the m by p matrix of distances, this should work:
>>> def distances(xy1, xy2):
...     d0 = numpy.subtract.outer(xy1[:,0], xy2[:,0])
...     d1 = numpy.subtract.outer(xy1[:,1], xy2[:,1])
...     return numpy.hypot(d0, d1)
The .outer calls make two such matrices (of scalar differences along the two axes), and the .hypot call turns those into a same-shape matrix (of scalar Euclidean distances).
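As a usage sketch, the function can be applied to the arrays from the question to recover both quantities the original loop produced. Taking the minimum and argmin along axis 1 works row by row, i.e. per point of xy1.

```python
import numpy

def distances(xy1, xy2):
    d0 = numpy.subtract.outer(xy1[:, 0], xy2[:, 0])
    d1 = numpy.subtract.outer(xy1[:, 1], xy2[:, 1])
    return numpy.hypot(d0, d1)

xy1 = numpy.array([[243, 3173], [525, 2997]])
xy2 = numpy.array([[682, 2644], [277, 2651], [396, 2640]])

dists = distances(xy1, xy2)      # shape (2, 3): one row per point in xy1
mindist = dists.min(axis=1)      # smallest distance for each point in xy1
minid = dists.argmin(axis=1)     # index into xy2 of the closest point

print(dists.shape)  # (2, 3)
print(minid)        # [1 2]
```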
The accepted answer does not fully address the question, which requests to find the minimum distance between the two sets of points, not the distance between every point in the two sets.
Although a straightforward solution to the original question indeed consists of computing the distance between every pair and subsequently finding the minimum one, this is not necessary if one is only interested in the minimum distances. A much faster solution exists for the latter problem.
All the proposed solutions have a running time that scales as m*p = len(xy1)*len(xy2). This is OK for small datasets, but an optimal solution can be written that scales as m*log(p), producing huge savings for large xy2 datasets.
This optimal execution time scaling can be achieved using scipy.spatial.KDTree as follows
import numpy as np
from scipy import spatial
xy1 = np.array(
[[243, 3173],
[525, 2997]])
xy2 = np.array(
[[682, 2644],
[277, 2651],
[396, 2640]])
# This solution is optimal when xy2 is very large
tree = spatial.KDTree(xy2)
mindist, minid = tree.query(xy1)
print(mindist)
# This solution by @denis is OK for small xy2
mindist = np.min(spatial.distance.cdist(xy1, xy2), axis=1)
print(mindist)
where mindist is the minimum distance between each point in xy1 and the set of points in xy2
For what you're trying to do:
dists = numpy.sqrt((xy1[:, 0, numpy.newaxis] - xy2[:, 0])**2 + (xy1[:, 1, numpy.newaxis] - xy2[:, 1])**2)
mindist = numpy.min(dists, axis=1)
minid = numpy.argmin(dists, axis=1)
Edit: Instead of calling sqrt, doing squares, etc., you can use numpy.hypot:
dists = numpy.hypot(xy1[:, 0, numpy.newaxis]-xy2[:, 0], xy1[:, 1, numpy.newaxis]-xy2[:, 1])
import numpy as np
P = np.add.outer(np.sum(xy1**2, axis=1), np.sum(xy2**2, axis=1))
N = np.dot(xy1, xy2.T)
dists = np.sqrt(P - 2*N)
I think the following function also works.
import numpy as np
from typing import Optional
def pairwise_dist(X: np.ndarray, Y: Optional[np.ndarray] = None) -> np.ndarray:
    Y = X if Y is None else Y
    xx = (X ** 2).sum(axis = 1)[:, None]
    yy = (Y ** 2).sum(axis = 1)[:, None]
    return xx + yy.T - 2 * (X @ Y.T)
Explanation
Suppose each row of X and Y holds the coordinates of one point in the respective set.
Let their sizes be m X p and n X p respectively.
The result is a numpy array of size m X n whose (i, j)-th entry is the squared distance between the i-th row of X and the j-th row of Y; take np.sqrt of it to get the Euclidean distance.
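The function relies on the expansion ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y. A hedged sketch of how it might be turned into actual distances follows. The name pairwise_euclidean and the clipping step are additions for illustration, not part of the answer above. Clipping guards against tiny negative values that rounding can produce before the square root.

```python
import numpy as np

def pairwise_euclidean(X, Y):
    """Hypothetical helper: Euclidean distances between rows of X and rows of Y."""
    xx = (X ** 2).sum(axis=1)[:, None]      # shape (m, 1)
    yy = (Y ** 2).sum(axis=1)[None, :]      # shape (1, n)
    sq = xx + yy - 2 * (X @ Y.T)            # squared distances, shape (m, n)
    return np.sqrt(np.maximum(sq, 0))       # clip rounding noise below zero

xy1 = np.array([[243, 3173], [525, 2997]], dtype=float)
xy2 = np.array([[682, 2644], [277, 2651], [396, 2640]], dtype=float)

d = pairwise_euclidean(xy1, xy2)

# Cross-check against the direct broadcasting formula
direct = np.sqrt(((xy1[:, None, :] - xy2[None, :, :]) ** 2).sum(axis=2))
print(np.allclose(d, direct))  # True
```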
I highly recommend using built-in methods for calculating squares and roots, because they are optimized and much safer against overflow.
@alex's answer below is the safest in terms of overflow and should also be very fast. For single points you can also use math.hypot, which now supports more than 2 dimensions.
>>> def distances(xy1, xy2):
...     d0 = numpy.subtract.outer(xy1[:,0], xy2[:,0])
...     d1 = numpy.subtract.outer(xy1[:,1], xy2[:,1])
...     return numpy.hypot(d0, d1)
Safety concerns
import math
import numpy as np
i, j, k = 1e+200, 1e+200, 1e+200
math.hypot(i, j, k)  # for 2d points, np.hypot works as well
# 1.7320508075688773e+200
np.sqrt(np.sum((np.array([i, j, k])) ** 2))
# RuntimeWarning: overflow encountered in square
overflow/underflow/speeds
I think that the most straightforward and efficient solution is to do it like this:
distances = np.linalg.norm(xy1[:, None, :] - xy2[None, :, :], axis=2) # calculate the euclidean distances between every test point and every training feature, shape (m, p).
min_dist = np.min(distances, axis=1) # get the minimum distance for each test point
min_id = np.argmin(distances, axis=1) # get the index of the class with the minimum distance, i.e., the minimum difference.
Although many answers here are great, there is another way which has not been mentioned here: using numpy's vectorization / broadcasting properties to compute the distance between each pair of points from two arrays of different length (and, if wanted, the closest matches). I publish it here because it can be very handy to master broadcasting, and it also solves this problem elegantly while remaining very efficient.
Assuming you have two arrays like so:
# two arrays of different length, but with the same dimension
a = np.random.randn(6,2)
b = np.random.randn(4,2)
You can't do the operation a-b: numpy complains with "operands could not be broadcast together with shapes (6,2) (4,2)". The trick to allow broadcasting is to manually add a dimension for numpy to broadcast along. By leaving the dimension of size 2 in both reshaped arrays, numpy knows that it must perform the operation over this dimension.
deltas = a.reshape(6, 1, 2) - b.reshape(1, 4, 2)
# contains the distance between each pair of points
distance_matrix = np.sqrt((deltas ** 2).sum(axis=2))
The distance_matrix has a shape (6,4): for each point in a, the distances to all points in b are computed. Then, if you want the "minimum Euclidean distance between each point in one array with all the points in the other array", you would do:
distance_matrix.argmin(axis=1)
This returns the index of the point in b that is closest to each point of a.
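The same pattern can be written without hard-coding the sizes by indexing with None (an alias for np.newaxis). The following sketch, with a fixed seed chosen only so the run is repeatable, also shows how the argmin result can be used to pull out the matching points of b.

```python
import numpy as np

rng = np.random.default_rng(0)      # seed chosen arbitrarily for repeatability
a = rng.standard_normal((6, 2))
b = rng.standard_normal((4, 2))

deltas = a[:, None, :] - b[None, :, :]                # shape (6, 4, 2)
distance_matrix = np.sqrt((deltas ** 2).sum(axis=2))  # shape (6, 4)

closest = distance_matrix.argmin(axis=1)    # index in b of the nearest point, per row of a
nearest_points = b[closest]                 # shape (6, 2)
nearest_dist = distance_matrix.min(axis=1)  # shape (6,)

print(deltas.shape, distance_matrix.shape, nearest_points.shape)
```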
I have two arrays, let's say x and y, that contain a few thousand datapoints.
Plotting a scatterplot gives a beautiful representation of them. Now I'd like to select all points within a certain radius. For example r=10
I tried this, but it does not work, as it's not a grid.
x = [1,2,4,5,7,8,....]
y = [-1,4,8,-1,11,17,....]
RAdeccircle = x**2+y**2
r = 10
regstars = np.where(RAdeccircle < r**2)
This is not the same as an nxn array, and RAdeccircle = x**2+y**2 does not seem to work as it does not try all permutations.
The ** operator works element-wise on NumPy arrays, but you are using plain lists, and applying ** to a list raises an error. So you first need to convert the lists to NumPy arrays using np.array():
import numpy as np
x = np.array([1,2,4,5,7,8])
y = np.array([-1,4,8,-1,11,17])
RAdeccircle = x**2+y**2
print(RAdeccircle)
r = 10
regstars = np.where(RAdeccircle < r**2)
print(regstars)
>>> [ 2 20 80 26 170 353]
>>> (array([0, 1, 2, 3], dtype=int64),)
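To get the selected points themselves rather than their indices, a boolean mask can be used directly. This sketch also adds a hypothetical centre (x0, y0), set to the origin here, in case the circle is not centred at zero.

```python
import numpy as np

x = np.array([1, 2, 4, 5, 7, 8])
y = np.array([-1, 4, 8, -1, 11, 17])

x0, y0 = 0, 0     # hypothetical centre of the circle (origin, as in the question)
r = 10

inside = (x - x0) ** 2 + (y - y0) ** 2 < r ** 2   # boolean mask, one entry per point
x_in, y_in = x[inside], y[inside]

print(x_in)  # [1 2 4 5]
print(y_in)  # [-1  4  8 -1]
```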
I'd like to multiply two vectors, one column (i.e., (N+1)x1), one row (i.e., 1x(N+1)) to give a (N+1)x(N+1) matrix. I'm fairly new to Numpy but have some experience with MATLAB, this is the equivalent code in MATLAB to what I want in Numpy:
n = 0:N;
xx = cos(pi*n/N)';
T = cos(acos(xx)*n');
in Numpy I've tried:
import numpy as np
n = range(0,N+1)
pi = np.pi
xx = np.cos(np.multiply(pi / float(N), n))
xxa = np.asarray(xx)
na = np.asarray(n)
nd = np.transpose(na)
T = np.cos(np.multiply(np.arccos(xxa),nd))
I added the asarray line after I noticed that without it Numpy seemed to be treating xx and n as lists. np.shape(n), np.shape(xx), np.shape(na) and np.shape(xxa) gives the same result: (100001L,)
np.multiply only does element by element multiplication. You want an outer product. Use np.outer:
np.outer(np.arccos(xxa), nd)
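A minimal end-to-end sketch with np.outer, using a small N (an arbitrary value picked for illustration) so the result is easy to inspect. Since arccos(cos(pi*j/N)) equals pi*j/N for j between 0 and N, the entries can be checked against cos(pi*j*k/N).

```python
import numpy as np

N = 4                                   # small illustrative size, not from the question
n = np.arange(N + 1)                    # shape (N+1,)
xx = np.cos(np.pi * n / N)              # shape (N+1,)

# Outer product of two 1d arrays gives an (N+1, N+1) matrix
T = np.cos(np.outer(np.arccos(xx), n))

expected = np.cos(np.pi * np.outer(n, n) / N)
print(T.shape)                   # (5, 5)
print(np.allclose(T, expected))  # True
```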
If you want to use NumPy similar to MATLAB, you have to make sure that your arrays have the right shape. You can check the shape of any NumPy array with arrayname.shape. Because your array na has a 1d shape such as (4,) instead of (4,1), the transpose has no effect and multiply computes the element-wise product rather than the outer product. Use arrayname.reshape(N+1,1) or arrayname.reshape(1,N+1) to give your arrays the right shape:
import numpy as np
n = range(0,N+1)
pi = np.pi
xx = np.cos(np.multiply(pi / float(N), n))
xxa = np.asarray(xx).reshape(N+1,1)
na = np.asarray(n).reshape(N+1,1)
nd = np.transpose(na)
T = np.cos(np.multiply(np.arccos(xxa),nd))
Since Python 3.5, you can use the @ operator for matrix multiplication. So it's easy to get code that's very similar to MATLAB:
import numpy as np
n = np.arange(N + 1).reshape(N + 1, 1)
xx = np.cos(np.pi * n / N)
T = np.cos(np.arccos(xx) @ n.T)
Here n.T denotes the transpose of n.
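For a column times a row, the matrix product and plain broadcasting give the same result, which the following sketch checks for a small, arbitrarily chosen N. With 2d shapes (N+1, 1) and (1, N+1), the elementwise * broadcasts to (N+1, N+1) just as the matrix product does.

```python
import numpy as np

N = 4                                     # arbitrary small size for illustration
n = np.arange(N + 1).reshape(N + 1, 1)    # column vector, shape (N+1, 1)
xx = np.cos(np.pi * n / N)

T_matmul = np.cos(np.arccos(xx) @ n.T)    # (N+1, 1) @ (1, N+1) -> (N+1, N+1)
T_bcast = np.cos(np.arccos(xx) * n.T)     # broadcasting gives the same shape and values

print(T_matmul.shape)                     # (5, 5)
print(np.allclose(T_matmul, T_bcast))     # True
```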