sorting via argsort - generalization to 2d matrices - python

To sort a numpy array via argsort, we can do:
import numpy as np
x = np.random.rand(3)
x_sorted = x[np.argsort(x)]
I am looking for a numpy solution for the generalization to two or higher dimensions.
The indexing as in the 1d case won't work for 2d matrices.
Y = np.random.rand(4, 3)
sort_indices = np.argsort(Y)
#Y_sorted = Y[sort_indices] (what would that line be?)
Related: I am looking for a pure numpy answer that addresses the same problem as solved in this answer: https://stackoverflow.com/a/53700995/2272172

Use np.take_along_axis:
import numpy as np
np.random.seed(42)
x = np.random.rand(3)
x_sorted = x[np.argsort(x)]
Y = np.random.rand(4, 3)
sort_indices = np.argsort(Y)
print(np.take_along_axis(Y, sort_indices, axis=1))
print(np.array(list(map(lambda x, y: y[x], np.argsort(Y), Y)))) # the solution provided
Output
[[0.15599452 0.15601864 0.59865848]
[0.05808361 0.60111501 0.86617615]
[0.02058449 0.70807258 0.96990985]
[0.18182497 0.21233911 0.83244264]]
[[0.15599452 0.15601864 0.59865848]
[0.05808361 0.60111501 0.86617615]
[0.02058449 0.70807258 0.96990985]
[0.18182497 0.21233911 0.83244264]]
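The same pattern carries over to other axes and to arrays with more than two dimensions, as long as argsort and take_along_axis are given the same axis. A minimal sketch, assuming NumPy 1.15 or later (where take_along_axis was added); the array Z and its shape are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.random((2, 4, 3))  # hypothetical 3d example array

sorted_by_axis = {}
for axis in range(Z.ndim):
    idx = np.argsort(Z, axis=axis)                  # indices that sort along this axis
    sorted_by_axis[axis] = np.take_along_axis(Z, idx, axis=axis)

# each result should match a direct np.sort along the same axis
print(all(np.array_equal(sorted_by_axis[ax], np.sort(Z, axis=ax)) for ax in range(Z.ndim)))
```

If only the sorted values are needed, np.sort is simpler; the argsort route pays off when the same ordering has to be applied to a second array of the same shape.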

Related

How do I convert this Matlab code with meshgrid and arrays to Python code?

I am attempting to write a program which constructs a matrix and performs a singular value decomposition on it. I am evaluating the function ax^2 +bx + 1 on a grid. I then make a uniform meshgrid of a and b. The rows of the matrix correspond to different quadratic coefficients, while each column corresponds to a grid point at which the function is evaluated.
The matlab code is here:
% Collect data
x = linspace(-1,1,100);
[a,b] = meshgrid(0:0.1:1,0:0.1:1);
D=zeros(numel(x),numel(a));
sz = size(D)
% Build “Dose” matrix
for i=1:numel(a)
D(:,i) = a(i)*x.^2+b(i)*x+1;
end
% Do the SVD:
[U,S,V]=svd(D,'econ');
D_reconstructed = U*S*V';
plot(diag(S))
scatter3(a(:),b(:),V(:,1))
This is my attempt at a solution:
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(-1, 1, 100)
def f(x, a, b):
    return a*x*x + b*x + 1
a, b = np.mgrid[0:1:0.1,0:1:0.1]
#a = b = np.arange(0,1,0.01)
D = np.zeros((x.size, a.size))
for i in range(a.size):
    D[i] = a[i]*x*x +b[i]*x +1
U, S, V = np.linalg.svd(D)
plt.plot(np.diag(S))
fig = plt.figure()
ax = plt.axes(projection="3d")
ax.scatter(a, b, V[0])
but I always get broadcasting errors which I am not sure how to fix.
Firstly, in MATLAB you're assigning to D(:,i), but in python you're assigning to D[i]. The latter is equivalent to D[i, ...] which is in your case D[i, :]. Instead you seem to need D[:, i].
Secondly, in MATLAB using a linear index into a 2d array (namely a and b) will give you flattened views. If you do that with numpy you get slices of an array instead, just as I mentioned with D[i].
You can do away with the loop with broadcasting and getting your desired 2d array by .ravelling (or reshaping) your a and b arrays:
x = np.linspace(-1, 1, 100)[:, None] # inject trailing singleton for broadcasting
a, b = np.mgrid[0:1:0.1, 0:1:0.1]
D = a.ravel() * x**2 + b.ravel() * x + 1
The way this works is that x has shape (100, 1) after we inject a trailing singleton (in MATLAB trailing singletons are implied, in numpy leading ones), and both a.ravel() and b.ravel() have shape (10*10,) which is compatible with (1, 10*10), making broadcasting possible into shape (100, 10*10). You could also replace the calls to ravel with
a, b = np.mgrid[...].reshape(2, -1)
which is a trick I sometimes use, but this is harder to read if you're unfamiliar with the pattern.
Side note: it's better to use example data where dimensions end up being of different size so that you notice if something ends up being transposed.
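For completeness, here is a sketch of how the rest of the MATLAB script might translate once D is built this way. It is an illustration under a few assumptions, not the only way to do it: the step 11j asks mgrid for 11 points including the endpoint (MATLAB's 0:0.1:1 has 11 points, whereas mgrid[0:1:0.1] stops before 1), and full_matrices=False is used as the counterpart of MATLAB's 'econ' option.

```python
import numpy as np

x = np.linspace(-1, 1, 100)[:, None]       # shape (100, 1)
a, b = np.mgrid[0:1:11j, 0:1:11j]          # 11 points per axis, endpoint included
D = a.ravel() * x**2 + b.ravel() * x + 1   # shape (100, 121)

# reduced ("economy") SVD
U, S, Vh = np.linalg.svd(D, full_matrices=False)

# unlike MATLAB, S is a 1d array of singular values and Vh is already V transposed
D_reconstructed = (U * S) @ Vh
print(np.allclose(D, D_reconstructed))
```

With these names, the plots would become plt.plot(S) (no np.diag needed) and ax.scatter(a.ravel(), b.ravel(), Vh[0]), since the first row of Vh plays the role of MATLAB's V(:,1).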

Zero Padding a Matrix in Python

I have a 5x20 matrix. I intend to use matrix functions and transformations, such as the Fourier transform, that can only be used on a symmetric matrix. How can I convert the 5x20 matrix to a 20x20 matrix by zero padding?
import numpy as np
a = np.array([[1,1,1,1],[2,2,2,2]])
b=np.zeros((20,20))
result = np.zeros_like(b)
x = 0
y = 0
result[x:a.shape[0],y:a.shape[1]] = a
print(result)
You can use the above logic to implement the padding.
One more approach.
Here a is the array that you already have.
import numpy as np
a = np.random.rand(5,20)
za = np.zeros((1, 20))
while a.shape[0] < 20:
    a = np.concatenate((a,za))
Are you using numpy? If so, you can do it by slicing the arrays:
import numpy as np
random_array_5_20 = np.random.rand(5, 20)
padded_array_20_20 = np.zeros((20, 20))
padded_array_20_20[:5, :20] = random_array_5_20
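Another option that may be worth considering is np.pad, which takes a (before, after) pair of pad widths per axis. A minimal sketch, assuming the zeros should go below the existing rows; the variable names are made up for illustration:

```python
import numpy as np

a = np.random.rand(5, 20)
target_rows = 20

# axis 0: no rows before, 15 rows after; axis 1: no padding at all
padded = np.pad(a, ((0, target_rows - a.shape[0]), (0, 0)),
                mode="constant", constant_values=0)
print(padded.shape)
```

This avoids allocating the target array by hand and makes it easy to pad on both sides by changing the (before, after) pairs.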

Most efficient way to index into a numpy array from a scipy CSR matrix?

I have a numpy ndarray X with shape (4000, 3), where each sample in X is a 3D coordinate (x,y,z).
I have a scipy csr matrix nn_rad_csr of shape (4000, 4000), which is the nearest neighbors graph generated from sklearn.neighbors.radius_neighbors_graph(X, 0.01, include_self=True).
nn_rad_csr.toarray()[i] is a shape (4000,) sparse vector with binary weights (0 or 1) associated with the edges in the nearest neighbors graph from node X[i].
For instance, if nn_rad_csr.toarray()[i][j] == 1 then X[j] is within the nearest neighbor radius of X[i], whereas a value of 0 means it is not a neighbor.
What I'd like to do is have a function radius_graph_conv(X, rad) which returns an array Y which is X, averaged by its neighbors' values. I'm not sure how to exploit the sparsity of a CSR matrix to efficiently perform radius_graph_conv. I have two naive implementations of graph conv below.
import numpy as np
from sklearn.neighbors import radius_neighbors_graph, KDTree
def radius_graph_conv(X, rad):
    nn_rad_csr = radius_neighbors_graph(X, rad, include_self=True)
    csr_indices = nn_rad_csr.indices
    csr_indptr = nn_rad_csr.indptr
    Y = np.copy(X)
    for i in range(X.shape[0]):
        j, k = csr_indptr[i], csr_indptr[i+1]
        neighbor_idx = csr_indices[j:k]
        rad_neighborhood = X[neighbor_idx] # ndim always 2
        Y[i] = np.mean(rad_neighborhood, axis=0)
    return Y
def radius_graph_conv_matmul(X, rad):
    nn_rad_arr = radius_neighbors_graph(X, rad, include_self=True).toarray()
    # np.sum(nn_rad_arr, axis=-1) is basically a count of neighbors;
    # keepdims=True makes each row get divided by its own count
    return np.matmul(nn_rad_arr / np.sum(nn_rad_arr, axis=-1, keepdims=True), X)
Is there a better way to do this? With a knn graph, it's a very simple function, since the number of neighbors is fixed and you can just index into X, but with a radius or density based nearest neighbors graph, you have to work with a CSR (or an array of arrays if you are using a kd tree).
Here is the direct way of exploiting csr format. Your matmul solution probably does similar things under the hood. But we save one lookup (from the .data attribute) by also exploiting that it is an adjacency matrix; also, diffing .indptr should be more efficient than summing the equivalent amount of ones.
>>> import numpy as np
>>> from scipy import sparse
>>>
# create mock data
>>> A = np.random.random((100, 100)) < 0.1
>>> A = (A | A.T).view(np.uint8)
>>> AS = sparse.csr_matrix(A)
>>> X = np.random.random((100, 3))
>>>
# dense solution for reference
>>> Xa = A @ X / A.sum(axis=-1, keepdims=True)
# sparse solution
>>> XaS = np.add.reduceat(X[AS.indices], AS.indptr[:-1], axis=0) / np.diff(AS.indptr)[:, None]
>>>
# check they are the same
>>> np.allclose(Xa, XaS)
True
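For comparison, a sparse matrix product may be simpler to read and avoids densifying the graph, at the cost of the extra .data lookup mentioned above. This is a sketch on mock data rather than a benchmark; the diagonal is set so that every row has at least one neighbor, mirroring include_self=True:

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)

# mock symmetric adjacency matrix with self-loops (no empty rows)
A = rng.random((100, 100)) < 0.1
A = A | A.T | np.eye(100, dtype=bool)
AS = sparse.csr_matrix(A.astype(np.float64))
X = rng.random((100, 3))

# number of stored entries per row, i.e. the neighbor count
counts = np.diff(AS.indptr)

# sparse @ dense sums the neighbors' coordinates for every node
Y_sparse = (AS @ X) / counts[:, None]

# dense reference
Y_dense = A.astype(np.float64) @ X / A.sum(axis=-1, keepdims=True)
print(np.allclose(Y_sparse, Y_dense))
```

With a real graph, AS would be the matrix returned by radius_neighbors_graph, which is already in CSR format.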

Multiplying column and row vectors in Numpy

I'd like to multiply two vectors, one column (i.e., (N+1)x1) and one row (i.e., 1x(N+1)), to give an (N+1)x(N+1) matrix. I'm fairly new to NumPy but have some experience with MATLAB. This is the MATLAB equivalent of what I want in NumPy:
n = 0:N;
xx = cos(pi*n/N)';
T = cos(acos(xx)*n');
In NumPy I've tried:
import numpy as np
n = range(0,N+1)
pi = np.pi
xx = np.cos(np.multiply(pi / float(N), n))
xxa = np.asarray(xx)
na = np.asarray(n)
nd = np.transpose(na)
T = np.cos(np.multiply(np.arccos(xxa),nd))
I added the asarray line after I noticed that without it Numpy seemed to be treating xx and n as lists. np.shape(n), np.shape(xx), np.shape(na) and np.shape(xxa) gives the same result: (100001L,)
np.multiply only does element by element multiplication. You want an outer product. Use np.outer:
np.outer(np.arccos(xxa), nd)
If you want to use NumPy similarly to MATLAB, you have to make sure that your arrays have the right shape. You can check the shape of any NumPy array with arrayname.shape. Because your array na has shape (N+1,) rather than (N+1,1), transposing it has no effect, and multiply computes the element-wise product rather than an outer product. Use arrayname.reshape(N+1,1) or arrayname.reshape(1,N+1) to give your arrays the right shape:
import numpy as np
n = range(0,N+1)
pi = np.pi
xx = np.cos(np.multiply(pi / float(N), n))
xxa = np.asarray(xx).reshape(N+1,1)
na = np.asarray(n).reshape(N+1,1)
nd = np.transpose(na)
T = np.cos(np.multiply(np.arccos(xxa),nd))
Since Python 3.5, you can use the @ operator for matrix multiplication, so it's easy to write code that's very similar to MATLAB:
import numpy as np
n = np.arange(N + 1).reshape(N + 1, 1)
xx = np.cos(np.pi * n / N)
T = np.cos(np.arccos(xx) @ n.T)
Here n.T denotes the transpose of n.
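The approaches above should all produce the same matrix. A small sketch to compare them, using a made-up N = 4 so the result is easy to inspect:

```python
import numpy as np

N = 4  # small hypothetical value for illustration
n = np.arange(N + 1)
xx = np.cos(np.pi * n / N)
theta = np.arccos(xx)

# three equivalent ways to form the (N+1) x (N+1) product
P_outer = np.outer(theta, n)
P_bcast = theta[:, None] * n[None, :]              # column times row via broadcasting
P_matmul = theta.reshape(-1, 1) @ n.reshape(1, -1)

T = np.cos(P_outer)
print(T.shape)
print(np.allclose(P_outer, P_bcast) and np.allclose(P_outer, P_matmul))
```

np.outer flattens its inputs, so it works whether the vectors are 1d, column-shaped, or row-shaped; the other two variants depend on the shapes being set up explicitly.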

extract value of a numeric array function in numpy

If I define a function with two arrays, for instance like this:
from numpy import *
x = arange(-10,10,0.1)
y = x**3
How can I extract the value of y(5.05) by interpolating between the two closest points, y(5) and y(5.1)? Currently, to find that value, I use this method:
y0 = y[x>5][0]
This should give me the value of y for x=5.1, but I think better methods exist, and they are probably the correct ones.
There's numpy.interp, if linear interpolation will suffice:
>>> import numpy as np
>>> x = np.arange(-10, 10, 0.1)
>>> y = x**3
>>> np.interp(5.05, x, y)
128.82549999999998
>>> 5.05**3
128.787625
And there are a bunch of tools in scipy for interpolation:
>>> import scipy.interpolate
>>> f = scipy.interpolate.UnivariateSpline(x, y)
>>> f
<scipy.interpolate.fitpack2.LSQUnivariateSpline object at 0xa85708c>
>>> f(5.05)
array(128.78762500000025)
There's a function for this in numpy/scipy:
import numpy as np
np.interp(5.05, x, y)
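To see what np.interp does for a single query point, the linear interpolation between the two bracketing samples can be written out by hand. A minimal sketch, assuming x is sorted in increasing order and the query point x0 lies strictly inside its range; the names x0, x_lo and x_hi are made up for illustration:

```python
import numpy as np

x = np.arange(-10, 10, 0.1)
y = x**3
x0 = 5.05

# index of the first sample with x[i] >= x0; x[i - 1] is the sample just below
i = np.searchsorted(x, x0)
x_lo, x_hi = x[i - 1], x[i]
y_lo, y_hi = y[i - 1], y[i]

# straight line through (x_lo, y_lo) and (x_hi, y_hi), evaluated at x0
y0 = y_lo + (x0 - x_lo) * (y_hi - y_lo) / (x_hi - x_lo)
print(np.isclose(y0, np.interp(x0, x, y)))
```

In practice np.interp is preferable, since it also handles arrays of query points and values outside the sampled range.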
