I have a condensed distance matrix from scipy that I need to pass to a C function that requires the matrix be converted to the lower triangle read by rows. For example:
0 1 2 3
0 4 5
0 6
0
The condensed form of this is: [1,2,3,4,5,6] but I need to convert it to
0
1 0
2 4 0
3 5 6 0
The lower triangle read by rows is: [1,2,4,3,5,6].
I was hoping to convert the compact distance matrix to this form without creating a redundant matrix.
Here's a quick implementation--but it creates the square redundant distance matrix as an intermediate step:
In [128]: import numpy as np
In [129]: from scipy.spatial.distance import squareform
c is the condensed form of the distance matrix:
In [130]: c = np.array([1, 2, 3, 4, 5, 6])
d is the redundant square distance matrix:
In [131]: d = squareform(c)
Here's your condensed lower triangle distances:
In [132]: d[np.tril_indices(d.shape[0], -1)]
Out[132]: array([1, 2, 4, 3, 5, 6])
Here's a method that avoids forming the redundant distance matrix. The function condensed_index(i, j, n) takes the row i and column j of the redundant distance matrix, with j > i, and returns the corresponding index in the condensed distance array.
In [169]: def condensed_index(i, j, n):
     ...:     return n*i - i*(i+1)//2 + j - i - 1
     ...:
As above, c is the condensed distance array.
In [170]: c
Out[170]: array([1, 2, 3, 4, 5, 6])
In [171]: n = 4
In [172]: i, j = np.tril_indices(n, -1)
Note that the arguments are reversed in the following call:
In [173]: indices = condensed_index(j, i, n)
indices gives the desired permutation of the condensed distance array.
In [174]: c[indices]
Out[174]: array([1, 2, 4, 3, 5, 6])
(Basically the same function as condensed_index(i, j, n) was given in several answers to this question.)
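As a self-contained check of the index formula above, here is a minimal sketch that wraps the steps into one helper and compares it with the squareform route for a larger n. The helper name condensed_to_lower_by_rows is hypothetical (it is not a SciPy function), and n is assumed to be known by the caller.

```python
import numpy as np
from scipy.spatial.distance import squareform

def condensed_index(i, j, n):
    # Index into the condensed array for row i, column j (requires i < j).
    return n*i - i*(i+1)//2 + j - i - 1

def condensed_to_lower_by_rows(c, n):
    # Hypothetical helper: reorder condensed distances into the strict
    # lower triangle read by rows, without building the square matrix.
    rows, cols = np.tril_indices(n, -1)   # rows > cols for every pair
    return c[condensed_index(cols, rows, n)]

c = np.array([1, 2, 3, 4, 5, 6])
result = condensed_to_lower_by_rows(c, 4)

# Cross-check against the squareform-based version for n = 7.
n_big = 7
c_big = np.arange(1, n_big*(n_big - 1)//2 + 1)
d_big = squareform(c_big)
expected_big = d_big[np.tril_indices(n_big, -1)]
result_big = condensed_to_lower_by_rows(c_big, n_big)
```

The only temporary arrays are the two index vectors of length n*(n-1)/2, so memory use stays proportional to the condensed array.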
How to perform pairwise operations on more than 2 lists
Example
If my matrix has 2 lists (L, M), I calculate the dot products and the result is [[M.M, M.L], [L.M, L.L]].
How can I calculate the same operation for matrices that have more than 2 lists, so that the result is a symmetric matrix?
x = np.array([[1, 3, 5],[1, 4, 5],[2,6,10]])
How can I perform this pairwise analysis?
Solution 1: An alternative to the brute-force loop in Solution 2 is np.einsum, although that function is not simple to use. This guide explains how it works: https://ajcr.net/Basic-guide-to-einsum/. See Solution 2 for how matrix is defined.
np.einsum('ij,jk', matrix,matrix.T)
Out[35]:
array([[35, 38],
[38, 42]])
matrix = np.array([L, M, N]) # matrix with 3 lists
np.einsum('ij,jk', matrix,matrix.T)
Out[37]:
array([[ 35, 38, 70],
[ 38, 42, 76],
[ 70, 76, 140]])
Solution 2 for smaller matrices. Explanation below:
def dot_pairwise(matrix):
    return [[np.dot(i, j) for j in matrix] for i in matrix]
dot_pairwise(matrix)
Explanation:
import numpy as np
L = np.array([1, 3, 5])
M = np.array([1, 4, 5])
N = np.array([2, 6, 10])
matrix = np.array([L, M, N]) # matrix with 3 lists
# matrix = np.array([L, M]) # matrix with 2 lists to replicate your example
# Initialize an empty result list
result = []
for i in matrix:
    row = []  # Initialize an empty row
    for j in matrix:
        # Calculate the dot product between the ith and jth lists using numpy.dot
        print(i, j)  # to print the pair of lists being multiplied
        dot_product = np.dot(i, j)
        row.append(dot_product)  # Add the dot product to the row
    result.append(row)  # Add the row to the result
print(result)  # [[LL, LM, LN], [ML, MM, MN], [NL, NM, NN]]
This is the result using L, M matrix:
[1 3 5] [1 3 5] LL
[1 3 5] [1 4 5] LM
[1 4 5] [1 3 5] ML
[1 4 5] [1 4 5] MM
[[35, 38], [38, 42]] # dot products
Alternative from this answer, slightly changed:
np.tensordot(x, x, axes=(1,1))
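To see that the approaches above agree, here is a small sketch that computes the same pairwise dot-product matrix four ways on the x from the question. The variable names are illustrative only; the einsum subscripts 'ij,kj->ik' are an equivalent way of writing the product with the transpose.

```python
import numpy as np

x = np.array([[1, 3, 5], [1, 4, 5], [2, 6, 10]])

# Plain matrix product with the transpose.
gram_matmul = x @ x.T

# einsum: contract over the shared column index j.
gram_einsum = np.einsum('ij,kj->ik', x, x)

# tensordot: contract axis 1 of x with axis 1 of x.
gram_tensordot = np.tensordot(x, x, axes=(1, 1))

# Explicit double loop, as in Solution 2.
gram_loop = np.array([[np.dot(i, j) for j in x] for i in x])
```

All four results are symmetric because np.dot(i, j) equals np.dot(j, i) for 1D vectors.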
Equivalent transform for vectorized solution
For a given symmetric 4x4 matrix Q and a 3x4 matrix P the 3x3 matrix C is obtained through
C = P @ Q @ P.T
It can be shown that the output C will be symmetric again. The same problem can be formulated using only the unique elements in Q and C exploiting their symmetry. To do so, the matrices are vectorized as seen below.
I want to construct a matrix B that maps the vectorized matrices onto each other like so:
c = B @ q
B must be a 6x10 matrix and should be constructible from P only. How can I get B from P?
I tried this, but it doesn't seem to work. Maybe someone has experienced a similar problem?
import numpy as np
def vectorize(A, ord='c'):
    """
    Symmetric matrix to vector e.g:
    [[1, 2, 3],
     [2, 4, 5],
     [3, 5, 6]] -> [1, 2, 3, 4, 5, 6] (c-order, row-col)
                -> [1, 2, 4, 3, 5, 6] (f-order, col-row)
    """
    # upper triangle mask
    m = np.triu(np.ones_like(A, dtype=bool)).flatten(order=ord)
    return A.flatten(order=ord)[m]
def B(P):
    B = np.zeros((6, 10))
    counter = 0
    # the i,j entry in C depends on rows i and j of P
    for i in range(3):
        for j in range(i, 3):
            coeffs = np.outer(P[i], P[j])
            B[counter] = vectorize(coeffs)
            counter += 1
    return B
if __name__ == '__main__':
    # original transform
    P = np.arange(12).reshape((3, 4))
    # calculated transform for vectorized matrix
    _B = B(P)
    # some random symmetric matrix
    Q = np.array([[1, 2, 3, 4],
                  [2, 5, 6, 7],
                  [3, 6, 8, 9],
                  [4, 7, 9, 10]])
    # if B is an equivalent transform to P, these should be the same
    C = P @ Q @ P.T
    c = _B @ vectorize(Q)
    print(f"q: {vectorize(Q)}\n"
          f"C: {vectorize(C)}\n"
          f"c: {c}")
Output:
q: [ 1 2 3 4 5 6 7 8 9 10]
C: [ 301 949 2973 1597 4997 8397]
c: [ 214 542 870 1946 3154 5438] <-- not the same
import numpy as np
def vec_from_mat(A, order='c'):
    """
    packs the unique elements of symmetric matrix A into a vector
    :param A: symmetric matrix
    :return: 1D array with the upper triangle of A, read by rows
    """
    return A[np.triu_indices(A.shape[0])].flatten(order=order)
def B_from_P(P):
    """
    returns a 6x10 matrix that maps the 10 unique elements of a symmetric 4x4 matrix Q on the 6 unique elements of a
    3x3 matrix C to linearize the equation C = P Q P^T to c = B q
    :param P: 3x4 matrix
    :return: B with shape (6, 10)
    """
    n, m = P.shape
    b1, b2 = (n * (n + 1) // 2), (m * (m + 1) // 2)
    B = np.zeros((b1, b2))
    for a, (i, j) in enumerate(zip(*np.triu_indices(n))):
        coeffs = np.outer(P[i], P[j])
        # collect coefficients from lower and upper triangle of symmetric matrix
        B[a] = vec_from_mat(coeffs) + vec_from_mat(np.triu(coeffs.T, k=1))
    return B
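A quick way to convince yourself that this construction is right is to compare both sides of c = B q on the P and Q from the question. The sketch below restates the two functions in compact form so it runs on its own; the names c_direct and c_mapped are illustrative.

```python
import numpy as np

def vec_from_mat(A):
    # Upper triangle (including the diagonal) of A, read by rows.
    return A[np.triu_indices(A.shape[0])]

def B_from_P(P):
    n, m = P.shape
    B = np.zeros((n * (n + 1) // 2, m * (m + 1) // 2))
    for a, (i, j) in enumerate(zip(*np.triu_indices(n))):
        coeffs = np.outer(P[i], P[j])
        # An off-diagonal Q[k, l] appears twice in the sum (as Q[k, l] and
        # Q[l, k]), so its two coefficients are added together.
        B[a] = vec_from_mat(coeffs) + vec_from_mat(np.triu(coeffs.T, k=1))
    return B

P = np.arange(12).reshape((3, 4))
Q = np.array([[1, 2, 3, 4],
              [2, 5, 6, 7],
              [3, 6, 8, 9],
              [4, 7, 9, 10]])

c_direct = vec_from_mat(P @ Q @ P.T)        # vectorize the full product
c_mapped = B_from_P(P) @ vec_from_mat(Q)    # map the vectorized Q with B
```

The original attempt only used the upper triangle of coeffs, which drops the contribution of the mirrored entries of Q; adding the transposed strict upper triangle restores it.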
I'd like to symmetrically permute a sparse matrix, permuting rows and columns in the same way. For example, I would like to rotate the rows and columns, which takes:
1 2 3
0 1 0
0 0 1
to
1 0 0
0 1 0
2 3 1
In Octave or MATLAB, one can do this concisely with matrix indexing:
A = sparse([1 2 3; 0 1 0; 0 0 1]);
perm = [2 3 1];
Aperm = A(perm,perm);
I am interested in doing this in Python, with NumPy/SciPy. Here is an attempt:
#!/usr/bin/env python
import numpy as np
from scipy.sparse import csr_matrix
row = np.array([0, 0, 0, 1, 2])
col = np.array([0, 1, 2, 1, 2])
data = np.array([1, 2, 3, 1, 1])
A = csr_matrix((data, (row, col)), shape=(3, 3))
p = np.array([1, 2, 0])
#Aperm = A[p,p] # gives [1,1,1], the permuted diagonal
Aperm = A[:,p][p,:] # works, but more verbose
Is there a cleaner way to accomplish this sort of symmetric permutation of a matrix?
(I'm more interested in concise syntax than I am in performance)
In MATLAB
A(perm,perm)
is a block operation. In numpy A[perm,perm] selects elements on the diagonal.
A[perm[:,None], perm]
is the block indexing. The MATLAB diagonal requires something like sub2ind. What's concise in one is more verbose in the other, and v.v.
Actually numpy is using the same logic in both cases. It 'broadcasts' one index against the other: an (n,) against an (n,) in the diagonal case, and an (n,1) against a (1,n) in the block case. The results are (n,) and (n,n) shaped.
This numpy indexing works with sparse matrices as well, though it isn't as fast. It actually uses matrix multiplication to do this sort of indexing, with an 'extractor' matrix based on the indices (maybe two of them, as in M*A*M.T).
MATLAB's documentation about a permutation matrix:
https://www.mathworks.com/help/matlab/math/sparse-matrix-operations.html#f6-13070
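The difference between diagonal and block indexing, and the permutation-matrix idea, can be sketched on the 3x3 example from the question. The extractor matrix M below is built by hand for illustration; it is an assumption about how such a matrix could look, not a copy of what scipy does internally.

```python
import numpy as np
from scipy.sparse import csr_matrix

A_dense = np.array([[1, 2, 3], [0, 1, 0], [0, 0, 1]])
A = csr_matrix(A_dense)
p = np.array([1, 2, 0])

# (n,) index against (n,) index: picks A[p[k], p[k]], the permuted diagonal.
diag_perm = A_dense[p, p]

# (n,1) index against (n,) index: broadcasts to the full (n,n) block.
block_perm = A_dense[p[:, None], p]
block_ix = A_dense[np.ix_(p, p)]      # np.ix_ builds the same index pair

# Sparse route 1: two slicing steps, as in the question.
sparse_two_step = A[:, p][p, :].toarray()

# Sparse route 2: hand-built permutation matrix with M[k, p[k]] = 1,
# so that (M @ A @ M.T)[i, j] == A[p[i], p[j]].
n = A.shape[0]
M = csr_matrix((np.ones(n, dtype=A.dtype), (np.arange(n), p)), shape=(n, n))
sparse_perm = (M @ A @ M.T).toarray()
```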
I am attempting to generalize some Python code to operate on arrays of arbitrary dimension. The operations are applied to each vector in the array. So for a 1D array, there is simply one operation, for a 2-D array it would be both row and column-wise (linearly, so order does not matter). For example, a 1D array (a) is simple:
b = operation(a)
where 'operation' is expecting a 1D array. For a 2D array, the operation might proceed as
for ii in range(0,a.shape[0]):
    b[ii,:] = operation(a[ii,:])
for jj in range(0,b.shape[1]):
    c[:,jj] = operation(b[:,jj])
I would like to make this general where I do not need to know the dimension of the array beforehand, and not have a large set of if/elif statements for each possible dimension.
Solutions that are general for 1 or 2 dimensions are ok, though a completely general solution would be preferred. In reality, I do not imagine needing this for any dimension higher than 2, but if I can see a general example I will learn something!
Extra information:
I have a matlab code that uses cells to do something similar, but I do not fully understand how it works. In this example, each vector is rearranged (basically the same function as fftshift in numpy.fft). Not sure if this helps, but it operates on an array of arbitrary dimension.
function aout=foldfft(ain)
nd = ndims(ain);
for k = 1:nd
    nx = size(ain,k);
    kx = floor(nx/2);
    idx{k} = [kx:nx 1:kx-1];
end
aout = ain(idx{:});
In Octave, your MATLAB code does:
octave:19> size(ain)
ans =
2 3 4
octave:20> idx
idx =
{
[1,1] =
1 2
[1,2] =
1 2 3
[1,3] =
2 3 4 1
}
and then it uses the idx cell array to index ain. With these dimensions it 'rolls' the size 4 dimension.
For 5 and 6 the index lists would be:
2 3 4 5 1
3 4 5 6 1 2
The equivalent in numpy is:
In [161]: ain=np.arange(2*3*4).reshape(2,3,4)
In [162]: idx=np.ix_([0,1],[0,1,2],[1,2,3,0])
In [163]: idx
Out[163]:
(array([[[0]],
[[1]]]), array([[[0],
[1],
[2]]]), array([[[1, 2, 3, 0]]]))
In [164]: ain[idx]
Out[164]:
array([[[ 1, 2, 3, 0],
[ 5, 6, 7, 4],
[ 9, 10, 11, 8]],
[[13, 14, 15, 12],
[17, 18, 19, 16],
[21, 22, 23, 20]]])
Besides the 0 based indexing, I used np.ix_ to reshape the indexes. MATLAB and numpy use different syntax to index blocks of values.
The next step is to construct [0,1],[0,1,2],[1,2,3,0] with code, a straightforward translation.
I can use np.r_ as a shortcut for turning 2 slices into an index array:
In [201]: idx=[]
In [202]: for nx in ain.shape:
   .....:     kx = int(np.floor(nx/2.))
   .....:     kx = kx-1;
   .....:     idx.append(np.r_[kx:nx, 0:kx])
   .....:
In [203]: idx
Out[203]: [array([0, 1]), array([0, 1, 2]), array([1, 2, 3, 0])]
and pass this through np.ix_ to make the appropriate index tuple:
In [204]: ain[np.ix_(*idx)]
Out[204]:
array([[[ 1, 2, 3, 0],
[ 5, 6, 7, 4],
[ 9, 10, 11, 8]],
[[13, 14, 15, 12],
[17, 18, 19, 16],
[21, 22, 23, 20]]])
In this case, where 2 dimensions don't roll anything, slice(None) could replace those:
In [210]: idx=(slice(None),slice(None),[1,2,3,0])
In [211]: ain[idx]
======================
np.roll does:
indexes = concatenate((arange(n - shift, n), arange(n - shift)))
res = a.take(indexes, axis)
np.apply_along_axis is another function that constructs an index array (and turns it into a tuple for indexing).
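Since np.roll builds the same kind of index array internally, the whole translation can also be sketched as one np.roll call per axis. The function name foldfft_roll is hypothetical, and the shift of nx//2 - 1 mirrors the 0-based start index derived above; the comparison against np.ix_ assumes every dimension has at least 2 elements.

```python
import numpy as np

def foldfft_roll(ain):
    # Hypothetical variant: roll each axis so that the element at 0-based
    # position nx//2 - 1 moves to the front, as the MATLAB idx{k} does.
    out = ain
    for axis, nx in enumerate(ain.shape):
        start = nx // 2 - 1
        out = np.roll(out, -start, axis=axis)
    return out

ain = np.arange(2 * 3 * 4).reshape(2, 3, 4)
rolled = foldfft_roll(ain)

# Same result via explicit index arrays and np.ix_, for comparison.
idx = [np.r_[nx // 2 - 1:nx, 0:nx // 2 - 1] for nx in ain.shape]
indexed = ain[np.ix_(*idx)]
```

Because the loop runs over ain.shape, the function does not need to know the number of dimensions in advance.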
If you are looking for a programmatic way to index the k-th dimension of an n-dimensional array, then numpy.take might help you.
An implementation of foldfft is given below as an example:
In[1]:
import numpy as np
def foldfft(ain):
    result = ain
    nd = len(ain.shape)
    for k in range(nd):
        nx = ain.shape[k]
        kx = (nx+1)//2
        shifted_index = list(range(kx,nx)) + list(range(kx))
        result = np.take(result, shifted_index, k)
    return result
a = np.indices([3,3])
print("Shape of a = ", a.shape)
print("\nStarting array:\n\n", a)
print("\nFolded array:\n\n", foldfft(a))
Out[1]:
Shape of a = (2, 3, 3)
Starting array:
[[[0 0 0]
[1 1 1]
[2 2 2]]
[[0 1 2]
[0 1 2]
[0 1 2]]]
Folded array:
[[[2 0 1]
[2 0 1]
[2 0 1]]
[[2 2 2]
[0 0 0]
[1 1 1]]]
You could use numpy.ndarray.flat, which allows you to iterate linearly over an n-dimensional numpy array. Note that this visits single elements, so it only fits operations that act element by element. Your code should then look something like this:
b = np.asarray(x)
for i in range(len(x.flat)):
    b.flat[i] = operation(x.flat[i])
The folks above provided multiple appropriate solutions. For completeness, here is my final solution. In this toy example for the case of 3 dimensions, the function 'ops' replaces the first and last element of a vector with 1.
import numpy as np
def ops(s):
    s[0]=1
    s[-1]=1
    return s
a = np.random.rand(4,4,3)
print('------')
print('Array a')
print(a)
print('------')
for ii in range(a.ndim):
    a = np.apply_along_axis(ops,ii,a)
    print('------')
    print(' Axis',str(ii))
    print(a)
    print('------')
    print(' ')
The resulting 3D array has a 1 in every element on the 'border' with the numbers in the middle of the array unchanged. This is of course a toy example; however ops could be any arbitrary function that operates on a 1D vector.
Flattening the vector will also work; I chose not to pursue that simply because the book-keeping is more difficult and apply_along_axis is the simplest approach.
apply_along_axis reference page
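The loop over axes can be packaged so that the caller never has to mention the number of dimensions. This is a minimal sketch under the assumption that the operation returns a vector of the same length as its input; the names apply_over_all_axes and set_ends_to_one are hypothetical.

```python
import numpy as np

def apply_over_all_axes(operation, a):
    # Apply a 1D -> 1D operation along every axis in turn.
    out = np.asarray(a)
    for axis in range(out.ndim):
        out = np.apply_along_axis(operation, axis, out)
    return out

def set_ends_to_one(v):
    # Toy operation: work on a copy so the caller's data is not modified.
    v = v.copy()
    v[0] = 1
    v[-1] = 1
    return v

a = np.random.rand(4, 4, 3)
b = apply_over_all_axes(set_ends_to_one, a)

# The same call works unchanged for a 1D input.
one_d = apply_over_all_axes(set_ends_to_one, np.zeros(5))
```

np.apply_along_axis calls the operation once per 1D slice in Python, so for large arrays a vectorized operation that accepts an axis argument will usually be faster.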
I have a sparse matrix in CSR format in Python and I want to import it into MATLAB. MATLAB does not have a CSR sparse format; it has only one sparse format for all kinds of matrices. Since the matrix is very large in dense format, how can I import it as a MATLAB sparse matrix?
scipy.io.savemat saves sparse matrices in a MATLAB-compatible format:
In [1]: from scipy.io import savemat, loadmat
In [2]: from scipy import sparse
In [3]: M = sparse.csr_matrix(np.arange(12).reshape(3,4))
In [4]: savemat('temp', {'M':M})
In [8]: x=loadmat('temp.mat')
In [9]: x
Out[9]:
{'M': <3x4 sparse matrix of type '<type 'numpy.int32'>'
with 11 stored elements in Compressed Sparse Column format>,
'__globals__': [],
'__header__': 'MATLAB 5.0 MAT-file Platform: posix, Created on: Mon Sep 8 09:34:54 2014',
'__version__': '1.0'}
In [10]: x['M'].A
Out[10]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
Note that savemat converted it to csc. It also transparently takes care of the index starting point difference.
And in Octave:
octave:4> load temp.mat
octave:5> M
M =
Compressed Column Sparse (rows = 3, cols = 4, nnz = 11 [92%])
(2, 1) -> 4
(3, 1) -> 8
(1, 2) -> 1
(2, 2) -> 5
...
octave:8> full(M)
ans =
0 1 2 3
4 5 6 7
8 9 10 11
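The session above can be condensed into a script that writes the file and reads it back as a round-trip check. This is a sketch only: the temporary directory and the file name temp.mat are arbitrary choices, and the comparison goes through dense arrays, which is fine for a small test matrix but not for a very large one.

```python
import os
import tempfile

import numpy as np
from scipy import sparse
from scipy.io import savemat, loadmat

M = sparse.csr_matrix(np.arange(12).reshape(3, 4))

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'temp.mat')
    savemat(path, {'M': M})        # stored in MATLAB's sparse format
    loaded = loadmat(path)['M']    # comes back as a scipy sparse object
    roundtrip_ok = (sparse.issparse(loaded)
                    and np.array_equal(loaded.toarray(), M.toarray()))
```

In MATLAB or Octave the same file can then be read with load, as in the Octave session above.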
The Matlab and Scipy sparse matrix formats are compatible. You need to get the data, indices and matrix size of the matrix in Scipy and use them to create a sparse matrix in Matlab. Here's an example:
from scipy.sparse import csr_matrix
from numpy import array
# create a sparse matrix
row = array([0,0,1,2,2,2])
col = array([0,2,2,0,1,2])
data = array([1,2,3,4,5,6])
mat = csr_matrix( (data,(row,col)), shape=(3,4) )
# get the data, shape and indices
(m,n) = mat.shape
s = mat.data
i = mat.tocoo().row
j = mat.indices
# display the matrix
print(mat)
Which prints out:
(0, 0) 1
(0, 2) 2
(1, 2) 3
(2, 0) 4
(2, 1) 5
(2, 2) 6
Use the values m, n, s, i, and j from Python to create a matrix in Matlab:
m = 3;
n = 4;
s = [1, 2, 3, 4, 5, 6];
% Index from 1 in Matlab.
i = [0, 0, 1, 2, 2, 2] + 1;
j = [0, 2, 2, 0, 1, 2] + 1;
S = sparse(i, j, s, m, n, m*n)
Which gives the same matrix, only indexed from 1.
(1,1) 1
(3,1) 4
(3,2) 5
(1,3) 2
(2,3) 3
(3,3) 6
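A slightly more uniform way to get the triplets is to take row, column, and data all from the COO form, so the three arrays are guaranteed to line up. The sketch below also shifts the indices to 1-based for MATLAB; the variable names i_matlab and j_matlab are illustrative, and the dense rebuild at the end is only there as a sanity check.

```python
import numpy as np
from scipy.sparse import csr_matrix

row = np.array([0, 0, 1, 2, 2, 2])
col = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])
mat = csr_matrix((data, (row, col)), shape=(3, 4))

# COO form exposes matching row, col and data arrays.
coo = mat.tocoo()
m, n = mat.shape
s = coo.data
i_matlab = coo.row + 1   # MATLAB indexes from 1
j_matlab = coo.col + 1

# Sanity check: rebuild a dense matrix from the 1-based triplets.
rebuilt = np.zeros((m, n), dtype=s.dtype)
rebuilt[i_matlab - 1, j_matlab - 1] = s
```

The values m, n, s, i_matlab and j_matlab are then what would be passed to sparse(i, j, s, m, n) on the MATLAB side.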