Replace elements in a sparse matrix created by SciPy (Python)

I have a huge sparse matrix in SciPy and I would like to replace many of its elements with a given value (say -1).
Is there a more efficient way to do it than using:
SM[[rows], [columns]] = -1
Here is an example:
Nr = seg.shape[0]  # size ~= 50000
Im1 = sparse.csr_matrix(np.append(np.array([-1]), np.zeros([1, Nr-1])))
Im1 = sparse.csr_matrix(sparse.vstack([Im1, sparse.eye(Nr)]))
Im1[prev[1::]-1, Num[1::]-1] = -1  # this line is very slow
Im2 = sparse.vstack([sparse.csr_matrix(np.zeros([1, Nr])), sparse.eye(Nr)])
IM = sparse.hstack([Im1, Im2])  # final result

I've played around with your sparse arrays. I'd encourage you to do some timings on smaller sizes, to see how different methods and sparse types behave. I like to use timeit in IPython.
Nr=10 # seg.shape[0] #size ~=50000
Im2=sparse.vstack([sparse.csr_matrix(np.zeros([1,Nr])),sparse.eye(Nr)])
Im2 has a zero first row, and an offset diagonal on the rest. So it's simpler, though not much faster, to start with an empty sparse matrix:
X = sparse.vstack([sparse.csr_matrix((1,Nr)),sparse.eye(Nr)])
Or use diags to construct the offset diagonal directly:
X = sparse.diags([1],[-1],shape=(Nr+1, Nr))
Im1 is similar, except it has a -1 in the (0,0) slot. How about stacking 2 diagonal matrices?
X = sparse.vstack([sparse.diags([-1],[0],(1,Nr)),sparse.eye(Nr)])
Or make the offset diagonal (copy Im2?), and modify [0,0]. A csr matrix gives an efficiency warning, recommending the use of lil format. It does, though, take some time to convert with tolil().
X = sparse.diags([1],[-1],shape=(Nr+1, Nr)).tolil()
X[0,0] = -1 # slow warning with csr
Let's try your larger insertions:
prev = np.arange(Nr-2) # what are these like?
Num = np.arange(Nr-2)
Im1[prev[1::]-1,Num[1::]-1]=-1
With Nr=10, and various Im1 formats:
lil - 267 us
csr - 1.44 ms
coo - not supported
todense - 25 us
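For reference, here is a rough sketch of how timings like these could be gathered with the plain timeit module instead of IPython's %timeit (the assign helper below is made up for illustration):
import timeit
import numpy as np
from scipy import sparse

Nr = 10
prev = np.arange(Nr - 2)
Num = np.arange(Nr - 2)

def assign(fmt):
    # rebuild Im1 in the requested format, then do the fancy-indexed assignment
    # (csr will emit a SparseEfficiencyWarning here)
    Im1 = sparse.vstack([sparse.diags([-1], [0], (1, Nr)),
                         sparse.eye(Nr)]).asformat(fmt)
    Im1[prev[1:] - 1, Num[1:] - 1] = -1

for fmt in ('lil', 'csr'):
    print(fmt, timeit.timeit(lambda: assign(fmt), number=100) / 100)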
OK, I've picked prev and Num such that I end up modifying diagonals of Im1. In this case it would be faster to construct those diagonals right from the start.
X2 = Im1.todia()
print(X2.data)
[[ 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
[-1. -1. -1. -1. -1. -1. -1. 0. 0. 0.]]
print(X2.offsets)
[-1 0]
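Building on that, a minimal sketch of constructing both diagonals up front with diags instead of assigning into an existing matrix (assuming prev and Num are the aranges used above):
import numpy as np
from scipy import sparse

Nr = 10
# main diagonal: -1 wherever the fancy assignment above would have written,
# i.e. the first Nr-3 entries; offset -1 diagonal: the 1s of the shifted identity.
main = np.zeros(Nr)
main[:Nr - 3] = -1
X = sparse.diags([main, np.ones(Nr)], [0, -1],
                 shape=(Nr + 1, Nr), format='csr')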
You may have to learn how various sparse formats are stored. csr and csc are a bit complex, designed for fast linear algebra operations. lil, dia, coo are simpler to understand.
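A tiny illustrative example (arbitrary values) of what csr actually stores:
import numpy as np
from scipy import sparse

M = sparse.csr_matrix(np.array([[1, 0, 2],
                                [0, 0, 3]]))
print(M.data)     # [1 2 3]  non-zero values, stored row by row
print(M.indices)  # [0 2 2]  column index of each stored value
print(M.indptr)   # [0 2 3]  where each row starts and ends within data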

Related

Computing pairwise accuracy/comparison between many arrays

Let's say I have several arrays, where each array is the same length. I am working with binary-valued (values are 0 or 1) arrays which might simplify the problem, so it's okay if the proposed solution makes use of this property.
I want to compute pairwise accuracies between each pair of arrays, where accuracy can be thought of as the proportion of times the elements in two arrays are equal. So here is a simple example where I am using a list of lists format. Let's say A = [[1,1,1], [0,1,0], [1,1,0]]. We would want to output:
1. , 1/3, 2/3
1/3, 1., 2/3
2/3, 2/3, 1.
I can compute this using multiple loops (iterating over each pair of arrays, and over each index). However, are there built-in functions or a library (e.g. numpy) that can help do this more cleanly and efficiently?
You can use broadcasting:
import numpy as np
A = np.array([[1,1,1], [0,1,0], [1,1,0]])
output = A[:,None,:] == A[None,:,:]
output = output.sum(axis=2) / 3
print(output)
# [[1. 0.33333333 0.66666667]
# [0.33333333 1. 0.66666667]
# [0.66666667 0.66666667 1. ]]
I'd suggest
A = np.array(A)
-1 * np.linalg.norm(A[:, None, :] - A[None, :, :], axis=-1, ord=1) / A.shape[1] + 1
which leverages NumPy's linalg.norm; the division is by A.shape[1], the length of each array.
Pairwise accuracy seemingly refers to the relative number of coinciding elements between two vectors. In that case, you compute
1 - HammingDistance(v1, v2) / len(v2)
where the Hamming distance counts the (absolute) number of indices with non-equal values. It is emulated here by using the 1-norm via ord=1.
However, if you'd prefer to leverage the binary structure of your vectors without invoking NumPy's linear algebra, but merely its broadcasting capability,
A = np.array(A)
-1 * (A[:, None, :] != A).sum(2) / A.shape[1] + 1
will equally do.
Naturally, both code snippets require the lists (i.e. vectors) in your code to have the same length. However, measuring distance (and, in turn, similarity) in a mathematically rigorous way is non-trivial anyway when that is not the case.

Numpy Matrix Determinant Not Working as Expected?

I have a question in which I am asked to show that the determinant of matrix B equals 0. Matrix B is defined as:
import numpy as np
from numpy import linalg as m
B = np.array([[-1-3.j, -8-10.j, 0-3.j],
              [-7-3.j, -4-9.j, -3-2.j],
              [11-3.j, -16-12.j, 6-5.j]])
print(B)
[[ -1. -3.j  -8.-10.j   0. -3.j]
 [ -7. -3.j  -4. -9.j  -3. -2.j]
 [ 11. -3.j -16.-12.j   6. -5.j]]
The determinant is straightforward using numpy:
m.det(B)
(-8.126832540256171e-14-1.5987211554602298e-14j)
Which is clearly not equal to zero.
I double checked my answer using https://www.symbolab.com/ and the determinant is definitely zero.
I feel like I am doing something ridiculously silly, but can't quite figure out what. Any help?
What you're seeing are really tiny numbers that are almost equal to zero. They're not exactly equal to zero only due to numerical inaccuracies.
That's why we usually don't test them for equality but for closeness:
np.allclose(np.linalg.det(B), 0)  # True
To expand a little on Nils' answer:
There are various ways to compute determinants. The way taught in algebra classes -- Laplace expansion -- is a reasonable way to go for small (e.g. 3 x 3) matrices, but rapidly becomes impossible -- because of the number of computations required -- for larger matrices.
In your case, where all the real and imaginary parts are small integers, such a computation would evaluate the determinant exactly, as 0.
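A minimal sketch of that exact computation, hand-coding the cofactor expansion for the 3 x 3 case (det3 is just an illustrative helper):
import numpy as np

B = np.array([[-1-3j, -8-10j, 0-3j],
              [-7-3j, -4-9j, -3-2j],
              [11-3j, -16-12j, 6-5j]])

def det3(m):
    # Laplace (cofactor) expansion along the first row; with such small
    # integer real/imaginary parts every intermediate product is exact.
    return (m[0, 0] * (m[1, 1] * m[2, 2] - m[1, 2] * m[2, 1])
            - m[0, 1] * (m[1, 0] * m[2, 2] - m[1, 2] * m[2, 0])
            + m[0, 2] * (m[1, 0] * m[2, 1] - m[1, 1] * m[2, 0]))

print(det3(B))  # 0j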
In Python, linalg.det uses a different approach: it factorises the matrix into factors -- triangular matrices and permutations -- whose determinants can easily be computed, and then the determinant of the product is the product of the determinants of the factors. This is an order N cubed computation, and so can be used for even quite large matrices.
However such factorisations are (a little) inaccurate; the original matrix will not be exactly equal to the product. Thus the determinant will also be, most likely, a little inaccurate.
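A sketch of that idea using scipy.linalg.lu (np.linalg.det itself goes through LAPACK's LU routine, but the principle is the same):
import numpy as np
from scipy.linalg import lu

P, L, U = lu(B)                      # B is approximately P @ L @ U
sign = np.linalg.det(P)              # +1 or -1 for a permutation matrix
print(sign * np.prod(np.diag(U)))    # L has a unit diagonal
# a tiny non-zero complex number, comparable to np.linalg.det(B)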

numpy.linalg.eigvals shows different values for the transpose of a matrix - is this due to overflow?

I'm getting different answers coming from np.linalg.eigvals depending on whether I use the transpose of a matrix.
To replicate:
mat = np.array([[-7.00616288e-08, -2.79704289e-09,  1.67598654e-10],
                [-3.23676574e+07, -1.58978291e+15,  0.00000000e+00],
                [ 0.00000000e+00,  1.80156232e-02, -2.32851854e+07]])
print(np.linalg.eigvals(mat))
print(np.linalg.eigvals(mat.transpose()))
I get:
[ -7.00616288e-08 -1.58978291e+15 -2.32851854e+07]
[ -1.58978291e+15 2.50000000e-01 -2.32851854e+07]
Note that these values are different. Since the eigenvalues of a matrix and its transpose are identical, I assume that these issues are due to overflow. Is there some maximum value I should limit to, to make sure that this is always consistent?
Not due to an overflow. Overflow is easy to detect, and it generates a warning. The issue is the limit of double precision: significant digits can be lost when numbers of very different magnitudes are added, and then subtracted. For example, (1e20 + 1) - 1e20 == 0.
The second result, with 2 negative eigenvalues, is incorrect, because the determinant of your matrix is clearly negative: the product of main-diagonal entries is of magnitude 1e15 and dominates all other terms in the determinant by a large margin. So the sign of the determinant is the sign of this product, which is negative.
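A quick sanity check of that sign argument:
print(np.linalg.det(mat))  # roughly -2.6e+15, i.e. clearly negative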
The issue is that mat.T has all tiny entries in the first column, much smaller than those in other columns. When looking for a pivot, an algorithm may scan that column and settle for what is found there. This is not necessarily how .eigvals works, but the same principle applies -- numerical linear algebra algorithms tend to proceed from the upper left corner, so it's best to avoid small entries there. Here's one way to do that:
mat1 = np.roll(mat, 1, axis=[0, 1])
print(np.linalg.eigvals(mat1))
print(np.linalg.eigvals(mat1.T))
prints
[-7.00616288e-08 -2.32851854e+07 -1.58978291e+15]
[-2.32851854e+07 -1.58978291e+15 -7.00616288e-08]
which are consistent. Rolling both axes means conjugating mat by a permutation matrix, which does not change the eigenvalues. The rolled matrix is
[[-2.32851854e+07  0.00000000e+00  1.80156232e-02]
 [ 1.67598654e-10 -7.00616288e-08 -2.79704289e-09]
 [ 0.00000000e+00 -3.23676574e+07 -1.58978291e+15]]
which gives NumPy a nice large number to start with.
Ideally it would do something like that itself, but no (practical) algorithm is ideal for every situation.

Rotate transformation matrix to match reflection vector

There are a couple of ball-bounce related questions on Stack Overflow that I've looked through, however none of them seem to get me past my predicament. I have a turtle cursor defined by a transformation matrix that intersects a line in 3D space. What I want is to rotate the cursor, that is, the transformation matrix, at the point of intersection so that its new direction matches the reflection vector. I have functions that will get both the reflection vector R from the incident vector V and the normal of the reflecting line N. I normalize each before evaluating:
N,V=unit_vector(N),unit_vector(V)
R = -2*(np.dot(V,N))*N - V
R=unit_vector(R)
My transformation matrix, T, is in a numpy array:
array([[-0.84923515, -0.6       ,  0.        ,  3.65341878],
       [ 0.52801483, -0.84923515,  0.        , 25.12882224],
       [ 0.        ,  0.        ,  1.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        ,  1.        ]])
How can I transform T by R to get the correct direction vector? I've found and used the R2_vect function from here to get a rotation matrix from one vector to another, but only a few of the resulting reflections appear correct when I send them to VTK to render. I'm asking about this here because I seem to be reaching the limit of what I can remember from my already shaky linear algebra. Thanks for any information.
A little extra research clarified things: the first 3 columns of the transformation matrix represent 3 orthonormal vectors (x1, x2, x3), and the 4th column represents the coordinates in space of the cursor at a given time interval. The final row contains no data; it's just there to keep the matrix square. Rotating the vectors was just a matter of removing the last row of T, taking the 3x3 rotation matrix R from my listed function, and rotating each vector: R.dot(x1), R.dot(x2), R.dot(x3). Then I just had to composite the values back into a 4x4 matrix.
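For illustration, a minimal sketch of that recomposition, assuming R is the 3x3 rotation matrix obtained from R2_vect (rotate_transform is a hypothetical helper name):
import numpy as np

def rotate_transform(T, R):
    # Rotate the three orthonormal column vectors of T (its upper-left 3x3
    # block) and reassemble a 4x4 matrix, keeping the translation column.
    T_new = np.eye(4)
    T_new[:3, :3] = R @ T[:3, :3]   # same as rotating x1, x2, x3 individually
    T_new[:3, 3] = T[:3, 3]         # cursor position is left unchanged
    return T_new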

How to calculate (1 - SparseMatrix) of a huge sparse matrix?

I researched a lot on this but couldn't find a practical solution to this problem. I am using scipy to create a csr sparse matrix and want to subtract this matrix from an equivalent matrix of all ones. In scipy and numpy notation, if the matrix is not sparse, we can do so by simply writing 1 - MatrixVariable. However, this operation is not implemented if the matrix is sparse. I could just think of the following obvious solution:
Iterate through the entire sparse matrix, set all zero elements to 1 and all non-zero elements to 0.
But this would create a matrix where most elements are 1 and only a few are 0, which is no longer sparse and, due to its huge size, could not be converted to dense.
What could be an alternative and effective way of doing this?
Thanks.
Your new matrix will not be sparse, because it will have 1s everywhere, so you will need a dense array to hold it:
new_mat = np.ones(sps_mat.shape, sps_mat.dtype) - sps_mat.todense()
This requires that your matrix fits in memory. It actually requires that it fits in memory 3 times. If that is an issue, you can get it to be more efficient doing something like:
new_mat = sps_mat.todense()
new_mat *= -1
new_mat += 1
You can access the data from your sparse matrix as a 1D array so that:
ss.data *= -1
ss.data += 1
will work like 1 - ss, for all non-zero elements in your sparse matrix.
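A tiny demonstration (example data assumed) of why this touches only the stored entries:
import numpy as np
from scipy import sparse

ss = sparse.csr_matrix(np.array([[0., 2., 0.],
                                 [3., 0., 0.]]))
ss.data *= -1
ss.data += 1
print(ss.toarray())
# [[ 0. -1.  0.]
#  [-2.  0.  0.]]
# the zeros stay zero, so this is not the full dense 1 - matrix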
