I am trying to find the eigenvalues of many small matrices, while not trying to use a loop, with the intent to use CuPy later on.
Thus, I have tried to set up a large matrix that takes the matrices I want to solve as blocks on its diagonal. This matrix contains a lot of unnecessary zeros, so I use scipy.sparse.
All works well until I want to find the eigenvalues: the eigs() function calculates the full eigenvectors of the problem, even though most of their entries should also be zero.
import numpy as np
from scipy import sparse as sp
from scipy.sparse.linalg import spsolve, eigs
sigx=np.array([[0, 1],[1, 0]], dtype=np.complex128) # a 2x2 Pauli matrix
karray=np.arange(-np.pi, np.pi, np.pi/100) #200 elements
H_sci=sp.kron(sp.diags(karray), sigx) #The sparse matrix I want to find the eigenvalues to
H_reg=H_sci.toarray() #Converted into a regular numpy array to see the memory difference
print(H_sci.data.nbytes) #12800 = 2*2*200*16, reminder that 16 bytes = 128 bits --> saves 4 arrays of length 200
print(H_reg.nbytes) #2560000 = 2*2*200*200*16 --> saves the entire matrix
E_sci=eigs(H_sci, k=398) #throws an error for k=400 and 399, even though I should have 400 eigenvalues?
print(E_sci[1].data.nbytes) #2547200 --> as much as H_reg
Do I do something wrong? Is there an alternative approach to solving many matrices (here 2x2 for example) in parallel? I have used Numba for looping over the matrices before, but I would like to try to use my GPU to see whether I can speed this problem up, because I do not see why I should solve these matrices one after another.
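One loop-free alternative, sketched here under the assumption that every block is Hermitian (true for a real k times sigx), is to keep the blocks as a stacked array of shape (200, 2, 2) rather than a block-diagonal matrix. NumPy's eigvalsh treats the leading axis as a batch. The names H_stack and E_stack are illustrative and not from the question, and whether the equivalent CuPy call handles batches the same way is something to check for your CuPy version.

```python
import numpy as np

# Same 2x2 Pauli matrix and k grid as in the question.
sigx = np.array([[0, 1], [1, 0]], dtype=np.complex128)
karray = np.arange(-np.pi, np.pi, np.pi / 100)  # 200 elements

# Stack of 200 independent 2x2 matrices, shape (200, 2, 2).
H_stack = karray[:, None, None] * sigx

# eigvalsh treats the last two axes as matrices and the rest as a batch.
# It assumes each block is Hermitian, which holds for real k times sigx.
E_stack = np.linalg.eigvalsh(H_stack)  # shape (200, 2), ascending per block

print(E_stack.shape)
print(E_stack.nbytes)
```

Only the eigenvalues are stored, two per block, so the result does not grow with the square of the number of blocks.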
I have a matrix X and I need to write a function that calculates the trace of the matrix X X^T (X times its transpose).
I wrote the following script:
import numpy as np
def test(matrix):
    return (np.dot(matrix, matrix.T)).trace()
np.random.seed(42)
matrix = np.random.uniform(size=(1000, 1))
print(test(matrix))
It works fine on a small matrix, but when I try it on a large matrix (for example, one with shape (50000, 1)), it gives me a memory error.
I tried to find a solution to the problem in other questions on the site, but nothing helped me. I would be grateful for any advice!
The number you're trying to compute is just the sum of the squares of all entries of X. Sum the squares instead of computing a giant matrix product full of entries you don't want:
return (X**2).sum()
Or ravel the matrix and use dot, which is probably faster for contiguous X:
raveled = X.ravel()
return raveled.dot(raveled)
Actually, ravel is probably faster for non-contiguous X, too - even when ravel needs to copy, it's not doing more allocation than (X**2).sum().
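A minimal sketch comparing the three approaches, assuming a real-valued X. The function names are made up for illustration. The product-based version is only run on the small input, because it materializes the full n x n matrix.

```python
import numpy as np

def trace_via_product(X):
    # Builds the full (n, n) product; only feasible for small n.
    return np.dot(X, X.T).trace()

def trace_via_squares(X):
    # Sum of squared entries; allocates one temporary the size of X.
    return (X ** 2).sum()

def trace_via_ravel(X):
    # Dot product of the flattened matrix with itself.
    raveled = X.ravel()
    return raveled.dot(raveled)

rng = np.random.default_rng(42)
X_small = rng.uniform(size=(1000, 1))
small_results = (
    trace_via_product(X_small),
    trace_via_squares(X_small),
    trace_via_ravel(X_small),
)

# The shape from the question that triggered the memory error.
X_large = rng.uniform(size=(50000, 1))
large_result = trace_via_ravel(X_large)
print(small_results, large_result)
```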
import numpy
from scipy.spatial.distance import pdist
X = numpy.zeros(50000,25)
C = pdist(X, 'euclidian')
I want to find:
And then numpy gives the error: Array is too big.
I think the problem is the size of the array C: pdist cannot create a (50000, 50000) array. I don't know why numpy imposes this restriction; I can run the same code in MATLAB. How can I run this code using an array?
I also found possible duplicates, but the arrays in those questions are far larger:
Is it possible to create a 1million x 1 million matrix using numpy?
Very large matrices using Python and NumPy
First, there are a couple of typos in your code. It should be:
X = numpy.zeros((50000,25)) # it's a tuple going in
C = pdist(X, 'euclidean') # euclidean with an e
Of course, that does not matter for the question.
The Euclidean pdist is essentially a call to numpy.linalg.norm (http://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.norm.html), which is a very general function. If it does not work in your case because of memory constraints, you can always write something yourself. Two rows of X do not take much memory, and a single pairwise comparison is the norm of their difference:
numpy.sqrt(numpy.sum(numpy.square(X[0] - X[1])))
And then you only need to loop over all pairs of rows.
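A hedged sketch of that loop, processing a chunk of rows at a time rather than one pair at a time. It uses scipy.spatial.distance.cdist rather than a hand-written norm, and it assumes the caller reduces each block (here, to a running maximum) instead of storing all of them. The helper name iter_distance_blocks, the chunk size, and the small demo array are illustrative.

```python
import numpy as np
from scipy.spatial.distance import cdist

def iter_distance_blocks(X, chunk_size=1000):
    """Yield (start, block), where block[i, j] is the Euclidean distance
    between row start + i and row j of X."""
    for start in range(0, X.shape[0], chunk_size):
        yield start, cdist(X[start:start + chunk_size], X, metric="euclidean")

rng = np.random.default_rng(0)
X_demo = rng.random((500, 25))  # small stand-in for the (50000, 25) array

# Example reduction: largest pairwise distance, holding one block at a time.
max_dist = 0.0
for start, block in iter_distance_blocks(X_demo, chunk_size=128):
    max_dist = max(max_dist, block.max())
print(max_dist)
```

Peak memory is set by chunk_size times the number of rows, so the chunk size can be tuned to whatever fits.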
Hope it helps,
P
I have a loop that in each iteration gives me a column c of a sparse matrix N.
To assemble/grow/accumulate N column by column I thought of using
N = scipy.sparse.hstack([N, c])
To do this it would be nice to initialize the matrix with rows of length 0. However,
N = scipy.sparse.csc_matrix((4,0))
raises a ValueError: invalid shape.
Any suggestions on how to do this right?
You can't. Sparse matrices are restricted compared to NumPy arrays and in particular don't allow 0 for any axis. All sparse matrix constructors check for this, so if and when you do manage to build such a matrix, you're exploiting a SciPy bug and your script is likely to break when you upgrade SciPy.
That being said, I don't see why you'd need an n × 0 sparse matrix since an n × 0 NumPy array is allowed and takes practically no storage space.
Turns out sparse.hstack cannot handle a NumPy array with a zero axis, so disregard my previous comment. However, what I think you should do is collect all the columns in a list, then hstack them in one call. That's better than your loop since append'ing to a list takes amortized constant time, while hstack takes linear time. So your proposed algorithm takes quadratic time while it could be linear.
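The list-then-stack approach could look like the sketch below. The loop body is a stand-in for whatever produces each column c, and n_rows and the column values are made up for illustration.

```python
import numpy as np
import scipy.sparse

n_rows = 4
columns = []
for j in range(3):
    # Stand-in for the column produced by each loop iteration.
    c = scipy.sparse.csc_matrix(np.full((n_rows, 1), float(j)))
    columns.append(c)  # amortized constant time per append

# One hstack call at the end instead of one per iteration.
N = scipy.sparse.hstack(columns, format="csc")
print(N.shape)
```

No empty starting matrix is needed, because nothing is stacked until all columns exist.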
You must use at least 1 in your shape.
N = scipy.sparse.csc_matrix((4,1))
Which you can stack:
print(scipy.sparse.hstack((N, N)))
#<4x2 sparse matrix of type '<type 'numpy.float64'>'
# with 0 stored elements in COOrdinate format>
I have two M x N matrices that I construct after extracting data from images. Both have a long first row, and after the third row each row contains only a first column.
For example, a raw vector looks like this:
1,23,2,5,6,2,2,6,2,
12,4,5,5,
1,2,4,
1,
2,
2
:
Both vectors have a similar pattern: the first three rows are long, and the rows thin out as they progress. To do cosine similarity, I was thinking of using a padding technique to add zeros and make these two vectors N x N. I looked at Python options for cosine similarity, but some examples used a package called numpy. I couldn't figure out exactly how numpy can do this type of padding and carry out a cosine similarity. Any guidance would be greatly appreciated.
If both arrays have the same dimension, I would flatten them using NumPy. NumPy (and SciPy) is a powerful scientific computational tool that makes matrix manipulations way easier.
Here is an example of how I would do it with NumPy and SciPy:
import numpy as np
from scipy.spatial import distance
A = np.array([[1,23,2,5,6,2,2,6,2],[12,4,5,5],[1,2,4],[1],[2],[2]], dtype=object )
B = np.array([[1,23,2,5,6,2,2,6,2],[12,4,5,5],[1,2,4],[1],[2],[2]], dtype=object )
Aflat = np.hstack(A)
Bflat = np.hstack(B)
dist = distance.cosine(Aflat, Bflat)
The result here is dist = 1.10e-16 (i.e., 0).
Note that I've used here the dtype=object because that's the only way I know to be able to store different shapes into an array in NumPy. That's why later I used hstack() in order to flatten the array (instead of using the more common flatten() function).
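If you do want the zero padding from the question, a minimal sketch is below. It assumes both inputs have the same number of rows. The helper pad_rows and the second data set B_rows are hypothetical, chosen only so that the two inputs differ slightly.

```python
import numpy as np
from scipy.spatial import distance

def pad_rows(rows, width=None):
    """Copy jagged rows into a zero-filled 2-D array, left-aligned."""
    width = max(len(r) for r in rows) if width is None else width
    padded = np.zeros((len(rows), width))
    for i, r in enumerate(rows):
        padded[i, :len(r)] = r
    return padded

A_rows = [[1, 23, 2, 5, 6, 2, 2, 6, 2], [12, 4, 5, 5], [1, 2, 4], [1], [2], [2]]
# Hypothetical second input, differing from A_rows in two entries.
B_rows = [[1, 20, 2, 5, 6, 2, 2, 6, 2], [12, 4, 5, 5], [1, 2, 4], [1], [3], [2]]

# Pad both to a common width so the flattened vectors line up entry by entry.
width = max(max(len(r) for r in A_rows), max(len(r) for r in B_rows))
A_pad = pad_rows(A_rows, width)
B_pad = pad_rows(B_rows, width)

# distance.cosine returns 1 - cosine similarity.
similarity = 1.0 - distance.cosine(A_pad.ravel(), B_pad.ravel())
print(similarity)
```

Unlike plain hstack, padding keeps each entry at a fixed (row, column) position, which matters when the two inputs have rows of different lengths.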
I would make them into a scipy sparse matrix (http://docs.scipy.org/doc/scipy/reference/sparse.html) and then run cosine similarity from the scikit learn module.
from scipy import sparse
from sklearn.metrics import pairwise_distances
sparse_matrix = sparse.csr_matrix(your_np_array)  # your_np_array: 2-D numeric array
distance_matrix = pairwise_distances(sparse_matrix, metric="cosine")
Why can't you just run a nested loop over both jagged lists (presumably), summing the Euclidean/vector dot product of each row and using the result as a similarity measure? This assumes that the jagged dimensions are identical.
That said, I'm not quite sure how you are getting a jagged array from a bitmap image (I would have assumed it would be a proper dense matrix of M x N form), or how the jagged array of arrays above is meant to represent M x N matrix/image data, and therefore how padding the data with zeros would make sense. If this were a sparse matrix representation, one would expect row/column information annotated with the values.
Say I have a huge numpy matrix A taking up tens of gigabytes. It takes a non-negligible amount of time to allocate this memory.
Let's say I also have a collection of scipy sparse matrices with the same dimensions as the numpy matrix. Sometimes I want to convert one of these sparse matrices into a dense matrix to perform some vectorized operations.
Can I load one of these sparse matrices into A rather than re-allocate space each time I want to convert a sparse matrix into a dense matrix? The .toarray() method which is available on scipy sparse matrices does not seem to take an optional dense array argument, but maybe there is some other way to do this.
If the sparse matrix is in the COO format:
def assign_coo_to_dense(sparse, dense):
    dense[sparse.row, sparse.col] = sparse.data
If it is in the CSR format:
import numpy as np
def assign_csr_to_dense(sparse, dense):
    # Row index of every stored entry: row k repeats once per value stored in it.
    rows = np.repeat(np.arange(sparse.shape[0]), np.diff(sparse.indptr))
    dense[rows, sparse.indices] = sparse.data
To be safe, you might want to add the following lines to the beginning of each of the functions above:
assert sparse.shape == dense.shape
dense[:] = 0
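Putting the pieces together, a sketch of reusing one preallocated buffer might look like this. It converts through COO so that one function covers any input format, and it uses np.add.at so that duplicate entries are summed, which is an assumption about the behavior you want (it matches what .toarray() does). The function name and demo sizes are illustrative. Newer SciPy releases also document an out argument on toarray(), which is worth checking for your version before rolling your own.

```python
import numpy as np
import scipy.sparse

def assign_sparse_to_dense(sparse, dense):
    """Overwrite the preallocated array `dense` with the contents of `sparse`."""
    assert sparse.shape == dense.shape
    dense[:] = 0
    coo = sparse.tocoo()
    # np.add.at accumulates repeated (row, col) pairs instead of overwriting.
    np.add.at(dense, (coo.row, coo.col), coo.data)
    return dense

# Allocate the big buffer once, then refill it for each sparse matrix.
buffer = np.ones((50, 40))
S = scipy.sparse.random(50, 40, density=0.1, format="csr", random_state=0)
result = assign_sparse_to_dense(S, buffer)
print(result is buffer)
```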
It does seem like there should be a better way to do this (and I haven't scoured the documentation), but you could always loop over the elements of the sparse array and assign to the dense array (probably zeroing out the dense array first). If this ends up too slow, that seems like an easy C extension to write....