I have a matrix and I want to check if it is sparse or not.
Things I have tried:
isinstance method:
if isinstance(<matrix>, scipy.sparse.csc.csc_matrix):
This works fine if I know exactly which sparse class I want to check.
getformat method: this assumes that my matrix is already sparse and only tells me its format.
But I want a way to know whether a matrix is sparse or not that works regardless of which sparse class it is.
Kindly help me.
scipy.sparse.issparse(my_matrix)
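A quick check of scipy.sparse.issparse against several sparse formats and a dense array (the small identity matrix is only an example input):

```python
import numpy as np
import scipy.sparse as sp

dense = np.eye(3)
candidates = {
    "ndarray": dense,
    "csr": sp.csr_matrix(dense),
    "csc": sp.csc_matrix(dense),
    "coo": sp.coo_matrix(dense),
    "lil": sp.lil_matrix(dense),
}

for name, m in candidates.items():
    # issparse() is True for every scipy.sparse format and False for dense arrays
    print(name, sp.issparse(m))
```

Unlike the isinstance check on one specific class, this needs no list of sparse classes to compare against.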
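You can compute the fraction of zero entries with sparsity = 1.0 - np.count_nonzero(X) / X.size
Note that this measures how sparse the data is, not whether X is a scipy sparse object, and as written it is only correct for dense NumPy arrays: for a scipy.sparse matrix, X.size is the number of stored values rather than rows * columns, so use X.count_nonzero() / (X.shape[0] * X.shape[1]) there instead.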
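A small sketch of that sparsity measure that handles both dense NumPy arrays and scipy.sparse matrices, assuming a 2-D input; the function name sparsity is just illustrative:

```python
import numpy as np
import scipy.sparse as sp

def sparsity(X):
    """Fraction of entries that are zero, for a 2-D dense array or scipy.sparse matrix."""
    n_total = X.shape[0] * X.shape[1]  # assumes a 2-D input
    if sp.issparse(X):
        # X.size would be the number of *stored* values, so count non-zeros explicitly
        n_nonzero = X.count_nonzero()
    else:
        n_nonzero = np.count_nonzero(X)
    return 1.0 - n_nonzero / n_total

A = np.array([[0, 1, 0, 0],
              [0, 0, 0, 2]])
print(sparsity(A))                 # 0.75
print(sparsity(sp.csr_matrix(A)))  # 0.75
```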
Can someone help me with this?
Have you checked the numpy library? It makes working with matrices a lot easier. You can find an introduction at this link: https://www.tutorialspoint.com/python_data_structure/python_matrix.htm
NumPy provides np.dot (also available as the .dot method of arrays) for matrix multiplication; this should get you started: https://numpy.org/doc/stable/reference/generated/numpy.dot.html#numpy.dot
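If you have a matrix like
mat = np.array([[1, 2, 3, 4, 5],
                [2, 3, 4, 5, 6],
                [3, 4, 5, 6, 7]])
and an array like
arr = np.array([1, 2, 3])
then the required function, which multiplies each row of mat by the corresponding element of arr using NumPy broadcasting, can be as simple as:
import numpy as np
def multiplicate(mat, arr):
    # reshape arr into an (n, 1) column so row i of mat is multiplied by arr[i]
    return mat * arr.reshape((-1, 1))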
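To make the difference between the two answers concrete, here is a small check using the mat and arr values from this answer. The original question body is not shown, so which product was intended is an assumption: broadcasting scales each row, while a plain vector-matrix product sums the scaled rows.

```python
import numpy as np

mat = np.array([[1, 2, 3, 4, 5],
                [2, 3, 4, 5, 6],
                [3, 4, 5, 6, 7]])
arr = np.array([1, 2, 3])

# Broadcasting: arr[:, np.newaxis] has shape (3, 1), so row i of mat is scaled by arr[i]
scaled = mat * arr[:, np.newaxis]

# Same result as a matrix product with the diagonal matrix diag(arr)
via_dot = np.diag(arr).dot(mat)

# A plain vector-matrix product instead adds the scaled rows together
vec_mat = arr.dot(mat)

print(scaled)
print(np.array_equal(scaled, via_dot))  # True
print(vec_mat)  # [14 20 26 32 38]
```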
I am using MLkNN from the skmultilearn.adapt library for one of my classification problems. The output that the predict function gives me is a sparse matrix of type int.
mlk=mlknn.MLkNN(k=10)
mlk.fit(training_M,Y_train)
output=mlk.predict(testing_M)
When I try to print the output with
print(output)
it shows me only one entry, i.e.
(0, 1120) 1
But I need to read the full matrix and find the non zero values.
if I do
output[2][4]
it gives me a "row index out of bounds" error.
How can I avoid this error and get the row and column indices of all the non-zero values?
This print is a condensed form and means that there is only one non-zero value in that matrix, otherwise there would be more output.
You can double-check this with output.nnz (an attribute, not a method).
If you have enough memory, you can use output.toarray() to obtain a regular dense NumPy array (output.todense() also works but returns a numpy.matrix).
Otherwise, look up the scipy.sparse docs to see how to work with these matrices more efficiently.
Remark: your example output[2][4] shows that you are new to NumPy/SciPy, and I highly recommend going through their docs. Indexing 2-D arrays/matrices is done like output[2, 4].
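A small sketch of the points above, using a hypothetical 3 x 5 CSR matrix in place of the real MLkNN output (the actual output format may differ; these calls work for CSR, CSC and LIL). It also shows why output[2][4] fails: output[2] is itself a 1 x 5 sparse matrix, so [4] then asks for row 4 of that single-row matrix.

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical stand-in for the MLkNN prediction: 3 samples x 5 labels
output = sp.csr_matrix(np.array([[0, 1, 0, 0, 0],
                                 [0, 0, 0, 0, 0],
                                 [1, 0, 0, 0, 1]]))

print(output.nnz)       # 3 (number of stored non-zero values)
print(output[2].shape)  # (1, 5): a single-row sparse matrix
print(output[2, 4])     # 1: row and column go in one subscript

# Row and column indices of every non-zero entry
nz_rows, nz_cols = output.nonzero()
for r, c in zip(nz_rows, nz_cols):
    print(r, c)

# scipy.sparse.find also returns the corresponding values
f_rows, f_cols, f_vals = sp.find(output)
```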
I'm working with Python, sklearn and NumPy, and I am creating the following sparse matrix:
feats = tfidf_vect.fit_transform(np.asarray(tweets))
print(feats)
feats=np.log(np.asarray(feats))
but I am getting the following error when I apply the log:
Traceback (most recent call last):
File "src/ef_tfidf.py", line 100, in <module>
feats=np.log(np.asarray(feats))
AttributeError: log
The error is related to the fact that feats is a sparse matrix. I would appreciate any help with this, i.e. a way to apply the log to a sparse matrix.
The correct way to convert a sparse matrix to an ndarray is with the toarray method:
feats = np.log(feats.toarray())
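np.asarray doesn't understand sparse matrix inputs: it wraps the whole matrix in a 0-d object array, and np.log then looks for a log method on that object, which produces the AttributeError. Note also that np.log(0) is -inf, so every zero entry of the dense result becomes -inf (with a divide-by-zero warning).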
If you want to only take the log of non-zero entries and return a sparse matrix of results, the best way would probably be to take the logarithm of the matrix's data and build a new sparse matrix with that data.
How that works through the public interface is different for different sparse matrix types; you'd want to look up the constructor for whatever type you have. Alternatively, there's the private _with_data method:
feats = feats._with_data(np.log(feats.data), copy=True)
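One public-interface way to do this for a CSR matrix (the format TfidfVectorizer's fit_transform produces) is to rebuild the matrix with the CSR constructor from the transformed data and the original index arrays. This is a sketch: log_of_stored_values is a hypothetical helper name, and other formats such as CSC or COO need their own constructors.

```python
import numpy as np
import scipy.sparse as sp

def log_of_stored_values(X):
    """Return a new CSR matrix with log applied to the stored entries of X.

    Entries that are not stored stay zero instead of becoming -inf.
    """
    X = sp.csr_matrix(X, copy=True)  # work on a CSR copy so the input is untouched
    X.eliminate_zeros()              # drop explicitly stored zeros, which would give -inf
    # CSR constructor form: (data, indices, indptr) keeps the same sparsity structure
    return sp.csr_matrix((np.log(X.data), X.indices, X.indptr), shape=X.shape)

feats = sp.csr_matrix(np.array([[0.0, np.e],
                                [1.0, 0.0]]))
logged = log_of_stored_values(feats)
print(logged.toarray())
# log(1) = 0 remains as an explicitly stored zero; call logged.eliminate_zeros() to drop it
```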
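I actually needed to compute log(p + 1) for a sparse matrix p, and found that SciPy sparse matrices have a log1p method that does exactly that. I don't have enough reputation to comment, so I'm posting this here in case it helps anyone.
If log(1 + x) is acceptable for the original question, you can use
feats = feats.log1p()
This has the advantage of keeping feats sparse, since log1p(0) = 0. Note that (feats - 1).log1p() does not give log(feats): SciPy raises NotImplementedError when subtracting a nonzero scalar from a sparse matrix, and even if it did not, every zero entry would become -1 and log1p(-1) = -inf.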
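A short check, on a small made-up matrix, that log1p leaves the result sparse and agrees with NumPy's dense np.log1p:

```python
import numpy as np
import scipy.sparse as sp

feats = sp.csr_matrix(np.array([[0.0, 3.0, 0.0],
                                [1.0, 0.0, 0.0]]))

logged = feats.log1p()  # applies log(1 + x) to the stored values only

print(sp.issparse(logged), logged.nnz)  # True 2
print(np.allclose(logged.toarray(), np.log1p(feats.toarray())))  # True
```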
fit_transform() returns a scipy.sparse CSR matrix, which has a data attribute holding the array of stored (non-zero) values.
You can use the data attribute to transform the non-zero values of the sparse matrix directly, as follows:
feats.data = np.log(feats.data)
I have a huge sparse matrix. I would like to save the dense equivalent one into file system.
The problem is the memory limit on my machine.
My original idea is:
1. convert huge_sparse_matrix to an ndarray with np.asarray(huge_sparse_matrix)
2. assign values
3. save it back to the file system
However, at step 1, Python raises MemoryError.
One possible approach in my mind is:
1. create a chunk of the dense array
2. assign values from the corresponding sparse one
3. save the dense array chunk back to the file system
4. repeat steps 1-3
But how to do that?
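The chunked approach described in the question can be sketched as follows, assuming the matrix is a scipy.sparse matrix and that a .npy file is an acceptable on-disk format. numpy.lib.format.open_memmap creates the file at its full dense size on disk, and only chunk_rows rows are dense in memory at any time. The function name save_dense_in_chunks and the chunk size are illustrative choices.

```python
import os
import tempfile

import numpy as np
import scipy.sparse as sp

def save_dense_in_chunks(m, filename, chunk_rows=1000):
    """Write the dense form of sparse matrix m to a .npy file, chunk_rows rows at a time."""
    m = sp.csr_matrix(m)  # CSR makes row slicing cheap
    # open_memmap creates the full-size .npy file on disk without allocating it in RAM
    out = np.lib.format.open_memmap(filename, mode="w+", dtype=m.dtype, shape=m.shape)
    for start in range(0, m.shape[0], chunk_rows):
        stop = min(start + chunk_rows, m.shape[0])
        # Only this block of rows is ever dense in memory
        out[start:stop] = m[start:stop].toarray()
    out.flush()  # make sure everything is written to disk
    del out      # release the memory map

# Small example matrix standing in for huge_sparse_matrix
m = sp.csr_matrix((np.arange(10_000).reshape(2500, 4) % 7 == 0).astype(float))
path = os.path.join(tempfile.mkdtemp(), "dense.npy")
save_dense_in_chunks(m, path, chunk_rows=1000)

loaded = np.load(path, mmap_mode="r")  # reopen lazily; pages are read on demand
print(loaded.shape)  # (2500, 4)
```

Reopening with mmap_mode="r" keeps reads lazy too, so later processing can also work chunk by chunk.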
You can use the scipy.sparse functions to read the sparse matrix and then convert it to NumPy; see the scipy.sparse documentation and examples.
I think np.asarray() is not really the function you're looking for.
You might try the SciPy coordinate-format matrix, scipy.sparse.coo_matrix.
This format allows you to store huge sparse matrices in very little memory.
Furthermore, many SciPy mathematical functions also work with this matrix format.
A matrix in this format is basically represented by three arrays:
row: the index of the row
col: the index of the column
data: the value at this position
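A minimal sketch of building a COO matrix from those three arrays and saving it with scipy.sparse.save_npz; the indices and values are made up. Note that this keeps the matrix sparse on disk rather than producing the dense file the question asks for.

```python
import os
import tempfile

import numpy as np
import scipy.sparse as sp

# The three arrays of the COO format: row index, column index, value
row = np.array([0, 2, 999_999])
col = np.array([1, 0, 999_999])
data = np.array([4.0, 5.0, 6.0])

# A 1,000,000 x 1,000,000 matrix: only the three stored values use memory
m = sp.coo_matrix((data, (row, col)), shape=(1_000_000, 1_000_000))
print(m.nnz)  # 3

# save_npz / load_npz store the sparse structure, not the dense matrix
path = os.path.join(tempfile.mkdtemp(), "matrix.npz")
sp.save_npz(path, m)
loaded = sp.load_npz(path)
print(loaded.shape, loaded.nnz)  # (1000000, 1000000) 3
```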
hope that helped, cheers
The common and most straightforward answer to memory problems is: Do not create objects, use an iterator or a generator.
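If I understand correctly, you have a sparse matrix and you want to transform it into a list representation. Here's some sample code (Python 3; m is a CSR or LIL scipy.sparse matrix with shape (d1, d2)):
def iter_sparse_matrix(m, d1, d2):
    # visit every cell, but yield only the non-zero ones
    for i in range(d1):
        for j in range(d2):
            if m[i, j]:
                yield (i, j, m[i, j])
entries = list(iter_sparse_matrix(m, d1, d2))  # list of (row, col, value) triples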
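Indexing every cell of a huge matrix this way is very slow. A sketch of a generator that visits only the stored entries instead, assuming m is a scipy.sparse matrix (iter_stored_entries is a hypothetical name); tocoo() needs memory proportional to the number of non-zeros, not to the dense size:

```python
import scipy.sparse as sp

def iter_stored_entries(m):
    """Yield (row, col, value) for each non-zero stored entry of a scipy.sparse matrix."""
    coo = m.tocoo()  # COO exposes parallel row / col / data arrays
    for i, j, v in zip(coo.row, coo.col, coo.data):
        if v != 0:  # skip explicitly stored zeros
            yield int(i), int(j), v.item()

m = sp.csr_matrix([[0, 7, 0],
                   [3, 0, 0]])
entries = list(iter_stored_entries(m))
print(entries)
```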
You might also want to look here:
http://cvxopt.org/userguide/matrices.html#sparse-matrices
If I'm not wrong, the problem you have is that the dense version of the sparse matrix does not fit in your memory, and thus you are not able to save it.
What I would suggest is to use HDF5. HDF5 keeps big data on disk and loads it into memory only when needed.
Something like this should work:
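import h5py
data = huge_sparse_matrix  # your scipy sparse matrix
cx = data.tocoo()  # COO sparse representation
This will create your data matrix (filled with zeros) on disk:
f = h5py.File('dset.h5', 'w')
dataset = f.create_dataset("data", data.shape, dtype=data.dtype)
Fill the matrix with the sparse data. h5py allows only one index array per selection, so dataset[cx.row, cx.col] = cx.data will fail; writing the entries one at a time works (but is slow when there are many non-zeros, in which case writing dense row blocks is faster):
for r, c, v in zip(cx.row, cx.col, cx.data):
    dataset[r, c] = v
Add any modifications you want to dataset:
dataset[something, something] = something
And finally, save it:
f.close()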
The way HDF5 works is, I think, perfect for your needs. The matrix is always stored on disk, so it doesn't require memory, and you can index and slice it as if it were a standard NumPy array; the h5py driver reads into memory only the parts you select (never the whole matrix unless you explicitly ask for it with something like dataset[:, :]). Keep in mind that passing the whole dataset to a NumPy function such as np.sum will read all of it into memory.
PS: I'm assuming your sparse matrix is one of SciPy's sparse matrix types. If not, replace cx.row, cx.col and cx.data with the equivalents provided by your matrix representation (there should be something similar).
I'm having an error in my code, I hope you can help me!:
(When I paste the code something weird happens and not all of it is formatted as code, but here we go.)
I want to call linalg.solve(A, Res). The first matrix (A) has 10 rows and 10 columns, i.e. matrix([10 arrays, 10 elements]), and the second one has 10 rows and 1 column, i.e. matrix([1 array, 10 elements]).
When I executed the code it throws the following error:
Singular Matrix
I don't know what to do. When I don't call linalg.solve but just print both matrices, both look fine: 10 equations, 10 variables. So I don't know what's going on. Please help!
If you need me to paste the code (as horrible as it looks) I can do it.
Thank you
A singular matrix is a matrix that cannot be inverted, or, equivalently, that has determinant zero. For this reason, you cannot solve a system of equations using a singular matrix (it may have no solution or infinitely many solutions, but in any case no unique solution). So make sure your matrix is non-singular (i.e., has a non-zero determinant), since numpy.linalg.solve requires a non-singular matrix. In floating point, checking the rank with numpy.linalg.matrix_rank is more reliable than checking whether the determinant is exactly zero.
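A sketch of how to diagnose this, assuming A is square. It checks the rank first and, only as an illustration, falls back to np.linalg.lstsq, which returns the minimum-norm least-squares solution; that may not be what your problem needs, and usually the real fix is to find the equation that is a linear combination of the others. The helper name solve_or_least_squares and the 2 x 2 system are made up for the example.

```python
import numpy as np

def solve_or_least_squares(A, b):
    """Solve A x = b; if A is singular, fall back to a least-squares solution."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    rank = np.linalg.matrix_rank(A)
    if rank < A.shape[0]:
        # Some equations are linear combinations of others: no unique solution
        print(f"A is singular (rank {rank} < {A.shape[0]}); using lstsq")
        x, *_ = np.linalg.lstsq(A, b, rcond=None)
        return x
    return np.linalg.solve(A, b)

# The second equation is just twice the first one
A = [[1, 2],
     [2, 4]]
b = [3, 6]

failed = False
try:
    np.linalg.solve(np.array(A, dtype=float), np.array(b, dtype=float))
except np.linalg.LinAlgError as err:
    failed = True
    print("solve failed:", err)

x = solve_or_least_squares(A, b)
print(x)  # minimum-norm solution, approximately [0.6 1.2]
```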
Here is some decent explanation about what's going on for 2 x 2 matrices (but the generalization is straightforward to N x N).