sparse matrix from dictionaries - python

I just started learning to program in Python, and I am trying to construct a sparse matrix using the SciPy package. I found that there are different types of sparse matrices, but all of them either require storing the matrix as three vectors (row, col, data) or, if you want to set each entry separately, like S(i,j) = s_ij, require initializing the matrix with a given size.
My question is whether there is a way to store the matrix entry by entry without needing the initial size, like a dictionary.

No. Any matrix in Scipy, sparse or not, must be instantiated with a size.

You can use an ordinary dictionary with tuples of two integers as keys. For example:
matrix = {}
matrix[5, 7] = 1
matrix[3, 8] = 5
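Once all entries are known, a dictionary like this can be turned into the three-vector form mentioned in the question. The following is a minimal sketch: the helper name dict_to_triplets is made up for illustration, and it assumes the shape can be inferred as the largest index plus one in each dimension.

```python
def dict_to_triplets(entries):
    """Convert {(i, j): value} into row, col, data lists plus an inferred shape.

    Hypothetical helper: the shape is assumed to be (max row + 1, max col + 1).
    """
    row, col, data = [], [], []
    for (i, j), value in entries.items():
        row.append(i)
        col.append(j)
        data.append(value)
    shape = (max(row) + 1, max(col) + 1) if row else (0, 0)
    return row, col, data, shape


matrix = {}
matrix[5, 7] = 1
matrix[3, 8] = 5

row, col, data, shape = dict_to_triplets(matrix)
print(row, col, data, shape)
```

The resulting lists have the (data, (row, col)) layout that scipy.sparse.coo_matrix accepts, so the conversion to a SciPy matrix can be postponed until the size is known.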

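dic = {}
a, b = int(input("Enter the order:")), int(input())
for i in range(a):
    for j in range(b):
        c = int(input())
        if c != 0:
            dic[(i, j)] = c  # store only the non-zero entries
# treat the matrix as sparse if at most half of its a*b entries are non-zero
if len(dic) <= (a * b) / 2:
    print("sparse matrix")
else:
    print("non sparse matrix")
for i in range(a):
    for j in range(b):
        print(dic.get((i, j), 0), end=" ")  # missing keys are zeros
    print()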

Related

Python list.insert() multi-index / list of list

For performance reasons I'd like to use the Python list insert() method. I will demonstrate why:
My final list is a 31k * 31k matrix:
w=31*10**3
h=31*10**3
distance_matrix = [[0 for x in range(w)] for y in range(h)]
I intend to update the matrix one iteration at a time:
for i in range(len(index)):
    for j in range(len(index)):
        distance_matrix[index[i]][index[j]] = k[0][i][j]
Obviously this doesn't perform well.
I would rather start with an empty list and fill it up gradually, so that the computation is cheap at the beginning and only becomes intensive at the end of the process:
distance_matrix = []
for i in range(len(index)):
    for j in range(len(index)):
        distance_matrix.insert([index[i]][index[j]], k[0][i][j])
But this multi-index or list-in-list insert doesn't seem to be possible.
How would you advise to proceed? I've also looked into numpy arrays, but without luck so far.
To be precise: updating the (ordered) large array of zeros index by index is the issue here. In a DataFrame I can use custom columns/indices, but that is not scalable in performance.
Additional information:
I split up the entire original data matrix in parts to compute distance matrices in parallel. The issue in this process is to aggregate the distance matrix again with the computed values. The distance matrix/array is very large, therefore a simple list insert or edit takes very long.
I think this approach achieves what I had in mind:
distance_matrix = []
def dynamic_append(x,i,j,val):
    if((len(x)-1)<i):
        dif_x = i-len(x)+1
        for k in range(dif_x):
            x.append([])
        dif_y = j-len(x[i])+1
        for l in range(dif_y):
            x[i].append([])
    elif((len(x[i])-1)<j):
        dif_y = j-len(x[i])+1
        for l in range(dif_y):
            x[i].append([])
    x[i][j]=val
    return(x)
for i in range(len(index)):
    for j in range(len(index)):
        distance_matrix=dynamic_append(distance_matrix,index[i],index[j],k[0][i][j])
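As an alternative to growing nested lists, a preallocated NumPy array can take a whole block of results in one assignment. This is only a sketch with toy data: n, index and block are small stand-ins I have assumed for the question's 31k matrix, its index list and k[0].

```python
import numpy as np

# Toy stand-ins for the question's data (assumed shapes, not the real 31k x 31k case).
n = 6                                # size of the full distance matrix
index = [4, 1, 3]                    # global positions of the rows/columns in this part
block = np.array([[0.0, 2.0, 5.0],
                  [2.0, 0.0, 1.0],
                  [5.0, 1.0, 0.0]])  # plays the role of k[0]

distance_matrix = np.zeros((n, n))

# np.ix_ builds an open mesh, so this assigns block[i, j] to
# distance_matrix[index[i], index[j]] for all i, j in one step.
distance_matrix[np.ix_(index, index)] = block

print(distance_matrix[4, 1])  # value that came from block[0, 1]
```

The single indexed assignment replaces the double Python loop, which is usually where most of the time goes for large blocks.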

element wise matrix multiplication python

Hi I'm stuck on what on the face of it seems a simple problem, so I must be missing something!
I have a list (of indeterminate length) of matrices calculated from user values, called ttranspose.
I also have another single matrix, Qbar, which I would like to multiply (as a matrix product) with each of the matrices in ttranspose, producing a list of the resulting matrices. That list should be the same length as ttranspose.
def Q_by_transpose(ttranspose, Qmatrix):
    Q_by_transpose = []
    for matrix in ttranspose:
        Q_by_transpose_ind = np.matmul(ttranspose, Qmatrix)
        Q_by_transpose.append(Q_by_transpose_ind)
    return (Q_by_transpose)
Instead, when I test this with a list of 6 matrices (ttranspose), I get a long list of matrices, which appears to be in 6 arrays (as expected), but each array is made up of 6 matrices.
I'm hoping to create a list of matrices on which I would then perform elementwise multiplication with another list. So solving this will help on both fronts!
Any help would be greatly appreciated!
I am new to Python and Numpy so am hopeful you guys will be able to help!
Thanks
It appears that instead of passing a single matrix to the np.matmul function, you are passing the entire list of matrices. Instead of
for matrix in ttranspose:
    Q_by_transpose_ind = np.matmul(ttranspose, Qmatrix)
    Q_by_transpose.append(Q_by_transpose_ind)
do this:
for matrix in ttranspose:
    Q_by_transpose_ind = np.matmul(matrix, Qmatrix)
    Q_by_transpose.append(Q_by_transpose_ind)
This will only pass one matrix to np.matmul instead of the whole list. Essentially what you're doing right now is multiplying the entire list of matrices n times, where n is the number of matrices in ttranspose.
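The corrected loop can also be compared against a stacked call, since np.matmul broadcasts over leading dimensions. The sketch below uses made-up random 3x3 inputs (the sizes and the random data are my assumptions) and checks that both routes agree.

```python
import numpy as np

# Hypothetical inputs: six 3x3 matrices and one 3x3 Qmatrix.
rng = np.random.default_rng(0)
ttranspose = [rng.random((3, 3)) for _ in range(6)]
Qmatrix = rng.random((3, 3))

# Loop version from the answer: one product per matrix in the list.
looped = [np.matmul(matrix, Qmatrix) for matrix in ttranspose]

# Stacked version: np.matmul treats the leading axis of a 3-D array as a
# batch dimension and broadcasts the 2-D Qmatrix against it.
stacked = np.matmul(np.stack(ttranspose), Qmatrix)  # shape (6, 3, 3)

print(len(looped), stacked.shape)
```

For the elementwise step mentioned in the question, `*` (or np.multiply) on two arrays of the same shape multiplies entry by entry, so two stacked arrays could be combined the same way.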

How to convert co-occurrence matrix to sparse matrix

I am just starting to deal with sparse matrices, so I'm not really proficient in this topic. My problem is that I have a simple co-occurrence matrix from a word list: a 2-dimensional, word-by-word matrix counting how many times a word occurs in the same context. The matrix is quite sparse since the corpus is not that big. I want to convert it to a sparse matrix to be able to deal with it better, and eventually do some matrix multiplication afterwards. Here is what I have done so far (only the first part; the rest is just output formatting and data cleaning):
from collections import defaultdict
def matrix(corpus):
    # d[head][trans] counts how often the pair (head, trans) occurs
    d = defaultdict(lambda: defaultdict(int))
    heads = set()
    trans = set()
    for text in corpus:
        d[text[0]][text[1]] += 1
        heads.add(text[0])
        trans.add(text[1])
    return d, heads, trans
My idea would be to make a new function:
def matrix_to_sparse(d):
    A = sparse.lil_matrix(d)
Does this make any sense? However, this is not working, and I don't know how to get a sparse matrix. Would it be better to work with NumPy arrays? What would be the best way to do this? I want to compare several ways of dealing with matrices.
It would be nice if someone could point me in the right direction.
Here's how you construct a document-term matrix A from a set of documents in SciPy's COO format, which is a good tradeoff between ease of use and efficiency(*):
import scipy.sparse
vocabulary = {}  # map terms to column indices
data = []  # values (maybe weights)
row = []  # row (document) indices
col = []  # column (term) indices
for i, doc in enumerate(documents):
    for term in doc:
        # get column index, adding the term to the vocabulary if needed
        j = vocabulary.setdefault(term, len(vocabulary))
        data.append(1)  # uniform weights
        row.append(i)
        col.append(j)
A = scipy.sparse.coo_matrix((data, (row, col)))
Now, to get a cooccurrence matrix:
A.T * A
(ignore the diagonal, which holds co-occurrences of terms with themselves, i.e. squared frequency).
Alternatively, use some package that does this kind of thing for you, such as Gensim or scikit-learn. (I'm a contributor to both projects, so this might not be unbiased advice.)
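The same triplet approach can be applied to the nested dictionary from the question. The sketch below is one possible way to do it, not the answerer's code: the toy corpus and the row_index and col_index mappings are assumptions introduced for illustration.

```python
from collections import defaultdict

import scipy.sparse

# Toy corpus of (head, trans) pairs, standing in for the question's data.
corpus = [("cat", "sat"), ("cat", "sat"), ("dog", "ran"), ("cat", "ran")]

d = defaultdict(lambda: defaultdict(int))
for head, trans in corpus:
    d[head][trans] += 1

# Assign each word a fixed row or column index (sorted for reproducibility).
row_index = {w: i for i, w in enumerate(sorted(d))}
all_trans = sorted({t for inner in d.values() for t in inner})
col_index = {w: j for j, w in enumerate(all_trans)}

# Flatten the nested counts into the (data, (row, col)) triplet form.
data, row, col = [], [], []
for head, inner in d.items():
    for trans, count in inner.items():
        data.append(count)
        row.append(row_index[head])
        col.append(col_index[trans])

shape = (len(row_index), len(col_index))
A = scipy.sparse.coo_matrix((data, (row, col)), shape=shape).tocsr()

print(A.toarray())
```

Converting to CSR at the end is a design choice: COO is convenient for construction, while CSR is generally better suited to the matrix multiplication the question has in mind.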

Inserting one matrix as an element of another matrix using Numpy

I'm using Numpy and have a 7x12x12 matrix whose values I would like to populate in 12x12 chunks, 7 different times. Suppose I have these 12x12 matrices:
first_Matrix
second_Matrix
third_Matrix
... (etc)
seventh_Matrix = first_Matrix + second_Matrix + third_Matrix...
that I'd like to add to:
grand_Matrix
How can I do this? I assume there is a better way than loops that map the coordinates from one matrix to the next, and if there's not, could someone please write out the code for mapping first_Matrix into the first 12x12 element of grand_Matrix?
grand_Matrix[0,...] = first_Matrix
grand_Matrix[1,...] = second_Matrix
and so on.
Anyway, as @Lattyware commented, it is bad design to have separate names for so many homogeneous objects.
If you have a list of 12x12 matrices:
grand_Matrix = np.vstack([m[None,...] for m in matrices])
Indexing with None adds a new leading dimension to each matrix, and vstack stacks the matrices along that dimension. (The list comprehension matters: recent NumPy versions no longer accept a bare generator here.)
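Both approaches from this answer can be tried side by side. In this sketch the seven matrices are filled with made-up constant values (an assumption for illustration), and np.stack is used as a shorter equivalent of the vstack idiom.

```python
import numpy as np

# Seven hypothetical 12x12 matrices; the last is the sum of the first six,
# mirroring the question's description.
matrices = [np.full((12, 12), float(n)) for n in range(1, 7)]
matrices.append(sum(matrices))

# Option 1: stack the list along a new leading axis.
grand_Matrix = np.stack(matrices)  # shape (7, 12, 12)

# Option 2: preallocate and fill one 12x12 slice at a time.
grand_Matrix_2 = np.empty((7, 12, 12))
for n, m in enumerate(matrices):
    grand_Matrix_2[n, ...] = m

print(grand_Matrix.shape)
```

Keeping the matrices in a list, rather than in seven separately named variables, is what makes either option a one-liner or a short loop.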

How to represent matrices in python

How can I represent matrices in python?
Take a look at this answer:
from numpy import matrix
from numpy import linalg
A = matrix( [[1,2,3],[11,12,13],[21,22,23]]) # Creates a matrix.
x = matrix( [[1],[2],[3]] ) # Creates a matrix (like a column vector).
y = matrix( [[1,2,3]] ) # Creates a matrix (like a row vector).
print(A.T) # Transpose of A.
print(A*x) # Matrix multiplication of A and x.
# Note: this A is singular (row 1 - 2*row 2 + row 3 = 0), so the next two
# lines may raise LinAlgError or print meaningless values.
print(A.I) # Inverse of A.
print(linalg.solve(A, x)) # Solve the linear equation system.
Python doesn't have a built-in matrix type. You can use a list of lists or NumPy.
If you are not going to use the NumPy library, you can use a nested list. This code implements a dynamic nested list (a 2-dimensional list).
Let r be the number of rows:
r = 3
m = []
for i in range(r):
    m.append([int(x) for x in input().split()])
You can append a row at any time using
m.append([int(x) for x in input().split()])
Above, you have to enter the matrix row by row. To insert a column:
for i in m:
    i.append(x) # x is the value to be added in column
To print the matrix:
print(m) # all in single row
for i in m:
    print(i) # each row in a different line
((1,2,3,4),
(5,6,7,8),
(9,0,1,2))
Using tuples instead of lists makes it marginally harder to change the data structure in unwanted ways.
If you are going to make extensive use of those, you are best off wrapping a true number array in a class, so you can define methods and properties on them. (Or, you could use NumPy, SciPy, ... if you are going to do your processing with those libraries.)
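To make the class-wrapping idea concrete, here is a minimal sketch built on the standard library's array module. The Matrix class, its row-major layout and its T property are all hypothetical choices made for illustration, not something the answer prescribes.

```python
from array import array


class Matrix:
    """Hypothetical minimal wrapper around a flat array of doubles (row-major)."""

    def __init__(self, rows, cols, values=None):
        self.rows, self.cols = rows, cols
        if values is None:
            values = [0.0] * (rows * cols)
        if len(values) != rows * cols:
            raise ValueError("expected rows * cols values")
        self._data = array("d", values)

    def __getitem__(self, pos):
        i, j = pos
        return self._data[i * self.cols + j]

    def __setitem__(self, pos, value):
        i, j = pos
        self._data[i * self.cols + j] = value

    @property
    def T(self):
        """Return the transpose as a new Matrix."""
        out = Matrix(self.cols, self.rows)
        for i in range(self.rows):
            for j in range(self.cols):
                out[j, i] = self[i, j]
        return out


# Same values as the tuple-of-tuples example above.
m = Matrix(3, 4, [1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2])
print(m[1, 2], m.T[2, 1])
```

The wrapper keeps the storage private, so the shape cannot be changed by accident, and further operations can be added as methods.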
