how to create a sparse matrix from lists of numbers - python

I have three lists, A, B and C.
Each of these lists contains 97510 items. I need to create a sparse matrix like this:
matrix[A[0]][B[0]] = C[0]
For example ,
A=[1,2,3,4,5]
B=[7,8,9,10,11]
C=[14,15,16,17,18]
I need to create a sparse matrix with
matrix[1][7] = 14 # (which is C[0])
matrix[2][8] = 15 # and so on .....
I tried this, and Python gives me an error saying "Index values must be continuous".
How do I do it?

I suggest you have a look at the SciPy sparse matrices, e.g. a COO sparse matrix:
from scipy import sparse
matrix = sparse.coo_matrix((C, (A, B)), shape=(max(A) + 1, max(B) + 1))
Note: the shape has to cover the largest row and column index, so for the example lists it must be at least (6, 12). I chose the COO format only because it was in the example; you can use any other. You will probably have to try out which one suits your situation best. The formats differ in how the data is compressed, and this affects the performance of certain operations.
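A minimal runnable sketch of the COO approach with the example lists from the question. Converting to CSR for element lookups is just one reasonable choice here, not the only one; the variable names shape and csr are introduced for illustration.

```python
from scipy import sparse

A = [1, 2, 3, 4, 5]
B = [7, 8, 9, 10, 11]
C = [14, 15, 16, 17, 18]

# The shape must cover the largest row and column index (here 6 x 12).
shape = (max(A) + 1, max(B) + 1)
matrix = sparse.coo_matrix((C, (A, B)), shape=shape)

# COO does not support element indexing, so convert to CSR for lookups.
csr = matrix.tocsr()
print(csr[1, 7])   # 14
print(csr[2, 8])   # 15
print(csr.nnz)     # 5 stored entries
```

Only the five given entries are stored; every other position reads as zero.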

If you simply need a way to get matrix[A[0]][B[0]] = C[0], you can use the following:
A=[1,2,3,4,5]
B=[7,8,9,10,11]
C=[14,15,16,17,18]
matrix = dict((v,{B[i]:C[i]}) for i, v in enumerate(A))
Edited (thanks to gnibbler):
A = [1,2,3,4,5]
B = [7,8,9,10,11]
C = [14,15,16,17,18]
matrix = dict(((v, B[i]), C[i]) for i, v in enumerate(A))

It is very simple to use a dict, especially if you are willing to change the way you write the indices slightly
>>> A=[1,2,3,4,5]
>>> B=[7,8,9,10,11]
>>> C=[14,15,16,17,18]
>>> matrix=dict(((a,b),c) for a,b,c in zip(A,B,C))
>>> matrix[1,7]
14
>>> matrix[2,8]
15
>>>

Have a look at NumPy/SciPy, which has support for sparse matrices (see e.g. scipy.sparse).

Related

Numpy split array by grouping array

There are the following two arrays of equal length. My goal is to split array B into groups defined by array A, so that in the end there are three arrays, or a list of arrays. The final list of arrays should consist of the following rows of array B:
First and second
Third and fifth
Fourth
The order is not really relevant.
A = array([[-1],
[ 1],
[ 0],
[ 0],
[ 1]])
B = array([[ 624.5 , 548. ],
[ 912.8201, 564.3444],
[1564.5 , 764. ],
[1463.4163, 785.9251],
[1698.0757, 846.6306]])
The problem occurred to me while using the DBSCAN clustering function. Array A holds the cluster labels (0, 1) of the points in array B. The value -1 marks a point as an outlier. (The values used are not precise.)
My goal is to calculate the compactness, ... of each found cluster
The numpy_indexed package (disclaimer: I am its author) was designed with these types of use cases in mind.
import numpy_indexed as npi
C = npi.group_by(A).split(B)
Not sure what you mean by compactness of each group; but rather than splitting and doing subsequent computations, it is typically more efficient to compute reductions over groups directly; whereby you can reuse the grouping object for increased efficiency:
groups = npi.group_by(A)
mean = groups.mean(B)
std = groups.std(B)
Keep it simple:
[data[labels == l] for l in np.unique(labels)]
Similarly, you can build a dict in a one-liner.
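As a sketch of that one-liner with the arrays from the question: the names labels and groups are chosen here for illustration, and the (n, 1) label column is flattened first so that the boolean mask selects whole rows of B.

```python
import numpy as np

A = np.array([[-1], [1], [0], [0], [1]])
B = np.array([[624.5, 548.0],
              [912.8201, 564.3444],
              [1564.5, 764.0],
              [1463.4163, 785.9251],
              [1698.0757, 846.6306]])

labels = A.ravel()  # flatten the (n, 1) label column to shape (n,)

# One array of rows per distinct label, keyed by the label itself.
groups = {int(l): B[labels == l] for l in np.unique(labels)}

for label, rows in groups.items():
    print(label, rows.shape)
```

The outlier label -1 simply becomes one more key, so it can be skipped or kept as needed.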
This is a bit lengthy, but it should work:
final_dict = {}
for counter in range(len(A)):
    key = int(A[counter][0])  # A has shape (n, 1); use the scalar label as the key
    if key not in final_dict:
        final_dict[key] = [B[counter]]
    else:
        final_dict[key].append(B[counter])
final_array = []
for key, value in final_dict.items():
    final_array.append(np.array(value))
Basically, since you have odd values like -1 to work with, you can use the labels as keys of a dictionary and then iterate over the dictionary to get the groups of rows, which you append to a final output list.

Row Division in Scipy Sparse Matrix

I want to divide a sparse matrix's rows by scalars given in an array.
For example, I have a csr_matrix C :
C = [[2,4,6], [5,10,15]]
D = [2,5]
I want the result of C after division to be :
result = [[1, 2, 3], [1, 2, 3]]
I have tried this using the method that we use for numpy arrays:
result = C / D[:,None]
But this seems really slow. How to do this efficiently in sparse matrices?
Approach #1
Here's a sparse matrix solution using manual replication with indexing -
from scipy.sparse import csr_matrix
r,c = C.nonzero()
rD_sp = csr_matrix(((1.0/D)[r], (r,c)), shape=(C.shape))
out = C.multiply(rD_sp)
The output is a sparse matrix as well, as opposed to the output of C / D[:,None], which creates a full matrix. As such, the proposed approach saves memory.
Possible performance boost with replication using np.repeat instead of indexing -
val = np.repeat(1.0/D, C.getnnz(axis=1))
rD_sp = csr_matrix((val, (r,c)), shape=(C.shape))
Approach #2
Another approach uses the data attribute of the sparse matrix, which gives a flattened view of its stored values. This allows in-place results and avoids the use of nonzero. Note that C must have a floating-point dtype for the in-place division to work:
val = np.repeat(D, C.getnnz(axis=1))
C.data /= val
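The two approaches can be tried side by side on the small example from the question. This is a sketch under the assumption that C is stored with a float dtype and that D is a NumPy array; both results are printed as dense arrays only for inspection.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Float dtype so that the in-place division below is allowed.
C = csr_matrix(np.array([[2, 4, 6], [5, 10, 15]], dtype=float))
D = np.array([2.0, 5.0])

# Approach #1: multiply by a sparse matrix holding 1/D at C's nonzero positions.
r, c = C.nonzero()
rD_sp = csr_matrix(((1.0 / D)[r], (r, c)), shape=C.shape)
out = C.multiply(rD_sp)

# Approach #2: repeat each divisor once per stored entry in its row,
# then divide the stored values in place.
val = np.repeat(D, C.getnnz(axis=1))
C.data /= val

print(out.toarray())
print(C.toarray())
```

Approach #1 leaves C untouched and returns a new sparse matrix, while Approach #2 overwrites C, so the order of the two steps above matters.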
Question: I want to divide a sparse matrix's rows by scalars given in an array.
For example:
C = [[2,4,6], [5,10,15]]
D = [2,5]
Answer: use the "multiply" method provided by the sparse matrix interface. It multiplies pointwise by matrices as well as by vectors and scalars:
import numpy as np
from scipy.sparse import csr_matrix
C = [[2,4,6], [5,10,15]]
D = [2,5]
c = csr_matrix(C)
c2 = c.multiply(1 / np.array(D).reshape(2, 1))
c2.toarray()
output: array([[1., 2., 3.],
       [1., 2., 3.]])
PS
Thanks to Alexander Kirillin
A one-liner for plain nested lists:
C = [[2,4,6], [5,10,15]] # len(C[0]) = 3
D = [2,5] # len(D) = 2
result = [[C[i][j]/D[i] for j in range(len(C[0]))] for i in range(len(D))]
print(result)
If you first cast D to type numpy.matrix (which I'm assuming you can do unless D is too big to fit into memory), then you can just run
C.multiply(1.0 / D.T)
to get what you want.

Check how many numpy array within a numpy array are equal to other numpy arrays within another numpy array of different size

My problem
Suppose I have
a = np.array([ np.array([1,2]), np.array([3,4]), np.array([5,6]), np.array([7,8]), np.array([9,10])])
b = np.array([ np.array([5,6]), np.array([1,2]), np.array([3,192])])
They are two arrays, of different sizes, containing other arrays (the inner arrays have same sizes!)
I want to count how many items of b (i.e. inner arrays) are also in a. Notice that I am not considering their position!
How can I do that?
My Try
count = 0
for bitem in b:
    for aitem in a:
        if (aitem == bitem).all():  # compare whole inner arrays, not single elements
            count += 1
Is there a better way? Especially in one line, maybe with some comprehension..
The numpy_indexed package contains efficient (nlogn, generally) and vectorized solutions to these types of problems:
import numpy_indexed as npi
count = len(npi.intersection(a, b))
Note that this is subtly different than your double loop, discarding duplicate entries in a and b for instance. If you want to retain duplicates in b, this would work:
count = npi.in_(b, a).sum()
Duplicate entries in a could also be handled by doing npi.count(a) and factoring in the result of that; but anyway, I'm just rambling on for illustration purposes, since I imagine the distinction probably does not matter to you.
Here is a simple way to do it:
a = np.array([ np.array([1,2]), np.array([3,4]), np.array([5,6]), np.array([7,8]), np.array([9,10])])
b = np.array([ np.array([5,6]), np.array([1,2]), np.array([3,192])])
count = np.count_nonzero(
    np.any(np.all(a[:, np.newaxis, :] == b[np.newaxis, :, :], axis=-1), axis=0))
print(count)
# output: 2
You can do what you want in one liner as follows:
count = sum([np.array_equal(x,y) for x,y in product(a,b)])
Explanation
Here's an explanation of what's happening:
1. Iterate through the two arrays using itertools.product, which creates an iterator over the Cartesian product of the two arrays.
2. Compare the two arrays in each tuple (x,y) coming from step 1 using np.array_equal.
3. True counts as 1 when using sum on a list.
Full example:
The final code looks like this:
import numpy as np
from itertools import product
a = np.array([ np.array([1,2]), np.array([3,4]), np.array([5,6]), np.array([7,8]), np.array([9,10])])
b = np.array([ np.array([5,6]), np.array([1,2]), np.array([3,192])])
count = sum([np.array_equal(x,y) for x,y in product(a,b)])
# output: 2
You can convert the rows to dtype = np.void and then use np.in1d on the resulting 1d arrays:
def void_arr(a):
    return np.ascontiguousarray(a).view(np.dtype((np.void, a.dtype.itemsize * a.shape[1])))
b[np.in1d(void_arr(b), void_arr(a))]
array([[5, 6],
[1, 2]])
If you just want the number of intersections, it's
np.in1d(void_arr(b), void_arr(a)).sum()
2
Note: if there are repeat items in b or a, then np.in1d(void_arr(b), void_arr(a)).sum() likely won't be equal to np.in1d(void_arr(a), void_arr(b)).sum(). I've reversed the order from my original answer to match your question (i.e. how many elements of b are in a?)
For more information, see the third answer here
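For small inputs, a plain-Python alternative is to turn the rows into tuples and use a set. This is not one of the methods above but a swapped-in sketch; the name rows_of_a is chosen here for illustration.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])
b = np.array([[5, 6], [1, 2], [3, 192]])

# Rows become hashable tuples, so membership tests are O(1) on average.
rows_of_a = {tuple(row) for row in a.tolist()}

# Counts every row of b that occurs in a; a duplicate row in b is counted each time.
count = sum(tuple(row) in rows_of_a for row in b.tolist())
print(count)  # 2
```

Unlike the double loop in the question, duplicates in a do not inflate the count here, because the set stores each row of a only once.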

Sorting a NumPy array and permuting another one along with it

I have two NumPy arrays. The first, A, is one-dimensional; the second, B, is two-dimensional in the application I have in mind, but really could have any dimension. Every single index of B covers the same range as the single index of A.
Now, I'd like to sort A (in descending order) but would like to permute every dimension of B along with it. Mathematically speaking, if P is the permutation matrix that sorts A, I would like to transform B according to np.dot(P, np.dot(B, P.T)). E.g. consider this example where sorting coincidentally corresponds to reversing the order:
In [1]: import numpy as np
In [2]: A = np.array([1,2,3])
In [3]: B = np.random.rand(3,3); B
Out[3]:
array([[ 0.67402953, 0.45017072, 0.24324747],
[ 0.40559793, 0.79007712, 0.94247771],
[ 0.47477422, 0.27599007, 0.13941255]])
In [4]: # desired output:
In [5]: A[::-1]
Out[5]: array([3, 2, 1])
In [6]: B[::-1,::-1]
Out[6]:
array([[ 0.13941255, 0.27599007, 0.47477422],
[ 0.94247771, 0.79007712, 0.40559793],
[ 0.24324747, 0.45017072, 0.67402953]])
The application I have in mind is to obtain eigenvalues and eigenvectors of a nonsymmetric matrix using np.linalg.eig (in contrast to eigh, eig does not guarantee any ordering of the eigenvalues), sort them by absolute value, and truncate the space. It would be beneficial to permute the components of the matrix holding the eigenvectors along with the eigenvalues and perform the truncation by slicing it.
You can use np.argsort to get sorted indices of A. Then you can use these indices to rearrange B.
It is not entirely clear how you want to rearrange B...
p = np.argsort(A)
B[:, p][p, :] # rearrange rows and column of B
B.transpose(p) # rearrange dimensions of B
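For the two-dimensional case, here is a small sketch that checks the index-based rearrangement against the permutation-matrix form P B P^T from the question. It assumes descending order, as the question asks; P and B_perm are names introduced here for illustration.

```python
import numpy as np

A = np.array([1, 2, 3])
B = np.arange(9.0).reshape(3, 3)

p = np.argsort(A)[::-1]          # indices that sort A in descending order
P = np.eye(len(A))[p]            # permutation matrix: row i is the unit vector e_{p[i]}

A_sorted = A[p]
B_perm = B[np.ix_(p, p)]         # permute rows and columns together

# Same result as the matrix form P B P^T from the question.
assert np.array_equal(B_perm, P @ B @ P.T)
# For this A, sorting happens to reverse the order, as in the question's example.
assert np.array_equal(B_perm, B[::-1, ::-1])
```

np.ix_(p, p) gives the same result as B[p, :][:, p] in a single indexing step.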
If you want to order eigenvectors according to eigenvalues, you should only rearrange the columns of the eigenvectors:
(Also, it may make sense to use the absolute value, in case you get complex eigenvalues)
e, v = eig(x)
p = np.argsort(np.abs(e))[::-1] # descending order
v = v[:, p]
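A self-contained sketch of this eigenvalue sorting followed by truncation. The random test matrix and the truncation size k are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((4, 4))               # arbitrary nonsymmetric test matrix

e, v = np.linalg.eig(x)
p = np.argsort(np.abs(e))[::-1]      # largest magnitude first
e, v = e[p], v[:, p]                 # column v[:, i] still pairs with e[i]

k = 2                                # hypothetical truncation size
e_k, v_k = e[:k], v[:, :k]

# Each kept pair still satisfies x v = lambda v.
print(np.allclose(x @ v_k, v_k * e_k))
```

Because only the columns of v are permuted, truncating is then a plain slice on both e and v.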
You can use numpy.argsort to get the index mapping. For example:
test=np.array([2,1,3])
test_array=np.array([[2,3,4],[1,2,3]])
rearranged_array=test_array[:,test.argsort()]
Here, test.argsort() yields [1,0,2].

How to create nested lists in python?

I know you can easily create nested lists in Python like this:
[[1,2],[3,4]]
But how to create a 3x3x3 matrix of zeroes?
[[[0] * 3 for i in range(0, 3)] for j in range (0,3)]
or
[[[0]*3]*3]*3
Neither seems right. Is there no way to create it by just passing a list of dimensions to a function? For example:
CreateArray([3,3,3])
In case a matrix is actually what you are looking for, consider the numpy package.
http://docs.scipy.org/doc/numpy/reference/generated/numpy.zeros.html#numpy.zeros
This will give you a 3x3x3 array of zeros:
numpy.zeros((3,3,3))
You also benefit from the convenience features of a module built for scientific computing.
List comprehensions are just syntactic sugar for adding expressiveness to list initialization; in your case, I would not use them at all, and go for a simple nested loop.
On a completely different level: do you think the n-dimensional array of NumPy could be a better approach?
Although you can use lists to implement multi-dimensional matrices, I think they are not the best tool for that goal.
NumPy addresses this problem
>>> from numpy import array
>>> a = array( [2,3,4] )
>>> a
array([2, 3, 4])
>>> type(a)
<class 'numpy.ndarray'>
But if you want to use native Python lists as a matrix, the following helper functions can come in handy:
import copy
def Create(dimensions, item):
    # deepcopy keeps nested levels independent; a shallow copy would share inner lists
    for dimension in dimensions:
        item = [copy.deepcopy(item) for _ in range(dimension)]
    return item
def Get(matrix, position):
    for index in position:
        matrix = matrix[index]
    return matrix
def Set(matrix, position, value):
    for index in position[:-1]:
        matrix = matrix[index]
    matrix[position[-1]] = value
Or use the nest function defined here, combined with repeat(0) from the itertools module:
nest(itertools.repeat(0),[3,3,3])
Just nest the multiplication syntax:
[[[0] * 3] * 3] * 3
It's therefore simple to express this operation using folds
from functools import reduce
def zeros(dimensions):
    return reduce(lambda x, d: [x] * d, [0] + dimensions)
Or, if you want to avoid reference replication so that altering one item won't affect any other, use copies instead:
import copy
def zeros(dimensions):
    item = 0
    for dimension in dimensions:
        item = [copy.deepcopy(item) for _ in range(dimension)]
    return item
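To see why the copies matter, the following sketch contrasts the nested multiplication syntax with a builder that creates every inner list separately. The recursive helper nested_zeros is a hypothetical name and just one possible way to write it.

```python
def nested_zeros(dimensions):
    """Return nested lists of zeros in which every inner list is a distinct object."""
    if not dimensions:
        return 0
    return [nested_zeros(dimensions[1:]) for _ in range(dimensions[0])]

shared = [[[0] * 3] * 3] * 3      # the same inner list objects repeated
shared[0][0][0] = 1
print(shared[2][2][0])            # 1: the change shows up everywhere

independent = nested_zeros([3, 3, 3])
independent[0][0][0] = 1
print(independent[2][2][0])       # 0: only one cell changed
```

With the multiplication syntax, all rows at each level are references to one list, so a single assignment appears to change nine cells at once.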
