How to write this in the most efficient way - Python

I have an array (N = 10^4) and I need to find the difference between every pair of entries (I am calculating a potential given the coordinates of the atoms).
Here is the code I am writing in pure Python, but it's really not efficient. Can anyone tell me how to speed it up (using numpy or weave)? Here x, y are arrays of coordinates of the atoms (just simple 1D arrays).
def potential(r):
    U = 4.*(np.power(r,-12) - np.power(r,-6))
    return U

def total_energy(x):
    E = 0.
    # need to speed up this part
    for i in range(N-1):
        for j in range(i):
            E += potential(np.sqrt((x[i]-x[j])**2))
    return E

First, you can use array arithmetic:
def potential(r):
    return 4.*(r**(-12) - r**(-6))

def total_energy(x):
    E = 0.
    for i in range(N-1):
        E += potential(np.sqrt((x[i]-x[:i])**2)).sum()
    return E
Or you can test the fully vectorized version:
def total_energy(x):
    b = np.diag(x).cumsum(1) - x
    return potential(abs(b[np.triu_indices_from(b, 1)])).sum()
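If the np.diag(x).cumsum(1) trick looks opaque, here is an equivalent sketch of my own (not from the answer) that builds the same upper-triangular pairwise differences with plain broadcasting; it reuses the potential function and np from above and is memory-hungry for large N:

def total_energy_broadcast(x):
    # all pairwise differences as an N x N matrix
    d = np.abs(x[:, None] - x[None, :])
    # keep only the j > i pairs and sum the potential over them
    return potential(d[np.triu_indices_from(d, 1)]).sum()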

I would recommend looking into scipy.spatial.distance. Using pdist in particular computes all pairwise distances of an array.
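For a quick sense of what pdist returns (a toy example of my own): it gives a condensed 1-D array holding the M*(M-1)/2 pairwise distances.

import numpy as np
import scipy.spatial.distance as sd

pts = np.random.rand(5, 3)   # 5 points in 3-D, just for illustration
r = sd.pdist(pts)            # condensed vector of the 5*4/2 = 10 pairwise distances
print(r.shape)               # (10,)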
I am assuming that you have an array that is of shape (Nx3), thus we need to slightly change your code:
def potential(r):
    U = 4.*(np.power(r,-12) - np.power(r,-6))
    return U

def total_energy(x):
    E = 0.
    # need to speed up this part
    for i in range(N):  # to N here
        for j in range(i):
            E += potential(np.sqrt(np.sum((x[i]-x[j])**2)))  # add sum here
    return E
Now let's rewrite this using spatial:
import scipy.spatial.distance as sd

def scipy_LJ(arr, sigma=None):
    """
    Computes the Lennard-Jones potential for an array (M x N) of M points
    in N-dimensional space. Usage of a sigma parameter is optional.
    """
    if len(arr.shape) == 1:
        arr = arr[:, None]
    r = sd.pdist(arr)
    if sigma is None:
        np.power(r, -6, out=r)
        return np.sum(r**2 - r) * 4
    else:
        r *= sigma
        np.power(r, -6, out=r)
        return np.sum(r**2 - r) * 4
Let's run some tests:
N = 1000
points = np.random.rand(N,3)+0.1
np.allclose(total_energy(points), scipy_LJ(points))
Out[43]: True
%timeit total_energy(points)
1 loops, best of 3: 13.6 s per loop
%timeit scipy_LJ(points)
10 loops, best of 3: 24.3 ms per loop
Now it is ~500 times faster!
N = 10000
points = np.random.rand(N,3)+0.1
%timeit scipy_LJ(points)
1 loops, best of 3: 3.05 s per loop
This used ~2 GB of RAM.

Here is the final answer with some timings.
0) The Plain version (Really slow)
In [16]: %timeit total_energy(points)
1 loops, best of 3: 14.9 s per loop
1) SciPy version
In [9]: %timeit scipy_LJ(points)
10 loops, best of 3: 44 ms per loop
1-2) Numpy version
%timeit sum( potential(np.sqrt((x[i]-x[:i])**2 + (y[i]-y[:i])**2 + (z[i] - z[:i])**2)).sum() for i in range(N-1))
10 loops, best of 3: 126 ms per loop
2) Insanely fast Fortran version (! marks a comment)
subroutine EnergyForces(Pos, PEnergy, Dim, NAtom)
    implicit none
    integer, intent(in) :: Dim, NAtom
    real(8), intent(in), dimension(0:NAtom-1, 0:Dim-1) :: Pos
    ! real(8), intent(in) :: L
    real(8), intent(out) :: PEnergy
    real(8), dimension(Dim) :: rij, Posi
    real(8) :: d2, id2, id6, id12
    real(8) :: rc2, Shift
    integer :: i, j

    PEnergy = 0.
    do i = 0, NAtom - 1
        ! store Pos(i,:) in a temporary array for faster access in the j loop
        Posi = Pos(i,:)
        do j = i + 1, NAtom - 1
            rij = Pos(j,:) - Posi
            ! rij = rij - L * dnint(rij / L)
            ! compute only the squared distance and compare to the squared cut-off
            d2 = sum(rij * rij)
            id2 = 1. / d2          ! inverse squared distance
            id6 = id2 * id2 * id2  ! inverse sixth power
            id12 = id6 * id6       ! inverse twelfth power
            PEnergy = PEnergy + 4. * (id12 - id6)
        enddo
    enddo
end subroutine
After calling it:
In [14]: %timeit ljlib.energyforces(points.transpose(), 3, N)
10000 loops, best of 3: 61 us per loop
3) Conclusion: Fortran is 1000 times faster than SciPy, 3000 times faster than NumPy, and millions of times faster than pure Python. That is because the SciPy version creates a matrix of differences and then analyzes it, whereas the Fortran version does everything on the fly.

Thank you for your help. Here is what I have found.
The shortest version
return sum( potential(np.sqrt((x[i]-x[:i])**2)).sum() for i in range(N-1))
The SciPy version is also good.
The fastest option to consider is the f2py program, i.e. write the slow bottleneck part in pure Fortran (which is insanely fast), compile it, and then plug it into your Python code as a library.
For example I have:
program_lj.f90
$gfortran -c program_lj.f90
If all the types are defined explicitly in the Fortran program, we are good to go.
$f2py -c -m program_lj program_lj.f90
After compilation, the only thing left is to call the program from Python.
In the Python program:
import program_lj
result = program_lj.subroutine_in_program(parameters)
In case you need a more general reference, please refer to the f2py documentation.

Related

Run two nested for loops in parallel to create matrix

I've written a method that takes in an integer "n" and creates a square matrix where the values of each element are dictated by their respective i,j indices.
When I build a small 30x30 matrix it works just fine, but when I try to do something larger like 1000x1000 it takes a very long time. Is there any way that I can speed it up with multiprocessing?
def createMatrix(n):
    matrix = []
    for j in range(1,n+1):
        row = []
        for i in range(1,n+1):
            value = 1/(i+j-1)
            row.append(value)
        matrix.append(row)
    return np.array(matrix)
Parallelizing two computation-bound for loops in Python is not trivial because of the GIL. The good news is that your case is perfectly vectorizable:
def createMatrix(n):
    return 1 / (np.arange(n)[None, :] + np.arange(n)[:, None] + 1)
Explanation:
essentially, your formula for the matrix is X[row][column] = 1/(row+column-1), where rows and columns are 1-based
np.arange(n) creates a range that can be used for rows or columns
[None, :] and [:, None] turn it into a 2d array, 1 x n or n x 1
numpy then broadcasts dimensions, replicating row and column indexes to match dimensions - thus, implicitly tiling both into n x n when added
since both ranges are 0-based, using +1 instead of -1
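To make the broadcast shapes concrete, here is a tiny demo of my own (shapes shown for n = 3):

import numpy as np

n = 3
col = np.arange(n)[:, None]   # shape (3, 1)
row = np.arange(n)[None, :]   # shape (1, 3)
print(col + row + 1)
# [[1 2 3]
#  [2 3 4]
#  [3 4 5]]
# 1 / (col + row + 1) is then the desired n x n matrix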
As a rule of thumb, it is almost never a good idea to use for loops on numpy arrays. A vectorized approach (i.e. matrix form computations) is orders of magnitude faster.
It's not a good idea to use for loops to fill a list and then convert it to a matrix. The operation you have can be vectorized with numpy from scratch, if you note that M(i,j) = 1/(i+j-1), where both indices start at 1.
Here's my proposal:
def createMatrix2(n):
    arr = np.arange(1,n+1)
    xx, yy = np.meshgrid(arr,arr)
    matrix = 1/(xx+yy-1)
    return matrix
Looking at Marat's answer, I think it's better, so I tested the three methods.
EDIT: added wwii's method as createMatrix4 (correcting the errors):
import numpy as np
from time import time

def createMatrix1(n):
    matrix = []
    for j in range(1,n+1):
        row = []
        for i in range(1,n+1):
            value = 1/(i+j-1)
            row.append(value)
        matrix.append(row)
    return np.array(matrix)

def createMatrix2(n):
    arr = np.arange(1,n+1)
    xx, yy = np.meshgrid(arr,arr)
    matrix = 1/(xx+yy-1)
    return matrix

def createMatrix3(n):
    """Marat's proposed matrix"""
    return 1 / (1 + np.arange(n)[None, :] + np.arange(n)[:, None])
def createMatrix4(n):
    """wwii's method"""
    i, j = np.ogrid[1:n+1, 1:n+1]
    return 1/(i+j-1)
# test all four methods
n = 10000
t1 = time()
m1 = createMatrix1(n)
t2 = time()
m2 = createMatrix2(n)
t3 = time()
m3 = createMatrix3(n)
t4 = time()
m4 = createMatrix4(n)
t5 = time()
print(np.allclose(m1,m2))
print(np.allclose(m1,m3))
print(np.allclose(m1,m4))
print("Matrix 1 (OP): ",t2-t1)
print("Matrix 2: (mine)",t3-t2)
print("Matrix 3: (Marat)",t4-t3)
print("Matrix 4: (wwii)",t5-t4)
# the output is:
#True
#True
#True
#Matrix 1 (OP): 18.4886577129364
#Matrix 2: (mine) 1.005324363708496
#Matrix 3: (Marat) 0.43033909797668457
#Matrix 4: (wwii) 0.5138359069824219
So Marat's solution is faster. As general comments:
Try to avoid for loops.
Think of your problem as operations on indices, and design the operations with numpy arrays directly.
Lastly, given Marat's answer, I still think my proposal is easier to read and understand. But that's just a subjective view.
Your code can be written in another style and accelerated by the numba library in parallel nopython mode:
import numpy as np
import numba as nb

@nb.njit("float64[:, ::1](int64)", parallel=True, fastmath=True)
def createMatrix(n):
    matrix = np.empty((n, n))  # np.zeros is slower than np.empty
    for j in nb.prange(1, n + 1):
        for i in range(1, n + 1):
            matrix[j - 1, i - 1] = 1 / (i + j - 1)
    return matrix
This solution is about 3 times faster than Marat's answer above.
Benchmarks: (temporary link to colab)
n = 1000
1000 loops, best of 5: 3.52 ms per loop # Marat
1000 loops, best of 5: 1.5 ms per loop # numba accelerated with np.zeros
1000 loops, best of 5: 1.05 ms per loop # numba accelerated with np.empty
n = 3000
1000 loops, best of 5: 39.5 ms per loop
1000 loops, best of 5: 19.3 ms per loop
1000 loops, best of 5: 8.91 ms per loop
n = 5000
1000 loops, best of 5: 109 ms per loop
1000 loops, best of 5: 53.5 ms per loop
1000 loops, best of 5: 24.8 ms per loop

how to compare entries in numpy array with each other efficiently?

I have a numpy array embed_vec of length tot_vec in which each entry is a 3d vector:
[[ 0.52483319 0.78015841 0.71117216]
[ 0.53041481 0.79462171 0.67234534]
[ 0.53645428 0.80896727 0.63119403]
...,
[ 0.72283509 0.40070804 0.15220522]
[ 0.71277758 0.38498613 0.16141834]
[ 0.70221445 0.36918032 0.17370776]]
For each of the elements in this array, I want to find out the number of other entries which are "close" to that entry. By close, I mean that the distance between two vectors is less than a specified value R. For this, I must compare all the possible pairs in this array with each other and then find out the number of close vectors for each of the vectors in the array. So I am doing this:
p = np.zeros(tot_vec)  # This contains the number of close vectors
for i in range(tot_vec-1):
    for j in range(i+1, tot_vec):
        if np.linalg.norm(embed_vec[i]-embed_vec[j]) < R:
            p[i] += 1
However, this is extremely inefficient because I have two nested python loops and for larger array sizes, this takes forever. If this were in C++ or Fortran, it wouldn't have been a great issue. My question is, can one achieve the same thing using numpy efficiently using some vectorization method? As a side note, I don't mind a solution using Pandas also.
Approach #1 : Vectorized approach -
def vectorized_app(embed_vec, R):
    tot_vec = embed_vec.shape[0]
    r, c = np.triu_indices(tot_vec,1)
    subs = embed_vec[r] - embed_vec[c]
    dists = np.einsum('ij,ij->i',subs,subs)
    return np.bincount(r,dists<R**2,minlength=tot_vec)
Approach #2 : With less loop complexity (for very large arrays) -
def loopy_less_app(embed_vec, R):
    tot_vec = embed_vec.shape[0]
    Rsq = R**2
    out = np.zeros(tot_vec,dtype=int)
    for i in range(tot_vec):
        subs = embed_vec[i] - embed_vec[i+1:tot_vec]
        dists = np.einsum('ij,ij->i',subs,subs)
        out[i] = np.count_nonzero(dists < Rsq)
    return out
Benchmarking
Original approach -
def loopy_app(embed_vec, R):
    tot_vec = embed_vec.shape[0]
    p = np.zeros(tot_vec)  # This contains the number of close vectors
    for i in range(tot_vec-1):
        for j in range(i+1, tot_vec):
            if np.linalg.norm(embed_vec[i]-embed_vec[j]) < R:
                p[i] += 1
    return p
Timings -
In [76]: # Sample random array
...: embed_vec = np.random.rand(3000,3)
...: R = 0.5
...:
In [77]: %timeit loopy_app(embed_vec, R)
1 loops, best of 3: 50.5 s per loop
In [78]: %timeit loopy_less_app(embed_vec, R)
10 loops, best of 3: 143 ms per loop
350x+ speedup there!
Going with much bigger array with the proposed loopy_less_app -
In [81]: # Sample random array
...: embed_vec = np.random.rand(20000,3)
...: R = 0.5
...:
In [82]: %timeit loopy_less_app(embed_vec, R)
1 loops, best of 3: 4.47 s per loop
I am intrigued by that question and attempted to solve it efficiently using scipy's cKDTree. However, this approach may run out of memory because internally a list of all pairs with distance <= R is maintained. If your R and tot_vec are small enough it will work:
import numpy as np
from scipy.spatial import cKDTree as KDTree

tot_vec = 60000
embed_vec = np.random.randn(tot_vec, 3)
R = 0.1

tree = KDTree(embed_vec, leafsize=100)
p = np.zeros(tot_vec)
for pair in tree.query_pairs(R):
    p[pair[0]] += 1
    p[pair[1]] += 1
In case memory is an issue, with some effort it is possible to rewrite query_pairs as a generator function in Python at the cost of C performance.
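If the full pair list does not fit in memory, one alternative (a sketch of my own, not from the original answer) is to query the tree point by point with query_ball_point, so only one neighbour list is materialized at a time; it reuses tree, embed_vec, tot_vec and R from above:

p = np.zeros(tot_vec)
for i, point in enumerate(embed_vec):
    # query_ball_point returns the indices within R, including the point itself, hence the -1
    p[i] = len(tree.query_ball_point(point, R)) - 1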
First broadcast the difference:
disp_vecs = embed_vec[:, None, :] - embed_vec[None, :, :]
Now, depending on how big your dataset is, you may want to do a first pass without all the math. If the distance is less than r, the absolute value of every component must also be less than r:
first_mask = np.max(np.abs(disp_vecs), axis=-1) < r
Then do the actual calculation:
disps = np.linalg.norm(disp_vecs[first_mask], axis=-1)
second_mask = disps < r
Now reassign:
disps = disps[second_mask]
first_mask[first_mask] = second_mask
disps are now the good values, and first_mask is a boolean mask of where they go. You can process from there.
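To turn that mask into per-point counts like the question asks for (a small sketch of my own; it counts neighbours in both directions, like the cKDTree answer above, and reuses first_mask and np from above):

np.fill_diagonal(first_mask, False)   # a point is not its own neighbour
p = first_mask.sum(axis=1)            # number of close vectors for each point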

Python, numpy, einsum multiply a stack of matrices

For performance reasons, I'm curious if there is a way to multiply a stack of stacks of matrices. I have a 4-D array (500, 201, 2, 2). It's basically a 500-long stack of (201, 2, 2) matrices where, for each of the 500, I want to multiply the adjacent matrices using einsum and get another (201, 2, 2) matrix.
I am only doing matrix multiplication on the [2x2] matrices at the end. Since my explanation is already heading off the rails, I'll just show what I'm doing now, and also the 'reduce' equivalent and why it's not helpful (because it's the same speed computationally). Preferably this would be a numpy one-liner, but I don't know what that is, or even if it's possible.
Code:
Arr = rand(500,201,2,2)

def loopMult(Arr):
    ArrMult = Arr[0]
    for i in range(1,len(Arr)):
        ArrMult = np.einsum('fij,fjk->fik', ArrMult, Arr[i])
    return ArrMult

def myeinsum(A1, A2):
    return np.einsum('fij,fjk->fik', A1, A2)

A1 = loopMult(Arr)
A2 = reduce(myeinsum, Arr)
print np.all(A1 == A2)
print shape(A1); print shape(A2)
%timeit loopMult(Arr)
%timeit reduce(myeinsum, Arr)
Returns:
True
(201, 2, 2)
(201, 2, 2)
10 loops, best of 3: 34.8 ms per loop
10 loops, best of 3: 35.2 ms per loop
Any help would be appreciated. Things are functional, but when I have to iterate this over a large series of parameters, the code tends to take a long time, and I'm wondering if there's a way to avoid the 500 iterations through a loop.
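One numpy-only idea worth trying (a sketch of my own, not benchmarked here): since matrix multiplication is associative, you can reduce the stack pairwise, halving its length each pass, so the Python-level loop runs about log2(500) ≈ 9 times instead of 499 while each einsum call works on a larger batch.

import numpy as np

def tree_mult(Arr):
    """Pairwise (tree) reduction of an (S, F, 2, 2) stack; preserves the product order."""
    while len(Arr) > 1:
        tail = Arr[-1:] if len(Arr) % 2 else None   # set aside an odd trailing slice
        head = Arr[:-1] if tail is not None else Arr
        # multiply adjacent pairs in one batched einsum call
        Arr = np.einsum('sfij,sfjk->sfik', head[0::2], head[1::2])
        if tail is not None:
            Arr = np.concatenate([Arr, tail])
    return Arr[0]

This returns the same (201, 2, 2) result as loopMult up to floating-point round-off; whether it approaches the f2py timing below I have not measured.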
I don't think it's possible to do this efficiently using numpy (the cumprod solution was elegant, though). This is the sort of situation where I would use f2py. It's the simplest way of calling a faster language that I know of and only requires a single extra file.
fortran.f90:
subroutine multimul(a, b)
    implicit none
    real(8), intent(in)  :: a(:,:,:,:)
    real(8), intent(out) :: b(size(a,1),size(a,2),size(a,3))
    real(8) :: work(size(a,1),size(a,2))
    integer i, j, k, l, m
    !$omp parallel do private(work,i,j)
    do i = 1, size(b,3)
        b(:,:,i) = a(:,:,i,size(a,4))
        do j = size(a,4)-1, 1, -1
            work = matmul(b(:,:,i),a(:,:,i,j))
            b(:,:,i) = work
        end do
    end do
end subroutine
Compile with f2py -c -m fortran fortran.f90 (or F90FLAGS="-fopenmp" f2py -c -m fortran fortran.f90 -lgomp to enable OpenMP acceleration). Then you would use it in your script as
import numpy as np, fortran

Arr = np.random.standard_normal([500,201,2,2])

def loopMult(Arr):
    ArrMult = Arr[0]
    for i in range(1,len(Arr)):
        ArrMult = np.einsum('fij,fjk->fik', ArrMult, Arr[i])
    return ArrMult

def myeinsum(A1, A2):
    return np.einsum('fij,fjk->fik', A1, A2)

A1 = loopMult(Arr)
A2 = reduce(myeinsum, Arr)
A3 = fortran.multimul(Arr.T).T
print np.allclose(A1,A2)
print np.allclose(A1,A3)
%timeit loopMult(Arr)
%timeit reduce(myeinsum, Arr)
%timeit fortran.multimul(Arr.T).T
Which outputs
True
True
10 loops, best of 3: 48.4 ms per loop
10 loops, best of 3: 48.8 ms per loop
100 loops, best of 3: 5.82 ms per loop
So that's a factor-8 speedup. The reason for all the transposes is that f2py implicitly transposes all the arrays, and we need to transpose them manually to tell it that our Fortran code expects things to be transposed. This avoids a copy operation. The cost is that each of our 2x2 matrices is transposed, so to avoid performing the wrong operation we have to loop in reverse.
Greater speedups than 8 should be possible - I didn't spend any time trying to optimize this.

numpy vectorization of double python for loop

V is an (n, p) numpy array; typical dimensions are n ~ 10, p ~ 20000.
The code I have now looks like:
A = np.zeros(p)
for i in xrange(n):
    for j in xrange(i+1):
        A += F[i,j] * V[i,:] * V[j,:]
How would I go about rewriting this to avoid the double python for loop?
While Isaac's answer seems promising, as it removes those two nested for loops, you are having to create an intermediate array M which is n times the size of your original V array. Python for loops are not cheap, but memory access ain't free either:
n = 10
p = 20000
V = np.random.rand(n, p)
F = np.random.rand(n, n)

def op_code(V, F):
    n, p = V.shape
    A = np.zeros(p)
    for i in xrange(n):
        for j in xrange(i+1):
            A += F[i,j] * V[i,:] * V[j,:]
    return A

def isaac_code(V, F):
    n, p = V.shape
    F = F.copy()
    F[np.triu_indices(n, 1)] = 0
    M = (V.reshape(n, 1, p) * V.reshape(1, n, p)) * F.reshape(n, n, 1)
    return M.sum((0, 1))
If you now take both for a test ride:
In [20]: np.allclose(isaac_code(V, F), op_code(V, F))
Out[20]: True
In [21]: %timeit op_code(V, F)
100 loops, best of 3: 3.18 ms per loop
In [22]: %timeit isaac_code(V, F)
10 loops, best of 3: 24.3 ms per loop
So removing the for loops is costing you an 8x slowdown. Not a very good thing... At this point you may even want to consider whether a function taking about 3 ms to evaluate requires any further optimization. In case you do, there's a small improvement to be had by using np.einsum:
def einsum_code(V, F):
    n, p = V.shape
    F = F.copy()
    F[np.triu_indices(n, 1)] = 0
    return np.einsum('ij,ik,jk->k', F, V, V)
And now:
In [23]: np.allclose(einsum_code(V, F), op_code(V, F))
Out[23]: True
In [24]: %timeit einsum_code(V, F)
100 loops, best of 3: 2.53 ms per loop
So that's roughly a 20% speed up that introduces code that may very well not be as readable as your for loops. I would say not worth it...
The difficult part about this is that you only want to take the sum of the elements with j <= i. If not for that then you could do the following:
M = (V.reshape(n, 1, p) * V.reshape(1, n, p)) * F.reshape(n, n, 1)
A = M.sum(0).sum(0)
If F is symmetric (if F[i,j] == F[j,i]) then you can exploit the symmetry of M above as follows:
D = M[range(n), range(n)].sum(0)
A = (M.sum(0).sum(0) - D) / 2.0 + D
That said, this is really not a great candidate for vectorization, as you have n << p and so your for-loops are not going to have much effect on the speed of this computation.
Edit: As Bill said below, you can just make sure that the elements of F that you don't want to use are set to zero first, and then the M.sum(0).sum(0) result will be what you want.
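Spelled out, that suggestion looks like this (a short sketch assuming the n, p, V, F variables from above):

F2 = F.copy()
F2[np.triu_indices(n, 1)] = 0   # zero the j > i entries so they drop out of the sum
M = (V.reshape(n, 1, p) * V.reshape(1, n, p)) * F2.reshape(n, n, 1)
A = M.sum(0).sum(0)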
The expression can be written as A[p] = sum over i of sum over j <= i of F[i,j] * V[i,p] * V[j,p],
and thus you can compute it like this using the np.newaxis construct:
na = np.newaxis
X = (np.tri(n)*F)[:,:,na]*V[:,na,:]*V[na,:,:]
X.sum(axis=1).sum(axis=0)
Here a 3D array X[i,j,p] is constructed, and then the first two axes are summed, which results in a 1D array A[p]. Additionally, F was multiplied by a triangular matrix to restrict the summation according to the problem.

Iterating through a scipy.sparse vector (or matrix)

I'm wondering what the best way is to iterate nonzero entries of sparse matrices with scipy.sparse. For example, if I do the following:
from scipy.sparse import lil_matrix

x = lil_matrix( (20,1) )
x[13,0] = 1
x[15,0] = 2

c = 0
for i in x:
    print c, i
    c = c+1
the output is
0
1
2
3
4
5
6
7
8
9
10
11
12
13 (0, 0) 1.0
14
15 (0, 0) 2.0
16
17
18
19
so it appears the iterator is touching every element, not just the nonzero entries. I've had a look at the API
http://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.lil_matrix.html
and searched around a bit, but I can't seem to find a solution that works.
Edit: bbtrb's method (using coo_matrix) is much faster than my original suggestion, using nonzero. Sven Marnach's suggestion to use itertools.izip also improves the speed. Current fastest is using_tocoo_izip:
import scipy.sparse
import random
import itertools

def using_nonzero(x):
    rows, cols = x.nonzero()
    for row, col in zip(rows, cols):
        ((row, col), x[row, col])

def using_coo(x):
    cx = scipy.sparse.coo_matrix(x)
    for i, j, v in zip(cx.row, cx.col, cx.data):
        (i, j, v)

def using_tocoo(x):
    cx = x.tocoo()
    for i, j, v in zip(cx.row, cx.col, cx.data):
        (i, j, v)

def using_tocoo_izip(x):
    cx = x.tocoo()
    for i, j, v in itertools.izip(cx.row, cx.col, cx.data):
        (i, j, v)

N = 200
x = scipy.sparse.lil_matrix( (N,N) )
for _ in xrange(N):
    x[random.randint(0,N-1), random.randint(0,N-1)] = random.randint(1,100)
yields these timeit results:
% python -mtimeit -s'import test' 'test.using_tocoo_izip(test.x)'
1000 loops, best of 3: 670 usec per loop
% python -mtimeit -s'import test' 'test.using_tocoo(test.x)'
1000 loops, best of 3: 706 usec per loop
% python -mtimeit -s'import test' 'test.using_coo(test.x)'
1000 loops, best of 3: 802 usec per loop
% python -mtimeit -s'import test' 'test.using_nonzero(test.x)'
100 loops, best of 3: 5.25 msec per loop
The fastest way should be by converting to a coo_matrix:
cx = scipy.sparse.coo_matrix(x)
for i,j,v in zip(cx.row, cx.col, cx.data):
    print "(%d, %d), %s" % (i,j,v)
To loop over a variety of sparse matrices from the scipy.sparse code section, I would use this small wrapper function (note that for Python 2 you are encouraged to use xrange and izip for better performance on large matrices):
from scipy.sparse import *

def iter_spmatrix(matrix):
    """ Iterator for iterating the elements in a ``scipy.sparse.*_matrix``

    This will always return:
    >>> (row, column, matrix-element)

    Currently this can iterate `coo`, `csc`, `lil` and `csr`; others may easily be added.

    Parameters
    ----------
    matrix : ``scipy.sparse.sp_matrix``
        the sparse matrix to iterate non-zero elements
    """
    if isspmatrix_coo(matrix):
        for r, c, m in zip(matrix.row, matrix.col, matrix.data):
            yield r, c, m

    elif isspmatrix_csc(matrix):
        for c in range(matrix.shape[1]):
            for ind in range(matrix.indptr[c], matrix.indptr[c+1]):
                yield matrix.indices[ind], c, matrix.data[ind]

    elif isspmatrix_csr(matrix):
        for r in range(matrix.shape[0]):
            for ind in range(matrix.indptr[r], matrix.indptr[r+1]):
                yield r, matrix.indices[ind], matrix.data[ind]

    elif isspmatrix_lil(matrix):
        for r in range(matrix.shape[0]):
            for c, d in zip(matrix.rows[r], matrix.data[r]):
                yield r, c, d

    else:
        raise NotImplementedError("The iterator for this sparse matrix has not been implemented")
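For example, applied to the lil_matrix from the question, a minimal usage sketch (assuming iter_spmatrix as defined above) would be:

from scipy.sparse import lil_matrix

x = lil_matrix((20, 1))
x[13, 0] = 1
x[15, 0] = 2

for r, c, d in iter_spmatrix(x):
    print(r, c, d)   # yields (13, 0, 1.0) and (15, 0, 2.0)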
tocoo() materializes the entire matrix into a different structure, which is not the preferred approach in Python 3. You can also consider this iterator, which is especially useful for large matrices.
from itertools import chain, repeat

def iter_csr(matrix):
    for (row, col, val) in zip(
        chain(*(
            repeat(i, r)
            for (i, r) in enumerate(matrix.indptr[1:] - matrix.indptr[:-1])
        )),
        matrix.indices,
        matrix.data
    ):
        yield (row, col, val)
I have to admit that I'm using a lot of python-constructs which possibly should be replaced by numpy-constructs (especially enumerate).
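For what it's worth, one way to swap those Python constructs for numpy ones (a sketch of my own, assuming a CSR matrix) is to expand the row indices with np.repeat over the indptr differences:

import numpy as np

def iter_csr_np(matrix):
    """Yield (row, col, val) for a CSR matrix, building the row indices with numpy."""
    # np.diff(indptr) gives the number of stored elements in each row
    rows = np.repeat(np.arange(matrix.shape[0]), np.diff(matrix.indptr))
    return zip(rows, matrix.indices, matrix.data)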
NB:
In [43]: t=time.time(); sum(1 for x in rather_dense_sparse_matrix.data); print(time.time()-t)
52.48686504364014
In [44]: t=time.time(); sum(1 for x in enumerate(rather_dense_sparse_matrix.data)); print(time.time()-t)
70.19013023376465
In [45]: rather_dense_sparse_matrix
<99829x99829 sparse matrix of type '<class 'numpy.float16'>'
with 757622819 stored elements in Compressed Sparse Row format>
So yes, enumerate is somewhat slow(ish)
For the iterator:
In [47]: it = iter_csr(rather_dense_sparse_matrix)
In [48]: t=time.time(); sum(1 for x in it); print(time.time()-t)
113.something something
So you decide whether this overhead is acceptable; in my case the tocoo caused memory overflows.
IMHO: such an iterator should be part of the csr_matrix interface, similar to items() in a dict() :)
I had the same problem, and actually, if your concern is only speed, the fastest way (more than one order of magnitude faster) is to convert the sparse matrix to a dense one (x.todense()) and iterate over the nonzero elements in the dense matrix. (Though, of course, this approach requires a lot more memory.)
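A minimal sketch of that approach (my own, assuming the matrix x from the question fits in memory as a dense array):

import numpy as np

d = np.asarray(x.todense())              # dense copy of the sparse matrix
for row, col in zip(*np.nonzero(d)):     # visit only the nonzero positions
    value = d[row, col]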
Try filter(lambda x:x, x) instead of x.
