I have a large real 1-D data set called r. I would like to plot
mean(log(1+a*r)) vs a, with a > -1.
This is my code:
rr=pd.read_csv('goog.csv')
dd=rr['Close']
series=pd.Series(dd)
seriespct=series.pct_change()
seriespct[0]=seriespct.mean()
dum1 =[0]*len(dd)
a=1.
a_max = 1.
a_step = 0.01
a = scipy.arange(-3.+a_step, a_max, a_step)
n = len(a)
dum2 =[0]*n
m=len(dd)
for j in range(n):
    for i in range(m):
        dum1[i]=math.log(1+a[j]*seriespct[i])
    dum2[j]=scipy.mean(dum1)
plt.plot(a,dum2)
plt.show()
How can I do this in a more elegant way?
I would recommend this:
plt.plot(a, np.log(1 + r*a[:,None]).mean(1))
This has a big speed advantage because it avoids Python-level for-loops: the looping happens inside NumPy, which is significantly faster, especially when your dataset is large.
In [49]: a = np.arange(a_step-.3, a_max, a_step)
In [50]: r = np.random.random(100)
In [51]: timeit [scipy.mean(log(1+a[i]*r)) for i in range(len(a))]
100 loops, best of 3: 5.47 ms per loop
In [52]: timeit np.log(1 + r*a[:,None]).mean(1)
1000 loops, best of 3: 384 µs per loop
It works by broadcasting so that a varies along one axis and r along another, then you can take the mean just along the axis that r varies along, so you still have an array that varies with a (and has the same shape as a):
import numpy as np
import matplotlib.pyplot as plt
r = np.random.random(100)
a = 1.
a_max = 1.
a_step = 0.01
a = np.arange(a_step-.3, a_max, a_step)
a.shape
#(129,)
a = a[:,None] #adds a new axis, making this a column vector, same as: a = a.reshape(-1,1)
a.shape
#(129, 1)
(a*r).shape
#(129, 100)
loga = np.log(1 + a*r)
loga.shape
#(129,100)
mloga = loga.mean(axis=1) #take the mean along the 2nd axis, where `r` varies
mloga.shape
#(129,)
plt.plot(a, mloga)
plt.show()
ADDENDUM:
To avoid dependency on broadcasting, you can use np.outer:
plt.plot(a, np.log(1 + np.outer(a,r)).mean(1))
This has no need for reshaping a (skip the step a = a[:,None]).
Here's a simpler example, so you can see what's happening:
r = np.exp(np.arange(1,5))
a = np.arange(5)
In [33]: r
Out[33]: array([ 2.71828183, 7.3890561 , 20.08553692, 54.59815003])
In [34]: a
Out[34]: array([0, 1, 2, 3, 4])
In [39]: r*a[:,None]
Out[39]:
# this is 2.7... 7.3... 20.08... 54.5... # times:
array([[ 0. , 0. , 0. , 0. ], # 0
[ 2.71828183, 7.3890561 , 20.08553692, 54.59815003], # 1
[ 5.43656366, 14.7781122 , 40.17107385, 109.19630007], # 2
[ 8.15484549, 22.1671683 , 60.25661077, 163.7944501 ], # 3
[ 10.87312731, 29.5562244 , 80.34214769, 218.39260013]]) # 4
In [40]: np.outer(a,r)
Out[40]:
array([[ 0. , 0. , 0. , 0. ],
[ 2.71828183, 7.3890561 , 20.08553692, 54.59815003],
[ 5.43656366, 14.7781122 , 40.17107385, 109.19630007],
[ 8.15484549, 22.1671683 , 60.25661077, 163.7944501 ],
[ 10.87312731, 29.5562244 , 80.34214769, 218.39260013]])
# this is the mean of each row:
In [41]: (np.outer(a,r)).mean(1)
Out[41]: array([ 0. , 21.19775622, 42.39551244, 63.59326866, 84.79102488])
# and the log of 1 + the above is:
In [42]: np.log(1+(np.outer(a,r)).mean(1))
Out[42]: array([ 0. , 3.09999121, 3.77035604, 4.16811021, 4.4519144 ])
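As a sanity check, the broadcasting and np.outer forms can be compared against an explicit loop. This is only a sketch: r here is synthetic (small uniform values standing in for the real returns), and the seed and the range of a are arbitrary choices, not taken from the question.

```python
import numpy as np

# Synthetic stand-in for the real data set (assumption: small returns,
# so that 1 + a*r stays positive for every a used below).
rng = np.random.default_rng(0)
r = rng.uniform(-0.05, 0.05, size=500)
a = np.arange(-0.99, 1.0, 0.01)

# Reference: one Python-level iteration per value of a.
loop_result = np.array([np.mean(np.log(1 + ai * r)) for ai in a])

# Broadcasting: a becomes a column, so a[:, None] * r has shape (len(a), len(r)).
broadcast_result = np.log(1 + a[:, None] * r).mean(axis=1)

# np.outer builds the same (len(a), len(r)) table without reshaping a.
outer_result = np.log(1 + np.outer(a, r)).mean(axis=1)

all_agree = (np.allclose(loop_result, broadcast_result)
             and np.allclose(loop_result, outer_result))
print(all_agree)
```

Both vectorized forms reduce over axis 1, the axis along which r varies, so the result keeps one value per entry of a.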
You can use NumPy to do the means and matplotlib to do the plotting.
import numpy as np
from matplotlib import pyplot
#convert r from a python list to a 1-D array
r = np.array(r)
#edit these
a_max = 100
a_step = 0.1
a = np.arange(-1+a_step, a_max, a_step)
n = len(a)
pyplot.plot(a, [np.mean(np.log(1+a[i]*r)) for i in range(n)], 'b-')
pyplot.show()
I'm trying to create a certain style of band(ed) matrix (see Wikipedia). The following code works, but for large M (~300 or so) it becomes quite slow because of the for loop. Is there a way to vectorize it/make better use of NumPy and/or SciPy? I am having trouble figuring out the mathematical operation that this corresponds to, and hence I have not succeeded thus far.
The code I have is as follows
def banded_matrix(M):
    phis = np.linspace(0, 2*np.pi, M)
    i = 0
    ham = np.zeros((int(2*M), int(2*M)))
    for phi in phis:
        ham_phi = np.array([[1, 1],
                            [1, -1]])*(1+np.cos(phi))
        array_phi = np.zeros(M)
        array_phi[i] = 1
        mat_phi = np.diag(array_phi)
        ham += np.kron(mat_phi, ham_phi)
        i += 1
    return ham
With %timeit banded_matrix(M=300) it takes about 4 seconds on my computer.
Since the code is a bit opaque: what I want to do is construct a large 2M by 2M matrix. In a sense it has M entries on its 'width 2' diagonal, where the entries are 2x2 matrices ham_phi that depend on phi. The matrix will afterwards be diagonalized, so perhaps one could even make use of its structure (it is rather sparse) to speed that up, but of that I am not sure.
If anyone has an idea where to go with this, I'd be happy to follow up on that!
Your matrix is diagonal by blocks, so you can use scipy.linalg.block_diag:
import numpy as np
from scipy.linalg import block_diag
def banded_matrix_scipy(M):
    ham = np.array([[1, 1], [1, -1]])
    phis = np.linspace(0, 2 * np.pi, M)
    ham_phis = ham * (1 + np.cos(phis))[:, None, None]
    return block_diag(*ham_phis)
Let's check that it works and is faster:
b1 = banded_matrix(300)
b2 = banded_matrix_scipy(300)
np.all(b1 == b2) # True
>>> %timeit banded_matrix(300)
1.51 s ± 57 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>>> %timeit banded_matrix_scipy(300)
1.24 ms ± 4.57 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
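On the diagonalization mentioned in the question: since the matrix is block diagonal, its eigenvalues are just the eigenvalues of the individual 2x2 blocks, so the large matrix may not need to be built at all. The following is a minimal sketch under that assumption; block_eigenvalues is a hypothetical helper name, and it relies on np.linalg.eigvalsh accepting a stack of symmetric matrices.

```python
import numpy as np
from scipy.linalg import block_diag

def block_eigenvalues(M):
    """Eigenvalues of the block-diagonal matrix, computed one 2x2 block at a time."""
    ham = np.array([[1.0, 1.0], [1.0, -1.0]])
    phis = np.linspace(0, 2 * np.pi, M)
    ham_phis = ham * (1 + np.cos(phis))[:, None, None]   # shape (M, 2, 2)
    # eigvalsh works on a stack of symmetric matrices and returns shape (M, 2)
    return np.linalg.eigvalsh(ham_phis), ham_phis

block_eigs, ham_phis = block_eigenvalues(50)

# Dense 100x100 reference, only built here to check the blockwise result
full_eigs = np.linalg.eigvalsh(block_diag(*ham_phis))
same_spectrum = np.allclose(np.sort(block_eigs.ravel()), np.sort(full_eigs))
print(same_spectrum)
```

If the eigenvectors are needed as well, np.linalg.eigh can be applied to the same (M, 2, 2) stack.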
The obligatory np.einsum benchmark
def banded_matrix_einsum(M):
    return np.einsum('ij, kl-> ikjl',
                     np.eye(M)*(1 + np.cos(np.linspace(0, 2 * np.pi, M))),
                     np.array([[1, 1], [1, -1]])).reshape(2*M, 2*M)
banded_matrix_einsum(4)
Output
array([[ 2. , 2. , 0. , 0. , 0. , 0. , 0. , 0. ],
[ 2. , -2. , 0. , 0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0.5, 0.5, 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0.5, -0.5, 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0.5, 0.5, 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0.5, -0.5, 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. , 0. , 2. , 2. ],
[ 0. , 0. , 0. , 0. , 0. , 0. , 2. , -2. ]])
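To see why the 'ikjl' output order followed by a reshape gives the block layout, here is a small sketch comparing it with np.kron on the M=4 case. The variable names are illustrative only.

```python
import numpy as np

M = 4
ham = np.array([[1, 1], [1, -1]])
scale = 1 + np.cos(np.linspace(0, 2 * np.pi, M))
diag = np.diag(scale)                      # (M, M), scale on the diagonal

# out4[i, k, j, l] = diag[i, j] * ham[k, l]
out4 = np.einsum('ij,kl->ikjl', diag, ham)

# Merging (i, k) into rows and (j, l) into columns puts element
# [i, k, j, l] at row 2*i + k, column 2*j + l of the 2M x 2M matrix.
via_einsum = out4.reshape(2 * M, 2 * M)

# The Kronecker product uses exactly this layout.
via_kron = np.kron(diag, ham)
layouts_match = np.allclose(via_einsum, via_kron)
print(layouts_match)
```

In other words, the einsum version computes the same Kronecker product as the original loop, but in a single call instead of M separate np.kron calls.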
Benchmark results
import perfplot
perfplot.show(
    setup = lambda M: M,
    kernels = [banded_matrix_einsum, banded_matrix_scipy, banded_matrix],
    n_range = [50, 100, 150, 200, 250, 300],
    logx = False
)
scipy.linalg.block_diag vs np.einsum details
perfplot.show(
    setup = lambda M: M,
    kernels = [banded_matrix_einsum, banded_matrix_scipy],
    n_range = [50, 100, 150, 200, 250, 300, 350, 400],
    logx = False
)
As another option, you can use the numba accelerator to speed this up with JIT compilation. I propose a numba equivalent of scipy.linalg.block_diag, based on paime's answer:
import numba as nb
@nb.njit
def block_diag_numba(result, ham_phis):
    for i in range(ham_phis.shape[0]):
        for j in range(ham_phis.shape[1]):
            result[i * 2, i * 2:i * 2 + 2] = ham_phis[i, 0]
            result[i * 2 + 1, i * 2:i * 2 + 2] = ham_phis[i, 1]
    return result
def numba_(M):
    ham = np.array([[1, 1], [1, -1]])
    phis = np.linspace(0, 2 * np.pi, M)
    ham_phis = ham * (1 + np.cos(phis))[:, None, None]
    return block_diag_numba(np.zeros((M * ham.shape[1], M * ham.shape[1])), ham_phis)
This method is at least 4-5 times faster than the previous ones for M up to 400 (microsecond scale). It can be adjusted for other array shapes and improved further, e.g. by optimizing the code (not basing it on paime's answer), moving all the code into the numba function, or parallelizing. I didn't go further because the performance of paime's answer seemed satisfactory to the OP, judging by its acceptance; this is just to show that numba can be used to write a much faster equivalent of scipy.linalg.block_diag.
I have different sized vectors and want to do element-wise manipulations. How can I optimize the following for-loop in Python? (For instance with np.vectorize())
import numpy as np
n = 1000000
vec1 = np.random.rand(n)
vec2 = np.random.rand(3*n)
vec3 = np.random.rand(3*n)
for i in range(len(vec1)):
    if vec1[i] < 0.5:
        vec2[3*i : 3*(i+1)] = vec1[i]*vec3[3*i : 3*(i+1)]
    else:
        vec2[3*i : 3*(i+1)] = [0,0,0]
Thanks a lot for your help.
We could leverage broadcasting -
v = vec3.reshape(-1,3)*vec1[:,None]
m = vec1<0.5
vec2_out = (v*m[:,None]).ravel()
Another way to express that would be -
mask = vec1<0.5
vec2_out = (vec3.reshape(-1,3)*(vec1*mask)[:,None]).ravel()
And use multi-cores with numexpr module -
import numexpr as ne
d = {'V3r':vec3.reshape(-1,3),'vec12D':vec1[:,None]}
out = ne.evaluate('V3r*vec12D*(vec12D<0.5)',d).ravel()
Timings -
In [84]: n = 1000000
...: np.random.seed(0)
...: vec1 = np.random.rand(n)
...: vec2 = np.random.rand(3*n)
...: vec3 = np.random.rand(3*n)
In [86]: %%timeit
...: v = vec3.reshape(-1,3)*vec1[:,None]
...: m = vec1<0.5
...: vec2_out = (v*m[:,None]).ravel()
10 loops, best of 3: 23.2 ms per loop
In [87]: %%timeit
...: mask = vec1<0.5
...: vec2_out = (vec3.reshape(-1,3)*(vec1*mask)[:,None]).ravel()
100 loops, best of 3: 13.1 ms per loop
In [88]: %%timeit
...: d = {'V3r':vec3.reshape(-1,3),'vec12D':vec1[:,None]}
...: out = ne.evaluate('V3r*vec12D*(vec12D<0.5)',d).ravel()
100 loops, best of 3: 4.11 ms per loop
For a generic case, where the else-part could be something other than zeros, it would be -
mask = vec1<0.5
IF_vals = vec3.reshape(-1,3)*vec1[:,None]
ELSE_vals = np.array([1,1,1])
out = np.where(mask[:,None],IF_vals,ELSE_vals).ravel()
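A quick way to convince yourself that the vectorized forms match the original loop is to compare them on a small random input. This is a sketch; the function names (loop_reference, masked_broadcast, where_version) and the seed are made up for the illustration.

```python
import numpy as np

def loop_reference(vec1, vec3):
    """The loop from the question, writing into a fresh output array."""
    out = np.empty(3 * len(vec1))
    for i in range(len(vec1)):
        if vec1[i] < 0.5:
            out[3*i : 3*(i+1)] = vec1[i] * vec3[3*i : 3*(i+1)]
        else:
            out[3*i : 3*(i+1)] = 0
    return out

def masked_broadcast(vec1, vec3):
    """Zero out vec1 where the condition fails, then broadcast against (n, 3)."""
    mask = vec1 < 0.5
    return (vec3.reshape(-1, 3) * (vec1 * mask)[:, None]).ravel()

def where_version(vec1, vec3, else_vals=(0.0, 0.0, 0.0)):
    """Generic form: pick IF values or ELSE values row by row."""
    mask = vec1 < 0.5
    if_vals = vec3.reshape(-1, 3) * vec1[:, None]
    return np.where(mask[:, None], if_vals, np.asarray(else_vals)).ravel()

rng = np.random.default_rng(0)   # arbitrary seed, for reproducibility only
vec1 = rng.random(1000)
vec3 = rng.random(3000)

expected = loop_reference(vec1, vec3)
broadcast_ok = np.allclose(masked_broadcast(vec1, vec3), expected)
where_ok = np.allclose(where_version(vec1, vec3), expected)
print(broadcast_ok, where_ok)
```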
numpy.vectorize, as mentioned in the comments, is for convenience, not performance, per the docs:
The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop.
One solution to actually vectorize this would be:
vec2[:] = vec1.repeat(3) * vec3 # Bulk compute all results
vec2[(vec1 >= 0.5).repeat(3)] = 0 # Zero the results you meant to exclude
Another approach (that minimizes temporaries) would be to filter and reshape vec1 so it can be assigned to vec2, then multiply vec2 by vec3 in place to avoid a temporary (beyond the two length-n arrays from the first step), e.g.:
vec2.reshape(-1, 3)[:] = (vec1 * (vec1 < 0.5)).reshape(-1, 1)
vec2 *= vec3
An additional temporary could be shaved if vec1 can be modified, simplifying to:
vec1 *= vec1 < 0.5
vec2.reshape(-1, 3)[:] = vec1.reshape(-1, 1)
vec2 *= vec3
The reshape/broadcasting that @Divakar demonstrates is equivalent to rewriting your iteration as:
In [5]: n = 10
   ...: vec1 = np.random.rand(n)
   ...: vec2 = np.zeros((n,3))
   ...: vec3 = np.random.rand(n,3)
   ...:
   ...: for i in range(len(vec1)):
   ...:     if vec1[i] < 0.5:
   ...:         vec2[i,:] = vec1[i]*vec3[i,:]
   ...:     else:
   ...:         vec2[i,:] = 0
   ...:
In [6]: vec2
Out[6]:
array([[0. , 0. , 0. ],
[0. , 0. , 0. ],
[0. , 0. , 0. ],
[0. , 0. , 0. ],
[0.119655 , 0.05079028, 0.00392748],
[0.04529872, 0.04630456, 0.01565116],
[0. , 0. , 0. ],
[0. , 0. , 0. ],
[0. , 0. , 0. ],
[0.08361475, 0.21825921, 0.1273483 ]])
In [7]: vec1
Out[7]:
array([0.934649 , 0.85309325, 0.50775071, 0.91246865, 0.12970539,
0.13075136, 0.89861756, 0.68921343, 0.80572879, 0.25996369])
By defining vec2 as an (n,3) array, we replace the indexing vec2[3*i : 3*(i+1)] with vec2[i,:] or vec2[i].
Using a mask to set values to 0 is a good basic numpy idea. But ufuncs also provide a where parameter, which can be used like this:
In [11]: vec2 = np.zeros((n,3))
In [12]: np.multiply(vec1[:,None],vec3, out=vec2, where=vec1[:,None]<0.5);
In [13]: vec2
Out[13]:
array([[0. , 0. , 0. ],
[0. , 0. , 0. ],
[0. , 0. , 0. ],
[0. , 0. , 0. ],
[0.119655 , 0.05079028, 0.00392748],
[0.04529872, 0.04630456, 0.01565116],
[0. , 0. , 0. ],
[0. , 0. , 0. ],
[0. , 0. , 0. ],
[0.08361475, 0.21825921, 0.1273483 ]])
This where needs to be used in conjunction with an out parameter, since it only does the multiply for the True instances; the remaining entries of out are left as they were.
I'm not sure how much of a time saver it is.
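To make the role of out explicit, here is a tiny sketch with hand-picked numbers (not the random data above). The output array is pre-filled with -1 instead of 0 purely so that the untouched rows are easy to spot.

```python
import numpy as np

vec1 = np.array([0.9, 0.1, 0.7, 0.2])
vec3 = np.arange(12, dtype=float).reshape(4, 3)

# Pre-fill with a sentinel; rows where the condition is False keep this value.
out = np.full((4, 3), -1.0)
np.multiply(vec1[:, None], vec3, out=out, where=vec1[:, None] < 0.5)

print(out)
# rows 0 and 2 (vec1 >= 0.5) are still -1,
# rows 1 and 3 hold vec1[i] * vec3[i]
```

This is why the example above starts from np.zeros: the zeros play the role of the else branch.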
x has shape [batch_size, n_time], where the batches are independent.
For k=3 and d=discount_rate, in pseudocode:
x[:,i] = x[:,i] + x[:,i+1]*(d**1) + x[:,i+2]*(d**2) + x[:,i+3]*(d**3)
Here's working code, but it's very slow. I'll be executing this function millions of times, so I'm hoping for a faster implementation.
import numpy as np
def k_step_discount(x, k, discount_rate):
    n_time = x.shape[1]
    k_include_cur = k + 1 # k excludes current timestep
    for i in range(n_time):
        k_cur = min(n_time - i, k_include_cur) # prevent out of bounds
        for j in range(1, k_cur):
            x[:, i] += x[:, i+j] * (discount_rate ** j)
    return x
x = np.array([
    [0,0,0,1,0,0],
    [0,1,2,3,4,5.]
])
y = k_step_discount(x+0, k=2, discount_rate=.9)
print('x\n{}\ny\n{}'.format(x, y))
>> x
[[ 0. 0. 0. 1. 0. 0.]
[ 0. 1. 2. 3. 4. 5.]]
>> y
[[ 0. 0.81 0.9 1. 0. 0. ]
[ 2.52 5.23 7.94 10.65 8.5 5. ]]
A scipy function that's similar is:
import scipy.signal
import numpy as np
x = np.array([[0,0,0,1,0,0.]])
discount_rate = .9
y = np.flip(scipy.signal.lfilter([1], [1, -discount_rate], np.flip(x+0, 1), axis=1), 1)
print('x\n{}\ny\n{}'.format(x, y))
>> x
[[ 0. 0. 0. 1. 0. 0.]]
>> y
[[ 0.729 0.81 0.9 1. 0. 0. ]]
However, it discounts until the end of n_time rather than only for k steps
I'm also interested in K-step discounting without batches, if that'd be easier/faster
import numpy as np
def k_step_discount_no_batch(x, k, discount_rate):
    n_time = x.shape[0]
    k_include_cur = k + 1 # k excludes current timestep
    for i in range(n_time):
        k_cur = min(n_time - i, k_include_cur) # prevent out of bounds
        for j in range(1, k_cur):
            x[i] += x[i+j] * (discount_rate ** j)
    return x
x = np.array([8,0,0,0,1,2.])
y = k_step_discount_no_batch(x+0, k=2, discount_rate=.9)
print('x\n{}\ny\n{}'.format(x, y))
>> x
[ 8. 0. 0. 0. 1. 2.]
>> y
[ 8. 0. 0.81 2.52 2.8 2. ]
Similar no_batch scipy function
import scipy.signal
import numpy as np
x = np.array([8,0,0,0,1,2.])
discount_rate = .9
y = scipy.signal.lfilter([1], [1, -discount_rate], x[::-1], axis=0)[::-1]
print('x\n{}\ny\n{}'.format(x, y))
>> x
[ 8. 0. 0. 0. 1. 2.]
>> y
[ 9.83708 2.0412 2.268 2.52 2.8 2. ]
You could use 2D convolution here. To get the scaling right, we need to create the proper 2D kernel, which is a flipped version of the powers of discount_rate. This follows from the definition of convolution, in which the kernel is slid over the input data in flipped order, and the input elements are scaled by the kernel elements and summed up, exactly as needed in this case.
Thus, the implementation would be simply -
from scipy.signal import convolve2d as conv2d
import numpy as np
def k_step_discount_conv2d(x, k, discount_rate, is_batch=True):
    if is_batch:
        kernel = discount_rate**np.arange(k+1)[::-1][None]
        return conv2d(x,kernel)[:,k:]
    else:
        kernel = discount_rate**np.arange(k+1)[::-1]
        return np.convolve(x, kernel)[k:]
Sample run -
In [190]: x
Out[190]:
array([[ 0., 0., 0., 1., 0., 0.],
[ 0., 1., 2., 3., 4., 5.]])
# Proposed method
In [191]: k_step_discount_conv2d(x, k=2, discount_rate=0.9)
Out[191]:
array([[ 0. , 0.81, 0.9 , 1. , 0. , 0. ],
[ 2.52, 5.23, 7.94, 10.65, 8.5 , 5. ]])
# Original loopy method
In [192]: k_step_discount(x, k=2, discount_rate=.9)
Out[192]:
array([[ 0. , 0.81, 0.9 , 1. , 0. , 0. ],
[ 2.52, 5.23, 7.94, 10.65, 8.5 , 5. ]])
Runtime test
In [206]: x = np.random.randint(0,9,(100,1000)).astype(float)
In [207]: %timeit k_step_discount_conv2d(x, k=2, discount_rate=0.9)
1000 loops, best of 3: 1.27 ms per loop
In [208]: %timeit k_step_discount(x, k=2, discount_rate=.9)
100 loops, best of 3: 4.83 ms per loop
With bigger k's :
In [215]: x = np.random.randint(0,9,(100,1000)).astype(float)
In [216]: %timeit k_step_discount_conv2d(x, k=20, discount_rate=0.9)
100 loops, best of 3: 5.44 ms per loop
In [217]: %timeit k_step_discount(x, k=20, discount_rate=.9)
10 loops, best of 3: 44.8 ms per loop
Thus, expect huge speedups with bigger k's!
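To see why the flipped kernel reproduces the loop, here is a minimal 1-D sketch on the no-batch example from the question. The function names are made up for the illustration, and the loop version works on a copy so the input is not modified.

```python
import numpy as np

def k_step_discount_loop(x, k, discount_rate):
    """Reference: the explicit double loop, applied to a copy of a 1-D input."""
    x = np.array(x, dtype=float)
    n_time = x.shape[0]
    for i in range(n_time):
        for j in range(1, min(n_time - i, k + 1)):
            x[i] += x[i + j] * discount_rate ** j
    return x

def k_step_discount_conv(x, k, discount_rate):
    """Full convolution with the flipped kernel, dropping the first k outputs."""
    # kernel = [d**k, ..., d**1, d**0]; convolution flips it back, so that
    # output i (after dropping k samples) is sum_j d**j * x[i + j]
    kernel = discount_rate ** np.arange(k + 1)[::-1]
    return np.convolve(x, kernel)[k:]

x = np.array([8, 0, 0, 0, 1, 2.0])
y_loop = k_step_discount_loop(x, k=2, discount_rate=0.9)
y_conv = k_step_discount_conv(x, k=2, discount_rate=0.9)
print(y_loop)
print(y_conv)
```

The full convolution implicitly pads with zeros past the end of x, which plays the role of the out-of-bounds check in the loop.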
Further boost
As suggested by @Eric, we could also leverage scipy.ndimage's 1D convolution here.
For a proper comparison listing both with Scipy's 2D and 1D convolution methods -
from scipy.ndimage import convolve1d as conv1d
def using_conv2d(x, k, discount_rate):
    kernel = discount_rate**np.arange(k+1)[::-1][None]
    return conv2d(x,kernel)[:,k:]
def using_conv1d(x, k, discount_rate):
    kernel = discount_rate**np.arange(k+1)[::-1]
    return conv1d(x,kernel, mode='constant', origin=k//2)
Timings -
In [100]: x = np.random.randint(0,9,(100,1000)).astype(float)
In [101]: out1 = using_conv2d(x, k=20, discount_rate=0.9)
...: out2 = using_conv1d(x, k=20, discount_rate=0.9)
...:
In [102]: np.allclose(out1, out2)
Out[102]: True
In [103]: %timeit using_conv2d(x, k=20, discount_rate=0.9)
100 loops, best of 3: 5.27 ms per loop
In [104]: %timeit using_conv1d(x, k=20, discount_rate=0.9)
1000 loops, best of 3: 1.43 ms per loop
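The lfilter attempt from the question can also be turned into a k-step version by using a finite (FIR) filter instead of the recursive one: put the k+1 discount factors in the numerator and use [1] as the denominator. This is a sketch of that alternative, not one of the benchmarked methods above; k_step_discount_fir is a hypothetical name and no timing claim is made for it.

```python
import numpy as np
from scipy.signal import lfilter

def k_step_discount_fir(x, k, discount_rate):
    """k-step discounting via an FIR filter on the time-reversed signal."""
    b = discount_rate ** np.arange(k + 1)      # taps 1, d, d**2, ..., d**k
    x = np.asarray(x, dtype=float)
    # Reverse time, filter along the last axis, reverse back.
    # Works for a 1-D signal as well as for a [batch_size, n_time] array.
    return lfilter(b, [1.0], x[..., ::-1], axis=-1)[..., ::-1]

x = np.array([
    [0, 0, 0, 1, 0, 0],
    [0, 1, 2, 3, 4, 5.],
])
y = k_step_discount_fir(x, k=2, discount_rate=0.9)
print(y)
```

With zero initial conditions the filter treats samples past the end of the signal as zero, so the last few time steps are handled the same way as in the loop.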
I have a NumPy array of coordinates. For example purposes, I will use this
In [1]: np.random.seed(123)
In [2]: coor = np.random.randint(10, size=12).reshape(-1,3)
In [3]: coor
Out[3]: array([[2, 2, 6],
[1, 3, 9],
[6, 1, 0],
[1, 9, 0]])
I want the triangular matrix of distances between all coordinates. A simple approach would be to code a double loop over all coordinates
In [4]: n_coor = len(coor)
In [5]: dist = np.zeros((n_coor, n_coor))
In [6]: for j in range(n_coor):
   ...:     for k in range(j+1, n_coor):
   ...:         dist[j, k] = np.sqrt(np.sum((coor[j] - coor[k]) ** 2))
with the result being an upper triangular matrix of the distances
In [7]: dist
Out[7]: array([[ 0. , 3.31662479, 7.28010989, 9.2736185 ],
[ 0. , 0. , 10.48808848, 10.81665383],
[ 0. , 0. , 0. , 9.43398113],
[ 0. , 0. , 0. , 0. ]])
Leveraging NumPy, I can avoid looping using
In [8]: dist = np.sqrt(((coor[:, None, :] - coor) ** 2).sum(-1))
but the result is the entire matrix
In [9]: dist
Out[9]: array([[ 0. , 3.31662479, 7.28010989, 9.2736185 ],
[ 3.31662479, 0. , 10.48808848, 10.81665383],
[ 7.28010989, 10.48808848, 0. , 9.43398113],
[ 9.2736185 , 10.81665383, 9.43398113, 0. ]])
This one-line version takes roughly half the time when I use 2048 coordinates (4 s instead of 10 s), but it does twice as many calculations as needed because it computes the full symmetric matrix. Is there a way to adjust the one-line command to compute only the triangular matrix (and get the additional 2x speedup, i.e. 2 s)?
We can use SciPy's pdist method to get those distances. So, we just need to initialize the output array and then set the upper triangular values with those distances
from scipy.spatial.distance import pdist
n_coor = len(coor)
dist = np.zeros((n_coor, n_coor))
row,col = np.triu_indices(n_coor,1)
dist[row,col] = pdist(coor)
Alternatively, we can use boolean-indexing to assign values, replacing the last two lines
dist[np.arange(n_coor)[:,None] < np.arange(n_coor)] = pdist(coor)
Runtime test
Functions:
def subscripted_indexing(coor):
    n_coor = len(coor)
    dist = np.zeros((n_coor, n_coor))
    row,col = np.triu_indices(n_coor,1)
    dist[row,col] = pdist(coor)
    return dist
def boolean_indexing(coor):
    n_coor = len(coor)
    dist = np.zeros((n_coor, n_coor))
    r = np.arange(n_coor)
    dist[r[:,None] < r] = pdist(coor)
    return dist
Timings:
In [110]: # Setup input array
...: coor = np.random.randint(0,10, (2048,3))
In [111]: %timeit subscripted_indexing(coor)
10 loops, best of 3: 91.4 ms per loop
In [112]: %timeit boolean_indexing(coor)
10 loops, best of 3: 47.8 ms per loop
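As a check on the small example from the question, the pdist-based upper triangle can be compared with the upper triangle of the full broadcast matrix. This sketch also uses scipy.spatial.distance.squareform, which is not mentioned above, to expand the condensed distances into the full symmetric matrix in case that is ever needed.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# The four coordinates from the question
coor = np.array([[2, 2, 6], [1, 3, 9], [6, 1, 0], [1, 9, 0]])
n_coor = len(coor)

# Full symmetric matrix via broadcasting (computes every pair twice)
full = np.sqrt(((coor[:, None, :] - coor) ** 2).sum(-1))

# Upper triangle only: pdist returns the pairs (0,1), (0,2), ..., (2,3),
# the same row-major order that np.triu_indices(n, 1) produces.
upper = np.zeros((n_coor, n_coor))
upper[np.triu_indices(n_coor, 1)] = pdist(coor)

upper_matches = np.allclose(upper, np.triu(full, k=1))
full_matches = np.allclose(squareform(pdist(coor)), full)
print(upper_matches, full_matches)
```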
I'm working on a project that involves a lot of matrix computation.
I'm looking for a smart way to speed up my code. In my project, I'm dealing with a sparse matrix of size 100Mx1M with around 10M non-zero values. The example below is just to illustrate my point.
Let's say I have:
A vector v of size (2)
A vector c of size (3)
A sparse matrix X of size (2,3)
import numpy as np
import scipy.sparse
from scipy.sparse import coo_matrix
v = np.asarray([10, 20])
c = np.asarray([ 2, 3, 4])
data = np.array([1, 1, 1, 1])
row = np.array([0, 0, 1, 1])
col = np.array([1, 2, 0, 2])
X = coo_matrix((data,(row,col)), shape=(2,3))
X.todense()
# matrix([[0, 1, 1],
# [1, 0, 1]])
Currently I'm doing:
result = np.zeros_like(v, dtype=float)
d = scipy.sparse.lil_matrix((v.shape[0], v.shape[0]))
d.setdiag(v)
tmp = d * X
print(tmp.todense())
#matrix([[ 0., 10., 10.],
# [ 20., 0., 20.]])
# At this point tmp is csr sparse matrix
for i in range(tmp.shape[0]):
    x_i = tmp.getrow(i)
    result += x_i.data * ( c[x_i.indices] - x_i.data)
    # I only want to do the subtraction on non-zero elements
print(result)
# array([-430., -380.])
My problem is the for loop, and especially the subtraction.
I would like to find a way to vectorize this operation by subtracting only on the non-zero elements.
Ideally, I would directly get the sparse matrix resulting from the subtraction:
matrix([[ 0., -7., -6.],
[ -18., 0., -16.]])
Is there a way to do this smartly?
You don't need to loop over the rows to do what you are already doing. And you can use a similar trick to perform the multiplication of the rows by the first vector:
import scipy.sparse as sps
# number of nonzero entries per row of X
# (X must be in CSR format for indptr/indices; use X = X.tocsr() if needed)
nnz_per_row = np.diff(X.indptr)
# multiply every row by the corresponding entry of v
# You could do this in-place as:
# X.data *= np.repeat(v, nnz_per_row)
Y = sps.csr_matrix((X.data * np.repeat(v, nnz_per_row), X.indices, X.indptr),
                   shape=X.shape)
# subtract from the non-zero entries the corresponding column value in c...
Y.data -= np.take(c, Y.indices)
# ...and multiply by -1 to get the value you are after
Y.data *= -1
To see that it works, set up some dummy data
rows, cols = 3, 5
v = np.random.rand(rows)
c = np.random.rand(cols)
X = sps.rand(rows, cols, density=0.5, format='csr')
and after running the code above:
>>> x = X.toarray()
>>> mask = x == 0
>>> x *= v[:, np.newaxis]
>>> x = c - x
>>> x[mask] = 0
>>> x
array([[ 0.79935123, 0. , 0. , -0.0097763 , 0.59901243],
[ 0.7522559 , 0. , 0.67510109, 0. , 0.36240006],
[ 0. , 0. , 0.72370725, 0. , 0. ]])
>>> Y.toarray()
array([[ 0.79935123, 0. , 0. , -0.0097763 , 0.59901243],
[ 0.7522559 , 0. , 0.67510109, 0. , 0.36240006],
[ 0. , 0. , 0.72370725, 0. , 0. ]])
The way you are accumulating your result requires that there are the same number of non-zero entries in every row, which seems a pretty weird thing to do. Are you sure that is what you are after? If that's really what you want you could get that value with something like:
result = np.sum(Y.data.reshape(Y.shape[0], -1), axis=0)
but I have trouble believing that is really what you are after...
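For reference, here is the same idea applied to the small example from the question. This is a sketch that assumes the COO matrix is first converted to CSR (which is what provides indptr and indices); the names scaled, diff and D are made up for the illustration.

```python
import numpy as np
from scipy.sparse import coo_matrix, csr_matrix

v = np.array([10, 20])
c = np.array([2, 3, 4])
data = np.array([1, 1, 1, 1])
row = np.array([0, 0, 1, 1])
col = np.array([1, 2, 0, 2])
X = coo_matrix((data, (row, col)), shape=(2, 3)).tocsr()

nnz_per_row = np.diff(X.indptr)

# Non-zero values of diag(v) @ X: every stored entry times its row's v
scaled = X.data * np.repeat(v, nnz_per_row)

# c[column] - value, evaluated on the stored entries only
diff = c[X.indices] - scaled

# Sparse matrix of the subtraction, with the same sparsity pattern as X
D = csr_matrix((diff, X.indices, X.indptr), shape=X.shape)
print(D.toarray())

# The accumulated result of the question's loop; the reshape only works
# because every row has the same number of stored entries.
result = (scaled * diff).reshape(X.shape[0], -1).sum(axis=0)
print(result)
```

On this input D matches the subtraction matrix given in the question, and result matches the values printed by the original loop.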