I am trying to solve a linear system with a banded sparse matrix (effectively inverting it) in the most efficient way so that I can incorporate this in my real-time system. I am generating sparse banded matrices which represent a convolution operation. Currently, I am using spsolve from the scipy.sparse.linalg library. I found that there is a better way by using solve_banded from the scipy.linalg library. However, solve_banded requires (l, u), which are the numbers of non-zero lower and upper diagonals, and ab, which is an (l + u + 1, M) array holding the matrix in banded form. I am not sure how to convert my code so that I can use solve_banded. Any help in this regard is highly appreciated.
import numpy as np
from scipy import linalg
import math
import time
from scipy.sparse import spdiags
from scipy.sparse.linalg import spsolve
def ABC(deg, fc, N):
    r"""Generate sparse-banded matrices
    """
    omc = 2*math.pi*fc
    t = ((1-math.cos(omc))/(1+math.cos(omc)))**deg

    p = 1
    for k in np.arange(deg):
        p = np.convolve(p, np.array([-1, 1]), 'full')
    P = spdiags(np.kron(p, np.ones((N, 1))).T, np.arange(deg+1), N-deg, N)
    B = P.T.dot(P)

    q = np.sqrt(t)
    for k in np.arange(deg):
        q = np.convolve(q, np.array([1, 1]), 'full')
    Q = spdiags(np.kron(q, np.ones((N, 1))).T, np.arange(deg+1), N-deg, N)
    C = Q.T.dot(Q)

    A = B + C
    return A, B, C
if __name__ == '__main__':
    mu = 0.1
    deg = 3
    wc = 0.1

    for i in np.arange(1, 7, 1):
        # some dense random vector
        x = np.random.rand(10**i, 1)

        # generate sparse banded matrices
        A, _, C = ABC(deg, wc, 10**i)

        # another banded matrix
        G = mu*A.dot(A.T) + C.dot(C.T)

        # SCIPY SPSOLVE
        st = time.time()
        y = spsolve(G, x)
        et = time.time()
        print("SCIPY SPSOLVE: N = ", 10**i, "Time taken: ", et-st)
Results
SCIPY SPSOLVE: N = 10 Time taken: 0.0
SCIPY SPSOLVE: N = 100 Time taken: 0.0
SCIPY SPSOLVE: N = 1000 Time taken: 0.015689611434936523
SCIPY SPSOLVE: N = 10000 Time taken: 0.020943641662597656
SCIPY SPSOLVE: N = 100000 Time taken: 0.16722917556762695
SCIPY SPSOLVE: N = 1000000 Time taken: 1.7254831790924072
Solved it using solveh_banded from the scipy library. It is a very fast way to solve against extremely large sparse banded matrices, provided the matrix is symmetric, positive definite and banded.
from scipy.linalg import solveh_banded
def sp_inv(A, x):
    # convert the sparse banded matrix into the (D, N) lower band storage
    # expected by solveh_banded
    A = A.toarray()
    N = np.shape(A)[0]
    D = np.count_nonzero(A[0, :])   # number of bands (main diagonal + sub-diagonals)
    ab = np.zeros((D, N))
    for i in np.arange(1, D):
        ab[i, :] = np.concatenate((np.diag(A, k=i), np.zeros(i,)), axis=None)
    ab[0, :] = np.diag(A, k=0)
    y = solveh_banded(ab, x, lower=True)
    return y
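For reference, a minimal usage sketch (added for illustration; it assumes G and x are built exactly as in the timing loop above):
st = time.time()
y = sp_inv(G, x)
et = time.time()
print("SCIPY SOLVEH_BANDED: N = ", G.shape[0], "Time taken: ", et - st)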
I am using the root_scalar function from the scipy.optimize module to find the root of a complex function defined in sympy. However, the function takes around 15-20 seconds to return the root, and I need to find a way to speed up this computation. Is it possible to convert the entire sympy function to scipy for faster processing, or is there any other way to optimize this process and reduce the computation time?
from sympy.stats import Gamma, density, cdf, E, variance
from sympy import Symbol, pprint, simplify
import numpy as np
l = 7
m = 30
p = 17
w = 6
K = 500
c = 6
h = 0.1
mean = 500
std = 296
def calculate_mean(days):
    return mean*days

def calculate_std(days):
    return std*np.sqrt(days)

def calculate_mean_std(days):
    mean = calculate_mean(days)
    std = calculate_std(days)
    return mean, std
mean_m, std_m = calculate_mean_std(m)
mean_l, std_l = calculate_mean_std(l)
shape_m = (mean_m/std_m)**2
scale_m = std_m**2/mean_m
shape_l = (mean_l/std_l)**2
scale_l = std_l**2/mean_l
k = Symbol("k", positive=True)
theta = Symbol("theta", positive=True)
x = Symbol("x")
X = Gamma("z", k, theta)
P = density(X)(x)
C = cdf(X, meijerg=True)(x)
cdf_m_symb = C.subs([(theta, scale_m) , (k, shape_m)])
cdf_l_symb = C.subs([(theta, scale_l) , (k, shape_l)])
pdf_m_symb = P.subs([(theta, scale_m) , (k, shape_m)])
pdf_l_symb = P.subs([(theta, scale_l) , (k, shape_l)])
max_Q = np.ceil(mean*(m+l)).astype(int)
def g(r: float) -> float:
    result = sp.N(-p + (p + w * cdf_m_symb.subs(x, max_Q)) * cdf_l_symb.subs(x, r) + \
                  w * sp.Integral(cdf_l_symb * pdf_m_symb.subs(x, (r + max_Q - x)), (x, 0, r)))
    return result
from scipy.optimize import root_scalar
import sympy as sp
import time
start_time = time.time()
r0 = 200 # initial estimate for the root
bracket = (-10, 5000) # the upper and lower bounds of where the root is
solution = root_scalar(g, x0=r0, bracket=bracket)
print(solution) # info about the convergence
print("Results: ",solution.root) # the actual number
end_time = time.time()
print("Time taken:", end_time - start_time)
Here is the output from the above code
converged: True
flag: 'converged'
function_calls: 10
iterations: 9
root: 3966.9429368680453
Results: 3966.9429368680453
Time taken: 13.81236743927002
I have provided the code that I am currently using and the output that it produces. Any suggestions or examples of how to optimize this process would be greatly appreciated.
Compute the integral numerically, tabulate g(x) and interpolate x(g). Then your root-finding is nothing but evaluating a spline at a given point. Can't get any faster than that.
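A rough numerical sketch of that suggestion (added for illustration, not the poster's code): since the distributions are plain Gammas, the symbolic expressions can be swapped for scipy.stats.gamma objects built from shape_m, scale_m, shape_l, scale_l as computed in the question, the integral evaluated with scipy.integrate.quad, and the root read off an interpolating spline of the tabulated g values. The constants p, w and max_Q are reused from the question.
import numpy as np
from scipy import stats, integrate, interpolate

# same Gamma distributions as the sympy ones, but as frozen scipy distributions
gamma_m = stats.gamma(a=shape_m, scale=scale_m)
gamma_l = stats.gamma(a=shape_l, scale=scale_l)
cdf_m_at_maxQ = gamma_m.cdf(max_Q)

def g_num(r):
    # purely numerical version of g: the symbolic Integral is replaced by quad
    integrand = lambda t: gamma_l.cdf(t) * gamma_m.pdf(r + max_Q - t)
    integral, _ = integrate.quad(integrand, 0, r)
    return -p + (p + w * cdf_m_at_maxQ) * gamma_l.cdf(r) + w * integral

# tabulate g on a grid of candidate roots and read the zero crossing off a spline
r_grid = np.linspace(1.0, 5000.0, 200)
g_vals = np.array([g_num(r) for r in r_grid])
spline = interpolate.InterpolatedUnivariateSpline(r_grid, g_vals)
print(spline.roots())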
X is a T by m matrix (given matrix)
B is a T by n matrix (variable 1)
A is an n by m matrix (variable 2)
I want to minimize the Frobenius norm ||X - B*A|| and find A and B for that, using Python / cvxpy.
I did this in Matlab and it works fine.
The discriminative disaggregation sparse coding algorithm for energy disaggregation was successfully implemented in Matlab, but it is difficult to use for large sample sets, so I need to implement it in Python.
import cvxpy as cp
import numpy as np
n = 5
m = 4
T = 3
np.random.seed(1)
A = cp.Variable((n, m))
B = cp.Variable((T, n))
x = np.random.rand(T, m)
constraints = [A >= 0,
B >= 0]
obj = cp.Minimize(cp.norm(x - cp.matmul(B,A),"fro"))
prob = cp.Problem(obj,constraints)
prob.solve()
I need to use cvxpy or any other Python tool to minimize this objective over the two matrix variables.
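A side note (added for illustration, not part of the original post): with both A and B declared as cvxpy Variables, the product B @ A is not convex, so prob.solve() will reject the problem as non-DCP. A common workaround is alternating minimization: fix one factor, solve the convex subproblem for the other, and repeat. A minimal sketch of that scheme under the same shapes as above:
import cvxpy as cp
import numpy as np

n, m, T = 5, 4, 3
np.random.seed(1)
X = np.random.rand(T, m)

# random nonnegative starting guess for A
A_val = np.random.rand(n, m)

for _ in range(20):
    # A fixed: convex least-squares problem in B
    B = cp.Variable((T, n), nonneg=True)
    cp.Problem(cp.Minimize(cp.norm(X - B @ A_val, "fro"))).solve()
    B_val = B.value
    # B fixed: convex least-squares problem in A
    A = cp.Variable((n, m), nonneg=True)
    cp.Problem(cp.Minimize(cp.norm(X - B_val @ A, "fro"))).solve()
    A_val = A.value

print(np.linalg.norm(X - B_val @ A_val))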
I have a very large matrix, but I only want to find the eigenvectors (more than one) associated with one specific eigenvalue. How can I get these without computing all the eigenvalues and eigenvectors of this matrix in Python?
One option could be to use the shift-invert method. The function eigs in scipy has an optional parameter sigma, with which it is possible to specify the value close to which it should search for eigenvalues:
import numpy as np
from scipy.sparse.linalg import eigs
np.random.seed(42)
N = 10
A = np.random.random_sample((N, N))
A += A.T
A += N*np.identity(N)
#get N//2 largest eigenvalues
l,_ = eigs(A, N//2)
print(l)
#get 2 eigenvalues closest in magnitude to 12
l,_ = eigs(A, 2, sigma = 12)
print(l)
This produces:
[ 19.52479260+0.j 12.28842653+0.j 11.43948696+0.j 10.89132148+0.j
10.79397596+0.j]
[ 12.28842653+0.j 11.43948696+0.j]
EDIT:
If you know the eigenvalues in advance, you could try to calculate a basis of the corresponding null space. For example:
import numpy as np
from numpy.linalg import eig, svd, norm
from scipy.sparse.linalg import eigs
from scipy.linalg import orth
def nullspace(A, atol=1e-13, rtol=0):
    A = np.atleast_2d(A)
    u, s, vh = svd(A)
    tol = max(atol, rtol * s[0])
    nnz = (s >= tol).sum()
    ns = vh[nnz:].conj().T
    return ns
np.random.seed(42)
eigen_values = [1,2,3,3,4,5]
N = len(eigen_values)
D = np.matrix(np.diag(eigen_values))
#generate random unitary matrix
U = np.matrix(orth(np.random.random_sample((N, N))))
#construct test matrix - it has the same eigenvalues as D
A = U.T * D * U
#get eigenvectors corresponding to eigenvalue 3
Omega = nullspace(A - np.eye(N)*3)
_,M = Omega.shape
for i in range(0, M):
    v = Omega[:, i]
    print(i, norm(A*v - 3*v))
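For a genuinely large sparse matrix the dense SVD above becomes impractical; a sketch (added for illustration, assuming the matrix is symmetric as in this example) that combines the two ideas is to hand the known eigenvalue to eigsh as a shift, offset slightly so that A - sigma*I stays non-singular for the shift-invert factorization:
from scipy.sparse.linalg import eigsh

As = np.asarray(A)                      # plain ndarray view of the test matrix
# eigenvalue 3 has multiplicity 2 here, so ask for k=2 eigenpairs near sigma
vals, vecs = eigsh(As, k=2, sigma=3.001)
for i in range(vecs.shape[1]):
    v = vecs[:, i]
    print(i, vals[i], norm(As.dot(v) - 3*v))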
I am trying to evaluate the density of a multivariate t distribution for a 13-d vector. Using the dmvt function from the mvtnorm package in R, the result I get is
[1] 1.009831e-13
When I tried to write the function by myself in Python (thanks to the suggestions in this post:
multivariate student t-distribution with python), I realized that the gamma function was taking very high values (given the fact that I have n=7512 observations), making my function go out of range.
I tried to modify the algorithm, using the math.lgamma() and np.linalg.slogdet() functions to transform it to the log scale, but the result I got was
8.97669876e-15
This is the function I used in Python:
def dmvt(x, mu, Sigma, df, d):
    '''
    Multivariate t-student density:
    output:
        the density of the given element
    input:
        x = parameter (d dimensional numpy array or scalar)
        mu = mean (d dimensional numpy array or scalar)
        Sigma = scale matrix (dxd numpy array)
        df = degrees of freedom
        d: dimension
    '''
    Num = math.lgamma(1.*(d+df)/2) - math.lgamma(1.*df/2)
    (sign, logdet) = np.linalg.slogdet(Sigma)
    Denom = 1/2*logdet + d/2*(np.log(np.pi)+np.log(df)) \
        + 1.*((d+df)/2)*np.log(1 + (1./df)*np.dot(np.dot((x - mu), np.linalg.inv(Sigma)), (x - mu)))
    d = 1. * (Num - Denom)
    return np.exp(d)
Any ideas why this function does not produce the same results as the R equivalent?
Using x = (0,0) produces similar results (up to a point, due to rounding), but with x = (1,1) I get a significant difference!
I finally managed to 'translate' the code from the mvtnorm package in R and the following script works without numerical underflows.
import numpy as np
import scipy.stats
import math
from math import lgamma
from numpy import matrix
from numpy import linalg
from numpy.linalg import slogdet
import scipy.special
from scipy.special import gammaln
mu = np.array([3,3])
x = np.array([1, 1])
Sigma = np.array([[1, 0], [0, 1]])
p=2
df=1
def dmvt(x, mu, Sigma, df, log):
    '''
    Multivariate t-student density. Returns the density
    of the function at points specified by x.
    input:
        x = parameter (n x d numpy array)
        mu = mean (d dimensional numpy array)
        Sigma = scale matrix (d x d numpy array)
        df = degrees of freedom
        log = log scale or not
    '''
    p = Sigma.shape[0]  # Dimensionality
    dec = np.linalg.cholesky(Sigma)
    R_x_m = np.linalg.solve(dec, np.matrix.transpose(x)-mu)
    rss = np.power(R_x_m, 2).sum(axis=0)
    logretval = lgamma(1.0*(p + df)/2) - (lgamma(1.0*df/2) + np.sum(np.log(dec.diagonal())) \
                + p/2 * np.log(math.pi * df)) - 0.5 * (df + p) * math.log1p((rss/df))
    if log == False:
        return(np.exp(logretval))
    else:
        return(logretval)
print(dmvt(x,mu,Sigma,df,True))
print(dmvt(x,mu,Sigma,df,False))
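As a cross-check (added for illustration, assuming SciPy >= 1.6, which ships a multivariate t distribution), the same density can be obtained directly from scipy.stats with the parameters defined above:
from scipy.stats import multivariate_t

# location mu, scale matrix Sigma, df degrees of freedom, as defined above
print(multivariate_t.logpdf(x, loc=mu, shape=Sigma, df=df))
print(multivariate_t.pdf(x, loc=mu, shape=Sigma, df=df))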
I'm trying to use numbapro to write a simple matrix vector multiplication below:
from numbapro import cuda
from numba import *
import numpy as np
import math
from timeit import default_timer as time
n = 100
@cuda.jit('void(float32[:,:], float32[:], float32[:])')
def cu_matrix_vector(A, b, c):
    y, x = cuda.grid(2)

    if y < n:
        c[y] = 0.0

    if x < n and y < n:
        for i in range(n):
            c[y] += A[y, i] * b[i]
A = np.array(np.random.random((n, n)), dtype=np.float32)
B = np.array(np.random.random((n, 1)), dtype=np.float32)
C = np.empty_like(B)
s = time()
dA = cuda.to_device(A)
dB = cuda.to_device(B)
dC = cuda.to_device(C)
cu_matrix_vector(dA, dB, dC)
dC.to_host()
e = time()
tcuda = e - s
but I'm getting the following error:
numbapro.cudadrv.error.CudaDriverError: CUDA_ERROR_LAUNCH_FAILED Failed to copy memory D->H
I don't understand why the device-to-host copy is failing. Please help.
Your code has multiple problems.
1. The B and C vectors are Nx1 2D matrices, not 1D vectors, but the type signature of your kernel lists them as "float32[:]" -- 1D vectors. It also indexes them with a single index, which results in runtime errors on the GPU due to misaligned access (cuda-memcheck is your friend here!).
2. Your kernel assumes a 2D grid, but only uses 1 column of it -- meaning many threads doing the same computation and overwriting each other.
3. There is no execution configuration given, so NumbaPro is launching a kernel with 1 block of 1 thread. (nvprof is your friend here!)
Here is code that works. Note that it uses a 1D grid of 1D blocks and loops over the columns of the matrix, so it is optimized for the case where the number of rows in the vector/matrix is large. A kernel optimized for a short, wide matrix would need another approach (parallel reductions). But I would use CUBLAS sgemv (which is also exposed in NumbaPro) instead.
from numbapro import cuda
from numba import *
import numpy as np
import math
from timeit import default_timer as time
m = 100000
n = 100
@cuda.jit('void(f4[:,:], f4[:], f4[:])')
def cu_matrix_vector(A, b, c):
    row = cuda.grid(1)
    if (row < m):
        sum = 0
        for i in range(n):
            sum += A[row, i] * b[i]
        c[row] = sum
A = np.array(np.random.random((m, n)), dtype=np.float32)
B = np.array(np.random.random(m), dtype=np.float32)
C = np.empty_like(B)
s = time()
dA = cuda.to_device(A)
dB = cuda.to_device(B)
dC = cuda.to_device(C)
cu_matrix_vector[(m+511)/512, 512](dA, dB, dC)
dC.to_host()
print C
e = time()
tcuda = e - s
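As a quick sanity check (added for illustration, not part of the original answer), the result copied back from the device can be compared against numpy on the host; note that the kernel only reads the first n entries of B:
# the kernel computes c[row] = sum_i A[row, i] * b[i] for i < n
print np.allclose(C, np.dot(A, B[:n]), atol=1e-4)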