Improve speed when numpy array works with sympy matrix - python

I am sometimes confused when I want to get an analytical expression in terms of certain variables. In the following case, one operand is a NumPy array (T) and the other is a SymPy matrix (X). I know it is not a good idea to multiply them directly, so I decided to convert T to a SymPy matrix. However, it takes ages to get the result for a matrix this large. Are there any more efficient ways? Thank you.
import numpy as np
import sympy as sp
T = np.random.rand(100, 5000)
x = sp.symbols('x:'+str(5000))
X = sp.Matrix(x)
W = sp.Matrix(T)
V = W * X

There are potentially more efficient ways to do this. If you want a "compiled" version of SymPy, one possibility is SymEngine. It is more limited in scope than SymPy but can do what you seem to be asking for faster (it's basically a reimplementation of certain parts of SymPy in C++):
import numpy as np
import symengine as sp # <<-- changed import
T = np.random.rand(100, 5000)
x = sp.symbols('x:'+str(5000))
X = sp.Matrix(x)
W = sp.Matrix(T.tolist()) # Need tolist() here
V = W * X
This still takes a while, but it is faster than SymPy. In any case, though, there is almost certainly a better way of approaching your actual problem than computing the illustrated matrix product. The matrix V here is really just an inefficient representation of the original NumPy array T. I expect that whatever you want to do with V could be done more directly with T.
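To illustrate that V only re-encodes T, here is a small sketch using a 3 x 4 stand-in for the 100 x 5000 case (the sizes and the values in `x_vals` are arbitrary assumptions). It checks that substituting numbers into V gives the same result as the plain NumPy product `T @ x_vals`, and that the coefficient of `x[j]` in `V[i]` is just `T[i, j]`:

```python
import numpy as np
import sympy as sp

rng = np.random.default_rng(0)
T = rng.random((3, 4))            # small stand-in for the (100, 5000) array
x = sp.symbols('x:4')             # x0, x1, x2, x3
V = sp.Matrix(T) * sp.Matrix(x)   # the expensive symbolic product, kept tiny here

# Numerically evaluating V is the same as a plain matrix-vector product with T
x_vals = np.array([0.5, -1.0, 2.0, 3.0])
V_num = np.array(V.subs(dict(zip(x, x_vals))).tolist(), dtype=float).ravel()
direct = T @ x_vals

# The coefficient of x[j] in the i-th entry of V is exactly T[i, j]
coeffs = np.array([[float(V[i].coeff(x[j])) for j in range(4)] for i in range(3)])
```

If what you ultimately need is evaluation, derivatives or coefficients of V, each of these can be read off T directly (for example, the Jacobian of V with respect to x is T itself), which avoids building the symbolic matrix at all.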

Related

How to Lambdify a multivariate sympy expression for numeric integration efficiently

I have a very complex symbolic function of three variables, f(x,y,z), which I need to evaluate at a fixed value of z and integrate over the whole domain in x and y.
Summarizing, I've been using:
import numpy as np
import pandas as pd
from sympy import lambdify
f_substituted=f.subs(z,valueforz)
f_numeric=lambdify([x,y],f_substituted)
#and then, integrate it a la Riemann (dblquad does not converge), by doing:
N=1000
q = np.linspace(-10, 10, N)
p = np.linspace(-10, 10, N)
dS=(q[1]-q[0])*(p[1]-p[0])
q, p = np.meshgrid(q, p) #I meshgrid to have a 2D grid for integration.
#I define a dataframe here because I have seen it is quicker than vectorizing the expression
df = pd.DataFrame({'Q':np.concatenate(q),'P':np.concatenate(p)})
integral=sum((f_numeric(df['Q'],df['P'])).dropna()).real*dS
My problem is that it runs really slowly, and I cannot get ufuncify or theano to work.
I also tried to lambdify using the "numpy" module argument, but it ran much more slowly (I thought it should be quicker than math).
I also tried evalf after the substitution, like this:
f_substituted=f.subs(z,valueforz).evalf()
but it was even slower.
Any help to make the numeric evaluation quicker would be welcome.
Thanks in advance.
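A minimal sketch of a fully vectorized version, assuming the real f is built only from functions that lambdify can map to NumPy. The Gaussian below is a hypothetical stand-in for f, and z = 0 stands in for valueforz. The lambdified function is called once on the whole meshgrid, and np.nansum plays the role of .dropna() followed by sum, so no DataFrame is needed:

```python
import numpy as np
import sympy as sp

x, y, z = sp.symbols('x y z')
# Hypothetical stand-in for the real f(x, y, z)
f = sp.exp(-(x**2 + y**2)) * sp.cos(z)

f_substituted = f.subs(z, 0)   # fix z at a numeric value
f_numeric = sp.lambdify((x, y), f_substituted, modules='numpy')

N = 1000
q = np.linspace(-10, 10, N)
p = np.linspace(-10, 10, N)
dS = (q[1] - q[0]) * (p[1] - p[0])
Q, P = np.meshgrid(q, p)

# One vectorized call over the whole 2D grid
values = f_numeric(Q, P)
# Riemann sum; nansum skips NaN entries like .dropna() did
integral = np.nansum(values).real * dS
```

With this toy integrand the Riemann sum should come out very close to pi, the exact value of the integral of exp(-(x^2 + y^2)) over the plane.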

Is there any inverse np.dot function?

If I have two matrices a and b, is there a function to find the matrix x that, when dot-multiplied by a, gives b? I'm looking for Python solutions, for matrices in the form of NumPy arrays.
This problem of finding X such that A*X=B amounts to finding the "inverse of A": when A is invertible, X = Ainverse * B.
For information, in math Ainverse is written A^(-1) ("A to the power -1", but you can say "A inverse" instead).
In NumPy, there is a built-in function to find the inverse of a matrix a:
import numpy as np
ainv = np.linalg.inv(a)
See, for instance, this tutorial for explanations.
You need to be aware that some matrices are not "invertible"; the most obvious examples (roughly) are:
matrices that are not square
matrices that represent a projection
NumPy can still approximate a solution in some of these cases.
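A small sketch of both situations, using arbitrary example matrices. np.linalg.pinv (the Moore-Penrose pseudoinverse) is shown as one way of getting such an approximation; it is an assumption that this is the kind of approximation meant above. np.linalg.inv raises LinAlgError for a projection matrix and for a non-square matrix, while pinv still returns something usable:

```python
import numpy as np

def try_inv(M):
    """Return True if np.linalg.inv succeeds, False if it raises LinAlgError."""
    try:
        np.linalg.inv(M)
        return True
    except np.linalg.LinAlgError:
        return False

P = np.array([[1.0, 0.0],
              [0.0, 0.0]])   # projection onto the first axis: singular
W = np.ones((2, 3))          # not square

P_pinv = np.linalg.pinv(P)
b = np.array([3.0, 5.0])
# Minimum-norm least-squares solution of P @ x = b
x_approx = P_pinv @ b
```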
If A is a full-rank, square matrix:
import numpy as np
from numpy.linalg import inv
X = inv(A) @ B
If not, an exact solution may not exist, but we can approximate it in the least-squares sense (this assumes A has full column rank, so that A.T @ A is invertible):
import numpy as np
from numpy.linalg import inv
X = inv(A.T @ A) @ A.T @ B
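In practice it is usually preferable not to form the inverse explicitly. A sketch, using random matrices as assumed inputs, comparing the inverse-based formulas with np.linalg.solve (square case) and with np.linalg.lstsq and np.linalg.pinv (tall case); they should agree up to rounding error:

```python
import numpy as np

rng = np.random.default_rng(0)

# Square, full-rank case: solve A @ X = B without forming inv(A)
A_sq = rng.random((4, 4))
B_sq = rng.random((4, 2))
X_sq = np.linalg.solve(A_sq, B_sq)

# Tall (overdetermined) case: least-squares solution three ways
A_tall = rng.random((6, 3))
B_tall = rng.random((6, 2))
X_ls, residuals, rank, sv = np.linalg.lstsq(A_tall, B_tall, rcond=None)
X_normal = np.linalg.inv(A_tall.T @ A_tall) @ A_tall.T @ B_tall   # normal equations
X_pinv = np.linalg.pinv(A_tall) @ B_tall                          # pseudoinverse
```

solve and lstsq work from a factorization of A itself rather than A.T @ A, whose condition number is the square of that of A, so they tend to be more accurate when A is ill-conditioned.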

How to assign an index-based formula to each indexed position of a tensor in SymPy

Let's say that we have an IndexedBase 2-dim tensor r[i,j]. I want to assign to each indexed position a formula that uses the i and j positions of other 1-dim tensors, like this:
from sympy import symbols, IndexedBase, Idx, sqrt
N = symbols('N', integer=True)
Np = symbols('Np', integer=True)
x = IndexedBase('x', (Np,))
z = IndexedBase('z', (Np,))
r = IndexedBase('r', (Np,N,))
i = Idx('i', (1,Np))
j = Idx('j', (1,N))
r[i,j] = sqrt(x[i]**2 + z[j]**2)  # fails: IndexedBase does not support item assignment
I know this could easily be translated to NumPy, but SymPy does not allow item assignment on IndexedBase objects.
I need to understand how SymPy treats IndexedBase variables in this case. The final objective is to use lambdify on a much more complex expression, so that NumPy vectors can be passed as input arguments, but all the operations are based on this type of assignment. How could I perform this task?
Maybe I did not correctly understand the basics of IndexedBase variables in SymPy. Sorry if this is a silly question.
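One workaround, sketched here, sidesteps IndexedBase rather than making it assignable: it swaps in NumPy broadcasting, and the names and values are illustrative assumptions. Write the formula for a single element r[i, j] with plain symbols, lambdify it, and pass x as a column and z as a row, so NumPy fills in every (i, j) position:

```python
import numpy as np
import sympy as sp

xs, zs = sp.symbols('x z')
r_expr = sp.sqrt(xs**2 + zs**2)   # formula for one element r[i, j]
r_func = sp.lambdify((xs, zs), r_expr, modules='numpy')

x_vals = np.array([1.0, 2.0, 3.0])   # plays the role of x[i], length Np
z_vals = np.array([0.0, 4.0])        # plays the role of z[j], length N

# A (Np, 1) column broadcast against a (1, N) row gives the full (Np, N) table
r = r_func(x_vals[:, None], z_vals[None, :])
```

The same pattern extends to more complex element-wise formulas, as long as each entry depends only on x[i] and z[j].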

Solve overdetermined system with QR decomposition in Python

I'm trying to solve an overdetermined system with QR decomposition and linalg.solve, but the error I get is
LinAlgError: Last 2 dimensions of the array must be square.
This happens when the R array is not square, right? The code looks like this
import numpy as np
import math as ma
A = np.random.rand(2,3)
b = np.random.rand(2,1)
Q, R = np.linalg.qr(A)
Qb = np.matmul(Q.T,b)
x_qr = np.linalg.solve(R,Qb)
Is there a way to write this in a more efficient way for arbitrary A dimensions? If not, how do I make this code snippet work?
The reason is indeed that the matrix R is not square, because A itself is not square (here A is 2 x 3, so the system actually has fewer equations than unknowns). You can try np.linalg.lstsq instead, which finds the solution minimizing the squared error (this yields an exact solution if one exists).
import numpy as np
A = np.random.rand(2, 3)
b = np.random.rand(2, 1)
x_qr = np.linalg.lstsq(A, b, rcond=None)[0]
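You need to make sure QR is computed with mode='reduced'. In 'complete' mode, Q and R are returned as M x M and M x N, so if M is greater than N, R will be non-square. In reduced (economic) mode they are M x N and N x N, in which case the solve routine works fine. Note that mode='reduced' is already the default in numpy.linalg.qr (scipy.linalg.qr defaults to full matrices), so passing it explicitly mainly guards against getting the full form.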
However, you also have equations/unknowns backwards for an overdetermined system. Your code snippet should be
import numpy as np
A = np.random.rand(3,2)
b = np.random.rand(3,1)
Q, R = np.linalg.qr(A, mode='reduced')
#print(Q.shape, R.shape)
Qb = np.matmul(Q.T,b)
x_qr = np.linalg.solve(R,Qb)
As noted by other contributors, you could also call lstsq directly, but sometimes it is more convenient to have Q and R directly (e.g. if you are also planning on computing a projection matrix).
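A small sketch of that remark, assuming a random 5 x 2 system: with mode='reduced', R is square, the QR solution matches lstsq, and Q @ Q.T is the projection onto the column space of A:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.random((5, 2))   # 5 equations, 2 unknowns: overdetermined
b = rng.random((5, 1))

Q, R = np.linalg.qr(A, mode='reduced')   # Q: (5, 2), R: (2, 2)
x_qr = np.linalg.solve(R, Q.T @ b)       # R is square and upper-triangular

x_ls = np.linalg.lstsq(A, b, rcond=None)[0]

# Orthogonal projection onto the column space of A
P = Q @ Q.T
```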
As shown in the documentation of numpy.linalg.solve:
Computes the “exact” solution, x, of the well-determined, i.e., full rank, linear matrix equation ax = b.
Your system of equations is underdetermined, not overdetermined. Notice that it has 3 unknowns and 2 equations, thus fewer equations than unknowns.
Also notice that the documentation says that in numpy.linalg.solve(a, b), a must be an M x M matrix. The reason is that the unique solution of Ax=b is x = A^(-1) b (solve computes it with a factorization rather than an explicit inverse), and only square matrices are invertible.
In these cases, a common approach is to use the Moore-Penrose pseudoinverse, which computes a best-fit (least-squares) solution of the system. So instead of trying to solve for the exact solution, use numpy.linalg.lstsq:
x_qr = np.linalg.lstsq(R, Qb, rcond=None)[0]
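For the underdetermined case in the question (2 equations, 3 unknowns), here is a sketch with random data. lstsq returns the minimum-norm exact solution, which coincides with the pseudoinverse solution, and adding any vector from the null space of A gives another exact but longer solution. Taking the null-space vector from the SVD is an illustration added here, not part of the answer above:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.random((2, 3))   # 2 equations, 3 unknowns: underdetermined
b = rng.random((2, 1))

x_ls = np.linalg.lstsq(A, b, rcond=None)[0]   # minimum-norm solution
x_pinv = np.linalg.pinv(A) @ b                # Moore-Penrose pseudoinverse

# The last right-singular vector spans the null space of a rank-2, 2 x 3 matrix
null_vec = np.linalg.svd(A)[2][-1].reshape(3, 1)
x_other = x_ls + null_vec                     # another exact solution
```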

Python numpy : "Array is too big"

import numpy
from scipy.spatial.distance import pdist
X = numpy.zeros(50000,25)
C = pdist(X, 'euclidian')
I want to find:
And then NumPy gives the error: Array is too big.
I think the problem is the size of C: pdist cannot create a (50000, 50000) array. I don't know why NumPy has this restriction, since I can run the same code in MATLAB. How can I run this code with an array this size?
I also found possible duplicates, but their arrays are much larger than mine:
Is it possible to create a 1million x 1 million matrix using numpy?
Very large matrices using Python and NumPy
First, there are a couple of typos in your code. It should be:
X = numpy.zeros((50000,25)) # it's a tuple going in
C = pdist(X, 'euclidean') # euclidean with an e
Of course, this does not matter for the question.
The Euclidean metric in pdist computes, for each pair of rows, the same quantity as numpy.linalg.norm of their difference (http://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.norm.html). If pdist does not work in your case due to memory constraints, you can always write something yourself. Two rows of X (25 elements each) take very little memory, and this computes one pairwise distance:
np.sqrt(np.sum(np.square(X[0] - X[1])))
Then you only need to loop over all pairs. Note that the full set of 50000*49999/2 ≈ 1.25e9 distances would still need about 10 GB as float64, so reduce or store the results as you go instead of keeping them all in memory.
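A sketch of the loop done in blocks rather than one pair at a time, assuming a hypothetical goal of finding each point's distance to its nearest neighbour (the question does not say what is computed from C). scipy.spatial.distance.cdist handles one block of rows against all points, so only a block_size x n slice of the distance matrix is ever in memory. The function name and block size are illustrative:

```python
import numpy as np
from scipy.spatial.distance import cdist

def nearest_neighbor_distances(X, block_size=500):
    """Hypothetical helper: distance from each row of X to its nearest other row."""
    n = X.shape[0]
    nearest = np.empty(n)
    for start in range(0, n, block_size):
        stop = min(start + block_size, n)
        # Distances from rows start..stop-1 to every row: shape (stop - start, n)
        D = cdist(X[start:stop], X, 'euclidean')
        # Ignore each point's zero distance to itself
        rows = np.arange(stop - start)
        D[rows, rows + start] = np.inf
        nearest[start:stop] = D.min(axis=1)
    return nearest

# Small usage example; block_size=7 forces several blocks, including a partial one
rng = np.random.default_rng(0)
pts = rng.random((20, 3))
nn = nearest_neighbor_distances(pts, block_size=7)
```

With n = 50000 and block_size = 500, each block holds 500 x 50000 float64 values, about 200 MB; lower block_size if that is still too much.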
Hope it helps,
P
