How to Lambdify a multivariate sympy expression for numeric integration efficiently - python

I have this very complex symbolic function of three variables, f(x,y,z), which I need to evaluate over a point in z and integrate over all the domain in x and y.
Summarizing, I've been using:
f_substituted=f.subs(z,valueforz)
f_numeric=lambdify([x,y].f_substituted)
#and then, integrate it a la Riemann (dblquad does not converge), by doing:
N=1000
q = np.linspace(-10, 10, N)
p = np.linspace(-10, 10, N)
dS=(q[1]-q[0])*(p[1]-p[0])
q, p = np.meshgrid(q, p) #I meshgrid to have a 2D grid for integration.
#I define a dataframe here because I have seen it is quicker than vectorizing the expression
df = pd.DataFrame({'Q':np.concatenate(q),'P':np.concatenate(p)})
integral=sum((rho_dot_00_x(df['Q'],df['P'])).dropna()).real*dS
My problem is that it works really slowly and I cannot make work ufuncify, or theano.
Also, I tried to lambdify using "numpy" argument, but it ran way slowlier (I thought it should be quicker than math).
I also tried evalf after the substitution, like this:
f_substituted=f.subs(z,valueforz).evalf()
but it was even slower.
Any help to make the numeric evaluation quicker would be welcome.
Thanks in advance.

Related

Is there any function for calculating k and b coefficients for linear regression model with only one independent variable?

I know I can just write needed method by myself but there must be a function for this because this problem is so common as heck. If somebody does't understand what I am talking about take a look at the following formula
{Image must be here}
For example, I have a function y = kx+b where y is dependent variable, and x is independent. I need to calculate k (slope) and b (intercept), and I have formulas from the picture, and everything those formulas need. Is there any function in common data science libraries which can help calculate those ones? I mentioned "only one independent variable" because sometimes there are multiple independent vars which leads to multidimentional plots
Googling gives nothing. I already use my own implementation of those ones, but I prefer native functions from packages such as scipy and numpy, or sklearn
not sure to fully understand the question (especially, what do you mean by "one independent variable"?), so I try to reformulate. If you have two variables, x andy, both represented by samples (x_1,..., x_n), (y_1,..., y_n) and that you suspect a linear relationship between them, y = a*x +b, then you can use numpy.polyfit to find the coefficients a and b. Here is an example:
import numpy as np
n = 100
x = np.linspace(0, 1, n)
y = 2*x + 0.3
a, b = np.polyfit(x, y, 1)
print(f"a={a}, b={b}")
Returns
a=2.0, b=0.30000000000000016
Hope that helps!

Improve speed when numpy array works with sympy matrix

I am confused sometimes when I want to get an analytical expression in terms of certain variables. In the following case, one is a Numpy array (T) and the other is a Sympy matrix (X). I know it is not a good idea to directly multiply them, so I decide to convert T to a Sympy matrix. However, it takes ages to get the result for this large-sized matrix. Are there any more efficient ways? Thank you.
import numpy as np
import sympy as sp
T = np.random.rand(100, 5000)
x = sp.symbols('x:'+str(5000))
X = sp.Matrix(x)
W = sp.Matrix(T)
V = W * X
There are potentially more efficient ways to do this. One possibility if you want a "compiled" version of SymPy is to use SymEngine which is more limited in scope than SymPy but can do what you seem to be asking for faster than SymPy (It's basically a reimplementation of certain parts of SymPy in C++):
import numpy as np
import symengine as sp # <<-- changed import
T = np.random.rand(100, 5000)
x = sp.symbols('x:'+str(5000))
X = sp.Matrix(x)
W = sp.Matrix(T.tolist()) # Need tolist() here
V = W * X
This still takes a while but it is faster than SymPy. In any case though there is almost certainly a better way of approaching your actual problem than computing the illustrated matrix product. The matrix V here is really just an inefficient representation of the original numpy array T. I expect that whatever it is that you want to do with V could be done more directly with T.

Solve overdetermined system with QR decomposition in Python

I'm trying to solve an overdetermined system with QR decomposition and linalg.solve but the error I get is
LinAlgError: Last 2 dimensions of the array must be square.
This happens when the R array is not square, right? The code looks like this
import numpy as np
import math as ma
A = np.random.rand(2,3)
b = np.random.rand(2,1)
Q, R = np.linalg.qr(A)
Qb = np.matmul(Q.T,b)
x_qr = np.linalg.solve(R,Qb)
Is there a way to write this in a more efficient way for arbitrary A dimensions? If not, how do I make this code snippet work?
The reason is indeed that the matrix R is not square, probably because the system is overdetermined. You can try np.linalg.lstsq instead, finding the solution which minimizes the squared error (which should yield the exact solution if it exists).
import numpy as np
A = np.random.rand(2, 3)
b = np.random.rand(2, 1)
x_qr = np.linalg.lstsq(A, b)[0]
You need to call QR with the flag mode='reduced'. The default Q R matrices are returned as M x M and M x N, so if M is greater than N then your matrix R will be nonsquare. If you choose reduced (economic) mode your matrices will be M x N and N x N, in which case the solve routine will work fine.
However, you also have equations/unknowns backwards for an overdetermined system. Your code snippet should be
import numpy as np
A = np.random.rand(3,2)
b = np.random.rand(3,1)
Q, R = np.linalg.qr(A, mode='reduced')
#print(Q.shape, R.shape)
Qb = np.matmul(Q.T,b)
x_qr = np.linalg.solve(R,Qb)
As noted by other contributors, you could also call lstsq directly, but sometimes it is more convenient to have Q and R directly (e.g. if you are also planning on computing projection matrix).
As shown in the documentation of numpy.linalg.solve:
Computes the “exact” solution, x, of the well-determined, i.e., full rank, linear matrix equation ax = b.
Your system of equations is underdetermined not overdetermined. Notice that you have 3 variables in it and 2 equations, thus fewer equations than unknowns.
Also notice how it also mentions that in numpy.linalg.solve(a,b), a must be an MxM matrix. The reason behind this is that solving the system of equations Ax=b involves computing the inverse of A, and only square matrices are invertible.
In these cases a common approach is to take the Moore-Penrose pseudoinverse, which will compute a best fit (least squares) solution of the system. So instead of trying to solve for the exact solution use numpy.linalg.lstsq:
x_qr = np.linalg.lstsq(R,Qb)

Calculating 1/r*d/dr(r*f) numerically in python when r=0. f is a function of r

Usually when you do this by hand there's no problem as the 1/r usually gets cancelled with another r. But doing this numerically with scipy.misc.derivative works like a charm for r different from zero. But of course, as soon as I ask for r = 0, I get division by zero, which I expected. So how else could you calculate this numerically. I insist on the fact that everything has to be done numerically as my function are now so complicated that I won't be able to find a derivative manually. Thank you!
My code:
rAtheta = lambda _r: _r*Atheta(_r,theta,z,t)
if r != 0:
return derivative(rAtheta,r,dx=1e-10,order=3)/r
else:
#What should go here so that it doesn't blow up when calculating the gradient?
tl;dr: use symbolic differentiation, or complex step differentiation if that fails
If you insist on using numerical methods, you really have to approximate the limit of the derivative as r->0 one way or the other.
I suggest trying complex step differentiation. The idea is to use complex arguments inside the function you're trying to differentiate, but it usually gets rid of the numerical instability that is imposed by standard finite difference schemes. The result is a procedure that needs complex arithmetic (hooray numpy, and python in general!) but in turn can be much more stable at small dx values.
Here's another point: complex step differentiation uses
F′(x0) = Im(F(x0+ih))/h + O(h^2)
Let's apply this to your r=0 case:
F′(0) = Im(F(ih))/h + O(h^2)
There are no singularities even for r=0! Choose a small h, possibly the same dx you're passing to your function, and use that:
def rAtheta(_r):
# note that named lambdas are usually frowned upon
return _r*Atheta(_r,theta,z,t)
tol = 1e-10
dr = 1e-12
if np.abs(r) > tol: # or math.abs or your favourite other abs
return derivative(rAtheta,r,dx=dr,order=3)/r
else:
return rAtheta(r + 1j*dr).imag/dr/r
Here is the above in action for f = r*ln(r):
The result is straightforwardly smooth, even though the points below r=1e-10 were computed with complex step differentiation.
Very important note: notice the separation between tol and dr in the code. The former is used to determine when to switch between methods, and the latter is used as a step in complex step differentiation. Look what happens when tol=dr=1e-10:
the result is a smoothly wrong function below r=1e-10! That's why you always have to be careful with numerical differentiation. And I wouldn't advise going too much below that in dr, as machine precision will bite you sooner or later.
But why stop here? I'm fairly certain that your functions could be written in a vectorized way, i.e. they could accept an array of radial points. Using complex step differentiation you don't have to loop over the radial points (which you would have to resort to using scipy.misc.derivative). Example:
import numpy as np
import matplotlib.pyplot as plt
def Atheta(r,*args):
return r*np.log(r) # <-- vectorized expression
def rAtheta(r):
return r*Atheta(r) #,theta,z,t) # <-- vectorized as much as Atheta is
def vectorized_difffun(rlist):
r = np.asarray(rlist)
dr = 1e-12
return (rAtheta(r + 1j*dr)).imag/dr/r
rarr = np.logspace(-12,-2,20)
darr = vectorized_difffun(rarr)
plt.figure()
plt.loglog(rarr,np.abs(darr),'.-')
plt.xlabel(r'$r$')
plt.ylabel(r'$|\frac{1}{r} \frac{d}{dr}(r^2 \ln r)|$')
plt.tight_layout()
plt.show()
The result should be familiar:
Having cleared the fun weirdness that is complex step differentiation, I should note that you should strongly consider using symbolic math. In cases like this when 1/r factors disappear exactly, it wouldn't hurt if you reached this conclusion exactly. After all double precision is still just double precision.
For this you'll need the sympy module, define your function symbolically once, differentiate it symbolically once, turn your simplified result into a numpy function using sympy.lambdify, and use this numerical function as much as you need (assuming that this whole process runs in finite time and the resulting function is not too slow to use). Example:
import sympy as sym
# only needed for the example:
import numpy as np
import matplotlib.pyplot as plt
r = sym.symbols('r')
f = r*sym.ln(r)
df = sym.diff(r*f,r)
res_sym = sym.simplify(df/r)
res_num = sym.lambdify(r,res_sym,'numpy')
rarr = np.logspace(-12,-2,20)
darr = res_num(rarr)
plt.figure()
plt.loglog(rarr,np.abs(darr),'.-')
plt.xlabel(r'$r$')
plt.ylabel(r'$|\frac{1}{r} \frac{d}{dr}(r^2 \ln r)|$')
plt.tight_layout()
plt.show()
resulting in
As you see, the resulting function was vectorized thanks to lambdify using numpy during the conversion from symbolic to numeric function. Obviously, the best solution is the symbolic one as long as the resulting function is not so complicated to make its practical use impossible. I urge you to first try the symbolic version, and if for some reason it's not applicable, switch to complex step differentiation, with due caution.

resampling, interpolating matrix

I'm trying to interpolate some data for the purpose of plotting. For instance, given N data points, I'd like to be able to generate a "smooth" plot, made up of 10*N or so interpolated data points.
My approach is to generate an N-by-10*N matrix and compute the inner product the original vector and the matrix I generated, yielding a 1-by-10*N vector. I've already worked out the math I'd like to use for the interpolation, but my code is pretty slow. I'm pretty new to Python, so I'm hopeful that some of the experts here can give me some ideas of ways I can try to speed up my code.
I think part of the problem is that generating the matrix requires 10*N^2 calls to the following function:
def sinc(x):
import math
try:
return math.sin(math.pi * x) / (math.pi * x)
except ZeroDivisionError:
return 1.0
(This comes from sampling theory. Essentially, I'm attempting to recreate a signal from its samples, and upsample it to a higher frequency.)
The matrix is generated by the following:
def resampleMatrix(Tso, Tsf, o, f):
from numpy import array as npar
retval = []
for i in range(f):
retval.append([sinc((Tsf*i - Tso*j)/Tso) for j in range(o)])
return npar(retval)
I'm considering breaking up the task into smaller pieces because I don't like the idea of an N^2 matrix sitting in memory. I could probably make 'resampleMatrix' into a generator function and do the inner product row-by-row, but I don't think that will speed up my code much until I start paging stuff in and out of memory.
Thanks in advance for your suggestions!
This is upsampling. See Help with resampling/upsampling for some example solutions.
A fast way to do this (for offline data, like your plotting application) is to use FFTs. This is what SciPy's native resample() function does. It assumes a periodic signal, though, so it's not exactly the same. See this reference:
Here’s the second issue regarding time-domain real signal interpolation, and it’s a big deal indeed. This exact interpolation algorithm provides correct results only if the original x(n) sequence is periodic within its full time inter­val.
Your function assumes the signal's samples are all 0 outside of the defined range, so the two methods will diverge away from the center point. If you pad the signal with lots of zeros first, it will produce a very close result. There are several more zeros past the edge of the plot not shown here:
Cubic interpolation won't be correct for resampling purposes. This example is an extreme case (near the sampling frequency), but as you can see, cubic interpolation isn't even close. For lower frequencies it should be pretty accurate.
If you want to interpolate data in a quite general and fast way, splines or polynomials are very useful. Scipy has the scipy.interpolate module, which is very useful. You can find many examples in the official pages.
Your question isn't entirely clear; you're trying to optimize the code you posted, right?
Re-writing sinc like this should speed it up considerably. This implementation avoids checking that the math module is imported on every call, doesn't do attribute access three times, and replaces exception handling with a conditional expression:
from math import sin, pi
def sinc(x):
return (sin(pi * x) / (pi * x)) if x != 0 else 1.0
You could also try avoiding creating the matrix twice (and holding it twice in parallel in memory) by creating a numpy.array directly (not from a list of lists):
def resampleMatrix(Tso, Tsf, o, f):
retval = numpy.zeros((f, o))
for i in xrange(f):
for j in xrange(o):
retval[i][j] = sinc((Tsf*i - Tso*j)/Tso)
return retval
(replace xrange with range on Python 3.0 and above)
Finally, you can create rows with numpy.arange as well as calling numpy.sinc on each row or even on the entire matrix:
def resampleMatrix(Tso, Tsf, o, f):
retval = numpy.zeros((f, o))
for i in xrange(f):
retval[i] = numpy.arange(Tsf*i / Tso, Tsf*i / Tso - o, -1.0)
return numpy.sinc(retval)
This should be significantly faster than your original implementation. Try different combinations of these ideas and test their performance, see which works out the best!
I'm not quite sure what you're trying to do, but there are some speedups you can do to create the matrix. Braincore's suggestion to use numpy.sinc is a first step, but the second is to realize that numpy functions want to work on numpy arrays, where they can do loops at C speen, and can do it faster than on individual elements.
def resampleMatrix(Tso, Tsf, o, f):
retval = numpy.sinc((Tsi*numpy.arange(i)[:,numpy.newaxis]
-Tso*numpy.arange(j)[numpy.newaxis,:])/Tso)
return retval
The trick is that by indexing the aranges with the numpy.newaxis, numpy converts the array with shape i to one with shape i x 1, and the array with shape j, to shape 1 x j. At the subtraction step, numpy will "broadcast" the each input to act as a i x j shaped array and the do the subtraction. ("Broadcast" is numpy's term, reflecting the fact no additional copy is made to stretch the i x 1 to i x j.)
Now the numpy.sinc can iterate over all the elements in compiled code, much quicker than any for-loop you could write.
(There's an additional speed-up available if you do the division before the subtraction, especially since inthe latter the division cancels the multiplication.)
The only drawback is that you now pay for an extra Nx10*N array to hold the difference. This might be a dealbreaker if N is large and memory is an issue.
Otherwise, you should be able to write this using numpy.convolve. From what little I just learned about sinc-interpolation, I'd say you want something like numpy.convolve(orig,numpy.sinc(numpy.arange(j)),mode="same"). But I'm probably wrong about the specifics.
If your only interest is to 'generate a "smooth" plot' I would just go with a simple polynomial spline curve fit:
For any two adjacent data points the coefficients of a third degree polynomial function can be computed from the coordinates of those data points and the two additional points to their left and right (disregarding boundary points.) This will generate points on a nice smooth curve with a continuous first dirivitive. There's a straight forward formula for converting 4 coordinates to 4 polynomial coefficients but I don't want to deprive you of the fun of looking it up ;o).
Here's a minimal example of 1d interpolation with scipy -- not as much fun as reinventing, but.
The plot looks like sinc, which is no coincidence:
try google spline resample "approximate sinc".
(Presumably less local / more taps ⇒ better approximation,
but I have no idea how local UnivariateSplines are.)
""" interpolate with scipy.interpolate.UnivariateSpline """
from __future__ import division
import numpy as np
from scipy.interpolate import UnivariateSpline
import pylab as pl
N = 10
H = 8
x = np.arange(N+1)
xup = np.arange( 0, N, 1/H )
y = np.zeros(N+1); y[N//2] = 100
interpolator = UnivariateSpline( x, y, k=3, s=0 ) # s=0 interpolates
yup = interpolator( xup )
np.set_printoptions( 1, threshold=100, suppress=True ) # .1f
print "yup:", yup
pl.plot( x, y, "green", xup, yup, "blue" )
pl.show()
Added feb 2010: see also basic-spline-interpolation-in-a-few-lines-of-numpy
Small improvement. Use the built-in numpy.sinc(x) function which runs in compiled C code.
Possible larger improvement: Can you do the interpolation on the fly (as the plotting occurs)? Or are you tied to a plotting library that only accepts a matrix?
I recommend that you check your algorithm, as it is a non-trivial problem. Specifically, I suggest you gain access to the article "Function Plotting Using Conic Splines" (IEEE Computer Graphics and Applications) by Hu and Pavlidis (1991). Their algorithm implementation allows for adaptive sampling of the function, such that the rendering time is smaller than with regularly spaced approaches.
The abstract follows:
A method is presented whereby, given a
mathematical description of a
function, a conic spline approximating
the plot of the function is produced.
Conic arcs were selected as the
primitive curves because there are
simple incremental plotting algorithms
for conics already included in some
device drivers, and there are simple
algorithms for local approximations by
conics. A split-and-merge algorithm
for choosing the knots adaptively,
according to shape analysis of the
original function based on its
first-order derivatives, is
introduced.

Categories

Resources