scipy.interp2d warning and large errors off the grid - python

I am trying to interpolate a 2-dimensional function and I am running into what I consider weird behavior by scipy.interpolate.interp2d. I don't understand what the problem is, and I'd be happy for any help or hints.
import numpy as np
from scipy.interpolate import interp2d
x = np.arange(10)
y = np.arange(20)
xx, yy = np.meshgrid(x, y, indexing='ij')
val = xx + yy
f = interp2d(xx, yy, val, kind='linear')
When I run this code, I get the following Warning:
scipy/interpolate/fitpack.py:981: RuntimeWarning: No more knots can be
added because the number of B-spline coefficients already exceeds the
number of data points m. Probable causes: either s or m too small.
(fp>s) kx,ky=1,1 nx,ny=18,15 m=200 fp=0.000000 s=0.000000
warnings.warn(RuntimeWarning(_iermess2[ierm][0] + _mess))
I don't understand why interp2d would use splines at all when I tell it to do linear interpolation. When I continue and evaluate f on the grid, everything is good:
>>> f(1,1)
array([ 2.])
When I evaluate it off the grid, I get large errors, even though the function is clearly linear.
>>> f(1.1,1)
array([ 2.44361975])
I am a bit confused and I am not sure what the problem is. Did anybody run into similar problems? I used to work with MATLAB, and this is almost exactly how I would do it there, but maybe I did something wrong.
When I use a rectangular grid (i.e. y = np.arange(10)) everything works fine by the way, but that isn't what I need. When I use cubic instead of linear interpolation, the error gets smaller (that doesn't make much sense either since the function is linear) but is still unacceptably large.

I tried a couple of things and managed to get (kind of) what I want using scipy.interpolate.LinearNDInterpolator. However, I have to convert the grid to lists of points and values. Since the rest of my program stores coordinates and values in grid format, that is kind of annoying, so if possible I'd still like to get the original code to work properly.
import numpy as np
import itertools
from scipy.interpolate import LinearNDInterpolator
x = np.arange(10)
y = np.arange(20)
coords = list(itertools.product(x,y))
val = [sum(c) for c in coords]
f = LinearNDInterpolator(coords, val)
>>> f(1,1)
array(2.0)
>>> f(1.1,1)
array(2.1)
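For reference, here is a sketch (my code, not part of the original post) of how the conversion can be done directly from the grid arrays with ravel(), so the itertools detour is unnecessary:
import numpy as np
from scipy.interpolate import LinearNDInterpolator
x = np.arange(10)
y = np.arange(20)
xx, yy = np.meshgrid(x, y, indexing='ij')
val = xx + yy
# flatten the grid into a (200, 2) array of points plus a matching value vector
points = np.column_stack([xx.ravel(), yy.ravel()])
f = LinearNDInterpolator(points, val.ravel())
print(f(1.1, 1))  # ~2.1, exact for a linear function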

Related

solve_ivp from scipy does not integrate the whole range of tspan

I'm trying to use solve_ivp from scipy in Python to solve an IVP. I specified the tspan argument of solve_ivp to be (0,10), as shown below. However, for some reason, the solutions I get always stop around t=2.5.
from scipy.integrate import solve_ivp
import numpy as np
import matplotlib.pyplot as plt
import scipy.optimize as optim
def dudt(t, u):
    return u*(1-u/12) - 4*np.heaviside(-(t-5), 1)

ic = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
sol = solve_ivp(dudt, (0, 10), ic, t_eval=np.linspace(0, 10, 10000))
for solution in sol.y:
    y = [y for y in solution if y >= 0]
    t = sol.t[:len(y)]
    plt.plot(t, y)
What is going wrong
You should always look at what the solver returns. In this case it gives
message: 'Required step size is less than spacing between numbers.'
Think of the process of solving your initial value problem with scipy.integrate.solve_ivp as repeatedly estimating a direction and then going a small step in that direction. The above error means that the solutions to your equation change so fast that even the minimal possible step size is too large. But your equation is simple enough that, at least for t <= 5, where 4*np.heaviside(-(t-5), 1) always equals 4, it can be solved exactly/symbolically. I will explain more about t > 5 later.
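As an aside, a sketch (my code, not from the original answer) of how to check the solver result programmatically and stop cleanly at the blow-up with a terminal event, instead of letting the solver grind the step size down:
import numpy as np
from scipy.integrate import solve_ivp

def dudt(t, u):
    return u*(1-u/12) - 4*np.heaviside(-(t-5), 1)

def blow_up(t, u):
    # the event fires when |u| reaches 100 (an arbitrary bound I chose)
    return np.max(np.abs(u)) - 100.0
blow_up.terminal = True  # tell solve_ivp to stop at the event

sol = solve_ivp(dudt, (0, 10), [2], events=blow_up,
                t_eval=np.linspace(0, 10, 1000))
print(sol.status)   # 1 means a terminal event ended the integration
print(sol.message)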
Symbolic Solution
Sympy can solve your differential equation. While you can provide it an initial value, it would have taken much longer to solve the equation once for each of your initial values. So instead I told it to give me all solutions and then calculated the parameter C1 for each of your initial values separately.
import numpy as np
import matplotlib.pyplot as plt
from sympy import *
ics = [2,4,6,8,10,12,14,16,18,20]
f = symbols("f", cls=Function)
t = symbols("t")
eq = Eq(f(t).diff(t),f(t)*(1-f(t)/12)-4)
base_sol = dsolve(eq)
c1s = [solve(base_sol.args[1].subs({t:0})-ic) for ic in ics]
# Apparently sympy is unhappy that numpy does not supply a cotangent.
# So I do that manually.
sols = [lambdify(t, base_sol.args[1].subs({symbols('C1'): C1[0]}),
                 modules=['numpy', {'cot': lambda x: 1/np.tan(x)}])
        for C1 in c1s]
t = np.linspace(0, 5, 10000)
for sol in sols:
    y = sol(t)
    mask = (y > -5) & (y < 20)
    plt.plot(t[mask], y[mask])
At first glance the picture looks odd, especially the blue and orange straight-line parts. This is just due to values lying outside the masked range, so matplotlib connects the remaining points directly. What is actually happening is a sudden jump. That jump is what tripped up the numeric ODE solver earlier. You can see it even more clearly when you make sympy print the first solution.
The tangent has poles at odd multiples of pi/2, and if you solve for where the argument of the tangent above first reaches such a pole you get 2.47241377386575, which is probably where your plotting stopped.
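A quick numeric check (my sketch; it assumes the separated-variables form u(t) = 6 + 2*sqrt(3)*tan(C - t/(2*sqrt(3))), which follows from separating variables in the equation above):
import numpy as np
# fix the constant C from the initial value u(0) = 2
C = np.arctan((2 - 6) / (2*np.sqrt(3)))
# tan blows up when its argument first reaches -pi/2
t_blow = 2*np.sqrt(3) * (C + np.pi/2)
print(t_blow)  # 2.47241377386575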
Now what about t>5?
Unfortunately your equation is not continuous at t=5. One approach would be to solve the equation for t > 5 separately, with initial values given by following the solutions of the first equation up to t=5; a sketch of that idea follows. A full treatment, however, is another question for another day.
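A sketch of that restart idea (mine, not from the original answer): for t > 5 the heaviside term vanishes and the equation reduces to plain logistic growth, so one can restart the solver from whatever value u(5) the symbolic solution gives. The value u5 below is a made-up placeholder:
import numpy as np
from scipy.integrate import solve_ivp

def logistic(t, u):
    return u*(1 - u/12)  # dudt for t > 5, where the heaviside term is 0

u5 = 5.6  # placeholder: take this from the symbolic solution at t = 5
sol = solve_ivp(logistic, (5, 10), [u5], t_eval=np.linspace(5, 10, 200))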

Matlab's spline equivalent in Python, three inputs.

I'm converting a MATLAB script to Python and I have hit a roadblock.
In order to use cubic spline interpolation on a signal, the script uses the command spline with three inputs: f_o, c_signal and freq. So it looks like the following.
cav_sig_freq = spline(f_o, c_signal, freq)
f_o = 1x264, c_signal = 1x264 and freq = 1x264
From the documentation in matlab it reads that "s = spline(x,y,xq) returns a vector of interpolated values s corresponding to the query points in xq. The values of s are determined by cubic spline interpolation of x and y."
I'm struggling to find the correct Python equivalent. None of the interpolation functions I have found in the NumPy and SciPy documentation accept a third input like in MATLAB.
Thanks for taking the time to read this. If there are any suggestions on how I can make this clearer, I'll be happy to do so.
Basically you will first need to generate something like an interpolant function, then give it your points. Using your variable names like this:
from scipy import interpolate
tck = interpolate.splrep(f_o, c_signal, s=0)
and then apply this tck to your points:
c_interp = interpolate.splev(freq, tck, der=0)
For more on this you can read this post.
Have you tried the InterpolatedUnivariateSpline within scipy.interpolate? If I understand the MATLAB part correctly, then I think this will work.
import numpy as np
from scipy.interpolate import InterpolatedUnivariateSpline as ius
a = [1,2,3,4,5,6]
b = [r * 2 for r in a]
c = ius(a, b, k=1)
# what values do you want to query?
targets = [3.4, 2.789]
interpolated_values = c(targets)
It seems this may add one more step to your code than what MATLAB provides, but I think it is what you want.

How to improve the performance when 2d interpolating/smoothing lines using scipy?

I have a moderate size data set, namely 20000 x 2 floats in a two-column matrix. The first column is the x column, which represents the distance to the starting point along a trajectory; the second is the y column, which represents the work done on the object. This data set comes from lab measurements, so it's fairly arbitrary. I've already turned this structure into a numpy array. I want to plot y vs x in a figure with a smooth curve, so I hoped the following code could help me:
x_smooth = np.linspace(x.min(),x.max(), 20000)
y_smooth = spline(x, y, x_smooth)
plt.plot(x_smooth, y_smooth)
plt.show()
However, when my program executes the line y_smooth = spline(x, y, x_smooth), it takes a very long time, say 10 minutes, and sometimes it even exhausts my memory so that I have to restart my machine. I tried reducing the chunk number to 200 and 2000 and neither works. Then I checked the official scipy reference: scipy.interpolate.spline here. It says that spline is deprecated in v0.19, but I'm not using the new version. If spline has been deprecated for quite a while, how do I use the equivalent BSpline now? And if spline still works, what causes the slow performance?
One portion of my data could look like this:
13.202 0.0
13.234738 -0.051354643759
12.999116 0.144464320836
12.86252 0.07396528119
13.1157 0.10019738758
13.357109 -0.30288563381
13.234004 -0.045792536285
12.836279 0.0362257166275
12.851597 0.0542649286915
13.110691 0.105297378401
13.220619 -0.0182963209185
13.092143 0.116647353635
12.545676 -0.641112204849
12.728248 -0.147460703493
12.874176 0.0755861585235
12.746764 -0.111583725833
13.024995 0.148079528382
13.106033 0.119481137144
13.327233 -0.197666132456
13.142423 0.0901867159545
Several issues here. First and foremost, the spline fitting you're trying to use is global. This means that you're solving a system of linear equations of size 20000 at construction time (evaluations are only weakly sensitive to the dataset size, though). This explains why the spline construction is slow.
scipy.interpolate.spline, furthermore, does linear algebra with full matrices --- hence memory consumption. This is precisely why it's deprecated from scipy 0.19.0 on.
The recommended replacement, available in scipy 0.19.0, is the BSpline / make_interp_spline combo:
>>> from scipy.interpolate import make_interp_spline
>>> spl = make_interp_spline(x, y, k=3)   # returns a BSpline object
>>> y_new = spl(x_new)                    # evaluate
Notice it is not BSpline(x, y, k): BSpline objects do not know anything about the data or fitting or interpolation.
If you are using older scipy versions, your options are:
CubicSpline(x, y) for cubic splines
splrep(x, y, s=0) / splev combo.
However, you may want to think if you really need twice continuously differentiable functions. If only once differentiable functions are smooth enough for your purposes, then you can use local spline interpolations, e.g. Akima1DInterpolator or PchipInterpolator:
In [1]: import numpy as np
In [2]: from scipy.interpolate import pchip, splmake
In [3]: x = np.arange(1000)
In [4]: y = x**2
In [5]: %timeit pchip(x, y)
10 loops, best of 3: 58.9 ms per loop
In [6]: %timeit splmake(x, y)
1 loop, best of 3: 5.01 s per loop
Here splmake is what spline uses under the hood, and it's also deprecated.
Most interpolation methods in SciPy are function-generating, i.e. they return a function which you can then evaluate on your x data. For example, using the CubicSpline method, which connects all points with a piecewise cubic spline, would be
from scipy.interpolate import CubicSpline
spline = CubicSpline(x, y)
y_smooth = spline(x_smooth)
Based on your description I think you want a B-spline fit. Note that, as mentioned above, BSpline(x, y, k) does not fit data (that constructor expects knots and coefficients); the data-fitting constructor is make_interp_spline, which returns a BSpline. Following the pattern above:
from scipy.interpolate import make_interp_spline
order = 2  # spline degree
spline = make_interp_spline(x, y, k=order)
y_smooth = spline(x_smooth)
Since you have that much data, it is probably quite noisy. For noisy data, a smoothing spline (e.g. splrep with s > 0) is usually a better choice than raising the spline degree; see the sketch after the code below.
In both cases your data, i.e. x and y, must be sorted by x. This is 1D interpolation (you only use x_smooth as input). You can sort them using np.argsort. In short:
from scipy.interpolate import make_interp_spline
sort_idx = np.argsort(x)
x_sorted = x[sort_idx]
y_sorted = y[sort_idx]
order = 3  # spline degree
spline = make_interp_spline(x_sorted, y_sorted, k=order)
y_smooth = spline(x_smooth)
plt.plot(x_sorted, y_sorted, '.')
plt.plot(x_smooth, y_smooth, '-')
plt.show()
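As mentioned above, for noisy data a smoothing spline is often the better tool. A minimal sketch (my suggestion, assuming the sorted x values are distinct) using splrep's smoothing parameter s:
from scipy.interpolate import splrep, splev
import numpy as np

sort_idx = np.argsort(x)
# s > 0 trades exactness for smoothness; s = len(x) is a common starting point
tck = splrep(x[sort_idx], y[sort_idx], s=len(x))
y_smooth = splev(x_smooth, tck)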
My problem can be generalized to how to smoothly plot 2D graphs when the data points are in random order. Since you are only dealing with two columns of data, if you sort your data by the independent variable, at least your data points will be connected in order, which is how matplotlib connects them.
@Dawid Laszuk has provided one solution to sort data by the independent variable, and I'll display mine here:
plotting_columns = []
for i in range(len(x)):
    plotting_columns.append(np.array([x[i], y[i]]))
plotting_columns.sort(key=lambda pair: pair[0])
plotting_columns = np.array(plotting_columns)
The traditional sort() with a key function can also do the sorting job efficiently here.
But that's just the first step. The following steps are not hard either: to smooth your graph, you also want to keep your independent variable in linearly ascending order with an identical step interval, so
x_smooth = np.linspace(x.min(), x.max(), num_steps)
is enough to do the job. Usually, if you have plenty of data points (for example more than 10000, where correctness and accuracy are no longer verifiable by eye), you just want to plot enough points to display the trend, and smoothing only x is enough; you can then simply plt.plot(x_smooth, y).
You will notice that x_smooth will generate many x values that have no corresponding y value. When you want to maintain correctness, you need line-fitting functions; but as @ev-br demonstrated in his answer, spline functions can be expensive. Therefore you might want to try a simpler trick. I smoothed my graph without those functions, in a few simple steps.
First, round your values so that your data will not vary too much in small intervals. (You can skip this step)
You can change one line when constructing plotting_columns:
plotting_columns.append(np.around(np.array([x[i], y[i]]), decimals=4))
Having done this, you can filter out the points that you don't want to plot by keeping only the points close to the x_smooth values:
new_plots = []
error = 0.05  # hypothetical tolerance; pick one matching your step interval
for i in range(len(x_smooth)):
    if x_smooth[i] - error <= plotting_columns[i, 0] < x_smooth[i] + error:
        new_plots.append(plotting_columns[i])
    else:
        pass  # remove all points within the interval
This is how I solved my problems.

Scipy: efficiently generate a series of integration (integral function)

I have a function f, and I want to get its integral function, Phi(x) = integral of f(t) dt from 0 to x.
That is, instead of getting a single integration value at point x, I need to get values at multiple points.
For example:
Let's say I want the range at (-20,20)
import numpy as np
import scipy.integrate as integrate
import matplotlib.pyplot as plt

def f(x):
    return x**2

x_vals = np.arange(-20, 21, 1)
# nquad returns a (value, error) pair; keep the value
y_vals = [integrate.nquad(f, [[0, x_val]])[0] for x_val in x_vals]
plt.plot(x_vals, y_vals, '-', color='r')
The problem
In the example code above, the integration is done from scratch for each point. In my real code f(x) is pretty complex, and it's a multiple integration, so the running time is simply too slow (Scipy: speed up integration when doing it for the whole surface?).
I'm wondering if there is any way of efficiently generating Phi(x) over a given range.
My thoughts:
The integral value Phi(20) can be calculated from Phi(19), Phi(19) from Phi(18), and so on. So when we get Phi(20), in reality we also get the whole series (-20, -19, -18, -17, ..., 18, 19, 20), except that we didn't save the values.
So I'm thinking: is it possible to create save points for an integration, so that when it passes a save point the value gets saved before continuing to the next point? Then, with a single pass toward 20, we would also get the values at (-20, -19, -18, -17, ..., 18, 19, 20).
One could implement the strategy you outlined by integrating only over the short intervals (between consecutive x-values) and then taking the cumulative sum of the results. Like this:
import numpy as np
import scipy.integrate as si
def f(x):
    return x**2
x_vals = np.arange(-20, 21, 1)
pieces = [si.quad(f, x_vals[i], x_vals[i+1])[0] for i in range(len(x_vals)-1)]
y_vals = np.cumsum([0] + pieces)
Here pieces are the integrals over the short intervals, which get summed to produce the y-values. As written, this code outputs a function that is 0 at the beginning of the range of integration, which is -20. One can, of course, subtract the y-value that corresponds to x=0 in order to have the same normalization as on your plot.
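For example, the shift could look like this (my sketch; np.searchsorted locates x = 0 in x_vals):
# subtract Phi(0) so the curve matches the usual normalization
y_vals = y_vals - y_vals[np.searchsorted(x_vals, 0)]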
That said, the split-and-sum process is unnecessary. When you find an indefinite integral of f, you are really solving the differential equation F' = f. And SciPy has a built-in method for that, odeint. Just use it:
import numpy as np
import scipy.integrate as si
def f(x):
    return x**2
x_vals = np.arange(-20, 21, 1)
y_vals = si.odeint(lambda y,x: f(x), 0, x_vals)
The output is essentially identical to the first version (within tiny computational errors), with less code. The reason for using lambda y, x: f(x) is that the first argument of odeint must be a function taking two arguments, the right-hand side of the equation y' = f(y, x).
For the equivalent version of user3717023's answer using scipy's solve_ivp, you need to keep in mind the different ordering of the x and y arguments in the callback (opposite to the odeint version).
Further, keep in mind that you can only compute the solution up to a constant. So you might want to shift the result according to some given condition. In the example here (with the function f(x)=x^2 as given by the OP), I shifted the numeric solution such that it goes through the origin, matching the simplest analytic solution F(x)=x^3/3.
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import solve_ivp
def f(x):
    return x**2
xs = np.linspace(-20, 20, 1001)
# This is the integration step:
sol = solve_ivp(lambda x, y: f(x), t_span=(xs[0], xs[-1]), y0=[0], t_eval=xs)
plt.plot(sol.t, sol.t**3/3, ls='-', c='C0', label="analytic: $F(x)=x^3/3$")
plt.plot(sol.t, sol.y[0], ls='--', c='C1', label="numeric solution")
plt.plot(sol.t, sol.y[0] - sol.y[0][sol.t.size//2], ls='-.', c='C3', label="shifted solution going through origin")
plt.legend()
In case you don't have an analytical version of the function f, but only xs and ys as data points, then you can use scipy's interp1d function to interpolate between the data points and pass on that interpolating function the same way as before:
from scipy.interpolate import interp1d
f = interp1d(xs, ys)
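To make that concrete, a small sketch under the same setup (xs and ys here are hypothetical stand-ins for your sampled data):
import numpy as np
from scipy.interpolate import interp1d
from scipy.integrate import solve_ivp

xs = np.linspace(-20, 20, 1001)  # hypothetical sample points
ys = xs**2                       # hypothetical sampled values
f = interp1d(xs, ys, fill_value="extrapolate")  # extrapolation guards the interval edges
sol = solve_ivp(lambda x, y: f(x), t_span=(xs[0], xs[-1]), y0=[0], t_eval=xs)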

Python: Integration of Interpolation

I've got a question I can't solve:
#! /usr/bin/env python
import numpy as np
from scipy.interpolate import UnivariateSpline
from scipy.integrate import quad
import pylab as pl
x = ([0,10,20,30,40,50,60,70,...,4550,4560])
y = ([0,0,0,0,0,0,0,3,2,3,2,1,2,1,2,...,8,6,5,7,11,6,7,10,6,5,8,13,6,8,8,3])
s = UnivariateSpline(x, y, k=5, s=5)
xs = np.linspace(0, 4560, 4560)
ys = s(xs)
This is my code for interpolating some data.
In addition, I plotted this function.
But now I want to integrate it (from zero to infinity).
I tried
results = integrate.quad(ys, 0, 99999)
but it didn't work.
Can you give me some hints (or solutions) please? Thanks.
As Pierre GM said, you have to give a function for quad (I think also you can use np.inf for the upper bound, though here it doesn't matter as the splines go to 0 quickly anyways). However, what you want is:
s.integral(0, np.inf)
Since this is a spline, the UnivariateSpline object already implements an integral that should be better and faster.
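A small self-contained check (my example, with made-up data) showing the built-in integral agreeing with quad on the data range:
import numpy as np
from scipy.interpolate import UnivariateSpline
from scipy.integrate import quad

x = np.linspace(0, 10, 50)
y = np.exp(-x)
s = UnivariateSpline(x, y, k=5, s=0)
print(s.integral(0, 10))  # the spline's own integral
print(quad(s, 0, 10)[0])  # same value via quad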
According to the documentation of quad, you need to give a function as the first argument, followed by the lower and upper bounds of the integration range, and some extra arguments for your function (type help(quad) in your shell for more info).
You passed an array as first argument (ys), which is why it doesn't work. You may want to try something like:
results = quad(s, xs[0], xs[-1])
or
results = quad(s, 0, 9999)
