I am trying to invert an interpolated function using scipy's interpolate function. Let's say I create an interpolated function,
import scipy.interpolate as interpolate
interpolatedfunction = interpolate.interp1d(xvariable, data, kind='cubic')
Is there some function that can find x when I specify a:
interpolatedfunction(x) == a
In other words, "I want my interpolated function to equal a; what is the value of xvariable such that my function is equal to a?"
I appreciate I can do this with some numerical scheme, but is there a more straightforward method? What if the interpolated function is multivalued in xvariable?
There are dedicated methods for finding the roots of cubic splines. The simplest to use is the .roots() method of an InterpolatedUnivariateSpline object:
spl = InterpolatedUnivariateSpline(x, y)
roots = spl.roots()
This finds all of the roots instead of just one, as generic solvers (fsolve, brentq, newton, bisect, etc) do.
import numpy as np
from scipy.interpolate import InterpolatedUnivariateSpline

x = np.arange(20)
y = np.cos(np.arange(20))
spl = InterpolatedUnivariateSpline(x, y)
print(spl.roots())
outputs array([ 1.56669456, 4.71145244, 7.85321627, 10.99554642, 14.13792756, 17.28271674])
However, you want to equate the spline to some arbitrary number a, rather than 0. One option is to rebuild the spline (you can't just subtract a from it):
solutions = InterpolatedUnivariateSpline(x, y - a).roots()
Note that none of this will work with the function returned by interp1d; it does not have a roots method. For that function, using generic methods like fsolve is an option, but you will only get one root at a time from it. In any case, why use interp1d for cubic splines when there are more powerful ways to do the same kind of interpolation?
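As a rough sketch of that fsolve route, reusing the x and y from the example above and an arbitrary target value a chosen purely for illustration:

import numpy as np
from scipy.interpolate import interp1d
from scipy.optimize import fsolve

x = np.arange(20)
y = np.cos(np.arange(20))
a = 0.5                                  # arbitrary target value

f = interp1d(x, y, kind='cubic')

# fsolve returns only the root closest to each starting guess,
# so one call per expected root is needed
guesses = [1.0, 5.0, 8.0]
roots = [fsolve(lambda t: f(t) - a, g)[0] for g in guesses]
print(roots)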
Non-object-oriented way
Instead of rebuilding the spline after subtracting a from data, one can directly subtract a from spline coefficients. This requires us to drop down to non-object-oriented interpolation methods. Specifically, sproot takes in a tck tuple prepared by splrep, as follows:
from scipy.interpolate import splrep, sproot

tck = splrep(x, y, k=3, s=0)
tck_mod = (tck[0], tck[1] - a, tck[2])  # shift the spline coefficients down by a
solutions = sproot(tck_mod)
I'm not sure if messing with tck is worth the gain here, as it's possible that the bulk of computation time will be in root-finding anyway. But it's good to have alternatives.
After creating an interpolated function interp_fn, you can find the value of x where interp_fn(x) == a by finding the roots of the function
interp_fn2 = lambda x: interp_fn(x) - a
There are a number of options for finding roots in scipy.optimize. For instance, to use Newton's method with the initial value starting at 10:
from scipy import optimize
optimize.newton(interp_fn2, 10)
Actual example
Create an interpolated function and then find the roots where fn(x) == 5
import numpy as np
import matplotlib.pyplot as plt
from scipy import interpolate, optimize
x = np.arange(10)
y = 1 + 6*np.arange(10) - np.arange(10)**2
y2 = 5*np.ones_like(x)
plt.scatter(x,y)
plt.plot(x,y)
plt.plot(x,y2,'k-')
plt.show()
# create the interpolated function, and then the offset
# function used to find the roots
interp_fn = interpolate.interp1d(x, y, 'quadratic')
interp_fn2 = lambda x: interp_fn(x)-5
# to find the roots, we need to supply a starting value
# because there are more than 1 root in our range, we need
# to supply multiple starting values. They should be
# fairly close to the actual root
root1, root2 = optimize.newton(interp_fn2, 1), optimize.newton(interp_fn2, 5)
root1, root2
# returns:
(0.76393202250021064, 5.2360679774997898)
If your data are monotonic you might also try the following:
inversefunction = interpolate.interp1d(data, xvariable, kind='cubic')
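A minimal sketch of using that inverse (assuming the target value a lies within the range of data):

x_at_a = inversefunction(a)   # x such that interpolatedfunction(x) is approximately a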
Mentioning another option because I found this page in a Google search and this option works for my simple use case. Hopefully it'll be of use to someone.
If the function you're interpolating is very simple and always has a 1:1 relationship between y and x, then you can simply take your data, swap x and y when you pass it into interp1d, and then call the interpolation function in that direction.
Adapting code from https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.interp1d.html
import numpy as np
import matplotlib.pyplot as plt
from scipy import interpolate
x = np.arange(0, 10)
y = np.exp(-x/3.0)
f = interpolate.interp1d(x, y)
xnew = np.arange(0, 9, 0.1)
ynew = f(xnew)
plt.plot(x, y, 'o', xnew, ynew, '-')
plt.show()
Once x and y have been swapped, you can call swappedInterpolationFunction(a) to get the x value at which that would occur.
f = interpolate.interp1d(y, x)
xnew = np.arange(np.exp(-9/3), np.exp(0), 0.01)
ynew = f(xnew)
plt.plot(y, x, 'o', xnew, ynew, '-')
plt.title("Inverted")
plt.show()
Of course, if the function ever has multiple x values for a given y value (like sine or a parabola) then this will not work because it will no longer be a 1:1 function from x to y, and the above answers are necessary. This is just a simplification in a limited use case.
Related
I'd like to be able to numerically differentiate and integrate arrays in Python. I am aware that there are functions for this in numpy and scipy. I am noticing an offset however, when integrating.
As an example, I start with an initial function, y=cos(x).
[figure: plot of y = cos(x)]
I then take the derivative using numpy.gradient. It works as expected (plots as -sin(x)):
[figure: plot of dydx = d/dx(cos(x))]
When I integrate the derivative with scipy.integrate.cumtrapz, I expect to get back the initial function. However, there is some offset. I realize that the integral of -sin(x) is cos(x) + constant, so is the constant not accounted for in cumtrapz numerical integration?
[figure: plot of y = int(dydx), offset from cos(x)]
My concern is, if you have some arbitrary signal, and did not know the initial/boundary conditions, will the +constant term be unaccounted for with cumtrapz? Is there a solution for this with cumtrapz?
The code I used is as follows:
import numpy as np
import matplotlib.pyplot as plt
from scipy import integrate
x = np.linspace(-2*np.pi, 2*np.pi,100)
y = np.cos(x) #starting function
dydx = np.gradient(y, x) #derivative of function
dydx_int = integrate.cumtrapz(dydx, x, initial = 0) #integral of derivative
fig, ax = plt.subplots()
ax.plot(x, y)
ax.plot(x, dydx)
ax.plot(x, dydx_int)
ax.legend(['y = cos(x)', 'dydx = d/dx(cos(x))', 'y = int(dydx)'])
ax.set_xlabel('x')
ax.set_ylabel('y')
plt.show()
cumtrapz(), cumsum() and similar functions do what they state they do: they sum the input array cumulatively. With initial=0 the cumulative integral starts at zero, so the constant of integration (the value of y at the first point) is simply not included in the result.
To fix it in your code, you should add the offset to the cumulative sum:
dydx_int = dydx_int + y[0]
But for the general question about initial conditions of an integral:
My concern is, if you have some arbitrary signal, and did not know the initial/boundary conditions, will the +constant term be unaccounted for with cumtrapz? Is there a solution for this with cumtrapz?
Well, if you don't know the initial/boundary condition, cumtrapz can't know it either; the constant cannot be recovered from the derivative alone, so it has to be supplied from outside.
I have already checked post1, post2, post3 and post4 but they didn't help.
I have data about a specific plant, including two variables called "Age" and "Height". The correlation between them is non-linear.
To fit a model, one solution I assume is as follows:
If the non-linear function is
Height = a + b*Age + c*Age^2
then we can bring in a new variable K where
K = Age^2
so we have changed the first non-linear function into a multilinear regression one, Height = a + b*Age + c*K. Based on this, I have the following code:
from sklearn.linear_model import LinearRegression

data['K'] = data["Age"].pow(2)
x = data[["Age", "K"]]
y = data["Height"]
model = LinearRegression().fit(x, y)
print(model.score(x, y)) # = 0.9908571840250205
Am I doing this correctly?
How do I do this with cubic and exponential functions?
Thanks.
For cubic polynomials:
data['x2'] = data["Age"].pow(2)
data['x3'] = data["Age"].pow(3)
x = data[["Age", "x2","x3"]]
y = data["Height"]
model = LinearRegression().fit(x, y)
print(model.score(x, y))
You can handle exponential data by fitting log(y) instead; see the sketch below.
Or find some library that can fit polynomials automatically, e.g. numpy.polyfit: https://numpy.org/doc/stable/reference/generated/numpy.polyfit.html
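For the exponential case, here is a minimal sketch of the log(y) idea, assuming the same data DataFrame as in the question and a model of the form Height = A * exp(b * Age):

import numpy as np
from sklearn.linear_model import LinearRegression

# Height = A * exp(b * Age)  =>  log(Height) = log(A) + b * Age  (linear in Age)
x = data[["Age"]]
log_y = np.log(data["Height"])

model = LinearRegression().fit(x, log_y)
b = model.coef_[0]             # growth rate
A = np.exp(model.intercept_)   # prefactor
print(model.score(x, log_y))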
Hopefully you don't have a religious fervor for using SKLearn here because the answer I'm going to suggest is going to completely ignore it.
If you're interested in doing regression analysis where you get complete autonomy over the fitting function, I'd suggest cutting directly down to the least-squares optimization that drives a lot of this type of work, which you can do using scipy:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import leastsq
x, y = np.array([0,1,2,3,4,5]), np.array([0,1,4,9,16,25])
# initial_guess[i] maps to p[i] in function_to_fit, and must be reasonable
initial_guess = [1, 1, 1]
def function_to_fit(x, p):
    return pow(p[0]*x, 2) + p[1]*x + p[2]

def residuals(p, y, x):
    return y - function_to_fit(x, p)

cnsts = leastsq(
    residuals,
    initial_guess,
    args=(y, x)
)[0]
fig, ax = plt.subplots()
ax.plot(x, y, 'o')
xi = np.arange(0,10,0.1)
ax.plot(xi, [function_to_fit(x, cnsts) for x in xi])
plt.show()
Now this is a numeric approach to the solution, so I would recommend taking a moment to make sure you understand the limitations of such an approach - but for problems like these I've found they're more than adequate for functionalizing non-linear data sets without trying to do some hand-waving to make the data fit inside a linearizable manifold.
What is the best way to do a quadratic spline in Python? I used interp1d, but this method is not what I intend to do.
This is the example Python code:
from scipy.interpolate import interp1d
x = [5,6,7,8,9,10,11]
y = [2,9,6,3,4,20,6]
xx = [1,2,3,4,5,6,7,8,9,10,11,12]
f = interp1d(x, y, kind='quadratic')
yy = f(xx)
Every time I run this code I get this error:
ValueError: A value in x_new is below the interpolation range.
That error happens when you try to evaluate a point outside the interpolation range. For instance, if you only have data between 5 and 11, you can only interpolate within this range (otherwise we would be talking about extrapolation). However, the function scipy.interpolate.CubicSpline both interpolates and extrapolates, so you can get results outside the range.
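A minimal sketch of that CubicSpline alternative, reusing the x, y, and xx from the question:

from scipy.interpolate import CubicSpline

x = [5, 6, 7, 8, 9, 10, 11]
y = [2, 9, 6, 3, 4, 20, 6]
xx = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]

cs = CubicSpline(x, y)   # extrapolates outside [5, 11] by default
yy = cs(xx)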
Alternatively, staying with interp1d, try adding the parameter fill_value:
f= interp1d(x, y, kind='quadratic', fill_value='extrapolate')
The xx values 1, 2, 3, 4, and 12 are outside the initial data you provided, so the spline must be built so that it can handle them.
I have a function f, and I want to get its integral function, something like this:
Phi(x) = integral of f(t) dt from 0 to x
That is, instead of getting a single integration value at one point x, I need to get the values of Phi at multiple points.
For example:
Let's say I want it over the range (-20, 20):
import numpy as np
import matplotlib.pyplot as plt
from scipy import integrate

def f(x):
    return x**2

x_vals = np.arange(-20, 21, 1)
y_vals = [integrate.nquad(f, [[0, x_val]])[0] for x_val in x_vals]  # [0] keeps the value, drops the error estimate
plt.plot(x_vals, y_vals, '-', color='r')
The problem
In the example code I give above, for each point, the integration is done from scratch. In my real code, the f(x) is pretty complex, and it's a multiple integration, so the running time is simply too slow(Scipy: speed up integration when doing it for the whole surface?).
I'm wondering if there is any way of efficiently generating Phi(x) over a given range.
My thoughts:
The integration value Phi(20) is calculated from Phi(19), Phi(19) from Phi(18), and so on. So when we compute Phi(20), in reality we also pass through the whole series (-20, -19, -18, ... 18, 19, 20). Except that we didn't save the values.
So I'm thinking: is it possible to create save points for an integration, so that when it passes a save point the value gets saved before continuing to the next point? Then, in a single pass toward 20, we would also get the values at (-20, -19, -18, ... 18, 19, 20).
One could implement the strategy you outlined by integrating only over the short intervals (between consecutive x-values) and then taking the cumulative sum of the results. Like this:
import numpy as np
import scipy.integrate as si
def f(x):
return x**2
x_vals = np.arange(-20, 21, 1)
pieces = [si.quad(f, x_vals[i], x_vals[i+1])[0] for i in range(len(x_vals)-1)]
y_vals = np.cumsum([0] + pieces)
Here pieces are the integrals over short intervals, which get summed to produce y-values. As written, this code outputs a function that is 0 at the beginning of the range of integration which is -20. One can, of course, subtract the y-value that corresponds to x=0 in order to have the same normalization as on your plot.
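For example, a small sketch of that shift, reusing x_vals and y_vals from the snippet above:

import numpy as np

# subtract the value at x = 0 so the result satisfies Phi(0) = 0
y_vals = y_vals - y_vals[np.argmin(np.abs(x_vals))]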
That said, the split-and-sum process is unnecessary. When you find an indefinite integral of f, you are really solving the differential equation F' = f. And SciPy has a built-in method for that, odeint. Just use it:
import numpy as np
import scipy.integrate as si
def f(x):
return x**2
x_vals = np.arange(-20, 21, 1)
y_vals = si.odeint(lambda y,x: f(x), 0, x_vals)
The output is essentially identical to the first version (within tiny computational errors), with less code. The reason for using lambda y,x: f(x) is that the first argument of odeint must be a function taking two arguments, the right-hand side of the equation y' = f(y, x).
For the equivalent version of user3717023's answer using scipy's solve_ivp you need to keep in mind the different ordering of x and y in the function f (different from the odeint version).
Further, keep in mind that you can only compute the solution up to a constant. So you might want to shift the result according to some given condition. In the example here (with the function f(x)=x^2 as given by the OP), I shifted the numeric solution such that it goes through the origin, matching the simplest analytic solution F(x)=x^3/3.
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import solve_ivp
def f(x):
return x**2
xs = np.linspace(-20, 20, 1001)
# This is the integration step:
sol = solve_ivp(lambda x, y: f(x), t_span=(xs[0], xs[-1]), y0=[0], t_eval=xs)
plt.plot(sol.t, sol.t**3/3, ls='-', c='C0', label="analytic: $F(x)=x^3/3$")
plt.plot(sol.t, sol.y[0], ls='--', c='C1', label="numeric solution")
plt.plot(sol.t, sol.y[0] - sol.y[0][sol.t.size//2], ls='-.', c='C3', label="shifted solution going through origin")
plt.legend()
In case you don't have an analytical version of the function f, but only xs and ys as data points, you can use scipy's interp1d function to interpolate between the data points and pass that interpolating function on in the same way as before:
from scipy.interpolate import interp1d
f = interp1d(xs, ys)
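Putting that together, a minimal sketch; here ys is just a stand-in for whatever data points you actually have:

import numpy as np
from scipy.integrate import solve_ivp
from scipy.interpolate import interp1d

xs = np.linspace(-20, 20, 1001)
ys = xs**2                      # stand-in for measured data
f = interp1d(xs, ys)

# integrate the interpolated function exactly as with the analytic f above
sol = solve_ivp(lambda x, y: f(x), t_span=(xs[0], xs[-1]), y0=[0], t_eval=xs)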
Is there a way to get scipy's interp1d (in linear mode) to return the derivative at each interpolated point? I could certainly write my own 1D interpolation routine that does, but presumably scipy's is internally in C and therefore faster, and speed is already a major issue.
I am ultimately feeding a munging of the interpolated function into a multi-dimensional minimization routine, so being able to pass analytic derivatives would speed things up a lot rather than having the minimization routine try to calculate them itself. And interp1d must be calculating them internally --- so can I access them?
Use UnivariateSpline instead of interp1d, and use the derivative method to generate the first derivative. The example at the manual page here is pretty self-explanatory.
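A minimal sketch of that approach; the data arrays here are only placeholders:

import numpy as np
from scipy.interpolate import UnivariateSpline

x = np.linspace(0, 10, 50)
y = np.sin(x)                           # placeholder data

spl = UnivariateSpline(x, y, k=3, s=0)  # s=0 makes it an interpolating spline
dspl = spl.derivative()                 # new spline representing the 1st derivative
print(spl(2.5), dspl(2.5))              # value and slope at an arbitrary point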
You can combine scipy.interpolate.interp1d and scipy.misc.derivative, but there is something that must be taken into account:
When calling derivative method with some dx chosen as spacing, the derivative at x0 will be computed as the first order difference between x0-dx and x0+dx:
derivative(f, x0, dx) = (f(x0+dx) - f(x0-dx)) / (2 * dx)
As a result, you can't use derivative closer than dx to your interpolated function range limits, because f will raise a ValueError telling you that your interpolated function is not defined there.
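A rough sketch of that combination (note that scipy.misc.derivative is deprecated in recent SciPy releases, so treat this purely as an illustration; the data arrays are placeholders):

import numpy as np
from scipy.interpolate import interp1d
from scipy.misc import derivative

x = np.linspace(0, 10, 50)
y = np.sin(x)                        # placeholder data
f = interp1d(x, y)                   # linear interpolation by default

dx = 1e-6
x0 = 5.0
slope = derivative(f, x0, dx=dx)     # central difference (f(x0+dx) - f(x0-dx)) / (2*dx)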
So, what can you do closer than dx to those range limits?
If f is defined inside [xmin, xmax] (range):
At the range limits you can move x0 a bit in:
x0 = xmin + dx or x0 = xmax - dx
For other points you can refine dx (make it smaller).
Uniform function outside interpolation range:
If your interpolated function happens to be uniform outside the interpolation range:
f(x0 < xmin) = f(x0 > xmax) = f_out
You may define your interpolated function like this:
f = interp1d(x, y, bounds_error=False, fill_value=f_out)
Linear interpolation case:
For the linear case it might be cheaper to calculate just once the differences between points:
import numpy as np
df = np.diff(y) / np.diff(x)
This way you can access them as the components of an array.
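For instance, a small sketch of looking up the slope of the linear interpolant at an arbitrary point, using np.searchsorted to find the interval containing it (the data arrays are placeholders):

import numpy as np

x = np.linspace(0, 10, 50)
y = np.sin(x)                        # placeholder data

df = np.diff(y) / np.diff(x)         # slope on each interval [x[i], x[i+1]]
x0 = 5.0
i = np.clip(np.searchsorted(x, x0) - 1, 0, len(df) - 1)
slope_at_x0 = df[i]                  # derivative of the linear interpolant at x0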
As far as I know, internally interp1d uses a BSpline.
The BSpline has a derivative method which gives the nu-th derivative.
So for an interploation f = interp1d(x, y) you can use
fd1 = f._spline.derivative(nu=1)
However, be careful, as always when using functions with a leading underscore.
I don't think bounds are checked if you choose values outside the interpolation region. It also seems that BSpline appends a trailing dimension, so you have to write
val = fd1(0).item()
val_arr = fd1(np.array([0, 1]))[..., 0]