I have a function, I want to get its integral function, something like this:
That is, instead of getting a single integration value at point x, I need to get values at multiple points.
For example:
Let's say I want the range at (-20,20)
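import numpy as np
import matplotlib.pyplot as plt
from scipy import integrate
def f(x):
    return x**2
x_vals = np.arange(-20, 21, 1)
# nquad returns (value, error estimate); keep only the value
y_vals = [integrate.nquad(f, [[0, x_val]])[0] for x_val in x_vals]
plt.plot(x_vals, y_vals, '-', color='r')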
The problem
In the example code I give above, the integration is done from scratch for each point. In my real code, f(x) is pretty complex and it's a multiple integral, so the running time is simply too slow (see "Scipy: speed up integration when doing it for the whole surface?").
I'm wondering if there is an efficient way of generating Phi(x) over a given range.
My thoughts:
The integration value at point Phi(20) is calculated from Phi(19), and Phi(19) from Phi(18), and so on. So when we get Phi(20), in reality we also get the whole series (-20, -19, -18, -17 ... 18, 19, 20), except that we didn't save the values.
So I'm thinking: is it possible to create save points for an integration routine, so that when it passes a save point, the value gets saved and the integration continues to the next point? That way, a single pass toward 20 would also give the values at (-20, -19, -18, -17 ... 18, 19, 20).
One could implement the strategy you outlined by integrating only over the short intervals (between consecutive x-values) and then taking the cumulative sum of the results. Like this:
import numpy as np
import scipy.integrate as si
def f(x):
    return x**2
x_vals = np.arange(-20, 21, 1)
pieces = [si.quad(f, x_vals[i], x_vals[i+1])[0] for i in range(len(x_vals)-1)]
y_vals = np.cumsum([0] + pieces)
Here pieces are the integrals over short intervals, which get summed to produce y-values. As written, this code outputs a function that is 0 at the beginning of the range of integration which is -20. One can, of course, subtract the y-value that corresponds to x=0 in order to have the same normalization as on your plot.
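The normalization mentioned above can be sketched as follows. This is only a minimal example, assuming the grid contains x = 0 exactly; the names zero_index and y_shifted are illustrative and not part of the code above.

```python
import numpy as np
import scipy.integrate as si

def f(x):
    return x**2

x_vals = np.arange(-20, 21, 1)

# Integrate over each short interval, then accumulate.
pieces = [si.quad(f, a, b)[0] for a, b in zip(x_vals[:-1], x_vals[1:])]
y_vals = np.cumsum([0.0] + pieces)

# Shift the result so that the antiderivative is 0 at x = 0
# (assumption: 0 is one of the grid points).
zero_index = np.flatnonzero(x_vals == 0)[0]
y_shifted = y_vals - y_vals[zero_index]
```

With this shift, y_shifted should follow x**3 / 3 on the grid.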
That said, the split-and-sum process is unnecessary. When you find an indefinite integral of f, you are really solving the differential equation F' = f. And SciPy has a built-in method for that, odeint. Just use it:
import numpy as np
import scipy.integrate as si
def f(x):
    return x**2
x_vals = np.arange(-20, 21, 1)
y_vals = si.odeint(lambda y,x: f(x), 0, x_vals)
The output is essentially identical to the first version (within tiny computational errors), with less code. The reason for using lambda y,x: f(x) is that the first argument of odeint must be a function taking two arguments, the right-hand side of the equation y' = f(y, x).
For the equivalent version of user3717023's answer using scipy's solve_ivp you need to keep in mind the different ordering of x and y in the function f (different from the odeint version).
Further, keep in mind that you can only compute the solution up to a constant. So you might want to shift the result according to some given condition. In the example here (with the function f(x)=x^2 as given by the OP), I shifted the numeric solution such that it goes through the origin, matching the simplest analytic solution F(x)=x^3/3.
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import solve_ivp
def f(x):
    return x**2
xs = np.linspace(-20, 20, 1001)
# This is the integration step:
sol = solve_ivp(lambda x, y: f(x), t_span=(xs[0], xs[-1]), y0=[0], t_eval=xs)
plt.plot(sol.t, sol.t**3/3, ls='-', c='C0', label="analytic: $F(x)=x^3/3$")
plt.plot(sol.t, sol.y[0], ls='--', c='C1', label="numeric solution")
plt.plot(sol.t, sol.y[0] - sol.y[0][sol.t.size//2], ls='-.', c='C3', label="shifted solution going through origin")
plt.legend()
In case you don't have an analytical version of the function f, but only xs and ys as data points, then you can use scipy's interp1d function to interpolate between the data points and pass on that interpolating function the same way as before:
from scipy.interpolate import interp1d
f = interp1d(xs, ys)
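A minimal sketch of that data-only workflow, assuming the samples happen to come from x**2 so the result can be checked against x^3/3. The fill_value="extrapolate" and max_step settings are assumptions added here for robustness, not something the approach requires: the first guards against round-off pushing an evaluation point marginally past the last sample, the second keeps the solver from stepping over many samples at once.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.interpolate import interp1d

# Sampled data standing in for measurements (assumption: generated from x**2).
xs = np.linspace(-20, 20, 1001)
ys = xs**2

# Linear interpolant of the samples; extrapolation only matters for
# evaluation points that land a rounding error outside the data range.
f = interp1d(xs, ys, fill_value="extrapolate")

sol = solve_ivp(lambda x, y: [float(f(x))], t_span=(xs[0], xs[-1]), y0=[0],
                t_eval=xs, max_step=xs[1] - xs[0])

# Shift so the antiderivative passes through the origin
# (the middle sample sits at x = 0 on this grid).
F = sol.y[0] - sol.y[0][sol.t.size // 2]
```

The accuracy is limited by the interpolation, so a finer grid or a higher-order interpolant would be the first thing to try if the result is too rough.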
Related
I am trying to invert an interpolated function using scipy's interpolate function. Let's say I create an interpolated function,
import scipy.interpolate as interpolate
interpolatedfunction = interpolate.interp1d(xvariable,data,kind='cubic')
Is there some function that can find x when I specify a:
interpolatedfunction(x) == a
In other words, "I want my interpolated function to equal a; what is the value of xvariable such that my function is equal to a?"
I appreciate I can do this with some numerical scheme, but is there a more straightforward method? What if the interpolated function is multivalued in xvariable?
There are dedicated methods for finding roots of cubic splines. The simplest to use is the .roots() method of InterpolatedUnivariateSpline object:
spl = InterpolatedUnivariateSpline(x, y)
roots = spl.roots()
This finds all of the roots instead of just one, as generic solvers (fsolve, brentq, newton, bisect, etc) do.
import numpy as np
from scipy.interpolate import InterpolatedUnivariateSpline
x = np.arange(20)
y = np.cos(np.arange(20))
spl = InterpolatedUnivariateSpline(x, y)
print(spl.roots())
outputs array([ 1.56669456, 4.71145244, 7.85321627, 10.99554642, 14.13792756, 17.28271674])
However, you want to equate the spline to some arbitrary number a, rather than 0. One option is to rebuild the spline (you can't just subtract a from it):
solutions = InterpolatedUnivariateSpline(x, y - a).roots()
Note that none of this will work with the function returned by interp1d; it does not have a roots method. For that function, using generic methods like fsolve is an option, but you will only get one root at a time from it. In any case, why use interp1d for cubic splines when there are more powerful ways to do the same kind of interpolation?
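As a small worked sketch of the rebuild approach, here is the cosine data from above solved for a hypothetical target a = 0.5. The residual check against the original spline is an addition for illustration.

```python
import numpy as np
from scipy.interpolate import InterpolatedUnivariateSpline

x = np.arange(20)
y = np.cos(x)
a = 0.5  # hypothetical target value

spl = InterpolatedUnivariateSpline(x, y)

# Roots of the spline through (x, y - a) are the points where spl(x) == a.
solutions = InterpolatedUnivariateSpline(x, y - a).roots()

# Sanity check: evaluate the original spline at the candidates.
residuals = spl(solutions) - a
```

All entries of residuals should be close to zero, and solutions contains every crossing in the data range rather than just one.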
Non-object-oriented way
Instead of rebuilding the spline after subtracting a from data, one can directly subtract a from spline coefficients. This requires us to drop down to non-object-oriented interpolation methods. Specifically, sproot takes in a tck tuple prepared by splrep, as follows:
from scipy.interpolate import splrep, sproot
tck = splrep(x, y, k=3, s=0)
tck_mod = (tck[0], tck[1] - a, tck[2])
solutions = sproot(tck_mod)
I'm not sure if messing with tck is worth the gain here, as it's possible that the bulk of computation time will be in root-finding anyway. But it's good to have alternatives.
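Shifting the coefficients works because the B-spline basis functions on a given knot vector sum to one, so subtracting a from every coefficient subtracts a from the spline itself. A quick sketch comparing the two routes, with a = 0.5 as a hypothetical target:

```python
import numpy as np
from scipy.interpolate import InterpolatedUnivariateSpline, splrep, sproot

x = np.arange(20)
y = np.cos(x)
a = 0.5  # hypothetical target value

# Route 1: shift the B-spline coefficients of the existing fit.
tck = splrep(x, y, k=3, s=0)
tck_mod = (tck[0], tck[1] - a, tck[2])
roots_tck = np.sort(sproot(tck_mod))

# Route 2: rebuild the spline from shifted data.
roots_rebuilt = np.sort(InterpolatedUnivariateSpline(x, y - a).roots())
```

Both arrays should agree to within rounding error.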
After creating an interpolated function interp_fn, you can find the value of x where interp_fn(x) == a by the roots of the function
interp_fn2 = lambda x: interp_fn(x) - a
There are a number of options for finding roots in scipy.optimize. For instance, to use Newton's method with the initial value starting at 10:
from scipy import optimize
optimize.newton(interp_fn2, 10)
Actual example
Create an interpolated function and then find the roots where fn(x) == 5
import numpy as np
import matplotlib.pyplot as plt
from scipy import interpolate, optimize
x = np.arange(10)
y = 1 + 6*np.arange(10) - np.arange(10)**2
y2 = 5*np.ones_like(x)
plt.scatter(x,y)
plt.plot(x,y)
plt.plot(x,y2,'k-')
plt.show()
# create the interpolated function, and then the offset
# function used to find the roots
interp_fn = interpolate.interp1d(x, y, 'quadratic')
interp_fn2 = lambda x: interp_fn(x)-5
# to find the roots, we need to supply a starting value
# because there are more than 1 root in our range, we need
# to supply multiple starting values. They should be
# fairly close to the actual root
root1, root2 = optimize.newton(interp_fn2, 1), optimize.newton(interp_fn2, 5)
root1, root2
# returns:
(0.76393202250021064, 5.2360679774997898)
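If guessing starting values is inconvenient, one alternative (not used in the example above) is to scan a grid for sign changes and hand each bracket to optimize.brentq. This is only a sketch: the grid size of 200 is an arbitrary assumption, and roots where the function touches the target without crossing it would be missed.

```python
import numpy as np
from scipy import interpolate, optimize

x = np.arange(10)
y = 1 + 6 * x - x**2
interp_fn = interpolate.interp1d(x, y, 'quadratic')
target = 5

def shifted(t):
    # Offset function whose roots are the solutions of interp_fn(t) == target.
    return float(interp_fn(t)) - target

# Sample the offset function and look for sign changes between neighbours.
grid = np.linspace(x[0], x[-1], 200)
values = np.array([shifted(t) for t in grid])
brackets = np.flatnonzero(np.sign(values[:-1]) * np.sign(values[1:]) < 0)

# Each sign change brackets one root, which brentq then refines.
roots = [optimize.brentq(shifted, grid[i], grid[i + 1]) for i in brackets]
```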
If your data are monotonic you might also try the following:
inversefunction = interpolate.interp1d(data, xvariable, kind='cubic')
Mentioning another option because I found this page in a google search and the other option works for my simple use case. Hopefully it'll be of use to someone.
If the function you're interpolating is very simple and always has a 1:1 relationship between y and x, then you can simply take your data, swap x and y when you pass it into interp1d, and then call the interpolation function in that direction.
Adapting code from https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.interp1d.html
import numpy as np
import matplotlib.pyplot as plt
from scipy import interpolate
x = np.arange(0, 10)
y = np.exp(-x/3.0)
f = interpolate.interp1d(x, y)
xnew = np.arange(0, 9, 0.1)
ynew = f(xnew)
plt.plot(x, y, 'o', xnew, ynew, '-')
plt.show()
When x and y have been swapped you can call swappedInterpolationFunction(a) to get the x value where that would occur.
f = interpolate.interp1d(y, x)
xnew = np.arange(np.exp(-9/3), np.exp(0), 0.01)
ynew = f(xnew)
plt.plot(y, x, 'o', xnew, ynew, '-')
plt.title("Inverted")
plt.show()
Of course, if the function ever has multiple x values for a given y value (like sine or a parabola) then this will not work because it will no longer be a 1:1 function from x to y, and the above answers are necessary. This is just a simplification in a limited use case.
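A short sketch of guarding the swap with a monotonicity check before building the inverse. The names is_monotonic, f_inv and x_back are illustrative only.

```python
import numpy as np
from scipy import interpolate

x = np.arange(0, 10)
y = np.exp(-x / 3.0)

# The swap only makes sense if y is strictly increasing or strictly decreasing.
steps = np.diff(y)
is_monotonic = bool(np.all(steps > 0) or np.all(steps < 0))

f = interpolate.interp1d(x, y)
if is_monotonic:
    # interp1d sorts its first argument by default, so decreasing y is fine.
    f_inv = interpolate.interp1d(y, x)
    # Round trip: x -> y -> x should give back the starting point.
    x_back = float(f_inv(f(2.5)))
```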
I'm trying to use solve_ivp from scipy in Python to solve an IVP. I specified the t_span argument of solve_ivp to be (0, 10), as shown below. However, for some reason, the solutions I get always stop around t=2.5.
from scipy.integrate import solve_ivp
import numpy as np
import matplotlib.pyplot as plt
import scipy.optimize as optim
def dudt(t, u):
    return u*(1-u/12)-4*np.heaviside(-(t-5), 1)
ic = [2,4,6,8,10,12,14,16,18,20]
sol = solve_ivp(dudt, (0, 10), ic, t_eval=np.linspace(0, 10, 10000))
for solution in sol.y:
    y = [y for y in solution if y >= 0]
    t = sol.t[:len(y)]
    plt.plot(t, y)
What is going wrong
You should always look at what the solver returns. In this case it gives
message: 'Required step size is less than spacing between numbers.'
Think of the process of solving your initial value problem with scipy.integrate.solve_ivp as repeatedly estimating a direction and then going a small step in that direction. The above error means that the solutions to your equation change so fast that even the smallest possible step is too large. But your equation is simple enough that, at least for t <= 5 where 4*np.heaviside(-(t-5), 1) always gives 4, it can be solved exactly/symbolically. I will explain more for t > 5 later.
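A minimal sketch of inspecting what the solver returns, restricted to the single initial value 2 to keep it short. The np.errstate wrapper is an assumption added only to silence floating-point warnings while the solution runs away.

```python
import numpy as np
from scipy.integrate import solve_ivp

def dudt(t, u):
    return u * (1 - u / 12) - 4 * np.heaviside(-(t - 5), 1)

# Suppress overflow/invalid warnings that may appear near the blow-up.
with np.errstate(all="ignore"):
    sol = solve_ivp(dudt, (0, 10), [2])

print(sol.success)   # whether the solver reached the end of t_span
print(sol.status)    # integer status code
print(sol.message)   # human-readable reason for stopping
print(sol.t[-1])     # last time the solver actually reached
```

Checking sol.success (or sol.status) before plotting makes a premature stop like this one visible immediately.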
Symbolic Solution
Sympy can solve your differential equation. While you can provide it an initial value it would have taken much longer to solve it once for each of your initial values. So instead I told it to give me all solutions and then I calculated the parameters C1 for your initial value separately.
import numpy as np
import matplotlib.pyplot as plt
from sympy import *
ics = [2,4,6,8,10,12,14,16,18,20]
f = symbols("f", cls=Function)
t = symbols("t")
eq = Eq(f(t).diff(t),f(t)*(1-f(t)/12)-4)
base_sol = dsolve(eq)
c1s = [solve(base_sol.args[1].subs({t:0})-ic) for ic in ics]
# Apparently sympy is unhappy that numpy does not supply a cotangent.
# So I do that manually.
sols = [lambdify(t, base_sol.args[1].subs({symbols('C1'):C1[0]}),
modules=['numpy', {'cot':lambda x:1/np.tan(x)}]) for C1 in c1s]
t = np.linspace(0, 5, 10000)
for sol in sols:
    y = sol(t)
    mask = (y > -5) & (y < 20)
    plt.plot(t[mask], y[mask])
At first glance the picture looks odd, especially the straight blue and orange line segments. These appear only because the values in between lie outside the masked range, so matplotlib connects the remaining points directly. What is actually happening is a sudden jump. That jump is what tripped up the numeric ODE solver earlier. You can see it even more clearly when you make sympy print the first solution.
The tangent is known to have a jump (a pole) at pi/2, and if you solve for the t at which the argument of the tangent above reaches it, you get 2.47241377386575, which is probably where your plotting stopped.
Now what about t>5?
Unfortunately your equation is not continuous at t=5. One approach would be to solve the equation for t>5 separately, using as initial values the points reached by following the solutions of the first equation. But that is another question for another day.
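A rough sketch of that two-stage idea, done numerically rather than symbolically and only for the initial value 20, on the assumption that its solution is still finite at t = 5. The function names and tolerances are choices made here for illustration.

```python
import numpy as np
from scipy.integrate import solve_ivp

def rhs_before(t, u):
    # Right-hand side for t <= 5, where the constant -4 term is present.
    return u * (1 - u / 12) - 4

def rhs_after(t, u):
    # Right-hand side for t > 5, where the -4 term has dropped out.
    return u * (1 - u / 12)

u0 = [20.0]
first = solve_ivp(rhs_before, (0, 5), u0, rtol=1e-8, atol=1e-10)

# Restart at t = 5 from the state the first stage ended in.
second = solve_ivp(rhs_after, (5, 10), first.y[:, -1], rtol=1e-8, atol=1e-10)

# Stitch the two pieces together, dropping the duplicated point at t = 5.
t_all = np.concatenate([first.t, second.t[1:]])
u_all = np.concatenate([first.y[0], second.y[0][1:]])
```

Initial values whose solutions blow up before t = 5 would still fail in the first stage, so each stage's success flag should be checked.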
I'd like to be able to numerically differentiate and integrate arrays in Python. I am aware that there are functions for this in numpy and scipy. I am noticing an offset however, when integrating.
As an example, I start with an initial function, y=cos(x).
image, y = cos(x)
I then take the derivative using numpy.gradient. It works as expected (plots as -sin(x)):
image, dydx = d/dx(cos(x))
When I integrate the derivative with scipy.integrate.cumtrapz, I expect to get back the initial function. However, there is some offset. I realize that the integral of -sin(x) is cos(x)+constant, so is the constant not accounted for with cumtrapz numerical integration?
image, y = int(dydx)
My concern is, if you have some arbitrary signal, and did not know the initial/boundary conditions, will the +constant term be unaccounted for with cumtrapz? Is there a solution for this with cumtrapz?
The code I used is as follows:
import numpy as np
import matplotlib.pyplot as plt
from scipy import integrate
x = np.linspace(-2*np.pi, 2*np.pi,100)
y = np.cos(x) #starting function
dydx = np.gradient(y, x) #derivative of function
dydx_int = integrate.cumtrapz(dydx, x, initial = 0) #integral of derivative
fig, ax = plt.subplots()
ax.plot(x, y)
ax.plot(x, dydx)
ax.plot(x, dydx_int)
ax.legend(['y = cos(x)', 'dydx = d/dx(cos(x))', 'y = int(dydx)'])
ax.set_xlabel('x')
ax.set_ylabel('y')
plt.show()
cumtrapz(), cumsum() and similar do what they state they do: they sum the input array cumulatively. With initial=0 the first element of the result is 0, whereas your original function starts at y[0] = cos(-2*pi) = 1, which is exactly the offset you see.
To fix it in your code, you should add the offset to the cumulated sum:
dydx_int = dydx_int + y[0]
But for the general question about initial conditions of an integral:
My concern is, if you have some arbitrary signal, and did not know the initial/boundary conditions, will the +constant term be unaccounted for with cumtrapz? Is there a solution for this with cumtrapz?
Well, if you don't know the initial/boundary condition, cumtrapz won't know it either. Your question doesn't quite make sense.
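For reference, a compact sketch of the offset fix. It uses cumulative_trapezoid, the name under which newer SciPy releases expose cumtrapz; max_error is just an illustrative check.

```python
import numpy as np
from scipy.integrate import cumulative_trapezoid

x = np.linspace(-2 * np.pi, 2 * np.pi, 100)
y = np.cos(x)                # starting function
dydx = np.gradient(y, x)     # numerical derivative

# The cumulative integral starts at 0, so the known starting value y[0]
# plays the role of the integration constant.
recovered = cumulative_trapezoid(dydx, x, initial=0) + y[0]

max_error = np.max(np.abs(recovered - y))
```

The remaining difference comes from the finite-difference and trapezoid approximations and shrinks as the grid gets finer.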
I have a time series x(t) that is a NumPy array. My assignment tells me that I need to find the integral of this data with time.
How am I supposed to do this? It's not a function that I need to integrate, it's a list of data.
It depends on the statement of the problem. A crude approach would be something like this
import numpy as np
import scipy as sp
t = np.linspace(-1, 1, 100)
x = t*t
delta = t[1] - t[0]
I = sum(delta*x)
You can use Simpson's Rule. A routine that does that for you is simps in scipy.integrate (recent SciPy releases provide it under the name simpson).
>>> help(scipy.integrate.simps)
Help on function simps in module scipy.integrate.quadrature:
simps(y, x=None, dx=1, axis=-1, even='avg')
Integrate y(x) using samples along the given axis and the composite
Simpson's rule. If x is None, spacing of dx is assumed.
If there are an even number of samples, N, then there are an odd
number of intervals (N-1), but Simpson's rule requires an even number
of intervals. The parameter 'even' controls how this is handled.
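A small comparison of the approaches on the same samples, using the trapezoid and simpson names from recent SciPy releases. The data array is called signal here purely to avoid clashing with the x keyword.

```python
import numpy as np
from scipy.integrate import simpson, trapezoid

t = np.linspace(-1, 1, 100)
signal = t * t               # sampled data; the exact integral is 2/3

# Crude rectangle sum, as in the first answer.
crude = np.sum(signal) * (t[1] - t[0])

# Trapezoidal rule and Simpson's rule on the same samples.
trap = trapezoid(signal, x=t)
simp = simpson(signal, x=t)
```

On smooth data like this, Simpson's rule should land closest to the exact value, while the rectangle sum overshoots because it counts both endpoints in full.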
I am trying to use NumPy's fft function. However, when I give it a simple Gaussian function, the FFT of that Gaussian is not a Gaussian. It's close, but it's halved, so that each half is at either end of the x axis.
The Gaussian function I'm calculating is
y = exp(-x^2)
Here is my code:
from cmath import *
from numpy import multiply
from numpy.fft import fft
from pylab import plot, show
""" Basically the standard range() function but with float support """
def frange (min_value, max_value, step):
    value = float(min_value)
    array = []
    while value < float(max_value):
        array.append(value)
        value += float(step)
    return array
N = 256.0 # number of steps
y = []
x = frange(-5, 5, 10/N)
# fill array y with values of the Gaussian function
cache = -multiply(x, x)
for i in cache: y.append(exp(i))
Y = fft(y)
# plot the fft of the Gaussian function
plot(x, abs(Y))
show()
The result is not quite right, because the FFT of a Gaussian function should be a Gaussian function itself...
np.fft.fft returns a result in so-called "standard order": (from the docs)
If A = fft(a, n), then A[0] contains the zero-frequency term (the mean of the signal), which is always purely real for real inputs. Then A[1:n/2] contains the positive-frequency terms, and A[n/2+1:] contains the negative-frequency terms, in order of decreasingly negative frequency.
The function np.fft.fftshift rearranges the result into the order most humans expect (and which is good for plotting):
The routine np.fft.fftshift(A) shifts transforms and their frequencies to put the zero-frequency components in the middle...
So using np.fft.fftshift:
import matplotlib.pyplot as plt
import numpy as np
N = 128
x = np.arange(-5, 5, 10./(2 * N))
y = np.exp(-x * x)
y_fft = np.fft.fftshift(np.abs(np.fft.fft(y))) / np.sqrt(len(y))
plt.plot(x,y)
plt.plot(x,y_fft)
plt.show()
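The snippet above plots the spectrum against x. For a physically meaningful axis, one can build the frequencies with np.fft.fftfreq and shift them the same way. The sketch below also compares against the analytic transform, assuming the continuous Fourier convention with kernel exp(-2*pi*i*f*x), under which exp(-x^2) transforms to sqrt(pi) * exp(-pi^2 f^2); scaling the FFT by dx is what approximates that integral.

```python
import numpy as np

N = 256
dx = 10.0 / N
x = np.arange(-5, 5, dx)
y = np.exp(-x * x)

# Frequency axis in cycles per unit of x, reordered so that 0 is in the middle.
freqs = np.fft.fftshift(np.fft.fftfreq(N, d=dx))

# dx * |FFT| approximates the magnitude of the continuous transform.
spectrum = dx * np.abs(np.fft.fftshift(np.fft.fft(y)))

# Analytic magnitude under the assumed convention.
expected = np.sqrt(np.pi) * np.exp(-(np.pi * freqs) ** 2)
```

Plotting spectrum against freqs (for example with plt.plot(freqs, spectrum)) then shows a Gaussian centred at zero frequency.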
Your result is not even close to a Gaussian, not even one split into two halves.
To get the result you expect, you will have to position your own Gaussian with the center at index 0, and the result will also be positioned that way. Try the following code:
from pylab import *
N = 128
x = r_[arange(0, 5, 5./N), arange(-5, 0, 5./N)]
y = exp(-x*x)
y_fft = fft(y) / sqrt(2 * N)
plot(r_[y[N:], y[:N]])
plot(r_[y_fft[N:], y_fft[:N]])
show()
The plot commands split the arrays into two halves and swap them to get a nicer picture.
It is being displayed with the center (i.e. mean) at coefficient index zero. That is why it appears that the right half is on the left, and vice versa.
EDIT: Explore the following code:
import numpy as np
import scipy.signal as sig
import pylab
x = sig.windows.gaussian(2048, 10)
X = np.abs(np.fft.fft(x))
pylab.plot(x)
pylab.plot(X)
pylab.plot(X[list(range(1024, 2048)) + list(range(0, 1024))])
The last line will plot X starting from the center of the vector, then wrap around to the beginning.
A discrete Fourier transform implicitly repeats indefinitely, as it is a transform of a signal that is implicitly periodic. Note that when you pass y to be transformed, the x values are not supplied, so in fact the Gaussian that is transformed is centred on the middle index between 0 and 256, i.e. 128.
Remember also that translation of f(x) is phase change of F(x).
Following on from Sven Marnach's answer, a simpler version would be this:
from pylab import *
N = 128
x = ifftshift(arange(-5,5,5./N))
y = exp(-x*x)
y_fft = fft(y) / sqrt(2 * N)
plot(fftshift(y))
plot(fftshift(y_fft))
show()
This yields a plot identical to the above one.
The key (and this seems strange to me) is that NumPy's assumed data ordering --- in both frequency and time domains --- is to have the "zero" value first. This is not what I'd expect from other implementations of FFT, such as the FFTW3 libraries in C.
This was slightly fudged in the answers from unutbu and Steve Tjoa above, because they're taking the absolute value of the FFT before plotting it, thus wiping away the phase issues resulting from not using the "standard order" in time.
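The phase issue can be seen directly by comparing the two orderings without taking absolute values. A minimal sketch, with variable names chosen here for illustration:

```python
import numpy as np

N = 256
x = np.arange(-5, 5, 10.0 / N)
y = np.exp(-x * x)

# Gaussian peak in the middle of the array (index N // 2).
centered = np.fft.fft(y)

# Gaussian peak moved to index 0 before transforming.
zero_first = np.fft.fft(np.fft.ifftshift(y))

# Shifting the input by N/2 samples multiplies bin k by (-1)**k, so the
# magnitudes agree while the signs of the centered version alternate.
signs = (-1.0) ** np.arange(N)
```

Here zero_first is real up to rounding, whereas centered alternates in sign from bin to bin, which is why plotting it without abs() looks jagged.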