Integration offset with cumtrapz (python/scipy) - python

I'd like to be able to numerically differentiate and integrate arrays in Python. I am aware that there are functions for this in numpy and scipy. I am noticing an offset however, when integrating.
As an example, I start with an initial function, y=cos(x).
image, y = cos(x)
I then take the derivative using numpy.gradient. It works as expected (plots as -sin(x)):
image, dydx = d/dx(cos(x))
When I integrate the derivative with scipy.cumtrapz, I expect to get back the initial function. However, there is some offset. I realize that the integral of -sin(x) is cos(x)+constant, so is the constant not accounted for with cumtrapz numerical integration?
image, y = int(dydx)
My concern is, if you have some arbitrary signal, and did not know the initial/boundary conditions, will the +constant term be unaccounted for with cumtrapz? Is there a solution for this with cumtrapz?
The code I used is as follows:
import numpy as np
import matplotlib.pyplot as plt
from scipy import integrate
x = np.linspace(-2*np.pi, 2*np.pi,100)
y = np.cos(x) #starting function
dydx = np.gradient(y, x) #derivative of function
dydx_int = integrate.cumtrapz(dydx, x, initial = 0) #integral of derivative
fig, ax = plt.subplots()
ax.plot(x, y)
ax.plot(x, dydx)
ax.plot(x, dydx_int)
ax.legend(['y = cos(x)', 'dydx = d/dx(cos(x))', 'y = int(dydx)'])
ax.set_xlabel('x')
ax.set_ylabel('y')
plt.show()

cumtrapz(), cumsum() and similar functions do exactly what they say: they sum the input array cumulatively. With initial=0, the first element of the result is 0, so the integrated curve starts at 0 rather than at y[0].
To fix it in your code, add the offset to the cumulative sum:
dydx_int = dydx_int + y[0]
But for the general question about initial conditions of an integral:
My concern is, if you have some arbitrary signal, and did not know the initial/boundary conditions, will the +constant term be unaccounted for with cumtrapz? Is there a solution for this with cumtrapz?
Well, if you don't know the initial/boundary condition, cumtrapz won't know it either. The constant of integration has to come from information outside the signal being integrated.
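A minimal sketch of the fix above, using the question's own signal. It assumes a recent SciPy where the function is named cumulative_trapezoid and falls back to cumtrapz on older releases; the names raw and recovered are illustrative only.

```python
import numpy as np

try:
    # Current SciPy name for the cumulative trapezoid rule.
    from scipy.integrate import cumulative_trapezoid
except ImportError:
    # Older SciPy releases only expose the original name.
    from scipy.integrate import cumtrapz as cumulative_trapezoid

x = np.linspace(-2 * np.pi, 2 * np.pi, 100)
y = np.cos(x)
dydx = np.gradient(y, x)

# With initial=0 the running integral starts at 0, i.e. it tracks y - y[0].
raw = cumulative_trapezoid(dydx, x, initial=0)

# Supplying the known starting value restores the constant of integration.
recovered = raw + y[0]

print(np.max(np.abs(raw - y)))        # offset close to y[0] = 1
print(np.max(np.abs(recovered - y)))  # small discretization error only
```

If the starting value is not known, any other single known point of the signal can be used to fix the constant in the same way.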

Related

How do I obtain exact value from scipy interpolation/matplotlib? [duplicate]

I am trying to invert an interpolated function using scipy's interpolate function. Let's say I create an interpolated function,
import scipy.interpolate as interpolate
interpolatedfunction = interpolate.interp1d(xvariable,data,kind='cubic')
Is there some function that can find x when I specify a:
interpolatedfunction(x) == a
In other words, "I want my interpolated function to equal a; what is the value of xvariable such that my function is equal to a?"
I appreciate I can do this with some numerical scheme, but is there a more straightforward method? What if the interpolated function is multivalued in xvariable?
There are dedicated methods for finding the roots of cubic splines. The simplest to use is the .roots() method of an InterpolatedUnivariateSpline object:
spl = InterpolatedUnivariateSpline(x, y)
roots = spl.roots()
This finds all of the roots instead of just one, as generic solvers (fsolve, brentq, newton, bisect, etc) do.
import numpy as np
from scipy.interpolate import InterpolatedUnivariateSpline
x = np.arange(20)
y = np.cos(np.arange(20))
spl = InterpolatedUnivariateSpline(x, y)
print(spl.roots())
outputs array([ 1.56669456, 4.71145244, 7.85321627, 10.99554642, 14.13792756, 17.28271674])
However, you want to equate the spline to some arbitrary number a, rather than 0. One option is to rebuild the spline (you can't just subtract a from it):
solutions = InterpolatedUnivariateSpline(x, y - a).roots()
Note that none of this will work with the function returned by interp1d; it does not have a roots method. For that function, using generic methods like fsolve is an option, but you will only get one root at a time from it. In any case, why use interp1d for cubic splines when there are more powerful ways to do the same kind of interpolation?
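As a rough, self-contained sketch of the rebuild-the-spline approach, the following solves spline(x) == a for an example level a = 0.5 (the value of a and the variable names are assumptions for illustration) and checks the solutions against the original spline:

```python
import numpy as np
from scipy.interpolate import InterpolatedUnivariateSpline

x = np.arange(20)
y = np.cos(x)
a = 0.5  # example target level, chosen only for illustration

# Shift the data by a, so the roots of the new spline are the
# points where the original spline equals a.
solutions = InterpolatedUnivariateSpline(x, y - a).roots()

# Check: evaluate the unshifted spline at the solutions.
spl = InterpolatedUnivariateSpline(x, y)
residual = np.max(np.abs(spl(solutions) - a))
print(solutions)
print(residual)
```

Because the interpolating spline depends linearly on the data and reproduces constants, shifting the data by a shifts the spline by a, which is why the residual should be at rounding-error level.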
Non-object-oriented way
Instead of rebuilding the spline after subtracting a from data, one can directly subtract a from spline coefficients. This requires us to drop down to non-object-oriented interpolation methods. Specifically, sproot takes in a tck tuple prepared by splrep, as follows:
from scipy.interpolate import splrep, sproot
tck = splrep(x, y, k=3, s=0)
tck_mod = (tck[0], tck[1] - a, tck[2])
solutions = sproot(tck_mod)
I'm not sure if messing with tck is worth the gain here, as it's possible that the bulk of computation time will be in root-finding anyway. But it's good to have alternatives.
After creating an interpolated function interp_fn, you can find the value of x where interp_fn(x) == a by the roots of the function
interp_fn2 = lambda x: interp_fn(x) - a
There are a number of options for finding the roots in scipy.optimize. For instance, to use Newton's method with the initial value starting at 10:
from scipy import optimize
optimize.newton(interp_fn2, 10)
Actual example
Create an interpolated function and then find the roots where fn(x) == 5
import numpy as np
import matplotlib.pyplot as plt
from scipy import interpolate, optimize
x = np.arange(10)
y = 1 + 6*np.arange(10) - np.arange(10)**2
y2 = 5*np.ones_like(x)
plt.scatter(x,y)
plt.plot(x,y)
plt.plot(x,y2,'k-')
plt.show()
# create the interpolated function, and then the offset
# function used to find the roots
interp_fn = interpolate.interp1d(x, y, 'quadratic')
interp_fn2 = lambda x: interp_fn(x)-5
# to find the roots, we need to supply a starting value
# because there are more than 1 root in our range, we need
# to supply multiple starting values. They should be
# fairly close to the actual root
root1, root2 = optimize.newton(interp_fn2, 1), optimize.newton(interp_fn2, 5)
root1, root2
# returns:
(0.76393202250021064, 5.2360679774997898)
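If good starting values are hard to guess, one alternative (not part of the answer above, just a sketch) is to scan a grid for sign changes and hand each bracketing interval to optimize.brentq. The grid size of 200 and the helper name shifted are arbitrary choices for illustration.

```python
import numpy as np
from scipy import interpolate, optimize

x = np.arange(10)
y = 1 + 6 * x - x**2
interp_fn = interpolate.interp1d(x, y, kind="quadratic")
target = 5

def shifted(v):
    # Zero exactly where the interpolant equals the target value.
    return float(interp_fn(v)) - target

# Sample the shifted function and locate intervals with a sign change.
grid = np.linspace(x[0], x[-1], 200)
vals = interp_fn(grid) - target
idx = np.where(np.sign(vals[:-1]) * np.sign(vals[1:]) < 0)[0]

# brentq needs a bracketing interval; each sign change provides one.
roots = [optimize.brentq(shifted, grid[i], grid[i + 1]) for i in idx]
print(roots)
```

This finds every crossing that the grid resolves, so it also copes with the multivalued case, at the cost of evaluating the interpolant on the whole grid first.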
If your data are monotonic you might also try the following:
inversefunction = interpolate.interp1d(data, xvariable, kind='cubic')
Mentioning another option because I found this page in a google search and the other option works for my simple use case. Hopefully it'll be of use to someone.
If the function you're interpolating is very simple and always has a 1:1 relationship between y and x, then you can simply take your data, swap x and y when you pass it into interp1d, and then call the interpolation function in that direction.
Adapting code from https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.interp1d.html
import numpy as np
import matplotlib.pyplot as plt
from scipy import interpolate
x = np.arange(0, 10)
y = np.exp(-x/3.0)
f = interpolate.interp1d(x, y)
xnew = np.arange(0, 9, 0.1)
ynew = f(xnew)
plt.plot(x, y, 'o', xnew, ynew, '-')
plt.show()
Once x and y have been swapped, you can call the swapped interpolation function (f in the code below) with a value a to get the x value at which it occurs.
f = interpolate.interp1d(y, x)
xnew = np.arange(np.exp(-9/3), np.exp(0), 0.01)
ynew = f(xnew)
plt.plot(y, x, 'o', xnew, ynew, '-')
plt.title("Inverted")
plt.show()
Of course, if the function ever has multiple x values for a given y value (like sine or a parabola) then this will not work because it will no longer be a 1:1 function from x to y, and the above answers are necessary. This is just a simplification in a limited use case.

FFT normalization with numpy

I just started working with the numpy package, beginning with the simple task of computing the FFT of an input signal. Here's the code:
import numpy as np
import matplotlib.pyplot as plt
#Some constants
L = 128
p = 2
X = 20
x = np.arange(-X/2,X/2,X/L)
fft_x = np.linspace(0,128,128, True)
fwhl = 1
fwhl_y = (2/fwhl) \
*(np.log([2])/np.pi)**0.5*np.e**(-(4*np.log([2]) \
*x**2)/fwhl**2)
fft_fwhl = np.fft.fft(fwhl_y, norm='ortho')
ampl_fft_fwhl = np.abs(fft_fwhl)
plt.bar(fft_x, ampl_fft_fwhl, width=.7, color='b')
plt.show()
Since I work with an exponential function with some constant divided by pi before it, I expect to get the exponential function in Fourier space, where the constant part of the FFT is always equal to 1 (zero frequency).
But the value of that component I get using numpy is larger (about 1.13). Here I have an amplitude spectrum normalized by 1/(number_of_counts)**0.5 (that's what I read in the numpy documentation). I can't understand what's wrong. Can anybody help me?
Thanks!
[EDITED] It seems like the problem is solved, all you need to get the same result of Fourier integral and of FFT is to multiply FFT by the step (in my case it's X/L). And as for normalization as option of numpy.fft.fft(..., norm='ortho'), it's used only to save the scale of the transform, otherwise you'll need to divide the result of the inverse FFT by the number of samples. Thanks everyone for their help!
I've finally solved my problem. All you need to do to connect the FFT with the Fourier integral is to multiply the result of the transform by the step (X/L in my case, i.e. FFT*X/L); this works in general. My case is a bit more complex because I have an extra rule for the function being transformed: the area under the curve must equal 1, since it is a model of the δ function. Because the step is fixed, I have to satisfy the condition step*sum(fwhl_y)=1, that is, X/L=1/sum(fwhl_y). So to get the correct result I have to do the following:
Calculate the FFT: fft_fwhl = np.fft.fft(fwhl_y)
Get rid of the phase component, which arises because fwhl_y is defined symmetrically on the interval [-T/2,T/2], where T is the period, while np.fft.fft assumes the function is defined on [0,T]. To get only the amplitude spectrum (that's what I need) I simply use np.abs(FFT).
To get the values I expect, multiply the result of the previous step by X/L, that is, np.abs(FFT)*X/L.
I have an extra condition on the area under the curve, X/L*sum(fwhl_y)=1, so I finally arrive at np.abs(FFT)*X/L = np.abs(FFT)/sum(fwhl_y).
Hope it'll help anyone at least.
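The scaling rule described above can be checked with a short sketch that reuses the constants from the question. The names step and spectrum are illustrative; the only claim being tested is that the zero-frequency amplitude of the step-scaled FFT approximates the area under the curve, which is 1 for this unit-area Gaussian.

```python
import numpy as np

L = 128
X = 20
step = X / L
x = np.arange(-X / 2, X / 2, step)

# Unit-area Gaussian with full width at half maximum fwhl.
fwhl = 1
fwhl_y = (2 / fwhl) * np.sqrt(np.log(2) / np.pi) \
    * np.exp(-4 * np.log(2) * x**2 / fwhl**2)

# Unnormalized FFT times the sample spacing approximates the Fourier integral.
spectrum = np.abs(np.fft.fft(fwhl_y)) * step

# Equivalent form when step * sum(fwhl_y) == 1.
spectrum_alt = np.abs(np.fft.fft(fwhl_y)) / np.sum(fwhl_y)

print(spectrum[0], spectrum_alt[0])
```

The two forms agree here only because the sampled curve integrates to 1; for a general signal, multiplying by the step is the one that matches the Fourier integral.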
Here's a possible solution to your problem:
import numpy as np
import matplotlib.pyplot as plt
from scipy import fft
from numpy import log, pi, e
# Signal setup
Fs = 150
Ts = 1.0 / Fs
t = np.arange(0, 1, Ts)
ff = 50
fwhl = 1
y = (2 / fwhl) * (log([2]) / pi)**0.5 * e**(-(4 * log([2]) * t**2) / fwhl**2)
# Plot original signal
plt.subplot(2, 1, 1)
plt.plot(t, y, 'k-')
plt.xlabel('time')
plt.ylabel('amplitude')
# Normalized FFT
plt.subplot(2, 1, 2)
n = len(y)
k = np.arange(n)
T = n / Fs
frq = k / T
freq = frq[:n // 2]  # keep only the first half (non-negative frequencies)
Y = np.fft.fft(y) / n
Y = Y[:n // 2]
plt.plot(freq, abs(Y), 'r-')
plt.xlabel('freq (Hz)')
plt.ylabel('|Y(freq)|')
plt.show()
With fwhl=1:
With fwhl=0.1:
You can see in the above graphs how the exponential and FFT plots vary as fwhl gets close to 0.

Scipy: efficiently generate a series of integration (integral function)

I have a function, and I want to get its integral function Phi(x), the integral of f from 0 up to x.
That is, instead of getting a single integration value at point x, I need to get values at multiple points.
For example:
Let's say I want the range at (-20,20)
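import numpy as np
import matplotlib.pyplot as plt
from scipy import integrate
def f(x):
    return x**2
x_vals = np.arange(-20, 21, 1)
# nquad returns (value, error estimate); keep only the value
y_vals = [integrate.nquad(f, [[0, x_val]])[0] for x_val in x_vals]
plt.plot(x_vals, y_vals, '-', color='r')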
The problem
In the example code above, the integration is done from scratch for each point. In my real code, f(x) is pretty complex and it's a multiple integration, so the running time is simply too slow (see "Scipy: speed up integration when doing it for the whole surface?").
I'm wondering if there is an efficient way of generating Phi(x) over a given range.
My thoughts:
The integration value Phi(20) is calculated from Phi(19), Phi(19) from Phi(18), and so on. So when we get Phi(20), in reality we also get the whole series at (-20,-19,-18,-17 ... 18,19,20), except that we didn't save the values.
So I'm thinking: is it possible to create save points for an integration, so that when it passes a save point the value gets saved and the integration continues to the next point? That way, in a single pass toward 20, we would also get the values at (-20,-19,-18,-17 ... 18,19,20).
One could implement the strategy you outlined by integrating only over the short intervals (between consecutive x-values) and then taking the cumulative sum of the results. Like this:
import numpy as np
import scipy.integrate as si
def f(x):
    return x**2
x_vals = np.arange(-20, 21, 1)
pieces = [si.quad(f, x_vals[i], x_vals[i+1])[0] for i in range(len(x_vals)-1)]
y_vals = np.cumsum([0] + pieces)
Here pieces are the integrals over short intervals, which get summed to produce y-values. As written, this code outputs a function that is 0 at the beginning of the range of integration which is -20. One can, of course, subtract the y-value that corresponds to x=0 in order to have the same normalization as on your plot.
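A small sketch of the normalization step just mentioned: it repeats the split-and-sum computation and then subtracts the value at x=0 so that the result can be compared with x**3/3. The names zero_index and phi are illustrative.

```python
import numpy as np
import scipy.integrate as si

def f(x):
    return x**2

x_vals = np.arange(-20, 21, 1)

# Integrate over each short interval, then accumulate.
pieces = [si.quad(f, x_vals[i], x_vals[i + 1])[0]
          for i in range(len(x_vals) - 1)]
y_vals = np.cumsum([0] + pieces)

# Shift so that the integral function is 0 at x = 0.
zero_index = np.flatnonzero(x_vals == 0)[0]
phi = y_vals - y_vals[zero_index]

print(np.max(np.abs(phi - x_vals**3 / 3)))
```

Each interval is integrated only once, so the total cost is roughly that of a single integration over the full range.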
That said, the split-and-sum process is unnecessary. When you find an indefinite integral of f, you are really solving the differential equation F' = f. And SciPy has a built-in method for that, odeint. Just use it:
import numpy as np
import scipy.integrate as si
def f(x):
    return x**2
x_vals = np.arange(-20, 21, 1)
y_vals = si.odeint(lambda y,x: f(x), 0, x_vals)
The output is essentially identical to the first version (within tiny computational errors), with less code. The reason for using lambda y,x: f(x) is that the first argument of odeint must be a function taking two arguments, the right-hand side of the equation y' = f(y, x).
For the equivalent version of user3717023's answer using scipy's solve_ivp you need to keep in mind the different ordering of x and y in the function f (different from the odeint version).
Further, keep in mind that you can only compute the solution up to a constant. So you might want to shift the result according to some given condition. In the example here (with the function f(x)=x^2 as given by the OP), I shifted the numeric solution such that it goes through the origin, matching the simplest analytic solution F(x)=x^3/3.
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import solve_ivp
def f(x):
    return x**2
xs = np.linspace(-20, 20, 1001)
# This is the integration step:
sol = solve_ivp(lambda x, y: f(x), t_span=(xs[0], xs[-1]), y0=[0], t_eval=xs)
plt.plot(sol.t, sol.t**3/3, ls='-', c='C0', label="analytic: $F(x)=x^3/3$")
plt.plot(sol.t, sol.y[0], ls='--', c='C1', label="numeric solution")
plt.plot(sol.t, sol.y[0] - sol.y[0][sol.t.size//2], ls='-.', c='C3', label="shifted solution going through origin")
plt.legend()
In case you don't have an analytical version of the function f, but only xs and ys as data points, then you can use scipy's interp1d function to interpolate between the data points and pass on that interpolating function the same way as before:
from scipy.interpolate import interp1d
f = interp1d(xs, ys)
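A possible end-to-end sketch of the data-points case, with made-up samples of x**2 standing in for measured data. Passing fill_value="extrapolate" is an assumption added here as a precaution, in case the solver evaluates the integrand a rounding error outside the sampled range; the wrapper returning a one-element list is likewise just a defensive choice.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.interpolate import interp1d

# Hypothetical sampled data (here taken from x**2 so the result can be checked).
xs = np.linspace(-20, 20, 1001)
ys = xs**2

# Linear interpolant through the samples.
f = interp1d(xs, ys, fill_value="extrapolate")

# Integrate F' = f(x) from the left end of the data, starting at F = 0.
sol = solve_ivp(lambda x, y: [float(f(x))],
                t_span=(xs[0], xs[-1]), y0=[0], t_eval=xs)

# Shift so that the curve passes through the origin, as in the plot above.
shifted = sol.y[0] - sol.y[0][sol.t.size // 2]
print(sol.y[0][-1], 2 * 20**3 / 3)
```

For data that is already sampled on a grid, a cumulative trapezoid rule applied directly to the samples would be a simpler alternative to interpolating and then solving an ODE.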

How to force polyfit with second degree to a y-intercept of 0

I've been using the numpy.polyfit function to do some forecasting. If I put in a degree of 1, it works, but I need to do a second degree polynomial fit. In some cases it works, in other cases the plot of the prediction goes down and then goes up forever. For example:
import matplotlib.pyplot as plt
from numpy import *
x=[1,2,3,4,5,6,7,8,9,10]
y=[100,85,72,66,52,48,39,33,29,32]
degree = 2
fit = polyfit(x, y, degree)
fitfunction = poly1d(fit)
to_predict=[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]
plt.plot(to_predict,fitfunction(to_predict))
plt.show()
After I run that, this shows up (I tried putting a picture up but stackoverflow won't let me).
I want to force it to go through zero.
How would I do that?
If you don't need the fit's error to be computed using the original least-squares formula (i.e. minimizing $\sum_i |y_i - (a x_i^2 + b x_i)|^2$), you could try to perform a linear fit of y/x instead, because $(a x^2 + b x)/x = a x + b$.
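A minimal sketch of the y/x idea with the data from the question. It assumes no x value is zero, since the data are divided by x; the variable names are illustrative.

```python
import numpy as np

x = np.arange(1, 11, dtype=float)
y = np.array([100, 85, 72, 66, 52, 48, 39, 33, 29, 32], dtype=float)

# Fit y/x = a*x + b with an ordinary straight-line fit (x must be nonzero).
a, b = np.polyfit(x, y / x, 1)

# Multiply back by x: y = a*x**2 + b*x, which has no constant term.
polynomial = np.poly1d([a, b, 0])
print(polynomial(0.0))
print(polynomial(x))
```

Dividing by x changes the weighting of the residuals (points with small x count more), so the coefficients will generally differ from those of a direct least-squares fit.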
If you must use the same error metric, construct the coefficient matrices directly and use numpy.linalg.lstsq:
import numpy
x = numpy.asarray(x, dtype=float)  # x*x needs an array, not a list
coeff = numpy.transpose([x*x, x])
((a, b), _, _, _) = numpy.linalg.lstsq(coeff, y, rcond=None)
polynomial = numpy.poly1d([a, b, 0])
(Note that your provided data sequence does not look like a parabola having a y-intercept of 0.)
If anyone has to do this under a deadline, a quick solution is to add a bunch of extra points at 0 to skew the weighting toward the origin. This only pulls the fit toward zero; it does not force it exactly. I did this:
for i in range(0, 100):
    x_vent.insert(i, 0)
    y_vent.insert(i, 0)
slope_vent, intercept_vent = np.polyfit(x_vent, y_vent, 1)

Fourier transform of a Gaussian is not a Gaussian, but thats wrong! - Python

I am trying to use NumPy's fft function. However, when I give it a simple Gaussian function, the FFT of that Gaussian is not a Gaussian. It's close, but it's split in half so that each half sits at either end of the x axis.
The Gaussian function I'm calculating is
y = exp(-x^2)
Here is my code:
from cmath import *
from numpy import multiply
from numpy.fft import fft
from pylab import plot, show
""" Basically the standard range() function but with float support """
def frange(min_value, max_value, step):
    value = float(min_value)
    array = []
    while value < float(max_value):
        array.append(value)
        value += float(step)
    return array
N = 256.0 # number of steps
y = []
x = frange(-5, 5, 10/N)
# fill array y with values of the Gaussian function
cache = -multiply(x, x)
for i in cache: y.append(exp(i))
Y = fft(y)
# plot the fft of the Gaussian function
plot(x, abs(Y))
show()
The result is not quite right, cause the FFT of a Gaussian function should be a Gaussian function itself...
np.fft.fft returns a result in so-called "standard order": (from the docs)
If A = fft(a, n), then A[0] contains the zero-frequency term (the mean of the signal), which is always purely real for real inputs. Then A[1:n/2] contains the positive-frequency terms, and A[n/2+1:] contains the negative-frequency terms, in order of decreasingly negative frequency.
The function np.fft.fftshift rearranges the result into the order most humans expect (and which is good for plotting):
The routine np.fft.fftshift(A) shifts transforms and their frequencies to put the zero-frequency components in the middle...
So using np.fft.fftshift:
import matplotlib.pyplot as plt
import numpy as np
N = 128
x = np.arange(-5, 5, 10./(2 * N))
y = np.exp(-x * x)
y_fft = np.fft.fftshift(np.abs(np.fft.fft(y))) / np.sqrt(len(y))
plt.plot(x,y)
plt.plot(x,y_fft)
plt.show()
Your result is not even close to a Gaussian, not even one split into two halves.
To get the result you expect, you will have to position your own Gaussian with the center at index 0, and the result will also be positioned that way. Try the following code:
from pylab import *
N = 128
x = r_[arange(0, 5, 5./N), arange(-5, 0, 5./N)]
y = exp(-x*x)
y_fft = fft(y) / sqrt(2 * N)
plot(r_[y[N:], y[:N]])
plot(r_[y_fft[N:], y_fft[:N]])
show()
The plot commands split the arrays into two halves and swap them to get a nicer picture.
It is being displayed with the center (i.e. mean) at coefficient index zero. That is why it appears that the right half is on the left, and vice versa.
EDIT: Explore the following code:
import numpy as np
import pylab
from scipy.signal import windows
x = windows.gaussian(2048, 10)
X = np.abs(np.fft.fft(x))
pylab.plot(x)
pylab.plot(X)
pylab.plot(np.r_[X[1024:], X[:1024]])
The last line will plot X starting from the center of the vector, then wrap around to the beginning.
A Fourier transform implicitly repeats indefinitely, as it is a transform of a signal that implicitly repeats indefinitely. Note that when you pass y to be transformed, the x values are not supplied, so in fact the Gaussian that is transformed is one centred on the median value between 0 and 256, so 128.
Remember also that translation of f(x) is phase change of F(x).
Following on from Sven Marnach's answer, a simpler version would be this:
from pylab import *
N = 128
x = ifftshift(arange(-5,5,5./N))
y = exp(-x*x)
y_fft = fft(y) / sqrt(2 * N)
plot(fftshift(y))
plot(fftshift(y_fft))
show()
This yields a plot identical to the above one.
The key (and this seems strange to me) is that NumPy's assumed data ordering --- in both frequency and time domains --- is to have the "zero" value first. This is not what I'd expect from other implementations of FFT, such as the FFTW3 libraries in C.
This was slightly fudged in the answers from unutbu and Steve Tjoa above, because they're taking the absolute value of the FFT before plotting it, thus wiping away the phase issues resulting from not using the "standard order" in time.
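To see the ordering and phase points from the answers above in one place, here is a sketch that keeps the complex values instead of taking the absolute value. It assumes the ordinary-frequency convention, under which the continuous transform of exp(-x^2) is sqrt(pi)*exp(-(pi*f)^2); the names dx, f and analytic are illustrative.

```python
import numpy as np

N = 256
x = np.linspace(-5, 5, N, endpoint=False)  # x[N // 2] is exactly 0
dx = x[1] - x[0]
y = np.exp(-x**2)

# ifftshift moves the x = 0 sample to index 0, the ordering fft expects;
# fftshift then puts the zero-frequency term back in the middle for plotting.
Y = np.fft.fftshift(np.fft.fft(np.fft.ifftshift(y))) * dx
f = np.fft.fftshift(np.fft.fftfreq(N, d=dx))

# Continuous Fourier transform of exp(-x**2) in ordinary frequency.
analytic = np.sqrt(np.pi) * np.exp(-(np.pi * f) ** 2)

print(np.max(np.abs(Y.real - analytic)))
print(np.max(np.abs(Y.imag)))
```

Without the ifftshift step the magnitudes are unchanged, but the values pick up an alternating phase factor, which is the effect that taking the absolute value hides.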
