Integral of data with time (Python) - python

I have a time series x(t) that is a NumPy array. My assignment tells me that I need to find the integral of this data with time.
How am I supposed to do this? It's not a function that I need to integrate, it's a list of data.

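It depends on the statement of the problem. A crude approach (a rectangle-rule sum over equally spaced samples) would be something like this:
import numpy as np
t = np.linspace(-1, 1, 100)
x = t*t
delta = t[1] - t[0]  # constant spacing between samples
I = np.sum(delta*x)  # rectangle-rule estimate of the integral of x(t)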

You can use Simpson's rule. A routine that does that for you is simps in scipy.integrate (named simpson in recent SciPy releases, which have dropped the simps alias).
>>> help(scipy.integrate.simps)
Help on function simps in module scipy.integrate.quadrature:
simps(y, x=None, dx=1, axis=-1, even='avg')
Integrate y(x) using samples along the given axis and the composite
Simpson's rule. If x is None, spacing of dx is assumed.
If there are an even number of samples, N, then there are an odd
number of intervals (N-1), but Simpson's rule requires an even number
of intervals. The parameter 'even' controls how this is handled.
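To make the difference between the two rules concrete, here is a minimal NumPy-only sketch of the composite trapezoidal and Simpson rules applied to sampled data. The helper names trapezoid_integral and simpson_integral are made up for this illustration and are not SciPy functions; the Simpson helper assumes equally spaced samples and an odd sample count. In practice the library routine is the better choice.

```python
import numpy as np

def trapezoid_integral(x, t):
    """Composite trapezoidal rule for samples x taken at times t."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(t, dtype=float)
    return float(np.sum(0.5 * (x[1:] + x[:-1]) * np.diff(t)))

def simpson_integral(x, dt):
    """Composite Simpson's rule for equally spaced samples (odd sample count)."""
    x = np.asarray(x, dtype=float)
    if x.size < 3 or x.size % 2 == 0:
        raise ValueError("this sketch needs an odd number of samples (>= 3)")
    # Weights follow the pattern 1, 4, 2, 4, ..., 2, 4, 1.
    odd_sum = x[1:-1:2].sum()
    even_sum = x[2:-1:2].sum()
    return float(dt / 3.0 * (x[0] + 4.0 * odd_sum + 2.0 * even_sum + x[-1]))

# Sampled "time series": x(t) = t^2 on [-1, 1], whose exact integral is 2/3.
t = np.linspace(-1.0, 1.0, 101)
x = t * t
trap = trapezoid_integral(x, t)
simp = simpson_integral(x, t[1] - t[0])
```

For this quadratic test signal Simpson's rule is exact up to rounding, while the trapezoidal estimate is slightly too large.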

Related

expand 1 dim vector by using taylor series of log(1+e^x) in python

I need to expand each pixel value of a 1-D pixel vector non-linearly, using the Taylor series expansion of a specific non-linear function (e^x, log(x), or log(1+e^x)), but my current implementation does not look right to me, at least based on how Taylor series work. The basic idea is to take a pixel array as the input neurons of a CNN model, where each pixel is expanded non-linearly with the Taylor series of the non-linear function.
new update 1:
As I understand it, a Taylor series expresses a function F of a variable x in terms of the value of F and its derivatives at another point x0. In my problem, F is the non-linear transformation of the features (i.e. pixels), x is each pixel value, and x0 = 0, which gives the Maclaurin series.
new update 2
If we use the Taylor series of log(1+e^x) with an approximation order of 2, each pixel value yields two new pixels, obtained from the first and second expansion terms of the Taylor series.
graphic illustration
Here is the graphical illustration of the above formulation:
Where X is pixel array, p is approximation order of taylor series, and α is the taylor expansion coefficient.
I want to expand pixel vectors non-linearly with the Taylor series expansion of a non-linear function, as the illustration above shows.
My current attempt
This is my current attempt which is not working correctly for pixel arrays. I was thinking about how to make the same idea applicable to pixel arrays.
def taylor_func(x, approx_order=2):
    x_ = x[..., None]
    x_ = tf.tile(x_, multiples=[1, 1, approx_order+ 1])
    pows = tf.range(0, approx_order + 1, dtype=tf.float32)
    x_p = tf.pow(x_, pows)
    x_p_ = x_p[..., None]
    return x_p_

x = Input(shape=(4,4,3))
x_new = Lambda(lambda x: taylor_func(x, max_pow))(x)
my new updated attempt:
x_input= Input(shape=(32, 32,3))

def maclurin_exp(x, powers=2):
    out= 0
    for k in range(powers):
        out+= ((-1)**k) * (x ** (2*k)) / (math.factorial(2 * k))
    return out

x_input_new = Lambda(lambda x: maclurin_exp(x, max_pow))(x_input)
This attempt doesn't yield what the above mathematical formulation describes. I bet I missed something while doing the expansion. Can anyone point me on how to make this correct? Any better idea?
goal
I want to take a pixel vector and expand it non-linearly using the Taylor series expansion of a certain non-linear function. Is there any way to do this? Any thoughts? Thanks.
This is a really interesting question but I can't say that I'm clear on it as of yet. So, while I have some thoughts, I might be missing the thrust of what you're looking to do.
It seems like you want to develop your own activation function instead of using something like ReLU or softmax. Certainly no harm there. And you gave three candidates: e^x, log(x), and log(1+e^x).
Notice that log(x) approaches negative infinity as x --> 0. So, log(x) is right out. If that was intended as a check on the answers you get or was something jotted down as you were falling asleep, no worries. But if it wasn't, you should spend some time and make sure you understand the underpinnings of what you're doing, because the consequences can be quite high.
You indicated you were looking for a canonical answer and you get a two for one here. You get both a canonical answer and highly performant code.
You're not likely to be able to write faster, more streamlined code than the folks behind SciPy, NumPy, or Pandas. Or PyPy. Or Cython, for that matter. Their stuff is the standard. So don't try to compete against them by writing your own, less performant (and possibly bugged) version which you will then have to maintain as time passes. Instead, maximize your development and run times by using them.
Let's take a look at the implementation of e^x in SciPy and give you some code to work with. I know you don't need a graph at this stage, but they're pretty and can help you understand how the Taylor series (or Maclaurin series, i.e. a Taylor series expanded about 0) behaves as the order of the approximation changes. It just so happens that SciPy has Taylor approximation built in.
import scipy
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import approximate_taylor_polynomial

x = np.linspace(-10.0, 10.0, num=100)
plt.plot(x, np.exp(x), label="e^x", color = 'black')
for degree in np.arange(1, 4, step=1):
    e_to_the_x_taylor = approximate_taylor_polynomial(np.exp, 0, degree, 1, order=degree + 2)
    plt.plot(x, e_to_the_x_taylor(x), label=f"degree={degree}")
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', borderaxespad=0.0, shadow=True)
plt.tight_layout()
plt.axis([-10, 10, -10, 10])
plt.show()
That produces this:
But let's say you're good with 'the maths', so to speak, and are willing to go with something slightly slower if it's more 'mathy', as in it handles symbolic notation well. For that, let me suggest SymPy.
With that in mind, here is a bit of SymPy code with a graph because, well, it looks good AND because we need to go back and hit another point again.
from sympy import series, Symbol, log, E
from sympy.functions import exp
from sympy.plotting import plot
import matplotlib.pyplot as plt
# %matplotlib inline  (Jupyter magic; uncomment only inside a notebook)
plt.rcParams['figure.figsize'] = 13,10
plt.rcParams['lines.linewidth'] = 2
x = Symbol('x')

def taylor(function, x0, n):
    """ Defines Taylor approximation of a given function
    function -- is our function which we want to approximate
    x0 -- point where to approximate
    n -- order of approximation
    """
    return function.series(x,x0,n).removeO()

# I get eyestrain; feel free to get rid of this
plt.rcParams['figure.figsize'] = 10, 8
plt.rcParams['lines.linewidth'] = 1
c = log(1 + pow(E, x))
# Named p rather than plt so that the matplotlib.pyplot import is not shadowed
p = plot(c, taylor(c,0,1), taylor(c,0,2), taylor(c,0,3), taylor(c,0,4), (x,-5,5),legend=True, show=False)
p[0].line_color = 'black'
p[1].line_color = 'red'
p[2].line_color = 'orange'
p[3].line_color = 'green'
p[4].line_color = 'blue'
p.title = 'Taylor Series Expansion for log(1 +e^x)'
p.show()
I think either option will get you where you need to go.
Ok, now for the other point. You clearly stated after a bit of revision that log(1 +e^x) was your first choice. But the others don't pass the sniff test. e^x vacillates wildly as the degree of the polynomial changes. Because of the opaqueness of algorithms and how few people can conceptually understand this stuff, Data Scientists can screw things up to a degree people can't even imagine. So make sure you're very solid on the theory for this.
One last thing: consider looking at the CDF of the Erlang distribution as an activation function (assuming I'm right and you're looking to roll your own activation function as an area of research). I don't think anyone has looked at that, but it strikes me as promising. I think you could break out each channel of the RGB as one of the two parameters, with the other being the physical coordinate.
You can use tf.tile and tf.math.pow to generate the elements of the series expansion. Then you can use tf.math.cumsum to compute the partial sums s_i. Eventually you can multiply with the weights w_i and compute the final sum.
Here is a code sample:
import math
import tensorflow as tf
x = tf.keras.Input(shape=(32, 32, 3)) # 3-channel RGB.
# The following is determined by your series expansion and its order.
# For example: log(1 + exp(x)) to 3rd order.
# https://www.wolframalpha.com/input/?i=taylor+series+log%281+%2B+e%5Ex%29
order = 3
alpha = tf.constant([1/2, 1/8, -1/192]) # Series coefficients.
power = tf.constant([1.0, 2.0, 4.0])
offset = math.log(2)
# These are the weights of the network; using a constant for simplicity here.
# The shape must coincide with the above order of series expansion.
w_i = tf.constant([1.0, 1.0, 1.0])
elements = offset + alpha * tf.math.pow(
    tf.tile(x[..., None], [1, 1, 1, 1, order]),
    power
)
s_i = tf.math.cumsum(elements, axis=-1)
y = tf.math.reduce_sum(w_i * s_i, axis=-1)
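The per-pixel expansion itself can be checked outside of TensorFlow. The following is a minimal NumPy sketch, not the layer above: it reuses the same series coefficients for log(1 + e^x) (1/2, 1/8, -1/192 for the powers 1, 2, 4, with constant term log 2), and the function name taylor_expand and the test image are assumptions made for this illustration. Each pixel gains a trailing axis holding one value per expansion term.

```python
import numpy as np

# Maclaurin terms of log(1 + e^x): log(2) + x/2 + x^2/8 - x^4/192 + ...
ALPHA = np.array([1 / 2, 1 / 8, -1 / 192])   # series coefficients
POWERS = np.array([1.0, 2.0, 4.0])           # matching powers of x
OFFSET = np.log(2.0)                         # constant (zeroth-order) term

def taylor_expand(pixels, alpha=ALPHA, powers=POWERS):
    """Return alpha_k * x**p_k for every pixel, stacked along a new last axis."""
    pixels = np.asarray(pixels, dtype=float)
    return alpha * pixels[..., None] ** powers

# Hypothetical 4x4 RGB "image" with small values, where the series converges quickly.
pixels = np.linspace(-0.5, 0.5, 4 * 4 * 3).reshape(4, 4, 3)
terms = taylor_expand(pixels)                # shape (4, 4, 3, 3)

# Summing the terms (plus the constant) should approximate the function itself.
approx = OFFSET + terms.sum(axis=-1)
exact = np.log1p(np.exp(pixels))
```

Comparing approx with exact is a quick sanity check on the coefficients before moving the same arithmetic into a Lambda layer. The approximation is only good near x = 0, so pixel values would need to be scaled to a small range first.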

Scipy: efficiently generate a series of integration (integral function)

I have a function, I want to get its integral function, something like this:
That is, instead of getting a single integration value at point x, I need to get values at multiple points.
For example:
Let's say I want the range at (-20,20)
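import numpy as np
import matplotlib.pyplot as plt
from scipy import integrate

def f(x):
    return x**2

x_vals = np.arange(-20, 21, 1)
# nquad returns (value, error estimate); keep only the value
y_vals = [integrate.nquad(f, [[0, x_val]])[0] for x_val in x_vals]
plt.plot(x_vals, y_vals,'-', color = 'r')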
The problem
In the example code above, the integration is done from scratch for each point. In my real code, f(x) is pretty complex and it's a multiple integral, so the running time is simply too slow (see "Scipy: speed up integration when doing it for the whole surface?").
I'm wondering if there is an efficient way of generating Phi(x) over a given range.
My thoughts:
The value Phi(20) is calculated from Phi(19), and Phi(19) from Phi(18), and so on. So when we get Phi(20), we have in reality also computed the whole series (-20,-19,-18,-17 ... 18,19,20), except that we didn't save the values.
So I'm thinking: is it possible to create save points for an integration routine, so that when it passes a save point, the value gets saved and the routine continues to the next point? Then, with a single pass toward 20, we would also get the values at (-20,-19,-18,-17 ... 18,19,20).
One could implement the strategy you outlined by integrating only over the short intervals (between consecutive x-values) and then taking the cumulative sum of the results. Like this:
import numpy as np
import scipy.integrate as si
def f(x):
    return x**2
x_vals = np.arange(-20, 21, 1)
pieces = [si.quad(f, x_vals[i], x_vals[i+1])[0] for i in range(len(x_vals)-1)]
y_vals = np.cumsum([0] + pieces)
Here pieces are the integrals over short intervals, which get summed to produce y-values. As written, this code outputs a function that is 0 at the beginning of the range of integration which is -20. One can, of course, subtract the y-value that corresponds to x=0 in order to have the same normalization as on your plot.
That said, the split-and-sum process is unnecessary. When you find an indefinite integral of f, you are really solving the differential equation F' = f. And SciPy has a built-in method for that, odeint. Just use it:
import numpy as np
import scipy.integrate as si
def f(x):
    return x**2
x_vals = np.arange(-20, 21, 1)
y_vals = si.odeint(lambda y,x: f(x), 0, x_vals)
The output is essentially identical to the first version (within tiny computational errors), with less code. The reason for using lambda y,x: f(x) is that the first argument of odeint must be a function taking two arguments, the right-hand side of the equation y' = f(y, x).
For the equivalent version of user3717023's answer using scipy's solve_ivp you need to keep in mind the different ordering of x and y in the function f (different from the odeint version).
Further, keep in mind that you can only compute the solution up to a constant. So you might want to shift the result according to some given condition. In the example here (with the function f(x)=x^2 as given by the OP), I shifted the numeric solution such that it goes through the origin, matching the simplest analytic solution F(x)=x^3/3.
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import solve_ivp
def f(x):
    return x**2
xs = np.linspace(-20, 20, 1001)
# This is the integration step:
sol = solve_ivp(lambda x, y: f(x), t_span=(xs[0], xs[-1]), y0=[0], t_eval=xs)
plt.plot(sol.t, sol.t**3/3, ls='-', c='C0', label="analytic: $F(x)=x^3/3$")
plt.plot(sol.t, sol.y[0], ls='--', c='C1', label="numeric solution")
plt.plot(sol.t, sol.y[0] - sol.y[0][sol.t.size//2], ls='-.', c='C3', label="shifted solution going through origin")
plt.legend()
In case you don't have an analytical version of the function f, but only xs and ys as data points, then you can use scipy's interp1d function to interpolate between the data points and pass on that interpolating function the same way as before:
from scipy.interpolate import interp1d
f = interp1d(xs, ys)
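When only samples are available, the running integral can also be built directly from the data, without an interpolant or an ODE solver. The sketch below is a hand-written cumulative trapezoidal rule in NumPy; the helper name cumulative_trapezoid is chosen for this illustration (SciPy ships a routine of the same name in scipy.integrate, which would normally be preferred), and the data are the same f(x) = x^2 samples used above.

```python
import numpy as np

def cumulative_trapezoid(ys, xs):
    """Running trapezoidal integral of samples ys over xs, starting at 0."""
    ys = np.asarray(ys, dtype=float)
    xs = np.asarray(xs, dtype=float)
    steps = 0.5 * (ys[1:] + ys[:-1]) * np.diff(xs)   # area of each short interval
    return np.concatenate(([0.0], np.cumsum(steps)))

xs = np.linspace(-20.0, 20.0, 1001)
ys = xs ** 2                      # stand-in for measured data

F = cumulative_trapezoid(ys, xs)  # equals 0 at xs[0] = -20
F_shifted = F - F[xs.size // 2]   # shift so the curve passes through the origin

# Largest deviation from the analytic antiderivative x^3 / 3.
max_error = np.max(np.abs(F_shifted - xs ** 3 / 3))
```

This is the split-and-sum idea with each short integral replaced by a single trapezoid, so every value is computed once and reused for all later points.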

How to plot grad(f(x,y))?

I want to calculate and plot the gradient of any scalar function of two variables. If you really want a concrete example, let's say f=x^2+y^2 where x goes from -10 to 10 and the same for y. How do I calculate and plot grad(f)? The solution should be a vector field, and I should see the vector lines. I am new to Python, so please use simple words.
EDIT:
@Andras Deak: thank you for your post. I tried what you suggested, and instead of your test function (fun=3*x^2-5*y^2) I used a function that I defined as V(x,y). This is how the code looks, but it reports an error:
import numpy as np
import math
import sympy
import matplotlib.pyplot as plt
def V(x,y):
    t=[]
    for k in range (1,3):
        for l in range (1,3):
            t.append(0.000001*np.sin(2*math.pi*k*0.5)/((4*(math.pi)**2)* (k**2+l**2)))
            term = t* np.sin(2 * math.pi * k * x/0.004) * np.cos(2 * math.pi * l * y/0.004)
            return term
    return term.sum()
x,y=sympy.symbols('x y')
fun=V(x,y)
gradfun=[sympy.diff(fun,var) for var in (x,y)]
numgradfun=sympy.lambdify([x,y],gradfun)
X,Y=np.meshgrid(np.arange(-10,11),np.arange(-10,11))
graddat=numgradfun(X,Y)
plt.figure()
plt.quiver(X,Y,graddat[0],graddat[1])
plt.show()
AttributeError: 'Mul' object has no attribute 'sin'
And let's say I remove sin; then I get another error:
TypeError: can't multiply sequence by non-int of type 'Mul'
I read the tutorial for sympy and it says "The real power of a symbolic computation system such as SymPy is the ability to do all sorts of computations symbolically". I get this, I just don't get why I cannot multiply the x and y symbols by float numbers.
What is the way around this? :( Help please!
UPDATE
@Andras Deak: I wanted to make things shorter, so I removed many constants from the original formulas for V(x,y) and Cn*Dm. As you pointed out, that caused the sin function to always return 0 (I just noticed). Apologies for that. I will update the post later today when I have read your comment in detail. Big thanks!
UPDATE 2
I changed coefficients in my expression for voltage and this is the result:
It looks good, except that the arrows point in the opposite direction (they are supposed to go out of the reddish dot and into the blue one). Do you know how I could change that? And, if possible, could you please tell me how to increase the size of the arrows? I tried what was suggested in another topic (Computing and drawing vector fields):
skip = (slice(None, None, 3), slice(None, None, 3))
This plots only every third arrow, and matplotlib does the autoscaling, but it doesn't work for me (nothing happens when I add this, for any number that I enter).
You have already been a huge help, I cannot thank you enough!
Here's a solution using sympy and numpy. This is the first time I use sympy, so others will/could probably come up with much better and more elegant solutions.
import sympy
#define symbolic vars, function
x,y=sympy.symbols('x y')
fun=3*x**2-5*y**2
#take the gradient symbolically
gradfun=[sympy.diff(fun,var) for var in (x,y)]
#turn into a bivariate lambda for numpy
numgradfun=sympy.lambdify([x,y],gradfun)
Now you can use numgradfun(1,3) to compute the gradient at (x,y)==(1,3). This function can then be used for plotting, which you said you can do.
For plotting, you can use, for instance, matplotlib's quiver, like so:
import numpy as np
import matplotlib.pyplot as plt
X,Y=np.meshgrid(np.arange(-10,11),np.arange(-10,11))
graddat=numgradfun(X,Y)
plt.figure()
plt.quiver(X,Y,graddat[0],graddat[1])
plt.show()
UPDATE
You added a specification for your function to be computed. It contains the product of terms depending on x and y, which seems to break my above solution. I managed to come up with a new one to suit your needs. However, your function seems to make little sense. From your edited question:
t.append(0.000001*np.sin(2*math.pi*k*0.5)/((4*(math.pi)**2)* (k**2+l**2)))
term = t* np.sin(2 * math.pi * k * x/0.004) * np.cos(2 * math.pi * l * y/0.004)
On the other hand, from your corresponding comment to this answer:
V(x,y) = Sum over n and m of [Cn * Dm * sin(2pinx) * cos(2pimy)]; sum goes from -10 to 10; Cn and Dm are coefficients, and i calculated
that CkDl = sin(2pik)/(k^2 +l^2) (i used here k and l as one of the
indices from the sum over n and m).
I have several problems with this: both sin(2*pi*k) and sin(2*pi*k/2) (the two competing versions in the prefactor) are always zero for integer k, giving you a constant zero V at every (x,y). Furthermore, in your code you have magical frequency factors in the trigonometric functions, which are missing from the comment. If you divide your x by 4e-3, as the code does, you drastically change the spatial dependence of your function (by changing the wavelength by a factor of 250). So you should really decide what your function is.
So here's a solution, where I assumed
V(x,y)=sum_{k,l = 1 to 10} C_{k,l} * sin(2*pi*k*x)*cos(2*pi*l*y), with
C_{k,l}=sin(2*pi*k/4)/((4*pi^2)*(k^2+l^2))*1e-6
This is a combination of your various versions of the function, with the modification of sin(2*pi*k/4) in the prefactor in order to have a non-zero function. I expect you to be able to fix the numerical factors to your actual needs, after you figure out the proper mathematical model.
So here's the full code:
import sympy as sp
import numpy as np
import matplotlib.pyplot as plt
def CD(k,l):
    #return sp.sin(2*sp.pi*k/2)/((4*sp.pi**2)*(k**2+l**2))*1e-6
    return sp.sin(2*sp.pi*k/4)/((4*sp.pi**2)*(k**2+l**2))*1e-6

def Vkl(x,y,k,l):
    return CD(k,l)*sp.sin(2*sp.pi*k*x)*sp.cos(2*sp.pi*l*y)

def V(x,y,kmax,lmax):
    # the SymPy assumption keyword is integer (singular)
    k,l=sp.symbols('k l',integer=True)
    return sp.summation(Vkl(x,y,k,l),(k,1,kmax),(l,1,lmax))
#define symbolic vars, function
kmax=10
lmax=10
x,y=sp.symbols('x y')
fun=V(x,y,kmax,lmax)
#take the gradient symbolically
gradfun=[sp.diff(fun,var) for var in (x,y)]
#turn into bivariate lambda for numpy
numgradfun=sp.lambdify([x,y],gradfun,'numpy')
numfun=sp.lambdify([x,y],fun,'numpy')
#plot
X,Y=np.meshgrid(np.linspace(-10,10,51),np.linspace(-10,10,51))
graddat=numgradfun(X,Y)
fundat=numfun(X,Y)
hf=plt.figure()
hc=plt.contourf(X,Y,fundat,np.linspace(fundat.min(),fundat.max(),25))
plt.quiver(X,Y,graddat[0],graddat[1])
plt.colorbar(hc)
plt.show()
I defined your V(x,y) function using some auxiliary functions for clarity. I left the summation cut-offs as literal parameters, kmax and lmax: in your code these were 3, in your comment they were said to be 10, and anyway they should be infinity.
The gradient is taken the same way as before, but when converting to a numpy function using lambdify you have to set an additional string parameter, 'numpy'. This will allow the resulting numpy lambda to accept array input (essentially it will use np.sin instead of math.sin, and the same for cos).
I also changed the definition of the grid from np.arange to np.linspace: this is usually more convenient. Since your function is almost constant at integer grid points, I created a denser mesh for plotting (51 points while keeping your original limits of (-10,10) fixed).
For clarity I included a few more plots: a contourf to show the value of the function (contour lines should always be orthogonal to the gradient vectors), and a colorbar to indicate the value of the function. Here's the result:
The composition is obviously not the best, but I didn't want to stray too much from your specifications. The arrows in this figure are actually hardly visible, but as you can see (and as is also evident from the definition of V) your function is periodic, so if you plot the same thing with smaller limits and fewer grid points, you'll see more features and larger arrows.
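If a symbolic derivative is not needed, the gradient can also be estimated numerically from values on the grid. The following is a minimal sketch using np.gradient on the simple example f = x^2 + y^2 from the question; it is an alternative to the SymPy route above, not part of it. It also shows the sign flip asked about in the question: assuming the arrows are meant to show a field like E = -grad(V), the components just need to be negated before plotting.

```python
import numpy as np

x = np.linspace(-10.0, 10.0, 21)
y = np.linspace(-10.0, 10.0, 21)
X, Y = np.meshgrid(x, y)          # arrays of shape (len(y), len(x))
F = X**2 + Y**2

# With the default meshgrid layout, axis 0 runs along y and axis 1 along x,
# so np.gradient returns the y-derivative first.
dF_dy, dF_dx = np.gradient(F, y, x)

# Negated field (for example E = -grad V): arrows now point "downhill".
Ex, Ey = -dF_dx, -dF_dy
```

Passing X, Y, Ex, Ey to plt.quiver should then draw the reversed arrows, and quiver's scale argument is one way to change the arrow length. The finite-difference estimate is exact for this quadratic away from the grid boundary, but only approximate for general functions.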

Fourier transform of a Gaussian is not a Gaussian, but thats wrong! - Python

I am trying to use NumPy's fft function. However, when I give it a simple Gaussian function, the fft of that Gaussian is not a Gaussian: it's close, but it's halved so that each half is at either end of the x axis.
The Gaussian function I'm calculating is
y = exp(-x^2)
Here is my code:
from cmath import *
from numpy import multiply
from numpy.fft import fft
from pylab import plot, show
""" Basically the standard range() function but with float support """
def frange (min_value, max_value, step):
    value = float(min_value)
    array = []
    while value < float(max_value):
        array.append(value)
        value += float(step)
    return array
N = 256.0 # number of steps
y = []
x = frange(-5, 5, 10/N)
# fill array y with values of the Gaussian function
cache = -multiply(x, x)
for i in cache: y.append(exp(i))
Y = fft(y)
# plot the fft of the gausian function
plot(x, abs(Y))
show()
The result is not quite right, because the FFT of a Gaussian function should be a Gaussian function itself...
np.fft.fft returns a result in so-called "standard order": (from the docs)
If A = fft(a, n), then A[0] contains the zero-frequency term (the mean of the signal), which is always purely real for real inputs. Then A[1:n/2] contains the positive-frequency terms, and A[n/2+1:] contains the negative-frequency terms, in order of decreasingly negative frequency.
The function np.fft.fftshift rearranges the result into the order most humans expect (and which is good for plotting):
The routine np.fft.fftshift(A) shifts transforms and their frequencies to put the zero-frequency components in the middle...
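The "standard order" is easy to inspect on a tiny example. This short sketch uses np.fft.fftfreq, which returns the frequency belonging to each output bin, together with np.fftshift applied to those frequencies; the length n = 8 is an arbitrary choice for illustration.

```python
import numpy as np

n = 8
freqs = np.fft.fftfreq(n)          # frequency of each FFT bin, in standard order
shifted = np.fft.fftshift(freqs)   # same values with zero frequency in the middle

# freqs starts at zero, runs through the positive frequencies, then jumps to
# the most negative one (for even n, fftfreq labels the n/2 bin as negative).
positive_part = freqs[1:n // 2]
negative_part = freqs[n // 2:]
```

After the shift the frequencies increase monotonically from left to right, which is the ordering a plot normally expects.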
So using np.fft.fftshift:
import matplotlib.pyplot as plt
import numpy as np
N = 128
x = np.arange(-5, 5, 10./(2 * N))
y = np.exp(-x * x)
y_fft = np.fft.fftshift(np.abs(np.fft.fft(y))) / np.sqrt(len(y))
plt.plot(x,y)
plt.plot(x,y_fft)
plt.show()
Your result is not even close to a Gaussian, not even one split into two halves.
To get the result you expect, you will have to position your own Gaussian with the center at index 0, and the result will also be positioned that way. Try the following code:
from pylab import *
N = 128
x = r_[arange(0, 5, 5./N), arange(-5, 0, 5./N)]
y = exp(-x*x)
y_fft = fft(y) / sqrt(2 * N)
plot(r_[y[N:], y[:N]])
plot(r_[y_fft[N:], y_fft[:N]])
show()
The plot commands split the arrays into two halves and swap them to get a nicer picture.
It is being displayed with the center (i.e. mean) at coefficient index zero. That is why it appears that the right half is on the left, and vice versa.
EDIT: Explore the following code:
import numpy as np
from scipy.signal.windows import gaussian
import pylab
x = gaussian(2048, 10)
X = np.abs(np.fft.fft(x))
pylab.plot(x)
pylab.plot(X)
pylab.plot(X[np.r_[1024:2048, 0:1024]])
The last line will plot X starting from the center of the vector, then wrap around to the beginning.
A fourier transform implicitly repeats indefinitely, as it is a transform of a signal that implicitly repeats indefinitely. Note that when you pass y to be transformed, the x values are not supplied, so in fact the gaussian that is transformed is one centred on the median value between 0 and 256, so 128.
Remember also that translation of f(x) is phase change of F(x).
Following on from Sven Marnach's answer, a simpler version would be this:
from pylab import *
N = 128
x = ifftshift(arange(-5,5,5./N))
y = exp(-x*x)
y_fft = fft(y) / sqrt(2 * N)
plot(fftshift(y))
plot(fftshift(y_fft))
show()
This yields a plot identical to the above one.
The key (and this seems strange to me) is that NumPy's assumed data ordering --- in both frequency and time domains --- is to have the "zero" value first. This is not what I'd expect from other implementations of FFT, such as the FFTW3 libraries in C.
This was slightly fudged in the answers from unutbu and Steve Tjoa above, because they're taking the absolute value of the FFT before plotting it, thus wiping away the phase issues resulting from not using the "standard order" in time.
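The effect of the input ordering on the phase can be checked numerically. The sketch below is an illustration under stated assumptions rather than code from the answers: it samples exp(-x^2) on a grid that contains x = 0 exactly, applies ifftshift before the FFT, and compares the result with the analytic continuous transform sqrt(pi) * exp(-omega^2 / 4). The scaling by dx, which turns the DFT sum into an approximation of the integral, and all variable names are choices made for this example.

```python
import numpy as np

N = 256
dx = 10.0 / N
x = np.arange(-N // 2, N // 2) * dx      # -5 ... 5 - dx, with x[N // 2] == 0
y = np.exp(-x**2)

# Put the x = 0 sample first, transform, and scale the sum like an integral.
Y = np.fft.fft(np.fft.ifftshift(y)) * dx
omega = 2 * np.pi * np.fft.fftfreq(N, d=dx)

# Reorder for plotting or comparison: zero frequency in the middle.
spectrum = np.fft.fftshift(Y)
omega_sorted = np.fft.fftshift(omega)
analytic = np.sqrt(np.pi) * np.exp(-omega_sorted**2 / 4)

# Without ifftshift the magnitudes agree, but the sign alternates from bin to bin.
naive = np.fft.fft(y) * dx
```

Here spectrum is real to within rounding and matches the analytic Gaussian, whereas naive only matches after taking the absolute value, which is the phase issue described above.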

Recreating time series data using FFT results without using ifft

I analyzed the sunspots.dat data (below) using fft, which is a classic example in this area. I obtained the results from fft as real and imaginary parts. Then I tried to use these coefficients (the first 20) to recreate the data following the formula for the Fourier transform. Thinking that the real parts correspond to a_n and the imaginary parts to b_n, I have
import numpy as np
from numpy import linspace, pi
from scipy.fft import fft
from matplotlib import pyplot as gplt

def f(Y,x):
    total = 0
    for i in range(20):
        total += Y.real[i]*np.cos(i*x) + Y.imag[i]*np.sin(i*x)
    return total

tempdata = np.loadtxt("sunspots.dat")
year=tempdata[:,0]
wolfer=tempdata[:,1]
Y=fft(wolfer)
n=len(Y)
print(n)
xs = linspace(0, 2*pi,1000)
gplt.plot(xs, [f(Y, x) for x in xs], '.')
gplt.show()
For some reason however, my plot does not mirror the one generated by ifft (I use the same number of coefficients on both sides). What could be wrong ?
Data:
http://linuxgazette.net/115/misc/andreasen/sunspots.dat
When you called fft(wolfer), you told the transform to assume a fundamental period equal to the length of the data, N. To reconstruct the data, you have to use basis functions with the same fundamental frequency, 2*pi/N. By the same token, your time index xs has to range over the time samples of the original signal.
Another mistake was forgetting to do the full complex multiplication. It's easier to think of each term as Y[k]*exp(2j*pi*k*n/N), where k is the frequency index and n is the sample index.
Here's the fixed code. Note I renamed i to ctr to avoid confusion with sqrt(-1), and n to N to follow the usual signal processing convention of using the lower case for a sample, and the upper case for total sample length. I also imported __future__ division to avoid confusion about integer division.
forgot to add earlier: Note that SciPy's fft doesn't divide by N after accumulating. I didn't divide this out before using Y[n]; you should if you want to get back the same numbers, rather than just seeing the same shape.
And finally, note that I am summing over the full range of frequency coefficients. When I plotted np.abs(Y), it looked like there were significant values in the upper frequencies, at least until sample 70 or so. I figured it would be easier to understand the result by summing over the full range, seeing the correct result, then paring back coefficients and seeing what happens.
from __future__ import division
import numpy as np
from scipy.fft import fft
from matplotlib import pyplot as gplt

def f(Y,x, N):
    total = 0
    for ctr in range(len(Y)):
        total += Y[ctr] * (np.cos(x*ctr*2*np.pi/N) + 1j*np.sin(x*ctr*2*np.pi/N))
    return np.real(total)

tempdata = np.loadtxt("sunspots.dat")
year=tempdata[:,0]
wolfer=tempdata[:,1]
Y=fft(wolfer)
N=len(Y)
print(N)
xs = range(N)
gplt.plot(xs, [f(Y, x, N) for x in xs])
gplt.show()
The answer from mtrw was extremely helpful and helped me answer the same question as the OP, but my head almost exploded trying to understand the nested loop.
Here's the last part but with numpy broadcasting (not sure if this even existed when the question was asked) rather than calling the f function:
xs = np.arange(N)
omega = 2*np.pi/N
phase = omega * xs[:,None] * xs[None,:]
reconstruct = Y[None,:] * (np.cos(phase) + 1j*np.sin(phase))
reconstruct = (reconstruct).sum(axis=1).real / N
# same output
gplt.plot(reconstruct)
gplt.plot(wolfer)
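The reconstruction can be verified without the data file. The sketch below uses a random series as a stand-in for the sunspot numbers (an assumption made so the example is self-contained) and checks the manual sum against the original signal. It also shows one way to keep only the lowest K frequencies, which appears to be what the question intended with "the first 20" coefficients: the matching negative-frequency bins at the end of Y have to be kept as well, otherwise the result is not real. K and the variable names are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
signal = rng.standard_normal(64)      # stand-in for the sunspot series
Y = np.fft.fft(signal)
N = len(Y)

# basis[n, k] = exp(2j*pi*k*n/N): one column per frequency, one row per sample.
n = np.arange(N)
k = np.arange(N)
basis = np.exp(2j * np.pi * np.outer(n, k) / N)

# Full inverse transform by hand; note the division by N.
manual = (basis @ Y).real / N

# Low-pass version: keep bins 0..K and their conjugate partners N-K..N-1.
K = 10
Y_low = np.zeros_like(Y)
Y_low[:K + 1] = Y[:K + 1]
Y_low[-K:] = Y[-K:]
smooth = (basis @ Y_low) / N
```

Here manual reproduces signal, and smooth has a negligible imaginary part because the retained coefficients come in conjugate pairs.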
