FFT derivatives using Numpy and the Nyquist frequency - python

I am having trouble understanding Numpy's behavior regarding the Nyquist frequency. Consider the following example:
import numpy as np
x=np.linspace(0, 2*np.pi, 21)[:-1]
k=np.fft.rfftfreq(len(x), d=x[1]-x[0])
FFT=np.fft.rfft(x)
x1=np.fft.irfft(1j*k*FFT)
FFT[-1]+=1e5
x2=np.fft.irfft(1j*k*FFT)
print(np.allclose(x1,x2))
Prints True. So apparently it doesn't matter what I do with the Nyquist frequency in FFT; the result is always the same and the change is ignored. Curiously, this does not happen when simply recovering the function (no differentiation):
x1=np.fft.irfft(FFT)
FFT[-1]+=1e5
x2=np.fft.irfft(FFT)
print(np.allclose(x1,x2))
Prints False.
I may be misunderstanding what the Nyquist frequency is here (Wikipedia and other sources weren't very helpful), but aren't both results supposed to be affected by a change in the Nyquist frequency? The closest explanation I can find is that the Nyquist component is supposed to be real, but that still doesn't seem to explain both behaviors.
The reason I'm asking is that I'm trying to reproduce results that I know are correct from a Fortran code that does do some stuff with the Nyquist frequency when differentiating. My results are always about 1% off and I'm guessing this is the culprit.

The r in np.fft.rfft() indicates that you are taking the DFT of real input. If the spectrum you feed back in does not respect the symmetry that assumption implies, you will get unexpected behavior like this one. Just use the plain fft functions for complex-valued work. As a side note, always try to inspect your data.
EDIT (additional explanation):
In particular, when you compute the "DFT for real inputs" you are enforcing certain properties on your data: the (D)FT of a real-valued function is Hermitian-symmetric, so the negative-frequency coefficients are redundant, and rfft and irfft are optimized to compute under this assumption.
See the documentation of np.fft.rfft() and np.fft.irfft() for more information.
Briefly, because of this expected symmetry, np.fft.rfft() does not compute the negative-frequency half of the coefficients, and for an even-length real input the first component (DC) is purely real by definition and the last component (Nyquist) is purely real as well.
Because of the 1j multiplication, whatever was purely real becomes purely imaginary (and vice versa) before the subsequent irfft calls.
Since irfft() ignores the imaginary part of the first and last components, your statement does not affect its result.
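To see this concretely, here is a minimal check (a sketch illustrating the point above, not code from the original post): adding a purely imaginary value to the Nyquist bin is invisible to irfft(), while adding a real value is not.
import numpy as np

x = np.linspace(0, 2 * np.pi, 21)[:-1]
FFT = np.fft.rfft(x)
base = np.fft.irfft(FFT)

# Purely imaginary change to the Nyquist bin: discarded by irfft
FFT_imag = FFT.copy()
FFT_imag[-1] += 1e5j
print(np.allclose(base, np.fft.irfft(FFT_imag)))  # True

# Real change to the Nyquist bin: visible in the result
FFT_real = FFT.copy()
FFT_real[-1] += 1e5
print(np.allclose(base, np.fft.irfft(FFT_real)))  # False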

Related

Polynomial Kernel not PSD?

I am currently working through different machine learning methods at a low level. Since the polynomial kernel
K(x, z) = (1 + x^T z)^d
is one of the more commonly used kernels, I was under the assumption that this kernel must yield a positive (semi-)definite matrix for a fixed set of data {x1,...,xn}.
However, in my implementation, this seems to not be the case, as the following example shows:
import numpy as np
np.random.seed(93)
x=np.random.uniform(0,1,5)
#Assuming d=1
kernel = (1 + x[:, None] @ x.T[None, :])
np.linalg.eigvals(kernel)
The output would be
[ 6.9463439e+00 1.6070016e-01 9.5388039e-08 -1.5821310e-08 -3.7724433e-08]
I'm also getting negative eigenvalues for d>2.
Am I totally misunderstanding something here? Or is the polynomial kernel simply not PSD?
EDIT: In a previous version, I used x=np.float32(np.random.uniform(0,1,5)) to reduce computing time, which led to more negative eigenvalues (I believe due to numerical instabilities, as @user2357112 mentioned). I guess this is a good example that precision does matter? Since negative eigenvalues still occur even with float64 precision, the follow-up question would be how to avoid such numerical instabilities.
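Not from the original thread, but as a sketch of common practice: eigenvalues that are negative by only a tiny amount relative to the largest eigenvalue are round-off, not evidence that the kernel is indefinite. Typical remedies are to symmetrize the matrix and then either clip the tiny negative eigenvalues at zero or add a small jitter to the diagonal (scaled to the round-off you observe).
import numpy as np

np.random.seed(93)
x = np.random.uniform(0, 1, 5)
K = (1 + x[:, None] @ x[None, :]) ** 2  # d = 2, same construction as above

K = 0.5 * (K + K.T)  # enforce exact symmetry first

# Option 1: clip negligible negative eigenvalues at zero
w, V = np.linalg.eigh(K)
K_clipped = (V * np.clip(w, 0, None)) @ V.T

# Option 2: add a tiny jitter to the diagonal
K_jitter = K + 1e-10 * np.eye(K.shape[0])

print(np.linalg.eigvalsh(K_clipped).min(), np.linalg.eigvalsh(K_jitter).min())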

Integration with Scipy giving incorrect results with negative lower bound

I am attempting to calculate integrals between two limits using python/scipy.
I am using online calculators to double check my results (http://www.wolframalpha.com/widgets/view.jsp?id=8c7e046ce6f4d030f0b386ea5c17b16a, http://www.integral-calculator.com/), and my results disagree when I have certain limits set.
The code used is:
import numpy as np
from scipy import integrate

def integrand(x):
    return np.exp(-0.5 * x**2)

def int_test(a, b):
    # a and b are the lower and upper bounds of the integration
    return integrate.quad(integrand, a, b)
When setting the limits (a, b) to (-np.inf, 1) I get answers that agree (2.10894...); however, if I set them to (-np.inf, 300) I get an answer of zero.
On further investigation using:
for i in range(50):
    print(i, int_test(-np.inf, i))
I can see that the result goes wrong at i=36.
I was wondering if there was a way to avoid this?
Thanks,
Matt
I am guessing this has to do with the infinite bounds. scipy.integrate.quad is a wrapper around quadpack routines.
https://people.sc.fsu.edu/~jburkardt/f_src/quadpack/quadpack.html
In the end, these routines choose suitable subintervals and try to obtain the value of the integral through function evaluations followed by numerical integration. This works fine for finite integrals, assuming you know roughly how finely the function needs to be sampled.
For infinite integrals, it depends on how well the algorithm chooses its subintervals and how accurately they are evaluated.
My advice: do NOT use numerical integration software AT ALL if you are interested in accurate values for infinite integrals.
If your problem can be solved analytically, try that or confine yourself to certain bounds.
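For this particular integrand, the analytic route is straightforward (a sketch, not part of the original answer): the Gaussian integral has a closed form in terms of the error function, so no infinite-bound quadrature is needed.
import numpy as np
from scipy import special

def int_analytic(b):
    # integral of exp(-0.5 * x**2) from -inf to b
    return np.sqrt(np.pi / 2) * (1 + special.erf(b / np.sqrt(2)))

print(int_analytic(1))    # ~2.10894...
print(int_analytic(300))  # ~2.50663 (= sqrt(2*pi)), not zero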

Why is numpy's sine function so inaccurate at some points?

I just checked numpy's sine function. Apparently, it produces highly inaccurate results around pi.
In [26]: import numpy as np
In [27]: np.sin(np.pi)
Out[27]: 1.2246467991473532e-16
The expected result is 0. Why is numpy so inaccurate there?
To some extent, I am uncertain whether it is even fair to regard the calculated result as inaccurate: its absolute error is within one machine epsilon (for binary64), whereas the relative error is +inf, which is why I feel somewhat confused. Any ideas?
[Edit] I fully understand that floating-point calculation can be inaccurate, but most floating-point libraries manage to deliver results within a small range of error. Here, the relative error is +inf, which seems unacceptable. Just imagine that we want to calculate
1/(1e-16 + sin(pi))
The result would be disastrously wrong if we used numpy's implementation.
The main problem here is that np.pi is not exactly π, it's a finite binary floating point number that is close to the true irrational real number π but still off by ~1e-16. np.sin(np.pi) is actually returning a value closer to the true infinite-precision result for sin(np.pi) (i.e. the ideal mathematical sin() function being given the approximated np.pi value) than 0 would be.
The value is dependent upon the algorithm used to compute it. A typical implementation will use some quickly-converging infinite series, carried out until it converges within one machine epsilon. Many modern chips (starting with the Intel 960, I think) had such functions in the instruction set.
To get 0 returned for this, we would need either a notably more accurate algorithm, one that ran extra-precision arithmetic to guarantee the closest-match result, or something that recognizes special cases: detect a multiple of PI and return the exact value.
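A small check of the explanation above (assuming the mpmath package is available; it is not part of the original answers): np.sin(np.pi) is essentially the gap between the true π and the double-precision constant np.pi, since sin(π - ε) = sin(ε) ≈ ε for small ε.
import numpy as np
import mpmath

mpmath.mp.dps = 30                    # work with 30 significant digits
gap = mpmath.pi - mpmath.mpf(np.pi)   # true pi minus its float64 approximation

print(float(gap))      # ~1.2246467991473532e-16
print(np.sin(np.pi))   # ~1.2246467991473532e-16, essentially the same value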

fft2 different result in numpy and matlab

I was trying to port some code from Python to MATLAB, but I encountered an inconsistency between numpy's fft2 and MATLAB's fft2:
peak =
4.377491037053e-223 3.029446976068e-216 ...
1.271610790463e-209 3.237410810582e-203 ...
(The full array is too large to list directly; it can be accessed here: https://drive.google.com/file/d/0Bz1-hopez9CGTFdzU0t3RDAyaHc/edit?usp=sharing)
Matlab:
fft2(peak) --(sample result)
12.5663706143590 -12.4458341615690
-12.4458341615690 12.3264538927637
Python:
np.fft.fft2(peak) --(sample result)
12.56637061 +0.00000000e+00j -12.44583416 +3.42948517e-15j
-12.44583416 +3.35525358e-15j 12.32645389 -6.78073635e-15j
Please help me understand why, and suggest how to fix it.
The Fourier transform of a real, even function is real and even (ref). Therefore, it appears that your FFT should be real? Numpy is probably just struggling with the numerics while MATLAB may outright check for symmetry and force the solution to be real.
MATLAB uses FFTW3, while my research indicates Numpy uses a library called FFTPACK. FFTW is one of the standards for FFT performance and uses a number of tricks to work quickly and compute to the best precision possible. Your data contains incredibly tiny numbers, which poses numerical challenges that any library will be hard pressed to resolve.
You might consider executing the Python code against an FFTW3 wrapper like pyFFTW3 and see if you get similar results.
It appears that your input data is a Gaussian, real and even, in which case we do expect the fft2 of the signal to be real and even. If all your inputs are like this, you could just take the real part, or round to a certain precision. I would trust MATLAB's FFTW code over the Python code.
Or you could just ignore it. The differences are quite small and a value of 3e-15i is effectively zero for most applications. If you have automated the comparison, consider calling them equivalent if the mean square error of all the entries is less than some threshold (say 1e-8 or 1e-15 or 1e-20).
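A sketch of such an automated comparison (matlab_result is a placeholder for the MATLAB output you would load yourself, e.g. via scipy.io.loadmat; it is assumed here, not computed):
import numpy as np

python_result = np.fft.fft2(peak)   # peak loaded from the linked file

# If the input is known to be real and even, the exact transform is real:
python_real = np.real(python_result)

# Declare the two results equivalent if the mean squared error is tiny
mse = np.mean(np.abs(python_result - matlab_result) ** 2)
print("equivalent:", mse < 1e-15)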

how could we obtain magnitude of frequency from a set of complex numbers obtained after performing FFT in python?

I don't know what to do after obtaining a set of complex numbers from an FFT on a wav file. How can I obtain the corresponding frequencies? This is the output I got after performing the FFT, shown below:
[ 12535945.00000000 +0.j -30797.74496367 +6531.22295858j
-26330.14948055-11865.08322966j ..., 34265.08792783+31937.15794965j
-26330.14948055+11865.08322966j -30797.74496367 -6531.22295858j]
Actually the abs(x) operation only converts a real/imaginary pair from your result list into a magnitude. Do that unless you want to keep the imaginary portion for later use. After conversion, each number in the result list represents the magnitude of the signal at a certain frequency in your spectrum, so the frequency is represented by the list index. When you plot the data on an XY graph, what you see is the magnitude of the frequencies your source signal contains. Don't forget that only the first half of your data is valid; for real input the other half is a mirror image of the first half (conjugate symmetry).
For example, say you run a 1024-point FFT on a wav file sampled at 10 kHz. The FFT divides that 10 kHz range into 1024 'bins' and works out how much of each chunk of spectrum is present in the source wav file. Your output is those bins. Generally, when I do a frequency analysis, the actual numbers I get back aren't what's important; it's the magnitudes relative to the surrounding bins that I'm interested in.
For a little more detail, we're relying on the principle of superposition which states that any time-varying signal containing many frequencies can be split up into many signals containing one component frequency each and vice versa. So the FFT output reflects this property. Each value of your output list represents a magnitude for a signal at a single frequency (usually called a 'bin') that is present in your source signal. Combine all those signals together and you should get your source signal back.
Oh, and in case you didn't know, only the first half of your result list is valid because of the Nyquist criterion, which says that a sampling system can only represent frequencies up to half the sampling frequency. So if you sample a signal at 10 kHz, you can only reproduce frequencies up to 5 kHz from the sampled data. The same principle is the reason why only the first half of your FFT data is meaningful; the second half mirrors the first.
Sorry for the long-winded explanation; your question doesn't indicate how much experience you have, so I thought an explanation of the general gist of an FFT was needed.
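Putting that into code, here is a minimal sketch (the file name, and the use of scipy to read the wav, are assumptions rather than part of the question):
import numpy as np
from scipy.io import wavfile

rate, data = wavfile.read("input.wav")  # hypothetical file name
if data.ndim > 1:
    data = data[:, 0]                   # take one channel if the file is stereo

spectrum = np.fft.fft(data)
half = len(data) // 2
magnitudes = np.abs(spectrum[:half])                    # keep the valid first half
freqs = np.fft.fftfreq(len(data), d=1.0 / rate)[:half]  # bin index -> frequency in Hz

# magnitudes[i] is the strength of the component at freqs[i] Hz
print(freqs[np.argmax(magnitudes[1:]) + 1])  # dominant non-DC frequency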
As @KennyTM already explained on the duplicate question:
The frequency is determined by the index of the array. Each element corresponds to a frequency.
To determine the frequency that each element represents, you need to know the sampling frequency of your data and the length of the array.
Basically, it would be something like:
sampling_freq = 1000.0  # in Hz
# frequencies for the non-negative half of the spectrum, from 0 up to Nyquist
freq = np.linspace(0, sampling_freq / 2.0, x.size // 2 + 1)
This gives the frequencies for one half of the fft array (which is symmetric about the center). My memory is rusty, though, so this may be a bit off...
Either way, numpy has a helper function to do it for you: numpy.fft.fftfreq
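For instance, a short sketch (the 1000 Hz sampling rate is just an assumed example):
import numpy as np

n = 8                   # number of samples
sampling_freq = 1000.0  # Hz (assumed)

print(np.fft.fftfreq(n, d=1.0 / sampling_freq))
# [   0.  125.  250.  375. -500. -375. -250. -125.]

# For use with np.fft.rfft, rfftfreq gives only the non-negative half:
print(np.fft.rfftfreq(n, d=1.0 / sampling_freq))
# [  0. 125. 250. 375. 500.]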
If I'm not mistaken, the frequency can be obtained by calculating the magnitude of the complex number. So a simple abs(x) on each of those complex numbers should return the frequency.
