Weird FFT plot with numpy random set

Code below:
import numpy as np
from numpy import random_intel
import mkl_fft
import matplotlib.pyplot as plt
n = 10**5
a = np.random_intel.rand(n)
b = mkl_fft.fft(a)
plt.scatter(b.real,b.imag)
plt.show()
print(b)
for i in b:
    if i.real > n/2:
        print("Weird FFT Number is ", i)
Result is :
You can see:
Weird FFT Number is (50020.99077289924+0j)
Why does the FFT of a random set produce one particular outlying number?
(Thanks to Paul Panzer & SleuthEye)
With mkl_fft.fft(a-0.5) the final result is:
[2019/03/29 Updated]
With normalized data everything went well
b = mkl_fft.fft((a - np.mean(a))/np.std(a))
The average value of (a - np.mean(a))/np.std(a) is near zero

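That is the constant, or zero-frequency, mode, which is essentially the mean of your signal. You are sampling uniformly from the unit interval, so the mean is ~0.5. Many FFT implementations leave the forward transform unnormalized to save a multiplication, so this bin holds the sum of the samples, i.e. the mean scaled by the number of points: 10**5 * 0.5 = 50000, which is close to the 50020.99 you see.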

The large value in the FFT output happens to be the very first one which corresponds to the DC component. This indicates that the input has a non-zero average value over the entire data set.
Indeed if you look closer at the input data, you might notice that the values are always between 0 and 1, with an average value around 0.5. This is consistent with the rand function implementation which provides pseudo-random samples drawn from a uniform distribution over [0, 1).
You may confirm this to be the case by subtracting the average value with
b = mkl_fft.fft(a - np.mean(a))
and noting that the large initial value b[0] should be near zero.
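A minimal sketch of that check follows. It uses numpy.fft instead of mkl_fft and NumPy's default random generator instead of random_intel, both of which are substitutions made only so the example runs on a stock NumPy install; the seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, only for reproducibility
n = 10**5
a = rng.random(n)                # uniform samples in [0, 1), mean close to 0.5

b = np.fft.fft(a)                        # unnormalized forward transform
b_centered = np.fft.fft(a - np.mean(a))  # same data with the mean removed

# Bin 0 of the unnormalized FFT is the plain sum of the samples (n * mean).
print("b[0]          =", b[0].real)
print("sum(a)        =", a.sum())
# After removing the mean, the DC bin collapses to rounding noise.
print("|b_centered[0]| =", abs(b_centered[0]))
# Every other bin is far smaller than n/2.
print("max |b[1:]|   =", np.abs(b[1:]).max())
```

All bins other than the first are unchanged by subtracting the mean, so the scatter plot loses only its single outlier.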

Related

FFT Scipy Calculating Frequency

I am trying to calculate a signal frequency using SciPy's FFT. Calculating the frequency "by hand", it's obviously around 2.5 Hz.
So this is my input signal:
Signal Amplitude over Time
This is the code I am using:
import matplotlib.pyplot as plt
import numpy as np
from scipy import fft
##---Get Data
AccZ = np.loadtxt('...', delimiter=';', usecols=(0,3))
Time = np.array(AccZ[:,0])
AccZ = np.array(AccZ[:,1])
Time_Step = Time[1]-Time[0]
##---FFT
AccZ_fft = fft.fft(AccZ)
Amp = np.abs(AccZ_fft)
Sample_Freq = fft.fftfreq(AccZ.size, d=Time_Step)
Amp_Freq = np.array([Amp, Sample_Freq])
Amp_Pos = Amp_Freq[0,:].argmax()
Peak_Freq = Amp_Freq[1, Amp_Pos]
This is what I get from the FFT:
Amplitude over Frequency
Unfortunately my highest value is always at array position [0], which means the peak frequency from my Sample_Freq array is always 0.
What am I doing wrong here? I would appreciate any help.

The zeroth bin is always the DC component, i.e. vertical offset of the input function. If you're not interested in this value, you can simply do:
AccZ_fft[0] = 0
This is equivalent to doing this to the input:
AccZ -= np.mean(AccZ)
and should simplify the peak finding.
By the way, for real-valued input signals (as yours seems to be), it is advisable to use np.fft.rfft instead of np.fft.fft. This way, you won't get this symmetrical output with redundant positive and negative frequencies.
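Both suggestions can be sketched together on synthetic data. The sampling rate, duration, offset and amplitude below are made-up values standing in for the accelerometer file, which is not available here.

```python
import numpy as np

fs = 100.0                    # assumed sampling rate in Hz (hypothetical)
t = np.arange(2000) / fs      # 20 s of samples
# Hypothetical signal: a 2.5 Hz oscillation riding on a large constant offset.
signal = 9.81 + 0.5 * np.sin(2 * np.pi * 2.5 * t)

freqs = np.fft.rfftfreq(signal.size, d=1 / fs)

# Without removing the offset, the DC bin dominates and the "peak" is 0 Hz.
raw_amp = np.abs(np.fft.rfft(signal))
raw_peak_freq = freqs[np.argmax(raw_amp)]

# Subtracting the mean empties bin 0, so argmax lands on the real peak.
amp = np.abs(np.fft.rfft(signal - signal.mean()))
peak_freq = freqs[np.argmax(amp)]

print("peak with offset:   ", raw_peak_freq, "Hz")
print("peak without offset:", peak_freq, "Hz")
```

rfftfreq is the companion of rfft: it returns only the non-negative frequencies, so the index from argmax maps directly onto a physical frequency.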

PDF of the sum of Gaussian distributions using FFT

I am trying to derive the PDF of the sum of independent random variables. At first I would like to do this for a simple case: the sum of Gaussian random variables.
I was surprised to see that I don't get a Gaussian density function when I sum an even number of Gaussian random variables. I actually get:
which looks like two halves of a Gaussian distribution.
On the other hand, when I sum an odd number of Gaussian distributions I get the right distribution:
Below is the code I used to produce the results above:
import numpy as np
from scipy.stats import norm
from scipy.fftpack import fft,ifft
import matplotlib.pyplot as plt
%matplotlib inline
a=10**(-15)
end=norm(0,1).ppf(a)
sample=np.linspace(end,-end,1000)
pdf=norm(0,1).pdf(sample)
plt.subplot(211)
plt.plot(np.real(ifft(fft(pdf)**2)))
plt.subplot(212)
plt.plot(np.real(ifft(fft(pdf)**3)))
Could someone help me understand why I get odd results for even sums of Gaussian distributions?

Even though your code creates a zero-mean Gaussian PDF:
sample=np.linspace(end,-end,1000)
pdf=norm(0,1).pdf(sample)
the FFT does not know about sample, and only sees pdf with samples at 0, 1, 2, 3, ... 999. The FFT expects the origin to be the first sample of the signal. To the FFT function, your PDF is not zero mean, but has a mean of 500.
Thus, what is going on here is that you are summing two random variables whose PDFs each have a mean of 500, leading to a PDF with a mean of 1000. And because the FFT imposes a periodicity on the spatial-domain signal, you are seeing the PDF exiting the graph on the right and coming back in on the left.
Summing 3 such variables shifts the mean to 1500, which due to the periodicity of 1000 samples is the same as 500, meaning it ends up in the same place as the original PDF.
The solution is to shift the origin to the first sample for the FFT, and shift the result back:
from scipy.fftpack import fftshift, ifftshift
pdf2 = fftshift(ifft(fft(ifftshift(pdf))**2))
ifftshift shifts the signal so that the center sample ends up at the first sample, and fftshift shifts it back to where you wanted it for display.
But do note that the way you generate the PDF, the origin is not at a sample, and so the above will not work exactly. Instead, use:
sample=np.linspace(end,-end,1001)
pdf=norm(0,1).pdf(sample)
By picking 1001 samples instead of 1000, zero is exactly at the middle sample.
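The effect of the shifts can be checked numerically. This is a sketch under a few assumptions: it uses numpy.fft rather than scipy.fftpack, a hand-written Gaussian on a fixed range of -8 to 8 instead of norm(0,1).ppf, and it multiplies by the grid spacing dx so the result is a density that can be compared with the expected N(0, 2) PDF.

```python
import numpy as np

x = np.linspace(-8.0, 8.0, 1001)   # odd count, so zero falls on the middle sample
dx = x[1] - x[0]

def gaussian(x, sigma):
    """Zero-mean Gaussian density with standard deviation sigma."""
    return np.exp(-x**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

pdf = gaussian(x, 1.0)

# Naive version: the FFT treats index 0 as the origin, so the peak at
# index 500 is doubled to index 1000 and sits at the right edge of the array.
naive = np.real(np.fft.ifft(np.fft.fft(pdf) ** 2)) * dx

# Shifted version: move the center sample to index 0, convolve, move it back.
shifted = np.fft.ifftshift(pdf)
pdf2 = np.fft.fftshift(np.real(np.fft.ifft(np.fft.fft(shifted) ** 2))) * dx

print("naive peak index:  ", np.argmax(naive))
print("shifted peak index:", np.argmax(pdf2))
# The sum of two independent N(0, 1) variables is N(0, 2).
print("max error vs N(0, 2):", np.max(np.abs(pdf2 - gaussian(x, np.sqrt(2.0)))))
```

The convolution is still circular, so this only works because the PDF has decayed to practically zero at both ends of the sampled range.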

Use R!
library(ggplot2)
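f <- function(n) {
  x1 <- rnorm(n)
  x2 <- rnorm(n)
  # Keep the sums together with their sample size so the plot can group by n
  ds <- data.frame(X = x1 + x2, n = factor(n))
  return(ds)
}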
ds.list <- lapply(10^(2:5),f)
ds <- Reduce(rbind,ds.list)
ggplot(ds,aes(X,fill = n)) + geom_density(alpha = 0.5) + xlab("")
Here's the distribution plot:

Matrix inversion for matrix with large values in python

I'm doing matrix inversion in Python, and I found it very weird that the result differs with the data scale.
In the code below, it is expected that A_inv/B_inv = B/A. However, the difference between A_inv/B_inv and B/A becomes larger and larger as the data scale grows... Is this because Python cannot compute the matrix inverse precisely for matrices with large values?
Also, I checked the condition number of B, which is a constant ~3.016 no matter what the scale is.
Thanks!!!
import numpy as np
from matplotlib import pyplot as plt
D = 30
N = 300
np.random.seed(10)
original_data = np.random.sample([D, N])
A = np.cov(original_data)
A_inv = np.linalg.inv(A)
B_cond = []
diff = []
for k in range(1, 10):
    B = A * np.power(10, k)
    B_cond.append(np.linalg.cond(B))
    B_inv = np.linalg.inv(B)
    ### Two measurements of difference are used
    diff.append(np.log(np.linalg.norm(A_inv/B_inv - B/A)))
    #diff.append(np.max(np.abs(A_inv/B_inv - B/A)))
# print(B_cond)
plt.figure()
plt.plot(range(1, 10), diff)
plt.xlabel('data(B) / data(A)')
plt.ylabel('log(||A_inv/B_inv - B/A||)')
plt.savefig('Inversion for large matrix')

I may be wrong, but I think it comes from the way numbers are represented in the machine.
When you are dealing with large numbers, your inverse matrix is going to have entries that are very small in magnitude (close to zero). And close to zero, the floating-point representation is not precise enough, I guess...
https://en.wikipedia.org/wiki/Floating-point_arithmetic

There is no reason that you should expect np.linalg.norm(A_inv/B_inv - B/A) to be equal to anything special. Instead, you can check the quality of the inverse calculation by multiplying the original matrix by its inverse and checking the determinant, np.linalg.det(A.dot(A_inv)), which should be equal to 1.
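A sketch of such a quality check is below, using the residual B·B_inv − I rather than the determinant (a substitution, since the residual looks at every entry). It also prints the question's difference relative to the size of B/A. Since B = 10**k · A, both elementwise ratios are 10**k in exact arithmetic, so rounding noise in the absolute difference would be expected to grow with the scale while the relative difference stays flat. The seed and the default_rng generator are choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(10)
A = np.cov(rng.random((30, 300)))     # 30 x 30 covariance, as in the question
A_inv = np.linalg.inv(A)
identity = np.eye(A.shape[0])

residuals, abs_diffs, rel_diffs = [], [], []
for k in range(1, 10):
    B = A * 10.0**k
    B_inv = np.linalg.inv(B)
    # Quality of the inverse itself: B @ B_inv should be the identity.
    residuals.append(np.linalg.norm(B @ B_inv - identity))
    # The question's measure, and the same measure relative to ||B/A||.
    abs_diff = np.linalg.norm(A_inv / B_inv - B / A)
    abs_diffs.append(abs_diff)
    rel_diffs.append(abs_diff / np.linalg.norm(B / A))

for k, (r, d, rel) in enumerate(zip(residuals, abs_diffs, rel_diffs), start=1):
    print(f"k={k}: residual={r:.2e}  abs diff={d:.2e}  rel diff={rel:.2e}")
```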

real time series, differences in numpy ifft v. irfft

I want to generate a random time series with a prescribed spectral shape. To do this I will draw random complex Fourier coefficients from the appropriate spectral distribution, then transform from the frequency domain to the time domain.
To generate a real time series, the Fourier spectrum must have real DC and Nyquist coefficients and have symmetric negative frequencies.
When I do this, I get different behavior from numpy's ifft versus its irfft.
As an example, here's a 32 sample white spectrum:
import numpy as np
Nsamp = 2**5
Nfreq = (Nsamp-1)//2 # num pos freq bins not including DC or Nyquist
DC = 0.
f_pos = np.random.randn(Nfreq) + 1j*np.random.randn(Nfreq)
Nyquist = np.random.randn() # this is real
f_neg = f_pos[::-1] # mirror pos freqs
f_tot = np.hstack((DC, f_pos, Nyquist, f_neg))
f_rep = np.hstack((DC, f_pos, Nyquist))
t1 = np.fft.ifft(f_tot)
t2 = np.fft.irfft(f_rep)
print(t1)
print(t2)
I would expect t1 to be real, and t1 and t2 to agree (within machine precision). Neither is true.
Am I using ifft correctly? Looking at the frequencies output by np.fft.fftfreq(Nsamp) makes me think I'm building f_tot correctly for input.
irfft gives the correct result, so I'll use that... but I'd like to know how to use ifft in the future.

From the numpy.fft docs:
A[0] contains the zero-frequency term (the sum of the signal), which is always purely real for real inputs. Then A[1:n/2] contains the positive-frequency terms, and A[n/2+1:] contains the negative-frequency terms, in order of decreasingly negative frequency. For an even number of input points, A[n/2] represents both positive and negative Nyquist frequency, and is also purely real for real input.
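The layout in that excerpt matches how f_tot is assembled in the question. What a real time series additionally requires is Hermitian symmetry: the negative-frequency coefficients must be the complex conjugates of the mirrored positive ones, whereas the question mirrors them without conjugating. A sketch of the question's code with only that one line changed (the seed is added here for reproducibility):

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed
Nsamp = 2**5
Nfreq = (Nsamp - 1) // 2         # positive-frequency bins, excluding DC and Nyquist

DC = 0.0
f_pos = rng.standard_normal(Nfreq) + 1j * rng.standard_normal(Nfreq)
Nyquist = rng.standard_normal()  # real
f_neg = np.conj(f_pos[::-1])     # mirror AND conjugate the positive frequencies

f_tot = np.hstack((DC, f_pos, Nyquist, f_neg))   # full spectrum for ifft
f_rep = np.hstack((DC, f_pos, Nyquist))          # half spectrum for irfft

t1 = np.fft.ifft(f_tot)
t2 = np.fft.irfft(f_rep)

print("max |imag(t1)|:", np.abs(t1.imag).max())
print("max |t1 - t2|: ", np.abs(t1.real - t2).max())
```

irfft assumes this symmetry implicitly, which is why it returns a real series from f_rep alone.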

extracting phase information using numpy fft

I am trying to use a fast Fourier transform to extract the phase shift of a single sinusoidal function. I know that on paper, if we denote the transform of our function as T, then we have the following relations:
However, I am finding that while I am able to accurately capture the frequency of my cosine wave, the phase is inaccurate unless I sample at an extremely high rate. For example:
import numpy as np
import pylab as pl
num_t = 100000
t = np.linspace(0,1,num_t)
dt = 1.0/num_t
w = 2.0*np.pi*30.0
phase = np.pi/2.0
amp = np.fft.rfft(np.cos(w*t+phase))
freqs = np.fft.rfftfreq(t.shape[-1],dt)
print(np.arctan2(amp.imag, amp.real)[30])
pl.subplot(211)
pl.plot(freqs[:60],np.sqrt(amp.real**2+amp.imag**2)[:60])
pl.subplot(212)
pl.plot(freqs[:60],(np.arctan2(amp.imag,amp.real))[:60])
pl.show()
Using num=100000 points I get a phase of 1.57173880459.
Using num=10000 points I get a phase of 1.58022110476.
Using num=1000 points I get a phase of 1.6650441064.
What's going wrong? Even with 1000 points I have 33 points per cycle, which should be enough to resolve it. Is there maybe a way to increase the number of computed frequency points? Is there any way to do this with a "low" number of points?
EDIT: from further experimentation it seems that I need ~1000 points per cycle in order to accurately extract a phase. Why?!
EDIT 2: further experiments indicate that accuracy is related to number of points per cycle, rather than absolute numbers. Increasing the number of sampled points per cycle makes phase more accurate, but if both signal frequency and number of sampled points are increased by the same factor, the accuracy stays the same.

Your points are not distributed equally over the interval: the end point is doubled, because for a periodic signal 0 is the same point as 1. This matters less the more points you take, obviously, but it still gives some error. You can avoid it entirely, since linspace has a flag for this (endpoint). It also has a flag (retstep) to return dt directly along with the array.
Do
t, dt = np.linspace(0, 1, num_t, endpoint=False, retstep=True)
instead of
t = np.linspace(0,1,num_t)
dt = 1.0/num_t
then it works :)
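A quick comparison of the two sampling grids at the question's smallest setting (1000 points) can be sketched as follows; measured_phase is a helper name introduced only for this example.

```python
import numpy as np

num_t = 1000
w = 2.0 * np.pi * 30.0
phase = np.pi / 2.0

def measured_phase(t):
    """Phase of FFT bin 30 for cos(w*t + phase) sampled at times t."""
    amp = np.fft.rfft(np.cos(w * t + phase))
    return np.angle(amp[30])

# Default grid: includes t = 1, so the 30 cycles do not fit the window exactly.
t_default = np.linspace(0, 1, num_t)
# endpoint=False: exactly 30 whole cycles span the num_t samples.
t_fixed = np.linspace(0, 1, num_t, endpoint=False)

phase_default = measured_phase(t_default)
phase_fixed = measured_phase(t_fixed)

print("true phase:         ", phase)
print("with endpoint:      ", phase_default)
print("with endpoint=False:", phase_fixed)
```

With the end point excluded, the signal is exactly periodic over the window, so bin 30 carries the phase without leakage from neighbouring bins.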

The phase value in the result bin of an unrotated FFT is only correct if the input signal is exactly integer periodic within the FFT length. Your test signal is not, thus the FFT measures something partially related to the phase difference of the signal discontinuity between end-points of the test sinusoid. A higher sample rate will create a slightly different last end-point from the sinusoid, and thus a possibly smaller discontinuity.
If you want to decrease this FFT phase measurement error, create your test signal so that your test phase is referenced to the exact center (sample N/2) of the test vector (not the 1st sample), and then do an fftshift operation (rotate by N/2) so that there will be no signal discontinuity between the 1st and last point in your resulting FFT input vector of length N.

This snippet of code might help:
import numpy as np
from numpy.fft import rfft, irfft

THRESHOLD = 0.01  # fraction of the peak amplitude below which modes are dropped

def reconstruct_ifft(data):
    """
    Take in a signal, find its FFT, retain the dominant modes and
    reconstruct the signal from them.

    Parameters
    ----------
    data : Signal to do the fft, ifft

    Returns
    -------
    reconstructed_signal : the reconstructed signal
    """
    N = data.size
    yf = rfft(data)
    amp_yf = np.abs(yf)  # amplitude
    # Zero every mode whose amplitude is below THRESHOLD times the largest one
    yf = yf*(amp_yf>(THRESHOLD*np.amax(amp_yf)))
    reconstructed_signal = irfft(yf, n=N)  # n=N keeps odd-length signals at length N
    return reconstructed_signal
The 0.01 is the threshold of FFT amplitudes, relative to the largest one, that you want to retain. Making THRESHOLD greater (more than 1 does not make any sense) will give fewer modes and cause higher RMS error but ensures higher frequency selectivity.
