I have a WAV file which I would like to visualize in the frequency domain. Next, I would like to write a simple script that takes in a WAV file and outputs whether the energy at a certain frequency "F" exceeds a threshold "Z" (whether a certain tone has a strong presence in the WAV file). There are a bunch of code snippets online that show how to plot an FFT spectrum in Python, but I don't understand a lot of the steps.
I know that wavfile.read(myfile) returns the sampling rate (fs) and the data array (data), but when I run an FFT on it (y = numpy.fft.fft(data)), what units is y in?
To get the array of frequencies for the x-axis, some posters do this where n = len(data):
X = numpy.linspace(0.0, 1.0/(2.0*T), n/2)
and others do this:
X = numpy.fft.fftfreq(n) * fs)[range(n/2)]
Is there a difference between these two methods and is there a good online explanation for what these operations do conceptually?
Some of the online tutorials about FFTs mention windowing, but not a lot of posters use windowing in their code snippets. I see that numpy has a numpy.hamming(N), but what should I use as the input to that method and how do I "apply" the output window to my FFT arrays?
For my threshold computation, is it correct to find the frequency in X that's closest to my desired tone/frequency and check if the corresponding element (same index) in Y has an amplitude greater than the threshold?
FFT data is in units of normalized frequency where the first point is 0 Hz and one past the last point is fs Hz. You can create the frequency axis yourself with linspace(0.0, (1.0 - 1.0/n)*fs, n). You can also use fftfreq but the components will be negative.
These are the same if n is even. You can also use rfftfreq I think. Note that this is only the "positive half" of your frequencies, which is probably what you want for audio (which is real-valued). Note that you can use rfft to just produce the positive half of the spectrum, and then get the frequencies with rfftfreq(n,1.0/fs).
Windowing will decrease sidelobe levels, at the cost of widening the mainlobe of any frequencies that are there. N is the length of your signal and you multiply your signal by the window. However, if you are looking in a long signal you might want to "chop" it up into pieces, window them, and then add the absolute values of their spectra.
"is it correct" is hard to answer. The simple approach is as you said, find the bin closest to your frequency and check its amplitude.
Related
I'm trying to do some tests before I proceed analyzing some real dataset via FFT, and I've found the following problem.
First, I create a signal as the sum of two cosines and then use rfft to to the transformation (since it has only real values):
import numpy as np
import matplotlib.pyplot as plt
from scipy.fft import rfft, rfftfreq
# Number of sample points
N = 800
# Sample spacing
T = 1.0 / 800.0
x = np.linspace(0.0, N*T, N)
y = 0.5*np.cos(10*2*np.pi*x) + 0.5*np.cos(200*2*np.pi*x)
# FFT
yf = rfft(y)
xf = rfftfreq(N, T)
fig, ax = plt.subplots(1,2,figsize=(15,5))
ax[0].plot(x,y)
ax[1].plot(xf, 2.0/N*np.abs(yf))
As it can be seen from the definition of the signal, I have two oscillations with amplitude 0.5 and frequency 10 and 200. Now, I would expect the FFT spectrum to be something like two deltas at those points, but apparently increasing the frequency broadens the peaks:
From the first peak it can be infered that the amplitude is 0.5, but not for the second. I've tryied to obtain the area under the peak using np.trapz and use that as an estimate for the amplitude, but as it is close to a dirac delta it's very sensitive to the interval I choose. My problem is that I need to get the amplitude as exact as possible for my data analysis.
EDIT: As it seems to be something related with the number of points, I decided to increment (now that I can) the sample frequency. This seems to solve the problem, as it can be seen in the figure:
However, it still seems strange that for a certain number of points and sample frequency, the high frequency peaks broaden...
It is not strange , you have leakage of the frequency bins. When you discretize the signal (sampling) needed for the Fourier transfrom , frequency bins are created which are frequency intervals where the the amplitude is calculated. And each bin has wide which is given by the sample_rate / num_points . So , the less the number of bins the more difficult is to assign precise amplitudes to every frequency. Other problems in choosing the best sampling rate exist such as the shannon-nyquist theorem to prevent aliasing. https://en.wikipedia.org/wiki/Nyquist%E2%80%93Shannon_sampling_theorem . But depending on the problem sometimes there some custom rates used for sampling. E.g. when dealing with audio a sampling rate of 44,100 Hz is widely used , cause is based on the limits of the human hearing. So it depends also on nature of the data you want to perform analysis as you wrote. Anyway , since this question has also theoretical value , you can also check https://dsp.stackexchange.com for some useful info.
I would comment to George's answer, but yet I cannot.
Maybe a starting point for your research are the properties of the Discrete Fourier Transform.
The signal in the time domain is actual the cosines multiplied by a box window which transforms into the frequency domain as the convolution of the deltas with the sinc function. The sinc functions will smear the spectrum.
However, I am not sure we are observing spectral leakage here, since the window fits exactly to the full period of cosines. The discretization of the bins might still play a role here.
I want to apply Fourier transformation using fft function to my time series data to find "patterns" by extracting the dominant frequency components in the observed data, ie. the lowest 5 dominant frequencies to predict the y value (bacteria count) at the end of each time series.
I would like to preserve the smallest 5 coefficients as features, and eliminate the rest.
My code is as below:
df = pd.read_csv('/content/drive/My Drive/df.csv', sep=',')
X = df.iloc[0:2,0:10000]
dft_X = np.fft.fft(X)
print(dft_X)
print(len(dft_X))
plt.plot(dft_X)
plt.grid(True)
plt.show()
# What is the graph about(freq/amplitude)? How much data did it use?
for i in dft_X:
m = i[np.argpartition(i,5)[:5]]
n = i[np.argpartition(i,range(5))[:5]]
print(m,'\n',n)
Here is the output:
But I am not sure how to interpret this graph. To be precise,
1) Does the graph show the transformed values of the input data? I only used 2 rows of data(each row is a time series), thus data is 2x10000, why are there so many lines in the graph?
2) To obtain frequency value, should I use np.fft.fftfreq(n, d=timestep)?
Parameters:
n : int
Window length.
d : scalar, optional
Sample spacing (inverse of the sampling rate). Defaults to 1.
Returns:
f : ndarray
Array of length n containing the sample frequencies.
How to determine n(window length) and sample spacing?
3) Why are transformed values all complex numbers?
Thanks
I'm gonna answer in reverse order of your questions
3) Why are transformed values all complex numbers?
The output of a Fourier Transform is always complex numbers. To get around this fact, you can either apply the absolute value on the output of the transform, or only plot the real part using:
plt.plot(dft_X.real)
2) To obtain frequency value, should I use np.fft.fftfreq(n, d=timestep)?
No, the "frequency values" will be visible on the output of the FFT.
1) Does the graph show the transformed values of the input data? I only used 2 rows of data(each row is a time series), thus data is 2x10000, why are there so many lines in the graph?
Your graph has so many lines because it's making a line for each column of your data set. Apply the FFT on each row separately (or possibly just transpose your dataframe) and then you'll get more actual frequency domain plots.
Follow up
Would using absolute value or real part of the output as features for a later model have different effect than using the original output?
Absolute values are easier to work with usually.
Using real part
Using absolute value
Here's the Octave code that generated this:
Fs = 4000; % Sampling rate of signal
T = 1/Fs; % Period
L = 4000; % Length of signal
t = (0:L-1)*T; % Time axis
freq = 1000; % Frequency of our sinousoid
sig = sin(freq*2*pi*t); % Fill Time-Domain with 1000 Hz sinusoid
f_sig = fft(sig); % Apply FFT
f = Fs*(0:(L/2))/L; % Frequency axis
figure
plot(f,abs(f_sig/L)(1:end/2+1)); % peak at 1kHz)
figure
plot(f,real(f_sig/L)(1:end/2+1)); % main peak at 1kHz)
In my example, you can see the absolute value returned no noise at frequencies other than the sinusoid of frequency 1kHz I generated while the real part had a bigger peak at 1kHz but also had much more noise.
As for effects, I don't know what you mean by that.
is it expected that "frequency values" always be complex numbers
Always? No. The Fourier series represents the frequency coefficients at which the sum of sines and cosines completely equate any continuous periodic function. Sines and cosines can be written in complex forms through Euler's formula. This is the most convenient way to store Fourier coefficients. In truth, the imaginary part of your frequency-domain signal represents the phase of the signal. (i.e if I have 2 sine functions of the same frequency, they can have different complex forms depending on the time shifting). However, most libraries that provide an FFT function will, by default, store FFT coefficients as complex numbers, to facilitate phase and magnitude calculations.
Is it convention that FFT use each column of dataset when plotting a line
I think it is an issue with mathplotlib.plot, not np.fft.
Could you please show me how to apply FFT on each row separately
There are many ways to go around this and I don't want to force you down one path, so I will propose the general solution to iterate over each row of your dataframe and apply the FFT on each specific row. Otherwise, in your case, I believe transposing your output could also work.
We are trying to build a program to get amplitude and frequency list from an .wav file, trying it in Python.
We tried pyaudio for that I don't know much about pyaudio, so I need some suggestions on it.
import scipy
import numpy as np
file = '123.wav'
from scipy.io import wavfile as wav
fs, data = wav.read(file)
length=len(data.shape)
#if length==2:
# data= data.sum(axis=1)/2
n = data.shape[0]
sec = n/float(fs)
ts = 1.00/fs
t = scipy.arange(0,sec,ts)
FFT = abs(scipy.fft(data))
FFT_size = FFT[range(n//2)]
freq = scipy.fftpack.fftfreq(data.size, t[1]-t[0])
max_freq = max(freq)
min_freq = min(freq)
plot_freq(freq, n, t, data)
The actual result returning is frequency list. I also want amplitude list don't know how to get it.
typically a call to an fft api will return an array of imaginary numbers where each array element contains a complex number in the form of ( Areal, AImaginary ) where each element of the array represents a frequency (the value of the freq is implied by array index [find the formula to calc freq based on array index])
on the complex array element 0 represents frequency 0 which is your direct current offset, then freq of each subsequent freq is calculated using
incr_freq := sample_rate / number_of_samples
so for that to be meaningful you must have prior knowledge of the sample rate of your source input time series ( audio or whatever ) and number of samples is just the length of the floating point raw audio curve array you fed into your fft call
... as you iterate across this array of complex numbers calculate the amplitude using the Areal and AImaginary of each frequency bin's complex number using formula
curr_mag = 2.0 * math.Sqrt(curr_real*curr_real+curr_imag*curr_imag) / number_of_samples
as you iterate across the complex array returned from your fft call be aware of notion of Nyquist Limit which means you only consume the first half of the number of elements of that complex array (and double the magnitude of each freq - see formula above)
... see the full pseudocode at Get frequency with highest amplitude from FFT
... I ran your code and nothing happened ... what is the meaning of your python
[range(n//2)]
You possibly want pitch, not spectral frequency, which is a different algorithm than just using an FFT to find the highest magnitude. An FFT returns the entire spectral frequency range (every frequency up to Fs/2, not just one frequency), in your case for the entire file. And the highest magnitude is often not for the pitch frequency (possibly for some high overtone instead).
You also took the FFT of the entire file, not a bunch of FFTs for time slices (usually small overlapping windows) at the time increment you desire for your list's temporal resolution. This will produce a time array of all the FFT frequency arrays (thus, a 2D array). Usually called a spectrogram. There may be a built in function for this in some library.
Can I make amplitude from this formula
the frequency of the wave is set by whatever is driving the oscillation in the medium. Examples are a speaker that sets up a sound wave, or the hand that shakes the end of a stretched string.
the speed of the wave is a property of the medium.
the wavelength of the wave is then determined by the frequency and speed:
λ = v/f
I don't know it gonna be the right process or not
I am trying to use a fast fourier transform to extract the phase shift of a single sinusoidal function. I know that on paper, If we denote the transform of our function as T, then we have the following relations:
However, I am finding that while I am able to accurately capture the frequency of my cosine wave, the phase is inaccurate unless I sample at an extremely high rate. For example:
import numpy as np
import pylab as pl
num_t = 100000
t = np.linspace(0,1,num_t)
dt = 1.0/num_t
w = 2.0*np.pi*30.0
phase = np.pi/2.0
amp = np.fft.rfft(np.cos(w*t+phase))
freqs = np.fft.rfftfreq(t.shape[-1],dt)
print (np.arctan2(amp.imag,amp.real))[30]
pl.subplot(211)
pl.plot(freqs[:60],np.sqrt(amp.real**2+amp.imag**2)[:60])
pl.subplot(212)
pl.plot(freqs[:60],(np.arctan2(amp.imag,amp.real))[:60])
pl.show()
Using num=100000 points I get a phase of 1.57173880459.
Using num=10000 points I get a phase of 1.58022110476.
Using num=1000 points I get a phase of 1.6650441064.
What's going wrong? Even with 1000 points I have 33 points per cycle, which should be enough to resolve it. Is there maybe a way to increase the number of computed frequency points? Is there any way to do this with a "low" number of points?
EDIT: from further experimentation it seems that I need ~1000 points per cycle in order to accurately extract a phase. Why?!
EDIT 2: further experiments indicate that accuracy is related to number of points per cycle, rather than absolute numbers. Increasing the number of sampled points per cycle makes phase more accurate, but if both signal frequency and number of sampled points are increased by the same factor, the accuracy stays the same.
Your points are not distributed equally over the interval, you have the point at the end doubled: 0 is the same point as 1. This gets less important the more points you take, obviusly, but still gives some error. You can avoid it totally, the linspace has a flag for this. Also it has a flag to return you the dt directly along with the array.
Do
t, dt = np.linspace(0, 1, num_t, endpoint=False, retstep=True)
instead of
t = np.linspace(0,1,num_t)
dt = 1.0/num_t
then it works :)
The phase value in the result bin of an unrotated FFT is only correct if the input signal is exactly integer periodic within the FFT length. Your test signal is not, thus the FFT measures something partially related to the phase difference of the signal discontinuity between end-points of the test sinusoid. A higher sample rate will create a slightly different last end-point from the sinusoid, and thus a possibly smaller discontinuity.
If you want to decrease this FFT phase measurement error, create your test signal so the your test phase is referenced to the exact center (sample N/2) of the test vector (not the 1st sample), and then do an fftshift operation (rotate by N/2) so that there will be no signal discontinuity between the 1st and last point in your resulting FFT input vector of length N.
This snippet of code might help:
def reconstruct_ifft(data):
"""
In this function, we take in a signal, find its fft, retain the dominant modes and reconstruct the signal from that
Parameters
----------
data : Signal to do the fft, ifft
Returns
-------
reconstructed_signal : the reconstructed signal
"""
N = data.size
yf = rfft(data)
amp_yf = np.abs(yf) #amplitude
yf = yf*(amp_yf>(THRESHOLD*np.amax(amp_yf)))
reconstructed_signal = irfft(yf)
return reconstructed_signal
The 0.01 is the threshold of amplitudes of the fft that you would want to retain. Making the THRESHOLD greater(more than 1 does not make any sense), will give
fewer modes and cause higher rms error but ensures higher frequency selectivity.
(Please adjust the TABS for the python code)
I am trying to implement an algorithm in python, but I am not sure when I should use fftshift(fft(fftshift(x))) and when only fft(x) (from numpy). Is there a rule of thumb based on the shape of input data?
I am using fftshift instead of ifftshift due to the even number of values in the vector x.
It really just depends on what you want. The DFT (and hence the FFT) is periodic in the frequency domain with period equal to 2pi.
The fft() function will return the approximation of the DFT with omega (radians/s) from 0 to pi (i.e. 0 to fs, where fs is the sampling frequency). All fftshift() does is swap the output vector of the fft() right down the middle. So the output of fftshift(fft()) is now from -pi/2 to pi/2.
Usually, people like to plot a good approximation of the DTFT (or maybe even the CTFT) using the FFT, so they zero-pad the input with a huge amount of zeros (the function fft() does this on it's own) and then they use the fftshift() function to plot between -pi and pi.
In other words, use fftshift(fft()) for plotting, and fft() for the math!
fft(fftshift(x)) rotates the input vector so the the phase of the complex FFT result is relative to the center of the original data window. If the input waveform is not exactly integer periodic in the FFT width, phase relative to the center of the original window of data may make more sense than the phase relative to some averaging between the discontinuous beginning and end. fft(fftshift(x)) also has the property that the imaginary component of a result will always be positive for a positive zero crossing at the center of the window of any antisymmetric waveform component.
fftshift(fft(y)) rotates the FFT results so that the DC bin is in the center of the result, halfway between -Fs/2 and Fs/2, which is a common spectrum display format.