I am trying to understand what scipy.signal.spectrogram()'s outputs are, and how to use them. Currently, I read a .wav file and generate a spectrogram.
from scipy.io import wavfile as wav
from scipy import signal
sample_rate, data = wav.read('sound.wav')
f, t, Sxx = signal.spectrogram(data, sample_rate)
--
In case I'm understanding this completely wrong, my idea of a spectrogram is a 3D graph consisting of:
x-axis: time
y-axis: frequency
pixel colour/brightness: amplitude
So I'm wondering how f, t and Sxx relate to the time, frequency, and amplitude.
Thanks for reading, any help is appreciated!
f is the frequency array, containing the frequency of each band of the FFT; it can be used as the labels for a graph.
t is the time array, containing the time at which each FFT was taken relative to the source signal. Again, it can be used for labels.
The Sxx array contains the amplitudes and is a 2D array whose shape is the length of f by the length of t.
Therefore, the axis whose length matches the time array is the time axis, and the other is the frequency axis.
You will need to find the min and max values of the Sxx array yourself if you want to normalise it for display.
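Putting the three outputs together, a minimal plotting sketch might look like this (assuming sound.wav from the question is a mono file; a stereo file would need one channel selected first):
import matplotlib.pyplot as plt
from scipy import signal
from scipy.io import wavfile as wav

sample_rate, data = wav.read('sound.wav')
f, t, Sxx = signal.spectrogram(data, sample_rate)

# t labels the x-axis, f labels the y-axis, Sxx supplies the colour of each pixel
plt.pcolormesh(t, f, Sxx, shading='gouraud')
plt.xlabel('Time [s]')
plt.ylabel('Frequency [Hz]')
plt.colorbar(label='Power')
plt.show()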
Related
I have a dataset, a CSV file, representing a wave like the one shown below. I would like to find the frequency of the oscillations, so I have done an FFT, but its output peaks at zero. I am new to Python and FFTs, so I am not sure what I am doing wrong.
The data is captured at 300 Hz (300 data points per second). The data set contains 6317 values.
[image1]
Every peak has a wave following it. Here is an example at data points from 250 to 350
[image2]
import matplotlib.pyplot as plt
import csv
import numpy as np

csvfile = open('./abc.csv')
csvreader = csv.reader(csvfile)
readdata = next(csvreader)          # the file holds all 6317 values on a single row
csvfile.close()

data = np.array([readdata], dtype='float')
data1 = data.reshape(6317,)         # flatten to a 1D array of samples

sp = np.fft.fft(data1)
sp_mag = np.abs(sp)/data1.size      # normalised magnitude spectrum
freq = np.fft.fftfreq(data1.shape[-1])   # bin frequencies in cycles per sample

plt.subplot(2,1,1)
plt.plot(data1)
plt.subplot(2,1,2)
plt.plot(freq, sp_mag)
plt.show()
The CSV is available here.
The first three peaks share one frequency and the next three share another, so in the FFT I expect two peaks at different frequencies.
Any help is really appreciated. Kindly let me know if any other data is needed to answer this question.
The value of the FFT at 0 is proportional to the sum of the data. Probably the easiest fix is to subtract off the mean of the data before taking the FFT (assuming you don't care about the constant offset).
Adopting the notation from Wikipedia:
X[m] = sum[ x[n]*exp(-i*2*pi*n*m/N) ]
(X is the FFT, x is the original data)
For m=0, the exponential factors are all ==1, so X[0] == sum[x[n]] (for this convention on where to put the normalization factors).
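A minimal sketch of that fix (the 300 Hz sampling rate is from the question; the toy 2 Hz signal with a DC offset stands in for the CSV data):
import numpy as np
import matplotlib.pyplot as plt

fs = 300.0                                   # sampling rate from the question
t = np.arange(6317) / fs
data1 = 5.0 + np.sin(2 * np.pi * 2.0 * t)    # stand-in for the CSV data: a 2 Hz wave on a DC offset

detrended = data1 - data1.mean()             # remove the mean so X[0] no longer dominates
sp_mag = np.abs(np.fft.fft(detrended)) / detrended.size
freq = np.fft.fftfreq(detrended.size, d=1.0/fs)   # bin frequencies in Hz

plt.plot(freq, sp_mag)
plt.xlabel('Frequency [Hz]')
plt.ylabel('Magnitude')
plt.show()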
I want to apply a Fourier transform, using the fft function, to my time-series data to find "patterns" by extracting the dominant frequency components in the observed data, i.e. the lowest 5 dominant frequencies, to predict the y value (bacteria count) at the end of each time series.
I would like to keep the smallest 5 coefficients as features and eliminate the rest.
My code is as below:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df = pd.read_csv('/content/drive/My Drive/df.csv', sep=',')
X = df.iloc[0:2, 0:10000]

dft_X = np.fft.fft(X)
print(dft_X)
print(len(dft_X))
plt.plot(dft_X)
plt.grid(True)
plt.show()

# What is the graph about (freq/amplitude)? How much data did it use?
for i in dft_X:
    m = i[np.argpartition(i, 5)[:5]]
    n = i[np.argpartition(i, range(5))[:5]]
    print(m, '\n', n)
Here is the output:
But I am not sure how to interpret this graph. To be precise,
1) Does the graph show the transformed values of the input data? I only used 2 rows of data (each row is a time series), so the data is 2x10000; why are there so many lines in the graph?
2) To obtain frequency value, should I use np.fft.fftfreq(n, d=timestep)?
Parameters:
n : int
Window length.
d : scalar, optional
Sample spacing (inverse of the sampling rate). Defaults to 1.
Returns:
f : ndarray
Array of length n containing the sample frequencies.
How do I determine n (the window length) and the sample spacing?
3) Why are transformed values all complex numbers?
Thanks
I'm gonna answer in reverse order of your questions
3) Why are transformed values all complex numbers?
The output of a Fourier Transform is always complex numbers. To get around this fact, you can either apply the absolute value on the output of the transform, or only plot the real part using:
plt.plot(dft_X.real)
2) To obtain frequency value, should I use np.fft.fftfreq(n, d=timestep)?
No, the "frequency values" will be visible on the output of the FFT.
1) Does the graph show the transformed values of the input data? I only used 2 rows of data (each row is a time series), so the data is 2x10000; why are there so many lines in the graph?
Your graph has so many lines because it's drawing a line for each column of your data set. Apply the FFT to each row separately (or possibly just transpose your DataFrame) and then you'll get more meaningful frequency-domain plots, as sketched below.
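A rough sketch of that (the 2 x 10000 shape is from your question; the random array is just a stand-in for the DataFrame values):
import numpy as np
import matplotlib.pyplot as plt

X = np.random.randn(2, 10000)         # stand-in for the 2 x 10000 slice of the DataFrame

dft_X = np.fft.fft(X, axis=1)         # one FFT per row, i.e. per time series
freqs = np.fft.fftfreq(X.shape[1])    # bin frequencies in cycles per sample

for row in np.abs(dft_X):             # one magnitude curve per time series
    plt.plot(freqs, row)
plt.xlabel('Frequency [cycles/sample]')
plt.ylabel('Magnitude')
plt.show()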
Follow up
Would using the absolute value or the real part of the output as features for a later model have a different effect than using the original output?
Absolute values are easier to work with usually.
Using real part
Using absolute value
Here's the Octave code that generated this:
Fs = 4000; % Sampling rate of signal
T = 1/Fs; % Period
L = 4000; % Length of signal
t = (0:L-1)*T; % Time axis
freq = 1000; % Frequency of our sinusoid
sig = sin(freq*2*pi*t); % Fill Time-Domain with 1000 Hz sinusoid
f_sig = fft(sig); % Apply FFT
f = Fs*(0:(L/2))/L; % Frequency axis
figure
plot(f,abs(f_sig/L)(1:end/2+1)); % peak at 1 kHz
figure
plot(f,real(f_sig/L)(1:end/2+1)); % main peak at 1 kHz
In my example, you can see that the absolute value returned no noise at frequencies other than the 1 kHz sinusoid I generated, while the real part had a bigger peak at 1 kHz but also much more noise.
As for effects, I don't know what you mean by that.
Is it expected that "frequency values" are always complex numbers?
Always? No. The Fourier series gives the coefficients with which a sum of sines and cosines exactly reproduces a continuous periodic function. Sines and cosines can be written in complex form through Euler's formula, which is the most convenient way to store Fourier coefficients. In truth, the imaginary part of your frequency-domain signal represents the phase of the signal (i.e. if I have two sine functions of the same frequency, they can have different complex forms depending on the time shift). However, most libraries that provide an FFT function will, by default, store FFT coefficients as complex numbers, to facilitate phase and magnitude calculations.
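As a small illustration of splitting one complex coefficient into magnitude and phase (the numbers here are made up):
import numpy as np

coef = 3.0 + 4.0j             # one FFT coefficient
magnitude = np.abs(coef)      # 5.0: strength of that frequency component
phase = np.angle(coef)        # ~0.927 rad: time shift of that component
print(magnitude, phase)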
Is it a convention that the FFT uses each column of the dataset when plotting a line?
I think it is an issue with matplotlib's plot, not np.fft.
Could you please show me how to apply the FFT to each row separately?
There are many ways to go about this and I don't want to force you down one path, so I will propose the general solution: iterate over each row of your dataframe and apply the FFT to each row, as sketched below. Otherwise, in your case, I believe transposing your output could also work.
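A rough sketch of that row-by-row approach (the DataFrame here is a random stand-in with the 2 x 10000 shape from the question):
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(2, 10000))   # stand-in for the real data

row_spectra = []
for _, row in df.iterrows():                   # one FFT per row, i.e. per time series
    row_spectra.append(np.fft.fft(row.to_numpy()))

row_spectra = np.array(row_spectra)            # shape (2, 10000): one spectrum per row
print(row_spectra.shape)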
We are trying to build a program to get an amplitude list and a frequency list from a .wav file, in Python.
We tried pyaudio for that, but I don't know much about pyaudio, so I need some suggestions on it.
import numpy as np
from scipy.io import wavfile as wav
from scipy.fftpack import fft, fftfreq

file = '123.wav'
fs, data = wav.read(file)

length = len(data.shape)
#if length==2:
#    data = data.sum(axis=1)/2    # stereo file: average the two channels

n = data.shape[0]             # number of samples
sec = n/float(fs)             # duration in seconds
ts = 1.00/fs                  # sample spacing
t = np.arange(0, sec, ts)     # time axis

FFT = abs(fft(data))
FFT_size = FFT[range(n//2)]              # keep the first half, up to the Nyquist frequency
freq = fftfreq(data.size, t[1]-t[0])     # bin frequencies in Hz
max_freq = max(freq)
min_freq = min(freq)
plot_freq(freq, n, t, data)              # plotting helper (not shown in the question)
The actual result it returns is the frequency list. I also want the amplitude list, but I don't know how to get it.
Typically a call to an FFT API will return an array of complex numbers, where each array element contains a complex number of the form (A_real, A_imaginary), and each element of the array represents a frequency (the value of the frequency is implied by the array index; see the formula below to calculate the frequency from the array index).
In the complex array, element 0 represents frequency 0, which is your direct-current (DC) offset; the frequency of each subsequent element is then calculated using
incr_freq := sample_rate / number_of_samples
So for that to be meaningful, you must have prior knowledge of the sample rate of your source input time series (audio or whatever), and the number of samples is just the length of the floating-point raw audio array you fed into your FFT call.
... as you iterate across this array of complex numbers, calculate the amplitude using the A_real and A_imaginary parts of each frequency bin's complex number with the formula
curr_mag = 2.0 * math.Sqrt(curr_real*curr_real + curr_imag*curr_imag) / number_of_samples
... as you iterate across the complex array returned from your FFT call, be aware of the Nyquist limit, which means you only consume the first half of the elements of that complex array (and double the magnitude of each frequency; see the formula above)
... see the full pseudocode at Get frequency with highest amplitude from FFT
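A rough Python sketch of that recipe (the function name and the 440 Hz toy tone are my own; this just mirrors the pseudocode above, not any particular library):
import numpy as np

def amplitude_and_frequency_lists(samples, sample_rate):
    # one FFT over the whole signal, then keep the first half of the bins
    n = len(samples)
    spectrum = np.fft.fft(samples)               # complex coefficients
    incr_freq = sample_rate / n                  # frequency step between bins
    freqs, amps = [], []
    for k in range(n // 2):                      # stop at the Nyquist limit
        freqs.append(k * incr_freq)
        amps.append(2.0 * abs(spectrum[k]) / n)  # 2 * sqrt(re^2 + im^2) / n
    return freqs, amps

# toy usage: a 440 Hz tone sampled at 8 kHz should peak near 440
fs = 8000
t = np.arange(fs) / fs
freqs, amps = amplitude_and_frequency_lists(np.sin(2 * np.pi * 440 * t), fs)
print(freqs[int(np.argmax(amps))])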
... I ran your code and nothing happened ... what is the meaning of your python
[range(n//2)]
You possibly want pitch, not spectral frequency, which requires a different algorithm than just using an FFT to find the highest magnitude. An FFT returns the entire spectral frequency range (every frequency up to Fs/2, not just one frequency), in your case for the entire file. And the highest magnitude is often not at the pitch frequency (it may be at some high overtone instead).
You also took the FFT of the entire file, not a bunch of FFTs over time slices (usually small overlapping windows) at the time increment you desire for your list's temporal resolution. That would produce a time array of FFT frequency arrays (thus, a 2D array), usually called a spectrogram. There may be a built-in function for this in some library; see the sketch below.
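SciPy, for example, has one; a minimal sketch using the file name from your question (a mono file is assumed; a stereo file would need one channel selected first):
from scipy import signal
from scipy.io import wavfile as wav

fs, data = wav.read('123.wav')
f, t, Sxx = signal.spectrogram(data, fs)   # many short, overlapping, windowed FFTs over time

# Sxx[i, j] is the power of frequency f[i] at time t[j]
print(Sxx.shape, f.shape, t.shape)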
Can I get the amplitude from this formula?
the frequency of the wave is set by whatever is driving the oscillation in the medium. Examples are a speaker that sets up a sound wave, or the hand that shakes the end of a stretched string.
the speed of the wave is a property of the medium.
the wavelength of the wave is then determined by the frequency and speed:
λ = v/f
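For example, a 440 Hz tone travelling through air at roughly 343 m/s has a wavelength of about 343/440 ≈ 0.78 m.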
I don't know whether this is the right process or not.
I perform a Short Time Fourier Transform as described here.
from scipy.signal import stft
f, t, Zxx = stft(data)
As far as I understand, I get the following objects: (1) a 1D array containing the frequency values, (2) a 1D array containing the time values, and (3) a 2D array containing the intensity of a given frequency at a given moment in time.
My question is about how to control / modify the grid of frequencies. By default I got a grid of 129 frequencies. The first thing I would like to do is increase the number of frequencies (to have a more granular grid).
In addition to that, it would be nice to be able to specify what frequencies range should be used.
As Uvar said, the range of observable frequencies is limited by the parameter nperseg. Given n samples, one can observe only n/2 + 1 frequencies, namely the frequencies fs*k/n with k = 0, 1, 2, ..., n/2, where fs is the sampling frequency and n is nperseg. Anything higher is lost due to aliasing. This is a mathematical limitation; there is nothing SciPy can do about it. To have a sufficiently granular list of frequencies, increase nperseg. The default value nperseg = 256 gives 256/2 + 1 = 129 frequencies.
The discrete Fourier transform gives you all observable frequencies at once, it is not possible to choose a custom range. Of course, you can slice the output f to select the range of frequencies of interest.
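A small sketch of both points (the signal and the 1000 Hz sampling rate here are my own toy values, not from the question):
import numpy as np
from scipy.signal import stft

fs = 1000.0                                                    # toy sampling rate
x = np.sin(2 * np.pi * 50.0 * np.arange(10 * int(fs)) / fs)    # 10 s of a 50 Hz tone

# larger nperseg -> more (and finer-spaced) frequency bins: nperseg/2 + 1 of them
f, t, Zxx = stft(x, fs=fs, nperseg=1024)
print(len(f))                                                  # 513 frequencies instead of the default 129

# selecting a range of interest afterwards, by slicing f and the matching rows of Zxx
mask = (f >= 20.0) & (f <= 100.0)
f_sel, Zxx_sel = f[mask], Zxx[mask, :]
print(f_sel.min(), f_sel.max())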
I'm trying to apply machine learning algorithms to raw audio. My training would be on the Fourier coefficients of the audio signal.
I was trying to get those and apply ifft to get my audio back, but it doesn't work with my implementation, which is:
import numpy as np
from scipy.fftpack import fft, ifft
from scipy.io import wavfile

fs, data = wavfile.read('dataset piano/wav/music (1).wav')
Te = 0.25      # length of one slice, in seconds
T = 40         # number of slices (40 * 0.25 s = 10 s)
a = data.T[0]  # retrieve first channel

# Put the information in a matrix: one row contains the Fourier coefficients of 0.25 s of music.
# The whole matrix, which has 40 rows, contains information on 10 s of the wav file.
X = np.array([fft(a[int(i*fs*Te):int((i+1)*fs*Te)]) for i in range(T)])

Z = ifft(X.flatten())
Z = Z.astype(data.dtype)
wavfile.write('test3.wav', fs, Z)
Normally it should play the first 10s of the wav file but it doesn't and I really don't understand why. All I get is a high-pitched sound. I am using the fft and ifft from scipy.
You were very close. Just change
Z = ifft(X.flatten())
to
Z = ifft(X).flatten()
What you are doing is computing an inverse Fourier transform on a concatenation of spectra, which really makes no sense. What you want instead, I think, is to concatenate the inverse Fourier transforms of the spectra. This is what I have done, and I managed to reconstruct a signal that sounds right.
ifft(X) will run an IFFT on every array along the last dimension, which is the spectrum dimension in your case, and return an array of the same shape, (40, 11025). flatten will then concatenate the rows, producing a sensible signal.
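As a quick sanity check of the difference between the two calls, here is a sketch with a toy signal standing in for the wav data (the 44.1 kHz rate is assumed, matching the (40, 11025) shape above):
import numpy as np
from scipy.fftpack import fft, ifft

fs, Te, T = 44100, 0.25, 40                       # assumed sample rate; slice length and count from the question
a = np.random.randn(T * int(fs * Te))             # toy stand-in for the first audio channel

X = np.array([fft(a[int(i*fs*Te):int((i+1)*fs*Te)]) for i in range(T)])   # shape (40, 11025)

print(np.allclose(ifft(X).flatten().real, a))     # True: inverting each slice, then concatenating, round-trips
print(np.allclose(ifft(X.flatten()).real, a))     # False: inverting the concatenated spectra does not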