My main task is to recognize a human humming from a microphone in real time. As the first step to recognizing signals in general, I have made a 5 seconds recording of a 440 Hz signal generated from an app on my phone and tried to detect the same frequency.
I used Audacity to plot and verify the spectrum from the same 440Hz wav file and I got this, which shows that 440Hz is indeed the dominant frequency :
(https://i.imgur.com/2UImEkR.png)
To do this with python, I use the PyAudio library and refer this blog. The code I have so far which I run with the wav file is this :
"""PyAudio Example: Play a WAVE file."""
import pyaudio
import wave
import sys
import struct
import numpy as np
import matplotlib.pyplot as plt
CHUNK = 1024
if len(sys.argv) < 2:
print("Plays a wave file.\n\nUsage: %s filename.wav" % sys.argv[0])
sys.exit(-1)
wf = wave.open(sys.argv[1], 'rb')
p = pyaudio.PyAudio()
stream = p.open(format=p.get_format_from_width(wf.getsampwidth()),
channels=wf.getnchannels(),
rate=wf.getframerate(),
output=True)
data = wf.readframes(CHUNK)
i = 0
while data != '':
i += 1
data_unpacked = struct.unpack('{n}h'.format(n= len(data)/2 ), data)
data_np = np.array(data_unpacked)
data_fft = np.fft.fft(data_np)
data_freq = np.abs(data_fft)/len(data_fft) # Dividing by length to normalize the amplitude as per https://www.mathworks.com/matlabcentral/answers/162846-amplitude-of-signal-after-fft-operation
print("Chunk: {} max_freq: {}".format(i,np.argmax(data_freq)))
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
ax.plot(data_freq)
ax.set_xscale('log')
plt.show()
stream.write(data)
data = wf.readframes(CHUNK)
stream.stop_stream()
stream.close()
p.terminate()
In the output, I get that the max frequency is 10 for all the chunks and an example of one of the plots is :
(https://i.imgur.com/zsAXME5.png)
I had expected this value to be 440 instead of 10 for all the chunks. I admit I know very little about the theory of FFTs and I appreciate any help in letting my solve this.
EDIT:
The sampling rate is 44100. no. of channels is 2 and sample width is also 2.
Forewords
As xdurch0 pointed out, you are reading a kind of index instead of a frequency. If you are about to make all computation by yourself you need to compute you own frequency vector before plotting if you want to get consistent result. Reading this answer may help you towards the solution.
The frequency vector for FFT (half plane) is:
f = np.linspace(0, rate/2, N_fft/2)
Or (full plane):
f = np.linspace(-rate/2, rate/2, N_fft)
On the other hand we can delegate most of the work to the excellent scipy.signal toolbox which aims to cope with this kind of problems (and many more).
MCVE
Using scipy package it is straight forward to get the desired result for a simple WAV file with a single frequency (source):
import numpy as np
from scipy import signal
from scipy.io import wavfile
import matplotlib.pyplot as plt
# Read the file (rate and data):
rate, data = wavfile.read('tone.wav') # See source
# Compute PSD:
f, P = signal.periodogram(data, rate) # Frequencies and PSD
# Display PSD:
fig, axe = plt.subplots()
axe.semilogy(f, P)
axe.set_xlim([0,500])
axe.set_ylim([1e-8, 1e10])
axe.set_xlabel(r'Frequency, $\nu$ $[\mathrm{Hz}]$')
axe.set_ylabel(r'PSD, $P$ $[\mathrm{AU^2Hz}^{-1}]$')
axe.set_title('Periodogram')
axe.grid(which='both')
Basically:
Read the wav file and get the sample rate (here 44.1kHz);
Compute the Power Spectrum Density and frequencies;
Then display it with matplotlib.
This outputs:
Find Peak
Then we can find the frequency of the first highest peak (P>1e-2, this criterion is subject to tuning) using find_peaks:
idx = signal.find_peaks(P, height=1e-2)[0][0]
f[idx] # 440.0 Hz
Putting all together it merely boils down to:
def freq(filename, setup={'height': 1e-2}):
rate, data = wavfile.read(filename)
f, P = signal.periodogram(data, rate)
return f[signal.find_peaks(P, **setup)[0][0]]
Handling multiple channels
I tried this code with my wav file, and got the error for the line
axe.semilogy(f, Pxx_den) as follows : ValueError: x and y must have
same first dimension. I checked the shapes and f has (2,) while
Pxx_den has (220160,2). Also, the Pxx_den array seems to have all
zeros only.
Wav file can hold multiple channels, mainly there are mono or stereo files (max. 2**16 - 1 channels). The problem you underlined occurs because of multiple channels file (stereo sample).
rate, data = wavfile.read('aaaah.wav') # Shape: (46447, 2), Rate: 48 kHz
It is not well documented, but the method signal.periodogram also performs on matrix and its input is not directly consistent with wavfile.read output (they perform on different axis by default). So we need to carefully orient dimensions (using axis switch) when performing PSD:
f, P = signal.periodogram(data, rate, axis=0, detrend='linear')
It also works with Transposition data.T but then we need to back transpose the result.
Specifying the axis solve the issue: frequency vector is correct and PSD is not null everywhere (before it performed on the axis=1 which is of length 2, in your case it performed 220160 PSD on 2-samples signals we wanted the converse).
The detrend switch ensure the signal has zero mean and its linear trend is removed.
Real application
This approach should work for real chunked samples, provided chunks hold enough data (see Nyquist-Shannon sampling theorem). Then data are sub-samples of the signal (chunks) and rate is kept constant since it does not change during the process.
Having chunks of size 2**10 seems to work, we can identify specific frequencies from them:
f, P = signal.periodogram(data[:2**10,:], rate, axis=0, detrend='linear') # Shapes: (513,) (513, 2)
idx0 = signal.find_peaks(P[:,0], threshold=0.01, distance=50)[0] # Peaks: [46.875, 2625., 13312.5, 16921.875] Hz
fig, axe = plt.subplots(2, 1, sharex=True, sharey=True)
axe[0].loglog(f, P[:,0])
axe[0].loglog(f[idx0], P[idx0,0], '.')
# [...]
At this point, the trickiest part is the fine tuning of find-peaks method to catch desired frequencies. You may need to consider to pre-filter your signal or post-process the PSD in order to make the identification easier.
Related
I am working with the pyaudio and matplotlib packages for the first time and I am attempting to plot live audio data from microphone input, transform it to frequency domain information, and then output peaks with an input distance. This project is a modification of the three-part guide to build a spectrum analyzer found here.
Currently the code is formatted in a class as I have alternative methods that I am applying to the audio but I am only posting the class with the relevant methods as they don't make reference to each and are self-contained. Another quirk of the program is that it calls upon a local file though it only uses input from the user microphone; this is a leftover from the original functionality of plotting a sound file's intensity while it played and is no longer integral to the code.
import pyaudio
import wave
import struct
import pandas as pd
from scipy.fftpack import fft
from scipy.signal import find_peaks
import matplotlib.pyplot as plt
import numpy as np
class Wave:
def __init__(self, file) -> None:
self.CHUNK = 1024 * 4
self.obj = wave.open(file, "r")
self.callback_output = []
self.data = self.obj.readframes(self.CHUNK)
self.rate = 44100
# Initiate an instance of PyAudio
self.p = pyaudio.PyAudio()
# Open a stream with the file specifications
self.stream = self.p.open(format = pyaudio.paInt16,
channels = self.obj.getnchannels(),
rate = self.rate,
output = True,
input = True,
frames_per_buffer = self.CHUNK)
def fft_plot(self, distance: float):
x_fft = np.linspace(0, self.rate, self.CHUNK)
fig, ax = plt.subplots()
line_fft, = ax.semilogx(x_fft, np.random.rand(self.CHUNK), "-", lw = 2)
# Bind plot window sizes
ax.set_xlim(20, self.rate / 2)
plot_data = self.stream.read(self.CHUNK)
self.data_int = pd.DataFrame(struct.unpack(\
str(self.CHUNK * 2) + 'h', plot_data)).astype(dtype = "b")[::2]
y_fft = fft(self.data_int)
line_fft.set_ydata(np.abs(y_fft[0:self.CHUNK]) / (256 * self.CHUNK))
plt.show(block = False)
while True:
# Read incoming audio data
data = self.stream.read(self.CHUNK)
# Convert data to bits then to array
self.data_int = struct.unpack(str(4 * self.CHUNK) + 'B', data)
# Recompute FFT and update line
yf = fft(self.data_int)
line_data = np.abs(yf[0:self.CHUNK]) / (128 * self.CHUNK)
line_fft.set_ydata(line_data)
# Find all values above threshold
peaks, _ = find_peaks(line_data, distance = distance)
# Update the plot
plt.plot(peaks, line_data[peaks], "x")
fig.canvas.draw()
fig.canvas.flush_events()
# Exit program when plot window is closed
fig.canvas.mpl_connect('close_event', exit)
test_file = "C:/Users/Tam/Documents/VScode/Final Project/PrismGuitars.wav"
audio_test = Wave(test_file)
audio_test.fft_plot(2000)
The code does not throw any errors and runs fine with an okay framerate and only terminates when the plot window is closed, all of which is good. The issue I'm encountering is with the determination and plotting of the peaks of line_data as when I run this code the output over time looks like this matplotlib graph instance.
It seems that the peaks (or peak) are being found but at a lower frequency than the x of line_data and as such are shifted comparatively. The other, more minor, issue is that since this is a live plot I would like to clear the previous instance of the peak marker so that it only shows the current instance and not all of the ones plotted prior.
I have attempted in prior fixes to use the line_fft in the peak detection but as it is cast to a Line2D format the peak detection algorithm isn't able to deal with the data type. I have also tried implementing a list comprehension as seen in this post but the time to cast to list is prohibitively slow and did not return any peak markers when I ran it.
EDIT: Following Jody's input the program now returns the proper values as I was only printing an index for the x-coordinate of the peak marker. Nevertheless I would still appreciate some insight as to whether it is possible to update per marker rather than having all the previous ones constantly displayed.
As for the marker updating I have attempted to clear the plot in the while loop both before and after drawing the markers (in different tests of course) but I only ever end up with a completely blank graph.
Please let me know if there is anything I should clarify and thank you for your time.
As Jody pointed out the peaks variable contains indexes for the detected peaks that then need to be retrieved from x_fft and line_data in order to match up with the displayed data.
First we create a scatter plot:
scat = ax.scatter([], [], c = "purple", marker = "x")
This data can then be stacked using a container variable in the while loop as such:
array_peaks = np.c_[x_fft[peaks], line_data[peaks]]
and update the data in the while loop with:
scat.set_offsets(array_peaks)
firstly, this function is to remove silence of an audio.
here is the official description:
https://librosa.github.io/librosa/generated/librosa.effects.split.html
librosa.effects.split(y, top_db=10, *kargs)
Split an audio signal into non-silent intervals.
top_db:number > 0 The threshold (in decibels) below reference to consider as silence
return: intervals:np.ndarray, shape=(m, 2) intervals[i] == (start_i, end_i) are the start and end time (in samples) of non-silent interval i.
so this is quite straightforward, for any sound which is lower than 10dB, treat it as silence and remove from the audio. It will return me a list of intervals which are non-silent segments in the audio.
So I did a very simple example and the result confuses me:
the audio i load here is a 3 second humand talking, very normal takling.
y, sr = librosa.load(file_list[0]) #load the data
print(y.shape) -> (87495,)
intervals = librosa.effects.split(y, top_db=100)
intervals -> array([[0, 87495]])
#if i change 100 to 10
intervals = librosa.effects.split(y, top_db=10)
intervals -> array([[19456, 23040],
[27136, 31232],
[55296, 58880],
[64512, 67072]])
how is this possible...
I tell librosa, ok, for any sound which is below 100dB, treat it as silence.
under this setting, the whole audio should be treated as silence, and based on the document, it should give me array[[0,0]] something...because after remove silence, there is nothing left...
But it seems librosa returns me the silence part instead of the non-silence part.
librosa.effects.split()
It says in the documentation that it returns a numpy array that contains the intervals which contain non silent audio. These intervals of course depend on the value you assign to the parameter top_db. It does not return any audio, just the start and end points of the non-silent slices of your waveform
In your case, even if you set top_db = 100, it does not treat the entire audio as silence since they state in the documentation that they use The reference power. By default, it uses **np.max** and compares to the peak power in the signal. So setting your top_db higher than the maximal value that exists in your audio will actually result in top_db not having any effect.
Here's an example:
import librosa
import numpy as np
import matplotlib.pyplot as plt
# create a hypothetical waveform with 1000 noisy samples and 1000 silent samples
nonsilent = np.random.randint(11, 100, 1000) * 100
silent = np.zeros(1000)
wave = np.concatenate((nonsilent, silent))
# look at it
print(wave.shape)
plt.plot(wave)
plt.show()
# get the noisy interval
non_silent_interval = librosa.effects.split(wave, top_db=0.1, hop_length=1000)
print(non_silent_interval)
# plot only the noisy chunk of the waveform
plt.plot(wave[non_silent_interval[0][0]:non_silent_interval[0][1]])
plt.show()
# now set top_db higher than anything in your audio
non_silent_interval = librosa.effects.split(wave, top_db=1000, hop_length=1000)
print(non_silent_interval)
# and you'll get the entire audio again
plt.plot(wave[non_silent_interval[0][0]:non_silent_interval[0][1]])
plt.show()
You can see that non silent audio is from 0 to 1000 and the silent audio is from 1000 to 2000 samples:
Here it only gives us the noisy chunk of the wave we created:
And here is with top_db set at a 1000:
That means librosa did everything that it promised to do in the documentation. Hope this helps.
I am trying to use the NumPy library for Python to do some frequency analysis. I have two .wav files that both contain a 440 Hz sine wave. One of them I generated with the NumPy sine function, and the other I generated in Audacity. The FFT works on the Python-generated one, but does nothing on the Audacity one.
Here are links to the two files:
The non-working file: 440_audacity.wav
The working file: 440_gen.wav
This is the code I am using to do the Fourier transform:
import numpy as np
import matplotlib.pyplot as plt
import scipy.io.wavfile as wave
infile = "440_gen.wav"
rate, data = wave.read(infile)
data = np.array(data)
data_fft = np.fft.fft(data)
frequencies = np.abs(data_fft)
plt.subplot(2,1,1)
plt.plot(data[:800])
plt.title("Original wave: " + infile)
plt.subplot(2,1,2)
plt.plot(frequencies)
plt.title("Fourier transform results")
plt.xlim(0, 1000)
plt.tight_layout()
plt.show()
I have two 16-bit PCM .wav files, one from Audacity and one created with the NumPy sine function. The NumPy-generated one gives the following (correct) result, with the spike at 440Hz:
The one I created with Audacity, although the waveform appears identical, does not give any result on the Fourier transform:
I admit I am at a loss here. The two files should contain in effect the same data. They are encoded the same way, and the wave forms appear identical on the upper graph.
Here is the code used to generate the working file:
import numpy as np
import wave
import struct
import matplotlib.pyplot as plt
from operator import add
freq_one = 440.0
num_samples = 44100
sample_rate = 44100.0
amplitude = 12800
file = "440_gen.wav"
s1 = [np.sin(2 * np.pi * freq_one * x/sample_rate) * amplitude for x in range(num_samples)]
sine_one = np.array(s1)
nframes = num_samples
comptype = "NONE"
compname="not compressed"
nchannels = 1
sampwidth = 2
wav_file = wave.open(file, 'w')
wav_file.setparams((nchannels, sampwidth, int(sample_rate), nframes, comptype, compname))
for s in sine_one:
wav_file.writeframes(struct.pack('h', int(s)))
Let me explain why your code doesn't work. And why it works with [:44100].
First of all, you have different files:
440_gen.wav = 1 sec and 44100 samples (counts)
440_audacity.wav = 5 sec and 220500 samples (counts)
Since for 440_gen.wav in FFT you use the number of reference points N=44100 and the sample rate 44100, your frequency resolution is 1 Hz (bins are followed in 1 Hz increments).
Therefore, on the graph, each FFT sample corresponds to a delta equal to 1 Hz.
plt.xlim(0, 1000) just corresponds to the range 0-1000 Hz.
However, for 440_audacity.wav in FFT, you use the number of reference points N=220500 and the sample rate 44100. Your frequency resolution is 0.2 Hz (bins follow in 0.2 Hz increments) - on the graph, each FFT sample corresponds to a frequency in 0.2 Hz increments (min-max = +(-) 22500 Hz).
plt.xlim(0, 1000) just corresponds to the range 1000x0.2 = 0-200 Hz.
That is why the result is not visible - it does not fall within this range.
plt.xlim (0, 5000) will correct your situation and extend the range to 0-1000 Hz.
The solution [:44100] that jwalton brought in really only forces the FFT to use N = 44100. And this repeats the situation with the calculation for 440_gen.wav
A more correct solution to your problem is to use the N (Windows Size) parameter in the code and the np.fft.fftfreq() function.
Sample code below.
I also recommend an excellent article https://realpython.com/python-scipy-fft/
import numpy as np
import matplotlib.pyplot as plt
import scipy.io.wavfile as wave
N = 44100 # added
infile = "440_audacity.wav"
rate, data = wave.read(infile)
data = np.array(data)
data_fft = np.fft.fft(data, N) # added N
frequencies = np.abs(data_fft)
x_freq = np.fft.fftfreq(N, 1/44100) # added
plt.subplot(2,1,1)
plt.plot(data[:800])
plt.title("Original wave: " + infile)
plt.subplot(2,1,2)
plt.plot(x_freq, frequencies) # added x_freq
plt.title("Fourier transform results")
plt.xlim(0, 1000)
plt.tight_layout()
plt.show()
Since answering this question #Konyukh Fyodorov was able to provide a better and properly justified solution (below).
The following worked for me and produced the plots as expected. Unfortunately I cannot piece together quite why this works, but I'm sharing this solution in the hope it may assist someone else to make that leap.
import numpy as np
import matplotlib.pyplot as plt
import scipy.io.wavfile as wave
infile = "440_gen.wav"
rate, data = wave.read(infile)
data = np.array(data)
# Use first 44100 datapoints in transform
data_fft = np.fft.fft(data[:44100])
frequencies = np.abs(data_fft)
plt.subplot(2,1,1)
plt.plot(data[:800])
plt.title("Original wave: " + infile)
plt.subplot(2,1,2)
plt.plot(frequencies)
plt.title("Fourier transform results")
plt.xlim(0, 1000)
plt.tight_layout()
plt.show()
I have python 3.4.
I transmitted a 2MHz (for example) frequency and received the cavitation over the time (until I stopped the measurement).
I want to get a spectrogram (cavitation vs frequency) and more interesting is a spectrogram of cavitation over the time of the sub-harmonic (1MHz) frequency.
The data is saved in sdataA (=cavitation), and t (=measurement time)
I tried to save fft in FFTA
FFTA = np.array([])
FFTA = np.fft.fft(dataA)
FFTA = np.append(FFTA, dataA)
I got real and complex numbers
Then I took only half (from 0 to 1MHz) and save the real and complex data.
nA = int(len(FFTA)/2)
yAre = FFTA[range(nA)].real
yAim = FFTA[range(nA)].imag
I tried to get the frequencies by:
FFTAfreqs = np.fft.fftfreq(len(yAre))
But it is totally wrong (I printed the data by print (FFTAfreqs))
I also plotted the data and again it's wrong:
plt.plot(t, FFTA[range(n)].real, 'b-', t, FFTA[range(n)].imag, 'r--')
plt.legend(('real', 'imaginary'))
plt.show()
How can I output a spectrogram of cavitation over the time of the sub-harmonic (1MHz) frequency?
EDIT:
Data example:
see a sample of 'dataA' and 'time':
dataA = [6.08E-04,2.78E-04,3.64E-04,3.64E-04,4.37E-04,4.09E-04,4.49E-04,4.09E-04,3.52E-04,3.24E-04,3.92E-04,3.24E-04,2.67E-04,3.24E-04,2.95E-04,2.95E-04,4.94E-04,4.09E-04,3.64E-04,3.07E-04]
time = [0.00E+00,4.96E-07,9.92E-07,1.49E-06,1.98E-06,2.48E-06,2.98E-06,3.47E-06,3.97E-06,4.46E-06,4.96E-06,5.46E-06,5.95E-06,6.45E-06,6.94E-06,7.44E-06,7.94E-06,8.43E-06,8.93E-06,9.42E-06]
EDIT II:
From #Martin example I tried the following code, please let me know if I did it right.
In the case that dataA and Time are saved as h5 files (or the data that I posted already)
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
dfdata = pd.read_hdf("C:\\data_python\\DataA.h5")
dft = pd.read_hdf("C:\\data_python\\time.h5")
dft_cor = int((len(dft)-2)*4.96E-6) # calculating the measured time
fs = 2000000 #sampling frequency 2MHz
CHUNK = 10000
signal_time = dft_cor # seconds
def sine(freq,fs,secs):
data=dfdata
wave = np.sin(freq*2*np.pi*data)
return wave
a1 = sine(fs,fs,120)
a2 = sine(fs/2,fs,120)
signal = a1+a2
afft = np.abs(np.fft.fft(signal[0:CHUNK]))
freqs = np.linspace(0,fs,CHUNK)[0:int(fs/2)]
spectrogram_chunk = freqs/np.amax(freqs*1.0)
# Plot spectral analysis
plt.plot(freqs[0:1000000],afft[0:1000000]) # 0-1MHz
plt.show()
number_of_chunks = 1000
# Empty spectrogram
Spectrogram = np.zeros(shape = [CHUNK,number_of_chunks])
for i in range(number_of_chunks):
afft = np.abs(np.fft.fft(signal[i*CHUNK:(1+i)*CHUNK]))
freqs = np.linspace(0,fs,CHUNK)[0:int(fs/2)]
spectrogram_chunk = afft/np.amax(afft*1.0)
try:
Spectrogram[:,i]=spectrogram_chunk
except:
break
import cv2
Spectrogram = Spectrogram[0:1000000,:]
cv2.imshow('spectrogram',np.uint8(255*Spectrogram/np.amax(Spectrogram)))
cv2.waitKey()
cv2.destroyAllWindows()
It seems your problem is not in Python but in understanding what is Spectrogram.
Spectrogram is sequences of spectral analysis of a signal.
1) You need to cut your signal in CHUNKS.
2) Do spectral analysis of these CHUNKS and stick it together.
Example:
You have 1 second of audio recoding (44100 HZ sampling). That means the recording will have 1s * 44100 -> 44100 samples. You define CHUNK size = 1024 (for example).
For each chunk you will do FFT, and stick it together into 2D matrix (X axis - FFT of the CHUNK, Y axis - CHUNK number,). 44100 samples / CHUNK ~ 44 FFTs, each of the FFT covers 1024/44100~0.023 seconds of the signal
The bigger the CHUNK, the more accurate Spectrogram is, but less 'realtime'.
The smaller the CHUNK is, the less acurate is the Spectrogram, but you have more measurements as you measure frequencies 'more often'.
If you need 1MHZ - actually you cannot use anything higher than 1MHZ, you just take half of the resulting FFT array - and it doesnt matter which half, because 1MHZ is just the half of your sampling frequency, and the FFT is mirroring anything that is higher than 1/2 of sampling frequency.
About FFT, you dont want complex numbers. You want to do
FFT = np.abs(FFT) # Edit - I just noticed you use '.real', but I will keep it here
because you want real numbers.
Preparation for Spectrogram - example of Spectrogram
Audio Signal with 150HZ wave and 300HZ Wave
import numpy as np
import matplotlib.pyplot as plt
fs = 44100#sampling frequency
CHUNK = 10000
signal_time = 20 # seconds
def sine(freq,fs,secs):
data=np.arange(fs*secs)/(fs*1.0)
wave = np.sin(freq*2*np.pi*data)
return wave
a1 = sine(150,fs,120)
a2 = sine(300,fs,120)
signal = a1+a2
afft = np.abs(np.fft.fft(signal[0:CHUNK]))
freqs = np.linspace(0,fs,CHUNK)[0:int(fs/2)]
spectrogram_chunk = freqs/np.amax(freqs*1.0)
# Plot spectral analysis
plt.plot(freqs[0:250],afft[0:250])
plt.show()
number_of_chunks = 1000
# Empty spectrogram
Spectrogram = np.zeros(shape = [CHUNK,number_of_chunks])
for i in range(number_of_chunks):
afft = np.abs(np.fft.fft(signal[i*CHUNK:(1+i)*CHUNK]))
freqs = np.linspace(0,fs,CHUNK)[0:int(fs/2)]
#plt.plot(spectrogram_chunk[0:250],afft[0:250])
#plt.show()
spectrogram_chunk = afft/np.amax(afft*1.0)
#print(signal[i*CHUNK:(1+i)*CHUNK].shape)
try:
Spectrogram[:,i]=spectrogram_chunk
except:
break
import cv2
Spectrogram = Spectrogram[0:250,:]
cv2.imshow('spectrogram',np.uint8(255*Spectrogram/np.amax(Spectrogram)))
cv2.waitKey()
cv2.destroyAllWindows()
Spectral analysis of single CHUNK
Spectrogram
I am working with python and would like to perform the following. I have a wav audio file that I would like to read and plot frequency response for. I am only interested in the time window of 3-4 seconds, not the entire file. Also, I would like to resample my input file to 48k, instead of 192k which it comes in as.
I would like my plot to be with lines, of FFT length 8192, Hamming window, logx scale from 20 - 20k Hz.
Not hard to do in Python, you just have to install some packages:
import numpy as np
from scipy.io import wavfile
from scipy import signal
from matplotlib import pyplot as plt
sr, x = wavfile.read('file.wav')
x = signal.decimate(x, 4)
x = x[48000*3:48000*3+8192]
x *= np.hamming(8192)
X = abs(np.fft.rfft(x))
X_db = 20 * np.log10(X)
freqs = np.fft.rfftfreq(8192, 1/48000)
plt.plot(freqs, X_db)
plt.show()
What I do not understand, your time window of 3-4 seconds. Do you mean the window from 3 seconds on? (That is done in the code above.) Or do yo mean a window of 3 seconds duration? Then the window must be 3*48000 samples long.
Matlab is the easiest:
[x,fs] = audioread('file.wav');
;; downsample 4:1
x = resample(x, 4, 1);
;; snip 8192 samples 3 seconds in
x = x(48000*3:48000*3+8192);
plot(abs(fft(x));
I'll leave it to you to get the plot formatted the way you desire but just a hint is that you'll need to construct a frequency axis and snip the desired bins out of the fft.