Get frequency value (hz) in split wav audio file by python - python

I have a simple case: I'm recording bee sounds in a beehive with Python's PyAudio. After recording, I have to split the recording into 10-second chunks and analyze them with Python's wave and numpy, or with scipy and numpy; I don't know which is easiest.
I would like to read the recording, split it into 10-second chunks, apply fft or rfft to each chunk, and then get the dominant frequencies in Hz for each chunk.
I collect these values with timestamps so that I can plot them as a histogram.
Right now I have some examples where I can load the whole recording and plot it with matplotlib and scipy.signal, but I don't know how to separate it into a list of Hz values.
If I'm going about this completely the wrong way, please tell me. Thanks for any advice.
import numpy as np
import struct, wave

def main():
    audio = wave.open('wav_data/18-08-07_09_10_12.wav', 'rb')
    rate = audio.getframerate()
    num_frames = audio.getnframes()
    dur = int(num_frames / rate)
    fmt = "%ih" % rate
    for x in range(dur):
        data = audio.readframes(rate)
        data_int = struct.unpack(fmt, data)
        data_np = np.array(data_int, dtype='b')
        w = np.fft.rfft(data_np)
        freqs = np.fft.fftfreq(len(w))
        idx = np.argmax(w)
        freq = freqs[idx]
        freq_in_hz = abs(freq * rate)
        print(freq_in_hz)

if __name__ == "__main__":
    main()
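One possible way to get a dominant frequency per 10-second chunk is sketched below. It is a minimal sketch under stated assumptions, not a definitive implementation: the function name dominant_frequencies and the synthetic test signal are hypothetical, and the recording is assumed to be mono 16-bit PCM (for a real file, the samples could be obtained with np.frombuffer(audio.readframes(n), dtype=np.int16)). Compared with the snippet above, it keeps the samples in a wide enough dtype (dtype='b' is 8-bit and cannot hold 16-bit samples), takes the magnitude of the spectrum before argmax, and uses np.fft.rfftfreq so that the frequency axis matches the rfft output.

```python
import numpy as np

def dominant_frequencies(samples, rate, chunk_seconds=10):
    """Return one dominant frequency (Hz) per full chunk of `samples`."""
    chunk_len = int(rate * chunk_seconds)
    n_chunks = len(samples) // chunk_len               # drop the trailing partial chunk
    freqs = np.fft.rfftfreq(chunk_len, d=1.0 / rate)   # bin centres in Hz
    window = np.hanning(chunk_len)                     # reduces spectral leakage
    result = []
    for i in range(n_chunks):
        chunk = samples[i * chunk_len:(i + 1) * chunk_len].astype(np.float64)
        chunk = chunk - chunk.mean()                   # remove DC so bin 0 does not win
        magnitude = np.abs(np.fft.rfft(chunk * window))
        result.append(freqs[np.argmax(magnitude)])
    return np.array(result)

# Synthetic stand-in for a recording (assumption): 20 s at 8 kHz,
# 250 Hz for the first 10 s and 400 Hz for the next 10 s.
rate = 8000
t = np.arange(10 * rate) / rate
samples = np.concatenate([np.sin(2 * np.pi * 250 * t),
                          np.sin(2 * np.pi * 400 * t)])
samples = (samples * 20000).astype(np.int16)           # same dtype as 16-bit WAV data
peaks = dominant_frequencies(samples, rate)
print(peaks)
```

With 10-second chunks the bin spacing is 0.1 Hz, so each value in peaks is accurate to about that resolution. Pairing peaks[i] with the start time i * chunk_seconds gives the timestamped values needed for a histogram.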

Related

Preferred way to write audio data to a WAV file?

I am trying to write an audio file using python's wave and numpy. So far I have the following and it works well:
import wave
import numpy as np
# set up WAV file parameters
num_channels = 1 # mono audio
sample_width = 1 # 8 bits(1 byte)/sample
sample_rate = 44.1e3 # 44.1k samples/second
frequency = 440 # 440 Hz
duration = 20 # play for this many seconds
num_samples = int(sample_rate * duration) # samples/seconds * seconds
# open WAV file and write data
with wave.open('sine8bit_2.wav', 'w') as wavfile:
    wavfile.setnchannels(num_channels)
    wavfile.setsampwidth(sample_width)
    wavfile.setframerate(sample_rate)
    t = np.linspace(0, duration, num_samples)
    data = (127*np.sin(2*np.pi*frequency*t)).astype(np.int8)
    wavfile.writeframes(data) # or data.tobytes() ??
My issue is that since I am using a high sampling rate, the num_samples variable might quickly become too large (9261000 samples for a track of 3 minutes 30 seconds, say). Would using a numpy array this large be advisable? Is there a better way of going about this? Also, is writeframes(data.tobytes()) needed in this case? My code runs fine without it, and it seems like extra overhead (especially if the arrays get too large).
Assuming you are only going to write a sine wave, you could very well create only one period as your data array and write that several times to the .wav file.
Using the parameters you provided, your data array is 8800 times smaller with that approach. Its size also no longer depends on the duration of your file!
import wave
import numpy as np
# set up WAV file parameters
num_channels = 1 # mono audio
sample_width = 1 # 8 bits(1 byte)/sample
sample_rate = 44.1e3 # 44.1k samples/second
frequency = 440 # 440 Hz
duration = 20 # play for this many seconds
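# Create a single period of the sine wave.
# n is rounded to a whole number of samples, so the played-back frequency is
# sample_rate/n, which is only approximately `frequency`.
n = round(sample_rate/frequency)
# endpoint=False: the first sample of the next period supplies t = 1/frequency
t = np.linspace(0, 1/frequency, n, endpoint=False)
data = (127*np.sin(2*np.pi*frequency*t)).astype(np.int8)
periods = round(frequency*duration)
# open WAV file and write data
with wave.open('sine8bit_2.wav', 'w') as wavfile:
    wavfile.setnchannels(num_channels)
    wavfile.setsampwidth(sample_width)
    wavfile.setframerate(sample_rate)
    for _ in range(periods):
        wavfile.writeframes(data)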
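For signals that are not a pure repeating period, another option is to generate and write the audio block by block, so that memory use depends on the block size rather than on the duration. The following is a rough sketch of that idea under a few assumptions: the function name write_sine and the block_size value are hypothetical, and an in-memory io.BytesIO stands in for a file path. Two details are worth noting: 8-bit PCM in WAV files is stored as unsigned bytes, so the sketch shifts the samples by 128 instead of using int8, and writeframes accepts bytes-like objects, so the explicit tobytes() call should be optional.

```python
import io
import wave
import numpy as np

def write_sine(target, frequency=440.0, duration=2.0, sample_rate=44100,
               block_size=65536):
    """Write an 8-bit mono sine wave block by block; returns the frame count."""
    total = int(sample_rate * duration)
    with wave.open(target, 'wb') as wavfile:
        wavfile.setnchannels(1)
        wavfile.setsampwidth(1)              # 8-bit PCM, stored as unsigned bytes
        wavfile.setframerate(sample_rate)
        for start in range(0, total, block_size):
            n = np.arange(start, min(start + block_size, total))
            # Using the absolute sample index keeps the phase continuous
            # from one block to the next.
            block = np.round(127 * np.sin(2 * np.pi * frequency * n / sample_rate))
            # Shift the range [-127, 127] to the unsigned range [1, 255].
            wavfile.writeframes((block + 128).astype(np.uint8).tobytes())
    return total

buffer = io.BytesIO()                        # in-memory stand-in for a file path
frames_written = write_sine(buffer)
buffer.seek(0)
with wave.open(buffer, 'rb') as check:
    frames_read = check.getnframes()
print(frames_written, frames_read)
```

Only one block of block_size samples is held in memory at a time, regardless of how long the track is.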

FFT spectrogram in python

I have python 3.4.
I transmitted a 2MHz (for example) frequency and received the cavitation over the time (until I stopped the measurement).
I want to get a spectrogram (cavitation vs frequency), and more interesting still is a spectrogram of the cavitation over time at the sub-harmonic (1MHz) frequency.
The data is saved in dataA (=cavitation) and t (=measurement time).
I tried to save the FFT in FFTA:
FFTA = np.array([])
FFTA = np.fft.fft(dataA)
FFTA = np.append(FFTA, dataA)
I got real and complex numbers
Then I took only half (from 0 to 1MHz) and save the real and complex data.
nA = int(len(FFTA)/2)
yAre = FFTA[range(nA)].real
yAim = FFTA[range(nA)].imag
I tried to get the frequencies by:
FFTAfreqs = np.fft.fftfreq(len(yAre))
But it is totally wrong (I printed the data by print (FFTAfreqs))
I also plotted the data and again it's wrong:
plt.plot(t, FFTA[range(n)].real, 'b-', t, FFTA[range(n)].imag, 'r--')
plt.legend(('real', 'imaginary'))
plt.show()
How can I output a spectrogram of cavitation over the time of the sub-harmonic (1MHz) frequency?
EDIT:
Data example:
see a sample of 'dataA' and 'time':
dataA = [6.08E-04,2.78E-04,3.64E-04,3.64E-04,4.37E-04,4.09E-04,4.49E-04,4.09E-04,3.52E-04,3.24E-04,3.92E-04,3.24E-04,2.67E-04,3.24E-04,2.95E-04,2.95E-04,4.94E-04,4.09E-04,3.64E-04,3.07E-04]
time = [0.00E+00,4.96E-07,9.92E-07,1.49E-06,1.98E-06,2.48E-06,2.98E-06,3.47E-06,3.97E-06,4.46E-06,4.96E-06,5.46E-06,5.95E-06,6.45E-06,6.94E-06,7.44E-06,7.94E-06,8.43E-06,8.93E-06,9.42E-06]
EDIT II:
Following @Martin's example I tried the following code; please let me know if I did it right.
This is for the case where dataA and time are saved as h5 files (or the data that I posted already).
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
dfdata = pd.read_hdf("C:\\data_python\\DataA.h5")
dft = pd.read_hdf("C:\\data_python\\time.h5")
dft_cor = int((len(dft)-2)*4.96E-6) # calculating the measured time
fs = 2000000 #sampling frequency 2MHz
CHUNK = 10000
signal_time = dft_cor # seconds
def sine(freq,fs,secs):
    data=dfdata
    wave = np.sin(freq*2*np.pi*data)
    return wave
a1 = sine(fs,fs,120)
a2 = sine(fs/2,fs,120)
signal = a1+a2
afft = np.abs(np.fft.fft(signal[0:CHUNK]))
freqs = np.linspace(0,fs,CHUNK)[0:int(fs/2)]
spectrogram_chunk = freqs/np.amax(freqs*1.0)
# Plot spectral analysis
plt.plot(freqs[0:1000000],afft[0:1000000]) # 0-1MHz
plt.show()
number_of_chunks = 1000
# Empty spectrogram
Spectrogram = np.zeros(shape = [CHUNK,number_of_chunks])
for i in range(number_of_chunks):
    afft = np.abs(np.fft.fft(signal[i*CHUNK:(1+i)*CHUNK]))
    freqs = np.linspace(0,fs,CHUNK)[0:int(fs/2)]
    spectrogram_chunk = afft/np.amax(afft*1.0)
    try:
        Spectrogram[:,i]=spectrogram_chunk
    except:
        break
import cv2
Spectrogram = Spectrogram[0:1000000,:]
cv2.imshow('spectrogram',np.uint8(255*Spectrogram/np.amax(Spectrogram)))
cv2.waitKey()
cv2.destroyAllWindows()
It seems your problem is not with Python but with understanding what a spectrogram is.
A spectrogram is a sequence of spectral analyses of a signal.
1) You need to cut your signal into CHUNKS.
2) Do a spectral analysis of each CHUNK and stack the results together.
Example:
You have 1 second of audio recording (44100 Hz sampling). That means the recording has 1 s * 44100 = 44100 samples. You define CHUNK size = 1024 (for example).
For each chunk you compute an FFT and stack the results into a 2D matrix (X axis: FFT of the CHUNK, Y axis: CHUNK number). 44100 samples / 1024 ≈ 43 FFTs, each covering 1024/44100 ≈ 0.023 seconds of the signal.
The bigger the CHUNK, the finer the frequency resolution of the spectrogram, but the less 'realtime' it is.
The smaller the CHUNK, the coarser the frequency resolution, but you get more measurements because you measure the frequencies 'more often'.
If you need 1 MHz: you cannot actually resolve anything higher than 1 MHz, because 1 MHz is half of your sampling frequency. Just take the first half of the resulting FFT array; for a real-valued signal the second half mirrors the first.
About the FFT: you don't want complex numbers. You want to do
FFT = np.abs(FFT) # Edit - I just noticed you use '.real', but I will keep it here
because you want real magnitudes.
Preparation for Spectrogram - example of Spectrogram
Audio Signal with 150HZ wave and 300HZ Wave
import numpy as np
import matplotlib.pyplot as plt
fs = 44100#sampling frequency
CHUNK = 10000
signal_time = 20 # seconds
def sine(freq,fs,secs):
    data=np.arange(fs*secs)/(fs*1.0)
    wave = np.sin(freq*2*np.pi*data)
    return wave
a1 = sine(150,fs,120)
a2 = sine(300,fs,120)
signal = a1+a2
afft = np.abs(np.fft.fft(signal[0:CHUNK]))
freqs = np.linspace(0,fs,CHUNK)[0:int(fs/2)]
spectrogram_chunk = freqs/np.amax(freqs*1.0)
# Plot spectral analysis
plt.plot(freqs[0:250],afft[0:250])
plt.show()
number_of_chunks = 1000
# Empty spectrogram
Spectrogram = np.zeros(shape = [CHUNK,number_of_chunks])
for i in range(number_of_chunks):
    afft = np.abs(np.fft.fft(signal[i*CHUNK:(1+i)*CHUNK]))
    freqs = np.linspace(0,fs,CHUNK)[0:int(fs/2)]
    #plt.plot(spectrogram_chunk[0:250],afft[0:250])
    #plt.show()
    spectrogram_chunk = afft/np.amax(afft*1.0)
    #print(signal[i*CHUNK:(1+i)*CHUNK].shape)
    try:
        Spectrogram[:,i]=spectrogram_chunk
    except:
        break
import cv2
Spectrogram = Spectrogram[0:250,:]
cv2.imshow('spectrogram',np.uint8(255*Spectrogram/np.amax(Spectrogram)))
cv2.waitKey()
cv2.destroyAllWindows()
(Figure: spectral analysis of a single CHUNK)
(Figure: spectrogram)
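The chunk-and-stack idea can also be written more compactly, with a frequency axis in Hz, which makes it possible to follow one frequency (such as a sub-harmonic) over time. The following is a minimal sketch rather than the answer's own method: the function name spectrogram, the Hann taper, and the synthetic 300 Hz / 150 Hz signal are assumptions chosen for illustration, and the same pattern would apply to a 2 MHz drive with a 1 MHz sub-harmonic given a suitable sampling rate.

```python
import numpy as np

def spectrogram(signal, fs, chunk):
    """Magnitude spectrogram: one row per chunk, one column per frequency bin."""
    n_chunks = len(signal) // chunk
    frames = signal[:n_chunks * chunk].reshape(n_chunks, chunk)
    frames = frames * np.hanning(chunk)                # taper each chunk
    spec = np.abs(np.fft.rfft(frames, axis=1))         # shape (n_chunks, chunk//2 + 1)
    freqs = np.fft.rfftfreq(chunk, d=1.0 / fs)         # Hz, from 0 to fs/2
    times = (np.arange(n_chunks) + 0.5) * chunk / fs   # chunk centres in seconds
    return freqs, times, spec

# Synthetic example (assumption): a 300 Hz tone throughout, plus a 150 Hz
# "sub-harmonic" that only appears in the second half of a 4 s recording.
fs = 8000
t = np.arange(4 * fs) / fs
signal = np.sin(2 * np.pi * 300 * t) + (t >= 2) * np.sin(2 * np.pi * 150 * t)

freqs, times, spec = spectrogram(signal, fs, chunk=800)
sub_bin = np.argmin(np.abs(freqs - 150))               # bin closest to 150 Hz
sub_over_time = spec[:, sub_bin]                       # that bin's magnitude per chunk
print(np.round(sub_over_time, 1))
```

Plotting sub_over_time against times shows when the sub-harmonic is present; plotting spec as an image gives the full spectrogram.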

FFT of data received from PyAudio gives wrong frequency

My main task is to recognize a human humming from a microphone in real time. As the first step to recognizing signals in general, I have made a 5 seconds recording of a 440 Hz signal generated from an app on my phone and tried to detect the same frequency.
I used Audacity to plot and verify the spectrum from the same 440Hz wav file and I got this, which shows that 440Hz is indeed the dominant frequency :
(https://i.imgur.com/2UImEkR.png)
To do this with Python, I use the PyAudio library and refer to this blog. The code I have so far, which I run with the wav file, is this:
"""PyAudio Example: Play a WAVE file."""
import pyaudio
import wave
import sys
import struct
import numpy as np
import matplotlib.pyplot as plt
CHUNK = 1024
if len(sys.argv) < 2:
    print("Plays a wave file.\n\nUsage: %s filename.wav" % sys.argv[0])
    sys.exit(-1)
wf = wave.open(sys.argv[1], 'rb')
p = pyaudio.PyAudio()
stream = p.open(format=p.get_format_from_width(wf.getsampwidth()),
                channels=wf.getnchannels(),
                rate=wf.getframerate(),
                output=True)
data = wf.readframes(CHUNK)
i = 0
while data:  # readframes returns an empty result at end of file
    i += 1
    # integer division so the format count is an int on Python 3 as well
    data_unpacked = struct.unpack('{n}h'.format(n=len(data) // 2), data)
    data_np = np.array(data_unpacked)
    data_fft = np.fft.fft(data_np)
    data_freq = np.abs(data_fft)/len(data_fft) # Dividing by length to normalize the amplitude as per https://www.mathworks.com/matlabcentral/answers/162846-amplitude-of-signal-after-fft-operation
    print("Chunk: {} max_freq: {}".format(i,np.argmax(data_freq)))
    fig = plt.figure()
    ax = fig.add_subplot(1,1,1)
    ax.plot(data_freq)
    ax.set_xscale('log')
    plt.show()
    stream.write(data)
    data = wf.readframes(CHUNK)
stream.stop_stream()
stream.close()
p.terminate()
In the output, I get that the max frequency is 10 for all the chunks, and an example of one of the plots is:
(https://i.imgur.com/zsAXME5.png)
I had expected this value to be 440 instead of 10 for all the chunks. I admit I know very little about the theory of FFTs, and I appreciate any help in solving this.
EDIT:
The sampling rate is 44100, the number of channels is 2, and the sample width is also 2.
Foreword
As xdurch0 pointed out, you are reading a bin index instead of a frequency. If you are going to do all the computation yourself, you need to compute your own frequency vector before plotting in order to get consistent results. Reading this answer may help you towards the solution.
The frequency vector for the FFT (half plane, N_fft even) is:
f = np.linspace(0, rate/2, N_fft//2 + 1)
Or (full plane, N_fft even, zero frequency in the centre):
f = np.linspace(-rate/2, rate/2, N_fft, endpoint=False)
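To make the index-versus-frequency point concrete, here is a small sketch on a synthetic tone. It is illustrative only: the 440 Hz test signal and the variable names are assumptions, and np.fft.rfftfreq is used to build the half-plane frequency vector instead of linspace.

```python
import numpy as np

rate = 44100                                   # samples per second
n_fft = 1024                                   # samples per chunk
t = np.arange(n_fft) / rate
chunk = np.sin(2 * np.pi * 440 * t)            # synthetic 440 Hz tone (assumption)

magnitude = np.abs(np.fft.rfft(chunk))
freqs = np.fft.rfftfreq(n_fft, d=1.0 / rate)   # maps bin index to Hz

peak_bin = int(np.argmax(magnitude))           # what argmax alone reports
peak_hz = freqs[peak_bin]                      # the same peak, expressed in Hz
print(peak_bin, peak_hz)
```

The bin spacing is rate/n_fft ≈ 43.07 Hz, so a 440 Hz tone falls in bin 10, whose centre is about 430.66 Hz. A chunk of 1024 samples cannot resolve the frequency more finely than that without a longer chunk or peak interpolation.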
On the other hand, we can delegate most of the work to the excellent scipy.signal toolbox, which aims to cope with this kind of problem (and many more).
MCVE
Using the scipy package it is straightforward to get the desired result for a simple WAV file with a single frequency (source):
import numpy as np
from scipy import signal
from scipy.io import wavfile
import matplotlib.pyplot as plt
# Read the file (rate and data):
rate, data = wavfile.read('tone.wav') # See source
# Compute PSD:
f, P = signal.periodogram(data, rate) # Frequencies and PSD
# Display PSD:
fig, axe = plt.subplots()
axe.semilogy(f, P)
axe.set_xlim([0,500])
axe.set_ylim([1e-8, 1e10])
axe.set_xlabel(r'Frequency, $\nu$ $[\mathrm{Hz}]$')
axe.set_ylabel(r'PSD, $P$ $[\mathrm{AU^2Hz}^{-1}]$')
axe.set_title('Periodogram')
axe.grid(which='both')
Basically:
Read the wav file and get the sample rate (here 44.1kHz);
Compute the Power Spectral Density and frequencies;
Then display it with matplotlib.
This outputs:
Find Peak
Then we can find the frequency of the first highest peak (P>1e-2, this criterion is subject to tuning) using find_peaks:
idx = signal.find_peaks(P, height=1e-2)[0][0]
f[idx] # 440.0 Hz
Putting it all together, it merely boils down to:
def freq(filename, setup={'height': 1e-2}):
    rate, data = wavfile.read(filename)
    f, P = signal.periodogram(data, rate)
    return f[signal.find_peaks(P, **setup)[0][0]]
Handling multiple channels
I tried this code with my wav file, and got the error for the line
axe.semilogy(f, Pxx_den) as follows : ValueError: x and y must have
same first dimension. I checked the shapes and f has (2,) while
Pxx_den has (220160,2). Also, the Pxx_den array seems to have all
zeros only.
WAV files can hold multiple channels; most are mono or stereo (max. 2**16 - 1 channels). The problem you describe occurs because the file has multiple channels (a stereo sample).
rate, data = wavfile.read('aaaah.wav') # Shape: (46447, 2), Rate: 48 kHz
It is not well documented, but signal.periodogram also works on matrices, and its default axis does not match the layout of the wavfile.read output (periodogram works along the last axis by default, while wavfile.read puts the samples along the first). So we need to orient the dimensions carefully (using the axis argument) when computing the PSD:
f, P = signal.periodogram(data, rate, axis=0, detrend='linear')
It also works with the transposed data, data.T, but then we need to transpose the result back.
Specifying the axis solves the issue: the frequency vector is correct and the PSD is no longer null everywhere (before, it was computed along axis=1, which has length 2; in your case it computed 220160 PSDs of 2-sample signals, whereas we wanted the converse).
The detrend argument removes the signal's mean and its linear trend.
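The effect of the axis argument can be checked on synthetic stereo data. This is a small sketch under assumptions: the two test tones (440 Hz and 1000 Hz) and the variable names are made up for illustration, and the array is built in the (n_samples, n_channels) layout that wavfile.read produces for multi-channel files.

```python
import numpy as np
from scipy import signal

rate = 48000
t = np.arange(rate) / rate                         # one second of audio
# Shape (n_samples, n_channels): one column per channel.
data = np.column_stack([np.sin(2 * np.pi * 440 * t),
                        np.sin(2 * np.pi * 1000 * t)])

# axis=0 computes one PSD per channel, along the sample axis.
f, P = signal.periodogram(data, rate, axis=0, detrend='linear')
peak_per_channel = f[np.argmax(P, axis=0)]         # strongest bin in each column
print(f.shape, P.shape, peak_per_channel)
```

Here f has one entry per frequency bin and P has one column per channel, so each channel can be plotted or searched for peaks separately with P[:, 0] and P[:, 1].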
Real application
This approach should work for real chunked samples, provided the chunks hold enough data (see Nyquist-Shannon sampling theorem). In that case data holds a chunk (a sub-sample of the signal) and rate stays constant, since it does not change during the process.
Chunks of size 2**10 seem to work; we can identify specific frequencies from them:
f, P = signal.periodogram(data[:2**10,:], rate, axis=0, detrend='linear') # Shapes: (513,) (513, 2)
idx0 = signal.find_peaks(P[:,0], threshold=0.01, distance=50)[0] # Peaks: [46.875, 2625., 13312.5, 16921.875] Hz
fig, axe = plt.subplots(2, 1, sharex=True, sharey=True)
axe[0].loglog(f, P[:,0])
axe[0].loglog(f[idx0], P[idx0,0], '.')
# [...]
At this point, the trickiest part is fine-tuning the find_peaks call to catch the desired frequencies. You may need to pre-filter your signal or post-process the PSD to make the identification easier.

Remove Silence from an Audio Input and then find the frequencies of the remaining audio signal using numpy in Python?

I have an audio signal that I have imported using the wave.open function.
I then convert the signal into frames and use a window to take a set number of samples, checking their amplitude against a set threshold. If a window falls below the threshold, I consider it silence; if not, I consider it audio signal.
import wave
import struct
import numpy as np
import matplotlib.pyplot as plt
sound_file = wave.open('Audio_1.wav', 'r')
file_length = sound_file.getnframes()
sound = np.zeros(file_length)
for i in range(file_length):
    data = sound_file.readframes(1)
    data = struct.unpack("<h", data)
    sound[i] = int(data[0])
sound = np.divide(sound, float(2**15)) # Normalized data range [-1 to 1]
#print sound #vector corresponding to audio signal containing audio samples
Ap = np.pad(sound, (0,int(np.ceil(len(sound) / 2205.)) * 2205 - len(sound)), 'constant', constant_values=0) # Padding of the sound samples so that the input is a multiple of 2205(window Length)
Apr = Ap.reshape((len(Ap) // 2205, 2205))
Apr.shape
array1=(Apr ** 2).sum(axis=1) #Record Sum of Squares of the amplitude of the signal falling within that window
print(array1)
#print(len(array1))
threshold = 1103.4
result = np.array([x for x in array1 if x >= threshold]) # keep only the windows at or above the threshold
print(result)
print(len(result))
print(np.where(array1 > threshold)) # indices of the windows above the threshold
My questions are:
How do I find the ending index of each window, so that I can slice that window out of the input?
How should I proceed to get back the samples that contain the audio signal and convert them into the frequency domain using np.fft.fft()?
If any statement or question is unclear, please say so.
Thank you
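One way to approach both questions is sketched below. It is a minimal sketch under assumptions, not a definitive answer: the function name active_segments, the threshold value, and the synthetic signal (silence, a 440 Hz tone, silence) are hypothetical. The idea is that window k covers samples k*window up to (but not including) (k+1)*window, so the start and end sample indices follow directly from the window indices, and runs of consecutive active windows can be merged by looking at where the active mask changes.

```python
import numpy as np

def active_segments(sound, window=2205, threshold=1.0):
    """Return (start, end) sample indices of runs of non-silent windows."""
    n_windows = int(np.ceil(len(sound) / window))
    padded = np.pad(sound, (0, n_windows * window - len(sound)))
    energy = (padded.reshape(n_windows, window) ** 2).sum(axis=1)
    active = energy >= threshold
    # Pad with False so every run of True has a rising and a falling edge.
    edges = np.diff(np.concatenate([[False], active, [False]]).astype(int))
    starts = np.flatnonzero(edges == 1) * window
    ends = np.minimum(np.flatnonzero(edges == -1) * window, len(sound))
    return list(zip(starts.tolist(), ends.tolist()))

# Synthetic stand-in (assumption): 0.5 s silence, 1 s of 440 Hz, 0.5 s silence.
rate = 22050
tone = 0.5 * np.sin(2 * np.pi * 440 * np.arange(rate) / rate)
sound = np.concatenate([np.zeros(rate // 2), tone, np.zeros(rate // 2)])

segments = active_segments(sound)
start, end = segments[0]
voiced = sound[start:end]                          # samples with silence removed
spectrum = np.abs(np.fft.rfft(voiced))
freqs = np.fft.rfftfreq(len(voiced), d=1.0 / rate)
peak_hz = freqs[np.argmax(spectrum)]
print(segments, peak_hz)
```

Each (start, end) pair can be used directly as sound[start:end]; the end index is exclusive, matching Python slicing.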

Read audio wav file and plot audio freq response smoothed in python

I am working with python and would like to perform the following. I have a wav audio file that I would like to read and plot frequency response for. I am only interested in the time window of 3-4 seconds, not the entire file. Also, I would like to resample my input file to 48k, instead of 192k which it comes in as.
I would like my plot to be with lines, of FFT length 8192, Hamming window, logx scale from 20 - 20k Hz.
Not hard to do in Python, you just have to install some packages:
import numpy as np
from scipy.io import wavfile
from scipy import signal
from matplotlib import pyplot as plt
sr, x = wavfile.read('file.wav')
x = signal.decimate(x, 4)
x = x[48000*3:48000*3+8192]
x *= np.hamming(8192)
X = abs(np.fft.rfft(x))
X_db = 20 * np.log10(X)
freqs = np.fft.rfftfreq(8192, 1/48000)
plt.plot(freqs, X_db)
plt.show()
What I do not understand is your time window of 3-4 seconds. Do you mean the window starting at 3 seconds? (That is what the code above does.) Or do you mean a window of 3 seconds duration? Then the window must be 3*48000 samples long.
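If the intended meaning is the one-second stretch between 3 s and 4 s, one possible way to get a smoothed response is Welch's method, which averages several Hamming-windowed 8192-point FFTs over that stretch. The sketch below is only an illustration under assumptions: a synthetic 1 kHz tone at 192 kHz stands in for the file, and the variable names are made up.

```python
import numpy as np
from scipy import signal

fs_in, fs_out = 192000, 48000
t = np.arange(5 * fs_in) / fs_in                 # 5 s synthetic stand-in for the file
x = np.sin(2 * np.pi * 1000 * t)                 # 1 kHz test tone (assumption)

x = signal.decimate(x, fs_in // fs_out)          # low-pass filter, then keep every 4th sample
segment = x[3 * fs_out:4 * fs_out]               # the stretch from 3 s to 4 s

# Welch averages Hamming-windowed 8192-point FFTs over the segment,
# which smooths the estimate compared with a single FFT.
freqs, psd = signal.welch(segment, fs_out, window='hamming', nperseg=8192)
psd_db = 10 * np.log10(psd + 1e-20)              # small offset avoids log(0)
audible = (freqs >= 20) & (freqs <= 20000)       # band requested for the plot
peak_hz = freqs[audible][np.argmax(psd_db[audible])]
print(round(peak_hz, 1))
```

Plotting freqs[audible] against psd_db[audible] with a logarithmic x axis (for example plt.semilogx) gives the 20 Hz to 20 kHz view; the frequency resolution is 48000/8192, roughly 5.9 Hz per bin.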
Matlab is the easiest:
[x,fs] = audioread('file.wav');
% downsample 4:1 (192 kHz -> 48 kHz)
x = resample(x, 1, 4);
% snip 8192 samples starting 3 seconds in
x = x(48000*3+1:48000*3+8192);
plot(abs(fft(x)));
I'll leave it to you to get the plot formatted the way you desire, but as a hint: you'll need to construct a frequency axis and snip the desired bins out of the FFT.
