Plot Spectrogram of a wav audio file - python

I want to plot Spectrogram of 30s of a audio file in wav. But I encountered error while doing so in python. How can I achieve my goal?
import scipy
import matplotlib.pyplot as plt
import scipy.io.wavfile
sample_rate, X = scipy.io.wavfile.read('595.wav')
print (sample_rate, X.shape )
plt.specgram(X, Fs=sample_rate, xextent=(0,30))
And error
ValueError: only 1-dimensional arrays can be used

The error is pretty clear: ValueError: only 1-dimensional arrays can be used.
In your case X is not 1-dimensional. You would find out by printing X.shape.
While I can't be certain without a complete example here, the best guess would be that you're having a stereo wav file, which has 2 channels. So you need to select if you want to plot the spectrogram for the left or the right channel. E.g. for the left channel:
plt.specgram(X[:,0], Fs=sample_rate, xextent=(0,30))

Related

Error passing wav file to IPython.display

I am new to Python but I am studying it as programming language for DSP. I recorded a wav file, and have been trying to play it back using IPython.display.Audio:
import IPython.display
from scipy.io import wavfile
rate, s = wavfile.read('h.wav')
IPython.display.Audio(s, rate=rate)
But this gives the following error:
struct.error: ushort format requires 0 <= number <= 0xffff
I tried installing FFmpeg but it hasn't helped.
That's not a very useful error message, it took a bit of debugging to figure out what was going on! It is caused by the "shape" of the matrix returned from wavfile being the wrong way around.
The docs for IPython.display.Audio say it expects a:
Numpy 2d array containing waveforms for each channel. Shape=(NCHAN, NSAMPLES).
If I read a (stereo) wav file I have lying around:
rate, samples = wavfile.read(path)
print(samples.shape)
I get (141120, 2) showing this is of shape (NSAMPLES, NCHAN). Passing this array directly to Audio I get a similar error as you do. Transposing the array will flip these around, causing the array to be compatible with this method. The transpose of a matrix in Numpy is accessed via the .T attribute, e.g.:
IPython.display.Audio(samples.T, rate=rate)
works for me.
Thank you for your answer, it helped me.
below is my code, maybe can help someone.
frequency = 44100
duration = 5
record = sd.rec((frequency * duration), samplerate=frequency , channels=1, blocking=True, dtype='float64')
sd.wait()
st.audio(record.T, sample_rate=frequency)

Getting MFCC from a spectrogram time/ frequency series array

I have several spectrogra time/frequency [500,1024] files.
I need to calculate the MFCC of these files. There are lot's of the library for calculating MFCC on a raw audio file but I'm looking a method in python for calculating directly from np.array.
This can be done with librosa, as it allows to pass in spectrograms instead of audio waveform using the parameter S.
I am assuming that you have a STFT magnitude spectrogram (linear spectrogram with phase discarded). Then need to convert this into a mel-filtered spectrogram, perform log-scaling, and then do the DCT-2 and truncation to obtain MFCC coefficients. Skeleton code below:
import librosa
import numpy
# TODO: you need to provide these
sr = my_samplerate
my_stft
mels = librosa.feature.melspectrogram(S=my_stft, sr=sr, n_mels=64)
log_mels = librosa.core.amplitude_to_db(mels, ref=numpy.max)
mfcc = librosa.feature.mfcc(S=log_mels, sr=sr, n_mfcc=20)
See the librosa API reference for more details.

Python Convert an array to Wav

In Python, I have an array of floats representing the voltages of an analog signal.
Can anyone explain how I can change the array into a .wav format? I have seen this
Do I first need to change the data format from [1.23,1.24,1.25,1.26] (for example) to 1.231.241.251.26 before adding the headers so that it's read correctly?
I eventually plan on using FFT on the values to derive the fundamental frequencies is there a better way to store the values in this case?
Thank you
If you know the sampling frequency of your signal and data is already scaled appropriately by max(abs(data)) then you can do it very easily using scipy:
from __future__ import print_function
import scipy.io.wavfile as wavf
import numpy as np
if __name__ == "__main__":
samples = np.random.randn(44100)
fs = 44100
out_f = 'out.wav'
wavf.write(out_f, fs, samples)
You can also use the standard wave module.

Audio spectrum extraction from audio file by python

Sorry if I submit a duplicate, but I wonder if there is any lib in python which makes you able to extract sound spectrum from audio files. I want to be able to take an audio file and write an algoritm which will return a set of data {TimeStampInFile; Frequency-Amplitude}.
I heard that this is usually called Beat Detection, but as far as I see beat detection is not a precise method, it is good only for visualisation, while I want to manipulate on the extracted data and then convert it back to an audio file. I don't need to do this real-time.
I will appreciate any suggestions and recommendations.
You can compute and visualize the spectrum and the spectrogram this using scipy, for this test i used this audio file: vignesh.wav
from scipy.io import wavfile # scipy library to read wav files
import numpy as np
AudioName = "vignesh.wav" # Audio File
fs, Audiodata = wavfile.read(AudioName)
# Plot the audio signal in time
import matplotlib.pyplot as plt
plt.plot(Audiodata)
plt.title('Audio signal in time',size=16)
# spectrum
from scipy.fftpack import fft # fourier transform
n = len(Audiodata)
AudioFreq = fft(Audiodata)
AudioFreq = AudioFreq[0:int(np.ceil((n+1)/2.0))] #Half of the spectrum
MagFreq = np.abs(AudioFreq) # Magnitude
MagFreq = MagFreq / float(n)
# power spectrum
MagFreq = MagFreq**2
if n % 2 > 0: # ffte odd
MagFreq[1:len(MagFreq)] = MagFreq[1:len(MagFreq)] * 2
else:# fft even
MagFreq[1:len(MagFreq) -1] = MagFreq[1:len(MagFreq) - 1] * 2
plt.figure()
freqAxis = np.arange(0,int(np.ceil((n+1)/2.0)), 1.0) * (fs / n);
plt.plot(freqAxis/1000.0, 10*np.log10(MagFreq)) #Power spectrum
plt.xlabel('Frequency (kHz)'); plt.ylabel('Power spectrum (dB)');
#Spectrogram
from scipy import signal
N = 512 #Number of point in the fft
f, t, Sxx = signal.spectrogram(Audiodata, fs,window = signal.blackman(N),nfft=N)
plt.figure()
plt.pcolormesh(t, f,10*np.log10(Sxx)) # dB spectrogram
#plt.pcolormesh(t, f,Sxx) # Lineal spectrogram
plt.ylabel('Frequency [Hz]')
plt.xlabel('Time [seg]')
plt.title('Spectrogram with scipy.signal',size=16);
plt.show()
i tested all the code and it works, you need, numpy, matplotlib and scipy.
cheers
I think your question has three separate parts:
How to load audio files into python?
How to calculate spectrum in python?
What to do with the spectrum?
1. How to load audio files in python?
You are probably best off by using scipy, as it provides a lot of signal processing functions. For loading audio files:
import scipy.io.wavfile
samplerate, data = scipy.io.wavfile.read("mywav.wav")
Now you have the sample rate (samples/s) in samplerate and data as a numpy.array in data. You may want to transform the data into floating point, depending on your application.
There is also a standard python module wave for loading wav-files, but numpy/scipy offers a simpler interface and more options for signal processing.
2. How to calculate the spectrum
Brief answer: Use FFT. For more words of wisdom, see:
Analyze audio using Fast Fourier Transform
Longer answer is quite long. Windowing is very important, otherwise you'll have strange spectra.
3. What to do with the spectrum
This is a bit more difficult. Filtering is often performed in time domain for longer signals. Maybe if you tell us what you want to accomplish, you'll receive a good answer for this one. Calculating the frequency spectrum is one thing, getting meaningful results with it in signal processing is a bit more complicated.
(I know you did not ask this one, but I see it coming with a probability >> 0. Of course, it may be that you have good knowledge on audio signal processing, in which case this is irrelevant.)

Python NumPy Convert FFT To File

I was wondering if it's possible to get the frequencies present in a file with NumPy, and then alter those frequencies and create a new WAV file from them? I would like to do some filtering on a file, but I have yet to see a way to read a WAV file into NumPy, filter it, and then output the filtered version. If anyone could help, that would be great.
SciPy provides functions for doing FFTs on NumPy arrays, and also provides functions for reading and writing them to WAV files. e.g.
from scipy.io.wavfile import read, write
from scipy.fftpack import rfft, irfft
import np as numpy
rate, input = read('input.wav')
transformed = rfft(input)
filtered = function_that_does_the_filtering(transformed)
output = irfft(filtered)
write('output.wav', rate, output)
(input, transformed and output are all numpy arrays)

Categories

Resources