Plot spectrogram from mp3 - python

I am trying to plot a spectrogram straight from an mp3 file in python 2.7.3 (using ubuntu). I can do it from a wav file as follows.
#!/usr/bin/python
from scikits.audiolab import wavread
from pylab import *
signal, fs, enc = wavread('XC124158.wav')
specgram(signal)
show()
What's the cleanest way to do the same thing from an mp3 file instead of a wav? I don't want to convert all the mp3 files to wav if I can avoid it.

Another very simple way of plotting the spectrogram of an mp3 file:
from pydub import AudioSegment
import matplotlib.pyplot as plt
from scipy.io import wavfile
from tempfile import mktemp
mp3_audio = AudioSegment.from_file('speech.mp3', format="mp3") # read mp3
wname = mktemp('.wav') # use temporary file
mp3_audio.export(wname, format="wav") # convert to wav
FS, data = wavfile.read(wname) # read wav file
plt.specgram(data, Fs=FS, NFFT=128, noverlap=0) # plot
plt.show()
This uses the pydub library, which is more convenient than calling external commands. This way you can iterate over all your .mp3 files without having to convert them to .wav prior to plotting.
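A minimal sketch of that loop (assuming the .mp3 files sit in the current working directory; adjust the glob pattern as needed):
import glob
import os
from tempfile import mktemp
from pydub import AudioSegment
from scipy.io import wavfile
import matplotlib.pyplot as plt
for mp3_path in glob.glob('*.mp3'):
    wname = mktemp('.wav')  # temporary wav file, as above
    AudioSegment.from_file(mp3_path, format="mp3").export(wname, format="wav")
    fs, data = wavfile.read(wname)
    os.unlink(wname)  # clean up the temporary file
    if data.ndim > 1:  # stereo files come back as (n, 2); keep one channel
        data = data[:, 0]
    plt.specgram(data, Fs=fs, NFFT=128, noverlap=0)
    plt.title(mp3_path)
    plt.show()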

I'd install the Debian/Ubuntu package libav-tools and call avconv to decode the mp3 to a temporary wav file:
Edit: Your other question was closed, so I'll expand my answer here a bit with a simple bandpass filtering example. In the file you linked, it looks like most of the birdsong is concentrated between 4 kHz and 5.5 kHz.
import os
from subprocess import check_call
from tempfile import mktemp
from scikits.audiolab import wavread, play
from scipy.signal import remez, lfilter
from pylab import *
# convert mp3, read wav
mp3filename = 'XC124158.mp3'
wname = mktemp('.wav')
check_call(['avconv', '-i', mp3filename, wname])
sig, fs, enc = wavread(wname)
os.unlink(wname)
# bandpass filter
bands = array([0,3500,4000,5500,6000,fs/2.0]) / fs
desired = [0, 1, 0]
b = remez(513, bands, desired)
sig_filt = lfilter(b, 1, sig)
sig_filt /= 1.05 * max(abs(sig_filt)) # normalize
subplot(211)
specgram(sig, Fs=fs, NFFT=1024, noverlap=0)
axis('tight'); axis(ymax=8000)
title('Original')
subplot(212)
specgram(sig_filt, Fs=fs, NFFT=1024, noverlap=0)
axis('tight'); axis(ymax=8000)
title('Filtered')
show()
play(sig_filt, fs)

Related

How to write an audio file that is inside a for loop in python

I have to write an echo effect, so I wrote this code. I want to know how to write each of these to a separate wav file. Can someone provide me with an answer? Thanks in advance.
import sounddevice as sd
from scipy.io import wavfile
import numpy as np
from scipy.io.wavfile import write
fs, x = wavfile.read('hello.wav')
amp = 1
for i in range(2, 6):
    nx = (amp/i**3)*x
    sd.play(nx, fs)
    sd.wait()
    write('hello[i]', fs, myrecording)
There are two things you need to do here:
replace myrecording (which isn't defined in your code snippet) with the sound data array that you want to write out to file. Inspection of your code makes me think it is actually nx that you want to write out.
you'll need a different filename for each file written out or else you will just write over the same file and you'll only be left with the one written out last. There are many ways to generate/structure the string for this filename but, based on what you had, I used one that will label the files things like "hello_2.wav" etc.
I have integrated both those things into the code here:
import sounddevice as sd
from scipy.io import wavfile
from scipy.io.wavfile import write
fs, x = wavfile.read('hello.wav')
AMP = 1
for i in range(2, 6):
    nx = (AMP/i**3) * x
    sd.play(nx, fs)
    sd.wait()
    filename = f'hello_{i}.wav'
    write(filename, fs, nx)

How to create a spectrogram image from an audio file in Python just like how FFMPEG does?

My code:
import matplotlib.pyplot as plt
from matplotlib.pyplot import specgram
import librosa
import librosa.display
import numpy as np
import io
from PIL import Image
samples, sample_rate = librosa.load('thabo.wav')
fig = plt.figure(figsize=[4, 4])
ax = fig.add_subplot(111)
ax.axes.get_xaxis().set_visible(False)
ax.axes.get_yaxis().set_visible(False)
ax.set_frame_on(False)
S = librosa.feature.melspectrogram(y=samples, sr=sample_rate)
librosa.display.specshow(librosa.power_to_db(S, ref=np.max))
buf = io.BytesIO()
plt.savefig(buf, bbox_inches='tight', pad_inches=0)
# plt.close('all')
buf.seek(0)
im = Image.open(buf)
# im = Image.open(buf).convert('L')
im.show()
buf.close()
[Spectrogram image produced by the code above]
Using FFMPEG
ffmpeg -i thabo.wav -lavfi showspectrumpic=s=224x224:mode=separate:legend=disabled spectrogram.png
[Spectrogram image produced by FFMPEG]
Please help: I want a spectrogram that is exactly the same as the one produced by FFMPEG, for use with a speech recognition model exported from Google's Teachable Machine (offline recognition).
You can pipe the audio directly to ffmpeg, which avoids the intermediate file, and ffmpeg can output to a pipe as well if you want to avoid writing the image file.
Demonstration using three instances of ffmpeg:
ffmpeg -i input.wav -f wav - | ffmpeg -i - -filter_complex "showspectrumpic=s=224x224:mode=separate:legend=disabled" -c:v png -f image2pipe - | ffmpeg -y -i - output.png
The first and last ffmpeg instances would of course be replaced by your particular processes in your workflow.
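If you are driving this from Python instead, here is a rough sketch of the middle instance using subprocess (wav_bytes is assumed to already hold a complete WAV file in memory, and ffmpeg is assumed to be on the PATH):
import io
import subprocess
from PIL import Image
cmd = ['ffmpeg', '-i', 'pipe:0',  # read WAV from stdin
       '-filter_complex', 'showspectrumpic=s=224x224:mode=separate:legend=disabled',
       '-c:v', 'png', '-f', 'image2pipe', 'pipe:1']  # write PNG to stdout
proc = subprocess.run(cmd, input=wav_bytes, stdout=subprocess.PIPE, check=True)
spectrogram = Image.open(io.BytesIO(proc.stdout))  # never touches the disk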

Python, speech_recognition tool does not recognize .wav file

I have generated a .wav audio file containing some speech with some other interference speech in the background.
This code worked for me for a test .wav file:
import speech_recognition as sr
r = sr.Recognizer()
with sr.WavFile(wav_path) as source:
    audio = r.record(source)
text = r.recognize_google(audio)
If I use my .wav file, I get the following error:
ValueError: Audio file could not be read as PCM WAV, AIFF/AIFF-C, or Native FLAC; check if file is corrupted or in another format
The situation slightly improves if I save this .wav file with soundfile:
import soundfile as sf
wav, samplerate = sf.read(wav_path)
sf.write(saved_wav_path, wav, samplerate)
and then load the new saved_wav_path back into the first block of code; this time I get:
if not isinstance(actual_result, dict) or len(actual_result.get("alternative", [])) == 0: raise UnknownValueError()
The audio files were saved as
wavfile.write(wav_path, fs, data)
where wav_path = 'data.wav'. Any ideas?
SOLUTION:
Saving the audio data the following way generates the correct .wav files:
import wavio
wavio.write(wav_path, data, fs, sampwidth=2)
From a brief look at the code in the speech_recognition package, it appears that it uses wave from the Python standard library to read WAV files. Python's wave library does not handle floating point WAV files, so you'll have to ensure that you use speech_recognition with files that were saved in an integer format.
SciPy's function scipy.io.wavfile.write will create an integer file if you pass it an array of integers. So if data is a floating point numpy array, you could try this:
import numpy as np
from scipy.io import wavfile
# Convert `data` to 32-bit integers:
y = (np.iinfo(np.int32).max * (data/np.abs(data).max())).astype(np.int32)
wavfile.write(wav_path, fs, y)
Then try to read that file with speech_recognition.
Alternatively, you could use wavio (a small library that I created) to save your data to a WAV file. It also uses Python's wave library to create its output, so speech_recognition should be able to read the files that it creates.
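For example, a sketch assuming data is a floating-point array scaled to [-1, 1] and fs is its sample rate (the output filename is just a placeholder):
import numpy as np
import wavio
y = (np.iinfo(np.int16).max * data).astype(np.int16)  # scale floats to 16-bit PCM
wavio.write('speech_int16.wav', y, fs, sampwidth=2)  # sampwidth=2 bytes -> int16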
I couldn't figure out from its documentation what the sampwidth for wavio should be; however, I added the line sounddevice.default.dtype = 'int32', 'int32', which allowed sounddevice, scipy.io.wavfile.write / soundfile, and speech_recognition to finally work together. The default dtype for sounddevice is float32 for both input and output; I tried changing only the output, but that didn't work. Weirdly, Audacity still thinks the output files are in float32. I am not suggesting this is a better solution, but it did work with both soundfile and scipy.
I also noticed another oddity. When sounddevice.default.dtype was left at the default [float32, float32], I opened the resulting file in Audacity, exported it from there, and that exported wav worked with speech_recognition. Audacity says its export is float32 at the same sample rate, so I don't fully understand. I'm a noob, but I looked at both files in a hex editor: they are identical for the first 64 hex values and then differ, so the headers seem to be the same. Both look very different from the file I made using int32 output, so there seems to be another factor at play.
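For reference, a minimal sketch of that setup (the 3-second mono recording at 16 kHz is an assumed example, not from the original post):
import sounddevice as sd
from scipy.io import wavfile
sd.default.dtype = 'int32', 'int32'  # int32 for both input and output instead of the float32 default
fs = 16000
recording = sd.rec(int(3 * fs), samplerate=fs, channels=1)  # 3-second mono recording
sd.wait()  # block until the recording finishes
wavfile.write('recorded.wav', fs, recording)  # integer PCM, which speech_recognition can read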
Similar to Warren's answer, I was able to resolve this issue by rewriting the WAV file using pydub:
from pydub import AudioSegment
filename = "payload.wav" # File that already exists.
sound = AudioSegment.from_file(filename)  # let pydub/ffmpeg detect the actual format
sound.export(filename, format="wav")

Joining .wav files without writing on disk in Python

I have a list of .wav files in binary format (they are coming from a websocket), which I want to join in a single binary .wav file to then do speech recognition with it. I have been able to make it work with the following code:
audio = [binary_wav1, binary_wav2,..., binary_wavN] # a list of .wav binary files coming from a socket
audio = [io.BytesIO(x) for x in audio]
# Join wav files
with wave.open('/tmp/input.wav', 'wb') as temp_input:
    params_set = False
    for audio_file in audio:
        with wave.open(audio_file, 'rb') as w:
            if not params_set:
                temp_input.setparams(w.getparams())
                params_set = True
            temp_input.writeframes(w.readframes(w.getnframes()))
# Do speech recognition
binary_audio = open('/tmp/input.wav', 'rb').read()
ASR(binary_audio)
The problem is that I don't want to write the file '/tmp/input.wav' in disk. Is there any way to do it without writing any file in the disk?
Thanks.
The general solution for having a file without ever putting it on disk is a stream. For this we use the io library, Python's standard library for working with in-memory streams. You already use io.BytesIO earlier in your code, it seems.
audio = [binary_wav1, binary_wav2,..., binary_wavN] # a list of .wav binary files coming from a socket
audio = [io.BytesIO(x) for x in audio]
# Join wav files
params_set = False
temp_file = io.BytesIO()
with wave.open(temp_file, 'wb') as temp_input:
    for audio_file in audio:
        with wave.open(audio_file, 'rb') as w:
            if not params_set:
                temp_input.setparams(w.getparams())
                params_set = True
            temp_input.writeframes(w.readframes(w.getnframes()))
# move the cursor back to the beginning of the "file"
temp_file.seek(0)
# Do speech recognition
binary_audio = temp_file.read()
ASR(binary_audio)
Note: I don't have any .wav files to try this out on, so it's up to the wave library to handle the difference between real files and buffered streams properly.
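As a quick sanity check, you can reopen the in-memory stream with wave and inspect the header it wrote (temp_file here is the BytesIO object from the code above; run this before the final read, or seek back to 0 afterwards):
temp_file.seek(0)  # rewind before re-reading
with wave.open(temp_file, 'rb') as check:
    print(check.getparams())  # nchannels, sampwidth, framerate, nframes, ...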
With scipy and numpy you can read the wav files as numpy arrays and then make the modifications you want.
from scipy.io import wavfile
import numpy as np
# load files
fs, arr1 = wavfile.read('song.wav')  # keep the sample rate for the write below
_, arr2 = wavfile.read('Aaron_Copland-Quiet_City.wav')
print(arr1.shape)
print(arr2.shape)
>>> (1323001,)
>>> (1323000,)
# make new array by concatenating two audio waves
new_arr = np.hstack((arr1, arr2))
print(new_arr.shape)
>>> (2646001,)
# save new audio wave
wavfile.write('new_audio.wav', fs, new_arr)

Read Nist Wav File in TIMIT database into python numpy array

Is this possible? I seem to be getting this error when using wavread from scikits.audiolab:
x86_64.egg/scikits/audiolab/pysndfile/matapi.pyc in basic_reader(filename, last, first)
93 if not hdl.format.file_format == filetype:
94 raise ValueError, "%s is not a %s file (is %s)" \
---> 95 % (filename, filetype, hdl.format.file_format)
96
97 fs = hdl.samplerate
ValueError: si762.wav is not a wav file (is nist)
I'm guessing it can't read NIST wav files, but is there another way to easily read them into a numpy array? If not, what is the best way to go about reading in the data? Possibly rewriting the audiolab wavread to recognize the NIST header?
Answering my own question because I figured it out: you can use the Sndfile class from scikits.audiolab, which supports a multitude of read and write formats depending on the libsndfile you have. Then you just use:
from scikits.audiolab import Sndfile, play
f = Sndfile(filename, 'r')
data = f.read_frames(10000)
play(data) # Just to test the read data
To expand upon J Spen's answer, when using scikits.audiolab, if you want to read the whole file, not just a specified number of frames, you can use the nframes parameter of the Sndfile class to read the whole thing. For example:
from scikits.audiolab import Sndfile, play
f = Sndfile(filename, 'r')
data = f.read_frames(f.nframes)
play(data) # Just to test the read data
I couldn't find any references to this in the documentation, but it is there in the source.
In contrast to the above answers, there is an alternative way to read audio files in multiple formats, e.g., .wav, .aif, .mp3, etc.:
import matplotlib.pyplot as plt
import soundfile as sf
import sounddevice as sd
# https://freewavesamples.com/files/Alesis-Sanctuary-QCard-Crotales-C6.wav
data, fs = sf.read('Alesis-Sanctuary-QCard-Crotales-C6.wav')
print(data.shape, fs)
sd.play(data, fs, blocking=True)
plt.plot(data)
plt.show()
Output:
(88116, 2) 44100
