How to convert PyAudio bytes into a virtual file?

In Short
Is there a way to convert raw audio data (obtained with the PyAudio module) into a virtual file object (like the one returned by Python's open() function), without saving it to disk and reading it back? Details are provided below.
What Am I Doing
I'm using PyAudio to record audio, which is then fed into a TensorFlow model to get a prediction. Currently, it works if I first save the recorded sound as a .wav file on disk and then read it back to feed it into the model. Here is the recording and saving code:
import pyaudio
import wave
CHUNK_LENGTH = 1024
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 16000
RECORD_SECONDS = 1
p = pyaudio.PyAudio()
stream = p.open(format=FORMAT,
                channels=CHANNELS,
                rate=RATE,
                input=True,
                frames_per_buffer=CHUNK_LENGTH)
print("* recording")
frames = [stream.read(RATE * RECORD_SECONDS)] # here is the recorded data, in the form of list of bytes
print("* done recording")
stream.stop_stream()
stream.close()
p.terminate()
After I get the raw audio data (the frames variable), it can be saved using Python's wave module as below. Note that when saving, some metadata must be written by calling methods like wf.setxxx.
import os
from datetime import datetime
output_dir = "data/"
output_path = output_dir + "{:%Y%m%d_%H%M%S}.wav".format(datetime.now())
if not os.path.exists(output_dir):
os.makedirs(output_dir)
# save the recorded data as wav file using python `wave` module
wf = wave.open(output_path, 'wb')
wf.setnchannels(CHANNELS)
wf.setsampwidth(p.get_sample_size(FORMAT))
wf.setframerate(RATE)
wf.writeframes(b''.join(frames))
wf.close()
And here is the code that runs inference on the TensorFlow model using the saved file. It simply reads the file as binary, and the model handles the rest.
import classifier # my tensorflow model
with open(output_path, 'rb') as f:
    w = f.read()
classifier.run_graph(w, labels, 5)
THE PROBLEM
For real-time use, I need to keep streaming the audio and feeding it into the model every so often. But it seems unreasonable to keep saving the file to disk and reading it back again and again, which wastes a lot of time on I/O.
I want to keep the data in memory and use it directly, rather than saving and reading it repeatedly. However, Python's wave module does not support reading and writing simultaneously (see here).
If I directly feed the data without the metadata (e.g. channels, frame rate) that the wave module adds during saving, like this:
w = b''.join(frames)
classifier.run_graph(w, labels, 5)
I get the following error:
2021-04-07 11:05:08.228544: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at decode_wav_op.cc:55 : Invalid argument: Header mismatch: Expected RIFF but found
Traceback (most recent call last):
  File "C:\Users\anaconda3\envs\tensorflow\lib\site-packages\tensorflow_core\python\client\session.py", line 1365, in _do_call
    return fn(*args)
  File "C:\Users\anaconda3\envs\tensorflow\lib\site-packages\tensorflow_core\python\client\session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "C:\Users\anaconda3\envs\tensorflow\lib\site-packages\tensorflow_core\python\client\session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Header mismatch: Expected RIFF but found
The tensorflow model I'm using is provided here: ML-KWS-for-MCU, hope this helps.
Here is the code that produces the error (classifier.run_graph()):
def run_graph(wav_data, labels, num_top_predictions):
    """Runs the audio data through the graph and prints predictions."""
    with tf.Session() as sess:
        # Feed the audio data as input to the graph.
        # predictions will contain a two-dimensional array, where one
        # dimension represents the input image count, and the other has
        # predictions per class
        softmax_tensor = sess.graph.get_tensor_by_name("labels_softmax:0")
        predictions, = sess.run(softmax_tensor, {"wav_data:0": wav_data})
        # Sort to show labels in order of confidence
        top_k = predictions.argsort()[-num_top_predictions:][::-1]
        for node_id in top_k:
            human_string = labels[node_id]
            score = predictions[node_id]
            print('%s (score = %.5f)' % (human_string, score))
    return 0

You should be able to use io.BytesIO instead of a physical file; they share the same interface, but BytesIO is kept only in memory:
import io
container = io.BytesIO()
wf = wave.open(container, 'wb')
wf.setnchannels(4)
wf.setsampwidth(4)
wf.setframerate(4)
wf.writeframes(b'abcdef')
# Read the data up to this point
container.seek(0)
data_package = container.read()
# add some more data...
wf.writeframes(b'ghijk')
# read the data added since last
container.seek(len(data_package))
data_package = container.read()
This should allow you to continuously stream the data into the file while reading the excess using your TensorFlow code.
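Applied to the recording code from the question, the whole round trip can stay in memory. A minimal sketch using only the standard library (the classifier.run_graph call is left commented out, since it depends on the question's model):

```python
import io
import wave

def frames_to_wav_bytes(frames, channels=1, sample_width=2, rate=16000):
    """Wrap raw PCM frames in a WAV header entirely in memory."""
    container = io.BytesIO()
    wf = wave.open(container, 'wb')
    wf.setnchannels(channels)
    wf.setsampwidth(sample_width)  # 2 bytes matches pyaudio.paInt16
    wf.setframerate(rate)
    wf.writeframes(b''.join(frames))
    wf.close()
    return container.getvalue()

# one second of silence at 16 kHz, 16-bit mono
wav_bytes = frames_to_wav_bytes([b'\x00\x00' * 16000])
# classifier.run_graph(wav_bytes, labels, 5)  # feed directly, no disk I/O
```

The resulting bytes start with a proper RIFF header, so the decode_wav op should accept them just like the bytes read from a file on disk.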

Related

How to play an audio file from python terminal

I am using the librosa library to do data analysis on an audio file in .wav format. But it seems librosa can only read or write audio files in the form of an array, apart from feature extraction. I would also like to play the audio file with my analysis code.
In an IPython notebook, I can use IPython.display.Audio to play audio directly, but when I convert the code to a .py file, it doesn't work, so I need something that can be used for the same purpose.
You could use pydub to load the audio file (mp3, wav, ogg, raw) and simpleaudio for playback. Just do
import pydub
import simpleaudio

sound = pydub.AudioSegment.from_wav('audiofile.wav')
playback = simpleaudio.play_buffer(
    sound.raw_data,
    num_channels=sound.channels,
    bytes_per_sample=sound.sample_width,
    sample_rate=sound.frame_rate
)
And voilà! You finally got your beats going. To stop, just call playback.stop().
If you want to use a blocking mode, where execution waits until the streaming is finished, you can use PyAudio's blocking mode (full documentation).
Example:
"""PyAudio Example: Play a wave file."""
import pyaudio
import wave
import sys
CHUNK = 1024
if len(sys.argv) < 2:
    print("Plays a wave file.\n\nUsage: %s filename.wav" % sys.argv[0])
    sys.exit(-1)
wf = wave.open(sys.argv[1], 'rb')
# instantiate PyAudio (1)
p = pyaudio.PyAudio()
# open stream (2)
stream = p.open(format=p.get_format_from_width(wf.getsampwidth()),
                channels=wf.getnchannels(),
                rate=wf.getframerate(),
                output=True)
# read data
data = wf.readframes(CHUNK)
# play stream (3)
while len(data) > 0:
    stream.write(data)
    data = wf.readframes(CHUNK)
# stop stream (4)
stream.stop_stream()
stream.close()
# close PyAudio (5)
p.terminate()

Saving audio from mp4 as wav file using Moviepy Audiofile

I have a video file named 'video.mp4'. I am trying to separate a section of audio from the video and save it as a wav file that can be used with other Python modules. I want to do this with MoviePy.
I send parameters to the write_audiofile function, specifying the filename, fps, nbyte, and codec.
Following the MoviePy AudioClip docs, I specified the codec as ‘pcm_s32le’ for a 32-bit wav file.
from moviepy.editor import *
sound = AudioFileClip("video.mp4")
newsound = sound.subclip("00:00:13","00:00:15") #audio from 13 to 15 seconds
newsound.write_audiofile("sound.wav", 44100, 2, 2000,"pcm_s32le")
This code generates a .wav file, named 'sound.wav'.
Opening the audio file in Audacity
The resulting file, sound.wav, can be opened in Audacity, however I run into problems when I try to use it as a wav file with other Python modules.
Playing the sound file in pygame
import pygame
pygame.mixer.init()
sound=pygame.mixer.Sound("sound.wav")
The third line gives the following error:
pygame.error: Unable to open file 'sound.wav'
Determining type of sound file using sndhdr.what()
import sndhdr
sndhdr.what("sound.wav")
The sndhdr method returned None. According to the docs, when this happens, the method failed to determine the type of sound data stored in the file.
Reading the file with Google Speech Recognition
import speech_recognition as sr
r = sr.Recognizer()
audio = "sound.wav"
with sr.AudioFile(audio) as source:
audio = r.record(source)
text= r.recognize_google(audio)
print(text)
This code stops execution on the second to last line:
ValueError: Audio file could not be read as PCM WAV, AIFF/AIFF-C, or Native FLAC; check if file is corrupted or in another format
Why does the audio file open in Audacity, if sndhdr.what() can not recognize it as an audio file type?
How can I properly export a MoviePy AudioClip as a wav file?
I had the same issue with no codec specified or with codec='pcm_s32le'; the one that worked for me was pcm_s16le.
Note that I am using the "fr-FR" language; you should adapt it to your needs.
Here is the entire code:
# Python code to convert video to audio
import moviepy.editor as mp
import speech_recognition as sr
# Insert Local Video File Path
clip = mp.VideoFileClip("/tmp/data/test.mp4")
# Insert Local Audio File Path
clip.audio.write_audiofile("/tmp/data/test.wav",codec='pcm_s16le')
# initialize the recognizer
r = sr.Recognizer()
# open the file
with sr.AudioFile("/tmp/data/test.wav") as source:
    # listen for the data (load audio to memory)
    audio_data = r.record(source)
    # recognize (convert from speech to text)
    text = r.recognize_google(audio_data, language="fr-FR")
    print(text)
I had the same issue. I was trying to get an mp4 file from a URL, then convert it into a wav file and call Google Speech Recognition on it. Instead, I used pydub to handle the conversion, and it worked! Here's a sample of the code:
import requests
import io
import speech_recognition as sr
from pydub import AudioSegment
# This function translate speech to text
def speech_to_text(file):
    recognizer = sr.Recognizer()
    audio = sr.AudioFile(file)
    with audio as source:
        speech = recognizer.record(source)
    try:
        # Call recognizer with audio and language
        text = recognizer.recognize_google(speech, language='pt-BR')
        print("Você disse: " + text)
        return text
    except:  # If the recognizer doesn't understand
        print("Não entendi")

def mp4_to_wav(file):
    audio = AudioSegment.from_file(file, format="mp4")
    audio.export("audio.wav", format="wav")
    return audio

def mp4_to_wav_mem(file):
    audio = AudioSegment.from_file_using_temporary_files(file, 'mp4')
    file = io.BytesIO()
    file = audio.export(file, format="wav")
    file.seek(0)
    return file
url = ''
r = requests.get(url, stream=True)
file = io.BytesIO(r.content)
file = mp4_to_wav_mem(file)
speech_to_text(file)
Note that I wrote two functions: mp4_to_wav and mp4_to_wav_mem. The only difference is that mp4_to_wav_mem handles everything in memory, while mp4_to_wav generates a .wav file on disk.
I read the MoviePy docs and found that the parameter nbyte should be consistent with codec. nbyte is the sample width (set to 2 for 16-bit sound, 4 for 32-bit sound). Hence, it is better to set nbyte=4 when you set codec='pcm_s32le'.
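One way to keep the two parameters in sync is to derive nbyte from the codec name. The helper below is a hypothetical illustration, not part of MoviePy:

```python
# Hypothetical helper: sample width in bytes for common PCM codec names
PCM_SAMPLE_WIDTHS = {"pcm_s16le": 2, "pcm_s32le": 4}

def nbyte_for_codec(codec):
    """Return the nbyte value matching a little-endian PCM codec name."""
    return PCM_SAMPLE_WIDTHS[codec]

codec = "pcm_s32le"
# Passing both values from the same source keeps them consistent:
# newsound.write_audiofile("sound.wav", 44100, nbyte_for_codec(codec), 2000, codec)
```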
I think this is the right method:
import os
from moviepy.editor import AudioFileClip
PATH= "files/"
fileName = "nameOfYourFile.mp4"
newFileName = "nameOfTheNewFile"
Ext = "wav"
AudioFileClip(os.path.join(PATH, f"{fileName}")).write_audiofile(os.path.join(PATH, f"{newFileName}.{Ext}"))
I think this approach is very easy to understand.
from moviepy.editor import *
input_file = "../Database/myvoice.mp4"
output_file = "../Database/myvoice.wav"
sound = AudioFileClip(input_file)
sound.write_audiofile(output_file, 44100, 2, 2000,"pcm_s32le")

Read in Python a wav file to which data is being appended

I need to transcribe the speech that is being written to a wav file. I've implemented the following iterator to try to incrementally read the audio from the file:
import wave
def read_audio(path, chunk_size=1024):
    wave_file = wave.open(open(path, 'rb'))
    while True:
        data = wave_file.readframes(chunk_size)
        if data != "":
            yield data
In order to test the generator, I've implemented a function that keeps writing to a wav file the audio captured by the computer's microphone:
import pyaudio
def record_to_file(out_path):
    fmt = pyaudio.paInt16
    channels = 1
    rate = 16000
    chunk = 1024
    audio = pyaudio.PyAudio()
    stream = audio.open(format=fmt, channels=channels,
                        rate=rate, input=True,
                        frames_per_buffer=chunk)
    wave_file = wave.open(out_path, 'wb')
    wave_file.setnchannels(channels)
    wave_file.setsampwidth(audio.get_sample_size(fmt))
    wave_file.setframerate(rate)
    while True:
        data = stream.read(chunk)
        wave_file.writeframes(data)
Below is the test script:
import threading
import time

WAV_PATH = 'out.wav'

def record_worker():
    record_to_file(WAV_PATH)

if __name__ == '__main__':
    t = threading.Thread(target=record_worker)
    t.setDaemon(True)
    t.start()
    time.sleep(5)
    reader = read_audio(WAV_PATH)
    for chunk in reader:
        print(len(chunk))
It doesn't work as I'd expect: the reader stops yielding after a while. Since the test succeeds if I adapt record_to_file to set the wav file's nframes to a very large number beforehand and do the writing with writeframesraw, my guess is that wave.open eagerly reads nframes and never tries to read past that number of frames.
Is it possible to obtain this incremental read in Python 2.7 without resorting to the setnframes hack? It's worth noting that, contrary to the test script, I have no control over the wav file's generation in the scenario where I plan to use this feature. The writing is done by a SWIG-wrapped C library named pjsip (http://www.pjsip.org/python/pjsua.htm), so I don't expect it to be possible to make any modifications on that end.
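The eager-nframes behaviour described above can be demonstrated with the standard wave module alone. A minimal in-memory sketch of the setnframes hack (the writer is deliberately left unclosed, because closing it would patch the header back to the real frame count):

```python
import io
import wave

buf = io.BytesIO()
wf = wave.open(buf, 'wb')
wf.setnchannels(1)
wf.setsampwidth(2)
wf.setframerate(16000)
wf.setnframes(10 ** 7)                 # claim a huge frame count up front
wf.writeframesraw(b'\x00\x00' * 1024)  # raw write keeps the claimed count
# do NOT call wf.close(): on close, wave patches nframes to the real count

# A reader opening this header sees the claimed count, not the frames present
rf = wave.open(io.BytesIO(buf.getvalue()), 'rb')
print(rf.getnframes())
```

Because the reader trusts the header's frame count, readframes() keeps returning whatever data is actually available, which is why the hack makes the incremental read in the test script succeed.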

Error doing speech recognition with pygsr

I'm attempting to get speech recognition for searching working for a project I'm working on. At the minute, I'm only focussing on getting the speech recognition working and I'm using pygsr to do this. I found a post about pygsr on here earlier but I'm currently struggling to get it to work. This is the code that I'm using:
from pygsr import Pygsr
speech = Pygsr()
speech.record(3)
phrase, complete_response = speech.speech_to_text('en_US')
print phrase
After spending a while installing the library on OS X, I finally got it to sort of work: it detected the library and seemed to run, but then I would get this error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 4, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pygsr/__init__.py", line 33, in record
    data = stream.read(self.chunk)
  File "/Library/Python/2.7/site-packages/pyaudio.py", line 605, in read
    return pa.read_stream(self._stream, num_frames)
IOError: [Errno Input overflowed] -9981
I have no idea if this is due to something I'm doing wrong or if I can't use pygsr on OS X. If there is no way for this to work, does anyone have any recommendations for a speech recognition library for OS X that uses Python 2.7?
You could test whether PyAudio works correctly by running this script:
import pyaudio
import wave
CHUNK = 1024
FORMAT = pyaudio.paInt16
CHANNELS = 2
RATE = 44100
RECORD_SECONDS = 5
WAVE_OUTPUT_FILENAME = "output.wav"
p = pyaudio.PyAudio()
stream = p.open(format=FORMAT,
channels=CHANNELS,
rate=RATE,
input=True,
frames_per_buffer=CHUNK)
print("* recording")
frames = []
for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    data = stream.read(CHUNK)
    frames.append(data)
print("* done recording")
stream.stop_stream()
stream.close()
p.terminate()
wf = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
wf.setnchannels(CHANNELS)
wf.setsampwidth(p.get_sample_size(FORMAT))
wf.setframerate(RATE)
wf.writeframes(b''.join(frames))
wf.close()
I have received many reports that pygsr is not working on macOS, but I could not fix it because I could not test it on a Mac.

POST ".wav" File to URL with Python, print Response How To?

So here we have a Python script:
""" Record a few seconds of audio and save to a WAVE file. """
import pyaudio
import wave
import sys
chunk = 1024
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 44100
RECORD_SECONDS = 5
WAVE_OUTPUT_FILENAME = "output.wav"
p = pyaudio.PyAudio()
stream = p.open(format=FORMAT,
                channels=CHANNELS,
                rate=RATE,
                input=True,
                frames_per_buffer=chunk)
print "* recording"
all = []
for i in range(0, RATE / chunk * RECORD_SECONDS):
    data = stream.read(chunk)
    all.append(data)
print "* done recording"
stream.close()
p.terminate()
# write data to WAVE file
data = ''.join(all)
wf = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
wf.setnchannels(CHANNELS)
wf.setsampwidth(p.get_sample_size(FORMAT))
wf.setframerate(RATE)
wf.writeframes(data)
wf.close()
And this script does what the first comment line says: if you run it in a terminal, it will output a ".wav" file in the directory you're in at the moment of execution... What I want to do is take that file and manipulate it. Instead of writing it out to the computer, I want to store it in a variable or something like that, and then POST it to a URL, passing some parameters along with it... I saw some interesting examples of posting multipart-encoded files using requests, as you can see here:
http://docs.python-requests.org/en/latest/user/quickstart/
But I made several attempts at achieving what I'm describing in this question and was unlucky... Maybe a little guidance will help with this one :)
To be brief, what I need is to record a WAV file from the microphone, then POST it to a URL (passing data like the headers with it), and then get the output in a print statement or something like that in the terminal...
Thank You!!
wave.open lets you pass either a file name or a file-like object to save into. If you pass in a StringIO object rather than WAVE_OUTPUT_FILENAME, you can get a string object that you can presumably use to construct a POST request.
Note that this will load the file into memory -- if it might be really long, you might prefer to do it into a temporary file and then use that to make your request. Of course, you're already loading it into memory, so maybe that's not an issue.
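On Python 3 the same idea works with io.BytesIO. A sketch under stated assumptions: the upload URL and header are hypothetical placeholders, and the third-party requests library is assumed to be installed (the POST itself is left commented out):

```python
import io
import wave

def wav_buffer(frames, channels=1, sample_width=2, rate=44100):
    """Build a complete in-memory WAV file from raw PCM frames."""
    buf = io.BytesIO()
    wf = wave.open(buf, 'wb')
    wf.setnchannels(channels)
    wf.setsampwidth(sample_width)
    wf.setframerate(rate)
    wf.writeframes(b''.join(frames))
    wf.close()
    buf.seek(0)  # rewind so the buffer can be read like a freshly opened file
    return buf

buf = wav_buffer([b'\x00\x00' * 44100])  # one second of silence

# Hypothetical endpoint and header -- adapt to the actual API:
# import requests
# resp = requests.post("http://example.com/upload",
#                      files={"file": ("output.wav", buf, "audio/wav")},
#                      headers={"X-Api-Key": "hypothetical-key"})
# print(resp.text)
```

The buffer behaves like the file object that open(WAVE_OUTPUT_FILENAME, 'rb') would return, so requests can stream it as a multipart upload without anything ever touching the disk.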
