I'm currently using Azure Speech to Text in my project. It recognizes speech input directly from the microphone (which is what I want) and saves the text output, but I'm also interested in saving that audio input so that I can listen to it later on. Before moving to Azure I was using the Python speech_recognition library with recognize_google, which allowed me to use get_wav_data() to save the input as a .wav file. Is there something similar I can use with Azure? I read the documentation but could only find ways to save audio files for text to speech. My temporary solution is to save the audio input myself first and then run Azure STT on that audio file rather than using the microphone directly, but I'm worried this will slow down the process. Any ideas?
Thank you in advance!
This is Darren from the Microsoft Speech SDK Team. Unfortunately, at the moment there is no built-in support for simultaneously doing live recognition from a microphone and writing the audio to a WAV file. We have heard this customer request before and we will consider adding this feature in a future version of the Speech SDK.
What I think you can do at the moment (it will require a bit of programming on your part) is use the Speech SDK with a push stream. You can write code that reads audio buffers from the microphone and writes them to a WAV file, while at the same time pushing the same buffers into the Speech SDK for recognition. We have Python samples showing how to use the Speech SDK with a push stream; see the function "speech_recognition_with_push_stream" in this file: https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/python/console/speech_sample.py. However, I'm not familiar with Python options for reading real-time audio buffers from a microphone and writing them to a WAV file.
Darren
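To illustrate the push-stream approach described above, here is a minimal sketch (mine, not an official sample): it captures the microphone with pyaudio on a background thread, pushes each buffer into the Speech SDK, and keeps a copy to write out as a WAV file once recognition finishes. The 16 kHz mono capture format matches the push stream's default; the pyaudio capture side is an assumption about your setup.
import os, threading, wave
import pyaudio
import azure.cognitiveservices.speech as speechsdk

RATE, CHANNELS, CHUNK = 16000, 1, 1024

# Assumes SPEECH_KEY and SPEECH_REGION are set in the environment.
speech_config = speechsdk.SpeechConfig(subscription=os.environ['SPEECH_KEY'],
                                       region=os.environ['SPEECH_REGION'])
# The default PushAudioInputStream format is 16 kHz, 16-bit, mono PCM.
push_stream = speechsdk.audio.PushAudioInputStream()
audio_config = speechsdk.audio.AudioConfig(stream=push_stream)
recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config,
                                        audio_config=audio_config)

pa = pyaudio.PyAudio()
mic = pa.open(format=pyaudio.paInt16, channels=CHANNELS, rate=RATE,
              input=True, frames_per_buffer=CHUNK)

frames = []
done = threading.Event()

def feed():
    # Read from the microphone; push each buffer to Azure and keep a copy.
    while not done.is_set():
        data = mic.read(CHUNK, exception_on_overflow=False)
        frames.append(data)
        push_stream.write(data)

feeder = threading.Thread(target=feed, daemon=True)
feeder.start()
result = recognizer.recognize_once_async().get()  # blocks until the utterance ends
done.set()
feeder.join()
push_stream.close()

print(result.text)
with wave.open('capture.wav', 'wb') as wf:  # save the same audio that was recognized
    wf.setnchannels(CHANNELS)
    wf.setsampwidth(pa.get_sample_size(pyaudio.paInt16))
    wf.setframerate(RATE)
    wf.writeframes(b''.join(frames))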
If you use Azure's speech_recognizer.recognize_once_async(), you can simultaneously capture the microphone with pyaudio. Below is the code I use:
#!/usr/bin/env python3
# enter your output path here:
output_file = '/Users/username/micaudio.wav'

import pyaudio, signal, sys, os, requests, wave
pa = pyaudio.PyAudio()
import azure.cognitiveservices.speech as speechsdk

def vocrec_callback(in_data, frame_count, time_info, status):
    global voc_data
    voc_data['frames'].append(in_data)
    return (in_data, pyaudio.paContinue)

def vocrec_start():
    global voc_stream
    global voc_data
    voc_data = {
        'channels': 1 if sys.platform == 'darwin' else 2,
        'rate': 44100,
        'width': pa.get_sample_size(pyaudio.paInt16),
        'format': pyaudio.paInt16,
        'frames': []
    }
    voc_stream = pa.open(format=voc_data['format'],
                         channels=voc_data['channels'],
                         rate=voc_data['rate'],
                         input=True,
                         output=False,
                         stream_callback=vocrec_callback)

def vocrec_stop():
    voc_stream.close()

def vocrec_write():
    with wave.open(output_file, 'wb') as wave_file:
        wave_file.setnchannels(voc_data['channels'])
        wave_file.setsampwidth(voc_data['width'])
        wave_file.setframerate(voc_data['rate'])
        wave_file.writeframes(b''.join(voc_data['frames']))

class SIGINT_handler():
    def __init__(self):
        self.SIGINT = False

    def signal_handler(self, signal, frame):
        self.SIGINT = True
        print('You pressed Ctrl+C!')
        vocrec_stop()
        quit()

def init_azure():
    global speech_recognizer
    # ——— check azure keys
    my_speech_key = os.getenv('SPEECH_KEY')
    if my_speech_key is None:
        error_and_quit("Error: No Azure Key.")
    my_speech_region = os.getenv('SPEECH_REGION')
    if my_speech_region is None:
        error_and_quit("Error: No Azure Region.")
    _headers = {
        'Ocp-Apim-Subscription-Key': my_speech_key,
        'Content-type': 'application/x-www-form-urlencoded',
        # 'Content-Length': '0',
    }
    _URL = f"https://{my_speech_region}.api.cognitive.microsoft.com/sts/v1.0/issueToken"
    _response = requests.post(_URL, headers=_headers)
    if _response.status_code != 200:
        error_and_quit("Error: Wrong Azure Key Or Region.")
    # ——— keys correct. continue
    speech_config = speechsdk.SpeechConfig(subscription=os.environ.get('SPEECH_KEY'),
                                           region=os.environ.get('SPEECH_REGION'))
    audio_config_stt = speechsdk.audio.AudioConfig(use_default_microphone=True)
    speech_config.set_property(speechsdk.PropertyId.SpeechServiceResponse_RequestSentenceBoundary, 'true')
    # ——— disable profanity filter:
    speech_config.set_property(speechsdk.PropertyId.SpeechServiceResponse_ProfanityOption, "2")
    speech_config.speech_recognition_language = "en-US"
    speech_recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config,
        audio_config=audio_config_stt)

def error_and_quit(_error):
    print(_error)
    quit()

def recognize_speech():
    vocrec_start()
    print("Say something: ")
    speech_recognition_result = speech_recognizer.recognize_once_async().get()
    print("Recording done.")
    vocrec_stop()
    vocrec_write()
    quit()

handler = SIGINT_handler()
signal.signal(signal.SIGINT, handler.signal_handler)
init_azure()
recognize_speech()
I want to take an internet audio/radio stream (specifically Longplayer, click for direct stream URL) and play it with Python.
It's preferable that it's backgrounded, such that the script is able to continue running its main loop (e.g. as game background music or something, though Pyglet, PyGame et al. may provide their own tools for that).
I've seen some likely out-of-date examples of recording internet radio using requests and dumping it into a file, but this isn't exactly what I want, and the comments on those answers seemed to argue about requests being problematic, among other things (see here).
I'm open to using any packages you can pip install, so long as they work with Python 3.x. (Currently using 3.6, purely because I haven't gathered the effort to install 3.7 yet.)
To reiterate, I don't want to save the stream, just play it immediately (or with buffering if that's needed) back to the user, preferably without blocking the script, which I imagine would need multithreading/multiprocessing; but that is secondary to just getting playback.
As always seems to be the case with these kinds of apparently simple questions, the devil is in the details. I ended up writing some code that should solve this question. The pip dependencies can be installed using python3 -m pip install ffmpeg-python PyOpenAL. The workflow of the code can be divided into two steps:
1. The code must download binary chunks of mp3 file data from the online stream and convert them to raw PCM data (basically signed 16-bit integer amplitude values) for playback. This is done using the ffmpeg-python library, which is a wrapper for FFmpeg. This wrapper runs FFmpeg in a separate process, so no blocking occurs here.
2. The code must then queue these chunks for playback. This is done using PyOpenAL, which is a wrapper for OpenAL. After creating a device and context to enable audio playback, a 3D-positioned source is created. This source is continuously queued with buffers (simulating a "ring buffer") that are filled with data piped in from FFmpeg. This runs on a separate thread from the first step, making the downloading of new audio chunks independent from audio chunk playback.
Here is what that code looks like (with some commenting). Please let me know if you have any questions about the code or any other part of this answer.
import ctypes
import ffmpeg
from openal.al import *
from openal.alc import *
from threading import Thread
import time

def init_audio():
    #Create an OpenAL device and context.
    device_name = alcGetString(None, ALC_DEFAULT_DEVICE_SPECIFIER)
    device = alcOpenDevice(device_name)
    context = alcCreateContext(device, None)
    alcMakeContextCurrent(context)
    return (device, context)

def create_audio_source():
    #Create an OpenAL source.
    source = ctypes.c_uint()
    alGenSources(1, ctypes.pointer(source))
    return source

def create_audio_buffers(num_buffers):
    #Create a ctypes array of OpenAL buffers.
    buffers = (ctypes.c_uint * num_buffers)()
    buffers_ptr = ctypes.cast(
        ctypes.pointer(buffers),
        ctypes.POINTER(ctypes.c_uint),
    )
    alGenBuffers(num_buffers, buffers_ptr)
    return buffers_ptr

def fill_audio_buffer(buffer_id, chunk):
    #Fill an OpenAL buffer with a chunk of PCM data.
    alBufferData(buffer_id, AL_FORMAT_STEREO16, chunk, len(chunk), 44100)

def get_audio_chunk(process, chunk_size):
    #Fetch a chunk of PCM data from the FFMPEG process.
    return process.stdout.read(chunk_size)

def play_audio(process):
    #Queues up PCM chunks for playing through OpenAL
    num_buffers = 4
    chunk_size = 8192
    device, context = init_audio()
    source = create_audio_source()
    buffers = create_audio_buffers(num_buffers)
    #Initialize the OpenAL buffers with some chunks
    for i in range(num_buffers):
        buffer_id = ctypes.c_uint(buffers[i])
        chunk = get_audio_chunk(process, chunk_size)
        fill_audio_buffer(buffer_id, chunk)
    #Queue the OpenAL buffers into the OpenAL source and start playing sound!
    alSourceQueueBuffers(source, num_buffers, buffers)
    alSourcePlay(source)
    num_used_buffers = ctypes.pointer(ctypes.c_int())
    while True:
        #Check if any buffers are used up/processed and refill them with data.
        alGetSourcei(source, AL_BUFFERS_PROCESSED, num_used_buffers)
        if num_used_buffers.contents.value != 0:
            used_buffer_id = ctypes.c_uint()
            used_buffer_ptr = ctypes.pointer(used_buffer_id)
            alSourceUnqueueBuffers(source, 1, used_buffer_ptr)
            chunk = get_audio_chunk(process, chunk_size)
            fill_audio_buffer(used_buffer_id, chunk)
            alSourceQueueBuffers(source, 1, used_buffer_ptr)

if __name__ == "__main__":
    url = "http://icecast.spc.org:8000/longplayer"
    #Run FFMPEG in a separate process using subprocess, so it is non-blocking
    process = (
        ffmpeg
        .input(url)
        .output("pipe:", format='s16le', acodec='pcm_s16le', ac=2, ar=44100, loglevel="quiet")
        .run_async(pipe_stdout=True)
    )
    #Run audio playing OpenAL code in a separate thread
    thread = Thread(target=play_audio, args=(process,), daemon=True)
    thread.start()
    #Some example code to show that this is not being blocked by the audio.
    start = time.time()
    while True:
        print(time.time() - start)
With pyminiaudio (it provides an icecast stream source class):
import miniaudio

def title_printer(client: miniaudio.IceCastClient, new_title: str) -> None:
    print("Stream title: ", new_title)

with miniaudio.IceCastClient("http://icecast.spc.org:8000/longplayer",
                             update_stream_title=title_printer) as source:
    print("Connected to internet stream, audio format:", source.audio_format.name)
    print("Station name: ", source.station_name)
    print("Station genre: ", source.station_genre)
    print("Press <enter> to quit playing.\n")
    stream = miniaudio.stream_any(source, source.audio_format)
    with miniaudio.PlaybackDevice() as device:
        device.start(stream)
        input()  # wait for user input, stream plays in background
I am developing a virtual assistant. I am using a Google speech-to-text converter, and I am unable to keep the audio input continuous. I think there may be a way to use two environments: one for listening and converting speech to text, and the other for the rest of the processing.
I don't want to change my STT engine. I just want to know whether it is possible to simultaneously switch between environments. If yes, how?
Here is my input.py file; wherever I need to take audio input, I call the function start_listening():
import speech_recognition as sr
import output
import winsound

def start_listening():
    r = sr.Recognizer()
    with sr.Microphone() as source:
        # output.speak("Listening")
        r.adjust_for_ambient_noise(source)
        audio = r.record(source, duration=5)
        try:
            return r.recognize_google(audio)
        except:
            output.speak("Unable to Translate, speak again")
            return start_listening()  # retry and return the eventual result
Here is my processing.py file:
import input as listener
import output as speak
import clock
import query_processor as mind
import rideBooking

# First greeting at startup; select the greeting according to the time
speak.speak(clock.get_part_of_day())

def searching_for_word(word, sentence):
    if word in sentence:
        return True
    else:
        return False

def main_organ():
    input = listener.start_listening()
    inputType = mind.classify(input)
    if inputType == 'whatever':
        pass  # run different functions on different commands
    main_organ()

# run the app with the below code
if __name__ == "__main__":
    main_organ()
While the processing is running, the app is unable to listen; it can only start listening again once the processing has fully completed.
You can create multiple processes.
To do that, use the multiprocessing module: run the listener in a multiprocessing.Process. A process cannot simply hand back a return value, so use a multiprocessing.Queue to pass the recognized text to the main process as it arrives.
You don't need multiple environments; a minimal sketch of this pattern follows.
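This sketch is mine, not the poster's code; the five-second recording window mirrors the question's input.py, everything else is assumed:
import multiprocessing as mp
import speech_recognition as sr

def listen_loop(q):
    # Runs in its own process: listen in five-second windows and
    # push any recognized text onto the queue.
    r = sr.Recognizer()
    with sr.Microphone() as source:
        r.adjust_for_ambient_noise(source)
        while True:
            audio = r.record(source, duration=5)
            try:
                q.put(r.recognize_google(audio))
            except sr.UnknownValueError:
                pass  # nothing intelligible; keep listening

if __name__ == "__main__":
    q = mp.Queue()
    mp.Process(target=listen_loop, args=(q,), daemon=True).start()
    while True:
        command = q.get()  # blocks until the listener produces text
        print("processing:", command)  # heavy processing goes here while listening continues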
I am using pynfc to read NFC tags. I have an ACR122U USB NFC reader/writer unit. This unit is capable of making a sound when it reads a tag; however, I was unable to find anything in the pynfc docs about controlling it. Is there a way, with either pynfc, some other Python library, or the Linux OS, to invoke the sound of an NFC reader?
Here is an example to buzz the buzzer:
Add the following code to pynfc/__init__.py at line 75 (above def poll, at the same indentation):
def buzz(self):
    # ACR122U LED/buzzer control pseudo-APDU: one short beep
    ba = (c_ubyte * 9)(*[0xFF, 0x00, 0x40, 0x00, 0x4C, 0x10, 0x00, 0x01, 0x01])
    result = nfc.nfc_initiator_transceive_bytes.argtypes[3]._type_()
    nfc.nfc_initiator_transceive_bytes(self.pdevice, ctypes.byref(ba), len(ba),
                                       ctypes.byref(result), 2, 1000)
Call nfc.buzz() from your script.
I do not have a device to test the code. Also note that you cannot poll and buzz at the same time.
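For example, hypothetical usage (the Nfc class name and the libnfc connection string 'acr122_usb' are assumptions about your setup; the patch above must already be applied to the class that owns poll()):
from pynfc import Nfc

reader = Nfc("acr122_usb")  # adjust the connection string for your reader
reader.buzz()               # one beep; remember not to poll at the same time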
For nfcpy, I found that if the on-connect function returns True, the buzzer and the light will go off, provided the reader is capable.
#!/usr/bin/python
import nfc
import time
import datetime

def on_connect(tag):
    print('Last read: {}'.format(datetime.datetime.now()))
    return True

while True:
    with nfc.ContactlessFrontend('usb') as clf:
        clf.connect(rdwr={'on-connect': on_connect, 'beep-on-connect': True})
    time.sleep(1)
I am trying to stream video and audio data from a YouTube video so that I can do some video and audio analysis separately, which is then superimposed on the frames using OpenCV. I have this working perfectly fine with files but want to extend it to streaming from YouTube.
At the moment, I've thought of using the VLC Python bindings to stream from YouTube, but I'm not sure how to extract the frames from this video.
Here is the VLC code that performs YouTube streaming at the moment:
import vlc
import time
import numpy as np
from ctypes import *

class MyDisplay(vlc.VideoDisplayCb):
    def __doc__(o, p):
        print("fsdfs")

class MyLock(vlc.VideoLockCb):
    def __doc__():
        raise Exception("sdsds")
        return np.array(500, 500).__array_interface__['data']

class MyPlayback(vlc.AudioPlayCb):
    def from_param(self, a, b, c, d):
        print("asfds")

def callbck(a, b, c, d):
    print('aa')
    print(a)
    print(b)
    print(c)
    print(d)
    return 'a'

if __name__ == '__main__':
    url = 'https://www.youtube.com/watch?v=F82XtLmL0tU'
    i = vlc.Instance('--verbose 2'.split())
    media = i.media_new(url)
    media_list = i.media_list_new([url])
    p = i.media_player_new()
    p.set_media(media)
    lp = i.media_list_player_new()
    lp.set_media_player(p)
    lp.set_media_list(media_list)
    CMPFUNC = CFUNCTYPE(c_char, c_void_p, c_void_p, c_uint, c_long)
    lp.next()
    lock = MyLock()
    display = MyDisplay()
    playback = MyPlayback()
    p.audio_set_callbacks(CMPFUNC(callbck), None, None, None, None, None)
    p.play()
    time.sleep(5)
    r = p.video_take_snapshot(0, 'rnd.pong', 0, 0)
How could I produce a stream of frames and audio data using VLC (with Python bindings)? Also, is there another way to do this (using FFmpeg, for example)?
Thanks
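One possible direction (a sketch, not a tested solution): skip VLC for the analysis path and decode with ffmpeg-python, the same FFmpeg wrapper used in the radio-stream answer above. Given a direct media URL (which you would first resolve from the YouTube page with a tool such as yt-dlp), FFmpeg can pipe raw BGR frames that OpenCV consumes directly. The URL and the frame size below are placeholders:
import ffmpeg
import numpy as np

direct_url = 'https://example.com/stream'  # hypothetical resolved stream URL
width, height = 1280, 720                  # assumed; use ffmpeg.probe() in practice

process = (
    ffmpeg
    .input(direct_url)
    .output('pipe:', format='rawvideo', pix_fmt='bgr24', s=f'{width}x{height}')
    .run_async(pipe_stdout=True)
)

frame_bytes = width * height * 3
while True:
    raw = process.stdout.read(frame_bytes)
    if len(raw) < frame_bytes:
        break  # stream ended
    frame = np.frombuffer(raw, np.uint8).reshape(height, width, 3)
    # frame is now an OpenCV-compatible BGR ndarray; run your analysis here
Audio could be extracted the same way with a second FFmpeg output (format='s16le'), as in the radio-stream answer above.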
I'm looking for a method to play MIDI files in Python.
It seems Python does not support MIDI in its standard library.
After searching, I found some Python MIDI libraries such as pythonmidi.
However, most of them can only create and read MIDI files, without a playing function.
I would like to find a Python MIDI library that includes a playing method.
Any recommendations? Thanks!
The pygame module can be used to play midi files.
http://www.pygame.org/docs/ref/music.html
See the example here:
http://www.daniweb.com/software-development/python/code/216979
A whole bunch of options is available at:
http://wiki.python.org/moin/PythonInMusic
and also here, which you can modify to suit your purpose:
http://xenon.stanford.edu/~geksiong/code/playmus/playmus.py
Just to add a minimal example (via DaniWeb):
# conda install -c cogsci pygame
import pygame
def play_music(midi_filename):
'''Stream music_file in a blocking manner'''
clock = pygame.time.Clock()
pygame.mixer.music.load(midi_filename)
pygame.mixer.music.play()
while pygame.mixer.music.get_busy():
clock.tick(30) # check if playback has finished
midi_filename = 'FishPolka.mid'
# mixer config
freq = 44100 # audio CD quality
bitsize = -16 # unsigned 16 bit
channels = 2 # 1 is mono, 2 is stereo
buffer = 1024 # number of samples
pygame.mixer.init(freq, bitsize, channels, buffer)
# optional volume 0 to 1.0
pygame.mixer.music.set_volume(0.8)
# listen for interruptions
try:
# use the midi file you just saved
play_music(midi_filename)
except KeyboardInterrupt:
# if user hits Ctrl/C then exit
# (works only in console mode)
pygame.mixer.music.fadeout(1000)
pygame.mixer.music.stop()
raise SystemExit
pretty_midi can generate the waveform for you; you can then play it with e.g. IPython.display.Audio:
from IPython.display import Audio
from pretty_midi import PrettyMIDI
sf2_path = 'path/to/sf2' # path to sound font file
midi_file = 'music.mid'
music = PrettyMIDI(midi_file=midi_file)
waveform = music.fluidsynth(sf2_path=sf2_path)
Audio(waveform, rate=44100)
Use pygame to play your midi file. Examples are here or here
I find that midi2audio works well.
Example:
from midi2audio import FluidSynth
#Play MIDI
FluidSynth().play_midi('input.mid')
#Synthesize MIDI to audio
# Note: the default sound font is in 44100 Hz sample rate
fs = FluidSynth()
fs.midi_to_audio('input.mid', 'output.wav')
# FLAC, a lossless codec, is recommended
fs.midi_to_audio('input.mid', 'output.flac')
On macOS, you can use the pyObjC library to access the OS's own MIDI handling routines. This script will play midi files given as arguments.
#!/usr/bin/env python3
from AVFoundation import AVMIDIPlayer
from Foundation import NSURL
import time
import sys

def myCompletionHandler():
    return

def playMIDIFile(filepath):
    midiFile = NSURL.fileURLWithPath_(filepath)
    midiPlayer, error = AVMIDIPlayer.alloc().initWithContentsOfURL_soundBankURL_error_(midiFile, None, None)
    if error:
        print(error)
        sys.exit(1)
    MIDItime = midiPlayer.duration()
    midiPlayer.prepareToPlay()
    midiPlayer.play_(myCompletionHandler)
    if not midiPlayer.isPlaying():
        midiPlayer.stop()
    else:
        time.sleep(MIDItime)
    return

if __name__ == "__main__":
    for filename in sys.argv[1:]:
        playMIDIFile(filename)