Python speech recognition very slow

I am developing a smart assistant program: it listens to what the user says and runs code based on that. It worked fine until today, when I switched to my laptop. The program does not print any errors, but it also doesn't print what I said. I'm using the Python SpeechRecognition library, version 3.8.1. Does anybody know of an alternative to this library? If so, please explain how I would use it 'on the fly' (real-time speech, without first recording a file and then sending it to the server).
EDIT: I forgot to mention in the post that I'm using Python 3.
EDIT: Here's the code:
#!/usr/bin/env python3
import speech_recognition as sr
global x
def speech():
    try:
        with sr.Microphone() as source:
            global x
            r = sr.Recognizer()
            audio = r.listen(source)
            x = r.recognize_google(audio)
    except sr.UnknownValueError:
        print("No clue what you said, listening again... \n")
        speech()
if __name__ == '__main__':
    print('Listening and printing what I heard: \n')
    speech()
    print(x)

I found that the problem was in the laptop's microphone. The speech recognition worked fine after I plugged in my Blue Snowball. I forced the program to use the Blue Snowball by going into pavucontrol and selecting the Blue Snowball under the recording tab.
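If you would rather pick the microphone from the script instead of through pavucontrol, something along these lines might work. It is only a sketch: find_device_index and listen_with are names I made up, and the exact device name depends on what your system reports. Microphone.list_microphone_names() and the device_index argument are part of SpeechRecognition.

```python
def find_device_index(names, wanted):
    """Return the index of the first device whose name contains `wanted`
    (case-insensitive), or None if nothing matches."""
    wanted = wanted.lower()
    for index, name in enumerate(names):
        # Some entries can be None, so skip anything without a name.
        if name and wanted in name.lower():
            return index
    return None


def listen_with(device_name):
    """Capture one phrase from the named microphone and return the text.

    Hypothetical helper: needs SpeechRecognition, PyAudio and a network
    connection for recognize_google().
    """
    import speech_recognition as sr  # third-party, imported lazily

    names = sr.Microphone.list_microphone_names()
    index = find_device_index(names, device_name)
    if index is None:
        raise LookupError(f"No input device matching {device_name!r}; found: {names}")

    recognizer = sr.Recognizer()
    with sr.Microphone(device_index=index) as source:
        audio = recognizer.listen(source)
    return recognizer.recognize_google(audio)
```

Calling listen_with("snowball") would then use the first device whose name contains "snowball", whatever the system default is.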

Another reason could be that your mic levels are either too high or too low. In both cases speech_recognition gets either too little or too much audio to work with. Check the levels in your system settings. It helped me; I hope it helps you.
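To check the levels from code rather than by eye, a rough sketch like the following could help. It measures the RMS and peak of 16-bit little-endian PCM bytes, which you can get from a captured phrase with audio.get_raw_data(convert_width=2). The function names and both thresholds are my own assumptions, not anything defined by the library, so tune them for your setup.

```python
import math
import struct


def pcm16_levels(raw):
    """Return (rms, peak) of 16-bit little-endian PCM bytes,
    each as a fraction of full scale (0.0 to 1.0)."""
    usable = len(raw) - len(raw) % 2  # drop a trailing odd byte, if any
    if usable == 0:
        return 0.0, 0.0
    samples = [s for (s,) in struct.iter_unpack("<h", raw[:usable])]
    rms = math.sqrt(sum(s * s for s in samples) / len(samples)) / 32768
    peak = max(abs(s) for s in samples) / 32768
    return rms, peak


def describe_level(raw, quiet_rms=0.01, clip_peak=0.99):
    """Classify a recording as 'too loud', 'too quiet' or 'ok'.

    The thresholds are illustrative guesses, not calibrated values.
    """
    rms, peak = pcm16_levels(raw)
    if peak >= clip_peak:
        return "too loud"
    if rms < quiet_rms:
        return "too quiet"
    return "ok"
```

If this keeps reporting "too quiet" or "too loud" for normal speech, adjust the input gain in your system settings before blaming the recognizer.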

Related

Using Tkinter for a subtitle application, Python

I am trying to create an automatic subtitle writer program to be used for online presentations using Python. Basically, the program will listen to your microphone, and display the words you've said on your screen. This is what I have done so far:
import speech_recognition as sr
r = sr.Recognizer()
count = 1
while count <= 5:
    with sr.Microphone() as source:
        print(" ")
        audio = r.listen(source)
        try:
            text = r.recognize_google(audio)
            print("{}".format(text))
        except:
            print("")
Now I know it's not much, but I am trying my best as I'm very, very new to programming. Anyways, I found out an hour ago that Tkinter is a good tool to use for this project, especially when it comes to having the window (where the subtitles are displayed) always on top of your screen. However, I am failing to understand how to properly use it in my case.
So my question is: what should I do to incorporate Tkinter into my project?
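One possible shape for this, as a rough sketch only: keep the recognition loop in a worker thread that puts each recognized phrase on a queue, and let the Tkinter window poll that queue with after(). The names drain_latest and run_subtitle_window are hypothetical, and the font, wrap length and polling interval are arbitrary choices.

```python
import queue


def drain_latest(phrases):
    """Empty the queue and return the newest phrase, or None if it was empty."""
    latest = None
    try:
        while True:
            latest = phrases.get_nowait()
    except queue.Empty:
        return latest


def run_subtitle_window(phrases, poll_ms=200):
    """Show the newest phrase from `phrases` in a window that stays on top.

    `phrases` is a queue.Queue filled by another thread, for example the
    speech recognition loop. Blocks until the window is closed.
    """
    import tkinter as tk  # imported lazily so the helper above works headless

    root = tk.Tk()
    root.title("Subtitles")
    root.attributes("-topmost", True)  # keep the window above other windows
    label = tk.Label(root, text="", font=("Helvetica", 24), wraplength=800)
    label.pack(padx=20, pady=20)

    def poll():
        text = drain_latest(phrases)
        if text is not None:
            label.config(text=text)
        root.after(poll_ms, poll)  # schedule the next check

    poll()
    root.mainloop()
```

The queue is there because Tkinter widgets should only be touched from the thread that runs mainloop(), so the listening thread never updates the label directly.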

Converting speech to text in real time using Python

I'm currently trying to create a program that prints what is being said through a microphone input to the console as it is being said, rather than in one big block after the user has finished speaking. How would I extend the SpeechRecognition/PyAudio modules to achieve this? I'm thinking it may involve detecting when someone has finished saying a word and then looping back around to the voice detection, but I'm not sure how to implement that.
This is just the basic example I'm going off of in order to print the text after the user is finished speaking:
import speech_recognition as sr
microphone = sr.Microphone(device_index=0)
r = sr.Recognizer()
with microphone as source:
    audio = r.listen(source)
print(r.recognize_google(audio))
Thanks :)
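For what it's worth, SpeechRecognition has a listen_in_background() method that captures phrases on a separate thread and hands each one to a callback. Combined with a short phrase_time_limit, that gets you chunk-by-chunk output rather than true word-by-word streaming. The sketch below is one way it might be wired up; start_live_transcription is a made-up name and the 3-second limit is an arbitrary assumption.

```python
import queue


def start_live_transcription(phrase_time_limit=3):
    """Start listening in the background.

    Returns (stop, phrases): call stop() to end listening, and read
    recognized text from the `phrases` queue as it arrives.
    Needs SpeechRecognition, PyAudio and a network connection.
    """
    import speech_recognition as sr  # third-party, imported lazily

    recognizer = sr.Recognizer()
    microphone = sr.Microphone()
    phrases = queue.Queue()

    def on_phrase(rec, audio):
        # Runs on the background thread once per captured chunk.
        try:
            phrases.put(rec.recognize_google(audio))
        except sr.UnknownValueError:
            pass  # nothing intelligible in this chunk
        except sr.RequestError as exc:
            phrases.put(f"[request failed: {exc}]")

    # Calibrate once, then release the microphone so the background
    # listener can open it itself.
    with microphone as source:
        recognizer.adjust_for_ambient_noise(source)

    stop = recognizer.listen_in_background(
        microphone, on_phrase, phrase_time_limit=phrase_time_limit
    )
    return stop, phrases
```

The main thread can then loop on phrases.get() and print each piece of text as soon as it shows up, calling stop() when done.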

Using a Raspberry pi, mic works perfectly fine for everything, except for code (r.listen(source))

I'm working on a speech-to-Morse-code group project. The problem is that the Raspberry Pi has trouble using the USB mic in the program's listen() call.
It works completely fine on laptops, but not on an RPi for some reason. The mics work completely fine everywhere else, using arecord -d and testing on Discord. I installed most of the required packages using pip install: PyAudio, SpeechRecognition, and PortAudio, at least everything I needed for the laptops. I am using Python 2.7.13, and I've tried it on Python 3; same issue.
import speech_recognition as sr
r = sr.Recognizer()
with sr.Microphone() as source:
    print("Speak Anything :")
    audio = r.listen(source)
    try:
        text = r.recognize_google(audio)
        print("You said : {}".format(text))
    except:
        print("Sorry could not recognize what you said")
The expected result is supposed to print what you said; however, what it would do is either get stuck on "Speak Anything:" or "Sorry could not recognize what you said."
I really need some help with this. Out of everything to do on this project, I've spent the most time on this, which seems to be something that is supposed to be easy.

Can I use AUX port in Raspberry Pi 3 (Model B) to plug a microphone to get audio signals in?

import speech_recognition as sr
r = sr.Recognizer()
with sr.Microphone() as source:
    audio = r.listen(source)
print(r.recognize_sphinx(audio))
When I run this code in Python on a Raspberry Pi 3 (Model B), it gives the following error:
OSError: No Default Input Device Available
What is the reason for this? Do I need a USB microphone to get the audio signals in, rather than using the microphone in earphones?

< /Hey >
As designed by the Raspberry Pi's circuit layout, in short:
The 3.5mm Audio Jack on the Raspberry Pi models cannot be used as an audio input.
I'm not sure if you would want to anyways.
This means you have a couple of options for your microphone setup.
1. Using a small mic array (Like Alexa Echo or Google Home)
A lot of the time these kinds of systems are prototyped on Raspberry Pis or similar (see the official Alexa development kit). You can find replicas of the microphone arrays found on Google Home etc., specifically fitted for the Raspberry Pi. These include some advanced features such as noise suppression, direction of sound source, and other neat features I'll leave for you to explore yourself.
Here are 3 I found after googling (I'm sure you can find more if you look):
ReSpeaker 4-mic array
ReSpeaker 7-mic array
Matrix Creator
If you wanted high quality results for speech recognition I'd probably begin to look more down this route.
2. Using a normal USB microphone
Probably the most common approach is to get a standard USB microphone that has Raspberry Pi drivers and use this. I found one from Adafruit which I'm sure is just plug and play which could be nice and easy to get going with.
Again I'm sure you'll find plenty of other options online, these were just suggestions to get you started.
Hopefully this helps! :-)

What you could use is a USB microphone. These tend to install the required drivers and work out of the box more easily.
Source: https://www.raspberrypi.org/forums/viewtopic.php?t=188108
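Once a USB microphone is plugged in, you can check whether PortAudio actually sees an input device before blaming the recognition code. This is only a sketch: input_devices and list_input_devices are names I picked, while get_device_count(), get_device_info_by_index() and the maxInputChannels key come from PyAudio.

```python
def input_devices(device_infos):
    """Keep (index, name) for devices that report at least one input channel."""
    return [
        (info["index"], info["name"])
        for info in device_infos
        if info.get("maxInputChannels", 0) > 0
    ]


def list_input_devices():
    """Ask PyAudio for every device and return only the ones that can record."""
    import pyaudio  # third-party, imported lazily

    pa = pyaudio.PyAudio()
    try:
        infos = [pa.get_device_info_by_index(i) for i in range(pa.get_device_count())]
    finally:
        pa.terminate()
    return input_devices(infos)
```

If list_input_devices() comes back empty, there is no capture device for sr.Microphone() to open, which would match the "No Default Input Device Available" error.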

Playing mp3 file through microphone with python

Is there a way, using Python (and not any external software), to play an MP3 file as if it were microphone input?
For example, I have an MP3 file, and a Python script would play it through my mic so others in a voice room would hear it. As I say, it is just an example.
Of course, I have done some research. I found out that I can use software to create a virtual device and do a few things to get the result. But my point is: is it possible without installing software, with just some kind of Python script?

It is possible but it isn't 100% in python as it requires the installation of other software. (Also from what I know this specific answer only works on Windows, but it should be similar on Linux with PulseAudio instead of VB-Audio Cable, but I'm not a daily Linux user so I don't know.)
First download: https://www.vb-audio.com/Cable/, this will create a "Virtual Audio Cable" where programs can play music to the input device (What looks like a speaker) and it'll pipe it to the output device (What looks like a microphone).
Then run this command in cmd: pip install pygame==2.0.0.dev8 (or py -m pip install pygame==2.0.0.dev8, depending on your installation of Python). (The reason it's the dev version is that it requires some functions only in SDL2, whereas the main branch uses SDL1.)
Then:
>>> from pygame._sdl2 import get_num_audio_devices, get_audio_device_name #Get playback device names
>>> from pygame import mixer #Playing sound
>>> mixer.init() #Initialize the mixer, this will allow the next command to work
>>> [get_audio_device_name(x, 0).decode() for x in range(get_num_audio_devices(0))] #Returns playback devices
['Headphones (Oculus Virtual Audio Device)', 'MONITOR (2- NVIDIA High Definition Audio)', 'Speakers (High Definition Audio Device)', 'Speakers (NVIDIA RTX Voice)', 'CABLE Input (VB-Audio Virtual Cable)']
>>> mixer.quit() #Quit the mixer as it's initialized on your main playback device
>>> mixer.init(devicename='CABLE Input (VB-Audio Virtual Cable)') #Initialize it with the correct device
>>> mixer.music.load("Megalovania.mp3") #Load the mp3
>>> mixer.music.play() #Play it
To stop the music do: mixer.music.stop()
Also, the music doesn't play through your speakers, so you're going to need another Python script or thread that handles playing it through your speakers. (Also, if you want it to play on a button press, I recommend the Python library keyboard; the GitHub documentation is really good and you should be able to figure it out on your own.)
PS: This took me a while to figure out, you're welcome.
PPS: I'm still trying to figure out a way to pipe your own mic through there as well since this method will obviously not pipe your real microphone in too, but looking into the source code of pygame is making my head hurt due to it all being written in C.

If you meant how to play MP3 using Python, well, this is a broad question.
Is it possible without any dependencies? Yes, it is, but it is not worth it. Well, playing uncompressed audio is, but MP3 is another matter, as I'll explain below.
To play raw audio data from Python without installing pyaudio or pygame or similar, you first have to know the platform on which your script will be run.
Then implement a nice set of functions for choosing an audio device, setting up properties like sample rate, bit rate, mono/stereo..., feeding the stream to audio card and stopping the playback.
It is not hard, but to do it you have to use ctypes on Windows and PyObjC on Mac; Linux is a special case, as it supports many audio systems (you would probably use sockets to connect to PulseAudio, pipe to some process like aplay/paplay/mpg123, or exploit GStreamer).
But why go through all this just to avoid dependencies, when you have nice libraries out there with simple interfaces to access and use audio devices.
PyAudio is a great one.
Well, that is your concern.
But playing MP3 without external libraries, in real time, from pure Python, well, it's not exactly impossible, but it is very hard to achieve, and as far as I know nobody has even tried doing it.
There is a pure-Python MP3 decoder implementation, but it is 10 times slower than necessary for real-time audio playback. It can be optimized for nearly full speed, but nobody is interested in doing so.
It has mostly educational value and it is used in cases where you do not need real-time speed.
This is what you should do:
Install pygame and use it to play MP3 directly
or:
Install PyAudio and some library that decodes Mp3, there are quite a few of them on pypi.python.org, and use it to decode the MP3 and feed the output to PyAudio.
There are some more possibilities, including pymedia, but I consider these the easiest solutions.
Okay, now that we have clarified what you really need, here is the answer.
I will leave the first answer intact, as you need that part too.
Now, you want to play audio to the recording stream, so that any application recording the audio input records the stuff that you are playing.
On Windows, this is called stereo mix and can be found in Volume Control, under audio input.
You choose stereo mix as your default input. Now, when you open a recording app that doesn't select its own input channel but uses the selected one (e.g. Skype), it will record everything coming out of your speakers and coming into your mic/line in.
I am not 100% sure whether this option appears on all Windows versions or whether it is a feature of the audio card you have.
I am positive that Creative and Realtek audio cards support it.
So, research this.
To select that option from Python, you have to connect to winmm.dll using ctypes and call the appropriate function. I do not know which one and with what arguments.
If this option is not present in volume control, there is nothing for it but to install a virtual audio card to do the loopback for you.
There might be such software packaged as a library, so that you can use it from Python or whatever.
On Linux this should be easy using PulseAudio. I do not know how, but I know that you can do it, redirect the streams etc. There is a tutorial out there somewhere.
Then you can call that command from Python, to set to this and reset back to normal.
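As a rough illustration of the Linux route, assuming PulseAudio and its pactl tool are installed: loading module-null-sink creates a virtual output whose monitor source other applications can record from, and unloading the module puts things back. The helper names and the sink name virtual_mic below are hypothetical, and I have not tested this on every setup.

```python
import subprocess


def null_sink_command(sink_name):
    """Build the pactl command that creates a virtual (null) sink."""
    return ["pactl", "load-module", "module-null-sink", f"sink_name={sink_name}"]


def unload_command(module_id):
    """Build the pactl command that removes a previously loaded module."""
    return ["pactl", "unload-module", str(module_id)]


def create_virtual_sink(sink_name="virtual_mic"):
    """Create the sink and return the module index that pactl prints."""
    result = subprocess.run(
        null_sink_command(sink_name), check=True, capture_output=True, text=True
    )
    return int(result.stdout.strip())


def remove_virtual_sink(module_id):
    """Undo create_virtual_sink() using the index it returned."""
    subprocess.run(unload_command(module_id), check=True)
```

You would play the MP3 to the new sink and have the recording application pick the sink's monitor (virtual_mic.monitor here) as its input, then call remove_virtual_sink() to reset back to normal.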
On Mac, well, I really have no idea, but it should be possible.
If you want your MP3 to be played only to the recording stream, and not on your speakers at all, well on Windows, you will not be able to do that without a loopback audio device.
On Linux, I am sure you will be able to do it, and on Mac it should be possible, but how is the question.
I currently have no time to sniff around libraries etc. to provide you with some useful code, so you will have to do it yourself. But I hope my directions will help you.

Just an update on @PyPylia's answer for the benefit of anyone who struggled to implement this like I did.
Current Package Version: pygame 2.1.2 (SDL 2.0.18, Python 3.10.6)
Tested Systems: Windows 10 (21H2 - 19044.1288), (Should be the same process on Mac but this is untested as of now...)
First, you'll need to download the VB-Cable Virtual Mic Driver for your respective platform and install it. This provides us with a virtual mic that'll allow us to pass audio we play on our machine as a microphone input when using a video calling software (Google Meet, Microsoft Teams, Zoom). After that, it's all handled through the pygame module's audio package.
To get the audio device list:
from pygame import mixer, _sdl2 as devicer
mixer.init() # Initialize the mixer, this will allow the next command to work
# Returns playback devices, Boolean value determines whether they are Input or Output devices.
print("Inputs:", devicer.audio.get_audio_device_names(True))
print("Outputs:", devicer.audio.get_audio_device_names(False))
mixer.quit() # Quit the mixer as it's initialized on your main playback device
For example, my device returns:
Inputs: ['Microphone (High Definition Audio Device)', 'CABLE Output (VB-Audio Virtual Cable)']
Outputs: ['Speakers (High Definition Audio Device)', 'CABLE Input (VB-Audio Virtual Cable)']
Then, to playback the audio:
import time
from pygame import mixer
mixer.init(devicename = 'CABLE Input (VB-Audio Virtual Cable)') # Initialize it with the correct device
mixer.music.load("Toby Fox - Megalovania.mp3") # Load the mp3
mixer.music.play() # Play it
while mixer.music.get_busy(): # wait for music to finish playing
    time.sleep(1)
If you wish to play multiple tracks back to back, add the following code segments to the while loop above:
...
else:
    mixer.music.unload() # Unload the mp3 to free up system resources
    mixer.music.load("Sleeping at Last - Saturn.wav") # Load the wav
...
Then, on the other end, inside the relevant software, just change the microphone input from the default to CABLE Output (VB-Audio Virtual Cable) to have those on the other end hear the audio from the source.
If you're using a newer version of the package and some of the listed methods don't work because of an AttributeError: module 'pygame' has no attribute {method_name}, use pyup and search for the method in question to see whether the way it is invoked has changed. This was the main reason @PyPylia's code snippet no longer works unless you use an older version of pygame.

If you want to play an audio file in a local directory, you may follow this flow.
#!/usr/bin/env python
import os
import wave
import pyaudio
CHUNK = 4096
# Assumption: speech.wav sits in the same directory as this script
BASE_DIR = os.path.dirname(os.path.abspath(__file__))
output = os.path.join(BASE_DIR, "speech.wav")  # WAV file to play
wf = wave.open(output, 'rb')
p = pyaudio.PyAudio()
# Open an output stream that matches the file's sample format
stream = p.open(format=p.get_format_from_width(wf.getsampwidth()),
                channels=wf.getnchannels(),
                rate=wf.getframerate(),
                output=True)
try:
    data = wf.readframes(CHUNK)
    while data:  # readframes() returns b'' at end of file
        stream.write(data)
        data = wf.readframes(CHUNK)
except KeyboardInterrupt:
    pass
print('Shutting down')
stream.stop_stream()
stream.close()
wf.close()
p.terminate()
