Whisper Module Python Speech to Text

I'm trying to create a simple speech-to-text transcriber using the OpenAI Whisper module, with Streamlit for the web application, but I'm running into a problem.
It gives me this error:
Traceback (most recent call last):
  File "C:\Users\Satyam Singh\AppData\Local\Programs\Python\Python310\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 565, in _run_script
    exec(code, module.__dict__)
  File "C:\Users\Satyam Singh\Desktop\Python Project\app.py", line 19, in <module>
    transcription = model.transcribe(audio_file)
  File "C:\Users\Satyam Singh\AppData\Local\Programs\Python\Python310\lib\site-packages\whisper\transcribe.py", line 84, in transcribe
    mel = log_mel_spectrogram(audio)
  File "C:\Users\Satyam Singh\AppData\Local\Programs\Python\Python310\lib\site-packages\whisper\audio.py", line 112, in log_mel_spectrogram
    audio = torch.from_numpy(audio)
TypeError: expected np.ndarray (got NoneType)
Here's my code
import streamlit as st
import whisper
from audio_recorder_streamlit import audio_recorder
# App Title Name
st.title("Speech to Text")
# uploading an audio file
# audio_file = st.file_uploader("Upload Audio", type=["wav","mp3","m4a"])
audio_file = audio_recorder()
if audio_file:
    st.audio(audio_file, format="audio/wav")
model = whisper.load_model("base")
st.text("Whisper Model Loaded")
transcription = model.transcribe(audio_file)
print(transcription['text'])
if st.sidebar.button("Transcribe Audio"):
    if audio_file is not None:
        st.sidebar.success("Transcribing Audio")
        transcription = model.transcribe(audio_file.name)
        st.sidebar.success("Transcription Complete")
        st.text(transcription["text"])
    else:
        st.sidebar.error("Please Upload an Audio File")
I want something like Baseten.
I want this code to work, or something more innovative that works the same way as this or Baseten.
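A minimal sketch of one way to avoid the None error, not a confirmed fix. It assumes audio_recorder() returns the recording as raw bytes (and None until something has been recorded), and that model.transcribe() accepts a path to an audio file. The helper name transcribe_recording is made up for illustration; model is whatever whisper.load_model() returned.

```python
import os
import tempfile


def transcribe_recording(model, audio_bytes):
    """Return the transcript for recorded audio bytes, or None if nothing was recorded."""
    if not audio_bytes:
        # Nothing recorded yet: skip transcription instead of passing None on.
        return None
    # Write the bytes to a temporary .wav file so a file path can be handed to the model.
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
        tmp.write(audio_bytes)
        path = tmp.name
    try:
        # Assumption: the model returns a dict with a "text" key, as in the question's code.
        return model.transcribe(path)["text"]
    finally:
        # Clean up the temporary file whether or not transcription succeeded.
        os.remove(path)
```

In the app this could be called as text = transcribe_recording(model, audio_file), displaying the result only when it is not None.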

Related

facebook graph api python

This is my script to post a picture on my facebook page:
import facebook
def add_static_image_to_audio(text, media):
    page_access_token = "token"
    graph = facebook.GraphAPI(page_access_token)
    facebook_page_id = "1234567890"
    photo = open('example.jpg', 'rb')
    graph.put_object(facebook_page_id, "photos", message=text, file=photo.read())
if __name__ == "__main__":
    import argparse
    parser = argparse.ArgumentParser(description="Simple Python script to add a static image to an audio to make a video")
    parser.add_argument("text", help="The text")
    parser.add_argument("media", help="image")
    args = parser.parse_args()
    add_static_image_to_audio(args.text, args.media)
This error occurs when I run the script above:
Traceback (most recent call last):
  File "C:\xampp\htdocs\laravel\storage\app\fb.py", line 19, in <module>
    add_static_image_to_audio(args.text, args.media)
  File "C:\xampp\htdocs\laravel\storage\app\fb.py", line 11, in add_static_image_to_audio
    graph.put_object(facebook_page_id, "photos", message=text, file=photo)
  File "C:\Users\Linus\AppData\Local\Programs\Python\Python310\lib\site-packages\facebook\__init__.py", line 189, in put_object
    return self.request(
  File "C:\Users\Linus\AppData\Local\Programs\Python\Python310\lib\site-packages\facebook\__init__.py", line 313, in request
    raise GraphAPIError(result)
facebook.GraphAPIError: (#324) Requires upload file
How do I have to pass the file so this problem stops occurring? The example.jpg used here is in the same directory as the script.
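One thing that may be worth trying, sketched under assumptions: the facebook-sdk package has a put_photo method that sends the image as a file upload, whereas put_object passes its keyword arguments as ordinary form fields, which would be consistent with the "(#324) Requires upload file" error. The function name post_photo_to_page is hypothetical, and graph is assumed to be a facebook.GraphAPI instance created with a page access token.

```python
def post_photo_to_page(graph, page_id, image_path, message):
    """Upload an image file to a page's photos with a caption."""
    # Pass the open file object (not the bytes) so it can be sent as a file upload,
    # and keep it open until the request has finished.
    with open(image_path, "rb") as photo:
        return graph.put_photo(
            image=photo,
            album_path=f"{page_id}/photos",
            message=message,
        )
```

With the question's values this would be called as post_photo_to_page(graph, "1234567890", "example.jpg", text).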

Python Speech Recognition KeyError BufferedReader

I am transcribing some audio files using the speech_recognition package in Python, and I get a KeyError (io.BufferedReader) when I record the file. I am not sure what it means or how to solve it. Any help would be appreciated.
Below are my code and error.
import speech_recognition as sr
r = sr.Recognizer()
AudioFile = sr.AudioFile('/Users/USERNAME/Audio_file.wav')
with AudioFile as source:
    audio = r.record(source)
Expected results:
string variable with the audio transcript.
Actual results: The following error
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 2, in <module>
  File "/Users/USERNAME/anaconda2/envs/python_sr/lib/python3.7/site-packages/speech_recognition/__init__.py", line 295, in __exit__
    self.audio_reader.close()
  File "/Users/USERNAME/anaconda2/envs/python_sr/lib/python3.7/wave.py", line 194, in close
    file.close()
  File "<string>", line 131, in close
KeyError: <_io.BufferedReader name='/Users/USERNAME/Audio_file.wav'>

How can I work with templates in python-pptx

I know this module is not very popular but if you know the answer then please help me out with it.
My code is:
from pptx import Presentation
prs = Presentation('template.pptx')
title_slide_layout = prs.slide_layout[0]
# print(len(prs.slide_layout))
slide = prs.slides.add_slide(title_slide_layout)
title = slide.shapes.title
subtitle = slide.placeholders[1]
title.text = "Python 3.6 - Turtle Race"
subtitle.text = "Data Analytics&Visualization with random generated data"
prs.save("out.pptx")
The error I get:
Traceback (most recent call last):
  File "D:/!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!_!Piton/turtleRace/presentationMaker.py", line 8, in <module>
    prs = Presentation('template.pptx')
  File "D:\!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!_!Piton\turtleRace\venv\lib\site-packages\pptx\api.py", line 28, in Presentation
    presentation_part = Package.open(pptx).main_document_part
  File "D:\!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!_!Piton\turtleRace\venv\lib\site-packages\pptx\opc\package.py", line 103, in main_document_part
    return self.part_related_by(RT.OFFICE_DOCUMENT)
  File "D:\!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!_!Piton\turtleRace\venv\lib\site-packages\pptx\opc\package.py", line 136, in part_related_by
    return self.rels.part_with_reltype(reltype)
  File "D:\!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!_!Piton\turtleRace\venv\lib\site-packages\pptx\opc\package.py", line 439, in part_with_reltype
    rel = self._get_rel_of_type(reltype)
  File "D:\!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!_!Piton\turtleRace\venv\lib\site-packages\pptx\opc\package.py", line 491, in _get_rel_of_type
    raise KeyError(tmpl % reltype)
KeyError: "no relationship of type 'http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument' in collection"
A picture of my project interpreter:
PICTURE
So why am I getting this error?

The problem is the file type: the template was saved as a Strict Open XML Presentation. Try saving it as a standard PowerPoint Presentation instead.
You can get more information about the relationships inside the file using opc-diag.
One way to try to repair an old file:
Extract it:
unzip <FILE> -d old-file
Repackage it into a fresh file:
opc repackage bad-file new-file.docx
Diff the relationships:
opc diff-item test.docx test-ok.docx .rels

I have found the solution!
Previously I had saved the file (called template) as a Strict Open XML Presentation (.pptx)
and not as a PowerPoint Presentation (.pptx).
The file now opens, but I get another error:
Traceback (most recent call last):
  File "D:/!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!_!Piton/turtleRace/presentationMaker.py", line 9, in <module>
    title_slide_layout = prs.slide_layout[0]
AttributeError: 'Presentation' object has no attribute 'slide_layout'
Everything is the same just the saving method in PowerPoint has changed.
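The AttributeError is consistent with the attribute being named slide_layouts (plural) in python-pptx rather than slide_layout. A minimal sketch of the slide-building step using that name; the helper add_title_slide is hypothetical, and prs is assumed to be the object returned by Presentation('template.pptx') with a title layout at index 0 that has a subtitle placeholder at index 1.

```python
def add_title_slide(prs, title_text, subtitle_text):
    """Add a slide based on the first layout and fill in its title and subtitle."""
    # Note the plural: slide_layouts, not slide_layout.
    title_slide_layout = prs.slide_layouts[0]
    slide = prs.slides.add_slide(title_slide_layout)
    slide.shapes.title.text = title_text
    # Assumption: placeholder 1 is the subtitle, as in the question's code.
    slide.placeholders[1].text = subtitle_text
    return slide
```

After calling add_title_slide(prs, "Python 3.6 - Turtle Race", "..."), the presentation would be saved with prs.save("out.pptx") as before.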

pyglet.media.codecs.wave.WAVEFormatException: file does not start with RIFF id

It says something about the file not being in WAVE format. .wav files do work, but I need videos to work, and OpenCV is not an option.
I've tried adding avbin64 to all the recommended folders (System, SysWOW64, the resource folder, etc.). I've also tried converting the .mp4 into .avi and tried a different video.
import pyglet
pyglet.resource.path = ['C:\\Users\\Gebruiker\\PycharmProjects\\project1 \\res']
pyglet.resource.reindex()
vid = ('file_example_MP4_480_1_5MG.mp4')
vidpath = pyglet.resource.media(vid)
window = pyglet.window.Window()
player = pyglet.media.Player()
source = pyglet.media.StreamingSource()
MediaLoad = pyglet.media.load(vidPath)
#player.queue(MediaLoad)
#player.play()
#window.event
def on_draw():
    window.clear()
    if player.source and player.source.video_format:
        player.get_texture().blit(50,50)
    player.draw()
pyglet.app.run()
error code:
Traceback (most recent call last):
  File "C:/Users/Gebruiker/PycharmProjects/project1/venv/test", line 7, in <module>
    vidpath = pyglet.resource.media(vid)
  File "C:\Users\Gebruiker\PycharmProjects\project1\venv\lib\site-packages\pyglet\resource.py", line 678, in media
    return media.load(path, streaming=streaming)
  File "C:\Users\Gebruiker\PycharmProjects\project1\venv\lib\site-packages\pyglet\media\__init__.py", line 133, in load
    loaded_source = decoder.decode(file, filename, streaming)
  File "C:\Users\Gebruiker\PycharmProjects\project1\venv\lib\site-packages\pyglet\media\codecs\wave.py", line 109, in decode
    return WaveSource(filename, file)
  File "C:\Users\Gebruiker\PycharmProjects\project1\venv\lib\site-packages\pyglet\media\codecs\wave.py", line 61, in __init__
    raise WAVEFormatException(e)
pyglet.media.codecs.wave.WAVEFormatException: file does not start with RIFF id
I expected the video to play.

pyglet.media.codecs.wave.WAVEFormatException: file does not start with RIFF id
This error occurred because you are using version 1.4.4 of pyglet. You can solve this issue by downgrading to version 1.3.2
pip install pyglet==1.3.2
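For context on the message itself (an illustration only, not a fix for video playback): "file does not start with RIFF id" is the error Python's standard wave module raises when a file is not a RIFF/WAVE file, which suggests the .mp4 ended up being handed to the WAVE decoder. A small sketch that checks a file the same way; is_wave_file is a hypothetical helper name.

```python
import wave


def is_wave_file(path):
    """Return True if the standard wave module can open the file, False otherwise."""
    try:
        with wave.open(path, "rb"):
            return True
    except (wave.Error, EOFError):
        # wave.Error covers "file does not start with RIFF id";
        # EOFError covers files too short to contain a header.
        return False
```

Running this on the .mp4 from the question would be expected to return False, since an MP4 container does not begin with a RIFF header.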

google cloud speech ImportError: cannot import name 'enums'

I'm using the google-cloud-speech API for my project, with pipenv for the virtual environment. I installed google-cloud-speech with
pipenv install google-cloud-speech
and
pipenv update google-cloud-speech
I followed these docs: https://cloud.google.com/speech-to-text/docs/reference/libraries
This is my code:
google.py:
#!/usr/bin/env python
# coding: utf-8
import argparse
import io
import sys
import codecs
import datetime
import locale
import os
from google.cloud import speech_v1 as speech
from google.cloud.speech import enums
from google.cloud.speech import types
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = os.path.join("alt_speech_dev_01-fa5fec6806d9.json")
def get_model_by_language_id(language_id):
    model = ''
    if language_id == 1:
        model = 'ja-JP'
    elif language_id == 2:
        model = 'en-US'
    elif language_id == 3:
        model = "zh-CN"
    else:
        raise ValueError('Not Match Lang')
    return model
def transcribe_gcs_without_speech_contexts(audio_file_path, model):
    client = speech.SpeechClient()
    with io.open(audio_file_path, 'rb') as audio_file:
        content = audio_file.read()
    audio = types.RecognitionAudio(content=content)
    config = {
        "encoding": enums.RecognitionConfig.AudioEncoding.FLAC,
        "sample_rate_hertz": 16000,
        "languageCode": model
    }
    operation = client.long_running_recognize(config, audio)
    print('Waiting for operation to complete...')
    operationResult = operation.result()
    ret=''
    for result in operationResult.results:
        for alternative in result.alternatives:
            ret = alternative.transcript
    return ret
def transcribe_gcs(audio_file_path, model, keywords=None):
    client = speech.SpeechClient()
    with io.open(audio_file_path, 'rb') as audio_file:
        content = audio_file.read()
    audio = types.RecognitionAudio(content=content)
    config = {
        "encoding": enums.RecognitionConfig.AudioEncoding.FLAC,
        "sample_rate_hertz": 16000,
        "languageCode": model,
        "speech_contexts":[{"phrases":keywords}]
    }
    operation = client.long_running_recognize(config, audio)
    print('Waiting for operation to complete...')
    operationResult = operation.result()
    ret=''
    for result in operationResult.results:
        for alternative in result.alternatives:
            ret = alternative.transcript
    return ret
transcribe_gcs_without_speech_contexts('alt_en.wav', get_model_by_language_id(2))
When I try to run the Python file with
python google.py
it returns the error ImportError: cannot import name 'SpeechClient' with the following traceback:
Traceback (most recent call last):
  File "google.py", line 11, in <module>
    from google.cloud import speech_v1 as speech
  File "/home/hoanglinh/Documents/practice_speech/.venv/lib/python3.6/site-packages/google/cloud/speech_v1/__init__.py", line 17, in <module>
    from google.cloud.speech_v1.gapic import speech_client
  File "/home/hoanglinh/Documents/practice_speech/.venv/lib/python3.6/site-packages/google/cloud/speech_v1/gapic/speech_client.py", line 18, in <module>
    import pkg_resources
  File "/home/hoanglinh/Documents/practice_speech/.venv/lib/python3.6/site-packages/pkg_resources/__init__.py", line 3241, in <module>
    @_call_aside
  File "/home/hoanglinh/Documents/practice_speech/.venv/lib/python3.6/site-packages/pkg_resources/__init__.py", line 3225, in _call_aside
    f(*args, **kwargs)
  File "/home/hoanglinh/Documents/practice_speech/.venv/lib/python3.6/site-packages/pkg_resources/__init__.py", line 3269, in _initialize_master_working_set
    for dist in working_set
  File "/home/hoanglinh/Documents/practice_speech/.venv/lib/python3.6/site-packages/pkg_resources/__init__.py", line 3269, in <genexpr>
    for dist in working_set
  File "/home/hoanglinh/Documents/practice_speech/.venv/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2776, in activate
    declare_namespace(pkg)
  File "/home/hoanglinh/Documents/practice_speech/.venv/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2275, in declare_namespace
    _handle_ns(packageName, path_item)
  File "/home/hoanglinh/Documents/practice_speech/.venv/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2208, in _handle_ns
    loader.load_module(packageName)
  File "/home/hoanglinh/Documents/practice_speech/google.py", line 12, in <module>
    from google.cloud.speech import enums
  File "/home/hoanglinh/Documents/practice_speech/.venv/lib/python3.6/site-packages/google/cloud/speech.py", line 19, in <module>
    from google.cloud.speech_v1 import SpeechClient
ImportError: cannot import name 'SpeechClient'
Am I doing something wrong? When I search for the error online, there is only one question, with no answer.
UPDATE:
I changed
from google.cloud import speech_v1 as speech
to
from google.cloud import speech
and now I get a different error, with this traceback:
Traceback (most recent call last):
  File "google.py", line 11, in <module>
    from google.cloud import speech
  File "/home/hoanglinh/Documents/practice_speech/.venv/lib/python3.6/site-packages/google/cloud/speech.py", line 19, in <module>
    from google.cloud.speech_v1 import SpeechClient
  File "/home/hoanglinh/Documents/practice_speech/.venv/lib/python3.6/site-packages/google/cloud/speech_v1/__init__.py", line 17, in <module>
    from google.cloud.speech_v1.gapic import speech_client
  File "/home/hoanglinh/Documents/practice_speech/.venv/lib/python3.6/site-packages/google/cloud/speech_v1/gapic/speech_client.py", line 18, in <module>
    import pkg_resources
  File "/home/hoanglinh/Documents/practice_speech/.venv/lib/python3.6/site-packages/pkg_resources/__init__.py", line 3241, in <module>
    @_call_aside
  File "/home/hoanglinh/Documents/practice_speech/.venv/lib/python3.6/site-packages/pkg_resources/__init__.py", line 3225, in _call_aside
    f(*args, **kwargs)
  File "/home/hoanglinh/Documents/practice_speech/.venv/lib/python3.6/site-packages/pkg_resources/__init__.py", line 3269, in _initialize_master_working_set
    for dist in working_set
  File "/home/hoanglinh/Documents/practice_speech/.venv/lib/python3.6/site-packages/pkg_resources/__init__.py", line 3269, in <genexpr>
    for dist in working_set
  File "/home/hoanglinh/Documents/practice_speech/.venv/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2776, in activate
    declare_namespace(pkg)
  File "/home/hoanglinh/Documents/practice_speech/.venv/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2275, in declare_namespace
    _handle_ns(packageName, path_item)
  File "/home/hoanglinh/Documents/practice_speech/.venv/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2208, in _handle_ns
    loader.load_module(packageName)
  File "/home/hoanglinh/Documents/practice_speech/google.py", line 12, in <module>
    from google.cloud.speech import enums
ImportError: cannot import name 'enums'
Has anyone tried this library before? There seem to be so many errors just from following its docs.

The following error message is seen
from google.cloud.speech import enums
ImportError: cannot import name 'enums'
if a new installation of the Google Speech API was performed. Please see this page.
Along the same lines, using the nanos attribute results in the following message if you have updated the API:
AttributeError: 'datetime.timedelta' object has no attribute 'nanos'
Please see this page. Use 'microseconds' instead of 'nanos'.
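To illustrate the nanos/microseconds point, a small sketch that turns a time offset into seconds for both styles. It assumes the updated library hands back datetime.timedelta values (as the AttributeError above implies) and that the older objects expose seconds and nanos attributes; offset_seconds is a hypothetical helper.

```python
from datetime import timedelta


def offset_seconds(offset):
    """Convert a time offset to float seconds for either style of object."""
    if isinstance(offset, timedelta):
        # Newer style: a timedelta, which has microseconds but no nanos.
        return offset.total_seconds()
    # Older style (assumption): an object with .seconds and .nanos attributes.
    return offset.seconds + offset.nanos * 1e-9
```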

First solution: check whether python3.6/site-packages/google/cloud contains speech_v1. If it does not, you need to install it first.
Second solution: check whether python3.6/site-packages/google/cloud contains an existing speech file. If it does, the cause of the import error is shadowing, since your alias is 'speech'.
Hope this helps.
Try these lines if you're using speech_v1:
from google.cloud import speech_v1 as speech
from google.cloud.speech_v1 import enums
from google.cloud.speech_v1 import types
If you're using speech:
from google.cloud import speech
from google.cloud.speech import enums
from google.cloud.speech import types
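On the shadowing point: the tracebacks in the question show pkg_resources loading /home/hoanglinh/Documents/practice_speech/google.py while setting up the google namespace package, so the script's own name, google.py, may be what gets in the way of the real google package; renaming the script would be worth a try. A rough sketch for spotting such name clashes in a project folder; find_shadowing_scripts and its arguments are made up for illustration.

```python
from pathlib import Path


def find_shadowing_scripts(directory, package_names):
    """Return local .py files whose name matches one of the given top-level packages."""
    names = set(package_names)
    return sorted(
        path.name
        for path in Path(directory).glob("*.py")
        if path.stem in names
    )
```

For the project in the question, find_shadowing_scripts(".", ["google"]) would be expected to report google.py.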

If you can, check this link.
Google has moved the AudioEncoding values under google.cloud.speech_v1.types. You can use them by importing types and then running the code below:
from google.cloud.speech_v1 import types
types.RecognitionConfig.AudioEncoding.LINEAR16

From the Google Cloud documentation:
Enums and Types
WARNING: Breaking change
The submodules enums and types have been removed.
Before:
from google.cloud import videointelligence
features = [videointelligence.enums.Feature.SPEECH_TRANSCRIPTION]
video_context = videointelligence.types.VideoContext()
After:
from google.cloud import videointelligence
features = [videointelligence.Feature.SPEECH_TRANSCRIPTION]
video_context = videointelligence.VideoContext()
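The same change appears to apply to google-cloud-speech, judging by the ImportError in the question. A hedged sketch of looking up an encoding in a way that tolerates both layouts; resolve_audio_encoding is a hypothetical helper, and speech_module stands for whatever "from google.cloud import speech" imported.

```python
def resolve_audio_encoding(speech_module, name):
    """Look up an AudioEncoding member on either the old or the new module layout."""
    enums = getattr(speech_module, "enums", None)
    if enums is not None:
        # Older layout: speech.enums.RecognitionConfig.AudioEncoding.<NAME>
        return getattr(enums.RecognitionConfig.AudioEncoding, name)
    # Newer layout: speech.RecognitionConfig.AudioEncoding.<NAME>
    return getattr(speech_module.RecognitionConfig.AudioEncoding, name)
```

For example, resolve_audio_encoding(speech, "FLAC") could replace enums.RecognitionConfig.AudioEncoding.FLAC in the question's config.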
