I am creating a program to turn text into speech (TTS).
What I've done so far is to split a given word into syllables and then play each pre-recorded syllable.
For example:
INPUT: [TELEVISION]
OUTPUT: [TEL - E - VI - SION]
And then the program plays each sound in order:
First: play TEL.wav
Second: play E.wav
Third: play VI.wav
Fourth: play SION.wav
I am using wave and PyAudio to play each wav file:
CHUNK = 1024
wf = wave.open("sounds/%s.wav" % ss, 'rb')
p = pyaudio.PyAudio()
stream = p.open(format=p.get_format_from_width(wf.getsampwidth()),
                channels=wf.getnchannels(),
                rate=wf.getframerate(),
                output=True)
data = wf.readframes(CHUNK)
while data:
    stream.write(data)
    data = wf.readframes(CHUNK)
Now the problem is that during playback there is a delay between each audio file, and the spoken word sounds unnatural.
Is it possible to mix these audio files without creating a new file, and play them without the roughly 0.2 s delay between each audio file?
Edit: I tried Nullman's solution and it worked better than opening a new wave file for each sound.
I also tried adding a crossfade, following these instructions.
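For reference, here is a minimal pydub sketch of that idea: it mixes the syllables in memory with a short crossfade and plays the result without writing a new file. The file names are the ones from the example above, and the 20 ms crossfade length is just a guess to tune by ear.
from pydub import AudioSegment
from pydub.playback import play

syllables = ["TEL", "E", "VI", "SION"]
segments = [AudioSegment.from_wav("sounds/%s.wav" % s) for s in syllables]

word = segments[0]
for seg in segments[1:]:
    word = word.append(seg, crossfade=20)  # a short crossfade smooths each join

play(word)  # plays from memory; no intermediate file is created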
I know the question does not seem to make sense, but let me explain.
I have voice changer software that changes the voice in real time. If I open Audacity and choose that software's microphone as the input device, I can speak and record with the changed voice.
Now, what I want is this: I have an already recorded audio file, and I want to pass that file into that same microphone (to simulate me speaking) and save the output, with the voice changed, to another file. I made some attempts using pyaudio, but with no success.
The idea is to use a TTS module in Python to read a dataset I have with multiple lines, save the output to a file, and then pass that output to the microphone to change the voice and save the result to another file. That way I can automate the creation of a new dataset with a new speaker to train a new TTS. The problem is that I am missing the way to pass a file to the microphone, to simulate speaking into it, using an already recorded audio file instead.
Sorry if it was confusing. I did my best to explain. I hope someone can help me.
Thank you in advance!
This is what I have, but with no success.
import pyaudio
import wave

# Open the audio file
wf = wave.open("my_audio_file.wav", "rb")

# Open the output file
wf_out = wave.open("my_output_file.wav", "wb")

# Set the output file's format and parameters to match the input file
wf_out.setframerate(wf.getframerate())
wf_out.setsampwidth(wf.getsampwidth())
wf_out.setnchannels(wf.getnchannels())

# Open PyAudio
p = pyaudio.PyAudio()

# Create a playback stream for the audio data
# (output=True opens an output/playback stream, not a microphone)
stream = p.open(format=p.get_format_from_width(wf.getsampwidth()),
                channels=wf.getnchannels(),
                rate=wf.getframerate(),
                output=True)

# Start streaming the audio data
stream.start_stream()

# Send the audio data to the stream and the output file
data = wf.readframes(1024)
while data:  # readframes() returns bytes, so comparing against "" never terminates
    stream.write(data)
    wf_out.writeframes(data)
    data = wf.readframes(1024)

# Stop the stream
stream.stop_stream()

# Close the stream, PyAudio, and the output file
stream.close()
p.terminate()
wf_out.close()
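One note on why this cannot work as written: PyAudio cannot push audio into a physical microphone; output=True simply opens a playback stream on an output device. The usual workaround is a virtual audio cable (for example VB-Audio CABLE): play the file into the cable's input and select the cable as the voice changer's microphone. A hypothetical sketch, where CABLE_INDEX is a placeholder you would replace with the index printed for the cable on your machine:
import pyaudio
import wave

p = pyaudio.PyAudio()

# list all devices to find the virtual cable's output index
for i in range(p.get_device_count()):
    info = p.get_device_info_by_index(i)
    print(i, info["name"], "output channels:", info["maxOutputChannels"])

CABLE_INDEX = 3  # placeholder: use the index printed for the virtual cable

wf = wave.open("my_audio_file.wav", "rb")
stream = p.open(format=p.get_format_from_width(wf.getsampwidth()),
                channels=wf.getnchannels(),
                rate=wf.getframerate(),
                output=True,
                output_device_index=CABLE_INDEX)  # route playback into the cable

data = wf.readframes(1024)
while data:
    stream.write(data)
    data = wf.readframes(1024)

stream.stop_stream()
stream.close()
p.terminate()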
I have a series of wav files I would like to combine and export as a single wav using pydub. I would like the audio from the original files to play back at different times in the exported file: e.g. the audio in audio_1.wav starts at time=0 in the exported file, while the audio in audio_2.wav starts at time=5, instead of both starting at time=0 as the overlay function has them. Is there any way to do this? Below is the code I currently have for importing, overlaying, and exporting the audio files.
from pydub import AudioSegment

audio_1 = AudioSegment.from_file("audio_1.wav", format="wav")
audio_2 = AudioSegment.from_file("audio_2.wav", format="wav")

overlay = audio_1.overlay(audio_2)
file_handle = overlay.export("output2.wav", format="wav")
I didn't test it, but based on the documentation it may need overlay(..., position=5000); the position is given in milliseconds.
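Untested, but a sketch of what I mean; note that overlay() never lengthens the base segment, so anything that would extend past the end of audio_1 is dropped:
from pydub import AudioSegment

audio_1 = AudioSegment.from_file("audio_1.wav", format="wav")
audio_2 = AudioSegment.from_file("audio_2.wav", format="wav")

# start audio_2 five seconds (5000 ms) into audio_1
overlay = audio_1.overlay(audio_2, position=5000)
overlay.export("output2.wav", format="wav")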
BTW: you may also add silence at the beginning to shift the audio:
silence_5_seconds = AudioSegment.silent(duration=5000)
audio_2 = silence_5_seconds + audio_2
I'm learning how to use the py-webrtcvad library for Python and trying to pick out segments containing speech in a .wav file. Right now, I don't know how to split a .wav file into smaller frames to feed into the vad object. Here's my code below:
import wave
import webrtcvad

spf = wave.open("test4.wav", "rb")
signal = spf.readframes(-1)  # reads the whole file at once

vad = webrtcvad.Vad(3)
sample_rate = 16000
frame_duration = 10  # ms

# TODO: turn signal into frames and run the print statement below in a for loop
print('Contains speech: %s' % vad.is_speech(signal, sample_rate))
As I have read at https://github.com/wiseman/py-webrtcvad, the frames fed into the vad object must be 16-bit and either 10, 20, or 30 ms long. My sample is about 9 s long, and I don't know how to split it into frames of the appropriate size. I have tested the example.py in the repo, but it is a bit too complicated for me, so I tried to do a very simple example first.
Thank you.
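Edit: in case it helps anyone, the following framing sketch seems to work, assuming the wav is 16-bit mono PCM at 16 kHz (so each 10 ms frame is 160 samples, i.e. 320 bytes):
import wave
import webrtcvad

spf = wave.open("test4.wav", "rb")
signal = spf.readframes(-1)

vad = webrtcvad.Vad(3)
sample_rate = 16000
frame_duration = 10  # ms
frame_bytes = int(sample_rate * frame_duration / 1000) * 2  # 2 bytes per 16-bit sample

# step through the signal in whole frames, dropping any short remainder at the end
for offset in range(0, len(signal) - frame_bytes + 1, frame_bytes):
    frame = signal[offset:offset + frame_bytes]
    print('Contains speech: %s' % vad.is_speech(frame, sample_rate))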
I'd like to include a short silence at the end of an audio clip. I haven't found a specific function for this in the MoviePy documentation, so I've resorted to creating a muted audio file of 500 ms and concatenating it with the original audio file.
In some cases this concatenation introduces a noticeable glitch at the junction, and I haven't figured out why. By importing the concatenated audio file into Audacity, I also noticed that MoviePy actually creates two audio tracks when concatenating.
Do you know a better way to add silence to the end of the clip, or perhaps why this glitch appears sometimes (in my experience, in about 1 of 4 instances)?
Here's my code:
from moviepy.editor import *

temp_audio = "original audio dir"  # path to the original audio file
silence = "silence audio dir"      # path to the 500 ms silence file
output = "output audio dir"        # path for the combined output file

audio1 = AudioFileClip(temp_audio)  # original audio clip
audio2 = AudioFileClip(silence)     # silence clip
final_audio = concatenate_audioclips([audio1, audio2])
final_audio.write_audiofile(output)
I am currently using Python 3.9.5 and Moviepy 1.0.3
Maybe fps=44100 will work; that's the mp3 file's frequency (sample rate).
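i.e. something like this, where 44100 is an assumption and should match the source file's sample rate:
final_audio.write_audiofile(output, fps=44100)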
You can use the solution below to add silence at the end or the start of the audio:
from pydub import AudioSegment
orig_seg = AudioSegment.from_file('audio.wav')
silence_seg = AudioSegment.silent(duration=1000) # 1000 for 1 sec, 2000 for 2 secs
# for adding silence at the end of audio
combined_audio = orig_seg + silence_seg
# for adding silence at the start of audio
#combined_audio = silence_seg + orig_seg
combined_audio.export('new_audio.wav', format='wav')
I want to record short audio clips from a USB microphone in Python. I have tried pyaudio, which seemed to fail communicating with ALSA, and alsaaudio, whose code example produces an unreadable file.
So my question: What is the easiest way to record clips from a USB mic in Python?
This script records to test.wav while printing the current amplitude:
import alsaaudio, wave, numpy

# open the default ALSA capture device: mono, 44.1 kHz, 16-bit little-endian
inp = alsaaudio.PCM(alsaaudio.PCM_CAPTURE)
inp.setchannels(1)
inp.setrate(44100)
inp.setformat(alsaaudio.PCM_FORMAT_S16_LE)
inp.setperiodsize(1024)

# wav output with matching parameters
w = wave.open('test.wav', 'w')
w.setnchannels(1)
w.setsampwidth(2)
w.setframerate(44100)

while True:
    l, data = inp.read()
    a = numpy.frombuffer(data, dtype='int16')  # fromstring is deprecated
    print(numpy.abs(a).mean())
    w.writeframes(data)
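This loops forever; to record a clip of a fixed length instead, a rough sketch (the 5 seconds is arbitrary, and it assumes each read() returns one 1024-frame period):
seconds = 5
for _ in range(int(44100 * seconds / 1024)):  # approx. number of 1024-frame periods
    l, data = inp.read()
    w.writeframes(data)
w.close()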