After splitting an audio file with Librosa, I want to know how to save the resulting fragments as MP3 files.
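Can you just open individual files like
fragment1 = open("x.mp3", "wb")
fragment2 = open("y.mp3", "wb")
and then write to each of those using what you have as variables? Note that writing the raw sample arrays this way does not by itself produce valid MP3 files; the samples have to be encoded first.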
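A sketch of the usual route, assuming the audio was loaded as mono with librosa.load (its default) and split with librosa.effects.split, which returns an array of [start, end) sample intervals: write each fragment to its own 16-bit WAV with the standard-library wave module. The function name write_fragments and the file naming scheme are illustrative, not part of any library.

```python
import wave

import numpy as np


def write_fragments(y, sr, intervals, prefix="fragment"):
    """Write each [start, end) sample interval of mono float audio `y`
    to its own 16-bit PCM WAV file and return the file paths."""
    paths = []
    for i, (start, end) in enumerate(intervals):
        # librosa returns float samples nominally in [-1, 1]; clip so that
        # the int16 conversion below cannot wrap around
        segment = np.clip(y[start:end], -1.0, 1.0)
        pcm = (segment * 32767).astype("<i2")  # little-endian 16-bit ints
        path = f"{prefix}{i}.wav"
        with wave.open(path, "wb") as w:
            w.setnchannels(1)         # mono, as assumed above
            w.setsampwidth(2)         # 2 bytes per sample = 16-bit
            w.setframerate(int(sr))
            w.writeframes(pcm.tobytes())
        paths.append(path)
    return paths
```

Usage would look like y, sr = librosa.load("input.wav"), then write_fragments(y, sr, librosa.effects.split(y, top_db=30)). MP3 needs an encoder, so one option (assuming ffmpeg is installed) is to convert each WAV afterwards with pydub: AudioSegment.from_wav(path).export(path[:-4] + ".mp3", format="mp3").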
Is there any way to extract MXF (Material Exchange Format) file data using Python?
All I want to do is get data like the video duration, the actual video stream and, if possible, the voice as MP3 or any other audio format from an MXF file.
You can do those things using ffmpeg-python.
Example of how to extract video duration:
import ffmpeg
filename = 'sample_960x400_ocean_with_audio.mxf'
# Get duration
# Credit: https://github.com/kkroening/ffmpeg-python/issues/545#issuecomment-836792082
video_info = ffmpeg.probe(filename)
duration = float(video_info['format']['duration'])
print(f'Duration: {duration} seconds')
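If you also want to know which streams the MXF contains (for example, whether it has an audio track at all), the dict returned by ffmpeg.probe is ffprobe's JSON output, with a "streams" list next to "format". A small helper could summarize it; summarize_probe is an illustrative name, and it has not been checked against real MXF files.

```python
def summarize_probe(info: dict) -> dict:
    """Pull the duration and per-stream codec info out of an ffmpeg.probe() result."""
    # ffprobe reports the duration as a string such as "12.500000"
    duration = float(info["format"]["duration"])
    streams = [
        {"index": s.get("index"), "type": s.get("codec_type"), "codec": s.get("codec_name")}
        for s in info.get("streams", [])
    ]
    return {
        "duration": duration,
        "streams": streams,
        "has_video": any(s["type"] == "video" for s in streams),
        "has_audio": any(s["type"] == "audio" for s in streams),
    }
```

With the file from the examples this would be summarize_probe(ffmpeg.probe(filename)).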
Example of how to convert the audio to MP3:
import ffmpeg
filename = 'sample_960x400_ocean_with_audio.mxf'
# Load file
in_file = ffmpeg.input(filename)
# Get audio track, convert to mp3
in_file.output('file.mp3').run()
Example of how to convert the audio and video to mp4:
import ffmpeg
filename = 'sample_960x400_ocean_with_audio.mxf'
# Load file
in_file = ffmpeg.input(filename)
# Convert video (and audio) to mp4
in_file.output('file.mp4').run()
Note that to use ffmpeg-python, you must install both ffmpeg-python and ffmpeg. See the documentation for more.
I have a series of wav files I would like to combine and export as a single wav using Pydub. I would like the audio from the original files to play back at different times in the exported file e.g. the audio in audio_1.wav starts at time=0 in the exported file while the audio in audio_2.wav starts at time=5 instead of both starting at time=0 as the overlay function has them. Is there any way to do this? Below is the code I currently have for importing, overlaying, and exporting the audio files.
from pydub import AudioSegment
audio_1 = AudioSegment.from_file("audio_1.wav", format="wav")
audio_2 = AudioSegment.from_file("audio_2.wav", format="wav")
overlay = audio_1.overlay(audio_2)
file_handle = overlay.export("output2.wav", format="wav")
I didn't test it, but based on the documentation it may need overlay(..., position=5000); position is given in milliseconds, so 5000 means 5 seconds.
BTW: you can also move the audio by adding silence at the beginning:
silence_5_seconds = AudioSegment.silent(duration=5000)
audio_2 = silence_5_seconds + audio_2
I have several .vti files. How can I convert them into a standard 2D image format such as JPEG or PNG? I've tried ParaView, but didn't find any option to convert them to those formats.
In ParaView: load your vti file, then in the File menu click "Save Data..." and choose the image file format you prefer (I tested with PNG and it works).
In Python, this script reads test.vti and saves test.jpg:
import vtk
reader = vtk.vtkXMLImageDataReader()
reader.SetFileName("test.vti")
reader.Update()
image = reader.GetOutput()
writer = vtk.vtkJPEGWriter()
writer.SetInputData(image)
writer.SetFileName("test.jpg")
writer.Write()
I have generated a .wav audio file containing some speech with some other interference speech in the background.
This code worked for me for a test .wav file:
import speech_recognition as sr
r = sr.Recognizer()
with sr.WavFile(wav_path) as source:
    audio = r.record(source)
text = r.recognize_google(audio)
If I use my .wav file, I get the following error:
ValueError: Audio file could not be read as PCM WAV, AIFF/AIFF-C, or Native FLAC; check if file is corrupted or in another format
The situation slightly improves if I save this .wav file with soundfile:
import soundfile as sf
wav, samplerate = sf.read(wav_path)
sf.write(saved_wav_path, wav, samplerate)
and then load the new saved_wav_path back into the first block of code, this time I get:
if not isinstance(actual_result, dict) or len(actual_result.get("alternative", [])) == 0: raise UnknownValueError()
The audio files were saved as
wavfile.write(wav_path, fs, data)
where wav_path = 'data.wav'. Any ideas?
SOLUTION:
Saving the audio data the following way generates the correct .wav files:
import wavio
wavio.write(wav_path, data, fs, sampwidth=2)
From a brief look at the code in the speech_recognition package, it appears that it uses wave from the Python standard library to read WAV files. Python's wave library does not handle floating point WAV files, so you'll have to ensure that you use speech_recognition with files that were saved in an integer format.
SciPy's function scipy.io.wavfile.write will create an integer file if you pass it an array of integers. So if data is a floating point numpy array, you could try this:
import numpy as np
from scipy.io import wavfile
# Convert `data` to 32 bit integers:
y = (np.iinfo(np.int32).max * (data/np.abs(data).max())).astype(np.int32)
wavfile.write(wav_path, fs, y)
Then try to read that file with speech_recognition.
Alternatively, you could use wavio (a small library that I created) to save your data to a WAV file. It also uses Python's wave library to create its output, so speech_recognition should be able to read the files that it creates.
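If you'd rather not add a dependency, the same idea works with the standard-library wave module directly, which guarantees a file that wave (and therefore speech_recognition) can read back. A sketch, assuming data holds float samples in [-1, 1] shaped (frames,) or (frames, channels); write_pcm16_wav is an illustrative name. Unlike the SciPy snippet, it clips rather than normalizing to the peak, so quiet recordings keep their original level.

```python
import wave

import numpy as np


def write_pcm16_wav(path_or_file, data, fs):
    """Save float samples in [-1, 1] as a 16-bit integer PCM WAV via the stdlib wave module."""
    data = np.asarray(data, dtype=np.float64)
    # Clip out-of-range samples instead of letting the int16 cast wrap around
    pcm = np.round(np.clip(data, -1.0, 1.0) * 32767).astype("<i2")
    channels = 1 if pcm.ndim == 1 else pcm.shape[1]
    with wave.open(path_or_file, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(2)  # 2 bytes = 16-bit integer samples
        w.setframerate(int(fs))
        # C-order (frames, channels) bytes are already interleaved per frame
        w.writeframes(pcm.tobytes())
```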
I couldn't figure out from its documentation what the sampwidth should be for wavio; however, adding the line sounddevice.default.dtype = 'int32', 'int32' allowed sounddevice, scipy.io.wavfile.write / soundfile, and speech_recognition to finally work together. The default dtype for sounddevice was float32 for both input and output. I tried changing only the output, but it didn't work. Weirdly, Audacity still thinks the output files are float32. I am not suggesting this is a better solution, but it did work with both soundfile and scipy.
I also noticed another oddity. With sounddevice.default.dtype left at the default [float32, float32], I opened the resulting file in Audacity and exported it from there, and that exported WAV worked with speech_recognition. Audacity says its export is float32 with the same sample rate, so I don't fully understand why. I am a noob, but I looked at both files in a hex editor: they are identical for the first 64 hex values and then differ, so the header seems to be the same. Both look very different from the file I made using int32 output, so there seems to be another factor at play...
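Rather than comparing headers in a hex editor, you can read the format tag in the WAV file's fmt chunk: 1 is integer PCM, 3 is IEEE float, and 0xFFFE is WAVE_FORMAT_EXTENSIBLE, whose real format sits in the first two bytes of its SubFormat GUID. Python's wave module reads only integer PCM (plus EXTENSIBLE with a PCM subformat from Python 3.12 on), so this tag is the first thing to check. A minimal sketch that walks only the RIFF chunk headers; wav_format_info is an illustrative name.

```python
import struct

# Format tags defined for the WAVE "fmt " chunk
WAVE_FORMAT_NAMES = {1: "PCM (integer)", 3: "IEEE float", 0xFFFE: "EXTENSIBLE"}


def wav_format_info(data: bytes) -> dict:
    """Return the fields of the first 'fmt ' chunk of a RIFF/WAVE byte string."""
    if data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12  # first chunk starts after "RIFF", the size field and "WAVE"
    while pos + 8 <= len(data):
        chunk_id = data[pos:pos + 4]
        (size,) = struct.unpack("<I", data[pos + 4:pos + 8])
        body = data[pos + 8:pos + 8 + size]
        if chunk_id == b"fmt ":
            fmt_tag, channels, rate, _byte_rate, _align, bits = struct.unpack("<HHIIHH", body[:16])
            if fmt_tag == 0xFFFE and len(body) >= 26:
                # EXTENSIBLE: cbSize, valid bits and channel mask precede the SubFormat GUID
                (fmt_tag,) = struct.unpack("<H", body[24:26])
            return {
                "audio_format": fmt_tag,
                "name": WAVE_FORMAT_NAMES.get(fmt_tag, "other"),
                "channels": channels,
                "sample_rate": rate,
                "bits_per_sample": bits,
            }
        pos += 8 + size + (size & 1)  # chunk bodies are padded to an even length
    raise ValueError("no fmt chunk found")
```

For example: with open(wav_path, "rb") as f: print(wav_format_info(f.read())).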
Similar to Warren's answer, I was able to resolve this issue by rewriting the WAV file using pydub:
from pydub import AudioSegment
filename = "payload.wav" # File that already exists.
sound = AudioSegment.from_file(filename)  # let pydub/ffmpeg detect the input format
sound.export(filename, format="wav")
I have a list of .wav files in binary format (they are coming from a websocket), which I want to join in a single binary .wav file to then do speech recognition with it. I have been able to make it work with the following code:
import io
import wave
audio = [binary_wav1, binary_wav2,..., binary_wavN] # a list of .wav binary files coming from a socket
audio = [io.BytesIO(x) for x in audio]
# Join wav files
with wave.open('/tmp/input.wav', 'wb') as temp_input:
    params_set = False
    for audio_file in audio:
        with wave.open(audio_file, 'rb') as w:
            if not params_set:
                temp_input.setparams(w.getparams())
                params_set = True
            temp_input.writeframes(w.readframes(w.getnframes()))
# Do speech recognition
binary_audio = open('/tmp/input.wav', 'rb').read()
ASR(binary_audio)
The problem is that I don't want to write the file '/tmp/input.wav' to disk. Is there any way to do this without writing any file to disk?
Thanks.
The general solution for having a file without ever putting it on disk is a stream. For this we use the io module, the standard-library module for working with in-memory streams. It seems you already use BytesIO earlier in your code.
import io
import wave
audio = [binary_wav1, binary_wav2,..., binary_wavN] # a list of .wav binary files coming from a socket
audio = [io.BytesIO(x) for x in audio]
# Join wav files
params_set = False
temp_file = io.BytesIO()
with wave.open(temp_file, 'wb') as temp_input:
    for audio_file in audio:
        with wave.open(audio_file, 'rb') as w:
            if not params_set:
                temp_input.setparams(w.getparams())
                params_set = True
            temp_input.writeframes(w.readframes(w.getnframes()))
# move the cursor back to the beginning of the "file"
temp_file.seek(0)
# Do speech recognition
binary_audio = temp_file.read()
ASR(binary_audio)
Note: I don't have any .wav files to try this out on. It's up to the wave library to handle real files and in-memory streams the same way.
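The same in-memory approach packaged as a reusable function, which also refuses to join chunks whose channel count, sample width or frame rate differ (writing those into one WAV would silently garble the audio). concat_wav_bytes is an illustrative name.

```python
import io
import wave


def concat_wav_bytes(chunks):
    """Join WAV files given as bytes objects into one WAV, entirely in memory."""
    chunks = list(chunks)
    if not chunks:
        raise ValueError("need at least one WAV chunk")
    out = io.BytesIO()
    params = None
    # wave does not close a file object it did not open, so `out` stays usable
    with wave.open(out, "wb") as writer:
        for chunk in chunks:
            with wave.open(io.BytesIO(chunk), "rb") as reader:
                p = reader.getparams()
                if params is None:
                    params = p
                    writer.setparams(p)  # frame count is patched on close
                elif (p.nchannels, p.sampwidth, p.framerate) != (
                    params.nchannels, params.sampwidth, params.framerate
                ):
                    raise ValueError("all chunks must share channels, sample width and frame rate")
                writer.writeframes(reader.readframes(reader.getnframes()))
    return out.getvalue()
```

With the question's raw bytes this would be ASR(concat_wav_bytes([binary_wav1, binary_wav2])), where ASR is the question's own function.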
With scipy and numpy you can read the wav files as numpy arrays and then do the modifications you want.
from scipy.io import wavfile
import numpy as np
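# load files (wavfile.read returns the sample rate and the samples)
rate1, arr1 = wavfile.read('song.wav')
rate2, arr2 = wavfile.read('Aaron_Copland-Quiet_City.wav')
print(arr1.shape)  # (1323001,)
print(arr2.shape)  # (1323000,)
# make new array by concatenating two audio waves; both files must share
# the same sample rate, dtype and channel count
assert rate1 == rate2
# np.concatenate joins along the time axis (axis 0), which also works for
# stereo (N, 2) arrays, where np.hstack would join them side by side
new_arr = np.concatenate((arr1, arr2))
print(new_arr.shape)  # (2646001,)
# save new audio wave
wavfile.write('new_audio.wav', rate1, new_arr)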
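To put silence between the two recordings (and guard against mismatched files), a small helper can take the (rate, samples) tuples that wavfile.read returns. A sketch; join_with_gap and its gap_seconds parameter are illustrative names.

```python
import numpy as np


def join_with_gap(first, second, gap_seconds=0.0):
    """Join two (rate, samples) tuples, as returned by scipy.io.wavfile.read,
    with optional silence between them."""
    rate1, a = first
    rate2, b = second
    if rate1 != rate2:
        raise ValueError(f"sample rates differ: {rate1} vs {rate2}")
    a, b = np.asarray(a), np.asarray(b)
    # Same dtype and same trailing shape: (N,) mono or (N, channels)
    if a.dtype != b.dtype or a.shape[1:] != b.shape[1:]:
        raise ValueError("arrays must share dtype and channel layout")
    # Silence is 0 for signed-integer and float WAV data, but 128 for 8-bit (uint8)
    silence = 128 if a.dtype == np.uint8 else 0
    gap = np.full((int(round(gap_seconds * rate1)),) + a.shape[1:], silence, dtype=a.dtype)
    # Join along the time axis (axis 0)
    return rate1, np.concatenate((a, gap, b))
```

For example: rate, joined = join_with_gap(wavfile.read('song.wav'), wavfile.read('Aaron_Copland-Quiet_City.wav'), gap_seconds=1.0), then wavfile.write('new_audio.wav', rate, joined).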