Extract a segment from a .wav file - python

I have the following code to load a .wav file and play it:
import base64
import winsound
with open('file.wav', 'rb') as f:
    data = base64.b64encode(f.read())
winsound.PlaySound(base64.b64decode(data), winsound.SND_MEMORY)
It plays the file without a problem, but now I would like to extract a 'chunk', let's say from 233 to 300, and play only that portion.
seg = data[233:300]
winsound.PlaySound(base64.b64decode(seg), winsound.SND_MEMORY)
I get: TypeError: 'sound' must be str or None, not 'bytes'

PlaySound() is expecting a fully-formed WAV file, not a segment of PCM audio data. From the docs for PlaySound(), the underlying function called by Python's winsound:
The SND_MEMORY flag indicates that the lpszSoundName parameter is a pointer to an in-memory image of the WAVE file.
(emphasis added)
Rather than playing around with the internals of WAV files (though that isn't too hard, if you're interested), I'd suggest a more flexible audio library like pygame's. There, you could use the music module and its set_pos function, or use the Sound class to access the raw audio data and cut it like you proposed in your question.
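If you would rather stay with winsound, one option is to let the standard-library wave module read only the frames you want and write them back out under a fresh header, so that PlaySound() still receives a complete WAV image. This is a minimal sketch, assuming the file is uncompressed PCM and that 233 and 300 in the question are meant as seconds (the question does not say); extract_wav_segment is a hypothetical helper name, not part of any library:

```python
import io
import wave


def extract_wav_segment(wav_bytes, start_s, end_s):
    """Return a complete in-memory WAV image holding only [start_s, end_s) seconds.

    Assumption: wav_bytes is an uncompressed PCM WAV file read in binary mode.
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        start_frame = int(start_s * rate)
        n_frames = max(0, int(end_s * rate) - start_frame)
        # setpos() rejects positions past the end, so clamp to the file length.
        src.setpos(min(start_frame, src.getnframes()))
        frames = src.readframes(n_frames)

    out = io.BytesIO()
    with wave.open(out, "wb") as dst:
        # Copy channels, sample width and rate; the frame count in the
        # header is corrected automatically when the writer is closed.
        dst.setparams(params)
        dst.writeframes(frames)
    return out.getvalue()


# Windows-only usage (not run here):
# import winsound
# with open('file.wav', 'rb') as f:
#     segment = extract_wav_segment(f.read(), 233, 300)
# winsound.PlaySound(segment, winsound.SND_MEMORY)
```

The point of rebuilding the header is that the result is again an "in-memory image of the WAVE file", which is what SND_MEMORY expects.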

Related

MP3 loading using librosa return empty data when start_time metadata is 0

I have a dataset of thousands of bird-chirp audio files (MP3) that I am trying to load using librosa.load().
The MP3 files load, but most of the time the resulting data is an empty np.ndarray instead of an np.ndarray filled with floats.
I wanted to compare the MP3 metadata using pydub.utils.mediainfo(). This function returns information such as sampling_rate, codec, duration, bitrate, start_time, ...
I found that start_time explains the failed loads: every file whose start_time is 0 fails to load correctly, whereas every file whose start_time is greater than 0 loads correctly.
I have no problem listening to every single MP3 file using the VLC audio player.
Is there anything that can explain this behavior? Is there any solution to make these loadings succeed?
I had the same very specific error. The error message I was getting was "Input signal length=0 is too small to resample from 48000->22050", which was because librosa was loading empty arrays in the same circumstances as you mention.
My workaround was to specify the duration parameter; in this case I set it to the full length of the file:
# Assumes: import math, librosa, pydub.utils
dur = pydub.utils.mediainfo(filepath)["duration"]
data, sr = librosa.load(filepath, duration=math.floor(float(dur)))
This solved the empty-array problem for me.

Convert the following code (MIDI-DDSP) to MIDI or WAV

I am confused about this code. Basically, I am trying to use MIDI-DDSP to synthesize a MIDI file. However, I couldn't find how to convert the output of the following code to MIDI. The link to this model is: Minimal example
Here is the simple code they provide; I didn't add anything and just followed it.
from midi_ddsp import synthesize_midi, load_pretrained_model
midi_file = 'ode_to_joy.mid'
# Load pre-trained model
synthesis_generator, expression_generator = load_pretrained_model()
# Synthesize MIDI
output = synthesize_midi(synthesis_generator, expression_generator, midi_file)
# The synthesized audio
synthesized_audio = output['mix_audio']
synthesized_audio
synthesized_audio is an array, neither a WAV nor a MIDI file. I hope someone can tell me how to convert it to audio or MIDI. Thanks a lot.

Python, speech_recognition tool does not recognize .wav file

I have generated a .wav audio file containing some speech with some other interference speech in the background.
This code worked for me for a test .wav file:
import speech_recognition as sr
r = sr.Recognizer()
with sr.WavFile(wav_path) as source:
    audio = r.record(source)
text = r.recognize_google(audio)
If I use my .wav file, I get the following error:
ValueError: Audio file could not be read as PCM WAV, AIFF/AIFF-C, or Native FLAC; check if file is corrupted or in another format
The situation slightly improves if I save this .wav file with soundfile:
import soundfile as sf
wav, samplerate = sf.read(wav_path)
sf.write(saved_wav_path, wav, samplerate)
and then load the new saved_wav_path back into the first block of code, this time I get:
if not isinstance(actual_result, dict) or len(actual_result.get("alternative", [])) == 0: raise UnknownValueError()
The audio files were saved as
wavfile.write(wav_path, fs, data)
where wav_path = 'data.wav'. Any ideas?
SOLUTION:
Saving the audio data the following way generates the correct .wav files:
import wavio
wavio.write(wav_path, data, fs, sampwidth=2)
From a brief look at the code in the speech_recognition package, it appears that it uses wave from the Python standard library to read WAV files. Python's wave library does not handle floating point WAV files, so you'll have to ensure that you use speech_recognition with files that were saved in an integer format.
SciPy's function scipy.io.wavfile.write will create an integer file if you pass it an array of integers. So if data is a floating point numpy array, you could try this:
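import numpy as np
from scipy.io import wavfile
# Convert `data` to 32-bit integers. Scale in float64 so that a float32
# peak of 1.0 cannot round up past the int32 maximum and overflow:
scaled = data.astype(np.float64) / np.abs(data).max()
y = (np.iinfo(np.int32).max * scaled).astype(np.int32)
wavfile.write(wav_path, fs, y)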
Then try to read that file with speech_recognition.
Alternatively, you could use wavio (a small library that I created) to save your data to a WAV file. It also uses Python's wave library to create its output, so speech_recognition should be able to read the files that it creates.
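Since the underlying constraint is that Python's wave library only handles integer PCM, the same idea can also be sketched with the standard library alone. The following is an illustration rather than what speech_recognition or wavio do internally: it assumes mono audio whose float samples are already in the range [-1.0, 1.0], and write_pcm16_wav is a hypothetical helper name. It writes 16-bit samples, which matches the sampwidth=2 used in the question's solution.

```python
import io
import struct
import wave


def write_pcm16_wav(target, samples, sample_rate):
    """Write mono float samples as a 16-bit integer PCM WAV.

    target may be a file path or a binary file object.
    Assumption: samples are floats nominally in [-1.0, 1.0].
    """
    # Clip before scaling so out-of-range floats cannot wrap around.
    ints = [int(round(max(-1.0, min(1.0, float(s))) * 32767)) for s in samples]
    with wave.open(target, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(sample_rate)
        # "<h" = little-endian signed 16-bit, the layout WAV files use.
        wav_file.writeframes(struct.pack("<%dh" % len(ints), *ints))
```

A call such as write_pcm16_wav('data.wav', data, fs) should also accept a one-dimensional NumPy array, because the function only iterates over the samples.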
I couldn't figure out from wavio's documentation what the sampwidth should be; however, I added the line sounddevice.default.dtype = 'int32', 'int32', which allowed sounddevice, scipy.io.wavfile.write / soundfile, and speech_recognition to finally work together. The default dtype for sounddevice was float32 for both input and output. I tried changing only the output, but it didn't work. Weirdly, Audacity still thinks the output files are float32. I am not suggesting this is a better solution, but it did work with both soundfile and scipy.
I also noticed another oddity. When sounddevice.default.dtype was left at the default [float32, float32], I opened the resulting file in Audacity and exported it, and the exported WAV worked with speech_recognition. Audacity says its export is float32 with the same sample rate, so I don't fully understand. I am a noob, but I looked at both files in a hex editor: they look the same for the first 64 hex values and then differ, so it seems the header is the same. Both look very different from the file I made using int32 output, so it seems there's another factor at play...
Similar to Warren's answer, I was able to resolve this issue by rewriting the WAV file using pydub:
from pydub import AudioSegment
filename = "payload.wav" # File that already exists.
sound = AudioSegment.from_mp3(filename)
sound.export(filename, format="wav")

Having cv2.imread read images from file objects or memory-stream-like data (here, a non-extracted tar)

I have a .tar file containing several hundred pictures (.png). I need to process them with OpenCV.
I am wondering whether, for efficiency reasons, it is possible to process them without going through the disk. In other words, I want to read the pictures from the memory stream related to the tar file.
Consider for instance
import tarfile
import cv2
tar0 = tarfile.open('mytar.tar')
im = cv2.imread( tar0.extractfile('fname.png').read() )
The last line doesn't work as imread expects a file name rather than a stream.
Note that reading directly from the tar stream is possible for text, for example (see this SO question).
Any suggestions for opening the stream with the correct PNG encoding?
Untarring to a ramdisk is of course an option, although I was looking for something more cacheable.
Thanks to the suggestion of @abarry and this SO answer, I managed to find the answer.
Consider the following:
import numpy as np

def get_np_array_from_tar_object(tar_extractfl):
    '''Converts a buffer from a tar file into an np.array.'''
    return np.asarray(bytearray(tar_extractfl.read()), dtype=np.uint8)

tar0 = tarfile.open('mytar.tar')
# Flag 0 decodes the image as grayscale (cv2.IMREAD_GRAYSCALE).
im0 = cv2.imdecode(get_np_array_from_tar_object(tar0.extractfile('fname.png')), 0)
Perhaps use imdecode with a buffer coming out of the tar file? I haven't tried it but seems promising.
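Because the archive in the question holds several hundred PNGs, the same approach can be applied to every member in one pass. A minimal sketch using only the standard library: iter_png_bytes is a hypothetical helper name, and the decoding step is left out (each yielded buffer would go through np.frombuffer / cv2.imdecode or the helper above).

```python
import io
import tarfile


def iter_png_bytes(path=None, fileobj=None):
    """Yield (member_name, raw_bytes) for each .png in a tar archive.

    Nothing is extracted to disk; pass either a path or an open binary
    file object (both parameters are assumptions of this sketch).
    """
    with tarfile.open(name=path, fileobj=fileobj) as tar:
        for member in tar:
            # Skip directories, links and non-PNG members.
            if member.isfile() and member.name.lower().endswith(".png"):
                yield member.name, tar.extractfile(member).read()


# Usage with the question's archive (not run here):
# for name, raw in iter_png_bytes('mytar.tar'):
#     im = cv2.imdecode(np.frombuffer(raw, dtype=np.uint8), 0)
```

Iterating over the TarFile object reads members in archive order, which avoids repeated seeking compared with calling extractfile() by name for each file.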

Is it possible to code images into a python script?

Instead of using directories to reference an image, is it possible to code an image into the program directly?
You can use the base64 module to embed data into your programs. From the base64 documentation:
>>> import base64
>>> encoded = base64.b64encode(b'data to be encoded')
>>> encoded
b'ZGF0YSB0byBiZSBlbmNvZGVk'
>>> data = base64.b64decode(encoded)
>>> data
b'data to be encoded'
Using this ability you can base64 encode an image and embed the resulting string in your program. To get the original image data you would pass that string to base64.b64decode.
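As a rough illustration of that workflow, the sketch below turns raw image bytes into a line of Python source that can be pasted into a program. The helper name to_embedded_literal and the variable name IMAGE_B64 are made up for this example, and splitting the string into 76-character pieces is only a readability choice.

```python
import base64


def to_embedded_literal(raw_bytes, var_name="IMAGE_B64", width=76):
    """Return Python source text that defines var_name as a base64 string."""
    encoded = base64.b64encode(raw_bytes).decode("ascii")
    # Split into short pieces; adjacent string literals inside the
    # parentheses are concatenated by Python into one string.
    chunks = [encoded[i:i + width] for i in range(0, len(encoded), width)] or [""]
    body = "\n".join(f'    "{chunk}"' for chunk in chunks)
    return f"{var_name} = (\n{body}\n)\n"


# Generating the snippet from a file (not run here):
# with open('logo.png', 'rb') as f:
#     print(to_embedded_literal(f.read()))
# In the program that embeds it:
# image_bytes = base64.b64decode(IMAGE_B64)
```

Keep in mind that base64 makes the data about a third larger than the original file.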
Try the img2py script. It's included as part of wxPython (search to see whether you can download it separately).
img2py.py -- Convert an image to PNG format and embed it in a Python module with appropriate code so it can be loaded into a program at runtime. The benefit is that since it is Python source code it can be delivered as a .pyc or 'compiled' into the program using freeze, py2exe, etc.
Usage:
img2py.py [options] image_file python_file
There is no need to base64 encode the string; just paste its repr into the code.
If you mean storing the bytes that represent the image in the program code itself, you could do it by base64 encoding the image file and setting a variable to that string.
You could also declare a byte array, where the contents of the array are the bytes that represent the image.
In both cases, if you want to operate on the image, you may need to decode the value that you have included in your source code.
Warning: you may be treading on a performance minefield here.
A better way might be to store the images in the directory structure of your module and then load them on demand (even caching them). You could write a generalized method/function that loads the right image based on some identifier that maps to the particular image file name that is part and parcel of your module.
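One possible shape for such a loader, as a sketch only: the function name, the identifier-to-filename rule (identifier plus ".png") and the image_dir parameter are all assumptions made for this example. functools.lru_cache provides the caching, so each file is read from disk at most once per process.

```python
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=None)
def load_image_bytes(image_dir, identifier):
    """Read <image_dir>/<identifier>.png once; later calls reuse the cached bytes."""
    return (Path(image_dir) / f"{identifier}.png").read_bytes()


# Typical use inside a package (not run here), with images stored next to the module:
# IMAGE_DIR = Path(__file__).resolve().parent / "images"
# logo = load_image_bytes(IMAGE_DIR, "logo")
```

Both arguments must be hashable for lru_cache to work, which holds for str and Path values.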
