I have a stream of PCM audio frames coming into my Python script, and I am able to save blocks of these frames as .wav files as such:
import struct
import wave

def update_wav():
    filename = "test.wav"
    wav_file = wave.open(filename, "wb")
    n_frames = len(audio)
    wav_file.setparams((n_channels, sample_width, sample_rate, n_frames, comptype, compname))
    # Scale each float sample in [-1.0, 1.0] to a signed 16-bit little-endian integer
    for sample in audio:
        wav_file.writeframes(struct.pack('<h', int(sample * 32767.0)))
    wav_file.close()
However, I'd like this to continually update as new frames come in. Is there a way to write frames so that they are appended to an existing .wav file? Right now I am only able to overwrite it.
I found a way of doing this with SciPy; it actually seems to be the default behavior of its write function.
import numpy
import scipy.io.wavfile

def update_wav():
    numpy_data = numpy.array(audio, dtype=float)
    scipy.io.wavfile.write("test.wav", 8000, numpy_data)
I want to append to a WAV file, ideally from a NumPy array. I tried the following code:
data = stream.read(CHUNK)
audio_numpy = numpy.frombuffer(data, dtype=numpy.int16)
scipy.io.wavfile.write(FILENAME, RATE, audio_numpy)
where stream is created by
p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16,
                channels=CHANNELS,
                rate=RATE,
                input=True,
                frames_per_buffer=CHUNK)
as I heard that SciPy would append to the file rather than overwrite it. Unfortunately, it overwrites the file.
How can I append to a WAV file? The input comes from the microphone.
The WAV file will be read by ffmpeg later, so it should not be rewritten in full each time, which would also be inefficient.
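One possible approach that avoids rewriting the file is sketched below, under some assumptions. The standard-library wave module has no append mode, but a writer that is opened once can stay open while blocks arrive: each writeframes() call adds frames, and the module patches the frame count in the header for seekable files. The helper names open_wav_writer and append_block are hypothetical, and the sketch assumes mono 16-bit audio with float samples in [-1.0, 1.0].

```python
import struct
import wave


def open_wav_writer(path, sample_rate=8000):
    """Open a mono, 16-bit WAV file for incremental writing (hypothetical helper)."""
    writer = wave.open(path, "wb")
    writer.setnchannels(1)
    writer.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
    writer.setframerate(sample_rate)
    return writer


def append_block(writer, samples):
    """Append float samples in [-1.0, 1.0] as 16-bit little-endian PCM frames."""
    # Clip to the valid range first so out-of-range input cannot overflow struct.pack
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    writer.writeframes(struct.pack("<%dh" % len(ints), *ints))
```

Call open_wav_writer once, call append_block for every incoming block, and close the writer when the stream ends.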
I have generated a .wav audio file containing some speech with some other interference speech in the background.
This code worked for me for a test .wav file:
import speech_recognition as sr
r = sr.Recognizer()
with sr.WavFile(wav_path) as source:
    audio = r.record(source)
text = r.recognize_google(audio)
If I use my .wav file, I get the following error:
ValueError: Audio file could not be read as PCM WAV, AIFF/AIFF-C, or Native FLAC; check if file is corrupted or in another format
The situation slightly improves if I save this .wav file with soundfile:
import soundfile as sf
wav, samplerate = sf.read(wav_path)
sf.write(saved_wav_path, wav, samplerate)
and then load the new saved_wav_path back into the first block of code. This time I get:
if not isinstance(actual_result, dict) or len(actual_result.get("alternative", [])) == 0: raise UnknownValueError()
The audio files were saved as
wavfile.write(wav_path, fs, data)
where wav_path = 'data.wav'. Any ideas?
SOLUTION:
Saving the audio data the following way generates the correct .wav files:
import wavio
wavio.write(wav_path, data, fs, sampwidth=2)
From a brief look at the code in the speech_recognition package, it appears that it uses wave from the Python standard library to read WAV files. Python's wave library does not handle floating point WAV files, so you'll have to ensure that you use speech_recognition with files that were saved in an integer format.
SciPy's function scipy.io.wavfile.write will create an integer file if you pass it an array of integers. So if data is a floating point numpy array, you could try this:
import numpy as np
from scipy.io import wavfile
# Convert `data` to 32 bit integers:
y = (np.iinfo(np.int32).max * (data/np.abs(data).max())).astype(np.int32)
wavfile.write(wav_path, fs, y)
Then try to read that file with speech_recognition.
Alternatively, you could use wavio (a small library that I created) to save your data to a WAV file. It also uses Python's wave library to create its output, so speech_recognition should be able to read the files that it creates.
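As a rough illustration of the integer-format point above, the following sketch uses only the standard library to peak-normalize floating-point samples, write them as 16-bit PCM, and confirm that wave can read the result back. The function name write_pcm16 is hypothetical, and the input is assumed to be a mono sequence of floats.

```python
import io
import struct
import wave


def write_pcm16(target, samples, sample_rate):
    """Write mono float samples as 16-bit integer PCM (hypothetical helper).

    `target` may be a filename or a seekable binary file object.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    scale = 32767.0 / peak if peak > 0 else 0.0  # silence stays silent, no division by zero
    ints = [int(round(s * scale)) for s in samples]
    with wave.open(target, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample, an integer format that wave can read
        w.setframerate(sample_rate)
        w.writeframes(struct.pack("<%dh" % len(ints), *ints))


# Round trip through an in-memory buffer and inspect the header
buf = io.BytesIO()
write_pcm16(buf, [0.0, 0.25, -0.5], 16000)
buf.seek(0)
with wave.open(buf, "rb") as r:
    print(r.getsampwidth(), r.getnframes())
```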
I couldn't figure out from the wavio documentation what sampwidth should be. However, I added the line sounddevice.default.dtype = 'int32', 'int32', which allowed sounddevice, scipy.io.wavfile.write / soundfile, and speech_recognition to finally work together. The default dtype for sounddevice was float32 for both input and output. I tried changing only the output, but it didn't work. Weirdly, Audacity still thinks the output files are float32. I am not suggesting this is a better solution, but it did work with both soundfile and scipy.
I also noticed another oddity. When sounddevice.default.dtype was left at the default [float32, float32], I opened the resulting file in Audacity and exported it, and the exported WAV worked with speech_recognition. Audacity says its export is float32 with the same sample rate, so I don't fully understand. I am a noob, but I looked at both files in a hex editor: they look the same for the first 64 hex values and then they differ, so it seems like the header is the same. Both look very different from the file I made using int32 output, so it seems like there's another factor at play...
Similar to Warren's answer, I was able to resolve this issue by rewriting the WAV file using pydub:
from pydub import AudioSegment
filename = "payload.wav" # File that already exists.
sound = AudioSegment.from_mp3(filename)
sound.export(filename, format="wav")
I have a list of .wav files in binary format (they are coming from a websocket), which I want to join in a single binary .wav file to then do speech recognition with it. I have been able to make it work with the following code:
import io
import wave

audio = [binary_wav1, binary_wav2,..., binary_wavN] # a list of .wav binary files coming from a socket
audio = [io.BytesIO(x) for x in audio]
# Join wav files
with wave.open('/tmp/input.wav', 'wb') as temp_input:
    params_set = False
    for audio_file in audio:
        with wave.open(audio_file, 'rb') as w:
            if not params_set:
                temp_input.setparams(w.getparams())
                params_set = True
            temp_input.writeframes(w.readframes(w.getnframes()))
# Do speech recognition
with open('/tmp/input.wav', 'rb') as f:
    binary_audio = f.read()
ASR(binary_audio)
The problem is that I don't want to write the file '/tmp/input.wav' to disk. Is there any way to do this without writing any file to disk?
Thanks.
The general solution for having a file without ever putting it on disk is a stream. For this we use the io library, which is the standard library for working with in-memory streams. You already use BytesIO earlier in your code.
audio = [binary_wav1, binary_wav2,..., binary_wavN] # a list of .wav binary files coming from a socket
audio = [io.BytesIO(x) for x in audio]
# Join wav files
params_set = False
temp_file = io.BytesIO()
with wave.open(temp_file, 'wb') as temp_input:
    for audio_file in audio:
        with wave.open(audio_file, 'rb') as w:
            if not params_set:
                temp_input.setparams(w.getparams())
                params_set = True
            temp_input.writeframes(w.readframes(w.getnframes()))
# move the cursor back to the beginning of the "file"
temp_file.seek(0)
# Do speech recognition
binary_audio = temp_file.read()
ASR(binary_audio)
Note: I don't have any .wav files to try this out on. It's up to the wave library to handle the difference between real files and buffered streams properly.
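For anyone who wants to try the in-memory approach without real recordings, here is a self-contained sketch. It wraps the joining logic in a function and builds tiny silent WAV files in memory as test input. The names join_wavs and make_test_wav are hypothetical, and the check that all inputs share the same format is an added assumption rather than something the original code does.

```python
import io
import wave


def join_wavs(blobs):
    """Concatenate WAV files given as bytes and return one WAV file as bytes."""
    blobs = list(blobs)
    if not blobs:
        raise ValueError("need at least one WAV file")
    out = io.BytesIO()
    with wave.open(out, "wb") as writer:
        first = None
        for blob in blobs:
            with wave.open(io.BytesIO(blob), "rb") as reader:
                fmt = (reader.getnchannels(), reader.getsampwidth(), reader.getframerate())
                if first is None:
                    # Copy the format of the first input to the output header
                    first = fmt
                    writer.setnchannels(fmt[0])
                    writer.setsampwidth(fmt[1])
                    writer.setframerate(fmt[2])
                elif fmt != first:
                    raise ValueError("all inputs must share channels, sample width, and rate")
                writer.writeframes(reader.readframes(reader.getnframes()))
    # getvalue() returns the whole buffer, so no seek(0) is needed
    return out.getvalue()


def make_test_wav(n_frames, sample_rate=8000):
    """Build a tiny silent mono 16-bit WAV in memory, for trying the function out."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(sample_rate)
        w.writeframes(b"\x00\x00" * n_frames)
    return buf.getvalue()
```

The returned bytes can be passed to the recognizer directly, in place of the contents of '/tmp/input.wav'.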
With scipy and numpy you can read the wav files as numpy arrays and then make the modifications you want.
from scipy.io import wavfile
import numpy as np
# load files (both are assumed to share the same sample rate)
rate, arr1 = wavfile.read('song.wav')
_, arr2 = wavfile.read('Aaron_Copland-Quiet_City.wav')
print(arr1.shape)
print(arr2.shape)
>>> (1323001,)
>>> (1323000,)
# make new array by concatenating two audio waves
new_arr = np.hstack((arr1, arr2))
print(new_arr.shape)
>>> (2646001,)
# save new audio wave
wavfile.write('new_audio.wav', rate, new_arr)
I'm trying to write a wav upload function for my webapp. The front end portion seems to be working great. The problem is my backend (Python). When it receives the binary data, I'm not sure how to write it to a file. I tried using the basic write function, and the sound is corrupt; it sounds like gobbledygook. Is there a special way to write wav files in Python?
Here is my backend... Not really much to it.
form = cgi.FieldStorage()
fileData = str(form.getvalue('data'))
with open("audio", 'w') as file:
    file.write(fileData)
I even tried...
with open("audio", 'wb') as file:
    file.write(fileData)
I am using aplay to play the sound, and I noticed that all the properties are messed up as well.
Before:
Signed 16 bit Little Endian, Rate 44100 Hz, Stereo
After upload:
Unsigned 8 bit, Rate 8000 Hz, Mono
Perhaps the wave module might help?
import wave
import struct
import numpy as np

rate = 44100

def sine_samples(freq, dur):
    # Get (sample rate * duration) samples on the X axis (freq
    # oscillations of 2*pi per second)
    X = (2*np.pi*freq/rate) * np.arange(rate*dur)
    # Get sine values for these X axis samples (as integers)
    S = (32767*np.sin(X)).astype(int)
    # Pack integers as signed "short" integers (-32767 to 32767)
    as_packed_bytes = map(lambda v: struct.pack('h', v), S)
    return as_packed_bytes

def output_wave(path, frames):
    # Python 3.X allows the use of the with statement
    # with wave.open(path,'w') as output:
    #     # Set parameters for output WAV file
    #     output.setparams((2,2,rate,0,'NONE','not compressed'))
    #     output.writeframes(frames)
    output = wave.open(path,'w')
    # (nchannels, sampwidth, framerate, nframes, comptype, compname).
    # With 2 channels the frames are read as interleaved left/right samples;
    # use 1 for mono data such as the output of sine_samples().
    output.setparams((2,2,rate,0,'NONE','not compressed'))
    output.writeframes(frames)
    output.close()

def output_sound(path, freq, dur):
    # join the packed bytes into a single bytes frame
    frames = b''.join(sine_samples(freq,dur))
    # output frames to file
    output_wave(path, frames)

output_sound('sine440.wav', 440, 2)
EDIT:
I think in your case, you might only need:
packedData = map(lambda v:struct.pack('h',v), fileData)
frames = b''.join(packedData)
output_wave('example.wav', frames)
In this case, you just need to know the sampling rate. Check the wave module for information on the other output file parameters (i.e. the arguments to the setparams method).
The code I pasted will write a wav file as long as the data isn't corrupt. It was not necessary to use the wave module.
with open("audio", 'w') as file:
    file.write(fileData)
I was originally reading the file in JavaScript with FileAPI.readAsBinaryString. I changed this to FileAPI.readAsDataURL, and then decoded it in Python using base64.decode(). Once I decoded it, I was able to just write the data to a file. The .wav file was in perfect condition.
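The decoding step described above might look roughly like the sketch below. It assumes the browser sends a standard base64 data URL of the form data:audio/wav;base64,<payload> and uses base64.b64decode from the standard library, which is an assumption about the exact call the poster used. The function names data_url_to_bytes and save_upload are hypothetical.

```python
import base64


def data_url_to_bytes(data_url):
    """Strip the 'data:...;base64,' prefix and decode the payload to raw bytes."""
    header, _, payload = data_url.partition(",")
    if not header.startswith("data:") or ";base64" not in header:
        raise ValueError("expected a base64 data URL")
    return base64.b64decode(payload)


def save_upload(data_url, path):
    """Write the decoded upload to disk; binary mode keeps the bytes untouched."""
    with open(path, "wb") as f:
        f.write(data_url_to_bytes(data_url))
```

Opening the file with "wb" matters here: text mode would try to encode the data and could alter it.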
I'm writing code that is supposed to give the user some audio output based on their actions, and I want to generate the sound rather than play from a fixed set of wav files. Currently I generate the signal as a numpy array, store the data in a wav file, and then read the same file into pyaudio. I think this is redundant, but I couldn't find a way around it. My question is: can I stream a numpy array (or a regular list) directly into pyaudio to play it?
If it's just playback and does not need to be synchronised to anything, then you can just do the following:
# Open stream with correct settings
stream = self.p.open(format=pyaudio.paFloat32,
                     channels=CHANNELS,
                     rate=48000,
                     output=True,
                     output_device_index=1
                     )
# Assuming you have a numpy array called samples
# (tobytes() is used because tostring() is a deprecated alias of it)
data = samples.astype(np.float32).tobytes()
stream.write(data)
I use this method and it works fine for me. If you need to record at the same time then this won't work.
If you are just looking to generate audio tones, then the code below may be useful.
It needs pyaudio, which can be installed with
pip install pyaudio
Sample code
# Play a fixed-frequency sound (Python 3).
import math
import pyaudio

# See http://en.wikipedia.org/wiki/Bit_rate#Audio
BITRATE = 44100  # number of frames per second/frameset
# See http://www.phy.mtu.edu/~suits/notefreqs.html
FREQUENCY = 2109.89  # Hz, waves per second; 261.63 = C4 note
LENGTH = 1.2  # seconds to play sound

NUMBEROFFRAMES = int(BITRATE * LENGTH)
RESTFRAMES = NUMBEROFFRAMES % BITRATE

# Unsigned 8-bit samples: silence is 128, and the sine swings +/-127 around it
WAVEDATA = bytearray()
for x in range(NUMBEROFFRAMES):
    # sin(2*pi*f*t) with t = x / BITRATE gives FREQUENCY full cycles per second
    WAVEDATA.append(int(math.sin(2 * math.pi * FREQUENCY * x / BITRATE) * 127 + 128))

# Fill remainder of frameset with silence
for x in range(RESTFRAMES):
    WAVEDATA.append(128)

p = pyaudio.PyAudio()
stream = p.open(format=p.get_format_from_width(1),
                channels=1,
                rate=BITRATE,
                output=True)
stream.write(bytes(WAVEDATA))
stream.stop_stream()
stream.close()
p.terminate()
Code is slightly modified from this askubuntu site
You can stream the data directly through pyaudio; there is no need to write and read a .wav file.
import numpy as np
import pyaudio

p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paFloat32,
                channels=1,
                rate=44100,
                frames_per_buffer=1024,
                output=True,
                output_device_index=1
                )
samples = np.sin(np.arange(50000)/20)
# tobytes() is used because tostring() is a deprecated alias of it
stream.write(samples.astype(np.float32).tobytes())
stream.close()
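The question also asks about a regular list. A minimal sketch of that case, using only the standard library, is shown below: the array module packs Python floats as native-order 32-bit floats, which is the same byte layout that samples.astype(np.float32).tobytes() produces. The helper names are hypothetical, and the assumption is that the resulting bytes are written to a stream opened with format=pyaudio.paFloat32 as above.

```python
import array
import math


def floats_to_float32_bytes(samples):
    """Pack a sequence of Python floats as native-order 32-bit floats."""
    return array.array("f", samples).tobytes()


def sine_samples(freq, duration, rate=44100):
    """Generate a mono sine wave as a plain list of floats in [-1.0, 1.0]."""
    n = int(rate * duration)
    return [math.sin(2 * math.pi * freq * i / rate) for i in range(n)]


# Half a second of A4; pass `data` to stream.write() on a paFloat32 stream
data = floats_to_float32_bytes(sine_samples(440, 0.5))
```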