Reading audio wave values in Python

I'm trying to read and manipulate an audio file.
How can I read and manipulate the sample values of a WAV file using Python?

The SciPy libraries have great resources for this:
Writing and Reading:
import numpy as np
from scipy.io import wavfile
fs = 44100  # sample rate in Hz (must be an integer)
t = np.arange(0, 1.0, 1.0/fs)
f1 = 440
f2 = 600
x = 0.5*np.sin(2*np.pi*f1*t) + 0.5*np.sin(2*np.pi*f2*t)
fname = 'output.wav'
wavfile.write( fname, fs, x )
fs, data = wavfile.read( fname )
print(fs, data[:10])
Documentation:
http://docs.scipy.org/doc/scipy-0.14.0/reference/io.html#module-scipy.io.wavfile
Reading: http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.io.wavfile.read.html#scipy.io.wavfile.read
Writing: http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.io.wavfile.write.html#scipy.io.wavfile.write
This question has been asked before: Reading *.wav files in Python
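If SciPy is not available, the same read-and-manipulate round trip can be sketched with only the standard library. This is a minimal sketch, assuming 16-bit mono PCM; the file name, the 8000 Hz rate, and the test tone are made up for the demonstration and are not from the question.

```python
import array
import math
import os
import tempfile
import wave

# Hypothetical file name, written to a temporary directory for the demo.
path = os.path.join(tempfile.mkdtemp(), "tone.wav")
rate = 8000

# Create one second of a 16-bit mono test tone so the example is self-contained.
tone = array.array(
    "h", (int(10000 * math.sin(2 * math.pi * 440 * n / rate)) for n in range(rate))
)
with wave.open(path, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(rate)
    w.writeframes(tone.tobytes())

# Read the raw frames back and interpret them as signed 16-bit samples.
with wave.open(path, "rb") as r:
    params = r.getparams()
    samples = array.array("h", r.readframes(r.getnframes()))

# Manipulate the values: halve the amplitude of every sample.
quieter = array.array("h", (s // 2 for s in samples))
print(params.framerate, len(samples), max(samples), max(quieter))
```

Once the samples are in a sequence like this, any per-sample arithmetic works the same way; for large files a NumPy array is usually more convenient.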

Related

How can I reverse a scipy.signal.spectrogram to audio with Python?

I have:
import librosa
from scipy import signal
import scipy.io.wavfile as sf
samples, sample_rate = sf.read(args.file)
nperseg = int(sample_rate * 0.001 * 20)
frequencies, times, spectrogram = signal.spectrogram(samples,
sample_rate,
nperseg=nperseg,
window=signal.hann(nperseg))
audio_signal = librosa.griffinlim(spectrogram)
print(audio_signal, audio_signal.shape)
sf.write('test.wav', audio_signal, sample_rate)
However, this produces a (near) empty sound file.
As @DrSpill mentioned, the return order of scipy.io.wavfile.read and the argument order of scipy.io.wavfile.write were wrong, and the librosa import was also incorrect. This should do it:
import librosa
import numpy as np
import scipy.signal
import scipy.io.wavfile
# read file
file = "temp/processed_file.wav"
fs, sig = scipy.io.wavfile.read(file)
nperseg = int(fs * 0.001 * 20)
# process
frequencies, times, spectrogram = scipy.signal.spectrogram(
    sig,
    fs,
    nperseg=nperseg,
    window=scipy.signal.windows.hann(nperseg),  # scipy.signal.hann in older SciPy
)
audio_signal = librosa.core.spectrum.griffinlim(spectrogram)
print(audio_signal, audio_signal.shape)
# write output
scipy.io.wavfile.write('test.wav', fs, np.array(audio_signal, dtype=np.int16))
Remark:
The resulting file sounded sped up when I played it. I think this is due to your processing, but with some tweaking it should work.
A good alternative would be to use only librosa, like this:
import librosa
import numpy as np
# read file
file = "temp/processed_file.wav"
sig, fs = librosa.core.load(file, sr=8000)
# process
abs_spectrogram = np.abs(librosa.core.spectrum.stft(sig))
audio_signal = librosa.core.spectrum.griffinlim(abs_spectrogram)
print(audio_signal, audio_signal.shape)
# write output
librosa.output.write_wav('test2.wav', audio_signal, fs)
librosa.output was removed: librosa no longer provides its deprecated output module. Use soundfile.write instead:
import numpy as np
import soundfile as sf
sf.write('stereo_file.wav', np.random.randn(10, 2), 44100, 'PCM_24')
# Per your code, you could try:
sf.write('test.wav', audio_signal, sample_rate, 'PCM_24')

How to create multichannel .WAV file in Python?

The .WAV format looks like it should allow more than two channels (nChannels).
But unfortunately scipy.io.wavfile only writes 1 or 2.
I don't really want to manually write my own Python WAV-writer, but I can't find anything out there.
Is there any code out there that does the job?
It turns out the documentation for scipy.io.wavfile is incorrect.
Looking at the source code, I can clearly see that it accepts an arbitrary number of channels.
The following code works:
import numpy as np
from scipy.io import wavfile
fs = 48000
nsamps = fs * 10
A, Csharp, E, G = 440.0, 554.365, 660.0, 783.991
def sine(freqHz):
    τ = 2 * np.pi
    return np.sin(
        np.linspace(0, τ * freqHz * nsamps / fs, nsamps, endpoint=False)
    )
A7_chord = np.array( [ sine(A), sine(Csharp), sine(E), sine(G) ] ).T
wavfile.write("A7--4channel.wav", fs, A7_chord)
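For comparison, here is a sketch of the same four-channel idea using only the standard-library wave module, which also lets you set more than two channels. The 8000 Hz rate, the amplitude, and the file name are assumptions chosen to keep the example small; the four frequencies are the ones from the answer above. Whether a given player handles a four-channel file is a separate question.

```python
import array
import math
import os
import tempfile
import wave

fs = 8000
freqs = [440.0, 554.365, 660.0, 783.991]  # A, C#, E, G as in the answer above

# Interleave the samples frame by frame: ch0, ch1, ch2, ch3, ch0, ...
frames = array.array("h")
for n in range(fs):  # one second of audio
    for f in freqs:
        frames.append(int(8000 * math.sin(2 * math.pi * f * n / fs)))

path = os.path.join(tempfile.mkdtemp(), "four_channel.wav")  # hypothetical name
with wave.open(path, "wb") as w:
    w.setnchannels(len(freqs))
    w.setsampwidth(2)
    w.setframerate(fs)
    w.writeframes(frames.tobytes())

# Read the header back to confirm what was written.
with wave.open(path, "rb") as r:
    print(r.getnchannels(), r.getnframes())
```

The key point is the interleaving: one frame holds one sample per channel, which is the same layout as the (nsamples, nchannels) array passed to wavfile.write.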

Creating .wav file from bytes

I am reading bytes from wav audio downloaded from a URL. I would like to "reconstruct" these bytes into a .wav file. I have attempted the code below, but the resulting file is pretty much static. For example, when I download audio of myself speaking, the .wav file produced is static only, but I can hear slight alterations/distortions when I know the audio should be playing my voice. What am I doing wrong?
from pprint import pprint
import scipy.io.wavfile
import numpy
#download a wav audio recording from a url
>>>response = client.get_recording(r"someurl.com")
>>>pprint(response)
(b'RIFFv\xfc\x03\x00WAVEfmt \x10\x00\x00\x00\x01\x00\x01\x00\x80>\x00\x00'
...
b'\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff'
...
b'\xea\xff\xfd\xff\x10\x00\x0c\x00\xf0\xff\x06\x00\x10\x00\x06\x00'
...)
>>>a=bytearray(response)
>>>pprint(a)
bytearray(b'RIFFv\xfc\x03\x00WAVEfmt \x10\x00\x00\x00\x01\x00\x01\x00'
b'\x80>\x00\x00\x00}\x00\x00\x02\x00\x10\x00LISTJ\x00\x00\x00INFOINAM'
b'0\x00\x00\x00Conference d95ac842-08b7-4380-83ec-85ac6428cc41\x00'
b'IART\x06\x00\x00\x00Nexmo\x00data\x00\xfc\x03\x00\xff\xff'
b'\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff'
...
b'\x12\x00\xf6\xff\t\x00\xed\xff\xf6\xff\xfc\xff\xea\xff\xfd\xff'
...)
>>>b = numpy.array(a, dtype=numpy.int16)
>>>pprint(b)
array([ 82, 73, 70, ..., 255, 248, 255], dtype=int16)
>>>scipy.io.wavfile.write(r"C:\Users\somefolder\newwavfile.wav",
16000, b)
You can simply write the data in response to a file:
with open('myfile.wav', mode='bx') as f:
    f.write(response)
If you want to access the audio data as a NumPy array without writing it to a file first, you can do this with the soundfile module like this:
import io
import soundfile as sf
data, samplerate = sf.read(io.BytesIO(response))
See also this example: https://pysoundfile.readthedocs.io/en/0.9.0/#virtual-io
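As for why the original attempt produced static: numpy.array(a, dtype=numpy.int16) makes one array element per byte instead of one per pair of bytes, and it also includes the header bytes (the 82, 73, 70 at the start of the printed array are the characters 'R', 'I', 'F'). A small sketch of the difference, using four made-up bytes rather than the real recording:

```python
import numpy as np

# Four bytes that encode two little-endian 16-bit samples: -1 and 16.
raw = b"\xff\xff\x10\x00"

# One array element per byte: this is what numpy.array(bytearray, ...) does.
per_byte = np.array(bytearray(raw), dtype=np.int16)

# One array element per pair of bytes: the samples as they were stored.
per_sample = np.frombuffer(raw, dtype="<i2")

print(per_byte)
print(per_sample)
```

Note that np.frombuffer on the whole response would still include the RIFF header and any LIST chunk, which is why letting a WAV reader such as soundfile parse the bytes is the safer route.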
AudioSegment.from_raw() will also work when you have a continuous stream of bytes:
import io
from pydub import AudioSegment
# current_data is the stream of bytes that you receive
s = io.BytesIO(current_data)
audio = AudioSegment.from_raw(
    s, sample_width=sample_width, frame_rate=frame_rate, channels=channels
).export(filename, format='wav')
To add wave file header to raw audio bytes (extracted from wave library):
import struct
def write_header(_bytes, _nchannels, _sampwidth, _framerate):
    WAVE_FORMAT_PCM = 0x0001
    initlength = len(_bytes)
    bytes_to_add = b'RIFF'
    _nframes = initlength // (_nchannels * _sampwidth)
    _datalength = _nframes * _nchannels * _sampwidth
    bytes_to_add += struct.pack('<L4s4sLHHLLHH4s',
        36 + _datalength, b'WAVE', b'fmt ', 16,
        WAVE_FORMAT_PCM, _nchannels, _framerate,
        _nchannels * _framerate * _sampwidth,
        _nchannels * _sampwidth,
        _sampwidth * 8, b'data')
    bytes_to_add += struct.pack('<L', _datalength)
    return bytes_to_add + _bytes
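An alternative to building the header by hand is to let the wave module write it into an in-memory buffer. The following is a sketch only: the function name pcm_to_wav_bytes, its default parameters (mono, 16-bit, 16000 Hz), and the sample bytes are assumptions for illustration, not part of the answer above.

```python
import io
import wave


def pcm_to_wav_bytes(pcm, nchannels=1, sampwidth=2, framerate=16000):
    """Return `pcm` (headerless PCM bytes) wrapped in a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(nchannels)
        w.setsampwidth(sampwidth)
        w.setframerate(framerate)
        w.writeframes(pcm)
    # wave does not close a file object it did not open, so buf is still usable.
    return buf.getvalue()


pcm = b"\x00\x00\x10\x00\xf0\xff\x06\x00"  # four made-up 16-bit samples
wav_bytes = pcm_to_wav_bytes(pcm)
print(wav_bytes[:4], len(wav_bytes))
```

The returned bytes can be written to disk as-is or handed to any reader that accepts a file-like object via io.BytesIO(wav_bytes).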
I faced the same problem while streaming and used the answers above to write a complete function.
In my case, the byte array came from streaming an audio file (the frontend), and the backend needs to process it as an ndarray.
This code simulates how the frontend sends the audio file as chunks that are accumulated into a byte array:
audio_file_path = 'offline_input/zoom283.wav'
chunk = 1024
wf = wave.open(audio_file_path, 'rb')
audio_input = b''
d = wf.readframes(chunk)
while len(d) > 0:
    # Append the current chunk before reading the next one,
    # otherwise the first chunk is dropped.
    audio_input = audio_input + d
    d = wf.readframes(chunk)
The required imports:
import io
import wave
import numpy as np
import scipy.io.wavfile
import soundfile as sf
from scipy.io.wavfile import write
Finally, the backend will take a byte array and convert it to ndarray:
def convert_bytearray_to_wav_ndarray(input_bytearray: bytes, sampling_rate=16000):
    bytes_wav = bytes()
    byte_io = io.BytesIO(bytes_wav)
    write(byte_io, sampling_rate, np.frombuffer(input_bytearray, dtype=np.int16))
    output_wav = byte_io.read()
    output, samplerate = sf.read(io.BytesIO(output_wav))
    return output
output = convert_bytearray_to_wav_ndarray(input_bytearray=audio_input)
The output represents the audio file to be processed by the backend.
To check that the file has been received correctly, we write it to disk:
scipy.io.wavfile.write("output1.wav", 16000, output)

Writing WAV file using Python, Numpy array and WAVE module

I am trying to implement the Karplus-Strong algorithm.
All looks fine when I play the collected numpy array (representing a guitar chord) in a Jupyter Notebook using Audio(y, rate=Fs).
Unfortunately, writing the numpy array y into a WAV file using the wave module gives incorrect results (using the following Python code):
noise_output = wave.open('k-s.wav', 'w')
noise_output.setparams((1, 4, Fs, 0, 'NONE', 'not compressed'))
for i in range(0, len(y)):
    value = y[i]
    packed_value = struct.pack('f', value)
    noise_output.writeframes(packed_value)
noise_output.close()
Each element of y is
<type 'numpy.float64'>
How should I amend the writing loop in order to write the WAV file correctly?
Some more information about the issue. Before writing to WAV, the first elements of the y array are:
[ 0.33659756 0.33659756 -0.43915295 -0.87036152 1.40708988 0.32123558
-0.6889402 1.9739982 -1.29587159 -0.12299964 2.18381762 0.82228042
0.24593503 -1.28067426 -0.67568838 -0.01843234 -1.830472 1.2729578
-0.56575346 0.55410736]
After writing the elements to the WAV file, closing it, and reading it again, I got this for the first 20 elements of the collected array:
[ 1051481732 1051481732 -1092560728 -1084305405 1068768133 1050966269
-1087349149 1073523705 -1079648481 -1107564740 1074512811 1062371576
1048303204 -1079775966 -1087571478 -1130954901 -1075163928 1067642952
-1089415880 1057872379]
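The large integers look like the bit patterns of the packed floats read back as 32-bit integers: the wave module writes integer PCM, so with a sample width of 4 the reader treats each 4-byte float as a signed int. A small sketch using the first sample value from the question (the 16-bit scaling at the end is one possible fix and assumes the signal has been normalised to [-1.0, 1.0], which the question's data is not):

```python
import struct

value = 0.33659756  # first sample from the question

# The question's loop packs each sample as a 4-byte float...
packed = struct.pack("<f", value)

# ...but the file is declared as 4-byte integer PCM, so a reader
# reinterprets the very same bytes as a signed 32-bit integer.
as_int = struct.unpack("<i", packed)[0]
print(as_int)

# Scaling to the 16-bit integer range before packing stores a real PCM value.
# The clamp guards against samples outside [-1.0, 1.0].
clamped = max(-1.0, min(1.0, value))
pcm16 = struct.pack("<h", int(clamped * 32767))
print(struct.unpack("<h", pcm16)[0])
```

The first printed value reproduces the first element of the array read back from the file, which suggests the data was stored intact and only interpreted with the wrong type.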
Here are code samples to write a (stereo) wave file using the wave standard library.
I included two examples: one using numpy, and one that doesn't require any dependencies.
Using a numpy array
Note that if your data is in a numpy array, no need for the struct library.
import wave
import numpy as np
samplerate = 44100
# A note on the left channel for 1 second.
t = np.linspace(0, 1, samplerate)
left_channel = 0.5 * np.sin(2 * np.pi * 440.0 * t)
# Noise on the right channel.
right_channel = np.random.random(size=samplerate)
# Put the channels together with shape (44100, 2): one row per frame.
audio = np.array([left_channel, right_channel]).T
# Convert to (little-endian) 16 bit integers.
audio = (audio * (2 ** 15 - 1)).astype("<h")
with wave.open("sound1.wav", "w") as f:
    # 2 Channels.
    f.setnchannels(2)
    # 2 bytes per sample.
    f.setsampwidth(2)
    f.setframerate(samplerate)
    f.writeframes(audio.tobytes())
Using a list
This is (almost) the same code but without using numpy. No external dependencies are required.
import math
import random
import struct
import wave
samplerate = 44100
left_channel = [
    0.5 * math.sin(2 * math.pi * 440.0 * i / samplerate) for i in range(samplerate)
]
right_channel = [random.random() for _ in range(samplerate)]
with wave.open("sound2.wav", "w") as f:
    f.setnchannels(2)
    f.setsampwidth(2)
    f.setframerate(samplerate)
    for samples in zip(left_channel, right_channel):
        for sample in samples:
            sample = int(sample * (2 ** 15 - 1))
            f.writeframes(struct.pack("<h", sample))
import scipy.io.wavfile
scipy.io.wavfile.write("karplus.wav", Fs, y)
Tada! AFAIK works with float64 and float32, and probably others. For stereo, shape must be (nb_samples, 2). See scipy.io.wavfile.write.
Read and write wave file to and from a file:
from scipy.io import wavfile
sampling_rate, data = wavfile.read(wpath)
wavfile.write('abc1.wav', sampling_rate, data)

How to program Power Spectrum of .wav file

I am trying to calculate the power spectrum of noise I recorded from the sun, using the .wav file it was recorded to. So far my code is (NEW CODE FROM OLD POST):
import pyaudio
import sys
import struct
import numpy
from pylab import *
import wave
import pyfits
sundata = ('sun_noise_ouput.wav')
chunk = 1024
FORMAT = pyaudio.paInt16 # 16-bit integers
CHANNELS = 1
RATE = 25000
RECORD_SECONDS = 120
p = pyaudio.PyAudio()
# Convert to pair of bytes to numerical datatype
N = len(sundata)/2
data = numpy.zeros(N,dtype=float)
for i in range(N):
    data[i] = struct.unpack('h',sundata[2*i:2*(i+1)])[0]
column = pyfits.Column(name='integer data', array=data, format="J")
fitsoutput = pyfits.new_table([column])
fitsoutput.writeto('sun_noise_output.fits', clobber=True)
wf = wave.open('sun_noise_output.wav', 'wb')
wf.setnchannels(CHANNELS)
wf.setsampwidth(p.get_sample_size(FORMAT))
wf.setframerate(RATE)
wf.writeframes(sundata)
wf.close()
dataft = numpy.fft.fft(data)
powerspectrum = abs(dataft)**2
figure()
plot(range(N),data)
figure()
plot(range(N),powerspectrum)
show()
It may also help to note that when I try playing the file it returns no audio and says it has a length of 0:00 seconds.
Also, when I download a sample from NASA's homepage there is no playback audio, and these are the graphs produced: [graphs not preserved]
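One thing that stands out in the code above is that sundata is the file name string, not the audio data, so len(sundata)/2 counts characters and the later wave.open(..., 'wb') appears to overwrite the recording. A minimal sketch of reading the samples from the file and computing a one-sided power spectrum follows. It assumes 16-bit mono PCM, and it generates a 1 kHz test tone as a stand-in for the recording, since the real file is not available; the file name is hypothetical.

```python
import os
import tempfile
import wave

import numpy as np

RATE = 25000  # sample rate from the question
path = os.path.join(tempfile.mkdtemp(), "tone.wav")  # hypothetical stand-in file

# Write one second of a 1 kHz test tone as 16-bit mono PCM.
t = np.arange(RATE) / RATE
tone = (0.5 * 32767 * np.sin(2 * np.pi * 1000 * t)).astype(np.int16)
with wave.open(path, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(RATE)
    w.writeframes(tone.tobytes())

# Read the samples from the file contents, not from the file name string.
with wave.open(path, "rb") as r:
    rate = r.getframerate()
    raw = r.readframes(r.getnframes())
data = np.frombuffer(raw, dtype=np.int16).astype(float)

# One-sided power spectrum and the matching frequency axis in Hz.
power = np.abs(np.fft.rfft(data)) ** 2
freqs = np.fft.rfftfreq(len(data), d=1.0 / rate)
peak_hz = freqs[np.argmax(power)]
print(peak_hz)
```

Plotting power against freqs (rather than against range(N)) gives an x-axis in Hz, which makes the spectrum much easier to interpret.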
