I asked a similar question earlier, but I made it more complex than it had to be. I am generating a 100 Hz sine wave that I then play back using simpleaudio.
Note: I had this problem when I encoded the wave to a .wav file as well; it sounded exactly the same as with simpleaudio. Also, changing channels from 2 to 1 changes the sound, but does not fix this problem.
To install simpleaudio:
sudo apt-get install -y python3-dev libasound2-dev
python -m pip install simpleaudio
Standalone code:
import numpy as np
import simpleaudio as sa
import matplotlib.pyplot as plt

def generate_sine_tone(numsamples, sample_time, frequency):
    t = np.arange(numsamples) * sample_time  # Time vector
    signal = 8388605 * np.sin(2 * np.pi * frequency * t)
    return signal

if __name__ == "__main__":
    duration = 1
    samprate = 44100  # Sampling rate
    numsamples = samprate * duration  # Sample count
    st = 1.0 / samprate  # Sample time
    t = np.arange(numsamples) * st  # Time vector
    nchannels = 2
    sampwidth = 3
    signal = generate_sine_tone(numsamples, st, 100)
    signal2 = np.asarray([int(x) for x in signal])
    play_obj = sa.play_buffer(signal2, nchannels, sampwidth, samprate)
    print(signal2)
    plt.figure(0)
    plt.plot(signal2)
    plt.show()
Running this from the command line will produce a graph of the sine wave for 1 second (44,100 samples), which is 100 periods of the sine wave. It will also play the sound through your speakers, so turn your system volume down a good bit before running.
My other posts on this issue: Trying to generate a sine wave '.wav' file in Python. Comes out as a square wave
https://music.stackexchange.com/questions/110688/generated-sine-wave-in-python-comes-out-buzzy-or-square-ey
expected sound: https://www.youtube.com/watch?v=eDk1bOX-P3w&t=4s
received sound (approx): https://www.youtube.com/watch?v=F7DnVBJ9R34
This problem is annoying me sooo much, I would greatly appreciate any help that can be provided.
There are two problems here.
The lesser one is that you are creating a single array and playing it back as if it were stereo. You need to set nchannels = 1 (or duplicate all the values by creating an array with two columns).
The other problem is trying to create 24-bit samples. Very few people have good enough equipment, and good enough ears, to tell the difference between 24-bit and 16-bit audio, so using a sample width of 2 makes things much easier. You can still generate 24-bit samples if you wish and normalize them to 16-bit for playback: signal *= 32767 / np.max(np.abs(signal))
This code works:
import numpy as np
import simpleaudio as sa

def generate_sine_tone(numsamples, sample_time, frequency):
    t = np.arange(numsamples) * sample_time  # Time vector
    signal = 32767 * np.sin(2 * np.pi * frequency * t)
    return signal

duration = 1
samprate = 44100  # Sampling rate
numsamples = samprate * duration  # Sample count
st = 1.0 / samprate  # Sample time
nchannels = 1
sampwidth = 2
signal = generate_sine_tone(numsamples, st, 100)
signal2 = signal.astype(np.int16)
#signal2 = np.asarray([ int(x) for x in signal ])
play_obj = sa.play_buffer(signal2, nchannels, sampwidth, samprate)
play_obj.wait_done()
The simpleaudio.play_buffer() function does not convert your data. It only takes the exact memory buffer (i.e. the buffer it gets from the object you gave it) and interprets it as whatever you claim it to contain. In your program, your description of what the buffer contains (2 channels of 3-byte samples) is not what it actually contains (1 channel of 8-byte samples). Unfortunately, in your example program this does not produce an error, because the size of the buffer you gave it coincidentally happens to be an exact multiple of 6, the frame size in bytes that you claimed. If you try it with one more sample, numsamples = 44101, you will get an error, because 44101 * 8 is not divisible by 6:
ValueError: Buffer size (in bytes) is not a multiple of bytes-per-sample and the number of channels.
Try what print(signal2.itemsize) shows. It is not the 3-byte sample width you claim in your call to simpleaudio.play_buffer(). If the following is still correct, there is no way to get 24-bit buffers out of NumPy even if you tried: NumPy: 3-byte, 6-byte types (aka uint24, uint48)
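For example, here is a quick check (the exact dtype is platform-dependent; a list of Python ints typically becomes int64 on 64-bit systems):

import numpy as np

signal2 = np.asarray([int(x) for x in [1.0, 2.0, 3.0]])
print(signal2.dtype)     # typically int64
print(signal2.itemsize)  # 8 bytes per item, not the 3 bytes implied by sampwidth=3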
And perhaps that's why the tutorial tells you to just use a 16-bit data type for NumPy buffers; see https://github.com/hamiltron/py-simple-audio/blob/master/docs/tutorial.rst
Numpy arrays can be used to store audio but there are a few crucial
requirements. If they are to store stereo audio, the array must have
two columns since each column contains one channel of audio data. They
must also have a signed 16-bit integer dtype and the sample amplitude
values must consequently fall in the range of -32768 to 32767.
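To illustrate those requirements, here is a minimal sketch of building such a stereo buffer (the 100 Hz / 200 Hz tones are arbitrary choices):

import numpy as np

t = np.arange(44100) / 44100.0
# one column per channel, dtype int16, amplitudes within [-32768, 32767]
left = (32767 * np.sin(2 * np.pi * 100 * t)).astype(np.int16)
right = (32767 * np.sin(2 * np.pi * 200 * t)).astype(np.int16)
stereo = np.column_stack((left, right))  # shape (44100, 2)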
What are these "buffers"? They are a way for Python objects to pass low-level raw byte data between each other and libraries written in e.g. C. See this: https://docs.python.org/3/c-api/buffer.html or this: https://jakevdp.github.io/blog/2014/05/05/introduction-to-the-python-buffer-protocol/
If you want to create 24 bit buffers from your audio data, then you'll have to use some other library or low-level byte-by-byte hacking for creating the memory buffer, because Numpy won't do it for you. But you might be able to use dtype=numpy.float32 to get 32-bit floats that have 4-byte samples per channel. Simpleaudio detects this from the sample size, for example for Alsa:
https://github.com/hamiltron/py-simple-audio/blob/master/c_src/simpleaudio_alsa.c
/* set that format appropriately */
if (bytes_per_chan == 1) {
    sample_format = SND_PCM_FORMAT_U8;
} else if (bytes_per_chan == 2) {
    sample_format = SND_PCM_FORMAT_S16_LE;
} else if (bytes_per_chan == 3) {
    sample_format = SND_PCM_FORMAT_S24_3LE;
} else if (bytes_per_chan == 4) {
    sample_format = SND_PCM_FORMAT_FLOAT_LE;
} else {
    ALSA_EXCEPTION("Unsupported Sample Format.", 0, "", err_msg_buf);
    return NULL;
}
That's a little bit like using the weight of a vehicle for determining if it's a car, a motorcycle or a bicycle. It works, but it might feel odd to only be asked about the weight of a vehicle and not at all about its type.
So: to fix your program, use the dtype parameter of asarray() to convert your data to the buffer format you want, and declare that same format in play_buffer(). And perhaps remove the scaling factor 8388605 from the sine generation, replace it with whatever amplitude you actually want, and place it somewhere near the format specification.
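For example, here is a minimal sketch using 4-byte float samples, which, per the ALSA source quoted above, simpleaudio plays as SND_PCM_FORMAT_FLOAT_LE (the 0.5 amplitude is an arbitrary choice):

import numpy as np
import simpleaudio as sa

samprate = 44100
t = np.arange(samprate) / samprate
# amplitude scaling sits right next to the format declaration:
# floats in [-1.0, 1.0], declared as sampwidth=4 (4-byte samples)
signal = (0.5 * np.sin(2 * np.pi * 100 * t)).astype(np.float32)
play_obj = sa.play_buffer(signal, 1, 4, samprate)
play_obj.wait_done()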
I am trying to load a .wav file in Python using scipy. My final objective is to create the spectrogram of that audio file. The code for reading the file can be summarized as follows:
import scipy.io.wavfile as wav
(rate, sig) = wav.read(_wav_file_)  # wavfile.read returns (sample_rate, data)
For some .wav files I am receiving the following error:
WavFileWarning: Chunk (non-data) not understood, skipping it.
ValueError: Incomplete wav chunk.
Therefore, I decided to use librosa for reading the files using the:
import librosa
(sig, rate) = librosa.load(_wav_file_, sr=None)
That works properly for all cases; however, I noticed a difference in the colors of the spectrogram. It was the exact same figure, but somehow the colors were inverted. More specifically, when I kept the same function for calculating the spectrogram and changed only the way I read the .wav, there was this difference. Any idea what can produce that? Is there a default difference between the way the two approaches read a .wav file?
EDIT:
(rate1, sig1) = wav.read(spec_file)  # rate1 = 16000
sig, rate = librosa.load(spec_file)  # rate = 22050
sig = np.array(alpha * sig, dtype="int16")
Something that almost worked was to multiply sig by a constant alpha, the ratio between the max values of the signal from scipy's wavread and the signal derived from librosa. Still, the sample rates were different.
This sounds like a quantization problem. If samples in the wave file are stored as floats and librosa is just performing a straight cast to an int, any value less than 1 will be truncated to 0. More than likely, this is why sig is an array of all zeros. The floats must be scaled to map them into the range of an int. For example,
>>> import numpy as np
>>> a = np.random.randn(10)
>>> a
array([-0.04250369, 0.244113 , 0.64479281, -0.3665814 , -0.2836227 ,
-0.27808428, -0.07668698, -1.3104602 , 0.95253315, -0.56778205])
Convert a to type int without scaling
>>> a.astype(int)
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
Convert a to int with scaling for 16-bit integer
>>> b = (a* 32767).astype(int)
>>> b
array([ -1392, 7998, 21127, -12011, -9293, -9111, -2512, -42939,
31211, -18604])
Convert scaled int back to float
>>> c = b/32767.0
>>> c
array([-0.04248177, 0.24408704, 0.64476455, -0.36655782, -0.28360851,
-0.27805414, -0.0766625 , -1.31043428, 0.9525132 , -0.56776635])
c and a are only equal to about 3 or 4 decimal places, due to the quantization to int.
If librosa is returning a float, you can scale it by 2**15 and cast it to an int to get the same range of values that the scipy wave reader returns. Since librosa returns a float, chances are the values are going to lie within a much smaller range, such as [-1, +1], than a 16-bit integer's [-32768, +32767]. So you need to scale one to get the ranges to match. For example,
sig, rate = librosa.load(spec_file, mono=True)
sig = sig * 32767
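Also note the rate mismatch in the question's edit (16000 vs 22050): librosa.load() resamples to 22050 Hz by default, and sr=None keeps the file's native rate:

sig, rate = librosa.load(spec_file, sr=None, mono=True)
sig = (sig * 32767).astype('int16')  # now matches scipy's int16 range and native rate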
If you do not want to do the quantization yourself, you could use pylab's pylab.specgram function to do it for you. You can look inside the function and see how it uses vmin and vmax.
It is not completely clear from your post (at least to me) what you want to achieve, as there is neither a sample input file nor any preceding script from you. But anyway, to check whether the spectrogram of a wave file differs significantly depending on whether the signal data returned from a read function is float32 or int, I tested the following 3 functions.
Python Script:
_wav_file_ = "africa-toto.wav"

def spectogram_librosa(_wav_file_):
    import librosa
    import pylab
    import numpy as np
    (sig, rate) = librosa.load(_wav_file_, sr=None, mono=True, dtype=np.float32)
    pylab.specgram(sig, Fs=rate)
    pylab.savefig('spectrogram3.png')

def graph_spectrogram_wave(wav_file):
    import wave
    import pylab
    def get_wav_info(wav_file):
        wav = wave.open(wav_file, 'r')
        frames = wav.readframes(-1)
        sound_info = pylab.frombuffer(frames, dtype='int16')
        frame_rate = wav.getframerate()
        wav.close()
        return sound_info, frame_rate
    sound_info, frame_rate = get_wav_info(wav_file)
    pylab.figure(num=3, figsize=(10, 6))
    pylab.title('spectrogram pylab with wav_file')
    pylab.specgram(sound_info, Fs=frame_rate)
    pylab.savefig('spectrogram2.png')

def graph_wavfileread(_wav_file_):
    import matplotlib.pyplot as plt
    from scipy import signal
    from scipy.io import wavfile
    import numpy as np
    sample_rate, samples = wavfile.read(_wav_file_)
    frequencies, times, spectrogram = signal.spectrogram(samples, sample_rate, nfft=1024)
    plt.pcolormesh(times, frequencies, 10*np.log10(spectrogram))
    plt.ylabel('Frequency [Hz]')
    plt.xlabel('Time [sec]')
    plt.savefig("spectogram1.png")

spectogram_librosa(_wav_file_)
#graph_wavfileread(_wav_file_)
#graph_spectrogram_wave(_wav_file_)
which produced 3 output images (not reproduced here) that, apart from minor differences in size and intensity, look quite similar no matter the read method, library, or data type. That makes me question a little for what purpose the outputs need to be 'exactly' the same, and how exact they should be.
I do find it strange, though, that the librosa.load() function offers a dtype parameter but works only with float values anyway. Googling in this regard led me only to this issue, which wasn't much help, and this issue, which says that this is how it will stay with librosa, as internally it seems to use only floats.
To add on to what has been said, Librosa has a utility to convert integer arrays to floats.
float_audio = librosa.util.buf_to_float(sig)
I use it to great success when producing spectrograms of Pydub AudioSegments. Keep in mind, one of its arguments is the number of bytes per sample, which defaults to 2. You can read more about it in the documentation here. Here is the source code:
def buf_to_float(x, n_bytes=2, dtype=np.float32):
    """Convert an integer buffer to floating point values.
    This is primarily useful when loading integer-valued wav data
    into numpy arrays.

    See Also
    --------
    buf_to_float

    Parameters
    ----------
    x : np.ndarray [dtype=int]
        The integer-valued data buffer

    n_bytes : int [1, 2, 4]
        The number of bytes per sample in `x`

    dtype : numeric type
        The target output type (default: 32-bit float)

    Returns
    -------
    x_float : np.ndarray [dtype=float]
        The input data buffer cast to floating point
    """

    # Invert the scale of the data
    scale = 1. / float(1 << ((8 * n_bytes) - 1))

    # Construct the format string
    fmt = '<i{:d}'.format(n_bytes)

    # Rescale and format the data buffer
    return scale * np.frombuffer(x, fmt).astype(dtype)
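A quick usage example with made-up 16-bit data (the sample values are arbitrary):

import numpy as np
import librosa

raw = np.array([0, 16384, -16384, 32767], dtype='<i2').tobytes()
print(librosa.util.buf_to_float(raw, n_bytes=2))
# -> [ 0.   0.5  -0.5  0.9999695]  (scaled by 1/32768)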
I have code here that displays the waveform of an audio file in PyQtGraph; unfortunately, the graph seems too big.
I can't attach an image yet, so I'll provide a link to a screenshot of the graph that I made.
And here is my code:
self.waveFile = wave.open(audio, 'rb')
self.format = pyaudio.paInt16
channel = self.waveFile.getnchannels()
self.rate = self.waveFile.getframerate()
self.frame = self.waveFile.getnframes()
self.stream = p.open(format=self.format,
                     channels=channel,
                     rate=self.rate,
                     output=True)
durationF = self.frame / float(self.rate)
self.data_int = self.waveFile.readframes(self.frame)
self.data_plot = np.fromstring(self.data_int, 'Int16')
self.data_plot.shape = -1, 2
self.data_plot = self.data_plot.T
self.time = np.arange(0, self.frame) * (1.0 / self.rate)
w = pg.plot()
w.plot(self.time, self.data_plot[0])
Should I adjust the X and Y range limits? Should I adjust the Y peak? As you can see, the X (time) axis matches the 8-second duration of the audio file I used, but the Y axis doesn't(?). I am not sure how to adjust the waveform data so that it fits inside the window. Any responses and suggestions will be of great help!
I think there are a couple of options depending on what you want to show.
1: Adjust the Y-limit
The simplest solution is to scale the Y axis.
# See docs for function setYRange:
# setYRange(min, max, padding=None, update=True)
w.setYRange(ymin, ymax)  # ymin/ymax: your desired Y limits
You can check the docs here.
That is, if you want to keep all of the audio values as they currently are. Although, do you really need the audio data in terms of those values? Normally audio data is displayed as a float between -1 and +1, for scientific purposes at least.
2: Adjust your data
As said before, audio data tends to be most useful when it's scaled between -1 and +1; it's just easier for us to glance at it and instantly get a feeling for whether the amplitude is correct (if we were testing a gain program, for example). There are plenty of Python libraries other than the built-in wave module that will handle this much more easily for you, like PySoundFile or many others (see this other SO post for other methods of reading .wav files in Python).
Otherwise you can convert the data received from the wave module to floating point data using something like this (props to yeeking for the code):
import wave
import struct
import sys

def wav_to_floats(wave_file):
    w = wave.open(wave_file)
    astr = w.readframes(w.getnframes())
    # convert binary chunks to shorts
    a = struct.unpack("%ih" % (w.getnframes() * w.getnchannels()), astr)
    a = [float(val) / pow(2, 15) for val in a]
    return a

# read the wav file specified as first command line arg
signal = wav_to_floats(sys.argv[1])
print("read " + str(len(signal)) + " frames")
print("in the range " + str(min(signal)) + " to " + str(max(signal)))
If possible, using a library is always better in this case, because the wave module as it stands doesn't support many audio use cases (as far as I was aware, only mono 16-bit audio).
Note: if you do convert to -1 to +1 data, it's probably still worthwhile to adjust the Y limit as explained in part 1, just to avoid weird scaling when loading different .wav files.
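Putting the two parts together, a minimal sketch (assuming signal comes from the wav_to_floats() function above):

import pyqtgraph as pg

w = pg.plot(signal)  # plot the float samples
w.setYRange(-1, 1)   # fixed limits avoid odd autoscaling between files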
I have a 32-bit *.wav file with 44100 Hz sampling. I use wave and Python's struct.unpack to get the data into an array. However, I want to get each of the two stereo channels as a separate array. How do I do this as simply as possible? This is the code I have:
import wave
from struct import unpack

def read_values(filename):
    wave_file = wave.open(filename, 'rb')
    nframes = wave_file.getnframes()
    nchannels = wave_file.getnchannels()
    sampling_frequency = wave_file.getframerate()
    T = nframes / float(sampling_frequency)
    read_frames = wave_file.readframes(nframes)
    wave_file.close()
    data = unpack("%dh" % (nchannels * nframes), read_frames)
    return T, data, nframes, nchannels, sampling_frequency
Are you able to (1) modify the code so that it returns two arrays, one for each channel, and (2) explain how the wave is structured, and how the functions I used incorrectly should be used?
Left and right channel in stereo files are interleaved. You can find lots of information about this online, e.g. look at the figures here.
So if you already have your whole audio data in a list, you just have to take every 2nd sample for stereo, every 4th sample for quad, etc. You can do this with list slicing:
data_per_channel = [data[offset::nchannels] for offset in range(nchannels)]
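Applied to the question's code, a minimal sketch might look like this (note the question's file is 32-bit, so the struct format should be 'i' for 4-byte integer samples rather than 'h', assuming integer PCM):

import wave
from struct import unpack

def read_channels(filename):
    with wave.open(filename, 'rb') as wave_file:
        nframes = wave_file.getnframes()
        nchannels = wave_file.getnchannels()
        rate = wave_file.getframerate()
        frames = wave_file.readframes(nframes)
    data = unpack("%di" % (nchannels * nframes), frames)  # 'i' = 32-bit samples
    # de-interleave: channel k holds samples k, k+nchannels, k+2*nchannels, ...
    return [data[offset::nchannels] for offset in range(nchannels)], rate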
I need to analyze sound written in a .wav file. For that I need to transform this file into a set of numbers (arrays, for example). I think I need to use the wave package. However, I do not know how exactly it works. For example, I did the following:
import wave
w = wave.open('/usr/share/sounds/ekiga/voicemail.wav', 'r')
for i in range(w.getnframes()):
    frame = w.readframes(i)
    print(frame)
As a result of this code I expected to see sound pressure as a function of time. In contrast, I see a lot of strange, mysterious symbols (which are not hexadecimal numbers). Can anybody please help me with that?
Per the documentation, scipy.io.wavfile.read(somefile) returns a tuple of two items: the first is the sampling rate in samples per second, the second is a numpy array with all the data read from the file:
from scipy.io import wavfile
samplerate, data = wavfile.read('./output/audio.wav')
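You can inspect what came back; the dtype depends on the file's bit depth (for example, int16 for 16-bit PCM):

print(samplerate)              # e.g. 44100
print(data.dtype, data.shape)  # e.g. int16, (n_samples, n_channels) for stereo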
Using the struct module, you can take the wave frames (which are in two's complement binary between -32768 and 32767, i.e. 0x8000 and 0x7FFF). This reads a MONO, 16-BIT WAVE file. I found this webpage quite useful in formulating this:
import wave, struct

wavefile = wave.open('sine.wav', 'r')
length = wavefile.getnframes()
for i in range(0, length):
    wavedata = wavefile.readframes(1)
    data = struct.unpack("<h", wavedata)
    print(int(data[0]))
This snippet reads 1 frame. To read more than one frame (e.g., 13), use
wavedata = wavefile.readframes(13)
data = struct.unpack("<13h", wavedata)
Different Python modules to read wav:
There are at least the following libraries for reading wave audio files:
SoundFile
scipy.io.wavfile (from scipy)
wave (to read streams. Included in Python 2 and 3)
scikits.audiolab (unmaintained since 2010)
sounddevice (play and record sounds, good for streams and real-time)
pyglet
librosa (music and audio analysis)
madmom (strong focus on music information retrieval (MIR) tasks)
The simplest example:
This is a simple example with SoundFile:
import soundfile as sf
data, samplerate = sf.read('existing_file.wav')
Format of the output:
Warning: the data are not always in the same format; that depends on the library. For instance:
from scikits import audiolab
from scipy.io import wavfile
from sys import argv

for filepath in argv[1:]:
    x, fs, nb_bits = audiolab.wavread(filepath)
    print('Reading with scikits.audiolab.wavread:', x)
    fs, x = wavfile.read(filepath)
    print('Reading with scipy.io.wavfile.read:', x)
Output:
Reading with scikits.audiolab.wavread: [ 0. 0. 0. ..., -0.00097656 -0.00079346 -0.00097656]
Reading with scipy.io.wavfile.read: [ 0 0 0 ..., -32 -26 -32]
SoundFile and Audiolab return floats between -1 and 1 (as Matlab does; that is the convention for audio signals). SciPy and wave return integers, which you can convert to floats according to the number of bits of encoding, for example:
from scipy.io.wavfile import read as wavread

samplerate, x = wavread(audiofilename)  # x is a numpy array of integers, representing the samples
# scale to -1.0 -- 1.0
if x.dtype == 'int16':
    nb_bits = 16  # -> 16-bit wav files
elif x.dtype == 'int32':
    nb_bits = 32  # -> 32-bit wav files
max_nb_bit = float(2 ** (nb_bits - 1))
samples = x / (max_nb_bit + 1)  # samples is a numpy array of floats representing the samples
IMHO, the easiest way to get audio data from a sound file into a NumPy array is SoundFile:
import soundfile as sf
data, fs = sf.read('/usr/share/sounds/ekiga/voicemail.wav')
This also supports 24-bit files out of the box.
There are many sound file libraries available, I've written an overview where you can see a few pros and cons.
It also features a page explaining how to read a 24-bit wav file with the wave module.
You can accomplish this using the scikits.audiolab module. It requires NumPy and SciPy to function, and also libsndfile.
Note: I was only able to get it to work on Ubuntu, not on OSX.
from scikits.audiolab import wavread
filename = "testfile.wav"
data, sample_frequency, encoding = wavread(filename)
Now you have the wav data.
If you want to process audio block by block, some of the given solutions are quite awful in the sense that they imply loading the whole audio into memory, producing many cache misses and slowing down your program. python-wavefile provides some pythonic constructs to do NumPy block-by-block processing, using efficient and transparent block management by means of generators. Other pythonic niceties include a context manager for files and metadata as properties... and if you want the whole-file interface, because you are developing a quick prototype and you don't care about efficiency, the whole-file interface is still there.
A simple example of processing would be:
import sys
from wavefile import WaveReader, WaveWriter

with WaveReader(sys.argv[1]) as r:
    with WaveWriter('output.wav',
                    channels=r.channels,
                    samplerate=r.samplerate) as w:
        # Just to set the metadata
        w.metadata.title = r.metadata.title + " II"
        w.metadata.artist = r.metadata.artist
        # This is the processing loop
        for data in r.read_iter(size=512):
            data[1] *= .8  # lower volume on the second channel
            w.write(data)
The example reuses the same block to read the whole file, even for the last block, which usually is smaller than the requested size; in that case you get a slice of the block. So trust the returned block's length instead of using a hardcoded 512 size for any further processing.
If you're going to perform transforms on the waveform data, then perhaps you should use SciPy, specifically scipy.io.wavfile.
Here's a Python 3 solution using the built-in wave module [1] that works for n channels and 8, 16, 24... bits.
import sys
import wave

def read_wav(path):
    with wave.open(path, "rb") as wav:
        nchannels, sampwidth, framerate, nframes, _, _ = wav.getparams()
        print(wav.getparams(), "\nBits per sample =", sampwidth * 8)

        signed = sampwidth > 1            # 8 bit wavs are unsigned
        byteorder = sys.byteorder         # wave module uses sys.byteorder for bytes

        values = []  # e.g. for stereo, values[i] = [left_val, right_val]
        for _ in range(nframes):
            frame = wav.readframes(1)     # read next frame
            channel_vals = []             # mono has 1 channel, stereo 2, etc.
            for channel in range(nchannels):
                as_bytes = frame[channel * sampwidth: (channel + 1) * sampwidth]
                as_int = int.from_bytes(as_bytes, byteorder, signed=signed)
                channel_vals.append(as_int)
            values.append(channel_vals)

    return values, framerate
You can turn the result into a NumPy array.
import numpy as np
data, rate = read_wav(path)
data = np.array(data)
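Each row of the array is one frame, so the columns are the channels; for a stereo file, for example:

left = data[:, 0]   # first channel
right = data[:, 1]  # second channel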
Note, I've tried to make it readable rather than fast. I found reading all the data at once was almost 2x faster. E.g.
with wave.open(path, "rb") as wav:
    nchannels, sampwidth, framerate, nframes, _, _ = wav.getparams()
    all_bytes = wav.readframes(-1)

framewidth = sampwidth * nchannels
frames = (all_bytes[i * framewidth: (i + 1) * framewidth]
          for i in range(nframes))

for frame in frames:
    ...
Although python-soundfile is roughly 2 orders of magnitude faster (hard to approach this speed with pure CPython).
[1] https://docs.python.org/3/library/wave.html
My dear, as far as I understand what you are looking for, you are getting into a theory field called Digital Signal Processing (DSP). This engineering area covers everything from simple analysis of discrete-time signals to complex adaptive filters. A nice idea is to think of discrete-time signals as vectors, where each element of the vector is a sampled value of the original, continuous-time signal. Once you have the samples in vector form, you can apply different digital signal processing techniques to this vector.
Unfortunately, in Python, moving from audio files to a NumPy array vector is rather cumbersome, as you may have noticed... If you don't idolize one programming language over another, I highly suggest trying out MATLAB/Octave. MATLAB makes accessing the samples from files straightforward; audioread() does this task for you :) And there are a lot of toolboxes designed specifically for DSP.
Nevertheless, if you really intend to get into Python for this, I'll give you a step-by-step guide.
1. Get the samples
The easiest way to get the samples from a .wav file is:
from scipy.io import wavfile
sampling_rate, samples = wavfile.read('/path/to/file.wav')
Alternatively, you could use the wave and struct packages to get the samples:
import numpy as np
import wave, struct
wav_file = wave.open('/path/to/file.wav', 'rb')
# from .wav file to binary data (a bytes object)
binary_data = wav_file.readframes(wav_file.getnframes())
# from binary data to samples
s = np.array(struct.unpack('{n}h'.format(n=wav_file.getnframes()*wav_file.getnchannels()), binary_data))
Answering your question: binary_data is a bytes object, which is not human-readable and only makes sense to a machine. You can validate this statement by typing type(binary_data). If you really want to understand a little more about this bunch of odd characters, click here.
If your audio is stereo (that is, has 2 channels), you can reshape this signal to achieve the same format obtained with scipy.io:
s_like_scipy = s.reshape(-1, wav_file.getnchannels())
Each column is a channel. Either way, the samples obtained from the .wav file can be used to plot and understand the temporal behavior of the signal.
In both alternatives, the samples obtained from the file are represented in Linear Pulse-Code Modulation (LPCM).
2. Do digital signal processing stuff on the audio samples
I'll leave that part up to you :) But this is a nice book to take you through DSP. Unfortunately, I don't know any good books in Python; they are usually horrible books... But do not worry about it: the theory can be applied in the very same way using any programming language, as long as you master that language.
Whatever book you pick, stick with the classical authors, such as Proakis, Oppenheim, and so on... Do not worry about the programming language they use. For a more practical guide to DSP for audio using Python, see this page.
3. Play the filtered audio samples
import pyaudio

p = pyaudio.PyAudio()
stream = p.open(format=p.get_format_from_width(wav_file.getsampwidth()),
                channels=wav_file.getnchannels(),
                rate=wav_file.getframerate(),
                output=True)
# from samples to a new binary data block
new_binary_data = struct.pack('{}h'.format(len(s)), *s)
stream.write(new_binary_data)
where wav_file.getsampwidth() is the number of bytes per sample, and wav_file.getframerate() is the sampling rate. Just use the same parameters as the input audio.
4. Save the result in a new .wav file
wav_file = wave.open('/path/to/new_file.wav', 'w')
wav_file.setparams((nchannels, sampwidth, sampling_rate, nframes, "NONE", "not compressed"))
for sample in s:
    wav_file.writeframes(struct.pack('h', int(sample)))
where nchannels is the number of channels, sampwidth is the number of bytes per sample, sampling_rate is the sampling rate, and nframes is the total number of frames.
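As a side note, packing all the samples in one call is much faster than the per-sample loop above (same result, same 'h' format):

wav_file.writeframes(struct.pack('{}h'.format(len(s)), *(int(v) for v in s)))
wav_file.close()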
I needed to read a 1-channel 24-bit WAV file. The post above by Nak was very useful. However, as basj mentioned above, 24-bit is not straightforward. I finally got it working using the following snippet:
import numpy as np
from scipy.io import wavfile

TheFile = 'example24bit1channelFile.wav'
[fs, x] = wavfile.read(TheFile)

# convert the loaded data into a 24bit signal
nx = len(x)
ny = nx // 3 * 4  # four 3-byte samples are contained in three int32 words
y = np.zeros((ny,), dtype=np.int32)  # initialise array

# build the data left aligned in order to keep the sign bit operational.
# result will be factor 256 too high
y[0:ny:4] = ((x[0:nx:3] & 0x000000FF) << 8) | \
    ((x[0:nx:3] & 0x0000FF00) << 8) | ((x[0:nx:3] & 0x00FF0000) << 8)
y[1:ny:4] = ((x[0:nx:3] & 0xFF000000) >> 16) | \
    ((x[1:nx:3] & 0x000000FF) << 16) | ((x[1:nx:3] & 0x0000FF00) << 16)
y[2:ny:4] = ((x[1:nx:3] & 0x00FF0000) >> 8) | \
    ((x[1:nx:3] & 0xFF000000) >> 8) | ((x[2:nx:3] & 0x000000FF) << 24)
y[3:ny:4] = (x[2:nx:3] & 0x0000FF00) | \
    (x[2:nx:3] & 0x00FF0000) | (x[2:nx:3] & 0xFF000000)

y = y // 256  # correct for building 24 bit data left aligned in 32bit words
Some additional scaling is required if you need results between -1 and +1. Maybe some of you out there might find this useful.
If it's just two files and the sample rate is significantly high, you could just interleave them.
from scipy.io import wavfile

rate1, dat1 = wavfile.read(File1)
rate2, dat2 = wavfile.read(File2)

if len(dat2) > len(dat1):  # swap so dat1 is the longer one
    dat1, dat2 = dat2, dat1

output = dat1
for i in range(len(dat2) // 2):
    output[i * 2] = dat2[i * 2]

wavfile.write(OUTPUT, rate1, output)
PyDub (http://pydub.com/) has not been mentioned and that should be fixed. IMO this is the most comprehensive library for reading audio files in Python right now, although not without its faults. Reading a wav file:
from pydub import AudioSegment
audio_file = AudioSegment.from_wav('path_to.wav')
# or
audio_file = AudioSegment.from_file('path_to.wav')
# do whatever you want with the audio, change bitrate, export, convert, read info, etc.
# Check out the API docs http://pydub.com/
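For instance, two of the operations the comment mentions (set_channels() and export() are standard AudioSegment methods):

mono = audio_file.set_channels(1)              # down-mix to mono
mono.export('path_to_mono.wav', format='wav')  # write out a new wav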
PS. The example is about reading a wav file, but PyDub can handle a lot of various formats out of the box. The caveat is that it's based on both native Python wav support and ffmpeg, so you have to have ffmpeg installed and a lot of the pydub capabilities rely on the ffmpeg version. Usually if ffmpeg can do it, so can pydub (which is quite powerful).
Non-disclaimer: I'm not related to the project, but I am a heavy user.
You can also use the wavio library. You'll also need some basic knowledge of sound to use it.