Python: perform FFT on wave data for a 25 fps animation

I am doing a project where I want to use the data of a .wav file to drive animation. The problems I am facing are mainly due to the fact that the animation is 25 fps while I have 44100 samples per second in the .wav file, so I've broken the data into chunks of 44100/25 samples per frame. Working with the amplitude is fine; I created an initial test to try it out and it worked. This is the code:
import wave
import struct

wav = wave.open('test.wav', 'rb')
rate = wav.getframerate()  # 44100 for this file
nframes = wav.getnframes()
data = wav.readframes(-1)
wav.close()

# Unpack all 16-bit samples first, then de-interleave the channels.
# (Slicing the raw bytes with data[offset::2] splits low/high bytes
# rather than channels, so the unpacked samples come out garbled.)
samples = struct.unpack('<%ih' % (nframes * 2), data)
ch1 = samples[0::2]
ch2 = samples[1::2]

spf = rate // 25  # samples per animation frame
kf = []
for i in range(0, len(ch2) - spf + 1, spf):
    cur1 = sum(ch1[i:i + spf])
    cur2 = sum(ch2[i:i + spf])
    kf.append((cur1 + cur2) / float(spf) / 2.)
min_v = min(kf)
max_v = max(kf)
if abs(max_v) > abs(min_v):
    kf = [float(i) / max_v for i in kf]
else:
    kf = [float(i) / min_v for i in kf]
Now I want to get the spectrum for each separate keyframe as I do for the amplitude, but I am struggling to think of a way to do it. I can get the spectrum for the whole file using FFT, but that's not what I want, because ideally I would like to have different movements of the objects in accordance with different frequencies.

Look at scipy.io.wavfile. It'll turn the wave file into a numpy array. Numpy also has FFT functions, and matplotlib can plot a spectrogram of the entire file with specgram.
from scipy.io import wavfile
sample_rate, data = wavfile.read(filename)
Then you have to work out the timing of how you want to read the data. Matplotlib has animation tools that will call a function at a given interval. The other way of doing it is to use PyAudio; with PyAudio you can listen to the data while it is displayed.
Next, run each chunk of data through the FFT. Store the FFT values in a spectrogram array and use matplotlib's imshow to display it. You will probably have to rotate the array in some fashion when you display the spectrogram.
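For the 25 fps case in the question, a minimal sketch (my illustration, assuming a 16-bit file named test.wav as above) would chunk the signal per animation frame and take the FFT of each chunk:
import numpy as np
from scipy.io import wavfile

sample_rate, data = wavfile.read('test.wav')
if data.ndim > 1:
    data = data.mean(axis=1)  # mix stereo down to mono

spf = sample_rate // 25       # samples per animation frame (1764 at 44.1 kHz)
nchunks = len(data) // spf

# One row per animation frame: magnitudes of the positive frequencies.
spectrogram = np.array([np.abs(np.fft.rfft(data[i * spf:(i + 1) * spf]))
                        for i in range(nchunks)])
freqs = np.fft.rfftfreq(spf, d=1.0 / sample_rate)  # frequency of each column
Each row of spectrogram is then one keyframe; summing a row's values over a few chosen frequency bands gives one driver value per band per frame.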
From personal experience, be careful with Python threads. Threading works for I/O, but for calculations a thread can dominate the whole application and slow everything down. Also, GUI elements (like plotting) don't really work in threads. Use matplotlib's animation tools for the plotting.

Related

Drawing Dynamic Spectrogram in real time

I essentially have three data points in three separate numpy arrays, and I want to plot the time along the x-axis, the frequency along the y-axis, while the magnitude should represent the colors. But this is all happening in real time, so it should look like the spectrogram is drawing itself as the data updates.
I have simplified the code to show that I am running a loop that takes at most 10 ms to run and during that time I am getting new data from functions.
import numpy as np
import random
from time import sleep

def new_val_freq():
    return random.randint(0, 22)

def new_val_mag():
    return random.randint(100, 220)

x_axis_time = np.array([1])
y_axis_frequency = np.array([10])
z_axis_magnitude = np.array([100])
t = 1
while True:
    x_axis_time = np.append(x_axis_time, [t + 1])
    t += 1
    y_axis_frequency = np.append(y_axis_frequency, [new_val_freq()])
    z_axis_magnitude = np.append(z_axis_magnitude, [new_val_mag()])
    sleep(0.01)
    # Trying to figure out how to create/update a spectrogram plot with the
    # above additional data in real time without lag
Ideally I would like this to be as fast as it possibly can be, rather than having to redraw the whole spectrogram. It seems matplotlib is not great at plotting dynamic spectrograms, and I have not come across any dynamic spectrogram examples, so I was wondering how I could do this?
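A minimal sketch of one common approach (mine, not from the thread; it assumes each update supplies a full column of magnitudes, here faked with random data): keep the spectrogram in a fixed-size array, shift it one column per update, and push it into an existing imshow artist with set_data instead of redrawing the axes:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation

n_freqs, n_cols = 23, 200  # rows: frequency bins, columns: time steps
spec = np.zeros((n_freqs, n_cols))

fig, ax = plt.subplots()
im = ax.imshow(spec, origin='lower', aspect='auto',
               vmin=0, vmax=220, interpolation='nearest')
ax.set_xlabel('time step')
ax.set_ylabel('frequency bin')

def update(_frame):
    new_col = np.random.randint(100, 220, size=n_freqs)  # stand-in for real data
    spec[:, :-1] = spec[:, 1:]  # shift old columns left
    spec[:, -1] = new_col       # append the new column
    im.set_data(spec)           # update the artist, no full redraw
    return (im,)

ani = animation.FuncAnimation(fig, update, interval=10, blit=True)
plt.show()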

How to properly simulate walkie-talkie / radio noise from recordings and add it to clean audio in Python

I have a slew of audio files of radio comms from https://www.liveatc.net/ . My intention is to simulate the communication background noise from these types of audio (think airplanes, AM radio, etc.) and apply it to clean data.
My initial thought was to analyze these noisy audio files, find their standard deviations, and then apply Gaussian white noise with the average standard deviation of all the files (they are all relatively similar, with means close to 0). However, there is likely much more to consider, since the clean audio I am looking at is at different volumes, etc. With my current attempt of taking random files and applying their statistics to a clean speech file, the white noise massively drowns out the clean speech. How can I simulate noise from files and apply it to other data?
I make sure to keep all my files in .wav format with a 16000 Hz (16 kHz) frame rate. I have attached some pseudocode of what I have tried, but I am looking for other ideas/things to consider.
import numpy as np
import matplotlib.pyplot as plt
from pydub import AudioSegment
from pathlib import Path

# get lists of standard deviations and means for all noisy radio-comms files
std_devs = []
means = []
for file in Path(audio_files_dir).iterdir():  # audio_files_dir set elsewhere
    audio = AudioSegment.from_file(file)
    audio = audio.set_frame_rate(16000)
    print(file)
    print(audio.frame_rate)
    samples = audio.get_array_of_samples()  # sample amplitudes
    samples = np.array(samples)             # cast to numpy array
    std_devs.append(np.std(samples))        # std dev of the samples
    means.append(np.mean(samples))          # mean of the samples

# mean of all standard deviations, will use this to create white noise
avg_std_devs = sum(std_devs) / len(std_devs)
print(avg_std_devs)

# taking a clean audio file and prepping it for adding white noise
audio_signal_pydub = AudioSegment.from_file(audio_file_path)  # path set elsewhere
samples = audio_signal_pydub.get_array_of_samples()  # audio signal as an array of amplitudes
samples = np.array(samples)                          # turn it into a numpy array
print(samples.shape)

# plotting for pydub
time = np.linspace(0, audio_signal_pydub.duration_seconds, num=len(samples))  # x axis in seconds
plt.figure(1)
plt.title("Plot with pydub")
plt.plot(time, samples)
plt.show()

# array of white noise with std dev equal to the average over the ATC files
noise = np.random.normal(0, avg_std_devs, samples.shape)
signal_w_noise = samples + noise

plt.figure(2)
plt.title("Plot signal with Gaussian white noise")
plt.plot(time, signal_w_noise)
plt.show()

# turn the signal-with-noise array back into an audio segment for export
# (_spawn expects raw bytes of the same sample width, hence the int16 cast)
new_audio_segment = audio_signal_pydub._spawn(signal_w_noise.astype(np.int16).tobytes())
print(new_audio_segment)
new_audio_segment.export("new_audio.wav", format='wav')
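The pseudocode above reuses the noise files' absolute standard deviation and ignores the level of the clean speech, which is likely why the noise drowns the voice out. A sketch of one way around it (my suggestion, not from the thread): scale the generated noise to a target signal-to-noise ratio relative to each clean file:
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale noise so the mix has the requested SNR in dB."""
    speech = speech.astype(np.float64)
    noise = noise.astype(np.float64)
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # gain such that speech_power / (gain**2 * noise_power) == 10**(snr_db/10)
    gain = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10.0)))
    return speech + gain * noise

# e.g. mixed = mix_at_snr(samples, noise, snr_db=10)  # noise 10 dB below speech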

Unsure how to use FFT data for spectrum analyzer

I'm trying to create a homemade spectrum analyzer with 8 strips of LEDs.
The part I'm struggling with is performing the FFT and understanding how to use the results.
So far this is what I have:
import opc
import time
import pyaudio
import wave
import sys
import numpy
import math

CHUNK = 1024

# Gets the pitch from the audio
def pitch(signal):
    # NOT SURE IF ANY OF THIS IS CORRECT
    signal = numpy.frombuffer(signal, dtype=numpy.int16)
    print("signal = ", signal)
    testing = numpy.fft.fft(signal)
    print("testing = ", testing)

wf = wave.open(sys.argv[1], 'rb')
RATE = wf.getframerate()
p = pyaudio.PyAudio()  # Instantiate PyAudio

# Open stream
stream = p.open(format=p.get_format_from_width(wf.getsampwidth()),
                channels=wf.getnchannels(),
                rate=wf.getframerate(),
                output=True)

# Read data
data = wf.readframes(CHUNK)

# Play stream
while data != b'':
    stream.write(data)
    data = wf.readframes(CHUNK)
    frequency = pitch(data)
    print("%s frequency" % frequency)
I'm struggling with what to do in the pitch method. I know I need to perform an FFT on the data that is passed in, but am really unsure how to do it.
Also, should I be using this function?
Because of the way np.fft.fft works, if you use 1024 data points you will get values for 512 frequencies (plus a value for zero Hz, the DC offset). If you only want 8 frequencies you have to feed it 16 data points.
You might be able to do what you want by downsampling by a factor of 64 - then 16 downsampled points would be time-equivalent to 1024 original points. I've never explored this, so I don't know what it entails or what the pitfalls might be.
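A sketch of that downsampling idea (my illustration, assuming a mono signal; scipy.signal.decimate recommends factors of 13 or less per call, so the 64x reduction is split into two stages of 8):
import numpy as np
from scipy import signal

def eight_band_levels(chunk):
    """chunk: 1024 mono samples -> 8 magnitudes (DC bin dropped)."""
    x = signal.decimate(chunk.astype(np.float64), 8)  # 1024 -> 128 samples
    x = signal.decimate(x, 8)                         # 128 -> 16 samples
    spectrum = np.abs(np.fft.rfft(x))                 # 9 values: DC + 8 bins
    return spectrum[1:]                               # the 8 non-DC bins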
You're going to have to do some learning - The Scientist and Engineer's Guide to Digital Signal Processing really is an excellent resource, at least it was for me.
Keep in mind that for an audio CD .wav file the sample frequency is 44100 Hz - a 1024-sample chunk is only about 23 ms of the sound.
scipy.io.wavfile.read makes getting the data easy.
samp_rate, data = scipy.io.wavfile.read(filename)
data is a 2-d numpy array with one channel in column zero, data[:,0], and the other in column 1, data[:,1].
Matplotlib's specgram and psd functions can give you the data you want. A graphing analog to what you are trying to do would be:
from matplotlib import pyplot as plt
import scipy.io.wavfile
samp_rate, data = scipy.io.wavfile.read(filename)
Pxx, freqs, bins, im = plt.specgram(data[:1024,0], NFFT = 16, noverlap = 0, Fs = samp_rate)
plt.show()
plt.close()
Since you aren't doing any plotting, just use matplotlib.mlab.specgram:
Pxx, freqs, t = matplotlib.mlab.specgram(data[:1024,0], NFFT = 16, noverlap = 0, Fs = samp_rate)
Its return values (Pxx, freqs, t) are:
- Pxx: 2-D array, columns are the periodograms of successive segments
- freqs: 1-D array of frequencies corresponding to the rows in Pxx
- t: 1-D array of times corresponding to midpoints of segments
Pxx[1:, 0] holds the values for the frequencies at T0, Pxx[1:, 1] for T1, Pxx[1:, 2] for T2, ... This is what you would feed to your display. You don't use Pxx[0, :] because it is for 0 Hz.
There is also a power spectral density function, matplotlib.mlab.psd().
Maybe another strategy to get down to 8 bands would be to use large chunks and normalize the values. Then you could break the values up into eight segments and take the sum of each segment. I think this is valid - maybe only for the power spectral density. sklearn.preprocessing.normalize can do the normalization:
w = sklearn.preprocessing.normalize(Pxx[1:,:], norm = 'l1', axis = 0)
But then again, I just made all that up.
I don't know about the scipy.io.wavfile.read function that @wwii mentions in his answer, but it seems that his suggestion is the way to go to handle the signal loading. However, I just wanted to comment on the Fourier transform.
What I imagine you intend to do with your LED setup is to change each LED's brightness according to the power of the spectrum in each of the 8 frequency bands you intend to use. Thus, as I understand it, what you need is to compute that power as time goes by. The first complication is: how do you compute the spectral power?
The best way to do this is with numpy.fft.rfft, which computes the Fourier transform for signals that contain only real numbers (not complex ones). In contrast, numpy.fft.fft is a general-purpose function that can compute the fast Fourier transform for signals with complex numbers. The conceptual difference is that numpy.fft.fft can be used to study travelling waves and their propagation direction, because the returned amplitudes correspond to positive or negative frequencies that indicate how the wave travels. numpy.fft.rfft yields the amplitudes for real-valued frequencies, as seen in numpy.fft.rfftfreq, which is what you need.
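As a quick illustration of rfft and rfftfreq on a real-valued test tone (my example, not part of the original answer):
import numpy as np

rate = 44100
t = np.arange(1024) / float(rate)
sig = np.sin(2 * np.pi * 440 * t)  # a 440 Hz test tone

amplitude = np.fft.rfft(sig)                   # 513 complex values for 1024 samples
freqs = np.fft.rfftfreq(len(sig), 1.0 / rate)  # frequency (Hz) of each value

print(freqs[np.abs(amplitude).argmax()])  # ~430.7 Hz: the bin nearest 440
                                          # (resolution is 44100/1024 ~ 43 Hz)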
The last issue is choosing appropriate frequency bands in which to compute the spectral power. The human ear has a huge frequency response range, and the width of each band varies greatly, with the low-frequency bands being very narrow and the high-frequency bands very wide. Googling around, I found this nice resource that defines 7 relevant frequency bands:
Sub-bass: 20 to 60 Hz
Bass: 60 to 250 Hz
Low midrange: 250 to 500 Hz
Midrange: 500 Hz to 2 kHz
Upper midrange: 2 to 4 kHz
Presence: 4 to 6 kHz
Brilliance: 6 to 20 kHz
I would suggest using these bands, but splitting the upper midrange into 2-3 kHz and 3-4 kHz. That way you'll be able to use your 8-LED setup. Here is an updated pitch function for you to use:
import sys
import wave
import numpy

wf = wave.open(sys.argv[1], 'rb')
CHUNK = 1024
RATE = wf.getframerate()
DT = 1. / float(RATE)  # time between two successive audio frames
FFT_FREQS = numpy.fft.rfftfreq(CHUNK, DT)
FFT_FREQS_INDS = -numpy.ones_like(FFT_FREQS)

bands_bounds = [[20, 60],       # Sub-bass
                [60, 250],      # Bass
                [250, 500],     # Low midrange
                [500, 2000],    # Midrange
                [2000, 3000],   # Upper midrange 0
                [3000, 4000],   # Upper midrange 1
                [4000, 6000],   # Presence
                [6000, 20000]]  # Brilliance

# Map each FFT frequency to the band (and LED) it falls in.
for f_ind, freq in enumerate(FFT_FREQS):
    for led_ind, bounds in enumerate(bands_bounds):
        if bounds[0] <= freq < bounds[1]:
            FFT_FREQS_INDS[f_ind] = led_ind

# Returns the spectral power in each of the 8 bands assigned to the LEDs
# (assumes a mono stream of CHUNK 16-bit frames per call)
def pitch(signal):
    # CONSIDER SWITCHING TO scipy.io.wavfile.read TO GET THE SIGNAL
    signal = numpy.frombuffer(signal, dtype=numpy.int16)
    amplitude = numpy.fft.rfft(signal.astype(float))
    power = [numpy.sum(numpy.abs(amplitude[FFT_FREQS_INDS == led_ind])**2)
             for led_ind in range(len(bands_bounds))]
    return power
The first part of the code computes the FFT frequencies and builds the array FFT_FREQS_INDS, which records which of the 8 frequency bands each FFT frequency belongs to. Then, in pitch, the spectral power in each band is computed. Of course this can be optimized, but I tried to make the code self-explanatory.

Python: Frequency Analysis of Sound Files

I am generating some sound files that play tones at various frequencies with a certain number of harmonics.
Ultimately, these sounds will be played on a device with a small speaker.
I have the frequency response curve of the speaker and want to do the following in Python:
Plot the frequency spectrum of a sound file. I need to take the FFT of the file and plot it with gnuplot.
Apply a nonlinear transfer function based on the frequency response curve in the data sheet.
Plot the result after the function is applied.
Does anyone know:
what the simplest way to do this would be?
or of an application (GNU/Linux based) that could do this for me?
I know you didn't mention pylab/matplotlib, but it works. Here is an example (it assumes a single-channel signal):
import pylab
import scipy
from scikits import audiolab

x, fs, nbits = audiolab.wavread('schubert.wav')
audiolab.play(x, fs)
N = 4 * fs  # four seconds of audio
X = scipy.fft(x[:N])  # old function-style API; on modern SciPy use numpy.fft.fft
Xdb = 20 * scipy.log10(scipy.absolute(X))
f = scipy.linspace(0, fs, N, endpoint=False)
pylab.plot(f, Xdb)
pylab.xlim(0, 5000)  # view up to 5 kHz

# H: the speaker's frequency response sampled at the same frequencies as X
Y = X * H
y = scipy.real(scipy.ifft(Y))
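H above is the speaker's frequency response sampled at the FFT bin frequencies. A sketch of one way to build it (mine; resp_f and resp_db are made-up datasheet points) is to interpolate the datasheet curve onto f and convert dB to a linear gain:
import numpy as np

# Hypothetical (frequency Hz, response dB) points read off the datasheet curve
resp_f = np.array([100, 500, 1000, 2000, 5000, 10000])
resp_db = np.array([-12, -3, 0, -1, -6, -18])

# Interpolate onto the positive-frequency half of f, convert dB to linear
# gain, and mirror for the negative-frequency half (approximate symmetry;
# the ifft's real part is taken above anyway).
H_half = 10 ** (np.interp(f[:N // 2], resp_f, resp_db) / 20.0)
H = np.concatenate([H_half, H_half[::-1]])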
You can use numpy and matplotlib. Something like the code below:
import numpy
import pylab

spectrum = numpy.fft.fft(signal)  # signal: your 1-d array of samples
frequencies = numpy.fft.fftfreq(len(spectrum), d=1.0 / sample_rate)  # in Hz
pylab.plot(frequencies, numpy.abs(spectrum))  # plot the magnitude
pylab.show()
That will show a graph of the FFT spectrum.
scipy has an FFT and hooks nicely into gnuplot. You should be able to use the scipy.signal module to do the math.
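For example, a small sketch of the gnuplot route (my illustration; the filenames are placeholders) is to dump frequency/magnitude columns to a text file and plot them from gnuplot:
import numpy as np
from scipy.io import wavfile

rate, data = wavfile.read('input.wav')
if data.ndim > 1:
    data = data[:, 0]  # take one channel

spectrum = np.abs(np.fft.rfft(data))
freqs = np.fft.rfftfreq(len(data), 1.0 / rate)
np.savetxt('spectrum.dat', np.column_stack([freqs, spectrum]))

# then, in gnuplot:  plot 'spectrum.dat' using 1:2 with lines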

Python frequency detection

OK, what I'm trying to do is a kind of audio processing software that can detect a prevalent frequency; if the frequency is played for long enough (a few ms), I know I have a positive match. I know I would need to use FFT or something similar, but in this field of math I suck; I did search the internet but did not find code that could do only this.
The goal I'm trying to achieve is to make myself a custom protocol to send data through sound. I need a very low bitrate (5-10 bps), but I'm also very limited on the transmitting end, so the receiving software will need to be custom (I can't use an actual hardware/software modem). I also want this to be software only (no additional hardware except the sound card).
Thanks a lot for the help.
The aubio libraries have been wrapped with SWIG and can thus be used from Python. Among their many features are several methods for pitch detection/estimation, including the YIN algorithm and some harmonic comb algorithms.
However, if you want something simpler, I wrote some code for pitch estimation some time ago and you can take it or leave it. It won't be as accurate as using the algorithms in aubio, but it might be good enough for your needs. I basically just took the FFT of the data times a window (a Blackman window in this case), squared the FFT values, found the bin with the highest value, and used quadratic interpolation around the peak (using the log of the max value and its two neighboring values) to find the fundamental frequency. The quadratic interpolation I took from some paper that I found.
It works fairly well on test tones, but it will not be as robust or as accurate as the other methods mentioned above. The accuracy can be increased by increasing the chunk size (or reduced by decreasing it). The chunk size should be a power of 2 to make full use of the FFT. Also, I am only determining the fundamental pitch for each chunk, with no overlap. I used PyAudio to play the sound through while writing out the estimated pitch.
Source Code:
# Read in a WAV and find the freq's
import pyaudio
import wave
import struct
import numpy as np

chunk = 2048

# open up a wave
wf = wave.open('test-tones/440hz.wav', 'rb')
swidth = wf.getsampwidth()
RATE = wf.getframerate()

# use a Blackman window
window = np.blackman(chunk)

# open stream
p = pyaudio.PyAudio()
stream = p.open(format=p.get_format_from_width(wf.getsampwidth()),
                channels=wf.getnchannels(),
                rate=RATE,
                output=True)

# read some data
data = wf.readframes(chunk)

# play stream and find the frequency of each chunk
while len(data) == chunk * swidth:
    # write data out to the audio stream
    stream.write(data)
    # unpack the data and multiply by the Blackman window
    indata = np.array(struct.unpack("%dh" % (len(data) // swidth),
                                    data)) * window
    # take the fft and square each value
    fftData = abs(np.fft.rfft(indata))**2
    # find the maximum
    which = fftData[1:].argmax() + 1
    # use quadratic interpolation around the max
    if which != len(fftData) - 1:
        y0, y1, y2 = np.log(fftData[which-1:which+2:])
        x1 = (y2 - y0) * .5 / (2 * y1 - y2 - y0)
        # find the frequency and output it
        thefreq = (which + x1) * RATE / chunk
        print("The freq is %f Hz." % thefreq)
    else:
        thefreq = which * RATE / chunk
        print("The freq is %f Hz." % thefreq)
    # read some more data
    data = wf.readframes(chunk)

if data:
    stream.write(data)
stream.close()
p.terminate()
If you're going to use FSK (frequency shift keying) for encoding data, you're probably better off using the Goertzel algorithm so you can check just the frequencies you want, instead of a full DFT/FFT.
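A minimal Goertzel sketch (my illustration, not from the original answer), measuring the power at a single target frequency in a block of samples:
import math

def goertzel_power(samples, sample_rate, target_freq):
    """Power of samples at (approximately) target_freq, via Goertzel."""
    n = len(samples)
    k = int(0.5 + n * target_freq / sample_rate)  # nearest DFT bin
    w = 2.0 * math.pi * k / n
    coeff = 2.0 * math.cos(w)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev2**2 + s_prev**2 - coeff * s_prev * s_prev2

# For FSK, compare goertzel_power(...) at the mark and space frequencies
# and pick whichever is larger for each symbol window.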
You can compute the frequency spectrum of sliding windows over your sound (e.g. with matplotlib's magnitude_spectrum), and then check for the presence of the prevalent frequency band by finding the area under the frequency spectrum curve for that band:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import auc

np.random.seed(0)

# Sine sample with a frequency of 5 Hz, plus some noise
sr = 32  # sampling rate
y = np.linspace(0, 5 * 2 * np.pi, sr)
y = np.tile(np.sin(y), 5)
y += np.random.normal(0, 1, y.shape)
t = np.arange(len(y)) / float(sr)

# Generate frequency spectrum
spectrum, freqs, _ = plt.magnitude_spectrum(y, sr)

# Calculate percentage for a frequency range
lower_frq, upper_frq = 4, 6
ind_band = np.where((freqs > lower_frq) & (freqs < upper_frq))
plt.fill_between(freqs[ind_band], spectrum[ind_band], color='red', alpha=0.6)
frq_band_perc = auc(freqs[ind_band], spectrum[ind_band]) / auc(freqs, spectrum)
print('{:.1%}'.format(frq_band_perc))
# 19.8%
plt.show()
While I haven't tried audio processing with Python before, perhaps you could build something based on SciPy (or its subproject NumPy), a framework for efficient scientific/engineering numerical computation. You might start by looking at scipy.fftpack for your FFT.
