How do I call a librosa function on the entire audio file? - python

I have short audio files which I'm trying to analyze using Librosa, in particular the spectral centroid function. However, this function outputs an array of values representing the spectral centroid at different frames within the audio file. The documentation says that the frame size can be changed by specifying the parameter n_fft when calling the function. It would be more useful to me if this function analyzed the entire audio file at once rather than outputting results at multiple points in time. Is there a way to specify that I want the function called with a frame size covering the entire audio file instead of the default size, which is 2048 samples? Or is there another, better way?
Cheers and thank you!

The length of the FFT window (n_fft) specifies not only how many samples you need, but also the frequency resolution of the result (longer n_fft, better resolution). To ensure comparable results for many files you probably want to use the same n_fft value for all of them.
With that out of the way, say your files all have no more than 16k samples. Computing a single FFT over a whole file is then still fast (the FFT runs in O(N log N)), though this obviously gets worse as the file size increases. So you could zero-pad each file to exactly 16384 samples (for example with librosa.util.fix_length) and call spectral_centroid(y=y, n_fft=16384, hop_length=16384, center=False). Because hop_length is set to the same value as n_fft, the windows do not overlap, and because the padded signal is exactly one window long, you get just one value per file. Note that I set center to False to avoid an adjustment that is not necessary for your scenario.
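For illustration, a minimal sketch of that call on one of librosa's bundled example clips (the clip and its length are just stand-ins for your actual files):
import librosa

y, sr = librosa.load(librosa.ex('trumpet'), duration=0.5)   # roughly 11k samples at 22.05 kHz
y = librosa.util.fix_length(y, size=16384)                  # zero-pad to exactly one analysis window
cent = librosa.feature.spectral_centroid(
    y=y, sr=sr, n_fft=16384, hop_length=16384, center=False)
print(cent.shape)                                           # (1, 1): a single value for the whole file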
As an alternative to choosing a long transform window, you could compute values for many overlapping windows (or frames) using the STFT (which is what librosa does anyway) and simply average the resulting values, like this:
import numpy as np
import librosa
y, sr = librosa.load(librosa.ex('trumpet'))
cent = librosa.feature.spectral_centroid(y=y, sr=sr, center=False)
avg_cent = np.mean(cent)
print(avg_cent)
2618.004809523263
The latter solution is in line with what is usually done in MIR (music information retrieval) and is my recommendation. Note that it also allows you to use other statistics, like the median, which may or may not be something you are interested in. In other words, you can look at the distribution of the framewise centroids, which arguably carries more meaning.
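For example, sticking with the same framewise centroids:
import numpy as np
import librosa

y, sr = librosa.load(librosa.ex('trumpet'))
cent = librosa.feature.spectral_centroid(y=y, sr=sr, center=False)
print(np.median(cent))                   # median centroid
print(np.percentile(cent, [25, 75]))     # a rough look at the spread of the distribution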

Related

How can I process large 3D grids of background data in Python to be quickly accessible for calculating numerical integrals dependent on the grid data?

The Problem
I'm a physics graduate research assistant, and I'm trying to build a very ambitious Python code that, boiled down to the basics, evaluates a line integral in a background described by very large arrays of data.
I'm generating large sets of data that are essentially t x n x n arrays, with n and t on the order of 100. They represent the time-varying temperatures and velocities of a 2-dimensional space. I need to collect many of these grids, then randomly choose a dataset and calculate a numerical integral dependent on the grid data along some random path through the plane (essentially 3 separate grids: x-velocity, y-velocity, and temperature, as the vector information is important). The end goal is gross amounts of statistics on the integral values for given datasets.
Visual example of time-varying background.
That essentially entails being able to sample the background at particular points in time and space, say like (t, x, y). The data begins as a big ol' table of points, with each row organized as ['time','xpos','ypos','temp','xvel','yvel'], with an entry for each (xpos, ypos) point in each timestep time, and I can massage it how I like.
The issue is that I need to sample thousands of random paths in many different backgrounds, so time is a big concern. Barring the time required to generate the grids, the biggest holdup is being able to access the data points on the fly.
I previously built a proof of concept of this project in Mathematica, which is much friendlier to the analytic mindset that I'm approaching the project from. In that prototype, I read in a collection of 10 backgrounds and used Mathematica's ListInterpolation[] function to generate a continuous function that represented the discrete grid data. I stored these 10 interpolating functions in an array and randomly called them when calculating my numerical integrals.
This method worked well enough, after some massaging, for 10 datasets, but as the project moves forward that may rapidly expand to, say, 10000 datasets. Eventually, the project is likely to be set up to generate the datasets on the fly, saving each for future analysis if necessary, but that is some time off. By then, it will be running on a large cluster machine that should be able to parallelize a lot of this.
In the meantime, I would like to generate some number of datasets and be able to sample from them at will in whatever is likely to be the fastest process. The data must be interpolated to be continuous, but beyond that I can be very flexible with how I want to do this. My plan was to do something similar to the above, but I was hoping to find a way to generate these interpolating functions for each dataset ahead of time, then save them to some file. The code would then randomly select a background, load its interpolating function, and evaluate the line integral.
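Schematically, the plan is something like the sketch below (scipy's RegularGridInterpolator and pickle are just stand-ins here for whatever machinery ends up working, and the tiny dummy arrays stand in for the real NT x n x n grids):
import pickle
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Build a callable interpolator over a regular (t, x, y) grid
times = np.linspace(0.0, 10.0, 20)       # NT timesteps (dummy values)
xs = np.linspace(-8.0, 8.0, 30)          # n grid points in x (dummy values)
ys = np.linspace(-8.0, 8.0, 30)          # n grid points in y (dummy values)
temp_grid = np.random.rand(20, 30, 30)   # placeholder temperature grid, shape (NT, n, n)
interp_temp = RegularGridInterpolator((times, xs, ys), temp_grid)

# Save the interpolator for later reuse (hypothetical filename)
with open('background_000_temp.pkl', 'wb') as fh:
    pickle.dump(interp_temp, fh)

# Later: load a randomly chosen background and sample it at (t, x, y) points
with open('background_000_temp.pkl', 'rb') as fh:
    interp_temp = pickle.load(fh)
print(interp_temp([[1.0, 0.5, -0.5]]))   # interpolated temperature at one point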
Initial Research
While hunting to see if someone else had already asked a similar question, I came across this:
Fast interpolation of grid data
The OP seemed interested in just getting back a tighter grid rather than a callable function, which might be useful to me if all else fails, but the solutions also seemed to rely on methods that are limited by the size of my datasets.
I've been Googling about for interpolation packages that could get at what I want. The only things I've found that seem to fit the bill are:
Scipy griddata()
Scipy interpn()
Numpy interp()
Attempts at a Solution
I have one sample dataset (I would make it available, but it's about 200MB or so), and I'm trying to generate and store an interpolating function for the temperature grid. Even just this step is proving pretty troubling for me, since I'm not very fluent in Python. I found that it was slightly faster to load the data through pandas, cut to the sections I'm interested in, and then stick this in a numpy array.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.interpolate import griddata
# Load grid data from file
gridData = pd.read_fwf('Backgrounds\\viscous_14_moments_evo.dat', header=None, names=['time','xpos','ypos','temp','xvel','yvel'])
# Set grid parameters
# nGridSpaces is total number of grid spaces / bins.
# Will be data-dependent in the future.
nGridSpaces = 27225
# Number of timesteps is gridData's time column divided by number of grid spaces.
NT = int(len(gridData['time'])/nGridSpaces)
From here, I've tried to use Scipy's interpn() and griddata() functions, to no avail. I believe I'm just not understanding how they want to take the data. I think that my issue with both is trying to corral the (t, x, y) "points" corresponding to the temperature values into a usable form.
The main thrust of my efforts has been trying to get them into Numpy's meshgrid(), but I believe that maybe I'm hitting the upper limit of the size of data Numpy will take for this sort of thing.
# Lists of points individually
tList=np.ndarray.flatten(pd.DataFrame(gridData[['time']]).to_numpy())
xList=np.ndarray.flatten(pd.DataFrame(gridData[['xpos']]).to_numpy())
yList=np.ndarray.flatten(pd.DataFrame(gridData[['ypos']]).to_numpy())
# 3D grid of points
points = np.meshgrid(tList, xList, yList)
# List of temperature values
tempValues=np.ndarray.flatten(pd.DataFrame(gridData[['temp']]).to_numpy())
# Interpolate and spit out a value for a point somewhere central-ish as a check
point = np.array([1,80,80])
griddata(points, tempValues, point)
This returns a value error on the line calling meshgrid():
ValueError: array is too big; `arr.size * arr.dtype.itemsize` is larger than the maximum possible size.
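A rough back-of-the-envelope check (just arithmetic, using NT and nGridSpaces from the code above) shows why: the flattened columns each have NT * nGridSpaces entries, and meshgrid tries to build three arrays with len(tList) * len(xList) * len(yList) elements each.
import numpy as np

n_rows = 100 * 27225                                           # roughly NT * nGridSpaces
bytes_per_grid = n_rows**3 * np.dtype(np.float64).itemsize
print(bytes_per_grid / 1e18, "exabytes per meshgrid output")   # hopelessly large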
The Questions
First off... What limitations are there on the sizes of datasets I'm using? I wasn't able to find anything in Numpy's documentation about maximum sizes.
Next... Does this even sound like a smart plan? Can anyone think of a smarter framework to get where I want to go?
What are the biggest impacts on speed when handling these types of large arrays, and how can I mitigate them?

Downsampling a long vector by averaging

Problem:
I have a long vector which I want to downsample.
By downsampling I mean take the long vector and shorten it by averaging parts of it.
I'm using Python (2.7)
For example I have a mono audio file which was recorded at 44,100 Hz.
The recording is 10 seconds long, providing a vector of length 441,000.
Say I want to downsample it to 8 kHz. The output vector should then be 80,000 samples long.
Of course, much of the information will be lost by downsampling.
Suggested solution:
The sample-rate ratio is:
R = 44100 / 8000 = 5.5125
So, I would like to take every 5.5125 samples of the original vector and average them into a single number.
Is there a library function which does this (i.e. from Numpy, statistics, etc. Python libraries)?
If not, can you recommend a neat way to run this? How can I average a non-integer index (5.5125)?
How can I "break" the original vector into smaller parts of 5.5125 size in a neat way?

What is the most efficient way to filter (smooth) continuous streaming data

I am in the process of making my own system monitoring tool. I'm looking to run a filter (like a Gaussian filter or similar) on a continuous stream of raw data that I'm receiving from a device (my CPU % in this case).
The collection of data values is n elements long. Every time this piece of code runs, it appends the new CPU value and removes the oldest, keeping the collection at a length of n, essentially a deque([float('nan')] * n, maxlen=n) where n is the length of the graph I'm plotting to.
It then filters the whole collection through a Gaussian filter to create the smoothed data points and plots them, producing an animated graph similar to the CPU % graphs found in most system monitors.
This works just fine. However, there has to be a more efficient way to filter the incoming data than running the filter over the whole data set every time a new value is added (in my case the graph updates every 0.2 s).
I can think of ways to do it without filtering the whole list, but I'm not sure they are very efficient. Is there anything out there in the signal-processing world that will work for me? Apologies if my explanation is a bit confusing; I'm very new to this.
from scipy.ndimage import gaussian_filter1d

# Not my actual code, but it describes what I'm doing
def animate():  # called every couple of hundred milliseconds to animate the graph
    # ... other stuff
    values.append(get_new_val())  # values = collection of recent CPU readings
    line.set_ydata(gaussian_filter1d(values, sigma=4))  # line = the line object used for graphing
    # ... other stuff
    graph_line(line)  # function that graphs the line
tl;dr: looking for an optimized way to smooth raw streaming data instead of filtering the whole data set every pass.
I've never used one myself, but what you need sounds like what a Savitzky–Golay filter is for. It is a local smoothing filter that can be used to make data more differentiable (and to differentiate it, while we're at it).
The good news is that scipy supports this filter as of version 0.14. The relevant part of the documentation:
scipy.signal.savgol_filter(x, window_length, polyorder, deriv=0, delta=1.0, axis=-1, mode='interp', cval=0.0)
Apply a Savitzky-Golay filter to an array.
This is a 1-d filter. If x has dimension greater than 1, axis determines the axis along which the filter is applied.
Parameters:
x : array_like
The data to be filtered. If x is not a single or double precision floating point array, it will be converted to type numpy.float64 before filtering.
window_length : int
The length of the filter window (i.e. the number of coefficients). window_length must be a positive odd integer.
polyorder : int
The order of the polynomial used to fit the samples. polyorder must be less than window_length.
[...]
I would first settle on a small polynomial order and window size. Instead of working with the full n data points, you only need to smooth a much smaller deque of length roughly window_length. As each new data point comes in, append it to the smaller deque, apply the Savitzky–Golay filter, take the newly filtered point, and append it to your graph.
Note, however, that it seems to me the method is only really well-defined away from the edges of the data set. This might mean that, for accuracy's sake, you may have to introduce a few measurements' worth of delay, so that you can always use points that lie inside a given window (that is, for a given time point you likely need "future" data points to get a reliable filtered value). Considering that your data is measured five times a second, this might be a reasonable compromise if necessary.
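A rough sketch of that incremental approach (the window length, polynomial order, and helper function below are example choices of mine, not anything prescribed by scipy):
from collections import deque
import numpy as np
from scipy.signal import savgol_filter

window_length, polyorder = 11, 3          # small odd window; tune to taste
recent = deque(maxlen=window_length)      # only the last few raw values are kept

def smoothed_point(new_value):
    # Append the newest raw sample and return one filtered value.
    recent.append(new_value)
    if len(recent) < window_length:
        return new_value                  # not enough history yet; pass the raw value through
    smoothed = savgol_filter(np.asarray(recent), window_length, polyorder)
    # Taking the centre of the window is better conditioned than the trailing edge,
    # at the cost of a delay of window_length // 2 samples (see the note above).
    return smoothed[window_length // 2]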

Filter gain issue when using scipy.signal in Python

I am attempting to filter a list of 16-bit two's-complement integers using a Butterworth filter generated with scipy.signal.butter and applied with scipy.signal.lfilter (the list was extracted from a PCM-encoded .wav file). My primary language is C, but the majority of this project already exists in Python, so I need to code this feature within the existing Python framework.
The Butterworth filters produced by scipy.signal are (as far as I'm aware) unity gain, and based on the plot of the filter's frequency response, that should be the case. I designed the filter using butter() and filtered my dataset using:
data.l = lfilter(b, a, data.l)
where data is a class instance whose list 'l' contains the PCM data, and 'b' and 'a' are the coefficients produced by butter().
However, regardless of the PCM data that I send in, the filter appears to be applying non-unity gain.
For example, filtering full-bandwidth pink noise with max(data.l) = 32,767 and min(data.l) = -32,768 before filtering (as expected for a 16-bit PCM signal) returns a signal with approximately 5% increased gain in the passband, i.e. max(data.l) = 34,319.0057 and min(data.l) = -37,593.
The filter appears to be filtering the signal correctly apart from the gain: if I save this PCM data back into a .wav file and compare a spectrogram of it to the original signal, the frequency response is exactly as expected from my test filters. It seems to be functioning perfectly except for the odd increase in gain.
Obviously I can just rescale this output down to fit into my 16-bit PCM dataset, however I am writing this as part of a wider set of signal processing modules that are designed to be flexible and eventually include non-unity gain filters. For this reason, I want to attempt to figure out why this gain is being applied so as to potentially fix the issue, and not be arbitrarily rescaling the output of my butter() filter.
Does anyone with experience with scipy.signal have an idea as to why this may be the case?
It is normal for a discrete time Butterworth filter to have ringing transients, and these can overshoot the bounds of the input. Take a look at the step response of your filter (as opposed to the frequency response, which is a steady state calculation).
For example, the following code
In [638]: from scipy.signal import butter, lfilter, freqz
In [639]: b, a = butter(3, 0.2)
In [640]: step_response = lfilter(b, a, np.ones(50))
In [641]: plot(step_response)
Out[641]: [<matplotlib.lines.Line2D at 0x10ecb2850>]
generates a plot of the step response. Note the overshoot.
The frequency response shows the expected gains for a (discrete time) Butterworth filter.
In [642]: w, h = freqz(b, a, worN=1000)
In [643]: plot(w/np.pi, np.abs(h))
Out[643]: [<matplotlib.lines.Line2D at 0x10f7a90d0>]
See also: http://www.dspguide.com/ch20/3.htm
Concerning your second question: if you remove the overshoots, you'll either introduce distortion (clipping) or end up with a different frequency response, since the impulse/step response and the frequency response are tied to each other through the Laplace transform.
By filtering you change your samples, so the concept of "preserving signal level" is questionable in my opinion: the level of a stationary (e.g. sinusoidal) signal in the passband should remain more or less the same, since a Butterworth filter has no ripple in the passband, but components in the stop band (transients -> high frequencies) are of course changed.
If you want to avoid the rescaling, you could try a filter with Bessel characteristics, which has essentially no overshoot, at the cost of a more gradual transition between pass band and stop band.
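For instance, a quick comparison of step-response overshoot for same-order Butterworth and Bessel designs (illustrative order and cutoff, not tuned to your filter):
import numpy as np
from scipy.signal import bessel, butter, lfilter

step = np.ones(50)
b_bw, a_bw = butter(3, 0.2)              # 3rd-order Butterworth, cutoff at 0.2 * Nyquist
b_be, a_be = bessel(3, 0.2)              # 3rd-order Bessel at the same cutoff

print(lfilter(b_bw, a_bw, step).max())   # noticeably greater than 1: overshoot
print(lfilter(b_be, a_be, step).max())   # much closer to 1: little overshoot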

FFT result; amplitude-phase relation

I want to amplify audio input at specific frequencies, and I use numpy.fft.
So my question is: When changing the amplitudes of the signal, what happens with phase?
For example, if I multiply amplitudes in some frequency range, by some factor, let's say 2, do I need to change the phases, and if so, what should I do with them?
I've done the amplification without changing phases, and the result was not what I wanted. It's pretty much the same signal, with some unwanted noise.
You shouldn't need to change the phase for something like this. More likely the problem is that you need to be a bit gentler about applying the boost. It sounds like you are taking some frequency window and multiplying it by a constant while leaving everything else unchanged. That will cause ringing in the time domain with a very long tail. You need to smooth the transition from the gain=1 region to the gain=2 region, for instance by using a Gaussian gain window, with code that looks something like this:
import numpy as np

def boost_band():
    x, t = get_waveform()        # placeholder: returns samples and their time axis
    f0, df = get_parameters()    # placeholder: center frequency and bandwidth of gain region
    f = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), t[1] - t[0])   # rfft has only non-negative frequency components
    gain_window = 1 + np.exp(-(freqs - f0)**2 / df**2)
    f = f * gain_window          # boost around f0, smoothly tapering back to gain 1
    x = np.fft.irfft(f, n=len(x))
    return x
If that works, you can experiment with more aggressive functions that have sharper turn-on and a flatter top.
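For example, raising the exponent turns the Gaussian into a flatter-topped, steeper-edged (super-Gaussian) window; the values below are placeholders playing the same roles as freqs, f0 and df above:
import numpy as np

freqs = np.linspace(0, 22050, 1024)      # placeholder frequency axis (Hz)
f0, df = 5000.0, 500.0                   # placeholder centre frequency and width
gain_window = 1 + np.exp(-((freqs - f0) / df) ** 8)   # flatter top, sharper turn-on than the Gaussian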
The FFT may not actually be what you want. FFTs are not normally used for real-time / streaming applications. This is because in the naive approach you have to collect the whole sample buffer before you start processing. For simple filtering applications it is often easier to do filtering directly in the time domain. This is what FIR and IIR filters do.
In order to filter with the Fourier transform in real time, what you have to do is break your data stream into overlapping blocks of a fixed length, FFT, filter, inverse FFT, and stitch them back together without introducing glitches. This is possible, but it is tricky to get right. For a full-blown multi-channel EQ it might still be the best option, but you should at least consider time-domain filtering.
If this is not a real-time application, then FFT is the way to go. For medium sized data sets (up to a few hundred megabytes) you can just FFT the whole data set. For much larger data sets you still have to break the data up into blocks, but they can be much larger blocks and you don't have to worry about the latency introduced.
Also, remember the FFT treats the signal as periodic, so if your signal doesn't go to zero at the beginning and end you will need to do some sort of windowing.
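As a minimal illustration of the time-domain alternative mentioned above (sample rate, band edges and tap count below are arbitrary placeholders):
import numpy as np
from scipy import signal

fs = 44100                                            # assumed sample rate
nyq = fs / 2.0
taps = signal.firwin(101, [300 / nyq, 3000 / nyq], pass_zero=False)   # 101-tap band-pass FIR
x = np.random.randn(fs)                               # one second of dummy audio
y = signal.lfilter(taps, 1.0, x)                      # can be applied block by block for streaming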
