How to do Histogram Equalization based on audio frequency?

I have already tried histogram equalization based on image and it works just fine.
Now I want to apply the same approach to audio frequency instead of image grayscale, which means I would like to make the spectrum flatter. The sampling rate is 44.1 kHz, and I want the frequencies spread evenly over the range 0-22050 Hz, but the peak is still the highest.
Here is the spectrum:
And this is what I have tried:
I think the original histogram I plotted is already wrong: I can't count the number of occurrences per frequency, or maybe I shouldn't do this at all. Somebody told me I need to use fft(), but I have no idea how to do it.
Any help would be appreciated! Thanks
Here is the code for how I plot the spectrum :
import librosa
import numpy as np
import matplotlib.pyplot as plt
import math
file = 'example.wav'
y, sr = librosa.load(file, sr=None)
n_fft = 2048
S = librosa.stft(y, n_fft=n_fft, hop_length=n_fft//2)
S = abs(S)
D_AVG = np.mean(S, axis=1)
plt.figure(figsize=(25, 12))
plt.bar(np.arange(D_AVG.shape[0]), D_AVG)
x_ticks_positions = [n for n in range(0, n_fft // 2, n_fft // 16)]
x_ticks_labels = [str(sr / n_fft * n) + 'Hz' for n in x_ticks_positions]
plt.xticks(x_ticks_positions, x_ticks_labels)
plt.xlabel('Frequency')
plt.ylabel('Mean magnitude')  # S holds linear STFT magnitudes, not dB
plt.savefig('spectrum.png')

"Equalization" in the sense of making a flat frequency spectrum is usually done by a whitening transformation. This post on dsp.stackexchange might also be helpful. As Mark mentioned, this spectral equalization is different from histogram equalization in image processing.
Equalizing/whitening the spectrum of a signal:
Estimate the PSD. Given an array of samples x with sample rate fs, you can compute a robust estimate of the power spectral density (PSD) with scipy.signal.welch:
f, psd = scipy.signal.welch(x, fs=fs)
This function performs Welch's method. Basically, it divides up the signal into several segments, does FFT on each one, and averages the power spectra to get a good estimate of how much power x has at each frequency on average. The point of all this is it gets a more reliable frequency characterization than just taking one FFT of x as a whole.
Compute equalizer gain. Use eq_gain = 1 / (1e-6 + psd)**0.5, or something similar, to determine the gain of the equalizer. The 1e-6 offset in the denominator avoids division by zero. The PSD is often extremely small at some frequencies because, say, x went through an anti-aliasing filter that made some high-frequency powers nearly zero.
Apply the equalizer gain. Finally, eq_gain needs to be applied to the signal x to equalize it. There are many ways this could be done, but one way is to use scipy.signal.firwin2 to turn the gains into an FIR filter,
eq_filter = scipy.signal.firwin2(99, f, eq_gain, fs=fs)
and use a convolution or scipy.signal.lfilter to apply the filter to x. You can then use scipy.signal.welch again to check that the PSD is flatter than before.
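The three steps above can be sketched end to end as follows. This is a minimal illustration, not a definitive implementation: the helper names whiten and spread_db, the synthetic tilted-noise test signal, and the choice to normalize the frequency grid before calling firwin2 (so that it ends at exactly 1.0) are assumptions made for this example.

```python
import numpy as np
from scipy import signal

def whiten(x, fs, numtaps=99, eps=1e-6):
    """Flatten the spectrum of x with an FIR equalizer (illustrative helper)."""
    # 1. Estimate the PSD with Welch's method.
    f, psd = signal.welch(x, fs=fs)
    # 2. Equalizer gain: inverse square root of the PSD, offset to avoid 1/0.
    eq_gain = 1.0 / np.sqrt(eps + psd)
    # 3. Design the FIR filter. The grid is normalized so that it ends at
    #    exactly 1.0 (Nyquist), which firwin2 expects when fs is not passed.
    eq_filter = signal.firwin2(numtaps, f / f[-1], eq_gain)
    # 4. Apply the filter to the signal.
    return signal.lfilter(eq_filter, 1.0, x)

def spread_db(x, fs):
    """Max-to-min PSD ratio in dB, ignoring the DC and Nyquist bins."""
    _, psd = signal.welch(x, fs=fs)
    inner = psd[1:-1]
    return 10.0 * np.log10(inner.max() / inner.min())

# Synthetic test signal (assumed): white noise with a gentle spectral tilt.
rng = np.random.default_rng(0)
fs = 44100
x = signal.lfilter([1.0, 0.5], 1.0, rng.standard_normal(2 * fs))
y = whiten(x, fs)

spread_before = spread_db(x, fs)
spread_after = spread_db(y, fs)
print(f"PSD spread before: {spread_before:.1f} dB, after: {spread_after:.1f} dB")
```

The offset eps only does its job if it is small compared with the PSD values you care about, so it may need rescaling for very quiet or very loud signals.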

Related

Area under the peak of a FFT in Python

I'm trying to do some tests before I proceed analyzing some real dataset via FFT, and I've found the following problem.
First, I create a signal as the sum of two cosines and then use rfft to do the transformation (since the signal has only real values):
import numpy as np
import matplotlib.pyplot as plt
from scipy.fft import rfft, rfftfreq
# Number of sample points
N = 800
# Sample spacing
T = 1.0 / 800.0
x = np.linspace(0.0, N*T, N)
y = 0.5*np.cos(10*2*np.pi*x) + 0.5*np.cos(200*2*np.pi*x)
# FFT
yf = rfft(y)
xf = rfftfreq(N, T)
fig, ax = plt.subplots(1,2,figsize=(15,5))
ax[0].plot(x,y)
ax[1].plot(xf, 2.0/N*np.abs(yf))
As can be seen from the definition of the signal, I have two oscillations with amplitude 0.5 and frequencies 10 and 200. I would expect the FFT spectrum to be something like two deltas at those points, but apparently increasing the frequency broadens the peaks:
From the first peak it can be inferred that the amplitude is 0.5, but not from the second. I've tried to obtain the area under the peak using np.trapz and use that as an estimate for the amplitude, but as the peak is close to a Dirac delta, the result is very sensitive to the interval I choose. My problem is that I need to get the amplitude as exactly as possible for my data analysis.
EDIT: As it seems to be related to the number of points, I decided to increase (now that I can) the sample frequency. This seems to solve the problem, as can be seen in the figure:
However, it still seems strange that for a certain number of points and sample frequency, the high-frequency peaks broaden...

It is not strange: you have leakage between the frequency bins. When you discretize (sample) the signal for the Fourier transform, frequency bins are created, which are the frequency intervals over which the amplitude is calculated. Each bin has a width given by sample_rate / num_points. So the fewer bins you have, the more difficult it is to assign precise amplitudes to every frequency. There are other considerations in choosing the sampling rate, such as the Nyquist-Shannon sampling theorem for preventing aliasing: https://en.wikipedia.org/wiki/Nyquist%E2%80%93Shannon_sampling_theorem. Depending on the problem, custom sampling rates are sometimes used. For example, a sampling rate of 44,100 Hz is widely used for audio because it is based on the limits of human hearing. So it also depends on the nature of the data you want to analyze. Anyway, since this question also has theoretical value, you can check https://dsp.stackexchange.com for some useful information.
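As a small illustration of the bin width formula sample_rate / num_points, the sketch below reads the spacing off the frequency axis that NumPy builds. The helper name bin_width and the two example sizes are assumptions chosen for this example.

```python
import numpy as np

def bin_width(sample_rate, num_points):
    """Spacing between neighbouring FFT bins, in Hz (illustrative helper)."""
    freqs = np.fft.rfftfreq(num_points, d=1.0 / sample_rate)
    return freqs[1] - freqs[0]

# Same sample rate, twice as many points: the bins become half as wide.
width_800 = bin_width(800.0, 800)
width_1600 = bin_width(800.0, 1600)
print(width_800, width_1600)
```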

I would comment to George's answer, but yet I cannot.
Maybe a starting point for your research are the properties of the Discrete Fourier Transform.
The signal in the time domain is actually the cosines multiplied by a box window, which transforms into the frequency domain as the convolution of the deltas with a sinc function. The sinc function smears the spectrum.
However, I am not sure we are observing spectral leakage here, since the window fits exactly to the full period of cosines. The discretization of the bins might still play a role here.
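One thing worth checking, offered as an observation about the question's code rather than something either answer states: np.linspace(0.0, N*T, N) includes the endpoint, so the actual sample spacing is N*T/(N-1) instead of T, and the window no longer holds a whole number of periods. A minimal sketch comparing the two samplings (the helper name amplitude_spectrum is an assumption):

```python
import numpy as np
from scipy.fft import rfft

N = 800          # number of sample points, as in the question
T = 1.0 / 800.0  # intended sample spacing

def amplitude_spectrum(endpoint):
    """Amplitude spectrum of the question's signal for a given linspace mode."""
    x = np.linspace(0.0, N * T, N, endpoint=endpoint)
    y = 0.5 * np.cos(10 * 2 * np.pi * x) + 0.5 * np.cos(200 * 2 * np.pi * x)
    return 2.0 / N * np.abs(rfft(y))

amp_with_endpoint = amplitude_spectrum(endpoint=True)
amp_without_endpoint = amplitude_spectrum(endpoint=False)

# With a 1 Hz bin width, index 10 and index 200 are the 10 Hz and 200 Hz bins.
print("with endpoint:   ", amp_with_endpoint[10], amp_with_endpoint[200])
print("without endpoint:", amp_without_endpoint[10], amp_without_endpoint[200])
```

With the endpoint included, the mismatch grows with frequency, which is consistent with the 200 Hz peak looking broader than the 10 Hz one.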

How do I get the frequencies from a signal?

I am looking for a way to obtain the frequency from a signal. Here's an example:
signal = [numpy.sin(numpy.pi * x / 2) for x in range(1000)]
This array represents the samples of a recorded sound (x = milliseconds)
sin(pi*x/2) => 250 Hz
How can we go from the signal (a list of points) to the frequencies in this array?
Note:
I have read many Stack Overflow threads and watched many YouTube videos. I have yet to find an answer. Please use simple words.
(I am thankful for every answer)

What you're looking for is known as the Fourier Transform
A bit of background
Let's start with the formal definition:
The Fourier transform (FT) decomposes a function (often a function of time, or a signal) into its constituent frequencies
This is in essence a mathematical operation that, when applied to a signal, gives you an idea of how present each frequency is in the time series. To get some intuition behind this, it might be helpful to look at the mathematical definition of the DFT:
$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi k n / N}$$
where k is swept from 0 all the way up to N-1 to calculate all the DFT coefficients.
The first thing to notice is that this definition somewhat resembles the correlation of two functions, in this case x(n) and the negative exponential function. While this may seem a little abstract, by using Euler's formula and playing around a bit with the definition, the DFT can be expressed as the correlation with both a sine wave and a cosine wave, which account for the imaginary and the real parts of the DFT.
So, keeping in mind that this is in essence computing a correlation, whenever a sine or cosine from the decomposition of the complex exponential matches a component of x(n), there will be a peak in X(k), meaning that this frequency is present in the signal.
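To make the correlation view concrete, here is a minimal sketch that computes the DFT by correlating the signal with cosines (real part) and sines (imaginary part), and compares the result with np.fft.fft. The function name dft_by_correlation and the random test signal are assumptions for illustration; the direct sum is far slower than the FFT and is only meant to show the idea.

```python
import numpy as np

def dft_by_correlation(x):
    """DFT as correlations with cosine and sine waves (illustrative, O(N^2))."""
    x = np.asarray(x, dtype=float)
    N = x.size
    n = np.arange(N)             # sample index
    k = n.reshape(-1, 1)         # frequency index, one row per coefficient
    angle = 2 * np.pi * k * n / N
    real = (x * np.cos(angle)).sum(axis=1)    # correlation with cosines
    imag = -(x * np.sin(angle)).sum(axis=1)   # correlation with sines
    return real + 1j * imag

rng = np.random.default_rng(0)
x = rng.standard_normal(64)
X = dft_by_correlation(x)
print(np.allclose(X, np.fft.fft(x)))
```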
How can we do the same with numpy?
Having given a very brief theoretical background, let's consider an example to see how this can be implemented in Python. Let's consider the following signal:
import numpy as np
import matplotlib.pyplot as plt
Fs = 150.0; # sampling rate
Ts = 1.0/Fs; # sampling interval
t = np.arange(0,1,Ts) # time vector
ff = 50; # frequency of the signal
y = np.sin(2*np.pi*ff*t)
plt.plot(t, y)
plt.xlabel('Time')
plt.ylabel('Amplitude')
plt.show()
Now, the DFT can be computed using np.fft.fft, which, as mentioned, tells you the contribution of each frequency to the signal, now in the transformed domain:
n = len(y) # length of the signal
k = np.arange(n)
T = n/Fs
frq = k/T # two sides frequency range
frq = frq[:len(frq)//2] # one side frequency range
Y = np.fft.fft(y)/n # dft and normalization
Y = Y[:n//2]
Now, if we plot the actual spectrum, you will see a peak at the frequency of 50Hz, which in mathematical terms is a delta function centred at the fundamental frequency of 50Hz. This can be checked in a table of Fourier transform pairs.
So for the above signal, we would get:
plt.plot(frq,abs(Y)) # plotting the spectrum
plt.xlabel('Freq (Hz)')
plt.ylabel('|Y(freq)|')
plt.show()
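Applied to the signal from the question, the same idea can be sketched as below. It assumes one sample per millisecond (so a sampling rate of 1000 Hz, as the question implies) and simply picks the frequency of the largest spectral peak; the variable names are illustrative.

```python
import numpy as np

fs = 1000.0  # assumed sampling rate: one sample per millisecond
signal = np.array([np.sin(np.pi * x / 2) for x in range(1000)])

spectrum = np.abs(np.fft.rfft(signal))                    # one-sided magnitudes
freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)          # matching axis in Hz
peak_freq = freqs[np.argmax(spectrum)]                    # strongest frequency
print(peak_freq)
```

For a signal with several components, you would look at all the peaks of spectrum instead of only the largest one.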

Applying Fourier Transform on Time Series data and avoiding aliasing

I want to apply a Fourier transform to time series data to convert it into the frequency domain. I am not sure whether the method I've used to apply the Fourier transform is correct. Following is the link to the data that I've used.
After reading the data file I've plotted original data using
t = np.linspace(0,55*24*60*60, 55)
s = df.values
sns.set_style("darkgrid")
plt.ylabel("Amplitude")
plt.xlabel("Time [s]")
plt.plot(t, s)
plt.show()
Since the data is sampled daily, I've converted it into seconds using 24*60*60, and for a period of 55 days using 55*24*60*60.
The graph looks as follows:
Next, I've implemented the Fourier transform using the following piece of code and obtained the image as follows:
#Applying Fourier Transform
fft = fftpack.fft(s)
#Time taken by one complete cycle of wave (seconds)
T = t[1] - t[0]
#Calculating sampling frequency
F = 1/T
N = s.size
#Avoid aliasing by multiplying sampling frequency by 1/2
f = np.linspace(0, 0.5*F, N)
#Convert frequency to mHz
f = f * 1000
#Plotting frequency domain against amplitude
sns.set_style("darkgrid")
plt.ylabel("Amplitude")
plt.xlabel("Frequency [mHz]")
plt.plot(f[:N // 2], np.abs(fft)[:N // 2])
plt.show()
I have the following questions:
I am not sure if my above methodology for implementing the Fourier transform is correct.
I am not sure if the method I am using to avoid aliasing is correct.
If what I've done is correct, then how do I interpret the three peaks in the frequency-domain plot?
Finally, how would I invert the transform using only the frequencies that are significant?

While I'd refrain from answering your first two questions (it looks okay to me but I'd love an expert's input), I can weigh in on the latter two:
If what I've done is correct, then how do I interpret the three peaks in the frequency-domain plot?
Well, that means you've got three main components to your signal at frequencies roughly 0.00025 mHz (not the best choice of units here, possibly!), 0.00125 mHz and 0.00275 mHz.
Finally, how would I invert the transform using only the frequencies that are significant?
You could just zero out every frequency component whose magnitude is below a cutoff you decide (say, an absolute value of 3, which should cover your peaks here). Then you can do:
below_cutoff = np.abs(fft) < 3
fft[below_cutoff] = 0
cleaner_signal = fftpack.ifft(fft)
And that should do it, really!
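A minimal sketch of the thresholding idea on synthetic data, since the question's data file is not available here. The stand-in signal, the 50% relative cutoff, and the variable names are assumptions for illustration. It uses np.fft.rfftfreq to build the frequency axis from the sample spacing. Note that aliasing is a property of how the data was sampled, so rescaling the frequency axis afterwards cannot remove it; the highest frequency you can resolve is half the sampling frequency.

```python
import numpy as np

N = 55                 # number of daily samples, as in the question
T = 24 * 60 * 60.0     # sample spacing in seconds (one day)

# Hypothetical stand-in data: 5 cycles over the 55 days plus a little noise.
rng = np.random.default_rng(0)
clean = 10.0 * np.sin(2 * np.pi * 5 * np.arange(N) / N)
s = clean + 0.1 * rng.standard_normal(N)

S = np.fft.rfft(s)                 # one-sided spectrum of the real signal
f = np.fft.rfftfreq(N, d=T)        # frequency axis in Hz, from 0 up to 1/(2T)

# Keep only components whose magnitude reaches half of the largest peak.
keep = np.abs(S) >= 0.5 * np.abs(S).max()
s_clean = np.fft.irfft(np.where(keep, S, 0), n=N)

peak_bin = int(np.argmax(np.abs(S)))
print("dominant frequency [Hz]:", f[peak_bin])
```

Passing n=N to irfft matters here because N is odd; without it the reconstructed signal would come back one sample short.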

extracting phase information using numpy fft

I am trying to use a fast Fourier transform to extract the phase shift of a single sinusoidal function. I know that on paper, if we denote the transform of our function as T, then we have the following relations:
However, I am finding that while I am able to accurately capture the frequency of my cosine wave, the phase is inaccurate unless I sample at an extremely high rate. For example:
import numpy as np
import pylab as pl
num_t = 100000
t = np.linspace(0,1,num_t)
dt = 1.0/num_t
w = 2.0*np.pi*30.0
phase = np.pi/2.0
amp = np.fft.rfft(np.cos(w*t+phase))
freqs = np.fft.rfftfreq(t.shape[-1],dt)
print(np.arctan2(amp.imag, amp.real)[30])
pl.subplot(211)
pl.plot(freqs[:60],np.sqrt(amp.real**2+amp.imag**2)[:60])
pl.subplot(212)
pl.plot(freqs[:60],(np.arctan2(amp.imag,amp.real))[:60])
pl.show()
Using num_t=100000 points I get a phase of 1.57173880459.
Using num_t=10000 points I get a phase of 1.58022110476.
Using num_t=1000 points I get a phase of 1.6650441064.
What's going wrong? Even with 1000 points I have 33 points per cycle, which should be enough to resolve it. Is there maybe a way to increase the number of computed frequency points? Is there any way to do this with a "low" number of points?
EDIT: from further experimentation it seems that I need ~1000 points per cycle in order to accurately extract a phase. Why?!
EDIT 2: further experiments indicate that accuracy is related to number of points per cycle, rather than absolute numbers. Increasing the number of sampled points per cycle makes phase more accurate, but if both signal frequency and number of sampled points are increased by the same factor, the accuracy stays the same.

Your points are not distributed equally over the interval: the point at the end is doubled, since 0 is the same point as 1 for a periodic signal. This matters less the more points you take, obviously, but it still gives some error. You can avoid it entirely: linspace has a flag for this. It also has a flag to return dt directly along with the array.
Do
t, dt = np.linspace(0, 1, num_t, endpoint=False, retstep=True)
instead of
t = np.linspace(0,1,num_t)
dt = 1.0/num_t
then it works :)
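To check the effect of the flag, here is a minimal sketch that repeats the question's measurement with only 1000 points, once with and once without the endpoint. The helper name phase_at_bin is an assumption; the signal and the bin index are taken from the question.

```python
import numpy as np

def phase_at_bin(num_t, endpoint, bin_index=30):
    """Phase of the 30 Hz bin for the question's test signal (illustrative)."""
    t = np.linspace(0, 1, num_t, endpoint=endpoint)
    w = 2.0 * np.pi * 30.0
    amp = np.fft.rfft(np.cos(w * t + np.pi / 2.0))
    return np.angle(amp[bin_index])

phase_original = phase_at_bin(1000, endpoint=True)   # sampling as in the question
phase_fixed = phase_at_bin(1000, endpoint=False)     # sampling as suggested above
print("with endpoint:   ", phase_original)
print("without endpoint:", phase_fixed)
print("expected:        ", np.pi / 2.0)
```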

The phase value in the result bin of an unrotated FFT is only correct if the input signal is exactly integer periodic within the FFT length. Your test signal is not, thus the FFT measures something partially related to the phase difference of the signal discontinuity between end-points of the test sinusoid. A higher sample rate will create a slightly different last end-point from the sinusoid, and thus a possibly smaller discontinuity.
If you want to decrease this FFT phase measurement error, create your test signal so that your test phase is referenced to the exact center (sample N/2) of the test vector (not the 1st sample), and then do an fftshift operation (rotate by N/2) so that there is no signal discontinuity between the 1st and last points of the resulting FFT input vector of length N.
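A minimal sketch of this center-referenced approach, under assumed values: a sinusoid with 30.3 cycles in the window (deliberately not an integer), a true phase of 0.7 rad, and N = 1000. It compares the phase read from the nearest bin when the phase is referenced to the first sample against the case where it is referenced to the center sample and the vector is rotated by N/2.

```python
import numpy as np

N = 1000
cycles = 30.3       # assumed: not an integer number of periods in the window
true_phase = 0.7    # assumed test phase in radians
n = np.arange(N)

# Phase referenced to the first sample, no rotation.
x_first = np.cos(2 * np.pi * cycles * n / N + true_phase)
phase_first = np.angle(np.fft.rfft(x_first)[30])

# Phase referenced to the center sample N/2, then rotated by N/2 so that
# the center sample lands at index 0 of the FFT input.
x_center = np.cos(2 * np.pi * cycles * (n - N // 2) / N + true_phase)
phase_center = np.angle(np.fft.rfft(np.fft.fftshift(x_center))[30])

err_first = abs(phase_first - true_phase)
err_center = abs(phase_center - true_phase)
print("error, first-sample reference:", err_first)
print("error, center reference:      ", err_center)
```

Note that the two cases define the phase at different reference points, so this only shows how sensitive each reading is to a non-integer number of periods.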

This snippet of code might help:
import numpy as np
from scipy.fft import rfft, irfft

THRESHOLD = 0.01  # fraction of the largest FFT amplitude to retain

def reconstruct_ifft(data):
    """
    Take in a signal, find its FFT, retain the dominant modes,
    and reconstruct the signal from them.

    Parameters
    ----------
    data : signal to run through the FFT and inverse FFT

    Returns
    -------
    reconstructed_signal : the reconstructed signal
    """
    N = data.size
    yf = rfft(data)
    amp_yf = np.abs(yf)  # amplitude of each mode
    # zero every mode whose amplitude is below THRESHOLD * (largest amplitude)
    yf = yf * (amp_yf > (THRESHOLD * np.amax(amp_yf)))
    reconstructed_signal = irfft(yf, n=N)  # n=N keeps odd-length signals the same length
    return reconstructed_signal
THRESHOLD (0.01 here) is the fraction of the largest FFT amplitude above which modes are retained. Making THRESHOLD greater (more than 1 does not make any sense) gives fewer modes and a higher RMS error but ensures higher frequency selectivity.

Python - modelling noise in electrical systems

This may open a can of worms or will be very easily answered:
I'm building a model of a system within Python: how do I quantitatively add noise? So far I have this (below code) -
i. Can I do this by broadcasting, even for unique noise added to each sample?
and
ii. Should noise be Gaussian or Uniform for electrical signal modelling?
(Gaussian I think though I'm unsure)
import random
import numpy as np
import matplotlib.pyplot as plt
f = 1e6
T = 1/f
pi = np.pi
t = np.arange(0,20e-6,10e-9)
# create signal and normalise
y = np.sin(2*pi*f*t)
y /= max(y)
# add noise
for i in range(0, len(y)):
    noise = random.uniform(-1, 1) / 10  # 10% noise added
    y[i] += noise
plt.figure(1)
plt.plot(t*1e6,y,'r-')
plt.grid()
plt.show()

Judging by the signal you've generated, it looks like you're going for volts vs. time, in which case you want to add Gaussian noise.
You can generate Gaussian noise by exploiting the central limit theorem. Simply generate a bunch of random numbers (the distribution doesn't matter much, as long as it has finite variance), add them together, and store the result. Repeat that len(y) times and the list of results will be random but approximately Gaussian distributed. Then just add that list to your y signal. But there's probably a predefined routine to give you Gaussian noise in the first place.
As for doing it in a more Pythonic way, I expect numpy has a vector add routine.
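For what it's worth, NumPy can draw the Gaussian noise directly and add it to the whole array in one step. The sketch below assumes a noise standard deviation of 0.1 (roughly the "10%" level from the question) and a fixed seed for repeatability; both are illustrative choices, not recommendations.

```python
import numpy as np

f = 1e6
t = np.arange(0, 20e-6, 10e-9)
y = np.sin(2 * np.pi * f * t)

rng = np.random.default_rng(0)   # seeded generator so the run is repeatable
# One independent Gaussian draw per sample; sigma = 0.1 is an assumed level.
noise = rng.normal(loc=0.0, scale=0.1, size=y.shape)
y_noisy = y + noise              # element-wise add, no Python loop needed

print(y_noisy.shape, noise.std())
```

Because size=y.shape is passed, every sample gets its own noise value, which addresses the question about adding unique noise to each sample without a loop.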
