This may open a can of worms, or it may be very easily answered:
I'm building a model of a system in Python: how do I quantitatively add noise? So far I have this (code below), and two questions:
i. Can I do this by broadcasting, even with unique noise added to each sample?
and
ii. Should the noise be Gaussian or uniform for electrical signal modelling?
(Gaussian, I think, though I'm unsure.)
import random
import numpy as np
import matplotlib.pyplot as plt

f = 1e6
T = 1/f
pi = np.pi
t = np.arange(0, 20e-6, 10e-9)

# create signal and normalise
y = np.sin(2*pi*f*t)
y /= max(y)

# add noise
for i in range(0, len(y)):
    noise = random.uniform(-1, 1) / 10  # 10% noise added
    y[i] += noise

plt.figure(1)
plt.plot(t*1e6, y, 'r-')
plt.grid()
plt.show()
Judging by the signal you've generated, it looks like you're going for volts vs. time, in which case you want to add Gaussian noise.
You can generate Gaussian noise by exploiting the central limit theorem: generate a bunch of random numbers (the distribution doesn't matter), add them together, and store the result. Repeat that len(y) times and the list of results will be random but approximately Gaussian distributed. Then just add that list to your y signal. That said, there's almost certainly a predefined routine that gives you Gaussian noise directly.
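For illustration, a minimal sketch of that central-limit trick, reusing y from the question (the choice of 12 uniform draws per sample is arbitrary):

import numpy as np

# each row sums 12 uniform draws; the row sums are approximately
# Gaussian (mean 0, std 2 for uniforms on [-1, 1])
clt_noise = np.random.uniform(-1, 1, size=(len(y), 12)).sum(axis=1)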
As for doing it in a more Pythonic way: NumPy arrays add elementwise, so the whole noise vector can be generated and added in one step, with no explicit loop.
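A minimal sketch of that, again reusing y from the question (np.random.normal is NumPy's built-in Gaussian generator; scale=0.1 mirrors the original 10% factor):

import numpy as np

# one independent Gaussian sample per element of y; adding two arrays
# of the same shape is an elementwise (broadcast) operation
noise = np.random.normal(loc=0.0, scale=0.1, size=y.shape)
y_noisy = y + noise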
I am binning some time series data and need to apply a half-normal filter to the binned data. How can I do this in Python? I've provided a toy example below. I need Xbinned to be smoothed with a half-Gaussian filter with a std of 0.25 (or whatever). I'm pretty sure the half-Gaussian should face the forward time direction.
import numpy as np

X = np.random.randint(2, size=100)  # example random process
bin_size = 5
Xbinned = []
for i in range(0, len(X)+1, bin_size):
    Xbinned.append(sum(X[i:i+(bin_size-1)])/bin_size)
How to implement half-gaussian filtering
Scipy has a function called scipy.ndimage.gaussian_filter(). It nearly implements what we want here. Unfortunately, there's no option to use a half-gaussian instead of a gaussian. However, scipy is open-source, so we can just take the source code and modify it to be a half-gaussian.
I used this source code, and removed all of the parts that are not needed for this particular case. At the end, I had this:
import numpy as np
import scipy.ndimage

def halfgaussian_kernel1d(sigma, radius):
    """
    Computes a 1-D Half-Gaussian convolution kernel.
    """
    sigma2 = sigma * sigma
    x = np.arange(0, radius+1)
    phi_x = np.exp(-0.5 / sigma2 * x ** 2)
    phi_x = phi_x / phi_x.sum()
    return phi_x

def halfgaussian_filter1d(input, sigma, axis=-1, output=None,
                          mode="constant", cval=0.0, truncate=4.0):
    """
    Convolves a 1-D Half-Gaussian convolution kernel.
    """
    sd = float(sigma)
    # make the radius of the filter equal to truncate standard deviations
    lw = int(truncate * sd + 0.5)
    weights = halfgaussian_kernel1d(sigma, lw)
    origin = -lw // 2
    return scipy.ndimage.convolve1d(input, weights, axis, output, mode, cval, origin)
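For instance, applied to the binned signal from the question (sigma=1.0 is an arbitrary choice here; see the note on choosing it below):

# smooth the binned signal with the half-Gaussian filter defined above
Xsmoothed = halfgaussian_filter1d(np.asarray(Xbinned, dtype=float), sigma=1.0)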
A short summary of how this works:
First, it generates a convolution kernel. It uses the formula e^(-1/2 * (x/sigma)^2) to generate the gaussian distribution. It keeps going until you're 4 standard deviations away from the center.
Next, it convolves that kernel against your signal. It adjusts the kernel to start at the current timestep instead of being centered on the current timestep.
Trying this on your signal, I get a result like this:
array([0.59979879, 0.6 , 0.40006707, 0.59993293, 0.79993293,
0.40013414, 0.20006707, 0.59986586, 0.40006707, 0.4 ,
0.99979879, 0.00033535, 0.59979879, 0.40006707, 0.00013414,
0.59979879, 0.20013414, 0.00006707, 0.19993293, 0.59986586])
Choice of standard deviation
If you pick a standard deviation of 0.25, that is going to have almost no effect on your signal. Here are the convolution weights it uses: [0.99966465 0.00033535]. In other words, this has less than a 0.1% effect on the signal.
I'd recommend using a larger sigma value.
Off by one error
Also, I want to point out the off-by-one error here:
for i in range(0, len(X)+1, bin_size):
    Xbinned.append(sum(X[i:i+(bin_size-1)])/bin_size)
Python slicing excludes the end index, so a slice from i to i+(bin_size-1) actually captures bin_size-1 = 4 elements, not 5.
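A quick check makes the miscount concrete:

import numpy as np

X = np.arange(10)
print(len(X[0:0 + (5 - 1)]))  # 4: one short of the intended bin_size of 5
print(len(X[0:0 + 5]))        # 5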
To fix this, you can change it to this:
for i in range(0, len(X), bin_size):
    Xbinned.append(X[i:i+bin_size].mean())
(Also, I fixed the off-by-one error in the loop specification, changing len(X)+1 to len(X) so no empty trailing bin is produced, and used a NumPy shortcut for finding the mean.)
I have already tried histogram equalization on images and it works just fine.
But now I want to implement the same approach on audio frequencies instead of image grayscale levels, which means I would like to make the spectrum flatter. The sampling rate I use is 44.1 kHz and I want the energy spread evenly across the 0-22050 Hz range, while the main peak stays the highest.
Here is the spectrum, and what I have tried (plots omitted):
I think the original histogram I plot is already wrong: I can't count the number of occurrences per frequency, or maybe I shouldn't do this at all. Somebody told me I need to use fft() but I have no idea how to do it.
Any help would be appreciated! Thanks
Here is the code for how I plot the spectrum:
import librosa
import numpy as np
import matplotlib.pyplot as plt

file = 'example.wav'
y, sr = librosa.load(file, sr=None)

n_fft = 2048
S = librosa.stft(y, n_fft=n_fft, hop_length=n_fft//2)
S = np.abs(S)
D_AVG = np.mean(S, axis=1)

plt.figure(figsize=(25, 12))
plt.bar(np.arange(D_AVG.shape[0]), D_AVG)
x_ticks_positions = [n for n in range(0, n_fft // 2, n_fft // 16)]
x_ticks_labels = [str(sr / n_fft * n) + 'Hz' for n in x_ticks_positions]
plt.xticks(x_ticks_positions, x_ticks_labels)
plt.xlabel('Frequency')
plt.ylabel('Magnitude')  # linear magnitude, not dB
plt.savefig('spectrum.png')
"Equalization" in the sense of making a flat frequency spectrum is usually done by a whitening transformation. This post on dsp.stackexchange might also be helpful. As Mark mentioned, this spectral equalization is different from histogram equalization in image processing.
Equalizing/whitening the spectrum of a signal:
Estimate the PSD. Given an array of samples x with sample rate fs, you can compute a robust estimate of the power spectral density (PSD) with scipy.signal.welch:
f, psd = scipy.signal.welch(x, fs=fs)
This function performs Welch's method. Basically, it divides up the signal into several segments, does FFT on each one, and averages the power spectra to get a good estimate of how much power x has at each frequency on average. The point of all this is it gets a more reliable frequency characterization than just taking one FFT of x as a whole.
Compute equalizer gain. Use eq_gain = 1 / (1e-6 + psd)**0.5, or something similar, to determine the gain of the equalizer. The 1e-6 offset in the denominator avoids division by zero. It often happens that the PSD is extremely small at some frequencies because, say, x went through an anti-aliasing filter that made some high-frequency power nearly zero.
Apply the equalizer gain. Finally, eq_gain needs to be applied to the signal x to equalize it. There are many ways this could be done, but one way is to use scipy.signal.firwin2 to turn the gains into an FIR filter,
eq_filter = scipy.signal.firwin2(99, f, eq_gain, fs=fs)
and use a convolution or scipy.signal.lfilter to apply the filter to x. You can then use scipy.signal.welch again to check that the PSD is flatter than before.
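Putting the three steps together, a minimal sketch (the signal here is a stand-in, and the filter length of 99 and the 1e-6 offset are arbitrary choices):

import numpy as np
import scipy.signal

fs = 44100                    # sample rate assumed from the question
x = np.random.randn(10 * fs)  # stand-in for the real audio signal

# 1. Estimate the PSD with Welch's method.
f, psd = scipy.signal.welch(x, fs=fs)

# 2. Equalizer gain; the small offset avoids division by zero.
eq_gain = 1 / (1e-6 + psd)**0.5

# 3. Turn the gains into an FIR filter and apply it to the signal.
eq_filter = scipy.signal.firwin2(99, f, eq_gain, fs=fs)
x_eq = scipy.signal.lfilter(eq_filter, [1.0], x)

# Sanity check: the PSD of x_eq should now be much flatter.
f2, psd_eq = scipy.signal.welch(x_eq, fs=fs)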
I am looking for a way to obtain the frequency from a signal. Here's an example:
signal = [numpy.sin(numpy.pi * x / 2) for x in range(1000)]
This array represents the samples of a recorded sound (x = milliseconds), so sin(pi*x/2) corresponds to 250 Hz.
How can we go from the signal (a list of points) to obtaining the frequencies from this array?
Note:
I have read many Stack Overflow threads and watched many YouTube videos; I am yet to find an answer. Please use simple words.
(I am thankful for every answer.)
What you're looking for is known as the Fourier transform.
A bit of background
Let's start with the formal definition:
The Fourier transform (FT) decomposes a function (often a function of time, or a signal) into its constituent frequencies
This is, in essence, a mathematical operation that, when applied to a signal, tells you how present each frequency is in the time series. To get some intuition behind this, it might be helpful to look at the mathematical definition of the DFT:
$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2 \pi k n / N}$
where k is swept all the way up to N-1 to calculate all the DFT coefficients.
The first thing to notice is that this definition resembles the correlation of two functions, in this case x(n) and a complex exponential. While this may seem a little abstract, by using Euler's formula ($e^{-j\theta} = \cos\theta - j\sin\theta$) and playing around with the definition a bit, the DFT can be expressed as the correlation of x(n) with both a sine wave and a cosine wave, which account for the imaginary and real parts of the DFT respectively.
So, keeping in mind that this is in essence computing a correlation: whenever a sine or cosine from the decomposition of the complex exponential matches the content of x(n), there will be a peak in X(k), meaning that such a frequency is present in the signal.
How can we do the same with numpy?
So, having given a very brief theoretical background, let's consider an example to see how this can be implemented in Python. Let's consider the following signal:
import numpy as np
import matplotlib.pyplot as plt

Fs = 150.0   # sampling rate
Ts = 1.0/Fs  # sampling interval
t = np.arange(0, 1, Ts)  # time vector
ff = 50      # frequency of the signal
y = np.sin(2*np.pi*ff*t)

plt.plot(t, y)
plt.xlabel('Time')
plt.ylabel('Amplitude')
plt.show()
Now, the DFT can be computed by using np.fft.fft, which, as mentioned, tells you the contribution of each frequency in the signal, now in the transformed domain:
n = len(y) # length of the signal
k = np.arange(n)
T = n/Fs
frq = k/T # two sides frequency range
frq = frq[:len(frq)//2] # one side frequency range
Y = np.fft.fft(y)/n # dft and normalization
Y = Y[:n//2]
Now, if we plot the actual spectrum, you will see that we get a peak at the frequency of 50 Hz, which in mathematical terms is a delta function centred at the fundamental frequency of 50 Hz. This can be checked in this Table of Fourier Transform Pairs.
So for the above signal, we would get:
plt.plot(frq,abs(Y)) # plotting the spectrum
plt.xlabel('Freq (Hz)')
plt.ylabel('|Y(freq)|')
plt.show()
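And if you want the dominant frequency as a number rather than reading it off the plot, the peak of the one-sided spectrum (using frq and Y as defined above) gives it directly:

# index of the largest magnitude bin maps back to its frequency (~50 Hz here)
dominant_freq = frq[np.argmax(np.abs(Y))]
print(dominant_freq)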
I am trying to smooth a noisy one-dimensional physical signal, y, while retaining correspondence between the signal's amplitude and its units. I'm applying a Gaussian kernel and normalizing the Gaussian itself based on its own integral.
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import quad

# Gaussian kernel (not normalized here)
def gaussian(x, sigma):
    return np.exp(-(x/sigma)**2/2)

# convolution
def smooth(y, box_pts):
    x = np.linspace(-box_pts/2., box_pts/2., box_pts + 1)  # Gaussian centred on 0
    std_norm = 3.  # 3. is an arbitrary value for normalizing the sigma
    sigma = box_pts/std_norm
    integral = quad(gaussian, x[0], x[-1], args=(sigma,))[0]
    box = gaussian(x, sigma)/integral
    y_smooth = np.convolve(y, box, mode='same')
    return y_smooth

box_size = 10
length = 100
y = np.random.randn(length)
y_smooth = smooth(y, box_size)

plt.plot(y)
plt.show()
plt.plot(y_smooth)
plt.show()
In this example, y is the signal I want to smooth and box_pts is the width of the region over which the Gaussian kernel is applied as it is translated. To normalize my convolution, I've simply divided the Gaussian by its own integral. Since the Gaussian has a certain width (given by box_pts) and is effectively zero outside this interval compared to the physical axis of y, I am normalizing the Gaussian over the region where it is non-zero rather than from $-\infty$ to $\infty$. This method appears simple enough and the normalization appears to work, but is dividing the kernel by its own integral the "right" or suggested approach when normalizing convolutions/kernels?
Based on a simple example, it appears to keep the integral about the same, but the amplitudes are no longer consistent, as visualized below, which is a bit problematic since I want to retain a smoothed signal while removing the noise:
(plots of the original and smoothed signals omitted)
I get improved amplitudes when I decrease box_size relative to length, but this keeps the signal noisy. Decreasing std_norm also appears to reduce noise, but it influences the amplitude. My question boils down to how one chooses optimal std_norm and box_size for a given noisy signal y of size length, such that the noise is reduced while the output stays physically sound (i.e. the smoothed signal's amplitude is reasonably consistent with the original and the integral under the smoothed curve is fairly similar, too). I have applied a Gaussian kernel here, but I am interested in a general approach that works for any arbitrary kernel.
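For what it's worth, a common discrete convention is to normalize the sampled kernel by its sum rather than by a continuous integral, which makes the weights add up to exactly 1 at any sampling (so a constant signal passes through unchanged). A minimal sketch of that variant inside smooth:

# discrete alternative: divide by the sum of the sampled weights so
# the kernel sums to exactly 1
box = gaussian(x, sigma)
box /= box.sum()
y_smooth = np.convolve(y, box, mode='same')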
I am trying to convolve a square wave with itself more than once and see the resulting graph. I know how to do convolution by hand, but I am not experienced in signal processing with Python. So my questions are:
How can I represent a signal in Python? For example:
x(t) = 1 , 0 ≤ t ≤ 1
x(t) = 0, otherwise
How can I convolve this square wave with itself?
What I have figured out so far is that I should use NumPy's built-in convolve method; but the problem is that I am stuck on representing this square wave.
One way to create a suitable 0-1 array is np.fromfunction, passing it a function that returns True within the wave. Converting to float results in a 0-1 array.
For this illustration it's better to position the wave in the middle of the array, avoiding boundary effects associated with convolution. Using mode='same' allows for all curves to be plotted together. Also, don't forget to divide the output of convolve by sample_rate, otherwise it will grow proportionally to it with each convolution.
import numpy as np
import matplotlib.pyplot as plt

sample_rate = 100
num_samples = 500

# 1 for samples between 2*sample_rate and 3*sample_rate (a unit-width
# square wave in the middle of the array), 0 elsewhere
wave = np.fromfunction(lambda i: (2*sample_rate < i) & (i < 3*sample_rate),
                       (num_samples,)).astype(float)

wave1 = np.convolve(wave, wave, mode='same')/sample_rate
wave2 = np.convolve(wave1, wave, mode='same')/sample_rate
wave3 = np.convolve(wave2, wave, mode='same')/sample_rate

plt.plot(np.stack((wave, wave1, wave2, wave3), axis=1))
plt.show()
Mathematically, these are known as cardinal B-splines and as the densities of the Irwin–Hall distribution.