Erasing noise from FFT chart - Python

Does anyone know how to remove this much noise from the FFT?
Here is my FFT code:
import numpy as np
import matplotlib.pyplot as plt

# Bx and By are the measured signal components (loaded elsewhere)
fft1 = Bx[51:-14]
fft2 = By[1:-14]

# Loop for FFT data
for dataset in [fft1]:
    dataset = np.asarray(dataset)
    psd = np.abs(np.fft.fft(dataset))**2
    freq = np.fft.fftfreq(dataset.size, float(300)/dataset.size)
    plt.semilogy(freq[freq > 0], psd[freq > 0]/dataset.size**2, color='r')

for dataset2 in [fft2]:
    dataset2 = np.asarray(dataset2)
    psd2 = np.abs(np.fft.fft(dataset2))**2
    freq2 = np.fft.fftfreq(dataset2.size, float(300)/dataset2.size)
    plt.semilogy(freq2[freq2 > 0], psd2[freq2 > 0]/dataset2.size**2, color='b')
What I get:
What I need:
Any ideas? Welch does not work; as you can see, I don't want to smooth my chart, but rather cut the noise down to the level shown in the second picture.
This is what Welch does:
and a bit of code:
freqs, psd = scipy.signal.welch(dataset, fs=300, window='hamming')
Updated Welch:
A bit of code:
from scipy.signal import welch

# Loop for FFT data
for dataset in [fft1]:
    dataset = np.asarray(dataset)
    freqs, psd = welch(dataset, fs=266336/300, window='hamming', nperseg=512)
    plt.semilogy(freqs, psd/dataset.size**2, color='r')

for dataset2 in [fft2]:
    dataset2 = np.asarray(dataset2)
    freqs2, psd2 = welch(dataset2, fs=266336/300, window='hamming', nperseg=512)
    plt.semilogy(freqs2, psd2/dataset2.size**2, color='b')
As you can see, Welch is well configured: it shows the 60 Hz power line and its harmonic modes. It is almost right, but it has completely smoothed my plot. See graph two, which is the desired result. By the way, the y scale is wrong on the Welch plot, but that is just a matter of the data being raised to the power of two.
I have changed to nperseg=8192 and it worked. Look at the results.
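For reference, a minimal sketch of that change (assuming the same fft1/fft2 slices and sampling rate as above):

from scipy.signal import welch
import matplotlib.pyplot as plt
import numpy as np

# Larger nperseg -> finer frequency resolution and less averaging (less smoothing)
for dataset, color in [(fft1, 'r'), (fft2, 'b')]:
    dataset = np.asarray(dataset)
    freqs, psd = welch(dataset, fs=266336/300, window='hamming', nperseg=8192)
    plt.semilogy(freqs, psd/dataset.size**2, color=color)
plt.xlabel('Frequency [Hz]')
plt.ylabel('PSD')
plt.show()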

Here is an example that shows how to use nperseg to control the frequency resolution vs. noise reduction tradeoff:
Setting nperseg to the length of the signal is more or less equivalent to using the FFT without any averaging.
Here is the code to generate this image:
import numpy as np
from scipy import signal
import matplotlib.pyplot as plt
plt.figure(figsize=[8, 12])
n = 2**21
fs = 887
# example data
x = np.random.randn(n)
x += np.sin(np.cumsum(0.42 + np.random.randn(n) * 0.01)) * 5
x = signal.lfilter([1, 0.5], 2, x)
plt.subplot(3, 2, 1)
plt.semilogy(np.abs(np.fft.fft(x)[:n//2])**2 / n**2, label='FFT')
plt.legend(loc='best')
for i, nperseg in enumerate([128, 512, 8192, 65536, n]):
    plt.subplot(3, 2, i+2)
    f, psd = signal.welch(x, fs=fs, window='hamming', nperseg=nperseg, noverlap=0)
    plt.semilogy(f, psd, label='nperseg={}'.format(nperseg))
    plt.legend(loc='best')
plt.show()

Related

How to set scale and parameter w in wavelet transform (scipy)?

I am trying to do a wavelet transform on a signal.
The signal is recorded at 500 Hz (i.e., 500 data points per second). I just want to do a wavelet transform on 0.2 seconds of the signal (-> 100 data points).
I am really not sure how to define the parameter "w", and also whether the y-axis is scaled correctly. I cannot find anything about "w" in the scipy documentation.
My code and output looks as follows:
import matplotlib.pyplot as plt
import numpy as np
from scipy import signal
t, dt = np.linspace(0, 0.2, 100, retstep=True)
fs = 1/dt
w = 6.
# sig: the recorded signal sampled at 500 Hz (loaded elsewhere)
sig = sig[0:100]  # 0.2 seconds of the signal
freq = np.linspace(1, fs/2, 100)
widths = w*fs / (2*freq*np.pi)
cwtm = signal.cwt(sig, signal.morlet2, widths, w=w)
plt.pcolormesh(t, freq, np.abs(cwtm), cmap='jet', shading='gouraud')
plt.xlabel("Time")
plt.ylabel("Scale")
plt.show()
Output:
Bonus question: How would you interpret the output in two sentences?
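One way to sanity-check the frequency axis and the role of w is to run the same transform on a synthetic tone of known frequency; a minimal sketch, assuming a hypothetical 50 Hz test tone in place of the recorded signal:

import matplotlib.pyplot as plt
import numpy as np
from scipy import signal

fs = 500.0                              # sampling rate from the question
t = np.arange(0, 0.2, 1/fs)             # 0.2 s -> 100 samples
w = 6.
sig_test = np.sin(2 * np.pi * 50 * t)   # hypothetical 50 Hz test tone

freq = np.linspace(1, fs/2, 100)
widths = w * fs / (2 * freq * np.pi)    # same scale-to-frequency mapping as above

cwtm = signal.cwt(sig_test, signal.morlet2, widths, w=w)

# If the mapping is right, the magnitude should peak along the 50 Hz row
plt.pcolormesh(t, freq, np.abs(cwtm), cmap='jet', shading='gouraud')
plt.xlabel("Time [s]")
plt.ylabel("Frequency [Hz]")
plt.show()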

How to stack multiple histograms in a single figure in Python?

I have a numpy array with shape [30, 10000], where the first axis is the time step, and the second contains the values observed for a series of 10000 variables. I would like to visualize the data in a single figure, similar to this:
that you can find in the seaborn tutorial here. Basically, what I would like is to draw a histogram of 30-40 bins for each of the 30 time steps, and then - somehow - concatenate these histograms so they share a common axis, and plot them in the same figure.
My data look like a gaussian that moves and gets wider in time. You can reproduce something similar using the following code:
import numpy as np

mean = 0.0
std = 1.0
data = []
for t in range(30):
    mean = mean + 0.01
    std = std + 0.1
    data.append(np.random.normal(loc=mean, scale=std, size=[10000]))
data = np.array(data)
A figure similar to the picture showed above would be the best, but any help is appreciated!
Thank you,
G.
Use a histogram? You could do this with np.histogram2d, but this way is a little clearer...
import matplotlib.pyplot as plt
import numpy as np
data = np.random.randn(30, 10000)
H = np.zeros((30, 40))
bins = np.linspace(-3, 3, 41)
for i in range(30):
    H[i, :], _ = np.histogram(data[i, :], bins)
fig, ax = plt.subplots()
times = np.arange(30) * 0.1
pc = ax.pcolormesh(bins, times, H)
ax.set_xlabel('data bins')
ax.set_ylabel('time [s]')
fig.colorbar(pc, label='count')
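The same approach applied to the moving, widening Gaussian generated in the question, with bin edges derived from the data instead of fixed at [-3, 3], would look roughly like this (a sketch, assuming the data array built above):

import matplotlib.pyplot as plt
import numpy as np

# data: shape (30, 10000), one row per time step (as generated in the question)
bins = np.linspace(data.min(), data.max(), 41)                 # 40 common bins
H = np.array([np.histogram(row, bins)[0] for row in data])     # one histogram per step

fig, ax = plt.subplots()
pc = ax.pcolormesh(bins, np.arange(data.shape[0] + 1), H)
ax.set_xlabel('data bins')
ax.set_ylabel('time step')
fig.colorbar(pc, label='count')
plt.show()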

Plotting numpy rfft

I am trying to plot the FFT of a wav file. I have successfully done it using the regular fft, but I wanted to experiment with rfft since my application is in music. When I try to plot xf and yf (figure 2), I run into an issue where xf is half the length of yf, and I can't figure out why. I assume it's due to the missing negative frequencies, but I thought changing both function calls to rfft and rfftfreq would handle that.
import numpy as np
import soundfile as sf
import matplotlib.pyplot as plt
square = 'square.wav'
sine = 'sine.wav'
k = '1Khz.wav'
cello = 'cello.wav'
data, fs = sf.read(k)
#Plot the Signal
N = len(data)
T = 1.0/fs
x = np.linspace(0, (N*T), N)
plt.plot(x, data)
plt.grid()
count = 0
yf = np.fft.rfft(data)
xf = np.fft.rfftfreq(yf.size, d=T)
plt.figure(2)
plt.plot(xf, yf)
plt.show()
The sizes used for numpy.fft.rfft and numpy.fft.rfftfreq need to match. As such, you should use data.size rather than yf.size (since the size of yf is already reduced by not including the negative frequencies) as the argument to rfftfreq:
yf = np.fft.rfft(data)
xf = np.fft.rfftfreq(data.size, d=T)
Finally, note that when you plot yf with plt.plot(xf, yf) you will get a warning about the imaginary part being discarded. If you are interested in plotting the magnitude of the frequency spectrum, you should rather use plt.plot(xf, abs(yf)).
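Putting both fixes together, the relevant part of the script would look roughly like this (a sketch, assuming data and T are defined as in the question):

import numpy as np
import matplotlib.pyplot as plt

# data: samples read from the wav file, T = 1.0/fs (as in the question)
yf = np.fft.rfft(data)
xf = np.fft.rfftfreq(data.size, d=T)   # use the length of the input, not of yf

plt.figure(2)
plt.plot(xf, np.abs(yf))               # plot the magnitude spectrum
plt.show()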
You need to scale the frequencies to your sample rate (pass d=1/sample_rate). See https://stackoverflow.com/a/27191172/7919597 or the docs of rfftfreq:
signal = np.array([-2, 8, 6, 4, 1, 0, 3, 5, -3, 4], dtype=float)
fourier = np.fft.rfft(signal)
n = signal.size
sample_rate = 100
freq = np.fft.fftfreq(n, d=1./sample_rate)
print(freq)
freq = np.fft.rfftfreq(n, d=1./sample_rate)
print(freq)

How to use Python to draw a normal probability plot by using certain column data in dataFrame

I have a DataFrame that contains two columns named "thousands of dollars per year" and "EMPLOY".
I create a new variable in this data frame named "cubic_Root" by computing it from the data in df['thousands of dollars per year']:
df['cubic_Root'] = -1 / df['thousands of dollars per year'] ** (1. / 3)
The data in df['cubic_Root'] look like this:
ID cubic_Root
1 -0.629961
2 -0.405480
3 -0.329317
4 -0.480750
5 -0.305711
6 -0.449644
7 -0.449644
8 -0.480750
Now, how can I draw a normal probability plot using the data in df['cubic_Root']?
You want the "Probability" Plots.
So for a single plot, you'd have something like below.
import scipy.stats
import numpy as np
import matplotlib.pyplot as plt
# 100 values from a normal distribution with a std of 3 and a mean of 0.5
data = 3.0 * np.random.randn(100) + 0.5
counts, start, dx, _ = scipy.stats.cumfreq(data, numbins=20)
x = np.arange(counts.size) * dx + start
plt.plot(x, counts, 'ro')
plt.xlabel('Value')
plt.ylabel('Cumulative Frequency')
plt.show()
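Applied to the question's data, the same idea would look roughly like this (a sketch, assuming df['cubic_Root'] exists as above):

import scipy.stats
import numpy as np
import matplotlib.pyplot as plt

# df['cubic_Root'] is the transformed column from the question
values = df['cubic_Root'].to_numpy()

counts, start, dx, _ = scipy.stats.cumfreq(values, numbins=20)
x = np.arange(counts.size) * dx + start

plt.plot(x, counts, 'ro')
plt.xlabel('cubic_Root')
plt.ylabel('Cumulative Frequency')
plt.show()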
If you want to plot a distribution, and you know it, define it as a function and plot it like so:
import numpy as np
from matplotlib import pyplot as plt
def my_dist(x):
    return np.exp(-x ** 2)
x = np.arange(-100, 100)
p = my_dist(x)
plt.plot(x, p)
plt.show()
If you don't have the exact distribution as an analytical function, perhaps you can generate a large sample, take a histogram and somehow smooth the data:
import numpy as np
from scipy.interpolate import UnivariateSpline
from matplotlib import pyplot as plt
N = 1000
n = N // 10                       # number of bins (must be an integer)
s = np.random.normal(size=N)      # generate your data sample with N elements
p, x = np.histogram(s, bins=n)    # bin it into n = N//10 bins
x = x[:-1] + (x[1] - x[0])/2 # convert bin edges to centers
f = UnivariateSpline(x, p, s=n)
plt.plot(x, f(x))
plt.show()
You can increase or decrease s (the smoothing factor) within the UnivariateSpline call to increase or decrease the smoothing. For example, with two different values of s you get:
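A minimal sketch of that comparison (the two s values here are just illustrative choices):

import numpy as np
from scipy.interpolate import UnivariateSpline
from matplotlib import pyplot as plt

N = 1000
n = N // 10
sample = np.random.normal(size=N)
p, x = np.histogram(sample, bins=n)
x = x[:-1] + (x[1] - x[0]) / 2          # bin centers

# Larger s -> smoother curve, smaller s -> closer fit to the histogram
for smooth in [n, 10 * n]:
    f = UnivariateSpline(x, p, s=smooth)
    plt.plot(x, f(x), label='s = {}'.format(smooth))
plt.legend(loc='best')
plt.show()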
Probability density Function (PDF) of inter-arrival time of events.
import numpy as np
import scipy.stats
# generate data samples
data = scipy.stats.expon.rvs(loc=0, scale=1, size=1000, random_state=123)
A kernel density estimation can then be obtained by simply calling
scipy.stats.gaussian_kde(data,bw_method=bw)
where bw is an (optional) parameter for the estimation procedure. For this data set, and considering three values for bw the fit is as shown below
# test values for the bw_method option ('None' is the default value)
bw_values = [None, 0.1, 0.01]
# generate a list of kde estimators for each bw
kde = [scipy.stats.gaussian_kde(data,bw_method=bw) for bw in bw_values]
# plot (normalized) histogram of the data
import matplotlib.pyplot as plt
plt.hist(data, 50, density=True, facecolor='green', alpha=0.5)
# plot density estimates
t_range = np.linspace(-2, 8, 200)
for i, bw in enumerate(bw_values):
    plt.plot(t_range, kde[i](t_range), lw=2, label='bw = ' + str(bw))
plt.xlim(-1, 6)
plt.legend(loc='best')
Reference:
Python: Matplotlib - probability plot for several data set
how to plot Probability density Function (PDF) of inter-arrival time of events?

Normal distribution appears too dense when plotted in matplotlib

I am trying to estimate the probability density function of my data. In my case, the data is a satellite image with a shape of 8200 x 8100.
Below I present the code for the PDF (the function 'is_outlier' is borrowed from a post here). As you can see, the PDF in figure 1 is too dense. I guess this is due to the thousands of pixels that the satellite image is composed of. This is very ugly.
My question is, how can I plot a PDF that is not so dense? Something like what is shown in figure 2, for example.
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

lst = 'satellite_img.tif'  # import the image (it must be loaded into a NumPy array first)
lst_flat = lst.flatten()   # create 1D array

# the function below removes the outliers
def is_outlier(points, thres=3.5):
    if len(points.shape) == 1:
        points = points[:, None]
    median = np.median(points, axis=0)
    diff = np.sum((points - median)**2, axis=-1)
    diff = np.sqrt(diff)
    med_abs_deviation = np.median(diff)
    modified_z_score = 0.6745 * diff / med_abs_deviation
    return modified_z_score > thres

lst_flat = np.r_[lst_flat]
lst_flat_filtered = lst_flat[~is_outlier(lst_flat)]
fit = stats.norm.pdf(lst_flat_filtered, np.mean(lst_flat_filtered), np.std(lst_flat_filtered))
plt.plot(lst_flat_filtered, fit)
plt.hist(lst_flat_filtered, bins=30, density=True)
plt.show()
figure 1
figure 2
The issue is that the x values in the PDF plot are not sorted, so the plotted line jumps back and forth between random points, creating the mess you see.
Two options:
Don't plot the line, just plot points (not great if you have lots of points, but will confirm if what I said above is right or not):
plt.plot(lst_flat_filtered, fit, 'bo')
Sort the lst_flat_filtered array before calculating the PDF and plotting it:
lst_flat = np.r_[lst_flat]
lst_flat_filtered = np.sort(lst_flat[~is_outlier(lst_flat)]) # Changed this line
fit = stats.norm.pdf(lst_flat_filtered, np.mean(lst_flat_filtered), np.std(lst_flat_filtered))
plt.plot(lst_flat_filtered, fit)
Here are some minimal examples showing these behaviours:
Reproducing your problem:
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt
lst_flat_filtered = np.random.normal(7, 5, 1000)
fit = stats.norm.pdf(lst_flat_filtered, np.mean(lst_flat_filtered), np.std(lst_flat_filtered))
plt.hist(lst_flat_filtered, bins=30, density=True)
plt.plot(lst_flat_filtered, fit)
plt.show()
Plotting points
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt
lst_flat_filtered = np.random.normal(7, 5, 1000)
fit = stats.norm.pdf(lst_flat_filtered, np.mean(lst_flat_filtered), np.std(lst_flat_filtered))
plt.hist(lst_flat_filtered, bins=30, density=True)
plt.plot(lst_flat_filtered, fit, 'bo')
plt.show()
Sorting the data
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt
lst_flat_filtered = np.sort(np.random.normal(7, 5, 1000))
fit = stats.norm.pdf(lst_flat_filtered, np.mean(lst_flat_filtered), np.std(lst_flat_filtered))
plt.hist(lst_flat_filtered, bins=30, density=True)
plt.plot(lst_flat_filtered, fit)
plt.show()
