Not getting expected results with Numpy FFT

So I have this car that moves at a velocity that is the sum of three different sine waves (whose individual frequencies I know). I used the following to construct this velocity-time graph:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.read_csv('drivingdata.csv') # velocity values
s = df['leadspeed'].values # transform csv col into array
t = np.linspace(0, 1, 5067)
plt.ylabel("Amplitude")
plt.xlabel("Time[s]")
plt.plot(t, s)
plt.show()
This is fine, and then I perform a FFT on this data with the following numpy function:
T = t[1]-t[0] # sample rate
N = s.size
fft = np.fft.fft(s)
f = np.linspace(0, 1//T, N) # 1/T is the frequency
plt.ylabel("Amplitude")
plt.xlabel("Frequency [Hz]")
plt.bar(f[:N // 2], np.abs(fft)[:N // 2] * 1 // N) # 1/N is a normalization factor
plt.show()
Then I get this amplitude vs frequency graph. How do I "zoom-in" so that I can confirm my initial frequencies (all under 0.2) ?
I'm completely new to fft, so criticism/help would be appreciated.
EDIT:
I followed your helpful advice, Cris Luengo, and this is my new graph. The frequencies I input into my waves were 0.033, 0.083, and 0.117, so I'm still left seeking answers.
EDIT 2:
My apologies, Cris. Here you go. Are the frequencies I'm looking for just right past the 0 there? Is there a way to "zoom in" ? New graph
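One possible explanation, sketched here under assumptions: with t = np.linspace(0, 1, 5067) the record is treated as one second long, so the FFT bins are about 1 Hz apart and components at 0.033, 0.083 and 0.117 Hz all land between bin 0 and bin 1. In addition, `1 // T` and `* 1 // N` are floor divisions, which round the results down. The sketch below uses a synthetic signal and a hypothetical sample spacing dt = 0.1 s (the real spacing of drivingdata.csv is not given in the question), builds the frequency axis with np.fft.rfftfreq, and zooms with set_xlim. The helper name plot_zoomed is made up for illustration.

```python
import numpy as np

# Assumptions for illustration only: the real sample spacing of the CSV is unknown.
dt = 0.1                      # hypothetical sample spacing in seconds
N = 5067                      # number of samples, as in the question
t = np.arange(N) * dt         # time axis covering roughly 507 s

# Synthetic stand-in for df['leadspeed']: three sines at the stated frequencies
freqs_in = [0.033, 0.083, 0.117]
s = sum(np.sin(2 * np.pi * f0 * t) for f0 in freqs_in)

spectrum = np.fft.rfft(s)             # one-sided FFT for real input
f = np.fft.rfftfreq(N, d=dt)          # bin frequencies in Hz, spaced 1/(N*dt) apart
amp = 2 * np.abs(spectrum) / N        # true division, approximate sine amplitudes

# Frequencies of the three strongest bins, sorted in ascending order
peak_freqs = np.sort(f[np.argsort(amp)[-3:]])
print(peak_freqs)

def plot_zoomed(f, amp, fmax=0.2):
    """Plot the amplitude spectrum restricted to 0..fmax Hz (hypothetical helper)."""
    import matplotlib.pyplot as plt
    fig, ax = plt.subplots()
    ax.plot(f, amp)
    ax.set_xlim(0, fmax)              # this is the "zoom"
    ax.set_xlabel("Frequency [Hz]")
    ax.set_ylabel("Amplitude")
    plt.show()
```

Calling plot_zoomed(f, amp) shows only the 0 to 0.2 Hz range. The peaks can only be separated if the recording is long enough: the bin spacing is 1/(N*dt), so the total duration has to be many times longer than the period of the slowest component.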

Related

How to correctly plot a linear regression on a log10 scale?

I am plotting two lists of data against each other, namely freq and data. Freq stands for frequency, and data are the numeric observations for each frequency.
In the next step, I apply the ordinary linear least-squared regression between freq and data, using stats.linregress on the logarithmic scale. My aim is applying the linear regression inside the log-log scale, not on the normal scale.
Before doing so, I transform both freq and data into np.log10, since I plan to plot a straight linear regression line on the logarithmic scale, using plt.loglog.
Problem:
The problem is that the regression line, plotted in red color, is plotted far from the actual data, plotted in green color. I assume that there is a problem in combination with plt.loglog in my code, hence the visual distance between the green data and the red regression line. How can I fix this problem, so that the regression line plots on top of the actual data?
Here is my reproducible code:
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
# Data
freq = [0.0102539, 0.0107422, 0.0112305, 0.0117188, 0.012207, 0.0126953,
0.0131836]
data = [4.48575, 4.11893, 3.69591, 3.34766, 3.18452, 3.23554, 3.43357]
# Plot log10 of freq vs. data
plt.loglog(freq, data, c="green")
# Linear regression
log_freq = np.log10(freq)
log_data = np.log10(data)
reg = stats.linregress(log_freq, log_data)
slope = reg[0]
intercept = reg[1]
plt.plot(freq, slope*log_freq + intercept, color="red")
And here is a screenshot of the code’s result:

You can convert your data sets to log base 10 first, then do the linear regression and plot them accordingly.
Note that after the log transformation, the numbers in log_freq will all be negative; therefore the x-axis cannot be log-scaled.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
# Data
freq = np.array([0.0102539, 0.0107422, 0.0112305, 0.0117188, 0.012207, 0.0126953,
                 0.0131836])
data = np.array([4.48575, 4.11893, 3.69591, 3.34766, 3.18452, 3.23554, 3.43357])
# transform data to log base 10
log_freq = np.log10(freq)
log_data = np.log10(data)
# Plot freq vs. data
fig, ax = plt.subplots(figsize=(8, 6))
ax.plot(log_freq, log_data, c="green", label='Original data (log 10 base)')
# Linear regression
reg = stats.linregress(log_freq, log_data)
# Plot fitted freq vs. data
ax.plot(log_freq, reg.slope * log_freq + reg.intercept, color="red",
        label='Fitted line on the original data (log 10 base)')
plt.legend()
plt.tight_layout()
plt.show()
References:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.linregress.html
https://numpy.org/doc/stable/reference/generated/numpy.log10.html#

First of all, I question the necessity of log-log axes, because the ranges of the data, or at least the ranges of the data that you've shown us, are limited on both coordinates.
In the code below, I have
computed the logarithms in base 10 of your arrays,
used the formulas for linear regression but using the logarithms of data to obtain the equation of a straight line:
                 y = a + b·x
in, so to say, the logarithmic space.
Because a straight line in log-space corresponds, in data-space, to a power law, y = pow(10, a)·pow(x, b), I have plotted
the original data, in log-log, and
the power law, also in log-log,
obtaining a straight line in the log-log representation.
import matplotlib.pyplot as plt
from math import log10
freq = [.0102539, .0107422, .0112305, .0117188, .012207, .0126953, .0131836]
data = [4.48575, 4.11893, 3.69591, 3.34766, 3.18452, 3.23554, 3.43357]
n = len(freq)
# the following block of code is the unfolding of the formulas in
# https://mathworld.wolfram.com/LeastSquaresFittingPowerLaw.html
# START ##############################################
lx, ly = [[log10(V) for V in v] for v in (freq, data)]
sum_x = sum(x for x in lx)
sum_y = sum(y for y in ly)
sum_x2 = sum(x**2 for x in lx)
sum_y2 = sum(y**2 for y in ly)
sum_xy = sum(x*y for x, y in zip(lx, ly))
# coefficients of a straight line "y = a + b x" in log-log space
b = (n*sum_xy - sum_x*sum_y)/(n*sum_x2-sum_x**2)
a = (sum_y - b*sum_x)/n
A = pow(10, a)
# END ##############################################
plt.loglog(freq, data)
plt.loglog(freq, [A*pow(x, b) for x in freq])
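The same power-law fit can also be sketched with NumPy, assuming np.polyfit is an acceptable substitute for the hand-written least-squares sums. The key step is the back-transformation: the line is fitted to the logarithms, but what gets drawn on log-log axes is 10**a * freq**b in the original units. The helper name plot_fit is made up for illustration.

```python
import numpy as np

freq = np.array([0.0102539, 0.0107422, 0.0112305, 0.0117188,
                 0.012207, 0.0126953, 0.0131836])
data = np.array([4.48575, 4.11893, 3.69591, 3.34766,
                 3.18452, 3.23554, 3.43357])

# Fit a straight line to the logarithms: log10(data) = a + b * log10(freq)
b, a = np.polyfit(np.log10(freq), np.log10(data), 1)

# Back-transform to data space, where the straight line becomes a power law
fitted = 10**a * freq**b

def plot_fit(freq, data, fitted):
    """Draw data and fitted power law on log-log axes (hypothetical helper)."""
    import matplotlib.pyplot as plt
    plt.loglog(freq, data, c="green", label="data")
    plt.loglog(freq, fitted, c="red", label="power-law fit")
    plt.legend()
    plt.show()
```

Calling plot_fit(freq, data, fitted) should place the red line through the green data, because both curves are passed to loglog in the original (non-logarithmic) units.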

Plot normal distribution over histogram

I am new to Python and, in the following code, I would like to plot a bell curve to show how the data follows a normal distribution. How would I go about it? Also, can anyone explain why, when showing the hist, I have values (x-axis) greater than 100? I would assume that by defining the Randels to 100, it would not show anything above it. If I am not mistaken, the x-axis represents which "floor" I am on and the y-axis represents how many observations matched that floor. By the way, this is a DataCamp project.
"""
Let's say I roll a dice to determine if I go up or down a step in a building with
100 floors (1 step = 1 floor). If the dice is 2 or less, I go down a step. If
the dice is 3, 4 or 5, I go up a step, and if the dice is equal to 6,
I go up x steps based on a random integer generator between 1 and 6. What is the probability
I will be higher than floor 60?
"""
import numpy as np
import matplotlib.pyplot as plt
# Set the seed
np.random.seed(123)
# Simulate random walk
all_walks = []
for i in range(1000) :
    random_walk = [0]
    for x in range(100) :
        step = random_walk[-1]
        dice = np.random.randint(1,7)
        if dice <= 2:
            step = max(0, step - 1)
        elif dice <= 5:
            step = step + 1
        else:
            step = step + np.random.randint(1,7)
        if np.random.rand() <= 0.001 : # There's a 0.1% chance I fall and have to start at 0
            step = 0
        random_walk.append(step)
    all_walks.append(random_walk)
# Create and plot np_aw_t
np_aw_t = np.transpose(np.array(all_walks))
# Select last row from np_aw_t: ends
ends = np_aw_t[-1,:]
# Plot histogram of ends, display plot
plt.hist(ends,bins=10,edgecolor='k',alpha=0.65)
plt.style.use('fivethirtyeight')
plt.xlabel("Floor")
plt.ylabel("# of times in floor")
plt.show()

You can use scipy.stats.norm to get a normal distribution, and scipy.optimize.curve_fit() to fit any function to a data set; both are described in the SciPy documentation. My suggestion would be something like the following:
import scipy.stats as ss
import numpy as np
import scipy.optimize as opt
import matplotlib.pyplot as plt
#Making a figure with two y-axis (one for the hist, one for the pdf)
#An alternative would be to multiply the pdf by the sum of counts if you just want to show the fit.
fig, ax = plt.subplots(1,1)
twinx = ax.twinx()
rands = ss.norm.rvs(loc = 1, scale = 1, size = 1000)
#hist returns the bins and the value of each bin, plot to the y-axis ax
hist = ax.hist(rands)
vals, bins = hist[0], hist[1]
#calculating the center of each bin
bin_centers = [(bins[i] + bins[i+1])/2 for i in range(len(bins)-1)]
#finding the best fit coefficients; note that the counts are divided by sum(vals) and by the bin widths so the histogram has unit area like a pdf
coeff, cov = opt.curve_fit(ss.norm.pdf, bin_centers, vals/(sum(vals)*np.diff(bins)), p0 = [0,1] )
#loc and scale are the mean and standard deviation of the normal distribution
loc, scale = coeff
#x-values to plot the normal distribution curve
x = np.linspace(min(bins), max(bins), 100)
#Evaluating the pdf with the best fit mean and std
p = ss.norm.pdf(x, loc = loc, scale = scale)
#plot the pdf to the other axis and show
twinx.plot(x,p)
plt.show()
There are likely more elegant ways to do this, but if you are new to Python and are going to use it for calculations and such, getting to know curve_fit and scipy.stats is recommended. I'm not sure I understand what you mean by "defining the Randels"; hist will plot a "standard" histogram with bins on the x-axis and the count in each bin on the y-axis. When using these counts to fit a pdf, we divide the counts by the total number of counts and by the bin width, so that the histogram has unit area like the pdf.
Hope that helps, just ask if anything is unclear :)
Edit: compact version
vals, bins,_ = ax.hist(my_histogram_data)
bin_centers = [(bins[i] + bins[i+1])/2 for i in range(len(bins)-1)]
coeff, cov = opt.curve_fit(ss.norm.pdf, bin_centers, vals/(sum(vals)*np.diff(bins)), p0 = [0,1] )
x = np.linspace(min(bins), max(bins), 100)
p = ss.norm.pdf(x, loc = coeff[0], scale = coeff[1])
#p is now the fitted normal distribution
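An alternative that avoids binning altogether, sketched here as an assumption rather than as the answer's method, is to estimate the mean and standard deviation directly from the raw samples with scipy.stats.norm.fit and to draw the histogram with density=True so that it shares the pdf's scale. The samples below are synthetic stand-ins, and plot_hist_with_pdf is a made-up helper name.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in data (in the question this would be the array `ends`)
rng = np.random.default_rng(0)
samples = rng.normal(loc=1.0, scale=1.0, size=1000)

# Maximum-likelihood estimates of mean and standard deviation from the raw samples
mu, sigma = stats.norm.fit(samples)

# Density-normalized histogram: bar areas sum to 1, like the pdf
density, edges = np.histogram(samples, bins=10, density=True)

def plot_hist_with_pdf(samples, mu, sigma, bins=10):
    """Overlay the fitted normal pdf on a density histogram (hypothetical helper)."""
    import matplotlib.pyplot as plt
    x = np.linspace(samples.min(), samples.max(), 200)
    plt.hist(samples, bins=bins, density=True, edgecolor="k", alpha=0.65)
    plt.plot(x, stats.norm.pdf(x, loc=mu, scale=sigma))
    plt.show()
```

Calling plot_hist_with_pdf(samples, mu, sigma) draws both on one axis, so no twin axis is needed. As for end floors above 100 in the question's simulation: the walk takes 100 steps, nothing in the loop caps the floor, and a roll of 6 can add up to 6 floors at once, so ending above 100 is possible.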

function to determine the frequency of a sinusoid

I am having trouble with my Digital Signal Processing homework. Using Python, I need to create a function that is able to determine the frequency of a sinusoid. I am given random frequencies from 0 to 4000 Hz with Fs = 8000. Can someone please help?
import numpy as np
import matplotlib.pyplot as plt
def freqfinder(signal):
    """REPLACE"""
    x=np.fft.fft(signal)
    x=np.abs(x)
    x=np.max(x)
    return x
t=np.linspace(0,2*np.pi,8*8000)
y=np.sin(2*t)
print(freqfinder(y))
z = np.fft.fft(y)
zz = np.abs(z)
plt.plot(zz)
I tried this as a test for the fft.

Your code is off to a good start. A few things to note:
You should only look at the first half of your FFT. For a REAL input, the magnitude of the output is symmetric around 0, and you only care about the frequencies greater than 0 (the first half of the fft output).
You want the magnitude of each frequency, so you should then take the absolute value of the resulting fft.
The max you are locating is NOT the frequency; it is the strength of the strongest frequency. The frequency itself is found from the index at which that max occurs.
Here is a little script demonstrating these ideas:
import numpy as np
import matplotlib.pyplot as plt
fs = 8000
t = np.linspace(0, 2*np.pi, fs)
freqs = [ 2, 152, 423, 2423, 3541] # Frequencies to test
amps = [0.5, 0.5, 1.0, 0.8, 0.3] # Amplitude for each freq
y = np.zeros(len(t))
for freq, amp in zip(freqs, amps):
    y += amp*np.sin(freq*t)
fig, ax = plt.subplots(1, 2)
ax = ax.flatten()
ax[0].plot(t, y)
ax[0].set_title("Original signal")
y_fft = np.fft.fft(y) # Original FFT
y_fft = y_fft[:round(len(t)/2)] # First half ( pos freqs )
y_fft = np.abs(y_fft) # Absolute value of magnitudes
y_fft = y_fft/max(y_fft) # Normalized so max = 1
freq_x_axis = np.linspace(0, fs/2, len(y_fft))
ax[1].plot(freq_x_axis, y_fft, "o-")
ax[1].set_title("Frequency magnitudes")
ax[1].set_xlabel("Frequency")
ax[1].set_ylabel("Magnitude")
plt.grid()
plt.tight_layout()
plt.show()
f_loc = np.argmax(y_fft) # Finds the index of the max
f_val = freq_x_axis[f_loc] # The strongest frequency value
print(f"The strongest frequency is f = {f_val}")
The output:
The strongest frequency is f = 423.1057764441111
You can see on the right graph that there is a peak at each of the frequencies we specified in freqs, which is what is expected.
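For the single-sinusoid case in the question, the ideas above can be folded into one function. This is a minimal sketch, not the homework's reference solution: the fs parameter with a default of 8000 is an addition for illustration, and np.fft.rfft / np.fft.rfftfreq are used so that only the non-negative frequencies are computed and the frequency axis is in Hz.

```python
import numpy as np

def freqfinder(signal, fs=8000):
    """Return the frequency in Hz of the strongest component of a real signal."""
    magnitudes = np.abs(np.fft.rfft(signal))          # non-negative frequencies only
    freqs = np.fft.rfftfreq(len(signal), d=1 / fs)    # matching frequency axis in Hz
    return freqs[np.argmax(magnitudes)]               # index of the peak -> frequency

# Test signal: one second of a 1234 Hz sine sampled at fs
fs = 8000
t = np.arange(fs) / fs
y = np.sin(2 * np.pi * 1234 * t)
estimate = freqfinder(y, fs)
print(estimate)
```

The estimate is limited to the bin spacing fs / len(signal), so a longer signal gives a finer frequency grid.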
This kind of setup is fine if you only have one frequency you're looking for, but otherwise you may need to find and implement some peak finding algorithms to find all the indices of all the frequency peaks of y_fft and then correlate that with the frequencies in freq_x_axis
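One readily available peak finder is scipy.signal.find_peaks. The sketch below assumes a one-second signal sampled at fs = 8000 and a height threshold of 0.1 on the normalized magnitudes; both values are illustrative choices, and the threshold in particular would need tuning for noisy data.

```python
import numpy as np
from scipy.signal import find_peaks

fs = 8000
t = np.arange(fs) / fs                      # one second sampled at fs
freqs_in = [152, 423, 2423, 3541]           # test frequencies in Hz
amps_in = [0.5, 1.0, 0.8, 0.3]              # amplitude of each component
y = sum(a * np.sin(2 * np.pi * f0 * t) for f0, a in zip(freqs_in, amps_in))

mag = np.abs(np.fft.rfft(y))                # magnitudes of non-negative frequencies
mag /= mag.max()                            # normalize so the largest peak is 1
freq_axis = np.fft.rfftfreq(len(y), d=1 / fs)

# Indices of local maxima that rise above the (assumed) threshold
peak_idx, _ = find_peaks(mag, height=0.1)
peak_freqs = freq_axis[peak_idx]
print(peak_freqs)
```

Each entry of peak_idx is an index into mag, so indexing freq_axis with it gives the frequency of every detected peak.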

Beginner Python Monte Carlo Simulation

I'm a beginner at Python and am working through exercises set by our instructor. I am struggling with this question.
In the Python editor, write a Monte Carlo simulation to estimate the value of the number π.
Specifically, follow these steps:
A. Produce two arrays, one called x, one called y, which contain 100 elements each,
which are randomly and uniformly distributed real numbers between -1 and 1.
B. Plot y versus x as dots in a plot. Label your axes accordingly.
C. Write down a mathematical expression that defines which (x, y) pairs of data points
are located in a circle with radius 1, centred on the (0, 0) origin of the graph.
D. Use Boolean masks to identify the points inside the circle, and overplot them in a
different colour and marker size on top of the data points you already plotted in B.
This is what I have at the moment.
import numpy as np
import math
import matplotlib.pyplot as plt
np.random.seed(12345)
x = np.random.uniform(-1,1,100)
y = np.random.uniform(-1,1,100)
plt.plot(x,y) # this works
for i in x:
    newarray = (1>math.sqrt(y[i]*y[i] + x[i]*x[i]))
    plt.plot(newarray)
Any suggestions?

As pointed out in the comments, the error in your code is that for i in x should be for i in range(len(x)) (xrange in Python 2), because i must be an integer index, not a value of x.
If you want to actually use a Boolean mask, as the exercise asks, you could do something like this:
import pandas as pd
allpoints = pd.DataFrame({'x':x, 'y':y})
# this is your boolean mask
mask = pow(allpoints.x, 2) + pow(allpoints.y, 2) < 1
circlepoints = allpoints[mask]
plt.scatter(allpoints.x, allpoints.y)
plt.scatter(circlepoints.x, circlepoints.y)
Increasing the number of points to 10000, you would get something like this.
To estimate pi you can use the famous Monte Carlo derivation:
>>> n = 10000
>>> ( len(circlepoints) * 4 ) / float(n)
<<< 3.1464
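To make the link between the corrected loop and the Boolean mask concrete, here is a small sketch using NumPy only. It assumes np.random.default_rng as the random source, so the numbers will differ from the seeded output shown above.

```python
import numpy as np

rng = np.random.default_rng(12345)
n = 10000
x = rng.uniform(-1, 1, n)
y = rng.uniform(-1, 1, n)

# Loop version: iterate over integer indices, not over the values of x
inside_loop = np.array([x[i]**2 + y[i]**2 < 1 for i in range(len(x))])

# Boolean-mask version: one vectorized comparison over the whole arrays
inside_mask = x**2 + y**2 < 1

# Fraction of points inside the circle approximates pi / 4
pi_estimate = 4 * inside_mask.sum() / n
print(pi_estimate)
```

Both versions select the same points; the mask is shorter, avoids the Python-level loop, and can be used directly for indexing, as in x[inside_mask].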

You are close to the solution. I slightly reshape your MCVE:
import numpy as np
import math
import matplotlib.pyplot as plt
np.random.seed(12345)
N = 10000
x = np.random.uniform(-1, 1, N)
y = np.random.uniform(-1, 1, N)
Now, we compute a criterion that makes sense in this context, such as the squared distance of points to the origin:
d = x**2 + y**2
Then we use Boolean Indexing to discriminate between points within and outside the Unit Circle:
q = (d <= 1)
At this point lies the Monte Carlo Hypothesis. We assume the ratio of uniformly distributed points in the Circle and in the plane U(-1,1)xU(-1,1) is representative for the Area of the Unit Circle and the Square. Then we can statistically assess pi = 4*(Ac/As) from the ratio of points within the Circle/Square. This leads to:
pi = 4*q.sum()/q.size # 3.1464
Finally we plot the result:
fig, axe = plt.subplots()
axe.plot(x[q], y[q], '.', color='green', label=r'$d \leq 1$')
axe.plot(x[~q], y[~q], '.', color='red', label=r'$d > 1$')
axe.set_aspect('equal')
axe.set_title(r'Monte Carlo: $\pi$ Estimation')
axe.set_xlabel('$x$')
axe.set_ylabel('$y$')
axe.legend(bbox_to_anchor=(1, 1), loc='upper left')
fig.savefig('MonteCarlo.png', dpi=120)
It outputs:

getting output ifft at a different resolution

I'm trying to smooth and interpolate some periodic data in Python using scipy.fftpack. I have managed to take the fft of the data, remove the higher order frequencies above wn (by doing myfft[wn:-wn] = 0) and then reconstruct a "smoothed" version of the data with ifft(myfft). The array created by the ifft has the same number of points as the original data. How can I use that fft to create an array with more points?
x = [i*2*np.pi/360 for i in range(0,360,30)]
data = np.sin(x)
#get fft
myfft = fftp.fft(data)
#kill freqs above wn
myfft[wn:-wn] = 0
#make new series
newdata = fftp.ifft(myfft)
I've also been able to manually recreate the series at the same resolution as demonstrated here
Recreating time series data using FFT results without using ifft
but when I tried upping the resolution of the x-values array it didn't give me the right answer either.
Thanks in advance
Niall

What np.fft.fft returns has the DC component at position 0, followed by all positive frequencies, then the Nyquist frequency (only if the number of elements is even), then the negative frequencies in reverse order. So to add more resolution you could add zeros at both sides of the Nyquist frequency:
import numpy as np
import matplotlib.pyplot as plt
y = np.sin(np.linspace(0, 2*np.pi, 32, endpoint=False))
f = np.fft.fft(y)
n = len(f)
f_ = np.concatenate((f[0:(n+1)//2],
                     np.zeros(n//2),
                     [] if n%2 != 0 else f[(n+1)//2:(n+3)//2],
                     np.zeros(n//2),
                     f[(n+3)//2:]))
y_ = np.fft.ifft(f_)
plt.plot(y, 'ro')
plt.plot(y_, 'bo')
plt.show()
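For real-valued data, the same zero-padding idea can be sketched more compactly with np.fft.rfft and np.fft.irfft, which handle the spectrum layout internally. This is an alternative under stated assumptions, not the method above: the target length m = 128 is an arbitrary choice, and the factor m / n compensates for the inverse transform dividing by the new, longer length.

```python
import numpy as np

n, m = 32, 128                                   # original and target number of points
y = np.sin(np.linspace(0, 2 * np.pi, n, endpoint=False))

spectrum = np.fft.rfft(y)                        # n//2 + 1 non-negative-frequency bins
# irfft with n=m zero-pads the spectrum up to m//2 + 1 bins before inverting;
# multiplying by m / n restores the original amplitude
y_up = np.fft.irfft(spectrum, n=m) * (m / n)

# Compare with the true sine on the finer grid
x_up = np.linspace(0, 2 * np.pi, m, endpoint=False)
max_err = np.max(np.abs(y_up - np.sin(x_up)))
print(max_err)
```

The result is real, so it can be plotted directly against x_up. For an even-length input that has energy exactly at the Nyquist frequency, that bin would strictly need to be halved before padding; the sine used here has none, so the sketch skips that step.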
