Draw average line of scatter plot with matplotlib python - python

I'm trying to fit an average line in a scatter plot with matplotlib. All I'm getting is this.
But I want it to look like this green line.
I have tried the following two snippets for fitting the curve:
z = np.polyfit(x, y, 1)
p = np.poly1d(z)
plt.plot(x,p(x),"r-")
and
def func(x, a, b, c):
    return a * np.exp(-b * x) + c
popt, pcov = curve_fit(func, x, residual)
plt.plot(x, func(x, *popt), 'r-', label='fit')

Since you did not provide data, I made my own and tried this:
import numpy as np
import matplotlib.pyplot as plt
N = 10000
xr = np.linspace(-1,6,N)
yr = -1*(np.ones(N)-1+xr) + 10*np.random.rand(N)
x = np.concatenate((xr[0:3800],xr[4900:]))
y = np.concatenate((yr[0:3800],yr[4900:]))
z = np.polyfit(x, y, 1)
p = np.poly1d(z)
plt.plot(x,p(x),"r-")
plt.scatter(x,y, s=2)
plt.show()
and this is the output I have, which is the expected output:
If you can share a piece of your data I could test it as well.
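If a fitted line shows up as a zigzag instead of a single curve, one possible cause (an assumption here, since the original data is not shown) is that x is unsorted: plt.plot joins points in the order they are given. A minimal sketch that evaluates the fit on a sorted grid instead; the helper name fit_line and the synthetic data are illustrative only:

```python
import numpy as np

def fit_line(x, y, deg=1, num=200):
    """Fit a polynomial of degree `deg` and evaluate it on a sorted grid.

    `fit_line` is a hypothetical helper, not part of NumPy or matplotlib.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    coeffs = np.polyfit(x, y, deg)
    xs = np.linspace(x.min(), x.max(), num)  # monotonically increasing
    return xs, np.polyval(coeffs, xs), coeffs

# Synthetic, deliberately unordered data (slope -1, noisy offset)
rng = np.random.default_rng(0)
x = rng.uniform(-1, 6, 2000)
y = -x + 10 * rng.random(2000)

xs, ys, coeffs = fit_line(x, y)
```

The line can then be drawn with plt.scatter(x, y, s=2) followed by plt.plot(xs, ys, "r-"). For a straight line the order of x makes no visual difference, but for the exponential fit in the question it does.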

Related

3D graphing the complex values of a function in Python

This is the real function I am looking to represent in 3D:
y = f(x) = x^2 + 1
The complex function would be as follows:
w = f(z) = z^2 + 1
Where z = x + iy and w = u + iv. These are four dimensions (x, y, u, v), but one can use u for 3D graphing.
We get:
f(x + iy) = x^2 + 2xyi - y^2 + 1
So:
u = x^2 - y^2 + 1
and v = 2xy
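As a quick numerical check of the expansion above, the real and imaginary parts of z^2 + 1 can be compared against the formulas for u and v. This is only a sanity check on random sample points, not part of the plotting code:

```python
import numpy as np

def f(z):
    return z**2 + 1

# Random sample points in the complex plane
rng = np.random.default_rng(1)
x = rng.uniform(-5, 5, 100)
y = rng.uniform(-5, 5, 100)
w = f(x + 1j * y)

# Formulas derived by expanding (x + iy)^2 + 1
u = x**2 - y**2 + 1
v = 2 * x * y

print(np.allclose(w.real, u), np.allclose(w.imag, v))  # True True
```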
This u is what is being used in the code below.
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(-100, 101, 150)
y = np.linspace(-100, 101, 150)
X, Y = np.meshgrid(x,y)
U = (X**2) - (Y**2) + 1
fig = plt.figure(dpi = 300)
ax = plt.axes(projection='3d')
ax.plot_surface(X, Y, U)
plt.show()
The following images are the side-view of the 3D function and the 2D plot for reference. I do not think they are alike.
Likewise, here is the comparison between the 3D side-view and the 2D plot of w = z^3 + 1. They seem to differ as well.
I have not been able to find too many resources regarding plotting in 3D using complex numbers. Because of this and the possible discrepancies mentioned before, I think the code must be flawed, but I can't figure out why. I would be grateful if you could correct me or advise me on any changes.
The inspiration came from Welch Labs' 'Imaginary Numbers are Real' YouTube series where he shows a jaw-dropping representation of the complex values of the function I have been tinkering with.
I was just wondering if anybody could point out any flaws in my reasoning or the execution of my idea since this code would be helpful in explaining the importance of complex numbers to HS students.
Thank you very much for your time.
The f(z) = z^2 + 1 projection (that is, side-view) looks OK to me. You can use this technique to add the projections; this code:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm
def f(z):
    return z**2 + 1
def freal(x, y):
    return x**2 - y**2 + 1
x = np.linspace(-100, 101, 150)
y = np.linspace(-100, 101, 150)
yproj = 0 # value of y for which to project xu axes
xproj = 0 # value of x to project onto yu axes
X, Y = np.meshgrid(x,y)
Z = X + 1j * Y
W = f(Z)
U = W.real
fig = plt.figure()
ax = plt.axes(projection='3d')
## surface
ax.plot_surface(X, Y, U, alpha=0.7)
# xu projection
xuproj = freal(x, yproj)
ax.plot(x, xuproj, zs=101, zdir='y', color='red', lw=5)
ax.plot(x, xuproj, zs=yproj, zdir='y', color='red', lw=5)
# yu projection
yuproj = freal(xproj, y)
ax.plot(y, yuproj, zs=101, zdir='x', color='green', lw=5)
ax.plot(y, yuproj, zs=xproj, zdir='x', color='green', lw=5)
# partially reproduce https://www.youtube.com/watch?v=T647CGsuOVU&t=107s
x = np.linspace(-3, 3, 150)
y = np.linspace(0, 3, 150)
X, Y = np.meshgrid(x,y)
U = f(X + 1j*Y).real
fig = plt.figure()
ax = plt.axes(projection='3d')
## surface
ax.plot_surface(X, Y, U, cmap=cm.jet)
ax.set_box_aspect((np.diff(ax.get_xlim())[0],
                   np.diff(ax.get_ylim())[0],
                   np.diff(ax.get_zlim())[0]))
#ax.set_aspect('equal')
plt.show()
gives two figures: the surface with its projections, and the partial reproduction of the video frame.
The axis ticks don't look very good: you can investigate plt.xticks or ax.set_xticks (and yticks, zticks) to fix this.
There is a way to visualize complex functions using colour as a fourth dimension; see complex-analysis.com for examples.
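One simple way to try the colour idea with the same surface is to keep u as the height and map v onto a colormap. This is only a sketch of one possible encoding (sites such as the one mentioned above may use a different scheme, for example colouring by phase); the grid size and the choice of viridis are arbitrary:

```python
import numpy as np
from matplotlib import cm
from matplotlib.colors import Normalize

def f(z):
    return z**2 + 1

x = np.linspace(-3, 3, 60)
y = np.linspace(-3, 3, 60)
X, Y = np.meshgrid(x, y)
W = f(X + 1j * Y)
U, V = W.real, W.imag

# Scale v to [0, 1] and convert it to RGBA colours, one per grid point
norm = Normalize(vmin=V.min(), vmax=V.max())
colors = cm.viridis(norm(V))  # shape (60, 60, 4)
```

The colours can then be passed to the surface call as ax.plot_surface(X, Y, U, facecolors=colors), so that height shows u and colour shows v.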

Trying to fit data into sine cosine curve fit using scipy

I am new to signal processing in Python.
I am trying to fit data to a sine-cosine curve using the equation
A * np.sin(x) + B * np.cos(x) + C.
Here is a snippet of the code:
def func(x, A, B, C):
    return A * np.sin(x) + B * np.cos(x) + C
p0 = 0, 25, 10
popt, pcov = curve_fit(func, x, y)
times = np.linspace(x[0], x[-1], num=21)
plt.plot(x, y, 'o', color='red', label="data")
plt.plot(times, func(times, *popt), '--', color='blue', label="optimized data")
# plt.plot(x, func(x, *popt), '--', color='blue', label="optimized data")
plt.legend()
plt.show()
I am getting the output below (in the image).
Could anyone help me spot the mistake, or suggest improvements to the code?
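Two things may be worth checking in the snippet above: p0 is defined but never passed to curve_fit, and times has only 21 points, which can make the plotted curve look jagged. A minimal sketch with both changed, using synthetic data because the original data is not shown (the true values A=3, B=20, C=12 are made up for illustration):

```python
import numpy as np
from scipy.optimize import curve_fit

def func(x, A, B, C):
    return A * np.sin(x) + B * np.cos(x) + C

# Synthetic stand-in for the real data (assumed values: A=3, B=20, C=12)
rng = np.random.default_rng(0)
x = np.linspace(0, 4 * np.pi, 50)
y = func(x, 3.0, 20.0, 12.0) + rng.normal(scale=0.5, size=x.size)

p0 = (0, 25, 10)
popt, pcov = curve_fit(func, x, y, p0=p0)  # pass the initial guess explicitly

# Evaluate on a dense grid so the plotted curve is smooth
times = np.linspace(x[0], x[-1], num=500)
fitted = func(times, *popt)
```

Note that this model fixes the period at 2π. If the real data has a different period, the model would probably need a frequency parameter as well, for example np.sin(w * x), and that case depends much more on a good initial guess.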

How to create a confidence interval with plt.fill_between inside a scatter plot

I created a scatter plot that uses data from two sources: x = [] and y = []. In a second step, I added a linear regression line for the two lists above using the following code:
(m, b) = np.polyfit(x, y, 1)
Y_Polyval = np.polyval([m, b], x)
plt.plot(x, Y_Polyval, linewidth=3, c="black")
The result of that is a standard scatterplot as shown below.
Now I would like to add a 95% confidence interval to the black regression line, using plt.fill_between. I know that there are many topics on this and I have read through many of them, but I cannot solve the problem, i.e., adapt the existing code to my particular data and regression line.
Adding
CI = 1.96 * np.std(y) / np.mean(y)
plt.fill_between(y, (y-CI), (y+CI), color='blue', alpha=0.1)
to my code results in the following output below.
The blueish confidence interval by plt.fill_between is somewhere drawn on the left side of the image, but not around the regression line. What I would like to achieve is that the confidence interval draws around the black regression line. The full code is shown subsequently:
import numpy as np
import matplotlib.pyplot as plt
# Scatter plot
x = [0.472202, 0.685151, 0.287613, 0.546364, 0.518002, 0.675128, 0.462418, 0.61817, 0.692822, 0.23433,
0.194009, 0.720232, 0.597321, 0.625955, 0.660571, 0.737754, 0.436876, 0.689937, 0.483067, 0.646723,
0.699367, 0.384102, 0.561493]
y = [0.131113, 0.123865, 0.150355, 0.138914, 0.140417, 0.119358, 0.130019, 0.129782, 0.113508, 0.13434,
0.15162, 0.125768, 0.128473, 0.128056, 0.114403, 0.142878, 0.139192, 0.118033, 0.132616, 0.133043,
0.133973, 0.146611, 0.129792]
(m, b) = np.polyfit(x, y, 1)
Y_Polyval = np.polyval([m, b], x)
plt.plot(x, Y_Polyval, linewidth=3, c="black")
CI = 1.96 * np.std(y) / np.mean(y)
plt.fill_between(y, (y-CI), (y+CI), color='blue', alpha=0.1)
plt.scatter(x, y, s=250, linewidths=2, zorder=2)
plt.show()
You should plot the predicted value Y_Polyval instead of the true value y and sort the (x, y) values to fill the areas:
plt.fill_between(x, (Y_Polyval-CI), (Y_Polyval+CI), color='blue', alpha=0.1)
Full Example
import numpy as np
import matplotlib.pyplot as plt
# Scatter plot
x = [0.472202, 0.685151, 0.287613, 0.546364, 0.518002, 0.675128, 0.462418, 0.61817, 0.692822, 0.23433,
0.194009, 0.720232, 0.597321, 0.625955, 0.660571, 0.737754, 0.436876, 0.689937, 0.483067, 0.646723,
0.699367, 0.384102, 0.561493]
y = [0.131113, 0.123865, 0.150355, 0.138914, 0.140417, 0.119358, 0.130019, 0.129782, 0.113508, 0.13434,
0.15162, 0.125768, 0.128473, 0.128056, 0.114403, 0.142878, 0.139192, 0.118033, 0.132616, 0.133043,
0.133973, 0.146611, 0.129792]
# Sort coordinate values by x so that fill_between draws a clean band
coords = sorted(zip(x, y), key=lambda pair: pair[0])
x, y = zip(*coords)
(m, b) = np.polyfit(x, y, 1)
Y_Polyval = np.polyval([m, b], x)
# Interval half-width, using the formula from the question
CI = 1.96 * np.std(y) / np.mean(y)
plt.plot(x, Y_Polyval, linewidth=3, c="black")
plt.scatter(x, y, s=250, linewidths=2, zorder=2)
plt.fill_between(x, (Y_Polyval-CI), (Y_Polyval+CI), color='blue', alpha=0.1)
plt.show()
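The interval formula taken from the question, 1.96 * np.std(y) / np.mean(y), gives a constant offset rather than a statistical confidence band. A more conventional 95% band for the mean of a straight-line fit uses the residual standard error and a t critical value. A minimal sketch under the usual ordinary-least-squares assumptions; regression_band is a hypothetical helper and the data is synthetic:

```python
import numpy as np
from scipy import stats

def regression_band(x, y, level=0.95, num=100):
    """Confidence band for the mean response of a straight-line fit.

    `regression_band` is a hypothetical helper written for this example.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    m, b = np.polyfit(x, y, 1)
    resid = y - (m * x + b)
    s = np.sqrt(np.sum(resid**2) / (n - 2))     # residual standard error
    sxx = np.sum((x - x.mean())**2)
    t = stats.t.ppf(0.5 + level / 2, df=n - 2)  # two-sided critical value
    xs = np.linspace(x.min(), x.max(), num)     # sorted grid for fill_between
    fit = m * xs + b
    half = t * s * np.sqrt(1 / n + (xs - x.mean())**2 / sxx)
    return xs, fit, fit - half, fit + half

# Synthetic data, roughly in the range of the values in the question
rng = np.random.default_rng(0)
x = rng.uniform(0.2, 0.75, 23)
y = 0.155 - 0.04 * x + rng.normal(scale=0.008, size=23)

xs, fit, lower, upper = regression_band(x, y)
```

The band is then drawn with plt.fill_between(xs, lower, upper, color='blue', alpha=0.1). It is narrowest near the mean of x and widens towards both ends.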

Scipy curve_fit giving straight line

So I'm trying to get an exponential curve for some COVID data, but I can't seem to get my curve_fit function to show any sort of curve whatsoever. It's so bad it perfectly overlaps the regression line seaborn generated in my graph.
I've tried making both my date and case data smaller/bigger before throwing it into the curve_fit function, but I still either get a similar line and/or an Optimization error. I even tried calculating my function manually but that was (naturally) also way off.
#Plot scatter plot for total case count
x = df_sb['date_ordinal']
y1 = df_sb['totalcountconfirmed']
y2 = df_sb['totalcountdeaths']
plt.figure(figsize=(14,10))
ax = plt.subplot(1,1,1)
# Plot scatter plot along with linear regression line
sns.regplot(x='date_ordinal', y='totalcountconfirmed', data=df_sb)
# Formatting axes
ax.set_xlim(x.min() - 1, x.max() + 10)
ax.set_ylim(0, y1.max() + 1)
ax.set_xlabel('Date')
labels = [dt.date.fromordinal(int(item)) for item in ax.get_xticks()]
ax.set_xticklabels(labels)
plt.xticks(rotation = 45)
plt.ylabel("Total Confirmed Cases")
# Exponential Curve
from scipy.optimize import curve_fit
from scipy.special import expit
x_data = df_sb['date_ordinal'].to_numpy()
Y_data = df_sb['totalcountconfirmed'].to_numpy()
def func(x, a, b, c):
    return a * expit(-b * x) + c
popt, pcov = curve_fit(func, x_data, Y_data, maxfev=10000)
a, b, c = popt
fit_y = func(x_data, a, b, c)
plt.plot(x_data, fit_y)
plt.legend(['Total Cases (Linear)','Total Cases (Exponential)'])
# Inserting Significant Date Labels
add_sig_dates(df_sb, 'totalcountconfirmed')
plt.show()
Although you did not give access to the data, just by looking at the plot I'm pretty sure you mean
def func(x, a, b, c):
    return a * np.exp(-b * x) + c
instead of
def func(x, a, b, c):
    return a * expit(-b * x) + c
Since it's an exponential fit, I think you should provide an initial guess for the parameters in order to achieve good results. This can be done with the p0 argument.
For example:
p0 = [2 ,1, 0] # < -- just an example, they are bad guesses
popt, pcov = curve_fit(func, x_data, Y_data, maxfev=10000, p0=p0)
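Date ordinals are large numbers (around 737000 for 2020), so np.exp(-b * x) on the raw values can underflow to 0 or overflow for most values of b, which may leave the optimizer with a flat model. A minimal sketch that shifts x to start at 0 and derives p0 from a straight-line fit to log(y); the data is synthetic and the growth rate 0.08 is made up for illustration:

```python
import numpy as np
from scipy.optimize import curve_fit

def func(t, a, b, c):
    return a * np.exp(-b * t) + c

# Synthetic stand-in: 60 consecutive date ordinals with growing case counts
rng = np.random.default_rng(0)
x_data = 737500 + np.arange(60)
y_data = 5.0 * np.exp(0.08 * np.arange(60)) + 20.0 + rng.normal(scale=2.0, size=60)

# Shift x so that it starts at 0 before applying exp()
t = x_data - x_data.min()

# Rough initial guess from a straight-line fit to log(y) (requires y > 0)
slope, intercept = np.polyfit(t, np.log(y_data), 1)
p0 = (np.exp(intercept), -slope, 0.0)

popt, pcov = curve_fit(func, t, y_data, p0=p0, maxfev=10000)
fit_y = func(t, *popt)
```

With this sign convention, growth shows up as a negative b. The curve can still be plotted against the original axis with plt.plot(x_data, fit_y).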

How to fit part of a Cosine curve to data in Python?

I've written this code to try to plot a graph of y = a(1 + cos(bx - pi)) + c against our collected data, but when using np.cos it tries to fit an entire cycle of cos onto the data, which doesn't fit our results. Any help on how to fit only a section of the curve to our data would be fab!
I tried to avoid using cos by using a Maclaurin series expansion, but this still doesn't work.
x_data = w
y_data = mean
e = error
from scipy import optimize
def test_func(x, a, b, c):
    y = (a/2)*(1 + (1 - (1/2)*(b*x - np.pi)**2 + (1/24)*(b*x - np.pi)**4)) + c
    return y
params, params_covariance = optimize.curve_fit(test_func, x_data, y_data)
print(params)
a = params[0]
b = params[1]
c = params[2]
plt.figure(num=None, figsize=(12, 6), dpi=80, facecolor='w', edgecolor='k')
plt.errorbar(x_data, y_data, yerr=e, fmt='o', marker='o', label='Data', markersize=3, color='k', elinewidth=1, capsize=2, markeredgewidth=1)
plt.plot(x_data, test_func(x_data, params[0], params[1], params[2]), label='Fitted function')
plt.legend(loc='best')
plt.ylabel('Interference intensity, $I$')
plt.xlabel('Rotational velocity of interferometer, $w$')
plt.show()
Your question is "how to fit only a section of a curve to our data." This can be accomplished by defining a piece-wise function and fitting a section of your data to each corresponding piece of the function. You need to define the cut-off values that separate the parts of your data and pick which functions to fit to each part.
In order to fit a curve to only a section of the data, you need to only pass the portion of the data to curve_fit that you want to fit. Here are working examples of fitting the data to both a Maclaurin series and a cosine function:
import numpy as np
import matplotlib.pyplot as plt
from scipy import optimize
# Generate sample data
np.random.seed(0)
x_data = np.linspace(-np.pi,3*np.pi,101)
y_data = np.cos(x_data) + np.random.rand(len(x_data))/4
idx = (x_data < 0) | (x_data > 2*np.pi)
y_data[idx] = 1 + np.random.rand(sum(idx))/4
e = np.random.rand(len(x_data))/10
# Select part of data to fit
fit_part = ~idx
x_data_to_fit = x_data[fit_part]
y_data_to_fit = y_data[fit_part]
Cosine Function:
def test_func(x, a, b):
    y = a*np.cos(b*x)
    return y
params, params_covariance = optimize.curve_fit(test_func, x_data_to_fit, y_data_to_fit)
print(params)
a = params[0]
b = params[1]
plt.figure(num=None, figsize=(12, 6), dpi=80, facecolor='w', edgecolor='k')
plt.title('Cosine Function Fit')
plt.errorbar(x_data, y_data, yerr=e, fmt='o', marker='o', label='Data', markersize=3, color='k', elinewidth=1, capsize=2, markeredgewidth=1)
plt.plot(x_data_to_fit, test_func(x_data_to_fit, a, b), label='Fitted function')
plt.legend(loc='best')
plt.ylabel('Interference intensity, $I$')
plt.xlabel('Rotational velocity of interferometer, $w$')
plt.show()
Maclaurin Series:
def test_func(x, a, b, c):
    y = (a/2)*(1 + (1 - (1/2)*(b*x - np.pi)**2 + (1/24)*(b*x - np.pi)**4)) + c
    return y
params, params_covariance = optimize.curve_fit(test_func, x_data_to_fit, y_data_to_fit)
print(params)
a = params[0]
b = params[1]
c = params[2]
plt.figure(num=None, figsize=(12, 6), dpi=80, facecolor='w', edgecolor='k')
plt.title('MacLaurin Series Fit')
plt.errorbar(x_data, y_data, yerr=e, fmt='o', marker='o', label='Data', markersize=3, color='k', elinewidth=1, capsize=2, markeredgewidth=1)
plt.plot(x_data_to_fit, test_func(x_data_to_fit, a, b, c), label='Fitted function')
plt.legend(loc='best')
plt.ylabel('Interference intensity, $I$')
plt.xlabel('Rotational velocity of interferometer, $w$')
plt.show()
The cosine function matches the data better than the Maclaurin series in this case because the data was generated using a cosine function.
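If the data covers less than one cycle and the fit still tries to squeeze in a full one, another option is to keep the model from the question and constrain b with an initial guess and bounds. A minimal sketch with synthetic data spanning about half a cycle; the true values (a=2, b=2, c=0.5) and the bound range for b are assumptions chosen for illustration:

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b, c):
    # The form from the question: y = a(1 + cos(bx - pi)) + c
    return a * (1 + np.cos(b * x - np.pi)) + c

# Synthetic data covering roughly half a cycle (assumed a=2, b=2, c=0.5)
rng = np.random.default_rng(0)
x_data = np.linspace(0, 1.5, 40)
y_data = model(x_data, 2.0, 2.0, 0.5) + rng.normal(scale=0.02, size=x_data.size)

# Keep b in a plausible range so the fit cannot add extra cycles
p0 = (1.0, 1.5, 0.0)
bounds = ([0.0, 0.5, -np.inf], [np.inf, 3.0, np.inf])
params, params_covariance = curve_fit(model, x_data, y_data, p0=p0, bounds=bounds)
```

The range for b has to come from what is known about the experiment, for example the largest phase b*x that the measured range of x can plausibly reach.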
