I am not very proficient at maths. What I am trying to do is fit several populations that are supposed to be lognormally distributed. My code is the following:
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
# Generation of 3 population:
import numpy as np
s1 = np.random.lognormal(2, 0.6, 1000) #mu and sigma
s2 = np.random.lognormal(1.6, 0.3, 1000) #mu and sigma
s3 = np.random.lognormal(1.8, 0.5, 1000) #mu and sigma
mb = np.max([s1,s2,s3])
X = np.arange(1,mb,1)
#histogram population 1
Y11, bins1 = np.histogram(s1, X)
Y1 = Y11/Y11.sum()
X1 = bins1[:-1]
#histogram population 2
Y22, bins2 = np.histogram(s2, X)
Y2 = Y22/Y22.sum()
X2 = bins2[:-1]
#histogram population 3
Y33, bins3 = np.histogram(s3, X)
Y3 = Y33/Y33.sum()
X3 = bins3[:-1]
#universe, with all mixed populations
S = np.concatenate((s1, s2, s3), axis=None)
Yi, bins = np.histogram(S, X)
Y = Yi/Yi.sum()
X = bins[:-1]
def logN(x, mu, sigma):
    # lognormal probability density function
    return np.exp(-(np.log(x) - mu)**2 / (2 * sigma**2)) / (x * sigma * np.sqrt(2 * np.pi))
params, pcov = curve_fit(logN, X,Y, method="lm")
print(params)
plt.plot(X1, Y1, 'o')
plt.plot(X2, Y2, 'o')
plt.plot(X3, Y3, 'o')
plt.plot(X, Y, 'r-o')
plt.plot(X, logN(X ,params[0], params[1]))
plt.show()
This code produces a graph and gives me the global parameters mu and sigma of the mixed data. However, I am confused about how to recover the parameters of each individual population from the mixed population data. Any idea on how to handle this problem is welcome.
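One possible approach, sketched here under assumptions, is to fit the histogram of the mixed data with a weighted sum of lognormal densities, one (weight, mu, sigma) triple per population. The names lognormal_pdf and mixture, the two well-separated synthetic populations, the starting values p0 and the bounds are all illustrative and not part of the original code. Populations that overlap as strongly as the three above may not be separable this way, because very different parameter sets can produce nearly the same curve.

```python
import numpy as np
from scipy.optimize import curve_fit

def lognormal_pdf(x, mu, sigma):
    # lognormal probability density function
    return np.exp(-(np.log(x) - mu) ** 2 / (2 * sigma ** 2)) / (x * sigma * np.sqrt(2 * np.pi))

def mixture(x, w1, mu1, s1, w2, mu2, s2):
    # weighted sum of two lognormal densities (illustrative two-component model)
    return w1 * lognormal_pdf(x, mu1, s1) + w2 * lognormal_pdf(x, mu2, s2)

# Hypothetical, well-separated populations (assumption: not the data from the question)
rng = np.random.default_rng(0)
samples = np.concatenate([rng.lognormal(1.0, 0.2, 3000),
                          rng.lognormal(3.0, 0.2, 3000)])

# Log-spaced bins give both populations a similar resolution
edges = np.geomspace(samples.min(), samples.max(), 81)
density, edges = np.histogram(samples, bins=edges, density=True)
centers = np.sqrt(edges[:-1] * edges[1:])  # geometric bin centers

# Rough starting values and bounds keep weights in [0, 1] and sigmas positive
p0 = [0.5, 1.2, 0.3, 0.5, 2.8, 0.3]
lower = [0, -5, 0.01, 0, -5, 0.01]
upper = [1, 10, 5, 1, 10, 5]
params, pcov = curve_fit(mixture, centers, density, p0=p0, bounds=(lower, upper))

w1, mu1, s1, w2, mu2, s2 = params
print("population 1: weight=%.2f mu=%.2f sigma=%.2f" % (w1, mu1, s1))
print("population 2: weight=%.2f mu=%.2f sigma=%.2f" % (w2, mu2, s2))
```

The result depends on the starting values, so it is worth trying a few choices of p0 and checking the diagonal of pcov before trusting the recovered parameters.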
I created this code by modifying code from previous topics.
I put the calculated volume in the title of each volume plot. My questions are:
Are my plots correct?
Are my volume calculations correct too?
Why can the volume be negative? If I write the formula in vx(x) as r1 - r2, the result is negative. Should I take the absolute value (abs) in the future, so that I need not care whether I write r1 - r2 or r2 - r1? The numbers are the same; only the sign differs. What is the meaning of a negative sign for a volume? Do we need to think carefully when calculating a volume through integration?
I do not use SymPy. Is SymPy better at calculating integrals than NumPy/SciPy?
Thanks. This is my code / MWE:
# Compare the plot at xy axis with the solid of revolution toward x and y axis
# For region bounded by the line x - 2y = 0 and y^2 = 4x
# Plotting the revolution of the bounded region
# can be done by limiting the np.linspace of the y, u, and x_inverse values
# You can determine the limits by finding the intersection points of the two functions.
import matplotlib.pyplot as plt
import numpy as np
import sympy as sy
def r1(x):
    return x/2
def r2(x):
    return 2*(x**(1/2))
def r3(x):
    return 2*x
def r4(x):
    return (x/2)**(2)
def vx(x):
    return np.pi*(r2(x)**2 - r1(x)**2)
def vy(x):
    return np.pi*(r3(x)**2 - r4(x)**2)
x = sy.Symbol("x")
vx = sy.integrate(vx(x), (x, 0, 16))
vy = sy.integrate(vy(x), (x, 0, 8))
n = 200
fig = plt.figure(figsize=(14, 7))
ax1 = fig.add_subplot(221)
ax2 = fig.add_subplot(222, projection='3d')
ax3 = fig.add_subplot(223)
ax4 = fig.add_subplot(224, projection='3d')
y = np.linspace(0, 8, n)
x1 = (2*y)
x2 = (y / 2) ** (2)
t = np.linspace(0, np.pi * 2, n)
u = np.linspace(0, 16, n)
v = np.linspace(0, 2 * np.pi, n)
U, V = np.meshgrid(u, v)
X = U
Y1 = (2 * U ** (1/2)) * np.cos(V)
Z1 = (2 * U ** (1/2)) * np.sin(V)
Y2 = (U / 2) * np.cos(V)
Z2 = (U / 2) * np.sin(V)
Y3 = ((U / 2) ** (2)) * np.cos(V)
Z3 = ((U / 2) ** (2)) * np.sin(V)
Y4 = (2*U) * np.cos(V)
Z4 = (2*U) * np.sin(V)
ax1.plot(x1, y, label='$y=x/2$')
ax1.plot(x2, y, label=r'$y=2 \sqrt{x}$')
ax1.legend()
ax1.set_title('$f(x)$')
ax2.plot_surface(X, Y3, Z3, alpha=0.3, color='red', rstride=6, cstride=12)
ax2.plot_surface(X, Y4, Z4, alpha=0.3, color='blue', rstride=6, cstride=12)
ax2.set_title("$f(x)$: Revolution around $y$ \n Volume = {}".format(vy))
# find the inverse of the function
x_inverse = np.linspace(0, 8, n)
y1_inverse = np.power(2*x_inverse, 1)
y2_inverse = np.power(x_inverse / 2, 2)
ax3.plot(x_inverse, y1_inverse, label='Inverse of $y=x/2$')
ax3.plot(x_inverse, y2_inverse, label=r'Inverse of $y=2 \sqrt{x}$')
ax3.set_title('Inverse of $f(x)$')
ax3.legend()
ax4.plot_surface(X, Y1, Z1, alpha=0.3, color='red', rstride=6, cstride=12)
ax4.plot_surface(X, Y2, Z2, alpha=0.3, color='blue', rstride=6, cstride=12)
ax4.set_title("$f(x)$: Revolution around $x$ \n Volume = {}".format(vx))
plt.tight_layout()
plt.show()
Your plots are correct except for the upper-right one, where the boundary is slightly off. I changed the np.linspace for u to u = np.linspace(0, 8, n). The np.linspace of u for the bottom-right plot is already correct, so it remains u = np.linspace(0, 16, n). You could use different variable names for the two, but I simply reassigned u and created an X2. The complete code is attached below.
Your volume calculations are correct.
Upper right plot:
Bottom right plot:
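A volume cannot be negative. A negative result means the integrand was written as inner radius minus outer radius (or the limits were reversed), which only flips the sign of the integral. You can solve the integral by hand first and compare it with the numerical result. See also: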
https://math.stackexchange.com/questions/261244/is-there-a-fundamental-reason-that-int-ba-int-ab?rq=1
SymPy is for symbolic computation, but it can also do numerical integration, and so can SciPy. SciPy is built on NumPy, while SymPy does not require it. I think you are fine with either as long as you use them correctly.
# Compare the plot at xy axis with the solid of revolution toward x and y axis
# For region bounded by the line x - 2y = 0 and y^2 = 4x
# Plotting the revolution of the bounded region
# can be done by limiting the np.linspace of the y, u, and x_inverse values
# You can determine the limits by finding the intersection points of the two functions.
import matplotlib.pyplot as plt
import numpy as np
import sympy as sy
def r1(x):
    return x / 2
def r2(x):
    return 2 * (x ** (1 / 2))
def r3(x):
    return 2 * x
def r4(x):
    return (x / 2) ** (2)
def vx(x):
    return np.pi * (r2(x) ** 2 - r1(x) ** 2)
def vy(x):
    return np.pi * (r3(x) ** 2 - r4(x) ** 2)
x = sy.Symbol("x")
vx = sy.integrate(vx(x), (x, 0, 16))
vy = sy.integrate(vy(x), (x, 0, 8))
n = 200
fig = plt.figure(figsize=(14, 7))
ax1 = fig.add_subplot(221)
ax2 = fig.add_subplot(222, projection='3d')
ax3 = fig.add_subplot(223)
ax4 = fig.add_subplot(224, projection='3d')
y = np.linspace(0, 8, n)
x1 = (2 * y)
x2 = (y / 2) ** (2)
t = np.linspace(0, np.pi * 2, n)
u = np.linspace(0, 16, n)
v = np.linspace(0, 2 * np.pi, n)
U, V = np.meshgrid(u, v)
X = U
Y1 = (2 * U ** (1 / 2)) * np.cos(V)
Z1 = (2 * U ** (1 / 2)) * np.sin(V)
Y2 = (U / 2) * np.cos(V)
Z2 = (U / 2) * np.sin(V)
#######################################
u = np.linspace(0, 8, n) # linspace u for the upper right figure should be from 0 to 8 instead of 0 to 16
v = np.linspace(0, 2 * np.pi, n)
U, V = np.meshgrid(u, v)
X2 = U # created X2 here
Y3 = ((U / 2) ** (2)) * np.cos(V)
Z3 = ((U / 2) ** (2)) * np.sin(V)
Y4 = (2 * U) * np.cos(V)
Z4 = (2 * U) * np.sin(V)
ax1.plot(x1, y, label='$y=x/2$')
ax1.plot(x2, y, label=r'$y=2 \sqrt{x}$')
ax1.legend()
ax1.set_title('$f(x)$')
ax2.plot_surface(X2, Y3, Z3, alpha=0.3, color='red', rstride=6, cstride=12)
ax2.plot_surface(X2, Y4, Z4, alpha=0.3, color='blue', rstride=6, cstride=12)
ax2.set_title("$f(x)$: Revolution around $y$ \n Volume = {}".format(vy))
# find the inverse of the function
x_inverse = np.linspace(0, 8, n)
y1_inverse = np.power(2 * x_inverse, 1)
y2_inverse = np.power(x_inverse / 2, 2)
ax3.plot(x_inverse, y1_inverse, label='Inverse of $y=x/2$')
ax3.plot(x_inverse, y2_inverse, label=r'Inverse of $y=2 \sqrt{x}$')
ax3.set_title('Inverse of $f(x)$')
ax3.legend()
ax4.plot_surface(X, Y1, Z1, alpha=0.3, color='red', rstride=6, cstride=12)
ax4.plot_surface(X, Y2, Z2, alpha=0.3, color='blue', rstride=6, cstride=12)
ax4.set_title("$f(x)$: Revolution around $x$ \n Volume = {}".format(vx))
plt.tight_layout()
plt.show()
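On the sign and SymPy-versus-SciPy questions above, here is a minimal sketch for the revolution around the x axis, assuming the same region between y = x/2 and y = 2*sqrt(x) on 0 <= x <= 16. The names outer, inner, vx_exact, vx_numeric and vx_swapped are illustrative only. It compares the symbolic result with scipy.integrate.quad and shows that swapping the two radii changes only the sign.

```python
import numpy as np
import sympy as sy
from scipy.integrate import quad

x = sy.Symbol("x")
outer = 2 * sy.sqrt(x)   # outer radius, y = 2*sqrt(x)
inner = x / 2            # inner radius, y = x/2

# Symbolic (exact) volume: outer**2 - inner**2 is non-negative on [0, 16]
vx_exact = sy.integrate(sy.pi * (outer**2 - inner**2), (x, 0, 16))

# Numerical volume of the same integrand with SciPy
vx_numeric, abs_err = quad(lambda t: np.pi * ((2 * np.sqrt(t))**2 - (t / 2)**2), 0, 16)

# Swapping the radii gives the same magnitude with the opposite sign
vx_swapped = sy.integrate(sy.pi * (inner**2 - outer**2), (x, 0, 16))

print(vx_exact, float(vx_exact), vx_numeric, vx_swapped)
```

Taking abs() of the result would hide a swapped integrand, so it is probably safer to check which curve is the outer one on the interval than to rely on the absolute value.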
I have a model that describes a sum of Gaussian distributions:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
s1 = np.random.normal(2, 0.5, size = (1000, 1))
s2 = np.random.normal(5, 0.5, size = (1000, 1))
mb = (np.concatenate((s1, s2), axis=0)).max()
Xi = np.arange(0,mb,0.1) # bin edges
#histogram population 1 (Xi holds the bin edges; X is only defined further down)
Y11, bins1 = np.histogram(s1, Xi)
Y1 = Y11/Y11.sum()
X1 = bins1[:-1]
#histogram population 2
Y22, bins2 = np.histogram(s2, Xi)
Y2 = Y22/Y22.sum()
X2 = bins2[:-1]
#universe, with all mixed populations
S = np.concatenate((s1, s2), axis=0)
Yi, bins = np.histogram(S, Xi)
Y = Yi/Yi.sum()
X = bins[:-1]
def gaussians(X, amp1, mean1, SD1, amp2, mean2, SD2):
    A = amp1 * np.exp(-0.5*((X - mean1)/SD1)**2)
    B = amp2 * np.exp(-0.5*((X - mean2)/SD2)**2)
    return A + B
params, pcov = curve_fit(gaussians, X,Y, p0=(1,2,1,1,5,1), maxfev=4000)
j = np.arange(0.1, mb, 0.1)
plt.figure(figsize=(10, 6)) #size of graph
plt.plot(X, Y, 'o', linewidth=2)
plt.plot(X, gaussians(X ,params[0], params[1],params[2], params[3], params[4], params[5]),'b', linewidth=2)
plt.xlim([-.01, mb])
plt.ylim([0, 0.1])
plt.show()
This code plots a nice graph, as follows:
I wonder how to plot each Gaussian separately, overlaid on the same graph, using the parameters of my model function. I mean something like this (made by hand):
For those looking for the answer, I figured out how to do it. It is only a matter of setting to zero the amplitude of the component that you do not want to plot:
plt.plot(X, gaussians(X ,params[0], params[1],params[2], params[3], params[4], params[5]),'b', linewidth=8, alpha=0.1)
plt.plot(X, gaussians(X ,0, params[1],params[2], params[3], params[4], params[5]),'r', linewidth=1 )
plt.plot(X, gaussians(X ,params[0], params[1],params[2], 0, params[4], params[5]),'g', linewidth=1)
plt.xlim([-.01, mb])
plt.ylim([0, 0.1])
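An alternative to zeroing amplitudes, sketched here as an assumption rather than the approach used above, is a single-Gaussian helper evaluated once per (amp, mean, SD) triple. The helper name gaussian and the numeric values in params are hypothetical stand-ins for the fitted values returned by curve_fit.

```python
import numpy as np

def gaussian(x, amp, mean, sd):
    # one Gaussian component of the model
    return amp * np.exp(-0.5 * ((x - mean) / sd) ** 2)

# Hypothetical fitted values standing in for `params` from curve_fit
params = np.array([0.04, 2.0, 0.5, 0.04, 5.0, 0.5])
x = np.arange(0, 7, 0.1)

# Split the flat parameter vector into (amp, mean, sd) triples
components = [gaussian(x, *params[i:i + 3]) for i in range(0, len(params), 3)]
total = np.sum(components, axis=0)  # equals the full two-Gaussian model

# Each entry of `components` can be passed to plt.plot(x, component)
for k, comp in enumerate(components, start=1):
    print("component %d peaks at x = %.1f" % (k, x[np.argmax(comp)]))
```

This form extends to more than two components without rewriting the plotting calls.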
I have tried to plot a lognormal distribution function with a given mean and standard deviation. However, the plot only shows the histogram and not the distribution function, and I do not know why it is not drawn:
mean = 15.14
stdev = 0.3738
phi = (stdev ** 2 + mean ** 2) ** 0.5
mu = np.log(mean ** 2 / phi)
sigma = (np.log(phi ** 2 / mean ** 2)) ** 0.5
data=np.random.lognormal(mu, sigma , 1000)
mu, sigma, n= lognorm.fit(data)
plt.hist(data, bins=30, density=True, alpha=0.5, color='b')
# Plot the PDF.
xmin, xmax = plt.xlim()
x = np.linspace(xmin, xmax, 1000)
p = lognorm.pdf(x, mu, sigma)
plt.plot(x, p, 'k', linewidth=2)
title = "LogNormal Distribution: Media: {:.2f} y Dev.Est: {:.2f}".format(mean, stdev)
plt.title(title)
plt.show()
The result that I have obtained:
Pay attention to the line:
mu, sigma, n = lognorm.fit(data)
lognorm.fit returns the tuple (shape, loc, scale), so this line overwrites the mu and sigma values that are used later.
lognorm.pdf(x, mu, sigma) then returns zeros, because its positional arguments are (x, s, loc) and scale is left at its default of 1: you are evaluating the PDF far away from where the distribution has its mass, and there the PDF is effectively zero.
In order to place the PDF properly, pass the parameters by keyword. In SciPy's parameterization s is sigma and scale is exp(mu), which is very close to the mean here because sigma is small. Replace this line of your code:
p = lognorm.pdf(x, mu, sigma)
with:
p = lognorm.pdf(x = x, scale = np.exp(mu), s = sigma)
Complete Code
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import lognorm
mean = 15.14
stdev = 0.3738
phi = (stdev ** 2 + mean ** 2) ** 0.5
mu = np.log(mean ** 2 / phi)
sigma = (np.log(phi ** 2 / mean ** 2)) ** 0.5
data=np.random.lognormal(mu, sigma , 1000)
# mu, sigma, n= lognorm.fit(data)  # returns (shape, loc, scale), not (mu, sigma, n)
plt.hist(data, bins=30, density=True, alpha=0.5, color='b')
# Plot the PDF.
xmin, xmax = plt.xlim()
x = np.linspace(xmin, xmax, 1000)
p = lognorm.pdf(x = x, scale = np.exp(mu), s = sigma)
plt.plot(x, p, 'k', linewidth=2)
title = "LogNormal Distribution: Media: {:.2f} y Dev.Est: {:.2f}".format(mean, stdev)
plt.title(title)
plt.show()
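If the goal is to estimate the parameters from the data rather than reuse the known ones, the mapping between NumPy's (mu, sigma) and SciPy's (s, loc, scale) can be sketched as follows. The sample size, the seed and the choice to fix loc at 0 with floc=0 are assumptions made for this illustration.

```python
import numpy as np
from scipy.stats import lognorm

mu, sigma = 2.7, 0.05          # parameters of the underlying normal distribution
rng = np.random.default_rng(0)
data = rng.lognormal(mu, sigma, 5000)

# lognorm.fit returns (shape, loc, scale); fixing loc=0 gives the usual two-parameter lognormal
s_hat, loc_hat, scale_hat = lognorm.fit(data, floc=0)

sigma_hat = s_hat              # shape parameter s corresponds to sigma
mu_hat = np.log(scale_hat)     # scale corresponds to exp(mu)

# Frozen distribution built from the known parameters
dist = lognorm(s=sigma, scale=np.exp(mu))
print(mu_hat, sigma_hat, dist.mean())
```

With this mapping, lognorm.pdf(x, s=sigma_hat, loc=0, scale=scale_hat) should overlay the histogram of data.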
I would like to predict a ball's trajectory by fitting its 3D coordinates to a parabola. Below is my code. But instead of a parabola, I get a straight line. If you have any clue about it, please let me know. Thanks!
# draw scatter coordinates
fig = plt.figure()
ax = plt.axes(projection = '3d')
x_list = []
y_list = []
z_list = []
for x in rm_list:
    x_list.append(x[0][0])
    y_list.append(x[0][1])
    z_list.append(x[0][2])
x = np.array(x_list)
y = np.array(y_list)
z = np.array(z_list)
ax.scatter(x, y, z)
# curve fit
def func(x, a, b, c, d):
    return a * x[0]**2 + b * x[1]**2 + c * x[0] * x[1] + d
data = np.column_stack([x_list, y_list, z_list])
popt, _ = curve_fit(func, data[:,:2].T, ydata=data[:,2])
a, b, c, d = popt
print('z = %.5f * x ^ 2 + %.5f * y ^ 2 + %.5f * x * y + %.5f' %(a, b, c, d))
x1 = np.linspace(0.3, 0.4, 100)
y1 = np.linspace(0.02, 0.06, 100)
z1 = a * x1 ** 2 + b * y1 ** 2 + c * x1 * y1 + d
ax.plot(x1, y1, z1, color='green')
plt.show()
Update 1
After changing func to ax^2 + by^2 + cxy + dx + ey + f, I got a parabola, but it does not fit the coordinates.
The fact that you have the underlying timestamp data makes the fitting procedure easier, because each coordinate can be fitted as a quadratic polynomial in t:
import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit
from numpy.polynomial import Polynomial
# test data generation with some noise
# here read in your data
np.random.seed(123)
n = 40
x_param = [ 1, 21, -1]
y_param = [12, -3, 0]
z_param = [-3, 0, -2]
px = Polynomial(x_param)
py = Polynomial(y_param)
pz = Polynomial(z_param)
t = np.random.choice(np.linspace (-3000, 2000, 1000)/500, n)
x = px(t) + np.random.random(n)
y = py(t) + np.random.random(n)
z = pz(t) + np.random.random(n)
# here start the real calculations
# draw scatter coordinates of raw data
fig = plt.figure()
ax = plt.axes(projection = '3d')
ax.scatter(x, y, z, label="raw data")
# curve fit function
def func(t, x0, x1, x2, y0, y1, y2, z0, z1, z2):
    # Polynomial takes its coefficients in order of increasing degree
    Px=Polynomial([x0, x1, x2])
    Py=Polynomial([y0, y1, y2])
    Pz=Polynomial([z0, z1, z2])
    return np.concatenate([Px(t), Py(t), Pz(t)])
# curve fit
# start values are not necessary for this example
# but make it your rule to always provide start values for curve_fit
start_vals = [ 1, 10, 1,
10, 1, 1,
-1, -1, -1]
xyz = np.concatenate([x, y, z])
popt, _ = curve_fit(func, t, xyz, p0=start_vals)
print(popt)
#[ 1.58003630e+00 2.10059868e+01 -1.00401965e+00
# 1.25895591e+01 -2.97374035e+00 -3.23358241e-03
# -2.44293562e+00 3.96407428e-02 -1.99671092e+00]
# regularly spaced fit data
t_fit = np.linspace(min(t), max(t), 100)
xyz_fit = func(t_fit, *popt).reshape(3, -1)
ax.plot(xyz_fit[0, :], xyz_fit[1, :], xyz_fit[2, :], color="green", label="fitted data")
ax.legend()
plt.show()
Sample output:
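As an alternative to the joint curve_fit above, each coordinate can also be fitted on its own with a linear least-squares polynomial fit, which needs no start values. This is only a sketch: the synthetic trajectory, the noise level and the names coef_x, coef_y and coef_z are assumptions made for illustration. Note that np.polyfit returns coefficients with the highest degree first, the opposite order to Polynomial.

```python
import numpy as np

# Synthetic timestamps and noisy coordinates (hypothetical stand-in for real data)
rng = np.random.default_rng(123)
t = np.sort(rng.uniform(-6, 4, 40))
x = 1 + 21 * t - 1 * t**2 + rng.normal(0, 0.1, t.size)
y = 12 - 3 * t + rng.normal(0, 0.1, t.size)
z = -3 - 2 * t**2 + rng.normal(0, 0.1, t.size)

# Independent quadratic fit per axis; coefficients are ordered [t**2, t, constant]
coef_x = np.polyfit(t, x, 2)
coef_y = np.polyfit(t, y, 2)
coef_z = np.polyfit(t, z, 2)

# Evaluate the fitted trajectory on regularly spaced timestamps
t_fit = np.linspace(t.min(), t.max(), 100)
x_fit = np.polyval(coef_x, t_fit)
y_fit = np.polyval(coef_y, t_fit)
z_fit = np.polyval(coef_z, t_fit)

print(coef_x, coef_y, coef_z)
```

The arrays x_fit, y_fit and z_fit can be passed to ax.plot in the same way as xyz_fit in the code above.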
I am using the following code in order to smooth my data
a = get_data()
y, x = a.T
t = np.linspace(0, 1, len(x))
t2 = np.linspace(0, 1, len(x))
x2 = np.interp(t2, t, x)
y2 = np.interp(t2, t, y)
sigma = 50
x3 = gaussian_filter1d(x2, sigma)
y3 = gaussian_filter1d(y2, sigma)
x4 = np.interp(t, t2, x3)
y4 = np.interp(t, t2, y3)
plt.plot(x, y, "o-", lw=2)
plt.plot(x3, y3, "r", lw=2)
plt.plot(x4, y4, "o", lw=2)
plt.show()
I found this code here:
line smoothing algorithm in python?
My problem is that I need points from the new fit that have exactly the same x values as my original points (the ones I smoothed).
The fit works well, but the x values of the new points are different.
How can I get points from the new fit that have the same x values but the new, fitted y values? The x values of the points start at 0 and the spacing between them should be 1800.
I think what is particular to your case is that the data to smooth form a free curve in the plane, (x, y) = f(t), rather than a function y = f(x).
The trick may be that the points have to be sorted by x before the interpolation, because numpy.interp expects increasing sample points (see numpy.interp):
import numpy as np
import matplotlib.pyplot as plt
from scipy.ndimage import gaussian_filter1d
# Generate random data:
t = np.linspace(0, 3, 20)
x = np.cos(t) + 0.1*np.random.randn(np.size(t))
y = np.sin(t) + 0.1*np.random.randn(np.size(t))
# Smooth the 2D data:
sigma = 2
x_smooth = gaussian_filter1d(x, sigma)
y_smooth = gaussian_filter1d(y, sigma)
# Sort (see: https://stackoverflow.com/a/1903579/8069403)
permutation = x_smooth.argsort()
x_smooth = x_smooth[permutation]
y_smooth = y_smooth[permutation]
x_new = np.sort(x) # not mandatory
# Interpolation on the original x points:
y_smooth_new = np.interp(x_new, x_smooth, y_smooth)
# Plot:
plt.plot(x, y, label='x, y')
plt.plot(x_smooth, y_smooth, label='x_smooth, y_smooth')
plt.plot(x_new, y_smooth_new, '-ro', label='x_new, y_smooth_new', alpha=0.7)
plt.legend()
plt.xlabel('x')
plt.show()
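If the data are in fact a function y = f(x) sampled on an evenly spaced grid (x = 0, 1800, 3600, ... as described in the question), a simpler option may be to smooth only y and leave x untouched. The sketch below assumes exactly that; the synthetic signal, the noise level and sigma=2 are made up for illustration.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

# Evenly spaced x values starting at 0 with a spacing of 1800 (assumption from the question)
x = np.arange(0, 1800 * 50, 1800)

# Hypothetical noisy measurements
rng = np.random.default_rng(0)
y = np.sin(x / 20000) + 0.1 * rng.standard_normal(x.size)

# Smooth y only; sigma is measured in samples, not in x units
y_smooth = gaussian_filter1d(y, sigma=2)

# The smoothed points share the original x values one to one
for xi, yi in list(zip(x, y_smooth))[:3]:
    print(xi, round(float(yi), 3))
```

Because x is never filtered, no interpolation back onto the original grid is needed in this case.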