Intersection between Gaussian - python

I'm just trying to plot two gaussians and to find the intersection point. I have the following code. It's not plotting the exact intersection though and I really cannot figure out why. It's like just barely slightly off but I worked through the derived solution if we took the log of subtracted gaussians and yeah it seems like it should be correct. Can anyone help? Thank you so much!
import numpy as np
import matplotlib.pyplot as plt
def plot_normal(x, mean = 0, sigma = 1):
return 1.0/(2*np.pi*sigma**2) * np.exp(-((x-mean)**2)/(2*sigma**2))
# found online
def solve_gasussians(m1, s1, m2, s2):
a = 1.0/(2.0*s1**2) - 1.0/(2.0*s2**2)
b = m2/(s2**2) - m1/(s1**2)
c = m1**2 /(2*s1**2) - m2**2 / (2.0*s2**2) - np.log(s2/s1)
return np.roots([a,b,c])
s1 = np.linspace(0, 10,300)
s2 = np.linspace(0, 14, 300)
solved_val = solve_gasussians(5.0, 0.5, 7.0, 1.0)
print solved_val
solved_val = solved_val[0]
plt.figure('Baseline Distributions')
plt.title('Baseline Distributions')
plt.xlabel('Response Rate')
plt.ylabel('Probability')
plt.plot(s1, plot_normal(s1, 5.0, 0.5),'r', label='s1')
plt.plot(s2, plot_normal(s2, 7.0, 1.0),'b', label='s2')
plt.plot(solved_val, plot_normal(solved_val, 7.0, 1.0), 'mo')
plt.legend()
plt.show()

You have a small bug in plot_normal function - you are missing square root in the denominator. Proper version:
def plot_normal(x, mean = 0, sigma = 1):
return 1.0/np.sqrt(2*np.pi*sigma**2) * np.exp(-((x-mean)**2)/(2*sigma**2))
gives the expected result:
And two remarks.
Remember that you can have 2 roots of the equation in general (two intersection points), and this is the case with parameters you provided.
As far as I know np.roots gives you approximate result, but you cat get exact result easily, rewriting solve_gasussians function as:
def solve_gasussians(m1, s1, m2, s2):
# coefficients of quadratic equation ax^2 + bx + c = 0
a = (s1**2.0) - (s2**2.0)
b = 2 * (m1 * s2**2.0 - m2 * s1**2.0)
c = m2**2.0 * s1**2.0 - m1**2.0 * s2**2.0 - 2 * s1**2.0 * s2**2.0 * np.log(s1/s2)
x1 = (-b + np.sqrt(b**2.0 - 4.0 * a * c)) / (2.0 * a)
x2 = (-b - np.sqrt(b**2.0 - 4.0 * a * c)) / (2.0 * a)
return x1, x2

I don't know where the mistake lies in your code. But I think I found the code your borrowed from and made part of the adjustment you need.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
def solve(m1,m2,std1,std2):
a = 1/(2*std1**2) - 1/(2*std2**2)
b = m2/(std2**2) - m1/(std1**2)
c = m1**2 /(2*std1**2) - m2**2 / (2*std2**2) - np.log(std2/std1)
return np.roots([a,b,c])
m1 = 5
std1 = 0.5
m2 = 7
std2 = 1
result = solve(m1,m2,std1,std2)
x = np.linspace(-5,9,10000)
plot1=plt.plot(x,[norm.pdf(_,m1,std1) for _ in x])
plot2=plt.plot(x,[norm.pdf(_,m2,std2) for _ in x])
plot3=plt.plot(result[0],norm.pdf(result[0],m1,std1) ,'o')
plt.show()
I will offer two pieces of unsolicited advice that might make life easier for you (in the way they do for me):
When you adapt code try to make small, incremental changes and check that the code still works at each step.
Look for existing free libraries. In this case norm from scipy is a good replacement for what was used in the original code.

The mistake is here. This line:
def plot_normal(x, mean = 0, sigma = 1):
return 1.0/(2*np.pi*sigma**2) * np.exp(-((x-mean)**2)/(2*sigma**2))
Should be this:
def plot_normal(x, mean = 0, sigma = 1):
return 1.0/np.sqrt(2*np.pi*sigma**2) * np.exp(-((x-mean)**2)/(2*sigma**2))
You forgot the sqrt.
It would be wiser to use a pre-existing normal pdf if that's available, such as:
import scipy.stats
def plot_normal(x, mean = 0, sigma = 1):
return scipy.stats.norm.pdf(x,loc=mean,scale=sigma)
It's also possible to solve for the intersections exactly. This answer provides a quadratic equation for the roots of the Gaussians' intersections. Using maxima to solve for x gives the following expression. Which, while complicated, does not rely on iterative methods and can be automatically generated from simpler expressions.
def solve_gaussians(m1,s1,m2,s2):
x1 = (s1*s2*np.sqrt((-2*np.log(s1/s2)*s2**2)+2*s1**2*np.log(s1/s2)+m2**2-2*m1*m2+m1**2)+m1*s2**2-m2*s1**2)/(s2**2-s1**2)
x2 = -(s1*s2*np.sqrt((-2*np.log(s1/s2)*s2**2)+2*s1**2*np.log(s1/s2)+m2**2-2*m1*m2+m1**2)-m1*s2**2+m2*s1**2)/(s2**2-s1**2)
return x1,x2
Putting it altogether gives:
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats
def plot_normal(x, mean = 0, sigma = 1):
return scipy.stats.norm.pdf(x,loc=mean,scale=sigma)
#Use the equation from [this answer](https://stats.stackexchange.com/a/12213/12116) solved for x
def solve_gaussians(m1,s1,m2,s2):
x1 = (s1*s2*np.sqrt((-2*np.log(s1/s2)*s2**2)+2*s1**2*np.log(s1/s2)+m2**2-2*m1*m2+m1**2)+m1*s2**2-m2*s1**2)/(s2**2-s1**2)
x2 = -(s1*s2*np.sqrt((-2*np.log(s1/s2)*s2**2)+2*s1**2*np.log(s1/s2)+m2**2-2*m1*m2+m1**2)-m1*s2**2+m2*s1**2)/(s2**2-s1**2)
return x1,x2
s = np.linspace(0, 14,300)
x = solve_gaussians(5.0,0.5,7.0,1.0)
plt.figure('Baseline Distributions')
plt.title('Baseline Distributions')
plt.xlabel('Response Rate')
plt.ylabel('Probability')
plt.plot(s, plot_normal(s, 5.0, 0.5),'r', label='s1')
plt.plot(s, plot_normal(s, 7.0, 1.0),'b', label='s2')
plt.plot(x[0],plot_normal(x[0],5.,0.5),'mo')
plt.plot(x[1],plot_normal(x[1],5.,0.5),'mo')
plt.legend()
plt.show()
Giving:

Related

Translation from Mathcad to Python

I can't translate a formula from Mathcad to Python. Stuck on "a".
Here's what I was able to do:
from matplotlib import pyplot as plt
import numpy as np
k1 = 1
b = 1.51
D = (1/b) * (np.sqrt(k1/np.pi))
x0 = 10 * b
myArray = np.arange(0, 24, 0.1)
for t in myArray:
S1_t = (k1) / (1 + np.e ** (-(D * myArray - 5)))
S1_deistv = S1_t.real
plt.plot(myArray, S1_deistv, color="black")
plt.show()
As you can see, MathCad is going to:
create an expression containing the symbolic derivative of S1.
finding the root of that expression.
In Python, we have to use different libraries to achieve the same result. In this particular case it is a little bit more convoluted (requires more steps). In particular:
Use SymPy to create a symbolic expression. We can then compute the symbolic derivative.
Use SciPy's root finding algorithms, such as root or bisect, ...
Here is the code: I've added some comments to help you understand.
from matplotlib import pyplot as plt
import numpy as np
from scipy.optimize import root
import sympy as sp
k1 = 1
b = 1.51
D = (1/b) * np.sqrt(k1 / np.pi)
# create a symbol
t = sp.symbols("t")
# create a symbolic expression. Note that we are using the
# exponential function of SymPy (because it is symbolic)
S1 = k1 / (1 + sp.exp(-(D * t - 5)))
print("S1 = ", S1)
# write the expression
# with `diff` we are computing the derivative with respect to `t`
expr = S1 - t * sp.diff(S1, t)
print("expr = ", expr)
# convert the expression to a numerical function so that it
# can be evaluated by Numpy/Scipy
f = sp.lambdify([t], expr)
# plot the symbolic expression to help us decide a good initial
# guess for the root finding algorithm
sp.plot(expr, (t, 0, 24))
# in the interval t in [0, 24] we can see two roots, one at
# about 2 and the other at about 18.
# Let's find the second root.
result = root(f, 18)
print("result", result)
a = result.x[0]
print("a = ", a)
# remember that S1 is a symbolic expression: we can substitute
# t (the symbol) with a (the number)
b = float(S1.subs(t, a))
k = b / a
print("k = ", k)
t_array = np.arange(0, 24, 0.1)
plt.figure()
S1_t = (k1) / (1 + np.e ** (-(D * t_array - 5)))
S1_deistv = S1_t.real
plt.plot(t_array, S1_deistv, color="black")
plt.plot(t_array, k * t_array)
plt.show()
This is the output of the following code:
S1 = 1/(exp(5 - 0.373635485793216*t) + 1)
expr = -0.373635485793216*t*exp(5 - 0.373635485793216*t)/(exp(5 - 0.373635485793216*t) + 1)**2 + 1/(exp(5 - 0.373635485793216*t) + 1)
Function of which we want to find the roots
result fjac: array([[-1.]])
fun: array([-6.66133815e-16])
message: 'The solution converged.'
nfev: 6
qtf: array([3.5682568e-13])
r: array([-0.22395716])
status: 1
success: True
x: array([18.06314347])
a = 18.063143471730815
k = 0.04715849105203411

How to compute the derivative graph of a Python plot

I am working on the following code, which solves a system of coupled differential equations. I have been able to solve them, and I plotted one of them. I am curious how to compute and plot the derivative of this graph numerically (I know the derivative is given in the first function, but suppose I didn't have that). I was thinking that I could use a for-loop, but is there a faster way?
import numpy as np
from scipy.integrate import odeint
from scipy.interpolate import interp1d
import matplotlib.pyplot as plt
import math
def hiv(x,t):
kr1 = 1e5
kr2 = 0.1
kr3 = 2e-7
kr4 = 0.5
kr5 = 5
kr6 = 100
h = x[0] # Healthy Cells -- function of time
i= x[1] #Infected Cells -- function of time
v = x[2] # Virus -- function of time
p = kr3 * h * v
dhdt = kr1 - kr2*h - p
didt = p - kr4*i
dvdt = -p -kr5*v + kr6*i
return [dhdt, didt, dvdt]
print(hiv([1e6, 0, 100], 0))
x0 = [1e6, 0, 100] #initial conditions
t = np.linspace(0,15,1000) #time in years
x = odeint(hiv, x0, t) #vector of the functions H(t), I(t), V(t)
h = x[:,0]
i = x[:,1]
v = x[:,2]
plt.semilogy(t,h)
plt.show()

Python, Four-bar linkage angle-time plot

I'm trying to plot the angle vs. time plot for the output angle of a four-bar linkage (angle fi4 in the image below). This angle is calculated using the solution from the https://scholar.cu.edu.eg/?q=anis/files/week04-mdp206-position_analysis-draft.pdf, page 23.
I'm now trying to plot the fi_4(t) plot and am getting some strange results. The diagram displays the input angle fi2 as blue and output angle fi4 as red. Why is the fi2 fluctuating over time? Shouldn't the fi4 have some sort of sine curve?
Am I missing something here?
Four-bar linkage:
The code:
from __future__ import division
import math
import numpy as np
import matplotlib.pyplot as plt
# Input
#lengths of links (tube testing machine actual lengths)
a = 45.5 #mm
b = 250 #mm
c = 140 #mm
d = 244.244 #mm
# Solution for fi2 being a time function, f(time) = angle
f = 16.7/60 #/s
omega = 2 * np.pi * f #rad/s
t = np.linspace(0, 50, 100)
y = a * np.sin(omega * t)
x = a * np.cos(omega * t)
fi2 = np.arctan(y/x)
# Solution of the vector loop equation
#https://scholar.cu.edu.eg/?q=anis/files/week04-mdp206-position_analysis-draft.pdf
K1 = d/a
K2 = d/c
K3 = (a**2 - b**2 + c**2 + d**2)/(2*a*c)
A = np.cos(fi2) - K1 - K2*np.cos(fi2) + K3
B = -2*np.sin(fi2)
C = K1 - (K2+1)*np.cos(fi2) + K3
fi4_1 = 2*np.arctan((-B+np.sqrt(B**2 - 4*A*C))/(2*A))
fi4_2 = 2*np.arctan((-B-np.sqrt(B**2 - 4*A*C))/(2*A))
# Plot the fi2 time diagram and fi4 time diagram
plt.plot(t, np.degrees(fi2), color = 'blue')
plt.plot(t, np.degrees(fi4_2), color = 'red')
plt.show()
Diagram:
The linespace(0, 50, 100) is too fast. Replacing it with:
t = np.linspace(0, 5, 100)
Second, all the calculations involving the bare np.arctan() are incorrect. You should use np.arctan2(y, x), which determines the correct quadrant (unlike anything based on y/x where the respective signs of x and y are lost). So:
fi2 = np.arctan2(y, x) # not: np.arctan(y/x)
...
fi4_1 = 2 * np.arctan2(-B + np.sqrt(B**2 - 4*A*C), 2*A)
fi4_2 = 2 * np.arctan2(-B - np.sqrt(B**2 - 4*A*C), 2*A)
Putting some labels on your plots and showing both solutions for θ_4:
plt.plot(t, np.degrees(fi2) % 360, color = 'k', label=r'$θ_2$')
plt.plot(t, np.degrees(fi4_1) % 360, color = 'b', label=r'$θ_{4_1}$')
plt.plot(t, np.degrees(fi4_2) % 360, color = 'r', label=r'$θ_{4_2}$')
plt.xlabel('t [s]')
plt.ylabel('degrees')
plt.legend()
plt.show()
With these mods, we get:
BTW, do you want to see an amazingly lazy way of solving problems like these? Much more inefficient than your code, but much easier to derive (e.g. for other structures) without trying to express the closed form of your solution:
from scipy.optimize import fsolve
def polar(r, theta):
return r * np.array((np.cos(theta), np.sin(theta)))
def f(th34, th2):
th3, th4 = th34 # solve simultaneously for theta_3 and theta_4
pb_23 = polar(a, th2) + polar(b, th3) # point B based on links a, b
pb_14 = polar(d, 0) + polar(c, th4) # point B based on links d, c
return pb_23 - pb_14 # error: difference of the two
def solve(th2):
th4_1 = np.array([fsolve(f, [0, -1.5], args=(th2_k,))[1] for th2_k in th2])
th4_2 = np.array([fsolve(f, [0, 1.5], args=(th2_k,))[1] for th2_k in th2])
return th4_1, th4_2
Application:
t = np.linspace(0, 5, 100)
th2 = omega * t
th4_1, th4_2 = solve(th2)
twopi = 2 * np.pi
np.allclose(th4_1 % twopi, fi4_1 % twopi)
# True
np.allclose(th4_2 % twopi, fi4_2 % twopi)
# True
Depending on the structure of your mechanism (e.g. 5 links), you may have more than two solutions, and of course more angles, so you'd have to adapt the code above. But you get the idea.
Be warned: fsolve iterates to find a suitable (close enough) solution, so as I said, it is much slower than your closed form.
Update (some clarification/explanation):
The function f computes the position of the point B in two different ways (via R2-R3 and via R1-R4) and returns the difference (as a vector). We solve for the difference to be zero.
That function takes two arguments: one 2-dimensional variable (th34, which is an array [th3, th4]) and one parameter th2; the parameter is constant during one run of fsolve.
The values [0, -1.5] and [0, 1.5] are initialization values (guesses) for th34 (th3 and th4). We call fsolve twice to get the two possible solutions.
All angles refer to your figure. I use th for θ (theta, not phi), but I kept along the original fi4_1 and fi4_2 for comparison.
Modulo 2*pi, th4_1 should be equal to fi4_1 etc., which is tested by np.allclose to account for numerical rounding errors.

The generalized Student-T probability distribution I coded in Python doesn't integrate to 1 (in some cases)

I've been trying to implement the skewed generalized t distribution in Python to model some financial returns. I based my code on formulas found on Wikipedia, and I used the Beta distribution from scipy.
from scipy.special import beta
import numpy as np
from math import sqrt
def sgt(x, params):
# This function accepts an array of 5 parameters [mu, sigma, lambda, p, q]
mu, sigma, lam, p, q = params
v = (q**(-1/p)) / (sqrt((3*lam*lam + 1)*beta(3/p, q-2/p)/beta(1/p, q) - 4*lam*lam*(beta(2/p, q-1/p)/(beta(1/p, q)))**2))
m = 2*v*sigma*lam*q**(1/p)*beta(2/p, q - 1/p) / beta(1/p, q)
fx = p / (2*v*sigma*(q**(1/p))*beta(1/p, q)*((abs(x-mu+m)**p/(q*(v*sigma)**p*(lam*np.sign(x-mu+m)+1)**p + 1)+1)**(1/p + q)))
return fx
Now, the function seems to work perfectly fine for some sets of parameters, but terribly for other sets of parameters.
For example:
dx = 0.001
x_axis = np.arange(-10, 10, dx)
ok_parameters = [0, 2, 0, 3, 8]
bad_parameters = [0, 2, 0, 1.05, 2.1]
ok_distribution = sgt(x_axis, ok_parameters)
bad_distribution = sgt(x_axis, bad_parameters)
If I try to compute the integrals of those two numbers:
a = np.sum(ok_distribution*dx)
b = np.sum(bad_distribution*dx)
I obtain the results a = 1.0013233154393804 and b = 2.2799746093533346.
Now, in theory both of these should be 1, but I assume since I approximated the integral the value won't always be exactly 1. In the second case however I don't understand why the value is so high.
Does anyone know what the issue is?
These are the graphs of the ok distribution (blue) and bad distribution (orange)
I believe there was just a typo (though I couldn't exactly find where) in your definition sgt. Here is an implementation that works.
%matplotlib inline
import matplotlib.pyplot as plt
from scipy.special import beta
import numpy as np
from math import sqrt
from typing import Union
from scipy import integrate
# Generalised Student T probability Distribution
def generalized_student_t(x:Union[float, np.ndarray], mu:float, sigma:float,
lam:float, p:float, q:float) \
-> Union[float, np.ndarray]:
v = q**(-1/p) * ((3*lam**2 + 1)*(beta(3/p, q - 2/p)/beta(1/p,q)) - 4*lam**2*(beta(2/p, q - 1/p)/beta(1/p,q))**2)**(-1/2)
m = 2*v*sigma*lam*q**(1/p)*beta(2/p,q - 1/p)/beta(1/p,q)
fx = p / (2*v*sigma*q**(1/p)*beta(1/p,q)*(abs(x-mu+m)**p/(q*(v*sigma)**p)*(lam*np.sign(x-mu+m)+1)**p + 1)**(1/p + q))
return fx
def plot_cdf_pdf(x_axis:np.ndarray, pmf:np.ndarray) -> None:
"""
Plot the PDF and CDF of the array returned from the function.
"""
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 6))
ax1.plot(x_axis, pmf)
ax1.set_title('PDF')
ax2.plot(x_axis, integrate.cumtrapz(x=x_axis, y=pmf, initial = 0))
ax2.set_title('CDF')
pass
dx = 0.0001
x_axis = np.arange(-10, 10, dx)
# Create the Two
distribution1 = generalized_student_t(x=x_axis, mu=0, sigma=1, lam=0, p=2, q=100)
distribution2 = generalized_student_t(x=x_axis, mu=0, sigma=2, lam=0, p=1.05, q=2.1)
plot_cdf_pdf(x_axis=x_axis, pmf=distribution1)
plot_cdf_pdf(x_axis=x_axis, pmf=distribution2)
We can also check that the integral of the PDFs are 1
integrate.simps(x=x_axis, y = distribution1)
integrate.simps(x=x_axis, y = distribution2)
We can see the results of the integral are 0.99999999999999978 and 0.99752026308335162. The reason they are not exactly 1 is due the CDF being defined as integral from -infinity to infinity of the PDF.

Differential equations in Python: Single time step

I would like to be able to do something else in each time step during solving an ODE (using scipy's integrate). Is there a way to do so? Can I somehow write my own time loop and just call a single e.g. Runge-Kutta step by myself? Is there a routine in python or would I have to come up with my own? I think there must be one since odeint etc have to use such a function. So the question is, how do I access them?
So it should looking something along these lines:
from scipy.integrate import *
from pylab import *
def deriv(y, t):
a = -2.0
b = -0.1
return array([y[1], a*y[0]+b*y[1]])
time = linspace(0.0, 10.0, 1000)
dt = 10.0/(1000-1)
yinit = array([0.0005, 0.2])
for t in time:
# doSomething, write into a file or whatever
y[t] = yinit
yinit = RungeKutta(deriv, yinit, t, dt, varargs)
I now came along with this:
from pylab import *
from scipy.integrate import *
def RHS(t, x):
return -x
min_t = 0.0
max_t = 10.0
num_t = 1e2
grid_t = linspace(min_t, max_t, num_t)
grid_dt = (max_t - min_t)/(num_t - 1)
y = zeros(num_t, dtype=complex)
y[0] = complex(1.0, 0.0)
solver = complex_ode(RHS)
solver.set_initial_value(y[0], grid_t[0]).set_integrator('dopri5')
for idx in range(1, int(num_t)):
solver.integrate(solver.t + grid_dt)
y[idx] = solver.y[0]
In here I can do whatever I want during the integration.

Categories

Resources