How to calculate a Normal Distribution percent point function in python

How do I do the equivalent of scipy.stats.norm.ppf without using SciPy? Python's math module has erf built in, but I cannot work out how to recreate the function.
PS: I cannot just use SciPy because Heroku does not allow you to install it, and using alternate buildpacks breaches the 300 MB maximum slug size limit.

There isn't a simple way to use erf directly to implement norm.ppf, because norm.ppf is related to the inverse of erf (norm.ppf(p) = sqrt(2) * erfinv(2p - 1)), and the math module does not provide erfinv. Instead, here's a pure Python port of the code SciPy uses (the Cephes ndtri routine). The function ndtri should return the same value as norm.ppf, up to floating-point rounding:
import math
s2pi = 2.50662827463100050242E0
P0 = [
-5.99633501014107895267E1,
9.80010754185999661536E1,
-5.66762857469070293439E1,
1.39312609387279679503E1,
-1.23916583867381258016E0,
]
Q0 = [
1,
1.95448858338141759834E0,
4.67627912898881538453E0,
8.63602421390890590575E1,
-2.25462687854119370527E2,
2.00260212380060660359E2,
-8.20372256168333339912E1,
1.59056225126211695515E1,
-1.18331621121330003142E0,
]
P1 = [
4.05544892305962419923E0,
3.15251094599893866154E1,
5.71628192246421288162E1,
4.40805073893200834700E1,
1.46849561928858024014E1,
2.18663306850790267539E0,
-1.40256079171354495875E-1,
-3.50424626827848203418E-2,
-8.57456785154685413611E-4,
]
Q1 = [
1,
1.57799883256466749731E1,
4.53907635128879210584E1,
4.13172038254672030440E1,
1.50425385692907503408E1,
2.50464946208309415979E0,
-1.42182922854787788574E-1,
-3.80806407691578277194E-2,
-9.33259480895457427372E-4,
]
P2 = [
3.23774891776946035970E0,
6.91522889068984211695E0,
3.93881025292474443415E0,
1.33303460815807542389E0,
2.01485389549179081538E-1,
1.23716634817820021358E-2,
3.01581553508235416007E-4,
2.65806974686737550832E-6,
6.23974539184983293730E-9,
]
Q2 = [
1,
6.02427039364742014255E0,
3.67983563856160859403E0,
1.37702099489081330271E0,
2.16236993594496635890E-1,
1.34204006088543189037E-2,
3.28014464682127739104E-4,
2.89247864745380683936E-6,
6.79019408009981274425E-9,
]
def ndtri(y0):
    if y0 <= 0 or y0 >= 1:
        raise ValueError("ndtri(x) needs 0 < x < 1")
    negate = True
    y = y0
    # 0.13533528323661269189 is exp(-2)
    if y > 1.0 - 0.13533528323661269189:
        y = 1.0 - y
        negate = False
    if y > 0.13533528323661269189:
        # central region: rational approximation in (y - 0.5)
        y = y - 0.5
        y2 = y * y
        x = y + y * (y2 * polevl(y2, P0) / polevl(y2, Q0))
        x = x * s2pi
        return x
    # tails: expansion in terms of sqrt(-2 log y)
    x = math.sqrt(-2.0 * math.log(y))
    x0 = x - math.log(x) / x
    z = 1.0 / x
    if x < 8.0:
        x1 = z * polevl(z, P1) / polevl(z, Q1)
    else:
        x1 = z * polevl(z, P2) / polevl(z, Q2)
    x = x0 - x1
    if negate:
        x = -x
    return x
def polevl(x, coef):
    # Horner evaluation; coef runs from the highest-degree coefficient down to the constant term
    accum = 0
    for c in coef:
        accum = x * accum + c
    return accum
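If the Heroku runtime is Python 3.8 or newer, the standard library also ships an inverse normal CDF, statistics.NormalDist.inv_cdf, which avoids both SciPy and the coefficient tables above. A short sketch, checked against a CDF built from math.erf; the helper names norm_ppf and norm_cdf are illustrative and not from the answer:

```python
import math
from statistics import NormalDist  # standard library, Python 3.8+


def norm_ppf(p, mu=0.0, sigma=1.0):
    # inv_cdf raises statistics.StatisticsError unless 0 < p < 1
    return NormalDist(mu, sigma).inv_cdf(p)


def norm_cdf(x):
    # Standard normal CDF written with math.erf, used here for a round-trip check
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


for p in (0.025, 0.123, 0.5, 0.975):
    x = norm_ppf(p)
    print(p, x, norm_cdf(x))
```

The round trip norm_cdf(norm_ppf(p)) should reproduce p to within floating-point rounding.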

The function ppf is the inverse of y = (1 + erf(x/sqrt(2)))/2, so we need to solve this equation for x, given y between 0 and 1. Writing u = x/sqrt(2), the code below solves erf(u) = 2y - 1 by the bisection method and returns sqrt(2)*u. I imported SciPy only to show that the result is the same.
from math import erf, sqrt
from scipy.stats import norm  # only for comparison
y = 0.123
z = 2*y - 1
a = 0
while erf(a) > z or erf(a+1) < z:  # look for an initial bracket of size 1
    if erf(a) > z:
        a -= 1
    else:
        a += 1
b = a + 1  # found a bracket, proceed to refine it
while b - a > 1e-15:  # 1e-15 ought to be enough precision
    c = (a + b)/2.0  # bisection method
    if erf(c) > z:
        b = c
    else:
        a = c
print(sqrt(2)*(a + b)/2.0)  # this is the answer
print(norm.ppf(y))  # SciPy for comparison
Left for you to do:
- preliminary bounds checks (y must be strictly between 0 and 1);
- scaling and shifting if another mean or variance is desired; the code is for the standard normal distribution (mean 0, variance 1).
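A sketch of those two remaining steps, wrapped into a function. It makes two assumptions that are not in the answer: a fixed bracket of [-6, 6] instead of the search loop (math.erf(±6) already rounds to ±1.0 in double precision, so this should bracket any 2p - 1 strictly inside (-1, 1)), and a fixed number of halvings instead of the 1e-15 width test, so the loop always ends even once the bracket shrinks to adjacent floats. The name norm_ppf_bisect is hypothetical.

```python
import math


def norm_ppf_bisect(p, mu=0.0, sigma=1.0, iterations=100):
    # Bounds checks left to the reader in the answer above.
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    # Solve erf(u) = 2p - 1 for u; the standard-normal quantile is sqrt(2) * u.
    target = 2.0 * p - 1.0
    lo, hi = -6.0, 6.0  # assumed bracket, see the note above
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if math.erf(mid) < target:
            lo = mid
        else:
            hi = mid
    u = 0.5 * (lo + hi)
    # Shift and scale from the standard normal to mean mu, standard deviation sigma.
    return mu + sigma * math.sqrt(2.0) * u


print(norm_ppf_bisect(0.123))
print(norm_ppf_bisect(0.975, mu=10.0, sigma=2.0))
```

Precision degrades in the far tails, because forming 2p - 1 already rounds away most of the information in a very small p.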

Related

Minimize system of nonlinear equation (integral on exponent)

General:
I am using maximum entropy to find a distribution on vectors of positive integers. I can estimate the mean and variance, so I have three equations, and I am trying to find a, b and c.
The equations (all integrals over [0, ∞)):
$\int_0^\infty e^{a x^2 + b x + c}\,dx - 1 = 0$
$\int_0^\infty x\,e^{a x^2 + b x + c}\,dx - mean = 0$
$\int_0^\infty x^2 e^{a x^2 + b x + c}\,dx - mean^2 - var = 0$
The problem:
I am trying to use a numerical solver: I used SciPy's fsolve, with SymPy computing the integrals.
But I guess I am missing some knowledge.
My code:
import numpy as np
import sympy as sym
from scipy.optimize import *
def myFunction(x,*data):
    y = sym.symbols('y')
    m,v=data
    F = [0]*3
    x[0] = - abs(x[0])
    print(x)
    F[0] = (sym.integrate(sym.exp(x[0] * y ** 2 + x[1] * y + x[2]), (y, 0,sym.oo)) -1).evalf()
    F[1] = (sym.integrate(y*sym.exp(x[0] * y ** 2 + x[1] * y + x[2]), (y, 0,sym.oo))-m).evalf()
    F[2] = (sym.integrate((y**2)*sym.exp(x[0] * y ** 2 + x[1] * y + x[2]), (y,0,sym.oo)) -v-m).evalf()
    print(F)
    return F
data = (10,3.5) # mean and var for example
xGuess = [1, 1, 1]
z = fsolve(myFunction,xGuess,args = data)
print(z)
My results are not that accurate; is there a better way to solve it? The three equations evaluate to:
integral(exp(a*x^2+b*x+c)) - 1 = 5.67659292676884
integral(x*exp(a*x^2+b*x+c)) - mean = −1.32123173796713
integral(x^2*exp(a*x^2+b*x+c)) - mean^2 - var = −2.20825624606312
Thanks

I have rewritten the problem, replacing SymPy with NumPy, SciPy's quad and lambdas (inline functions), and minimizing the sum of squared residuals with scipy.optimize.minimize.
Also note that in your problem statement the third equation subtracts $mean^2$, but your code subtracts only $mean$.
import numpy as np
from scipy.optimize import minimize
from scipy.integrate import quad
def myFunction(x,data):
    m,v=data
    F = np.zeros(3) # use numpy array
    # use scipy.integrate.quad for integration of lambda functions
    # quad output is (result, error), so we just select the result value at the end
    F[0] = quad(lambda y: np.exp(x[0] * y ** 2 + x[1] * y + x[2]), 0, np.inf)[0] -1
    F[1] = quad(lambda y: y*np.exp(x[0] * y ** 2 + x[1] * y + x[2]), 0, np.inf)[0] -m
    F[2] = quad(lambda y: (y**2)*np.exp(x[0] * y ** 2 + x[1] * y + x[2]), 0, np.inf)[0] -v-m**2
    # minimize the squared error
    return np.sum(F**2)
data = (10,3.5) # mean and var for example
xGuess = [-1, 1, 1]
z = minimize(lambda x: myFunction(x, data), x0=xGuess,
             bounds=((None, 0), (None, None), (None, None))) # use bounds for negative first coefficient
print(z)
# x: array([-0.99899311, 2.18819689, 1.85313181])
Does this seem more reasonable?
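Another option, sketched under an extra assumption: when the mean is several standard deviations above 0 (here 10 / sqrt(3.5) ≈ 5.3), the truncation at 0 barely matters, and exp(a*x^2 + b*x + c) should then be close to the Normal(mean, var) density, i.e. a = -1/(2 var), b = mean/var, c = -mean^2/(2 var) - log(2π var)/2. Using that as the starting point gives fsolve a guess that is already nearly a root. The finite upper limit (mean + 40 standard deviations) and the points= split at the mean are choices made for this sketch, not part of either post; it uses the corrected third equation (mean^2 + var).

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import fsolve


def residuals(params, mean, var):
    a, b, c = params
    density = lambda y: np.exp(a * y**2 + b * y + c)
    # Assumption: the tail beyond mean + 40 sd is negligible for a < 0.
    upper = mean + 40.0 * np.sqrt(var)
    # Splitting the range at the mean helps quad resolve the narrow peak.
    m0 = quad(density, 0, upper, points=[mean])[0]
    m1 = quad(lambda y: y * density(y), 0, upper, points=[mean])[0]
    m2 = quad(lambda y: y**2 * density(y), 0, upper, points=[mean])[0]
    return [m0 - 1.0, m1 - mean, m2 - mean**2 - var]


mean, var = 10.0, 3.5
# Normal-density coefficients as the initial guess.
guess = [-1.0 / (2 * var), mean / var,
         -mean**2 / (2 * var) - 0.5 * np.log(2 * np.pi * var)]
solution = fsolve(residuals, guess, args=(mean, var))
print(solution)
print(residuals(solution, mean, var))
```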

What is the cause of the artifacts of this convoluted signal?

I am trying to find out the cause of the artifacts that appear after convolution; they can be seen in the plot around x = -0.0016 and x = 0.0021 (see the code below). I am convolving the "Lorentzian" function (or the derivative of the Langevin function), which I define in the code, with two Dirac impulses in the function "distrib".
I would appreciate your help.
Thank you
Here is my code:
import numpy as np
import matplotlib.pyplot as plt
def Lorentzian(xx):
    if not hasattr(xx, '__iter__'):
        xx = [ xx ]
    res = np.zeros(len(xx))
    for i in range(len(xx)):
        x = xx[i]
        if np.fabs(x) < 0.1:
            # series expansion near 0 avoids cancellation in 1/x**2 - 1/sinh(x)**2
            res[i] = 1./3. - x**2/15. + 2.* x**4 / 189. - x**6/675. + 2.* x**8 / 10395. - 1382. * x**10 / 58046625. + 4. * x**12 / 1403325.
        else:
            res[i] = (1./x**2 - 1./np.sinh(x)**2)
    return res
amp = 18e-3
a = 1/.61e3
b = 5.5
t_min = 0
dt = 1/5e6
t_max = (10772) * dt
t = np.arange(t_min,t_max,dt)
x_min = -amp/b
x_max = amp/b
dx = dt*(x_min-x_max)/(t_min-t_max)
x = np.arange(x_min,x_max,dx)
func1 = lambda x : Lorentzian(b*(x/a))
def distrib(x):
    res = np.zeros(np.size(x))
    res[int(np.floor(np.size(x)/3))] = 1
    res[int(3*np.floor(np.size(x)/4))] = 3
    return res
func2 = lambda x,xs : np.convolve(distrib(x), func1(xs), 'same')
plt.plot(x, func2(x,x))
plt.xlabel('x (m)')
plt.ylabel('normalized signal')
plt.show()

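The artifacts are steps caused by truncating the kernel. func1 does not decay to zero at the ends of your x window; it sits on a "pedestal":
func1(x)[0], func1(x)[-1]
Out[7]: (0.0082945964013920719, 0.008297677313152443)
With mode 'same', each Dirac impulse places a full-window copy of func1 centered on itself, and that copy ends abruptly half a window (about 0.0033) away from the impulse. For your impulses this happens near x = 0.0021 (right end of the copy from the first impulse) and x = -0.0016 (left end of the copy from the second impulse, which is three times larger).
Try removing the pedestal of func1; just subtract it:
func2 = lambda x,xs : np.convolve(distrib(x), func1(xs)-func1(x)[0], 'same')
This gives a smooth convolution curve.
Depending on the result you want, you may have to add the pedestal back in afterwards, weighted by the sum of the Dirac amplitudes.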
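A small self-contained check of that explanation. It uses 2001 grid points instead of the question's ~10772 to keep it quick; the point count, the vectorized helper langevin_derivative and the neighbouring-sample step comparison are assumptions for illustration. It convolves the two impulses with the kernel as-is and with the pedestal subtracted, then reports the largest jump between neighbouring samples:

```python
import numpy as np


def langevin_derivative(u):
    """1/u**2 - 1/sinh(u)**2, using a two-term series near u = 0."""
    u = np.asarray(u, dtype=float)
    out = 1.0 / 3.0 - u**2 / 15.0  # series value, accurate for small |u|
    big = np.abs(u) > 1e-2
    out[big] = 1.0 / u[big] ** 2 - 1.0 / np.sinh(u[big]) ** 2
    return out


# Parameters from the question; the grid size is a hypothetical smaller one.
a, b, amp = 1 / 0.61e3, 5.5, 18e-3
x = np.linspace(-amp / b, amp / b, 2001)
kernel = langevin_derivative(b * x / a)

impulses = np.zeros_like(x)
impulses[len(x) // 3] = 1.0
impulses[3 * (len(x) // 4)] = 3.0

raw = np.convolve(impulses, kernel, mode="same")
fixed = np.convolve(impulses, kernel - kernel[0], mode="same")

# The truncated kernel edges appear as steps of about pedestal * amplitude in `raw`.
print("pedestal:", kernel[0])
print("max step, raw:  ", np.max(np.abs(np.diff(raw))))
print("max step, fixed:", np.max(np.abs(np.diff(fixed))))
```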

CORDIC algorithm returning bad numbers

I started to implement a CORDIC algorithm from scratch and I don't know what I'm missing. Here's what I have so far:
from __future__ import division  # must come before any other statement
import math
# angles
n = 5
angles = []
for i in range (0, n):
    angles.append(math.atan(1/math.pow(2,i)))
# constants
kn = []
fator = 1.0
for i in range (0, n):
    fator = fator * (1 / math.pow(1 + (2**(-i))**2, (1/2)))
    kn.append(fator)
# taking an initial point p = (x,y) = (1,0)
z = math.pi/2 # Angle to be calculated
x = 1
y = 0
for i in range (0, n):
    if (z < 0):
        x = x + y*(2**(-1*i))
        y = y - x*(2**(-1*i))
        z = z + angles[i]
    else:
        x = x - y*(2**(-1*i))
        y = y + x*(2**(-1*i))
        z = z - angles[i]
x = x * kn[n-1]
y = y * kn[n-1]
print(x, y)
When I plug in z = π/2 it returns 0.00883479322917 and 0.107149125055, which makes no sense.
Any help will be great!
Edit: I made some changes, and my code now has these lines instead of those ones:
x0, y0 = x, y  # assumed: start from the initial point (1, 0); this line was not shown in the edit
for i in range (0, n):
    if (z < 0):
        x = x0 + y0*(2**(-1*i))
        y = y0 - x0*(2**(-1*i))
        z = z + angles[i]
    else:
        x = x0 - y0*(2**(-1*i))
        y = y0 + x0*(2**(-1*i))
        z = z - angles[i]
    x0 = x
    y0 = y
x = x * kn[n-1]
y = y * kn[n-1]
Now it's working much better. The problem was that I wasn't using temporary variables such as x0 and y0, so the y update used the x value that had already been changed in the same iteration. Now when I plug in z = π/2 it gives me better numbers, (4.28270993661e-13, 1.0) :)
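For reference, a compact sketch of rotation-mode CORDIC that avoids the temporaries by using Python's tuple assignment, which evaluates the whole right-hand side before assigning. The function name, the default of 40 iterations and the restriction to |theta| <= π/2 (inside the roughly 1.74 rad that the atan(2^-i) table can reach) are assumptions of this sketch:

```python
import math


def cordic_cos_sin(theta, n=40):
    """Rotation-mode CORDIC for |theta| <= pi/2; returns (cos, sin)."""
    angles = [math.atan(2.0 ** -i) for i in range(n)]
    # Product of the per-step gain corrections 1/sqrt(1 + 2^(-2i)).
    k = 1.0
    for i in range(n):
        k /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = 1.0, 0.0, theta
    for i in range(n):
        d = 1.0 if z >= 0 else -1.0
        # Both right-hand sides use the old x and y, which is what the
        # temporaries x0 and y0 achieve in the edited loop above.
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return x * k, y * k


print(cordic_cos_sin(math.pi / 2))
print(cordic_cos_sin(1.0), (math.cos(1.0), math.sin(1.0)))
```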

Python - Cutting an array at a designated point based on value in row

I have a 300 x 4 matrix called X created by the odeint function. In the second column are y-values and I would like to cut the matrix when the y-value dips below 0. As a first step I was attempting to create a function that would read the second column and spit out the row number where the column first dips below 0.
X = odeint(func, X0, t)
Yval = X[:,1]
def indexer():
    i = 0
    if Yval[i] > 0:
        i = i + 1
    if Yval[i] < 0:
        return i
This is not working, and conceptually I know it is wrong; I just couldn't think of another way to do it. Is there a way to cut out all the rows that contain and follow the first negative y value?
This is my entire code:
import numpy as np
import math
from scipy.integrate import odeint
g = 9.8
theta = (45 * math.pi)/180
v0 = 10.0
k = 0.3
x0 = 0
y0 = 0
vx0 = v0*math.cos(theta)  # horizontal velocity component
vy0 = v0*math.sin(theta)  # vertical velocity component
def func(i_state,time):
    f = np.zeros(4)
    f[0] = i_state[2]
    f[1] = i_state[3]
    f[2] = -k*(f[0]**2 + f[1]**2)**(.5)*f[0]
    f[3] = -g - k*(f[0]**2 + f[1]**2)**(.5)*f[1]
    return f
X0 = [x0, y0, vx0, vy0]
t0 = 0
tf = 3
timestep = 0.01
nsteps = int(round((tf - t0)/timestep))  # linspace needs an integer number of points
t = np.linspace(t0, tf, num = nsteps)
X = odeint(func, X0, t)
Yval = X[:,1]
def indexer():
    i = 0
    if Yval[i] > 0:
        i = i + 1
    if Yval[i] < 0:
        return i

Maybe you could use the takewhile function from the itertools package:
from itertools import takewhile
first_elements = list(takewhile(lambda x: x[1] >= 0, X))
Where X is your matrix. I used x[1] in the lambda predicate to compare the numbers in the second column.
Here, first_elements will be the rows of the matrix before the first row that contains a value less than zero. You can use len(first_elements) to know what the cutoff point was.
I converted it to a list but you don't have to if you are just going to iterate through the result.
I hope this works.

You could do something like this:
newVals = []
i = 0
while i < len(X) and X[i][1] >= 0:
    newVals.append(X[i])
    i += 1
This goes through X and appends rows to the list newVals until you reach either the end of X (i < len(X) becomes false) or the first row whose second column is negative (X[i][1] >= 0 becomes false).
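A NumPy-only alternative, sketched here with a hypothetical function name: build a boolean mask for the second column and use np.argmax, which returns the index of the first True. The any() check is needed because argmax also returns 0 when the mask contains no True at all.

```python
import numpy as np


def cut_before_first_negative(X, col=1):
    """Return the rows of X that come before the first negative value in column `col`."""
    negative = X[:, col] < 0
    if not negative.any():
        return X  # no negative value, nothing to cut
    first = int(np.argmax(negative))  # index of the first True in the mask
    return X[:first]
```

With the question's trajectory, X_cut = cut_before_first_negative(X) keeps every row up to the last non-negative y value, and len(X_cut) is the cutoff index.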

exercise 7.2: Think Python

"Encapsulate this loop in a function called square_root that takes a as a parameter, chooses a reasonable value of x, and returns an estimate of the square root of a."
def square_root(a):
    x = 2
    y = (x + a/x) / 2
    epsilon = 0.00000000001
    if abs(y - x) < epsilon:
        print(y)
    while abs(y - x) > epsilon:
        x = x + 1
        y = (x + a/x) / 2
        break
    else:
        return
    print(y)
square_root(33)
Up to a = 33 it estimates the correct square root. After that the estimate starts jumping, to the point where, when I pass 100 for a, it guesses the square root to be approximately 18. I don't know if this is just the nature of estimating. I know how to find the precise square root, but this is an exercise from the book "Think Python", meant to practice recursion and thinking algorithmically.

You shouldn't be incrementing x by 1 in the loop body. You should be setting x to y (look at the Wikipedia article and notice how x3 depends on x2, and so on):
while abs(y - x) > epsilon:
    x = y
    y = (x + a/x) / 2
You want to get rid of that break as well, as it makes your while loop pointless. Your final code will then be:
def square_root(a):
    x = 2
    y = (x + a/x) / 2
    epsilon = 0.00000000001
    if abs(y - x) < epsilon:
        print(y)
    while abs(y - x) > epsilon:
        x = y
        y = (x + a/x) / 2
    print(y)
But there's still room for improvement. Here's how I'd write it:
def square_root(a, epsilon=0.001):
    # Initial guess also coerces `a` to a float
    x = a / 2.0
    while True:
        y = (x + a / x) / 2
        if abs(y - x) < epsilon:
            return y
        x = y
Also, since Python's floating-point type doesn't have infinite precision (you only get about 15 significant digits anyway), you might as well remove epsilon and stop when the estimate stops changing:
def square_root(a):
    x = a / 2.0
    while True:
        y = (x + a / x) / 2
        # You've reached Python's max float precision
        if x == y:
            return x
        x = y
But this last version might not terminate if y oscillates between two values.
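One way to address that termination concern, sketched under assumptions that are not from the answers above: compare successive estimates with a relative tolerance, cap the number of iterations, and start from a power-of-two guess near sqrt(a) taken from math.frexp, so very large or very small inputs still converge in a few steps.

```python
import math


def square_root(a, rel_tol=1e-15, max_iter=100):
    """Newton (Heron) iteration with a relative tolerance and an iteration cap."""
    if a < 0:
        raise ValueError("a must be non-negative")
    if a == 0:
        return 0.0
    # a = m * 2**e with 0.5 <= m < 1, so 2**(e/2) is within a factor of about 1.4 of sqrt(a).
    _, e = math.frexp(a)
    x = 2.0 ** (e / 2.0)
    for _ in range(max_iter):
        y = (x + a / x) / 2.0
        if abs(y - x) <= rel_tol * y:
            return y
        x = y
    return x  # cap reached: return the latest estimate instead of looping forever


print(square_root(2), square_root(100), square_root(1e300))
```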

Here's another way; it just updates x instead of using a y.
def square_root(a):
    x = a / 2.0
    epsilon = 0.00000001
    while abs(a - (x**2)) > epsilon:
        x = (x + a/x) / 2
    return x

If you want it more abstract, create good-enough? and guess as primitives, as in the classic text SICP: http://mitpress.mit.edu/sicp/full-text/sicp/book/node108.html

It's simple: just take x = a and then apply Newton's formula, like this:
def square_root(a):
    x = a
    while True:
        print(x)
        y = (x + a/x)/2
        if abs(y - x) < 0.0000001:
            return y
        x = y
