Calculate very large number using Python

I'm trying to calculate (3e28 choose 2e28) / 2^(3e28). I tried scipy.misc.comb to compute 3e28 choose 2e28, but it gave me inf. When I compute 2^(3e28), it raises OverflowError: (34, 'Result too large'). How can I compute or estimate (3e28 choose 2e28) / 2^(3e28)?

Use Stirling's approximation (which is very accurate in the 1e10+ range), combined with logarithms:
(3e28 choose 2e28) / 2^(3e28) = 3e28! / [(3e28 - 2e28)! * 2e28!] / 2^(3e28)
= e^ [log (3e28!) - log((3e28-2e28)!) - log(2e28!) - 3e28 * log(2)]
and from there apply Stirling's approximation:
log n! ~= log(sqrt(2*pi*n)) + n*log(n) - n
and you'll get your answer.
Here's an example of how accurate this approximation is:
>>> import math
>>> math.log(math.factorial(100))
363.73937555556347
>>> math.log((2*math.pi*100)**.5) + 100*math.log(100) - 100
363.7385422250079
For 100!, it's off by less than 0.01% in log-space.
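Putting this together as code (a minimal sketch; log_ratio_stirling is a name I've chosen, and the result is the natural log of the desired quantity):

import math

def log_ratio_stirling(n, k):
    # log[(n choose k) / 2**n] via Stirling's approximation
    def log_fact(m):
        # log m! ~= log(sqrt(2*pi*m)) + m*log(m) - m
        return math.log(math.sqrt(2 * math.pi * m)) + m * math.log(m) - m
    return log_fact(n) - log_fact(n - k) - log_fact(k) - n * math.log(2)

print(log_ratio_stirling(3e28, 2e28))   # ~ -1.699e27

So the ratio itself is about e^(-1.7e27), far below the smallest positive float.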

You can compute this ratio with the normal approximation to the binomial for large n. When n is large, k has to be relatively close to n/2 for (n choose k) / 2^n to not be negligible.
Code
Here's some code that will compute this:
import numpy as np

def n_choose_k_over_2_pow_n(n, k):
    # mean and standard deviation of the normal approximation
    # to a Binomial(n, 1/2) variable
    mu = n / 2.
    sigma = np.sqrt(n) / 2.
    # transform to a standard normal variable
    z = (k - mu) / sigma
    # normal pdf with mean mu and standard deviation sigma, evaluated at k
    return np.exp(-z**2 / 2.) / (sigma * np.sqrt(2 * np.pi))
So that (values rounded):
>>> n_choose_k_over_2_pow_n(3e28, 2e28)
0.0
>>> n_choose_k_over_2_pow_n(3e28, 1.5e28)
4.6066e-15
As you can see, the computation underflows. A solution is to compute the log of the answer, which we can do with this code:
def log_n_choose_k_over_2_pow_n(n, k):
    # mean and standard deviation of the normal approximation
    mu = n / 2.
    sigma = np.sqrt(n) / 2.
    # transform to a standard normal variable
    z = (k - mu) / sigma
    # log of the normal pdf with mean mu and standard deviation sigma at k
    return -(np.log(2 * np.pi) + z**2) / 2. - np.log(sigma)
Another quick check (values rounded):
>>> log_n_choose_k_over_2_pow_n(3e28, 2e28)
-1.6666666666666666e+27
>>> log_n_choose_k_over_2_pow_n(3e28, 1.5e28)
-33.0112888
If we exponentiate these, we'll get our previous answers.
Explanation
We can do this by an appeal to results from statistics. The binomial distribution is given by:
P(K = k) = (n choose k) p^k * (1-p)^(n-k)
For large n, this is well-approximated by the normal distribution with mean n*p and variance n*p*(1-p).
Set p to be 1/2. Then we have:
P(K = k) = (n choose k) (1/2)^k * (1/2)^(n-k)
= (n choose k) (1/2)^n
= (n choose k) / (2^n)
Which is precisely the form of your ratio. Therefore, your ratio is approximately the pdf of that normal distribution, with mean mu = n/2 and standard deviation sigma = sqrt(n)/2, evaluated at k: exp(-z^2/2) / (sigma * sqrt(2*pi)), where z = (k - mu) / sigma. (Far out in the tail, as with k = 2e28, this gives the right order of magnitude for the log but not an accurate value; compare the Stirling- and gammaln-based answers, which give about -1.70e27 for the natural log.)
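As a quick sanity check (not from the original answer; assumes Python 3.8+ for math.comb), the approximation can be compared with the exact ratio at a moderate n:

import math

n, k = 1000, 520
exact = math.comb(n, k) / 2**n
sigma = math.sqrt(n) / 2
z = (k - n / 2) / sigma
approx = math.exp(-z**2 / 2) / (sigma * math.sqrt(2 * math.pi))
print(exact, approx)   # both ~ 0.0113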

The following uses log2comb from my answer here:
from math import log
from scipy.special import gammaln

def log2comb(n, k):
    return (gammaln(n+1) - gammaln(n-k+1) - gammaln(k+1)) / log(2)

log2p = log2comb(3e28, 2e28) - 3e28
print("log2p =", log2p)
which prints
log2p = -2.45112497837e+27
So the base-2 logarithm of your number is about -2.45e27. If you try to compute 2**log2p, you get 0. That is, the number is smaller than the smallest positive number representable with standard 64 bit floating point numbers.
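If a base-10 magnitude is more intuitive, the same result can be rescaled (a small addition using the log2p computed above):

import math

log10p = log2p * math.log10(2)
print("log10p =", log10p)   # ~ -7.38e26, i.e. the number is about 10**(-7.4e26)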

There are Python libraries that allow you to do arbitrary-precision arithmetic, for example mpmath, as used in SymPy.
You will have to rewrite your code to use the library functions, though.
http://docs.sympy.org/latest/modules/mpmath/basics.html?highlight=precision
Edit: I just noticed the size of the numbers you are dealing with - much too large for my suggestion.

Related

How to write Bessel function using power series method in Python without Sympy?

I am studying Computational Physics with a lecturer who always asks me to write Python and Matlab code without ready-made functions (libraries that give the final answer without showing the mathematical expression). So I tried to write the Bessel function of the first kind using its power series, because I thought it would be easy compared to other methods (I am not sure). I don't know why the result is still very different, far from the answer that sympy/mpmath provides.
Here is my code for x = 5 and n = 3
import math

def bessel_function(n, x, num_terms):
    # Initialize the power series expansion with the first term
    series_sum = (x / 2) ** n
    # Calculate the remaining terms of the power series expansion
    for k in range(0, num_terms):
        term = ((-1) ** k) * ((x / 2) ** (2 * k)) / (math.factorial(k)**2)*(2**2*k)
        series_sum = series_sum + term
    return series_sum

# Test the function with n = 3, x = 5, and num_terms = 10
print(bessel_function(3, 5, 30))
print(bessel_function(3, 5, 15))
And here is the code using sympy library:
from mpmath import *
mp.dps = 15; mp.pretty = True
print(besselj(3, 5))

import sympy

def bessel_function(n, x):
    # Use the besselj function from sympy to calculate the Bessel function
    return sympy.besselj(n, x)

# Calculate the numerical value of the Bessel function using evalf
numerical_value = bessel_function(3, 5).evalf()
print(numerical_value)
It is very wasteful to compute the terms the way you do, each from scratch with a power and a factorial. It is much more efficient to compute each term from the previous one.
For J_N:
T_k / T_(k-1) = -(X/2)^2 / (k(k+N))
with T_0 = (X/2)^N / N!.
import math

N = 3
X = 5

# First term
X *= 0.5
Term = pow(X, N) / math.factorial(N)
Sum = Term
print(Sum)

# Next terms
X *= -X
for k in range(1, 21):
    Term *= X / (k * (k + N))
    Sum += Term
    print(Sum)
The successive sums are
2.6041666666666665
-1.4648437499999996
1.0782877604166665
0.19525598596643523
0.39236129276336185
0.3615635885763421
0.365128137672062
0.3648098743599441
0.3648324782883616
0.36483117019065225
0.3648312330799652
0.36483123052763916
0.3648312306162616
0.3648312306135987
0.3648312306136686
0.364831230613667
0.36483123061366707
0.36483123061366707
0.36483123061366707
0.36483123061366707
0.36483123061366707
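For reuse, the same recurrence can be wrapped in a function (a minimal sketch; the name bessel_j_series is mine):

import math

def bessel_j_series(n, x, terms=20):
    # J_n(x) using the term ratio T_k / T_(k-1) = -(x/2)**2 / (k*(k+n))
    half = 0.5 * x
    term = half**n / math.factorial(n)
    total = term
    ratio = -half * half
    for k in range(1, terms + 1):
        term *= ratio / (k * (k + n))
        total += term
    return total

print(bessel_j_series(3, 5))   # ~ 0.36483123061366707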

Improve accuracy when solving a linear system where equations have coefficients ranging from E13 to E-18

I am doing some scientific calculations using Python. I ran into trouble when solving linear equations in which some coefficients are very large (~1e13) and some are very small (~1e-69); it gives me inaccurate solutions.
The equations, as given in the code below, are very simple: essentially rate[0] = rate[1] = ... = rate[6] and rate[7] = 0. I use solve to get the solutions, but then rate[0] != rate[1] = rate[2] = ... Although most values are right, the physical meaning is totally wrong, which is unacceptable: rate[0] through rate[6] must be equal.
I tried some ways to improve the accuracy.
#convert float to symbols
kf_ = [symbols(str(k)) for k in kf_]
kb_= [symbols(str(k)) for k in kb_]
or
#convert float to decimal
kf_ = [Decimal(str(k)) for k in kf_]
kb_ = [Decimal(str(k)) for k in kb_]
Neither works.
I tried the same code in Matlab, using solve or vpasolve from the Symbolic Math Toolbox, and the solutions are right. But for other reasons, the calculations must be done in Python.
So my question is: how can I improve the accuracy?
from sympy import symbols , solve
from decimal import Decimal
# coefficients: some are very large (~1e13), some are very small (~1e-69)
kf_= [804970533.1611289,
1.5474657692374676e-13,
64055206.72863516,
43027484.879109934,
239.58564380236825,
43027484.879109934,
0.6887097015872349,
43027484.879109934]
kb_=[51156022807606.22,
4.46863981338889e-18,
9.17599257631182,
8.862701377525092e-43,
4.203415587017237e-20,
2180151.4516747626,
5.590961781720337e-69,
0.011036598954049947]
#convert float to symbols , it takes quite a long time
#kf_ = [symbols(str(k)) for k in kf_]
#kb_= [symbols(str(k)) for k in kb_]
#or
#convert float to decimal
#kf_ = [Decimal(str(k)) for k in kf_]
#kb_ = [Decimal(str(k)) for k in kb_]
# define unknowns
theta = list(symbols('theta1:%s' % (8 + 1)))
#define some expressions
rate=[0]*8
rate[0] = kf_[0] * theta[0] - kb_[0] * theta[1]
rate[1] = kf_[1] * theta[1] - kb_[1] * theta[2]
rate[2] = kf_[2] * theta[2] - kb_[2] * theta[3]
rate[3] = kf_[3] * theta[3] - kb_[3] * theta[4]
rate[4] = kf_[4] * theta[4] - kb_[4] * theta[5]
rate[5] = kf_[5] * theta[5] - kb_[5] * theta[6]
rate[6] = kf_[6] * theta[6] - kb_[6] * theta[0]
rate[7] = kf_[7] * theta[0] - kb_[7] * theta[7]
print('\n'.join(str(r) for r in rate))
#equations
fun=[0]*8
fun[0]=sum(theta)-1
fun[1] = rate[0] - rate[1]  # the coefficients kb_[0] (~1e13) and kf_[1] (~1e-13) end up in the same equation
fun[2] = rate[1] - rate[2]
fun[3] = rate[2] - rate[3]
fun[4] = rate[3] - rate[4]
fun[5] = rate[4] - rate[5]
fun[6] = rate[5] - rate[6]
fun[7] = rate[7]
#solve
solThetaT=solve(fun,theta)
#print(solThetaT)
theta_=[solThetaT[t] for t in theta]
#print(theta_)
rate_=[0]*8
for i in range(len(rate)):
    rate_[i] = rate[i].subs(solThetaT)
print('\n'.join(str(r) for r in rate_))
#when converting floats to symbols:
#for r in rate_:
#    print(eval(str(r)))
The result for rate[0]~rate[7]:
-1.11022302462516e-16
6.24587893889839e-28
6.24587893889839e-28
6.24587893889840e-28
6.24587893889840e-28
6.24587893222751e-28
6.24587895329296e-28
-3.81639164714898e-17
The most serious problems are that rate[0] is negative and that rate[7], which is supposed to be zero, has a large absolute value.
And here are the correct solutions from Matlab:
6.245878938898438e-28
6.245878938898395e-28
6.245878938898395e-28
6.245878938898395e-28
6.245878938898395e-28
6.245878938898395e-28
6.245878938898395e-28
0
To avoid any loss of precision in floating point arithmetic, recast the given coefficients as rationals. Assuming from sympy import Rational, this is done with
kf_ = [Rational(x) for x in kf_]
kb_ = [Rational(x) for x in kb_]
Then use the code you have to solve the system and compute the rates. To display floating point numbers instead of giant rationals, use the evalf method:
print('\n'.join(str(r.evalf()) for r in rate_))
prints
6.24587893889840e-28
6.24587893889840e-28
6.24587893889840e-28
6.24587893889840e-28
6.24587893889840e-28
6.24587893889840e-28
6.24587893889840e-28
0
Note: SymPy documentation recommends using linsolve for linear systems, but you'd need to adapt the code to deal with the different return type of linsolve.
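For example, the adaptation could look like this (a sketch; linsolve returns a FiniteSet containing a single solution tuple):

from sympy import linsolve

sol_set = linsolve(fun, theta)       # FiniteSet of solution tuples
theta_ = list(next(iter(sol_set)))   # unpack the single solution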
Also, a linear system with numeric coefficients can be solved directly by mpmath which allows setting an arbitrarily large precision of floating point computations.
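Here is a minimal sketch of that route (the matrix assembly is mine, written to match the rate definitions above; matrix, lu_solve and mp.dps are mpmath's):

from mpmath import mp, matrix, lu_solve

mp.dps = 50   # work with 50 significant digits

# rate[i] = kf_[i]*theta[src[i]] - kb_[i]*theta[dst[i]]
src = [0, 1, 2, 3, 4, 5, 6, 0]
dst = [1, 2, 3, 4, 5, 6, 0, 7]

A = matrix(8, 8)
b = matrix(8, 1)

# fun[0]: sum(theta) - 1 = 0
for j in range(8):
    A[0, j] = 1
b[0] = 1

def add_rate(row, i, sign):
    # add sign*rate[i] to equation `row`
    A[row, src[i]] += sign * mp.mpf(kf_[i])
    A[row, dst[i]] -= sign * mp.mpf(kb_[i])

# fun[1..6]: rate[i-1] - rate[i] = 0
for row in range(1, 7):
    add_rate(row, row - 1, +1)
    add_rate(row, row, -1)
# fun[7]: rate[7] = 0
add_rate(7, 7, +1)

theta_mp = lu_solve(A, b)   # the eight theta values, to 50 digits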

Calculating inverse trigonometric functions with formulas

I have been trying to create a custom calculator for trigonometric functions. Aside from Chebyshev polynomials and/or the CORDIC algorithm, I have used Taylor series, which have been accurate to a few decimal places.
This is what I have created to calculate simple trigonometric functions without any modules:
from __future__ import division

def sqrt(n):
    ans = n ** 0.5
    return ans

def factorial(n):
    k = 1
    for i in range(1, n+1):
        k = i * k
    return k

def sin(d):
    pi = 3.14159265359
    n = 180 / int(d)  # 180 degrees = pi radians
    x = pi / n        # Converting degrees to radians
    ans = x - (x ** 3 / factorial(3)) + (x ** 5 / factorial(5)) - (x ** 7 / factorial(7)) + (x ** 9 / factorial(9))
    return ans

def cos(d):
    pi = 3.14159265359
    n = 180 / int(d)
    x = pi / n
    ans = 1 - (x ** 2 / factorial(2)) + (x ** 4 / factorial(4)) - (x ** 6 / factorial(6)) + (x ** 8 / factorial(8))
    return ans

def tan(d):
    ans = sin(d) / sqrt(1 - sin(d) ** 2)
    return ans
Unfortunately, I could not find any sources that would help me implement inverse trigonometric function formulas in Python. I have also tried raising sin(x) to the power of -1 (sin(x) ** -1), which didn't work as expected.
What could be the best way to do this in Python (by best I mean simplest, with accuracy similar to the Taylor series)? Is this possible with power series, or do I need the CORDIC algorithm?
The question is broad in scope, but here are some simple ideas (and code!) that might serve as a starting point for computing arctan. First, the good old Taylor series. For simplicity, we use a fixed number of terms; in practice, you might want to decide the number of terms to use dynamically based on the size of x, or introduce some kind of convergence criterion. With a fixed number of terms, we can evaluate efficiently using something akin to Horner's scheme.
def arctan_taylor(x, terms=9):
    """
    Compute arctan for small x via Taylor polynomials.

    Uses a fixed number of terms. The default of 9 should give good results
    for abs(x) < 0.1. Results will become poorer as abs(x) increases,
    becoming unusable as abs(x) approaches 1.0 (the radius of convergence
    of the series).
    """
    # Uses Horner's method for evaluation.
    t = 0.0
    for n in range(2*terms-1, 0, -2):
        t = 1.0/n - x*x*t
    return x * t
The above code gives good results for small x (say smaller than 0.1 in absolute value), but the accuracy drops off as x becomes larger, and for abs(x) > 1.0, the series never converges, no matter how many terms (or how much extra precision) we throw at it. So we need a better way to compute for larger x. One solution is to use argument reduction, via the identity arctan(x) = 2 * arctan(x / (1 + sqrt(1 + x^2))). This gives the following code, which builds on arctan_taylor to give reasonable results for a wide range of x (but beware possible overflow and underflow when computing x*x).
import math

def arctan_taylor_with_reduction(x, terms=9, threshold=0.1):
    """
    Compute arctan via argument reduction and Taylor series.

    Applies reduction steps until x is below `threshold`,
    then uses Taylor series.
    """
    reductions = 0
    while abs(x) > threshold:
        x = x / (1 + math.sqrt(1 + x*x))
        reductions += 1
    return arctan_taylor(x, terms=terms) * 2**reductions
Alternatively, given an existing implementation for tan, you could simply find a solution y to the equation tan(y) = x using traditional root-finding methods. Since arctan is already naturally bounded to lie in the interval (-pi/2, pi/2), bisection search works well:
def arctan_from_tan(x, tolerance=1e-15):
    """
    Compute arctan as the inverse of tan, via bisection search. This assumes
    that you already have a high quality tan function.
    """
    low, high = -0.5 * math.pi, 0.5 * math.pi
    while high - low > tolerance:
        mid = 0.5 * (low + high)
        if math.tan(mid) < x:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)
Finally, just for fun, here's a CORDIC-like implementation, which is really more appropriate for a low-level implementation than for Python. The idea here is that you precompute, once and for all, a table of arctan values for 1, 1/2, 1/4, etc., and then use those to compute general arctan values, essentially by computing successive approximations to the true angle. The remarkable part is that, after the precomputation step, the arctan computation involves only additions, subtractions, and multiplications by powers of 2. (Of course, those multiplications aren't any more efficient than any other multiplication at the level of Python, but closer to the hardware, this could potentially make a big difference.)
cordic_table_size = 60
cordic_table = [(2**-i, math.atan(2**-i))
                for i in range(cordic_table_size)]

def arctan_cordic(y, x=1.0):
    """
    Compute arctan(y/x), assuming x positive, via CORDIC-like method.
    """
    r = 0.0
    for t, a in cordic_table:
        if y < 0:
            r, x, y = r - a, x - t*y, y + t*x
        else:
            r, x, y = r + a, x + t*y, y - t*x
    return r
Each of the above methods has its strengths and weaknesses, and all of the above code can be improved in a myriad of ways. I encourage you to experiment and explore.
To wrap it all up, here are the results of calling the above functions on a small number of not-very-carefully-chosen test values, comparing with the output of the standard library math.atan function:
test_values = [2.314, 0.0123, -0.56, 168.9]
for value in test_values:
    print("{:20.15g} {:20.15g} {:20.15g} {:20.15g}".format(
        math.atan(value),
        arctan_taylor_with_reduction(value),
        arctan_from_tan(value),
        arctan_cordic(value),
    ))
Output on my machine:
1.16288340166519 1.16288340166519 1.16288340166519 1.16288340166519
0.0122993797673 0.0122993797673 0.0122993797673002 0.0122993797672999
-0.510488321916776 -0.510488321916776 -0.510488321916776 -0.510488321916776
1.56487573286064 1.56487573286064 1.56487573286064 1.56487573286064
The simplest way to implement any inverse function is to use binary search.
definitions
Let's assume a function
x = g(y)
and we want to code its inverse:
y = f(x) = f(g(y))
with
x = <x0,x1>
y = <y0,y1>
bin search on floats
You can do it with integer math by accessing the mantissa bits, as in:
Any Faster RMS Value Calculation in C?
But if you do not know the exponent of the result prior to the computation, then you need to use floats for the binary search too.
The idea behind binary search is to change the mantissa of y from y1 to y0 bit by bit, from MSB to LSB. Call the direct function g(y) after each change, and if the result crosses x, revert the last bit change.
When using floats, you can instead keep a variable holding the approximate value of the targeted mantissa bit, which eliminates the unknown-exponent problem. So at the beginning set y = y0 and the step to the MSB value, b = (y1-y0)/2. After each iteration halve it, and do as many iterations as you have mantissa bits, n... This way you obtain the result in n iterations, to within (y1-y0)/2^n accuracy.
If your inverse function is not monotonic, break it into monotonic intervals and handle each as a separate binary search.
Whether the function is increasing or decreasing only determines the direction of the crossing condition (the use of < or >).
C++ acos example
So y = acos(x) is defined on x = <-1,+1>, y = <0,M_PI> and is decreasing, so:
double f64_acos(double x)
{
    const int n=52;     // mantissa bits
    double y,y0,b;
    int i;
    // handle domain error
    if (x<-1.0) return 0;
    if (x>+1.0) return 0;
    // x = <-1,+1> , y = <0,M_PI> , decreasing
    for (y=0.0,b=0.5*M_PI,i=0;i<n;i++,b*=0.5) // y is min, b is half of max, halving each iteration
    {
        y0=y;               // remember original y
        y+=b;               // try to set "bit"
        if (cos(y)<x) y=y0; // if the result crosses x, return to the original y (decreasing: <, increasing: >)
    }
    return y;
}
I tested it like this:
double x0,x1,y;
for (x0=0.0;x0<M_PI;x0+=M_PI*0.01) // cycle the whole angle range <0,M_PI>
{
    y=cos(x0);      // direct function (from math.h)
    x1=f64_acos(y); // my inverse function
    if (fabs(x1-x0)>1e-9) // check the result and output to log on error
        Form1->mm_log->Lines->Add(AnsiString().sprintf("acos(%8.3lf) = %8.3lf != %8.3lf",y,x0,x1));
}
No differences were found, so the implementation works correctly. Of course, binary search on a 52-bit mantissa is usually slower than a polynomial approximation... on the other hand, the implementation is very simple...
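For comparison, here is a direct Python port of the same bisection idea (a sketch; math.cos serves as the forward function):

import math

def acos_bisect(x, n=52):
    # y = acos(x) with y in <0, pi>; cos is decreasing there
    x = min(1.0, max(-1.0, x))     # clamp the domain
    y, b = 0.0, 0.5 * math.pi
    for _ in range(n):
        if math.cos(y + b) >= x:   # not yet past the crossing: keep the step
            y += b
        b *= 0.5
    return y

print(acos_bisect(0.5), math.acos(0.5))   # both ~ 1.0471975512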
[Notes]
If you do not want to take care of the monotonic intervals, you can try approximation search.
As you are dealing with goniometric functions, you need to handle the singularities to avoid NaN or division by zero, etc.
If you're interested in more binary search examples (mostly on integers), see Power by squaring for negative exponents, which contains them.

Random generating numbers on a given ratio

I need to generate numbers on a given positive interval (a,b), distributed according to an exponential distribution. Using the inverse CDF method, I made a generator for an exponentially distributed number. But, of course, this number can be any positive number, and I want it to lie in the given interval. What should I do to generate only within the interval?
The code to generate an exponentially distributed number using the inverse CDF method is, in Python:
u = random.uniform(0,1)
return (-1/L)*math.log(u)
where L is a given positive parameter.
Thanks in advance
The probability density of an outcome x would normally be L*exp(-Lx). However, when we are restricted to [a,b], the density at x in [a,b] is scaled up by the reciprocal of the probability mass that falls between a and b: integral from a to b of L*exp(-Lt) dt = exp(-La) - exp(-Lb).
Therefore, the pdf at x is
L*exp(-Lx) / (exp(-La) - exp(-Lb)),
giving a cdf at x of
integral from a to x of L*exp(-Lt) / (exp(-La) - exp(-Lb)) dt
= [exp(-La) - exp(-Lx)] / [exp(-La) - exp(-Lb)] = u
Now invert:
exp(-Lx) = exp(-La) - u[exp(-La) - exp(-Lb)]
-Lx = -La + log( 1 - u[1 - exp(-Lb)/exp(-La)])
x = a + (-1/L) log( 1 - u[1 - exp(-Lb)/exp(-La)])
giving code:
u = random.uniform(0,1)
return a + (-1/L)*math.log( 1 - u*(1 - math.exp(-L*b)/math.exp(-L*a)) )
Be aware: for large L or a, math.exp(-L*a) will round to 0, leading to a ZeroDivisionError. Since exp(-L*b)/exp(-L*a) = exp(-L*(b-a)), the ratio can be computed without either exponential underflowing on its own.
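Putting it together with the ratio rewritten as exp(-L*(b-a)) to sidestep that underflow (a minimal sketch; the function name is mine):

import math
import random

def truncated_exponential(L, a, b):
    # inverse-CDF sample from Exp(L) restricted to [a, b]
    u = random.uniform(0, 1)
    # exp(-L*b)/exp(-L*a) == exp(-L*(b-a)), which avoids the division entirely
    return a + (-1 / L) * math.log(1 - u * (1 - math.exp(-L * (b - a))))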

Given f, is there an automatic way to calculate fprime for Newton's method?

The following was ported from the pseudo-code from the Wikipedia article on Newton's method:
#! /usr/bin/env python3
# https://en.wikipedia.org/wiki/Newton's_method
import sys

x0 = 1
f = lambda x: x ** 2 - 2
fprime = lambda x: 2 * x
tolerance = 1e-10
epsilon = sys.float_info.epsilon
maxIterations = 20

for i in range(maxIterations):
    denominator = fprime(x0)
    if abs(denominator) < epsilon:
        print('WARNING: Denominator is too small')
        break
    newtonX = x0 - f(x0) / denominator
    if abs(newtonX - x0) < tolerance:
        print('The root is', newtonX)
        break
    x0 = newtonX
else:
    print('WARNING: Not able to find solution within the desired tolerance of', tolerance)
    print('The last computed approximate root was', newtonX)
Question
Is there an automated way to calculate some form of fprime given some form of f in Python 3.x?
A common way of approximating the derivative of f at x is to use a finite difference:

f'(x) ≈ (f(x+h) - f(x)) / h        (forward difference)
f'(x) ≈ (f(x+h) - f(x-h)) / (2h)   (symmetric/central difference)

The best choice of h depends on x and f: mathematically the difference approaches the derivative as h tends to 0, but the method suffers from loss of accuracy due to catastrophic cancellation if h is too small. Also, x+h should be distinct from x. Something like h = x * 1e-8, on the order of sqrt(machine epsilon) times x, might be appropriate for your application. See also implementing the derivative in C/C++.
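As a quick illustration (my example: f(x) = x**2 - 2 at x = 1, where the exact derivative is 2):

f = lambda x: x**2 - 2
x, h = 1.0, 1e-8                           # h on the order of sqrt(machine epsilon) * x
forward = (f(x + h) - f(x)) / h
central = (f(x + h) - f(x - h)) / (2 * h)
print(forward, central)                    # both ~ 2.0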
You can avoid approximating f' by using the secant method. It doesn't converge as fast as Newton's, but it's computationally cheaper and you avoid the problem of having to calculate the derivative.
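For instance, a minimal secant-method sketch (this helper is mine, not from the original post):

def secant(f, x0, x1, tolerance=1e-10, max_iterations=50):
    # root of f from two starting points; no derivative required
    for _ in range(max_iterations):
        f0, f1 = f(x0), f(x1)
        if f1 == f0:            # flat secant line; cannot continue
            break
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)
        if abs(x2 - x1) < tolerance:
            return x2
        x0, x1 = x1, x2
    return x1

# secant(lambda x: x**2 - 2, 1.0, 2.0) ~= 1.4142135623730951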
You can approximate fprime any number of ways. One of the simplest would be something like:

fprime = lambda x, dx=0.1: (f(x + dx) - f(x - dx)) / (2 * dx)

the idea here is to sample f around the point x. The sampling region (determined by dx) should be small enough that the variation in f over that region is approximately linear. The scheme used here is the central (midpoint) difference. You could get more accuracy by using higher-order polynomial fits for most functions, but that would be more expensive to calculate.
Of course, you'll always be more accurate and efficient if you know the analytical derivative.
Answer
Define the functions formula and derivative as the following directly after your import.
def formula(*array):
    calculate = lambda x: sum(c * x ** p for p, c in enumerate(array))
    calculate.coefficients = array
    return calculate

def derivative(function):
    return (p * c for p, c in enumerate(function.coefficients[1:], 1))
Redefine f using formula by plugging in the function's coefficients in order of increasing power.
f = formula(-2, 0, 1)
Redefine fprime so that it is automatically created using functions derivative and formula.
fprime = formula(*derivative(f))
That should solve your requirement to automatically calculate fprime from f in Python 3.x.
Summary
This is the final solution that produces the original answer while automatically calculating fprime.
#! /usr/bin/env python3
# https://en.wikipedia.org/wiki/Newton's_method
import sys

def formula(*array):
    calculate = lambda x: sum(c * x ** p for p, c in enumerate(array))
    calculate.coefficients = array
    return calculate

def derivative(function):
    return (p * c for p, c in enumerate(function.coefficients[1:], 1))

x0 = 1
f = formula(-2, 0, 1)
fprime = formula(*derivative(f))
tolerance = 1e-10
epsilon = sys.float_info.epsilon
maxIterations = 20

for i in range(maxIterations):
    denominator = fprime(x0)
    if abs(denominator) < epsilon:
        print('WARNING: Denominator is too small')
        break
    newtonX = x0 - f(x0) / denominator
    if abs(newtonX - x0) < tolerance:
        print('The root is', newtonX)
        break
    x0 = newtonX
else:
    print('WARNING: Not able to find solution within the desired tolerance of', tolerance)
    print('The last computed approximate root was', newtonX)
