Precision error in `scipy.stats.binom` method - python

I am using scipy.stats.binom to work with the binomial distribution. Given n and p, the probability function is
$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}.$$
A sum over k ranging over 0 to n should (and indeed does) give 1. Fixing a point x_0, we can add the probabilities in both directions and the two sums ought to add to 1. However the code below yields two different answers when x_0 is close to n.
from scipy.stats import binom
n = 9
p = 0.006985
b = binom(n=n, p=p)
x_0 = 8
# Method 1
cprob = 0
for k in range(x_0, n+1):
    cprob += b.pmf(k)
print('cumulative prob with method 1:', cprob)
# Method 2
cprob = 1
for k in range(0, x_0):
    cprob -= b.pmf(k)
print('cumulative prob with method 2:', cprob)
I expect the outputs from both methods to agree. For x_0 < 7 they agree, but for x_0 >= 8, as above, I get
>> cumulative prob with method 1: 5.0683768775504006e-17
>> cumulative prob with method 2: 1.635963929799698e-16
The precision error in the two methods propagates through my code (later) and gives vastly different answers. Any help is appreciated.

Roundoff errors of the order of the machine epsilon are expected and are inevitable. That these propagate and later blow up means that your problem is very poorly conditioned. You'd need to rethink the algorithm or an implementation, depending on where the bad conditioning comes from.
In your specific example you can get by using either np.sum (which tries to be careful with roundoff), or even math.fsum from the standard library.
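To illustrate where the discrepancy comes from, here is a minimal standard-library sketch. It recomputes the PMF with math.comb instead of calling SciPy, so the helper name binom_pmf is an assumption of this example and not part of the question's code. Subtracting from 1 can only resolve differences down to roughly 1e-16, while the tail itself is about 5e-17, so summing the tail terms directly avoids the cancellation.

```python
import math
import sys

def binom_pmf(k, n, p):
    """Binomial PMF computed from the definition (hypothetical helper)."""
    return math.comb(n, k) * p**k * (1.0 - p)**(n - k)

n, p, x_0 = 9, 0.006985, 8

# Direct upper-tail sum: every term is tiny, so nothing is cancelled.
upper_tail = math.fsum(binom_pmf(k, n, p) for k in range(x_0, n + 1))

# Complement: 1 minus a number within about 1e-16 of 1, so the result
# is dominated by rounding error in the lower-tail terms.
complement = 1.0 - math.fsum(binom_pmf(k, n, p) for k in range(0, x_0))

print("direct tail sum:", upper_tail)
print("1 - lower sum:  ", complement)
print("machine epsilon:", sys.float_info.epsilon)
```

If SciPy is available, the frozen distribution's survival function gives the same tail without a loop: b.sf(x_0 - 1) is P(X >= x_0).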

Related

Does Python have a function which computes multinomial coefficients?

I was looking for a Python library function which computes multinomial coefficients.
I could not find any such function in any of the standard libraries.
For binomial coefficients (of which multinomial coefficients are a generalization) there is scipy.special.binom and also scipy.misc.comb. Also, numpy.random.multinomial draws samples from a multinomial distribution, and sympy.ntheory.multinomial.multinomial_coefficients returns a dictionary related to multinomial coefficients.
However, I could not find a multinomial coefficients function proper, which given a,b,...,z returns (a+b+...+z)!/(a! b! ... z!). Did I miss it? Is there a good reason there is none available?
I would be happy to contribute an efficient implementation to SciPy say. (I would have to figure out how to contribute, as I have never done this).
For background, they do come up when expanding (a+b+...+z)^n. Also, they count the ways of depositing a+b+...+z distinct objects into distinct bins such that the first bin contains a objects, etc. I need them occasionally for a Project Euler problem.
BTW, other languages do offer this function: Mathematica, MATLAB, Maple.
To partially answer my own question, here is my simple and fairly efficient implementation of the multinomial function:
def multinomial(lst):
    res, i = 1, 1
    for a in lst:
        for j in range(1,a+1):
            res *= i
            res //= j
            i += 1
    return res
It seems from the comments so far that no efficient implementation of the function exists in any of the standard libraries.
Update (January 2020). As Don Hatch has pointed out in the comments, this can be further improved by looking for the largest argument (especially for the case that it dominates all others):
def multinomial(lst):
    res, i = 1, sum(lst)
    i0 = lst.index(max(lst))
    for a in lst[:i0] + lst[i0+1:]:
        for j in range(1,a+1):
            res *= i
            res //= j
            i -= 1
    return res
No, there is not a built-in multinomial library or function in Python.
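Anyway, this time math can help you. A simple method for calculating the multinomial coefficient while keeping an eye on performance is to rewrite it using the characterization of the multinomial coefficient as a product of binomial coefficients:
$$\binom{n_1 + n_2 + \cdots + n_k}{n_1, n_2, \ldots, n_k} = \binom{n_1}{n_1} \binom{n_1 + n_2}{n_2} \cdots \binom{n_1 + n_2 + \cdots + n_k}{n_k}$$
where of course
$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}.$$
Thanks to scipy.special.binom and the magic of recursion you can solve the problem like this: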
from scipy.special import binom
def multinomial(params):
    if len(params) == 1:
        return 1
    return binom(sum(params), params[-1]) * multinomial(params[:-1])
where params = [n1, n2, ..., nk].
Note: Splitting the multinomial as a product of binomial is also good to prevent overflow in general.
You wrote "sympy.ntheory.multinomial.multinomial_coefficients returns a dictionary related to multinomial coefficients", but it is not clear from that comment if you know how to extract the specific coefficients from that dictionary. Using the notation from the wikipedia link, the SymPy function gives you all the multinomial coefficients for the given m and n. If you only want a specific coefficient, just pull it out of the dictionary:
In [39]: from sympy import ntheory
In [40]: def sympy_multinomial(params):
    ...:     m = len(params)
    ...:     n = sum(params)
    ...:     return ntheory.multinomial_coefficients(m, n)[tuple(params)]
    ...:
In [41]: sympy_multinomial([1, 2, 3])
Out[41]: 60
In [42]: sympy_multinomial([10, 20, 30])
Out[42]: 3553261127084984957001360
Busy Beaver gave an answer written in terms of scipy.special.binom. A potential problem with that implementation is that binom(n, k) returns a floating point value. If the coefficient is large enough, it will not be exact, so it would probably not help you with a Project Euler problem. Instead of binom, you can use scipy.special.comb, with the argument exact=True. This is Busy Beaver's function, modified to use comb:
In [46]: from scipy.special import comb
In [47]: def scipy_multinomial(params):
    ...:     if len(params) == 1:
    ...:         return 1
    ...:     coeff = (comb(sum(params), params[-1], exact=True) *
    ...:              scipy_multinomial(params[:-1]))
    ...:     return coeff
    ...:
In [48]: scipy_multinomial([1, 2, 3])
Out[48]: 60
In [49]: scipy_multinomial([10, 20, 30])
Out[49]: 3553261127084984957001360
Here are two approaches, one using factorials, one using Stirling's approximation.
Using factorials
You can define a function to return multinomial coefficients in a single line using vectorised code (instead of for-loops) as follows:
from scipy.special import factorial
def multinomial_coeff(c):
    return factorial(c.sum()) / factorial(c).prod()
(Where c is an np.ndarray containing the number of counts for each different object). Usage example:
>>> import numpy as np
>>> coeffs = np.array([2, 3, 4])
>>> multinomial_coeff(coeffs)
1260.0
In some cases this might be slower because you will be computing certain factorial expressions multiple times, in other cases this might be faster because I believe that numpy naturally parallelises vectorised code. Also this reduces the required number of lines in your program and is arguably more readable. If someone has the time to run speed tests on these different options then I'd be interested to see the results.
Using Stirling's approximation
In fact the logarithm of the multinomial coefficient is much faster to compute (based on Stirling's approximation) and allows computation of much larger coefficients:
from scipy.special import gammaln
def log_multinomial_coeff(c):
    return gammaln(c.sum()+1) - gammaln(c+1).sum()
Usage example:
>>> import numpy as np
>>> coeffs = np.array([2, 3, 4])
>>> np.exp(log_multinomial_coeff(coeffs))
1259.999999999999
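The same log-space idea can be sketched with only the standard library, since math.lgamma computes the log of the gamma function. The function name log_multinomial is hypothetical, chosen for this example. It shows that the logarithm stays finite even when the coefficient itself is far too large for a float.

```python
import math

def log_multinomial(counts):
    """Natural log of (sum(counts))! / prod(c! for c in counts)."""
    return math.lgamma(sum(counts) + 1) - sum(math.lgamma(c + 1) for c in counts)

# Small case: exponentiating recovers the coefficient up to rounding.
print(math.exp(log_multinomial([2, 3, 4])))  # close to 1260

# Large case: the coefficient overflows a float, but its log does not.
big = log_multinomial([1000, 2000, 3000])
print(big / math.log(10))  # approximate base-10 logarithm of the coefficient
```

The result is approximate, so this suits probability calculations better than problems that need the exact integer.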
Your own answer (the accepted one) is quite good, and is especially simple. However, it does have one significant inefficiency: your outer loop for a in lst is executed one more time than is necessary. In the first pass through that loop, the values of i and j are always identical, so the multiplications and divisions do nothing. In your example multinomial([123, 134, 145]), there are 123 unneeded multiplications and divisions, adding time to the code.
I suggest finding the maximum value in the parameters and removing it, so those unneeded operations are not done. That adds complexity to the code but reduces the execution time, especially for short lists of large numbers. My code below executes multcoeff(123, 134, 145) in 111 microseconds, while your code takes 141 microseconds. That is not a large increase, but that could matter. So here is my code. This also takes individual values as parameters rather than a list, so that is another difference from your code.
def multcoeff(*args):
    """Return the multinomial coefficient
    (n1 + n2 + ...)! / n1! / n2! / ..."""
    if not args:  # no parameters
        return 1
    # Find and store the index of the largest parameter so we can skip
    # it (for efficiency)
    skipndx = args.index(max(args))
    newargs = args[:skipndx] + args[skipndx + 1:]
    result = 1
    num = args[skipndx] + 1  # a factor in the numerator
    for n in newargs:
        for den in range(1, n + 1):  # a factor in the denominator
            result = result * num // den
            num += 1
    return result
Starting with Python 3.8, since the standard library now includes the math.comb function (binomial coefficient) and since the multinomial coefficient can be computed as a product of binomial coefficients, we can implement it without external libraries:
import math
def multinomial(*params):
    return math.prod(math.comb(sum(params[:i]), x) for i, x in enumerate(params, 1))
multinomial(10, 20, 30) # 3553261127084984957001360

Minimizing a multivariable function of loop iterations

I'm trying to minimize a function f of ~80 variables stored in an array. The function is defined by two nested loops: the outer one indexes array by i, while the inner loop is performed array[i] times and adds the result of a computation to a running total. The computation depends on some conditions x and y and changes slightly every time it's performed, which is why I need the loop structure. Here is a minimal working example in Python:
def f(array):
    total = 0
    x = 0
    y = 0
    for i in range(len(array)):
        for j in range(array[i]):
            result = 2*x + y
            total = total + result
            x = x+1
        x = 0
        y = y+1
    return total
So for instance, print f([2,1]) returns 3, since [(2*0) + 0] + [(2*1) + 0] + [(2*0) + 1] = 0+2+1 = 3.
I want to find the entries of array that minimize the value of f. However, when I tell (e.g.) Mathematica to minimize f([x1, x2, ..., x80]) and spit out the minimizer array, the program complains because it can't perform the loops defining f an indeterminate number of times.
In light of this, my question is the following:
How do I minimize a multivariate function whose parameters describe the number of times a given loop is to be iterated?
I had originally tried to implement this in Mathematica, but found that I could not define f by the procedure above. The best I could do is tell Mathematica to perform the loops above, then define f[array_] := total after total had been computed. When I ran my code, Mathematica naturally claimed that it could not evaluate f, throwing an error even before it executed my command to NMinimize[{f[array] array ϵ Integers}, array]. The fact that Mathematica is trying to evaluate f before it is called in NMinimize indicates that I don't quite understand how functions work in Mathematica. Any help in untangling this situation would be greatly appreciated!
As written your function has an analytical minimum and there is no need for numerical optimization. Unfortunately, StackOverflow won't let me show the mathematics of it (if you ask it on MathExchange I can provide a derivation), but given an array A = [a0 a1 ... an] where each ai is a positive integer, and an array Y = [0 1 ... n] the function you posted reduces to the following matrix multiplication A * (A - 1 + Y)' where ' denotes a matrix transpose and * denotes matrix multiplication. So, trivially, the function is minimized when each ai is minimized. So, if this is part of a larger optimization, your task should be focused on finding the minimum of each element of A if the elements themselves are constrained.
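The closed form can be checked directly. For index i the inner loop adds 2x + i for x = 0, ..., a_i - 1, which sums to a_i(a_i - 1) + i a_i = a_i(a_i - 1 + i). The sketch below compares that expression with the loop definition; the names f_loop and f_closed are made up for this example.

```python
def f_loop(array):
    """The nested-loop definition from the question."""
    total = 0
    for y, count in enumerate(array):
        for x in range(count):
            total += 2 * x + y
    return total

def f_closed(array):
    """Closed form: sum over i of a_i * (a_i - 1 + i)."""
    return sum(a * (a - 1 + i) for i, a in enumerate(array))

print(f_loop([2, 1]), f_closed([2, 1]))  # 3 3
```

Because every term a_i(a_i - 1 + i) is non-decreasing in a_i for non-negative integers, the sum is smallest when each entry takes its smallest allowed value.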

Why am I getting inf on controlled calculation?

I have a function that has a loop, inside of which I do both division and multiplication. The final answer is easily representable, as should the running answer be.
def tie(total):
    count = total / 2
    prob = 1.0
    for i in xrange(1, count + 1):
        i_f = float(i)
        prob *= (count + i_f) / i_f / 4
    return prob
tie(4962) == 0.01132634537589437
but
tie(4964) == inf
Is the compiler trying to do some optimization, doing the arithmetic operations in an order other than I seem to have specified and that order is supposedly equivalent but causes the overflow?
You're running into issues because even though the final result of your tie function should mathematically be between 0 and 1, the intermediate values in your loop grow very large: for total = 4962, the value of prob halfway through the iteration is around 1.5e308, which is almost but not quite large enough to overflow a Python float. For total = 4964, the mid-way value really does overflow a float, and since inf times anything finite is still inf, the inf from the overflow propagates all the way down to the final value.
If you're prepared to accept a (fairly small) amount of floating-point error, there's no need to compute this quantity using a loop at all: you can use the lgamma function from the math module to compute the log of the relevant factorials. (You could also use the gamma function directly, but that would likely also lead to overflow issues.)
Here's a version of your function based on this.
from math import lgamma, log, exp
def tie(total):
    count = total / 2
    return exp(lgamma(2*count + 1) - 2*lgamma(count + 1) - count*log(4))
Alternatively, you could compute the 2n-choose-n term using pure integer arithmetic (which won't cause overflow), and only produce a float at the last moment (when dividing by 4**count). This will be less efficient than the above, but will give you (in a sense) perfect accuracy, in that it'll give the closest representable float to the exact answer. Here's what that version looks like:
from __future__ import division
def tie(total):
    count = total // 2
    prod = 1
    for i in xrange(1, count+1):
        prod = prod * (count + i) // i
    return prod / 4**count
Note: the floor division in prod * (count + i) // i may look wrong, but it actually works: a little bit of elementary number theory shows that at this point in the calculation, prod * (count + i) must be divisible by i, so it's safe to do an integer division.
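The divisibility claim can be checked with a short Python 3 sketch (math.comb needs Python 3.8 or later). Before step i the running product equals C(count + i - 1, i - 1), and C(count + i - 1, i - 1) * (count + i) = i * C(count + i, i), so the division by i is exact. The generator name central_binomial_steps is hypothetical.

```python
from math import comb

def central_binomial_steps(count):
    """Yield (i, running product) after each step of the integer loop."""
    prod = 1
    for i in range(1, count + 1):
        numerator = prod * (count + i)
        # The claim under test: the numerator is an exact multiple of i.
        assert numerator % i == 0
        prod = numerator // i
        yield i, prod

# After step i the running product equals the binomial coefficient C(count + i, i).
for i, prod in central_binomial_steps(10):
    assert prod == comb(10 + i, i)
print(prod)  # 184756, which is C(20, 10)
```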
Finally, just for fun, here's a third way to compute your probability that's similar in spirit to your original code, but avoids overflow: the value prob starts at 1.0 and steadily decreases to the final value.
def tie(total):
    count = total // 2
    prob = 1.0
    for i in xrange(1, count+1):
        prob *= (i-0.5) / i
    return prob
Besides being immune from overflow issues, this solution will be more efficient than the integer-based solution, and more accurate than the lgamma-based one.
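For readers on Python 3, the three approaches can be put side by side. This is a sketch under the assumption of Python 3.8 or later (range instead of xrange, and math.comb for the exact value); the function names are made up for the comparison.

```python
import math

def tie_product(total):
    """Overflow-free running product; every factor is below 1."""
    count = total // 2
    prob = 1.0
    for i in range(1, count + 1):
        prob *= (i - 0.5) / i
    return prob

def tie_lgamma(total):
    """Log-gamma version: exp(log((2n)!) - 2*log(n!) - n*log(4))."""
    count = total // 2
    return math.exp(math.lgamma(2 * count + 1)
                    - 2 * math.lgamma(count + 1)
                    - count * math.log(4))

def tie_exact(total):
    """Exact integer binomial, converted to a float only at the end."""
    count = total // 2
    return math.comb(2 * count, count) / 4**count

for total in (4962, 4964):
    print(total, tie_product(total), tie_lgamma(total), tie_exact(total))
```

All three return finite values for total = 4964, where the original loop produced inf.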
prob grows to be quite large and eventually overflows. Given the name, did you intend prob to always be between 0 and 1?
What do you mean "controlled calculation"? What causes the overflow is prob getting bigger and bigger.
Your prob variable grows very large, and for total equal to 4964 it exceeds Python's maximum float value, sys.float_info.max:
>>> import sys
>>> print(sys.float_info.max)
1.7976931348623157e+308

How to do a Sigma in python 3

I'm trying to make a calculator for something, but the formulas use a sigma. I have no idea how to do a sigma in Python; is there an operator for it?
I'll put a link here to a page that has the formulas on it, for illustration: http://fromthedepths.gamepedia.com/User:Evil4Zerggin/Advanced_cannon
A sigma (∑) is a Summation operator. It evaluates a certain expression many times, with slightly different variables, and returns the sum of all those expressions.
For example, in the Ballistic coefficient formula
The Python implementation would look something like this:
# Just guessing some values. You have to search the actual values in the wiki.
ballistic_coefficients = [0.3, 0.5, 0.1, 0.9, 0.1]
total_numerator = 0
total_denominator = 0
for i, coefficient in enumerate(ballistic_coefficients):
    total_numerator += 2**(-i) * coefficient
    total_denominator += 2**(-i)
print('Total:', total_numerator / total_denominator)
You may want to look at the enumerate function, and beware precision problems.
The easiest way to do this is to create a sigma function that returns the summation. It is simple, and you don't need a library; you just need to understand the logic.
def sigma(first, last, const):
    total = 0
    for i in range(first, last + 1):
        total += const * i
    return total
# first: the first value of n (the index of summation)
# last: the last value of n
# const: the constant that multiplies n in each term of the sum
An efficient way to do this in Python is to use reduce().
To solve
$$\sum_{i=1}^{3} i$$
You can use the following:
from functools import reduce
result = reduce(lambda a, x: a + x, [0]+list(range(1,3+1)))
print(result)
reduce() will take arguments of a callable and an iterable, and return one value as specified by the callable. The accumulator is a and is set to the first value (0), and then the current sum following that. The current value in the iterable is set to x and added to the accumulator. The final accumulator is returned.
The formula to the right of the sigma is represented by the lambda. The sequence we are summing is represented by the iterable. You can change these however you need.
For example, if I wanted to solve:
$$\sum_{i \in I} \pi i^2$$
For a sequence I = [2, 3, 5], I could do the following:
reduce(lambda a, x: a + 3.14*x*x, [0]+[2,3,5])
You can see the following two code lines produce the same result:
>>> reduce(lambda a, x: a + 3.14*x*x, [0]+[2,3,5])
119.32
>>> (3.14*2*2) + (3.14*3*3) + (3.14*5*5)
119.32
I've looked at all the answers given to your query, but I was unable to understand any of them, maybe because I am a high school student. Anyway, in my opinion using a list will definitely reduce some of the pain of coding, so here is what I think is the simplest way to form a sigma function.
#creating a sigma function
a=int(input("enter a number for sigma "))
mylst=[]
for i in range(1,a+1):
    mylst.append(i)
b=sum(mylst)
print(mylst)
print(b)
Capital sigma (Σ) applies the expression after it to all members of a range and then sums the results.
In Python, sum will take the sum of a range, and you can write the expression as a comprehension:
For example
Speed Coefficient
A factor in muzzle velocity is the speed coefficient, which is a
weighted average of the speed modifiers si of the (non-
casing) parts, where each component i starting at the head has half the
weight of the previous:
The head will thus always determine at least 25% of the speed
coefficient.
For example, suppose the shell has a Composite Head (speed modifier
1.6), a Solid Warhead Body (speed modifier 1.3), and a Supercavitation
Base (speed modifier 0.9). Then we have
s0=1.6
s1=1.3
s2=0.9
From the example we can see that i starts from 0 not the usual 1 and so we can do
def speed_coefficient(parts):
    return (
        sum(0.75 ** i * si for i, si in enumerate(parts))
        /
        sum(0.75 ** i for i, si in enumerate(parts))
    )
>>> speed_coefficient([1.6, 1.3, 0.9])
1.3324324324324326
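The pattern generalizes to any summand. Below is a minimal sketch of a reusable helper; the name sigma and its argument order are choices made for this example, not an existing library function. It passes a generator expression to the built-in sum, and uses math.fsum for a float-valued sum where rounding error matters.

```python
import math

def sigma(term, first, last):
    """Sum term(i) for i = first, first + 1, ..., last (inclusive)."""
    return sum(term(i) for i in range(first, last + 1))

print(sigma(lambda i: i, 1, 3))  # 6

# Summing over an arbitrary collection; fsum limits float rounding error.
print(math.fsum(math.pi * i**2 for i in [2, 3, 5]))
```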
import numpy as np
def sigma(s,e):
    x = np.arange(s,e)
    return np.sum([x+1])

Random number function python that includes 1?

I am new to Python and am trying to create a program for a project- firstly, I need to generate a point between the numbers 0-1.0, including 0 and 1.0 ([0, 1.0]). I searched the python library for functions (https://docs.python.org/2/library/random.html) and I found this function:
random.random()
This will return the next random floating point number in the range [0.0, 1.0). This is a problem, since it does not include 1. Although the chances of actually generating a 1 are very slim anyway, it is still important because this is a scientific program that will be used in a larger data collection.
I also found this function:
random.randint
This will return an integer, which is also a problem.
I researched on the website and previously asked questions and found that this function:
random.uniform(a, b)
will only return a number that is greater than or equal to a and less than b.
Does anyone know how to create a random function on python that will include [0, 1.0]?
Please correct me if I was mistaken on any of this information. Thank you.
*The random numbers represent the x value of a three dimensional point on a sphere.
Could you make do with something like this?
random.randint(0, 1000) / 1000.0
Or more formally:
precision = 3
randomNumber = random.randint(0, 10 ** precision) / float(10 ** precision)
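The grid idea above can be pushed to a much finer resolution. The sketch below is one possible variant, not something from the documentation: it draws an integer k from 0 to 2**53 inclusive and divides by 2**53, so both 0.0 and 1.0 can occur. The function name random_closed_unit is hypothetical, and the result lies on an evenly spaced grid rather than covering every representable float.

```python
import random

def random_closed_unit():
    """Return a float on the grid {k / 2**53 : 0 <= k <= 2**53}."""
    # randint includes both endpoints, so k = 2**53 (giving exactly 1.0)
    # is as likely as any other grid point.
    return random.randint(0, 2**53) / 2**53

samples = [random_closed_unit() for _ in range(5)]
print(samples)
```

Each value k / 2**53 is exactly representable as a double, so no rounding can push a sample outside [0, 1].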
Consider the following function built on top of random.uniform. I believe that the re-sampling approach should cause all numbers in the desired interval to appear with equal probability, because the probability of returning candidate > b is 0, and originally all numbers should be equally likely.
import sys
import random
def myRandom(a, b):
    candidate = random.uniform(a, b + sys.float_info.epsilon)
    while candidate > b:
        candidate = random.uniform(a, b + sys.float_info.epsilon)
    return candidate
As gnibbler mentioned below, for the general case, it may make more sense to change both the calls to the following. Note that this will only work correctly if b > 0.
candidate = random.uniform(a, b*1.000001)
Try this:
import random
random.uniform(0.0, 1.0)
Which will, according to the documentation [Python 3.x]:
Return a random floating point number N such that a <= N <= b for a <= b and b <= N <= a for b < a.
Notice that the above paragraph states that b is in fact included in the range of possible values returned by the function. However, beware of the second part (emphasis mine):
The end-point value b may or may not be included in the range depending on floating-point rounding in the equation a + (b-a) * random().
For floating point numbers you can use numpy's machine limits for floats class (numpy.finfo) to get the machine epsilon, the gap between 1.0 and the next representable value, for 64bit or 32bit floating point numbers. In theory, you should be able to add this value to b in random.uniform(a, b), making 1 inclusive in your generator:
import numpy
import random
def randomDoublePrecision():
    floatinfo = numpy.finfo(float)
    epsilon = floatinfo.eps
    a = random.uniform(0, 1 + epsilon)
    return a
This assumes that you are using full precision floating point numbers for your number generator. For more info read this Wikipedia article.
Would it be just:
list_rnd=[random.random() for i in range(_number_of_numbers_you_want)]
list_rnd=[item/max(list_rnd) for item in list_rnd]
Generate a list of random numbers and divide each one by the list's maximum value. The resulting list still follows a uniform distribution.
I've had the same problem; this should help you.
a: lower limit,
b: upper limit, and
digit: number of digits after the decimal point
import random
def konv_des (bin,a,b,l,digit):
    des=int(bin,2)
    return round(a+(des*(b-a)/((2**l)-1)),digit)
def rand_bin(p):
    key1 = ""
    for i in range(p):
        temp = str(random.randint(0, 1))
        key1 += temp
    return(key1)
def rand_chrom(a,b,digit):
    l = 1
    eq=False
    while eq==False:
        l += 1
        eq=2**(l-1) < (b-a)*(10**digit) and (b-a)*(10**digit) <= (2**l)-1
    return konv_des(rand_bin(l),a,b,l,digit)
#run
rand_chrom(0,1,4)
