Handling large numbers to calculate the number of combinations

I wrote the following code to calculate a probability. I know the formula I used is correct. The code gives reasonable results for small values of m and n. However, for large values of m and n, the result, sumC, falls outside the interval [0, 1], which is not expected. The variable iter takes very large values, which may be the reason. How can I handle this problem?
import math
from scipy.special import comb
n = 50
m = 100
k = 15
P = []
sumC = 0
for j in range(k, m+1):
    if not (n-j < 0):
        iter = (-1)**(j+k) * comb(j, k, exact=True) * comb(m, j, exact=True) * math.factorial(n) * (m-j)**(n-j) / ((m)**(n) * math.factorial(n-j))
        sumC = sumC + iter
print(sumC)

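Using the Python mpmath arbitrary-precision library with 50 digits of precision produces a value in the range [0, 1]. The terms of the alternating sum are very large and nearly cancel each other, so double-precision floats do not carry enough digits to represent the small remainder.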
from scipy.special import comb
from mpmath import fac, mp
mp.dps = 50; mp.pretty = True
def compute_prob(k, m, n):
    """ Original summation, but using factorial ('fac') from mpmath """
    sumC = 0
    for j in range(k, m+1):
        if not (n-j < 0):
            iter_ = (-1)**(j+k) * comb(j, k, exact=True) * comb(m, j, exact=True) * fac(n) * (m-j)**(n-j) / ((m)**(n) * fac(n-j))
            sumC = sumC + iter_
    return sumC
n = 50
m = 100
k = 15
print(compute_prob(k, m, n))
Output
0.000054845306977312907595945622368606216050228369266162
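As an alternative to arbitrary-precision floats, the same sum can be evaluated exactly with rational arithmetic from the standard library. This is only a sketch, not part of the original answer: it assumes Python 3.8+ (for math.comb), and the name compute_prob_exact is made up for illustration.

```python
from fractions import Fraction
from math import comb, factorial

def compute_prob_exact(k, m, n):
    """Same summation as above, evaluated exactly with rationals (hypothetical helper)."""
    total = Fraction(0)
    # Terms with j > n are skipped, matching the `if not (n-j < 0)` guard.
    for j in range(k, min(m, n) + 1):
        numerator = ((-1) ** (j + k) * comb(j, k) * comb(m, j)
                     * factorial(n) * (m - j) ** (n - j))
        denominator = m ** n * factorial(n - j)
        total += Fraction(numerator, denominator)
    return total

p = compute_prob_exact(15, 100, 50)
print(float(p))  # convert to float only once, after all cancellation is done
```

Because every term is an exact fraction, no digits are lost during cancellation; the price is slower arithmetic on very large integers. The printed value can be compared against the mpmath output above.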

Related

binom.pmf only returning zero

code:
def expected_profit(n):
    total = 0
    X = np.arange(0,n+1)
    p = np.arange(0,n+1)
    profit = np.arange(0,n+1)
    for i in list(range(1,n+1)):
        print("X_i:", X[i])
        p[i] = binom.pmf(X[i],n,19/20)
        print(p[i])
        if X[i] > 100:
            profit[i] = 50*n-60*(X[i]-100)
        else:
            profit[i] = 50*n
        total += profit[i]*p[i]
    return total
expected_profit(10)
>>>0
For some reason, after each iteration, p[i] is equal to zero. Yet when I manually type out (for example) binom.pmf(10,10,19/20) I get a non zero answer. What is the problem here?
This seems to happen with any call to binom.pmf within the function call.

With p = np.arange(0,n+1) you initialize p as an integer array 0,...,n. As a result, the value returned by binom.pmf(...) is truncated to an integer (here 0) when it is assigned to p[i]. The solution is to make p an array of floats; np.zeros() creates a float array by default. The same problem applies to profit.
Fitting this into the code would look like:
from scipy.stats import binom
import numpy as np
def expected_profit(n):
    total = 0
    X = np.arange(0, n + 1)
    p = np.zeros(n + 1, dtype=float)  # float array, so pmf values are not truncated
    profit = np.zeros(n + 1, dtype=float)
    for i in range(1, n + 1):
        p[i] = binom.pmf(X[i], n, 19/20)
        if X[i] > 100:
            profit[i] = 50 * n - 60 * (X[i] - 100)
        else:
            profit[i] = 50 * n
        total += profit[i] * p[i]
    return total
print(expected_profit(10))
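The truncation described above can be reproduced in isolation. The following is a minimal sketch with made-up array names (p_int, p_float) and an arbitrary sample value; it only illustrates how NumPy casts on assignment and is not taken from the original answer.

```python
import numpy as np

value = 0.5987  # any probability strictly between 0 and 1

p_int = np.arange(0, 4)   # integer dtype, like p in the question
p_int[1] = value          # silently cast to the array's dtype: becomes 0

p_float = np.zeros(4)     # float64 by default
p_float[1] = value        # stored unchanged

print(p_int.dtype, p_int[1])
print(p_float.dtype, p_float[1])
```

Checking the dtype of an array before assigning fractional values into it is a quick way to spot this kind of problem.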

Graphing Integration in Python

In the following code I have implemented Simpson's rule in Python. I have attempted to plot the absolute error as a function of n for a suitable range of integer values of n. I know that the exact result should be 1-cos(pi/2). However, my graph doesn't seem to be correct. How can I fix my code to get the correct output? There were two loops, and I don't think I implemented the plotting code correctly.
def simpson(f, a, b, n):
    """Approximates the definite integral of f from a to b by the composite Simpson's rule, using n subintervals (with n even)"""
    h = (b - a) / (n)
    s = f(a) + f(b)
    diffs = {}
    for i in range(1, n, 2):
        s += 4 * f(a + i * h)
    for i in range(2, n-1, 2):
        s += 2 * f(a + i * h)
    r = s
    exact = 1 - cos(pi/2)
    diff = abs(r - exact)
    diffs[n] = diff
    ordered = sorted(diffs.items())
    x,y = zip(*ordered)
    plt.autoscale()
    plt.loglog(x,y)
    plt.xlabel("Intervals")
    plt.ylabel("Error")
    plt.show()
    return s * h / 3
simpson(lambda x: sin(x), 0.0, pi/2, 100)

Your simpson method should just calculate the integral for a single value of n (as it does), but the plot over many values of n should be created outside that method, like this:
from math import pi, cos, sin
from matplotlib import pyplot as plt
def simpson(f, a, b, n):
    """Approximates the definite integral of f from a to b by the composite Simpson's rule, using 2n subintervals """
    h = (b - a) / (2*n)
    s = f(a) + f(b)
    for i in range(1, 2*n, 2):
        s += 4 * f(a + i * h)
    for i in range(2, 2*n-1, 2):
        s += 2 * f(a + i * h)
    return s * h / 3
diffs = {}
exact = 1 - cos(pi/2)
for n in range(1, 100):
    result = simpson(lambda x: sin(x), 0.0, pi/2, n)
    diffs[2*n] = abs(exact - result) # use 2*n or n here, your choice.
ordered = sorted(diffs.items())
x,y = zip(*ordered)
plt.autoscale()
plt.loglog(x,y)
plt.xlabel("Intervals")
plt.ylabel("Error")
plt.show()
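A quick way to sanity-check such an error plot without drawing it is to estimate the convergence order numerically. The sketch below is an illustration rather than part of the original answer: it uses the variant of simpson that takes an even number n of subintervals, and the names sizes, errors and orders are made up. For a smooth integrand, composite Simpson's rule is fourth-order accurate, so doubling n should shrink the error by roughly a factor of 16.

```python
from math import cos, log2, pi, sin

def simpson(f, a, b, n):
    """Composite Simpson's rule with n subintervals (n must be even)."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n, 2):       # odd interior points, weight 4
        s += 4 * f(a + i * h)
    for i in range(2, n - 1, 2):   # even interior points, weight 2
        s += 2 * f(a + i * h)
    return s * h / 3

exact = 1 - cos(pi / 2)
sizes = [4, 8, 16, 32]
errors = [abs(simpson(sin, 0.0, pi / 2, n) - exact) for n in sizes]
# Observed order between consecutive sizes: log2(error(n) / error(2n))
orders = [log2(errors[i] / errors[i + 1]) for i in range(len(errors) - 1)]
print(orders)
```

On a log-log plot this corresponds to a straight line with slope close to -4, until rounding error takes over for very large n.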

Wiki example for Arnoldi iteration only works for real matrices?

The Wikipedia entry for the Arnoldi method provides a Python example that produces a basis of the Krylov subspace of a matrix A. Supposedly, if A is Hermitian (i.e. if A == A.conj().T), then the Hessenberg matrix h generated by this algorithm is tridiagonal (source). However, when I use the Wikipedia code on a real-world Hermitian matrix, the Hessenberg matrix is not at all tridiagonal. When I perform the computation on the real part of A (so that A == A.T), I do get a tridiagonal Hessenberg matrix, so there seems to be a problem with the imaginary components of A. Does anybody know why the Wikipedia code doesn't produce the expected results?
Working example:
import numpy as np
import matplotlib.pyplot as plt
from scipy.linalg import circulant
def arnoldi_iteration(A, b, n):
    m = A.shape[0]
    # dtype=complex: the np.complex alias has been removed from recent NumPy releases
    h = np.zeros((n + 1, n), dtype=complex)
    Q = np.zeros((m, n + 1), dtype=complex)
    q = b / np.linalg.norm(b)  # Normalize the input vector
    Q[:, 0] = q  # Use it as the first Krylov vector
    for k in range(n):
        v = A.dot(q)  # Generate a new candidate vector
        for j in range(k + 1):  # Subtract the projections on previous vectors
            h[j, k] = np.dot(Q[:, j], v)
            v = v - h[j, k] * Q[:, j]
        h[k + 1, k] = np.linalg.norm(v)
        eps = 1e-12  # If v is shorter than this threshold it is the zero vector
        if h[k + 1, k] > eps:  # Add the produced vector to the list, unless
            q = v / h[k + 1, k]  # the zero vector is produced.
            Q[:, k + 1] = q
        else:  # If that happens, stop iterating.
            return Q, h
    return Q, h
# Construct matrix A
N = 2**4
I = np.eye(N)
k = np.fft.fftfreq(N, 1.0 / N) + 0.5
alpha = np.linspace(0.1, 1.0, N)*2e2
c = np.fft.fft(alpha) / N
C = circulant(c)
A = np.einsum("i, ij, j->ij", k, C, k)
# Show that A is Hermitian
print(np.allclose(A, A.conj().T))
# Arbitrary (random) initial vector
np.random.seed(0)
v = np.random.rand(N)
# Perform Arnoldi iteration with complex A
_, h = arnoldi_iteration(A, v, N)
# Perform Arnoldi iteration with real A
_, h2 = arnoldi_iteration(np.real(A), v, N)
# Plot results
plt.subplot(121)
plt.imshow(np.abs(h))
plt.title("Complex A")
plt.subplot(122)
plt.imshow(np.abs(h2))
plt.title("Real A")
plt.tight_layout()
plt.show()
Result: [figure not preserved: two panels showing np.abs(h), titled "Complex A" and "Real A"]

After browsing through some conference presentation slides, I realised that at some point Q had to be conjugated when A is complex. The correct algorithm is posted below for reference, with the code change marked (note that this correction has also been submitted to the Wikipedia entry):
import numpy as np
def arnoldi_iteration(A, b, n):
    m = A.shape[0]
    # dtype=complex: the np.complex alias has been removed from recent NumPy releases
    h = np.zeros((n + 1, n), dtype=complex)
    Q = np.zeros((m, n + 1), dtype=complex)
    q = b / np.linalg.norm(b)
    Q[:, 0] = q
    for k in range(n):
        v = A.dot(q)
        for j in range(k + 1):
            h[j, k] = np.dot(Q[:, j].conj(), v)  # <-- Q needs conjugation!
            v = v - h[j, k] * Q[:, j]
        h[k + 1, k] = np.linalg.norm(v)
        eps = 1e-12
        if h[k + 1, k] > eps:
            q = v / h[k + 1, k]
            Q[:, k + 1] = q
        else:
            return Q, h
    return Q, h
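The conjugation matters because the inner product on complex vectors conjugates one of its arguments, whereas np.dot does not. The sketch below is an independent illustration, not the Wikipedia code: it uses np.vdot (which conjugates its first argument) in a compact variant named arnoldi, builds a small random Hermitian test matrix, and checks that the entries above the first superdiagonal of h vanish. The matrix size, seed and tolerance are arbitrary assumptions.

```python
import numpy as np

def arnoldi(A, b, n):
    """Compact Arnoldi iteration using the complex inner product (np.vdot)."""
    m = A.shape[0]
    h = np.zeros((n + 1, n), dtype=complex)
    Q = np.zeros((m, n + 1), dtype=complex)
    Q[:, 0] = b / np.linalg.norm(b)
    for k in range(n):
        v = A @ Q[:, k]
        for j in range(k + 1):
            h[j, k] = np.vdot(Q[:, j], v)  # vdot conjugates its first argument
            v = v - h[j, k] * Q[:, j]
        norm = np.linalg.norm(v)
        h[k + 1, k] = norm
        if norm <= 1e-12:  # breakdown: the Krylov subspace is exhausted
            break
        Q[:, k + 1] = v / norm
    return Q, h

rng = np.random.default_rng(0)
M = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))
A = M + M.conj().T                      # Hermitian by construction
_, h = arnoldi(A, rng.standard_normal(6), 4)
above_super = np.triu(h[:4, :4], k=2)   # entries that should vanish for Hermitian A
print(np.abs(above_super).max())
```

Replacing np.vdot with np.dot in this sketch would drop the conjugation again, which is the situation described in the question.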

Closed-form solution for finding a root

Suppose I have a Pandas Series s whose values sum to 1 and whose values are also all greater than or equal to 0. I need to subtract a constant from all values such that the sum of the new Series is equal to 0.6. The catch is, when I subtract this constant, the values never end up less than zero.
In mathematical terms, given a series of values $x_i$, I want to find $k$ such that $\sum_i \max(x_i - k, 0) = 0.6$.
MCVE
import pandas as pd
import numpy as np
from string import ascii_uppercase
np.random.seed([3, 141592653])
s = np.power(
    1000, pd.Series(
        np.random.rand(10),
        list(ascii_uppercase[:10])
    )
).pipe(lambda s: s / s.sum())
s
A 0.001352
B 0.163135
C 0.088365
D 0.010904
E 0.007615
F 0.407947
G 0.005856
H 0.198381
I 0.027455
J 0.088989
dtype: float64
The sum is 1
s.sum()
0.99999999999999989
What I've tried
I can use Newton's method (among others) found in Scipy's optimize module
from scipy.optimize import newton
def f(k):
    return s.sub(k).clip(0).sum() - .6
Finding the root of this function will give me the k I need
initial_guess = .1
k = newton(f, x0=initial_guess)
Then subtract this from s
new_s = s.sub(k).clip(0)
new_s
A 0.000000
B 0.093772
C 0.019002
D 0.000000
E 0.000000
F 0.338583
G 0.000000
H 0.129017
I 0.000000
J 0.019626
dtype: float64
And the new sum is
new_s.sum()
0.60000000000000009
Question
Can we find k without resorting to using a solver?

I was not expecting newton to carry the day. But on large arrays, it does.
numba.njit
Inspired by Thierry's answer
Using a loop on a sorted array with Numba's jit
import numpy as np
from numba import njit
@njit
def find_k_numba(a, t):
    a = np.sort(a)
    m = len(a)
    s = a.sum()
    to_remove = s - t
    if to_remove <= 0:
        k = 0
    else:
        for i, x in enumerate(a):
            k = to_remove / (m - i)
            if k < x:
                break
            else:
                to_remove -= x
    return k
numpy
Inspired by Paul's Answer
NumPy doing the heavy lifting.
import numpy as np
def find_k_numpy(a, t):
    a = np.sort(a)
    m = len(a)
    s = a.sum()
    to_remove = s - t
    if to_remove <= 0:
        k = 0
    else:
        c = a.cumsum()
        n = np.arange(m)[::-1]
        b = n * a + c
        i = np.searchsorted(b, to_remove)
        k = a[i] + (to_remove - b[i]) / (m - i)
    return k
scipy.optimize.newton
My method via Newton
import numpy as np
from scipy.optimize import newton
def find_k_newton(a, t):
    s = a.sum()
    if s <= t:
        k = 0
    else:
        def f(k_):
            return np.clip(a - k_, 0, a.max()).sum() - t
        k = newton(f, (s - t) / len(a))
    return k
Time Trials
import pandas as pd
from timeit import timeit
res = pd.DataFrame(
    np.nan, [10, 30, 100, 300, 1000, 3000, 10000, 30000],
    'find_k_newton find_k_numpy find_k_numba'.split()
)
for i in res.index:
    a = np.random.rand(i)
    t = a.sum() * .6
    for j in res.columns:
        stmt = f'{j}(a, t)'
        setp = f'from __main__ import {j}, a, t'
        res.at[i, j] = timeit(stmt, setp, number=200)
Results
res.plot(loglog=True)
res.div(res.min(1), 0)
       find_k_newton  find_k_numpy  find_k_numba
10         57.265421     17.552150      1.000000
30         29.221947      9.420263      1.000000
100        16.920463      5.294890      1.000000
300        10.761341      3.037060      1.000000
1000        1.455159      1.033066      1.000000
3000        1.000000      2.076484      2.550152
10000       1.000000      3.798906      4.421955
30000       1.000000      5.551422      6.784594
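Before comparing timings it can help to confirm that a sort-based closed form reproduces the value found by the solver on the sample data from the question. The sketch below is a standalone check under the assumption that all values are non-negative and the target does not exceed their sum; the name find_k_sorted is made up, and the ten numbers are the rounded values printed in the question.

```python
import numpy as np

def find_k_sorted(a, t):
    """Closed-form k with np.clip(a - k, 0, None).sum() == t (hypothetical helper)."""
    a = np.sort(np.asarray(a, dtype=float))
    m = len(a)
    to_remove = a.sum() - t
    if to_remove <= 0:
        return 0.0
    # removed[i] is the total amount removed if k were exactly a[i]
    removed = a.cumsum() + a * np.arange(m - 1, -1, -1)
    i = np.searchsorted(removed, to_remove)
    return a[i] + (to_remove - removed[i]) / (m - i)

values = [0.001352, 0.163135, 0.088365, 0.010904, 0.007615,
          0.407947, 0.005856, 0.198381, 0.027455, 0.088989]
k = find_k_sorted(values, 0.6)
new_sum = np.clip(np.array(values) - k, 0, None).sum()
print(k, new_sum)
```

With these inputs the five smallest values are clipped to zero and the remaining five each lose the same amount k.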

Updated: Three different implementations - interestingly, the least sophisticated scales best.
import numpy as np
def f_sort(A, target=0.6):
    B = np.sort(A)
    C = np.cumsum(np.r_[B[0], np.diff(B)] * np.arange(N, 0, -1))
    idx = np.searchsorted(C, 1 - target)
    return B[idx] + (1 - target - C[idx]) / (N-idx)
def f_partition(A, target=0.6):
    target, l = 1 - target, len(A)
    while len(A) > 1:
        m = len(A) // 2
        A = np.partition(A, m-1)
        ls = A[:m].sum()
        if ls + A[m-1] * (l-m) > target:
            A = A[:m]
        else:
            l -= m
            target -= ls
            A = A[m:]
    return target / l
def f_direct(A, target=0.6):
    target = 1 - target
    while True:
        gt = A > target / len(A)
        if np.all(gt):
            return target / len(A)
        target -= A[~gt].sum()
        A = A[gt]
N = 10
A = np.random.random(N)
A /= A.sum()
print(f_sort(A), np.clip(A-f_sort(A), 0, None).sum())
print(f_partition(A), np.clip(A-f_partition(A), 0, None).sum())
print(f_direct(A), np.clip(A-f_direct(A), 0, None).sum())
from timeit import timeit
kwds = dict(globals=globals(), number=1000)
N = 100000
A = np.random.random(N)
A /= A.sum()
print(timeit('f_sort(A)', **kwds))
print(timeit('f_partition(A)', **kwds))
print(timeit('f_direct(A)', **kwds))
Sample run:
0.04813686999999732 0.5999999999999999
0.048136869999997306 0.6000000000000001
0.048136869999997306 0.6000000000000001
8.38109541599988
2.1064437470049597
1.2743922089866828

An exact solution that needs only a sort followed by an O(n) pass (in fact fewer iterations: only as many as there are values that end up at zero).
We turn the smallest values to zero for as long as possible, then share the remaining excess equally among the remaining values:
l = [0.001352, 0.163135, 0.088365, 0.010904, 0.007615, 0.407947,
     0.005856, 0.198381, 0.027455, 0.088989]
initial_sum = sum(l)
target_sum = 0.6
# number of values not yet turned to zero
non_zero = len(l)
# amount already subtracted by subtracting the constant where possible
subtracted = 0
# what we want to subtract from each value
constant = 0
for v in sorted(l):
    if initial_sum - subtracted - non_zero * (v - constant) >= target_sum:
        subtracted += non_zero * (v - constant)
        constant = v
        non_zero -= 1
    else:
        constant += (initial_sum - subtracted - target_sum) / non_zero
        break
l = [v - constant if v > constant else 0 for v in l]
print(l)
print(sum(l))
# [0, 0.09377160000000001, 0.019001600000000007, 0, 0, 0.3385836, 0, 0.1290176, 0, 0.019625600000000007]
# 0.6
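The loop above can be read as solving a piecewise-linear equation. The derivation below is a sketch in notation that is not in the original answer: $x_{(1)} \le \dots \le x_{(n)}$ denotes the sorted values, $T$ the target sum, and $i$ the number of values clipped to zero.

```latex
\[
\sum_{j=1}^{n} \max\bigl(x_{(j)} - k,\, 0\bigr) = T,
\qquad x_{(1)} \le x_{(2)} \le \dots \le x_{(n)} .
\]
% If the i smallest values are clipped to zero, i.e. x_{(i)} <= k <= x_{(i+1)},
% only the remaining n - i terms contribute:
\[
\sum_{j=i+1}^{n} \bigl(x_{(j)} - k\bigr) = T
\quad\Longrightarrow\quad
k = \frac{\sum_{j=i+1}^{n} x_{(j)} - T}{\,n - i\,}.
\]
```

For the sample data, five values are clipped, the remaining five sum to 0.946817, and so $k = (0.946817 - 0.6)/5 \approx 0.0693634$, which matches the printed result (for example $0.163135 - k \approx 0.0937716$).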

Just wanted to add an option to @piRSquared's answer: find_k_hybrd
find_k_hybrd is a mixture of the "numba" and "newton" solutions. I use the hybrd root-finding algorithm in NumbaMinpack. NumbaMinpack is faster than SciPy for problems like this, because its root-finding methods can be called from within jit-ed functions.
import numpy as np
from NumbaMinpack import hybrd, minpack_sig
from numba import njit, cfunc
n = 10000
np.random.seed(0)
a = np.random.rand(n)
t = a.sum()*.6
@cfunc(minpack_sig)
def func(k_, fvec, args):
    t = args[n]
    amax = -np.inf
    for i in range(n):
        if args[i] > amax:
            amax = args[i]
    args1 = np.empty(n)
    for i in range(n):
        args1[i] = args[i] - k_[0]
        if args1[i] < 0.0:
            args1[i] = 0.0
        elif args1[i] > amax:
            args1[i] = amax
    argsum = args1.sum()
    fvec[0] = argsum - t
funcptr = func.address
@njit
def find_k_hybrd(a, t):
    s = a.sum()
    if s <= t:
        k = 0.0
    else:
        k_init = np.array([(s - t) / len(a)])
        args = np.append(a, np.array([t]))
        sol = hybrd(funcptr, k_init, args)
        k = sol[0][0]
    return k
print(find_k_hybrd(a, t))

Simpson's Rule returning 0

I coded a function for Simpson's Rule of numerical integration. For values of n more than or equal to 34, the function returns 0.
Here, n is the number of intervals, a is the start point, and b is the end point.
import math
def simpsons(f, a,b,n):
    x = []
    h = (b-a)/n
    for i in range(n+1):
        x.append(a+i*h)
    I=0
    for i in range(1,(n/2)+1):
        I+=f(x[2*i-2])+4*f(x[2*i-1])+f(x[2*i])
    return I*(h/3)
def func(x):
    return (x**(3/2))/(math.cosh(x))
x = []
print(simpsons(func,0,100,34))
I am not sure why this is happening. I also coded a function for the Trapezoidal Method and that does not return 0 even when n = 50. What is going on here?

Wikipedia has code for Simpson's rule in Python. Note the first line, which makes / perform true division under Python 2:
from __future__ import division # Python 2 compatibility
import math
def simpson(f, a, b, n):
    """Approximates the definite integral of f from a to b by the
    composite Simpson's rule, using n subintervals (with n even)"""
    if n % 2:
        raise ValueError("n must be even (received n=%d)" % n)
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n, 2):
        s += 4 * f(a + i * h)
    for i in range(2, n-1, 2):
        s += 2 * f(a + i * h)
    return s * h / 3
def func(x):
    return (x**(3/2))/(math.cosh(x))
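A plausible explanation for the zero result, assuming the code in the question was run under Python 2 (where / between two integers floors the result, which is also the only way range(1,(n/2)+1) would run), is that both h = (b-a)/n and the final factor h/3 were computed with integer division. The sketch below mimics that behaviour in Python 3 with the // operator; the names h_floor, factor and results are made up for illustration.

```python
a, b = 0, 100
results = {}
for n in (32, 33, 34, 50):
    h_floor = (b - a) // n   # what (b-a)/n yields for ints under Python 2
    factor = h_floor // 3    # what h/3 yields: 0 as soon as h_floor < 3
    results[n] = (h_floor, factor)
    print(n, h_floor, factor)
```

Under this assumption, the step size drops to 2 from n = 34 onwards, h/3 becomes 0, and the whole sum is multiplied by zero. Passing floats (for example 0.0 and 100.0) or enabling true division avoids the problem.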
