I have the following question:
Suppose that a given coin is known to be either a fair coin or else a biased coin such as that described in part a). Although you do not know which it is, you assign a prior probability of 0.8 to the hypothesis that the coin is fair.
The coin is tossed and lands heads. Use Bayes’ theorem to determine the probability that the coin is fair and the probability that the coin is biased* given this observation.
The Theorem is as follows:
P(A|B) = P(B|A) / P(B) * P(A)
The code below is my attempt but I got an error saying I can't use "|" with floats:
print(((0.2|0.8) / 0.2) * 0.8)
The operator "|" is called "binary OR" and it is a binary operator, does not work for floats.
Additionally, P(B|A) is a single probability, read as "Probability of B given condition A happening." It is not the "|" of the single probabilities. In order for Bayes' theorem to work, you need to have 3 inputs, not two.
Check out Bayes' Theorem on Math is Fun
Related
I happen to have a numpy array of floats:
a.dtype, a.shape
#(dtype('float64'), (32769,))
The values are:
a[0]
#3.699822718929953
all(a == a[0])
True
However:
a.mean()
3.6998227189299517
The mean is off by 15th and 16th figure.
Can anybody show how this difference is accumulated over 30K mean and if there is a way to avoid it?
In case it matters my OS is 64 bit.
Here is a rough approximation of a bound on the maximum error. This will not be representative of average error, and it could be improved with more analysis.
Consider calculating a sum using floating-point arithmetic with round-to-nearest ties-to-even:
sum = 0;
for (i = 0; i < n; ++n)
sum += a[i];
where each a[i] is in [0, m).
Let ULP(x) denote the unit of least precision in the floating-point number x. (For example, in the IEEE-754 binary64 format with 53-bit significands, if the largest power of 2 not greater than |x| is 2p, then ULP(x) = 2p−52. With round-to-nearest, the maximum error in any operation with result x is ½ULP(x).
If we neglect rounding errors, the maximum value of sum after i iterations is i•m. Therefore, a bound on the error in the addition in iteration i is ½ULP(i•m). (Actually zero for i=1, since that case adds to zero, which has no error, but we neglect that for this approximation.) Then the total of the bounds on all the additions is the sum of ½ULP(i•m) for i from 1 to n. This is approximately ½•n•(n+1)/2•ULP(m) = ¼•n•(n+1)•ULP(m). (This is an approximation because it moves i outside the ULP function, but ULP is a discontinuous function. It is “approximately linear,“ but there are jumps. Since the jumps are by factors of two, the approximation can be off by at most a factor of two.)
So, with 32,769 elements, we can say the total rounding error will be at most about ¼•32,769•32,770•ULP(m), about 2.7•108 times the ULP of the maximum element value. The ULP is 2−52 times the greatest power of two not less than m, so that is about 2.7•108•2−52 = 6•10−8 times m.
Of course, the likelihood that 32,768 sums (not 32,769 because the first necessarily has no error) all round in the same direction by chance is vanishingly small but I conjecture one might engineer a sequence of values that gets close to that.
An Experiment
Here is a chart of (in blue) the mean error over 10,000 samples of summing arrays with sizes 100 to 32,800 by 100s and elements drawn randomly from a uniform distribution over [0, 1). The error was calculated by comparing the sum calculated with float (IEEE-754 binary32) to that calculated with double (IEEE-754 binary64). (The samples were all multiples of 2−24, and double has enough precision so that the sum for up to 229 such values is exact.)
The green line is c n √n with c set to match the last point of the blue line. We see it tracks the blue line over the long term. At points where the average sum crosses a power of two, the mean error increases faster for a time. At these points, the sum has entered a new binade, and further additions have higher average errors due to the increased ULP. Over the course of the binade, this fixed ULP decreases relative to n, bringing the blue line back to the green line.
This is due to incapability of float64 type to store the sum of your float numbers with correct precision. In order to get around this problem you need to use a larger data type of course*. Numpy has a longdouble dtype that you can use in such cases:
In [23]: np.mean(a, dtype=np.longdouble)
Out[23]: 3.6998227189299530693
Also, note:
In [25]: print(np.longdouble.__doc__)
Extended-precision floating-point number type, compatible with C
``long double`` but not necessarily with IEEE 754 quadruple-precision.
Character code: ``'g'``.
Canonical name: ``np.longdouble``.
Alias: ``np.longfloat``.
Alias *on this platform*: ``np.float128``: 128-bit extended-precision floating-point number type.
* read the comments for more details.
The mean is (by definition):
a.sum()/a.size
Unfortunately, adding all those values up and dividing accumulates floating point errors. They are usually around the magnitude of:
np.finfo(np.float).eps
Out[]: 2.220446049250313e-16
Yeah, e-16, about where you get them. You can make the error smaller by using higher-accuracy floats like float128 (if your system supports it) but they'll always accumulate whenever you're summing a large number of float together. If you truly want the identity, you'll have to hardcode it:
def mean_(arr):
if np.all(arr == arr[0]):
return arr[0]
else:
return arr.mean()
In practice, you never really want to use == between floats. Generally in numpy we use np.isclose or np.allclose to compare floats for exactly this reason. There are ways around it using other packages and leveraging arcane machine-level methods of calculating numbers to get (closer to) exact equality, but it's rarely worth the performance and clarity hit.
For the moment, put aside any issues relating to pseudorandom number generators and assume that numpy.random.rand perfectly samples from the discrete distribution of floating point numbers over [0, 1). What are the odds getting at least two exactly identical floating point numbers in the result of:
numpy.random.rand(n)
for any given value of n?
Mathematically, I think this is equivalent to first asking how many IEEE 754 singles or doubles there are in the interval [0, 1). Then I guess the next step would be to solve the equivalent birthday problem? I'm not really sure. Anyone have some insight?
The computation performed by numpy.random.rand for each element generates a number 0.<53 random bits>, for a total of 2^53 equally likely outputs. (Of course, the memory representation isn't a fixed-point 0.stuff; it's still floating point.) This computation is incapable of producing most binary64 floating-point numbers between 0 and 1; for example, it cannot produce 1/2^60. You can see the code in numpy/random/mtrand/randomkit.c:
double
rk_double(rk_state *state)
{
/* shifts : 67108864 = 0x4000000, 9007199254740992 = 0x20000000000000 */
long a = rk_random(state) >> 5, b = rk_random(state) >> 6;
return (a * 67108864.0 + b) / 9007199254740992.0;
}
(Note that rk_random produces 32-bit outputs, regardless of the size of long.)
Assuming a perfect source of randomness, the probability of repeats in numpy.random.rand(n) is 1-(1-0/k)(1-1/k)(1-2/k)...(1-(n-1)/k), where k=2^53. It's probably best to use an approximation instead of calculating this directly for large values of n. (The approximation may even be more accurate, depending on how the approximation error compares to the rounding error accumulated in a direct computation.)
I think you are correct, this is like the birthday problem.
But you need to decide on the number of possible options. You do this by deciding the precision of your floating point numbers.
For example, if you decide to have a precision of 2 numbers after the dot, then there are 100 options(including zero and excluding 1).
And if you have n numbers then the probability of not having a collision is:
or when given R possible numbers and N data points, the probability of no collision is:
And of collision is 1 - P.
This is because the probability of getting any given number is 1/R. And at any point, the probability of a data point not colliding with prior data points is (R-i)/R for i being the index of the data point. But to get the probability of no data points colliding with each other, we need to multiply all the probabilities of data points not colliding with those prior to them. Applying some algebraic operations, we get the equation above.
I want to find numerical solutions to the following exponential equation where a,b,c,d are constants and I want to solve for r, which is not equal to 1.
a^r + b^r = c^r + d^r (Equation 1)
I define a function in order to use Scipy.optimize.fsolve:
from scipy.optimize import fsolve
def func(r,a,b,c,d):
if r==1:
return 10**5
else:
return ( a**(1-r) + b**(1-r) ) - ( c**(1-r) + d**(1-r) )
fsolve(funcp,0.1, args=(5,5,4,7))
However, the fsolve always returns 1 as the solution, which is not what I want. Can someone help me with this issue? Or in general, tell me how to solve (Equation 1). I used an online numerical solver long time ago, but I cannot find it anymore. That's why I am trying to figure it out using Python.
You need to apply some mathematical reasoning when choosing the initial guess. Consider your problem f(r) = (51-r + 51-r) − (41-r + 71-r)
When r ≤ 1, f(r) is always negative and decreasing (since 71-r is growing much faster than other terms). Therefore, all root-finding algorithms will be pushed to right towards 1 until reaching this local solution.
You need to pick a point far away from 1 on the right to find the nontrivial solution:
>>> scipy.optimize.fsolve(lambda r: 5**(1-r)+5**(1-r)-4**(1-r)-7**(1-r), 2.0)
array([ 2.48866034])
Simply setting f(1) = 105 is not going to have any effect, as the root-finding algorithm won't check f(1) until the very last step(note).
If you wish to apply a penalty, the penalty must be applied to a range of value around 1. One way to do so, without affecting the position of other roots, is to divide the whole function by (r − 1):
>>> scipy.optimize.fsolve(lambda r: (5**(1-r)+5**(1-r)-4**(1-r)-7**(1-r)) / (r-1), 0.1)
array([ 2.48866034])
(note): they may climb like f(0.1) → f(0.4) → f(0.7) → f(0.86) → f(0.96) → f(0.997) → … and stop as soon as |f(x)| < 10-5, so your f(1) is never evaluated
First of your code seems to uses a different equation than your question: 1-r instead of just r.
Valid answers to the equation is 1 and 2.4886 approximately as can be seen here. With the second argument of fsolve you specify a starting estimate. I think due to 0.1 being close to 1 you get that result. Using the 2.1 as starting estimate I get the other answer 2.4886.
from scipy.optimize import fsolve
def func(r,a,b,c,d):
if r==1:
return 10**5
else:
return ( a**(1-r) + b**(1-r) ) - ( c**(1-r) + d**(1-r) )
print(fsolve(func, 2.1, args=(5,5,4,7)))
Chosing a starting estimate is tricky as many give the following error: ValueError: Integers to negative integer powers are not allowed.
I found some Python code that claims checking primality based on Fermat's little theorem:
def CheckIfProbablyPrime(x):
return (2 << x - 2) % x == 1
My questions:
How does it work?
What's its relation to Fermat's little theorem?
How accurate is this method?
If it's not accurate, what's the advantage of using it?
I found it here.
1. How does it work?
Fermat's little theorem says that if a number x is prime, then for any integer a:
If we divide both sides by a, then we can re-write the equation as follows:
I'm going to punt on proving how this works (your first question) because there are many good proofs (better than I can provide) on this wiki page and under some Google searches.
2. Relation between code and theorem
So, the function you posted checks if (2 << x - 2) % x == 1.
First off, (2 << x-2) is the same thing as writing 2**(x-1), or in math-form:
That's because << is the logical left-shift operator, which is explained better here. The relation between bit-shifting and multiplying by powers of 2 is specific to the way that numbers are represented on computers (in binary), but it all boils down to
I can subtract 1 from the exponent on both sides, which gives
Now, we know from above that for any number a,
Let's say then that a = 2. That gives us
Well heck, that's the same as 2 << (x-2)! So then we can write:
Which leads to the final relation:
Now, the math version of mod looks kind of odd, but we can write the equivalent code as follows:
(2 << x - 2) % x == 1
And that's the relation.
3. Accuracy of method
So, I think "accuracy" is a bad term here, because Fermat's little theorem is definitely true for all prime numbers. However, that does not mean that it's true or false for all numbers -- which is to say, if I have some number i, and I'm not sure if i is prime, using Fermat's Little Relation will only tell me if it is definitely NOT prime. If Fermat's Little Relation is true, then i could not be prime. These kinds of numbers are called pseudoprime numbers, or more specifically in this case Fermat Pseudoprime numbers.
If this sort of thing sounds interesting, take a look at the Carmichael numbers AKA the Absolute Fermat Pseudoprimes, which pass the Fermat test in any base but are not prime. In our case we run into numbers which pass in base 2, but Fermat's little theorem might not hold for these numbers in other bases -- the Carmichael numbers pass the test for all bases coprime to x.
On the wiki page of the Carmichael there is a discussion of their distribution over the range of natural numbers -- they appear exponentially with the size of the range over which you're looking, though the exponent is less than 1 (about 1/3). So, if you're searching for primes over a big range, you're going to run into exponentially more Carmichael numbers, which are effectively false positives for this method CheckIfProbablyPrime. That might be okay, depending on your input and how much you care about running into false positives.
4. Why is this useful?
In short, it's an optimization.
The main reason to use something like this is to speed up a search for prime numbers. That's because actually checking if a number is prime is expensive -- i.e. more than O(1) running time. Doable, but still more expensive than O(1) time. So, if we can avoid doing that actual check for some numbers, we'll be able to devote more time to checking actual candidates. Since Fermat's little relation will only say yes if a number is possibly prime (it will never say no if the number is prime), and it can be checked in O(1) time, we can toss it into an is_prime loop to ignore a fair amount of numbers. So, we can speed things up.
There are many primality checks like this one, you can find some coded prime checkers here
Final Note
One of the confusing things about this optimization is that it uses the bit shift operator << instead of the exponentiation operator **. This is because bit shifting is one of the fastest operations that your computer can do, while exponentiation is slower by some amount. It is not always the best optimization in many cases, because most modern languages know how to replace things we write with more optimized operations. But, that's my venture as to why the authors of this code used the bit shift instead of 2**(x-1).
Edit: As MarkDickinson notes, taking the exponent of a number and then modding it explicitly is not the best way to do it. This is a thing called modular exponentiation, and there exist algorithms which can do it faster than the way we've written it. Python's builtin pow actually implements one of these algorithms, and takes an optional third argument to mod by. So we can write a final version of this function:
def CheckIfProbablyPrime(x):
return pow(2, x-1, x) == 1
Which is not only more readable but also faster than the confusing bit-shift crap. You know what they say.
I believe, the code in your example is incorrect because binary left shift operator is not equivalent to power of a number, which is used in Fermat's little theorem. With base of two, binary left shift would be equal to power of x + 1, which is NOT used in a version of Fermat's little format.
Instead, use ** for power of integer in Python.
def CheckIfProbablyPrime(x):
return (2 ** x - 2) % x == 0
" p − a is an integer multiple of p " therefore for primes, following theorem, result of 2 in power of x - 2 divided by x will leave a leftover of 0 (modulo '%' checks for number left over after division.
For x - 1 version,
def CheckIfProbablyPrime(a, x):
return (a ** (x-1) - 1) % x == 0
both variations should result as true for prime numbers, because they're representing the Fermat's little theorem in Python
I have to make a program that can take from the user two polynomials (string) to calculate the result. The problem is that in polynomials the program must sum the coefficients that have the same power. I have to make it using class. I'm beginner in python. Thank you
Do you really want to parse the polynomials from strings? Because you can always represent polynomials as a list of coefficients for each of the terms. E.g.
[3 0 2 1]
would represent the polynomial
3 + 2*x^2 + x^3 == 0
Once you have that representation, summing the polynomials is trivial.
If you want to accept strings as input, you should first extract the coefficients from the string and build a coefficient vector, and then proceed.
Edit: Reversed the representation of the polynomial to make it more natural in the sense that element i represents the coefficient of x^i.