I want to plot the Poisson distribution, but I get negative probabilities for lambda >= 9.
This code generates plots for different lambdas:
import numpy as np
import matplotlib.pyplot as plt
from scipy.special import factorial
for lambda_val in range(1, 12, 2):
    plt.figure()
    k = np.arange(0, 20)
    y = np.power(lambda_val, k) * np.exp(-lambda_val) / factorial(k)
    plt.bar(k, y)
    plt.title('lambda = ' + str(lambda_val))
    plt.xlabel('k')
    plt.ylabel('probability')
    plt.ylim([-0.1, 0.4])
    plt.grid()
    plt.show()
Please see these two plots (for lambda = 5 and lambda = 9): lambda = 5 looks fine in my opinion, but lambda = 9 does not.
I'm quite sure it has something to do with np.power because
np.power(11, 9)
gives me: -1937019605, whereas
11**9
gives me: 2357947691 (same in WolframAlpha).
But if I avoid np.power and use
y = (lambda_val**k)*math.exp(-lambda_val)/factorial(k)
for calculating the probability, I get negative values as well. I am totally confused. Can anybody explain this effect, or tell me what I am doing wrong? Thanks in advance. :)
Your problem is due to 32-bit integer overflow. This happens because NumPy's default integer type is sometimes 32-bit even though the platform (OS + processor) is a 64-bit one. The overflow occurs because NumPy automatically converts the unbounded integers of the Python interpreter to its native np.int_ type. You can check whether this type is 64-bit with np.int_ is np.int64. AFAIK, the default NumPy binary packages for Windows available via pip use 32-bit integers, while the Linux packages use 64-bit integers (assuming you are on a 64-bit platform).
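For example, a quick way to inspect what your build actually uses:

import numpy as np

print(np.int_ is np.int64)  # False on builds whose default integer is 32-bit
print(np.iinfo(np.int_))    # bounds of the default integer type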
The issue can be easily reproduced using:
In [546]: np.power(np.int32(11), np.int32(9))
Out[546]: -1937019605
It can also be solved using:
In [547]: np.power(np.int64(11), np.int64(9))
Out[547]: 2357947691
In the second expression you use k, which is of type np.int_ by default, and this is almost certainly why you get the same problem. Fortunately, you can tell NumPy that the integers should be bigger. Note that NumPy has some implicit rules to avoid overflow, but it is hard to avoid it in all cases without strongly impacting performance. Here is a fixed formula:
k = np.arange(0, 20, dtype=np.int64)
y = np.power(lambda_val, k) * np.exp(-lambda_val) / factorial(k)
The rule of thumb is to be very careful about implicit conversions when you get unexpected results.
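A caveat: even np.int64 can overflow here, since for lambda_val = 11 and k = 19 the term 11**19 exceeds the int64 maximum of 2**63 - 1. Computing the PMF in floating point sidesteps integer overflow entirely; a minimal sketch using scipy.stats.poisson (not used in the original code):

import numpy as np
from scipy.stats import poisson

k = np.arange(0, 20)
for lambda_val in range(1, 12, 2):
    y = poisson.pmf(k, lambda_val)  # evaluated in floating point, never negative
    assert (y >= 0).all()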
I'm having a really hard time translating this MATLAB code to Python. I'll show you my effort so far. This is the MATLAB code:
Sigma=BW1/(2*(2*(-log(10^(att_bw/10)))^(1/Order))^(1/2))
Now I tried to use the Python power operator **, which I studied earlier this morning. My code is:
BW1 = np.array([100])
att_bw = np.array([-3])
Order = np.array([1])
Sigma = BW1/(2*(2*(-np.log(10**(att_bw[0]/10)))**(1/Order))**(1/2))
Unfortunately, it says that it cannot handle negative powers. The result for Sigma should be 42.539.
EDIT: it seems my code runs perfectly fine in Python 3. However, I'm stuck with Python 2.7. So is there an easy way to port it?
In Python 2 you need to make sure you use floating-point numbers. To do so, add a . after each integer literal in your formula, like this:
import numpy as np
BW1 = np.array([100])
att_bw = np.array([-3])
Order = np.array([1])
Sigma = BW1/(2.*(2.*(-np.log(10.**(att_bw[0]/10.)))**(1./Order))**(1./2.))
print Sigma
Output
[42.53892736]
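Alternatively, you can switch the whole module to true division, which avoids adding . after every literal:

from __future__ import division  # / now behaves as in Python 3
import numpy as np

BW1 = np.array([100])
att_bw = np.array([-3])
Order = np.array([1])
Sigma = BW1/(2*(2*(-np.log(10**(att_bw[0]/10)))**(1/Order))**(1/2))
print Sigma  # [42.53892736]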
from functools import partial
import hypothesis as h
import hypothesis.strategies as hs
import hypothesis.extra.numpy as hnp
import numpy as np
floats_notnull = partial(hs.floats, allow_nan=False, allow_infinity=False)
complex_notnull = partial(hs.complex_numbers, allow_nan=False, allow_infinity=False)
data_strategy_real = hnp.arrays(
    np.float64,
    hs.tuples(hs.integers(min_value=2, max_value=50),
              hs.integers(min_value=2, max_value=5)),
    floats_notnull()
)
data_strategy_complex = hnp.arrays(
    np.complex64,
    hs.tuples(hs.integers(min_value=2, max_value=50), hs.just(1)),
    complex_notnull()
)
data_strategy = hs.one_of(data_strategy_real, data_strategy_complex)
If you run data_strategy.example() a couple times, you'll notice that some of the values in the result have infinite real or imaginary parts. My intention here was to specifically disallow infinite or NaN parts.
What am I doing wrong?
Update: if I use
data_strategy = hs.lists(complex_notnull(), min_size=2, max_size=50)
and convert that to an array inside my test, the problem appears to go away. Are the complex numbers overflowing? I'm not getting the usual deprecation warning about overflow from Hypothesis.
And if I use
data_strategy = data_strategy_real
no infs appear.
Yep, the root cause of this problem is that you're generating 64-bit finite floats, then casting them to 32-bit (because complex64 is a pair of 32-bit floats). You can fix that with the width=32 argument to floats():
floats_notnull_32 = partial(hs.floats, allow_nan=False, allow_infinity=False, width=32)
And you're not getting the usual overflow check because it's only implemented for floats and integers at the moment. I've opened (edit: and fixed) issue #1591 to check complex and string types too.
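With width-32 floats available, one way to rebuild the complex strategy is via hs.builds (a sketch, not necessarily the canonical fix):

floats_notnull_32 = partial(hs.floats, allow_nan=False, allow_infinity=False, width=32)

data_strategy_complex = hnp.arrays(
    np.complex64,
    hs.tuples(hs.integers(min_value=2, max_value=50), hs.just(1)),
    hs.builds(complex, floats_notnull_32(), floats_notnull_32())  # real and imaginary parts fit in 32 bits
)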
The complex64 type is too small and it's overflowing. Somehow Hypothesis is failing to catch this.
Switching to complex128 fixed the problem for now.
I seem to have found a pitfall with using .sum() on numpy arrays but I'm unable to find an explanation. Essentially, if I try to sum a large array then I start getting nonsensical answers but this happens silently and I can't make sense of the output well enough to Google the cause.
For example, this works exactly as expected:
a = sum(xrange(2000))
print('a is {}'.format(a))
b = np.arange(2000).sum()
print('b is {}'.format(b))
Giving the same output for both:
a is 1999000
b is 1999000
However, this does not work:
c = sum(xrange(200000))
print('c is {}'.format(c))
d = np.arange(200000).sum()
print('d is {}'.format(d))
Giving the following output:
c is 19999900000
d is -1474936480
And on an even larger array, it's possible to get back a positive result. This is more insidious because I might not identify that something unusual was happening at all. For example this:
e = sum(xrange(100000000))
print('e is {}'.format(e))
f = np.arange(100000000).sum()
print('f is {}'.format(f))
Gives this:
e is 4999999950000000
f is 887459712
I guessed that this was to do with data types and indeed even using the python float seems to fix the problem:
e = sum(xrange(100000000))
print('e is {}'.format(e))
f = np.arange(100000000, dtype=float).sum()
print('f is {}'.format(f))
Giving:
e is 4999999950000000
f is 4.99999995e+15
I have no background in Comp. Sci. and found myself stuck (perhaps this is a dupe). Things I've tried:
numpy arrays have a fixed size. Nope; this seems to show I should hit a MemoryError first.
I might somehow have a 32-bit installation (probably not relevant); nope, I followed this and confirmed I have 64-bit.
Other examples of weird sum behaviour; nope (?) I found this but I can't see how it applies.
Can someone please explain briefly what I'm missing and tell me what I need to read up on? Also, other than remembering to define a dtype each time, is there a way to stop this happening or give a warning?
Possibly relevant:
Windows 7
numpy 1.11.3
Running out of Enthought Canopy on Python 2.7.9
On Windows (even on a 64-bit system), the default integer type NumPy uses when converting from Python ints is 32-bit. On Linux and Mac it is 64-bit.
Specify a 64-bit integer and it will work:
d = np.arange(200000, dtype=np.int64).sum()
print('d is {}'.format(d))
Output:
c is 19999900000
d is 19999900000
While not the most elegant solution, you can do some monkey patching using functools.partial:
from functools import partial
np.arange = partial(np.arange, dtype=np.int64)
From now on np.arange works with 64-bit integers as default.
This is clearly numpy's integer type overflowing 32 bits. Normally you can configure numpy to fail in such situations using np.seterr:
>>> import numpy as np
>>> np.seterr(over='raise')
{'divide': 'warn', 'invalid': 'warn', 'over': 'warn', 'under': 'ignore'}
>>> np.int8(127) + np.int8(2)
FloatingPointError: overflow encountered in byte_scalars
However, sum is explicitly documented with the behaviour "No error is raised on overflow", so you might be out of luck here. With numpy, performance often comes at the cost of conveniences like overflow checking!
You can however manually specify the dtype for the accumulator, like this:
>>> a = np.ones(129)
>>> a.sum(dtype=np.int8) # will overflow
-127
>>> a.sum(dtype=np.int64) # no overflow
129
Watch ticket #593, because this is an open issue and it might be fixed by numpy devs sometime.
I'm not a numpy expert, but can reproduce your arange(200000) result in pure Python:
>>> s = 0
>>> for i in range(200000):
... s += i
... s &= 0xffffffff
>>> s
2820030816
>>> s.bit_length()
32
>>> s - 2**32  # adjust because the sign bit is set
-1474936480
In other words, the result you're seeing is what I expect if numpy is doing its arithmetic on signed 2's-complement 32-bit integers.
Since I'm not a numpy expert, I can't suggest a good approach to never getting surprised (I would have left this as a comment, but I couldn't show nicely formatted code then).
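The same wraparound can be checked directly with modular arithmetic on the exact sum:

>>> exact = sum(range(200000))
>>> exact
19999900000
>>> low32 = exact % 2**32   # keep only the low 32 bits
>>> low32 - 2**32           # reinterpret as signed, since the sign bit is set
-1474936480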
Numpy's default integer type is the same as the C long type. Now, this isn't guaranteed to be 64 bits on a 64-bit platform. In fact, on Windows, long is always 32 bits.
As a result, the numpy sum overflows the value and wraps around.
Unfortunately, as far as I know, there is no way to change the default dtype. You'll have to specify it as np.int64 every time.
You could try to create your own arange:
def arange(*args, **kw):
    return np.arange(dtype=np.int64, *args, **kw)
and then use that version instead of numpy's.
EDIT: If you want to flag this, you could just put something like this in the top of your code:
assert np.array(0).dtype.name != 'int32', 'This needs to be run with 64-bit integers!'
Python, NumPy and R all use the same algorithm (Mersenne Twister) for generating random number sequences. Thus, theoretically speaking, setting the same seed should produce the same random number sequence in all three. This is not the case. I think the three implementations use different parameters, causing this behavior.
R
> set.seed(1)
> runif(5)
[1] 0.2655087 0.3721239 0.5728534 0.9082078 0.2016819
Python
In [3]: random.seed(1)
In [4]: [random.random() for x in range(5)]
Out[4]:
[0.13436424411240122,
0.8474337369372327,
0.763774618976614,
0.2550690257394217,
0.49543508709194095]
NumPy
In [23]: import numpy as np
In [24]: np.random.seed(1)
In [25]: np.random.rand(5)
Out[25]:
array([ 4.17022005e-01, 7.20324493e-01, 1.14374817e-04,
3.02332573e-01, 1.46755891e-01])
Is there some way where the NumPy and Python implementations could produce the same random number sequence? Of course, as some comments and answers point out, one could use rpy2. What I am specifically looking for is to fine-tune the parameters in the respective calls in Python and NumPy to get the same sequence (see the state-transplant sketch after the related links below).
Context: The concern comes from an EDX course offering in which R is used. In one of the forums, it was asked if Python could be used and the staff replied that some assignments would require setting specific seeds and submitting answers.
Related:
Comparing Matlab and Numpy code that uses random number generation From this it seems that the underlying NumPy and Matlab implementation are similar.
python vs octave random generator: This question does come fairly close to the intended answer. Some sort of wrapper around the default state generator is required.
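As a side note: Python's random and NumPy's legacy RandomState both draw 53-bit doubles from the same MT19937 core; only the seeding differs. Transplanting the internal state should therefore align the two streams; a sketch, assuming the documented layout of random.getstate() (624 key words plus a position index):

import random
import numpy as np

random.seed(1)
version, internal_state, gauss_next = random.getstate()
np.random.set_state(('MT19937',
                     np.array(internal_state[:-1], dtype=np.uint32),
                     internal_state[-1]))

print(random.random())   # 0.13436424411240122, as in the Python sequence above
print(np.random.rand())  # should print the same value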
Use rpy2 to call R from Python. Here is a demo; the numpy array data shares memory with x in R:
import numpy as np
import rpy2.robjects as robjects

data = robjects.r("""
set.seed(1)
x <- runif(5)
""")
print np.array(data)

data[1] = 1.0
print robjects.r["x"]
I realize this is an old question, but I stumbled upon the same problem recently and created a solution which may be useful to others.
I wrote a random number generator in C and linked it to both R and Python. This way the random numbers are guaranteed to be the same in both languages, since they are generated by the same C code.
The program is called SyncRNG and can be found here: https://github.com/GjjvdBurg/SyncRNG.
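From memory, the Python side is used roughly like this (treat the class and method names as assumptions and verify against the repository's README):

from SyncRNG import SyncRNG  # hypothetical usage; check the README for the exact API

s = SyncRNG(seed=123456)
for _ in range(5):
    print(s.randi())  # should match the stream from the R package seeded with 123456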
I'm a Python newbie coming from using MATLAB extensively. I was converting some code that uses log2 in MATLAB, and when I used the NumPy log2 function I got a different result than I was expecting for such a small number. I was surprised, since the precision of the numbers should be the same (i.e. MATLAB double vs NumPy float64).
MATLAB Code
a = log2(64);
--> a=6
Base Python Code
import math
a = math.log2(64)
--> a = 6.0
NumPy Code
import numpy as np
a = np.log2(64)
--> a = 5.9999999999999991
Modified NumPy Code
import numpy as np
a = np.log(64) / np.log(2)
--> a = 6.0
So the native NumPy log2 function gives a result that causes the code to fail a test since it is checking that a number is a power of 2. The expected result is exactly 6, which both the native Python log2 function and the modified NumPy code give using the properties of the logarithm. Am I doing something wrong with the NumPy log2 function? I changed the code to use the native Python log2 for now, but I just wanted to know the answer.
No. There is nothing wrong with the code; it is just that floating-point numbers cannot always be represented exactly on our computers. Always use an epsilon value to allow a range of error when checking float values. Read The Floating Point Guide and this post to know more.
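For example, the power-of-two check can compare against the nearest integer with a tolerance; a minimal sketch using np.isclose:

import numpy as np

a = np.log2(64)         # may come out as 5.9999999999999991 on some builds
k = int(np.round(a))    # nearest integer exponent
is_power_of_two = np.isclose(a, k) and 2 ** k == 64
print(is_power_of_two)  # True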
EDIT - As cgohlke has pointed out in the comments: depending on the compiler used to build NumPy, np.log2(x) is either computed by the C library or as 1.442695040888963407359924681001892137*np.log(x) (see this link). This may be the reason for the inexact output.