Hypothesis strategy generating inf when specifically asked not to - python

from functools import partial
import hypothesis as h
import hypothesis.strategies as hs
import hypothesis.extra.numpy as hnp
import numpy as np
floats_notnull = partial(hs.floats, allow_nan=False, allow_infinity=False)
complex_notnull = partial(hs.complex_numbers, allow_nan=False, allow_infinity=False)
data_strategy_real = hnp.arrays(
    np.float64,
    hs.tuples(hs.integers(min_value=2, max_value=50),
              hs.integers(min_value=2, max_value=5)),
    floats_notnull()
)
data_strategy_complex = hnp.arrays(
    np.complex64,
    hs.tuples(hs.integers(min_value=2, max_value=50), hs.just(1)),
    complex_notnull()
)
data_strategy = hs.one_of(data_strategy_real, data_strategy_complex)
If you run data_strategy.example() a couple times, you'll notice that some of the values in the result have infinite real or imaginary parts. My intention here was to specifically disallow infinite or NaN parts.
What am I doing wrong?
Update: if I use
data_strategy = hs.lists(complex_notnull(), min_size=2, max_size=50)
and convert that to an array inside my test, the problem appears to go away. Are the complex numbers overflowing? I'm not getting the usual deprecation warning about overflow from Hypothesis.
And if I use
data_strategy = data_strategy_real
no infs appear.

The complex64 type is too small and it's overflowing. Somehow Hypothesis is failing to catch this.
Yep, the root cause of this problem is that you're generating 64-bit finite floats, then casting them to 32-bit (because complex64 is a pair of 32-bit floats). You can fix that with the width=32 argument to floats():
floats_notnull_32 = partial(hs.floats, allow_nan=False, allow_infinity=False, width=32)
And you're not getting the usual overflow check because it's only implemented for floats and integers at the moment. I've opened (edit: and fixed) issue #1591 to check complex and string types too.
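Putting it together, a corrected complex strategy might look like the sketch below. This is illustrative only: the complex_notnull_32 helper is my own name, not from the answer, and it simply builds complex values whose parts already fit in 32-bit floats so that the cast to np.complex64 cannot overflow:
from functools import partial

import hypothesis.strategies as hs
import hypothesis.extra.numpy as hnp
import numpy as np

floats_notnull_32 = partial(hs.floats, allow_nan=False, allow_infinity=False, width=32)

# Illustrative helper: complex values built from two 32-bit-safe parts.
complex_notnull_32 = hs.builds(complex, floats_notnull_32(), floats_notnull_32())

data_strategy_complex = hnp.arrays(
    np.complex64,
    hs.tuples(hs.integers(min_value=2, max_value=50), hs.just(1)),
    elements=complex_notnull_32
)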

The complex64 type is too small and it's overflowing. Somehow Hypothesis is failing to catch this.
Switching to complex128 fixed the problem for now.


Numpy power returns negative value

I want to plot the Poisson distribution and get negative probabilities for lambda >= 9.
This code generates plots for different lambdas:
import numpy as np
import matplotlib.pyplot as plt
from scipy.special import factorial
for lambda_val in range(1, 12, 2):
    plt.figure()
    k = np.arange(0, 20)
    y = np.power(lambda_val, k) * np.exp(-lambda_val) / factorial(k)
    plt.bar(k, y)
    plt.title('lambda = ' + str(lambda_val))
    plt.xlabel('k')
    plt.ylabel('probability')
    plt.ylim([-0.1, 0.4])
    plt.grid()
    plt.show()
Please see these two plots:
Lambda = 5 looks fine in my opinion; lambda = 9 does not.
I'm quite sure it has something to do with np.power because
np.power(11, 9)
gives me: -1937019605, whereas
11**9
gives me: 2357947691 (same in WolframAlpha).
But if I avoid np.power and use
y = (lambda_val**k)*math.exp(-lambda_val)/factorial(k)
for calculating the probability, I get negative values as well. I am totally confused. Can anybody explain the effect to me, or tell me what I am doing wrong? Thanks in advance. :)
Your problem is due to 32-bit integer overflow. This happens because NumPy is sometimes compiled with 32-bit integers even though the platform (OS + processor) is a 64-bit one. There is an overflow because NumPy automatically converts the unbounded integers of the Python interpreter to the native np.int_ type. You can check whether this type is 64-bit using np.int_ is np.int64. AFAIK, the default NumPy binary packages compiled for Windows and available via pip use 32-bit integers, while the Linux packages use 64-bit integers (assuming you are on a 64-bit platform).
The issue can be easily reproduced using:
In [546]: np.power(np.int32(11), np.int32(9))
Out[546]: -1937019605
It can also be solved using:
In [547]: np.power(np.int64(11), np.int64(9))
Out[547]: 2357947691
In the second expression, you use k, which is of type np.int_ by default, and this is certainly why you get the same problem. Fortunately, you can tell NumPy that the integers should be bigger. Note that NumPy has some implicit rules to avoid overflows, but it is hard to avoid them in all cases without strongly impacting performance. Here is a fixed formula:
k = np.arange(0, 20, dtype=np.int64)
y = np.power(lambda_val, k) * np.exp(-lambda_val) / factorial(k)
The rule of thumb is to be very careful about implicit conversions when you get unexpected results.
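As an illustrative aside (not part of the answer above), you can check the default integer width on your install and, alternatively, sidestep integer overflow entirely by computing the powers in floating point:
import numpy as np
from scipy.special import factorial

print(np.int_ is np.int64)          # False on builds whose default integer is 32-bit

lambda_val = 11
k = np.arange(0, 20)
# Casting the base to float makes np.power work in floating point, so no
# integer overflow can occur regardless of the default integer width.
y = np.power(float(lambda_val), k) * np.exp(-lambda_val) / factorial(k)
print((y >= 0).all())               # True: all probabilities are non-negative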

datetime.timedelta(x,y) returns TypeError on CoCalc.com but works elsewhere -- Why?

My code works on onlinegdb.com but not on CoCalc.com.
import datetime
slowduration = datetime.timedelta(0,1)
print(slowduration)
Returns
TypeError: unsupported type for timedelta seconds component: sage.rings.integer.Integer
It isn't clear to me if this is a feature or a bug.
To complement #kcrisman's answer and the "int(0), int(1)" trick...
Two other options if one wants to stick to the Sage kernel are
(1) disable the preparser with preparser(False),
(2) append r (for "raw") to the integers, eg datetime.timedelta(0r, 1r).
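For instance, both options might look like this in a Sage session (an illustrative sketch based on the options above):
import datetime

# Option (2): the trailing 'r' marks raw literals, which the preparser leaves as plain Python ints
slowduration = datetime.timedelta(0r, 1r)
print(slowduration)    # 0:00:01

# Option (1): disable the preparser entirely; integer literals are then plain Python ints
preparser(False)
slowduration = datetime.timedelta(0, 1)
print(slowduration)    # 0:00:01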
See also similar questions and answers around Sage's preparsing of floats and integers:
(a) Stack Overflow question 40578746: Sage and NumPy
(b) Stack Overflow question 28426920: Unsized object with numpy.random.permutation
(c) Stack Overflow question 16289354: Why is range(0, log(len(list), 2)) slow?
Finally, note that code can be loaded into Sage from external files using either:
load('/path/to/filename.py')
load('/path/to/filename.sage')
where .sage files will get "Sage-preparsed" while .py files will not.
This gives a third option to bypass the preparser: load code from a .py file.
If anyone else has a problem like this - It turns out that I was using the Sage math kernel and not the Python math kernel. This website offers something like 15 different kernels.
Jacob's self-answer is correct; here are a few more details.
In SageMath there is something called a preparser which interprets things so that integers are mathematical integers, not Python ints. So for example:
sage: preparse('1+1')
'Integer(1)+Integer(1)'
There is a lot more that the preparser involves; try preparse('f(x)=x^2') for some real fun. But yes, it's a feature.
To fix your problem within the Sage kernel, though, you could just do this:
import datetime
slowduration = datetime.timedelta(int(0),int(1))
print(slowduration)
to get 0:00:01 as your answer.

numpy.sum() giving strange results on large arrays

I seem to have found a pitfall with using .sum() on numpy arrays but I'm unable to find an explanation. Essentially, if I try to sum a large array then I start getting nonsensical answers but this happens silently and I can't make sense of the output well enough to Google the cause.
For example, this works exactly as expected:
a = sum(xrange(2000))
print('a is {}'.format(a))
b = np.arange(2000).sum()
print('b is {}'.format(b))
Giving the same output for both:
a is 1999000
b is 1999000
However, this does not work:
c = sum(xrange(200000))
print('c is {}'.format(c))
d = np.arange(200000).sum()
print('d is {}'.format(d))
Giving the following output:
c is 19999900000
d is -1474936480
And on an even larger array, it's possible to get back a positive result. This is more insidious because I might not identify that something unusual was happening at all. For example this:
e = sum(xrange(100000000))
print('e is {}'.format(e))
f = np.arange(100000000).sum()
print('f is {}'.format(f))
Gives this:
e is 4999999950000000
f is 887459712
I guessed that this was to do with data types and indeed even using the python float seems to fix the problem:
e = sum(xrange(100000000))
print('e is {}'.format(e))
f = np.arange(100000000, dtype=float).sum()
print('f is {}'.format(f))
Giving:
e is 4999999950000000
f is 4.99999995e+15
I have no background in Comp. Sci. and found myself stuck (perhaps this is a dupe). Things I've tried:
numpy arrays have a fixed size. Nope; this seems to show I should hit a MemoryError first.
I might somehow have a 32-bit installation (probably not relevant); nope, I followed this and confirmed I have 64-bit.
Other examples of weird sum behaviour; nope (?) I found this but I can't see how it applies.
Can someone please explain briefly what I'm missing and tell me what I need to read up on? Also, other than remembering to define a dtype each time, is there a way to stop this happening or give a warning?
Possibly relevant:
Windows 7
numpy 1.11.3
Running out of Enthought Canopy on Python 2.7.9
On Windows (even on a 64-bit system), the default integer NumPy uses when converting from Python ints is 32-bit. On Linux and Mac it is 64-bit.
Specify a 64-bit integer and it will work:
d = np.arange(200000, dtype=np.int64).sum()
print('d is {}'.format(d))
Output:
c is 19999900000
d is 19999900000
While not the most elegant solution, you can do some monkey patching using functools.partial:
from functools import partial
np.arange = partial(np.arange, dtype=np.int64)
From now on, np.arange works with 64-bit integers by default.
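A quick illustrative check that the patch behaves as intended, reusing the question's example:
d = np.arange(200000).sum()    # dtype is now int64 by default
print('d is {}'.format(d))     # d is 19999900000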
This is clearly numpy's integer type overflowing 32-bits. Normally you can configure numpy to fail in such situations using np.seterr:
>>> import numpy as np
>>> np.seterr(over='raise')
{'divide': 'warn', 'invalid': 'warn', 'over': 'warn', 'under': 'ignore'}
>>> np.int8(127) + np.int8(2)
FloatingPointError: overflow encountered in byte_scalars
However, sum is explicitly documented with the behaviour "No error is raised on overflow", so you might be out of luck here. Using numpy is often a trade-off of performance for convenience!
You can however manually specify the dtype for the accumulator, like this:
>>> a = np.ones(129)
>>> a.sum(dtype=np.int8) # will overflow
-127
>>> a.sum(dtype=np.int64) # no overflow
129
Watch ticket #593, because this is an open issue and it might be fixed by numpy devs sometime.
I'm not a numpy expert, but can reproduce your arange(200000) result in pure Python:
>>> s = 0
>>> for i in range(200000):
...     s += i
...     s &= 0xffffffff
...
>>> s
2820030816
>>> s.bit_length()
32
>>> s - 2**32 # adjust for that "the sign bit" is set
-1474936480
In other words, the result you're seeing is what I expect if numpy is doing its arithmetic on signed 2's-complement 32-bit integers.
Since I'm not a numpy expert, I can't suggest a good approach to never getting surprised (I would have left this as a comment, but I couldn't show nicely formatted code then).
Numpy's default integer type is the same as the C long type. Now, this isn't guaranteed to be 64-bits on a 64-bit platform. In fact, on Windows, long is always 32-bits.
As a result, the numpy sum is overflowing the value and looping back around.
Unfortunately, as far as I know, there is no way to change the default dtype. You'll have to specify it as np.int64 every time.
You could try to create your own arange:
def arange(*args, **kw):
    return np.arange(dtype=np.int64, *args, **kw)
and then use that version instead of numpy's.
EDIT: If you want to flag this, you could just put something like this at the top of your code:
assert np.array(0).dtype.name != 'int32', 'This needs to be run with 64-bit integers!'

Passing a float array pointer through a python extension/wrapper – SndObj-library

So I'm feeling that Google is getting tired of trying to help me with this.
I've been trying to experiment some with the SndObj library as of late, and more specifically the python wrapper of it.
The library is kind enough to include a python example to play around with, the only issue being it to get it to work. The last line below is giving me a world of hurt:
from sndobj import SndObj, SndRTIO, HarmTable, Oscili, SND_OUTPUT
from scipy import zeros, pi, sin, float32
import numpy
sine = numpy.array([256],float32)
for i in range(sine.size):
    sine[i] = 0.5 * sin((2 * pi * i) / sine.size)
sine *= 32768
obj = SndObj()
obj.PushIn(sine,256)
In the original code it was:
obj.PushIn(sine)
That gave me the error
TypeError: SndObj_PushIn() takes exactly 3 arguments (2 given)
Alright, fair enough. I checked the (automatically generated) documentation and some example code around the web, and found that it also wants an integer size. Said and done (I like how they have what I'm guessing is, at the very least, dated code in the example).
Anyway, new argument; new error:
TypeError: in method 'SndObj_PushIn', argument 2 of type 'float *'
I'm not experienced at all in c++, which I believe is the library's "native" (excuse my lack of proper terminology) language, but I'm pretty sure I've picked up that it wants a float array/vector as its second argument (the first being self). However, I am having a hard time accomplishing that. Isn't what I've got a float array/vector already? I've also, among other things, tried using float instead of float32 in the first line and float(32768) in the fourth to no avail.
Any help, suggestion or tip would be much appreciated!
EDIT:
Became unsure of the float vector/array part and went to the auto-docs again:
int SndObj::PushIn(float *vector, int size)
So I'd say that at least the C++ side wants a float array/vector, although I can of course still be wrong about the Python wrapper.
UPDATE
As per Prune's request (who said that the error message isn't asking for a float vector, but rather reporting that the float vector is the error), I tried inputting different integer (int, int32, etc.) vectors instead. However, seeing that I still got the same error message, and keeping the EDIT above in mind, I'd say that it's actually supposed to be a float vector after all.
UPDATE2
After some hints from saulspatz I've changed the question title and tags to better formulate my problem. I did some further googling according to this as well, but am yet to dig out anything useful.
UPDATE3
SOLVED
Actually, the problem is the opposite: PushIn takes an array of integers. The error message is complaining that you gave it floats. Try this in place of your call to PushIn:
int_sine = numpy.array([256], numpy.int32)
int_sine = [int(x) for x in sine]
and then feed int_sine instead of sine to PushIn.
I don't really have an answer to your question, but I have some information for you that's too long to fit in a comment, and that I think may prove useful. I looked at the source of what I take to be the latest version, SndObj 2.6.7. In SndObj.h the definition of PushIn is
int PushIn(float *in_vector, int size){
    for(int i = 0; i < size; i++){
        if(m_vecpos >= m_vecsize) m_vecpos = 0;
        m_output[m_vecpos++] = in_vector[i];
    }
    return m_vecpos;
}
so it's clear that size is the number of elements to push. (I presume this would be the number of elements in your array, and 256 is right.) The float* means a pointer to float; in_vector is just an identifier. I read the error message to mean that the function received a float when it was expecting a pointer to float. In a C++ program, you might pass a pointer to float by passing the name of an array of floats, though this is not the only way to do it.
I don't know anything about how python extensions are programmed, I'm sorry to say. From what I'm seeing, obj.PushIn(sine,256) looks right, but that's a naive view.
Perhaps with this information, you can formulate another question (or find another tag) that will attract the attention of someone who knows about writing python extensions in C/C++.
I hope this helps.
So I finally managed to get it working (with some assistance from the very friendly wrapper author)!
It turns out that there is a floatArray class in the sndobj library which is used for passing float arrays to the C++ functions. I'm guessing that they included that after numpy-test.py was written, which threw me for a loop.
Functioning code:
from sndobj import SndObj, SndRTIO, SND_OUTPUT, floatArray
from scipy import pi, sin

# ---------------------------------------------------------------------------
# Test PushIn
# Create 1 frame of a sine wave in a float array
sine = floatArray(256)
for i in range(256):
    sine[i] = float(32768 * 0.5 * sin((2 * pi * i) / 256))

obj = SndObj()
obj.PushIn(sine, 256)

outp = SndRTIO(1, SND_OUTPUT)
outp.SetOutput(1, obj)

# Repeatedly output the 1 frame of sine wave
duration = outp.GetSr() * 2  # 2 seconds
i = 0
vector_size = outp.GetVectorSize()
while i < duration:
    outp.Write()
    i += vector_size

Python NumPy log2 vs MATLAB

I'm a Python newbie coming from using MATLAB extensively. I was converting some code that uses log2 in MATLAB and I used the NumPy log2 function and got a different result than I was expecting for such a small number. I was surprised since the precision of the numbers should be the same (i.e. MATLAB double vs NumPy float64).
MATLAB Code
a = log2(64);
--> a=6
Base Python Code
import math
a = math.log2(64)
--> a = 6.0
NumPy Code
import numpy as np
a = np.log2(64)
--> a = 5.9999999999999991
Modified NumPy Code
import numpy as np
a = np.log(64) / np.log(2)
--> a = 6.0
So the native NumPy log2 function gives a result that causes the code to fail a test since it is checking that a number is a power of 2. The expected result is exactly 6, which both the native Python log2 function and the modified NumPy code give using the properties of the logarithm. Am I doing something wrong with the NumPy log2 function? I changed the code to use the native Python log2 for now, but I just wanted to know the answer.
No, there is nothing wrong with your code; it is just that floating-point values cannot always be represented or computed exactly on our computers. Always use an epsilon value to allow a range of error when checking float values. Read The Floating Point Guide and this post to know more.
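For example, a tolerant power-of-2 check (an illustrative sketch, not taken from the original post) could look like this:
import numpy as np

def is_power_of_two(x, eps=1e-9):
    # Tolerate tiny floating-point error in the computed exponent.
    exponent = np.log2(x)
    return abs(exponent - round(exponent)) < eps

print(is_power_of_two(64))   # True, even if np.log2(64) returns 5.999999999999999
print(is_power_of_two(96))   # False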
EDIT - As cgohlke has pointed out in the comments,
Depending on the compiler used to build numpy, np.log2(x) is either computed by the C library or as 1.442695040888963407359924681001892137*np.log(x). See this link.
This may be a reason for the erroneous output.
