I was converting some code from MATLAB to Python.
In MATLAB there is the function mod, which performs the modulo operation.
The following example shows different results between MATLAB's mod and the equivalent NumPy remainder operation:
Matlab:
>> mod(6, 0.05)
ans =
0
Numpy:
np.remainder(6, 0.05)
0.04999999999999967
np.mod(6, 0.05)
0.04999999999999967
Python's modulus operator gives the same result as NumPy:
6%0.05
0.04999999999999967
Is there anything in Python which gives the same mod operation as the one in MATLAB, preferably something that can operate on NumPy 2D/3D arrays?
The NumPy documentation says that numpy.mod is equivalent to MATLAB's mod.
This is the core of the problem, in python:
>>> 6/0.05 == 120
True
>>> 6//0.05 == 120 # this is 119 instead
False
The floating-point result of 6/0.05 is close enough to 120 (i.e. within the resolution of double precision) that true division rounds it to 120.0. However, the exact quotient is ever so slightly smaller than 120, so floor division takes the floor of that slightly smaller value and yields 119 instead of 120.0.
Some proof:
>>> from decimal import Decimal
... print(6/Decimal(0.05)) # exact arithmetic with the double closest to 0.05
... print(6/Decimal('0.05')) # exact
119.9999999999999933386618522
1.2E+2
The first number is what the division 6/0.05 computes before rounding; 119.9999999999999933386618522 then gets rounded to the nearest number representable in double precision, which is 120. One can easily check that these two numbers are indeed the same within double precision:
>>> print(6/Decimal('0.05') - 6/Decimal(0.05))
6.6613381478E-15
>>> 120 - 6.6613381478E-15 == 120
True
Now here's help mod from MATLAB:
MOD(x,y) returns x - floor(x./y).*y if y ~= 0, carefully computed to
avoid rounding error. If y is not an integer and the quotient x./y is
within roundoff error of an integer, then n is that integer.
This suggests that when x/y is close to an integer, it is rounded to that integer first rather than floored as in Python. So MATLAB goes out of its way to do some magic with the floating-point results.
The simplest solution would be to round the numbers yourself and reproduce MATLAB's mod that way (unless you can use something like decimal.Decimal, but that means forgoing native doubles entirely, including literals), assuming that makes sense for your use cases.
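As an illustration, here is a minimal sketch of that idea (my own helper, not MATLAB's actual algorithm; the tolerance is an assumption) that works elementwise on NumPy arrays of any shape:
import numpy as np

def mod_like_matlab(x, y, rtol=1e-12):
    # Hypothetical helper: snap the quotient to the nearest integer when it is
    # within a relative tolerance of it, otherwise take the floor, roughly as
    # MATLAB's help text describes.
    x = np.asarray(x, dtype=float)
    q = x / y
    nearest = np.round(q)
    q = np.where(np.isclose(q, nearest, rtol=rtol, atol=0.0), nearest, np.floor(q))
    r = x - q * y
    # Zero out remainders that are pure rounding noise (they can even be slightly negative).
    return np.where(np.abs(r) < np.abs(y) * rtol, 0.0, r)

mod_like_matlab(6, 0.05)   # -> 0.0, matching MATLAB's mod(6, 0.05)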
Here is a workaround. It converts the divisor to its string representation, turns that into an exact integer ratio via decimal.Decimal, and from there does integer arithmetic:
import numpy as np
import decimal
def round_divmod(b, a):
    # Convert each divisor to its string repr, then to an exact integer ratio n/d.
    n, d = np.frompyfunc(lambda x: decimal.Decimal(x).as_integer_ratio(), 1, 2)(a.astype('U'))
    n, d = n.astype(int), d.astype(int)
    # b mod (n/d) == (b*d mod n) / d, computed entirely in integer arithmetic
    q, r = np.divmod(b*d, n)
    return q, r/d
a = np.round(np.linspace(0.05,1,20),2).reshape(4,5)
a
# array([[0.05, 0.1 , 0.15, 0.2 , 0.25],
# [0.3 , 0.35, 0.4 , 0.45, 0.5 ],
# [0.55, 0.6 , 0.65, 0.7 , 0.75],
# [0.8 , 0.85, 0.9 , 0.95, 1. ]])
round_divmod(6,a)
# (array([[120, 60, 40, 30, 24],
# [ 20, 17, 15, 13, 12],
# [ 10, 10, 9, 8, 8],
# [ 7, 7, 6, 6, 6]]), array([[0. , 0. , 0. , 0. , 0. ],
# [0. , 0.05, 0. , 0.15, 0. ],
# [0.5 , 0. , 0.15, 0.4 , 0. ],
# [0.4 , 0.05, 0.6 , 0.3 , 0. ]]))
Related
I have a cumulative transition matrix and need to build a simple random walk algorithm to generate let's say 500 values from the matrix as efficiently as possible (the actual matrix is 1000 x 1000)
cum_sum
[array([0.3, 0.5, 0.7, 0.9, 1. ]),
array([0.5 , 0.5 , 0.5 , 0.75, 1. ]),
array([0.66666667, 1. , 1. , 1. , 1. ]),
array([0.4, 0.6, 0.8, 1. , 1. ]),
array([0.5, 0.5, 0.5, 1. , 1. ])]
Select the initial state i in the matrix randomly
Produce a random value between 0 and 1
The value of the random number is compared with the elements of the i-th row of the cumulative matrix. If the random number is greater than the cumulative probability of the previous state but less than or equal to the cumulative probability of the following state, the following state is adopted.
Something I tried
def random_walk(cum_sum):
    start_point = random.choice([item[0] for item in cum_sum])
    random = np.random.uniform(0, 1, 1)
    if random > start_point:
You can use numpy random choice at each stage to simulate a transition.
You can give the probabilities as the argument p, and the first positional argument defines the sample space. To convert the cumulative probability distribution into a probability distribution, I insert a zero at the beginning of each row and use numpy diff to compute the increase at each step.
Preparing your example probabilities
P = np.array([
[0.3, 0.5, 0.7, 0.9, 1. ],
[0.5 , 0.5 , 0.5 , 0.75, 1. ],
[0.66666667, 1. , 1. , 1. , 1. ],
[0.4, 0.6, 0.8, 1. , 1. ],
[0.5, 0.5, 0.5, 1. , 1. ]])
P = np.diff(np.hstack([np.zeros((len(P), 1)), P]), axis=1)
Then run a few steps
i = 0
path = [0]
I = np.arange(len(P))
for _ in range(10):
    i = np.random.choice(I, p=P[i])
    path.append(i)
print(path)
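Since the rows of cum_sum are already cumulative, an alternative sketch (my own addition; random_walk_cdf is a hypothetical name) is to invert the CDF directly with np.searchsorted instead of converting to probabilities first; this scales fine to 500 steps on a 1000 x 1000 matrix:
import numpy as np

def random_walk_cdf(cum_P, start=0, n_steps=500, rng=None):
    # cum_P: 2-D array of row-wise cumulative probabilities (each row ends at 1.0).
    rng = np.random.default_rng() if rng is None else rng
    path = [start]
    i = start
    for u in rng.random(n_steps):
        # The first index whose cumulative probability reaches u is the next state.
        i = int(np.searchsorted(cum_P[i], u))
        path.append(i)
    return path

np.searchsorted returns the first index whose cumulative probability is greater than or equal to the drawn number, which is the rule described in the question.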
Trying to generate numbers using np.random.random:
for portfolio in range(2437):
    weights = np.random.random(3)
    weights /= np.sum(weights)
    print(weights)
It works just as expected:
[0.348674 0.329747 0.321579]
[0.215606 0.074008 0.710386]
[0.350316 0.589782 0.059901]
[0.639651 0.025353 0.334996]
[0.697505 0.171061 0.131434]
.
.
.
.
However, how do I change the numbers such that each row is limited to 1 decimal place, like:
[0.1 0.2 0.7]
[0.2 0.2 0.6]
[0.5 0.4 0.1]
.
.
.
.
You can use
In [1]: weights.round(1)
Out[1]: array([0.4, 0.5, 0.2])
The argument to round is the number of decimal digits you want. It also accepts negative arguments, which means rounding to the nearest ten, hundred, and so on:
In [2]: np.array([123, 321, 332]).round(-1)
Out[2]: array([120, 320, 330])
For visualization only, you can use np.set_printoptions:
import numpy as np
np.set_printoptions(precision=1, suppress=True)
np.random.rand(4, 4)
array([[0.8, 0.8, 0.3, 0.3],
[0.1, 0.2, 0. , 0.2],
[0.8, 0.2, 1. , 0.2],
[0.2, 0.7, 0.6, 0.2]])
you can try np.round:
weights = np.round(weights, 1)
Maybe my answer is not the most efficient, but here it is:
for portfolio in range(2437):
    weights = np.random.random(3)
    weights /= np.sum(weights)
    t_weights = []
    for num in weights:
        num *= 10
        num = int(num)
        num = float(num) / 10
        t_weights.append(num)
    weights = t_weights
    print(weights)
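For non-negative weights, a vectorized sketch of the same truncation (my own addition, not part of the original answer) avoids the inner Python loop:
import numpy as np

weights = np.random.random(3)
weights /= np.sum(weights)
weights = np.trunc(weights * 10) / 10   # truncate each weight to 1 decimal place
print(weights)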
I read that numpy is unbiased in rounding and that it works the way it's designed: "if you always round 0.5 up to the next largest number, then the average of a bunch of rounded numbers is likely to be slightly larger than the average of the unrounded numbers: this bias or drift can have very bad effects on some numerical algorithms and make them inaccurate."
Disregarding this information and assuming that I always want to round up, how can I do it in numpy? Assuming my array can be quite large.
For simplicity, let's assume I have the array:
import numpy as np
A = [ [10, 15, 30], [25, 134, 41], [134, 413, 51]]
A = np.array(A, dtype=np.int16)
decimal = A * .1
whole = np.round(decimal)
decimal looks like:
[[ 1. 1.5 3. ]
[ 2.5 13.4 4.1]
[ 13.4 41.3 5.1]]
whole looks like:
[[ 1. 2. 3.]
[ 2. 13. 4.]
[ 13. 41. 5.]]
As you can see, 1.5 rounded to 2 and 2.5 also rounded to 2. How can I force it to always round XX.5 up? I know I can loop through the array and use Python's round(), but that would definitely be much slower. I was wondering if there is a way to do it using NumPy functions.
The answer is almost never np.vectorize. You can, and should, do this in a fully vectorized manner. Let's say that for x >= 0, you want r = floor(x + 0.5). If you want negative numbers to round towards zero, the same formula applies for x < 0. So let's say that you always want to round away from zero. In that case, you are looking for ceil(x - 0.5) for x < 0.
To implement that for an entire array without calling np.vectorize, you can use masking:
def round_half_up(x):
    mask = (x >= 0)
    out = np.empty_like(x)
    out[mask] = np.floor(x[mask] + 0.5)
    out[~mask] = np.ceil(x[~mask] - 0.5)
    return out
Notice that you don't need to use a mask if you round all in one direction:
def round_up(x):
    return np.floor(x + 0.5)
Now if you want to make this really efficient, you can get rid of all the temp arrays. This will use the full power of ufuncs:
def round_half_up(x):
    out = x.copy()
    mask = (out >= 0)
    np.add(out, 0.5, where=mask, out=out)
    np.floor(out, where=mask, out=out)
    np.invert(mask, out=mask)
    np.subtract(out, 0.5, where=mask, out=out)
    np.ceil(out, where=mask, out=out)
    return out
And:
def round_up(x):
    out = x + 0.5
    np.floor(out, out=out)
    return out
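As a quick check (my own usage example, assuming the round_half_up defined above), applying it to the array from the question rounds the halves up where np.round rounded them to even:
import numpy as np

decimal = np.array([[ 1. ,  1.5,  3. ],
                    [ 2.5, 13.4,  4.1],
                    [13.4, 41.3,  5.1]])
print(round_half_up(decimal))
# [[ 1.  2.  3.]
#  [ 3. 13.  4.]
#  [13. 41.  5.]]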
import numpy as np

A = [[1.0, 1.5, 3.0], [2.5, 13.4, 4.1], [13.4, 41.3, 5.1]]
A = np.array(A)
print(A)

def rounder(x):
    if (x - int(x)) >= 0.5:
        return np.ceil(x)
    else:
        return np.floor(x)

rounder_vec = np.vectorize(rounder)
whole = rounder_vec(A)
print(whole)
Alternatively, you can also look at numpy.ceil, numpy.floor, and numpy.trunc for other rounding styles.
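For reference, a quick comparison of those three on a few sample values:
import numpy as np

x = np.array([1.5, 2.5, -2.5, 13.4, -13.4])
print(np.ceil(x))    # [  2.   3.  -2.  14. -13.]
print(np.floor(x))   # [  1.   2.  -3.  13. -14.]
print(np.trunc(x))   # [  1.   2.  -2.  13. -13.]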
I am looking to get:
input:
arange(0.0,0.6,0.2)
output:
0.,0.4
I want
0.,0.2,0.4,0.6
How do I achieve this using range or arange? If that's not possible, what is the alternative?
A simpler approach to get the desired output is to add the step size to the upper limit. For instance,
np.arange(start, end + step, step)
would allow you to include the end point as well. In your case:
np.arange(0.0, 0.6 + 0.2, 0.2)
would result in
array([0. , 0.2, 0.4, 0.6]).
In short
I wrote a function crange, which does what you require.
In the example below, orange does the job of numpy.arange
crange(1, 1.3, 0.1) >>> [1. 1.1 1.2 1.3]
orange(1, 1.3, 0.1) >>> [1. 1.1 1.2]
crange(0.0, 0.6, 0.2) >>> [0. 0.2 0.4 0.6]
orange(0.0, 0.6, 0.2) >>> [0. 0.2 0.4]
Background information
I had your problem a few times as well. I usually quick-fixed it by adding a small value to stop. As mentioned by Kasrâmvd in the comments, the issue is a bit more complex, as floating-point rounding errors can occur in numpy.arange (see here and here).
Unexpected behavior can be found in this example:
>>> numpy.arange(1, 1.3, 0.1)
array([1. , 1.1, 1.2, 1.3])
To clear things up a bit for myself, I decided to stop using numpy.arange unless I specifically need it. Instead I use my self-defined function orange to avoid unexpected behavior. It combines numpy.isclose and numpy.linspace.
Here is the Code
Enough bla bla - here is the code ^^
import numpy as np
def cust_range(*args, rtol=1e-05, atol=1e-08, include=[True, False]):
    """
    Combines numpy.linspace and numpy.isclose to mimic
    open, half-open and closed intervals.
    Also avoids floating point rounding errors as with
    >>> numpy.arange(1, 1.3, 0.1)
    array([1. , 1.1, 1.2, 1.3])

    args: [start, ]stop, [step, ]
        as in numpy.arange
    rtol, atol: floats
        floating point tolerance as in numpy.isclose
    include: boolean list-like, length 2
        if start and end point are included
    """
    # process arguments
    if len(args) == 1:
        start = 0
        stop = args[0]
        step = 1
    elif len(args) == 2:
        start, stop = args
        step = 1
    else:
        assert len(args) == 3
        start, stop, step = tuple(args)

    # determine number of segments
    n = (stop - start)/step + 1

    # do rounding for n
    if np.isclose(n, np.round(n), rtol=rtol, atol=atol):
        n = np.round(n)

    # correct for start/end being excluded
    if not include[0]:
        n -= 1
        start += step
    if not include[1]:
        n -= 1
        stop -= step

    return np.linspace(start, stop, int(n))

def crange(*args, **kwargs):
    return cust_range(*args, **kwargs, include=[True, True])

def orange(*args, **kwargs):
    return cust_range(*args, **kwargs, include=[True, False])

print('crange(1, 1.3, 0.1) >>>', crange(1, 1.3, 0.1))
print('orange(1, 1.3, 0.1) >>>', orange(1, 1.3, 0.1))
print('crange(0.0, 0.6, 0.2) >>>', crange(0.0, 0.6, 0.2))
print('orange(0.0, 0.6, 0.2) >>>', orange(0.0, 0.6, 0.2))
Interesting that you get that output. Running arange(0.0,0.6,0.2) I get:
array([0. , 0.2, 0.4])
Regardless, from the numpy.arange docs: Values are generated within the half-open interval [start, stop) (in other words, the interval including start but excluding stop).
Also from the docs: When using a non-integer step, such as 0.1, the results will often not be consistent. It is better to use numpy.linspace for these cases
The only thing I can suggest to achieve what you want is to modify the stop parameter and add a very small amount, for example
np.arange(0.0, 0.6 + 0.001 ,0.2)
Returns
array([0. , 0.2, 0.4, 0.6])
Which is your desired output.
Anyway, it is better to use numpy.linspace and set endpoint=True
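For example (a short sketch; the number of points has to be worked out from the step yourself):
import numpy as np

np.linspace(0.0, 0.6, num=4, endpoint=True)
# array([0. , 0.2, 0.4, 0.6])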
Old question, but it can be done much more easily.
import numpy as np

def arange(start, stop, step=1, endpoint=True):
    arr = np.arange(start, stop, step)
    if endpoint and arr[-1] + step == stop:
        arr = np.concatenate([arr, [stop]])
    return arr
print(arange(0, 4, 0.5, endpoint=True))
print(arange(0, 4, 0.5, endpoint=False))
which gives
[0. 0.5 1. 1.5 2. 2.5 3. 3.5 4. ]
[0. 0.5 1. 1.5 2. 2.5 3. 3.5]
A simple example using np.linspace (mentioned numerous times in other answers, but no simple examples were present):
import numpy as np
start = 0.0
stop = 0.6
step = 0.2
num = round((stop - start) / step) + 1 # i.e. length of resulting array
np.linspace(start, stop, num)
>>> array([0.0, 0.2, 0.4, 0.6])
Assumption: stop - start is a multiple of step. round is necessary to correct for floating-point error.
OK, I will leave this solution here. The first step is to calculate the fractional portion of the number of items given the bounds [a, b] and the step amount. Next, calculate an appropriate amount to add to the end that will not affect the size of the resulting NumPy array, and then call np.arange().
import numpy as np
def np_arange_fix(a, b, step):
    nf = (lambda n: n - int(n))((b - a)/step + 1)
    bb = (lambda x: step*max(0.1, x) if x < 0.5 else 0)(nf)
    arr = np.arange(a, b + bb, step)
    if int((b - a)/step + 1) != len(arr):
        raise ValueError('Expected {} items, got {} items, arr-out {}'.format(
            int((b - a)/step + 1), len(arr), arr))
    return arr
print(np_arange_fix(1.0, 4.4999999999999999, 1.0))
print(np_arange_fix(1.0, 4 + 1/3, 1/3))
print(np_arange_fix(1.0, 4 + 1/3, 1/3 + 0.1))
print(np_arange_fix(1.0, 6.0, 1.0))
print(np_arange_fix(0.1, 6.1, 1.0))
Prints:
[1. 2. 3. 4.]
[1. 1.33333333 1.66666667 2. 2.33333333 2.66666667
3. 3.33333333 3.66666667 4. 4.33333333]
[1. 1.43333333 1.86666667 2.3 2.73333333 3.16666667
3.6 4.03333333]
[1. 2. 3. 4. 5. 6.]
[0.1 1.1 2.1 3.1 4.1 5.1 6.1]
If you want to compact this down to a function:
def np_arange_fix(a, b, step):
    b += (lambda x: step*max(0.1, x) if x < 0.5 else 0)((lambda n: n - int(n))((b - a)/step + 1))
    return np.arange(a, b, step)
The title is a bit misleading, because it's not exactly x and x, it's x and 0.3; however, the values should be the same.
I have:
arr = np.arange(0, 1.1, 0.1)
and I receive:
arr[arr <= 0.3]
> array([0., 0.1, 0.2])
The correct result should be:
arr[arr <= 0.3]
> array([0., 0.1, 0.2, 0.3])
I have not yet stumbled upon this problem. I know it is related to floating point precision ... but what can I do here?
Don't rely on comparing floats for equality (unless you know exactly what floats you are dealing with).
Since you know the stepsize used to generate the array is 0.1,
arr = np.arange(0, 1.1, 0.1)
you could increase the threshold value, 0.3, by half the stepsize to find a new threshold which is safely between values in arr:
In [48]: stepsize = 0.1; arr[arr < 0.3+(stepsize/2)]
Out[48]: array([ 0. , 0.1, 0.2, 0.3])
By the way, the 1.1 in np.arange(0, 1.1, 0.1) is an application of the same idea -- given the vagaries of floating-point arithmetic, we couldn't be sure that 1.0 would be included if we wrote np.arange(0, 1.0, 0.1), so the right endpoint was increased by the stepsize.
Fundamentally, the problem boils down to floating-point arithmetic being inaccurate:
In [17]: 0.1+0.2 == 0.3
Out[17]: False
So the fourth value in the array is a little bit greater than 0.3.
In [40]: arr = np.arange(0,1.1, 0.1)
In [41]: arr[3]
Out[41]: 0.30000000000000004
Note that rounding may not be a viable solution. For example,
if arr has dtype float128:
In [53]: arr = np.arange(0, 1.1, 0.1, dtype='float128')
In [56]: arr[arr.round(1) <= 0.3]
Out[56]: array([ 0.0, 0.1, 0.2], dtype=float128)
Although making the dtype float128 made arr[3] closer to the decimal 0.3,
In [54]: arr[3]
Out[54]: 0.30000000000000001665
now rounding does not produce a number less than 0.3:
In [55]: arr.round(1)[3]
Out[55]: 0.30000000000000000001
Unutbu points out the main problem: you should avoid comparing floating-point numbers for equality, as they carry round-off error.
However, this is a problem many people come across, so there is a function that helps you get around it: np.isclose. In your case this would lead to:
arr[np.logical_or(arr <= 0.3, np.isclose(0.3, arr))]
>>> array([0., 0.1, 0.2, 0.3])
In this case this might not be the best option, but it might be helpful to know about this function.
Sidenote:
In case nobody has ever explained to you why this happens: computers store everything in binary, but 0.1 has a periodic (repeating) binary expansion, which means the computer can't store all of its digits (there are infinitely many). The decimal equivalent would be
1/3 + 1/3 + 1/3 = 0.33333 + 0.33333 + 0.33333 = 0.99999
which is not 1.
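You can see the same thing directly in Python (a quick illustration of the sidenote):
>>> 0.1 + 0.2
0.30000000000000004
>>> format(0.1, '.20f')   # the double that is actually stored for 0.1
'0.10000000000000000555'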