I read that NumPy rounding is unbiased and that it works the way it's designed: "if you always round 0.5 up to the next largest number, then the average of a bunch of rounded numbers is likely to be slightly larger than the average of the unrounded numbers: this bias or drift can have very bad effects on some numerical algorithms and make them inaccurate."
Disregarding this information, and assuming that I always want to round up, how can I do it in NumPy, given that my array can be quite large?
For simplicity, let's assume I have the array:
import numpy as np
A = [ [10, 15, 30], [25, 134, 41], [134, 413, 51]]
A = np.array(A, dtype=np.int16)
decimal = A * .1
whole = np.round(decimal)
decimal looks like:
[[ 1.   1.5  3. ]
 [ 2.5 13.4  4.1]
 [13.4 41.3  5.1]]
whole looks like:
[[ 1.  2.  3.]
 [ 2. 13.  4.]
 [13. 41.  5.]]
As you can see, 1.5 rounded to 2, but 2.5 also rounded to 2. How can I force XX.5 to always round up? I know I could loop through the array and use Python's round(), but that would definitely be much slower. I was wondering if there is a way to do it using NumPy functions.
The answer is almost never np.vectorize. You can, and should, do this in a fully vectorized manner. For x >= 0, rounding half up is r = floor(x + 0.5). If you want negative numbers to round towards zero, the same formula applies for x < 0. If instead you always want to round away from zero, you are looking for ceil(x - 0.5) for x < 0.
To implement that for an entire array without calling np.vectorize, you can use masking:
def round_half_up(x):
    mask = (x >= 0)
    out = np.empty_like(x)
    out[mask] = np.floor(x[mask] + 0.5)
    out[~mask] = np.ceil(x[~mask] - 0.5)
    return out
Notice that you don't need a mask at all if you round everything in one direction:
def round_up(x):
    return np.floor(x + 0.5)
Now if you want to make this really efficient, you can get rid of all the temporary arrays. This will use the full power of ufuncs:
def round_half_up(x):
    out = x.copy()
    mask = (out >= 0)
    np.add(out, 0.5, where=mask, out=out)
    np.floor(out, where=mask, out=out)
    np.invert(mask, out=mask)
    np.subtract(out, 0.5, where=mask, out=out)
    np.ceil(out, where=mask, out=out)
    return out
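(A detail worth knowing about the where= keyword: output positions where the mask is False are left unchanged, so starting from x.copy() rather than np.empty_like(x) guarantees every element holds valid data for whichever branch touches it.)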
And:
def round_up(x):
    out = x + 0.5
    np.floor(out, out=out)
    return out
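As a quick sanity check against the asker's array (my own test, not part of the original answer):

import numpy as np

A = np.array([[10, 15, 30], [25, 134, 41], [134, 413, 51]], dtype=np.int16)
decimal = A * .1
print(round_half_up(decimal))
# [[ 1.  2.  3.]
#  [ 3. 13.  4.]
#  [13. 41.  5.]]

Both 1.5 and 2.5 now round up, as requested.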
import numpy as np

A = [[1.0, 1.5, 3.0], [2.5, 13.4, 4.1], [13.4, 41.3, 5.1]]
A = np.array(A)
print(A)

def rounder(x):
    # note: this assumes x >= 0; for negative x the fractional part
    # x - int(x) is negative, so the floor branch is always taken
    if x - int(x) >= 0.5:
        return np.ceil(x)
    else:
        return np.floor(x)

rounder_vec = np.vectorize(rounder)
whole = rounder_vec(A)
print(whole)
Alternatively, you can also look at numpy.ceil, numpy.floor, and numpy.trunc for other rounding styles.
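For reference, here is how those three differ on a few sample values (my own illustration, not part of the answer):

import numpy as np

x = np.array([-1.5, -0.5, 0.5, 1.5])
print(np.ceil(x))   # [-1. -0.  1.  2.]  -> rounds towards +inf
print(np.floor(x))  # [-2. -1.  0.  1.]  -> rounds towards -inf
print(np.trunc(x))  # [-1. -0.  0.  1.]  -> rounds towards zero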
I have two NumPy arrays like the ones below:
array_1 = np.array([1.2, 2.3, -1.0, -0.5])
array_2 = np.array([-0.5, 1.3, 2.5, -0.9])
We can do simple element-wise arithmetic (addition, subtraction, division, etc.) easily using the various np functions:
array_sum = np.add(array_1, array_2)
print(array_sum) # [ 0.7 3.6 3.5 -0.4]
array_sign = np.sign(array_1 * array_2)
print(array_sign) # [-1. 1. 1. -1.]
However, I need to check multiple conditions element-wise across the two arrays and save the results in two new arrays (say X and Y).
For example, if the two elements have different signs (e.g. the 1st and 3rd element pairs of the given example), then X will contain 0 and Y will be the sum of the positive element and abs(negative element):
X = [0]
Y = [1.7]
When both elements are positive (e.g. the 2nd element pair of the given example), X will contain the lower value and Y will contain the greater value:
X = [1.3]
Y = [2.3]
If both elements are negative, then X will be 0 and Y will be the sum of the absolute values of the two elements.
So, the final X and Y will be something like
X = [0, 1.3, 0, 0]
Y = [1.7, 2.3, 3.5, 1.4]
I have gone through some posts (this, and this) that describe comparison procedures between two arrays, but I am not getting an idea for multiple conditions. Here the two arrays are very small, but my real arrays are very large (e.g. 2097152 elements per array).
Any ideas are highly appreciated.
Try with numpy.select:
conditions = [(array_1 > 0) & (array_2 > 0), (array_1 < 0) & (array_2 < 0)]
choiceX = [np.minimum(array_1, array_2), np.zeros(len(array_1))]
choiceY = [np.maximum(array_1, array_2), -np.add(array_1, array_2)]
X = np.select(conditions, choiceX)
Y = np.select(conditions, choiceY, np.add(np.abs(array_1), np.abs(array_2)))
>>> X
array([0. , 1.3, 0. , 0. ])
>>> Y
array([1.7, 2.3, 3.5, 1.4])
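Note that np.select falls back to a default of 0 when no condition matches, which is exactly what X needs for the mixed-sign case; for Y, the mixed-sign result (the sum of absolute values) is passed explicitly as the default argument.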
This will do it. It does require vertically stacking the two arrays. I'm sure someone will pipe up if there is a more efficient solution.
import numpy as np
array_1 = np.array([1.2, 2.3, -1.0, -0.5])
array_2 = np.array([-0.5, 1.3, 2.5, -0.9])
def pick(t):
    if t[0] < 0 or t[1] < 0:
        return (0, abs(t[0]) + abs(t[1]))
    return (t.min(), t.max())

print(np.apply_along_axis(pick, 0, np.vstack((array_1, array_2))))
Output:
[[0. 1.3 0. 0. ]
[1.7 2.3 3.5 1.4]]
The second line of the function can also be written:
return (0, np.abs(t).sum())
But since these will only ever be two-element arrays, I doubt that saves anything at all.
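Since the asker mentions arrays with millions of elements, it is worth noting that np.apply_along_axis still makes a Python-level call per column. A fully vectorized sketch of the same logic (my addition, not from either answer):

import numpy as np

array_1 = np.array([1.2, 2.3, -1.0, -0.5])
array_2 = np.array([-0.5, 1.3, 2.5, -0.9])

# True wherever at least one element of the pair is negative
any_neg = (array_1 < 0) | (array_2 < 0)

X = np.where(any_neg, 0.0, np.minimum(array_1, array_2))
Y = np.where(any_neg, np.abs(array_1) + np.abs(array_2),
             np.maximum(array_1, array_2))

print(X)  # [0.  1.3 0.  0. ]
print(Y)  # [1.7 2.3 3.5 1.4]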
I was converting some code from MATLAB to Python.
In MATLAB there is the function mod, which performs the modulo operation.
The following example shows different results between MATLAB's mod and the equivalent NumPy remainder operation:
Matlab:
>> mod(6, 0.05)
ans =
0
Numpy:
np.remainder(6, 0.05)
0.04999999999999967
np.mod(6, 0.05)
0.04999999999999967
Python's modulo operator gives the same result as NumPy:
6%0.05
0.04999999999999967
Is there anything in Python which gives the same mod operation as the one in MATLAB, preferably something that can operate on NumPy 2D/3D arrays?
The numpy documentation says that numpy.mod is equivalent to MATLAB's mod.
This is the core of the problem, in Python:
>>> 6/0.05 == 120
True
>>> 6//0.05 == 120 # this is 119 instead
False
The floating-point result of 6/0.05 is close enough to 120 (i.e. within the resolution of double precision) that it gets rounded to 120.0. However, the true quotient is ever so slightly smaller than 120, so explicit floor division truncates it to 119 before it could ever be rounded to 120.0.
Some proof:
>>> from decimal import Decimal
... print(6/Decimal(0.05))    # Decimal gets the exact binary value of 0.05
... print(6/Decimal('0.05'))  # Decimal gets exactly 0.05
119.9999999999999933386618522
1.2E+2
The first number is what 6/0.05 would give before rounding, but 119.9999999999999933386618522 gets rounded to the nearest number representable in double precision, and that is 120. One can easily prove that the two numbers are indeed the same within double precision:
>>> print(6/Decimal('0.05') - 6/Decimal(0.05))
6.6613381478E-15
>>> 120 - 6.6613381478E-15 == 120
True
Now here's help mod from MATLAB:
MOD(x,y) returns x - floor(x./y).*y if y ~= 0, carefully computed to
avoid rounding error. If y is not an integer and the quotient x./y is
within roundoff error of an integer, then n is that integer.
This suggests that when x/y is close to an integer, it is rounded first rather than being truncated as in Python. So MATLAB goes out of its way to do some magic with the floating-point results.
The simplest solution would be to round your numbers yourself (unless you can use something like decimal.Decimal, but that means forgoing native doubles entirely, including literals) and reproduce MATLAB's mod that way, assuming that makes sense for your use case.
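As a minimal sketch of that suggestion (my own, with an arbitrary tolerance, and not validated against MATLAB itself): round the quotient to the nearest integer when it is within tolerance of one, otherwise floor it as usual.

import numpy as np

def matlab_like_mod(x, y, rtol=1e-9):
    # hypothetical helper: rounds the quotient when it is close to an
    # integer, mimicking the behaviour MATLAB's help text describes
    q = x / np.asarray(y)
    q = np.where(np.isclose(q, np.round(q), rtol=rtol), np.round(q), np.floor(q))
    return x - q * y

print(matlab_like_mod(6, 0.05))  # 0.0
print(np.remainder(6, 0.05))     # 0.04999999999999967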
Here is a workaround. It basically recovers the exact decimal value of the denominator via its str representation and from there does integer arithmetic:
import numpy as np
import decimal
def round_divmod(b, a):
    # convert each element of a to an exact integer ratio via Decimal
    n, d = np.frompyfunc(lambda x: decimal.Decimal(x).as_integer_ratio(), 1, 2)(a.astype('U'))
    n, d = n.astype(int), d.astype(int)
    q, r = np.divmod(b * d, n)
    return q, r / d
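The a.astype('U') step is what makes this work: converting each float to its shortest round-trip string (e.g. '0.05') lets Decimal parse the intended decimal value rather than the binary approximation, after which the quotient and remainder come from exact integer divmod.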
a = np.round(np.linspace(0.05,1,20),2).reshape(4,5)
a
# array([[0.05, 0.1 , 0.15, 0.2 , 0.25],
# [0.3 , 0.35, 0.4 , 0.45, 0.5 ],
# [0.55, 0.6 , 0.65, 0.7 , 0.75],
# [0.8 , 0.85, 0.9 , 0.95, 1. ]])
round_divmod(6,a)
# (array([[120, 60, 40, 30, 24],
# [ 20, 17, 15, 13, 12],
# [ 10, 10, 9, 8, 8],
# [ 7, 7, 6, 6, 6]]), array([[0. , 0. , 0. , 0. , 0. ],
# [0. , 0.05, 0. , 0.15, 0. ],
# [0.5 , 0. , 0.15, 0.4 , 0. ],
# [0.4 , 0.05, 0.6 , 0.3 , 0. ]]))
I am looking to get:

input: arange(0.0, 0.6, 0.2)
output: 0., 0.4

but I want:

0., 0.2, 0.4, 0.6

How do I achieve this using range or arange? If not, what is the alternative?
A simpler approach to get the desired output is to add the step size to the upper limit. For instance,
np.arange(start, end + step, step)
would allow you to include the end point as well. In your case:
np.arange(0.0, 0.6 + 0.2, 0.2)
would result in
array([0. , 0.2, 0.4, 0.6]).
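One caveat with this trick (my note, not the answerer's): when the step does not evenly divide the range, adding a full step overshoots and includes a point past the intended stop:

>>> np.arange(0.0, 1.0 + 0.3, 0.3)
array([0. , 0.3, 0.6, 0.9, 1.2])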
In short
I wrote a function crange, which does what you require.
In the example below, orange does the job of numpy.arange
crange(1, 1.3, 0.1) >>> [1. 1.1 1.2 1.3]
orange(1, 1.3, 0.1) >>> [1. 1.1 1.2]
crange(0.0, 0.6, 0.2) >>> [0. 0.2 0.4 0.6]
orange(0.0, 0.6, 0.2) >>> [0. 0.2 0.4]
Background information
I had your problem a few times as well. I usually quick-fixed it by adding a small value to stop. As mentioned by Kasrâmvd in the comments, the issue is a bit more complex, as floating-point rounding errors can occur in numpy.arange (see here and here).
Unexpected behavior can be found in this example:
>>> numpy.arange(1, 1.3, 0.1)
array([1. , 1.1, 1.2, 1.3])
To clear things up a bit for myself, I decided to stop using numpy.arange unless I specifically need it. Instead I use my self-defined function orange to avoid unexpected behavior. It combines numpy.isclose and numpy.linspace.
Here is the Code
Enough bla bla - here is the code ^^
import numpy as np
def cust_range(*args, rtol=1e-05, atol=1e-08, include=[True, False]):
    """
    Combines numpy.linspace and numpy.isclose to mimic
    open, half-open and closed intervals.

    Avoids also floating point rounding errors as with
    >>> numpy.arange(1, 1.3, 0.1)
    array([1. , 1.1, 1.2, 1.3])

    args: [start, ]stop[, step]
        as in numpy.arange
    rtol, atol: floats
        floating point tolerance as in numpy.isclose
    include: boolean list-like, length 2
        whether start and end point are included
    """
    # process arguments
    if len(args) == 1:
        start = 0
        stop = args[0]
        step = 1
    elif len(args) == 2:
        start, stop = args
        step = 1
    else:
        assert len(args) == 3
        start, stop, step = tuple(args)

    # determine number of segments
    n = (stop - start) / step + 1

    # round n if it is close to an integer
    if np.isclose(n, np.round(n), rtol=rtol, atol=atol):
        n = np.round(n)

    # correct for excluded start/end points
    if not include[0]:
        n -= 1
        start += step
    if not include[1]:
        n -= 1
        stop -= step

    return np.linspace(start, stop, int(n))

def crange(*args, **kwargs):
    return cust_range(*args, **kwargs, include=[True, True])

def orange(*args, **kwargs):
    return cust_range(*args, **kwargs, include=[True, False])
print('crange(1, 1.3, 0.1) >>>', crange(1, 1.3, 0.1))
print('orange(1, 1.3, 0.1) >>>', orange(1, 1.3, 0.1))
print('crange(0.0, 0.6, 0.2) >>>', crange(0.0, 0.6, 0.2))
print('orange(0.0, 0.6, 0.2) >>>', orange(0.0, 0.6, 0.2))
Interesting that you get that output. Running arange(0.0,0.6,0.2) I get:
array([0. , 0.2, 0.4])
Regardless, from the numpy.arange docs: Values are generated within the half-open interval [start, stop) (in other words, the interval including start but excluding stop).
Also from the docs: When using a non-integer step, such as 0.1, the results will often not be consistent. It is better to use numpy.linspace for these cases
The only thing I can suggest to achieve what you want is to modify the stop parameter and add a very small amount, for example
np.arange(0.0, 0.6 + 0.001, 0.2)
Returns
array([0. , 0.2, 0.4, 0.6])
Which is your desired output.
Anyway, it is better to use numpy.linspace and set endpoint=True
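For the example in the question, that looks like (my own one-liner):

np.linspace(0.0, 0.6, num=4, endpoint=True)
# array([0. , 0.2, 0.4, 0.6])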
Old question, but it can be done much more easily.
def arange(start, stop, step=1, endpoint=True):
    arr = np.arange(start, stop, step)
    # append stop only when arr[-1] + step lands on it exactly;
    # note this exact float comparison can miss under rounding error
    if endpoint and arr[-1] + step == stop:
        arr = np.concatenate([arr, [stop]])
    return arr
print(arange(0, 4, 0.5, endpoint=True))
print(arange(0, 4, 0.5, endpoint=False))
which gives
[0. 0.5 1. 1.5 2. 2.5 3. 3.5 4. ]
[0. 0.5 1. 1.5 2. 2.5 3. 3.5]
A simple example using np.linspace (mentioned numerous times in other answers, but no simple examples were present):
import numpy as np
start = 0.0
stop = 0.6
step = 0.2
num = round((stop - start) / step) + 1 # i.e. length of resulting array
np.linspace(start, stop, num)
>>> array([0.0, 0.2, 0.4, 0.6])
Assumption: stop is a multiple of step. round is necessary to correct for floating point error.
OK, I will leave this solution here. The first step is to calculate the fractional portion of the number of items given the bounds [a, b] and the step amount. Next, calculate an appropriate amount to add to the end that will not affect the size of the resulting NumPy array, and then call np.arange().
import numpy as np
def np_arange_fix(a, b, step):
    nf = (lambda n: n - int(n))((b - a) / step + 1)
    bb = (lambda x: step * max(0.1, x) if x < 0.5 else 0)(nf)
    arr = np.arange(a, b + bb, step)
    if int((b - a) / step + 1) != len(arr):
        raise ValueError('Expected {} items, got {} items, arr-out {}'.format(
            int((b - a) / step + 1), len(arr), arr))
    return arr
print(np_arange_fix(1.0, 4.4999999999999999, 1.0))
print(np_arange_fix(1.0, 4 + 1/3, 1/3))
print(np_arange_fix(1.0, 4 + 1/3, 1/3 + 0.1))
print(np_arange_fix(1.0, 6.0, 1.0))
print(np_arange_fix(0.1, 6.1, 1.0))
Prints:
[1. 2. 3. 4.]
[1. 1.33333333 1.66666667 2. 2.33333333 2.66666667
3. 3.33333333 3.66666667 4. 4.33333333]
[1. 1.43333333 1.86666667 2.3 2.73333333 3.16666667
3.6 4.03333333]
[1. 2. 3. 4. 5. 6.]
[0.1 1.1 2.1 3.1 4.1 5.1 6.1]
If you want to compact this down to a two-liner:

def np_arange_fix(a, b, step):
    b += (lambda x: step * max(0.1, x) if x < 0.5 else 0)((lambda n: n - int(n))((b - a) / step + 1))
    return np.arange(a, b, step)
I use NumPy to calculate a matrix multiplication.
If I use t = t * x, it works just fine, but if I use t *= x, it doesn't.
Do I need to use t = t * x?
import numpy as np

if __name__ == '__main__':
    x = [
        [0.9, 0.075, 0.025],
        [0.15, 0.8, 0.05],
        [0.25, 0.25, 0.5],
    ]
    t = [1, 0, 0]
    x = np.matrix(x)
    t = np.matrix(t)
    t = t * x   # works: [[ 0.9    0.075  0.025]]
    # t *= x    # doesn't work: always [[0 0 0]]
    print(t)
You filled t with ints rather than floats, so NumPy decides you want a matrix of integer dtype. When you do t *= x, this requests that the operation be performed in place, reusing the t object to store the result. This forces the results to be cast to integers, so they can be stored in t.
Initialize t with floats:
t = np.matrix([1.0, 0.0, 0.0])
I would also recommend switching to plain arrays rather than matrices. The convenience of * over dot isn't worth the inconsistencies matrix causes. If you're on Python 3.5 or later, you can even use @ for matrix multiplication with regular arrays.
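A quick sketch of the plain-array version recommended above (my own illustration, assuming Python 3.5+ for the @ operator):

import numpy as np

x = np.array([[0.9, 0.075, 0.025],
              [0.15, 0.8, 0.05],
              [0.25, 0.25, 0.5]])
t = np.array([1.0, 0.0, 0.0])  # float literals, so the dtype is float64

t = t @ x
print(t)  # [0.9   0.075 0.025]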
So, let us say that we are given an array of ints like:
x = numpy.array([0, 1, 1, 0])
We are also given another array, with floats, the same length as x:
y = numpy.array([-1.5, 2.2, -1.0, 1.0])
I want to use x and y to make an array z such that z[i] = y[i] if y[i] <= 0 (regardless of what x[i] is), but z[i] = 0 if x[i] = 1 AND y[i] > 0. So, using our example arrays:
z = [-1.5, 0, -1.0, 1.0]
This would be easy to do if I were using Python for loops, but I don't want to use Python for loops. Another idea is to write it using for loops, and then simply use something like Cython or Numba to speed up the for loop.
However, I want to use Numpy functions as much as possible (that's what makes this question a question), but I don't really see how. Maybe using masks? How would you do it?
Method #1: enforce the condition directly.
>>> z = y.copy()
>>> z[(x == 1) & (y > 0)] = 0
>>> z
array([-1.5, 0. , -1. , 1. ])
Method #2: use np.where:
>>> np.where((x == 1) & (y > 0), 0, y)
array([-1.5, 0. , -1. , 1. ])
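Both methods build the same boolean mask (x == 1) & (y > 0); the first writes zeros into a copy of y in place, while np.where constructs the result in a single expression. Either way, no Python-level loop is involved.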