The title is a bit misleading, because it's not exactly x and x, it's x and 0.3; however, the values should be the same.
I have:
arr = np.arange(0, 1.1, 0.1)
and I receive:
arr[arr <= 0.3]
> array([0., 0.1, 0.2])
The correct result should be:
arr[arr <= 0.3]
> array([0., 0.1, 0.2, 0.3])
I have not yet stumbled upon this problem. I know it is related to floating point precision ... but what can I do here?
Don't rely on comparing floats for equality (unless you know exactly what floats you are dealing with).
Since you know the stepsize used to generate the array is 0.1,
arr = np.arange(0, 1.1, 0.1)
you could increase the threshold value, 0.3, by half the stepsize to find a new threshold which is safely between values in arr:
In [48]: stepsize = 0.1; arr[arr < 0.3+(stepsize/2)]
Out[48]: array([ 0. , 0.1, 0.2, 0.3])
By the way, the 1.1 in np.arange(0, 1.1, 0.1) is an application of the same idea -- given the vagaries of floating-point arithmetic, we couldn't be sure that 1.0 would be included if we wrote np.arange(0, 1.0, 0.1), so the right endpoint was increased by the stepsize.
Fundamentally, the problem boils down to floating-point arithmetic being inaccurate:
In [17]: 0.1+0.2 == 0.3
Out[17]: False
So the fourth value in the array is a little bit greater than 0.3.
In [40]: arr = np.arange(0,1.1, 0.1)
In [41]: arr[3]
Out[41]: 0.30000000000000004
Note that rounding may not be a viable solution. For example,
if arr has dtype float128:
In [53]: arr = np.arange(0, 1.1, 0.1, dtype='float128')
In [56]: arr[arr.round(1) <= 0.3]
Out[56]: array([ 0.0, 0.1, 0.2], dtype=float128)
Although making the dtype float128 made arr[3] closer to the decimal 0.3,
In [54]: arr[3]
Out[54]: 0.30000000000000001665
rounding now produces a number slightly greater than 0.3, so the comparison arr.round(1) <= 0.3 still fails:
In [55]: arr.round(1)[3]
Out[55]: 0.30000000000000000001
Unutbu points out the main problem: you should avoid comparing floating-point numbers for equality, because they carry round-off error.
However, this is a problem many people come across, so there is a function that helps you get around it: np.isclose. In your case this would lead to:
arr[np.logical_or(arr <= 0.3, np.isclose(0.3, arr))]
>>> array([0., 0.1, 0.2, 0.3])
In this case this might not be the best option, but it might be helpful to know about this function.
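The same idea can be wrapped in a small helper. This is only a sketch: the name less_equal_close is made up for illustration, and it assumes the default tolerances of np.isclose (rtol=1e-05, atol=1e-08) are acceptable for your data.

```python
import numpy as np

def less_equal_close(arr, threshold, rtol=1e-05, atol=1e-08):
    """Boolean mask: True where arr <= threshold, treating values
    within tolerance of threshold as equal to it."""
    arr = np.asarray(arr)
    return (arr <= threshold) | np.isclose(arr, threshold, rtol=rtol, atol=atol)

arr = np.arange(0, 1.1, 0.1)
selected = arr[less_equal_close(arr, 0.3)]
print(selected)  # four elements; the last one is the float nearest to 0.3
```

If the array values can be much larger or much smaller than 1, the tolerances would need to be chosen to match.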
Sidenote:
In case nobody has ever explained to you why this happens: computers store everything in binary, but 0.1 has a repeating (periodic) expansion in binary, so the computer can't store all of its digits (there are infinitely many). The equivalent in decimal would be:
1/3+1/3+1/3 = 0.33333 + 0.33333 + 0.33333 = 0.99999
Which is not 1
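You can inspect the value that is actually stored with the standard library. A short sketch using decimal.Decimal and fractions.Fraction, both of which convert a float exactly:

```python
from decimal import Decimal
from fractions import Fraction

# Decimal(float) shows the exact value held by the binary float.
stored = Decimal(0.1)
print(stored)  # slightly more than 0.1

# The stored value is a fraction whose denominator is a power of two,
# which is why 1/10 cannot be represented exactly.
frac = Fraction(0.1)
is_power_of_two = frac.denominator & (frac.denominator - 1) == 0
print(frac, is_power_of_two)

# The small errors do not cancel when values are added.
print(0.1 + 0.2 == 0.3)  # False
```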
Related
I am using numpy in Python
I have an array of numbers, for example:
arr = np.array([0.1, 1, 1.2, 0.5, -0.3, -0.2, 0.1, 0.5, 1])
If i is a position in the array, I want to create a function which creates a running sum of i and the two previous numbers, but only accumulating the number if it is equal to or greater than 0.
In other words, negative numbers in the array become equal to 0 when calculating the three number running sum.
For example, the answer I would be looking for here is
2.3, 2.7, 1.7, 0.5, 0.1, 0.6, 1.6
The new array has two fewer elements than the original array, as the calculation can't be completed for the first two numbers.
Thank you !
As Dani Mesejo answered, you can use stride tricks. You can either use clip or boolean indexing to handle the <0 elements. I have explained how stride tricks work below -
arr[arr<0]=0 sets all elements below 0 as 0
as_strided takes the array, the expected shape of the view, (7,3), and the strides along the respective axes, (8,8). A stride is the number of bytes you have to move along axis 0 and axis 1 respectively to reach the next element. E.g. if you want each window to start 2 elements after the previous one, you can set the strides to (16,8). This means you would move 16 bytes each time to get the next element along axis 0 (which is 0.1->1.2->0->0.1->..; the number of rows in the shape must then shrink so the view stays inside the array) and 8 bytes each time to get the next element along axis 1 (which is 0.1->1->1.2, till a shape of 3).
Use this function with caution! Always use x.strides to define the strides parameter to avoid corrupting memory!
Lastly, sum this array view over axis=1 to get your rolling sum.
arr = np.array([0.1, 1, 1.2, 0.5, -0.3, -0.2, 0.1, 0.5, 1])
w = 3 #rolling window
arr[arr<0]=0
shape = arr.shape[0]-w+1, w #Expected shape of view (7,3)
strides = arr.strides[0], arr.strides[0] #Strides (8,8) bytes
rolling = np.lib.stride_tricks.as_strided(arr, shape=shape, strides=strides)
rolling_sum = np.sum(rolling, axis=1)
rolling_sum
array([2.3, 2.7, 1.7, 0.5, 0.1, 0.6, 1.6])
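If your NumPy version provides it (1.20 or later, as far as I know), numpy.lib.stride_tricks.sliding_window_view builds the same windowed view without hand-written strides, which avoids the memory-corruption risk mentioned above. A minimal sketch on the same data:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

arr = np.array([0.1, 1, 1.2, 0.5, -0.3, -0.2, 0.1, 0.5, 1])
w = 3  # rolling window length

# Clip into a new array so the original is left untouched.
clipped = np.clip(arr, 0, None)

# Read-only view of shape (len(arr) - w + 1, w) == (7, 3).
windows = sliding_window_view(clipped, w)
rolling_sum = windows.sum(axis=1)
print(rolling_sum)
```

Unlike arr[arr<0]=0, this version does not modify arr in place.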
You could clip, roll and sum:
import numpy as np
def rolling_window(a, window):
    """Recipe from https://stackoverflow.com/q/6811183/4001592"""
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
a = np.array([0.1, 1, 1.2, 0.5, -0.3, -0.2, 0.1, 0.5, 1])
res = rolling_window(np.clip(a, 0, a.max()), 3).sum(axis=1)
print(res)
Output
[2.3 2.7 1.7 0.5 0.1 0.6 1.6]
You may use np.correlate to sweep an array of 3 ones over the clipped arr to get the desired output
In [20]: np.correlate(arr.clip(0), np.ones(3), mode='valid')
Out[20]: array([2.3, 2.7, 1.7, 0.5, 0.1, 0.6, 1.6])
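For long arrays and large windows, a cumulative sum is another option, since each window sum is the difference of two prefix sums. This is a sketch of that alternative technique, not part of the np.correlate approach; the variable names are illustrative.

```python
import numpy as np

arr = np.array([0.1, 1, 1.2, 0.5, -0.3, -0.2, 0.1, 0.5, 1])
w = 3  # rolling window length

clipped = arr.clip(0)
# Prefix sums with a leading 0 so that csum[k] is the sum of the first k elements.
csum = np.concatenate(([0.0], np.cumsum(clipped)))
# Sum of elements i .. i+w-1 is csum[i+w] - csum[i].
rolling_sum = csum[w:] - csum[:-w]
print(rolling_sum)
```

Results can differ from the direct sums in the last few bits, because the prefix sums accumulate rounding error.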
arr = np.array([0.1, 1, 1.2, 0.5, -0.3, -0.2, 0.1, 0.5, 1])
def sum_3(x):
    collector = []
    for i in range(len(x) - 2):
        window = x[i:i+3]
        # sum only the positive entries of the current 3-element window
        collector.append(sum(window[window > 0]))
    return collector
#output
[2.3, 2.7, 1.7, 0.5, 0.1, 0.6, 1.6]
Easiest and most comprehensible way. For each window of 3 consecutive numbers, the collector appends the sum of the positive entries only; the negative entries are masked out, which is equivalent to treating them as 0.
The method above is hard-coded for 3 consecutive values, but you can adapt it to any window length n:
def sum_any(x, n):
    collector = []
    for i in range(len(x) - (n - 1)):
        window = x[i:i+n]
        # sum only the positive entries of the current n-element window
        collector.append(sum(window[window > 0]))
    return collector
Masked arrays and view_as_windows (which uses numpy strides under the hood) are built for this purpose:
from skimage.util import view_as_windows
arr = view_as_windows(arr, 3)
arr2 = np.ma.masked_array(arr, arr<0).sum(-1)
output:
[2.3 2.7 1.7 0.5 0.1 0.6 1.6]
I am trying to write a function that returns the value of the smallest integer that needs to be multiplied for a list of floats to be all integers. I tried implementing something with the "Least Common Multiple," but I'm not sure if the math checks out...
Say I have the following list (or list-like object) of float values:
example = [0.5, 0.4, 0.2, 0.1]
How could I write a function that returns func(example) = 10 ?
Another example would be...
example = [0.05, 0.1, 0.7, 0.8]
> func(example)
20
Since...
> 20 * np.array(example)
np.array([1, 2, 14, 16])
And all are integers.
Find the largest number of decimal places, scale the list by the matching power of ten to get integers, take their gcd, and divide the power of ten by that gcd to get the minimum integer multiplier.
import numpy as np
import decimal
from math import gcd
from functools import reduce
def find_gcd(lst):
    x = reduce(gcd, lst)
    return x
example = [0.05, 0.1, 0.7, 0.8, 0.9]
decimal_places = min([decimal.Decimal(str(val)).as_tuple().exponent for val in example])
x1 = np.array(example)
multiplier = 1/(10**decimal_places)
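# round before converting: int() truncates, so a product like 56.99999999999999 would become 56
gcd_val = find_gcd(int(round(v)) for v in x1 * multiplier)
min_multiplier = int(multiplier / gcd_val)
print('Minimum Integer Multiplier: ', min_multiplier)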
If you don't like Decimal.
example = [0.05, 0.1, 0.7, 0.8, 0.9]
n_places = max([len(str(val).split('.')[1]) for val in example])
multiplier = 10**n_places
x1 = np.array(example)
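# round before converting: int() truncates, so a product like 56.99999999999999 would become 56
gcd_val = find_gcd(int(round(v)) for v in x1 * multiplier)
min_multiplier = int(multiplier / gcd_val)
print('Minimum Integer Multiplier: ', min_multiplier)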
If you have an upper bound max_den on plausible denominators, the fractions.Fraction class has a handy limit_denominator method.
For example:
import fractions
max_den = 1000
fractions.Fraction(1/3)
# probably not what we want
# Fraction(6004799503160661, 18014398509481984)
fractions.Fraction(1/3).limit_denominator(max_den)
# better
# Fraction(1, 3)
import sympy
example = [0.5, 0.4, 0.2, 0.1]
sympy.lcm([fractions.Fraction(x).limit_denominator(max_den).denominator for x in example])
# 10
example = [0.05, 0.1, 0.7, 0.8]
sympy.lcm([fractions.Fraction(x).limit_denominator(max_den).denominator for x in example])
# 20
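The same approach also works with the standard library alone. A minimal sketch, assuming every input is close to a fraction whose denominator is at most max_den; the function name smallest_multiplier is made up for illustration.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def smallest_multiplier(values, max_den=1000):
    """Smallest positive integer m such that m * v is (approximately)
    an integer for every v, assuming denominators of at most max_den."""
    denominators = [
        Fraction(v).limit_denominator(max_den).denominator for v in values
    ]
    # lcm(a, b) = a * b // gcd(a, b), folded over all denominators
    return reduce(lambda a, b: a * b // gcd(a, b), denominators, 1)

print(smallest_multiplier([0.5, 0.4, 0.2, 0.1]))    # 10
print(smallest_multiplier([0.05, 0.1, 0.7, 0.8]))   # 20
```

The lcm is folded by hand with gcd so the sketch does not depend on math.lcm, which only exists in newer Python versions.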
I read that numpy is unbiased in rounding and that it works the way it's designed: "if you always round 0.5 up to the next largest number, then the average of a bunch of rounded numbers is likely to be slightly larger than the average of the unrounded numbers: this bias or drift can have very bad effects on some numerical algorithms and make them inaccurate."
Disregarding this information and assuming that I always want to round up, how can I do it in numpy? Assuming my array can be quite large.
For simplicity, let's assume I have the array:
import numpy as np
A = [ [10, 15, 30], [25, 134, 41], [134, 413, 51]]
A = np.array(A, dtype=np.int16)
decimal = A * .1
whole = np.round(decimal)
decimal looks like:
[[ 1. 1.5 3. ]
[ 2.5 13.4 4.1]
[ 13.4 41.3 5.1]]
whole looks like:
[[ 1. 2. 3.]
[ 2. 13. 4.]
[ 13. 41. 5.]]
As you can see, 1.5 rounded to 2 and 2.5 also rounded to 2. How can I force XX.5 to always round up? I know I can loop through the array and use Python's round(), but that would definitely be much slower. I was wondering if there is a way to do it using numpy functions.
The answer is almost never np.vectorize. You can, and should, do this in a fully vectorized manner. Let's say that for x >= 0, you want r = floor(x + 0.5). If you want negative halves to round towards zero (so that -2.5 becomes -2), the same formula applies for x < 0. So let's say instead that you always want halves to round away from zero. In that case, you are looking for ceil(x - 0.5) for x < 0.
To implement that for an entire array without calling np.vectorize, you can use masking:
def round_half_up(x):
    mask = (x >= 0)
    out = np.empty_like(x)
    out[mask] = np.floor(x[mask] + 0.5)
    out[~mask] = np.ceil(x[~mask] - 0.5)
    return out
Notice that you don't need to use a mask if you round all in one direction:
def round_up(x):
    return np.floor(x + 0.5)
Now if you want to make this really efficient, you can get rid of all the temp arrays. This will use the full power of ufuncs:
def round_half_up(x):
    out = x.copy()
    mask = (out >= 0)
    np.add(out, 0.5, where=mask, out=out)
    np.floor(out, where=mask, out=out)
    np.invert(mask, out=mask)
    np.subtract(out, 0.5, where=mask, out=out)
    np.ceil(out, where=mask, out=out)
    return out
And:
def round_up(x):
    out = x + 0.5
    np.floor(out, out=out)
    return out
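A compact way to check the difference from np.round is a sign-based variant of the same formula. This is a sketch with an illustrative name, round_half_away_from_zero; it allocates temporaries, so it is shorter but not faster than the ufunc version above.

```python
import numpy as np

def round_half_away_from_zero(x):
    """Round to the nearest integer, with ties going away from zero."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.floor(np.abs(x) + 0.5)

x = np.array([1.5, 2.5, -1.5, -2.5, 13.4])
banker = np.round(x)                   # ties go to the nearest even integer
away = round_half_away_from_zero(x)    # ties go away from zero
print(banker)  # [ 2.  2. -2. -2. 13.]
print(away)    # [ 2.  3. -2. -3. 13.]
```

The test values are exactly representable in binary, so the ties here are genuine ties; a value such as 0.15 * 10 may not land exactly on a half.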
import numpy as np
A = [ [1.0, 1.5, 3.0], [2.5, 13.4, 4.1], [13.4, 41.3, 5.1]]
A = np.array(A)
print(A)
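def rounder(x):
    # compare against floor(x), not int(x): int() truncates toward zero,
    # which would send every negative value to the floor branch
    if x - np.floor(x) >= 0.5:
        return np.ceil(x)
    else:
        return np.floor(x)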
rounder_vec = np.vectorize(rounder)
whole = rounder_vec(A)
print(whole)
Alternatively, you can also look at numpy.ceil, numpy.floor, numpy.trunc for other rounding styles
I am looking to get :
input:
arange(0.0,0.6,0.2)
output:
0.,0.4
I want
0.,0.2,0.4,0.6
How do I achieve this using range or arange? If that is not possible, what is the alternative?
A simpler approach to get the desired output is to add the step size to the upper limit. For instance,
np.arange(start, end + step, step)
would allow you to include the end point as well. In your case:
np.arange(0.0, 0.6 + 0.2, 0.2)
would result in
array([0. , 0.2, 0.4, 0.6]).
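Adding a full step can overshoot by one element when rounding pushes end + step slightly too high, so a common variant adds only half a step. A minimal sketch, assuming stop lies (up to rounding) on the grid start + k*step; the name arange_inclusive is made up for illustration.

```python
import numpy as np

def arange_inclusive(start, stop, step):
    """Like np.arange, but includes stop when it lies on the grid.
    Half a step of slack absorbs floating-point rounding."""
    return np.arange(start, stop + step / 2, step)

a = arange_inclusive(0.0, 0.6, 0.2)
b = arange_inclusive(1, 1.3, 0.1)
print(a)  # four values from 0.0 to 0.6
print(b)  # four values from 1.0 to 1.3
```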
In short
I wrote a function crange, which does what you require.
In the example below, orange does the job of numpy.arange
crange(1, 1.3, 0.1) >>> [1. 1.1 1.2 1.3]
orange(1, 1.3, 0.1) >>> [1. 1.1 1.2]
crange(0.0, 0.6, 0.2) >>> [0. 0.2 0.4 0.6]
orange(0.0, 0.6, 0.2) >>> [0. 0.2 0.4]
Background information
I had your problem a few times as well. I usually quick-fixed it by adding a small value to stop. As mentioned by Kasrâmvd in the comments, the issue is a bit more complex, as floating point rounding errors can occur in numpy.arange (see here and here).
Unexpected behavior can be found in this example:
>>> numpy.arange(1, 1.3, 0.1)
array([1. , 1.1, 1.2, 1.3])
To clear things up a bit for myself, I decided to stop using numpy.arange unless it is specifically needed. I instead use my self-defined function orange to avoid unexpected behavior. It combines numpy.isclose and numpy.linspace.
Here is the Code
Enough bla bla - here is the code ^^
import numpy as np
def cust_range(*args, rtol=1e-05, atol=1e-08, include=(True, False)):
    """
    Combines numpy.linspace and numpy.isclose to mimic
    open, half-open and closed intervals.
    Also avoids floating point rounding errors as with
    >>> numpy.arange(1, 1.3, 0.1)
    array([1. , 1.1, 1.2, 1.3])
    args: [start, ]stop, [step, ]
        as in numpy.arange
    rtol, atol: floats
        floating point tolerance as in numpy.isclose
    include: boolean list-like, length 2
        if start and end point are included
    """
    # process arguments
    if len(args) == 1:
        start = 0
        stop = args[0]
        step = 1
    elif len(args) == 2:
        start, stop = args
        step = 1
    else:
        assert len(args) == 3
        start, stop, step = tuple(args)
    # determine number of segments
    n = (stop-start)/step + 1
    # do rounding for n
    if np.isclose(n, np.round(n), rtol=rtol, atol=atol):
        n = np.round(n)
    # correct for start/end being excluded
    if not include[0]:
        n -= 1
        start += step
    if not include[1]:
        n -= 1
        stop -= step
    return np.linspace(start, stop, int(n))
def crange(*args, **kwargs):
    return cust_range(*args, **kwargs, include=[True, True])
def orange(*args, **kwargs):
    return cust_range(*args, **kwargs, include=[True, False])
print('crange(1, 1.3, 0.1) >>>', crange(1, 1.3, 0.1))
print('orange(1, 1.3, 0.1) >>>', orange(1, 1.3, 0.1))
print('crange(0.0, 0.6, 0.2) >>>', crange(0.0, 0.6, 0.2))
print('orange(0.0, 0.6, 0.2) >>>', orange(0.0, 0.6, 0.2))
Interesting that you get that output. Running arange(0.0,0.6,0.2) I get:
array([0. , 0.2, 0.4])
Regardless, from the numpy.arange docs: Values are generated within the half-open interval [start, stop) (in other words, the interval including start but excluding stop).
Also from the docs: When using a non-integer step, such as 0.1, the results will often not be consistent. It is better to use numpy.linspace for these cases
The only thing I can suggest to achieve what you want is to modify the stop parameter and add a very small amount, for example
np.arange(0.0, 0.6 + 0.001 ,0.2)
Returns
array([0. , 0.2, 0.4, 0.6])
Which is your desired output.
Anyway, it is better to use numpy.linspace and set endpoint=True
Old question, but it can be done much easier.
def arange(start, stop, step=1, endpoint=True):
    arr = np.arange(start, stop, step)
    # append stop when the next step would land on it (within float tolerance)
    if endpoint and np.isclose(arr[-1] + step, stop):
        arr = np.concatenate([arr, [stop]])
    return arr
print(arange(0, 4, 0.5, endpoint=True))
print(arange(0, 4, 0.5, endpoint=False))
which gives
[0. 0.5 1. 1.5 2. 2.5 3. 3.5 4. ]
[0. 0.5 1. 1.5 2. 2.5 3. 3.5]
A simple example using np.linspace (mentioned numerous times in other answers, but no simple examples were present):
import numpy as np
start = 0.0
stop = 0.6
step = 0.2
num = round((stop - start) / step) + 1 # i.e. length of resulting array
np.linspace(start, stop, num)
>>> array([0.0, 0.2, 0.4, 0.6])
Assumption: stop is a multiple of step. round is necessary to correct for floating point error.
OK, I will leave this solution here. The first step is to calculate the fractional portion of the number of items given the bounds [a,b] and the step amount. Next, calculate an appropriate amount to add to the end that will not affect the size of the resulting numpy array, and then call np.arange().
import numpy as np
def np_arange_fix(a, b, step):
    nf = (lambda n: n-int(n))((b - a)/step+1)
    bb = (lambda x: step*max(0.1, x) if x < 0.5 else 0)(nf)
    arr = np.arange(a, b+bb, step)
    expected = int((b-a)/step+1)
    if expected != len(arr):
        # a bare `raise` has no active exception here, so raise explicitly
        raise ValueError('I failed, expected {} items, got {} items, arr-out{}'.format(expected, len(arr), arr))
    return arr
print(np_arange_fix(1.0, 4.4999999999999999, 1.0))
print(np_arange_fix(1.0, 4 + 1/3, 1/3))
print(np_arange_fix(1.0, 4 + 1/3, 1/3 + 0.1))
print(np_arange_fix(1.0, 6.0, 1.0))
print(np_arange_fix(0.1, 6.1, 1.0))
Prints:
[1. 2. 3. 4.]
[1. 1.33333333 1.66666667 2. 2.33333333 2.66666667
3. 3.33333333 3.66666667 4. 4.33333333]
[1. 1.43333333 1.86666667 2.3 2.73333333 3.16666667
3.6 4.03333333]
[1. 2. 3. 4. 5. 6.]
[0.1 1.1 2.1 3.1 4.1 5.1 6.1]
If you want to compact this down to a function:
def np_arange_fix(a, b, step):
    b += (lambda x: step*max(0.1, x) if x < 0.5 else 0)((lambda n: n-int(n))((b - a)/step+1))
    return np.arange(a, b, step)
No, this is not a duplicate, and the link above is specifically what I was referring to as not the correct answer. That link and my post here specifically ask about producing a Decimal list, but the "answer" produces a float list.
The correct answer is to use Decimal parameters (built from strings, not floats) with np.arange, as in
x_values = np.arange(Decimal('-2.0'), Decimal('2.0'), Decimal('0.1')) Thanks https://stackoverflow.com/users/2084384/boargules
I believe this may be answered elsewhere, but the answers I've found seem wrong. I want a list of decimals (precision = 1 decimal place) from -2 to 2.
-2, -1.9, -1.8 ... 1.8, 1.9, 2.0
When I do:
import numpy as np
x_values = np.arange(-2,2,0.1)
x_values
I get:
array([ -2.00000000e+00, -1.90000000e+00, -1.80000000e+00, ...
I tried:
from decimal import getcontext, Decimal
getcontext().prec = 2
x_values = [x for x in np.around(np.arange(-2, 2, .1), 2)]
x_values2 = [Decimal(x) for x in x_values]
x_values2
I get:
[Decimal('-2'),
Decimal('-1.899999999999999911182158029987476766109466552734375'),
Decimal('-1.8000000000000000444089209850062616169452667236328125'), ...
I'm running 3.6.3 in jupyter notebook.
Update: I changed the ranges from 2 to 2.0. This improved the result, but I still get a rounding error:
import numpy as np
x_values = np.arange(-2.0, 2.0, 0.1)
x_values
Which produces:
-2.00000000e+00, -1.90000000e+00, -1.80000000e+00, ...
1.00000000e-01, 1.77635684e-15, 1.00000000e-01, ...
1.80000000e+00, 1.90000000e+00
Note 1.77635684e-15 may be an incredibly small number, but it's NOT zero. A test for zero will fail. Therefore the output is wrong.
My response to the duplicate assertion. As you can see by my results the answer at How to use a decimal range() step value? does not produce the same results I'm seeing with a different range. Specifically floats are still being returned and not rounded and 1.77635684e-15 is not equal to zero.
The discussion and duplicate dance around a simple solution:
In [177]: np.arange(Decimal('-2.0'), Decimal('2.0'), Decimal('0.1'))
Out[177]:
array([Decimal('-2.0'), Decimal('-1.9'), Decimal('-1.8'), Decimal('-1.7'),
Decimal('-1.6'), Decimal('-1.5'), Decimal('-1.4'), Decimal('-1.3'),
Decimal('-1.2'), Decimal('-1.1'), Decimal('-1.0'), Decimal('-0.9'),
Decimal('-0.8'), Decimal('-0.7'), Decimal('-0.6'), Decimal('-0.5'),
Decimal('-0.4'), Decimal('-0.3'), Decimal('-0.2'), Decimal('-0.1'),
Decimal('0.0'), Decimal('0.1'), Decimal('0.2'), Decimal('0.3'),
Decimal('0.4'), Decimal('0.5'), Decimal('0.6'), Decimal('0.7'),
Decimal('0.8'), Decimal('0.9'), Decimal('1.0'), Decimal('1.1'),
Decimal('1.2'), Decimal('1.3'), Decimal('1.4'), Decimal('1.5'),
Decimal('1.6'), Decimal('1.7'), Decimal('1.8'), Decimal('1.9')],
dtype=object)
Giving float values to Decimal does not work well:
In [180]: np.arange(Decimal(-2.0), Decimal(2.0), Decimal(0.1))
Out[180]:
array([Decimal('-2'), Decimal('-1.899999999999999994448884877'),
Decimal('-1.799999999999999988897769754'),
Decimal('-1.699999999999999983346654631'),
because Decimal(0.1) just solidifies the floating point imprecision of 0.1:
In [178]: Decimal(0.1)
Out[178]: Decimal('0.1000000000000000055511151231257827021181583404541015625')
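If an object-dtype array is not required, stepping over integers and dividing once avoids the accumulated error, so the middle element is exactly zero. A sketch under the assumption that the step is one tenth; the names x_float and x_decimal are illustrative.

```python
import numpy as np
from decimal import Decimal

# Integer grid -20 .. 19, scaled once: each value is the float nearest to k/10.
x_float = np.arange(-20, 20) / 10
print(x_float[20] == 0.0)  # True: 0 / 10 is exactly zero

# Exact decimals without NumPy: Decimal(int) is exact, and so is dividing by 10.
x_decimal = [Decimal(k) / 10 for k in range(-20, 20)]
print(x_decimal[:3])
```

Both sequences follow the half-open convention of np.arange, so 2.0 itself is excluded; use range(-20, 21) to include it.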
Suggested duplicate: How to use a decimal range() step value?
From numpy docs -
import numpy as np
np.set_printoptions(suppress=True)
will make sure that "always print floating point numbers using fixed point notation, in which case numbers equal to zero in the current precision will print as zero"
In[2]: import numpy as np
In[3]: np.array([1/50000000])
Out[3]: array([2.e-08])
In[4]: np.set_printoptions(suppress=True)
In[5]: np.array([1/50000000])
Out[5]: array([0.00000002])
In[6]: np.set_printoptions(precision=6)
In[7]: np.array([1/50000000])
Out[7]: array([0.])
In[8]: x_values = np.arange(-2,2,0.1)
In[9]: x_values
Out[9]:
array([-2. , -1.9, -1.8, -1.7, -1.6, -1.5, -1.4, -1.3, -1.2, -1.1, -1. ,
-0.9, -0.8, -0.7, -0.6, -0.5, -0.4, -0.3, -0.2, -0.1, 0. , 0.1,
0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. , 1.1, 1.2,
1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9])