Pandas subtraction behavior having precision issues (even after casting) [duplicate] - python

This question already has answers here:
Is floating point math broken?
(31 answers)
Closed 2 years ago.
Maybe this was answered before, but I'm trying to understand the best way to handle subtraction in Pandas.
import pandas as pd
import random
import numpy as np
random.seed(42)
data = {'r': [float(random.random()) for i in range(5)]}
for i in range(5):
    data['r'].append(float(0.7))
df = pd.DataFrame(data)
If I run the following, I get the expected results:
print(np.sum(df['r'] >= 0.7))
6
However, if I slightly modify the condition, I don't get the expected result:
print(np.sum(df['r']-0.5 >= 0.2))
1
The same happens if I try to fix it by casting to float or np.float64 (and combinations of the two), like the following:
print(np.sum(df['r'].astype(np.float64)-np.float64(0.5) >= np.float64(0.2)))
1
For sure I'm not doing the casting properly, but any help on this would be more than welcome!

You're not doing anything improperly. This is a totally straightforward floating point error. It will always happen.
>>> 0.7 >= 0.7
True
>>> (0.7 - 0.5) >= 0.2
False
You have to remember that floating point numbers are represented in binary, so they can only represent sums of powers of 2 with perfect precision. Anything that can't be represented finitely as a sum of powers of two will be subject to error like this.
You can see why by forcing Python to display the full-precision value associated with the literal 0.7:
format(0.7, '.60g')
'0.6999999999999999555910790149937383830547332763671875'
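
To make the "sums of powers of 2" point concrete, the standard library's fractions module can recover the exact binary rational a float stores (a quick illustration):

from fractions import Fraction

# Fraction(float) gives back the exact value the float holds
print(Fraction(0.7))
# 3152519739159347/4503599627370496  (the denominator is 2**52)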

To add to @senderle's answer: since this is a floating point issue, you can work around it by comparing against a slightly lower threshold:
((df['r'] - 0.5) >= 0.19).sum()
On a slightly different note, I'm not sure why you use np.sum when you could just use pandas' .sum; it seems like an unnecessary import.
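
If the intent is "greater than or approximately equal", a sketch using NumPy's np.isclose (with its default tolerances) avoids hand-picking a lower threshold like 0.19:

# strict comparison, plus a tolerance check for values that land just below 0.2
mask = (df['r'] - 0.5 > 0.2) | np.isclose(df['r'] - 0.5, 0.2)
print(mask.sum())
# 6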

Related

How to fix the floating point error in python [duplicate]

This question already has answers here:
Is floating point math broken?
(31 answers)
Closed 1 year ago.
I have recently been trying to make a calculator in Python and added floating-point support, but unfortunately this simple code
print(0.1 + 0.2)
outputs
0.30000000000000004
I have searched a lot on Stack Overflow, but I keep finding questions about why it happens and not how to fix it.
Edit:
A lot of people have been sending me private feedback that this question already exists. I appreciate that as a way to improve Stack Overflow, but most of those posts explain why it happens rather than how to fix it, and some of the suggested questions are not even about Python.
You can try any of the following methods, whichever is convenient for you:
# for two decimal places use .2f, for three use .3f
val = 0.1 + 0.2
print(f"val = {val:.2f}")
# output | val = 0.30

# or else
print(round(val, 2))
# output | 0.3
You can use the decimal built-in module with strings and make your own methods:

from decimal import Decimal

def exact_add(*nbs):
    return float(sum(Decimal(str(nb)) for nb in nbs))

exact_add(0.1, 0.2)
# > 0.3
I think the recommended way to resolve this is to determine a number of decimal places you want to consider and round to that using the built-in round function. Say you want to use 5 decimal places, you could do:
ans = 0.1 + 0.2
print(ans)  # 0.30000000000000004
ans = round(ans, 5)  # round returns a new value; it does not modify ans in place
print(ans)  # 0.3
Note that round also drops trailing zeros. If you round 0.333333333333 to 5 decimals it returns 0.33333, but rounding 0.30000000004 to 5 decimals returns 0.3.
This has nothing to do with python, but with the way floating point values are represented on your machine. The way floating point works is the same for all languages, and it can potentially introduce rounding errors. This link shows you exactly why you get this error.
Just reduce the number of visible digits and you'll be fine. Just note that all computers and calculators have limited space for numbers, and all of them are prone to rounding errors.
You can use:
a = 0.1 + 0.2
a = round(a, 2)
print(a)
Output: 0.3
Use round() to get rid of the excessive digits:

a = 0.1
b = 0.2
c = a + b
answer = round(c, 3)  # here 3 is the number of decimal places to round to
print(answer)

And in case you want more clarification on round(), just visit this link: https://www.bitdegree.org/learn/python-round
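
If you want the arithmetic itself to be exact rather than just the displayed value, another option is the built-in fractions module, which keeps values as exact rationals (a small sketch):

from fractions import Fraction

a = Fraction('0.1') + Fraction('0.2')  # exact rational arithmetic from decimal strings
print(a)         # 3/10
print(float(a))  # 0.3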

Cannot increase accuracy of calculation (not enough decimal spaces) [duplicate]

This question already has answers here:
Is floating point arbitrary precision available?
(5 answers)
Closed 4 years ago.
I have code that is supposed to return a table with values that solve a particular system of equations using Backwards Euler Method.
I wrote code for a timestep of 0.025, and it returned the necessary values; however, when I increase the number of calculations, at some point accuracy is lost because the values run out of decimal places at the end.
It just returns the same numbers over and over.
What can I do to increase accuracy? I have tried working with decimal, but it still returns the same values, some with less accuracy.
This is the code that works:
import numpy
from prettytable import PrettyTable

delta3 = float(0.025)
nfinal3 = 1/0.025
ic3 = numpy.array([[1], [1]])
be3 = PrettyTable(['t', 'x', 'y'])
for l in range(0, int(nfinal3)):
    x3 = ic3[0]
    y3 = ic3[1]
    firstline3 = -199*(x3+delta3) - 198*(y3+delta3)
    secondline3 = 99*(x3+delta3) + 98*(y3+delta3)
    systems3 = numpy.array([[firstline3],
                            [secondline3]])
    step3 = delta3*systems3
    result3 = numpy.array([[ic3[0] + step3[0]], [ic3[1] + step3[1]]])
    ic3[0] = result3[0]
    ic3[1] = result3[1]
    be3.add_row([l+1, result3[0], result3[1]])
print be3[0]
and this is the code that gives inaccurate numbers
t4 = 0.01
n4 = 1/t4
ic4 = numpy.array([[1], [1]])
be4 = PrettyTable(['t', 'x', 'y'])
for q in range(0, int(n4)):
    x4 = ic4[0]
    y4 = ic4[1]
    firstline4 = t4*(-199*(x4+t4) - 198*(y4+t4))
    secondline4 = t4*(99*(x4+t4) + 98*(y4+t4))
    result4 = numpy.array([[ic4[0] + firstline4], [ic4[1] + secondline4]])
    ic4[0] = result4[0]
    ic4[1] = result4[1]
    be4.add_row([q+1, result4[0], result4[1]])
print be4
I'm relatively new to Python, so I might not understand higher-end concepts, but I would appreciate if anyone could point out what I'm doing wrong or what is a good module or function to use for this.
You could make your numpy arrays store 128-bit floating point values like so: result4 = np.array([[ic4[0]+firstline4], [ic4[1]+secondline4]], dtype=np.float128). You will have to do this for all the numpy arrays used in the calculation. You can also make your scalars 0-dimensional NumPy arrays with 128-bit precision, like so: t4 = np.array(0.01, dtype=np.float128). Then you'll need to rewrite your operations using only NumPy arrays.
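A minimal sketch of that suggestion (assuming a platform where np.float128 exists; it is unavailable on Windows builds of NumPy, and on x86 it is typically 80-bit extended precision rather than true quad precision):

import numpy as np

t4 = np.array(0.01, dtype=np.float128)         # 0-d extended-precision scalar
ic4 = np.array([[1], [1]], dtype=np.float128)  # extended-precision array
print((ic4 * t4).dtype)                        # float128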
Another alternative is to use the decimal library from https://docs.python.org/2/library/decimal.html
Welcome to StackOverflow!
For higher accuracy, you can use the decimal module in Python. Wrap your numbers with the Decimal constructor, preferably passing a string (e.g. Decimal('0.025') rather than Decimal(0.025), since a float literal carries its binary error with it). The documentation also lists operation functions you can explore; prefer them to the usual float operations.
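
For instance, a minimal sketch of the decimal approach (the precision of 50 digits is an arbitrary choice for illustration):

from decimal import Decimal, getcontext

getcontext().prec = 50    # 50 significant digits, chosen arbitrarily
delta = Decimal('0.025')  # a string literal avoids importing the float's binary error
x = Decimal('1')
x = x + delta * x         # arithmetic is carried out to the context precision
print(x)                  # 1.025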

Python 2 decimal division unexpected results [duplicate]

This question already has answers here:
How can I force division to be floating point? Division keeps rounding down to 0?
(11 answers)
Closed 6 years ago.
I am trying to compute the sum of 1/(i**4) for i from 1 to 10
using the Python decimal module, with the following code:
from decimal import *
getcontext().prec = 9
sum = Decimal(0)
for i in range(1, 11):
    sum += Decimal(1/(i**4))
print sum
however, this outputs 1, not a very small fraction like I would expect. I can't find much information here https://docs.python.org/2/library/decimal.html about what is wrong with the code. My guess is sum is not being used as a Decimal in the loop, but I am unsure how to resolve that.
If you use Python 2.x, then in the expression 1/(i**4) integer division is used: for i=1 it equals 1, and for every i>1 it yields 0.
Just make the numerator a float, 1./(i**4), and this fixes the problem.
PS In Python 3.x your code should work as expected, because the / operator performs true (floating-point) division, while // performs integer (floor) division.
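A side-by-side illustration (runnable under Python 3; the comments note the Python 2 behaviour):

print(1 / 2)   # 0.5 in Python 3 (true division); 0 in Python 2 (integer division)
print(1. / 2)  # 0.5 in both: one float operand forces float division
print(1 // 2)  # 0 in both: floor division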
First of all, don't use sum as a variable name, as it shadows a built-in.
And it's necessary to provide at least one float in the arithmetic if you expect a float-type answer, here:
s = Decimal(0)
for i in range(1, 11):
    s += Decimal(1./(i**4))  # dividing 1. or 1.0 instead of just 1
print s
this gives:
1.08203658
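Alternatively, staying inside Decimal for the whole computation avoids the float round-trip entirely (a sketch):

from decimal import Decimal, getcontext

getcontext().prec = 9
s = Decimal(0)
for i in range(1, 11):
    s += Decimal(1) / Decimal(i)**4  # exact Decimal division, rounded to 9 digits
print(s)
# 1.08203658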

Lists in Python last element [duplicate]

This question already has answers here:
Is floating point math broken?
(31 answers)
Closed 6 years ago.
I am new to python and I tried this:
import numpy as np
x = np.arange(0.7,1.3,0.1)
print (x)
y = np.arange(0.6,1.3,0.1)
print (y)
The output was [ 0.7 0.8 0.9 1. 1.1 1.2 1.3] and [ 0.6 0.7 0.8 0.9 1. 1.1 1.2]. Why does 1.3 appear in the list in the first case but not in the second?
This is due to rounding errors. If you actually print the last element of x at its full precision, you'll see that it is smaller than 1.3:
>>> import numpy as np
>>> x = np.arange(0.7,1.3,0.1)
>>> 1.3 > x[-1]
True
>>> x[-1]
1.2999999999999998
Note, as stated in the documentation:
"When using a non-integer step, such as 0.1, the results will often not be consistent. It is better to use linspace for these cases."
arange is not suitable for floating point numbers:
"When using a non-integer step, such as 0.1, the results will often not be consistent. It is better to use linspace for these cases."
I'm not familiar with the internals of numpy, but my guess is that this is a side effect of floating point numbers not being exact (meaning that they can't exactly represent some values).
See the numpy.arange documentation, specifically: "When using a non-integer step, such as 0.1, the results will often not be consistent. It is better to use linspace for these cases."

numpy float32 truncating decimal [duplicate]

This question already has answers here:
How to set the precision on str(numpy.float64)?
(5 answers)
Closed 7 years ago.
I'm working on a school project that requires me to do some math on single-precision floating point numbers. I thought I would use NumPy's float32 format, as Python is really the only general-purpose language I know. In my opinion this format should be able to handle the number 1.0000001, but it keeps truncating my answer to 1.0. The closest value I can get it to handle is 1.00001. Can anyone shed some light on this? I'm new to this floating point format and to Python.
import numpy as np

keyInput = np.float32(input("Enter a number and i'll float 32 it: "))
print(keyInput)
print(np.float32(keyInput))
print("This is of type: ", type(keyInput))
input('Press ENTER to exit')
First of all, print without explicit formatting or conversion is not reliable. You should try something like print "%.10f" % number instead of print number.
Second, as commenters have pointed out, you can't expect every decimal number to be represented precisely as a floating point number. Read the Goldberg paper, "What Every Computer Scientist Should Know About Floating-Point Arithmetic"; it's a must-read.
An example ipython session for you (I'm using Python 2.7; if you use Python 3, print is a function):
In [1]: import numpy
In [2]: print numpy.float32(1.0 + 1e-7)
1.0
In [3]: print "%.10f" % numpy.float32(1.0 + 1e-7)
1.0000001192
In [4]: print "%.10f" % numpy.float32(1.0 + 1e-8)
1.0000000000
Edit: you can use numpy to inspect type precision limits. Consult the doc of numpy.MachAr for more.
Example:
In [1]: import numpy
In [2]: machar = numpy.MachAr(float_conv=numpy.float32)
In [3]: machar.eps
Out[3]: 1.1920928955078125e-07
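
In current NumPy releases the same limits are exposed through np.finfo (numpy.MachAr has since been deprecated); a quick check:

import numpy as np

print(np.finfo(np.float32).eps)  # 1.1920929e-07
print(np.finfo(np.float64).eps)  # 2.220446049250313e-16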
