I have code that is supposed to return a table of values that solve a particular system of equations using the Backward Euler method.
I wrote code for a timestep of 0.025, and it returns the necessary values. However, when I increase the number of calculations, at some point accuracy is lost because there are not enough decimal places at the end of the values.
It just returns the same numbers over and over.
What can I do to increase accuracy? I have tried working with decimal, but it still returns the same values, some with less accuracy.
This is the code that works:
import numpy
from prettytable import PrettyTable

delta3 = 0.025
nfinal3 = 1 / 0.025
ic3 = numpy.array([[1], [1]])
be3 = PrettyTable(['t', 'x', 'y'])
for l in range(0, int(nfinal3)):
    x3 = ic3[0]
    y3 = ic3[1]
    firstline3 = -199*(x3 + delta3) - 198*(y3 + delta3)
    secondline3 = 99*(x3 + delta3) + 98*(y3 + delta3)
    systems3 = numpy.array([[firstline3],
                            [secondline3]])
    step3 = delta3 * systems3
    result3 = numpy.array([[ic3[0] + step3[0]], [ic3[1] + step3[1]]])
    ic3[0] = result3[0]
    ic3[1] = result3[1]
    be3.add_row([l + 1, result3[0], result3[1]])
print(be3[0])
and this is the code that gives inaccurate numbers:
t4 = 0.01
n4 = 1 / t4
ic4 = numpy.array([[1], [1]])
be4 = PrettyTable(['t', 'x', 'y'])
for q in range(0, int(n4)):
    x4 = ic4[0]
    y4 = ic4[1]
    firstline4 = t4*(-199*(x4 + t4) - 198*(y4 + t4))
    secondline4 = t4*(99*(x4 + t4) + 98*(y4 + t4))
    result4 = numpy.array([[ic4[0] + firstline4], [ic4[1] + secondline4]])
    ic4[0] = result4[0]
    ic4[1] = result4[1]
    be4.add_row([q + 1, result4[0], result4[1]])
print(be4)
I'm relatively new to Python, so I might not understand higher-end concepts, but I would appreciate it if anyone could point out what I'm doing wrong, or suggest a good module or function to use for this.
You could make your numpy arrays store 128-bit floating-point values like so: result4 = np.array([[ic4[0]+firstline4], [ic4[1]+secondline4]], dtype=np.float128). You will have to do this for all the numpy arrays used in the calculation. You can also make your scalars 0-dimensional numpy arrays with 128-bit precision like so: t4 = np.array(0.01, dtype=np.float128). Then you'll need to rewrite your operations using only numpy arrays.
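As a rough sketch of that idea, mirroring the question's second loop (note that np.float128 maps to the platform's long double and isn't available on every platform, e.g. standard Windows builds):

import numpy as np

t4 = np.array(0.01, dtype=np.float128)         # scalar as a 0-d float128 array
ic4 = np.array([1.0, 1.0], dtype=np.float128)  # state kept in extended precision
for q in range(100):                           # 1 / 0.01 steps
    x4, y4 = ic4[0], ic4[1]
    firstline4 = t4 * (-199 * (x4 + t4) - 198 * (y4 + t4))
    secondline4 = t4 * (99 * (x4 + t4) + 98 * (y4 + t4))
    ic4 = np.array([x4 + firstline4, y4 + secondline4], dtype=np.float128)
    print(q + 1, ic4[0], ic4[1])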
Another alternative is to use the decimal module from the standard library: https://docs.python.org/2/library/decimal.html
Welcome to StackOverflow!
For higher accuracy, you could use the decimal module in Python. Wrap your numbers with the Decimal constructor, passing them as strings (e.g. Decimal('0.025')) rather than floats; constructing from a string avoids inheriting the float's binary rounding error. The documentation also describes the arithmetic operations available on Decimal values; prefer those over the usual float operations.
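A minimal sketch of the idea, with illustrative names and precision:

from decimal import Decimal, getcontext

getcontext().prec = 50              # work with 50 significant digits
delta = Decimal('0.025')            # construct from a string, not a float
x, y = Decimal(1), Decimal(1)
x_new = x + delta * (-199*(x + delta) - 198*(y + delta))
print(x_new)                        # exact to the context precision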
In Python, I have a number: U = 0.02462631224438585 +- 3.350971888120506e-06.
How do I round it to the correct number of significant figures, given that the uncertainty should be rounded to 1 s.f.?
Is there an easy way of doing this with numpy? Or scipy, or are the built-in functions the best for this?
I've tried using set_printoptions(precision=3), but this doesn't work.
I've also tried using round(number, significant - len(str(number))), but this seems long-winded.
I'm sure I used a function that did this simply a couple of years ago, without having to create my own.
The final number should be U = 2.4626e-02 +- 3e-06
or U = (2.4626 +- 3e-4)e-02
The uncertainties module has the capability of computing the number of significant digits:
import uncertainties
a = uncertainties.ufloat(0.02462631224438585, 3.350971888120506e-06)
print(a)
# 0.0246263+/-0.0000034
The default is two significant digits; however, there is a format key for controlling the output:
print('{:.1u}, {:.3uf}, {:.2uL}, {:0.2ue}'.format(a,a,a,a))
# 0.024626+/-0.000003, 0.02462631+/-0.00000335, 0.0246263 \pm 0.0000034, (2.46263+/-0.00034)e-02
In Python, at what stage should round be used? Take this example: 10 * math.log(x) + 5. If I want this to be rounded, which should I use?
round(10 * math.log(x) + 5)
round(10 * math.log(x)) + 5
10 * round(math.log(x)) + 5
My guess would be that rounding early would run the fastest because more arithmetic happens with integers, which seem like they should be faster than floats. Rounding seems less likely to break if some later values change.
Would the answer be the same with int()?
Don't prematurely optimize. In many cases, it's not highly optimized mathematical functions which slow down programs, but the logic, structure or data types used in the calculation.
To that end, I recommend you use cProfile to identify bottlenecks. Note that cProfile itself has an overhead, so it is mostly useful for relative comparisons.
As per @glibdud's comment, you have to understand how rounding will affect your calculation. Try a few examples, or run a test to see how the error varies across a large number of inputs.
The earlier you are rounding, the more your result will be affected by this rounding. In my opinion, it all depends on the expectations of your program.
As for the difference between int() and round(), this thread answers it perfectly.
To be more specific about performance: the round() function is a Python built-in implemented in C, so you shouldn't really worry about its cost; it is utterly negligible.
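If you do want to measure, a minimal cProfile run looks something like this (compute_all is just a hypothetical stand-in for your real calculation):

import cProfile
import math

def compute_all():
    # hypothetical workload standing in for the real calculation
    return sum(round(10 * math.log(x) + 5) for x in range(1, 100000))

cProfile.run('compute_all()')   # prints per-function call counts and times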
That entirely depends upon how you want your answer to be formatted and interpreted. I would not get hung up on the speed of the round function, though (unless that very minor performance gain is crucial to your program). Think about what you are trying to accomplish by rounding.

If your goal is to produce an output that is rounded to the nearest integer (for simplicity), then wrap your entire arithmetic statement in the round function. If your goal is to use only rounded integers in your log calculations (maybe because you don't want to use floats), then round only the math.log(x) part. There is no technical reason to prefer one over the other, but there is definitely a logical reason to choose between them.
Please note that Python's math.log() function uses base e by default. Your question doesn't say which base you expect, so I'll assume log base 10, like Google does. To make the code match the mathematical function provided, it would need to be:
import math

# assuming x equals 2
x = 2

function1 = round(10 * math.log(x, 10) + 5)
function2 = round(10 * math.log(x, 10)) + 5
function3 = 10 * round(math.log(x, 10)) + 5
function4 = 10 * math.log(x, 10) + 5

print(function1)
print(function2)
print(function3)
print(function4)
Now, assuming x = 2, the mathematical result is 8.01029995664.
Looking at the printed output from the above code:
8
8
5
8.010299956639813
It clearly shows that functions 1, 2 and 4 are roughly mathematically equivalent, with function 3 being incorrect. This is because round() rounds to the nearest integer (in Python 3, exact halves round to the nearest even integer); math.log(2, 10) is about 0.301, so the round call drops it to zero.
As for the equivalence of int() and round(), the link referenced by IMCoins is pretty good. The summary is that int() truncates the decimal part of a number, while round() goes to the nearest integer, so the two agree for anything below x.5.
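A quick illustration of that difference (Python 3 semantics):

print(int(2.7))     # 2   (truncates toward zero)
print(round(2.7))   # 3   (nearest integer)
print(int(-2.7))    # -2  (still truncates toward zero)
print(round(2.5))   # 2   (exact halves round to the nearest even integer)
print(round(3.5))   # 4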
As for the speed question: if accuracy is non-negotiable, it is best to round upon completion of the calculation, for the same reasons function 3 was wrong above. If you're fairly certain you can round safely at a given step, then I agree with the answer above: use cProfile and find the bottlenecks.
Hope this helps.
I have no clue, but let's see :)
import time
import math

n = 1000000
x = 5

def timeit(f):
    t_0 = time.perf_counter()
    for _ in range(n):
        f()
    t_1 = time.perf_counter()
    print((t_1 - t_0) / n)

def fun1():
    round(10 * math.log(x) + 5)

def fun2():
    round(10 * math.log(x)) + 5

def fun3():
    10 * round(math.log(x)) + 5

[timeit(_) for _ in [fun1, fun2, fun3]]
On my computer the last one is slightly faster than the others.
I'm new to Python/pandas, and I have an issue with decimals that I haven't been able to solve for a few hours. Basically, I want to read a CSV file into pandas and keep the decimals exactly as they are stored in the text, for future comparisons and simple math operations.
Example:
is_string_dtype(report['item_weight_kg'])
Out[12]: True
l = report.loc[report['item'] == 'B0WY']
num1 = l['item_weight_kg'][8210]
num1
Out[14]: '22.000370049504'
Then I try to convert them to float, which gives me a value ending in ...3999 instead of ...4:
report['item_weight_kg'] = report.apply(lambda x: float(x['item_weight_kg']), axis = 1 )
l = report.loc[report['item'] == 'B0WY']
num1 = l['item_weight_kg'][8210]
num1
Out[17]: 22.000370049503999
Right after importing the dataset, I tried converting a single value to float; in the console it works properly and returns the desired value, but when I try to apply it to the whole dataset, it doesn't:
float(decimal.Decimal(l['item_weight_kg'][8210]))
Out[23]: 22.000370049504
report['item_weight_kg'] = report.apply(lambda x: float(decimal.Decimal(x['item_weight_kg'])), axis = 1 )
l = report.loc[report['item'] == 'B0WY']
num1 = l['item_weight_kg'][8210]
num1
Out[25]: 22.000370049503999
How can this be solved?
I have some good and bad news for you.
The bad news is that in python:
0.1 + 0.2 will give you 0.30000000000000004
And 0.1 + 0.2 == 0.3 will give False.
This is not just Python; the phenomenon occurs in a very large number of programming languages. In fact, there is a whole website dedicated to it: https://0.30000000000000004.com/
You can read more about this in the official python docs, here.
The thing is, dealing with floats is tricky, especially when you try to do exact math (i.e. equality) just like your case.
Never expect exact math when dealing with floats!
Instead, when you check two floats for equality, you check whether they are very close to each other.
Python 3.5+ provides this functionality as math.isclose (see here), or you can implement it yourself.
A simple float equality comparison goes like this:
epsilon = 0.0000001  # the smallest acceptable precision error

def float_equals(a, b):
    return abs(a - b) <= epsilon
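For reference, the standard-library helper mentioned above is math.isclose (Python 3.5+); a couple of illustrative calls:

import math

print(math.isclose(0.1 + 0.2, 0.3))                  # True (default relative tolerance 1e-09)
print(math.isclose(0.1 + 0.2, 0.3, abs_tol=1e-12))   # an absolute tolerance helps near zero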
But what if we want more precision than standard Python offers?
In that case, you can use an arbitrary-precision library like mpmath. That's the good news (maybe, idk).
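A tiny sketch of what mpmath gives you (50 decimal digits of working precision):

from mpmath import mp, mpf

mp.dps = 50                      # 50 significant decimal digits
print(mpf('0.1') + mpf('0.2'))   # 0.3 to the working precision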
Normally I'd use print formatting for strings or the round function.
https://docs.python.org/3/library/functions.html?highlight=round#round
Because you are using decimal, you might meet your requirements by altering the precision:
https://docs.python.org/3/library/decimal.html?highlight=round
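For instance, a minimal sketch of altering decimal's context precision:

from decimal import Decimal, getcontext

getcontext().prec = 28          # the default; raise it for more digits
print(Decimal(1) / Decimal(7))  # 0.1428571428571428571428571429
getcontext().prec = 6
print(Decimal(1) / Decimal(7))  # 0.142857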
I just recently ran into a problem where I needed to append numbers to a list only if they weren't in the list already, and then I had to run those numbers through a comparison later on. The problem arises in floating point arithmetic errors. To illustrate what is basically happening in my code:
_list = [5.333333333333333, 6.666666666666667, ...]
number = some_calculation()
if number not in _list:
    _list.append(number)  # note that I can't use a set to remove
                          # duplicates because order needs to be maintained

new_list = []
for num in _list:
    if some_comparison(num):  # note that I can't combine 'some_comparison' with the
        new_list.append(num)  # above check to see if the item is already in the list
The problem is that some_calculation() would sometimes generate an inexact number, such as 5.333333333333332, which is, as far as my calculations need to go, the same as the first element in _list in this example. The solution I had in mind was to simply round all the numbers generated to 9 or so decimal places. This worked for a short amount of time, until I realized that some_comparison compares num against, again, an inexact calculation. Even if I didn't round the numbers in _list, some_comparison would still return an inexact value and thus would evaluate to False.
I am absolutely puzzled. I've never had to worry about floating point errors so this problem is quite infuriating. Does anyone have any ideas for solutions?
NOTE: I would post the actual code, but it's very convoluted and requires 7 or 8 different functions and classes I made specifically for this purpose, and reposting them here would be a hassle.
Make the comparison something like
if(abs(a-b) <= 1e-6 * (a + b)):
This is standard practice when using floating point. The real value you use (instead of 1e-6) depends on the magnitude of the numbers you use and your definition of "the same".
EDIT: I added *(a+b) to give some robustness for values of different magnitudes, and changed the comparison to <= rather than < to cover the case where a == b == 0.0.
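Applied to the numbers from the question, a sketch of such a helper (using abs() on the scale term to be safe with negative values) might look like:

def roughly_equal(a, b, rel=1e-6):
    # relative tolerance scaled by the magnitude of the operands
    return abs(a - b) <= rel * (abs(a) + abs(b))

print(roughly_equal(5.333333333333332, 5.333333333333333))  # True
print(roughly_equal(5.33, 5.34))                            # False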
You can subclass list and add in a tolerance to __contains__:
class ListOFloats(list):
    def __contains__(self, f):
        # If you want a different tolerance, set it like so:
        #   l = ListOFloats([seq])
        #   l.tol = tolerance_you_want
        tol = getattr(self, 'tol', 1e-12)
        return any(abs(e - f) <= 0.5 * tol * (e + f) for e in self)

_list = ListOFloats([5.333333333333333, 6.666666666666667])

print(5.333333333333333 in _list)
# True
print(6.66666666666666 in _list)
# True
print(6.66666666666 in _list)
# False
Use round on both the values in the list and the comparison values. They won't be exact, but they'll be consistent, so a search will return the expected results.
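A sketch of that approach, rounding to 9 decimal places as the question suggests (canon is just an illustrative helper name):

def canon(x, places=9):
    # one canonical rounded form, used everywhere
    return round(x, places)

_list = [canon(5.333333333333333)]
number = 5.333333333333332          # inexact result of some calculation
if canon(number) not in _list:
    _list.append(canon(number))     # not appended: both round to 5.333333333
print(_list)                        # [5.333333333]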
I'm trying to compute this:
from scipy import *
3600**3400 * (exp(-3600)) / factorial(3400)
This raises the error: unsupported long and float.
Try using logarithms instead of working with the numbers directly. Since none of your operations are addition or subtraction, you could do the whole thing in logarithm form and convert back at the end.
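For example, a sketch using math.lgamma for the log of the factorial (lgamma(n + 1) equals log(n!)):

import math

# log of 3600**3400 * exp(-3600) / 3400!
log_result = 3400 * math.log(3600) - 3600 - math.lgamma(3401)
print(math.exp(log_result))   # roughly 2.3793e-05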
Computing with numbers of such magnitude, you just can't use ordinary 64-bit-or-so floats, which is what Python's core runtime supports. Consider gmpy (do not get the sourceforge version, it's aeons out of date) -- with that, math, and some care...:
>>> e = gmpy.mpf(math.exp(1))
>>> gmpy.mpz(3600)**3400 * (e**(-3600)) / gmpy.fac(3400)
mpf('2.37929475533825366213e-5')
(I'm biased about gmpy, of course, since I originated and still participate in that project, but I'd never make strong claims about its floating point abilities... I've been using it mostly for integer stuff... still, it does make this computation possible!-).
You could try using the Decimal object. Calculations will be slower but you won't have trouble with really small numbers.
from decimal import Decimal
I don't know how Decimal interacts with the scipy module, however.
This numpy discussion might be relevant.
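For what it's worth, a sketch of that route: math.factorial gives the exact integer, and Decimal copes with the huge exponents involved:

import math
from decimal import Decimal, getcontext

getcontext().prec = 25
result = (Decimal(3600) ** 3400
          * Decimal(-3600).exp()
          / Decimal(math.factorial(3400)))
print(result)   # roughly 2.379294755E-5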
Well the error is coming about because you are trying to multiply
3600**3400
which is a long with
exp(-3600)
which is a float.
But regardless, the error you are receiving disguises the true problem: 3600**3400 is far too large a number to fit in a float, and exp(-3600) underflows to zero. The Python math library is fickle with large numbers, at best.
exp(-3600) is too small, and factorial(3400) is too large:
In [1]: from scipy import exp
In [2]: exp(-3600)
Out[2]: 0.0
In [3]: from scipy import factorial
In [4]: factorial(3400)
Out[4]: array(1.#INF)
What about calculating it step by step as a workaround (it also makes sense to check the smallest and biggest intermediate results):
from math import exp
from itertools import izip

output = 1
smallest = 1e100
biggest = 0
for i, j in izip(xrange(1, 1701), xrange(3400, 1699, -1)):
    output = output * 3600 * exp(-3600.0 / 3400) / i
    output = output * 3600 * exp(-3600.0 / 3400) / j
    smallest = min(smallest, output)
    biggest = max(biggest, output)

print "output: ", output
print "smallest: ", smallest
print "biggest: ", biggest
The output is:
output: 2.37929475534e-005
smallest: 2.37929475534e-005
biggest: 1.28724174494e+214