Avoiding small numerical errors when using IPython

I have been switching from Matlab to IPython.
In IPython, multiplying 3.1 by 2.1 gives the following result:
In [297]:
3.1 * 2.1
Out[297]:
6.510000000000001
There is a small round-off error. It is not a big problem, but it is a little annoying. I assume it comes from converting decimal numbers to binary and back; is that right?
However, with a NumPy array, the result looks correct:
>>> np.array([3.1 * 2.1])
array([ 6.51])
At the Matlab command prompt, the result also looks correct:
>> 3.1 * 2.1
ans =
6.5100
The round-off error above looks annoying. Is there a way to avoid this error in the Python interactive mode or in IPython?

The numpy result is no more precise than the pure Python one - the floating point imprecision is just hidden from you because, by default, numpy prints fewer decimal places of the result:
In [1]: float(np.array([3.1 * 2.1]))
Out[1]: 6.510000000000001
You can control how numpy displays floating point numbers using np.set_printoptions. For example, to print 16 decimal places rather than the usual 8:
In [2]: np.set_printoptions(precision=16)
In [3]: np.array([3.1 * 2.1])
Out[3]: array([ 6.5100000000000007])
In IPython you can also use the %precision magic to control the number of decimal places that are displayed when pretty-printing normal Python floats:
In [4]: %precision 8
Out[4]: u'%.8f'
In [5]: 3.1 * 2.1
Out[5]: 6.51000000
Note that this is purely cosmetic - the value of 3.1 * 2.1 will still be equal to 6.5100000000000006750155990... rather than 6.51.
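If you actually need exact decimal arithmetic rather than just a prettier display, the standard-library decimal module is one way to avoid the error entirely (a minimal sketch; note that Decimal arithmetic is slower than float):
from decimal import Decimal

# Building Decimals from strings keeps the inputs exact decimal values
x = Decimal("3.1") * Decimal("2.1")
print(x)  # 6.51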

In Octave, a MATLAB clone, I can display those distant decimals:
octave:12> printf("%24.20f\n", 3.1*2.1)
6.51000000000000067502
They are also present in your numpy array:
In [6]: np.array([3.1*2.1]).item()
Out[6]: 6.510000000000001
Even the individual factors involve this sort of rounding:
octave:13> printf("%24.20f\n", 3.1)
3.10000000000000008882
octave:14> printf("%24.20f\n", 2.1)
2.10000000000000008882

Related

Runtime error for matplotlib using np.arange [duplicate]

I am working with Python 3.6.
I am really confused; why did this happen?
In [1]: import numpy as np
In [2]: a = np.array(-1)
In [3]: a
Out[3]: array(-1)
In [4]: a ** (1/3)
/Users/wonderful/anaconda/bin/ipython:1: RuntimeWarning: invalid value encountered in power
#!/Users/wonderful/anaconda/bin/python
Out[4]: nan
Numpy does not allow fractional powers of negative numbers in a real (float) dtype, even if the power would not result in a complex number. (I actually had this same problem earlier today, unrelatedly.) One workaround is to use
np.sign(a) * np.abs(a) ** (1 / 3)
Alternatively, change the dtype to complex numbers:
a = np.array(-1, dtype=complex)  # np.complex is deprecated; use the builtin complex
The problem arises because you are taking roots of negative numbers.
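A quick sketch of these options side by side (np.cbrt, available since NumPy 1.10, computes the real cube root directly):
import numpy as np

a = np.array(-1.0)
print(np.sign(a) * np.abs(a) ** (1 / 3))       # -1.0, sign/abs workaround
print(np.cbrt(a))                              # -1.0, dedicated real cube root
print(np.array(-1, dtype=complex) ** (1 / 3))  # ~(0.5+0.866j), principal complex root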

How does pandas / numpy round() work when decimal>=1?

I have a pandas series
In [1]: import pandas as pd
In [2]: s = pd.Series([1.3, 2.6, 1.24, 1.27, 1.45])
and I need to round the numbers.
In [4]: s.round(1)
Out[4]:
0 1.3
1 2.6
2 1.2
3 1.3
4 1.4
dtype: float64
It works for 1.27; however, 1.45 is rounded to 1.4. Is this due to the precision loss of the float type? If it is, how can I deal with this problem?
This isn't a bug; it's because most decimal numbers cannot be represented exactly as a float.
https://www.programiz.com/python-programming/methods/built-in/round
Another way of rounding is:
int(number * 10**precision + 0.5) / 10**precision
However, you might run into similar problems, because who knows whether the stored value of 1.45 is closer to 1.4499999... or to 1.4500...1.
In general, round() can surprise you because floats are imprecise representations of decimal values.
In this particular case, though, 1.45 is actually stored as the double 1.4499999999999999556..., so it rounds down to 1.4 regardless of the convention. When a stored value is an exact tie, round() uses the "round half to even" convention, rounding half of the ties down in order to balance out the rounding error. From the documentation:
round(x[, n])
x rounded to n digits, rounding half to even. If n is omitted, it defaults to 0.
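If you want conventional half-up rounding at a fixed number of decimal places, the standard-library decimal module is one way to get it (a minimal sketch; converting through str() first avoids inheriting the float representation error):
from decimal import Decimal, ROUND_HALF_UP

for x in [1.3, 2.6, 1.24, 1.27, 1.45]:
    print(Decimal(str(x)).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP))
# 1.3, 2.6, 1.2, 1.3, 1.5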

Multiplication of floating point numbers gives different results in Numpy and R

I am doing data analysis in Python (Numpy) and R. My data is an array of shape 795067 x 3, and computing the mean, median, standard deviation, and IQR of this data yields different results depending on whether I use Numpy or R. I cross-checked the values and it looks like R gives the "correct" value.
Median:
Numpy: 14.948499999999999
R: 14.9632
Mean:
Numpy: 13.097945407088607
R: 13.10936
Standard Deviation:
Numpy: 7.3927612774052083
R: 7.390328
IQR:
Numpy: 12.358700000000002
R: 12.3468
Max and min of the data are the same on both platforms. I ran a quick test to better understand what is going on here.
Multiplying 1.2*1.2 in Numpy gives 1.44 (same with R).
Multiplying 1.22*1.22 gives 1.4884 in Numpy and the same with R.
However, multiplying 1.222*1.222 in Numpy gives 1.4932839999999998, which is clearly wrong! Doing the multiplication in R gives the correct answer of 1.493284.
Multiplying 1.2222*1.2222 in Numpy gives 1.4937728399999999 and 1.493773 in R. Once more, R is correct.
In Numpy, the numbers are float64 datatype and they are double in R. What is going on here? Why are Numpy and R giving different results? I know R uses IEEE754 double-precision but I don't know what precision Numpy uses. How can I change Numpy to give me the "correct" answer?
Python
The print statement/function in Python 2 displays floats with a reduced number of significant digits; the calculations themselves are still done in the precision specified. Python/numpy uses double-precision floats by default (at least on my 64-bit machine):
import numpy
single = numpy.float32(1.222) * numpy.float32(1.222)
double = numpy.float64(1.222) * numpy.float64(1.222)
pyfloat = 1.222 * 1.222
print single, double, pyfloat
# 1.49328 1.493284 1.493284
print "%.16f, %.16f, %.16f"%(single, double, pyfloat)
# 1.4932839870452881, 1.4932839999999998, 1.4932839999999998
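The snippet above uses Python 2 print statements; a Python 3 version of the formatted comparison, reusing the same variables, would be:
print(f"{single:.16f}, {double:.16f}, {pyfloat:.16f}")
# 1.4932839870452881, 1.4932839999999998, 1.4932839999999998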
In an interactive Python/IPython shell, the result of a statement is displayed via repr, which shows the full double-precision value:
>>> 1.222 * 1.222
1.4932839999999998
In [1]: 1.222 * 1.222
Out[1]: 1.4932839999999998
R
It looks like R is doing the same as Python when using print and sprintf:
print(1.222 * 1.222)
# 1.493284
sprintf("%.16f", 1.222 * 1.222)
# "1.4932839999999998"
In contrast to interactive Python shells, the interactive R shell prints results with a reduced number of significant digits (seven by default):
> 1.222 * 1.222
[1] 1.493284
Differences between Python and R
The differences in your results could result from using single-precision values in numpy. Calculations with a lot of additions/subtractions will ultimately make the problem surface:
In [1]: import numpy
In [2]: a = numpy.float32(1.222)
In [3]: a*6
Out[3]: 7.3320000171661377
In [4]: a+a+a+a+a+a
Out[4]: 7.3320003
As suggested in the comments to your actual question, make sure to use double-precision floats in your numpy calculations.
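A minimal sketch of that fix (data here is a hypothetical stand-in for your 795067 x 3 array):
import numpy as np

data = np.random.rand(795067, 3).astype(np.float32)  # hypothetical stand-in data
print(data.dtype)                  # float32: statistics can drift from R's doubles

data64 = data.astype(np.float64)   # promote to double before computing statistics
print(data64.mean(), np.median(data64), data64.std(ddof=1))  # ddof=1 matches R's sd()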

Why np.array([1e5])**2 is different from np.array([100000])**2 in Python?

Can someone please explain why np.array([1e5])**2 is not equivalent to np.array([100000])**2? Coming from Matlab, I find this confusing!
>>> np.array([1e5])**2
array([ 1.00000000e+10]) # correct
>>> np.array([100000])**2
array([1410065408]) # Why??
I found that this behaviour starts at 1e5, as the code below gives the right result:
>>> np.array([1e4])**2
array([ 1.00000000e+08]) # correct
>>> np.array([10000])**2
array([100000000]) # and still correct
1e5 is a floating point number, but 100000 is an integer:
In [1]: import numpy as np
In [2]: np.array([1e5]).dtype
Out[2]: dtype('float64')
In [3]: np.array([100000]).dtype
Out[3]: dtype('int64')
But in numpy, integers have a fixed width (as opposed to Python itself, where they are arbitrary-precision), so they "roll over" when they exceed the maximum representable value.
(Note that in your case you are using a 32-bit build, so in fact the latter would give you dtype('int32'), which has a maximum value of 2**31-1 = 2,147,483,647, roughly 2e9, which is less than 1e10.)
Your system is defaulting to np.int32, which can't hold 100000**2. If you use 64-bit integers, you'll be fine:
In [6]: np.array([100000], dtype=np.int32)**2
Out[6]: array([1410065408], dtype=int32)
In [7]: np.array([100000], dtype=np.int64)**2
Out[7]: array([10000000000])
What the default is (32 vs 64) depends on your numpy build.
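You can check the limits of an integer dtype with np.iinfo and promote explicitly when a result might overflow (a short sketch):
import numpy as np

print(np.iinfo(np.int32).max)      # 2147483647, far below 100000**2 == 10**10
print(np.iinfo(np.int64).max)      # 9223372036854775807

a = np.array([100000])
print(a.astype(np.int64) ** 2)     # array([10000000000])
print(a.astype(np.float64) ** 2)   # array([1.e+10]); display varies by numpy version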

Python math module logarithm functions [duplicate]

Possible Duplicate:
Inaccurate Logarithm in Python
Why are the math.log10(x) and math.log(x,10) results different?
In [1]: from math import *
In [2]: log10(1000)
Out[2]: 3.0
In [3]: log(1000,10)
Out[3]: 2.9999999999999996
It's a known bug: http://bugs.python.org/issue3724
It seems logX(y) is always more precise than the equivalent log(y, X).
math.log10 and math.log(x, 10) use different algorithms, and the former is usually more accurate. This is actually a known issue (Issue6765): math.log, log10 inconsistency.
One may think of it this way: log10(x) has a fixed base, so it can be computed directly by some mathematical approximation formula (e.g. a Taylor series), while log(x, 10) comes from a more general two-argument formula and may be calculated indirectly as log(x) / log(10) (at the very least, the precision of log(10) affects the precision of the quotient). So it is natural that the former is both faster and more accurate, since it takes advantage of the pre-known logarithmic base (i.e. 10).
As others have pointed out, log(1000, 10) is computed internally as log(1000) / log(10). This can be verified empirically:
In [3]: math.log(1000, 10) == math.log(1000) / math.log(10)
Out[3]: True
In [4]: math.log10(1000) == math.log(1000) / math.log(10)
Out[4]: False
Neither log(1000) nor log(10) can be represented exactly as a float, so the final result is inexact as well.
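Printing the intermediate values makes this concrete:
import math

print(repr(math.log(1000)))                  # 6.907755278982137
print(repr(math.log(10)))                    # 2.302585092994046
print(repr(math.log(1000) / math.log(10)))   # 2.9999999999999996
print(repr(math.log10(1000)))                # 3.0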
