Precision issues when reverting normalization operation - python

I'm doing a normalization operation and, to my surprise, when I try to revert it I get a 100% mismatch at the default 6-decimal precision of assert_array_almost_equal. Why is this happening? Could it be due to the precision of my maximum value? If so, how can I get more precision from numpy.ndarray.max()?
import numpy
_max = numpy.float128(67.1036)  # output of numpy.ndarray.max() on a float32 array
def divide_and_mult(x, y):
    return numpy.divide(numpy.float128(x), y) * y
for i in range(100):
    try:
        numpy.testing.assert_array_equal(divide_and_mult(i, _max), numpy.float128(i))
    except AssertionError as e:
        print(e)

You can't get more precision with NumPy arrays than float128, and on some systems (e.g. Windows) the best you get is even lower: float64.
Normally you just accept a small loss of precision and use np.testing.assert_almost_equal or similar functions that let you test for a specific absolute and/or relative difference.
If you need much higher precision, use a type with exact or user-defined precision, such as fractions.Fraction or decimal.Decimal, or switch to a symbolic math library like sympy.
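A minimal sketch of both suggestions, assuming float64 instead of float128 so that it runs on platforms without np.float128; `_max` and `values` are stand-ins for the question's data. The float round trip is checked with a relative tolerance via np.testing.assert_allclose, while fractions.Fraction returns exactly what went in:

```python
from fractions import Fraction

import numpy as np

# Hypothetical stand-in for the array maximum in the question.
_max = np.float64(67.1036)
values = np.arange(100, dtype=np.float64)

# Floating point: divide, then multiply back. Each operation rounds,
# so some results may differ from the input in the last bit or two.
roundtrip = (values / _max) * _max
print("exact float matches:", int(np.count_nonzero(roundtrip == values)), "of", values.size)

# A tolerance-based comparison passes: the error is only a few ulps.
np.testing.assert_allclose(roundtrip, values, rtol=1e-12)

# Exact rational arithmetic: Fraction(float) holds the float's exact binary
# value, so dividing and multiplying back returns the input unchanged.
m = Fraction(float(_max))
exact = [Fraction(i) / m * m for i in range(100)]
assert exact == [Fraction(i) for i in range(100)]
```

Fractions are much slower than NumPy arrays, so they suit verification or small inputs rather than bulk normalization.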

Related

Floating point precision in python for a convergent sequence starts oscillating

I'm trying to plot a mathematical expression in Python. I have a sum of functions f_i of the following type:
-x/r^2*exp(-r*x) + 2/r^3*(1 - exp(-r*x)) - x/r^2*exp(-r*x_i)
where x_i takes values between 1/360 and 50, and r is quite small, namely 0.0001. I'm interested in plotting the behavior of these functions (actually I'm interested in plotting sum f_i(x) n_i, for some real n_i) as x converges to zero. I know the exact analytical expression, which I can reproduce. However, for very small x the plot starts oscillating and doesn't seem to converge. I'm wondering whether this has to do with floating-point precision in Python. I'm considering very small x, like 0.0000001.
0.0000001 isn't very small as far as floats are concerned (sys.float_info.min is about 2.2e-308).
Check out these pages:
https://docs.python.org/3/library/stdtypes.html#typesnumeric
https://docs.python.org/3/library/sys.html#sys.float_info
Try casting your intermediate values in the computations to float before doing math.
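One plausible cause of the oscillation (an assumption; the thread does not confirm it) is catastrophic cancellation in 1 - exp(-r*x): for r*x around 1e-11, exp(-r*x) is so close to 1 that the subtraction wipes out most significant digits, and f_i then multiplies the damaged value by 2/r^3 = 2e12. math.expm1 is a standard-library alternative not mentioned in the answer above; this sketch compares the two forms using the question's r = 0.0001 and x = 0.0000001:

```python
import math
import sys

# Machine epsilon for Python floats (IEEE 754 double precision).
print(sys.float_info.epsilon)  # 2.220446049250313e-16

r = 1e-4
x = 1e-7

# Naive form: exp(-r*x) is within about 1e-11 of 1.0, so the subtraction
# cancels the leading digits and loses many of the ~16 significant digits.
naive = 1.0 - math.exp(-r * x)

# math.expm1(t) computes exp(t) - 1 accurately for small t,
# so negating it gives 1 - exp(-r*x) without the cancellation.
stable = -math.expm1(-r * x)

print(naive, stable)
print("relative difference:", abs(naive - stable) / stable)
```

Rewriting the middle term as -2/r^3*expm1(-r*x) is mathematically the same expression but keeps close to full double precision for small x.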

How to increase precision in matplotlib?

I am trying to plot a set of extreme floating-point values that require high precision. It seems to me there are precision limits in matplotlib. It cannot go further than the scale of 1e28.
This is my code for displaying a graph.
import matplotlib.pyplot as plt
import numpy as np
x = np.array([1737100, 38380894.5188064386003616016502, 378029000.0], dtype=np.longdouble)
y = np.array([-76188946654889063420743355676.5, -76188946654889063419450832178.0, -76188946654889063450098993033.0], dtype=np.longdouble)
plt.scatter(x, y)
#coefficients = np.polyfit(x, y, 2)
#poly = np.poly1d(coefficients)
#new_x = np.linspace(x[0], x[-1])
#new_y = poly(new_x)
#plt.plot(new_x, new_y)
plt.xlim([x[0], x[-1]])
plt.title('U vs. r')
plt.xlabel('Distance r')
plt.ylabel('Total gravitational potential energy U(r)')
plt.show()
I am expecting the middle point to be located higher than the other two points. It requires very high precision. How can I configure it?
Your current issue is likely not with matplotlib but with np.longdouble. To find out whether this is the case, run np.finfo(np.longdouble). The result is machine dependent, but on my machine it says I'm using a float128 with the following description:
Machine parameters for float128
---------------------------------------------------------------
precision = 18 resolution = 1.0000000000000000715e-18
machep = -63 eps = 1.084202172485504434e-19
negep = -64 epsneg = 5.42101086242752217e-20
minexp = -16382 tiny = 3.3621031431120935063e-4932
maxexp = 16384 max = 1.189731495357231765e+4932
nexp = 15 min = -max
---------------------------------------------------------------
The precision is just an estimate (due to binary vs. decimal representation), but about 18 significant digits is the limit of this float128, and your y values only start to differ at the 19th significant digit.
An easy test is to print y[1]-y[0] and see if you get something other than 0.0.
An easy solution is to use Python ints, which have arbitrary precision, so you'd capture most of the difference (or use ints of 10*y to keep the .5). Something like this:
import matplotlib.pyplot as plt
import numpy as np
x = np.array([1737100, 38380894.5188064386003616016502, 378029000.0], dtype=np.longdouble)
y = [-76188946654889063420743355676, -76188946654889063419450832178, -76188946654889063450098993033]  # Python ints (the .5 is dropped)
plt.scatter(x, [z - y[0] for z in y])  # plot offsets from the first point
Another solution is to represent the numbers from the start so that they require a more accessible precision (i.e., with most of the offset removed). A third is to use a high-precision float library. It depends on which way you want to go.
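As an illustration of the high-precision route, here is a sketch using Python's standard decimal module (a choice made here, not one named in the answer). Building each Decimal from a string keeps all 30 digits, and subtracting the shared offset leaves small numbers that float64 holds exactly:

```python
from decimal import Decimal

# Decimal keeps every digit of the string it is built from; a float literal
# would already be rounded to about 16 significant digits.
y = [
    Decimal("-76188946654889063420743355676.5"),
    Decimal("-76188946654889063419450832178.0"),
    Decimal("-76188946654889063450098993033.0"),
]

# Subtraction computes the exact result and then rounds it to the context
# precision (28 digits by default); these small differences fit easily.
offsets = [float(v - y[0]) for v in y]
print(offsets)  # [0.0, 1292523498.5, -29355637356.5]
```

The offsets can go straight into plt.scatter(x, offsets); with these values the middle point comes out highest, as the question expected.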
It's also worth noting that, at least on my system (which I think is typical), the default NumPy float type is float64. For float64 the significand (mantissa) has 52 stored bits, whereas for float128 it has 63. In decimal, that is roughly 15 digits versus 18. So going from float64 to np.float128 doesn't gain much precision. (Here's a discussion of why np.longdouble (or np.float128) sounds like it's going to add a lot of precision, but doesn't.)
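To check this on your own machine, a short sketch printing np.finfo for three dtypes; nmant is the number of stored significand bits and precision the approximate number of decimal digits. The longdouble line is platform dependent (on Win64 it matches float64, for example):

```python
import numpy as np

# Compare stored significand bits (nmant) and approximate decimal digits (precision).
for dtype in (np.float32, np.float64, np.longdouble):
    info = np.finfo(dtype)
    name = np.dtype(dtype).name
    print(f"{name:>10}  nmant={info.nmant:3d}  precision={info.precision:2d}  eps={info.eps}")
```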
(Finally, because this may cause confusion for some: if np.longdouble or np.float128 were useful for this problem, it's worth noting that the line in the question that sets the initial array wouldn't give the intended precision of np.longdouble. That is, y=np.array( [-76188946654889063420743355676.5], dtype=np.longdouble) first creates a list of Python floats and then creates the NumPy array from that, but the precision is already lost in the Python list. So if longdouble were the solution, a different approach to initializing the array would be needed.)

poly1d gives erroneous coefficients when they are very large integers

I am working with python 3.5.2 in ubuntu 16.04.2 LTS, and NumPy 1.12.1. When I use poly1d function to get the coeffs, there is a mistake with the computation:
>>> from numpy import poly1d
>>> from math import fabs
>>> pol = poly1d([2357888,459987,78123455],True)
>>> [int(x) for x in pol.coeffs]
[1, -80941330, 221226728585581, -84732529566356586496]
as you see in this list the last element is not correct. When I build the polynomial using Wolfram Alpha, I get:
x^3 - 80941330 x^2 + 221226728585581 x - 84732529566356580480
The last coefficient is different with poly1d (the first ends in ...496 and the other in ...480). I have to assume that the correct one is the latter (computed by Wolfram Alpha).
Is this a bug, or is there something I'm not taking into account? I've tried roots with small absolute values, and in that case the computation is correct. But when I use "big roots", the difference is notable.
As Warren Weckesser said, this is a precision issue. But it can be worked around by declaring the array of roots to be of type object. In this way you can take advantage of Python's big integers, or of higher precision provided by mpmath objects. NumPy is considerate enough not to coerce them to double precision. Example:
import numpy as np
roots = np.array([2357888, 459987, 78123455], dtype=object)
pol = np.poly1d(roots, True)
print(pol.coeffs)
Output: [1 -80941330 221226728585581 -84732529566356580480]
The coefficients are stored as 64-bit floating-point values. A float64 has a 53-bit significand, so not every integer above 2**53 (about 9.0e15) is representable, and -84732529566356580480 gets rounded to a nearby representable value.
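The rounding is easy to reproduce without poly1d. In this sketch, Python ints give the exact product of the roots, and converting that product to float64 lands on exactly the wrong value from the question, since neighbouring doubles at that magnitude are 2**14 = 16384 apart:

```python
# Python ints are arbitrary precision, so the product of the roots is exact.
exact = 2357888 * 459987 * 78123455
print(exact)  # 84732529566356580480

# Beyond 2**53 a float64 cannot hold every integer; between 2**66 and 2**67
# consecutive doubles are 2**14 apart, so conversion rounds to a multiple of 16384.
print(exact > 2**53)  # True
print(int(float(exact)))  # 84732529566356586496
```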

Can lambdify return an array with dtype np.float128?

I am solving a large non-linear system of equations and I need a high degree of numerical precision. I am currently using sympy.lambdify to convert symbolic expressions for the system of equations and its Jacobian into vectorized functions that take ndarrays as inputs and return an ndarray as output.
By default, lambdify returns an array with dtype of numpy.float64. Is it possible to have it return an array with dtype numpy.float128? Perhaps this requires the inputs to have dtype of numpy.float128?
If you need a lot of precision, you can try using SymPy floats, or mpmath directly (which SymPy depends on), which provides arbitrary precision. For example, sympy.Float('2.0', 100) creates a float of 2.0 with 100 digits of precision, and sympy.sin(2).evalf(100) gives 100 digits of sin(2). This will be a lot slower than NumPy because arbitrary precision means it doesn't use machine floats, and it is implemented in pure Python (whereas NumPy is written in C and Fortran).
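Rather than float128, lambdify can target mpmath directly through its modules argument. A minimal sketch, assuming mpmath is installed (it is a SymPy dependency) and that scalar evaluation is acceptable; the expression sin(x)**2 is only an example:

```python
import mpmath
import sympy
from sympy.abc import x

# Work with 50 significant decimal digits in mpmath.
mpmath.mp.dps = 50

expr = sympy.sin(x) ** 2
# modules="mpmath" makes the generated function call mpmath.sin instead of numpy.sin.
f = sympy.lambdify(x, expr, modules="mpmath")

val = f(mpmath.mpf(2))
print(val)
print(sympy.N(expr.subs(x, 2), 50))  # same value from SymPy's own evaluation
```

mpmath.mp.dps is global state, so it affects every later mpmath computation in the process, and the generated function works on scalars rather than whole arrays.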
The output just reflects the input:
from numpy import float128
from sympy.abc import x
from sympy.utilities import lambdify
f = lambdify(x, x ** 2)
result = f(float128(2))
result
#>>> 4.0
type(result)
#>>> <class 'numpy.float128'>

Is there any documentation of numpy numerical stability?

I looked around for some documentation of how numpy/scipy functions behave in terms of numerical stability, e.g. are any means taken to improve numerical stability or are there alternative stable implementations.
I am specifically interested in addition (+ operator) of floating-point arrays, numpy.sum(), numpy.cumsum() and numpy.dot(). In all cases I am essentially summing a very large quantity of floating-point numbers and I am concerned about the accuracy of such calculations.
Does anyone know of any reference to such issues in the numpy/scipy documentation or some other source?
The phrase "stability" refers to an algorithm. If your algorithm is unstable to start with then increasing precision or reducing rounding error of the component steps is not going to gain much.
The more complex numpy routines like "solve" are wrappers for ATLAS/BLAS/LAPACK routines, so you can refer to their documentation. For example, "dgesv" solves a system of real-valued linear equations using an LU decomposition with partial pivoting and row interchanges; the documentation for the underlying LAPACK Fortran code can be seen here: http://www.netlib.org/lapack/explore-html/. However, http://docs.scipy.org/doc/numpy/user/install.html points out that many different implementations of these standard routines are available, and speed optimisation and precision will vary between them.
Your examples don't introduce much rounding. "+" does no unnecessary rounding: its precision depends purely on the rounding implicit in the floating-point datatype when the smaller operand has low-order bits that cannot be represented in the result. Sum and dot depend only on the order of evaluation. Cumsum cannot easily be re-ordered, as it outputs an array.
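Order and grouping matter in practice. The np.sum documentation notes that NumPy often uses partial pairwise summation, whereas np.cumsum has to accumulate left to right. This sketch (test data chosen for illustration) sums 2**20 copies of float32 0.1 both ways and compares with the exact total:

```python
import numpy as np

values = np.full(2**20, 0.1, dtype=np.float32)

# Reference: every element is the same float32 value, so the exact sum is n * v
# (computed in float64, where this product is exact).
exact = values.size * float(values[0])

# cumsum accumulates strictly left to right in float32; its last entry is a
# sequential sum in which every step rounds to float32.
sequential = float(np.cumsum(values)[-1])

# np.sum on a contiguous float array uses partial pairwise summation (still in float32).
pairwise = float(np.sum(values))

print(f"exact      {exact:.4f}")
print(f"sequential {sequential:.4f}  error {abs(sequential - exact):.4f}")
print(f"pairwise   {pairwise:.4f}  error {abs(pairwise - exact):.4f}")
```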
For the cumulative rounding during a "cumsum" or "dot" function you do have choices:
On Linux 64bit numpy provides access to a high precision "long double" type float128 which you could use to reduce loss of precision in intermediate calculations at the cost of performance and memory.
However, on my Win64 install "numpy.longdouble" maps to "numpy.float64", a normal C double, so your code is not cross-platform; check "finfo". (Neither float96 nor float128 with genuinely higher precision exists on Canopy Express Win64.)
log2(finfo(float64).resolution)
> -49.828921423310433
# actually 53 bits of mantissa internally, ~16 significant decimal figures
log2(finfo(float32).resolution)
> -19.931568 # ~ only 7 meaningful digits
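Since sum() and dot() reduce the array to a single value, maximising precision is easy with NumPy's built-ins:
from numpy import arange, dot, einsum, float32, float64, sum  # NumPy's sum, not Python's built-in
# outputs below are as reported originally; newer NumPy versions use pairwise summation and may print different float32 results
x = arange(1, 1000000, dtype = float32)
y = (1.0 / arange(1, 1000000)).astype(float32)  # same values as map(lambda z: float32(1.0/z), ...), but an array in Python 3
sum(x) # 4.9994036e+11
sum(x, dtype = float64) # 499999500000.0
sum(y) # 14.357357
sum(y, dtype = float64) # 14.392725788474309
dot(x,y) # 999999.0
einsum('i,i', x, y) # * dot product = 999999.0
einsum('i,i', x, y, dtype = float64) # 999999.00003965141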
Note that the single-precision roundings within "dot" cancel in this case, as each almost-integer is rounded to an exact integer.
Optimising rounding depends on the kind of thing you are adding up. Adding many small numbers first can help delay rounding, but it does not avoid problems where big numbers exist but cancel each other out, since the intermediate calculations still lose precision.
Example showing evaluation-order dependence:
from numpy import array, float32, float64, random, sum
x = array([1., 2e-15, 8e-15, -0.7, -0.3], dtype=float32)
# evaluates to
# array([ 1.00000000e+00, 2.00000001e-15, 8.00000003e-15,
#        -6.99999988e-01, -3.00000012e-01], dtype=float32)
sum(x) # 0
sum(x, dtype=float64) # 9.9920072216264089e-15
sum(random.permutation(x)) # gives 9.9999998e-15 / 2e-15 / 0.0
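If you need a sum that does not depend on evaluation order, Python's math.fsum (a standard-library function not mentioned in the answer) tracks exact intermediate partial sums and rounds only once. A sketch on the same five values; the exact total is just 2e-15 + 8e-15 in their float32 representations, because float32 0.7 and float32 0.3 add up to exactly 1:

```python
import math

import numpy as np

x = np.array([1.0, 2e-15, 8e-15, -0.7, -0.3], dtype=np.float32)
vals = [float(v) for v in x]  # float32 -> float64 conversion is exact

# math.fsum keeps exact partial sums and rounds only once at the end,
# so the result no longer depends on the order of the terms.
exact = math.fsum(vals)

rng = np.random.default_rng(0)
shuffled = [math.fsum(rng.permutation(vals)) for _ in range(10)]

print(exact)
print(all(s == exact for s in shuffled))  # True
```

fsum works on Python iterables one element at a time, so it is far slower than np.sum on large arrays; it fits best as a reference check.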
