Floating point precision in python for a convergent sequence starts oscillating - python

I'm trying to plot a mathematical expression in python. I have a sum of functions f_i of the following type
-x/r^2 * exp(-r*x) + 2/r^3 * (1 - exp(-r*x)) - x/r^2 * exp(-r*x_i)
where x_i takes values between 1/360 and 50, and r is quite small, about 0.0001. I'm interested in plotting the behaviour of these functions (actually of the sum f_i(x) * n_i, for some real n_i) as x converges to zero. I know the exact analytical expression, which I can reproduce. However, for very small x the plot starts oscillating and doesn't seem to converge. I'm now wondering whether this has to do with floating-point precision in python, since I'm considering very small x, like 0.0000001.

0.0000001 isn't very small.
Check out these pages:
https://docs.python.org/3/library/stdtypes.html#typesnumeric
https://docs.python.org/3/library/sys.html#sys.float_info
Try casting your intermediate values in the computations to float before doing math.
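A hedged aside on the question itself, not part of the answer above: for r around 1e-4 and x around 1e-7, the factor 1 - exp(-r*x) in the second term is a textbook catastrophic cancellation (r*x is about 1e-11, so roughly 11 of the ~16 digits a double carries can be lost), which alone could produce oscillation at small x. numpy.expm1 (or math.expm1) evaluates exp(t) - 1 accurately for tiny t, so that term can be rewritten as -2/r**3 * expm1(-r*x). A small illustration with made-up values:

import numpy as np

r, x = 1e-4, 1e-7            # illustrative values from the question
t = r * x                    # about 1e-11

lossy  = 1.0 - np.exp(-t)    # only ~5 of 16 digits survive the subtraction
better = -np.expm1(-t)       # same quantity, accurate to full double precision

print(lossy, better)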

Related

Approximating a function with a step function with pairwise total error constraints in python

I need to approximate a function y(x) with a step function of height h, where each "high" segment has a length l_i = n_i*l_0 and every "low" segment has a length d_j = n_j*d_0, with n_i and n_j required to be integers. The function is strictly positive, monotonically (though not strictly) decreasing, and continuous.
My function has been derived in sympy and is available as a symbolic equation, but it's acceptable to convert to numpy/scipy if beneficial.
My first approach was to solve the segments pairwise.
The end application requires the total difference, i.e. the integral between the approximation and target function, to be minimized pairwise.
Another practical constraint is for the segments to be as short as possible, with the constraint of n being an integer.
I would also need to carry over any residual of the integral sum into the next calculation, because the total approximation should also minimize the accumulated error.
The approach I thought about taking would involve doing a segment-wise integral from x_0 to x_1 and from x_1 to x_2, finding for which x_1, x_2 the sum of these integrals changes sign (or is minimized), and then finding the lowest common denominator of n_i and n_j.
integral = smp.integrate(y - h, (x, x_0, x_1)) + smp.integrate(y, (x, x_1, x_2))
One approach would be to switch over to scipy.optimize.minimize at this point; however, I have read that it has problems with integer values? On the other hand, I don't know how I could find a relationship x_1(x_2) for which the integral would be close to 0 in sympy either, as I only started using sympy yesterday. Any help would be hugely appreciated!
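Not an answer, just a minimal sketch of the pairwise-integral idea under made-up assumptions (y(x) = exp(-x), step height h = 1/2, segment start x_0 = 0 and a fixed x_2 = 3 are all placeholders): build the signed error of a high segment plus the following low segment symbolically, then let sympy's nsolve find the x_1 at which it crosses zero.

import sympy as smp

x, x_1, x_2 = smp.symbols('x x_1 x_2')
y = smp.exp(-x)            # placeholder target function
h = smp.Rational(1, 2)     # placeholder step height
x_0 = 0                    # start of the current "high" segment

# signed error: high segment [x_0, x_1] at height h, low segment [x_1, x_2] at height 0
integral = smp.integrate(y - h, (x, x_0, x_1)) + smp.integrate(y, (x, x_1, x_2))

# fix x_2 and find the x_1 at which the accumulated error crosses zero
expr = integral.subs(x_2, 3)
x_1_root = smp.nsolve(expr, x_1, 1.0)
print(x_1_root)            # ~1.90 for these placeholder choices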

Is it possible to make the mean of the list of numbers generated from the normal distribution become exactly zero?

So, even in the absence of errors due to finite machine precision, it would be mathematically incorrect to expect a finite number of points sampled from a Gaussian distribution to always have exactly zero mean; one would truly need an infinite number of points for that to hold.
Nonetheless, I am manually (in an ad hoc manner) trying to center the distribution so that the mean is at zero. To do that I first generate a Gaussian sample, find its mean, and then shift each point by that mean. This brings the mean very close to zero, but I'm left with a small value on the order of machine precision (about 10**(-17)) and I do not know how to make it exactly zero.
Here is the code I used:
import numpy as np

n = 10000
X = np.random.normal(0, 1, size=(2, n))[0, :]
Xm = np.mean(X)
print("Xm = ", Xm)
Y = np.random.normal(0, 1, size=(2, n))[1, :]
Ym = np.mean(Y)
print("Ym = ", Ym)
for i in range(len(X)):
    X[i] = X[i] - Xm    # shift every point by the sample mean
    Y[i] = Y[i] - Ym
new_X = np.mean(X)
new_Y = np.mean(Y)
print(new_X)
print(new_Y)
Output:
Xm =  0.002713682499601005
Ym =  -0.0011499576497770079
-3.552713678800501e-18
2.2026824808563105e-17
I am not good with code, but mathematically you could have a while loop that checks whether the sum of the numbers is 0. If it isn't, you would add 1 to the lowest unit you allow.
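A minimal sketch in the same spirit (my own illustration, with a made-up sample): after centering, push the leftover rounding error onto a single element. Exact zero still cannot be guaranteed for every sample, but the mean typically ends up at 0.0 or within a few ulps of it.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(0, 1, size=10000)

X -= X.mean()          # first centering pass; mean is now of order 1e-17
residual = X.sum()     # leftover rounding error of the whole sample
X[0] -= residual       # absorb it into one element

print(X.mean())        # typically 0.0, or within a few ulps of it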

How to calculate coefficient of variation for extremely small numbers

I want to calculate the squared coefficient of variation, i.e. cv^2 = Var(X) / E[X]^2, for a sample X of which I only have the logarithms ln(X_i).
My work so far:
I cannot simply take the exponent of ln(X), because those numbers are too small and exp(ln(X)) will be treated as 0. I read on Wikipedia that (when the sample is big) we can use an estimate of the form cv^2 ≈ exp(Var(ln X)) - 1.
However, when I'm using this formula, I obtain a very unstable result:
vec = np.array([-750.1729, -735.0251])
np.exp(np.var(vec)) - 1
8.1818556596351e+24
Is there any way to get a more accurate result? Or does it have to be this way, because the variance is very large relative to the mean?
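Not an answer from the thread, just a sketch of one possible alternative: compute the plain sample estimate cv^2 = E[X^2]/E[X]^2 - 1 directly in log space with scipy.special.logsumexp, so the underflowing values exp(-750) never have to be materialised. Note this is a different estimator from the log-normal formula exp(Var(ln X)) - 1 used above, so the two can legitimately disagree.

import numpy as np
from scipy.special import logsumexp

log_x = np.array([-750.1729, -735.0251])          # the ln(X_i) values from the question
n = len(log_x)

log_mean    = logsumexp(log_x) - np.log(n)        # ln(E[X])
log_mean_sq = logsumexp(2 * log_x) - np.log(n)    # ln(E[X^2])

cv_squared = np.expm1(log_mean_sq - 2 * log_mean) # E[X^2]/E[X]^2 - 1
print(cv_squared)                                 # ~1.0 for these two points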

Accuracy of math.pow, numpy.power, numpy.float_power, pow and ** in python

Is there a difference in accuracy between math.pow, numpy.power, numpy.float_power, pow() and ** in python, for two floating-point numbers x, y?
I assume x is very close to 1, and y is large.
One way in which you would lose precision in all cases is if you are computing a small number (z, say) and then computing
p = pow(1.0 + z, y)
The problem is that doubles have around 16 significant figures, so if z is, say, 1e-8, then in forming 1.0 + z you will lose half of those figures. Worse, if z is smaller than 1e-16, 1.0 + z will be exactly 1.
You can get round this by using the numpy function log1p. This computes the log of its argument plus one, without actually adding 1 to its argument, so not losing precision.
You can compute p above as
p = exp( log1p(z)*y)
which will eliminate the loss of precision due to calculating 1 + z.
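A small demonstration of the point above, with made-up values z = 1e-18 and y = 1e18 (the exact answer is then essentially e ≈ 2.71828):

import numpy as np

z, y = 1e-18, 1e18

naive  = (1.0 + z) ** y            # 1.0 + z rounds to exactly 1.0, so this gives 1.0
stable = np.exp(np.log1p(z) * y)   # keeps the tiny z, giving ~2.718281828459045

print(naive, stable)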

Is there any documentation of numpy numerical stability?

I looked around for some documentation of how numpy/scipy functions behave in terms of numerical stability, e.g. are any means taken to improve numerical stability or are there alternative stable implementations.
I am specifically interested in addition (+ operator) of floating point arrays, numpy.sum(), numpy.cumsum() and numpy.dot(). In all cases I am essentially summing a very large quantity of floating points numbers and I am concerned about the accuracy of such calculations.
Does anyone know of any reference to such issues in the numpy/scipy documentation or some other source?
"Stability" is a property of an algorithm. If your algorithm is unstable to start with, then increasing precision or reducing the rounding error of the component steps is not going to gain much.
The more complex numpy routines like "solve" are wrappers for the ATLAS/BLAS/LAPACK routines. You can refer to the documentation there; for example, "dgesv" solves a system of real-valued linear equations using an LU decomposition with partial pivoting and row interchanges. The underlying Fortran documentation for LAPACK can be seen at http://www.netlib.org/lapack/explore-html/ but http://docs.scipy.org/doc/numpy/user/install.html points out that many different versions of the standard routine implementations are available, and speed optimisation and precision will vary between them.
Your examples don't introduce much rounding. "+" has no unnecessary rounding; the precision depends purely on the rounding implicit in the floating-point datatype, when the smaller number has low-order bits that cannot be represented in the answer. Sum and dot depend only on the order of evaluation. Cumsum cannot easily be re-ordered, as it outputs an array.
For the cumulative rounding during a "cumsum" or "dot" function you do have choices:
On 64-bit Linux, numpy provides access to a high-precision "long double" type, float128, which you could use to reduce the loss of precision in intermediate calculations at the cost of performance and memory.
However, on my Win64 install "numpy.longdouble" maps to "numpy.float64", a normal C double type, so such code is not cross-platform; check "finfo". (Neither float96 nor float128 with genuinely higher precision exists on Canopy Express Win64.)
log2(finfo(float64).resolution)
> -49.828921423310433
actually 53 bits of mantissa internally, ~16 significant decimal figures
log2(finfo(float32).resolution)
> -19.931568 # ~ only 7 meaningful digits
Since sum() and dot() reduce the array to a single value, maximising precision is easy with built-ins:
from numpy import arange, array, float32, float64, sum, dot, einsum, random

x = arange(1, 1000000, dtype=float32)
y = (1.0 / arange(1, 1000000)).astype(float32)   # float32 reciprocals 1/1 ... 1/999999

sum(x)                               # 4.9994036e+11
sum(x, dtype=float64)                # 499999500000.0
sum(y)                               # 14.357357
sum(y, dtype=float64)                # 14.392725788474309
dot(x, y)                            # 999999.0
einsum('i,i', x, y)                  # dot product in float32 = 999999.0
einsum('i,i', x, y, dtype=float64)   # 999999.00003965141
Note that the single-precision roundings within "dot" cancel in this case, as each almost-integer is rounded to an exact integer.
Optimising rounding depends on the kind of thing you are adding up. Adding many small numbers first can help delay rounding, but it would not avoid problems where big numbers exist but cancel each other out, as the intermediate calculations still cause a loss of precision.
An example showing evaluation-order dependence:
x = array([1., 2e-15, 8e-15, -0.7, -0.3], dtype=float32)
# evaluates to
# array([ 1.00000000e+00, 2.00000001e-15, 8.00000003e-15,
#        -6.99999988e-01, -3.00000012e-01], dtype=float32)
sum(x)                        # 0.0
sum(x, dtype=float64)         # 9.9920072216264089e-15
sum(random.permutation(x))    # gives 9.9999998e-15 / 2e-15 / 0.0 depending on the order
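A hedged aside, not from the answer above: Python's math.fsum computes a correctly rounded sum of its inputs, which removes the evaluation-order dependence shown in the example, at the cost of speed. A minimal sketch:

import math
import numpy as np

x = np.array([1., 2e-15, 8e-15, -0.7, -0.3], dtype=np.float32)

print(np.sum(x))                      # 0.0 with float32 accumulation, as above
print(np.sum(x, dtype=np.float64))    # ~9.99e-15, still order dependent
print(math.fsum(x.astype(float)))     # exactly rounded sum of the stored float32 values, ~1e-14, order independent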
