Avoid Mean of Floating Point Error

Avoid Mean of Floating Point Error - python

When I calculate the mean of a list of floats the following way
def mean(x):
sum(x) / len(x)
then I usually do not care about tiny errors in floating point operations. Though, I am currently facing an issue where I want to get all elements in a list that are equal or above the list's average.
Again, this is usually no issue but when I face cases where all elements in the list are equal floating point numbers than the mean value calculated by the function above actually returns a value above all the elements. That, in my case, obviously is an issue.
I need a workaround to that involving no reliability on python3.x libraries (like e.g. statistics).
Edit:
It has been suggested in the comments to use rounding. This interestingly resulted in errors being rarer, but they still occur, as e.g. in this case:
[0.024484987, 0.024484987, 0.024484987, 0.024484987, ...] # x
0.024485 # mean
[] # numbers above mean

I believe you should be using math.fsum() instead of sum. For example:
>>> a = [0.024484987, 0.024484987, 0.024484987, 0.024484987] * 1360001
>>> math.fsum(a) / len(a)
0.024484987
This is, I believe, the answer you are looking for. It produces more consistent results, irrespective of the length of a, than the equivalent using sum().
>>> sum(a) / len(a)
0.024484987003073517

One neat solution is to use compensated summation, combined with double-double tricks to perform the division accurately:
def mean_kbn(X):
# 1. Kahan-Babuska-Neumaier summation
s = c = 0.0
n = 0
for x in X:
t = s + x
if abs(s) >= abs(x):
c -= ((s-t) + x)
else:
c -= ((x-t) + s)
s = t
n += 1
# sum is now s - c
# 2. double-double division from Dekker (1971)
# https://link.springer.com/article/10.1007%2FBF01397083
u = s / n # first guess of division
# Python doesn't have an fma function, so do mul2 via Veltkamp splitting
v = 1.34217729e8 # 0x1p27 + 1
uv = u*v
u_hi = (u - uv) + uv
u_lo = u - u_hi
nv = n*v
n_hi = (n - nv) + nv
n_lo = n - n_hi
# r = s - u*n exactly
r = (((s - u_hi*n_hi) - u_hi*n_lo) - u_lo*n_hi) - u_lo*n_lo
# add correction
return u + (r-c)/n
Here's a sample case I found, comparing with the sum, math.fsum and numpy.mean:
>>> mean_kbn([0.2,0.2,0.2])
0.2
>>> sum([0.2,0.2,0.2])/3
0.20000000000000004
>>> import math
>>> math.fsum([0.2,0.2,0.2])/3
0.20000000000000004
>>> import numpy
>>> numpy.mean([0.2,0.2,0.2])
0.20000000000000004

How about not using the mean but just multiplying each element by the length of the list and comparing it directly to the sum of the original list?
I think this should do what you want without relying on division

Related

if condition using sympy equation solver/ sympy very slow

I want to solve this equation witht the following parameters:
gamma = 0.1
F = 0.5
w = 0
A = symbols('A')
a = 1 + w**4 -w**2 + 4*(gamma**2)*w**2
b = 1 - w**2
sol = solve(a*A**2 + (9/16)*A**6 + (3/2)*b*A**4 -F**2)
list_A = []
for i in range(len(sol)):
if(type( solutions[i] )==float ):
print(sol[i])
list_A = sol[i]
However, as supposed, I am getting some real and complex values, and I want to remove the complex ones and only keep the floats. But this condition I implemented is not valid due to the type of sol[i] is either sympy.core.add.Add for complex or sympy.core.numbers.Float for floats.
My question is, how can I modify my condition so that it works for getting only the float values?
In addition, is there a way to speed it up? it is very slow if I put it in a loop for many values of omega.
this is my first time working with sympy

When it is able to validate solutions relative to assumptions on symbols, it will; so if you tell SymPy that A is real then -- if it can verify the solutions -- it will only show the real ones:
>>> A = symbols('A',real=True)
>>> sol = solve(a*A**2 + (9/16)*A**6 + (3/2)*b*A**4 -F**2)
>>> sol
[-0.437286658108243, 0.437286658108243]

numpy sin and cos don't return exactly zero for reference angles that have zero values [duplicate]

Does anyone know why the below doesn't equal 0?
import numpy as np
np.sin(np.radians(180))
or:
np.sin(np.pi)
When I enter it into python it gives me 1.22e-16.

The number π cannot be represented exactly as a floating-point number. So, np.radians(180) doesn't give you π, it gives you 3.1415926535897931.
And sin(3.1415926535897931) is in fact something like 1.22e-16.
So, how do you deal with this?
You have to work out, or at least guess at, appropriate absolute and/or relative error bounds, and then instead of x == y, you write:
abs(y - x) < abs_bounds and abs(y-x) < rel_bounds * y
(This also means that you have to organize your computation so that the relative error is larger relative to y than to x. In your case, because y is the constant 0, that's trivial—just do it backward.)
Numpy provides a function that does this for you across a whole array, allclose:
np.allclose(x, y, rel_bounds, abs_bounds)
(This actually checks abs(y - x) < abs_ bounds + rel_bounds * y), but that's almost always sufficient, and you can easily reorganize your code when it's not.)
In your case:
np.allclose(0, np.sin(np.radians(180)), rel_bounds, abs_bounds)
So, how do you know what the right bounds are? There's no way to teach you enough error analysis in an SO answer. Propagation of uncertainty at Wikipedia gives a high-level overview. If you really have no clue, you can use the defaults, which are 1e-5 relative and 1e-8 absolute.

One solution is to switch to sympy when calculating sin's and cos's, then to switch back to numpy using sp.N(...) function:
>>> # Numpy not exactly zero
>>> import numpy as np
>>> value = np.cos(np.pi/2)
6.123233995736766e-17
# Sympy workaround
>>> import sympy as sp
>>> def scos(x): return sp.N(sp.cos(x))
>>> def ssin(x): return sp.N(sp.sin(x))
>>> value = scos(sp.pi/2)
0
just remember to use sp.pi instead of sp.np when using scos and ssin functions.

Faced same problem,
import numpy as np
print(np.cos(math.radians(90)))
>> 6.123233995736766e-17
and tried this,
print(np.around(np.cos(math.radians(90)), decimals=5))
>> 0
Worked in my case. I set decimal 5 not lose too many information. As you can think of round function get rid of after 5 digit values.

Try this... it zeros anything below a given tiny-ness value...
import numpy as np
def zero_tiny(x, threshold):
if (x.dtype == complex):
x_real = x.real
x_imag = x.imag
if (np.abs(x_real) < threshold): x_real = 0
if (np.abs(x_imag) < threshold): x_imag = 0
return x_real + 1j*x_imag
else:
return x if (np.abs(x) > threshold) else 0
value = np.cos(np.pi/2)
print(value)
value = zero_tiny(value, 10e-10)
print(value)
value = np.exp(-1j*np.pi/2)
print(value)
value = zero_tiny(value, 10e-10)
print(value)

Python uses the normal taylor expansion theory it solve its trig functions and since this expansion theory has infinite terms, its results doesn't reach exact but it only approximates.
For e.g
sin(x) = x - x³/3! + x⁵/5! - ...
=> Sin(180) = 180 - ... Never 0 bout approaches 0.
That is my own reason by prove.

Simple.
np.sin(np.pi).astype(int)
np.sin(np.pi/2).astype(int)
np.sin(3 * np.pi / 2).astype(int)
np.sin(2 * np.pi).astype(int)
returns
0
1
0
-1

Random number function python that includes 1?

I am new to Python and am trying to create a program for a project- firstly, I need to generate a point between the numbers 0-1.0, including 0 and 1.0 ([0, 1.0]). I searched the python library for functions (https://docs.python.org/2/library/random.html) and I found this function:
random.random()
This will return the next random floating point number in the range [0.0, 1.0). This is a problem, since it does not include 1. Although the chances of actually generating a 1 are very slim anyway, it is still important because this is a scientific program that will be used in a larger data collection.
I also found this function:
rand.randint
This will return an integer, which is also a problem.
I researched on the website and previously asked questions and found that this function:
random.uniform(a, b)
will only return a number that is greater than or equal to a and less than b.
Does anyone know how to create a random function on python that will include [0, 1.0]?
Please correct me if I was mistaken on any of this information. Thank you.
*The random numbers represent the x value of a three dimensional point on a sphere.

Could you make do with something like this?
random.randint(0, 1000) / 1000.0
Or more formally:
precision = 3
randomNumber = random.randint(0, 10 ** precision) / float(10 ** precision)

Consider the following function built on top of random.uniform. I believe that the re-sampling approach should cause all numbers in the desired interval to appear with equal probability, because the probability of returning candidate > b is 0, and originally all numbers should be equally likely.
import sys
import random
def myRandom(a, b):
candidate = uniform.random(a, b + sys.float_info.epsilon)
while candidate > b:
candidate = uniform.random(a, b + sys.float_info.epsilon)
return candidate
As gnibbler mentioned below, for the general case, it may make more sense to change both the calls to the following. Note that this will only work correctly if b > 0.
candidate = uniform.random(a, b*1.000001)

Try this:
import random
random.uniform(0.0, 1.0)
Which will, according to the documentation [Python 3.x]:
Return a random floating point number N such that a <= N <= b for a <= b and b <= N <= a for b < a.
Notice that the above paragraph states that b is in fact included in the range of possible values returned by the function. However, beware of the second part (emphasis mine):
The end-point value b may or may not be included in the range depending on floating-point rounding in the equation a + (b-a) * random().

For floating point numbers you can use numpy's machine limits for floats class to get the smallest possible value for 64bit or 32bit floating point numbers. In theory, you should be able to add this value to b in random.uniform(a, b) making 1 inclusive in your generator:
import numpy
import random
def randomDoublePrecision():
floatinfo = numpy.finfo(float)
epsilon = floatinfo.eps
a = random.uniform(0, 1 + eps)
return a
This assumes that you are using full precision floating point numbers for your number generator. For more info read this Wikipedia article.

Would it be just:
list_rnd=[random.random() for i in range(_number_of_numbers_you_want)]
list_rnd=[item/max(list_rnd) for item in list_rnd]
Generate a list of random numbers and divide it by its max value. The resulting list still flows uniform distribution.

I've had the same problem, this should help you.
a: upper limit,
b: lower limit, and
digit: digit after comma
def konv_des (bin,a,b,l,digit):
des=int(bin,2)
return round(a+(des*(b-a)/((2**l)-1)),digit)
def rand_bin(p):
key1 = ""
for i in range(p):
temp = str(random.randint(0, 1))
key1 += temp
return(key1)
def rand_chrom(a,b,digit):
l = 1
eq=False
while eq==False:
l += 1
eq=2**(l-1) < (b-a)*(10**digit) and (b-a)*(10**digit) <= (2**l)-1
return konv_des(rand_bin(l),a,b,l,digit)
#run
rand_chrom(0,1,4)

Trapezoid Rule in Python

I am trying to write a program using Python v. 2.7.5 that will compute the area under the curve y=sin(x) between x = 0 and x = pi. Perform this calculation varying the n divisions of the range of x between 1 and 10 inclusive and print the approximate value, the true value, and the percent error (in other words, increase the accuracy by increasing the number of trapezoids). Print all the values to three decimal places.
I am not sure what the code should look like. I was told that I should only have about 12 lines of code for these calculations to be done.
I am using Wing IDE.
This is what I have so far
# base_n = (b-a)/n
# h1 = a + ((n-1)/n)(b-a)
# h2 = a + (n/n)(b-a)
# Trap Area = (1/2)*base*(h1+h2)
# a = 0, b = pi
from math import pi, sin
def TrapArea(n):
for i in range(1, n):
deltax = (pi-0)/n
sum += (1.0/2.0)(((pi-0)/n)(sin((i-1)/n(pi-0))) + sin((i/n)(pi-0)))*deltax
return sum
for i in range(1, 11):
print TrapArea(i)
I am not sure if I am on the right track. I am getting an error that says "local variable 'sum' referenced before assignment. Any suggestions on how to improve my code?

Your original problem and problem with Shashank Gupta's answer was /n does integer division. You need to convert n to float first:
from math import pi, sin
def TrapArea(n):
sum = 0
for i in range(1, n):
deltax = (pi-0)/n
sum += (1.0/2.0)*(((pi-0)/float(n))*(sin((i-1)/float(n)*(pi-0))) + sin((i/float(n))*(pi-0)))*deltax
return sum
for i in range(1, 11):
print TrapArea(i)
Output:
0
0.785398163397
1.38175124526
1.47457409274
1.45836902046
1.42009115659
1.38070223089
1.34524797198
1.31450259385
1.28808354
Note that you can heavily simplify the sum += ... part.
First change all (pi-0) to pi:
sum += (1.0/2.0)*((pi/float(n))*(sin((i-1)/float(n)*pi)) + sin((i/float(n))*pi))*deltax
Then do pi/n wherever possible, which avoids needing to call float as pi is already a float:
sum += (1.0/2.0)*(pi/n * (sin((i-1) * pi/n)) + sin(i * pi/n))*deltax
Then change the (1.0/2.0) to 0.5 and remove some brackets:
sum += 0.5 * (pi/n * sin((i-1) * pi/n) + sin(i * pi/n)) * deltax
Much nicer, eh?

You have some indentation issues with your code but that could just be because of copy paste. Anyways adding a line sum = 0 at the beginning of your TrapArea function should solve your current error. But as #Blender pointed out in the comments, you have another issue, which is the lack of a multiplication operator (*) after your floating point division expression (1.0/2.0).
Remember that in Python expressions are not always evaluated as you would expect mathematically. Thus (a op b)(c) will not automatically multiply the result of a op b by c like you would expect with a mathematical expression. Instead this is the function call notation in Python.
Also remember that you must initialize all variables before using their values for assignment. Python has no default value for unnamed variables so when you reference the value of sum with sum += expr which is equivalent to sum = sum + expr you are trying to reference a name (sum) that is not binded to any object at all.
The following revision to your function should do the trick. Notice how I place multiplication operators (*) between every expression that you intend to multiply.
def TrapArea(n):
sum = 0
for i in range(1, n):
i = float(i)
deltax = (pi-0)/n
sum += (1.0/2.0)*(((pi-0)/n)*(sin((i-1)/n*(pi-0))) + sin((i/n)*(pi-0)))*deltax
return sum
EDIT: I also dealt with the float division issue by converting i to float(i) within every iteration of the loop. In Python 2.x, if you divide one integer type object with another integer type object, the expression evaluates to an integer regardless of the actual value.

A "nicer" way to do the trapezoid rule with equally-spaced points...
Let dx = pi/n be the width of the interval. Also, let f(i) be sin(i*dx) to shorten some expressions below. Then interval i (in range(1,n)) contributes:
dA = 0.5*dx*( f(i) + f(i-1) )
...to the sum (which is an area, so I'm using dA for "delta area"). Factoring out the 0.5*dx, makes the whole some look like:
A = 0.5*dx * ( (f(0) + f(1)) + (f(1) + f(2)) + .... + (f(n-1) + f(n)) )
Notice that there are two f(1) terms, two f(2) terms, on up to two f(n-1) terms. Combine those to get:
A = 0.5*dx * ( f(0) + 2*f(1) + 2*f(2) + ... + 2*f(n-1) + f(n) )
The 0.5 and 2 factors cancel except in the first and last terms:
A = 0.5*dx(f(0) + f(n)) + dx*(f(1) + f(2) + ... + f(n-1))
Finally, you can factor dx out entirely to do just one multiplication at the end. Converting back to sin() calls, then:
def TrapArea(n):
dx = pi/n
asum = 0.5*(sin(0) + sin(pi)) # this is 0 for this problem, but not others
for i in range(1, n-1):
asum += sin(i*dx)
return sum*dx
That changed "sum" to "asum", or maybe "area" would be better. That's mostly because sum() is a built-in function, which I'll use below the line.
Extra credit: The loop part of the sum can be done in one step with a generator expression and the sum builtin function:
def TrapArea2(n):
dx = pi/n
asum = 0.5*(sin(0) + sin(pi))
asum += sum(sin(i*dx) for i in range(1,n-1))
return asum*dx
Testing both of those:
>>> for n in [1, 10, 100, 1000, 10000]:
print n, TrapArea(n), TrapArea2(n)
1 1.92367069372e-16 1.92367069372e-16
10 1.88644298557 1.88644298557
100 1.99884870579 1.99884870579
1000 1.99998848548 1.99998848548
10000 1.99999988485 1.99999988485
That first line is a "numerical zero", since math.sin(math.pi) evaluates to about 1.2e-16 instead of exactly zero. Draw the single interval from 0 to pi and the endpoints are indeed both 0 (or nearly so.)

Checking if float is equivalent to an integer value in python

In Python 3, I am checking whether a given value is triangular, that is, it can be represented as n * (n + 1) / 2 for some positive integer n.
Can I just write:
import math
def is_triangular1(x):
num = (1 / 2) * (math.sqrt(8 * x + 1) - 1)
return int(num) == num
Or do I need to do check within a tolerance instead?
epsilon = 0.000000000001
def is_triangular2(x):
num = (1 / 2) * (math.sqrt(8 * x + 1) - 1)
return abs(int(num) - num) < epsilon
I checked that both of the functions return same results for x up to 1,000,000. But I am not sure if generally speaking int(x) == x will always correctly determine whether a number is integer, because of the cases when for example 5 is represented as 4.99999999999997 etc.
As far as I know, the second way is the correct one if I do it in C, but I am not sure about Python 3.

There is is_integer function in python float type:
>>> float(1.0).is_integer()
True
>>> float(1.001).is_integer()
False
>>>

Both your implementations have problems. It actually can happen that you end up with something like 4.999999999999997, so using int() is not an option.
I'd go for a completely different approach: First assume that your number is triangular, and compute what n would be in that case. In that first step, you can round generously, since it's only necessary to get the result right if the number actually is triangular. Next, compute n * (n + 1) / 2 for this n, and compare the result to x. Now, you are comparing two integers, so there are no inaccuracies left.
The computation of n can be simplified by expanding
(1/2) * (math.sqrt(8*x+1)-1) = math.sqrt(2 * x + 0.25) - 0.5
and utilizing that
round(y - 0.5) = int(y)
for positive y.
def is_triangular(x):
n = int(math.sqrt(2 * x))
return x == n * (n + 1) / 2

You'll want to do the latter. In Programming in Python 3 the following example is given as the most accurate way to compare
def equal_float(a, b):
#return abs(a - b) <= sys.float_info.epsilon
return abs(a - b) <= chosen_value #see edit below for more info
Also, since epsilon is the "smallest difference the machine can distinguish between two floating-point numbers", you'll want to use <= in your function.
Edit: After reading the comments below I have looked back at the book and it specifically says "Here is a simple function for comparing floats for equality to the limit of the machines accuracy". I believe this was just an example for comparing floats to extreme precision but the fact that error is introduced with many float calculations this should rarely if ever be used. I characterized it as the "most accurate" way to compare in my answer, which in some sense is true, but rarely what is intended when comparing floats or integers to floats. Choosing a value (ex: 0.00000000001) based on the "problem domain" of the function instead of using sys.float_info.epsilon is the correct approach.
Thanks to S.Lott and Sven Marnach for their corrections, and I apologize if I led anyone down the wrong path.

Python does have a Decimal class (in the decimal module), which you could use to avoid the imprecision of floats.

floats can exactly represent all integers in their range - floating-point equality is only tricky if you care about the bit after the point. So, as long as all of the calculations in your formula return whole numbers for the cases you're interested in, int(num) == num is perfectly safe.
So, we need to prove that for any triangular number, every piece of maths you do can be done with integer arithmetic (and anything coming out as a non-integer must imply that x is not triangular):
To start with, we can assume that x must be an integer - this is required in the definition of 'triangular number'.
This being the case, 8*x + 1 will also be an integer, since the integers are closed under + and * .
math.sqrt() returns float; but if x is triangular, then the square root will be a whole number - ie, again exactly represented.
So, for all x that should return true in your functions, int(num) == num will be true, and so your istriangular1 will always work. The only sticking point, as mentioned in the comments to the question, is that Python 2 by default does integer division in the same way as C - int/int => int, truncating if the result can't be represented exactly as an int. So, 1/2 == 0. This is fixed in Python 3, or by having the line
from __future__ import division
near the top of your code.

I think the module decimal is what you need

You can round your number to e.g. 14 decimal places or less:
>>> round(4.999999999999997, 14)
5.0
PS: double precision is about 15 decimal places

It is hard to argue with standards.
In C99 and POSIX, the standard for rounding a float to an int is defined by nearbyint() The important concept is the direction of rounding and the locale specific rounding convention.
Assuming the convention is common rounding, this is the same as the C99 convention in Python:
#!/usr/bin/python
import math
infinity = math.ldexp(1.0, 1023) * 2
def nearbyint(x):
"""returns the nearest int as the C99 standard would"""
# handle NaN
if x!=x:
return x
if x >= infinity:
return infinity
if x <= -infinity:
return -infinity
if x==0.0:
return x
return math.floor(x + 0.5)
If you want more control over rounding, consider using the Decimal module and choose the rounding convention you wish to employ. You may want to use Banker's Rounding for example.
Once you have decided on the convention, round to an int and compare to the other int.

Consider using NumPy, they take care of everything under the hood.
import numpy as np
result_bool = np.isclose(float1, float2)

Python has unlimited integer precision, but only 53 bits of float precision. When you square a number, you double the number of bits it requires. This means that the ULP of the original number is (approximately) twice the ULP of the square root.
You start running into issues with numbers around 50 bits or so, because the difference between the fractional representation of an irrational root and the nearest integer can be smaller than the ULP. Even in this case, checking if you are within tolerance will do more harm than good (by increasing the number of false positives).
For example:
>>> x = (1 << 26) - 1
>>> (math.sqrt(x**2)).is_integer()
True
>>> (math.sqrt(x**2 + 1)).is_integer()
False
>>> (math.sqrt(x**2 - 1)).is_integer()
False
>>> y = (1 << 27) - 1
>>> (math.sqrt(y**2)).is_integer()
True
>>> (math.sqrt(y**2 + 1)).is_integer()
True
>>> (math.sqrt(y**2 - 1)).is_integer()
True
>>> (math.sqrt(y**2 + 2)).is_integer()
False
>>> (math.sqrt(y**2 - 2)).is_integer()
True
>>> (math.sqrt(y**2 - 3)).is_integer()
False
You can therefore rework the formulation of your problem slightly. If an integer x is a triangular number, there exists an integer n such that x = n * (n + 1) // 2. The resulting quadratic is n**2 + n - 2 * x = 0. All you need to know is if the discriminant 1 + 8 * x is a perfect square. You can compute the integer square root of an integer using math.isqrt starting with python 3.8. Prior to that, you could use one of the algorithms from Wikipedia, implemented on SO here.
You can therefore stay entirely in python's infinite-precision integer domain with the following one-liner:
def is_triangular(x):
return math.isqrt(k := 8 * x + 1)**2 == k
Now you can do something like this:
>>> x = 58686775177009424410876674976531835606028390913650409380075
>>> math.isqrt(k := 8 * x + 1)**2 == k
True
>>> math.isqrt(k := 8 * (x + 1) + 1)**2 == k
False
>>> math.sqrt(k := 8 * x + 1)**2 == k
False
The first result is correct: x in this example is a triangular number computed with n = 342598234604352345342958762349.

Python still uses the same floating point representation and operations C does, so the second one is the correct way.

Under the hood, Python's float type is a C double.
The most robust way would be to get the nearest integer to num, then test if that integers satisfies the property you're after:
import math
def is_triangular1(x):
num = (1/2) * (math.sqrt(8*x+1)-1 )
inum = int(round(num))
return inum*(inum+1) == 2*x # This line uses only integer arithmetic

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Avoid Mean of Floating Point Error - python

How about not using the mean but just multiplying each element by the length of the list and comparing it directly to the sum of the original list? I think this should do what you want without relying on division

Related

if condition using sympy equation solver/ sympy very slow

numpy sin and cos don't return exactly zero for reference angles that have zero values [duplicate]

Random number function python that includes 1?

Trapezoid Rule in Python

Checking if float is equivalent to an integer value in python

Categories

Resources