Can someone tell me why the output of the code below is negative zero?
a * b = -0
Here 'a' is of type long and 'b' is an object of the decimal class. If a = -28 and b = 0, then the output is -0.
From the Python decimal docs:
The signed zeros can result from calculations that underflow. They keep the sign that would have resulted if the calculation had been carried out to greater precision. Since their magnitude is zero, both positive and negative zeros are treated as equal and their sign is informational.
This explains it all:
http://effbot.org/pyfaq/why-are-floating-point-calculations-so-inaccurate.htm
The magnitude of the result is exactly zero, but the sign is negative: the sign of a product follows the signs of its operands, so a negative value times zero gives a negative zero. That is why this happens.
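You can see the same rule with Python's decimal module (a sketch, not the code from the question):
>>> from decimal import Decimal
>>> Decimal(-28) * Decimal(0)
Decimal('-0')
>>> Decimal(-28) * Decimal(0) == 0
True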
Related
I happen to have a numpy array of floats:
a.dtype, a.shape
#(dtype('float64'), (32769,))
The values are:
a[0]
#3.699822718929953
all(a == a[0])
True
However:
a.mean()
3.6998227189299517
The mean is off in the 15th and 16th significant figures.
Can anybody show how this difference accumulates over the ~30K-element mean, and whether there is a way to avoid it?
In case it matters, my OS is 64-bit.
Here is a rough approximation of a bound on the maximum error. This will not be representative of average error, and it could be improved with more analysis.
Consider calculating a sum using floating-point arithmetic with round-to-nearest ties-to-even:
sum = 0;
for (i = 0; i < n; ++i)
    sum += a[i];
where each a[i] is in [0, m).
Let ULP(x) denote the unit of least precision in the floating-point number x. (For example, in the IEEE-754 binary64 format with 53-bit significands, if the largest power of 2 not greater than |x| is 2^p, then ULP(x) = 2^(p−52).) With round-to-nearest, the maximum error in any operation with result x is ½ULP(x).
If we neglect rounding errors, the maximum value of sum after i iterations is i•m. Therefore, a bound on the error in the addition in iteration i is ½ULP(i•m). (Actually zero for i=1, since that case adds to zero, which has no error, but we neglect that for this approximation.) Then the total of the bounds on all the additions is the sum of ½ULP(i•m) for i from 1 to n. This is approximately ½•n•(n+1)/2•ULP(m) = ¼•n•(n+1)•ULP(m). (This is an approximation because it moves i outside the ULP function, but ULP is a discontinuous function. It is “approximately linear,” but there are jumps. Since the jumps are by factors of two, the approximation can be off by at most a factor of two.)
So, with 32,769 elements, we can say the total rounding error will be at most about ¼•32,769•32,770•ULP(m), about 2.7•10^8 times the ULP of the maximum element value. The ULP is 2^−52 times the greatest power of two not less than m, so that is about 2.7•10^8•2^−52 ≈ 6•10^−8 times m.
Of course, the likelihood that 32,768 sums (not 32,769, because the first necessarily has no error) all round in the same direction by chance is vanishingly small, but I conjecture one might engineer a sequence of values that gets close to that.
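As a quick sanity check, here is a small Python sketch of that bound for the array in the question (n = 32,769 and a constant element value m ≈ 3.7 are taken from the question; math.ulp needs Python 3.9+):
import math

n = 32769
m = 3.699822718929953                         # element value from the question
sum_bound = 0.25 * n * (n + 1) * math.ulp(m)  # ¼·n·(n+1)·ULP(m): bound on the sum's error
print(sum_bound)          # ≈ 1.2e-07
print(sum_bound / n)      # ≈ 3.6e-12, roughly the corresponding bound on the mean
The observed error in the mean (about 1.3e-15) is well inside that worst-case bound.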
An Experiment
Here is a chart of (in blue) the mean error over 10,000 samples of summing arrays with sizes 100 to 32,800 by 100s and elements drawn randomly from a uniform distribution over [0, 1). The error was calculated by comparing the sum calculated with float (IEEE-754 binary32) to that calculated with double (IEEE-754 binary64). (The samples were all multiples of 2^−24, and double has enough precision so that the sum for up to 2^29 such values is exact.)
The green line is c·n·√n with c set to match the last point of the blue line. We see it tracks the blue line over the long term. At points where the average sum crosses a power of two, the mean error increases faster for a time. At these points, the sum has entered a new binade, and further additions have higher average errors due to the increased ULP. Over the course of the binade, this fixed ULP decreases relative to n, bringing the blue line back to the green line.
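For reference, here is a rough sketch (not the original experiment code) of the same kind of measurement: naive binary32 summation of random values compared against an exact binary64 reference.
import numpy as np

rng = np.random.default_rng(0)

def naive_float32_sum_error(n):
    # Elements are exact multiples of 2**-24, so the float64 sum below is exact
    # at these sizes and serves as the reference value.
    a = rng.integers(0, 2**24, size=n) * 2.0**-24
    s = np.float32(0.0)
    for x in a.astype(np.float32):    # naive left-to-right binary32 summation
        s = np.float32(s + x)         # keep the running sum in binary32
    return float(s) - float(a.sum())

for n in (100, 1000, 10000, 32800):
    errs = [abs(naive_float32_sum_error(n)) for _ in range(10)]
    print(n, sum(errs) / len(errs))   # mean absolute error at this size
With far fewer samples than the chart used, the numbers are noisy, but the growth with n should still be visible.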
This is due to the inability of the float64 type to store the sum of your float values with full precision. To get around this problem you need to use a larger data type, of course*. NumPy has a longdouble dtype that you can use in such cases:
In [23]: np.mean(a, dtype=np.longdouble)
Out[23]: 3.6998227189299530693
Also, note:
In [25]: print(np.longdouble.__doc__)
Extended-precision floating-point number type, compatible with C
``long double`` but not necessarily with IEEE 754 quadruple-precision.
Character code: ``'g'``.
Canonical name: ``np.longdouble``.
Alias: ``np.longfloat``.
Alias *on this platform*: ``np.float128``: 128-bit extended-precision floating-point number type.
* read the comments for more details.
The mean is (by definition):
a.sum()/a.size
Unfortunately, adding all those values up and dividing accumulates floating point errors. They are usually around the magnitude of:
np.finfo(np.float64).eps
Out[]: 2.220446049250313e-16
Yeah, e-16, about where you get them. You can make the error smaller by using higher-precision floats like float128 (if your system supports it), but the errors will always accumulate whenever you're summing a large number of floats together. If you truly want the identity, you'll have to hardcode it:
import numpy as np

def mean_(arr):
    # Return the common value exactly if all elements are identical,
    # otherwise fall back to the ordinary floating-point mean.
    if np.all(arr == arr[0]):
        return arr[0]
    else:
        return arr.mean()
In practice, you never really want to use == between floats. Generally in numpy we use np.isclose or np.allclose to compare floats for exactly this reason. There are ways around it using other packages and leveraging arcane machine-level methods of calculating numbers to get (closer to) exact equality, but it's rarely worth the performance and clarity hit.
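For example, with the array a from the question:
>>> np.allclose(a, a[0])
True
>>> np.allclose(a.mean(), a[0])
True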
I've read in other questions that, for example, sin(2π) is not zero due to floating-point representation, but is very close. This very small error is no issue in my code, as I can just round to 5 decimals, for example.
However, when multiplying 2π by a very large number, the error is magnified a lot. The answer should be zero (or close), but is far from it.
Am I doing something fundamentally wrong in my thinking? If not, how can I keep the error margin of the floating-point value of π from being "magnified" as the number of periods (2*PI*X) → ∞?
Notice that the last 3 results are all the same. Can anyone explain why this is, even though 5) is exactly PI/2 larger than 4)? Even with a huge offset in the sine curve, an increase of PI/2 should still produce a different number, right?
Checking small number SIN(2*PI)
print(math.sin(math.pi*2))
RESULT = -2.44929359829e-16 AS EXPECTED → This error margin is OK for my purpose
Adding PI/2 to code above: SIN(2*PI + PI/2)
print(math.sin((math.pi*2)+(math.pi/2)))
RESULT: 1.0 AS EXPECTED
Checking very large number SIN(2*PI*VERY LARGE NUMBER) (still expecting close to zero)
print(math.sin(math.pi*2*(415926535897932384626433832795028841971693993751)))
RESULT:-0.759488037749 NOT AS EXPECTED --> This error margin is NOT OK for my purpose
Adding PI/2 to code above: SIN(2*PI*VERY LARGE NUMBER + PI/2) (expecting close to one)
print(math.sin((math.pi*2*(415926535897932384626433832795028841971693993751))+(math.pi/2)))
As above but I added PI/2 - expecting to get 1.0 as result
RESULT:-0.759488037749 NOT AS EXPECTED - why the same result as above when I added PI/2 (should move a quarter period along the sine curve)?
Adding random number (8) to the very large number, expecting neither 1 nor 0
print(math.sin(math.pi*2*(415926535897932384626433832795028841971693993759)))
as above but I added 8 - expecting to get neither 0 nor 1
RESULT:-0.759488037749 NOT AS EXPECTED - why the same result as above when I added 8
This simply isn't going to work with double-precision variables.
The value of math.pi is correct only to about 16 decimal places (53 bits in binary), so when you multiply it by a number like 415926535897932384626433832795028841971693993751 (159 bits), it is impossible to get meaningful results.
You need to use an arbitrary precision math library instead. Try using mpmath for example. Tell it you want 1000 bits of precision, and then try your sums again:
>>> import mpmath
>>> mpmath.mp.prec=1000
>>> print(mpmath.sin((mpmath.pi*2*(415926535897932384626433832795028841971693993751))+(mpmath.pi/2)))
1.0
How to avoid math.sin(math.pi*2*VERY LARGE NUMBER) having a much larger error margin than math.sin(math.pi*2)?
You could % 1 that very large number:
>>> math.sin(math.pi*2*(415926535897932384626433832795028841971693993751))
-0.8975818793257183
>>> math.sin(math.pi*2*(415926535897932384626433832795028841971693993751 % 1))
0.0
>>> math.sin((math.pi*2*(415926535897932384626433832795028841971693993751))+(math.pi/2))
-0.8975818793257183
>>> math.sin((math.pi*2*(415926535897932384626433832795028841971693993751 % 1))+(math.pi/2))
1.0
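This works because the multiplier is an integer, so n % 1 is exactly 0 and the reduction is exact. If the multiplier had a fractional part you cared about, a (hypothetical) variation is to keep it exact with Fraction and convert only the fractional part to float before multiplying by 2π:
>>> from fractions import Fraction
>>> import math
>>> k = Fraction(415926535897932384626433832795028841971693993751) + Fraction(1, 4)
>>> frac = k - (k // 1)                   # exact fractional part, here 1/4
>>> math.sin(math.pi * 2 * float(frac))
1.0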
The algorithms used are approximate, and the values (e.g. pi) are approximate. So $\pi \cdot \text{SomeLargeNumber}$ will have a largish error (as $\pi$'s value is approximate). The function used (by hardware?) will reduce the argument, perhaps using a slightly different value of $\pi$.
Note that floating point arithmetic does not satisfy the axioms for real arithmetic.
I am currently translating a MATLAB program into Python. I successfully ported all the previous vector operations using numpy. However, I am stuck on the following bit of code, which is a cosine-similarity measure.
% W and ind are different sized matrices
dist = full(W * (W(ind2(range),:)' - W(ind1(range),:)' + W(ind3(range),:)'));
for i=1:length(range)
    dist(ind1(range(i)),i) = -Inf;
    dist(ind2(range(i)),i) = -Inf;
    dist(ind3(range(i)),i) = -Inf;
end
disp(dist)
[~, mx(range)] = max(dist);
I did not understand the following part.
dist(indx(range(i)),i) = -Inf;
What is actually happening when you use
= -Inf;
on the right-hand side?
In Matlab (see: Inf):
Inf returns the IEEE® arithmetic representation for positive infinity.
So Inf produces a value that is greater than all other numeric values. -Inf produces a value that is guaranteed to be less than any other numeric value. It's generally used when you want to iteratively find a maximum and need a first value to compare to that's always going to be less than your first comparison.
According to Wikipedia (see: IEEE 754 Inf):
Positive and negative infinity are represented thus:
sign = 0 for positive infinity, 1 for negative infinity.
biased exponent = all 1 bits.
fraction = all 0 bits.
Python has the same concept using '-inf' (see Note 6 here):
float also accepts the strings “nan” and “inf” with an optional prefix “+” or “-” for Not a Number (NaN) and positive or negative infinity.
>>> a=float('-inf')
>>> a
-inf
>>> b=-27983.444
>>> min(a,b)
-inf
It just assigns a minus infinity value to the left-hand side.
It may appear weird to assign that value, particularly because a distance cannot be negative. But it looks like it's used for effectively removing those entries from the max computation in the last line.
If Python doesn't have "infinity" (I don't know Python) and if dist is really a distance (hence nonnegative), you could use any negative value instead of -Inf to achieve the same effect, namely remove those entries from the max computation.
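For the record, Python (and numpy) do have infinities; in numpy the same trick looks roughly like this (made-up values for illustration):
>>> import numpy as np
>>> dist = np.array([0.3, 0.9, 0.5, 0.7])
>>> dist[[1, 3]] = -np.inf          # "remove" entries 1 and 3 from the max computation
>>> print(dist.argmax(), dist.max())
2 0.5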
The -Inf is typically used to initialize a variable so that you can later use it in a comparison in a loop.
For instance, if I want to find the maximum value of a function (and have forgotten the command max), I would do something like:
function maxF = findMax(f,a,b)
    maxF = -Inf;
    x = a:0.001:b;
    for i = 1:length(x)
        if f(x(i)) > maxF
            maxF = f(x(i));
        end
    end
It is a way in MATLAB to make sure that any other value is larger than the current one. The Python equivalent is float('-inf') (or -math.inf); note that sys.maxint no longer exists in Python 3, and since Python integers are unbounded there is no smallest integer to use anyway.
See for instance:
Maximum and Minimum values for ints
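For what it's worth, here is a rough Python equivalent of the sketch above, using float('-inf') as the starting value (illustrative only, not a replacement for the built-in max):
import math

def find_max(f, a, b, step=0.001):
    max_f = float('-inf')      # smaller than anything f can return
    x = a
    while x <= b:
        if f(x) > max_f:
            max_f = f(x)
        x += step
    return max_f

print(find_max(math.sin, 0, 2 * math.pi))   # ~1.0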
What is Python's threshold of representable negative numbers? What's the lowest number below which Python will call any other value negative infinity?
There is no most negative integer, as Python integers have arbitrary precision. The most negative finite float (negative infinity itself can be written as -float('inf') or float('-inf')) can be derived from sys.float_info:
>>> import sys
>>> sys.float_info.max
1.7976931348623157e+308
The actual values depend on the implementation, but typically come from your C library's double type. Since floating-point values typically use a sign bit, the most negative finite value is simply the negation of the largest positive value. Also, because of how floating-point values are stored (separate mantissa and exponent), you can't simply subtract a small value from the "minimum" value and get negative infinity. Subtracting 1, for example, simply returns the same value due to limited precision.
(In other words, the possible float values are a small subset of the actual real numbers, and operations on two float values is not necessarily equivalent to the same operation on the "equivalent" reals.)
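For example:
>>> import sys
>>> most_negative = -sys.float_info.max
>>> most_negative
-1.7976931348623157e+308
>>> most_negative - 1 == most_negative    # 1 is far below the ULP at this magnitude
True
>>> most_negative * 2                     # overflowing past the finite range gives -inf
-inf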
I am wondering about the way Python (3.3.0) prints complex numbers. I am looking for an explanation, not a way to change the print.
Example:
>>> complex(1,1)-complex(1,1)
0j
Why doesn't it just print "0"? My guess is: to keep the output of type complex.
Next example:
>>> complex(0,1)*-1
(-0-1j)
Well, a simple "-1j" or "(-1j)" would have done. And why "-0"?? Isn't that the same as +0? It doesn't seem to be a rounding problem:
>>> (complex(0,1)*-1).real == 0.0
True
And when the imaginary part gets positive, the -0 vanishes:
>>> complex(0,1)
1j
>>> complex(0,1)*-1
(-0-1j)
>>> complex(0,1)*-1*-1
1j
Yet another example:
>>> complex(0,1)*complex(0,1)*-1
(1-0j)
>>> complex(0,1)*complex(0,1)*-1*-1
(-1+0j)
>>> (complex(0,1)*complex(0,1)*-1).imag
-0.0
Am I missing something here?
It prints 0j to indicate that it's still a complex value. You can also type it back in that way:
>>> 0j
0j
The rest is probably the result of the magic of IEEE 754 floating point representation, which makes a distinction between 0 and -0, the so-called signed zero. Basically, there's a single bit that says whether the number is positive or negative, regardless of whether the number happens to be zero. This explains why 1j * -1 gives something with a negative zero real part: the positive zero got multiplied by -1.
-0 is required by the standard to compare equal to +0, which explains why (1j * -1).real == 0.0 still holds.
The reason that Python still decides to print the -0 is that in the complex world these signs make a difference for branch cuts, for instance in the phase function:
>>> from cmath import phase
>>> phase(complex(-1.0, 0.0))
3.141592653589793
>>> phase(complex(-1.0, -0.0))
-3.141592653589793
This is about the imaginary part, not the real part, but it's easy to imagine situations where the sign of the real part would make a similar difference.
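For instance, the sign of a zero real part already changes the result of phase when the imaginary part is a (positive) zero:
>>> phase(complex(0.0, 0.0))
0.0
>>> phase(complex(-0.0, 0.0))
3.141592653589793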
The answer lies in the Python source code itself.
I'll work with one of your examples. Let
a = complex(0,1)
b = complex(-1, 0)
When you do a*b, you're calling this function:
real_part = a.real*b.real - a.imag*b.imag
imag_part = a.real*b.imag + a.imag*b.real
And if you do that in the Python interpreter, you'll get
>>> real_part
-0.0
>>> imag_part
-1.0
Per IEEE 754, you're getting a negative zero, and since that's not +0, you get the parens and the real part when printing it:
if (v->cval.real == 0. && copysign(1.0, v->cval.real) == 1.0) {
    /* Real part is +0: just output the imaginary part and do not
       include parens. */
    ...
}
else {
    /* Format imaginary part with sign, real part without. Include
       parens in the result. */
    ...
}
I guess (but I don't know for sure) that the rationale comes from the importance of that sign when calculating with elementary complex functions (there's a reference for this in the wikipedia article on signed zero).
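To connect that C check back to Python: the product's real part compares equal to 0.0, yet copysign still reports the negative sign bit, which is exactly what the branch above tests for.
>>> import math
>>> prod = complex(0, 1) * complex(-1, 0)
>>> prod
(-0-1j)
>>> prod.real == 0.0
True
>>> math.copysign(1.0, prod.real)
-1.0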
0j is an imaginary literal which indeed indicates a complex number rather than an integer or floating-point one.
The ±0 ("signed zero") is a result of Python's conformance to IEEE 754 floating-point representation, since in Python a complex is by definition a pair of floating-point numbers. Because of the latter, Python also doesn't need to print or require an explicit zero fractional part for a complex (0j rather than 0.0j).
The -0 part is printed in order to accurately represent the contents as repr()'s documentation demands (repr() is implicitly called whenever an operation's result is output to the console).
Regarding the question of why -0+1j evaluates to 1j while 1j*-1 prints as (-0-1j):
Note that (-0+0j) or (-0.0+0j) aren't single complex numbers but expressions: an int/float added to a complex. To compute the result, the first operand is converted to a complex (-0 → (0.0, 0.0), since integers don't have signed zeros; -0.0 → (-0.0, 0.0)). Then its .real and .imag are added to the corresponding parts of 1j, which are (+0.0, 1.0). The result is (+0.0, 1.0) :^) . To construct a complex directly, use complex(-0.0, 1).
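For example:
>>> -0 + 1j            # int 0 carries no sign, so the real part ends up +0.0
1j
>>> -0.0 + 1j          # -0.0 + (+0.0) rounds to +0.0, so still 1j
1j
>>> complex(-0.0, 1)   # constructing the complex directly preserves the negative zero
(-0+1j)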
As far as the first question is concerned: if it just printed 0 it would be mathematically correct, but you wouldn't know whether you were dealing with a complex object or an int. As long as you don't extract .real, you will always get a j component.
I'm not sure why you would ever get -0; it's not technically incorrect (-1 * 0 = 0) but it's syntactically odd.
As far as the rest goes, it's strange that it isn't consistent; however, none of it is technically incorrect, just an artifact of the implementation.