Python math.log and math.log10 giving different results

I was writing code to calculate the number of digits in a given whole number.
I was initially using
math.log(num,10)
but found that it gave an inexact (approximate) value at num = 1000:
math.log(1000,10)
>2.9999999999999996
I understand that this may be due to the way floating-point arithmetic works on computers, but the same calculation works flawlessly with math.log10:
math.log10(1000)
>3.0
Is it correct to assume that log10 is more accurate than log and to use it wherever log base 10 is involved instead of going with the more generalized log function?

Python's math documentation specifically says:
math.log10(x)
Return the base-10 logarithm of x. This is usually more accurate than log(x, 10).

According to the Python Math module documentation:
math.log(x,[base])
With one argument, return the natural logarithm of x (to base e).
With two arguments, return the logarithm of x to the given base, calculated as log(x)/log(base).
Whereas in the math.log10 section:
math.log10(x)
Return the base-10 logarithm of x. This is usually more accurate than log(x, 10).
This is most likely due to floating-point rounding. If I use the first method, log(1000)/log(10), I get:
>>> log(1000)
6.907755278982137
>>> log(10)
2.302585092994046
>>> 6.907755278982137/2.302585092994046
2.9999999999999996
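For the digit-counting use case in the question, here is a minimal sketch that avoids floating-point logarithms altogether by working on the integer's decimal string. The helper name digit_count is made up for illustration and is not part of the math module.

```python
import math

def digit_count(n):
    """Count the decimal digits of an integer without using floats (hypothetical helper)."""
    return len(str(abs(n)))

print(math.log10(1000))    # dedicated base-10 routine
print(math.log(1000, 10))  # computed as log(1000)/log(10), so it may be slightly off
print(digit_count(1000))   # exact, because only integer arithmetic is involved
```

If a logarithm is really needed, math.log10 is the safer choice, but taking int() of any float result can still be off by one near exact powers of 10.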


Weird float to integer conversion issue in python

For a calculation in a program I wrote that involves finite algebraic fields, I needed to check whether (2**58-1)/61 was an integer. However, Python seems to indicate that it is, while it is not.
For example -
>>> (2**58-1)/61
4725088133634619.0
Even when using numpy functions this issue appears -
>>> np.divide(np.float64(2**58)-1,np.float64(61))
4725088133634619.0
This happens although python does calculate 2**58 correctly (I assume this issue is general, but I encountered it using these numbers).
If you use normal / division, your result is a float, with the associated limited precision. The result gets rounded, and in your case, it gets rounded to 4725088133634619.0 - but that doesn't prove that it is an integer.
If you want to check if the result of the division by 61 is an integer, test if the remainder of the division by 61 is 0, using the modulo operator:
>>> (2**58-1) % 61
45
As you can see, it isn't.
As for the limited float precision mentioned by @Thierry Lathuille: Python's float is a 64-bit double-precision number that provides 53 bits for the mantissa (the same is true for np.float64). That means that not all integers above 2**53 are representable as a float, so precision is lost. For example, float(2**53) == float(2**53 + 1) is True. More details here:
https://en.wikipedia.org/wiki/Double-precision_floating-point_format
Is floating point math broken?
Correct answers have already been given. I am just adding another approach (which is not much different from what has already been said).
What you may want to do, due to inherent limitations of representation errors, is use divmod( ) for python and numpy.divmod( ) for numpy. That way you can check the quotient and the remainder.
print(divmod((2**58-1),61))
Gives quotient and remainder as
(4725088133634618, 45)
In numpy you may want to use the similar divmod function, but the numbers should be of an integer type such as np.int64 and not a float type such as np.float64 (due to the representation errors mentioned above).
np.divmod(np.int64(2**58)-1,np.int8(61))
The above gives quotient and remainder as.
(4725088133634618, 45)
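Putting the answers above together, here is a small sketch of the exact divisibility check using only Python integers. The function name divides_exactly is an illustrative assumption.

```python
def divides_exactly(n, d):
    """Return True if integer d divides integer n with no remainder (hypothetical helper)."""
    return n % d == 0

n = 2**58 - 1
print(divides_exactly(n, 61))  # integer arithmetic, so no rounding is involved
print(divmod(n, 61))           # quotient and remainder in one call

# Above 2**53 neighbouring integers can collapse to the same float.
print(float(2**53) == float(2**53 + 1))
```

The last line shows why n / 61 cannot be trusted here: the operands are converted to floats before the division happens.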

Does Python document its behavior for rounding to a specified number of fractional digits?

Is the algorithm used for rounding a float in Python to a specified number of digits specified in any Python documentation? The semantics of round with zero fractional digits (i.e. rounding to an integer) are simple to understand, but it's not clear to me how the case where the number of digits is nonzero is implemented.
The most straightforward implementation of the function that I can think of (given the existence of round to zero fractional digits) would be:
def round_impl(x, ndigits):
    return (10 ** -ndigits) * round(x * (10 ** ndigits))
I'm trying to write some C++ code that mimics the behavior of Python's round() function for all values of ndigits, and the above agrees with Python for the most part, when translated to equivalent C++ calls. However, there are some cases where it differs, e.g.:
>>> round(0.493125, 5)
0.49312
>>> round_impl(0.493125, 5)
0.49313
There is clearly a difference that occurs when the value to be rounded is at or very near the exact midpoint between two potential output values. Therefore, it seems important that I try to use the same technique if I want similar results.
Is the specific means for performing the rounding specified by Python? I'm using CPython 2.7.15 in my tests, but I'm specifically targeting v2.7+.
Also refer to What Every Programmer Should Know About Floating-Point Arithmetic, which has more detailed explanations for why this is happening as it is.
This is a mess. First of all, as far as float is concerned, there is no such number as 0.493125, when you write 0.493125 what you actually get is:
0.493124999999999980015985556747182272374629974365234375
So this number is not exactly between two decimals, it's actually closer to 0.49312 than it is to 0.49313, so it should definitely round to 0.49312, that much is clear.
The problem is that when you multiply by 10**5, you get the exact number 49312.5. So what happened here is the multiplication gave you an inexact result which by coincidence canceled out the rounding error in the original number. Two rounding errors canceled each other out, yay! But the problem is that when you do this, the rounding is actually incorrect... at least if you want to round up at midpoints, but Python 3 and Python 2 behave differently. Python 2 rounds away from 0, and Python 3 rounds towards even least-significant digits.
Python 2
if two multiples are equally close, rounding is done away from 0
Python 3
...if two multiples are equally close, rounding is done toward the even choice...
Summary
In Python 2,
>>> round(49312.5)
49313.0
>>> round(0.493125, 5)
0.49312
In Python 3,
>>> round(49312.5)
49312
>>> round(0.493125, 5)
0.49312
And in both cases, 0.493125 is really just a short way of writing 0.493124999999999980015985556747182272374629974365234375.
So, how does it work?
I see two plausible ways for round() to actually behave.
Choose the closest decimal number with the specified number of digits, and then round that decimal number to float precision. This is hard to implement, because it requires doing calculations with more precision than you can get from a float.
Take the two closest decimal numbers with the specified number of digits, round them both to float precision, and return whichever is closer. This will give incorrect results, because it rounds numbers twice.
And Python chooses... option #1! The exactly correct, but much harder to implement version. Refer to Objects/floatobject.c:927 double_round(). It uses the following process:
Write the floating-point number to a string in decimal format, using the requested precision.
Parse the string back in as a float.
This uses code based on David Gay's dtoa library. If you want C++ code that gets the actual correct result like Python does, this is a good start. Fortunately you can just include dtoa.c in your program and call it, since its licensing is very permissive.
The Python documentation for 2.7 specifies the behaviour:
Values are rounded to the closest multiple of 10 to the power minus
ndigits; if two multiples are equally close, rounding is done away
from 0.
For 3.7:
For the built-in types supporting round(), values are rounded to the
closest multiple of 10 to the power minus ndigits; if two multiples
are equally close, rounding is done toward the even choice
Update:
The (cpython) implementation can be found in floatobject.c in the function float___round___impl, which calls round if ndigits is not given, but double_round if it is.
double_round has two implementations.
One converts the double to a string (aka decimal) and back to a double.
The other one does some floating point calculations, calls to pow and at its core calls round. It seems to have more potential problems with overflows, since it actually multiplies the input by 10**-ndigits.
For the precise algorithm, look at the linked source file.
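To see the "round the exact value" idea in action without reading the C source, the decimal module can display the exact binary value of a float and round it at arbitrary precision. This is only a sketch of the concept under Python 3 semantics, not the code CPython actually runs.

```python
from decimal import Decimal, ROUND_HALF_EVEN

x = 0.493125
exact = Decimal(x)  # exact value stored in the float, no rounding applied
print(exact)

# Round the exact value to 5 places; a true tie would go to the even digit.
rounded = exact.quantize(Decimal("0.00001"), rounding=ROUND_HALF_EVEN)
print(rounded)

print(round(x, 5))     # built-in result for comparison
print(round(49312.5))  # a genuine tie: Python 3 rounds it to the even neighbour
```

Because the stored value lies slightly below the midpoint, no tie-breaking rule is involved for 0.493125. The tie only appears after the multiplication by 10**5 in the naive round_impl.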

How do I check the default decimal precision when converting float to str?

When converting a float to a str, I can specify the number of decimal points I want to display
'%.6f' % 0.1
> '0.100000'
'%.6f' % .12345678901234567890
> '0.123457'
But when simply calling str on a float in python 2.7, it seems to default to 12 decimal points max
str(0.1)
>'0.1'
str(.12345678901234567890)
>'0.123456789012'
Where is this max # of decimal points defined/documented? Can I programmatically get this number?
The number of decimals displayed is going to vary greatly, and there won't be a way to predict how many will be displayed in pure Python. Some libraries like numpy allow you to set precision of output.
This is simply because of the limitations of float representation.
The relevant parts of the link talk about how Python chooses to display floats.
Python only prints a decimal approximation to the true decimal value of the binary approximation stored by the machine
Python keeps the number of digits manageable by displaying a rounded value instead
Now, there is the possibility of overlap here:
Interestingly, there are many different decimal numbers that share the same nearest approximate binary fraction
The method for choosing which decimal values to display was changed in Python 3.1 (But the last sentence implies this might be an implementation detail).
For example, the numbers 0.1 and 0.10000000000000001 are both
approximated by 3602879701896397 / 2 ** 55. Since all of these decimal
values share the same approximation, any one of them could be
displayed while still preserving the invariant eval(repr(x)) == x
Historically, the Python prompt and built-in repr() function would
choose the one with 17 significant digits, 0.10000000000000001.
Starting with Python 3.1, Python (on most systems) is now able to
choose the shortest of these and simply display 0.1.
I do not believe this exists in the python language spec. However, the cpython implementation does specify it. The float_repr() function, which turns a float into a string, eventually calls a helper function with the 'r' formatter, which eventually calls a utility function that hardcodes the format to what comes down to format(float, '.16g'). That code can be seen here. Note that this is for python3.6.
>>> import math
>>> str(math.pi*4)
12.5663706144
giving the maximum number of significant digits (both before and after the decimal point) as 16. It appears that in the python2.7 implementation, this value was hardcoded to .12g. As for why this happened (and why it is somewhat lacking documentation), that can be found here.
So if you are trying to find out how long a number will be when printed, simply get its length with .12g.
def len_when_displayed(n):
    return len(format(n, '.12g'))
Well, if you're looking for a pure python way of accomplishing this, you could always use something like,
len(str(.12345678901234567890).split('.')[1])
>>>> 12
I couldn't find it in the documentation and will add it here if I do, but this is a work around that can at least always return the length of precision if you want to know before hand.
As you said, it always seems to be 12 even when feeding bigger floating-points.
From what I was able to find, this number can be highly variable and in these cases, finding it empirically seems to be the most reliable way of doing it. So, what I would do is define a simple method like this,
def max_floating_point():
    counter = 0
    current_length = 0
    str_rep = '.1'
    while(counter <= current_length):
        str_rep += '1'
        current_length = len(str(float(str_rep)).split('.')[1])
        counter += 1
    return current_length
This will return you the maximum length representation on your current system,
print max_floating_point()
>>>> 12
By looking at the output of random numbers converted, I have been unable to understand how the length of the str() is determined, e.g. under Python 3.6.6:
>>> str(.123456789123456789123456789)
'0.12345678912345678'
>>> str(.111111111111111111111111111)
'0.1111111111111111'
You may opt for this code that actually simulates your real situation:
import random
maxdec=max(map(lambda x:len(str(x)),filter(lambda x:x>.1,[random.random() for i in range(99)])))-2
Here we are testing the length of ~90 random numbers in the (.1,1) open interval after conversion (and subtracting the leading 0. from the length, hence the -2).
Python 2.7.5 on a 64bit linux gives me 12, and Python 3.4.8 and 3.6.6 give me 17.
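The difference between the 12 and 17 reported above can be reproduced on Python 3 with explicit format specifiers. This is a rough sketch: the '.12g' line imitates what Python 2.7's str() displayed, and repr() on Python 3.1+ picks the shortest string that still round-trips.

```python
import math

x = math.pi * 4
print(format(x, ".12g"))  # 12 significant digits, like str() on Python 2.7
print(format(x, ".17g"))  # 17 significant digits always identify a double uniquely
print(repr(x))            # shortest round-tripping string on Python 3.1+

# Whatever digits repr() chooses, parsing them back gives the same float.
print(float(repr(x)) == x)
```

So on Python 3 the displayed length depends on the value itself, with 17 significant digits as the upper bound.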

Weird behaviour for Python is_integer from floor

When checking if a float is an int, the recommended method would be is_integer.
However, I get a weird behaviour with the results of the log function:
print(log(9,3)); #2.0
print((log(9,3)).is_integer()); #True
print((log(243,3))); #5.0
print((log(243,3)).is_integer()); #False
Furthermore:
print((int) (log(9,3))); #2
print((int) (log(243,3))); #4
Is this normal?
log(243,3) simply doesn't give you exactly 5:
>>> '%.60f' % log(243,3)
'4.999999999999999111821580299874767661094665527343750000000000'
As the docs say, log(x, base) is "calculated as log(x)/log(base)". And neither log(243) nor log(3) can be represented exactly, and you get rounding errors. Sometimes you're lucky, sometimes you're not. Don't count on it.
When you want to compare float numbers, use math.isclose().
When you want to convert a float number that is close to an integer, use round().
Float numbers are too subject to error for "conventional" methods to be used. Their precision (and the precision of functions like log) is too limited, unfortunately. What looks like a 5 may not be an exact 5.
And yes: it is normal. This is not a problem with Python, but with every language I'm aware of (they all use the same underlying representation). Python offers some ways to work around float problems: decimal and fractions. Both have their own drawbacks, but sometimes they help. For example, with fractions, you can represent 1/3 without loss of precision. Similarly, with decimal, you can represent 0.1 exactly. However, you'll still have problems with log, sqrt, irrational numbers, numbers that require many digits to be represented and so on.
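A short sketch of the advice above: compare with math.isclose(), convert with round(), and when the inputs are integers, skip floats entirely. The helper exact_int_log is a hypothetical name introduced only for this example.

```python
import math

def exact_int_log(n, base):
    """Return k if n == base**k, else None, using integer arithmetic only (hypothetical helper)."""
    if n < 1 or base < 2:
        raise ValueError("expects n >= 1 and base >= 2")
    k = 0
    while n % base == 0:
        n //= base
        k += 1
    return k if n == 1 else None

value = math.log(243, 3)
print(math.isclose(value, 5.0))  # tolerant comparison instead of ==
print(round(value))              # nearest integer instead of int() truncation
print(exact_int_log(243, 3))     # exact answer with no floats at all
```

The integer loop is the only variant that can actually confirm that a number is an exact power of the base.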

Unable to see Python's approximations in mathematical calculations

Problem: how to see when the computer makes approximations in mathematical calculations done in Python
Example of the problem:
My old teacher once said the following statement
You cannot never calculate 200! with your computer.
I am not completely sure whether it is true or not nowadays.
It seems that it is, since I get a lot of zeros for it from a Python script.
How can you see when your Python code makes approximations?
Python uses arbitrary-precision arithmetic to calculate with integers, so it can exactly calculate 200!. For real numbers (so-called floating-point), Python does not use an exact representation. It uses a binary representation called IEEE 754, which is essentially scientific notation, except in base 2 instead of base 10.
Thus, for any real number that cannot be exactly represented in base 2 with 53 bits of precision, Python cannot produce an exact result. For example, 0.1 (in base 10) is an infinitely repeating fraction in base 2, 0.0001100110011..., so it cannot be exactly represented. Hence, if you enter on a Python prompt:
>>> 0.1
0.10000000000000001
The result you get back is different, since it has been converted from decimal to binary (with 53 bits of precision) and back to decimal. As a consequence, you get things like this:
>>> 0.1 + 0.2 == 0.3
False
For a good (but long) read, see What Every Programmer Should Know About Floating-Point Arithmetic.
Python has unbounded integer sizes in the form of a long type. That is to say, if it is a whole number, the limit on the size of the number is restricted by the memory available to Python.
When you compute a large number such as 200! and you see an L on the end of it, that means Python has automatically cast the int to a long, because an int was not large enough to hold that number.
See section 6.4 of this page for more information.
200! is a very large number indeed.
If the range of an IEEE 64-bit double is 1.7E +/- 308 (15 digits), you can see that the largest factorial you can get is around 170!.
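The 170! limit for doubles, and the fact that integers have no such limit, can be checked directly. This is a minimal sketch written for Python 3, using only the standard math module.

```python
import math

f200 = math.factorial(200)  # exact arbitrary-precision integer
print(len(str(f200)))       # number of decimal digits in 200!

print(float(math.factorial(170)))  # still fits in a 64-bit double

try:
    float(math.factorial(171))     # too large for a double
except OverflowError as exc:
    print("171! does not fit in a float:", exc)
```

The long run of zeros at the end of 200! is genuine: each factor of 10 in the product contributes one trailing zero, so it is not a sign of lost precision.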
Python can handle arbitrary sized numbers, as can Java with its BigInteger.
Without some sort of clarification to that statement, it's obviously false. Just from personal experience, early lessons in programming (in the late 1980s) included solving very similar, if not exactly the same, problems. In general, to know some device which does calculations isn't making approximations, you have to prove (in the math sense of a proof) that it isn't.
Python's integer types (named int and long in 2.x, both folded into just the int type in 3.x) are very good, and do not overflow like, for example, the int type in C. If you do the obvious of print 200 * 199 * 198 * ... it may be slow, but it will be exact. Similarly, addition, subtraction, and modulus are exact. Division is a mixed bag, as there are two operators, / and //, and they underwent a change in 2.x; in general you can only treat it as inexact.
If you want more control yet don't want to limit yourself to integers, look at the decimal module.
Python handles large numbers automatically (unlike a language like C where you can overflow its datatypes and the values reset to zero, for example) - over a certain point (sys.maxint or 2147483647) it converts the integer to a "long" (denoted by the L after the number), which can be any length:
>>> def fact(x):
...     return reduce(lambda x, y: x * y, range(1, x+1))
...
>>> fact(10)
3628800
>>> fact(200)
788657867364790503552363213932185062295135977687173263294742533244359449963403342920304284011984623904177212138919638830257642790242637105061926624952829931113462857270763317237396988943922445621451664240254033291864131227428294853277524242407573903240321257405579568660226031904170324062351700858796178922222789623703897374720000000000000000000000000000000000000000000000000L
Long numbers are "easy", floating point is more complicated, and almost any computer representation of a floating point number is an approximation, for example:
>>> float(1)/3
0.33333333333333331
Obviously you can't store an infinite number of 3's in memory, so it cheats and rounds it a bit..
You may want to look at the decimal module:
Decimal numbers can be represented exactly. In contrast, numbers like 1.1 do not have an exact representation in binary floating point. End users typically would not expect 1.1 to display as 1.1000000000000001 as it does with binary floating point.
Unlike hardware based binary floating point, the decimal module has a user alterable precision (defaulting to 28 places) which can be as large as needed for a given problem
See Handling very large numbers in Python.
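As a quick illustration of the decimal module mentioned above, together with the related fractions module, here is a sketch of cases where binary floats approximate but these types stay exact. The precision of 50 is an arbitrary choice for the example.

```python
from decimal import Decimal, getcontext
from fractions import Fraction

print(0.1 + 0.2 == 0.3)                                   # binary floats: inexact
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # decimal literals: exact
print(Fraction(1, 3) * 3 == 1)                            # rationals: exact

getcontext().prec = 50          # user-alterable precision (default is 28)
print(Decimal(1) / Decimal(3))  # still rounded, just much later
```

Note that Decimal values are built from strings here. Decimal(0.1) would faithfully copy the float's existing binary error instead.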
Python has a BigNum class for holding 200! and will use it automatically.
Your teacher's statement, though not exactly true here is true in general. Computers have limitations, and it is good to know what they are. Remember that every time you add another integer of data storage, you can store a number that is 2^32 (4 billion +) times larger. It is hard to comprehend how many more numbers that is - but maths gets slower as you add more integers to store the exact value of a very large number.
As an example (roughly what you can store with 1000 bits; note that 2 << 1000 equals 2**1001):
>>> 2 << 1000
2143017214372534641896850098120003621122809623411067214887500776740702102249872244986396
7576313917162551893458351062936503742905713846280871969155149397149607869135549648461970
8421492101247422837559083643060929499671638825347975351183310878921541258291423929553730
84335320859663305248773674411336138752L
I tried to illustrate how big a number you can store with 10000 bits, or even 8,000,000 bits (a megabyte) but that number is many pages long.
