Why are these results so different when using Python's "float" function? - python

My Python code was doing something strange to me (or my numbers, rather):
a)
float(poverb.tangibles[1])*1000
1038277000.0
b)
float(poverb.tangibles[1]*1000)
inf
Which led to discovering that:
long(poverb.tangibles[1]*1000)
produces the largest number I've ever seen.
Uhhh, I didn't read the whole Python tutorial or its docs. Did I miss something critical about how float works?
EDIT:
>>> poverb.tangibles[1]
u'1038277'

What you probably missed is the documentation on how multiplication works on strings. Your tangibles list contains strings. tangibles[1] is a string. tangibles[1]*1000 is that string repeated 1000 times. Calling float or long on that string interprets it as a number, creating a huge number. If you instead do float(tangibles[1]), you only get the actual number, not the number repeated 1000 times.
What you are seeing is just the same as what goes on in this example:
>>> x = '1'
>>> x
'1'
>>> x*10
'1111111111'
>>> float(x)
1.0
>>> float(x*10)
1111111111.0
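A minimal sketch of the same effect with the value from the question. The variable names are only illustrative, and the literal stands in for poverb.tangibles[1]:

```python
import math

raw = u'1038277'               # a string, like poverb.tangibles[1] in the question

repeated = raw * 3             # string repetition, not arithmetic
scaled = float(raw) * 1000     # convert to a number first, then multiply

print(repeated)                # the digits repeated three times
print(scaled)                  # the intended numeric result

# Repeating the string 1000 times gives a 7000-digit number,
# which is far too large for a float and parses as infinity.
overflowed = float(raw * 1000)
print(math.isinf(overflowed))
```

Converting before multiplying is the fix: the multiplication then acts on a number instead of on text.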

Related

Curious Modulus Operator (%) Result

What's going on here?
>>> a = np.int8(1)
>>> a%2
1
>>> a = np.uint8(1)
>>> a%2
1
>>> a = np.int32(1)
>>> a%2
1
>>> a = np.uint32(1)
>>> a%2
1
>>> a = np.int64(1)
>>> a%2
1
>>> a = np.uint64(1)
>>> a%2
'1.0'
We suddenly get what appears to be a string containing the float 1.0!?
>>> a = np.uint64(1)
>>> type(a%2)
<type 'numpy.float64'>
...though it turns out it's simply a float.
What's the philosophy behind this?
I understand that numpy wants to be stricter about things like types and typing rules in order to be more efficient than basic Python, but in this case the downsides of returning a very unexpected result to the user (likely breaking their program) seem to far outweigh the slight increase in cost of just checking the sign of the modulus before wandering down this slippery path.
It's not too rare to be working with uint64 values. For example, if you ever load an image into a numpy int array and then sum it, you have uint64(s). On the other hand, it's extremely rare to ever mod anything by a negative number (I've never done it except to see what would happen), because you generally mod things you can count such as indices, and different languages/standards/libraries can each have their own idea of what the result should be.
All this put together leaves me rather confused.
We suddenly get what appears to be a string containing the float 1.0!?
This is still a float64 - it just looks weird due to a bug in numpy 1.14.3, which is fixed in 1.15.0-dev.
You'd normally think that there are only two ways to convert to a string - __repr__ (tp_repr), and __str__ (tp_str).
It turns out that in python 2, there's one more - tp_print. This is only called when outputting directly to the console or the interpreter.
It turns out we implemented this wrong for only the interpreter. It's pretty tricky to test interpreter behavior in the test suite!
though it turns out it's simply a float.
This is sort of by design - 2 is inferred to be np.int64(2), and the pair {int64, uint64} is coerced to float64 (so that neither value is truncated). There are numerous issues about this, but it's tricky to fix.
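The promotion rule described above can be checked directly. This is a small sketch that makes the int64 operand explicit instead of relying on how a plain Python 2 is inferred, since that inference may differ between NumPy versions:

```python
import numpy as np

a = np.uint64(1)
b = np.int64(2)                 # explicit int64, the type the answer says 2 is inferred as

# No integer type can hold every uint64 and every int64 value,
# so NumPy promotes the pair to float64.
common = np.result_type(np.uint64, np.int64)
result = a % b

print(common)
print(type(result).__name__, result)
```

Keeping both operands unsigned, for example a % np.uint64(2), avoids the promotion to float64.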

Can Decimal('5E+1') be simply converted to Decimal('50') in Python?

Context
We display percentage values to agents in our app without trailing zeros (50% is much easier to quickly scan than is 50.000%), and hitherto we've just used quantize to sort of brute force normalize the value to remove trailing zeros.
This morning I decided to look into using Decimal.normalize instead, but ran into this:
Given the decimal value:
>>> value = Decimal('50.000')
Normalizing that value:
>>> value = value.normalize()
Results in:
>>> value
Decimal('5E+1')
I understand the value is the same:
>>> Decimal('5E+1') == Decimal('50')
True
But from a non-technical user's perspective, 5E+1 is basically meaningless.
Question
Is there a way to convert Decimal('5E+1') to Decimal('50')?
Note
I'm not looking to do anything that would change the value of the Decimal (e.g., removing decimal places altogether), since the value could be e.g., Decimal('33.333'). IOW, don't confuse my 50.000 example as meaning that we're only dealing with whole numbers.
For the purposes of output formatting, you can print your normalized Decimal objects with the f format specifier. (While the format string docs say this defaults to a precision of 6, this does not appear to be the case for Decimal objects.)
>>> print('{:f}%'.format(decimal.Decimal('50.000').normalize()))
50%
>>> print('{:f}%'.format(decimal.Decimal('50.003').normalize()))
50.003%
>>> print('{:f}%'.format(decimal.Decimal('1.23456789').normalize()))
1.23456789%
If for some reason, you really want to make a new Decimal object with different precision, you can do that by just calling Decimal on the f format output, but it sounds like you're dealing with an output format problem, not something you should change the internal representation for.
>>> Decimal('{:f}'.format(Decimal('5E+1')))
Decimal('50')
>>>
>>> Decimal('{:f}'.format(Decimal('50.000').normalize()))
Decimal('50')
>>> Decimal('{:f}'.format(Decimal('50.003').normalize()))
Decimal('50.003')
>>> Decimal('{:f}'.format(Decimal('1.23456789').normalize()))
Decimal('1.23456789')
According to the Python 3.9 docs, this is how to do it: https://docs.python.org/3.9/library/decimal.html#decimal-faq
def remove_exponent(d):
    return d.quantize(Decimal(1)) if d == d.to_integral() else d.normalize()
Add Decimal(0) to your result.
Decimal('50.000').normalize()
# Decimal('5E+1')
Decimal('50.000').normalize() + Decimal(0)
# Decimal('50')
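The three suggestions above (the f format, remove_exponent from the decimal FAQ, and adding Decimal(0)) can be compared side by side. This is only a sketch; the sample values are illustrative:

```python
from decimal import Decimal

def remove_exponent(d):
    # Recipe quoted above from the decimal FAQ.
    return d.quantize(Decimal(1)) if d == d.to_integral() else d.normalize()

samples = [Decimal('50.000'), Decimal('33.333'), Decimal('50.500')]

for value in samples:
    as_text = '{:f}'.format(value.normalize())   # formatting only
    via_faq = remove_exponent(value)             # new Decimal, no exponent
    via_add = value.normalize() + Decimal(0)     # new Decimal, no exponent
    print(as_text, via_faq, via_add)
```

If the goal is only display, the first form is enough, since it leaves the stored value untouched.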

Float converted to 2 dp reverts to original number of decimal places when inserted into a string

I have created the following snippet of code and I am trying to convert my 5 dp DNumber to a 2 dp one and insert this into a string. However, whichever method I try always seems to revert the DNumber back to the original number of decimal places (5).
Code snippet below:
if key == (1, 1):
    DNumber = '{r[csvnum]}'.format(r=row)
    # returns 7.65321
    DNumber = """%.2f""" % (float(DNumber))
    # returns 7.65
    Check2 = False
    if DNumber:
        if DNumber <= float(8):
            Check2 = True
    if Check2:
        print DNumber
        # returns 7.65
        string = 'test {r[csvhello]} TESTHERE test'.format(r=row).replace("TESTHERE", str("""%.2f""" % (float(gtpe))))
        # returns: test Hello 7.65321 test
        string = 'test {r[csvhello]} TESTHERE test'.format(r=row).replace("TESTHERE", str(DNumber))
        # returns: test Hello 7.65321 test
What I hoped it would return: test Hello 7.65 test
Any ideas or suggestions on alternative methods to try?
It seems like you were hoping that converting the float to a 2-decimal-place string and then back to a float would give you a 2-decimal-place float.
The first problem is that your code doesn't actually do that anywhere. If you'd done that, you would get something very close to 7.65, not 7.65321.
But the bigger problem is that what you're trying to do doesn't make any sense. A float always has 53 binary digits, no matter what. If you round it to two decimal digits (no matter how you do it, including by converting to string and back), what you actually get is a float rounded to two decimal digits and then rounded to 53 binary digits. The closest float to 7.65 is not exactly 7.65, but 7.650000000000000355271368.* So, that's what you'd end up with. And there's no way around that; it's inherent to the way float is stored.
However, there is a different type you can use for this: decimal.Decimal. For example:
>>> f = 7.65321
>>> s = '%.2f' % f
>>> d = decimal.Decimal(s)
>>> f, s, d
(7.65321, '7.65', Decimal('7.65'))
Or, of course, you could just pass around a string instead of a float (as you're accidentally doing in your code already), or you could remember to use the .2f format every time you want to output it.
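To see the point about binary rounding concretely, Decimal can display the exact value a float holds. This is a sketch assuming CPython on an IEEE-754 platform, as the footnote below also assumes:

```python
from decimal import Decimal

f = 7.65321
rounded = round(f, 2)            # still a float, so still binary

# Decimal(float) shows the exact binary value, digit for digit.
exact = Decimal(rounded)
print(exact)

# Building the Decimal from the formatted string keeps exactly two places.
d = Decimal('%.2f' % f)
print(d)
```

The first print shows a long tail of digits after 7.65, while the second shows 7.65 exactly.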
As a side note, since your DNumber ends up as a string, this line is not doing anything useful:
if DNumber <= 8:
In Python 2.x, comparing two values of different types gives you a consistent but arbitrary and meaningless answer. With CPython 2.x, it will always be False.** In a different Python 2.x implementation, it might be different. In Python 3.x, it raises a TypeError.
And changing it to this doesn't help in any way:
if DNumber <= float(8):
Now, instead of comparing a str to an int, you're comparing a str to a float. This is exactly as meaningless, and follows the exact same rules. (Also, float(8) means the same thing as 8.0, but less readable and potentially slower.)
For that matter, this:
if DNumber:
… is always going to be true. For a number, if foo checks whether it's non-zero. That's a bad idea for float values (you should check whether it's within some absolute or relative error range of 0). But again, you don't have a float value; you have a str. And for strings, if foo checks whether the string is non-empty. So, even if you started off with 0, your string "0.00" is going to be true.
* I'm assuming here that you're using CPython, on a platform that uses IEEE-754 double for its C double type, and that all those extra conversions back and forth between string and float aren't introducing any additional errors.
** The rule is, slightly simplified: If you compare two numbers, they're converted to a type that can hold them both; otherwise, if either value is None it's smaller; otherwise, if either value is a number, it's smaller; otherwise, whichever one's type has an alphabetically earlier name is smaller.
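The remarks above about truth-testing and comparing a formatted string can be checked with a short sketch. It assumes Python 3, where comparing a str with a float raises TypeError instead of returning an arbitrary answer:

```python
formatted = '%.2f' % 0.0         # the string '0.00'

string_truth = bool(formatted)          # non-empty string, so truthy
number_truth = bool(float(formatted))   # the number 0.0, so falsy
print(string_truth, number_truth)

try:
    comparison = formatted <= 8.0       # str compared with float
except TypeError:
    comparison = None                   # Python 3 refuses to order them

# Compare numbers with numbers, and format only for output.
number = float('7.65321')
is_small = number <= 8.0
print(comparison, is_small)
```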
I think you're trying to do the following - combine the formatting with the getter:
>>> a = 123.456789
>>> row = {'csvnum': a}
>>> print 'test {r[csvnum]:.2f} hello'.format(r=row)
test 123.46 hello
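Applied to the question's case, where the value comes out of a CSV row as text, a sketch might look like this. The row contents here are hypothetical stand-ins for the asker's data:

```python
# Hypothetical row; CSV readers return every field as a string.
row = {'csvnum': '7.65321', 'csvhello': 'Hello'}

# Keep a real float for comparisons, and round only when building the text.
value = float(row['csvnum'])
line = 'test {hello} {value:.2f} test'.format(hello=row['csvhello'], value=value)

print(line)
```

The .2f format spec needs a number, which is why the field is converted with float before formatting.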
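If your number always has a single digit before the decimal point (for example, a 7 followed by five decimals), you might want to try:
print "%r" % float(str(x)[:4])
where x is the float in question. Note that this truncates rather than rounds.
Example:
>>> x = 1.11111
>>> print "%r" % float(str(x)[:4])
1.11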

Displaying 6.5235375356299998e-07 without exponential notation

I have to convert exponential strings, like 6.5235375356299998e-07,
to a float value, and display the result of my computation like 0.00000065235...
How can I do this in a Python program?
6.5235375356299998e-07 is a perfectly legal float even if there is an e in it. You can do the whole calculation with it:
>>> 6.5235375356299998e-07 * 10000000
6.5235375356300001
>>> 6.5235375356299998e-07 + 10000000
10000000.000000652
In the second case, many digits will disappear because of the limited precision of Python's float.
If you need the string representation without e, try this:
>>> '{0:.20f}'.format(6.5235375356299998e-07)
'0.00000065235375356300'
but it will become a string, and you won't be able to do any calculations with it any more.
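Putting the two steps together: parse the exponential string into a float for the computation, and format without an exponent only when displaying. The Decimal variant is an addition not mentioned in the answer above; it is shown because it keeps the digits exactly as written:

```python
from decimal import Decimal

text = '6.5235375356299998e-07'

# Convert to float for arithmetic; format only for display.
value = float(text)
fixed = '{0:.20f}'.format(value)
print(fixed)

# Decimal keeps the written digits and can print them without an exponent.
exact = format(Decimal(text), 'f')
print(exact)
```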

Decimal alignment formatting in Python

This should be easy.
Here's my array (rather, a method of generating representative test arrays):
>>> ri = numpy.random.randint
>>> ri2 = lambda x: ''.join(ri(0,9,x).astype('S'))
>>> a = array([float(ri2(x)+ '.' + ri2(y)) for x,y in ri(1,10,(10,2))])
>>> a
array([ 7.99914000e+01, 2.08000000e+01, 3.94000000e+02,
4.66100000e+03, 5.00000000e+00, 1.72575100e+03,
3.91500000e+02, 1.90610000e+04, 1.16247000e+04,
3.53920000e+02])
I want a list of strings where '\n'.join(list_o_strings) would print:
   79.9914
   20.8
  394.0
 4661.0
    5.0
 1725.751
  391.5
19061.0
11624.7
  353.92
I want to space pad to the left and the right (but no more than necessary).
I want a zero after the decimal if that is all that is after the decimal.
I do not want scientific notation.
..and I do not want to lose any significant digits. (in 353.98000000000002 the 2 is not significant)
Yeah, it's nice to want..
Python 2.5's %g, %fx.x, etc. are either befuddling me, or can't do it.
I have not tried import decimal yet. I can't see that NumPy does it either (although the array.__str__ and array.__repr__ are decimal aligned, but sometimes return scientific notation).
Oh, and speed counts. I'm dealing with big arrays here.
My current solution approaches are:
to str(a) and parse off NumPy's brackets
to str(e) each element in the array and split('.') then pad and reconstruct
to a.astype('S'+str(i)) where i is the max(len(str(a))), then pad
It seems like there should be some off-the-shelf solution out there... (but not required)
The top suggestion fails when dtype is float64:
>>> a
array([ 5.50056103e+02, 6.77383566e+03, 6.01001513e+05,
3.55425142e+08, 7.07254875e+05, 8.83174744e+02,
8.22320510e+01, 4.25076609e+08, 6.28662635e+07,
1.56503068e+02])
>>> ut0 = re.compile(r'(\d)0+$')
>>> thelist = [ut0.sub(r'\1', "%12f" % x) for x in a]
>>> print '\n'.join(thelist)
  550.056103
 6773.835663
601001.513
355425141.8471
707254.875038
  883.174744
   82.232051
425076608.7676
62866263.55
  156.503068
Sorry, but after thorough investigation I can't find any way to perform the task you require without a minimum of post-processing (to strip off the trailing zeros you don't want to see); something like:
import re
ut0 = re.compile(r'(\d)0+$')
thelist = [ut0.sub(r'\1', "%12f" % x) for x in a]
print '\n'.join(thelist)
is speedy and concise, but breaks your constraint of being "off-the-shelf" -- it is, instead, a modular combination of general formatting (which almost does what you want but leaves trailing zeros you want to hide) and a RE to remove the undesired trailing zeros. Practically, I think it does exactly what you require, but your conditions as stated are, I believe, over-constrained.
Edit: original question was edited to specify more significant digits, require no extra leading space beyond what's required for the largest number, and provide a new example (where my previous suggestion, above, doesn't match the desired output). The work of removing leading whitespace that's common to a bunch of strings is best performed with textwrap.dedent -- but that works on a single string (with newlines) while the required output is a list of strings. No problem, we'll just put the lines together, dedent them, and split them up again:
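import re
import textwrap
ut0 = re.compile(r'(\d)0+$')  # trailing zeros, keeping the digit before them
a = [ 5.50056103e+02, 6.77383566e+03, 6.01001513e+05,
      3.55425142e+08, 7.07254875e+05, 8.83174744e+02,
      8.22320510e+01, 4.25076609e+08, 6.28662635e+07,
      1.56503068e+02]
# format wide, strip trailing zeros, then remove the common leading whitespace
thelist = textwrap.dedent(
    '\n'.join(ut0.sub(r'\1', "%20f" % x) for x in a)).splitlines()
print('\n'.join(thelist))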
emits:
      550.056103
     6773.83566
   601001.513
355425142.0
   707254.875
      883.174744
       82.232051
425076609.0
 62866263.5
      156.503068
Python's string formatting can either print out only the necessary decimals (with %g) or use a fixed number of decimals (with %f). However, you want to print out only the necessary decimals, except if the number is a whole number, in which case you want one decimal, and that makes it complex.
This means you would end up with something like:
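import math

def printarr(arr):
    for x in arr:
        if math.floor(x) == x:
            res = '%.1f' % x    # whole number: keep one zero after the point
        else:
            res = '%.10g' % x   # otherwise up to 10 significant digits
        # pad on the left so the decimal point lands in column 15
        print("%*s" % (15-res.find('.')+len(res), res))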
This first creates a string with one decimal if the value is a whole number, or with automatic decimals (up to 10 significant digits) if it is not. It then prints the string, padded so that the decimal points are aligned.
Probably, though, numpy actually does what you want, because you typically do want it to be in exponential mode if it's too long.
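Since the question asks for a list of strings padded on both sides but no more than necessary, the approach above could be adapted as follows. This is a sketch, not a tuned solution: align_decimals and its digits parameter are hypothetical names, and it assumes values of moderate magnitude so that %g never switches to exponent notation:

```python
import math

def align_decimals(values, digits=10):
    """Return strings padded so that their decimal points line up."""
    texts = []
    for x in values:
        if math.floor(x) == x:
            texts.append('%.1f' % x)            # whole number: one zero after the point
        else:
            texts.append('%.*g' % (digits, x))  # up to `digits` significant digits
    # Widest integer part, and widest fractional part including the point.
    left = max(t.find('.') for t in texts)
    right = max(len(t) - t.find('.') for t in texts)
    return [(' ' * (left - t.find('.')) + t).ljust(left + right) for t in texts]

data = [79.9914, 20.8, 394.0, 4661.0, 5.0, 1725.751,
        391.5, 19061.0, 11624.7, 353.92]
lines = align_decimals(data)
print('\n'.join(lines))
```

The padding width is computed from the data instead of being fixed at 15, which is what keeps the output from being wider than needed.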
