What's going on here?
>>> a = np.int8(1)
>>> a%2
1
>>> a = np.uint8(1)
>>> a%2
1
>>> a = np.int32(1)
>>> a%2
1
>>> a = np.uint32(1)
>>> a%2
1
>>> a = np.int64(1)
>>> a%2
1
>>> a = np.uint64(1)
>>> a%2
'1.0'
We suddenly get what appears to be a string containing the float 1.0!?
>>> a = np.uint64(1)
>>> type(a%2)
<type 'numpy.float64'>
...though it turns out it's simply a float.
What's the philosophy behind this?
I understand that numpy wants to be stricter about things like types and typing rules in order to be more efficient than plain Python, but in this case the downside of returning a very unexpected result to the user (likely breaking their program) seems to far outweigh the slight cost of just checking the sign of the modulus before wandering down this slippery path.
It's not too rare to be working with uint64 values. For example, if you ever load an image into a numpy int array and then sum it, you end up with uint64s. On the other hand, it's extremely rare to mod anything by a negative number (I've never done it except to see what would happen), because you generally mod things you can count, such as indices, and different languages/standards/libraries each have their own idea of what the result should be.
All this put together leaves me rather confused.
We suddenly get what appears to be a string containing the float 1.0!?
This is still a float64 - it just looks weird due to a bug in numpy 1.14.3, which is fixed in 1.15.0-dev.
You'd normally think that there are only two ways to convert to a string - __repr__ (tp_repr) and __str__ (tp_str).
It turns out that in python 2, there's one more - tp_print. This is only called when outputting directly to the console or the interpreter.
It turns out we implemented this wrong for only the interpreter. It's pretty tricky to test interpreter behavior in the test suite!
though it turns out it's simply a float.
This is sort of by design - 2 is inferred to be np.int64(2), and the common type of {int64, uint64} is float64 (chosen so that neither operand gets truncated). There are numerous issues about this, but it's tricky to fix.
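You can check the promotion rule directly with np.result_type (output shown on Python 2 to match the transcripts above):
>>> import numpy as np
>>> np.result_type(np.int64, np.uint64)
dtype('float64')
>>> type(np.uint64(1) % 2)
<type 'numpy.float64'>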
Related
I would like to know if there is a way to find out, at runtime, the maximum integer type (or unsigned integer, or float, or complex; any "fixed size" type) supported by numpy. That is, let's assume that I know (from the documentation) that the largest unsigned integer type in the current version of numpy is np.uint64, and that I have a line of code such as:
y = np.uint64(x)
I would like my code to use whatever is the largest, let's say, unsigned integer type available in the version of numpy that my code uses. That is, I would be interested in replacing the above hardcoded type with something like this:
y = np.largest_uint_type(x)
Is there such a method?
You can use np.sctypes:
>>> import numpy as np
>>> def largest_of_kind(kind):
...     return max(np.sctypes[kind], key=lambda x: np.dtype(x).itemsize)
...
>>> largest_of_kind('int')
<class 'numpy.int64'>
>>> largest_of_kind('uint')
<class 'numpy.uint64'>
>>> largest_of_kind('float')
<class 'numpy.float128'>
>>> largest_of_kind('complex')
<class 'numpy.complex256'>
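Applied to the line from the question, this becomes (x being whatever value you are converting):
>>> x = 42
>>> y = largest_of_kind('uint')(x)
>>> type(y)
<class 'numpy.uint64'>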
While I do like @PaulPanzer's solution, I also found that numpy defines a function maximum_sctype(), not documented in numpy's standard docs. It fundamentally does the same thing as @PaulPanzer's solution (plus some edge-case analysis). From the code it is clear that the sctype types are sorted in increasing size order. Using this function, what I need can be done as follows:
y = np.maximum_sctype(np.float)(x) # currently np.float128 on OSX
y = np.maximum_sctype(np.uint8)(x) # currently np.uint64
etc.
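(A portability note: newer NumPy removed np.float, so on recent versions you would pass the builtin float instead; and NumPy 2.0 removed np.maximum_sctype and np.sctypes altogether, so both this answer and the one above only work on older releases.)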
Not so elegant, but using the prior knowledge that np.uint widths are always powers of 2, you can do something like this:
import numpy as np

for i in range(4, 100):
    try:
        eval('np.uint' + str(2**i) + '(0)')
    except AttributeError:
        c = i - 1
        break
answer = 'np.uint' + str(2**c)
>>> answer
'np.uint64'
and you can use it as
y = eval(answer + '(' + str(x) + ')')
or, alternatively, without the power-of-two assumption and with no eval (check all widths up to N, here 1000):
for i in range(1000):
    if hasattr(np, 'uint' + str(i)):
        x = 'uint' + str(i)
>>> x
'uint64'
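If you want the actual type object rather than its name, the same hasattr check can be combined with getattr (a sketch, checking the power-of-two widths explicitly):
import numpy as np

largest = None
for bits in (8, 16, 32, 64, 128):
    name = 'uint' + str(bits)
    if hasattr(np, name):
        # keep the widest uintN this build defines
        largest = getattr(np, name)

print(largest)  # <class 'numpy.uint64'> on typical builds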
Context
We display percentage values to agents in our app without trailing zeros (50% is much easier to scan quickly than 50.000%), and hitherto we've just used quantize to brute-force normalize the value and strip the trailing zeros.
This morning I decided to look into using Decimal.normalize instead, but ran into this:
Given the decimal value:
>>> value = Decimal('50.000')
Normalizing that value:
>>> value = value.normalize()
Results in:
>>> value
Decimal('5E+1')
I understand the value is the same:
>>> Decimal('5E+1') == Decimal('50')
True
But from a non-technical user's perspective, 5E+1 is basically meaningless.
Question
Is there a way to convert Decimal('5E+1') to Decimal('50')?
Note
I'm not looking to do anything that would change the value of the Decimal (e.g., removing decimal places altogether), since the value could be e.g., Decimal('33.333'). IOW, don't confuse my 50.000 example as meaning that we're only dealing with whole numbers.
For the purposes of output formatting, you can print your normalized Decimal objects with the f format specifier. (While the format string docs say this defaults to a precision of 6, this does not appear to be the case for Decimal objects.)
>>> print('{:f}%'.format(decimal.Decimal('50.000').normalize()))
50%
>>> print('{:f}%'.format(decimal.Decimal('50.003').normalize()))
50.003%
>>> print('{:f}%'.format(decimal.Decimal('1.23456789').normalize()))
1.23456789%
If for some reason, you really want to make a new Decimal object with different precision, you can do that by just calling Decimal on the f format output, but it sounds like you're dealing with an output format problem, not something you should change the internal representation for.
>>> Decimal('{:f}'.format(Decimal('5E+1')))
Decimal('50')
>>>
>>> Decimal('{:f}'.format(Decimal('50.000').normalize()))
Decimal('50')
>>> Decimal('{:f}'.format(Decimal('50.003').normalize()))
Decimal('50.003')
>>> Decimal('{:f}'.format(Decimal('1.23456789').normalize()))
Decimal('1.23456789')
According to the Python 3.9 docs, the below is how to do it: https://docs.python.org/3.9/library/decimal.html#decimal-faq
def remove_exponent(d):
    return d.quantize(Decimal(1)) if d == d.to_integral() else d.normalize()
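For example, with the values from the question (the function as given, plus the import it needs):
>>> from decimal import Decimal
>>> def remove_exponent(d):
...     return d.quantize(Decimal(1)) if d == d.to_integral() else d.normalize()
...
>>> remove_exponent(Decimal('50.000'))
Decimal('50')
>>> remove_exponent(Decimal('33.333'))
Decimal('33.333')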
Add Decimal(0) to your result.
Decimal('50.000').normalize()
# Decimal('5E+1')
Decimal('50.000').normalize() + Decimal(0)
# Decimal('50')
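This works because decimal addition keeps the smaller of the two operands' exponents: Decimal('5E+1') has exponent 1 and Decimal(0) has exponent 0, so the sum is rendered with exponent 0, i.e. Decimal('50'). Non-integral values pass through unchanged; Decimal('33.333').normalize() + Decimal(0) is still Decimal('33.333').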
I have created the following snippet of code, and I am trying to convert my 5 dp DNumber to a 2 dp one and insert it into a string. However, whichever method I try, the value always seems to revert to the original number of decimal places (5).
Code snippet below:
if key == (1, 1):
    DNumber = '{r[csvnum]}'.format(r=row)
    # returns 7.65321
    DNumber = """%.2f""" % (float(DNumber))
    # returns 7.65
    Check2 = False
    if DNumber:
        if DNumber <= float(8):
            Check2 = True
    if Check2:
        print DNumber
        # returns 7.65
    string = 'test {r[csvhello]} TESTHERE test'.format(r=row).replace("TESTHERE", str("""%.2f""" % (float(gtpe))))
    # returns: test Hello 7.65321 test
    string = 'test {r[csvhello]} TESTHERE test'.format(r=row).replace("TESTHERE", str(DNumber))
    # returns: test Hello 7.65321 test
What I hoped it would return: test Hello 7.65 test
Any Ideas or suggestion on alternative methods to try?
It seems like you were hoping that converting the float to a 2-decimal-place string and then back to a float would give you a 2-decimal-place float.
The first problem is that your code doesn't actually do that anywhere. If you'd done that, you would get something very close to 7.65, not 7.65321.
But the bigger problem is that what you're trying to do doesn't make any sense. A float always has 53 binary digits, no matter what. If you round it to two decimal digits (no matter how you do it, including by converting to string and back), what you actually get is a float rounded to two decimal digits and then rounded to 53 binary digits. The closest float to 7.65 is not exactly 7.65, but 7.650000000000000355271368.* So, that's what you'd end up with. And there's no way around that; it's inherent to the way float is stored.
However, there is a different type you can use for this: decimal.Decimal. For example:
>>> f = 7.65321
>>> s = '%.2f' % f
>>> d = decimal.Decimal(s)
>>> f, s, d
(7.65321, '7.65', Decimal('7.65'))
Or, of course, you could just pass around a string instead of a float (as you're accidentally doing in your code already), or you could remember to use the .2f format every time you want to output it.
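For the exact output you wanted, formatting at the point of output looks like this:
>>> f = 7.65321
>>> 'test Hello %.2f test' % f
'test Hello 7.65 test'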
As a side note, since your DNumber ends up as a string, this line is not doing anything useful:
if DNumber <= 8:
In Python 2.x, comparing two values of different types gives you a consistent but arbitrary and meaningless answer. With CPython 2.x, it will always be False.** In a different Python 2.x implementation, it might be different. In Python 3.x, it raises a TypeError.
And changing it to this doesn't help in any way:
if DNumber <= float(8):
Now, instead of comparing a str to an int, you're comparing a str to a float. This is exactly as meaningless, and follows the exact same rules. (Also, float(8) means the same thing as 8.0, but less readable and potentially slower.)
For that matter, this:
if DNumber:
… is always going to be true. For a number, if foo checks whether it's non-zero. That's a bad idea for float values (you should check whether it's within some absolute or relative error range of 0). But again, you don't have a float value; you have a str. And for strings, if foo checks whether the string is non-empty. So, even if you started off with 0, your string "0.00" is going to be true.
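To see that concretely:
>>> bool("0.00")
True
>>> bool("")
False
>>> bool(0.0)
False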
* I'm assuming here that you're using CPython, on a platform that uses IEEE-754 double for its C double type, and that all those extra conversions back and forth between string and float aren't introducing any additional errors.
** The rule is, slightly simplified: If you compare two numbers, they're converted to a type that can hold them both; otherwise, if either value is None it's smaller; otherwise, if either value is a number, it's smaller; otherwise, whichever one's type has an alphabetically earlier name is smaller.
I think you're trying to do the following - combine the formatting with the getter:
>>> a = 123.456789
>>> row = {'csvnum': a}
>>> print 'test {r[csvnum]:.2f} hello'.format(r=row)
test 123.46 hello
If your number is a 7 followed by five digits, you might want to try:
print "%r" % float(str(x)[:4])
where x is the float in question.
Example:
>>> x = 1.11111
>>> print "%r" % float(str(x)[:4])
1.11
My Python code was doing something strange to me (or my numbers, rather):
a)
float(poverb.tangibles[1])*1000
1038277000.0
b)
float(poverb.tangibles[1]*1000)
inf
Which led to discovering that:
long(poverb.tangibles[1]*1000)
produces the largest number I've ever seen.
Uhhh, I didn't read the whole Python tutorial or its docs. Did I miss something critical about how float works?
EDIT:
>>> poverb.tangibles[1]
u'1038277'
What you probably missed is docs on how multiplication works on strings. Your tangibles list contains strings. tangibles[1] is a string. tangibles[1]*1000 is that string repeated 1000 times. Calling float or long on that string interprets it as a number, creating a huge number. If you instead do float(tangibles[1]), you only get the actual number, not the number repeated 1000 times.
What you are seeing is just the same as what goes on in this example:
>>> x = '1'
>>> x
'1'
>>> x*10
'1111111111'
>>> float(x)
1.0
>>> float(x*10)
1111111111.0
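The practical fix is to convert each string to a number before doing any arithmetic on it (a sketch; tangibles stands in for your list of numeric strings):
>>> tangibles = [u'500', u'1038277']
>>> values = [float(t) for t in tangibles]
>>> values[1] * 1000
1038277000.0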
I'm making a program that, for reasons not needed to be explained, requires a float to be converted into a string to be counted with len(). However, str(float(x)) results in x being rounded when converted to a string, which throws the entire thing off. Does anyone know of a fix for it?
Here's the code being used if you want to know:
len(str(float(x)/3))
Some form of rounding is often unavoidable when dealing with floating point numbers. This is because numbers that you can express exactly in base 10 cannot always be expressed exactly in base 2 (which your computer uses).
For example:
>>> .1
0.10000000000000001
In this case, you're seeing .1 converted to a string using repr:
>>> repr(.1)
'0.10000000000000001'
I believe python chops off the last few digits when you use str() in order to work around this problem, but it's a partial workaround that doesn't substitute for understanding what's going on.
>>> str(.1)
'0.1'
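(Worth knowing: since Python 2.7 and 3.1, repr picks the shortest string that round-trips back to the same float, so repr(.1) is just '0.1' on modern Pythons; the underlying binary value is unchanged, though.)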
I'm not sure exactly what problems "rounding" is causing you. Perhaps you would do better with string formatting as a way to more precisely control your output?
e.g.
>>> '%.5f' % .1
'0.10000'
>>> '%.5f' % .12345678
'0.12346'
Documentation here.
len(repr(float(x)/3))
However, I must say that this isn't as reliable as you might think.
Floats are entered/displayed as decimal numbers, but your computer (in fact, your standard C library) stores them as binary. You get some side effects from this transition:
>>> print len(repr(0.1))
19
>>> print repr(0.1)
0.10000000000000001
The explanation of why this happens is in this chapter of the python tutorial.
A solution would be to use a type that specifically tracks decimal numbers, like python's decimal.Decimal:
>>> print len(str(decimal.Decimal('0.1')))
3
Other answers already pointed out that the representation of floating numbers is a thorny issue, to say the least.
Since you don't give enough context in your question, I cannot know if the decimal module can be useful for your needs:
http://docs.python.org/library/decimal.html
Among other things, you can explicitly specify the precision that you wish to obtain (from the docs; the examples there assume from decimal import *, so Decimal, getcontext, and ROUND_UP are all in scope):
>>> getcontext().prec = 6
>>> Decimal('3.0')
Decimal('3.0')
>>> Decimal('3.1415926535')
Decimal('3.1415926535')
>>> Decimal('3.1415926535') + Decimal('2.7182818285')
Decimal('5.85987')
>>> getcontext().rounding = ROUND_UP
>>> Decimal('3.1415926535') + Decimal('2.7182818285')
Decimal('5.85988')
A simple example from my prompt (python 2.6):
>>> import decimal
>>> a = decimal.Decimal('10.000000001')
>>> a
Decimal('10.000000001')
>>> print a
10.000000001
>>> b = decimal.Decimal('10.00000000000000000000000000900000002')
>>> print b
10.00000000000000000000000000900000002
>>> print str(b)
10.00000000000000000000000000900000002
>>> len(str(b/decimal.Decimal('3.0')))
29
Maybe this can help?
decimal is in python stdlib since 2.4, with additions in python 2.6.
Hope this helps,
Francesco
I know this is too late, but for those who are coming here for the first time, I'd like to post a solution. I have a float value index and a string imgfile, and I had the same problem as you. This is how I fixed the issue:
index = 1.0
imgfile = 'data/2.jpg'
out = '%.1f,%s' % (index,imgfile)
print out
The output is
1.0,data/2.jpg
You may modify this formatting example as per your convenience.