Casting float to int truncates the rest; doesn't seem reliable

I am somewhat surprised to see that things like this work:
float f = 10.25f;
int i = (int)f;
// Will give you i = 10
What is the gain?
On the other hand, 10.25 is quite a different thing from 10, and, as most will agree, bad things might happen from such a silent conversion.
Which languages raise an error instead?
I would expect something like: "Error: Can't represent 10.25 as an integer".
Regarding the answers given in the meantime: yes, it might be considered reliable in the way a function like round is reliable, but not with respect to the integrity of the data/information one would expect a cast to preserve.
Maybe a function truncate, defaulting to the behavior of (int), would be a better choice?

It is precisely the
(int)f
that tells us the programmer is aware of what he is doing, whereas silently cutting off the fractional part and storing the rest in an integer, without an explicit cast, is forbidden in most programming languages.
By the way, it is not just that the fractional part is cut off. A floating point number can also have a value so large that it can't possibly be represented as an int. Consider:
(int) 1e20f
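By contrast, here is a minimal Python sketch (mine, not part of the original answer) of the same truncating conversion; Python also requires the conversion to be explicit and truncates toward zero, though its arbitrary-precision int removes the overflow hazard:
import math

f = 10.25
print(int(f))         # 10 -- truncates toward zero, like C's (int) cast
print(int(-10.25))    # -10 -- truncation, not floor
print(math.trunc(f))  # 10 -- the explicit "truncate" function the question asks for
print(int(1e20))      # no overflow here: Python ints are arbitrary-precision
# int(float("inf"))   # would raise OverflowError instead of undefined behavior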

The statement int i = (int)f; explicitly says "Please take my float f and make it into an int". This is certainly something I quite often find useful - why wouldn't you want to be able to convert a float value from some calculation to an integer? The cast (int) tells the compiler "I really want this to be an integer", just like in C you can do char *p = (char *)1234567; - a typecast is there to tell the compiler "I really know what I'm doing".
If you do int i = f; or int i = 10.25; the compiler will still do what you "asked for" - convert the float value to an integer. It will probably issue a warning to say "You are converting a float to int", if you enable the appropriate warnings.
C and C++ are languages that require you to understand what you are doing and what the consequences are. Some other languages put more "barriers" in place to prevent such things, but that often means the compiler has to add extra code to check things at runtime; C and C++ are designed to be "fast" languages.
It's a bit like driving a car: putting the car in reverse when there is a wall right behind it and stepping on the gas will probably cause the car to crash into the wall. If that's not what you want, then "don't do that".

Firstly, the conversion is most definitely "reliable", as in "it will always do the same thing".
Whether you want to do that or not is up to you. In general the C/C++ languages are designed to give the programmer a lot of low-level power, and that means that the programmer needs to know what they are doing. If a float-to-int conversion surprises you then you need to think harder.
In fact, GCC has an option -Wconversion that will highlight cases like this. It isn't enabled by default, and is not part of -Wall or -Wextra (presumably because the behaviour is well understood and "expected" by most programmers), but the option is there if you need it.
Except that it won't give a warning in this case, because the code includes an explicit cast (int), so the compiler assumes you did it deliberately.
This gives a warning (with -Wconversion):
int i = f;
This does not:
int i = (int)f;

Converting to an integer is useful when you are working with complex data but ultimately need an int to do something with it. Think of offsets into arrays, or pixels on a screen.
Take drawing a circle on the screen: there is no such thing as a fraction of a pixel (so the coordinates are ints), but you cannot calculate a pixel's coordinates with ints alone (sine works with pi and other floats).
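A minimal sketch of that circle example (function name and parameters are mine, purely illustrative): the trigonometry happens in floats, and only the final pixel coordinates become ints:
import math

def circle_pixels(cx, cy, r, steps=360):
    """Integer pixel coordinates on a circle, computed with float math."""
    pixels = []
    for step in range(steps):
        angle = 2 * math.pi * step / steps
        x = cx + r * math.cos(angle)  # float intermediates...
        y = cy + r * math.sin(angle)
        pixels.append((int(round(x)), int(round(y))))  # ...int pixels at the end
    return pixels

print(circle_pixels(100, 100, 50)[:3])  # [(150, 100), (150, 101), (150, 102)]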


How should I type-hint an integer variable that can also be infinite?

Searching for this topic I came across the following: How to represent integer infinity?
I agree with Martijn Pieters that adding a separate special infinity value for int may not be the best of ideas.
However, this makes type hinting difficult. Assume the following code:
myvar = 10 # type: int
myvar = math.inf # <-- raises a typing error because math.inf is a float
However, the code behaves just the way it should everywhere, and my type hinting is correct everywhere else.
If I write the following instead:
myvar = 10 # type: Union[int, float]
I can assign math.inf without a hitch. But now any other float is accepted as well.
Is there a way to properly constrain the type-hint? Or am I forced to use type: ignore each time I assign infinity?
The super lazy (and probably incorrect) solution:
Rather than adding a specific value, the int class can be extended via subclassing. This approach is not without pitfalls and challenges, such as the requirement to handle the infinity value in the various __dunder__ methods (__add__, __mul__, __eq__ and the like, all of which should be tested). If that is an unacceptable amount of overhead for a use case where only a specific value is required, wrapping the desired value with typing.cast (i.e. inf = cast(int, math.inf)) can instead tell the type-hinting system that this specific value is acceptable for assignment.
The reason this approach is incorrect is simply this: since the assigned value looks/feels exactly like some number, other users of your API may inadvertently end up using it as an int, and the program may then explode badly on them when math.inf (or a variation of it) is provided.
An analogy: given that list items are indexed by non-negative integers, we would expect any function that returns an index to some item to return a non-negative integer, so we may use it directly (I know this is not the case in Python, which has semantics allowing negative index values, but pretend we are working with, say, C for the moment). Say this function returns the first occurrence of the matched item, but on any error it returns some negative number, which clearly exceeds the range of valid values for an index. This lack of guarding against naive usage of the returned value will inevitably result in exactly the problems a type system is supposed to solve.
In essence, creating surrogate values and marking them as int offers zero value, and it inevitably allows unexpected and broken API behavior to be exhibited by the program, because incorrect usage is automatically allowed.
Not to mention that infinity is not a number, so no int value can properly represent it (an int, by its very nature, represents some finite number).
As an aside, check out str.index vs. str.find. One of these has a return value that definitely violates user expectations (it exceeds the boundaries of the type "non-negative integer"; you won't be told at compile time that the return value may be invalid for the context in which it is used, which results in potential random failures at runtime).
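A quick demonstration:
s = "hello"
print(s.find("z"))   # -1 -- an int, but not a valid index into s
# s.index("z")       # raises ValueError instead of returning a sentinel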
Framing the question/answer in more correct terms:
The problem is really about assigning some integer when a rate exists and, when none exists, some other token that represents unboundedness for the particular use case (it could be some built-in value such as NotImplemented or None). However, as those tokens would not be int values either, myvar would actually need a type that encompasses them, along with a way to apply operations that do the right thing.
This unfortunately isn't directly available in Python in a very nice way. In strongly, statically typed languages like Haskell, the more accepted solution is to use a Maybe type to define a number type that can accept infinity. Note that while floating-point infinity is also available there, it inherits all the problems of floating-point numbers that make it an untenable solution (again, don't use inf for this).
Back to Python: depending on the properties of the assignment you actually want, it could be as simple as creating a class whose constructor accepts either an int or None (or NotImplemented), and then providing a method through which users of the class can get at the actual value. Python unfortunately does not provide the advanced constructs to make this elegant, so you will inevitably end up with code managing this splattered all over the place, or you will have to write a number of methods that handle whatever input is expected and produce the required output in the specific ways your program actually needs.
Unfortunately, type hinting really only scratches the surface, simply grazing over what more advanced languages have provided and solved at a more fundamental level. I suppose if one must program in Python, it is better than not having it.
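As a minimal sketch of that last suggestion (class and method names are mine, purely illustrative), with None standing in for "unbounded":
from typing import Optional

class Bound:
    """An int-or-unbounded limit; None means 'no limit'."""

    def __init__(self, value: Optional[int] = None) -> None:
        self.value = value

    def allows(self, n: int) -> bool:
        # An unbounded limit allows everything; a bounded one compares.
        return self.value is None or n <= self.value

print(Bound().allows(10**9))   # True -- unbounded
print(Bound(100).allows(101))  # False -- bounded at 100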
Facing the same problem, I "solved" it as follows.
from typing import Union
import math

Ordinal = Union[int, float]  # int or infinity

def fun(x: Ordinal) -> Ordinal:
    if x > 0:
        return x
    return math.inf
Formally, it does exactly what you did not want. But now the intent is clearer: when the user sees Ordinal, he knows it is expected to be int or math.inf, and the linter is happy.

Questions on how ctypes (and potentially the Python implementation) handles data types?

The following is my source code and its output from the Win10 command line.
from ctypes import *
class eH():
    x = c_uint(0)
    print(type(x) == c_uint)

print(type(eH.x) == c_uint)
print(eH.x)
print(type(eH.x))
print(type(eH.x) == c_ulong)
print(c_uint == c_ulong)
print(c_int == c_long)
print("\nEnd of eH prints\n#####")

class eHardware(Structure):
    _fields_ = [("xyz", c_uint)]

a = eHardware()
a.xyz = eH.x
print(a.xyz)
print(a.xyz == eH.x)
print(type(a.xyz))
print(type(c_uint(a.xyz)))
The command line output is in the link: https://pastebin.com/umWUDEuy
First thing I noticed is that c_uint == c_ulong outputs True. Does that mean ctypes dynamically assigns types on the fly and treats them as the same in memory? Would this design have any implications if I want to port a similar script to a type-sensitive language, say C?
Second, in line 17 I assign a.xyz = eH.x, but in line 19 a.xyz == eH.x evaluates to False. Also, the type of a.xyz has been converted to int, while eH.x is of type c_uint (or c_ulong, which is what type() always evaluates to).
Thanks in advance for the responses.
First thing I noticed is that c_uint == c_ulong outputs True.
This is explained right at the top of the docs:
Note: Some code samples reference the ctypes c_int type. On platforms where sizeof(long) == sizeof(int) it is an alias to c_long. So, you should not be confused if c_long is printed if you would expect c_int — they are actually the same type.
If you're wondering why it does this, it's to improve interaction with C code.
C is a weakly typed language: int and long are always distinct types, but you can always implicitly convert between them via the complicated integer promotion and narrowing rules.[1] On many platforms, int and long happen to both be 32 bits, so these rules don't matter much, but on other platforms long is 64 bits,[2] so they do. That makes it really easy to write code that works on your machine but segfaults on someone else's by screwing up the stack (possibly even in a way that can be exploited by attackers).
ctypes attempts to rein this in by explicitly defining that c_int is an alias of c_long if and only if they're the same size. So:
If you're careful to always use c_int when the C function you're calling wants int and c_long when it wants long, your code will be portable, just like in C.
If you mix and match them arbitrarily, and that happens to be safe on your machine, it'll work on your machine, just like in C.
If you mix and match them arbitrarily, and then try to run them on a machine where that isn't safe, you should get an exception out of ctypes rather than a segfault.
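You can check which situation a given machine is in directly (a minimal sketch):
import ctypes

# Where C's int and long have the same size, c_int is literally the same
# class object as c_long; where long is wider, they are distinct classes.
print(ctypes.sizeof(ctypes.c_int), ctypes.sizeof(ctypes.c_long))
print(ctypes.c_int is ctypes.c_long)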
Does that mean ctypes dynamically assigns types on the fly and treats them as the same in memory?
I suppose it depends on what you mean by "on the fly". If you look at the source, you'll see that when the module is compiled, it does this:
if _calcsize("i") == _calcsize("l"):
    # if int and long have the same size, make c_int an alias for c_long
    c_int = c_long
    c_uint = c_ulong
else:
    class c_int(_SimpleCData):
        _type_ = "i"
    _check_size(c_int)

    class c_uint(_SimpleCData):
        _type_ = "I"
    _check_size(c_uint)
Of course, usually, when you import ctypes, you're getting a pre-compiled ctypes.pyc file,[3] so the definition of c_int, one way or the other, is frozen into that .pyc. In that sense, you don't have to worry about it being dynamic. But you can always delete the .pyc file, or tell Python not to use .pyc files at all. Or you can even monkeypatch ctypes.c_int to be something else, if you really want to. So, in that sense, it's definitely dynamic if you want it to be.[4]
Would this design have any implications if I want to port a similar script to a type-sensitive language, say C?
Well, the whole point of the design is to match C (and, in particular, the implementation-defined details of the C compiler used to build your CPython interpreter) as closely as possible while at the same time working around a few of the pitfalls of dealing with C. So, it's pretty rare that you design an interface with ctypes and then implement it with C; usually it's the other way around. But occasionally, it does happen (usually something to do with multiprocessing shared memory mapped to numpy arrays…).
In that case, just follow the same rules: keep c_int and c_long straight in your Python code and match them to int and long in your C code, and things will work. You will definitely want to enable (and read) warnings in your C compiler to catch the places where you've mixed them up. And be prepared for occasional segfaults or memory corruption during debugging; but then you always need to be prepared for that in C.[5]
Also, the type of a.xyz has been converted to int, while eH.x is of type c_uint
The conversions to native types when you access struct members, pass arguments into C functions, get return values back, etc. are pretty complicated. 95% of the time it just does what you want, and it's better not to worry about it.
The first time you hit the other 5% (usually because you have a c_char_p that you want to treat as a pointer rather than a string…), there's really no substitute for reading through the docs and learning about the default conversions and _as_parameter_ and the _CData classes and restype vs. errcheck and so on. And doing a bit of experimentation in the interactive interpreter, and maybe reading the source.[6]
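To make the member-access conversion concrete, here is a minimal sketch (with a hypothetical struct mirroring the question's): the field is declared as c_uint, but reading it back yields a plain Python int, which is also why a.xyz == eH.x compares an int against a c_uint instance and yields False:
from ctypes import Structure, c_uint

class Hardware(Structure):
    _fields_ = [("xyz", c_uint)]

hw = Hardware()
hw.xyz = 42                  # the Python int is converted into the C field...
print(type(hw.xyz))          # <class 'int'> -- ...and converted back on access
print(hw.xyz == c_uint(42))  # False: c_uint instances don't compare equal to ints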
[1] Most modern compilers will warn about narrowing conversions, and even let you optionally turn them into errors.
[2] In the old days, when ctypes was first designed, it was more common for int to be 16 bits, but the effect is the same.
[3] If you use a Windows or Mac Python installer, or an RPM or DEB binary package, or a Python that came preinstalled on your system, the stdlib was almost always compiled at the time of building the binary package, on someone else's machine. If you build from source, it's usually compiled at build or install time on your machine. If not, it usually gets compiled the first time you import ctypes.
[4] Although I don't know why you'd want it to be. It's easier to just define your own type with a different name…
[5] You might want to consider using a language that's statically typed and C-compatible, but with a much stricter and stronger type system than C, like Rust, or at least C++ or D. Then the compiler can do a lot more to help you make sure you're getting things right. But the tradeoffs here are the same as they always are in choosing between C and another language; there's nothing ctypes-specific involved.
[6] And finally throwing your hands in the air and declaring that from now on you're only ever going to use cffi instead of ctypes, which lasts until the first time you run into one of cffi's quirks…

Why no optimization of Python 3 range object for floats?

Jumping off from a previous question I asked a while back:
Why is 1000000000000000 in range(1000000000000001) so fast in Python 3?
If you do this:
1000000000000000.0 in range(1000000000000001)
...it is clear that range has not been optimized to check if floats are within the specified range.
I think I understand that the intended purpose of range is to work with ints only - so you cannot, for example, do something like this:
1000000000000 in range(1000000000000001.0)
# error: float object cannot be interpreted as an integer
Or this:
1000000000000 in range(0, 1000000000000001, 1.0)
# error: float object cannot be interpreted as an integer
However, the decision was made, for whatever reason, to allow things like this:
1.0 in range(1)
It seems clear that 1.0 (and 1000000000000000.0 above) are not being coerced into ints, because then the int optimization would work for them as well.
My question is: why the inconsistency, and why no optimization for floats? Or, alternatively, what is the rationale for the above code not producing the same error as the previous examples?
This seems like an obvious optimization to include in addition to the optimization for ints. I'm guessing there are some nuanced issues preventing a clean implementation of such an optimization, or alternatively there is some kind of rationale for why you would not actually want to include it. Or possibly both.
EDIT: To clarify the issue here a bit, all of the following statements evaluate to False as well:
3.2 in range(5)
'' in range(1)
[] in range(1)
None in range(1)
This seems like unexpected behavior to me, but so far there is definitely no inconsistency. However, the following evaluates to True:
1.0 in range(2)
And as shown previously, constructions similar to the above have not been optimized.
This does seem inconsistent: at some point in the evaluation, the value 1.0 (or 1000000000000000.0 as in my original example) is being coerced into an int. This makes sense, since it is natural to convert a float ending in .0 to an int. However, the question still remains: if it is being converted to an int anyway, why has 1000000000000000.0 in range(1000000000000001) not been optimized?
There is no inconsistency here. Floating point values can't be coerced to integers; that only works the other way around. As such, range() won't implicitly convert floats to integers when testing for containment either.
A range() object is a sequence type; it contains discrete integer values (albeit virtually). As such, it has to support containment testing for any object that may test as equal. The following works too:
>>> class ThreeReally:
...     def __eq__(self, other):
...         return other == 3
...
>>> ThreeReally() in range(4)
True
This has to do a full scan over all the values in the range, testing each contained integer for equality.
The optimisation can be applied only to actual integers, as that is the only type for which the range() object can know what values will be considered equal without conversion.
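One way to see that only real ints take the fast path is to time both forms (a minimal sketch; the range is kept small because the float case degrades to a linear scan):
import timeit

r = range(10**6)
# int containment is O(1), computed arithmetically from start/stop/step.
print(timeit.timeit(lambda: 999999 in r, number=10))
# float containment falls back to scanning the range for an equal element.
print(timeit.timeit(lambda: 999999.0 in r, number=10))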

How to unify single values and sequences/tuples?

In my work, my Python scripts get a lot of input from non-Python-professional users.
So, for example, if my function needs to handle both single values and a series of values (polymorphism/duck typing, yo!), I would like to do something like this pseudo-code:
def duck_typed(args):
    """
    args - Give me a single integer or a list of integers to work on.
    """
    for li in args:
        <do_something>
If the user passes me a list:
[1,2]
a tuple:
(1,2)
or even a singleton tuple (terminology help here, please):
(1,)
everything works as expected. But as soon as the user passes in a single integer, everything goes to $%^&:
1
TypeError: 'int' object is not iterable
The Difference between Smart and Clever
Now, I immediately think, "No problem, I just need to get clever!" and I do something like this:
def duck_typed(args):
    """
    args - Give me a single integer or a list of integers to work on.
    """
    args = (args,)  # <= Ha ha! I'm so clever!
    for li in args:
        <do_something>
Well, now it works for the single integer case:
args = (1,)
(1,)
but when the user passes in an iterable, the $%^&*s come out again. My trick gives me a nested iterable:
args = ((1,2),)
((1,2),)
ARGH!
The Usual Suspects
There are of course the usual workarounds. Try/except clauses:
try:
    args = tuple(args)
except TypeError:
    args = tuple((args,))
These work, but I run into this issue A LOT. This is really a 1-line problem, and try/except is a 4-line solution. I would really love it if I could just call:
tuple(1)
have it return (1,) and call it a day.
Other People Use this Language Too, You Know
Now I'm aware that my needs in my little corner of the Python programming universe don't apply to the rest of the Python world. Being dynamically typed makes Python such a wonderful language to work in -- especially after years of work in neurotic languages such as C. (Sorry, sorry. I'm not bashing C. It's quite good at what it's good at, but you know: xkcd)
I'm quite sure the creators of the Python language have a very good reason to not allow tuple(1).
Question 1
Will someone please explain why the creators chose not to allow tuple(1) and/or list(1) to work? I'm sure it's a completely sane reason and bloody obvious to many; I may have just missed it during my tenure at the School of Hard Knocks. (It's on the other side of the hill from Hogwarts.)
Question 2
Is there a more practical -- hopefully 1-line -- way to make the following conversion?
X(1) -> (1,)
X((1,2)) -> (1,2)
If not, I guess I could just break down and roll my own.
Duck typing explains and validates why list(1) fails.
The method was expecting a Duck but was given a Hamster, and Hamsters can't swim.[1]
In duck typing, a programmer is only concerned with ensuring that objects behave as demanded of them in a given context, rather than ensuring that they are of a specific type.
But not all objects/types behave the same, or "as demanded". In this case, an integer does not behave like an iterable and causes an exception. However, list("quack") works precisely because a string does act like an iterable and goes Quack: ['q','u','a','c','k']! Making list accept a non-iterable would actually mean special-casing, not duck typing.
Expecting an integer to "be iterable" sounds like a design issue, because it requires an implicit change in multiplicity. That is, the concepts of a value and a sequence of [zero or more] values should be kept separate. Polymorphism doesn't apply in this case, as polymorphism (of any kind) only works over unification, and there is no unification in "iterating a non-iterable".
Furthermore, having list(itr) accept only an iterable fits the strongly typed Python model and avoids edge cases. Consider that if list(x) were written so that it also allowed a non-iterable, one could not determine whether the result would be [x] or [x0..xn] without knowing the value supplied. Python simply forbids this operation and puts the burden of changing multiplicity - that is, of passing a Duck - on the calling code.
See In Python, how do I determine if an object is iterable? which presents several solutions to wrap (or otherwise deal with) a non-iterable value.
While I do not recommend this approach, as it changes the multiplicity, I would likely write such a coercion function as follows. Unlike isinstance checks, it will handle all non-iterable values. However, you will have to work out the rules for what should happen on ensure_iterable(None).
def ensure_iterable(x):
    try:
        return iter(x)
    except TypeError:
        return (x,)
And then "one line" usage:
for li in ensure_iterable(args):
    pass
[1] Hamsters can swim .. or at least stay afloat for a little bit. However, I find the analogy apt (and more memorable) precisely because a wet/drowning hamster is a sad thought. Keep those little critters safe and dry!
Try this:
if isinstance(variable, int):
    variable = (variable,)
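A quick check of that approach, wrapped in a (hypothetical) helper; note that unlike the iter-based ensure_iterable above, it deliberately special-cases int only:
def as_tuple(variable):
    # Wrap a bare int in a singleton tuple; pass everything else through.
    if isinstance(variable, int):
        variable = (variable,)
    return variable

print(as_tuple(1))       # (1,)
print(as_tuple((1, 2)))  # (1, 2)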

Floating point precision causes errors during comparison

I am using Python 2.7.6. When I type this into the interpreter:
>>> 0.135-0.027
0.10800000000000001
Whereas it should be just
0.108
This causes problems when comparing things. For example, I want to compare:
>>> 0.135-0.027 <= 0.108
False
I want this to give the answer True. Do I have to use a special package that will handle floats properly? Is there another way to fix this? For example, we can force floating-point division with:
from __future__ import division
Is there a similar solution to this problem?
There are various things you can do, but each has its own advantages and disadvantages.
The basic problem is that conversion from decimal to any finite binary representation involves rounding. If you were to use IEEE quadruple precision, for example, these cases would be rarer, but they would still occur.
You could use a decimal library or an arbitrary-precision library, but you may be unwilling to pay the runtime cost of using them if you have to do trillions of these calculations.
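For instance, the standard library's decimal module represents these particular values exactly (a minimal sketch; note the operands are constructed from strings, so they are never rounded through binary floats):
from decimal import Decimal

a = Decimal("0.135") - Decimal("0.027")
print(a)                      # 0.108 -- exact decimal arithmetic
print(a <= Decimal("0.108"))  # True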
If the runtime cost of such libraries is unacceptable, you have to ask yourself, "How accurately do I really know these numbers?" Then you can consider, "Is it permissible for 0.135-0.027 <= 0.108 to be considered true?" In most cases the answers are "not that accurately" and "yes", and your problem is solved. You might be uncomfortable with the solution, but it's swings and roundabouts: the errors are going to occur "both ways" (sometimes the comparison will fail when it should succeed, and sometimes it will succeed when it should fail).
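In Python 3.5+ that tolerance-based reading of the comparison is available in the standard library (a minimal sketch; on the question's Python 2.7 you would write the tolerance check by hand):
import math

a = 0.135 - 0.027
print(a)           # 0.10800000000000001
print(a <= 0.108)  # False -- exact comparison
print(a <= 0.108 or math.isclose(a, 0.108))  # True -- tolerant "<=" check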
If failing one way is perfectly OK but failing the other way is absolutely not, you can either change the rounding mode of your hardware (to suit the bias you want), or you can add/subtract a ULP (to suit the bias you want).
For example, consider the following (sorry for the C, but I'm sure you get the idea):
double add_ulp(double x) {
    union {
        double d;
        struct {              /* assumes little-endian layout, 64-bit unsigned long */
            unsigned long mant : 52;
            unsigned expo : 11;
            unsigned sign : 1;
        } bits;
    } inc;
    inc.d = x;
    inc.bits.mant = 0;        /* keep x's sign and exponent, zero the mantissa */
    if (inc.bits.expo >= 52) {
        inc.bits.expo -= 52;  /* scale down to one unit in the last place of x */
        return x + inc.d;
    }
    return x;
}
You can use this like this:
if (x - y <= add_ulp(z)) {
    // ...
}
It will give you the answer you want in your case, but it will bias your results in general. If that's the bias you want, it isn't a problem; if it's not, it's worse than the problem you currently have.
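In Python 3.9+ the same one-ULP nudge is available without any bit tricks (a minimal sketch):
import math

a = 0.135 - 0.027
# Widen the right-hand side by one ULP toward +infinity before comparing.
print(a <= math.nextafter(0.108, math.inf))  # True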
Hope this helps.
This might help:
https://pythonhosted.org/bigfloat/
You can control the precision with this as well.
