Why no optimization of Python 3 range object for floats?

Jumping off from a previous question I asked a while back:
Why is 1000000000000000 in range(1000000000000001) so fast in Python 3?
If you do this:
1000000000000000.0 in range(1000000000000001)
...it is clear that range has not been optimized to check if floats are within the specified range.
I think I understand that the intended purpose of range is to work with ints only - so you cannot, for example, do something like this:
1000000000000 in range(1000000000000001.0)
# error: float object cannot be interpreted as an integer
Or this:
1000000000000 in range(0, 1000000000000001, 1.0)
# error: float object cannot be interpreted as an integer
However, the decision was made, for whatever reason, to allow things like this:
1.0 in range(1)
It seems clear that 1.0 (and 1000000000000.0 above) are not being coerced into ints, because then the int optimization would work for those as well.
My question is, why the inconsistency, and why no optimization for floats? Or, alternatively, what is the rationale behind why the above code does not produce the same error as the previous examples?
This seems like an obvious optimization to include in addition to optimization for ints. I'm guessing there are some nuanced issues preventing a clean implementation of such optimization, or alternatively there is some kind of rationale as to why you would not actually want to include such an optimization. Or possibly both.
EDIT: To clarify the issue here a bit, all the following statements evaluated to False as well:
3.2 in range(5)
'' in range(1)
[] in range(1)
None in range(1)
This seems like unexpected behavior to me, but so far there is definitely no inconsistency. However, the following evaluates to True:
1.0 in range(2)
And as shown previously, constructions similar to the above have not been optimized.
This does seem inconsistent: at some point in the evaluation, the value 1.0 (or 1000000000001.0 as in my original example) is being coerced into an int. This makes sense, since it is natural to convert a float ending in .0 to an int. However, the question still remains: if it is being converted to an int anyway, why has 1000000000000.0 in range(1000000000001) not been optimized?

There is no inconsistency here. Floating point values can't be coerced to integers; that only works the other way around. As such, range() won't implicitly convert floats to integers when testing for containment either.
A range() object is a sequence type; it contains discrete integer values (albeit virtually). As such, it has to support containment testing for any object that may test as equal. The following works too:
>>> class ThreeReally:
...     def __eq__(self, other):
...         return other == 3
...
>>> ThreeReally() in range(4)
True
This has to do a full scan over all possible values in the range to test for equality with each contained integer.
However, only when using actual integers can the optimisation be applied, as that's the only type where the range() object can know what values will be considered equal without conversion.
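As a rough illustration (the exact timings are assumptions and will vary by machine), the gap between the O(1) integer path and the fallback scan is easy to observe with timeit:
import timeit

r = range(1000001)

# Integer containment uses the O(1) arithmetic check.
print(timeit.timeit("1000000 in r", globals={"r": r}, number=10))

# Float containment falls back to iterating and comparing every element.
print(timeit.timeit("1000000.0 in r", globals={"r": r}, number=10))
The second timing should come out orders of magnitude slower than the first, because each float test has to walk the whole range.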

Related

Why is Pycharm giving me a warning when using the max function with ints and floats vars as arguments

I tried using the max function with a couple of int vars and a couple of float vars and I got a warning that int was expected as the correct type, but the program runs fine. Did I do something wrong?
The problem is probably introduced by your explicitly annotating types - something that can be done in Python and is supported by PyCharm as well, of course, but isn't required as part of the language.
For example:
i: int = 1
a: float = 0.1
print(max(i, a))
This will show a PyCharm warning on a in print(max(i, a)).
And because PyCharm can infer the type, this will show the same warning:
i = 1
a = 0.1
print(max(i, a))
This on the other hand won't:
items = [1, 0.1]
print(max(items))
The reason for this is that PyCharm knows that Python's built-in max() function will fail if incompatible types are passed into it. For example, max(1, '2') will cause a TypeError at runtime. That's why you get a warning if you pass in several arguments of varying types: PyCharm knows it may become a problem and gives the warning on the first argument whose type doesn't match the preceding ones.
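On Python 3, that runtime failure looks like this:
max(1, '2')
# TypeError: '>' not supported between instances of 'str' and 'int'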
The reason that the list doesn't give you the same problem is that PyCharm only looks at the types of the call's arguments (a single list in this case), but doesn't have the information that the max() function will of course return the max of the elements in the list - it cannot determine this from how the max() function is defined in the libraries, even though it may be obvious to you and me.
You can avoid the warning, if you know it's not a problem, by wrapping the arguments in an iterable like a list or tuple, by casting the integers to float, or by explicitly ignoring the warning. Or by taking a look at your code and deciding whether you really should be comparing ints and floats.
i: int = 1
a: float = 0.1
print(max([i, a]))
print(max(float(i), a))
# noinspection PyTypeChecker
print(max(i, a))
Note that the final 'solution' is specific to PyCharm; the rest should give you good results in any editor.
The definition of max() assumes that all of the inputs will have the same type, which is okay, but it causes PyCharm to show a warning because you're mixing types. You can either:
Use an iterable instead (e.g. max((1.4, 11, 17)))
or
Suppress the warning, because it's pretty inconsequential. You can do this in the PyCharm GUI.

Is there a Python alternative for len that returns 1 for simple float

Is there a way in Python to let the len(x) function (or any similar function) return 1 if x is a simple float?
In my case, x is an input parameter to a function, and I want the function to be robust to (NumPy) array-type inputs of x as well as simple scalar float inputs. The function uses len(x), but that throws the error object of type 'float' has no len() if x is a float, whereas I want it to return 1. Of course I can write an if statement, but I feel like there should be a shorter solution.
def myfunc(x):
    y = np.zeros((5, len(x)))
    y[1, :] = x
    ....
    return y
No, there is no built-in function like this.
One of the core aspects of how Python is designed is that it is strongly typed, meaning that values are not implicitly coerced from one type to another. For example, you cannot do 'foo' + 3 to make a string 'foo3'; you have to write 'foo' + str(3) to explicitly convert the int to str in order to use string concatenation. So having built-in operators or functions which could treat a scalar value as if it's a sequence of length 1 would violate the principle of strong typing.
This is in contrast with weakly typed languages like Javascript and PHP, where type coercion is done with the idea that the programmer doesn't have to think so much about data types and how they are converted; in practice, if you write in these languages then you still do have to think about types and conversions, you just have to also know which conversions are or aren't done implicitly.
So, in Python if you want a function to work with multiple different data types, then you either have to do a conversion explicitly (e.g. if isinstance(x, float): x = np.array([x])) or you have to only use operations which are supported by every data type your function accepts (i.e. duck typing).
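For example, a minimal sketch of the explicit-conversion route using numpy.atleast_1d, so that a plain float becomes a length-1 array (myfunc and its shape are taken from the question):
import numpy as np

def myfunc(x):
    x = np.atleast_1d(x)        # a scalar float becomes a length-1 array; arrays pass through
    y = np.zeros((5, len(x)))   # len(x) is now defined for both cases
    y[1, :] = x
    return y

print(myfunc(2.5).shape)                         # (5, 1)
print(myfunc(np.array([1.0, 2.0, 3.0])).shape)   # (5, 3)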

How should I type-hint an integer variable that can also be infinite?

Searching for this topic I came across the following: How to represent integer infinity?
I agree with Martijn Pieters that adding a separate special infinity value for int may not be the best of ideas.
However, this makes type hinting difficult. Assume the following code:
myvar = 10 # type: int
myvar = math.inf # <-- raises a typing error because math.inf is a float
However, the code behaves everywhere just the way as it should. And my type hinting is correct everywhere else.
If I write the following instead:
myvar = 10 # type: Union[int, float]
I can assign math.inf without a hitch. But now any other float is accepted as well.
Is there a way to properly constrain the type-hint? Or am I forced to use type: ignore each time I assign infinity?
The super lazy (and probably incorrect) solution:
Rather than adding a specific value, the int class could be extended via subclassing. This approach is not without pitfalls and challenges, such as the requirement to handle the infinity value in the various __dunder__ methods (i.e. __add__, __mul__, __eq__ and the like, all of which would need to be tested). That would be an unacceptable amount of overhead for use cases where a specific value is required. In such a case, wrapping the desired value with typing.cast is a better way to indicate to the type-hinting system that the specific value (i.e. inf = cast(int, math.inf)) is acceptable for assignment.
The reason this approach is incorrect is simply this: since the assigned value looks and feels exactly like a number, other users of your API may inadvertently end up using it as an int, and then the program may explode on them badly when math.inf (or some variation of it) is provided.
An analogy is this: given that lists have items indexed by positive integers, we would expect any function that returns an index to some item to return a positive integer, so we can use it directly (I know this is not the case in Python, since negative index values are allowed, but pretend we are working with, say, C for the moment). Say this function returns the first occurrence of the matched item, but if there is any error it returns some negative number, which clearly exceeds the range of valid values for an index. This lack of guarding against naive usage of the returned value will inevitably result in problems that a type system is supposed to solve.
In essence, creating a surrogate value and marking it as an int offers zero value, and inevitably allows unexpected and broken API behaviour to be exhibited by the program, because incorrect usage is automatically allowed.
Not to mention that infinity is not a number, so no int value can properly represent it (given that an int, by its very nature, represents some finite number).
As an aside, check out str.index vs str.find. One of these has a return value that definitely violates user expectations (i.e. it exceeds the boundaries of the type "positive integer"; you won't be told at compile time that the return value may be invalid for the context in which it is used, resulting in potential failures at random points during runtime).
Framing the question/answer in more correct terms:
The problem is really about assigning some integer when a rate exists and, when none exists, assigning some other token that represents unboundedness for the particular use case (it could be a built-in value such as NotImplemented or None). However, as those tokens would not be int values either, myvar would actually need a type that encompasses them, along with a way to apply operations that do the right thing.
This unfortunately isn't directly available in Python in a very nice way; however, in statically typed languages like Haskell, the more accepted solution is to use a Maybe type to define a number type that can accept infinity. Note that while floating-point infinity is also available there, it inherits all the problems of floating-point numbers, which makes it an untenable solution (again, don't use inf for this).
Back to Python: depending on the properties of the assignment you actually want, it could be as simple as creating a class whose constructor accepts either an int or None (or NotImplemented), and then providing a method through which users of the class can get at the actual value. Python unfortunately does not provide the advanced constructs to make this elegant, so you will inevitably end up either with code managing this splattered all over the place, or having to write a number of methods that handle whatever input is expected and produce the required output in the specific ways your program actually needs.
Unfortunately, type hinting really only scratches the surface of what more advanced languages have provided and solved at a more fundamental level. I suppose that if one must program in Python, it is better than not having it.
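A minimal sketch of the wrapper-class idea described above; MaybeBounded is a hypothetical name, and its constructor accepts an int or None:
from typing import Optional

class MaybeBounded:
    """Wraps either a concrete int or None, with None meaning 'unbounded'."""

    def __init__(self, value: Optional[int] = None) -> None:
        self.value = value

    def is_unbounded(self) -> bool:
        return self.value is None

    def clamp(self, n: int) -> int:
        # Example operation: limit n by the bound, treating None as "no limit".
        return n if self.value is None else min(n, self.value)

print(MaybeBounded(10).clamp(25))   # 10
print(MaybeBounded().clamp(25))     # 25 (unbounded)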
Facing the same problem, I "solved" it as follows.
from typing import Union
import math

Ordinal = Union[int, float]  # int or infinity

def fun(x: Ordinal) -> Ordinal:
    if x > 0:
        return x
    return math.inf
Formally, it does exactly what you did not want. But now the intent is clearer: when users see Ordinal, they know it is expected to be an int or math.inf. And the linter is happy.

Modulo in sage returning a negative value

I am new to SAGE and am having a problem with something very simple. I have the following code:
delay = float(3.5)
D = delay%1.0
D
But this returns the value -0.5 instead of the expected 0.5. What am I doing wrong?
If I change delay to be delay = float(2.5), I get the right answer, so I don't know why it isn't consistent (I am sure I am using the modulo wrong somehow).
I think that this question will answer things very well indeed for you.
However, I don't know why you are using float in Sage. Then you could just use Python straight up. Anyway, the % operator is tricky to use outside of integers. For example, here is the docstring for its use on Sage rational numbers.
Return the remainder of division of self by other, where other is
coerced to an integer
INPUT:
* ``other`` - object that coerces to an integer.
OUTPUT: integer
EXAMPLES:
sage: (-4/17).__mod__(3/1)
1
I assume this is considered to be a feature, not a bug.
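For contrast, a sketch of the sign conventions in plain CPython (outside Sage's preparser): % follows the sign of the divisor, while math.fmod follows the sign of the dividend:
import math

print(3.5 % 1.0)              # 0.5
print(-3.5 % 1.0)             # 0.5  (% takes the sign of the divisor)
print(3.5 % -1.0)             # -0.5
print(math.fmod(-3.5, 1.0))   # -0.5 (fmod takes the sign of the dividend)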

Avoiding Python sum default start arg behavior

I am working with a Python object that implements __add__, but does not subclass int. MyObj1 + MyObj2 works fine, but sum([MyObj1, MyObj2]) led to a TypeError, because sum() first attempts 0 + MyObj. In order to use sum(), my object needs __radd__ to handle 0 + MyObj, or I need to provide an empty object as the start parameter. The object in question is not designed to be empty.
Before anyone asks, the object is not list-like or string-like, so use of join() or itertools would not help.
Edit for details: the module has a SimpleLocation and a CompoundLocation. I'll abbreviate Location to Loc. A SimpleLoc contains one right-open interval, i.e. [start, end). Adding SimpleLoc yields a CompoundLoc, which contains a list of the intervals, e.g. [[3, 6), [10, 13)]. End uses include iterating through the union, e.g. [3, 4, 5, 10, 11, 12], checking length, and checking membership.
The numbers can be relatively large (say, smaller than 2^32 but commonly 2^20). The intervals probably won't be extremely long (100-2000, but could be longer). Currently, only the endpoints are stored. I am now tentatively thinking of attempting to subclass set such that the location is constructed as set(xrange(start, end)). However, adding sets will give Python (and mathematicians) fits.
Questions I've looked at:
python's sum() and non-integer values
why there's a start argument in python's built-in sum function
TypeError after overriding the __add__ method
I'm considering two solutions. One is to avoid sum() and use the loop offered in this comment. I don't understand why sum() begins by adding the 0th item of the iterable to 0 rather than adding the 0th and 1st items (like the loop in the linked comment); I hope there's an arcane integer optimization reason.
My other solution is as follows; while I don't like the hard-coded zero check, it's the only way I've been able to make sum() work.
# ...
def __radd__(self, other):
    # This allows sum() to work (the default start value is zero)
    if other == 0:
        return self
    return self.__add__(other)
In summary, is there another way to use sum() on objects that can neither be added to integers nor be empty?
Instead of sum, use:
import operator
from functools import reduce
reduce(operator.add, seq)
In Python 2, reduce was a built-in, so this looks like:
import operator
reduce(operator.add, seq)
Reduce is generally more flexible than sum - you can provide any binary function, not only add, and you can optionally provide an initial element while sum always uses one.
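A self-contained sketch of both forms (Interval here is a hypothetical stand-in for the question's objects, which support + but cannot be added to 0):
import operator
from functools import reduce

class Interval:
    # Minimal stand-in: supports +, has no __radd__, so 0 + Interval fails.
    def __init__(self, parts):
        self.parts = parts
    def __add__(self, other):
        return Interval(self.parts + other.parts)

seq = [Interval([(3, 6)]), Interval([(10, 13)])]

# sum(seq) would raise TypeError: it starts from 0 and tries 0 + Interval(...).
total = reduce(operator.add, seq)                # folds the sequence pairwise
print(total.parts)                               # [(3, 6), (10, 13)]

# reduce also accepts an explicit initial element, analogous to sum's start value:
total = reduce(operator.add, seq, Interval([]))
print(total.parts)                               # [(3, 6), (10, 13)]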
Also note: (Warning: maths rant ahead)
Providing support for add w/r/t objects that have no neutral element is a bit awkward from the algebraic point of view.
Note that all of:
naturals
reals
complex numbers
N-d vectors
NxM matrices
strings
together with addition form a Monoid - i.e. they are associative and have some kind of neutral element.
If your operation isn't associative and doesn't have a neutral element, then it doesn't "resemble" addition. Hence, don't expect it to work well with sum.
In such case, you might be better off with using a function or a method instead of an operator. This may be less confusing since the users of your class, seeing that it supports +, are likely to expect that it will behave in a monoidic way (as addition normally does).
Thanks for expanding, I'll refer to your particular module now:
There are 2 concepts here:
Simple locations,
Compound locations.
It indeed makes sense that simple locations could be added, but they don't form a monoid because their addition doesn't satisfy the basic property of closure - the sum of two SimpleLocs isn't a SimpleLoc. It's, generally, a CompoundLoc.
OTOH, CompoundLocs with addition looks like a monoid to me (a commutative monoid, while we're at it): A sum of those is a CompoundLoc too, and their addition is associative, commutative and the neutral element is an empty CompoundLoc that contains zero SimpleLocs.
If you agree with me (and the above matches your implementation), then you'll be able to use sum as following:
sum([SimpleLoc1, SimpleLoc2, SimpleLoc3], CompoundLoc())
Indeed, this appears to work.
I am now tentatively thinking of attempting to subclass set such that the location is constructed as set(xrange(start, end)). However, adding sets will give Python (and mathematicians) fits.
Well, locations are some sets of numbers, so it makes sense to throw a set-like interface on top of them (so __contains__, __iter__, __len__, perhaps __or__ as an alias of +, __and__ as the product, etc).
As for construction from xrange, do you really need it? If you know that you're storing sets of intervals, then you're likely to save space by sticking to your representation of [start, end) pairs. You could throw in a utility method that takes an arbitrary sequence of integers and translates it to an optimal SimpleLoc or CompoundLoc if you feel it's going to help.
I think that the best way to accomplish this is to provide the __radd__ method, or pass the start object to sum explicitly.
In case you really do not want to override __radd__ or provide a start object, how about redefining sum()?
>>> from __builtin__ import sum as builtin_sum
>>> def sum(iterable, startobj=MyCustomStartObject):
...     return builtin_sum(iterable, startobj)
...
Preferably, use a function with a name like my_sum(), but I guess that is one of the things you want to avoid (even though globally redefining built-in functions is probably something that a future maintainer will curse you for).
Actually, implementing __add__ without the concept of an "empty object" makes little sense. sum needs a start parameter to support the sums of empty and one-element sequences, and you have to decide what result you expect in these cases:
sum([o1, o2]) => o1 + o2 # obviously
sum([o1]) => o1 # But how should __add__ be called here? Not at all?
sum([]) => ? # What now?
You could use an object that's universally neutral wrt. addition:
class Neutral:
    def __add__(self, other):
        return other

print(sum("A BC D EFG".split(), Neutral()))  # ABCDEFG
You could do something like:
from operator import add
try:
    total = reduce(add, whatever)  # or functools.reduce in Py3.x
except TypeError as e:
    # I'm not 100% happy about branching on the exception text, but
    # figure this msg isn't likely to be changed after so long...
    if e.args[0] == 'reduce() of empty sequence with no initial value':
        pass  # do something appropriate here if necessary
    else:
        pass  # Most likely that + isn't usable between objects...
