I'd like to represent an arbitrarily complex range of real values, which can be discontinuous, i.e.:
0--4 and 5--6 and 7.12423--8
Where I'll be adding new ranges incrementally:
(0--4 and 5--6 and 7.12423--8) | ( 2--7) = (0--7 and 7.12423--8)
I don't really know the right language to describe this, so I'm struggling to search, but it seems like a class probably already exists to do what I want to do. Does it?
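For concreteness, here is a rough sketch of the kind of behaviour I'm after (RangeSet is just a made-up name, and I'm assuming closed endpoints):

class RangeSet:
    """Keeps a sorted list of disjoint (start, end) intervals."""

    def __init__(self, *intervals):
        self.intervals = []
        for start, end in intervals:
            self._add(start, end)

    def _add(self, start, end):
        kept = []
        for s, e in self.intervals:
            if e < start or end < s:        # no overlap: keep as-is
                kept.append((s, e))
            else:                           # overlap: absorb into the new interval
                start, end = min(start, s), max(end, e)
        kept.append((start, end))
        self.intervals = sorted(kept)

    def __or__(self, other):
        result = RangeSet(*self.intervals)
        for s, e in other.intervals:
            result._add(s, e)
        return result

a = RangeSet((0, 4), (5, 6), (7.12423, 8))
print((a | RangeSet((2, 7))).intervals)     # [(0, 7), (7.12423, 8)]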
There are at least a couple of packages listed in the Python Package Index which deal with intervals:
interval
pyinterval
I've experimented with interval before and found it to be well-written and documented (at the moment its website seems to be unavailable). I've not used pyinterval.
In addition to the interval and pyinterval packages mentioned by Ned, there is also pyinter.
As pyinterval didn't compile on my machine, I only played with interval and pyinter.
The former seems better to me, because it defines addition and subtraction operators for interval sets, which pyinter does not. Also, when I tried to calculate the union of two discrete points, it worked as expected in interval, but raised an AttributeError ("'int' object has no attribute 'overlaps'") in pyinter.
One very visible difference in pyinter is the __repr__ function of its interval class, which outputs (7,9] instead of Interval(7, 9, lower_closed=False, upper_closed=True) (the latter is the representation used by the interval package). While this is nice for quick interactive work, closed intervals might be confused with two-element lists. Here, too, I prefer the interval package's approach: it has a less ambiguous representation, and it additionally defines a __str__ method, so that calling str() or print() on the example interval outputs (7..9].
Searching for this topic I came across the following: How to represent integer infinity?
I agree with Martijn Pieters that adding a separate special infinity value for int may not be the best of ideas.
However, this makes type hinting difficult. Assume the following code:
myvar = 10 # type: int
myvar = math.inf # <-- raises a typing error because math.inf is a float
However, the code behaves just the way it should everywhere, and my type hinting is correct everywhere else.
If I write the following instead:
myvar = 10 # type: Union[int, float]
I can assign math.inf without a hitch. But now any other float is accepted as well.
Is there a way to properly constrain the type-hint? Or am I forced to use type: ignore each time I assign infinity?
The super lazy (and probably incorrect) solution:
Rather than adding a specific value, the int class can be extended via subclassing. This approach is not without pitfalls and challenges, such as the requirement to handle the infinity value in the various __dunder__ methods (i.e. __add__, __mul__, __eq__ and the like, all of which should be tested). This would be an unacceptable amount of overhead in use cases where a specific value is required. In such a case, wrapping the desired value with typing.cast better indicates to the type-hinting system that the specific value (i.e. inf = cast(int, math.inf)) is acceptable for assignment.
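For illustration, a minimal sketch of the cast approach (INF is just a placeholder name):

import math
from typing import cast

# Not a real integer infinity: at runtime INF is still a float.
# The cast only tells the type checker to treat it as an int.
INF = cast(int, math.inf)

myvar = 10    # type: int
myvar = INF   # accepted by the type checker, but see the caveats below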
The reason why this approach is incorrect is simply this: since the assigned value looks/feels exactly like some number, other users of your API may end up inadvertently using it as an int, and then the program may explode on them badly when math.inf (or some variation of it) is provided.
An analogy is this: given that lists have items indexed by positive integers, we would expect any function that returns an index to some item to return a positive integer, so we can use it directly (I know this is not the case in Python, given the semantics that allow negative index values, but pretend we are working with, say, C for the moment). Say this function returns the first occurrence of the matched item, but if there is any error it returns some negative number, which clearly exceeds the range of valid values for an index. This lack of guarding against naive usage of the returned value will inevitably result in problems that a type system is supposed to solve.
In essence, creating surrogate values and marking them as int offers zero value, and inevitably allows unexpected and broken API/behavior to be exhibited by the program, because incorrect usage is automatically allowed.
Not to mention the fact that infinity is not a number, so no int value can properly represent it (given that an int, by its very nature, represents some finite number).
As an aside, check out str.index vs. str.find. One of these has a return value that definitely violates user expectations (i.e. it exceeds the boundaries of the type "positive integer"; you won't be told at compile time that the return value may be invalid for the context in which it is used, which results in potential random failures at runtime).
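A quick illustration of that difference:

s = "hello"
print(s.find("z"))    # -1: a sentinel that still looks like a valid int index
# s.index("z")        # raises ValueError instead of returning a sentinel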
Framing the question/answer in more correct terms:
The problem is really about assigning some integer when a rate exists and, when none exists, some other token that represents unboundedness for the particular use case (it could be a built-in value such as NotImplemented or None). However, as those tokens would not be int values either, myvar would actually need a type that encompasses them, along with a way to apply operations that do the right thing.
This unfortunately isn't directly available in Python in a very nice way; however, in strongly, statically typed languages like Haskell, the more accepted solution is to use a Maybe type to define a number type that can accommodate infinity. Note that while floating-point infinity is also available there, it inherits all the problems of floating-point numbers that make it an untenable solution (again, don't use inf for this).
Back to Python: depending on the properties of the assignment you actually want, it could be as simple as creating a class whose constructor accepts either an int or None (or NotImplemented), and then providing a method through which users of the class can get at the actual value. Python unfortunately does not provide the advanced constructs to make this elegant, so you will inevitably end up with code managing this splattered all over the place, or have to write a number of methods that handle the expected inputs and produce the required outputs in the specific ways your program actually needs.
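A minimal sketch of that idea (Rate and its methods are hypothetical names, not from any library):

from typing import Optional

class Rate:
    """Wraps either a finite int rate or None, meaning 'unbounded'."""

    def __init__(self, value: Optional[int] = None) -> None:
        self.value = value

    def is_unbounded(self) -> bool:
        return self.value is None

    def __add__(self, other: "Rate") -> "Rate":
        if self.value is None or other.value is None:
            return Rate(None)            # unboundedness absorbs everything
        return Rate(self.value + other.value)

print((Rate(3) + Rate()).is_unbounded())   # True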
Unfortunately, type hinting really only scratches the surface and grazes over what more advanced languages have provided and solved at a more fundamental level. I suppose if one must program in Python, it is better than not having it.
Facing the same problem, I "solved" it as follows.
from typing import Union
import math

Ordinal = Union[int, float]  # int or infinity

def fun(x: Ordinal) -> Ordinal:
    if x > 0:
        return x
    return math.inf
Formally, it does exactly what you did not want to do. But now the intent is clearer: when the user sees Ordinal, they know it is expected to be an int or math.inf. And the linter is happy.
As the title asks: Python has a lot of special methods, __add__, __len__, __contains__, etc. Why is there no __max__ method that is called when doing max()? Example code:
class A:
    def __max__(self):
        return 5

a = A()
max(a)
It seems like range() and other constructs could benefit from this. Am I missing some other effective way to do max?
Addendum 1:
As a trivial example, max(range(1000000000)) takes a long time to run.
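(For what it's worth, range itself already allows the answer in O(1) via indexing, which is part of why a hook feels natural:)

r = range(1000000000)

# max(r) iterates over every element and takes a long time,
# even though range supports O(1) indexing:
print(r[-1])   # 999999999, computed instantly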
I have no authoritative answer but I can offer my thoughts on the subject.
There are several built-in functions that have no corresponding special method. For example:
max
min
sum
all
any
One thing they have in common is that they are reduce-like: They iterate over an iterable and "reduce" it to one value. The point here is that these are more of a building block.
For example you often wrap the iterable in a generator (or another comprehension, or transformation like map or filter) before applying them:
sum(abs(val) for val in iterable) # sum of absolutes
any(val > 10 for val in iterable) # is one value over 10
max(person.age for person in iterable) # the oldest person
That means most of the time it wouldn't even call the __max__ of the iterable but try to access it on the generator (which isn't implemented and cannot be implemented).
So there is simply not much of a benefit if these were implemented. And in the few cases where it makes sense to implement them, it would be clearer to create a custom method (or property), because that highlights that it's a "shortcut" or that its result differs from the "normal" one.
For example, these functions (min, max, etc.) have O(n) run-time, so if you can do better (for example, if you have a sorted list, you can access the max in O(1)), it might make sense to document that explicitly.
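For example, a minimal sketch of making that explicit with a property (SortedItems is just an illustrative name):

class SortedItems:
    """Keeps its items sorted, so the maximum is always the last element."""

    def __init__(self, items):
        self._items = sorted(items)

    def __iter__(self):
        return iter(self._items)

    @property
    def max(self):
        # O(1), in contrast to max(self), which iterates over everything.
        return self._items[-1]

s = SortedItems([3, 1, 7, 2])
assert max(s) == s.max == 7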
Some operations are not basic operations. Take max as an example: it is actually an operation based on comparison. In other words, when you get a max value, you are actually getting the biggest value.
So in this case, why should we implement a special max function rather than overriding the behavior of comparison?
Think in another direction: what does max really mean? For example, when we execute max(list), what are we doing?
I think we are actually checking the list's elements; the max operation is not related to the list itself at all.
The list is just a container, which is unnecessary in the max operation. Whether it is a list or a set or something else doesn't matter; what is really useful is the elements inside that container.
So if we define a __max__ action for list, we are actually doing another, totally different operation: we are asking a container to give us advice about the max value.
I think in this case, as it is a totally different operation, it should be a method of the container instead of an override of the built-in function's behavior.
I am working with a Python object that implements __add__ but does not subclass int. MyObj1 + MyObj2 works fine, but sum([MyObj1, MyObj2]) led to a TypeError, because sum() first attempts 0 + MyObj. In order to use sum(), my object needs __radd__ to handle MyObj + 0, or I need to provide an empty object as the start parameter. The object in question is not designed to be empty.
Before anyone asks, the object is not list-like or string-like, so use of join() or itertools would not help.
Edit for details: the module has a SimpleLocation and a CompoundLocation. I'll abbreviate Location to Loc. A SimpleLoc contains one right-open interval, i.e. [start, end). Adding SimpleLocs yields a CompoundLoc, which contains a list of the intervals, e.g. [[3, 6), [10, 13)]. End uses include iterating through the union, e.g. [3, 4, 5, 10, 11, 12], checking length, and checking membership.
The numbers can be relatively large (say, smaller than 2^32 but commonly 2^20). The intervals probably won't be extremely long (100-2000, but could be longer). Currently, only the endpoints are stored. I am now tentatively thinking of attempting to subclass set such that the location is constructed as set(xrange(start, end)). However, adding sets will give Python (and mathematicians) fits.
Questions I've looked at:
python's sum() and non-integer values
why there's a start argument in python's built-in sum function
TypeError after overriding the __add__ method
I'm considering two solutions. One is to avoid sum() and use the loop offered in this comment. I don't understand why sum() begins by adding the 0th item of the iterable to 0 rather than adding the 0th and 1st items (like the loop in the linked comment); I hope there's an arcane integer optimization reason.
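(For reference, sum(iterable, start) behaves roughly like the loop below, which is why the start value is always part of the first addition.)

def manual_sum(iterable, start=0):
    total = start
    for item in iterable:
        total = total + item   # the first iteration computes start + first_item
    return total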
My other solution is as follows; while I don't like the hard-coded zero check, it's the only way I've been able to make sum() work.
# ...
def __radd__(self, other):
    # This allows sum() to work (the default start value is zero)
    if other == 0:
        return self
    return self.__add__(other)
In summary, is there another way to use sum() on objects that can neither be added to integers nor be empty?
Instead of sum, use:
import operator
from functools import reduce
reduce(operator.add, seq)
In Python 2, reduce was a built-in, so this looks like:
import operator
reduce(operator.add, seq)
Reduce is generally more flexible than sum - you can provide any binary function, not only add, and you can optionally provide an initial element while sum always uses one.
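For example, the optional initializer also covers the empty-sequence case:

import operator
from functools import reduce

print(reduce(operator.add, [1, 2, 3]))   # 6
print(reduce(operator.add, [], 10))      # 10: the initializer is returned as-is
# reduce(operator.add, [])               # TypeError: empty sequence, no initial value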
Also note: (Warning: maths rant ahead)
Providing support for add with respect to objects that have no neutral element is a bit awkward from the algebraic point of view.
Note that all of:
naturals
reals
complex numbers
N-d vectors
NxM matrices
strings
together with addition form a monoid - i.e. the operation is associative and there is some kind of neutral element.
If your operation isn't associative and doesn't have a neutral element, then it doesn't "resemble" addition. Hence, don't expect it to work well with sum.
In such a case, you might be better off using a function or a method instead of an operator. This may be less confusing, since users of your class, seeing that it supports +, are likely to expect it to behave in a monoidal way (as addition normally does).
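For instance, a rough sketch of exposing the combination through an explicitly named classmethod instead of + (the names here are purely illustrative):

class Collection:
    def __init__(self, parts):
        self.parts = list(parts)

    @classmethod
    def combine(cls, things):
        # An explicit name, so nobody expects it to behave like numeric addition.
        parts = []
        for t in things:
            parts.extend(t.parts)
        return cls(parts)

merged = Collection.combine([Collection([1]), Collection([2, 3])])
print(merged.parts)   # [1, 2, 3]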
Thanks for expanding, I'll refer to your particular module now:
There are 2 concepts here:
Simple locations,
Compound locations.
It indeed makes sense that simple locations could be added, but they don't form a monoid because their addition doesn't satisfy the basic property of closure - the sum of two SimpleLocs isn't a SimpleLoc. It's, generally, a CompoundLoc.
OTOH, CompoundLocs with addition look like a monoid to me (a commutative monoid, while we're at it): a sum of those is a CompoundLoc too, their addition is associative and commutative, and the neutral element is an empty CompoundLoc that contains zero SimpleLocs.
If you agree with me (and the above matches your implementation), then you'll be able to use sum as follows:
sum([SimpleLoc1, SimpleLoc2, SimpleLoc3], CompoundLoc())
Indeed, this appears to work.
I am now tentatively thinking of attempting to subclass set such that the location is constructed as set(xrange(start, end)). However, adding sets will give Python (and mathematicians) fits.
Well, locations are some sets of numbers, so it makes sense to throw a set-like interface on top of them (so __contains__, __iter__, __len__, perhaps __or__ as an alias of +, __and__ as the product, etc).
As for construction from xrange, do you really need it? If you know that you're storing sets of intervals, then you're likely to save space by sticking to your representation of [start, end) pairs. You could throw in a utility method that takes an arbitrary sequence of integers and translates it into an optimal SimpleLoc or CompoundLoc, if you feel that's going to help.
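For example, a sketch of such a utility under the [start, end) pair representation (intervals_from_integers is a made-up name):

def intervals_from_integers(numbers):
    """Collapse an arbitrary sequence of ints into sorted, disjoint [start, end) pairs."""
    pairs = []
    for n in sorted(set(numbers)):
        if pairs and n == pairs[-1][1]:
            pairs[-1][1] = n + 1        # extends the current interval
        else:
            pairs.append([n, n + 1])    # starts a new interval
    return [tuple(p) for p in pairs]

print(intervals_from_integers([3, 4, 5, 10, 11, 12]))   # [(3, 6), (10, 13)]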
I think that the best way to accomplish this is to provide the __radd__ method, or pass the start object to sum explicitly.
In case you really do not want to override __radd__ or provide a start object, how about redefining sum()?
>>> from __builtin__ import sum as builtin_sum
>>> def sum(iterable, startobj=MyCustomStartObject):
... return builtin_sum(iterable, startobj)
...
Preferably, use a function with a name like my_sum(), but I guess that is one of the things you want to avoid (even though globally redefining built-in functions is probably something that a future maintainer will curse you for).
Actually, implementing __add__ without the concept of an "empty object" makes little sense. sum needs a start parameter to support the sums of empty and one-element sequences, and you have to decide what result you expect in these cases:
sum([o1, o2]) => o1 + o2 # obviously
sum([o1]) => o1 # But how should __add__ be called here? Not at all?
sum([]) => ? # What now?
You could use an object that's universally neutral wrt. addition:
class Neutral:
    def __add__(self, other):
        return other

print(sum("A BC D EFG".split(), Neutral()))  # ABCDEFG
You could do something like:
from operator import add

try:
    total = reduce(add, whatever)  # or functools.reduce in Py3.x
except TypeError as e:
    # I'm not 100% happy about branching on the exception text, but
    # figure this msg isn't likely to be changed after so long...
    if e.args[0] == 'reduce() of empty sequence with no initial value':
        pass  # do something appropriate here if necessary
    else:
        pass  # Most likely that + isn't usable between objects...
I like to do some silly stuff with Python, like solving programming puzzles, writing small scripts, etc. Each time, at a certain point, I face a dilemma: should I create a new class to represent my data, or go quick and dirty with all the values packed in a list or tuple? Due to extreme laziness and a personal dislike of the self keyword, I usually go with the second option.
I understand that in the long run a user-defined data type is better, because path.min_cost and point.x, point.y are much more expressive than path[2] and point[0], point[1]. But when I just need to return multiple things from a function, it strikes me as too much work.
So my question is: what is a good rule of thumb for choosing when to create a user-defined data type and when to go with a list or tuple? Or maybe there is a neat Pythonic way I'm not aware of?
Thanks.
Are you aware of collections.namedtuple (available since Python 2.6)?
import collections

def getLocation(stuff):
    return collections.namedtuple('Point', 'x, y')(x, y)
or, more efficiently,
Point = collections.namedtuple('Point', 'x, y')

def getLocation(stuff):
    return Point(x, y)
A namedtuple can be accessed by index (point[0]) and unpacked (x, y = point) the same way as a plain tuple, so it offers a nearly painless upgrade path.
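For example:

import collections

Point = collections.namedtuple('Point', 'x, y')
p = Point(3, 5)

print(p.x, p.y)    # 3 5 -- attribute access
print(p[0], p[1])  # 3 5 -- index access still works
x, y = p           # tuple unpacking works too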
First, an observation about expressivity. You mentioned being concerned about the relative expressivity of point.x, point.y vs. point[0], point[1], but this is a problem that can be solved in more than one way. In fact, for a simple point structure, I think there's an argument to be made that a class is overkill, especially when you could just do this:
x, y = get_point(foo)
I would say this is just about as expressive as point.x, point.y; it's also likely to be faster (than a vanilla class, anyway -- no __dict__ lookups) and it's quite readable, assuming the tuple contains just a few items.
My approach to deciding whether to put something in a class has more to do with the way I'll use the data in the program as a whole: I ask myself "is this state?" If I have some data that I know will change a lot, and needs to be stored in one place and manipulated by a group of purpose-built functions, then I know that data is probably state, and I should at least consider putting it in a class. On the other hand, if I have some data that won't change, or is ephemeral and should disappear once I'm done with it, it's probably not state, and probably doesn't need to go into a class.
This is, of course, just a rule of thumb; for example, I can think of cases where you might need some kind of "record" type so that you can manipulate a pretty complex collection of data without having 15 different local variables (hence the existence of namedtuple). But often, if you're manipulating just one or two of them, you'll be better off creating a function that just accepts one or two values and returns one or two values, and for that, a tuple or list is perfectly fine.
This is certainly subjective, but I would try to observe the principle of least surprise.
If the values you return describe the characteristics of an object (like point.x and point.y in your example), then I would use a class.
If they are not part of the same object (let's say you return min and max), then they should be a tuple.