How do I overload `+` on a NamedTuple - python

How do I create a custom overloading of plus for a named tuple in 3.5? I know there's new syntax for this in 3.6; can you do it in 3.5? I also want it to pass mypy checks.
from typing import NamedTuple
Point = NamedTuple('Point', [('x', int),
                             ('y', int)])

def joinPoints(a: Point, b: Point) -> Point:
    return Point(x=a.x + b.x, y=a.y + b.y)

q = Point(1, 2)
r = Point(3, 4)
s = joinPoints(q, r)
t = q + r  # HOW DO I MAKE THIS GO?
# s should equal t

As a note, what the new, class-based syntax for defining typed namedtuples in Python 3.6 is ultimately doing at runtime is basically a bunch of metaprogramming hijinkery to make a custom class, which happens to also contain your custom __add__ method, if you included one.
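For reference, the 3.6 class-based form that note alludes to looks roughly like this (a sketch only; it will not run on 3.5, and whether mypy accepts the narrowed __add__ signature depends on its override checks):

from typing import NamedTuple

class Point(NamedTuple):
    x: int
    y: int

    def __add__(self, other: 'Point') -> 'Point':
        return Point(self.x + other.x, self.y + other.y)

print(Point(1, 2) + Point(3, 4))  # Point(x=4, y=6)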
Since we can't have that syntax in Python 3.5, the best you can really do is to just implement all that boilerplate yourself, I'm afraid.
Remember, namedtuples are basically meant to be a convenient way of defining simple classes (that subclass tuple), nothing more. If you want anything more complex, you're actually going to need to implement it yourself.
In any case, setting aside types completely, there isn't a super clean way of doing what you're trying to do at runtime, much less with types (at least, to the best of my knowledge). I guess one sort of clean way would be to manually patch the Point class after you define it, like so:
from typing import NamedTuple
Point = NamedTuple('Point', [('x', int), ('y', int)])

def add(self: Point, other: Point) -> Point:
    return Point(self.x + other.x, self.y + other.y)

Point.__add__ = add

a = Point(1, 2)
b = Point(3, 4)
print(a + b)  # prints 'Point(x=4, y=6)'
However, you'd have to give up on using mypy then -- mypy makes the simplifying (and usually reasonable) assumption that a class's attributes and type signatures will not change after that class has been defined, and so will disallow assigning a new method to Point and will consequently throw an error on the last line.
Perhaps there's some cleverer approach (maybe using the abc module somehow?) that ends up satisfying you, the Python runtime, and mypy, but I'm currently not aware of such an approach.

How to "fool" duck typing in Python

Suppose I had a class A:
class A:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def sum(self):
        return self.x + self.y
And I defined a factory method called factory:
def factory(x, y):
    class B: pass
    b = B()
    setattr(b, 'x', x)
    setattr(b, 'y', y)
    B.__name__ = 'A'
    return b
Now, if I do print(type(A(1, 2))) and print(type(factory(1, 2))), they will show that these are different types. And if I try to do factory(1, 2).sum(), I'll get an exception. But type(A).__name__ and type(factory(1, 2)).__name__ are equivalent, and if I do A.sum(factory(1, 2)) I'll get 3, as if I were calling it using an A. So, my question is this:
What would I need to do here to make factory(1, 2).sum() work without defining sum on B or doing inheritance?
I think you're fundamentally misunderstanding the factory pattern, and possibly getting confused with how interfaces work in Python. Either that, or I am fundamentally confused by the question. Either way, there's some sorting out we need to do.
What would I need to do here to make factory(1, 2).sum() work without
defining sum on B or doing inheritance?
Just return an A instead of some other type:
def factory(x, y):
    return A(x, y)
then
print(factory(1,2).sum())
will output 3 as expected. But that's kind of a useless factory... you could just do A(x, y) and be done with it!
Some notes:
You typically use a "factory" (or factory pattern) when you have easily "nameable" types that may be non-trivial to construct. Consider how when you use scipy.interpolate.interp1d (see here) there's an option for kind, which is basically an enum for all the different strategies you might use to do an interpolation. This is, in essence, a factory (but hidden inside the function for ease of use). You could imagine this could be standalone, so you'd call your "strategy" factory, and then pass this on to the interp1d call. However, doing it inline is a common pattern in Python. Observe: These strategies are easy to "name", somewhat hard to construct in general (you can imagine it would be annoying to have to pass in a function that does linear interpolation as opposed to just doing kind='linear'). That's what makes the factory pattern useful...
If you don't know what A is up front, then it's definitely not the factory pattern you'd want to apply. Furthermore, if you don't know what you're serializing/deserializing, it would be impossible to call it or use it. You'd have to know that, or have some way of inferring it.
Interfaces in Python are not enforced like they are in other languages like Java/C++. That's the spirit of duck typing. If an interface does something like call x.sum(), then it doesn't matter what type x actually is, it just has to have a method called sum(). If it acts like the "sum" duck and quacks like the "sum" duck, then it is the "sum" duck from Python's perspective. It doesn't matter if x is a numpy array or an A; it'll work all the same. In Java/C++, stuff like that won't compile unless the compiler is absolutely certain that x has the method sum defined. Fortunately Python isn't like that, so you can even define it on the fly (which maybe you were trying to do with B). Either way, interfaces are a much different concept in Python than in other mainstream languages.
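To illustrate that last point with a small hypothetical example (the Pair class is made up for demonstration): any object with a sum() method satisfies the "interface", regardless of its actual class.

class Pair:  # hypothetical; completely unrelated to A
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def sum(self):
        return self.x + self.y

def report(thing):
    # works for A, Pair, or anything else that quacks like the "sum" duck
    print(thing.sum())

report(A(1, 2))     # prints 3
report(Pair(3, 4))  # prints 7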
P.S.
But, type(A).__name__ and type(factory(1, 2)).__name__ are equivalent
Of course they are, you explicitly do this when you say B.__name__ = 'A'. So I'm not sure what you were trying to get at there...
HTH!

Can I change the default typing of variables in Python?

Ideally I would like to be able to redirect assignment to my classes rather than the built in Python classes. For instance:
class MyInt(int):
    def __add__(self, other):
        # some type checking logic
        # convert string to int if needed
        return int(self) + int(other)
    # def __everything_else__ ...

a = 1
# sure I could declare:
a = MyInt(1)
# but this isn't ideal because:
b = 2
print a + b  # cool, this uses my overload
print b + a  # hmmm, not my logic, bad

# Now consider:
c = "1"
print b < c  # Does not make sense, really 2 < 1, not very Perly :)
The environment is IronPython 2.7
Any thoughts? I know it is against the python mantra to force typing on folks but sometimes people need to be protected from themselves, particularly when no errors are thrown. Case in point:
How does Python compare string and int?
This is a massive case against python as the scripting environment for us, particularly as we need to port Perl code (see "type as you use").
Really I think I may be relying too much on compiler errors as .NET is the runtime but it is important for errors to not cascade.
Changing literals to use your types isn't really what you want. After all, len(thing) isn't a literal, but you're going to want that to be a MyInt too. I might say what you want is to change the behavior of mixed-type operations for built-in types, but that's not quite general enough. I think what you want is Perl. If you want to try to make Python behave like Perl at such a low level as this, you're going to run into zillions of other headaches. I recommend either sticking with Perl, or changing the code that relies on Perl-style behavior when you port it.
Indeed. It isn't just an operator overloading exercise; it is at the core of the language. There is code whose behavior could change if you meddled with the fundamental types.

Pythonic way to Implement Data Types (Python 2.7)

The majority of my programming experience has been with C++. Inspired by Bjarne Stroustrup's talk here, one of my favorite programming techniques is "type-rich" programming; the development of new robust data-types that will not only reduce the amount of code I have to write by wrapping functionality into the type (for example vector addition, instead of newVec.x = vec1.x + vec2.x; newVec.y = ... etc, we can just use newVec = vec1 + vec2) but will also reveal problems in your code at compile time through the strong type system.
A recent project I have undertaken in Python 2.7 requires integer values that have upper and lower bounds. My first instinct is to create a new data type (class) that will have all the same behavior as a normal number in python, but will always be within its (dynamic) boundary values.
class BoundInt:
    def __init__(self, target = 0, low = 0, high = 1):
        self.lowerLimit = low
        self.upperLimit = high
        self._value = target
        self._balance()

    def _balance(self):
        if (self._value > self.upperLimit):
            self._value = self.upperLimit
        elif (self._value < self.lowerLimit):
            self._value = self.lowerLimit
        self._value = int(round(self._value))

    def value(self):
        self._balance()
        return self._value

    def set(self, target):
        self._value = target
        self._balance()

    def __str__(self):
        return str(self._value)
This is a good start, but it requires accessing the meat of these BoundInt types like so
x = BoundInt()
y = 4
x.set(y) #it would be nicer to do something like x = y
print y #prints "4"
print x #prints "1"
z = 2 + x.value() #again, it would be nicer to do z = 2 + x
print z #prints "3"
We can add a large number of python's "magic method" definitions to the class to add some more functionality:
def __add__(self, other):
    return self._value + other

def __sub__(self, other):
    return self._value - other

def __mul__(self, other):
    return self._value * other

def __div__(self, other):
    return self._value / other

def __pow__(self, power):
    return self._value**power

def __radd__(self, other):
    return self._value + other

# etc etc
Now the code is rapidly exploding in size, and there is a ton of repetition in what is being written, for very little return; this doesn't seem very pythonic at all.
Things get even more complicated when I start to want to construct BoundInt objects from normal python numbers (integers?), and other BoundInt objects
x = BoundInt()
y = BoundInt(x)
z = BoundInt(4)
Which, as far as I'm aware requires the use of rather large/ugly if/else type checking statements within the BoundInt() constructor, as python does not support (c style) overloading.
All of this feels terribly like trying to write c++ code in python, a cardinal sin if one of my favorite books, Code Complete 2, is taken seriously. I feel like I am swimming against the dynamic typing current, instead of letting it carry me forward.
I very much want to learn to code python 'pythonic-ally', what is the best way to approach this sort of problem domain? What are good resources to learn proper pythonic style?
There's plenty of code in the standard library, in popular PyPI modules, and in ActiveState recipes that does this kind of thing, so you're probably better off reading examples than trying to figure it out from first principles. Also, note that this is pretty similar to creating a list-like or dict-like class, which there are even more examples of.
However, there are some answers to what you want to do. I'll start with the most serious, then work backward.
Things get even more complicated when I start to want to construct BoundInt objects from normal python numbers (integers?), and other BoundInt objects
…
Which, as far as I'm aware requires the use of rather large/ugly if/else type checking statements within the BoundInt() constructor, as python does not support (c style) overloading.
Ah, but think about what you're doing: You're constructing a BoundInt from anything that can act like an integer, including, say, an actual int or a BoundInt, right? So, why not:
def __init__(self, target, low, high):
    self.target, self.low, self.high = int(target), int(low), int(high)
I'm assuming you've already added a __int__ method to BoundInt, of course (the equivalent of a C++ explicit operator int() const).
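A minimal sketch of that idea, in the question's Python 2.7 (the clamping line just carries over the original _balance logic):

class BoundInt(object):
    def __init__(self, target=0, low=0, high=1):
        # int() accepts anything sufficiently int-like: plain ints, other
        # BoundInts (via __int__ below), and so on -- no isinstance checks needed.
        self.low, self.high = int(low), int(high)
        self._value = min(self.high, max(self.low, int(target)))

    def __int__(self):
        return self._value

x = BoundInt(7, 0, 10)
y = BoundInt(x, 0, 5)  # constructing from another BoundInt just works
print int(y)           # prints 5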
Also, keep in mind that the lack of overloading isn't as serious as you're thinking coming from C++, because there is no "copy constructor" for making copies; you just pass the object around, and all that gets taken care of under the covers.
For example, imagine this C++ code:
BoundInt foo(BoundInt param) { BoundInt local = param; return local; }
BoundInt bar;
BoundInt baz = foo(bar);
This copies bar to param, param to local, local to an unnamed "return value" variable, and that to baz. Some of these will be optimized out, and others (in C++11) will use move instead of copy, but still, you've got 4 conceptual invocations of the copy/move constructors/assignment operators.
Now look at the Python equivalent:
def foo(param): local = param; return local
bar = BoundInt();
baz = foo(bar)
Here, we've just got one BoundInt instance—the one that was explicitly created—and all we're doing is binding new names to it. Even assigning baz as a member of a new object that outlives the scope of bar and baz won't make a copy. The only thing that makes a copy is explicitly calling BoundInt(baz) again. (This isn't quite 100% true, because someone can always inspect your object and attempt to clone it from the outside, and pickle, deepcopy, etc. may actually do so… but in that case, they're still not calling a "copy constructor" that you or the compiler wrote.)
Now, what about forwarding all those operators to the value?
Well, one possibility is to do it dynamically. The details depend on whether you're in Python 3 or 2 (and, for 2, how far back you need to support). But the idea is you just have a list of names, and for each one, you define a method with that name that calls the method of the same name on the value object. If you want a sketch of this, provide the extra info and ask, but you're probably better off looking for examples of dynamic method creation.
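For concreteness, here is a rough Python 2.7 sketch of that dynamic approach, written against the question's original BoundInt class (its lowerLimit/upperLimit attributes and value() method); the list of names and the re-wrapping policy are illustrative assumptions, not a complete treatment:

_FORWARDED = ['__add__', '__radd__', '__sub__', '__rsub__',
              '__mul__', '__rmul__', '__div__', '__rdiv__']  # extend as needed

def _make_forwarder(name):
    def forwarder(self, other):
        # unwrap to plain ints, let int do the real work, then re-wrap
        other_value = other.value() if isinstance(other, BoundInt) else other
        result = getattr(self.value(), name)(other_value)
        if result is NotImplemented:
            return NotImplemented
        return BoundInt(result, self.lowerLimit, self.upperLimit)
    forwarder.__name__ = name
    return forwarder

for _name in _FORWARDED:
    setattr(BoundInt, _name, _make_forwarder(_name))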
So, is that Pythonic? Well, it depends.
If you're creating dozens of "integer-like" classes, then yes, it's certainly better than copy-paste code or adding a "compile-time" generation step, and it's probably better than adding an otherwise-unnecessary base class.
And if you're trying to work across many versions of Python and don't want to have to remember "which version am I supposed to stop supplying __cmp__ to act like int again?" type questions, I might go even further and get the list of methods out of int itself (take dir(int()) and blacklist out a few names).
But if you're just doing this one class, in, say, just Python 2.6-2.7 or just 3.3+, I think it's a toss-up.
A good class to read is the fractions.Fraction class in the standard library. It's clearly-written pure Python code. And it partially demonstrates both the dynamic and explicit mechanisms (because it explicitly defines each special method in terms of generic dynamic forwarding functions), and if you've got both 2.x and 3.x around you can compare and contrast the two.
Meanwhile, it seems like your class is underspecified. If x is a BoundInt and y is an int, should x+y really return an int (as it does in your code)? If not, do you need to bound it? What about y+x? What should x+=y do? And so on.
Finally, in Python, it's often worth making "value classes" like this immutable, even if the intuitive C++ equivalent would be mutable. For example, consider this:
>>> i = BoundInt(3, 0, 10)
>>> j = i
>>> i.set(5)
>>> j
5
I don't think you'd expect this. This wouldn't happen in C++ (for a typical value class), because j = i would create a new copy, but in Python, it's just binding a new name to the same copy. (It's equivalent to BoundInt &j = i, not BoundInt j = i.)
If you want BoundInt to be immutable, besides eliminating obvious things like set, also make sure not to implement __iadd__ and friends. If you leave out __iadd__, i += 2 will be turned into i = i.__add__(2): in other words, it will create a new instance, then rebind i to that new instance, leaving the old one alone.
There are likely many opinions on this. But regarding the proliferation of special methods, you will just have to do that to make it complete. But at least you only do it once, in one place. Also, the built-in number types can be subclassed. That's what I did for a similar implementation, which you can look at.
Your set method is an abomination. You do not create a number with a default value of zero and then change the number into some other number. That is very much trying to program C++ in Python, and will cause you endless amounts of headaches if you actually want to treat these the same way you do numbers, because every time you pass them to functions they are passed by reference (like everything in Python). So you'll end up with large amounts of aliasing in things you think you can treat like numbers, and you will almost certainly encounter bugs due to mutating the value of numbers you don't realise are aliased, or expecting to be able to retrieve a value stored in a dictionary with a BoundInt as a key by providing another BoundInt with the same value.
To me, high and low aren't data values associated with a particular BoundInt value, they're type parameters. I want a number 7 in the type BoundInt(1, 10), not a number 7 which is constrained to be between 1 and 10, all of which being a value in the type BoundInt.
If I really wanted to do something like this, the approach I would take would be to subclass int and treat BoundInt as a class factory; you give it a range, and it gives you the type of integers restricted to be in that range. You can apply that type to any "int-like" object and it will give you a value clamped to that range. Something like:
_bound_int_cache = {}

def BoundInt(low, high):
    try:
        return _bound_int_cache[(low, high)]
    except KeyError:
        class Tmp(int):
            def __new__(cls, value):
                value = max(value, cls.low)
                value = min(value, cls.high)
                return int.__new__(cls, value)
        # set the bounds after the class statement so the names don't shadow themselves
        Tmp.low = low
        Tmp.high = high
        Tmp.__name__ = 'BoundInt({}, {})'.format(low, high)
        _bound_int_cache[(low, high)] = Tmp
        return _bound_int_cache[(low, high)]
(The cache is just to ensure that two different attempts to get the BoundInt type for the same low/high values give you the exact same class, not two different classes that behave the same way. Probably wouldn't matter in practice most of the time, but it seems nicer.)
You would use this like:
B = BoundInt(1, 10)
x = B(7)
The "class factory" approach means that if you have a small number of meaningful ranges in which you want to bound your integers, you can create the classes for those ranges globally (with meaningful names), and then use them exactly like regular classes.
Subclassing int makes these objects immutable (which is why the initialisation had to be done in __new__), which frees you from aliasing bugs (which people don't expect to have to worry about when they're programming with simple value types like numbers, and for good reason). It also gives you all the integer methods for free, and so these BoundInt types behave exactly as int, except that when you create one the value is clamped by the type. Unfortunately that means that all operations on these types return int objects, not BoundInt objects.
If you could come up with a way of reconciling the low/high values for the two different values involved in e.g. x + y, then you can override the special methods to make them return BoundInt values. The approaches that spring to mind are:
Take the left operand's bounds and ignore the right operand (seems messy and asymmetrical; violates the assumption that x + y = y + x)
Take the maximum low value and the minimum high value. It's nicely symmetrical, and you can treat numeric values that don't have low and high values as if they were sys.minint and sys.maxint (i.e. just use the bounds from the other value). Doesn't make a whole lot of sense if the ranges don't overlap at all, because you'll end up with an empty range, but operating on such numbers together probably doesn't make a whole lot of sense anyway.
Take the minimum low value and the maximum high value. Also symmetrical, but here you probably want to explicitly ignore normal numbers rather than pretending they're BoundInt values that can range over the whole integer range.
Any of the above could work, and any of the above will probably surprise you at some point (e.g. negating a number constrained to be in a positive range will always give you the smallest positive number in the range, which seems weird to me).
If you take this approach, you probably don't want to subclass int. Because if you have normalInt + boundedInt, then normalInt would handle the addition without respecting your code. You instead want it not to recognise boundedInt as an int value, so that int's __add__ won't work and will give your class a chance to try __radd__. But I would still treat your class as "immutable", and make every operation that comes up with a new number construct a new object; mutating numbers in place is practically guaranteed to cause bugs sometime.
So I'd handle that approach something like this:
class BoundIntBase(object):
    # Don't use this class directly; use a subclass that specifies low and high as
    # class attributes.
    def __init__(self, value):
        self.value = min(self.high, max(self.low, int(value)))

    def __int__(self):
        return self.value

# add binary operations to BoundInt
for method in ['__add__', '__radd__', ...]:
    def tmp(self, other, method=method):  # bind method now, not when tmp is called
        try:
            low = min(self.low, other.low)
            high = max(self.high, other.high)
        except AttributeError:
            cls = type(self)
        else:
            cls = BoundInt(low, high)
        v = getattr(int(self), method)(int(other))
        return cls(v)
    tmp.__name__ = method
    setattr(BoundIntBase, method, tmp)
_bound_int_cache = {}

def BoundInt(low, high):
    try:
        return _bound_int_cache[(low, high)]
    except KeyError:
        # each distinct (low, high) pair gets its own BoundIntBase subclass
        class Tmp(BoundIntBase):
            pass
        Tmp.low = low
        Tmp.high = high
        Tmp.__name__ = 'BoundInt({}, {})'.format(low, high)
        _bound_int_cache[(low, high)] = Tmp
        return _bound_int_cache[(low, high)]
Still seems like more code than it should be, but what you're trying to do is actually more complicated than you think it is.
A type that behaves exactly like numbers in all situations needs many special methods, due to the rich syntax support in Python (it seems no other types require so many methods; e.g., it is much simpler to define types that behave like a list or dict in Python: a couple of methods and you have a Sequence). There are several ways to make the code less repetitive.
ABC classes such as numbers.Integral provide default implementations for some methods: e.g., if __add__ and __radd__ are implemented in a subclass, then __sub__ and __rsub__ are available automatically.
fractions.Fraction uses _operator_fallbacks to define the __r*__ methods and provide fallback operators to deal with other numeric types:
__op__, __rop__ = _operator_fallbacks(monomorphic_operator, operator.op)
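A much-simplified sketch of that fallback pattern (this is not the actual fractions source; the MyNum type and its float-based coercion are illustrative assumptions):

import operator

def _operator_fallbacks(monomorphic_operator, fallback_operator):
    def forward(a, b):
        if isinstance(b, MyNum):
            return monomorphic_operator(a, b)             # both operands are MyNum
        elif isinstance(b, (int, float)):
            return fallback_operator(float(a), float(b))  # degrade to float
        return NotImplemented
    forward.__name__ = '__' + fallback_operator.__name__ + '__'

    def reverse(b, a):                                    # b is the MyNum instance
        if isinstance(a, (int, float)):
            return fallback_operator(float(a), float(b))
        return NotImplemented
    reverse.__name__ = '__r' + fallback_operator.__name__ + '__'

    return forward, reverse

class MyNum(object):
    def __init__(self, value):
        self.value = value
    def __float__(self):
        return float(self.value)
    def _add(a, b):
        return MyNum(a.value + b.value)
    __add__, __radd__ = _operator_fallbacks(_add, operator.add)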
Python allows you to generate/modify a class dynamically in a factory function or metaclass, e.g., see "Can anyone help condense this Python code?". Even exec could be used in (very) rare cases, e.g., namedtuple().
Numbers are immutable in Python so you should use __new__ instead of __init__.
Rare cases that are not covered by __new__ could be defined in from_sometype(cls, d: sometype) -> your_type class methods. And in reverse, cases that are not covered by special methods could use as_sometype(self) -> sometype methods.
A simpler solution in your case might be to define a higher-level type specific to your application domain. Number abstraction might be too low-level; e.g., decimal.Decimal is more than 6 KLOC.

What's the point of def some_method(param: int) syntax?

Specifically the ":int" part...
I assumed it somehow checked the type of the parameter at the time the function is called and perhaps raised an exception in the case of a violation. But the following run without problems:
def some_method(param: str):
    print("blah")

some_method(1)

def some_method(param: int):
    print("blah")

some_method("asdfaslkj")
In both cases "blah" is printed - no exception raised.
I'm not sure what the name of the feature is so I wasn't sure what to google.
EDIT: OK, so it's http://www.python.org/dev/peps/pep-3107/. I can see how it'd be useful in frameworks that utilize metadata. It's not what I assumed it was. Thanks for the responses!
FOLLOW-UP QUESTION - Any thoughts on whether it's a good idea or bad idea to define my functions as def some_method(param:int) if I really only can handle int inputs - even if, as pep 3107 explains, it's just metadata - no enforcement as I originally assumed? At least the consumers of the methods will see clearly what I intended. It's an alternative to documentation. Think this is good/bad/waste of time? Granted, good parameter naming (unlike my contrived example) usually makes it clear what types are meant to be passed in.
it's not used for anything much - it's just there for experimentation (you can read them from within python if you want, for example). they are called "function annotations" and are described in pep 3107.
i wrote a library that builds on it to do things like type checking (and more - for example you can map more easily from JSON to python objects) called pytyp (more info), but it's not very popular... (i should also add that the type checking part of pytyp is not at all efficient - it can be useful for tracking down a bug, but you wouldn't want to use it across an entire program).
[update: i would not recommend using function annotations in general (ie with no particular use in mind, just as docs) because (1) they might eventually get used in a way that you didn't expect and (2) the exact type of things is often not that important in python (more exactly, it's not always clear how best to specify the type of something in a useful way - objects can be quite complex, and often only "parts" are used by any one function, with multiple classes implementing those parts in different ways...). this is a consequence of duck typing - see the "more info" link for related discussion on how python's abstract base classes could be used to tackle this...]
Function annotations are what you make of them.
They can be used for documentation:
def kinetic_energy(mass: 'in kilograms', velocity: 'in meters per second'):
    ...
They can be used for pre-condition checking:
def validate(func, locals):
    for var, test in func.__annotations__.items():
        value = locals[var]
        msg = 'Var: {0}\tValue: {1}\tTest: {2.__name__}'.format(var, value, test)
        assert test(value), msg

def is_int(x):
    return isinstance(x, int)

def between(lo, hi):
    def _between(x):
        return lo <= x <= hi
    return _between

def f(x: between(3, 10), y: is_int):
    validate(f, locals())
    print(x, y)
>>> f(0, 31.1)
Traceback (most recent call last):
...
AssertionError: Var: y Value: 31.1 Test: is_int
Also see http://www.python.org/dev/peps/pep-0362/ for a way to implement type checking.
Not experienced in python, but I assume the point is to annotate/declare the parameter type that the method expects. Whether or not the expected type is rigidly enforced at runtime is beside the point.
For instance, consider:
intToHexString(param:int)
Although the language may technically allow you to call intToHexString("Hello"), it's not semantically meaningful to do so. Having the :int as part of the method declaration helps to reinforce that.
It's basically just used for documentation. When someone examines the method signature, they'll see that param is labelled as an int, which will tell them the author of the method expected them to pass an int.
Because Python programmers use duck typing, this doesn't mean you have to pass an int, but it tells you the code is expecting something "int-like". So you'll probably have to pass something basically "numeric" in nature, that supports arithmetic operations. Depending on the method it may have to be usable as an index, or it may not.
However, because it's syntax and not just a comment, the annotation is visible to any code that wants to introspect it. This opens up the possibility of writing a typecheck decorator that can enforce strict type checking on arbitrary functions; this allows you to put the type checking logic in one place, and have each method declare which parameters it wants strictly type checked (by attaching a type annotation) with a minimum on syntax, in a way that is visible to client programmers who are browsing method definitions to find out the interface.
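For example, here is a minimal sketch of such a decorator (purely illustrative; it only handles annotations that are plain classes and uses the inspect module to bind arguments to parameter names):

import functools
import inspect

def typecheck(func):
    sig = inspect.signature(func)
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = func.__annotations__.get(name)
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError('{} must be {}, got {!r}'.format(
                    name, expected.__name__, value))
        return func(*args, **kwargs)
    return wrapper

@typecheck
def mul(a: int, b: int):
    return a * b

mul(2, 3)       # fine
mul(2, 'oops')  # raises TypeError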
Or you could do other things with those annotations. No standardized meaning has yet been developed. Maybe if someone comes up with a killer feature that uses them and has huge adoption, then it'll one day become part of the Python language, but I suspect the flexibility of using them however you want will be too useful to ever do that.
You might also use the "-> returnValue" notation to indicate what type the function might return.
def mul(a: int, b: int) -> None:
    print(a * b)

The advantages of having static function like len(), max(), and min() over inherited method calls

i am a python newbie, and i am not sure why python implemented len(obj), max(obj), and min(obj) as static-like functions (i am from the java language) rather than obj.len(), obj.max(), and obj.min()
what are the advantages and disadvantages (other than the obvious inconsistency) of having len()... over the method calls?
why did guido choose this over the method calls? (this could have been solved in python3 if needed, but it wasn't changed in python3, so there gotta be good reasons...i hope)
thanks!!
The big advantage is that built-in functions (and operators) can apply extra logic when appropriate, beyond simply calling the special methods. For example, min can look at several arguments and apply the appropriate inequality checks, or it can accept a single iterable argument and proceed similarly; abs, when called on an object without a special method __abs__, could try comparing said object with 0 and using the object's change-sign method if needed (though it currently doesn't); and so forth.
So, for consistency, all operations with wide applicability must always go through built-ins and/or operators, and it's those built-ins' responsibility to look up and apply the appropriate special methods (on one or more of the arguments), use alternate logic where applicable, and so forth.
An example where this principle wasn't correctly applied (but the inconsistency was fixed in Python 3) is "step an iterator forward": in 2.5 and earlier, you needed to define and call the non-specially-named next method on the iterator. In 2.6 and later you can do it the right way: the iterator object defines __next__, the new next built-in can call it and apply extra logic, for example to supply a default value (in 2.6 you can still do it the bad old way, for backwards compatibility, though in 3.* you can't any more).
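For example, the extra logic the built-in adds is what makes a default value possible:

it = iter([1, 2])
print(next(it))          # 1
print(next(it))          # 2
print(next(it, 'done'))  # 'done' -- returns the default instead of raising StopIteration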
Another example: consider the expression x + y. In a traditional object-oriented language (able to dispatch only on the type of the leftmost argument -- like Python, Ruby, Java, C++, C#, &c) if x is of some built-in type and y is of your own fancy new type, you're sadly out of luck if the language insists on delegating all the logic to the method of type(x) that implements addition (assuming the language allows operator overloading;-).
In Python, the + operator (and similarly of course the builtin operator.add, if that's what you prefer) tries x's type's __add__, and if that one doesn't know what to do with y, then tries y's type's __radd__. So you can define your types that know how to add themselves to integers, floats, complex, etc etc, as well as ones that know how to add such built-in numeric types to themselves (i.e., you can code it so that x + y and y + x both work fine, when y is an instance of your fancy new type and x is an instance of some builtin numeric type).
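A tiny illustration of that protocol (the Money class here is hypothetical):

class Money(object):
    def __init__(self, cents):
        self.cents = cents
    def __add__(self, other):   # Money + int
        return Money(self.cents + int(other))
    def __radd__(self, other):  # int + Money: tried once int's __add__ returns NotImplemented
        return Money(int(other) + self.cents)
    def __repr__(self):
        return 'Money(%d)' % self.cents

print(Money(5) + 3)  # Money(8)
print(3 + Money(5))  # Money(8)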
"Generic functions" (as in PEAK) are a more elegant approach (allowing any overriding based on a combination of types, never with the crazy monomaniac focus on the leftmost arguments that OOP encourages!-), but (a) they were unfortunately not accepted for Python 3, and (b) they do of course require the generic function to be expressed as free-standing (it would be absolutely crazy to have to consider the function as "belonging" to any single type, where the whole POINT is that can be differently overridden/overloaded based on arbitrary combination of its several arguments' types!-). Anybody who's ever programmed in Common Lisp, Dylan, or PEAK, knows what I'm talking about;-).
So, free-standing functions and operators are just THE right, consistent way to go (even though the lack of generic functions, in bare-bones Python, does remove some fraction of the inherent elegance, it's still a reasonable mix of elegance and practicality!-).
It emphasizes the capabilities of an object, not its methods or type. Capabilities are declared by "helper" functions such as __iter__ and __len__ but they don't make up the interface. The interface is in the built-in functions, and besides these, also in the built-in operators like + and [] for indexing and slicing.
Sometimes, it is not a one-to-one correspondance: For example, iter(obj) returns an iterator for an object, and will work even if __iter__ is not defined. If not defined, it goes on to look if the object defines __getitem__ and will return an iterator accessing the object index-wise (like an array).
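A quick sketch of that fallback (illustrative only):

class Squares(object):
    # no __iter__ here; iter() falls back to index-wise access via __getitem__
    def __getitem__(self, index):
        if index >= 5:
            raise IndexError
        return index * index

print(list(iter(Squares())))  # [0, 1, 4, 9, 16]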
This goes together with Python's duck typing: we care only about what we can do with an object, not that it is of a particular type.
Actually, those aren't "static" methods in the way you are thinking about them. They are built-in functions that really just alias to certain methods on python objects that implement them.
>>> class Foo(object):
...     def __len__(self):
...         return 42
...
>>> f = Foo()
>>> len(f)
42
These are always available to be called whether or not the object implements them. The point is to have some consistency. Instead of some class having a method called length() and another called size(), the convention is to implement __len__ and let the callers always access it by the more readable len(obj) instead of obj.methodThatDoesSomethingCommon.
I thought the reason was so these basic operations could be done on iterators with the same interface as containers. However, it actually doesn't work with len:
def foo():
    for i in range(10):
        yield i

print len(foo())
... fails with TypeError. len() won't consume and count an iterator; it only works with objects that have a __len__ call.
So, as far as I'm concerned, len() shouldn't exist. It's much more natural to say obj.len than len(obj), and much more consistent with the rest of the language and the standard library. We don't say append(lst, 1); we say lst.append(1). Having a separate global method for length is an odd, inconsistent special case, and eats a very obvious name in the global namespace, which is a very bad habit of Python.
This is unrelated to duck typing; you can say getattr(obj, "len") to decide whether you can use len on an object just as easily--and much more consistently--than you can use getattr(obj, "__len__").
All that said, as language warts go--for those who consider this a wart--this is a very easy one to live with.
On the other hand, min and max do work on iterators, which gives them a use apart from any particular object. This is straightforward, so I'll just give an example:
import random

def foo():
    for i in range(10):
        yield random.randint(0, 100)

print max(foo())
However, there are no __min__ or __max__ methods to override its behavior, so there's no consistent way to provide efficient searching for sorted containers. If a container is sorted on the same key that you're searching, min/max are O(1) operations instead of O(n), and the only way to expose that is by a different, inconsistent method. (This could be fixed in the language relatively easily, of course.)
To follow up with another issue with this: it prevents use of Python's method binding. As a simple, contrived example, you can do this to supply a function to add values to a list:
def add(f):
    f(1)
    f(2)
    f(3)

lst = []
add(lst.append)
print lst
and this works on all member functions. You can't do that with min, max or len, though, since they're not methods of the object they operate on. Instead, you have to resort to functools.partial, a clumsy second-class workaround common in other languages.
Of course, this is an uncommon case; but it's the uncommon cases that tell us about a language's consistency.
