Related
In Fluent Python they say
You write len(my_object) and, if my_object is an instance of a user-defined class, then Python calls the __len__ instance method you implemented ...
But for built-in types like list, str, bytearray, and so on, the interpreter takes a shortcut: the CPython implementation of len() actually returns the value of the ob_size field in the PyVarObject C struct that represent the any variable-sized built-in object in memory.
But __len__ is still defined on at least some of these ...
>>> 'string'.__len__()
6
>>> [1, 2, 3].__len__()
3
What purpose do these methods serve if they're not called by len? And if are there other similar examples, what about them?
The fact that the CPython implementation doesn't call __len__() by default doesn't meant that there aren't any Python libraries or user programs out there which do. Keeping the method available is about API stability, even if CPython takes shortcuts where possible.
This question already has answers here:
Why does Python code use len() function instead of a length method?
(7 answers)
Closed 8 years ago.
The "intuitive" way of getting an iterator for someone who usually programs in Java, C++, etc is something like list.iterator().
Why did the Python folks choose to have it as a general function like len() (which results in iter(list) rather than list.iter())?
The same question can be asked for the length of a construct as well (len()).
iter() supports different types of objects.
You can pass in either a sequence (supporting length and item access) or an iterable (which produces an iterator by calling obj.__iter__()) or an iterator (which returns self from __iter__).
The Java list.iter() then is served by list.__iter__() in Python, but the iter() function allows for more types. You can customise the behaviour with a __iter__ method but if you implemented a sequence instead, things will still work.
There is also a second form of the function where a callable and a sentinel are passed in:
iter(fileobj.readline, '')
iterates over a file object by calling the readline() method until it returns an empty string (equal to the second argument, the sentinel).
Then there is the Principle of Least Astonishment argument; iter() gives the standard library a stable API call to standardise on, just like operators do; no need to look up the documentation of the class to see if it implemented obj.iter() or obj.iterator() or obj.get_iterator().
I'm working on this project which deals with vectors in python. But I'm new to python and don't really know how to crack it. Here's the instruction:
"Add a constructor to the Vector class. The constructor should take a single argument. If this argument is either an int or a long or an instance of a class derived from one of these, then consider this argument to be the length of the Vector instance. In this case, construct a Vector of the specified length with each element is initialized to 0.0. If the length is negative, raise a ValueError with an appropriate message. If the argument is not considered to be the length, then if the argument is a sequence (such as a list), then initialize with vector with the length and values of the given sequence. If the argument is not used as the length of the vector and if it is not a sequence, then raise a TypeError with an appropriate message.
Next implement the __repr__ method to return a string of python code which could be used to initialize the Vector. This string of code should consist of the name of the class followed by an open parenthesis followed by the contents of the vector represented as a list followed by a close parenthesis."
I'm not sure how to do the class type checking, as well as how to initialize the vector based on the given object. Could someone please help me with this? Thanks!
Your instructor seems not to "speak Python as a native language". ;) The entire concept for the class is pretty silly; real Python programmers just use the built-in sequence types directly. But then, this sort of thing is normal for academic exercises, sadly...
Add a constructor to the Vector class.
In Python, the common "this is how you create a new object and say what it's an instance of" stuff is handled internally by default, and then the baby object is passed to the class' initialization method to make it into a "proper" instance, by setting the attributes that new instances of the class should have. We call that method __init__.
The constructor should take a single argument. If this argument is either an int or a long or an instance of a class derived from one of these
This is tested by using the builtin function isinstance. You can look it up for yourself in the documentation (or try help(isinstance) at the REPL).
In this case, construct a Vector of the specified length with each element is initialized to 0.0.
In our __init__, we generally just assign the starting values for attributes. The first parameter to __init__ is the new object we're initializing, which we usually call "self" so that people understand what we're doing. The rest of the arguments are whatever was passed when the caller requested an instance. In our case, we're always expecting exactly one argument. It might have different types and different meanings, so we should give it a generic name.
When we detect that the generic argument is an integer type with isinstance, we "construct" the vector by setting the appropriate data. We just assign to some attribute of self (call it whatever makes sense), and the value will be... well, what are you going to use to represent the vector's data internally? Hopefully you've already thought about this :)
If the length is negative, raise a ValueError with an appropriate message.
Oh, good point... we should check that before we try to construct our storage. Some of the obvious ways to do it would basically treat a negative number the same as zero. Other ways might raise an exception that we don't get to control.
If the argument is not considered to be the length, then if the argument is a sequence (such as a list), then initialize with vector with the length and values of the given sequence.
"Sequence" is a much fuzzier concept; lists and tuples and what-not don't have a "sequence" base class, so we can't easily check this with isinstance. (After all, someone could easily invent a new kind of sequence that we didn't think of). The easiest way to check if something is a sequence is to try to create an iterator for it, with the built-in iter function. This will already raise a fairly meaningful TypeError if the thing isn't iterable (try it!), so that makes the error handling easy - we just let it do its thing.
Assuming we got an iterator, we can easily create our storage: most sequence types (and I assume you have one of them in mind already, and that one is certainly included) will accept an iterator for their __init__ method and do the obvious thing of copying the sequence data.
Next implement the __repr__ method to return a string of python code which could be used to initialize the Vector. This string of code should consist of the name of the class followed by an open parenthesis followed by the contents of the vector represented as a list followed by a close parenthesis."
Hopefully this is self-explanatory. Hint: you should be able to simplify this by making use of the storage attribute's own __repr__. Also consider using string formatting to put the string together.
Everything you need to get started is here:
http://docs.python.org/library/functions.html
There are many examples of how to check types in Python on StackOverflow (see my comment for the top-rated one).
To initialize a class, use the __init__ method:
class Vector(object):
def __init__(self, sequence):
self._internal_list = list(sequence)
Now you can call:
my_vector = Vector([1, 2, 3])
And inside other functions in Vector, you can refer to self._internal_list. I put _ before the variable name to indicate that it shouldn't be changed from outside the class.
The documentation for the list function may be useful for you.
You can do the type checking with isinstance.
The initialization of a class with done with an __init__ method.
Good luck with your assignment :-)
This may or may not be appropriate depending on the homework, but in Python programming it's not very usual to explicitly check the type of an argument and change the behaviour based on that. It's more normal to just try to use the features you expect it to have (possibly catching exceptions if necessary to fall back to other options).
In this particular example, a normal Python programmer implementing a Vector that needed to work this way would try using the argument as if it were an integer/long (hint: what happens if you multiply a list by an integer?) to initialize the Vector and if that throws an exception try using it as if it were a sequence, and if that failed as well then you can throw a TypeError.
The reason for doing this is that it leaves your class open to working with other objects types people come up with later that aren't integers or sequences but work like them. In particular it's very difficult to comprehensively check whether something is a "sequence", because user-defined classes that can be used as sequences don't have to be instances of any common type you can check. The Vector class itself is quite a good candidate for using to initialize a Vector, for example!
But I'm not sure if this is the answer your teacher is expecting. If you haven't learned about exception handling yet, then you're almost certainly not meant to use this approach so please ignore my post. Good luck with your learning!
I'm a bit surprised by Python's extensive use of 'magic methods'.
For example, in order for a class to declare that instances have a "length", it implements a __len__ method, which it is called when you write len(obj). Why not just define a len method which is called directly as a member of the object, e.g. obj.len()?
See also: Why does Python code use len() function instead of a length method?
AFAIK, len is special in this respect and has historical roots.
Here's a quote from the FAQ:
Why does Python use methods for some
functionality (e.g. list.index()) but
functions for other (e.g. len(list))?
The major reason is history. Functions
were used for those operations that
were generic for a group of types and
which were intended to work even for
objects that didn’t have methods at
all (e.g. tuples). It is also
convenient to have a function that can
readily be applied to an amorphous
collection of objects when you use the
functional features of Python (map(),
apply() et al).
In fact, implementing len(), max(),
min() as a built-in function is
actually less code than implementing
them as methods for each type. One can
quibble about individual cases but
it’s a part of Python, and it’s too
late to make such fundamental changes
now. The functions have to remain to
avoid massive code breakage.
The other "magical methods" (actually called special method in the Python folklore) make lots of sense, and similar functionality exists in other languages. They're mostly used for code that gets called implicitly when special syntax is used.
For example:
overloaded operators (exist in C++ and others)
constructor/destructor
hooks for accessing attributes
tools for metaprogramming
and so on...
From the Zen of Python:
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
This is one of the reasons - with custom methods, developers would be free to choose a different method name, like getLength(), length(), getlength() or whatsoever. Python enforces strict naming so that the common function len() can be used.
All operations that are common for many types of objects are put into magic methods, like __nonzero__, __len__ or __repr__. They are mostly optional, though.
Operator overloading is also done with magic methods (e.g. __le__), so it makes sense to use them for other common operations, too.
Python uses the word "magic methods", because those methods really performs magic for you program. One of the biggest advantages of using Python's magic methods is that they provide a simple way to make objects behave like built-in types. That means you can avoid ugly, counter-intuitive, and nonstandard ways of performing basic operators.
Consider a following example:
dict1 = {1 : "ABC"}
dict2 = {2 : "EFG"}
dict1 + dict2
Traceback (most recent call last):
File "python", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'dict' and 'dict'
This gives an error, because the dictionary type doesn't support addition. Now, let's extend dictionary class and add "__add__" magic method:
class AddableDict(dict):
def __add__(self, otherObj):
self.update(otherObj)
return AddableDict(self)
dict1 = AddableDict({1 : "ABC"})
dict2 = AddableDict({2 : "EFG"})
print (dict1 + dict2)
Now, it gives following output.
{1: 'ABC', 2: 'EFG'}
Thus, by adding this method, suddenly magic has happened and the error you were getting earlier, has gone away.
I hope, it makes things clear to you. For more information, refer to:
A Guide to Python's Magic Methods (Rafe Kettler, 2012)
Some of these functions do more than a single method would be able to implement (without abstract methods on a superclass). For instance bool() acts kind of like this:
def bool(obj):
if hasattr(obj, '__nonzero__'):
return bool(obj.__nonzero__())
elif hasattr(obj, '__len__'):
if obj.__len__():
return True
else:
return False
return True
You can also be 100% sure that bool() will always return True or False; if you relied on a method you couldn't be entirely sure what you'd get back.
Some other functions that have relatively complicated implementations (more complicated than the underlying magic methods are likely to be) are iter() and cmp(), and all the attribute methods (getattr, setattr and delattr). Things like int also access magic methods when doing coercion (you can implement __int__), but do double duty as types. len(obj) is actually the one case where I don't believe it's ever different from obj.__len__().
They are not really "magic names". It's just the interface an object has to implement to provide a given service. In this sense, they are not more magic than any predefined interface definition you have to reimplement.
While the reason is mostly historic, there are some peculiarities in Python's len that make the use of a function instead of a method appropriate.
Some operations in Python are implemented as methods, for example list.index and dict.append, while others are implemented as callables and magic methods, for example str and iter and reversed. The two groups differ enough so the different approach is justified:
They are common.
str, int and friends are types. It makes more sense to call the constructor.
The implementation differs from the function call. For example, iter might call __getitem__ if __iter__ isn't available, and supports additional arguments that don't fit in a method call. For the same reason it.next() has been changed to next(it) in recent versions of Python - it makes more sense.
Some of these are close relatives of operators. There's syntax for calling __iter__ and __next__ - it's called the for loop. For consistency, a function is better. And it makes it better for certain optimisations.
Some of the functions are simply way too similar to the rest in some way - repr acts like str does. Having str(x) versus x.repr() would be confusing.
Some of them rarely use the actual implementation method, for example isinstance.
Some of them are actual operators, getattr(x, 'a') is another way of doing x.a and getattr shares many of the aforementioned qualities.
I personally call the first group method-like and the second group operator-like. It's not a very good distinction, but I hope it helps somehow.
Having said this, len doesn't exactly fit in the second group. It's more close to the operations in the first one, with the only difference that it's way more common than almost any of them. But the only thing that it does is calling __len__, and it's very close to L.index. However, there are some differences. For example, __len__ might be called for the implementation of other features, such as bool, if the method was called len you might break bool(x) with custom len method that does completely different thing.
In short, you have a set of very common features that classes might implement that might be accessed through an operator, through a special function (that usually does more than the implementation, as an operator would), during object construction, and all of them share some common traits. All the rest is a method. And len is somewhat of an exception to that rule.
There is not a lot to add to the above two posts, but all the "magic" functions are not really magic at all. They are part of the __ builtins__ module which is implicitly/automatically imported when the interpreter starts. I.e.:
from __builtins__ import *
happens every time before your program starts.
I always thought it would be more correct if Python only did this for the interactive shell, and required scripts to import the various parts from builtins they needed. Also probably different __ main__ handling would be nice in shells vs interactive. Anyway, check out all the functions, and see what it is like without them:
dir (__builtins__)
...
del __builtins__
Perhaps, you have noticed it is possible to use certain built-in methods (ex. len(my_list_or_my_string)), and syntaxes (ex. my_list_or_my_string[:3], my_fancy_dict['some_key']) on some native types such as list, dict. Maybe you have been curious as to why it is not possible (yet) to use these same syntaxes on some of the classes you have written.
Variables of native types (list, dict, int, str) have unique behaviours and respond to certain syntaxes because they have some special methods defined in their respective classes — these methods are called Magic Methods.
A few magic methods include: __len__, __gt__, __eq__, etc.
Read more here: https://tomisin.dev/blog/supercharging-python-classes-with-magic-methods
i am a python newbie, and i am not sure why python implemented len(obj), max(obj), and min(obj) as a static like functions (i am from the java language) over obj.len(), obj.max(), and obj.min()
what are the advantages and disadvantages (other than obvious inconsistency) of having len()... over the method calls?
why guido chose this over the method calls? (this could have been solved in python3 if needed, but it wasn't changed in python3, so there gotta be good reasons...i hope)
thanks!!
The big advantage is that built-in functions (and operators) can apply extra logic when appropriate, beyond simply calling the special methods. For example, min can look at several arguments and apply the appropriate inequality checks, or it can accept a single iterable argument and proceed similarly; abs when called on an object without a special method __abs__ could try comparing said object with 0 and using the object change sign method if needed (though it currently doesn't); and so forth.
So, for consistency, all operations with wide applicability must always go through built-ins and/or operators, and it's those built-ins responsibility to look up and apply the appropriate special methods (on one or more of the arguments), use alternate logic where applicable, and so forth.
An example where this principle wasn't correctly applied (but the inconsistency was fixed in Python 3) is "step an iterator forward": in 2.5 and earlier, you needed to define and call the non-specially-named next method on the iterator. In 2.6 and later you can do it the right way: the iterator object defines __next__, the new next built-in can call it and apply extra logic, for example to supply a default value (in 2.6 you can still do it the bad old way, for backwards compatibility, though in 3.* you can't any more).
Another example: consider the expression x + y. In a traditional object-oriented language (able to dispatch only on the type of the leftmost argument -- like Python, Ruby, Java, C++, C#, &c) if x is of some built-in type and y is of your own fancy new type, you're sadly out of luck if the language insists on delegating all the logic to the method of type(x) that implements addition (assuming the language allows operator overloading;-).
In Python, the + operator (and similarly of course the builtin operator.add, if that's what you prefer) tries x's type's __add__, and if that one doesn't know what to do with y, then tries y's type's __radd__. So you can define your types that know how to add themselves to integers, floats, complex, etc etc, as well as ones that know how to add such built-in numeric types to themselves (i.e., you can code it so that x + y and y + x both work fine, when y is an instance of your fancy new type and x is an instance of some builtin numeric type).
"Generic functions" (as in PEAK) are a more elegant approach (allowing any overriding based on a combination of types, never with the crazy monomaniac focus on the leftmost arguments that OOP encourages!-), but (a) they were unfortunately not accepted for Python 3, and (b) they do of course require the generic function to be expressed as free-standing (it would be absolutely crazy to have to consider the function as "belonging" to any single type, where the whole POINT is that can be differently overridden/overloaded based on arbitrary combination of its several arguments' types!-). Anybody who's ever programmed in Common Lisp, Dylan, or PEAK, knows what I'm talking about;-).
So, free-standing functions and operators are just THE right, consistent way to go (even though the lack of generic functions, in bare-bones Python, does remove some fraction of the inherent elegance, it's still a reasonable mix of elegance and practicality!-).
It emphasizes the capabilities of an object, not its methods or type. Capabilites are declared by "helper" functions such as __iter__ and __len__ but they don't make up the interface. The interface is in the builtin functions, and beside this also in the buit-in operators like + and [] for indexing and slicing.
Sometimes, it is not a one-to-one correspondance: For example, iter(obj) returns an iterator for an object, and will work even if __iter__ is not defined. If not defined, it goes on to look if the object defines __getitem__ and will return an iterator accessing the object index-wise (like an array).
This goes together with Python's Duck Typing, we care only about what we can do with an object, not that it is of a particular type.
Actually, those aren't "static" methods in the way you are thinking about them. They are built-in functions that really just alias to certain methods on python objects that implement them.
>>> class Foo(object):
... def __len__(self):
... return 42
...
>>> f = Foo()
>>> len(f)
42
These are always available to be called whether or not the object implements them or not. The point is to have some consistency. Instead of some class having a method called length() and another called size(), the convention is to implement len and let the callers always access it by the more readable len(obj) instead of obj.methodThatDoesSomethingCommon
I thought the reason was so these basic operations could be done on iterators with the same interface as containers. However, it actually doesn't work with len:
def foo():
for i in range(10):
yield i
print len(foo())
... fails with TypeError. len() won't consume and count an iterator; it only works with objects that have a __len__ call.
So, as far as I'm concerned, len() shouldn't exist. It's much more natural to say obj.len than len(obj), and much more consistent with the rest of the language and the standard library. We don't say append(lst, 1); we say lst.append(1). Having a separate global method for length is an odd, inconsistent special case, and eats a very obvious name in the global namespace, which is a very bad habit of Python.
This is unrelated to duck typing; you can say getattr(obj, "len") to decide whether you can use len on an object just as easily--and much more consistently--than you can use getattr(obj, "__len__").
All that said, as language warts go--for those who consider this a wart--this is a very easy one to live with.
On the other hand, min and max do work on iterators, which gives them a use apart from any particular object. This is straightforward, so I'll just give an example:
import random
def foo():
for i in range(10):
yield random.randint(0, 100)
print max(foo())
However, there are no __min__ or __max__ methods to override its behavior, so there's no consistent way to provide efficient searching for sorted containers. If a container is sorted on the same key that you're searching, min/max are O(1) operations instead of O(n), and the only way to expose that is by a different, inconsistent method. (This could be fixed in the language relatively easily, of course.)
To follow up with another issue with this: it prevents use of Python's method binding. As a simple, contrived example, you can do this to supply a function to add values to a list:
def add(f):
f(1)
f(2)
f(3)
lst = []
add(lst.append)
print lst
and this works on all member functions. You can't do that with min, max or len, though, since they're not methods of the object they operate on. Instead, you have to resort to functools.partial, a clumsy second-class workaround common in other languages.
Of course, this is an uncommon case; but it's the uncommon cases that tell us about a language's consistency.