The Python docs are a bit ambiguous
sequence
An iterable which supports efficient element access using integer indices via the __getitem__() special method and defines a __len__()
method that returns the length of the sequence. Some built-in sequence
types are list, str, tuple, and bytes. Note that dict also supports
__getitem__() and __len__(), but is considered a mapping rather than a sequence because the lookups use arbitrary immutable keys rather than
integers.
The collections.abc.Sequence abstract base class defines a much richer interface that goes beyond just __getitem__() and __len__(),
adding count(), index(), __contains__(), and __reversed__(). Types
that implement this expanded interface can be registered explicitly
using register().
In particular, using collections.abc.Sequence as the gold standard, as recommended by some, would mean that, for example, numpy arrays are not sequences:
isinstance(np.arange(6),collections.abc.Sequence)
# False
There is also something called the Sequence Protocol but that appears to be exposed only at the C-API. There the criterion is
int PySequence_Check(PyObject *o)
Return 1 if the object provides sequence protocol, and 0 otherwise. Note that it returns 1 for Python classes with a
__getitem__() method unless they are dict subclasses since in general case it is impossible to determine what the type of keys it supports.
This function always succeeds.
Finally, I don't follow this new (-ish) type annotation business too closely but I would imagine this also would benefit from a clear concept of what a sequence is.
So my question has both a philosophical and a practical side: what exactly is a sequence, and how do I test whether something is a sequence or not? Ideally, in a way that makes numpy arrays sequences. And if I ever start annotating, how would I approach sequences?
Brief introduction to typing in Python
Skip ahead if you know what structural typing, nominal typing and duck typing are.
I think much of the confusion arises from the fact that typing was a provisional module in versions 3.5 and 3.6, and was still subject to change between versions 3.7 and 3.8. This means there has been a lot of flux in how Python has sought to deal with typing through type annotations.
It also doesn't help that Python is both duck-typed and nominally typed. That is, when accessing an attribute of an object, Python is duck-typed: the object is only checked for the attribute at runtime, and only at the moment the attribute is requested. However, Python also has nominal typing features (e.g. isinstance() and issubclass()). Nominal typing is where one type is declared to be a subclass of another. This can be through inheritance, or with the register() method of ABCMeta.
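As a minimal sketch of the two styles side by side (Quacker, Duck, Robot and make_noise are hypothetical names invented for this illustration), duck typing just calls the method, while nominal typing asks whether the class was declared a subclass:

```python
from abc import ABC


class Quacker(ABC):
    """Hypothetical abstract base class, used only for this illustration."""


class Duck:
    def quack(self):
        return "Quack"


class Robot:
    def quack(self):
        return "Beep"


def make_noise(thing):
    # Duck typing: no type check; the attribute is looked up only when called.
    return thing.quack()


# Nominal typing: Robot counts as a Quacker only once it has been declared one.
before = isinstance(Robot(), Quacker)
Quacker.register(Robot)
after = isinstance(Robot(), Quacker)

print(make_noise(Duck()), make_noise(Robot()))
print(before, after)
```

Note that Duck works fine with make_noise() even though it was never registered, which is exactly the gap between the two notions.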
typing originally introduced its types using the idea of nominal typing. As of 3.8 it is trying to allow for the more pythonic structural typing.
Structural typing is related to duck typing, except that it is taken into consideration at "compile time" rather than runtime, for instance when a linter is trying to detect possible type errors, such as passing a dict to a function that only accepts sequences like tuples or lists. With structural typing, a class B should be considered a subtype of A if it implements all the methods of A, regardless of whether it has been declared to be a subtype of A (as in nominal typing).
Answer
sequences (little s) are a duck type. A sequence is any ordered collection of objects that provides random access to its members. Specifically, if it defines __len__ and __getitem__ and uses integer indices between 0 and n-1 then it is a sequence. A Sequence (big s) is a nominal type. That is, to be a Sequence, a class must be declared as such, either by inheriting from Sequence or being registered as a subclass.
A numpy array is a sequence, but it is not a Sequence as it is not registered as a subclass of Sequence. Nor should it be, as it does not implement the full interface promised by Sequence (things like count() and index() are missing).
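To make the distinction concrete without depending on numpy, here is a small sketch using a made-up Countdown class (the class names are assumptions for illustration only). The duck-typed version only has __len__ and __getitem__; the version that inherits from Sequence additionally gets count() and index() from the ABC's mixin methods:

```python
from collections.abc import Sequence


class Countdown:
    """Hypothetical duck-typed sequence: only __len__ and __getitem__."""

    def __init__(self, start):
        self._start = start

    def __len__(self):
        return self._start

    def __getitem__(self, index):
        if not 0 <= index < self._start:
            raise IndexError(index)
        return self._start - index


class FullCountdown(Countdown, Sequence):
    """Same data, but declared a Sequence, so the ABC's mixin methods apply."""


duck = Countdown(3)
full = FullCountdown(3)

print(list(duck))                    # iteration falls back to __getitem__
print(isinstance(duck, Sequence))    # a sequence, but not a Sequence
print(isinstance(full, Sequence))
print(full.index(2), full.count(1))  # provided by collections.abc.Sequence
```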
It sounds like what you want is a structural type for a sequence (small s). As of 3.8 this is possible by using protocols. Protocols define a set of methods which a class must implement to be considered a subtype of the protocol (a la structural typing).
from typing import Protocol
import numpy as np

class MySequence(Protocol):
    def __getitem__(self, index):
        raise NotImplementedError
    def __len__(self):
        raise NotImplementedError
    def __contains__(self, item):
        raise NotImplementedError
    def __iter__(self):
        raise NotImplementedError

def f(s: MySequence):
    for i in range(len(s)):
        print(s[i], end=' ')
    print('end')

f([1, 2, 3, 4]) # should be fine
arr: np.ndarray = np.arange(5)
f(arr) # also fine
f({}) # might be considered fine! Depends on your type checker
Protocols are fairly new, so not all IDEs/type checkers might support them yet. The IDE I use, PyCharm, does. It doesn't like f({}), but it is happy to consider a numpy array a Sequence (big S) (perhaps not ideal). You can enable runtime checking of protocols by using the runtime_checkable decorator from typing. Be warned: all this does is check, one by one, that each of the protocol's methods can be found on the given object/class. As a result, it can become quite expensive if your protocol has a lot of methods.
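A minimal sketch of the runtime check, assuming Python 3.8+ (SizedIndexable is a hypothetical protocol name, not something from typing). Because the check only looks for method names, a dict passes too:

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class SizedIndexable(Protocol):
    """Hypothetical protocol: requires only __len__ and __getitem__."""

    def __len__(self) -> int: ...

    def __getitem__(self, index): ...


print(isinstance([1, 2, 3], SizedIndexable))  # list has both methods
print(isinstance("abc", SizedIndexable))      # so does str
print(isinstance({1, 2}, SizedIndexable))     # set has no __getitem__
print(isinstance({}, SizedIndexable))         # dict has both methods as well
```

So a runtime-checkable protocol on its own cannot tell a mapping from a sequence; it only confirms that the methods exist.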
I think the most practical way to define a sequence in Python is 'A container that supports indexing with integers'.
The Wikipedia definition also holds:
a sequence is an enumerated collection of objects in which repetitions are allowed and order does matter.
To validate if an object is a sequence, I would emulate the logic from the Sequence Protocol:
hasattr(test_obj, "__getitem__") and not isinstance(test_obj, collections.abc.Mapping)
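Wrapped in a helper, the check above could look like the following sketch. The name looks_like_sequence is made up here, and this is only a rough Python-level approximation: the C function excludes dict subclasses, whereas this version excludes anything registered as a Mapping.

```python
from collections.abc import Mapping


def looks_like_sequence(obj):
    """Rough approximation: indexable with __getitem__, but not a mapping."""
    return hasattr(obj, "__getitem__") and not isinstance(obj, Mapping)


for candidate in ([1, 2], "abc", range(3), {"a": 1}, {1, 2}, 42):
    print(type(candidate).__name__, looks_like_sequence(candidate))
```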
Per the doc you pasted:
The collections.abc.Sequence abstract base class defines a much richer interface that goes beyond just __getitem__() and __len__(), adding count(), index(), __contains__(), and __reversed__(). Types that implement this expanded interface can be registered explicitly using register().
numpy.ndarray does not implement the Sequence protocol because it does not implement count() or index():
>>> import numpy
>>> from collections.abc import Sequence
>>> arr = numpy.arange(6)
>>> isinstance(arr, Sequence)
False
>>> arr.count(3)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'numpy.ndarray' object has no attribute 'count'
>>> arr.index(3)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'numpy.ndarray' object has no attribute 'index'
Contrast to a range:
>>> r = range(6)
>>> isinstance(r, Sequence)
True
>>> r.count(3)
1
>>> r.index(3)
3
If you want to claim that arr is a Sequence you can, by using the register() class method:
>>> Sequence.register(numpy.ndarray)
<class 'numpy.ndarray'>
>>> isinstance(arr, Sequence)
True
but this is a lie, because it doesn't actually implement the protocol (the register() function doesn't actually check for that, it just trusts you):
>>> arr.count(3)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'numpy.ndarray' object has no attribute 'count'
so doing this may lead to errors if you pass a numpy.ndarray to a function that expects a Sequence.
Related
I know that python has a len() function that is used to determine the size of a string, but I was wondering why it's not a method of the string object?
Strings do have a length method: __len__()
The protocol in Python is to implement this method on objects which have a length and use the built-in len() function, which calls it for you, similar to the way you would implement __iter__() and use the built-in iter() function (or have the method called behind the scenes for you) on objects which are iterable.
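As a small sketch of that protocol (Playlist is a hypothetical class invented for illustration), implementing __len__() and __iter__() is enough for the built-ins and for loops to work:

```python
class Playlist:
    """Hypothetical container used only to illustrate the protocol."""

    def __init__(self, *tracks):
        self._tracks = list(tracks)

    def __len__(self):
        # Called for you by the built-in len().
        return len(self._tracks)

    def __iter__(self):
        # Called for you by iter() and, behind the scenes, by for loops.
        return iter(self._tracks)


p = Playlist("intro", "verse", "outro")
print(len(p))
for track in p:
    print(track)
```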
See Emulating container types for more information.
Here's a good read on the subject of protocols in Python: Python and the Principle of Least Astonishment
Jim's answer to this question may help; I copy it here. Quoting Guido van Rossum:
First of all, I chose len(x) over x.len() for HCI reasons (def __len__() came much later). There are two intertwined reasons actually, both HCI:
(a) For some operations, prefix notation just reads better than postfix — prefix (and infix!) operations have a long tradition in mathematics which likes notations where the visuals help the mathematician thinking about a problem. Compare the easy with which we rewrite a formula like x*(a+b) into x*a + x*b to the clumsiness of doing the same thing using a raw OO notation.
(b) When I read code that says len(x) I know that it is asking for the length of something. This tells me two things: the result is an integer, and the argument is some kind of container. To the contrary, when I read x.len(), I have to already know that x is some kind of container implementing an interface or inheriting from a class that has a standard len(). Witness the confusion we occasionally have when a class that is not implementing a mapping has a get() or keys() method, or something that isn’t a file has a write() method.
Saying the same thing in another way, I see ‘len‘ as a built-in operation. I’d hate to lose that. /…/
Python is a pragmatic programming language, and the reasons for len() being a function and not a method of str, list, dict etc. are pragmatic.
The len() built-in function deals directly with built-in types: the CPython implementation of len() actually returns the value of the ob_size field in the PyVarObject C struct that represents any variable-sized built-in object in memory. This is much faster than calling a method -- no attribute lookup needs to happen. Getting the number of items in a collection is a common operation and must work efficiently for such basic and diverse types as str, list, array.array etc.
However, to promote consistency, when applying len(o) to a user-defined type, Python calls o.__len__() as a fallback. __len__, __abs__ and all the other special methods documented in the Python Data Model make it easy to create objects that behave like the built-ins, enabling the expressive and highly consistent APIs we call "Pythonic".
By implementing special methods your objects can support iteration, overload infix operators, manage contexts in with blocks etc. You can think of the Data Model as a way of using the Python language itself as a framework where the objects you create can be integrated seamlessly.
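To illustrate, here is a short sketch with two made-up classes (Vector2 and Recorder are assumptions, not part of any library) that plug into iteration, the + operator, abs() and the with statement purely through special methods:

```python
import math


class Vector2:
    """Hypothetical 2D vector used only for illustration."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __iter__(self):           # enables iteration and unpacking
        yield self.x
        yield self.y

    def __add__(self, other):     # enables the infix + operator
        return Vector2(self.x + other.x, self.y + other.y)

    def __abs__(self):            # enables abs(v)
        return math.hypot(self.x, self.y)


class Recorder:
    """Hypothetical context manager that records enter and exit events."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append("enter")
        return self

    def __exit__(self, exc_type, exc, tb):
        self.events.append("exit")
        return False              # do not swallow exceptions


v = Vector2(1, 2) + Vector2(2, 2)
x, y = v
with Recorder() as rec:
    rec.events.append("body")
print(x, y, abs(v), rec.events)
```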
A second reason, supported by quotes from Guido van Rossum like this one, is that it is easier to read and write len(s) than s.len().
The notation len(s) is consistent with unary operators with prefix notation, like abs(n). len() is used way more often than abs(), and it deserves to be as easy to write.
There may also be a historical reason: in the ABC language which preceded Python (and was very influential in its design), there was a unary operator written as #s which meant len(s).
There is a len method:
>>> a = 'a string of some length'
>>> a.__len__()
23
>>> a.__len__
<method-wrapper '__len__' of str object at 0x02005650>
met% python -c 'import this' | grep 'only one'
There should be one-- and preferably only one --obvious way to do it.
There are some great answers here, and so before I give my own I'd like to highlight a few of the gems (no ruby pun intended) I've read here.
Python is not a pure OOP language -- it's a general purpose, multi-paradigm language that allows the programmer to use the paradigm they are most comfortable with and/or the paradigm that is best suited for their solution.
Python has first-class functions, so len is actually an object. That means the len function object has its own attributes and methods, which you can inspect by running dir(len). Ruby, on the other hand, doesn't have first-class functions.
If you don't like the way this works in your own code, it's trivial for you to re-implement the containers using your preferred method (see example below).
>>> class List(list):
...     def len(self):
...         return len(self)
...
>>> class Dict(dict):
...     def len(self):
...         return len(self)
...
>>> class Tuple(tuple):
...     def len(self):
...         return len(self)
...
>>> class Set(set):
...     def len(self):
...         return len(self)
...
>>> my_list = List([1,2,3,4,5,6,7,8,9,'A','B','C','D','E','F'])
>>> my_dict = Dict({'key': 'value', 'site': 'stackoverflow'})
>>> my_set = Set({1,2,3,4,5,6,7,8,9,'A','B','C','D','E','F'})
>>> my_tuple = Tuple((1,2,3,4,5,6,7,8,9,'A','B','C','D','E','F'))
>>> my_containers = Tuple((my_list, my_dict, my_set, my_tuple))
>>>
>>> for container in my_containers:
...     print(container.len())
...
15
2
15
15
Something missing from the rest of the answers here: the len function checks that the __len__ method returns a non-negative int. The fact that len is a function means that classes cannot override this behaviour to avoid the check. As such, len(obj) gives a level of safety that obj.len() cannot.
Example:
>>> class A:
...     def __len__(self):
...         return 'foo'
...
>>> len(A())
Traceback (most recent call last):
  File "<pyshell#8>", line 1, in <module>
    len(A())
TypeError: 'str' object cannot be interpreted as an integer
>>> class B:
...     def __len__(self):
...         return -1
...
>>> len(B())
Traceback (most recent call last):
  File "<pyshell#13>", line 1, in <module>
    len(B())
ValueError: __len__() should return >= 0
Of course, it is possible to "override" the len function by reassigning it as a global variable, but code which does this is much more obviously suspicious than code which overrides a method in a class.
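The difference is easy to see by calling the special method directly, which skips the validation that len() performs. A minimal sketch (Broken is a hypothetical class for illustration):

```python
class Broken:
    """Hypothetical class whose __len__ violates the contract."""

    def __len__(self):
        return -1


obj = Broken()

# Calling the method directly performs no validation at all.
direct = obj.__len__()

# Going through the built-in applies the non-negative check.
try:
    len(obj)
    checked = "no error"
except ValueError:
    checked = "ValueError"

print(direct, checked)
```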
So, I was playing around with Python while answering this question, and I discovered that this is not valid:
o = object()
o.attr = 'hello'
due to an AttributeError: 'object' object has no attribute 'attr'. However, with any class inherited from object, it is valid:
class Sub(object):
    pass

s = Sub()
s.attr = 'hello'
Printing s.attr displays 'hello' as expected. Why is this the case? What in the Python language specification specifies that you can't assign attributes to vanilla objects?
For other workarounds, see How can I create an object and add attributes to it?.
To support arbitrary attribute assignment, an object needs a __dict__: a dict associated with the object, where arbitrary attributes can be stored. Otherwise, there's nowhere to put new attributes.
An instance of object does not carry around a __dict__ -- if it did, before the horrible circular dependence problem (since dict, like most everything else, inherits from object;-), this would saddle every object in Python with a dict, which would mean an overhead of many bytes per object that currently doesn't have or need a dict (essentially, all objects that don't have arbitrarily assignable attributes don't have or need a dict).
For example, using the excellent pympler project (you can get it via svn from here), we can do some measurements...:
>>> from pympler import asizeof
>>> asizeof.asizeof({})
144
>>> asizeof.asizeof(23)
16
You wouldn't want every int to take up 144 bytes instead of just 16, right?-)
Now, when you make a class (inheriting from whatever), things change...:
>>> class dint(int): pass
...
>>> asizeof.asizeof(dint(23))
184
...the __dict__ is now added (plus, a little more overhead) -- so a dint instance can have arbitrary attributes, but you pay quite a space cost for that flexibility.
So what if you wanted ints with just one extra attribute foobar...? It's a rare need, but Python does offer a special mechanism for the purpose...
>>> class fint(int):
...     __slots__ = 'foobar',
...     def __init__(self, x): self.foobar=x+100
...
>>> asizeof.asizeof(fint(23))
80
...not quite as tiny as an int, mind you! (or even the two ints, one the self and one the self.foobar -- the second one can be reassigned), but surely much better than a dint.
When the class has the __slots__ special attribute (a sequence of strings), then the class statement (more precisely, the default metaclass, type) does not equip every instance of that class with a __dict__ (and therefore the ability to have arbitrary attributes), just a finite, rigid set of "slots" (basically places which can each hold one reference to some object) with the given names.
In exchange for the lost flexibility, you save a lot of bytes per instance (probably meaningful only if you have zillions of instances gallivanting around, but there are use cases for that).
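The effect of __slots__ on attribute assignment can be checked without any third-party tools. A small sketch (Point is a hypothetical class; no sizes are measured here, only the presence of __dict__):

```python
class Point:
    """Hypothetical class with a fixed set of slots and no __dict__."""

    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)
p.x = 10                              # declared slots can be reassigned
has_dict = hasattr(p, "__dict__")     # no per-instance dict exists

try:
    p.z = 3                           # undeclared names have nowhere to go
    outcome = "assigned"
except AttributeError:
    outcome = "AttributeError"

print(p.x, has_dict, outcome)
```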
As other answerers have said, an object does not have a __dict__. object is the base class of all types, including int or str. Thus whatever is provided by object will be a burden to them as well. Even something as simple as an optional __dict__ would need an extra pointer for each value; this would waste an additional 4-8 bytes of memory for each object in the system, for very limited utility.
Instead of doing an instance of a dummy class, in Python 3.3+, you can (and should) use types.SimpleNamespace for this.
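A quick sketch of what that looks like in practice, next to the failing assignment on a bare object (the attribute names are just examples):

```python
from types import SimpleNamespace

ns = SimpleNamespace(attr="hello")
ns.count = 3                      # arbitrary attributes can be added later
print(ns.attr, ns.count)
print(vars(ns))                   # the attributes live in an ordinary dict

try:
    object().attr = "hello"       # a bare object has no __dict__ to store it
    plain_object = "assigned"
except AttributeError:
    plain_object = "AttributeError"
print(plain_object)
```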
It is simply due to optimization.
Dicts are relatively large.
>>> import sys
>>> sys.getsizeof((lambda:1).__dict__)
140
Most (maybe all) classes that are defined in C do not have a dict for optimization.
If you look at the source code you will see that there are many checks to see if the object has a dict or not.
So, investigating my own question, I discovered this about the Python language: you can inherit from things like int, and you see the same behaviour:
>>> class MyInt(int):
...     pass
...
>>> x = MyInt()
>>> print(x)
0
>>> x.hello = 4
>>> print(x.hello)
4
>>> x = x + 1
>>> print(x)
1
>>> print(x.hello)
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
AttributeError: 'int' object has no attribute 'hello'
I assume the error at the end is because the add function returns an int, so I'd have to override functions like __add__ and such in order to retain my custom attributes. But this all now makes sense to me (I think), when I think of "object" like "int".
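One possible way to do that override is sketched below. TaggedInt is a hypothetical name, only x + other is handled (not other + x or the other operators), and copying the instance __dict__ is just one design choice among several:

```python
class TaggedInt(int):
    """Hypothetical int subclass whose additions keep instance attributes."""

    def __add__(self, other):
        value = super().__add__(other)      # plain int, or NotImplemented
        if value is NotImplemented:
            return value
        result = TaggedInt(value)
        result.__dict__.update(self.__dict__)  # carry custom attributes over
        return result


x = TaggedInt(0)
x.hello = 4
x = x + 1
print(x, x.hello, type(x).__name__)
```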
https://docs.python.org/3/library/functions.html#object :
Note: object does not have a __dict__, so you can’t assign arbitrary attributes to an instance of the object class.
It's because object is a "type", not a class. In general, all classes that are defined in C extensions (like all the built in datatypes, and stuff like numpy arrays) do not allow addition of arbitrary attributes.
This is (IMO) one of the fundamental limitations with Python - you can't re-open classes. I believe the actual problem, though, is caused by the fact that classes implemented in C can't be modified at runtime... subclasses can, but not the base classes.
So, I'm just beginning to learn Python (using Codecademy), and I'm a bit confused.
Why are there some methods that take an argument, and others use the dot notation?
len() takes an argument, but won't work with the dot notation:
>>> len("Help")
4
>>> "help".len()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute 'len'
And likewise:
>>> "help".upper()
'HELP'
>>> upper("help")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'upper' is not defined
The key word here is method. There is a slight difference between a function and a method.
Method
Is a function that is defined in the class of the given object. For example:
class Dog:
    def bark(self):
        print('Woof woof!')

rufus = Dog()
rufus.bark() # called from the object
Function
A function is a globally defined procedure:
def bark():
    print('Woof woof!')
As for your question regarding the len function, the globally defined function calls the object's __len__ special method. So in this scenario, it is an issue of readability.
Otherwise, methods are better when they apply only to certain objects. Functions are better when they apply to multiple kinds of objects. For example, how can you uppercase a number? You wouldn't define that as a function; you'd define it only as a method of the string class.
What you call "dot notation" is a method call, and it only works for classes that have the method defined by the class implementer. len is a builtin function that takes one argument and returns the size of that object. A class may implement a method called len if it wants to, but most don't. The builtin len function has a rule that says if a class has a method called __len__, it will use it, so this works:
>>> class C(object):
...     def __len__(self):
...         return 100
...
>>> len(C())
100
"help".upper is the opposite. The string class defines a method called upper, but that doesn't mean there has to be a function called upper also. It turns out that Python 2 had an upper function in the string module (it was removed in Python 3), but generally you don't have to implement an extra function just because you implemented a method.
This is the difference between a function and a method. If you are only just learning the basics, maybe simply accept that this difference exists, and that you will eventually understand it.
Still here? It's not even hard, actually. In object-oriented programming, methods are preferred over functions for many things, because that means one type of object can override its version of the method without affecting the rest of the system.
For example, let's pretend you had a new kind of string where accented characters should lose their accent when you call .upper(). Instances of this type can subclass str and behave exactly the same in every other aspect, basically for free; all they need to redefine is the upper method (and even then, probably call the method of the base class and only change the logic when you handle an accented lowercase character). And software which expects to work on strings will just continue to work and not even know the difference if you pass in an object of this new type where a standard str is expected.
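A rough sketch of such a subclass follows. PlainUpperStr and shout are hypothetical names, and stripping accents via Unicode decomposition is just one assumed way to implement the idea:

```python
import unicodedata


class PlainUpperStr(str):
    """Hypothetical str subclass whose upper() also drops accents."""

    def upper(self):
        # Decompose accented characters into base letter + combining mark,
        # then drop the combining marks.
        decomposed = unicodedata.normalize("NFD", self)
        stripped = "".join(
            ch for ch in decomposed if not unicodedata.combining(ch)
        )
        # stripped is a plain str, so this calls the base-class upper().
        return PlainUpperStr(stripped.upper())


def shout(text):
    # Code written for ordinary strings; it knows nothing about the subclass.
    return text.upper() + "!"


print(shout("caf\u00e9"))                  # plain str keeps the accent
print(shout(PlainUpperStr("caf\u00e9")))   # subclass drops it
```

The shout() function is unchanged in both calls; only the object passed in decides which upper() runs.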
A design principle in Python is that everything is an object. This means you can create your own replacements even for basic fundamental objects like object, class, and type, i.e. extend or override the basic language for your application or platform.
In fact, this happened in Python 2 when unicode strings were introduced to the language. A lot of application software continued to work exactly as before, but now with unicode instances where previously the code had been written to handle str instances. (This difference no longer exists in Python 3; or rather, the type which was called str and was used almost everywhere is now called bytes and is only used when you specifically want to handle data which is not text.)
Going back to our new upper method, think about the opposite case; if upper was just a function in the standard library, how would you even think about modifying software which needs upper to behave differently? What if tomorrow your boss wants you to do the same for lower? It would be a huge undertaking, and the changes you would have to make all over the code base would easily tend towards a spaghetti structure, as well as probably introduce subtle new bugs.
This is one of the cornerstones of object-oriented programming, but it probably only really makes sense when you learn the other two or three principles in a more structured introduction. For now, perhaps the quick and dirty summary is "methods make the implementation modular and extensible."