I'm reading a book on Python, and it says that when you make a call to help(obj) to list all of the methods that can be called on obj, the methods that are surrounded by __ on both sides are private helper methods that cannot be called.
However, one of the listed methods for a string is __len__ and you can verify that if s is some string, entering s.__len__() into Python returns the length of s.
Why is it okay to call some of these methods, such as __len__, while others cannot be called?
The book is incorrect. You can call __dunder__ special methods directly; all that is special about them is their documented use in Python and how the language itself uses them.
Most code just should not call them directly, however, and should leave it to Python to call them. Use the len() function rather than calling the __len__ method on the object, for example, because len() will validate the __len__ return value.
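As a rough sketch of that difference (the class name Broken is made up purely for illustration), len() rejects an invalid return value that a direct __len__() call would hand back unchecked:

class Broken:
    def __len__(self):
        return -1          # an invalid (negative) length

b = Broken()
print(b.__len__())         # -1, returned without any checks
try:
    print(len(b))
except ValueError as exc:
    print(exc)             # len() refuses the negative value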
The language reserves all such names for its own use; see Reserved classes of identifiers in the reference documentation:
System-defined names, informally known as "dunder" names. These names are defined by the interpreter and its implementation (including the standard library). Current system names are discussed in the Special method names section and elsewhere. More will likely be defined in future versions of Python. Any use of __*__ names, in any context, that does not follow explicitly documented use, is subject to breakage without warning.
Related
I'm looking at code like this:
def foo():
    return 42

foo.x = 5
This obviously adds an attribute to the function object named foo. I find this very useful, as it makes these function objects look very similar to objects with a __call__ method.
Are there rules I must follow to make sure I don't cause problems in future updates to Python, such as names that I must avoid? Perhaps there is a PEP or documentation section that mentions rules?
There are no rules, other than to take the reserved classes of identifiers into account. Specifically, try to avoid using dunder names:
System-defined names, informally known as “dunder” names. [...] Any use of __*__ names, in any context, that does not follow explicitly documented use, is subject to breakage without warning.
There is otherwise nothing special about functions accepting arbitrary attributes; almost anything in Python accepts arbitrary attributes if there is a place to put them (which is, almost always, a __dict__ attribute).
Within the Python standard library, function attributes are used to link decorator wrapper functions to the original wrapped function (via the functools.update_wrapper() function and its sidekick, the functools.wraps() decorator), and to attach state and methods to a function when it is augmented by decorators (e.g. the singledispatch() decorator adds several methods and a registry to the decorated function).
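As a small illustration (the log_calls decorator is just a made-up example), functools.wraps() itself relies on function attributes: it copies metadata onto the wrapper and links it back to the wrapped function via a __wrapped__ attribute:

import functools

def log_calls(func):
    @functools.wraps(func)            # copies __name__, __doc__, etc. onto wrapper
    def wrapper(*args, **kwargs):
        print('calling', func.__name__)
        return func(*args, **kwargs)
    return wrapper

@log_calls
def greet(name):
    return 'Hello, ' + name

print(greet.__name__)      # 'greet' rather than 'wrapper'
print(greet.__wrapped__)   # the original undecorated function, attached as an attribute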
It is a good technique. The rule is: do not shadow any dunder names, which have special meanings.
Here is a good way to implement a singleton:
import faker

def my_fake_data():
    if not getattr(my_fake_data, 'factory', None):
        my_fake_data.factory = faker.Faker()
    return my_fake_data.factory
Monkey patching uses a similar technique (setting a class attribute instead of a function attribute), but for more "devious" reasons, such as changing the implementation of a previously defined class.
The inspect.signature doc states that it supports classes as input, but it doesn't go into any sort of detail:
Accepts a wide range of Python callables, from plain functions and classes to functools.partial() objects.
If I call inspect.signature(MyClass), what signature does it return? Does it return the signature of MyClass.__init__? Or MyClass.__new__? Or something else?
It tries pretty much everything it reasonably could. I think the details are probably deliberately undocumented, because they're complicated and likely to get more so as new Python versions add more stuff to try.
For example, as of CPython 3.7.3, the code path tries the following things in order:
If the metaclass has a custom __call__ defined in Python, it uses the signature of the metaclass __call__ with the first argument removed.
Otherwise, if the class has a __new__ method defined in Python, it uses the __new__ signature with the first argument removed.
Otherwise, if the class has an __init__ method defined in Python, it uses the __init__ signature with the first argument removed.
Otherwise, it traverses the MRO looking for a __text_signature__. If it finds one, it parses __text_signature__ to get the signature information.
If it still hasn't found anything, if the type's __init__ is object.__init__ and the type's __new__ is object.__new__, it returns the signature of the object class. (There's a misleading comment and a possible bug involving metaclasses around this point - the comment says it's going to check for type.__init__, but it doesn't do that. I think this commit may have made a mistake here.)
If it still hasn't found anything, it gives up and raises a ValueError saying it couldn't find anything.
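For the common case (no custom metaclass __call__ and no __new__), here is a quick sketch of what you typically get back; the Point class is just an example:

import inspect

class Point:
    def __init__(self, x, y=0):
        self.x, self.y = x, y

print(inspect.signature(Point))    # (x, y=0) -- taken from __init__, with 'self' removed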
When I see class definitions like class Foo:, they always seem to start with upper-case letters.
However, isn't a list [] or a dict {} or some other built-in type a class as well? For that matter, everything typed into Python's IDLE that is automatically color-coded in purple (with the Windows binary distribution) is itself a class, right?
Such as spam = list()
spam is now an instance of list.
So my question is: why does Python allow us to do something like list = list() in the first place, when probably nobody does that? And also, why is it not list = List()?
Did the developers of the language decide not to use any naming convention for the built-in types, even though most Python programmers do name their classes that way?
Yes, uppercase-initial classes are the convention, as outlined in PEP 8.
You are correct that many builtin types do not follow this convention. These are holdovers from earlier stages of Python when there was a much bigger difference between user-defined classes and builtin types. However, it still seems that builtin or extension types written in C are more likely to have lowercase names (e.g., numpy.array, not numpy.Array).
Nonetheless, the convention is to use uppercase-initial for your own classes in Python code.
PEP8 is the place to go for code style.
To address your question on why list = list() is valid: list is simply a name in the global namespace, and can be overridden like any other variable.
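A quick sketch of what that shadowing looks like, and how to undo it:

list = list()          # the global name 'list' now shadows the builtin type
print(list)            # []
del list               # remove the shadowing binding...
print(list((1, 2)))    # ...and the builtin is visible again: [1, 2]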
All are in the end design decisions.
[...] why does python allow us to first of all do something like list = list()
https://www.python.org/dev/peps/pep-0020/ says (try also >>> import this)
Simple is better than complex.
Special cases aren't special enough to break the rules.
And this would be a special case.
[...] why is it not list = List()
https://www.python.org/dev/peps/pep-0008/#class-names says:
Class Names
Class names should normally use the CapWords convention.
The naming convention for functions may be used instead in cases where
the interface is documented and used primarily as a callable.
Note that there is a separate convention for builtin names: most
builtin names are single words (or two words run together), with the
CapWords convention used only for exception names and builtin
constants. [emphasis mine]
All other classes should use the CapWords convention. As list, object, etc. are built-in names, they follow this separate convention.
I think that the only person who really knows the entire answer to your question is the BDFL. Convention outside of the types implemented in C is definitely to use upper case (as detailed in PEP 8). However, it's interesting to note that not all C-level names follow the convention (e.g. Py_True and Py_False do not). Yes, they're constants at the Python level, but they're all PyTypeObjects. I'd be curious to know if that's the only distinction and, hence, the reason for the difference in convention.
According to PEP 8, a nice set of guidelines for Python developers:
Almost without exception, class names use the CapWords convention.
Classes for internal use have a leading underscore in addition.
PEP 8 is directed at Python development for the standard library in the main Python distribution, but they're sensible guidelines to follow.
So, I'm just beginning to learn Python (using Codecademy), and I'm a bit confused.
Why are there some methods that take an argument, and others use the dot notation?
len() takes an argument, but won't work with the dot notation:
>>> len("Help")
4
>>>"help".len()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute 'len'
And likewise:
>>>"help".upper()
'HELP'
>>>upper("help")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'upper' is not defined
The key word here is method. There is a slight difference between a function and a method.
Method
Is a function that is defined in the class of the given object. For example:
class Dog:
    def bark(self):
        print('Woof woof!')

rufus = Dog()
rufus.bark()  # called from the object
Function
A function is a globally defined procedure:
def bark():
    print('Woof woof!')
As for your question regarding the len function, the globally defined function calls the object's __len__ special method. So in this scenario, it is an issue of readability.
Otherwise, methods are better when they apply only to certain objects; functions are better when they apply to multiple kinds of objects. For example, how could you uppercase a number? You wouldn't define that as a function, you'd define it as a method only in the string class.
What you call "dot notation" refers to methods, and they only work for classes that have the method defined by the class implementer. len is a builtin function that takes one argument and returns the size of that object. A class may implement a method called len if it wants to, but most don't. The builtin len function has a rule that says if a class has a method called __len__, it will use it, so this works:
>>> class C(object):
...     def __len__(self):
...         return 100
...
>>> len(C())
100
"help".upper is the opposite. The string class defines a method called upper, but that doesn't mean there has to be a function called upper also. It turns out that there is an upper function in the string module, but generally you don't have to implement an extra function just because you implemented a class method.
This is the difference between a function and a method. If you are only just learning the basics, maybe simply accept that this difference exists, and that you will eventually understand it.
Still here? It's not even hard, actually. In object-oriented programming, methods are preferred over functions for many things, because that means one type of object can override its version of the method without affecting the rest of the system.
For example, let's pretend you had a new kind of string where accented characters should lose their accent when you call .upper(). This type can subclass str and behave exactly the same in every other respect, basically for free; all it needs to redefine is the upper method (and even then, it would probably call the base class's method and only change the logic when handling an accented lowercase character). And software which expects to work on strings will just continue to work, and not even know the difference, if you pass in an object of this new type where a standard str is expected.
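Here is a rough sketch of that idea; the class name AsciiUpperStr and the accent-stripping approach via unicodedata are just one possible way to do it:

import unicodedata

class AsciiUpperStr(str):
    def upper(self):
        # delegate to the base class, then drop combining accent marks
        decomposed = unicodedata.normalize('NFD', super().upper())
        return ''.join(c for c in decomposed if not unicodedata.combining(c))

s = AsciiUpperStr('café')
print(s.upper())            # CAFE
print(s.startswith('ca'))   # True -- everything else still behaves like str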
A design principle in Python is that everything is an object. This means you can create your own replacements even for basic fundamental objects like object and type, i.e. extend or override the basic language for your application or platform.
In fact, this happened in Python 2 when unicode strings were introduced to the language. A lot of application software continued to work exactly as before, but now with unicode instances where previously the code had been written to handle str instances. (This difference no longer exists in Python 3; or rather, the type which was called str and was used almost everywhere is now called bytes and is only used when you specifically want to handle data which is not text.)
Going back to our new upper method, think about the opposite case; if upper was just a function in the standard library, how would you even think about modifying software which needs upper to behave differently? What if tomorrow your boss wants you to do the same for lower? It would be a huge undertaking, and the changes you would have to make all over the code base would easily tend towards a spaghetti structure, as well as probably introduce subtle new bugs.
This is one of the cornerstones of object-oriented programming, but it probably only really makes sense when you learn the other two or three principles in a more structured introduction. For now, perhaps the quick and dirty summary is "methods make the implementation modular and extensible."
I have noticed that in Python some of the fundamental types have two kinds of methods: those that are surrounded by __ and those that aren't.
For example, if I have a variable of type float called my_number, I can see in IPython that it has the following methods:
my_number.__abs__ my_number.__pos__
my_number.__add__ my_number.__pow__
my_number.__class__ my_number.__radd__
my_number.__coerce__ my_number.__rdiv__
my_number.__delattr__ my_number.__rdivmod__
my_number.__div__ my_number.__reduce__
my_number.__divmod__ my_number.__reduce_ex__
my_number.__doc__ my_number.__repr__
my_number.__eq__ my_number.__rfloordiv__
my_number.__float__ my_number.__rmod__
my_number.__floordiv__ my_number.__rmul__
my_number.__format__ my_number.__rpow__
my_number.__ge__ my_number.__rsub__
my_number.__getattribute__ my_number.__rtruediv__
my_number.__getformat__ my_number.__setattr__
my_number.__getnewargs__ my_number.__setformat__
my_number.__gt__ my_number.__sizeof__
my_number.__hash__ my_number.__str__
my_number.__init__ my_number.__sub__
my_number.__int__ my_number.__subclasshook__
my_number.__le__ my_number.__truediv__
my_number.__long__ my_number.__trunc__
my_number.__lt__ my_number.as_integer_ratio
my_number.__mod__ my_number.conjugate
my_number.__mul__ my_number.fromhex
my_number.__ne__ my_number.hex
my_number.__neg__ my_number.imag
my_number.__new__ my_number.is_integer
my_number.__nonzero__ my_number.real
What is the difference between those that are surrounded by __ and those that aren't?
Is this some sort of standard used in other programming languages? Does it usually mean the same thing in similar languages?
Generally, "double underscore" methods are used internally by Python for certain builtin functions or operators (e.g. __add__ defines behavior for the +). Those that do not have double underscores are just normal methods that wouldn't be used by operators or builtins. Now, these methods are still "normal" methods, in that you can call them just like any other method, but parts of the Python core treat them specially.
No, as far as I am aware, this is unique to Python, though many other languages support a similar idea (builtin/operator overloading) but through different mechanisms.
Shameless plug: I wrote a guide to this aspect of Python last year which is fairly comprehensive, you can read about how to use these methods on your own objects at http://rafekettler.com/magicmethods.html.
1) The names you speak of, of the __*__ form, are system-defined variables or methods.
To quote the Python reference guide:
System-defined names. These names are defined by the interpreter and its implementation (including the standard library); applications should not expect to define additional names using this convention. The set of names of this class defined by Python may be extended in future versions.
Essentially they are variables or methods predefined by the system. For example, every function object has a __name__ attribute that contains the name of that function, and every module has a module-level __name__ variable that contains the module's name. You can find more comprehensive information and examples here.
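A tiny sketch of the two __name__ examples just mentioned:

def greet():
    pass

print(greet.__name__)   # 'greet' -- the function object's own name
print(__name__)         # '__main__' when this file is run as a script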
2) This concept of system-reserved names is fundamental in most programming languages. For example, PHP refers to them as Magic Constants. Getting the name of the current function, like the Python __name__ example above, can be achieved in PHP using __FUNCTION__. More examples here.
From documentation:
Certain classes of identifiers (besides keywords) have special
meanings. These classes are identified by the patterns of leading and
trailing underscore characters:
_*
Not imported by from module import *. The special identifier _ is used in the interactive interpreter to store the result of the last
evaluation; it is stored in the builtin module. When not in
interactive mode, _ has no special meaning and is not defined. See
section The import statement.
Note The name _ is often used in conjunction with internationalization; refer to the documentation for the gettext
module for more information on this convention.
__*__
System-defined names. These names are defined by the interpreter and its implementation (including the standard library). Current
system names are discussed in the Special method names section and
elsewhere. More will likely be defined in future versions of Python.
Any use of __*__ names, in any context, that does not follow
explicitly documented use, is subject to breakage without warning.
__*
Class-private names. Names in this category, when used within the context of a class definition, are re-written to use a mangled form to
help avoid name clashes between “private” attributes of base and
derived classes. See section Identifiers (Names).
http://docs.python.org/reference/lexical_analysis.html#reserved-classes-of-identifiers
These are special methods that are called in specific contexts.
Read the documentation for the details:
A class can implement certain operations that are invoked by special
syntax (such as arithmetic operations or subscripting and slicing) by
defining methods with special names. This is Python’s approach to
operator overloading, allowing classes to define their own behavior
with respect to language operators. For instance, if a class defines a
method named __getitem__(), and x is an instance of this class, then
x[i] is roughly equivalent to x.__getitem__(i) for old-style classes
and type(x).__getitem__(x, i) for new-style classes. Except where
mentioned, attempts to execute an operation raise an exception when no
appropriate method is defined (typically AttributeError or TypeError).
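As a small sketch of the __getitem__ hook mentioned in that quote (the Squares class is made up for illustration):

class Squares:
    def __getitem__(self, i):
        return i * i

sq = Squares()
print(sq[5])    # 25, roughly equivalent to type(sq).__getitem__(sq, 5)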
To answer your second question, this convention is used in C with macros such as __FILE__. Guido van Rossum, the creator of Python, explicitly says this was his inspiration in his blog on the history of Python.
It's used to indicate that the attributes are internal. There is no actual privacy in Python, so these are used as hints to indicate internals rather than API methods.