I have noticed that in Python some of the fundamental types have two kinds of methods: those that are surrounded by __ and those that aren't.
For example, if I have a variable of type float called my_number, I can see in IPython that it has the following methods:
my_number.__abs__ my_number.__pos__
my_number.__add__ my_number.__pow__
my_number.__class__ my_number.__radd__
my_number.__coerce__ my_number.__rdiv__
my_number.__delattr__ my_number.__rdivmod__
my_number.__div__ my_number.__reduce__
my_number.__divmod__ my_number.__reduce_ex__
my_number.__doc__ my_number.__repr__
my_number.__eq__ my_number.__rfloordiv__
my_number.__float__ my_number.__rmod__
my_number.__floordiv__ my_number.__rmul__
my_number.__format__ my_number.__rpow__
my_number.__ge__ my_number.__rsub__
my_number.__getattribute__ my_number.__rtruediv__
my_number.__getformat__ my_number.__setattr__
my_number.__getnewargs__ my_number.__setformat__
my_number.__gt__ my_number.__sizeof__
my_number.__hash__ my_number.__str__
my_number.__init__ my_number.__sub__
my_number.__int__ my_number.__subclasshook__
my_number.__le__ my_number.__truediv__
my_number.__long__ my_number.__trunc__
my_number.__lt__ my_number.as_integer_ratio
my_number.__mod__ my_number.conjugate
my_number.__mul__ my_number.fromhex
my_number.__ne__ my_number.hex
my_number.__neg__ my_number.imag
my_number.__new__ my_number.is_integer
my_number.__nonzero__ my_number.real
What is the difference between those that are surrounded by __ and those that aren't?
Is this some sort of standard used in other programming languages? Does it usually mean the same thing in similar languages?
Generally, "double underscore" methods are used internally by Python for certain builtin functions or operators (e.g. __add__ defines behavior for the +). Those that do not have double underscores are just normal methods that wouldn't be used by operators or builtins. Now, these methods are still "normal" methods, in that you can call them just like any other method, but parts of the Python core treat them specially.
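As a minimal sketch of that distinction (Money is a hypothetical class made up for illustration, not part of any library): the + operator ends up calling __add__, while a plain method such as doubled is only ever called explicitly.

```python
class Money:
    """Hypothetical example class, invented for this illustration."""

    def __init__(self, cents):
        self.cents = cents

    def __add__(self, other):
        # Python calls this when it evaluates `self + other`.
        if not isinstance(other, Money):
            return NotImplemented
        return Money(self.cents + other.cents)

    def doubled(self):
        # An ordinary method: no operator or builtin calls it implicitly.
        return Money(self.cents * 2)


total = Money(150) + Money(250)         # operator syntax
same = Money(150).__add__(Money(250))   # explicit call, same result
print(total.cents, same.cents, total.doubled().cents)
```

Both spellings give the same sum, which is the sense in which dunder methods are still "normal" methods.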
No, as far as I am aware, this is unique to Python, though many other languages support a similar idea (builtin/operator overloading) but through different mechanisms.
Shameless plug: I wrote a fairly comprehensive guide to this aspect of Python last year. You can read about how to use these methods on your own objects at http://rafekettler.com/magicmethods.html.
1) The variables you speak of __*__ are system variables or methods.
To quote the Python reference guide:
System-defined names. These names are defined by the interpreter and its implementation (including the standard library); applications should not expect to define additional names using this convention. The set of names of this class defined by Python may be extended in future versions.
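Essentially they are variables or methods predefined by the system. For example, every function object has a __name__ attribute that contains the name of that function (foo.__name__). Note that a bare __name__ used inside a function body is looked up as a module-level variable and holds the name of the enclosing module instead. You can find more comprehensive information and examples here.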
2) This concept of system reserved variables is fundamental in most programming languages. For example PHP refers to them as Magic Constants. The Python example above to get a function name can be achieved in PHP using __FUNCTION__. More examples here.
From documentation:
Certain classes of identifiers (besides keywords) have special
meanings. These classes are identified by the patterns of leading and
trailing underscore characters:
_*
Not imported by from module import *. The special identifier _ is used in the interactive interpreter to store the result of the last
evaluation; it is stored in the builtin module. When not in
interactive mode, _ has no special meaning and is not defined. See
section The import statement.
Note The name _ is often used in conjunction with internationalization; refer to the documentation for the gettext
module for more information on this convention.
__*__
System-defined names. These names are defined by the interpreter and its implementation (including the standard library). Current
system names are discussed in the Special method names section and
elsewhere. More will likely be defined in future versions of Python.
Any use of __*__ names, in any context, that does not follow
explicitly documented use, is subject to breakage without warning.
__*
Class-private names. Names in this category, when used within the context of a class definition, are re-written to use a mangled form to
help avoid name clashes between “private” attributes of base and
derived classes. See section Identifiers (Names).
http://docs.python.org/reference/lexical_analysis.html#reserved-classes-of-identifiers
These are special methods that are called in specific contexts.
Read the documentation for the details:
A class can implement certain operations that are invoked by special
syntax (such as arithmetic operations or subscripting and slicing) by
defining methods with special names. This is Python’s approach to
operator overloading, allowing classes to define their own behavior
with respect to language operators. For instance, if a class defines a
method named __getitem__(), and x is an instance of this class, then
x[i] is roughly equivalent to x.__getitem__(i) for old-style classes
and type(x).__getitem__(x, i) for new-style classes. Except where
mentioned, attempts to execute an operation raise an exception when no
appropriate method is defined (typically AttributeError or TypeError).
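A rough illustration of the quoted passage, assuming Python 3 (where every class is new-style); Squares and Plain are hypothetical classes made up for this sketch:

```python
class Squares:
    """Hypothetical class whose item i is i squared."""

    def __getitem__(self, i):
        return i * i


class Plain:
    """Defines no __getitem__, so subscripting it is an error."""


x = Squares()
a = x[7]                          # special syntax
b = type(x).__getitem__(x, 7)     # roughly what the syntax is equivalent to

try:
    Plain()[7]
except TypeError as exc:
    error = type(exc).__name__    # no appropriate method is defined

print(a, b, error)
```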
To answer your second question, this convention is used in C with macros such as __FILE__. Guido van Rossum, the creator of Python, explicitly says this was his inspiration in his blog on the history of Python.
It's used to indicate that the attributes are internal. There is no actual privacy in Python, so these are used as hints to indicate internals rather than API methods.
I'm looking at code like this:
def foo():
    return 42

foo.x = 5
This obviously adds a member to the function object named foo. I find this very useful as it makes these function objects look very similar to Objects with a __call__ function.
Are there rules I must follow to make sure I don't cause problems in future updates to Python, such as names that I must avoid? Perhaps there is a PEP or documentation section that mentions rules?
There are no rules, other than to take the reserved classes of identifiers into account. Specifically, try to avoid using dunder names:
System-defined names, informally known as “dunder” names. [...] Any use of __*__ names, in any context, that does not follow explicitly documented use, is subject to breakage without warning.
There is otherwise nothing special about functions accepting arbitrary attributes; almost anything in Python accepts arbitrary attributes if there is a place to put them (which is, almost always, a __dict__ attribute).
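A quick sketch of the "place to put them" point, reusing foo from the question: the attribute lands in the function's __dict__, while a bare object() instance has no __dict__ and therefore rejects the assignment.

```python
def foo():
    return 42


foo.x = 5
stored = dict(foo.__dict__)   # the attribute lives in the function's __dict__

plain = object()              # instances of object have no __dict__
try:
    plain.x = 5
    accepted = True
except AttributeError:
    accepted = False

print(stored, accepted)
```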
Within the Python standard library, function attributes are used to link decorator wrapper functions to the original wrapped function (via the functools.update_wrapper() function and its sidekick, the functools.wraps() decorator function), and to attach state and methods to a function when augmented by decorators (e.g. the singledispatch() decorator adds several methods and a registry to the decorated function).
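For instance, a decorator can keep its own state in a function attribute while functools.wraps() copies the metadata and sets __wrapped__. This is only a sketch: counted, calls and greet are hypothetical names chosen for the example.

```python
import functools


def counted(func):
    """Hypothetical decorator that counts calls in a function attribute."""

    @functools.wraps(func)            # copies __name__, __doc__, sets __wrapped__
    def wrapper(*args, **kwargs):
        wrapper.calls += 1            # state stored on the wrapper itself
        return func(*args, **kwargs)

    wrapper.calls = 0
    return wrapper


@counted
def greet(name):
    """Return a greeting."""
    return "hello " + name


greet("a")
greet("b")
print(greet.calls, greet.__name__, greet.__wrapped__("c"))
```

Calling greet.__wrapped__ directly bypasses the wrapper, so it does not bump the counter.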
It is a good technique. The rule is not to shadow any dunder names that have special meanings.
Here is a good way to implement a singleton:
import faker

def my_fake_data():
    if not getattr(my_fake_data, 'factory', None):
        my_fake_data.factory = faker.Faker()
    return my_fake_data.factory
Monkey patching uses a similar technique (setting a class attribute instead of a function attribute) but for more "devious" reasons such as changing the implementation of a previously defined class.
When I see class definitions such as class Foo:, the names always start with an upper-case letter.
However, isn't a list [] or a dict {} or some other built-in type a class as well? For that matter, every keyword typed into Python's IDLE that is automatically color coded in purple (with the Windows binary distribution) is itself a class, right?
Such as spam = list()
spam is now an instance of a list()
So my question is: why does Python allow us to do something like list = list() in the first place, when probably nobody does that? And also, why is it not list = List()?
Did the developers of the language decide not to use any sort of convention, while it is the case that most Python programmers do name their classes as such?
Yes, uppercase-initial classes are the convention, as outlined in PEP 8.
You are correct that many builtin types do not follow this convention. These are holdovers from earlier stages of Python when there was a much bigger difference between user-defined classes and builtin types. However, it still seems that builtin or extension types written in C are more likely to have lowercase names (e.g., numpy.array, not numpy.Array).
Nonetheless, the convention is to use uppercase-initial for your own classes in Python code.
PEP8 is the place to go for code style.
To address your question on why list = list() is valid: list is simply a name, and it can be overridden (rebound) in the global namespace like any other variable.
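A small sketch of what that rebinding does, assuming the code runs at module level (inside a function the same assignment would fail, because list would be treated as a local name). The builtin itself stays reachable through the builtins module, and deleting the global un-shadows it.

```python
import builtins

list = list()            # rebinds the global name `list` to an empty list

try:
    list("abc")          # the name now refers to a list instance
    shadowed = False
except TypeError:
    shadowed = True      # a list instance is not callable

still_available = builtins.list("abc")   # the builtin type is untouched

del list                 # remove the shadowing global
restored = list("abc")   # lookup falls through to the builtin again

print(shadowed, still_available, restored)
```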
All are in the end design decisions.
[...] why does python allow us to first of all do something like list = list()
https://www.python.org/dev/peps/pep-0020/ says (try also >>> import this)
Simple is better than complex.
Special cases aren't special enough to break the rules.
And this would be a special case.
[...] why is it not list = List()
https://www.python.org/dev/peps/pep-0008/#class-names says:
Class Names
Class names should normally use the CapWords convention.
The naming convention for functions may be used instead in cases where
the interface is documented and used primarily as a callable.
Note that there is a separate convention for builtin names: most
builtin names are single words (or two words run together), with the
CapWords convention used only for exception names and builtin
constants. [emphasis mine]
All other classes should use the CapWords convention. As list, object, etc are built-in names, they follow this separate convention.
I think that the only person who really knows the entire answer to your question is the BDFL. Convention outside of the types implemented in C is definitely to use upper-case (as detailed in PEP8). However, it's interesting to note that not all C-level types follow the convention; Py_True and Py_False, for example, do not. Yes, they're constants at Python-level, but they're all PyTypeObjects. I'd be curious to know if that's the only distinction and, hence, the difference in convention.
According to PEP 8, a nice set of guidelines for Python developers:
Almost without exception, class names use the CapWords convention.
Classes for internal use have a leading underscore in addition.
PEP 8 is directed at Python development for the standard library in the main Python distribution, but they're sensible guidelines to follow.
I'm reading a book on Python, and it says that when you make a call to help(obj) to list all of the methods that can be called on obj, the methods that are surrounded by __ on both sides are private helper methods that cannot be called.
However, one of the listed methods for a string is __len__ and you can verify that if s is some string, entering s.__len__() into Python returns the length of s.
Why is it okay to call some of these methods, such as __len__, but others cannot be called?
The book is incorrect. You can call __dunder__ special methods directly; all that is special about them is their documented use in Python and how the language itself uses them.
Most code should not call them directly and should leave that to Python. For example, use the len() function rather than calling the __len__ method on the object, because len() will validate the __len__ return value.
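To illustrate the validation point, here is a sketch with a deliberately broken, hypothetical class: calling __len__ directly hands back whatever the method returns, while len() rejects a negative length with a ValueError.

```python
class Broken:
    """Hypothetical class with an invalid __len__ implementation."""

    def __len__(self):
        return -1


obj = Broken()
direct = obj.__len__()   # no validation: simply returns -1

rejected = False
try:
    len(obj)             # len() checks the value returned by __len__
except ValueError:
    rejected = True

print(direct, rejected)
```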
The language reserves all such names for its own use; see Reserved classes of identifiers in the reference documentation:
System-defined names, informally known as "dunder" names. These names are defined by the interpreter and its implementation (including the standard library). Current system names are discussed in the Special method names section and elsewhere. More will likely be defined in future versions of Python. Any use of __*__ names, in any context, that does not follow explicitly documented use, is subject to breakage without warning.
I am new to Python as you might tell. I have read various documents but I still cannot figure out if there are "naming best practices" for strings, functions and, of course, classes.
If I want to name a class or a function as a SiteMap, is it ok to use SiteMap? Should it be Site_map or any other thing, for example?
Thank you!
PS. any further reading resource is GREATLY appreciated!
PS. I am doing web-app development (learning, better to say!)
Naming Conventions:
There are various Python naming conventions I use. Consistency here is certainly good as it helps to identify what sort of object names point to. I think the conventions I use basically follow PEP8.
1) Module names should be lowercase with underscores instead of spaces.
(And should be valid module names for importing.)
2) Variable names and function/method names should also be lowercase
with underscores to separate words.
3) Class names should be CamelCase (uppercase letter to start with,
words run together, each starting with an uppercase letter).
4) Module constants should be all uppercase.
E.g. You would typically have module.ClassName.method_name.
5) Module names in CamelCase with a main class name identical to the
module name are annoying. (e.g. ConfigParser.ConfigParser, which
should always be spelt configobj.ConfigObj.)
6) Also, variables, functions, methods and classes which aren't part of your public API, should begin with a single underscore. (using double underscores to make attributes private almost always turns out to be a mistake - especially for testability.)
Whitespace
And finally, you should always have whitespace around operators and after punctuation. The exception is default arguments to methods and functions.
E.g. def function(default=argument): and x = a * b + c
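Put together, the conventions above might look like the following sketch. Every name here (MAX_PAGES, SiteMap, add_page, the example.com URL) is hypothetical and only chosen to match the question.

```python
MAX_PAGES = 3  # module constant: all uppercase


class SiteMap:  # class name: CapWords
    def __init__(self, base_url="https://example.com"):  # no spaces around = in defaults
        self.base_url = base_url
        self._pages = []  # single leading underscore: not part of the public API

    def add_page(self, page_path):  # method name: lowercase_with_underscores
        if len(self._pages) >= MAX_PAGES:
            raise ValueError("site map is full")
        self._pages.append(self.base_url + "/" + page_path)  # whitespace around operators
        return len(self._pages)


site_map = SiteMap()  # variable name: lowercase_with_underscores
page_count = site_map.add_page("about")
print(page_count)
```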
PEP 8 specifies the recommended naming conventions for Python. Among the rules discussed there, it mentions underscore_names for functions and variables (regardless of their type), and CamelCase for classes.
I'm fairly new to actual programming languages, and Python is my first one. I know my way around Linux a bit, enough to get a summer job with it (I'm still in high school), and on the job, I have a lot of free time which I'm using to learn Python.
One thing's been getting me though. What exactly is different in Python when you have expressions such as
x.__add__(y) <==> x+y
x.__getattribute__('foo') <==> x.foo
I know what methods do and stuff, and I get what they do, but my question is: How are those double underscore methods above different from their simpler looking equivalents?
P.S., I don't mind being lectured on programming history, in fact, I find it very useful to know :) If these are mainly historical aspects of Python, feel free to start rambling.
Here is the creator of Python explaining it:
... rather than devising a new syntax for
special kinds of class methods (such
as initializers and destructors), I
decided that these features could be
handled by simply requiring the user
to implement methods with special
names such as __init__, __del__, and
so forth. This naming convention was
taken from C where identifiers
starting with underscores are reserved
by the compiler and often have special
meaning (e.g., macros such as
__FILE__ in the C preprocessor).
...
I also used this technique to allow
user classes to redefine the behavior
of Python's operators. As previously
noted, Python is implemented in C and
uses tables of function pointers to
implement various capabilities of
built-in objects (e.g., “get
attribute”, “add” and “call”). To
allow these capabilities to be defined
in user-defined classes, I mapped the
various function pointers to special
method names such as __getattr__,
__add__, and __call__. There is a
direct correspondence between these
names and the tables of function
pointers one has to define when
implementing new Python objects in C.
When you start a method with two underscores (and no trailing underscores), Python's name mangling rules are applied. This is a way to loosely simulate the private keyword from other OO languages such as C++ and Java. (Even so, the method is still technically not private in the way that Java and C++ methods are private, but it is "harder to get at" from outside the instance.)
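A short sketch of the mangling, using a hypothetical Account class: inside the class body __balance is rewritten to _Account__balance, so the plain name fails from outside while the mangled one still works.

```python
class Account:
    """Hypothetical class used to show name mangling."""

    def __init__(self):
        self.__balance = 100        # actually stored as _Account__balance

    def balance(self):
        return self.__balance       # mangled the same way inside the class


acct = Account()

hidden = False
try:
    acct.__balance                  # no mangling outside a class body
except AttributeError:
    hidden = True

mangled = acct._Account__balance    # still reachable: harder, not private
print(hidden, mangled, acct.balance())
```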
Methods with two leading and two trailing underscores are considered to be "built-in" methods, that is, they're used by the interpreter and are generally the concrete implementations of overloaded operators or other built-in functionality.
Well, power for the programmer is good, so there should be a way to customize behaviour. Like operator overloading (__add__, __div__, __ge__, ...), attribute access (__getattribute__, __getattr__ (those two are different), __delattr__, ...) etc. In many cases, like operators, the usual syntax maps 1:1 to the respective method. In other cases, there is a special procedure which at some point involves calling the respective method - for example, __getattr__ is only called if the object doesn't have the requested attribute and __getattribute__ is not implemented or raised AttributeError. And some of them are really advanced topics that get you deeeeep into the object system's guts and are rarely needed. So no need to learn them all, just consult the reference when you need/want to know. Speaking of reference, here it is.
They are used to specify that the Python interpreter should use them in specific situations.
E.g., the __add__ method allows the + operator to work for custom classes. Without it, attempting to add two instances raises a TypeError (unsupported operand type(s) for +).
From an historical perspective, leading underscores have often been used as a method for indicating to the programmer that the names are to be considered internal to the package/module/library that defines them. In languages which do not provide good support for private namespaces, using underscores is a convention to emulate that. In Python, when you define a method named '__foo__' the maintenance programmer knows from the name that something special is going on which is not happening with a method named 'foo'. If Python had chosen to use 'add' as the internal method to overload '+', then you could never have a class with a method 'add' without causing much confusion. The underscores serve as a cue that some magic will happen.
A number of other questions are now marked as duplicates of this question, and at least two of them ask what either the __spam__ methods are called, or what the convention is called, and none of the existing answers cover that, so:
There actually is no official name for either.
Many developers unofficially call them "dunder methods", for "Double UNDERscore".
Some people use the term "magic methods", but that's somewhat ambiguous between meaning dunder methods, special methods (see below), or something somewhere between the two.
There is an official term "special attributes", which overlaps closely but not completely with dunder methods. The Data Model chapter in the reference never quite explains what a special attribute is, but the basic idea is that it's at least one of the following:
An attribute that's provided by the interpreter itself or its builtin code, like __name__ on a function.
An attribute that's part of a protocol implemented by the interpreter itself, like __add__ for the + operator, or __getitem__ for indexing and slicing.
An attribute that the interpreter is allowed to look up specially, by ignoring the instance and going right to the class, like __add__ again.
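The last point, looking the attribute up on the class and ignoring the instance, can be observed with a sketch like this one (assuming Python 3; Box is a hypothetical class):

```python
class Box:
    """Hypothetical class used to show special method lookup."""

    def __len__(self):
        return 3


b = Box()
b.__len__ = lambda: 99          # instance attribute with a dunder name

via_builtin = len(b)            # looks up __len__ on type(b), ignoring the instance
via_attribute = b.__len__()     # ordinary attribute access finds the instance attribute
print(via_builtin, via_attribute)
```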
Most special attributes are methods, but not all (e.g., __name__ isn't). And most use the "dunder" convention, but not all (e.g., the next method on iterators in Python 2.x).
And meanwhile, most dunder methods are special attributes, but not all—in particular, it's not that uncommon for stdlib or external libraries to want to define their own protocols that work the same way, like the pickle protocol.
[Speculation] Python was influenced by Algol68, and Guido possibly used Algol68 at the University of Amsterdam. Algol68 has a similar "stropping regime" called "Quote stropping". In Algol68 the operators, types and keywords can appear in a different typeface (usually **bold**, or __underlined__); in source code files this typeface is achieved with quotes, e.g. 'abs' (quoting similar to quoting in 'wikitext').
Algol68 ⇒ Python (Operators mapped to member functions)
'and' ⇒ __and__
'or' ⇒ __or__
'not' ⇒ not
'entier' ⇒ __trunc__
'shl' ⇒ __lshift__
'shr' ⇒ __rshift__
'upb' ⇒ __sizeof__
'long' ⇒ __long__
'int' ⇒ __int__
'real' ⇒ __float__
'format' ⇒ __format__
'repr' ⇒ __repr__
'abs' ⇒ __abs__
'minus' ⇒ __neg__
'minus' ⇒ __sub__
'plus' ⇒ __add__
'times' ⇒ __mul__
'mod' ⇒ __mod__
'div' ⇒ __truediv__
'over' ⇒ __div__
'up' ⇒ __pow__
'im' ⇒ imag
're' ⇒ real
'conj' ⇒ conjugate
In Algol68 these were referred to as bold names, e.g. abs, but "under-under-abs" __abs__ in Python.
My 2 cents: ¢ So sometimes — like a ghost — when you cut and paste python classes into a wiki you will magically reincarnate Algol68's bold keywords. ¢