I'm looking at code like this:
def foo():
    return 42

foo.x = 5
This adds an attribute named x to the function object foo. I find this very useful, as it makes these function objects look very similar to objects with a __call__ method.
Are there rules I must follow to make sure I don't cause problems in future updates to Python, such as names that I must avoid? Perhaps there is a PEP or documentation section that mentions rules?
There are no rules, other than to take the reserved classes of identifiers into account. Specifically, try to avoid using dunder names:
System-defined names, informally known as “dunder” names. [...] Any use of __*__ names, in any context, that does not follow explicitly documented use, is subject to breakage without warning.
There is otherwise nothing special about functions accepting arbitrary attributes; almost anything in Python accepts arbitrary attributes if there is a place to put them (which is, almost always, a __dict__ attribute).
Within the Python standard library, function attributes are used to link decorator wrapper functions to the original wrapped function (via the functools.update_wrapper() function and its sidekick, the functools.wraps() decorator), and to attach state and methods to a function when augmented by decorators (e.g. the functools.singledispatch() decorator adds several methods and a registry to the decorated function).
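As a minimal sketch of that last point (Python 3.4+; the describe function and its implementations are purely illustrative), functools.singledispatch attaches register() and a registry directly to the decorated function:

from functools import singledispatch

@singledispatch
def describe(value):
    return "something else"

@describe.register(int)              # register() is an attribute added to describe
def _(value):
    return "an integer"

print(describe(3))                   # 'an integer'
print(describe("x"))                 # 'something else'
print(list(describe.registry))       # the dispatch table, stored as a function attribute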
It is a good technique. The rule is: don't shadow any dunder names, which have special meanings.
Here is a good way to implement a singleton:
import faker

def my_fake_data():
    if not getattr(my_fake_data, 'factory', None):
        my_fake_data.factory = faker.Faker()
    return my_fake_data.factory
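Hypothetical usage of the sketch above (assuming the third-party faker package is installed): repeated calls return the same cached instance.

a = my_fake_data()
b = my_fake_data()
assert a is b    # the Faker instance is created once and cached on the function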
Monkey patching uses a similar technique (setting a class attribute instead of a function attribute), but for more "devious" reasons, such as changing the implementation of a previously defined class.
Related
I was just wondering, what is the terminology used for bits of code such as:
.lower()
.upper()
.get()
len()
And just general commands such as that.
The terms .lower(), .upper(), .get() etc. are called methods in Python.
These are functions that are members of a class.
In technical terms:
A method is a function that takes a class instance as its first parameter. Methods are members of classes.
class My_Class:
    def method(self, possibly, other, arguments):
        pass  # do something here
EDIT
Thanks to J.F.Sebastian for pointing it out:
len() is a function, not a method. len(), dir(), int(), open(), sorted() etc. are built-in functions of Python.
Even more edit for conceptual clarification
Generally speaking, methods are functions that belong to a class, while functions can live at any scope of the code. So in plain words you can say that all methods are functions, but not all functions are methods. An easy way to distinguish the two is the . operator: if the call is preceded by an instance and a . operator, it is a method. The general form of a method call is the_instance.the_method().
Those are "functions." The ones that are used with an "instance" like "blah".upper() are often called "methods" (of a class).
I'm reading a book on Python, and it says that when you make a call to help(obj) to list all of the methods that can be called on obj, the methods that are surrounded by __ on both sides are private helper methods that cannot be called.
However, one of the listed methods for a string is __len__ and you can verify that if s is some string, entering s.__len__() into Python returns the length of s.
Why is it okay to call some of these methods, such as __len__, but not others?
The book is incorrect. You can call __dunder__ special methods directly; all that is special about them is their documented use in Python and how the language itself uses them.
Most code simply should not call them directly and should leave it to Python to call them. Use the len() function rather than calling the __len__ method on the object, for example, because len() will validate the __len__ return value.
The language reserves all such names for its own use; see Reserved classes of identifiers in the reference documentation:
System-defined names, informally known as "dunder" names. These names are defined by the interpreter and its implementation (including the standard library). Current system names are discussed in the Special method names section and elsewhere. More will likely be defined in future versions of Python. Any use of __*__ names, in any context, that does not follow explicitly documented use, is subject to breakage without warning.
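A hedged illustration of that validation (the Broken class below is hypothetical):

class Broken(object):
    def __len__(self):
        return -1                # an invalid length

b = Broken()
print(b.__len__())               # -1: the bogus value comes back unchecked
print(len(b))                    # raises ValueError: __len__() should return >= 0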
This is a followup to function that returns a dict whose keys are the names of the input arguments, which I learned many things (paraphrased):
Python objects, on the whole, don't know their names.
No, this is not possible in general with *args. You'll have to use keyword arguments
When the number of arguments is fixed, you can do this with locals
Using globals(). This will only work if the values are unique in the module scope, so it's fragile
You're probably better off not doing this anyway and rethinking the problem.
The first point highlights my fundamental misunderstanding of Python variables. The responses were very pedagogic and nearly instantaneous; clearly this is a well-understood yet easily confused topic.
Since I'd like to learn how to do things properly, is it considered bad practice to create a dummy class simply to hold the variables with names attached to them?
class system: pass
S = system()
S.T = 1.0
S.N = 20
S.L = 10
print vars(S)
This accomplishes my original intent, but I'm left wondering if there is something I'm not considering that can bite me later.
I do it as an homage to JavaScript, where there is no distinction between dictionaries and instance variables. I think it's not necessarily an antipattern, also because, unlike dictionaries, a missing value raises AttributeError instead of KeyError, and it is easier to spot typos in the name. As I said, not an antipattern, provided that:
the scope of the class is restricted to a very specific usage
the routine or method you are calling (e.g. vars in your example) is private in nature. I would not want a public interface with that calling semantics, nor would I want it as a returned entity.
the name of the "dummy" class is extremely clear in its intent and the kind of aggregate it represents.
the lifetime of that object is short and uneventful. It is just a temporary bag of data.
If these constraints are not respected, go for a fully recognized class with properties.
You can do that, but why not use a dictionary?
If you do go this route, you're better off passing keyword args to the class's constructor and letting the constructor copy them onto the instance's attributes. Something like:
class Foo(object):
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)
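Hypothetical usage, mirroring the original example:

f = Foo(T=1.0, N=20, L=10)
print(vars(f))    # {'T': 1.0, 'N': 20, 'L': 10}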
I have noticed that in Python some of the fundamental types have two kinds of methods: those that are surrounded by __ and those that aren't.
For example, if I have a variable of type float called my_number, I can see in IPython that it has the following methods:
my_number.__abs__ my_number.__pos__
my_number.__add__ my_number.__pow__
my_number.__class__ my_number.__radd__
my_number.__coerce__ my_number.__rdiv__
my_number.__delattr__ my_number.__rdivmod__
my_number.__div__ my_number.__reduce__
my_number.__divmod__ my_number.__reduce_ex__
my_number.__doc__ my_number.__repr__
my_number.__eq__ my_number.__rfloordiv__
my_number.__float__ my_number.__rmod__
my_number.__floordiv__ my_number.__rmul__
my_number.__format__ my_number.__rpow__
my_number.__ge__ my_number.__rsub__
my_number.__getattribute__ my_number.__rtruediv__
my_number.__getformat__ my_number.__setattr__
my_number.__getnewargs__ my_number.__setformat__
my_number.__gt__ my_number.__sizeof__
my_number.__hash__ my_number.__str__
my_number.__init__ my_number.__sub__
my_number.__int__ my_number.__subclasshook__
my_number.__le__ my_number.__truediv__
my_number.__long__ my_number.__trunc__
my_number.__lt__ my_number.as_integer_ratio
my_number.__mod__ my_number.conjugate
my_number.__mul__ my_number.fromhex
my_number.__ne__ my_number.hex
my_number.__neg__ my_number.imag
my_number.__new__ my_number.is_integer
my_number.__nonzero__ my_number.real
What is the difference between those that are surrounded by __ and those that aren't?
Is this some sort of standard used in other programming languages? Does it usually mean the same thing in similar languages?
Generally, "double underscore" methods are used internally by Python for certain builtin functions or operators (e.g. __add__ defines behavior for the +). Those that do not have double underscores are just normal methods that wouldn't be used by operators or builtins. Now, these methods are still "normal" methods, in that you can call them just like any other method, but parts of the Python core treat them specially.
No, as far as I am aware this naming convention is unique to Python, though many other languages support a similar idea (builtin/operator overloading) through different mechanisms.
Shameless plug: I wrote a guide to this aspect of Python last year which is fairly comprehensive, you can read about how to use these methods on your own objects at http://rafekettler.com/magicmethods.html.
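A minimal hedged sketch of defining such a method on your own class (the Money class is purely illustrative):

class Money(object):
    def __init__(self, amount):
        self.amount = amount

    def __add__(self, other):            # invoked by the + operator
        return Money(self.amount + other.amount)

    def __repr__(self):                  # invoked by repr() and the interactive prompt
        return "Money(%r)" % self.amount

print(Money(2) + Money(3))   # Money(5): Python calls Money.__add__ behind the scenes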
1) The names you speak of, of the form __*__, are system-defined variables or methods.
To quote the Python reference guide:
System-defined names. These names are defined by the interpreter and its implementation (including the standard library); applications should not expect to define additional names using this convention. The set of names of this class defined by Python may be extended in future versions.
Essentially they are variables or methods predefined by the system. For example, the __name__ attribute of a function object always contains the name of that function, and the module-level __name__ variable contains the name of the current module. You can find more comprehensive information and examples here.
2) This concept of system reserved variables is fundamental in most programming languages. For example PHP refers to them as Magic Constants. The Python example above to get a function name can be achieved in PHP using __FUNCTION__. More examples here.
From documentation:
Certain classes of identifiers (besides keywords) have special
meanings. These classes are identified by the patterns of leading and
trailing underscore characters:
_*
Not imported by from module import *. The special identifier _ is used in the interactive interpreter to store the result of the last
evaluation; it is stored in the builtin module. When not in
interactive mode, _ has no special meaning and is not defined. See
section The import statement.
Note The name _ is often used in conjunction with internationalization; refer to the documentation for the gettext
module for more information on this convention.
__*__
System-defined names. These names are defined by the interpreter and its implementation (including the standard library). Current
system names are discussed in the Special method names section and
elsewhere. More will likely be defined in future versions of Python.
Any use of __*__ names, in any context, that does not follow
explicitly documented use, is subject to breakage without warning.
__*
Class-private names. Names in this category, when used within the context of a class definition, are re-written to use a mangled form to
help avoid name clashes between “private” attributes of base and
derived classes. See section Identifiers (Names).
http://docs.python.org/reference/lexical_analysis.html#reserved-classes-of-identifiers
These are special methods that are called in specific contexts.
Read the documentation for the details:
A class can implement certain operations that are invoked by special
syntax (such as arithmetic operations or subscripting and slicing) by
defining methods with special names. This is Python’s approach to
operator overloading, allowing classes to define their own behavior
with respect to language operators. For instance, if a class defines a
method named __getitem__(), and x is an instance of this class, then
x[i] is roughly equivalent to x.__getitem__(i) for old-style classes
and type(x).__getitem__(x, i) for new-style classes. Except where
mentioned, attempts to execute an operation raise an exception when no
appropriate method is defined (typically AttributeError or TypeError).
To answer your second question, this convention is used in C with macros such as __FILE__. Guido van Rossum, the creator of Python, explicitly says this was his inspiration in his blog on the history of Python.
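A minimal hedged sketch of the x[i] / __getitem__ correspondence the documentation describes (the Squares class is only illustrative):

class Squares(object):
    def __getitem__(self, i):        # invoked by the subscription syntax x[i]
        return i * i

x = Squares()
print(x[4])                  # 16: Python calls type(x).__getitem__(x, 4)
print(x.__getitem__(4))      # 16: the same method, called directly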
It's used to indicate that the attributes are internal. There is no actual privacy in Python, so these are used as hints to indicate internals rather than API methods.
I didn't know you could do this:
def tom():
    print "tom's locals: ", locals()

def dick(z):
    print "z.__name__ = ", z.__name__
    z.guest = "Harry"
    print "z.guest = ", z.guest
    print "dick's locals: ", locals()

tom()                          #>>> tom's locals:  {}
#print tom.guest               #AttributeError: 'function' object has no attribute 'guest'
print "tom's dir:", dir(tom)   # no 'guest' entry

dick(tom)                      #>>> z.__name__ =  tom
                               #>>> z.guest =  Harry
                               #>>> dick's locals:  {'z': <function tom at 0x02819F30>}
tom()                          #>>> tom's locals:  {}
#print dick.guest              #AttributeError: 'function' object has no attribute 'guest'

print tom.guest                #>>> Harry
print "tom's dir:", dir(tom)   # 'guest' entry appears
Function tom() has no locals. Function dick() knows where tom() lives and puts up Harry as 'guest' over at tom()'s place. harry doesn't appear as a local at tom()'s place, but if you ask for tom's guest, harry answers. harry is a new attribute at tom().
UPDATE: From outside tom(), you can say "print dir(tom)" and see the tom object's dictionary. (You can do it from inside tom(), too. So tom could find out he had a new lodger, harry, going under the name of 'guest'.)
So, attributes can be added to a function's namespace from outside the function? Is that often done? Is it acceptable practice? Is it recommended in some situations? Is it actually vital at times? (Is it Pythonic?)
UPDATE: Title now says 'attributes'; it used to say 'variables'. Here's a PEP about Function Attributes.
I think you might be conflating the concepts of local variables and function attributes. For more information on Python function attributes, see the SO question Python function attributes - uses and abuses.
@behindthefall, the motivation for giving function objects generic assignable attributes (they didn't use to have them) was that, absent such possibilities, real and popular frameworks were abusing what few assignable attributes existed (typically __doc__) to record information about each given function object. So there was clearly a "pent-up demand" for this functionality, and Guido decided to address it directly (adding an optional dict to each function object to record its attributes isn't a big deal -- most function objects don't need it, and it is optional, so the cost is just 4 bytes for a null pointer ;-).
Assigning such attributes in arbitrary places would be very bad practice, making the code harder to understand for no real benefit, but they're very useful when used in a controlled way -- for example, a decorator could usefully record all kinds of things about the function being decorated, and the context in which the decoration occurred, as attributes of the wrapper function, allowing trivially-easy introspection of such metadata to occur later at any time, as needed.
As other answers already pointed out, local variables (which exist per call, not per function object!) are a completely disjoint namespace from a function object's attributes held in its __dict__.
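A hedged sketch of that controlled use: a decorator that records metadata about the decoration as attributes on the wrapper (traced, call_count, decorated_at, and original are illustrative names, not from any particular framework):

import functools
import time

def traced(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.call_count += 1              # state kept as a function attribute
        return func(*args, **kwargs)
    wrapper.call_count = 0
    wrapper.decorated_at = time.time()       # context in which the decoration occurred
    wrapper.original = func                  # link back to the wrapped function
    return wrapper

@traced
def answer():
    return 42

answer()
print(answer.call_count, answer.original.__name__)    # 1 answer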
In python, a namespace is just a dictionary object, mapping variable name as a string (in this case, 'guest') to a value (in this case, 'Harry'). So as long as you have access to an object, and it's mutable, you can change anything about its namespace.
On small projects it's not a huge problem, and it lets you hack things together faster, but it becomes incredibly confusing on larger projects, where your data could be modified from anywhere.
There are ways of making attributes of classes "more private", such as Name Mangling.
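A minimal hedged sketch of name mangling (the Account class is only illustrative):

class Account(object):
    def __init__(self):
        self.__balance = 0           # name-mangled to _Account__balance

a = Account()
print(a._Account__balance)           # 0: still reachable under the mangled name
# print(a.__balance)                 # AttributeError: no attribute '__balance'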
tom.guest is just an attribute on the tom function object; it has nothing to do with the scope or locals() inside that function, and nothing to do with the fact that tom is a function. It would work on any object.
I have used this in the past to make a self-contained function with "enums" that go along with it.
Suppose I were implementing a seek() function. The built-in Python one (on file objects) takes an integer to tell it how to operate; yuck, give me an enum please.
def seek(f, offset, whence=0):
    return f.seek(offset, whence)

seek.START = 0
seek.RELATIVE = 1
seek.END = 2

f = open(filename)
seek(f, 0, seek.START)   # seek to start of file
seek(f, 0, seek.END)     # seek to end of file
What do you think, too tricky and weird? I do like how it keeps the "enum" values bundled together with the function; if you import the function from a module, you get its "enum" values as well, automatically.
Python functions are lexically scoped so there is no way to add variables to the function outside of its defined scope.
However, the function still will have access to all parent scopes, if you really wanted to design the system like that (generally considered bad practice though):
>>> def foo():
...     def bar():
...         print x
...     x = 1
...     bar()
...
>>> foo()
1
Mutating function attributes from the outside is mostly a bad idea, since functions are generally assumed to be immutable. The most Pythonic way of implementing this behavior is using classes and methods instead.
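A hedged sketch of that class-based alternative (Counter is an illustrative name), which also echoes the "object with a __call__ method" idea from the first question:

class Counter(object):
    def __init__(self):
        self.count = 0               # state lives on the instance, not on a function

    def __call__(self):
        self.count += 1
        return self.count

counter = Counter()
counter()
counter()
print(counter.count)    # 2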
Python API documentation generation tools, such as pydoc and epydoc, use introspection to determine a function's name and docstring (available as the __name__ and __doc__ attributes). Well-behaved function decorators are expected to preserve these attributes, so such tools continue to work as expected (i.e. decorating a function should preserve the decorated function's documentation). You do this by copying these attributes from the wrapped function to the wrapper. Take a look at update_wrapper in the functools module:
WRAPPER_ASSIGNMENTS = ('__module__', '__name__', '__doc__')
WRAPPER_UPDATES = ('__dict__',)

def update_wrapper(wrapper,
                   wrapped,
                   assigned = WRAPPER_ASSIGNMENTS,
                   updated = WRAPPER_UPDATES):
    """Update a wrapper function to look like the wrapped function

       wrapper is the function to be updated
       wrapped is the original function
       ...
    """
    for attr in assigned:
        setattr(wrapper, attr, getattr(wrapped, attr))
    for attr in updated:
        getattr(wrapper, attr).update(getattr(wrapped, attr, {}))
    ...
So, that's at least one example where modifying function attributes is useful and accepted.
In some situations, it can be useful to "annotate" a function by setting an attribute; Django uses this in a few places (a generic sketch of the pattern is shown after the list):
You can set alters_data to True on model methods that change the database, preventing them from being called in templates.
You can set allow_tags on model methods that will be displayed in the admin, to signify that the method returns HTML content, which shouldn't be automatically escaped.
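A hedged, generic sketch of this "annotate with an attribute" pattern (the names report, requires_admin, and dispatch are hypothetical, not Django's):

def report(request):
    return "monthly numbers"

report.requires_admin = True         # annotation read by other code, not by report itself

def dispatch(view, request, is_admin=False):
    if getattr(view, 'requires_admin', False) and not is_admin:
        raise RuntimeError("admin only")
    return view(request)

print(dispatch(report, None, is_admin=True))    # prints: monthly numbers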
As always, use your judgement. If modifying attributes is accepted practice (for example, when writing a decorator), then by all means go ahead. If it's going to be part of a well documented API, it's probably fine too.