Python functions can be given new attributes from outside the scope? - python

I didn't know you could do this:
def tom():
    print "tom's locals: ", locals()

def dick(z):
    print "z.__name__ = ", z.__name__
    z.guest = "Harry"
    print "z.guest = ", z.guest
    print "dick's locals: ", locals()

tom()               #>>> tom's locals: {}
#print tom.guest    #AttributeError: 'function' object has no attribute 'guest'
print "tom's dir:", dir(tom)  # no 'guest' entry
dick(tom)           #>>> z.__name__ = tom
                    #>>> z.guest = Harry
                    #>>> dick's locals: {'z': <function tom at 0x02819F30>}
tom()               #>>> tom's locals: {}
#print dick.guest   #AttributeError: 'function' object has no attribute 'guest'
print tom.guest     #>>> Harry
print "tom's dir:", dir(tom)  # 'guest' entry appears
Function tom() has no locals. Function dick() knows where tom() lives and puts up Harry as 'guest' over at tom()'s place. Harry doesn't appear as a local at tom()'s place, but if you ask for tom's guest, Harry answers. Harry is a new attribute of tom.
UPDATE: From outside tom(), you can say "print dir(tom)" and see the names in the tom object's dictionary. (You can do it from inside tom(), too. So tom could find out he had a new lodger, Harry, going under the name of 'guest'.)
So, attributes can be added to a function's namespace from outside the function? Is that often done? Is it acceptable practice? Is it recommended in some situations? Is it actually vital at times? (Is it Pythonic?)
UPDATE: Title now says 'attributes'; it used to say 'variables'. Here's a PEP about Function Attributes.

I think you might be conflating the concepts of local variables and function attributes. For more information on Python function attributes, see the SO question Python function attributes - uses and abuses.

@behindthefall, the motivation to give function objects generic assignable attributes (they didn't use to have them) was that, absent such possibilities, real and popular frameworks were abusing what few assignable attributes existed (typically __doc__) to record information about each given function object. There was clearly a "pent-up demand" for this functionality, so Guido decided to address it directly (adding an optional dict to each function object to record its attributes isn't a big deal -- most function objects don't need it, and it is optional, so the cost is just 4 bytes for a null pointer;-).
Assigning such attributes in arbitrary places would be very bad practice, making the code harder to understand for no real benefit, but they're very useful when used in a controlled way -- for example, a decorator could usefully record all kinds of things about the function being decorated, and the context in which the decoration occurred, as attributes of the wrapper function, allowing trivially-easy introspection of such metadata to occur later at any time, as needed.
As other answers already pointed out, local variables (which are per-call, not per-function-object!) are a completely disjoint namespace from a function object's attributes held in its __dict__.
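
A minimal sketch of that distinction, written in Python 3 syntax (the names `counter`, `local_total` and `calls` are made up for illustration): each call gets a fresh set of locals, while an attribute stored on the function object persists between calls.

```python
def counter():
    # `local_total` is a local variable: it is created anew on every call.
    local_total = 0
    local_total += 1
    # `calls` lives in counter.__dict__, so it survives between calls.
    counter.calls = getattr(counter, "calls", 0) + 1
    return local_total, counter.calls

first = counter()     # (1, 1)
second = counter()    # (1, 2): the local restarted, the attribute did not

# The attribute is in the function's __dict__; the local never is.
attr_is_stored = "calls" in counter.__dict__
local_is_stored = "local_total" in counter.__dict__
```

The function has to refer to itself by name to reach its own attribute, which is one reason this style is usually reserved for controlled uses such as decorators.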

In Python, a namespace is just a dictionary object, mapping each variable name as a string (in this case, 'guest') to a value (in this case, 'Harry'). So as long as you have access to an object, and it's mutable, you can change anything about its namespace.
On small projects this is not a huge problem, and it lets you hack things together faster, but it becomes incredibly confusing on larger projects, where your data could be modified from anywhere.
There are ways of making attributes of classes "more private", such as Name Mangling.

tom.guest is just an attribute on the tom function object. It has nothing to do with the scope or locals() inside that function, and nothing to do with the fact that tom is a function; it would work on any object that accepts arbitrary attributes.
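
A quick sketch of the point, in Python 3 syntax (`Plain` is a made-up class): attribute assignment works the same way on a function and on an ordinary instance, as long as the object has a `__dict__` to hold the attribute. Built-in objects without one, such as a bare `object()`, reject the assignment.

```python
def tom():
    pass

class Plain:
    pass

tom.guest = "Harry"            # functions have a __dict__
p = Plain()
p.guest = "Harry"              # so do instances of ordinary classes

try:
    object().guest = "Harry"   # a bare object() has no __dict__
except AttributeError:
    rejected = True
else:
    rejected = False
```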

I have used this in the past to make a self-contained function with "enums" that go along with it.
Suppose I were implementing a seek() function. The built-in Python one (on file objects) takes an integer to tell it how to operate; yuck, give me an enum please.
def seek(f, offset, whence=0):
    return f.seek(offset, whence)

seek.START = 0
seek.RELATIVE = 1
seek.END = 2

f = open(filename)
seek(f, 0, seek.START)  # seek to start of file
seek(f, 0, seek.END)    # seek to end of file
What do you think, too tricky and weird? I do like how it keeps the "enum" values bundled together with the function; if you import the function from a module, you get its "enum" values as well, automatically.

Python functions are lexically scoped so there is no way to add variables to the function outside of its defined scope.
However, the function still will have access to all parent scopes, if you really wanted to design the system like that (generally considered bad practice though):
>>> def foo():
...     def bar():
...         print x
...     x = 1
...     bar()
...
>>> foo()
1
Mutating function variables is mostly a bad idea, since functions are assumed to be immutable. The most pythonic way of implementing this behavior is using classes and methods instead.
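
As a sketch of the class-based alternative mentioned above (Python 3 syntax; `Greeter` and its `guest` attribute are hypothetical names), the state lives on an instance instead of being attached to a function object:

```python
class Greeter:
    """Callable object that keeps its own state."""

    def __init__(self, guest):
        self.guest = guest

    def __call__(self):
        return "Hello, " + self.guest

greet = Greeter("Harry")
before = greet()         # "Hello, Harry"
greet.guest = "Dick"     # state is changed through an ordinary instance attribute
after = greet()          # "Hello, Dick"
```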

Python API documentation generation tools, such as pydoc and epydoc, use introspection to determine a function's name and docstring (available as the __name__ and __doc__ attributes). Well-behaved function decorators are expected to preserve these attributes, so such tools continue to work as expected (i.e. decorating a function should preserve the decorated function's documentation). You do this by copying these attributes from the decorated function to the wrapper function that the decorator returns. Take a look at update_wrapper in the functools module:
WRAPPER_ASSIGNMENTS = ('__module__', '__name__', '__doc__')
WRAPPER_UPDATES = ('__dict__',)
def update_wrapper(wrapper,
                   wrapped,
                   assigned = WRAPPER_ASSIGNMENTS,
                   updated = WRAPPER_UPDATES):
    """Update a wrapper function to look like the wrapped function

       wrapper is the function to be updated
       wrapped is the original function
       ...
    """
    for attr in assigned:
        setattr(wrapper, attr, getattr(wrapped, attr))
    for attr in updated:
        getattr(wrapper, attr).update(getattr(wrapped, attr, {}))
    ...
So, that's at least one example where modifying function attributes is useful and accepted.
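
In practice this is usually done through functools.wraps, which applies update_wrapper for you. A minimal sketch in Python 3 syntax, where the `logged` decorator and its `call_count` attribute are invented for illustration:

```python
import functools

def logged(func):
    """Hypothetical decorator that counts calls to func."""
    @functools.wraps(func)          # copies __name__, __doc__, __dict__, ...
    def wrapper(*args, **kwargs):
        wrapper.call_count += 1     # metadata stored on the wrapper itself
        return func(*args, **kwargs)
    wrapper.call_count = 0
    return wrapper

@logged
def add(a, b):
    """Return a + b."""
    return a + b

result = add(2, 3)
# The wrapper now looks like the wrapped function to introspection tools.
name_preserved = add.__name__ == "add"
doc_preserved = add.__doc__ == "Return a + b."
```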
In some situations, it can be useful to "annotate" a function by setting an attribute; Django uses this in a few places:
You can set alters_data to True on model methods that change the database, preventing them from being called in templates.
You can set allow_tags on model methods that will be displayed in the admin, to signify that the method returns HTML content, which shouldn't be automatically escaped.
As always, use your judgement. If modifying attributes is accepted practice (for example, when writing a decorator), then by all means go ahead. If it's going to be part of a well documented API, it's probably fine too.
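
The annotation pattern can be sketched without Django. The following is not Django's implementation, only an illustration in Python 3 syntax: `safe_call` and the `Account` class are invented for the example, and just the `alters_data` flag name is borrowed from the text above.

```python
class Account:
    def __init__(self):
        self.balance = 10

    def describe(self):
        return "balance=%d" % self.balance

    def wipe(self):
        self.balance = 0
    wipe.alters_data = True      # annotate the function object


def safe_call(method):
    """Call `method` unless it is flagged as altering data."""
    if getattr(method, "alters_data", False):
        return None              # refuse to run flagged methods
    return method()

acct = Account()
shown = safe_call(acct.describe)   # runs normally
blocked = safe_call(acct.wipe)     # skipped because of the flag
```

The flag is set on the plain function inside the class body; a bound method forwards attribute lookups to its underlying function, so `acct.wipe.alters_data` is visible to the caller.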

Related

Difference between functions and methods in Python

I can't seem to understand the difference between methods and functions.
From my knowledge I am aware that methods are functions that are unique to the classes they are implemented in, but for functions can I say that they can be used in general and are not restricted to a certain class. Also, is the indentation of functions vs methods another essential difference? As methods are implemented within classes and functions are outside with the least indentation.
A function is virtually the same as a method, except that the latter is bound to a class. In Python, in most cases, you define a method the same way you define a function. However, to refer to the instance it is called on, a method takes a 'self' parameter, as in: def function_name(self):. The indentation works similarly in both cases.
I think the reason why you tend to think that the indentation of a method is deeper than that of a function is because by the time you're writing your method, you are already indented inside the class.
Python, in particular among programming languages, really does not differentiate much between functions and methods.
What you state is true: "methods are functions that are unique to the classes they are implemented in".
And that accounts for the indentation difference: since methods are inside a class, they are indented inside the class statement block.
When you create a simple class, and retrieve a method from it directly, it is actually a function (in python 3 - it worked differently in old Python 2.x):
from types import FunctionType

class A:
    def b(self):
        pass

print(A.b, type(A.b) is FunctionType)
Prints:
<function A.b at 0x...> True
So, in Python, a "method" is really and literally a function!
It will work differently when retrieved from an instance of the class it is declared on.
[optional complicated paragraph] Then, Python wraps it with what can be described as "a lazy object callable that will insert the instance it is bound to as first argument to the underlying function when called." Once you can make sense of this phrase, you can check "seasoned Pythonista" in your CV.
What it means is simply that "self" will appear "magically" when calling the method, and moreover, one can go back to the original function by looking at the method's .__func__ attribute.
So, given the same class A above:
In [52]: print(A().b, type(A().b), type(A().b) is FunctionType)
<bound method A.b of <__main__.A object at 0x...>> <class 'method'> False
In [53]: print(A().b.__func__, type(A().b.__func__), type(A().b.__func__) is FunctionType)
<function A.b at 0x...> <class 'function'> True
Note that in these examples I instantiate a new object of the class A each time, by writing A(), while in the first example I retrieve b directly from the class, without parentheses: A.b
As I stated at the beginning of the text: be aware that the fact that a plain function can be used as a method is particular to Python, and the two kinds of objects will likely have some differences in other programming languages. The idea, however, will remain the same: a method will always "know" about the instance from which it was retrieved.
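
A short Python 3 sketch of what "knowing the instance" means in practice. The method body returning `self` is an addition made only so the equivalence can be checked:

```python
class A:
    def b(self):
        return self

a = A()
bound = a.b                              # bound method: remembers `a`

same_function = bound.__func__ is A.b    # the underlying plain function
same_instance = bound.__self__ is a      # the instance it was retrieved from

# Calling the bound method is equivalent to calling the function
# with the instance passed explicitly as the first argument.
via_method = a.b()
via_function = A.b(a)
```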

What are the rules for adding members to Python methods safely?

I'm looking at code like this:
def foo():
    return 42

foo.x = 5
This obviously adds a member to the function object named foo. I find this very useful as it makes these function objects look very similar to Objects with a __call__ function.
Are there rules I must follow to make sure I don't cause problems in future updates to Python, such as names that I must avoid? Perhaps there is a PEP or documentation section that mentions rules?
There are no rules, other than to take the reserved classes of identifiers into account. Specifically, try to avoid using dunder names:
System-defined names, informally known as “dunder” names. [...] Any use of __*__ names, in any context, that does not follow explicitly documented use, is subject to breakage without warning.
There is otherwise nothing special about functions accepting arbitrary attributes; almost anything in Python accepts arbitrary attributes if there is a place to put them (which is, almost always, a __dict__ attribute).
Within the Python standard library, function attributes are used to link decorator wrapper functions to the original wrapped function (via the functools.update_wrapper() function and its sidekick, the @functools.wraps() decorator function), and to attach state and methods to a function when augmented by decorators (e.g. the singledispatch() decorator adds several methods and a registry to the decorated function).
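
Both standard-library uses can be observed directly. A small sketch in Python 3 syntax; the functions `describe`, `plain` and `wrapper` are made-up examples:

```python
import functools

@functools.singledispatch
def describe(value):
    return "object"

@describe.register(int)      # `register` is an attribute added by singledispatch
def _(value):
    return "int"

def plain(x):
    """Docstring of plain."""
    return x

@functools.wraps(plain)
def wrapper(x):
    return plain(x)

int_result = describe(3)
str_result = describe("three")
has_registry = int in describe.registry     # read-only mapping of registered types
links_back = wrapper.__wrapped__ is plain   # set by functools.update_wrapper()
```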
It is a good technique. The rule is not to shadow any dunder names that have special meanings.
Here is a good way to implement a singleton:
import faker

def my_fake_data():
    if not getattr(my_fake_data, 'factory', None):
        my_fake_data.factory = faker.Faker()
    return my_fake_data.factory
Monkey patching uses a similar technique (setting a class attribute instead of a function attribute) but for more "devious" reasons such as changing the implementation of a previously defined class.

Why is the scope of names from a classdef suite excluded from the closure of a funcdef?

I was expecting a funcdef to bind the closest inner closure to its definition. Apparently it's not the case:
phoo = 4

class Alice:  # 'classdef'
    # <class 'suite'>:
    phoo = 1
    spam = phoo + 11
    blah = staticmethod(lambda: phoo + 22)

    @staticmethod
    def Blake():
        return phoo + 33
Test:
>>> Alice.spam
12
>>> Alice.blah()
26
>>> Alice.Blake()
37
It is said that a code block is executed in an execution frame. When the class definition's block is executed, spam resolves phoo from inside Alice.
I expected the lookup inside Blake to resolve phoo from Alice as well. The execution model says,
If the definition occurs in a function block, the scope extends to any blocks contained within the defining one, unless a contained block introduces a different binding for the name.
Then it says,
A code block is executed in an execution frame.
This decision caused my assumption to go wrong. What is the rationale behind it?
edit: this is Python 2 with old-style classes, but answers may cover new-style classes if they say so. I asked for the rationale; however, if you can add an insider technical explanation, it is very welcome too!
From an intuitive point of view, the answer is pretty simple:
Free variables in a function definition capture variables in the enclosing scope. But class attributes aren't variables, they're class attributes; you have to access them as Alice.spam or self.spam, not as spam. Therefore, spam doesn't capture the outer spam because there is no outer spam.
But under the covers, this isn't really true.
For a new-style class, while the class definition's body is being executed, spam actually is a local variable in the scope of that body; it's only when the metaclass (type, in this case) is executed that the class attributes are created from those locals.[1]
For an old-style class, it's not completely defined what happens, so you pretty much have to turn to the implementation. In particular, there's no step where the metaclass is executed with the class definition's locals to generate the class object. But for the most part, it works pretty much as if that were the case.
So, why doesn't spam bind to that local?
A free variable can only bind to a closure cell from an outer scope, which is a special kind of local variable. And the compiler only creates a closure cell for a variable in a function definition when a local function accesses it. It doesn't create closure cells for variables in class definitions.
So if spam doesn't bind to Alice.spam, what does it bind to? Well, by the usual LEGB rules, if there's no local assignment, and no enclosing cell variable, it's a global.
Some of the above may be hard to understand without an example, so:
>>> def f():
...     a=1
...     b=2
...     def g():
...         b
...     return g
>>> f.__code__.co_cellvars # cell locals, captured by closures
('b',)
>>> f.__code__.co_varnames # normal locals
('a', 'g')
>>> g = f()
>>> g.__code__.co_freevars # free variables that captured cells
('b',)
>>> class Alice:
...     a=1
...     b=2
...     def f():
...         b
>>> Alice.f.__func__.__code__.co_freevars
()
>>> Alice.f.__func__.__code__.co_varnames
()
>>> Alice.f.__func__.__code__.co_names # loosely, globals
('b',)
If you're wondering where co_cellvars and the like are specified… well, they're not, but the inspect module docs give a brief summary of what they mean.
If you understand CPython bytecode, it's also worth calling dis on all of these chunks of code to see the instructions used for loading and saving all these variables.
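
For readers on Python 3, here is a sketch that reproduces the question's behaviour next to an ordinary closure. The function names `blake`, `outer` and `inner` are chosen for the example, and the code-object attributes inspected are CPython details, as noted above.

```python
phoo = 4

class Alice:
    phoo = 1
    spam = phoo + 11          # the class body sees its own phoo: 12

    @staticmethod
    def blake():
        return phoo + 33      # class-body names are not captured: global phoo, 37

def outer():
    phoo = 1
    def inner():
        return phoo + 33      # function locals are captured via a cell: 34
    return inner

inner = outer()
class_result = Alice.blake()
closure_result = inner()

closure_freevars = inner.__code__.co_freevars        # ('phoo',)
method_freevars = Alice.blake.__code__.co_freevars   # ()
method_names = Alice.blake.__code__.co_names         # ('phoo',), looked up as a global
```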
So, the big question is, why doesn't Python generate cells for class definitions?
Unless Guido remembers, and finds it interesting enough to write a Python history blog post about this, I'm not sure we'll ever know the answer. (You could, of course, try asking him—a comment on his blog or an email to whichever mailing list seems most relevant is probably the best way.)
But here's my guess:
Cells are implemented as indices into an array stored in a code object. When the function is called, its frame gets a matching array of objects. When a local function definition is executed inside that function call, the free variables are bound to references to the cell slots in the frame.
Classes don't have __code__ members (or, pre-2.6, func_code). Why? Because a class definition is executed as soon as it's defined, and never executed again, so why bother? This means there's nowhere to stash a cell, and nothing for it to reference. On top of that, the execution frame always goes away as soon as the execution finishes, because there can't be any external references to it.
Of course you could change that: add __code__ members to classes, create cells in them, and then, if someone closed over those cells, that would keep the frame alive after execution just as it does with functions. Would that be a good idea? I don't know. My guess is that nobody asked the question when Python classes were first being defined. While it's obvious now how much class definitions and function definitions have in common, I think that's an instance of Guido's time machine—he made a design decision without realizing that it would turn out to solve problems nobody would raise until a decade later.
[1] Some of these details may be CPython-specific. For example, I believe it's technically legal for an implementation to make a closure cell for every local in a function, or to use some other mechanism that's equivalent to that. For example, if you do exec('spam=3') in an inner function, all the language reference says is that it's not guaranteed that it will affect the outer function's spam, not that it's guaranteed not to do so.

Python and reference passing. Limitation?

I would like to do something like the following:
class Foo(object):
    def __init__(self):
        self.member = 10
        pass

def factory(foo):
    foo = Foo()

aTestFoo = None
factory(aTestFoo)
print aTestFoo.member
However it crashes with AttributeError: 'NoneType' object has no attribute 'member':
the object aTestFoo has not been modified inside the call of the function factory.
What is the pythonic way of performing that ? Is it a pattern to avoid ? If it is a current mistake, how is it called ?
In C++, in the function prototype, I would have added a reference to the pointer to be created in the factory... but maybe this is not the kind of things I should think about in Python.
In C#, there's the key word ref that allows to modify the reference itself, really close to the C++ way. I don't know in Java... and I do wonder in Python.
Python does not have pass by reference. One of the few things it shares with Java, by the way. Some people describe argument passing in Python as call by value (and define the values as references, where reference means not what it means in C++), some people describe it as pass by reference with reasoning I find quite questionable (they redefine the term to refer to what Python calls a "reference", and end up with something which has nothing to do with what has been known as pass by reference for decades), others go for terms which are not as widely used and abused (popular examples are "{pass,call} by {object,sharing}"). See Call By Object on effbot.org for a rather extensive discussion on the definitions of the various terms, on history, and on the flaws in some of the arguments for the terms pass by reference and pass by value.
The short story, without naming it, goes like this:
Every variable, object attribute, collection item, etc. refers to an object.
Assignment, argument passing, etc. create another variable, object attribute, collection item, etc. which refers to the same object but has no knowledge which other variables, object attributes, collection items, etc. refer to that object.
Any variable, object attribute, collection item, etc. can be used to modify an object, and any other variable, object attribute, collection item, etc. can be used to observe that modification.
No variable, object attribute, collection item, etc. refers to another variable, object attribute, collection items, etc. and thus you can't emulate pass by reference (in the C++ sense) except by treating a mutable object/collection as your "namespace". This is excessively ugly, so don't use it when there's a much easier alternative (such as a return value, or exceptions, or multiple return values via iterable unpacking).
You may consider this like using pointers, but not pointers to pointers (but sometimes pointers to structures containing pointers) in C. And then passing those pointers by value. But don't read too much into this simile. Python's data model is significantly different from C's.
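
The rules above can be sketched in a few lines of Python 3. The function names `rebind`, `mutate` and `factory` are illustrative, and `Foo` mirrors the class from the question:

```python
class Foo:
    def __init__(self):
        self.member = 10

def rebind(foo):
    foo = Foo()          # rebinds the local name only; the caller sees nothing

def mutate(foo):
    foo.member = 99      # modifies the shared object; the caller sees this

def factory():
    return Foo()         # the usual way to hand a new object back

a_test_foo = None
rebind(a_test_foo)
still_none = a_test_foo is None   # True: the caller's variable was never touched

shared = Foo()
mutate(shared)                    # shared.member is now 99

created = factory()               # created.member is 10
```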
You are making a mistake here because in Python
"We call the argument passing technique _call by sharing_, because the argument objects are shared between the caller and the called routine. This technique does not correspond to most traditional argument passing techniques (it is similar to argument passing in LISP). In particular it is not call by value because mutations of arguments performed by the called routine will be visible to the caller. And it is not call by reference because access is not given to the variables of the caller, but merely to certain objects."
In Python, the variables in the formal argument list are bound to the actual argument objects. The objects are shared between caller and callee; there are no "fresh locations" or extra "stores" involved. (Which, of course, is why the CLU folks called this mechanism "call-by-sharing".)
And by the way, Python functions don't run in an extended environment, either. Function bodies have very limited access to the surrounding environment.
The Assignment Statements section of the Python docs might be interesting.
The = statement in Python acts differently depending on the situation, but in the case you present, it just binds the new object to a new local variable:
def factory(foo):
    # This makes a new instance of Foo,
    # and binds it to a local variable `foo`,
    foo = Foo()

# This binds `None` to a top-level variable `aTestFoo`
aTestFoo = None

# Call `factory` with first argument of `None`
factory(aTestFoo)
print aTestFoo.member
Although it can potentially be more confusing than helpful, the dis module can show you the byte-code representation of a function, which can reveal how Python works internally. Here is the disassembly of factory:
>>> dis.dis(factory)
  4           0 LOAD_GLOBAL              0 (Foo)
              3 CALL_FUNCTION            0
              6 STORE_FAST               0 (foo)
              9 LOAD_CONST               0 (None)
             12 RETURN_VALUE
What that says is, Python loads the global Foo class by name (offset 0), and calls it (offset 3; instantiation and calling are very similar), then stores the result in a local variable (offset 6, see STORE_FAST). Then it loads the default return value None (offset 9) and returns it (offset 12).
What is the pythonic way of performing that ? Is it a pattern to avoid ? If it is a current mistake, how is it called ?
Factory functions are rarely necessary in Python. In the occasional case where they are necessary, you would just return the new instance from your factory (instead of trying to assign it to a passed-in variable):
class Foo(object):
    def __init__(self):
        self.member = 10
        pass

def factory():
    return Foo()

aTestFoo = factory()
print aTestFoo.member
Your factory method doesn't return anything - and by default it will have a return value of None. You assign aTestFoo to None, but never re-assign it - which is where your actual error is coming from.
Fixing these issues:
class Foo(object):
    def __init__(self):
        self.member = 10
        pass

def factory(obj):
    return obj()

aTestFoo = factory(Foo)
print aTestFoo.member
This should do what I think you are after, although such patterns are not that typical in Python (ie, factory methods).

Python: Using a dummy class to pass variable names?

This is a followup to "function that returns a dict whose keys are the names of the input arguments", from which I learned many things (paraphrased):
Python objects, on the whole, don't know their names.
No, this is not possible in general with *args. You'll have to use keyword arguments
When the number of arguments is fixed, you can do this with locals
Using globals(). This will only work if the values are unique in the module scope, so it's fragile
You're probably better off not doing this anyway and rethinking the problem.
The first point highlights my fundamental misunderstanding of Python variables. The responses were very pedagogic and nearly instantaneous; clearly this is a well-understood yet easily confused topic.
Since I'd like to learn how to do things properly, is it considered bad practice to create a dummy class to simply hold the variables with names attached to them?
class system: pass
S = system ()
S.T = 1.0
S.N = 20
S.L = 10
print vars(S)
This accomplishes my original intent, but I'm left wondering if there is something I'm not considering that can bite me later.
I do it as an homage to JavaScript, where there is no distinction between dictionaries and instance variables. I don't think it's necessarily an antipattern, partly because, unlike with a dictionary, a missing value raises AttributeError instead of KeyError, and typos in a name are easier to spot. As I said, not an antipattern, provided that
the scope of the class is restricted to a very specific usage
the routine or method you are calling (e.g. vars in your example) is private in nature. I would not want a public interface with those calling semantics, nor would I want it as a returned entity
the name of the "dummy" class is extremely clear in its intent and the kind of aggregate it represents.
the lifetime of that object is short and uneventful. It is just a temporary bag of data.
If these constraints are not respected, go for a fully recognized class with properties.
You can do that, but why not use a dictionary?
But if you do that, you're better off passing keyword args to the class's constructor, and then letting the constructor copy them to the instance's members. Something like:
class Foo(object):
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)
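
On Python 3.3 and later, the standard library ships a ready-made version of this idea, types.SimpleNamespace. A brief sketch using the values from the question (the `label` attribute is an addition for illustration):

```python
from types import SimpleNamespace

# Keyword arguments become attributes, much like the constructor above.
S = SimpleNamespace(T=1.0, N=20, L=10)
S.label = "run-1"        # attributes can still be added afterwards

as_dict = vars(S)        # the name-to-value mapping the question was after
```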
