Python descriptors set and get behaviour [duplicate]

Simple repro:
class VocalDescriptor(object):
    def __get__(self, obj, objtype):
        print('__get__, obj={}, objtype={}'.format(obj, objtype))
    def __set__(self, obj, val):
        print('__set__')

class B(object):
    v = VocalDescriptor()

B.v      # prints "__get__, obj=None, objtype=<class '__main__.B'>"
B.v = 3  # does not print "__set__", evidently does not trigger descriptor
B.v      # does not print anything, we overwrote the descriptor
This question has an effective duplicate, but the duplicate was not answered, and I dug a bit more into the CPython source as a learning exercise. Warning: I went into the weeds. I'm really hoping I can get help from a captain who knows those waters. I tried to be as explicit as possible in tracing the calls I was looking at, for my own future benefit and the benefit of future readers.
I've seen a lot of ink spilled over the behavior of __getattribute__ applied to descriptors, e.g. lookup precedence. The Python snippet in "Invoking Descriptors" just below For classes, the machinery is in type.__getattribute__()... roughly agrees in my mind with what I believe is the corresponding CPython source in type_getattro, which I tracked down by looking at "tp_slots" then where tp_getattro is populated. And the fact that B.v initially prints __get__, obj=None, objtype=<class '__main__.B'> makes sense to me.
What I don't understand is, why does the assignment B.v = 3 blindly overwrite the descriptor, rather than triggering v.__set__? I tried to trace the CPython call, starting once more from "tp_slots", then looking at where tp_setattro is populated, then looking at type_setattro. type_setattro appears to be a thin wrapper around _PyObject_GenericSetAttrWithDict. And there's the crux of my confusion: _PyObject_GenericSetAttrWithDict appears to have logic that gives precedence to a descriptor's __set__ method!! With this in mind, I can't figure out why B.v = 3 blindly overwrites v rather than triggering v.__set__.
Disclaimer 1: I did not rebuild Python from source with printfs, so I'm not completely sure type_setattro is what's being called during B.v = 3.
Disclaimer 2: VocalDescriptor is not intended to exemplify "typical" or "recommended" descriptor definition. It's a verbose no-op to tell me when the methods are being called.

You are correct that B.v = 3 simply overwrites the descriptor with an integer (as it should). In the descriptor protocol, __get__ is invoked for both instance attribute and class attribute access, but __set__ is invoked only for instance attribute access.
For B.v = 3 to invoke a descriptor, the descriptor should have been defined on the metaclass, i.e. on type(B).
>>> class BMeta(type):
...     v = VocalDescriptor()
...
>>> class B(metaclass=BMeta):
...     pass
...
>>> B.v = 3
__set__
To invoke the descriptor on B, you would use an instance: B().v = 3 will do it.
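For example, using the question's original B (the one with v = VocalDescriptor() defined on the class itself, before it is overwritten); the object address will of course differ on your machine:

b = B()
b.v = 3   # prints "__set__"
b.v       # prints "__get__, obj=<__main__.B object at 0x...>, objtype=<class '__main__.B'>"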
The reason for B.v also invoking the getter is to allow users to customize what B.v does, independently of whatever B().v does. A common pattern is to allow direct access to the descriptor instance by returning the descriptor itself when class attribute access was used:
class VocalDescriptor(object):
    def __get__(self, obj, objtype):
        if obj is None:
            return self
        print('__get__, obj={}, objtype={}'.format(obj, objtype))
    def __set__(self, obj, val):
        print('__set__')
Now B.v would return some instance like <mymodule.VocalDescriptor object at 0xdeadbeef> which you can interact with. It is literally the descriptor object, defined as a class attribute, and its state B.v.__dict__ is shared between all instances of B.
Of course it is up to user code to define exactly what they want B.v to do; returning self is just the common pattern. A classmethod is an example of a descriptor that does something different here; see the Descriptor HowTo Guide for a pure-Python implementation of classmethod.
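For reference, the HowTo's pure-Python equivalent is roughly along these lines (a simplified sketch; ClassMethod is an illustrative name, not the real C implementation):

class ClassMethod(object):
    "Simplified emulation of the built-in classmethod."
    def __init__(self, f):
        self.__func__ = f
    def __get__(self, obj, cls=None):
        if cls is None:
            cls = type(obj)
        def bound(*args, **kwargs):
            # pass the owning class, not the instance, as the first argument
            return self.__func__(cls, *args, **kwargs)
        return bound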
Unlike __get__, which can be used to customize B().v and B.v independently, __set__ is not invoked unless the attribute access is on an instance. I would suppose that the goal of customizing B().v = other and B.v = other using the same descriptor v is not common or useful enough to complicate the descriptor protocol further, especially since the latter is still possible with a metaclass descriptor anyway, as shown in BMeta.v above.

Barring any overrides, B.v is equivalent to type.__getattribute__(B, "v"), while b = B(); b.v is equivalent to object.__getattribute__(b, "v"). Both definitions invoke the __get__ method of the result if defined.
Note, though, that the call to __get__ differs in each case. B.v passes None as the first argument, while B().v passes the instance itself. In both cases B is passed as the second argument.
B.v = 3, on the other hand, is equivalent to type.__setattr__(B, "v", 3), which does not invoke __set__.
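To see the equivalence concretely (assuming a fresh B whose v descriptor has not yet been overwritten):

type.__getattribute__(B, "v")      # prints "__get__, obj=None, objtype=<class '__main__.B'>"
object.__getattribute__(B(), "v")  # prints "__get__, obj=<__main__.B object at 0x...>, objtype=<class '__main__.B'>"
type.__setattr__(B, "v", 3)        # rebinds B.v to 3; __set__ is never called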

I think that none of the current answers actually answer your question.
Why does setting a descriptor on a class overwrite the descriptor?
Setting or deleting an attribute on a class (or on a subclass of the class) owning a descriptor (e.g. cls.descr = 3 or del cls.descr) overrides that descriptor because otherwise it would be impossible to replace a faulty descriptor, one whose hypothetical descr.__set__(None, cls, 3) or descr.__delete__(None, cls) raises an exception: a class dictionary (cls.__dict__) is a read-only types.MappingProxyType, so plain attribute assignment on the class is the only way to rebind it. If you do want to intercept setting or deleting an attribute on a class, you can always define a descriptor on the metaclass, of which that class is an instance. So __set__ and __delete__ are always passed an instance of the class owning the descriptor, which is why they do not have an owner parameter.
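The read-only class dictionary is easy to observe (a minimal illustration; the exact error message can vary between Python versions):

class C(object):
    pass

type(C.__dict__)     # <class 'mappingproxy'>
C.__dict__['x'] = 1  # TypeError: 'mappingproxy' object does not support item assignment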
Getting an attribute on a class (or on a subclass of the class) owning a descriptor (e.g. cls.descr) does not bypass that descriptor, because a faulty descriptor, one whose descr.__get__(None, cls) raises an exception, does not prevent you from replacing it. So __get__ is passed either an instance of the class owning the descriptor or the class (or a subclass of the class) itself, which is why it has an owner parameter.
More information in this answer.

Related

Why does setting a descriptor on a class overwrite the descriptor?


Why are python static/class method not callable?

Why are python instance methods callable, but static methods and class methods not callable?
I did the following:
class Test():
    class_var = 42

    @classmethod
    def class_method(cls):
        pass

    @staticmethod
    def static_method():
        pass

    def instance_method(self):
        pass

for attr, val in vars(Test).items():
    if not attr.startswith("__"):
        print(attr, "is %s callable" % ("" if callable(val) else "NOT"))
The result is:
static_method is NOT callable
instance_method is callable
class_method is NOT callable
class_var is NOT callable
Technically this may be because the instance method object has some particular attribute set in a particular way (possibly __call__) that the others lack. Why such asymmetry, or what purpose does it serve?
I came across this while learning python inspection tools.
Additional remarks from comments:
The SO answer linked in the comments says that static/class methods are descriptors, which are not callable. Now I am curious: why are descriptors made not callable, since a descriptor is a class with particular attributes defined (one of __get__, __set__, __delete__)?
Why are descriptors not callable? Basically because they don't need to be. Not every descriptor represents a callable either.
As you correctly note, the descriptor protocol consists of __get__, __set__ and __delete__. Note: no __call__; that's the technical reason why it's not callable. The actual callable is the return value of your static_method.__get__(...).
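You can check that directly with the Test class from the question (note: this reflects the Python versions the question was written against; since Python 3.10 staticmethod objects do define __call__ and are callable, while classmethod objects still are not):

cm = vars(Test)['class_method']
callable(cm)                      # False: the classmethod object has no __call__
callable(cm.__get__(None, Test))  # True: __get__ returns the bound, callable method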
As for the philosophical reason, let's look at the class. The contents of the __dict__, or in your case the results of vars(), are basically the locals() of the class block. If you define a function, it gets stored as a plain function. If you use a decorator such as @staticmethod, it's equivalent to something like:
def _this_is_not_stored_anywhere():
    pass

static_method = staticmethod(_this_is_not_stored_anywhere)
I.e., static_method is assigned a return value of the staticmethod() function.
Now, function objects actually implement the descriptor protocol - every function has a __get__ method on it. This is where the special self and the bound-method behavior comes from. See:
def xyz(what):
    print(what)

repr(xyz)                   # '<function xyz at 0x7f8f924bdea0>'
repr(xyz.__get__("hello"))  # "<bound method str.xyz of 'hello'>"
xyz.__get__("hello")()      # "hello"
Because of how the class calls __get__, your test.instance_method binds to the instance and gets it pre-filled as its first argument.
But the whole point of @classmethod and @staticmethod is that they do something special to avoid the default bound-method behavior! So they can't return a plain function. Instead they return a descriptor object with a custom __get__ implementation.
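A minimal sketch of such a descriptor for the static case (MyStaticMethod is an illustrative name; the real staticmethod is implemented in C):

class MyStaticMethod(object):
    "Staticmethod-like descriptor: __get__ returns the wrapped function unchanged."
    def __init__(self, f):
        self.__func__ = f
    def __get__(self, obj, objtype=None):
        # ignore the instance and class entirely, so no binding happens
        return self.__func__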
Of course, you could put a __call__ method on this descriptor object, but why? It's code that you don't need in practice; you can almost never touch the descriptor object itself. If you do (in code similar to yours), you still need special handling for descriptors, because a general descriptor doesn't have to be (or behave like) a callable: properties are descriptors too. So you don't want __call__ in the descriptor protocol, because if a third party "forgot" to implement __call__ on something you consider a "callable", your code would miss it.
Also, the object is a descriptor, not a function. Putting a __call__ method on it would be masking its true nature :) I mean, it's not wrong per se, it's just ... something that you should never need for anything.
BTW, in case of classmethod/staticmethod, you can get back the original function from their __func__ attribute.
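For example, with the question's Test class:

vars(Test)['static_method'].__func__   # the original plain function
vars(Test)['class_method'].__func__    # likewise, the undecorated function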

which one is the right definition of data-descriptor and non-data descriptor?

Both of them are from the Python documentation.
The first one says:
If an object defines both __get__() and __set__(), it is considered a data descriptor.
Descriptors that only define __get__() are called non-data descriptors
(they are typically used for methods but other uses are possible).
The second one says:
If the descriptor defines __set__() and/or __delete__(), it is a data descriptor; if it defines neither, it is a non-data descriptor.
Normally, data descriptors define both __get__() and __set__(), while non-data descriptors have just the __get__() method.
The question is: is it enough to define only __set__ to make a data descriptor?
And when I referred to the Python source code, I found this:
#define PyDescr_IsData(d) (Py_TYPE(d)->tp_descr_set != NULL)
It seems we can define only __set__, without __get__.
And then I turned to writing some examples to test what I found:
class GetSet(object):
    def __get__(self, instance, cls=None):
        print('__get__')
    def __set__(self, obj, val):
        print('__set__')

class Get(object):
    def __get__(self, instance, cls=None):
        print('__get__')

class Set(object):
    def __set__(self, obj, val):
        print('__set__')

class UserClass(object):
    a = Get()
    b = Set()
    c = GetSet()

u = UserClass()
u.__dict__['a'] = 'a'
u.__dict__['b'] = 'b'
u.__dict__['c'] = 'c'

print('start')
print(u.a)
print(u.b)
print(u.c)
The output confused me again:
start
a
b
__get__
None
According to the Python attribute lookup order, a data descriptor has higher priority than obj.__dict__.
My example seems to show that only a descriptor defining both __set__ and __get__ acts as a data descriptor!
Which one is the right answer ?
__set__ ---> data descriptor
or
__get__ and __set__ ---> data descriptor?
The second quote is correct. The second quote comes from the Python language reference (though you've provided the wrong link), and the language reference is considered more authoritative than how-to guides. Also, it matches the actual behavior; the PyDescr_IsData macro you found is the actual routine used in object.__getattribute__ to determine what counts as a data descriptor, and either __set__ or __delete__ will cause tp_descr_set to be non-null.
The language reference also explains why Set doesn't override the instance dict for u.b:
If it does not define __get__(), then accessing the attribute will return the descriptor object itself unless there is a value in the object’s instance dictionary. [...] Data descriptors with __set__() and __get__() defined always override a redefinition in an instance dictionary.
Defining either __set__ or __delete__ will set a type's tp_descr_set slot and make instances of the type data descriptors. A data descriptor will always be invoked for attempts to set or delete the attribute it manages, even if there is an entry in the instance's dict with the same name, and even if it only has __set__ and you're trying to delete the attribute or vice versa. (If it doesn't have the needed method, it will raise an exception.) If a data descriptor also has __get__, it will also intercept attempts to get the attribute; otherwise, Python will fall back on the normal attribute lookup behavior, as if it wasn't a descriptor at all.
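To see that behavior with the Set descriptor from the question (continuing the same UserClass example; get falls back to the instance dict, while set and delete still go through the descriptor):

u = UserClass()
u.__dict__['b'] = 'b'
u.b        # 'b': no __get__, so the normal lookup finds the instance dict entry
u.b = 5    # prints "__set__": the data descriptor intercepts assignment
del u.b    # raises AttributeError: __delete__ is not defined on Set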

What are some rules of thumb for deciding between __get__, __getattr__, and __getattribute__?

What are some general rules of thumb for choosing which of these to implement in a given class, in a given situation?
I have read the docs, and so understand the difference between them. Rather, I am looking for guidance on how to best integrate their usage into my workflow by being better able to notice more subtle opportunities to use them, and which to use when. That kind of thing. The methods in question are (to my knowledge):
## fallback
__getattr__
__setattr__
__delattr__
## full control
__getattribute__
## (no __setattribute__? What's the deal there?)
## (the descriptor protocol)
__get__
__set__
__delete__
__setattribute__ does not exist because __setattr__ is always called. __getattr__ is only called for f.x if the attribute lookup fails via the normal channel (which is provided by __getattribute__, so that function is similarly always called).
The descriptor protocol is slightly orthogonal to the others. Given
class Foo(object):
    def __init__(self):
        self.x = 5

f = Foo()
The following are true:
f.x would invoke f.__getattribute__('x') if it were defined.
f.x would not invoke f.__getattr__('x') even if it were defined, because the default lookup succeeds.
f.y would invoke f.__getattribute__('y') first and, because that lookup fails, would then invoke f.__getattr__('y') if it were defined (a small illustration follows).
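A small illustration of those rules, using a variant of Foo that defines __getattr__ (FooWithFallback is a hypothetical name for this example):

class FooWithFallback(object):
    def __init__(self):
        self.x = 5
    def __getattr__(self, name):
        # only reached when the normal lookup raises AttributeError
        print('__getattr__ called for', name)
        return 42

f = FooWithFallback()
f.x   # 5: found by the normal lookup, __getattr__ never fires
f.y   # prints "__getattr__ called for y", then evaluates to 42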
The descriptor is invoked by an attribute, rather than for an attribute. That is:
class MyDescriptor(object):
    def __get__(self, obj, objtype=None):
        pass
    def __set__(self, obj, value):
        pass

class Foo(object):
    x = MyDescriptor()

f = Foo()
Now, f.x would cause type(f).__dict__['x'].__get__ to be called, and f.x = 3 would call type(f).__dict__['x'].__set__(3).
That is, Foo.__getattr__ and Foo.__getattribute__ would be used to find what f.x references; once you have that, f.x produces the result of type(f.x).__get__() if defined, and f.x = y invokes f.x.__set__(y) if defined.
(The above calls to __get__ and __set__ are only approximately correct, since I've left out the details of what arguments __get__ and __set__ actually receive, but this should be enough to explain the difference between __get__ and __getattr[ibute]__.)
Put yet another way, if MyDescriptor did not define __get__, then f.x would simply return the instance of MyDescriptor.
For __getattr__ vs __getattribute__, see for example Difference between __getattr__ vs __getattribute__ .
__get__ is not really related. I'm going to quote from the official documentation for the descriptor protocol here:
The default behavior for attribute access is to get, set, or delete the attribute from an object’s dictionary. For instance, a.x has a lookup chain starting with a.__dict__['x'], then type(a).__dict__['x'], and continuing through the base classes of type(a) excluding metaclasses. If the looked-up value is an object defining one of the descriptor methods, then Python may override the default behavior and invoke the descriptor method instead. Where this occurs in the precedence chain depends on which descriptor methods were defined.
The purpose of __get__ is to control what happens once a.x is found through that "lookup chain" (for example, to create a method object instead of returning a plain function found via type(a).__dict__['x']); the purpose of __getattr__ and __getattribute__ is to alter the lookup chain itself.
There is no __setattribute__ because there is only one way to actually set an attribute of an object, under the hood. You might want to cause other things to happen "magically" when an attribute is set; but __setattr__ covers that. __setattribute__ couldn't possibly provide any functionality that __setattr__ doesn't already.
However - the best answer, in the overwhelming majority of cases, is to not even think of using any of these. First look to higher-level abstractions, such as property, classmethod, and staticmethod. If you think you need specialized tools like this and can't figure it out for yourself, there's a pretty good chance you're wrong in your thinking; but regardless, it's better to post a more specific question in that case.

Implementing class descriptors by subclassing the `type` class

I'd like to have some data descriptors as part of a class. Meaning that I'd like class attributes to actually be properties, whose access is handled by class methods.
It seems that Python doesn't directly support this, but that it can be implemented by subclassing the type class. So adding a property to a subclass of type will cause its instances to have descriptors for that property. Its instances are classes. Thus, class descriptors.
Is this advisable? Are there any gotchas I should watch out for?
It is conventional (usually) for a descriptor, when accessed on a class, to return the descriptor object itself. This is what property does; if you access a property object on a class, you get the property object back (because that's what its __get__ method chooses to do). But that's a convention; you don't have to do it that way.
So, if you only need a getter descriptor on your class, and you don't mind that an attempt to set the attribute will overwrite the descriptor, you can do something like this with no metaclass programming:
def classproperty_getter_only(f):
    class NonDataDescriptor(object):
        def __get__(self, instance, icls):
            return f(icls)
    return NonDataDescriptor()

class Foo(object):
    @classproperty_getter_only
    def flup(cls):
        return 'hello from', cls

print(Foo.flup)
print(Foo().flup)
which prints:
('hello from', <class '__main__.Foo'>)
('hello from', <class '__main__.Foo'>)
If you want a full fledged data descriptor, or want to use the built-in property object, then you're right you can use a metaclass and put it there (realizing that this attribute will be totally invisible from instances of your class; metaclasses are not examined when doing attribute lookup on an instance of a class).
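For example, a sketch using a property on the metaclass (Python 3 syntax; Meta and flup are illustrative names):

class Meta(type):
    @property
    def flup(cls):
        return 'hello from', cls
    @flup.setter
    def flup(cls, value):
        print('setting flup on', cls, 'to', value)

class Foo(metaclass=Meta):
    pass

Foo.flup        # ('hello from', <class '__main__.Foo'>)
Foo.flup = 3    # prints "setting flup on <class '__main__.Foo'> to 3"
Foo().flup      # AttributeError: instances do not see metaclass attributes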
Is it advisable? I don't think so. I wouldn't do what you're describing casually in production code; I would only consider it if I had a very compelling reason to do so (and I can't think of such a scenario off the top of my head). Metaclasses are very powerful, but they aren't well understood by all programmers, and are somewhat harder to reason about, so their use makes your code harder to maintain. I think this sort of design would be frowned upon by the python community at large.
Is this what you mean by "class attributes to actually be properties, whose access is handled by class methods"?
You can use the property decorator to make an accessor appear to be an actual data member. Then you can use the x.setter decorator to make a setter for that attribute.
Be sure to inherit from object, or this won't work.
class Foo(object):
    def __init__(self):
        self._hiddenx = 3

    @property
    def x(self):
        return self._hiddenx + 10

    @x.setter
    def x(self, value):
        self._hiddenx = value

p = Foo()
p.x      # 13
p.x = 4
p.x      # 14
