I was writing code that finds "unbound methods" of a class using introspection and was surprised to see two different kinds of descriptors for builtin types:
>>> type(list.append), list.append
(<class 'method_descriptor'>, <method 'append' of 'list' objects>)
>>> type(list.__add__), list.__add__
(<class 'wrapper_descriptor'>, <slot wrapper '__add__' of 'list' objects>)
Searching the docs turned up very limited but interesting results:
A note in the inspect module documentation saying that inspect.getattr_static doesn't resolve descriptors, along with code that can be used to resolve them.
An optimization made in Python 2.4, which claims that method_descriptor is more efficient than wrapper_descriptor but does not explain what they are:
The methods list.__getitem__(), dict.__getitem__(), and dict.__contains__() are now implemented as method_descriptor objects rather than wrapper_descriptor objects. This form of access doubles their performance and makes them more suitable for use as arguments to functionals: map(mydict.__getitem__, keylist).
The difference in performance intrigued me. Clearly the two types differ, so I went looking for additional information.
Neither of these types is in the types module:
>>> import types
>>> type(list.append) in vars(types).values()
False
>>> type(list.__add__) in vars(types).values()
False
Using help doesn't provide any useful information:
>>> help(type(list.append))
Help on class method_descriptor in module builtins:
class method_descriptor(object)
| Methods defined here:
|
<generic descriptions for>
__call__, __get__, __getattribute__, __reduce__, and __repr__
|
| ----------------------------------------------------------------------
| Data descriptors defined here:
|
| __objclass__
|
| __text_signature__
>>> help(type(list.__add__))
Help on class wrapper_descriptor in module builtins:
class wrapper_descriptor(object)
| Methods defined here:
|
<generic descriptions for>
__call__, __get__, __getattribute__, __reduce__, and __repr__
|
| ----------------------------------------------------------------------
| Data descriptors defined here:
|
| __objclass__
|
| __text_signature__
Searching the internet only came up with results about "what is a descriptor" or vague references to the specific types involved.
So my question is:
What is the actual difference between <class 'method_descriptor'> and <class 'wrapper_descriptor'>?
It's an implementation detail. At C level, a built-in type like list defines methods like append by name through an array of PyMethodDef structs, while special methods like __add__ are defined more indirectly.
__add__ corresponds to a function pointer in one of two slots: sq_concat in the type's tp_as_sequence, or nb_add in the type's tp_as_number. If a type defines one of those slots, Python generates a wrapper_descriptor wrapping that slot for the __add__ method of the Python-level API.
The wrapping necessary for type slots and PyMethodDef structs is a bit different; for example, two slots could correspond to one method, or one slot could correspond to six methods. Slots also don't carry their method names with them, while the method name is one of the fields in a PyMethodDef. Since different code is needed for the two cases, Python uses different wrapper types to wrap them.
If you want to see the code, both method_descriptor and wrapper_descriptor are implemented in Objects/descrobject.c, with struct typedefs in Include/descrobject.h. You can see the code that initializes the wrappers in Objects/typeobject.c, where PyType_Ready delegates to add_operators for wrapper_descriptors and add_methods for method_descriptors.
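The distinction described above can also be observed from Python, without reading the C source. The following is a minimal sketch, assuming CPython (other implementations may use different types); the variable names are only for illustration.

```python
# Assumes CPython; the type names below are CPython implementation details.

# append is defined by name in a PyMethodDef array; __add__ comes from a type slot.
named = list.__dict__['append']
slot = list.__dict__['__add__']

print(type(named).__name__)  # method_descriptor
print(type(slot).__name__)   # wrapper_descriptor

# Both descriptors record their method name and the class they belong to.
print(named.__name__, named.__objclass__)
print(slot.__name__, slot.__objclass__)

# Binding each descriptor to an instance also produces two different bound types.
bound_named = named.__get__([1], list)
bound_slot = slot.__get__([1], list)
print(type(bound_named).__name__)  # builtin_function_or_method
print(type(bound_slot).__name__)   # method-wrapper

# One slot can back several methods: tp_richcompare provides all six comparisons.
rich_names = ['__lt__', '__le__', '__eq__', '__ne__', '__gt__', '__ge__']
rich_types = {type(list.__dict__[name]).__name__ for name in rich_names}
print(rich_types)
```

Both bound objects behave the same when called; only the wrapping machinery behind them differs.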
It seems that method_descriptor and wrapper_descriptor are two kinds of callables available in CPython.
The difference between them seems to be simple:
method_descriptor is apparently used for the methods of built-in (implemented in C) objects:
>>> set.__dict__['union'].__class__
<class 'method_descriptor'>
wrapper_descriptor is used, for example, for the operators of built-in types:
>>> int.__dict__['__add__'].__class__
<class 'wrapper_descriptor'>
This is the place where I found this information.
Related
Where is object.__init__ located in the cpython repository?
I searched for __init__ in Objects/object.c, but it gives no results.
It appears that all the immutable data types use object.__init__, so I would like to know the implementation of it.
Objects/object.c is where (most of) the object protocol is implemented, not where object is implemented.
object is implemented along with type in Objects/typeobject.c, and its __init__ method is object_init in that file.
(Note that the very similar-sounding PyObject_Init function is actually completely unrelated to object.__init__. PyObject_Init is a generic helper function that performs type pointer and refcount initialization for a newly-allocated object struct.)
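A quick way to check from Python that the immutable built-in types really do share object's __init__ is to compare the attributes directly. This is a small sketch assuming CPython; the choice of int, str and tuple as sample types is mine.

```python
# Assumes CPython: object.__init__ is exposed as a slot wrapper around tp_init.
init = object.__dict__['__init__']
print(type(init).__name__)  # wrapper_descriptor

# For each sample type, record whether it defines its own __init__
# and whether the one it exposes is object's.
inherited = {}
for cls in (int, str, tuple):
    defines_own = '__init__' in vars(cls)
    same_as_object = cls.__init__ is object.__init__
    inherited[cls.__name__] = (defines_own, same_as_object)
    print(cls.__name__, defines_own, same_as_object)
```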
What are the drawbacks or benefits to using types.FunctionType vs typing.Callable as a type-hint annotation?
Consider the following code...
import types
import typing

def functionA(func: types.FunctionType):
    rt = func()
    print(func.__name__)
    return rt

def functionB(func: typing.Callable):
    rt = func()
    print(func.__name__)
    return rt
The only difference I can see is Callable could be any sort of callable object (function, method, class, etc) while FunctionType is limited to only functions.
Am I overlooking something? Is there a benefit to using FunctionType over Callable in certain situations?
The types module predates PEP 484 annotations and was created mostly to make runtime introspection of objects easier. For example, to determine if some value is a function, you can run isinstance(my_var, types.FunctionType).
The typing module contains type hints that are specifically intended to assist static analysis tools such as mypy. For example, suppose you want to indicate that a parameter must be a function that accepts two ints and returns a str. You can do so like this:
def handle(f: Callable[[int, int], str]) -> None: ...
There is no way to use FunctionType in a similar manner: it simply was not designed for this purpose.
This function signature is also more flexible: it can also accept things like objects with a __call__ since such objects are indeed callable.
The contents of the typing module can also sometimes be used for runtime checks in a manner similar to the contents of types as a convenience: for example, doing isinstance(f, Callable) works. However, this functionality is deliberately limited: doing isinstance(f, Callable[[int, int], str]) is intentionally disallowed. Attempting to perform that check will raise an exception at runtime.
That said, I don't think it's good style to perform runtime checks using anything from typing: the typing module is meant first and foremost for static analysis.
I would not use anything from the types module within type hints for similar reasons. The only exception is if your function is written in a way such that it's critical that the value you need to receive is specifically an instance of FunctionType, rather than being any arbitrary callable.
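The runtime behaviour described above can be sketched as follows. The helper names (plain, Adder, candidates, results) are hypothetical and exist only for this illustration; the point is to show which objects pass each isinstance check and that the subscripted form is rejected.

```python
import types
import typing

def plain(a, b):
    return str(a + b)

class Adder:
    """Hypothetical callable object, used only for this illustration."""
    def __call__(self, a, b):
        return str(a + b)

candidates = {
    "function": plain,
    "lambda": lambda a, b: str(a + b),
    "instance": Adder(),
    "class": Adder,
    "builtin": len,
}

# For each object: (is it a FunctionType?, is it a Callable?)
results = {
    name: (isinstance(obj, types.FunctionType), isinstance(obj, typing.Callable))
    for name, obj in candidates.items()
}
for name, flags in results.items():
    print(name, flags)

# A subscripted Callable cannot be used for runtime checks.
try:
    isinstance(plain, typing.Callable[[int, int], str])
    subscripted_check_allowed = True
except TypeError:
    subscripted_check_allowed = False
print(subscripted_check_allowed)
```

Only the def function and the lambda are FunctionType instances, while every candidate is callable, which is why Callable is the more flexible annotation.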
The new typing module in Python 3.5 provides a number of tools for use in type annotations. Does it provide an object or type that encapsulates the idea of class? How about the idea of function?
In the following code, which defines a decorator, what should stand in for class_? What should stand in for function? (typing.Callable is inadequate, because for example a class is callable, but the code is trying to identify methods.) (The no_type_check() decorator in the typing module itself might be a prototype for decorators that act like this. no_type_check() itself does not have any annotations, type-hint or otherwise.)
import typing

def is_double_underscore_name(name):
    return len(name) > 4 and name.startswith('__') and name.endswith('__')

# This code will not run, because 'class_' and 'function' are names that do not have any
# predefined meaning. See the text of the question.
# Note: This modifies classes in-place but (probably) does not modify functions in-place;
# this is not a considered design decision; it is just the easiest thing to do in a very
# basic example like this.
def do_something(class_or_function: typing.Union[class_, function]):
    if isinstance(class_or_function, class_):
        for name in class_or_function.__dict__:
            if not is_double_underscore_name(name):
                object = class_or_function.__dict__[name]
                if isinstance(object, function):
                    class_or_function.__dict__[name] = do_something(object)
        return class_or_function
    else:
        ... # return the function, modified in some way
Classes are instances of the type type.
Functions are of the types types.FunctionType or types.BuiltinFunctionType.
Methods are of the types types.MethodType or types.BuiltinMethodType.
types has been a part of Python for... a very long time.
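Plugging those types into the decorator from the question could look like the sketch below. It is one possible reading, not the asker's final design: the touched attribute is a hypothetical marker standing in for "modified in some way", and setattr is used because a class __dict__ is a read-only mapping proxy that does not support item assignment.

```python
import types
import typing

def is_double_underscore_name(name: str) -> bool:
    return len(name) > 4 and name.startswith('__') and name.endswith('__')

def do_something(class_or_function: typing.Union[type, types.FunctionType]):
    if isinstance(class_or_function, type):
        # Copy the items first, since setattr mutates the class during the loop.
        for name, value in list(vars(class_or_function).items()):
            if is_double_underscore_name(name):
                continue
            if isinstance(value, types.FunctionType):
                setattr(class_or_function, name, do_something(value))
        return class_or_function
    else:
        # Hypothetical modification: tag the function so the effect is observable.
        class_or_function.touched = True
        return class_or_function

@do_something
class Demo:
    def __init__(self):
        self.value = 1

    def greet(self):
        return "hello"

print(Demo.greet.touched)                 # True
print(hasattr(Demo.__init__, "touched"))  # False
```

Functions stored in a class body are plain FunctionType objects in the class dictionary, so the isinstance check picks up ordinary methods while skipping staticmethod and classmethod wrappers.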
Here is some sample Python code:
import re
some_regex = re.compile(r"\s+1\s+")
result = some_regex.search(" 1 ")
dir(result)
I get back the following using Python 2.6.1:
['__copy__', '__deepcopy__', 'end', 'expand', 'group', 'groupdict', 'groups', 'span', 'start']
Yet result.re exists (from the interpreter):
>>> result.re
<_sre.SRE_Pattern object at 0x10041bc90>
How can an attribute not be listed when using the dir() function?
This page confirms the existence of the re attribute:
http://docs.python.org/library/re.html#re.MatchObject.re
Now I understand that if one tries to access an attribute which is not listed via dir(), then __getattr__ is called, but I don't see __getattr__ listed as one of the object's attributes either, so I'm left scratching my head.
Update
And here is proof of the existence of matchobject.re in the Python 2.6.1 documentation:
http://docs.python.org/release/2.6.1/library/re.html#re.MatchObject.re
You see this behavior because the class is implemented in C, and in the same way that dir() is unreliable with a custom __getattr__(), it is also unreliable when the C code defines a getattr function.
Here is a link to the Python 2.6 C code for the SRE_Match getattr function:
http://hg.python.org/cpython/file/f130ce67387d/Modules/_sre.c#l3565
Note that the methods defined in the match_methods array are visible in the dir() output, but any attribute that is handled by an if statement in the match_getattr() function is not.
In Python 2.6, it looks like this includes the following attributes: lastindex, lastgroup, string, regs, re, pos, and endpos.
Here is a link to some of the Python 2.7 code which is slightly different. Here there is not a getattr function implemented for SRE_Match, and all methods and attributes can be found in the match_methods, match_members, and match_getset arrays, and everything is visible in dir().
http://hg.python.org/cpython/file/60a7b704de5c/Modules/_sre.c#l3612
The built-in function dir() is a convenience function and results in an approximate list of attributes. From the documentation:
Because dir() is supplied primarily as a convenience for use at an interactive prompt, it tries to supply an interesting set of names more than it tries to supply a rigorously or consistently defined set of names, and its detailed behavior may change across releases. For example, metaclass attributes are not in the result list when the argument is a class.
Note that it is impossible to always give a complete list of attributes, since classes can do in their __getattr__() and __getattribute__() methods whatever they want.
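The same effect is easy to reproduce in pure Python. In the sketch below, Match is a hypothetical stand-in for the C-level match object (not the real re implementation): one attribute is served by __getattr__ and therefore hidden from dir(), and a subclass shows how a custom __dir__ can advertise it.

```python
class Match:
    """Hypothetical stand-in for a type whose attributes come from a getattr hook."""

    def group(self):
        return "1"

    def __getattr__(self, name):
        # Only called when normal lookup fails; 're' exists nowhere else.
        if name == "re":
            return "<pattern>"
        raise AttributeError(name)

class MatchWithDir(Match):
    def __dir__(self):
        # Advertise the dynamically served attribute alongside the normal ones.
        return sorted(set(super().__dir__()) | {"re"})

m = Match()
print(m.re)                        # the attribute is accessible
print("re" in dir(m))              # but dir() does not know about it
print("group" in dir(m))           # regular methods are listed
print("re" in dir(MatchWithDir())) # a custom __dir__ can add it
```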
I am writing a binding system that exposes classes and functions to python in a slightly unusual way.
Normally one would create a python type and provide a list of functions that represent the methods of that type, and then allow python to use its generic tp_getattro function to select the right one.
For reasons I won't go into here, I can't do it this way, and must provide my own tp_getattro function that selects methods from elsewhere and returns my own 'bound method' wrapper. This works fine, but means that a type's methods are not listed in its dictionary (so dir(MyType()) doesn't show anything interesting).
The problem is that I cannot seem to get __add__ methods working. See the following sample:
>>> from mymod import Vec3
>>> v=Vec3()
>>> v.__add__
<Bound Method of a mymod Class object at 0xb754e080>
>>> v.__add__(v)
<mymod.Vec3 object at 0xb751d710>
>>> v+v
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'mymod.Vec3' and 'mymod.Vec3'
As you can see, Vec3 has an __add__ method which can be called, but python's + refuses to use it.
How can I get python to use it? How does the + operator actually work in python, and what method does it use to see if you can add two arbitrary objects?
Thanks.
(P.S. I am aware of other systems such as Boost.Python and SWIG which do this automatically, and I have good reason for not using them, however wonderful they may be.)
Do you have an nb_add in your type's number methods structure (pointed to by the tp_as_number field of your type object)?
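The reason this matters is that the + operator consults the type's slots and does not go through attribute lookup on the instance. A pure-Python analogue of the situation in the question is sketched below; ViaGetattr and ViaType are hypothetical classes, and the C-level fix would be filling in nb_add rather than defining a Python method.

```python
class ViaGetattr:
    """Serves __add__ only through attribute lookup, like a custom tp_getattro."""

    def __getattr__(self, name):
        if name == "__add__":
            return lambda other: "added via __getattr__"
        raise AttributeError(name)

class ViaType:
    """Defines __add__ on the type, which fills the slot the operator uses."""

    def __add__(self, other):
        return "added via type"

v = ViaGetattr()
explicit = v.__add__(v)  # explicit attribute access works
try:
    v + v                # the operator ignores __getattr__
    operator_worked = True
except TypeError:
    operator_worked = False

w = ViaType()
print(explicit)
print(operator_worked)
print(w + w)
```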