PyInstance_NewRaw() with old and new style classes - python

Recently I faced a problem in a C-based Python extension while trying to instantiate objects without calling their constructor -- which is a requirement of the extension.
The class to be used to create instances is obtained dynamically: at some point, I have an instance x whose class I wish to use to create other instances, so I store x.__class__ for later use -- let this value be klass.
At a later point, I invoke PyInstance_NewRaw(klass, PyDict_New()) and then, the problem arises. It seems that if klass is an old-style class, the result of that call is the desired new instance. However, if it is a new-style class, the result is NULL and the exception raised is:
SystemError: ../Objects/classobject.c:521: bad argument to internal function
For the record, I'm using Python version 2.7.5. Googling around, I found no more than one other person looking for a solution (and it seemed to me he was using a workaround, but he didn't detail it).
For the record #2: the instances the extension is creating are proxies for these same x instances -- the x.__class__ and x.__dict__'s are known, so the extension is spawning new instances based on __class__ (using the aforementioned C function) and setting the respective __dict__ on the new instance (those __dict__'s have inter-process shared-memory data). Not only is it conceptually problematic to call an instance's __init__ a second time (first: its state is already known; second: the expected behavior for ctors is that they should be called exactly once for each instance), it is also impractical, since the extension cannot figure out the arguments and their order to call the __init__() for each instance in the system. Also, changing the __init__ of each class in the system whose instances may be proxies and making them aware there is a proxy mechanism they will be subjected to is conceptually problematic (they shouldn't know about it) and impractical.
So, my question is: how to perform the same behavior of PyInstance_NewRaw regardless of the instance's class style?

The type of new-style classes isn't instance, it's the class itself. So, the PyInstance_* methods aren't even meaningful for new-style classes.
In fact, the documentation explicitly explains this:
Note that the class objects described here represent old-style classes, which will go away in Python 3. When creating new types for extension modules, you will want to work with type objects (section Type Objects).
So, you will have to write code that checks whether klass is an old-style or new-style class and does the appropriate thing for each case. An old-style class's type is PyClass_Type, while a new-style class's type is either PyType_Type, or a custom metaclass.
Meanwhile, there is no direct equivalent of PyInstance_NewRaw for new-style classes. Or, rather, the direct equivalent—calling its tp_alloc slot and then adding a dict—will give you a non-functional class. You could try to duplicate all the other appropriate work, but that's going to be tricky. Alternatively, you could use tp_new, but that will do the wrong thing if there's a custom __new__ function in the class (or any of its bases). See the rejected patches from #5180 for some ideas.
But really, what you're trying to do is probably not a good idea in the first place. Maybe if you explained why this is a requirement, and what you're trying to do, there would be a better way to do it.
If the goal is to build objects by creating a new uninitialized instance of the class, then copying over its __dict__ from an initialized prototype, there's a much easier solution that I think will work for you:
__class__ is a writeable attribute. So (showing it in Python; the C API is basically the same, just a lot more verbose, and I'd probably screw up the refcounting somewhere):
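import types

class NewStyleDummy(object):
    pass

def make_instance(cls, instance_dict):
    if isinstance(cls, types.ClassType):
        # Old-style class (Python 2 only); do_old_style_thing is a placeholder
        obj = do_old_style_thing(cls)
    else:
        # New-style class: start from a blank object and swap its class
        obj = NewStyleDummy()
        obj.__class__ = cls
    obj.__dict__ = instance_dict
    return obj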
The new object will be an instance of cls—in particular, it will have the same class dictionary, including the MRO, metaclass, etc.
This won't work if cls has a metaclass that's required for its construction, or a custom __new__ method, or __slots__… but then your design of copying over the __dict__ doesn't make any sense in those cases anyway. I believe that in any case where anything could possibly work, this simple solution will work.
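As a rough illustration of the __class__-swapping idea for the new-style case only, the sketch below builds an instance without ever running __init__. The names _Blank and Point are made up for the example, and it is written so that it also runs on Python 3, where every class is new-style.

```python
class _Blank(object):
    """Empty new-style class used only as a starting point."""


def make_instance(cls, instance_dict):
    obj = _Blank()                # allocated without touching cls at all
    obj.__class__ = cls           # allowed when both classes share the plain object layout
    obj.__dict__ = instance_dict  # adopt the already-known state as-is
    return obj


class Point(object):
    def __init__(self, x, y):
        # Hypothetical guard: proves the constructor is never invoked
        raise RuntimeError("__init__ must not run for proxies")

    def norm1(self):
        return abs(self.x) + abs(self.y)


p = make_instance(Point, {"x": 3, "y": -4})
print(isinstance(p, Point), p.norm1())
```

The guard in Point.__init__ is only there to show that the constructor is bypassed; real classes would keep their normal __init__.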
Calling cls.__new__ seems like a good solution at first, but it actually isn't. Let me explain the background.
When you do this:
foo = Foo(1, 2)
(where Foo is a new-style class), it gets converted into something like this pseudocode:
foo = Foo.__new__(Foo, 1, 2)
if isinstance(foo, Foo):
    foo.__init__(1, 2)
The problem is that, if Foo or one of its bases has defined a __new__ method, it will expect to get the arguments from the constructor call, just like an __init__ method will.
As you explained in your question, you don't know the constructor call arguments—in fact, that's the main reason you can't call the normal __init__ method in the first place. So, you can't call __new__ either.
The base implementation of __new__ accepts and ignores any arguments it's given. So, if none of your classes has a __new__ override or a __metaclass__, you will happen to get away with this, because of a quirk in object.__new__ (a quirk which works differently in Python 3.x, by the way). But those are the exact same cases the previous solution can handle, and that solution works for a much more obvious reason.
Put another way: The previous solution depends on nobody defining __new__ because it never calls __new__. This solution depends on nobody defining __new__ because it calls __new__ with the wrong arguments.
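To make the failure mode concrete, here is a small sketch (the Pair class and the try_raw_new helper are hypothetical) of a class whose __new__ expects the constructor arguments. Calling __new__ without them fails in the same way a bare __init__ call would.

```python
class Pair(object):
    def __new__(cls, a, b):
        # A custom __new__ that needs the constructor arguments
        self = super(Pair, cls).__new__(cls)
        self.total = a + b
        return self


def try_raw_new(cls):
    """Call cls.__new__ with no constructor arguments.

    Returns the TypeError message on failure, or None on success.
    """
    try:
        cls.__new__(cls)
    except TypeError as exc:
        return str(exc)
    return None


print(try_raw_new(Pair))    # a TypeError message about the missing arguments
print(try_raw_new(object))  # None: the base __new__ needs no arguments
```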

Related

Are there any unique features provided only by metaclasses in Python?

I have read answers for this question: What are metaclasses in Python? and this question: In Python, when should I use a meta class? and skimmed through documentation: Data model.
It is very possible I missed something, and I would like to clarify: is there anything that metaclasses can do that cannot be properly or improperly (unpythonic, etc) done with the help of other tools (decorators, inheritance, etc)?
That is a bit tricky to answer -
However, it is a very nice question to ask at this point, and there are certainly a few things that are easier to do with metaclasses.
So, first, I think it is important to note the things for which one used to need a metaclass in the past, and no longer needs to: I'd say that with the release of Python 3.6 and the inclusion of __init_subclass__ and __set_name__ dunder methods, a lot, maybe the majority of the cases I had always written a metaclass for (most of them for answering questions or in toy code - no one creates that many production-code metaclasses even in a lifetime as a programmer) became outdated.
In particular, __init_subclass__ adds the convenience of being able to transform any attribute or method, as class decorators can, but it is applied automatically on inheritance, which does not happen with decorators.
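A minimal sketch of that difference, assuming a hypothetical Plugin base class that keeps a registry: the hook runs for every subclass, including indirect ones, with no decorator repeated on each class.

```python
class Plugin:
    registry = {}

    def __init_subclass__(cls, **kwargs):
        # Runs once for every new subclass, direct or indirect
        super().__init_subclass__(**kwargs)
        Plugin.registry[cls.__name__] = cls


class CsvReader(Plugin):
    pass


class FastCsvReader(CsvReader):  # inherits the hook automatically
    pass


print(sorted(Plugin.registry))
```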
I guess reading about it was a factor motivating your question - since most metaclasses found in the wild deal with transforming these attributes in the __new__ and __init__ metaclass methods.
However, note that if one needs to transform any attribute prior to having it included in the class, the metaclass __new__ method is the only place it can be done. In most cases, however, one can simply transform it in the final new class namespace.
Then, one version forward, in 3.7, we had __class_getitem__ implemented - since using the [ ] (__getitem__) operator directly on classes became popular due to typing annotations. Before that, one would have to create a metaclass with a __getitem__ method for the sole purpose of being able to indicate to the type-checker toolchain some extra information like generic variables.
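For illustration, here is a small sketch of __class_getitem__ on an ordinary class with no metaclass involved. The Box class and the string it returns are invented for the example; real generic types return richer alias objects.

```python
class Box:
    def __class_getitem__(cls, item):
        # Called for Box[item]; implicitly a classmethod
        return "%s[%s]" % (cls.__name__, item.__name__)


print(Box[int])
```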
One interesting possibility that did not exist in Python 2, was introduced in Python 3, then outdated, and now can only serve very specific cases is the use of the __prepare__ method on the metaclass:
I don't know if this is written in any official docs, but the obvious primary motivation for the metaclass __prepare__, which allows a custom namespace for the class body, was to return an ordered dict, so that one could have ordered attributes in classes that would work as data entities. It turns out that, from Python 3.6 on, class body namespaces were always ordered (which was later formalized for all Python dictionaries in Python 3.7). However, although it is no longer needed for returning an OrderedDict, __prepare__ is still a unique thing in the language, in that it allows a custom mapping class to be used as the namespace in a piece of Python code (even if that is limited to class bodies). For example, one can trivially create an "auto-enumeration" metaclass by returning a dict subclass with a __missing__ method:
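class MD(dict):
    def __init__(self, *args, **kw):
        super().__init__(*args, **kw)
        self.counter = 0

    def __missing__(self, key):
        # Dunder names such as __name__ are looked up by the class body
        # itself; let them fall through instead of numbering them.
        if key.startswith('__') and key.endswith('__'):
            raise KeyError(key)
        counter = self[key] = self.counter
        self.counter += 1
        return counter

class MC(type):
    @classmethod
    def __prepare__(mcls, name, bases, **kwd):
        return MD()

class Colors(metaclass=MC):
    RED
    GREEN
    BLUE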
(an example similar to this is included in Luciano Ramalho's 'Fluent Python' 2nd edition)
The __call__ method on the metaclass is also peculiar: it controls the calls to __new__ and __init__ whenever an instance of the class is created. There are recipes around that use this to create a "singleton" - I find those terrible and overkill: if I need a singleton, I just create an instance of the singleton class at module level. However, overriding type.__call__ offers a level of control over class instantiation that may be hard to achieve in the class's __new__ and __init__ themselves. But this definitely can be done by correctly keeping the desired states in the class object itself.
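As a sketch of what overriding __call__ on a metaclass looks like, the hypothetical CountingMeta below wraps instantiation to count the instances each class produces. It is only one possible use, chosen because it is short.

```python
class CountingMeta(type):
    """Metaclass that counts how many instances each class has produced."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        cls.created = 0

    def __call__(cls, *args, **kwargs):
        # type.__call__ is what runs cls.__new__ and then cls.__init__
        instance = super().__call__(*args, **kwargs)
        cls.created += 1
        return instance


class Widget(metaclass=CountingMeta):
    def __init__(self, label):
        self.label = label


first = Widget("a")
second = Widget("b")
print(Widget.created)
```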
__subclasscheck__ and __instancecheck__: these are metaclass-only methods, and the only workaround would be to make a class decorator that would re-create a class object so that it would be a "real" subclass of the intended base class (and that is not always possible).
"hidden" class attributes: now, this can be useful, and is less known, as it derives from the language behavior itself: any attribute or method besides the dunder methods included in a metaclass can be used from a class, but not from instances of that class. An example of this is the .register method in classes using abc.ABCMeta. This contrasts with ordinary classmethods, which can be used normally from an instance.
And finally, any behavior defined with the dunder methods for a Python object can be implemented to work on classes if they are defined in the metaclass. So if you have any use case for "add-able" classes, or want a special repr for your classes, just implement __add__ or __repr__ on the metaclass: this behavior obviously can't be obtained by other means.
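The last three points can be sketched in one small metaclass. All names here (Meta, Duck, Swimmer, Robot, describe) are invented for the illustration, and the duck-typing rule in __instancecheck__ is just an arbitrary choice.

```python
class Meta(type):
    def describe(cls):
        # Reachable as Duck.describe(), but hidden from Duck() instances
        return "class " + cls.__name__

    def __instancecheck__(cls, obj):
        # isinstance(obj, Duck) accepts anything with a callable quack
        return callable(getattr(obj, "quack", None))

    def __repr__(cls):
        return "<Meta class %s>" % cls.__name__

    def __add__(cls, other):
        # "Adding" two classes builds a new class inheriting from both
        return Meta(cls.__name__ + other.__name__, (cls, other), {})


class Duck(metaclass=Meta):
    pass


class Swimmer(metaclass=Meta):
    pass


class Robot:
    def quack(self):
        return "beep"


print(Duck.describe(), hasattr(Duck(), "describe"))
print(isinstance(Robot(), Duck), isinstance(3, Duck))
print(repr(Duck + Swimmer))
```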
I think I got all covered there.

Equivalent to im_func for __new__?

I've got a chunk of code to automate monkey patching that caches away a function's im_func reference, and then replaces the function while attaching the im_func of the original as a ._unmonkeyed attribute, like so:
class MonkeyPatch(object):
    '''A callable object to implement the monkey patch. Stores the previous version in
    attribute _unmonkeyed and new version in _monkeyed.'''
    def __init__(self, source, target, attr):
        self._monkeyPatchSource = source
        self._monkeyPatchTarget = target
        self._monkeyPatchAttr = attr
        self._monkeyed = getattr(source, attr).im_func
        self._unmonkeyed = getattr(target, attr, None)
        setattr(target, attr, self)

    ### a few more methods here

    def __get__(self, inst, cls=None):  # (self, *args, **kwds):
        tmp = lambda *args, **kwds: self._monkeyed(inst, *args, **kwds)
        tmp._unmonkeyed = lambda *args, **kwds: self._unmonkeyed(inst, *args, **kwds)
        return tmp
I'm not much of a Pythonista, so I'm sure there's a thousand reasons this is a dumb way to do things, but it's worked for me. Now I find myself in a place where I'd like to patch a class' __new__ method to add some logic before calling the existing __new__. __new__ doesn't have an im_func attribute, and that probably indicates that there are other methods that don't.
Is there a way to accomplish the same job in a general way (preferably without having to keep a list of special cases) for methods without im_func?
Subclassing isn't the behavior I want here because I want to inject the new code into an existing class hierarchy. This isn't production code, so I'm not too worried about the consequences of adding a few blue wires.

How does extending classes (Monkey Patching) work in Python?

class Foo(object):
    pass

foo = Foo()

def bar(self):
    print 'bar'

Foo.bar = bar
foo.bar() #bar
Coming from JavaScript, if a "class" prototype is augmented with a certain attribute, all instances of that "class" have that attribute in their prototype chain, hence no modification has to be made to any of its instances or "sub-classes".
In that sense, how can a Class-based language like Python achieve Monkey patching?
The real question is, how can it not? In Python, classes are first-class objects in their own right. Attribute access on instances of a class is resolved by looking up attributes on the instance, and then the class, and then the parent classes (in the method resolution order.) These lookups are all done at runtime (as is everything in Python.) If you add an attribute to a class after you create an instance, the instance will still "see" the new attribute, simply because nothing prevents it.
In other words, it works because Python doesn't cache attributes (unless your code does), because it doesn't use negative caching or shadowclasses or any of the optimization techniques that would inhibit it (or, when Python implementations do, they take into account the class might change) and because everything is runtime.
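A short sketch of that runtime lookup, written for Python 3 with made-up class names: the instances are created first, the class is patched afterwards, and both the existing instance and an instance of a subclass see the new method.

```python
class Foo(object):
    pass


class SubFoo(Foo):
    pass


foo, sub = Foo(), SubFoo()   # both created before the patch


def bar(self):
    return "bar from " + type(self).__name__


Foo.bar = bar                # patch the class afterwards

results = [foo.bar(), sub.bar()]
print(results)
print("bar" in foo.__dict__)  # the attribute lives on the class, not the instance
```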
I just read through a bunch of documentation, and as far as I can tell, the whole story of how foo.bar is resolved, is as follows:
1. Can we find foo.__getattribute__ by the following process? If so, use the result of foo.__getattribute__('bar').
   (Looking up __getattribute__ will not cause infinite recursion, but the implementation of it might.)
   (In reality, we will always find __getattribute__ in new-style objects, as a default implementation is provided in object - but that implementation is of the following process. ;) )
   (If we define a __getattribute__ method in Foo, and access foo.__getattribute__, foo.__getattribute__('__getattribute__') will be called! But this does not imply infinite recursion - if you are careful ;) )
2. Is bar a "special" name for an attribute provided by the Python runtime (e.g. __dict__, __class__, __bases__, __mro__)? If so, use that. (As far as I can tell, __getattribute__ falls into this category, which avoids infinite recursion.)
3. Is bar in the foo.__dict__ dict? If so, use foo.__dict__['bar'].
4. Does foo.__mro__ exist (i.e., is foo actually a class)? If so,
   For each base-class base in foo.__mro__[1:]:
   (Note that the first one will be foo itself, which we already searched.)
      Is bar in base.__dict__? If so:
         Let x be base.__dict__['bar'].
         Can we find (again, recursively, but it won't cause a problem) x.__get__?
            If so, use x.__get__(foo, foo.__class__).
            (Note that the function bar is, itself, an object, and the Python compiler automatically gives functions a __get__ attribute which is designed to be used this way.)
            Otherwise, use x.
5. For each base-class base of foo.__class__.__mro__:
   (Note that this recursion is not a problem: those attributes should always exist, and fall into the "provided by the Python runtime" case. foo.__class__.__mro__[0] will always be foo.__class__, i.e. Foo in our example.)
   (Note that we do this even if foo.__mro__ exists. This is because classes have a class, too: its name is type, and it provides, among other things, the method used to calculate __mro__ attributes in the first place.)
      Is bar in base.__dict__? If so:
         Let x be base.__dict__['bar'].
         Can we find (again, recursively, but it won't cause a problem) x.__get__?
            If so, use x.__get__(foo, foo.__class__).
            (Note that the function bar is, itself, an object, and the Python compiler automatically gives functions a __get__ attribute which is designed to be used this way.)
            Otherwise, use x.
6. If we still haven't found something to use: can we find foo.__getattr__ by the preceding process? If so, use the result of foo.__getattr__('bar').
7. If everything failed, raise AttributeError.
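The core of the steps above, for an ordinary instance, can be sketched in a few lines of Python. This is a deliberate simplification, not the interpreter's real algorithm: the helper name simple_getattr is made up, it skips __getattribute__ and __getattr__ overrides and the class-object case, and it ignores the priority that data descriptors (such as properties) get over the instance dict.

```python
def simple_getattr(obj, name):
    """Simplified attribute lookup for ordinary instances (see caveats above)."""
    inst_dict = getattr(obj, "__dict__", {})
    if name in inst_dict:                  # instance dict first
        return inst_dict[name]
    for base in type(obj).__mro__:         # then the class and its bases
        if name in base.__dict__:
            x = base.__dict__[name]
            getter = getattr(type(x), "__get__", None)
            if getter is not None:         # descriptor: bind it to obj
                return getter(x, obj, type(obj))
            return x
    raise AttributeError(name)


class Foo(object):
    greeting = "hi"

    def bar(self):
        return "bar"


foo = Foo()
print(simple_getattr(foo, "bar")(), simple_getattr(foo, "greeting"))
```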
bar.__get__ is not really a function - it's a "method-wrapper" - but you can imagine it being implemented vaguely like this:
# Somewhere in the Python internals
class __method_wrapper(object):
    def __init__(self, func):
        self.func = func

    def __call__(self, obj, cls):
        # Except it actually returns a "bound method" object
        # that uses cls for its __repr__
        # and there is a __repr__ for the method_wrapper that I *think*
        # uses the hashcode of the underlying function, rather than of itself,
        # but I'm not sure.
        return lambda *args, **kwargs: self.func(obj, *args, **kwargs)

# Automatically done after compiling bar
bar.__get__ = __method_wrapper(bar)
The "binding" that happens within the __get__ automatically attached to bar (called a descriptor), by the way, is more or less the reason why you have to specify self parameters explicitly for Python methods. In Javascript, this itself is magical; in Python, it is merely the process of binding things to self that is magical. ;)
And yes, you can explicitly set a __get__ method on your own objects and have it do special things when you set a class attribute to an instance of the object and then access it from an instance of that other class. Python is extremely reflective. :) But if you want to learn how to do that, and get a really full understanding of the situation, you have a lot of reading to do. ;)
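As a small taste of that, here is a sketch of a hand-written descriptor. The Celsius and Reading classes are invented for the example; the only real protocol piece is the __get__(self, obj, objtype) signature.

```python
class Celsius(object):
    """Descriptor that exposes a Fahrenheit reading in Celsius."""

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self                      # accessed on the class itself
        return (obj.fahrenheit - 32) * 5.0 / 9.0


class Reading(object):
    celsius = Celsius()                      # class attribute holding the descriptor

    def __init__(self, fahrenheit):
        self.fahrenheit = fahrenheit


boiling = Reading(212)
print(boiling.celsius)
```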

How to tell python non-class objects from class objects

I am new to Python. I think non-class objects do not have a __bases__ attribute whereas class objects do have it, but I am not sure. How does Python/CPython check whether an object is a non-class or a class, and pass the correct arguments to the object's descriptor attribute accordingly during attribute access?
============================================
updated:
I was learning how __getattribute__ and descriptors cooperate to make bound methods. I was wondering how class objects and non-class objects invoke the descriptor's __get__ differently. I thought those two kinds of objects shared the same __getattribute__ CPython function, and that this function would have to know whether the invoking object was a class or a non-class. But I was wrong. This article explains it well:
http://docs.python.org/dev/howto/descriptor.html#functions-and-methods
So class objects use type.__getattribute__ whereas non-class objects use object.__getattribute__. They are different CPython functions. And super has a third __getattribute__ CPython implementation as well.
However, about the super one, the above article states:
"The object returned by super() also has a custom __getattribute__() method for invoking descriptors. The call super(B, obj).m() searches obj.__class__.__mro__ for the base class A immediately following B and then returns A.__dict__['m'].__get__(obj, A). If not a descriptor, m is returned unchanged. If not in the dictionary, m reverts to a search using object.__getattribute__()."
The statement above didn't seem to match my experiment with Python 3.1. What I saw, which seems reasonable to me, is:
super(B, obj).m ---> A.__dict__['m'].__get__(obj, type(obj))
objclass = type(obj)
super(B, objclass).m ---> A.__dict__['m'].__get__(None, objclass)
A was never passed to __get__
It is reasonable to me because I believe it is objclass's MRO chain (rather than A's) that is needed within m, especially in the second case.
Was I doing something wrong? Or did I not understand it correctly?
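The experiment described above can be reproduced with a descriptor that simply reports the arguments it receives. This is a minimal sketch with hypothetical names (Recorder, A, B, C), meant for CPython 3; it only shows what the interpreter passes to __get__ and is not taken from the article.

```python
class Recorder(object):
    """Descriptor that returns the exact arguments __get__ was called with."""

    def __get__(self, obj, objtype=None):
        return (obj, objtype)


class A(object):
    m = Recorder()


class B(A):
    pass


class C(B):
    pass


obj = C()
via_instance = super(B, obj).m   # lookup starts after B in type(obj).__mro__
via_class = super(B, C).m        # same search, but bound to the class C

print(via_instance == (obj, C), via_class == (None, C))
```

In both cases the second argument is the type the search started from (C here), not the class A where m was found.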
As the commenters asked: Why do you care? Usually that's a sign of not using Python the way it was meant to be used.
A very powerful concept of Python is duck typing. You don't care about the type or class of an object as long as it exposes the attributes you need.
How about inspect.isclass(objectname)?
More info here: http://docs.python.org/library/inspect.html
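A quick sketch of that check (the Sample class is made up for the example):

```python
import inspect


class Sample(object):
    pass


checks = [
    inspect.isclass(Sample),    # a class object
    inspect.isclass(Sample()),  # an instance of it
    inspect.isclass(int),       # built-in types are classes too
    inspect.isclass(3),
]
print(checks)
```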

Python metaclasses

I've been hacking classes in Python like this:
def hack(f, aClass):
    class MyClass(aClass):
        def f(self):
            f()
    return MyClass

A = hack(afunc, A)
Which looks pretty clean to me. It takes a class, A, creates a new class derived from it that has an extra method, calling f, and then reassigns the new class to A.
How does this differ from metaclass hacking in Python? What are the advantages of using a metaclass over this?
The definition of a class in Python is an instance of type (or an instance of a subclass of type). In other words, the class definition itself is an object. With metaclasses, you have the ability to control the type instance that becomes the class definition.
When a metaclass is invoked, you have the ability to completely re-write the class definition. You have access to all the proposed attributes of the class, its ancestors, etc. More than just injecting a method or removing a method, you can radically alter the inheritance tree, the type, and pretty much any other aspect. You can also chain metaclasses together for a very dynamic and totally convoluted experience.
I suppose the real benefit, though is that the class's type remains the class's type. In your example, typing:
a_inst = A()
type(a_inst)
will show that it is an instance of MyClass. Yes, isinstance(a_inst, aClass) would return True, but you've introduced a subclass, rather than a dynamically re-defined class. The distinction there is probably the key.
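The point about the class's type can be checked directly. The sketch below is a Python 3 rendering of the question's hack function (with the method returning the result of f so there is something to inspect); Original and greet are names introduced only for the demonstration.

```python
def hack(f, a_class):
    class MyClass(a_class):
        def f(self):
            return f()          # refers to the outer function argument f
    return MyClass


def greet():
    return "hello"


class A(object):
    pass


Original = A                    # keep a handle on the untouched class
A = hack(greet, A)
a_inst = A()

print(type(a_inst).__name__)
print(isinstance(a_inst, Original), type(a_inst) is Original)
```

Instances created before the reassignment stay instances of the original class and do not gain the method, which is another way the subclass approach differs from redefining the class itself.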
As rjh points out, the anonymous inner class also has performance and extensibility implications. A metaclass is processed only once, at the moment that the class is defined, and never again. Users of your API can also extend your metaclass because it is not enclosed within a function, so you gain a certain degree of extensibility.
This slightly old article actually has a good explanation that compares exactly the "function decoration" approach you used in the example with metaclasses, and shows the history of the Python metaclass evolution in that context: http://www.ibm.com/developerworks/linux/library/l-pymeta.html
You can use the type callable as well.
def hack(f, aClass):
    newfunc = lambda self: f()
    return type('MyClass', (aClass,), {'f': newfunc})
I find using type the easiest way to get into the metaclass world.
A metaclass is the class of a class. IMO, the bloke here covered it quite serviceably, including some use-cases. See Stack Overflow question "MetaClass", "new", "cls" and "super" - what is the mechanism exactly?.
