What is the purpose of checking self.__class__? I've found some code that creates an abstract interface class and then checks whether its self.__class__ is the class itself, e.g.:
class abstract1(object):
    def __init__(self):
        if self.__class__ == abstract1:
            raise NotImplementedError("Interfaces can't be instantiated")
What is the purpose of that?
Is it to check whether the class is a type of itself?
The code is from NLTK's http://nltk.googlecode.com/svn/trunk/doc/api/nltk.probability-pysrc.html#ProbDistI
self.__class__ is a reference to the type of the current instance.
For instances of abstract1, that'd be the abstract1 class itself, which is what you don't want with an abstract class. Abstract classes are only meant to be subclassed, not to create instances directly:
>>> abstract1()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 4, in __init__
NotImplementedError: Interfaces can't be instantiated
For an instance of a subclass of abstract1, self.__class__ would be a reference to the specific subclass:
>>> class Foo(abstract1): pass
...
>>> f = Foo()
>>> f.__class__
<class '__main__.Foo'>
>>> f.__class__ is Foo
True
Throwing an exception here is like using an assert statement elsewhere in your code; it protects you from making silly mistakes.
Note that the Pythonic way to test for the type of an instance is to use the type() function instead, together with an identity test using the is operator:
class abstract1(object):
    def __init__(self):
        if type(self) is abstract1:
            raise NotImplementedError("Interfaces can't be instantiated")
type() should be preferred over self.__class__ because the latter can be shadowed by a class attribute.
There is little point in using an equality test here, since for custom classes __eq__ falls back to an identity test anyway.
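To see the shadowing pitfall mentioned above in action, here is a minimal sketch (the Sneaky class is made up for illustration): a class attribute named __class__ hides the real type from attribute access, while type() is not fooled:

class Sneaky:
    # deliberately shadow __class__ with a class attribute (contrived)
    __class__ = int

s = Sneaky()
print(s.__class__)  # <class 'int'> -- the shadowing attribute
print(type(s))      # <class '__main__.Sneaky'> -- the real type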
Python also includes a standard-library module for defining abstract base classes, called abc. It lets you mark methods and properties as abstract, and will refuse to create instances of any subclass that has not re-defined those names.
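A minimal sketch of the abc approach (the class names here are illustrative):

import abc

class Interface(abc.ABC):
    @abc.abstractmethod
    def run(self):
        """Subclasses must provide run()."""

class Impl(Interface):
    def run(self):
        return "running"

print(Impl().run())  # running
Interface()          # TypeError: Can't instantiate abstract class Interface ...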
The code as posted in an earlier revision of the question was a no-op: self.__class__ == abstract1 was not part of a conditional, so the boolean was evaluated but nothing was done with the result.
You could instead make an abstract base class that checks, via an if statement, whether self.__class__ is the abstract class itself rather than a subclass, in order to prevent the abstract base class from being instantiated by developer mistake.
What is the purpose of that? Is it to check whether the class is a type of itself?
Yes, if you try to construct an object of type abstract1, it'll throw that exception telling you that you're not allowed to do so.
In Python 3, checking the type via type() or via __class__ returns the same result.
class C: pass

ci = C()
print(type(ci))      # <class '__main__.C'>
print(ci.__class__)  # <class '__main__.C'>
I recently checked the implementation of the @dataclass decorator (Raymond Hettinger is directly involved in that project), and they use __class__ to refer to the type.
So it is not wrong to use __class__ :)
The clues are in the name of the class, "abstract1", and in the error. This is intended to be an abstract class, meaning one that is intended to be subclassed. Each subclass will provide its own behaviour. The abstract class itself serves to document the interface, i.e. the methods and arguments that classes implementing the interface are expected to have. It is not meant to be instantiated itself, and the test is used to tell whether we are in the class itself or a subclass.
See the section on Abstract Classes in this article by Julien Danjou.
Related
I understand __class__ can be used to get the class of an object; it can also be used to get the current class inside a class definition. My question is: in a Python class definition, is it safe to just use __class__ rather than self.__class__?
#!/usr/bin/python3

class foo:
    def show_class():
        print(__class__)

    def show_class_self(self):
        print(self.__class__)

if __name__ == '__main__':
    x = foo()
    x.show_class_self()
    foo.show_class()
./foo.py
<class '__main__.foo'>
<class '__main__.foo'>
As the code above demonstrates, at least in Python 3, __class__ can be used to get the current class in the method show_class, without the presence of self. Is it safe? Will it cause problems in some special situations? (I can't think of any right now.)
__class__ is lexically scoped, whereas some_object.__class__ is dynamically dispatched. So the two can differ when the lexical scope is different from that of the receiver, for example when lambdas are involved:
#!/usr/bin/env python3

class ClassA:
    def print_callback(self, callback):
        print(callback(self))

class ClassB:
    def test(self):
        ClassA().print_callback(lambda o: o.__class__)  # <class '__main__.ClassA'>
        ClassA().print_callback(lambda _: __class__)    # <class '__main__.ClassB'>

ClassB().test()
It depends on what you're trying to achieve. Do you want to know which class's source code region you find yourself in, or the class of a particular object?
And I think it goes without saying, but I'll mention it explicitly: don't rely on the attribute directly; use the type() function, i.e. prefer type(o) over o.__class__.
That is documented in the datamodel, so I believe it is safe/reliable.
From 3.3.3.5. Executing the class body:
Class variables must be accessed through the first parameter of instance or class methods, or through the implicit lexically scoped __class__ reference described in the next section.
From 3.3.3.6. Creating the class object:
__class__ is an implicit closure reference created by the compiler if any methods in a class body refer to either __class__ or super
It is true that the docs say any methods; your foo.show_class is a function, but perhaps not convincingly a method. However, PEP 3135, which added this reference, is worded differently:
Every function will have a cell named __class__ that contains the class object that the function is defined in.
...
For functions defined outside a class body, __class__ is not defined, and will result in runtime SystemError.
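A small sketch of what that cell enables (the class names are made up): zero-argument super() works precisely because the compiler creates the implicit __class__ cell for any function in a class body that references super or __class__:

class Base:
    def greet(self):
        return "base"

class Child(Base):
    def greet(self):
        # zero-argument super() relies on the implicit __class__ cell,
        # created because this method references super
        return "child of " + super().greet()

print(Child().greet())  # child of base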
Class objects have a __bases__ (and a __base__) attribute:
>>> class Foo(object):
... pass
...
>>> Foo.__bases__
(<class 'object'>,)
Sadly, these attributes aren't accessible in the class body, which would be very convenient for accessing parent class attributes without having to hard-code the name:
class Foo:
    cls_attr = 3

class Bar(Foo):
    cls_attr = __base__.cls_attr + 2
    # raises NameError: name '__base__' is not defined
Is there a reason why __bases__ and __base__ can't be accessed in the class body?
(To be clear, I'm asking if this is a conscious design decision. I'm not asking about the implementation; I know that __bases__ is a descriptor in type and that this descriptor can't be accessed until a class object has been created. I want to know why python doesn't create __bases__ as a local variable in the class body.)
I want to know why python doesn't create __bases__ as a local variable in the class body
As you know, the class statement is mostly a shortcut for a metaclass call: when the runtime hits a class statement, it executes all statements at the top level of the class body, collects all resulting bindings in a dedicated namespace dict, calls the metaclass (type by default) with the class name, the base classes and the namespace dict, and binds the resulting class object to the class name in the enclosing scope (usually, but not necessarily, the module's top-level namespace).
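Roughly, and glossing over metaclass resolution, the two forms below are equivalent (Spam and Base are made-up names):

class Base:
    pass

# the class statement...
class Spam(Base):
    attr = 1

# ...is (roughly) what this explicit call does:
SpamToo = type('Spam', (Base,), {'attr': 1})
print(Spam.attr, SpamToo.attr)  # 1 1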
The important point here is that it's the metaclass's responsibility to build the class object, and to allow for customisation of class object creation, the metaclass must be free to do whatever it wants with its arguments. Most often a custom metaclass will mainly work on the attrs dict, but it must also be able to mess with the bases argument. Now, since the metaclass is only invoked AFTER the class body statements have been executed, there's no way the runtime can reliably expose the bases in the class body scope, since those bases could be modified afterwards by the metaclass.
There are also some more philosophical considerations here, notably with regard to explicit vs. implicit, and as shx2 mentions, Python's designers try to avoid magic variables popping out of the blue. There are indeed a couple of implementation variables (__module__ and, in Python 3, __qualname__) that are "automagically" defined in the class body namespace, but those are just names, mostly intended as additional debugging/inspection information for developers, and they have absolutely no impact on the class object's creation nor on its properties and behaviour.
As always with Python, you have to consider the whole context (the execution model, the object model, how the different parts work together etc) to really understand the design choices. Whether you agree with the whole design and philosophy is another debate (and one that doesn't belong here), but you can be sure that yes, those choices are "conscious design decisions".
I am not answering as to why it was decided to be implemented the way it was; I'm answering why it wasn't implemented as a "local variable in the class body":
Simply because nothing in Python is a local variable magically defined in the class body. Python doesn't like names magically appearing out of nowhere.
It's because it simply has not been created yet.
Consider the following:
>>> class baselessmeta(type):
...     def __new__(metaclass, class_name, bases, classdict):
...         return type.__new__(
...             metaclass,
...             class_name,
...             (),  # I can just ignore all the bases
...             {}
...         )
...
>>> class Baseless(int, metaclass=baselessmeta):
...     # imaginary print(__bases__, __base__)
...     ...
...
>>> Baseless.__bases__
(<class 'object'>,)
>>> Baseless.__base__
<class 'object'>
>>>
What should the imaginary print result in?
Every Python class is created via the type metaclass one way or another.
The int base is passed to type() in the bases argument, yet you cannot know in advance what the metaclass will return: it may use that base directly, or, as above, ignore it entirely.
Just realized the "to be clear" part of your question, and now my answer is useless, haha. Oh well.
While writing unit tests that check whether a concrete subclass of an abstract base class really does raise a TypeError upon instantiation when one of the required methods is not implemented, I stumbled upon something that made me wonder when the check for the required methods is actually performed.
Until now I would have said: upon instantiation of the object, since that is when the exception is actually raised when running the program.
But look at this snippet:
import abc

class MyABC(abc.ABC):
    @abc.abstractmethod
    def foo(self): pass

class MyConcreteSubclass(MyABC):
    pass
As expected, trying to instantiate MyConcreteSubclass raises a TypeError:
>>> MyConcreteSubclass()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-39-fbfc0708afa6> in <module>()
----> 1 t = MyConcreteSubclass()

TypeError: Can't instantiate abstract class MyConcreteSubclass with abstract methods foo
But what happens if I first declare a valid subclass and afterwards delete the method surprises me:
class MyConcreteSubclass(MyABC):
    def foo(self):
        print("bar")

>>> MyConcreteSubclass.foo
<function __main__.MyConcreteSubclass.foo(self)>
>>> t = MyConcreteSubclass()
>>> t.foo()
bar
>>> del MyConcreteSubclass.foo
>>> MyConcreteSubclass.foo
<function __main__.MyABC.foo(self)>
>>> t = MyConcreteSubclass()
>>> print(t.foo())
None
This is certainly not what I expected. When inspecting MyConcreteSubclass.foo after the deletion, we see that through the method resolution order the abstract method of the base class is retrieved, which is the same behaviour as if we hadn't implemented foo in the concrete subclass in the first place.
But upon instantiation, the TypeError is not raised.
So I wonder: are the checks for whether the required methods are implemented already performed when the body of the concrete subclass is evaluated by the interpreter?
If so, why is the TypeError only raised when someone tries to instantiate the subclass?
The tests shown above were performed using Python 3.6.5.
It happens at class creation time. In Python 3.7, it's in C, in compute_abstract_methods in Modules/_abc.c, which is called as part of ABCMeta.__new__.
Incidentally, the docs do mention that
Dynamically adding abstract methods to a class, or attempting to modify the abstraction status of a method or class once it is created, are not supported.
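You can observe this directly: ABCMeta.__new__ stores the computed set in the class's __abstractmethods__ attribute at creation time, and deleting a method afterwards does not update it. A minimal sketch, re-using the classes from the question:

import abc

class MyABC(abc.ABC):
    @abc.abstractmethod
    def foo(self): pass

class MyConcreteSubclass(MyABC):
    def foo(self):
        print("bar")

print(MyConcreteSubclass.__abstractmethods__)  # frozenset() -- computed once, at class creation

del MyConcreteSubclass.foo
print(MyConcreteSubclass.__abstractmethods__)  # still frozenset(), so instantiation succeeds
MyConcreteSubclass()                           # no TypeError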
user2357112's answer covers the main question here, but there's a secondary question:
why are the TypeErrors only raised when someone tries to instantiate the subclass?
If a TypeError were raised earlier, at class creation time, it would be impossible to create hierarchies of ABCs:
class MyABC(abc.ABC):
    @abc.abstractmethod
    def foo(self): pass

class MySecondABC(MyABC):
    @abc.abstractmethod
    def bar(self): pass
You don't want that to raise a TypeError because MySecondABC doesn't define foo, unless someone tries to instantiate MySecondABC.
What if it were legal only for classes that added new abstract methods? Then it would be possible to create ABC hierarchies, but it would be impossible to create intermediate helper classes:
class MyABCHelper(MySecondABC):
    def foo(self):
        return self.bar() * 2
(For a more realistic example, see the classes in collections.abc that allow you to implement the full MutableSequence interface by defining only 7 of the 18 methods.)
You wouldn't want a rule that made such definitions illegal.
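For instance, here is a minimal sketch of such a helper built on collections.abc (the MyList name is made up): MutableSequence derives mixin methods such as append, extend, pop and index from the abstract methods __getitem__, __setitem__, __delitem__, __len__ and insert:

from collections.abc import MutableSequence

class MyList(MutableSequence):
    def __init__(self, *items):
        self._items = list(items)

    # the abstract methods MutableSequence requires:
    def __getitem__(self, index):
        return self._items[index]

    def __setitem__(self, index, value):
        self._items[index] = value

    def __delitem__(self, index):
        del self._items[index]

    def __len__(self):
        return len(self._items)

    def insert(self, index, value):
        self._items.insert(index, value)

m = MyList(1, 2, 3)
m.append(4)        # mixin method, built on top of insert() and __len__()
print(m.index(3))  # 2 -- mixin method, built on top of __getitem__()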
This is pertaining to python 2.x
In the following class, if we subclass object, I understand the methods are inherited in the derived class Foo, including __hash__ (we can see this by printing dir(Foo())).
Hence calling hash(Foo()) calls the magic method __hash__ and gives us a hash value.
However, if we don't subclass object, dir(Foo()) does not list the __hash__ method, so why do we still get a hash value in Python 2?
I believe in Python 3 this has been addressed, since the methods from the object class are inherited by default.
# class Foo(object): works, since __hash__ is available in the base class
class Foo:  # Why does this work?
    def __init__(self):
        self.x = None

a = Foo()
print dir(a)  # No __hash__ magic method listed
print hash(a) # Expecting an error like "unhashable" or "__hash__ not implemented",
              # but we get a hash value instead
Old-style classes are weird. Officially, instances of old-style classes aren't wholly instances of their class; they're all instances of type instance. The instance type defines __hash__ (tp_hash is the C-level slot that's equivalent to __hash__ for C-defined types), so even though it's not defined on your instance directly, nor on the class that created it, __hash__ is found on the instance type itself through weird and terrible magic (actually, the magic is in how it manages to use your class's features at all, given that its type is instance).
You can see this in the interactive interpreter:
>>> class Foo: pass
>>> Foo().__hash__ # Same basic error for Foo.__hash__ too
AttributeError Traceback (most recent call last)
...
----> 1 Foo().__hash__
AttributeError: Foo instance has no attribute '__hash__'
>>> type(Foo())
<type 'instance'>
>>> type(Foo()).__hash__
<slot wrapper '__hash__' of 'instance' objects>
This works even though the instance itself can't see __hash__ because "special methods" (those documented special methods that begin and end with double underscores) are looked up on the type, not the instance, so __hash__ is found on instance itself. At the C level, hash(x) is doing the equivalent of type(x).__hash__(x) (it's a little more complicated because it won't use the default __hash__ implementation if __eq__ has a custom definition, but that's the general idea).
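A small sketch of that lookup rule (shown in Python 3 syntax; the same rule applies to new-style classes in Python 2): an instance attribute named after a special method is ignored by the corresponding built-in, which goes through the type instead:

class Foo(object):
    pass

f = Foo()
f.__hash__ = lambda: 42  # instance attribute, ignored by hash()

print(hash(f) == type(f).__hash__(f))  # True -- hash() looks __hash__ up on the type
print(hash(f) == 42)                   # False -- the instance attribute is bypassed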
I can't understand why the following code behaves a particular way, which is described below:
from abc import ABCMeta

class PackageClass(object):
    __metaclass__ = ABCMeta

class MyClass1(PackageClass):
    pass

MyClass2 = type('MyClass2', (PackageClass, ), {})

print MyClass1
print MyClass2
<class '__main__.MyClass1'>
<class 'abc.MyClass2'>
Why does repr(MyClass2) say abc.MyClass2 (which is, by the way, not true)?
Thank you!
The problem stems from the fact that ABCMeta overrides __new__ and calls its superclass constructor (type()) there. type() derives the __module__ for the new class from its calling context [1]; in this case, the type() call appears to come from the abc module. Hence, the new class has __module__ set to abc (since type() has no way of knowing that the actual class construction took place in __main__).
The easy way around is to just set __module__ yourself after creating the type:
MyClass2 = type('MyClass2', (PackageClass, ), {})
MyClass2.__module__ = __name__
I would also recommend filing a bug report.
Related: Base metaclass overriding __new__ generates classes with a wrong __module__, Weird inheritance with metaclasses
[1]: type is a type object defined in C. Its __new__ method uses the current global __name__ as the __module__, unless the call is made from within a metaclass constructor, as here.