What is meant by "classes themselves are objects"? - python

I was just reading the Python documentation about classes; it says, in Python "classes themselves are objects". How is that different from classes in C#, Java, Ruby, or Smalltalk?
What advantages and disadvantages does this type of classes have compared with those other languages?

In Python, classes are objects in the sense that you can assign them to variables, pass them to functions, etc., just like any other object. For example:
>>> t = type(10)
>>> t
<type 'int'>
>>> len(t.__dict__)
55
>>> t() # construct an int
0
>>> t(10)
10
Java has Class objects which provide some information about a class, but you can't use them in place of explicit class names. They aren't really classes, just class information structures.
Class C = x.getClass();
new C(); // won't work

Defining a class is essentially just assigning to a variable:
class foo(object):
    def bar(self): pass

print foo  # <class '__main__.foo'>
They can be assigned and stored like any variable:
class foo(object):
    pass

class bar(object):
    pass

baz = bar           # simple variable assignment
items = [foo, bar]

my_foo = items[0]() # creates a foo

for x in (foo, bar): # create one of each type
    print x()
and passed around as a variable:
class foo(object):
    def __init__(self):
        print "created foo"

def func(f):
    f()

func(foo)
They can be created by functions, including the base class list:
def func(base_class, var):
    class cls(base_class):
        def g(self):
            print var
    return cls

class my_base(object):
    def f(self): print "hello"

new_class = func(my_base, 10)
obj = new_class()
obj.f() # hello
obj.g() # 10
By contrast, while classes in Java have objects representing them, eg. String.class, the class name itself--String--isn't an object and can't be manipulated as one. That's inherent to statically-typed languages.

In C# and Java the classes are not objects. They are types, in the sense in which those languages are statically typed. True you can get an object representing a specific class - but that's not the same as the class itself.
In python what looks like a class is actually an object too.
It's explained here much better than I can ever do :)

The main difference is that they mean you can easily manipulate the class as an object. The same facility is available in Java, where you can use the methods of Class to get at information about the class of an object. In languages like Python, Ruby, and Smalltalk, the more dynamic nature of the language lets you "open" the class and change it, which is sometimes called "monkey patching".
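For example, a minimal monkey-patching sketch (the class and method names here are made up for illustration):
class Greeter(object):
    def hello(self):
        return "hello"

def shout(self):
    # behaviour added to the class after it was defined
    return self.hello().upper()

Greeter.shout = shout      # "open" the class and attach a new method
print(Greeter().shout())   # HELLO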
Personally I don't think the differences are all that much of a big deal, but I'm sure we can get a good religious war started about it.

Classes are objects in that they can be manipulated in Python code just like any other object. Others have shown how you can pass them around to functions. In addition, like any object, a class can have attributes assigned on it at runtime:
class Foo(object):
    pass

f = Foo()
f.a = "a"    # assigns attribute on instance f
Foo.b = "b"  # assigns attribute on class Foo, and thus on all instances including f
print f.a, f.b
Second, like all objects, classes are instantiated at runtime. That is, a class definition is code that is executed rather than a structure that is compiled before anything runs. This means a class can "bake in" things that are only known when the program is run, such as environment variables or user input. These are evaluated once when the class is declared and then become a part of the class. This is different from compiled languages like C# which require this sort of behavior to be implemented differently.
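As a small sketch (the APP_MODE variable name is hypothetical), the class body below runs when the class statement executes, so the environment is read exactly once, at definition time:
import os

class Config(object):
    # evaluated once, when the class statement executes,
    # not each time an instance is created
    mode = os.environ.get("APP_MODE", "development")

print(Config.mode)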
Finally, classes, like any object, are built from classes. Just as an object is built from a class, so is a class built from a special kind of class called a metaclass. You can write your own metaclasses to change how classes are defined.
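As a minimal sketch (names are illustrative), a metaclass is just a class whose instances are classes; calling it builds a new class object, exactly as calling type() does:
class Verbose(type):
    def __new__(mcs, name, bases, namespace):
        print("creating class " + name)
        return super(Verbose, mcs).__new__(mcs, name, bases, namespace)

# calling the metaclass directly builds a new class object
Widget = Verbose("Widget", (object,), {"x": 1})
print(Widget.x)   # 1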

Another advantage of classes being objects is that objects can change their class at runtime:
>>> class MyClass(object):
...     def foo(self):
...         print "Yo There! I'm a MyClass-Object!"
...
>>> class YourClass(object):
...     def foo(self):
...         print "Guess what?! I'm a YourClass-Object!"
...
>>> o = MyClass()
>>> o.foo()
Yo There! I'm a MyClass-Object!
>>> o.__class__ = YourClass
>>> o.foo()
Guess what?! I'm a YourClass-Object!
Objects have a special attribute __class__ that points to the class of which they are an instance. This is possible only because classes are objects themselves, and can therefore be bound to an attribute like __class__.

As this question has a Smalltalk tag, this answer is from a Smalltalk perspective. In object-oriented programming, things get done through message passing. You send a message to an object; if the object understands that message, it executes the corresponding method and returns a value. But how is the object created in the first place? If special syntax were introduced for creating objects, it would break the simple, uniform syntax based on message passing. This is what happens in languages like Java:
p = new Point(10, 20); // Creates a new Point object with the help of a special keyword - new.
p.draw(); // Sends the message `draw` to the Point object.
As is evident from the above code, the language has two ways to get things done: one imperative and the other object oriented. In contrast, Smalltalk has a consistent syntax based only on messaging:
p := Point new: 10 y: 20.
p draw.
Here new is a message send to a singleton object called Point which is an instance of a Metaclass. In addition to giving the language a consistent model of computation, metaclasses allow dynamic modification of classes. For instance, the following statement will add a new instance variable to the Point class without requiring a recompilation or VM restart:
Point addInstVarName: 'z'.
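A rough Python analogy (illustrative only, not Smalltalk): a running program can attach a new attribute to an existing class, and every instance sees it immediately:
class Point(object):
    def __init__(self, x, y):
        self.x, self.y = x, y

p = Point(10, 20)
Point.z = 0      # added after the class (and the instance) already exist
print(p.z)       # 0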
The best reading on this subject is The Art of the Metaobject Protocol.

Related

Python OOP - explain how static works with type(self)

I've seen examples of how I can make a static variable in a class. Here is an example:
class A:
    _i = 0

    @property
    def i(self):
        print(type(self))  # <class '__main__.A'>
        return type(self)._i

    @i.setter
    def i(self, val):
        type(self)._i = val
How does it work? How does type(self) work and make these variables static?
What does <class '__main__.A'> mean in OOP and polymorphism?
How type(self) works
It just returns the type of self. You can call type on any object to get its type:
>>> type(2)
int
>>> class C(object): pass
>>> type(C)
type
>>> c = C()
>>> type(c)
__main__.C
(The output may look slightly different on Python 2 vs. Python 3, or on different Python implementations.)
… and how does it make these variables static?
Well, first, these aren't static variables, they're class variables. If you don't have any inheritance, there's no difference—but if you do… I'll come back to that.
If you create an instance:
>>> a = A()
… and assign a value to i:
>>> a.i = 3
… it calls the setter for the i property, passing a as the self parameter—just like a normal method call.
So, since self is a, and type(a) is A, then type(self) is also A.
Which means type(self)._i is A._i. In other words, it's the class attribute.
So, why is this a class attribute rather than a static attribute? Well, let's add a subclass:
>>> a = A()
>>> class B(A):
...     _i = 1
>>> b = B()
>>> b.i = 5
>>> A._i
0
>>> B._i
5
Each subclass can have its own _i. And because the setter assigns to type(self)._i, when self is b, type(self) is B, so type(self)._i is B._i, not A._i.
What does <class '__main__.A'> mean in OOP and polymorphism?
In Python, everything has a repr, meant for programmers, that gets printed out when you evaluate it at the interactive prompt. It's also used as the str (the thing that gets printed by print) if there's nothing better to use as a human-readable (as in real humans, not us programmers) representation.
In general, a repr is either:
A string that could be pasted into your code to produce an equal value, if that makes sense and is possible/reasonable.
A string inside <> that includes the type, some kind of identifying information if there is any, and some way to distinguish the object from other instances, otherwise.
For types, what you get inside the angle brackets is the fact that it's a class (that's the class part), and the qualified name (that's the __main__.A part, telling you that it's a class named A defined at the top level of a module named __main__), which is both the useful identifier and the way to distinguish it from other classes.
What does it mean specifically in OOP and polymorphism? I can't think of a good answer to that. Even if Python didn't support polymorphism, even if it didn't have first-class types, __main__.A and __main__.B would still be different objects worthy of distinct names, right?
And if you're wondering what kind of name __main__ is: that's just the name of the special module used to run your top-level script or your interactive interpreter session. If you've ever seen an if __name__ == '__main__': guard, that's exactly what it's testing.
type() returns the class that an instance was constructed from. I.e. if you do foo = A() (create a new instance of A and assign it to foo), then type(foo) returns A again. That's what <class '__main__.A'> is, it tells you it's the A class object.
Now all you're doing with type(self)._i is the same as A._i. That _i is an attribute of the class A object, which only exists once. Et voilà, that's all that "static" attributes are.

Properties seem to set to the same value for all objects (Python) [duplicate]

What is the difference between class and instance variables in Python?
class Complex:
    a = 1
and
class Complex:
    def __init__(self):
        self.a = 1
In both cases, the call x = Complex().a assigns 1 to x.
A more in-depth answer about __init__() and self will be appreciated.
When you write a class block, you create class attributes (or class variables). All the names you assign in the class block, including methods you define with def become class attributes.
After a class instance is created, anything with a reference to the instance can create instance attributes on it. Inside methods, the "current" instance is almost always bound to the name self, which is why you are thinking of these as "self variables". Usually in object-oriented design, the code attached to a class is supposed to have control over the attributes of instances of that class, so almost all instance attribute assignment is done inside methods, using the reference to the instance received in the self parameter of the method.
Class attributes are often compared to static variables (or methods) as found in languages like Java, C#, or C++. However, if you want to aim for deeper understanding I would avoid thinking of class attributes as "the same" as static variables. While they are often used for the same purposes, the underlying concept is quite different. More on this in the "advanced" section below the line.
An example!
class SomeClass:
    def __init__(self):
        self.foo = 'I am an instance attribute called foo'
        self.foo_list = []

    bar = 'I am a class attribute called bar'
    bar_list = []
After executing this block, there is a class SomeClass, with 3 class attributes: __init__, bar, and bar_list.
Then we'll create an instance:
instance = SomeClass()
When this happens, SomeClass's __init__ method is executed, receiving the new instance in its self parameter. This method creates two instance attributes: foo and foo_list. Then the new instance is assigned to the variable instance, so that name is bound to an object with those two instance attributes: foo and foo_list.
But:
print instance.bar
gives:
I am a class attribute called bar
How did this happen? When we try to retrieve an attribute through the dot syntax, and the attribute doesn't exist, Python goes through a bunch of steps to try and fulfill your request anyway. The next thing it will try is to look at the class attributes of the class of your instance. In this case, it found an attribute bar in SomeClass, so it returned that.
That's also how method calls work by the way. When you call mylist.append(5), for example, mylist doesn't have an attribute named append. But the class of mylist does, and it's bound to a method object. That method object is returned by the mylist.append bit, and then the (5) bit calls the method with the argument 5.
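You can watch that lookup happen at the interactive prompt (a small sketch):
>>> mylist = []
>>> hasattr(mylist, '__dict__')        # list instances carry no per-instance dict
False
>>> 'append' in type(mylist).__dict__  # the method is an attribute of the class
True
>>> mylist.append(5)                   # found on the class, bound to mylist, then called
>>> mylist
[5]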
The way this is useful is that all instances of SomeClass will have access to the same bar attribute. We could create a million instances, but we only need to store that one string in memory, because they can all find it.
But you have to be a bit careful. Have a look at the following operations:
sc1 = SomeClass()
sc1.foo_list.append(1)
sc1.bar_list.append(2)
sc2 = SomeClass()
sc2.foo_list.append(10)
sc2.bar_list.append(20)
print sc1.foo_list
print sc1.bar_list
print sc2.foo_list
print sc2.bar_list
What do you think this prints?
[1]
[2, 20]
[10]
[2, 20]
This is because each instance has its own copy of foo_list, so they were appended to separately. But all instances share access to the same bar_list. So when we did sc1.bar_list.append(2) it affected sc2, even though sc2 didn't exist yet! And likewise sc2.bar_list.append(20) affected the bar_list retrieved through sc1. This is often not what you want.
Advanced study follows. :)
To really grok Python, coming from traditional statically typed OO-languages like Java and C#, you have to learn to rethink classes a little bit.
In Java, a class isn't really a thing in its own right. When you write a class you're really just declaring a bunch of things that all instances of that class have in common. At runtime, there are only instances (and static methods/variables, but those are really just global variables and functions in a namespace associated with a class, nothing to do with OO really). Classes are the way you write down in your source code what the instances will be like at runtime; they only "exist" in your source code, not in the running program.
In Python, a class is nothing special. It's an object just like anything else. So "class attributes" are in fact exactly the same thing as "instance attributes"; in reality there's just "attributes". The only reason for drawing a distinction is that we tend to use objects which are classes differently from objects which are not classes. The underlying machinery is all the same. This is why I say it would be a mistake to think of class attributes as static variables from other languages.
But the thing that really makes Python classes different from Java-style classes is that just like any other object each class is an instance of some class!
In Python, most classes are instances of a builtin class called type. It is this class that controls the common behaviour of classes, and makes all the OO stuff work the way it does. The default OO way of having instances of classes that have their own attributes, and common methods/attributes defined by their class, is just a protocol in Python. You can change most aspects of it if you want. If you've ever heard of using a metaclass, all that is is defining a class that is an instance of a different class than type.
The only really "special" thing about classes (aside from all the builtin machinery to make them work they way they do by default), is the class block syntax, to make it easier for you to create instances of type. This:
class Foo(BaseFoo):
    def __init__(self, foo):
        self.foo = foo

    z = 28
is roughly equivalent to the following:
def __init__(self, foo):
    self.foo = foo

classdict = {'__init__': __init__, 'z': 28}
Foo = type('Foo', (BaseFoo,), classdict)
And it will arrange for all the contents of classdict to become attributes of the object that gets created.
So then it becomes almost trivial to see that you can access a class attribute by Class.attribute just as easily as i = Class(); i.attribute. Both i and Class are objects, and objects have attributes. This also makes it easy to understand how you can modify a class after it's been created; just assign its attributes the same way you would with any other object!
In fact, instances have no particular special relationship with the class used to create them. The way Python knows which class to search for attributes that aren't found in the instance is by the hidden __class__ attribute. Which you can read to find out what class this is an instance of, just as with any other attribute: c = some_instance.__class__. Now you have a variable c bound to a class, even though it probably doesn't have the same name as the class. You can use this to access class attributes, or even call it to create more instances of it (even though you don't know what class it is!).
And you can even assign to i.__class__ to change what class it is an instance of! If you do this, nothing in particular happens immediately. It's not earth-shattering. All that it means is that when you look up attributes that don't exist in the instance, Python will go look at the new contents of __class__. Since that includes most methods, and methods usually expect the instance they're operating on to be in certain states, this usually results in errors if you do it at random, and it's very confusing, but it can be done. If you're very careful, the thing you store in __class__ doesn't even have to be a class object; all Python's going to do with it is look up attributes under certain circumstances, so all you need is an object that has the right kind of attributes (some caveats aside where Python does get picky about things being classes or instances of a particular class).
That's probably enough for now. Hopefully (if you've even read this far) I haven't confused you too much. Python is neat when you learn how it works. :)
What you're calling an "instance" variable isn't actually an instance variable; it's a class variable. See the language reference about classes.
In your example, a appears to be an instance variable because it is immutable. Its nature as a class variable can be seen when you assign a mutable object:
>>> class Complex:
...     a = []
...
>>> b = Complex()
>>> c = Complex()
>>>
>>> # What do they look like?
>>> b.a
[]
>>> c.a
[]
>>>
>>> # Change b...
>>> b.a.append('Hello')
>>> b.a
['Hello']
>>> # What does c look like?
>>> c.a
['Hello']
If you used self, then it would be a true instance variable, and thus each instance would have its own unique a. An object's __init__ function is called when a new instance is created, and self is a reference to that instance.

Python weird class variables usage

Suppose we have the following code:
class A:
    var = 0

a = A()
I do understand that a.var and A.var are different variables, and I think I understand why this thing happens. I thought it was just a side effect of python's data model, since why would someone want to modify a class variable in an instance?
However, today I came across a strange example of such a usage: it is in google app engine db.Model reference. Google app engine datastore assumes we inherit db.Model class and introduce keys as class variables:
class Story(db.Model):
    title = db.StringProperty()
    body = db.TextProperty()
    created = db.DateTimeProperty(auto_now_add=True)

s = Story(title="The Three Little Pigs")
I don't understand why they expect me to do it like that. Why not introduce a constructor and use only instance variables?
The db.Model class is a 'Model'-style class in the classic Model-View-Controller design pattern.
Each of the assignments in there are actually setting up columns in the database, while also giving an easy to use interface for you to program with. This is why
title="The Three Little Pigs"
will update the object as well as the column in the database.
There is a constructor (no doubt in db.Model) that handles this pass-off logic, and it will take a keyword args list and digest it to create this relational model.
This is why the variables are setup the way they are, so that relation is maintained.
Edit: Let me describe that better. A normal class just sets up the blueprint for an object. It has instance variables and class variables. Because of the inheritance from db.Model, this is actually doing a third thing: setting up column definitions in a database. In order to do this third task it is making EXTENSIVE behind-the-scenes changes to things like attribute setting and getting. Pretty much once you inherit from db.Model you aren't really a plain class anymore, but a DB template. Long story short, this is a VERY specific edge case of the use of a class.
If all variables were declared as instance variables, then classes using the Story class as a superclass would inherit nothing from it.
From the Model and Property docs, it looks like Model has overridden __getattr__ and __setattr__ methods so that, in effect, "Story.title = ..." does not actually set the instance attribute; instead it sets the value stored with the instance's Property.
If you ask for story.__dict__['title'], what does it give you?
I do understand that a.var and A.var are different variables
First off: as of now, no, they aren't.
In Python, everything you declare inside the class block belongs to the class. You can look up attributes of the class via the instance, if the instance doesn't already have something with that name. When you assign to an attribute of an instance, the instance now has that attribute, regardless of whether it had one before. (__init__, in this regard, is just another function; it's called automatically by Python's machinery, but it simply adds attributes to an object, it doesn't magically specify some kind of template for the contents of all instances of the class - there's the magic __slots__ class attribute for that, but it still doesn't do quite what you might expect.)
But right now, a has no .var of its own, so a.var refers to A.var. And you can modify a class attribute via an instance - but note modify, not replace. This requires, of course, that the original value of the attribute is something modifiable - a list qualifies, a str doesn't.
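A small sketch of that modify-versus-replace distinction (names are illustrative):
class A(object):
    items = []     # mutable class attribute
    count = 0      # immutable class attribute

a = A()
a.items.append(1)  # modifies the shared list in place
print(A.items)     # [1] -- visible via the class and every instance

a.count += 1       # reads A.count, then binds a brand-new instance attribute on a
print(a.count)     # 1
print(A.count)     # 0 -- the class attribute is untouched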
Your GAE example, though, is something totally different. The class Story has attributes which specifically are "properties", which can do assorted magic when you "assign to" them. This works by using the class' __getattr__, __setattr__ etc. methods to change the behaviour of the assignment syntax.
The other answers have it mostly right, but miss one critical thing.
If you define a class like this:
class Foo(object):
    a = 5
and an instance:
myinstance = Foo()
Then Foo.a and myinstance.a are the very same variable. Changing one will change the other, and if you create multiple instances of Foo, the .a attribute on each will be the same variable. This is because of the way Python resolves attribute access: first it looks in the object's dict, and if it doesn't find it there, it looks in the class's dict, and so forth.
That also helps explain why assignments don't work the way you'd expect given the shared nature of the variable:
>>> bar = Foo()
>>> baz = Foo()
>>> Foo.a = 6
>>> bar.a = 7
>>> bar.a
7
>>> baz.a
6
What happened here is that when we assigned to Foo.a, it modified the variable that all instances of Foo normally resolve to when you ask for instance.a. But when we assigned to bar.a, Python created a new variable on that instance called a, which now masks the class variable - from now on, that particular instance will always see its own local value.
If you wanted each instance of your class to have a separate variable initialized to 5, the normal way to do it would be like this:
class Foo(object):
    def __init__(self):
        self.a = 5
That is, you define a class with a constructor that sets the a variable on the new instance to 5.
Finally, what App Engine is doing is an entirely different kind of black magic called descriptors. In short, Python allows objects to define special __get__ and __set__ methods. When an instance of a class that defines these special methods is attached to another class as a class attribute, then accessing that attribute on an instance of the second class calls the special __get__ and __set__ methods instead of reading or writing the instance or class variable directly. A much more comprehensive introduction to descriptors can be found here, but here's a simple demo:
class MultiplyDescriptor(object):
    def __init__(self, multiplicand, initial=0):
        self.multiplicand = multiplicand
        self.value = initial

    def __get__(self, obj, objtype):
        if obj is None:
            return self
        return self.multiplicand * self.value

    def __set__(self, obj, value):
        self.value = value
Now you can do something like this:
class Foo(object):
    a = MultiplyDescriptor(2)

bar = Foo()
bar.a = 10
print bar.a # Prints 20!
Descriptors are the secret sauce behind a surprising amount of the Python language. For instance, property is implemented using descriptors, as are methods, static and class methods, and a bunch of other stuff.
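For instance, here is a stripped-down sketch (not the real implementation) of how a read-only property can be built on the descriptor protocol:
class ReadOnly(object):
    """Descriptor that computes an attribute's value via a getter function."""
    def __init__(self, fget):
        self.fget = fget

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return self.fget(obj)

class Circle(object):
    def __init__(self, r):
        self.r = r
    area = ReadOnly(lambda self: 3.14159 * self.r ** 2)

print(Circle(2).area)   # about 12.566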
These class variables are metadata that Google App Engine uses to generate its models.
FYI, in your example, a.var == A.var.
>>> class A:
...     var = 0
...
>>> a = A()
>>> A.var = 3
>>> a.var == A.var
True

Is accessing class variables via an instance documented?

In Python, class variables can be accessed via that class instance:
>>> class A(object):
...     x = 4
...
>>> a = A()
>>> a.x
4
It's easy to show that a.x is really resolved to A.x, not copied to an instance during construction:
>>> A.x = 5
>>> a.x
5
Despite the fact that this behavior is well known and widely used, I couldn't find any definitive documentation covering it. The closest I could find in Python docs was the section on classes:
class MyClass:
    """A simple example class"""
    i = 12345

    def f(self):
        return 'hello world'
[snip]
... By definition, all attributes of a class that are function objects define corresponding methods of its instances. So in our example, x.f is a valid method reference, since MyClass.f is a function, but x.i is not, since MyClass.i is not. ...
However, this part talks specifically about methods so it's probably not relevant to the general case.
My question is, is this documented? Can I rely on this behavior?
See the "Classes" and "Class instances" parts of the Python data model documentation:
A class has a namespace implemented by a dictionary object. Class
attribute references are translated to lookups in this dictionary,
e.g., C.x is translated to C.__dict__["x"] (although for new-style classes in particular there are a number of hooks which allow for other means of locating attributes).
...
A class instance is created by calling a class object (see above). A
class instance has a namespace implemented as a dictionary which is
the first place in which attribute references are searched. When an
attribute is not found there, and the instance’s class has an
attribute by that name, the search continues with the class
attributes.
Generally, this usage is fine, except the special cases mentioned as "for new-style classes in particular there are a number of hooks which allow for other means of locating attributes".
Not only can you rely on this behavior, you constantly do.
Think about methods. A method is merely a function that has been made a class attribute. You then look it up on the instance.
>>> def foo(self, x):
...     print "foo:", self, x
...
>>> class C(object):
...     method = foo  # What a weird way to write this! But perhaps illustrative?
...
>>> C().method("hello")
foo: <__main__.C object at 0xadad50> hello
In the case of objects like functions, this isn't a plain lookup, but some magic occurs to pass self automatically. You may have used other objects that are meant to be stored as class attributes and looked up on the instance; properties are an example (check out the property builtin if you're not familiar with it.)
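That magic is itself the descriptor protocol: plain functions have a __get__ method, and it is what turns an attribute lookup on the instance into a bound method (a small sketch reusing the names above):
def foo(self, x):
    print("foo: %r %r" % (self, x))

class C(object):
    method = foo

c = C()
bound = foo.__get__(c, C)   # roughly what the lookup c.method does behind the scenes
bound("hello")              # same as c.method("hello")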
As okm notes, the way this works is described in the data model reference (including information about and links to more information about the magic that makes methods and properties work). The Data Model page is by far the most useful part of the Language Reference; it also includes among other things documentation about almost all the __foo__ methods and names.

What is the purpose of python's inner classes?

Python's inner/nested classes confuse me. Is there something that can't be accomplished without them? If so, what is that thing?
Quoted from http://www.geekinterview.com/question_details/64739:
Advantages of inner class:
Logical grouping of classes: If a class is useful to only one other class then it is logical to embed it in that class and keep the two together. Nesting such "helper classes" makes their package more streamlined.
Increased encapsulation: Consider two top-level classes A and B, where B needs access to members of A that would otherwise be declared private. By hiding class B within class A, A's members can be declared private and B can access them. In addition, B itself can be hidden from the outside world.
More readable, maintainable code: Nesting small classes within top-level classes places the code closer to where it is used.
The main advantage is organization. Anything that can be accomplished with inner classes can be accomplished without them.
Is there something that can't be accomplished without them?
No. They are absolutely equivalent to defining the class normally at top level, and then copying a reference to it into the outer class.
I don't think there's any special reason nested classes are ‘allowed’, other than it makes no particular sense to explicitly ‘disallow’ them either.
If you're looking for a class that exists within the lifecycle of the outer/owner object, and always has a reference to an instance of the outer class (inner classes as Java does them), then Python's nested classes are not that thing. But you can hack up something like that thing:
import weakref, new

class innerclass(object):
    """Descriptor for making inner classes.

    Adds a property 'owner' to the inner class, pointing to the outer
    owner instance.
    """
    # Use a weakref dict to memoise previous results so that
    # instance.Inner() always returns the same inner classobj.
    #
    def __init__(self, inner):
        self.inner = inner
        self.instances = weakref.WeakKeyDictionary()

    # Not thread-safe - consider adding a lock.
    #
    def __get__(self, instance, _):
        if instance is None:
            return self.inner
        if instance not in self.instances:
            self.instances[instance] = new.classobj(
                self.inner.__name__, (self.inner,), {'owner': instance}
            )
        return self.instances[instance]


# Using an inner class
#
class Outer(object):
    @innerclass
    class Inner(object):
        def __repr__(self):
            return '<%s.%s inner object of %r>' % (
                self.owner.__class__.__name__,
                self.__class__.__name__,
                self.owner)
>>> o1= Outer()
>>> o2= Outer()
>>> i1= o1.Inner()
>>> i1
<Outer.Inner inner object of <__main__.Outer object at 0x7fb2cd62de90>>
>>> isinstance(i1, Outer.Inner)
True
>>> isinstance(i1, o1.Inner)
True
>>> isinstance(i1, o2.Inner)
False
(This uses class decorators, which are new in Python 2.6 and 3.0. Otherwise you'd have to say “Inner= innerclass(Inner)” after the class definition.)
There's something you need to wrap your head around to be able to understand this. In most languages, class definitions are directives to the compiler. That is, the class is created before the program is ever run. In python, all statements are executable. That means that this statement:
class foo(object):
    pass
is a statement that is executed at runtime just like this one:
x = y + z
This means that not only can you create classes within other classes, you can create classes anywhere you want to. Consider this code:
def foo():
    class bar(object):
        ...
    z = bar()
Thus, the idea of an "inner class" isn't really a language construct; it's a programmer construct. Guido has a very good summary of how this came about here. But essentially, the basic idea is that this simplifies the language's grammar.
Nesting classes within classes:
Nested classes bloat the class definition, making it harder to see what's going on.
Nested classes can create coupling that would make testing more difficult.
In Python you can put more than one class in a file/module, unlike Java, so the class still remains close to top level class and could even have the class name prefixed with an "_" to help signify that others shouldn't be using it.
The place where nested classes can prove useful is within functions:
def some_func(a, b, c):
    class SomeClass(a):
        def some_method(self):
            return b
    SomeClass.__doc__ = c
    return SomeClass
The class captures the values from the function, allowing you to dynamically create a class, somewhat like template metaprogramming in C++.
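A hedged usage sketch of that factory (choosing list as the base class is purely illustrative):
Cls = some_func(list, "captured value", "a dynamically created list subclass")
obj = Cls([1, 2, 3])
print(obj.some_method())   # captured value
print(obj)                 # [1, 2, 3] -- it really is a list subclass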
I understand the arguments against nested classes, but there is a case for using them on some occasions. Imagine I'm creating a doubly-linked list class, and I need to create a node class for maintaining the nodes. I have two choices: create the Node class inside the DoublyLinkedList class, or create the Node class outside the DoublyLinkedList class. I prefer the first choice in this case, because the Node class is only meaningful inside the DoublyLinkedList class. While there's no hiding/encapsulation benefit, there is a grouping benefit of being able to say the Node class is part of the DoublyLinkedList class.
Is there something that can't be accomplished without them? If so,
what is that thing?
There is something that cannot easily be done without them: inheritance of related classes.
Here is a minimalist example with the related classes A and B:
class A(object):
    class B(object):
        def __init__(self, parent):
            self.parent = parent

    def make_B(self):
        return self.B(self)


class AA(A):       # Inheritance
    class B(A.B):  # Inheritance, same class name
        pass
This code leads to a quite reasonable and predictable behaviour:
>>> type(A().make_B())
<class '__main__.A.B'>
>>> type(A().make_B().parent)
<class '__main__.A'>
>>> type(AA().make_B())
<class '__main__.AA.B'>
>>> type(AA().make_B().parent)
<class '__main__.AA'>
If B were a top-level class, you could not write self.B() in the method make_B but would simply write B(), and thus lose the dynamic binding to the adequate classes.
Note that in this construction, you should never refer to class A in the body of class B. This is the motivation for introducing the parent attribute in class B.
Of course, this dynamic binding can be recreated without inner classes, at the cost of tedious and error-prone instrumentation of the classes.
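A sketch of what that recreation might look like with top-level classes: each outer class must carry an explicit reference to its companion class, and every subclass has to remember to override it (names are illustrative):
class B(object):
    def __init__(self, parent):
        self.parent = parent

class BB(B):
    pass

class A(object):
    B_cls = B                   # explicit companion-class reference
    def make_B(self):
        return self.B_cls(self)

class AA(A):
    B_cls = BB                  # must be repeated by hand in every subclass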
1. Two functionally equivalent ways
The two ways shown below are functionally identical. However, there are some subtle differences, and there are situations when you would like to choose one over the other.
Way 1: Nested class definition (="Nested class")
class MyOuter1:
    class Inner:
        def show(self, msg):
            print(msg)
Way 2: With a module-level Inner class attached to the outer class (="Referenced inner class")
class _InnerClass:
    def show(self, msg):
        print(msg)

class MyOuter2:
    Inner = _InnerClass
Underscore is used to follow PEP8 "internal interfaces (packages, modules, classes, functions, attributes or other names) should -- be prefixed with a single leading underscore."
2. Similarities
The code snippet below demonstrates the functional similarities of the "Nested class" and the "Referenced inner class"; they behave the same way in code checking the type of an inner class instance. Needless to say, m1.Inner().anymethod() and m2.Inner().anymethod() would behave similarly.
m1 = MyOuter1()
m2 = MyOuter2()
innercls1 = getattr(m1, 'Inner', None)
innercls2 = getattr(m2, 'Inner', None)
isinstance(innercls1(), MyOuter1.Inner)
# True
isinstance(innercls2(), MyOuter2.Inner)
# True
type(innercls1()) == mypackage.outer1.MyOuter1.Inner
# True (when part of mypackage)
type(innercls2()) == mypackage.outer2.MyOuter2.Inner
# True (when part of mypackage)
3. Differences
The differences between the "Nested class" and the "Referenced inner class" are listed below. They are not big, but sometimes you would like to choose one or the other based on them.
3.1 Code Encapsulation
With "Nested classes" it is possible to encapsulate code better than with "Referenced inner class". A class in the module namespace is a global variable. The purpose of nested classes is to reduce clutter in the module and put the inner class inside the outer class.
Even if no one* is using from packagename import *, a low number of module-level variables can be nice, for example when using an IDE with code completion / intellisense.
*Right?
3.2 Readability of code
Django documentation instructs you to use an inner class Meta for model metadata. It is a bit clearer* to instruct the framework users to write a class Foo(models.Model) with an inner class Meta;
class Ox(models.Model):
    horn_length = models.IntegerField()

    class Meta:
        ordering = ["horn_length"]
        verbose_name_plural = "oxen"
instead of "write a class _Meta, then write a class Foo(models.Model) with Meta = _Meta";
class _Meta:
    ordering = ["horn_length"]
    verbose_name_plural = "oxen"

class Ox(models.Model):
    Meta = _Meta
    horn_length = models.IntegerField()
With the "Nested class" approach the code can be read a nested bullet point list, but with the "Referenced inner class" method one has to scroll back up to see the definition of _Meta to see its "child items" (attributes).
The "Referenced inner class" method can be more readable if your code nesting level grows or the rows are long for some other reason.
* Of course, a matter of taste
3.3 Slightly different error messages
This is not a big deal, but just for completeness: when accessing a non-existent attribute of the inner class, we see slightly different exceptions. Continuing the example given in Section 2:
innercls1.foo()
# AttributeError: type object 'Inner' has no attribute 'foo'
innercls2.foo()
# AttributeError: type object '_InnerClass' has no attribute 'foo'
This is because the types of the inner classes are
type(innercls1())
#mypackage.outer1.MyOuter1.Inner
type(innercls2())
#mypackage.outer2._InnerClass
The main use case I use this for is to prevent the proliferation of small modules and to prevent namespace pollution when separate modules are not needed. I may be extending an existing class, but that existing class must reference another subclass that should always be coupled to it. For example, I may have a utils.py module that has many helper classes in it that aren't necessarily coupled together, but I want to reinforce coupling for some of those helper classes. For example, when I implement https://stackoverflow.com/a/8274307/2718295
utils.py:
import json, decimal

class Helper1(object):
    pass

class Helper2(object):
    pass

# Here is the notorious JSONEncoder extension to serialize Decimals to JSON floats
class DecimalJSONEncoder(json.JSONEncoder):

    class _repr_decimal(float):  # Because float.__repr__ cannot be monkey patched
        def __init__(self, obj):
            self._obj = obj

        def __repr__(self):
            return '{:f}'.format(self._obj)

    def default(self, obj):  # override JSONEncoder.default
        if isinstance(obj, decimal.Decimal):
            return self._repr_decimal(obj)
        # else fall back to the base class, which raises TypeError for unknown types
        return super(DecimalJSONEncoder, self).default(obj)
        # could also have inherited from object and used return json.JSONEncoder.default(self, obj)
Then we can:
>>> from utils import DecimalJSONEncoder
>>> import json, decimal
>>> json.dumps({'key1': decimal.Decimal('1.12345678901234'),
... 'key2':'strKey2Value'}, cls=DecimalJSONEncoder)
{"key2": "key2_value", "key_1": 1.12345678901234}
Of course, we could have eschewed inheriting from json.JSONEncoder altogether and just passed a default function:
import decimal, json

class Helper1(object):
    pass

def json_encoder_decimal(obj):
    class _repr_decimal(float):
        ...

    if isinstance(obj, decimal.Decimal):
        return _repr_decimal(obj)
    return json.JSONEncoder().default(obj)
>>> json.dumps({'key1': decimal.Decimal('1.12345678901234')}, default=json_encoder_decimal)
'{"key1": 1.12345678901234}'
But sometimes just for convention, you want utils to be composed of classes for extensibility.
Here's another use-case: I want a factory for mutables in my OuterClass without having to invoke copy:
class OuterClass(object):

    class DTemplate(dict):
        def __init__(self):
            self.update({'key1': [1, 2, 3],
                         'key2': {'subkey': [4, 5, 6]}})

    def __init__(self):
        self.outerclass_dict = {
            'outerkey1': self.DTemplate(),
            'outerkey2': self.DTemplate()}


obj = OuterClass()
obj.outerclass_dict['outerkey1']['key2']['subkey'].append(4)
assert obj.outerclass_dict['outerkey2']['key2']['subkey'] == [4, 5, 6]
I prefer this pattern over the @staticmethod decorator you would otherwise use for a factory function.
I have used Python's inner classes to create deliberately buggy subclasses within unittest functions (i.e. inside def test_something():) in order to get closer to 100% test coverage (e.g. testing very rarely triggered logging statements by overriding some methods).
In retrospect it's similar to Ed's answer https://stackoverflow.com/a/722036/1101109
Such inner classes should go out of scope and be ready for garbage collection once all references to them have been removed. For instance, take the following inner.py file:
class A(object):
    pass

def scope():
    class Buggy(A):
        """Do tests or something"""
    assert isinstance(Buggy(), A)
I get the following curious results under OSX Python 2.7.6:
>>> from inner import A, scope
>>> A.__subclasses__()
[]
>>> scope()
>>> A.__subclasses__()
[<class 'inner.Buggy'>]
>>> del A, scope
>>> from inner import A
>>> A.__subclasses__()
[<class 'inner.Buggy'>]
>>> del A
>>> import gc
>>> gc.collect()
0
>>> gc.collect() # Yes I needed to call the gc twice, seems reproducible
3
>>> from inner import A
>>> A.__subclasses__()
[]
Hint - Don't go on and try doing this with Django models, which seemed to keep other (cached?) references to my buggy classes.
So in general, I wouldn't recommend using inner classes for this kind of purpose unless you really do value that 100% test coverage and can't use other methods. Though I think it's nice to be aware that if you use __subclasses__(), it can sometimes get polluted by inner classes. Either way, if you followed this far, I think we're pretty deep into Python at this point, private dunderscores and all.
