To use __init__ or not in Python classes

I have always defined variables for classes like:
class A():
    def __init__(self):
        self.x = 1
However, I discovered it is also simply possible to use:
class A():
    x = 1
In both cases, a new instance will have a variable x with a value of 1.
Is there any difference?

For further reading, in the Python Tutorial chapter on classes, that matter is discussed in detail. A summary follows:
There is a difference as soon as mutable data structures take part in the game.
>>> class A:
...     x = [1]
...
>>> a1 = A()
>>> a2 = A()
>>> a1.x.append(2)
>>> a1.x
[1, 2]
>>> a2.x
[1, 2]
In that case, the same list object is used for both class instances. When using __init__, a new list is created each time an A instance is created:
>>> class A:
...     def __init__(self):
...         self.x = [1]
...
>>> a1 = A()
>>> a2 = A()
>>> a1.x.append(2)
>>> a1.x
[1, 2]
>>> a2.x
[1]
In the first example, a list is created and bound to A.x. This can be accessed both using A.x and using A().x (for any A(), such as a1 or a2). They all share the same list object.
In the second example, A does not have an attribute x. Instead, the objects receive an attribute x during initialization, which is distinct for each object.
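The two layouts can be checked directly by looking at the class and instance dictionaries; a small sketch (the class names here are made up for illustration):

```python
# Class attributes live in the class's __dict__; attributes set in __init__
# live in each instance's __dict__. vars() exposes both.
class WithClassAttr:
    x = [1]

class WithInstanceAttr:
    def __init__(self):
        self.x = [1]

a = WithClassAttr()
b = WithInstanceAttr()

print('x' in vars(a))                 # False: the instance has no x of its own
print('x' in vars(WithClassAttr))     # True: x lives on the class
print('x' in vars(b))                 # True: __init__ put x on the instance
print('x' in vars(WithInstanceAttr))  # False: the class itself has no x
```

This is exactly why a1.x and a2.x resolve to the same list in the first case: the lookup falls through the (empty) instance dictionary to the class.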

Your question is very imprecise. You speak about "variables for classes", but later you say "instance will have a variable". In fact, your examples are reversed: the second one shows a class A with a variable x, and the first one shows a class A with no variable x, but whose every instance (after __init__, unless deleted) has a variable x.
If the value is immutable, there is not much difference, since when you have a=A() and a doesn't have a variable x, a.x automatically delegates to A.x. But if the value is mutable, then it matters, since there is only one x in the second example, and as many xs as there are instances (zero, one, two, seventeen,...) in the first one.

Related

How does closures see context variables into the stack?

I would like to understand how the stack frame pushed by calling b() can access the value of x that lives in the stack frame pushed by a().
Is there a pointer from b()'s frame to a()'s frame? Or does the runtime copy the value of x into b()'s frame as a local variable? Or is there another mechanism under the hood?
This example is in Python, but is there a universal mechanism for this, or do different languages use different mechanisms?
>>> def a():
...     x = 5
...     def b():
...         return x + 2
...     return b()
...
>>> a()
7
In CPython (the implementation most people use) b itself contains a reference to the value. Consider this modification to your function:
def a():
    x = 5
    def b():
        return x + 2
    # b.__closure__[0] corresponds to x
    print(b.__closure__[0].cell_contents)
    x = 9
    print(b.__closure__[0].cell_contents)
When you call a, note that the value of the cell content changes with the local variable x.
The __closure__ attribute is a tuple of cell objects, one per variable that b closes over. The cell object basically has one interesting attribute, cell_contents, that acts like a reference to the variable it represents. (You can even assign to the cell_contents attribute to change the value of the variable, but I can't imagine when that would be a good idea.)
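Since CPython 3.7, cell_contents is also writable, so the "assign to the cell" remark can actually be demonstrated; a sketch using a made-up counter function:

```python
# Writing to cell_contents rebinds the closed-over variable from outside
# the function. This is a CPython detail (writable since 3.7) and rarely
# a good idea in real code.
def make_counter():
    count = 0
    def bump():
        nonlocal count
        count += 1
        return count
    return bump

counter = make_counter()
print(counter())  # 1
print(counter())  # 2

# Reach into the closure and reset the variable:
counter.__closure__[0].cell_contents = 100
print(counter())  # 101
```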

Python behavior for immutable default parameter values

>>> def a():
...     print "a executed"
...     return []
...
>>>
>>> def b(x=a()):
...     x.append(5)
...     print x
...
a executed
>>> b()
[5]
>>> b()
[5, 5]
x is bound to an empty list object when the function b is first defined. The empty list object gets modified each time b is called, because the default stays bound to that same object.
What I don't get is when this happens to immutable objects:
>>> def a():
...     print "a executed"
...     return 0
...
>>>
>>> def b(x=a()):
...     x = x + 2
...     print x
...
a executed
>>> b()
2
>>> b()
2
From my POV, x is bound to the int object 0 when the function b is first defined. Then, x is modified when b() is called. Therefore subsequent calls to b() should re-bind x to 2, 4, 6, and so on. Why doesn't this occur? I am obviously missing something important here!
Thx :)
When you do x = ... you're not modifying the object that x references, you're just rebinding the name x to a different object, in this case another int. Here it's even irrelevant whether x points to an immutable object: if you did x = x + [5] with lists, the default would also remain unchanged. Note the difference:
def b1(x=[]):
    x = x + [5]
    print(x)

def b2(x=[]):
    x.append(5)
    print(x)

print("b1:")
b1()
print("b1:")
b1()
print("b2:")
b2()
print("b2:")
b2()
Gives:
b1:
[5]
b1:
[5]
b2:
[5]
b2:
[5, 5]
When the function is being executed, you're working on a local variable x that either was initialized using the default value, or provided by the caller. So what gets rebound is the local variable x, not the default value for the parameter.
You may want to also read about the difference between formal and actual parameters. It's only slightly related to this problem, but may help you understand this better. An example explanation can be found here.
Careful, there's a huge difference between:
x.append(5)
and:
x = x + 1
Namely, the first mutates the object referenced by x whereas the second creates a new object which is the result of x + 1 and rebinds it to the name x.
Of course, this is a bit of an over-simplification -- e.g. what if you had used += ...
It really falls back on how __add__ and __iadd__ are defined in the first place, but this should get the point across ...
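As a concrete illustration of the __add__/__iadd__ point: for lists, x = x + [5] rebinds the name while x += [5] mutates in place, so only the second one changes a shared default. A sketch (the function names are made up):

```python
# list.__iadd__ extends the list in place, so `+=` on a mutable default
# changes the stored default itself; `x = x + [5]` builds a fresh list
# and leaves the default alone.
def plus(x=[]):
    x = x + [5]   # new list; the default is untouched
    return x

def inplace(x=[]):
    x += [5]      # extends the shared default in place
    return x

print(plus(), plus())        # [5] [5]
print(inplace(), inplace())  # [5] [5, 5]
```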
To go a little deeper, you can think of a function as an object or an instance of a class. It has some special attributes which you can even look at if you want to:
>>> lst = []
>>> def foo(x = lst): pass
...
>>> foo.func_defaults
([],)
>>> foo.func_defaults[0] is lst
True
When the function is defined, func_defaults¹ gets set. Every time the function gets called, Python looks at the defaults and at the arguments present in the call, and figures out which defaults to pass into the function and which values were provided already. The takeaway is that this is why, when you append to the list in the first case, the change persists -- you're actually changing the value stored in func_defaults too. In the second case, where you use x = x + 1, you're not doing anything to change func_defaults -- you're just creating a new object and binding it to a name in the function's namespace.
¹ The attribute is just __defaults__ in Python 3.x.
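The same inspection works in Python 3 via __defaults__; a sketch:

```python
# The mutated default is visible directly on the function object.
def foo(x=[]):
    x.append(1)
    return x

print(foo.__defaults__)  # ([],) before any call
foo()
foo()
print(foo.__defaults__)  # ([1, 1],): the default itself was mutated
```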

Static class variables and `self` in Python

Why do the examples below behave differently?
Example 1: foo seems to behave like a class variable that ends up specific to each object
class A:
    foo = 1

a, b = A(), A()
a.foo = 5
print b.foo
----------------
Output: 1
Example 2: foo seems to behave like a static class variable that is the same for all object. Perhaps the behavior has something to do with lists working as pointers.
class A:
    foo = []

a, b = A(), A()
a.foo.append(5)
print b.foo
----------------
Output: [5]
Example 3: Doesn't work
class A:
    self.foo = []

a, b = A(), A()
a.foo.append(5)
print b.foo
----------------
Output: Error
The first two examples are both class attributes. The reason they seem different is because you're not doing the same thing in both cases: you're assigning a new value in the first case and modifying the existing value in the second case.
Notice that you are not doing the same thing in the first two examples. In the first example you do a.foo = 5, assigning a new value. In the second example, if you did the analogous thing and assigned a.foo = [5], you would see the same kind of result as in the first example. But instead you altered the existing list with a.foo.append(5), so the behavior is different. a.foo = 5 changes only the variable (i.e., what value it points to); a.foo.append(5) changes the value itself.
(Notice that there is no way to do the equivalent of the second example in the first example. That is, there's nothing like a.foo.add(1) to add 1 to 5. That's because integers are not mutable but lists are. But what matters is not that lists "are" mutable, but that you mutated one. In other words, it doesn't matter what you can do with a list, it matters what you actually do in the specific code.)
Also, notice that although the foo you defined in the class definition is a class attribute, when you do a.foo = 5, you are creating a new attribute on the instance. It happens to have the same name as the class attribute, but it doesn't change the value of the class attribute, which is what b.foo still sees.
The last example doesn't work because, just like in the first two examples, code inside the class block is at the class scope. There is no self because there are no instances yet at the time the class is defined.
There are many, many other questions about this on StackOverflow and I urge you to search and read a bunch of them to gain a fuller understanding of how this works.
This doesn't work:
class A:
    self.foo = []
Which raises an error.
NameError: name 'self' is not defined
Because self is not a keyword in Python; it's just the name conventionally given to the instance of the class that is passed to a method when the method is called.
Here's an example:
class A(object):
    def __init__(self):
        self.foo = []

a, b = A(), A()
a.foo.append(5)
print(b.foo)
Which prints:
[]
When each one is initialized, they each get their own list which can be accessed by the attribute foo, and when one is modified, the other, being a separate list stored at a different place in memory, is not affected.
The difference has nothing to do with mutability/immutability as such, but with which operations are performed.
In example 1, the class has an attribute foo. After object creation, you give the object another attribute foo which shadows the former one. So the class attribute acts as a kind of "default" or "fallback".
In example 2, you have one object which you perform an operation on (which, admittedly, only works on mutable objects). So the object referred to by A.foo (which can also be accessed via a.foo and b.foo, due to the lack of an instance attribute with the same name) gets 5 appended to it.
Example 3 doesn't work because self doesn't exist where you use it.
Note that example 1 would as well work with mutable objects, such as lists:
class A:
    foo = []

a, b = A(), A()
a.foo = []
a.foo.append(5)
b.foo.append(10)
print a.foo # [5]
print b.foo # [10]
print A.foo # [10]
Here a.foo gets a new, empty list. b.foo, lacking an instance attribute, continues to refer to the class attribute. So we have two empty lists which are independent of each other, as we see when .append()ing.
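One more way to see the "fallback" behaviour described above: deleting the instance attribute re-exposes the class attribute underneath. A sketch:

```python
# An instance attribute shadows the class attribute of the same name;
# removing it makes lookup fall back to the class again.
class A:
    foo = 1

a = A()
a.foo = 5     # creates an instance attribute that shadows A.foo
print(a.foo)  # 5
del a.foo     # removes only the instance attribute
print(a.foo)  # 1: lookup falls back to the class attribute
```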

Understanding reference count of class variable

This is an attempt to better understand how reference count works in Python.
Let's create a class and instantiate it. The instance's reference count would be 1 (getrefcount displays 2 because its own argument temporarily references the instance, increasing the count by 1):
>>> from sys import getrefcount as grc
>>> class A():
...     def __init__(self):
...         self.x = 100000
>>> a = A()
>>> grc(a)
2
a's internal variable x has 2 references:
>>> grc(a.x)
3
I expected it to be referenced by a and by A's __init__ method. Then I decided to check.
So I created a temporary variable b in the __main__ namespace just to be able to access the variable x. It increased the ref-number by 1 for it to become 3 (as expected):
>>> b = a.x
>>> grc(a.x)
4
Then I deleted the class instance and the ref count decreased by 1:
>>> del a
>>> grc(b)
3
So now there are 2 references: one is by b and one is by A (as I expected).
By deleting A from __main__ namespace I expect the count to decrease by 1 again.
>>> del A
>>> grc(b)
3
But it doesn't happen. There is no class A or its instances that may reference 100000, but still it's referenced by something other than b in __main__ namespace.
So, my question is, what is 100000 referenced by apart from b?
BrenBarn suggested that I should use object() instead of a number which may be stored somewhere internally.
>>> class A():
...     def __init__(self):
...         self.x = object()
>>> a = A()
>>> b = a.x
>>> grc(a.x)
3
>>> del a
>>> grc(b)
2
After deleting the instance a, there was only one reference left, from b, which is very logical.
The only thing left to understand is why it doesn't work that way with the number 100000.
a.x is the integer 100000. This constant is referenced by the code object corresponding to the __init__() method of A. Code objects always hold references to all literal constants in the code:
>>> def f(): return 100000
>>> f.__code__.co_consts
(None, 100000)
The line
del A
only deletes the name A and decreases the reference count of A. In Python 3.x (but not in 2.x), classes often include some cyclic references, and hence are only garbage collected when you explicitly run the garbage collector. And indeed, using
import gc
gc.collect()
after del A does lead to the reduction of the reference count of b.
It's likely that this is an artifact of your using an integer as your test value. Python sometimes stores integer objects for later re-use, because they are immutable. When I run your code using self.x = object() instead (which will always create a brand-new object for x) I do get grc(b)==2 at the end.
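The caching mentioned here is easy to observe (it is a CPython implementation detail, not a language guarantee):

```python
# CPython caches small integers (roughly -5 through 256) and reuses them,
# which is one way an int can pick up extra references behind your back.
a = 256
b = 256
print(a is b)  # True: both names point at the cached 256

x = object()
y = object()
print(x is y)  # False: object() always makes a brand-new object
```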

How should I declare default values for instance variables in Python?

Should I give my class members default values like this:
class Foo:
num = 1
or like this?
class Foo:
def __init__(self):
self.num = 1
In this question I discovered that in both cases,
bar = Foo()
bar.num += 1
is a well-defined operation.
I understand that the first method will give me a class variable while the second one will not. However, if I do not require a class variable, but only need to set a default value for my instance variables, are both methods equally good? Or one of them more 'pythonic' than the other?
One thing I've noticed is that in the Django tutorial, they use the second method to declare Models. Personally I think the second method is more elegant, but I'd like to know what the 'standard' way is.
Extending bp's answer, I wanted to show you what he meant by immutable types.
First, this is okay:
>>> class TestB():
...     def __init__(self, attr=1):
...         self.attr = attr
...
>>> a = TestB()
>>> b = TestB()
>>> a.attr = 2
>>> a.attr
2
>>> b.attr
1
However, this only works for immutable (unchangeable) types. If the default value were mutable (changeable in place), this would happen instead:
>>> class Test():
...     def __init__(self, attr=[]):
...         self.attr = attr
...
>>> a = Test()
>>> b = Test()
>>> a.attr.append(1)
>>> a.attr
[1]
>>> b.attr
[1]
>>>
Note that both a and b have a shared attribute. This is often unwanted.
This is the Pythonic way of defining default values for instance variables, when the type is mutable:
>>> class TestC():
...     def __init__(self, attr=None):
...         if attr is None:
...             attr = []
...         self.attr = attr
...
>>> a = TestC()
>>> b = TestC()
>>> a.attr.append(1)
>>> a.attr
[1]
>>> b.attr
[]
The reason my first snippet of code works is because, with immutable types, Python creates a new instance of it whenever you want one. If you needed to add 1 to 1, Python makes a new 2 for you, because the old 1 cannot be changed. The reason is mostly for hashing, I believe.
The two snippets do different things, so it's not a matter of taste but a matter of what's the right behaviour in your context. Python documentation explains the difference, but here are some examples:
Exhibit A
class Foo:
    def __init__(self):
        self.num = 1
This binds num to the Foo instances. Changes to this field on one instance are not propagated to the other instances.
Thus:
>>> foo1 = Foo()
>>> foo2 = Foo()
>>> foo1.num = 2
>>> foo2.num
1
Exhibit B
class Bar:
    num = 1
This binds num to the Bar class. Changes are propagated!
>>> bar1 = Bar()
>>> bar2 = Bar()
>>> bar1.num = 2 #this creates an INSTANCE variable that HIDES the propagation
>>> bar2.num
1
>>> Bar.num = 3
>>> bar2.num
3
>>> bar1.num
2
>>> bar1.__class__.num
3
Actual answer
If I do not require a class variable, but only need to set a default value for my instance variables, are both methods equally good? Or one of them more 'pythonic' than the other?
The code in exhibit B is plain wrong for this: why would you want a per-instance default value to live on the class, as a single attribute shared by every instance?
The code in exhibit A is okay.
If you want to give defaults for instance variables in your constructor I would however do this:
class Foo:
    def __init__(self, num=None):
        self.num = num if num is not None else 1
...or even:
class Foo:
    DEFAULT_NUM = 1
    def __init__(self, num=None):
        self.num = num if num is not None else Foo.DEFAULT_NUM
...or even (preferable, but if and only if you are dealing with immutable types!):
class Foo:
    def __init__(self, num=1):
        self.num = num
This way you can do:
foo1 = Foo(4)
foo2 = Foo()  # use default
Using class members to give default values works very well just so long as you are careful only to do it with immutable values. If you try to do it with a list or a dict that would be pretty deadly. It also works where the instance attribute is a reference to a class just so long as the default value is None.
I've seen this technique used very successfully in repoze which is a framework that runs on top of Zope. The advantage here is not just that when your class is persisted to the database only the non-default attributes need to be saved, but also when you need to add a new field into the schema all the existing objects see the new field with its default value without any need to actually change the stored data.
I find it also works well in more general coding, but it's a style thing. Use whatever you are happiest with.
With dataclasses, a feature added in Python 3.7, there is now yet another (quite convenient) way to achieve setting default values on class instances. The decorator dataclass will automatically generate a few methods on your class, such as the constructor. As the documentation linked above notes, "[t]he member variables to use in these generated methods are defined using PEP 526 type annotations".
Considering OP's example, we could implement it like this:
from dataclasses import dataclass

@dataclass
class Foo:
    num: int = 0
When constructing an object of this class's type we could optionally overwrite the value.
print('Default val: {}'.format(Foo()))
# Default val: Foo(num=0)
print('Custom val: {}'.format(Foo(num=5)))
# Custom val: Foo(num=5)
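For mutable defaults, dataclasses refuse a bare list and make you use a factory instead, which sidesteps the shared-default problem discussed throughout this page; a sketch with a made-up Box class:

```python
from dataclasses import dataclass, field

# A mutable default like `items: list = []` raises ValueError at class
# definition time; a default_factory runs once per instance, so every
# instance gets its own fresh list.
@dataclass
class Box:
    items: list = field(default_factory=list)

a = Box()
b = Box()
a.items.append(1)
print(a.items)  # [1]
print(b.items)  # []: each instance got its own list from the factory
```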
Using class members for default values of instance variables is not a good idea, and it's the first time I've seen this idea mentioned at all. It works in your example, but it may fail in a lot of cases. E.g., if the value is mutable, mutating it on an unmodified instance will alter the default:
>>> class c:
...     l = []
...
>>> x = c()
>>> y = c()
>>> x.l
[]
>>> y.l
[]
>>> x.l.append(10)
>>> y.l
[10]
>>> c.l
[10]
You can also declare class variables as None, which prevents the propagation problem (since assigning a real value later always creates an instance attribute). This is useful when you need a well-defined class and want to prevent AttributeErrors.
For example:
>>> class TestClass(object):
...     t = None
...
>>> test = TestClass()
>>> test.t
>>> test2 = TestClass()
>>> test.t = 'test'
>>> test.t
'test'
>>> test2.t
>>>
Also if you need defaults:
>>> class TestClassDefaults(object):
...     t = None
...     def __init__(self, t=None):
...         self.t = t
...
>>> test = TestClassDefaults()
>>> test.t
>>> test2 = TestClassDefaults([])
>>> test2.t
[]
>>> test.t
>>>
Of course still follow the info in the other answers about using mutable vs immutable types as the default in __init__.
