Understanding reference count of class variable

This is an attempt to better understand how reference count works in Python.
Let's create a class and instantiate it. The instance's reference count would be 1 (getrefcount displays 2 because the argument passed to getrefcount is itself a temporary reference to that instance, increasing the count by 1):
>>> from sys import getrefcount as grc
>>> class A():
...     def __init__(self):
...         self.x = 100000
...
>>> a = A()
>>> grc(a)
2
a's internal variable x has 2 references:
>>> grc(a.x)
3
I expected it to be referenced by a and by A's __init__ method. Then I decided to check.
So I created a temporary variable b in the __main__ namespace just to be able to access the variable x. It increased the reference count by 1, to 3 (as expected):
>>> b = a.x
>>> grc(a.x)
4
Then I deleted the class instance and the ref count decreased by 1:
>>> del a
>>> grc(b)
3
So now there are 2 references: one held by b and one held by A (as I expected).
By deleting A from the __main__ namespace, I expect the count to decrease by 1 again.
>>> del A
>>> grc(b)
3
But it doesn't happen. There is no longer a class A or any instance of it that could reference 100000, yet it is still referenced by something other than b in the __main__ namespace.
So, my question is, what is 100000 referenced by apart from b?
BrenBarn suggested that I should use object() instead of a number, which may be stored somewhere internally.
>>> class A():
...     def __init__(self):
...         self.x = object()
...
>>> a = A()
>>> b = a.x
>>> grc(a.x)
3
>>> del a
>>> grc(b)
2
After deleting the instance a, only one reference remained, the one held by b, which is very logical.
The only thing left to understand is why it's not that way with the number 100000.

a.x is the integer 100000. This constant is referenced by the code object corresponding to the __init__() method of A. Code objects keep references to the literal constants used in the code:
>>> def f(): return 10000
>>> f.__code__.co_consts
(None, 10000)
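To check this in the question's own setup, here is a small sketch (the name consts is mine). In CPython the integer stored on the instance should be the very same object that sits in the co_consts tuple of __init__'s code object; the other contents of that tuple vary between Python versions, so only the identity is checked.

```python
class A:
    def __init__(self):
        self.x = 100000

a = A()
# The code object behind A.__init__ keeps its literal constants in co_consts.
consts = A.__init__.__code__.co_consts
print(consts)  # includes 100000; the other entries differ between versions
# The value stored on the instance is the very same object as that constant.
print(any(c is a.x for c in consts))
```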
The line
del A
only deletes the name A and decreases the reference count of the class object. In Python 3.x (but not in 2.x), classes contain cyclic references (for example, A.__mro__ contains A itself), so they are only freed when the cyclic garbage collector runs, either automatically at some later point or when you invoke it explicitly. And indeed, using
import gc
gc.collect()
after del A does lead to the reduction of the reference count of b.
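A sketch of the same experiment that watches the class through a weak reference instead of reading raw reference counts (the name class_ref is mine), assuming CPython with the garbage collector at its default settings. Watching the class rather than grc(b) is deliberate: when this runs as a saved script instead of at the interactive prompt, the module's own code object may still hold __init__'s code, and with it the constant 100000, so b's count need not drop.

```python
import gc
import weakref

class A:
    def __init__(self):
        self.x = 100000

a = A()
b = a.x                      # our own reference to the attribute value
class_ref = weakref.ref(A)   # observe the class without keeping it alive

del a
del A                        # removes the name; the class is still held by its own cycles
alive_before_collect = class_ref() is not None   # normally True at this point
gc.collect()                 # the cycle collector finds and frees the unreachable class
freed_after_collect = class_ref() is None
print(alive_before_collect, freed_after_collect)
```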

It's likely that this is an artifact of your using an integer as your test value. Python sometimes stores integer objects for later re-use, because they are immutable. When I run your code using self.x = object() instead (which will always create a brand-new object for x) I do get grc(b)==2 at the end.
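Because sys.getrefcount() counts the temporary reference created by passing the object as its argument, its absolute numbers are one higher than you might expect and can differ between CPython versions. A sketch (the names before, after and alias are mine) that compares two readings instead, using object() as suggested above:

```python
import sys

obj = object()
before = sys.getrefcount(obj)   # includes the temporary reference held by the argument
alias = obj                     # bind a second name to the same object
after = sys.getrefcount(obj)
print(before, after)            # absolute values can vary between CPython versions
print(after - before)           # the new name added exactly one reference
```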

Related

How do closures see context variables on the stack?

I would like to understand how the stack frame pushed by calling b() can access the value of x that lives in the stack frame pushed by a().
Is there a pointer from the b() frame to the a() frame? Or does the runtime copy the value of x as a local variable in the b() frame? Or is there another mechanism under the hood?
This example is in Python, but is there a universal mechanism for this, or do different languages use different mechanisms?
>>> def a():
...     x = 5
...     def b():
...         return x + 2
...     return b()
...
>>> a()
7

In CPython (the implementation most people use), b itself holds a reference to a cell object that contains the value. Consider this modification to your function:
def a():
    x = 5
    def b():
        return x + 2
    # b.__closure__[0] corresponds to x
    print(b.__closure__[0].cell_contents)
    x = 9
    print(b.__closure__[0].cell_contents)
When you call a, note that the value of the cell content changes with the local variable x.
The __closure__ attribute is a tuple of cell objects, one per variable that b closes over. The cell object basically has one interesting attribute, cell_contents, that acts like a reference to the variable it represents. (You can even assign to the cell_contents attribute to change the value of the variable, but I can't imagine when that would be a good idea.)
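To address the question of whether b()'s frame points to a()'s frame, here is a sketch with hypothetical names (make_counter, increment). The inner function keeps working after the outer function has returned and its frame is gone, because in CPython what it holds is the cell, not a pointer to the outer frame:

```python
def make_counter():
    count = 0
    def increment():
        nonlocal count        # rebinds the shared cell, not a local copy
        count += 1
        return count
    return increment          # make_counter's frame is finished after this returns

counter = make_counter()
counter()
counter()
print(make_counter.__code__.co_cellvars)     # variables the outer function keeps in cells
print(counter.__code__.co_freevars)          # variables the inner function reads from cells
print(counter.__closure__[0].cell_contents)  # the cell still holds the updated value
```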

How to create cyclic references that still get garbage collected out of scope?

This is the test setup:
import weakref, gc

class A:
    pass

def test():
    a = A()
    b = A()
    b.a = a
    return weakref.ref(a), weakref.ref(b)

r1, r2 = test()
print(r1(), r2())
# None None

def test2():
    a = A()
    a.b = b = A()
    b.a = a
    return weakref.ref(a), weakref.ref(b)

r1, r2 = test2()
print(r1(), r2())
# <__main__.A object at 0x7f2c2521de80> <__main__.A object at 0x7f2c3404ef28>
I'd expect that since both a and b are out of scope, they'd be garbage collected and disappear. But because they both hold references to each other, it seems like they keep each other alive.
How can I keep cyclic references in objects but still have them disappear out of scope?

The cyclic garbage collector doesn't run all the time, only occasionally or when you ask for it. Reference counting frees an object as soon as its count reaches zero, but objects in a cycle keep each other's counts above zero, so only the cycle collector can reclaim them. Try running gc.collect() before the final print. Or [[] for _ in range(gc.get_threshold()[0])], which should also trigger it (it worked for me in your example, but if the objects are already in generation 1 or 2, this might not suffice). Though I just mean that to illustrate the process, not suggesting to actually use that in production :-)
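A sketch of the suggestion applied to test2 from the question, assuming CPython with the default gc settings: once gc.collect() has run, both weak references should be dead.

```python
import gc
import weakref

class A:
    pass

def test2():
    a = A()
    a.b = b = A()
    b.a = a                   # a -> b -> a: a reference cycle
    return weakref.ref(a), weakref.ref(b)

r1, r2 = test2()
gc.collect()                  # run the cycle detector explicitly
print(r1(), r2())             # both objects have been reclaimed
```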

To use init or not to in Python classes

I have always defined variables for classes like:
class A():
    def __init__(self):
        self.x = 1
However, I discovered it is also simply possible to use:
class A():
    x = 1
In both cases, a new instance will have a variable x with a value of 1.
Is there any difference?

For further reading, the Python Tutorial chapter on classes discusses this in detail. A summary follows:
There is a difference as soon as mutable data structures come into play.
>>> class A:
...     x = [1]
...
>>> a1 = A()
>>> a2 = A()
>>> a1.x.append(2)
>>> a1.x
[1, 2]
>>> a2.x
[1, 2]
In that case, the same instance of x is used for both class instances. When using __init__, new instances are created when a new A instance is created:
>>> class A:
...     def __init__(self):
...         self.x = [1]
...
>>> a1 = A()
>>> a2 = A()
>>> a1.x.append(2)
>>> a1.x
[1, 2]
>>> a2.x
[1]
In the first example, a list is created and bound to A.x. This can be accessed both using A.x and using A().x (for any A(), such as a1 or a2). They all share the same list object.
In the second example, A does not have an attribute x. Instead, the objects receive an attribute x during initialization, which is distinct for each object.
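The same two patterns side by side, as a small sketch (the class names Shared and PerInstance are mine). vars() shows where the attribute actually lives:

```python
class Shared:
    x = [1]              # one list, stored on the class

class PerInstance:
    def __init__(self):
        self.x = [1]     # a new list for every instance

s1, s2 = Shared(), Shared()
p1, p2 = PerInstance(), PerInstance()

print(s1.x is s2.x is Shared.x)                       # all three names reach the same list
print(p1.x is p2.x)                                   # each instance got its own list
print("x" in vars(Shared), "x" in vars(PerInstance))  # only Shared has x on the class
print("x" in vars(s1), "x" in vars(p1))               # only p1 has x on the instance
```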

Your question is very imprecise. You speak about "variables for classes", but later you say "instance will have a variable". In fact, your examples are reversed. Second one shows a class A with a variable x, and the first one shows a class A with no variable x, but whose every instance (after __init__, unless deleted) has a variable x.
If the value is immutable, there is not much difference, since when you have a=A() and a doesn't have a variable x, a.x automatically delegates to A.x. But if the value is mutable, then it matters, since there is only one x in the second example, and as many xs as there are instances (zero, one, two, seventeen,...) in the first one.
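A sketch of the delegation described above for an immutable value: the instance falls back to A.x until it gets its own x, and falls back again once that instance attribute is deleted.

```python
class A:
    x = 1

a = A()
print(a.x, "x" in vars(a))  # found on the class, not the instance
a.x = 2                     # creates an instance attribute that shadows A.x
print(a.x, A.x)             # instance value, class value unchanged
del a.x                     # remove the instance attribute again
print(a.x)                  # lookup falls back to A.x
```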

Static class variables and `self` in Python

Why do the examples below behave differently?
Example 1: foo seems to behave like a class variable that is specific to each object
class A:
    foo = 1
a, b = A(), A()
a.foo = 5
print(b.foo)
----------------
Output: 1
Example 2: foo seems to behave like a static class variable that is the same for all objects. Perhaps the behavior has something to do with lists working as pointers.
class A:
    foo = []
a, b = A(), A()
a.foo.append(5)
print(b.foo)
----------------
Output: [5]
Example 3: Doesn't work
class A:
    self.foo = []
a, b = A(), A()
a.foo.append(5)
print(b.foo)
----------------
Output: NameError: name 'self' is not defined

In the first two examples, foo is a class attribute. The reason they seem different is that you're not doing the same thing in both cases: you're assigning a new value in the first case and modifying the existing value in the second case.
In the first example you do a.foo = 5, assigning a new value. In the second example, if you did the analogous thing, assigning a.foo = [5], you would see the same kind of result as in the first example. But instead you altered the existing list with a.foo.append(5), so the behavior is different. a.foo = 5 changes only the variable (i.e., what value it points to); a.foo.append(5) changes the value itself.
(Notice that there is no way to do the equivalent of the second example in the first example. That is, there's nothing like a.foo.add(1) to add 1 to 5. That's because integers are not mutable but lists are. But what matters is not that lists "are" mutable, but that you mutated one. In other words, it doesn't matter what you can do with a list, it matters what you actually do in the specific code.)
Also, notice that although the foo you defined in the class definition is a class attribute, when you do a.foo = 5, you are creating a new attribute on the instance. It happens to have the same name as the class attribute, but it doesn't change the value of the class attribute, which is what b.foo still sees.
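A sketch combining both cases on one class (names taken from the question): mutating through a.foo changes the shared class list, while assigning to a.foo only creates an instance attribute that shadows it.

```python
class A:
    foo = []

a, b = A(), A()
a.foo.append(5)       # mutation: changes the single list stored on A
print(b.foo)          # b sees the change through the class attribute
a.foo = [7]           # rebinding: creates an instance attribute on a only
print(vars(a))        # the new attribute lives in a's own namespace
print(b.foo, A.foo)   # the class attribute is untouched by the rebinding
```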
The last example doesn't work because, just like in the first two examples, code inside the class block is at the class scope. There is no self because there are no instances yet at the time the class is defined.
There are many, many other questions about this on StackOverflow and I urge you to search and read a bunch of them to gain a fuller understanding of how this works.

This doesn't work:
class A:
    self.foo = []
Which raises an error.
NameError: name 'self' is not defined
Because self is not a keyword in Python; it's just the name conventionally given to a method's first parameter, which is bound to the instance when the method is called on it. Inside the class body there is no such parameter, so the name self doesn't exist there.
Here's an example:
class A(object):
    def __init__(self):
        self.foo = []
a, b = A(), A()
a.foo.append(5)
print(b.foo)
This prints:
[]
When each one is initialized, they each get their own list which can be accessed by the attribute foo, and when one is modified, the other, being a separate list stored at a different place in memory, is not affected.
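Since self is only a convention, any parameter name works; this sketch uses this purely to demonstrate that point. PEP 8 still recommends always naming the first parameter of instance methods self.

```python
class A:
    def __init__(this):   # 'this' works: the first parameter's name is only a convention
        this.foo = []

a, b = A(), A()
a.foo.append(5)
print(a.foo, b.foo)       # each instance has its own list
```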

The difference has not to do with mutability/immutability, but what operations are performed.
In example 1, the class has an attribute foo. After object creation, you give the object another attribute foo which shadows the former one. So the class attribute acts as a kind of "default" or "fallback".
In example 2, you have one object on which you perform an operation (which, admittedly, only works on mutable objects). So the object referred to by A.foo, which can also be accessed via a.foo and b.foo because there is no instance attribute with the same name, has 5 appended to it.
Example 3 doesn't work because self doesn't exist where you use it.
Note that example 1 would work just as well with mutable objects, such as lists:
class A:
    foo = []
a, b = A(), A()
a.foo = []
a.foo.append(5)
b.foo.append(10)
print(a.foo)  # [5]
print(b.foo)  # [10]
print(A.foo)  # [10]
Here a.foo gets a new, empty list. b.foo, lacking an instance attribute, continues to refer to the class attribute (which is why A.foo also shows [10]). So we have two empty lists that are independent of each other, as the .append() calls show.

distinguish between instances of class

I'm completely new to Python. Following this guide: http://roguebasin.roguelikedevelopment.org/index.php/Complete_Roguelike_Tutorial,_using_python%2Blibtcod
I have a simple question: When all the monsters have been created here, how does Python distinguish between each instance of the class? As far as I can tell, all the instances are named "monster".
def place_objects(room):
    #choose random number of monsters
    num_monsters = libtcod.random_get_int(0, 0, MAX_ROOM_MONSTERS)
    for i in range(num_monsters):
        #choose random spot for this monster
        x = libtcod.random_get_int(0, room.x1, room.x2)
        y = libtcod.random_get_int(0, room.y1, room.y2)
        #only place it if the tile is not blocked
        if not is_blocked(x, y):
            if libtcod.random_get_int(0, 0, 100) < 80:  #80% chance of getting an orc
                #create an orc
                fighter_component = Fighter(hp=10, defense=0, power=3, death_function=monster_death)
                ai_component = BasicMonster()
                monster = Object(x, y, 'o', 'orc', libtcod.desaturated_green,
                    blocks=True, fighter=fighter_component, ai=ai_component)
            else:
                #create a troll
                fighter_component = Fighter(hp=16, defense=1, power=4, death_function=monster_death)
                ai_component = BasicMonster()
                monster = Object(x, y, 'T', 'troll', libtcod.darker_green,
                    blocks=True, fighter=fighter_component, ai=ai_component)
            objects.append(monster)

Each object is stored at a different memory location. That's how you tell them apart.
Use the built-in function id():
id(object)
Return the “identity” of an object. This is an integer (or long integer) which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.
Documentation also says
CPython implementation detail: This is the address of the object in memory.
Example:
>>> class Foo:
...     pass
...
>>> x = Foo()
>>> y = Foo()
>>> id(x)
17385736
>>> id(y)
20391336

Creating different identical instances of a class produces different objects, which have different ids.
>>> class A(object):
...     pass
...
>>>
>>> x = A()
>>> y = A()
>>> z = A()
>>> x
<__main__.A object at 0x10049dbd0>
>>> y
<__main__.A object at 0x10049db90>
>>> z
<__main__.A object at 0x10049dc10>
>>> x == y
False
>>>
They also have different hash values:
>>> x.__hash__()
268737981
>>> y.__hash__()
268737977
>>> z.__hash__()
268737985
>>> x.__hash__() == y.__hash__()
False

Python (CPython) uses reference counting to keep track of the objects.
This line creates an object and binds it to a name.
foo = MyObject()
The object is now referenced by one entity (the foo name) so its reference count is 1.
When you create a new reference, the count is increased. When you delete a reference, the count is decreased.
baz = foo
This created a new reference so now it's 2.
Now say all this was in a function:
def test():
    foo = MyObject()
    baz = foo
After the function is done executing, all its local variables are deleted. This means they stop referencing the objects and the counts go down. In this case the count will reach 0 and the MyObject instance will be freed from memory.
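A sketch of this in CPython, using MyObject from above with an empty body (an assumption) and a weak reference to observe the object without keeping it alive:

```python
import weakref

class MyObject:
    pass

def test():
    foo = MyObject()
    baz = foo                  # a second strong reference to the same object
    return weakref.ref(baz)    # a weak reference does not add to the count

r = test()
print(r())    # None in CPython: both locals went away when test() returned
```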
To keep objects alive, you have to keep references to them. If you don't have a reference, the object itself is most likely gone already.
That's why your code collects all monsters in a list called objects (objects.append(monster)). A list holds a reference to every object it contains, so when you rebind the name "monster" to another instance, the previous instance will not be deleted.
The only thing you're missing is:
objects = []
at the start of your function and:
return objects
at the end.

We don't have variables in Python, just names (references) that act like tags attached to their values.
So when you do:
monster = Object(x, y, 'T', 'troll')
You're just attaching the name "monster" to the instance of Object('troll'), and when you do:
objects.append(monster)
You're attaching the tag objects[0] to that instance.
This means that when the loop runs again and the tag "monster" is moved onto another instance, of Object('orc') for example, you don't lose the instance of Object('troll'), because there's still at least one tag attached to it: objects[0].
So you really don't need to keep track of every instance; you just need to use the right tag.
I hope this helped you clear some of your doubts on how python works.
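A sketch of the idea from these answers, using a hypothetical Monster class as a stand-in for the tutorial's Object: the name monster is reused on every pass, but the list keeps each instance alive, and each live instance has its own identity.

```python
class Monster:
    # Hypothetical stand-in for the tutorial's Object class
    def __init__(self, name):
        self.name = name

objects = []
for name in ["orc", "troll", "orc"]:
    monster = Monster(name)    # the name 'monster' is rebound on every pass
    objects.append(monster)    # the list holds its own reference to each instance

print(len(objects))                    # all three instances are still alive
print(objects[0] is objects[2])        # two distinct orc objects
print(len({id(m) for m in objects}))   # each live object has its own id
print(monster is objects[-1])          # 'monster' now names only the last one
```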
