How does garbage collection in Python work with class methods?

class example:
    def exampleMethod(self):
        aVar = 'some string'
        return aVar
In this example, how does garbage collection work after each call to example.exampleMethod()? Will aVar be deallocated once the method returns?

The variable is never deallocated.
The object (in this case a string with the value 'some string') is reused again and again, so that object can never be deallocated.
Objects are deallocated when no variable refers to the object. Consider this:
a = 'hi mom'
a = 'next value'
In this case, the first object (a string with the value 'hi mom') is no longer referenced anywhere in the script when the second statement is executed. The object ('hi mom') can be removed from memory.

Every time you assign an object to a variable, you increase this object's reference counter.
a = MyObject() # +1, so it's at 1
b = a # +1, so it's now 2
a = 'something else' # -1, so it's 1
b = 'something else' # -1, so it's 0
No one can access the MyObject instance we created on the first line anymore.
When the counter reaches zero, the garbage collector frees the memory.
There is a way to make a reference that does not increase the reference counter, a weak reference (e.g. if you don't want an object to be held in memory just because it's in some cache dict).
More on CPython's reference counting can be found in its documentation.
Python is a language; CPython is its (quite popular) implementation. As far as I know, the language itself doesn't specify how memory is freed.
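To see the counter move, here is a minimal sketch that assumes CPython and uses sys.getrefcount. MyObject is just a placeholder class, and only the differences between the readings are meaningful, because getrefcount itself temporarily holds a reference to its argument.

```python
import sys

class MyObject:
    pass

a = MyObject()
base = sys.getrefcount(a)          # includes the temporary reference made by the call

b = a                              # a second name bound to the same object
after_bind = sys.getrefcount(a)

a = "something else"               # rebinding a drops one reference; b still holds the object
after_rebind = sys.getrefcount(b)

print(after_bind - base)           # 1 in CPython
print(after_rebind - base)         # 0 in CPython
```

Other implementations may not count references at all, so treat the numbers as a CPython detail.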

From your example, if you call example.exampleMethod() without assigning the result (e.g. a = example.exampleMethod()), then the object will be deallocated straight away (in CPython), as CPython uses a reference counting mechanism. Strings aren't a very good example to use, because they also have a number of implementation-specific optimizations. Strings can be cached, and are not deallocated so that they can be reused. This is especially useful because strings are very commonly used as keys in dicts.
Again, garbage collection is specific to each implementation, so CPython, Jython and IronPython will behave differently; most of these behaviours are documented on the respective sites/manuals/code/etc. If you want to explore a bit, I'd suggest creating a class where you have defined the __del__() method, which will be called when the object is garbage collected (it's the destructor). Make it print something so you can trace its call :)

As in Nico's answer, it depends on what you do with the result returned by exampleMethod. Python (or CPython anyway) uses reference counting. During the method, aVar references the string; when the method returns, the variable aVar is deleted, which may leave no references to the object, in which case the object is deleted.
Below is an example with a custom class that has a destructor (__del__(self)) that prints "Object 1 being destructed" or similar. gc is the garbage collector module, which automatically deletes objects that are no longer referenced. The gc.collect() calls are there for convenience, as otherwise there is no guarantee when the garbage collector is run.
import gc
class Noisy(object):
    def __init__(self, n):
        self.n = n
    def __del__(self):
        print("Object " + str(self.n) + " being destructed")
class example(object):
    def exampleMethod(self, n):
        aVar = Noisy(n)
        return aVar
a = example()
a.exampleMethod(1)
b = a.exampleMethod(2)
gc.collect()
print("Before b is deleted")
del b
gc.collect()
print("After b is deleted")
The result should be as follows:
Object 1 being destructed
Before b is deleted
Object 2 being destructed
After b is deleted
Notice that the first Noisy object is deleted after the method is returned, as it is not assigned to a variable, so has a reference count of 0, but the second one is deleted only after the variable b is deleted, leaving a reference count of 0.

Related

When is the reference count for a local variable in a python function decreased?

I have the following function:
def myfn():
    big_obj = BigObj()
    result = consume(big_obj)
    return result
When is the reference count for the value of BigObj() increased / decreased:
Is it:
when consume(big_obj) is called (since big_obj is not referenced afterwards in myfn)
when the function returns
at some other point that I don't know yet
Would it make a difference to change the last line to:
return consume(big_obj)
Edit (clarification for comments):
A local variable exists until the function returns
the reference can be deleted with del obj
But what about temporaries (e.g. f1(f2()))?
I checked references to temporaries with this code:
import sys
def f2(c):
    print("f2: References to c", sys.getrefcount(c))
def f0():
    print("f0")
    f2(object())
def f1():
    c = object()
    print("f1: References to c", sys.getrefcount(c))
    f2(c)
f0()
f1()
This prints:
f0
f2: References to c 3
f1: References to c 2
f2: References to c 4
It seems that references to temporary variables are held. Note that getrefcount gives one more than you would expect because it holds a reference, too.
When is the reference count for big_obj decreased
big_obj does not have a reference count. Variables don't have reference counts. Values do.
big_obj = BigObj()
This line of code creates an instance of the BigObj class. The reference count for that instance may increase or decrease multiple times, depending on the implementation details of that creation process (which is not necessarily written in Python). Notably, though, the assignment to the name big_obj increases the reference count.
when the function returns
At this point, the name big_obj ceases to exist - the name does not disappear simply because it won't be used again. (That's really hard to detect in the general case, and there isn't a particular benefit to it normally). If you must cause a name to cease to exist at a specific point in the operation (for example, because you know that is the last reference and want to trigger garbage collection; or perhaps because you are doing something tricky with __weakref__) then that is what the del statement is for.
Because a name for the object ceases to exist, its reference count decreases. If and when that count reaches zero, the object is garbage collected. It may have any number of references stored in other places, for a wide variety of reasons. (For example, there could be a bug in C code that implements the class; or the class might deliberately maintain its own list of every instance ever created.)
Note that all of the above pertains specifically to the reference implementation. In other implementations, things will differ. There might be some other trigger for garbage collection to happen. There might not be reference counting at all (as with Jython).
From the comments, it seems like what you are worried about is the potential for a memory leak. The code that you show cannot cause a memory leak - but neither can it fix a memory leak caused elsewhere. In Python, as in garbage-collected languages in general, memory leaks happen because objects hold on to references to each other that aren't needed. But there is no concept of "ownership" or "transfer" of references normally - all you need to do is not do things like "maintain a list of every instance ever created" without a) a good reason and b) a way to take instances out of that list when you want to forget about them.
A local variable, though, by definition, cannot extend the lifetime of an object beyond the local scope.
Disclaimer: Most of this information is from the comments, so credit goes to everyone who participated in the discussion.
When an object is deleted is, in general, an implementation detail.
I will refer to CPython, which is based on reference counting. I ran the code examples with CPython 3.10.0.
An object is deleted when its reference count hits zero.
Returning from a function deletes all local references.
Assigning a new value to a name decreases the reference count of the old value.
Passing a local to a function increases the reference count. The reference is on the stack (frame).
Returning from a function removes the reference from the stack.
The last point is valid even for temporary references as in f(g()). The last reference to the result of g() is deleted when f returns (assuming that g does not save a reference somewhere).
So for the example from the question:
def myfn():
    big_obj = BigObj()          # reference 1
    result = consume(big_obj)   # reference 2 on the stack frame for
                                # consume. Not yet counting any
                                # reference inside of consume
                                # after consume returns: The stack frame
                                # and reference 2 are deleted. Reference
                                # 1 remains
    return result               # myfn returns: reference 1 is deleted.
                                # BigObj is deleted
def consume(big_obj):
    pass                        # consume is holding reference 3
If we would change this to:
def myfn():
    return consume(BigObj())    # reference is still saved on the stack
                                # frame and will be only deleted after
                                # consume returns
def consume(big_obj):
    pass                        # consume is holding reference 2
How can I check reliably, if an object was deleted?
You cannot rely on gc.get_objects(). gc is used to detect and recycle reference cycles. Not every reference is tracked by the gc.
You can create a weak reference and check if the reference is still valid.
import sys
import weakref
class BigObj:
    pass
ref = None
def make_ref(obj):
    global ref
    ref = weakref.ref(obj)
    return obj
def myfn():
    return consume(make_ref(BigObj()))
def consume(obj):
    obj = None  # remove to see impact on ref count
    print(sys.getrefcount(ref()))
    print(ref())  # There is still a valid reference. It is the one held for the call to consume
myfn()
How to pass a reference to a function and remove all references in the calling function?
You can box the reference, pass to the function and clear the boxed reference from inside the function:
class Ref:
    def __init__(self, ref):
        self.ref = ref
    def clear(self):
        self.ref = None
def f1(ref):
    r = ref.ref   # take over the reference
    ref.clear()   # the box no longer keeps the object alive
def f2():
    f1(Ref(object()))
Variables have function scope in Python, so they aren't removed until the function returns. As far as I can tell, you can't destroy a reference to a local variable in a function from outside that function. I added some gc calls in the example code to test this.
import gc
class BigObj:
    pass
def consume(obj):
    del obj  # only deletes the local reference to obj, but another one still exists in the calling function
def myfn():
    big_obj = BigObj()
    big_obj_id = id(big_obj)  # in CPython, this is the memory address of the object
    consume(big_obj)
    print(any(id(obj) == big_obj_id for obj in gc.get_objects()))
    return big_obj_id
>>> big_obj_id = myfn()
True
>>> gc.collect() # I'm not sure where the reference cycle is, but I needed to run this to clean out the big object from the gc's list of objects in my shell
>>> print(any(id(obj) == big_obj_id for obj in gc.get_objects()))
False
Since True was printed, the big object still existed after we forced garbage collection to occur even though there were no references to that variable after that point in the function. Forcing garbage collection after the function returns rightfully determines that the reference count to the big object is 0, so it cleans that object up. NOTE: As the comments below point out, ids for deleted objects may be reused so checking for equal ids may result in false positives. However, I'm confident that the conclusion is still correct.
One thing you can do to reclaim that memory earlier is to make the big object global, which could allow you to delete it from within the called function.
def consume():
    # do whatever you need to do with the big object
    big_obj_id = id(big_obj)
    del globals()["big_obj"]
    print(any(id(obj) == big_obj_id for obj in gc.get_objects()))
    # do anything else you need to do without the big object
def myfn():
    globals()["big_obj"] = BigObj()
    result = consume()
    return result
>>> myfn()
False
This sort of pattern is pretty weird and likely very hard to maintain though, so I would advise against using this. If you only need to delete the big object right after consume() is called, you could do something like this in order to free up the memory used by the big object as soon as possible.
big_obj = BigObj()
consume(big_obj)
del big_obj
Another strategy you might try is deleting the references within the big object that's passed in from the consume() function with del big_obj.x for some attribute x.
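A rough sketch of that last strategy, assuming CPython's immediate deallocation. Payload and the attribute name x are stand-ins for whatever large data the real object carries, and the weak reference is only there to observe whether the payload is gone.

```python
import weakref

class Payload:
    """Stand-in for a large chunk of data (hypothetical)."""

class BigObj:
    def __init__(self):
        self.x = Payload()

def consume(obj):
    # ... use obj.x here ...
    del obj.x          # drops the only strong reference to the payload
    # ... continue without the payload ...

big_obj = BigObj()
payload_ref = weakref.ref(big_obj.x)   # weak: does not keep the payload alive
consume(big_obj)

payload_freed = payload_ref() is None
print(payload_freed)                   # True in CPython
print(hasattr(big_obj, "x"))           # False: the attribute is gone, the wrapper survives
```

The caller still holds big_obj, but the memory behind x can be reclaimed before consume() returns.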

Circular reference in python

I am not sure how Python deals with circular references (reference cycles). I checked some answers and found this:
Python's standard reference counting mechanism cannot free cycles, so the structure in your example would leak.
The supplemental garbage collection facility, however, is enabled by default and should be able to free that structure, if none of its components are reachable from the outside anymore and they do not have __del__() methods.
I guess it means that if none of the instances in the reference cycle are reachable from outside, all of them will be cleaned up. Is this true?
On the other hand, there is the weakref package, which is often used with mapping types such as dictionaries. The purpose of its existence, I suppose, is to avoid reference cycles.
In summary, can Python deal with reference cycles automatically? If it can, why do we have to use weakref?
You do not have to worry about reference cycles if the objects in the cycle don't have a custom __del__ method, as Python can (and will) destroy the objects in any order.
If your objects do have a custom __del__ method, Python doesn't know whether deleting one object affects the deletion of another. Say that when an object is deleted, it sets some global variable. So the objects stick around. As a quick test, you can make a __del__ method that prints something:
class Deletor(str):
    def __del__(self):
        print(self, 'has been deleted')
a = Deletor('a')  # refcount: 1
del a             # refcount: 0
Outputs:
a has been deleted
But if you had code like this:
a = Deletor('a') # refcount: 1
a.circular = a # refcount: 2
del a # refcount: 1
It outputs nothing, as Python can't safely delete a. It becomes an "uncollectable object", and can be found in gc.garbage †
There are two solutions to this. The weakref (which doesn't increase the reference count):
                        # refcount:  a  b
a = Deletor('a')        #            1  0
b = Deletor('b')        #            1  1
b.a = a                 #            2  1
a.b = weakref.ref(b)    #            2  1
del a                   #            1  1
del b                   #            1  0
# del b kills b.a       #            0  0
Outputs:
b has been deleted
a has been deleted
(Note how b is deleted first before it can delete a)
And you can manually delete the cycles (If you can keep track of them):
                        # refcount:  a  b
a = Deletor('a')        #            1  0
b = Deletor('b')        #            1  1
b.a = a                 #            2  1
a.b = b                 #            2  2
del b                   #            2  1
print('del b')
del a.b                 #            2  0
# b is deleted, now dead
# b.a now dead          #            1  0
print('del a.b')
del a                   #            0  0
print('del a')
Outputs:
del b
b has been deleted
del a.b
a has been deleted
del a
Notice how b is deleted after a.b is deleted.
† Starting with Python 3.4, things changed due to PEP 442. __del__ may be called even on objects in a reference cycle and the semantics are slightly different, so it is slightly harder to become an uncollectable object.
A weakref is still helpful, as it is less intense on the garbage collector, and memory can be reclaimed earlier.
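For what it's worth, the post-PEP 442 behaviour can be checked with a small sketch like the one below, assuming CPython 3.4 or later. Node and the deleted list are made up for the demonstration; the finalizer records its call instead of printing.

```python
import gc

deleted = []

class Node:
    def __init__(self, name):
        self.name = name
        self.me = self            # self-reference: a one-object cycle

    def __del__(self):
        deleted.append(self.name)

gc.disable()                      # rule out an automatic collection mid-demo
n = Node("a")
del n                             # the cycle keeps the refcount above zero
before = list(deleted)            # nothing finalized yet

gc.collect()                      # the cycle detector finds and finalizes the object
after = list(deleted)
gc.enable()

print(before, after)              # [] ['a'] on CPython 3.4+
```

On older interpreters the same object would be expected to end up in gc.garbage instead.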
(Trying to answer why do we then have the weak references subquestion.)
Weakrefs do not only break circular references, but also prevent unwanted non-circular references.
My favourite example is counting simultaneous network connections (kind of load measuring) using a WeakSet. In this example every new connection has to be added to the WeakSet, but that's the only task the networking code needs to do. The connection can be closed by the server, by the client, or in an error handler, but none of these routines has the responsibility to remove the connection from the set, and that is because the additional references are weak.
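A minimal sketch of that idea, assuming CPython (where the connection is freed as soon as its last strong reference goes away). Connection, open_connection and the peer addresses are hypothetical names for illustration only.

```python
import weakref

class Connection:
    """Hypothetical connection object; any weak-referenceable class works."""
    def __init__(self, peer):
        self.peer = peer

active = weakref.WeakSet()        # tracks connections without keeping them alive

def open_connection(peer):
    conn = Connection(peer)
    active.add(conn)              # the only bookkeeping the networking code does
    return conn

c1 = open_connection("10.0.0.1")
c2 = open_connection("10.0.0.2")
count_open = len(active)

del c1                            # closed somewhere; nobody removes it from the set
count_after = len(active)

print(count_open, count_after)    # 2 1 in CPython
```

On implementations without reference counting, the count would only drop after the next collection.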
Variables are memory references.
my_var=10
This is stored in one of the memory slots. my_var actually references the address of the memory slot where 10 is stored. If you type:
id(my_var)
you will get the address of the slot in base 10 (in CPython). hex(id(my_var)) will give the hex representation of the address.
Whenever we use my_var, the Python memory manager goes to that memory slot and retrieves the value 10. The Python memory manager also keeps track of the number of references to this memory slot. If there is no reference to this memory address, the memory manager destroys the object and reuses the slot for new objects.
Imagine we have two classes:
class A:
    def __init__(self):
        self.b = B(self)
        print('A: self: {0}, b:{1}'.format(hex(id(self)), hex(id(self.b))))
class B:
    def __init__(self, a):
        self.a = a
        print('B: self: {0}, a: {1}'.format(hex(id(self)), hex(id(self.a))))
When you create an instance of class A:
my_var = A()
you will get this printed (on your system, the addresses will differ):
B: self: 0x1fc1eae44e0, a: 0x1fc1eae4908
A: self: 0x1fc1eae4908, b:0x1fc1eae44e0
Pay attention to the references: they are circular.
NOTE: In order to see those references you have to disable the garbage collector; otherwise it will delete them automatically.
gc.disable()
Currently the reference count of the object that my_var refers to (0x1fc1eae4908) is 2: my_var and the instance of class B both reference this address. If we change my_var:
my_var= None
my_var no longer points to that memory address. The reference count of (0x1fc1eae4908) is now 1, so this memory slot is not cleaned up.
Now we have a memory leak: memory that is no longer needed is not cleaned up.
The garbage collector automatically identifies leaks caused by circular references and cleans them up. But if even one of the objects in the circular reference has a destructor (__del__()), the garbage collector does not know in which order to destroy the objects. So the objects are marked as uncollectable and are not cleaned up, which leads to a memory leak. (This applies to Python versions before 3.4; see the note on PEP 442 above.)
weakref is used for caching purposes. I think Python has very good documentation.
Here is the reference for weakref:
weakref documentation
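The A/B cycle described above can be watched with a sketch along these lines, assuming CPython. The probe weak reference is an addition of mine that lets us check whether the A instance still exists without keeping it alive; the print calls from the classes are left out.

```python
import gc
import weakref

class A:
    def __init__(self):
        self.b = B(self)          # A -> B

class B:
    def __init__(self, a):
        self.a = a                # B -> A, closing the cycle

gc.disable()                      # keep the cycle detector from running on its own
my_var = A()
probe = weakref.ref(my_var)       # observes the object without keeping it alive

my_var = None                     # the name is gone, but B still references the A instance
alive_after_rebind = probe() is not None

gc.collect()                      # explicit pass of the cycle detector
alive_after_collect = probe() is not None
gc.enable()

print(alive_after_rebind, alive_after_collect)   # True False in CPython
```

Reference counting alone leaves the pair in memory; the explicit gc.collect() is what reclaims it.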

Python Static Variable List __del__

I'm trying to create a class with a static list that collects all new instances of the class. The problem I'm facing is that as soon as I try to use a list the same way as, for example, an integer, I can't use the magic method __del__ anymore.
My Example:
class MyClass(object):
    count = 0
    #instances = []
    def __init__(self, a, b):
        self.a = a
        self.b = b
        MyClass.count += 1
        #MyClass.instances.append(self)
    def __str__(self):
        return self.__repr__()
    def __repr__(self):
        return "a: " + str(self.a) + ", b: " + str(self.b)
    def __del__(self):
        MyClass.count -= 1
        #MyClass.instances.remove(self)
A = MyClass(1,'abc')
B = MyClass(2,'def')
print(MyClass.count)
del B
print(MyClass.count)
With those lines commented out I get the correct answer:
2
1
But with those lines uncommented, so that the static list MyClass.instances is included, I get the wrong answer:
2
2
It seems like MyClass can't reach its __del__ method anymore! How come?
From the docs,
del x doesn’t directly call x.__del__() — the former decrements the reference
count for x by one, and the latter is only called when x‘s reference count
reaches zero.
When you uncomment,
instances = []
...
...
MyClass.instances.append(self)
You are storing a reference to the current Object in the MyClass.instances. That means, the reference count is internally incremented by 1. That is why __del__ is not getting called immediately.
To resolve this problem, explicitly remove the item from the list like this
MyClass.instances.remove(B)
del B
Now it will print
2
1
as expected.
There is one more way to fix this problem. That is to use weakref. From the docs,
A weak reference to an object is not enough to keep the object alive:
when the only remaining references to a referent are weak references,
garbage collection is free to destroy the referent and reuse its
memory for something else. A primary use for weak references is to
implement caches or mappings holding large objects, where it’s desired
that a large object not be kept alive solely because it appears in a
cache or mapping.
So, having a weakref will not postpone object's deletion. With weakref, this can be fixed like this
MyClass.instances.append(weakref.ref(self))
...
...
# MyClass.instances.remove(weakref.ref(self))
MyClass.instances = [w_ref for w_ref in MyClass.instances if w_ref() is not None]
Instead of using remove method, we can call each of the weakref objects and if they return None, they are already dead. So, we filter them out with the list comprehension.
So now, when you say del B, even though weakrefs exist for B, it will call __del__ (unless you have made some other variable point to the same object, for example through an assignment).
From http://docs.python.org/2.7/reference/datamodel.html#basic-customization I quote (paragraph in gray after object.__del__):
del x doesn’t directly call x.__del__() — the former decrements the reference count for x by one, and the latter is only called when x‘s reference count reaches zero.
Here you call del B but there is still a reference to the object in MyClass.instances, so the object is still referenced and hence not destroyed, and the __del__ function is not called.
If you call B.__del__() directly, it works.
__del__ is only called when no more references to the instance are left.
You should consider putting only weak refs into the MyClass.instances list.
This can be achieved with import weakref and then
either using a WeakSet for the list
or putting weakref.ref(self) into the list.
__del__ is automatically called whenever the last "strict" reference is removed. The weakrefs disappear automatically.
But be aware that there are some caveats on __del__ mentioned in the docs.
__del__ is used when the garbage collector removes an object from memory. If you add your object to MyClass.instances, then the object is marked as "used" and the garbage collector will never try to remove it. And so __del__ is never called.
You'd better use an explicit function (MyClass.del_element()) because you can't really predict when __del__ will be called (even if you don't add it to a list).
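One possible shape for such an explicit function is sketched below. The name del_element follows the suggestion above, but the classmethod signature and the identity-based filtering are my assumptions, not something from the question.

```python
class MyClass(object):
    instances = []

    def __init__(self, a, b):
        self.a = a
        self.b = b
        MyClass.instances.append(self)

    @classmethod
    def del_element(cls, element):
        # Filter by identity so a custom __eq__ cannot remove the wrong entry.
        cls.instances[:] = [obj for obj in cls.instances if obj is not element]

A = MyClass(1, 'abc')
B = MyClass(2, 'def')
count_before = len(MyClass.instances)

MyClass.del_element(B)            # explicit, predictable bookkeeping
del B
count_after = len(MyClass.instances)

print(count_before, count_after)  # 2 1
```

The count here is simply the length of the list, so it does not depend on when (or whether) __del__ runs.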

Why doesn't __del__ work properly

I think this is the most common question on interviews:
class A:
    def __init__(self, name):
        self.name = name
    def __del__(self):
        print self.name,
aa = [A(str(i)) for i in range(3)]
for a in aa:
    del a
So what is the output of this code, and why?
The output is nothing, but why?
Is that because a is a reference to an object in the list, so when we call del we remove this reference but not the object?
There are at least 2 references to the object that a references (variables are references to objects, they are not the objects themselves). There's the one reference inside the list, and then there's the reference "a". When you del a, you remove one reference (the variable a) but not the reference from inside the list.
Also note that Python doesn't guarantee that __del__ will ever be called ...
__del__ gets called when an object is destroyed; this will happen after the last possible reference to the object is removed from the program's accessible memory. Depending on the implementation this might happen immediately or might be after some time.
Your code just removes the local name a from the execution scope; the object remains in the list so is still accessible. Try writing del aa[0], for example.
From the docs:
Note del x doesn’t directly call x.__del__() — the former decrements the reference count for x by one, and the latter is only called when x‘s reference count reaches zero.
__del__ is triggered when the garbage collector finds an object to be destroyed. The garbage collector will try to destroy objects with a reference count of 0. del just decouples the label in the local namespace, thereby decrementing the reference count for the object in the interpreter. The behavior of the garbage collector is for the most part considered an implementation detail of the interpreter, so there's no guarantee that __del__ on objects will be called in any specific order, or even at all. That's why the behavior of this code is undefined.
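To make the difference visible, here is a small Python 3 sketch, assuming CPython's immediate finalization. It records finalizer calls in a list (deleted is a name I made up) instead of printing, then compares del a inside the loop with del aa[0].

```python
deleted = []

class A:
    def __init__(self, name):
        self.name = name

    def __del__(self):
        deleted.append(self.name)   # record instead of printing

aa = [A(str(i)) for i in range(3)]

for a in aa:
    del a                  # unbinds the loop name only; the list still holds each object
after_loop = list(deleted)

del aa[0]                  # removes the list's reference, the last one to object "0"
after_del_item = list(deleted)

print(after_loop, after_del_item)   # [] ['0'] in CPython
```

On another implementation the second list might stay empty until a collection actually runs.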

Weak References in python

I have been trying to understand how python weak reference lists/dictionaries work. I've read the documentation for it, however I cannot figure out how they work, and what they can be used for. Could anyone give me a basic example of what they do and an explanation of how they work?
(EDIT) Using Thomas's code, when I substitute [1,2,3] for obj it throws:
Traceback (most recent call last):
  File "C:/Users/nonya/Desktop/test.py", line 9, in <module>
    r = weakref.ref(obj)
TypeError: cannot create weak reference to 'list' object
Theory
The reference count usually works as such: each time you create a reference to an object, it is increased by one, and whenever you delete a reference, it is decreased by one.
Weak references allow you to create references to an object that will not increase the reference count.
The reference count is used by python's Garbage Collector when it runs: any object whose reference count is 0 will be garbage collected.
You would use weak references for expensive objects, or to avoid circular references (although the garbage collector usually handles those on its own).
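The claim that a weak reference leaves the count alone can be checked with a sketch like this, assuming CPython. Expensive is a placeholder class, and only the differences between the sys.getrefcount readings matter.

```python
import sys
import weakref

class Expensive:
    pass

obj = Expensive()
before = sys.getrefcount(obj)

r = weakref.ref(obj)                 # weak reference
after_weak = sys.getrefcount(obj)

strong = obj                         # ordinary (strong) reference, for comparison
after_strong = sys.getrefcount(obj)

print(after_weak - before)           # 0
print(after_strong - before)         # 1
print(weakref.getweakrefcount(obj))  # 1
```

Weak references are tracked separately, which is what weakref.getweakrefcount reports.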
Usage
Here's a working example demonstrating their usage:
import weakref
import gc
class MyObject(object):
    def my_method(self):
        print('my_method was called!')
obj = MyObject()
r = weakref.ref(obj)
gc.collect()
assert r() is obj  # r() allows you to access the object referenced: it's there.
obj = 1  # Let's change what obj references
gc.collect()
assert r() is None  # There is no object left: it was gc'ed.
Just want to point out that weakref.ref does not work for built-in list because there is no __weakref__ in the __slots__ of list.
For example, the following code defines a list container that supports weakref.
import weakref
class weaklist(list):
    __slots__ = ('__weakref__',)
l = weaklist()
r = weakref.ref(l)
The point is that they allow references to be retained to objects without preventing them from being garbage collected.
The two main reasons why you would want this are where you do your own periodic resource management, e.g. closing files, but because the time between such passes may be long, the garbage collector may do it for you; or where you create an object, and it may be relatively expensive to track down where it is in the programme, but you still want to deal with instances that actually exist.
The second case is probably the more common - it is appropriate when you are holding e.g. a list of objects to notify, and you don't want the notification system to prevent garbage collection.
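A notification list of that kind might look roughly like the following. Listener, subscribe and publish are hypothetical names, and the timing of the cleanup assumes CPython's reference counting.

```python
import weakref

class Listener:
    def __init__(self, name):
        self.name = name
        self.received = []

    def notify(self, event):
        self.received.append(event)

listeners = []                         # list of weak references, not of objects

def subscribe(listener):
    listeners.append(weakref.ref(listener))

def publish(event):
    delivered = 0
    for ref in listeners:
        target = ref()                 # None once the listener has been collected
        if target is not None:
            target.notify(event)
            delivered += 1
    # Drop entries whose referent no longer exists.
    listeners[:] = [ref for ref in listeners if ref() is not None]
    return delivered

first = Listener("first")
second = Listener("second")
subscribe(first)
subscribe(second)

sent_to_both = publish("started")
del second                             # the notification list does not keep it alive
sent_to_one = publish("stopped")

print(sent_to_both, sent_to_one, len(listeners))   # 2 1 1 in CPython
```

The publisher never has to be told that a listener went away; dead entries are pruned on the next publish.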
Here is an example comparing dict and WeakValueDictionary:
import weakref
class C: pass
ci = C()
print(ci)
wvd = weakref.WeakValueDictionary({'key': ci})
print(dict(wvd), len(wvd))  # len is 1
del ci
print(dict(wvd), len(wvd))  # len is 0
ci2 = C()
d = dict()
d['key'] = ci2
print(d, len(d))
del ci2
print(d, len(d))
And here is the output:
<__main__.C object at 0x00000213775A1E10>
{'key': <__main__.C object at 0x00000213775A1E10>} 1
{} 0
{'key': <__main__.C object at 0x0000021306B0E588>} 1
{'key': <__main__.C object at 0x0000021306B0E588>} 1
Note how in the first case once we del ci the actual object will be also removed from the dictionary wvd.
In the case of the regular Python dict class, we may try to remove the object but it will still be there, as shown.
Note: if we use del, we do not need to call gc.collect() afterwards, since (in CPython) del alone removes the object once no other references remain.
