I have a very large object inside a tuple that I would like to explicitly delete. Unfortunately, since tuples are immutable, just doing del tup[0] throws:
TypeError: 'tuple' object doesn't support item deletion
I don't actually need tup at all anymore, so I could delete the whole thing, I just want to make sure that the large object referenced from tup is deleted immediately without waiting for the garbage collector.
Will just deleting tup remove it immediately or is there a better way?
del does not reclaim memory. It does not delete the object. Depending on whether you remove an index (like del somelist[0]), attribute (like del foo.attr) or a variable (like del foo).
del will decrement the reference counter, and if the reference counter hits zero, the memory will be reclaimed, usually immediately.
If there are however cyclic references, the garbage collector has to walk by to remove them. So if you absolutely need to remove the large object with the tuple, you can use:
import gc
del tup
gc.collect()
Note that if other items refer to the large object, it will still not be removed. So make sure that there is only one reference.
For simple variables that have no special delete procedure (some have, like closing a file), del will thus remove the variable from the local scope, and set the reference count one less than it was before. This will usually be faster than setting the variable to None for instance, since in the latter case, it will increment the reference count of the None singleton.
Related
I have the following function:
def myfn():
big_obj = BigObj()
result = consume(big_obj)
return result
When is the reference count for the value of BigObj() increased / decreased:
Is it:
when consume(big_obj) is called (since big_obj is not referenced afterwards in myfn)
when the function returns
some point, I don't no yet
Would it make a difference to change the last line to:
return consume(big_obj)
Edit (clarification for comments):
A local variable exists until the function returns
the reference can be deleted with del obj
But what is with temporaries (e.g f1(f2())?
I checked references to temporaries with this code:
import sys
def f2(c):
print("f2: References to c", sys.getrefcount(c))
def f0():
print("f0")
f2(object())
def f1():
c = object()
print("f1: References to c", sys.getrefcount(c))
f2(c)
f0()
f1()
This prints:
f0
f2: References to c 3
f1: References to c 2
f2: References to c 4
It seems, that references to temporary variables are held. Not that getrefcount gives one more than you would expect because it holds a reference, too.
When is the reference count for big_obj decreased
big_obj does not have a reference count. Variables don't have reference counts. Values do.
big_obj = BigObj()
This line of code creates an instance of the BigObj class. The reference count for that instance may increase or decrease multiple times, depending on the implementation details of that creation process (which is not necessarily written in Python). Notably, though, the assignment to the name big_obj increases the reference count.
when the function returns
At this point, the name big_obj ceases to exist - the name does not disappear simply because it won't be used again. (That's really hard to detect in the general case, and there isn't a particular benefit to it normally). If you must cause a name to cease to exist at a specific point in the operation (for example, because you know that is the last reference and want to trigger garbage collection; or perhaps because you are doing something tricky with __weakref__) then that is what the del statement is for.
Because a name for the object ceases to exist, its reference count decreases. If and when that count reaches zero, the object is garbage collected. It may have any number of references stored in other places, for a wide variety of reasons. (For example, there could be a bug in C code that implements the class; or the class might deliberately maintain its own list of every instance ever created.)
Note that all of the above pertains specifically to the reference implementation. In other implementations, things will differ. There might be some other trigger for garbage collection to happen. There might not be reference counting at all (as with Jython).
From the comments, it seems like what you are worried about is the potential for a memory leak. The code that you show cannot cause a memory leak - but neither can it fix a memory leak caused elsewhere. In Python, as in garbage-collected languages in general, memory leaks happen because objects hold on to references to each other that aren't needed. But there is no concept of "ownership" or "transfer" of references normally - all you need to do is not do things like "maintain a list of every instance ever created" without a) a good reason and b) a way to take instances out of that list when you want to forget about them.
A local variable, though, by definition, cannot extend the lifetime of an object beyond the local scope.
Disclaimer: Most information is from the comments. So credit for every one who participated in the discussion.
When an object is deleted is an implementation detail in general.
I will refer to CPython, which is based on reference counting. I ran the code examples with CPython 3.10.0.
An object is deleted, when the reference count hits zero.
Returning from a function deletes all local references.
Assigning a name to a new value decreases the reference count of the old value
passing a local increases the reference count. The reference is in on the stack(frame)
Returning from a function removes the reference from the stack
The last point is even valid for temporary references like f(g()). The last reference to g() is deleted, when f returns (assuming that g does not save a reference somewhere)see here
So for the example from the question:
def myfn():
big_obj = BigObj() # reference 1
result = consume(big_obj) # reference 2 on the stack frame for
# consume. Not yet counting any
# reference inside of consume
# after consume returns: The stack frame
# and reference 2 are deleted. Reference
# 1 remains
return result # myfn returns reference 1 is deleted.
# BigObj is deleted
def consume(big_obj):
pass # consume is holding reference 3
If we would change this to:
def myfn():
return consume(BigObj()) # reference is still saved on the stack
# frame and will be only deleted after
# consume returns
def consume(big_obj):
pass # consume is holding reference 2
How can I check reliably, if an object was deleted?
You cannot rely on gc.get_objects(). gc is used to detect and recycle reference cycles. Not every reference is tracked by the gc.
You can create a weak reference and check if the reference is still valid.
class BigObj:
pass
import weakref
ref = None
def make_ref(obj):
global ref
ref = weakref.ref(obj)
return obj
def myfn():
return consume(make_ref(BigObj()))
def consume(obj):
obj = None # remove to see impact on ref count
print(sys.getrefcount(ref()))
print(ref()) # There is still a valid reference. It is the one from consume stack frame
myfn()
How to pass a reference to a function and remove all references in the calling function?
You can box the reference, pass to the function and clear the boxed reference from inside the function:
class Ref:
def __init__(ref):
self.ref = ref
def clear():
self.ref = None
def f1(ref):
r = ref.ref
ref.clear()
def f2():
f1(Ref(object())
Variables have function scope in Python, so they aren't removed until the function returns. As far as I can tell, you can't destroy a reference to a local variable in a function from outside that function. I added some gc calls in the example code to test this.
import gc
class BigObj:
pass
def consume(obj):
del obj # only deletes the local reference to obj, but another one still exists in the calling function
def myfn():
big_obj = BigObj()
big_obj_id = id(big_obj) # in CPython, this is the memory address of the object
consume(big_obj)
print(any(id(obj) == big_obj_id for obj in gc.get_objects()))
return big_obj_id
>>> big_obj_id = myfn()
True
>>> gc.collect() # I'm not sure where the reference cycle is, but I needed to run this to clean out the big object from the gc's list of objects in my shell
>>> print(any(id(obj) == big_obj_id for obj in gc.get_objects()))
False
Since True was printed, the big object still existed after we forced garbage collection to occur even though there were no references to that variable after that point in the function. Forcing garbage collection after the function returns rightfully determines that the reference count to the big object is 0, so it cleans that object up. NOTE: As the comments below point out, ids for deleted objects may be reused so checking for equal ids may result in false positives. However, I'm confident that the conclusion is still correct.
One thing you can do to reclaim that memory earlier is to make the big object global, which could allow you to delete it from within the called function.
def consume():
# do whatever you need to do with the big object
big_obj_id = id(big_obj)
del globals()["big_obj"]
print(any(id(obj) == big_obj_id for obj in gc.get_objects()))
# do anything else you need to do without the big object
def myfn():
globals()["big_obj"] = BigObj()
result = consume()
return result
>>> myfn()
False
This sort of pattern is pretty weird and likely very hard to maintain though, so I would advise against using this. If you only need to delete the big object right after consume() is called, you could do something like this in order to free up the memory used by the big object as soon as possible.
big_obj = BigObj()
consume(big_obj)
del big_obj
Another strategy you might try is deleting the references within the big object that's passed in from the consume() function with del big_obj.x for some attribute x.
I am writing a python class like this:
class MyImageProcessor:
def __init__ (self, image, metadata):
self.image=image
self.metadata=metadata
Both image and metadata are objects of a class written by a
colleague. Now I need to make sure there is no waste of memory. I am thinking of defining a quit() method like this,
def quit():
self.image=None
self.metadata=None
import gc
gc.collect()
and suggest users to call quit() systematically. I would like to know whether this is the right way. In particular, do the instructions in quit() above guarantee that unused memories being well collected?
Alternatively, I could rename quit() to the build-in __exit__(), and suggest users to use the "with" syntax. But my question is
more about whether the instructions in quit() indeed fulfill the garbage collection work one would need in this situation.
Thank you for your help.
In python every object has a built-in reference_count, the variables(names) you create are only pointers to the objects. There are mutable and unmutable variables (for example if you change the value of an integer, the name will be pointed to another integer object, while changing a list element will not cause changing of the list name).
Reference count basically counts how many variable uses that data, and it is incremented/decremented automatically.
The garbage collector will destroy the objects with zero references (actually not always, it takes extra steps to save time). You should check out this article.
Similarly to object constructors (__init__()), which are called on object creation, you can define destructors (__del__()), which are executed on object deletion (usually when the reference count drops to 0). According to this article, in python they are not needed as much needed in C++ because Python has a garbage collector that handles memory management automatically. You can check out those examples too.
Hope it helps :)
No need for quit() (Assuming you're using C-based python).
Python uses two methods of garbage collection, as alluded to in the other answers.
First, there's reference counting. Essentially each time you add a reference to an object it gets incremented & each time you remove the reference (e.g., it goes out of scope) it gets decremented.
From https://devguide.python.org/garbage_collector/:
When an object’s reference count becomes zero, the object is deallocated. If it contains references to other objects, their reference counts are decremented. Those other objects may be deallocated in turn, if this decrement makes their reference count become zero, and so on.
You can get information about current reference counts for an object using sys.getrefcount(x), but really, why bother.
The second way is through garbage collection (gc). [Reference counting is a type of garbage collection, but python specifically calls this second method "garbage collection" -- so we'll also use this terminology. ] This is intended to find those places where reference count is not zero, but the object is no longer accessible. ("Reference cycles") For example:
class MyObj:
pass
x = MyObj()
x.self = x
Here, x refers to itself, so the actual reference count for x is more than 1. You can call del x but that merely removes it from your scope: it lives on because "someone" still has a reference to it.
gc, and specifically gc.collect() goes through objects looking for cycles like this and, when it finds an unreachable cycle (such as your x post deletion), it will deallocate the whole lot.
Back to your question: You don't need to have a quit() object because as soon as your MyImageProcessor object goes out of scope, it will decrement reference counters for image and metadata. If that puts them to zero, they're deallocated. If that doesn't, well, someone else is using them.
Your setting them to None first, merely decrements the reference count right then, but when MyImageProcessor goes out of scope, it won't decrement those reference count again, because MyImageProcessor no longer holds the image or metadata objects! So you're just explicitly doing what python does for you already for free: no more, no less.
You didn't create a cycle, so your calling gc.collect() is unlikely to change anything.
Check out https://devguide.python.org/garbage_collector/ if you are interested in more earthy details.
Not sure if it make sense but to my logic you could
Use :
gc.get_count()
before and after
gc.collect()
to see if something has been removed.
what are count0, count1 and count2 values returned by the Python gc.get_count()
I have something like that:
a = [instance1, instance2, ...]
if I do a
del a[1]
instance2 is removed from list, but is instance2 desctructor method called?
I'm interested in this because my code uses a lot of memory and I need to free memory deleting instances from a list.
Coming from a language like c++ (as I did), this tends to be a subject many people find difficult to grasp when first learning Python.
The bottomline is this: when you do del XXX, you are never* deleting an object when you use del. You are only deleting an object reference. However, in practice, assuming there are no other references laying about to the instance2 object, deleting it from your list will free the memory as you desire.
If you don't understand the difference between an object and an object reference, read on.
Python: Pass by value, or pass by reference?
You are likely familiar with the concept of passing arguments to a function by reference, or by value. However, Python does things differently. Arguments are always passed by object reference. Read this article for a helpful explanation of what this means.
To summarize: this means that when you pass a variable to a function, you are not passing a copy of the value of the variable (pass by value), nor are you passing the object itself - i.e., the address of the value in memory. You are passing the name-object that indirectly refers to the value held in memory.
What does this have to do with del...?
Well, I'll tell you.
Say you do this:
def deleteit(thing):
del thing
mylist = [1,2,3]
deleteit(mylist)
...what do you think will happen? Has mylist been deleted from the global namespace?
The answer is NO:
assert mylist # No error
The reason is that when you do del thing in the deleteit function, you are only deleting a local object reference to the object. That object reference exists ONLY inside of the function. As a sidebar, you might ask: is it possible to delete an object reference from the global namespace while inside a function? The answer is yes, but you have to declare the object reference to be part of the global namespace first:
def deletemylist():
global mylist
del mylist
mylist = [1,2,3]
deletemylist()
assert mylist #ERROR, as expected
Putting it all together
Now to get back to your original question. When, in ANY namespace, you do this:
del XXX
...you have NOT deleted the object signified by XXX. You CAN'T do that. You have only deleted the object reference XXX, which refers to some object in memory. The object itself is managed by the Python memory manager. This is a very important distinction.
Note that as a consequence, when you override the __del__ method of some object, which gets called when the object is deleted (NOT the object reference!):
class MyClass():
def __del__(self):
print(self, "deleted")
super().__del__()
m = MyClass()
del m
...the print statement inside the __del__ method does not necessarily occur immediately after you do del m. It only occurs at the point in time the object itself is deleted, and that is not up to you. It is up to the Python memory manager. When all object references in all the namespaces have been deleted, the __del__ method will eventually be executed. But not necessarily immediately.
The same is true when you delete an object reference that is part of a list, like in the original example. When you do del a[1], only the object reference to the object signified by a[1] is deleted, and the __del__ method of that object may or may not be called immediately (though as stated before, it will eventually be called once there are no more references to the object, and the object is garbage collected by the memory manager).
As a result of this, it is not recommended that you put things in the __del__ method that you want to happen immediately upon del mything, because it may not happen that way.
*I believe it is never. Inevitably someone will likely downvote my answer and leave a comment discussing the exception to the rule. But whatevs.
No. Calling del on a list element only removes a reference to an object from the list, it doesn't do anything (explicitly) to the object itself. However: If the reference in the list was the last one referring to the object, the object can now be destroyed and recycled. I think that the "normal" CPython implementation will immediately destroy and recycle the object, other variants' behaviour can vary.
If your object is resource-heavy and you want to be sure that the resources are freed correctly, use the with() construct. It's very easy to leak resources when relying on destructors. See this SO post for more details.
I am creating a program that spawns objects randomly. These objects have a limited lifetime.
I create these objects and place them in a list. The objects keep track of how long they exist and eventually expire. They are no longer needed after expiration.
I would like to delete the objects after they expire but I'm not sure how to reference the specific object in the list to delete it.
if something:
list.append(SomeObject())
---- later---
I would like a cleanup process that looks at the variable in the Object and if it is expired, then remove it from the list.
Thanks for your help in advance.
You can use the refCount in case you define "no longer used" as "no other object keeps a reference". Which is a good way, for as no references exist, the object can no longer be accessed and may be disposed of. In fact, Python's garbage collector will do that for you.
Where it goes wrong is when you also have all the instances in a list. That also counts as a refeference to the object and it therefore never will be disposed of.
For example, a list of state variables that are not only referenced by their owning objects, but also by a list to allow linear access. Explicitly call a cleanup function from the accessor to keep the list clean:
GlobalStateList = []
def gcOnGlobalStateList():
for s in reversed(GlobalStateList):
if (getrefcount(s) <= 3): # 1=GlobalStateList, 2=Iterator, 3=getrefcount()
GlobalStateList.remove(s)
def getGlobalStateList():
gcOnGlobalStateList()
return GlobalStateList
Note that even looking at the refcount increases it, so the test-value is three or less.
Assuming that your concept of SomeObject "expiry" is not directly related to the underlying Python object lifetime (reference counting, etc.) I would suggest that the easiest way to purge the list is to occasionally run through it, dereferencing any expired objects:
lst = [obj for obj in lst if not obj.expired]
Note that you shouldn't call your own variables list, as this will shadow the built-in.
I think this is the most common question on interviews:
class A:
def __init__(self, name):
self.name = name
def __del__(self):
print self.name,
aa = [A(str(i)) for i in range(3)]
for a in aa:
del a
And so what output of this code and why.
Output will be is nothing and why?
Thats because a is ref on object in list and then we call del method we remove this ref but not object?
There are at least 2 references to the object that a references (variables are references to objects, they are not the objects themselves). There's the one reference inside the list, and then there's the reference "a". When you del a, you remove one reference (the variable a) but not the reference from inside the list.
Also note that Python doesn't guarantee that __del__ will ever be called ...
__del__ gets called when an object is destroyed; this will happen after the last possible reference to the object is removed from the program's accessible memory. Depending on the implementation this might happen immediately or might be after some time.
Your code just removes the local name a from the execution scope; the object remains in the list so is still accessible. Try writing del aa[0], for example.
From the docs:
Note del x doesn’t directly call x.__del__() — the former decrements the reference count for x by one, and the latter is only called when x‘s reference count reaches zero.
__del__ is triggered when the garbage collector finds an object to be destroyed. The garbage collector will try to destroy objects with a reference count of 0. del just decouples the label in the local namespace, thereby decrementing the reference count for the object in the interpreter. The behavior of the garbage collector is for the most part considered an implementation detail of the interpreter, so there's no guarantee that __del__ on objects will be called in any specific order, or even at all. That's why the behavior of this code is undefined.