I'm using Python as a wrapper for a library that, by design, keeps certain objects in memory until the process is killed and the system GC removes them (or until a command is sent to explicitly remove them).
A user can retrieve references to one of these objects using a Python function, so I know when a user has accessed them, but I don't know when a user is done accessing them.
My question is: is it possible in Python to observe when a variable is deleted (through reassignment, going out of scope, garbage collection, etc.)? Can I observe state changes on variables at all (similar to Swift's didSet/willSet)?
Python calls __del__ magic method when about to destroy an object.
You could override it and add your logic.
class ObserveDel:
    def __del__(self):
        ...  # do your stuff
Or patch it in after the fact. Note that assigning to __del__ on an instance has no effect, because CPython looks special methods up on the type, so the replacement has to go on the class:

def _handle_del(obj):
    ...  # do your stuff

type(a).__del__ = _handle_del
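A minimal runnable sketch of the __del__ approach (the on_destroy callback name is just for illustration):

```python
class Tracked:
    """Reports through a callback when it is about to be destroyed."""
    def __init__(self, name, on_destroy):
        self.name = name
        self._on_destroy = on_destroy     # called from __del__

    def __del__(self):
        # Runs when the reference count reaches zero (CPython).
        self._on_destroy(self.name)

destroyed = []
t = Tracked("block-1", destroyed.append)
del t                                     # last reference gone -> __del__ fires
assert destroyed == ["block-1"]
```

Keep in mind __del__ is not guaranteed to run promptly (or at all) on non-CPython implementations, so treat this as a notification hook, not a reliable resource-release point.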
Related
While coding a cache class for one of my projects I wanted to try out the weakref package as its functionality seems to fit this purpose very well. The class is supposed to cache blocks of data from disk as readable and writable buffers for ctypes.Structures. The blocks of data are supposed to be discarded when no structure is pointing to them, unless the buffer was modified due to some change to the structures.
To prevent dirty blocks from being garbage collected my idea was to set block.some_attr_name = block in the structures' __setattr__. Even when all structures are eventually garbage collected, the underlying block of data still has a reference count of at least 1 because block.some_attr_name references block.
I wanted to test this idea, so I opened up an IPython session and typed
import weakref

class Test:
    def __init__(self):
        self.self = self

ref = weakref.ref(Test(), lambda r: print("Test was trashed"))
As expected, this didn't print Test was trashed. But when I went to type del ref().self to see whether the referent will be discarded, while typing, before hitting Enter, Test was trashed appeared. Oddly enough, even hitting the arrow keys or resizing the command line window after assigning ref will cause the referent to be trashed, even though the referent's reference count cannot drop to zero because it is referencing itself. This behavior persists even if I artificially increase the reference count by replacing self.self = self with self.refs = [self for i in range(20)].
I couldn't reproduce this behavior in the standard python.exe interpreter (interactive session) which is why I assume this behavior to be tied to IPython (but I am not actually sure about this).
Is this behavior expected with the devil hiding somewhere in the details of IPython's implementation or is this behavior a bug?
Edit 1: It gets stranger. If in the IPython session I run
import weakref

class Test:
    def __init__(self):
        self.self = self

test = Test()
ref = weakref.ref(test, lambda r: print("Aaaand it's gone...", flush=True))
del test
the referent is not trashed immediately. But if I hold down any key, "typing" out "aaaa..." (~200 a's), suddenly Aaaand it's gone... appears. And since I added flush = True I can rule out buffering for the late response. I definitely wouldn't expect IPython to be decreasing reference counts just because I was holding down a key. Maybe Python itself checks for circular references in some garbage collection cycles?
(tested with IPython 7.30.1 running Python 3.10.1 on Windows 10 x64)
In Python's documentation on Extending and Embedding the Python Interpreter under subsection 1.10 Reference Counts the second to last paragraph reads:
While Python uses the traditional reference counting implementation, it also offers a cycle detector that works to detect reference cycles. This allows applications to not worry about creating direct or indirect circular references; these are the weakness of garbage collection implemented using only reference counting. Reference cycles consist of objects which contain (possibly indirect) references to themselves, so that each object in the cycle has a reference count which is non-zero. Typical reference counting implementations are not able to reclaim the memory belonging to any objects in a reference cycle, or referenced from the objects in the cycle, even though there are no further references to the cycle itself.
So I guess my idea of circular references to prevent garbage collection from eating my objects won't work out.
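You can watch the cycle detector do exactly this (a small sketch; gc.disable() just rules out an automatic collection firing mid-demo):

```python
import gc
import weakref

class Node:
    def __init__(self):
        self.self = self          # circular reference: refcount never hits zero

gc.disable()                       # no automatic collections during the demo
events = []
n = Node()
ref = weakref.ref(n, lambda r: events.append("collected"))
del n                              # name gone, but the cycle keeps it alive
assert events == []                # reference counting alone didn't free it
gc.collect()                       # the cycle detector finds and frees it
assert events == ["collected"]
gc.enable()
```

So a self-reference only delays reclamation until the next collector run; it doesn't prevent it.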
I am writing a python class like this:
class MyImageProcessor:
    def __init__(self, image, metadata):
        self.image = image
        self.metadata = metadata
Both image and metadata are objects of a class written by a colleague. Now I need to make sure there is no waste of memory. I am thinking of defining a quit() method like this,
def quit(self):
    self.image = None
    self.metadata = None
    import gc
    gc.collect()
and suggest that users call quit() systematically. I would like to know whether this is the right way. In particular, do the statements in quit() above guarantee that unused memory is collected?
Alternatively, I could rename quit() to the built-in __exit__() and suggest that users use the "with" syntax. But my question is more about whether the statements in quit() indeed do the garbage-collection work one would need in this situation.
Thank you for your help.
In Python every object has a built-in reference count, and the variables (names) you create are only pointers to objects. There are mutable and immutable objects (for example, if you change the value of an integer, the name is pointed at another integer object, while changing a list element does not rebind the list's name).
The reference count basically tracks how many references use that data, and it is incremented/decremented automatically.
The garbage collector will destroy the objects with zero references (actually not always, it takes extra steps to save time). You should check out this article.
Similarly to object constructors (__init__()), which are called on object creation, you can define destructors (__del__()), which are executed on object deletion (usually when the reference count drops to 0). According to this article, in Python they are not needed as much as in C++, because Python has a garbage collector that handles memory management automatically. You can check out those examples too.
Hope it helps :)
No need for quit() (assuming you're using CPython).
Python uses two methods of garbage collection, as alluded to in the other answers.
First, there's reference counting. Essentially each time you add a reference to an object it gets incremented & each time you remove the reference (e.g., it goes out of scope) it gets decremented.
From https://devguide.python.org/garbage_collector/:
When an object’s reference count becomes zero, the object is deallocated. If it contains references to other objects, their reference counts are decremented. Those other objects may be deallocated in turn, if this decrement makes their reference count become zero, and so on.
You can get information about current reference counts for an object using sys.getrefcount(x), but really, why bother.
The second way is through garbage collection (gc). [Reference counting is a type of garbage collection, but Python specifically calls this second method "garbage collection", so we'll also use that terminology.] This is intended to find places where the reference count is not zero but the object is no longer accessible ("reference cycles"). For example:
class MyObj:
    pass

x = MyObj()
x.self = x
Here, x refers to itself, so the actual reference count for the object is more than 1. You can call del x, but that merely removes the name from your scope: the object lives on because "someone" still has a reference to it.
gc, and specifically gc.collect() goes through objects looking for cycles like this and, when it finds an unreachable cycle (such as your x post deletion), it will deallocate the whole lot.
Back to your question: You don't need a quit() method, because as soon as your MyImageProcessor object goes out of scope, it decrements the reference counters for image and metadata. If that puts them at zero, they're deallocated. If it doesn't, well, someone else is using them.
Setting them to None first merely decrements the reference counts right then; when MyImageProcessor goes out of scope, it won't decrement those reference counts again, because MyImageProcessor no longer holds the image or metadata objects! So you're just doing explicitly what Python already does for you for free: no more, no less.
You didn't create a cycle, so your calling gc.collect() is unlikely to change anything.
Check out https://devguide.python.org/garbage_collector/ if you are interested in more earthy details.
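To see the out-of-scope point concretely, a small sketch (Image here is a stand-in for the colleague's class, and a weakref lets us observe the deallocation):

```python
import weakref

class Image:                       # stand-in for the colleague's image class
    pass

class MyImageProcessor:
    def __init__(self, image):
        self.image = image

img = Image()
proc = MyImageProcessor(img)
ref = weakref.ref(img)             # lets us observe when the Image is freed
del img                            # processor still holds a reference...
assert ref() is not None           # ...so the Image is still alive
del proc                           # last holder goes away
assert ref() is None               # Image freed immediately, no quit() needed
```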
Not sure if it makes sense, but by my logic you could use:

gc.get_count()

before and after

gc.collect()

to see if something has been removed.
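For example (the counter values themselves depend on interpreter state, so only the shape is predictable):

```python
import gc

before = gc.get_count()     # (gen0, gen1, gen2) allocation counters
collected = gc.collect()    # number of unreachable objects the collector found
after = gc.get_count()      # a full collection resets the counters
print(before, collected, after)
```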
what are count0, count1 and count2 values returned by the Python gc.get_count()
If I am going to implement a safe resource wrapper in Python, do I need to implement the Dispose Pattern like C#?
Here is a demo implementation of what I mean:
class ResourceWrapper:
    def __init__(self):
        self._python_resource = ...  # A Python object that manages some resources.
        self._external_resource = _allocate_resource()  # A handle to an external resource.
        self._is_closed = False  # Whether the object has been closed.

    def __del__(self):
        self._close(manual_close=False)  # Called by GC.

    def close(self):
        self._close(manual_close=True)  # Called by user to free the resource early.

    def _close(self, manual_close):
        if not self._is_closed:  # Don't want a resource to be closed more than once.
            if manual_close:
                # Since `_close` is called by the user, we can guarantee that
                # `self._python_resource` is still valid, so we can close it safely.
                self._python_resource.close()
            else:
                # `_close` was called by GC; `self._python_resource` might already be
                # GCed, but we don't know for sure, so we do nothing and rely on GC
                # to free `self._python_resource`.
                pass
            # GC will not take care of freeing the unmanaged resource, so whether
            # manually closed or not, we have to release it to prevent a leak.
            _free_resource(self._external_resource)
            # Now mark the object as closed to prevent closing multiple times.
            self._is_closed = True
self._python_resource is a resource wrapper object managed by Python GC, and self._external_resource is a handle to an external resource that is not managed by Python GC.
I want to ensure both the managed and the unmanaged resource get freed if the user manually closes the wrapper, and also if the wrapper object gets GCed.
No, in Python you should use Context Managers:
class ResourceWrapper:
    def __init__(self):
        ...

    def __enter__(self):
        return self

    def __exit__(self, type, value, traceback):
        # The object is certainly alive here, so this counts as a manual close.
        self._close(manual_close=True)

with ResourceWrapper() as wrapper:
    ...  # do something with wrapper
Note 1: There's this comment in _close() method:
This means _close is called by GC; self._python_resource might be already GCed, but we don't know for sure, so we do nothing and rely on GC to free self._python_resource.
I'm not sure what you mean by that, but as long as you hold reference to an object (and as long as it isn't a weak reference) it won't be GC'ed.
Note 2: What happens if an object that is a context manager is used without a with block? Then the resource will be released when the object is garbage collected - but I wouldn't worry about that. Using context managers is a common idiom in Python (see any example that open()s a file). If it's crucial for your application, you can acquire resources in __enter__(); that way they won't be acquired unless you're inside a with block.
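A sketch of that acquire-in-__enter__ variant (the _allocate_resource/_free_resource helpers are hypothetical stand-ins for the real calls):

```python
freed = []

def _allocate_resource():                 # hypothetical stand-in allocator
    return "handle"

def _free_resource(handle):               # hypothetical stand-in release
    freed.append(handle)

class ResourceWrapper:
    def __init__(self):
        self._handle = None               # nothing acquired at construction

    def __enter__(self):
        self._handle = _allocate_resource()   # acquire only inside `with`
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        _free_resource(self._handle)          # released even on exceptions
        self._handle = None

with ResourceWrapper() as w:
    assert w._handle == "handle"
assert freed == ["handle"]
```

Constructing the wrapper outside a with block then holds no resource at all, which sidesteps the whole finalization question.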
Note 3, about cyclic references: If you have two objects that hold references to each other, you've formed a cyclic reference, so those two objects won't be freed by the "regular" reference-counting GC. Instead, they are collected by the generational GC, unless they happen to have a __del__ method. __del__ inhibits the GC from collecting objects. See gc.garbage:
A list of objects which the collector found to be unreachable but
could not be freed (uncollectable objects). By default, this list
contains only objects with __del__() methods. [1] Objects that have
__del__() methods and are part of a reference cycle cause the entire reference cycle to be uncollectable, including objects not necessarily
in the cycle but reachable only from it.
Python 3.4 introduced PEP-442, which brings safe object finalization. Either way, you won't have invalid references: if you have the attribute (hasattr(self, "_python_resource")), it will be valid.
Takeaway: don't use __del__.
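If you still want a GC-triggered fallback without writing __del__, weakref.finalize is the usual tool; a sketch (the _free_resource helper and the string handle are stand-ins):

```python
import weakref

released = []

def _free_resource(handle):               # hypothetical release routine
    released.append(handle)

class ResourceWrapper:
    def __init__(self):
        self._handle = "handle"           # stand-in for an external resource
        # The finalizer captures only the handle, never self, so it does not
        # keep the wrapper alive or form a cycle; it runs at most once,
        # either via close() or when the wrapper is collected.
        self._finalizer = weakref.finalize(self, _free_resource, self._handle)

    def close(self):
        self._finalizer()                 # later calls are no-ops

w = ResourceWrapper()
w.close()
w.close()                                 # second call does nothing
assert released == ["handle"]
```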
So I have a class with a couple of methods defined as:
import cv2

class Recognizer(object):
    def __init__(self):
        self.image = None
        self.reduced_image = None

    def load_image(self, path):
        self.image = cv2.imread(path)
        return self.image
Say I wanna add a third method that uses a return value from load_image(). Should I define it like this:
def shrink_image(self):
    self.reduced_img = cv2.resize(self.image, (300, 300))
    return self.reduced_img
Or should I define it like this:
def shrink_image(self, path):
    reduced_img = cv2.resize(self.load_image(path), (300, 300))
    return reduced_img
What exactly is the difference between the two? I can see that I have access to the fields defined in __init__ from any method I declare within that class, so I guess if I update the fields within __init__ I would be able to access them for an instance at any given time.
Is there a consensus on which way is better?
What exactly is the difference between the two?
In Python the method named __init__ is the constructor of the object, which is invoked implicitly when you call the class via (), such as Recognizer().
The term "better" is vague, because in the former example you are saving the image as a property on the object, hence making the object larger.
But in second example you are simply returning the data from the function, to be used by the caller.
So it's a matter of context and style.
A simple rule of thumb: if you are going to use the property reduced_img in the context of the Recognizer object, then it is ideal to save it as a property on the object, accessed via self. If the caller simply uses reduced_img and Recognizer is unaware of any state changes, then it's fine to just return it from the function.
In the second way the variable is scoped to the shrink_image function.
In the first way the variable is scoped to the object's lifetime, and having self.reduced_img set is a side effect of the method.
Seeing only your code sample, without seeing callers, the second case is "better", because reduced_img isn't used anywhere else and it is unnecessary to bind it to the instance. There may definitely be a use case where you need to persist the last self.reduced_img call, making it a necessary side effect.
In general it is extremely helpful to minimize side effects. Having side effects especially ones that mutate state can make reasoning about your program more difficult.
This is especially seen when you have multiple accessors to your object.
Imagine having the first shrink_image: you release your program, you have a single client at a single call site calling shrink_image, easy peasy. After the call self.reduced_img will be the result.
Imagine sharing the object between multiple call sites? It introduces a temporal-ish coupling: you may no longer be able to make assumptions about what reduced_img is, and accesses to it before calling shrink_image may no longer return None, because there may be other callers!
Compare this to the second shrink_image: callers no longer share mutable state, and it's easier to reason about the state of a Recognizer instance across shrink_image calls.
Something really nuts happens for the first example when multiple concurrent calls are introduced. It goes from being difficult to reason about and potentially logically incorrect to being a synchronization and data race issue.
Without concurrent callers this isn't going to be an issue. But it's definitely a possibility: if you're using this call in a web framework and you create a single instance shared between multiple web workers, you can get this implicit concurrency and could potentially be subject to race conditions :p
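A tiny illustration of the stateless variant (a nested-list crop stands in for cv2.resize, an assumption made just to keep the sketch self-contained):

```python
class Recognizer:
    def shrink_image(self, image):
        # Pure function of its input: no attribute mutation, the result
        # goes straight back to the caller.
        return [row[:2] for row in image[:2]]   # stand-in for cv2.resize

r = Recognizer()
img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
small = r.shrink_image(img)
assert small == [[1, 2], [4, 5]]
# r holds no state, so repeated or interleaved calls cannot interfere.
```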
I want a Roach class to "die" when it reaches a certain amount of "hunger", but I don't know how to delete the instance. I may be making a mistake with my terminology, but what I mean to say is that I have a ton of "roaches" on the window and I want specific ones to disappear entirely.
I would show you the code, but it's quite long. I have Roach instances being appended to a Mastermind class's roach population list.
In general:
Each binding of variable -> object increases the object's internal reference counter.
There are several usual ways to decrease a reference (to remove a variable -> object binding):
exiting the block of code where the variable was declared (used for the first time)
destroying an object, which releases all of its attribute/method variable -> object references
calling del variable, which also deletes the reference in the current context
After all references to an object are removed (counter == 0) it becomes a good candidate for garbage collection, but it is not guaranteed that it will be processed (reference here):
CPython currently uses a reference-counting scheme with (optional)
delayed detection of cyclically linked garbage, which collects most
objects as soon as they become unreachable, but is not guaranteed to
collect garbage containing circular references. See the documentation
of the gc module for information on controlling the collection of
cyclic garbage. Other implementations act differently and CPython may
change. Do not depend on immediate finalization of objects when they
become unreachable (ex: always close files).
to find out how many references to an object exist, use sys.getrefcount
the module for configuring/checking garbage collection is gc
the GC will call the object's __del__ method when destroying the object (additional reference here)
some immutable objects like strings are handled in a special way - e.g. if two vars contain the same string, it is possible that they reference the same object (but some do not) - check identifying objects, why does the returned value from id(...) change?
the id of an object can be found with the builtin function id
the module memory_profiler looks interesting - a module for monitoring memory usage of a Python program
there are lots of useful resources on the topic, one example: Find all references to an object in python
You cannot force a Python object to be deleted; it will be deleted when nothing references it (or when it's in a cycle referred to only by the items in the cycle). You will have to tell your "Mastermind" to erase its reference.
del somemastermind.roaches[n]
for i, roach in enumerate(roachpopulation_list):
    if roach.hunger == 100:
        del roachpopulation_list[i]
        break
Remove the instance by deleting it from your population list (containing all the roach instances).
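If several roaches can die in the same tick, rebuilding the list avoids index bookkeeping entirely (a sketch with a minimal Roach stand-in):

```python
class Roach:                               # minimal stand-in for the real class
    def __init__(self, hunger):
        self.hunger = hunger

roach_population = [Roach(10), Roach(100), Roach(50), Roach(100)]
# Keep only the roaches still below the hunger threshold; the dropped
# instances lose their last reference and are reclaimed by Python.
roach_population = [r for r in roach_population if r.hunger < 100]
assert len(roach_population) == 2
```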
If your Roaches are Sprites created in Pygame, then a simple call to .kill() would remove the instance.