I tend to use it whenever I am working on a prototype script, and:
1. Use a somewhat common variable (such as fileCount), and
2. Have a large method (20+ lines), and
3. Do not use classes or namespaces yet.
In this situation, to avoid a potential variable clash, I delete the bugger as soon as I am done with it. I know that in production code I should avoid 1., 2., and 3., but going from a prototype that works to a completely polished class is time consuming. Sometimes I might want to settle for a sub-optimal, quick refactoring job. In that case I find keeping the del statements handy. Am I developing an unnecessary, bad habit? Is del totally avoidable? When would it be a good thing?
I don't think that del by itself is a code smell.
Reusing a variable name in the same namespace is definitely a code smell as is not using classes and other namespaces where appropriate. So using del to facilitate that sort of thing is a code smell.
The only really appropriate use of del that I can think of off the top of my head is breaking cyclic references which are often a code smell as well (and often times, this isn't even necessary). Remember, all del does is delete the reference to the object and not the object itself. That will be taken care of by either reference counting or garbage collecting.
>>> a = [1, 2]
>>> b = a
>>> del a
>>> a
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'a' is not defined
>>> b
[1, 2]
You can see that the list is kept alive after the del statement because b still holds a reference to it.
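The cyclic-reference case mentioned above can be made concrete. A minimal sketch showing that del alone can't reclaim a cycle; only the cycle collector can:

```python
import gc

gc.collect()                 # start from a clean slate
a = []
a.append(a)                  # the list refers to itself: a reference cycle
del a                        # the name is gone, but the cycle keeps the list alive
unreachable = gc.collect()   # only the cycle collector can reclaim it now
print(unreachable)           # >= 1: the cycle was found and freed
```

Reference counting never frees the list, because the self-reference keeps its count above zero even after the name is deleted.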
So, while del isn't really a code smell, it can be associated with things that are.
Any code that's well organized in functions, classes and methods doesn't need del except in exceptional circumstances. Aim to build your apps well factored from the start by using more functions and methods, avoid reusing variable names, etc.
The use of a del statement is OK - it doesn't lead to any trouble, I use it often when I use Python as a replacement for shell scripts on my system, and when I'm making script experiments. However, if it appears often in a real application or library, it is an indication that something isn't all right, probably badly structured code. I never had to use it in an application, and you'd rarely see it used anywhere on code that's been released.
As a programmer, I generally try to avoid the del statement because it is often an extra complication that Python programs don't often need. However, when browsing the standard library (threading, os, etc.) and the pseudo-standard library (numpy, scipy, etc.) I see it used a non-zero number of times, and I'd like to better understand when the del statement is and isn't appropriate.
Specifically, I'm curious about the relationship between the Python del statement and the efficiency of a Python program. It seems to me that del might help a program run faster by reducing the amount of clutter lookup instructions need to sift through. However, I can also see a world where the extra instruction takes up more time than it saves.
My question is: does anyone have any interesting code snippets that demonstrate cases where del significantly changes the speed of the program? I'm most interested in cases where del improves the execution speed of a program, although non-trivial cases where del can really hurt are also interesting.
The main reason that standard Python libraries use del is not for speed but for namespace decluttering ("avoiding namespace pollution" is another term I believe I have seen for this). As user2357112 noted in a comment, it can also be used to break a traceback cycle.
Let's take a concrete example: line 58 of types.py in the cpython implementation reads:
del sys, _f, _g, _C, _c, # Not for export
If we look above, we find:
def _f(): pass
FunctionType = type(_f)
LambdaType = type(lambda: None) # Same as FunctionType
CodeType = type(_f.__code__)
MappingProxyType = type(type.__dict__)
SimpleNamespace = type(sys.implementation)
def _g():
    yield 1
GeneratorType = type(_g())
_f and _g are two of the names being deled; as the comment says, they are "not for export".1
You might think this is covered via:
__all__ = [n for n in globals() if n[:1] != '_']
(which is near the end of that same file), but as What's the python __all__ module level variable for? (and the linked Can someone explain __all__ in Python?) note, these affect the names exported via from types import *, rather than what's visible via import types; dir(types).
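You can observe the effect of that cleanup directly; this sketch assumes a CPython standard library in which _f and _g really were del'd from types:

```python
import types

# The helper names were del'd at the bottom of types.py, so they are not
# visible even via dir(), while the intended exports are:
print('_f' in dir(types))            # False: cleaned up with del
print('FunctionType' in dir(types))  # True: a real export
```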
It's not necessary to clean up your module namespace, but doing so keeps people from reaching into it and relying on names that were never meant to be part of the module's public interface. So it serves a couple of purposes.
1 Looks like someone forgot to update this to include _ag. _GeneratorWrapper is harder to hide, unfortunately.
Specifically, I'm curious about the relationship between the Python del statement and the efficiency of a Python program.
As far as performance is concerned, del (excluding index deletion like del x[i]) is primarily useful for GC purposes. If you have a variable pointing to some large object that is no longer needed, deling that variable will (assuming there are no other references to it) deallocate that object (with CPython this happens immediately, as it uses reference counting). This could make the program faster if you'd otherwise be filling your RAM/caches; only way to know is to actually benchmark it.
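For example, a minimal sketch (the list size is arbitrary, chosen just to make the allocation noticeable):

```python
data = [0] * 10_000_000   # a large intermediate list (tens of MB in CPython)
total = len(data)
del data                  # refcount drops to zero: CPython frees the memory
                          # now, not at the end of the enclosing scope
print(total)              # 10000000 -- the result survives; the list doesn't
```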
It seems to me that del might help a program run faster by reducing the amount of clutter lookup instructions need to sift through.
Unless you're using thousands of variables (which you shouldn't be), it's exceedingly unlikely that removing variables using del will make any noticeable difference in performance.
I want to get at all the objects created by another module, even objects that have no name or reference. Is that possible? For example:
in module1.py, there's only one line code:
MyClass()
in module2.py:
module1 = __import__("module1")
# print sth of MyClass from module1
What you're trying to do is generally impossible.
An object that has no name or other reference is garbage. That's the technical meaning of the term "garbage". In CPython (the Python implementation you're probably using if you don't know which one you're using), garbage is collected immediately—as soon as that MyClass() statement ends, the instance gets destroyed.
So, you can't access the object, because it doesn't exist.
In some other Python implementations, the object may not be destroyed until the next garbage collection cycle, but that's going to be pretty soon, and it's not deterministic exactly when—and you still have no way to get at it before it's destroyed. So it might as well not exist, even if it hasn't actually been finalized yet.
Now, "generally" means there are some exceptions. They're not common, but they do exist.
For example, imagine a class like this:
class MyClass:
    _instances = []
    def __init__(self):
        MyClass._instances.append(self)
Now, when you do MyClass(), there actually is a reference to that instance, so it's not garbage. And, if you know where it is (which you'd presumably find in the documentation, or in the source code), you can access it as MyClass._instances[-1]. But it's unlikely that an arbitrary class MyClass does anything like this.
OK, I lied. There is sort of a way to do this, but (a) it’s cheating, and (b) it’s almost certainly a terrible idea that has no valid use cases you’ll ever think of. But just for fun, here’s how you could do this.
You need to write an import hook, and make sure it gets installed before the first time you import the module. Then you can do almost anything you want. The simplest idea I can think of is transforming the AST to turn every expression statement (or maybe just every expression statement at the top level) into an assignment statement that assigns to a hidden variable. You can even make the variable name an invalid identifier, so it'll be safe to run on any legal module no matter what's in the global namespace. Then you can access the first object created and abandoned by the module as something like module.globals()['.0'].
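Here is a much-simplified sketch of the AST-rewriting idea. It skips the import-hook machinery (a full version would go through importlib) and uses ordinary hidden names like _anon0 rather than invalid identifiers; load_capturing is a made-up helper, not a real API:

```python
import ast
import types

def load_capturing(name, source):
    """Execute module source after rewriting every top-level expression
    statement into an assignment to a hidden name (_anon0, _anon1, ...),
    so otherwise-abandoned objects stay reachable."""
    tree = ast.parse(source)
    count = 0
    for i, node in enumerate(tree.body):
        if isinstance(node, ast.Expr):  # a bare expression statement
            assign = ast.Assign(
                targets=[ast.Name(id=f'_anon{count}', ctx=ast.Store())],
                value=node.value,
            )
            tree.body[i] = ast.copy_location(assign, node)
            count += 1
    ast.fix_missing_locations(tree)
    module = types.ModuleType(name)
    exec(compile(tree, f'<{name}>', 'exec'), module.__dict__)
    return module

mod = load_capturing('module1', 'class MyClass: pass\nMyClass()')
print(type(mod._anon0).__name__)   # MyClass
```

The instance that module1 creates and abandons is now reachable as mod._anon0.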
In some Python modules, I have code like this:
try:
    someGlobal
except NameError:
    someGlobal = []
This can be important in case I want to support module reloading and some certain object must not be overwritten (e.g. because I know that it is referred to directly).
Many editors (e.g. PyCharm) mark this as an error. Is there some other way to write the same code which is more Python idiomatic? Or is this already Python idiomatic and it's a fault of the editors to complain about this?
I'd go with
if 'someGlobal' not in dir():
    someGlobal = 23

This has the advantage of simplicity, but it can be a bit slow if the module has a lot of globals, since dir() returns a list and the in operator on a list is O(N).
For speed, and at a modest disadvantage in terms of simplicity,
if 'someGlobal' not in vars():
    someGlobal = 23

which should be faster, since vars() returns a dict, so the in operator on it is O(1).
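A quick demonstration of why the guard matters, simulating a second execution of the module body (as happens on reload):

```python
if 'someGlobal' not in vars():
    someGlobal = [1, 2, 3]
original = someGlobal

# Re-running the module body hits the guard again and skips the
# assignment, so the original object survives:
if 'someGlobal' not in vars():
    someGlobal = []
print(someGlobal is original)   # True: the guard preserved the object
```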
It is an error, at least given the information available to the editor. So the editor isn't wrong; it is just that you are specifically coding for that error.
I've seen this in different code bases and just read about it on PyMOTW (see the first Note here).
The explanation says that a cycle will be created if the traceback is assigned to a variable from sys.exc_info()[2], but why is that?
How big of a problem is this? Should I search for all uses of exc_info in my code base and make sure the traceback is deleted?
Python 3 (update to original answer):
In Python 3, the advice quoted in the question has been removed from the Python documentation. My original answer (which follows) applies only to versions of Python that include the quote in their documentation.
Python 2:
The Python garbage collector will, eventually, find and delete circular references like the one created by referring to a traceback stack from inside one of the stack frames themselves, so don't go back and rewrite your code. But, going forward, you could follow the advice of
http://docs.python.org/library/sys.html
(where it documents exc_info()) and say:
exctype, value = sys.exc_info()[:2]
when you need to grab the exception.
Two more thoughts:
First, why are you running exc_info() at all?
If you want to catch an exception shouldn't you just say:
try:
    ...
except Exception as e:  # or "except Exception, e" in old Pythons
    ... do things with e ...
instead of mucking about with objects inside the sys module?
Second: Okay, I've given a lot of advice but haven't really answered your question. :-)
Why is a cycle created? Well, in simple cases, a cycle is created when an object refers to itself:
a = [1,2,3]
a.append(a)
Or when two objects refer to each other:
a = [1,2,3]
b = [4,5,a]
a.append(b)
In both of these cases, when the function ends the variable values will still exist because they're locked in a reference-count embrace: neither can go away until the other has gone away first! Only the modern Python garbage collector can resolve this, by eventually noticing the loop and breaking it.
And so the key to understanding this situation is that a "traceback" object — the third thing (at index #2) returned by exc_info() — contains a "stack frame" for each function that was active when the exception was raised. And those stack frames are not "dead" objects showing what was true when the exception was raised; the frames are still alive! The function that's caught the exception is still alive, so its stack frame is a living thing, still gaining and losing variable references as its code executes to handle the exception (and do whatever else it does as it finishes the "except" clause and goes about its work).
So when you say t = sys.exc_info()[2], one of those stack frames inside of the traceback — the frame, in fact, belonging to the very function that's currently running — now has a variable in it named t that points back to the stack frame itself, creating a loop just like the ones that I showed above.
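That loop is easy to observe. A minimal sketch:

```python
import sys

def demo():
    try:
        1 / 0
    except ZeroDivisionError:
        tb = sys.exc_info()[2]
        # tb -> stack frame -> local variables -> tb : a reference loop.
        return 'tb' in tb.tb_frame.f_locals

print(demo())   # True: the frame's locals now include the traceback itself
```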
The traceback contains references to all the active frames, which in turn contain references to all the local variables in those various frames -- those references are a big part of the very job of traceback and frame objects, so that's hardly surprising. So, if you add a reference back to the traceback (or fail to remove it promptly having temporarily added it), you inevitably form a big loop of references -- which interferes with garbage collection (and may stop it altogether if any of the objects in the loop belong to classes that override __del__, the finalizer method).
Especially in a long-running program, interfering with garbage collection is not the best of ideas, because you'll be holding on to memory you don't really need (for longer than necessary, or indefinitely if you've essentially blocked garbage collection on such loops by having them include objects with finalizers).
So, it's definitely best to get rid of tracebacks as soon as feasible, whether they come from exc_info or not!
What are the best practices and recommendations for using explicit del statement in python? I understand that it is used to remove attributes or dictionary/list elements and so on, but sometimes I see it used on local variables in code like this:
def action(x):
    result = None
    something = produce_something(x)
    if something:
        qux = foo(something)
        result = bar(qux, something)
        del qux
    del something
    return result
Are there any serious reasons for writing code like this?
Edit: consider qux and something to be something "simple" without a __del__ method.
I don't remember when I last used del -- the need for it is rare indeed, and typically limited to such tasks as cleaning up a module's namespace after a needed import or the like.
In particular, it's not true, as another (now-deleted) answer claimed, that
Using del is the only way to make sure an object's __del__ method is called
and it's very important to understand this. To help, let's make a class with a __del__ and check when it is called:
>>> class visdel(object):
...     def __del__(self): print 'del', id(self)
...
>>> d = visdel()
>>> a = list()
>>> a.append(d)
>>> del d
>>>
See? del doesn't "make sure" that __del__ gets called: del removes one reference, and only the removal of the last reference causes __del__ to be called. So, also:
>>> a.append(visdel())
>>> a[:]=[1, 2, 3]
del 550864
del 551184
when the last reference does go away (including in ways that don't involve del, such as a slice assignment as in this case, or other rebindings of names and other slots), then __del__ gets called -- whether del was ever involved in reducing the object's references, or not, makes absolutely no difference whatsoever.
So, unless you specifically need to clean up a namespace (typically a module's namespace, but conceivably that of a class or instance) for some specific reason, don't bother with del (it can be occasionally handy for removing an item from a container, but I've found that I'm often using the container's pop method or item or slice assignment even for that!-).
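For the container case, a quick illustration of that preference:

```python
d = {'a': 1, 'b': 2}
value = d.pop('a')    # removes the item *and* hands it back,
                      # unlike del d['a'], which just discards it
print(value, d)       # 1 {'b': 2}
```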
No.
I'm sure someone will come up with some silly reason to do this, e.g. to make sure someone doesn't accidentally use the variable after it's no longer valid. But probably whoever wrote this code was just confused. You can remove them.
When you are running programs that handle really large amounts of data (in my experience, when the total memory consumption of the program approaches something like 1 GB), deleting some objects:
del largeObject1
del largeObject2
…
can give your program the necessary breathing room to function without running out of memory. This can be the easiest way to modify a given program that fails with a “MemoryError” at runtime.
Actually, I just came across a use for this. If you use locals() to return a dictionary of local variables (useful when parsing things) then del is useful to get rid of a temporary that you don't want to return.
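A hedged sketch of that pattern (parse_assignment and its field names are made up for illustration):

```python
def parse_assignment(line):
    # Parse "key = value", returning the parsed fields as a dict via locals().
    key, sep, value = line.partition('=')
    key = key.strip()
    value = value.strip()
    del line, sep   # drop the temporaries so they don't end up in the result
    return locals()

print(parse_assignment('name = Ada'))   # {'key': 'name', 'value': 'Ada'}
```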