Can `del` make Python faster? - python

As a programmer, I generally try to avoid the del statement because it is often an extra complication that Python programs don't often need. However, when browsing the standard library (threading, os, etc...) and the pseudo-standard library (numpy, scipy, etc...) I see it used a non-zero amount of times, and I'd like to better understand when it is/isn't appropriate to use the del statement.
Specifically, I'm curious about the relationship between the Python del statement and the efficiency of a Python program. It seems to me that del might help a program run faster by reducing the amount of clutter lookup instructions need to sift through. However, I can also see a world where the extra instruction takes up more time than it saves.
My question is: does anyone have any interesting code snippets that demonstrate cases where del significantly changes the speed of the program? I'm most interested in cases where del improves the execution speed of a program, although non-trivial cases where del can really hurt are also interesting.

The main reason that standard Python libraries use del is not for speed but for namespace decluttering ("avoiding namespace pollution" is another term I believe I have seen for this). As user2357112 noted in a comment, it can also be used to break a traceback cycle.
Let's take a concrete example: line 58 of types.py in the cpython implementation reads:
del sys, _f, _g, _C, _c, # Not for export
If we look above, we find:
def _f(): pass
FunctionType = type(_f)
LambdaType = type(lambda: None) # Same as FunctionType
CodeType = type(_f.__code__)
MappingProxyType = type(type.__dict__)
SimpleNamespace = type(sys.implementation)
def _g():
    yield 1
GeneratorType = type(_g())
_f and _g are two of the names being deled; as the comment says, they are "not for export".[1]
You might think this is covered via:
__all__ = [n for n in globals() if n[:1] != '_']
(which is near the end of that same file), but as "What's the python __all__ module level variable for?" (and the linked "Can someone explain __all__ in Python?") note, this affects the names exported via from types import *, rather than what's visible via import types; dir(types).
It's not necessary to clean up your module namespace, but doing so prevents people from sneaking into it and using undefined items. So it's good for a couple of purposes.
[1] Looks like someone forgot to update this to include _ag. _GeneratorWrapper is harder to hide, unfortunately.
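To make the difference between __all__ and del concrete, here is a minimal sketch. It builds a throwaway module from source text and checks which names stay visible. The names demo_mod, _helper and build_module are made up for illustration and do not come from types.py.

```python
import types

# Source for a tiny module; _helper plays the role of _f in types.py.
SOURCE = '''
def _helper():
    pass

FunctionType = type(_helper)
__all__ = ["FunctionType"]
'''

def build_module(source, name="demo_mod"):
    """Execute source text in a fresh module object (illustrative helper)."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    return module

kept = build_module(SOURCE)                       # relies on __all__ only
cleaned = build_module(SOURCE + "del _helper\n")  # also deletes the helper

print("_helper" in dir(kept))     # True: __all__ does not hide it from dir()
print("_helper" in dir(cleaned))  # False: del removed the name itself
print(cleaned.FunctionType)       # <class 'function'>, still usable
```

The type object survives because FunctionType still refers to it; only the helper's name is gone.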

Specifically, I'm curious about the relationship between the Python del statement and the efficiency of a Python program.
As far as performance is concerned, del (excluding index deletion like del x[i]) is primarily useful for GC purposes. If you have a variable pointing to some large object that is no longer needed, deling that variable will (assuming there are no other references to it) deallocate that object (with CPython this happens immediately, as it uses reference counting). This could make the program faster if you'd otherwise be filling your RAM/caches; the only way to know is to actually benchmark it.
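A small sketch of that behaviour, assuming CPython (other implementations may free the object later). The class Big is a hypothetical stand-in for a large object, and a weak reference is used only to observe whether the object still exists.

```python
import weakref

class Big:
    """Stand-in for a large object (hypothetical class)."""
    def __init__(self):
        self.payload = bytearray(10_000_000)

obj = Big()
probe = weakref.ref(obj)   # a weak reference does not keep obj alive
alias = obj                # a second strong reference

del obj
still_alive = probe() is not None   # True: alias still refers to the object

del alias
freed = probe() is None             # True on CPython: last reference is gone

print(still_alive, freed)
```

Note that the first del frees nothing; only removing the last reference does.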
It seems to me that del might help a program run faster by reducing the amount of clutter lookup instructions need to sift through.
Unless you're using thousands of variables (which you shouldn't be), it's exceedingly unlikely that removing variables using del will make any noticeable difference in performance.
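If you want to check this on your own machine, something like the following could serve as a starting point. It times a global name lookup in a namespace with few names and in one with many. The helper lookup_time and the name counts are my own choices, and the absolute numbers will vary from run to run.

```python
import timeit

def lookup_time(extra_names, number=200_000):
    """Time repeated lookups of one global in a namespace of a given size."""
    namespace = {"name_%d" % i: i for i in range(extra_names)}
    namespace["target"] = 0
    # The statement "target" is just a global name lookup.
    return timeit.timeit("target", globals=namespace, number=number)

small = lookup_time(10)
large = lookup_time(100_000)
print("10 names: %.4fs, 100000 names: %.4fs" % (small, large))
```

Namespaces are dicts, so the lookup cost should not grow with the number of names in any way that del could meaningfully improve.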

Related

Setting a variable to a parameter value inline when calling a function

In other languages, like Java, you can do something like this:
String path;
if (exists(path = "/some/path"))
    my_path = path;
the point being that path is being set as part of specifying a parameter to a method call. I know that this doesn't work in Python. It is something that I've always wished Python had.
Is there any way to accomplish this in Python? What I mean here by "accomplish" is to be able to write both the call to exists and the assignment to path, as a single statement with no prior supporting code being necessary.
I'll be OK with it if a way of doing this requires the use of an additional call to a function or method, including anything I might write myself. I spent a little time trying to come up with such a module, but failed to come up with anything that was less ugly than just doing the assignment before calling the function.
UPDATE: @BrokenBenchmark's answer is perfect if one can assume Python 3.8 or better. Unfortunately, I can't yet do that, so I'm still searching for a solution to this problem that will work with Python 3.7 and earlier.
Yes, you can use the walrus operator if you're using Python 3.8 or above:
import os
if os.path.isdir((path := "/some/path")):
    my_path = path
I've come up with something that has some issues, but does technically get me where I was looking to be. Maybe someone else will have ideas for improving this to make it fully cool. Here's what I have:
# In a utility module somewhere
def v(varname, arg=None):
    if arg is not None:
        if not hasattr(v, 'vals'):
            v.vals = {}
        v.vals[varname] = arg
    return v.vals[varname]
# At point of use
if os.path.exists(v('path1', os.path.expanduser('~/.harmony/mnt/fetch_devqa'))):
    fetch_devqa_path = v('path1')
As you can see, this fits my requirement of no extra lines of code. The "variable" involved, path1 in this example, is stored on the function that implements all of this, on a per-variable-name basis.
One can question if this is concise and readable enough to be worth the bother. For me, the jury is still out. If not for the need to call the v() function a second time, I think I'd be good with it structurally.
The only functional problem I see with this is that it isn't thread-safe. Two copies of the code could run concurrently and run into a race condition between the two calls to v(). The same problem is greatly magnified if one fails to choose unique variable names every time this is used. That's probably the deal killer here.
Can anyone see how to use this to get to a similar solution without the drawbacks?
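One possible direction for the thread-safety part, offered as a sketch rather than a finished answer: keep the values in threading.local storage so that each thread only sees what it stored itself. The _MISSING sentinel is my addition so that None can be stored too. It does not solve the problem of reusing the same variable name within one thread.

```python
import threading

_store = threading.local()   # each thread gets its own attributes
_MISSING = object()          # sentinel so that None can be stored as a value

def v(varname, arg=_MISSING):
    """Store arg under varname if given; return the stored value either way."""
    vals = getattr(_store, "vals", None)
    if vals is None:
        vals = _store.vals = {}      # first use in this thread
    if arg is not _MISSING:
        vals[varname] = arg
    return vals[varname]

def worker(value, results):
    if v("path1", value):            # set and test in one expression
        results.append(v("path1"))   # read back this thread's own value

results = []
threads = [threading.Thread(target=worker, args=("/tmp/%d" % i, results))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))   # ['/tmp/0', '/tmp/1', '/tmp/2', '/tmp/3']
```

All four threads use the name path1, yet each reads back the value it set.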

NameError for globals

In some Python modules, I have code like this:
try:
    someGlobal
except NameError:
    someGlobal = []
This can be important in case I want to support module reloading and some certain object must not be overwritten (e.g. because I know that it is referred to directly).
Many editors (e.g. PyCharm) mark this as an error. Is there some other way to write the same code which is more idiomatic Python? Or is this already idiomatic Python and it's a fault of the editors to complain about this?
I'd go with
if 'someGlobal' not in dir():
    someGlobal = 23
This has the advantage of simplicity, but can be a bit slow if the module has a lot of globals, since dir() returns a list and the in operator on a list is O(N).
For speed, and at a modest disadvantage in terms of simplicity,
if 'someGlobal' not in vars():
    someGlobal = 23
which should be faster since vars() returns a dict, so the in operator on it is O(1).
It is an error, at least given the information available to the editor. So the editor isn't wrong; it is just that you are specifically coding for that error.

Analogue of defvar in Python

When writing Python code, I often find myself wanting to get behavior similar to Lisp's defvar. Basically, if some variable doesn't exist, I want to create it and assign a particular value to it. Otherwise, I don't want to do anything, and in particular, I don't want to override the variable's current value.
I looked around online and found this suggestion:
try:
    some_variable
except NameError:
    some_variable = some_expensive_computation()
I've been using it and it works fine. However, to me this has the look of code that's not paradigmatically correct. The code is four lines, instead of the 1 that would be required in Lisp, and it requires exception handling to deal with something that's not "exceptional."
The context is that I'm doing interactive development. I'm executing my Python code file frequently, as I improve it, and I don't want to run some_expensive_computation() each time I do so. I could arrange to run some_expensive_computation() by hand every time I start a new Python interpreter, but I'd rather do something automated, particularly so that my code can be run non-interactively. How would a seasoned Python programmer achieve this?
I'm using WinXP with SP3, Python 2.7.5 via Anaconda 1.6.2 (32-bit), and running inside Spyder.
It's generally a bad idea to rely on the existence or not of a variable having meaning. Instead, use a sentinel value to indicate that a variable is not set to an appropriate value. None is a common choice for this kind of sentinel, though it may not be appropriate if that is a possible output of your expensive computation.
So, rather than your current code, do something like this:
# early on in the program
some_variable = None
# later:
if some_variable is None:
    some_variable = some_expensive_computation()
# use some_variable here
Or, a version where None could be a significant value:
_sentinel = object()
some_variable = _sentinel # this means it doesn't have a meaningful value
# later
if some_variable is _sentinel:
    some_variable = some_expensive_computation()
It is hard to tell which is of greater concern to you, specific language features or a persistent session. Since you say:
The context is that I'm doing interactive development. I'm executing my Python code file frequently, as I improve it, and I don't want to run some_expensive_computation() each time I do so.
You may find that IPython provides a persistent, interactive environment that is pleasing to you.
Instead of writing Lisp in Python, just think about what you're trying to do. You want to avoid calling an expensive function twice and having it run two times. You can write your function to do that:
cache = {}
def f(x):
    if x in cache:
        return cache[x]
    result = ...
    cache[x] = result
    return result
Or make use of Python's decorators and just decorate the function with another function that takes care of the caching for you. Python 3.2 and later come with functools.lru_cache, which does just that:
import functools
@functools.lru_cache()
def f(x):
    return ...
There are quite a few memoization libraries on PyPI for 2.7.
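As a runnable illustration of the decorator approach (Python 3 only; the calls list is just a counter added for the demonstration, and the function body is a placeholder for real work):

```python
import functools

calls = []   # records each time the function body really runs

@functools.lru_cache(maxsize=None)
def some_expensive_computation(x):
    calls.append(x)
    return x * x          # placeholder for the expensive work

first = some_expensive_computation(12)
second = some_expensive_computation(12)   # served from the cache
print(first, second, len(calls))          # 144 144 1
```

One caveat for the interactive use case: the cache belongs to the function object, so re-running the file redefines the function and starts with an empty cache.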
For the use case you give, guarding with a try ... except seems like a good way to go about it: Your code is depending on leftover variables from a previous execution of your script.
But I agree that it's not a nice implementation of the concept "here's a default value, use it unless the variable is already set". Python does not directly support this for variables, but it does have a default-setter for dictionary keys:
myvalues = dict()
myvalues.setdefault("some_variable", 42)
print(myvalues["some_variable"])  # prints 42
The first argument of setdefault is the key, here a string naming the value to be defined.
If you had a complicated system of settings and defaults (like emacs does), you'd probably keep the system settings in their own dictionary, so this is all you need. In your case, you could also use setdefault directly on global variables (only), with the help of the built-in function globals() which returns a modifiable dictionary:
globals().setdefault("some_variable", 42)
But I would recommend using a dictionary for your persistent variables (you can use the try... except method to create it conditionally). It keeps things clean and it seems more... pythonic somehow.
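One thing to keep in mind with setdefault: its second argument is evaluated before the call, so passing some_expensive_computation() directly would run the computation every time, which defeats the purpose. A sketch of the difference, where the calls counter and the placeholder return value are mine:

```python
calls = []

def some_expensive_computation():
    calls.append(1)       # count real executions
    return 42             # placeholder result

namespace = globals()

# Eager: the argument is evaluated on every call, even if the key exists.
namespace.setdefault("eager_value", some_expensive_computation())
namespace.setdefault("eager_value", some_expensive_computation())
eager_calls = len(calls)          # 2

# Lazy: test for the name first and compute only when it is missing.
calls.clear()
for _ in range(2):
    if "lazy_value" not in namespace:
        namespace["lazy_value"] = some_expensive_computation()
lazy_calls = len(calls)           # 1

print(eager_calls, lazy_calls)    # 2 1
```

So for an expensive default, an explicit membership test (or the try ... except form) is the safer choice.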
Let me try to summarize what I've learned here:
Using exception handling for flow control is fine in Python. I could do it once to set up a dict in which I can store whatever I want.
There are libraries and language features that are designed for some form of persistence; these can provide "high road" solutions for some applications. The shelve module is an obvious candidate here, but I would construe "some form of persistence" broadly enough to include @Blender's suggestion to use memoization.
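For the shelve route, a minimal sketch might look like this. It uses Python 3 syntax; the file location, the key name and the cached_value helper are assumptions for illustration, and a temporary directory is used here only so the example cleans up after itself.

```python
import os
import shelve
import tempfile

calls = []

def some_expensive_computation():
    calls.append(1)               # count real executions
    return sum(range(1000))       # placeholder for the expensive work

def cached_value(db_path):
    """Return the stored result, computing and saving it on first use."""
    with shelve.open(db_path) as db:
        if "some_variable" not in db:
            db["some_variable"] = some_expensive_computation()
        return db["some_variable"]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cache")   # hypothetical location
    first = cached_value(path)
    second = cached_value(path)         # read back from disk, not recomputed

print(first, second, len(calls))        # 499500 499500 1
```

With a permanent path instead of a temporary one, the value would also survive interpreter restarts.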

In Python, is use of `del` statement a code smell?

I tend to use it whenever I am working on a prototype script, and:
1. Use a somewhat common variable (such as fileCount), and
2. Have a large method (20+ lines), and
3. Do not use classes or namespaces yet.
In this situation, in order to avoid potential variable clash, I delete the bugger as soon as I am done with it. I know, in production code I should avoid 1., 2., and 3., but going from a prototype that works to a completely polished class is time consuming. Sometimes I might want to settle for a sub-optimal, quick refactoring job. In that case I find keeping the del statements handy. Am I developing an unnecessary, bad habit? Is del totally avoidable? When would it be a good thing?
I don't think that del by itself is a code smell.
Reusing a variable name in the same namespace is definitely a code smell as is not using classes and other namespaces where appropriate. So using del to facilitate that sort of thing is a code smell.
The only really appropriate use of del that I can think of off the top of my head is breaking cyclic references which are often a code smell as well (and often times, this isn't even necessary). Remember, all del does is delete the reference to the object and not the object itself. That will be taken care of by either reference counting or garbage collecting.
>>> a = [1, 2]
>>> b = a
>>> del a
>>> a
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'a' is not defined
>>> b
[1, 2]
You can see that the list is kept alive after the del statement because b still holds a reference to it.
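For the cyclic-reference case, here is a sketch of why del alone is not enough, assuming CPython. The Node class is hypothetical, and automatic collection is switched off during the demonstration so that the timing is predictable.

```python
import gc
import weakref

class Node:
    """Hypothetical object that can point at another Node."""
    pass

gc.disable()              # keep the cycle collector from running on its own

a = Node()
b = Node()
a.other = b               # a -> b
b.other = a               # b -> a, forming a reference cycle
probe = weakref.ref(a)    # lets us observe a without keeping it alive

del a, b
alive_after_del = probe() is not None      # True: the cycle keeps both alive

gc.collect()
alive_after_collect = probe() is not None  # False: the collector freed them

gc.enable()
print(alive_after_del, alive_after_collect)   # True False
```

Breaking the cycle by hand (for example del a.other before del a) would let reference counting free the objects without waiting for the collector.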
So, while del isn't really a code smell, it can be associated with things that are.
Any code that's well organized in functions, classes and methods doesn't need del except in exceptional circumstances. Aim to build your apps well factored from the start by using more functions and methods, avoid reusing variable names, etc.
The use of a del statement is OK - it doesn't lead to any trouble, I use it often when I use Python as a replacement for shell scripts on my system, and when I'm making script experiments. However, if it appears often in a real application or library, it is an indication that something isn't all right, probably badly structured code. I never had to use it in an application, and you'd rarely see it used anywhere on code that's been released.

Using explicit del in python on local variables

What are the best practices and recommendations for using explicit del statement in python? I understand that it is used to remove attributes or dictionary/list elements and so on, but sometimes I see it used on local variables in code like this:
def action(x):
    result = None
    something = produce_something(x)
    if something:
        qux = foo(something)
        result = bar(qux, something)
        del qux
    del something
    return result
Are there any serious reasons for writing code like this?
Edit: consider qux and something to be something "simple" without a __del__ method.
I don't remember when I last used del -- the need for it is rare indeed, and typically limited to such tasks as cleaning up a module's namespace after a needed import or the like.
In particular, it's not true, as another (now-deleted) answer claimed, that
"Using del is the only way to make sure a object's __del__ method is called"
and it's very important to understand this. To help, let's make a class with a __del__ and check when it is called:
>>> class visdel(object):
...     def __del__(self): print 'del', id(self)
...
>>> d = visdel()
>>> a = list()
>>> a.append(d)
>>> del d
>>>
See? del doesn't "make sure" that __del__ gets called: del removes one reference, and only the removal of the last reference causes __del__ to be called. So, also:
>>> a.append(visdel())
>>> a[:]=[1, 2, 3]
del 550864
del 551184
when the last reference does go away (including in ways that don't involve del, such as a slice assignment as in this case, or other rebindings of names and other slots), then __del__ gets called -- whether del was ever involved in reducing the object's references, or not, makes absolutely no difference whatsoever.
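The same experiment as a Python 3 script, for anyone who wants to run it today. Instead of printing ids it records labels in a list; the labels and the events list are my additions. The timing shown assumes CPython's reference counting.

```python
events = []   # labels of objects whose __del__ has run

class VisDel:
    def __init__(self, label):
        self.label = label

    def __del__(self):
        events.append(self.label)

d = VisDel("first")
a = [d]
del d                        # the list still holds a reference
after_del = list(events)     # [] : __del__ has not run

a.append(VisDel("second"))
a[:] = [1, 2, 3]             # drops the last references without any del
after_slice = sorted(events) # ['first', 'second']

print(after_del, after_slice)
```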
So, unless you specifically need to clean up a namespace (typically a module's namespace, but conceivably that of a class or instance) for some specific reason, don't bother with del (it can be occasionally handy for removing an item from a container, but I've found that I'm often using the container's pop method or item or slice assignment even for that!-).
No.
I'm sure someone will come up with some silly reason to do this, e.g. to make sure someone doesn't accidentally use the variable after it's no longer valid. But probably whoever wrote this code was just confused. You can remove them.
When you are running programs handling really large amounts of data (in my experience, when the total memory consumption of the program approaches something like 1 GB) deleting some objects:
del largeObject1
del largeObject2
…
can give your program the necessary breathing room to function without running out of memory. This can be the easiest way to modify a given program, in case of a “MemoryError” runtime error.
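To see the effect in numbers, one option is the standard tracemalloc module, which reports how much memory Python has allocated. This is only a sketch: the list below is a stand-in for a real large object, and the exact figures depend on the interpreter.

```python
import tracemalloc

tracemalloc.start()

large_object = list(range(1_000_000))          # stand-in for real data
before, _ = tracemalloc.get_traced_memory()    # bytes currently allocated

del large_object                               # drop the only reference
after, _ = tracemalloc.get_traced_memory()

tracemalloc.stop()
print("released roughly %.1f MB" % ((before - after) / 1e6))
```

This only helps if no other reference to the object remains.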
Actually, I just came across a use for this. If you use locals() to return a dictionary of local variables (useful when parsing things) then del is useful to get rid of a temporary that you don't want to return.
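A small sketch of that pattern, with a made-up parse_setting function:

```python
def parse_setting(line):
    """Split 'key = value' text and return the parts as a dict via locals()."""
    key, sep, value = line.partition("=")
    key = key.strip()
    value = value.strip()
    del line, sep          # temporaries that should not be returned
    return locals()

print(parse_setting("color = blue"))   # {'key': 'color', 'value': 'blue'}
```

Without the del, the returned dict would also contain line and sep.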
