In one of my classes I have a number of methods that all draw values from the same dictionaries. However, if one of the methods tries to access a value that isn't there, it has to call another method to compute and store the value associated with that key.
I currently have this implemented as follows, where findCrackDepth(tonnage) assigns a value to self.lowCrackDepth[tonnage].
if tonnage not in self.lowCrackDepth:
    self.findCrackDepth(tonnage)
lcrack = self.lowCrackDepth[tonnage]
However, it would also be possible for me to do this as
try:
    lcrack = self.lowCrackDepth[tonnage]
except KeyError:
    self.findCrackDepth(tonnage)
    lcrack = self.lowCrackDepth[tonnage]
I assume there is a performance difference between the two related to how often the value is already in the dictionary. How big is this difference? I'm generating a few million such values (spread across many dictionaries in many instances of the class), and for each time the value doesn't exist, there are probably two times where it does.
Timing this is a delicate problem: you need care to avoid lasting side effects (each miss inserts a key), and the performance trade-off depends on the percentage of missing keys. So, consider a dil.py file as follows:
def make(percentmissing):
    global d
    # build a dict where percentmissing% of the keys 0..99 are absent
    d = dict.fromkeys(range(100 - percentmissing), 1)

def addit(d, k):
    # stand-in for the expensive "compute and store" method
    d[k] = k

def with_in():
    dc = d.copy()  # work on a copy to avoid lasting side effects
    for k in range(100):
        if k not in dc:
            addit(dc, k)
        lc = dc[k]

def with_ex():
    dc = d.copy()
    for k in range(100):
        try:
            lc = dc[k]
        except KeyError:
            addit(dc, k)
            lc = dc[k]

def with_ge():
    dc = d.copy()
    for k in range(100):
        lc = dc.get(k)
        if lc is None:
            addit(dc, k)
            lc = dc[k]
and a series of timeit calls such as:
$ python -mtimeit -s'import dil; dil.make(10)' 'dil.with_in()'
10000 loops, best of 3: 28 usec per loop
$ python -mtimeit -s'import dil; dil.make(10)' 'dil.with_ex()'
10000 loops, best of 3: 41.7 usec per loop
$ python -mtimeit -s'import dil; dil.make(10)' 'dil.with_ge()'
10000 loops, best of 3: 46.6 usec per loop
This shows that, with 10% missing keys, the in check is substantially the fastest way.
$ python -mtimeit -s'import dil; dil.make(1)' 'dil.with_in()'
10000 loops, best of 3: 24.6 usec per loop
$ python -mtimeit -s'import dil; dil.make(1)' 'dil.with_ex()'
10000 loops, best of 3: 23.4 usec per loop
$ python -mtimeit -s'import dil; dil.make(1)' 'dil.with_ge()'
10000 loops, best of 3: 42.7 usec per loop
With just 1% missing keys, the exception approach is marginally fastest (and the get approach remains the slowest one in either case).
So, for optimal performance, unless the vast majority (99%+) of lookups is going to succeed, the in approach is preferable.
Of course, there's another, elegant possibility: adding a dict subclass like...:
class dd(dict):
    def __init__(self, *a, **k):
        dict.__init__(self, *a, **k)
    def __missing__(self, k):
        addit(self, k)
        return self[k]

def with_dd():
    dc = dd(d)
    for k in range(100):
        lc = dc[k]
However...:
$ python -mtimeit -s'import dil; dil.make(1)' 'dil.with_dd()'
10000 loops, best of 3: 46.1 usec per loop
$ python -mtimeit -s'import dil; dil.make(10)' 'dil.with_dd()'
10000 loops, best of 3: 55 usec per loop
...while slick indeed, this is not a performance winner -- it's about even with the get approach, or slower, just with much nicer-looking code to use it. (defaultdict, semantically analogous to this dd class, would be a performance win if it were applicable, but that's because the __missing__ special method, in that case, is implemented in well-optimized C code.)
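For illustration, a minimal sketch of where defaultdict does apply: its factory takes no arguments, so it only fits when the default value does not depend on the key (unlike findCrackDepth above):

from collections import defaultdict

counts = defaultdict(int)      # missing keys default to int(), i.e. 0
for word in ["a", "b", "a"]:
    counts[word] += 1          # first access never raises KeyError
print(dict(counts))            # {'a': 2, 'b': 1}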
Checking if a key exists is cheaper than, or at least as cheap as, retrieving it. So use the if not in solution, which is much cleaner and more readable.
According to your question, a key not existing is not an error-like case, so there's no good reason to let Python raise an exception (even though you catch it immediately); and if you have an if not in check, everyone knows your intention: to get the existing value or otherwise generate it.
When in doubt, profile.
Run a test to see if, in your environment, one runs faster than another.
If it is exceptional, use an exception. If you expect the key to be in there, use try/except; if you don't know whether the key is in there, use not in.
I believe the .get() method of a dict has a parameter for setting the default value. You could use that and have it in one line. I'm not sure how it affects performance though.
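For reference, a minimal sketch of that one-liner. Note that get() with a default does not store the default (setdefault() does), and both evaluate the default eagerly, so an expensive computation would run even on hits:

d = {"a": 1}
x = d.get("b", 0)         # returns 0; "b" is NOT inserted into d
y = d.setdefault("c", 0)  # returns 0 AND inserts d["c"] = 0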
Related
I've recently noticed that the built-in list has a list.clear() method. So far, when I wanted to ensure a list is empty, I just create a new list: l = [].
I was curious if it makes a difference, so I measured it:
$ python --version
Python 3.11.0
$ python -m timeit 'a = [1, 2, 3, 4]; a = []'
5000000 loops, best of 5: 61.5 nsec per loop
$ python -m timeit 'a = [1, 2, 3, 4]; a.clear()'
5000000 loops, best of 5: 57.4 nsec per loop
So creating a new empty list is about 7% slower than using clear() for small lists.
For bigger lists, it seems to be faster to just create a new list:
$ python -m timeit 'a = list(range(10_000)); a = []'
2000 loops, best of 5: 134 usec per loop
$ python -m timeit 'a = list(range(10_000)); a = []'
2000 loops, best of 5: 132 usec per loop
$ python -m timeit 'a = list(range(10_000)); a = []'
2000 loops, best of 5: 134 usec per loop
$ python -m timeit 'a = list(range(10_000)); a.clear()'
2000 loops, best of 5: 143 usec per loop
$ python -m timeit 'a = list(range(10_000)); a.clear()'
2000 loops, best of 5: 139 usec per loop
$ python -m timeit 'a = list(range(10_000)); a.clear()'
2000 loops, best of 5: 139 usec per loop
Why is that the case?
Edit: Small clarification: I am aware that both are (in some scenarios) not semantically the same. If you clear a list, you could still have other references to it. When you create a new list object, nothing else refers to it yet. For this question, I don't care about this potential difference; just assume there is exactly one reference to that list.
I was curious if it makes a difference, so I measured it:
This is not the right way to do the measurement. The setup code should be separated out, so that it is not included in the timing. In fact, all of these operations are O(1); you see O(N) results because creating the original data is O(N).
So far, when I wanted to ensure a list is empty, I just create a new list: l = [].
list.clear is not equivalent to creating a new list. Consider:
a = [1,2,3]
b = a
Doing a = [] will not cause b to change, because it only rebinds the a name. But doing a.clear() will cause b to change, because both names refer to the same list object, which was mutated.
(Using a = [] is fine when that result is intentional, of course. Similarly, user-defined objects are often best "reset" by re-creating them.)
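A minimal demonstration of that difference:

a = [1, 2, 3]
b = a           # b and a name the same list object

a = []          # rebinds the name a; the original list is untouched
print(b)        # [1, 2, 3]

a = b           # point a back at the shared list
a.clear()       # mutates the shared object in place
print(b)        # []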
Instead, a.clear() is equivalent to a[:] = [] (or the same with some other empty sequence). However, this is harder to read. The Zen of Python tells us: "Beautiful is better than ugly"; "Explicit is better than implicit"; "There should be one-- and preferably only one --obvious way to do it".
It's also clearly (sorry) faster:
$ python -m timeit --setup 'a = list(range(10_000))' 'a.clear()'
10000000 loops, best of 5: 34.9 nsec per loop
$ python -m timeit --setup 'a = list(range(10_000))' 'a[:] = []'
5000000 loops, best of 5: 67.1 nsec per loop
$ python -m timeit --setup 'a = list(range(10_000))' 'a[:] = ()'
5000000 loops, best of 5: 51.7 nsec per loop
The "clever" ways require creating temporary objects, both for the iterable and for the slice used to index the list. Either way, it will end up in C code that does relatively simple, O(1) pointer bookkeeping (and a memory deallocation).
I have to run a very large number of simulations on an R*C grid. These simulations alter the grid, so I need to copy my reference grid before each one, and then apply my simulation function to the fresh grid.
What is the fastest way to do this in Python?
Since I have not found a similar question on StackOverflow, I did the tests myself and decided to post them here thinking they could be useful to other people.
The answer will be a community response so that other people can add new measurements with possibly other techniques.
If you add another method, remember to measure all the old tests and update them, because the timings depend on the computer used; this avoids biasing the results.
I used a bash variable for setting up the timeit tests:
setup="""
R = 100
C = 100
from copy import deepcopy
import numpy as np
ref = [[i for i in range(C)] for _ in range(R)]
ref_np = np.array(ref)
cp = [[100 for i in range(C)] for _ in range(R)]
cp_np = np.array(cp)
"""
Just for convenience, I also set a temporary alias pybench:
alias pybench='python3.5 -m timeit -s "$setup" $1'
Python 3
Python 3.5.0+ (default, Oct 11 2015, 09:05:38)
Deepcopy:
>>> pybench "cp = deepcopy(ref)"
100 loops, best of 3: 8.29 msec per loop
Modifying pre-created array using index:
>>> pybench \
"for y in range(R):
for x in range(C):
cp[y][x] = ref[y][x]"
1000 loops, best of 3: 1.16 msec per loop
Nested list comprehension:
>>> pybench "cp = [[x for x in row] for row in ref]"
1000 loops, best of 3: 390 usec per loop
Slicing:
>>> pybench "cp = [row[:] for row in ref]"
10000 loops, best of 3: 45.8 usec per loop
NumPy copy:
>>> pybench "cp_np = np.copy(ref_np)"
100000 loops, best of 3: 6.03 usec per loop
Copying to pre-created NumPy array:
>>> pybench "np.copyto(cp_np, ref_np)"
100000 loops, best of 3: 4.52 usec per loop
There is nothing very surprising in these results: as you might have guessed, using NumPy is enormously faster, especially if one avoids creating a new array each time.
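For instance, a minimal sketch (with a hypothetical simulation step) of the pre-allocated pattern that the last benchmark measures:

import numpy as np

R, C = 100, 100
ref_np = np.arange(R * C).reshape(R, C)
cp_np = np.empty_like(ref_np)   # allocate the work grid once

for _ in range(3):              # hypothetical simulation loop
    np.copyto(cp_np, ref_np)    # refresh the grid in place, no new allocation
    cp_np += 1                  # stand-in for the simulation step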
To add to the answer from Delgan: the documentation for numpy.copy says that numpy.ndarray.copy is the preferred method. So for now, without doing a timing test, I will use numpy.ndarray.copy.
https://numpy.org/doc/stable/reference/generated/numpy.copy.html
https://numpy.org/doc/stable/reference/generated/numpy.ndarray.copy.html
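A minimal sketch of that method-call form, reusing names from the setup above:

import numpy as np

ref_np = np.array([[i for i in range(100)] for _ in range(100)])
cp_np = ref_np.copy()   # method form; equivalent to np.copy(ref_np) here
assert cp_np is not ref_np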
I have this simple function that partitions a list and returns an index i into the list such that elements at indices less than i are smaller than list[i] and elements at indices greater than i are bigger.
def partition(arr):
    first_high = 0
    pivot = len(arr) - 1
    for i in range(len(arr)):
        if arr[i] < arr[pivot]:
            arr[first_high], arr[i] = arr[i], arr[first_high]
            first_high = first_high + 1
    arr[first_high], arr[pivot] = arr[pivot], arr[first_high]
    return first_high

if __name__ == "__main__":
    arr = [1, 5, 4, 6, 0, 3]
    pivot = partition(arr)
    print(pivot)
The runtime is substantially longer with Python 3.4 than with Python 2.7.6. On OS X:
time python3 partition.py
real 0m0.040s
user 0m0.027s
sys 0m0.010s
time python partition.py
real 0m0.031s
user 0m0.018s
sys 0m0.011s
Same thing on Ubuntu 14.04 / VirtualBox:
python3:
real 0m0.049s
user 0m0.034s
sys 0m0.015s
python:
real 0m0.044s
user 0m0.022s
sys 0m0.018s
Is Python 3 inherently slower than Python 2.7, or are there specific optimizations to the code to make it run as fast as on Python 2.7?
As mentioned in the comments, you should be benchmarking with timeit rather than with OS tools.
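For instance, assuming the script is saved as partition.py, something along these lines (command sketch only; the list literal is rebuilt on every iteration, so each call partitions fresh data):

$ python -m timeit -s 'from partition import partition' 'partition([1, 5, 4, 6, 0, 3])'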
My guess is that the range function is probably performing a little slower in Python 3. In Python 2 it simply returns a list; in Python 3 it returns a range object, which behaves more or less like a generator. I did some benchmarking, and this was the result, which may be a hint at what you're experiencing:
python -mtimeit "range(10)"
1000000 loops, best of 3: 0.474 usec per loop
python3 -mtimeit "range(10)"
1000000 loops, best of 3: 0.59 usec per loop
python -mtimeit "range(100)"
1000000 loops, best of 3: 1.1 usec per loop
python3 -mtimeit "range(100)"
1000000 loops, best of 3: 0.578 usec per loop
python -mtimeit "range(1000)"
100000 loops, best of 3: 11.6 usec per loop
python3 -mtimeit "range(1000)"
1000000 loops, best of 3: 0.66 usec per loop
As you can see, when the input provided to range is small, it tends to be faster in Python 2. If the input grows, then Python 3's range behaves better (it is created lazily, so its construction cost barely depends on size, whereas Python 2 materializes the whole list up front).
My suggestion: test the code for larger arrays, with a hundred or a thousand elements.
Actually, I went further and tested a complete iteration through the elements. The results were totally in favor of Python 2:
python -mtimeit "for i in range(1000):pass"
10000 loops, best of 3: 31 usec per loop
python3 -mtimeit "for i in range(1000):pass"
10000 loops, best of 3: 45.3 usec per loop
python -mtimeit "for i in range(10000):pass"
1000 loops, best of 3: 330 usec per loop
python3 -mtimeit "for i in range(10000):pass"
1000 loops, best of 3: 480 usec per loop
My conclusion is that it is probably faster to iterate through a list than through a generator, although the latter is definitely more efficient in memory consumption. This is a classic example of the trade-off between speed and memory. That said, the speed difference is not that big per se (fractions of a millisecond), so you should weigh it against what's better for your program.
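If iteration speed matters more than memory in a Python 3 hot loop, one option (a sketch of the trade-off just described) is to materialize the range once:

indices = list(range(10000))  # pay the O(N) memory cost once
for i in indices:             # then iterate a concrete list, as Python 2 did
    pass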
I have some data which is either 1 or 2 dimensional. I want to iterate through every pattern in the data set and perform foo() on it. If the data is 1D then add this value to a list, if it's 2D then take the mean of the inner list and append this value.
I saw this question and decided to implement it by checking whether each row is an instance of list. I can't use numpy for this application.
outputs = []
for row in data:
    if isinstance(row, list):
        vals = [foo(window) for window in row]
        outputs.append(sum(vals) / float(len(vals)))
    else:
        outputs.append(foo(row))
Is there a neater way of doing this? On each run, every pattern will have the same dimensionality, so I could make a separate class for 1D/2D but that will add a lot of classes to my code. The datasets can get quite large so a quick solution is preferable.
Your code is already almost as neat and fast as it can be. The only slight improvement is replacing [foo(window) for window in row] with map(foo, row), which can be seen by the benchmarks:
> python -m timeit "foo = lambda x: x+1; list(map(foo, range(1000)))"
10000 loops, best of 3: 132 usec per loop
> python -m timeit "foo = lambda x: x+1; [foo(a) for a in range(1000)]"
10000 loops, best of 3: 140 usec per loop
isinstance() already seems faster than its counterparts hasattr() and type() ==:
> python -m timeit "[isinstance(i, int) for i in range(1000)]"
10000 loops, best of 3: 117 usec per loop
> python -m timeit "[hasattr(i, '__iter__') for i in range(1000)]"
1000 loops, best of 3: 470 usec per loop
> python -m timeit "[type(i) == int for i in range(1000)]"
10000 loops, best of 3: 130 usec per loop
However, if you count short as neat, you can also simplify your code (after the map replacement) to:
mean = lambda x: sum(x) / float(len(x))  # or `from statistics import mean` in Python 3.4
output = [foo(r) if isinstance(r, int) else mean(map(foo, r)) for r in data]
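A quick usage sketch with made-up data (foo and the inputs are hypothetical; note that on Python 3, map is lazy, so it must be materialized before len() works):

foo = lambda x: x + 1
mean = lambda x: sum(x) / float(len(x))

data = [[1, 2], [3, 4]]  # 2D case
output = [foo(r) if isinstance(r, int) else mean(list(map(foo, r))) for r in data]
print(output)  # [2.5, 4.5]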
I recently wrote a quick and dirty BFS implementation, to find diamonds in a directed graph.
The BFS loop looked like this:
while toVisit:
    y = toVisit.pop()
    if y in visited: return "Found diamond"
    visited.add(y)
    toVisit.extend(G[y])
(G is the graph - a dictionary from node names to the lists of their neighbors)
Then comes the interesting part:
I thought that list.pop() is probably too slow, so I ran a profiler to compare the speed of this implementation with deque.pop - and got a bit of an improvement. Then I compared it with y = toVisit[0]; toVisit = toVisit[1:], and to my surprise, the last implementation is actually the fastest one.
Does this make any sense?
Is there any performance reason to ever use list.pop() instead of the apparently much faster two-liner?
Your measurement is wrong. With CPython 2.7 on x64, I get the following results:
$ python -m timeit 'l = list(range(10000))' 'while l: l = l[1:]'
10 loops, best of 3: 365 msec per loop
$ python -m timeit 'l = list(range(10000))' 'while l: l.pop()'
1000 loops, best of 3: 1.82 msec per loop
$ python -m timeit 'import collections' \
'l = collections.deque(list(range(10000)))' 'while l: l.pop()'
1000 loops, best of 3: 1.67 msec per loop
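One caveat worth adding: list.pop() and the toVisit[0] / toVisit = toVisit[1:] two-liner remove from opposite ends of the list, so they visit nodes in different orders. For a true FIFO queue (breadth-first order), deque gives O(1) removal from the front, where list.pop(0) would be O(n):

from collections import deque

toVisit = deque([1, 2, 3])
y = toVisit.popleft()  # O(1) front removal; preserves BFS order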
Use generators for performance
python -m timeit 'import itertools' 'l=iter(xrange(10000))' 'while next(l, None): l,a = itertools.tee(l)'
1000000 loops, best of 3: 0.986 usec per loop