I need to process lots of data held in lists, so I have been looking into the best way of doing this in Python.
The main ways I've come up with are using:
- List comprehensions
- Generator expressions
- Functional-style operations (map, filter, etc.)
I know generally list comprehensions are probably the most "Pythonic" method, but what is best in terms of performance?
Inspired by this answer: Python List Comprehension Vs. Map, I've tweaked the benchmarks so that generator expressions can be compared as well:
For built-ins:
$ python -mtimeit -s 'import math;xs=range(10)' 'sum(map(math.sqrt, xs))'
100000 loops, best of 3: 2.96 usec per loop
$ python -mtimeit -s 'import math;xs=range(10)' 'sum([math.sqrt(x) for x in xs])'
100000 loops, best of 3: 3.75 usec per loop
$ python -mtimeit -s 'import math;xs=range(10)' 'sum(math.sqrt(x) for x in xs)'
100000 loops, best of 3: 3.71 usec per loop
For lambdas:
$ python -mtimeit -s'xs=range(10)' 'sum(map(lambda x: x+2, xs))'
100000 loops, best of 3: 2.98 usec per loop
$ python -mtimeit -s'xs=range(10)' 'sum([x+2 for x in xs])'
100000 loops, best of 3: 1.66 usec per loop
$ python -mtimeit -s'xs=range(10)' 'sum(x+2 for x in xs)'
100000 loops, best of 3: 1.48 usec per loop
Making a list:
$ python -mtimeit -s'xs=range(10)' 'list(map(lambda x: x+2, xs))'
100000 loops, best of 3: 3.19 usec per loop
$ python -mtimeit -s'xs=range(10)' '[x+2 for x in xs]'
100000 loops, best of 3: 1.21 usec per loop
$ python -mtimeit -s'xs=range(10)' 'list(x+2 for x in xs)'
100000 loops, best of 3: 3.36 usec per loop
It appears that map is best when paired with built-in functions; otherwise, generator expressions beat out list comprehensions. Along with slightly cleaner syntax, generator expressions also save a lot of memory compared to list comprehensions because they are lazily evaluated. So in the absence of specific tests for your application: use map with built-ins, a list comprehension when you require a list result, and a generator expression otherwise. If you're really concerned with performance, you might also look at whether you actually need lists at all points in your program.
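For reference, here is a minimal sketch of the three styles applied to the same transformation (the sqrt example and the data are just illustrative):

import math

xs = range(10)

# map with a built-in: no Python-level lambda call per element
total_map = sum(map(math.sqrt, xs))

# list comprehension: builds the whole intermediate list in memory first
total_listcomp = sum([math.sqrt(x) for x in xs])

# generator expression: lazily yields values straight into sum()
total_genexp = sum(math.sqrt(x) for x in xs)

assert total_map == total_listcomp == total_genexp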
Related
I've been doing some benchmarking of Python code and have noticed a strange thing:
$ python -m timeit '{x**2 for x in range(10_000)}'
100 loops, best of 3: 3.3 msec per loop
$ python -m timeit 'set([x**2 for x in range(10_000)])'
100 loops, best of 3: 3.21 msec per loop
Why is a set comprehension slower than using the set() function with a list comprehension, which creates an intermediate list? I'm using Python 3.6.6.
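For anyone who wants to reproduce the comparison from a script rather than the shell, here is a sketch using the timeit module (the iteration count is arbitrary, and absolute numbers will vary by machine and Python version):

import timeit

# Set comprehension: builds the set directly
t_setcomp = timeit.timeit('{x**2 for x in range(10_000)}', number=100)

# set() over a list comprehension: builds an intermediate list first
t_listcomp = timeit.timeit('set([x**2 for x in range(10_000)])', number=100)

print("set comprehension:          {:.3f} s".format(t_setcomp))
print("set() + list comprehension: {:.3f} s".format(t_listcomp))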
Prompted by this question, I checked the performance on my laptop.
Surprisingly, I found that pop(0) on a list is faster than popleft() on a deque:
python -m timeit 'l = range(10000)' 'l.pop(0)'
gives:
10000 loops, best of 3: 66 usec per loop
While:
python -m timeit 'import collections' 'l = collections.deque(range(10000))' 'l.popleft()'
gives:
10000 loops, best of 3: 123 usec per loop
Moreover, I checked the performance in Jupyter and found the same outcome:
%timeit l = range(10000); l.pop(0)
10000 loops, best of 3: 64.7 µs per loop
from collections import deque
%timeit l = deque(range(10000)); l.popleft()
10000 loops, best of 3: 122 µs per loop
What is the reason?
The problem is that your timeit call also times the deque/list creation, and creating a deque is obviously much slower because of the chaining.
In the command line, you can pass the setup to timeit using the -s option like this:
python -m timeit -s"import collections, time; l = collections.deque(range(10000000))" "l.popleft()"
Also, since the setup is only run once, you eventually get a pop error (empty deque), because I haven't changed the default number of iterations; so I created a large deque to compensate, and got:
10000000 loops, best of 3: 0.0758 usec per loop
On the other hand, with a list it's slower:
python -m timeit -s "l = list(range(10000000))" "l.pop(0)"
100 loops, best of 3: 9.72 msec per loop
I have also coded the benchmark in a script (more convenient), with a setup statement (so the construction isn't timed) and 99999 iterations on a 100000-element list:
import timeit
print(timeit.timeit(stmt='l.pop(0)', setup='l = list(range(100000))', number=99999))
print(timeit.timeit(stmt='l.popleft()', setup='import collections; l = collections.deque(range(100000))', number=99999))
no surprise: deque wins:
2.442976927292288 for pop in list
0.007311641921253109 for pop in deque
Note that l.pop() on the list runs in 0.011536903686244897 seconds, which is very fast: popping the last element requires no shifting, as expected.
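That last timing can be reproduced in the same style of script (a sketch; the size and iteration count mirror the script above):

import timeit

# pop() removes the *last* element of the list, which is O(1): no shifting is needed
print(timeit.timeit(stmt='l.pop()', setup='l = list(range(100000))', number=99999))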
I have this simple function that partitions a list and returns an index i in the list such that elements at indices less than i are smaller than list[i] and elements at indices greater than i are bigger.
def partition(arr):
    first_high = 0
    pivot = len(arr) - 1
    for i in range(len(arr)):
        if arr[i] < arr[pivot]:
            arr[first_high], arr[i] = arr[i], arr[first_high]
            first_high = first_high + 1
    arr[first_high], arr[pivot] = arr[pivot], arr[first_high]
    return first_high

if __name__ == "__main__":
    arr = [1, 5, 4, 6, 0, 3]
    pivot = partition(arr)
    print(pivot)
The runtime is substantially bigger with Python 3.4 than with Python 2.7.6
on OS X:
time python3 partition.py
real 0m0.040s
user 0m0.027s
sys 0m0.010s
time python partition.py
real 0m0.031s
user 0m0.018s
sys 0m0.011s
Same thing on Ubuntu 14.04 / VirtualBox:
python3:
real 0m0.049s
user 0m0.034s
sys 0m0.015s
python:
real 0m0.044s
user 0m0.022s
sys 0m0.018s
Is Python 3 inherently slower than Python 2.7, or are there any specific optimizations to the code that would make it run as fast as on Python 2.7?
As mentioned in the comments, you should be benchmarking with timeit rather than with OS tools.
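For instance, here is a sketch of how the partition function itself could be timed with the timeit module (the input list is just the question's example, repeated so the loop has something to chew on):

import timeit

setup = '''
def partition(arr):
    first_high = 0
    pivot = len(arr) - 1
    for i in range(len(arr)):
        if arr[i] < arr[pivot]:
            arr[first_high], arr[i] = arr[i], arr[first_high]
            first_high = first_high + 1
    arr[first_high], arr[pivot] = arr[pivot], arr[first_high]
    return first_high

data = [1, 5, 4, 6, 0, 3] * 1000
'''

# partition() reorders the list in place, so pass a copy each run to keep the input identical
print(timeit.timeit('partition(data[:])', setup=setup, number=1000))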
My guess is that the range function is probably performing a little slower in Python 3. In Python 2 it simply returns a list, while in Python 3 it returns a range object, which behaves more or less like a generator. I did some benchmarking and this was the result, which may be a hint at what you're experiencing:
python -mtimeit "range(10)"
1000000 loops, best of 3: 0.474 usec per loop
python3 -mtimeit "range(10)"
1000000 loops, best of 3: 0.59 usec per loop
python -mtimeit "range(100)"
1000000 loops, best of 3: 1.1 usec per loop
python3 -mtimeit "range(100)"
1000000 loops, best of 3: 0.578 usec per loop
python -mtimeit "range(1000)"
100000 loops, best of 3: 11.6 usec per loop
python3 -mtimeit "range(1000)"
1000000 loops, best of 3: 0.66 usec per loop
As you can see, when the input to range is small, it tends to be faster in Python 2. If the input grows, then Python 3's range behaves better.
My suggestion: test the code for larger arrays, with a hundred or a thousand elements.
Actually, I went further and tested a complete iteration through the elements. The results were totally in favor of Python 2:
python -mtimeit "for i in range(1000):pass"
10000 loops, best of 3: 31 usec per loop
python3 -mtimeit "for i in range(1000):pass"
10000 loops, best of 3: 45.3 usec per loop
python -mtimeit "for i in range(10000):pass"
1000 loops, best of 3: 330 usec per loop
python3 -mtimeit "for i in range(10000):pass"
1000 loops, best of 3: 480 usec per loop
My conclusion is that it is probably faster to iterate through a concrete list than through a range object, although the latter is definitely more efficient regarding memory consumption. This is a classic example of the trade-off between speed and memory. That said, the speed difference is not that big in absolute terms (well under a millisecond here), so you should weigh it against what's better for your program.
As the title states, how expensive are Python dictionaries to handle? Creation, insertion, updating, deletion, all of it.
Asymptotic time complexities are interesting in themselves, but also how they compare to e.g. tuples or plain lists.
dicts (just like sets, when you don't need to associate a value with each key but simply record whether a key is present or absent) are pretty heavily optimized. Creating a dict from N keys or key/value pairs is O(N), fetching is O(1), putting is amortized O(1), and so forth. You can't really do anything substantially better for any non-tiny container!
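If you want to see the O(1) lookup behaviour directly, a sketch along these lines (the sizes are arbitrary) shows that fetching a key takes roughly the same time however many keys the dict holds:

import timeit

for n in (10, 1000, 100000):
    setup = "d = dict.fromkeys(range({}))".format(n)
    # Fetching an existing key: the time should stay roughly constant as n grows
    t = timeit.timeit("d[0]", setup=setup, number=1000000)
    print("{:>7} keys: {:.3f} s for 1e6 lookups".format(n, t))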
For tiny containers, you can easily check the boundaries with timeit-based benchmarks. For example:
$ python -mtimeit -s'empty=()' '23 in empty'
10000000 loops, best of 3: 0.0709 usec per loop
$ python -mtimeit -s'empty=set()' '23 in empty'
10000000 loops, best of 3: 0.101 usec per loop
$ python -mtimeit -s'empty=[]' '23 in empty'
10000000 loops, best of 3: 0.0716 usec per loop
$ python -mtimeit -s'empty=dict()' '23 in empty'
10000000 loops, best of 3: 0.0926 usec per loop
this shows that checking membership in empty lists or tuples is faster, by a whopping 20-30 nanoseconds, than checking membership in empty sets or dicts; when every nanosecond matters, this info might be relevant to you. Moving up a bit...:
$ python -mtimeit -s'empty=range(7)' '23 in empty'
1000000 loops, best of 3: 0.318 usec per loop
$ python -mtimeit -s'empty=tuple(range(7))' '23 in empty'
1000000 loops, best of 3: 0.311 usec per loop
$ python -mtimeit -s'empty=set(range(7))' '23 in empty'
10000000 loops, best of 3: 0.109 usec per loop
$ python -mtimeit -s'empty=dict.fromkeys(range(7))' '23 in empty'
10000000 loops, best of 3: 0.0933 usec per loop
you see that for 7-item containers (not including the one of interest) the balance of performance has shifted, and now dicts and sets have the advantage by HUNDREDS of nanoseconds. When the item of interest IS present:
$ python -mtimeit -s'empty=range(7)' '5 in empty'
1000000 loops, best of 3: 0.246 usec per loop
$ python -mtimeit -s'empty=tuple(range(7))' '5 in empty'
1000000 loops, best of 3: 0.25 usec per loop
$ python -mtimeit -s'empty=dict.fromkeys(range(7))' '5 in empty'
10000000 loops, best of 3: 0.0921 usec per loop
$ python -mtimeit -s'empty=set(range(7))' '5 in empty'
10000000 loops, best of 3: 0.112 usec per loop
dicts and sets don't gain much, but tuples and lists do, even though dicts and sets remain vastly faster.
And so on, and so forth -- timeit makes it trivially easy to run micro-benchmarks (strictly speaking, warranted only for those exceedingly rare situations where nanoseconds DO matter, but, easy enough to do, that it's no big hardship to check for OTHER cases;-).
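The same membership checks can also be scripted with the timeit module instead of the shell; a sketch, with the container sizes mirroring the one-liners above:

import timeit

setups = {
    "tuple": "empty = tuple(range(7))",
    "list":  "empty = list(range(7))",
    "set":   "empty = set(range(7))",
    "dict":  "empty = dict.fromkeys(range(7))",
}

for name, setup in setups.items():
    # Same membership test as the shell one-liners: 23 is absent from every container
    t = timeit.timeit("23 in empty", setup=setup, number=1000000)
    print("{:>5}: {:.4f} s for 1e6 membership checks".format(name, t))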
Dictionaries are one of the more heavily tuned parts of Python, since they underlie so much of the language. For example, members of a class, and variables in a stack frame are both stored internally in dictionaries. They will be a good choice if they are the right data structure.
Choosing between lists and dicts based on performance seems odd: they do different things. Maybe you can tell us more about the problem you are trying to solve.
It's the 6th of July, 2022, and I thought I'd update the performance numbers from the accepted answer: an AMD 5900HS in low power mode (but plugged in), Python 3.10.4, Windows 10.
python -mtimeit -s"empty=()" "23 in empty"
20000000 loops, best of 5: 16.9 nsec per loop
python -mtimeit -s"empty=set()" "23 in empty"
20000000 loops, best of 5: 18.1 nsec per loop
python -mtimeit -s"empty=[]" "23 in empty"
20000000 loops, best of 5: 15.1 nsec per loop
python -mtimeit -s"empty=dict()" "23 in empty"
10000000 loops, best of 5: 21.7 nsec per loop
python -mtimeit -s"empty=range(7)" "23 in empty"
10000000 loops, best of 5: 30.9 nsec per loop
python -mtimeit -s"empty=tuple(range(7))" "23 in empty"
5000000 loops, best of 5: 60 nsec per loop
python -mtimeit -s"empty=set(range(7))" "23 in empty"
20000000 loops, best of 5: 16.6 nsec per loop
python -mtimeit -s"empty=dict.fromkeys(range(7))" "23 in empty"
20000000 loops, best of 5: 18.6 nsec per loop
python -mtimeit -s'empty=range(7)' '5 in empty'
5000000 loops, best of 5: 43 nsec per loop
python -mtimeit -s"empty=tuple(range(7))" "5 in empty"
5000000 loops, best of 5: 46.7 nsec per loop
python -mtimeit -s"empty=set(range(7))" "5 in empty"
20000000 loops, best of 5: 16.6 nsec per loop
python -mtimeit -s"empty=dict.fromkeys(range(7))" "5 in empty"
10000000 loops, best of 5: 18.5 nsec per loop
What is the cost of len() function for Python built-ins? (list/tuple/string/dictionary)
It's O(1) (constant time, not depending on the actual length of the object, i.e. very fast) on every type you've mentioned, plus set and others such as array.array.
Calling len() on those data types is O(1) in CPython, the official and most common implementation of the Python language. Here's a link to a table that provides the algorithmic complexity of many different functions in CPython:
TimeComplexity Python Wiki Page
All those objects keep track of their own length. The time to extract the length is small (O(1) in big-O notation) and mostly consists of [a rough description, in Python terms rather than C terms]: look up "len" in a dictionary and dispatch it to the built-in len function, which will look up the object's __len__ method and call that... and all that method has to do is return self.length.
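To make that dispatch concrete, here is a toy sketch (purely illustrative, not how CPython's built-ins are actually implemented) of an object whose __len__ simply returns a stored count:

class Bag(object):
    # Toy container that records its own length, as the built-ins do internally.
    def __init__(self, items):
        self._items = list(items)
        self._length = len(self._items)   # stored once, kept up to date on mutation

    def append(self, item):
        self._items.append(item)
        self._length += 1

    def __len__(self):
        # len(bag) dispatches here; returning a stored value is O(1)
        return self._length

bag = Bag(range(1000000))
print(len(bag))   # constant time, no matter how many items are stored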
The below measurements provide evidence that len() is O(1) for oft-used data structures.
A note regarding timeit: when the -s flag is used, the string passed to -s is executed only once, as setup, and is not timed.
List:
$ python -m timeit -s "l = range(10);" "len(l)"
10000000 loops, best of 3: 0.0677 usec per loop
$ python -m timeit -s "l = range(1000000);" "len(l)"
10000000 loops, best of 3: 0.0688 usec per loop
Tuple:
$ python -m timeit -s "t = (1,)*10;" "len(t)"
10000000 loops, best of 3: 0.0712 usec per loop
$ python -m timeit -s "t = (1,)*1000000;" "len(t)"
10000000 loops, best of 3: 0.0699 usec per loop
String:
$ python -m timeit -s "s = '1'*10;" "len(s)"
10000000 loops, best of 3: 0.0713 usec per loop
$ python -m timeit -s "s = '1'*1000000;" "len(s)"
10000000 loops, best of 3: 0.0686 usec per loop
Dictionary (dictionary-comprehension available in 2.7+):
$ python -mtimeit -s"d = {i:j for i,j in enumerate(range(10))};" "len(d)"
10000000 loops, best of 3: 0.0711 usec per loop
$ python -mtimeit -s"d = {i:j for i,j in enumerate(range(1000000))};" "len(d)"
10000000 loops, best of 3: 0.0727 usec per loop
Array:
$ python -mtimeit -s"import array;a=array.array('i',range(10));" "len(a)"
10000000 loops, best of 3: 0.0682 usec per loop
$ python -mtimeit -s"import array;a=array.array('i',range(1000000));" "len(a)"
10000000 loops, best of 3: 0.0753 usec per loop
Set (set-comprehension available in 2.7+):
$ python -mtimeit -s"s = {i for i in range(10)};" "len(s)"
10000000 loops, best of 3: 0.0754 usec per loop
$ python -mtimeit -s"s = {i for i in range(1000000)};" "len(s)"
10000000 loops, best of 3: 0.0713 usec per loop
Deque:
$ python -mtimeit -s"from collections import deque;d=deque(range(10));" "len(d)"
100000000 loops, best of 3: 0.0163 usec per loop
$ python -mtimeit -s"from collections import deque;d=deque(range(1000000));" "len(d)"
100000000 loops, best of 3: 0.0163 usec per loop
len() is O(1) because lists are stored in RAM as tables (series of contiguous addresses). To know where the table stops, the computer needs two things: the start address and the length. That is why len() is O(1): the length is stored alongside the object, so it just needs to be looked up.