I have to run a very large number of simulations on an R*C grid.
These simulations alter the grid, so I need to copy my reference grid before each one, and then apply my simulation function to the fresh copy.
What is the fastest way to do this in Python?
Since I have not found a similar question on StackOverflow, I did the tests myself and decided to post them here thinking they could be useful to other people.
The answer will be a community response so that other people can add new measurements with possibly other techniques.
If you add another method, remember to re-run all the existing tests and update their timings as well, because the times depend on the computer used; otherwise the results will be biased.
I used a bash variable for setting up the timeit tests:
setup="""
R = 100
C = 100
from copy import deepcopy
import numpy as np
ref = [[i for i in range(C)] for _ in range(R)]
ref_np = np.array(ref)
cp = [[100 for i in range(C)] for _ in range(R)]
cp_np = np.array(cp)
"""
Just for convenience, I also set a temporary alias pybench:
alias pybench='python3.5 -m timeit -s "$setup" $1'
Python 3
Python 3.5.0+ (default, Oct 11 2015, 09:05:38)
Deepcopy:
>>> pybench "cp = deepcopy(ref)"
100 loops, best of 3: 8.29 msec per loop
Modifying pre-created array using index:
>>> pybench \
"for y in range(R):
for x in range(C):
cp[y][x] = ref[y][x]"
1000 loops, best of 3: 1.16 msec per loop
Nested list comprehension:
>>> pybench "cp = [[x for x in row] for row in ref]"
1000 loops, best of 3: 390 usec per loop
Slicing:
>>> pybench "cp = [row[:] for row in ref]"
10000 loops, best of 3: 45.8 usec per loop
NumPy copy:
>>> pybench "cp_np = np.copy(ref_np)"
100000 loops, best of 3: 6.03 usec per loop
Copying to pre-created NumPy array:
>>> pybench "np.copyto(cp_np, ref_np)"
100000 loops, best of 3: 4.52 usec per loop
There is nothing very surprising in these results: as you might have guessed, using NumPy is enormously faster, especially if you avoid creating a new table each time.
To add to the answer from Delgan, numpy.copy's documentation says that numpy.ndarray.copy is the preferred method. So for now, without doing a timing test, I will use numpy.ndarray.copy:
https://numpy.org/doc/stable/reference/generated/numpy.copy.html
https://numpy.org/doc/stable/reference/generated/numpy.ndarray.copy.html
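For illustration, here is a minimal sketch of how that fits the simulation loop described in the question; the run_simulation placeholder is mine, while np.copyto and ndarray.copy are the calls benchmarked above:

import numpy as np

R, C = 100, 100
ref_np = np.arange(R * C).reshape(R, C)   # reference grid
work = np.empty_like(ref_np)              # pre-created working grid

for _ in range(1000):                     # many simulations
    np.copyto(work, ref_np)               # refresh the working grid in place
    # run_simulation(work)                # placeholder for the mutating step

# Or, using the method form preferred by the docs:
fresh = ref_np.copy()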
I've recently noticed that the built-in list has a list.clear() method. So far, when I wanted to ensure a list is empty, I just create a new list: l = [].
I was curious if it makes a difference, so I measured it:
$ python --version
Python 3.11.0
$ python -m timeit 'a = [1, 2, 3, 4]; a = []'
5000000 loops, best of 5: 61.5 nsec per loop
$ python -m timeit 'a = [1, 2, 3, 4]; a.clear()'
5000000 loops, best of 5: 57.4 nsec per loop
So creating a new empty list is about 7% slower than using clear() for small lists.
For bigger lists, it seems to be faster to just create a new list:
$ python -m timeit 'a = list(range(10_000)); a = []'
2000 loops, best of 5: 134 usec per loop
$ python -m timeit 'a = list(range(10_000)); a = []'
2000 loops, best of 5: 132 usec per loop
$ python -m timeit 'a = list(range(10_000)); a = []'
2000 loops, best of 5: 134 usec per loop
$ python -m timeit 'a = list(range(10_000)); a.clear()'
2000 loops, best of 5: 143 usec per loop
$ python -m timeit 'a = list(range(10_000)); a.clear()'
2000 loops, best of 5: 139 usec per loop
$ python -m timeit 'a = list(range(10_000)); a.clear()'
2000 loops, best of 5: 139 usec per loop
Why is that the case?
Edit: A small clarification: I am aware that the two are not (in some scenarios) semantically the same. If you clear a list, any other references to it see the change; if you bind the name to a new list object, other references still point at the old, unchanged list. For this question, I don't care about this potential difference. Just assume there is exactly one reference to that list.
I was curious if it makes a difference, so I measured it:
This is not the right way to do the measurement. The setup code should be separated out, so that it is not included in the timing. In fact, all of these operations are O(1); you see O(N) results because creating the original data is O(N).
So far, when I wanted to ensure a list is empty, I just create a new list: l = [].
list.clear is not equivalent to creating a new list. Consider:
a = [1,2,3]
b = a
Doing a = [] will not cause b to change, because it only rebinds the a name. But doing a.clear() will cause b to change, because both names refer to the same list object, which was mutated.
(Using a = [] is fine when that result is intentional, of course. Similarly, user-defined objects are often best "reset" by re-creating them.)
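As a quick illustration of that difference (a minimal sketch, not part of the original answer):

a = [1, 2, 3]
b = a            # b is another name for the same list object
a = []           # rebinds the name a only
print(b)         # [1, 2, 3] - b still sees the old list

a = b            # point a back at the shared list
a.clear()        # mutates the shared object in place
print(b)         # [] - b sees the change too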
Instead, a.clear() is equivalent to a[:] = [] (or the same with some other empty sequence). However, this is harder to read. The Zen of Python tells us: "Beautiful is better than ugly"; "Explicit is better than implicit"; "There should be one-- and preferably only one --obvious way to do it".
It's also clearly (sorry) faster:
$ python -m timeit --setup 'a = list(range(10_000))' 'a.clear()'
10000000 loops, best of 5: 34.9 nsec per loop
$ python -m timeit --setup 'a = list(range(10_000))' 'a[:] = []'
5000000 loops, best of 5: 67.1 nsec per loop
$ python -m timeit --setup 'a = list(range(10_000))' 'a[:] = ()'
5000000 loops, best of 5: 51.7 nsec per loop
The "clever" ways require creating temporary objects, both for the iterable and for the slice used to index the list. Either way, it will end up in C code that does relatively simple, O(1) pointer bookkeeping (and a memory deallocation).
How would you vectorize the evaluation of arrays of lambda functions?
Here's an example to understand what I'm talking about. (And even though I'm using numpy arrays, I'm not limiting myself to only using numpy.)
Let's say I have the following numpy arrays.
array1 = np.array(["hello", 9])
array2 = np.array([lambda s: s == "hello", lambda num: num < 10])
(You can store these kinds of objects in a NumPy array without it throwing an error, believe it or not.) What I want is something akin to the following.
array2 * array1
# Return np.array([True, True]). PS: An explanation of how to `AND` all of
# booleans together quickly would be nice too.
Of course, this seems impractical for arrays of size 2, but for arrays of arbitrary sizes, I'll assume this would yield a performance boost because of all of the low level optimizations.
So, anyone know how to write this weird kind of python code?
The simple answer, of course, is that you can't easily do this with numpy (or with standard Python, for that matter). Numpy doesn't actually vectorize most operations itself, to my knowledge: it uses libraries like BLAS/ATLAS/etc that do for certain situations. Even if it did, it would do so in C for specific situations: it certainly can't vectorize Python function execution.
If you want to involve multiprocessing in this, it is possible, but it depends on your situation. Are your individual function applications time-consuming, making them feasible to send out one-by-one, or do you need a very large number of fast function executions, in which case you'd probably want to send batches of them to each process?
In general, because of what could be argued as poor fundamental design (e.g., the Global Interpreter Lock), it's very difficult in standard Python to get the kind of lightweight parallelization you're hoping for here. There are significantly heavier methods, like the multiprocessing module or IPython.parallel, but these require some work to use.
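For example, a minimal sketch of the batching idea with multiprocessing (the helper names here are my own illustration, not a library API):

from multiprocessing import Pool

# Note: plain lambdas cannot be pickled, so the callables would need to be
# module-level functions (or you would need a helper such as cloudpickle).
def apply_pair(pair):
    func, value = pair
    return func(value)

def evaluate_parallel(funcs, values, processes=4, chunksize=256):
    # chunksize batches the work so each worker gets many cheap calls at once.
    with Pool(processes) as pool:
        return pool.map(apply_pair, zip(funcs, values), chunksize=chunksize)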
Alright guys, I have an answer: numpy's vectorize.
Please read the edited section, though. You'll discover that plain Python is already quite fast here, which rather defeats the purpose of using NumPy arrays in this case. (Using NumPy arrays does not decrease the performance, though.)
What the last test really shows is that Python lists are about as efficient as they can be, so this vectorization procedure is unnecessary. That is why I didn't mark this as the accepted answer.
Setup code:
import numpy as np

def factory(i): return lambda num: num == i

array1 = list()
for i in range(10000): array1.append(factory(i))
array1 = np.array(array1)
array2 = np.array(xrange(10000))   # Python 2 (xrange); use range() on Python 3
The "unvectorized" version:
def evaluate(array1, array2):
    return [func(val) for func, val in zip(array1, array2)]
%timeit evaluate(array1, array2)
# 100 loops, best of 3: 10 ms per loop
The vectorized version:
def evaluate2(func, b): return func(b)
vec_evaluate = np.vectorize(evaluate2)
%timeit vec_evaluate(array1, array2)
# 100 loops, best of 3: 2.65 ms per loop
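As for the aside in the question about ANDing all the booleans together quickly: once you have an array of booleans, that is just a reduction (my own addition, not part of the original answer):

results = vec_evaluate(array1, array2)     # array of booleans
all_true = np.all(results)                 # single bool: are they all True?
all_true = np.logical_and.reduce(results)  # equivalent spelling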
EDIT
Okay, I just wanted to paste more benchmarks that I received using the above tests, except with different test cases.
I made a third edit showing what happens if you simply use plain Python lists. Long story short: you won't lose much by doing so. That test case is at the very bottom.
Test cases only involving integers
In summary, if n is small, then the unvectorized version is better. Otherwise, vectorized is the way to go.
With n = 30
%timeit evaluate(array1, array2)
# 10000 loops, best of 3: 35.7 µs per loop
%timeit vec_evaluate(array1, array2)
# 10000 loops, best of 3: 27.6 µs per loop
With n = 7
%timeit evaluate(array1, array2)
100000 loops, best of 3: 9.93 µs per loop
%timeit vec_evaluate(array1, array2)
10000 loops, best of 3: 21.6 µs per loop
Test cases involving strings
Vectorization wins.
Setup code:
def factory(i): return lambda num: str(num) == str(i)

array1 = list()
for i in range(7):
    array1.append(factory(i))
array1 = np.array(array1)
array2 = np.array(xrange(7))
With n = 10000
%timeit evaluate(array1, array2)
10 loops, best of 3: 36.7 ms per loop
%timeit vec_evaluate(array1, array2)
100 loops, best of 3: 6.57 ms per loop
With n = 7
%timeit evaluate(array1, array2)
10000 loops, best of 3: 28.3 µs per loop
%timeit vec_evaluate(array1, array2)
10000 loops, best of 3: 27.5 µs per loop
Random tests
Just to see how branch prediction played a role. From what I'm seeing, it didn't really change much. Vectorization still usually wins.
Setup code:
from random import random

def factory(i):
    if random() < 0.5:
        return lambda num: str(num) == str(i)
    return lambda num: num == i
When n = 10000
%timeit evaluate(array1, array2)
10 loops, best of 3: 25.7 ms per loop
%timeit vec_evaluate(array1, array2)
100 loops, best of 3: 4.67 ms per loop
When n = 7
%timeit evaluate(array1, array2)
10000 loops, best of 3: 23.1 µs per loop
%timeit vec_evaluate(array1, array2)
10000 loops, best of 3: 23.1 µs per loop
Using python lists instead of numpy arrays
I ran this test to see what happened when I chose not to use the "optimized" numpy arrays, and I received some very surprising results.
The setup code is almost the same, except I'm choosing not to use numpy arrays. I'm also doing this test for only the "random" case.
def factory(i):
    if random() < 0.5:
        return lambda num: str(num) == str(i)
    return lambda num: num == i
array1 = list()
for i in range(10000): array1.append(factory(i))
array2 = range(10000)
And the "unvectorized" version:
%timeit evaluate(array1, array2)
100 loops, best of 3: 4.93 ms per loop
You can see this is actually pretty surprising, because it is almost the same benchmark I was getting with my random test case using the vectorized evaluate.
%timeit vec_evaluate(array1, array2)
10 loops, best of 3: 19.8 ms per loop
Likewise, if you change these into numpy arrays before using vec_evaluate, you get the same 4.5 ms benchmark.
I need to be able to store a numpy array in a dict for caching purposes. Hash speed is important.
The array represents indices, so while the actual identity of the object is not important, the value is. Mutability is not a concern, as I'm only interested in the current value.
What should I hash in order to store it in a dict?
My current approach is to use str(arr.data), which is faster than md5 in my testing.
I've incorporated some examples from the answers to get an idea of relative times:
In [121]: %timeit hash(str(y))
10000 loops, best of 3: 68.7 us per loop
In [122]: %timeit hash(y.tostring())
1000000 loops, best of 3: 383 ns per loop
In [123]: %timeit hash(str(y.data))
1000000 loops, best of 3: 543 ns per loop
In [124]: %timeit y.flags.writeable = False ; hash(y.data)
1000000 loops, best of 3: 1.15 us per loop
In [125]: %timeit hash((b*y).sum())
100000 loops, best of 3: 8.12 us per loop
It would appear that for this particular use case (small arrays of indices), arr.tostring offers the best performance.
While hashing the read-only buffer is fast on its own, the overhead of setting the writeable flag actually makes it slower.
You can simply hash the underlying buffer, if you make it read-only:
>>> a = random.randint(10, 100, 100000)
>>> a.flags.writeable = False
>>> %timeit hash(a.data)
100 loops, best of 3: 2.01 ms per loop
>>> %timeit hash(a.tostring())
100 loops, best of 3: 2.28 ms per loop
For very large arrays, hash(str(a)) is a lot faster, but then it only takes a small part of the array into account.
>>> %timeit hash(str(a))
10000 loops, best of 3: 55.5 us per loop
>>> str(a)
'[63 30 33 ..., 96 25 60]'
You can try xxhash via its Python binding. For large arrays this is much faster than hash(x.tostring()).
Example IPython session:
>>> import xxhash
>>> import numpy
>>> x = numpy.random.rand(1024 * 1024 * 16)
>>> h = xxhash.xxh64()
>>> %timeit hash(x.tostring())
1 loops, best of 3: 208 ms per loop
>>> %timeit h.update(x); h.intdigest(); h.reset()
100 loops, best of 3: 10.2 ms per loop
And by the way, on various blogs and answers posted to Stack Overflow, you'll see people using sha1 or md5 as hash functions. For performance reasons this is usually not acceptable, as those "secure" hash functions are rather slow. They're useful only if hash collision is one of the top concerns.
Nevertheless, hash collisions happen all the time. And if all you need is implementing __hash__ for data-array objects so that they can be used as keys in Python dictionaries or sets, I think it's better to concentrate on the speed of __hash__ itself and let Python handle the hash collision[1].
[1] You may need to override __eq__ too, to help Python manage hash collision. You would want __eq__ to return a boolean, rather than an array of booleans as is done by numpy.
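For example, a minimal sketch of such a wrapper class (the name HashableArray is mine, not from the answer):

import numpy as np

class HashableArray:
    """Thin wrapper so an ndarray can be used as a dict or set key."""
    def __init__(self, arr):
        self.arr = np.asarray(arr)
        self._hash = hash(self.arr.tobytes())   # fast, value-based hash

    def __hash__(self):
        return self._hash

    def __eq__(self, other):
        # Must return a single bool, not an elementwise array.
        return isinstance(other, HashableArray) and np.array_equal(self.arr, other.arr)

cache = {HashableArray(np.array([1, 2, 3])): "some value"}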
Coming late to the party, but for large arrays, I think a decent way to do it is to randomly subsample the matrix and hash that sample:
def subsample_hash(a):
    rng = np.random.RandomState(89)
    inds = rng.randint(low=0, high=a.size, size=1000)
    b = a.flat[inds]
    b.flags.writeable = False
    return hash(b.data)
I think this is better than doing hash(str(a)), because the latter could confuse arrays that have unique data in the middle but zeros around the edges.
If your np.array() is small and in a tight loop, then one option is to skip hash() completely and just use np.array().data.tobytes() directly as your dict key:
grid = np.array([[True, False, True], [False, False, True]])
hash = grid.data.tobytes()
cache = cache or {}
if hash not in cache:
    cache[hash] = function(grid)
return cache[hash]
It depends on what kind of data you have: how large are the arrays, and can the same index occur several times in one array? If your array only consists of a permutation of indices, you can use a base conversion
(1, 0, 2) -> 1 * 3**0 + 0 * 3**1 + 2 * 3**2 = 19
and use 19 as the hash key via
import numpy as num

base_size = 3
base = base_size ** num.arange(base_size)          # [1, 3, 9]
max_base = (base * num.arange(base_size)).sum()    # largest possible key (21 here)
array = num.array([1, 0, 2])                       # the permutation to hash
hashed_array = (base * array).sum()                # 19
Now you can use a flat array of length max_base + 1 instead of a dict in order to access the values.
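For illustration, continuing the snippet above (the example value 42.0 is my own), the flat lookup table replaces the dict like this:

table = num.full(max_base + 1, num.nan)   # one slot per possible key
table[hashed_array] = 42.0                # store a value under key 19
value = table[hashed_array]               # retrieve it: 42.0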
I have some data which is either 1- or 2-dimensional. I want to iterate through every pattern in the data set and perform foo() on it. If the data is 1D, I add the result to a list; if it's 2D, I take the mean over the inner list and append that value.
I saw this question and decided to implement it by checking whether each row is an instance of list. I can't use numpy for this application.
outputs = []
for row in data:
    if isinstance(row, list):
        vals = [foo(window) for window in row]
        outputs.append(sum(vals) / float(len(vals)))
    else:
        outputs.append(foo(row))
Is there a neater way of doing this? On each run, every pattern will have the same dimensionality, so I could make a separate class for 1D/2D but that will add a lot of classes to my code. The datasets can get quite large so a quick solution is preferable.
Your code is already almost as neat and fast as it can be. The only slight improvement is replacing [foo(window) for window in row] with map(foo, row), as the benchmarks show:
> python -m timeit "foo = lambda x: x+1; list(map(foo, range(1000)))"
10000 loops, best of 3: 132 usec per loop
> python -m timeit "foo = lambda x: x+1; [foo(a) for a in range(1000)]"
10000 loops, best of 3: 140 usec per loop
isinstance() already seems faster than its counterparts hasattr() and type() ==:
> python -m timeit "[isinstance(i, int) for i in range(1000)]"
10000 loops, best of 3: 117 usec per loop
> python -m timeit "[hasattr(i, '__iter__') for i in range(1000)]"
1000 loops, best of 3: 470 usec per loop
> python -m timeit "[type(i) == int for i in range(1000)]"
10000 loops, best of 3: 130 usec per loop
However, if you count short as neat, you can also simplify your code (after the map replacement mentioned above) to:
mean = lambda x: sum(x) / float(len(x))  # or `from statistics import mean` in Python 3.4+
output = [foo(r) if isinstance(r, int) else mean(map(foo, r)) for r in data]
# Note: on Python 3, map() returns an iterator, so either use statistics.mean
# or wrap the map() call in list() before passing it to this lambda.
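If you're on Python 3, here is a minimal sketch of the same idea adapted by me (statistics.mean accepts the iterator that map() returns):

from statistics import mean

def summarize(data, foo):
    # 2D rows (lists) are reduced to the mean of foo over the row;
    # 1D entries get foo applied directly.
    return [mean(map(foo, row)) if isinstance(row, list) else foo(row)
            for row in data]

print(summarize([1, [2, 3], 4], lambda x: x + 1))   # [2, 3.5, 5]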