I have some data which is either 1 or 2 dimensional. I want to iterate through every pattern in the data set and perform foo() on it. If the data is 1D then add this value to a list, if it's 2D then take the mean of the inner list and append this value.
I saw this question, and decided to implement it checking for instance of a list. I can't use numpy for this application.
outputs = []
for row in data:
    if isinstance(row, list):
        vals = [foo(window) for window in row]
        outputs.append(sum(vals)/float(len(vals)))
    else:
        outputs.append(foo(row))
Is there a neater way of doing this? On each run, every pattern will have the same dimensionality, so I could make a separate class for 1D/2D but that will add a lot of classes to my code. The datasets can get quite large so a quick solution is preferable.
Your code is already almost as neat and fast as it can be. The only slight improvement is replacing [foo(window) for window in row] with map(foo, row), as these benchmarks show:
> python -m timeit "foo = lambda x: x+1; list(map(foo, range(1000)))"
10000 loops, best of 3: 132 usec per loop
> python -m timeit "foo = lambda x: x+1; [foo(a) for a in range(1000)]"
10000 loops, best of 3: 140 usec per loop
isinstance() already seems faster than its counterparts hasattr() and type() ==:
> python -m timeit "[isinstance(i, int) for i in range(1000)]"
10000 loops, best of 3: 117 usec per loop
> python -m timeit "[hasattr(i, '__iter__') for i in range(1000)]"
1000 loops, best of 3: 470 usec per loop
> python -m timeit "[type(i) == int for i in range(1000)]"
10000 loops, best of 3: 130 usec per loop
However, if you count short as neat, you can also simplify your code (after switching to map) to:
mean = lambda x: sum(x)/float(len(x))  # or `from statistics import mean` in Python 3.4+
# list() is needed on Python 3, where map returns an iterator that has no len()
output = [foo(r) if isinstance(r, int) else mean(list(map(foo, r))) for r in data]
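For readers who find the one-liner dense, the same logic can be sketched as a small function. This is only an illustration: `summarize` and the sample `foo` are hypothetical names, and it assumes every nested pattern is a non-empty list.

```python
def summarize(data, foo):
    """Apply foo to each pattern; average the results for nested (2D) rows."""
    outputs = []
    for row in data:
        if isinstance(row, list):
            # 2D case: apply foo to every window and keep the mean.
            vals = [foo(window) for window in row]
            outputs.append(sum(vals) / float(len(vals)))
        else:
            # 1D case: keep the value itself.
            outputs.append(foo(row))
    return outputs


def foo(x):
    # Hypothetical stand-in for the real foo().
    return x + 1


print(summarize([1, 2, 3], foo))            # [2, 3, 4]
print(summarize([[1, 2, 3], [4, 6]], foo))  # [3.0, 6.0]
```

Since every pattern in a run has the same dimensionality, the isinstance check could also be done once on the first row instead of on every row, if that matters for very large datasets.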
How do I multiply the items in a list?
For example:
num_list = [1,2,3,4,5]
def multiplyListItems(l):
    # some code here...
The expected calculation and return value is 1 x 2 x 3 x 4 x 5 = 120.
One way is to use reduce:
>>> num_list = [1,2,3,4,5]
>>> reduce(lambda x, y: x*y, num_list)
120
Use functools.reduce with operator.mul, which is faster than a lambda (see below) and forward-compatible with Python 3, where reduce is no longer a builtin.
import operator
import functools
num_list = [1,2,3,4,5]
accum_value = functools.reduce(operator.mul, num_list)
print(accum_value)
# Output
120
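On Python 3.8 and later, the standard library also provides math.prod, which was not available when these answers were written. A minimal sketch, assuming Python 3.8+, with the functools.reduce version alongside for comparison:

```python
import functools
import math
import operator

num_list = [1, 2, 3, 4, 5]

# functools.reduce with an initial value of 1, so an empty list gives 1.
via_reduce = functools.reduce(operator.mul, num_list, 1)

# math.prod requires Python 3.8 or newer.
via_prod = math.prod(num_list)

print(via_reduce, via_prod)  # 120 120
```

Passing 1 as the initial value is a design choice: without it, functools.reduce raises a TypeError on an empty list.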
Measuring the execution time of three different approaches:
# Way 1: reduce
$ python -m timeit "reduce(lambda x, y: x*y, [1,2,3,4,5])"
1000000 loops, best of 3: 0.727 usec per loop
# Way 2: np.multiply.reduce
$ python -m timeit -s "import numpy as np" "np.multiply.reduce([1,2,3,4,5])"
100000 loops, best of 3: 6.71 usec per loop
# Way 3: functools.reduce
$ python -m timeit -s "import operator, functools" "functools.reduce(operator.mul, [1,2,3,4,5])"
1000000 loops, best of 3: 0.421 usec per loop
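For a bigger list, it is faster to use np.multiply.reduce, as mentioned by @MikeMüller. Note that NumPy multiplies fixed-width integers, so a product this large silently overflows; the timings below compare speed only, not the results.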
$ python -m timeit "reduce(lambda x, y: x*y, range(1, int(1e5)))"
10 loops, best of 3: 3.01 sec per loop
$ python -m timeit -s "import numpy as np" "np.multiply.reduce(range(1, int(1e5)))"
100 loops, best of 3: 11.2 msec per loop
$ python -m timeit -s "import operator, functools" "functools.reduce(operator.mul, range(1, int(1e5)))"
10 loops, best of 3: 2.98 sec per loop
A NumPy solution:
>>> import numpy as np
>>> np.multiply.reduce(num_list)
120
Run times for a bit larger list:
In [303]:
from operator import mul
from functools import reduce
import numpy as np
a = list(range(1, int(1e5)))
In [304]:
%timeit np.multiply.reduce(a)
100 loops, best of 3: 8.25 ms per loop
In [305]:
%timeit reduce(lambda x, y: x*y, a)
1 loops, best of 3: 5.04 s per loop
In [306]:
%timeit reduce(mul, a)
1 loops, best of 3: 5.37 s per loop
NumPy is largely implemented in C. Therefore, it can often be one or two orders of magnitude faster than writing loops over Python lists. This holds for larger arrays. If an array is small and is accessed often from Python, things can be slower than using pure Python, because of the overhead of converting between Python objects and C data types. In fact, it is an anti-pattern to write Python for loops that iterate over NumPy arrays.
Here, the list with five numbers causes considerable overhead compared to the gain from the faster numerics.
I have to make a very large number of simulations on a R*C grid.
These simulations are altering the grid, so I need to copy my reference grid before each, and then apply my simulating function on the fresh new grid.
What is the fastest way to do this in Python?
Since I have not found a similar question on StackOverflow, I did the tests myself and decided to post them here thinking they could be useful to other people.
The answer will be a community response so that other people can add new measurements with possibly other techniques.
If you add another method, remember to re-run all the existing tests and update their timings as well: the times depend on the computer used, and mixing measurements from different machines would bias the results.
I used a bash variable for setting up the timeit tests:
setup="""
R = 100
C = 100
from copy import deepcopy
import numpy as np
ref = [[i for i in range(C)] for _ in range(R)]
ref_np = np.array(ref)
cp = [[100 for i in range(C)] for _ in range(R)]
cp_np = np.array(cp)
"""
Just for convenience, I also set a temporary alias pybench:
alias pybench='python3.5 -m timeit -s "$setup" $1'
Python 3
Python 3.5.0+ (default, Oct 11 2015, 09:05:38)
Deepcopy:
>>> pybench "cp = deepcopy(ref)"
100 loops, best of 3: 8.29 msec per loop
Modifying pre-created array using index:
>>> pybench \
"for y in range(R):
    for x in range(C):
        cp[y][x] = ref[y][x]"
1000 loops, best of 3: 1.16 msec per loop
Nested list comprehension:
>>> pybench "cp = [[x for x in row] for row in ref]"
1000 loops, best of 3: 390 usec per loop
Slicing:
>>> pybench "cp = [row[:] for row in ref]"
10000 loops, best of 3: 45.8 usec per loop
NumPy copy:
>>> pybench "cp_np = np.copy(ref_np)"
100000 loops, best of 3: 6.03 usec per loop
Copying to pre-created NumPy array:
>>> pybench "np.copyto(cp_np, ref_np)"
100000 loops, best of 3: 4.52 usec per loop
There is nothing very surprising in these results. As you might have guessed, using NumPy is enormously faster, especially if one avoids creating a new table each time.
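If NumPy is not an option, the slicing variant is the fastest pure-Python approach measured above. The sketch below (with a small, hypothetical 3x3 grid) illustrates why each row has to be copied: copying only the outer list leaves the rows shared with the reference grid. It assumes the cells hold immutable values such as ints; a grid of mutable objects would still need deepcopy.

```python
ref = [[0, 1, 2] for _ in range(3)]

# Copying only the outer list: the inner rows are still shared.
shallow = ref[:]
shallow[0][0] = 99
print(ref[0][0])  # 99, the reference grid was modified

ref[0][0] = 0  # restore the reference grid

# Copying each row by slicing: rows are independent of the reference.
cp = [row[:] for row in ref]
cp[0][0] = 99
print(ref[0][0])  # 0, the reference grid is untouched
```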
To add to the answer from Delgan: the documentation for numpy.copy lists numpy.ndarray.copy as the preferred method. So for now, without having run a timing test, I will use numpy.ndarray.copy.
https://numpy.org/doc/stable/reference/generated/numpy.copy.html
https://numpy.org/doc/stable/reference/generated/numpy.ndarray.copy.html
How would you vectorize the evaluation of arrays of lambda functions?
Here's an example to understand what I'm talking about. (And even though I'm using numpy arrays, I'm not limiting myself to only using numpy.)
Let's say I have the following numpy arrays.
array1 = np.array(["hello", 9], dtype=object)  # dtype=object keeps 9 an int instead of coercing it to "9"
array2 = np.array([lambda s: s == "hello", lambda num: num < 10])
(You could store these kinds of objects in numpy without throwing an error, believe it or not.) What I want is something akin to the following.
array2 * array1
# Return np.array([True, True]). PS: An explanation of how to `AND` all of
# booleans together quickly would be nice too.
Of course, this seems impractical for arrays of size 2, but for arrays of arbitrary sizes, I'll assume this would yield a performance boost because of all of the low level optimizations.
So, anyone know how to write this weird kind of python code?
The simple answer, of course, is that you can't easily do this with numpy (or with standard Python, for that matter). Numpy doesn't actually vectorize most operations itself, to my knowledge: it uses libraries like BLAS/ATLAS/etc that do for certain situations. Even if it did, it would do so in C for specific situations: it certainly can't vectorize Python function execution.
If you want to involve multiprocessing in this, it is possible, but it depends on your situation. Are your individual function applications time-consuming, making them feasible to send out one-by-one, or do you need a very large number of fast function executions, in which case you'd probably want to send batches of them to each process?
In general, because of what could be argued to be poor fundamental design (e.g., the Global Interpreter Lock), it's very difficult with standard Python to get the lightweight parallelization you're hoping for here. There are significantly heavier methods, like the multiprocessing module or IPython.parallel, but these require some work to use.
Alright guys, I have an answer: numpy's vectorize.
Please read the edited section though. You'll discover that python actually optimizes code for you, which actually defeats the purpose of using numpy arrays in this case. (But using numpy arrays does not decrease the performance.)
What the last test really shows is that Python lists are as efficient as they could be, so this vectorization procedure is unnecessary. This is why I didn't mark this answer as the "best answer".
Setup code:
import numpy as np

def factory(i):
    return lambda num: num == i
array1 = list()
for i in range(10000):
    array1.append(factory(i))
array1 = np.array(array1)
array2 = np.array(xrange(10000))  # Python 2; use range() on Python 3
The "unvectorized" version:
def evaluate(array1, array2):
    return [func(val) for func, val in zip(array1, array2)]
%timeit evaluate(array1, array2)
# 100 loops, best of 3: 10 ms per loop
The vectorized version:
def evaluate2(func, b):
    return func(b)
vec_evaluate = np.vectorize(evaluate2)
%timeit vec_evaluate(array1, array2)
# 100 loops, best of 3: 2.65 ms per loop
EDIT
Okay, I just wanted to paste more benchmarks that I received using the above tests, except with different test cases.
I made a third edit, showing what happens if you simply use Python lists. Long story short, you don't lose much by doing so. This test case is at the very bottom.
Test cases only involving integers
In summary, if n is small, then the unvectorized version is better. Otherwise, vectorized is the way to go.
With n = 30
%timeit evaluate(array1, array2)
# 10000 loops, best of 3: 35.7 µs per loop
%timeit vec_evaluate(array1, array2)
# 10000 loops, best of 3: 27.6 µs per loop
With n = 7
%timeit evaluate(array1, array2)
100000 loops, best of 3: 9.93 µs per loop
%timeit vec_evaluate(array1, array2)
10000 loops, best of 3: 21.6 µs per loop
Test cases involving strings
Vectorization wins.
Setup code:
def factory(i):
    return lambda num: str(num) == str(i)
array1 = list()
for i in range(7):
    array1.append(factory(i))
array1 = np.array(array1)
array2 = np.array(xrange(7))
With n = 10000
%timeit evaluate(array1, array2)
10 loops, best of 3: 36.7 ms per loop
%timeit vec_evaluate(array1, array2)
100 loops, best of 3: 6.57 ms per loop
With n = 7
%timeit evaluate(array1, array2)
10000 loops, best of 3: 28.3 µs per loop
%timeit vec_evaluate(array1, array2)
10000 loops, best of 3: 27.5 µs per loop
Random tests
I ran these just to see whether branch prediction played a role. From what I'm seeing, it didn't really change much. Vectorization still usually wins.
Setup code:
from random import random

def factory(i):
    if random() < 0.5:
        return lambda num: str(num) == str(i)
    return lambda num: num == i
When n = 10000
%timeit evaluate(array1, array2)
10 loops, best of 3: 25.7 ms per loop
%timeit vec_evaluate(array1, array2)
100 loops, best of 3: 4.67 ms per loop
When n = 7
%timeit evaluate(array1, array2)
10000 loops, best of 3: 23.1 µs per loop
%timeit vec_evaluate(array1, array2)
10000 loops, best of 3: 23.1 µs per loop
Using python lists instead of numpy arrays
I ran this test to see what happened when I chose not to use the "optimized" numpy arrays, and I received some very surprising results.
The setup code is almost the same, except I'm choosing not to use numpy arrays. I'm also doing this test for only the "random" case.
from random import random

def factory(i):
    if random() < 0.5:
        return lambda num: str(num) == str(i)
    return lambda num: num == i
array1 = list()
for i in range(10000):
    array1.append(factory(i))
array2 = range(10000)
And the "unvectorized" version:
%timeit evaluate(array1, array2)
100 loops, best of 3: 4.93 ms per loop
This is actually pretty surprising, because it is almost the same timing I got in my random test case with the vectorized evaluate. Calling the vectorized version on plain lists, on the other hand, is much slower:
%timeit vec_evaluate(array1, array2)
10 loops, best of 3: 19.8 ms per loop
Likewise, if you change these into numpy arrays before using vec_evaluate, you get the same 4.5 ms benchmark.
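The question also asked how to AND all of the booleans together. In plain Python, one option is the built-in all() with a generator expression, which stops at the first False. A minimal sketch using the predicates from the question (the names `values` and `predicates` are illustrative):

```python
values = ["hello", 9]
predicates = [lambda s: s == "hello", lambda num: num < 10]

# Pair each predicate with its value and evaluate element-wise.
results = [func(val) for func, val in zip(predicates, values)]
print(results)  # [True, True]

# all() short-circuits: no further predicates are called after the first False.
print(all(func(val) for func, val in zip(predicates, values)))  # True
```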
I have a long list of indices (nextWordIndices) with about 7000 entries. I want to build a list of the values from another list that correspond to those indices. I can do this, but it takes a lot of time:
nextWord = []
for i in nextWordIndices:
    nextWord.append(allWords[i])
Is there a more optimized way?
If the indices are frequently the same, you can use operator.itemgetter:
import operator
word_getter = operator.itemgetter(*nextWordIndices)
nextWord = word_getter(allWords)
If you can use word_getter multiple times, and tuples are OK for output, you might see a speed-up compared to a list-comprehension.
Timings:
python -m timeit -s "allWords = range(7000); nextWordIndices = range(7000)" "[allWords[i] for i in nextWordIndices]"
1000 loops, best of 3: 415 usec per loop
python -m timeit -s "allWords = range(7000); nextWordIndices = range(7000)" "map(allWords.__getitem__, nextWordIndices)"
1000 loops, best of 3: 614 usec per loop
python -m timeit -s "allWords = range(7000); nextWordIndices = range(7000); from operator import itemgetter" "itemgetter(*nextWordIndices)(allWords)"
1000 loops, best of 3: 292 usec per loop
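Two details of operator.itemgetter are worth keeping in mind when reading these timings: with several indices it returns a tuple, and with exactly one index it returns the bare item rather than a 1-tuple. A small sketch with made-up data:

```python
from operator import itemgetter

allWords = ["the", "quick", "brown", "fox"]
nextWordIndices = [3, 0, 2]

# Several indices: the result is a tuple, so convert if a list is needed.
nextWord = list(itemgetter(*nextWordIndices)(allWords))
print(nextWord)  # ['fox', 'the', 'brown']

# Exactly one index: the item itself comes back, not a 1-tuple.
print(itemgetter(1)(allWords))  # quick
```

An empty index list also needs special handling, because itemgetter() requires at least one argument.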
Using a list comp:
nextWord = [allWords[i] for i in nextWordIndices]
Actually this might be faster (will have to timeit)
map(allWords.__getitem__, nextWordIndices)
Use map instead of a loop:
def getWord(i):
    return allWords[i]
nextWord = map(getWord, nextWordIndices)
In one of my classes I have a number of methods that all draw values from the same dictionaries. However, if one of the methods tries to access a value that isn't there, it has to call another method to make the value associated with that key.
I currently have this implemented as follows, where findCrackDepth(tonnage) assigns a value to self.lowCrackDepth[tonnage].
if tonnage not in self.lowCrackDepth:
    self.findCrackDepth(tonnage)
lcrack = self.lowCrackDepth[tonnage]
However, it would also be possible for me to do this as
try:
    lcrack = self.lowCrackDepth[tonnage]
except KeyError:
    self.findCrackDepth(tonnage)
    lcrack = self.lowCrackDepth[tonnage]
I assume there is a performance difference between the two, related to how often the value is already in the dictionary. How big is this difference? I'm generating a few million such values (spread across many dictionaries in many instances of the class), and for each time the value doesn't exist, there are probably two times where it does.
It's a delicate problem to time this because you need care to avoid "lasting side effects" and the performance tradeoff depends on the % of missing keys. So, consider a dil.py file as follows:
def make(percentmissing):
    global d
    d = dict.fromkeys(range(100-percentmissing), 1)

def addit(d, k):
    d[k] = k

def with_in():
    dc = d.copy()
    for k in range(100):
        if k not in dc:
            addit(dc, k)
        lc = dc[k]

def with_ex():
    dc = d.copy()
    for k in range(100):
        try: lc = dc[k]
        except KeyError:
            addit(dc, k)
            lc = dc[k]

def with_ge():
    dc = d.copy()
    for k in range(100):
        lc = dc.get(k)
        if lc is None:
            addit(dc, k)
            lc = dc[k]
and a series of timeit calls such as:
$ python -mtimeit -s'import dil; dil.make(10)' 'dil.with_in()'
10000 loops, best of 3: 28 usec per loop
$ python -mtimeit -s'import dil; dil.make(10)' 'dil.with_ex()'
10000 loops, best of 3: 41.7 usec per loop
$ python -mtimeit -s'import dil; dil.make(10)' 'dil.with_ge()'
10000 loops, best of 3: 46.6 usec per loop
This shows that, with 10% missing keys, the in check is clearly the fastest way.
$ python -mtimeit -s'import dil; dil.make(1)' 'dil.with_in()'
10000 loops, best of 3: 24.6 usec per loop
$ python -mtimeit -s'import dil; dil.make(1)' 'dil.with_ex()'
10000 loops, best of 3: 23.4 usec per loop
$ python -mtimeit -s'import dil; dil.make(1)' 'dil.with_ge()'
10000 loops, best of 3: 42.7 usec per loop
With just 1% missing keys, the exception approach is marginally the fastest (and the get approach remains the slowest one in either case).
So, for optimal performance, unless the vast majority (99%+) of lookups is going to succeed, the in approach is preferable.
Of course, there's another, elegant possibility: adding a dict subclass like...:
class dd(dict):
    def __init__(self, *a, **k):
        dict.__init__(self, *a, **k)
    def __missing__(self, k):
        addit(self, k)
        return self[k]

def with_dd():
    dc = dd(d)
    for k in range(100):
        lc = dc[k]
However...:
$ python -mtimeit -s'import dil; dil.make(1)' 'dil.with_dd()'
10000 loops, best of 3: 46.1 usec per loop
$ python -mtimeit -s'import dil; dil.make(10)' 'dil.with_dd()'
10000 loops, best of 3: 55 usec per loop
...while slick indeed, this is not a performance winner -- it's about even with the get approach, or slower, just with much nicer-looking code to use it. (defaultdict, semantically analogous to this dd class, would be a performance win if it were applicable, but that's because the __missing__ special method, in that case, is implemented in well-optimized C code).
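Applied to the question, the __missing__ idea might look like the sketch below. The class name `CrackDepths` and the formula inside `compute_depth` are placeholders, since the original findCrackDepth is not shown.

```python
class CrackDepths(dict):
    """Dict that computes and caches a value the first time a key is missing."""

    def __init__(self, compute):
        dict.__init__(self)
        self._compute = compute

    def __missing__(self, tonnage):
        # Called by dict.__getitem__ only when the key is absent.
        value = self._compute(tonnage)
        self[tonnage] = value
        return value


calls = []

def compute_depth(tonnage):
    # Placeholder for the real (expensive) findCrackDepth calculation.
    calls.append(tonnage)
    return tonnage * 0.5


lowCrackDepth = CrackDepths(compute_depth)
print(lowCrackDepth[10])  # 5.0 (computed)
print(lowCrackDepth[10])  # 5.0 (cached)
print(calls)              # [10]
```

The calling code then reduces to a plain `lowCrackDepth[tonnage]` lookup, at the performance cost measured above.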
Checking if a key exists is cheaper or at least as cheap as retrieving it. So use the if not in solution which is much cleaner and more readable.
According to your question, a missing key is not an error case, so there's no good reason to let Python raise an exception (even though you catch it immediately). And if you use an if not in check, everyone knows your intention: to get the existing value or otherwise generate it.
When in doubt, profile.
Run a test to see if, in your environment, one runs faster than another.
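One way to run such a test from within Python is the timeit module. The sketch below is only an illustration: the dictionary contents, the miss ratio (one missing key for every two present ones, as described in the question) and the helper names are all made up, and the absolute numbers will vary by machine.

```python
import timeit

base = dict.fromkeys(range(200), 1)  # keys that already exist
keys = list(range(300))              # 200 hits, 100 misses


def with_in():
    d = base.copy()
    for k in keys:
        if k not in d:
            d[k] = k
        value = d[k]
    return d


def with_try():
    d = base.copy()
    for k in keys:
        try:
            value = d[k]
        except KeyError:
            d[k] = k
            value = d[k]
    return d


# Both strategies must end up with the same dictionary.
assert with_in() == with_try()

# Take the best of three runs to reduce noise from other processes.
t_in = min(timeit.repeat(with_in, number=200, repeat=3))
t_try = min(timeit.repeat(with_try, number=200, repeat=3))
print("in check:   %.4f s" % t_in)
print("try/except: %.4f s" % t_try)
```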
If a missing key is exceptional, use an exception. If you expect the key to be there, use try/except; if you don't know whether the key is there, use not in.
I believe the .get() method of a dict has a parameter for setting the default value. You could use that and have it in one line. I'm not sure how it affects performance though.
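For reference, dict.get(key, default) returns the default without storing it, while dict.setdefault(key, default) stores it. In both cases the default expression is evaluated before the call, so neither avoids an expensive computation when the key is already present. A short sketch with a hypothetical `expensive` function:

```python
calls = []

def expensive(k):
    # Hypothetical stand-in for a costly computation.
    calls.append(k)
    return k * 2

d = {1: 2}

# get() returns the default but does not store it.
print(d.get(5, 0))  # 0
print(5 in d)       # False

# setdefault() stores the default when the key is missing.
print(d.setdefault(5, expensive(5)))  # 10
print(5 in d)                         # True

# The default argument is evaluated even though key 1 already exists.
print(d.setdefault(1, expensive(1)))  # 2
print(calls)                          # [5, 1]
```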