Good evening!
I was running some tests on lists and list creation vs iterator creation and I came across some staggering time differences. Observe the following:
>>> timeit.timeit('map(lambda x: x**3, [1, 2, 3, 4, 5])')
0.4515998857965542
>>> timeit.timeit('list(map(lambda x: x**3, [1, 2, 3, 4, 5]))')
2.868906182460819
The iterator version returned by the first test runs more than 6x faster than the version that converts to a list. I understand basically why this might be occurring, but what I'm more interested in is a solution. Does anyone know of a data structure similar to a list that offers fast creation time? (Basically, I want to know if there is a way to go straight from an iterator (e.g. a map or filter object) to a list without any major performance hit.)
Things I can sacrifice for speed:
Appending, inserting, popping and deleting elements.
Slicing of elements.
Reversing the list or any inplace operators like sort.
Contains (in) operator.
Concatenation and multiplication.
All suggestions are welcome, thanks!
EDIT: Indeed this is for python 3.
In Python 3.x, map doesn't create a list, but just an iterator, unlike Python 2.x.
print(type(map(lambda x: x**3, [1, 2, 3, 4, 5])))
# <class 'map'>
To actually get a list, exhaust the iterator with the list constructor, like this:
print(type(list(map(lambda x: x**3, [1, 2, 3, 4, 5]))))
# <class 'list'>
So, you are really not comparing two similar things.
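For a like-for-like comparison, time two expressions that both produce a finished list. A rough sketch (the list-comprehension alternative is added here just for illustration; absolute numbers will vary by machine):
import timeit
# Both statements build the full list of cubes, so the timings are comparable.
print(timeit.timeit('list(map(lambda x: x**3, [1, 2, 3, 4, 5]))'))
print(timeit.timeit('[x**3 for x in [1, 2, 3, 4, 5]]'))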
Expanding on thefourtheye's answer: the expressions inside map will not be evaluated until you iterate over it. This example should make that clear:
from time import sleep

def badass_heavy_function():
    sleep(3600)

# The calls are not evaluated yet
foo = map(lambda x: x(), [badass_heavy_function, badass_heavy_function])

# The calls are evaluated right here, please wait 2 hours
bar = list(map(lambda x: x(), [badass_heavy_function, badass_heavy_function]))

for _ in foo:
    # Each iteration evaluates one call, please wait 1 hour
    pass
To further extend the other two answers:
You had a misconception about the iterator: what you describe as "slow creation time" led you to look for a "faster container", but that search is based on a misinterpretation.
Note that the creation of a list object in python is fast:
%timeit list(range(10000))
10000 loops, best of 3: 164 µs per loop
What you experience as slow is the actual loop you need to run to calculate the values that go into the list.
Here is a deliberately unoptimized example of slowly "creating" a new list from another list:
x = list(range(10000))
def slow_loop(x):
    new = []
    for i in x:
        new.append(i**2)
    return new
%timeit slow_loop(x)
100 loops, best of 3: 4.17 ms per loop
The time is actually spent in the loop, and that is what is "slow" in Python.
Technically, this is exactly what you are doing if you compare:
def your_loop(x):
    return list(map(lambda y: y**2, x))
%timeit your_loop(x)
100 loops, best of 3: 4.5 ms per loop
There is a way to speed this up though:
def faster_loop(x):
    return [i**2 for i in x]
%timeit faster_loop(x)
100 loops, best of 3: 3.67 ms per loop
although not by much for this kind of function. The thing is: the slow part here is the math, not the list and not the container. You can prove this with numpy:
import numpy as np
arr = np.array(x)
%timeit arr ** 2
100000 loops, best of 3: 7.44 µs per loop
Woah... crazy speedup.
A note on benchmarking - I find myself guilty of this quite often as well - people doubt the system too often, but themselves not often enough. So it's not that Python is unoptimized or "slow"; it's just that you're doing it wrong. Don't doubt the efficiency of the Python list; doubt your slow, inefficient code. You will probably get it right more quickly that way...
It seems the pure Python ** operator is quite slow here, as a simple multiplication is much quicker:
def faster_loop2(x):
    return [i * i for i in x]
%timeit faster_loop2(x)
1000 loops, best of 3: 534 µs per loop
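For reference, a small self-contained script that reproduces a few of the comparisons above outside of IPython (a sketch only; absolute numbers depend on the machine):
import timeit
import numpy as np

x = list(range(10000))
arr = np.array(x)

# Comprehension with **, comprehension with plain multiplication, vectorized numpy.
print(timeit.timeit(lambda: [i ** 2 for i in x], number=100))
print(timeit.timeit(lambda: [i * i for i in x], number=100))
print(timeit.timeit(lambda: arr ** 2, number=100))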
Related
To copy a nested list inside an existing list, it is unfortunately not sufficient to simply multiply it; that creates references to the same inner list rather than independent lists. See this example:
x = [[1, 2, 3]] * 2
x[0] is x[1] # will evaluate to True
To achieve your goal, you can use the range function in a list comprehension, for example:
x = [[1, 2, 3] for _ in range(2)]
x[0] is x[1] # will evaluate to False (wanted behaviour)
This is a good way to multiply items in a list without just creating references, and this is also explained multiple times on many different websites.
However, there is a more efficient way to copy the list elements. That code seems a little faster to me (measured with timeit on the command line, with different parameters n ∈ {1, 50, 100, 10000}, for the code below and range(n) in the code above):
x = [[1, 2, 3] for _ in [0] * n]
But I wonder, why does this code run faster? Are there other disadvantages (more memory consumption or similar)?
python -m timeit '[[1, 2, 3] for _ in range(1)]'
1000000 loops, best of 3: 0.243 usec per loop
python -m timeit '[[1, 2, 3] for _ in range(50)]'
100000 loops, best of 3: 3.79 usec per loop
python -m timeit '[[1, 2, 3] for _ in range(100)]'
100000 loops, best of 3: 7.39 usec per loop
python -m timeit '[[1, 2, 3] for _ in range(10000)]'
1000 loops, best of 3: 940 usec per loop
python -m timeit '[[1, 2, 3] for _ in [0] * 1]'
1000000 loops, best of 3: 0.242 usec per loop
python -m timeit '[[1, 2, 3] for _ in [0] * 50]'
100000 loops, best of 3: 3.77 usec per loop
python -m timeit '[[1, 2, 3] for _ in [0] * 100]'
100000 loops, best of 3: 7.3 usec per loop
python -m timeit '[[1, 2, 3] for _ in [0] * 10000]'
1000 loops, best of 3: 927 usec per loop
# difference will be greater for larger n
python -m timeit '[[1, 2, 3] for _ in range(1000000)]'
10 loops, best of 3: 144 msec per loop
python -m timeit '[[1, 2, 3] for _ in [0] * 1000000]'
10 loops, best of 3: 126 msec per loop
This is correct; range, even in Python 3 where it produces a compact range object, is more complicated than a list, in the classical tradeoff between computation and storage.
As the list grows too large to fit in cache (the primary concern if we're interested in performance), the range object runs into a different issue: as each number in the range is produced, it destroys and creates new int objects (the first 256 or so are less costly because they're interned, but even they can still cost a few cache misses). The list would keep referring to the same objects.
There are still more efficient options, though; a bytearray, for instance, would consume far less memory than the list. Probably the best function for the task is hidden away in itertools: repeat. Like a range object, it doesn't need storage for all the copies, but like the repeated list it doesn't need to create distinct objects. Something like for _ in repeat(None, x) would therefore just poke at the same few cache lines (iteration count and reference count for the object).
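A minimal sketch of the repeat variant (itertools.repeat is in the standard library; whether it actually beats [0] * n on your machine is something to measure rather than assume, and n = 10000 is just an example size):
from itertools import repeat

n = 10000
# repeat(None, n) yields the same object n times, so no throwaway list or
# range object has to be built just to drive the loop.
x = [[1, 2, 3] for _ in repeat(None, n)]
assert x[0] is not x[1]  # still independent inner lists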
In the end, the main reason people stick to using range is because it's what's prominently presented (both in the idiom for a fixed count loop and among the builtins).
In other Python implementations, it's quite possible for range to be faster than repeat; this would be because the counter itself already holds the value. I'd expect such behaviours from Cython or PyPy.
I'd like to take in a list of functions, funclist, and return a new function which takes in a list of arguments, arglist, and applies the ith function in funclist to the ith element of arglist, returning the results in a list:
def myfunc(funclist):
    return lambda arglist: [funclist[i](elt) for i, elt in enumerate(arglist)]
This is not optimized for parallel/vectorized application of the independent functions in funclist to the independent arguments in arglist. Is there a built-in function in Python or numpy (or otherwise) that will return a more optimized version of the lambda above? It would be similar in spirit to map or numpy.vectorize (but obviously not the same), and so far I haven't found anything.
In numpy terms, true vectorization means performing the iterative work in compiled code. Usually that requires using numpy functions that operate on whole arrays, doing things like addition and indexing.
np.vectorize is a way of iterating over several arrays and using their elements in a function that does not handle arrays. It doesn't do much in compiled code, so it does not improve speed much. It's most valuable as a way of applying numpy broadcasting rules to your own scalar function.
map is a variant of the list comprehension and has basically the same speed. And a list comprehension has more expressive power, working with several lists.
@Tore's zipped comprehension is a clear expression of this task:
[f(args) for f, args in zip(funclist, arglist)]
map can work with several input lists:
In [415]: arglist=[np.arange(3),np.arange(1,4)]
In [416]: fnlist=[np.sum, np.prod]
In [417]: [f(a) for f,a in zip(fnlist, arglist)]
Out[417]: [3, 6]
In [418]: list(map(lambda f,a: f(a), fnlist, arglist))
Out[418]: [3, 6]
Your version is a little wordier, but functionally the same.
In [423]: def myfunc(funclist):
...: return lambda arglist: [ funclist[i](elt) for i, elt in enumerate(arglist) ]
In [424]: myfunc(fnlist)
Out[424]: <function __main__.myfunc.<locals>.<lambda>>
In [425]: myfunc(fnlist)(arglist)
Out[425]: [3, 6]
It has the advantage of generating a function that can be applied to different arglists:
In [426]: flist=myfunc(fnlist)
In [427]: flist(arglist)
Out[427]: [3, 6]
In [428]: flist(arglist[::-1])
Out[428]: [6, 0]
I would have written myfunc more like:
def altfun(funclist):
    def foo(arglist):
        return [f(a) for f, a in zip(funclist, arglist)]
    return foo
but the differences are just stylistic.
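A quick usage check of altfun against the same fnlist/arglist as in the session above (expected outputs shown as comments):
import numpy as np

def altfun(funclist):
    def foo(arglist):
        return [f(a) for f, a in zip(funclist, arglist)]
    return foo

fnlist = [np.sum, np.prod]
arglist = [np.arange(3), np.arange(1, 4)]

apply_all = altfun(fnlist)
print(apply_all(arglist))        # [3, 6]
print(apply_all(arglist[::-1]))  # [6, 0]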
================
Time test for zip v enumerate:
In [154]: funclist=[sum]*N
In [155]: arglist=[list(range(N))]*N
In [156]: sum([funclist[i](args) for i,args in enumerate(arglist)])
Out[156]: 499500000
In [157]: sum([f(args) for f,args in zip(funclist, arglist)])
Out[157]: 499500000
In [158]: timeit [funclist[i](args) for i,args in enumerate(arglist)]
10 loops, best of 3: 43.5 ms per loop
In [159]: timeit [f(args) for f,args in zip(funclist, arglist)]
10 loops, best of 3: 43.1 ms per loop
Basically the same. But map is 2x faster
In [161]: timeit list(map(lambda f,a: f(a), funclist, arglist))
10 loops, best of 3: 23.1 ms per loop
Packaging the iteration in a callable is also faster
In [165]: timeit altfun(funclist)(arglist)
10 loops, best of 3: 23 ms per loop
In [179]: timeit myfunc(funclist)(arglist)
10 loops, best of 3: 22.6 ms per loop
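For anyone who wants to rerun the comparison outside IPython, a standalone sketch (N is not shown in the session above; N = 1000 is consistent with the 499500000 totals):
import timeit

N = 1000  # sum(range(N)) * N == 499500000, matching Out[156]/Out[157] above
funclist = [sum] * N
arglist = [list(range(N))] * N

print(timeit.timeit(lambda: [funclist[i](a) for i, a in enumerate(arglist)], number=10))
print(timeit.timeit(lambda: [f(a) for f, a in zip(funclist, arglist)], number=10))
print(timeit.timeit(lambda: list(map(lambda f, a: f(a), funclist, arglist)), number=10))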
I'm wondering why this tuple creation:
x = tuple((t for t in range(100000)))
# 0.014001131057739258 seconds
took longer than this list comprehension:
y = [z for z in range(100000)]
# 0.005000114440917969 seconds
I had learned that tuple operations are faster than list operations, since tuples are immutable.
Edit: After I changed the code to:
x = tuple(t for t in range(100000))
y = list(z for z in range(100000))
>>>
0.009999990463256836
0.0
>>>
These are the results: the tuple is still the slower one.
Tuple operations aren't necessarily faster. Being immutable at most opens the door to more optimisations, but that doesn't mean Python does them or that they apply in every case.
The difference here is very marginal, and - without profiling to confirm - it seems likely that it relates to the generator version having an extra name lookup and function call. As mentioned in the comments, if you rewrite the list comprehension as a call to list wrapped around a generator expression, the difference will likely shrink.
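One concrete case where immutability does pay off is a tuple literal of constants, which CPython can fold into a single constant, while the equivalent list literal has to be built at run time (a side illustration, not part of the benchmark above):
import dis

# The tuple of constants is folded into one LOAD_CONST by the compiler...
dis.dis(compile("(1, 2, 3)", "<test>", "eval"))
# ...whereas the list literal must be constructed at run time (BUILD_LIST / LIST_EXTEND).
dis.dis(compile("[1, 2, 3]", "<test>", "eval"))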
Using comparable constructions for the test, the tuple is actually slightly faster:
In [12]: timeit tuple(t for t in range(100000))
100 loops, best of 3: 7.41 ms per loop
In [13]: timeit list(t for t in range(100000))
100 loops, best of 3: 7.53 ms per loop
calling list does actually create a list:
In [19]: x = list(t for t in range(10))
In [20]: x
Out[20]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
We can also see that calling list on the generator does not allocate as much space as using a list comprehension:
In [28]: x = list(t for t in range(10))
In [29]: sys.getsizeof(x)
Out[29]: 168
In [30]: x = [t for t in range(10)]
In [31]: sys.getsizeof(x)
Out[31]: 200
So both operations are very similar.
A better comparison would be creating lists and tuples as subelements:
In [41]: timeit tuple((t,) for t in range(1000000))
10 loops, best of 3: 151 ms per loop
In [42]: timeit list([t] for t in range(1000000))
1 loops, best of 3: 247 ms per loop
Now we see a much larger difference.
I want to assign a single value to a part of a list. Is there a better solution to this than one of the following?
Maybe most performant but somehow ugly:
>>> l=[0,1,2,3,4,5]
>>> for i in range(2,len(l)): l[i] = None
>>> l
[0, 1, None, None, None, None]
Concise (but I don't know if Python recognizes that no rearrangement of the list elements is necessary):
>>> l=[0,1,2,3,4,5]
>>> l[2:] = [None]*(len(l)-2)
>>> l
[0, 1, None, None, None, None]
Same caveat as above:
>>> l=[0,1,2,3,4,5]
>>> l[2:] = [None for _ in range(len(l)-2)]
>>> l
[0, 1, None, None, None, None]
Not sure if using a library for such a trivial task is wise:
>>> import itertools
>>> l=[0,1,2,3,4,5]
>>> l[2:] = itertools.repeat(None,len(l)-2)
>>> l
[0, 1, None, None, None, None]
The problem that I see with the assignment to the slice (vs. the for loop) is that Python may try to prepare for a change in the length of "l". After all, changing the list by inserting a shorter/longer slice involves copying all elements (that is, all references) of the list AFAIK. If Python does this in my case too (although it is unnecessary), the operation becomes O(n) instead of O(1) (assuming that I only ever change a handful of elements).
Timing it:
python -mtimeit "l=[0,1,2,3,4,5]" "for i in range(2,len(l)):" " l[i] = None"
1000000 loops, best of 3: 0.669 usec per loop
python -mtimeit "l=[0,1,2,3,4,5]" "l[2:] = [None]*(len(l)-2)"
1000000 loops, best of 3: 0.419 usec per loop
python -mtimeit "l=[0,1,2,3,4,5]" "l[2:] = [None for _ in range(len(l)-2)]"
1000000 loops, best of 3: 0.655 usec per loop
python -mtimeit "l=[0,1,2,3,4,5]" "l[2:] = itertools.repeat(None,len(l)-2)"
1000000 loops, best of 3: 0.997 usec per loop
Looks like l[2:] = [None]*(len(l)-2) is the best of the options you provided (for the scope you are dealing with).
Note:
Keep in mind that results will vary based on Python version, operating system, other currently running programs, and most of all the size of the list and of the slice to be replaced. For larger scopes, the last option (using itertools.repeat) will probably be the most effective, being both easily readable (Pythonic) and efficient.
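A rough way to check that claim for a much larger list and slice (a hypothetical benchmark along the same lines as above; -s keeps the list setup out of the timed statement, and the numbers will vary by machine):
python -mtimeit -s "l = list(range(1000000))" "l[2:] = [None]*(len(l)-2)"
python -mtimeit -s "import itertools; l = list(range(1000000))" "l[2:] = itertools.repeat(None, len(l)-2)"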
All of your solutions are Pythonic and about equally readable. If you really care about performance and think that it matters in this case, use the timeit module to benchmark them.
Having said that, I would expect that the first solution is almost certainly not the most performant one because it iterates over the list elements in Python. Also, Python doesn't optimize away the list created on the right-hand side of the assignment, but list creation is extremely fast, and in most cases a small temporary list doesn't affect execution time at all. Personally, for a short list I'd go with your second solution, and for a longer list I'd go with itertools.repeat().
Note that itertools doesn't really count as a "library", it comes with Python and is so often used that it is essentially part of the language.
I think there's no out-of-the-box feature in Python to do this. I like your second approach, but keep in mind that there's a tradeoff between space and time. This is a very good read, recommended by @user4815162342: Python Patterns - An Optimization Anecdote.
Anyhow, if this is an operation you'll be performing eventually in your code, I think your best option is to wrap it inside a helper function:
def setvalues(lst, index=0, value=None):
    for i in range(index, len(lst)):
        lst[i] = value

>>> l = [1, 2, 3, 4, 5]
>>> setvalues(l, index=2)
>>> l
[1, 2, None, None, None]
This has some advantages:
The code is refactored inside a function, so easy to modify if you change your mind about how to perform the action.
You can have several functions that accomplish the same target and therefore can measure their performance.
You can write tests for them.
Every other advantage you can get by refactoring :)
Since IMHO there's no direct Python feature for this action, this is the best workaround I can imagine.
Hope this helps!
Being more Pythonic and being more performant are goals that can sometimes collide. So you're basically asking two questions. If you really need the performance: measure it and take the fastest. In all other cases just go with what is most readable, in other words what is most Pythonic (what is most readable/familiar to other Python programmers).
Personally I think your second solution is quite readable:
>>> l=[0,1,2,3,4,5]
>>> l[2:] = [None]*(len(l)-2)
>>> l
[0, 1, None, None, None, None]
The start of the second line immediately tells me that you're replacing a specific part of the values of the list.
I'd suggest something like this:
>>> l = [0, 1, 2, 3, 4, 5]
>>> n = [None] * len(l)
>>> l[2:] = n[2:]
>>> l
[0, 1, None, None, None, None]
It looks pretty: no explicit loops, no 'if', no comparisons! At the cost of a 2N complexity! (or not?)
EDIT - Editing the original list now.
You forgot this one (IMO, it is more readable):
>>> l = [i if i < 2 else None for i in range(6)]
>>> l
[0, 1, None, None, None, None]
If preserving the original values is necessary:
>>> l = list(range(6))
>>> l
[0, 1, 2, 3, 4, 5]
>>> l[:] = [l[i] if i < 2 else None
... for i in range(len(l))]
>>> l
[0, 1, None, None, None, None]
I timed it; performance is roughly 2.5 times slower than what Inbar got for the fastest method.
I have a numpy array, say, [a,b,c,d,e,...], and would like to compute an array that would look like [x*a+y*b, x*b+y*c, x*c+y*d,...]. The idea that I have is to first split the original array into something like [[a,b],[b,c],[c,d],[d,e],...] and then attack this creature with np.average specifying the axis and weights (x+y=1 in my case), or even use np.dot. Unfortunately, I don't know how to create such array of [a,b],[b,c],... pairs. Any help, or completely different idea even to accomplish the major task, are much appreciated :-)
The quickest, simplest would be to manually extract two slices of your array and add them together:
>>> import numpy as np
>>> arr = np.arange(5)
>>> x, y = 10, 1
>>> x*arr[:-1] + y*arr[1:]
array([ 1, 12, 23, 34])
This will turn into a pain if you want to generalize it to triples, quadruples... But you can create your array of pairs from the original array with as_strided in a much more general form:
>>> from numpy.lib.stride_tricks import as_strided
>>> arr_pairs = as_strided(arr, shape=(len(arr)-2+1,2), strides=arr.strides*2)
>>> arr_pairs
array([[0, 1],
[1, 2],
[2, 3],
[3, 4]])
Of course the nice thing about using as_strided is that, just like with the array slices, there is no data copying involved, just messing with the way memory is viewed, so creating this array is virtually costless.
And now probably the fastest is to use np.dot:
>>> xy = [x, y]
>>> np.dot(arr_pairs, xy)
array([ 1, 12, 23, 34])
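For what it's worth, the same as_strided + dot idea can be generalized to windows of arbitrary length. A sketch (the function name windowed_dot and the example weights are just illustrative, and it assumes a 1-D contiguous array):
import numpy as np
from numpy.lib.stride_tricks import as_strided

def windowed_dot(arr, weights):
    # Weighted sum over a sliding window of len(weights), without copying arr.
    k = len(weights)
    windows = as_strided(arr, shape=(len(arr) - k + 1, k), strides=arr.strides * 2)
    return windows.dot(np.asarray(weights))

arr = np.arange(5)
print(windowed_dot(arr, [10, 1]))    # [ 1 12 23 34]
print(windowed_dot(arr, [1, 1, 1]))  # [3 6 9]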
This looks like a correlate problem.
In [61]: a
Out[61]: array([0, 1, 2, 3, 4, 5, 6, 7])

In [62]: b
Out[62]: array([1, 2])

In [63]: np.correlate(a, b, mode='valid')
Out[63]: array([ 2,  5,  8, 11, 14, 17, 20])
Depending on array size and BLAS, dot can be faster; your mileage will vary greatly:
# jamie_dot here wraps @Jamie's as_strided + dot approach from above in a function
arr = np.random.rand(int(1E6))
b = np.random.rand(2)
np.allclose(jamie_dot(arr,b),np.convolve(arr,b[::-1],mode='valid'))
True
%timeit jamie_dot(arr,b)
100 loops, best of 3: 16.1 ms per loop
%timeit np.correlate(arr,b,mode='valid')
10 loops, best of 3: 28.8 ms per loop
This is with an Intel MKL BLAS and 8 cores; np.correlate will likely be faster for most implementations.
Also an interesting observation from #Jamie's post:
%timeit b[0]*arr[:-1] + b[1]*arr[1:]
100 loops, best of 3: 8.43 ms per loop
His comment also replaced np.convolve(a, b[::-1], mode='valid') with the simpler correlate syntax.
If you have a small array, I would create a shifted copy:
import numpy

shifted_array = numpy.append(original_array[1:], 0)
result_array = x*original_array + y*shifted_array
Here you have to store your array twice in memory, so this solution is very memory inefficient, but you can get rid of the for loops.
If you have large arrays, you really need a loop (or rather, a list comprehension):
result_array = [x*original_array[i] + y*original_array[i+1] for i in range(len(original_array)-1)]
It gives you the same result, as a Python list, except for the last item, which should be treated differently anyway.
Based on some random trials, for arrays smaller than about 2000 items the first solution seems to be faster than the second one, but it runs into MemoryError even for relatively small arrays (a few tens of thousands of items on my PC).
So in general, use a list comprehension, but if you know for sure that you will only run this on small arrays (at most 1-2 thousand items), the first solution is the better bet.
Creating a new list like [[a,b],[b,c],[c,d],[d,e],...] would be both memory and time inefficient, as you also need a for loop (or similar) to create it, and you have to store every old value twice in the new array, so you would end up storing your original array three times.
Another way is to create the right pairs in the array a = np.array([a,b,c,d,e,...]), reshape according to the size of array b = np.array([x, y, ...]) and then take advantage of numpy broadcasting rules:
a = np.arange(8)
b = np.array([1, 2])
a = a.repeat(2)[1:-1]
ans = a.reshape(-1, b.shape[0]).dot(b)
Timings (on my computer):
#Ophion's solution:
# 100000 loops, best of 3: 4.67 µs per loop
This solution:
# 100000 loops, best of 3: 9.78 µs per loop
So, it is slower. #Jaime's solution is better since it does not copy the data like this one.