In Python, is there a difference (say, in performance) between writing
L.append(x)
and
L[len(L):len(L)] = [x]
where L is a list? If there is, what is it caused by?
Thanks!
Apart from the append method, you can also add an element to the end of a list with insert, which I'm guessing is what you are getting at:
In [115]: l=[1,]
In [116]: l.insert(len(l), 11)
In [117]: l
Out[117]: [1, 11]
l.append(x) vs. l.insert(len(l), x):
In [166]: %timeit -n1000 l=[1]; l.append(11)
1000 loops, best of 3: 936 ns per loop
In [167]: %timeit -n1000 l=[1]; l.insert(len(l), 11)
1000 loops, best of 3: 1.44 us per loop
It's clear that append is the faster of the two.
And then l.append(x) vs. l[len(l):len(l)] = [x] (or l[len(l):] = [x]):
In [145]: %timeit -n1000 l=[1]; l.append(123);
1000 loops, best of 3: 878 ns per loop
In [146]: %timeit -n1000 l=[1]; l[len(l):]=[123]
1000 loops, best of 3: 1.24 us per loop
In [147]: %timeit -n1000 l=[1]; l[len(l):len(l)]=[123]
1000 loops, best of 3: 1.46 us per loop
There is no difference on my system...
In [22]: f = (4,)
In [21]: %timeit l = [1,2,3]; l.append(4)
1000000 loops, best of 3: 265 ns per loop
In [23]: %timeit l = [1,2,3]; l.append(f)
1000000 loops, best of 3: 266 ns per loop
In [24]: %timeit l = [1,2,3]; l.extend(f)
1000000 loops, best of 3: 270 ns per loop
In [25]: %timeit l = [1,2,3]; l[4:] = f
1000000 loops, best of 3: 260 ns per loop
This means that in an apples-to-apples comparison, they are the same (above differences are probably less than random error).
However, anything extra (such as having to calculate len in that version) may skew the results for some particular implementation.
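For reference, here is a small sketch (Python 3 assumed; the helper name append_variants is made up for illustration) checking that the spellings being compared leave the list in the same state, provided the value is wrapped in an iterable wherever the operation expects one:

```python
def append_variants(x):
    """Return the lists produced by each way of adding x to the end of [1, 2, 3]."""
    results = []

    a = [1, 2, 3]
    a.append(x)            # takes the element itself
    results.append(a)

    b = [1, 2, 3]
    b.extend((x,))         # takes an iterable of elements
    results.append(b)

    c = [1, 2, 3]
    c[len(c):] = (x,)      # slice assignment also takes an iterable
    results.append(c)

    d = [1, 2, 3]
    d[4:] = (x,)           # a start index past the end is clamped to len(d)
    results.append(d)

    e = [1, 2, 3]
    e.insert(len(e), x)
    results.append(e)

    return results


variants = append_variants(4)
print(all(v == [1, 2, 3, 4] for v in variants))  # True
```

Note that l.append(f) with f = (4,) appends the tuple itself, giving [1, 2, 3, (4,)], so that particular timing measures a slightly different operation from the others.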
As always, performance testing has pitfalls. But in your example:
x need not be an iterable, yet you are wrapping it in one. This extra step incurs a performance penalty.
Calling len(L) is not free; it takes a non-zero amount of time, which also incurs a performance penalty.
Some quick testing bears this out:
def f():
    a = []
    for i in range(10000):
        a.append(0)
def g():
    a = []
    for i in range(10000):
        a[len(a):len(a)] = [0]
%timeit f()
1000 loops, best of 3: 683 us per loop
%timeit g()
100 loops, best of 3: 2.4 ms per loop
Now one non-obvious "optimization" you can do to remove the len(L) effect is to use a constant slice index that is higher than the length your list will ever reach. Slicing never raises an IndexError, even if you're waaaaay off the end of the sequence; out-of-range bounds are simply clamped. So let's do that.
def h():
    a = []
    for i in range(10000):
        a[11111:11111] = [0]
%timeit h()
1000 loops, best of 3: 1.45 ms per loop
So as suspected, both wrapping your x in an iterable and calling len have small but tangible performance penalties.
And, of course, doing li[len(li):len(li)] is UGLY. That's the biggest performance penalty: the time it takes my brain to figure out what the heck it just looked at. :-)
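For anyone who wants to repeat the comparison outside IPython, here is a minimal sketch using the standard timeit module. The function names and the far default are illustrative choices, not part of the original test, and the numbers will vary by machine and interpreter version.

```python
import timeit


def build_with_append(n):
    a = []
    for _ in range(n):
        a.append(0)
    return a


def build_with_len_slice(n):
    a = []
    for _ in range(n):
        a[len(a):len(a)] = [0]
    return a


def build_with_constant_slice(n, far=10**9):
    # `far` is an arbitrary index assumed to exceed any length the list reaches;
    # slice bounds past the end are clamped, so this still appends.
    a = []
    for _ in range(n):
        a[far:far] = [0]
    return a


if __name__ == "__main__":
    for func in (build_with_append, build_with_len_slice, build_with_constant_slice):
        seconds = timeit.timeit(lambda: func(10000), number=100)
        print("%-28s %.3f ms per call" % (func.__name__, seconds / 100 * 1000))
```

All three functions return the same list, so any difference in the printed times comes from how the element gets there rather than from the result.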
What is the best way to check whether a NumPy array contains any element of another array?
example:
array1 = [10,5,4,13,10,1,1,22,7,3,15,9]
array2 = [3,4,9,10,13,15,16,18,19,20,21,22,23]
I want to get a True if array1 contains any value of array2, otherwise a False.
Using Pandas, you can use isin:
a1 = np.array([10,5,4,13,10,1,1,22,7,3,15,9])
a2 = np.array([3,4,9,10,13,15,16,18,19,20,21,22,23])
>>> pd.Series(a1).isin(a2).any()
True
And using the NumPy in1d function (per the comment from @Norman):
>>> np.any(np.in1d(a1, a2))
True
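In recent NumPy releases np.isin is the recommended spelling of np.in1d (in1d is deprecated as of NumPy 2.0). A small sketch of the same check, with shares_any_value being a made-up helper name:

```python
import numpy as np


def shares_any_value(first, second):
    """True if at least one value of `first` also occurs in `second`."""
    return bool(np.isin(first, second).any())


a1 = np.array([10, 5, 4, 13, 10, 1, 1, 22, 7, 3, 15, 9])
a2 = np.array([3, 4, 9, 10, 13, 15, 16, 18, 19, 20, 21, 22, 23])
print(shares_any_value(a1, a2))        # True
print(shares_any_value(a1, a1 + 100))  # False
```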
For small arrays such as those in this example, the set-based solution is the clear winner. For larger arrays with no overlap, the Pandas and NumPy solutions are faster than the set-based one, and np.intersect1d is the fastest of them in the timings below.
Small arrays (12-13 elements)
%timeit set(array1) & set(array2)
The slowest run took 4.22 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 1.69 µs per loop
%timeit any(i in a1 for i in a2)
The slowest run took 12.29 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 1.88 µs per loop
%timeit np.intersect1d(a1, a2)
The slowest run took 10.29 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 15.6 µs per loop
%timeit np.any(np.in1d(a1, a2))
10000 loops, best of 3: 27.1 µs per loop
%timeit pd.Series(a1).isin(a2).any()
10000 loops, best of 3: 135 µs per loop
Using an array with 100k elements (no overlap):
a3 = np.random.randint(0, 100000, 100000)
a4 = a3 + 100000
%timeit np.intersect1d(a3, a4)
100 loops, best of 3: 13.8 ms per loop
%timeit pd.Series(a3).isin(a4).any()
100 loops, best of 3: 18.3 ms per loop
%timeit np.any(np.in1d(a3, a4))
100 loops, best of 3: 18.4 ms per loop
%timeit set(a3) & set(a4)
10 loops, best of 3: 23.6 ms per loop
%timeit any(i in a3 for i in a4)
1 loops, best of 3: 34.5 s per loop
You can try this
>>> array1 = [10,5,4,13,10,1,1,22,7,3,15,9]
>>> array2 = [3,4,9,10,13,15,16,18,19,20,21,22,23]
>>> set(array1) & set(array2)
set([3, 4, 9, 10, 13, 15, 22])
If the result is non-empty, the two arrays have elements in common.
If the result is empty, they have no elements in common.
You can use the built-in any function with a generator expression:
>>> array1 = [10,5,4,13,10,1,1,22,7,3,15,9]
>>> array2 = [3,4,9,10,13,15,16,18,19,20,21,22,23]
>>> any(i in array2 for i in array1)
True
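With plain lists, i in array2 scans array2 from the start for every i, so the cost grows with the product of the two lengths. A sketch of two set-based variants that keep the early exit (helper names are illustrative; elements are assumed to be hashable):

```python
def shares_any_element(first, second):
    """Membership tests against a set are O(1) on average."""
    lookup = set(second)
    return any(item in lookup for item in first)


def shares_any_element_disjoint(first, second):
    # set.isdisjoint accepts any iterable and returns as soon as it
    # finds an element that is present in the set.
    return not set(first).isdisjoint(second)


array1 = [10, 5, 4, 13, 10, 1, 1, 22, 7, 3, 15, 9]
array2 = [3, 4, 9, 10, 13, 15, 16, 18, 19, 20, 21, 22, 23]
print(shares_any_element(array1, array2))           # True
print(shares_any_element_disjoint(array1, [0, 2]))  # False
```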
I'm testing different versions of string sanitizing and encountered the effect below. It is hard for me to tell whether this is really the result of caching, as IPython's %timeit warns, or whether it is real. Please advise:
str.replace:
def sanit2(s):
    for c in ["'", '%', '"']:
        s=s.replace(c,'')
    return s
In [44]: %timeit sanit2(r""" ' ' % a % ' """)
The slowest run took 12.43 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 985 ns per loop
List comprehension:
def sanit3(s):
    removed = [x for x in s if not x in ["'", '%', '"']]
    return ''.join(removed)
In [42]: %timeit sanit3(r""" ' ' % a % ' """)
The slowest run took 8.95 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 2.12 µs per loop
This seems to hold for relatively long strings too:
In [46]: reallylong = r""" ' ' % a % ' """ * 1000
In [47]: len(reallylong)
Out[47]: 22000
In [48]: %timeit sanit2(reallylong)
The slowest run took 4.94 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 96.9 µs per loop
In [49]: %timeit sanit3(reallylong)
1000 loops, best of 3: 1.9 ms per loop
UPDATE: I presumed that str.replace also has more or less O(n) complexity, so I expected both sanit2 and sanit3 to have about O(n^2) complexity.
I tested cost of str.replace depending on string length:
In [59]: orig_str = r""" ' ' % a % ' """
In [60]: for i in range(1,11):
....:     longer = orig_str * i * 1000
....:     %timeit longer.replace('%', '')
....:
10000 loops, best of 3: 44.2 µs per loop
10000 loops, best of 3: 87.8 µs per loop
10000 loops, best of 3: 131 µs per loop
10000 loops, best of 3: 177 µs per loop
1000 loops, best of 3: 219 µs per loop
1000 loops, best of 3: 259 µs per loop
1000 loops, best of 3: 311 µs per loop
1000 loops, best of 3: 349 µs per loop
1000 loops, best of 3: 398 µs per loop
1000 loops, best of 3: 435 µs per loop
In [61]: t="""10000 loops, best of 3: 44.2 µs per loop
....: 10000 loops, best of 3: 87.8 µs per loop
....: 10000 loops, best of 3: 131 µs per loop
....: 10000 loops, best of 3: 177 µs per loop
....: 1000 loops, best of 3: 219 µs per loop
....: 1000 loops, best of 3: 259 µs per loop
....: 1000 loops, best of 3: 311 µs per loop
....: 1000 loops, best of 3: 349 µs per loop
....: 1000 loops, best of 3: 398 µs per loop
....: 1000 loops, best of 3: 435 µs per loop"""
Looks linear, but I calculated it to be sure:
In [63]: averages=[]
In [66]: for idx, line in enumerate(t.split('\n')):
....:     repl_time = line.rsplit(':',1)[1].split(' ')[1]
....:     averages.append(float(repl_time)/(idx+1))
....:
In [67]: averages
Out[67]:
[44.2,
43.9,
43.666666666666664,
44.25,
43.8,
43.166666666666664,
44.42857142857143,
43.625,
44.22222222222222,
43.5]
Yes, str.replace is almost perfectly O(n). So, on top of iterating over the list of characters to be replaced, sanit2 should have O(n^2) complexity just like sanit3: x for x in s iterates over the characters of the string, which is O(n), and x in ["'", '%', '"'] should be O(n) as well given the cost of list.__contains__, so O(n^2) altogether.
So in reply to chepner, yes, sanit2 does a fixed number of function calls (and few, just 3 in the example), but due to internal cost of str.replace it seems like sanit2 should have similar order of complexity to sanit3.
Is the difference all due to the fact that str.replace is implemented in C, or does the function call (list.__contains__) also play an important role?
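The same linearity check can be scripted with the timeit module instead of pasting %timeit output into a string. This is only a sketch: replace_time_per_unit and its parameters are names invented here, and the absolute numbers depend on the machine.

```python
import timeit


def replace_time_per_unit(base, multiples=range(1, 11), unit=1000, number=200):
    """Time str.replace on strings of growing length, normalised by the multiple.

    Returns microseconds per call divided by the multiple, so a roughly
    constant list suggests linear scaling.
    """
    per_unit = []
    for i in multiples:
        longer = base * i * unit
        best = min(timeit.repeat(lambda: longer.replace('%', ''),
                                 number=number, repeat=3))
        per_unit.append(best / number * 1e6 / i)
    return per_unit


if __name__ == "__main__":
    for value in replace_time_per_unit(r""" ' ' % a % ' """):
        print("%.1f" % value)
```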
sanit2 makes a fixed number of calls, independent of the length of s, to a string method implemented in C.
sanit3 makes a variable number of calls (one per character in s) to list.__contains__, which itself is linear in the length of the list being searched rather than O(1). It also has to construct a list object and then call ''.join on that list.
It's not surprising that sanit2 is faster.
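A rough way to see this is to time both functions at two input sizes and look at the cost per character. The sketch below assumes Python 3; the UNWANTED constant and the loop sizes are illustrative choices. If both functions scale linearly, the per-character figure should stay roughly flat for each of them while differing between them by a constant factor.

```python
import timeit

UNWANTED = ["'", '%', '"']


def sanit2(s):
    # A fixed number of calls into C code, regardless of len(s)
    for c in UNWANTED:
        s = s.replace(c, '')
    return s


def sanit3(s):
    # One interpreter-level membership test per character of s
    return ''.join([x for x in s if x not in UNWANTED])


if __name__ == "__main__":
    for repeat in (1000, 10000):
        text = r""" ' ' % a % ' """ * repeat
        for func in (sanit2, sanit3):
            seconds = timeit.timeit(lambda: func(text), number=20)
            print(func.__name__, len(text),
                  "%.1f ns per character" % (seconds / 20 / len(text) * 1e9))
```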
In the pandas manual, there is this example about indexing:
In [653]: criterion = df2['a'].map(lambda x: x.startswith('t'))
In [654]: df2[criterion]
Then Wes wrote:
# equivalent but slower
In [655]: df2[[x.startswith('t') for x in df2['a']]]
Can anyone here explain why the map approach is faster? Is this a Python feature or a pandas feature?
Arguments about why a certain way of doing things in Python "should be" faster can't be taken too seriously, because you're often measuring implementation details which may behave differently in certain situations. As a result, when people guess what should be faster, they're often (usually?) wrong. For example, I find that map can actually be slower. Using this setup code:
import numpy as np, pandas as pd
import random, string
def make_test(num, width):
    s = [''.join(random.sample(string.ascii_lowercase, width)) for i in range(num)]
    df = pd.DataFrame({"a": s})
    return df
Let's compare the time they take to make the indexing object -- whether a Series or a list -- and the resulting time it takes to use that object to index into the DataFrame. It could be, for example, that making a list is fast but before using it as an index it needs to be internally converted to a Series or an ndarray or something and so there's extra time added there.
First, for a small frame:
>>> df = make_test(10, 10)
>>> %timeit df['a'].map(lambda x: x.startswith('t'))
10000 loops, best of 3: 85.8 µs per loop
>>> %timeit [x.startswith('t') for x in df['a']]
100000 loops, best of 3: 15.6 µs per loop
>>> %timeit df['a'].str.startswith("t")
10000 loops, best of 3: 118 µs per loop
>>> %timeit df[df['a'].map(lambda x: x.startswith('t'))]
1000 loops, best of 3: 304 µs per loop
>>> %timeit df[[x.startswith('t') for x in df['a']]]
10000 loops, best of 3: 194 µs per loop
>>> %timeit df[df['a'].str.startswith("t")]
1000 loops, best of 3: 348 µs per loop
and in this case the listcomp is fastest. That doesn't actually surprise me too much, to be honest, because going via a lambda is likely to be slower than using str.startswith directly, but it's really hard to guess. 10 is small enough we're probably still measuring things like setup costs for Series; what happens in a larger frame?
>>> df = make_test(10**5, 10)
>>> %timeit df['a'].map(lambda x: x.startswith('t'))
10 loops, best of 3: 46.6 ms per loop
>>> %timeit [x.startswith('t') for x in df['a']]
10 loops, best of 3: 27.8 ms per loop
>>> %timeit df['a'].str.startswith("t")
10 loops, best of 3: 48.5 ms per loop
>>> %timeit df[df['a'].map(lambda x: x.startswith('t'))]
10 loops, best of 3: 47.1 ms per loop
>>> %timeit df[[x.startswith('t') for x in df['a']]]
10 loops, best of 3: 52.8 ms per loop
>>> %timeit df[df['a'].str.startswith("t")]
10 loops, best of 3: 49.6 ms per loop
And now it seems like the map is winning when used as an index, although the difference is marginal. But not so fast: what if we manually turn the listcomp into an array or a Series?
>>> %timeit df[np.array([x.startswith('t') for x in df['a']])]
10 loops, best of 3: 40.7 ms per loop
>>> %timeit df[pd.Series([x.startswith('t') for x in df['a']])]
10 loops, best of 3: 37.5 ms per loop
and now the listcomp wins again!
Conclusion: who knows? But never believe anything without timeit results, and even then you have to ask whether you're testing what you think you are.
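To separate the Python part of the question from the pandas part, the same two spellings can be timed on a plain list of strings. This sketch deliberately leaves pandas out (so it says nothing about Series.map itself), assumes Python 3, and uses function names invented for illustration:

```python
import random
import string
import timeit


def make_strings(num, width):
    return [''.join(random.sample(string.ascii_lowercase, width)) for _ in range(num)]


def with_map(values):
    # list() forces evaluation, since map is lazy in Python 3
    return list(map(lambda x: x.startswith('t'), values))


def with_listcomp(values):
    return [x.startswith('t') for x in values]


if __name__ == "__main__":
    data = make_strings(10**5, 10)
    assert with_map(data) == with_listcomp(data)
    for func in (with_map, with_listcomp):
        best = min(timeit.repeat(lambda: func(data), number=10, repeat=3)) / 10
        print("%-14s %.2f ms per call" % (func.__name__, best * 1000))
```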
I'm using the numpy.array() function to create numpy.float64 ndarrays from lists.
I noticed that this is very slow when either the list contains None or a list of lists is provided.
Below are some examples with times. There are obvious workarounds but why is this so slow?
Examples for list of None:
### Very slow to call array() with list of None
In [3]: %timeit numpy.array([None]*100000, dtype=numpy.float64)
1 loops, best of 3: 240 ms per loop
### Problem doesn't exist with array of zeroes
In [4]: %timeit numpy.array([0.0]*100000, dtype=numpy.float64)
100 loops, best of 3: 9.94 ms per loop
### Also fast if we use dtype=object and convert to float64
In [5]: %timeit numpy.array([None]*100000, dtype=numpy.object).astype(numpy.float64)
100 loops, best of 3: 4.92 ms per loop
### Also fast if we use fromiter() instead of array()
In [6]: %timeit numpy.fromiter([None]*100000, dtype=numpy.float64)
100 loops, best of 3: 3.29 ms per loop
Examples for list of lists:
### Very slow to create column matrix
In [7]: %timeit numpy.array([[0.0]]*100000, dtype=numpy.float64)
1 loops, best of 3: 353 ms per loop
### No problem to create column vector and reshape
In [8]: %timeit numpy.array([0.0]*100000, dtype=numpy.float64).reshape((-1,1))
100 loops, best of 3: 10 ms per loop
### Can use itertools to flatten input lists
In [9]: %timeit numpy.fromiter(itertools.chain.from_iterable([[0.0]]*100000),dtype=numpy.float64).reshape((-1,1))
100 loops, best of 3: 9.65 ms per loop
I've reported this as a numpy issue. The report and patch files are here:
https://github.com/numpy/numpy/issues/3392
After patching:
# was 240 ms, best alternate version was 3.29
In [5]: %timeit numpy.array([None]*100000)
100 loops, best of 3: 7.49 ms per loop
# was 353 ms, best alternate version was 9.65
In [6]: %timeit numpy.array([[0.0]]*100000)
10 loops, best of 3: 23.7 ms per loop
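Where None entries are the bottleneck and patching NumPy is not an option, one workaround is to replace them in Python before NumPy sees the list. This is a sketch only: to_float_array is a made-up helper, and it assumes NaN is the desired stand-in for None.

```python
import numpy as np


def to_float_array(values):
    """Build a float64 array, mapping None to NaN before calling np.array."""
    return np.array([np.nan if v is None else v for v in values], dtype=np.float64)


arr = to_float_array([1.5, None, 3])
print(arr.dtype)         # float64
print(np.isnan(arr[1]))  # True
```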
My guess would be that the code for converting lists just calls float on everything. If the argument defines __float__, we call that; otherwise we treat it like a string (this throws an exception on None, which we catch and replace with np.nan). The exception handling should be relatively slow.
Timing seems to verify this hypothesis:
import numpy as np
%timeit [None] * 100000
> 1000 loops, best of 3: 1.04 ms per loop
%timeit np.array([0.0] * 100000)
> 10 loops, best of 3: 21.3 ms per loop
%timeit [i.__float__() for i in [0.0] * 100000]
> 10 loops, best of 3: 32 ms per loop
def flt(d):
    try:
        return float(d)
    except:
        return np.nan
%timeit np.array([None] * 100000, dtype=np.float64)
> 1 loops, best of 3: 477 ms per loop
%timeit [flt(d) for d in [None] * 100000]
> 1 loops, best of 3: 328 ms per loop
Adding another case just to be obvious about where I'm going with this. If there was an explicit check for None, it would not be this slow above:
def flt2(d):
    if d is None:
        return np.nan
    try:
        return float(d)
    except:
        return np.nan
%timeit [flt2(d) for d in [None] * 100000]
> 10 loops, best of 3: 45 ms per loop
So I have an array, say something like [5,2,2,0]. Is there a function to return the number of elements that pass a criterion?
Currently I'm doing this:
a = [5,2,2,0]
len([i for i in a if i > 0])
Someone suggested this approach too:
sum(b > 0 for b in a)
But IMO this is really the same thing, just a little less readable.
Is there some method like this I could use:
def crit(x): return x > 0
a.count(criterion=crit)
Not much else you can do, but if you already have your predicate
def crit(x):
    return x > 0
you can do
sum(map(crit, a))
or
len(filter(crit, a))
len([x for x in a if x > 0]) is the most efficient, but can lead to code duplication if you want to reuse the predicate.
Tests:
In [6]: %timeit len([x for x in a if x > 0])
100000 loops, best of 3: 3.57 us per loop
In [7]: def crit(x):
...:     return x > 0
...:
In [8]: %timeit len([x for x in a if crit(x)])
100000 loops, best of 3: 10.1 us per loop
In [9]: %timeit sum([x > 0 for x in a])
100000 loops, best of 3: 5.66 us per loop
In [10]: %timeit sum([crit(x) for x in a])
100000 loops, best of 3: 12 us per loop
In [11]: %timeit sum(map(crit, a))
100000 loops, best of 3: 11.3 us per loop
In [12]: %timeit len(filter(crit, a))
100000 loops, best of 3: 8.21 us per loop
Generators (generators have no len):
In [13]: %timeit sum(1 for x in a if x > 0)
100000 loops, best of 3: 3.99 us per loop
In [14]: %timeit sum([1 for x in a if crit(x)])
10000 loops, best of 3: 10.6 us per loop
In [15]: %timeit sum(x > 0 for x in a)
100000 loops, best of 3: 6.24 us per loop
In [16]: %timeit sum(crit(x) for x in a)
100000 loops, best of 3: 13 us per loop
imap is faster than map:
In [17]: %timeit sum(itertools.imap(crit, a))
100000 loops, best of 3: 10.7 us per loop
After testing all this, I think I would go with [13], [17], or [14].
I'd go for the sum approach instead of materialising a list - if you find it that horrendous, just write a helper function:
def count_if(f, iterable):
    return sum(1 for i in iterable if f(i))
Or even better, use one of the recipes in the itertools documentation:
def quantify(iterable, pred=bool):
    "Count how many times the predicate is true"
    return sum(imap(pred, iterable))
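The snippets in this thread are written for Python 2. In Python 3, itertools.imap no longer exists (the built-in map is lazy) and filter returns an iterator, so len(filter(...)) raises a TypeError. A sketch of the same helpers adapted for Python 3:

```python
def count_if(pred, iterable):
    return sum(1 for item in iterable if pred(item))


def quantify(iterable, pred=bool):
    "Count how many times the predicate is true"
    return sum(map(pred, iterable))


def crit(x):
    return x > 0


a = [5, 2, 2, 0]
print(count_if(crit, a))           # 3
print(quantify(a, crit))           # 3
print(len(list(filter(crit, a))))  # 3
```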
You can use the filter function: len(filter(crit, a))