Python benchmark: Why is a 'for in' loop faster than a simple 'while' loop?

I was trying to optimize a simple character-counting function. After a few changes I decided to check the timings, and I expected the function using the basic 'while' loop to be faster than the 'for in' loop.
But to my surprise the while loop was almost 30% slower than 'for in' here! Shouldn't the simple 'while' loop, which has lower abstraction (does less internally), be much faster than 'for in'?
import timeit
def faster_count_alphabet(filename):
    l = [0] * 128  # all ascii values 0 to 127
    with open(filename) as fh:
        a = fh.read()
        for chars in a:
            l[ord(chars)] += 1
    return l

def faster_count_alphabet2(filename):
    l = [0] * 128  # all ascii values 0 to 127
    with open(filename) as fh:
        a = fh.read()
        i = 0
        size = len(a)
        while i < size:
            l[ord(a[i])] += 1
            i += 1
    return l

if __name__ == "__main__":
    print timeit.timeit("faster_count_alphabet('connect.log')", setup="from __main__ import faster_count_alphabet", number=10)
    print timeit.timeit("faster_count_alphabet2('connect.log')", setup="from __main__ import faster_count_alphabet2", number=10)
Here are the timings I am getting:
7.087787236
9.9472761879

While Loop
Well, in your while loop the interpreter has to check on every iteration whether your condition is still true, so it has to load both i and size and compare them.
For Loop
The for loop, on the other hand, has no need for that, since the for loop is optimized, as Chris_Rands already pointed out.
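As a rough illustration (a sketch, not from the original answer; with_for and with_while are placeholder functions), you can compare the bytecode of the two loop shapes with dis:
import dis

def with_for(a):
    total = 0
    for ch in a:            # one FOR_ITER opcode drives the whole iteration
        total += ord(ch)
    return total

def with_while(a):
    total = 0
    i = 0
    size = len(a)
    while i < size:         # every pass re-runs LOAD i, LOAD size, COMPARE_OP
        total += ord(a[i])  # plus the extra indexing a[i]
        i += 1
    return total

dis.dis(with_for)
dis.dis(with_while)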

These are the results of my testing with Python 2.7:
For Loop
test1
python -mtimeit -s"d='/Users/xuejiang/go/src/main'.split('/')" "for i in range(len(d)):k=('index:',i,'value:',d[i])"
result:
1000000 loops, best of 3: 0.747 usec per loop
test2:
python -mtimeit -s"d='/Users/xuejiang/go/src/main'.split('/');i=0" "for v in d:k=('index:',i,'value:',v);i+=1"
result:
1000000 loops, best of 3: 0.524 usec per loop
While Loop
test
python -mtimeit -s"d='/Users/xuejiang/go/src/main'.split('/');i=0" "while i <len(d):k=('index:',i,'value:',d[i]);i+=1"
result:
10000000 loops, best of 3: 0.0658 usec per loop
That is: the while loop is much faster

Your code should exercise the same functionality in both versions; I have modified it for you so you can re-run your test.
I can see you are even using the optimised foreach form of the loop instead of a normal index-based for loop while you are benchmarking.
def faster_count_for_loop(filename):
    l = [0] * 128  # all ascii values 0 to 127
    with open(filename) as fh:
        a = fh.read()
        size = len(a)
        for i in range(size):
            l[ord(a[i])] += 1
    return l

def faster_count_while_loop(filename):
    l = [0] * 128  # all ascii values 0 to 127
    with open(filename) as fh:
        a = fh.read()
        i = 0
        size = len(a)
        while i < size:
            l[ord(a[i])] += 1
            i += 1
    return l
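A minimal driver to re-run the comparison with these two versions (a sketch; it assumes the same 'connect.log' file the question uses):
import timeit

if __name__ == "__main__":
    for fn in ("faster_count_for_loop", "faster_count_while_loop"):
        t = timeit.timeit("%s('connect.log')" % fn,
                          setup="from __main__ import %s" % fn,
                          number=10)
        print fn, t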

Related

Optimizing execution-time to check if chars of a word are in a list python

I am writing Python 2.7.15 code to access the chars inside a word. How can I optimize this process, and also check whether every word is contained in an external list?
I have tried two versions of Python 2 code: version (1) is an extended version of what my code has to do, whereas in version (2) I tried a compact version of the same code.
chars_array = ['a','b','c']
VERSION (1)
def version1(word):
    chars = [x for x in word]
    count = 0
    for c in chars:
        if not c in chars_array:
            count += 1
    return count
VERSION (2)
def version2(word):
    return sum([1 for c in [x for x in word] if not c in chars_array])
I am analyzing a large corpus and for version1 I obtain an execution time of 8.56 sec, whereas for version2 it is 8.12 sec.
The fastest solution (can be up to 100x faster for an extremely long string):
joined = ''.join(chars_array)
def version3(word):
    return len(word.translate(None, joined))
Another slower solution that is approximately the same speed as your code:
from itertools import ifilterfalse
def version4(word):
    return sum(1 for _ in ifilterfalse(set(chars_array).__contains__, word))
Timings (s is a random string):
In [17]: %timeit version1(s)
1000 loops, best of 3: 79.9 µs per loop
In [18]: %timeit version2(s)
10000 loops, best of 3: 98.1 µs per loop
In [19]: %timeit version3(s)
100000 loops, best of 3: 4.12 µs per loop # <- fastest
In [20]: %timeit version4(s)
10000 loops, best of 3: 84.3 µs per loop
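Aside, as a sketch (not part of the answer above): word.translate(None, joined) is the Python 2 str API; on Python 3 the same deletion is expressed with a mapping table:
chars_array = ['a', 'b', 'c']
table = {ord(c): None for c in chars_array}  # map each unwanted char to None, i.e. delete it

def version3_py3(word):
    return len(word.translate(table))

print(version3_py3('abcdxyz'))  # 4: only 'd', 'x', 'y', 'z' survive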
With chars_array = ['a', 'e', 'i', 'o', 'u', 'y'] and words equal to a list
of 56048 English words, I measured a number of variants with a command similar to the following at an IPython prompt:
%timeit n = [version1(word) for word in words]
In each case it reported "10 loops, best of 3", and I have shown the time per loop
in comments next to each function definition below:
# OP's originals:
def version1(word): # 163 ms
    chars = [x for x in word]
    count = 0
    for c in chars:
        if not c in chars_array:
            count += 1
    return count

def version2(word): # 173 ms
    return sum([1 for c in [x for x in word] if not c in chars_array])
Now let's hit version1 and version2 with three optimizations:
remove the redundant list comprehension and iterate through word directly instead;
use the operator not in rather than negating the result of the in operator;
check for (non-)membership of a set rather than a list.
chars_set = set(chars_array)

def version1a(word): # 95.5 ms
    count = 0
    for c in word:
        if c not in chars_set:
            count += 1
    return count

def version2a(word): # 104 ms
    return sum([1 for c in word if c not in chars_set])
So there's actually an advantage for the multi-line code over the list comprehension. This may depend on word length, though: version2a has to allocate a new list the same length as the word, whereas version1a does not. Let's refine version2a further to give it that same advantage, by summing over a generator expression rather than a list comprehension:
def version2b(word): # 111 ms
    return sum(1 for c in word if c not in chars_set)
To my surprise that was actually slightly counterproductive—but again, that effect may depend on word length.
Finally let's experience the power of .translate():
chars_str = ''.join(chars_set)
def version3(word): # 40.7 ms
    return len(word.translate(None, chars_str))
We have a clear winner.

Why is Python dict update insanely slow?

I had a Python program which reads lines from files and puts them into a dict; simplified, it looks like:
data = {'file_name': ''}
with open('file_name') as in_fd:
    for line in in_fd:
        data['file_name'] += line
I found it took hours to finish.
And then, I did a bit change to the program:
data = {'file_name': []}
with open('file_name') as in_fd:
    for line in in_fd:
        data['file_name'].append(line)
data['file_name'] = ''.join(data['file_name'])
It finished in seconds.
I thought it was += that made the program slow, but it seems not. Please take a look at the result of the following test.
I knew we could use list append and join to improve performance when concatenating strings. But I never expected such a performance gap between append-and-join and add-and-assign.
So I decided to do some more tests, and finally found it is the dict update operation that makes the program insanely slow. Here is a script:
import time

LOOPS = 10000
WORD = 'ABC' * 100

s1 = time.time()
buf1 = []
for i in xrange(LOOPS):
    buf1.append(WORD)
ss = ''.join(buf1)

s2 = time.time()
buf2 = ''
for i in xrange(LOOPS):
    buf2 += WORD

s3 = time.time()
buf3 = {'1': ''}
for i in xrange(LOOPS):
    buf3['1'] += WORD

s4 = time.time()
buf4 = {'1': []}
for i in xrange(LOOPS):
    buf4['1'].append(WORD)
buf4['1'] = ''.join(buf4['1'])

s5 = time.time()
print s2-s1, s3-s2, s4-s3, s5-s4
On my laptop (mac pro 2013 mid, OS X 10.9.5, CPython 2.7.10), its output is:
0.00299620628357 0.00415587425232 3.49465799332 0.00231599807739
Inspired by juanpa.arrivillaga's comments, I did a bit change to the second loop:
trivial_reference = []
buf2 = ''
for i in xrange(LOOPS):
    buf2 += WORD
    trivial_reference.append(buf2)  # add a trivial reference to avoid optimization
After the change, the second loop now takes 19 seconds to complete. So it seems to be just an optimization issue, as juanpa.arrivillaga said.
+= performs really badly when building large strings, but can be efficient in one case in CPython, mentioned below.
For reliably fast string concatenation, use str.join().
From String Concatenation section under Python Performance Tips:
Avoid this:
s = ""
for substring in list:
s += substring
Use s = "".join(list) instead. The former is a very common and catastrophic mistake when building large strings.
Why is s += x faster than s['1'] += x or s[0] += x?
From Note 6:
CPython implementation detail: If s and t are both strings, some
Python implementations such as CPython can usually perform an in-place
optimization for assignments of the form s = s + t or s += t. When
applicable, this optimization makes quadratic run-time much less
likely. This optimization is both version and implementation
dependent. For performance sensitive code, it is preferable to use the
str.join() method which assures consistent linear concatenation
performance across versions and implementations.
The optimization in case of CPython is that if a string has only one reference then we can resize it in-place.
/* Note that we don't have to modify *unicode for unshared Unicode
objects, since we can modify them in-place. */
Now the latter two are not simple in-place additions; in fact, they are not in-place additions at all.
s[0] += x
is equivalent to:
temp = s[0] # Extra reference: `s[0]` and `temp` both point to the same string now.
temp += x
s[0] = temp
Example:
>>> lst = [1, 2, 3]
>>> def func():
... lst[0] = 90
... return 100
...
>>> lst[0] += func()
>>> print lst
[101, 2, 3] # Not [190, 2, 3]
But in general, never use s += x for concatenating strings; always use str.join on a collection of strings.
Timings
LOOPS = 1000
WORD = 'ABC' * 100

def list_append():
    buf1 = [WORD for _ in xrange(LOOPS)]
    return ''.join(buf1)

def str_concat():
    buf2 = ''
    for i in xrange(LOOPS):
        buf2 += WORD

def dict_val_concat():
    buf3 = {'1': ''}
    for i in xrange(LOOPS):
        buf3['1'] += WORD
    return buf3['1']

def list_val_concat():
    buf4 = ['']
    for i in xrange(LOOPS):
        buf4[0] += WORD
    return buf4[0]

def val_pop_concat():
    buf5 = ['']
    for i in xrange(LOOPS):
        val = buf5.pop()
        val += WORD
        buf5.append(val)
    return buf5[0]

def val_assign_concat():
    buf6 = ['']
    for i in xrange(LOOPS):
        val = buf6[0]
        val += WORD
        buf6[0] = val
    return buf6[0]
>>> %timeit list_append()
1000 loops, best of 3: 1.31 ms per loop
>>> %timeit str_concat()
100 loops, best of 3: 3.09 ms per loop
>>> %run so.py
>>> %timeit list_append()
10000 loops, best of 3: 71.2 us per loop
>>> %timeit str_concat()
1000 loops, best of 3: 276 us per loop
>>> %timeit dict_val_concat()
100 loops, best of 3: 9.66 ms per loop
>>> %timeit list_val_concat()
100 loops, best of 3: 9.64 ms per loop
>>> %timeit val_pop_concat()
1000 loops, best of 3: 556 us per loop
>>> %timeit val_assign_concat()
100 loops, best of 3: 9.31 ms per loop
val_pop_concat is fast here because by using pop() we are dropping the list's reference to that string, and now CPython can resize it in place (guessed correctly by niemmi in the comments).
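A small sketch of why that works (not from the answer; remember sys.getrefcount reports one extra reference for its own argument):
import sys

WORD = 'ABC' * 100
buf = [WORD + 'x']          # a fresh, non-interned string held only by the list
val = buf[0]
print sys.getrefcount(val)  # 3: buf[0], val, and getrefcount's argument
val = buf.pop()
print sys.getrefcount(val)  # 2: the list's reference is gone, so `val += WORD`
                            # can now resize the string in place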

ordered reduce for multiple functions in python

Ordered list reduction
I need to reduce some lists where, depending on element types, the speed and implementation of the binary operation varies, i.e. large speed reductions can be gained by reducing some pairs with specific functions first.
For example foo(a[0], bar(a[1], a[2]))
might be a lot slower than bar(foo(a[0], a[1]), a[2]), but in this case gives the same result.
I have the code that produces an optimal ordering in the form of a list of tuples (pair_index, binary_function) already. I am struggling to implement an efficient function to perform the reduction, ideally one that returns a new partial function which can then be used repeatedly on lists of the same type-ordering but varying values.
Simple and slow(?) solution
Here is my naive solution involving a for loop, deletion of elements and closure over the (pair_index, binary_function) list to return a 'precomputed' function.
def ordered_reduce(a, pair_indexes, binary_functions, precompute=False):
    """
    a: list to reduce, length n
    pair_indexes: order of pairs to reduce, length (n-1)
    binary_functions: functions to use for each reduction, length (n-1)
    """
    def ord_red_func(x):
        y = list(x)  # copy so as not to eat up
        for p, f in zip(pair_indexes, binary_functions):
            b = f(y[p], y[p+1])
            # Replace pair
            del y[p]
            y[p] = b
        return y[0]
    return ord_red_func if precompute else ord_red_func(a)
>>> foos = (lambda a, b: a - b, lambda a, b: a + b, lambda a, b: a * b)
>>> ordered_reduce([1, 2, 3, 4], (2, 1, 0), foos)
1
>>> 1 * (2 + (3-4))
1
And how pre-compution works:
>>> foo = ordered_reduce(None, (0, 1, 0), foos)
>>> foo([1, 2, 3, 4])
-7
>>> (1 - 2) * (3 + 4)
-7
However it involves copying the whole list and is also (therefore?) slow. Is there a better/standard way to do this?
(EDIT:) Some Timings:
from operator import add
from functools import reduce
from itertools import repeat
from random import random
r = 100000
xs = [random() for _ in range(r)]
# slightly trivial choices of pairs and functions, to replicate reduce
ps = [0]*(r-1)
fs = repeat(add)
foo = ordered_reduce(None, ps, fs, precompute=True)
>>> %timeit reduce(add, xs)
100 loops, best of 3: 3.59 ms per loop
>>> %timeit foo(xs)
1 loop, best of 3: 1.44 s per loop
This is kind of a worst-case scenario, and slightly cheating, as reduce does not take an iterable of functions; but a function which does (though without the ordering) is still pretty fast:
def multi_reduce(fs, xs):
    xs = iter(xs)
    x = next(xs)
    for f, nx in zip(fs, xs):
        x = f(x, nx)
    return x
>>> %timeit multi_reduce(fs, xs)
100 loops, best of 3: 8.71 ms per loop
(EDIT2): and for fun, the performance of a massively cheating 'compiled' version, which gives some idea of the total overhead occurring.
from numba import jit
@jit(nopython=True)
def numba_sum(xs):
    y = 0
    for x in xs:
        y += x
    return y
>>> %timeit numba_sum(xs)
1000 loops, best of 3: 1.46 ms per loop
When I read this problem, I immediately thought of reverse Polish notation (RPN). While it may not be the best approach, it still gives a substantial speedup in this case.
My second thought is that you may get an equivalent result if you just reorder the sequence xs appropriately to get rid of del y[p]. (Arguably the best performance would be achieved if the whole reduce procedure is written in C. But it's a different kettle of fish.)
Reverse Polish Notation
If you are not familiar with RPN, please read the short explanation in the wikipedia article. Basically, all operations can be written down without parentheses, for example (1-2)*(3+4) is 1 2 - 3 4 + * in RPN, while 1-(2*(3+4)) becomes 1 2 3 4 + * -.
Here is a simple implementation of an RPN parser. I separated the list of objects from the RPN sequence, so that the same sequence can be used directly for different lists.
def rpn(arr, seq):
    '''
    Reverse Polish Notation algorithm
    (this version works only for binary operators)

    arr: array of objects
    seq: rpn sequence containing indices of objects from arr and functions
    '''
    stack = []
    for x in seq:
        if isinstance(x, int):
            # it's an object: push it to the stack
            stack.append(arr[x])
        else:
            # it's a function: pop two objects, apply the function, push the result to the stack
            b = stack.pop()
            #a = stack.pop()
            #stack.append(x(a,b))
            ## shortcut:
            stack[-1] = x(stack[-1], b)
    return stack.pop()
Example of usage:
# Say we have an array
arr = [100, 210, 42, 13]
# and want to calculate
#     (100 - 210) * (42 + 13)
# It translates to RPN:
#     100 210 - 42 13 + *
# or
#     arr[0] arr[1] - arr[2] arr[3] + *
# So we apply:
rpn(arr, [0, 1, subtract, 2, 3, add, multiply])
To apply RPN to your case you'd need either to generate rpn sequences from scratch or to convert your (pair_indexes, binary_functions) into them. I haven't thought about a converter but it surely can be done.
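For illustration, here is a minimal converter sketch (not the answerer's code; the name to_rpn is made up here). It follows the same convention as ordered_reduce above, where step (p, f) combines the current elements at positions p and p+1:
def to_rpn(n, pair_indexes, binary_functions):
    '''Convert (pair_indexes, binary_functions) for a list of length n
    into an RPN sequence usable by rpn() above.'''
    # Each slot starts as the RPN sub-sequence producing one input element.
    slots = [[i] for i in range(n)]
    for p, f in zip(pair_indexes, binary_functions):
        # Combining slots p and p+1 mirrors ordered_reduce's del-and-replace step.
        slots[p] = slots[p] + slots[p + 1] + [f]
        del slots[p + 1]
    return slots[0]

# e.g. with the question's example:
# foos = (lambda a, b: a - b, lambda a, b: a + b, lambda a, b: a * b)
# rpn([1, 2, 3, 4], to_rpn(4, (2, 1, 0), foos))  ->  1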
Tests
Your original test comes first:
r = 100000
xs = [random() for _ in range(r)]
ps = [0]*(r-1)
fs = repeat(add)
foo = ordered_reduce(None, ps, fs, precompute=True)
rpn_seq = [0] + [x for i, f in zip(range(1,r), repeat(add)) for x in (i,f)]
rpn_seq2 = list(range(r)) + list(repeat(add,r-1))
# Here rpn_seq denotes (_ + (_ + (_ +( ... )...))))
# and rpn_seq2 denotes ((...( ... _)+ _) + _).
# Obviously, they are not equivalent but with 'add' they yield the same result.
%timeit reduce(add, xs)
100 loops, best of 3: 7.37 ms per loop
%timeit foo(xs)
1 loops, best of 3: 1.71 s per loop
%timeit rpn(xs, rpn_seq)
10 loops, best of 3: 79.5 ms per loop
%timeit rpn(xs, rpn_seq2)
10 loops, best of 3: 73 ms per loop
# Pure numpy just out of curiosity:
%timeit np.sum(np.asarray(xs))
100 loops, best of 3: 3.84 ms per loop
xs_np = np.asarray(xs)
%timeit np.sum(xs_np)
The slowest run took 4.52 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 48.5 µs per loop
So, rpn was 10 times slower than reduce but about 20 times faster than ordered_reduce.
Now, let's try something more complicated: alternately adding and multiplying matrices. I need a special function for it to test against reduce.
add_or_dot_b = 1
def add_or_dot(x, y):
    '''calls 'add' and 'np.dot' alternately'''
    global add_or_dot_b
    if add_or_dot_b:
        out = x + y
    else:
        out = np.dot(x, y)
    add_or_dot_b = 1 - add_or_dot_b
    # normalizing out to avoid `inf` in results
    return out / np.max(out)

r = 100001  # +1 for convenience
            # (we apply an even number of functions)
xs = [np.random.rand(2,2) for _ in range(r)]
ps = [0]*(r-1)
fs = repeat(add_or_dot)
foo = ordered_reduce(None, ps, fs, precompute=True)
rpn_seq = [0] + [x for i, f in zip(range(1,r), repeat(add_or_dot)) for x in (i,f)]
%timeit reduce(add_or_dot, xs)
1 loops, best of 3: 894 ms per loop
%timeit foo(xs)
1 loops, best of 3: 2.72 s per loop
%timeit rpn(xs, rpn_seq)
1 loops, best of 3: 1.17 s per loop
Here, rpn was roughly 25% slower than reduce and more than 2 times faster than ordered_reduce.

Most pythonic way to interleave two strings

What's the most pythonic way to mesh two strings together?
For example:
Input:
u = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
l = 'abcdefghijklmnopqrstuvwxyz'
Output:
'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
For me, the most pythonic* way is the following which pretty much does the same thing but uses the + operator for concatenating the individual characters in each string:
res = "".join(i + j for i, j in zip(u, l))
print(res)
# 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
It is also faster than using two join() calls:
In [5]: l1 = 'A' * 1000000; l2 = 'a' * 1000000
In [6]: %timeit "".join("".join(item) for item in zip(l1, l2))
1 loops, best of 3: 442 ms per loop
In [7]: %timeit "".join(i + j for i, j in zip(l1, l2))
1 loops, best of 3: 360 ms per loop
Faster approaches exist, but they often obfuscate the code.
Note: If the two input strings are not the same length then the longer one will be truncated as zip stops iterating at the end of the shorter string. In this case instead of zip one should use zip_longest (izip_longest in Python 2) from the itertools module to ensure that both strings are fully exhausted.
*To take a quote from the Zen of Python: Readability counts.
Pythonic = readability for me; i + j is just visually parsed more easily, at least for my eyes.
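As a concrete sketch of that zip_longest variant (my example, assuming the Python 3 names; on Python 2 the import is izip_longest):
from itertools import zip_longest  # izip_longest on Python 2

u = 'ABCDE'
l = 'abc'
res = "".join(i + j for i, j in zip_longest(u, l, fillvalue=''))
print(res)  # 'AaBbCcDE'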
Faster Alternative
Another way:
res = [''] * len(u) * 2
res[::2] = u
res[1::2] = l
print(''.join(res))
Output:
'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
Speed
Looks like it is faster:
%%timeit
res = [''] * len(u) * 2
res[::2] = u
res[1::2] = l
''.join(res)
100000 loops, best of 3: 4.75 µs per loop
than the fastest solution so far:
%timeit "".join(list(chain.from_iterable(zip(u, l))))
100000 loops, best of 3: 6.52 µs per loop
Also for the larger strings:
l1 = 'A' * 1000000; l2 = 'a' * 1000000
%timeit "".join(list(chain.from_iterable(zip(l1, l2))))
1 loops, best of 3: 151 ms per loop
%%timeit
res = [''] * len(l1) * 2
res[::2] = l1
res[1::2] = l2
''.join(res)
10 loops, best of 3: 92 ms per loop
Python 3.5.1.
Variation for strings with different lengths
u = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
l = 'abcdefghijkl'
Shorter one determines length (zip() equivalent)
min_len = min(len(u), len(l))
res = [''] * min_len * 2
res[::2] = u[:min_len]
res[1::2] = l[:min_len]
print(''.join(res))
Output:
AaBbCcDdEeFfGgHhIiJjKkLl
Longer one determines length (itertools.zip_longest(fillvalue='') equivalent)
min_len = min(len(u), len(l))
res = [''] * min_len * 2
res[::2] = u[:min_len]
res[1::2] = l[:min_len]
res += u[min_len:] + l[min_len:]
print(''.join(res))
Output:
AaBbCcDdEeFfGgHhIiJjKkLlMNOPQRSTUVWXYZ
With join() and zip().
>>> ''.join(''.join(item) for item in zip(u,l))
'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
On Python 2, by far the fastest way to do things, at ~3x the speed of list slicing for small strings and ~30x for long ones, is
res = bytearray(len(u) * 2)
res[::2] = u
res[1::2] = l
str(res)
This wouldn't work on Python 3, though. You could implement something like
res = bytearray(len(u) * 2)
res[::2] = u.encode("ascii")
res[1::2] = l.encode("ascii")
res.decode("ascii")
but by then you've already lost the gains over list slicing for small strings (it's still 20x the speed for long strings) and this doesn't even work for non-ASCII characters yet.
FWIW, if you are doing this on massive strings and need every cycle, and for some reason have to use Python strings... here's how to do it:
res = bytearray(len(u) * 4 * 2)
u_utf32 = u.encode("utf_32_be")
res[0::8] = u_utf32[0::4]
res[1::8] = u_utf32[1::4]
res[2::8] = u_utf32[2::4]
res[3::8] = u_utf32[3::4]
l_utf32 = l.encode("utf_32_be")
res[4::8] = l_utf32[0::4]
res[5::8] = l_utf32[1::4]
res[6::8] = l_utf32[2::4]
res[7::8] = l_utf32[3::4]
res.decode("utf_32_be")
Special-casing the common case of smaller types will help too. FWIW, this is only 3x the speed of list slicing for long strings and a factor of 4 to 5 slower for small strings.
Either way I prefer the join solutions, but since timings were mentioned elsewhere I thought I might as well join in.
If you want the fastest way, you can combine itertools with operator.add:
In [36]: from operator import add
In [37]: from itertools import starmap, izip
In [38]: timeit "".join([i + j for i, j in uzip(l1, l2)])
1 loops, best of 3: 142 ms per loop
In [39]: timeit "".join(starmap(add, izip(l1,l2)))
1 loops, best of 3: 117 ms per loop
In [40]: timeit "".join(["".join(item) for item in zip(l1, l2)])
1 loops, best of 3: 196 ms per loop
In [41]: "".join(starmap(add, izip(l1,l2))) == "".join([i + j for i, j in izip(l1, l2)]) == "".join(["".join(item) for item in izip(l1, l2)])
Out[42]: True
But combining izip and chain.from_iterable is faster again
In [2]: from itertools import chain, izip
In [3]: timeit "".join(chain.from_iterable(izip(l1, l2)))
10 loops, best of 3: 98.7 ms per loop
There is also a substantial difference between chain(*...) and chain.from_iterable(...):
In [5]: timeit "".join(chain(*izip(l1, l2)))
1 loops, best of 3: 212 ms per loop
There is no special handling of generators in join; passing one is always going to be slower, as Python will first build a list from its contents. That is because join does two passes over the data, one to figure out the size needed and one to actually do the join, which would not be possible with a generator:
join.h:
/* Here is the general case. Do a pre-pass to figure out the total
* amount of space we'll need (sz), and see whether all arguments are
* bytes-like.
*/
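To see this effect yourself, here is a small sketch (mine, not from the answer; no numbers claimed, run it on your own build):
import timeit

setup = "l1 = 'a' * 100000; l2 = 'b' * 100000"
# list comprehension: join receives a real list and can pre-compute the output size
print(timeit.timeit("''.join([i + j for i, j in zip(l1, l2)])", setup=setup, number=100))
# generator expression: join must first materialize it into a list internally
print(timeit.timeit("''.join(i + j for i, j in zip(l1, l2))", setup=setup, number=100))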
Also, if you have strings of different lengths and you don't want to lose data, you can use izip_longest:
In [22]: from itertools import izip_longest
In [23]: a,b = "hlo","elworld"
In [24]: "".join(chain.from_iterable(izip_longest(a, b,fillvalue="")))
Out[24]: 'helloworld'
For Python 3 it is called zip_longest.
But for Python 2, Veedrac's suggestion is by far the fastest:
In [18]: %%timeit
res = bytearray(len(u) * 2)
res[::2] = u
res[1::2] = l
str(res)
....:
100 loops, best of 3: 2.68 ms per loop
You could also do this using map and operator.add:
from operator import add
u = 'AAAAA'
l = 'aaaaa'
s = "".join(map(add, u, l))
Output:
'AaAaAaAaAa'
What map does is take each element from the first iterable u and the corresponding element from the second iterable l and apply the function supplied as the first argument, add. Then join just joins the results.
Jim's answer is great, but here's my favorite option, if you don't mind a couple of imports:
from functools import reduce
from operator import add
reduce(add, map(add, u, l))
A lot of these suggestions assume the strings are of equal length. Maybe that covers all reasonable use cases, but at least to me it seems that you might want to accommodate strings of differing lengths too. Or am I the only one thinking the mesh should work a bit like this:
u = "foobar"
l = "baz"
mesh(u,l) = "fboaozbar"
One way to do this would be the following:
def mesh(a, b):
    minlen = min(len(a), len(b))
    return "".join(["".join(x + y for x, y in zip(a, b)), a[minlen:], b[minlen:]])
I like using two fors; the variable names can give a hint/reminder of what is going on:
"".join(char for pair in zip(u,l) for char in pair)
Just to add another, more basic approach:
st = ""
for char in u:
st = "{0}{1}{2}".format( st, char, l[ u.index( char ) ] )
Feels a bit un-pythonic not to consider the double-comprehension answer here, to handle n strings with O(1) extra effort:
"".join(c for cs in itertools.zip_longest(*all_strings) for c in cs)
where all_strings is a list of the strings you want to interleave. In your case, all_strings = [u, l]. A full use example would look like this:
import itertools
a = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
b = 'abcdefghijklmnopqrstuvwxyz'
all_strings = [a,b]
interleaved = "".join(c for cs in itertools.zip_longest(*all_strings) for c in cs)
print(interleaved)
# 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
Like many answers, fastest? Probably not, but simple and flexible. Also, without too much added complexity, this is slightly faster than the accepted answer (in general, string addition is a bit slow in python):
In [7]: l1 = 'A' * 1000000; l2 = 'a' * 1000000;
In [8]: %timeit "".join(a + b for i, j in zip(l1, l2))
1 loops, best of 3: 227 ms per loop
In [9]: %timeit "".join(c for cs in zip(*(l1, l2)) for c in cs)
1 loops, best of 3: 198 ms per loop
Potentially faster and shorter than the current leading solution:
from itertools import chain
u = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
l = 'abcdefghijklmnopqrstuvwxyz'
res = "".join(chain(*zip(u, l)))
Strategy speed-wise is to do as much at the C-level as possible. Same zip_longest() fix for uneven strings and it would be coming out of the same module as chain() so can't ding me too many points there!
Other solutions I came up with along the way:
res = "".join(u[x] + l[x] for x in range(len(u)))
res = "".join(k + l[i] for i, k in enumerate(u))
You could use iteration_utilities.roundrobin1
u = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
l = 'abcdefghijklmnopqrstuvwxyz'
from iteration_utilities import roundrobin
''.join(roundrobin(u, l))
# returns 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
or the ManyIterables class from the same package:
from iteration_utilities import ManyIterables
ManyIterables(u, l).roundrobin().as_string()
# returns 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
1 This is from a third-party library I have written: iteration_utilities.
I would use zip() to get a readable and easy way:
result = ''
for cha, chb in zip(u, l):
result += '%s%s' % (cha, chb)
print result
# 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'

Performance of len(List) vs reading a variable

A similar question, Cost of len() function, has already been asked here. However, that question looks at the cost of len itself.
Suppose I have code that calls len(List) many times; every call is O(1), reading a variable is also O(1), and assigning it is also O(1).
As a side note, I find that n_files = len(Files) is somewhat more readable than repeated len(Files) in my code. So that is already an incentive for me to do this.
You could also argue against me that somewhere in the code Files could be modified, so that n_files is no longer correct, but that is not the case.
My question is:
Is there a number of calls to len(Files) after which accessing n_files
will be faster?
A few results (time, in seconds, for one million calls), with a ten-element list, using Python 2.7.10 on Windows 7; store is whether we store the length or keep calling len, and alias is whether or not we create a local alias for len:
Store Alias n= 1 10 100
Yes Yes 0.862 1.379 6.669
Yes No 0.792 1.337 6.543
No Yes 0.914 1.924 11.616
No No 0.879 1.987 12.617
and a thousand-element list:
Store Alias n= 1 10 100
Yes Yes 0.877 1.369 6.661
Yes No 0.785 1.299 6.808
No Yes 0.926 1.886 11.720
No No 0.891 1.948 12.843
Conclusions:
Storing the result is more efficient than calling len repeatedly, even for n == 1;
Creating a local alias for len can make a small improvement for larger n where we aren't storing the result, but not as much as just storing the result would; and
The influence of the length of the list is negligible, suggesting that whether or not the integers are interned isn't making any difference.
Test script:
def test(n, l, store, alias):
    if alias:
        len_ = len
        len_l = len_(l)
    else:
        len_l = len(l)
    for _ in range(n):
        if store:
            _ = len_l
        elif alias:
            _ = len_(l)
        else:
            _ = len(l)

if __name__ == '__main__':
    from itertools import product
    from timeit import timeit
    setup = 'from __main__ import test, l'
    for n, l, store, alias in product(
        (1, 10, 100),
        ([None]*10,),
        (True, False),
        (True, False),
    ):
        test_case = 'test({!r}, l, {!r}, {!r})'.format(n, store, alias)
        print test_case, len(l),
        print timeit(test_case, setup=setup)
Function calls in Python are costly, so if you are 100% sure that the length of Files will not change while you are using it, you can store it in a variable, especially if that is also more readable for you.
An example performance test of calling len(list) versus reading the length from a variable gives the following result:
In [36]: l = list(range(100000))
In [37]: n_l = len(l)
In [40]: %timeit newn = len(l)
10000000 loops, best of 3: 92.8 ns per loop
In [41]: %timeit new_n = n_l
10000000 loops, best of 3: 33.1 ns per loop
Accessing the variable is always faster than calling len().
Using l = len(li) is faster:
python -m timeit -s "li = [1, 2, 3]" "len(li)"
1000000 loops, best of 3: 0.239 usec per loop
python -m timeit -s "li = [1, 2, 3]; l = len(li)" "l"
10000000 loops, best of 3: 0.0949 usec per loop
Using len(Files) instead of n_files is likely to be slower. Yes, you have to look up n_files, but in the former case you have to look up both len and Files and then, on top of that, call a function that "calculates" the length of Files.
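A sketch of what the interpreter has to do in each case (not part of the answer; opcode names are from CPython 2.7):
import dis

def use_len(files):
    return len(files)   # LOAD_GLOBAL len, LOAD_FAST files, CALL_FUNCTION

def use_var(n_files):
    return n_files      # a single LOAD_FAST

dis.dis(use_len)
dis.dis(use_var)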
