What's the most pythonic way to mesh two strings together?
For example:
Input:
u = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
l = 'abcdefghijklmnopqrstuvwxyz'
Output:
'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
For me, the most pythonic* way is the following, which does pretty much the same thing as the two-join() version below but uses the + operator to concatenate the individual characters in each string:
res = "".join(i + j for i, j in zip(u, l))
print(res)
# 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
It is also faster than using two join() calls:
In [5]: l1 = 'A' * 1000000; l2 = 'a' * 1000000
In [6]: %timeit "".join("".join(item) for item in zip(l1, l2))
1 loops, best of 3: 442 ms per loop
In [7]: %timeit "".join(i + j for i, j in zip(l1, l2))
1 loops, best of 3: 360 ms per loop
Faster approaches exist, but they often obfuscate the code.
Note: If the two input strings are not the same length then the longer one will be truncated as zip stops iterating at the end of the shorter string. In this case instead of zip one should use zip_longest (izip_longest in Python 2) from the itertools module to ensure that both strings are fully exhausted.
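For instance, a minimal zip_longest sketch (the example strings here are mine), padding the shorter string with '':
from itertools import zip_longest  # Python 2: from itertools import izip_longest

u = 'ABCDE'
l = 'abc'
res = "".join(i + j for i, j in zip_longest(u, l, fillvalue=''))
print(res)
# 'AaBbCcDE'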
*To take a quote from the Zen of Python: Readability counts.
Pythonic = readability for me; i + j is just visually parsed more easily, at least for my eyes.
Faster Alternative
Another way:
res = [''] * len(u) * 2
res[::2] = u
res[1::2] = l
print(''.join(res))
Output:
'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
Speed
Looks like it is faster:
%%timeit
res = [''] * len(u) * 2
res[::2] = u
res[1::2] = l
''.join(res)
100000 loops, best of 3: 4.75 µs per loop
than the fastest solution so far:
%timeit "".join(list(chain.from_iterable(zip(u, l))))
100000 loops, best of 3: 6.52 µs per loop
Also for the larger strings:
l1 = 'A' * 1000000; l2 = 'a' * 1000000
%timeit "".join(list(chain.from_iterable(zip(l1, l2))))
1 loops, best of 3: 151 ms per loop
%%timeit
res = [''] * len(l1) * 2
res[::2] = l1
res[1::2] = l2
''.join(res)
10 loops, best of 3: 92 ms per loop
Python 3.5.1.
Variation for strings with different lengths
u = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
l = 'abcdefghijkl'
Shorter one determines length (zip() equivalent)
min_len = min(len(u), len(l))
res = [''] * min_len * 2
res[::2] = u[:min_len]
res[1::2] = l[:min_len]
print(''.join(res))
Output:
AaBbCcDdEeFfGgHhIiJjKkLl
Longer one determines length (itertools.zip_longest(fillvalue='') equivalent)
min_len = min(len(u), len(l))
res = [''] * min_len * 2
res[::2] = u[:min_len]
res[1::2] = l[:min_len]
res += u[min_len:] + l[min_len:]
print(''.join(res))
Output:
AaBbCcDdEeFfGgHhIiJjKkLlMNOPQRSTUVWXYZ
With join() and zip().
>>> ''.join(''.join(item) for item in zip(u,l))
'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
On Python 2, by far the faster way to do things, at ~3x the speed of list slicing for small strings and ~30x for long ones, is
res = bytearray(len(u) * 2)
res[::2] = u
res[1::2] = l
str(res)
This wouldn't work on Python 3, though. You could implement something like
res = bytearray(len(u) * 2)
res[::2] = u.encode("ascii")
res[1::2] = l.encode("ascii")
res.decode("ascii")
but by then you've already lost the gains over list slicing for small strings (it's still 20x the speed for long strings) and this doesn't even work for non-ASCII characters yet.
FWIW, if you are doing this on massive strings and need every cycle, and for some reason have to use Python strings... here's how to do it:
res = bytearray(len(u) * 4 * 2)
u_utf32 = u.encode("utf_32_be")
res[0::8] = u_utf32[0::4]
res[1::8] = u_utf32[1::4]
res[2::8] = u_utf32[2::4]
res[3::8] = u_utf32[3::4]
l_utf32 = l.encode("utf_32_be")
res[4::8] = l_utf32[0::4]
res[5::8] = l_utf32[1::4]
res[6::8] = l_utf32[2::4]
res[7::8] = l_utf32[3::4]
res.decode("utf_32_be")
Special-casing the common case of smaller types will help too. FWIW, this is only 3x the speed of list slicing for long strings and a factor of 4 to 5 slower for small strings.
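For convenience, here is a hedged wrapper around the UTF-32 trick above (the function name is my own), with the original alphabet example as a sanity check:
def interleave_utf32(u, l):
    # assumes len(u) == len(l); works for arbitrary Unicode characters
    res = bytearray(len(u) * 4 * 2)
    u_utf32 = u.encode("utf_32_be")
    l_utf32 = l.encode("utf_32_be")
    for k in range(4):
        res[k::8] = u_utf32[k::4]      # byte plane k of u -> even character slots
        res[4 + k::8] = l_utf32[k::4]  # byte plane k of l -> odd character slots
    return res.decode("utf_32_be")

print(interleave_utf32('ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'))
# 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'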
Either way I prefer the join solutions, but since timings were mentioned elsewhere I thought I might as well join in.
If you want the fastest way, you can combine itertools with operator.add:
In [36]: from operator import add
In [37]: from itertools import starmap, izip
In [38]: timeit "".join([i + j for i, j in izip(l1, l2)])
1 loops, best of 3: 142 ms per loop
In [39]: timeit "".join(starmap(add, izip(l1,l2)))
1 loops, best of 3: 117 ms per loop
In [40]: timeit "".join(["".join(item) for item in zip(l1, l2)])
1 loops, best of 3: 196 ms per loop
In [41]: "".join(starmap(add, izip(l1,l2))) == "".join([i + j for i, j in izip(l1, l2)]) == "".join(["".join(item) for item in izip(l1, l2)])
Out[41]: True
But combining izip and chain.from_iterable is faster again
In [2]: from itertools import chain, izip
In [3]: timeit "".join(chain.from_iterable(izip(l1, l2)))
10 loops, best of 3: 98.7 ms per loop
There is also a substantial difference between chain(*...) and chain.from_iterable(...):
In [5]: timeit "".join(chain(*izip(l1, l2)))
1 loops, best of 3: 212 ms per loop
There is no such thing as lazily consuming a generator with join: passing one is always going to be slower, because Python first builds a list from its contents. join does two passes over the data, one to figure out the total size needed and one to do the actual join, which would not be possible with a generator:
join.h:
/* Here is the general case. Do a pre-pass to figure out the total
* amount of space we'll need (sz), and see whether all arguments are
* bytes-like.
*/
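If you want to reproduce that difference locally, here is a minimal, hypothetical micro-benchmark sketch (the string sizes and repeat counts are my own choice) comparing a materialised list against a bare generator passed to join:
from timeit import timeit

setup = "from itertools import chain; l1 = 'A' * 100000; l2 = 'a' * 100000"
print("list     :", timeit('"".join(list(chain.from_iterable(zip(l1, l2))))', setup=setup, number=100))
print("generator:", timeit('"".join(chain.from_iterable(zip(l1, l2)))', setup=setup, number=100))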
Also, if you have strings of different lengths and you don't want to lose data, you can use izip_longest:
In [22]: from itertools import izip_longest
In [23]: a,b = "hlo","elworld"
In [24]: "".join(chain.from_iterable(izip_longest(a, b,fillvalue="")))
Out[24]: 'helloworld'
In Python 3 it is called zip_longest.
But for Python 2, Veedrac's suggestion is by far the fastest:
In [18]: %%timeit
res = bytearray(len(u) * 2)
res[::2] = u
res[1::2] = l
str(res)
....:
100 loops, best of 3: 2.68 ms per loop
You could also do this using map and operator.add:
from operator import add
u = 'AAAAA'
l = 'aaaaa'
s = "".join(map(add, u, l))
Output:
'AaAaAaAaAa'
What map does is take the elements of the first iterable u and the corresponding elements of the second iterable l and apply the function supplied as the first argument, add, to each pair. Then join simply joins the results.
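Applied to the strings from the question, as a quick sanity check, the same one-liner produces the interleaved alphabet:
from operator import add

u = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
l = 'abcdefghijklmnopqrstuvwxyz'
print("".join(map(add, u, l)))
# 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'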
Jim's answer is great, but here's my favorite option, if you don't mind a couple of imports:
from functools import reduce
from operator import add
reduce(add, map(add, u, l))
A lot of these suggestions assume the strings are of equal length. Maybe that covers all reasonable use cases, but at least to me it seems that you might want to accommodate strings of differing lengths too. Or am I the only one thinking the mesh should work a bit like this:
u = "foobar"
l = "baz"
mesh(u,l) = "fboaozbar"
One way to do this would be the following:
def mesh(a,b):
minlen = min(len(a),len(b))
return "".join(["".join(x+y for x,y in zip(a,b)),a[minlen:],b[minlen:]])
I like using two fors, the variable names can give a hint/reminder to what is going on:
"".join(char for pair in zip(u,l) for char in pair)
Just to add another, more basic approach:
st = ""
for char in u:
    # note: u.index(char) returns the first occurrence, so this assumes
    # the characters of u are unique (as in the alphabet example)
    st = "{0}{1}{2}".format(st, char, l[u.index(char)])
It feels a bit un-pythonic not to consider the double-list-comprehension answer here, which handles n strings with O(1) extra effort:
"".join(c for cs in itertools.zip_longest(*all_strings) for c in cs)
where all_strings is a list of the strings you want to interleave. In your case, all_strings = [u, l]. A full use example would look like this:
import itertools
a = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
b = 'abcdefghijklmnopqrstuvwxyz'
all_strings = [a,b]
interleaved = "".join(c for cs in itertools.zip_longest(*all_strings) for c in cs)
print(interleaved)
# 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
Like many answers here, is it the fastest? Probably not, but it is simple and flexible. Also, without much added complexity, it is slightly faster than the accepted answer (in general, string addition is a bit slow in Python):
In [7]: l1 = 'A' * 1000000; l2 = 'a' * 1000000;
In [8]: %timeit "".join(i + j for i, j in zip(l1, l2))
1 loops, best of 3: 227 ms per loop
In [9]: %timeit "".join(c for cs in zip(*(l1, l2)) for c in cs)
1 loops, best of 3: 198 ms per loop
Potentially faster and shorter than the current leading solution:
from itertools import chain
u = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
l = 'abcdefghijklmnopqrstuvwxyz'
res = "".join(chain(*zip(u, l)))
The strategy, speed-wise, is to do as much as possible at the C level. The same zip_longest() fix applies for uneven strings, and it comes out of the same module as chain(), so you can't ding me too many points there!
Other solutions I came up with along the way:
res = "".join(u[x] + l[x] for x in range(len(u)))
res = "".join(k + l[i] for i, k in enumerate(u))
You could use iteration_utilities.roundrobin1
u = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
l = 'abcdefghijklmnopqrstuvwxyz'
from iteration_utilities import roundrobin
''.join(roundrobin(u, l))
# returns 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
or the ManyIterables class from the same package:
from iteration_utilities import ManyIterables
ManyIterables(u, l).roundrobin().as_string()
# returns 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
1 This is from a third-party library I have written: iteration_utilities.
I would use zip() for a readable and easy way:
result = ''
for cha, chb in zip(u, l):
result += '%s%s' % (cha, chb)
print result
# 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
Related
I am writing Python 2.7.15 code to access chars inside a word. How can I optimize this process, in order to also check whether every word is contained inside an external list?
I have tried two versions of Python 2 code: version (1) is an extended version of what my code has to do, whereas in version (2) I tried a compact version of the same code.
chars_array = ['a','b','c']
VERSION (1)
def version1(word):
chars =[x for x in word]
count = 0
for c in chars:
if not c in chars_array:
count+=1
return count
VERSION (2)
def version2(word):
return sum([1 for c in [x for x in word] if not c in chars_array])
I am analyzing a large corpus and for version1 I obtain an execution time of 8.56 sec, whereas for version2 it is 8.12 sec.
The fastest solution (can be up to 100x faster for an extremely long string):
joined = ''.join(chars_array)
def version3(word):
return len(word.translate(None, joined))
Another slower solution that is approximately the same speed as your code:
from itertools import ifilterfalse
def version4(word):
return sum(1 for _ in ifilterfalse(set(chars_array).__contains__, word))
Timings (s is a random string):
In [17]: %timeit version1(s)
1000 loops, best of 3: 79.9 µs per loop
In [18]: %timeit version2(s)
10000 loops, best of 3: 98.1 µs per loop
In [19]: %timeit version3(s)
100000 loops, best of 3: 4.12 µs per loop # <- fastest
In [20]: %timeit version4(s)
10000 loops, best of 3: 84.3 µs per loop
With chars_array = ['a', 'e', 'i', 'o', 'u', 'y'] and words equal to a list
of 56048 English words, I measured a number of variants with a command similar to the following at an IPython prompt:
%timeit n = [version1(word) for word in words]
In each case it reported "10 loops, best of 3", and I have shown the time per loop
in comments next to each function definition below:
# OP's originals:
def version1(word): # 163 ms
chars =[x for x in word]
count = 0
for c in chars:
if not c in chars_array:
count+=1
return count
def version2(word): # 173 ms
return sum([1 for c in [x for x in word] if not c in chars_array])
Now let's hit version1 and version2 with three optimizations:
remove the redundant list comprehension and iterate through word directly instead;
use the operator not in rather than negating the result of the in operator;
check for (non-)membership of a set rather than a list.
chars_set = set(chars_array)
def version1a(word): # 95.5 ms
count = 0
for c in word:
if c not in chars_set:
count+=1
return count
def version2a(word): # 104 ms
return sum([1 for c in word if c not in chars_set])
So there's actually an advantage for the multi-line code over the list comprehension. This may depend on word length, though: version2a has to allocate a new list the same length as the word, whereas version1a does not. Let's refine version2a further to give it that same advantage, by summing over a generator expression rather than a list comprehension:
def version2b(word): # 111 ms
return sum(1 for c in word if c not in chars_set)
To my surprise that was actually slightly counterproductive—but again, that effect may depend on word length.
Finally let's experience the power of .translate():
chars_str = ''.join(chars_set)
def version3(word): # 40.7 ms
return len(word.translate(None, chars_str))
We have a clear winner.
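One caveat: str.translate(None, chars_str) is Python 2 only. A rough Python 3 equivalent (a sketch I have not timed here; the function name is mine) builds a deletion table with str.maketrans:
chars_str = ''.join(['a', 'e', 'i', 'o', 'u', 'y'])   # same chars_array as above
delete_table = str.maketrans('', '', chars_str)        # map each char to "delete"

def version3_py3(word):  # hypothetical name for the Python 3 variant
    return len(word.translate(delete_table))

print(version3_py3('optimization'))  # -> 6 (p, t, m, z, t, n)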
I had a Python program which reads lines from files and puts them into a dict. Simplified, it looks like this:
data = {'file_name':''}
with open('file_name') as in_fd:
for line in in_fd:
data['file_name'] += line
I found it took hours to finish.
Then I made a small change to the program:
data = {'file_name':[]}
with open('file_name') as in_fd:
for line in in_fd:
data['file_name'].append(line)
data['file_name'] = ''.join(data['file_name'])
It finished in seconds.
I thought it was += that made the program slow, but it seems not to be. Please take a look at the results of the following test.
I knew we could use list append and join to improve performance when concatenating strings. But I never expected such a performance gap between append-and-join and add-and-assign.
So I decided to do some more tests, and finally found that it's the dict update operation that makes the program insanely slow. Here is a script:
import time
LOOPS = 10000
WORD = 'ABC'*100
s1=time.time()
buf1 = []
for i in xrange(LOOPS):
buf1.append(WORD)
ss = ''.join(buf1)
s2=time.time()
buf2 = ''
for i in xrange(LOOPS):
buf2 += WORD
s3=time.time()
buf3 = {'1':''}
for i in xrange(LOOPS):
buf3['1'] += WORD
s4=time.time()
buf4 = {'1':[]}
for i in xrange(LOOPS):
buf4['1'].append(WORD)
buf4['1'] = ''.join(buf4['1'])
s5=time.time()
print s2-s1, s3-s2, s4-s3, s5-s4
On my laptop (mid-2013 Mac Pro, OS X 10.9.5, CPython 2.7.10), its output is:
0.00299620628357 0.00415587425232 3.49465799332 0.00231599807739
Inspired by juanpa.arrivillaga's comments, I made a small change to the second loop:
trivial_reference = []
buf2 = ''
for i in xrange(LOOPS):
buf2 += WORD
trivial_reference.append(buf2) # add a trivial reference to avoid optimization
After the change, the second loop now takes 19 seconds to complete. So it seems to be just an optimization issue, as juanpa.arrivillaga said.
+= performs really badly when building large strings, but can be efficient in one case in CPython (mentioned below).
For reliably fast string concatenation, use str.join().
From String Concatenation section under Python Performance Tips:
Avoid this:
s = ""
for substring in list:
s += substring
Use s = "".join(list) instead. The former is a very common and catastrophic mistake when building large strings.
Why is s += x faster than s['1'] += x or s[0] += x?
From Note 6:
CPython implementation detail: If s and t are both strings, some
Python implementations such as CPython can usually perform an in-place
optimization for assignments of the form s = s + t or s += t. When
applicable, this optimization makes quadratic run-time much less
likely. This optimization is both version and implementation
dependent. For performance sensitive code, it is preferable to use the
str.join() method which assures consistent linear concatenation
performance across versions and implementations.
The optimization in CPython is that if a string has only one reference, it can be resized in place.
/* Note that we don't have to modify *unicode for unshared Unicode
objects, since we can modify them in-place. */
Now, the latter two are not simple in-place additions. In fact, they are not in-place additions at all.
s[0] += x
is equivalent to:
temp = s[0] # Extra reference. `S[0]` and `temp` both point to same string now.
temp += x
s[0] = temp
Example:
>>> lst = [1, 2, 3]
>>> def func():
... lst[0] = 90
... return 100
...
>>> lst[0] += func()
>>> print lst
[101, 2, 3] # Not [190, 2, 3]
But in general, never use s += x for concatenating strings; always use str.join on a collection of strings.
Timings
LOOPS = 1000
WORD = 'ABC'*100
def list_append():
buf1 = [WORD for _ in xrange(LOOPS)]
return ''.join(buf1)
def str_concat():
buf2 = ''
for i in xrange(LOOPS):
buf2 += WORD
def dict_val_concat():
buf3 = {'1': ''}
for i in xrange(LOOPS):
buf3['1'] += WORD
return buf3['1']
def list_val_concat():
buf4 = ['']
for i in xrange(LOOPS):
buf4[0] += WORD
return buf4[0]
def val_pop_concat():
buf5 = ['']
for i in xrange(LOOPS):
val = buf5.pop()
val += WORD
buf5.append(val)
return buf5[0]
def val_assign_concat():
buf6 = ['']
for i in xrange(LOOPS):
val = buf6[0]
val += WORD
buf6[0] = val
return buf6[0]
>>> %timeit list_append()
1000 loops, best of 3: 1.31 ms per loop
>>> %timeit str_concat()
100 loops, best of 3: 3.09 ms per loop
>>> %run so.py
>>> %timeit list_append()
10000 loops, best of 3: 71.2 us per loop
>>> %timeit str_concat()
1000 loops, best of 3: 276 us per loop
>>> %timeit dict_val_concat()
100 loops, best of 3: 9.66 ms per loop
>>> %timeit list_val_concat()
100 loops, best of 3: 9.64 ms per loop
>>> %timeit val_pop_concat()
1000 loops, best of 3: 556 us per loop
>>> %timeit val_assign_concat()
100 loops, best of 3: 9.31 ms per loop
val_pop_concat is fast here because by using pop() we drop the list's reference to that string, so CPython can resize it in place (guessed correctly by @niemmi in the comments).
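As a small, CPython-specific illustration of that reference-count condition (purely a sketch, the variable names are mine), sys.getrefcount shows the extra reference a container holds, which is what blocks the in-place resize:
import sys

word = 'ABC' * 100

s = word + '!'                  # a freshly built string with a single name
print(sys.getrefcount(s))       # typically 2: `s` plus getrefcount's own argument

buf = [s]                       # the list now holds a second reference
print(sys.getrefcount(buf[0]))  # typically 3: `s`, `buf[0]`, and the argument
# With more than one reference, CPython cannot realloc the string in place,
# so buf[0] += word has to copy the whole string on every iteration.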
Ordered list reduction
I need to reduce some lists where, depending on element types, the speed and implementation of the binary operation varies, i.e. large speedups can be gained by reducing some pairs with specific functions first.
For example foo(a[0], bar(a[1], a[2]))
might be a lot slower than bar(foo(a[0], a[1]), a[2]) but in this case give the same result.
I have the code that produces an optimal ordering in the form of a list of tuples (pair_index, binary_function) already. I am struggling to implement an efficient function to perform the reduction, ideally one that returns a new partial function which can then be used repeatedly on lists of the same type-ordering but varying values.
Simple and slow(?) solution
Here is my naive solution involving a for loop, deletion of elements and closure over the (pair_index, binary_function) list to return a 'precomputed' function.
def ordered_reduce(a, pair_indexes, binary_functions, precompute=False):
"""
a: list to reduce, length n
pair_indexes: order of pairs to reduce, length (n-1)
binary_functions: functions to use for each reduction, length (n-1)
"""
def ord_red_func(x):
y = list(x) # copy so as not to eat up
for p, f in zip(pair_indexes, binary_functions):
b = f(y[p], y[p+1])
# Replace pair
del y[p]
y[p] = b
return y[0]
return ord_red_func if precompute else ord_red_func(a)
>>> foos = (lambda a, b: a - b, lambda a, b: a + b, lambda a, b: a * b)
>>> ordered_reduce([1, 2, 3, 4], (2, 1, 0), foos)
1
>>> 1 * (2 + (3-4))
1
And here is how precomputation works:
>>> foo = ordered_reduce(None, (0, 1, 0), foos)
>>> foo([1, 2, 3, 4])
-7
>>> (1 - 2) * (3 + 4)
-7
However it involves copying the whole list and is also (therefore?) slow. Is there a better/standard way to do this?
(EDIT:) Some Timings:
from operator import add
from functools import reduce
from itertools import repeat
from random import random
r = 100000
xs = [random() for _ in range(r)]
# slightly trivial choices of pairs and functions, to replicate reduce
ps = [0]*(r-1)
fs = repeat(add)
foo = ordered_reduce(None, ps, fs, precompute=True)
>>> %timeit reduce(add, xs)
100 loops, best of 3: 3.59 ms per loop
>>> %timeit foo(xs)
1 loop, best of 3: 1.44 s per loop
This is kind of a worst-case scenario, and slightly cheating, since reduce does not take an iterable of functions. But a function which does (though without the ordering) is still pretty fast:
def multi_reduce(fs, xs):
xs = iter(xs)
x = next(xs)
for f, nx in zip(fs, xs):
x = f(x, nx)
return x
>>> %timeit multi_reduce(fs, xs)
100 loops, best of 3: 8.71 ms per loop
(EDIT2): and for fun, the performance of a massively cheating 'compiled' version, which gives some idea of the total overhead occurring.
from numba import jit
@jit(nopython=True)
def numba_sum(xs):
y = 0
for x in xs:
y += x
return y
>>> %timeit numba_sum(xs)
1000 loops, best of 3: 1.46 ms per loop
When I read this problem, I immediately thought of reverse Polish notation (RPN). While it may not be the best approach, it still gives a substantial speedup in this case.
My second thought is that you may get an equivalent result if you just reorder the sequence xs appropriately to get rid of del y[p]. (Arguably the best performance would be achieved if the whole reduce procedure is written in C. But it's a different kettle of fish.)
Reverse Polish Notation
If you are not familiar with RPN, please read the short explanation in the wikipedia article. Basically, all operations can be written down without parentheses, for example (1-2)*(3+4) is 1 2 - 3 4 + * in RPN, while 1-(2*(3+4)) becomes 1 2 3 4 + * -.
Here is a simple implementation of an RPN parser. I separated the list of objects from the RPN sequence, so that the same sequence can be used directly for different lists.
def rpn(arr, seq):
'''
Reverse Polish Notation algorithm
(this version works only for binary operators)
arr: array of objects
seq: rpn sequence containing indices of objects from arr and functions
'''
stack = []
for x in seq:
if isinstance(x, int):
# it's an object: push it to stack
stack.append(arr[x])
else:
# it's a function: pop two objects, apply the function, push the result to stack
b = stack.pop()
#a = stack.pop()
#stack.append(x(a,b))
## shortcut:
stack[-1] = x(stack[-1], b)
return stack.pop()
Example of usage:
# Say we have an array
arr = [100, 210, 42, 13]
# and want to calculate
#     (100 - 210) * (42 + 13)
# This translates to RPN:
#     100 210 - 42 13 + *
# or, in terms of indices into arr:
#     arr[0] arr[1] - arr[2] arr[3] + *
# So we apply (subtract, add and multiply being any binary callables,
# e.g. operator.sub, operator.add and operator.mul):
rpn(arr, [0, 1, subtract, 2, 3, add, multiply])
To apply RPN to your case you'd need either to generate rpn sequences from scratch or to convert your (pair_indexes, binary_functions) into them. I haven't thought about a converter but it surely can be done.
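For what it's worth, here is one hedged sketch of such a converter (the function name and the tuple-based tree representation are my own): it replays the (pair_index, binary_function) steps symbolically to build an expression tree, then emits the tree's postorder traversal, which is exactly the kind of sequence rpn() above expects.
def to_rpn(n, pair_indexes, binary_functions):
    # leaves are indices into arr, internal nodes are (function, left, right)
    nodes = list(range(n))
    for p, f in zip(pair_indexes, binary_functions):
        nodes[p:p + 2] = [(f, nodes[p], nodes[p + 1])]
    root = nodes[0]
    # iterative postorder traversal (avoids recursion limits on deep trees)
    seq, stack = [], [(root, False)]
    while stack:
        node, children_done = stack.pop()
        if isinstance(node, int):
            seq.append(node)
        elif children_done:
            seq.append(node[0])
        else:
            stack.append((node, True))
            stack.append((node[2], False))
            stack.append((node[1], False))
    return seq

# e.g. rpn([1, 2, 3, 4], to_rpn(4, (0, 1, 0), foos)) == -7, matching the
# ordered_reduce example above, i.e. (1 - 2) * (3 + 4). The evaluation order
# may differ from ordered_reduce, but the expression tree is the same.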
Tests
Your original test comes first:
r = 100000
xs = [random() for _ in range(r)]
ps = [0]*(r-1)
fs = repeat(add)
foo = ordered_reduce(None, ps, fs, precompute=True)
rpn_seq = [0] + [x for i, f in zip(range(1,r), repeat(add)) for x in (i,f)]
rpn_seq2 = list(range(r)) + list(repeat(add,r-1))
# Here rpn_seq denotes (_ + (_ + (_ +( ... )...))))
# and rpn_seq2 denotes ((...( ... _)+ _) + _).
# Obviously, they are not equivalent but with 'add' they yield the same result.
%timeit reduce(add, xs)
100 loops, best of 3: 7.37 ms per loop
%timeit foo(xs)
1 loops, best of 3: 1.71 s per loop
%timeit rpn(xs, rpn_seq)
10 loops, best of 3: 79.5 ms per loop
%timeit rpn(xs, rpn_seq2)
10 loops, best of 3: 73 ms per loop
# Pure numpy just out of curiosity:
%timeit np.sum(np.asarray(xs))
100 loops, best of 3: 3.84 ms per loop
xs_np = np.asarray(xs)
%timeit np.sum(xs_np)
The slowest run took 4.52 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 48.5 µs per loop
So, rpn was 10 times slower than reduce but about 20 times faster than ordered_reduce.
Now, let's try something more complicated: alternately adding and multiplying matrices. I need a special function for it to test against reduce.
add_or_dot_b = 1
def add_or_dot(x,y):
'''calls 'add' and 'np.dot' alternately'''
global add_or_dot_b
if add_or_dot_b:
out = x+y
else:
out = np.dot(x,y)
add_or_dot_b = 1 - add_or_dot_b
# normalizing out to avoid `inf` in results
return out/np.max(out)
r = 100001 # +1 for convenience
# (we apply an even number of functions)
xs = [np.random.rand(2,2) for _ in range(r)]
ps = [0]*(r-1)
fs = repeat(add_or_dot)
foo = ordered_reduce(None, ps, fs, precompute=True)
rpn_seq = [0] + [x for i, f in zip(range(1,r), repeat(add_or_dot)) for x in (i,f)]
%timeit reduce(add_or_dot, xs)
1 loops, best of 3: 894 ms per loop
%timeit foo(xs)
1 loops, best of 3: 2.72 s per loop
%timeit rpn(xs, rpn_seq)
1 loops, best of 3: 1.17 s per loop
Here, rpn was roughly 25% slower than reduce and more than 2 times faster than ordered_reduce.
I know Python isn't built for speed but I would like to improve the performance of the following code:
listA = [1,2]
listB = [1,2,3,4,5,6,7,8,9,10]
# pre-allocate for speed. Appending to an empty list is slower?
newList = ['NaN']*len(listB)
# Do I need a loop? Can I use something faster?
for n in xrange(len(listB)):
if listB[n] % 2 == 1:
newList[n] = listA[0]
else:
newList[n] = listA[1]
My issue is listB can get pretty large.
I have already pre-allocated memory for newList and used xrange. I believe these provide significant speed increases for large lists.
But do I even need a for loop at all since each loop is not dependent on the previous result. Does python have an array type?
Can I break up listB and run the operation in parallel similar to parfor in Matlab?
ADDITIONAL INFO:
For my problem, as listA gets bigger, listB gets exponentially bigger.
For each item in listB there needs to be a lookup in listA. Then a calculation is performed (not necessary modulo) and the result appended to newList. Then I do a statistical analysis on newList (say take an average for simplicity). newList will always be the same length as listB.
The shortest and, perhaps, fastest way would be using list comprehension:
newList = [listA[1 - x%2] for x in listB]
The purpose of xrange is not to gain speed; its purpose is to reduce memory usage. The difference between range(N) and xrange(N) is that the latter doesn't expand to a list of size N but creates a small lazy sequence object instead.
A few tips:
If your list is big, look into numpy. Numpy has efficient algorithms for array handling and uses native code internally.
Modulo is slow (if listB[n] % 2 == 1:). Better to use a bitwise operator (if listB[n] & 1) in this case.
The if statement can go away entirely: newList[n] = listA[1 - (listB[n] & 1)] for each value of n in the range. Invert the order of listA to get rid of the 1 - and save another integer op (a sketch follows below).
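A hedged sketch of that last tip (the variable name is mine): reversing listA lets the parity bit index it directly, with no subtraction and no if.
listA_by_parity = listA[::-1]                      # [2, 1]
newList = [listA_by_parity[n & 1] for n in listB]  # odd -> listA[0], even -> listA[1]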
Using list comprehension seems to cut some time:
from time import clock

listB = [i for i in xrange(1, 1000000)]
newList = ['NaN'] * len(listB)   # pre-allocated, as above
start = clock()
listA = [1, 2]
for n in xrange(len(listB)):
    if listB[n] % 2 == 1:
        newList[n] = listA[0]
    else:
        newList[n] = listA[1]
print "Time taken = %.5f" % (clock() - start)
>>> 0.21216
Compared to:
listB = [i for i in xrange(1,1000000)]
start = clock()
listA = [1,2]
newList = [listA[0] if i%2 == 1 else listA[1] for i in listB]
print "Time taken = %.5f" % (clock() - start)
>> 0.15658
First, replace the modulo operator, n % 2, with the bitwise and operator, n & 1. Next, instead of accessing listB by index, just iterate through its items directly using in. You can remove listA entirely. These small improvements should speed things up slightly.
newList = ((n & 1) + 1 for n in listB)
The real advantage of this code though, is that it is a generator comprehension, not a list comprehension. Although this doesn't make it any faster, it does make it much more memory efficient. That being said, it also has some disadvantages; you cannot access the entire list, and once you access a value it is gone. If you only intend on iterating through newList or performing some calculation on each item of newList this will be fine. If not, then make newList a list comprehension:
newList = [(n & 1) + 1 for n in listB]
Best of luck!
Just loop over listB and set two variables at the start instead of repeatedly indexing:
newList = []
i, j = listA[0], listA[1]
for n in listB:
if n % 2:
newList.append(i)
else:
newList.append(j)
Or use a list comp:
[i if n % 2 else j for n in listB]
Timings:
In [4]: %%timeit
newList = ['NaN']*len(listB)
for n in xrange(len(listB)):
if listB[n] % 2 == 1:
newList[n] = listA[0]
else:
newList[n] = listA[1]
...:
100000 loops, best of 3: 2.33 µs per loop
In [5]: %%timeit
...: i,j = listA[0], listA[1]
...: [i if n % 2 else j for n in listB]
...:
1000000 loops, best of 3: 1.12 µs per loop
In [16]: %%timeit
....: newList = []
....: i,j = listA[0], listA[1]
....: for n in listB:
....: if n % 2 == 1:
....: newList.append(i)
....: else:
....: newList.append(j)
....:
1000000 loops, best of 3: 1.88 µs per loop
In [18]: timeit [listA[1 - x%2] for x in listB]
1000000 loops, best of 3: 1.38 µs per loop
Using if n & 1 is slightly faster:
In [11]: %%timeit
i,j = listA[0], listA[1]
[i if n & 1 else j for n in listB]
....:
1000000 loops, best of 3: 1.04 µs per loop
So indexing always adds more overhead whether in a list comp or a loop. It is pointless continually indexing listA when you just want the two values.
If you want more speed compiling with cython and simply typing a couple of variables cuts down the runtime:
In [31]: %%cython
....: def faster(l1,l2):
....: cdef int i,j,n
....: i, j = l1[0], l1[1]
....: return [i if n & 1 else j for n in l2]
....:
In [32]:
In [32]: timeit faster(listA,listB)
1000000 loops, best of 3: 455 ns per loop
If you are doing a lot of numeric calculations you may want to look further into Cython and/or NumPy.
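As a hedged illustration of the NumPy route (the array names are mine, and I have not timed this against the Cython version), the whole mapping can be done in one vectorized pass with numpy.where:
import numpy as np

listA = [1, 2]
arrB = np.arange(1, 1000001)                          # stand-in for a large listB
newArr = np.where(arrB % 2 == 1, listA[0], listA[1])  # odd -> listA[0], even -> listA[1]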
I presently have this code for factoring large numbers:
def f1(n):
return [[i, n//i] for i in range(1 , int(n**0.5) + 1) if n % i == 0]
It's the fastest version I've seen so far (if there's a faster way I'd love to know about that as well), but I'd like a single flat list of all the factors with no nesting (so I want something like [factor 1, factor 2, factor 3, ..., factor n]). The order isn't really important.
As such I was wondering if there was a way to ascribe multiple assignments via a list comprehension.
i.e.
def f1(n):
return [i, n//i for i in range(1 , int(n**0.5) + 1) if n % i == 0]
That way I don't have a nested list. It would be faster and speed is of the essence.
I looked in the documentation and I couldn't find a single example of multiple assignments.
List comprehensions are great, but sometimes they're not the best solution, depending on requirements for readability and speed. Sometimes, just writing out the implied for loop (and if statement) is more readable and quicker.
def factors(n):
l = []
for i in range(1, int(n**0.5)+1):
if n % i == 0:
l.append(i)
l.append(n//i)
return l
For small numbers, the above function is quicker than the list comprehension. At larger numbers (1,000,000 and bigger), the function and list comprehension are equal in terms of speed.
For a slight speed increase you can also cache the append method of the list, though this makes the function slightly less readable.
def factors(n):
l = []
append = l.append
for i in range(1, int(n**0.5)+1):
if n % i == 0:
append(i)
append(n//i)
return l
Speed comparison:
In [86]: %timeit factors_list_comprehension(1000)
100000 loops, best of 3: 7.57 µs per loop
In [87]: %timeit factors_function(1000)
100000 loops, best of 3: 6.24 µs per loop
In [88]: %timeit factors_optimised_function(1000)
100000 loops, best of 3: 5.81 µs per loop
In [89]: %timeit factors_list_comprehension(1000000)
10000 loops, best of 3: 111 µs per loop
In [90]: %timeit factors_function(1000000)
10000 loops, best of 3: 108 µs per loop
In [91]: %timeit factors_optimised_function(1000000)
10000 loops, best of 3: 106 µs per loop
Use itertools.chain:
from itertools import chain
def f1(n):
return list(chain.from_iterable([i, n//i] for i in xrange(1 , int(n**0.5) + 1) if not n % i))
If you don't need a list remove the list call on chain and just iterate over the returned chain object.
If optimization is important you should use extend and xrange:
def f1(n):
l = []
for i in xrange(1, int(n**0.5)+1):
if not n % i:
l.extend((i,n//i))
return l
You can achieve the desired result using sum(). For example:
>>> sum([[1,6],[2,3]],[])
[1, 6, 2, 3]
We can define the answer in terms of your existing code:
def f2(n):
return sum(f1(n), [])
However, be careful that your code returns the square root twice when n is a perfect square:
>>> f1(9)
[[1, 9], [3, 3]]
>>> f2(9)
[1, 9, 3, 3]
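If that duplicate matters for your use case, one hedged fix (a sketch sticking with the loop form; the function name is mine) is to skip the repeated root when i equals n // i:
def f3(n):
    res = []
    for i in range(1, int(n**0.5) + 1):
        if n % i == 0:
            res.append(i)
            if i != n // i:  # avoid listing the square root twice
                res.append(n // i)
    return res

# f3(9) -> [1, 9, 3]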