I want to determine whether or not my list (actually a numpy.ndarray) contains duplicates in the fastest possible execution time. Note that I don't care about removing the duplicates, I simply want to know if there are any.
Note: I'd be extremely surprised if this is not a duplicate, but I've tried my best and can't find one. Closest are this question and this question, both of which are requesting that the unique list be returned.
Here are the four ways I thought of doing it.
TL;DR: if you expect very few (less than 1/1000) duplicates:
def contains_duplicates(X):
    return len(np.unique(X)) != len(X)
If you expect frequent (more than 1/1000) duplicates:
def contains_duplicates(X):
    seen = set()
    seen_add = seen.add
    for x in X:
        if (x in seen or seen_add(x)):
            return True
    return False
The first method is an early exit from this answer, which wants to return the unique values; the second is the same idea applied to this answer.
>>> import numpy as np
>>> X = np.random.normal(0,1,[10000])
>>> def terhorst_early_exit(X):
...:     elems = set()
...:     for i in X:
...:         if i in elems:
...:             return True
...:         elems.add(i)
...:     return False
>>> %timeit terhorst_early_exit(X)
100 loops, best of 3: 10.6 ms per loop
>>> def peterbe_early_exit(X):
...:     seen = set()
...:     seen_add = seen.add
...:     for x in X:
...:         if (x in seen or seen_add(x)):
...:             return True
...:     return False
>>> %timeit peterbe_early_exit(X)
100 loops, best of 3: 9.35 ms per loop
>>> %timeit len(set(X)) != len(X)
100 loops, best of 3: 4.54 ms per loop
>>> %timeit len(np.unique(X)) != len(X)
1000 loops, best of 3: 967 µs per loop
Do things change if you start with an ordinary Python list, and not a numpy.ndarray?
>>> X = X.tolist()
>>> %timeit terhorst_early_exit(X)
100 loops, best of 3: 9.34 ms per loop
>>> %timeit peterbe_early_exit(X)
100 loops, best of 3: 8.07 ms per loop
>>> %timeit len(set(X)) != len(X)
100 loops, best of 3: 3.09 ms per loop
>>> %timeit len(np.unique(X)) != len(X)
1000 loops, best of 3: 1.83 ms per loop
Edit: what if we have a prior expectation of the number of duplicates?
The above comparison is functioning under the assumption that a) there are likely to be no duplicates, or b) we're more worried about the worst case than the average case.
>>> X = np.random.normal(0, 1, [10000])
>>> for n_duplicates in [1, 10, 100]:
...     print("{} duplicates".format(n_duplicates))
...     duplicate_idx = np.random.choice(len(X), n_duplicates, replace=False)
...     X[duplicate_idx] = 0
...     print("terhorst_early_exit")
...     %timeit terhorst_early_exit(X)
...     print("peterbe_early_exit")
...     %timeit peterbe_early_exit(X)
...     print("set length")
...     %timeit len(set(X)) != len(X)
...     print("numpy unique length")
...     %timeit len(np.unique(X)) != len(X)
1 duplicates
terhorst_early_exit
100 loops, best of 3: 12.3 ms per loop
peterbe_early_exit
100 loops, best of 3: 9.55 ms per loop
set length
100 loops, best of 3: 4.71 ms per loop
numpy unique length
1000 loops, best of 3: 1.31 ms per loop
10 duplicates
terhorst_early_exit
1000 loops, best of 3: 1.81 ms per loop
peterbe_early_exit
1000 loops, best of 3: 1.47 ms per loop
set length
100 loops, best of 3: 5.44 ms per loop
numpy unique length
1000 loops, best of 3: 1.37 ms per loop
100 duplicates
terhorst_early_exit
10000 loops, best of 3: 111 µs per loop
peterbe_early_exit
10000 loops, best of 3: 99 µs per loop
set length
100 loops, best of 3: 5.16 ms per loop
numpy unique length
1000 loops, best of 3: 1.19 ms per loop
So if you expect very few duplicates, the numpy.unique function is the way to go. As the number of expected duplicates increases, the early exit methods dominate.
Depending on how large your array is, and how likely duplicates are, the answer will be different.
For example, if you expect the average array to have around 3 duplicates, early exit will cut your average-case time (and space) by 2/3rds; if you expect only 1 in 1000 arrays to have any duplicates at all, it will just add a bit of complexity without improving anything.
Meanwhile, if the arrays are big enough that building a temporary set as large as the array is likely to be expensive, sticking a probabilistic test like a bloom filter in front of it will probably speed things up dramatically, but if not, it's again just wasted effort.
Finally, you want to stay within numpy if at all possible. Looping over an array of floats (or whatever) and boxing each one into a Python object is going to take almost as much time as hashing and checking the values, and of course storing things in a Python set instead of optimized numpy storage is wasteful as well. But you have to trade that off against the other issues: you can't do early exit with numpy, and there may be nice C-optimized bloom filter implementations a pip install away, but there may not be any that are numpy-friendly.
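As an aside, and not something from the answers above: you can get part of the way to an early exit while staying in numpy by scanning the array in chunks. The chunk size and the use of np.isin (NumPy 1.13+) are my own choices here; treat this as a sketch rather than a drop-in solution.
import numpy as np

def contains_duplicates_chunked(X, chunk_size=1000):
    X = np.asarray(X)
    seen = np.empty(0, dtype=X.dtype)
    for start in range(0, len(X), chunk_size):
        chunk = X[start:start + chunk_size]
        uniq = np.unique(chunk)
        if len(uniq) != len(chunk):        # duplicate within this chunk
            return True
        if np.isin(uniq, seen).any():      # duplicate against earlier chunks
            return True
        seen = np.concatenate([seen, uniq])
    return False
This exits after the first chunk that contains or collides with a duplicate, so its average case improves as duplicates get more frequent, at the cost of some per-chunk overhead when there are none.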
So, there's no one best solution for all possible scenarios.
Just to give an idea of how easy it is to write a bloom filter, here's one I hacked together in a couple minutes:
from bitarray import bitarray # pip3 install bitarray
def dupcheck(X):
    # Hardcoded values to give about 5% false positives for 10000 elements
    size = 62352
    hashcount = 4
    bits = bitarray(size)
    bits.setall(0)
    def check(x, hash=hash): # TODO: default-value bits, hashcount, size?
        for i in range(hashcount):
            if not bits[hash((x, i)) % size]: return False
        return True
    def add(x):
        for i in range(hashcount):
            bits[hash((x, i)) % size] = True
    seen = set()
    seen_add = seen.add
    for x in X:
        if check(x) or add(x):
            if x in seen or seen_add(x):
                return True
    return False
This only uses 12KB (a 62352-bit bitarray plus a 500-float set) instead of 80KB (a 10000-float set or np.array). Which doesn't matter when you're only dealing with 10K elements, but with, say, 10B elements that use up more than half of your physical RAM, it would be a different story.
Of course it's almost certainly going to be an order of magnitude or so slower than using np.unique, or maybe even set, because we're doing all that slow looping in Python. But if this turns out to be worth doing, it should be a breeze to rewrite in Cython (and to directly access the numpy array without boxing and unboxing).
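Incidentally, the hard-coded size and hashcount above can be derived from the standard Bloom filter sizing formulas; this helper is my own addition for illustration, and for n=10000, p=0.05 it gives (62353, 4), essentially the constants used above.
from math import ceil, log

def bloom_parameters(n, p):
    """Bit count m and hash count k for n expected elements and false-positive rate p."""
    m = ceil(-n * log(p) / log(2) ** 2)   # m = -n ln(p) / (ln 2)^2
    k = max(1, round(m / n * log(2)))     # k = (m / n) ln 2
    return m, k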
My timing tests differ from Scott's for small lists. Using Python 3.7.3, set() is much faster than np.unique for a small numpy array from randint (length 8), but slower for a larger array (length 1000).
Length 8
Timing test iterations: 10000
Function Min Avg Sec Conclusion p-value
---------- --------- ----------- ------------ ---------
set_len 0 7.73486e-06 Baseline
unique_len 9.644e-06 2.55573e-05 Slower 0
Length 1000
Timing test iterations: 10000
Function Min Avg Sec Conclusion p-value
---------- ---------- ----------- ------------ ---------
set_len 0.00011066 0.000270466 Baseline
unique_len 4.3684e-05 8.95608e-05 Faster 0
Then I tried my own implementation, but I think it would require optimized C code to beat set:
def check_items(key_rand, **kwargs):
    for i, vali in enumerate(key_rand):
        for j in range(i+1, len(key_rand)):
            valj = key_rand[j]
            if vali == valj:
                break
Length 8
Timing test iterations: 10000
Function Min Avg Sec Conclusion p-value
----------- ---------- ----------- ------------ ---------
set_len 0 6.74221e-06 Baseline
unique_len 0 2.14604e-05 Slower 0
check_items 1.1138e-05 2.16369e-05 Slower 0
(using my randomized compare_time() function from easyinfo)
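Not one of the functions timed above, but for completeness, a sketch of the same pairwise idea pushed into numpy's C loops: sort first, then check whether any adjacent elements are equal. Whether this beats set() will again depend on the array size and dtype.
import numpy as np

def has_duplicates_sorted(key_rand):
    arr = np.asarray(key_rand)
    return bool((np.diff(np.sort(arr)) == 0).any())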
Ordered list reduction
I need to reduce some lists where, depending on element types, the speed and implementation of the binary operation varies, i.e. large reductions in run time can be gained by reducing some pairs with specific functions first.
For example foo(a[0], bar(a[1], a[2]))
might be a lot slower than bar(foo(a[0], a[1]), a[2]), but in this case gives the same result.
I have the code that produces an optimal ordering in the form of a list of tuples (pair_index, binary_function) already. I am struggling to implement an efficient function to perform the reduction, ideally one that returns a new partial function which can then be used repeatedly on lists of the same type-ordering but varying values.
Simple and slow(?) solution
Here is my naive solution involving a for loop, deletion of elements and closure over the (pair_index, binary_function) list to return a 'precomputed' function.
def ordered_reduce(a, pair_indexes, binary_functions, precompute=False):
    """
    a: list to reduce, length n
    pair_indexes: order of pairs to reduce, length (n-1)
    binary_functions: functions to use for each reduction, length (n-1)
    """
    def ord_red_func(x):
        y = list(x)  # copy so as not to eat up the input
        for p, f in zip(pair_indexes, binary_functions):
            b = f(y[p], y[p+1])
            # Replace pair
            del y[p]
            y[p] = b
        return y[0]
    return ord_red_func if precompute else ord_red_func(a)
>>> foos = (lambda a, b: a - b, lambda a, b: a + b, lambda a, b: a * b)
>>> ordered_reduce([1, 2, 3, 4], (2, 1, 0), foos)
1
>>> 1 * (2 + (3-4))
1
And here is how pre-computation works:
>>> foo = ordered_reduce(None, (0, 1, 0), foos, precompute=True)
>>> foo([1, 2, 3, 4])
-7
>>> (1 - 2) * (3 + 4)
-7
However it involves copying the whole list and is also (therefore?) slow. Is there a better/standard way to do this?
(EDIT:) Some Timings:
from operator import add
from functools import reduce
from itertools import repeat
from random import random
r = 100000
xs = [random() for _ in range(r)]
# slightly trivial choices of pairs and functions, to replicate reduce
ps = [0]*(r-1)
fs = repeat(add)
foo = ordered_reduce(None, ps, fs, precompute=True)
>>> %timeit reduce(add, xs)
100 loops, best of 3: 3.59 ms per loop
>>> %timeit foo(xs)
1 loop, best of 3: 1.44 s per loop
This is kind of a worst case scenario, and slightly cheating, as reduce does not take an iterable of functions, but a function which does (though without ordering) is still pretty fast:
def multi_reduce(fs, xs):
    xs = iter(xs)
    x = next(xs)
    for f, nx in zip(fs, xs):
        x = f(x, nx)
    return x
>>> %timeit multi_reduce(fs, xs)
100 loops, best of 3: 8.71 ms per loop
(EDIT2): and for fun, the performance of a massively cheating 'compiled' version, which gives some idea of the total overhead occurring.
from numba import jit

@jit(nopython=True)
def numba_sum(xs):
    y = 0
    for x in xs:
        y += x
    return y
>>> %timeit numba_sum(xs)
1000 loops, best of 3: 1.46 ms per loop
When I read this problem, I immediately thought of reverse Polish notation (RPN). While it may not be the best approach, it still gives a substantial speedup in this case.
My second thought is that you may get an equivalent result if you just reorder the sequence xs appropriately to get rid of del y[p]. (Arguably the best performance would be achieved if the whole reduce procedure is written in C. But it's a different kettle of fish.)
Reverse Polish Notation
If you are not familiar with RPN, please read the short explanation in the wikipedia article. Basically, all operations can be written down without parentheses, for example (1-2)*(3+4) is 1 2 - 3 4 + * in RPN, while 1-(2*(3+4)) becomes 1 2 3 4 + * -.
Here is a simple implementation of an RPN parser. I separated the list of objects from the RPN sequence, so that the same sequence can be used directly for different lists.
def rpn(arr, seq):
    '''
    Reverse Polish Notation algorithm
    (this version works only for binary operators)
    arr: array of objects
    seq: rpn sequence containing indices of objects from arr and functions
    '''
    stack = []
    for x in seq:
        if isinstance(x, int):
            # it's an object: push it to stack
            stack.append(arr[x])
        else:
            # it's a function: pop two objects, apply the function, push the result to stack
            b = stack.pop()
            #a = stack.pop()
            #stack.append(x(a,b))
            ## shortcut:
            stack[-1] = x(stack[-1], b)
    return stack.pop()
Example of usage:
# Say we have an array
arr = [100, 210, 42, 13]
# and want to calculate
(100 - 210) * (42 + 13)
# It translates to RPN:
100 210 - 42 13 + *
# or
arr[0] arr[1] - arr[2] arr[3] + *
# So we apply:
rpn(arr,[0, 1, subtract, 2, 3, add, multiply])
To apply RPN to your case you'd need either to generate rpn sequences from scratch or to convert your (pair_indexes, binary_functions) into them. I haven't thought about a converter but it surely can be done.
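For what it's worth, here is one possible converter, my own sketch rather than part of this answer: it simulates ordered_reduce on lists of indices instead of values and records the flattened result as an rpn() sequence.
def to_rpn(n, pair_indexes, binary_functions):
    '''Convert (pair_indexes, binary_functions) for a length-n list into an rpn() sequence.'''
    nodes = [[i] for i in range(n)]   # RPN fragment for each element
    for p, f in zip(pair_indexes, binary_functions):
        nodes[p] = nodes[p] + nodes[p + 1] + [f]   # combine adjacent fragments
        del nodes[p + 1]
    return nodes[0]
With the question's test case, to_rpn(4, (2, 1, 0), foos) yields [0, 1, 2, 3, foos[0], foos[1], foos[2]], and rpn([1, 2, 3, 4], to_rpn(4, (2, 1, 0), foos)) returns 1, matching ordered_reduce.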
Tests
Your original test comes first:
r = 100000
xs = [random() for _ in range(r)]
ps = [0]*(r-1)
fs = repeat(add)
foo = ordered_reduce(None, ps, fs, precompute=True)
rpn_seq = [0] + [x for i, f in zip(range(1,r), repeat(add)) for x in (i,f)]
rpn_seq2 = list(range(r)) + list(repeat(add,r-1))
# Here rpn_seq denotes (_ + (_ + (_ +( ... )...))))
# and rpn_seq2 denotes ((...( ... _)+ _) + _).
# Obviously, they are not equivalent but with 'add' they yield the same result.
%timeit reduce(add, xs)
100 loops, best of 3: 7.37 ms per loop
%timeit foo(xs)
1 loops, best of 3: 1.71 s per loop
%timeit rpn(xs, rpn_seq)
10 loops, best of 3: 79.5 ms per loop
%timeit rpn(xs, rpn_seq2)
10 loops, best of 3: 73 ms per loop
# Pure numpy just out of curiosity:
%timeit np.sum(np.asarray(xs))
100 loops, best of 3: 3.84 ms per loop
xs_np = np.asarray(xs)
%timeit np.sum(xs_np)
The slowest run took 4.52 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 48.5 µs per loop
So, rpn was 10 times slower than reduce but about 20 times faster than ordered_reduce.
Now, let's try something more complicated: alternately adding and multiplying matrices. I need a special function for it to test against reduce.
add_or_dot_b = 1
def add_or_dot(x, y):
    '''calls 'add' and 'np.dot' alternately'''
    global add_or_dot_b
    if add_or_dot_b:
        out = x + y
    else:
        out = np.dot(x, y)
    add_or_dot_b = 1 - add_or_dot_b
    # normalizing out to avoid `inf` in results
    return out/np.max(out)
r = 100001 # +1 for convenience
# (we apply an even number of functions)
xs = [np.random.rand(2,2) for _ in range(r)]
ps = [0]*(r-1)
fs = repeat(add_or_dot)
foo = ordered_reduce(None, ps, fs, precompute=True)
rpn_seq = [0] + [x for i, f in zip(range(1,r), repeat(add_or_dot)) for x in (i,f)]
%timeit reduce(add_or_dot, xs)
1 loops, best of 3: 894 ms per loop
%timeit foo(xs)
1 loops, best of 3: 2.72 s per loop
%timeit rpn(xs, rpn_seq)
1 loops, best of 3: 1.17 s per loop
Here, rpn was roughly 25% slower than reduce and more than 2 times faster than ordered_reduce.
What's the most pythonic way to mesh two strings together?
For example:
Input:
u = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
l = 'abcdefghijklmnopqrstuvwxyz'
Output:
'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
For me, the most pythonic* way is the following which pretty much does the same thing but uses the + operator for concatenating the individual characters in each string:
res = "".join(i + j for i, j in zip(u, l))
print(res)
# 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
It is also faster than using two join() calls:
In [5]: l1 = 'A' * 1000000; l2 = 'a' * 1000000
In [6]: %timeit "".join("".join(item) for item in zip(l1, l2))
1 loops, best of 3: 442 ms per loop
In [7]: %timeit "".join(i + j for i, j in zip(l1, l2))
1 loops, best of 3: 360 ms per loop
Faster approaches exist, but they often obfuscate the code.
Note: If the two input strings are not the same length then the longer one will be truncated as zip stops iterating at the end of the shorter string. In this case instead of zip one should use zip_longest (izip_longest in Python 2) from the itertools module to ensure that both strings are fully exhausted.
*To take a quote from the Zen of Python: Readability counts.
Pythonic = readability for me; i + j is just visually parsed more easily, at least for my eyes.
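To illustrate the zip_longest note above, here is a small example; the strings are my own, and fillvalue='' is needed because the default fill value None would break the string concatenation.
from itertools import zip_longest  # izip_longest in Python 2

u = 'ABCDE'
l = 'abc'
res = "".join(i + j for i, j in zip_longest(u, l, fillvalue=''))
# 'AaBbCcDE'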
Faster Alternative
Another way:
res = [''] * len(u) * 2
res[::2] = u
res[1::2] = l
print(''.join(res))
Output:
'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
Speed
Looks like it is faster:
%%timeit
res = [''] * len(u) * 2
res[::2] = u
res[1::2] = l
''.join(res)
100000 loops, best of 3: 4.75 µs per loop
than the fastest solution so far:
%timeit "".join(list(chain.from_iterable(zip(u, l))))
100000 loops, best of 3: 6.52 µs per loop
Also for the larger strings:
l1 = 'A' * 1000000; l2 = 'a' * 1000000
%timeit "".join(list(chain.from_iterable(zip(l1, l2))))
1 loops, best of 3: 151 ms per loop
%%timeit
res = [''] * len(l1) * 2
res[::2] = l1
res[1::2] = l2
''.join(res)
10 loops, best of 3: 92 ms per loop
Python 3.5.1.
Variation for strings with different lengths
u = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
l = 'abcdefghijkl'
Shorter one determines length (zip() equivalent)
min_len = min(len(u), len(l))
res = [''] * min_len * 2
res[::2] = u[:min_len]
res[1::2] = l[:min_len]
print(''.join(res))
Output:
AaBbCcDdEeFfGgHhIiJjKkLl
Longer one determines length (itertools.zip_longest(fillvalue='') equivalent)
min_len = min(len(u), len(l))
res = [''] * min_len * 2
res[::2] = u[:min_len]
res[1::2] = l[:min_len]
res += u[min_len:] + l[min_len:]
print(''.join(res))
Output:
AaBbCcDdEeFfGgHhIiJjKkLlMNOPQRSTUVWXYZ
With join() and zip().
>>> ''.join(''.join(item) for item in zip(u,l))
'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
On Python 2, by far the fastest way to do things, at ~3x the speed of list slicing for small strings and ~30x for long ones, is
res = bytearray(len(u) * 2)
res[::2] = u
res[1::2] = l
str(res)
This wouldn't work on Python 3, though. You could implement something like
res = bytearray(len(u) * 2)
res[::2] = u.encode("ascii")
res[1::2] = l.encode("ascii")
res.decode("ascii")
but by then you've already lost the gains over list slicing for small strings (it's still 20x the speed for long strings) and this doesn't even work for non-ASCII characters yet.
FWIW, if you are doing this on massive strings and need every cycle, and for some reason have to use Python strings... here's how to do it:
res = bytearray(len(u) * 4 * 2)
u_utf32 = u.encode("utf_32_be")
res[0::8] = u_utf32[0::4]
res[1::8] = u_utf32[1::4]
res[2::8] = u_utf32[2::4]
res[3::8] = u_utf32[3::4]
l_utf32 = l.encode("utf_32_be")
res[4::8] = l_utf32[0::4]
res[5::8] = l_utf32[1::4]
res[6::8] = l_utf32[2::4]
res[7::8] = l_utf32[3::4]
res.decode("utf_32_be")
Special-casing the common case of smaller types will help too. FWIW, this is only 3x the speed of list slicing for long strings and a factor of 4 to 5 slower for small strings.
Either way I prefer the join solutions, but since timings were mentioned elsewhere I thought I might as well join in.
If you want the fastest way, you can combine itertools with operator.add:
In [36]: from operator import add
In [37]: from itertools import starmap, izip
In [38]: timeit "".join([i + j for i, j in izip(l1, l2)])
1 loops, best of 3: 142 ms per loop
In [39]: timeit "".join(starmap(add, izip(l1,l2)))
1 loops, best of 3: 117 ms per loop
In [40]: timeit "".join(["".join(item) for item in zip(l1, l2)])
1 loops, best of 3: 196 ms per loop
In [41]: "".join(starmap(add, izip(l1,l2))) == "".join([i + j for i, j in izip(l1, l2)]) == "".join(["".join(item) for item in izip(l1, l2)])
Out[42]: True
But combining izip and chain.from_iterable is faster again
In [2]: from itertools import chain, izip
In [3]: timeit "".join(chain.from_iterable(izip(l1, l2)))
10 loops, best of 3: 98.7 ms per loop
There is also a substantial difference between chain(*...) and chain.from_iterable(...):
In [5]: timeit "".join(chain(*izip(l1, l2)))
1 loops, best of 3: 212 ms per loop
There is no point passing a generator to join: it is always going to be slower, as Python will first build a list from the contents. join does two passes over the data, one to figure out the size needed and one to actually do the join, which would not be possible using a generator:
join.h:
/* Here is the general case. Do a pre-pass to figure out the total
* amount of space we'll need (sz), and see whether all arguments are
* bytes-like.
*/
Also, if you have strings of different lengths and don't want to lose data, you can use izip_longest:
In [22]: from itertools import izip_longest
In [23]: a,b = "hlo","elworld"
In [24]: "".join(chain.from_iterable(izip_longest(a, b,fillvalue="")))
Out[24]: 'helloworld'
For Python 3 it is called zip_longest.
But for Python 2, veedrac's suggestion is by far the fastest:
In [18]: %%timeit
res = bytearray(len(u) * 2)
res[::2] = u
res[1::2] = l
str(res)
....:
100 loops, best of 3: 2.68 ms per loop
You could also do this using map and operator.add:
from operator import add
u = 'AAAAA'
l = 'aaaaa'
s = "".join(map(add, u, l))
Output:
'AaAaAaAaAa'
What map does is take each element of the first iterable u together with the corresponding element of the second iterable l and apply the function supplied as the first argument, add, to each pair. Then join just joins the results.
Jim's answer is great, but here's my favorite option, if you don't mind a couple of imports:
from functools import reduce
from operator import add
reduce(add, map(add, u, l))
A lot of these suggestions assume the strings are of equal length. Maybe that covers all reasonable use cases, but at least to me it seems that you might want to accommodate strings of differing lengths too. Or am I the only one thinking the mesh should work a bit like this:
u = "foobar"
l = "baz"
mesh(u,l) = "fboaozbar"
One way to do this would be the following:
def mesh(a, b):
    minlen = min(len(a), len(b))
    return "".join(["".join(x+y for x, y in zip(a, b)), a[minlen:], b[minlen:]])
I like using two fors, the variable names can give a hint/reminder to what is going on:
"".join(char for pair in zip(u,l) for char in pair)
Just to add another, more basic approach:
st = ""
for char in u:
st = "{0}{1}{2}".format( st, char, l[ u.index( char ) ] )
Feels a bit un-pythonic not to consider the double-list-comprehension answer here, which handles n strings with no extra effort:
"".join(c for cs in itertools.zip_longest(*all_strings) for c in cs)
where all_strings is a list of the strings you want to interleave. In your case, all_strings = [u, l]. A full use example would look like this:
import itertools
a = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
b = 'abcdefghijklmnopqrstuvwxyz'
all_strings = [a,b]
interleaved = "".join(c for cs in itertools.zip_longest(*all_strings) for c in cs)
print(interleaved)
# 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
Like many answers here, is it the fastest? Probably not, but it is simple and flexible. Also, without too much added complexity, it is slightly faster than the accepted answer (in general, string addition is a bit slow in Python):
In [7]: l1 = 'A' * 1000000; l2 = 'a' * 1000000;
In [8]: %timeit "".join(i + j for i, j in zip(l1, l2))
1 loops, best of 3: 227 ms per loop
In [9]: %timeit "".join(c for cs in zip(*(l1, l2)) for c in cs)
1 loops, best of 3: 198 ms per loop
Potentially faster and shorter than the current leading solution:
from itertools import chain
u = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
l = 'abcdefghijklmnopqrstuvwxyz'
res = "".join(chain(*zip(u, l)))
Strategy speed-wise is to do as much at the C-level as possible. Same zip_longest() fix for uneven strings and it would be coming out of the same module as chain() so can't ding me too many points there!
Other solutions I came up with along the way:
res = "".join(u[x] + l[x] for x in range(len(u)))
res = "".join(k + l[i] for i, k in enumerate(u))
You could use iteration_utilities.roundrobin1
u = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
l = 'abcdefghijklmnopqrstuvwxyz'
from iteration_utilities import roundrobin
''.join(roundrobin(u, l))
# returns 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
or the ManyIterables class from the same package:
from iteration_utilities import ManyIterables
ManyIterables(u, l).roundrobin().as_string()
# returns 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
1 This is from a third-party library I have written: iteration_utilities.
I would use zip() to get a readable and easy way:
result = ''
for cha, chb in zip(u, l):
    result += '%s%s' % (cha, chb)
print result
# 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
I know Python isn't built for speed but I would like to improve the performance of the following code:
listA = [1,2]
listB = [1,2,3,4,5,6,7,8,9,10]

# pre-allocate for speed. Is appending to an empty list slower?
newList = ['NaN']*len(listB)

# Do I need a loop? Can I use something faster?
for n in xrange(len(listB)):
    if listB[n] % 2 == 1:
        newList[n] = listA[0]
    else:
        newList[n] = listA[1]
My issue is listB can get pretty large.
I have already pre-allocated memory for newList and used xrange. I believe these provide significant speed increases for large lists.
But do I even need a for loop at all, since each iteration does not depend on the previous result? Does Python have an array type?
Can I break up listB and run the operation in parallel similar to parfor in Matlab?
ADDITIONAL INFO:
For my problem, as listA gets bigger, listB gets exponentially bigger.
For each item in listB there needs to be a lookup in listA. Then a calculation is performed (not necessary modulo) and the result appended to newList. Then I do a statistical analysis on newList (say take an average for simplicity). newList will always be the same length as listB.
The shortest and, perhaps, fastest way would be using list comprehension:
newList = [listA[1 - x%2] for x in listB]
The purpose of xrange is not to gain speed; its purpose is to reduce memory usage. The difference between range(N) and xrange(N) is that the latter doesn't expand to a list of size N but to a small, constant-size object.
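A rough illustration of that memory difference on CPython 2 (my own example; exact numbers vary by platform, and getsizeof of the list does not even count the int objects it holds):
import sys

sys.getsizeof(range(10 ** 6))    # a real million-element list: roughly 8 MB for the pointer array alone
sys.getsizeof(xrange(10 ** 6))   # a small constant-size object, a few dozen bytes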
A few tips:
If your list is big, look into numpy. Numpy has efficient algorithms for array handling and uses native code internally.
Modulo is slow (if listB[n] % 2 == 1:). Better use a bitwise operator (if listB[n] & 1) in this case.
The if statement can go away entirely: newList[n] = listA[1 - (listB[n] & 1)] for each value of n in range. Invert the order of listA to get rid of the 1 - and save another integer op (see the sketch below).
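Here is a minimal sketch of those tips combined; this is my own code, not the answer's, and the numpy variant assumes the values are plain integers.
import numpy as np

listA = [1, 2]
listB = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

listA_rev = listA[::-1]                       # [2, 1]: odd n -> listA[0], even n -> listA[1]
newList = [listA_rev[n & 1] for n in listB]   # pure-Python version of the tips above

arrB = np.asarray(listB)
newArr = np.asarray(listA_rev)[arrB & 1]      # vectorized lookup, no Python-level loop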
Using list comprehension seems to cut some time:
from time import clock

listB = [i for i in xrange(1,1000000)]
start = clock()
listA = [1,2]
newList = ['NaN']*len(listB)
for n in xrange(len(listB)):
    if listB[n] % 2 == 1:
        newList[n] = listA[0]
    else:
        newList[n] = listA[1]
print "Time taken = %.5f" % (clock() - start)
>>> 0.21216
Compared to:
listB = [i for i in xrange(1,1000000)]
start = clock()
listA = [1,2]
newList = [listA[0] if i%2 == 1 else listA[1] for i in listB]
print "Time taken = %.5f" % (clock() - start)
>> 0.15658
First, replace the modulo operator, n % 2, with the bitwise and operator, n & 1. Next, instead of accessing listB by index, just iterate through its items directly using in. You can remove listA entirely. These small improvements should speed things up slightly.
newList = ((n & 1) + 1 for n in listB)
The real advantage of this code though, is that it is a generator comprehension, not a list comprehension. Although this doesn't make it any faster, it does make it much more memory efficient. That being said, it also has some disadvantages; you cannot access the entire list, and once you access a value it is gone. If you only intend on iterating through newList or performing some calculation on each item of newList this will be fine. If not, then make newList a list comprehension:
newList = [(n & 1) + 1 for n in listB]
Best of luck!
Just loop over listB and set two variables at the start instead of repeatedly indexing:
newList = []
i, j = listA[0], listA[1]
for n in listB:
    if n % 2:
        newList.append(i)
    else:
        newList.append(j)
Or use a list comp:
[i if n % 2 else j for n in listB]
Timings:
In [4]: %%timeit
   ...: newList = ['NaN']*len(listB)
   ...: for n in xrange(len(listB)):
   ...:     if listB[n] % 2 == 1:
   ...:         newList[n] = listA[0]
   ...:     else:
   ...:         newList[n] = listA[1]
   ...:
100000 loops, best of 3: 2.33 µs per loop
In [5]: %%timeit
...: i,j = listA[0], listA[1]
...: [i if n % 2 else j for n in listB]
...:
1000000 loops, best of 3: 1.12 µs per loop
In [16]: %%timeit
   ....: newList = []
   ....: i,j = listA[0], listA[1]
   ....: for n in listB:
   ....:     if n % 2 == 1:
   ....:         newList.append(i)
   ....:     else:
   ....:         newList.append(j)
   ....:
1000000 loops, best of 3: 1.88 µs per loop
In [18]: timeit [listA[1 - x%2] for x in listB]
1000000 loops, best of 3: 1.38 µs per loop
Using if n & 1 is slightly faster:
In [11]: %%timeit
i,j = listA[0], listA[1]
[i if n & 1 else j for n in listB]
....:
1000000 loops, best of 3: 1.04 µs per loop
So indexing always adds more overhead whether in a list comp or a loop. It is pointless continually indexing listA when you just want the two values.
If you want more speed compiling with cython and simply typing a couple of variables cuts down the runtime:
In [31]: %%cython
   ....: def faster(l1,l2):
   ....:     cdef int i,j,n
   ....:     i, j = l1[0], l1[1]
   ....:     return [i if n & 1 else j for n in l2]
   ....:
In [32]: timeit faster(listA,listB)
1000000 loops, best of 3: 455 ns per loop
If you are doing a lot of numeric calculations you may want to look further into cython and or numpy.
I want to zip up a list of entities with a new entity to generate a list of coordinates (2-tuples), but I want to ensure that for each (i, j), i < j is always true.
However, I am not extremely pleased with my current solutions:
from itertools import repeat
mems = range(1, 10, 2)
mem = 8
def ij(i, j):
    if i < j:
        return (i, j)
    else:
        return (j, i)

def zipij(m=mem, ms=mems, f=ij):
    return map(lambda i: f(i, m), ms)

def zipij2(m=mem, ms=mems):
    return map(lambda i: tuple(sorted([i, m])), ms)

def zipij3(m=mem, ms=mems):
    return [tuple(sorted([i, m])) for i in ms]

def zipij4(m=mem, ms=mems):
    mems = zip(ms, repeat(m))
    half1 = [(i, j) for i, j in mems if i < j]
    half2 = [(j, i) for i, j in mems[len(half1):]]
    return half1 + half2

def zipij5(m=mem, ms=mems):
    mems = zip(ms, repeat(m))
    return [(i, j) for i, j in mems if i < j] + [(j, i) for i, j in mems if i > j]
Output for above:
>>> print zipij() # or zipij{2-5}
[(1, 8), (3, 8), (5, 8), (7, 8), (8, 9)]
Instead of normally:
>>> print zip(mems, repeat(mem))
[(1, 8), (3, 8), (5, 8), (7, 8), (9, 8)]
Timings: snipped (no longer relevant, see much faster results in answers below)
For len(mems) == 5, there is no real issue with any solution, but for zipij5() for instance, the second list comprehension is needlessly going back over the first four values when i > j was already evaluated to be True for those in the first comprehension.
For my purposes, I'm positive that len(mems) will never exceed ~10000, if that helps form any answers for what solution is best. To explain my use case a bit (I find it interesting), I will be storing a sparse, upper-triangular, similarity matrix of sorts, and so I need the coordinate (i, j) to not be duplicated at (j, i). I say of sorts because I will be utilizing the new Counter() object in 2.7 to perform quasi matrix-matrix and matrix-vector addition. I then simply feed counter_obj.update() a list of 2-tuples and it increments those coordinates how many times they occur. SciPy sparse matrices ran about 50x slower, to my dismay, for my use cases... so I quickly ditched those.
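A tiny illustration of the Counter-based accumulation described above; the coordinates are made up, since the question doesn't give concrete data.
from collections import Counter

counts = Counter()
counts.update([(1, 8), (3, 8), (1, 8)])   # increments each (i, j) coordinate
# counts == Counter({(1, 8): 2, (3, 8): 1})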
So anyway, I was surprised by my results... The first methods I came up with were zipij4 and zipij5, and yet they are still the fastest, despite building a normal zip() and then generating a new zip after changing the values. I'm still rather new to Python, relatively speaking (Alex Martelli, can you hear me?), so here are my naive conclusions:
tuple(sorted([i, j])) is extremely expensive (Why is that?)
map(lambda ...) seems to always do worse than a list comp (I think I've read this and it makes sense)
Somehow zipij5() isn't much slower despite going over the list twice to check for i-j inequality. (Why is this?)
And lastly, I would like to know which is considered most efficient... or if there are any other fast and memory-inexpensive ways that I haven't yet thought of. Thank you.
Current Best Solutions
## Most BRIEF, Quickest with UNSORTED input list:
## truppo's
def zipij9(m=mem, ms=mems):
    return [(i, m) if i < m else (m, i) for i in ms]
## Quickest with pre-SORTED input list:
## Michal's
def zipij10(m=mem, ms=mems):
    i = binsearch(m, ms)  ## See Michal's answer for binsearch()
    return zip(ms[:i], repeat(m)) + zip(repeat(m), ms[i:])
Timings
# Michal's
Presorted - 410µs per loop
Unsorted - 2.09ms per loop ## Due solely to the expensive sorted()
# truppo's
Presorted - 880µs per loop
Unsorted - 896µs per loop ## No sorted() needed
Timings were using mems = range(1, 10000, 2), which is only ~5000 in length. sorted() will probably become worse at higher values, and with lists that are more shuffled. random.shuffle() was used for the "Unsorted" timings.
Current version:
(Fastest at the time of posting with Python 2.6.4 on my machine.)
Update 3: Since we're going all out, let's do a binary search -- in a way which doesn't require injecting m into mems:
def binsearch(x, lst):
    low, high = -1, len(lst)
    while low < high:
        i = (high - low) // 2
        if i > 0:
            i += low
            if lst[i] < x:
                low = i
            else:
                high = i
        else:
            i = high
            high = low
    return i

def zipij(m=mem, ms=mems):
    i = binsearch(m, ms)
    return zip(ms[:i], repeat(m)) + zip(repeat(m), ms[i:])
This runs in 828 µs = 0.828 ms on my machine vs the OP's current solution's 1.14 ms. Input list assumed sorted (and the test case is the usual one, of course).
This binary search implementation returns the index of the first element in the given list which is not smaller than the object being searched for. Thus there's no need to inject m into mems and sort the whole thing (like in the OP's current solution with .index(m)) or walk through the beginning of the list step by step (like I did previously) to find the offset at which it should be divided.
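For reference, and not part of this answer: the standard library's bisect module already provides exactly this "first index not smaller than x" search, so an equivalent version (Python 2, with its list-returning zip, and the same mem/mems globals as above) could look like:
from bisect import bisect_left
from itertools import repeat

def zipij_bisect(m=mem, ms=mems):
    i = bisect_left(ms, m)   # first index whose element is >= m
    return zip(ms[:i], repeat(m)) + zip(repeat(m), ms[i:])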
Earlier attempts:
How about this? (Proposed solution next to In [25] below, 2.42 ms to zipij5's 3.13 ms.)
In [24]: timeit zipij5(m = mem, ms = mems)
100 loops, best of 3: 3.13 ms per loop
In [25]: timeit [(i, j) if i < j else (j, i) for (i, j) in zip(mems, repeat(mem))]
100 loops, best of 3: 2.42 ms per loop
In [27]: [(i, j) if i < j else (j, i) for (i, j) in zip(mems, repeat(mem))] == zipij5(m=mem, ms=mems)
Out[27]: True
Update: This appears to be just about exactly as fast as the OP's self-answer. Seems more straightforward, though.
Update 2: An implementation of the OP's proposed simplified solution:
def zipij(m=mem, ms=mems):
    split_at = 0
    for item in ms:
        if item < m:
            split_at += 1
        else:
            break
    return [(item, m) for item in ms[:split_at]] + [(m, item) for item in ms[split_at:]]
In [54]: timeit zipij()
1000 loops, best of 3: 1.15 ms per loop
Also, truppo's solution runs in 1.36 ms on my machine. I guess the above is the fastest so far. Note you need to sort mems before passing them into this function! If you're generating it with range, it is of course already sorted, though.
Why not just inline your ij()-function?
def zipij(m=mem, ms=mems):
    return [(i, m) if i < m else (m, i) for i in ms]
(This runs in 0.64 ms instead of 2.12 ms on my computer)
Some benchmarks:
zipit.py:
from itertools import repeat
mems = range(1, 50000, 2)
mem = 8
def zipij7(m=mem, ms=mems):
    cpy = sorted(ms + [m])
    loc = cpy.index(m)
    return zip(ms[:(loc)], repeat(m)) + zip(repeat(m), ms[(loc):])

def zipinline(m=mem, ms=mems):
    return [(i, m) if i < m else (m, i) for i in ms]
Sorted:
>python -m timeit -s "import zipit" "zipit.zipinline()"
100 loops, best of 3: 4.44 msec per loop
>python -m timeit -s "import zipit" "zipit.zipij7()"
100 loops, best of 3: 4.8 msec per loop
Unsorted:
>python -m timeit -s "import zipit, random; random.shuffle(zipit.mems)" "zipit.zipinline()"
100 loops, best of 3: 4.65 msec per loop
>python -m timeit -s "import zipit, random; random.shuffle(zipit.mems)" "zipit.zipij7()"
100 loops, best of 3: 17.1 msec per loop
Most recent version:
def zipij7(m=mem, ms=mems):
    cpy = sorted(ms + [m])
    loc = cpy.index(m)
    return zip(ms[:(loc)], repeat(m)) + zip(repeat(m), ms[(loc):])
Benches slightly faster for me than truppo's, slower by 30% than Michal's. (Looking into that now)
I may have found my answer (for now). It seems I forgot about making a list comp version of zipij():
def zipij1(m=mem, ms=mems, f=ij):
    return [f(i, m) for i in ms]
It still relies on my silly ij() helper function, so it doesn't win the award for brevity, certainly, but timings have improved:
# 10000
1.27s
# 50000
6.74s
So it is now my current "winner", and also does not need to generate more than one list, or use a lot of function calls, other than the ij() helper, so I believe it would also be the most efficient.
However, I think this could still be improved... I think that making N ij() function calls (where N is the length of the resultant list) is not needed:
Find at what index mem would fit into mems when ordered
Split mems at that index into two parts
Do zip(part1, repeat(mem))
Add zip(repeat(mem), part2) to it
It'd basically be an improvement on zipij4(), and this avoids N extra function calls, but I am not sure of the speed/memory benefits over the cost of brevity. I will maybe add that version to this answer if I figure it out.