I know Python isn't built for speed but I would like to improve the performance of the following code:
listA = [1,2]
listB = [1,2,3,4,5,6,7,8,9,10]
# pre-allocate for speed. Appending empty list is slower?
newList = ['NaN']*len(listB)
# Do I need a loop? Can I use something faster?
for n in xrange(len(listB)):
if listB[n] % 2 == 1:
newList[n] = listA[0]
else:
newList[n] = listA[1]
My issue is listB can get pretty large.
I have already pre-allocated memory for newList and used xrange. I believe these provide significant speed increases for large lists.
But do I even need a for loop at all since each loop is not dependent on the previous result. Does python have an array type?
Can I break up listB and run the operation in parallel similar to parfor in Matlab?
ADDITIONAL INFO:
For my problem, as listA gets bigger, listB gets exponentially bigger.
For each item in listB there needs to be a lookup in listA. Then a calculation is performed (not necessary modulo) and the result appended to newList. Then I do a statistical analysis on newList (say take an average for simplicity). newList will always be the same length as listB.
The shortest and, perhaps, fastest way would be using list comprehension:
newList = [listA[1 - x%2] for x in listB]
The purpose of xrange is not to gain speed; its purpose is to reduce memory usage. The difference between range(N) and xrange(N) is that the latter doesn't expand to a list of size N but to a small generator object.
A few tips:
If your list is big, look into numpy. Numpy has efficient algorithms for array handling and uses native code internally.
Modulo is slow (if listB[n] % 2 == 1:). Better use a bitwise operator (if ListB[n]&1) in this case.
The if statement can go out: newList[n] = listA[1-ListB[n]&1] for each value of n in range. Invert the order of listA to get git of the 1- and save another integer op.
Using list comprehension seems to cut some time:
listB = [i for i in xrange(1,1000000)]
start = clock()
listA = [1,2]
for n in xrange(len(listB)):
if listB[n] % 2 == 1:
newList[n] = listA[0]
else:
newList[n] = listB[1]
print "Time taken = %.5f" % (clock() - start)
>>> 0.21216
Compared to:
listB = [i for i in xrange(1,1000000)]
start = clock()
listA = [1,2]
newList = [listA[0] if i%2 == 1 else listA[1] for i in listB]
print "Time taken = %.5f" % (clock() - start)
>> 0.15658
First, replace the modulo operator, n % 2, with the bitwise and operator, n & 1. Next, instead of accessing listB by index, just iterate through its items directly using in. You can remove listA entirely. These small improvements should should speed things up slightly.
newList = ((n & 1) + 1 for n in listB)
The real advantage of this code though, is that it is a generator comprehension, not a list comprehension. Although this doesn't make it any faster, it does make it much more memory efficient. That being said, it also has some disadvantages; you cannot access the entire list, and once you access a value it is gone. If you only intend on iterating through newList or performing some calculation on each item of newList this will be fine. If not, then make newList a list comprehension:
newList = [(n & 1) + 1 for n in listB]
Best of luck!
Just loop over listB and set two variables at the start instead of repeatedly indexing:
newList = []
i, j = listA[0], listA[1]
for n in listB:
if n % 2:
newList.append(i)
else:
newList.append(j)
Or use a list comp:
[i if n % 2 else j for n in listB]
Timings:
In [4]: %%timeit
newList = ['NaN']*len(listB)
for n in xrange(len(listB)):
if listB[n] % 2 == 1:
newList[n] = listA[0]
else:
newList[n] = listA[1]
...:
100000 loops, best of 3: 2.33 µs per loop
In [5]: %%timeit
...: i,j = listA[0], listA[1]
...: [i if n % 2 else j for n in listB]
...:
1000000 loops, best of 3: 1.12 µs per loop
In [16]: %%timeit
....: newList = []
....: i,j = listA[0], listA[1]
....: for n in listB:
....: if n % 2 == 1:
....: newList.append(i)
....: else:
....: newList.append(j)
....:
1000000 loops, best of 3: 1.88 µs per loop
In [18]: timeit [listA[1 - x%2] for x in listB]
1000000 loops, best of 3: 1.38 µs per loop
Using if n & 1 is slightly faster:
In [11]: %%timeit
i,j = listA[0], listA[1]
[i if n & 1 else j for n in listB]
....:
1000000 loops, best of 3: 1.04 µs per loop
So indexing always adds more overhead whether in a list comp or a loop. It is pointless continually indexing listA when you just want the two values.
If you want more speed compiling with cython and simply typing a couple of variables cuts down the runtime:
In [31]: %%cython
....: def faster(l1,l2):
....: cdef int i,j,n
....: i, j = l1[0], l2[1]
....: return [i if n & 1 else j for n in l2]
....:
In [32]:
In [32]: timeit faster(listA,listB)
1000000 loops, best of 3: 455 ns per loop
If you are doing a lot of numeric calculations you may want to look further into cython and or numpy.
Related
What's the most pythonic way to mesh two strings together?
For example:
Input:
u = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
l = 'abcdefghijklmnopqrstuvwxyz'
Output:
'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
For me, the most pythonic* way is the following which pretty much does the same thing but uses the + operator for concatenating the individual characters in each string:
res = "".join(i + j for i, j in zip(u, l))
print(res)
# 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
It is also faster than using two join() calls:
In [5]: l1 = 'A' * 1000000; l2 = 'a' * 1000000
In [6]: %timeit "".join("".join(item) for item in zip(l1, l2))
1 loops, best of 3: 442 ms per loop
In [7]: %timeit "".join(i + j for i, j in zip(l1, l2))
1 loops, best of 3: 360 ms per loop
Faster approaches exist, but they often obfuscate the code.
Note: If the two input strings are not the same length then the longer one will be truncated as zip stops iterating at the end of the shorter string. In this case instead of zip one should use zip_longest (izip_longest in Python 2) from the itertools module to ensure that both strings are fully exhausted.
*To take a quote from the Zen of Python: Readability counts.
Pythonic = readability for me; i + j is just visually parsed more easily, at least for my eyes.
Faster Alternative
Another way:
res = [''] * len(u) * 2
res[::2] = u
res[1::2] = l
print(''.join(res))
Output:
'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
Speed
Looks like it is faster:
%%timeit
res = [''] * len(u) * 2
res[::2] = u
res[1::2] = l
''.join(res)
100000 loops, best of 3: 4.75 µs per loop
than the fastest solution so far:
%timeit "".join(list(chain.from_iterable(zip(u, l))))
100000 loops, best of 3: 6.52 µs per loop
Also for the larger strings:
l1 = 'A' * 1000000; l2 = 'a' * 1000000
%timeit "".join(list(chain.from_iterable(zip(l1, l2))))
1 loops, best of 3: 151 ms per loop
%%timeit
res = [''] * len(l1) * 2
res[::2] = l1
res[1::2] = l2
''.join(res)
10 loops, best of 3: 92 ms per loop
Python 3.5.1.
Variation for strings with different lengths
u = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
l = 'abcdefghijkl'
Shorter one determines length (zip() equivalent)
min_len = min(len(u), len(l))
res = [''] * min_len * 2
res[::2] = u[:min_len]
res[1::2] = l[:min_len]
print(''.join(res))
Output:
AaBbCcDdEeFfGgHhIiJjKkLl
Longer one determines length (itertools.zip_longest(fillvalue='') equivalent)
min_len = min(len(u), len(l))
res = [''] * min_len * 2
res[::2] = u[:min_len]
res[1::2] = l[:min_len]
res += u[min_len:] + l[min_len:]
print(''.join(res))
Output:
AaBbCcDdEeFfGgHhIiJjKkLlMNOPQRSTUVWXYZ
With join() and zip().
>>> ''.join(''.join(item) for item in zip(u,l))
'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
On Python 2, by far the faster way to do things, at ~3x the speed of list slicing for small strings and ~30x for long ones, is
res = bytearray(len(u) * 2)
res[::2] = u
res[1::2] = l
str(res)
This wouldn't work on Python 3, though. You could implement something like
res = bytearray(len(u) * 2)
res[::2] = u.encode("ascii")
res[1::2] = l.encode("ascii")
res.decode("ascii")
but by then you've already lost the gains over list slicing for small strings (it's still 20x the speed for long strings) and this doesn't even work for non-ASCII characters yet.
FWIW, if you are doing this on massive strings and need every cycle, and for some reason have to use Python strings... here's how to do it:
res = bytearray(len(u) * 4 * 2)
u_utf32 = u.encode("utf_32_be")
res[0::8] = u_utf32[0::4]
res[1::8] = u_utf32[1::4]
res[2::8] = u_utf32[2::4]
res[3::8] = u_utf32[3::4]
l_utf32 = l.encode("utf_32_be")
res[4::8] = l_utf32[0::4]
res[5::8] = l_utf32[1::4]
res[6::8] = l_utf32[2::4]
res[7::8] = l_utf32[3::4]
res.decode("utf_32_be")
Special-casing the common case of smaller types will help too. FWIW, this is only 3x the speed of list slicing for long strings and a factor of 4 to 5 slower for small strings.
Either way I prefer the join solutions, but since timings were mentioned elsewhere I thought I might as well join in.
If you want the fastest way, you can combine itertools with operator.add:
In [36]: from operator import add
In [37]: from itertools import starmap, izip
In [38]: timeit "".join([i + j for i, j in uzip(l1, l2)])
1 loops, best of 3: 142 ms per loop
In [39]: timeit "".join(starmap(add, izip(l1,l2)))
1 loops, best of 3: 117 ms per loop
In [40]: timeit "".join(["".join(item) for item in zip(l1, l2)])
1 loops, best of 3: 196 ms per loop
In [41]: "".join(starmap(add, izip(l1,l2))) == "".join([i + j for i, j in izip(l1, l2)]) == "".join(["".join(item) for item in izip(l1, l2)])
Out[42]: True
But combining izip and chain.from_iterable is faster again
In [2]: from itertools import chain, izip
In [3]: timeit "".join(chain.from_iterable(izip(l1, l2)))
10 loops, best of 3: 98.7 ms per loop
There is also a substantial difference between
chain(* and chain.from_iterable(....
In [5]: timeit "".join(chain(*izip(l1, l2)))
1 loops, best of 3: 212 ms per loop
There is no such thing as a generator with join, passing one is always going to be slower as python will first build a list using the content because it does two passes over the data, one to figure out the size needed and one to actually do the join which would not be possible using a generator:
join.h:
/* Here is the general case. Do a pre-pass to figure out the total
* amount of space we'll need (sz), and see whether all arguments are
* bytes-like.
*/
Also if you have different length strings and you don't want to lose data you can use izip_longest :
In [22]: from itertools import izip_longest
In [23]: a,b = "hlo","elworld"
In [24]: "".join(chain.from_iterable(izip_longest(a, b,fillvalue="")))
Out[24]: 'helloworld'
For python 3 it is called zip_longest
But for python2, veedrac's suggestion is by far the fastest:
In [18]: %%timeit
res = bytearray(len(u) * 2)
res[::2] = u
res[1::2] = l
str(res)
....:
100 loops, best of 3: 2.68 ms per loop
You could also do this using map and operator.add:
from operator import add
u = 'AAAAA'
l = 'aaaaa'
s = "".join(map(add, u, l))
Output:
'AaAaAaAaAa'
What map does is it takes every element from the first iterable u and the first elements from the second iterable l and applies the function supplied as the first argument add. Then join just joins them.
Jim's answer is great, but here's my favorite option, if you don't mind a couple of imports:
from functools import reduce
from operator import add
reduce(add, map(add, u, l))
A lot of these suggestions assume the strings are of equal length. Maybe that covers all reasonable use cases, but at least to me it seems that you might want to accomodate strings of differing lengths too. Or am I the only one thinking the mesh should work a bit like this:
u = "foobar"
l = "baz"
mesh(u,l) = "fboaozbar"
One way to do this would be the following:
def mesh(a,b):
minlen = min(len(a),len(b))
return "".join(["".join(x+y for x,y in zip(a,b)),a[minlen:],b[minlen:]])
I like using two fors, the variable names can give a hint/reminder to what is going on:
"".join(char for pair in zip(u,l) for char in pair)
Just to add another, more basic approach:
st = ""
for char in u:
st = "{0}{1}{2}".format( st, char, l[ u.index( char ) ] )
Feels a bit un-pythonic not to consider the double-list-comprehension answer here, to handle n string with O(1) effort:
"".join(c for cs in itertools.zip_longest(*all_strings) for c in cs)
where all_strings is a list of the strings you want to interleave. In your case, all_strings = [u, l]. A full use example would look like this:
import itertools
a = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
b = 'abcdefghijklmnopqrstuvwxyz'
all_strings = [a,b]
interleaved = "".join(c for cs in itertools.zip_longest(*all_strings) for c in cs)
print(interleaved)
# 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
Like many answers, fastest? Probably not, but simple and flexible. Also, without too much added complexity, this is slightly faster than the accepted answer (in general, string addition is a bit slow in python):
In [7]: l1 = 'A' * 1000000; l2 = 'a' * 1000000;
In [8]: %timeit "".join(a + b for i, j in zip(l1, l2))
1 loops, best of 3: 227 ms per loop
In [9]: %timeit "".join(c for cs in zip(*(l1, l2)) for c in cs)
1 loops, best of 3: 198 ms per loop
Potentially faster and shorter than the current leading solution:
from itertools import chain
u = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
l = 'abcdefghijklmnopqrstuvwxyz'
res = "".join(chain(*zip(u, l)))
Strategy speed-wise is to do as much at the C-level as possible. Same zip_longest() fix for uneven strings and it would be coming out of the same module as chain() so can't ding me too many points there!
Other solutions I came up with along the way:
res = "".join(u[x] + l[x] for x in range(len(u)))
res = "".join(k + l[i] for i, k in enumerate(u))
You could use iteration_utilities.roundrobin1
u = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
l = 'abcdefghijklmnopqrstuvwxyz'
from iteration_utilities import roundrobin
''.join(roundrobin(u, l))
# returns 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
or the ManyIterables class from the same package:
from iteration_utilities import ManyIterables
ManyIterables(u, l).roundrobin().as_string()
# returns 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
1 This is from a third-party library I have written: iteration_utilities.
I would use zip() to get a readable and easy way:
result = ''
for cha, chb in zip(u, l):
result += '%s%s' % (cha, chb)
print result
# 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
I want to find a sequence of items in a sorted array of values.
I know that with numpy I can do:
l = np.searchsorted(values, items)
This has the complexity of O(len(items)*log(len(values))).
However, my items are also sorted, so I can do it in O(len(items)+len(values)) doing:
l = np.zeros(items.size, dtype=np.int32)
k, K = 0, len(values)
for i in range(len(items)):
while k < K and values[k] < items[i]:
k += 1
l[i] = k
The problem is that this version in pure python is way slower than searchsorted because of the python loop, even for large len(items) and len(values) (~10^6).
Any idea how to "vectorize" this loop with numpy?
Some example data:
# some example data
np.random.seed(0)
n_values = 1000000
n_items = 100000
values = np.random.rand(n_values)
items = np.random.rand(n_items)
values.sort()
items.sort()
Your original code snippet as well as an implementation of #PeterE's suggestion:
def original(values, items):
l = np.empty(items.size, dtype=np.int32)
k, K = 0, len(values)
for i, item in enumerate(items):
while k < K and values[k] < item:
k += 1
l[i] = k
return l
def peter_e(values, items):
l = np.empty(items.size, dtype=np.int32)
last_idx = 0
for i, item in enumerate(items):
last_idx += values[last_idx:].searchsorted(item)
l[i] = last_idx
return l
Test for correctness against naive np.searchsorted:
ss = values.searchsorted(items)
print(all(original(values, items) == ss))
# True
print(all(peter_e(values, items) == ss))
# True
Timings:
In [1]: %timeit original(values, items)
10 loops, best of 3: 115 ms per loop
In [2]: %timeit peter_e(values, items)
10 loops, best of 3: 79.8 ms per loop
In [3]: %timeit values.searchsorted(items)
100 loops, best of 3: 4.09 ms per loop
So for inputs of this size, naive use of np.searchsorted handily beats your original code, as well as PeterE's suggestion.
Update
To avoid any caching effects that might skew the timings, we can generate a new set of random input arrays for each iteration of the benchmark:
In [1]: %%timeit values = np.random.randn(n_values); items = np.random.randn(n_items); values.sort(); items.sort();
original(values, items)
.....:
10 loops, best of 3: 115 ms per loop
In [2]: %%timeit values = np.random.randn(n_values); items = np.random.randn(n_items); values.sort(); items.sort();
peter_e(values, items)
.....:
10 loops, best of 3: 79.9 ms per loop
In [3]: %%timeit values = np.random.randn(n_values); items = np.random.randn(n_items); values.sort(); items.sort();
values.searchsorted(items)
.....:
100 loops, best of 3: 4.08 ms per loop
Update 2
It's not that hard to write a Cython function that will beat np.searchsorted for the case where both values and items are sorted.
search_doubly_sorted.pyx:
import numpy as np
cimport numpy as np
cimport cython
#cython.boundscheck(False)
#cython.wraparound(False)
def search_doubly_sorted(values, items):
cdef:
double[:] _values = values.astype(np.double)
double[:] _items = items.astype(np.double)
long n_items = items.shape[0]
long n_values = values.shape[0]
long[:] out = np.empty(n_items, dtype=np.int64)
long ii, jj, last_idx
last_idx = 0
for ii in range(n_items):
for jj in range(last_idx, n_values):
if _items[ii] <= _values[jj]:
break
last_idx = jj
out[ii] = last_idx
return out.base
Test for correctness:
In [1]: from search_doubly_sorted import search_doubly_sorted
In [2]: print(all(search_doubly_sorted(values, items) == values.searchsorted(items)))
# True
Benchmark:
In [3]: %timeit values.searchsorted(items)
100 loops, best of 3: 4.07 ms per loop
In [4]: %timeit search_doubly_sorted(values, items)
1000 loops, best of 3: 1.44 ms per loop
The performance improvement is fairly marginal, though. Unless this is a serious bottleneck in your code then you should probably stick with np.searchsorted.
I presently have this code for factoring large numbers:
def f1(n):
return [[i, n//i] for i in range(1 , int(n**0.5) + 1) if n % i == 0]
It's the fastest version I've seen so far (If there's a faster way I'd love to know about that as well), but I'd like a single list of all the factors with no nesting (so I want something like: [factor 1, factor 2, factor 3,..., factor n-3, factor n-2, factor n-1, factor n] and so on. The order isn't really important.
As such I was wondering if there was a way to ascribe multiple assignments via a list comprehension.
i.e.
def f1(n):
return [i, n//i for i in range(1 , int(n**0.5) + 1) if n % i == 0]
That way I don't have a nested list. It would be faster and speed is of the essence.
I looked in the documentation and I couldn't find a single example of multiple assignments.
List comprehensions are great, but sometimes they're not best the solution, depending on requirements for readability and speed. Sometimes, just writing out the implied for loop (and if statement) is more readable and quicker.
def factors(n):
l = []
for i in range(1, int(n**0.5)+1):
if n % i == 0:
l.append(i)
l.append(n//i)
return l
For small numbers, the above function is quicker than the list comprehension. At larger numbers (1,000,000 and bigger), the function and list comprehension are equal in terms of speed.
For a slight speed increase you can also cache the append method of the list, though this makes the function slightly less readable.
def factors(n):
l = []
append = l.append
for i in range(1, int(n**0.5)+1):
if n % i == 0:
append(i)
append(n//i)
return l
Speed comparison:
In [86]: %timeit factors_list_comprehension(1000)
100000 loops, best of 3: 7.57 µs per loop
In [87]: %timeit factors_function(1000)
100000 loops, best of 3: 6.24 µs per loop
In [88]: %timeit factors_optimised_function(1000)
100000 loops, best of 3: 5.81 µs per loop
In [89]: %timeit factors_list_comprehension(1000000)
10000 loops, best of 3: 111 µs per loop
In [90]: %timeit factors_function(1000000)
10000 loops, best of 3: 108 µs per loop
In [91]: %timeit factors_optimised_function(1000000)
10000 loops, best of 3: 106 µs per loop
Use itertools.chain:
from itertools import chain
def f1(n):
return list(chain.from_iterable([i, n//i] for i in xrange(1 , int(n**0.5) + 1) if not n % i))
If you don't need a list remove the list call on chain and just iterate over the returned chain object.
If optimization is important you should use extend and xrange:
def f1(n):
l = []
for i in xrange(1, int(n**0.5)+1):
if not n % i:
l.extend((i,n//i))
return l
You can achieve the desired result using sum(). For example:
>>> sum([[1,6],[2,3]],[])
[1, 6, 2, 3]
We can define the answer in terms of your existing code:
def f2(n):
return sum(f1(n), [])
However, be careful that your code returns the square root twice when n is a perfect square:
>>> f1(9)
[[1, 9], [3, 3]]
>>> f2(9)
[1, 9, 3, 3]
I have a small block of code which I use to fill a list with integers. I need to improve its performance, perhaps translating the whole thing into numpy arrays, but I'm not sure how.
Here's the MWE:
import numpy as np
# List filled with integers.
a = np.random.randint(0,100,1000)
N = 10
b = [[] for _ in range(N-1)]
for indx,integ in enumerate(a):
if 0<elem<N:
b[integ-1].append(indx)
This is what it does:
for every integer (integ) in a
see if it is located between a given range (0,N)
if it is, store its index in a sub-list of b where the index of said sub-list is the original integer minus 1 (integ-1)
This bit of code runs pretty fast but my actual code uses much larger lists, hence the need to improve its performance.
Here's one way of doing it:
mask = (a > 0) & (a < N)
elements = a[mask]
indicies = np.arange(a.size)[mask]
b = [indicies[elements == i] for i in range(1, N)]
If we time the two:
import numpy as np
a = np.random.randint(0,100,1000)
N = 10
def original(a, N):
b = [[] for _ in range(N-1)]
for indx,elem in enumerate(a):
if 0<elem<N:
b[elem-1].append(indx)
return b
def new(a, N):
mask = (a > 0) & (a < N)
elements = a[mask]
indicies = np.arange(a.size)[mask]
return [indicies[elements == i] for i in range(1, N)]
The "new" way is considerably (~20x) faster:
In [5]: %timeit original(a, N)
100 loops, best of 3: 1.21 ms per loop
In [6]: %timeit new(a, N)
10000 loops, best of 3: 57 us per loop
And the results are identical:
In [7]: new_results = new(a, N)
In [8]: old_results = original(a, N)
In [9]: for x, y in zip(new_results, old_results):
....: assert np.allclose(x, y)
....:
In [10]:
The "new" vectorized version also scales much better to longer sequences. If we use a million-item-long sequence for a, the original solution takes slightly over 1 second, while the new version takes only 17 milliseconds (a ~70x speedup).
Try this solution! The first half I shamelessly stole from Joe's answer, but after that it uses sorting and binary search, which scales better with N.
def new(a, N):
mask = (a > 0) & (a < N)
elements = a[mask]
indices = np.arange(a.size)[mask]
sorting_idx = np.argsort(elements, kind='mergesort')
ind_sorted = indices[sorting_idx]
x = np.searchsorted(elements, range(N), side='right', sorter=sorting_idx)
return [ind_sorted[x[i]:x[i+1]] for i in range(N-1)]
You could put x = x.tolist() in there for an additional albeit small speed improvement (NB: if you do an a = a.tolist() in your original code, you do get a significant speedup). Also, I used 'mergesort' which is a stable sort but if you don't need the final result sorted, you can get away with a faster sorting algorithm.
I am wanting to zip up a list of entities with a new entity to generate a list of coordinates (2-tuples), but I want to assure that for (i, j) that i < j is always true.
However, I am not extremely pleased with my current solutions:
from itertools import repeat
mems = range(1, 10, 2)
mem = 8
def ij(i, j):
if i < j:
return (i, j)
else:
return (j, i)
def zipij(m=mem, ms=mems, f=ij):
return map(lambda i: f(i, m), ms)
def zipij2(m=mem, ms=mems):
return map(lambda i: tuple(sorted([i, m])), ms)
def zipij3(m=mem, ms=mems):
return [tuple(sorted([i, m])) for i in ms]
def zipij4(m=mem, ms=mems):
mems = zip(ms, repeat(m))
half1 = [(i, j) for i, j in mems if i < j]
half2 = [(j, i) for i, j in mems[len(half1):]]
return half1 + half2
def zipij5(m=mem, ms=mems):
mems = zip(ms, repeat(m))
return [(i, j) for i, j in mems if i < j] + [(j, i) for i, j in mems if i > j]
Output for above:
>>> print zipij() # or zipij{2-5}
[(1, 8), (3, 8), (5, 8), (7, 8), (8, 9)]
Instead of normally:
>>> print zip(mems, repeat(mem))
[(1, 8), (3, 8), (5, 8), (7, 8), (9, 8)]
Timings: snipped (no longer relevant, see much faster results in answers below)
For len(mems) == 5, there is no real issue with any solution, but for zipij5() for instance, the second list comprehension is needlessly going back over the first four values when i > j was already evaluated to be True for those in the first comprehension.
For my purposes, I'm positive that len(mems) will never exceed ~10000, if that helps form any answers for what solution is best. To explain my use case a bit (I find it interesting), I will be storing a sparse, upper-triangular, similarity matrix of sorts, and so I need the coordinate (i, j) to not be duplicated at (j, i). I say of sorts because I will be utilizing the new Counter() object in 2.7 to perform quasi matrix-matrix and matrix-vector addition. I then simply feed counter_obj.update() a list of 2-tuples and it increments those coordinates how many times they occur. SciPy sparse matrices ran about 50x slower, to my dismay, for my use cases... so I quickly ditched those.
So anyway, I was surprised by my results... The first methods I came up with were zipij4 and zipij5, and yet they are still the fastest, despite building a normal zip() and then generating a new zip after changing the values. I'm still rather new to Python, relatively speaking (Alex Martelli, can you hear me?), so here are my naive conclusions:
tuple(sorted([i, j])) is extremely expensive (Why is that?)
map(lambda ...) seems to always do worse than a list comp (I think I've read this and it makes sense)
Somehow zipij5() isn't much slower despite going over the list twice to check for i-j inequality. (Why is this?)
And lastly, I would like to know which is considered most efficient... or if there are any other fast and memory-inexpensive ways that I haven't yet thought of. Thank you.
Current Best Solutions
## Most BRIEF, Quickest with UNSORTED input list:
## truppo's
def zipij9(m=mem, ms=mems):
return [(i, m) if i < m else (m, i) for i in ms]
## Quickest with pre-SORTED input list:
## Michal's
def zipij10(m=mem, ms=mems):
i = binsearch(m, ms) ## See Michal's answer for binsearch()
return zip(ms[:i], repeat(m)) + zip(repeat(m), ms[i:])
Timings
# Michal's
Presorted - 410µs per loop
Unsorted - 2.09ms per loop ## Due solely to the expensive sorted()
# truppo's
Presorted - 880µs per loop
Unsorted - 896µs per loop ## No sorted() needed
Timings were using mems = range(1, 10000, 2), which is only ~5000 in length. sorted() will probably become worse at higher values, and with lists that are more shuffled. random.shuffle() was used for the "Unsorted" timings.
Current version:
(Fastest at the time of posting with Python 2.6.4 on my machine.)
Update 3: Since we're going all out, let's do a binary search -- in a way which doesn't require injecting m into mems:
def binsearch(x, lst):
low, high = -1, len(lst)
while low < high:
i = (high - low) // 2
if i > 0:
i += low
if lst[i] < x:
low = i
else:
high = i
else:
i = high
high = low
return i
def zipij(m=mem, ms=mems):
i = binsearch(m, ms)
return zip(ms[:i], repeat(m)) + zip(repeat(m), ms[i:])
This runs in 828 µs = 0.828 ms on my machine vs the OP's current solution's 1.14 ms. Input list assumed sorted (and the test case is the usual one, of course).
This binary search implementation returns the index of the first element in the given list which is not smaller than the object being searched for. Thus there's no need to inject m into mems and sort the whole thing (like in the OP's current solution with .index(m)) or walk through the beginning of the list step by step (like I did previously) to find the offset at which it should be divided.
Earlier attempts:
How about this? (Proposed solution next to In [25] below, 2.42 ms to zipij5's 3.13 ms.)
In [24]: timeit zipij5(m = mem, ms = mems)
100 loops, best of 3: 3.13 ms per loop
In [25]: timeit [(i, j) if i < j else (j, i) for (i, j) in zip(mems, repeat(mem))]
100 loops, best of 3: 2.42 ms per loop
In [27]: [(i, j) if i < j else (j, i) for (i, j) in zip(mems, repeat(mem))] == zipij5(m=mem, ms=mems)
Out[27]: True
Update: This appears to be just about exactly as fast as the OP's self-answer. Seems more straighforward, though.
Update 2: An implementation of the OP's proposed simplified solution:
def zipij(m=mem, ms=mems):
split_at = 0
for item in ms:
if item < m:
split_at += 1
else:
break
return [(item, m) for item in mems[:split_at]] + [(m, item) for item in mems[split_at:]]
In [54]: timeit zipij()
1000 loops, best of 3: 1.15 ms per loop
Also, truppo's solution runs in 1.36 ms on my machine. I guess the above is the fastest so far. Note you need to sort mems before passing them into this function! If you're generating it with range, it is of course already sorted, though.
Why not just inline your ij()-function?
def zipij(m=mem, ms=mems):
return [(i, m) if i < m else (m, i) for i in ms]
(This runs in 0.64 ms instead of 2.12 ms on my computer)
Some benchmarks:
zipit.py:
from itertools import repeat
mems = range(1, 50000, 2)
mem = 8
def zipij7(m=mem, ms=mems):
cpy = sorted(ms + [m])
loc = cpy.index(m)
return zip(ms[:(loc)], repeat(m)) + zip(repeat(m), ms[(loc):])
def zipinline(m=mem, ms=mems):
return [(i, m) if i < m else (m, i) for i in ms]
Sorted:
>python -m timeit -s "import zipit" "zipit.zipinline()"
100 loops, best of 3: 4.44 msec per loop
>python -m timeit -s "import zipit" "zipit.zipij7()"
100 loops, best of 3: 4.8 msec per loop
Unsorted:
>python -m timeit -s "import zipit, random; random.shuffle(zipit.mems)" "zipit.zipinline()"
100 loops, best of 3: 4.65 msec per loop
p>python -m timeit -s "import zipit, random; random.shuffle(zipit.mems)" "zipit.zipij7()"
100 loops, best of 3: 17.1 msec per loop
Most recent version:
def zipij7(m=mem, ms=mems):
cpy = sorted(ms + [m])
loc = cpy.index(m)
return zip(ms[:(loc)], repeat(m)) + zip(repeat(m), ms[(loc):])
Benches slightly faster for me than truppo's, slower by 30% than Michal's. (Looking into that now)
I may have found my answer (for now). It seems I forgot about making a list comp version for `zipij()``:
def zipij1(m=mem, ms=mems, f=ij):
return [f(i, m) for i in ms]
It still relies on my silly ij() helper function, so it doesn't win the award for brevity, certainly, but timings have improved:
# 10000
1.27s
# 50000
6.74s
So it is now my current "winner", and also does not need to generate more than one list, or use a lot of function calls, other than the ij() helper, so I believe it would also be the most efficient.
However, I think this could still be improved... I think that making N ij() function calls (where N is the length of the resultant list) is not needed:
Find at what index mem would fit into mems when ordered
Split mems at that index into two parts
Do zip(part1, repeat(mem))
Add zip(repeat(mem), part2) to it
It'd basically be an improvement on zipij4(), and this avoids N extra function calls, but I am not sure of the speed/memory benefits over the cost of brevity. I will maybe add that version to this answer if I figure it out.