Ordered list reduction
I need to reduce some lists where, depending on element types, the speed and implementation of the binary operation varies, i.e. large speed reductions can be gained by reducing some pairs with specific functions first.
For example foo(a[0], bar(a[1], a[2]))
might be a lot slower than bar(foo(a[0], a[1]), a[2]) but in this case give the same result.
I have the code that produces an optimal ordering in the form of a list of tuples (pair_index, binary_function) already. I am struggling to implement an efficient function to perform the reduction, ideally one that returns a new partial function which can then be used repeatedly on lists of the same type-ordering but varying values.
Simple and slow(?) solution
Here is my naive solution involving a for loop, deletion of elements and closure over the (pair_index, binary_function) list to return a 'precomputed' function.
def ordered_reduce(a, pair_indexes, binary_functions, precompute=False):
"""
a: list to reduce, length n
pair_indexes: order of pairs to reduce, length (n-1)
binary_functions: functions to use for each reduction, length (n-1)
"""
def ord_red_func(x):
y = list(x) # copy so as not to eat up
for p, f in zip(pair_indexes, binary_functions):
b = f(y[p], y[p+1])
# Replace pair
del y[p]
y[p] = b
return y[0]
return ord_red_func if precompute else ord_red_func(a)
>>> foos = (lambda a, b: a - b, lambda a, b: a + b, lambda a, b: a * b)
>>> ordered_reduce([1, 2, 3, 4], (2, 1, 0), foos)
1
>>> 1 * (2 + (3-4))
1
And how pre-compution works:
>>> foo = ordered_reduce(None, (0, 1, 0), foos)
>>> foo([1, 2, 3, 4])
-7
>>> (1 - 2) * (3 + 4)
-7
However it involves copying the whole list and is also (therefore?) slow. Is there a better/standard way to do this?
(EDIT:) Some Timings:
from operators import add
from functools import reduce
from itertools import repeat
from random import random
r = 100000
xs = [random() for _ in range(r)]
# slightly trivial choices of pairs and functions, to replicate reduce
ps = [0]*(r-1)
fs = repeat(add)
foo = ordered_reduce(None, ps, fs, precompute=True)
>>> %timeit reduce(add, xs)
100 loops, best of 3: 3.59 ms per loop
>>> %timeit foo(xs)
1 loop, best of 3: 1.44 s per loop
This is kind of worst case scenario, and slightly cheating as reduce does not take a iterable of functions, but a function which does (but no order) is still pretty fast:
def multi_reduce(fs, xs):
xs = iter(xs)
x = next(xs)
for f, nx in zip(fs, xs):
x = f(x, nx)
return x
>>> %timeit multi_reduce(fs, xs)
100 loops, best of 3: 8.71 ms per loop
(EDIT2): and for fun, the performance of a massively cheating 'compiled' version, which gives some idea of the total overhead occurring.
from numba import jit
#jit(nopython=True)
def numba_sum(xs):
y = 0
for x in xs:
y += x
return y
>>> %timeit numba_sum(xs)
1000 loops, best of 3: 1.46 ms per loop
When I read this problem, I immediately thought of reverse Polish notation (RPN). While it may not be the best approach, it still gives a substantial speedup in this case.
My second thought is that you may get an equivalent result if you just reorder the sequence xs appropriately to get rid of del y[p]. (Arguably the best performance would be achieved if the whole reduce procedure is written in C. But it's a different kettle of fish.)
Reverse Polish Notation
If you are not familiar with RPN, please read the short explanation in the wikipedia article. Basically, all operations can be written down without parentheses, for example (1-2)*(3+4) is 1 2 - 3 4 + * in RPN, while 1-(2*(3+4)) becomes 1 2 3 4 + * -.
Here is a simple implementation of an RPN parser. I separated an list of objects from an RPN sequence, so that the same sequence can be used for directly for different lists.
def rpn(arr, seq):
'''
Reverse Polish Notation algorithm
(this version works only for binary operators)
arr: array of objects
seq: rpn sequence containing indices of objects from arr and functions
'''
stack = []
for x in seq:
if isinstance(x, int):
# it's an object: push it to stack
stack.append(arr[x])
else:
# it's a function: pop two objects, apply the function, push the result to stack
b = stack.pop()
#a = stack.pop()
#stack.append(x(a,b))
## shortcut:
stack[-1] = x(stack[-1], b)
return stack.pop()
Example of usage:
# Say we have an array
arr = [100, 210, 42, 13]
# and want to calculate
(100 - 210) * (42 + 13)
# It translates to RPN:
100 210 - 42 13 + *
# or
arr[0] arr[1] - arr[2] arr[3] + *
# So we apply `
rpn(arr,[0, 1, subtract, 2, 3, add, multiply])
To apply RPN to your case you'd need either to generate rpn sequences from scratch or to convert your (pair_indexes, binary_functions) into them. I haven't thought about a converter but it surely can be done.
Tests
Your original test comes first:
r = 100000
xs = [random() for _ in range(r)]
ps = [0]*(r-1)
fs = repeat(add)
foo = ordered_reduce(None, ps, fs, precompute=True)
rpn_seq = [0] + [x for i, f in zip(range(1,r), repeat(add)) for x in (i,f)]
rpn_seq2 = list(range(r)) + list(repeat(add,r-1))
# Here rpn_seq denotes (_ + (_ + (_ +( ... )...))))
# and rpn_seq2 denotes ((...( ... _)+ _) + _).
# Obviously, they are not equivalent but with 'add' they yield the same result.
%timeit reduce(add, xs)
100 loops, best of 3: 7.37 ms per loop
%timeit foo(xs)
1 loops, best of 3: 1.71 s per loop
%timeit rpn(xs, rpn_seq)
10 loops, best of 3: 79.5 ms per loop
%timeit rpn(xs, rpn_seq2)
10 loops, best of 3: 73 ms per loop
# Pure numpy just out of curiosity:
%timeit np.sum(np.asarray(xs))
100 loops, best of 3: 3.84 ms per loop
xs_np = np.asarray(xs)
%timeit np.sum(xs_np)
The slowest run took 4.52 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 48.5 µs per loop
So, rpn was 10 times slower than reduce but about 20 times faster than ordered_reduce.
Now, let's try something more complicated: alternately adding and multiplying matrices. I need a special function for it to test against reduce.
add_or_dot_b = 1
def add_or_dot(x,y):
'''calls 'add' and 'np.dot' alternately'''
global add_or_dot_b
if add_or_dot_b:
out = x+y
else:
out = np.dot(x,y)
add_or_dot_b = 1 - add_or_dot_b
# normalizing out to avoid `inf` in results
return out/np.max(out)
r = 100001 # +1 for convenience
# (we apply an even number of functions)
xs = [np.random.rand(2,2) for _ in range(r)]
ps = [0]*(r-1)
fs = repeat(add_or_dot)
foo = ordered_reduce(None, ps, fs, precompute=True)
rpn_seq = [0] + [x for i, f in zip(range(1,r), repeat(add_or_dot)) for x in (i,f)]
%timeit reduce(add_or_dot, xs)
1 loops, best of 3: 894 ms per loop
%timeit foo(xs)
1 loops, best of 3: 2.72 s per loop
%timeit rpn(xs, rpn_seq)
1 loops, best of 3: 1.17 s per loop
Here, rpn was roughly 25% slower than reduce and more than 2 times faster than ordered_reduce.
Related
I've written a method that takes in an integer "n" and creates a square matrix where the values of each element are dictated by their respective i,j indices.
When I build a small matrix 30x30 it works just fine, but when I try to do something larger like 1000x1000 it takes very long. Is there any way that I can speed it up with multiprocessing?
def createMatrix(n):
matrix = []
for j in range(1,n+1):
row = []
for i in range(1,n+1):
value = 1/(i+j-1)
row.append(value)
matrix.append(row)
return np.array(matrix)
Parallelizing two computation-bound for loops in Python is not trivial because of GIL. The good news is that your case is perfectly vectorizeable:
def createMatrix(n):
return 1 / (np.arange(n)[None, :] + np.arange(n)[:, None] + 1)
Explanation:
essentially, your formula for the matrix is X[row][column] = 1/(row+column-1), where rows and columns are 1-based
np.arange(n) creates a range that can be used for rows or columns
[None, :] and [:, None] turn it into a 2d array, 1 x n or n x 1
numpy then broadcasts dimensions, replicating row and column indexes to match dimensions - thus, implicitly tiling both into n x n when added
since both ranges are 0-based, using +1 instead of -1
As a rule of thumb, it is almost never a good idea to use for loops on numpy arrays. A vectorized approach (i.e. matrix form computations) is orders of magnitude faster.
It's not a good idea to use fors to fill a list then convert it to a matrix. the operation that you have can be vectorized with numpy from scratch. if you think that given the i,j, M(i,j) = 1/(j+i-1) considering that both indices starts at 1.
Here's my proposal :
def createMatrix2(n):
arr =np.arange(1,n+1)
xx,yy = np.meshgrid(arr,arr)
matrix = 1/(xx+yy-1)
return matrix
looking at Marat answer, I think his/her it's better, so tested the 3 methods:
EDIT: added wwii method as createMatrix4 (correcting the errors):
import numpy as np
from time import time
def createMatrix1(n):
matrix = []
for j in range(1,n+1):
row = []
for i in range(1,n+1):
value = 1/(i+j-1)
row.append(value)
matrix.append(row)
return np.array(matrix)
def createMatrix2(n):
arr =np.arange(1,n+1)
xx,yy = np.meshgrid(arr,arr)
matrix = 1/(xx+yy-1)
return matrix
def createMatrix3(n):
"""Marat's proposed matrix"""
return 1 / (1 + np.arange(n)[None, :] + np.arange(n)[:, None])
def createMatrix4(n):
""" wwii method"""
i,j = np.ogrid[1:n,1:n]
return 1/(i+j-1)
#test all the three methods
n = 10000
t1 = time()
m1 = createMatrix1(n)
t2 = time()
m2 = createMatrix2(n)
t3 = time()
m3 = createMatrix3(n)
t4 = time()
m4 = createMatrix4(n)
t5 = time()
print(np.allclose(m1,m2))
print(np.allclose(m1,m3))
print(np.allclose(m1,m4))
print("Matrix 1 (OP): ",t2-t1)
print("Matrix 2: (mine)",t3-t2)
print("Matrix 3: (Marat)",t4-t3)
print("Matrix 4: (wwii)",t5-t4)
# the output is:
#True
#True
#True
#Matrix 1 (OP): 18.4886577129364
#Matrix 2: (mine) 1.005324363708496
#Matrix 3: (Marat) 0.43033909797668457
#Matrix 4: (wwii) 0.5138359069824219
So Marat's solution is faster. As general comments:
Try to avoid fors loops
Think your problem as operation with indices and dessing operations with numpy arrays directly.
For last, given Marat's answer I thought my proposal is a easier to read, and understand. But it's just a subjective view
Your code can be written in another style, accelerated by numba library in a parallel no python mode:
import numba as nb
#nb.njit("float64[:, ::1](int64)", parallel=True, fastmath=True)
def createMatrix(n):
matrix = np.empty((n, n)) # np.zeros is slower than np.empty
for j in nb.prange(1, n + 1):
for i in range(1, n + 1):
matrix[j - 1, i - 1] = 1 / (i + j - 1)
return matrix
This solution will be faster than the Marat answer above 3 times.
Benchmarks: (temporary link to colab)
n = 1000
1000 loops, best of 5: 3.52 ms per loop # Marat
1000 loops, best of 5: 1.5 ms per loop # numba accelerated with np.zeros
1000 loops, best of 5: 1.05 ms per loop # numba accelerated with np.empty
n = 3000
1000 loops, best of 5: 39.5 ms per loop
1000 loops, best of 5: 19.3 ms per loop
1000 loops, best of 5: 8.91 ms per loop
n = 5000
1000 loops, best of 5: 109 ms per loop
1000 loops, best of 5: 53.5 ms per loop
1000 loops, best of 5: 24.8 ms per loop
What's the most pythonic way to mesh two strings together?
For example:
Input:
u = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
l = 'abcdefghijklmnopqrstuvwxyz'
Output:
'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
For me, the most pythonic* way is the following which pretty much does the same thing but uses the + operator for concatenating the individual characters in each string:
res = "".join(i + j for i, j in zip(u, l))
print(res)
# 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
It is also faster than using two join() calls:
In [5]: l1 = 'A' * 1000000; l2 = 'a' * 1000000
In [6]: %timeit "".join("".join(item) for item in zip(l1, l2))
1 loops, best of 3: 442 ms per loop
In [7]: %timeit "".join(i + j for i, j in zip(l1, l2))
1 loops, best of 3: 360 ms per loop
Faster approaches exist, but they often obfuscate the code.
Note: If the two input strings are not the same length then the longer one will be truncated as zip stops iterating at the end of the shorter string. In this case instead of zip one should use zip_longest (izip_longest in Python 2) from the itertools module to ensure that both strings are fully exhausted.
*To take a quote from the Zen of Python: Readability counts.
Pythonic = readability for me; i + j is just visually parsed more easily, at least for my eyes.
Faster Alternative
Another way:
res = [''] * len(u) * 2
res[::2] = u
res[1::2] = l
print(''.join(res))
Output:
'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
Speed
Looks like it is faster:
%%timeit
res = [''] * len(u) * 2
res[::2] = u
res[1::2] = l
''.join(res)
100000 loops, best of 3: 4.75 µs per loop
than the fastest solution so far:
%timeit "".join(list(chain.from_iterable(zip(u, l))))
100000 loops, best of 3: 6.52 µs per loop
Also for the larger strings:
l1 = 'A' * 1000000; l2 = 'a' * 1000000
%timeit "".join(list(chain.from_iterable(zip(l1, l2))))
1 loops, best of 3: 151 ms per loop
%%timeit
res = [''] * len(l1) * 2
res[::2] = l1
res[1::2] = l2
''.join(res)
10 loops, best of 3: 92 ms per loop
Python 3.5.1.
Variation for strings with different lengths
u = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
l = 'abcdefghijkl'
Shorter one determines length (zip() equivalent)
min_len = min(len(u), len(l))
res = [''] * min_len * 2
res[::2] = u[:min_len]
res[1::2] = l[:min_len]
print(''.join(res))
Output:
AaBbCcDdEeFfGgHhIiJjKkLl
Longer one determines length (itertools.zip_longest(fillvalue='') equivalent)
min_len = min(len(u), len(l))
res = [''] * min_len * 2
res[::2] = u[:min_len]
res[1::2] = l[:min_len]
res += u[min_len:] + l[min_len:]
print(''.join(res))
Output:
AaBbCcDdEeFfGgHhIiJjKkLlMNOPQRSTUVWXYZ
With join() and zip().
>>> ''.join(''.join(item) for item in zip(u,l))
'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
On Python 2, by far the faster way to do things, at ~3x the speed of list slicing for small strings and ~30x for long ones, is
res = bytearray(len(u) * 2)
res[::2] = u
res[1::2] = l
str(res)
This wouldn't work on Python 3, though. You could implement something like
res = bytearray(len(u) * 2)
res[::2] = u.encode("ascii")
res[1::2] = l.encode("ascii")
res.decode("ascii")
but by then you've already lost the gains over list slicing for small strings (it's still 20x the speed for long strings) and this doesn't even work for non-ASCII characters yet.
FWIW, if you are doing this on massive strings and need every cycle, and for some reason have to use Python strings... here's how to do it:
res = bytearray(len(u) * 4 * 2)
u_utf32 = u.encode("utf_32_be")
res[0::8] = u_utf32[0::4]
res[1::8] = u_utf32[1::4]
res[2::8] = u_utf32[2::4]
res[3::8] = u_utf32[3::4]
l_utf32 = l.encode("utf_32_be")
res[4::8] = l_utf32[0::4]
res[5::8] = l_utf32[1::4]
res[6::8] = l_utf32[2::4]
res[7::8] = l_utf32[3::4]
res.decode("utf_32_be")
Special-casing the common case of smaller types will help too. FWIW, this is only 3x the speed of list slicing for long strings and a factor of 4 to 5 slower for small strings.
Either way I prefer the join solutions, but since timings were mentioned elsewhere I thought I might as well join in.
If you want the fastest way, you can combine itertools with operator.add:
In [36]: from operator import add
In [37]: from itertools import starmap, izip
In [38]: timeit "".join([i + j for i, j in uzip(l1, l2)])
1 loops, best of 3: 142 ms per loop
In [39]: timeit "".join(starmap(add, izip(l1,l2)))
1 loops, best of 3: 117 ms per loop
In [40]: timeit "".join(["".join(item) for item in zip(l1, l2)])
1 loops, best of 3: 196 ms per loop
In [41]: "".join(starmap(add, izip(l1,l2))) == "".join([i + j for i, j in izip(l1, l2)]) == "".join(["".join(item) for item in izip(l1, l2)])
Out[42]: True
But combining izip and chain.from_iterable is faster again
In [2]: from itertools import chain, izip
In [3]: timeit "".join(chain.from_iterable(izip(l1, l2)))
10 loops, best of 3: 98.7 ms per loop
There is also a substantial difference between
chain(* and chain.from_iterable(....
In [5]: timeit "".join(chain(*izip(l1, l2)))
1 loops, best of 3: 212 ms per loop
There is no such thing as a generator with join, passing one is always going to be slower as python will first build a list using the content because it does two passes over the data, one to figure out the size needed and one to actually do the join which would not be possible using a generator:
join.h:
/* Here is the general case. Do a pre-pass to figure out the total
* amount of space we'll need (sz), and see whether all arguments are
* bytes-like.
*/
Also if you have different length strings and you don't want to lose data you can use izip_longest :
In [22]: from itertools import izip_longest
In [23]: a,b = "hlo","elworld"
In [24]: "".join(chain.from_iterable(izip_longest(a, b,fillvalue="")))
Out[24]: 'helloworld'
For python 3 it is called zip_longest
But for python2, veedrac's suggestion is by far the fastest:
In [18]: %%timeit
res = bytearray(len(u) * 2)
res[::2] = u
res[1::2] = l
str(res)
....:
100 loops, best of 3: 2.68 ms per loop
You could also do this using map and operator.add:
from operator import add
u = 'AAAAA'
l = 'aaaaa'
s = "".join(map(add, u, l))
Output:
'AaAaAaAaAa'
What map does is it takes every element from the first iterable u and the first elements from the second iterable l and applies the function supplied as the first argument add. Then join just joins them.
Jim's answer is great, but here's my favorite option, if you don't mind a couple of imports:
from functools import reduce
from operator import add
reduce(add, map(add, u, l))
A lot of these suggestions assume the strings are of equal length. Maybe that covers all reasonable use cases, but at least to me it seems that you might want to accomodate strings of differing lengths too. Or am I the only one thinking the mesh should work a bit like this:
u = "foobar"
l = "baz"
mesh(u,l) = "fboaozbar"
One way to do this would be the following:
def mesh(a,b):
minlen = min(len(a),len(b))
return "".join(["".join(x+y for x,y in zip(a,b)),a[minlen:],b[minlen:]])
I like using two fors, the variable names can give a hint/reminder to what is going on:
"".join(char for pair in zip(u,l) for char in pair)
Just to add another, more basic approach:
st = ""
for char in u:
st = "{0}{1}{2}".format( st, char, l[ u.index( char ) ] )
Feels a bit un-pythonic not to consider the double-list-comprehension answer here, to handle n string with O(1) effort:
"".join(c for cs in itertools.zip_longest(*all_strings) for c in cs)
where all_strings is a list of the strings you want to interleave. In your case, all_strings = [u, l]. A full use example would look like this:
import itertools
a = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
b = 'abcdefghijklmnopqrstuvwxyz'
all_strings = [a,b]
interleaved = "".join(c for cs in itertools.zip_longest(*all_strings) for c in cs)
print(interleaved)
# 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
Like many answers, fastest? Probably not, but simple and flexible. Also, without too much added complexity, this is slightly faster than the accepted answer (in general, string addition is a bit slow in python):
In [7]: l1 = 'A' * 1000000; l2 = 'a' * 1000000;
In [8]: %timeit "".join(a + b for i, j in zip(l1, l2))
1 loops, best of 3: 227 ms per loop
In [9]: %timeit "".join(c for cs in zip(*(l1, l2)) for c in cs)
1 loops, best of 3: 198 ms per loop
Potentially faster and shorter than the current leading solution:
from itertools import chain
u = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
l = 'abcdefghijklmnopqrstuvwxyz'
res = "".join(chain(*zip(u, l)))
Strategy speed-wise is to do as much at the C-level as possible. Same zip_longest() fix for uneven strings and it would be coming out of the same module as chain() so can't ding me too many points there!
Other solutions I came up with along the way:
res = "".join(u[x] + l[x] for x in range(len(u)))
res = "".join(k + l[i] for i, k in enumerate(u))
You could use iteration_utilities.roundrobin1
u = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
l = 'abcdefghijklmnopqrstuvwxyz'
from iteration_utilities import roundrobin
''.join(roundrobin(u, l))
# returns 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
or the ManyIterables class from the same package:
from iteration_utilities import ManyIterables
ManyIterables(u, l).roundrobin().as_string()
# returns 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
1 This is from a third-party library I have written: iteration_utilities.
I would use zip() to get a readable and easy way:
result = ''
for cha, chb in zip(u, l):
result += '%s%s' % (cha, chb)
print result
# 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
I presently have this code for factoring large numbers:
def f1(n):
return [[i, n//i] for i in range(1 , int(n**0.5) + 1) if n % i == 0]
It's the fastest version I've seen so far (If there's a faster way I'd love to know about that as well), but I'd like a single list of all the factors with no nesting (so I want something like: [factor 1, factor 2, factor 3,..., factor n-3, factor n-2, factor n-1, factor n] and so on. The order isn't really important.
As such I was wondering if there was a way to ascribe multiple assignments via a list comprehension.
i.e.
def f1(n):
return [i, n//i for i in range(1 , int(n**0.5) + 1) if n % i == 0]
That way I don't have a nested list. It would be faster and speed is of the essence.
I looked in the documentation and I couldn't find a single example of multiple assignments.
List comprehensions are great, but sometimes they're not best the solution, depending on requirements for readability and speed. Sometimes, just writing out the implied for loop (and if statement) is more readable and quicker.
def factors(n):
l = []
for i in range(1, int(n**0.5)+1):
if n % i == 0:
l.append(i)
l.append(n//i)
return l
For small numbers, the above function is quicker than the list comprehension. At larger numbers (1,000,000 and bigger), the function and list comprehension are equal in terms of speed.
For a slight speed increase you can also cache the append method of the list, though this makes the function slightly less readable.
def factors(n):
l = []
append = l.append
for i in range(1, int(n**0.5)+1):
if n % i == 0:
append(i)
append(n//i)
return l
Speed comparison:
In [86]: %timeit factors_list_comprehension(1000)
100000 loops, best of 3: 7.57 µs per loop
In [87]: %timeit factors_function(1000)
100000 loops, best of 3: 6.24 µs per loop
In [88]: %timeit factors_optimised_function(1000)
100000 loops, best of 3: 5.81 µs per loop
In [89]: %timeit factors_list_comprehension(1000000)
10000 loops, best of 3: 111 µs per loop
In [90]: %timeit factors_function(1000000)
10000 loops, best of 3: 108 µs per loop
In [91]: %timeit factors_optimised_function(1000000)
10000 loops, best of 3: 106 µs per loop
Use itertools.chain:
from itertools import chain
def f1(n):
return list(chain.from_iterable([i, n//i] for i in xrange(1 , int(n**0.5) + 1) if not n % i))
If you don't need a list remove the list call on chain and just iterate over the returned chain object.
If optimization is important you should use extend and xrange:
def f1(n):
l = []
for i in xrange(1, int(n**0.5)+1):
if not n % i:
l.extend((i,n//i))
return l
You can achieve the desired result using sum(). For example:
>>> sum([[1,6],[2,3]],[])
[1, 6, 2, 3]
We can define the answer in terms of your existing code:
def f2(n):
return sum(f1(n), [])
However, be careful that your code returns the square root twice when n is a perfect square:
>>> f1(9)
[[1, 9], [3, 3]]
>>> f2(9)
[1, 9, 3, 3]
In Matlab, the builtin isequal does a check if two arrays are equal. If they are not equal, this might be very fast, as the implementation presumably stops checking as soon as there is a difference:
>> A = zeros(1e9, 1, 'single');
>> B = A(:);
>> B(1) = 1;
>> tic; isequal(A, B); toc;
Elapsed time is 0.000043 seconds.
Is there any equavalent in Python/numpy? all(A==B) or all(equal(A, B)) is far slower, because it compares all elements, even if the initial one differs:
In [13]: A = zeros(1e9, dtype='float32')
In [14]: B = A.copy()
In [15]: B[0] = 1
In [16]: %timeit all(A==B)
1 loops, best of 3: 612 ms per loop
Is there any numpy equivalent? It should be very easy to implement in C, but slow to implement in Python because this is a case where we do not want to broadcast, so it would require an explicit loop.
Edit:
It appears array_equal does what I want. However, it is not faster than all(A==B), because it's not a built-in, but just a short Python function doing A==B. So it does not meet my need for a fast check.
In [12]: %timeit array_equal(A, B)
1 loops, best of 3: 623 ms per loop
First, it should be noted that in the OP's example the arrays have identical elements because B=A[:] is just a view onto the array, so:
>>> print A[0], B[0]
1.0, 1.0
But, although the test isn't a fit one, the basic complaint is true: Numpy does not have a short-circuiting equivalency check.
One can easily see from the source that all of allclose, array_equal, and array_equiv are just variations upon all(A==B) to match their respective details, and are not notable faster.
An advantage of numpy though is that slices are just views, and are therefore very fast, so one could write their own short-circuiting comparison fairly easily (I'm not saying this is ideal, but it does work):
from numpy import *
A = zeros(1e8, dtype='float32')
B = A[:]
B[0] = 1
C = array(B)
C[0] = 2
D = array(A)
D[-1] = 2
def short_circuit_check(a, b, n):
L = len(a)/n
for i in range(n):
j = i*L
if not all(a[j:j+L]==b[j:j+L]):
return False
return True
In [26]: %timeit short_circuit_check(A, C, 100) # 100x faster
1000 loops, best of 3: 1.49 ms per loop
In [27]: %timeit all(A==C)
1 loops, best of 3: 158 ms per loop
In [28]: %timeit short_circuit_check(A, D, 100)
10 loops, best of 3: 144 ms per loop
In [29]: %timeit all(A==D)
10 loops, best of 3: 160 ms per loop
I am trying to improve numpy performance by applying operations on a 2d array, the problem is that the value at each element in the array depends on the i,j location of that element.
Obviously the easy way to do this is to use a nested for-loop, but I was wondering if there might be a better way by referencing np.indices or something along those lines? Here is my 'stupid' code:
for J in range(1025):
for I in range(1025):
PSI[I][J] = A*math.sin((float(I+1)-.5)*DI)*math.sin((float(J+1)-.5)*DJ)
P[I][J] = PCF*(math.cos(2.*float(I)*DI)+math.cos(2.*float(J)*DJ))+50000.
Since you're doing multiplication among your two arrays, you can use the outer function, after using arange to get arrays of your sin/cos.
Something like this (use numpy's trig functions, since they're vectorized)
PSI_i = numpy.sin((arange(1,1026)-0.5)*DI)
PSI_j = numpy.sin((arange(1,1026)-0.5)*DJ)
PSI = A*outer(PSI_i, PSI_j)
P_i = numpy.cos(2.*arange(1,1026)*DI)
P_j = numpy.cos(2.*arange(1,1026)*DJ)
P = PCF*outer(P_i, P_j) + 50000
If your environment is set up using from numpy import * or from pylab import *, then you don't need those numpy. prefixes before your trig functions. I kept them in to distinguish them from the math ones, which won't work for this approach.
You can get a grid of the index values with indices:
I,J=np.indices(PSI.shape)
#All constants set to one
PSI2=np.sin(I+1-.5)*np.sin(J+1-.5)
print PSI-PSI2 # should be zero.
I did some timings with ipython:
import numpy as np
import math
A = 1
P = 1
DI = 1
DJ = 1
def a():
PSI=np.zeros((1025,1025))
for J in range(1025):
for I in range(1025):
PSI[I][J] = A*math.sin((float(I+1)-.5)*DI)*math.sin((float(J+1)-.5)*DJ)
%timeit a()
def b():
PSI=np.zeros((1025,1025))
for I,J in np.ndindex(*PSI.shape):
PSI[I,J] = A*math.sin((float(I+1)-.5)*DI)*math.sin((float(J+1)-.5)*DJ)
%timeit b()
def c():
I,J=np.indices((1025, 1025))
P2=A*np.sin((I+1-.5)*DI)*np.sin((J+1-.5)*DJ)
%timeit c()
def d():
PSI_i = np.sin((np.arange(1,1026)-0.5)*DI)
PSI_j = np.sin((np.arange(1,1026)-0.5)*DJ)
PSI = A*np.outer(PSI_i, PSI_j)
%timeit d()
The result is not at all surprising on my machine:
1 loops, best of 3: 1.75 s per loop
1 loops, best of 3: 3.51 s per loop
10 loops, best of 3: 77.1 ms per loop
100 loops, best of 3: 7.16 ms per loop
Try the ndenumerate function of numpy, which returns the value as well as the indices:
>>> a
array([[5, 5, 5],
[1, 2, 3]])
>>> for index, value in numpy.ndenumerate(a):
... print index, value
(0, 0) 5
(0, 1) 5
(0, 2) 5
(1, 0) 1
(1, 1) 2
(1, 2) 3