I need to get at the individual terms produced when you expand a product of sums in Python.
e.g. expanding (a1+a2+a3)(b1+b2+b3)(c1+c2+c3) gives:
a1b1c1 + a1b1c2 + a1b1c3 + a1b2c1 + ... + a3b3c3
with 27 terms in total.
I need to find a way to remove any elements of this expansion where the indices match (e.g. anything with a1 and b1, or b2 and c2).
Or in code:
import numpy as np
a = np.array([0,1,2])
b = np.array([3,4,5])
c = np.array([6,7,8])
output = a.sum() * b.sum() * c.sum()
Then I need to remove the terms a[i]*b[j]*c[k] where i==j, i==k or j==k.
For small vectors it's straightforward, but as these vectors get long and there are more of them there are a lot more possible combinations to try (my vectors are ~200 elements).
My boss has a scheme for doing this in Mathematica which does the algebraic expansion explicitly, and pulls out terms with matching exponents, but this relies very heavily on Mathematica's symbolic algebraic setup, so I can't see how to implement it in Python.
itertools.combinations gives you a list of all such combinations, but this is really slow for longer vectors. I've also looked at using sympy, but it didn't seem suited to very long vectors either.
Can anyone recommend a better way of doing this in Python?
How about something like this? Does this speed up your calculations?
import numpy as np
import itertools
a = np.array([0,1,2])
b = np.array([3,4,5])
c = np.array([6,7,8])
combination = [a, b, c]
added = []
# Iterate over every ordered choice of distinct indices, one index per array
for p in itertools.permutations(range(len(a)), len(combination)):
    # zip(combination, p) pairs each array with its index,
    # so for p = (0, 1, 2) we get (a, 0), (b, 1), (c, 2);
    # multiply a[0] * b[1] * c[2] and append the product to added
    added.append(np.prod([arr[idx] for arr, idx in zip(combination, p)]))
# print added and total sum
print(added)
print(sum(added))
I don't know if it is faster than your current implementation, but by rolling a NumPy array (special_sum below) you can avoid terms which have duplicated indexes faster than the "obvious" implementation (regular_sum):
import numpy as np
a = np.random.randint(15, size=100)
b = np.random.randint(15, size=100)
c = np.random.randint(15, size=100)
def regular_sum(a, b, c):
    n = len(a)
    s = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if i==j or i==k or j==k:
                    continue
                s += a[i] * b[j] * c[k]
    return s
def special_sum(a, b, c):
    # all combinations b1c1, b1c2, b1c3, b2c1, ..., b3c3
    A = np.outer(b, c)
    # remove bici terms
    np.fill_diagonal(A, 0)
    # Now sum terms like: a1 * (terms without b1 or c1),
    # a2 * (terms without b2 or c2), ..., rolling the array A
    # to keep the unwanted terms in the first row and first column:
    s = 0
    for i in range(0,len(a)):
        s += np.sum(a[i] * A[1:,1:])
        A = np.roll(A, -1, axis=0)
        A = np.roll(A, -1, axis=1)
    return s
I get:
In [44]: %timeit regular_sum(a,b,c)
1 loops, best of 3: 454 ms per loop
In [45]: %timeit special_sum(a,b,c)
100 loops, best of 3: 6.44 ms per loop
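For exactly three vectors, the unwanted terms can also be subtracted in closed form by inclusion-exclusion, which avoids the Python loop altogether. The sketch below is not taken from the answers above: the function name distinct_index_sum is made up for illustration, and it assumes the three vectors have equal length.

```python
import numpy as np

def distinct_index_sum(a, b, c):
    """Sum of a[i]*b[j]*c[k] over all i, j, k that are pairwise different."""
    sa, sb, sc = a.sum(), b.sum(), c.sum()
    total = sa * sb * sc                 # every term of the full expansion
    same_ij = np.dot(a, b) * sc          # terms with i == j
    same_ik = np.dot(a, c) * sb          # terms with i == k
    same_jk = np.dot(b, c) * sa          # terms with j == k
    same_ijk = np.sum(a * b * c)         # terms with i == j == k
    # Each i == j == k term was subtracted three times above, but should be
    # removed only once, so add it back twice.
    return total - same_ij - same_ik - same_jk + 2 * same_ijk

a = np.array([0, 1, 2])
b = np.array([3, 4, 5])
c = np.array([6, 7, 8])
print(distinct_index_sum(a, b, c))  # 144
```

This costs O(n) per call, but the correction terms grow quickly as more vectors are added, so it is mainly convenient for a small, fixed number of vectors.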
I want to make the following computation; I use random arrays for demonstration:
a = np.random.randint(10, size=(100,3))
b = np.random.randint(10, size=(3,2))
result = np.zeros(100)
for i in range(100):
    result[i] = a[i] @ b @ b.T @ a[i].T
To speed up the calculation, I thought about replacing the for loop with an Einstein summation.
So I tried the following, with the same vectors:
result = np.einsum('ij,jk,jk,ij->i', a, b, b, a)
I put the 'i' on the right hand side of the einsum, because the result vector shows a correct size. However, the result is slightly different.
Can my problem be solved with an einsum?
Franz
In one einsum, it would be -
np.einsum('ij,jl,kl,ik->i',a,b,b,a)
Bringing in matrix-multiplication with np.dot -
np.einsum('ij,jk,ik->i',a,b.dot(b.T),a)
Or with more of it -
np.einsum('ij,ij->i',a.dot(b.dot(b.T)),a)
With np.matmul/@-operator in Python 3.x, it translates to -
((a@(b@b.T))[:,None,:] @ a[:,:,None])[:,0,0]
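A quick way to convince yourself that the subscripts are right is to compare the einsum against the explicit loop on small integer arrays. The sketch below does that, and also checks one more formulation that is not in the answer above: since a[i] b b^T a[i]^T is the squared length of the row vector a[i] b, the whole result is a row-wise sum of squares. The variable names are only for illustration.

```python
import numpy as np

np.random.seed(0)
a = np.random.randint(10, size=(100, 3))
b = np.random.randint(10, size=(3, 2))

# Reference: the explicit loop from the question
loop_result = np.array([a[i] @ b @ b.T @ a[i] for i in range(a.shape[0])])

# Single einsum with a separate index for each contraction
einsum_result = np.einsum('ij,jl,kl,ik->i', a, b, b, a)

# Same quantity as the squared norm of each row of a @ b
norm_result = ((a @ b) ** 2).sum(axis=1)

print(np.array_equal(loop_result, einsum_result))
print(np.array_equal(loop_result, norm_result))
```

The original attempt, 'ij,jk,jk,ij->i', reuses j and k for both copies of a and b, which turns the matrix products into element-wise products; that is why its output differs.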
I have a numpy array embed_vec of length tot_vec in which each entry is a 3d vector:
[[ 0.52483319 0.78015841 0.71117216]
[ 0.53041481 0.79462171 0.67234534]
[ 0.53645428 0.80896727 0.63119403]
...,
[ 0.72283509 0.40070804 0.15220522]
[ 0.71277758 0.38498613 0.16141834]
[ 0.70221445 0.36918032 0.17370776]]
For each of the elements in this array, I want to find out the number of other entries which are "close" to that entry. By close, I mean that the distance between two vectors is less than a specified value R. For this, I must compare all the possible pairs in this array with each other and then find out the number of close vectors for each of the vectors in the array. So I am doing this:
p = np.zeros(tot_vec) # This contains the number of close vectors
for i in range(tot_vec-1):
    for j in range(i+1, tot_vec):
        if np.linalg.norm(embed_vec[i]-embed_vec[j]) < R:
            p[i] += 1
However, this is extremely inefficient because I have two nested python loops and for larger array sizes, this takes forever. If this were in C++ or Fortran, it wouldn't have been a great issue. My question is, can one achieve the same thing using numpy efficiently using some vectorization method? As a side note, I don't mind a solution using Pandas also.
Approach #1 : Vectorized approach -
def vectorized_app(embed_vec, R):
    tot_vec = embed_vec.shape[0]
    r,c = np.triu_indices(tot_vec,1)
    subs = embed_vec[r] - embed_vec[c]
    dists = np.einsum('ij,ij->i',subs,subs)
    return np.bincount(r,dists<R**2,minlength=tot_vec)
Approach #2 : With less loop complexity (for very large arrays) -
def loopy_less_app(embed_vec, R):
    tot_vec = embed_vec.shape[0]
    Rsq = R**2
    out = np.zeros(tot_vec,dtype=int)
    for i in range(tot_vec):
        subs = embed_vec[i] - embed_vec[i+1:tot_vec]
        dists = np.einsum('ij,ij->i',subs,subs)
        out[i] = np.count_nonzero(dists < Rsq)
    return out
Benchmarking
Original approach -
def loopy_app(embed_vec, R):
    tot_vec = embed_vec.shape[0]
    p = np.zeros(tot_vec) # This contains the number of close vectors
    for i in range(tot_vec-1):
        for j in range(i+1, tot_vec):
            if np.linalg.norm(embed_vec[i]-embed_vec[j]) < R:
                p[i] += 1
    return p
Timings -
In [76]: # Sample random array
...: embed_vec = np.random.rand(3000,3)
...: R = 0.5
...:
In [77]: %timeit loopy_app(embed_vec, R)
1 loops, best of 3: 50.5 s per loop
In [78]: %timeit loopy_less_app(embed_vec, R)
10 loops, best of 3: 143 ms per loop
350x+ speedup there!
Going with much bigger array with the proposed loopy_less_app -
In [81]: # Sample random array
...: embed_vec = np.random.rand(20000,3)
...: R = 0.5
...:
In [82]: %timeit loopy_less_app(embed_vec, R)
1 loops, best of 3: 4.47 s per loop
I am intrigued by that question and attempted to solve it efficiently using scipy's cKDTree. However, this approach may run out of memory because internally a list of all pairs with distance <= R is maintained. If your R and tot_vec are small enough it will work:
import numpy as np
from scipy.spatial import cKDTree as KDTree
tot_vec = 60000
embed_vec = np.random.randn(tot_vec, 3)
R = 0.1
tree = KDTree(embed_vec, leafsize=100)
p = np.zeros(tot_vec)
for pair in tree.query_pairs(R):
    p[pair[0]] += 1
    p[pair[1]] += 1
In case memory is an issue, with some effort it is possible to rewrite query_pairs as a generator function in Python at the cost of C performance.
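First broadcast the difference between every pair of vectors (embed_vec is the array from the question, and r is the question's R):
disp_vec = embed_vec[:,None,:] - embed_vec[None,:,:]
Now, depending on how big your dataset is, you may want to do a first pass without all the math. If the distance is less than r, the absolute value of every component must also be less than r:
first_mask = np.max(np.abs(disp_vec), axis=-1) < r
Then do the actual calculation on the pairs that survived:
disps = np.linalg.norm(disp_vec[first_mask], axis=-1)
second_mask = disps < r
Now reassign:
disps = disps[second_mask]
first_mask[first_mask] = second_mask
disps now holds the distances that passed, and first_mask is a boolean mask of where they go in the pair matrix (note that the diagonal, each vector paired with itself, is included). You can process from there.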
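The full broadcast above needs a (tot_vec, tot_vec, 3) array, which may not fit in memory for large inputs. One possible compromise, sketched below, applies the same broadcasting to a block of rows at a time. The function name chunked_close_counts and the chunk parameter are hypothetical, and the count follows the question's loop (only neighbours with a larger index are counted for each row).

```python
import numpy as np

def chunked_close_counts(embed_vec, R, chunk=256):
    """For each row i, count the rows j > i that lie within distance R."""
    n = embed_vec.shape[0]
    out = np.zeros(n, dtype=int)
    Rsq = R * R
    cols = np.arange(n)
    for start in range(0, n, chunk):
        stop = min(start + chunk, n)
        # Differences between this block of rows and every row: (block, n, 3)
        diff = embed_vec[start:stop, None, :] - embed_vec[None, :, :]
        # Squared distances, so no square root is needed
        d2 = np.einsum('ijk,ijk->ij', diff, diff)
        # Keep only pairs with j > i, as in the original double loop
        later = cols[None, :] > np.arange(start, stop)[:, None]
        out[start:stop] = np.count_nonzero((d2 < Rsq) & later, axis=1)
    return out

embed_vec = np.random.rand(1000, 3)
counts = chunked_close_counts(embed_vec, 0.5)
```

The chunk size trades memory for the number of Python-level iterations; the temporary array holds roughly chunk * tot_vec * 3 floats.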
I have a small block of code which I use to fill a list with integers. I need to improve its performance, perhaps translating the whole thing into numpy arrays, but I'm not sure how.
Here's the MWE:
import numpy as np
# List filled with integers.
a = np.random.randint(0,100,1000)
N = 10
b = [[] for _ in range(N-1)]
for indx,integ in enumerate(a):
    if 0<integ<N:
        b[integ-1].append(indx)
This is what it does:
for every integer (integ) in a
see if it is located between a given range (0,N)
if it is, store its index in a sub-list of b where the index of said sub-list is the original integer minus 1 (integ-1)
This bit of code runs pretty fast but my actual code uses much larger lists, hence the need to improve its performance.
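To make the description concrete, here is the same loop run on a tiny hand-picked input. The values of a and N are chosen only for illustration.

```python
a = [3, 0, 1, 3, 12, 1]
N = 4

b = [[] for _ in range(N - 1)]
for indx, integ in enumerate(a):
    # Only values strictly between 0 and N are kept
    if 0 < integ < N:
        # Value v goes into sub-list v - 1
        b[integ - 1].append(indx)

print(b)  # [[2, 5], [], [0, 3]]
```

The value 1 occurs at positions 2 and 5, the value 2 never occurs, and the value 3 occurs at positions 0 and 3; the entries 0 and 12 fall outside the range and are skipped.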
Here's one way of doing it:
mask = (a > 0) & (a < N)
elements = a[mask]
indicies = np.arange(a.size)[mask]
b = [indicies[elements == i] for i in range(1, N)]
If we time the two:
import numpy as np
a = np.random.randint(0,100,1000)
N = 10
def original(a, N):
    b = [[] for _ in range(N-1)]
    for indx,elem in enumerate(a):
        if 0<elem<N:
            b[elem-1].append(indx)
    return b
def new(a, N):
    mask = (a > 0) & (a < N)
    elements = a[mask]
    indicies = np.arange(a.size)[mask]
    return [indicies[elements == i] for i in range(1, N)]
The "new" way is considerably (~20x) faster:
In [5]: %timeit original(a, N)
100 loops, best of 3: 1.21 ms per loop
In [6]: %timeit new(a, N)
10000 loops, best of 3: 57 us per loop
And the results are identical:
In [7]: new_results = new(a, N)
In [8]: old_results = original(a, N)
In [9]: for x, y in zip(new_results, old_results):
....: assert np.allclose(x, y)
....:
In [10]:
The "new" vectorized version also scales much better to longer sequences. If we use a million-item-long sequence for a, the original solution takes slightly over 1 second, while the new version takes only 17 milliseconds (a ~70x speedup).
Try this solution! The first half I shamelessly stole from Joe's answer, but after that it uses sorting and binary search, which scales better with N.
def new(a, N):
    mask = (a > 0) & (a < N)
    elements = a[mask]
    indices = np.arange(a.size)[mask]
    sorting_idx = np.argsort(elements, kind='mergesort')
    ind_sorted = indices[sorting_idx]
    x = np.searchsorted(elements, range(N), side='right', sorter=sorting_idx)
    return [ind_sorted[x[i]:x[i+1]] for i in range(N-1)]
You could put x = x.tolist() in there for an additional albeit small speed improvement (NB: if you do an a = a.tolist() in your original code, you do get a significant speedup). Also, I used 'mergesort' which is a stable sort but if you don't need the final result sorted, you can get away with a faster sorting algorithm.
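A variant of the same sort-based idea, sketched here as an alternative rather than as part of the answer above, gets the bucket boundaries from np.bincount and cuts the sorted indices with np.split. The function name bucket_indices is made up, and the sketch assumes a is a one-dimensional integer array.

```python
import numpy as np

def bucket_indices(a, N):
    """Return N-1 arrays; entry v-1 holds the positions where a == v, for 0 < v < N."""
    idx = np.flatnonzero((a > 0) & (a < N))   # positions of in-range values
    vals = a[idx]
    order = np.argsort(vals, kind='stable')   # stable: positions stay ascending per bucket
    counts = np.bincount(vals, minlength=N)[1:N]
    # Cut the sorted positions at the cumulative bucket sizes
    return np.split(idx[order], np.cumsum(counts)[:-1])

a = np.array([3, 0, 1, 3, 12, 1])
print([bucket.tolist() for bucket in bucket_indices(a, 4)])  # [[2, 5], [], [0, 3]]
```

Like the searchsorted version, this sorts once instead of scanning the array once per bucket, so it should be less sensitive to large N.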
V is an (n, p) NumPy array; typical dimensions are n ~ 10, p ~ 20000.
The code I have now looks like
A = np.zeros(p)
for i in range(n):
    for j in range(i+1):
        A += F[i,j] * V[i,:] * V[j,:]
How would I go about rewriting this to avoid the double python for loop?
While Isaac's answer seems promising, as it removes those two nested for loops, you are having to create an intermediate array M which is n times the size of your original V array. Python for loops are not cheap, but memory access ain't free either:
n = 10
p = 20000
V = np.random.rand(n, p)
F = np.random.rand(n, n)
def op_code(V, F):
    n, p = V.shape
    A = np.zeros(p)
    for i in range(n):
        for j in range(i+1):
            A += F[i,j] * V[i,:] * V[j,:]
    return A
def isaac_code(V, F):
    n, p = V.shape
    F = F.copy()
    F[np.triu_indices(n, 1)] = 0
    M = (V.reshape(n, 1, p) * V.reshape(1, n, p)) * F.reshape(n, n, 1)
    return M.sum((0, 1))
If you now take both for a test ride:
In [20]: np.allclose(isaac_code(V, F), op_code(V, F))
Out[20]: True
In [21]: %timeit op_code(V, F)
100 loops, best of 3: 3.18 ms per loop
In [22]: %timeit isaac_code(V, F)
10 loops, best of 3: 24.3 ms per loop
So removing the for loops is costing you an 8x slowdown. Not a very good thing... At this point you may even want to consider whether a function taking about 3 ms to evaluate requires any further optimization. In case you do, there's a small improvement which can be had by using np.einsum:
def einsum_code(V, F):
    n, p = V.shape
    F = F.copy()
    F[np.triu_indices(n, 1)] = 0
    return np.einsum('ij,ik,jk->k', F, V, V)
And now:
In [23]: np.allclose(einsum_code(V, F), op_code(V, F))
Out[23]: True
In [24]: %timeit einsum_code(V, F)
100 loops, best of 3: 2.53 ms per loop
So that's roughly a 20% speed up that introduces code that may very well not be as readable as your for loops. I would say not worth it...
The difficult part about this is that you only want to take the sum of the elements with j <= i. If not for that then you could do the following:
M = (V.reshape(n, 1, p) * V.reshape(1, n, p)) * F.reshape(n, n, 1)
A = M.sum(0).sum(0)
If F is symmetric (if F[i,j] == F[j,i]) then you can exploit the symmetry of M above as follows:
D = M[range(n), range(n)].sum(0)
A = (M.sum(0).sum(0) - D) / 2.0 + D
That said, this is really not a great candidate for vectorization, as you have n << p and so your for-loops are not going to have much effect on the speed of this computation.
Edit: As Bill said below, you can just make sure that the elements of F that you don't want to use are set to zero first, and then the M.sum(0).sum(0) result will be what you want.
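Once the unwanted entries of F are zeroed, the sum can also be arranged so that no n x n x p intermediate is built at all: multiply the masked F by V first, then take an element-wise product with V and sum over rows. The sketch below illustrates this; the function name tril_weighted_sum is made up, and np.tril is used for the masking so that F itself is left unchanged.

```python
import numpy as np

def tril_weighted_sum(V, F):
    """A[k] = sum over j <= i of F[i, j] * V[i, k] * V[j, k]."""
    L = np.tril(F)                       # copy of F with entries j > i set to zero
    # (L @ V)[i, k] = sum_j L[i, j] * V[j, k]; weight by V[i, k] and sum over i
    return ((L @ V) * V).sum(axis=0)

n, p = 10, 20000
V = np.random.rand(n, p)
F = np.random.rand(n, n)
A = tril_weighted_sum(V, F)
```

The largest temporary here has the same shape as V, so memory use stays at O(n * p) rather than O(n * n * p).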
The expression can be written as
$$A_p = \sum_{i=0}^{n-1} \sum_{j=0}^{i} F_{ij}\, V_{ip}\, V_{jp}$$
and thus you can sum it like this using the np.newaxis construct:
na = np.newaxis
X = (np.tri(n)*F)[:,:,na]*V[:,na,:]*V[na,:,:]
X.sum(axis=1).sum(axis=0)
Here a 3D array X[i,j,p] is constructed, and then the first two axes are summed, which results in a 1D array A[p]. Additionally, F was multiplied by a lower-triangular matrix (np.tri(n)) to restrict the summation to j <= i, as the problem requires.
I'm looking for a fast way to calculate a sum of n outer products.
Essentially, I start with two matrices generated from normal distributions - there are n vectors with v elements:
A = np.random.normal(size = (n, v))
B = np.random.normal(size = (n, v))
What I'd like is to calculate the outer products of each vector of size v in A and B and sum them together.
Note that A * B.T doesn't work - A is of size n x v whereas B.T is of size v x n.
The best I can do is create a loop where the outer products are constructed, then summed later. I have it like so:
outers = np.array([np.outer(A[i], B[i]) for i in range(n)])
This creates an n x v x v array (the loop is within the list comprehension, which is subsequently converted into an array), which I can then sum together by using np.sum(outers, axis = 0). However, this is quite slow, and I was wondering if there's a vectorized function I could use to speed this up.
If anybody has any advice, I would really appreciate it!
It seems to me all you need to do is change the order of the transpositions, and do A.T * B instead of A * B.T.
If that's not quite what you are after, take a look at np.einsum, which can do some very powerful voodoo. For the above example, you would do:
np.einsum('ij,ik->jk', A, B)
Also consider np.outer.
np.array([np.outer(A[i], B[i]) for i in range(n)]).sum(0)
although np.einsum suggested by @Jamie is the clear winner.
In [63]: %timeit np.einsum('ij,ik->jk', A, B)
100000 loops, best of 3: 4.61 us per loop
In [64]: %timeit np.array([np.outer(A[i], B[i]) for i in xrange(n)]).sum(0)
10000 loops, best of 3: 169 us per loop
and, to be sure, their results are identical:
In [65]: np.testing.assert_allclose(method_outer, method_einsum)
But, as an aside, I do not find that A.T * B or A * B.T broadcast successfully.
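For plain NumPy arrays, * is element-wise, which explains the broadcasting failure; the matrix product is spelled A.T @ B (or A.T.dot(B)). The sum of the n outer products is exactly that product, since entry (j, k) of both is the sum over i of A[i, j] * B[i, k]. The short check below illustrates this; the sizes n and v are arbitrary values picked for the example.

```python
import numpy as np

n, v = 5, 4
np.random.seed(0)
A = np.random.normal(size=(n, v))
B = np.random.normal(size=(n, v))

# Explicit sum of outer products, one per row pair
loop_sum = sum(np.outer(A[i], B[i]) for i in range(n))

# Same result as a single matrix product and as the einsum from the answer
matmul_sum = A.T @ B
einsum_sum = np.einsum('ij,ik->jk', A, B)

print(np.allclose(loop_sum, matmul_sum), np.allclose(loop_sum, einsum_sum))
```

All three produce a v x v array; the matrix product hands the work to NumPy's matrix-multiplication routine instead of a Python-level loop.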