Use np.einsum to replace for loop - python

I want to perform the following computation; I use random arrays for demonstration:
a = np.random.randint(10, size=(100,3))
b = np.random.randint(10, size=(3,2))
result = np.zeros(100)
for i in range(100):
    result[i] = a[i] @ b @ b.T @ a[i].T
To speed up the calculation, I thought about replacing the for loop with an Einstein summation.
So I tried the following, with the same vectors:
result = np.einsum('ij,jk,jk,ij->i', a, b, b, a)
I put the 'i' on the right-hand side of the einsum because the result vector then has the correct size. However, the result is slightly different.
Can my problem be solved with an einsum?
Franz

In one einsum, it would be -
np.einsum('ij,jl,kl,ik->i',a,b,b,a)
Bringing in matrix-multiplication with np.dot -
np.einsum('ij,jk,ik->i',a,b.dot(b.T),a)
Or with more of it -
np.einsum('ij,ij->i',a.dot(b.dot(b.T)),a)
With np.matmul/@-operator in Python 3.5+, it translates to -
((a@(b@b.T))[:,None,:] @ a[:,:,None])[:,0,0]
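As a quick sanity check, the four variants above can be compared against the original loop. In the attempted 'ij,jk,jk,ij->i', both copies of a share the index j and both copies of b share j and k, so it computes sum over j,k of a[i,j]**2 * b[j,k]**2 instead of the quadratic form. The sketch below is only a check under assumed inputs; the seed and the variable names (expected, v1 to v4, attempt) are illustrative and not from the original post.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed, chosen only for reproducibility
a = rng.integers(10, size=(100, 3))
b = rng.integers(10, size=(3, 2))

# Reference: the original per-row loop, a[i] @ b @ b.T @ a[i]
expected = np.array([a[i] @ b @ b.T @ a[i] for i in range(len(a))])

# The four vectorized variants from the answer
v1 = np.einsum('ij,jl,kl,ik->i', a, b, b, a)
v2 = np.einsum('ij,jk,ik->i', a, b.dot(b.T), a)
v3 = np.einsum('ij,ij->i', a.dot(b.dot(b.T)), a)
v4 = ((a @ (b @ b.T))[:, None, :] @ a[:, :, None])[:, 0, 0]

# The attempt from the question reduces to a sum of squared products
attempt = np.einsum('ij,jk,jk,ij->i', a, b, b, a)
squares = (a**2) @ (b**2).sum(axis=1)

print(all(np.array_equal(expected, v) for v in (v1, v2, v3, v4)))
print(np.array_equal(attempt, squares))
```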

Related

Simplifying looped np.tensordot expression

Currently, my script looks as follows:
import numpy as np
a = np.random.rand(10,5,2)
b = np.random.rand(10,5,50)
c = np.random.rand(10,2,50)
for i in range(a.shape[0]):
    c[i] = np.tensordot(a[i], b[i], axes=(0,0))
I want to replicate the same behaviour without using a for loop, since it can be done in parallel. However, I have not found a neat way yet to do this with the tensordot function. Is there any way to create a one-liner for this operation?
You can use the numpy.einsum function; in this case:
c = np.einsum('ijk,ijl->ikl', a, b)
An alternative to einsum is matmul/@. The first array has to be transposed so the sum-of-products dimension is last:
In [162]: a = np.random.rand(10,5,2)
     ...: b = np.random.rand(10,5,50)
In [163]: c = a.transpose(0,2,1)@b
In [164]: c.shape
Out[164]: (10, 2, 50)
In [165]: c1 = np.random.rand(10,2,50)
     ...:
     ...: for i in range(a.shape[0]):
     ...:     c1[i] = np.tensordot(a[i], b[i], axes=(0,0))
     ...:
In [166]: np.allclose(c,c1)
Out[166]: True
tensordot reshapes and transposes the arguments, reducing the task to a simple dot. So while it's fine for switching which axes get the sum-of-products, it doesn't handle batches any better than dot. That's a big part of why matmul was added. np.einsum gives the same power (and more), but performance usually isn't quite as good (unless it's been "optimized" to the equivalent matmul).
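To illustrate the point about batches, here is a small sketch of what happens if tensordot is applied to the full 3-D arrays: the batch axis of each argument survives separately, so the result contains every batch pairing and the wanted entries have to be picked out afterwards. The names full, idx and batched are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((10, 5, 2))
b = rng.random((10, 5, 50))

# Contract the shared axis of length 5; both batch axes are kept,
# so every a[i] is paired with every b[m].
full = np.tensordot(a, b, axes=(1, 1))
print(full.shape)                      # (10, 2, 10, 50)

# Only the matching batch pairs (i == m) are wanted.
idx = np.arange(a.shape[0])
batched = full[idx, :, idx, :]         # shape (10, 2, 50)

# The batched contraction from the answers computes only those pairs.
c_einsum = np.einsum('ijk,ijl->ikl', a, b)
c_matmul = a.transpose(0, 2, 1) @ b
print(np.allclose(batched, c_einsum), np.allclose(batched, c_matmul))
```

The tensordot route does ten times the needed work here and allocates the larger intermediate, which is why einsum or matmul is the better fit for batched products.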

Update a Numpy 1D array B in each loop to solve the matrix expression A*x = B

I need to solve for x in the sparse matrix expression A*x = B inside a loop, where A is a SciPy CSC sparse matrix and B is a NumPy 1D array. Both A and B are large, with about 500K rows. I need to update B in each iteration, so the speed of that update is critical. Right now, I build a csc_matrix in each iteration and then convert it to a 1D NumPy array as below, which is really expensive in terms of time:
B = csc_matrix((data,(row, col)),shape=(500000, 1), dtype='complex128').toarray()[:,0]
Please note:
row has many repeated indices, such as [0,1,2,0,2,2,3,3,...],
col is [0,0,0,...,0].
Is there a fast way to update B in each iteration?
Assuming col contains only zeros, data/row/col are NumPy arrays, and you want B stored as a NumPy array, you can use Numba to generate B efficiently. Here is how:
import numba as nb
# Works in-place to avoid any slow allocation in the critical loop.
# Note that the type of row may be different.
@nb.njit(nb.void(nb.complex128[:], nb.complex128[:], nb.int64[:]))
def updateVector(B, data, row):
    B.fill(0.)
    for i in range(len(row)):
        B[row[i]] += data[i]
updateVector updates the values of B in-place. This assumes B has already been allocated with the correct size (using for example B = np.empty(500000, dtype=np.complex128)).
On my machine this is 14 times faster with the following configuration:
row = np.random.randint(0, 500000, size=100000)
col = np.zeros(100000, dtype=np.int64)
data = np.random.rand(100000) + np.random.rand(100000) * 1j
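If Numba is not available, the same accumulation can be sketched in plain NumPy with np.add.at, which adds unbuffered and therefore handles repeated indices in row correctly. This is an alternative not taken from the answer above; the function name update_vector_numpy and the small sizes are illustrative, and no claim is made about its speed relative to the Numba version.

```python
import numpy as np

def update_vector_numpy(B, data, row):
    """Accumulate data into B in-place; repeated entries in row are summed."""
    B.fill(0)
    np.add.at(B, row, data)   # unbuffered add, unlike B[row] += data

rng = np.random.default_rng(0)
n = 1000                                   # small size for the demonstration
row = rng.integers(0, n, size=5000)        # many repeated indices
data = rng.random(5000) + 1j * rng.random(5000)

B = np.empty(n, dtype=np.complex128)       # allocated once, reused each iteration
update_vector_numpy(B, data, row)

# Reference result from an explicit Python loop
expected = np.zeros(n, dtype=np.complex128)
for r, d in zip(row, data):
    expected[r] += d

print(np.allclose(B, expected))
```

Note that the shorter B[row] += data would silently drop contributions from repeated indices, which is why np.add.at is used here.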

score calculation takes too long: avoid for loops - python

I am new to Python and I need your kind help.
I have three matrices, in particular:
Matrix M (class of the matrix: scipy.sparse.csc.csc_matrix), dimensions: N x C;
Matrix G (class of the matrix: numpy.ndarray), dimensions: C x T;
Matrix L (class of the matrix: numpy.ndarray), dimensions: T x N.
Where: N = 10000, C = 1000, T = 20.
I would like to calculate this score:
score = sum_i sum_c M[i,c] * sum_t L[t,i] * G[c,t]
I tried using two for loops, one for the i index and one for c. I also used a dot product to obtain the last sum in the equation. But my implementation takes too long to produce the result.
This is what I implemented:
score = 0.0
for i in range(N):
    for c in range(C):
        Mic = M[i,c]
        score += np.outer(Mic,(np.dot(L[:,i],G[c,:])))
Is there a way to avoid the two for loops?
Thank you in advance!
Best
Try this: score = np.einsum("ic,ti,ct->", M, L, G)
EDIT1
By the way, in your case, score = np.sum(np.diag(M @ G @ L)) is faster than einsum (in Python 3.5 and later, you can use the @ operator for the matmul function). np.trace((L @ M) @ G) is faster still because it uses memory more efficiently; maybe @hpaulj meant this in his comment. But einsum is easier to use for complex tensor products (to write the einsum I used your math expression directly, without thinking about optimization).
Generally, using for loops with NumPy results in a dramatic slowdown in computation speed (think "vectorize your computations" when working with NumPy).
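A small check that the double loop, the einsum and the trace form agree, using dense stand-ins with reduced sizes (N = 50, C = 20, T = 5 are assumptions chosen to keep the loop fast). The element-wise variant at the end is an extra option that is not in the answer above; it avoids forming the N x N product. einsum expects dense arrays, so the sparse M from the question would presumably need converting first.

```python
import numpy as np

rng = np.random.default_rng(0)
N, C, T = 50, 20, 5                 # reduced sizes for the demonstration
M = rng.random((N, C))              # dense stand-in for the sparse matrix
G = rng.random((C, T))
L = rng.random((T, N))

# Reference: the double loop from the question
score_loop = 0.0
for i in range(N):
    for c in range(C):
        score_loop += M[i, c] * np.dot(L[:, i], G[c, :])

# Vectorized equivalents
score_einsum = np.einsum("ic,ti,ct->", M, L, G)
score_diag = np.sum(np.diag(M @ G @ L))      # builds an N x N intermediate
score_trace = np.trace((L @ M) @ G)          # only a T x T final matrix
score_elementwise = np.sum(M * (G @ L).T)    # extra variant, N x C intermediate

print(np.allclose([score_einsum, score_diag, score_trace, score_elementwise],
                  score_loop))
```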

Unique elements of a polynomial in python

I need to get at the pairwise terms when you expand a product of sums in python.
e.g. expanding (a1+a2+a3)(b1+b2+b3)(c1+c2+c3) gives:
a1b1c1 + a1b1c2 + a1b1c3+ a1b2c1 + ... + a3b3c3
with 27 terms in total.
I need to find a way to remove any elements of this expansion where the indices match (e.g. anything with a1 and b1, or b2 and c2).
Or in code:
import numpy as np
a = np.array([0,1,2])
b = np.array([3,4,5])
c = np.array([6,7,8])
output = a.sum() * b.sum() * c.sum()
Then I need to remove the terms a[i]*b[j]*c[k] where i==j, i==k or j==k.
For small vectors it's straightforward, but as these vectors get long and there are more of them there are a lot more possible combinations to try (my vectors are ~200 elements).
My boss has a scheme for doing this in Mathematica which does the algebraic expansion explicitly, and pulls out terms with matching exponents, but this relies very heavily on Mathematica's symbolic algebraic setup, so I can't see how to implement it in Python.
itertools.combinations gives you a list of all such combinations, but this is really slow for longer vectors. I've also looked at using sympy, but this also didn't seem suited to very long vectors.
Can anyone recommend a better way of doing this in Python?
How about something like this? Does this speed up your calculations?
import numpy as np
import itertools
a = np.array([0,1,2])
b = np.array([3,4,5])
c = np.array([6,7,8])
combination = [a, b, c]
added = []
# Getting the required permutations
for p in itertools.permutations(range(len(a)), len(combination)):
    # Using iterators and generators speeds up your calculations
    # zip(combination, p) pairs the index to the correct lists
    # so for p = (0, 1, 2) we get (a,0), (b, 1), (c, 2)
    # now find sum of (a[0], b[1], c[2]) and append it to added
    added.append(sum(i[j] for i, j in zip(combination, p)))
# print added and total sum
print(added)
print(sum(added))
I don't know if it is faster than your current implementation, but by rolling a NumPy array (special_sum below) you can avoid terms which have duplicated indexes faster than the "obvious" implementation (regular_sum):
a = np.random.randint(15, size=100)
b = np.random.randint(15, size=100)
c = np.random.randint(15, size=100)
def regular_sum(a, b, c):
    n = len(a)
    s = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if i==j or i==k or j==k:
                    continue
                s += a[i] * b[j] * c[k]
    return s
def special_sum(a, b, c):
    # all combinations b1c1, b1c2, b1c3, b2c1, ..., b3c3
    A = np.outer(b, c)
    # remove bici terms
    np.fill_diagonal(A, 0)
    # Now sum terms like: a1 * (terms without b1 or c1),
    # a2 * (terms without b2 or c2), ..., rolling the array A
    # to keep the unwanted terms in the first row and first column:
    s = 0
    for i in range(0,len(a)):
        s += np.sum(a[i] * A[1:,1:])
        A = np.roll(A, -1, axis=0)
        A = np.roll(A, -1, axis=1)
    return s
I get:
In [44]: %timeit regular_sum(a,b,c)
1 loops, best of 3: 454 ms per loop
In [45]: %timeit special_sum(a,b,c)
100 loops, best of 3: 6.44 ms per loop
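For exactly three vectors, the sum over distinct indices can also be written in closed form by inclusion-exclusion: start from the full product of sums, subtract the three families of terms where two indices coincide, and add back twice the terms where all three coincide (they were subtracted three times but should be removed only once). This is a different technique from the two answers above, sketched here only for the three-vector case; the function names are illustrative.

```python
import numpy as np

def distinct_index_sum(a, b, c):
    """Sum of a[i]*b[j]*c[k] over all i, j, k that are pairwise different."""
    total = a.sum() * b.sum() * c.sum()     # all index triples
    ij = (a * b).sum() * c.sum()            # triples with i == j
    ik = (a * c).sum() * b.sum()            # triples with i == k
    jk = (b * c).sum() * a.sum()            # triples with j == k
    ijk = (a * b * c).sum()                 # triples with i == j == k
    return total - ij - ik - jk + 2 * ijk

def brute_force_sum(a, b, c):
    """Reference implementation with three explicit loops."""
    n = len(a)
    return sum(a[i] * b[j] * c[k]
               for i in range(n) for j in range(n) for k in range(n)
               if i != j and i != k and j != k)

rng = np.random.default_rng(0)
a = rng.integers(15, size=20)
b = rng.integers(15, size=20)
c = rng.integers(15, size=20)

print(distinct_index_sum(a, b, c) == brute_force_sum(a, b, c))
```

The closed form needs only a handful of length-n reductions, so it scales linearly with the vector length; extending it to more than three vectors requires more inclusion-exclusion terms.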

Simultaneous Equations with given conditions

To start off, I have already solved this problem, so it's not a big deal; I'm just asking to satisfy my own curiosity. The question is how to solve a series of simultaneous equations given a set of constraints. The equations are:
tau = 62.4*d*0.0007
A = (b + 1.5*d)*d
P = b + 2*d*sqrt(1 + 1.5**2)
R = A/P
Q = (1.486/0.03)*A*(R**(2.0/3.0))*(0.0007**0.5)
and the conditions are:
tau <= 0.29, Q = 10000 +- say 3, and minimize b
As I mentioned I was already able to come up with a solution using a series of nested loops:
from numpy import linspace, sqrt
b = linspace(320, 330, 1000)
d = linspace(0.1, 6.6392, 1000)
ansQ = []
ansv = []
anstau = []
i_index = []
j_index = []
for i in range(len(b)):
    for j in range(len(d)):
        tau = 62.4*d[j]*0.0007
        A = (b[i] + 1.5*d[j])*d[j]
        P = b[i] + 2*d[j]*sqrt(1 + 1.5**2)
        R = A/P
        Q = (1.486/0.03)*A*(R**(2.0/3.0))*(0.0007**0.5)
        if Q >= 10000 and tau <= 0.29:
            ansQ.append(Q)
            ansv.append(Q/A)
            anstau.append(tau)
            i_index.append(i)
            j_index.append(j)
This takes a while, and there is something in the back of my head saying that there must be an easier/more elegant solution to this problem. Thanks (Linux Mint 13, Python 2.7.x, scipy 0.11.0)
You seem to have only two degrees of freedom here: you can rewrite everything in terms of b and d, or b and tau, or whichever two you prefer. Your constraint on tau directly implies a constraint on d, and you can use your constraint on Q to imply a constraint on b.
And it doesn't look (to me at least, I still haven't finished my coffee) that your code is doing anything other than plotting some two dimensional functions over a grid you've defined--NOT solving a system of equations. I normally understand "solving" to involve setting something equal to something else, and writing one variable as a function of another variable.
It does appear you've only posted a snippet, though, so I'll assume you do something else with your data down stream.
Ok, I see. I think this isn't really a minimization problem, it's a plotting problem. The first thing I'd do is see what range is implied for d by your constraint on tau, and then use the constraint on Q to derive a constraint on b. Then you can mesh those points with meshgrid (as you mentioned below) and run over all combinations.
Since you're applying the constraint before you apply the mesh (as opposed to after, as in your code), you'll only be sampling the parameter space that you're interested in. In your code you generate a bunch of junk you're not interested in, and pick out the gems. If you apply your constraints first, you'll only be left with gems!
I'd define my functions like:
P = lambda b, d: b + 2*d*np.sqrt(1 + 1.5**2)
which works like
>>> import numpy as np
>>> P = lambda b, d: b + 2*d*np.sqrt(1 + 1.5**2)
>>> P(1,2)
8.2111025509279791
Then you can write another function to serve up b and d for you, so you can do something like:
pvals = []
def get_func_vals(b, d):
    pvals.append(P(b,d))
or, better yet, store b and d as tuples in a function that doesn't return but yields:
pvals = [P(b,d) for (b,d) in thing_that_yields_b_and_d_tuples]
I didn't test this last line of code, and I always screw up these parentheses, but I think it's right.
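The "apply the constraint first, then mesh" idea can be sketched with NumPy broadcasting. The bound on d follows from tau = 62.4*d*0.0007 <= 0.29, and the remaining formulas are the ones from the question evaluated on the whole grid at once. The grid resolution (200 points) and the names d_max, Bg, Dg, mask and best_b are assumptions made for this illustration, and the Q >= 10000 test simply mirrors the question's loop.

```python
import numpy as np

# tau <= 0.29 rewritten as an upper bound on d (roughly 6.639)
d_max = 0.29 / (62.4 * 0.0007)

b = np.linspace(320, 330, 200)
d = np.linspace(0.1, d_max, 200)

# Bg[i, j] = b[i] and Dg[i, j] = d[j], matching the loop's (i, j) indexing
Bg, Dg = np.meshgrid(b, d, indexing="ij")

tau = 62.4 * Dg * 0.0007
A = (Bg + 1.5 * Dg) * Dg
P = Bg + 2 * Dg * np.sqrt(1 + 1.5**2)
R = A / P
Q = (1.486 / 0.03) * A * R ** (2.0 / 3.0) * 0.0007**0.5

# Same acceptance test as in the nested loops, applied to the whole grid
mask = (Q >= 10000) & (tau <= 0.29)
i_index, j_index = np.nonzero(mask)

ansQ = Q[mask]
ansv = Q[mask] / A[mask]
anstau = tau[mask]

# Smallest b on the grid that has at least one admissible d, if any
best_b = b[i_index.min()] if i_index.size else None
print(best_b)
```

Because the arrays are evaluated in one pass, the two Python loops disappear; a finer grid or a root finder on Q for each d would be natural refinements.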
