I have been recently getting into python optimization by learning about broadcasting and vectorization. I would say I have got the basics down, but there are just some for loops I am uncapable of vectorizing. My question is: is it possible to convert any python for loop into a C++ for loop using numpy?
As an example:
import numpy as np
N = 10
A = np.eye(4) * 0.5
r = np.eye(4)
W = np.random.normal(0, np.sqrt(1), N)
r_t = r[None]
for i in range(N):
z = np.trace(A # r)
temp = r * W[i] * z
r_t = np.concatenate((r_t, temp[None]))
r = temp
The code is an example, and it is not supposed to do anything in particular (aside from returning r_t as a (N, 4,4) array).
W is an array of N values randomly picked from a normal distribution; A is identity matrix scaled by 0.5, and r is initially the identity matrix.
The problem I am finding is that I want "r" to be updated at the end of every loop, so that the value of "z" is different at every loop as well. Is there any way one could vectorize a loop of this sort?
Related
I tried implementing the forward substitution method, a solving process to solve the problem Lx = b with L being a lower triangle matrix and x,b as vectors.
This was an easy task:
def tri_solve(L,b):
n = len(b)
x = np.zeros(n)
x[0] = b[0]/L[0,0];
for i in range(1,n):
comp = 0;
for k in range(0,i):
index = L[i,k]
preSolution = x[k]
comp = comp + index * preSolution
x[i] = 1/L[i,i] * (b[i] - comp)
return x;
Now I compared my calculation times for different sized matrices several times with linalg.solve from the scipy module and it turns out that it is much faster. This makes sense in some points, since SciPy is written in C and C++, but I still expected similar or better calculation times for matrices up to 10x10 dimension. Beginning with 6x6 matrices, linalg.solves becomes slightly faster on average.
Is there a way to improve my rather simple solution?
You could try solve_triangular
If you want to accelerate your code, what you could do is to vectorize the inner loop.
def tri_solve(L,b):
n = len(b)
x = np.zeros(n)
x[0] = b[0]/L[0,0];
for i in range(1,n):
comp = np.sum(L[i,:i] * x[:i])
x[i] = 1/L[i,i] * (b[i] - comp)
return x;
Edit: How to use it
You have to pass as first argument a square lower triangular matrix and as second argument you can pass a 1D array
N = 20
A = np.tril(np.random.randn(N, N))
b = np.random.randn(N)
assert np.allclose(np.linalg.solve(A, b), tri_solve(A, b))
Of course this is a naive implementation and is not stable, you can't use it to solve very large or ill conditioned systems.
I have a complex matrix C with dimensions (r, r) as well as a complex vector of size r. I need to compute a new matrix from C and v following this equation:
where K is also a square matrix of dimensions (r, r). Here is the code to compute K with three loops:
import numpy as np
import matplotlib.pyplot as plt
r = 9
# Create random matrix
C = np.random.rand(r,r) + np.random.rand(r,r) * 1j
v = np.random.rand(r) + np.random.rand(r) * 1j
# Original loops
K = np.zeros((r, r))
for m in range(r):
for n in range(r):
for i in range(r):
K[m,n] += np.imag( C[i,m] * np.conj(C[i,n]) * np.sign(np.imag(v[i])) )
plt.figure()
plt.imshow(K)
plt.show()
Removing the loop with i is relatively easy:
# First optimization
K = np.zeros((r, r))
for m in range(r):
for n in range(r):
K[m,n] = np.imag(np.sum(C[:,m] * np.conj(C[:,n]) * np.sign(np.imag(v)) ))
but I am not sure how to proceed to vectorize the two remaining loops. Is it actually possible in this case?
I had a lot of these of problems and here is how I usually proceeded to find solutions to writing out vectorized code.
Here is what I have noticed about your summation. Cool conclusion is that you probably do not need vectorization at all, as you can express your whole calculation as a single product of 2D matrics. Here comes...
Lets first define following matrix (sorry for lack of Latex notation, Stackoverflow does not support Mathjax) :
A_{i,j} = c_{i,j}.
B_{i,j} = c_{i,j} * sgn(Im(v_i))
Then you can write your summation as:
k_{m,n} = Im( \sum_{i=1}^{r} c_{i,m} * sgn(Im(v_i)) * c_{i,n}^* ) = Im ( \sum_{i=1}^{r} B_{i,m} * A_{i,n}^* ) = Im( \sum_{i=1}^{r} B_{m,i}^T * A_{i,n}^* )
The expression above inside of Im(.) is the by definition of matrix multiplication equivalent to following :
k_{m,n} = Im( (B^T * A^*)_{m,n} )
Which means that your matrix k can be expressed as product of transpose of matrix B and product of matrix A. In your code the matrix matrix A is assigned already to variable C. So the vectorization could be done as follows:
C = np.random.rand(r,r) + np.random.rand(r,r) * 1j
v = np.random.rand(r) + np.random.rand(r) * 1j
k = np.imag( (C * np.sign(np.imag(v)).T # np.conj(C) )
And you have avoided both nasty loops and convoluted expressions
This looks like matrix multiplication:
out = np.imag((C*np.sign(np.imag(v))[:,None]).T # np.conj(C))
Or you can use np.einsum:
out = np.imag(np.einsum('im,in,i', C, np.conj(C), np.sign(np.imag(v))))
Verification with your approach:
np.all(np.abs(out-K) < 1e-6)
# True
I found something that can work for now. However, one loop remains and since the resulting matrix is symetric, there is still some optimization to be made.
Instead of removing the i loop, I removed the two other ones:
K = np.zeros((r, r), dtype=np.complex128)
for i in range(r):
K += adjointMatrix(C) # (np.sign(np.imag(v)) * C)
K = np.imag(K)
with:
def adjointMatrix(X):
return np.conjugate( np.transpose(X) )
I am doing a double loop to sum a function that has mesh grids as an input. The problem is that it runs very slow... I want to optimize the code with an alternative procedure, maybe using vectorize function of numpy, but I don't see how can be implemented. I show you the code that I have:
import numpy as np
import time
Lxx = 2.
Lyy = 1.0
dxx = dyy = 0.01
nxx = 100
nyy = 100
XX, YY = np.meshgrid(np.arange(0, Lxx+dxx, dxx), np.arange(0, Lyy+dyy, dyy)) #mesh grid
def solution(xx,yy,nnmax,mmmax):
sol = 0.
for m in range(nnmax):
for n in range(mmmax):
sol = sol+np.sin(XX*0.356*n)+np.cos(YY*2.3*m)
return sol
start = time.time()
solution(XX,YY,nxx,nyy)
end = time.time()
print ("TIME", end-start)
What I want is to make the sum for large values in nxx, nyy. But of course then it takes a lot of time...This is the reason why I want optimize the code.
If you notice, the terms of the sum are completely separable: they don't share any loop variables. You can therefore create independent (smaller) arrays for the sum over XX, n and YY, m, and take the trig functions and sum of those. The final grid can be accumulated by broadcasting.
To begin, don't bother making the grid:
x = np.arange(0, Lxx+dxx, dxx)
y = np.arange(0, Lyy+dyy, dyy)
Compute a single sum using broadcasting:
n = np.arange(nyy)[:, None]
m = np.arange(nxx)[:, None]
sumx = np.sin(x + 0.356 * n).sum(0)
sumy = np.cos(y + 2.3 * m).sum(0)
You can use the same broadcasting trick to get the final sum in a grid:
result = sumx[:, None] + sumy
I have made a code that calculates the dissolution of fluids, the problem is that the code is very poor, so I have been looking at that with numpy I can optimize it but I have been stuck without knowing how to do the following code using numpy and the roll function. Basically I have a matrix that the index and cannot be more than 1024, for this I use% to calculate what index it is. But this takes a long time.
I tried using numpy, using roll, rotating the matrix and then I don't have to calculate the module. But I don't know how to take the values of the neighbors.
def evolve(grid, dt, D=1.0):
xmax, ymax = grid_shape
new_grid = [[0.0,] * ymax for x in range(xmax)]
for i in range(xmax):
for j in range(ymax):
grid_xx = grid[(i+1)%xmax][j] + grid[(i-1)%xmax][j] - 2.0 * grid[i][j]
grid_yy = grid[i][(j+1)%ymax] + grid[i][(j-1)%ymax] - 2.0 * grid[i][j]
new_grid[i][j] = grid[i][j] + D * (grid_xx + grid_yy) * dt
return new_grid
You have to rewrite the evolve function from (almost) zero using numpy.
Here the guidelines:
First, grid must be a 2D numpy array, not a list of lists.
Your teacher suggested the roll function: look at its docs and try to understand how it works. roll will solve the problem of finding neighbour entries in the matrix by shifting (or rolling) the matrix over one of the axis. You can then create shifted versions of grid in the four directions and use them, instead of searching for neighbours.
Once you have the shifted grids, you'll see that you will not need the for loops to calculate each cell of new_grid: you can use vectorized calculation, which is faster.
So the code will look like this:
def evolve(grid, dt, D=1.0):
if not isinstance(grid, np.ndarray): #ensuring that is a numpy array.
grid = np.array(grid)
u_grid = np.roll(grid, 1, axis=0)
d_grid = np.roll(grid, -1, axis=0)
r_grid = np.roll(grid, 1, axis=1)
l_grid = np.roll(grid, -1, axis=1)
new_grid = grid + D * (u_grid + d_grid + r_grid + l_grid - 4.0*grid) * dt
return new_grid
With a 1024 x 1024 matrix, each numpy evolve takes (on my machine) ~0.15 seconds to return the new_grid. Your evolve with the for loops takes ~3.85 seconds.
consider my code
a,b,c = np.loadtxt ('test.dat', dtype='double', unpack=True)
a,b, and c are the same array length.
for i in range(len(a)):
q[i] = 3*10**5*c[i]/100
x[i] = q[i]*math.sin(a)*math.cos(b)
y[i] = q[i]*math.sin(a)*math.sin(b)
z[i] = q[i]*math.cos(a)
I am trying to find all the combinations for the difference between 2 points in x,y,z to iterate this equation (xi-xj)+(yi-yj)+(zi-zj) = r
I use this combination code
for combinations in it.combinations(x,2):
xdist = (combinations[0] - combinations[1])
for combinations in it.combinations(y,2):
ydist = (combinations[0] - combinations[1])
for combinations in it.combinations(z,2):
zdist = (combinations[0] - combinations[1])
r = (xdist + ydist +zdist)
This takes a long time for python for a large file I have and I am wondering if there is a faster way to get my array for r preferably using a nested loop?
Such as
if i in range(?):
if j in range(?):
Since you're apparently using numpy, let's actually use numpy; it'll be much faster. It's almost always faster and usually easier to read if you avoid python loops entirely when working with numpy, and use its vectorized array operations instead.
a, b, c = np.loadtxt('test.dat', dtype='double', unpack=True)
q = 3e5 * c / 100 # why not just 3e3 * c?
x = q * np.sin(a) * np.cos(b)
y = q * np.sin(a) * np.sin(b)
z = q * np.cos(a)
Now, your example code after this doesn't do what you probably want it to do - notice how you just say xdist = ... each time? You're overwriting that variable and not doing anything with it. I'm going to assume you want the squared euclidean distance between each pair of points, though, and make a matrix dists with dists[i, j] equal to the distance between the ith and jth points.
The easy way, if you have scipy available:
# stack the points into a num_pts x 3 matrix
pts = np.hstack([thing.reshape((-1, 1)) for thing in (x, y, z)])
# get squared euclidean distances in a matrix
dists = scipy.spatial.squareform(scipy.spatial.pdist(pts, 'sqeuclidean'))
If your list is enormous, it's more memory-efficient to not use squareform, but then it's in a condensed format that's a little harder to find specific pairs of distances with.
Slightly harder, if you can't / don't want to use scipy:
pts = np.hstack([thing.reshape((-1, 1)) for thing in (x, y, z)])
sqnorms = np.sum(pts ** 2, axis=1)
dists = sqnorms.reshape((-1, 1)) - 2 * np.dot(pts, pts.T) + sqnorms
which basically implements the formula (a - b)^2 = a^2 - 2 a b + b^2, but all vector-like.
Apologies for not posting a full solution, but you should avoid nesting calls to range(), as it will create a new tuple every time it gets called. You are better off either calling range() once and storing the result, or using a loop counter instead.
For example, instead of:
max = 50
for number in range (0, 50):
doSomething(number)
...you would do:
max = 50
current = 0
while current < max:
doSomething(number)
current += 1
Well, the complexity of your calculation is pretty high. Also, you need to have huge amounts of memory if you want to store all r values in a single list. Often, you don't need a list and a generator might be enough for what you want to do with the values.
Consider this code:
def calculate(x, y, z):
for xi, xj in combinations(x, 2):
for yi, yj in combinations(y, 2):
for zi, zj in combinations(z, 2):
yield (xi - xj) + (yi - yj) + (zi - zj)
This returns a generator that computes only one value each time you call the generator's next() method.
gen = calculate(xrange(10), xrange(10, 20), xrange(20, 30))
gen.next() # returns -3
gen.next() # returns -4 and so on