Numpy - create an almost zero matrix with row from other matrix - python

I have a square matrix A and I want to create a matrix Z whose elements are zero everywhere except for the i'th row, and whose i'th row is the j'th row of matrix A.
I am aware of two ways to accomplish this. The first one is fairly straightforward and seems to be the most efficient performance-wise:
def do_this(mx: np.array, i: int, j: int):
    Z = np.zeros_like(mx)
    Z[i, :] = mx[j, :]
    return Z
The other, less straightforward and seemingly much less efficient way is to prepare a matrix ref_mx beforehand, which is a zero matrix of the same shape as A but has a 1 in its (i, j) position, and then to calculate Z as ref_mx @ A.
def do_this_other_way(mx: np.array, ref_mx: np.array):
    return ref_mx @ mx
I decided to benchmark both approaches:
from time import time
import numpy as np

n = 20
num_iters = 5000
A = np.random.rand(n, n)
i, j = 5, 10

t = time()
for _ in range(num_iters):
    Z = do_this(A, i, j)
print((time() - t) / num_iters)

ref_mx = np.zeros_like(A)
ref_mx[i, j] = 1

t = time()
for _ in range(num_iters):
    Z = do_this_other_way(A, ref_mx)
print((time() - t) / num_iters)
However, when A is relatively small (on my laptop that means a size below roughly 40), do_this_other_way wins, and for sizes around 20 it wins by an order of magnitude.
That's it: I doubt that I am doing this in the most efficient way possible in numpy. Is it possible to do it better without resorting to writing my own low-level implementation of do_this?
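For reference, a quick sanity check (an illustrative sketch, reusing do_this from above) that the two approaches produce the same Z: ref_mx is the single-entry matrix with a 1 at position (i, j), and multiplying A by it from the left copies row j of A into row i.

n = 6
A = np.random.rand(n, n)
i, j = 2, 4
ref_mx = np.zeros_like(A)
ref_mx[i, j] = 1.0
assert np.allclose(ref_mx @ A, do_this(A, i, j))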

Related

Why is my algorithm showing linear behavior when it's supposed to be O(m4^(m))?

I am trying to understand the complexity of an algorithm I am experimenting with. The site where I found the algorithm states that it has a complexity of O(mn4^(m+n)), but when I hold n constant in my experimental analysis, the results show linear behavior; shouldn't it be something like O(m4^m)? Can anyone explain why this may be happening?
This is my code:
def longestIncreasingPathDFS(matrix):
    maxlen = [0]
    for i in range(len(matrix)):
        for j in range(len(matrix[0])):
            dfs(matrix, i, j, maxlen, 1)
    return maxlen[0]

def dfs(matrix, i, j, maxlen, length):
    # keeps the longest length in maxlen[0]
    maxlen[0] = max(maxlen[0], length)
    m = len(matrix)
    n = len(matrix[0])
    dx = [-1, 0, 1, 0]
    dy = [0, 1, 0, -1]
    for k in range(4):
        x = i + dx[k]
        y = j + dy[k]
        if 0 <= x < m and 0 <= y < n and matrix[x][y] > matrix[i][j]:
            dfs(matrix, x, y, maxlen, length + 1)
This is how I get the linear plot:
import time
import matplotlib.pyplot as plt
import random

times = []
input_sizes = range(1, 500)
for i in input_sizes:
    matrix = [[random.randint(0, 100) for _ in range(i)] for _ in range(10)]
    start_time = time.time()
    longestIncreasingPathDFS(matrix)
    end_time = time.time()
    times.append(end_time - start_time)

plt.plot(input_sizes, times)
plt.xlabel("Input size")
plt.ylabel("Time (secs)")
plt.show()
I tried increasing the test sample, but the plot is clearly linear. I also tried searching for related questions about this algorithm, but with no luck.
Due to the recursion, the worst case is that you go n*m times through, on average, n*m/2 elements, i.e. O((n*m)^4), I'd say.
However, like many algorithms, the normal case is much more forgiving/efficient than the constructed worst case.
So in most cases, it will be more like a constant times n*m, because the longest path is much shorter than the number of matrix elements.
For a random matrix, it may not even grow linearly with size but stay essentially constant: the probability of having a continuous increasing sequence decreases exponentially with its length, hence your observation.
Edit:
Tip: Try a large matrix like this (instead of a random one), with the values sorted so that the longest path stretches over all elements:
[[1,    2,    ...  n],
 [2n,   2n-1, ...  n+1],
 [2n+1, 2n+2, ...  3n],
 [...              n*m]]
I expect this to be more like (n*m)^4
Ah, and another limitation: You use random integers between 1 and 100, so the path is never longer than 100 in your cases. So the complexity is limited to O(n*m*p) where p is the largest integer you use in the random matrix.
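For completeness, here is a minimal sketch (illustration only, using numpy for convenience) of how such a sorted "snake" matrix could be built:

import numpy as np

def snake_matrix(m, n):
    # Values 1..m*n laid out in boustrophedon ("snake") order, so the
    # longest strictly increasing path has to visit every element.
    vals = np.arange(1, m * n + 1).reshape(m, n)
    vals[1::2] = vals[1::2, ::-1]  # reverse every other row
    return vals.tolist()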
Proving @Dr. V's point
import time
import matplotlib.pyplot as plt
import random
import numpy as np

def path_exploit(rows, cols):
    """
    Creates a matrix with a longest path of size 2 * (rows + cols) - 2.
    """
    # Init a zero matrix of size (rows, cols)
    matrix = np.zeros(shape=(rows, cols))
    # Create the longest path along the matrix boundary
    bd = (
        [(0, j) for j in range(matrix.shape[1])]
        + [(i, matrix.shape[1] - 1) for i in range(1, matrix.shape[0])]
        + [(matrix.shape[0] - 1, j) for j in range(matrix.shape[1] - 2, -1, -1)]
        + [(i, 0) for i in range(matrix.shape[0] - 2, 0, -1)]
    )
    count = 1
    for element in bd:
        matrix[element[0], element[1]] = count
        count += 1
    return matrix.tolist()
times = []
input_sizes = range(1, 1000, 50)
for i in input_sizes:
    matrix = path_exploit(i, 10)  # instead of [[random.randint(0, 100) for _ in range(i)] for _ in range(10)]
    start_time = time.time()
    longestIncreasingPathDFS(matrix)
    end_time = time.time()
    times.append(end_time - start_time)

plt.plot(input_sizes, times)
plt.xlabel("Input size")
plt.ylabel("Time (secs)")
plt.show()
Time vs. the number of rows now starts to look exponential (see plot).

Computing derivatives using numpy

I'm trying to implement a differential in python via numpy that can accept a scalar, a vector, or a matrix.
import numpy as np

def foo_scalar(x):
    f = x * x
    df = 2 * x
    return f, df

def foo_vector(x):
    f = x * x
    n = x.size
    df = np.zeros((n, n))
    for mu in range(n):
        for i in range(n):
            if mu == i:
                df[mu, i] = 2 * x[i]
    return f, df

def foo_matrix(x):
    f = x * x
    m, n = x.shape
    df = np.zeros((m, n, m, n))
    for mu in range(m):
        for nu in range(n):
            for i in range(m):
                for j in range(n):
                    if (mu == i) and (nu == j):
                        df[mu, nu, i, j] = 2 * x[i, j]
    return f, df
This works fine, but it seems like there should be a way to do this in a single function, and let numpy "figure out" the correct dimensions. I could force everything into a 2-D array form with something like
x = np.array(x)
if len(x.shape) == 0:
    x = x.reshape(1, 1)
elif len(x.shape) == 1:
    x = x.reshape(-1, 1)

if len(f.shape) == 0:
    f = f.reshape(1, 1)
elif len(f.shape) == 1:
    f = f.reshape(-1, 1)
and always have 4 nested for loops, but this doesn't scale if I need to generalize to higher-order tensors.
Is what I'm trying to do possible, and if so, how?
I highly doubt there is a function in Numpy that can generate the second value returned by your functions. That being said, you can use Numpy and Python features to vectorize this and make the function faster. You first need to generate the indices, then create the target matrix and set the non-zero entries. Note that operating with generic N-dimensional arrays tends to be slow and tricky in non-trivial cases. The magic * unpacking operator is used to pass N parameters.
def foo_generic(x):
    f = x ** 2
    idx = np.stack(np.meshgrid(*[np.arange(e) for e in x.shape], indexing='ij'))
    idx = tuple(np.concatenate((idx, idx)).reshape(2 * x.ndim, -1))
    df = np.zeros([*x.shape, *x.shape])
    df[idx] = 2 * x.ravel()
    return f, df
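A quick way to sanity-check this (a sketch reusing foo_vector and foo_matrix from the question) is to compare it against the explicit loops:

x_vec = np.random.rand(5)
x_mat = np.random.rand(3, 4)
assert np.allclose(foo_generic(x_vec)[1], foo_vector(x_vec)[1])
assert np.allclose(foo_generic(x_mat)[1], foo_matrix(x_mat)[1])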
Note that foo_generic does not support scalars, and it would be very inefficient to use it for that anyway, but you can add a condition in it to handle this special case separately.
The df matrix will very quickly become huge for higher orders, so I strongly advise you not to use dense matrices for this: already in the matrix case the number of zeros is huge compared to the number of non-zero values. In fact, for a 5x5 input, more than 95% of the entries of df are zeros. Not to mention that the matrix quickly becomes huge, and filling a huge matrix with zeros is not efficient. Sparse matrices fix this.
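To illustrate that last point (a minimal sketch, assuming SciPy is available), the flattened Jacobian of f = x**2 is purely diagonal, so it can be stored as a sparse diagonal matrix instead of a dense N-dimensional array:

from scipy import sparse
import numpy as np

def foo_sparse(x):
    # The Jacobian of f = x**2 is diagonal once x is flattened,
    # so only the x.size non-zero entries need to be stored.
    x = np.asarray(x, dtype=float)
    f = x ** 2
    df = sparse.diags(2 * x.ravel())
    return f, df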

How can I improve performance in my forward substitution method for lower triangle matrices?

I tried implementing the forward substitution method, a solving process for the problem Lx = b with L being a lower triangular matrix and x, b vectors.
This was an easy task:
def tri_solve(L, b):
    n = len(b)
    x = np.zeros(n)
    x[0] = b[0] / L[0, 0]
    for i in range(1, n):
        comp = 0
        for k in range(0, i):
            index = L[i, k]
            preSolution = x[k]
            comp = comp + index * preSolution
        x[i] = 1 / L[i, i] * (b[i] - comp)
    return x
Now I compared my calculation times for differently sized matrices several times with linalg.solve from the scipy module, and it turns out that it is much faster. This makes sense to some extent, since SciPy is written in C and C++, but I still expected similar or better calculation times for matrices up to 10x10. Beginning with 6x6 matrices, linalg.solve becomes slightly faster on average.
Is there a way to improve my rather simple solution?
You could try scipy.linalg.solve_triangular.
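For example (a sketch, assuming SciPy is installed; the diagonal shift just keeps the random system well conditioned), lower=True tells it that L is lower triangular:

import numpy as np
from scipy.linalg import solve_triangular

N = 20
L = np.tril(np.random.rand(N, N)) + N * np.eye(N)
b = np.random.rand(N)
x = solve_triangular(L, b, lower=True)
assert np.allclose(L @ x, b)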
If you want to accelerate your own code, you can vectorize the inner loop:
def tri_solve(L, b):
    n = len(b)
    x = np.zeros(n)
    x[0] = b[0] / L[0, 0]
    for i in range(1, n):
        comp = np.sum(L[i, :i] * x[:i])
        x[i] = 1 / L[i, i] * (b[i] - comp)
    return x
Edit: How to use it
You have to pass a square lower triangular matrix as the first argument and a 1D array as the second argument:
N = 20
A = np.tril(np.random.randn(N, N))
b = np.random.randn(N)
assert np.allclose(np.linalg.solve(A, b), tri_solve(A, b))
Of course this is a naive implementation and is not stable; you can't use it to solve very large or ill-conditioned systems.

Is it possible to "vectorize" successive matrix multiplications?

I am in the process of "vectorizing" a pet project of mine. The function propagateXi propagates (solves) a time-varying linear system of equations and accounts for ~20% of the cost of the project.
The sizes vary a lot between instances of the problem. A "typical" set would be: N = 501, n = 4, s = 3.
Here is a simplification of the code:
import numpy

class Propagator():
    def __init__(self, mode):
        # lots of code here
        pass

    def propagateXi(self, Xi, terms):
        r"""This function propagates the linear system. It is called many times
        with different initial conditions; the initial conditions were previously set.
        The "arc" represents a kind of "realization"; the arcs are essentially
        independent from one another.

        Sizes: "vector dimension": 2*n, "time dimension": N, "arcs dimension": s
        - Xi (N, 2*n, s) stores the (vector) state at each time and in each "arc".
        - self.MainPropMat (N, 2*n, 2*n, s) stores the 2n x 2n matrices.
        - terms (N, 2*n, s) stores the "forcing terms" of the equation.

        The equation would be something like:
        \xi_{k+1, arc} = MainPropMat_{k, arc} * \xi_{k, arc} + terms_{k, arc}   \forall k, arc
        """
        for k in range(self.N - 1):
            Xi[k+1, :, :] = (numpy.einsum('ijs,js->is', self.MainPropMat[k, :, :, :], Xi[k, :, :])
                             + terms[k, :, :])
        return Xi
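For readers less familiar with einsum, here is a minimal standalone illustration (with made-up shapes) of what the line inside the loop computes: a matrix-vector product applied independently for each arc.

import numpy as np

n2, s = 8, 3
M_k = np.random.rand(n2, n2, s)   # one time slice of MainPropMat
xi_k = np.random.rand(n2, s)      # one time slice of Xi

via_einsum = np.einsum('ijs,js->is', M_k, xi_k)
via_loop = np.stack([M_k[:, :, arc] @ xi_k[:, arc] for arc in range(s)], axis=1)
assert np.allclose(via_einsum, via_loop)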
I have tried to re-interpret the problem as the multiplication of a huge 2nN x 2nN matrix by a long 2nN column containing the initial conditions and the forcing terms. With that approach, the code was something like this:
import numpy

class Propagator():
    def __init__(self, mode):
        # lots of code here
        self.prepareBigMat()

    def prepareBigMat(self):
        """
        Prepare the "big matrix" for Xi propagation. Called only once.
        This function assembles the "big matrix" (s x 2*n*N x 2*n*N!) for propagating
        the Xi differential equation in a single step.
        :return: None
        """
        n2 = 2 * self.n
        BigMat = numpy.empty((self.s, self.N * n2, self.N * n2))
        I = numpy.eye(n2 * self.N)
        for arc in range(self.s):
            BigMat[arc, :, :] = I
        for k in range(1, self.N):
            kn2 = k * n2
            BigMat[:, kn2:(kn2 + n2), 0:kn2] = numpy.einsum('ijs,sjk->sik',
                                                            self.MainPropMat[k-1, :, :, :],
                                                            BigMat[:, (kn2 - n2):kn2, 0:kn2])
        self.BigMat = BigMat

    def propagateXi(self, Xi, terms):
        n2 = self.n * 2
        # Assemble the big column
        BigCol = numpy.empty((n2 * self.N, self.s))
        # Initial conditions
        BigCol[:n2, :] = Xi[0, :, :]
        # Forcing terms
        BigCol[n2:, :] = terms.reshape((n2 * (self.N - 1), self.s))
        # Perform the multiplication and reshape into the Xi array
        Xi = numpy.einsum('sij,js->is', self.BigMat, BigCol).reshape((self.N, n2, self.s))
        return Xi
The second version does precisely the same thing as the first one. Sadly, this second version takes almost 5x the running time of the previous one. My guess is that the cost of assembly of the Big Matrix is too great...
Any ideas?
Thanks!

How can I scale a set of 2D arrays (3D array) by a 2D array in a vectorized way using NumPy?

I have a 3D matrix containing N x N covariance matrices for M channels [M x N x N]. I also have a 2D matrix of scaling factors for each channel at a series of time points [M x T]. I want to produce a 4D matrix containing a scaled version of the relevant channel's covariance at each time point. So to be clear, [M x T] * [M x N x N] -> [M x T x N x N]
Current version using for loops:
m, t, n = 4, 10, 7
channel_timeseries = np.zeros((m, t))
covariances = np.random.rand(m, n, n)
result_array = np.zeros((m, t, n, n))

# Each channel
for i, (channel_cov, channel_timeseries) in enumerate(zip(covariances, channel_timeseries)):
    # Each time point
    for j, time_point in enumerate(channel_timeseries):
        result_array[i, j] = time_point * channel_cov
This should lead to the result array being all zeros. Replacing the initialisation of the channel_timeseries with np.ones, we should see the covariance for each channel replicated unchanged at every step of the time series.
The case which actually matters to me is one in which every channel has a scalar value at every time point and we scale the covariance matrix for the relevant channel by the value matching the correct channel and time point.
As you can see above, I can do this with a for loop and it works completely fine, but I'm working with some huge datasets and it would be better to have a vectorised solution.
Many thanks for your time.
You can use np.einsum, as b-fg said
np.einsum('mt,mno->mtno', channel_timeseries, covariances)
or Broadcasting:
channel_timeseries[:, :, None, None] * covariances[:, None, :, :]
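A quick check (a sketch using the shapes from the question) that the two one-liners agree and produce the expected [M x T x N x N] shape:

import numpy as np

m, t, n = 4, 10, 7
channel_timeseries = np.random.rand(m, t)
covariances = np.random.rand(m, n, n)

einsum_result = np.einsum('mt,mno->mtno', channel_timeseries, covariances)
broadcast_result = channel_timeseries[:, :, None, None] * covariances[:, None, :, :]
assert np.allclose(einsum_result, broadcast_result)
assert einsum_result.shape == (m, t, n, n)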
numpy.einsum will come in handy here. I have modified your code with a random channel_timeseries array, increased the array sizes, and renamed the loop variables (otherwise you overwrite the original ones!):
import numpy as np
import time

m, t, n = 40, 100, 70
channel_timeseries = np.random.rand(m, t)
covariances = np.random.rand(m, n, n)

t0 = time.time()
result_array_1 = np.zeros((m, t, n, n))
# Each channel
for i, (c_cov, c_ts) in enumerate(zip(covariances, channel_timeseries)):
    # Each time point
    for j, time_point in enumerate(c_ts):
        result_array_1[i, j] = time_point * c_cov
t1 = time.time()

result_array_2 = np.einsum('ij,ikl->ijkl', channel_timeseries, covariances)
t2 = time.time()

print(np.array_equal(result_array_1, result_array_2))  # True
print('Time for result_array_1: ', t1 - t0)  # 0.07601261138916016
print('Time for result_array_2: ', t2 - t1)  # 0.02957916259765625
This results in a speed-up of more than 50% with numpy.einsum on my machine.
