Normalize Numpy Upper-triangular subarray - python

I have a 4-dimensional array that is upper-triangular in its first two axes. It is initialized as
N, Q = (99, 23)
bivariate = np.zeros((N,N,Q,Q))
and then populated by something like
for i in range(N):
    for j in range(i+1, N):
        bivariate[i, j] = num
I want each upper-triangular element to be a normalized (Q,Q) matrix. I am currently doing this with
bivariate /= bivariate.sum(axis=3).sum(axis=2)[:,:,np.newaxis,np.newaxis]
but I get RuntimeWarnings because the all-zero (Q,Q) blocks of the lower-triangular portion are divided by their zero sums. Is there a better way to do this than the following?
for i in range(N):
    for j in range(i+1, N):
        bivariate[i, j] /= bivariate[i, j].sum()
Thanks.

If you're concerned about getting np.nan, you could replace the zero entries of your normalization factor with 1:
norm_factor = bivariate.sum(axis=3).sum(axis=2)[:,:,None,None]
bivariate /= np.where(norm_factor, norm_factor, 1)
At least you'll avoid the for loops...

FWIW, I've found it much easier to work on the upper-triangular portion separately and then insert it back in:
triu = np.triu_indices(N, 1)
upper_tri = bivariate[triu].reshape(-1, Q*Q)
upper_tri /= upper_tri.sum(axis=1)[:, None]
bivariate[triu] = upper_tri.reshape(-1, Q, Q)
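A quick sanity check on one block (a sketch, assuming bivariate was populated with positive values as in the question): every upper-triangular (Q,Q) block should now sum to 1.
i, j = 0, 1
print(np.isclose(bivariate[i, j].sum(), 1.0))  # expect True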

Related

3D tensor of diagonal matrices

I have a matrix A with m rows and n columns. I want a 3D tensor consisting of n diagonal matrices, one m*m matrix for each of the n columns of A. In other words, every column of A should be turned into a diagonal matrix, and all those matrices together should form a 3D tensor.
This is quite easy to do with a for loop, but I want to do it without one to improve speed.
I came up with a bad and inefficient way which works, but I hope someone can help me find a better way that scales to large A matrices.
import numpy as np
n = A.shape[0]  # A is an n*k matrix
k = A.shape[1]
holding_matrix = np.repeat(np.identity(k), repeats=n, axis=1)  # k rows with n*k columns
identity_stack = np.tile(np.identity(n), k)  # k n*n identity matrices stacked side by side
B = np.array((A @ holding_matrix) * identity_stack)
B = np.array(np.hsplit(B, k))  # desired result: k n*n diagonal matrices in a tensor
You can avoid the matrix products entirely by writing each column of A onto a strided diagonal of a preallocated zero array:
n = A.shape[0]  # A.shape == (n, k)
k = A.shape[1]
B = np.zeros_like(A, shape=(k, n*n))  # preserves the dtype and memory order of A
B[:, ::(n+1)] = A.T  # every (n+1)-th entry of a flattened n*n block lies on the diagonal
B = B.reshape(k, n, n)
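As a quick sanity check (a sketch with a small, hypothetical A), the strided version should match the straightforward loop over columns:
import numpy as np
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 3))  # n = 4 rows, k = 3 columns
n, k = A.shape
B = np.zeros_like(A, shape=(k, n*n))
B[:, ::(n+1)] = A.T
B = B.reshape(k, n, n)
B_loop = np.stack([np.diag(A[:, j]) for j in range(k)])  # the for-loop version
print(np.allclose(B, B_loop))  # expect True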

Vectorizing numpy calculation without a tensor dot product

I would like to vectorize a particular case of the following mathematical formula (from Table 2 and Appendix A of this paper) with numpy:
The case I would like to compute is the following, where the scaling factors under the square root can be ignored.
The term w_kij - w_ij_bar is an n x p x p array, where n is typically much greater than p.
I implemented 2 solutions, neither of which is particularly good: one involves a double loop, while the other fills memory with unnecessary calculations very quickly.
dummy_data = np.random.normal(size=(100, 5, 5))
# approach 1: a double loop
out_hack = np.zeros((5, 5))
for i in range(5):
    for j in range(5):
        out_hack[i, j] = (dummy_data.T[j, j, :] * dummy_data[:, j, i]).sum()
# approach 2: slicing a diagonal from a tensor dot product
out = np.tensordot(dummy_data.T, dummy_data, axes=1)
out = out.diagonal(0, 0, 2).diagonal(0, 0, 2)
print((out.round(6) == out_hack.round(6)).all())
>>> True
Is there a way to find middle ground between these 2 approaches?
np.einsum translates that almost literally -
np.einsum('kjj,kji->ij', dummy_data, dummy_data)
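For reference, a quick check that the einsum expression reproduces the double-loop result (reusing dummy_data and out_hack from the question):
out_es = np.einsum('kjj,kji->ij', dummy_data, dummy_data)
print(np.allclose(out_es, out_hack))  # expect True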

Convert numpy array to sparse matrix to find inverse and then convert back to numpy array

In the following function, np.linalg.inv seems to take forever when Nx and Nt get large. In my mind I know I should instead use sparse matrices, which are in scipy (which I've never used before), but I'm really stuck on how to convert M to a sparse matrix, find its inverse, and then convert it back to a numpy array for the for loop.
If anyone could help I'd be really grateful! Thanks!
def BTCS(phiOld, c, Nx, Nt):
    # Initiate phi for the for loop
    phi = phiOld.copy()
    # Create the matrix M for the BTCS scheme
    M = np.zeros((Nx, Nx))
    for i in range(Nx):
        M[i, (i-1) % Nx] = -c/2
        M[i, i] = 1
        M[i, (i+1) % Nx] = c/2
    # Take the inverse of M so as to have phi(n+1) = M^(-1) * phi(n)
    M_inv = np.linalg.inv(M)
    # Loop over all time steps
    for it in range(Nt):
        # Loop over space (excluding end points)
        for ix in range(1, Nx-1):
            phi[ix] = M_inv.dot(phiOld)[ix]
        # Compute boundary values using periodic boundary conditions
        phi[0] = M_inv.dot(phiOld)[0]
        phi[Nx-1] = phi[0]
        # Update old time value
        phiOld = phi.copy()
    return phi
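A minimal sketch of the sparse approach (the helper name BTCS_sparse is hypothetical): build M directly as a scipy.sparse matrix, factorize it once with scipy.sparse.linalg.splu, and solve M @ phi_new = phi_old at every time step instead of ever forming the dense inverse.
import numpy as np
from scipy.sparse import csc_matrix, diags
from scipy.sparse.linalg import splu

def BTCS_sparse(phiOld, c, Nx, Nt):
    # Tridiagonal matrix with periodic wrap-around corners, built in sparse form
    M = diags([np.full(Nx-1, -c/2), np.ones(Nx), np.full(Nx-1, c/2)],
              offsets=[-1, 0, 1], format='lil')
    M[0, Nx-1] = -c/2
    M[Nx-1, 0] = c/2
    lu = splu(csc_matrix(M))  # factorize once, reuse for every time step
    phi = phiOld.copy()
    for it in range(Nt):
        phi = lu.solve(phi)   # solves M @ phi_new = phi_old
    return phi
This reproduces the same M (up to its periodic corner entries) while avoiding both np.linalg.inv and the inner loop over ix.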

Numpy Uniform Distribution With Decay

I'm trying to construct a matrix of uniform random draws, between -1 and 1, decaying toward 0 at the same rate in each row. What I want to construct resembles:
[[0.454/exp(0) -0.032/exp(1) 0.641/exp(2)...]
[-0.234/exp(0) 0.921/exp(1) 0.049/exp(2)...]
...
[0.910/exp(0) 0.003/exp(1) -0.908/exp(2)...]]
I can build a matrix of uniform distributions using:
w = np.array([np.random.uniform(-1, 1, 10) for i in range(10)])
and can achieve the desired result using a for loop with:
for k in range(len(w)):
    for l in range(len(w[0])):
        w[k][l] = w[k][l] / np.exp(l)
but wanted to know if there was a better way of accomplishing this.
You can use numpy's broadcasting feature to do this:
w = np.random.uniform(-1, 1, size=(10, 10))
weights = np.exp(np.arange(10))
w /= weights
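Here weights has shape (10,), so the division broadcasts along the last axis and column j of w gets divided by exp(j). The same thing written with an explicit new axis:
w = np.random.uniform(-1, 1, size=(10, 10)) / np.exp(np.arange(10))[None, :]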
Alok Singhal's answer is best, but as another (perhaps more explicit) way to do this,
you can duplicate the vector [exp(0), ..., exp(9)] and stack the copies into a matrix via an outer product with a vector of ones, then divide the w matrix by this decay matrix.
n = 10
w = np.array([np.random.uniform(-1, 1, n) for i in range(n)])
decay = np.outer(np.ones(n), np.exp(np.arange(n)))
result = w / decay
You could also use np.tile for creating a matrix out of several copies of a vector. It accomplishes the same thing as the outer product trick.
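A sketch of that np.tile variant, reusing the n and w defined above:
decay = np.tile(np.exp(np.arange(n)), (n, 1))  # n stacked copies of [exp(0), ..., exp(n-1)]
result = w / decay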

Permute rows in "slices" of 3d array to match each other

I have a series of 2d arrays where the rows are points in some space. Many similar points occur across all arrays but in different row order. I want to sort the rows so they have the most similar order. Also the points are too different for clustering with K-means or DBSCAN. The problem can also be cast like this. If I stack the arrays into a 3d array, how do I permute the rows to minimize the average standard deviation (SD) along the 2nd axis? What's a good sorting algorithm for this problem?
I've tried the following approaches:
1. Create a reference 2d array and sort the rows of each array to minimize the mean Euclidean distance to that reference. I'm afraid this gives biased results.
2. Sort rows in arrays pairwise, then pairs of pair-medians, then pairs of those, etc. This doesn't really work and I'm not sure why.
A third approach could be brute-force optimization, but I try to avoid that since I have multiple sets of arrays to run the procedure on.
This is my code for the 2nd approach (Python):
def reorder_to(A, B):
    """Reorder rows in A to best match rows in B.

    Input
    -----
    A : N x M numpy.array
    B : N x M numpy.array

    Output
    ------
    perm_order : permutation order
    """
    if A.shape != B.shape:
        print("A and B must have the same shape")
        return None
    N = A.shape[0]
    # Create a matrix of distances between rows in A and rows in B
    distance_matrix = np.ones((N, N)) * np.inf
    for i, a in enumerate(A):
        for ii, b in enumerate(B):
            ba = b - a
            distance_matrix[i, ii] = np.sqrt(np.dot(ba, ba))
    # Choose the permutation order by taking the smallest distances first
    perm_order = [None] * N
    for _ in range(N):
        ind = np.argmin(distance_matrix)
        i, ii = ind // N, ind % N
        perm_order[ii] = i
        distance_matrix[i, :] = np.inf
        distance_matrix[:, ii] = np.inf
    return perm_order
def permute_tensor_rows(A):
    """Permute 1d rows in a 3d array along the 0th axis to minimize the average SD along the 2nd axis.

    Input
    -----
    A : numpy.3darray
        Each "slice" in the 2nd direction is an independent array whose rows can be permuted
        to decrease the average SD in the 2nd direction.

    Output
    ------
    A : numpy.3darray
        A with sorted rows in each "slice".
    """
    step = 2
    while step <= A.shape[2]:
        for k in range(0, A.shape[2], step):
            # If this is the last (incomplete) block, reorder it to the previous block
            if k + step > A.shape[2]:
                A_kk = A[:, :, k:(k+step)]
                kk_order = reorder_to(np.median(A_kk, axis=2), np.median(A_k, axis=2))
                A[:, :, k:(k+step)] = A[kk_order, :, k:(k+step)]
                continue
            k_0, k_1 = k, k + step//2
            kk_0, kk_1 = k + step//2, k + step
            A_k = A[:, :, k_0:k_1]
            A_kk = A[:, :, kk_0:kk_1]
            order = reorder_to(np.median(A_k, axis=2), np.median(A_kk, axis=2))
            A[:, :, k_0:k_1] = A[order, :, k_0:k_1]
        print("Step:", step, "\t ... Average SD:", np.mean(np.std(A, axis=2)))
        step *= 2
    return A
Sorry I should have looked at your code sample; that was very informative.
Seems like this here gives an out-of-the-box solution to your problem:
http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.linear_sum_assignment.html#scipy.optimize.linear_sum_assignment
Only really feasible for a few hundred points at most, though, in my experience.
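A sketch of how linear_sum_assignment could replace the greedy matching in reorder_to above (the name reorder_to_lsa is hypothetical); it solves the full assignment problem on the row-distance matrix instead of picking the smallest distances one at a time:
import numpy as np
from scipy.optimize import linear_sum_assignment

def reorder_to_lsa(A, B):
    """Return an order such that A[order] best matches B row-for-row."""
    # Pairwise Euclidean distances between rows of A and rows of B
    cost = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    row_ind, col_ind = linear_sum_assignment(cost)
    # row_ind[k] (a row of A) is assigned to col_ind[k] (a row of B)
    order = np.empty(len(A), dtype=int)
    order[col_ind] = row_ind
    return order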
