Normalize Numpy Upper-triangular subarray - python

I have a 4-dimensional array that is upper-triangular in its first two axes. It is initialized as
N, Q = (99, 23)
bivariate = np.zeros((N,N,Q,Q))
and then populated by something like
for i in range(N):
    for j in range(i+1, N):
        bivariate[i, j] = num
I want each upper-triangular element to be a normalized (Q,Q) matrix. I am currently doing this with
bivariate /= bivariate.sum(axis=3).sum(axis=2)[:,:,np.newaxis,np.newaxis]
but I get RuntimeWarnings because the all-zero (Q,Q) blocks of the lower-triangular portion are divided by their zero sums. Is there a better way to do this than the following?
for i in range(N):
    for j in range(i+1, N):
        bivariate[i, j] /= bivariate[i, j].sum()
Thanks.

If you're concerned about getting np.nan, you could replace the zero entries of your normalization factor with 1:
norm_factor = bivariate.sum(axis=3).sum(axis=2)[:,:,None,None]
bivariate /= np.where(norm_factor, norm_factor, 1)
At least you'll avoid the for loops...

FWIW, I've found it much easier to work on the upper-triangular portion separately and then insert it back in:
triu = np.triu_indices(N, 1)
upper_tri = bivariate[triu].reshape(-1, Q*Q)
upper_tri /= upper_tri.sum(axis=1)[:, None]
bivariate[triu] = upper_tri.reshape(-1, Q, Q)
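A quick sanity check on one block (a sketch, assuming bivariate was populated with positive values as in the question): every upper-triangular (Q,Q) block should now sum to 1.
i, j = 0, 1
print(np.isclose(bivariate[i, j].sum(), 1.0))  # expect True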

Related

3D tensor of diagonal matrices

I have a matrix A with m rows and n columns. I want a 3D tensor consisting of n diagonal matrices, one m*m matrix for each of the n columns of A. In other words, every column of A should be turned into a diagonal matrix, and all those matrices together should form a 3D tensor.
This is quite easy to do with a for loop, but I want to do it without one to improve speed.
I came up with a bad and inefficient way which works, but I hope someone can help me find a better way that scales to large A matrices.
import numpy as np
n = A.shape[0]  # A is an n*k matrix
k = A.shape[1]
holding_matrix = np.repeat(np.identity(k), repeats=n, axis=1)  # k rows with n*k columns
identity_stack = np.tile(np.identity(n), k)  # k n*n identity matrices stacked side by side
B = np.array((A @ holding_matrix) * identity_stack)
B = np.array(np.hsplit(B, k))  # desired result: k n*n diagonal matrices in a tensor
You can avoid the matrix products entirely by writing each column of A onto a strided diagonal of a preallocated zero array:
n = A.shape[0]  # A.shape == (n, k)
k = A.shape[1]
B = np.zeros_like(A, shape=(k, n*n))  # preserves the dtype and memory order of A
B[:, ::(n+1)] = A.T  # every (n+1)-th entry of a flattened n*n block lies on the diagonal
B = B.reshape(k, n, n)
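As a quick sanity check (a sketch with a small, hypothetical A), the strided version should match the straightforward loop over columns:
import numpy as np
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 3))  # n = 4 rows, k = 3 columns
n, k = A.shape
B = np.zeros_like(A, shape=(k, n*n))
B[:, ::(n+1)] = A.T
B = B.reshape(k, n, n)
B_loop = np.stack([np.diag(A[:, j]) for j in range(k)])  # the for-loop version
print(np.allclose(B, B_loop))  # expect True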

Vectorizing numpy calculation without a tensor dot product

I would like to vectorize a particular case of the following mathematical formula (from Table 2 and Appendix A of this paper) with numpy:
The case I would like to compute is the following, where the scaling factors under the square root can be ignored.
The term w_kij - w_ij_bar is an n x p x p array, where n is typically much greater than p.
I implemented 2 solutions, neither of which is particularly good: one involves a double loop, while the other fills memory with unnecessary calculations very quickly.
dummy_data = np.random.normal(size=(100, 5, 5))
# approach 1: a double loop
out_hack = np.zeros((5, 5))
for i in range(5):
    for j in range(5):
        out_hack[i, j] = (dummy_data.T[j, j, :] * dummy_data[:, j, i]).sum()
# approach 2: slicing a diagonal from a tensor dot product
out = np.tensordot(dummy_data.T, dummy_data, axes=1)
out = out.diagonal(0, 0, 2).diagonal(0, 0, 2)
print((out.round(6) == out_hack.round(6)).all())
>>> True
Is there a way to find middle ground between these 2 approaches?
np.einsum translates that almost literally -
np.einsum('kjj,kji->ij', dummy_data, dummy_data)
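For reference, a quick check that the einsum expression reproduces the double-loop result (reusing dummy_data and out_hack from the question):
out_es = np.einsum('kjj,kji->ij', dummy_data, dummy_data)
print(np.allclose(out_es, out_hack))  # expect True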

Convert numpy array to sparse matrix to find inverse and then convert back to numpy array

In the following function, np.linalg.inv seems to take forever when Nx and Nt get large. In my mind I know I should instead use sparse matrices, which are in scipy (which I've never used before), but I'm really stuck on how to convert M to a sparse matrix, find its inverse, and then convert it back to a numpy array for the for loop.
If anyone could help I'd be really grateful! Thanks!
def BTCS(phiOld, c, Nx, Nt):
    # Initiate phi for the for loop
    phi = phiOld.copy()
    # Create the matrix M for the BTCS scheme
    M = np.zeros((Nx, Nx))
    for i in range(Nx):
        M[i, (i-1) % Nx] = -c/2
        M[i, i] = 1
        M[i, (i+1) % Nx] = c/2
    # Take the inverse of M so as to have phi(n+1) = M^(-1) * phi(n)
    M_inv = np.linalg.inv(M)
    # Loop over all time steps
    for it in range(Nt):
        # Loop over space (excluding end points)
        for ix in range(1, Nx-1):
            phi[ix] = M_inv.dot(phiOld)[ix]
        # Compute boundary values using periodic boundary conditions
        phi[0] = M_inv.dot(phiOld)[0]
        phi[Nx-1] = phi[0]
        # Update old time value
        phiOld = phi.copy()
    return phi
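A minimal sketch of the sparse approach (the helper name BTCS_sparse is hypothetical): build M directly as a scipy.sparse matrix, factorize it once with scipy.sparse.linalg.splu, and solve M @ phi_new = phi_old at every time step instead of ever forming the dense inverse.
import numpy as np
from scipy.sparse import csc_matrix, diags
from scipy.sparse.linalg import splu

def BTCS_sparse(phiOld, c, Nx, Nt):
    # Tridiagonal matrix with periodic wrap-around corners, built in sparse form
    M = diags([np.full(Nx-1, -c/2), np.ones(Nx), np.full(Nx-1, c/2)],
              offsets=[-1, 0, 1], format='lil')
    M[0, Nx-1] = -c/2
    M[Nx-1, 0] = c/2
    lu = splu(csc_matrix(M))  # factorize once, reuse for every time step
    phi = phiOld.copy()
    for it in range(Nt):
        phi = lu.solve(phi)   # solves M @ phi_new = phi_old
    return phi
This reproduces the same M (up to its periodic corner entries) while avoiding both np.linalg.inv and the inner loop over ix.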

Numpy Uniform Distribution With Decay

I'm trying to construct a matrix of uniform random draws, between -1 and 1, decaying toward 0 at the same rate in each row. What I want to construct resembles:
[[0.454/exp(0) -0.032/exp(1) 0.641/exp(2)...]
[-0.234/exp(0) 0.921/exp(1) 0.049/exp(2)...]
...
[0.910/exp(0) 0.003/exp(1) -0.908/exp(2)...]]
I can build a matrix of uniform distributions using:
w = np.array([np.random.uniform(-1, 1, 10) for i in range(10)])
and can achieve the desired result using a for loop with:
for k in range(len(w)):
    for l in range(len(w[0])):
        w[k][l] = w[k][l] / np.exp(l)
but wanted to know if there was a better way of accomplishing this.
You can use numpy's broadcasting feature to do this:
w = np.random.uniform(-1, 1, size=(10, 10))
weights = np.exp(np.arange(10))
w /= weights
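Here weights has shape (10,), so the division broadcasts along the last axis and column j of w gets divided by exp(j). The same thing written with an explicit new axis:
w = np.random.uniform(-1, 1, size=(10, 10)) / np.exp(np.arange(10))[None, :]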
Alok Singhal's answer is best, but as another (perhaps more explicit) way to do this,
you can duplicate the vector [exp(0), ..., exp(9)] and stack the copies into a matrix via an outer product with a vector of ones, then divide the w matrix by this decay matrix.
n = 10
w = np.array([np.random.uniform(-1, 1, n) for i in range(n)])
decay = np.outer(np.ones(n), np.exp(np.arange(n)))
result = w / decay
You could also use np.tile for creating a matrix out of several copies of a vector. It accomplishes the same thing as the outer product trick.
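A sketch of that np.tile variant, reusing the n and w defined above:
decay = np.tile(np.exp(np.arange(n)), (n, 1))  # n stacked copies of [exp(0), ..., exp(n-1)]
result = w / decay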

Permute rows in "slices" of 3d array to match each other

I have a series of 2d arrays where the rows are points in some space. Many similar points occur across all arrays but in different row order. I want to sort the rows so they have the most similar order. Also the points are too different for clustering with K-means or DBSCAN. The problem can also be cast like this. If I stack the arrays into a 3d array, how do I permute the rows to minimize the average standard deviation (SD) along the 2nd axis? What's a good sorting algorithm for this problem?
I've tried the following approaches:
1. Create a reference 2d array and sort the rows of each array to minimize the mean Euclidean distance to that reference. I'm afraid this gives biased results.
2. Sort rows in arrays pairwise, then pairs of pair-medians, then pairs of those, etc. This doesn't really work and I'm not sure why.
A third approach could be brute-force optimization, but I try to avoid that since I have multiple sets of arrays to run the procedure on.
This is my code for the 2nd approach (Python):
def reorder_to(A, B):
    """Reorder rows in A to best match rows in B.

    Input
    -----
    A : N x M numpy.array
    B : N x M numpy.array

    Output
    ------
    perm_order : permutation order
    """
    if A.shape != B.shape:
        print("A and B must have the same shape")
        return None
    N = A.shape[0]
    # Create a matrix of distances between rows in A and rows in B
    distance_matrix = np.ones((N, N)) * np.inf
    for i, a in enumerate(A):
        for ii, b in enumerate(B):
            ba = b - a
            distance_matrix[i, ii] = np.sqrt(np.dot(ba, ba))
    # Choose the permutation order by taking the smallest distances first
    perm_order = [None] * N
    for _ in range(N):
        ind = np.argmin(distance_matrix)
        i, ii = ind // N, ind % N
        perm_order[ii] = i
        distance_matrix[i, :] = np.inf
        distance_matrix[:, ii] = np.inf
    return perm_order
def permute_tensor_rows(A):
    """Permute 1d rows in a 3d array along the 0th axis to minimize the average SD along the 2nd axis.

    Input
    -----
    A : numpy.3darray
        Each "slice" in the 2nd direction is an independent array whose rows can be permuted
        to decrease the average SD in the 2nd direction.

    Output
    ------
    A : numpy.3darray
        A with sorted rows in each "slice".
    """
    step = 2
    while step <= A.shape[2]:
        for k in range(0, A.shape[2], step):
            # If this is the last (incomplete) block, reorder it to the previous block
            if k + step > A.shape[2]:
                A_kk = A[:, :, k:(k+step)]
                kk_order = reorder_to(np.median(A_kk, axis=2), np.median(A_k, axis=2))
                A[:, :, k:(k+step)] = A[kk_order, :, k:(k+step)]
                continue
            k_0, k_1 = k, k + step//2
            kk_0, kk_1 = k + step//2, k + step
            A_k = A[:, :, k_0:k_1]
            A_kk = A[:, :, kk_0:kk_1]
            order = reorder_to(np.median(A_k, axis=2), np.median(A_kk, axis=2))
            A[:, :, k_0:k_1] = A[order, :, k_0:k_1]
        print("Step:", step, "\t ... Average SD:", np.mean(np.std(A, axis=2)))
        step *= 2
    return A
Sorry I should have looked at your code sample; that was very informative.
Seems like this here gives an out-of-the-box solution to your problem:
http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.linear_sum_assignment.html#scipy.optimize.linear_sum_assignment
Only really feasible for a few hundred points at most, though, in my experience.
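A sketch of how linear_sum_assignment could replace the greedy matching in reorder_to above (the name reorder_to_lsa is hypothetical); it solves the full assignment problem on the row-distance matrix instead of picking the smallest distances one at a time:
import numpy as np
from scipy.optimize import linear_sum_assignment

def reorder_to_lsa(A, B):
    """Return an order such that A[order] best matches B row-for-row."""
    # Pairwise Euclidean distances between rows of A and rows of B
    cost = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    row_ind, col_ind = linear_sum_assignment(cost)
    # row_ind[k] (a row of A) is assigned to col_ind[k] (a row of B)
    order = np.empty(len(A), dtype=int)
    order[col_ind] = row_ind
    return order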
