How to efficiently compute logsumexp of upper triangle in a nested loop? - python

I have a nested for loop that iterates over the rows of a weight matrix and applies logsumexp to the upper triangular portion of the outer addition matrix built from those weight rows. It is very slow, so I'm trying to figure out how to speed it up, either by vectorizing it or by replacing the loops with matrix operations.
'''
Wm: weights matrix, n x k
W: updated weights matrix, n x n
triu_inds: upper triangular indices of the Wxy outer matrix
'''
for x in range(n - 1):
    wx = Wm[x, :]
    for y in range(x + 1, n):
        wy = Wm[y, :]
        Wxy = np.add.outer(wx, wy)
        Wxy = Wxy[triu_inds]
        W[x, y] = logsumexp(Wxy)
logsumexp computes the log of the sum of exponentials of an input array:
a = [1, 2, 3]
logsumexp(a) = log(exp(1) + exp(2) + exp(3))
The input data Wm is a weights matrix of n x k dimensions. k represents a patient's sensor locations and n represents all such possible sensor locations. The values in Wm are basically how close a patient's sensor is to a known sensor.
example:
Wm = [[ 1  2  3]
      [ 4  5  6]
      [ 7  8  9]
      [10 11 12]]
wx = [1 2 3]
wy = [4 5 6]
Wxy = [[5 6 7]
       [6 7 8]
       [7 8 9]]
triu_inds = ([0, 0, 1], [1, 2, 2])
Wxy[triu_inds] = [6, 7, 8]
logsumexp(Wxy[triu_inds]) = log(exp(6) + exp(7) + exp(8))

You can perform the outer addition on the full matrix Wm and then swap the axes corresponding to the columns of operand 1 and the rows of operand 2, so that the triangle indices can be applied to the last two axes. The resulting matrix is filled for all combinations of rows, so you need to select the upper triangular part afterwards.
W = logsumexp(
    np.add.outer(Wm, Wm).swapaxes(1, 2)[(slice(None),) * 2 + triu_inds],
    axis=-1  # perform the summation over the last axis
)
W = np.triu(W, k=1)
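As a quick self-check (my addition, not part of the original answer), the vectorized expression can be compared against the original loop; this assumes triu_inds = np.triu_indices(k, 1), matching the strictly-upper-triangle indices in the question's example, and logsumexp from scipy.special:

import numpy as np
from scipy.special import logsumexp

np.random.seed(0)
n, k = 6, 4
Wm = np.random.rand(n, k)
triu_inds = np.triu_indices(k, 1)  # strictly upper triangle, as in the example

# vectorized version
W_vec = logsumexp(
    np.add.outer(Wm, Wm).swapaxes(1, 2)[(slice(None),) * 2 + triu_inds],
    axis=-1,
)
W_vec = np.triu(W_vec, k=1)

# original nested loop
W_loop = np.zeros((n, n))
for x in range(n - 1):
    for y in range(x + 1, n):
        W_loop[x, y] = logsumexp(np.add.outer(Wm[x], Wm[y])[triu_inds])

assert np.allclose(W_vec, W_loop)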

Related

How to calculate pairwise inside a matrix

How to perform pair operations on more than 2 lists
Example
If my matrix has 2 lists (L, M), I calculate the dot products and the result is [[M.M, M.L], [L.M, L.L]].
How do I calculate the same operation for matrices that have more than 2 lists, so that the result is a symmetric matrix?
x = np.array([[1, 3, 5], [1, 4, 5], [2, 6, 10]])
How do I perform this pairwise analysis?
Solution 1: An alternative to the brute force of Solution 2 below is np.einsum, but it is not a simple function to use. This page explains how to use it: https://ajcr.net/Basic-guide-to-einsum/. See Solution 2 for how matrix is defined.
np.einsum('ij,jk', matrix, matrix.T)
Out[35]:
array([[35, 38],
       [38, 42]])

matrix = np.array([L, M, N])  # matrix with 3 lists
np.einsum('ij,jk', matrix, matrix.T)
Out[37]:
array([[ 35,  38,  70],
       [ 38,  42,  76],
       [ 70,  76, 140]])
Solution 2 for smaller matrices. Explanation below:
def dot_pairwise(matrix):
    return [[np.dot(i, j) for j in matrix] for i in matrix]

dot_pairwise(matrix)
Explanation:
import numpy as np

L = np.array([1, 3, 5])
M = np.array([1, 4, 5])
N = np.array([2, 6, 10])
matrix = np.array([L, M, N])  # matrix with 3 lists
# matrix = np.array([L, M])  # matrix with 2 lists to replicate your example

# Initialize an empty result list
result = []
for i in matrix:
    row = []  # Initialize an empty row
    for j in matrix:
        # Calculate the dot product between the ith and jth lists using numpy.dot
        print(i, j)  # to print the matrices
        dot_product = np.dot(i, j)
        row.append(dot_product)  # Add the dot product to the row
    result.append(row)  # Add the row to the result

print(result)  # [[LL, LM, LN], [ML, MM, MN], [NL, NM, NN]]
This is the result using the L, M matrix:
[1 3 5] [1 3 5]   LL
[1 3 5] [1 4 5]   LM
[1 4 5] [1 3 5]   ML
[1 4 5] [1 4 5]   MM
[[35, 38], [38, 42]]  # dot products
An alternative from this answer, slightly changed:
np.tensordot(x, x, axes=(1, 1))
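All three approaches compute the same Gram matrix, x @ x.T; here is a quick equivalence check (my addition, not part of the original answers):

import numpy as np

x = np.array([[1, 3, 5], [1, 4, 5], [2, 6, 10]])

a = np.einsum('ij,jk', x, x.T)       # einsum formulation
b = np.tensordot(x, x, axes=(1, 1))  # tensordot formulation
c = x @ x.T                          # plain matrix product

assert np.array_equal(a, b) and np.array_equal(b, c)
print(c)
# [[ 35  38  70]
#  [ 38  42  76]
#  [ 70  76 140]]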

Fastest way to find the maximum minimum value of two 'connected' matrices

I want to maximize the following function:
f(i, j, k) = min(A(i, j), B(j, k))
Where A and B are matrices and i, j and k are indices that range up to the respective dimensions of the matrices. I would like to find (i, j, k) such that f(i, j, k) is maximized. I am currently doing that as follows:
import numpy as np
import itertools
shape_a = (100, 150)
shape_b = (shape_a[1], 200)
A = np.random.rand(shape_a[0], shape_a[1])
B = np.random.rand(shape_b[0], shape_b[1])
# All the different i,j,k
combinations = itertools.product(np.arange(shape_a[0]), np.arange(shape_a[1]), np.arange(shape_b[1]))
combinations = np.asarray(list(combinations))
A_vals = A[combinations[:, 0], combinations[:, 1]]
B_vals = B[combinations[:, 1], combinations[:, 2]]
f = np.min([A_vals, B_vals], axis=0)
best_indices = combinations[np.argmax(f)]
print(best_indices)
[ 49 14 136]
This is faster than iterating over all (i, j, k), but a lot of (in fact most of) the time is spent constructing the A_vals and B_vals arrays. This is unfortunate because they contain many duplicate values, as the same i, j and k appear multiple times. Is there a way to do this where (1) the speed of numpy's vectorized computation is preserved and (2) I don't have to construct the memory-intensive A_vals and B_vals arrays?
In other languages you could perhaps construct the matrices so that they contain pointers to A and B, but I do not see how to achieve this in Python.
Perhaps you could re-evaluate how you look at the problem in the context of what min and max actually do. Say you have the following concrete example:
>>> np.random.seed(1)
>>> print(A := np.random.randint(10, size=(4, 5)))
[[5 8 9 5 0]
 [0 1 7 6 9]
 [2 4 5 2 4]
 [2 4 7 7 9]]
>>> print(B := np.random.randint(10, size=(5, 3)))
[[1 7 0]
 [6 9 9]
 [7 6 9]
 [1 0 1]
 [8 8 3]]
You are looking for a pair of numbers, one from A and one from B, such that the column index in A equals the row index in B, and whose smaller member is as large as possible.
For any set of numbers, the largest pairwise minimum happens when you take the two largest numbers. You are therefore looking for the maximum of each column of A and each row of B, then the minimum of each such pair, and finally the maximum over those minima. Here is a relatively simple formulation of the solution:
candidate_i = A.argmax(axis=0)
candidate_k = B.argmax(axis=1)
j = np.minimum(A[candidate_i, np.arange(A.shape[1])],
               B[np.arange(B.shape[0]), candidate_k]).argmax()
i = candidate_i[j]
k = candidate_k[j]
And indeed, you see that
>>> i, j, k
(0, 2, 2)
>>> A[i, j]
9
>>> B[j, k]
9
If there are collisions, argmax will always pick the first option.
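As a sanity check (my addition, not part of the original answer), the candidate-based solution can be compared against a brute-force evaluation of f via broadcasting on small random inputs; on ties the indices may differ, but the objective value must match:

import numpy as np

def maximin_fast(A, B):
    # best row of A per column j, best column of B per row j
    ci = A.argmax(axis=0)
    ck = B.argmax(axis=1)
    j = np.minimum(A[ci, np.arange(A.shape[1])],
                   B[np.arange(B.shape[0]), ck]).argmax()
    return ci[j], j, ck[j]

def maximin_brute(A, B):
    # f[i, j, k] = min(A[i, j], B[j, k]) materialized via broadcasting
    f = np.minimum(A[:, :, None], B[None, :, :])
    return np.unravel_index(f.argmax(), f.shape)

rng = np.random.default_rng(0)
for _ in range(100):
    A = rng.random((5, 6))
    B = rng.random((6, 7))
    i, j, k = maximin_fast(A, B)
    bi, bj, bk = maximin_brute(A, B)
    assert np.isclose(min(A[i, j], B[j, k]), min(A[bi, bj], B[bj, bk]))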
Your values i, j, k are determined by the index of the maximum value from the set {A, B}. You can simply use np.argmax():
if np.max(A) < np.max(B):
    ind = np.unravel_index(np.argmax(A), A.shape)
else:
    ind = np.unravel_index(np.argmax(B), B.shape)
This returns only two of the three indices: (i, j) when the global maximum lies in B (so the minimum is capped by A), or (j, k) when it lies in A. If, for example, you get (i, j), then k can be any value that fits the shape of array B, so select one of those values at random.
If you also need to maximize the other value, then:
if np.max(A) < np.max(B):
    ind = np.unravel_index(np.argmax(A), A.shape)
    ind = ind + (np.argmax(B[ind[1], :]),)
else:
    ind = np.unravel_index(np.argmax(B), B.shape)
    ind = (np.argmax(A[:, ind[0]]),) + ind

Needing to assess smaller 3D arrays in larger 3D array with Numpy

I have to take a random integer 50x50x50 array and determine which contiguous 3x3x3 cube within it has the largest sum.
It seems like a lot of the splitting features in Numpy don't work well unless the smaller cubes divide evenly into the larger one. To work through the thought process, I made a 48x48x48 cube that simply counts in order from 0 to 110,591. I was then thinking of reshaping it into a 4D array with the following code and assessing which of the sub-arrays has the largest sum. When I enter this code, though, it splits the array in an order that is not ideal: I want the first sub-array to be the 3x3x3 cube that would have sat in the corner of the 48x48x48 cube. Is there a syntax I can add to make this happen?
import numpy as np
arr1 = np.arange(0, 110592)
arr2 = np.reshape(arr1, (48, 48, 48))
arr3 = np.reshape(arr2, (4096, 3, 3, 3))
arr3
output:
array([[[[ 0,  1,  2],
         [ 3,  4,  5],
         [ 6,  7,  8]],

        [[ 9, 10, 11],
         [12, 13, 14],
         [15, 16, 17]],

        [[18, 19, 20],
         [21, 22, 23],
         [24, 25, 26]]],
desired output:
array([[[[ 0,  1,  2],
         [48, 49, 50],
         [96, 97, 98]],
etc etc
Solution
There's a live version of this solution online you can try for yourself
There's a simple (kind of) solution to your original problem of finding the maximum 3x3x3 subcube in a 50x50x50 cube that's based on changing the input array's strides. This solution is completely vectorized (meaning no looping), and so should get the best possible performance out of Numpy:
import numpy as np

def cubecube(arr, cshape):
    strides = (*arr.strides, *arr.strides)
    shape = (*np.array(arr.shape) - cshape + 1, *cshape)
    return np.lib.stride_tricks.as_strided(arr, shape=shape, strides=strides)

def maxcube(arr, cshape):
    cc = cubecube(arr, cshape)
    ccsums = cc.sum(axis=tuple(range(-arr.ndim, 0)))
    ix = np.unravel_index(np.argmax(ccsums), ccsums.shape)[:arr.ndim]
    return ix, cc[ix]
The maxcube function takes an array and the shape of the subcubes, and returns a tuple of (first-index-of-largest-cube, largest-cube). Here's an example of how to use maxcube:
shape = (50, 50, 50)
cshape = (3, 3, 3)
# set up a 50x50x50 array
arr = np.arange(np.prod(shape)).reshape(*shape)
# set one of the subcubes as the largest
arr[37, 26, 11] = 999999
ix, cube = maxcube(arr, cshape)
print('first index of largest cube: {}'.format(ix))
print('largest cube:\n{}'.format(cube))
which outputs:
first index of largest cube: (37, 26, 11)
largest cube:
[[[999999  93812  93813]
  [ 93861  93862  93863]
  [ 93911  93912  93913]]

 [[ 96311  96312  96313]
  [ 96361  96362  96363]
  [ 96411  96412  96413]]

 [[ 98811  98812  98813]
  [ 98861  98862  98863]
  [ 98911  98912  98913]]]
In depth explanation
A cube of cubes
What you have is a 48x48x48 cube, but what you want is a cube of smaller cubes. One can be converted to the other by altering its strides. For a 48x48x48 array of dtype int64, the stride will originally be set as (48*48*8, 48*8, 8). The first value of each non-overlapping 3x3x3 subcube can be iterated over with a stride of (3*48*48*8, 3*48*8, 3*8). Combine these strides to get the strides of the cube of cubes:
# Set up a 48x48x48 array, like in OP's example
arr = np.arange(48**3).reshape(48,48,48)
shape = (16,16,16,3,3,3)
strides = (3*48*48*8, 3*48*8, 3*8, 48*48*8, 48*8, 8)
# restride into a 16x16x16 array of 3x3x3 cubes
arr2 = np.lib.stride_tricks.as_strided(arr, shape=shape, strides=strides)
arr2 is a view of arr (meaning they share data, so no copy needs to be made) with a shape of (16, 16, 16, 3, 3, 3). The ijk-th 3x3x3 subcube of arr can be accessed by passing the indices to arr2:
i,j,k = 0,0,0
print(arr2[i,j,k])
Output:
[[[   0    1    2]
  [  48   49   50]
  [  96   97   98]]

 [[2304 2305 2306]
  [2352 2353 2354]
  [2400 2401 2402]]

 [[4608 4609 4610]
  [4656 4657 4658]
  [4704 4705 4706]]]
You can get the sums of all of the subcubes by summing across the inner axes:
sumOfSubcubes = arr2.sum(axis=(3, 4, 5))
This will yield a 16x16x16 array in which each value is the sum of a non-overlapping 3x3x3 subcube from your original array. This solves the specific problem about the 48x48x48 array that the OP asked about. Restriding can also be used to find all of the overlapping 3x3x3 cubes, as in the cubecube function above.
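For example (my addition), the cubecube function from earlier in this answer yields the sums of all overlapping 3x3x3 windows of a 50x50x50 array as a 48x48x48 array:

arr = np.arange(50 ** 3).reshape(50, 50, 50)
cc = cubecube(arr, (3, 3, 3))      # view of shape (48, 48, 48, 3, 3, 3)
overlapping_sums = cc.sum(axis=(3, 4, 5))
print(overlapping_sums.shape)      # (48, 48, 48)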
Your thought process with the 48x48x48 cube goes in the right direction insofar as there are 48³ contiguous 3x3x3 cubes within the 50x50x50 array, though I don't understand why you would want to reshape it.
What you could do is add up all 27 values of each 3x3x3 cube into a 48x48x48 array by iterating over all 27 shifted slices, and then find the maximum over that array. The entry found gives you the index tuple coordinate_index of the cube corner that is closest to the origin of your original array.
import numpy as np

np.random.seed(0)
array_shape = np.array((50, 50, 50), dtype=int)
cube_dim = np.array((3, 3, 3), dtype=int)
# random integer array (the upper bound 100 is arbitrary)
original_array = np.random.randint(0, 100, size=array_shape)
reduced_shape = array_shape - cube_dim + 1
sum_array = np.zeros(reduced_shape, dtype=int)
for i in range(cube_dim[0]):
    for j in range(cube_dim[1]):
        for k in range(cube_dim[2]):
            # each shifted view has shape reduced_shape
            sum_array += original_array[i:i + reduced_shape[0],
                                        j:j + reduced_shape[1],
                                        k:k + reduced_shape[2]]
flat_index = np.argmax(sum_array)
coordinate_index = np.unravel_index(flat_index, reduced_shape)
This method should be faster than looping over each of the 48³ index combinations to find the desired cube, since it uses in-place summation, but it requires more memory in turn. I'm not sure about it, but defining a (48³, 27) array of slices and using np.sum over the second axis could be even faster.
You can easily change the above code to find a cuboid with arbitrary side lengths instead.
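As a sketch of that (48³, 27) idea (my addition; it relies on np.lib.stride_tricks.sliding_window_view, available in NumPy 1.20+, and the reshape copies the windowed data):

import numpy as np

a = np.random.randint(0, 100, size=(50, 50, 50))
# view of shape (48, 48, 48, 3, 3, 3) over all overlapping 3x3x3 windows
windows = np.lib.stride_tricks.sliding_window_view(a, (3, 3, 3))
sums = windows.reshape(-1, 27).sum(axis=1)  # the (48**3, 27) formulation
coordinate_index = np.unravel_index(sums.argmax(), (48, 48, 48))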
This is a solution without many numpy functions, just numpy.sum. First define a cubic array a, and then the size cs of the cube you are going to perform the summation within. Just change cs to adjust the cube size and find other solutions. Following @Divakar's suggestion, I have used a 4x4x4 array, and I also store the location of the cube (just the vertex of the cube closest to the origin).
import numpy as np

np.random.seed(0)
a = np.random.randint(0, 9, (4, 4, 4))
print(a)

cs = 2  # cube size
my_sum = 0
idx = None
for i in range(a.shape[0] - cs + 1):
    for j in range(a.shape[1] - cs + 1):
        for k in range(a.shape[2] - cs + 1):
            cube_sum = np.sum(a[i:i + cs, j:j + cs, k:k + cs])
            # print(cube_sum)  # uncomment to see each cube's sum
            if cube_sum > my_sum:
                my_sum = cube_sum
                idx = (i, j, k)
print(my_sum, idx)  # 42 (0, 0, 0)
This 3D array a is
[[[5 0 3 3]
  [7 3 5 2]
  [4 7 6 8]
  [8 1 6 7]]

 [[7 8 1 5]
  [8 4 3 0]
  [3 5 0 2]
  [3 8 1 3]]

 [[3 3 7 0]
  [1 0 4 7]
  [3 2 7 2]
  [0 0 4 5]]

 [[5 6 8 4]
  [1 4 8 1]
  [1 7 3 6]
  [7 2 0 3]]]
You get my_sum = 42 and idx = (0, 0, 0) for cs = 2, and my_sum = 112 and idx = (1, 0, 0) for cs = 3.
Here is a cumsum based fast solution:
import numpy as np
nd = 3
cs = 3
N = 50
# create indices [cs-1:, ...], [:, cs-1:, ...], ...
fromcsm = *zip(*np.where(np.identity(nd, bool), np.s_[cs-1:], np.s_[:])),
# create indices [cs:, ...], [:, cs:, ...], ...
fromcs = *zip(*np.where(np.identity(nd, bool), np.s_[cs:], np.s_[:])),
# create indices [:cs, ...], [:, :cs, ...], ...
tocs = *zip(*np.where(np.identity(nd, bool), np.s_[:cs], np.s_[:])),
# create indices [:-cs, ...], [:, :-cs, ...], ...
tomcs = *zip(*np.where(np.identity(nd, bool), np.s_[:-cs], np.s_[:])),
# create indices [cs-1, ...], [:, cs-1, ...], ...
atcsm = *zip(*np.where(np.identity(nd, bool), cs-1, np.s_[:])),
def windowed_sum(a):
    out = a.copy()
    for i, (fcsm, fcs, tcs, tmcs, acsm) in enumerate(
            zip(fromcsm, fromcs, tocs, tomcs, atcsm)):
        out[fcs] -= out[tmcs]
        out[acsm] = out[tcs].sum(axis=i)
        out = out[fcsm].cumsum(axis=i)
    return out
This returns the sums over all the sub cubes. We can then use argmax and unravel_index to get the offset of the maximum cube. Example:
np.random.seed(0)
a = np.random.randint(0, 9, (N, N, N))
s = windowed_sum(a)
idx = np.unravel_index(np.argmax(s), s.shape)
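To validate the result (my addition; sliding_window_view needs NumPy 1.20+), the windowed sums can be compared against a direct sliding-window computation:

from numpy.lib.stride_tricks import sliding_window_view

expected = sliding_window_view(a, (cs, cs, cs)).sum(axis=(3, 4, 5))
assert np.array_equal(s, expected)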

Tensorflow scan multiple matrix rows with offset

Question
I want to scan a matrix, analogous to TensorFlow's tf.scan(), but using multiple rows at a time. So given an [m, n] matrix, I want to be able to iterate over its m rows (each with n elements) from i + j to m, giving m - j slices of shape [i - j, n].
How can this be achieved?
I know tf.scan does something like this, returning the accumulated value of each iteration. But I don't think passing shifted copies of the matrix as multiple inputs solves this, since the values at an offset cannot be precomputed.
Example
To give an example for n = 3 and m = 5, let's say I have a matrix that looks like the following:
# [[1 0 0]
#  [1 1 0]
#  [0 0 0]  row 3
#  [0 0 0]  row 4
#  [0 0 0]] row 5
matrix_shape = [5, 3]
matrix_idx = tf.constant([[0, 0], [1, 0], [1, 1]])
matrix = tf.scatter_nd(matrix_idx,
                       tf.ones(tf.shape(matrix_idx)[0],
                               dtype=tf.int32),
                       matrix_shape)
I want to apply the following function from row 3 to row 5:
# [[ 1  0  0] ┌ a
#  [ 1  1  0] ├ b
#  [ 6  4  2] <─┴ output / current line
#  [16 12  6]
#  [46 34 18]]
def compute(x):
    a = x[0]
    b = x[1]
    return (a + b + 1) * 2
Does Tensorflow have a function specific to this problem?
The following code I wrote does exactly what I wanted.
The important part here is the return value of the function used by tf.scan, which not only gives back the current computation c but also the row from the previous step, b. It is therefore important to later cut this excess off the computation by selecting only the latter tensor of this tuple with [1].
#!/usr/bin/env python3
import tensorflow as tf

def compute(x, _):
    a = x[0]
    b = x[1]
    c = (a + b + 1) * 2
    return (b, c)

matrix_shape = tf.constant([3, 3])
init_data = [[1, 0, 0], [1, 1, 0]]
initializer = (
    tf.constant(init_data[0]),
    tf.constant(init_data[1]),
)
matrix = tf.zeros(matrix_shape, dtype=tf.int32)
computation = tf.scan(compute, matrix, initializer)[1]
result = tf.concat((tf.constant(init_data), computation), axis=0)

with tf.Session() as sess:
    sess.run(result)
    print(result.eval())
Since I still lack experience: might this solution be bad for performance, because the function returns a tuple and therefore doesn't make use of TensorFlow's speed optimizations?

Extract n columns with highest sum in NumPy array

Imagine I have a NumPy matrix with 100 rows and 1000 columns.
How do I get a new matrix composed of the n columns that have the highest sums in the original matrix?
You can use np.argsort, as done by @NPE here. Here's an example on two smaller arrays:
import numpy as np

def nlargest_cols(a, n):
    return a[:, sorted(a.sum(axis=0).argsort()[-n:][::-1])]

# `a` is a 3x4 array with column sums getting
# larger from left to right.
a = np.arange(12).reshape(3, 4)
# `b` is `a` rotated 2 turns.
b = np.rot90(a, 2)

print(nlargest_cols(a, 2))
# [[ 2  3]
#  [ 6  7]
#  [10 11]]

print(nlargest_cols(b, 3))
# [[11 10  9]
#  [ 7  6  5]
#  [ 3  2  1]]
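If only the top-n columns are needed, a full sort is unnecessary; a possible alternative (my addition, not part of the original answer) is np.argpartition, which selects the n largest column sums in linear time:

import numpy as np

def nlargest_cols_partition(a, n):
    # indices of the n largest column sums, kept in original column order
    top = np.argpartition(a.sum(axis=0), -n)[-n:]
    return a[:, np.sort(top)]

a = np.arange(12).reshape(3, 4)
print(nlargest_cols_partition(a, 2))
# [[ 2  3]
#  [ 6  7]
#  [10 11]]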
