Optimise matrix factorisation algorithm using numpy matrix operations - python

O = self.feedback_df_normalised.to_numpy()  # original matrix
K = self.latent_feature_count
P = np.random.rand(len(O), K)  # user embeddings
Q = np.random.rand(len(O[0]), K)  # show embeddings
Q_T = np.transpose(Q)
for i in range(len(O)):
    print("i:", i)
    for j in range(len(O[0])):
        print("j:", j)
        A_ij = np.dot(P[i,:], Q_T[:,j])
        dif_ij = O[i, j] - A_ij
        dif_sqd += dif_ij ** 2
        for k in range(K):
            P[i, k] = P[i, k] + alpha * (2 * dif_ij * Q_T[k, j] - beta * P[i, k])
            Q_T[k, j] = Q_T[k, j] + alpha * (2 * dif_ij * P[i, k] - beta * Q_T[k, j])
    print("dif_sqd:", dif_sqd)
    if dif_sqd < accepted_deviation:
        A = P @ Q_T
        break
I have this algorithm implementing matrix factorisation via gradient descent, based on this one:
https://towardsdatascience.com/recommendation-system-matrix-factorization-d61978660b4b#:~:text=Collaborative%20filtering%20is%20the%20application,items'%20and%20users'%20entities.&text=Hence%2C%20from%20the%20matrix%20factorization,in%20user's%20preferences%20and%20interactions.
Iterative relationship the algorithm aims to implement
The general format of O is something like this:
O = [
    [5,3,0,1],
    [4,0,0,1],
    [1,1,0,5],
    [1,0,0,4],
    [0,1,5,4],
    [2,1,3,0],
]
When O becomes large this becomes veerrrrryyyy slow to execute though. I've been banging my head trying to think about how to do this sans looping, but I'm not good enough with matrices to figure it out. Any help would be appreciated.
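One possible way to drop the loops, sketched under the assumption that a full-batch gradient step is acceptable: the whole error matrix is computed at once and every row of P and Q is updated together. This is not identical to the element-by-element updates above, which reuse freshly updated values within a sweep, so the iterates will differ. The function name `factorise_batch`, the `steps` cap, the `history` list and the default values of `alpha`, `beta` and `accepted_deviation` are illustrative assumptions, not taken from the question.

```python
import numpy as np

def factorise_batch(O, K, alpha=0.002, beta=0.02, steps=5000,
                    accepted_deviation=1e-3, seed=0):
    """Full-batch gradient descent for O ~ P @ Q.T (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    O = np.asarray(O, dtype=float)
    P = rng.random((O.shape[0], K))   # user embeddings
    Q = rng.random((O.shape[1], K))   # show embeddings
    history = []                      # total squared error at each step
    for _ in range(steps):
        E = O - P @ Q.T               # every dif_ij at once
        history.append(float(np.sum(E ** 2)))
        if history[-1] < accepted_deviation:
            break
        # Both updates use the same E and the old P and Q
        P, Q = (P + alpha * (2 * E @ Q - beta * P),
                Q + alpha * (2 * E.T @ P - beta * Q))
    return P, Q, history

O = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
    [0, 1, 5, 4],
    [2, 1, 3, 0],
])
P, Q, history = factorise_batch(O, K=2)
print("first / last squared error:", history[0], history[-1])
```

Like the loop version, this fits every entry of O, zeros included. If zeros mean "no rating", E could be multiplied by a 0/1 mask before the updates.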

Related

Is there any way to optimize a triple loop in Python by using numpy or other ressources?

I'm having trouble finding a way to optimize a triple loop in Python. I'll give the code directly, since it is the simplest representation of what I have to compute:
Given two 2-D arrays named samples (M x N) and D (N x N), along with the output results (N x N):
for sigma in range(M):
    for i in range(N):
        for j in range(N):
            results[i, j] += (1/N) * (samples[sigma, i]*samples[sigma, j]
                                      - samples[sigma, i]*D[j, i]
                                      - samples[sigma, j]*D[i, j])
return results
It does the job but is not efficient at all in Python. I tried to unroll the for i / for j loops, but I cannot compute it correctly with sigma in the way.
Does someone have an idea of how to optimize those few lines? Any suggestions are welcome, such as numpy, numexpr, etc.
One way I found to improve your code (i.e. reduce the number of loops) is by using np.meshgrid.
Here is the improvement I found. It took some fiddling, but it gives the same output as your triple-loop code. I kept the same code structure so you can see which parts correspond. I hope this is of use to you!
for sigma in range(M):
    xx, yy = np.meshgrid(samples[sigma], samples[sigma])
    results += (1/N) * (xx * yy
                        - yy * D.T
                        - xx * D)
print(results)  # or return results
Edit: Here's a small script to verify that the results are as expected:
import numpy as np

M, N = 3, 4
rng = np.random.default_rng(seed=42)
samples = rng.random((M, N))
D = rng.random((N, N))
results = rng.random((N, N))
results_old = results.copy()
results_new = results.copy()

for sigma in range(M):
    for i in range(N):
        for j in range(N):
            results_old[i, j] += (1/N) * (samples[sigma, i]*samples[sigma, j]
                                          - samples[sigma, i]*D[j, i]
                                          - samples[sigma, j]*D[i, j])
print('\n\nresults_old', results_old, sep='\n')

for sigma in range(M):
    xx, yy = np.meshgrid(samples[sigma], samples[sigma])
    results_new += (1/N) * (xx * yy
                            - yy * D.T
                            - xx * D)
print('\n\nresults_new', results_new, sep='\n')
Edit 2: Entirely getting rid of loops: it is a bit convoluted but it essentially does the same thing.
M, N = samples.shape
xxx, yyy = np.meshgrid(samples, samples)
split_x = np.array(np.hsplit(np.vsplit(xxx, M)[0], M))
split_y = np.array(np.vsplit(np.hsplit(yyy, M)[0], M))
results += np.sum(
    (1/N) * (split_x*split_y
             - split_y*D.T
             - split_x*D), axis=0)
print(results)  # or return results
In order to vectorize for loops, we can make use of broadcasting and then reducing along any axes that are not reflected by the output array. To do so, we can "assign" one axis to each of the for loop indices (as a convention). For your example this means that all input arrays can be reshaped to have dimension 3 (i.e. len(a.shape) == 3); the axes correspond then to sigma, i, j respectively. Then we can perform all operations with the broadcasted arrays and finally reduce (sum) the result along the sigma axis (since only i, j are reflected in the result):
# Ordering of axes: (sigma, i, j)
samples_i = samples[:, :, np.newaxis]
samples_j = samples[:, np.newaxis, :]
D_ij = D[np.newaxis, :, :]
D_ji = D.T[np.newaxis, :, :]
return (samples_i*samples_j - samples_i*D_ji - samples_j*D_ij).sum(axis=0) / N
The following is a complete example that compares the reference code (using for loops) with the above version; note that I've removed the 1/N part in order to keep computations in the domain of integers and thus make the array equality test exact.
import time
import numpy as np

def timeit(func):
    def wrapper(*args):
        t_start = time.process_time()
        res = func(*args)
        t_total = time.process_time() - t_start
        print(f'{func.__name__}: {t_total:.3f} seconds')
        return res
    return wrapper

rng = np.random.default_rng()
M, N = 100, 200
samples = rng.integers(0, 100, size=(M, N))
D = rng.integers(0, 100, size=(N, N))

@timeit
def reference(samples, D):
    results = np.zeros(shape=(N, N))
    for sigma in range(M):
        for i in range(N):
            for j in range(N):
                results[i, j] += (samples[sigma, i]*samples[sigma, j]
                                  - samples[sigma, i]*D[j, i]
                                  - samples[sigma, j]*D[i, j])
    return results

@timeit
def new(samples, D):
    # Ordering of axes: (sigma, i, j)
    samples_i = samples[:, :, np.newaxis]
    samples_j = samples[:, np.newaxis, :]
    D_ij = D[np.newaxis, :, :]
    D_ji = D.T[np.newaxis, :, :]
    return (samples_i*samples_j - samples_i*D_ji - samples_j*D_ij).sum(axis=0)

assert np.array_equal(reference(samples, D), new(samples, D))
This gives me the following benchmark results:
reference: 6.465 seconds
new: 0.133 seconds
I found it easier to break the problem into smaller steps and work on it until we have a single equation.
Going from your original formulation:
for sigma in range(M):
    for i in range(N):
        for j in range(N):
            results[i, j] += (1/N) * (samples[sigma, i]*samples[sigma, j]
                                      - samples[sigma, i]*D[j, i]
                                      - samples[sigma, j]*D[i, j])
The first thing is to eliminate the j index in the inner most loop. For this we start working with vectors instead of single elements:
for sigma in range(M):
    for i in range(N):
        results[i, :] += (1/N) * (samples[sigma, i]*samples[sigma, :] - samples[sigma, i]*D[:, i] - samples[sigma, :]*D[i, :])
Then, we eliminate the second loop, the one with i index. In this step we start to think in matrices. Therefore, each loop is the direct summation of "sigma matrices".
for sigma in range(M):
    results += (1/N) * (samples[sigma, :, np.newaxis] * samples[sigma] - samples[sigma, :, np.newaxis] * D.T - samples[sigma, :] * D)
I strongly recommend using this step as the solution, since vectorizing even more would require too much memory for a large M (the intermediate array has shape M x N x N). But, just for knowledge...
Think of the matrices as 3-dimensional objects. We do the calculations and sum at the end along axis zero:
results = (1/N) * (samples[:, :, np.newaxis] * samples[:,np.newaxis] - samples[:, :, np.newaxis] * D.T - samples[:, np.newaxis, :] * D).sum(axis=0)
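If the M x N x N intermediate is too large, the sum over sigma can be carried out algebraically first. A sketch, assuming `results` starts at zero: the first term becomes `samples.T @ samples`, and the other two only need the column sums of `samples`. The names `closed_form` and `col_sum` are mine, and `reference` is just the triple loop from the question wrapped in a function for comparison.

```python
import numpy as np

def reference(samples, D):
    """Triple loop from the question, starting from a zero result."""
    M, N = samples.shape
    results = np.zeros((N, N))
    for sigma in range(M):
        for i in range(N):
            for j in range(N):
                results[i, j] += (1/N) * (samples[sigma, i]*samples[sigma, j]
                                          - samples[sigma, i]*D[j, i]
                                          - samples[sigma, j]*D[i, j])
    return results

def closed_form(samples, D):
    """Same quantity without any M x N x N intermediate."""
    N = samples.shape[1]
    col_sum = samples.sum(axis=0)               # sum over sigma of samples[sigma, i]
    return (samples.T @ samples                 # sum over sigma of s[sigma, i] * s[sigma, j]
            - col_sum[:, np.newaxis] * D.T      # col_sum[i] * D[j, i]
            - col_sum[np.newaxis, :] * D) / N   # col_sum[j] * D[i, j]

rng = np.random.default_rng(42)
samples = rng.random((3, 4))
D = rng.random((4, 4))
print(np.allclose(reference(samples, D), closed_form(samples, D)))
```

The largest temporary here is N x N, so memory use no longer grows with M.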

How to vectorize with mismatched dimensionality

I have some constraints of the form of
A_{i,j,k} = r_{i,j}B_{i,j,k}
A is an n x m x p array, as is B. r is an n x m matrix.
I would like to vectorize this in Python as efficiently as possible. Right now, I am turning r into an n x m x p array by setting r_{i,j,k} = r_{i,j} for all 1 <= k <= p. Then I call np.multiply on r and B. This seems inefficient. Any ideas welcome, thanks.
import random
import numpy as np

def ndHadamardProduct(r, n, m, p):  # r is an n x m matrix, p is an int
    rnew = np.zeros((n, m, p))
    B = np.zeros((n, m, p))
    for i in range(n):
        for j in range(m):
            for k in range(p):
                rnew[i, j, k] = r[i, j]  # repeat r along the third axis
                B[i, j, k] = random.uniform(0, 1)
    return np.multiply(rnew, B)
Add an extra dimension with np.newaxis and then broadcasting takes care of the repetition for you.
import numpy as np
r = np.random.random((3,4))
b = np.random.random((3,4,5))
a = r[:,:,np.newaxis] * b
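As a quick sanity check, the broadcast product can be compared with the explicit "repeat r along the third axis" approach from the question. This is only an illustrative sketch; the names `r_full` and `a_explicit` are mine.

```python
import numpy as np

rng = np.random.default_rng(0)
r = rng.random((3, 4))
b = rng.random((3, 4, 5))

# Broadcasting: r is treated as shape (3, 4, 1) and stretched along the last axis
a = r[:, :, np.newaxis] * b

# Explicit version: materialise the repeated copy of r first, then multiply
r_full = np.repeat(r[:, :, np.newaxis], b.shape[2], axis=2)
a_explicit = np.multiply(r_full, b)

print(a.shape, np.array_equal(a, a_explicit))
```

The broadcast version never allocates the repeated n x m x p copy of r, which is where the saving comes from.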

LUP (PLU) decomposition failed with random matrix

I am trying to code an LUP (or PLU, it's the same) factorization in Python. My code works for small matrices (under 4x4 in size). However, when I tried it with a randomly generated matrix, the decomposition failed.
import numpy as np

def LUP_factorisation(A):
    """Find P, L and U : PA = LU"""
    U = A.copy()
    shape_a = U.shape
    n = shape_a[0]
    L = np.eye(n)
    P = np.eye(n)
    for i in range(n):
        print(U)
        k = i
        comp = abs(U[i, i])
        for j in range(i, n):
            if abs(U[j, i]) > comp:
                k = j
                comp = abs(U[j, i])
        line_u = U[k, :].copy()
        U[k, :] = U[i, :]
        U[i, :] = line_u
        print(U)
        line_p = P[k, :].copy()
        P[k, :] = P[i, :]
        P[i, :] = line_p
        for j in range(i + 1, n):
            g = U[j, i] / U[i, i]
            L[j, i] = g
            U[j, :] -= g * U[i, :]
    return L, U, P

if __name__ == "__main__":
    A = np.array(
        [[1.0, 2.2, 58, 9.5, 42.65], [6.56, 58.789954, 4.45, 23.465, 6.165], [7.84516, 8.9864, 96.546, 4.654, 7.6514],
         [45.65, 47.985, 1.56, 3.9845, 8.6], [455.654, 102.615, 63.965, 5.6, 9.456]])
    L, U, P = LUP_factorisation(A)
    print(L @ U)
    print(P @ A)
With the example I gave, it works: we have PA = LU. But when I use, for example:
A = np.random.rand(10, 10)
I don't get a good result, because PA differs from LU. Any ideas? Thanks.
As @MattTimmermans writes, you should swap rows in both L and U.
Normally this is implicitly handled by storing LU in A and then the swaps are automatically applied to both L and U. See https://en.wikipedia.org/wiki/LU_decomposition#C_code_example
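But you have split them, so you have to swap the part of L that is already filled in, i.e. the columns before i (swapping whole rows would also move the 1s on the diagonal):
line_l = L[k, :i].copy()
L[k, :i] = L[i, :i]
L[i, :i] = line_l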
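A minimal sketch of the whole routine with the swap applied to L as well, checked on a random 10x10 matrix. It follows the structure of the question's code, but the pivot search uses `np.argmax` and the swaps use fancy indexing; those choices, and the name `lup_factorisation`, are mine. Only the columns of L before i are swapped, so the unit diagonal stays in place.

```python
import numpy as np

def lup_factorisation(A):
    """Return L, U, P with P @ A == L @ U, using partial pivoting."""
    U = np.array(A, dtype=float)   # work on a float copy of A
    n = U.shape[0]
    L = np.eye(n)
    P = np.eye(n)
    for i in range(n):
        # Row with the largest absolute value in column i, at or below the diagonal
        k = i + int(np.argmax(np.abs(U[i:, i])))
        if k != i:
            U[[i, k], :] = U[[k, i], :]
            P[[i, k], :] = P[[k, i], :]
            L[[i, k], :i] = L[[k, i], :i]   # only the multipliers stored so far
        for j in range(i + 1, n):
            g = U[j, i] / U[i, i]
            L[j, i] = g
            U[j, :] -= g * U[i, :]
    return L, U, P

rng = np.random.default_rng(0)
A = rng.random((10, 10))
L, U, P = lup_factorisation(A)
print(np.allclose(P @ A, L @ U))
```

The sketch does not guard against a zero pivot, so a singular input would still divide by zero.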
Only testing it with diagonally dominant matrices is really bad; and only testing linear algebra routines with random matrices is known to be bad as their properties are very specific - and not "random". See work by Trefethen and his students, e.g. http://dspace.mit.edu/handle/1721.1/14322
The goal of testing should be to find bugs - not to make test-cases so simple that it works.
Make sure that the diagonal of the input matrix A is dominant, for example by adding some value to the diagonal of A:
A = A + np.eye(A.shape[0])
or
A = A + 100 * np.eye(A.shape[0])
I hope that helps!

Wiki example for Arnoldi iteration only works for real matrices?

The Wikipedia entry for the Arnoldi method provides a Python example that produces basis of the Krylov subspace of a matrix A. Supposedly, if A is Hermitian (i.e. if A == A.conj().T) then the Hessenberg matrix h generated by this algorithm is tridiagonal (source). However, when I use the Wikipedia code on a real-world Hermitian matrix, the Hessenberg matrix is not at all tridiagonal. When I perform the computation on the real part of A (so that A == A.T) then I do get a tridiagonal Hessenberg matrix, so there seems to be a problem with the imaginary components of A. Does anybody know why the Wikipedia code doesn't produce the expected results?
Working example:
import numpy as np
import matplotlib.pyplot as plt
from scipy.linalg import circulant

def arnoldi_iteration(A, b, n):
    m = A.shape[0]
    # np.complex was removed from recent NumPy releases; the builtin complex is equivalent
    h = np.zeros((n + 1, n), dtype=complex)
    Q = np.zeros((m, n + 1), dtype=complex)
    q = b / np.linalg.norm(b)  # Normalize the input vector
    Q[:, 0] = q  # Use it as the first Krylov vector
    for k in range(n):
        v = A.dot(q)  # Generate a new candidate vector
        for j in range(k + 1):  # Subtract the projections on previous vectors
            h[j, k] = np.dot(Q[:, j], v)
            v = v - h[j, k] * Q[:, j]
        h[k + 1, k] = np.linalg.norm(v)
        eps = 1e-12  # If v is shorter than this threshold it is the zero vector
        if h[k + 1, k] > eps:  # Add the produced vector to the list, unless
            q = v / h[k + 1, k]  # the zero vector is produced.
            Q[:, k + 1] = q
        else:  # If that happens, stop iterating.
            return Q, h
    return Q, h

# Construct matrix A
N = 2**4
I = np.eye(N)
k = np.fft.fftfreq(N, 1.0 / N) + 0.5
alpha = np.linspace(0.1, 1.0, N)*2e2
c = np.fft.fft(alpha) / N
C = circulant(c)
A = np.einsum("i, ij, j->ij", k, C, k)

# Show that A is Hermitian
print(np.allclose(A, A.conj().T))

# Arbitrary (random) initial vector
np.random.seed(0)
v = np.random.rand(N)

# Perform Arnoldi iteration with complex A
_, h = arnoldi_iteration(A, v, N)

# Perform Arnoldi iteration with real A
_, h2 = arnoldi_iteration(np.real(A), v, N)

# Plot results
plt.subplot(121)
plt.imshow(np.abs(h))
plt.title("Complex A")
plt.subplot(122)
plt.imshow(np.abs(h2))
plt.title("Real A")
plt.tight_layout()
plt.show()
Result:
After browsing through some conference presentation slides, I realised that at some point Q had to be conjugated when A is complex. The correct algorithm is posted below for reference, with the code change marked (note that this correction has also been submitted to the Wikipedia entry):
import numpy as np

def arnoldi_iteration(A, b, n):
    m = A.shape[0]
    # np.complex was removed from recent NumPy releases; the builtin complex is equivalent
    h = np.zeros((n + 1, n), dtype=complex)
    Q = np.zeros((m, n + 1), dtype=complex)
    q = b / np.linalg.norm(b)
    Q[:, 0] = q
    for k in range(n):
        v = A.dot(q)
        for j in range(k + 1):
            h[j, k] = np.dot(Q[:, j].conj(), v)  # <-- Q needs conjugation!
            v = v - h[j, k] * Q[:, j]
        h[k + 1, k] = np.linalg.norm(v)
        eps = 1e-12
        if h[k + 1, k] > eps:
            q = v / h[k + 1, k]
            Q[:, k + 1] = q
        else:
            return Q, h
    return Q, h
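A small check of the conjugated version on a random Hermitian matrix, as a sketch rather than a reference implementation. It uses `np.vdot`, which conjugates its first argument, in place of the explicit `.conj()`, and compares the real part of `h[k + 1, k]` with the threshold. The test matrix, its size and the number of steps are arbitrary choices of mine.

```python
import numpy as np

def arnoldi_iteration(A, b, n):
    m = A.shape[0]
    h = np.zeros((n + 1, n), dtype=complex)
    Q = np.zeros((m, n + 1), dtype=complex)
    q = b / np.linalg.norm(b)
    Q[:, 0] = q
    for k in range(n):
        v = A @ q
        for j in range(k + 1):
            h[j, k] = np.vdot(Q[:, j], v)   # vdot conjugates its first argument
            v = v - h[j, k] * Q[:, j]
        h[k + 1, k] = np.linalg.norm(v)
        if h[k + 1, k].real > 1e-12:        # the norm is real, so compare its real part
            q = v / h[k + 1, k]
            Q[:, k + 1] = q
        else:
            return Q, h
    return Q, h

rng = np.random.default_rng(0)
B = rng.standard_normal((8, 8)) + 1j * rng.standard_normal((8, 8))
A = B + B.conj().T                          # Hermitian by construction
b = rng.standard_normal(8)

Q, h = arnoldi_iteration(A, b, 5)
H = h[:5, :5]
# For Hermitian A, everything above the first superdiagonal should vanish
print(np.allclose(np.triu(H, 2), 0))
```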

Python implementation of statistical Sweep operator

I am learning some techniques for doing statistics with missing data from a book (Statistical Analysis with Missing Data by Little and Rubin). One particularly useful function for working with monotone non-response data is the Sweep Operator (details on page 148-151). I know that the R module gmm has the swp function which does this but I was wondering if anyone has implemented this function in Python, ideally for Numpy matrices to hold the input data. I searched StackOverflow and also did several web searches without success. Thanks for any help.
Here is the definition.
A PxP symmetric matrix G is said to be swept on row and column k if it is replaced by another symmetric PxP matrix H with elements defined as follows:
h_kk = -1/g_kk
h_jk = h_kj = g_jk/g_kk for j != k
h_jl = g_jl - g_jk g_kl / g_kk for j != k, l != k
G = [g11, g12, g13
g12, g22, g23
g13, g23, g33]
H = SWP(1,G) = [-1/g11, g12/g11, g13/g11
g12/g11, g22-g12^2/g11, g23-g13*g12/g11
g13/g11, g23-g13*g12/g11, g33-g13^2/g11]
kvec = [k1,k2,k3]
SWP(kvec,G) = SWP(k1,SWP(k2,SWP(k3,G)))
Inverse function (reverse sweep):
H = RSW(k,G)
h_kk = -1/g_kk
h_jk = h_kj = -g_jk/g_kk for j != k
h_jl = g_jl - g_jk g_kl / g_kk for j != k, l != k
G == SWP(k,RSW(k,G)) == RSW(k,SWP(k,G))
import numpy as np

def sweep(g, k):
    g = np.asarray(g)
    n = g.shape[0]
    if g.shape != (n, n):
        raise ValueError('Not a square array')
    if not np.allclose(g - g.T, 0):
        raise ValueError('Not a symmetrical array')
    if k >= n:
        raise ValueError('Not a valid row number')
    # Fill with the general formula
    h = g - np.outer(g[:, k], g[k, :]) / g[k, k]
    # h = g - g[:, k:k+1] * g[k, :] / g[k, k]
    # Modify the k-th row and column
    h[:, k] = g[:, k] / g[k, k]
    h[k, :] = h[:, k]
    # Modify the pivot
    h[k, k] = -1 / g[k, k]
    return h
I have no way of testing the above code, but I found an alternative description here that is also valid for non-symmetric matrices. It can be calculated as follows:
def sweep_non_sym(a, k):
    a = np.asarray(a)
    n = a.shape[0]
    if a.shape != (n, n):
        raise ValueError('Not a square array')
    if k >= n:
        raise ValueError('Not a valid row number')
    # Fill with the general formula
    b = a - np.outer(a[:, k], a[k, :]) / a[k, k]
    # b = a - a[:, k:k+1] * a[k, :] / a[k, k]
    # Modify the k-th row and column
    b[k, :] = a[k, :] / a[k, k]
    b[:, k] = -a[:, k] / a[k, k]
    # Modify the pivot
    b[k, k] = 1 / a[k, k]
    return b
This one does give the correct results for the examples in that link:
>>> a = [[2,4],[3,1]]
>>> sweep_non_sym(a, 0)
array([[ 0.5, 2. ],
[-1.5, -5. ]])
>>> sweep_non_sym(sweep_non_sym(a, 0), 1)
array([[-0.1, 0.4],
[ 0.3, -0.2]])
>>> np.dot(a, sweep_non_sym(sweep_non_sym(a, 0), 1))
array([[ 1.00000000e+00, 0.00000000e+00],
[ 5.55111512e-17, 1.00000000e+00]])
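The symmetric definition from the question can be checked in a similar way. The sketch below pairs a compact `sweep` (input validation omitted) with a `reverse_sweep` written from the RSW rules, taking the off-pivot rule as h_jl = g_jl - g_jk g_kl / g_kk. It then tests two properties on a random positive definite matrix: RSW undoes SWP, and sweeping on every index gives -inverse(G) under this sign convention. The name `reverse_sweep` and the test matrix are my own choices.

```python
import numpy as np

def sweep(g, k):
    """SWP(k, G) for a symmetric matrix, following the definition in the question."""
    g = np.asarray(g, dtype=float)
    h = g - np.outer(g[:, k], g[k, :]) / g[k, k]
    h[:, k] = g[:, k] / g[k, k]
    h[k, :] = h[:, k]
    h[k, k] = -1.0 / g[k, k]
    return h

def reverse_sweep(g, k):
    """RSW(k, G): same as sweep except for the sign of the k-th row and column."""
    g = np.asarray(g, dtype=float)
    h = g - np.outer(g[:, k], g[k, :]) / g[k, k]
    h[:, k] = -g[:, k] / g[k, k]
    h[k, :] = h[:, k]
    h[k, k] = -1.0 / g[k, k]
    return h

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3))
G = X.T @ X                      # symmetric positive definite, so pivots are nonzero

# RSW undoes SWP on the same index
print(np.allclose(reverse_sweep(sweep(G, 1), 1), G))

# Sweeping on every index yields minus the inverse
swept = G
for k in range(3):
    swept = sweep(swept, k)
print(np.allclose(swept, -np.linalg.inv(G)))
```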
