I have some constraints of the form
A_{i,j,k} = r_{i,j} B_{i,j,k}
where A and B are n x m x p arrays and r is an n x m array.
I would like to vectorize this in Python as efficiently as possible. Right now I am expanding r into an n x m x p array by setting r_{i,j,k} = r_{i,j} for all 1 <= k <= p, and then calling np.multiply on it and B. This seems inefficient. Any ideas welcome, thanks.
import numpy as np
import random

def ndHadamardProduct(r, n, m, p):  # r is an n x m array, p is an int
    rnew = np.zeros((n, m, p))
    B = np.zeros((n, m, p))
    for i in range(n):
        for j in range(m):
            for k in range(p):
                rnew[i, j, k] = r[i, j]
                B[i, j, k] = random.uniform(0, 1)
    return np.multiply(rnew, B)
Add an extra dimension with np.newaxis and then broadcasting takes care of the repetition for you.
import numpy as np
r = np.random.random((3,4))
b = np.random.random((3,4,5))
a = r[:,:,np.newaxis] * b
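Broadcasting never materializes the repeated copies of r, so this also avoids the memory overhead of building an n x m x p version of r. A quick sanity check against the loop definition, reusing r and b from above:

# verify a[i, j, k] == r[i, j] * b[i, j, k] for all indices
a_loop = np.empty_like(b)
for i in range(3):
    for j in range(4):
        for k in range(5):
            a_loop[i, j, k] = r[i, j] * b[i, j, k]
assert np.allclose(a, a_loop)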
For a 2-dimensional matrix A of shape (N, K) with elements 'a', we can get a matrix B of shape (N, K, N) with elements 'b' such that b[i, k, j] = a[i, k] * a[j, k] by the operation
B = tf.expand_dims(A, -1) * tf.transpose(A)
Now, for a 3-dimensional matrix A of shape (M, N, K) with elements 'a', is there a way to compute a 4-dimensional matrix B of shape (M, N, K, N) with elements 'b' such that
b[m, i, k, j] = a[m, i, k] * a[m, j, k]?
Try einsum:
B = np.einsum('mik,mjk->mikj', A, A)
You can use tf.einsum if you are working with tensors.
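For reference, a minimal NumPy sanity check of the einsum formula against the element-wise definition (the dimensions here are arbitrary):

import numpy as np

M, N, K = 2, 3, 4
A = np.random.rand(M, N, K)
B = np.einsum('mik,mjk->mikj', A, A)

# check b[m, i, k, j] == a[m, i, k] * a[m, j, k]
for m in range(M):
    for i in range(N):
        for k in range(K):
            for j in range(N):
                assert np.isclose(B[m, i, k, j], A[m, i, k] * A[m, j, k])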
Bemma,
This solution should work:
Expand N dimension, multiply, transpose result.
import numpy as np
import tensorflow as tf

M, N, K = 2, 3, 4  # insert your dimensions here
A = tf.constant(np.random.randint(1, 100, size=[M, N, K]))  # generate A
B = tf.expand_dims(A, 1) * tf.expand_dims(A, 2)
B = tf.transpose(B, perm=[0, 1, 3, 2])

# test to verify result:
for m in range(M):
    for i in range(N):
        for k in range(K):
            for j in range(N):
                assert B[m, i, k, j] == A[m, i, k] * A[m, j, k]
This test passes without errors.
I am working on a data science project in which I have to compute the Euclidean distance between every pair of observations in a dataset.
Since I am working with very large datasets, I have to use an efficient implementation of pairwise distances computation (both in terms of memory usage and computation time).
One solution is to use the pdist function from Scipy, which returns the result in a 1D array, without duplicate instances.
However, this function is not able to deal with categorical variables. For these, I want to set the distance to 0 when the values are the same and 1 otherwise.
I have tried to implement this variant in Python with Numba. The function takes as input the 2D Numpy array containing all the observations and a 1D array containing the types of the variables (either float64 or category).
Here is the code:
import numpy as np
from numba.decorators import autojit

def pairwise(X, types):
    m = X.shape[0]
    n = X.shape[1]
    D = np.empty((int(m * (m - 1) / 2), 1), dtype=np.float)
    ind = 0
    for i in range(m):
        for j in range(i+1, m):
            d = 0.0
            for k in range(n):
                if types[k] == 'float64':
                    tmp = X[i, k] - X[j, k]
                    d += tmp * tmp
                else:
                    if X[i, k] != X[j, k]:
                        d += 1.
            D[ind] = np.sqrt(d)
            ind += 1
    return D.reshape(1, -1)[0]

pairwise_numba = autojit(pairwise)

vectors = np.random.rand(20000, 100)
types = np.array(['float64'] * 100)
dists = pairwise_numba(vectors, types)
This implementation is very slow despite the use of Numba. Is it possible to improve my code to make it faster?
If you really want numba to perform fast, you need to jit the function in nopython mode; otherwise numba may fall back to object mode, which is slower (and can be quite slow).
However, your function cannot be compiled in nopython mode (as of numba version 0.43.1), for two reasons:
The dtype argument to np.empty: np.float is simply Python's float, and NumPy (but not numba) translates it to np.float_. If you use numba you have to spell out a concrete dtype such as np.float64.
String support in numba is lacking, so the types[k] == 'float64' line will not compile.
The first issue is trivially fixed. Regarding the second issue: instead of trying to make the string comparisons work, just provide a boolean array. Using a boolean array and evaluating one boolean for truthiness will also be significantly faster than comparing up to 7 characters, especially in the innermost loop!
So it might look like this:
import numpy as np
import numba as nb

@nb.njit
def pairwise_numba(X, is_float_type):
    m = X.shape[0]
    n = X.shape[1]
    D = np.empty((int(m * (m - 1) / 2), 1), dtype=np.float64)  # corrected dtype
    ind = 0
    for i in range(m):
        for j in range(i+1, m):
            d = 0.0
            for k in range(n):
                if is_float_type[k]:
                    tmp = X[i, k] - X[j, k]
                    d += tmp * tmp
                else:
                    if X[i, k] != X[j, k]:
                        d += 1.
            D[ind] = np.sqrt(d)
            ind += 1
    return D.reshape(1, -1)[0]

dists = pairwise_numba(vectors, types == 'float64')  # pass in the boolean array
However, you can simplify the logic if you combine scipy.spatial.distance.pdist on the float columns with a numba function that counts the unequal categoricals:
from scipy.spatial.distance import pdist

@nb.njit
def categorial_sum(X):
    m = X.shape[0]
    n = X.shape[1]
    D = np.zeros(int(m * (m - 1) / 2), dtype=np.float64)  # corrected dtype
    ind = 0
    for i in range(m):
        for j in range(i+1, m):
            d = 0.0
            for k in range(n):
                if X[i, k] != X[j, k]:
                    d += 1.
            D[ind] = d
            ind += 1
    return D

def pdist_with_categorial(vectors, types):
    where_float_type = types == 'float64'
    # calculate the squared distance of the float values
    distances_squared = pdist(vectors[:, where_float_type], metric='sqeuclidean')
    # sum the number of mismatched categorials, add that to the distances,
    # and then take the square root
    return np.sqrt(distances_squared + categorial_sum(vectors[:, ~where_float_type]))
It won't be significantly faster, but it drastically simplifies the logic in the numba function.
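A quick consistency check against the all-numba pairwise_numba from above; the mixed data here is made up for illustration, with the categorical values encoded as small floats:

# 90 float columns followed by 10 categorical columns
vectors = np.random.rand(100, 100)
vectors[:, 90:] = np.random.randint(0, 3, size=(100, 10))
types = np.array(['float64'] * 90 + ['category'] * 10)

assert np.allclose(pairwise_numba(vectors, types == 'float64'),
                   pdist_with_categorial(vectors, types))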
Then you can also avoid the additional array creations by passing in the squared distances to the numba function:
@nb.njit
def add_categorial_sum_and_sqrt(X, D):
    m = X.shape[0]
    n = X.shape[1]
    ind = 0
    for i in range(m):
        for j in range(i+1, m):
            d = 0.0
            for k in range(n):
                if X[i, k] != X[j, k]:
                    d += 1.
            D[ind] = np.sqrt(D[ind] + d)
            ind += 1
    return D

def pdist_with_categorial(vectors, types):
    where_float_type = types == 'float64'
    distances_squared = pdist(vectors[:, where_float_type], metric='sqeuclidean')
    return add_categorial_sum_and_sqrt(vectors[:, ~where_float_type], distances_squared)
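Usage is unchanged; a sketch with the same made-up mixed data as before:

dists = pdist_with_categorial(vectors, types)

This version avoids allocating a separate array for the categorical counts: the array returned by pdist is updated in place.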
autojit is deprecated; it is recommended to use jit instead. And you should almost always be using jit(nopython=True), which will make numba fail if something can't be lowered out of Python.
Using nopython on your code reveals two problems. One is an easy fix: this line needs to refer to a specific NumPy dtype instead of float
- D = np.empty((int(m * (m - 1) / 2), 1), dtype=np.float)
+ D = np.empty((int(m * (m - 1) / 2), 1), dtype=np.float64)
The second is your use of strings to hold type information; numba has limited support for working with strings. You could instead encode the type information in a numeric array, e.g. 0 for numeric, 1 for categorical. So an implementation could be:
import numpy as np
from numba import jit

@jit(nopython=True)
def pairwise_nopython(X, types):
    m = X.shape[0]
    n = X.shape[1]
    D = np.empty((int(m * (m - 1) / 2), 1), dtype=np.float64)
    ind = 0
    for i in range(m):
        for j in range(i+1, m):
            d = 0.0
            for k in range(n):
                if types[k] == 0:  # numeric
                    tmp = X[i, k] - X[j, k]
                    d += tmp * tmp
                else:
                    if X[i, k] != X[j, k]:
                        d += 1.
            D[ind] = np.sqrt(d)
            ind += 1
    return D.reshape(1, -1)[0]
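A usage sketch with this numeric encoding (the array sizes, the split into 90 numeric and 10 categorical columns, and the type_codes name are all just for illustration):

import numpy as np

vectors = np.random.rand(1000, 100)
type_codes = np.zeros(100, dtype=np.int64)  # 0 = numeric for every column
type_codes[90:] = 1                         # mark the last 10 columns as categorical
dists = pairwise_nopython(vectors, type_codes)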
I am trying to code an LUP (or PLU, it's the same) factorisation in Python. I have code which works for small matrices (under 4x4 size). However, when I tried it with a randomly generated matrix, the decomposition failed.
import numpy as np

def LUP_factorisation(A):
    """Find P, L and U such that PA = LU."""
    U = A.copy()
    shape_a = U.shape
    n = shape_a[0]
    L = np.eye(n)
    P = np.eye(n)
    for i in range(n):
        print(U)
        k = i
        comp = abs(U[i, i])
        for j in range(i, n):
            if abs(U[j, i]) > comp:
                k = j
                comp = abs(U[j, i])
        line_u = U[k, :].copy()
        U[k, :] = U[i, :]
        U[i, :] = line_u
        print(U)
        line_p = P[k, :].copy()
        P[k, :] = P[i, :]
        P[i, :] = line_p
        for j in range(i + 1, n):
            g = U[j, i] / U[i, i]
            L[j, i] = g
            U[j, :] -= g * U[i, :]
    return L, U, P

if __name__ == "__main__":
    A = np.array(
        [[1.0, 2.2, 58, 9.5, 42.65], [6.56, 58.789954, 4.45, 23.465, 6.165], [7.84516, 8.9864, 96.546, 4.654, 7.6514],
         [45.65, 47.985, 1.56, 3.9845, 8.6], [455.654, 102.615, 63.965, 5.6, 9.456]])
    L, U, P = LUP_factorisation(A)
    print(L @ U)
    print(P @ A)
With the example I gave it works: we have PA = LU. But when I do, for example:
A = np.random.rand(10, 10)
then I don't obtain a good result, because PA is different from LU. Any ideas? Thanks.
As @MattTimmermans writes, you should swap rows in both L and U.
Normally this is implicitly handled by storing LU in A and then the swaps are automatically applied to both L and U. See https://en.wikipedia.org/wiki/LU_decomposition#C_code_example
But you have split them, so you also have to swap the multipliers already stored in L. Add this right after the U and P swaps, swapping only the columns to the left of the current pivot (the unit diagonal of L must stay in place):
line_l = L[k, :i].copy()
L[k, :i] = L[i, :i]
L[i, :i] = line_l
Only testing it with diagonally dominant matrices is really bad, and testing linear algebra routines only with random matrices is also known to be bad, since their properties are very specific and not "random". See work by Trefethen and his students, e.g. http://dspace.mit.edu/handle/1721.1/14322
The goal of testing should be to find bugs, not to make the test cases so simple that everything appears to work.
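With the L swap added at the pivoting step, a quick randomized check of the factorisation (a sketch; it assumes the lines above were inserted into LUP_factorisation right after the U and P swaps, and that the debug prints inside the function have been removed):

import numpy as np

np.random.seed(42)
for _ in range(100):
    A = np.random.rand(10, 10)
    L, U, P = LUP_factorisation(A)
    assert np.allclose(P @ A, L @ U)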
Make sure that the diagonal of the input matrix A is dominant, by adding some value to the diagonal of A, e.g.
A = A + np.eye(A.shape[0])
or
A = A + 100 * np.eye(A.shape[0])
I hope that helps!
I have a 1d ndarray A of shape (n,) and a 2d ndarray E of shape (n, m). I am trying to perform the following calculation (the circle-dot denotes element-wise multiplication):
R = sum_{i < j} (A_i E_i) ⊙ (A_j E_j)
where E_i is the i-th row of E. I have written it with a for loop, but this block of code is called thousands of times, and I was hoping there was a way to accomplish this with broadcasting or numpy functions. The following is my for loop solution I'm trying to rewrite:
import numpy as np

def fun(E, A):
    X = E * A[:, np.newaxis]
    R = np.zeros(E.shape[-1])
    for ii in range(len(E) - 1):
        for jj in range(ii + 1, len(E)):
            R += X[ii] * X[jj]
    return R
Any help would be appreciated.
Current approach, but still not working:
def fun1(E, A):
    X = E * A[:, np.newaxis]
    R = np.zeros(E.shape[-1])
    for ii in range(len(E) - 1):
        for jj in range(ii + 1, len(E)):
            R += X[ii] * X[jj]
    return R

def fun2(E, A):
    n = E.shape[0]
    m = E.shape[1]
    A_ = np.triu(A[1:] * A[:-1].reshape(-1, 1))
    E_ = E[1:] * E[:-1]
    R = np.sum((A_.reshape(n-1, 1, n-1) * E_.T).transpose(0, 2, 1).reshape(n-1*n-1, m), axis=0)
    return R

A = np.arange(4, 9)
E = np.arange(20).reshape((5, 4))

print(fun1(E, A))
print(fun2(E, A))
Now, this should work:
def fun3(E, A):
    n, m = E.shape
    n_ = n - 1
    X = E * A[:, np.newaxis]
    a = X[:-1].reshape(n_, 1, m) * X[1:]
    b = np.tril(np.ones((m, n_, n_))).T
    R = np.sum((a * b).reshape(n_ * n_, m), axis=0)
    return R
The last function was based only on the given formula. This one is instead based on fun and tested with your added test case.
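For completeness, a quick check that fun3 agrees with the loop version fun1 on the test case above:

A = np.arange(4, 9)
E = np.arange(20).reshape((5, 4))
assert np.allclose(fun1(E, A), fun3(E, A))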
Hope this works for you!
The Wikipedia entry for the Arnoldi method provides a Python example that produces a basis of the Krylov subspace of a matrix A. Supposedly, if A is Hermitian (i.e. if A == A.conj().T), then the Hessenberg matrix h generated by this algorithm is tridiagonal (source). However, when I use the Wikipedia code on a real-world Hermitian matrix, the Hessenberg matrix is not at all tridiagonal. When I perform the computation on the real part of A (so that A == A.T), I do get a tridiagonal Hessenberg matrix, so there seems to be a problem with the imaginary components of A. Does anybody know why the Wikipedia code doesn't produce the expected results?
Working example:
import numpy as np
import matplotlib.pyplot as plt
from scipy.linalg import circulant

def arnoldi_iteration(A, b, n):
    m = A.shape[0]
    h = np.zeros((n + 1, n), dtype=np.complex128)
    Q = np.zeros((m, n + 1), dtype=np.complex128)
    q = b / np.linalg.norm(b)  # Normalize the input vector
    Q[:, 0] = q                # Use it as the first Krylov vector
    for k in range(n):
        v = A.dot(q)                # Generate a new candidate vector
        for j in range(k + 1):      # Subtract the projections on previous vectors
            h[j, k] = np.dot(Q[:, j], v)
            v = v - h[j, k] * Q[:, j]
        h[k + 1, k] = np.linalg.norm(v)
        eps = 1e-12                 # If v is shorter than this threshold it is the zero vector
        if h[k + 1, k] > eps:       # Add the produced vector to the list, unless
            q = v / h[k + 1, k]     # the zero vector is produced.
            Q[:, k + 1] = q
        else:                       # If that happens, stop iterating.
            return Q, h
    return Q, h
# Construct matrix A
N = 2**4
I = np.eye(N)
k = np.fft.fftfreq(N, 1.0 / N) + 0.5
alpha = np.linspace(0.1, 1.0, N)*2e2
c = np.fft.fft(alpha) / N
C = circulant(c)
A = np.einsum("i, ij, j->ij", k, C, k)
# Show that A is Hermitian
print(np.allclose(A, A.conj().T))
# Arbitrary (random) initial vector
np.random.seed(0)
v = np.random.rand(N)
# Perform Arnoldi iteration with complex A
_, h = arnoldi_iteration(A, v, N)
# Perform Arnoldi iteration with real A
_, h2 = arnoldi_iteration(np.real(A), v, N)
# Plot results
plt.subplot(121)
plt.imshow(np.abs(h))
plt.title("Complex A")
plt.subplot(122)
plt.imshow(np.abs(h2))
plt.title("Real A")
plt.tight_layout()
plt.show()
Result: |h| is dense for the complex A but tridiagonal for the real A.
After browsing through some conference presentation slides, I realised that at some point Q had to be conjugated when A is complex. The correct algorithm is posted below for reference, with the code change marked (note that this correction has also been submitted to the Wikipedia entry):
import numpy as np

def arnoldi_iteration(A, b, n):
    m = A.shape[0]
    h = np.zeros((n + 1, n), dtype=np.complex128)
    Q = np.zeros((m, n + 1), dtype=np.complex128)
    q = b / np.linalg.norm(b)
    Q[:, 0] = q
    for k in range(n):
        v = A.dot(q)
        for j in range(k + 1):
            h[j, k] = np.dot(Q[:, j].conj(), v)  # <-- Q needs conjugation!
            v = v - h[j, k] * Q[:, j]
        h[k + 1, k] = np.linalg.norm(v)
        eps = 1e-12
        if h[k + 1, k] > eps:
            q = v / h[k + 1, k]
            Q[:, k + 1] = q
        else:
            return Q, h
    return Q, h
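With the conjugation in place, the Hessenberg matrix produced from the Hermitian test matrix in the question comes out numerically tridiagonal, which can be checked directly; a sketch reusing A, v and N from the question:

Q, h = arnoldi_iteration(A, v, N)
H = h[:N, :N]  # square part of the Hessenberg matrix

# entries more than one position off the diagonal should vanish up to round-off
band = np.abs(np.arange(N)[:, None] - np.arange(N)[None, :]) <= 1
print(np.abs(H[~band]).max() / np.abs(H).max())  # tiny, i.e. numerically tridiagonal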