I am fairly new to Python and want to get into NumPy.
I am trying to compute a Gaussian kernel function with two for loops:
for n in range(0, 6):
    for k in range(len(centers_Hex)):
        expo_sum[n+1] += np.exp(-np.linalg.norm(z_approx-center_Matrix[n][k])**2/(2*sigma**2))
where center_Matrix is a matrix of (x, y) coordinates for the centers of the Gaussian bells, z_approx is the data point I want to evaluate, and sigma is a variable.
So how can I simplify these two for loops? My main problem is the linalg.norm for the simplification.
Thank you!
If you can turn center_Matrix into a 3D array, with the 2-element tuples being the inner dimension (so the shape would be (n, k, 2)), you might be able to do the following:
diff = np.linalg.norm([center_Matrix[...,0] - z_approx[0], center_Matrix[...,1] - z_approx[1]], axis=0)
expo_sum = np.exp(-diff**2 / (2*sigma**2))
expo_sum = expo_sum.sum(axis=1)
This does shift the resulting expo_sum by one index, since you use expo_sum[n+1] = ..., but that is something you can adjust elsewhere in your code.
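To make the idea concrete, here is a minimal sketch that checks a vectorized version against the double loop. The data is made up (a random center_Matrix of shape (6, 4, 2)), and it uses a zero-based result without the n+1 offset. Summing squared differences over the last axis gives the squared norm directly, so np.linalg.norm is not needed at all.

```python
import numpy as np

# Hypothetical data: 6 rows of 4 (x, y) centers each.
rng = np.random.default_rng(0)
center_Matrix = rng.normal(size=(6, 4, 2))
z_approx = np.array([0.3, -0.2])
sigma = 1.5

# Loop version from the question, zero-based.
loop_sum = np.zeros(6)
for n in range(6):
    for k in range(4):
        dist = np.linalg.norm(z_approx - center_Matrix[n][k])
        loop_sum[n] += np.exp(-dist**2 / (2 * sigma**2))

# Vectorized: broadcasting subtracts z_approx from every center at once.
sq_dist = ((center_Matrix - z_approx) ** 2).sum(axis=-1)  # shape (6, 4)
vec_sum = np.exp(-sq_dist / (2 * sigma**2)).sum(axis=1)   # shape (6,)
```

Both versions should agree up to floating-point rounding.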
I want to reshape an array of matrices into a single matrix, so that if the original array has shape (n, m, N) the new matrix, X, has shape (N, nxm) and in a way so that if we look at X[i,:].reshape(n,m) we would get back the original i-th matrix.
This can be done with a for-loop and ravel:
X = np.zeros((N, n*m))
for i in range(N):
    X[i, :] = Y[:, :, i].ravel()  # Y is the original array with shape (n, m, N)
print(X.shape)
Question
Is there a way to do this without using a for-loop, perhaps with just reshape and some other functions? I did not quite find this case when I tried to search online, and doing simply X=Y.reshape((N, n*m)) does not preserve the matrix structure when we check an entry as described above.
Before calling np.reshape, you can use np.moveaxis (e.g. np.moveaxis(Y, -1, 0)) to move the last axis of Y to the front. This gives an array of shape (N, n, m) with the matrix structure preserved, which can then be reshaped to (N, n*m).
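As a small sketch of this approach (the sizes n, m, N below are arbitrary, chosen only for illustration), the result can be compared against the loop from the question:

```python
import numpy as np

n, m, N = 2, 3, 4
Y = np.arange(n * m * N).reshape(n, m, N)

# Move the last axis to the front, then flatten each (n, m) matrix into a row.
X = np.moveaxis(Y, -1, 0).reshape(N, n * m)

# Reference: the loop-and-ravel version from the question.
X_loop = np.zeros((N, n * m))
for i in range(N):
    X_loop[i, :] = Y[:, :, i].ravel()
```

Here X[i, :].reshape(n, m) gives back Y[:, :, i] for every i.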
Using a list comprehension:
X = np.r_[[Y[:, :, i].ravel() for i in range(Y.shape[2])]]
I wanted to define my own addition operator that takes an Nx1 vector (call it A) and a 1xN vector (B) such that the element in the i^th row and j^th column is the sum of the i^th element in A and the j^th element in B. An example is illustrated here.
I was able to write the following code for the function (and it is correct as far as I know).
def test_fn(a, b):
    a_len = a.shape[0]
    b_len = b.shape[1]
    prod = np.array([[0]*a_len]*b_len)
    for i in range(a_len):
        for j in range(b_len):
            prod[i, j] = a[i, 0] + b[0, j]
    return prod
However, the vectors I am working with contain thousands of elements, and the function above is quite slow. I was wondering if there was a better way to approach this problem, or if there was a numpy function that could be of use. Any help would be appreciated.
According to numpy's broadcasting rules, you can use a+b to implement your own defined operator.
The first rule of broadcasting is that if all input arrays do not have the same number of dimensions, a “1” will be repeatedly prepended to the shapes of the smaller arrays until all the arrays have the same number of dimensions.
The second rule of broadcasting ensures that arrays with a size of 1 along a particular dimension act as if they had the size of the array with the largest shape along that dimension. The value of the array element is assumed to be the same along that dimension for the “broadcast” array.
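A tiny example of these rules at work, with made-up values: a column of shape (3, 1) plus a row of shape (1, 3) is stretched along each size-1 dimension, producing the full (3, 3) table of pairwise sums without any Python loop.

```python
import numpy as np

a = np.array([[1], [2], [3]])   # shape (3, 1)
b = np.array([[10, 20, 30]])    # shape (1, 3)

# Element [i, j] is a[i, 0] + b[0, j].
outer_sum = a + b               # shape (3, 3)
# [[11 21 31]
#  [12 22 32]
#  [13 23 33]]
```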
I am facing a memory and speed problem using NumPy but my issue is quite simple.
A is a large NumPy array of H * W integers.
V is a list containing N views of the large array A; each view has the same (Hv, Wv) shape.
K is another list containing N float weights corresponding to the views.
Hv and Wv are slightly smaller than H and W. As NumPy views are not copies, this is nice for memory management, even if N is big.
Now, I want to compute a new array using broadcasting for speed: B = V1*K1 + ... + VN*KN
This will result in a new Hv * Wv weighted array.
The issue is that I do not know how to perform such operation without creating intermediate arrays in memory (which is what happens when a view is multiplied with the corresponding weight) and while benefiting from broadcast operations.
import numpy as np
H = W = 1000
Hv = Wv = 900
N = 100
A = np.arange(H * W).reshape(H, W)
V = [A[i:Hv + i, i:Wv + i] for i in range(N)]
K = np.random.rand(N)
# It neither uses speed broadcast nor low memory!
B = sum(v*k for v, k in zip(V, K))
Could someone help me to make a smart use of NumPy, please?
I am assuming V is given as a list and that we either cannot or do not need to optimize how it is created. That takes A out of the equation: we are left with V and K to produce the final output B, so the only thing to optimize is the last step.
To solve it, we can just use np.tensordot to replace the last step of sum-reduction as that's basically sum-reduction of a matrix-multiplication. In our case, we are reducing the first axis from K and along the length of input list V. Internally, NumPy would convert the list to a NumPy tensor array and that length would become the first axis of its array version. Thus, we would be reducing the first axis from both these inputs and therefore the implementation would be -
B = np.tensordot(K,V,axes=[0,0]) # `axes` indicates the axes to be sum-reduced
Please note that the internal conversion of the list to a NumPy array can be expensive, so it would make more sense to build V directly as a NumPy array rather than with a list comprehension.
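Here is a quick check of that replacement on a scaled-down version of the question's setup. The sizes are reduced for illustration, and the names B_sum and B_dot are just labels for this sketch.

```python
import numpy as np

H = W = 20
Hv = Wv = 15
N = 5

A = np.arange(H * W).reshape(H, W)
V = [A[i:Hv + i, i:Wv + i] for i in range(N)]
K = np.random.default_rng(0).random(N)

# Original approach: one temporary array per weighted view.
B_sum = sum(v * k for v, k in zip(V, K))

# tensordot: contract axis 0 of K with axis 0 of the stacked views.
# Note that V is converted to an (N, Hv, Wv) array internally, which copies the views.
B_dot = np.tensordot(K, V, axes=[0, 0])
```

Both results have shape (Hv, Wv) and should match up to rounding.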
I have some data represented by input_x. It is a tensor of unknown size (should be inputted by batch) and each item there is of size n. input_x undergoes tf.nn.embedding_lookup, so that embed now has dimensions [?, n, m] where m is the embedding size and ? refers to the unknown batch size.
This is described here:
input_x = tf.placeholder(tf.int32, [None, n], name="input_x")
embed = tf.nn.embedding_lookup(W, input_x)
I'm now trying to multiply each sample in my input data (which is now expanded by embedding dimension) by a matrix variable, U, and I can't seem to get how to do that.
I first tried using tf.matmul but it gives an error due to mismatch in shapes. I then tried the following, by expanding the dimension of U and applying batch_matmul (I also tried the function from tf.nn.math_ops., the result was the same):
U = tf.Variable( ... )
U1 = tf.expand_dims(U,0)
h=tf.batch_matmul(embed, U1)
This passes the initial compilation, but then when actual data is applied, I get the following error:
In[0].dim(0) and In[1].dim(0) must be the same: [64,58,128] vs [1,128,128]
I also know why this is happening - I replicated the dimension of U and it is now 1, but the minibatch size, 64, doesn't fit.
How can I do that matrix multiplication on my tensor-matrix input correctly (for unknown batch size)?
Previous answers are obsolete. Currently tf.matmul() supports tensors with rank > 2:
The inputs must be matrices (or tensors of rank > 2, representing
batches of matrices), with matching inner dimensions, possibly after
transposition.
Also tf.batch_matmul() was removed and tf.matmul() is the right way to do batch multiplication. The main idea can be understood from the following code:
import tensorflow as tf
batch_size, n, m, k = 10, 3, 5, 2
A = tf.Variable(tf.random_normal(shape=(batch_size, n, m)))
B = tf.Variable(tf.random_normal(shape=(batch_size, m, k)))
tf.matmul(A, B)
This returns a tensor of shape (batch_size, n, k). Here is what is going on: you have batch_size matrices of size nxm and batch_size matrices of size mxk. For each pair, the product nxm X mxk is computed, which gives an nxk matrix. You end up with batch_size of them.
Notice that something like this is also valid:
A = tf.Variable(tf.random_normal(shape=(a, b, n, m)))
B = tf.Variable(tf.random_normal(shape=(a, b, m, k)))
tf.matmul(A, B)
and will give you a shape (a, b, n, k)
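The same pairwise semantics can be checked quickly in NumPy, whose np.matmul follows the same convention for matching batch dimensions. This is a NumPy stand-in for illustration, not TensorFlow code, and C_loop is just a reference built with an explicit loop.

```python
import numpy as np

batch_size, n, m, k = 10, 3, 5, 2
rng = np.random.default_rng(0)
A = rng.normal(size=(batch_size, n, m))
B = rng.normal(size=(batch_size, m, k))

# Batched product: one (n, m) x (m, k) product per batch entry.
C = np.matmul(A, B)  # shape (batch_size, n, k)

# Reference: multiply each pair explicitly and stack the results.
C_loop = np.stack([A[i] @ B[i] for i in range(batch_size)])
```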
1. I want to multiply a batch of matrices with a batch of matrices of the same length, pairwise
M = tf.random_normal((batch_size, n, m))
N = tf.random_normal((batch_size, m, p))
# python >= 3.5
MN = M @ N
# or the old way,
MN = tf.matmul(M, N)
# MN has shape (batch_size, n, p)
2. I want to multiply a batch of matrices with a batch of vectors of the same length, pairwise
We fall back to case 1 by adding and removing a dimension to v.
M = tf.random_normal((batch_size, n, m))
v = tf.random_normal((batch_size, m))
Mv = (M @ v[..., None])[..., 0]
# Mv has shape (batch_size, n)
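The add-then-drop-a-dimension trick works the same way in NumPy, which makes it easy to verify on small arrays. This is a NumPy sketch with arbitrary sizes, not TensorFlow code.

```python
import numpy as np

batch_size, n, m = 4, 3, 5
rng = np.random.default_rng(0)
M = rng.normal(size=(batch_size, n, m))
v = rng.normal(size=(batch_size, m))

# v[..., None] has shape (batch_size, m, 1); the trailing axis is dropped afterwards.
Mv = (M @ v[..., None])[..., 0]  # shape (batch_size, n)

# Reference: one matrix-vector product per batch entry.
Mv_ref = np.stack([M[b] @ v[b] for b in range(batch_size)])
```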
3. I want to multiply a single matrix with a batch of matrices
In this case, we cannot simply add a batch dimension of 1 to the single matrix, because tf.matmul does not broadcast in the batch dimension.
3.1. The single matrix is on the right side
In that case, we can treat the matrix batch as a single large matrix, using a simple reshape.
M = tf.random_normal((batch_size, n, m))
N = tf.random_normal((m, p))
MN = tf.reshape(tf.reshape(M, [-1, m]) @ N, [-1, n, p])
# MN has shape (batch_size, n, p)
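To see why the reshape is valid, here is the same computation as a NumPy sketch (arbitrary sizes, not TensorFlow code). Flattening the batch stacks all rows of all matrices into one tall (batch_size*n, m) matrix, and each row is multiplied by N independently.

```python
import numpy as np

batch_size, n, m, p = 4, 3, 5, 2
rng = np.random.default_rng(0)
M = rng.normal(size=(batch_size, n, m))  # batch of matrices
N = rng.normal(size=(m, p))              # single matrix on the right

# Flatten the batch, do one large matmul, then restore the batch axis.
MN = (M.reshape(-1, m) @ N).reshape(-1, n, p)

# Reference: multiply each batch entry by N separately.
MN_ref = np.stack([M[b] @ N for b in range(batch_size)])
```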
3.2. The single matrix is on the left side
This case is more complicated. We can fall back to case 3.1 by transposing the matrices.
MT = tf.matrix_transpose(M)
NT = tf.matrix_transpose(N)
NTMT = tf.reshape(tf.reshape(NT, [-1, m]) @ MT, [-1, p, n])
MN = tf.matrix_transpose(NTMT)
However, transposition can be a costly operation, and here it is done twice on an entire batch of matrices. It may be better to simply duplicate M to match the batch dimension:
MN = tf.tile(M[None], [batch_size, 1, 1]) @ N
Profiling will tell which option works better for a given problem/hardware combination.
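Both options for case 3.2 can be sanity-checked with a NumPy sketch. Here np.swapaxes(x, -1, -2) stands in for tf.matrix_transpose, the sizes are arbitrary, and the names MN_transposed and MN_tiled are only labels for this example.

```python
import numpy as np

batch_size, n, m, p = 4, 3, 5, 2
rng = np.random.default_rng(0)
M = rng.normal(size=(n, m))              # single matrix on the left
N = rng.normal(size=(batch_size, m, p))  # batch of matrices

# Option 1: (M N)^T = N^T M^T, which puts the single matrix on the right.
MT = M.T                                 # shape (m, n)
NT = np.swapaxes(N, -1, -2)              # shape (batch_size, p, m)
NTMT = (NT.reshape(-1, m) @ MT).reshape(-1, p, n)
MN_transposed = np.swapaxes(NTMT, -1, -2)

# Option 2: duplicate M along a new batch axis.
MN_tiled = np.tile(M[None], (batch_size, 1, 1)) @ N

# Reference: multiply M by each batch entry separately.
MN_ref = np.stack([M @ N[b] for b in range(batch_size)])
```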
4. I want to multiply a single matrix with a batch of vectors
This looks similar to case 3.2 since the single matrix is on the left, but it is actually simpler because transposing a vector is essentially a no-op. We end up with
M = tf.random_normal((n, m))
v = tf.random_normal((batch_size, m))
MT = tf.matrix_transpose(M)
Mv = v @ MT
What about einsum?
All of the previous multiplications could have been written with the tf.einsum swiss army knife. For example the first solution for 3.2 could be written simply as
MN = tf.einsum('nm,bmp->bnp', M, N)
However, note that einsum ultimately relies on transpose and matmul for the computation.
So even though einsum is a very convenient way to write matrix multiplications, it hides the complexity of the operations underneath — for example it is not straightforward to guess how many times an einsum expression will transpose your data, and therefore how costly the operation will be. Also, it may hide the fact that there could be several alternatives for the same operation (see case 3.2) and might not necessarily choose the better option.
For this reason, I would personally use explicit formulas like those above to better convey their respective complexity. Although if you know what you are doing and like the simplicity of the einsum syntax, then by all means go for it.
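For reference, the same subscript string works in np.einsum, so the notation can be tried out without TensorFlow. This is a NumPy sketch with arbitrary sizes that says nothing about how tf.einsum schedules its transposes.

```python
import numpy as np

batch_size, n, m, p = 4, 3, 5, 2
rng = np.random.default_rng(0)
M = rng.normal(size=(n, m))              # single matrix
N = rng.normal(size=(batch_size, m, p))  # batch of matrices

# 'nm,bmp->bnp': sum over the shared index m, keep batch index b.
MN = np.einsum('nm,bmp->bnp', M, N)

# Reference: multiply M by each batch entry separately.
MN_ref = np.stack([M @ N[b] for b in range(batch_size)])
```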
The matmul operation only works on matrices (2D tensors). Here are two main approaches to do this. Both assume that U is a 2D tensor.
Slice embed into 2D tensors and multiply each of them with U individually. This is probably easiest to do using tf.scan() like this:
h = tf.scan(lambda a, x: tf.matmul(x, U), embed)
On the other hand if efficiency is important it may be better to reshape embed to be a 2D tensor so the multiplication can be done with a single matmul like this:
embed = tf.reshape(embed, [-1, m])
h = tf.matmul(embed, U)
h = tf.reshape(h, [-1, n, c])
where c is the number of columns in U. The last reshape makes sure that h is a 3D tensor whose 0th dimension corresponds to the batch, just like the original input_x and embed.
As answered by @Stryke, there are two ways to achieve this: 1. Scanning, and 2. Reshaping
tf.scan requires lambda functions and is generally used for recursive operations. Some examples for the same are here: https://rdipietro.github.io/tensorflow-scan-examples/
I personally prefer reshaping, since it is more intuitive. If you are trying to matrix multiply each matrix in the 3D tensor by the matrix that is the 2D tensor, like Cijl = Aijk * Bkl, you can do it with a simple reshape.
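A_flat = tf.reshape(Aijk, [i*j, k])   # collapse the first two axes
C_flat = tf.matmul(A_flat, Bkl)       # ordinary 2D matrix product
C = tf.reshape(C_flat, [i, j, l])     # restore the 3D shape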
It seems that in TensorFlow 1.11.0 the docs for tf.matmul incorrectly say that it works for rank >= 2.
Instead, the best clean alternative I've found is to use tf.tensordot(a, b, (-1, 0)) (docs).
In its general form, tf.tensordot(a, b, axes), this function contracts (multiplies and sums over) any chosen axes of a with any chosen axes of b. Passing axes as (-1, 0) contracts the last axis of a with the first axis of b, which is the standard dot product of two arrays.
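np.tensordot accepts the same axes argument, so the contraction can be illustrated in NumPy. This is a sketch with made-up sizes: embed and U mirror the names in the question, and c, the number of columns of U, is an assumption.

```python
import numpy as np

batch_size, n, m, c = 4, 3, 5, 2
rng = np.random.default_rng(0)
embed = rng.normal(size=(batch_size, n, m))
U = rng.normal(size=(m, c))

# Contract the last axis of embed with the first axis of U.
h = np.tensordot(embed, U, axes=(-1, 0))  # shape (batch_size, n, c)

# Reference: multiply each sample by U separately.
h_ref = np.stack([embed[b] @ U for b in range(batch_size)])
```

Because only the shared axis of size m is contracted, the batch size never appears in the call, which is what makes this convenient when it is unknown.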