Indexing for 3-dimensional NumPy arrays (convolutional network) - Python

I'm trying to write a function that performs convolution, and I'm getting a little stuck trying to create the output volume using numpy. Specifically, I have an input image represented as an array of dimensions (150,150,3). I want to convolve over this image with a set of num_kernels kernels, each an array of dimensions (4,4,3), and I want these kernels to move over the image with a stride of 2. My thought process has been:
(1) I'll create an output array built by taking (4,4,3)-size chunks out of the input array, stretching each chunk into a row, and stacking those rows into a large matrix.
(2) Then, I'll create a parameter array composed of all of my (4,4,3) kernels stretched out into rows, which also forms a large matrix.
(3) Then I can dot product these matrices together and reshape the output matrix into the proper dimensions.
My rough pseudo-code start for step (1) is as follows.
def Convolution(input, filter_size, num_filters, stride):
    X = input
    output_Volume = []                     # rows of flattened chunks
    weights = np.zeros(#dimensions)
    # get weights from other function
    for width in range(0, 150, 2):
        for height in range(0, 150, 2):
            row = X[#indexes here to take out chunk].flatten()
            output_Volume.append(row)      # something of this sort
    return #dot product output volume and weights
If someone could provide a specific code example of how to implement this (most helpful would be answers to (1) and (2)) in Python (I'm using numpy), it would be much appreciated. Thank you!
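For reference, here is a minimal sketch of steps (1)-(3) under the question's assumptions (a (150,150,3) image, (4,4,3) kernels, stride 2); the kernel values, function name, and signature are illustrative placeholders rather than anything from an existing answer:
import numpy as np

def convolution(image, weights, filter_size=4, stride=2):
    # image: (150, 150, 3); weights: (num_filters, 4*4*3), one flattened kernel per row
    H, W, C = image.shape
    out_dim = (H - filter_size) // stride + 1          # (150 - 4) // 2 + 1 = 74

    # Step (1): take (4, 4, 3) chunks out of the image and stretch each into a row
    rows = []
    for i in range(0, H - filter_size + 1, stride):
        for j in range(0, W - filter_size + 1, stride):
            chunk = image[i:i + filter_size, j:j + filter_size, :]
            rows.append(chunk.reshape(-1))
    patches = np.stack(rows)                           # (74*74, 48)

    # Steps (2) and (3): one matrix product against the flattened kernels, then reshape
    out = patches @ weights.T                          # (74*74, num_filters)
    return out.reshape(out_dim, out_dim, weights.shape[0])

img = np.random.rand(150, 150, 3)
kernels = np.random.rand(8, 4 * 4 * 3)                 # 8 flattened (4, 4, 3) kernels
print(convolution(img, kernels).shape)                  # (74, 74, 8)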

Related

How to convolve a 3 dimensional array (in this case a filter bank) with a 2 dimensional image (monochrome) in Python?

I have a function definition that takes in an image that is monochromatic and 2 dimensional, and a filter bank that is a 3 dimensional array (48 2D filters). I need to convolve the two to find the feature vector at each pixel location. How do I do that?
I have tried scipy.ndimage.convolve() but get the error "filter weights array has incorrect shape."
To keep things simple, loop over the third dimension (the filter index) of your filter bank, convolve the image with each filter, then stack the results into a 3D matrix. This is actually what I would do for readability.
Suppose your image is stored in img and your filters are stored in filters. img is of size M x N and your filters are of size R x C x D with D being the total number of filters you have.
As you've alluded to using scipy.ndimage.convolve, we can just use that. However, it's possible to use cv2.filter2D too. I'll show you how to use both.
Method #1 - Using scipy.ndimage.convolve
import scipy.ndimage
import numpy as np
outputs = []
D = filters.shape[2]
for i in range(D):
    filt = filters[..., i]
    out = scipy.ndimage.convolve(img, filt)
    outputs.append(out)
outputs = np.dstack(outputs)
The above is straightforward. Create an empty list to store our convolution results, then extract the total number of filters we have. After that, we loop over each filter, convolve the image with it, and append the result to the list. We then use numpy.dstack to stack all of the 2D responses together into a 3D matrix.
Method #2 - Using cv2.filter2D
import cv2
import numpy as np
outputs = []
D = filters.shape[2]
for i in range(D):
    filt = filters[..., i]
    filt = filt[::-1, ::-1]
    out = cv2.filter2D(img, -1, filt)
    outputs.append(out)
outputs = np.dstack(outputs)
This is exactly the same as Method #1 with the exception of calling cv2.filter2D instead. Also take note that I had to rotate the kernel by 180 degrees as cv2.filter2D performs correlation and not convolution. To perform convolution with cv2.filter2D, you need to rotate the kernel first prior to running the method. Take note that the second parameter to cv2.filter2D is the output data type of the result. We set this to -1 to say that it will be whatever the input data type is.
Note on indexing
If you want to avoid indexing into your filter bank altogether and let the for loop do that for you, you can shift the axes around so that the number of filters is the first dimension. You can then construct the resulting 3D output matrix with a list comprehension:
filters = filters.transpose((2, 0, 1))
outputs = np.dstack([scipy.ndimage.convolve(img, filt) for filt in filters])
You can make the monochrome image a 3D array by either padding zeros or replicating the image itself. The number of such paddings depends on the depth of the convolution kernel. For example, let d be the depth of the convolution kernel and I be your image; then
# Do this for copying the image across channels
I_pad = np.concatenate([I[..., np.newaxis] for _ in range(d)], axis=-1)

# Do this for zero padding (the image in the first channel, zeros in the rest)
I_pad = np.concatenate([I[..., np.newaxis]] + [np.zeros(I.shape + (1,)) for _ in range(d - 1)], axis=-1)
Then carry out the convolution. Hope it helps

LSTM cell input matrix dimensions

I'm trying to build an LSTM using just numpy to try and get a feel for what's going on, but I'm running into an issue with my understanding of how the LSTM matrices work. I found this image of an RNN from http://colah.github.io/posts/2015-08-Understanding-LSTMs/:
From my understanding of an RNN, x_t is dot-producted with a weight matrix we will call W_xh, h_{t-1} is dot-producted with a weight matrix W_hh, and the results are summed together.
This makes sense, as the shape of x_t is (b,d), where b is the batch size and d is the dimensionality, and W_xh is of shape (d,h), so the resulting matrix is of shape (b,h).
Similarly, h_{t-1} is of shape (b,h) and W_hh is of shape (h,h), resulting in a matrix of shape (b,h) as well. Summing these together gives a (b,h) matrix, which is perfect as that's the same shape as h_{t-1}.
Here is where I run into problems, looking at this diagram, also from http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Things start to not make much sense. If we look at the forget-gate equation, f_t = sigmoid(W_f · [h_{t-1}, x_t] + b_f):
I know that stacking h_{t-1} and x_t and taking the dot product with W_f is exactly the same thing as splitting W_f into two smaller matrices and doing what we did for the RNN above; the issue comes in with the dimensions. The shape of x_t is (b,d) and the shape of h_{t-1} is (b,h). For them to be stacked on top of one another, h must be equal to d, which isn't always the case.
So, assuming I'm wrong about the shapes of x_t and h_{t-1}, I'm guessing we need to do the dot products prior to passing into the cell, so that the shape of x_t matches the shape of h_{t-1} (both being (b,h)). But even then we run into issues: by stacking the two matrices we end up with a (2b,h) matrix, and unfortunately we can't multiply this by any weight matrix to get back to a (b,h) matrix, so the number of rows would keep growing as more inputs come in, since the output h_{t+1} would be of shape (2b,h).
My question is: what is wrong with my understanding? Looking at https://www.quora.com/In-LSTM-how-do-you-figure-out-what-size-the-weights-are-supposed-to-be, it appears my assumption of doing the dot product prior to passing into the LSTM cell is correct, but for some reason that person said that after stacking, the (b,h) matrices would still result in a (b,h) matrix, which doesn't make much sense to me.
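As a side note, here is a minimal numpy sketch of the usual convention (not from the original thread): the "stacking" is concatenation along the feature axis, so [h_{t-1}, x_t] has shape (b, d+h), W_f has shape (d+h, h), and the gate comes out as (b, h). The sizes below are illustrative only:
import numpy as np

b, d, h = 2, 5, 3                                  # batch size, input dim, hidden dim (made up)
x_t = np.random.randn(b, d)
h_prev = np.random.randn(b, h)

stacked = np.concatenate([h_prev, x_t], axis=1)    # shape (b, d + h), not (2b, h)

W_f = np.random.randn(d + h, h)                    # forget-gate weights
b_f = np.zeros(h)
f_t = 1.0 / (1.0 + np.exp(-(stacked @ W_f + b_f))) # sigmoid, shape (b, h)
print(f_t.shape)                                   # (2, 3)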

How can I combine my three 2D tensors into a single 3D tensor in tensor flow?

Hello, I am a newbie with TensorFlow, and currently I am working with colour images and their PCAs.
I have extracted the PCAs for the "Red", "Green", and "Blue" components, and I have also computed the weights associated with each component.
After doing all of the above, I want to combine the three 2D matrices into a single 3D matrix.
For TensorFlow, it would be a 3D tensor.
def multi(h0, ppca, mu, i, scope=None):
    with tf.variable_scope(scope or "multi"):
        return (tf.matmul(ppca[:, :, 0], h0[i, :, :, 0]) + tf.reshape(mu[:, 0], [4096, 1]),
                tf.matmul(ppca[:, :, 1], h0[i, :, :, 1]) + tf.reshape(mu[:, 1], [4096, 1]),
                tf.matmul(ppca[:, :, 2], h0[i, :, :, 2]) + tf.reshape(mu[:, 2], [4096, 1]))
So from the above function I will get three different 2D tensors, and I want to combine them into a single 3D tensor with dimensions [4096, 1, 3].
How can I do that?
Any help is highly appreciated.
You need to concat them like this:
three_d_image = tf.concat(0, [[r], [g], [b]])
This tells TensorFlow to concatenate them along the first dimension and to treat each tensor as a matrix.
Doing the same without the additional brackets around the r, g, b tensors will try to concatenate them into one large 2D matrix.
A clean, easy way to do it is the tf.stack operation (tf.pack in older versions of TensorFlow); it concatenates all tensors along a new dimension. If you want your new dimension to come after all the existing ones, set the axis argument to the number of dimensions of your tensors.
three_d_image = tf.stack([r,g,b], axis=2)
One solution is to add one more empty dimension to each of your 2D tensors, so you have three tensors of shape [4096, 1, 1]; you can then concatenate these three along axis 2 with tf.concat(2, matrices), which gives you [4096, 1, 3].
A second solution is to concatenate along axis 1 with tf.concat(1, matrices) and then reshape the result to 3D.
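A short sketch of that first suggestion, using the newer tf.concat(values, axis) argument order and zero tensors standing in for the actual r, g, b outputs:
import tensorflow as tf

r = tf.zeros([4096, 1])   # placeholders for the three 2D results of multi()
g = tf.zeros([4096, 1])
b = tf.zeros([4096, 1])

# Add an empty third dimension to each, then concatenate along it
matrices = [tf.expand_dims(t, axis=2) for t in (r, g, b)]  # each [4096, 1, 1]
three_d_image = tf.concat(matrices, axis=2)                # [4096, 1, 3]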

3d image compression with numpy

I have a 3d numpy array representing an object with cells as voxels and the voxels having values from 1 to 10. I would like to compress the image (a) to make it smaller and (b) to get a quick idea later on of how complex the image is by compressing it to a minimum level of agreement with the original image.
I have used SVD to do this with 2D images, looking at how many singular values were required, but it seems to have difficulty with 3D arrays. If, for example, I look at the diagonal terms in the S matrix, they are all zero, whereas I was expecting singular values.
Is there any way I can use svd to compress 3D arrays (e.g. flattening in some way)? Or are other methods more appropriate? If necessary I could probably simplify the voxel values to 0 or 1.
You could essentially apply the same principle to the 3D data without flattening it. There are algorithms to decompose N-dimensional arrays (tensors), such as CP-ALS (using Alternating Least Squares), which is implemented in the package sktensor. You can use the package to decompose the tensor given a rank:
from sktensor import dtensor, cp_als
T = dtensor(X)
rank = 5
P, fit, itr, exectimes = cp_als(T, rank, init='random')
with X being your data. You could then use the weights (weights = P.lmbda) to reconstruct the original array X and calculate the reconstruction error, as you would with SVD.
Other decomposition methods for 3D data (or in general tensors) include the Tucker Decomposition or the Canonical Decomposition (also available in the same package).
It is not directly a 3D SVD, but all the methods above can be used to analyze the principal components of your data.
Below (just for completeness) is an image of the Tucker decomposition:
And below is another image of the decomposition that CP-ALS (an optimization algorithm) tries to obtain:
Image credits to:
1- http://www.slideshare.net/KoheiHayashi1/talk-in-jokyonokai-12989223
2- http://www.bsp.brain.riken.jp/~zhougx/tensor.html
What you want is a higher-order SVD / Tucker decomposition.
In the 3D case, you will get three projection matrices (one for each dimension) and a low rank core tensor (a 3D array).
You can do this easily using TensorLy:
from tensorly.decomposition import tucker
core, factors = tucker(tensor, ranks=[2, 3, 4])
Here, core will have shape (2, 3, 4) and len(factors) will be 3, one factor for each dimension.
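To get the "minimum level of agreement" the question mentions, one option (just a sketch, assuming a recent TensorLy version where the argument is named rank rather than ranks) is to reconstruct from the Tucker factors and measure the relative error:
import numpy as np
import tensorly as tl
from tensorly.decomposition import tucker

# A small random volume standing in for the voxel data
X = tl.tensor(np.random.randint(1, 11, size=(30, 30, 30)).astype(float))

core, factors = tucker(X, rank=[5, 5, 5])        # 'ranks=' in older TensorLy releases

# Rebuild the array from the core and factors, then compare to the original
X_approx = tl.tucker_to_tensor((core, factors))
rel_error = tl.norm(X - X_approx) / tl.norm(X)
print(rel_error)                                 # smaller ranks -> larger error, i.e. cruder compression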

Indexing a tensor in the 3rd dimension

I have a batch of N sequences of integers of length L which is embedded into a N*L*d tensor. This sequence is auto-encoded by my network architecture. So, I have:
from theano import tensor as T
X = T.imatrix('X')            # N*L elements in [0, C]
EMB = T.tensor3('Embedding')  # N*L*d
...                           # some code goes here :-)
PY = T.tensor3('PY')          # N*L*C probability of the predicted class in [0, C]
cost = -T.log(PY[X])
As far as I can tell, this kind of indexing works on the first dimension of the tensor, so I had to use theano.scan. Is there a way to index the tensor directly?
Sounds like you want a 3 dimensional version of theano.tensor.nnet.categorical_crossentropy?
If so, then I think you could simply flatten the matrix of true class label indexes into a vector and the 3D tensor of predicted class probabilities into a matrix, and then use the built-in function:
# Y is the N*L*C tensor of predicted probabilities (PY in the question);
# X is the N*L matrix of true class label indexes
cost = T.nnet.categorical_crossentropy(
    Y.reshape((Y.shape[0] * Y.shape[1], Y.shape[2])),
    X.flatten())
The order of entries in Y may need to be adjusted first (e.g. via a dimshuffle) to make sure the entries in the matrix and vector being compared correspond to each other.
Here we assume, as the question suggests, that the sequences are not padded -- they are all exactly L elements in length. If the sequences are actually padded then you may need to do something much more complicated to avoid computing cost elements inside the padding regions.
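If the sequences were padded, one common approach (sketched here with a hypothetical 0/1 mask M, not part of the original answer) is to zero out the cost at padded positions and average only over the real elements, reusing the PY and X variables from the question:
M = T.matrix('M')  # hypothetical N*L mask: 1.0 for real tokens, 0.0 for padding

per_element_cost = T.nnet.categorical_crossentropy(
    PY.reshape((PY.shape[0] * PY.shape[1], PY.shape[2])),
    X.flatten())

# Zero out contributions from padded positions, then average over real elements only
cost = T.sum(per_element_cost * M.flatten()) / T.sum(M)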
