To implement a specific function, I need "input_channels" number of kernels in my layer, each having only a single channel depth, and not depth = "input_channels".
I need to convolve one kernel with one channel of the input, so the output of the layer would have "input_channels" channels.
Which Python/NumPy/TensorFlow convolution function allows such a convolution, where the kernel depth does not have to equal "input_channels" and can be 1 instead?
Thanks in advance for any help.
Here is what I have tried so far:
In TensorFlow's conv2d function, if I set the number of kernels to 1, it sums over all input channels and the number of output channels is 1, since the kernel depth is always initialised to "input_channels".
Another option is to set the number of kernels to input_channels in conv2d, but this creates "input_channels" kernels, each of depth "input_channels", which adds a lot of complexity and is not a correct implementation of my layer.
I also tried initialising a kernel of shape (kernel_height, kernel_width, input_channels) and looping over the third dimension to convolve a single input channel with a single kernel. But the TensorFlow conv2d function requires a rank 4 kernel and gives the following error:
ValueError: Shape must be rank 4 but is rank 3 for 'generic_act_func_4/Conv2D' (op: 'Conv2D') with input shapes: [?,28,28], [28,28].
As I see it, you're trying to learn a separate model for each channel of the input. Thus you will need 2D convolution filters with a filter depth of 1.
I believe there should be an easier way, but the most logical approach to me would be to create a model consisting of as many submodels as your input has channels (32). That gives 32 models, each containing a single convolutional filter and receiving only one channel of your input. Stacking the outputs from all models would then give the result you require.
Another solution which would be interesting (but I'm not sure whether it will work, have not tried it myself) would be to do separable convolutions on the input.
A link to an article describing these operations:
https://towardsdatascience.com/a-basic-introduction-to-separable-convolutions-b99ec3102728
You essentially want to perform only the first part of the separable convolution operation, which is exactly what the DepthwiseConv2D layer in Keras/TensorFlow does, so I would have a look at that if I were you. I would be interested to know whether this works out for you!
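To make the depthwise idea concrete, here is a minimal NumPy sketch of the operation itself (not the Keras layer): each input channel is cross-correlated with its own single-channel kernel, and channels are never summed together. The function name `depthwise_conv2d`, the channels-last layout, "valid" padding and the 28x28x32 example shapes are assumptions made for illustration.

```python
import numpy as np

def depthwise_conv2d(x, kernels):
    """Cross-correlate each input channel with its own 2D kernel.

    x:       (height, width, channels), channels-last (an assumption)
    kernels: (kh, kw, channels), one single-channel kernel per input channel
    returns: (height - kh + 1, width - kw + 1, channels), "valid" padding
    """
    h, w, c = x.shape
    kh, kw, kc = kernels.shape
    assert kc == c, "need exactly one kernel per input channel"
    out = np.zeros((h - kh + 1, w - kw + 1, c))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = x[i:i + kh, j:j + kw, :]  # (kh, kw, channels)
            # Sum over the spatial axes only, so channels stay separate.
            out[i, j, :] = np.sum(patch * kernels, axis=(0, 1))
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(28, 28, 32))      # hypothetical input with 32 channels
kernels = rng.normal(size=(3, 3, 32))  # 32 kernels, each of depth 1
y = depthwise_conv2d(x, kernels)
print(y.shape)  # (26, 26, 32)
```

The output keeps one channel per input channel, which is the behaviour asked for in the question. A framework layer would add batching, padding options and a bias on top of this.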
I'm trying to set up a non-conventional neural network using keras, and am having trouble efficiently setting this up.
The first few layers are standard convolutional layers, and their output has d channels, each an image of shape n x n.
What I want to do is use a single dense layer to map this d x n x n tensor onto a single image of size n x n. I want to define a single dense layer, with input size d, and output size 1, and apply this function to each "pixel" on the input (with the inputs taken depthwise across channels).
So far, I have not found an efficient solution to this. I have tried first defining a fully connected layer, then looping over each "pixel" in the input; however, this takes many hours to initialize the model, and I am worried that it will slow down backprop, as the computations are likely not properly parallelized.
Is there an efficient way to do this?
What you're describing is a 1x1 convolution with output depth 1. You can implement it just as you implement the rest of the convolution layers. You might want to apply tf.squeeze afterwards to remove the depth, which should have size 1.
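The equivalence can be checked numerically. The sketch below, in plain NumPy, compares a per-pixel loop that applies one shared dense map (d inputs, 1 output) with a single contraction over the channel axis, which is what a 1x1 convolution with one filter computes. The channels-last layout, the sizes `d = 8`, `n = 5` and the bias value are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 8, 5                            # hypothetical channel count and image size
features = rng.normal(size=(n, n, d))  # channels-last feature map (an assumption)
weights = rng.normal(size=(d,))        # shared dense weights: d inputs -> 1 output
bias = 0.1

# Slow version: apply the same dense map to every pixel in a Python loop.
looped = np.empty((n, n))
for i in range(n):
    for j in range(n):
        looped[i, j] = features[i, j, :] @ weights + bias

# 1x1 convolution with one filter: one contraction over the channel axis.
conv_1x1 = np.tensordot(features, weights, axes=([2], [0])) + bias

print(conv_1x1.shape)                 # (5, 5)
print(np.allclose(looped, conv_1x1))  # True
```

Because the contraction already drops the channel axis here, no squeeze is needed in this sketch; a framework convolution layer would typically return a trailing axis of size 1 instead.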
I am wondering whether it is possible to add something like a Flatten layer for images of variable size.
Say we have an input layer for our CNN as:
input_shape=(1, None, None)
After performing your typical series of convolution/maxpooling layers, can we create a flattened layer, such that the shape is:
output_shape=(None,...)
If not, would someone be able to explain why not?
You can add GlobalMaxPooling2D and GlobalAveragePooling2D.
These will eliminate the spatial dimensions and keep only the channels dimension. Max will take the maximum values, Average will get the mean value.
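As a rough illustration of why global pooling copes with variable image sizes, here is a NumPy sketch of the two reductions. The helper names and the example shapes are made up for this example; the Keras layers additionally handle the batch axis.

```python
import numpy as np

def global_average_pool(feature_map):
    """Mean over the spatial axes of a (height, width, channels) array."""
    return feature_map.mean(axis=(0, 1))

def global_max_pool(feature_map):
    """Max over the spatial axes of a (height, width, channels) array."""
    return feature_map.max(axis=(0, 1))

rng = np.random.default_rng(0)
small = rng.normal(size=(7, 9, 16))    # hypothetical feature maps with
large = rng.normal(size=(31, 12, 16))  # different spatial sizes, 16 channels

for fm in (small, large):
    # The pooled vectors have length 16 regardless of height and width.
    print(fm.shape, global_average_pool(fm).shape, global_max_pool(fm).shape)
```

Whatever the spatial size, the result has one value per channel, so a Dense layer placed after it sees a fixed last dimension.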
I don't really know why you can't use a Flatten layer, but in fact you can't with variable dimensions.
I understand why a Dense layer wouldn't work: it would have a variable number of parameters, which is infeasible for backpropagation, weight updates and so on. (PS: Dense layers act only on the last dimension, so that is the only one that needs to be fixed.)
Examples:
A Dense layer requires the last dimension fixed
A Conv layer can have variable spatial dimensions, but needs fixed channels (otherwise the number of parameters will vary)
A recurrent layer can have variable time steps, but needs fixed features and so on
Also, notice that:
For classification models, you'd need a fixed dimension output, so, how to flatten and still guarantee the correct number of elements in each dimension? It's impossible.
For models with variable output, why would you want to have a fixed dimension in the middle of the model anyway?
If you're going totally custom, you can always use K.reshape() inside a Lambda layer and work with the tensor shapes:
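import keras.backend as K

def myReshape(x):
    # Dynamic shape tensor; the batch size is only known at run time.
    shape = K.shape(x)
    batchSize = shape[:1]
    # Target shape (batch, -1): flatten everything after the batch axis.
    newShape = K.variable([-1], dtype='int32')
    newShape = K.concatenate([batchSize, newShape])
    return K.reshape(x, newShape)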
The layer: Lambda(myReshape)
I don't think you can, because the compile step uses those dimensions to allocate fixed memory when your model is instantiated for training or prediction. Some dimensions need to be known ahead of time so that the matrices can be allocated.
I understand why you want variable-sized image input, the world is not (226, 226, 3). It depends on your specific goals, but for me, scaling up or windowing to a region of interest using say Single Shot Detection as a preprocessing step may be helpful. You could just start with Keras's ImageDataGenerator to scale all images to a fixed size - then you see how much of a performance gain you get from conditional input sizing or windowing preprocessing.
@mikkola, I have found Flatten to be very helpful for TimeDistributed models. You can add Flatten after the convolution steps using:
your_model.add(Flatten())
The documentation for the Embedding layer is here:
https://keras.io/layers/embeddings/
and the documentation for the Masking layer is here:
https://keras.io/layers/recurrent/
I can't find a difference there. Should one of the layers be preferred in certain situations?
I feel like Masking() is more masking of time steps; while Embedding(mask_zero=True) is more of a data filter.
Masking:
If all values in the input tensor at that timestep are equal to mask_value, then the timestep will be masked (skipped) in all downstream layers
The mask_value is arbitrary. Thus, you can decide to skip time steps in which there is no input, or which meet some other condition you can think of, based on your data.
For Embedding, you overlay a mask on your input, skipping calculations for data for which the input = 0. This way you can, in a single time step, propagate the full data, part of the data, or no data through the network. This is not a masking of time step #3 or something like that; it is a masking of input data #i. Also, only the absence of input (input = 0) can be masked.
Thus, there are certainly cases where the two are completely equal (for example, when one input being 0 means all inputs are 0), but they operate at different resolutions.
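The two masking rules can be written out in a few lines of NumPy. This is only a sketch of the mask computation, not of the Keras layers themselves; the function names and the tiny example arrays are invented for illustration.

```python
import numpy as np

def masking_layer_mask(x, mask_value=0.0):
    """Masking-style rule on float features of shape (batch, timesteps, features).

    A timestep is kept (True) unless ALL of its features equal mask_value.
    """
    return np.any(x != mask_value, axis=-1)

def embedding_mask_zero(token_ids):
    """Embedding(mask_zero=True)-style rule on integer ids (batch, timesteps).

    A position is kept (True) unless its id is exactly 0.
    """
    return token_ids != 0

# Hypothetical inputs: one sequence of three timesteps each.
features = np.array([[[1.0, 2.0], [0.0, 0.0], [0.0, 3.0]]])
token_ids = np.array([[4, 0, 7]])

print(masking_layer_mask(features).tolist())    # [[True, False, True]]
print(embedding_mask_zero(token_ids).tolist())  # [[True, False, True]]
```

The third timestep in `features` contains a zero but is not masked, because only timesteps whose features all equal `mask_value` are skipped. The Embedding rule works on integer ids before they are embedded and can only use 0 as the masked value.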
I am a little confused about how an LSTM handles its input.
As we all know, the input of an LSTM model in Keras has the form (batch_size, timesteps, input_dim).
My data is a time series, where each sequence of n time steps is fed in to predict the value at time step n+1. How does the LSTM access the input? Does it process each time step in the sequence in turn, or does it have access to all of them at the same time?
I checked the number of parameters of each LSTM layer: it is 4*d*(n+d), where n is the dimension of the input and d is the number of memory cells.
In my case d=10 and the number of parameters is 440 (without bias). That means n=1 here, so it seems the input has dimension 1*1.
So does it have access to all the time steps simultaneously?
Does anyone have any ideas about this?
First, think of a convolutional layer (it's easier).
It has parameters that depend only on the "filter size", "input channels" and "number of filters". But never on the "size of the image".
That happens because it's somewhat a "walking operation". The same group of filters is applied throughout the image. The total operations increase with the size of the image, but the parameters, which only define the filters, are independent from the image size. (Imagine a filter to detect a circle, this filter doesn't need to change to detect circles in different parts of the image, although it's applied for each step in the entire image).
So:
Parameters: number of filters * size of filters² * input channels
Calculation steps: size of image (considering strides, padding, etc.)
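The two quantities above can be put into numbers with a small sketch. The helper names are hypothetical, biases are ignored, and square images with "valid" padding are assumed to keep the arithmetic simple.

```python
def conv2d_weight_count(num_filters, filter_size, input_channels):
    """Weights of a conv layer (biases ignored): filters * size^2 * channels."""
    return num_filters * filter_size ** 2 * input_channels

def conv2d_output_positions(image_size, filter_size, stride=1):
    """Number of filter placements on a square image with 'valid' padding."""
    per_axis = (image_size - filter_size) // stride + 1
    return per_axis ** 2

# Hypothetical layer: 32 filters of size 3x3 on a 3-channel input.
print(conv2d_weight_count(32, 3, 3))    # 864, whatever the image size

# The amount of computation grows with the image, the weights do not.
print(conv2d_output_positions(28, 3))   # 676
print(conv2d_output_positions(224, 3))  # 49284
```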
With LSTM layers, a similar thing happens. The parameters are related to what they call "gates". (Take a look here)
There is a "state", and "gates" that are applied in each time iteration to determine how the state will change.
The gates are not time dependent, though. The calculations are time iterations indeed, but every iteration uses the same group of gates.
Comparing to the convolutional layers:
Parameters: number of cells, data dimension
Calculation steps: time steps
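A bare-bones NumPy forward pass makes the same point for the LSTM: the loop visits one time step at a time, and every step reuses the same weight matrices, so the parameter count depends only on the input dimension n and the number of cells d. This is a simplified sketch, not the Keras implementation; the gate ordering inside the stacked matrices, the zero initial state and the function names are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_forward(x, W, U, b):
    """Run a single LSTM layer over one sequence and return the final state.

    x: (timesteps, n) input sequence
    W: (n, 4*d) input weights, U: (d, 4*d) recurrent weights, b: (4*d,) bias
    The four gate blocks are stacked side by side (ordering is an assumption).
    """
    d = U.shape[0]
    h = np.zeros(d)  # hidden state
    c = np.zeros(d)  # cell state
    for x_t in x:    # one time step at a time, always with the same W, U, b
        z = x_t @ W + h @ U + b
        i, f, g, o = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c + i * np.tanh(g)
        h = o * np.tanh(c)
    return h

n, d = 1, 10  # the values from the question
rng = np.random.default_rng(0)
W = rng.normal(size=(n, 4 * d))
U = rng.normal(size=(d, 4 * d))
b = np.zeros(4 * d)

print(W.size + U.size)  # 440, i.e. 4*d*(n+d) without bias

# Sequences of different lengths run through the same weights.
for timesteps in (5, 50):
    h = lstm_forward(rng.normal(size=(timesteps, n)), W, U, b)
    print(timesteps, h.shape)
```

The count of 440 never involves the number of time steps, which matches the observation in the question: the layer sees the sequence step by step, carrying information forward through its state.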
Some use cases for neural networks require that not all neurons are connected between two consecutive layers. For my neural network architecture, I need a layer where each neuron only has connections to some prespecified neurons in the previous layer (at somewhat arbitrary places, not following a pattern as in a convolution layer). This is needed in order to model data on a specific graph. I need to implement this "Sparse" layer in Theano, but I'm not used to the Theano way of programming.
It seems that the most efficient way of programming sparse connections in Theano would be to use theano.tensor.nnet.blocksparse.SparseBlockGemv. An alternative would be to do matrix multiplication, where many weights are set to 0 (= no connection), but that would be very inefficient compared to SparseBlockGemv as each neuron is only connected to 2-6 neurons in the previous layer out of ~100000 neurons. Moreover, a weight matrix of 100000x100000 would not fit on my RAM/GPU. Could someone therefore provide an example of how to implement sparse connections using the SparseBlockGemv method or another computationally-efficient method?
A perfect example would be to extend the MLP Theano Tutorial with an extra layer after the hidden layer (and before softmax), where each neuron only has connections to a subset of neurons in the previous layer. However, other examples are also very welcome!
Edit: Note that the layer must be implemented in Theano as it is just a small part of a larger architecture.
The output of a fully-connected layer is given by the dot product of the input and the weights of that layer. In theano or numpy you can use the dot method.
y = x.dot(w)
If you only have connections to some neurons in the previous layer and those connections are predefined you could do something like this:
y = [x[edges[i]].dot(w[i]) for i in neurons]
Where edges[i] contains the indices for neurons connected to neuron i and w[i] the weights of this connection.
Please note, that theano doesn't know about layers or other high-level details.
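For reference, here is a small runnable NumPy version of the per-neuron gather-and-dot idea, checked against a dense weight matrix that is zero wherever there is no connection. The edge lists, sizes and variable names are hypothetical, and this is plain NumPy rather than Theano, so it only illustrates the indexing.

```python
import numpy as np

rng = np.random.default_rng(0)
num_inputs = 10
x = rng.normal(size=num_inputs)  # activations of the previous layer

# edges[i]: indices of the previous-layer neurons feeding output neuron i
# (made-up connectivity for the example).
edges = [np.array([0, 3]), np.array([1, 2, 7]), np.array([4, 9])]
# w[i]: one weight per incoming connection of output neuron i.
w = [rng.normal(size=len(e)) for e in edges]

# Sparse evaluation: gather only the connected inputs for each neuron.
y = np.array([x[e].dot(w_i) for e, w_i in zip(edges, w)])

# Reference: the same layer as a dense matrix with zeros for missing edges.
dense = np.zeros((len(edges), num_inputs))
for i, e in enumerate(edges):
    dense[i, e] = w[i]

print(np.allclose(y, dense @ x))  # True
```

Only the listed connections are ever stored or multiplied, which is the reason to prefer this form when each neuron has a handful of inputs out of a very large previous layer.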
Apologies for resurrecting an old thread, but this was the simplest guidance I found that was useful in extending the guidance at https://iamtrask.github.io/2015/07/12/basic-python-network/ for partially-connected inputs. However, it took me a while to make sense of basaundi's answer and I think I can improve upon it.
There were a couple of things that I needed to change to make it work. In my case, I am trying to map from N inputs to M neurons in my first hidden layer. My inputs are in a NxF array, where F is the number of features for my inputs, and my synapse values (weights) between inputs and the first layer are in a FxM array. Therefore, the output of Inputs <dot> Weights is a NxM array. My edge matrix is an MxF array that specifies for each neuron in layer 1 (rows), which of the features of the input data are relevant (columns).
In this setup, at least, it required me to slice my arrays differently than specified above. Also, the list comprehension returns a list of matrices, which must be summed to get the correct NxM (otherwise you get an MxNxM array).
So I have used the following (util.sigmoid is a helper function of my own):
y = [numpy.dot(x[:, edges[i]], w[edges[i]])
     for i in range(M)]
y = util.sigmoid(numpy.sum(y, 0))
This seems to work for me.
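For small, dense problems like this one, a different formulation that is sometimes used is to multiply the weight matrix elementwise by the connectivity mask before the dot product. The sketch below is not the code from this answer; it assumes `edges` is a boolean MxF array as described above, and the sizes and the local `sigmoid` helper are made up for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
N, F, M = 4, 6, 3            # samples, input features, layer-1 neurons
x = rng.normal(size=(N, F))  # inputs, N x F
w = rng.normal(size=(F, M))  # weights, F x M

# edges[m, f] is True when neuron m is connected to feature f (M x F).
edges = np.array([[1, 1, 0, 0, 0, 0],
                  [0, 0, 1, 1, 0, 0],
                  [0, 0, 0, 0, 1, 1]], dtype=bool)

# Zero out the weights of missing connections, then do one dense product.
masked_w = w * edges.T     # F x M
y = sigmoid(x @ masked_w)  # N x M

# Neuron 0 is only wired to features 0 and 1, so changing the other
# features must leave its output untouched.
x_changed = x.copy()
x_changed[:, 2:] += 100.0
y_changed = sigmoid(x_changed @ masked_w)
print(np.allclose(y[:, 0], y_changed[:, 0]))  # True
```

This keeps the full FxM matrix in memory, so it does not address the very large layers from the original question. If the layer is trained, the mask has to be reapplied (or the gradients masked) after each update so that removed connections stay at zero.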