Some use cases for neural networks require that not all neurons are connected between two consecutive layers. For my neural network architecture, I need a layer where each neuron only has connections to some prespecified neurons in the previous layer (at somewhat arbitrary places, not following a pattern as in a convolution layer). This is needed in order to model data on a specific graph. I need to implement this "Sparse" layer in Theano, but I'm not used to the Theano way of programming.
It seems that the most efficient way of programming sparse connections in Theano would be to use theano.tensor.nnet.blocksparse.SparseBlockGemv. An alternative would be to do a matrix multiplication where many weights are set to 0 (= no connection), but that would be very inefficient compared to SparseBlockGemv, as each neuron is only connected to 2-6 neurons in the previous layer out of ~100000 neurons. Moreover, a weight matrix of 100000x100000 would not fit in my RAM or GPU memory. Could someone therefore provide an example of how to implement sparse connections using the SparseBlockGemv method or another computationally efficient method?
A perfect example would be to extend the MLP Theano Tutorial with an extra layer after the hidden layer (and before softmax), where each neuron only has connections to a subset of neurons in the previous layer. However, other examples are also very welcome!
Edit: Note that the layer must be implemented in Theano as it is just a small part of a larger architecture.
The output of a fully-connected layer is given by the dot product of the input and the weights of that layer. In theano or numpy you can use the dot method.
y = x.dot(w)
If you only have connections to some neurons in the previous layer and those connections are predefined you could do something like this:
y = [x[edges[i]].dot(w[i]) for i in neurons]
Where edges[i] contains the indices for neurons connected to neuron i and w[i] the weights of this connection.
Please note that Theano doesn't know about layers or other high-level details.
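The per-neuron dot products above can be written as a single gather followed by a reduction. The following is a minimal NumPy sketch, not Theano code, and it assumes every neuron has the same number of incoming connections so that edges and w fit in rectangular arrays. The sizes and the names n_in, n_out and fan_in are illustrative, not taken from the question.

```python
import numpy as np

rng = np.random.default_rng(0)

n_in, n_out, fan_in = 1000, 5, 4   # illustrative sizes (assumption)
x = rng.normal(size=n_in)          # activations of the previous layer

# edges[i] holds the indices of the inputs feeding neuron i (assumed layout).
edges = rng.integers(0, n_in, size=(n_out, fan_in))
# w[i] holds one weight per incoming edge of neuron i.
w = rng.normal(size=(n_out, fan_in))

# Gather the connected inputs for all neurons at once, then sum per neuron.
y = (x[edges] * w).sum(axis=1)     # shape (n_out,)

# Reference: the list-comprehension form, one dot product per neuron.
y_loop = np.array([x[edges[i]].dot(w[i]) for i in range(n_out)])
```

Only n_out * fan_in weights are stored, so the full n_in x n_out matrix never has to exist in memory.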
Apologies for resurrecting an old thread, but this was the simplest guidance I found that was useful in extending the guidance at https://iamtrask.github.io/2015/07/12/basic-python-network/ for partially-connected inputs. However, it took me a while to make sense of basaundi's answer and I think I can improve upon it.
There were a couple of things that I needed to change to make it work. In my case, I am trying to map from N inputs to M neurons in my first hidden layer. My inputs are in a NxF array, where F is the number of features for my inputs, and my synapse values (weights) between inputs and the first layer are in a FxM array. Therefore, the output of Inputs <dot> Weights is a NxM array. My edge matrix is an MxF array that specifies for each neuron in layer 1 (rows), which of the features of the input data are relevant (columns).
In this setup, at least, it required me to slice my arrays differently than specified above. Also, the list comprehension returns a list of matrices, which must be summed to get the correct NxM (otherwise you get an MxNxM array).
So I have used the following (util.sigmoid is a helper function of my own):
y = [numpy.dot(x[:, edges[i]], w[edges[i]])
     for i in range(M)]
y = util.sigmoid(numpy.sum(y, 0))
This seems to work for me.
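For comparison, here is a different formulation of the same setup, not the code above: if the MxF edge matrix is boolean, the weights of absent connections can be zeroed with an elementwise product and the layer computed with one dense product. This is a sketch under that assumption. The local sigmoid stands in for util.sigmoid, which is assumed to be the logistic function.

```python
import numpy as np

def sigmoid(z):
    # Stand-in for the author's util.sigmoid (assumed to be the logistic function).
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
N, F, M = 6, 5, 3                       # illustrative sizes (assumption)
x = rng.normal(size=(N, F))             # inputs, N x F
w = rng.normal(size=(F, M))             # weights, F x M
# Boolean M x F matrix: edges[i, f] is True if neuron i uses feature f.
edges = rng.random(size=(M, F)) < 0.5

# Zero the weights of absent connections, then do one dense product.
y = sigmoid(x @ (w * edges.T))          # N x M

# Reference: compute each neuron's column from its own features only.
y_ref = np.stack(
    [sigmoid(x[:, edges[i]] @ w[edges[i], i]) for i in range(M)], axis=1
)
```

The masked product still stores and multiplies the full FxM weight array, so it is only reasonable when F and M are small.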
To be more precise: let's say I already have a vector that represents something (a word, object, image...) and that I cannot change the way I get it. What I would like to do is create a NN without the embedding and pooling layers, and I am wondering if TensorFlow supports this kind of approach.
Let's say my vector is 10 features long (10 floats). For each vector I also have a label; let's say there are 3 labels to choose from.
What I am trying (and struggling) to do is this: I would like to feed this sort of vector into a Keras dense layer with ReLU activation and 10 neurons (maybe stacking 2 or 3 of them), and then use a final layer with sigmoid activation and 3 output neurons.
Then fit with labels on 40(?) epochs and so on...
My main question is: is this possible? I have yet to finish the code, so maybe I am asking this a bit too soon.
Is this how one would approach this or would you build the model from embedding layer down and would not use the already made vectors?
Indeed it is possible.
One way to do it is to create a generator function that yields the vectors you want to pass to the network (whatever your vector representation is). Then create a TensorFlow dataset by calling tf.data.Dataset.from_generator.
The model will then probably be just a Sequential stack of dense layers.
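To make the shapes concrete, here is a minimal NumPy sketch of the forward pass such a stack of dense layers would compute. It is not Keras code. The layer sizes follow the question (10 features, hidden layers of 10 units, 3 outputs), while the random weights and the input batch are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# 10 input features, two hidden layers of 10 units, 3 output units.
sizes = [10, 10, 10, 3]
# One (weights, bias) pair per layer; random initial values are placeholders.
params = [(rng.normal(scale=0.1, size=(m, n)), np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]

def forward(x):
    # Hidden layers use ReLU; the last layer uses sigmoid, as in the question.
    for W, b in params[:-1]:
        x = relu(x @ W + b)
    W, b = params[-1]
    return sigmoid(x @ W + b)

batch = rng.normal(size=(4, 10))   # four precomputed 10-float vectors (placeholder data)
scores = forward(batch)            # shape (4, 3): one score per label
```

The precomputed vectors go straight into the first dense layer, so no embedding or pooling layer is involved.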
I'm trying to set up a non-conventional neural network using keras, and am having trouble efficiently setting this up.
The first few layers are standard convolutional layers, and the output of these have d channels, which each have image shapes of n x n.
What I want to do is use a single dense layer to map this d x n x n tensor onto a single image of size n x n. I want to define a single dense layer, with input size d, and output size 1, and apply this function to each "pixel" on the input (with the inputs taken depthwise across channels).
So far, I have not found an efficient solution to this. I have tried first defining a fully connected layer and then looping over each "pixel" in the input, but this takes many hours to initialize the model, and I am worried that it will slow down backprop, as the computations are likely not properly parallelized.
Is there an efficient way to do this?
What you're describing is a 1x1 convolution with output depth 1. You can implement it just as you implement the rest of the convolution layers. You might want to apply tf.squeeze afterwards to remove the depth, which should have size 1.
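The equivalence can be checked numerically. The sketch below is plain NumPy with a channels-first d x n x n layout and illustrative sizes (both assumptions). It computes the 1x1 convolution as a contraction over the channel axis and compares it with the per-pixel loop described in the question.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 8, 5                        # illustrative sizes (assumption)
x = rng.normal(size=(d, n, n))     # feature maps, channels first
w = rng.normal(size=d)             # one weight per channel: the 1x1 kernel
b = 0.5                            # bias shared by all pixels

# 1x1 convolution with a single output channel: contract the channel axis.
out = np.einsum('dij,d->ij', x, w) + b     # shape (n, n)

# Reference: apply the same d -> 1 dense map to each pixel in a loop.
ref = np.empty((n, n))
for i in range(n):
    for j in range(n):
        ref[i, j] = x[:, i, j].dot(w) + b
```

Both versions share the same d weights and one bias, but the contraction is a single vectorized operation.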
To implement a specific function, I need "input_channels" number of kernels in my layer, each having only a single channel depth, and not depth = "input_channels".
I need to convolve one kernel with one channel of the input, so the output of the layer would have "input_channels" channels.
Which python/numpy/tensorflow convolution function can allow such a convolution where the number of channels in kernel must not always be equal to "input_channels" and can be 1 instead?
Thanks in advance for any help.
(If anyone wishes to know what I have tried so far:
In the conv2d function of tensorflow, if I specify number of kernels = 1 to do this, then it will sum over all input_channels and number of output_channels will be 1, since it always initialises kernel depth = "input_channels".
Another option is to specify number of kernels = input_channels in the conv2d function, but this would create "input_channels" kernels, each of depth "input_channels", which adds a lot of complexity and is not a correct implementation of my layer.
Yet another thing I tried was to initialise a kernel of volume (kernel_height, kernel_width, input_channels) and loop over the third dimension to convolve only a single input channel with a single kernel. But the tensorflow conv2d function requires a rank 4 kernel to work and gives the following error -
ValueError: Shape must be rank 4 but is rank 3 for 'generic_act_func_4/Conv2D' (op: 'Conv2D') with input shapes: [?,28,28], [28,28]. )
As I see it, you're trying to learn a separate model for each dimension in the input. Thus you will need 2D convolution filters with a filter depth of 1.
I believe there should be an easier way, but most logical to me would be to create a model consisting of a number of submodels equal to the depth of your input (32). Thus 32 models containing a single convolutional filter, receiving only one dimension of your input. Stacking the output from all models would then give the results as you require.
Another solution which would be interesting (but I'm not sure whether it will work, have not tried it myself) would be to do separable convolutions on the input.
A link to an article describing these operations:
https://towardsdatascience.com/a-basic-introduction-to-separable-convolutions-b99ec3102728
You essentially want to perform only the first part of the separable convolution operation, which is exactly what the DepthwiseConv2D layer in Keras/TensorFlow does. So I would have a look at that if I were you. I would be interested to know whether this works out for you!
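As an illustration of what a depthwise convolution computes, here is a minimal NumPy sketch rather than the Keras layer itself. It assumes a channels-last layout, stride 1, no padding, and cross-correlation (no kernel flip), which is the usual convention in deep learning frameworks. All sizes are illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)
H, W, C, kh, kw = 7, 7, 4, 3, 3          # illustrative sizes (assumption)
x = rng.normal(size=(H, W, C))           # one image, channels last
kernels = rng.normal(size=(kh, kw, C))   # one 2D kernel per input channel

# windows[i, j, c, u, v] == x[i + u, j + v, c]
windows = sliding_window_view(x, (kh, kw), axis=(0, 1))
# Each channel is combined only with its own kernel; channels are never mixed.
out = np.einsum('ijcuv,uvc->ijc', windows, kernels)   # (H-kh+1, W-kw+1, C)

# Reference: explicit loops over channels and output positions.
ref = np.zeros((H - kh + 1, W - kw + 1, C))
for c in range(C):
    for i in range(H - kh + 1):
        for j in range(W - kw + 1):
            ref[i, j, c] = np.sum(x[i:i + kh, j:j + kw, c] * kernels[:, :, c])
```

The output has as many channels as the input, and the kernel tensor has depth 1 per channel, which matches the requirement in the question.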
I am wondering if there is a way in TensorFlow, PyTorch or some other library to selectively connect neurons. I want to make a network with a very large number of neurons in each layer, but that has very few connections between layers.
Note that I do not think this is a duplicate of this question: Selectively zero weights in TensorFlow?. I implemented a custom Keras layer using essentially the same method that appears in that question, namely a dense layer where all but the specified weights are ignored in training and evaluation. This fulfills part of what I want by not training the specified weights and not using them for prediction. The problem is that I still waste memory storing the untrained weights, and I waste time calculating the gradients of the zeroed weights. What I would like is for the computation of the gradient matrices to involve only sparse matrices, so that I do not waste time and memory.
Is there a way to selectively create and train weights without wasting memory? If my question is unclear or there is more information that it would be helpful for me to provide, please let me know. I would like to be helpful as a question-asker.
The usual, simple solution is to initialize your weight matrices to have zeros where there should be no connection. You store a mask of the locations of these zeros, and set the weights at these positions to zero after each weight update. You need to do this because the gradient for zero weights may be nonzero, and this would introduce nonzero weights (i.e. connections) where you don't want any.
Pseudocode:
# setup network
weights = sparse_init()  # only nonzero for existing connections
zero_mask = where(weights == 0)
# train
for e in range(num_epochs):
    train_operation()  # may lead to introduction of new connections
    weights[zero_mask] = 0  # so we set them to zero again
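A runnable version of this idea might look like the NumPy sketch below. The linear layer, mean squared error loss, random data and learning rate are all assumptions made for illustration. The part that corresponds to the pseudocode is the re-zeroing of the masked weights after every update.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out, n_samples = 6, 4, 32        # illustrative sizes (assumption)

# True where a connection exists; all other weights must stay at zero.
mask = rng.random(size=(n_in, n_out)) < 0.3
weights = rng.normal(size=(n_in, n_out)) * mask

x = rng.normal(size=(n_samples, n_in))           # placeholder inputs
target = rng.normal(size=(n_samples, n_out))     # placeholder targets

def loss(w):
    # Mean squared error of a single linear layer.
    return np.mean((x @ w - target) ** 2)

initial_loss = loss(weights)
lr = 0.05
for epoch in range(200):
    # Gradient of the mean squared error with respect to the dense weights.
    grad = 2.0 * x.T @ (x @ weights - target) / (n_samples * n_out)
    weights -= lr * grad        # this update may create unwanted connections
    weights[~mask] = 0.0        # so they are set back to zero
final_loss = loss(weights)
```

This keeps the connectivity fixed, but the dense weight matrix and its full gradient are still computed and stored.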
Both tensorflow and pytorch support sparse tensors (torch.sparse, tf.sparse).
My intuitive understanding would be that if you were willing to write your network using the respective low level APIs (e.g. actually implementing the forward-pass yourself), you could cast your weight matrices as sparse tensors. That would in turn result in sparse connectivity, since the weight matrix of layer [L] defines the connectivity between neurons of the previous layer [L-1] with neurons of layer [L].
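To illustrate what a hand-written sparse forward and backward pass could look like, here is a NumPy sketch that stores one (source, destination, value) triple per connection, similar in spirit to a COO sparse matrix. It does not use torch.sparse or tf.sparse. The names src, dst and val and all sizes are hypothetical, and the dense matrix is built only to check the result.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out, n_edges, batch = 50, 40, 120, 8    # illustrative sizes (assumption)

# One entry per connection: source neuron, destination neuron, weight value.
src = rng.integers(0, n_in, size=n_edges)
dst = rng.integers(0, n_out, size=n_edges)
val = rng.normal(size=n_edges)

x = rng.normal(size=(batch, n_in))

def forward(x, src, dst, val):
    # Accumulate each edge's contribution into its destination neuron.
    # np.add.at handles repeated destination indices correctly.
    y_t = np.zeros((n_out, x.shape[0]))
    np.add.at(y_t, dst, (x[:, src] * val).T)
    return y_t.T

def grad_values(x, dy, src, dst):
    # Gradient with respect to the stored weights only: one number per edge.
    return (x[:, src] * dy[:, dst]).sum(axis=0)

y = forward(x, src, dst, val)
dy = rng.normal(size=(batch, n_out))     # placeholder upstream gradient
g = grad_values(x, dy, src, dst)

# Dense reference, built only to check the sparse computation.
W = np.zeros((n_in, n_out))
np.add.at(W, (src, dst), val)
y_dense = x @ W
g_dense = (x.T @ dy)[src, dst]
```

Memory and gradient cost scale with the number of connections instead of n_in * n_out, which is the property the question asks for.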
I am wondering whether it is possible to add something similar to a Flatten layer for images of variable size.
Say we have an input layer for our CNN as:
input_shape=(1, None, None)
After performing your typical series of convolution/maxpooling layers, can we create a flattened layer, such that the shape is:
output_shape=(None,...)
If not, would someone be able to explain why not?
You can add GlobalMaxPooling2D and GlobalAveragePooling2D.
These will eliminate the spatial dimensions and keep only the channels dimension. Max will take the maximum values, Average will get the mean value.
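The effect of global pooling on variable-sized inputs can be seen in a small NumPy sketch. It assumes a channels-last (height, width, channels) layout and illustrative sizes, and the two helper functions are hypothetical stand-ins for what the Keras layers compute on a single sample.

```python
import numpy as np

rng = np.random.default_rng(0)

def global_max_pool(fmap):
    # fmap has shape (height, width, channels); reduce over both spatial axes.
    return fmap.max(axis=(0, 1))

def global_avg_pool(fmap):
    return fmap.mean(axis=(0, 1))

# Two feature maps with different spatial sizes but the same number of channels.
small = rng.normal(size=(7, 9, 16))
large = rng.normal(size=(30, 21, 16))

pooled_small = global_max_pool(small)    # shape (16,)
pooled_large = global_avg_pool(large)    # shape (16,)
```

Whatever the spatial size of the input, the result has one value per channel, so a Dense layer with a fixed number of weights can follow.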
I don't really know why you can't use a Flatten layer, but in fact you can't with variable dimensions.
I understand why a Dense layer wouldn't work: it would have a variable number of parameters, which is totally infeasible for backpropagation, weight updates and things like that. (PS: Dense layers act only on the last dimension, so that is the only one that needs to be fixed.)
Examples:
A Dense layer requires the last dimension fixed
A Conv layer can have variable spatial dimensions, but needs fixed channels (otherwise the number of parameters will vary)
A recurrent layer can have variable time steps, but needs fixed features and so on
Also, notice that:
For classification models, you'd need a fixed dimension output, so, how to flatten and still guarantee the correct number of elements in each dimension? It's impossible.
For models with variable output, why would you want to have a fixed dimension in the middle of the model anyway?
If you're going totally custom, you can always use K.reshape() inside a Lambda layer and work with the tensor shapes:
import keras.backend as K
def myReshape(x):
    shape = K.shape(x)
    batchSize = shape[:1]
    newShape = K.variable([-1],dtype='int32')
    newShape = K.concatenate([batchSize,newShape])
    return K.reshape(x,newShape)
The layer: Lambda(myReshape)
I don't think you can because the compile step uses those dimensions to allocate fixed memory when your model is instanced for training or prediction. Some dimensions need to be known ahead of time, so the matrix dimensions can be allocated.
I understand why you want variable-sized image input; the world is not (226, 226, 3). It depends on your specific goals, but for me, scaling up or windowing to a region of interest, using say Single Shot Detection as a preprocessing step, may be helpful. You could just start with Keras's ImageDataGenerator to scale all images to a fixed size, and then see how much of a performance gain you get from conditional input sizing or windowing preprocessing.
@mikkola, I have found Flatten to be very helpful for TimeDistributed models. You can add a Flatten layer after the convolution steps using:
your_model.add(Flatten())