Post-edit: Turns out I got confused while constantly playing with the three functions below.
1. model.weights
2. model.get_weights()
3. model.layers[i].get_weights()
model.layers[i].get_weights() returns two separate arrays (without any tags), the kernel and the bias, provided the layer uses a bias.
model.get_weights() directly returns all the weights of the model without any tags.
model.weights returns the weights plus a bit of info, such as the name of the layer each one belongs to and its shape. I was using this one for the experiment in the question.
What confused me was simply 1 and 3 above.
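A minimal sketch of the three accessors side by side, assuming TensorFlow 2.x with tf.keras. The tiny model and its layer sizes are made up for illustration and are not the model from the question:

```python
import tensorflow as tf

# Hypothetical toy model: one conv layer followed by one dense layer.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(8, 8, 1)),
    tf.keras.layers.Conv2D(4, 3),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(2),
])

# 1. Variable objects that carry a name and a shape
#    (the exact name format depends on the Keras version).
variables = model.weights
for v in variables:
    print(v.name, v.shape)

# 2. Plain NumPy arrays for the whole model, in the same order, without names.
all_arrays = model.get_weights()

# 3. Plain NumPy arrays for a single layer: kernel first, then bias.
kernel, bias = model.layers[0].get_weights()
print(kernel.shape, bias.shape)
```

Each layer contributes one kernel and one bias, so the kernel array itself does not contain a second bias.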
Note: I've decided not to delete the question because it received an answer, and with the post-edit it may still help someone.
The question was...
After saving a Keras model, when I check the weights, I notice 2 separate biases.
Below is a part of weights listed by names.
conv2d/kernel:0
conv2d/bias:0
The kernel entries store a bias array as their second NumPy array element, which I took to be the original bias of the layer. Yet there are separate bias entries as well.
Which one serves what purpose? What is the difference between them?
The convolution layer (conv2d) has a kernel and a bias term, and the dense layer (dense) also has a kernel and a bias term. The bias terms give each layer an extra degree of freedom (a learnable offset added to each output), which makes the network more expressive.
I am trying to switch from PyTorch to TensorFlow, and since the model now seems to be a fixed thing in TensorFlow, I have stumbled upon a problem when working with convolutional neural networks.
I have a very simple model, just one Conv1D layer with a kernel of size 2.
I want to train it on a small configuration, say an input size of 16, and then apply the training results to an input size of 32.
How can I access the 3 parameters in this network (2 kernel, 1 bias)? I want to do so in order to apply them to the larger case. I struggle because I need to pre-define an input size for the model, which was not the case with PyTorch.
Thanks for answering, I've only found outdated answers to this question.
model.layers[0].get_weights() yields the weights of the first layer, assuming model is a tf.keras.Model object.
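A minimal sketch of how this could look for the Conv1D case, assuming TensorFlow 2.x with tf.keras. Declaring the length axis as None is an assumption about what the asker needs; the random training data is purely illustrative:

```python
import numpy as np
import tensorflow as tf

# Length axis left as None, so the same layer accepts 16 or 32 time steps.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(None, 1)),
    tf.keras.layers.Conv1D(filters=1, kernel_size=2),
])
model.compile(optimizer="adam", loss="mse")

# Train on inputs of length 16 (illustrative random data).
x16 = np.random.rand(4, 16, 1).astype("float32")
y16 = np.random.rand(4, 15, 1).astype("float32")  # "valid" padding: 16 - 2 + 1
model.fit(x16, y16, epochs=1, verbose=0)

# The three parameters: two kernel values and one bias.
kernel, bias = model.layers[0].get_weights()
print(kernel.shape, bias.shape)

# Apply the same trained parameters to an input of length 32.
x32 = np.random.rand(1, 32, 1).astype("float32")
out32 = np.asarray(model(x32))
print(out32.shape)
```

If a second model with a fixed input size of 32 is preferred, the arrays returned by get_weights() can be passed to that model's layer via set_weights(), since the parameter shapes do not depend on the input length.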
To be more precise: let's say I already have a vector that represents something (word, object, image...) and that I cannot change the way I get it. What I would like to do is create a NN without the embedding and pooling layers, and I am wondering if TensorFlow supports this kind of approach.
Let's say my vector is 10 features long (10 floats). For each vector I also have a label; let's say there are 3 labels to choose from.
What I am trying (and struggling) to do is this: I would like to push this sort of vector input into a Keras dense layer with ReLU activation and 10 neurons (stacking maybe 2 or 3 of them) and then, as a final layer, use sigmoid activation with 3 output neurons.
Then fit with labels for 40(?) epochs and so on...
My main question is, well: is this possible? I have yet to finish the code and maybe I am asking this a bit too soon, but nevertheless.
Is this how one would approach this, or would you build the model from the embedding layer down and not use the already made vectors?
Indeed it is possible.
One way to do it is to create a generator function that yields the vectors you want to pass to the network (your vector representation, whatever it is). Then create a TensorFlow dataset by calling tf.data.Dataset.from_generator.
The model will then probably be just a Sequential of dense layers.
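A minimal sketch of that approach, assuming TensorFlow 2.x. The generator name, the random vectors and labels, and the epoch count are placeholders. The final layer uses softmax rather than the sigmoid mentioned in the question, on the assumption that each vector has exactly one of the 3 labels:

```python
import numpy as np
import tensorflow as tf

def vector_generator():
    # Hypothetical stand-in for the existing pipeline: 10 floats plus a label in {0, 1, 2}.
    rng = np.random.default_rng(0)
    for _ in range(64):
        yield rng.random(10, dtype=np.float32), np.int32(rng.integers(0, 3))

dataset = tf.data.Dataset.from_generator(
    vector_generator,
    output_signature=(
        tf.TensorSpec(shape=(10,), dtype=tf.float32),
        tf.TensorSpec(shape=(), dtype=tf.int32),
    ),
).batch(16)

# No embedding or pooling layers: the vectors go straight into dense layers.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),
    tf.keras.layers.Dense(10, activation="relu"),
    tf.keras.layers.Dense(10, activation="relu"),
    tf.keras.layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(dataset, epochs=2, verbose=0)

probs = model.predict(np.zeros((2, 10), dtype=np.float32), verbose=0)
print(probs.shape)
```

If all vectors already fit in memory as a NumPy array, passing the arrays directly to model.fit would work as well; the generator is mainly useful when the vectors are produced on the fly.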
I am wondering if there is a way in TensorFlow, PyTorch or some other library to selectively connect neurons. I want to make a network with a very large number of neurons in each layer, but that has very few connections between layers.
Note that I do not think this is a duplicate of this answer: Selectively zero weights in TensorFlow?. I implemented a custom Keras layer using essentially the same method that appears in that question, namely by creating a dense layer where all but the specified weights are ignored in training and evaluation. This fulfills part of what I want to do by not training specified weights and not using them for prediction. But the problem is that I still waste memory storing the untrained weights, and I waste time calculating the gradients of the zeroed weights. What I would like is for the computation of the gradient matrices to involve only sparse matrices, so that I do not waste time and memory.
Is there a way to selectively create and train weights without wasting memory? If my question is unclear or there is more information that it would be helpful for me to provide, please let me know. I would like to be helpful as a question-asker.
The usual, simple solution is to initialize your weight matrices to have zeros where there should be no connection. You store a mask of the locations of these zeros, and set the weights at these positions to zero after each weight update. You need to do this because the gradient for zero weights may be nonzero, and this would introduce nonzero weights (i.e. connections) where you don't want any.
Pseudocode:
# setup network
weights = sparse_init() # only nonzero for existing connections
zero_mask = where(weights == 0)
# train
for e in range(num_epochs):
    train_operation() # may lead to introduction of new connections
    weights[zero_mask] = 0 # so we set them to zero again
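The pseudocode above can be made concrete with plain NumPy. This is only a sketch under assumptions: a single linear layer, a mean-squared-error gradient written out by hand, and a random connectivity mask. All sizes and names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# setup network: True marks an existing connection (illustrative random pattern)
mask = rng.random((6, 4)) < 0.3
weights = rng.normal(size=(6, 4)) * mask  # only nonzero for existing connections

# illustrative data for a linear layer y = x @ weights
x = rng.normal(size=(32, 6))
y = rng.normal(size=(32, 4))

# train
for epoch in range(50):
    pred = x @ weights
    grad = x.T @ (pred - y) / len(x)  # dense gradient, nonzero everywhere
    weights -= 0.1 * grad             # may introduce new connections
    weights[~mask] = 0.0              # so we set them to zero again

print(np.count_nonzero(weights), "nonzero weights out of", weights.size)
```

Multiplying the gradient by the mask before the update would have the same effect. Either way the full dense matrix is still stored and differentiated, which is exactly the cost the question wants to avoid.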
Both tensorflow and pytorch support sparse tensors (torch.sparse, tf.sparse).
My intuitive understanding would be that if you were willing to write your network using the respective low level APIs (e.g. actually implementing the forward-pass yourself), you could cast your weight matrices as sparse tensors. That would in turn result in sparse connectivity, since the weight matrix of layer [L] defines the connectivity between neurons of the previous layer [L-1] with neurons of layer [L].
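A small sketch of that idea with tf.sparse, assuming TensorFlow 2.x. The connection pattern, the values and the loss are invented for illustration. Only one trainable value per existing connection is stored, and the gradient has the same length:

```python
import tensorflow as tf

# Hypothetical layer: 5 inputs -> 3 outputs, only 4 connections exist.
# Each index is (output_neuron, input_neuron), listed in row-major order.
indices = tf.constant([[0, 0], [0, 3], [1, 2], [2, 4]], dtype=tf.int64)
values = tf.Variable([0.5, -1.0, 2.0, 1.5])  # one trainable weight per connection
dense_shape = (3, 5)

x = tf.constant([[1.0, 2.0, 3.0, 4.0, 5.0]])  # batch of one input vector

with tf.GradientTape() as tape:
    w_sparse = tf.sparse.SparseTensor(indices, values, dense_shape)
    # (3, 5) sparse times (5, 1) dense gives (3, 1); transpose to (batch, outputs)
    y = tf.transpose(tf.sparse.sparse_dense_matmul(w_sparse, x, adjoint_b=True))
    loss = tf.reduce_sum(y ** 2)

# Gradient with respect to the stored connection weights only.
grad = tape.gradient(loss, values)
print(y.numpy(), grad.numpy())
```

Whether this is actually faster than a masked dense layer depends on the sparsity level and the hardware, so it is worth benchmarking before committing to it.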
I am wondering whether it is possible to add something similar to a Flatten layer for images of variable size.
Say we have an input layer for our CNN as:
input_shape=(1, None, None)
After performing your typical series of convolution/maxpooling layers, can we create a flattened layer, such that the shape is:
output_shape=(None,...)
If not, would someone be able to explain why not?
You can add GlobalMaxPooling2D and GlobalAveragePooling2D.
These will eliminate the spatial dimensions and keep only the channels dimension. Max will take the maximum values, Average will get the mean value.
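A minimal sketch of this, assuming TensorFlow 2.x with tf.keras and channels-last images (the question uses a channels-first shape). The layer sizes and the random test images are illustrative only:

```python
import numpy as np
import tensorflow as tf

# Height and width are left as None, only the channel count is fixed.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(None, None, 1)),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(2),
    tf.keras.layers.GlobalAveragePooling2D(),  # (batch, H, W, 8) -> (batch, 8)
    tf.keras.layers.Dense(3, activation="softmax"),
])

small = np.random.rand(2, 20, 30, 1).astype("float32")
large = np.random.rand(2, 64, 48, 1).astype("float32")

out_small = np.asarray(model(small))
out_large = np.asarray(model(large))
print(out_small.shape, out_large.shape)
```

Because the pooled output always has one value per channel, the Dense layer after it sees a fixed last dimension regardless of the image size. Images within one batch still need to share the same size.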
I don't really know why you can't use a Flatten layer, but in fact you can't with variable dimensions.
I understand why a Dense wouldn't work: it would have a variable number of parameters, which is totally infeasible for backpropagation, weight updates and things like that. (PS: Dense layers act only on the last dimension, so that is the only one that needs to be fixed.)
Examples:
A Dense layer requires the last dimension fixed
A Conv layer can have variable spatial dimensions, but needs fixed channels (otherwise the number of parameters will vary)
A recurrent layer can have variable time steps, but needs fixed features and so on
Also, notice that:
For classification models, you'd need a fixed dimension output, so, how to flatten and still guarantee the correct number of elements in each dimension? It's impossible.
For models with variable output, why would you want to have a fixed dimension in the middle of the model anyway?
If you're going totally custom, you can always use K.reshape() inside a Lambda layer and work with the tensor shapes:
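import keras.backend as K

def myReshape(x):
    # dynamic shape of the incoming tensor
    shape = K.shape(x)
    # keep the batch size as a one-element tensor
    batchSize = shape[:1]
    # -1 lets reshape infer the size of the flattened dimension
    newShape = K.variable([-1], dtype='int32')
    newShape = K.concatenate([batchSize, newShape])
    return K.reshape(x, newShape)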
The layer: Lambda(myReshape)
I don't think you can because the compile step uses those dimensions to allocate fixed memory when your model is instanced for training or prediction. Some dimensions need to be known ahead of time, so the matrix dimensions can be allocated.
I understand why you want variable-sized image input, the world is not (226, 226, 3). It depends on your specific goals, but for me, scaling up or windowing to a region of interest using say Single Shot Detection as a preprocessing step may be helpful. You could just start with Keras's ImageDataGenerator to scale all images to a fixed size - then you see how much of a performance gain you get from conditional input sizing or windowing preprocessing.
@mikkola, I have found Flatten to be very helpful for TimeDistributed models. You can add Flatten after the convolution steps using:
your_model.add(Flatten())
Some use cases for neural networks require that not all neurons are connected between two consecutive layers. For my neural network architecture, I need to have a layer where each neuron only has connections to some prespecified neurons in the previous layer (at somewhat arbitrary places, not with a pattern such as in a convolution layer). This is needed in order to model data on a specific graph. I need to implement this "Sparse" layer in Theano, but I'm not used to the Theano way of programming.
It seems that the most efficient way of programming sparse connections in Theano would be to use theano.tensor.nnet.blocksparse.SparseBlockGemv. An alternative would be to do matrix multiplication, where many weights are set to 0 (= no connection), but that would be very inefficient compared to SparseBlockGemv as each neuron is only connected to 2-6 neurons in the previous layer out of ~100000 neurons. Moreover, a weight matrix of 100000x100000 would not fit on my RAM/GPU. Could someone therefore provide an example of how to implement sparse connections using the SparseBlockGemv method or another computationally-efficient method?
A perfect example would be to extend the MLP Theano Tutorial with an extra layer after the hidden layer (and before softmax), where each neuron only has connections to a subset of neurons in the previous layer. However, other examples are also very welcome!
Edit: Note that the layer must be implemented in Theano as it is just a small part of a larger architecture.
The output of a fully-connected layer is given by the dot product of the input and the weights of that layer. In theano or numpy you can use the dot method.
y = x.dot(w)
If you only have connections to some neurons in the previous layer and those connections are predefined you could do something like this:
y = [x[edges[i]].dot(w[i]) for i in neurons]
Where edges[i] contains the indices for neurons connected to neuron i and w[i] the weights of this connection.
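A small NumPy illustration of that expression. The input vector, the edge lists and the weights are made-up values, and x is assumed to be a single 1-D input vector:

```python
import numpy as np

# Outputs of the previous layer (one sample, 5 neurons).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# edges[i]: indices of the previous-layer neurons that neuron i is connected to.
edges = [np.array([0, 2]), np.array([1, 3, 4]), np.array([4])]

# w[i]: one weight per connection of neuron i, in the same order as edges[i].
w = [np.array([0.5, -1.0]), np.array([1.0, 0.0, 2.0]), np.array([3.0])]

# Each neuron only sees, and only stores weights for, its own inputs.
y = np.array([x[edges[i]].dot(w[i]) for i in range(len(edges))])
print(y)  # [-2.5 12.  15. ]
```

The storage cost is proportional to the number of connections rather than to the full size of a dense weight matrix, although the Python-level loop itself is not fast.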
Please note, that theano doesn't know about layers or other high-level details.
Apologies for resurrecting an old thread, but this was the simplest guidance I found that was useful in extending the guidance at https://iamtrask.github.io/2015/07/12/basic-python-network/ for partially-connected inputs. However, it took me a while to make sense of basaundi's answer and I think I can improve upon it.
There were a couple of things that I needed to change to make it work. In my case, I am trying to map from N inputs to M neurons in my first hidden layer. My inputs are in a NxF array, where F is the number of features for my inputs, and my synapse values (weights) between inputs and the first layer are in a FxM array. Therefore, the output of Inputs <dot> Weights is a NxM array. My edge matrix is an MxF array that specifies for each neuron in layer 1 (rows), which of the features of the input data are relevant (columns).
In this setup, at least, it required me to slice my arrays differently than specified above. Also, the list comprehension returns a list of matrices, which must be summed to get the correct NxM (otherwise you get an MxNxM array).
So I have used the following (util.sigmoid is a helper function of my own):
y = [numpy.dot(x[:, edges[i]], w[edges[i]])
for i in range(M)]
y = util.sigmoid(numpy.sum(y, 0))
This seems to work for me.
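For comparison, here is a sketch of a masked variant under the assumption that edges is a boolean MxF array (the post does not say how it is stored). It is not the poster's exact code: each neuron i uses only column i of the weight matrix, restricted to the features flagged in edges[i]. All sizes and values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
N, F, M = 4, 5, 3  # samples, input features, neurons in the first layer

x = rng.normal(size=(N, F))  # inputs, N x F
w = rng.normal(size=(F, M))  # weights, F x M

# edges[i, f] is True when neuron i is connected to feature f (M x F).
edges = np.array([
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 1],
    [0, 0, 0, 0, 1],
], dtype=bool)

# Zero out the weights of missing connections, then do one ordinary product.
y = x @ (w * edges.T)  # N x M

# The same result computed neuron by neuron, using only the connected features.
y_loop = np.stack([x[:, edges[i]] @ w[edges[i], i] for i in range(M)], axis=1)
print(np.allclose(y, y_loop))
```

The masked product keeps the full FxM matrix in memory, so it suits small layers; for very large, very sparse layers a per-neuron or sparse-matrix formulation is the better fit.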