I try to switch from pytorch to tensorflow and since the model now seems to be a fixed thing in tensorflow, i stumble upon a problem when working with Convolutional Neural Networks.
I have a very simple model, just one Conv1D layer and a kernel with size 2.
I want to train it on a small Configuration, say 16 input size and then export the training results on a 32 input size.
How can i access the 3 parameters in this network? (2 kernel, 1 bias) I want to do so to apply them for the higher size case. I struggle because i need to pre-define a input size of the model, this was not the case with pytorch.
Thanks for answering, I've only found outdated answers to this question
model.layers[0].get_weights() yields the weights of the first layer, assuming model is a tf.keras.Model object.
Related
Please help me understand why my model overfits if my input data is normalized to [-0.5. 0.5] whereas it does not overfit otherwise.
I am solving a regression ML problem trying to detect location of 4 key points on images. To do that I import pretrained ResNet 50 and replace its top layer with the following architecture:
Flattening layer right after ResNet
Fully Connected (dense) layer with 256 nodes followed by LeakyRelu activation and Batch Normalization
Another Fully Connected layer with 128 nodes also followed by LeakyRelu and Batch Normalization
Last Fully connected layer (with 8 nodes) which give me 8 coordinates (4 Xs and 4 Ys) of 4 key points.
Since I stick with Keras framework, I use ImageDataGenerator to produce flow of data (images). Since output of my model (8 numbers: 2 coordinates for each out of 4 key points) normalized to [-0.5, 0.5] range, I decided that input to my model (images) should also be in this range and therefore normalized it to the same range using preprocessing_function in Keras' ImageDataGenerator.
Problem came out right after I started model training. I have frozen entire ResNet (training = False) with the goal in mind to first move gradients of the top layers to the proper degree and only then unfreeze a half of ResNet and finetune the model. When training with ResNet frozen, I noticed that my model suffers from overfitting right after a couple of epochs. Surprisingly, it happens even though my dataset is quite decent in size (25k images) and Batch Normalization is employed.
What's even more surprising, the problem completely disappears if I move away from input normalization to [-0.5, 0.5] and go with image preprocessing using tf.keras.applications.resnet50.preprocess_input. This preprocessing method DOES NOT normalize image data and surprisingly to me leads to proper model training without any overfitting.
I tried to use Dropout with different probabilities, L2 regularization. Also tried to reduce complexity of my model by reducing the number of top layers and the number of nodes in each top layer. I did play with learning rate and batch size. Nothing really helped if my input data is normalized and I have no idea why this happens.
IMPORTANT NOTE: when VGG is employed instead of ResNet everything seems to work well!
I really want to figure out why this happens.
UPD: the problem was caused by 2 reasons:
- batch normalization layers within ResNet didn't work properly when frozen
- image preprocessing for ResNet should be done using Z-score
After two fixes mentioned above, everything seems to work well!
Mentioning the Solution below for the benefit of the community.
Problem is resolved by making the changes mentioned below:
Batch Normalization layers within ResNet didn't work properly when frozen. So, Batch Normalization Layers within ResNet should be unfreezed, before Training the Model.
Image Preprocessing (Normalization) for ResNet should be done using Z-score, instead of preprocessing_function in Keras' ImageDataGenerator
so I am using a convolutional layer as the first layer of a neural network for deep reinforcement learning to get the spatial features out of a simulation I built. The simulation gives different maps that are of different lengths and heights to process. If I understand convolutional networks, this should not matter since the channel size is kept constant. In between the convolutional network and the fully connected layers there is a spatial pyramid pooling layer so that the varying image sizes does not matter. Also the spatial data is pretty sparse. Usually it is able to go through a few states and sometimes a few episodes before the first convolutional layer spits out all Nans. Even when I fix the map size this happens. I do not know where the problem lies, where can the problem lie?
Try to initialize your weights with random numbers between 0 and 1 and then try different learning rates for your network training. (I suggest test it with learning rates equal to 10, 1, 0.1, 0.01, ...)
I am working on a project that requires me to add new units to the output layer of a neural network to implement a form of transfer learning. I was wondering if I could do this and set the units' weights using either Keras or TensorFlow.
Specifically I would like to append an output neuron to the output layer of the Keras model and set that neuron's initial weights and bias.
Stumbled upon the answer to my own question. Thanks everyone for the answers/comments.
https://keras.io/layers/about-keras-layers/
The first few lines of this source detail how to load and set weights.
Essentially, appending an output neuron to a Keras model can be accomplished by loading the old output layer, appending the new weights, and setting weights for a new layer. Code is below.
# Load weights of previous output layer, set weights for new layer
old_layer_weights = model.layers.pop().get_weights()
new_neuron_weights = np.ndarray(shape=[1,bottleneck_size])
# Set new weights
# Append new weights, add new layer
new_layer = Dense(num_classes).set_weights(np.append(old_layer_weights,new_neuron_weights))
model.add(new_layer)
You could add new units to the output layer of a pre-trained neural network. This form of transfer learning is said to be called using the bottleneck features of a pre-trained network. This could be implemented both in tensorflow as well as in Keras.
Please find the tutorial in Keras below:
https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html
Also, find the tutorial for tensorflow below:
https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/08_Transfer_Learning.ipynb
Hope this helps!
I'm a beginer in this field of Deep Learning. I'm trying to use Keras for a LSTM in a regression problem. I would like to build an ANN which could exploit the memory cell between one prediction and the next one.
In more details... I have a neural network (Keras) with 2 Hidden layer-LSTM and 1 output layer for a regression context.
The batch_size is equal to 7, timestep equal to 1 and I have 5749 samples.
I'm only interested to understand if using timestep == 1 is the same thing as using an MLP instead of LSTM. For time_step, I'm referring to the reshape phase for the input of the Sequential model in Keras. The output is a single regression.
I'm not interested in the previous inputs, but I'm interested only on the output of the network as an information for the next prediction.
Thank you in advance!
You can say so :)
You're right in thinking that you won't have any recurrency anymore.
But internally, there will be still more operations than in regular Dense layers, due to the existence of more kernels.
But be careful:
If you use stateful=True, it will still be a recurrent LSTM!
If you use initial states properly, you can still make it recurrent.
If you're interested in creating custom operations with the memory/state of the cells, you could try creating your custom recurrent cell taking the LSTMCell code as a template.
Then you'd use that cell in a RNN(CustomCell, ...) layer.
I have a CNN trained on a classification task (Network is simple, 2 convolution + pooling layers and 2x fully connected layers). I would like to use this to reconstruct an image if I input a classification label.
Is this possible to achieve?
Is it possible to share weights between corresponding layers in the 2 networks?
You should have a look to cGAN implementation and why not DeepDream from Google ;)
The answer is yes it is possible, however it's not straight forward