I am using a convolutional layer as the first layer of a neural network for deep reinforcement learning, to extract spatial features from a simulation I built. The simulation produces maps of varying width and height. If I understand convolutional networks correctly, this should not matter as long as the number of channels stays constant. Between the convolutional layers and the fully connected layers there is a spatial pyramid pooling layer, so that the varying image sizes do not matter. The spatial data is also fairly sparse. The network usually gets through a few states, and sometimes a few episodes, before the first convolutional layer starts outputting all NaNs. This happens even when I fix the map size. Where could the problem lie?
Try initializing your weights with random numbers between 0 and 1, and then try different learning rates when training the network. (I suggest testing learning rates of 10, 1, 0.1, 0.01, ...)
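To illustrate why the learning rate is worth sweeping when outputs turn into NaNs, here is a minimal, framework-free sketch. It is not the asker's network: it runs plain gradient descent on the toy loss f(w) = w**2, and the function name `gradient_descent` and the step count are assumptions chosen for illustration. With a step size that is too large, the weight overshoots on every update, grows until it overflows to infinity, and the following update yields NaN.

```python
import math

def gradient_descent(lr, steps=400, w=1.0):
    """Minimize the toy loss f(w) = w**2 with a fixed step size; return the final w."""
    for _ in range(steps):
        grad = 2.0 * w        # derivative of w**2
        w = w - lr * grad     # plain gradient descent update
    return w

# Sweep the learning rates suggested above and report which runs stay finite.
for lr in (10.0, 1.0, 0.1, 0.01):
    final_w = gradient_descent(lr)
    status = "finite" if math.isfinite(final_w) else "diverged (inf/NaN)"
    print(lr, final_w, status)
```

In a real network the same kind of check (testing activations or the loss with `math.isfinite` or an equivalent) can help locate the first step at which values blow up.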
I am trying to switch from PyTorch to TensorFlow, and since a model now seems to be a fixed thing in TensorFlow, I have run into a problem when working with convolutional neural networks.
I have a very simple model: just one Conv1D layer with a kernel of size 2.
I want to train it on a small configuration, say input size 16, and then apply the trained parameters to an input of size 32.
How can I access the 3 parameters in this network (2 kernel weights, 1 bias)? I want to apply them in the larger case. I struggle because I need to predefine the input size of the model, which was not the case with PyTorch.
Thanks for answering; I have only found outdated answers to this question.
model.layers[0].get_weights() yields the weights of the first layer, assuming model is a tf.keras.Model object.
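The reason those three numbers can be reused at a larger input size is that a convolution kernel does not depend on the input length. The sketch below is framework-free plain Python rather than Keras code; the helper `conv1d` and the parameter values are hypothetical stand-ins for the kernel and bias that `get_weights()` would return after training.

```python
def conv1d(signal, kernel, bias):
    """'Valid' 1D convolution as used in deep learning (i.e. cross-correlation)."""
    k = len(kernel)
    return [
        sum(kernel[j] * signal[i + j] for j in range(k)) + bias
        for i in range(len(signal) - k + 1)
    ]

# Hypothetical trained parameters: 2 kernel weights + 1 bias.
kernel, bias = [0.5, -0.25], 0.1

# The same three parameters are applied to inputs of length 16 and 32.
small = conv1d([float(i) for i in range(16)], kernel, bias)
large = conv1d([float(i) for i in range(32)], kernel, bias)

print(len(small), len(large))  # 15 31
```

In Keras, one way to get the same effect is to build a second model with the larger input shape and copy the trained values into its layer with `set_weights`.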
I have trouble understanding weight transfer in transfer-learning-like tasks...
I trained two networks and saved the weights using Keras with the TensorFlow backend (the two networks are in the same model). I would like to take half of the layers from one network and half of the layers from the other, and concatenate them into a new network. In practice, I want to cut the two networks, join the pieces in a new network, and throw away the remaining layers. Since half of the layers I need are top layers, I couldn't do it with .pop(), so I decided to transfer the weights.
I tried this by setting corresponding weights from each layer (the ones that I needed) in the old model to corresponding layers in my new model like:
new_model.layers[i].set_weights(model.layers[i].get_weights())
This loads the weights, but it does not seem to work as I expect.
Then I tried get_layer:
new_model.layers[i] = model.get_layer('name').output
This also seems to do a meaningless weight transfer.
What should I transfer from my old network to the new network so that I really take over half of the whole network?
Do the weights (and biases) alone carry all the information? What else should I assign to get theoretically identical layers?
What does get_layer return?
Do get_weights/set_weights do the same thing as load_weights?
Please help me understand why my model overfits if my input data is normalized to [-0.5, 0.5], whereas it does not overfit otherwise.
I am solving a regression ML problem trying to detect location of 4 key points on images. To do that I import pretrained ResNet 50 and replace its top layer with the following architecture:
Flattening layer right after ResNet
Fully Connected (dense) layer with 256 nodes followed by LeakyRelu activation and Batch Normalization
Another Fully Connected layer with 128 nodes also followed by LeakyRelu and Batch Normalization
Last Fully Connected layer (with 8 nodes), which gives me 8 coordinates (4 Xs and 4 Ys) of the 4 key points.
Since I am using Keras, I use ImageDataGenerator to produce the flow of data (images). Since the output of my model (8 numbers: 2 coordinates for each of the 4 key points) is normalized to the [-0.5, 0.5] range, I decided that the input to my model (the images) should also be in this range, and therefore normalized it to the same range using preprocessing_function in Keras' ImageDataGenerator.
The problem appeared right after I started training. I froze the entire ResNet (training = False), intending to first train the top layers to a reasonable state and only then unfreeze half of ResNet and fine-tune the model. When training with ResNet frozen, I noticed that my model starts overfitting after just a couple of epochs. Surprisingly, this happens even though my dataset is reasonably large (25k images) and Batch Normalization is employed.
What's even more surprising, the problem disappears completely if I move away from normalizing the input to [-0.5, 0.5] and instead preprocess images with tf.keras.applications.resnet50.preprocess_input. This preprocessing method DOES NOT normalize the image data and, to my surprise, leads to proper model training without any overfitting.
I tried Dropout with different probabilities and L2 regularization. I also tried reducing the complexity of my model by reducing the number of top layers and the number of nodes in each top layer. I experimented with the learning rate and batch size. Nothing really helped as long as my input data was normalized, and I have no idea why this happens.
IMPORTANT NOTE: when VGG is employed instead of ResNet everything seems to work well!
I really want to figure out why this happens.
UPD: the problem was caused by 2 reasons:
- batch normalization layers within ResNet didn't work properly when frozen
- image preprocessing for ResNet should be done using Z-score
After two fixes mentioned above, everything seems to work well!
Mentioning the Solution below for the benefit of the community.
Problem is resolved by making the changes mentioned below:
Batch Normalization layers within ResNet didn't work properly when frozen, so the Batch Normalization layers within ResNet should be unfrozen before training the model.
Image preprocessing (normalization) for ResNet should be done using Z-score, instead of the preprocessing_function in Keras' ImageDataGenerator.
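For readers unsure how the two preprocessing schemes differ, here is a small plain-Python sketch. The pixel values and the helper names `scale_to_half_range` and `z_score` are made up for illustration. Which statistics to use is an assumption here (the post does not say): a common choice is a per-channel mean and standard deviation computed over the training set.

```python
import statistics

def scale_to_half_range(pixels):
    """Map 8-bit pixel values from [0, 255] to [-0.5, 0.5]."""
    return [p / 255.0 - 0.5 for p in pixels]

def z_score(pixels, mean, std):
    """Standardize pixels: subtract the mean, then divide by the standard deviation."""
    return [(p - mean) / std for p in pixels]

# Hypothetical values of one color channel.
channel = [12.0, 50.0, 128.0, 200.0, 255.0]
mean = statistics.mean(channel)
std = statistics.pstdev(channel)

scaled = scale_to_half_range(channel)        # bounded, but mean and variance are arbitrary
standardized = z_score(channel, mean, std)   # zero mean, unit variance

print(scaled)
print(standardized)
```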
I am studying some machine learning on my own and I am practicing (in Python) with the assignments of the course held by Andrew Ng.
After completing the fourth exercise by hand, I thought I would do it in Keras to practice with the library.
In the exercise we have 5000 images of handwritten digits, going from 0 to 9. Each image is a 20x20 matrix. The dataset is stored in a matrix X of shape 5000x400 (each image has been 'unrolled') and the labels are stored in a matrix y of shape 5000x10. Each row of y is a one-hot vector.
The exercise asks to implement backpropagation to maximize the log likelihood, for a simple neural network with one input layer, one hidden layer and one output layer. The hidden layer has 25 neurons and the output layer 10. We use sigmoid as the activation for both layers.
My code in Keras is this
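from keras import regularizers
from keras.layers import Dense
from keras.models import Sequential
# One hidden layer (25 units) and one output layer (10 units), both sigmoid and L2-regularized.
model = Sequential()
model.add(Dense(25, input_shape=(400,), use_bias=True, kernel_regularizer=regularizers.l2(1), activation='sigmoid', kernel_initializer='glorot_uniform'))
model.add(Dense(10, use_bias=True, kernel_regularizer=regularizers.l2(1), activation='sigmoid', kernel_initializer='glorot_uniform'))
model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
# Full-batch gradient descent: all 5000 samples form a single batch.
model.fit(X, y, batch_size=5000, epochs=100, verbose=1)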
Since I want this to be as similar as possible to the assignment, I have used the same initial weights as the assignment, the same regularization parameter, the same activations, and gradient descent as the optimizer (the assignment actually uses the Truncated Newton Method, but I don't think my problem lies there).
I thought I was doing everything correctly, but when I train the network I get 10% accuracy on the training dataset. Even when I play a little with the parameters, the accuracy doesn't change much. To understand the problem better, I tested it with smaller pieces of the dataset. For instance, if I select a subset of 100 elements containing x images of zero and 100-x images of one, I get x% training accuracy. My guess is that the network is optimizing the parameters to recognise only the first digit.
Now my questions are: what am I missing? Why isn't this the right implementation of the neural network described above?
If you are practising on the MNIST dataset to classify 10 digits, you have 10 classes to predict. Rather than sigmoid, you should use ReLU in the hidden layers (in your case, the first layer) and softmax activation on the output layer. Use the categorical cross-entropy loss function with the adam or sgd optimizer.
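The point about the output activation can be checked numerically. In this plain-Python sketch (the logits are arbitrary example values, and the helper names are chosen for illustration), independent sigmoids give scores that do not sum to 1, whereas softmax produces a probability distribution over the 10 classes, which is the kind of output categorical cross-entropy is designed for.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits):
    """Numerically stable softmax: shift by the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_crossentropy(one_hot, probs):
    """Negative log-probability assigned to the true class."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs) if t > 0)

# Arbitrary example logits for 10 classes; the true class is digit 0.
logits = [2.0, 1.0, 0.1, -1.0, 0.5, 0.0, -0.5, 1.5, 0.3, -2.0]
target = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

sig = [sigmoid(z) for z in logits]
soft = softmax(logits)
loss = categorical_crossentropy(target, soft)

print(sum(sig))   # greater than 1: not a probability distribution
print(sum(soft))  # 1 up to floating-point rounding
print(loss)
```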
I'm a beginner in the field of Deep Learning. I'm trying to use Keras for an LSTM in a regression problem. I would like to build an ANN that can exploit the memory cell between one prediction and the next.
In more detail... I have a neural network (Keras) with 2 hidden LSTM layers and 1 output layer for a regression context.
The batch_size is equal to 7, the timestep is equal to 1, and I have 5749 samples.
I'm only interested in understanding whether using timestep == 1 is the same thing as using an MLP instead of an LSTM. By time_step, I'm referring to the reshape phase for the input of the Sequential model in Keras. The output is a single regression value.
I'm not interested in the previous inputs; I'm only interested in the output of the network as information for the next prediction.
Thank you in advance!
You can say so :)
You're right in thinking that you won't have any recurrence anymore.
But internally, there will still be more operations than in regular Dense layers, due to the existence of more kernels.
But be careful:
If you use stateful=True, it will still be a recurrent LSTM!
If you use initial states properly, you can still make it recurrent.
If you're interested in creating custom operations with the memory/state of the cells, you could try creating your custom recurrent cell taking the LSTMCell code as a template.
Then you'd use that cell in a RNN(CustomCell, ...) layer.
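The difference between resetting and carrying the state can be sketched without Keras. The toy cell below is a plain tanh recurrent unit, not an LSTM, and the weights `w_x` and `w_h` are arbitrary assumptions. With one timestep per sample and the state reset each time, the output depends only on the current input, like a dense unit. When the state is carried over (the analogue of stateful=True), earlier inputs influence later outputs.

```python
import math

def rnn_step(x, h_prev, w_x=0.5, w_h=0.8):
    """One step of a toy recurrent cell: new state from input and previous state."""
    return math.tanh(w_x * x + w_h * h_prev)

def run(inputs, carry_state):
    """Feed samples one timestep at a time, optionally keeping the state between them."""
    h = 0.0
    outputs = []
    for x in inputs:
        if not carry_state:
            h = 0.0  # state reset before every sample: no memory of earlier inputs
        h = rnn_step(x, h)
        outputs.append(h)
    return outputs

inputs = [1.0, 1.0, 1.0]
stateless = run(inputs, carry_state=False)  # identical outputs for identical inputs
stateful = run(inputs, carry_state=True)    # outputs change as the state accumulates

print(stateless)
print(stateful)
```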