Output the loss/cost function in keras - python

I am trying to find the cost function in Keras. I am running an LSTM with the categorical_crossentropy loss function, and I added a regularizer. How do I output what the cost function looks like after the regularizer is applied, for my own analysis?
model = Sequential()
model.add(LSTM(
    NUM_HIDDEN_UNITS,
    return_sequences=True,
    input_shape=(PHRASE_LEN, SYMBOL_DIM),
    kernel_regularizer=regularizers.l2(0.01)
))
model.add(Dropout(0.3))
model.add(LSTM(NUM_HIDDEN_UNITS, return_sequences=False))
model.add(Dropout(0.3))
model.add(Dense(SYMBOL_DIM))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer=RMSprop(lr=1e-03, rho=0.9, epsilon=1e-08))

Surely you can achieve this by obtaining the output of the layer you want to inspect (yourlayer.output) and printing it (see here); a minimal sketch of that approach follows below. However, there are better ways to visualize these things.
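Assuming model is the compiled model from the question and x_batch is a placeholder for a batch of inputs, the print-based approach could look like this:
from keras import backend as K

# Build a backend function that maps the model input to the first layer's output;
# the extra learning_phase flag (0) tells Keras we are in test mode.
get_layer_output = K.function([model.input, K.learning_phase()],
                              [model.layers[0].output])

layer_output = get_layer_output([x_batch, 0])[0]
print(layer_output)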
Meet TensorBoard.
This is a powerful visualization tool that enables you to track and visualize your metrics, outputs, architecture, kernel initializations, etc. The good news is that there is already a TensorBoard Keras callback that you can use for this purpose; you just have to import it. To use it, just pass an instance of the callback to your fit method, something like this:
from keras.callbacks import TensorBoard

# indicate the folder to save to, plus other options
tensorboard = TensorBoard(log_dir='./logs/run1', histogram_freq=1,
                          write_graph=True, write_images=False)

# save it in your callback list
callbacks_list = [tensorboard]

# then pass to fit as a callback; remember to pass validation_data as well
model.fit(X, Y, callbacks=callbacks_list, epochs=64,
          validation_data=(X_test, Y_test), shuffle=True)
After that, start your TensorBoard server (it runs locally on your PC) by executing:
tensorboard --logdir=logs/run1
For example, this is what my kernels look like on two different models I tested (to compare them you have to save separate runs and then start TensorBoard on the parent directory instead). This is on the Histograms tab, for my second layer:
The model on the left I initialized with kernel_initializer='random_uniform', so its shape is that of a uniform distribution. The model on the right I initialized with kernel_initializer='normal', which is why it appears as a Gaussian distribution throughout my epochs (about 30).
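As a hypothetical illustration (not the exact models from those runs), two otherwise identical models set up with those initializers would look like this; to compare them in TensorBoard, log each one to its own run directory:
from keras.models import Sequential
from keras.layers import Dense

# two models differing only in how their first-layer kernels are initialized
model_uniform = Sequential([Dense(64, input_dim=10, kernel_initializer='random_uniform')])
model_normal = Sequential([Dense(64, input_dim=10, kernel_initializer='normal')])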
This way you can visualize what your kernels and layers look like, in a more interactive and understandable way than printing outputs. This is just one of the great features TensorBoard has, and it can help you develop your deep learning models faster and better.
Of course there are more options for the TensorBoard callback and for TensorBoard in general, so I suggest you read the links provided thoroughly if you decide to attempt this. For more information you can check this question and also this one.
Edit: You comment that you want to know what your regularized loss "looks like" analytically. Remember that by adding a regularizer to a loss function we are basically extending the loss function to include some "penalty" or preference. So, if you are using cross-entropy as your loss function and add an l2 regularizer (that is, the squared Euclidean norm of the weights) with a weight of 0.01, your whole loss function would look something like:
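Roughly, writing W for the kernel of the regularized LSTM layer, y for the one-hot targets and yhat for the softmax output (this is the general form, not Keras's exact internal expression):

loss(y, yhat, W) = -sum_i y_i * log(yhat_i) + 0.01 * sum_j W_j^2

The second term is exactly what regularizers.l2(0.01) contributes. If you also want to print its numeric value, Keras collects the regularization terms in model.losses, so a small sketch (assuming the model from the question has already been built) is:
import keras.backend as K

# model.losses holds the tensors added by kernel_regularizer; evaluating them
# shows the current value of the 0.01 * sum(W^2) penalty
print([K.eval(term) for term in model.losses])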

Related

Does model.compile() go inside MirroredStrategy

I have a network for transfer learning and want to train on two GPUs. I have just trained on one up to this point and am looking for ways to speed things up. I am getting conflicting answers about how to use it most efficiently.
strategy = tf.distribute.MirroredStrategy(devices=["/gpu:0", "/gpu:1"])
with strategy.scope():
    base_model = MobileNetV2(weights='imagenet', include_top=False, input_shape=(200, 200, 3))
    x = base_model.output
    x = GlobalAveragePooling2D(name="class_pool")(x)
    x = Dense(1024, activation='relu', name="class_dense1")(x)
    types = Dense(20, activation='softmax', name='Class')(x)
    model = Model(inputs=base_model.input, outputs=[types])
Then I set trainable layers:
for layer in model.layers[:160]:
    layer.trainable = False
for layer in model.layers[135:]:
    layer.trainable = True
Then I compile:
optimizer = Adam(learning_rate=.0000001)
model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics='accuracy')
Should everything be nested inside strategy.scope()?
This tutorial shows compile within the scope, but this tutorial shows it outside.
The first one shows it outside:
mirrored_strategy = tf.distribute.MirroredStrategy()
with mirrored_strategy.scope():
    model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(1,))])
model.compile(loss='mse', optimizer='sgd')
but says this right after
In this example we used MirroredStrategy so we can run this on a machine with multiple GPUs. strategy.scope() indicates to Keras which strategy to use to distribute the training. Creating models/optimizers/metrics inside this scope allows us to create distributed variables instead of regular variables. Once this is set up, you can fit your model like you would normally. MirroredStrategy takes care of replicating the model's training on the available GPUs, aggregating gradients, and more.
It does not matter where it goes, because under the hood model.compile() creates the optimizer, loss and accuracy-metric variables under the strategy scope in use. You can then call model.fit, which schedules the training loop under the same strategy scope.
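If you prefer to play it safe, a sketch of the fully nested version (build_model and train_data below are placeholders standing in for the MobileNetV2 code and the dataset above) would be:
import tensorflow as tf
from tensorflow.keras.optimizers import Adam

strategy = tf.distribute.MirroredStrategy(devices=["/gpu:0", "/gpu:1"])
with strategy.scope():
    # everything that creates variables (model, optimizer, metrics) lives in the scope
    model = build_model()  # placeholder for the MobileNetV2 transfer-learning model above
    model.compile(optimizer=Adam(learning_rate=1e-7),
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])

# fit can be called outside the scope
model.fit(train_data, epochs=10)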
I would suggest further searching as my answer does not have any experimental basis to it. It's just what I think.

when to call compile while training a tensorflow (2.0) model in incremental fashion?

I am writing a neural network to train incrementally (not online). Here is a snippet of the code
output = create_model()
model = Model(inputs=values, outputs=output)

if start_epoch > 1:
    weights_list = load_model_from_pickle()
    model.set_weights(weights_list)

model.compile(loss='binary_crossentropy', optimizer='adam')
model.fit(data, label, epochs=1, verbose=1, batch_size=1024, shuffle=False)
In essence, I want to load previously trained weights and train for a few more epochs. I read in some SO replies that calling compile changes the weights? Is there any other way to do it? Does it make sense to set the weights after calling compile? Will the answer change if I run my model in a multi-GPU setting?
You need to compile the model once; after training, when you reload the model, you don't need to compile it again. Read more here.
The compile function defines the optimizer, loss function and metrics you want. It does not change any weights. For more detailed information, read here.
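If you want to convince yourself that compile leaves the weights alone, a small sketch (assuming model is the already-built model from the question) is to snapshot the weights, compile, and compare:
import numpy as np

# snapshot the weights, compile, then check that nothing changed
before = [w.copy() for w in model.get_weights()]
model.compile(loss='binary_crossentropy', optimizer='adam')
after = model.get_weights()
print(all(np.array_equal(b, a) for b, a in zip(before, after)))  # expected: True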

How to find total loss, accuracy and prediction date/time using Keras (Python 3)?

I am trying to predict the OHLC values. Till now I have achieved this:
Jupyter notebook of the complete code
As one can see, my code is running, but I have a few doubts regarding the model that I have created.
I do not know how accurately the output was predicted.
Whether the error loss improved or not.
I also could not understand why the predictions do not have a date and time along with them on the graph. I want to know for what date the prediction was made.
How can I achieve this goal? Is there a way I can combine the OHLC values with the time of the prediction?
To get those metrics, add them when compiling the model. From the Keras docs page on metrics:
Usage of metrics
A metric is a function that is used to judge the performance of your model. Metric functions are to be supplied in the metrics parameter when a model is compiled.
A metric function is similar to a loss function, except that the results from evaluating a metric are not used when training the model.
model.compile(loss='mean_squared_error', optimizer='sgd', metrics=['mae', 'acc'])
You can either pass the name of an existing metric, or pass a Theano/TensorFlow symbolic function (see Custom metrics).
Arguments
y_true: True labels. Theano/TensorFlow tensor.
y_pred: Predictions. Theano/TensorFlow tensor of the same shape as y_true.
Returns
Single tensor value representing the mean of the output array across all datapoints.
And to evaluate the performance of the model once trained, you can use evaluate. From the Keras docs page on the Model class:
evaluate
evaluate(self, x=None, y=None, batch_size=None, verbose=1, sample_weight=None, steps=None)
Returns
the loss value & metrics values for the model in test mode.
In your case, I guess you just have to change these lines:
model_open.compile(loss='mse', optimizer='rmsprop')
model_high.compile(loss='mse', optimizer='rmsprop')
model_low.compile(loss='mse', optimizer='rmsprop')
model_close.compile(loss='mse', optimizer='rmsprop')
to
model_open.compile(loss='mse', optimizer='rmsprop', metrics=['mae', 'acc'])
model_high.compile(loss='mse', optimizer='rmsprop', metrics=['mae', 'acc'])
model_low.compile(loss='mse', optimizer='rmsprop', metrics=['mae', 'acc'])
model_close.compile(loss='mse', optimizer='rmsprop', metrics=['mae', 'acc'])
and at the end evaluate the models with:
model_open.evaluate( testX_open, testY_open)
model_high.evaluate( testX_high, testY_high)
model_low.evaluate( testX_low, testY_low)
model_close.evaluate(testX_close,testY_close)
To add metrics like accuracy (acc) to the training progress bar, you just have to modify the compile statement like this.
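For example, a sketch mirroring the compile calls above:
model_open.compile(loss='mse', optimizer='rmsprop', metrics=['acc'])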

Keras: Training loss decreases (accuracy increases) while validation loss increases (accuracy decreases)

I am working on a very sparse dataset with the goal of predicting 6 classes.
I have tried working with a lot of models and architectures, but the problem remains the same.
When I start training, the training accuracy will slowly start to increase and the loss will decrease, whereas the validation metrics will do the exact opposite.
I have really tried to deal with overfitting, and I still cannot believe that this is what is causing the issue.
What I have tried
Transfer learning on VGG16:
exclude the top layer and add a dense layer with 256 units and a 6-unit softmax output layer
fine-tune the top CNN block
fine-tune the top 3-4 CNN blocks
To deal with overfitting I use heavy augmentation in Keras and dropout after the 256-unit dense layer with p=0.5.
Creating my own CNN with a VGG16-ish architecture:
including batch normalization wherever possible
L2 regularization on each CNN + dense layer
Dropout anywhere between 0.5-0.8 after each CNN + dense + pooling layer
Heavy data augmentation "on the fly" in Keras
Realising that perhaps I have too many free parameters:
decreasing the network to only contain 2 CNN blocks + dense + output.
dealing with overfitting in the same manner as above.
Without exception, all training sessions look like this:
Training & Validation loss+accuracy
The last-mentioned architecture looks like this:
reg = 0.0001
model = Sequential()
model.add(Conv2D(8, (3, 3), input_shape=input_shape, padding='same',
                 kernel_regularizer=regularizers.l2(reg)))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dropout(0.7))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.5))
model.add(Conv2D(16, (3, 3), input_shape=input_shape, padding='same',
                 kernel_regularizer=regularizers.l2(reg)))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dropout(0.7))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.5))
model.add(Flatten())
model.add(Dense(16, kernel_regularizer=regularizers.l2(reg)))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(6))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='SGD', metrics=['accuracy'])
And the data is augmented by the generator in Keras and is loaded with flow_from_directory:
train_datagen = ImageDataGenerator(rotation_range=10,
                                   width_shift_range=0.05,
                                   height_shift_range=0.05,
                                   shear_range=0.05,
                                   zoom_range=0.05,
                                   rescale=1/255.,
                                   fill_mode='nearest',
                                   channel_shift_range=0.2*255)

train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    shuffle=True,
    class_mode='categorical')

validation_datagen = ImageDataGenerator(rescale=1/255.)

validation_generator = validation_datagen.flow_from_directory(
    validation_data_dir,
    target_size=(img_width, img_height),
    batch_size=1,
    shuffle=True,
    class_mode='categorical')
Here is what I can think of by analyzing your metric outputs (from the link you provided):
It seems to me that approximately around epoch 30 your model is starting to overfit. Therefore you can try stopping your training at that iteration, or just train it for ~30 epochs (or the exact number). The Keras callbacks may be useful here, especially ModelCheckpoint, which saves your model as you go so you can stop the training whenever you want (Ctrl+C) or when a certain criterion is met. Here is an example of basic ModelCheckpoint use:
from keras.callbacks import ModelCheckpoint

# save_best_only=True would save only when the monitored metric improves
chk = ModelCheckpoint("myModel.h5", monitor='val_loss', save_best_only=False)
callbacks_list = [chk]

# pass the callbacks to fit
history = model.fit(X, Y, ..., callbacks=callbacks_list)
(Edit:) As suggested in the comments, another option you have is the EarlyStopping callback, where you can specify the minimum change tolerated and the 'patience' (epochs without such improvement) before stopping the training. If you use it, you have to pass it in the callbacks argument as explained before.
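A minimal sketch of that (the monitor, min_delta and patience values here are just illustrative):
from keras.callbacks import EarlyStopping

# stop once val_loss has failed to improve by at least min_delta for `patience` epochs
early_stop = EarlyStopping(monitor='val_loss', min_delta=0.001, patience=5, verbose=1)
callbacks_list = [chk, early_stop]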
With the current setup your model has (and with the modifications you have tried), that point in your training seems to be the optimal training time for your case; training further will bring no benefit to your model (in fact, it will make it generalize worse).
Given that you have tried several modifications, one thing you can do is try to increase your network depth, to give it more capacity. Try adding more layers, one at a time, and check for improvements. Also, you usually want to start with simpler models first, before attempting a multi-layer solution.
If a simple model doesn't work, add one layer and test again, repeating until satisfied or no longer feasible. And by simple I mean really simple; have you tried a non-convolutional approach? Although CNNs are great for images, maybe you are overkilling it here.
If nothing seems to work, maybe it is time to get more data, or to generate more data from what you have by sampling or other augmentation techniques. For that last suggestion, try checking this Keras blog post, which I have found really useful. Deep learning algorithms usually require a substantial amount of training data, especially for complex data like images, so be aware this may not be an easy task. Hope this helps.
IMHO, this is just a normal situation for DL. In Keras you can set up a callback that will save the best model (depending on the evaluation metric that you provide), and a callback that will stop training if the model isn't improving.
See the ModelCheckpoint & EarlyStopping callbacks respectively.
P.S. Sorry, maybe I misunderstood the question - do you have validation loss decreasing from the first step?
Validation loss is increasing. This means you need more data or more regularization. It is a standard situation here, and nothing to be worried about. By the way, more parameters (a bigger model) will just worsen this problem unless you fix it.
So you can now investigate profitably by introducing more examples, L2, L1, or dropout.
I faced a similar problem and managed to fix it by removing the Batch Normalization layer that is just before the output dense layer. This made a ton of difference. Also, one of the suggestions I was given was to remove the Dropout layer, as it might be causing a variance shift. Check this paper.
I got part of the solution from this thread.

Accessing gradient values of keras model outputs with respect to inputs

I made a pretty simple NN model to do some non-linear regression for me in Keras, as an introductory exercise. I uploaded my Jupyter notebook as a gist here (it renders properly on GitHub), which is pretty short and to the point.
It just fits the 1D function y = (x - 5)^2 / 25.
I know that Theano and TensorFlow are, at their core, graph-based derivative (gradient) passing frameworks, and that utilizing the gradients of loss functions with respect to the weights for gradient-step-based optimization is their main purpose.
But what I'm trying to get a sense of is whether I have access to something that, given a trained model, can approximate derivatives of the output layer with respect to the inputs for me (not the weights or the loss function). So for this case, I would want y' = 2(x - 5)/25.0 estimated via the network's derivative graph for an indicated value of the input x, in the network's currently trained state.
Do I have any options in either the Keras or Theano/TF backend APIs to do this, or do I need to do my own chain rule somehow with the weights (or maybe add my own non-trainable "identity" layers or something)? In my notebook, you can see me trying a few approaches based on what I was able to find so far, but without a ton of success.
To make it concrete, I have a working keras model with the structure:
model = Sequential()
# 1d input
model.add(Dense(64, input_dim=1, activation='relu'))
model.add(Activation("linear"))
model.add(Dense(32, activation='relu'))
model.add(Activation("linear"))
model.add(Dense(32, activation='relu'))
# 1d output
model.add(Dense(1))

model.compile(loss='mse', optimizer='adam', metrics=["accuracy"])

model.fit(x, y,
          batch_size=10,
          epochs=25,
          verbose=0,
          validation_data=(x_test, y_test))
I would like to estimate the derivative of output y with respect to input x at, say, x = 0.5.
All of my attempts to extract gradient values, based on searching past answers, have led to syntax errors. From a high-level point of view, is this a supported feature of Keras, or is any solution going to be backend-specific?
As you mention, Theano and TF are symbolic, so doing a derivative should be quite easy:
import theano
import theano.tensor as T
import keras.backend as K
J = T.grad(model.output[0, 0], model.input)
jacobian = K.function([model.input, K.learning_phase()], [J])
First you compute the symbolic gradient (T.grad) of the output with respect to the input, then you build a function that you can call to do the computation. Note that sometimes this is not trivial due to shape problems, as you get one derivative for each element of the input.
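If you are on the TensorFlow backend instead, a roughly equivalent sketch uses K.gradients (x = 0.5 below is just the query point from the question, and the trailing 0 selects the test learning phase):
import numpy as np
import keras.backend as K

# symbolic dy/dx of the output with respect to the input
grads = K.gradients(model.output, model.input)[0]
get_grad = K.function([model.input, K.learning_phase()], [grads])

x_query = np.array([[0.5]])
dy_dx = get_grad([x_query, 0])[0]
print(dy_dx)  # should roughly match 2 * (0.5 - 5) / 25 for the fitted parabola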
