I have a very simple model that tries to predict the value of the expression 2x - 2.
It works well, but here is my question.
So far I have trained it on just 20 values (-10 to 10), and it works fine. What I don't understand is that when I train it on more values, say (-10 to 25), the prediction returns [[nan]]. Even the model weights become [<tf.Variable 'dense/kernel:0' shape=(1, 1) dtype=float32, numpy=array([[nan]], dtype=float32)>, <tf.Variable 'dense/bias:0' shape=(1,) dtype=float32, numpy=array([nan], dtype=float32)>].
Why does adding more training data result in nan?
import tensorflow as tf
import numpy as np
from tensorflow import keras

def gen_vals(x):
    return x*2 - 2

model = tf.keras.Sequential([
    keras.layers.InputLayer(input_shape=(1,)),
    keras.layers.Dense(units=1)
])
model.compile(optimizer='sgd', loss='mean_squared_error', metrics=['accuracy'])

xs = []
ys = []
for x in range(-10, 10):
    xs.append(x)
    ys.append(gen_vals(x))

xs = np.array(xs, dtype=float)
ys = np.array(ys, dtype=float)

model.fit(xs, ys, epochs=500)
print(model.predict([20]))
So I checked your code, and the problem is in your loss function: you are using mean_squared_error, and with it the loss grows until it reaches infinity.
Epoch 1/15
7/7 [==============================] - 0s 1ms/step - loss: 22108.5449 - accuracy: 0.0000e+00
Epoch 2/15
7/7 [==============================] - 0s 1ms/step - loss: 2046332.6250 - accuracy: 0.0286
Epoch 3/15
7/7 [==============================] - 0s 1ms/step - loss: 18862860288.0000 - accuracy: 0.0000e+00
Epoch 4/15
7/7 [==============================] - 0s 1ms/step - loss: 8550264864768.0000 - accuracy: 0.0286
Epoch 5/15
7/7 [==============================] - 0s 1ms/step - loss: 24012283831123968.0000 - accuracy: 0.0000e+00
Epoch 6/15
7/7 [==============================] - 0s 1ms/step - loss: 22680820415763316736.0000 - accuracy: 0.0000e+00
Epoch 7/15
7/7 [==============================] - 0s 1ms/step - loss: 1655609635839244500992.0000 - accuracy: 0.0000e+00
Epoch 8/15
7/7 [==============================] - 0s 1ms/step - loss: 611697420191128514199552.0000 - accuracy: 0.0000e+00
Epoch 9/15
7/7 [==============================] - 0s 1ms/step - loss: 229219278753403035799519232.0000 - accuracy: 0.0286
Epoch 10/15
7/7 [==============================] - 0s 1ms/step - loss: 2146224141449145393293494845440.0000 - accuracy: 0.0000e+00
Epoch 11/15
7/7 [==============================] - 0s 1ms/step - loss: 1169213631609383639522618269237248.0000 - accuracy: 0.0000e+00
Epoch 12/15
7/7 [==============================] - 0s 1ms/step - loss: 1042864695227246165669313090114551808.0000 - accuracy: 0.0000e+00
Epoch 13/15
7/7 [==============================] - 0s 1ms/step - loss: inf - accuracy: 0.0286
Epoch 14/15
7/7 [==============================] - 0s 3ms/step - loss: inf - accuracy: 0.0286
Epoch 15/15
7/7 [==============================] - 0s 1ms/step - loss: inf - accuracy: 0.0286
The MSE loss function squares the error, so on a toy dataset like this the loss can blow up to inf, exactly as in your case; once the loss overflows, the gradient updates turn the weights into nan, which is the [[nan]] you are seeing.
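To see why the wider range diverges, here is a minimal NumPy sketch (my own illustration, not part of the original code): the MSE gradient step is proportional to the mean of x², which grows several-fold when the inputs go from (-10, 10) to (-10, 25), so with SGD's default learning rate of 0.01 each update overshoots the optimum and the error compounds until it overflows.

import numpy as np

# Full-batch gradient descent on y = w*x + b with MSE, at Keras SGD's default lr=0.01.
# (Illustration only: Keras uses minibatches of 32 here, but the effect is the same.)
def run(xs, steps=6, lr=0.01):
    ys = 2 * xs - 2
    w, b = 0.0, 0.0
    for _ in range(steps):
        err = (w * xs + b) - ys
        w -= lr * 2 * np.mean(err * xs)   # dMSE/dw
        b -= lr * 2 * np.mean(err)        # dMSE/db
        print(f"w={w:12.3f}  mse={np.mean(err ** 2):.3e}")

run(np.arange(-10, 10, dtype=float))   # w heads towards 2: stable
run(np.arange(-10, 25, dtype=float))   # every step overshoots further: diverges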
I suggest using MAE (mean absolute error) for this toy example and toy network.
I checked, and the network then gives decent results:
import tensorflow as tf
import numpy as np
from tensorflow import keras

def gen_vals(x):
    return x*2 - 2

model = tf.keras.Sequential([
    keras.layers.InputLayer(input_shape=(1,)),
    keras.layers.Dense(units=1)
])
model.compile(optimizer='sgd', loss='mae', metrics=['accuracy'])

xs = []
ys = []
for x in range(-10, 25):
    xs.append(x)
    ys.append(gen_vals(x))

xs = np.array(xs, dtype=float)
ys = np.array(ys, dtype=float)

model.fit(xs, ys, epochs=15)
print(model.predict([20]))
Epoch 1/15
7/7 [==============================] - 0s 1ms/step - loss: 14.5341 - accuracy: 0.0000e+00
Epoch 2/15
7/7 [==============================] - 0s 2ms/step - loss: 7.5144 - accuracy: 0.0000e+00
Epoch 3/15
7/7 [==============================] - 0s 2ms/step - loss: 2.0986 - accuracy: 0.0000e+00
Epoch 4/15
7/7 [==============================] - 0s 1ms/step - loss: 1.4349 - accuracy: 0.0000e+00
Epoch 5/15
7/7 [==============================] - 0s 1ms/step - loss: 1.3424 - accuracy: 0.0000e+00
Epoch 6/15
7/7 [==============================] - 0s 1ms/step - loss: 1.5290 - accuracy: 0.0000e+00
Epoch 7/15
7/7 [==============================] - 0s 1ms/step - loss: 1.4349 - accuracy: 0.0000e+00
Epoch 8/15
7/7 [==============================] - 0s 1ms/step - loss: 1.2839 - accuracy: 0.0000e+00
Epoch 9/15
7/7 [==============================] - 0s 1ms/step - loss: 1.4003 - accuracy: 0.0000e+00
Epoch 10/15
7/7 [==============================] - 0s 1ms/step - loss: 1.4593 - accuracy: 0.0000e+00
Epoch 11/15
7/7 [==============================] - 0s 1ms/step - loss: 1.4561 - accuracy: 0.0000e+00
Epoch 12/15
7/7 [==============================] - 0s 1ms/step - loss: 1.4761 - accuracy: 0.0000e+00
Epoch 13/15
7/7 [==============================] - 0s 2ms/step - loss: 1.3080 - accuracy: 0.0000e+00
Epoch 14/15
7/7 [==============================] - 0s 1ms/step - loss: 1.1885 - accuracy: 0.0000e+00
Epoch 15/15
7/7 [==============================] - 0s 1ms/step - loss: 1.2665 - accuracy: 0.0000e+00
[[38.037006]]
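For reference, gen_vals(20) = 2*20 - 2 = 38, so the prediction of about 38.04 is essentially exact.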
Related
The code, and the output when I execute it once:
model.fit(X, y, validation_split=0.2, epochs=10, batch_size=100)
Epoch 1/10
8/8 [==============================] - 1s 31ms/step - loss: 0.6233 - accuracy: 0.6259 - val_loss: 0.6333 - val_accuracy: 0.6461
Epoch 2/10
8/8 [==============================] - 0s 5ms/step - loss: 0.5443 - accuracy: 0.7722 - val_loss: 0.4803 - val_accuracy: 0.7978
Epoch 3/10
8/8 [==============================] - 0s 4ms/step - loss: 0.5385 - accuracy: 0.7904 - val_loss: 0.4465 - val_accuracy: 0.8202
Epoch 4/10
8/8 [==============================] - 0s 5ms/step - loss: 0.5014 - accuracy: 0.7932 - val_loss: 0.5228 - val_accuracy: 0.7753
Epoch 5/10
8/8 [==============================] - 0s 4ms/step - loss: 0.5283 - accuracy: 0.7736 - val_loss: 0.4284 - val_accuracy: 0.8315
Epoch 6/10
8/8 [==============================] - 0s 4ms/step - loss: 0.4936 - accuracy: 0.7989 - val_loss: 0.4309 - val_accuracy: 0.8539
Epoch 7/10
8/8 [==============================] - 0s 4ms/step - loss: 0.4700 - accuracy: 0.8045 - val_loss: 0.4622 - val_accuracy: 0.8146
Epoch 8/10
8/8 [==============================] - 0s 4ms/step - loss: 0.4732 - accuracy: 0.8087 - val_loss: 0.4159 - val_accuracy: 0.8202
Epoch 9/10
8/8 [==============================] - 0s 5ms/step - loss: 0.5623 - accuracy: 0.7764 - val_loss: 0.7438 - val_accuracy: 0.8090
Epoch 10/10
8/8 [==============================] - 0s 4ms/step - loss: 0.5886 - accuracy: 0.7806 - val_loss: 0.5889 - val_accuracy: 0.6798
Output when I execute the same line of code again in Jupyter Lab:
Epoch 1/10
8/8 [==============================] - 0s 9ms/step - loss: 0.5269 - accuracy: 0.7496 - val_loss: 0.4568 - val_accuracy: 0.8371
Epoch 2/10
8/8 [==============================] - 0s 4ms/step - loss: 0.4688 - accuracy: 0.8087 - val_loss: 0.4885 - val_accuracy: 0.7753
Epoch 3/10
8/8 [==============================] - 0s 4ms/step - loss: 0.4597 - accuracy: 0.8017 - val_loss: 0.4638 - val_accuracy: 0.7865
Epoch 4/10
8/8 [==============================] - 0s 4ms/step - loss: 0.4741 - accuracy: 0.7890 - val_loss: 0.4277 - val_accuracy: 0.8258
Epoch 5/10
8/8 [==============================] - 0s 5ms/step - loss: 0.4840 - accuracy: 0.8003 - val_loss: 0.4712 - val_accuracy: 0.7978
Epoch 6/10
8/8 [==============================] - 0s 4ms/step - loss: 0.4488 - accuracy: 0.8087 - val_loss: 0.4825 - val_accuracy: 0.7809
Epoch 7/10
8/8 [==============================] - 0s 5ms/step - loss: 0.4432 - accuracy: 0.8087 - val_loss: 0.4865 - val_accuracy: 0.8090
Epoch 8/10
8/8 [==============================] - 0s 4ms/step - loss: 0.4299 - accuracy: 0.8059 - val_loss: 0.4458 - val_accuracy: 0.8371
Epoch 9/10
8/8 [==============================] - 0s 4ms/step - loss: 0.4358 - accuracy: 0.8172 - val_loss: 0.5232 - val_accuracy: 0.8034
Epoch 10/10
8/8 [==============================] - 0s 5ms/step - loss: 0.4697 - accuracy: 0.8059 - val_loss: 0.4421 - val_accuracy: 0.8202
It continues from the previous fit. My question is: how can I make it start from the beginning again, without having to create a new model, so that the second execution of the line is independent of the first?
This is a little bit tricky without being able to see the code that initialises the model, and I'm not sure why you'd want to reset the weights without re-initialising the model.
That said: if you save the weights of your model before training, you can reset to those initial weights before you train again.
modelWeights = model.get_weights()
model.set_weights(modelWeights)
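In context the flow would look like this (a sketch; model, X and y come from the question's code, which isn't shown here):

modelWeights = model.get_weights()   # snapshot right after building and compiling the model

model.fit(X, y, validation_split=0.2, epochs=10, batch_size=100)   # first run

model.set_weights(modelWeights)      # back to the untrained state
model.fit(X, y, validation_split=0.2, epochs=10, batch_size=100)   # independent second run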
I am trying to train a model on three machines using TensorFlow's MultiWorkerMirroredStrategy. The script is based on the TensorFlow tutorial Multi-worker training with Keras (https://www.tensorflow.org/tutorials/distribute/multi_worker_with_keras#dataset_sharding_and_batch_size):
import tensorflow_datasets as tfds
import tensorflow as tf
tfds.disable_progress_bar()

import os
import json

strategy = tf.distribute.MultiWorkerMirroredStrategy()

BUFFER_SIZE = 10000
BATCH_SIZE = 64

def make_datasets_unbatched():
    # scale MNIST data from (0, 255] to (0., 1.]
    def scale(image, label):
        image = tf.cast(image, tf.float32)
        image /= 255
        return image, label

    # data download to /home/pzs/tensorflow_datasets/mnist/
    datasets, info = tfds.load(name='mnist',
                               with_info=True,
                               as_supervised=True)
    return datasets['train'].map(scale).cache().shuffle(BUFFER_SIZE)

def build_and_compile_cnn_model():
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation='relu', input_shape=(28, 28, 1)),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(10)
    ])
    model.compile(
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        optimizer=tf.keras.optimizers.SGD(learning_rate=0.001),
        metrics=['accuracy'])
    return model

NUM_WORKERS = 3
GLOBAL_BATCH_SIZE = 64 * NUM_WORKERS
train_datasets = make_datasets_unbatched().batch(GLOBAL_BATCH_SIZE)

options = tf.data.Options()
options.experimental_distribute.auto_shard_policy = tf.data.experimental.AutoShardPolicy.OFF

# note: this rebuilds the dataset with the per-worker batch size, replacing the line above
train_datasets = make_datasets_unbatched().batch(BATCH_SIZE)
train_datasets = train_datasets.with_options(options)

with strategy.scope():
    multi_worker_model = build_and_compile_cnn_model()

multi_worker_model.fit(x=train_datasets, epochs=30, steps_per_epoch=5)
I run this script separately on the three nodes:
on node 1:
TF_CONFIG='{"cluster": {"worker": ["192.168.4.36:12346", "192.168.4.83:12346", "192.168.4.83:12346"]}, "task": {"index": 0, "type": "worker"}}' python3 multi_worker_with_keras.py
on node 2:
TF_CONFIG='{"cluster": {"worker": ["192.168.4.36:12346", "192.168.4.83:12346", "192.168.4.83:12346"]}, "task": {"index": 1, "type": "worker"}}' python3 multi_worker_with_keras.py
on node 3:
TF_CONFIG='{"cluster": {"worker": ["192.168.4.36:12346", "192.168.4.83:12346", "192.168.4.83:12346"]}, "task": {"index": 2, "type": "worker"}}' python3 multi_worker_with_keras.py
and the resulting training loss and accuracy are:
Epoch 1/30
2022-02-16 11:52:25.060362: I tensorflow/stream_executor/cuda/cuda_dnn.cc:366] Loaded cuDNN version 8201
5/5 [==============================] - 7s 195ms/step - loss: 2.3010 - accuracy: 0.0719
Epoch 2/30
5/5 [==============================] - 1s 181ms/step - loss: 2.2984 - accuracy: 0.0688
Epoch 3/30
5/5 [==============================] - 1s 182ms/step - loss: 2.2993 - accuracy: 0.0781
Epoch 4/30
5/5 [==============================] - 1s 182ms/step - loss: 2.2917 - accuracy: 0.0594
Epoch 5/30
5/5 [==============================] - 1s 182ms/step - loss: 2.2987 - accuracy: 0.0969
Epoch 6/30
5/5 [==============================] - 1s 183ms/step - loss: 2.2992 - accuracy: 0.0906
Epoch 7/30
5/5 [==============================] - 1s 181ms/step - loss: 2.2978 - accuracy: 0.1000
Epoch 8/30
5/5 [==============================] - 1s 183ms/step - loss: 2.2887 - accuracy: 0.0969
Epoch 9/30
5/5 [==============================] - 1s 182ms/step - loss: 2.2887 - accuracy: 0.0969
Epoch 10/30
5/5 [==============================] - 1s 183ms/step - loss: 2.2930 - accuracy: 0.0844
Epoch 11/30
5/5 [==============================] - 1s 184ms/step - loss: 2.2905 - accuracy: 0.1000
Epoch 12/30
5/5 [==============================] - 1s 184ms/step - loss: 2.2884 - accuracy: 0.0812
Epoch 13/30
5/5 [==============================] - 1s 186ms/step - loss: 2.2837 - accuracy: 0.1250
Epoch 14/30
5/5 [==============================] - 1s 189ms/step - loss: 2.2842 - accuracy: 0.1094
Epoch 15/30
5/5 [==============================] - 1s 190ms/step - loss: 2.2856 - accuracy: 0.0750
Epoch 16/30
5/5 [==============================] - 1s 192ms/step - loss: 2.2911 - accuracy: 0.0719
Epoch 17/30
5/5 [==============================] - 1s 188ms/step - loss: 2.2805 - accuracy: 0.1031
Epoch 18/30
5/5 [==============================] - 1s 187ms/step - loss: 2.2800 - accuracy: 0.1219
Epoch 19/30
5/5 [==============================] - 1s 190ms/step - loss: 2.2799 - accuracy: 0.1063
Epoch 20/30
5/5 [==============================] - 1s 192ms/step - loss: 2.2769 - accuracy: 0.1187
Epoch 21/30
5/5 [==============================] - 1s 193ms/step - loss: 2.2768 - accuracy: 0.1344
Epoch 22/30
5/5 [==============================] - 1s 190ms/step - loss: 2.2754 - accuracy: 0.1187
Epoch 23/30
5/5 [==============================] - 1s 190ms/step - loss: 2.2821 - accuracy: 0.1187
Epoch 24/30
5/5 [==============================] - 1s 188ms/step - loss: 2.2832 - accuracy: 0.0844
Epoch 25/30
5/5 [==============================] - 1s 190ms/step - loss: 2.2793 - accuracy: 0.1125
Epoch 26/30
5/5 [==============================] - 1s 191ms/step - loss: 2.2762 - accuracy: 0.1406
Epoch 27/30
5/5 [==============================] - 1s 194ms/step - loss: 2.2696 - accuracy: 0.1344
Epoch 28/30
5/5 [==============================] - 1s 192ms/step - loss: 2.2717 - accuracy: 0.1406
Epoch 29/30
5/5 [==============================] - 1s 191ms/step - loss: 2.2680 - accuracy: 0.1500
Epoch 30/30
5/5 [==============================] - 1s 193ms/step - loss: 2.2696 - accuracy: 0.1500
All results are exactly the same on the 3 nodes.
My question is:
When using tf.distribute.MultiWorkerMirroredStrategy to train a model across multiple machines, each process does forward and backward propagation independently on a different slice of the training batch, so why are the training loss and accuracy identical for the corresponding epoch on all 3 nodes? I tried a different script and saw the same behaviour.
This is expected: inside the fit method, the loss and metric values are allreduced (aggregated) across workers, so every worker reports the same numbers.
https://github.com/tensorflow/tensorflow/issues/39343#issuecomment-627008557
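For intuition, here is a minimal sketch of what that allreduce means (my own illustration, using a single-machine MirroredStrategy as a stand-in): each replica computes its own value, and strategy.reduce aggregates them into one number that every replica, and hence every worker, reports.

import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()  # local stand-in for MultiWorkerMirroredStrategy

def replica_fn():
    # hypothetical per-replica 'loss': the replica id, so the raw values differ per replica
    ctx = tf.distribute.get_replica_context()
    return tf.cast(ctx.replica_id_in_sync_group, tf.float32)

per_replica = strategy.run(replica_fn)
# the allreduce: one aggregated value, identical everywhere, which is what fit() logs
print(strategy.reduce(tf.distribute.ReduceOp.MEAN, per_replica, axis=None))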
I've been working on the California Housing dataset from Chapter 10 (page 300) of "Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow" by Aurélien Géron.
I copy-pasted the following code into my Jupyter notebook from his Colab notebook:
import numpy as np
import tensorflow as tf
from tensorflow import keras
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

housing = fetch_california_housing()
X_train_full, X_test, y_train_full, y_test = train_test_split(housing.data, housing.target, random_state=42)
X_train, X_valid, y_train, y_valid = train_test_split(X_train_full, y_train_full, random_state=42)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_valid = scaler.transform(X_valid)
X_test = scaler.transform(X_test)

np.random.seed(42)
tf.random.set_seed(42)

model = keras.models.Sequential([
    keras.layers.Dense(30, activation="relu", input_shape=X_train.shape[1:]),
    keras.layers.Dense(1)
])
model.compile(loss="mean_squared_error", optimizer=keras.optimizers.SGD(learning_rate=1e-3))
history = model.fit(X_train, y_train, epochs=20, validation_data=(X_valid, y_valid))
mse_test = model.evaluate(X_test, y_test)
X_new = X_test[:3]
y_pred = model.predict(X_new)
My loss looks like this:
Epoch 1/20
363/363 [==============================] - 1s 2ms/step - loss: 22974151580975104.0000 - val_loss: 14617883443200.0000
Epoch 2/20
363/363 [==============================] - 0s 1ms/step - loss: 7723970199552.0000 - val_loss: 3417093963776.0000
Epoch 3/20
363/363 [==============================] - 0s 1ms/step - loss: 1805568049152.0000 - val_loss: 798785339392.0000
Epoch 4/20
363/363 [==============================] - 0s 1ms/step - loss: 422071828480.0000 - val_loss: 186725318656.0000
Epoch 5/20
363/363 [==============================] - 0s 1ms/step - loss: 98664218624.0000 - val_loss: 43649114112.0000
Epoch 6/20
363/363 [==============================] - 1s 2ms/step - loss: 23063846912.0000 - val_loss: 10203475968.0000
Epoch 7/20
363/363 [==============================] - 1s 2ms/step - loss: 5391437312.0000 - val_loss: 2385182720.0000
Epoch 8/20
363/363 [==============================] - 1s 1ms/step - loss: 1260309632.0000 - val_loss: 557565056.0000
Epoch 9/20
363/363 [==============================] - 0s 1ms/step - loss: 294611968.0000 - val_loss: 130337808.0000
Epoch 10/20
363/363 [==============================] - 1s 1ms/step - loss: 68868872.0000 - val_loss: 30468182.0000
Epoch 11/20
363/363 [==============================] - 1s 2ms/step - loss: 16098886.0000 - val_loss: 7122408.0000
Epoch 12/20
363/363 [==============================] - 1s 2ms/step - loss: 3763294.0000 - val_loss: 1665006.2500
Epoch 13/20
363/363 [==============================] - 0s 1ms/step - loss: 879712.9375 - val_loss: 389246.1250
Epoch 14/20
363/363 [==============================] - 1s 1ms/step - loss: 205644.4062 - val_loss: 91006.7344
Epoch 15/20
363/363 [==============================] - 1s 1ms/step - loss: 48073.1602 - val_loss: 21282.1250
Epoch 16/20
363/363 [==============================] - 1s 1ms/step - loss: 11238.7031 - val_loss: 4979.3115
Epoch 17/20
363/363 [==============================] - 1s 2ms/step - loss: 2628.1484 - val_loss: 1166.6659
Epoch 18/20
363/363 [==============================] - 1s 2ms/step - loss: 615.3716 - val_loss: 274.4792
Epoch 19/20
363/363 [==============================] - 1s 2ms/step - loss: 144.8725 - val_loss: 65.5832
Epoch 20/20
363/363 [==============================] - 1s 2ms/step - loss: 34.9049 - val_loss: 16.5316
162/162 [==============================] - 0s 1ms/step - loss: 16.3210
But his loss looks like this:
Epoch 1/20
363/363 [==============================] - 1s 2ms/step - loss: 1.6419 - val_loss: 0.8560
Epoch 2/20
363/363 [==============================] - 1s 2ms/step - loss: 0.7047 - val_loss: 0.6531
Epoch 3/20
363/363 [==============================] - 1s 2ms/step - loss: 0.6345 - val_loss: 0.6099
Epoch 4/20
363/363 [==============================] - 1s 2ms/step - loss: 0.5977 - val_loss: 0.5658
Epoch 5/20
363/363 [==============================] - 1s 2ms/step - loss: 0.5706 - val_loss: 0.5355
Epoch 6/20
363/363 [==============================] - 1s 2ms/step - loss: 0.5472 - val_loss: 0.5173
Epoch 7/20
363/363 [==============================] - 1s 2ms/step - loss: 0.5288 - val_loss: 0.5081
Epoch 8/20
363/363 [==============================] - 1s 2ms/step - loss: 0.5130 - val_loss: 0.4799
Epoch 9/20
363/363 [==============================] - 1s 2ms/step - loss: 0.4992 - val_loss: 0.4690
Epoch 10/20
363/363 [==============================] - 1s 2ms/step - loss: 0.4875 - val_loss: 0.4656
Epoch 11/20
363/363 [==============================] - 1s 2ms/step - loss: 0.4777 - val_loss: 0.4482
Epoch 12/20
363/363 [==============================] - 1s 2ms/step - loss: 0.4688 - val_loss: 0.4479
Epoch 13/20
363/363 [==============================] - 1s 2ms/step - loss: 0.4615 - val_loss: 0.4296
Epoch 14/20
363/363 [==============================] - 1s 2ms/step - loss: 0.4547 - val_loss: 0.4233
Epoch 15/20
363/363 [==============================] - 1s 2ms/step - loss: 0.4488 - val_loss: 0.4176
Epoch 16/20
363/363 [==============================] - 1s 2ms/step - loss: 0.4435 - val_loss: 0.4123
Epoch 17/20
363/363 [==============================] - 1s 2ms/step - loss: 0.4389 - val_loss: 0.4071
Epoch 18/20
363/363 [==============================] - 1s 2ms/step - loss: 0.4347 - val_loss: 0.4037
Epoch 19/20
363/363 [==============================] - 1s 2ms/step - loss: 0.4306 - val_loss: 0.4000
Epoch 20/20
363/363 [==============================] - 1s 2ms/step - loss: 0.4273 - val_loss: 0.3969
162/162 [==============================] - 0s 1ms/step - loss: 0.4212
Why is my loss so much higher?
Whenever I train my TensorFlow model, the loss / val_loss graph swings wildly back and forth, and I was wondering how I could stop or reduce this. Here is a picture:
Graph
Here's the code if anyone wants to run it; it should work fine as long as you have the required pip packages installed:
import pandas as pd
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np
import matplotlib.pyplot as plt
import datetime
import tensorboard
from keras.models import Sequential
from keras.layers import Dense

train_df = pd.read_csv('https://www.dropbox.com/s/ednsabkdzs8motw/ROK%20INPUT%20DATA%20-%20Sheet1.csv?dl=1')
eval_df = pd.read_csv('https://www.dropbox.com/s/irnqwc1v67wmbfk/ROK%20EVAL%20DATA%20-%20Sheet1.csv?dl=1')

train_df['Troops'] = train_df['Troops'].astype(float)
train_df['Enemy Troops'] = train_df['Enemy Troops'].astype(float)
train_df['Damage'] = train_df['Damage'].astype(float)

eval_df['Troops'] = eval_df['Troops'].astype(float)
eval_df['Enemy Troops'] = eval_df['Enemy Troops'].astype(float)
eval_df['Damage'] = eval_df['Damage'].astype(float)

damage = train_df.pop('Damage')
dataset = tf.data.Dataset.from_tensor_slices((train_df.values, damage.values))

test_labels = eval_df.pop('Damage')
test_features = eval_df.copy()

model = keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(8,)),
    tf.keras.layers.Dense(8, activation='relu'),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer='adam', loss='mean_squared_error')
model.summary()

history = model.fit(train_df, damage, validation_split=0.2, epochs=5000)

def plot_loss(history):
    plt.plot(history.history['loss'], label='loss')
    plt.plot(history.history['val_loss'], label='val_loss')
    plt.ylim([0, 2000])
    plt.xlabel('Epoch')
    plt.ylabel('Error [MPG]')
    plt.legend()
    plt.grid(True)

plot_loss(history)
plt.show()
This happens because the labels in your dataset contain large outliers, and mean_squared_error amplifies them by squaring the error; switching the loss function to mean_absolute_error makes training far less sensitive to those outliers.
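A quick numeric illustration (mine, not from the dataset in question) of how a single outlier dominates MSE but barely moves MAE:

import numpy as np

# hypothetical labels with one large outlier
y_true = np.array([10., 12., 11., 500.])
y_pred = np.array([11., 11., 12., 11.])

print(np.mean((y_true - y_pred) ** 2))    # MSE = 59781.0, dominated by the outlier
print(np.mean(np.abs(y_true - y_pred)))   # MAE = 123.0, far less sensitive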
Please check the code below:
model.compile(optimizer='adam', loss=tf.losses.MeanAbsoluteError())
history = model.fit(train_df, damage,
                    validation_data=(test_features, test_labels), epochs=100)
Output:
Epoch 1/100
2/2 [==============================] - 1s 150ms/step - loss: 1015.9664 - val_loss: 129.8347
Epoch 2/100
2/2 [==============================] - 0s 30ms/step - loss: 244.7547 - val_loss: 28.9964
Epoch 3/100
2/2 [==============================] - 0s 32ms/step - loss: 629.1597 - val_loss: 20.9922
Epoch 4/100
2/2 [==============================] - 0s 35ms/step - loss: 612.6526 - val_loss: 45.7117
Epoch 5/100
2/2 [==============================] - 0s 34ms/step - loss: 335.1754 - val_loss: 93.0301
Epoch 6/100
2/2 [==============================] - 0s 30ms/step - loss: 168.1687 - val_loss: 128.6208
Epoch 7/100
2/2 [==============================] - 0s 30ms/step - loss: 406.5712 - val_loss: 129.7909
Epoch 8/100
2/2 [==============================] - 0s 28ms/step - loss: 391.4481 - val_loss: 113.0307
Epoch 9/100
2/2 [==============================] - 0s 27ms/step - loss: 182.2033 - val_loss: 83.6522
Epoch 10/100
2/2 [==============================] - 0s 42ms/step - loss: 176.4511 - val_loss: 68.1947
Epoch 11/100
2/2 [==============================] - 0s 28ms/step - loss: 266.6671 - val_loss: 71.0774
Epoch 12/100
2/2 [==============================] - 0s 40ms/step - loss: 198.2684 - val_loss: 88.3499
Epoch 13/100
2/2 [==============================] - 0s 28ms/step - loss: 119.8650 - val_loss: 100.7030
Epoch 14/100
2/2 [==============================] - 0s 27ms/step - loss: 189.6049 - val_loss: 94.6102
Epoch 15/100
2/2 [==============================] - 0s 28ms/step - loss: 146.5237 - val_loss: 77.1270
Epoch 16/100
2/2 [==============================] - 0s 30ms/step - loss: 106.8908 - val_loss: 60.1246
Epoch 17/100
2/2 [==============================] - 0s 29ms/step - loss: 132.0525 - val_loss: 56.3836
Epoch 18/100
2/2 [==============================] - 0s 29ms/step - loss: 129.6660 - val_loss: 64.7796
Epoch 19/100
2/2 [==============================] - 0s 32ms/step - loss: 118.3896 - val_loss: 68.5954
Epoch 20/100
2/2 [==============================] - 0s 30ms/step - loss: 114.2150 - val_loss: 67.0202
Epoch 21/100
2/2 [==============================] - 0s 32ms/step - loss: 112.6538 - val_loss: 65.2389
Epoch 22/100
2/2 [==============================] - 0s 30ms/step - loss: 107.1644 - val_loss: 59.4646
Epoch 23/100
2/2 [==============================] - 0s 31ms/step - loss: 106.9518 - val_loss: 51.4506
Epoch 24/100
2/2 [==============================] - 0s 28ms/step - loss: 107.4203 - val_loss: 48.4060
Epoch 25/100
2/2 [==============================] - 0s 30ms/step - loss: 108.1180 - val_loss: 48.5364
Epoch 26/100
2/2 [==============================] - 0s 30ms/step - loss: 106.6088 - val_loss: 47.0263
Epoch 27/100
2/2 [==============================] - 0s 29ms/step - loss: 107.6407 - val_loss: 47.3658
Epoch 28/100
2/2 [==============================] - 0s 32ms/step - loss: 105.1175 - val_loss: 45.2668
Epoch 29/100
2/2 [==============================] - 0s 35ms/step - loss: 105.9028 - val_loss: 45.2371
Epoch 30/100
2/2 [==============================] - 0s 32ms/step - loss: 103.5908 - val_loss: 48.8512
Epoch 31/100
2/2 [==============================] - 0s 27ms/step - loss: 102.6504 - val_loss: 53.9927
Epoch 32/100
2/2 [==============================] - 0s 28ms/step - loss: 100.8014 - val_loss: 58.1143
Epoch 33/100
2/2 [==============================] - 0s 30ms/step - loss: 114.6031 - val_loss: 49.8311
Epoch 34/100
2/2 [==============================] - 0s 32ms/step - loss: 104.9576 - val_loss: 45.7614
Epoch 35/100
2/2 [==============================] - 0s 35ms/step - loss: 102.5296 - val_loss: 44.3673
Epoch 36/100
2/2 [==============================] - 0s 32ms/step - loss: 105.3818 - val_loss: 40.8473
Epoch 37/100
2/2 [==============================] - 0s 26ms/step - loss: 102.0235 - val_loss: 38.7967
Epoch 38/100
2/2 [==============================] - 0s 30ms/step - loss: 103.9142 - val_loss: 36.8466
Epoch 39/100
2/2 [==============================] - 0s 32ms/step - loss: 105.1095 - val_loss: 40.7968
Epoch 40/100
2/2 [==============================] - 0s 34ms/step - loss: 102.7449 - val_loss: 46.4677
Epoch 41/100
2/2 [==============================] - 0s 29ms/step - loss: 101.3321 - val_loss: 53.2947
Epoch 42/100
2/2 [==============================] - 0s 29ms/step - loss: 106.1829 - val_loss: 53.4320
Epoch 43/100
2/2 [==============================] - 0s 32ms/step - loss: 97.9348 - val_loss: 47.5536
Epoch 44/100
2/2 [==============================] - 0s 31ms/step - loss: 98.5830 - val_loss: 41.6827
Epoch 45/100
2/2 [==============================] - 0s 32ms/step - loss: 98.8272 - val_loss: 36.0022
Epoch 46/100
2/2 [==============================] - 0s 29ms/step - loss: 109.2409 - val_loss: 32.8524
Epoch 47/100
2/2 [==============================] - 0s 39ms/step - loss: 112.1813 - val_loss: 38.2731
Epoch 48/100
2/2 [==============================] - 0s 34ms/step - loss: 99.5903 - val_loss: 40.8585
Epoch 49/100
2/2 [==============================] - 0s 29ms/step - loss: 106.2939 - val_loss: 47.6244
Epoch 50/100
2/2 [==============================] - 0s 27ms/step - loss: 97.1548 - val_loss: 51.4656
Epoch 51/100
2/2 [==============================] - 0s 29ms/step - loss: 97.9445 - val_loss: 46.3714
Epoch 52/100
2/2 [==============================] - 0s 29ms/step - loss: 96.2311 - val_loss: 39.1717
Epoch 53/100
2/2 [==============================] - 0s 38ms/step - loss: 96.8036 - val_loss: 34.6192
Epoch 54/100
2/2 [==============================] - 0s 33ms/step - loss: 99.1502 - val_loss: 31.0388
Epoch 55/100
2/2 [==============================] - 0s 31ms/step - loss: 105.3854 - val_loss: 30.7220
Epoch 56/100
2/2 [==============================] - 0s 46ms/step - loss: 103.1274 - val_loss: 35.8683
Epoch 57/100
2/2 [==============================] - 0s 26ms/step - loss: 94.2024 - val_loss: 38.4891
Epoch 58/100
2/2 [==============================] - 0s 33ms/step - loss: 95.7762 - val_loss: 41.9727
Epoch 59/100
2/2 [==============================] - 0s 34ms/step - loss: 93.3703 - val_loss: 30.4720
Epoch 60/100
2/2 [==============================] - 0s 36ms/step - loss: 93.3310 - val_loss: 20.7104
Epoch 61/100
2/2 [==============================] - 0s 30ms/step - loss: 98.0708 - val_loss: 12.8391
Epoch 62/100
2/2 [==============================] - 0s 31ms/step - loss: 101.6647 - val_loss: 24.7238
Epoch 63/100
2/2 [==============================] - 0s 33ms/step - loss: 89.2492 - val_loss: 35.5170
Epoch 64/100
2/2 [==============================] - 0s 32ms/step - loss: 114.9297 - val_loss: 19.0492
Epoch 65/100
2/2 [==============================] - 0s 42ms/step - loss: 89.8944 - val_loss: 9.8713
Epoch 66/100
2/2 [==============================] - 0s 32ms/step - loss: 119.7986 - val_loss: 12.5584
Epoch 67/100
2/2 [==============================] - 0s 33ms/step - loss: 85.2151 - val_loss: 23.7810
Epoch 68/100
2/2 [==============================] - 0s 31ms/step - loss: 91.6945 - val_loss: 27.0833
Epoch 69/100
2/2 [==============================] - 0s 31ms/step - loss: 91.0443 - val_loss: 20.8228
Epoch 70/100
2/2 [==============================] - 0s 32ms/step - loss: 88.2557 - val_loss: 17.0245
Epoch 71/100
2/2 [==============================] - 0s 31ms/step - loss: 89.2440 - val_loss: 14.7132
Epoch 72/100
2/2 [==============================] - 0s 32ms/step - loss: 89.3514 - val_loss: 13.7965
Epoch 73/100
2/2 [==============================] - 0s 31ms/step - loss: 87.8547 - val_loss: 12.9283
Epoch 74/100
2/2 [==============================] - 0s 32ms/step - loss: 87.2561 - val_loss: 13.1212
Epoch 75/100
2/2 [==============================] - 0s 29ms/step - loss: 87.3379 - val_loss: 15.1878
Epoch 76/100
2/2 [==============================] - 0s 30ms/step - loss: 85.2761 - val_loss: 16.0503
Epoch 77/100
2/2 [==============================] - 0s 34ms/step - loss: 87.9641 - val_loss: 17.0547
Epoch 78/100
2/2 [==============================] - 0s 37ms/step - loss: 82.7034 - val_loss: 15.5357
Epoch 79/100
2/2 [==============================] - 0s 39ms/step - loss: 82.3891 - val_loss: 14.0231
Epoch 80/100
2/2 [==============================] - 0s 31ms/step - loss: 81.3045 - val_loss: 15.4905
Epoch 81/100
2/2 [==============================] - 0s 32ms/step - loss: 81.0241 - val_loss: 15.6177
Epoch 82/100
2/2 [==============================] - 0s 32ms/step - loss: 80.9134 - val_loss: 15.9989
Epoch 83/100
2/2 [==============================] - 0s 32ms/step - loss: 82.4333 - val_loss: 14.1885
Epoch 84/100
2/2 [==============================] - 0s 28ms/step - loss: 79.1791 - val_loss: 14.6505
Epoch 85/100
2/2 [==============================] - 0s 32ms/step - loss: 79.3381 - val_loss: 12.7476
Epoch 86/100
2/2 [==============================] - 0s 33ms/step - loss: 78.1342 - val_loss: 9.6814
Epoch 87/100
2/2 [==============================] - 0s 29ms/step - loss: 83.7268 - val_loss: 7.7703
Epoch 88/100
2/2 [==============================] - 0s 28ms/step - loss: 78.5488 - val_loss: 11.2915
Epoch 89/100
2/2 [==============================] - 0s 27ms/step - loss: 77.6771 - val_loss: 14.2054
Epoch 90/100
2/2 [==============================] - 0s 33ms/step - loss: 78.5004 - val_loss: 14.1587
Epoch 91/100
2/2 [==============================] - 0s 36ms/step - loss: 81.0928 - val_loss: 8.8034
Epoch 92/100
2/2 [==============================] - 0s 29ms/step - loss: 80.1722 - val_loss: 7.1039
Epoch 93/100
2/2 [==============================] - 0s 31ms/step - loss: 77.2722 - val_loss: 6.9086
Epoch 94/100
2/2 [==============================] - 0s 26ms/step - loss: 77.4540 - val_loss: 11.6563
Epoch 95/100
2/2 [==============================] - 0s 27ms/step - loss: 84.5494 - val_loss: 6.5362
Epoch 96/100
2/2 [==============================] - 0s 35ms/step - loss: 76.0600 - val_loss: 15.5146
Epoch 97/100
2/2 [==============================] - 0s 33ms/step - loss: 91.8825 - val_loss: 5.5035
Epoch 98/100
2/2 [==============================] - 0s 28ms/step - loss: 83.6633 - val_loss: 10.4812
Epoch 99/100
2/2 [==============================] - 0s 29ms/step - loss: 76.4038 - val_loss: 11.0298
Epoch 100/100
2/2 [==============================] - 0s 48ms/step - loss: 77.8150 - val_loss: 16.8254
and the loss graph looks like this:
I am doing binary classification of the IMDB movie review data into positive or negative sentiment.
I have 25K movie reviews and the corresponding labels.
Preprocessing:
Removed the stop words and split the data 70:30 into training and test, i.e. 17.5K training and 7.5K test reviews. The 17.5K training set is further divided into 14K train and 3.5K validation, as used in the keras model.fit method.
Each processed movie review has been converted to a TF-IDF vector using the Keras text preprocessing module (a sketch of that step is below).
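That conversion step isn't shown in the post; here is a sketch of how it is typically done with the Keras Tokenizer (train_texts and test_texts are assumed to be the cleaned review strings, and the vocabulary size is an assumption):

from keras.preprocessing.text import Tokenizer

tokenizer = Tokenizer(num_words=10000)     # vocabulary size is an assumption
tokenizer.fit_on_texts(train_texts)

train_seq = tokenizer.texts_to_sequences(train_texts)
test_seq = tokenizer.texts_to_sequences(test_texts)

# mode='tfidf' yields one fixed-length TF-IDF vector per review
x_train = tokenizer.sequences_to_matrix(train_seq, mode='tfidf')
x_test = tokenizer.sequences_to_matrix(test_seq, mode='tfidf')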
Here is the fully connected architecture I used, built from Keras Dense layers:
def model_param(self):
    """ Method to build the deep learning network """
    from keras.models import Sequential
    from keras.layers import Dense, Dropout, Activation
    from keras.optimizers import SGD
    from keras import regularizers

    self.model = Sequential()
    # Dense(32): a fully-connected layer with 32 hidden units.
    # In the first layer you must specify the expected input data shape,
    # here the dimensionality of the TF-IDF vectors.
    self.model.add(Dense(32, activation='relu', input_dim=self.x_train_std.shape[1]))
    self.model.add(Dropout(0.5))
    #self.model.add(Dense(60, activation='relu'))
    #self.model.add(Dropout(0.5))
    self.model.add(Dense(1, activation='sigmoid'))

    sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
    self.model.compile(loss='binary_crossentropy',
                       optimizer=sgd,
                       metrics=['accuracy'])

def fit(self):
    """ Train the deep learning network on the training data """
    self.model.fit(self.x_train_std, self.y_train,
                   validation_split=0.20,
                   epochs=50,
                   batch_size=128)
As you can see, I first tried without Dropout and, as usual, got a training accuracy of 1.0 while validation accuracy was poor, i.e. overfitting. So I added Dropout to prevent it.
However, despite trying multiple dropout ratios, adding another layer with different numbers of units, and changing the learning rate, I still get overfitting on the validation dataset: validation accuracy gets stuck around 85% while training accuracy keeps climbing to 99% and beyond. I even changed the number of epochs from 10 to 50.
What could be going wrong here?
Train on 14000 samples, validate on 3500 samples
Epoch 1/50
14000/14000 [==============================] - 0s - loss: 0.5684 - acc: 0.7034 - val_loss: 0.3794 - val_acc: 0.8431
Epoch 2/50
14000/14000 [==============================] - 0s - loss: 0.3630 - acc: 0.8388 - val_loss: 0.3304 - val_acc: 0.8549
Epoch 3/50
14000/14000 [==============================] - 0s - loss: 0.2977 - acc: 0.8749 - val_loss: 0.3271 - val_acc: 0.8591
Epoch 4/50
14000/14000 [==============================] - 0s - loss: 0.2490 - acc: 0.8991 - val_loss: 0.3302 - val_acc: 0.8580
Epoch 5/50
14000/14000 [==============================] - 0s - loss: 0.2251 - acc: 0.9086 - val_loss: 0.3388 - val_acc: 0.8546
Epoch 6/50
14000/14000 [==============================] - 0s - loss: 0.2021 - acc: 0.9189 - val_loss: 0.3532 - val_acc: 0.8523
Epoch 7/50
14000/14000 [==============================] - 0s - loss: 0.1797 - acc: 0.9286 - val_loss: 0.3670 - val_acc: 0.8529
Epoch 8/50
14000/14000 [==============================] - 0s - loss: 0.1611 - acc: 0.9350 - val_loss: 0.3860 - val_acc: 0.8543
Epoch 9/50
14000/14000 [==============================] - 0s - loss: 0.1427 - acc: 0.9437 - val_loss: 0.4077 - val_acc: 0.8529
Epoch 10/50
14000/14000 [==============================] - 0s - loss: 0.1344 - acc: 0.9476 - val_loss: 0.4234 - val_acc: 0.8526
Epoch 11/50
14000/14000 [==============================] - 0s - loss: 0.1222 - acc: 0.9534 - val_loss: 0.4473 - val_acc: 0.8506
Epoch 12/50
14000/14000 [==============================] - 0s - loss: 0.1131 - acc: 0.9546 - val_loss: 0.4718 - val_acc: 0.8497
Epoch 13/50
14000/14000 [==============================] - 0s - loss: 0.1079 - acc: 0.9559 - val_loss: 0.4818 - val_acc: 0.8526
Epoch 14/50
14000/14000 [==============================] - 0s - loss: 0.0954 - acc: 0.9630 - val_loss: 0.5057 - val_acc: 0.8494
Epoch 15/50
14000/14000 [==============================] - 0s - loss: 0.0906 - acc: 0.9636 - val_loss: 0.5229 - val_acc: 0.8557
Epoch 16/50
14000/14000 [==============================] - 0s - loss: 0.0896 - acc: 0.9657 - val_loss: 0.5387 - val_acc: 0.8497
Epoch 17/50
14000/14000 [==============================] - 0s - loss: 0.0816 - acc: 0.9666 - val_loss: 0.5579 - val_acc: 0.8463
Epoch 18/50
14000/14000 [==============================] - 0s - loss: 0.0762 - acc: 0.9709 - val_loss: 0.5704 - val_acc: 0.8491
Epoch 19/50
14000/14000 [==============================] - 0s - loss: 0.0718 - acc: 0.9723 - val_loss: 0.5834 - val_acc: 0.8454
Epoch 20/50
14000/14000 [==============================] - 0s - loss: 0.0633 - acc: 0.9752 - val_loss: 0.6032 - val_acc: 0.8494
Epoch 21/50
14000/14000 [==============================] - 0s - loss: 0.0687 - acc: 0.9724 - val_loss: 0.6181 - val_acc: 0.8480
Epoch 22/50
14000/14000 [==============================] - 0s - loss: 0.0614 - acc: 0.9762 - val_loss: 0.6280 - val_acc: 0.8503
Epoch 23/50
14000/14000 [==============================] - 0s - loss: 0.0620 - acc: 0.9756 - val_loss: 0.6407 - val_acc: 0.8500
Epoch 24/50
14000/14000 [==============================] - 0s - loss: 0.0536 - acc: 0.9794 - val_loss: 0.6563 - val_acc: 0.8511
Epoch 25/50
14000/14000 [==============================] - 0s - loss: 0.0538 - acc: 0.9791 - val_loss: 0.6709 - val_acc: 0.8500
Epoch 26/50
14000/14000 [==============================] - 0s - loss: 0.0507 - acc: 0.9807 - val_loss: 0.6869 - val_acc: 0.8491
Epoch 27/50
14000/14000 [==============================] - 0s - loss: 0.0528 - acc: 0.9794 - val_loss: 0.7002 - val_acc: 0.8483
Epoch 28/50
14000/14000 [==============================] - 0s - loss: 0.0465 - acc: 0.9810 - val_loss: 0.7083 - val_acc: 0.8469
Epoch 29/50
14000/14000 [==============================] - 0s - loss: 0.0504 - acc: 0.9796 - val_loss: 0.7153 - val_acc: 0.8497
Epoch 30/50
14000/14000 [==============================] - 0s - loss: 0.0477 - acc: 0.9819 - val_loss: 0.7232 - val_acc: 0.8480
Epoch 31/50
14000/14000 [==============================] - 0s - loss: 0.0475 - acc: 0.9819 - val_loss: 0.7343 - val_acc: 0.8469
Epoch 32/50
14000/14000 [==============================] - 0s - loss: 0.0459 - acc: 0.9819 - val_loss: 0.7352 - val_acc: 0.8500
Epoch 33/50
14000/14000 [==============================] - 0s - loss: 0.0426 - acc: 0.9807 - val_loss: 0.7429 - val_acc: 0.8511
Epoch 34/50
14000/14000 [==============================] - 0s - loss: 0.0396 - acc: 0.9846 - val_loss: 0.7576 - val_acc: 0.8477
Epoch 35/50
14000/14000 [==============================] - 0s - loss: 0.0420 - acc: 0.9836 - val_loss: 0.7603 - val_acc: 0.8506
Epoch 36/50
14000/14000 [==============================] - 0s - loss: 0.0359 - acc: 0.9856 - val_loss: 0.7683 - val_acc: 0.8497
Epoch 37/50
14000/14000 [==============================] - 0s - loss: 0.0377 - acc: 0.9849 - val_loss: 0.7823 - val_acc: 0.8520
Epoch 38/50
14000/14000 [==============================] - 0s - loss: 0.0352 - acc: 0.9861 - val_loss: 0.7912 - val_acc: 0.8500
Epoch 39/50
14000/14000 [==============================] - 0s - loss: 0.0390 - acc: 0.9845 - val_loss: 0.8025 - val_acc: 0.8489
Epoch 40/50
14000/14000 [==============================] - 0s - loss: 0.0371 - acc: 0.9853 - val_loss: 0.8128 - val_acc: 0.8494
Epoch 41/50
14000/14000 [==============================] - 0s - loss: 0.0367 - acc: 0.9848 - val_loss: 0.8184 - val_acc: 0.8503
Epoch 42/50
14000/14000 [==============================] - 0s - loss: 0.0331 - acc: 0.9871 - val_loss: 0.8264 - val_acc: 0.8500
Epoch 43/50
14000/14000 [==============================] - 0s - loss: 0.0338 - acc: 0.9871 - val_loss: 0.8332 - val_acc: 0.8483
Epoch 44/50