How do I define a custom Keras metric for computing accuracy, like so:
y_true = [12.5, 45.5]
y_predicted = [14.5, 29]
splits = [-float("inf"), 10, 20, 30, float("inf")]
"""
Splits to Classes translation =>
Class 0: -inf to 9
Class 1: 10 to 19
Class 2: 20 to 29
Class 3: 30 to inf
"""
# using the above translation,
y_true_classes = [1, 3]
y_predicted_classes = [1, 2]
accuracy = K.equal( y_true_classes, y_predicted_classes ) # => 0.5 here
return accuracy
Here is an idea of how you might go about implementing this (although probably not the best one).
import tensorflow as tf
from keras import backend as K

def convert_to_classes(vals, splits):
    out = tf.zeros_like(vals, dtype=tf.int32)
    for split in splits:
        out = tf.where(vals > split, out + 1, out)
    return out

def my_acc(splits):
    def custom_acc(y_true, y_pred):
        y_true = convert_to_classes(y_true, splits)
        y_pred = convert_to_classes(y_pred, splits)
        return K.mean(K.equal(y_true, y_pred))
    return custom_acc
The function convert_to_classes converts the floats to buckets, assuming the outer bounds are always ±inf.
The closure my_acc lets you define the splits (without ±inf) at compile time (they are added statically to the graph), and then returns a metric function as expected by Keras.
Testing using tensorflow:
y_true = tf.constant([12.5, 45.5])
y_pred = tf.constant([14.5, 29])
with tf.Session() as sess:
    print(sess.run(my_acc((10, 20, 30))(y_true, y_pred)))
gives the expected 0.5 accuracy.
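A minimal alternative sketch of the same bucketing using tf.searchsorted instead of the loop (this assumes TF with eager execution, and side='left' is assumed to reproduce the strict vals > split convention of the loop above):
import tensorflow as tf
from tensorflow.keras import backend as K

def my_acc_v2(splits):
    splits_t = tf.constant(splits, dtype=tf.float32)
    def custom_acc(y_true, y_pred):
        # side='left' counts how many split points lie strictly below each value,
        # which gives the same bucket index as the vals > split loop
        t = tf.searchsorted(splits_t, tf.reshape(y_true, [-1]), side='left')
        p = tf.searchsorted(splits_t, tf.reshape(y_pred, [-1]), side='left')
        return K.mean(K.equal(t, p))
    return custom_acc

print(my_acc_v2((10, 20, 30))(tf.constant([12.5, 45.5]), tf.constant([14.5, 29.0])))  # -> 0.5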
And a quick test with Keras:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

x = np.random.randn(100, 10) * 100
y = np.random.randn(100) * 100
model = Sequential([Dense(20, activation='relu'),
                    Dense(1, activation=None)])
model.compile(optimizer='Adam',
              loss='mse',
              metrics=[my_acc(splits=(10, 20, 30))])
model.fit(x, y, batch_size=32, epochs=10)
This gives the metric (named after the inner function of the closure, custom_acc):
100/100 [==============================] - 0s 2ms/step - loss: 10242.2591 - custom_acc: 0.4300
Epoch 2/10
100/100 [==============================] - 0s 53us/step - loss: 10101.9658 - custom_acc: 0.4200
Epoch 3/10
100/100 [==============================] - 0s 53us/step - loss: 10011.4662 - custom_acc: 0.4300
Epoch 4/10
100/100 [==============================] - 0s 51us/step - loss: 9899.7181 - custom_acc: 0.4300
Epoch 5/10
100/100 [==============================] - 0s 50us/step - loss: 9815.1607 - custom_acc: 0.4200
Epoch 6/10
100/100 [==============================] - 0s 74us/step - loss: 9736.5554 - custom_acc: 0.4300
Epoch 7/10
100/100 [==============================] - 0s 50us/step - loss: 9667.0845 - custom_acc: 0.4400
Epoch 8/10
100/100 [==============================] - 0s 58us/step - loss: 9589.5439 - custom_acc: 0.4400
Epoch 9/10
100/100 [==============================] - 0s 61us/step - loss: 9511.8003 - custom_acc: 0.4400
Epoch 10/10
100/100 [==============================] - 0s 51us/step - loss: 9443.9730 - custom_acc: 0.4400
I am new to Keras and have been practicing with resources from the web. Unfortunately, I cannot build a model without it throwing the following error:
ValueError: logits and labels must have the same shape, received ((None, 10) vs (None, 1)).
I have attempted the following:
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

DF = pd.read_csv("https://raw.githubusercontent.com/EpistasisLab/tpot/master/tutorials/MAGIC%20Gamma%20Telescope/MAGIC%20Gamma%20Telescope%20Data.csv")
X = DF.iloc[:,0:-1]
y = DF.iloc[:,-1]
yBin = np.array([1 if x == 'g' else 0 for x in y ])
scaler = StandardScaler()
X1 = scaler.fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(X1, yBin, test_size=0.25, random_state=2018)
print(X_train.__class__,X_test.__class__,y_train.__class__,y_test.__class__ )
model=Sequential()
model.add(Dense(6,activation="relu", input_shape=(10,)))
model.add(Dense(10,activation="softmax"))
model.build(input_shape=(None,1))
model.summary()
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])
model.fit(x=X_train,
          y=y_train,
          epochs=600,
          validation_data=(X_test, y_test),
          verbose=1)
I have read that my model is likely wrong in terms of input parameters. What is the correct approach?
When I look at the shape of your data
print(X_train.shape,X_test.shape,y_train.shape,y_test.shape)
I see that X is 10-dimensional and y is 1-dimensional.
Therefore, you need a 10-dimensional input
model.build(input_shape=(None,10))
and a 1-dimensional output in the last dense layer
model.add(Dense(1,activation="softmax"))
The target variable yBin/y_train/y_test is a 1-D array (it has shape (None, 1) for a given batch).
Your logits come from the last Dense layer, which has 10 neurons with softmax activation, so it gives 10 outputs for each input, or (batch_size, 10) for each batch. This is represented formally as (None, 10).
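A minimal sketch illustrating the mismatch, assuming the same layer sizes as in the question (this is an illustration, not the poster's exact code):
import numpy as np
import tensorflow as tf

bad_model = tf.keras.Sequential([
    tf.keras.layers.Dense(6, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])
print(bad_model.output_shape)   # (None, 10) -> the logits
print(np.zeros((32, 1)).shape)  # (32, 1)    -> the labels for one batch of 32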
To resolve the particular shape mismatch in question, change the neuron count of the last Dense layer to 1 and set its activation function to "sigmoid".
model.add(Dense(1,activation="sigmoid"))
As correctly mentioned by @MSS, you need to use a sigmoid activation function with 1 neuron in the last dense layer to match the logits with the labels (1/0) of your dataset, which indicate a binary class.
Fixed code:
model=Sequential()
model.add(Dense(6,activation="relu", input_shape=(10,)))
model.add(Dense(1,activation="sigmoid"))
#model.build(input_shape=(None,1))
model.summary()
model.compile(optimizer='rmsprop',loss='binary_crossentropy',metrics=['accuracy'])
model.fit(x=X_train,y=y_train,epochs=10,validation_data=(X_test, y_test),verbose=1)
Output:
Epoch 1/10
446/446 [==============================] - 3s 4ms/step - loss: 0.5400 - accuracy: 0.7449 - val_loss: 0.4769 - val_accuracy: 0.7800
Epoch 2/10
446/446 [==============================] - 2s 4ms/step - loss: 0.4425 - accuracy: 0.7987 - val_loss: 0.4241 - val_accuracy: 0.8095
Epoch 3/10
446/446 [==============================] - 2s 3ms/step - loss: 0.4082 - accuracy: 0.8175 - val_loss: 0.4034 - val_accuracy: 0.8242
Epoch 4/10
446/446 [==============================] - 2s 3ms/step - loss: 0.3934 - accuracy: 0.8286 - val_loss: 0.3927 - val_accuracy: 0.8313
Epoch 5/10
446/446 [==============================] - 2s 4ms/step - loss: 0.3854 - accuracy: 0.8347 - val_loss: 0.3866 - val_accuracy: 0.8320
Epoch 6/10
446/446 [==============================] - 2s 4ms/step - loss: 0.3800 - accuracy: 0.8397 - val_loss: 0.3827 - val_accuracy: 0.8364
Epoch 7/10
446/446 [==============================] - 2s 4ms/step - loss: 0.3762 - accuracy: 0.8411 - val_loss: 0.3786 - val_accuracy: 0.8387
Epoch 8/10
446/446 [==============================] - 2s 3ms/step - loss: 0.3726 - accuracy: 0.8432 - val_loss: 0.3764 - val_accuracy: 0.8404
Epoch 9/10
446/446 [==============================] - 2s 3ms/step - loss: 0.3695 - accuracy: 0.8466 - val_loss: 0.3724 - val_accuracy: 0.8408
Epoch 10/10
446/446 [==============================] - 2s 4ms/step - loss: 0.3665 - accuracy: 0.8478 - val_loss: 0.3698 - val_accuracy: 0.8454
<keras.callbacks.History at 0x7f68ca30f670>
Input X = [[1,1,1,1,1], [1,2,1,3,7], [3,1,5,7,8]], etc.
Output Y = [[0.77],[0.63],[0.77],[1.26]], etc.
Each input x represents some combination, for example:
["car", "black", "sport", "xenon", "5dor"]
["car", "red", "sport", "noxenon", "3dor"] etc.
The output is a score for that combination.
What I need is to predict whether a combination is good or bad.
The dataset size is 10k.
Model:
model.add(Dense(20, input_dim = 5, activation = 'relu'))
model.add(Dense(20, activation = 'relu'))
model.add(Dense(1, activation = 'linear'))
optimizer = adam, loss = mse, validation split 0.2, epoch 30
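Assembled, the setup described above corresponds roughly to this sketch (X and Y are placeholders for the question's data, which is not shown in full):
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
model.add(Dense(20, input_dim=5, activation='relu'))
model.add(Dense(20, activation='relu'))
model.add(Dense(1, activation='linear'))
model.compile(optimizer='adam', loss='mse')
# model.fit(X, Y, validation_split=0.2, epochs=30)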
Training output:
Epoch 1/30
238/238 [==============================] - 0s 783us/step - loss: 29.8973 - val_loss: 19.0270
Epoch 2/30
238/238 [==============================] - 0s 599us/step - loss: 29.6696 - val_loss: 19.0100
Epoch 3/30
238/238 [==============================] - 0s 579us/step - loss: 29.6606 - val_loss: 19.0066
Epoch 4/30
238/238 [==============================] - 0s 583us/step - loss: 29.6579 - val_loss: 19.0050
Epoch 5/30
This is not good; it makes no sense.
I need some good documentation on how to properly set up or build the model.
I just tried to reproduce this, and my results differ from yours. Please check:
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras import Model
inputA = Input(shape=(5, ))
x = Dense(20, activation='relu')(inputA)
x = Dense(20, activation='relu')(x)
x = Dense(1, activation='linear')(x)
model = Model(inputs=inputA, outputs=x)
model.compile(optimizer = 'adam', loss = 'mse')
input = tf.random.uniform([10000, 5], 0, 10, dtype=tf.int32)
labels = tf.random.uniform([10000, 1])
model.fit(input, labels, epochs=30, validation_split=0.2)
Results:
Epoch 1/30 250/250 [==============================] - 1s 3ms/step -
loss: 0.1980 - val_loss: 0.1082
Epoch 2/30 250/250 [==============================] - 1s 2ms/step -
loss: 0.0988 - val_loss: 0.0951
Epoch 3/30 250/250 [==============================] - 1s 2ms/step -
loss: 0.0918 - val_loss: 0.0916
Epoch 4/30 250/250 [==============================] - 1s 2ms/step -
loss: 0.0892 - val_loss: 0.0872
Epoch 5/30 250/250 [==============================] - 0s 2ms/step -
loss: 0.0886 - val_loss: 0.0859
Epoch 6/30 250/250 [==============================] - 1s 2ms/step -
loss: 0.0864 - val_loss: 0.0860
Epoch 7/30 250/250 [==============================] - 1s 3ms/step -
loss: 0.0873 - val_loss: 0.0863
Epoch 8/30 250/250 [==============================] - 1s 2ms/step -
loss: 0.0863 - val_loss: 0.0992
Epoch 9/30 250/250 [==============================] - 0s 2ms/step -
loss: 0.0876 - val_loss: 0.0865
The model should work on real data as well.
When I was running a TensorFlow model in Python, the accuracy of my model could not be improved by training. Even when I changed my training data to something quite regular, the model still didn't work. What's the problem?
Code:
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

train_x = np.array([1] * 1000 + [2] * 1000 + [3] * 1000)
train_y = np.zeros((3000, 3))
train_y[:1000,0] = 1
train_y[1000:2000,1] = 1
train_y[2000:3000,2] = 1
val_x = train_x
val_y = train_y
model = tf.keras.Sequential()
model.add(layers.Dense(3, activation='relu'))
model.add(layers.Dense(3, activation='relu'))
model.compile(optimizer=tf.keras.optimizers.Adam(0.1),
              loss=tf.keras.losses.categorical_crossentropy,
              metrics=[tf.keras.metrics.categorical_accuracy])
model.fit(train_x, train_y, epochs=10, batch_size=32, verbose=1,
          shuffle=False,
          validation_data=(val_x, val_y))
And the training result:
Epoch 1/10
94/94 [==============================] - 0s 2ms/step - loss: 10.7836 - categorical_accuracy: 0.3120 - val_loss: 10.7454 - val_categorical_accuracy: 0.3333
Epoch 2/10
94/94 [==============================] - 0s 1ms/step - loss: 10.7454 - categorical_accuracy: 0.3333 - val_loss: 10.7454 - val_categorical_accuracy: 0.3333
Epoch 3/10
94/94 [==============================] - 0s 1ms/step - loss: 10.7454 - categorical_accuracy: 0.3333 - val_loss: 10.7454 - val_categorical_accuracy: 0.3333
Epoch 4/10
94/94 [==============================] - 0s 1ms/step - loss: 10.7454 - categorical_accuracy: 0.3333 - val_loss: 10.7454 - val_categorical_accuracy: 0.3333
Epoch 5/10
94/94 [==============================] - 0s 2ms/step - loss: 10.7454 - categorical_accuracy: 0.3333 - val_loss: 10.7454 - val_categorical_accuracy: 0.3333
So where should I adjust things to get better performance, and what have I done wrong?
The problem is that with 3 input neurons and 1 feature (i.e., 1 column), the neural network doesn't have enough if... then combinations to learn the pattern you're trying to teach it. If you one-hot encode your input, it will effectively learn to multiply every input column by one and it will give the right answer.
You have the wrong activation function. For multi-class problems, use 'softmax'.
Your optimizer's learning rate is a little too high, so the steps are too large and jump all over the cost function. Use 0.01 at most.
Fully-working example:
import numpy as np
import tensorflow as tf
train_x = np.array([1] * 1000 + [2] * 1000 + [3] * 1000)
train_x = tf.keras.utils.to_categorical(train_x - 1)
train_y = np.zeros((3000, 3))
train_y[:1000,0] = 1
train_y[1000:2000,1] = 1
train_y[2000:3000,2] = 1
val_x = train_x
val_y = train_y
model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(3, activation='relu'))
model.add(tf.keras.layers.Dense(3, activation='softmax'))
model.compile(optimizer=tf.keras.optimizers.Adam(0.01),
              loss=tf.keras.losses.categorical_crossentropy,
              metrics=[tf.keras.metrics.categorical_accuracy])
model.fit(train_x, train_y, epochs=10, batch_size=32, verbose=1,
          shuffle=False,
          validation_data=(val_x, val_y))
Epoch 9/10
32/3000 [..............................] - ETA: 0s - loss: 0.0067 - cat_acc: 1.0000
608/3000 [=====>........................] - ETA: 0s - loss: 0.0063 - cat_acc: 1.0000
1184/3000 [==========>...................] - ETA: 0s - loss: 0.0244 - cat_acc: 1.0000
1760/3000 [================>.............] - ETA: 0s - loss: 0.0553 - cat_acc: 1.0000
2272/3000 [=====================>........] - ETA: 0s - loss: 0.0550 - cat_acc: 1.0000
2848/3000 [===========================>..] - ETA: 0s - loss: 0.0447 - cat_acc: 1.0000
The model that I am using is this:
from keras.layers import (Input, MaxPooling1D, Dropout,
                          BatchNormalization, Activation, Add,
                          Flatten, Conv1D, Dense)
from keras.models import Model
import numpy as np


class ResidualUnit(object):
    """References
    ----------
    .. [1] K. He, X. Zhang, S. Ren, and J. Sun, "Identity Mappings in Deep Residual Networks,"
           arXiv:1603.05027 [cs], Mar. 2016. https://arxiv.org/pdf/1603.05027.pdf.
    .. [2] K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," in 2016 IEEE Conference
           on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770-778. https://arxiv.org/pdf/1512.03385.pdf
    """

    def __init__(self, n_samples_out, n_filters_out, kernel_initializer='he_normal',
                 dropout_rate=0.8, kernel_size=17, preactivation=True,
                 postactivation_bn=False, activation_function='relu'):
        self.n_samples_out = n_samples_out
        self.n_filters_out = n_filters_out
        self.kernel_initializer = kernel_initializer
        self.dropout_rate = dropout_rate
        self.kernel_size = kernel_size
        self.preactivation = preactivation
        self.postactivation_bn = postactivation_bn
        self.activation_function = activation_function

    def _skip_connection(self, y, downsample, n_filters_in):
        """Implement skip connection."""
        # Deal with downsampling
        if downsample > 1:
            y = MaxPooling1D(downsample, strides=downsample, padding='same')(y)
        elif downsample == 1:
            y = y
        else:
            raise ValueError("Number of samples should always decrease.")
        # Deal with n_filters dimension increase
        if n_filters_in != self.n_filters_out:
            # This is one of the two alternatives presented in ResNet paper
            # Other option is to just fill the matrix with zeros.
            y = Conv1D(self.n_filters_out, 1, padding='same',
                       use_bias=False,
                       kernel_initializer=self.kernel_initializer
                       )(y)
        return y

    def _batch_norm_plus_activation(self, x):
        if self.postactivation_bn:
            x = Activation(self.activation_function)(x)
            x = BatchNormalization(center=False, scale=False)(x)
        else:
            x = BatchNormalization()(x)
            x = Activation(self.activation_function)(x)
        return x

    def __call__(self, inputs):
        """Residual unit."""
        x, y = inputs
        n_samples_in = y.shape[1]
        downsample = n_samples_in // self.n_samples_out
        n_filters_in = y.shape[2]
        y = self._skip_connection(y, downsample, n_filters_in)
        # 1st layer
        x = Conv1D(self.n_filters_out, self.kernel_size, padding='same',
                   use_bias=False,
                   kernel_initializer=self.kernel_initializer
                   )(x)
        x = self._batch_norm_plus_activation(x)
        if self.dropout_rate > 0:
            x = Dropout(self.dropout_rate)(x)
        # 2nd layer
        x = Conv1D(self.n_filters_out, self.kernel_size, strides=downsample,
                   padding='same', use_bias=False,
                   kernel_initializer=self.kernel_initializer
                   )(x)
        if self.preactivation:
            x = Add()([x, y])  # Sum skip connection and main connection
            y = x
            x = self._batch_norm_plus_activation(x)
            if self.dropout_rate > 0:
                x = Dropout(self.dropout_rate)(x)
        else:
            x = BatchNormalization()(x)
            x = Add()([x, y])  # Sum skip connection and main connection
            x = Activation(self.activation_function)(x)
            if self.dropout_rate > 0:
                x = Dropout(self.dropout_rate)(x)
            y = x
        return [x, y]
# ----- Model ----- #
kernel_size = 16
kernel_initializer = 'he_normal'
signal = Input(shape=(1000, 12), dtype=np.float32, name='signal')
age_range = Input(shape=(6,), dtype=np.float32, name='age_range')
is_male = Input(shape=(1,), dtype=np.float32, name='is_male')
x = signal
x = Conv1D(64, kernel_size, padding='same', use_bias=False,
           kernel_initializer=kernel_initializer
           )(x)
x = BatchNormalization()(x)
x = Activation('relu')(x)
x, y = ResidualUnit(512, 128, kernel_size=kernel_size,
                    kernel_initializer=kernel_initializer
                    )([x, x])
x, y = ResidualUnit(256, 196, kernel_size=kernel_size,
                    kernel_initializer=kernel_initializer
                    )([x, y])
x, y = ResidualUnit(64, 256, kernel_size=kernel_size,
                    kernel_initializer=kernel_initializer
                    )([x, y])
x, _ = ResidualUnit(16, 320, kernel_size=kernel_size,
                    kernel_initializer=kernel_initializer
                    )([x, y])
x = Flatten()(x)
diagn = Dense(2, activation='sigmoid', kernel_initializer=kernel_initializer)(x)
model = Model(signal, diagn)
model.summary()
# ----- Train ----- #
from keras.optimizers import Adam
from keras.callbacks import ModelCheckpoint, ReduceLROnPlateau
loss = 'binary_crossentropy'
lr = 0.001
batch_size = 64
opt = Adam(learning_rate=0.001)
callbacks = [ReduceLROnPlateau(monitor='val_loss',
                               factor=0.1,
                               patience=7,
                               min_lr=lr / 100)]
model.compile(optimizer=opt, loss=loss, metrics=['accuracy'])
history = model.fit(x_train, y_train,
                    batch_size=batch_size,
                    epochs=70,
                    initial_epoch=0,
                    validation_split=0.1,
                    shuffle='batch',
                    callbacks=callbacks,
                    verbose=1)
# Save final result
model.save("./final_model_middle_one.hdf5")
When I substitute Keras with tf.keras, which I need in order to use the qkeras library, the model doesn't learn and gets stuck at a much lower accuracy on every iteration. What could be causing this?
When I use Keras, the accuracy starts high at 83% and increases slightly during training.
Train on 17340 samples, validate on 1927 samples
Epoch 1/70
17340/17340 [==============================] - 33s 2ms/step - loss: 0.3908 - accuracy: 0.8314 - val_loss: 0.3283 - val_accuracy: 0.8710
Epoch 2/70
17340/17340 [==============================] - 31s 2ms/step - loss: 0.3641 - accuracy: 0.8416 - val_loss: 0.3340 - val_accuracy: 0.8612
Epoch 3/70
17340/17340 [==============================] - 31s 2ms/step - loss: 0.3525 - accuracy: 0.8483 - val_loss: 0.3847 - val_accuracy: 0.8550
Epoch 4/70
17340/17340 [==============================] - 31s 2ms/step - loss: 0.3354 - accuracy: 0.8563 - val_loss: 0.4641 - val_accuracy: 0.8215
Epoch 5/70
17340/17340 [==============================] - 31s 2ms/step - loss: 0.3269 - accuracy: 0.8590 - val_loss: 0.7172 - val_accuracy: 0.7870
Epoch 6/70
17340/17340 [==============================] - 31s 2ms/step - loss: 0.3202 - accuracy: 0.8630 - val_loss: 0.3599 - val_accuracy: 0.8617
Epoch 7/70
17340/17340 [==============================] - 31s 2ms/step - loss: 0.3101 - accuracy: 0.8678 - val_loss: 0.2659 - val_accuracy: 0.8934
Epoch 8/70
17340/17340 [==============================] - 31s 2ms/step - loss: 0.3058 - accuracy: 0.8688 - val_loss: 0.5683 - val_accuracy: 0.8293
Epoch 9/70
17340/17340 [==============================] - 31s 2ms/step - loss: 0.2980 - accuracy: 0.8739 - val_loss: 0.3442 - val_accuracy: 0.8643
Epoch 10/70
7424/17340 [===========>..................] - ETA: 17s - loss: 0.2966 - accuracy: 0.8707
When I use tf.keras the accuracy starts at 50% and does not increase considerably during training:
Epoch 1/70
271/271 [==============================] - 30s 110ms/step - loss: 0.9325 - accuracy: 0.5093 - val_loss: 0.6973 - val_accuracy: 0.5470 - lr: 0.0010
Epoch 2/70
271/271 [==============================] - 29s 108ms/step - loss: 0.8424 - accuracy: 0.5157 - val_loss: 0.6660 - val_accuracy: 0.6528 - lr: 0.0010
Epoch 3/70
271/271 [==============================] - 29s 108ms/step - loss: 0.8066 - accuracy: 0.5213 - val_loss: 0.6441 - val_accuracy: 0.6539 - lr: 0.0010
Epoch 4/70
271/271 [==============================] - 29s 108ms/step - loss: 0.7884 - accuracy: 0.5272 - val_loss: 0.6649 - val_accuracy: 0.6559 - lr: 0.0010
Epoch 5/70
271/271 [==============================] - 29s 108ms/step - loss: 0.7888 - accuracy: 0.5368 - val_loss: 0.6899 - val_accuracy: 0.5760 - lr: 0.0010
Epoch 6/70
271/271 [==============================] - 29s 108ms/step - loss: 0.7617 - accuracy: 0.5304 - val_loss: 0.6641 - val_accuracy: 0.6533 - lr: 0.0010
Epoch 7/70
271/271 [==============================] - 29s 108ms/step - loss: 0.7485 - accuracy: 0.5333 - val_loss: 0.6450 - val_accuracy: 0.6544 - lr: 0.0010
Epoch 8/70
271/271 [==============================] - 29s 108ms/step - loss: 0.7431 - accuracy: 0.5382 - val_loss: 0.6599 - val_accuracy: 0.6539 - lr: 0.0010
Epoch 9/70
271/271 [==============================] - 29s 108ms/step - loss: 0.7336 - accuracy: 0.5421 - val_loss: 0.6532 - val_accuracy: 0.6554 - lr: 0.0010
Epoch 10/70
271/271 [==============================] - 29s 108ms/step - loss: 0.7274 - accuracy: 0.5379 - val_loss: 0.6753 - val_accuracy: 0.6492 - lr: 0.0010
The only lines changed between the two trials are the lines where I import the keras modules, by adding 'tensorflow.' in front of them. I don't know why the results would be so different; possibly it is due to different default values of certain parameters?
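For reference, the described change amounts to swapping the import lines, roughly like this (a sketch of the change, matching the imports used in the model above):
# Trial 1: standalone Keras
# from keras.layers import Input, MaxPooling1D, Dropout, BatchNormalization, Activation, Add, Flatten, Conv1D, Dense
# from keras.models import Model
# from keras.optimizers import Adam
# from keras.callbacks import ModelCheckpoint, ReduceLROnPlateau

# Trial 2: tf.keras (needed for qkeras)
from tensorflow.keras.layers import (Input, MaxPooling1D, Dropout,
                                     BatchNormalization, Activation, Add,
                                     Flatten, Conv1D, Dense)
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import ModelCheckpoint, ReduceLROnPlateau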
It might be related to how the accuracy metric is computed in keras vs tf.keras. As far as I can tell the accuracy function is usually used when you have one-hot-encoded output. However, it seems that you are outputting two values [A, B] with a sigmoid function applied to each value.
Since I don't know the labels you're using, there might be two cases:
a) You want to predict A or B. If so, I would change the activation function to softmax.
b) You want to predict A vs. not A and B vs. not B. In this case I would modify the output tensor shape to have two heads, each with two values: head_A = [A, not_A] and head_B = [B, not_B]. I would then one-hot encode the labels accordingly, and then I would assume you could use the accuracy metric.
Alternatively, you can create a custom metric that is appropriate to your output shape.
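For example, a minimal sketch of such a metric, assuming the (batch, 2) sigmoid output from the model above and two-column multi-label targets (the label format is an assumption, since the question does not state it):
import tensorflow as tf

def two_label_accuracy(y_true, y_pred):
    # Threshold each sigmoid output at 0.5 and compare element-wise,
    # so a (batch, 2) prediction is scored against (batch, 2) targets.
    y_pred_bin = tf.cast(y_pred > 0.5, y_true.dtype)
    return tf.reduce_mean(tf.cast(tf.equal(y_true, y_pred_bin), tf.float32))

# model.compile(optimizer=opt, loss='binary_crossentropy', metrics=[two_label_accuracy])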
I had a similar (the same?) problem. I was working through some examples from Kaggle and was unable to save the model using keras. After much Googling I realised that I needed to use tensorflow.keras. This solved my problem, but the 60000 data items I was using for training dropped to a reported 1875, although the error was still 10%.
1875 * 32 = 60000.
This is my fit call:
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=epochs, verbose=True,
          callbacks=[early_stopping_monitor])
1539/1875 [=======================>......] - ETA: 3s - loss: 0.4445 - accuracy: 0.8418
It turns out that fit defaults to a batch size of 32. If I increase the batch size to 64 I get half the reported steps, which makes sense:
model.fit(X_train, y_train, batch_size=64, validation_data=(X_test, y_test), epochs=epochs, verbose=True,
          callbacks=[early_stopping_monitor])
938/938 [==============================] - 16s 17ms/step - loss: 0.4568 - accuracy: 0.8388
I noticed from your code that you've set batch_size to 64, and your reported count drops from 17340 items to 271, which is about a 64th. This must also affect your accuracy due to the data you are using.
From the docs here: https://www.tensorflow.org/api_docs/python/tf/keras/Sequential
batch_size
Integer or None. Number of samples per gradient update. If unspecified, batch_size will default to 32. Do not specify the batch_size if your data is in the form of a dataset, generators, or keras.utils.Sequence instances (since they generate batches).
From the Keras docs (https://keras.rstudio.com/reference/fit.html): it also says that the batch size defaults to 32; it must just be reported differently when training the model.
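In other words, the progress-bar counts are steps (batches) per epoch rather than samples, and they follow from ceil(n_samples / batch_size), as a quick check shows:
import math

print(math.ceil(60000 / 32))  # 1875 steps per epoch with the default batch size
print(math.ceil(60000 / 64))  # 938 steps with batch_size=64
print(math.ceil(17340 / 64))  # 271 steps, matching the output in the question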
Hope this helps.
I have been trying to better understand the train/validation sequence in the keras model fit() loop. So I tried out a simple training loop where I attempted to fit a simple logistic regression model with input data consisting of a single feature.
I feed the same data for both training and validation. Under those conditions, with the batch size set equal to the total data size, one would expect to obtain exactly the same loss and accuracy. But this is not the case.
Here is my code:
Generate some random data with two classes:
import numpy as np
import pandas as pd
import seaborn as sns
import tensorflow as tf
from matplotlib import pyplot

N = 100
x = np.concatenate([np.random.randn(N//2, 1), np.random.randn(N//2, 1)+2])
y = np.concatenate([np.zeros(N//2), np.ones(N//2)])
And plotting the two class data distribution (one feature x):
data = pd.DataFrame({'x': x.ravel(), 'y': y})
sns.violinplot(x='x', y='y', inner='point', data=data, orient='h')
pyplot.tight_layout(0)
pyplot.show()
Build and fit the keras model:
model = tf.keras.Sequential([tf.keras.layers.Dense(1, activation='sigmoid', input_dim=1)])
model.compile(optimizer=tf.keras.optimizers.SGD(2), loss='binary_crossentropy', metrics=['accuracy'])
model.fit(x, y, epochs=10, validation_data=(x, y), batch_size=N)
Notice that I have specified the data x and targets y both for training and for validation_data. Also, the batch_size is the same as the total size, batch_size=N.
The training results are:
100/100 [==============================] - 1s 5ms/step - loss: 1.4500 - acc: 0.2300 - val_loss: 0.5439 - val_acc: 0.7200
Epoch 2/10
100/100 [==============================] - 0s 18us/step - loss: 0.5439 - acc: 0.7200 - val_loss: 0.4408 - val_acc: 0.8000
Epoch 3/10
100/100 [==============================] - 0s 16us/step - loss: 0.4408 - acc: 0.8000 - val_loss: 0.3922 - val_acc: 0.8300
Epoch 4/10
100/100 [==============================] - 0s 16us/step - loss: 0.3922 - acc: 0.8300 - val_loss: 0.3659 - val_acc: 0.8400
Epoch 5/10
100/100 [==============================] - 0s 17us/step - loss: 0.3659 - acc: 0.8400 - val_loss: 0.3483 - val_acc: 0.8500
Epoch 6/10
100/100 [==============================] - 0s 16us/step - loss: 0.3483 - acc: 0.8500 - val_loss: 0.3356 - val_acc: 0.8600
Epoch 7/10
100/100 [==============================] - 0s 17us/step - loss: 0.3356 - acc: 0.8600 - val_loss: 0.3260 - val_acc: 0.8600
Epoch 8/10
100/100 [==============================] - 0s 18us/step - loss: 0.3260 - acc: 0.8600 - val_loss: 0.3186 - val_acc: 0.8600
Epoch 9/10
100/100 [==============================] - 0s 18us/step - loss: 0.3186 - acc: 0.8600 - val_loss: 0.3127 - val_acc: 0.8700
Epoch 10/10
100/100 [==============================] - 0s 23us/step - loss: 0.3127 - acc: 0.8700 - val_loss: 0.3079 - val_acc: 0.8800
The results show that val_loss and loss are not the same at the end of each epoch, and also acc and val_acc are not exactly the same. However, based on this setup, one would expect them to be the same.
I have been going through the code in keras, particularly this part:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/keras/engine/training.py#L1364
and so far, all I can say is that the difference is due to some different computation through the computation graph.
Does anyone have any idea why there would be such a difference?
So, after looking more closely at the results: the loss and acc values from the training step are computed BEFORE the current batch is used to update the model.
Thus, in the case of a single batch per epoch, the train acc and loss are evaluated when the batch is fed in; then the model parameters are updated based on the provided optimizer. After the train step is finished, we compute loss and accuracy by feeding in the validation data, which is now evaluated using the newly updated model.
This is evident from the training output, where the validation accuracy and loss in epoch 1 are equal to the training accuracy and loss in epoch 2, and so on.
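One way to see this from the History object itself (a sketch, assuming the same model, x, y, and N defined above): with a single batch per epoch, the training loss of epoch i+1 should match the validation loss of epoch i.
history = model.fit(x, y, epochs=10, validation_data=(x, y), batch_size=N, verbose=0)
for i in range(len(history.history['loss']) - 1):
    # each pair should print (nearly) identical numbers
    print(history.history['loss'][i + 1], history.history['val_loss'][i])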
A quick check using tensorflow confirmed that values are fetched before variables are updated:
import tensorflow as tf
import numpy as np
np.random.seed(1)
x = tf.placeholder(dtype=tf.float32, shape=(None, 1), name="x")
y = tf.placeholder(dtype=tf.float32, shape=(None), name="y")
W = tf.get_variable(name="W", shape=(1, 1), dtype=tf.float32, initializer=tf.constant_initializer(0))
b = tf.get_variable(name="b", shape=1, dtype=tf.float32, initializer=tf.constant_initializer(0))
z = tf.matmul(x, W) + b
error = tf.square(z - y)
obj = tf.reduce_mean(error, name="obj")
opt = tf.train.MomentumOptimizer(learning_rate=0.025, momentum=0.9)
grads = opt.compute_gradients(obj)
train_step = opt.apply_gradients(grads)
N = 100
x_np = np.random.randn(N).reshape(-1, 1)
y_np = 2*x_np + 3 + np.random.randn(N)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(2):
        res = sess.run([obj, W, b, train_step], feed_dict={x: x_np, y: y_np})
        print('MSE: {}, W: {}, b: {}'.format(res[0], res[1][0, 0], res[2][0]))
Output:
MSE: 14.721437454223633, W: 0.0, b: 0.0
MSE: 13.372591018676758, W: 0.08826743811368942, b: 0.1636980175971985
Since the parameters W and b were initialized to 0, it is clear that the fetched values are still 0 even though the session was run with a gradient-update request.