I am trying the training and evaluation example on the TensorFlow website.
Specifically, this part:
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train.reshape(60000, 784).astype('float32') / 255
x_test = x_test.reshape(10000, 784).astype('float32') / 255
y_train = y_train.astype('float32')
y_test = y_test.astype('float32')
def get_uncompiled_model():
    inputs = keras.Input(shape=(784,), name='digits')
    x = layers.Dense(64, activation='relu', name='dense_1')(inputs)
    x = layers.BatchNormalization()(x)
    x = layers.Dense(64, activation='relu', name='dense_2')(x)
    outputs = layers.Dense(10, activation='softmax', name='predictions')(x)
    model = keras.Model(inputs=inputs, outputs=outputs)
    return model
def get_compiled_model():
    model = get_uncompiled_model()
    model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1e-3),
                  loss='sparse_categorical_crossentropy',
                  metrics=['sparse_categorical_accuracy'])
    return model
sample_weight = np.ones(shape=(len(y_train),))
sample_weight[y_train == 5] = 2.
# Create a Dataset that includes sample weights
# (3rd element in the return tuple).
train_dataset = tf.data.Dataset.from_tensor_slices(
    (x_train, y_train, sample_weight))
# Shuffle and slice the dataset.
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(64)
model = get_compiled_model()
model.fit(train_dataset, epochs=3)
It appears that if I add the batch normalization layer (this line: x = layers.BatchNormalization()(x)) I get the following error:
InvalidArgumentError: The second input must be a scalar, but it has shape [64]
[[{{node batch_normalization_2/cond/ReadVariableOp/Switch}}]]
Any ideas?
The same code works for me.
The only lines I changed are:
model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1e-3)
to model.compile(optimizer=keras.optimizers.RMSprop(lr=1e-3)
(which is version-specific)
Then
model.fit(train_dataset, epochs=3) to model.fit(train_dataset, epochs=3, steps_per_epoch=30)
Reason: when using an iterator as input to a model, you should specify the steps_per_epoch argument.
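Putting the two changes together, a minimal sketch (the lr spelling is version-specific; newer releases use learning_rate):
model = get_uncompiled_model()
model.compile(optimizer=keras.optimizers.RMSprop(lr=1e-3),  # or learning_rate=1e-3 on newer versions
              loss='sparse_categorical_crossentropy',
              metrics=['sparse_categorical_accuracy'])
model.fit(train_dataset, epochs=3, steps_per_epoch=30)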
If you just want to use sample weights, you don't have to use tf.data.Dataset; you can simply run:
model.fit(x=x_train, y=y_train, sample_weight=sample_weight, batch_size=64, epochs=3)
and it works for me (when I change learning_rate to lr as #ASHu2 mentioned).
It gets 97% accuracy after 3 epochs:
...
57408/60000 [===========================>..] - ETA: 0s - loss: 0.1010 - sparse_categorical_accuracy: 0.9709
58816/60000 [============================>.] - ETA: 0s - loss: 0.1011 - sparse_categorical_accuracy: 0.9708
60000/60000 [==============================] - 2s 37us/sample - loss: 0.1007 - sparse_categorical_accuracy: 0.9709
I used TF 1.14.0 on Windows.
The problem was solved when I updated TensorFlow from version 1.14.1 to 2.0.0-rc1.
Related question:
The following code gives a log ending with
Epoch 19/20
1/1 [==============================] - 0s 473ms/step - loss: 1.4018 - accuracy: 0.8750 - val_loss: 1.8656 - val_accuracy: 0.8900
Epoch 20/20
1/1 [==============================] - 0s 444ms/step - loss: 0.5904 - accuracy: 0.8750 - val_loss: 2.1255 - val_accuracy: 0.8700
get_dataset: validation
Found 1000 files belonging to 2 classes.
Using 100 files for validation.
4/4 [==============================] - 1s 81ms/step
eval acc: 0.81
My question is:
Why is the val_accuracy after the last epoch (0.87) different from the eval acc (0.81) after the fit?
In my code, I try to use the same dataset for the validation of each epoch during fit and the additional validation afterwards.
[Update 1, 2022-07-19:
Obviously, the two accuracy calculations don't really use the same data. How can I debug which data is actually used?
[Update 3, 2022-07-20: I have followed the data into TensorFlow. The last thing I see is that in Model.evaluate (during fit) and Model.predict the x.filenames are equal. I did not manage to debug much further, because quick_execute soon evaluates __inference_test_function_248219 and __inference_predict_function_231438, respectively, outside Python, and the arguments are tensors with dtype=resource, whose contents I cannot see.]
I have deliberately removed my class balancing code to keep my example small. I know that this makes the accuracies less useful, but I don't care about that for now.
Note that get_dataset('validation') is only called once at the beginning of the fit, not at each epoch.
I have now also set max_queue_size=0, use_multiprocessing=False, workers=0 (as seen here, found via this related SO question about TensorFlow 1), but this did not make the accuracies equal.
]
Code:
import tensorflow as tf
from sklearn.metrics import accuracy_score
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.preprocessing import image_dataset_from_directory
inputs = tf.keras.Input(shape=(224, 224, 3))
base_model = tf.keras.applications.ResNet50(weights='imagenet', include_top=False)
base_output = base_model(inputs)
base_model.trainable = False
out = Flatten(name='flat')(base_output)
out = Dense(1, activation='sigmoid')(out)
model = Model(inputs=inputs, outputs=out)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
def get_dataset(subset):
    print('get_dataset:', subset)
    return image_dataset_from_directory(
        'data-nodup-1000',
        labels="inferred",
        label_mode='binary',
        color_mode="rgb",
        image_size=(224, 224),
        shuffle=True,
        seed=1,
        validation_split=0.1,
        subset=subset,
        crop_to_aspect_ratio=False,
    )
model.fit(
    get_dataset('training'),
    steps_per_epoch=1,
    epochs=20,
    validation_data=get_dataset('validation'),
    max_queue_size=0,
    use_multiprocessing=False,
    workers=0,
)
val_dataset = get_dataset('validation')
true_class = tf.concat([y for x, y in val_dataset], axis=0)
pred = model.predict(val_dataset)
pred_class = pred >= .5
print('eval acc:', accuracy_score(true_class, pred_class))
[Update 2, 2022-07-19:
I can also reproduce the behavior with the deprecated ImageDataGenerator, using
from tensorflow.keras.applications.resnet50 import preprocess_input
from keras_preprocessing.image import ImageDataGenerator
datagen = ImageDataGenerator(
    preprocessing_function=preprocess_input,
    validation_split=0.1,
)
def get_dataset(subset):
    print('get_dataset:', subset)
    return datagen.flow_from_directory(
        'data-nodup-1000',
        class_mode='binary',
        target_size=(224, 224),
        shuffle=True,
        seed=1,
        subset=subset,
    )
and
true_class = val_dataset.labels
]
[Update 4, 2022-07-21: Note that deactivating shuffling of validation data by setting shuffle=(subset == 'training') makes the two validation accuracies equal. This is not a workaround, however, because the validation set then consists only of class 1, since flow_from_directory doesn't do stratification.
]
My environment:
I am using all up-to-date libraries, like tensorflow 2.9.1 and sklearn 1.1.1 (via pip-compile -U).
The folder data-nodup-1000 contains one subfolder with 113 files of class 0, and one subfolder with 887 files of class 1.
I have now found out that in TensorFlow 2.9.1 model.predict uses the second iteration of the dataset, which is shuffled differently than the first iteration!
It even uses the second iteration when I directly call model.predict(get_dataset('validation'))!
Therefore, the entries of true_class and pred do not match.
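One workaround (a minimal sketch, assuming predict_on_batch preserves batch order) is to collect labels and predictions in a single pass over the dataset, so a reshuffled second iteration cannot misalign them:
# Iterate the validation dataset once, taking labels and predictions
# from the same batches.
true_class, pred = [], []
for x_batch, y_batch in val_dataset:
    true_class.append(y_batch)
    pred.append(model.predict_on_batch(x_batch))
true_class = tf.concat(true_class, axis=0)
pred_class = tf.concat(pred, axis=0) >= .5
print('eval acc:', accuracy_score(true_class, pred_class))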
Switching to TensorFlow 2.10.0-rc3 and its tf.keras.utils.split_dataset makes the accuracies equal.
Here's the updated code:
import tensorflow as tf
from sklearn.metrics import accuracy_score
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.preprocessing import image_dataset_from_directory
inputs = tf.keras.Input(shape=(224, 224, 3))
base_model = tf.keras.applications.ResNet50(weights='imagenet', include_top=False)
base_output = base_model(inputs)
base_model.trainable = False
out = Flatten(name='flat')(base_output)
out = Dense(1, activation='sigmoid')(out)
model = Model(inputs=inputs, outputs=out)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
dataset = image_dataset_from_directory(
    'data-synthetic',
    labels="inferred",
    label_mode='binary',
    color_mode="rgb",
    image_size=(224, 224),
    shuffle=True,
    seed=1,
    crop_to_aspect_ratio=False,
)
train_dataset, val_dataset = tf.keras.utils.split_dataset(dataset, right_size=0.1)
model.fit(
    train_dataset,
    steps_per_epoch=1,
    epochs=20,
    validation_data=val_dataset,
    max_queue_size=0,
    use_multiprocessing=False,
    workers=0,
)
true_class = tf.concat([y for x, y in val_dataset], axis=0)
pred = model.predict(val_dataset)
pred_class = pred >= .5
print('eval acc:', accuracy_score(true_class, pred_class))
which correctly yields:
Epoch 19/20
1/1 [==============================] - 0s 438ms/step - loss: 0.4426 - accuracy: 0.9062 - val_loss: 0.4658 - val_accuracy: 0.8800
Epoch 20/20
1/1 [==============================] - 0s 444ms/step - loss: 2.1619 - accuracy: 0.8438 - val_loss: 0.5886 - val_accuracy: 0.8900
4/4 [==============================] - 1s 87ms/step
eval acc: 0.89
There are a few points about your data that cause this:
First, your data is highly imbalanced (an 8-to-1 label ratio), which makes the model prone to overfitting and the validation estimate inaccurate.
Second, in the get_dataset function, shuffle is set to True, so every time you call get_dataset() it reshuffles your data; because (1) your validation set is very small and (2) your train/val split is not stratified over the labels, the validation metrics vary a lot due to this shuffling.
Suggestions to solve this:
Call get_dataset() only once each for the train and validation datasets before fitting the model and save them as variables; and if there is no sequential order in your data, maybe set shuffle=False.
(Optional) If possible, make your dataset more balanced with techniques such as data augmentation or over-/under-sampling (a class-weight sketch is shown after the code below).
def get_dataset(subset):
    return image_dataset_from_directory(
        'data-nodup-1000',
        labels="inferred",
        label_mode='binary',
        color_mode="rgb",
        image_size=(224, 224),
        shuffle=False,
        seed=0,
        validation_split=0.1,
        subset=subset,
        crop_to_aspect_ratio=False,
    )
train_dataset = get_dataset('training')
val_dataset = get_dataset('validation')
model.fit(
    train_dataset,
    steps_per_epoch=1,
    epochs=20,
    validation_data=val_dataset,
)
true_class = tf.concat([y for x, y in val_dataset], axis=0)
pred = model.predict(val_dataset)
pred_class = pred >= .5
print('eval acc:', accuracy_score(true_class, pred_class))
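For the second suggestion, one lightweight option is class weighting instead of resampling or augmentation. A minimal sketch (the weights are derived from the 113/887 split mentioned in the question):
# Up-weight the minority class (class 0 has 113 files, class 1 has 887).
class_weight = {0: 887 / 113, 1: 1.0}
model.fit(
    train_dataset,
    epochs=20,
    validation_data=val_dataset,
    class_weight=class_weight,
)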
I am using the Keras model training API and have observed differences when training the model with NumPy arrays (x_train and y_train) versus with tf.data.Dataset.from_tensor_slices((x_train, y_train)). A minimal working example is shown below:
import numpy as np
import tensorflow as tf
tf.keras.utils.set_random_seed(0)
n_examples, n_dims = (100, 10)
raw_dataset = np.random.randn(n_examples, n_dims)
model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Dense(
            1024, activation="relu", use_bias=True
        ),
        tf.keras.layers.Dense(
            1, activation="linear", use_bias=True
        ),
    ]
)
model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss="mse",
)
x_train = raw_dataset[:, :-1]
y_train = raw_dataset[:, -1]
dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
n_epochs = 10
batch_size = 16
use_dataset = True
if use_dataset:
    model.fit(
        dataset.batch(batch_size=batch_size),
        epochs=n_epochs,
    )
else:
    model.fit(
        x=x_train,
        y=y_train,
        batch_size=batch_size,
        epochs=n_epochs,
    )
print("Evaluation:")
model.evaluate(x_train, y_train)
model.evaluate(dataset.batch(batch_size=batch_size))
If I run this code with use_dataset = True, the final performance is:
Evaluation:
4/4 [==============================] - 0s 825us/step - loss: 0.4132
7/7 [==============================] - 0s 701us/step - loss: 0.4132
If I run it with use_dataset = False, I get:
Evaluation:
4/4 [==============================] - 0s 855us/step - loss: 0.4219
7/7 [==============================] - 0s 808us/step - loss: 0.4219
I expected the two training loops to perform identically. Interestingly, the model performance is identical if I set batch_size = n_examples. The difference seems to be related to the way batches are handled internally. Why is this happening? Is it a bug or a feature?
The behavior is due to the default parameter shuffle=True in model.fit() and is not a bug. According to the docs regarding shuffle:
Boolean (whether to shuffle the training data before each epoch) or str (for 'batch'). This argument is ignored when x is a generator or an object of tf.data.Dataset. 'batch' is a special option for dealing with the limitations of HDF5 data; it shuffles in batch-sized chunks. Has no effect when steps_per_epoch is not None.
So this parameter is ignored when a tf.data.Dataset is passed, and the data is not reshuffled after each epoch as in the other approach with arrays.
Here is the code to get the same results for both methods:
import numpy as np
import tensorflow as tf
tf.keras.utils.set_random_seed(0)
n_examples, n_dims = (100, 10)
raw_dataset = np.random.randn(n_examples, n_dims)
model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Dense(
            1024, activation="relu", use_bias=True
        ),
        tf.keras.layers.Dense(
            1, activation="linear", use_bias=True
        ),
    ]
)
model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss="mse",
)
x_train = raw_dataset[:, :-1]
y_train = raw_dataset[:, -1]
dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
n_epochs = 10
batch_size = 16
use_dataset = False
if use_dataset:
    model.fit(
        dataset.batch(batch_size=batch_size),
        epochs=n_epochs,
    )
else:
    model.fit(
        x=x_train,
        y=y_train,
        batch_size=batch_size,
        shuffle=False,
        epochs=n_epochs,
    )
print("Evaluation:")
model.evaluate(x_train, y_train)
model.evaluate(dataset.batch(batch_size=batch_size))
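Conversely, if you prefer to keep per-epoch shuffling, a minimal sketch (not part of the fix above) is to shuffle the Dataset itself, which matches the behaviour of fit() on NumPy arrays:
# reshuffle_each_iteration=True re-shuffles the data before each epoch.
shuffled_dataset = (
    tf.data.Dataset.from_tensor_slices((x_train, y_train))
    .shuffle(buffer_size=n_examples, reshuffle_each_iteration=True)
    .batch(batch_size)
)
model.fit(shuffled_dataset, epochs=n_epochs)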
I am trying to train ResNet50 on the MNIST dataset using the Keras library.
The shape of the MNIST images is (28, 28, 1); however, ResNet50 requires the input shape to be (32, 32, 3).
How can I convert the MNIST dataset to the required shape?
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(x_train.shape[0], x_train.shape[1], x_train.shape[2], 1)
x_test = x_test.reshape(x_test.shape[0], x_test.shape[1], x_test.shape[2], 1)
x_train = x_train/255.0
x_test = x_test/255.0
from keras.utils import to_categorical
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
model = models.Sequential()
# model.add(InputLayer(input_shape=(28, 28)))
# model.add(Reshape(target_shape=(32, 32, 3)))
# model.add(Conv2D())
model.add(conv_base)
model.add(Flatten())
model.add(BatchNormalization())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(BatchNormalization())
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(BatchNormalization())
model.add(Dense(10, activation='softmax'))
model.compile(optimizer=optimizers.RMSprop(lr=2e-5), loss='binary_crossentropy', metrics=['acc'])
history = model.fit(x_train, y_train, epochs=5, batch_size=20, validation_data=(x_test, y_test))
ValueError: Input 0 is incompatible with layer sequential_10: expected shape=(None, 32, 32, 3), found shape=(20, 28, 28, 1)
You need to resize the MNIST dataset. Note that the minimum input size depends on the ImageNet model: for example, Xception requires at least 72x72, whereas ResNet asks for 32x32. Apart from that, MNIST images are grayscale, which conflicts with the pretrained weights of these models, so the safe approach is to resize the images and convert grayscale to RGB.
Full working code is below.
Dataset
We will resize MNIST from 28x28 to 32x32 and use 3 channels instead of 1.
import tensorflow as tf
import numpy as np
(x_train, y_train), (_, _) = tf.keras.datasets.mnist.load_data()
# expand a new axis for the channel dimension
x_train = np.expand_dims(x_train, axis=-1)
# [optional]: we may need 3 channels (instead of 1)
x_train = np.repeat(x_train, 3, axis=-1)
# it's always better to normalize
x_train = x_train.astype('float32') / 255
# resize the input, i.e. old shape: 28, new shape: 32
x_train = tf.image.resize(x_train, [32, 32])  # if we want to resize
# one hot
y_train = tf.keras.utils.to_categorical(y_train , num_classes=10)
print(x_train.shape, y_train.shape)
(60000, 32, 32, 3) (60000, 10)
ResNet 50
input = tf.keras.Input(shape=(32, 32, 3))
efnet = tf.keras.applications.ResNet50(weights='imagenet',
                                       include_top=False,
                                       input_tensor=input)
# Now we apply global max pooling.
gap = tf.keras.layers.GlobalMaxPooling2D()(efnet.output)
# Finally, we add a classification layer.
output = tf.keras.layers.Dense(10, activation='softmax', use_bias=True)(gap)
# bind all
func_model = tf.keras.Model(efnet.input, output)
Train
func_model.compile(
    loss=tf.keras.losses.CategoricalCrossentropy(),
    metrics=tf.keras.metrics.CategoricalAccuracy(),
    optimizer=tf.keras.optimizers.Adam())
# fit
func_model.fit(x_train, y_train, batch_size=128, epochs=5, verbose=2)
Epoch 1/5
469/469 - 56s - loss: 0.1184 - categorical_accuracy: 0.9690
Epoch 2/5
469/469 - 21s - loss: 0.0648 - categorical_accuracy: 0.9844
Epoch 3/5
469/469 - 21s - loss: 0.0503 - categorical_accuracy: 0.9867
Epoch 4/5
469/469 - 21s - loss: 0.0416 - categorical_accuracy: 0.9888
Epoch 5/5
469/469 - 21s - loss: 0.1556 - categorical_accuracy: 0.9697
<tensorflow.python.keras.callbacks.History at 0x7f316005a3d0>
I am following an example from a data science textbook and have run into an issue where I am getting NaN values for the loss when running simple Keras neural networks to find the optimal learning rate.
# Get data and split into test/train/valid and normalize
(X_train_full, y_train_full), (X_test, y_test) = keras.datasets.mnist.load_data()
X_valid, X_train = X_train_full[:5000] / 255., X_train_full[5000:] / 255.
y_valid, y_train = y_train_full[:5000], y_train_full[5000:]
X_test = X_test / 255.
# Callback to grow the learning rate at each iteration.
# Also record learning rate and loss at each iteration.
K = keras.backend
class ExponentialLearningRate(keras.callbacks.Callback):
    def __init__(self, factor):
        self.factor = factor
        self.rates = []
        self.losses = []
    def on_batch_end(self, batch, logs):
        self.rates.append(K.get_value(self.model.optimizer.lr))
        self.losses.append(logs["loss"])
        K.set_value(self.model.optimizer.lr, self.model.optimizer.lr * self.factor)
# Define the model and compile/fit.
keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(42)
model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),
    keras.layers.Dense(300, activation="relu"),
    keras.layers.Dense(100, activation="relu"),
    keras.layers.Dense(10, activation="softmax")
])
model.compile(loss="sparse_categorical_crossentropy",
              optimizer=keras.optimizers.SGD(lr=1e-3),
              metrics=["accuracy"])
expon_lr = ExponentialLearningRate(factor=1.005)
history = model.fit(X_train, y_train, epochs=1,
                    validation_data=(X_valid, y_valid),
                    callbacks=[expon_lr])
Running this gives an output of:
1719/1719 [==============================] - 6s 4ms/step - loss: nan - accuracy: 0.6030 - val_loss: nan - val_accuracy: 0.0958
Plotting the loss vs learning rate gives (top is my result, bottom is the expected result from the example I am following):
Notably, the example loss is much noisier than mine and ranges from ~2.5 to ~0.25. My loss only ranges from ~2.5 to exactly 1, at which point the loss goes NaN.
Perhaps something in Keras/TF has been updated since this example was written, but as I am new to Keras, I am wondering what the issue might be here.
Your problem is the ExponentialLearningRate callback: your learning rate goes from 0.0010150751 to 5.237502, which is why your loss is exploding. Change the optimizer like this:
optimizer=tf.keras.optimizers.Adam(0.001)
and remove the callback; your loss will be fine then.
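For reference, a minimal sketch of the change (keeping the rest of the question's model definition unchanged):
# Fixed Adam learning rate, no exponentially growing learning-rate callback.
model.compile(loss="sparse_categorical_crossentropy",
              optimizer=tf.keras.optimizers.Adam(0.001),
              metrics=["accuracy"])
history = model.fit(X_train, y_train, epochs=1,
                    validation_data=(X_valid, y_valid))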
I'm testing simple networks in Keras with the TensorFlow backend, and I ran into an issue with the sigmoid activation function.
The network doesn't learn for the first 5-10 epochs, and then everything is fine.
I tried using initializers and regularizers, but that only made it worse.
I use the network like this:
import numpy as np
import keras
from numpy import expand_dims
from keras.preprocessing.image import ImageDataGenerator
from matplotlib import pyplot
# load the image
(x_train, y_train), (x_val, y_val), (x_test, y_test) = netowork2_ker.load_data_shared()
# expand dimension to one sample
x_train = expand_dims(x_train, 2)
x_train = np.reshape(x_train, (50000, 28, 28))
x_train = expand_dims(x_train, 3)
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)
datagen = ImageDataGenerator(
    rescale=1./255,
    width_shift_range=[-1, 0, 1],
    height_shift_range=[-1, 0, 1],
    rotation_range=10)
epochs = 20
batch_size = 50
num_classes = 10
model = keras.Sequential()
model.add(keras.layers.Conv2D(64, (3, 3), padding='same',
                              input_shape=x_train.shape[1:],
                              activation='sigmoid'))
model.add(keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(keras.layers.Conv2D(100, (3, 3),
                              activation='sigmoid'))
model.add(keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(100,
                             activation='sigmoid'))
#model.add(keras.layers.Dropout(0.5))
model.add(keras.layers.Dense(num_classes,
                             activation='softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
model.fit_generator(datagen.flow(x_train, y_train, batch_size=batch_size),
                    steps_per_epoch=len(x_train) / batch_size, epochs=epochs,
                    verbose=2, shuffle=True)
With the code above I get results like these:
Epoch 1/20
- 55s - loss: 2.3098 - accuracy: 0.1036
Epoch 2/20
- 56s - loss: 2.3064 - accuracy: 0.1038
Epoch 3/20
- 56s - loss: 2.3068 - accuracy: 0.1025
Epoch 4/20
- 56s - loss: 2.3060 - accuracy: 0.1079
...
This goes on for about 7 epochs (the number is different every time), and then the loss rapidly drops and I reach 0.9623 accuracy within 20 epochs.
But if I change the activation from sigmoid to relu, it works great and gives me 0.5356 accuracy in the first epoch.
This issue makes sigmoid almost unusable for me, and I'd like to know whether I can do something about it. Is this a bug, or am I doing something wrong?
Activation function suggestion:
In practice, the sigmoid non-linearity has recently fallen out of favor and is rarely used. ReLU is the most common choice; if there is a large fraction of "dead" units in the network, try Leaky ReLU or tanh. Never use sigmoid.
Reasons for not using sigmoid:
A very undesirable property of the sigmoid neuron is that when the neuron's activation saturates at either tail of 0 or 1, the gradient in these regions is almost zero. In addition, sigmoid outputs are not zero-centered.
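A minimal sketch (not from the quoted text) that illustrates the saturation numerically, assuming eager-mode TensorFlow:
import tensorflow as tf

# The gradient of sigmoid(x) peaks at 0.25 (at x = 0) and is nearly zero
# once the activation saturates at either tail, so weight updates in
# saturated layers are tiny; relu keeps a gradient of 1 for x > 0.
x = tf.constant([-10.0, 0.0, 10.0])
with tf.GradientTape() as tape:
    tape.watch(x)
    y = tf.sigmoid(x)
print(tape.gradient(y, x).numpy())  # approx. [4.5e-05, 0.25, 4.5e-05]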