I am new to deep learning and have been trying to convert the Keras sequential API to the functional API running on the CIFAR10 image dataset but have been having some difficulty. I've converted the model which looks the same except for the input layer yet the sequential has an average accuracy of around ~70% and my functional has an average accuracy of around ~10%. I would really appreciate some help with regards to figuring out what is going wrong. Here is my functional code:
import tensorflow as tf
from tensorflow import keras
from keras import datasets, layers, models
from keras.models import Model, Input, Sequential
import matplotlib.pyplot as plt
Download and prepare:
(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()
# Normalize pixel values to be between 0 and 1
train_images, test_images = train_images / 255.0, test_images / 255.0
input_shape = train_images[0,:,:,:].shape
Create model:
input = layers.Input(shape=input_shape)
x = layers.Conv2D(32, (3, 3), activation='relu',padding='valid')(input)
x = layers.MaxPooling2D((2,2))(x)
x = layers.Conv2D(64, (3, 3), activation='relu')(x)
x = layers.MaxPooling2D((2,2))(x)
x = layers.Conv2D(64, (3, 3), activation='relu')(x)
x = layers.Flatten()(x)
x = layers.Dense(64, activation='relu')(x)
x = layers.Dense(10)(x)
model = Model(input, x, name='Functional')
Compile and train:
model.compile(optimizer='adam',
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=['accuracy'])
history = model.fit(train_images, train_labels, epochs=10,
validation_data=(test_images, test_labels))
Here is a link to the original sequential CNN which is a google collaboratory notebook. I would really appreciate any help in trying to understand and fix what is going wrong. Thank you in advance.
There seems to be some issues with SparseCategoricalCrossentropy loss.
Check this: https://github.com/tensorflow/tensorflow/issues/38632
The following model gives good accuracy:
import tensorflow as tf
from tensorflow import keras
from keras import datasets, layers, models
from keras.models import Model, Input, Sequential
import matplotlib.pyplot as plt
(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()
# Normalize pixel values to be between 0 and 1
train_images, test_images = train_images / 255.0, test_images / 255.0
train_labels, test_labels = tf.keras.utils.to_categorical(train_labels, 10) , tf.keras.utils.to_categorical(test_labels, 10)
input_shape = train_images[0,:,:,:].shape
input = layers.Input(shape=input_shape)
x = layers.Conv2D(32, (3, 3), activation='relu',padding='valid')(input)
x = layers.MaxPooling2D((2,2))(x)
x = layers.Conv2D(64, (3, 3), activation='relu')(x)
x = layers.MaxPooling2D((2,2))(x)
x = layers.Conv2D(64, (3, 3), activation='relu')(x)
x = layers.Flatten()(x)
x = layers.Dense(64, activation='relu')(x)
x = layers.Dense(10, activation='softmax')(x)
model = Model(input, x, name='Functional')
model.summary()
model.compile(optimizer='adam',
loss=loss=tf.keras.losses.CategoricalCrossentropy(),
metrics=['accuracy'])
history = model.fit(train_images, train_labels, epochs=10,
validation_data=(test_images, test_labels))
conv2d_16 (Conv2D) (None, 30, 30, 32) 896
_________________________________________________________________
max_pooling2d_11 (MaxPooling (None, 15, 15, 32) 0
_________________________________________________________________
conv2d_17 (Conv2D) (None, 13, 13, 64) 18496
_________________________________________________________________
max_pooling2d_12 (MaxPooling (None, 6, 6, 64) 0
_________________________________________________________________
conv2d_18 (Conv2D) (None, 4, 4, 64) 36928
_________________________________________________________________
flatten_6 (Flatten) (None, 1024) 0
_________________________________________________________________
dense_11 (Dense) (None, 64) 65600
_________________________________________________________________
dense_12 (Dense) (None, 10) 650
=================================================================
Total params: 122,570
Trainable params: 122,570
Non-trainable params: 0
_________________________________________________________________
Train on 50000 samples, validate on 10000 samples
Epoch 1/10
50000/50000 [==============================] - 15s 305us/step - loss: 1.4870 - accuracy: 0.4600 - val_loss: 1.2874 - val_accuracy: 0.5488
Epoch 2/10
50000/50000 [==============================] - 15s 301us/step - loss: 1.1365 - accuracy: 0.5989 - val_loss: 1.0789 - val_accuracy: 0.6191
Epoch 3/10
50000/50000 [==============================] - 15s 301us/step - loss: 0.9869 - accuracy: 0.6547 - val_loss: 0.9506 - val_accuracy: 0.6700
Epoch 4/10
50000/50000 [==============================] - 15s 301us/step - loss: 0.8896 - accuracy: 0.6907 - val_loss: 0.9509 - val_accuracy: 0.6695
Epoch 5/10
50000/50000 [==============================] - 16s 311us/step - loss: 0.8135 - accuracy: 0.7151 - val_loss: 0.8688 - val_accuracy: 0.7046
Epoch 6/10
50000/50000 [==============================] - 15s 303us/step - loss: 0.7566 - accuracy: 0.7351 - val_loss: 0.8411 - val_accuracy: 0.7141
Related
I saved the history of training by
history = model.fit(train_generator, epochs=epochs, steps_per_epoch=train_steps,
verbose=1, callbacks=callbacks, validation_data=val_generator,
validation_steps=val_steps,batch_size=16)
with open('history_epochs.pkl', 'wb') as f:
dump(history.history, f)
Can I use the file of history to continue from the last epoch? and how please
Below applies to any deep learning library …
Build model
Train model.
Save model (should be saving parameters/weights as well).
Load model from the saved file (any time any where).
Continue with more training.
You can use the pickle file to save and load your model and continue training:
Create your model
Train your model
Save your model as a pickle file
Code for the above steps:
import matplotlib.pyplot as plt
import tensorflow as tf
import numpy as np
import joblib
(X_train, y_train), (X_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat','Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
fig, axes = plt.subplots(2,5,figsize=(15,6))
for idx, axe in enumerate(axes.flatten()):
axe.axis('off')
idx_img = np.argwhere(y_train==idx)[0][0]
axe.imshow(X_train[idx_img], cmap=plt.cm.binary)
axe.set_title(class_names[y_train[idx_img]])
X_train = X_train.astype('float32') / 255.0
X_train = tf.expand_dims(X_train, axis=-1)
X_test = X_test.astype('float32') / 255.0
X_test = tf.expand_dims(X_test, axis=-1)
y_train = tf.keras.utils.to_categorical(y_train, 10)
y_test = tf.keras.utils.to_categorical(y_test, 10)
model = tf.keras.Sequential()
model.add(tf.keras.Input(shape=(X_train.shape[1], X_train.shape[1], 1)))
model.add(tf.keras.layers.Conv2D(128, (3,3), activation='relu'))
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Dropout(rate=.4))
model.add(tf.keras.layers.Conv2D(64, (3,3), activation='relu'))
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Dropout(rate=.4))
model.add(tf.keras.layers.Conv2D(128, (3,3), activation='relu'))
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Dropout(rate=.4))
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(512, activation='relu'))
model.add(tf.keras.layers.Dropout(rate=.4))
model.add(tf.keras.layers.Dense(128, activation='relu'))
model.add(tf.keras.layers.Dropout(rate=.4))
model.add(tf.keras.layers.Dense(10, activation='sigmoid'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
model.fit(X_train, y_train, batch_size=256, epochs=3, verbose=1, validation_split=.2)
model.evaluate(X_test, y_test, verbose=1)
joblib.dump(model, 'model.pkl')
Output:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 26, 26, 128) 1280
batch_normalization (BatchN (None, 26, 26, 128) 512
ormalization)
dropout (Dropout) (None, 26, 26, 128) 0
conv2d_1 (Conv2D) (None, 24, 24, 64) 73792
batch_normalization_1 (Batc (None, 24, 24, 64) 256
hNormalization)
dropout_1 (Dropout) (None, 24, 24, 64) 0
conv2d_2 (Conv2D) (None, 22, 22, 128) 73856
batch_normalization_2 (Batc (None, 22, 22, 128) 512
hNormalization)
dropout_2 (Dropout) (None, 22, 22, 128) 0
flatten (Flatten) (None, 61952) 0
dense (Dense) (None, 512) 31719936
dropout_3 (Dropout) (None, 512) 0
dense_1 (Dense) (None, 128) 65664
dropout_4 (Dropout) (None, 128) 0
dense_2 (Dense) (None, 10) 1290
=================================================================
Total params: 31,937,098
Trainable params: 31,936,458
Non-trainable params: 640
_________________________________________________________________
Epoch 1/3
188/188 [==============================] - 19s 81ms/step - loss: 0.8264 - accuracy: 0.7398 - val_loss: 3.4644 - val_accuracy: 0.1245
Epoch 2/3
188/188 [==============================] - 14s 75ms/step - loss: 0.4896 - accuracy: 0.8283 - val_loss: 1.2240 - val_accuracy: 0.5802
Epoch 3/3
188/188 [==============================] - 14s 77ms/step - loss: 0.4055 - accuracy: 0.8544 - val_loss: 0.3711 - val_accuracy: 0.8675
313/313 [==============================] - 2s 5ms/step - loss: 0.3850 - accuracy: 0.8591
[0.3849639296531677, 0.8590999841690063]
INFO:tensorflow:Assets written to: ram://****/assets
['model.pkl']
Load your model
Continue Training
Code for the above steps:
model = joblib.load("/content/model.pkl")
model.fit(X_train, y_train, batch_size=256, epochs=2, verbose=1, validation_split=.2)
model.evaluate(X_test, y_test, verbose=1)
Output:
Epoch 1/2
188/188 [==============================] - 17s 84ms/step - loss: 0.4414 - accuracy: 0.8496 - val_loss: 0.3449 - val_accuracy: 0.8697
Epoch 2/2
188/188 [==============================] - 15s 82ms/step - loss: 0.3704 - accuracy: 0.8708 - val_loss: 0.2884 - val_accuracy: 0.8965
313/313 [==============================] - 1s 5ms/step - loss: 0.3114 - accuracy: 0.8938
[0.31136029958724976, 0.8938000202178955]
I'm trying to make this model work. Initially x.shape is (6703, 56) and y.shape is a binary column having shape (6703, ). Then I run
y = y.to_numpy()
y = y.astype("float32")
y = tf.keras.utils.to_categorical(y, 2)
and now y.shape is (6703, 2). I run
X_train, X_test, Y_train, Y_test = train_test_split(x, y, test_size=0.2, random_state=42)
and now
X_train shape is (5362, 56)
Y_train shape is (5362, 2)
X_test shape is (1341, 56)
Y_test shape is (1341, 2)
Then I build the model:
model = tf.keras.models.Sequential(name="3layers")
model.add(keras.layers.Dense(N_HIDDEN,
input_shape=(len(X_train[0]),),
name="dense1",
activation="relu"))
model.add(keras.layers.Dropout(DROPOUT))
model.add(keras.layers.Dense(N_HIDDEN,
name="dense2",
activation="relu"))
model.add(keras.layers.Dropout(DROPOUT))
model.add(keras.layers.Dense(NB_CLASSES,
name="dense3",
activation="softmax"))
model.summary()
model.compile(optimizer="SGD", #SGD adam
loss="categorical_crossentropy",
metrics=["accuracy"])
model.fit(X_train, Y_train,
batch_size=BATCH_SIZE,
epochs=EPOCHS,
verbose=VERBOSE,
validation_split=VALIDATION_SPLIT)
test_loss, test_acc = model.evaluate(X_test, Y_test)
The summary is what I expect:
dense1 (Dense) (None, 64) 3648
dropout_18 (Dropout) (None, 64) 0
dense2 (Dense) (None, 64) 4160
dropout_19 (Dropout) (None, 64) 0
dense3 (Dense) (None, 2) 130
but the output is
Epoch 1/5
> 429/429 [==============================] - 1s 1ms/step - loss: nan - accuracy: 0.5141 - val_loss: nan - val_accuracy: 0.4884
Epoch 2/5
> 429/429 [==============================] - 0s 1ms/step - loss: nan - accuracy: 0.5143 - val_loss: nan - val_accuracy: 0.4884
Epoch 3/5
> 429/429 [==============================] - 0s 987us/step - loss: nan - accuracy: 0.5143 - val_loss: nan - val_accuracy: 0.4884
I've tried changing many parameters, I'm stuck.
I found what it was. There were some "None" values in the x matrix that caused the problem. Removing them it started evaluating a numeric loss. Very poor accuracy, but this will be another problem to solve.
When trying to get confusion matrix for a ConvNet constantly getting the same error.
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Activation, Dropout, Flatten, Dense
from keras import backend as K
import numpy as np
from keras.preprocessing import image
from sklearn.metrics import classification_report, confusion_matrix
img_width, img_height = 150, 150
train_data_dir = "train"
validation_data_dir = "test"
nb_train_samples = 2000
nb_validation_samples = 400
epochs = 50
batch_size = 40 #16
if K.image_data_format() == 'channels_first':
input_shape = (3, img_width, img_height)
else:
input_shape = (img_width, img_height, 3)
train_datagen = ImageDataGenerator(
rescale = 1. / 255,
shear_range = 0.2,
zoom_range= 0.2,
horizontal_flip= True)
test_datagen = ImageDataGenerator(rescale=1. / 255)
train_generator = test_datagen.flow_from_directory(
train_data_dir,
target_size= (img_width, img_height),
batch_size= batch_size,
class_mode= 'binary')
validation_generator = test_datagen.flow_from_directory(
validation_data_dir,
target_size= (img_width, img_height),
batch_size= batch_size,
class_mode= 'binary')`
Applying CNN Layers ...
model.compile(loss= 'binary_crossentropy',
optimizer= 'rmsprop',
metrics= ['accuracy'] )
`model.fit_generator(
train_generator,
steps_per_epoch= nb_train_samples // batch_size,
epochs= epochs,
validation_data= validation_generator,
validation_steps= nb_validation_samples // batch_size)
Y_pred = model.predict_generator(validation_generator, nb_validation_samples // batch_size+1)
y_pred = np.argmax(Y_pred, axis=1)
print('Confusion Matrix')
print(confusion_matrix(validation_generator.classes, y_pred))`
Getting error mentioned below but don't know how to resolve it
ValueError: Found input variables with inconsistent numbers of samples: [400, 440]
I am able to recreate your error using Dogs_Vs_Cats dataset. Where i have 2000 samples in train directory and 400 samples in validation directory.
Please change model.predict_generator from
Y_pred = model.predict_generator(validation_generator, nb_validation_samples // batch_size+1)
to
Y_pred = model.predict_generator(validation_generator, nb_validation_samples // batch_size)
will resolve this ValueError: Found input variables with inconsistent numbers of samples: [400, 440]
Please refer complete code as below
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Activation, Dropout, Flatten, Dense
from keras import backend as K
import numpy as np
from keras.preprocessing import image
from sklearn.metrics import classification_report, confusion_matrix
from google.colab import drive
drive.mount('/content/drive')
train_data_dir = '/content/drive/My Drive/Dogs_Vs_Cats/train'
validation_data_dir = '/content/drive/My Drive/Dogs_Vs_Cats/validation'
img_width, img_height = 150, 150
nb_train_samples = 2000
nb_validation_samples = 400
epochs = 10
batch_size = 40 #16
if K.image_data_format() == 'channels_first':
input_shape = (3, img_width, img_height)
else:
input_shape = (img_width, img_height, 3)
train_datagen = ImageDataGenerator(
rescale = 1. / 255,
shear_range = 0.2,
zoom_range= 0.2,
horizontal_flip= True)
test_datagen = ImageDataGenerator(rescale=1. / 255)
train_generator = test_datagen.flow_from_directory(
train_data_dir,
target_size= (img_width, img_height),
batch_size= batch_size,
class_mode= 'binary')
validation_generator = test_datagen.flow_from_directory(
validation_data_dir,
target_size= (img_width, img_height),
batch_size= batch_size,
class_mode= 'binary')
model = Sequential()
model.add(Conv2D(32, (3, 3), strides = (1, 1), input_shape = input_shape))
model.add(Activation('relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), strides = (1, 1)))
model.add(Activation('relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
model.summary()
model.compile(loss = 'binary_crossentropy',
optimizer = 'rmsprop',
metrics = ['accuracy'])
model.fit_generator(
train_generator,
steps_per_epoch= nb_train_samples // batch_size,
epochs= epochs,
validation_data= validation_generator,
validation_steps= nb_validation_samples // batch_size)
Y_pred = model.predict_generator(validation_generator, nb_validation_samples // batch_size)
y_pred = np.argmax(Y_pred, axis=1)
print('Confusion Matrix')
print(confusion_matrix(validation_generator.classes, y_pred))
Output:
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
Found 2000 images belonging to 2 classes.
Found 400 images belonging to 2 classes.
Model: "sequential_5"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_9 (Conv2D) (None, 148, 148, 32) 896
_________________________________________________________________
activation_9 (Activation) (None, 148, 148, 32) 0
_________________________________________________________________
max_pooling2d_9 (MaxPooling2 (None, 74, 74, 32) 0
_________________________________________________________________
conv2d_10 (Conv2D) (None, 72, 72, 64) 18496
_________________________________________________________________
activation_10 (Activation) (None, 72, 72, 64) 0
_________________________________________________________________
max_pooling2d_10 (MaxPooling (None, 36, 36, 64) 0
_________________________________________________________________
flatten_5 (Flatten) (None, 82944) 0
_________________________________________________________________
dense_9 (Dense) (None, 64) 5308480
_________________________________________________________________
dropout_5 (Dropout) (None, 64) 0
_________________________________________________________________
dense_10 (Dense) (None, 1) 65
=================================================================
Total params: 5,327,937
Trainable params: 5,327,937
Non-trainable params: 0
_________________________________________________________________
Epoch 1/10
50/50 [==============================] - 12s 233ms/step - loss: 0.9345 - accuracy: 0.5375 - val_loss: 0.6303 - val_accuracy: 0.5225
Epoch 2/10
50/50 [==============================] - 11s 226ms/step - loss: 0.6745 - accuracy: 0.5965 - val_loss: 0.6094 - val_accuracy: 0.6725
Epoch 3/10
50/50 [==============================] - 11s 223ms/step - loss: 0.6196 - accuracy: 0.6605 - val_loss: 0.5694 - val_accuracy: 0.7150
Epoch 4/10
50/50 [==============================] - 11s 223ms/step - loss: 0.5501 - accuracy: 0.7285 - val_loss: 0.6216 - val_accuracy: 0.7225
Epoch 5/10
50/50 [==============================] - 11s 221ms/step - loss: 0.4794 - accuracy: 0.7790 - val_loss: 0.6268 - val_accuracy: 0.6025
Epoch 6/10
50/50 [==============================] - 11s 226ms/step - loss: 0.4038 - accuracy: 0.8195 - val_loss: 0.4842 - val_accuracy: 0.6975
Epoch 7/10
50/50 [==============================] - 11s 222ms/step - loss: 0.3207 - accuracy: 0.8595 - val_loss: 0.5600 - val_accuracy: 0.7325
Epoch 8/10
50/50 [==============================] - 13s 257ms/step - loss: 0.2574 - accuracy: 0.8920 - val_loss: 0.9705 - val_accuracy: 0.7525
Epoch 9/10
50/50 [==============================] - 13s 252ms/step - loss: 0.2049 - accuracy: 0.9235 - val_loss: 0.7311 - val_accuracy: 0.7475
Epoch 10/10
50/50 [==============================] - 13s 251ms/step - loss: 0.1448 - accuracy: 0.9515 - val_loss: 1.0541 - val_accuracy: 0.7150
Confusion Matrix
[[200 0]
[200 0]]
Hope this answers your question. If not please share complete traceback and code for debug, i am happy to help you.
I've been working on a data set (1000,3253) using a CNN. I'm running gradient calculations through gradient tape but it keeps running out of memory. Yet if I remove the line appending a gradient calculation to a list the script runs through all the epochs. I'm not entirely sure why this would happen but I am also new to tensorflow and the use of gradient tape. Any advice or input would be appreciated
#create a batch loop
for x, y_true in train_dataset:
#create a tape to record actions
with tf.GradientTape(watch_accessed_variables=False) as tape:
x_var = tf.Variable(x)
tape.watch([model.trainable_variables,x_var])
y_pred = model(x_var,training=True)
tape.stop_recording()
loss = los_func(y_true, y_pred)
epoch_loss_avg.update_state(loss)
epoch_accuracy.update_state(y_true, y_pred)
#pdb.set_trace()
gradients,something = tape.gradient(loss, (model.trainable_variables,x_var))
#sa_input.append(tape.gradient(loss, x_var))
del tape
#apply gradients
sa_input.append(something)
opti_func.apply_gradients(zip(gradients, model.trainable_variables))
train_loss_results.append(epoch_loss_avg.result())
train_accuracy_results.append(epoch_accuracy.result())
As you are new to TF2, would recommend to go through this guide. This guide covers training, evaluation, and prediction (inference) models in TensorFlow 2.0 in two broad situations:
When using built-in APIs for training & validation (such as model.fit(), model.evaluate(), model.predict()). This is covered in the section "Using built-in training & evaluation loops".
When writing custom loops from scratch using eager execution and the GradientTape object. This is covered in the section "Writing your own training & evaluation loops from scratch".
Below is a program where I am computing the gradients after every epoch and appending to a list. At end of the program I am converting the list to array for simplicity.
Code - This program throws OOM Error error if I use a deep network of many layers and bigger filter size
# Importing dependency
%tensorflow_version 2.x
from tensorflow import keras
from tensorflow.keras import backend as K
from tensorflow.keras import datasets
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, Dropout, Flatten, Conv2D, MaxPooling2D
from tensorflow.keras.layers import BatchNormalization
import numpy as np
import tensorflow as tf
# Import Data
(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()
# Build Model
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(32,32, 3)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dense(10))
# Model Summary
model.summary()
# Model Compile
model.compile(optimizer='adam',
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=['accuracy'])
# Define the Gradient Fucntion
epoch_gradient = []
loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)
# Define the Gradient Function
#tf.function
def get_gradient_func(model):
with tf.GradientTape() as tape:
logits = model(train_images, training=True)
loss = loss_fn(train_labels, logits)
grad = tape.gradient(loss, model.trainable_weights)
model.optimizer.apply_gradients(zip(grad, model.trainable_variables))
return grad
# Define the Required Callback Function
class GradientCalcCallback(tf.keras.callbacks.Callback):
def on_epoch_end(self, epoch, logs={}):
grad = get_gradient_func(model)
epoch_gradient.append(grad)
epoch = 4
print(train_images.shape, train_labels.shape)
model.fit(train_images, train_labels, epochs=epoch, validation_data=(test_images, test_labels), callbacks=[GradientCalcCallback()])
# (7) Convert to a 2 dimensiaonal array of (epoch, gradients) type
gradient = np.asarray(epoch_gradient)
print("Total number of epochs run:", epoch)
Output -
Model: "sequential_5"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_12 (Conv2D) (None, 30, 30, 32) 896
_________________________________________________________________
max_pooling2d_8 (MaxPooling2 (None, 15, 15, 32) 0
_________________________________________________________________
conv2d_13 (Conv2D) (None, 13, 13, 64) 18496
_________________________________________________________________
max_pooling2d_9 (MaxPooling2 (None, 6, 6, 64) 0
_________________________________________________________________
conv2d_14 (Conv2D) (None, 4, 4, 64) 36928
_________________________________________________________________
flatten_4 (Flatten) (None, 1024) 0
_________________________________________________________________
dense_11 (Dense) (None, 64) 65600
_________________________________________________________________
dense_12 (Dense) (None, 10) 650
=================================================================
Total params: 122,570
Trainable params: 122,570
Non-trainable params: 0
_________________________________________________________________
(50000, 32, 32, 3) (50000, 1)
Epoch 1/4
1563/1563 [==============================] - 109s 70ms/step - loss: 1.7026 - accuracy: 0.4081 - val_loss: 1.4490 - val_accuracy: 0.4861
Epoch 2/4
1563/1563 [==============================] - 145s 93ms/step - loss: 1.2657 - accuracy: 0.5506 - val_loss: 1.2076 - val_accuracy: 0.5752
Epoch 3/4
1563/1563 [==============================] - 151s 96ms/step - loss: 1.1103 - accuracy: 0.6097 - val_loss: 1.1122 - val_accuracy: 0.6127
Epoch 4/4
1563/1563 [==============================] - 152s 97ms/step - loss: 1.0075 - accuracy: 0.6475 - val_loss: 1.0508 - val_accuracy: 0.6371
Total number of epochs run: 4
Hope this answers your question. Happy Learning.
My aim is to use SVD to PCA whiten the latent layer before passing it to the decoder module of an autoencoder. I have used tf.linalg.svd but it does not work since it does not contain necessary Keras parameters. So as a workaround I was trying to wrap it inside Lambda but got this error
AttributeError: 'tuple' object has no attribute 'shape'.
I tried SO (E.g. Using SVD in a custom layer in Keras/tensorflow) and did Google search for SVD in Keras but could not find any answers. I have attached a stripped but functional code here:
import numpy as np
import tensorflow as tf
from sklearn import preprocessing
from keras.layers import Lambda, Input, Dense, Multiply, Subtract
from keras.models import Model
from keras import backend as K
from keras.losses import mse
from keras import optimizers
from keras.callbacks import EarlyStopping
x = np.random.randn(100, 5)
train_data = preprocessing.scale(x)
input_shape = (5, )
original_dim = train_data.shape[1]
intermediate_dim_1 = 64
intermediate_dim_2 = 16
latent_dim = 2
batch_size = 10
epochs = 15
# build encoder model
inputs = Input(shape=input_shape, name='encoder_input')
layer_1 = Dense(intermediate_dim_1, activation='tanh') (inputs)
layer_2 = Dense(intermediate_dim_2, activation='tanh') (layer_1)
encoded_layer = Dense(latent_dim, name='latent_layer') (layer_2)
encoder = Model(inputs, encoded_layer, name='encoder')
encoder.summary()
# build decoder model
latent_inputs = Input(shape=(latent_dim,))
layer_1 = Dense(intermediate_dim_1, activation='tanh') (latent_inputs)
layer_2 = Dense(intermediate_dim_2, activation='tanh') (layer_1)
outputs = Dense(original_dim,activation='sigmoid') (layer_2)
decoder = Model(latent_inputs, outputs, name='decoder')
decoder.summary()
# mean removal and pca whitening
meanX = Lambda(lambda x: tf.reduce_mean(x, axis=0, keepdims=True))(encoded_layer)
standardized = Subtract()([encoded_layer, meanX])
sigma2 = K.dot(K.transpose(standardized), standardized)
sigma2 = Lambda(lambda x: x / batch_size)(sigma2)
s, u ,v = tf.linalg.svd(sigma2,compute_uv=True)
# s ,u ,v = Lambda(lambda x: tf.linalg.svd(x,compute_uv=True))(sigma2)
epsilon = 1e-6
# sqrt of number close to 0 leads to problem hence replace it with epsilon
si = tf.where(tf.less(s, epsilon), tf.sqrt(1 / epsilon) * tf.ones_like(s),
tf.math.truediv(1.0, tf.sqrt(s)))
whitening_layer = u # tf.linalg.diag(si) # tf.transpose(v)
whitened_encoding = K.dot(standardized, whitening_layer)
# Connect models
z_decoded = decoder(standardized)
# z_decoded = decoder(whitened_encoding)
# Define losses
reconstruction_loss = mse(inputs,z_decoded)
# Instantiate autoencoder
ae = Model(inputs, z_decoded, name='autoencoder')
ae.add_loss(reconstruction_loss)
# callback = EarlyStopping(monitor='val_loss', patience=5)
adam = optimizers.adam(learning_rate=0.002)
ae.compile(optimizer=adam)
ae.summary()
ae.fit(train_data, epochs=epochs, batch_size=batch_size,
validation_split=0.2, shuffle=True)
To reproduce the error uncomment these lines and comment the one preceding it:
z_decoded = decoder(whitened_encoding)
s ,u ,v = Lambda(lambda x: tf.linalg.svd(x,compute_uv=True))(sigma2)
I would appreciate it if someone could tell me how to wrap the SVD inside Keras layers or an alternate implementation.
Please note that I have not included the reparameterization trick to calculate the loss to keep the code simple.
Thank you !
I solved the problem. To use SVD inside Keras, we need to use the Lambda layer. However, as Lambda returns a tensor with some additional attributes, it is best to do additional work inside the lambda function and return a tensor. Another problem with my code was the combination of encoder and decoder model which I fixed by combining the output of encoder to the input of decoder model. The working code is as follows:
import numpy as np
import tensorflow as tf
from sklearn import preprocessing
from keras.layers import Lambda, Input, Dense, Multiply, Subtract
from keras.models import Model
from keras import backend as K
from keras.losses import mse
from keras import optimizers
from keras.callbacks import EarlyStopping
def SVD(sigma2):
s ,u ,v = tf.linalg.svd(sigma2,compute_uv=True)
epsilon = 1e-6
# sqrt of number close to 0 leads to problem hence replace it with epsilon
si = tf.where(tf.less(s, epsilon),
tf.sqrt(1 / epsilon) * tf.ones_like(s),
tf.math.truediv(1.0, tf.sqrt(s)))
whitening_layer = u # tf.linalg.diag(si) # tf.transpose(v)
return whitening_layer
x = np.random.randn(100, 5)
train_data = preprocessing.scale(x)
input_shape = (5, )
original_dim = train_data.shape[1]
intermediate_dim_1 = 64
intermediate_dim_2 = 16
latent_dim = 2
batch_size = 10
epochs = 15
# build encoder model
inputs = Input(shape=input_shape, name='encoder_input')
layer_1 = Dense(intermediate_dim_1, activation='tanh') (inputs)
layer_2 = Dense(intermediate_dim_2, activation='tanh') (layer_1)
encoded_layer = Dense(latent_dim, name='latent_layer') (layer_2)
encoder = Model(inputs, encoded_layer, name='encoder')
encoder.summary()
# build decoder model
latent_inputs = Input(shape=(latent_dim,))
layer_1 = Dense(intermediate_dim_1, activation='tanh') (latent_inputs)
layer_2 = Dense(intermediate_dim_2, activation='tanh') (layer_1)
outputs = Dense(original_dim,activation='sigmoid') (layer_2)
decoder = Model(latent_inputs, outputs, name='decoder')
decoder.summary()
# mean removal and pca whitening
meanX = Lambda(lambda x: tf.reduce_mean(x, axis=0, keepdims=True))(encoded_layer)
standardized = Subtract()([encoded_layer, meanX])
sigma2 = K.dot(K.transpose(standardized), standardized)
sigma2 = Lambda(lambda x: x / batch_size)(sigma2)
# s, u ,v = tf.linalg.svd(sigma2,compute_uv=True)
whitening_layer = Lambda(SVD)(sigma2)
'''
s ,u ,v = Lambda(lambda x: tf.linalg.svd(x,compute_uv=True))(sigma2)
epsilon = 1e-6
# sqrt of number close to 0 leads to problem hence replace it with epsilon
si = tf.where(tf.less(s, epsilon),
tf.sqrt(1 / epsilon) * tf.ones_like(s),
tf.math.truediv(1.0, tf.sqrt(s)))
whitening_layer = u # tf.linalg.diag(si) # tf.transpose(v)
'''
print('whitening_layer shape=', np.shape(whitening_layer))
print('standardized shape=', np.shape(standardized))
whitened_encoding = K.dot(standardized, whitening_layer)
# Connect models
# z_decoded = decoder(standardized)
z_decoded = decoder(encoder(inputs))
# Define losses
reconstruction_loss = mse(inputs,z_decoded)
# Instantiate autoencoder
ae = Model(inputs, z_decoded, name='autoencoder')
ae.add_loss(reconstruction_loss)
# callback = EarlyStopping(monitor='val_loss', patience=5)
adam = optimizers.adam(learning_rate=0.002)
ae.compile(optimizer=adam)
ae.summary()
ae.fit(train_data, epochs=epochs, batch_size=batch_size,
validation_split=0.2, shuffle=True)
The output of running the code is as follows:
Model: "encoder"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
encoder_input (InputLayer) (None, 5) 0
_________________________________________________________________
dense_1 (Dense) (None, 64) 384
_________________________________________________________________
dense_2 (Dense) (None, 16) 1040
_________________________________________________________________
latent_layer (Dense) (None, 2) 34
=================================================================
Total params: 1,458
Trainable params: 1,458
Non-trainable params: 0
_________________________________________________________________
Model: "decoder"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 2) 0
_________________________________________________________________
dense_3 (Dense) (None, 64) 192
_________________________________________________________________
dense_4 (Dense) (None, 16) 1040
_________________________________________________________________
dense_5 (Dense) (None, 5) 85
=================================================================
Total params: 1,317
Trainable params: 1,317
Non-trainable params: 0
_________________________________________________________________
whitening_layer shape= (2, 2)
standardized shape= (None, 2)
/home/manish/anaconda3/envs/ica_gpu/lib/python3.7/site-packages/keras/engine/training_utils.py:819: UserWarning: Output decoder missing from loss dictionary. We assume this was done on purpose. The fit and evaluate APIs will not be expecting any data to be passed to decoder.
'be expecting any data to be passed to {0}.'.format(name))
Model: "autoencoder"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
encoder_input (InputLayer) (None, 5) 0
_________________________________________________________________
encoder (Model) (None, 2) 1458
_________________________________________________________________
decoder (Model) (None, 5) 1317
=================================================================
Total params: 2,775
Trainable params: 2,775
Non-trainable params: 0
_________________________________________________________________
Train on 80 samples, validate on 20 samples
Epoch 1/15
2020-05-16 16:01:55.443061: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
80/80 [==============================] - 0s 3ms/step - loss: 1.1739 - val_loss: 1.2238
Epoch 2/15
80/80 [==============================] - 0s 228us/step - loss: 1.0601 - val_loss: 1.0921
Epoch 3/15
80/80 [==============================] - 0s 261us/step - loss: 0.9772 - val_loss: 1.0291
Epoch 4/15
80/80 [==============================] - 0s 223us/step - loss: 0.9385 - val_loss: 0.9875
Epoch 5/15
80/80 [==============================] - 0s 262us/step - loss: 0.9105 - val_loss: 0.9560
Epoch 6/15
80/80 [==============================] - 0s 240us/step - loss: 0.8873 - val_loss: 0.9335
Epoch 7/15
80/80 [==============================] - 0s 217us/step - loss: 0.8731 - val_loss: 0.9156
Epoch 8/15
80/80 [==============================] - 0s 253us/step - loss: 0.8564 - val_loss: 0.9061
Epoch 9/15
80/80 [==============================] - 0s 273us/step - loss: 0.8445 - val_loss: 0.8993
Epoch 10/15
80/80 [==============================] - 0s 235us/step - loss: 0.8363 - val_loss: 0.8937
Epoch 11/15
80/80 [==============================] - 0s 283us/step - loss: 0.8299 - val_loss: 0.8874
Epoch 12/15
80/80 [==============================] - 0s 254us/step - loss: 0.8227 - val_loss: 0.8832
Epoch 13/15
80/80 [==============================] - 0s 227us/step - loss: 0.8177 - val_loss: 0.8789
Epoch 14/15
80/80 [==============================] - 0s 241us/step - loss: 0.8142 - val_loss: 0.8725
Epoch 15/15
80/80 [==============================] - 0s 212us/step - loss: 0.8089 - val_loss: 0.8679
I hope this helps.