I am training a CNN using Keras and TensorFlow. I would like to add Gaussian noise to my input data during training and reduce the amount of noise as training progresses. This is what I use right now:
from tensorflow.keras.layers import Input, GaussianNoise, BatchNormalization
inputs = Input(shape=x_train_n.shape[1:])
bn0 = BatchNormalization(axis=1, scale=True)(inputs)
g0 = GaussianNoise(0.5)(bn0)
GaussianNoise takes the standard deviation of the noise distribution as its argument, and I couldn't assign a dynamic value to it. How can I add noise with some initial standard deviation and then decrease that value depending on the epoch I am in?
You can write a custom callback that changes stddev at the beginning of each epoch.
from tensorflow.keras.layers import Input, Dense, GaussianNoise, BatchNormalization
from tensorflow.keras.models import Model
import tensorflow as tf
import numpy as np
import random
inputs = Input(shape=(100,))
bn0 = BatchNormalization(axis=1, scale=True)(inputs)
g0 = GaussianNoise(0.5)(bn0)
d0 = Dense(10)(g0)
model = Model(inputs, d0)
model.compile('adam', 'mse')
model.summary()
class MyCustomCallback(tf.keras.callbacks.Callback):
    def on_epoch_begin(self, epoch, logs=None):
        # layers[2] is the GaussianNoise layer (input, batch norm, noise, dense)
        self.model.layers[2].stddev = random.uniform(0, 1)
        print('updating sttdev in training')
X_train = np.zeros((10,100))
y_train = np.zeros((10,10))
noise_change = MyCustomCallback()
model.fit(X_train, y_train, epochs=5,
          callbacks = [noise_change])
Model: "model_5"
Layer (type) Output Shape Param #
input_6 (InputLayer) [(None, 100)] 0
batch_normalization_5 (Batch (None, 100) 400
gaussian_noise_5 (GaussianNo (None, 100) 0
dense_5 (Dense) (None, 10) 1010
Total params: 1,410
Trainable params: 1,210
Non-trainable params: 200
Epoch 1/5
updating sttdev in training
1/1 [==============================] - 0s 1ms/step - loss: 1.6031
Epoch 2/5
updating sttdev in training
1/1 [==============================] - 0s 742us/step - loss: 1.5966
Epoch 3/5
updating sttdev in training
1/1 [==============================] - 0s 1ms/step - loss: 1.8818
Epoch 4/5
updating sttdev in training
1/1 [==============================] - 0s 1ms/step - loss: 1.2032
Epoch 5/5
updating sttdev in training
1/1 [==============================] - 0s 2ms/step - loss: 1.8817
<tensorflow.python.keras.callbacks.History at 0x7fc67ce9e668>
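The callback above draws a random stddev every epoch. For the decreasing noise level the question asks about, the value could instead be computed from the epoch index. A minimal sketch, assuming an exponential decay; noise_stddev and its parameters initial, decay and floor are hypothetical names, not part of Keras:

```python
def noise_stddev(epoch, initial=0.5, decay=0.8, floor=0.0):
    """Return the noise standard deviation for a 0-based epoch index."""
    # Multiply by `decay` once per epoch, but never drop below `floor`.
    return max(floor, initial * decay ** epoch)

# Values for the first five epochs: each one is smaller than the previous.
schedule = [noise_stddev(e) for e in range(5)]
print([round(s, 4) for s in schedule])
```

Inside on_epoch_begin, the assignment would then read self.model.layers[2].stddev = noise_stddev(epoch) instead of using random.uniform.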
I am trying to use the TensorFlow Addons Maxout implementation (https://www.tensorflow.org/addons/api_docs/python/tfa/layers/Maxout) but I am struggling with it.
To illustrate my problem, suppose I have the following:
x_out=Dense(d, activation='relu')(x_in)
model = Model(inputs=x_in, outputs=x_out)
model.compile(optimizer='adam', loss='MeanAbsoluteError')
model.fit(X, Y, epochs=5, batch_size=32)
Then it works as expected, i.e. the loss keeps decreasing and I can retrieve the estimated weights:
[array([[-0.15133516, -0.14892222, -0.64674205],
[ 0.34437487, 0.7822309 , -0.08931279],
[-0.8330534 , -0.13827904, -0.23096593]], dtype=float32),
array([-0.03069788, -0.03311999, -0.02603031], dtype=float32)]
However, when I use a maxout activation instead, things do not work:
x_out = tfa.layers.Maxout(3)(x_in)
model = Model(inputs=x_in, outputs=x_out)
model.compile(optimizer='adam', loss='MeanAbsoluteError')
model.fit(X, Y, epochs=5, batch_size=32)
The loss stays constant for all epochs, and the list of model weights is empty:
Out[141]: []
Where is my mistake?
Maxout only works in combination with another layer, for example a Dense layer. The Maxout layer itself has no trainable weights, as you can see in the model summary; it only has the hyperparameter num_units:
import tensorflow as tf
import tensorflow_addons as tfa
x = tf.keras.layers.Dense(3)(x_in)
x_out = tfa.layers.Maxout(3)(x)
model = tf.keras.Model(inputs=x_in, outputs=x_out)
model.compile(optimizer='adam', loss='MeanAbsoluteError')
model.fit(X, Y, epochs=5, batch_size=32)
Epoch 1/5
7/7 [==============================] - 0s 2ms/step - loss: 1.0404
Epoch 2/5
7/7 [==============================] - 0s 3ms/step - loss: 1.0361
Epoch 3/5
7/7 [==============================] - 0s 2ms/step - loss: 1.0322
Epoch 4/5
7/7 [==============================] - 0s 2ms/step - loss: 1.0283
Epoch 5/5
7/7 [==============================] - 0s 3ms/step - loss: 1.0244
Model: "model_5"
Layer (type) Output Shape Param #
input_6 (InputLayer) [(None, 3)] 0
dense_5 (Dense) (None, 3) 12
maxout_4 (Maxout) (None, 3) 0
Total params: 12
Trainable params: 12
Non-trainable params: 0
You may also want to take a look at the Maxout paper:
The maxout model is simply a feed-forward architecture, such as a multilayer perceptron or deep convolutional neural network, that uses a new type of activation function: the maxout unit.
I am using TensorFlow version 2.8.0.
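To make the point about weights concrete, here is a minimal pure-Python sketch of the reduction a maxout unit performs. It assumes the features are split into num_units groups and the maximum of each group is kept; the helper maxout and the exact grouping order are illustrative assumptions, not the tfa implementation:

```python
def maxout(features, num_units):
    """Reduce a flat feature list to num_units values via group-wise maxima."""
    if len(features) % num_units:
        raise ValueError("number of features must be a multiple of num_units")
    k = len(features) // num_units  # number of candidates per output unit
    # Output j is the maximum over features j, j + num_units, j + 2*num_units, ...
    return [max(features[i * num_units + j] for i in range(k))
            for j in range(num_units)]

# Six features reduced to three units: each output is the max of two candidates.
print(maxout([1.0, -2.0, 0.5, 0.0, 3.0, -1.0], 3))  # [1.0, 3.0, 0.5]
# Three features and three units: every group has a single element.
print(maxout([1.0, -2.0, 0.5], 3))                  # [1.0, -2.0, 0.5]
```

Nothing in this function is learned, which matches the empty weight list in the question. Under the same assumption, applying it directly to a 3-feature input with num_units=3 passes the input through unchanged, so there is nothing for the optimizer to adjust until a Dense layer is placed in front of it.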
I have seen this issue reported in forums, on GitHub, and in several questions here over the past 5 years, with no definitive answer that has worked for me. In certain situations, a model loaded from a previous save yields very different evaluation results from the original model. I haven't seen any well-documented, investigative questions about this, so I am showing my full code below as a simple illustration of the issue.
This is an application of transfer learning from a pre-trained TensorFlow model. The model is first trained for 5 epochs on train_data, then fine-tuned (with more trainable params) for 5 more. Evaluating the model on test_data shows an accuracy of 0.5671. The model is then saved and loaded in .h5 format (I have also tried the TensorFlow SavedModel format and the result is the same). The resulting loaded_model yields an evaluation accuracy of 0.4535 on the same, unaltered test_data.
The result should be the same (0.5671). To investigate further, I saved the fine-tuned model's weights separately, constructed and compiled the same model architecture as new_model, and loaded the saved weights into new_model. Evaluating new_model yields the correct result, an accuracy of 0.5671.
So it must be the weights not saving properly, right? I pulled the weights from each of the three models (model, loaded_model, new_model) and compared their flattened values. They are all the same. I really have no idea what is going on here, but I assume it is not random initialization, because loaded_model performs nowhere near the fine-tuned model; I would expect the results to be much closer.
import tensorflow as tf
import pandas as pd
import numpy as np
import os
import pathlib
data_dir = pathlib.Path("101_food_classes_10_percent/train")
class_names = np.array(sorted([item.name for item in data_dir.glob('*')]))
train_dir = './101_food_classes_10_percent/train/'
test_dir = './101_food_classes_10_percent/test/'
from tensorflow.keras.preprocessing.image import ImageDataGenerator
train_data = datagen.flow_from_directory(directory = train_dir,
                                         target_size = (224,224),
                                         batch_size = 32)
test_data = datagen.flow_from_directory(directory = test_dir,
                                        target_size = (224,224),
                                        batch_size = 32)
from tensorflow.keras.layers.experimental import preprocessing
data_augmentation = tf.keras.Sequential([
#preprocessing.Rescaling(1/255.) in EfficientNet it's already scaled but could use this for non-scaled
], name = 'data_augmentation')
Found 7575 images belonging to 101 classes.
Found 25250 images belonging to 101 classes.
# Build headless model - Feature Extraction
# Setup base with frozen layers
base_model = tf.keras.applications.EfficientNetB0(include_top=False)
inputs = tf.keras.layers.Input(shape = (224,224,3))
x = data_augmentation(inputs)
x = base_model(x, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x) # Pool base_model's outputs into a feature vector
outputs = tf.keras.layers.Dense(len(class_names), activation='softmax')(x)
model = tf.keras.Model(inputs,outputs)
model.compile('Adam', 'categorical_crossentropy', metrics=['accuracy'])
Model: "model_1"
Layer (type) Output Shape Param #
input_4 (InputLayer) [(None, 224, 224, 3)] 0
data_augmentation (Sequentia (None, None, None, 3) 0
efficientnetb0 (Functional) (None, None, None, 1280) 4049571
global_average_pooling2d_1 ( (None, 1280) 0
dense_1 (Dense) (None, 101) 129381
Total params: 4,178,952
Trainable params: 129,381
Non-trainable params: 4,049,571
history = model.fit(train_data, validation_data=test_data,
epochs=5, callbacks = [checkpoint_callback])
Epoch 1/5
237/237 [==============================] - 63s 230ms/step - loss: 3.4712 - accuracy: 0.2482 - val_loss: 2.4446 - val_accuracy: 0.4497
Epoch 2/5
237/237 [==============================] - 52s 221ms/step - loss: 2.3575 - accuracy: 0.4561 - val_loss: 2.0051 - val_accuracy: 0.5093
Epoch 3/5
237/237 [==============================] - 51s 216ms/step - loss: 1.9838 - accuracy: 0.5265 - val_loss: 1.8313 - val_accuracy: 0.5360
Epoch 4/5
237/237 [==============================] - 51s 212ms/step - loss: 1.7497 - accuracy: 0.5761 - val_loss: 1.7417 - val_accuracy: 0.5461
Epoch 5/5
237/237 [==============================] - 53s 221ms/step - loss: 1.6035 - accuracy: 0.6141 - val_loss: 1.7012 - val_accuracy: 0.5601
790/790 [==============================] - 87s 110ms/step - loss: 1.7294 - accuracy: 0.5481
[1.7294203042984009, 0.5480791926383972]
# Fine tuning: unfreeze some layers, lower learning rate by 10x
base_model.trainable = True
# Refreeze every layer except the last 5, adjusting finer-tuned features down the model
for layer in base_model.layers[:-5]:
    layer.trainable = False
# Recompile and lower learning rate by 10x
model.compile(tf.keras.optimizers.Adam(learning_rate=0.0001), 'categorical_crossentropy', metrics=['accuracy'])
Model: "model_1"
Layer (type) Output Shape Param #
input_4 (InputLayer) [(None, 224, 224, 3)] 0
data_augmentation (Sequentia (None, None, None, 3) 0
efficientnetb0 (Functional) (None, None, None, 1280) 4049571
global_average_pooling2d_1 ( (None, 1280) 0
dense_1 (Dense) (None, 101) 129381
Total params: 4,178,952
Trainable params: 910,821
Non-trainable params: 3,268,131
# Fine Tune for 5 more epochs starting with last epoch left off at:
fine_tune_epochs=10 # Total number of epochs we're after: 5 feature extraction, 5 fine tuning
history_fine_tune = model.fit(train_data,
validation_data = test_data,
epochs = fine_tune_epochs,
initial_epoch = history.epoch[-1])
Epoch 5/10
237/237 [==============================] - 59s 220ms/step - loss: 1.3571 - accuracy: 0.6543 - val_loss: 1.6403 - val_accuracy: 0.5567
Epoch 6/10
237/237 [==============================] - 51s 213ms/step - loss: 1.2478 - accuracy: 0.6688 - val_loss: 1.6805 - val_accuracy: 0.5596
Epoch 7/10
237/237 [==============================] - 46s 193ms/step - loss: 1.1424 - accuracy: 0.6964 - val_loss: 1.6352 - val_accuracy: 0.5736
Epoch 8/10
237/237 [==============================] - 45s 191ms/step - loss: 1.0902 - accuracy: 0.7065 - val_loss: 1.6494 - val_accuracy: 0.5657
Epoch 9/10
237/237 [==============================] - 46s 193ms/step - loss: 1.0229 - accuracy: 0.7275 - val_loss: 1.6348 - val_accuracy: 0.5633
Epoch 10/10
237/237 [==============================] - 45s 191ms/step - loss: 0.9704 - accuracy: 0.7434 - val_loss: 1.6990 - val_accuracy: 0.5670
790/790 [==============================] - 83s 105ms/step - loss: 1.6578 - accuracy: 0.5671
[1.657836675643921, 0.5670890808105469]
loaded_model = tf.keras.models.load_model("./101_food_classes_10_percent/big_modelh5.h5")
Model: "model_1"
Layer (type) Output Shape Param #
input_4 (InputLayer) [(None, 224, 224, 3)] 0
data_augmentation (Sequentia (None, None, None, 3) 0
efficientnetb0 (Functional) (None, None, None, 1280) 4049571
global_average_pooling2d_1 ( (None, 1280) 0
dense_1 (Dense) (None, 101) 129381
Total params: 4,178,952
Trainable params: 910,821
Non-trainable params: 3,268,131
790/790 [==============================] - 85s 104ms/step - loss: 2.1780 - accuracy: 0.4535
[2.1780412197113037, 0.4534653425216675]
# Try save_weights to another model
inputs = tf.keras.layers.Input(shape = (224,224,3))
x = data_augmentation(inputs)
x = base_model(x, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x) # Pool base_model's outputs into a feature vector
outputs = tf.keras.layers.Dense(len(class_names), activation='softmax')(x)
new_model = tf.keras.Model(inputs,outputs)
new_model.compile('Adam', 'categorical_crossentropy', metrics=['accuracy'])
Model: "model_2"
Layer (type) Output Shape Param #
input_5 (InputLayer) [(None, 224, 224, 3)] 0
data_augmentation (Sequentia (None, None, None, 3) 0
efficientnetb0 (Functional) (None, None, None, 1280) 4049571
global_average_pooling2d_2 ( (None, 1280) 0
dense_2 (Dense) (None, 101) 129381
Total params: 4,178,952
Trainable params: 910,821
Non-trainable params: 3,268,131
# Saving weights works... but not save and load_model
790/790 [==============================] - 88s 109ms/step - loss: 1.6578 - accuracy: 0.5671
[1.6578353643417358, 0.5670890808105469]
# Check if weights are the same?
m1 = model.get_weights()
m2 = new_model.get_weights()
m3 = loaded_model.get_weights()
from collections.abc import Iterable
def flatten(l):
    # Recursively yield the scalar entries of arbitrarily nested iterables
    for el in l:
        if isinstance(el, Iterable) and not isinstance(el, (str, bytes)):
            yield from flatten(el)
        else:
            yield el
m1 = flatten(m1)
m2 = flatten(m2)
m3 = flatten(m3)
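The comparison step itself is not shown above. One way to do it is sketched here, with plain nested lists standing in for the arrays returned by get_weights(); flatten is redefined so the block runs on its own, and same_weights is a hypothetical helper name:

```python
import math
from collections.abc import Iterable
from itertools import zip_longest

def flatten(l):
    """Recursively yield the scalar entries of arbitrarily nested iterables."""
    for el in l:
        if isinstance(el, Iterable) and not isinstance(el, (str, bytes)):
            yield from flatten(el)
        else:
            yield el

def same_weights(a, b, tol=1e-7):
    """True if two nested weight structures hold the same numbers in the same order."""
    missing = object()  # sentinel that marks a length mismatch
    for x, y in zip_longest(flatten(a), flatten(b), fillvalue=missing):
        if x is missing or y is missing:
            return False
        if not math.isclose(x, y, rel_tol=0.0, abs_tol=tol):
            return False
    return True

# Toy stand-ins for model.get_weights(), loaded_model.get_weights(), ...
w_model = [[[0.1, 0.2], [0.3, 0.4]], [0.0, 1.0]]
w_loaded = [[[0.1, 0.2], [0.3, 0.4]], [0.0, 1.0]]
w_other = [[[0.1, 0.2], [0.3, 0.5]], [0.0, 1.0]]
print(same_weights(w_model, w_loaded))  # True
print(same_weights(w_model, w_other))   # False
```

Because a generator can only be consumed once, the sketch calls flatten afresh for every comparison instead of reusing objects such as m1, m2 and m3.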
This is because you did not save your entire model with the .h5 extension; you only used .h5 when saving the weights. Compare the two paths in the lines below:
model.save("./101_food_classes_10_percent/big_modelh5") # add .h5
loaded_model = tf.keras.models.load_model("./101_food_classes_10_percent/big_modelh5.h5")
Save the entire model in HDF5 format by adding the .h5 extension to the path passed to model.save, then try loading it again.
See the TensorFlow guide on saving and loading Keras models for more details on the HDF5 format.
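The mismatch between the two paths can be made explicit with a small sketch. It assumes the TF 2.x rule that a path ending in .h5 or .hdf5 selects the HDF5 format and any other path selects the SavedModel format; inferred_save_format is a hypothetical, simplified helper, not a TensorFlow function:

```python
def inferred_save_format(path):
    """Guess which format a TF 2.x-style model.save(path) call would write (simplified)."""
    return "h5" if path.endswith((".h5", ".hdf5")) else "tf"

saved_to = "./101_food_classes_10_percent/big_modelh5"
loaded_from = "./101_food_classes_10_percent/big_modelh5.h5"

print(inferred_save_format(saved_to))     # tf
print(inferred_save_format(loaded_from))  # h5
print(saved_to == loaded_from)            # False
```

Under that assumption, the save call and the load call refer to two different artifacts, so whatever sits at the .h5 path is not necessarily the fine-tuned model that was just evaluated.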
I am very new to neural networks in general, and I'm trying to build a basic recurrent network for a student project.
So, here's my code:
import tensorflow
from tensorflow import keras
import numpy
X = numpy.zeros((21, 210, 3))
# Some code filling X from a file
Y = numpy.array([0,0,0,0,0,0, -1,-1,-1,-1,-1,-1,-1,-1,-1, 1,1,1,1,1,1])
Y = Y*0.5 + 0.5
model = keras.models.Sequential()
#model.add(keras.layers.InputLayer(input_shape=(210, 3))) -commented out as it didn't make a difference
model.add(keras.layers.LSTM(32, input_shape=(210, 3), activation='relu'))
model.add(keras.layers.Dense(1, activation='sigmoid'))
model.fit(X, Y, epochs=5)
X holds 21 "pages" of data, each with 3 features over 210 timesteps, which is the layout an LSTM expects, and Y is an array of 21 values, one for each page of X.
The network is supposed to learn to predict the associated Y value from a page of X.
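As a quick check of the layout just described, (samples, timesteps, features), and of the label rescaling in the code, here is a small sketch that uses plain nested lists as stand-ins for the NumPy arrays; the variable names n_pages, n_timesteps, n_features and raw_labels are made up for the illustration:

```python
n_pages, n_timesteps, n_features = 21, 210, 3

# Nested lists standing in for numpy.zeros((21, 210, 3))
X = [[[0.0] * n_features for _ in range(n_timesteps)] for _ in range(n_pages)]

# Same labels as in the post: six 0s, nine -1s, six 1s
raw_labels = [0] * 6 + [-1] * 9 + [1] * 6
Y = [label * 0.5 + 0.5 for label in raw_labels]  # maps -1, 0, 1 to 0.0, 0.5, 1.0

print(len(X), len(X[0]), len(X[0][0]))  # 21 210 3
print(sorted(set(Y)))                   # [0.0, 0.5, 1.0]
```

After rescaling, the targets take three distinct values, one of which (0.5) is neither 0 nor 1.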
The problem is that running the program speeds through the training and the accuracy stays constant (with occasional, seemingly random spikes such as the 0.4286 shown below):
Model: "sequential"
Layer (type) Output Shape Param #
lstm (LSTM) (None, 32) 4608
dense (Dense) (None, 1) 33
Total params: 4,641
Trainable params: 4,641
Non-trainable params: 0
2021-07-07 14:41:56.530633: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
Epoch 1/5
1/1 [==============================] - 1s 961ms/step - loss: 0.0000e+00 - accuracy: 0.2857
Epoch 2/5
1/1 [==============================] - 0s 39ms/step - loss: 0.0000e+00 - accuracy: 0.2857
Epoch 3/5
1/1 [==============================] - 0s 35ms/step - loss: 0.0000e+00 - accuracy: 0.2857
Epoch 4/5
1/1 [==============================] - 0s 36ms/step - loss: 0.0000e+00 - accuracy: 0.4286
Epoch 5/5
1/1 [==============================] - 0s 34ms/step - loss: 0.0000e+00 - accuracy: 0.2857
Process finished with exit code 0
What am I doing wrong? Again, this is all very new to me, so I assume it could be almost anything.
Thanks in advance
I have a dataset of monthly user activity, segmented by country and browser. Each row is one day of user activity, summed up, together with a score for that day's activity. For example, the number of sessions per day is one feature. The score is a floating-point number calculated from that day's features.
My goal is to predict the "average user" score at the end of the month using just 2 days of user data.
I have 25 months of data; some months are complete and some contain only part of the days. So that all sequences have the same length, I padded them like so:
from keras.preprocessing.sequence import pad_sequences
padded_sequences = pad_sequences(sequences, maxlen=None, dtype='float64', padding='pre', truncating='post', value=-10.)
so sequences shorter than the maximum length were padded with rows of -10.
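As a rough illustration of what padding='pre' with value=-10. does, here is a pure-Python sketch. pad_pre is a hypothetical helper that mimics only this one case of pad_sequences (no truncation, since maxlen=None pads to the longest sequence), and the toy months are made-up data:

```python
def pad_pre(sequences, value=-10.0):
    """Left-pad each sequence of feature rows to the length of the longest one."""
    max_len = max(len(seq) for seq in sequences)
    n_features = len(sequences[0][0])
    padded = []
    for seq in sequences:
        # Filler rows go in front of the real data ('pre' padding).
        filler = [[value] * n_features for _ in range(max_len - len(seq))]
        padded.append(filler + [list(row) for row in seq])
    return padded

months = [
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],  # a complete month (3 days here)
    [[7.0, 8.0]],                          # a partial month (1 day)
]
padded = pad_pre(months)
print(padded[1])  # [[-10.0, -10.0], [-10.0, -10.0], [7.0, 8.0]]
```

The Masking layer in the model is what tells the LSTM to skip such filler rows, so its mask_value has to match the padding value exactly.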
I decided to build an LSTM model to digest the data, so that at the end of each batch the model predicts the average user score. Later I will try to predict using just a 2-day sample.
My model looks like this:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout,Dense,Masking
from tensorflow.keras import metrics
from tensorflow.keras.callbacks import TensorBoard
from tensorflow.keras.optimizers import Adam
import datetime, os
model = Sequential()
opt = Adam(learning_rate=0.0001, clipnorm=1)
num_samples = train_x.shape[1]
num_features = train_x.shape[2]
model.add(Masking(mask_value=-10., input_shape=(num_samples, num_features)))
model.add(LSTM(64, return_sequences=True, activation='relu'))
#this is the last LSTM layer, use return_sequences=False
model.add(LSTM(64, return_sequences=False, stateful=False, activation='relu'))
model.compile(loss='mse', optimizer='adam' ,metrics=['acc',metrics.mean_squared_error])
logdir = os.path.join(logs_base_dir, datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
tensorboard_callback = TensorBoard(log_dir=logdir, update_freq=1)
Model: "sequential_13"
Layer (type) Output Shape Param #
masking_5 (Masking) (None, 4283, 16) 0
lstm_20 (LSTM) (None, 4283, 64) 20736
dropout_14 (Dropout) (None, 4283, 64) 0
lstm_21 (LSTM) (None, 64) 33024
dropout_15 (Dropout) (None, 64) 0
dense_9 (Dense) (None, 1) 65
Total params: 53,825
Trainable params: 53,825
Non-trainable params: 0
During training, the loss becomes NaN at the 19th epoch:
Epoch 16/1000
16/16 [==============================] - 14s 855ms/sample - loss: 298.8135 - acc: 0.0000e+00 - mean_squared_error: 298.8135 - val_loss: 220.7307 - val_acc: 0.0000e+00 - val_mean_squared_error: 220.7307
Epoch 17/1000
16/16 [==============================] - 14s 846ms/sample - loss: 290.3051 - acc: 0.0000e+00 - mean_squared_error: 290.3051 - val_loss: 205.3393 - val_acc: 0.0000e+00 - val_mean_squared_error: 205.3393
Epoch 18/1000
16/16 [==============================] - 14s 869ms/sample - loss: 272.1889 - acc: 0.0000e+00 - mean_squared_error: 272.1889 - val_loss: nan - val_acc: 0.0000e+00 - val_mean_squared_error: nan
Epoch 19/1000
16/16 [==============================] - 14s 852ms/sample - loss: nan - acc: 0.0000e+00 - mean_squared_error: nan - val_loss: nan - val_acc: 0.0000e+00 - val_mean_squared_error: nan
Epoch 20/1000
16/16 [==============================] - 14s 856ms/sample - loss: nan - acc: 0.0000e+00 - mean_squared_error: nan - val_loss: nan - val_acc: 0.0000e+00 - val_mean_squared_error: nan
Epoch 21/1000
I tried to apply the methods described here with no real success.
I changed my activation from relu to tanh and it solved the NaN issue. However, the accuracy of my model stays at 0 while the loss goes down:
Epoch 100/1000
16/16 [==============================] - 14s 869ms/sample - loss: 22.8179 - acc: 0.0000e+00 - mean_squared_error: 22.8179 - val_loss: 11.7422 - val_acc: 0.0000e+00 - val_mean_squared_error: 11.7422
Q: What am I doing wrong here?
You are solving a regression task, so accuracy is not a meaningful metric here.
Use mean_absolute_error to check whether your error is decreasing over time.
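A small sketch of why this matters, using made-up numbers. exact_match_accuracy is a simplified stand-in for an accuracy metric (Keras's own accuracy differs in detail, but it is equally uninformative for continuous targets), and both function names are hypothetical:

```python
def exact_match_accuracy(y_true, y_pred):
    """Fraction of predictions that equal the target exactly."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return hits / len(y_true)

def mean_absolute_error(y_true, y_pred):
    """Average absolute distance between target and prediction."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [12.5, 30.0, 7.25]
early = [2.0, 11.0, 20.0]  # predictions early in training
late = [12.0, 29.0, 8.0]   # predictions after more training

for name, y_pred in (("early", early), ("late", late)):
    print(name,
          exact_match_accuracy(y_true, y_pred),
          round(mean_absolute_error(y_true, y_pred), 3))
```

The accuracy is 0 in both cases even though the later predictions are much closer to the targets, while the mean absolute error drops and therefore reflects the improvement.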
Instead of predicting the raw score, you can bound the score to the range 0 to 1.
Use min-max normalization to bring the target into that range: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html
After that you can use a sigmoid activation in the last layer.
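The idea can be sketched without any library. The helpers fit_min_max, scale and unscale are hypothetical names that mirror what a min-max scaler does for a single target column, and the scores are made-up values:

```python
import math

def fit_min_max(values):
    """Return the (minimum, maximum) of the training targets."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a constant target")
    return lo, hi

def scale(value, lo, hi):
    """Map a raw score into the range 0 to 1."""
    return (value - lo) / (hi - lo)

def unscale(value, lo, hi):
    """Map a value from the range 0 to 1 back to a raw score."""
    return value * (hi - lo) + lo

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

scores = [12.5, 30.0, 7.25, 18.0]
lo, hi = fit_min_max(scores)
scaled = [scale(s, lo, hi) for s in scores]
print(scaled)

# A sigmoid output lies between 0 and 1, so it can be mapped back to a score.
print(unscale(sigmoid(0.3), lo, hi))
```

The minimum and maximum should come from the training targets only, and the same pair is then reused to convert the model's predictions back to the original scale.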
Also, a sequence length of 4283 is rather long for such a simple model. How skewed are your sequence lengths?
Plot a histogram of all the sequence lengths and see whether 4283 is in fact a good choice. Maybe you can bring it down to something like 512, which may be easier for the model.
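One way to look at the skew without plotting is sketched below. The lengths are made-up numbers with a single outlier, and length_percentile is a hypothetical helper that uses the nearest-rank rule:

```python
import math
from collections import Counter

def length_percentile(lengths, q):
    """Smallest length that covers at least a fraction q of the sequences."""
    ordered = sorted(lengths)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]

lengths = [40, 55, 60, 62, 70, 75, 80, 90, 120, 4283]  # made-up example

# Crude text histogram: count sequences per bucket of width 100.
histogram = Counter(length // 100 * 100 for length in lengths)
for bucket in sorted(histogram):
    print(bucket, "#" * histogram[bucket])

print(length_percentile(lengths, 0.9))  # 120
```

In a distribution like this one, a maxlen near the 90th percentile covers most sequences at a small fraction of the longest length; whether that holds for the real data is exactly what the histogram is meant to show.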
Also, padding with -10 seems a rather odd choice. Is it specific to your data, or did you choose it arbitrarily? The -10 also suggests that you are not normalizing your input data, which can become a problem for an LSTM with relu; try normalizing the inputs before training.
If the performance is still not good after these changes, add a validation plot of the mean absolute error.
My aim is to use SVD to PCA-whiten the latent layer before passing it to the decoder module of an autoencoder. I tried calling tf.linalg.svd directly, but that does not work because its output lacks the attributes Keras needs to build a model. As a workaround I tried wrapping it inside a Lambda layer, but got this error:
AttributeError: 'tuple' object has no attribute 'shape'.
I searched SO (e.g. "Using SVD in a custom layer in Keras/tensorflow") and Google for SVD in Keras but could not find an answer. A stripped-down but functional version of my code is attached here:
import numpy as np
import tensorflow as tf
from sklearn import preprocessing
from keras.layers import Lambda, Input, Dense, Multiply, Subtract
from keras.models import Model
from keras import backend as K
from keras.losses import mse
from keras import optimizers
from keras.callbacks import EarlyStopping
x = np.random.randn(100, 5)
train_data = preprocessing.scale(x)
input_shape = (5, )
original_dim = train_data.shape[1]
intermediate_dim_1 = 64
intermediate_dim_2 = 16
latent_dim = 2
batch_size = 10
epochs = 15
# build encoder model
inputs = Input(shape=input_shape, name='encoder_input')
layer_1 = Dense(intermediate_dim_1, activation='tanh') (inputs)
layer_2 = Dense(intermediate_dim_2, activation='tanh') (layer_1)
encoded_layer = Dense(latent_dim, name='latent_layer') (layer_2)
encoder = Model(inputs, encoded_layer, name='encoder')
# build decoder model
latent_inputs = Input(shape=(latent_dim,))
layer_1 = Dense(intermediate_dim_1, activation='tanh') (latent_inputs)
layer_2 = Dense(intermediate_dim_2, activation='tanh') (layer_1)
outputs = Dense(original_dim,activation='sigmoid') (layer_2)
decoder = Model(latent_inputs, outputs, name='decoder')
# mean removal and pca whitening
meanX = Lambda(lambda x: tf.reduce_mean(x, axis=0, keepdims=True))(encoded_layer)
standardized = Subtract()([encoded_layer, meanX])
sigma2 = K.dot(K.transpose(standardized), standardized)
sigma2 = Lambda(lambda x: x / batch_size)(sigma2)
s, u ,v = tf.linalg.svd(sigma2,compute_uv=True)
# s ,u ,v = Lambda(lambda x: tf.linalg.svd(x,compute_uv=True))(sigma2)
epsilon = 1e-6
# sqrt of number close to 0 leads to problem hence replace it with epsilon
si = tf.where(tf.less(s, epsilon), tf.sqrt(1 / epsilon) * tf.ones_like(s),
              tf.math.truediv(1.0, tf.sqrt(s)))
whitening_layer = u @ tf.linalg.diag(si) @ tf.transpose(v)
whitened_encoding = K.dot(standardized, whitening_layer)
# Connect models
z_decoded = decoder(standardized)
# z_decoded = decoder(whitened_encoding)
# Define losses
reconstruction_loss = mse(inputs,z_decoded)
# Instantiate autoencoder
ae = Model(inputs, z_decoded, name='autoencoder')
# callback = EarlyStopping(monitor='val_loss', patience=5)
adam = optimizers.adam(learning_rate=0.002)
ae.fit(train_data, epochs=epochs, batch_size=batch_size,
validation_split=0.2, shuffle=True)
To reproduce the error, uncomment these lines and comment out the line preceding each of them:
z_decoded = decoder(whitened_encoding)
s ,u ,v = Lambda(lambda x: tf.linalg.svd(x,compute_uv=True))(sigma2)
I would appreciate it if someone could tell me how to wrap the SVD inside Keras layers, or suggest an alternative implementation.
Please note that I have not included the reparameterization trick to calculate the loss to keep the code simple.
Thank you !
I solved the problem. To use SVD inside Keras, we need a Lambda layer. However, because a Lambda layer has to return a tensor (which Keras then annotates with some additional attributes) rather than the tuple that tf.linalg.svd produces, it is best to do the remaining work inside the wrapped function and return a single tensor. Another problem with my code was how the encoder and decoder models were combined, which I fixed by feeding the output of the encoder into the input of the decoder. The working code is as follows:
import numpy as np
import tensorflow as tf
from sklearn import preprocessing
from keras.layers import Lambda, Input, Dense, Multiply, Subtract
from keras.models import Model
from keras import backend as K
from keras.losses import mse
from keras import optimizers
from keras.callbacks import EarlyStopping
def SVD(sigma2):
    s, u, v = tf.linalg.svd(sigma2, compute_uv=True)
    epsilon = 1e-6
    # sqrt of number close to 0 leads to problem hence replace it with epsilon
    si = tf.where(tf.less(s, epsilon),
                  tf.sqrt(1 / epsilon) * tf.ones_like(s),
                  tf.math.truediv(1.0, tf.sqrt(s)))
    whitening_layer = u @ tf.linalg.diag(si) @ tf.transpose(v)
    return whitening_layer
x = np.random.randn(100, 5)
train_data = preprocessing.scale(x)
input_shape = (5, )
original_dim = train_data.shape[1]
intermediate_dim_1 = 64
intermediate_dim_2 = 16
latent_dim = 2
batch_size = 10
epochs = 15
# build encoder model
inputs = Input(shape=input_shape, name='encoder_input')
layer_1 = Dense(intermediate_dim_1, activation='tanh') (inputs)
layer_2 = Dense(intermediate_dim_2, activation='tanh') (layer_1)
encoded_layer = Dense(latent_dim, name='latent_layer') (layer_2)
encoder = Model(inputs, encoded_layer, name='encoder')
# build decoder model
latent_inputs = Input(shape=(latent_dim,))
layer_1 = Dense(intermediate_dim_1, activation='tanh') (latent_inputs)
layer_2 = Dense(intermediate_dim_2, activation='tanh') (layer_1)
outputs = Dense(original_dim,activation='sigmoid') (layer_2)
decoder = Model(latent_inputs, outputs, name='decoder')
# mean removal and pca whitening
meanX = Lambda(lambda x: tf.reduce_mean(x, axis=0, keepdims=True))(encoded_layer)
standardized = Subtract()([encoded_layer, meanX])
sigma2 = K.dot(K.transpose(standardized), standardized)
sigma2 = Lambda(lambda x: x / batch_size)(sigma2)
# s, u ,v = tf.linalg.svd(sigma2,compute_uv=True)
whitening_layer = Lambda(SVD)(sigma2)
# The lines below are superseded by SVD() above; a Lambda returning the svd tuple raises the original error
# s ,u ,v = Lambda(lambda x: tf.linalg.svd(x,compute_uv=True))(sigma2)
# epsilon = 1e-6
# si = tf.where(tf.less(s, epsilon),
#               tf.sqrt(1 / epsilon) * tf.ones_like(s),
#               tf.math.truediv(1.0, tf.sqrt(s)))
# whitening_layer = u @ tf.linalg.diag(si) @ tf.transpose(v)
print('whitening_layer shape=', np.shape(whitening_layer))
print('standardized shape=', np.shape(standardized))
whitened_encoding = K.dot(standardized, whitening_layer)
# Connect models
# z_decoded = decoder(standardized)
z_decoded = decoder(encoder(inputs))
# Define losses
reconstruction_loss = mse(inputs,z_decoded)
# Instantiate autoencoder
ae = Model(inputs, z_decoded, name='autoencoder')
# callback = EarlyStopping(monitor='val_loss', patience=5)
adam = optimizers.adam(learning_rate=0.002)
ae.fit(train_data, epochs=epochs, batch_size=batch_size,
validation_split=0.2, shuffle=True)
The output of running the code is as follows:
Model: "encoder"
Layer (type) Output Shape Param #
encoder_input (InputLayer) (None, 5) 0
dense_1 (Dense) (None, 64) 384
dense_2 (Dense) (None, 16) 1040
latent_layer (Dense) (None, 2) 34
Total params: 1,458
Trainable params: 1,458
Non-trainable params: 0
Model: "decoder"
Layer (type) Output Shape Param #
input_1 (InputLayer) (None, 2) 0
dense_3 (Dense) (None, 64) 192
dense_4 (Dense) (None, 16) 1040
dense_5 (Dense) (None, 5) 85
Total params: 1,317
Trainable params: 1,317
Non-trainable params: 0
whitening_layer shape= (2, 2)
standardized shape= (None, 2)
/home/manish/anaconda3/envs/ica_gpu/lib/python3.7/site-packages/keras/engine/training_utils.py:819: UserWarning: Output decoder missing from loss dictionary. We assume this was done on purpose. The fit and evaluate APIs will not be expecting any data to be passed to decoder.
'be expecting any data to be passed to {0}.'.format(name))
Model: "autoencoder"
Layer (type) Output Shape Param #
encoder_input (InputLayer) (None, 5) 0
encoder (Model) (None, 2) 1458
decoder (Model) (None, 5) 1317
Total params: 2,775
Trainable params: 2,775
Non-trainable params: 0
Train on 80 samples, validate on 20 samples
Epoch 1/15
2020-05-16 16:01:55.443061: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
80/80 [==============================] - 0s 3ms/step - loss: 1.1739 - val_loss: 1.2238
Epoch 2/15
80/80 [==============================] - 0s 228us/step - loss: 1.0601 - val_loss: 1.0921
Epoch 3/15
80/80 [==============================] - 0s 261us/step - loss: 0.9772 - val_loss: 1.0291
Epoch 4/15
80/80 [==============================] - 0s 223us/step - loss: 0.9385 - val_loss: 0.9875
Epoch 5/15
80/80 [==============================] - 0s 262us/step - loss: 0.9105 - val_loss: 0.9560
Epoch 6/15
80/80 [==============================] - 0s 240us/step - loss: 0.8873 - val_loss: 0.9335
Epoch 7/15
80/80 [==============================] - 0s 217us/step - loss: 0.8731 - val_loss: 0.9156
Epoch 8/15
80/80 [==============================] - 0s 253us/step - loss: 0.8564 - val_loss: 0.9061
Epoch 9/15
80/80 [==============================] - 0s 273us/step - loss: 0.8445 - val_loss: 0.8993
Epoch 10/15
80/80 [==============================] - 0s 235us/step - loss: 0.8363 - val_loss: 0.8937
Epoch 11/15
80/80 [==============================] - 0s 283us/step - loss: 0.8299 - val_loss: 0.8874
Epoch 12/15
80/80 [==============================] - 0s 254us/step - loss: 0.8227 - val_loss: 0.8832
Epoch 13/15
80/80 [==============================] - 0s 227us/step - loss: 0.8177 - val_loss: 0.8789
Epoch 14/15
80/80 [==============================] - 0s 241us/step - loss: 0.8142 - val_loss: 0.8725
Epoch 15/15
80/80 [==============================] - 0s 212us/step - loss: 0.8089 - val_loss: 0.8679
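For readers who want to see what the whitening step is meant to achieve, here is a library-free sketch for a 2-dimensional latent space (latent_dim = 2 as above). It swaps tf.linalg.svd for the closed-form eigendecomposition of a symmetric 2x2 covariance matrix, which gives the same principal axes and variances for such a matrix. The function names and the sample points are made up for the illustration:

```python
import math

def centre(points):
    """Subtract the per-coordinate mean from every 2-D point."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    return [(x - mx, y - my) for x, y in points]

def covariance(centred):
    """Entries (a, b, c) of the covariance matrix [[a, b], [b, c]], dividing by n."""
    n = len(centred)
    a = sum(x * x for x, _ in centred) / n
    b = sum(x * y for x, y in centred) / n
    c = sum(y * y for _, y in centred) / n
    return a, b, c

def pca_whiten_2d(points, eps=1e-6):
    """Centre the points, rotate them onto the principal axes, scale to unit variance."""
    centred = centre(points)
    a, b, c = covariance(centred)
    # Closed-form eigendecomposition of the symmetric 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * b, a - c)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    half_gap = math.hypot((a - c) / 2.0, b)
    lam1 = (a + c) / 2.0 + half_gap  # variance along (cos_t, sin_t)
    lam2 = (a + c) / 2.0 - half_gap  # variance along (-sin_t, cos_t)
    # Guard against near-zero variances, in the spirit of the epsilon above.
    s1 = 1.0 / math.sqrt(max(lam1, eps))
    s2 = 1.0 / math.sqrt(max(lam2, eps))
    return [((x * cos_t + y * sin_t) * s1, (-x * sin_t + y * cos_t) * s2)
            for x, y in centred]

latent = [(2.0, 1.0), (3.0, 4.0), (5.0, 4.5), (7.0, 8.0), (8.0, 7.0), (9.5, 11.0)]
whitened = pca_whiten_2d(latent)
print(covariance(centre(whitened)))  # close to (1.0, 0.0, 1.0)
```

After whitening, the covariance of the points is approximately the identity matrix, which is the property the latent codes should have before they are passed to the decoder.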
I hope this helps.