How to fit basic custom built model in tensorflow - python

I am used to working in PyTorch but now have to learn Tensorflow for my job. I am trying to get up to speed by creating a simple dense network and training it on the MNIST dataset, but I cannot get it to train. My super simple code:
import tensorflow as tf
from tensorflow.keras import Input
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.utils import to_categorical
# Load mnist data from keras
(train_data, train_label), (test_data, test_label) = tf.keras.datasets.mnist.load_data(path="mnist.npz")
train_label, test_label = to_categorical(train_label), to_categorical(test_label)
train_data, train_label, test_data, test_label = Flatten()(train_data), Flatten()(train_label), Flatten()(test_data), Flatten()(test_label)
# Create generic SGD optimizer (no learning schedule)
optimizer = SGD(learning_rate = 0.01)
# Define function to build and compile model
def build_mnist_model(input_shape, batch_size = 30):
    input_img = Input(shape = input_shape, batch_size = batch_size)
    # Pass through dense layer
    x = Dense(200, activation = 'relu', use_bias = True)(input_img)
    x = Dense(400, activation = 'relu', use_bias = True)(x)
    scores = Dense(10, activation = 'softmax', use_bias = True)(x)
    # Create and compile tf model
    mnist_model = Model(input_img, scores)
    mnist_model.compile(optimizer = optimizer, loss = 'categorical_crossentropy')
    return mnist_model
# Build the model
mnist_model = build_mnist_model(train_data[0].shape)
# Train the model
mnist_model.fit(
x = train_data,
y = train_label,
batch_size = 30,
epochs = 20,
verbose = 2,
shuffle = True,
# steps_per_epoch = 200
)
When I run this I get
ValueError: When using data tensors as input to a model, you should specify the `steps_per_epoch` argument.
This does not really make sense to me because my train_data and train_label are just regular tensors and per the Tensorflow documentation in this case it should default to the number of samples in the dataset divided by the batch size (which would be 200 in my case).
At any rate, I tried specifying steps_per_epoch = 200 when I call mnist_model.fit() but then I get a different error:
InvalidArgumentError: Incompatible shapes: [60000,10] vs. [30,1]
[[{{node training_4/SGD/gradients/gradients/loss_5/dense_17_loss/softmax_cross_entropy_with_logits_grad/mul}}]]
I can't seem to discern where a size mismatch would come from. In PyTorch, I am used to manually creating batches (by subindexing my data and label tensors) but in Tensorflow this seems to happen automatically. As such, this leaves me quite confused about what batch has the wrong size, how it got the wrong size, etc. I hope this simple model is way easier than I am making it and I just do not know the Tensorflow tricks yet.
Thanks for the help.
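One possible cause, offered as an assumption rather than a confirmed diagnosis: calling Flatten() directly on the NumPy arrays turns them into TensorFlow tensors, which may be why fit() asks for steps_per_epoch, and the one-hot labels are already 2-D, so they need no flattening. Fixing batch_size in Input is also usually unnecessary. Below is a minimal NumPy-only sketch of the preprocessing, using random stand-in data in place of MNIST; the helper name prepare_mnist_arrays is hypothetical. Note also that 60000 samples at a batch size of 30 gives 2000 steps per epoch, not 200.

```python
import numpy as np

def prepare_mnist_arrays(images, labels, num_classes=10):
    """Flatten images to (N, 784) floats in [0, 1] and one-hot encode labels."""
    x = images.reshape(len(images), -1).astype("float32") / 255.0
    y = np.eye(num_classes, dtype="float32")[labels]
    return x, y

# Random stand-in data with MNIST-like shapes (not the real dataset).
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(60, 28, 28)).astype(np.uint8)
fake_labels = rng.integers(0, 10, size=60)

x, y = prepare_mnist_arrays(fake_images, fake_labels)
print(x.shape, y.shape)  # (60, 784) (60, 10)

# Steps per epoch for the full 60000-sample training set at batch size 30.
steps_per_epoch = 60000 // 30
print(steps_per_epoch)  # 2000
```

Arrays prepared this way stay plain NumPy arrays, so fit() can batch them itself from batch_size.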

Related

ValueError when using ModelCheckpoint in Keras

I'm creating an Ensemble of Vgg19, DenseNet, and EfficientNetB1.
The code is as follows:
IMAGE_SIZE = (224,224,3)
import tensorflow as tf
vgg19 = tf.keras.applications.vgg19.VGG19(
    input_shape=IMAGE_SIZE, weights='imagenet', include_top=False)
for layer in vgg19.layers:
    layer._name = layer._name + str('_19')
    layer.trainable = False
effnetb1 = tf.keras.applications.efficientnet.EfficientNetB1(
    include_top=False, weights='imagenet', input_shape=IMAGE_SIZE)
for layer in effnetb1.layers:
    layer._name = layer._name + str('_B1')
    layer.trainable = False
densenet = tf.keras.applications.densenet.DenseNet121(
    include_top=False, weights="imagenet", input_shape=IMAGE_SIZE)
for layer in densenet.layers:
    layer._name = layer._name + str('_Dense')
    layer.trainable = False
from keras.layers import Input, Flatten, Concatenate, Dense, Average, Dropout
inp = Input(IMAGE_SIZE)
vgg19_x = Flatten()(vgg19(inp))
vgg19_x = Dense(256, activation='relu')(vgg19_x)
effnet_x = Flatten()(effnetb1(inp))
effnet_x = Dense(256, activation='relu')(effnet_x)
densenet_x = Flatten()(densenet(inp))
densenet_x = Dense(256, activation='relu')(densenet_x)
from keras.models import Model
x = Concatenate()([vgg19_x, effnet_x, densenet_x])
x = Dense(128, activation='relu')(x)
x = Dropout(0.30)(x)
x = Dense(64, activation='relu')(x)
out = Dense(2, activation='softmax')(x)
model = Model(inputs = inp, outputs = out)
model.compile(
loss='categorical_crossentropy',
optimizer=tf.keras.optimizers.Adam(
learning_rate=0.0005,
name="Adam"),
metrics=['accuracy']
)
model.summary()
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
checkpointer = ModelCheckpoint(filepath="/content/drive/MyDrive/ensemble/ensemble-weights.hdf5", verbose=1, save_best_only=True)
r = model.fit(
training_set,
validation_data=test_set,
epochs=30,
steps_per_epoch=len(training_set),
validation_steps=len(test_set),
callbacks = [checkpointer]
)
The code runs fine and training works when I'm not using the callback. But when I use a ModelCheckpoint, I get the following error after the first epoch:
ValueError: The target structure is of type `<class 'keras.engine.keras_tensor.KerasTensor'>`
KerasTensor(type_spec=TensorSpec(shape=(None, 224, 224, 3), dtype=tf.float32, name='input_5'), name=...
However, the input structure is a sequence (<class 'list'>) of length 0.
[]
nest cannot guarantee that it is safe to map one to the other.
Can anyone tell me what's wrong here? Also, is it because I'm concatenating three models?
Your help will be appreciated. Thank you!
I also ran into this issue while trying to implement a nested model (which is what would be constructed here after you create the concatenated model).
The issue seems to be that Keras cannot handle the inputs and outputs of nested models in newer TensorFlow versions (TF 2.0 and above). Depending on the version you are on, you may need to refer explicitly to the input/output of the nested model you are using. In TF 2.6, what seems to work is to define a separate model for each part, i.e. the common layers added after concatenation should also be wrapped in a model, as below (taken from here):
#Make GradCAM heatmap following the Keras tutorial.
last_conv_layer = model.layers[-4].layers[-1]
last_conv_layer_model = keras.Model(model.layers[-4].inputs, last_conv_layer.output)
# Second, we create a model that maps the activations of the last conv
# layer to the final class predictions
classifier_input = keras.Input(shape=last_conv_layer.output.shape[1:])
x = classifier_input
for layer in model.layers[-3:]:
    x = layer(x)
classifier_model = keras.Model(classifier_input, x)
#Preparing the image with the preprocessing layers
preprocess_layers = keras.Model(model.inputs, model.layers[-5].output)
img_array = preprocess_layers(prepared_image)
# Then, we compute the gradient of the top predicted class for our input image
# with respect to the activations of the last conv layer
with tf.GradientTape() as tape:
    # Compute activations of the last conv layer and make the tape watch it
    last_conv_layer_output = last_conv_layer_model(img_array)
    tape.watch(last_conv_layer_output)
    # Compute class predictions
    preds = classifier_model(last_conv_layer_output)
    top_pred_index = tf.argmax(preds[0])
    top_class_channel = preds[:, top_pred_index]
# This is the gradient of the top predicted class with regard to
# the output feature map of the last conv layer
grads = tape.gradient(top_class_channel, last_conv_layer_output)
You can also check the following github issues (they are not very related, but deal with a similar problem) - issue1, issue2, issue3

Cannot convert a symbolic input/output to a numpy array

I am trying to run a deep learning code that I found in a tutorial in order to familiarise myself with resnet50, keras and tensorflow with python 3.7. When I run my code, I get the following error:
TypeError: Cannot convert a symbolic Keras input/output to a numpy array. This error may indicate that you're trying to pass a symbolic value to a NumPy call, which is not supported. Or, you may be trying to pass Keras symbolic inputs/outputs to a TF API that does not register dispatching, preventing Keras from automatically converting the API call to a lambda layer in the Functional Model.
I tried to use the following fix as mentioned on stack overflow:
from tensorflow.python.framework.ops import disable_eager_execution
disable_eager_execution()
Without any success. My full code can be seen below:
from keras.applications.resnet50 import ResNet50
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model
from keras.optimizers import SGD
from keras.preprocessing.image import ImageDataGenerator
import numpy as np
from keras.preprocessing import image
from sklearn.linear_model import LogisticRegression
from tensorflow.python.framework.ops import disable_eager_execution
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
# Download the architecture of ResNet50 with ImageNet weights
base_model = ResNet50(include_top=False, weights='imagenet')
# Taking the output of the last convolution block in ResNet50
x = base_model.output
# Adding a Global Average Pooling layer
x = GlobalAveragePooling2D()(x)
# Adding a fully connected layer having 1024 neurons
x = Dense(1024, activation='relu')(x)
# Adding a fully connected layer having 2 neurons which will
# give probability of image having either dog or cat
predictions = Dense(2, activation='softmax')(x)
# Model to be trained
model = Model(inputs=base_model.input, outputs=predictions)
# Training only top layers i.e. the layers which we have added in the end
for layer in base_model.layers:
    layer.trainable = False
# Compiling the model
model.compile(optimizer=SGD(lr=0.0001, momentum=0.9), loss='categorical_crossentropy', metrics = ['accuracy'],
experimental_run_tf_function=False)
# Creating objects for image augmentations
train_datagen = ImageDataGenerator(rescale = 1./255,
shear_range = 0.2,
zoom_range = 0.2,
horizontal_flip = True)
test_datagen = ImageDataGenerator(rescale = 1./255)
# Providing the path of the training and test datasets
# Setting the image input size as (224, 224)
# We use the categorical class mode, which yields one-hot labels for our two classes
training_set = train_datagen.flow_from_directory('training_set',
target_size = (224, 224),
batch_size = 32,
class_mode = 'categorical')
test_set = test_datagen.flow_from_directory('test_set',
target_size = (224, 224),
batch_size = 32,
class_mode = 'categorical')
# Training the model for 5 epochs
model.fit_generator(training_set,
steps_per_epoch = 8000,
epochs = 5,
validation_data = test_set,
validation_steps = 2000)
# We will try to train the last stage of ResNet50
for layer in base_model.layers[0:143]:
    layer.trainable = False
for layer in base_model.layers[143:]:
    layer.trainable = True
# Training the model for 10 epochs
model.fit_generator(training_set,
steps_per_epoch = 8000,
epochs = 10,
validation_data = test_set,
validation_steps = 2000)
# Saving the weights in the current directory
model.save_weights("resnet50_weights.h5")
# Predicting the final result of image
test_image = image.load_img('cat_or_dog_test.jpg', target_size = (224, 224))
test_image = image.img_to_array(test_image)
# Expanding the 3-d image to 4-d image.
# The dimensions will be Batch, Height, Width, Channel
test_image = np.expand_dims(test_image, axis = 0)
# Predicting the final class
classifier = LogisticRegression()
result = classifier.predict(test_image)
# Fetching the class labels
labels = training_set.class_indices
labels = list(labels.items())
# Printing the final label
for label, i in labels:
    if i == result:
        print("The test image has: ", label)
        break
I had the same problem when using: from keras import Input;
But, when I change to: from tensorflow.keras import Input, it works!
I assume that the following line is where the error occurs:
test_image = np.expand_dims(test_image, axis = 0)
The reason is probably that you are trying to apply a NumPy function to a tensor. Don't do that. Either convert your tensor to NumPy or use a function that works on tensors. Normally I'd prefer the second option over the first (it avoids unnecessary conversions and makes your code more efficient). In your case you will need to convert your tensor to NumPy because you are using sklearn afterward:
test_image = np.expand_dims(test_image.numpy(), axis=0)
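For reference, here is a small sketch of what the np.expand_dims step does to the shape. It uses a zero-filled NumPy array as a stand-in for the loaded image, which is an assumption; the real array would come from img_to_array. If the value is already a NumPy array, as in this sketch, no .numpy() call is needed.

```python
import numpy as np

# Stand-in for the image array: height, width, channels (assumed shape).
test_image = np.zeros((224, 224, 3), dtype="float32")

# Add a leading batch axis so the shape becomes batch, height, width, channels.
batched = np.expand_dims(test_image, axis=0)
print(batched.shape)  # (1, 224, 224, 3)
```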
I am new to DL and I received a similar error, and the following helped me.
Try:
del base_model
Before:
base_model = ResNet50(include_top=False, weights='imagenet')
and also simultaneously:
Try:
del model
Before:
model = Model(inputs=base_model.input, outputs=predictions)
Please let me know if this has helped you or hasn't :) .
Try using tensorflow.keras.something instead of keras.something.
It worked for me.
Of course, you also have to import tensorflow.

Training a tf.keras model with a basic low-level TensorFlow training loop doesn't work

Note: All code for a self-contained example to reproduce my problem can be found below.
I have a tf.keras.models.Model instance and need to train it with a training loop written in the low-level TensorFlow API.
The problem:
Training the exact same tf.keras model once with a basic, standard low-level TensorFlow training loop and once with Keras' own model.fit() method produces very different results. I would like to find out what I'm doing wrong in my low-level TF training loop.
The model is a simple image classification model that I train on Caltech256 (link to tfrecords below).
With the low-level TensorFlow training loop, the training loss first decreases as it should, but then after just 1000 training steps, the loss plateaus and then starts increasing again:
Training the same model on the same dataset using the normal Keras training loop, on the other hand, works as expected:
What am I missing in my low-level TensorFlow training loop?
Here is the code to reproduce the problem (download the TFRecords with the link at the bottom):
import tensorflow as tf
from tqdm import trange
import sys
import glob
import os
sess = tf.Session()
tf.keras.backend.set_session(sess)
num_classes = 257
image_size = (224, 224, 3)
# Build a tf.data.Dataset from TFRecords.
tfrecord_directory = 'path/to/tfrecords/directory'
tfrecord_filennames = glob.glob(os.path.join(tfrecord_directory, '*.tfrecord'))
feature_schema = {'image': tf.FixedLenFeature([], tf.string),
'filename': tf.FixedLenFeature([], tf.string),
'label': tf.FixedLenFeature([], tf.int64)}
dataset = tf.data.Dataset.from_tensor_slices(tfrecord_filennames)
dataset = dataset.shuffle(len(tfrecord_filennames)) # Shuffle the TFRecord file names.
dataset = dataset.flat_map(lambda filename: tf.data.TFRecordDataset(filename))
dataset = dataset.map(lambda single_example_proto: tf.parse_single_example(single_example_proto, feature_schema)) # Deserialize tf.Example objects.
dataset = dataset.map(lambda sample: (sample['image'], sample['label']))
dataset = dataset.map(lambda image, label: (tf.image.decode_jpeg(image, channels=3), label)) # Decode JPEG images.
dataset = dataset.map(lambda image, label: (tf.image.resize_image_with_pad(image, target_height=image_size[0], target_width=image_size[1]), label))
dataset = dataset.map(lambda image, label: (tf.image.per_image_standardization(image), label))
dataset = dataset.map(lambda image, label: (image, tf.one_hot(indices=label, depth=num_classes))) # Convert labels to one-hot format.
dataset = dataset.shuffle(buffer_size=10000)
dataset = dataset.repeat()
dataset = dataset.batch(32)
iterator = dataset.make_one_shot_iterator()
features, labels = iterator.get_next()
# Build a simple model.
input_tensor = tf.keras.layers.Input(shape=image_size)
x = tf.keras.layers.Conv2D(64, (3,3), strides=(2,2), activation='relu', kernel_initializer='he_normal')(input_tensor)
x = tf.keras.layers.Conv2D(64, (3,3), strides=(2,2), activation='relu', kernel_initializer='he_normal')(x)
x = tf.keras.layers.Conv2D(128, (3,3), strides=(2,2), activation='relu', kernel_initializer='he_normal')(x)
x = tf.keras.layers.Conv2D(256, (3,3), strides=(2,2), activation='relu', kernel_initializer='he_normal')(x)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
x = tf.keras.layers.Dense(num_classes, activation=None, kernel_initializer='he_normal')(x)
model = tf.keras.models.Model(input_tensor, x)
This is the simple TensorFlow training loop:
# Build the training-relevant part of the graph.
model_output = model(features)
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(labels=tf.stop_gradient(labels), logits=model_output))
train_op = tf.train.AdamOptimizer().minimize(loss)
# The next block is for the metrics.
with tf.variable_scope('metrics') as scope:
    predictions_argmax = tf.argmax(model_output, axis=-1, output_type=tf.int64)
    labels_argmax = tf.argmax(labels, axis=-1, output_type=tf.int64)
    mean_loss_value, mean_loss_update_op = tf.metrics.mean(loss)
    acc_value, acc_update_op = tf.metrics.accuracy(labels=labels_argmax, predictions=predictions_argmax)
    local_metric_vars = tf.contrib.framework.get_variables(scope=scope, collection=tf.GraphKeys.LOCAL_VARIABLES)
    metrics_reset_op = tf.variables_initializer(var_list=local_metric_vars)
# Run the training
epochs = 3
steps_per_epoch = 1000
fetch_list = [mean_loss_value,
acc_value,
train_op,
mean_loss_update_op,
acc_update_op]
sess.run(tf.global_variables_initializer())
sess.run(tf.local_variables_initializer())
with sess.as_default():
    for epoch in range(1, epochs+1):
        tr = trange(steps_per_epoch, file=sys.stdout)
        tr.set_description('Epoch {}/{}'.format(epoch, epochs))
        sess.run(metrics_reset_op)
        for train_step in tr:
            ret = sess.run(fetch_list, feed_dict={tf.keras.backend.learning_phase(): 1})
            tr.set_postfix(ordered_dict={'loss': ret[0],
                                         'accuracy': ret[1]})
Below is the standard Keras training loop, which works as expected. Note that the activation of the dense layer in the model above needs to be changed from None to 'softmax' in order for the Keras loop to work.
epochs = 3
steps_per_epoch = 1000
model.compile(optimizer='adam',
loss='categorical_crossentropy',
metrics=['accuracy'])
history = model.fit(dataset,
epochs=epochs,
steps_per_epoch=steps_per_epoch)
You can download the TFRecords for the Caltech256 dataset here (about 850 MB).
UPDATE:
I've managed to solve the problem: Replacing the low-level TF loss function
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(labels=tf.stop_gradient(labels), logits=model_output))
by its Keras equivalent
loss = tf.reduce_mean(tf.keras.backend.categorical_crossentropy(target=labels, output=model_output, from_logits=True))
does the trick. Now the low-level TensorFlow training loop behaves just like model.fit().
This raises a new question:
What does tf.keras.backend.categorical_crossentropy() do that tf.nn.softmax_cross_entropy_with_logits_v2() doesn't that leads the latter to perform much worse? (I know that the latter needs logits, not softmax output, so that's not the issue)
Replacing the low-level TF loss function
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(labels=tf.stop_gradient(labels), logits=model_output))
by its Keras equivalent
loss = tf.reduce_mean(tf.keras.backend.categorical_crossentropy(target=labels, output=model_output, from_logits=True))
does the trick. Now the low-level TensorFlow training loop behaves just like model.fit().
However, I don't know why this is. If anyone knows why tf.keras.backend.categorical_crossentropy() behaves well while tf.nn.softmax_cross_entropy_with_logits_v2() doesn't work at all, please post an answer.
Another important note:
In order to train a tf.keras model with a low-level TF training loop and a tf.data.Dataset object, one generally shouldn't call the model on the iterator output. That is, one shouldn't do this:
model_output = model(features)
Instead, one should create a model in which the input layer is set to build on the iterator output instead of creating a placeholder, like so:
input_tensor = tf.keras.layers.Input(tensor=features)
This doesn't matter in this example, but it becomes relevant if any layers in the model have internal updates that need to be run during the training (e.g. BatchNormalization).
You apply a softmax activation on your last layer
x = tf.keras.layers.Dense(num_classes, activation='softmax', kernel_initializer='he_normal')(x)
and you apply again a softmax when using
tf.nn.softmax_cross_entropy_with_logits_v2 as it expects unscaled logits. From the documentation:
WARNING: This op expects unscaled logits, since it performs a softmax
on logits internally for efficiency. Do not call this op with the
output of softmax, as it will produce incorrect results.
Thus, remove the softmax activation of your last layer and it should work.
x = tf.keras.layers.Dense(num_classes, activation=None, kernel_initializer='he_normal')(x)
[...]
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(labels=tf.stop_gradient(labels), logits=model_output))
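To illustrate the warning quoted above, here is a NumPy-only sketch (not the TensorFlow implementation) of a cross-entropy that applies softmax internally. Feeding it softmax output instead of raw logits applies softmax twice, and the loss comes out noticeably larger even for a confident, correct prediction. The helper names and the example values are made up for illustration.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy_with_logits(labels, logits):
    """Cross-entropy that applies softmax to its input internally."""
    log_probs = np.log(softmax(logits))
    return -(labels * log_probs).sum(axis=-1)

logits = np.array([[5.0, 0.0, 0.0]])  # confident prediction for class 0
labels = np.array([[1.0, 0.0, 0.0]])  # one-hot target for class 0

# Correct usage: raw logits go in.
loss_from_logits = cross_entropy_with_logits(labels, logits)[0]
# Incorrect usage: softmax output goes in, so softmax is applied twice.
loss_from_probs = cross_entropy_with_logits(labels, softmax(logits))[0]
print(loss_from_logits, loss_from_probs)
```

Because softmax output lies in [0, 1], the second softmax can never produce a sharply peaked distribution, so the doubly-softmaxed loss stays well above zero.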

Same model produces consistently different accuracies in Keras and Tensorflow

I'm trying to implement the same model in Keras and in TensorFlow (using Keras layers), on custom data. The two models produce consistently different accuracies over many training runs (Keras ~71%, TensorFlow ~65%). I want the TensorFlow version to do as well as the Keras one so that I can go into the TensorFlow iterations and tweak some lower-level algorithms.
Here's my original Keras code:
import keras
from keras.layers import Dense, Dropout, Input
from keras.models import Model, Sequential
from keras import backend as K
input_size = 2000
num_classes = 4
num_industries = 22
num_aux_inputs = 3
main_input = Input(shape=(input_size,),name='text_vectors')
x = Dense(units=64, activation='relu', name = 'dense1')(main_input)
drop1 = Dropout(0.2,name='dropout1')(x)
auxiliary_input = Input(shape=(num_aux_inputs,), name='aux_input')
x = keras.layers.concatenate([drop1,auxiliary_input])
x = Dense(units=64, activation='relu',name='dense2')(x)
drop2 = Dropout(0.1,name='dropout2')(x)
x = Dense(units=32, activation='relu',name='dense3')(drop2)
main_output = Dense(units=num_classes,
activation='softmax',name='main_output')(x)
model = Model(inputs=[main_input, auxiliary_input],
outputs=main_output)
model.compile(loss=keras.losses.categorical_crossentropy, metrics= ['accuracy'],optimizer=keras.optimizers.Adadelta())
history = model.fit([train_x,train_x_auxiliary], train_y, batch_size=128, epochs=20, verbose=1, validation_data=([val_x,val_x_auxiliary], val_y))
loss, accuracy = model.evaluate([val_x,val_x_auxiliary], val_y, verbose=0)
Here's how I moved the Keras layers to TensorFlow, following this article:
import tensorflow as tf
from keras import backend as K
import keras
from keras.layers import Dense, Dropout, Input # Dense layers are "fully connected" layers
from keras.metrics import categorical_accuracy as accuracy
from keras.objectives import categorical_crossentropy
tf.reset_default_graph()
sess = tf.Session()
K.set_session(sess)
input_size = 2000
num_classes = 4
num_industries = 22
num_aux_inputs = 3
x = tf.placeholder(tf.float32, shape=[None, input_size], name='X')
x_aux = tf.placeholder(tf.float32, shape=[None, num_aux_inputs], name='X_aux')
y = tf.placeholder(tf.float32, shape=[None, num_classes], name='Y')
# build graph
layer = Dense(units=64, activation='relu', name = 'dense1')(x)
drop1 = Dropout(0.2,name='dropout1')(layer)
layer = keras.layers.concatenate([drop1,x_aux])
layer = Dense(units=64, activation='relu',name='dense2')(layer)
drop2 = Dropout(0.1,name='dropout2')(layer)
layer = Dense(units=32, activation='relu',name='dense3')(drop2)
output_logits = Dense(units=num_classes, activation='softmax',name='main_output')(layer)
loss = tf.reduce_mean(categorical_crossentropy(y, output_logits))
acc_value = tf.reduce_mean(accuracy(y, output_logits))
correct_prediction = tf.equal(tf.argmax(output_logits, 1), tf.argmax(y, 1), name='correct_pred')
optimizer = tf.train.AdadeltaOptimizer(learning_rate=1.0, rho=0.95,epsilon=tf.keras.backend.epsilon()).minimize(loss)
init = tf.global_variables_initializer()
sess.run(init)
epochs = 20 # Total number of training epochs
batch_size = 128 # Training batch size
display_freq = 300 # Frequency of displaying the training results
num_tr_iter = int(len(y_train) / batch_size)
with sess.as_default():
    for epoch in range(epochs):
        print('Training epoch: {}'.format(epoch + 1))
        # Randomly shuffle the training data at the beginning of each epoch
        x_train, x_train_aux, y_train = randomize(x_train, x_train_auxiliary, y_train)
        for iteration in range(num_tr_iter):
            start = iteration * batch_size
            end = (iteration + 1) * batch_size
            x_batch, x_aux_batch, y_batch = get_next_batch(x_train, x_train_aux, y_train, start, end)
            # Run optimization op (backprop)
            feed_dict_batch = {x: x_batch, x_aux:x_aux_batch, y: y_batch,K.learning_phase(): 1}
            optimizer.run(feed_dict=feed_dict_batch)
I also implemented the whole model from scratch in tensorflow, but it also is a ~65% accuracy, so I decided to try this Keras-layers-within-TF set up to identify problems.
I've looked up posts on similar problems with Keras and Tensorflow, and have tried the following which didn't help in my case:
Keras's dropout layer is only active in the training phase, so I did the same in my tf code by setting keras.backend.learning_phase().
Keras and TensorFlow have different variable initializations. I've tried initializing my weights in TensorFlow in the following three ways, which are supposed to match Keras's weight initialization, but they didn't affect the accuracies either:
initer = tf.glorot_uniform_initializer()
initer = tf.contrib.layers.xavier_initializer()
initer = tf.random_normal(shape) * (np.sqrt(2.0/(shape[0] + shape[1])))
The optimizers in the two versions are set up to be exactly the same! That said, the accuracy doesn't seem to depend on the optimizer: I tried different optimizers in both Keras and TF, and each version's accuracy converges to the same value regardless.
Help!
It seems to me that this is most probably a weight initialization problem. I would suggest initializing the Keras layers, reading out their weights before training, and initializing the TF layers with those values.
I have run into this kind of problem before, and that solved it for me, but it was a long time ago and I don't know whether the initializers have since been made the same. At that time, the TF and Keras initializations were clearly not the same.
I checked the initializers, seed, parameters and hyperparameters, but the accuracy is still different.
I checked the Keras code: it randomly shuffles the batches of images before feeding them into the network, so the shuffling differs across engines. We therefore need a way to feed the same batches to both networks in order to get the same accuracy.
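One way to feed identical batches to both versions is to shuffle with a seeded generator. The question calls randomize and get_next_batch without showing them, so the implementations below are assumptions, sketched with NumPy on toy arrays. Note that if randomize is repeatedly given already-shuffled main inputs and labels together with the original, unshuffled auxiliary array, the auxiliary rows would stop lining up from the second epoch onward; the sketch reassigns all three arrays each epoch to avoid that.

```python
import numpy as np

def randomize(x, x_aux, y, rng):
    """Shuffle three aligned arrays with one shared permutation."""
    perm = rng.permutation(len(y))
    return x[perm], x_aux[perm], y[perm]

def get_next_batch(x, x_aux, y, start, end):
    """Slice the same row range out of all three arrays."""
    return x[start:end], x_aux[start:end], y[start:end]

# Toy data in which row i of every array encodes the value i.
rng = np.random.default_rng(seed=42)
x = np.arange(10).reshape(10, 1)
x_aux = np.arange(10).reshape(10, 1) * 10
y = np.arange(10)

for epoch in range(2):
    # Reassign all three arrays so they stay aligned across epochs.
    x, x_aux, y = randomize(x, x_aux, y, rng)
    x_batch, x_aux_batch, y_batch = get_next_batch(x, x_aux, y, 0, 4)
```

Reusing the same seed in both training scripts gives the same sequence of permutations, and therefore the same batches.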

Autoencoder Gridsearch Hyperparameter tuning Keras

My real data has the same shape as below; here I just generate random numbers. The real values are floats in the range -6 to 6, which I also scale. The input layer size and the encoding dimension have to stay the same. During training, the loss starts at 0.631 and stays there the whole time. I have tried changing the learning rate manually. I am new to Python and do not know how to add a grid search to this code to find the right parameters. What else can I do to tune my network?
import numpy as np
from keras.layers import Input, Dense
from keras.models import Model
from keras import optimizers
#Train data
x_train = np.random.rand(2666000)
x_train = (x_train - x_train.min()) / (x_train.max() - x_train.min())
x_train = x_train.reshape(-1, 2000)
x_test=[]#empty testing later
#Enc Dimension
encoding_dim=100
#Input shape
input_dim = Input(shape=(2000,))
#Encoding Layer
encoded = Dense(encoding_dim, activation='relu')(input_dim)
#Decoding Layer
decoded = Dense(2000, activation='sigmoid')(encoded)
#Model AE
autoencoder = Model(input_dim, decoded)
#Model Encoder
encoder = Model(input_dim, encoded)
#Encoding
encoded_input = Input(shape=(encoding_dim,))
#Decoding
decoder_layer = autoencoder.layers[-1]
#Model Decoder
decoder = Model(encoded_input, decoder_layer(encoded_input))
optimizer = optimizers.Adadelta(lr=0.1, rho=0.95, epsilon=None, decay=0.0)
autoencoder.compile(optimizer=optimizer, loss='binary_crossentropy',
                    metrics=['accuracy'])
#Train and test
# Note: `epochs` is not defined in this snippet; set it before calling fit.
autoencoder_train = autoencoder.fit(x_train, x_train,
                                    epochs=epochs, shuffle=False, batch_size=2048)
I suggest adding more hidden layers. If your loss stays the same it means at least one of two things:
Your data is more or less random and there are no relationships to be drawn
Your model is not complex enough to learn meaningful relationships from your data
A rule of thumb for me is that a model should be powerful enough to overfit the data given enough training iterations.
Unfortunately, there is a fine line between sufficiently complex and too complex. You have to play around with the number of hidden layers, the number of units in each layer, and the number of epochs you train your network for. Since you only have two Dense layers, a good starting point would be to increase model complexity.
If you insist on using a grid search, Keras has a wrapper for scikit-learn, and sklearn has a grid search module. A toy example:
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV
# build_fn must accept every parameter named in param_grid
def create_model(activation='relu', learn_rate=0.1, init='uniform', optimizer='SGD'):
    # <return a compiled but untrained keras model built from these arguments>
    ...
model = KerasClassifier(build_fn = create_model, batch_size=1000, epochs=10)
# Now write out all the parameters you want to try out for the grid search
activation = ['relu', 'tanh', 'sigmoid']  # ...
learn_rate = [0.1, 0.2]  # ...
init = ['uniform', 'normal', 'zero']  # ...
optimizer = ['SGD', 'Adam']  # ...
param_grid = dict(activation=activation, learn_rate=learn_rate, init=init, optimizer=optimizer)
grid = GridSearchCV(estimator=model, param_grid=param_grid)
result = grid.fit(X, y)
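Conceptually, a grid search just tries every combination of the listed values and keeps the best one. The standard-library sketch below shows only that enumeration; it does no cross-validation, unlike GridSearchCV. The evaluate function is a hypothetical stand-in for training a model and returning its loss, and the grid values are made up for illustration.

```python
from itertools import product

# Small illustrative grid (values are assumptions, not recommendations).
param_grid = {
    "activation": ["relu", "tanh"],
    "learn_rate": [0.1, 0.2],
}

def evaluate(params):
    """Hypothetical stand-in: train a model with `params` and return its loss."""
    penalty = 0.0 if params["activation"] == "relu" else 0.5
    return params["learn_rate"] + penalty

# Build one dict per combination of parameter values.
names = list(param_grid)
combinations = [dict(zip(names, values)) for values in product(*param_grid.values())]

# Keep the combination with the lowest loss.
best = min(combinations, key=evaluate)
print(len(combinations), best)  # 4 {'activation': 'relu', 'learn_rate': 0.1}
```

The number of combinations is the product of the list lengths, so the cost grows quickly as parameters are added.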
