Use TensorFlow learning-rate decay in a Keras-to-TPU model (Python)

I'm following the "How to train Keras model x20 times faster with TPU for free" guide to run a Keras model on Google's Colab TPU. It works perfectly. But I like to use cosine-restart learning-rate decay when I fit my models. I've coded up my own as a Keras callback, but it won't work within this framework because the TensorFlow TFOptimizer class doesn't have a learning-rate variable that can be reset. I see that TensorFlow itself has a bunch of decay functions in tf.train, like tf.train.cosine_decay, but I can't figure out how to embed one within my model.
Here's the basic code from that blog post. Does anyone have a fix?
import tensorflow as tf
import os
from tensorflow.python.keras.layers import Input, LSTM, Bidirectional, Dense, Embedding
def make_model(batch_size=None):
    source = Input(shape=(maxlen,), batch_size=batch_size,
                   dtype=tf.int32, name='Input')
    embedding = Embedding(input_dim=max_features,
                          output_dim=128, name='Embedding')(source)
    lstm = LSTM(32, name='LSTM')(embedding)
    predicted_var = Dense(1, activation='sigmoid', name='Output')(lstm)
    model = tf.keras.Model(inputs=[source], outputs=[predicted_var])
    model.compile(
        optimizer=tf.train.RMSPropOptimizer(learning_rate=0.01),
        loss='binary_crossentropy',
        metrics=['acc'])
    return model
training_model = make_model(batch_size=128)
# This address identifies the TPU we'll use when configuring TensorFlow.
TPU_WORKER = 'grpc://' + os.environ['COLAB_TPU_ADDR']
tf.logging.set_verbosity(tf.logging.INFO)
tpu_model = tf.contrib.tpu.keras_to_tpu_model(
    training_model,
    strategy=tf.contrib.tpu.TPUDistributionStrategy(
        tf.contrib.cluster_resolver.TPUClusterResolver(TPU_WORKER)))
history = tpu_model.fit(x_train, y_train,
                        epochs=20,
                        batch_size=128 * 8,
                        validation_split=0.2)

One option is to manually set the learning rates - there is a Keras+TPU example with a callback here: https://github.com/tensorflow/tpu/blob/master/models/experimental/resnet50_keras/resnet50.py#L197-L201

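The following seems to work, where lr is the initial learning rate you choose and n_steps is the number of initial steps over which you want the cosine decay to work (the length of the first decay period; with t_mul=1.5 each subsequent period is 1.5 times longer).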
def make_model(batch_size=None, lr=1.e-3, n_steps=2000):
    source = Input(shape=(maxlen,), batch_size=batch_size,
                   dtype=tf.int32, name='Input')
    embedding = Embedding(input_dim=max_features,
                          output_dim=128, name='Embedding')(source)
    lstm = LSTM(32, name='LSTM')(embedding)
    predicted_var = Dense(1, activation='sigmoid', name='Output')(lstm)
    model = tf.keras.Model(inputs=[source], outputs=[predicted_var])
    # implement cosine decay or other learning rate decay here
    # The schedule only moves if this step counter is incremented during training;
    # rebinding the name to a plain integer (e.g. global_step = 1) freezes the learning rate.
    global_step = tf.Variable(0, trainable=False)
    learning_rate = tf.train.cosine_decay_restarts(
        learning_rate=lr,
        global_step=global_step,
        first_decay_steps=n_steps,
        t_mul=1.5,
        m_mul=1.,
        alpha=0.1
    )
    # now feed this into the optimizer as shown below
    model.compile(
        optimizer=tf.train.RMSPropOptimizer(learning_rate=learning_rate),
        loss='binary_crossentropy',
        metrics=['acc'])
    return model
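To see what the parameters above actually produce, here is a minimal pure-Python sketch of the schedule that the TF 1.x documentation gives for tf.train.cosine_decay_restarts (an SGDR-style cosine with warm restarts). The helper name cosine_decay_restarts and its signature are illustrative, not TensorFlow's API. The same function could in principle be evaluated per step inside a learning-rate-setting callback like the one linked above, though that has not been verified on the Keras-to-TPU path.

```python
import math

def cosine_decay_restarts(step, learning_rate, first_decay_steps,
                          t_mul=2.0, m_mul=1.0, alpha=0.0):
    """Pure-Python sketch of the cosine-decay-with-restarts formula."""
    completed_fraction = step / first_decay_steps
    if t_mul == 1.0:
        # All periods have the same length: the integer part counts restarts.
        i_restart = math.floor(completed_fraction)
        completed_fraction -= i_restart
    else:
        # Period i lasts first_decay_steps * t_mul**i steps; find the current period.
        i_restart = math.floor(
            math.log(1.0 - completed_fraction * (1.0 - t_mul)) / math.log(t_mul))
        sum_r = (1.0 - t_mul ** i_restart) / (1.0 - t_mul)
        completed_fraction = (completed_fraction - sum_r) / t_mul ** i_restart
    m_fac = m_mul ** i_restart  # peak height shrinks by m_mul after each restart
    cosine_decayed = 0.5 * m_fac * (1.0 + math.cos(math.pi * completed_fraction))
    decayed = (1.0 - alpha) * cosine_decayed + alpha  # alpha = floor as a fraction of lr
    return learning_rate * decayed

for step in (0, 1000, 2000, 3500):
    print(step, cosine_decay_restarts(step, 1e-3, 2000, t_mul=1.5, m_mul=1.0, alpha=0.1))
```

With lr=1e-3, n_steps=2000, t_mul=1.5 and alpha=0.1, the rate falls to 5.5e-4 halfway through the first period, jumps back to 1e-3 at step 2000, and the second period lasts 3000 steps, so step 3500 is again a midpoint at 5.5e-4.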

Related

No hparams data was found when using tensorboard with keras-tuner

versions: tensorboard==2.9.0, keras-tuner==1.1.2
Here is a simple binary-classification model, with hyperparameters to search added using keras-tuner.
def build_model(hp):
    n_layers = 4
    n_features = len(X_train.columns)
    inputs = tf.keras.Input(shape=(n_features,))
    dense = tf.keras.layers.Dense(hp.Int("input_units", min_value=128, max_value=256, step=32),
                                  activation=hp.Choice("activation", ['relu', 'tanh'])
                                  )(inputs)
    dense = tf.keras.layers.Dropout(0.2)(dense)
    # num_layer as hyperparameter
    for i in range(hp.Int("dense_layer", 1, n_layers)):
        dense = tf.keras.layers.Dense(hp.Int(f"hidden_unit_{i}", 128, 256, 32),
                                      activation=hp.Choice("activation", ['relu', 'tanh'])
                                      )(dense)
    output = tf.keras.layers.Dense(1, activation='sigmoid')(dense)
    model = tf.keras.Model(inputs=inputs, outputs=output)
    lr = hp.Float("lr", min_value=1e-4, max_value=1e-1, sampling="log")
    model.compile(optimizer=tf.keras.optimizers.Adam(lr),
                  loss=tf.keras.losses.BinaryCrossentropy(),
                  metrics=metrics)
    return model
The hyperparameter search space would be:
{neurons: [128, 160, 192, 224, 256],
 num_hidden_layers: [1, 2, 3],
 activation_function: ['relu', 'tanh'],
 learning_rate: [0.0001, 0.001, 0.01]}
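To see which values the model code can actually pick, here is a small pure-Python sketch, assuming keras-tuner's documented behavior that hp.Int includes max_value and that hp.Float(..., sampling="log") samples a continuous log-scaled interval. int_choices and sample_log_uniform are hypothetical helpers that mimic these ranges; they are not keras-tuner code.

```python
import math
import random

def int_choices(min_value, max_value, step=1):
    # hp.Int treats max_value as inclusive, so it belongs to the grid.
    return list(range(min_value, max_value + 1, step))

def sample_log_uniform(min_value, max_value, rng=random):
    # Log-uniform draw: uniform in log space, so any value in the interval is possible.
    return math.exp(rng.uniform(math.log(min_value), math.log(max_value)))

print(int_choices(128, 256, 32))  # units per Dense layer
print(int_choices(1, 4))          # hp.Int("dense_layer", 1, n_layers) with n_layers = 4
print(sample_log_uniform(1e-4, 1e-1, random.Random(0)))
```

Under those assumptions, the model code allows 1 to 4 hidden layers, and the learning rate can be any value between 1e-4 and 1e-1 rather than only the three listed powers of ten.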
Now begin the search:
tuner = RandomSearch(
    build_model,
    objective=kt.Objective("val_binary_accuracy", direction="max"),
    max_trials=3,
    executions_per_trial=1,
    directory=LOG_DIR
)
tensorboard_cb = tf.keras.callbacks.TensorBoard('logs/hyp_tune/')
tuner.search(X_train, y_train, epochs=10, batch_size=512,
             validation_data=(X_test, y_test),
             callbacks=[tensorboard_cb]
             )
According to the keras-tuner guide (https://keras.io/guides/keras_tuner/visualize_tuning/), this should work and show the HParams when TensorBoard is opened.
However, when I select the HPARAMS tab, it shows the message below:
No hparams data was found.
Probable causes:
You haven’t written any hparams data to your event files.
Event files are still being loaded (try reloading this page).
TensorBoard can’t find your event files.
If you’re new to using TensorBoard, and want to find out how to add data and set up your event files, check out the README and perhaps the TensorBoard tutorial.
If you think TensorBoard is configured properly, please see the section of the README devoted to missing data problems and consider filing an issue on GitHub.
I've tried re-running the search and restarting the notebook, but still no luck.
[EDIT]
When I load TensorBoard with tensorboard --logdir='logs/t1', it should show logs/t1 under Runs on the left side of the screen, but it shows logs/t0 instead, which is a previous run (a simple model run without hyperparameter tuning). I think that because it is showing the previous run without hyperparameter tuning, there is no data in the HPARAMS tab. How can I delete the previous log and load the new one? (Writing the hyperparameter-tuning logs over 'logs/t0' works fine.)
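For the question in the edit, one way to start from a clean slate is to delete the old run directory before searching again. A minimal sketch using only the standard library; reset_log_dir is a hypothetical helper, and 'logs/t1' is just the path from the edit:

```python
import shutil
from pathlib import Path

def reset_log_dir(log_dir):
    """Delete an existing TensorBoard log directory (if any) and recreate it empty."""
    path = Path(log_dir)
    if path.exists():
        shutil.rmtree(path)  # removes old event files left by a previous run
    path.mkdir(parents=True)
    return path
```

Calling reset_log_dir('logs/t1') before tuner.search(...), and pointing both the TensorBoard callback and --logdir at that same directory, should leave only the new run's event files; a notebook TensorBoard instance may still need a reload to drop the old run from its list.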
I wrote this code and it runs correctly.
At the end, run these two commands to get the output:
%load_ext tensorboard
%tensorboard --logdir /logs/hyp_tune/
Full code:
# !pip install keras-tuner -q
import numpy as np
import keras_tuner
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
(x_train, y_train), (x_test, y_test) = (np.random.rand(1000,4), np.random.rand(1000)) , (np.random.rand(100,4), np.random.rand(100))
def build_model(hp):
    n_layers = 4
    n_features = x_train.shape[1]
    inputs = tf.keras.Input(shape=(n_features,))
    dense = tf.keras.layers.Dense(hp.Int("input_units", min_value=128, max_value=256, step=32),
                                  activation=hp.Choice("activation", ['relu', 'tanh'])
                                  )(inputs)
    dense = tf.keras.layers.Dropout(0.2)(dense)
    # num_layer as hyperparameter
    for i in range(hp.Int("dense_layer", 1, n_layers)):
        dense = tf.keras.layers.Dense(hp.Int(f"hidden_unit_{i}", 128, 256, 32),
                                      activation=hp.Choice("activation", ['relu', 'tanh'])
                                      )(dense)
    output = tf.keras.layers.Dense(1, activation='sigmoid')(dense)
    model = tf.keras.Model(inputs=inputs, outputs=output)
    lr = hp.Float("lr", min_value=1e-4, max_value=1e-1, sampling="log")
    model.compile(optimizer=tf.keras.optimizers.Adam(lr),
                  loss=tf.keras.losses.BinaryCrossentropy(),
                  metrics=["accuracy"])
    return model
hp = keras_tuner.HyperParameters()
model = build_model(hp)
model.summary()
tuner = keras_tuner.RandomSearch(
    build_model,
    max_trials=10,
    overwrite=True,
    objective="val_accuracy",
    # Set a directory to store the intermediate results.
    directory="/logs/hyp_tune/",
)
tensorboard_cb = tf.keras.callbacks.TensorBoard('/logs/hyp_tune/')
tuner.search(
    x_train,
    y_train,
    validation_data=(x_test, y_test),
    batch_size=512,
    epochs=10,
    callbacks=[tensorboard_cb],
)

How to fit basic custom built model in tensorflow

I am used to working in PyTorch but now have to learn Tensorflow for my job. I am trying to get up to speed by creating a simple dense network and training it on the MNIST dataset, but I cannot get it to train. My super simple code:
import tensorflow as tf
from tensorflow.keras import Input
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.utils import to_categorical
# Load mnist data from keras
(train_data, train_label), (test_data, test_label) = tf.keras.datasets.mnist.load_data(path="mnist.npz")
train_label, test_label = to_categorical(train_label), to_categorical(test_label)
train_data, train_label, test_data, test_label = Flatten()(train_data), Flatten()(train_label), Flatten()(test_data), Flatten()(test_label)
# Create generic SGD optimizer (no learning schedule)
optimizer = SGD(learning_rate = 0.01)
# Define function to build and compile model
def build_mnist_model(input_shape, batch_size = 30):
    input_img = Input(shape = input_shape, batch_size = batch_size)
    # Pass through dense layer
    x = Dense(200, activation = 'relu', use_bias = True)(input_img)
    x = Dense(400, activation = 'relu', use_bias = True)(x)
    scores = Dense(10, activation = 'softmax', use_bias = True)(x)
    # Create and compile tf model
    mnist_model = Model(input_img, scores)
    mnist_model.compile(optimizer = optimizer, loss = 'categorical_crossentropy')
    return mnist_model
# Build the model
mnist_model = build_mnist_model(train_data[0].shape)
# Train the model
mnist_model.fit(
    x = train_data,
    y = train_label,
    batch_size = 30,
    epochs = 20,
    verbose = 2,
    shuffle = True,
    # steps_per_epoch = 200
)
When I run this I get
ValueError: When using data tensors as input to a model, you should specify the `steps_per_epoch` argument.
This does not really make sense to me, because my train_data and train_label are just regular tensors, and per the TensorFlow documentation, in this case it should default to the number of samples in the dataset divided by the batch size (which would be 60000 / 30 = 2000 in my case).
At any rate, I tried specifying steps_per_epoch = 200 when I call mnist_model.fit() but then I get a different error:
InvalidArgumentError: Incompatible shapes: [60000,10] vs. [30,1]
[[{{node training_4/SGD/gradients/gradients/loss_5/dense_17_loss/softmax_cross_entropy_with_logits_grad/mul}}]]
I can't seem to discern where a size mismatch would come from. In PyTorch, I am used to manually creating batches (by subindexing my data and label tensors) but in Tensorflow this seems to happen automatically. As such, this leaves me quite confused about what batch has the wrong size, how it got the wrong size, etc. I hope this simple model is way easier than I am making it and I just do not know the Tensorflow tricks yet.
Thanks for the help.
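In the code above, calling Flatten() on the NumPy arrays turns them into TensorFlow tensors, which is most likely what triggers the steps_per_epoch error, and Input(..., batch_size=30) fixes the batch dimension at 30. A minimal NumPy-only sketch of the preprocessing, assuming the standard (n, 28, 28) uint8 images and integer labels returned by mnist.load_data(); prepare_mnist is a hypothetical helper name, and its one-hot step does the same job as to_categorical:

```python
import numpy as np

def prepare_mnist(images, labels, num_classes=10):
    """Flatten images and one-hot encode labels, keeping everything as NumPy arrays."""
    # (n, 28, 28) uint8 -> (n, 784) float32 scaled to [0, 1]; a NumPy reshape keeps
    # the data as arrays instead of converting it to a tensor the way Flatten() does.
    x = images.reshape(len(images), -1).astype("float32") / 255.0
    # Integer labels (n,) -> one-hot rows (n, num_classes); no flattening needed.
    y = np.eye(num_classes, dtype="float32")[labels]
    return x, y
```

The arrays can then go straight into fit, e.g. mnist_model.fit(x_train, y_train, batch_size=30, epochs=20) with the model built from Input(shape=(784,)) and no batch_size argument; Keras then slices the 60000 samples into 2000 batches per epoch itself.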

Good training/validation accuracy but poor test accuracy

I've trained a model to classify 4 types of eye diseases using the pretrained VGG16 model. I am fairly new to machine learning, so I don't know what to make of the results.
After training it for about 6 hours on 90,000 images:
training accuracy kept increasing and the loss kept decreasing (from roughly 2 to 0.8), ending with an accuracy of 88%
validation loss kept fluctuating between 1 and 2 per epoch (validation accuracy did improve to 85%)
(I accidentally reran the cell, so I can't see the output.)
After looking at the confusion matrix, it seems my model isn't performing well on the test set.
Image_height = 196
Image_width = 300
val_split = 0.2
batches_size = 10
lr = 0.0001
spe = 512
vs = 32
epoch = 10
#Creating batches
train_batches = ImageDataGenerator(preprocessing_function=tf.keras.applications.vgg16.preprocess_input, validation_split=val_split) \
    .flow_from_directory(directory=train_folder, target_size=(Image_height, Image_width), classes=['CNV', 'DME', 'DRUSEN', 'NORMAL'], batch_size=batches_size, class_mode="categorical",
                         subset="training")
validation_batches = ImageDataGenerator(preprocessing_function=tf.keras.applications.vgg16.preprocess_input, validation_split=val_split) \
    .flow_from_directory(directory=train_folder, target_size=(Image_height, Image_width), classes=['CNV', 'DME', 'DRUSEN', 'NORMAL'], batch_size=batches_size, class_mode="categorical",
                         subset="validation")
test_batches = ImageDataGenerator(preprocessing_function=tf.keras.applications.vgg16.preprocess_input) \
    .flow_from_directory(test_folder, target_size=(Image_height, Image_width),
                         classes=['CNV', 'DME', 'DRUSEN', 'NORMAL'], batch_size=batches_size, class_mode="categorical")
#Function to create model. We will be using a pretrained model
def create():
    vgg16_model = keras.applications.vgg16.VGG16(input_tensor=Input(shape=(Image_height, Image_width, 3)), input_shape=(Image_height, Image_width, 3), include_top = False)
    model = Sequential()
    model.add(vgg16_model)
    for layer in model.layers:
        layer.trainable = False
    model.add(Flatten())
    model.add(Dense(4, activation='softmax'))
    return model
model = create()
model.compile(Adam(lr=lr), loss="categorical_crossentropy", metrics=['accuracy'])
model.fit(train_batches, steps_per_epoch=spe,
          validation_data=validation_batches, validation_steps=vs, epochs=epoch)
Any suggestions on what I can improve so the confusion matrix doesn't look so poor? I also have the model saved, if it's possible to just retrain it with more layers.
A number of issues and recommendations. You are using the VGG16 model. That model has over 40 million trainable parameters, so on a data set of 90,000 images your training time will be very long. I recommend you consider using the MobileNet model instead. It has only about 4 million trainable parameters and is essentially just as accurate as VGG16. Documentation is [here.][1] Next, irrespective of which model you use, you should set the initial weights to the ImageNet weights, so your model starts off already trained on images. I find I get better results by making all layers in the model trainable.
Now, you say your model reached an accuracy of 88%. I do not think that is very good; I believe you should aim for at least 95%. An adjustable learning rate can help with that, and the Keras callback ReduceLROnPlateau makes this easy. Documentation is [here.][2] Set it up to monitor validation loss and reduce the learning rate when the loss stops decreasing. Next, you want to save the model that has the lowest validation loss and use that to make predictions. The Keras callback ModelCheckpoint can be set up to monitor validation loss and save the model with the lowest loss. Documentation is [here.][3]
The code below shows how to implement the MobileNet model for your problem and define the callbacks. You will also have to change the generators to use MobileNet preprocessing (tf.keras.applications.mobilenet.preprocess_input) and set target_size to (224, 224). Note that preprocessing_function takes the function itself, so no () is needed after preprocess_input. Hope this helps.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adamax
lr = 0.0001                 # initial learning rate, as in the question
save_loc = 'best_model.h5'  # example path where ModelCheckpoint stores the best model
mobile = tf.keras.applications.mobilenet.MobileNet(include_top=False,
                                                   input_shape=(224, 224, 3),
                                                   pooling='max', weights='imagenet',
                                                   alpha=1, depth_multiplier=1, dropout=.5)
x = mobile.layers[-1].output
x = keras.layers.BatchNormalization(axis=-1, momentum=0.99, epsilon=0.001)(x)
predictions = Dense(4, activation='softmax')(x)
model = Model(inputs=mobile.input, outputs=predictions)
for layer in model.layers:
    layer.trainable = True
model.compile(Adamax(learning_rate=lr), loss='categorical_crossentropy', metrics=['accuracy'])
checkpoint = tf.keras.callbacks.ModelCheckpoint(filepath=save_loc, monitor='val_loss', verbose=0, save_best_only=True,
                                                save_weights_only=False, mode='auto', save_freq='epoch', options=None)
lr_adjust = tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=1, verbose=0, mode="auto",
                                                 min_delta=0.00001, cooldown=0, min_lr=0)
callbacks = [checkpoint, lr_adjust]  # pass as model.fit(..., callbacks=callbacks)
[1]: https://keras.io/api/applications/mobilenet/
[2]: https://keras.io/api/callbacks/reduce_lr_on_plateau/
[3]: https://keras.io/api/callbacks/model_checkpoint/
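To make the callback settings above concrete, here is a simplified pure-Python replay of the rule ReduceLROnPlateau applies with monitor='val_loss', factor=0.5, patience=1 and cooldown=0: whenever the validation loss fails to improve by more than min_delta for patience epochs, the learning rate is multiplied by factor, never going below min_lr. plateau_schedule is a hypothetical helper written for illustration, not the Keras source.

```python
def plateau_schedule(val_losses, lr, factor=0.5, patience=1, min_delta=1e-5, min_lr=0.0):
    """Simplified replay of the ReduceLROnPlateau rule (mode='min', cooldown=0)."""
    best = float("inf")
    wait = 0
    lrs = []
    for loss in val_losses:
        if loss < best - min_delta:
            best = loss  # improvement: remember it and reset the counter
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                lr = max(lr * factor, min_lr)  # plateau: shrink the learning rate
                wait = 0
        lrs.append(lr)  # learning rate in effect for the next epoch
    return lrs

print(plateau_schedule([1.0, 0.8, 0.85, 0.9, 0.7], lr=1e-3))
```

For those validation losses and a starting rate of 1e-3, the rate stays at 1e-3 for two epochs, is halved after each of the two non-improving epochs, and then stays at 2.5e-4 once the loss improves again.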
You don't train any layer except the last one.
You need to make the last few layers trainable, or add more layers.
Also make sure the pretrained ImageNet weights are used:
tf.keras.applications.VGG16(... weights='imagenet' ...)
weights='imagenet' is the constructor's default, but passing it explicitly makes the pretrained starting point clear.
The available options are explained here:
https://www.tensorflow.org/api_docs/python/tf/keras/applications/VGG16
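While adding layers to the model, you have to remove the last dense layer of the model: your model has four classes but VGG16 has 1000 classes, so remove the last dense layer and then add your own dense layers. (With include_top=False, as in the code below, the 1000-class dense layers are already left out, so vgg16_model.layers[:-1] drops the final max-pooling layer instead.)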
def create():
    vgg16_model = keras.applications.vgg16.VGG16(input_tensor=Input(shape=(Image_height, Image_width, 3)), input_shape=(Image_height, Image_width, 3), include_top = False)
    model = Sequential()
    for layer in vgg16_model.layers[:-1]:
        model.add(layer)
    model.summary()
    for layer in model.layers:
        layer.trainable = False
    model.add(Flatten())
    model.add(Dense(4, activation='softmax'))
    return model

Tensorboard Display Validation Data and Training Data in two Graphs

I am trying to display the accuracy and loss of my network in TensorBoard as graphs, but the training and validation data are shown as separate runs. I am still relatively inexperienced with TensorFlow and TensorBoard, so I hope you can see the reason for this.
Here is my code:
import os
import time
import pickle
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.callbacks import TensorBoard
print("Loading Data via Pickle")
X = pickle.load(open("X.pickle", "rb"))
y = pickle.load(open("y.pickle", "rb"))
print(len(X))
print(len(y))
startTime = time.time()
hidden_dense_layers = [0, 1, 2]
hidden_dense_layer_size = [64, 128, 256, 512, 1024]
for dense_layer_ammount in hidden_dense_layers:
    for dense_layer_size in hidden_dense_layer_size:
        NAME = "{}-hidden_layers-{}-layersize".format(dense_layer_ammount, dense_layer_size)
        print("----------", NAME, "----------")
        print("Building Model")
        # model = keras.Sequential([
        #     keras.layers.Flatten(input_shape=(200, 200)),
        #     keras.layers.Dense(500, activation="relu"),
        #     keras.layers.Dense(1, activation="sigmoid")
        # ])
        model = keras.Sequential()
        model.add(keras.layers.Flatten(input_shape=(75, 75)))
        for i in range(dense_layer_ammount):
            model.add(keras.layers.Dense(dense_layer_size, activation="relu"))
        model.add(keras.layers.Dense(1, activation="sigmoid"))
        model.compile(loss='binary_crossentropy',
                      optimizer='adam',
                      metrics=['accuracy'])
        print("Creating Callbacks")
        print("Creating Checkpoint Callback")
        checkpoint_path = "training_2/cp-{epoch:04d}.ckpt"
        checkpoint_dir = os.path.dirname(checkpoint_path)
        # Create a callback that saves the model's weights
        checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
            filepath=checkpoint_path,
            save_weights_only=True,
            verbose=1
        )
        print("Creating Tensorboard Callback")
        tensorboard_callback = TensorBoard(log_dir="logs/{}".format(NAME))
        print("Training Model")
        model.fit(
            X,
            y,
            # batch_size=32,
            epochs=10,
            callbacks=[
                # checkpoint_callback,
                tensorboard_callback
            ],
            validation_split=0.3
        )
Here is how the runs are displayed for me: [screenshot not preserved]
Here is how the graphs are displayed for me: [screenshot not preserved]
It is completely normal to have two curves in both graphs. One curve corresponds to the training data and the other to the validation data (orange and blue, respectively, on your plots). In TF 2.x, the Keras TensorBoard callback writes them to train/ and validation/ subdirectories of log_dir, which is why TensorBoard lists them as two runs; metrics logged under the same tag (e.g. epoch_loss) are still drawn in the same chart. For each epoch there is a two-step process:
first, the actual tuning of the model parameters with gradient descent, the training step. The training (orange) curve tells you whether the model learns anything at all (e.g. is the model complex enough for the given task?).
second, you need to make sure that the trained model performs well on data that have not been used to tune the parameters; this is the validation step. The validation (blue) curve tells you how close you are to overfitting (meaning that you get good performance on the training data, but the model is very bad when fed "new data").
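The gap between the two curves can also be read off numerically from the dictionary in model.fit(...).history, which holds one value per epoch for each training metric and its val_ counterpart. A small sketch; generalization_gap is a hypothetical helper, and the key names are an assumption that depends on the Keras version (recent versions use 'accuracy'/'val_accuracy' for metrics=['accuracy'], older ones 'acc'/'val_acc'):

```python
def generalization_gap(history, metric="accuracy"):
    """Per-epoch difference between the training and validation values of a metric."""
    train = history[metric]
    val = history["val_" + metric]
    return [t - v for t, v in zip(train, val)]
```

Usage would look like generalization_gap(model.fit(...).history); a gap that keeps widening while the training curve still improves is the overfitting pattern described above.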

Same model produces consistently different accuracies in Keras and Tensorflow

I'm trying to implement the same model in Keras and in TensorFlow (using Keras layers) on custom data. The two models produce consistently different accuracies over many training runs (Keras ~71%, TensorFlow ~65%). I want TensorFlow to do as well as Keras so I can go into the TensorFlow iterations to tweak some lower-level algorithms.
Here's my original Keras code:
import keras
from keras.layers import Dense, Dropout, Input
from keras.models import Model, Sequential
from keras import backend as K
input_size = 2000
num_classes = 4
num_industries = 22
num_aux_inputs = 3
main_input = Input(shape=(input_size,), name='text_vectors')
x = Dense(units=64, activation='relu', name='dense1')(main_input)
drop1 = Dropout(0.2, name='dropout1')(x)
auxiliary_input = Input(shape=(num_aux_inputs,), name='aux_input')
x = keras.layers.concatenate([drop1, auxiliary_input])
x = Dense(units=64, activation='relu', name='dense2')(x)
drop2 = Dropout(0.1, name='dropout2')(x)
x = Dense(units=32, activation='relu', name='dense3')(drop2)
main_output = Dense(units=num_classes,
                    activation='softmax', name='main_output')(x)
model = Model(inputs=[main_input, auxiliary_input],
              outputs=main_output)
model.compile(loss=keras.losses.categorical_crossentropy, metrics=['accuracy'], optimizer=keras.optimizers.Adadelta())
history = model.fit([train_x, train_x_auxiliary], train_y, batch_size=128, epochs=20, verbose=1, validation_data=([val_x, val_x_auxiliary], val_y))
loss, accuracy = model.evaluate([val_x, val_x_auxiliary], val_y, verbose=0)
Here's the version where I moved the Keras layers into TensorFlow, following this article:
import tensorflow as tf
from keras import backend as K
import keras
from keras.layers import Dense, Dropout, Input # Dense layers are "fully connected" layers
from keras.metrics import categorical_accuracy as accuracy
from keras.objectives import categorical_crossentropy
tf.reset_default_graph()
sess = tf.Session()
K.set_session(sess)
input_size = 2000
num_classes = 4
num_industries = 22
num_aux_inputs = 3
x = tf.placeholder(tf.float32, shape=[None, input_size], name='X')
x_aux = tf.placeholder(tf.float32, shape=[None, num_aux_inputs], name='X_aux')
y = tf.placeholder(tf.float32, shape=[None, num_classes], name='Y')
# build graph
layer = Dense(units=64, activation='relu', name = 'dense1')(x)
drop1 = Dropout(0.2,name='dropout1')(layer)
layer = keras.layers.concatenate([drop1,x_aux])
layer = Dense(units=64, activation='relu',name='dense2')(layer)
drop2 = Dropout(0.1,name='dropout2')(layer)
layer = Dense(units=32, activation='relu',name='dense3')(drop2)
output_logits = Dense(units=num_classes, activation='softmax',name='main_output')(layer)
loss = tf.reduce_mean(categorical_crossentropy(y, output_logits))
acc_value = tf.reduce_mean(accuracy(y, output_logits))
correct_prediction = tf.equal(tf.argmax(output_logits, 1), tf.argmax(y, 1), name='correct_pred')
optimizer = tf.train.AdadeltaOptimizer(learning_rate=1.0, rho=0.95,epsilon=tf.keras.backend.epsilon()).minimize(loss)
init = tf.global_variables_initializer()
sess.run(init)
epochs = 20 # Total number of training epochs
batch_size = 128 # Training batch size
display_freq = 300 # Frequency of displaying the training results
num_tr_iter = int(len(y_train) / batch_size)
with sess.as_default():
    for epoch in range(epochs):
        print('Training epoch: {}'.format(epoch + 1))
        # Randomly shuffle the training data at the beginning of each epoch
        x_train, x_train_aux, y_train = randomize(x_train, x_train_auxiliary, y_train)
        for iteration in range(num_tr_iter):
            start = iteration * batch_size
            end = (iteration + 1) * batch_size
            x_batch, x_aux_batch, y_batch = get_next_batch(x_train, x_train_aux, y_train, start, end)
            # Run optimization op (backprop)
            feed_dict_batch = {x: x_batch, x_aux: x_aux_batch, y: y_batch, K.learning_phase(): 1}
            optimizer.run(feed_dict=feed_dict_batch)
I also implemented the whole model from scratch in TensorFlow, but it also reaches only ~65% accuracy, so I decided to try this Keras-layers-within-TF setup to identify the problem.
I've looked up posts on similar problems with Keras and TensorFlow and have tried the following, which didn't help in my case:
Keras's dropout layer is only active in the training phase, so I did the same in my tf code by setting keras.backend.learning_phase().
Keras and TensorFlow have different variable initializations. I've tried initializing my weights in TensorFlow in the following three ways, which are supposed to match Keras's weight initialization, but they didn't affect the accuracies either:
initer = tf.glorot_uniform_initializer()
initer = tf.contrib.layers.xavier_initializer()
initer = tf.random_normal(shape) * (np.sqrt(2.0/(shape[0] + shape[1])))
The optimizers in the two versions are set up exactly the same. That said, the accuracy doesn't seem to depend on the optimizer: I tried different optimizers in both Keras and TF, and each version still converges to the same accuracy as before.
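As a quick check of what the three initializers in the list do: tf.glorot_uniform_initializer and tf.contrib.layers.xavier_initializer (uniform by default) both draw from U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)), which is also Keras's default kernel initializer, while the third option is a plain normal with standard deviation sqrt(2 / (fan_in + fan_out)). The NumPy sketch below, using the question's first layer shape (2000 inputs, 64 units) as an example, shows that both give weights with the same variance, 2 / (fan_in + fan_out), which fits the observation that switching between them did not change the accuracies.

```python
import numpy as np

fan_in, fan_out = 2000, 64  # shape of the first Dense layer in the question
rng = np.random.default_rng(0)

# Glorot/Xavier uniform: U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)).
limit = np.sqrt(6.0 / (fan_in + fan_out))
w_uniform = rng.uniform(-limit, limit, size=(fan_in, fan_out))

# Third option from the list: standard normal scaled by sqrt(2 / (fan_in + fan_out)).
w_normal = rng.standard_normal((fan_in, fan_out)) * np.sqrt(2.0 / (fan_in + fan_out))

# Variance of U(-a, a) is a**2 / 3 = 2 / (fan_in + fan_out), same as the normal variant.
target_var = 2.0 / (fan_in + fan_out)
print(w_uniform.var() / target_var, w_normal.var() / target_var)  # both close to 1.0
```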
Help!
It seems to me that this is most probably a weight-initialization problem. I would suggest initializing the Keras layers, getting their weights before training, and initializing the TF layers with those values.
I ran into this kind of problem and that solved it for me, but it was a long time ago and I don't know whether the initializers have since been made the same. At that time, the TF and Keras initializations were clearly not the same.
I checked initializers, seeds, parameters and hyperparameters, but the accuracy is still different.
I checked the Keras code, and it randomly shuffles the images before feeding them into the network in batches, so the shuffling differs between the two engines. We need a way to feed the same batches of images to both networks in order to get the same accuracy.
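Following that idea, a minimal NumPy sketch of reproducible shuffling: both pipelines draw their batch indices from the same seeded permutation, so each epoch sees identical batches. epoch_batches is a hypothetical helper, not part of Keras or TensorFlow. Note also that the question's TF loop uses num_tr_iter = int(len(y_train) / batch_size), which skips the final partial batch, whereas this sketch (like Keras's fit) keeps it.

```python
import numpy as np

def epoch_batches(n_samples, batch_size, seed, epoch):
    """Return the index batches for one epoch, reproducibly derived from (seed, epoch)."""
    # Seeding with both values gives a different but repeatable order every epoch.
    rng = np.random.default_rng([seed, epoch])
    order = rng.permutation(n_samples)
    # The last batch may be smaller than batch_size; it is kept, not dropped.
    return [order[i:i + batch_size] for i in range(0, n_samples, batch_size)]
```

For example, the TF loop could feed x_train[idx] for each idx in epoch_batches(len(y_train), 128, seed=0, epoch=epoch), and the Keras model could be trained on exactly the same batches with model.train_on_batch(...) instead of fit, so that shuffling no longer differs between the two.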
