I use the Keras model training API and observed differences when training the model with NumPy arrays (x_train and y_train) and with tf.data.Dataset.from_tensor_slices((x_train, y_train)). A minimal working example is shown below:
import numpy as np
import tensorflow as tf
tf.keras.utils.set_random_seed(0)
n_examples, n_dims = (100, 10)
raw_dataset = np.random.randn(n_examples, n_dims)
model = tf.keras.models.Sequential(
[
tf.keras.layers.Dense(
1024, activation="relu", use_bias=True
),
tf.keras.layers.Dense(
1, activation="linear", use_bias=True
),
]
)
model.compile(
optimizer=tf.keras.optimizers.Adam(),
loss="mse",
)
x_train = raw_dataset[:, :-1]
y_train = raw_dataset[:, -1]
dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
n_epochs = 10
batch_size = 16
use_dataset = True
if use_dataset:
model.fit(
dataset.batch(batch_size=batch_size),
epochs=n_epochs,
)
else:
model.fit(
x=x_train,
y=y_train,
batch_size=batch_size,
epochs=n_epochs,
)
print("Evaluation:")
model.evaluate(x_train, y_train)
model.evaluate(dataset.batch(batch_size=batch_size))
If I run this code with use_dataset = True, the final performance is:
Evaluation:
4/4 [==============================] - 0s 825us/step - loss: 0.4132
7/7 [==============================] - 0s 701us/step - loss: 0.4132
If I run it with use_dataset = False, I get:
Evaluation:
4/4 [==============================] - 0s 855us/step - loss: 0.4219
7/7 [==============================] - 0s 808us/step - loss: 0.4219
I expected that the two training loops would perform identically. Interestingly, the model performance is identical if I set batch_size = n_examples. The difference seems to be related with the way that batches are handled internally. Why is this happening? Is it a bug or a feature?
The behavior is due to the default parameter shuffle=True in model.fit(*) and not a bug. According to the docs regarding shuffle:
Boolean (whether to shuffle the training data before each epoch) or str (for 'batch'). This argument is ignored when x is a generator or an object of tf.data.Dataset. 'batch' is a special option for dealing with the limitations of HDF5 data; it shuffles in batch-sized chunks. Has no effect when steps_per_epoch is not None.
So this parameter is ignored when a tf.data.Dataset is passed, and the data is not reshuffled after each epoch as in the other approach with arrays.
Here is the code to get the same results for both methods:
import numpy as np
import tensorflow as tf
tf.keras.utils.set_random_seed(0)
n_examples, n_dims = (100, 10)
raw_dataset = np.random.randn(n_examples, n_dims)
model = tf.keras.models.Sequential(
[
tf.keras.layers.Dense(
1024, activation="relu", use_bias=True
),
tf.keras.layers.Dense(
1, activation="linear", use_bias=True
),
]
)
model.compile(
optimizer=tf.keras.optimizers.Adam(),
loss="mse",
)
x_train = raw_dataset[:, :-1]
y_train = raw_dataset[:, -1]
dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
n_epochs = 10
batch_size = 16
use_dataset = False
if use_dataset:
model.fit(
dataset.batch(batch_size=batch_size),
epochs=n_epochs,
)
else:
model.fit(
x=x_train,
y=y_train,
batch_size=batch_size,
shuffle=False,
epochs=n_epochs,
)
print("Evaluation:")
model.evaluate(x_train, y_train)
model.evaluate(dataset.batch(batch_size=batch_size))
I have a dataset with multiple fields, but only two are relevant for my machine learning implementation. The rest shall not be considered for predictions, but might unveil interesting correlations.
Is there a way to return prediction results when calling model.evaluate?
For example:
[loss, accuracy, predicted_results] = model.evaluate(input, results)
AFAIK, we can't get prediction on x using model.evaluate, it simply returns the loss and acc, source. But for your need, you can write a custom class and define the necessary calls such as .evaluate and .predict. Let's define a simple model to demonstrate.
Train and Run
import tensorflow as tf
import numpy as np
img = tf.random.normal([20, 32], 0, 1, tf.float32)
tar = np.random.randint(2, size=(20, 1))
model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(10, input_dim = 32,
kernel_initializer ='normal', activation= 'relu'))
model.add(tf.keras.layers.Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
optimizer='adam', metrics=['accuracy'])
model.fit(img, tar, epochs=2, verbose=2)
Epoch 1/2
1/1 - 1s - loss: 0.7083 - accuracy: 0.5000
Epoch 2/2
1/1 - 0s - loss: 0.6983 - accuracy: 0.5000
Now, for your request, we can do something as follows:
class Custom_Evaluate:
def __init__(self, model):
self.model = model
def eval_predict(self, x, y):
loss, acc = self.model.evaluate(x, y)
pred = self.model.predict(x)
return loss, acc, pred
custom_evaluate = Custom_Evaluate(model)
loss, acc, pred = custom_evaluate.eval_predict(img, tar)
print(loss, acc)
print(pred)
0.6886215806007385 0.6499999761581421
[[0.5457604 ]
[0.6126752 ]
[0.53668976]
[0.40323135]
[0.37159938]
[0.5520069 ]
[0.4959099 ]
[0.5363802 ]
[0.5033434 ]
[0.65680957]
[0.6863682 ]
[0.44409862]
[0.4672098 ]
[0.49656072]
[0.620726 ]
[0.47991502]
[0.58834356]
[0.5245693 ]
[0.5359181 ]
[0.4575624 ]]
I'm using a deep learning concept but a beginner in it, I'm trying to build a feature fusion concept using 3 deep neural network models, the idea is I'm trying to get features from all three models and do classification on the last single sigmoid layer and then get the results, here is the code that I run.
Code:
from keras.layers import Input, Dense
from keras.models import Model
from sklearn.model_selection import train_test_split
import numpy
# random seed for reproducibility
numpy.random.seed(2)
# loading load pima indians diabetes dataset, past 5 years of medical history
dataset = numpy.loadtxt('https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv', delimiter=",")
# split into input (X) and output (Y) variables, splitting csv data
X = dataset[:, 0:8]
Y = dataset[:, 8]
x_train, x_validation, y_train, y_validation = train_test_split(X, Y, test_size=0.20, random_state=5)
#create the input layer
input_layer = Input(shape=(8,))
A2 = Dense(8, activation='relu')(input_layer)
A3 = Dense(30, activation='relu')(A2)
B2 = Dense(40, activation='relu')(A2)
B3 = Dense(30, activation='relu')(B2)
C2 = Dense(50, activation='relu')(B2)
C3 = Dense(5, activation='relu')(C2)
merged = Model(inputs=[input_layer],outputs=[A3,B3,C3])
final_model = Dense(1,
activation='sigmoid')(merged
final_model.compile(loss="binary_crossentropy",
optimizer="adam", metrics=['accuracy'])
# call the function to fit to the data (training the network)
final_model.fit(x_train, y_train, epochs=2000, batch_size=50,
validation_data=(x_validation, y_validation))
# evaluate the model
scores = final_model.evaluate(x_validation,y_validation)
print("\n%s: %.2f%%" % (final_model.metrics_names[1], scores[1] * 100))
Here is the error that I'm facing
if x.shape.ndims is None:
AttributeError: 'Functional' object has no attribute 'shape'
Please help me out to fix this issue, or if anyone knows what code should I use then let me know I'm also willing to change code but not concept Thank you.
Update
From #M.Innat's answer, we've tried as follows. The idea is we first build 3 models and then build a final / combine model by joining these models with a single classifier. But I am facing a discrepancy. When I train each model, they gave 90% results but when I combine them, they hardly reach 60 or 70.
Code MODEL 1:
model = Sequential()
# input layer requires input_dim param
model.add(Dense(10, input_dim=8, activation='relu'))
model.add(Dense(50, activation='relu'))
model.add(Dense(50, activation='relu'))
model.add(Dense(50, activation='relu'))
model.add(Dense(50, activation='relu'))
model.add(Dense(50, activation='relu'))
model.add(Dense(5, activation='relu'))
# sigmoid instead of relu for final probability between 0 and 1
model.add(Dense(1, activation='sigmoid'))
# compile the model, adam gradient descent (optimized)
model.compile(loss="binary_crossentropy",
optimizer="adam", metrics=['accuracy'])
# call the function to fit to the data (training the network)
model.fit(x_train, y_train, epochs=1000, batch_size=50,
validation_data=(x_validation, y_validation))
# evaluate the model
scores = model.evaluate(X, Y)
print("\n%s: %.2f%%" % (model.metrics_names[1], scores[1] * 100))
model.save('diabetes_risk_nn.h5')
MODEL 1 Accuracy = 94.14%. And same as another 2 models.
MODEL 2 Accuracy = 93.62%
MODEL 3 Accuracy = 92.71%
Next, as #M.Innat's suggested to merging the models. Here we have done that using the above Models 1,2,3. But the score is not near ~90%. FINAL Combined Model:
# Define Model A
input_layer = Input(shape=(8,))
A2 = Dense(10, activation='relu')(input_layer)
A3 = Dense(50, activation='relu')(A2)
A4 = Dense(50, activation='relu')(A3)
A5 = Dense(50, activation='relu')(A4)
A6 = Dense(50, activation='relu')(A5)
A7 = Dense(50, activation='relu')(A6)
A8 = Dense(5, activation='relu')(A7)
model_a = Model(inputs=input_layer, outputs=A8, name="ModelA")
# Define Model B
input_layer = Input(shape=(8,))
B2 = Dense(10, activation='relu')(input_layer)
B3 = Dense(50, activation='relu')(B2)
B4 = Dense(40, activation='relu')(B3)
B5 = Dense(60, activation='relu')(B4)
B6 = Dense(30, activation='relu')(B5)
B7 = Dense(50, activation='relu')(B6)
B8 = Dense(50, activation='relu')(B7)
B9 = Dense(5, activation='relu')(B8)
model_b = Model(inputs=input_layer, outputs=B9, name="ModelB")
# Define Model C
input_layer = Input(shape=(8,))
C2 = Dense(10, activation='relu')(input_layer)
C3 = Dense(50, activation='relu')(C2)
C4 = Dense(40, activation='relu')(C3)
C5 = Dense(40, activation='relu')(C4)
C6 = Dense(70, activation='relu')(C5)
C7 = Dense(50, activation='relu')(C6)
C8 = Dense(50, activation='relu')(C7)
C9 = Dense(60, activation='relu')(C8)
C10 = Dense(50, activation='relu')(C9)
C11 = Dense(5, activation='relu')(C10)
model_c = Model(inputs=input_layer, outputs=C11, name="ModelC")
all_three_models = [model_a, model_b, model_c]
all_three_models_input = Input(shape=all_three_models[0].input_shape[1:])
And then combine these three.
models_output = [model(all_three_models_input) for model in all_three_models]
Concat = tf.keras.layers.concatenate(models_output, name="Concatenate")
final_out = Dense(1, activation='sigmoid')(Concat)
final_model = Model(inputs=all_three_models_input, outputs=final_out, name='Ensemble')
#tf.keras.utils.plot_model(final_model, expand_nested=True)
final_model.compile(loss="binary_crossentropy",
optimizer="adam", metrics=['accuracy'])
# call the function to fit to the data (training the network)
final_model.fit(x_train, y_train, epochs=1000, batch_size=50,
validation_data=(x_validation, y_validation))
# evaluate the model
scores = final_model.evaluate(x_validation,y_validation)
print("\n%s: %.2f%%" % (final_model.metrics_names[1], scores[1] * 100))
final_model.save('diabetes_risk_nn.h5')
But unlike each model where they gave 90%, this combine final model gave accuracy around =70%
I suppose the output layer is that Dense(1, activation='sigmoid'). So try something like this
# ...
merged = tf.keras.layers.concatenate([A3,B3,C3])
out = Dense(1, activation='sigmoid')(merged)
model = (input_layer, out)
model.fit(x_train, y_train, ...)
According to your code, there is only one model (not three). And by seeing the output that you tried, I think you're looking for something like this:
DataSet
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
from sklearn.model_selection import train_test_split
import numpy
# random seed for reproducibility
numpy.random.seed(2)
# loading load pima indians diabetes dataset, past 5 years of medical history
dataset = numpy.loadtxt('https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv', delimiter=",")
# split into input (X) and output (Y) variables, splitting csv data
X = dataset[:, 0:8]
Y = dataset[:, 8]
x_train, x_validation, y_train, y_validation = train_test_split(X, Y, test_size=0.20, random_state=5)
Model
#create the input layer
input_layer = Input(shape=(8,))
A2 = Dense(8, activation='relu')(input_layer)
A3 = Dense(30, activation='relu')(A2)
B2 = Dense(40, activation='relu')(input_layer)
B3 = Dense(30, activation='relu')(B2)
C2 = Dense(50, activation='relu')(input_layer)
C3 = Dense(5, activation='relu')(C2)
merged = tf.keras.layers.concatenate([A3,B3,C3])
final_out = Dense(1, activation='sigmoid')(merged)
final_model = Model(inputs=[input_layer], outputs=final_out)
tf.keras.utils.plot_model(final_model)
Train
final_model.compile(loss="binary_crossentropy",
optimizer="adam", metrics=['accuracy'])
# call the function to fit to the data (training the network)
final_model.fit(x_train, y_train, epochs=5, batch_size=50,
validation_data=(x_validation, y_validation))
# evaluate the model
scores = final_model.evaluate(x_validation,y_validation)
print("\n%s: %.2f%%" % (final_model.metrics_names[1], scores[1] * 100))
Epoch 1/5
13/13 [==============================] - 1s 15ms/step - loss: 0.7084 - accuracy: 0.6803 - val_loss: 0.6771 - val_accuracy: 0.6883
Epoch 2/5
13/13 [==============================] - 0s 5ms/step - loss: 0.6491 - accuracy: 0.6600 - val_loss: 0.5985 - val_accuracy: 0.6623
Epoch 3/5
13/13 [==============================] - 0s 5ms/step - loss: 0.6161 - accuracy: 0.6813 - val_loss: 0.6805 - val_accuracy: 0.6883
Epoch 4/5
13/13 [==============================] - 0s 5ms/step - loss: 0.6335 - accuracy: 0.7003 - val_loss: 0.6115 - val_accuracy: 0.6623
Epoch 5/5
13/13 [==============================] - 0s 5ms/step - loss: 0.5684 - accuracy: 0.7285 - val_loss: 0.6150 - val_accuracy: 0.6883
5/5 [==============================] - 0s 2ms/step - loss: 0.6150 - accuracy: 0.6883
accuracy: 68.83%
Update
Based on your this comment:
Let me explain to you what I'm trying to do, firstly I create 3 models DNN separately then I try to combine those models to get features of all there, after that I want to classify using all extracted features and then evaluate the accuracy. That's what actually I'm trying to develop.
create 3 models separately - OK, 3 models
combine them to get a feature - OK, Feature extractors
classify - OK, Average the model output feature maps and pass to the classifier - in other words Ensembling.
Let's do this. First, build three models separately.
# Define Model A
input_layer = Input(shape=(8,))
A2 = Dense(8, activation='relu')(input_layer)
A3 = Dense(30, activation='relu')(A2)
C3 = Dense(5, activation='relu')(A3)
model_a = Model(inputs=input_layer, outputs=C3, name="ModelA")
# Define Model B
input_layer = Input(shape=(8,))
A2 = Dense(8, activation='relu')(input_layer)
A3 = Dense(30, activation='relu')(A2)
C3 = Dense(5, activation='relu')(A3)
model_b = Model(inputs=input_layer, outputs=C3, name="ModelB")
# Define Model C
input_layer = Input(shape=(8,))
A2 = Dense(8, activation='relu')(input_layer)
A3 = Dense(30, activation='relu')(A2)
C3 = Dense(5, activation='relu')(A3)
model_c = Model(inputs=input_layer, outputs=C3, name="ModelC")
I used the same number of parameters, change yourself. Anyway, these three models perform as each feature extractor (not classifier). Next, we will combine their output by averaging them and after that pass that to the classifier.
all_three_models = [model_a, model_b, model_c]
all_three_models_input = Input(shape=all_three_models[0].input_shape[1:])
models_output = [model(all_three_models_input) for model in all_three_models]
Avg = tf.keras.layers.average(models_output, name="Average")
final_out = Dense(1, activation='sigmoid')(Avg)
final_model = Model(inputs=all_three_models_input, outputs=final_out, name='Ensemble')
tf.keras.utils.plot_model(final_model, expand_nested=True)
Now, you can train the model and evaluate it on the test set. Hope this helps.
More info.
(1). You can add seed.
from tensorflow.keras.models import Model
from sklearn.model_selection import train_test_split
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from sklearn.model_selection import train_test_split
import os, numpy
# random seed for reproducibility
numpy.random.seed(101)
tf.random.set_seed(101)
os.environ['TF_CUDNN_DETERMINISTIC'] = '1'
dataset = .. your data
# split into input (X) and output (Y) variables, splitting csv data
X = dataset[:, 0:8]
Y = dataset[:, 8]
x_train, x_validation, y_train, y_validation = train_test_split(X, Y,
test_size=0.20, random_state=101)
(2). Try with the SGD optimizer. Also, use the ModelCheckpoint callback to save the highest validation accuracy.
final_model.compile(loss="binary_crossentropy",
optimizer="sgd", metrics=['accuracy'])
model_save = tf.keras.callbacks.ModelCheckpoint(
'merge_best.h5',
monitor="val_accuracy",
verbose=0,
save_best_only=True,
save_weights_only=True,
mode="max",
save_freq="epoch"
)
# call the function to fit to the data (training the network)
final_model.fit(x_train, y_train, epochs=1000, batch_size=256, callbacks=[model_save],
validation_data=(x_validation, y_validation))
Evaluate on the test set.
# evaluate the model
final_model.load_weights('merge_best.h5')
scores = final_model.evaluate(x_validation,y_validation)
print("\n%s: %.2f%%" % (final_model.metrics_names[1], scores[1] * 100))
5/5 [==============================] - 0s 4ms/step - loss: 0.6543 - accuracy: 0.7662
accuracy: 76.62%
I am trying to train the mnist dataset on ResNet50 using the Keras library.
The shape of mnist is (28, 28, 1) however resnet50 required the shape to be (32, 32, 3)
How can I convert the mnist dataset to the required shape?
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(x_train.shape[0], x_train.shape[1], x_train.shape[2], 1)
x_test = x_test.reshape(x_test.shape[0], x_test.shape[1], x_test.shape[2], 1)
x_train = x_train/255.0
x_test = x_test/255.0
from keras.utils import to_categorical
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
model = models.Sequential()
# model.add(InputLayer(input_shape=(28, 28)))
# model.add(Reshape(target_shape=(32, 32, 3)))
# model.add(Conv2D())
model.add(conv_base)
model.add(Flatten())
model.add(BatchNormalization())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(BatchNormalization())
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(BatchNormalization())
model.add(Dense(10, activation='softmax'))
model.compile(optimizer=optimizers.RMSprop(lr=2e-5), loss='binary_crossentropy', metrics=['acc'])
history = model.fit(x_train, y_train, epochs=5, batch_size=20, validation_data=(x_test, y_test))
ValueError: Input 0 is incompatible with layer sequential_10: expected shape=(None, 32, 32, 3), found shape=(20, 28, 28, 1)
You need to resize the MNIST data set. Note that minimum size actually depends on the ImageNet model. For example: Xception requires at least 72, where ResNet is asking for 32. Apart from that, the MNIST is a grayscale image, but it may conflict if you're using the pretrained weight of these models. So, good and safe side is to resize and convert grayscale to RGB.
Full working code for you.
Data Set
We will resize MNIST from 28 to 32. Also, make 3 channels instead of keeping 1.
import tensorflow as tf
import numpy as np
(x_train, y_train), (_, _) = tf.keras.datasets.mnist.load_data()
# expand new axis, channel axis
x_train = np.expand_dims(x_train, axis=-1)
# [optional]: we may need 3 channel (instead of 1)
x_train = np.repeat(x_train, 3, axis=-1)
# it's always better to normalize
x_train = x_train.astype('float32') / 255
# resize the input shape , i.e. old shape: 28, new shape: 32
x_train = tf.image.resize(x_train, [32,32]) # if we want to resize
# one hot
y_train = tf.keras.utils.to_categorical(y_train , num_classes=10)
print(x_train.shape, y_train.shape)
(60000, 32, 32, 3) (60000, 10)
ResNet 50
input = tf.keras.Input(shape=(32,32,3))
efnet = tf.keras.applications.ResNet50(weights='imagenet',
include_top = False,
input_tensor = input)
# Now that we apply global max pooling.
gap = tf.keras.layers.GlobalMaxPooling2D()(efnet.output)
# Finally, we add a classification layer.
output = tf.keras.layers.Dense(10, activation='softmax', use_bias=True)(gap)
# bind all
func_model = tf.keras.Model(efnet.input, output)
Train
func_model.compile(
loss = tf.keras.losses.CategoricalCrossentropy(),
metrics = tf.keras.metrics.CategoricalAccuracy(),
optimizer = tf.keras.optimizers.Adam())
# fit
func_model.fit(x_train, y_train, batch_size=128, epochs=5, verbose = 2)
Epoch 1/5
469/469 - 56s - loss: 0.1184 - categorical_accuracy: 0.9690
Epoch 2/5
469/469 - 21s - loss: 0.0648 - categorical_accuracy: 0.9844
Epoch 3/5
469/469 - 21s - loss: 0.0503 - categorical_accuracy: 0.9867
Epoch 4/5
469/469 - 21s - loss: 0.0416 - categorical_accuracy: 0.9888
Epoch 5/5
469/469 - 21s - loss: 0.1556 - categorical_accuracy: 0.9697
<tensorflow.python.keras.callbacks.History at 0x7f316005a3d0>
data = np.random.random((10000, 150))
labels = np.random.randint(10, size=(10000, 1))
labels = to_categorical(labels, num_classes=10)
model = Sequential()
model.add(Dense(units=32, activation='relu', input_shape=(150,)))
model.add(Dense(units=10, activation='softmax'))
model.compile(optimizer='rmsprop',
loss='categorical_crossentropy',
metrics=['accuracy'])
model.fit(data, labels, epochs=30, validation_split=0.2)
I created 10000 random samples to train my net, but it use only few of them(250/10000)
Exaple of the 1st epoch:
Epoch 1/30
250/250 [==============================] - 0s 2ms/step - loss: 2.1110 - accuracy: 0.2389 - val_loss: 2.2142 - val_accuracy: 0.1800
Your data is split into training and validation subsets (validation_split=0.2).
Training subset has size 8000 and validation 2000.
Training goes in batches, each batch has size 32 samples by default.
So one epoch should take 8000/32=250 batches, as it shows in the progress.
Try code like following example
model = Sequential()
model.add(Dense(32, activation='relu', input_dim=100))
model.add(Dense(10, activation='softmax'))
model.compile(optimizer='rmsprop',
loss='categorical_crossentropy',
metrics=['accuracy'])
# Generate dummy data
import numpy as np
data = np.random.random((1000, 100))
labels = np.random.randint(10, size=(1000, 1))
# Convert labels to categorical one-hot encoding
one_hot_labels = keras.utils.to_categorical(labels, num_classes=10)
# Train the model, iterating on the data in batches of 32 samples
model.fit(data, one_hot_labels, epochs=10, batch_size=32)