I have one dataset (MNIST), split into train and test parts, both with exactly the same shape. I train a convolutional autoencoder on the train part and use the test part for validation, as seen below in the fit() call.
The code works perfectly (i.e. the model trains on the train data and gives good results) if I remove validation_data=(x_test, x_test).
But I have to use validation_data. The problem is that when I do, the first epoch finishes computing the loss on the train data, and as soon as the loss needs to be computed on the test data I get an error:
Epoch 1/5
896/1000 [=========================>....] - ETA: 0s - loss: 0.6677
---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
InvalidArgumentError: Tensor must be 4-D with last dim 1, 3, or 4, not [1,3,3,8,8,1]
    [[Node: conv2d_3/kernel_0_1 = ImageSummary[T=DT_FLOAT, bad_color=Tensor, max_images=3
How can I resolve that?
(x_train, _), (x_test, _) = mnist.load_data()
print("+++++++++++++++shape of x_train " , x_train.shape)
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
# adapt this if using `channels_first` image data format
x_train = np.reshape(x_train, (len(x_train), 28, 28, 1))
# adapt this if using `channels_first` image data format
x_test = np.reshape(x_test, (len(x_test), 28, 28, 1))
#TODO remove after i have solved the problem with the dim mismatch when using the validation dataset
x_train = x_train[range(1000),:,:,:]
x_test = x_test[range(1000),:,:,:]
# execute this in terminal to start tensorboard and let it watch the given logfile
# tensorboard --logdir=/tmp/autoencoder
tensorboardPath = os.path.join(os.getcwd(),"tensorboard")
tensorBoard = TensorBoard(log_dir=tensorboardPath,write_graph=True,write_images=True,histogram_freq=1, embeddings_freq=1, embeddings_layer_names=None)
checkpointer = ModelCheckpoint(filepath=os.path.join(os.getcwd(),"tensorboard"), verbose=1, save_best_only=True)
autoencoder.fit(x_train, x_train,
                epochs=5,
                batch_size=128,
                shuffle=True,
                validation_data=(x_test, x_test),
                callbacks=[tensorBoard, checkpointer])
OK, I found out where the problem is.
Apparently, when the TensorBoard callback is used with write_images set to True, there is a problem writing visualisations of the convolutional layer kernels as images, because of a dimension mismatch. As I understood it, these debugging summaries are only written out when validation data are available. If I set write_images to False, everything works fine.
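For reference, a minimal sketch of that workaround, reusing the callback setup from the question (only write_images changes):
tensorBoard = TensorBoard(log_dir=tensorboardPath,
                          write_graph=True,
                          write_images=False,  # avoid the kernel image summaries that trigger the error
                          histogram_freq=1)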
Related
I created a simple TensorFlow model (no convolution layers) using the MNIST dataset. I initially used the SparseCategoricalCrossentropy loss function and it worked fine. I then created a nearly identical model, this time using CategoricalCrossentropy loss, and changed the labels to one-hot encoding:
(x_train, y_train), (x_test, y_test) = mnist_data
# scale data:
x_train = x_train / 255
x_test = x_test / 255
# create one-hot encoding:
y_train_one_hot = tf.one_hot(y_train, 10).numpy()
y_test_one_hot = tf.one_hot(y_test, 10).numpy()
# print shapes:
# each image is 28x28; 60,000 examples; 10 possible output values:
print(x_train.shape) # (60000, 28, 28)
print(y_train.shape) # (60000, 10)
model = Sequential([
    Flatten(input_shape=(28, 28)),
    Dense(128, activation='relu'),
    Dropout(0.2),
    Dense(10, activation='softmax')
])
model.compile(
    Adam(learning_rate=0.01),
    loss='categorical_crossentropy',
    metrics=['accuracy']
)
model_history = model.fit(
    x_train,
    y_train_one_hot,
    epochs=20
)
However, I'm not even able to train the model, as I get an error: ValueError: Shape mismatch: The shape of labels (received (320,)) should equal the shape of logits except for the last dimension (received (32, 10)). I don't really understand which shape is wrong or why.
Edit:
I forgot to show how I set mnist_data:
mnist_data = tf.keras.datasets.mnist.load_data()
All I did was change the first line of code to
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
and it trained as it should. Note that you have this line of code:
print(y_train.shape) # (60000, 10)
The dimension is not (60000, 10), it is (60000,). Only after you one-hot encode it will it be (60000, 10).
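To make the shape point concrete, here is a minimal sketch (reusing the imports from the question and assuming mnist_data really comes from tf.keras.datasets.mnist.load_data()) of the two consistent label/loss pairings:
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
print(y_train.shape)                          # (60000,) -- integer labels 0..9
y_train_one_hot = tf.one_hot(y_train, 10).numpy()
print(y_train_one_hot.shape)                  # (60000, 10)

# Option 1: integer labels + sparse_categorical_crossentropy
model.compile(Adam(learning_rate=0.01), loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train / 255, y_train, epochs=20)

# Option 2: one-hot labels + categorical_crossentropy
model.compile(Adam(learning_rate=0.01), loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train / 255, y_train_one_hot, epochs=20)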
I want to train a keras neural network on the mnist dataset. The problem is that my model already overfits after 1 or 2 epochs. To combat this problem, I wanted to use data augmentation:
First I load the data:
#load mnist dataset
(tr_images, tr_labels), (test_images, test_labels) = mnist.load_data()
#normalize images
tr_images, test_images = preprocess(tr_images, test_images)
#function which returns the amount of train images, test images and classes
amount_train_images, amount_test_images, total_classes = get_data_information(tr_images, tr_labels, test_images, test_labels)
#convert labels into the respective vectors
tr_vector_labels = keras.utils.to_categorical(tr_labels, total_classes)
test_vector_labels = keras.utils.to_categorical(test_labels, total_classes)
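For context, preprocess and get_data_information are the poster's own helpers and are not shown; a plausible minimal sketch, assuming they only scale pixels to [0, 1] and read off the dataset sizes, might look like this:
import numpy as np

def preprocess(train_images, test_images):
    # scale pixel values from [0, 255] to [0, 1]
    return train_images / 255.0, test_images / 255.0

def get_data_information(train_images, train_labels, test_images, test_labels):
    # number of training examples, test examples and distinct classes
    return len(train_images), len(test_images), len(np.unique(train_labels))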
I create a model with a "create_model" function:
untrained_model = create_model()
This is the function definition:
def create_model(_learning_rate=0.01, _momentum=0.9, _decay=0.001, _dense_neurons=128, _fully_connected_layers=3, _loss="sparse_categorical_crossentropy", _dropout=0.1):
    #create model
    model = keras.Sequential()
    #input
    model.add(Flatten(input_shape=(28, 28)))
    #add fully connected layers
    for i in range(_fully_connected_layers):
        model.add(Dense(_dense_neurons, activation='relu'))
        model.add(Dropout(_dropout))
    #classifier
    model.add(Dense(total_classes, activation='sigmoid'))
    optimizer = keras.optimizers.SGD(
        learning_rate=_learning_rate,
        momentum=_momentum,
        decay=_decay
    )
    #compile
    model.compile(
        optimizer=optimizer,
        loss=_loss,
        metrics=['accuracy']
    )
    return model
The function returns a compiled but untrained model. I also use this function when I try to optimize the hyperparameters (hence the many parameters).
Then I create an ImageDataGenerator:
generator = tf.keras.preprocessing.image.ImageDataGenerator(
    rotation_range=0.15,
    width_shift_range=0.15,
    height_shift_range=0.15,
    zoom_range=0.15
)
Now I want to train the model with my train_model_with_data_augmentation function:
train_model_with_data_augmentation(
    tr_images=tr_images,
    tr_labels=tr_labels,
    test_images=test_images,
    test_labels=test_labels,
    model=untrained_model,
    generator=generator,
    hyperparameters=hyperparameters
)
However, I don't know how to use this generator with the model I've created: the only method I've found is the generator's fit method, but I want to train my model, not the generator.
Here is the graph that I get from the training history: https://ibb.co/sKFnwGr
Can I somehow convert the generator to data that I can use as parameters in the fit method of the model?
If not: How can I train the model I've created with this generator? (or do I have to implement data augmentation in a completely different way?)
Does data augmentation even make sense with the mnist dataset?
What other options are there to prevent overfitting on mnist?
Update:
I tried to use this code:
generator.fit(x_train)
model.fit(generator.flow(x_train, y_train, batch_size=32), steps_per_epoch=len(x_train)/32, epochs=epochs)
However I get this error message:
ValueError: "Input to .fit() should have rank 4. Got array with shape: (60000, 28, 28)"
I believe the input matrix of the fit method should contain image index, height, width and depth, so it should have 4 dimensions, while my x_train array only has 3 dimensions and no dimension for the depth of the image. I tried to expand it:
x_train = x_train[..., np.newaxis]
y_train = y_train[..., np.newaxis]
But then I get this error message:
"Error occurred when finalizing GeneratorDataset iterator: Failed precondition: Python interpreter state is not initialized. The process may be terminated."
A working example of using ImageDataGenerator can be found here. The example itself:
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
y_train = np_utils.to_categorical(y_train, num_classes)
y_test = np_utils.to_categorical(y_test, num_classes)
datagen = ImageDataGenerator(
    featurewise_center=True,
    featurewise_std_normalization=True,
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True)
# compute quantities required for featurewise normalization
# (std, mean, and principal components if ZCA whitening is applied)
datagen.fit(x_train)
# fits the model on batches with real-time data augmentation:
model.fit(datagen.flow(x_train, y_train, batch_size=32),
          steps_per_epoch=len(x_train) / 32, epochs=epochs)
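The MNIST arrays differ from CIFAR-10 in that they have no channel axis, which is exactly what the rank-4 error above is about. A minimal sketch of adapting the example to the question's variables (assuming the model keeps its default sparse_categorical_crossentropy loss, so integer labels are used, and its Flatten layer uses input_shape=(28, 28, 1)):
# add the missing channel axis: (60000, 28, 28) -> (60000, 28, 28, 1)
tr_images = tr_images[..., np.newaxis]
test_images = test_images[..., np.newaxis]
# the labels stay one-dimensional -- do not add an axis to them
# generator.fit() is only needed for featurewise statistics / ZCA, not for shifts and rotations

model_history = untrained_model.fit(
    generator.flow(tr_images, tr_labels, batch_size=32),
    steps_per_epoch=len(tr_images) // 32,
    epochs=5,
    validation_data=(test_images, test_labels)
)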
I want to simultaneously augment an X (500, 28, 28, 1) and Y (500, 28, 28, 1) image set in Keras and store the results in an array for visualising them (before I train a network). The output y is not a label but an image.
I copy X_train into y_train (MNIST dataset) and want to apply the same transformations to both x and y for training a network. However, I am unable to do the transformation for both X and y; I am getting ZCA applied on X only. My code is:
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = X_train.reshape((X_train.shape[0], 28, 28, 1))
X_train = X_train.astype('float32')
y_train = X_train
datagen = ImageDataGenerator(zca_whitening=True)
datagen.fit(X_train)
datagen.fit(y_train)
training_set = datagen.flow(X_train, y_train, batch_size=100)
temp = np.asarray(training_set[0])
temp[0, ...] has ZCA applied, whereas temp[1, ...] shows no effect at all.
You need to pass pairs of X_train, y_train and X_test, y_test as arguments to datagen's flow method. Here's an example:
datagen = ImageDataGenerator(zca_whitening=True)
datagen.fit(X_train) # to compute quantities required for featurewise normalization
training_set = datagen.flow(X_train, y_train, batch_size=100)
test_set = datagen.flow(X_test, y_test, batch_size=100)
classifier.fit_generator(training_set, validation_data=test_set, epochs=100)
This allows for simultaneous augmentation of input X and corresponding ground-truth labels Y for training the neural network.
Hope this helps!
I am facing this problem of creating a dataset from very few images.
Both input (X_train) and output (y_train) contain (28x28) images, such as MNIST. For example, in my code:
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = X_train.reshape((X_train.shape[0], 28, 28, 1))
X_train = X_train.astype('float32')
y_train=X_train
datagen = ImageDataGenerator(zca_whitening=True)
How can I fit this datagen to both X_train and y_train simultaneously and save the result in a dataset array? I don't want to pass it to training.
Thank you for the help.
Beware that the augmentation per se is not applied to the target variable y_train, only to the input variables X_train. The generator simply reproduces the same ground-truth labels y for the newly generated X.
Hence the generator is fitted using X_train only:
datagen.fit(X_train)
If you do not want to pass the augmented data to training, you can loop over the generator after fitting to get the generated samples:
for X_batch, y_batch in datagen.flow(X_train, y_train, batch_size=32):
    # Do whatever you want with the generated X_batch and y_batch.
I understand that this is what you want to do.
See the examples in the Keras docs.
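If the goal is only to materialise a few augmented (X, y) batches into arrays rather than feeding them to training, a minimal sketch could look like the following; note that flow() is an infinite generator, so the loop must be stopped explicitly:
import numpy as np

datagen = ImageDataGenerator(zca_whitening=True)
datagen.fit(X_train)                      # computes the ZCA statistics from X_train

n_batches = 5
X_aug, y_aug = [], []
for i, (X_batch, y_batch) in enumerate(datagen.flow(X_train, y_train, batch_size=32)):
    X_aug.append(X_batch)
    y_aug.append(y_batch)
    if i + 1 >= n_batches:                # flow() never stops on its own
        break

X_aug = np.concatenate(X_aug)             # shape (160, 28, 28, 1)
y_aug = np.concatenate(y_aug)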
After applying PCA to the MNIST data, I defined a CNN model and its layers. After fitting the CNN model on (X_train_pca, Y_train), I run into a dimension problem at the evaluation phase. Here is the message:
"ValueError: Error when checking input: expected conv2d_1_input to have shape (1, 10, 10) but got array with shape (1, 28, 28)". When I try to reshape X_test into 10x10 format, I get a very low score.
First I applied min-max normalization and then PCA to X_train. Then I produced validation data from X_train. The problem is: I can fit the data in 100-dimension format (after applying PCA), so my input data becomes 10x10, but when I try to score the fitted model using X_test, which is still (10000, 1, 28, 28), I get the error mentioned above. How can I solve the dimension problem? I also tried transforming X_test with MinMaxScaler and PCA, with no change in score.
pca_3D = PCA(n_components=100)
X_train_pca = pca_3D.fit_transform(X_train)
X_train_pca.shape
cnn_model_1_scores = cnn_model_1.evaluate(X_test, Y_test, verbose=0)
# Split the data into training, validation and test sets
X_train1 = X_pca_proj_3D[:train_size]
X_valid = X_pca_proj_3D[train_size:]
Y_train1 = Y_train[:train_size]
Y_valid = Y_train[train_size:]
# We need to convert the input into (samples, channels, rows, cols) format
X_train1 = X_train1.reshape(X_train1.shape[0], 1, 10, 10).astype('float32')
X_valid = X_valid.reshape(X_valid.shape[0], 1, 10, 10).astype('float32')
X_test = X_test.reshape(X_test.shape[0], 1, 28, 28).astype('float32')
X_train1.shape, X_valid.shape, X_test.shape
((51000, 1, 10, 10), (9000, 1, 10, 10), (10000, 1, 28, 28))
#create model
cnn_model_1=Sequential()
#1st Dense Layer
cnn_model_1.add(Conv2D(32, kernel_size=(5,5),
                       data_format="channels_first",
                       input_shape=(1,10,10),
                       activation='relu'))
#Max-Pooling
cnn_model_1.add(MaxPooling2D(pool_size=(2,2)))
#Max pooling is a sample-based discretization process. The objective is to
#down-sample an input representation (image, hidden-layer output matrix,
#etc.), reducing its dimensionality
#the number of layers remains unchanged in the pooling operation
#cnn_model_1.add(BatchNormalization())
#Dropout
cnn_model_1.add(Flatten())
#cnn_model_1.add(BatchNormalization())
#2nd Dense Layer
cnn_model_1.add(Dense(128, activation='relu'))
#final softmax layer
cnn_model_1.add(Dense(10, activation='softmax'))
# print a summary and check if you created the network you intended
cnn_model_1.summary()
#Compile Model
cnn_model_1.compile(loss='categorical_crossentropy', optimizer='adam',
                    metrics=['accuracy'])
#Fit the model
cnn_model_1_history = cnn_model_1.fit(X_train1, Y_train1,
                                      validation_data=(X_valid, Y_valid),
                                      epochs=5, batch_size=100, verbose=2)
# Final evaluation of the model
cnn_model_1_scores = cnn_model_1.evaluate(X_test, Y_test, verbose=0)
print("Baseline Test Accuracy={0:.2f}% (categorical_crossentropy) loss=
{1:.2f}".format(cnn_model_1_scores[1]*100, cnn_model_1_scores[0]))
cnn_model_1_scores
I solved the problem; updating the post to give other coders some intuition for debugging their own code. First, I applied PCA to the X_test data and, after getting a low score, I tried without applying it. As @Scott suggested, this was wrong. After carefully checking my code, I saw that I had forgotten to change X_test to X_test_pca after applying PCA to the test data while constructing the CNN model. I also had to use the PCA fitted on X_train to transform X_test, rather than fitting a new PCA on the test data.
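For anyone debugging the same mistake, a minimal sketch of the corrected flow (assuming X_train and X_test are flattened to shape (n, 784) before PCA, as in the original pipeline):
pca_3D = PCA(n_components=100)
X_train_pca = pca_3D.fit_transform(X_train)     # fit the projection on the training data only
X_test_pca = pca_3D.transform(X_test)           # reuse that same projection for the test data

# reshape the 100 components into the (channels, rows, cols) layout the CNN expects
X_train_pca = X_train_pca.reshape(X_train_pca.shape[0], 1, 10, 10).astype('float32')
X_test_pca = X_test_pca.reshape(X_test_pca.shape[0], 1, 10, 10).astype('float32')

# evaluate on the PCA-transformed test set, not on the raw 28x28 images
cnn_model_1_scores = cnn_model_1.evaluate(X_test_pca, Y_test, verbose=0)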