ValueError: Can not squeeze dim[1], expected a dimension of 1, got 3 for 'metrics/sparse_categorical_accuracy/Squeeze' (op: 'Squeeze') with input shapes: [?,3].
The Iris dataset
In this assignment, you will use the Iris dataset. It consists of 50 samples from each of three species of Iris (Iris setosa, Iris virginica and Iris versicolor). Four features were measured from each sample: the length and the width of the sepals and petals, in centimeters. For a reference, see the following paper:
R. A. Fisher. "The use of multiple measurements in taxonomic problems". Annals of Eugenics. 7 (2): 179–188, 1936.
Your goal is to construct a neural network that classifies each sample into the correct class, and to apply validation and regularisation techniques along the way.
Load and preprocess the data
First read in the Iris dataset using datasets.load_iris(), and split the dataset into training and test sets.
You can now construct a model to fit to the data. Using the Sequential API, build your model according to the following specifications:
The model should use the input_shape in the function argument to set the input size in the first layer.
The first layer should be a dense layer with 64 units.
The weights of the first layer should be initialised with the He uniform initializer.
The biases of the first layer should be all initially equal to one.
There should then be a further four dense layers, each with 128 units.
This should be followed with four dense layers, each with 64 units.
All of these Dense layers should use the ReLU activation function.
The output Dense layer should have 3 units and the softmax activation function.
In total, the network should have 10 layers.
from numpy.random import seed
seed(8)
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets, model_selection
get_ipython().run_line_magic('matplotlib', 'inline')
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Softmax
def read_in_and_split_data(iris_data):
    return model_selection.train_test_split(
        iris_data["data"],
        iris_data["target"],
        test_size=0.1
    )
# Run your function to generate the test and training data.
iris_data = datasets.load_iris()
(train_data, test_data,
 train_targets, test_targets) = read_in_and_split_data(iris_data)
We will now convert the training and test targets to a one-hot encoding.
# Convert targets to a one-hot encoding
train_targets = tf.keras.utils.to_categorical(np.array(train_targets))
test_targets = tf.keras.utils.to_categorical(np.array(test_targets))
#### GRADED CELL ####
# Complete the following function.
# Make sure to not change the function name or arguments.
def get_model(input_shape):
    """
    This function should build a Sequential model according to
    the above specification. Ensure the weights are initialised
    by providing the input_shape argument in the first layer, given by the
    function argument.
    Your function should return the model.
    """
    model = Sequential([
        Dense(64, activation="relu",
              kernel_initializer='he_uniform',
              bias_initializer='ones',
              input_shape=input_shape),
        Dense(128, activation="relu"),
        Dense(128, activation="relu"),
        Dense(128, activation="relu"),
        Dense(128, activation="relu"),
        Dense(64, activation="relu"),
        Dense(64, activation="relu"),
        Dense(64, activation="relu"),
        Dense(64, activation="relu"),
        Dense(3, activation="softmax"),
    ])
    return model
# Run your function to get the model
model = get_model(train_data[0].shape)
Compile the model
You should now compile the model using the `compile` method. Remember that you need to specify an optimizer, a loss function and a metric to judge the performance of your model.
#### GRADED CELL ####
# Complete the following function.
# Make sure to not change the function name or arguments.
def compile_model(model):
    #model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
    opt = tf.keras.optimizers.Adam(learning_rate=0.0001)
    acc = tf.keras.metrics.SparseCategoricalAccuracy()
    model.compile(optimizer=opt,
                  loss='sparse_categorical_crossentropy',
                  metrics=[acc])
# Run your function to compile the model
compile_model(model)
#### GRADED CELL ####
# Complete the following function.
# Make sure to not change the function name or arguments.
def train_model(model, train_data, train_targets, epochs):
    """
    This function should train the model for the given number of epochs on the
    train_data and train_targets.
    Your function should return the training history, as returned by model.fit.
    """
    # Pass epochs as a keyword argument; positionally it would be
    # interpreted as batch_size.
    return model.fit(train_data, train_targets, epochs=epochs)
# Run the following cell to run the training for 800 epochs.
# Run your function to train the model
history = train_model(model, train_data, train_targets, epochs=800)
This is because you are using the wrong loss function. Your targets are one-hot encoded, so you should not use 'sparse_categorical_crossentropy' (which expects integer labels); you should use 'categorical_crossentropy' instead.
The same applies to the metric: acc = tf.keras.metrics.SparseCategoricalAccuracy() should be acc = tf.keras.metrics.CategoricalAccuracy().
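For reference, a corrected compile_model under that fix might look like this (a sketch only; it keeps the same optimizer and learning rate as above). Alternatively, you could drop the to_categorical step and keep the sparse loss with integer targets.

def compile_model(model):
    # Categorical (not sparse) loss and metric, since the targets are one-hot encoded.
    opt = tf.keras.optimizers.Adam(learning_rate=0.0001)
    acc = tf.keras.metrics.CategoricalAccuracy()
    model.compile(optimizer=opt,
                  loss='categorical_crossentropy',
                  metrics=[acc])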
Related
I have a dataset whose schema looks like this:
X1 ... X20 C
where the first 20 columns are input data and the last column is the target. The dataset includes 2000 records. I want to design a sequential Keras model to classify those target labels (which vary from 1 to 10, making this a multi-label classification problem). Assuming that I have saved the input data and labels in X_train_1 and y_train_1, here is my model:
def build_model_1(n_hidden=1, n_neurons=30, learning_rate=3e-3, input_shape=X_train_1.shape):
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.InputLayer(input_shape=input_shape))
    model.add(tf.keras.layers.BatchNormalization(momentum=0.999))
    for layer in range(n_hidden):
        model.add(tf.keras.layers.Dense(n_neurons, tf.keras.activations.selu,
                                        kernel_initializer="lecun_normal",
                                        kernel_regularizer=tf.keras.regularizers.l2(0.01)))
        model.add(tf.keras.layers.BatchNormalization(momentum=0.999))
    model.add(tf.keras.layers.Dense(10, tf.keras.activations.softmax, kernel_initializer="lecun_normal"))
    loss = tf.keras.losses.categorical_crossentropy
    optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate, beta_1=0.9, beta_2=0.999)
    metric = [tf.keras.metrics.Accuracy()]
    model.compile(loss=loss, optimizer=optimizer, metrics=metric)
    return model
I thought the shape of the input should be that of my training dataset, however when I compile and fit my model, I get the following error:
ValueError: Input 0 of layer sequential_12 is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: (32, 20)
What am I doing wrong here?
Your input shape is simply 20, since you have 20 features and 2000 samples. You do not have to provide the batch size. Here is a working example:
import tensorflow as tf
import numpy as np
def build_model_1(n_hidden=1, n_neurons=30, learning_rate=3e-3, input_shape=(20,)):
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.InputLayer(input_shape=input_shape))
    model.add(tf.keras.layers.BatchNormalization(momentum=0.999))
    for layer in range(n_hidden):
        model.add(tf.keras.layers.Dense(n_neurons, tf.keras.activations.selu,
                                        kernel_initializer="lecun_normal",
                                        kernel_regularizer=tf.keras.regularizers.l2(0.01)))
        model.add(tf.keras.layers.BatchNormalization(momentum=0.999))
    model.add(tf.keras.layers.Dense(10, tf.keras.activations.softmax, kernel_initializer="lecun_normal"))
    loss = tf.keras.losses.categorical_crossentropy
    optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate, beta_1=0.9, beta_2=0.999)
    metric = [tf.keras.metrics.Accuracy()]
    model.compile(loss=loss, optimizer=optimizer, metrics=metric)
    return model
train_data = np.random.random((2000, 20))
model = build_model_1()
y = model(train_data)
Also, ask yourself if you are really dealing with a multi-label classification problem. Can a sample from your dataset belong to more than one class, or are the classes mutually exclusive? If the classes are not mutually exclusive, I would recommend changing the activation function for the output layer to sigmoid and changing the loss function to binary_crossentropy. The intuition behind this can be found here.
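If the classes turn out to be mutually exclusive, keep the softmax/categorical_crossentropy setup above. If a sample really can carry several labels at once, a minimal sketch of the multi-label variant (my illustration, assuming the same 20 input features and 10 labels) would be:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(20,)),
    tf.keras.layers.Dense(30, activation='relu'),
    # One independent sigmoid per label instead of a softmax over all labels.
    tf.keras.layers.Dense(10, activation='sigmoid')
])
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=[tf.keras.metrics.BinaryAccuracy()])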
I'm new to machine learning and thought I'd start with Keras. Here I'm classifying movie reviews as a three-class classification problem (positive as 1, neutral as 0 and negative as -1) using binary crossentropy. So, when I try to wrap my Keras model with a TensorFlow estimator, I get the error.
The code is as follows:
import tensorflow as tf
import numpy as np
import pandas as pd
import numpy as K
csvfilename_train = 'train(cleaned).csv'
csvfilename_test = 'test(cleaned).csv'
# Read .csv files as pandas dataframes
df_train = pd.read_csv(csvfilename_train)
df_test = pd.read_csv(csvfilename_test)
train_sentences = df_train['Comment'].values
test_sentences = df_test['Comment'].values
# Extract labels from dataframes
train_labels = df_train['Sentiment'].values
test_labels = df_test['Sentiment'].values
vocab_size = 10000
embedding_dim = 16
max_length = 30
trunc_type = 'post'
oov_tok = '<OOV>'
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
tokenizer = Tokenizer(num_words = vocab_size, oov_token = oov_tok)
tokenizer.fit_on_texts(train_sentences)
word_index = tokenizer.word_index
sequences = tokenizer.texts_to_sequences(train_sentences)
padded = pad_sequences(sequences, maxlen = max_length, truncating = trunc_type)
test_sequences = tokenizer.texts_to_sequences(test_sentences)
test_padded = pad_sequences(test_sequences, maxlen = max_length)
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim, input_length=max_length),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(6, activation='relu'),
    tf.keras.layers.Dense(2, activation='sigmoid'),
])
model.compile(loss = 'binary_crossentropy', optimizer = 'adam', metrics = ['accuracy'])
num_epochs = 10
model.fit(padded, train_labels, epochs = num_epochs, validation_data = (test_padded, test_labels))
And the error is as follows:
---> 10 model.fit(padded, train_labels, epochs = num_epochs, validation_data = (test_padded, test_labels))
And finally this:
ValueError: logits and labels must have the same shape ((None, 2) vs (None, 1))
There are several issues with your code.
You are using the wrong loss function. The binary cross-entropy loss is used for binary classification problems, but here you are doing multi-class classification (3 classes: positive, negative, neutral).
Using the sigmoid activation function in the last layer is also wrong, because the sigmoid maps values to the range between 0 and 1 while your class labels are 0, 1 and -1. Since the sigmoid can never output a negative value, the network will never learn to predict the negative class.
The right approach would be to view this as a multi-class classification problem and use the categorical cross-entropy loss accompanied by the softmax activation in your last Dense layer with 3 units (one for each class). Note that one-hot encoded labels have to be used for the categorical cross-entropy loss and integer labels can be used along with the sparse categorical cross-entropy loss.
Below is an example using categorical cross-entropy loss.
tf.keras.layers.Dense(3, activation = 'softmax')
Note the 3 changes:
loss function changed to categorical cross-entropy
No. of units in final Dense layer is 3
One-hot encoding of labels is required and can be done using tf.one_hot
tf.one_hot(train_labels, 3)
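Putting the three changes together, a sketch of the corrected model could look like this (it reuses the tokenizer output from the question; one caveat, which is my addition: tf.one_hot maps a negative index to an all-zero vector, so the -1 labels should be shifted into the range 0..2 first):

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim, input_length=max_length),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(6, activation='relu'),
    tf.keras.layers.Dense(3, activation='softmax'),  # one unit per class
])
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# Shift labels from {-1, 0, 1} to {0, 1, 2}, then one-hot encode.
train_labels_onehot = tf.one_hot(train_labels + 1, 3)
test_labels_onehot = tf.one_hot(test_labels + 1, 3)
model.fit(padded, train_labels_onehot, epochs=num_epochs,
          validation_data=(test_padded, test_labels_onehot))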
Note: All code for a self-contained example to reproduce my problem can be found below.
I have a tf.keras.models.Model instance and need to train it with a training loop written in the low-level TensorFlow API.
The problem:
Training the exact same tf.keras model once with a basic, standard low-level TensorFlow training loop and once with Keras' own model.fit() method produces very different results. I would like to find out what I'm doing wrong in my low-level TF training loop.
The model is a simple image classification model that I train on Caltech256 (link to tfrecords below).
With the low-level TensorFlow training loop, the training loss first decreases as it should, but then after just 1000 training steps, the loss plateaus and then starts increasing again:
Training the same model on the same dataset using the normal Keras training loop, on the other hand, works as expected:
What am I missing in my low-level TensorFlow training loop?
Here is the code to reproduce the problem (download the TFRecords with the link at the bottom):
import tensorflow as tf
from tqdm import trange
import sys
import glob
import os
sess = tf.Session()
tf.keras.backend.set_session(sess)
num_classes = 257
image_size = (224, 224, 3)
# Build a tf.data.Dataset from TFRecords.
tfrecord_directory = 'path/to/tfrecords/directory'
tfrecord_filenames = glob.glob(os.path.join(tfrecord_directory, '*.tfrecord'))
feature_schema = {'image': tf.FixedLenFeature([], tf.string),
                  'filename': tf.FixedLenFeature([], tf.string),
                  'label': tf.FixedLenFeature([], tf.int64)}
dataset = tf.data.Dataset.from_tensor_slices(tfrecord_filenames)
dataset = dataset.shuffle(len(tfrecord_filenames)) # Shuffle the TFRecord file names.
dataset = dataset.flat_map(lambda filename: tf.data.TFRecordDataset(filename))
dataset = dataset.map(lambda single_example_proto: tf.parse_single_example(single_example_proto, feature_schema)) # Deserialize tf.Example objects.
dataset = dataset.map(lambda sample: (sample['image'], sample['label']))
dataset = dataset.map(lambda image, label: (tf.image.decode_jpeg(image, channels=3), label)) # Decode JPEG images.
dataset = dataset.map(lambda image, label: (tf.image.resize_image_with_pad(image, target_height=image_size[0], target_width=image_size[1]), label))
dataset = dataset.map(lambda image, label: (tf.image.per_image_standardization(image), label))
dataset = dataset.map(lambda image, label: (image, tf.one_hot(indices=label, depth=num_classes))) # Convert labels to one-hot format.
dataset = dataset.shuffle(buffer_size=10000)
dataset = dataset.repeat()
dataset = dataset.batch(32)
iterator = dataset.make_one_shot_iterator()
features, labels = iterator.get_next()
# Build a simple model.
input_tensor = tf.keras.layers.Input(shape=image_size)
x = tf.keras.layers.Conv2D(64, (3,3), strides=(2,2), activation='relu', kernel_initializer='he_normal')(input_tensor)
x = tf.keras.layers.Conv2D(64, (3,3), strides=(2,2), activation='relu', kernel_initializer='he_normal')(x)
x = tf.keras.layers.Conv2D(128, (3,3), strides=(2,2), activation='relu', kernel_initializer='he_normal')(x)
x = tf.keras.layers.Conv2D(256, (3,3), strides=(2,2), activation='relu', kernel_initializer='he_normal')(x)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
x = tf.keras.layers.Dense(num_classes, activation=None, kernel_initializer='he_normal')(x)
model = tf.keras.models.Model(input_tensor, x)
This is the simple TensorFlow training loop:
# Build the training-relevant part of the graph.
model_output = model(features)
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(labels=tf.stop_gradient(labels), logits=model_output))
train_op = tf.train.AdamOptimizer().minimize(loss)
# The next block is for the metrics.
with tf.variable_scope('metrics') as scope:
    predictions_argmax = tf.argmax(model_output, axis=-1, output_type=tf.int64)
    labels_argmax = tf.argmax(labels, axis=-1, output_type=tf.int64)
    mean_loss_value, mean_loss_update_op = tf.metrics.mean(loss)
    acc_value, acc_update_op = tf.metrics.accuracy(labels=labels_argmax, predictions=predictions_argmax)
    local_metric_vars = tf.contrib.framework.get_variables(scope=scope, collection=tf.GraphKeys.LOCAL_VARIABLES)
    metrics_reset_op = tf.variables_initializer(var_list=local_metric_vars)
# Run the training
epochs = 3
steps_per_epoch = 1000
fetch_list = [mean_loss_value,
              acc_value,
              train_op,
              mean_loss_update_op,
              acc_update_op]
sess.run(tf.global_variables_initializer())
sess.run(tf.local_variables_initializer())
with sess.as_default():
    for epoch in range(1, epochs+1):
        tr = trange(steps_per_epoch, file=sys.stdout)
        tr.set_description('Epoch {}/{}'.format(epoch, epochs))
        sess.run(metrics_reset_op)
        for train_step in tr:
            ret = sess.run(fetch_list, feed_dict={tf.keras.backend.learning_phase(): 1})
            tr.set_postfix(ordered_dict={'loss': ret[0],
                                         'accuracy': ret[1]})
Below is the standard Keras training loop, which works as expected. Note that the activation of the dense layer in the model above needs to be changed from None to 'softmax' in order for the Keras loop to work.
epochs = 3
steps_per_epoch = 1000
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
history = model.fit(dataset,
                    epochs=epochs,
                    steps_per_epoch=steps_per_epoch)
You can download the TFRecords for the Caltech256 dataset here (about 850 MB).
UPDATE:
I've managed to solve the problem: Replacing the low-level TF loss function
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(labels=tf.stop_gradient(labels), logits=model_output))
by its Keras equivalent
loss = tf.reduce_mean(tf.keras.backend.categorical_crossentropy(target=labels, output=model_output, from_logits=True))
does the trick. Now the low-level TensorFlow training loop behaves just like model.fit().
This raises a new question:
What does tf.keras.backend.categorical_crossentropy() do that tf.nn.softmax_cross_entropy_with_logits_v2() doesn't that leads the latter to perform much worse? (I know that the latter needs logits, not softmax output, so that's not the issue)
However, I don't know why this is. If anyone knows why tf.keras.backend.categorical_crossentropy() behaves well while tf.nn.softmax_cross_entropy_with_logits_v2() doesn't work at all, please post an answer.
Another important note:
In order to train a tf.keras model with a low-level TF training loop and a tf.data.Dataset object, one generally shouldn't call the model on the iterator output. That is, one shouldn't do this:
model_output = model(features)
Instead, one should create a model in which the input layer is set to build on the iterator output instead of creating a placeholder, like so:
input_tensor = tf.keras.layers.Input(tensor=features)
This doesn't matter in this example, but it becomes relevant if any layers in the model have internal updates that need to be run during the training (e.g. BatchNormalization).
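A minimal sketch of that pattern, using the features tensor from the iterator above (shortened to a single convolutional block for illustration):

# Build the model graph directly on the iterator output, so that any
# layer-internal update ops (e.g. BatchNormalization moving averages)
# become part of the training graph.
input_tensor = tf.keras.layers.Input(tensor=features)
x = tf.keras.layers.Conv2D(64, (3, 3), strides=(2, 2), activation='relu',
                           kernel_initializer='he_normal')(input_tensor)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
x = tf.keras.layers.Dense(num_classes, activation=None,
                          kernel_initializer='he_normal')(x)
model = tf.keras.models.Model(input_tensor, x)
model_output = model.output  # use this instead of calling model(features)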
You apply a softmax activation on your last layer
x = tf.keras.layers.Dense(num_classes, activation='softmax', kernel_initializer='he_normal')(x)
and you apply softmax again when using tf.nn.softmax_cross_entropy_with_logits_v2, since it expects unscaled logits. From the documentation:
WARNING: This op expects unscaled logits, since it performs a softmax
on logits internally for efficiency. Do not call this op with the
output of softmax, as it will produce incorrect results.
Thus, remove the softmax activation of your last layer and it should work.
x = tf.keras.layers.Dense(num_classes, activation=None, kernel_initializer='he_normal')(x)
[...]
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(labels=tf.stop_gradient(labels), logits=model_output))
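To see why the double softmax is so damaging, here is a small numpy sketch (my own illustration, not from the original post): once the logits have been squashed into probabilities, all values lie between 0 and 1, so a second softmax pushes the distribution towards uniform and the loss stays high no matter how confident the model is.

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

logits = np.array([4.0, 1.0, 0.0])  # a confident, correct prediction for class 0

p_once = softmax(logits)    # what a "with_logits" loss is supposed to receive
p_twice = softmax(p_once)   # what it receives if the model already applied softmax

print(-np.log(p_once[0]))   # ~0.07: small loss for a confident correct prediction
print(-np.log(p_twice[0]))  # ~0.59: loss stays large, so the gradient signal is weak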
I am trying to build a neural network to predict 3 output values from 63 inputs. I have a dataset containing two numpy arrays with shapes [8100, 63] and [8100, 3], but when I try to feed them to Keras the model does not converge and the mean squared error is in the area of 10^11.
The function I used to generate the data does not have any non-linear properties, so I first thought that one or two layers should be enough. With three layers the MSE is still in the area of 10^10, and I am not sure what I am doing wrong.
The regression should return three absolute values which can be bigger than 1; this is the reason why I didn't use a softmax layer.
I would be really grateful for any input or help!
import numpy as np
from keras.models import *
from keras.layers import Dense
from keras import optimizers
from keras.utils import plot_model
np.random.seed(7)
#Define Input
tf_features_64 = np.load("IN.npy")
tf_labels_64 = np.load("OUT.npy")
tf_features_32 = tf_features_64.astype(np.float32)
tf_labels_32 = tf_labels_64.astype(np.float32)
X = tf_features_32
Y = tf_labels_32
#create Layers
visible = Input(shape=(63,))
x = Dense(100, activation='relu')(visible)
x = Dense(100, activation='relu')(x)
x = Dense(100, activation='relu')(x)
x = Dense(70, activation='relu')(x)
x = Dense(30, activation='relu')(x)
output = Dense(3)(x)
Optimizer = optimizers.Adam(lr=0.001)
model = Model(inputs=visible, outputs=output)
model.compile(optimizer=Optimizer,
              loss='categorical_crossentropy',
              metrics=['mse']
              )
model.fit(X, Y, epochs=400, batch_size=300, shuffle=True)
model.summary()
When using a neural network for classification, we should use softmax in the last layer together with the categorical_crossentropy loss:
output = Dense(3, activation='softmax')(x)
model.compile(optimizer=Optimizer,
              loss='categorical_crossentropy')
For regression we should use a linear output with the mse loss:
output = Dense(3)(x)
model.compile(optimizer=Optimizer,
              loss='mse')
You are using categorical_crossentropy as the loss function and mse as a metric:
model.compile(optimizer=Optimizer,
              loss='categorical_crossentropy',
              metrics=['mse']
              )
Change the loss function to mse:
model.compile(optimizer=Optimizer,
              loss='mse')
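Putting it together, a minimal sketch of the corrected compile-and-fit step (reusing visible, x, X and Y from the question; the mae metric is my addition, purely to have an interpretable error measure):

output = Dense(3)(x)  # linear output: the three targets can exceed 1
model = Model(inputs=visible, outputs=output)
model.compile(optimizer=optimizers.Adam(lr=0.001),
              loss='mse',
              metrics=['mae'])
model.fit(X, Y, epochs=400, batch_size=300, shuffle=True)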
My data has the same shape; I just generated random numbers here. In reality the data are floats in the range -6 to 6, which I scaled as well. The input layer size and the encoding dimension have to remain the same. When I train, the loss starts at 0.631 and stays there the whole time; I have already changed the learning rate manually. I am new to Python and do not know how to implement a grid search in this code to find the right parameters. What else can I do to tune my network?
import numpy as np
from keras.layers import Input, Dense
from keras.models import Model
from keras import optimizers
#Train data
x_train = np.random.rand(2666000)
x_train = (x_train - x_train.min()) / (x_train.max() - x_train.min())  # min-max scaling
x_train = x_train.reshape(-1, 2000)
x_test=[]#empty testing later
#Enc Dimension
encoding_dim=100
#Input shape
input_dim = Input(shape=(2000,))
#Encoding Layer
encoded = Dense(encoding_dim, activation='relu')(input_dim)
#Decoding Layer
decoded = Dense(2000, activation='sigmoid')(encoded)
#Model AE
autoencoder = Model(input_dim, decoded)
#Model Encoder
encoder = Model(input_dim, encoded)
#Encoding
encoded_input = Input(shape=(encoding_dim,))
#Decoding
decoder_layer = autoencoder.layers[-1]
#Model Decoder
decoder = Model(encoded_input, decoder_layer(encoded_input))
optimizer = optimizers.Adadelta(lr=0.1, rho=0.95, epsilon=None, decay=0.0)
autoencoder.compile(optimizer=optimizer, loss='binary_crossentropy',
                    metrics=['accuracy'])
#Train and test
epochs = 100  # number of epochs was not given in the original post
autoencoder_train = autoencoder.fit(x_train, x_train,
                                    epochs=epochs, shuffle=False, batch_size=2048)
I suggest adding more hidden layers. If your loss stays the same it means at least one of two things:
Your data is more or less random and there are no relationships to be drawn
Your model is not complex enough to learn meaningful relationships from your data
A rule of thumb for me is that a model should be powerful enough to overfit the data given enough training iterations.
Unfortunately there is a fine line between sufficiently complex and too complex. You have to play around with the number of hidden layers, the number of units in each layer, and the number of epochs you train your network for. Since you only have two Dense layers, a good starting point would be to increase model complexity, as sketched below.
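As a concrete starting point for this autoencoder, one way to add complexity is to stack intermediate layers around the bottleneck (a sketch only, reusing input_dim and encoding_dim from the question; the layer sizes are illustrative, not tuned):

encoded = Dense(500, activation='relu')(input_dim)
encoded = Dense(250, activation='relu')(encoded)
encoded = Dense(encoding_dim, activation='relu')(encoded)  # bottleneck
decoded = Dense(250, activation='relu')(encoded)
decoded = Dense(500, activation='relu')(decoded)
decoded = Dense(2000, activation='sigmoid')(decoded)       # reconstruction
autoencoder = Model(input_dim, decoded)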
If you insist on using a grid search keras has a wrapper for scikit_learn and sklearn has a grid search module. A toy example:
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV
def create_model():
    <return a compiled but untrained keras model>
model = KerasClassifier(build_fn = create_model, batch_size=1000, epochs=10)
#now write out all the parameters you want to try out for the grid search
activation = ['relu', 'tanh', 'sigmoid', ...]
learn_rate = [0.1, 0.2, ...]
init = ['uniform', 'normal', 'zero', ...]
optimizer = ['SGD', 'Adam' ...]
param_grid = dict(activation=activation, learn_rate=learn_rate, init=init, optimizer=optimizer)
grid = GridSearchCV(estimator=model, param_grid=param_grid)
result = grid.fit(X, y)
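After fitting, the best configuration can be read off the result via the standard scikit-learn attributes:

print('Best score: %.4f' % result.best_score_)
print('Best parameters:', result.best_params_)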