Loss not converging in visual question answering with keras - python

I am trying to train a neural network for visual question answering, but the loss keeps diverging.
Basic hyperparameter modifications gave no results, and I've tried different models too, with no improvement. Here is a model I used:
word2vec_dim = 30
num_hidden_nodes_mlp = 1024
num_hidden_nodes_lstm = 30
num_layers_lstm = 2
dropout = 0.3
activation_mlp = 'tanh'
num_epochs = 1
image_model = Sequential()
image_model.add(Reshape(input_shape = (320,480,4), target_shape=(320,480,4)))
image_model.add(Conv2D(4,(3,1)))
image_model.add(Conv2D(4,(1,3)))
image_model.add(MaxPooling2D(pool_size=(2, 2)))
image_model.add(Conv2D(4,(3,1)))
image_model.add(Conv2D(4,(1,3)))
image_model.add(MaxPooling2D(pool_size=(2, 2)))
image_model.add(Conv2D(4,(3,1)))
image_model.add(Conv2D(4,(1,3)))
image_model.add(MaxPooling2D(pool_size=(2, 2)))
image_model.add(Conv2D(4,(3,1)))
image_model.add(Conv2D(4,(1,3)))
image_model.add(Flatten())
image_model.add(Dense(num_hidden_nodes_lstm, activation='relu'))
model1 = Model(inputs = image_model.input, outputs = image_model.output)
model1.summary()
language_model = Sequential()
language_model.add(Embedding(len(unique_words)+1, word2vec_dim, input_length=max_lenght))
language_model.add(LSTM(units=num_hidden_nodes_lstm,
return_sequences=True, input_shape=(None, word2vec_dim)))
for i in range(num_layers_lstm-2):
    language_model.add(LSTM(units=num_hidden_nodes_lstm, return_sequences=True))
language_model.add(LSTM(units=num_hidden_nodes_lstm, return_sequences=False))
model2 = Model(language_model.input, language_model.output)
model2.summary()
combined = concatenate([image_model.output, language_model.output])
model = Dense(512, activation="tanh", kernel_initializer="uniform")(combined)
#model = Activation('tanh')(model)
model = Dropout(0.3)(model)
model = Dense(512, activation="tanh", kernel_initializer="uniform")(model)
#model = Activation('tanh')(model)
#model = Dropout(0.5)(model)
#model = Dense(1024, activation="tanh", kernel_initializer="uniform")(model)
#model = Activation('tanh')(model)
#model = Dropout(0.5)(model)
model = Dense(13, activation="softmax")(model)
model = Model(inputs=[image_model.input, language_model.input], outputs=model)
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
model.summary()
Here instead is the training code. The dataset was split 80/20 and the batch size is 64. The number of epochs is low, but since the dataset is big (about 3k batches), the loss explodes before even getting through 10% of a single one.
The target class is one-hot encoded, and the question encoding is done with a one-to-one dictionary correspondence (using a dictionary with every word, since there are not many), leaving 0 as the padding value. I have ignored commas, question marks, etc., though.
train_gen=image_generator(batch_size=batch_size)
eval_gen=evaluation_generator(batch_size=batch_size)
model.fit(x=train_gen, epochs=2, verbose=1, validation_data=eval_gen, steps_per_epoch=training_batches ,validation_steps=evaluation_batches, shuffle=True, max_queue_size=10, callbacks=[save])
I also get this warning message:
/opt/conda/lib/python3.6/site-packages/tensorflow_core/python/framework/indexed_slices.py:433: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
Epoch 1/2
522/3243 [===>..........................] - ETA: 33:05 - loss: 2825421622922535501824.0000
I observed that the model would answer all the questions with the same class (I imagine this to be the cause of the diverging loss).
image_generator is defined as follows:
def my_hash(word):
    for x in range(dictionary_lenght-1):
        if word == unique_words[x]:
            return (x+1)
    print("Error, word not in the vocabulary")

def pad(sequence, lenght, value=0):
    for x in range(len(sequence), lenght):
        sequence.append(value)
    return sequence

def image_generator(batch_size=32):
    zeros = [0]*13
    while True:
        for x2 in range(training_batches):  # Select files (paths/indices) for the batch
            input_img_batch = []
            input_question_batch = []
            output_batch = []
            img_name = ""
            for x in range(batch_size):
                img_name = training_data["questions"][x+x2*batch_size]["image_filename"]
                question = training_data["questions"][x+x2*batch_size]["question"].replace("?", "")
                question = hashing_trick(question, dictionary_lenght, hash_function=my_hash)
                question = pad(question, max_lenght)
                img = Image.open("/kaggle/input/ann-and-dl-vqa/dataset_vqa/train/" + img_name, 'r')
                img = img.resize([img_width, img_height])
                img = np.asarray(img)
                img = img/255
                input_img_batch.append(img)
                input_question_batch.append(question)
                dummy = zeros
                dummy[encode_answer(training_data["questions"][x+x2*batch_size]["answer"])] = 1
                output_batch.append(dummy)
            # Return a tuple of (input, output) to feed the network
            batch_x1 = np.array(input_img_batch)
            batch_x2 = np.array(input_question_batch)
            batch_y = np.array(output_batch)
            yield ([batch_x1, batch_x2], batch_y)

I solved the problem.
There was a bug in image_generator.
The line dummy = zeros does not copy the list; it just creates a second reference to the same object, so writing dummy[...] = 1 also modified zeros. The 1s therefore accumulated across iterations and corrupted the prediction targets.
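A minimal sketch of the bug and the fix (class_index here is a hypothetical stand-in for the encode_answer(...) call in the generator):
# Buggy: dummy is the same list object as zeros, so the 1s
# accumulate in zeros across loop iterations.
dummy = zeros
dummy[class_index] = 1
# Fixed: build a fresh one-hot vector for every sample.
dummy = [0] * 13
dummy[class_index] = 1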

Related

How to make a neural network of nodes with equal weights in Keras, preferably functional API

I want to make a model like the one in the picture below (simplified).
So, practically, I want weights with the same names to always have the same values during training. What I did is the code below:
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

example_train_features = np.arange(12000).reshape(1000, 12)
example_labels = np.random.randint(2, size=1000)  # these data are just for illustration purposes
train_ds = tf.data.Dataset.from_tensor_slices((example_train_features, example_labels)).shuffle(buffer_size=1000).batch(32)
dense1 = layers.Dense(1, activation="relu")    # input shape: 4
dense2 = layers.Dense(2, activation="relu")    # input shape: 1
dense3 = layers.Dense(1, activation="sigmoid") # input shape: 6
feature_input = keras.Input(shape=(12,), name="features")
nodes_list = []
for i in range(3):
    first_lvl_input = feature_input[i :: 4] ######## marked line
    out1 = dense1(first_lvl_input)
    out2 = dense2(out1)
    nodes_list.append(out2)
joined = layers.concatenate(nodes_list)
final_output = dense3(joined)
model = keras.Model(inputs=feature_input, outputs=final_output, name="extrema_model")
model.compile(loss=tf.keras.losses.BinaryCrossentropy(),
              optimizer=tf.keras.optimizers.RMSprop(),
              metrics=keras.metrics.BinaryAccuracy())
history = model.fit(train_ds, epochs=10, validation_data=val_ds)
But when I try to run this code I get this error:
MklConcatOp : Dimensions of inputs should match: shape[0][0]= 71 vs. shape[18][0] = 70
[[node extrema_model/concatenate_2/concat (defined at <ipython-input-373-5efb41d312df>:398) ]] [Op:__inference_train_function_15338]
(Please don't pay attention to the numbers, as they come from my real code.) This happens because it receives the whole data, including the labels, as input; but shouldn't Keras feed only the features themselves? Anyway, if I write the marked line as below:
first_lvl_input = feature_input[i :12: 4]
it doesn't give me the above error anymore. But then I get another error, whose cause I understand but don't know how to resolve:
InvalidArgumentError: Incompatible shapes: [4,1] vs. [32,1]
[[node gradient_tape/binary_crossentropy/logistic_loss/mul/BroadcastGradientArgs
(defined at <ipython-input-1-b82546367b3c>:398) ]] [Op:__inference_train_function_6098]
This is because Keras is again feeding the whole batch array, whereas the Keras documentation says you shouldn't specify the batch dimension because the framework handles it by itself, so I expected Keras to feed the data one by one for my code to work. I would appreciate any ideas on how to resolve this, or on how to write code that does what I want. Thanks.
You can wrap the dense layers in a TimeDistributed wrapper and reshape your data to three dimensions, (1000, 3, 4) as (batch, sequence, feature). For each of the 3 time steps (which replace your for loop), the four features are then multiplied by the same weights.
example_train_features = np.arange(12000).reshape(1000, 3, 4)
example_labels = np.random.randint(2, size=1000)  # these data are just for illustration purposes
train_ds = tf.data.Dataset.from_tensor_slices((example_train_features, example_labels)).shuffle(buffer_size=1000).batch(32)
dense1 = layers.TimeDistributed(layers.Dense(1, activation="relu"))  # input shape: 4
dense2 = layers.TimeDistributed(layers.Dense(2, activation="relu"))  # input shape: 1
dense3 = layers.Dense(1, activation="sigmoid")                       # input shape: 6
feature_input = keras.Input(shape=(3, 4), name="features")
out1 = dense1(feature_input)
out2 = dense2(out1)
z = layers.Flatten()(out2)
final_output = dense3(z)
model = keras.Model(inputs=feature_input, outputs=final_output, name="extrema_model")
model.compile(loss=tf.keras.losses.BinaryCrossentropy(),
              optimizer=tf.keras.optimizers.RMSprop(),
              metrics=keras.metrics.BinaryAccuracy())
history = model.fit(train_ds, epochs=10)
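As a side note, weight sharing in Keras comes from reusing the same layer instance, so the slicing approach from the question can also work once the slice keeps the batch axis intact. A minimal sketch, assuming the twelve features form three contiguous groups of four and dense1 through dense3 are the plain Dense layers from the question:
feature_input = keras.Input(shape=(12,), name="features")
nodes_list = []
for i in range(3):
    # slice along the feature axis; the original [i::4] sliced the batch axis
    group = feature_input[:, i * 4:(i + 1) * 4]
    nodes_list.append(dense2(dense1(group)))  # same layer objects => shared weights
joined = layers.concatenate(nodes_list)
final_output = dense3(joined)
model = keras.Model(inputs=feature_input, outputs=final_output)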

Character-based Text Classification with Triplet Loss

I'm trying to implement a text classifier using triplet loss to classify different job descriptions into categories, based on this paper. But whatever I do, the classifier yields very bad results.
For the embedding I followed this tutorial, and the NN architecture is based on this article.
I create my encodings using:
max_char_length = 20
group_numbers = range(0, len(job_groups))
char_vocabulary = {'PAD': 0}
X_char = []
y_temp = []
i = 1
for group, number in zip(job_groups, group_numbers):
    for job in group:
        job_cleaned = some_cleaning_function(job)
        job_enc = []
        for c in job_cleaned:
            if c in char_vocabulary.keys():
                job_enc.append(char_vocabulary[c])
            else:
                char_vocabulary[c] = i
                job_enc.append(char_vocabulary[c])
                i += 1
        X_char.append(job_enc)
        y_temp.append(number)
X_char = pad_sequences(X_char, maxlen=max_char_length, truncating='post')
My Neural Network is set up the following way:
def create_base_model():
    char_in = Input(shape=(max_char_length,), name='Char_Input')
    char_enc = Embedding(input_dim=len(char_vocabulary)+1, output_dim=20, mask_zero=True, name='Char_Embedding')(char_in)
    x = Bidirectional(LSTM(64, return_sequences=True, recurrent_dropout=0.2, dropout=0.4))(char_enc)
    x = Bidirectional(LSTM(64, return_sequences=True, recurrent_dropout=0.2, dropout=0.4))(x)
    x = Bidirectional(LSTM(64, return_sequences=True, recurrent_dropout=0.2, dropout=0.4))(x)
    x = Bidirectional(LSTM(64, return_sequences=False, recurrent_dropout=0.2, dropout=0.4))(x)
    out = Dense(128, activation="softmax")(x)
    return Model(char_in, out)

def get_siamese_triplet_char():
    anchor_input_c = Input(shape=(max_char_length,), name='Char_Input_Anchor')
    pos_input_c = Input(shape=(max_char_length,), name='Char_Input_Positive')
    neg_input_c = Input(shape=(max_char_length,), name='Char_Input_Negative')
    base_model = create_base_model()
    encoded_anchor = base_model(anchor_input_c)
    encoded_positive = base_model(pos_input_c)
    encoded_negative = base_model(neg_input_c)
    inputs = [anchor_input_c, pos_input_c, neg_input_c]
    outputs = [encoded_anchor, encoded_positive, encoded_negative]
    siamese_triplet = Model(inputs, outputs)
    siamese_triplet.add_loss(triplet_loss(outputs))
    siamese_triplet.compile(loss=None, optimizer='adam')
    return siamese_triplet, base_model
The triplet loss is defined as follows:
def triplet_loss(inputs):
    anchor, positive, negative = inputs
    positive_distance = K.square(anchor - positive)
    negative_distance = K.square(anchor - negative)
    positive_distance = K.sqrt(K.sum(positive_distance, axis=-1, keepdims=True))
    negative_distance = K.sqrt(K.sum(negative_distance, axis=-1, keepdims=True))
    loss = positive_distance - negative_distance
    loss = K.maximum(0.0, 1 + loss)
    return K.mean(loss)
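(Written out, this is mean(max(0, 1 + d(a, p) - d(a, n))) over the batch, with d the Euclidean distance: a standard triplet loss with margin 1.)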
The model is then trained with:
siamese_triplet_char.fit(x=[Anchor_chars_train, Positive_chars_train, Negative_chars_train],
                         shuffle=True, batch_size=8, epochs=22, verbose=1)
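Note that no targets are passed to fit here: because the triplet loss is attached via add_loss and the model is compiled with loss=None, the loss is computed from the three inputs alone.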
My goal is to first train the network without label data, in order to shape the embedding space of the different phrases, and second, add a classification layer and create the final classifier.
My general problem is that even though the first phase shows decreasing loss values, it overfits, the validation results jump around, and the second phase fails badly, as I'm not able to train the model to actually classify.
My questions are the following:
Could someone explain the embedding architecture? What is the output dimension referring to? The individual characters? Would that even make sense? Or is there a better way to encode the input data?
How can I add validation_data to a network that does not contain labeled data? I could use validation_split, but I would rather pass specific data to validate on, as my data is stratified.
Is there a reason why the classification does not work? Applying a simple k-nearest-neighbour algorithm achieves at best 0.5 accuracy! Is it because of the data? Or is there a systematic error in my setup?
All ideas and suggestions are really appreciated!
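(For reference on the first question: in Keras, the Embedding layer's output_dim is the length of the dense vector learned per token, so output_dim=20 maps each character ID to a 20-dimensional trainable vector; character-level embeddings of this kind are a common choice.)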

Encoder-Decoder LSTM model gives 'nan' loss and predictions

I am trying to create a basic encoder-decoder model for training a chatbot. X contains the questions or human dialogues and Y contains the bot answers. I have padded the sequences to the max size of input and output sentences. X.shape = (2363, 242, 1) and Y.shape = (2363, 144, 1). But during training, the loss has value 'nan' for all epochs and the prediction gives array with all values as 'nan'. I have tried using 'rmsprop' optimizer instead of 'adam'. I cannot use loss function 'categorical_crossentropy' as the output is not one-hot encoded but a sequence. What exactly is wrong with my code?
Model
model = Sequential()
model.add(LSTM(units=64, activation='relu', input_shape=(X.shape[1], 1)))
model.add(RepeatVector(Y.shape[1]))
model.add(LSTM(units=64, activation='relu', return_sequences=True))
model.add(TimeDistributed(Dense(units=1)))
print(model.summary())
model.compile(optimizer='adam', loss='mean_squared_error')
hist = model.fit(X, Y, epochs=20, batch_size=64, verbose=2)
model.save('encoder_decoder_model_epochs20.h5')
Data Preparation
def remove_punctuation(s):
    s = s.translate(str.maketrans('', '', string.punctuation))
    s = s.encode('ascii', 'ignore').decode('ascii')
    return s

def prepare_data(fname):
    word2idx = {'PAD': 0}
    curr_idx = 1
    sents = list()
    for line in open(fname):
        line = line.strip()
        if line:
            tokens = remove_punctuation(line.lower()).split()
            tmp = []
            for t in tokens:
                if t not in word2idx:
                    word2idx[t] = curr_idx
                    curr_idx += 1
                tmp.append(word2idx[t])
            sents.append(tmp)
    sents = np.array(pad_sequences(sents, padding='post'))
    return sents, word2idx
human = 'rdany-conversations/human_text.txt'
robot = 'rdany-conversations/robot_text.txt'
X, input_vocab = prepare_data(human)
Y, output_vocab = prepare_data(robot)
X = X.reshape((X.shape[0], X.shape[1], 1))
Y = Y.reshape((Y.shape[0], Y.shape[1], 1))
First of all, check that you do not have any NaNs in your input. If that is not the case, it might be exploding gradients. Standardize your inputs (MinMax or Z scaling), try smaller learning rates, clip the gradients, or try different weight initializations.
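A minimal sketch of the clipping and learning-rate suggestions, assuming the model from the question (note also that the non-default activation='relu' in the LSTMs is unbounded, which makes overflow easier than with the default tanh):
from tensorflow.keras.optimizers import Adam

# Clip the global gradient norm and use a smaller learning rate.
optimizer = Adam(learning_rate=1e-4, clipnorm=1.0)
model.compile(optimizer=optimizer, loss='mean_squared_error')
As a separate observation beyond the answer above: since the targets are integer word indices, sparse_categorical_crossentropy would accept them directly without one-hot encoding, though that would also require the decoder to output a softmax over the vocabulary instead of a single unit.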

Transfer learning with Euclidean loss in the final layer

Greatly appreciate it if someone could help me out here:
I'm trying to do some transfer learning on a regression task --- my inputs are 200x200 RGB images and my prediction output/label is a set of real values (let's say within [0, 10], though scaling is not a big deal here...) --- on top of the InceptionV3 architecture. Here are my functions that take a pretrained Inception model, remove the last layer and add a new layer for transfer learning:
"""
Transfer learning functions
"""
IM_WIDTH, IM_HEIGHT = 299, 299 #fixed size for InceptionV3
NB_EPOCHS = 3
BAT_SIZE = 32
FC_SIZE = 1024
NB_IV3_LAYERS_TO_FREEZE = 172
def eucl_dist(inputs):
x, y = inputs
return ((x - y)**2).sum(axis=-1)
def add_new_last_continuous_layer(base_model):
"""Add last layer to the convnet
Args:
base_model: keras model excluding top, for instance:
base_model = InceptionV3(weights='imagenet',include_top=False)
Returns:
new keras model with last layer
"""
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(FC_SIZE, activation='relu')(x)
predictions = Lambda(eucl_dist, output_shape=(1,))(x)
model = Model(input=base_model.input, output=predictions)
return model
def setup_to_transfer_learn_continuous(model, base_model):
"""Freeze all layers and compile the model"""
for layer in base_model.layers:
layer.trainable = False
model.compile(optimizer='rmsprop',
loss= 'eucl_dist',
metrics=['accuracy'])
Here are my implementations:
base_model = InceptionV3(weights = "imagenet",
include_top=False, input_shape=(3,200,200))
model0 = add_new_last_continuous_layer(base_model)
setup_to_transfer_learn_continuous(model0, base_model)
history=model0.fit(train_x, train_y, validation_data = (test_x, test_y), nb_epoch=epochs, batch_size=32)
scores = model0.evaluate(test_x, test_y, verbose = 0)
features = model0.predict(X_train)
where train_x is a (168435, 3, 200, 200) numpy array and train_y is a (168435,) numpy array. The same goes for test_x and test_y except the number of observations is 42509.
I got a TypeError: 'Tensor' object is not iterable error, which occurred at predictions = Lambda(eucl_dist, output_shape=(1,))(x) when going through the add_new_last_continuous_layer() function. Could anyone kindly give me some guidance on what the problem is and how to get around it? Greatly appreciated, and happy holidays!
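For context on the error: eucl_dist unpacks its argument into two tensors (x, y = inputs), but the Lambda layer here is called on the single tensor x, so the unpacking tries to iterate over one Tensor and fails. A minimal sketch of an alternative, assuming the goal is a real-valued head trained with a squared-error (Euclidean) objective, which belongs in the loss rather than in a layer:
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(FC_SIZE, activation='relu')(x)
predictions = Dense(1, activation='linear')(x)
model = Model(inputs=base_model.input, outputs=predictions)
model.compile(optimizer='rmsprop', loss='mean_squared_error')
Note that loss='eucl_dist' in setup_to_transfer_learn_continuous would also fail: Keras only resolves built-in loss names from strings, so a custom loss has to be passed as the function object itself.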

Keras: How to concatenate over a subset of inputs

I am training a neural network using Keras and Theano, in which inputs have a format like:
[
[Situation features],
[Option 1 features],
[Option 2 features],
]
I want to train a model to predict how often each option will be chosen, by making the model learn how to score each option, and how the situation makes differences in score more or less important.
My model looks like:
option_inputs = [Input(shape=(NUM_FEATURES,), name='situation_input'),
Input(shape=(NUM_FEATURES,), name='option_input_0'),
Input(shape=(NUM_FEATURES,), name='option_input_1')]
situation_input_processing = Dense(5, activation='relu', name='situation_input_processing')
option_input_processing = Dense(20, activation='relu', name='option_input_processing')
diversity_neuron = Dense(1, activation='softplus', name='diversity_neuron')
scoring_neuron = Dense(1, activation='linear', name='scoring_neuron')
diversity_output = diversity_neuron(situation_input_processing(option_inputs[0]))
scoring_outputs = [scoring_neuron(option_input_processing(option_input)) for option_input in option_inputs[1:2]]
logit_outputs = [Multiply()([diversity_output, scoring_output]) for scoring_output in scoring_outputs]
probability_outputs = Activation('softmax')(keras.layers.concatenate(logit_outputs, axis=-1))
model = Model(inputs=option_inputs, outputs=probability_outputs)
When trying to get probability_outputs, I get the error:
ValueError: Concatenate layer should be called on a list of inputs
The error seems to be triggered because logit_outputs is not built by iterating through all 3 input feature collections, only 2 of them.
Any idea how to work around this problem?
Once the model is trained, I want to observe the outputs of diversity_neuron and scoring_neuron to learn how to extrapolate the scoring for arbitrary number of options and understand what drives diversity.
I have made these changes to work around the problem:
- I include the situation features at the beginning of each option feature list.
- I added a layer that can filter the situation features out of the option inputs. This is done by manually setting the weights of a non-trainable layer.
- I can then iterate through all inputs in any path of the network.
The final code looks like:
option_inputs = [Input(shape=(NUM_FEATURES,), name='option_input_0'),
                 Input(shape=(NUM_FEATURES,), name='option_input_1')]
situation_input_filtering = Dense(NUM_SITUATION_FEATURES, activation='linear', name='situation_input_filtering')
situation_input_filtering.trainable = False
situation_input_processing = Dense(5, activation='relu', name='situation_input_processing')
option_input_processing = Dense(20, activation='relu', name='option_input_processing')
diversity_neuron = Dense(1, activation='sigmoid', name='diversity_neuron')
scoring_neuron = Dense(1, activation='linear', name='scoring_neuron')
diversity_outputs = [diversity_neuron(situation_input_processing(situation_input_filtering(option_input)))
                     for option_input in option_inputs]
scoring_outputs = [scoring_neuron(option_input_processing(option_input)) for option_input in option_inputs]
logit_outputs = [Multiply()([diversity_output, scoring_output])
                 for diversity_output, scoring_output in zip(diversity_outputs, scoring_outputs)]
combined = keras.layers.concatenate(logit_outputs, axis=-1)
probability_outputs = Activation('softmax')(combined)
model = Model(inputs=option_inputs, outputs=probability_outputs)
model.compile(optimizer='adadelta', loss='categorical_crossentropy', metrics=['accuracy'])
mask_weights = np.zeros([NUM_FEATURES, NUM_SITUATION_FEATURES])
for i in range(NUM_SITUATION_FEATURES):
    mask_weights[i, i] = 1.0
for layer in model.layers:
    if layer.name == 'situation_input_filtering':
        layer.set_weights([mask_weights, np.zeros(NUM_SITUATION_FEATURES)])
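As a design note: the frozen linear layer acts as a fixed column selector. A Lambda slice would achieve the same thing without the weight bookkeeping; a minimal sketch, assuming the situation features occupy the first NUM_SITUATION_FEATURES columns of each option input:
from keras.layers import Lambda

# select the leading situation columns from an option input
situation_input_filtering = Lambda(lambda t: t[:, :NUM_SITUATION_FEATURES],
                                   name='situation_input_filtering')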
