Merge 3 Deep Networks and Train End-to-End - python

I'm using deep learning but am a beginner in it. I'm trying to build a feature-fusion setup using 3 deep neural network models: the idea is to get features from all three models, do classification on a last single sigmoid layer, and then get the results. Here is the code that I run.
Code:
from keras.layers import Input, Dense
from keras.models import Model
from sklearn.model_selection import train_test_split
import numpy
# random seed for reproducibility
numpy.random.seed(2)
# load the Pima Indians diabetes dataset (past 5 years of medical history)
dataset = numpy.loadtxt('https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv', delimiter=",")
# split into input (X) and output (Y) variables, splitting csv data
X = dataset[:, 0:8]
Y = dataset[:, 8]
x_train, x_validation, y_train, y_validation = train_test_split(X, Y, test_size=0.20, random_state=5)
#create the input layer
input_layer = Input(shape=(8,))
A2 = Dense(8, activation='relu')(input_layer)
A3 = Dense(30, activation='relu')(A2)
B2 = Dense(40, activation='relu')(A2)
B3 = Dense(30, activation='relu')(B2)
C2 = Dense(50, activation='relu')(B2)
C3 = Dense(5, activation='relu')(C2)
merged = Model(inputs=[input_layer],outputs=[A3,B3,C3])
final_model = Dense(1, activation='sigmoid')(merged)
final_model.compile(loss="binary_crossentropy",
optimizer="adam", metrics=['accuracy'])
# call the function to fit to the data (training the network)
final_model.fit(x_train, y_train, epochs=2000, batch_size=50,
validation_data=(x_validation, y_validation))
# evaluate the model
scores = final_model.evaluate(x_validation,y_validation)
print("\n%s: %.2f%%" % (final_model.metrics_names[1], scores[1] * 100))
Here is the error that I'm facing:
if x.shape.ndims is None:
AttributeError: 'Functional' object has no attribute 'shape'
Please help me fix this issue, or if anyone knows what code I should use, let me know. I'm willing to change the code, but not the concept. Thank you.
Update
Following @M.Innat's answer, we tried the following. The idea is that we first build 3 models and then build a final / combined model by joining these models with a single classifier. But I am facing a discrepancy: when I train each model, they give ~90% results, but when I combine them, they hardly reach 60 or 70%.
Code MODEL 1:
from keras.models import Sequential

model = Sequential()
# input layer requires input_dim param
model.add(Dense(10, input_dim=8, activation='relu'))
model.add(Dense(50, activation='relu'))
model.add(Dense(50, activation='relu'))
model.add(Dense(50, activation='relu'))
model.add(Dense(50, activation='relu'))
model.add(Dense(50, activation='relu'))
model.add(Dense(5, activation='relu'))
# sigmoid instead of relu for final probability between 0 and 1
model.add(Dense(1, activation='sigmoid'))
# compile the model, adam gradient descent (optimized)
model.compile(loss="binary_crossentropy",
optimizer="adam", metrics=['accuracy'])
# call the function to fit to the data (training the network)
model.fit(x_train, y_train, epochs=1000, batch_size=50,
validation_data=(x_validation, y_validation))
# evaluate the model
scores = model.evaluate(X, Y)
print("\n%s: %.2f%%" % (model.metrics_names[1], scores[1] * 100))
model.save('diabetes_risk_nn.h5')
MODEL 1 Accuracy = 94.14%, and similarly for the other 2 models:
MODEL 2 Accuracy = 93.62%
MODEL 3 Accuracy = 92.71%
Next, as @M.Innat suggested, we merged the models. Here we have done that using the above Models 1, 2, and 3, but the score is nowhere near ~90%. FINAL Combined Model:
import tensorflow as tf

# Define Model A
input_layer = Input(shape=(8,))
A2 = Dense(10, activation='relu')(input_layer)
A3 = Dense(50, activation='relu')(A2)
A4 = Dense(50, activation='relu')(A3)
A5 = Dense(50, activation='relu')(A4)
A6 = Dense(50, activation='relu')(A5)
A7 = Dense(50, activation='relu')(A6)
A8 = Dense(5, activation='relu')(A7)
model_a = Model(inputs=input_layer, outputs=A8, name="ModelA")
# Define Model B
input_layer = Input(shape=(8,))
B2 = Dense(10, activation='relu')(input_layer)
B3 = Dense(50, activation='relu')(B2)
B4 = Dense(40, activation='relu')(B3)
B5 = Dense(60, activation='relu')(B4)
B6 = Dense(30, activation='relu')(B5)
B7 = Dense(50, activation='relu')(B6)
B8 = Dense(50, activation='relu')(B7)
B9 = Dense(5, activation='relu')(B8)
model_b = Model(inputs=input_layer, outputs=B9, name="ModelB")
# Define Model C
input_layer = Input(shape=(8,))
C2 = Dense(10, activation='relu')(input_layer)
C3 = Dense(50, activation='relu')(C2)
C4 = Dense(40, activation='relu')(C3)
C5 = Dense(40, activation='relu')(C4)
C6 = Dense(70, activation='relu')(C5)
C7 = Dense(50, activation='relu')(C6)
C8 = Dense(50, activation='relu')(C7)
C9 = Dense(60, activation='relu')(C8)
C10 = Dense(50, activation='relu')(C9)
C11 = Dense(5, activation='relu')(C10)
model_c = Model(inputs=input_layer, outputs=C11, name="ModelC")
all_three_models = [model_a, model_b, model_c]
all_three_models_input = Input(shape=all_three_models[0].input_shape[1:])
And then combine these three.
models_output = [model(all_three_models_input) for model in all_three_models]
Concat = tf.keras.layers.concatenate(models_output, name="Concatenate")
final_out = Dense(1, activation='sigmoid')(Concat)
final_model = Model(inputs=all_three_models_input, outputs=final_out, name='Ensemble')
#tf.keras.utils.plot_model(final_model, expand_nested=True)
final_model.compile(loss="binary_crossentropy",
optimizer="adam", metrics=['accuracy'])
# call the function to fit to the data (training the network)
final_model.fit(x_train, y_train, epochs=1000, batch_size=50,
validation_data=(x_validation, y_validation))
# evaluate the model
scores = final_model.evaluate(x_validation,y_validation)
print("\n%s: %.2f%%" % (final_model.metrics_names[1], scores[1] * 100))
final_model.save('diabetes_risk_nn.h5')
But unlike the individual models, which gave ~90%, this combined final model gives an accuracy of around 70%.

I suppose the output layer is that Dense(1, activation='sigmoid'). The error comes from calling it on merged, which is a Model object rather than a tensor; apply it to the concatenated layer outputs instead. So try something like this:
# ...
merged = tf.keras.layers.concatenate([A3,B3,C3])
out = Dense(1, activation='sigmoid')(merged)
model = Model(input_layer, out)
model.fit(x_train, y_train, ...)

According to your code, there is only one model (not three). And judging by the output you tried to produce, I think you're looking for something like this:
DataSet
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
from sklearn.model_selection import train_test_split
import numpy
# random seed for reproducibility
numpy.random.seed(2)
# load the Pima Indians diabetes dataset (past 5 years of medical history)
dataset = numpy.loadtxt('https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv', delimiter=",")
# split into input (X) and output (Y) variables, splitting csv data
X = dataset[:, 0:8]
Y = dataset[:, 8]
x_train, x_validation, y_train, y_validation = train_test_split(X, Y, test_size=0.20, random_state=5)
Model
#create the input layer
input_layer = Input(shape=(8,))
A2 = Dense(8, activation='relu')(input_layer)
A3 = Dense(30, activation='relu')(A2)
B2 = Dense(40, activation='relu')(input_layer)
B3 = Dense(30, activation='relu')(B2)
C2 = Dense(50, activation='relu')(input_layer)
C3 = Dense(5, activation='relu')(C2)
merged = tf.keras.layers.concatenate([A3,B3,C3])
final_out = Dense(1, activation='sigmoid')(merged)
final_model = Model(inputs=[input_layer], outputs=final_out)
tf.keras.utils.plot_model(final_model)
Train
final_model.compile(loss="binary_crossentropy",
optimizer="adam", metrics=['accuracy'])
# call the function to fit to the data (training the network)
final_model.fit(x_train, y_train, epochs=5, batch_size=50,
validation_data=(x_validation, y_validation))
# evaluate the model
scores = final_model.evaluate(x_validation,y_validation)
print("\n%s: %.2f%%" % (final_model.metrics_names[1], scores[1] * 100))
Epoch 1/5
13/13 [==============================] - 1s 15ms/step - loss: 0.7084 - accuracy: 0.6803 - val_loss: 0.6771 - val_accuracy: 0.6883
Epoch 2/5
13/13 [==============================] - 0s 5ms/step - loss: 0.6491 - accuracy: 0.6600 - val_loss: 0.5985 - val_accuracy: 0.6623
Epoch 3/5
13/13 [==============================] - 0s 5ms/step - loss: 0.6161 - accuracy: 0.6813 - val_loss: 0.6805 - val_accuracy: 0.6883
Epoch 4/5
13/13 [==============================] - 0s 5ms/step - loss: 0.6335 - accuracy: 0.7003 - val_loss: 0.6115 - val_accuracy: 0.6623
Epoch 5/5
13/13 [==============================] - 0s 5ms/step - loss: 0.5684 - accuracy: 0.7285 - val_loss: 0.6150 - val_accuracy: 0.6883
5/5 [==============================] - 0s 2ms/step - loss: 0.6150 - accuracy: 0.6883
accuracy: 68.83%
Update
Based on this comment of yours:
Let me explain what I'm trying to do: first I create 3 DNN models separately, then I try to combine those models to get the features of all three; after that I want to classify using all the extracted features and then evaluate the accuracy. That's what I'm actually trying to develop.
create 3 models separately - OK, 3 models
combine them to get features - OK, feature extractors
classify - OK, average the models' output feature maps and pass them to the classifier - in other words, ensembling.
Let's do this. First, build three models separately.
# Define Model A
input_layer = Input(shape=(8,))
A2 = Dense(8, activation='relu')(input_layer)
A3 = Dense(30, activation='relu')(A2)
C3 = Dense(5, activation='relu')(A3)
model_a = Model(inputs=input_layer, outputs=C3, name="ModelA")
# Define Model B
input_layer = Input(shape=(8,))
A2 = Dense(8, activation='relu')(input_layer)
A3 = Dense(30, activation='relu')(A2)
C3 = Dense(5, activation='relu')(A3)
model_b = Model(inputs=input_layer, outputs=C3, name="ModelB")
# Define Model C
input_layer = Input(shape=(8,))
A2 = Dense(8, activation='relu')(input_layer)
A3 = Dense(30, activation='relu')(A2)
C3 = Dense(5, activation='relu')(A3)
model_c = Model(inputs=input_layer, outputs=C3, name="ModelC")
I used the same number of parameters for each; change them as you like. Anyway, these three models each act as a feature extractor (not a classifier). Next, we will combine their outputs by averaging them and then pass that to the classifier.
all_three_models = [model_a, model_b, model_c]
all_three_models_input = Input(shape=all_three_models[0].input_shape[1:])
models_output = [model(all_three_models_input) for model in all_three_models]
Avg = tf.keras.layers.average(models_output, name="Average")
final_out = Dense(1, activation='sigmoid')(Avg)
final_model = Model(inputs=all_three_models_input, outputs=final_out, name='Ensemble')
tf.keras.utils.plot_model(final_model, expand_nested=True)
Now, you can train the model and evaluate it on the test set. Hope this helps.
More info.
(1). You can add seeds for reproducibility.
import os, numpy
import tensorflow as tf
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Dense, Dropout
from sklearn.model_selection import train_test_split
# random seed for reproducibility
numpy.random.seed(101)
tf.random.set_seed(101)
os.environ['TF_CUDNN_DETERMINISTIC'] = '1'
dataset = ...  # your data
# split into input (X) and output (Y) variables, splitting csv data
X = dataset[:, 0:8]
Y = dataset[:, 8]
x_train, x_validation, y_train, y_validation = train_test_split(X, Y,
test_size=0.20, random_state=101)
(2). Try with the SGD optimizer. Also, use the ModelCheckpoint callback to save the highest validation accuracy.
final_model.compile(loss="binary_crossentropy",
optimizer="sgd", metrics=['accuracy'])
model_save = tf.keras.callbacks.ModelCheckpoint(
'merge_best.h5',
monitor="val_accuracy",
verbose=0,
save_best_only=True,
save_weights_only=True,
mode="max",
save_freq="epoch"
)
# call the function to fit to the data (training the network)
final_model.fit(x_train, y_train, epochs=1000, batch_size=256, callbacks=[model_save],
validation_data=(x_validation, y_validation))
Evaluate on the test set.
# evaluate the model
final_model.load_weights('merge_best.h5')
scores = final_model.evaluate(x_validation,y_validation)
print("\n%s: %.2f%%" % (final_model.metrics_names[1], scores[1] * 100))
5/5 [==============================] - 0s 4ms/step - loss: 0.6543 - accuracy: 0.7662
accuracy: 76.62%
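(3). One more note (my addition, not part of the original answer): the single models were scored with model.evaluate(X, Y), i.e. on the full dataset including the training split, while the combined model was scored on the validation split only, so the ~94% and ~70% numbers are not directly comparable. Also, the combined model rebuilds model_a, model_b, and model_c with freshly initialized weights, so the separately trained models never enter the ensemble. A minimal sketch of reusing already-trained models as frozen feature extractors (the .h5 file names are hypothetical, and layer names may need deduplication if the models were saved from the same script):
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model, load_model

# Load the three already-trained models (hypothetical file names).
trained = [load_model(p) for p in ('model_a.h5', 'model_b.h5', 'model_c.h5')]

extractors = []
for i, m in enumerate(trained):
    # Drop the final sigmoid layer so each model acts as a feature extractor,
    # and freeze it so only the new classifier head gets trained.
    ext = Model(m.input, m.layers[-2].output, name='extractor_%d' % i)
    ext.trainable = False
    extractors.append(ext)

ensemble_input = Input(shape=(8,))
features = tf.keras.layers.concatenate([e(ensemble_input) for e in extractors])
ensemble_out = Dense(1, activation='sigmoid')(features)
ensemble = Model(ensemble_input, ensemble_out, name='Ensemble')
ensemble.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])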

Related

Training GCN regression model but getting bad accuracy and prediction results

I am trying to build a model that can read my graph data and use the node features with the weighted adjacency data to predict specific targets.
I started with 21 sample nodes, each having a set of 16801 features, along with the indices determining the training, validation, and test nodes during training, and the adjacency matrix determining the corresponding weighted edge values.
x_features #shape=(1, 21, 16801) dtype=float32
x_indices #shape=(1, None) dtype=int32
x_adjacency #shape=(1, 21, 21) dtype=float32
The prediction targets are saved in separate target lists:
y_train = np.expand_dims(train_targets, 0).astype(np.float32)
y_val = np.expand_dims(val_targets, 0).astype(np.float32)
y_test = np.expand_dims(test_targets, 0).astype(np.float32)
y_train #array([[[32.],[31.],[27.],[29.],[28.],[35.],[35.],[27.],[33.],[26.]]], dtype=float32)
The model is like the following:
x = Dropout(0.5)(x_features)
x = GraphConvolution(32, activation='relu',
use_bias=True,
kernel_initializer=kernel_initializer,
bias_initializer=bias_initializer)([x, x_adjacency])
x = Dropout(0.5)(x)
x = GraphConvolution(16, activation='relu',
use_bias=True,
kernel_initializer=kernel_initializer,
bias_initializer=bias_initializer)([x, x_adjacency])
x = GatherIndices(batch_dims=1)([x, x_indices])
output = Dense(1, activation='linear')(x)
model = Model(inputs=[x_features, x_indices, x_adjacency], outputs=output)
model.summary()
The model summary
model.compile(
optimizer=SGD(learning_rate=0.1, momentum=0.9),
loss='mean_squared_error',
metrics=["acc"],
)
history = model.fit(
x = [features_input, train_indices, A_input], #features_input.shape:(1, 21, 16801). train_indices.shape:(1,10). A_input.shape:(1, 21, 21)
y = y_train, #y_train.shape:(1, 10, 1)
batch_size = 32,
epochs=200,
validation_data=([features_input, val_indices, A_input], y_val),
verbose=1,
shuffle=False,
)
I reach the last epoch with:
Epoch 200/200
1/1 [==============================] - 0s 31ms/step - loss: nan - acc: 0.0000e+00 - val_loss: nan - val_acc: 0.0000e+00
test_preds = model.predict([features_input, test_indices, A_input])
print('test_preds:\n' , test_preds,'\n\n y_test:\n', y_test)
outputs:
test_preds: [ [ [nan][nan][nan][nan][nan][nan] ] ]
y_test: [ [ [28.][32.][30.][34.][32.][35.] ] ]

Keras neural network takes only few samples to train

data = np.random.random((10000, 150))
labels = np.random.randint(10, size=(10000, 1))
labels = to_categorical(labels, num_classes=10)
model = Sequential()
model.add(Dense(units=32, activation='relu', input_shape=(150,)))
model.add(Dense(units=10, activation='softmax'))
model.compile(optimizer='rmsprop',
loss='categorical_crossentropy',
metrics=['accuracy'])
model.fit(data, labels, epochs=30, validation_split=0.2)
I created 10000 random samples to train my net, but it uses only a few of them (250/10000).
Example of the 1st epoch:
Epoch 1/30
250/250 [==============================] - 0s 2ms/step - loss: 2.1110 - accuracy: 0.2389 - val_loss: 2.2142 - val_accuracy: 0.1800
Your data is split into training and validation subsets (validation_split=0.2).
The training subset has 8000 samples and the validation subset 2000.
Training goes in batches, and each batch has 32 samples by default.
So one epoch takes 8000/32 = 250 batches, exactly as the progress bar shows.
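To see the effect of the batch size on that step count, a quick sketch (my addition, reusing the model, data, and labels defined in the question):
# 8000 training samples / batch size 16 -> 500 steps per epoch instead of 250
model.fit(data, labels, epochs=30, batch_size=16, validation_split=0.2)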
Try code like the following example:
import keras
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(32, activation='relu', input_dim=100))
model.add(Dense(10, activation='softmax'))
model.compile(optimizer='rmsprop',
loss='categorical_crossentropy',
metrics=['accuracy'])
# Generate dummy data
import numpy as np
data = np.random.random((1000, 100))
labels = np.random.randint(10, size=(1000, 1))
# Convert labels to categorical one-hot encoding
one_hot_labels = keras.utils.to_categorical(labels, num_classes=10)
# Train the model, iterating on the data in batches of 32 samples
model.fit(data, one_hot_labels, epochs=10, batch_size=32)

Keras Batchnormalization and sample weights

I am trying the training and evaluation example on the TensorFlow website.
Specifically, this part:
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train.reshape(60000, 784).astype('float32') / 255
x_test = x_test.reshape(10000, 784).astype('float32') / 255
y_train = y_train.astype('float32')
y_test = y_test.astype('float32')
def get_uncompiled_model():
inputs = keras.Input(shape=(784,), name='digits')
x = layers.Dense(64, activation='relu', name='dense_1')(inputs)
x = layers.BatchNormalization()(x)
x = layers.Dense(64, activation='relu', name='dense_2')(x)
outputs = layers.Dense(10, activation='softmax', name='predictions')(x)
model = keras.Model(inputs=inputs, outputs=outputs)
return model
def get_compiled_model():
model = get_uncompiled_model()
model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1e-3),
loss='sparse_categorical_crossentropy',
metrics=['sparse_categorical_accuracy'])
return model
sample_weight = np.ones(shape=(len(y_train),))
sample_weight[y_train == 5] = 2.
# Create a Dataset that includes sample weights
# (3rd element in the return tuple).
train_dataset = tf.data.Dataset.from_tensor_slices(
(x_train, y_train, sample_weight))
# Shuffle and slice the dataset.
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(64)
model = get_compiled_model()
model.fit(train_dataset, epochs=3)
It appears that if I add the batch normalization layer (this line: x = layers.BatchNormalization()(x)) I get the following error:
InvalidArgumentError: The second input must be a scalar, but it has shape [64]
[[{{node batch_normalization_2/cond/ReadVariableOp/Switch}}]]
Any ideas?
The same code works for me.
The only lines I changed are:
model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1e-3), ...)
to
model.compile(optimizer=keras.optimizers.RMSprop(lr=1e-3), ...)
(which is version specific)
Then
model.fit(train_dataset, epochs=3) to model.fit(train_dataset, epochs=3, steps_per_epoch=30)
Reason: when using iterators as input to a model, you should specify the steps_per_epoch argument.
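As a rough sketch (my addition), steps_per_epoch can also be derived from the dataset and batch sizes rather than hard-coded:
# 60000 training samples in batches of 64 -> 937 full batches per epoch
model.fit(train_dataset, epochs=3, steps_per_epoch=len(x_train) // 64)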
If you just want to use sample weights, you don't have to use tf.data.Dataset, you can simply run:
model.fit(x=x_train, y=y_train, sample_weight=sample_weight, batch_size=64, epochs=3)
and it works for me (when I change learning_rate to lr, as @ASHu2 mentioned).
It gets 97% accuracy after 3 epochs:
...
57408/60000 [===========================>..] - ETA: 0s - loss: 0.1010 - sparse_categorical_accuracy: 0.9709
58816/60000 [============================>.] - ETA: 0s - loss: 0.1011 - sparse_categorical_accuracy: 0.9708
60000/60000 [==============================] - 2s 37us/sample - loss: 0.1007 - sparse_categorical_accuracy: 0.9709
I used TF 1.14.0 on windows.
The problem was solved when I updated tensorflow from version 1.14.1 to 2.0.0-rc1.

How to fix a constant validation accuracy in machine learning?

I'm trying to do image classification on DICOM images with balanced classes, using the pre-trained InceptionV3 model.
def convertDCM(PathDCM) :
data = []
for dirName, subdir, files in os.walk(PathDCM):
for filename in sorted(files):
ds = pydicom.dcmread(PathDCM +'/' + filename)
im = fromarray(ds.pixel_array)
im = keras.preprocessing.image.img_to_array(im)
im = cv2.resize(im,(299,299))
data.append(im)
return data
PathDCM = '/home/Desktop/FULL_BALANCED_COLOURED/'
data = convertDCM(PathDCM)
#scale the raw pixel intensities to the range [0,1]
data = np.array(data, dtype="float")/255.0
labels = np.array(labels,dtype ="int")
#splitting data into training and testing
#test_size is percentage to split into test/train data
(trainX, testX, trainY, testY) = train_test_split(
data,labels,
test_size=0.2,
random_state=42)
img_width, img_height = 299, 299 #InceptionV3 size
train_samples = 300
validation_samples = 50
epochs = 25
batch_size = 15
base_model = keras.applications.InceptionV3(
weights ='imagenet',
include_top=False,
input_shape = (img_width,img_height,3))
model_top = keras.models.Sequential()
model_top.add(keras.layers.GlobalAveragePooling2D(input_shape=base_model.output_shape[1:], data_format=None))
model_top.add(keras.layers.Dense(300,activation='relu'))
model_top.add(keras.layers.Dropout(0.5))
model_top.add(keras.layers.Dense(1, activation = 'sigmoid'))
model = keras.models.Model(inputs = base_model.input, outputs = model_top(base_model.output))
#Compiling model
model.compile(optimizer = keras.optimizers.Adam(
lr=0.0001),
loss='binary_crossentropy',
metrics=['accuracy'])
#Image Processing and Augmentation
train_datagen = keras.preprocessing.image.ImageDataGenerator(
rescale = 1./255,
zoom_range = 0.1,
width_shift_range = 0.2,
height_shift_range = 0.2,
horizontal_flip = True,
fill_mode ='nearest')
val_datagen = keras.preprocessing.image.ImageDataGenerator()
train_generator = train_datagen.flow(
trainX,
trainY,
batch_size=batch_size,
shuffle=True)
validation_generator = train_datagen.flow(
testX,
testY,
batch_size=batch_size,
shuffle=True)
When I train the model, I always get a constant validation accuracy of 0.3889 with the validation loss fluctuating.
#Training the model
history = model.fit_generator(
train_generator,
steps_per_epoch = train_samples//batch_size,
epochs = epochs,
validation_data = validation_generator,
validation_steps = validation_samples//batch_size)
Epoch 1/25
20/20 [==============================] - 195s 49s/step - loss: 0.7677 - acc: 0.4020 - val_loss: 0.7784 - val_acc: 0.3889
Epoch 2/25
20/20 [==============================] - 187s 47s/step - loss: 0.7016 - acc: 0.4848 - val_loss: 0.7531 - val_acc: 0.3889
Epoch 3/25
20/20 [==============================] - 191s 48s/step - loss: 0.6566 - acc: 0.6304 - val_loss: 0.7492 - val_acc: 0.3889
Epoch 4/25
20/20 [==============================] - 175s 44s/step - loss: 0.6533 - acc: 0.5529 - val_loss: 0.7575 - val_acc: 0.3889
predictions= model.predict(testX)
print(predictions)
Predicting with the model also returns only an array with one prediction per image:
[[0.457804 ]
[0.45051473]
[0.48343503]
[0.49180537]...
Why is it that the model only predicts one of the two classes? Does this have to do with the constant val accuracy or possibly overfitting?
If you have two classes, every image is in one or the other, so the probabilities for one class are enough to recover everything, because the probabilities for each image are supposed to sum to 1. So if you have probability p for one class, the probability for the other one is 1-p.
If you want the possibility of classifying images into neither of those two classes, then you should create a third one.
Also, this line:
model_top.add(keras.layers.Dense(1, activation = 'sigmoid'))
means that the output is a vector of shape (nb_samples, 1), the same shape as your training labels.
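To turn those sigmoid outputs into hard class labels, a minimal sketch (my addition; 0.5 is the conventional threshold):
import numpy as np
probs = model.predict(testX)              # shape (n_samples, 1): P(class 1)
pred_classes = (probs > 0.5).astype(int)  # 1 if P(class 1) > 0.5, else 0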

Text classification with LSTM Network and Keras

I'm currently using a Naive Bayes algorithm to do my text classification.
My end goal is to be able to highlight parts of a big text document if the algorithm has decided the sentence belonged to a category.
Naive Bayes results are good, but I would like to train a NN for this problem, so I've followed this tutorial:
http://machinelearningmastery.com/sequence-classification-lstm-recurrent-neural-networks-python-keras/ to build my LSTM network on Keras.
All these notions are quite difficult for me to understand right now, so excuse me if you see some really stupid things in my code.
1/ Preparation of the training data
I have 155 sentences of different sizes that have been tagged with a label.
All these tagged sentences are in a training.csv file:
8,9,1,2,3,4,5,6,7
16,15,4,6,10,11,12,13,14
17,18
22,19,20,21
24,20,21,23
(each integer representing a word)
And all the results are in another labels.csv file:
6,7,17,15,16,18,4,27,30,30,29,14,16,20,21 ...
I have 155 lines in training.csv and, of course, 155 integers in labels.csv.
My dictionary has 1038 words.
2/ The code
Here is my current code:
import csv
import numpy
from keras.models import Sequential
from keras.layers import Dense, Embedding, LSTM
from keras.preprocessing import sequence

total_words = 1039
## fix random seed for reproducibility
numpy.random.seed(7)
datafile = open('training.csv', 'r')
datareader = csv.reader(datafile)
data = []
for row in datareader:
data.append(row)
X = data
Y = numpy.genfromtxt("labels.csv", dtype="int", delimiter=",")
max_sentence_length = 500
X_train = sequence.pad_sequences(X, maxlen=max_sentence_length)
X_test = sequence.pad_sequences(X, maxlen=max_sentence_length)
# create the model
embedding_vecor_length = 32
model = Sequential()
model.add(Embedding(total_words, embedding_vecor_length, input_length=max_sentence_length))
model.add(LSTM(100, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())
model.fit(X_train, Y, epochs=3, batch_size=64)
# Final evaluation of the model
scores = model.evaluate(X_train, Y, verbose=0)
print("Accuracy: %.2f%%" % (scores[1]*100))
This model never converges:
155/155 [==============================] - 4s - loss: 0.5694 - acc: 0.0000e+00
Epoch 2/3
155/155 [==============================] - 3s - loss: -0.2561 - acc: 0.0000e+00
Epoch 3/3
155/155 [==============================] - 3s - loss: -1.7268 - acc: 0.0000e+00
I would like to have one of the 24 labels as a result, or a list of probabilities for each label.
What am I doing wrong here?
Thanks for your help!
I've updated my code thanks to the great comments posted to my question.
from keras.utils import np_utils

Y_train = numpy.genfromtxt("labels.csv", dtype="int", delimiter=",")
Y_test = numpy.genfromtxt("labels_test.csv", dtype="int", delimiter=",")
Y_train = np_utils.to_categorical(Y_train)
Y_test = np_utils.to_categorical(Y_test)
max_review_length = 50
X_train = sequence.pad_sequences(X_train, maxlen=max_review_length)
X_test = sequence.pad_sequences(X_test, maxlen=max_review_length)
model = Sequential()
model.add(Embedding(top_words, 32, input_length=max_review_length))
model.add(LSTM(10, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(31, activation="softmax"))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=["accuracy"])
model.fit(X_train, Y_train, epochs=100, batch_size=30)
I think I can play with LSTM size (10 or 100), number of epochs and batch size.
The model has very poor accuracy (40%). But currently I think it's because I don't have enough data (150 sentences for 24 labels).
I will put this project in standby mode until I get more data.
If someone has some ideas to improve this code, feel free to comment!
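As a small follow-up sketch (my addition, reusing the model and padded X_test from the updated code): with the softmax output above, model.predict gives the per-label probabilities the question asked for, and argmax picks the single winning label:
probs = model.predict(X_test)               # shape (n_samples, 31): one probability per label
pred_labels = numpy.argmax(probs, axis=1)   # most likely label index per sentence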
