I'm new to machine learning and neural networks, and I have a problem with text classification. I use an LSTM architecture built with the Keras library.
My model reaches about 97% accuracy every time. The dataset has roughly 1 million records, of which about 600k are positive and 400k are negative.
There are 2 labeled classes: 0 (negative) and 1 (positive). The dataset is split into training and test sets in an 80:20 ratio. For the network input, I use Word2Vec embeddings trained on PubMed articles.
My network architecture:
model = Sequential()
model.add(emb_layer)
model.add(LSTM(64, dropout=0.5))
model.add(Dense(2))
model.add(Activation('softmax'))
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=50, batch_size=32)
How can I fix (improve) my model for this kind of text classification?
The problem you are dealing with here is called overfitting.
First of all, make sure your input data is properly cleaned. One of the principles of machine learning is "Garbage In, Garbage Out". Next, you should balance your dataset, for example down to 400k positive and 400k negative records (see the balancing sketch after the split example below). Then the data should be divided into training, validation, and test sets (60%:20%:20%), for example using the scikit-learn library, as in the following example:
from sklearn.model_selection import train_test_split

# hold out 20% of the data as the test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
# split the remaining 80% into 60% train / 20% validation (0.25 * 0.8 = 0.2)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25)
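For the balancing step mentioned above, one option is to downsample the majority class before splitting. This is only a minimal sketch under the assumption that the data lives in a pandas DataFrame with a label column (the names df and label are illustrative, not from the original post):

import pandas as pd
from sklearn.utils import resample

# df is assumed to be a DataFrame with a 'label' column (1 = positive, 0 = negative)
positive = df[df['label'] == 1]
negative = df[df['label'] == 0]

# downsample the larger (positive) class to the size of the smaller one
positive_down = resample(positive, replace=False, n_samples=len(negative), random_state=42)

# combine and shuffle the balanced dataset
balanced = pd.concat([positive_down, negative]).sample(frac=1, random_state=42)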
Then I would use a different neural network architecture and try to optimize the parameters.
Personally, I would suggest using a 2-layer LSTM network or a combination of a convolutional and a recurrent neural network (it trains faster, and from the articles I have read it gives better results).
1) 2-layer LSTM:
model = Sequential()
model.add(emb_layer)
model.add(LSTM(64, dropout=0.5, recurrent_dropout=0.5, return_sequences=True))
model.add(LSTM(64, dropout=0.5, recurrent_dropout=0.5))
model.add(Dense(2))
model.add(Activation('sigmoid'))
You can try using 2 layers with 64 hidden units each and adding the recurrent_dropout parameter.
The main reason for using the sigmoid function is that its output lies between 0 and 1, so it is especially suited to models where we have to predict a probability as the output. Since probabilities only exist in the range 0 to 1, sigmoid is the right choice.
2) CNN + LSTM
model = Sequential()
model.add(emb_layer)
model.add(Convolution1D(32, 3, padding='same'))
model.add(Activation('relu'))
model.add(MaxPool1D(pool_size=2))
model.add(Dropout(0.5))
model.add(LSTM(32, dropout=0.5, recurrent_dropout=0.5, return_sequences=True))
model.add(LSTM(64, dropout=0.5, recurrent_dropout=0.5))
model.add(Dense(2))
model.add(Activation('sigmoid'))
You can try using a combination of a CNN and an RNN. With this architecture, the model learns faster (up to 5 times faster).
Then, in both cases, you need to choose an optimizer and a loss function.
A good optimizer for both cases is the "Adam" optimizer.
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
In the last step, we validate our network on the validation set.
In addition, we use a callback that stops the training process when, for example, the monitored metric (validation loss by default) does not improve for 3 consecutive epochs:
from keras.callbacks import EarlyStopping
early_stopping = EarlyStopping(patience=3)
model.fit(X_train, y_train, epochs=100, batch_size=32, validation_data=(X_val, y_val), callbacks=[early_stopping])
We can also monitor overfitting using plots of the training history. If you want to see how to do it, check here.
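For illustration, a minimal sketch of such a plot, assuming the History object returned by model.fit is stored in a variable called history:

import matplotlib.pyplot as plt

# history = model.fit(...) returns a History object with per-epoch metrics
# note: older Keras versions use the keys 'acc' / 'val_acc' instead of 'accuracy' / 'val_accuracy'
plt.plot(history.history['accuracy'], label='train accuracy')
plt.plot(history.history['val_accuracy'], label='validation accuracy')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()
plt.show()

A widening gap between the training and validation curves is the typical sign of overfitting.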
If you need further help, let me know in a comment.
Related
I want to design three models that have the same structure, but in the end one of them should show serious overfitting, another less overfitting, and the last one no overfitting.
The idea is that I want to see how much information exists in the last layer of each model for some test data. Let's say I'm using the MNIST dataset as the training and test set, and the structure of all models should be like this:
# Network architecture
network = Sequential()
# input layer
network.add(Dense(512, activation='relu', input_shape=(28*28,) ))
# Hidden layers
network.add(Dense(64, activation='relu', name='features'))
# Output layer
network.add(Dense(10, activation='softmax'))
network.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
#train the model
history = network.fit(train_img, train_label, epochs=50, batch_size=256, validation_split=0.2)
So now the question is how to change this training setup so that it fulfills my needs for three models with different degrees of overfitting.
I'm new to machine learning topics and I hope I have explained my question as well as possible.
Thanks in advance
Overfit:
The MNIST dataset is rather simple, therefore it should be easy to overfit with the model you are suggesting. Increase the number of epochs: eventually, your model will memorize the training data very well. If you struggle to overfit the data, you might need a more complex network - but I doubt that this will be the case.
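As a minimal sketch of that idea (the specific choices below, such as 200 epochs, are illustrative assumptions, not requirements):

from keras.models import Sequential
from keras.layers import Dense

# same structure as in the question, trained long enough to memorize the training data
overfit_net = Sequential()
overfit_net.add(Dense(512, activation='relu', input_shape=(28*28,)))
overfit_net.add(Dense(64, activation='relu', name='features'))
overfit_net.add(Dense(10, activation='softmax'))
overfit_net.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# many epochs, no early stopping: training accuracy keeps rising while validation accuracy stalls
history_overfit = overfit_net.fit(train_img, train_label, epochs=200, batch_size=256, validation_split=0.2)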
Just right:
Probably the easiest way to obtain a model that is just right (neither overfit nor underfit) is to use a callback. Specifically, we can use early stopping. The callback will stop training if the validation loss stops improving. For your code, all you have to do is modify the training as follows:
First define a callback
callback_es = tf.keras.callbacks.EarlyStopping(monitor = 'val_loss')
Add the callback to your training
history = network.fit(train_img, train_label, epochs=50, batch_size=256, validation_split=0.2, callbacks=[callback_es])
Underfit
Similar idea as with overfitting. In this case, you want to stop your training early on, so train your model for a limited number of epochs only. If you find that your model still overfits too quickly, try lowering the learning rate.
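A minimal sketch of such an underfit variant (the low learning rate and the 2 epochs are illustrative assumptions):

from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam

underfit_net = Sequential()
underfit_net.add(Dense(512, activation='relu', input_shape=(28*28,)))
underfit_net.add(Dense(64, activation='relu', name='features'))
underfit_net.add(Dense(10, activation='softmax'))

# low learning rate ('learning_rate' in newer Keras) plus very few epochs:
# training stops well before the model fits the training data
underfit_net.compile(optimizer=Adam(lr=1e-4), loss='categorical_crossentropy', metrics=['accuracy'])
history_underfit = underfit_net.fit(train_img, train_label, epochs=2, batch_size=256, validation_split=0.2)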
I would like to know if my code is doing what I want it to do. To give you some background, I'm implementing a CNN for image classification, and I'm trying to use cross-validation to compare my different neural network architectures.
Here is the code:
def create_model():
    model = Sequential()
    model.add(Conv2D(24, kernel_size=3, padding='same', activation='relu',
                     input_shape=(96,96,1)))
    model.add(MaxPool2D())
    model.add(Conv2D(48, kernel_size=3, padding='same', activation='relu'))
    model.add(MaxPool2D())
    model.add(Conv2D(64, kernel_size=3, padding='same', activation='relu'))
    model.add(MaxPool2D())
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dense(256, activation='relu'))
    model.add(Dense(12, activation='softmax'))
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    return model
model = KerasClassifier(build_fn=create_model, epochs=5, batch_size=20, verbose=1)
# 3-Fold Crossvalidation
kfold = KFold(n_splits=3, shuffle=True, random_state=2019)
results = cross_val_score(model, train_X, train_Y_one_hot, cv=kfold)
model.fit(train_X, train_Y_one_hot,validation_data=(valid_X, valid_label),class_weight=class_weights)
y_pred = model.predict(test_X)
test_eval = model.evaluate(test_X, y_pred, verbose=0)
I found the cross-validation part on the internet, but I have some problems understanding it.
My questions: 1 => Can I use cross-validation to improve my accuracy? For example, could I run my neural network 10 times and keep the weights from the run where the best accuracy occurred?
2 => If I understand correctly, in the code above, results runs my CNN 3 times and shows me the accuracy. But when I use model.fit, the model is run only once; am I right?
Thanks for your help
Not really; cross-validation is more a way to prevent overfitting and to avoid being misled by abnormal results coming from a badly split dataset, i.e. to get a relevant estimate of your model's performance. If you want to tune the hyperparameters of your model, you should rather use sklearn.model_selection.GridSearchCV or sklearn.model_selection.RandomizedSearchCV.
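As an illustration of that suggestion, a minimal sketch of GridSearchCV wrapped around the question's create_model function (the searched values for batch_size and epochs are just example assumptions):

from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV

clf = KerasClassifier(build_fn=create_model, verbose=0)

# example search space; extend it with any hyperparameters create_model accepts
param_grid = {'batch_size': [20, 40], 'epochs': [5, 10]}

grid = GridSearchCV(estimator=clf, param_grid=param_grid, cv=3)
grid_result = grid.fit(train_X, train_Y_one_hot)
print(grid_result.best_score_, grid_result.best_params_)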
As for your second question: when doing cross_val_score, for each train/test split sklearn does a fit and then a predict/evaluate, so for each new instance of the model you have 1 fit followed by 1 predict/evaluate. Otherwise your cross-validation would not be valid, because it would depend on fitting on a previous split of the dataset (and maybe on test data!).
There are two key terms here that you should get familiarized with:
Hyperparameters
Parameters
Hyperparameters control the general architecture of a model. These are what the programmer or data scientist controls. In the case of a CNN, this refers to the number of layers, their configuration, activations, optimizers, etc. For a simple polynomial regression model, this would be the degree of the polynomial.
Parameters refer to the actual values of weights or coefficients that the model ends up with after it solves the optimization using gradient descent or whatever method you use. In a CNN this would be the weights matrix for each layer. For a polynomial regression this would be the coefficients and bias.
Cross validation is used to find the best set of hyperparameters. The best set of parameters are obtained by the optimizer (gradient descent, adam etc) for a given set of hyperparameters and data.
To answer your questions:
You would run cross validation several times, each time with a different hyperparameter configuration (network architecture). That's the only thing you can control. At the end you pick the best architecture based on accuracy. The weights of the model would be different for each fold but finding the best weights is the optimizer's job, not yours.
Yes. In 3-fold CV, the model is trained 3 times and evaluated 3 times. When you then call model.fit, the model is trained only once more, on the training data you pass it.
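To make the first point concrete, a rough sketch of comparing two candidate architectures with cross-validation and picking the better one (create_model_a and create_model_b are hypothetical builder functions, one per architecture):

import numpy as np
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import KFold, cross_val_score

kfold = KFold(n_splits=3, shuffle=True, random_state=2019)
scores = {}

for name, builder in [('arch_a', create_model_a), ('arch_b', create_model_b)]:
    clf = KerasClassifier(build_fn=builder, epochs=5, batch_size=20, verbose=0)
    # mean accuracy over the 3 folds for this architecture
    scores[name] = np.mean(cross_val_score(clf, train_X, train_Y_one_hot, cv=kfold))

print(scores, 'best architecture:', max(scores, key=scores.get))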
I created a neural network in Python that predicts my time series very well.
My issue is that I want to create a neural network that can predict multiple time series at the same time.
Is this possible, and how would I go about it?
This is the code to build the NN for a single time series:
nn_model = Sequential()
nn_model.add(Dense(12, input_dim=1, activation='relu'))
nn_model.add(Dense(1))
nn_model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mse', 'mae'])
early_stop = EarlyStopping(monitor='loss', patience=2, verbose=1)
history = nn_model.fit(X_train, y_train, epochs=100, batch_size=1, verbose=1, callbacks=[early_stop], shuffle=False)
Any ideas about how to convert this to run for multiple time series?
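One possible direction (a hedged sketch, not from the original post) is to treat this as multi-output regression: keep the same architecture but give the final Dense layer one output unit per series, and shape y_train as (samples, n_series):

from keras.models import Sequential
from keras.layers import Dense
from keras.callbacks import EarlyStopping

n_series = 3  # hypothetical number of time series predicted jointly

multi_model = Sequential()
multi_model.add(Dense(12, input_dim=n_series, activation='relu'))
multi_model.add(Dense(n_series))  # one output unit per series
multi_model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mse', 'mae'])

early_stop = EarlyStopping(monitor='loss', patience=2, verbose=1)
# X_train and y_train are assumed here to have shape (samples, n_series)
history = multi_model.fit(X_train, y_train, epochs=100, batch_size=1, verbose=1, callbacks=[early_stop], shuffle=False)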
I wrote simple code to learn Keras:
from tensorflow import keras
def main():
    (x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()

    model = keras.Sequential()
    model.add(keras.layers.Conv2D(16, 3, padding='same', activation='relu'))
    model.add(keras.layers.Flatten())
    model.add(keras.layers.Dense(10, activation='softmax'))
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    model.fit(x_train, y_train, epochs=4)
    model.summary()

if __name__ == '__main__':
    main()
But it seems not to learn anything. It shouldn't learn much with this setup, but it should at least decrease the loss and increase the accuracy a little. Instead, both are stuck at the same values every epoch.
I had the exact same model written in PyTorch and it achieved around 35% accuracy. This one in TensorFlow + Keras is stuck at 10%.
tensorflow-gpu v1.9
What am I missing?
I think the default learning rate is too high for this problem. Try something like:
opt=keras.optimizers.Adam(lr=1.e-5)
model.compile(optimizer=opt, loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
I checked the default learning rate used by Adam in both Keras and PyTorch, and they both use 1e-3, so the learning rate should not be the issue, assuming you use the defaults in both models.
Alternatively, I think this is related to weight initialization, which is handled explicitly by each layer in Keras but not in PyTorch.
Simply changing the training line to the following,
model.fit(x_train/255., y_train, shuffle=True,
validation_data=(x_test/255., y_test), epochs=4)
you should observe both training and validation accuracy reach around 60%.
I am not familiar with PyTorch, but I suggest you initialize the weights in the keras network with those used by the PyTorch network. In this way, you will have a fair comparison.
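A rough sketch of that idea, assuming the Keras model has already been built and the PyTorch layers are available as torch_conv and so on (hypothetical names); note that Keras stores Conv2D kernels as (kH, kW, in, out) while PyTorch uses (out, in, kH, kW):

import numpy as np

# copy the convolutional weights from the (hypothetical) PyTorch layer into the Keras Conv2D
conv_w = np.transpose(torch_conv.weight.detach().numpy(), (2, 3, 1, 0))  # reorder to the Keras layout
conv_b = torch_conv.bias.detach().numpy()
model.layers[0].set_weights([conv_w, conv_b])

# Dense layers can be copied similarly (transpose the weight matrix), but keep in mind that
# Flatten orders features channels-last in Keras and channels-first in PyTorch, so the dense
# weights also need their input features permuted for an exact match.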
I am currently trying to train a regression network using Keras. To ensure proper training, I want to use cross-validation.
The problem is that Keras doesn't seem to have any functions supporting cross-validation, or does it?
The only solution I seem to have found is to use scikit-learn's train_test_split and run model.fit manually for each of the k folds. Isn't there already an integrated solution for this, rather than doing it manually?
Nope... that seems to be the solution (as far as I know).
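For reference, a minimal sketch of that manual approach, using scikit-learn's KFold to generate the splits and calling model.fit once per fold (build_model is a hypothetical function that returns a freshly compiled Keras regression model; X and Y are NumPy arrays):

import numpy as np
from sklearn.model_selection import KFold

kfold = KFold(n_splits=5, shuffle=True, random_state=42)
fold_scores = []

for train_idx, val_idx in kfold.split(X):
    model = build_model()  # re-create the model so each fold starts from scratch
    model.fit(X[train_idx], Y[train_idx], epochs=100, batch_size=5, verbose=0)
    # evaluate returns the loss (mean squared error here) on the held-out fold
    fold_scores.append(model.evaluate(X[val_idx], Y[val_idx], verbose=0))

print('mean CV loss:', np.mean(fold_scores))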
There is a scikit learn wrapper for Keras that will help you do this easily: https://keras.io/scikit-learn-api/
I recommend reading Dr. Jason Brownlee's example: https://machinelearningmastery.com/regression-tutorial-keras-deep-learning-library-python/
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.model_selection import KFold, cross_val_score

def baseline_model():
    # create model
    model = Sequential()
    model.add(Dense(13, input_dim=13, kernel_initializer='normal', activation='relu'))
    model.add(Dense(1, kernel_initializer='normal'))
    # compile model
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model

estimator = KerasRegressor(build_fn=baseline_model, epochs=100, batch_size=5, verbose=0)
kfold = KFold(n_splits=10, shuffle=True, random_state=seed)
results = cross_val_score(estimator, X, Y, cv=kfold)