Keras simplest conv network not learning anything - python

I wrote some simple code to learn Keras:
from tensorflow import keras

def main():
    (x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()

    model = keras.Sequential()
    model.add(keras.layers.Conv2D(16, 3, padding='same', activation='relu'))
    model.add(keras.layers.Flatten())
    model.add(keras.layers.Dense(10, activation='softmax'))

    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    model.fit(x_train, y_train, epochs=4)
    model.summary()

if __name__ == '__main__':
    main()
But it seems not to learn anything. Not that it should learn much, but it should at least decrease the loss and increase the accuracy a little. Instead, both are stuck at the same values every epoch.
I had the exact same model written in PyTorch and it achieved around 35% accuracy. This TensorFlow + Keras version is stuck at 10%.
tensorflow-gpu v1.9
What am I missing?

I think the default learning rate is too high for this problem. Try something like
opt = keras.optimizers.Adam(lr=1.e-5)
model.compile(optimizer=opt, loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

I checked the default learning rate used by Adam in both Keras and PyTorch, and they both use 1e-3. Therefore, the learning rate should not be the issue, assuming you use the default in both models.
Alternatively, I think this is related to the weight initialization, which is explicitly handled by each layer in Keras but not in PyTorch.
Simply by changing the training line to the following (note the /255. input scaling),
model.fit(x_train / 255., y_train, shuffle=True,
          validation_data=(x_test / 255., y_test), epochs=4)
you should observe both training and validation accuracy reach around 60%.
I am not familiar with PyTorch, but I suggest you initialize the weights in the Keras network with those used by the PyTorch network. That way, you will have a fair comparison.
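For reference, here is a minimal sketch of the full corrected script with the input scaling applied up front. The layer stack and hyperparameters are taken from the question; the explicit float32 cast and the input_shape argument are my own additions:
from tensorflow import keras

def main():
    (x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()

    # Scale pixel values from [0, 255] to [0, 1] before training.
    x_train = x_train.astype('float32') / 255.
    x_test = x_test.astype('float32') / 255.

    model = keras.Sequential([
        keras.layers.Conv2D(16, 3, padding='same', activation='relu',
                            input_shape=x_train.shape[1:]),
        keras.layers.Flatten(),
        keras.layers.Dense(10, activation='softmax'),
    ])

    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    model.fit(x_train, y_train, shuffle=True,
              validation_data=(x_test, y_test), epochs=4)

if __name__ == '__main__':
    main()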

Related

when to call compile while training a tensorflow (2.0) model in incremental fashion?

I am writing a neural network to train incrementally (not online). Here is a snippet of the code
output = create_model()
model = Model(inputs=values, outputs=output)
if start_epoch > 1:
    weights_list = load_model_from_pickle()
    model.set_weights(weights_list)
model.compile(loss='binary_crossentropy', optimizer='adam')
model.fit(data, label, epochs=1, verbose=1, batch_size=1024, shuffle=False)
In essence, I want to load previously trained weights and train for a few more epochs. I read in some SO replies that calling compile changes the weights. Is there any other way to do it? Does it make sense to set the weights after calling compile? Will the answer change if I run my model in a multi-GPU setting?
You need to compile the model once; after training, when you reload the model, you don't need to compile it again. Read more here.
The compile function defines the optimizer, loss function and metrics you want. It does not change any weights. For more detailed information, read here.
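As an illustration, here is a minimal sketch of that flow: compile once, optionally restore previously saved weights, train, and save again. The build_model() helper, the dummy data, and the 'weights.h5' filename are placeholders rather than anything from the question:
import numpy as np
from tensorflow import keras

def build_model():
    # Placeholder architecture standing in for create_model() from the question.
    inputs = keras.layers.Input(shape=(100,))
    outputs = keras.layers.Dense(1, activation='sigmoid')(inputs)
    return keras.Model(inputs=inputs, outputs=outputs)

# Dummy data just so the sketch runs end to end.
data = np.random.rand(512, 100)
label = np.random.randint(0, 2, size=(512, 1))

model = build_model()
model.compile(loss='binary_crossentropy', optimizer='adam')

# compile() only configures the optimizer, loss and metrics; it does not modify weights,
# so restoring weights before or after compile() gives the same result.
# model.load_weights('weights.h5')   # uncomment on resumed runs

model.fit(data, label, epochs=1, batch_size=128, verbose=1)
model.save_weights('weights.h5')     # save after each incremental round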

How to split model between 2 GPUs with keras in Tensorflow?

Essentially, I am looking for something like with tf.device('/device:GPU:0'): for Keras. I want to put my operations on different GPUs. I am using a Sequential model along the lines of
...
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
...
model.fit(train_images, train_labels, epochs=5)
You can probably use with K.tf.device('/gpu:1'): as a context manager. Or, since the backend is TensorFlow, the way you assign devices in TF should work for Keras as well.
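For example, here is a rough sketch of wrapping different parts of a functional-API model in device contexts. The layer sizes and device names are made up for illustration, and whether such manual placement actually helps depends on the model:
import tensorflow as tf
from tensorflow import keras

inputs = keras.layers.Input(shape=(28, 28, 1))

# Put the convolutional front end on the first GPU.
with tf.device('/device:GPU:0'):
    x = keras.layers.Conv2D(32, 3, activation='relu')(inputs)
    x = keras.layers.Flatten()(x)

# Put the dense head on the second GPU.
with tf.device('/device:GPU:1'):
    outputs = keras.layers.Dense(10, activation='softmax')(x)

model = keras.Model(inputs=inputs, outputs=outputs)
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])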

Keras - what accuracy metric should be used along with sparse_categorical_crossentropy to compile model

When I had 2 classes I used binary_crossentropy as the loss to compile a model:
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])
But right now I have 5 classes and I'm not using one-hot encoded labels, so I chose sparse_categorical_crossentropy as the loss. But what should the accuracy metric be? The Keras metrics source code suggests there are multiple accuracy metrics available. I tried:
model.compile(optimizer='rmsprop', loss='sparse_categorical_crossentropy', metrics=['sparse_categorical_accuracy'])
So is it correct or should I just use categorical_accuracy?
sparse_categorical_accuracy is the correct metric for sparse_categorical_crossentropy.
But why are you using sparse_categorical_crossentropy? What kind of labels do you have? sparse_categorical_crossentropy is used for integer targets. If you have one-hot-encoded targets instead, you should use categorical_crossentropy as the loss function and accuracy or categorical_accuracy as the metric.
UPDATE:
Use the following code for your classification problem:
model.add(Dense(5, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
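To make the pairing concrete, here is a small sketch of the two setups side by side. The 5-class softmax output matches the answer above; the input size and the to_categorical conversion are just for illustration:
import numpy as np
from tensorflow import keras

num_classes = 5
int_labels = np.array([0, 3, 1, 4, 2])                               # integer targets
onehot_labels = keras.utils.to_categorical(int_labels, num_classes)  # one-hot targets

model = keras.Sequential([
    keras.layers.Dense(num_classes, activation='softmax', input_shape=(20,)),
])

# Option 1: integer labels -> sparse loss + sparse accuracy.
model.compile(optimizer='rmsprop',
              loss='sparse_categorical_crossentropy',
              metrics=['sparse_categorical_accuracy'])

# Option 2: one-hot labels -> categorical loss + accuracy / categorical_accuracy.
# model.compile(optimizer='rmsprop',
#               loss='categorical_crossentropy',
#               metrics=['categorical_accuracy'])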

How to fix (do better) a text classification model using word2vec

I'm a freshman in machine learning and neural networks. I've got a problem with text classification. I use an LSTM architecture with the Keras library.
My model reaches about 97% every time. I have a dataset of about 1 million records, of which 600k are positive and 400k are negative.
I also have 2 labelled classes: 0 (negative) and 1 (positive). My dataset is split into a training set and a test set in an 80:20 ratio. For the network input, I use Word2Vec embeddings trained on PubMed articles.
My network architecture:
model = Sequential()
model.add(emb_layer)
model.add(LSTM(64, dropout=0.5))
model.add(Dense(2))
model.add(Activation('softmax'))
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=50, batch_size=32)
How can I fix (improve) the model I created for this kind of text classification?
The problem we are dealing with here is called overfitting.
First of all, make sure your input data is properly cleaned. One of the principles of machine learning is "Garbage In, Garbage Out". Next, you should balance your data collection, for example to 400k positive and 400k negative records. Then the data set should be divided into training, test and validation sets (60%:20%:20%), for example using the scikit-learn library, as in the following example:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.2)
Then I would use a different neural network architecture and try to optimize the parameters.
Personally, I would suggest using a 2-layer LSTM network or a combination of a convolutional and a recurrent neural network (the latter is faster and, according to articles I have read, gives better results).
1) 2-layer LSTM:
model = Sequential()
model.add(emb_layer)
model.add(LSTM(64, dropout=0.5, recurrent_dropout=0.5, return_sequences=True))
model.add(LSTM(64, dropout=0.5, recurrent_dropout=0.5))
model.add(Dense(2))
model.add(Activation('sigmoid'))
You can try using 2 layers with 64 hidden units each and add the recurrent_dropout parameter.
The main reason we use the sigmoid function is that its output lies between 0 and 1, so it is especially suited to models where we have to predict a probability as the output. Since probabilities only exist in the range 0 to 1, sigmoid is a natural choice.
2) CNN + LSTM
model = Sequential()
model.add(emb_layer)
model.add(Convolution1D(32, 3, padding='same'))
model.add(Activation('relu'))
model.add(MaxPool1D(pool_size=2))
model.add(Dropout(0.5))
model.add(LSTM(32, dropout=0.5, recurrent_dropout=0.5, return_sequences=True))
model.add(LSTM(64, dropout=0.5, recurrent_dropout=0.5))
model.add(Dense(2))
model.add(Activation('sigmoid'))
You can try using a combination of a CNN and an RNN. With this architecture, the model learns faster (up to 5 times faster).
Then, in both cases, you need to choose an optimizer and a loss function.
A good optimizer for both cases is the Adam optimizer.
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
In the last step, we validate our network on the validation set.
In addition, we use a callback that stops the training process when, for example, there is no improvement for 3 consecutive epochs.
from keras.callbacks import EarlyStopping
early_stopping = EarlyStopping(patience=3)
model.fit(X_train, y_train, epochs=100, batch_size=32, validation_data=(X_val, y_val), callbacks=[early_stopping])
We can also monitor overfitting using graphs. If you want to see how to do it, check here.
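For instance, here is a minimal sketch of plotting the training and validation curves from the History object returned by fit, assuming matplotlib is available and the fit call above is assigned to a variable:
import matplotlib.pyplot as plt

# Capture the History object returned by fit (the fit call from above, assigned to a variable).
history = model.fit(X_train, y_train, epochs=100, batch_size=32,
                    validation_data=(X_val, y_val), callbacks=[early_stopping])

# Diverging training and validation curves are a typical sign of overfitting.
plt.plot(history.history['acc'], label='train accuracy')           # 'accuracy' in newer Keras
plt.plot(history.history['val_acc'], label='validation accuracy')  # 'val_accuracy' in newer Keras
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()
plt.show()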
If you need further help, let me know in a comment.

Keras and cross validation

I am currently trying to train a regression network using Keras. To ensure proper training, I want to train using cross-validation.
The problem is that Keras doesn't seem to have any functions supporting cross-validation, or does it?
The only solution I seem to have found is to use scikit-learn's train_test_split and run model.fit for each k-fold manually. Isn't there already an integrated solution for this, rather than doing it manually?
Nope... That seems to be the solution (as far as I know).
There is a scikit learn wrapper for Keras that will help you do this easily: https://keras.io/scikit-learn-api/
I recommend reading Dr. Jason Brownlee's example: https://machinelearningmastery.com/regression-tutorial-keras-deep-learning-library-python/
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.model_selection import KFold, cross_val_score

def baseline_model():
    # create model
    model = Sequential()
    model.add(Dense(13, input_dim=13, kernel_initializer='normal', activation='relu'))
    model.add(Dense(1, kernel_initializer='normal'))
    # Compile model
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model

seed = 7
estimator = KerasRegressor(build_fn=baseline_model, epochs=100, batch_size=5, verbose=0)
kfold = KFold(n_splits=10, shuffle=True, random_state=seed)
results = cross_val_score(estimator, X, Y, cv=kfold)
