import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.utils import to_categorical

data = np.random.random((10000, 150))
labels = np.random.randint(10, size=(10000, 1))
labels = to_categorical(labels, num_classes=10)

model = Sequential()
model.add(Dense(units=32, activation='relu', input_shape=(150,)))
model.add(Dense(units=10, activation='softmax'))
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(data, labels, epochs=30, validation_split=0.2)
I created 10000 random samples to train my net, but it seems to use only a few of them (250/10000).
Example of the 1st epoch:
Epoch 1/30
250/250 [==============================] - 0s 2ms/step - loss: 2.1110 - accuracy: 0.2389 - val_loss: 2.2142 - val_accuracy: 0.1800
Your data is split into training and validation subsets (validation_split=0.2).
The training subset has 8000 samples and the validation subset has 2000.
Training proceeds in batches, and each batch has 32 samples by default.
So one epoch takes 8000/32 = 250 batches, which is exactly the number shown in the progress bar: it counts batches (steps), not samples.
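To see for yourself that the 250 is a batch count rather than a sample count, here is a minimal sketch (assuming the same setup as above); the number in the progress bar changes when you change batch_size:
import numpy as np
n_train = int(10000 * (1 - 0.2))   # 8000 samples remain after validation_split=0.2
print(int(np.ceil(n_train / 32)))  # 250 steps per epoch with the default batch_size of 32
print(int(np.ceil(n_train / 64)))  # 125 steps per epoch if you pass batch_size=64 to model.fit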
Try code like the following example:
import numpy as np
import keras
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(32, activation='relu', input_dim=100))
model.add(Dense(10, activation='softmax'))
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Generate dummy data
data = np.random.random((1000, 100))
labels = np.random.randint(10, size=(1000, 1))

# Convert labels to categorical one-hot encoding
one_hot_labels = keras.utils.to_categorical(labels, num_classes=10)

# Train the model, iterating on the data in batches of 32 samples
model.fit(data, one_hot_labels, epochs=10, batch_size=32)
I use the Keras model training API and observed differences when training the model with NumPy arrays (x_train and y_train) versus with tf.data.Dataset.from_tensor_slices((x_train, y_train)). A minimal working example is shown below:
import numpy as np
import tensorflow as tf
tf.keras.utils.set_random_seed(0)
n_examples, n_dims = (100, 10)
raw_dataset = np.random.randn(n_examples, n_dims)
model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Dense(1024, activation="relu", use_bias=True),
        tf.keras.layers.Dense(1, activation="linear", use_bias=True),
    ]
)
model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss="mse",
)
x_train = raw_dataset[:, :-1]
y_train = raw_dataset[:, -1]
dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
n_epochs = 10
batch_size = 16
use_dataset = True
if use_dataset:
    model.fit(
        dataset.batch(batch_size=batch_size),
        epochs=n_epochs,
    )
else:
    model.fit(
        x=x_train,
        y=y_train,
        batch_size=batch_size,
        epochs=n_epochs,
    )
print("Evaluation:")
model.evaluate(x_train, y_train)
model.evaluate(dataset.batch(batch_size=batch_size))
If I run this code with use_dataset = True, the final performance is:
Evaluation:
4/4 [==============================] - 0s 825us/step - loss: 0.4132
7/7 [==============================] - 0s 701us/step - loss: 0.4132
If I run it with use_dataset = False, I get:
Evaluation:
4/4 [==============================] - 0s 855us/step - loss: 0.4219
7/7 [==============================] - 0s 808us/step - loss: 0.4219
I expected the two training loops to perform identically. Interestingly, the model performance is identical if I set batch_size = n_examples. The difference seems to be related to the way batches are handled internally. Why is this happening? Is it a bug or a feature?
The behavior is due to the default argument shuffle=True in model.fit() and is not a bug. According to the docs for shuffle:
Boolean (whether to shuffle the training data before each epoch) or str (for 'batch'). This argument is ignored when x is a generator or an object of tf.data.Dataset. 'batch' is a special option for dealing with the limitations of HDF5 data; it shuffles in batch-sized chunks. Has no effect when steps_per_epoch is not None.
So this argument is ignored when a tf.data.Dataset is passed, and the data is not reshuffled after each epoch as it is with the NumPy-array approach.
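If you want per-epoch reshuffling with a tf.data.Dataset as well, you can shuffle the dataset explicitly. A minimal sketch, assuming the same x_train, y_train, batch_size, and n_epochs as in the question:
shuffled_dataset = (
    tf.data.Dataset.from_tensor_slices((x_train, y_train))
    .shuffle(buffer_size=len(x_train), reshuffle_each_iteration=True)  # reshuffle every epoch
    .batch(batch_size)
)
model.fit(shuffled_dataset, epochs=n_epochs)
Even then, the exact sample order will differ from what model.fit produces internally for NumPy inputs, so the losses will still not match exactly; disabling shuffling on both sides is what reproduces identical numbers.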
Here is the code to get the same results for both methods:
import numpy as np
import tensorflow as tf
tf.keras.utils.set_random_seed(0)
n_examples, n_dims = (100, 10)
raw_dataset = np.random.randn(n_examples, n_dims)
model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Dense(1024, activation="relu", use_bias=True),
        tf.keras.layers.Dense(1, activation="linear", use_bias=True),
    ]
)
model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss="mse",
)
x_train = raw_dataset[:, :-1]
y_train = raw_dataset[:, -1]
dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
n_epochs = 10
batch_size = 16
use_dataset = False
if use_dataset:
    model.fit(
        dataset.batch(batch_size=batch_size),
        epochs=n_epochs,
    )
else:
    model.fit(
        x=x_train,
        y=y_train,
        batch_size=batch_size,
        shuffle=False,
        epochs=n_epochs,
    )
print("Evaluation:")
model.evaluate(x_train, y_train)
model.evaluate(dataset.batch(batch_size=batch_size))
I have time-series data and I am doing univariate forecasting using stacked LSTMs without any activation function, like the following.
model = Sequential()
model.add(LSTM(200, return_sequences=True, input_shape=(window_6, features)))
model.add(Dropout(0.2))
model.add(LSTM(200, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(200, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(200))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
I used the Adam optimizer and MSE loss, with a batch size of 128, 500 epochs, and 500 steps per epoch.
loss_ = PlotLossesKeras()
model.fit(X1, y1, batch_size= 128, epochs=500, validation_split = 0.2, steps_per_epoch = 500, shuffle=True, callbacks=[loss_])
The loss plot (from PlotLossesKeras) looks like this:
[Loss plot: training min 76.334, max 1568.067, current 77.816; validation min 83.637, max 382.949, current 84.288]
500/500 [==============================] - 5s 9ms/step - loss: 77.8162 - val_loss: 84.2875
My questions:
Is this model overfitting, underfitting, or fitting normally?
The current training loss is 77 and the validation loss is 84. How can I reduce the loss to below 10?
I need suggestions and am open to any critique. Thanks.
Keras Binary Classifier Tutorial Example gives only 50% validation accuracy.
Accuracy near 50% can be obtained from an untrained classifier as well for binary classification.
This example is straight from https://keras.io/getting-started/sequential-model-guide/
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
np.random.seed(10)
# Generate dummy data
x_train = np.random.random((1000, 20))
y_train = np.random.randint(2, size=(1000, 1))
x_test = np.random.random((800, 20))
y_test = np.random.randint(2, size=(800, 1))
model = Sequential()
model.add(Dense(64, input_dim=20, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])
model.fit(x_train, y_train,
          epochs=50,
          batch_size=128,
          validation_data=(x_test, y_test))
score = model.evaluate(x_test, y_test, batch_size=128)
Accuracy output (after multiple trials, including increasing the number of hidden layers):
Epoch 50/50 1000/1000 [==============================] - 0s 211us/sample - loss: 0.6905 - accuracy: 0.5410 - val_loss: 0.6959 - val_accuracy: 0.4812
Could someone help me understand if anything is wrong here?
How can I increase the accuracy for this "example" problem presented in the tutorial?
If you train a classifier on random examples, you will always get approximately 50% accuracy on the validation data, here represented by x_test. This is because your training samples are assigned random classes, and the validation/test set is labeled randomly as well, which is why chance-level (roughly 50%) accuracy occurs.
The more epochs you train for, the higher the accuracy on the training set will become, as an effect of overfitting (the network memorizes the random labels), but the validation accuracy will stay around 50%.
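To confirm that the model itself is fine, here is a minimal sketch (assuming the same model, compile call, and imports as above) that replaces the random labels with labels that actually depend on the inputs; with a learnable rule, the same network climbs well above 50% validation accuracy:
x_train = np.random.random((1000, 20))
y_train = (x_train.sum(axis=1) > 10).astype(int)   # a learnable rule instead of random labels
x_test = np.random.random((800, 20))
y_test = (x_test.sum(axis=1) > 10).astype(int)
model.fit(x_train, y_train, epochs=50, batch_size=128,
          validation_data=(x_test, y_test))
model.evaluate(x_test, y_test, batch_size=128)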
I'm testing simple networks in Keras with the TensorFlow backend and I ran into an issue with the sigmoid activation function.
The network doesn't learn for the first 5-10 epochs, and then everything is fine.
I tried using initializers and regularizers, but that only made things worse.
I use the network like this:
import numpy as np
import keras
from numpy import expand_dims
from keras.preprocessing.image import ImageDataGenerator
from matplotlib import pyplot
# load the image
(x_train, y_train), (x_val, y_val), (x_test, y_test) = netowork2_ker.load_data_shared()
# reshape to (50000, 28, 28) and add a trailing channel dimension
x_train = expand_dims(x_train, 2)
x_train = np.reshape(x_train, (50000, 28, 28))
x_train = expand_dims(x_train, 3)
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)
datagen = ImageDataGenerator(
    rescale=1./255,
    width_shift_range=[-1, 0, 1],
    height_shift_range=[-1, 0, 1],
    rotation_range=10)
epochs = 20
batch_size = 50
num_classes = 10
model = keras.Sequential()
model.add(keras.layers.Conv2D(64, (3, 3), padding='same',
                              input_shape=x_train.shape[1:],
                              activation='sigmoid'))
model.add(keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(keras.layers.Conv2D(100, (3, 3),
                              activation='sigmoid'))
model.add(keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(100,
                             activation='sigmoid'))
#model.add(keras.layers.Dropout(0.5))
model.add(keras.layers.Dense(num_classes,
                             activation='softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
model.fit_generator(datagen.flow(x_train, y_train, batch_size=batch_size),
                    steps_per_epoch=len(x_train) / batch_size, epochs=epochs,
                    verbose=2, shuffle=True)
With the code above I get results like these:
Epoch 1/20
- 55s - loss: 2.3098 - accuracy: 0.1036
Epoch 2/20
- 56s - loss: 2.3064 - accuracy: 0.1038
Epoch 3/20
- 56s - loss: 2.3068 - accuracy: 0.1025
Epoch 4/20
- 56s - loss: 2.3060 - accuracy: 0.1079
...
This goes on for about 7 epochs (the number differs every run), then the loss rapidly drops and I reach 0.9623 accuracy after 20 epochs.
But if I change the activation from sigmoid to relu it works great and gives me 0.5356 accuracy in the first epoch.
This issue makes sigmoid almost unusable for me, and I'd like to know whether I can do something about it. Is this a bug or am I doing something wrong?
Activation function suggestion:
In practice, the sigmoid non-linearity has fallen out of favor and is rarely used. ReLU is the most common choice; if a large fraction of units in the network are "dead", try Leaky ReLU or tanh. Never use sigmoid.
Reasons for not using the sigmoid:
A very undesirable property of the sigmoid neuron is that when its activation saturates at either tail (0 or 1), the gradient in those regions is almost zero. In addition, sigmoid outputs are not zero-centered.
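A minimal sketch of the suggested change, assuming the same data pipeline and architecture as in the question: ReLU in the hidden layers, with a LeakyReLU layer shown for the second convolution in case of dead units.
model = keras.Sequential()
model.add(keras.layers.Conv2D(64, (3, 3), padding='same',
                              input_shape=x_train.shape[1:],
                              activation='relu'))
model.add(keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(keras.layers.Conv2D(100, (3, 3)))
model.add(keras.layers.LeakyReLU(alpha=0.1))   # LeakyReLU is added as its own layer
model.add(keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(100, activation='relu'))
model.add(keras.layers.Dense(num_classes, activation='softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])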
I'm currently using a Naive Bayes algorithm for my text classification.
My end goal is to be able to highlight parts of a large text document when the algorithm decides that a sentence belongs to a category.
The Naive Bayes results are good, but I would like to train a neural network for this problem, so I followed this tutorial:
http://machinelearningmastery.com/sequence-classification-lstm-recurrent-neural-networks-python-keras/ to build my LSTM network in Keras.
All these notions are still quite difficult for me to grasp, so excuse me if you spot some really silly things in my code.
1/ Preparation of the training data
I have 155 sentences of different sizes that have been tagged with a label.
All these tagged sentences are in a training.csv file:
8,9,1,2,3,4,5,6,7
16,15,4,6,10,11,12,13,14
17,18
22,19,20,21
24,20,21,23
(each integer representing a word)
And all the labels are in another file, labels.csv:
6,7,17,15,16,18,4,27,30,30,29,14,16,20,21 ...
I have 155 lines in training.csv and, accordingly, 155 integers in labels.csv.
My dictionary has 1038 words.
2/ The code
Here is my current code:
import csv
import numpy
from keras.models import Sequential
from keras.layers import Dense, Embedding, LSTM
from keras.preprocessing import sequence

total_words = 1039
## fix random seed for reproducibility
numpy.random.seed(7)
datafile = open('training.csv', 'r')
datareader = csv.reader(datafile)
data = []
for row in datareader:
    data.append([int(value) for value in row])  # csv.reader yields strings, so cast to int
X = data
Y = numpy.genfromtxt("labels.csv", dtype="int", delimiter=",")
max_sentence_length = 500
X_train = sequence.pad_sequences(X, maxlen=max_sentence_length)
X_test = sequence.pad_sequences(X, maxlen=max_sentence_length)
# create the model
embedding_vector_length = 32
model = Sequential()
model.add(Embedding(total_words, embedding_vector_length, input_length=max_sentence_length))
model.add(LSTM(100, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())
model.fit(X_train, Y, epochs=3, batch_size=64)
# Final evaluation of the model
scores = model.evaluate(X_train, Y, verbose=0)
print("Accuracy: %.2f%%" % (scores[1]*100))
This model never converges:
155/155 [==============================] - 4s - loss: 0.5694 - acc: 0.0000e+00
Epoch 2/3
155/155 [==============================] - 3s - loss: -0.2561 - acc: 0.0000e+00
Epoch 3/3
155/155 [==============================] - 3s - loss: -1.7268 - acc: 0.0000e+00
I would like to get one of the 24 labels as the result, or a list of probabilities for each label.
What am I doing wrong here?
Thanks for your help!
I've updated my code thanks to the great comments posted to my question.
from keras.utils import np_utils

Y_train = numpy.genfromtxt("labels.csv", dtype="int", delimiter=",")
Y_test = numpy.genfromtxt("labels_test.csv", dtype="int", delimiter=",")
Y_train = np_utils.to_categorical(Y_train)
Y_test = np_utils.to_categorical(Y_test)
max_review_length = 50
X_train = sequence.pad_sequences(X_train, maxlen=max_review_length)
X_test = sequence.pad_sequences(X_test, maxlen=max_review_length)
model = Sequential()
model.add(Embedding(top_words, 32, input_length=max_review_length))
model.add(LSTM(10, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(31, activation="softmax"))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=["accuracy"])
model.fit(X_train, Y_train, epochs=100, batch_size=30)
I think I can still play with the LSTM size (10 or 100), the number of epochs, and the batch size.
The model has very poor accuracy (40%), but I currently think that's because I don't have enough data (150 sentences for 24 labels).
I will put this project on standby until I get more data.
If someone has ideas to improve this code, feel free to comment!