I am trying to classify my input time-series data into 10 response classes.
My input data has 40 features, and the response (y_train) has 1 feature with 10 classes.
train input shape (4320, 43), train_y shape (4320,)
My LSTM network looks like the following:
model = Sequential()
model.add(LSTM(25, dropout=0.2, input_shape=(train_X.shape[1], train_X.shape[2])))
model.add(Dense(10, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
history = model.fit(train_X, train_y, epochs=10, batch_size=36, validation_split =0.05)
And I get this error:
Error when checking target: expected dense_21 to have shape (10,) but got array with shape (1,)
I think it is happening because train_y has 1 feature, whereas the dense output layer expects 10. How can I run my multiclass time-series classification with the categorical_crossentropy loss function?
Also, as soon as I change the loss function to sparse_categorical_crossentropy, it runs smoothly.
model = Sequential()
model.add(LSTM(25, dropout=0.2, input_shape=(train_X.shape[1], train_X.shape[2])))
model.add(Dense(10, activation='softmax'))
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
history = model.fit(train_X, train_y, epochs=10, batch_size=36, validation_split =0.05)
Please help me understand the reason behind it. Also, which loss function should I use for multiclass time-series classification?
The initial error:
Error when checking target: expected dense_21 to have shape (10,) but got array with shape (1,)
This is because y_train has not been converted into a categorical representation. You need to convert y_train into an array with one column per class (10 here), for example through one-hot encoding.
In simple terms, categorical_crossentropy should only be used on data that is one-hot encoded.
[1, 0, 0, 0]
[0, 1, 0, 0]
[0, 0, 1, 0]
...
sparse_categorical_crossentropy, on the other hand, expects integer class labels:
1
2
5
3
...
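To make the difference concrete, here is a minimal NumPy sketch (not Keras itself) of the two label formats and the two losses. The helper names, the random labels and the random "network outputs" are assumptions made up for illustration; the point is only that both losses give the same number once the labels are in the format each one expects.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Turn integer labels of shape (n,) into a one-hot matrix of shape (n, num_classes)."""
    return np.eye(num_classes)[labels]

def categorical_crossentropy(y_onehot, probs):
    """Mean cross-entropy for one-hot targets of shape (n, num_classes)."""
    return float(np.mean(-np.sum(y_onehot * np.log(probs), axis=1)))

def sparse_categorical_crossentropy(y_int, probs):
    """Mean cross-entropy for integer targets of shape (n,)."""
    return float(np.mean(-np.log(probs[np.arange(len(y_int)), y_int])))

rng = np.random.default_rng(0)
num_classes = 10
y_int = rng.integers(0, num_classes, size=6)   # hypothetical labels in 0..9
logits = rng.normal(size=(6, num_classes))     # stand-in for raw network outputs
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)  # softmax

y_onehot = one_hot(y_int, num_classes)
print(y_int.shape, y_onehot.shape)             # (6,) (6, 10)

loss_dense = categorical_crossentropy(y_onehot, probs)
loss_sparse = sparse_categorical_crossentropy(y_int, probs)
print(np.isclose(loss_dense, loss_sparse))     # True
```

So the choice between the two is mostly about which label format you already have: integer labels go with the sparse variant, one-hot rows with the non-sparse one.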
My understanding is that "sparse_categorical_crossentropy" fits my multi-class case without one-hot encoding. I also lowered the Adam learning rate in case it was overshooting the predictions.
I am not sure what I am misunderstanding or doing incorrectly.
My input data looks similar to this:
My output labels are [1 2 3 4 5 6 7 8 9 10] (not one-hot encoded). Each number represents a class that I want the network to end up choosing.
print(x_train.shape)
print(x_test.shape)
x_train = x_train.reshape(x_train.shape[0], round(x_train.shape[1]/5), 5)
x_test = x_test.reshape(x_test.shape[0], round(x_test.shape[1]/5), 5)
print(x_train.shape)
print(np.unique(y_train))
print(len(np.unique(y_train)))
input_shape = (x_train.shape[1], 5)
adam = keras.optimizers.Adam(learning_rate=0.0001)
model = Sequential()
model.add(Conv1D(512, 5, activation='relu', input_shape=input_shape))
model.add(Conv1D(512, 5, activation='relu'))
model.add(MaxPooling1D(3))
model.add(Conv1D(512, 5, activation='relu'))
model.add(Conv1D(512, 5, activation='relu'))
model.add(GlobalAveragePooling1D())
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='sparse_categorical_crossentropy', optimizer=adam, metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=32, epochs=25, validation_data=(x_test, y_test))
print(model.summary())
Here are the error results:
Model Layers (if it helps):
I see two main problems in your approach.
First, your labels run from 1 to 10. They must start from 0 so that they fall in the range 0-9. You can achieve this simply with y_train - 1 and y_test - 1 (if y_train and y_test are numpy arrays).
Second, the last layer of your network must be Dense(10, activation='softmax'), where 10 is the number of classes to predict and softmax generates the class probabilities in a multiclass problem.
Using sparse_categorical_crossentropy is fine because your targets are integer-encoded.
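As a rough illustration of both points, the NumPy sketch below shifts the labels to 0-9 and evaluates a sparse cross-entropy on a 10-way softmax. The random logits are only a stand-in for the output of a Dense(10) layer, and the softmax helper is written by hand for this example.

```python
import numpy as np

y_train = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])  # labels as in the question
y_zero_based = y_train - 1                            # now in the range 0..9
num_classes = len(np.unique(y_zero_based))            # 10 output units are needed

def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
logits = rng.normal(size=(len(y_zero_based), num_classes))  # stand-in for Dense(10) outputs
probs = softmax(logits)                                     # each row sums to 1

# Sparse cross-entropy looks up probs[i, label_i]. That lookup only works
# when every label is a valid column index, i.e. 0 <= label < num_classes.
picked = probs[np.arange(len(y_zero_based)), y_zero_based]
loss = float(-np.log(picked).mean())
print(num_classes, probs.shape)                             # 10 (10, 10)
```

With a single sigmoid output there is only one column to look up, which is why labels such as 5 or 10 cannot be matched against it.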
From the official example in Keras docs, the stacked LSTM classifier is trained using categorical_crossentropy as a loss function, as expected. https://keras.io/getting-started/sequential-model-guide/#examples
But the y_train values are generated with numpy.random.random(), which outputs real numbers in [0, 1), rather than the usual 0/1 one-hot labels.
Are the y_train values being converted to 0/1 values under the hood?
Can you even train this loss function against real values between 0 and 1?
How is accuracy calculated then?
Confusing.. no?
from keras.models import Sequential
from keras.layers import LSTM, Dense
import numpy as np
data_dim = 16
timesteps = 8
num_classes = 10
# expected input data shape: (batch_size, timesteps, data_dim)
model = Sequential()
model.add(LSTM(32, return_sequences=True,
               input_shape=(timesteps, data_dim)))  # returns a sequence of vectors of dimension 32
model.add(LSTM(32, return_sequences=True)) # returns a sequence of vectors of dimension 32
model.add(LSTM(32)) # return a single vector of dimension 32
model.add(Dense(10, activation='softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])
# Generate dummy training data
x_train = np.random.random((1000, timesteps, data_dim))
y_train = np.random.random((1000, num_classes))
# Generate dummy validation data
x_val = np.random.random((100, timesteps, data_dim))
y_val = np.random.random((100, num_classes))
model.fit(x_train, y_train,
          batch_size=64, epochs=5,
          validation_data=(x_val, y_val))
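In this example, y_train and y_val are no longer one-hot encoded; each row is a vector of real-valued scores, one per class, that plays the role of class probabilities. (The random rows here do not sum to 1, so they are not true probability distributions; they are only dummy data.) Cross-entropy is still well defined for such targets, and one-hot encoding can be treated as a special case of a probability vector.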
y_train[0]
array([0.30172708, 0.69581121, 0.23264601, 0.87881279, 0.46294832,
0.5876406 , 0.16881395, 0.38856604, 0.00193709, 0.80681196])
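A small NumPy sketch may help here. It assumes hand-picked, properly normalized target rows (unlike the random dummy data above) and shows that the same cross-entropy formula handles both one-hot and soft targets. For the accuracy question: as far as I know, Keras's categorical accuracy compares the argmax of the target row with the argmax of the prediction, which is what the last line imitates.

```python
import numpy as np

def cross_entropy(targets, probs):
    """Per-sample cross-entropy: -sum_k targets[k] * log(probs[k])."""
    return -np.sum(targets * np.log(probs), axis=1)

# Made-up predictions for two samples and three classes
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6]])

one_hot = np.array([[1.0, 0.0, 0.0],
                    [0.0, 0.0, 1.0]])
soft = np.array([[0.6, 0.3, 0.1],
                 [0.2, 0.2, 0.6]])

loss_one_hot = cross_entropy(one_hot, probs)  # reduces to -log(prob of the true class)
loss_soft = cross_entropy(soft, probs)        # every class contributes, weighted by its target

# Accuracy in the argmax sense: does the most likely predicted class
# match the class with the largest target value?
accuracy = float(np.mean(soft.argmax(axis=1) == probs.argmax(axis=1)))
```

With real-valued targets this accuracy only tells you whether the largest entries line up, so it is less informative than with true one-hot labels.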
I am having a hard time getting the dimensions of an LSTM network right.
So I have the following data:
train_data.shape
(25391, 3) # to be read as 25391 timesteps and 3 features
train_labels.shape
(25391, 1) # to be read as 25391 timesteps and 1 feature
So I thought my input dimension should be (1, len(train_data), train_data.shape[1]), as I plan to submit 1 batch. But I get the following error:
Error when checking target: expected lstm_10 to have 2 dimensions, but got array with shape (1, 25391, 1)
Here is the model code:
model = Sequential()
model.add(LSTM(1,  # predict one feature and one timestep
               batch_input_shape=(1, len(train_data), train_data.shape[1]),
               activation='tanh',
               return_sequences=False))
model.compile(loss = 'categorical_crossentropy', optimizer='adam', metrics = ['accuracy'])
print(model.summary())
# as 1 sample with len(train_data) time steps and train_data.shape[1] features.
model.fit(x=train_data.values.reshape(1, len(train_data), train_data.shape[1]),
          y=train_labels.values.reshape(1, len(train_labels), train_labels.shape[1]),
          epochs=1,
          verbose=1,
          validation_split=0.8,
          validation_data=None,
          shuffle=False)
What should the input dimensions look like?
The problem is in the shape of the target (i.e. the labels) you provide, as the message says (Error when checking target). The output of the LSTM layer, which is also the output of the model, has shape (None, 1), because you specified that only the final output should be returned (return_sequences=False). To get the output of every timestep, set return_sequences=True. The output shape of the LSTM layer then becomes (None, num_timesteps, num_units), which is consistent with the shape of the labels array you provide.
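The shape rule can be written down as a tiny helper. This is plain Python, not a Keras API; lstm_output_shape is a hypothetical function made up for this sketch, and the numbers are the ones from the question.

```python
def lstm_output_shape(batch, timesteps, units, return_sequences):
    """Hypothetical helper: output shape of a single LSTM layer."""
    if return_sequences:
        return (batch, timesteps, units)  # one output vector per timestep
    return (batch, units)                 # only the output of the last timestep

label_shape = (1, 25391, 1)               # shape of the reshaped train_labels

last_only = lstm_output_shape(1, 25391, 1, return_sequences=False)
per_step = lstm_output_shape(1, 25391, 1, return_sequences=True)

print(last_only)                          # (1, 1)
print(per_step)                           # (1, 25391, 1)
print(per_step == label_shape)            # True
```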
I am trying to train an LSTM recurrent neural network, for sequence classification.
My data has the following format:
Input: [1,5,2,3,6,2, ...] -> Output: 1
Input: [2,10,4,6,12,4, ...] -> Output: 1
Input: [4,1,7,1,9,2, ...] -> Output: 2
Input: [1,3,5,9,10,20, ...] -> Output: 3
.
.
.
So basically I want to provide a sequence as an input and get an integer as an output.
Each input sequence has length = 2000 float numbers, and I have around 1485 samples for training
The output is just an integer from 1 to 10
This is what I tried to do:
# Get the training numpy 2D array for the input (1485X 2000).
# Each element is an input sequence of length 2000
# eg: [ [1,2,3...], [4,5,6...], ... ]
x_train = get_training_x()
# Get the training numpy 2D array for the outputs (1485 X 1).
# Each element is an integer output for the corresponding input from x_train
# eg: [ 1, 2, 3, ...]
y_train = get_training_y()
# Create the model
model = Sequential()
model.add(LSTM(100, input_shape=(x_train.shape)))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())
model.fit(x_train, y_train, nb_epoch=3, batch_size=64)
I get the following error:
Error when checking input: expected lstm_1_input to have 3 dimensions, but got array with shape (1485, 2000)
I tried using this instead:
model.add(LSTM(100, input_shape=(1485, 1, 2000)))
But I got another error this time:
ValueError: Input 0 is incompatible with layer lstm_1: expected ndim=3, found ndim=4
Can anyone explain what my input shape should be, and what I am doing wrong?
Thanks
Try reshaping your training data to:
x_train=x_train.reshape(x_train.shape[0], 1, x_train.shape[1])
Use input_shape=(x_train.shape[1], 1), where x_train.shape[1] is the length of each sequence and 1 is the number of features per timestep. The batch size is not part of input_shape in Keras; a full input batch then has shape (None, x_train.shape[1], 1), where None stands for the batch size.
Then reshape your data with x_train = x_train.reshape(-1, x_train.shape[1], 1).
Given the format of your input and output, you can use parts of the approach taken by one of the official Keras examples. More specifically, since you are not creating a binary classifier, but rather predicting an integer, you can use one-hot encoding to encode y_train using to_categorical().
# Number of elements in each sample
num_vals = x_train.shape[1]
# Convert all samples in y_train to one-hot encoding
y_train = to_categorical(y_train)
# Get number of possible values for model inputs and outputs
num_x_tokens = np.amax(x_train) + 1
num_y_tokens = y_train.shape[1]
model = Sequential()
model.add(Embedding(num_x_tokens, 100))
model.add(LSTM(100))
model.add(Dense(num_y_tokens, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
model.fit(x_train, y_train,
          batch_size=64,
          epochs=3)
The num_x_tokens in the code above is one more than the largest value that appears in your input samples (e.g. if you have two samples [1, 7, 2] and [3, 5, 4], the largest value is 7 and num_x_tokens is 8). If you use numpy, you can find the largest value with np.amax(x_train). Similarly, num_y_tokens is the number of columns in the one-hot encoded y_train, one per possible label value.
After training, you can run predictions using the code below. Using np.argmax effectively reverses to_categorical in this configuration.
model_out = model.predict(x_test)
model_out = np.argmax(model_out, axis=1)
You can import to_categorical using from keras.utils import to_categorical, Embedding using from keras.layers import Embedding, and numpy using import numpy as np.
Also, you don't have to do print(model.summary()). model.summary() is enough to print out the summary.
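To see why np.argmax undoes the encoding, here is a NumPy-only sketch of the round trip. It imitates to_categorical with np.eye under the assumption that the number of columns is the largest label plus one; the label values are made up.

```python
import numpy as np

y_train = np.array([1, 2, 3, 10, 1])          # hypothetical labels in the range 1..10

# Imitation of to_categorical: one column per value from 0 up to the largest label
num_y_tokens = int(y_train.max()) + 1         # 11 columns; column 0 is never used here
y_onehot = np.eye(num_y_tokens)[y_train]      # shape (5, 11)

# argmax returns the column index of the 1 in each row, i.e. the original label
recovered = np.argmax(y_onehot, axis=1)

print(y_onehot.shape)                         # (5, 11)
print(np.array_equal(recovered, y_train))     # True
```

Because the labels start at 1, one output column stays unused. Shifting the labels to start at 0 before encoding would avoid that, at the cost of adding 1 back after argmax.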
EDIT
If it is the case that the input is of the form [[0.12, 0.31, ...], [0.22, 0.95, ...], ...] (say, generated with x_train = np.random.rand(num_samples, num_vals)) then you can use x_train = np.reshape(x_train, (num_samples, num_vals, 1)) to change the shape of the array to input it into the LSTM layer. The code to train the model in that case would be:
num_samples = x_train.shape[0]
num_vals = x_train.shape[1] # Number of elements in each sample
# Reshape for what LSTM expects
x_train = np.reshape(x_train, (num_samples, num_vals, 1))
y_train = to_categorical(y_train)
# Get number of possible values for model outputs
num_y_tokens = y_train.shape[1]
model = Sequential()
model.add(LSTM(100, input_shape=(num_vals, 1)))
model.add(Dense(num_y_tokens, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
model.fit(x_train, y_train,
          batch_size=64,
          epochs=3)
num_vals is the length of each sample array in x_train. np.reshape(x_train, (num_samples, num_vals, 1)) changes each sample from the form [0.12, 0.31, ...] to the form [[0.12], [0.31], ...], which is the shape that the LSTM takes (input_shape=(num_vals, 1)). The extra 1 seems strange in this case, but the LSTM expects each sample to have two dimensions, typically called (timesteps, data_dim), or in this case (num_vals, 1).
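The reshape on its own can be checked with a few lines of NumPy. The sizes below are small stand-ins for the 1485 samples of length 2000 in the question.

```python
import numpy as np

num_samples, num_vals = 4, 6                  # small stand-ins for 1485 and 2000
x_train = np.random.rand(num_samples, num_vals)

# Add a trailing feature axis: (samples, timesteps) -> (samples, timesteps, 1)
x_3d = np.reshape(x_train, (num_samples, num_vals, 1))

print(x_train.shape)                          # (4, 6)
print(x_3d.shape)                             # (4, 6, 1)
print(x_3d[0].shape)                          # (6, 1), one sample as (timesteps, data_dim)
```

The values themselves are untouched; only the way they are indexed changes.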
To see how else LSTMs are used in Keras you can refer to:
Keras Sequential model guide (has several LSTM examples)
Keras examples (look for *.py files with lstm in their name)
I don't fully understand how to create a simple Sequential model for my data.
The data has the following dimensions:
X_train.shape
(2369, 12)
y_train.shape
(2369,)
X_test.shape
(592, 12)
y_test.shape
(592,)
This is how I create the model:
batch_size = 128
nb_epoch = 20
in_out_neurons = X_train.shape[1]
dimof_middle = 100
model = Sequential()
model.add(Dense(batch_size, batch_input_shape=(None, in_out_neurons)))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(batch_size))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(in_out_neurons))
model.add(Activation('linear'))
# I am solving the regression problem, not the classification one
model.compile(loss="mean_squared_error", optimizer="rmsprop")
history = model.fit(X_train, y_train,
                    batch_size=batch_size, nb_epoch=nb_epoch,
                    verbose=1, validation_data=(X_test, y_test))
The error message:
Exception: Error when checking model input: expected dense_input_14 to have shape (None, 1) but got array with shape (2369, 12)
The error is:
Error when checking model target: expected activation_42 to have shape (None, 12) but got array with shape (2369, 1)
This error occurs at line:
model.add(Dense(in_out_neurons))
How do I change Dense to make it work?
Another question: how do I add a simple autoencoder in order to initialize the weights of the ANN?
One of your problems is that you seem to misunderstand what a batch is.
A batch is the number of training samples computed at a time, so instead of computing one training sample from X_train at a time you use, for example, 100 at a time. The important bit here is that this has nothing to do with your model.
So when you write
model.add(Dense(batch_size, batch_input_shape=(None, in_out_neurons)))
then you create a fully connected layer with an output size of one batch. That does not make a lot of sense.
Another problem is that your model's output is 12 neurons while your Y is only one value/neuron. Your model looks like this:
|
v
[128]
[128]
[ 12]
|
v
What fit() then does is feed a matrix of shape (128, 12), i.e. (batch size, X_train.shape[1]), into the model and try to compare the output of shape (128, 12) from the last layer with the corresponding Y values of the batch, which have shape (128, 1).
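The shape flow can be reproduced without Keras. In the sketch below, dense is a hypothetical stand-in for a Dense layer (random weights, no bias or activation), used only to follow the shapes. Assuming a single regression target per sample, one possible fix is a final layer with 1 unit instead of 12.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, n_features, hidden = 128, 12, 128

def dense(x, units):
    """Hypothetical stand-in for a Dense layer: random weights, no bias or activation."""
    w = rng.normal(size=(x.shape[1], units))
    return x @ w

x_batch = rng.normal(size=(batch_size, n_features))  # one batch of inputs, shape (128, 12)
y_batch = rng.normal(size=(batch_size, 1))           # one target value per sample, shape (128, 1)

h = dense(dense(x_batch, hidden), hidden)            # two hidden layers, shape (128, 128)

out_wrong = dense(h, n_features)                     # last layer with 12 units
out_fixed = dense(h, 1)                              # last layer with 1 unit

print(out_wrong.shape, y_batch.shape)                # (128, 12) (128, 1)
print(out_fixed.shape == y_batch.shape)              # True
```

Note that the hidden width (128 here) is a free design choice and has nothing to do with the batch size, even though the two numbers happen to coincide in the question.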