I'm using Keras framework to build a stacked LSTM model as follows:
model.add(layers.LSTM(units=32,
batch_input_shape=(1, 100, 64),
stateful=True,
return_sequences=True))
model.add(layers.LSTM(units=32, stateful=True, return_sequences=True))
model.add(layers.LSTM(units=32, stateful=True, return_sequences=False))
model.add(layers.Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(train_dataset,
train_labels,
epochs=1,
validation_split = 0.2,
verbose=1,
batch_size=1,
shuffle=False)
Knowing that the default batch_size for mode.fit, model.predict and model.evaluate is 32, the model forces me to change this default batch_size to the samebatch_size value used in batch_input_shape (batch_size, time_steps, input_dims).
My questions are:
What is the difference between passing the batch_size into
batch_input_shape or into the model.fit?
Could I train with batch_size, lets say 10, and evaluate on a single batch (rather than
10 batches) if I passes the batch_size into the structure of the
LSTM layer through batch_input_shape?
when the lstm layer is in stateful mode, the batch size must be given and cannot be None.
this is because the lstm is stateful and needs to know how to concatenate the hidden states from the t-1 timestep batch to the t timestep batch
When you create a Sequential() model it is defined to support any batch size. In particular, in TensorFlow 1.* the input is a placeholder that has None as the first dimension:
import tensorflow as tf
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Dense(units=2, input_shape=(2, )))
print(model.inputs[0].get_shape().as_list()) # [None, 2] <-- supports any batch size
print(model.inputs[0].op.type == 'Placeholder') # True
If you use tf.keras.InputLayer() you can define a fixed batch size like this:
import tensorflow as tf
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.InputLayer((2,), batch_size=50)) # <-- same as using batch_input_shape
model.add(tf.keras.layers.Dense(units=2, input_shape=(2, )))
print(model.inputs[0].get_shape().as_list()) # [50, 2] <-- supports only batch_size==50
print(model.inputs[0].op.type == 'Placeholder') # True
The batch size of model.fit() method is used to split your data to batches. For example, if you use InputLayer() and define a fixed batch size while providing different value of a batch size to the model.fit() method you will get ValueError:
import tensorflow as tf
import numpy as np
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.InputLayer((2,), batch_size=2)) # <--batch_size==2
model.add(tf.keras.layers.Dense(units=2, input_shape=(2, )))
model.compile(optimizer=tf.keras.optimizers.Adam(),
loss='categorical_crossentropy')
x_train = np.random.normal(size=(10, 2))
y_train = np.array([[0, 1] for _ in range(10)])
model.fit(x_train, y_train, batch_size=3) # <--batch_size==3
This will raise:
ValueError: Thebatch_sizeargument value 3 is incompatible with the specified batch size of your Input Layer: 2
To summarize: If you define a batch size None you can pass any number of samples for training or evaluation, even all samples at once without splitting to batches (if the data is too big you will get OutOfMemoryError). If you define a fixed batch size you will have to use the same fixed batch size for training and evaluation.
Related
Okay so I'm pretty new to deep learning and have a very basic doubt. I have an input data with an array containing 255 data (Araay shape (255,)) in epochs_data and their corresponding labels in new_labels (Array shape (255,)).
I split the data using the following code:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(epochs_data, new_labels, test_size = 0.2, random_state=30)
I'm using a sequential model:
from keras.models import Sequential
from keras import layers
from keras.layers import Dense, Activation, Flatten
model = Sequential()
I know how to code for the hidden layers and output layer:
model.add(Dense(500, activation='relu')) #Hidden Layer
model.add(Dense(2, activation='softmax')) #Output Layer
But I don't know how to code layer for input with the input_shape specified. The X_train is the input.It's an array of shape (180,). Also tell me how to code the model.fit() for the same. Any help is appreciated.
You have to copy this line before the hidden layer. You can add the activation function that you want. Finally, as you can see this line represent both the input layer and the 1° hidden layer (you have to choose the n° of neuron (I put 100) )
model.add(Dense(100, input_shape = (X_train.shape[1],))
EDIT:
Before fitting your model you have to configure your model with this line:
model.compile(loss = 'mse', optimizer = 'Adam', metrics = ['mse'])
So you have to choose a metric that in this case is Mean Squarred Error and an optimizer like Adam, Adamax, ect.
Then you can fit your model choosing the data (X,Y), n° epochs, val_split and the batch size.
history = model.fit(X_train, y_train, epochs = 200,
validation_split = 0.1, batch_size=250)
I've got a 2D numpy matrix (from a DataFrame) of already condensed word vectors (I used a max pooling technique, am trying to compare a logres to a bi-LSTM approach), and I'm not sure how to prepare it to use it in a keras model.
I'm aware of the need of a 3D tensor for the Bi-LSTM model, and have tried googling solutions, but couldn't find a solution that worked.
This is what I have right now:
# Set model parameters
epochs = 4
batch_size = 32
input_shape = (1, 10235, 3072)
# Create the model
model = Sequential()
model.add(Bidirectional(LSTM(64, return_sequences = True, input_shape = input_shape)))
model.add(Dropout(0.5))
model.add(Dense(1, activation = 'sigmoid'))
# Try using different optimizers and different optimizer configs
model.compile('adam', 'binary_crossentropy', metrics = ['accuracy'])
# Fit the training set over the model and correct on the validation set
model.fit(inputs['X_train'], inputs['y_train'],
batch_size = batch_size,
epochs = epochs,
validation_data = [inputs['X_validation'], inputs['y_validation']])
# Get score over the test set
return model.evaluate(inputs['X_test'], inputs['y_test'])
I currently got the following error:
ValueError: Input 0 is incompatible with layer bidirectional_23: expected ndim=3, found ndim=2
The shape of my training data (inputs['X_train']) is (10235, 3072).
Thanks so much!
I've made it work with the suggestion of the reply by doing the following:
Remove return_sequence = True;
Apply the following transformations to the X sets: np.reshape(inputs[dataset], (inputs[dataset].shape[0], inputs[dataset].shape[1], 1))
Change the input shape of the LSTM layer to (10235, 3072, 1) which is the shape of X_train.
I want to change my model architecture a bit on the LSTM so it accepts the same exact flattened inputs the full connected approach does.
Working Dnn model from Keras examples
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.utils import to_categorical
# import the data
from keras.datasets import mnist
# read the data
(x_train, y_train), (x_test, y_test) = mnist.load_data()
num_pixels = x_train.shape[1] * x_train.shape[2] # find size of one-dimensional vector
x_train = x_train.reshape(x_train.shape[0], num_pixels).astype('float32') # flatten training images
x_test = x_test.reshape(x_test.shape[0], num_pixels).astype('float32') # flatten test images
# normalize inputs from 0-255 to 0-1
x_train = x_train / 255
x_test = x_test / 255
# one hot encode outputs
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
num_classes = y_test.shape[1]
print(num_classes)
# define classification model
def classification_model():
# create model
model = Sequential()
model.add(Dense(num_pixels, activation='relu', input_shape=(num_pixels,)))
model.add(Dense(100, activation='relu'))
model.add(Dense(num_classes, activation='softmax'))
# compile model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
return model
# build the model
model = classification_model()
# fit the model
model.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=10, verbose=2)
# evaluate the model
scores = model.evaluate(x_test, y_test, verbose=0)
Same problem but trying LSTM (syntax error still)
def kaggle_LSTM_model():
model = Sequential()
model.add(LSTM(128, input_shape=(x_train.shape[1:]), activation='relu', return_sequences=True))
# What does return_sequences=True do?
model.add(Dropout(0.2))
model.add(Dense(32, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(10, activation='softmax'))
opt = tf.keras.optimizers.Adam(lr=1e-3, decay=1e-5)
model.compile(loss='sparse_categorical_crossentropy', optimizer=opt,
metrics=['accuracy'])
return model
model_kaggle_LSTM = kaggle_LSTM_model()
# fit the model
model_kaggle_LSTM.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=10, verbose=2)
# evaluate the model
scores = model_kaggle_LSTM.evaluate(x_test, y_test, verbose=0)
Problem is here:
model.add(LSTM(128, input_shape=(x_train.shape[1:]), activation='relu', return_sequences=True))
ValueError: Input 0 is incompatible with layer lstm_17: expected
ndim=3, found ndim=2
If I go back and don't flatten x_train and y_train, it works. However, I'd like this to be "just another model choice" that feeds off the same pre-processed input. I thought passing shape[1:] would work as that it the real flattened input_shape. I'm sure it's something easy I'm missing about the dimensionality, but I couldn't get it after an hour of twiddling and debugging, although did figure out not flattening the 28x28 to 784 works, but I don't understand why it works. Thanks a lot!
For bonus points, an example of how to do either DNN or LSTM in either 1D (784,) or 2D (28, 28) would be the best.
RNN layers such as LSTM are meant for sequence processing (i.e. a series of vectors which their order of appearance matters). You can look at an image from top to bottom, and consider each row of pixels as a vector. Therefore, the image would be a sequence of vectors and can be fed to the RNN layer. Therefore, according to this description, you should expect that the RNN layer take an input of shape (sequence_length, number_of_features). That's why when you feed the images to the LSTM network in their original shape, i.e. (28,28), it works.
Now if you insist on feeding the LSTM model the flattened image, i.e. with shape (784,), you have at least two options: either you can consider this as a sequence of length one, i.e. (1, 748), which does not make much sense; or you can add a Reshape layer to your model to reshape back the input to its original shape suitable for the input shape of a LSTM layer, like this:
from keras.layers import Reshape
def kaggle_LSTM_model():
model = Sequential()
model.add(Reshape((28,28), input_shape=x_train.shape[1:]))
# the rest is the same...
From the official example in Keras docs, the stacked LSTM classifier is trained using categorical_crossentropy as a loss function, as expected. https://keras.io/getting-started/sequential-model-guide/#examples
But the y_train values are seeded using numpy.random.random() which outputs real numbers, versus 0,1 binary classification ( which is typical )
Are the y_train values being promoted to 0,1 values under the hood?
Can you even train this loss function against real values between 0,1 ?
How is accuracy then calculated ?
Confusing.. no?
from keras.models import Sequential
from keras.layers import LSTM, Dense
import numpy as np
data_dim = 16
timesteps = 8
num_classes = 10
# expected input data shape: (batch_size, timesteps, data_dim)
model = Sequential()
model.add(LSTM(32, return_sequences=True,
input_shape=(timesteps, data_dim))) # returns a sequence of vectors of dimension 32
model.add(LSTM(32, return_sequences=True)) # returns a sequence of vectors of dimension 32
model.add(LSTM(32)) # return a single vector of dimension 32
model.add(Dense(10, activation='softmax'))
model.compile(loss='categorical_crossentropy',
optimizer='rmsprop',
metrics=['accuracy'])
# Generate dummy training data
x_train = np.random.random((1000, timesteps, data_dim))
y_train = np.random.random((1000, num_classes))
# Generate dummy validation data
x_val = np.random.random((100, timesteps, data_dim))
y_val = np.random.random((100, num_classes))
model.fit(x_train, y_train,
batch_size=64, epochs=5,
validation_data=(x_val, y_val))
For this example, the y_train and y_test are not the one-hot encoding anymore, but the probabilities of each classes. So it is still applicable for cross-entropy. And we can treat the one-hot encoding as the special case of the probabilities vector.
y_train[0]
array([0.30172708, 0.69581121, 0.23264601, 0.87881279, 0.46294832,
0.5876406 , 0.16881395, 0.38856604, 0.00193709, 0.80681196])
I slightly misunderstand how to create a simple Sequence for my data.
The data has the following dimensions:
X_train.shape
(2369, 12)
y_train.shape
(2369,)
X_test.shape
(592, 12)
y_test.shape
(592,)
This is how I create the model:
batch_size = 128
nb_epoch = 20
in_out_neurons = X_train.shape[1]
dimof_middle = 100
model = Sequential()
model.add(Dense(batch_size, batch_input_shape=(None, in_out_neurons)))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(batch_size))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(in_out_neurons))
model.add(Activation('linear'))
# I am solving the regression problem, not the classification one
model.compile(loss="mean_squared_error", optimizer="rmsprop")
history = model.fit(X_train, y_train,
batch_size=batch_size, nb_epoch=nb_epoch,
verbose=1, validation_data=(X_test, y_test))
The error message:
Exception: Error when checking model input: expected dense_input_14 to
have shape (None, 1) but got array with shape (2369, 12)ç
The error is:
Error when checking model target: expected activation_42 to have shape
(None, 12) but got array with shape (2369, 1)
This error occurs at line:
model.add(Dense(in_out_neurons))
How to change Dense to make it work?
Another question is how to add a simple autoencoder in order to initialize weights of ANN?
One of your problems is that you seem to misunderstand what a batch is.
A batch is the number of training samples computed at a time, so instead of computing one training sample from X_train at a time you use, for example, 100 at a time. The important bit here is that this has nothing to do with your model.
So when you write
model.add(Dense(batch_size, batch_input_shape=(None, in_out_neurons)))
then you create a fully connected layer with an output size of one batch. That does not make a lot of sense.
Another problem is that your model's output is 12 neurons while your Y is only one value/neuron. Your model looks like this:
|
v
[128]
[128]
[ 12]
|
v
Then what fit() does is, it inputs a matrix of shape (128, 12) ((batch size, X_train.shape[1])) into the model and attempts to compare the output of shape (128,12) from the last layer to the corresponding Y values of the batch (shape (128,1)).