I am working with a Sequential Keras model and I am trying to figure out the best method for feature scaling.
model = Sequential()
model.add(Masking(mask_value=-50, input_shape=(None,10)))
model.add(LayerNormalization(axis=-1))
model.add(LSTM(100, input_shape=(None,10)))
model.add(Dense(100, activation='relu'))
model.add(Dense(3, activation='softmax'))
print(model.summary())
In line 3, I have a LayerNormalization layer which, according to the documentation, normalizes each sample to zero mean and unit standard deviation. However, I have also come across BatchNormalization and tf.keras.layers.experimental.preprocessing.Normalization. My question is: is this method similar to sklearn's StandardScaler(), or is there another method I could use to feature-scale within the model?
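If the goal is StandardScaler-style scaling, the closest in-model equivalent is the preprocessing Normalization layer you mention: its adapt() method learns per-feature mean and variance from the training data, and the layer then applies (x - mean) / sqrt(var). A minimal sketch, assuming train_x is an array shaped (samples, timesteps, 10):
import tensorflow as tf

# assumed: train_x has shape (samples, timesteps, 10)
norm = tf.keras.layers.experimental.preprocessing.Normalization(axis=-1)
norm.adapt(train_x)      # like StandardScaler.fit: learns per-feature mean and variance
scaled = norm(train_x)   # like StandardScaler.transform: (x - mean) / sqrt(var)
LayerNormalization, by contrast, re-normalizes each sample across its features at every forward pass, and BatchNormalization uses batch statistics, so neither reproduces StandardScaler's fixed, dataset-wide per-feature scaling.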
This should work. It uses an UpSampling2D layer in a naive example that starts from 5x5 feature maps:
from keras.models import Sequential
from keras.layers import Dense, Reshape, UpSampling2D, Conv2D

# define model
model = Sequential()
# define input shape, output enough activations for 128 5x5 feature maps
model.add(Dense(128 * 5 * 5, input_dim=100))
# reshape vector of activations into 128 feature maps of 5x5
model.add(Reshape((5, 5, 128)))
# double the 128 5x5 feature maps to 128 10x10 feature maps
model.add(UpSampling2D())
# fill in detail in the upsampled feature maps and output a single image
model.add(Conv2D(1, (3,3), padding='same'))
# summarize model
model.summary()
But you can use the Conv2DTranspose layer too, which combines the UpSampling2D and Conv2D layers into one layer.
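For comparison, a sketch of the same front-end with Conv2DTranspose, where a stride of 2 performs the 5x5 -> 10x10 doubling and the convolution in a single step:
from keras.models import Sequential
from keras.layers import Dense, Reshape, Conv2DTranspose

# define model
model = Sequential()
model.add(Dense(128 * 5 * 5, input_dim=100))
model.add(Reshape((5, 5, 128)))
# stride 2 doubles the 128 5x5 feature maps to 10x10 and fills in detail in one layer
model.add(Conv2DTranspose(1, (3,3), strides=(2,2), padding='same'))
model.summary()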
A TimeDistributed layer will help in the case of LSTMs. Refer to the Keras documentation for TimeDistributed.
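For context, a minimal illustration of what that looks like (the shapes here are arbitrary): TimeDistributed applies the same Dense layer to every time step of the LSTM's output sequence.
from keras.models import Sequential
from keras.layers import LSTM, TimeDistributed, Dense

model = Sequential()
model.add(LSTM(64, return_sequences=True, input_shape=(None, 10)))   # one 64-dim vector per time step
model.add(TimeDistributed(Dense(3, activation='softmax')))           # same Dense applied at each step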
I'm trying to classify colored MNIST digits with a basic CNN architecture in Keras. Here is the piece of code that colors each digit of the original dataset purely red, green, or blue.
import numpy as np
import matplotlib.pyplot as plt
from tensorflow import keras

def load_norm_data():
    ## load basic mnist
    (x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
    train_images = np.zeros((*x_train.shape, 3))  # orig shape: (60000, 28, 28) -> rgb shape: (60000, 28, 28, 3)
    for num in range(x_train.shape[0]):
        rgb = np.random.randint(3)
        train_images[num, ..., rgb] = x_train[num] / 255
    return train_images, y_train

if __name__ == '__main__':
    ims, labels = load_norm_data()
    for num in range(10):
        plt.subplot(2, 5, num + 1)
        plt.imshow(ims[num])
        plt.axis('off')
    plt.show()
which gives for the first couple of digits:
Then, I attempt to classify this colored dataset into the same 10 digit classes of MNIST, so the labels aren't changing, and yet the model's accuracy drops from 95% on non-colored MNIST to a wildly variable 30-70% on colored MNIST, depending heavily on weight initialization... Please find below the architecture of said model:
model = keras.Sequential()
model.add(keras.layers.Conv2D(64, kernel_size=(3,3), padding='same'))
model.add(keras.layers.MaxPool2D(pool_size=(2,2)))
model.add(keras.layers.Conv2D(64, kernel_size=(3,3), padding='same'))
model.add(keras.layers.MaxPool2D(pool_size=(2,2), padding='same'))
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(10, activation='relu'))
model.add(keras.layers.Softmax())
input_shape = train_images.shape
model.build(input_shape)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()
model.fit(train_images, train_numbers, batch_size=12, epochs=25)
Initially, I thought that this drop in performance might be linked to data irregularity (e.g. if a lot of 3s in the data ended up being green, the model could learn green = 3). So I checked the data: the counts are good, and the RGB distribution for each class is near 33% per color too. I also checked the misclassified images to see if a certain color or digit was over-represented among them, but that doesn't seem to be the case either. In any case, after reading Keras' documentation, and given that Conv2D takes a 2-dimensional kernel_size whose kernels, as I understand it, span all channels of the input image, the model shouldn't be taking color into account for classification here.
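For illustration, a per-class color balance check could look like this (not the original verification code; it assumes ims and labels from load_norm_data above):
import numpy as np

# which channel is lit for each image (each image has exactly one non-zero channel)
color = ims.sum(axis=(1, 2)).argmax(axis=-1)

# fraction of red/green/blue images within each digit class
for digit in range(10):
    mask = labels == digit
    print(digit, np.bincount(color[mask], minlength=3) / mask.sum())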
Am I missing something here?
Please let me know if you need any further information.
Thank you in advance.
The last part of the model includes a dense -> relu -> softmax. The relu activation should be removed. In addition, you might benefit from adding non-linearities (e.g., relu) in your convolutional blocks. Otherwise, the neural network will end up being a (big) linear function and will not work as well for non-linear data.
model = keras.Sequential()
model.add(keras.layers.Conv2D(64, kernel_size=(3,3), padding='same', activation='relu'))
model.add(keras.layers.MaxPool2D(pool_size=(2,2)))
model.add(keras.layers.Conv2D(64, kernel_size=(3,3), padding='same', activation='relu'))
model.add(keras.layers.MaxPool2D(pool_size=(2,2), padding='same'))
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(10))
model.add(keras.layers.Softmax())
It is interesting that the original model worked well on the MNIST dataset. I cannot say for sure why, but perhaps the MNIST dataset is simple enough that the model was able to cope. Also, the relu -> softmax would clamp negative values to 0, and maybe there were not many negative values.
Okay so here's my CNN (simple example from a tutorial) along with some arithmetic to get the total number of free parameters.
We've got a dataset of 28x28 grayscale images (MNIST).
The first layer is a 2D convolution using 32 3x3 kernels. The dimensionality of the output is 26x26x32 (the kernel stride length is 1 and we have 32 feature maps of 26x26). Running parameter count (ignoring biases): 288.
The second layer is a 2x2 MaxPool with a 2x2 stride. The dimensionality of the output is 13x13x32, but then we flatten, so we get a vector of length 5408. No extra parameters here.
The third layer is Dense: a 5408x100 weight matrix. The dimensionality of the output is 100. Running parameter count: 288 + 540,800 = 541,088.
The fourth layer is also Dense: a 100x10 weight matrix. The dimensionality of the output is 10. Running parameter count: 541,088 + 1,000 = 542,088.
Then we're supposed to do stochastic gradient descent over a roughly 542,000-parameter space!
That feels like a ridiculously big number to me. And this is meant to be the hello world problem of CNNs. Am I missing something fundamental in my understanding of how this is meant to work? Or maybe the number is correct but it's not actually a big deal for a computer to crunch?
In case it helps, here is how the model was built in Keras:
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from keras.optimizers import SGD

def define_model():
    model = Sequential()
    model.add(Conv2D(32, (3,3), activation='relu', kernel_initializer='he_uniform', input_shape=(28,28,1)))
    model.add(MaxPooling2D((2,2)))
    model.add(Flatten())
    model.add(Dense(100, activation='relu', kernel_initializer='he_uniform'))
    model.add(Dense(10, activation='softmax'))
    opt = SGD(lr=0.01, momentum=0.9)
    model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
    return model
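For what it's worth, model.summary() reports a slightly larger total because it also counts the bias terms; a quick check:
# parameter count including bias terms, as reported by model.summary()
conv   = 32 * (3 * 3 * 1) + 32         # 320
dense1 = (13 * 13 * 32) * 100 + 100    # 540,900 (5408 flattened inputs)
dense2 = 100 * 10 + 10                 # 1,010
total  = conv + dense1 + dense2        # 542,230
print(total)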
This is the model I am trying to replicate (more information in linked paper):
In our models, we adopted one dropout layer between LSTM models and the first fully-connected layer and another dropout layer between the first fully-connected layer and the second fully-connected layer. Their masking probabilities are both set to 0.5.
...
For our proposed CBLSTM, one-layer CNN is firstly designed, whose filter number, filter size and pooling size are set to 150, 10 and 5. Therefore, the shape of the raw sensory sequence is changed from 100 x 12 to 19 x 150 after CNN. Then, a two-layer bi-directional LSTM is built on top of the CNN.
Backward and forward LSTMs share the same layer sizes as [150, 200]. Therefore, the output of the LSTM module is the concatenated vector of the representations learned by backward and forward LSTMs, and its dimensionality is 400. Then, before feeding the representation into the linear regression layer, two fully-connected layers with a size of [500, 600] are adopted. The nonlinearity activation functions in our proposed CBLSTM are all set to ReLu.
Source: Zhao, R., Yan, R., Wang, J., & Mao, K. (2017). Learning to monitor machine health with convolutional bi-directional LSTM networks. Sensors, 17(2), 273. link to paper
The input is 630 samples x 100 timesteps x 12 features.
How my model looks at the moment:
model = Sequential()
model.add(Conv1D(filters=150, kernel_size=10, activation='relu', input_shape=(100,12)))
model.add(MaxPooling1D(pool_size=5, strides=None, padding='valid'))
model.add(Bidirectional(LSTM(150, return_sequences=True), merge_mode='concat'))
model.add(Bidirectional(LSTM(200, return_sequences=False), merge_mode='concat'))
model.add(Dropout(0.5))
model.add(Dense(500, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(600, activation='relu'))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='rmsprop', metrics=['mae'])
While the training loss steadily decreases per epoch, the validation loss does not; it diverges pretty quickly. This suggests there is a mistake in my model which I have not yet been able to find. Any ideas as to what is wrong?
Side note: I am using the same data as input as the authors.
I'm creating a model to classify whether the input waveform contains a rising edge of the SDA line of an I2C bus.
Each input has 20,000 data points, and I have 100 training samples.
I initially found an answer regarding the input shape here: Keras 1D CNN: How to specify dimension correctly?
However, I'm getting an error in the activation function:
ValueError: Error when checking target: expected activation_1 to have 3 dimensions, but got array with shape (100, 1)
My model is:
from keras.models import Sequential
from keras.layers import Conv1D, BatchNormalization, MaxPooling1D, Dense, Activation
from keras.optimizers import Adam

model = Sequential()
model.add(Conv1D(filters=n_filter,
                 kernel_size=input_filter_length,
                 strides=1,
                 activation='relu',
                 input_shape=(20000,1)))
model.add(BatchNormalization())
model.add(MaxPooling1D(pool_size=4, strides=None))
model.add(Dense(1))
model.add(Activation("sigmoid"))
adam = Adam(lr=learning_rate)
model.compile(optimizer= adam, loss='binary_crossentropy', metrics=['accuracy'])
model.fit(train_data, train_label,
nb_epoch=10,
batch_size=batch_size, shuffle=True)
score = np.asarray(model.evaluate(test_new_data, test_label, batch_size=batch_size))*100.0
I can't determine the problem here, or why the activation layer expects a 3D tensor.
The problem lies in the fact that, starting from Keras 2.0, a Dense layer applied to a sequence is applied to each time step, so given a sequence it will produce a sequence. Your Dense layer is therefore producing a sequence of 1-element vectors, and this causes your problem (as your target is not a sequence).
There are several ways to reduce a sequence to a vector and then apply a Dense layer to it:
GlobalPooling:
You may use GlobalPooling layers like GlobalAveragePooling1D or GlobalMaxPooling1D, e.g.:
model.add(Conv1D(filters=n_filter,
                 kernel_size=input_filter_length,
                 strides=1,
                 activation='relu',
                 input_shape=(20000,1)))
model.add(BatchNormalization())
model.add(GlobalMaxPooling1D())
model.add(Dense(1))
model.add(Activation("sigmoid"))
Flattening:
You could collapse the whole sequence to a single vector using a Flatten layer:
model.add(Conv1D(filters=n_filter,
                 kernel_size=input_filter_length,
                 strides=1,
                 activation='relu',
                 input_shape=(20000,1)))
model.add(BatchNormalization())
model.add(MaxPooling1D(pool_size=4, strides=None))
model.add(Flatten())
model.add(Dense(1))
model.add(Activation("sigmoid"))
RNN Postprocessing:
You could also add a recurrent layer on top of your sequence and make it return only the last output:
model.add(Conv1D(filters=n_filter,
                 kernel_size=input_filter_length,
                 strides=1,
                 activation='relu',
                 input_shape=(20000,1)))
model.add(BatchNormalization())
model.add(MaxPooling1D(pool_size=4, strides=None))
model.add(SimpleRNN(10, return_sequences=False))
model.add(Dense(1))
model.add(Activation("sigmoid"))
Conv1D produces an output with 3 dimensions (and it will stay that way up to the Dense layer).
Conv output: (BatchSize, Length, Filters)
For the Dense layer to output only one result, you need to add a Flatten() or Reshape((shape)) layer, to make the output (BatchSize, Length) only.
If you call model.summary(), you will see exactly what shape each layer is outputting. You have to adjust the output to be exactly the same shape as the array you pass as the correct results. The None that appears in those shapes is the batch size and may be ignored.
About your model: I think you need more convolution layers, reducing the number of filters gradually, because condensing so much data in a single Dense layer does not usually bring good results.
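For illustration, such a stack might look like the sketch below (the filter counts and kernel sizes are assumptions, not values taken from the question):
from keras.models import Sequential
from keras.layers import Conv1D, MaxPooling1D, GlobalMaxPooling1D, Dense

model = Sequential()
# progressively shorter sequences and fewer filters before the final Dense
model.add(Conv1D(64, 11, activation='relu', input_shape=(20000, 1)))
model.add(MaxPooling1D(4))
model.add(Conv1D(32, 7, activation='relu'))
model.add(MaxPooling1D(4))
model.add(Conv1D(16, 5, activation='relu'))
model.add(GlobalMaxPooling1D())            # collapse the remaining sequence to a vector
model.add(Dense(1, activation='sigmoid'))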
About dimensions: keras layers tutorial and samples
Let's say that my training data is a 3D numpy array with dimensions (4155, 5, 150). This data consists of 4155 training samples, each one a 5x150 matrix, and my labels are a vector of length 4155. I then want to feed it to the following architecture:
model = Sequential()
model.add(Convolution1D(input_dim=4,
                        input_length=1000,
                        nb_filter=320,
                        filter_length=26,
                        border_mode="valid",
                        activation="relu",
                        subsample_length=1))
model.add(MaxPooling1D(pool_length=13, stride=13))
model.add(Dropout(0.2))
model.add(brnn)  # brnn: the bidirectional LSTM layer, defined elsewhere
model.add(Dropout(0.5))
model.add(Flatten())
model.add(Dense(input_dim=75*640, output_dim=925))
model.add(Activation('relu'))
model.add(Dense(input_dim=925, output_dim=919))
model.add(Activation('sigmoid'))
The problem is that I don't know how to change the dimensionality of my input array so that it fits this model. The layer parameters specified above are just an example. I simply want to use a convolutional layer, followed by a bidirectional LSTM, and finally two fully connected layers. Does anyone have an idea?
Thanks in advance!
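For what it's worth, a minimal sketch of that Conv1D -> bidirectional LSTM -> two Dense stack in the Keras 2 API, assuming inputs shaped (5, 150) and one label per sample (all layer sizes here are illustrative):
from keras.models import Sequential
from keras.layers import Conv1D, MaxPooling1D, Dropout, Bidirectional, LSTM, Dense

model = Sequential()
# Conv1D over the 5 time steps, 150 features per step
model.add(Conv1D(filters=64, kernel_size=3, padding='same', activation='relu',
                 input_shape=(5, 150)))
model.add(MaxPooling1D(pool_size=2))
model.add(Dropout(0.2))
# bidirectional LSTM; return_sequences=False yields one vector per sample
model.add(Bidirectional(LSTM(32, return_sequences=False)))
model.add(Dropout(0.5))
# two fully connected layers
model.add(Dense(64, activation='relu'))
model.add(Dense(1, activation='sigmoid'))   # adjust units/activation to the actual labels
model.compile(optimizer='adam', loss='binary_crossentropy')

# the (4155, 5, 150) training array can then be passed directly:
# model.fit(x_train, y_train, epochs=10, batch_size=32)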