NLP Keras - Dimension of Embedding and Global Average Pooling Layers - python

I am trying to trace the calculations in the TensorFlow NLP model below.
embedding_dim = 16
model = Sequential([
    vectorize_layer,
    Embedding(vocab_size, embedding_dim, name="embedding"),
    GlobalAveragePooling1D(),
    Dense(16, activation='relu'),
    Dense(1)
])
I am a little confused about the dimensions of the embedding and global average pooling layers.
For example, take the sample sentence ['I', 'like', 'Chinese', 'food']. The embedding layer will expand this vector of 4 tokens into a 4x16 matrix, right? Then GlobalAveragePooling1D will take the average of each feature across the tokens and return a vector of length 16, where each value represents an average feature value of my sentence. Am I correct?
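To sanity-check these shapes, here is a minimal sketch (assuming a made-up vocab_size of 100 and a single already-vectorized sentence of 4 token ids; the vectorize_layer is left out for simplicity):

import numpy as np
from tensorflow.keras.layers import Embedding, GlobalAveragePooling1D

vocab_size = 100       # hypothetical vocabulary size, just for illustration
embedding_dim = 16

# one sentence of 4 token ids, shape (1, 4)
tokens = np.array([[3, 17, 42, 8]])

embedded = Embedding(vocab_size, embedding_dim)(tokens)
print(embedded.shape)   # (1, 4, 16): one 16-dimensional vector per token

pooled = GlobalAveragePooling1D()(embedded)
print(pooled.shape)     # (1, 16): the 4 token vectors averaged feature-wise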

Related

Autoencoder using MLP for anomaly detection in multivariate timeseries

I am developing an autoencoder using an MLP to detect anomalies in a multivariate time series. To simplify the problem, I started using only one series variable.
Univariate case
The way I'm applying it is to break the time series into pieces, and present those pieces to the network. For example, my series consists of 1000 points, which I break into 50 subseries of length 20. Each of these subseries becomes an example for learning the network.
What should the DAE input_shape be? I saw that there is a difference between shape=(20,) and shape=(20,1). Below is the code of the DAE I have been working on. Also, what should the last layer of the DAE look like? When I use an output layer with only 1 neuron, the model works correctly; why?
model = keras.Sequential([
    ### ENCODING ###
    layers.Input(shape=(df_train.shape[1], df_train.shape[2])),
    # or ?
    # layers.Input(shape=(df_train.shape[1],)),
    layers.Dense(16, activation='sigmoid'),
    layers.Dropout(rate=0.1),
    layers.Dense(8, activation='sigmoid'),
    ### LATENT SPACE ###
    layers.Dense(4, activation='sigmoid'),
    ### DECODING ###
    layers.Dense(8, activation='sigmoid'),
    layers.Dropout(rate=0.1),
    layers.Dense(16, activation='sigmoid'),
    layers.Dense(1, activation='sigmoid')
])
Multivariate case
Considering the multivariate case, in which I have 16 time series: how would the input shape and output layer look?
Dense layers, the building block of an MLP, only take a single dimension, so you must flatten your 2D input into 1D. The shape of the flattened vector will be (width*height,).
The alternative is to use a convolutional or recurrent autoencoder (LSTM/GRU). With a convolutional autoencoder, most of the layers will be either Conv2D or Conv1D, and you would then use a single Dense layer as the compressive bottleneck. Convolutional layers take inputs of shape (width, height, channels), where channels can be 1 if there is no third dimension.
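For the multivariate case, a minimal sketch of the flattened-input MLP version (assuming windows of 20 timesteps over 16 series, i.e. input vectors of length 20*16 = 320; the layer sizes are illustrative, not taken from the question):

from tensorflow import keras
from tensorflow.keras import layers

window, n_series = 20, 16            # assumed window length and number of series
flat_dim = window * n_series         # 320 values after flattening each window

model = keras.Sequential([
    layers.Input(shape=(flat_dim,)),              # flattened (width*height,) input
    layers.Dense(64, activation='sigmoid'),
    layers.Dense(16, activation='sigmoid'),       # latent space
    layers.Dense(64, activation='sigmoid'),
    layers.Dense(flat_dim, activation='sigmoid')  # reconstruct the full window
])
model.compile(optimizer='adam', loss='mse')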

Feature Normalization/StandardScaler in Keras

I am working with a Sequential Keras model and I am trying to figure out the best method for feature scaling.
model = Sequential()
model.add(Masking(mask_value=-50, input_shape=(None,10)))
model.add(LayerNormalization(axis=-1))
model.add(LSTM(100, input_shape=(None,10)))
model.add(Dense(100, activation='relu'))
model.add(Dense(3, activation='softmax'))
print(model.summary())
In line 3, I have a LayerNormalization layer which, according to the documentation, normalizes using the mean and standard deviation. However, I have also come across batch normalization and tf.keras.layers.experimental.preprocessing.Normalization. My question is: is this method similar to scikit-learn's StandardScaler(), or is there another method I could use to scale features within the model?
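For StandardScaler-like behaviour inside the model, one option is the Normalization preprocessing layer, which learns a per-feature mean and variance from the training data via adapt(). A minimal sketch under assumed data shapes (depending on your TensorFlow version the layer may live under tf.keras.layers.experimental.preprocessing.Normalization; the Masking layer is omitted for brevity):

import numpy as np
import tensorflow as tf

# toy training data with an assumed shape of (samples, timesteps, features)
X_train = np.random.randn(100, 20, 10).astype('float32')

# learns per-feature mean and variance, analogous to fitting a StandardScaler
norm = tf.keras.layers.Normalization(axis=-1)
norm.adapt(X_train)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(None, 10)),
    norm,                                   # applies (x - mean) / std per feature
    tf.keras.layers.LSTM(100),
    tf.keras.layers.Dense(3, activation='softmax')
])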
This should work. It uses an UpSampling2D layer in a naive example built around 5x5 feature maps:
# define model
model = Sequential()
# define input shape, output enough activations for 128 feature maps of 5x5
model.add(Dense(128 * 5 * 5, input_dim=100))
# reshape vector of activations into 128 feature maps with 5x5
model.add(Reshape((5, 5, 128)))
# double the spatial size from 128 5x5 to 128 10x10 feature maps
model.add(UpSampling2D())
# fill in detail in the upsampled feature maps and output a single image
model.add(Conv2D(1, (3,3), padding='same'))
# summarize model
model.summary()
But you can use the Conv2DTranspose layer too, which combines the UpSampling2D and Conv2D layers into one layer.
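For comparison, a sketch of the same idea with Conv2DTranspose (strides of 2 double the spatial size, so the single layer plays the role of the UpSampling2D + Conv2D pair):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Reshape, Conv2DTranspose

model = Sequential()
model.add(Dense(128 * 5 * 5, input_dim=100))
model.add(Reshape((5, 5, 128)))
# upsamples 5x5 -> 10x10 and learns the filling-in at the same time
model.add(Conv2DTranspose(1, (3, 3), strides=(2, 2), padding='same'))
model.summary()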
In the case of LSTMs, a TimeDistributed layer will help.

Replicated Convolutional Bi-Directional LSTM implementation in Keras diverging

This is the model I am trying to replicate (more information in linked paper):
In our models, we adopted one dropout layer between LSTM models and the first fully-connected layer and another dropout layer between the first fully-connected layer and the second fully-connected layer. Their masking probabilities are both set to 0.5.
...
For our proposed CBLSTM, one-layer CNN is firstly designed, whose filter number, filter size and pooling size are set to 150, 10 and 5. Therefore, the shape of the raw sensory sequence is changed from 100 x 12 to 19 x 150 after CNN. Then, a two-layer bi-directional LSTM is built on top of the CNN.
Backward and forward LSTMs share the same layer sizes as [150, 200]. Therefore, the output of the LSTM module is the concatenated vector of the representations learned by backward and forward LSTMs, and its dimensionality is 400. Then, before feeding the representation into the linear regression layer, two fully-connected layers with a size of [500, 600] are adopted. The nonlinearity activation functions in our proposed CBLSTM are all set to ReLu.
Source: Zhao, R., Yan, R., Wang, J., & Mao, K. (2017). Learning to monitor machine health with convolutional bi-directional LSTM networks. Sensors, 17(2), 273. link to paper
The input is 630 samples x 100 timesteps x 12 features.
How my model looks at the moment:
model = Sequential()
model.add(Conv1D(filters=150, kernel_size=10, activation='relu', input_shape=(100,12)))
model.add(MaxPooling1D(pool_size=5, strides=None, padding='valid'))
model.add(Bidirectional(LSTM(150, return_sequences=True), merge_mode='concat'))
model.add(Bidirectional(LSTM(200, return_sequences=False), merge_mode='concat'))
model.add(Dropout(0.5))
model.add(Dense(500, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(600, activation='relu'))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='rmsprop', metrics=['mae'])
While the training loss steadily decreases per epoch, the validation loss does not; it diverges pretty quickly. This indicates that there is a mistake in my model which I have not yet been able to find. Any ideas as to what is wrong?
Side note: I am using the same input data as the authors.

Does this Keras Conv1D model correctly represent the intended architecture?

I'm new to Keras and am trying to use a 1D convolutional neural network (CNN) for multi-class classification. I've created a simple model and want to check that it correctly represents my desired architecture.
My input data is a numpy array of shape (number_of_samples, number_of_features), where number_of_samples = 3541 and number_of_features = 144. There are 277 classes and I've used one-hot encoding to represent the targets as an array of shape (number_of_samples, number_of_classes). My desired architecture is shown in the picture below:
The code for my model (which I've run without any issues) is as follows:
# Variables:
############
num_features = 144
num_classes = 277
units = num_classes
input_dim = 1
num_filters = 1
kernel_size = 3
# Reshape training data and labels:
###################################
# initial training_data has shape (3541, 144)
training_data_reshaped = np.atleast_3d(training_data)  # now has shape (3541, 144, 1)
# initial labels vector has shape (3541, 1)
new_labels_binary = to_categorical(labels) # One-hot encoding of class labels
# Build, compile and fit model:
###############################
model = Sequential()
# A 1D convolutional layer which applies 1 output filter with a window size (length) of 3 and
# a (default) stride length of 1
model.add(Conv1D(filters=num_filters,
                 kernel_size=kernel_size,
                 activation='relu',
                 input_shape=(num_features, input_dim)))
model.add(Flatten())
# Output layer
model.add(Dense(units=units))
sgd = optimizers.SGD()
model.compile(optimizer=sgd,
              loss='categorical_crossentropy')
model.fit(x=training_data_reshaped,
          y=new_labels_binary,
          batch_size=batch_size)
print(model.summary())
Does my code correctly represent my desired architecture? In particular:
My aim is that each of the 142 neurons in the output of the convolutional layer is connected to each of the 277 neurons in the model output layer, and that, on input sample x, the vector output by the output layer is compared to row x of new_labels_binary. From what I understand of the Keras docs, this model should do just that, but I'm checking because I'm new to this and the docs were sometimes ambiguous!
I don't mean this to be vague: is there anything in my model which is not (quite) correct given my desired architecture? I just want to make sure I'm not missing anything!
Thanks in advance.
The structure looks fine to me, but if you want to solve a multi-class classification task the output layer should normally have a softmax activation.
model.add(Dense(units=units, activation='softmax'))
If you don't specify the activation for a Dense layer, a linear activation is applied.

Keras word embedding in four gram model

I am following the Coursera neural network class and I am trying to pass the assignments using Python + Keras instead of Octave.
I want to predict the fourth word given the previous three. My input documents contain 250 unique words in total.
The model should have an embedding layer that maps each word to a 50-dimensional vector space, a hidden layer of 200 neurons with a sigmoid activation function, and an output layer of 250 units that scores, through a softmax activation, the probability of the fourth word being each word in my vocabulary.
I am having troubles with dimensions. Here is my code:
from keras.models import Sequential
from keras.layers import Dense, Activation, Embedding

model = Sequential([
    Embedding(250, 50),
    Dense(200, activation='sigmoid'),
    Dense(250, activation='softmax')
])
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
Yet I never get to compile the model since I am encountering the following error:
Exception: Input 0 is incompatible with layer dense_1: expected ndim=2, found ndim=3
Any hint will be much appreciated. Thanks in advance
From https://blog.keras.io/using-pre-trained-word-embeddings-in-a-keras-model.html
"All that the Embedding layer does is to map the integer inputs to the vectors found at the corresponding index in the embedding matrix, i.e. the sequence [1, 2] would be converted to [embeddings[1], embeddings[2]]. This means that the output of the Embedding layer will be a 3D tensor of shape (samples, sequence_length, embedding_dim)."
Your Embedding layer outputs a 3D tensor, while the Dense layer expects a 2D input.
You can follow the linked tutorial; with a few modifications it will fit your problem.
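For instance, here is a minimal sketch of one way to adapt it (this is an assumption about the fix, not code from the tutorial, and it assumes a Keras version where Embedding still accepts input_length): fix the sequence length of the Embedding layer at 3 and flatten its 3D output before the Dense layers, so the three 50-dimensional word vectors are concatenated into a single 150-dimensional input for the hidden layer.

from keras.models import Sequential
from keras.layers import Dense, Embedding, Flatten

vocab_size = 250
model = Sequential([
    Embedding(vocab_size, 50, input_length=3),  # output shape (batch, 3, 50)
    Flatten(),                                  # (batch, 150): 2D, as Dense expects
    Dense(200, activation='sigmoid'),
    Dense(vocab_size, activation='softmax')     # probability of the fourth word
])
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])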
