Keras word embedding in a four-gram model - Python

I am following the Coursera neural network class and am trying to complete the assignments using Python + Keras instead of Octave.
I want to predict the fourth word given the previous three. My input documents contain 250 unique words in total.
The model should have an embedding layer that maps each word to a 50-dimensional vector space, a hidden layer of 200 neurons with a sigmoid activation function, and an output layer of 250 units that scores, through a softmax activation, the probability of the fourth word being each word in my vocabulary.
I am having troubles with dimensions. Here is my code:
from keras.models import Sequential
from keras.layers import Dense, Activation, Embedding
model = Sequential([
    Embedding(250, 50),
    Dense(200, activation='sigmoid'),
    Dense(250, activation='softmax')
])
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
Yet I never get to compile the model, since I encounter the following error:
Exception: Input 0 is incompatible with layer dense_1: expected ndim=2, found ndim=3
Any hint will be much appreciated. Thanks in advance.

From https://blog.keras.io/using-pre-trained-word-embeddings-in-a-keras-model.html
"All that the Embedding layer does is to map the integer inputs to the vectors found at the corresponding index in the embedding matrix, i.e. the sequence [1, 2] would be converted to [embeddings[1], embeddings[2]]. This means that the output of the Embedding layer will be a 3D tensor of shape (samples, sequence_length, embedding_dim)."
Your Embedding layer outputs a 3D tensor, while the Dense layer expects 2D input.
You can follow the linked tutorial; with a few modifications it will fit your problem.
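As a minimal sketch of one such modification (assuming the input is three word indices per example; the names here are illustrative), you can fix the sequence length on the Embedding layer and flatten its output so the Dense layers receive 2D input:
from keras.models import Sequential
from keras.layers import Dense, Embedding, Flatten

model = Sequential([
    Embedding(250, 50, input_length=3),   # three context words -> output shape (3, 50)
    Flatten(),                            # (3, 50) -> (150,), so Dense sees a 2D batch
    Dense(200, activation='sigmoid'),
    Dense(250, activation='softmax')
])
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
# x_train would be an integer array of shape (n_samples, 3),
# y_train a one-hot array of shape (n_samples, 250).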

Related

Autoencoder using MLP for anomaly detection in multivariate timeseries

I am developing an autoencoder using an MLP to detect anomalies in a multivariate time series. To simplify the problem, I started with only one variable of the series.
Univariate case
The way I'm applying it is to break the time series into pieces and present those pieces to the network. For example, my series consists of 1000 points, which I break into 50 subseries of length 20. Each of these subseries becomes a training example for the network.
What should the DAE input_shape be? I saw that there is a difference between shape=(20,) and shape=(20,1). I leave below the code of the DAE that I have been working on. Also, what should the format of the last layer of the DAE be? When I use an output layer with only 1 neuron, the model works correctly. Why?
model = keras.Sequential([
    ### ENCODING ###
    layers.Input(shape=(df_train.shape[1], df_train.shape[2])),
    # or?
    #layers.Input(shape=(df_train.shape[1],)),
    layers.Dense(16, activation='sigmoid'),
    layers.Dropout(rate=0.1),
    layers.Dense(8, activation='sigmoid'),
    ### LATENT SPACE ###
    layers.Dense(4, activation='sigmoid'),
    ### DECODING ###
    layers.Dense(8, activation='sigmoid'),
    layers.Dropout(rate=0.1),
    layers.Dense(16, activation='sigmoid'),
    layers.Dense(1, activation='sigmoid')
])
Multivariate case
Considering the multivariate case, in which I have 16 time series: how would the input shape and output layer look?
Dense layers, the building blocks of an MLP, only take a single dimension, so you must flatten your 2D input into 1D. The shape of the vector will be (width*height,).
The alternative is to use a convolutional or recurrent autoencoder (LSTM/GRU). With a convolutional autoencoder, most of the layers will be either Conv2D or Conv1D, and you would use a single Dense layer as the compressive bottleneck. Convolutional layers take inputs of shape (width, height, channels), where channels can be 1 if there is no third dimension.
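As a rough sketch of the flattening approach for the multivariate case (assuming windows of 20 timesteps over 16 series, so 320 values per example; the layer sizes here are only illustrative):
from tensorflow import keras
from tensorflow.keras import layers

window, n_series = 20, 16        # assumed window length and number of series
flat_dim = window * n_series     # 320 values per example

model = keras.Sequential([
    layers.Input(shape=(flat_dim,)),              # each example is one flattened window
    layers.Dense(64, activation='sigmoid'),
    layers.Dense(16, activation='sigmoid'),       # latent space
    layers.Dense(64, activation='sigmoid'),
    layers.Dense(flat_dim, activation='sigmoid')  # reconstructs the whole window (assumes data scaled to [0, 1])
])
model.compile(optimizer='adam', loss='mse')

# Windows of shape (n_examples, 20, 16) are flattened before training:
# x = windows.reshape(len(windows), flat_dim)
# model.fit(x, x, epochs=..., batch_size=...)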

Understanding of Basic Neural Network Structure

Let's say I want to code this basic neural network structure in Keras, with 10 units in the input layer and 3 units in the output layer.
Now if I am using Keras and give an input_shape of more than 10, how will it adjust to it?
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
model = Sequential()
model.add(Dense(10, activation = 'relu', input_shape = (64,)))
model.add(Dense(3, activation = 'sigmoid'))
model.summary()
You see, here input_shape has size 64, but how will that fit a model whose first layer has 10 units? From what I have learned, the size of the input shape/vector should be equal to the number of units in the input layer.
Or am I not implementing this neural network correctly?
That would not be a problem. A weight matrix of shape (64, 10) (Keras stores kernels as (input_dim, units)) connects the 64 inputs to the first Dense layer: your input has shape 64, the first hidden layer has 10 units, and the output layer produces 3 values. Seems fine to me.
But your input layer itself has size 64, so what you are getting is a 3-layer network with a hidden layer of 10 units.
If the shape of your input vector is 64, then you really need to have an input layer of size 64. The input layer of a neural network doesn't perform any computations; it just passes the inputs forward to the first hidden layer. That layer, on the other hand, performs the computations for all the neurons it contains (a linear combination of the input vector and the weights, which is then passed to the activation function, ReLU in your case).
In your code, you are building a neural net with 64 input neurons (which again don't perform any computations), 10 neurons in the first (and only) hidden layer and 3 neurons in the output layer.
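A quick way to see this in practice (a small check, not from the original post) is to inspect the weight shapes Keras actually creates:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
model.add(Dense(10, activation='relu', input_shape=(64,)))
model.add(Dense(3, activation='sigmoid'))

# First Dense layer: kernel of shape (input_dim, units) = (64, 10), bias of shape (10,)
print(model.layers[0].get_weights()[0].shape)  # (64, 10)
# Output layer: kernel of shape (10, 3), bias of shape (3,)
print(model.layers[1].get_weights()[0].shape)  # (10, 3)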

How to get only last output of sequence model in Keras?

I trained a many-to-many sequence model in Keras with return_sequences=True and a TimeDistributed wrapper on the last Dense layer:
model = Sequential()
model.add(Embedding(input_dim=vocab_size, output_dim=50))
model.add(LSTM(100, return_sequences=True))
model.add(TimeDistributed(Dense(vocab_size, activation='softmax')))
# train...
model.save_weights("weights.h5")
So during training the loss is calculated over all hidden states (at every timestep). But for inference I only need the output at the last timestep. So I load the weights into a many-to-one sequence model for inference, without the TimeDistributed wrapper, and set return_sequences=False to get only the last output of the LSTM layer:
inference_model = Sequential()
inference_model.add(Embedding(input_dim=vocab_size, output_dim=50))
inference_model.add(LSTM(100, return_sequences=False))
inference_model.add(Dense(vocab_size, activation='softmax'))
inference_model.load_weights("weights.h5")
When I test my inference model on a sequence of length 20, I expect to get a prediction of shape (vocab_size), but inference_model.predict(...) still returns predictions for every timestep: a tensor of shape (20, vocab_size).
If, for whatever reason, you need only the last timestep during inference, you can build a new model which applies the trained model to the input and returns the last timestep as its output, using a Lambda layer:
from keras.models import Model
from keras.layers import Input, Lambda
inp = Input(shape=put_the_input_shape_here)
x = model(inp) # apply trained model on the input
out = Lambda(lambda x: x[:,-1])(x)
inference_model = Model(inp, out)
Side note: as already stated in this answer, TimeDistributed(Dense(...)) and Dense(...) are equivalent, since a Dense layer is applied to the last dimension of its input tensor. That's why you get the same output shape.
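To use it, something like the following sketch applies (assuming integer-encoded sequences of length 20 and the vocab_size from the question):
import numpy as np

seq = np.random.randint(0, vocab_size, size=(1, 20))  # one integer-encoded input sequence
probs = inference_model.predict(seq)
print(probs.shape)  # (1, vocab_size): a single softmax distribution for the last timestep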

How to use Conv1D and Bidirectional LSTM in keras to do multiclass classification of each timestep?

I am trying to use a Conv1D and Bidirectional LSTM in keras (much like in this question) for signal processing, but doing a multiclass classification of each time step.
The problem is that even though the shapes used by Conv1D and LSTM are somewhat equivalent:
Conv1D: (batch, length, channels)
LSTM: (batch, timeSteps, features)
The output length of the Conv1D is (length - (kernel_size - 1)) / strides, and therefore no longer matches the LSTM's expected length, even without using MaxPooling1D and Dropout.
To be more specific, my training set X has n samples of 1000 time steps and one channel (n_samples, 1000, 1), and I used LabelEncoder and OneHotEncoder so y has n samples, 1000 time steps and 5 one-hot encoded classes (n_samples, 1000, 5).
Since one class is much more prevalent than the others (it is actually the absence of signal), I am using loss='sparse_categorical_crossentropy', sample_weight_mode="temporal" and sample_weight to give a higher weight to time steps containing meaningful classes.
model = Sequential()
model.add(Conv1D(128, 3, strides=1, input_shape = (1000, 1), activation = 'relu'))
model.add(Bidirectional(LSTM(128, return_sequences=True)))
model.add(TimeDistributed(Dense(5, activation='softmax')))
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['categorical_accuracy'], sample_weight_mode="temporal")
print(model.summary())
Model
When I try to fit the model I get this error message:
Error when checking target: expected time_distributed_1 to have shape
(None, 998, 1) but got array with shape (100, 1000, 5).
Is there a way to make such a neural network configuration work?
Your convolution is cutting off the ends of the sequence. Use padding='same' in the convolutional layers.
The error message, though, does not seem to match your model: your model clearly has 5 output features (because of Dense(5)), but the message says it expects 1. This is probably happening because of the "sparse" crossentropy. Given the format of your data, you should probably use "categorical_crossentropy".
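Putting both suggestions together, a sketch of the adjusted model (keeping the layer sizes from the question) might look like this:
from keras.models import Sequential
from keras.layers import Conv1D, Bidirectional, LSTM, TimeDistributed, Dense

model = Sequential()
# padding='same' keeps the output length at 1000, matching the (n_samples, 1000, 5) targets
model.add(Conv1D(128, 3, strides=1, padding='same', activation='relu', input_shape=(1000, 1)))
model.add(Bidirectional(LSTM(128, return_sequences=True)))
model.add(TimeDistributed(Dense(5, activation='softmax')))
# categorical_crossentropy matches the one-hot targets described above
model.compile(loss='categorical_crossentropy', optimizer='adam',
              metrics=['categorical_accuracy'], sample_weight_mode='temporal')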

Keras confusion about number of layers

I'm a bit confused about the number of layers that are used in Keras models. The documentation is rather opaque on the matter.
According to Jason Brownlee, the first layer technically consists of two layers: the input layer, specified by input_dim, and a hidden layer. See the first questions on his blog.
In all of the Keras documentation the first layer is generally specified as
model.add(Dense(number_of_neurons, input_dim=number_of_cols_in_input, activation=some_activation_function))
The most basic model we could make would therefore be:
model = Sequential()
model.add(Dense(1, input_dim = 100, activation = None))
Does this model consist of a single layer, where the 100-dimensional input is passed through a single neuron, or does it consist of two layers: first a 100-dimensional input layer and second a 1-dimensional hidden layer?
Further, if I were to specify a model like this, how many layers does it have?
model = Sequential()
model.add(Dense(32, input_dim = 100, activation = 'sigmoid'))
model.add(Dense(1))
Is this a model with 1 input layer, 1 hidden layer, and 1 output layer or is this a model with 1 input layer and 1 output layer?
Your first one consists of a 100-neuron input layer connected to a single output neuron.
Your second one consists of a 100-neuron input layer, one hidden layer of 32 neurons, and one output layer with a single neuron.
You have to think of your first layer as your input layer (with the same number of neurons as the input dimension, so 100 for you) connected to another layer with as many neurons as you specify (1 in your first case, 32 in the second one).
In Keras, a useful command is
model.summary()
For your first question, the model is:
1 input layer and 1 output layer.
For the second question:
1 input layer
1 hidden layer
1 activation layer (the sigmoid one)
1 output layer
For the input layer, this is abstracted by Keras with the input_dim or input_shape argument, but you can import this layer explicitly:
from keras.layers import Input
The same goes for the activation layer:
from keras.layers import Activation
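To make this concrete, here is a sketch of your second model with those layers written out explicitly (using the tf.keras API, as in the example below):
import tensorflow as tf

model = tf.keras.models.Sequential()
model.add(tf.keras.Input(shape=(100,)))            # the input layer, made explicit
model.add(tf.keras.layers.Dense(32))               # hidden layer (the linear part)
model.add(tf.keras.layers.Activation('sigmoid'))   # its activation as a separate layer
model.add(tf.keras.layers.Dense(1))                # output layer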
# Create a `Sequential` model and add a Dense layer as the first layer.
model = tf.keras.models.Sequential()
model.add(tf.keras.Input(shape=(16,)))
model.add(tf.keras.layers.Dense(32, activation='relu'))
# Now the model will take as input arrays of shape (None, 16)
# and output arrays of shape (None, 32).
# Note that after the first layer, you don't need to specify
# the size of the input anymore:
model.add(tf.keras.layers.Dense(32))
model.output_shape
(None, 32)
model.layers
[<keras.layers.core.dense.Dense at 0x7f494062e950>,
<keras.layers.core.dense.Dense at 0x7f4944048d90>]
model.summary()
Its output may help you understand the layer structure more clearly.
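For this model, the summary output should look roughly like the following (layer names and exact formatting vary between versions):
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 dense (Dense)               (None, 32)                544
 dense_1 (Dense)             (None, 32)                1056
=================================================================
Total params: 1,600
Trainable params: 1,600
Non-trainable params: 0
_________________________________________________________________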
