Normally there is no need for a neural network to output a one-hot vector. However, I am trying to train a GAN, so the output of one network needs to match the input of the other. Currently the last layer in my generator is a dense softmax, so I get a probability distribution over the outputs, but I need to convert that vector to a one-hot vector so that it matches the input the discriminator expects. There doesn't seem to be a built-in layer for this in Keras. I'm trying to write a lambda expression, but can't get it to work.
Here is the code right now:
s1 = Input(shape=(self.sentence_length,))
embed = Embedding(output_dim=self.embedding_vector_length,
                  input_dim=self.vocabulary_size,
                  input_length=self.sentence_length)(s1)
x = concatenate([embed,embed],axis=1)
x = LSTM(self.latent_dimension,return_sequences=True)(x)
x = LSTM(self.embedding_vector_length,return_sequences=True)(x)
x = Lambda(lambda s: s[:,15:,:])(x)
x = Dense(self.vocabulary_size,activation='softmax')(x)
# x = Lambda(???)
model = Model(s1,x)
model.summary()
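One way the missing Lambda could be filled in, offered as a rough sketch and not a definitive answer: take the argmax of the softmax output and one-hot encode it. Because argmax has no useful gradient, the sketch also uses a straight-through trick (the forward pass emits the one-hot vector, the backward pass uses the softmax gradient). That trick is swapped in here and is not part of the code above. The helper name to_one_hot and the toy sizes are hypothetical.

```python
import tensorflow as tf

def to_one_hot(probs):
    """Hypothetical helper: one-hot of the argmax along the last axis.

    The forward pass returns the hard one-hot vector (up to float rounding);
    the backward pass sees the gradient of `probs` (straight-through).
    """
    depth = probs.shape[-1]  # assumes the vocabulary size is statically known
    hard = tf.one_hot(tf.argmax(probs, axis=-1), depth=depth, dtype=probs.dtype)
    return probs + tf.stop_gradient(hard - probs)

# Toy check: a batch of one sentence with two tokens and a vocabulary of 3.
probs = tf.constant([[[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]]])
one_hot = to_one_hot(probs)
```

In the model above this would presumably be applied as x = Lambda(to_one_hot)(x) right after the softmax Dense layer.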
I have developed a simple feed-forward neural network with PyTorch.
The network uses pre-trained GloVe embeddings in a frozen nn.Embedding layer.
Next, the embedding output is split into three tensors, each a different transformation of the original embeddings. These three tensors then feed three nn.Linear layers. Finally, I have a single output layer for a binary classification target.
The shape of the embedding tensor is [64,150,50]
-> 64: sentences in the batch,
-> 150: words per sentence,
-> 50: vector-size of a single word (pre-trained GloVe vector)
So after the transformation, the embedding output is split into three tensors of shape [64,50], each obtained by reducing over the 150 words per sentence with torch.mean(), torch.max() or torch.min().
My questions are:
1. How can I feed the output layer from three different nn.Linear layers to predict a single target value [0,1]?
2. Is this efficient, and does it help the predictive power of the model? Or is the average of the embeddings sufficient, with no improvement to be expected from the other two?
The forward() method of my PyTorch model is:
def forward(self, text):
    embedded = self.embedding(text)
    if self.use_pretrained_embeddings:
        embedded_average = torch.mean(embedded, dim=1)
        embedded_max = torch.max(embedded, dim=1)[0]
        embedded_min = torch.min(embedded, dim=1)[0]
    else:
        embedded = self.flatten_layer(embedded)
    input_layer = self.input_layer(embedded_average) #each Linear layer has the same value of hidden unit
    input_layer = self.activation(input_layer)
    input_layer_max = self.input_layer(embedded_max)
    input_layer_max = self.activation(input_layer_max)
    input_layer_min = self.input_layer(embedded_min)
    input_layer_min = self.activation(input_layer_min)
    #What should I do here? to exploit the weights of the 3 hidden layers
    output_layer = self.output_layer(input_layer)
    output_layer = self.activation_output(output_layer) #Sigmoid()
    return output_layer
After the proposed answer the function is:
def forward(self, text):
    embedded = self.embedding(text)
    if self.use_pretrained_embeddings:
        embedded_average = torch.mean(embedded, dim=1)
        embedded_max = torch.max(embedded, dim=1)[0]
        embedded_min = torch.min(embedded, dim=1)[0]
        #use of average embeddings transformation
        input_layer_average = self.input_layer(embedded_average)
        input_layer_average = self.activation(input_layer_average)
        #use of max embeddings transformation
        input_layer_max = self.input_layer(embedded_max)
        input_layer_max = self.activation(input_layer_max)
        #use of min embeddings transformation
        input_layer_min = self.input_layer(embedded_min)
        input_layer_min = self.activation(input_layer_min)
    else:
        embedded = self.flatten_layer(embedded)
    input_layer = torch.concat([input_layer_average, input_layer_max, input_layer_min], dim=1)
    input_layer = self.activation(input_layer)
    print("3",input_layer.shape) #[192,1] vs [64,1] -> output layer
    if self.n_layers !=0:
        for layer in self.layers:
            input_layer = layer(input_layer)
    output_layer = self.output_layer(input_layer)
    output_layer = self.activation_output(output_layer)
    return output_layer
This generates the following error:
ValueError: Using a target size (torch.Size([64, 1])) that is different to the input size (torch.Size([192, 1])) is deprecated. Please ensure they have the same size.
This is expected, since the concatenated tensor has three times as many rows as there are sentences in the batch (64). Is there a fix that would resolve it?
Regarding 1: You can use torch.concat to concatenate the three outputs along the feature dimension (dim=1), which leaves the batch dimension at 64, and then map them to a single output using another linear layer, for example.
Regarding 2: You will have to try it yourself and see whether this is useful.
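To make the first point concrete, here is a minimal sketch under stated assumptions, not the asker's actual model. The class name PooledClassifier and hidden_dim=32 are made up. One Linear layer is shared across the three pooled tensors, as in the question's code. Concatenating along dim=1 keeps the batch dimension at 64, so the output layer has to accept 3 * hidden_dim input features.

```python
import torch
import torch.nn as nn

class PooledClassifier(nn.Module):
    """Hypothetical model: mean/max/min pooling -> shared Linear -> concat -> output."""

    def __init__(self, embedding_dim=50, hidden_dim=32):
        super().__init__()
        self.input_layer = nn.Linear(embedding_dim, hidden_dim)
        self.activation = nn.ReLU()
        # The concatenated features are three times as wide as one hidden output.
        self.output_layer = nn.Linear(3 * hidden_dim, 1)
        self.activation_output = nn.Sigmoid()

    def forward(self, embedded):
        # embedded: [batch, words, embedding_dim]; pool over the words axis.
        pooled = [
            embedded.mean(dim=1),
            embedded.max(dim=1).values,
            embedded.min(dim=1).values,
        ]
        hidden = [self.activation(self.input_layer(p)) for p in pooled]
        features = torch.cat(hidden, dim=1)  # [batch, 3 * hidden_dim]
        return self.activation_output(self.output_layer(features))

model = PooledClassifier()
prediction = model(torch.randn(64, 150, 50))  # shape [64, 1], matching the target
```

Concatenating along dim=0 instead would stack the three tensors along the batch axis and give 192 rows, which is the shape reported in the error message.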
I am looking for something like this:
inputs = tf.keras.Input(shape = input_shape)
# network structure
x = layers.Dense(4, activation='relu')(inputs)
x = layers.Dense(4, activation='relu')(x)
#output layer
outputs = layers.Dense(output_size, activation='linear')(x)
#scaling layer??
outputs = layers.Scale(output_size)(outputs)
#build model
model = tf.keras.models.Model(inputs=inputs, outputs=outputs, name = 'mymodel')
I want the layer to scale my outputs by a scalar. And I don't want to specify this scalar, but rather have the model learn this scalar by itself.
Is there such a layer?
Or can I achieve this with a Multiply layer in combination with something like sympy?
I need this for a quantum-computing model (made with tfq) which can only give outputs between 0 and 1. I can't use a dense layer, because that would bring in classical machine-learning, which I don't want to use.
A scale layer is usually unnecessary because the desired information is in the relationship between the outputs.
If you want specific values, you probably need to change the loss function.
However, this link can allow you to make a personalized layer: https://keras.io/guides/making_new_layers_and_models_via_subclassing/
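Following the subclassing guide linked above, a learnable scalar could be sketched as a custom layer with a single trainable weight. This is an illustrative sketch, not a built-in Keras layer: the class name LearnableScale and its initial_value argument are made up for this example.

```python
import tensorflow as tf

class LearnableScale(tf.keras.layers.Layer):
    """Hypothetical layer: multiplies its input by one trainable scalar."""

    def __init__(self, initial_value=1.0, **kwargs):
        super().__init__(**kwargs)
        self.initial_value = initial_value

    def build(self, input_shape):
        # A weight with shape () is a single scalar shared by all outputs.
        self.scale = self.add_weight(
            name="scale",
            shape=(),
            initializer=tf.keras.initializers.Constant(self.initial_value),
            trainable=True,
        )

    def call(self, inputs):
        return inputs * self.scale

# Toy check: scaling a tensor of ones by an initial value of 2.0.
layer = LearnableScale(initial_value=2.0)
scaled = layer(tf.ones((3, 2)))
```

In the model above it would take the place of the layers.Scale call, for example outputs = LearnableScale()(outputs), and the scalar would then be updated by the optimizer like any other weight.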
I've made a sequential model in Keras for generating musical sequences. It is something very simple, with an LSTM and a dense softmax. I have 333 possible musical events.
I know that model.fit() needs all training data in memory, which is a problem if it is one hot encoded. So I give the model an integer as input, transform this to one hot encoding in a Lambda layer, and then use sparse categorical cross entropy for loss. Because each batch would be transformed to one hot encoding on the fly, I thought that this would sort out my memory issues. But instead, it hangs at the beginning of fitting, and fills up my memory, even with small batch size. Evidently, I'm not understanding something about how keras works, which is not surprising, given that I'm new to it (and on that note, please point out anything too naive in my code).
1) What is happening behind the scenes? What is it about keras that I'm not understanding? It seems like keras is going ahead and running the Lambda layer on all of my training examples before doing any training.
2) How can I solve this, and make keras do it truly on the fly? Can I solve it with model.fit(), which I'm currently using, or do I need model.fit_generator(), which to me looks like it could solve this rather easily?
Here is some of my code:
def musicmodel(Tx, n_a, n_values):
    """
    Arguments:
    Tx -- length of a sequence in the corpus
    n_a -- the number of activations used in our model (for the LSTM)
    n_values -- number of unique values in the music data
    Returns:
    model -- a keras model
    """
    # Define the input with a shape
    X = Input(shape=(Tx,))
    # Define s0, initial hidden state for the decoder LSTM
    a0 = Input(shape=(n_a,), name='a0')
    c0 = Input(shape=(n_a,), name='c0')
    a = a0
    c = c0
    # Create empty list to append the outputs to while iterating
    outputs = []
    # Step 2: Loop
    for t in range(Tx):
        # select the "t"th time step from X.
        x = Lambda(lambda x: x[:,t])(X)
        # We need the class represented in one hot fashion:
        x = Lambda(lambda x: tf.one_hot(K.cast(x, dtype='int32'), n_values))(x)
        # We then reshape x to be (1, n_values)
        x = reshapor(x)
        # Perform one step of the LSTM_cell
        a, _, c = LSTM_cell(x, initial_state=[a, c])
        # Apply densor to the hidden state output of LSTM_Cell
        out = densor(a)
        # Add the output to "outputs"
        outputs.append(out)
    # Step 3: Create model instance
    model = Model(inputs=[X,a0,c0],outputs=outputs)
    return model
I then fit my model:
model = musicmodel(Tx, n_a, n_values)
opt = Adam(lr=0.01, beta_1=0.9, beta_2=0.999, decay=0.01)
model.compile(optimizer=opt, loss='sparse_categorical_crossentropy', metrics=['accuracy'])
a0 = np.zeros((m, n_a))
c0 = np.zeros((m, n_a))
model.fit([X, a0, c0], list(Y), validation_split=0.25, epochs=600, verbose=2, batch_size=4)
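Regarding the second question, the generator route could look roughly like the sketch below. It is an illustration under assumptions and does not diagnose why fit() hangs. The function name one_hot_batches is hypothetical, X and Y are assumed to be integer arrays whose first axis indexes samples, and the a0 and c0 inputs are left out for brevity. The model would also need to accept one-hot input directly instead of encoding it in Lambda layers.

```python
import numpy as np

def one_hot_batches(X, Y, n_values, batch_size):
    """Yield (one-hot inputs, integer labels) one batch at a time, forever.

    Assumes X and Y are integer arrays whose first axis indexes samples.
    """
    eye = np.eye(n_values, dtype=np.float32)
    m = X.shape[0]
    while True:
        for start in range(0, m, batch_size):
            stop = start + batch_size
            # Indexing the identity matrix one-hot encodes only this batch.
            yield eye[X[start:stop]], Y[start:stop]

# Toy data: 10 sequences of length 5 over 333 possible events.
X = np.random.randint(0, 333, size=(10, 5))
Y = np.random.randint(0, 333, size=(10, 5))
batches = one_hot_batches(X, Y, n_values=333, batch_size=4)
x_batch, y_batch = next(batches)  # x_batch has shape (4, 5, 333)
```

Only one batch is ever one-hot encoded at a time, so memory use depends on batch_size and not on the size of the corpus. The generator would then be passed to model.fit_generator() together with steps_per_epoch.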
My problem
I am trying to implement a Siamese network in Keras that is trained on triplets—[anchor_input, similar_input, different_input]—and executes on pairs—[input_a, input_b]—telling me whether they are similar or different. I am able to train fine on triplets, and have even trained and tested on pairs, but when I train on triplets and try to create my pair-wise network, I get the following error:
ValueError: logits and labels must have the same shape (() vs (?, ?))
Current network overview
My triplet network is defined using the following code, which I have pared down to something pretty minimal for this example:
def siamese_triplet(in_shape, feature_dim):
    input_a = Input(shape=in_shape) # input is a series, n_samples by n_features
    input_b = Input(shape=in_shape)
    input_c = Input(shape=in_shape)
    base_network = create_model() # makes my Siamese kernel network
    processed_a = base_network(input_a) # base_network outputs a vector, say 50 elements long
    processed_b = base_network(input_b)
    processed_c = base_network(input_c)
    l1_distance = lambda x: K.abs(x[0] - x[1]) # vector, 50 elements long
    p_distance = Lambda(l1_distance,
                        output_shape=lambda x: x[0])([processed_a, processed_b])
    n_distance = Lambda(l1_distance,
                        output_shape=lambda x: x[0])([processed_a, processed_c])
    triplet_loss = Lambda(lambda x: K.mean(K.maximum(0, x[0] - x[1] + 1)),
                          output_shape=(1,))([p_distance, n_distance]) # scalar
    model = Model([input_a, input_b, input_c], triplet_loss)
    optimizer = SGD()
    model.compile(optimizer=optimizer,
                  loss=lambda x, y: y) # passes triplet loss through
I train the network and get a fitted model object out. I then try to re-create the network with a structure built around input pairs, passing it the base_network layer extracted from my model, which ends up being model.layers[3]:
# triplet_net = siamese_triplet(...)
# model = triplet_net.fit(...)
pair_net = siamese_pair(input_shape, model.layers[3])
with siamese_pair defined as:
def siamese_pair(in_shape, base_network):
    input_a = Input(shape=in_shape)
    input_b = Input(shape=in_shape)
    processed_a = base_network(input_a) # vector, 50 elements
    processed_b = base_network(input_b)
    distance = Lambda(lambda x: K.abs(x[0] - x[1]),
                      output_shape=lambda x: x[0])([processed_a, processed_b]) # vector, 50 elements
    prediction = Lambda(lambda x: K.mean(x),
                        output_shape=(1,))(distance) # scalar
    model = Model([input_a, input_b], prediction)
    optimizer = SGD()
    model.compile(optimizer=optimizer,
                  loss='binary_crossentropy') # for evaluation purposes
The model.compile(...) line throws the error.
Note that the triplet loss should push distances close to zero for objects that are the same (class 0), and push them towards 1 for objects that are different (class 1), so setting prediction = K.mean(distance) should be pretty close to one of these class labels, I would think.
My transition from triplet-loss training to pair-wise evaluation seems kind of janky to me, and I would love to figure out the best way to do it, so I am open to suggestions to improve the design. In the meantime, I would be happy just getting past this error so I can at least run and evaluate my performance classifying input pairs as similar or different.
My questions
1. Why am I getting the error above? It seems like it is expecting no y_true at all for the loss function, which is strange to me.
2. How do I fix the error above?
3. Is there a better way to pass my trained base_layer into a different Siamese network structure?
4. Is there a better way to get pair-wise predictions out of my trained base_layer in a different network structure?
Answers to just the first two would be great, but if somebody has suggestions on the last two as well, I am all ears.
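On the first two questions, one likely cause (an educated guess, not verified against the asker's setup) is that K.mean(x) without an axis argument averages over every axis, the batch axis included, and so returns a scalar of shape (), while the labels keep their batch dimension. Averaging over the feature axis only, with keepdims=True, keeps one prediction per pair. The NumPy stand-in below shows the shape difference; K.mean accepts the same axis and keepdims arguments. The array sizes are made up.

```python
import numpy as np

batch_size, feature_dim = 8, 50  # hypothetical sizes

# Stand-in for the element-wise L1 distance between two embedding batches.
distance = np.abs(np.random.rand(batch_size, feature_dim)
                  - np.random.rand(batch_size, feature_dim))

collapsed = np.mean(distance)                         # averages over the batch too
per_pair = np.mean(distance, axis=-1, keepdims=True)  # one value per input pair

print(collapsed.shape)  # ()
print(per_pair.shape)   # (8, 1)
```

In siamese_pair this would correspond to Lambda(lambda x: K.mean(x, axis=-1, keepdims=True), output_shape=(1,)) for the prediction layer.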
I want to interpret an RNN by looking at the sequence-by-sequence values. It is possible to output these values with return_sequences. However, those values are then used as inputs into the next layer (e.g., a dense activation layer). I would like to output only the last value but record all values over the full sequence for interpretation. What's the easiest way to do this?
Create two models that share the same RNN layer, but in one of them feed the Dense layer only the last step of the RNN:
inputs = Input(inputShape)
outs = RNN(..., return_sequences=True)(inputs)
modelSequence = Model(inputs,outs)
#take only the last step
outs = Lambda(lambda x: x[:,-1])(outs)
outs = Dense(...)(outs)
modelSingle = Model(inputs,outs)
Use modelSingle.fit(x_data,y_data) to train as you've been doing normally.
Use modelSequence.predict(x_data) to see the per-step outputs of the RNN. It needs no training of its own, because both models share the same RNN layer.
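A runnable version of the same idea might look like the sketch below. The concrete choices (SimpleRNN with 4 units, 10 time steps, 3 features, a Dense layer with one unit, random data) are placeholders picked for illustration and do not come from the answer above.

```python
import numpy as np
from tensorflow.keras.layers import Input, SimpleRNN, Lambda, Dense
from tensorflow.keras.models import Model

inputs = Input(shape=(10, 3))                      # 10 time steps, 3 features
seq = SimpleRNN(4, return_sequences=True)(inputs)  # one output per time step
model_sequence = Model(inputs, seq)

last = Lambda(lambda x: x[:, -1])(seq)             # keep only the last step
out = Dense(1)(last)
model_single = Model(inputs, out)

x_data = np.random.rand(2, 10, 3).astype("float32")
per_step = model_sequence.predict(x_data, verbose=0)  # shape (2, 10, 4)
final = model_single.predict(x_data, verbose=0)       # shape (2, 1)
```

Both models hold the very same SimpleRNN layer object, so weights learned through model_single.fit are the ones model_sequence.predict uses.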