I am implementing a simple multitask model in Keras. I used the code given in the documentation under the heading of shared layers.
I know that in multitask learning, we share some of the initial layers of the model and make the final layers specific to the individual tasks, as described in the link.
I have the following two cases in the Keras API: in the first, I use keras.layers.concatenate, while in the second, I do not use keras.layers.concatenate at all.
I am posting the code as well as the model diagram for each case below.
Case-1 code
import keras
from keras.layers import Input, LSTM, Dense
from keras.models import Model
from keras.utils.vis_utils import plot_model
tweet_a = Input(shape=(280, 256))
tweet_b = Input(shape=(280, 256))
# This layer can take as input a matrix
# and will return a vector of size 64
shared_lstm = LSTM(64)
# When we reuse the same layer instance
# multiple times, the weights of the layer
# are also being reused
# (it is effectively *the same* layer)
encoded_a = shared_lstm(tweet_a)
encoded_b = shared_lstm(tweet_b)
# We can then concatenate the two vectors:
merged_vector = keras.layers.concatenate([encoded_a, encoded_b], axis=-1)
# And add a logistic regression on top
predictions1 = Dense(1, activation='sigmoid')(merged_vector)
predictions2 = Dense(1, activation='sigmoid')(merged_vector)
# We define a trainable model linking the
# tweet inputs to the predictions
model = Model(inputs=[tweet_a, tweet_b], outputs=[predictions1, predictions2])
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])
Case-1 Model
Case-2 code
import keras
from keras.layers import Input, LSTM, Dense
from keras.models import Model
from keras.utils.vis_utils import plot_model
tweet_a = Input(shape=(280, 256))
tweet_b = Input(shape=(280, 256))
# This layer can take as input a matrix
# and will return a vector of size 64
shared_lstm = LSTM(64)
# When we reuse the same layer instance
# multiple times, the weights of the layer
# are also being reused
# (it is effectively *the same* layer)
encoded_a = shared_lstm(tweet_a)
encoded_b = shared_lstm(tweet_b)
# And add a logistic regression on top
predictions1 = Dense(1, activation='sigmoid')(encoded_a)
predictions2 = Dense(1, activation='sigmoid')(encoded_b)
# We define a trainable model linking the
# tweet inputs to the predictions
model = Model(inputs=[tweet_a, tweet_b], outputs=[predictions1, predictions2])
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])
Case-2 Model
In both cases, only the LSTM layer is shared. Case-1 uses keras.layers.concatenate, whereas case-2 doesn't use keras.layers.concatenate at all.
My question is: which one is multitasking, case-1 or case-2? Moreover, what is the function of keras.layers.concatenate in case-1?
Both are multi-task models; this only depends on whether there are multiple outputs, with one task associated with each output.
The difference is that your first model explicitly concatenates the features produced by the shared layer, so both output tasks can consider information from both inputs. The second model only has a connection from each input directly to one of the outputs, without considering the other input. The only link between the two branches here is that they share the LSTM weights.
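For reference, the more typical hard-parameter-sharing setup described in the question (shared initial layers, task-specific final layers) can also be built with a single input. The sketch below is illustrative; the layer sizes and task names are assumptions, not taken from the question:
import keras
from keras.layers import Input, LSTM, Dense
from keras.models import Model

# One shared input and one shared encoder; each task gets its own head
# on top of the shared representation (hard parameter sharing)
tweets = Input(shape=(280, 256))
shared_lstm = LSTM(64)
encoded = shared_lstm(tweets)

task1_pred = Dense(1, activation='sigmoid', name='task1')(encoded)
task2_pred = Dense(1, activation='sigmoid', name='task2')(encoded)

model = Model(inputs=tweets, outputs=[task1_pred, task2_pred])
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])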
I've been following Towards Data Science's tutorial about word2vec and skip-gram models, but I stumbled upon a problem that I cannot solve, despite searching about it a lot and trying multiple unsuccessful solutions.
https://towardsdatascience.com/understanding-feature-engineering-part-4-deep-learning-methods-for-text-data-96c44370bbfa
The step where it shows how to build the skip-gram model architecture seems deprecated because it uses the Merge layer from keras.layers.
What I tried to do was translate the author's code, which is implemented with the Sequential API of Keras, to the Functional API in order to work around the deprecation of the Merge layer, replacing it with the keras.layers.Dot layer. However, I'm still stuck at the step of merging the two models (word and context) into the final model, whose architecture must be like this:
Here's the code that the author used:
from keras.layers import Merge
from keras.layers.core import Dense, Reshape
from keras.layers.embeddings import Embedding
from keras.models import Sequential
# build skip-gram architecture
word_model = Sequential()
word_model.add(Embedding(vocab_size, embed_size,
                         embeddings_initializer="glorot_uniform",
                         input_length=1))
word_model.add(Reshape((embed_size, )))
context_model = Sequential()
context_model.add(Embedding(vocab_size, embed_size,
                            embeddings_initializer="glorot_uniform",
                            input_length=1))
context_model.add(Reshape((embed_size,)))
model = Sequential()
model.add(Merge([word_model, context_model], mode="dot"))
model.add(Dense(1, kernel_initializer="glorot_uniform", activation="sigmoid"))
model.compile(loss="mean_squared_error", optimizer="rmsprop")
And here is my attempt to translate the Sequential code implementation into the Functional one:
from keras import models
from keras import layers
from keras import Input, Model
word_input = Input(shape=(1,))
word_x = layers.Embedding(vocab_size, embed_size, embeddings_initializer='glorot_uniform')(word_input)
word_reshape = layers.Reshape((embed_size,))(word_x)
word_model = Model(word_input, word_reshape)
context_input = Input(shape=(1,))
context_x = layers.Embedding(vocab_size, embed_size, embeddings_initializer='glorot_uniform')(context_input)
context_reshape = layers.Reshape((embed_size,))(context_x)
context_model = Model(context_input, context_reshape)
model_input = layers.dot([word_model, context_model], axes=1, normalize=False)
model_output = layers.Dense(1, kernel_initializer='glorot_uniform', activation='sigmoid')
model = Model(model_input, model_output)
However, when executed, the following error is returned:
ValueError: Layer dot_5 was called with an input that isn't a symbolic
tensor. Received type: . Full
input: [,
]. All inputs to
the layer should be tensors.
I'm a total beginner with the Functional API of Keras, so I would be grateful for some guidance on how to feed the context and word models into the dot layer to achieve the architecture in the image.
You are passing Model instances to the layer; however, as the error suggests, you need to pass Keras tensors (i.e. the outputs of layers or models) to layers in Keras. You have two options here. One is to use the .output attribute of the Model instance, like this:
dot_output = layers.dot([word_model.output, context_model.output], axes=1, normalize=False)
or equivalently, you can use the output tensors directly:
dot_output = layers.dot([word_reshape, context_reshape], axes=1, normalize=False)
Further, you need to apply the Dense layer that follows to dot_output, and pass the Input layers as the inputs of Model. Therefore:
model_output = layers.Dense(1, kernel_initializer='glorot_uniform',
                            activation='sigmoid')(dot_output)
model = Model([word_input, context_input], model_output)
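Putting those pieces together, a complete Functional-API version could look like the sketch below. It assumes vocab_size and embed_size are defined as in the tutorial; the mean_squared_error loss and rmsprop optimizer are carried over from the author's Sequential code:
from keras.layers import Input, Embedding, Reshape, Dense, dot
from keras.models import Model

# Word branch: index -> embedding vector
word_input = Input(shape=(1,))
word_x = Embedding(vocab_size, embed_size,
                   embeddings_initializer='glorot_uniform')(word_input)
word_reshape = Reshape((embed_size,))(word_x)

# Context branch: index -> embedding vector
context_input = Input(shape=(1,))
context_x = Embedding(vocab_size, embed_size,
                      embeddings_initializer='glorot_uniform')(context_input)
context_reshape = Reshape((embed_size,))(context_x)

# Merge the two branches with a dot product and classify the pair
dot_output = dot([word_reshape, context_reshape], axes=1, normalize=False)
model_output = Dense(1, kernel_initializer='glorot_uniform',
                     activation='sigmoid')(dot_output)

model = Model([word_input, context_input], model_output)
model.compile(loss='mean_squared_error', optimizer='rmsprop')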
I'm kind of a newbie to tensorflow and building neural networks.
I'm trying to make a neural network with the tf.keras api that will take a single input, and give 3 outputs. Here is my code:
import os
import tensorflow as tf
from tensorflow import keras
import numpy as np
train_times = np.array([[1],[2],[3],[4],[5],[6],[7],[8]])
train_sensors = np.array([[0.1,0.15,0.2],[0.25,0.3,0.35],[0.4,0.45,0.5],[0.55,0.6,0.65],[0.7,0.75,0.8],[0.85,0.9,0.95],[0.05,0.33,0.56],[0.8,0.35,0.9]])
test_times = np.array([[1],[2],[3],[4],[5],[6],[7],[8]])
test_sensors = np.array([[0.1,0.15,0.2],[0.25,0.3,0.35],[0.4,0.45,0.5],[0.55,0.6,0.65],[0.7,0.75,0.8],[0.85,0.9,0.95],[0.05,0.33,0.56],[0.8,0.35,0.9]])
print(train_sensors[0].shape)
def create_model():
    model = tf.keras.models.Sequential([
        keras.layers.Dense(5, activation=tf.nn.relu, input_shape=(1,), name="Input"),
        keras.layers.Dense(10, activation=tf.nn.relu, name="Middle"),
        keras.layers.Dropout(0.2),
        keras.layers.Dense(3, activation=tf.nn.softmax, name="Out")
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(),
                  loss=tf.keras.losses.sparse_categorical_crossentropy,
                  metrics=['accuracy'])
    return model
model = create_model()
model.summary()
checkpoint_path = "sensor_predict.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path)
cp_callback = tf.keras.callbacks.ModelCheckpoint(checkpoint_path,save_weights_only=True,verbose=1)
model.fit(x=train_times, y=train_sensors,epochs = 10,validation_data = (test_sensors, test_times), callbacks = [cp_callback])
I have specified that the last layer should have three outputs, but I get this error every time I run it:
ValueError: Error when checking target: expected Out to have shape (1,) but got array with shape (3,)
I can't figure out why it seems to think I want a single output from the network.
NOTE: The dataset I am using is not what I will actually use. I'm just trying to get a functional network, and then I'll generate the data later.
Your loss function (tf.keras.losses.sparse_categorical_crossentropy) expects the target for each sample to be a single integer class index of shape (1,), not a 3-element vector, which is why Keras complains about the target shape. Since your targets are continuous 3-value vectors, change the loss to tf.keras.losses.mse, for example, and I think it will work.
See tensorflow docs for the definition.
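For concreteness, here is a minimal sketch of that change applied to the snippet above. It assumes the imports, data arrays, and cp_callback from the question are in scope; switching the metric to 'mae' and fixing the validation_data order to (inputs, targets) are my own adjustments, not part of the original suggestion:
model = tf.keras.models.Sequential([
    keras.layers.Dense(5, activation=tf.nn.relu, input_shape=(1,), name="Input"),
    keras.layers.Dense(10, activation=tf.nn.relu, name="Middle"),
    keras.layers.Dropout(0.2),
    # softmax kept from the question; a sigmoid or linear output may suit
    # these continuous targets better
    keras.layers.Dense(3, activation=tf.nn.softmax, name="Out")
])

model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss=tf.keras.losses.mse,   # regression-style loss matches the (3,)-shaped float targets
              metrics=['mae'])

# validation_data expects (inputs, targets), i.e. (times, sensors)
model.fit(x=train_times, y=train_sensors, epochs=10,
          validation_data=(test_times, test_sensors),
          callbacks=[cp_callback])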
After going through some Stack questions and the Keras documentation, I managed to write some code that tries to evaluate the gradient of a neural network's output w.r.t. its inputs. The purpose is a simple exercise of approximating a bivariate function (f(x,y) = x^2 + y^2), using as the loss the difference between the analytical and the automatic differentiation.
Combining answers from two questions (Keras custom loss function: Accessing current input pattern and Getting gradient of model output w.r.t weights using Keras), I came up with this:
import tensorflow as tf
from keras import backend as K
from keras.models import Model
from keras.layers import Dense, Activation, Input
def custom_loss(input_tensor):
    outputTensor = model.output
    listOfVariableTensors = model.input
    gradients = K.gradients(outputTensor, listOfVariableTensors)
    sess = tf.InteractiveSession()
    sess.run(tf.initialize_all_variables())
    evaluated_gradients = sess.run(gradients, feed_dict={model.input: input_tensor})
    grad_pred = K.add(evaluated_gradients[0], evaluated_gradients[1])
    grad_true = K.add(K.scalar_mul(2, model.input[0][0]), K.scalar_mul(2, model.input[0][1]))
    return K.square(K.subtract(grad_pred, grad_true))
input_tensor = Input(shape=(2,))
hidden = Dense(10, activation='relu')(input_tensor)
out = Dense(1, activation='sigmoid')(hidden)
model = Model(input_tensor, out)
model.compile(loss=custom_loss(input_tensor), optimizer='adam')
This yields the error TypeError: The value of a feed cannot be a tf.Tensor object. because of feed_dict={model.input: input_tensor}. I understand the error; I just don't know how to fix it.
From what I gathered, I can't simply pass input data into the loss function; it must be a tensor. I realized Keras would 'understand' it when I call input_tensor. This all just leads me to think I'm doing things the wrong way by trying to evaluate the gradient like that. I would really appreciate some enlightenment.
I don't really understand why you want this loss function, but I will provide an answer anyway. Also, there is no need to evaluate the gradient within the function (in fact, you would be "disconnecting" the computational graph). The loss function could be implemented as follows:
from keras import backend as K
from keras.models import Model
from keras.layers import Dense, Input
def custom_loss(input_tensor, output_tensor):
    def loss(y_true, y_pred):
        gradients = K.gradients(output_tensor, input_tensor)
        # K.gradients returns a list; take the gradient tensor w.r.t. the input
        grad_pred = K.sum(gradients[0], axis=-1)
        grad_true = K.sum(2 * input_tensor, axis=-1)
        return K.square(grad_pred - grad_true)
    return loss
input_tensor = Input(shape=(2,))
hidden = Dense(10, activation='relu')(input_tensor)
output_tensor = Dense(1, activation='sigmoid')(hidden)
model = Model(input_tensor, output_tensor)
model.compile(loss=custom_loss(input_tensor, output_tensor), optimizer='adam')
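Note that since y_true and y_pred are ignored inside this loss, the targets passed to fit() only matter for their length. A dummy target array is enough; the random inputs and zero targets below are illustrative assumptions, not part of the answer above:
import numpy as np

# The custom loss above ignores y_true/y_pred, so zeros work as placeholder targets
X = np.random.rand(128, 2)
dummy_y = np.zeros((128, 1))
model.fit(X, dummy_y, epochs=5, batch_size=32)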
A Keras loss must have y_true and y_pred as inputs. You can try adding your input object as both x and y during the fit:
def custom_loss(y_true, y_pred):
    ...
    return K.square(K.subtract(grad_true, grad_pred))

...

model.compile(loss=custom_loss, optimizer='adam')
model.fit(X, X, ...)
This way, y_true will be the batch being processed at each iteration from the input X, while y_pred will be the output of the model for that particular batch.
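A minimal sketch of that pattern for the same x^2 + y^2 exercise might look as follows. The layer sizes and random data are illustrative, and it assumes the TF1-style symbolic graph used elsewhere in this question, so K.gradients can still reach the model input from inside the loss:
import numpy as np
from keras import backend as K
from keras.layers import Dense, Input
from keras.models import Model

input_tensor = Input(shape=(2,))
hidden = Dense(10, activation='relu')(input_tensor)
out = Dense(1, activation='sigmoid')(hidden)
model = Model(input_tensor, out)

def custom_loss(y_true, y_pred):
    # y_true is the input batch itself because we call fit(X, X) below
    grads = K.gradients(y_pred, input_tensor)[0]   # d(output)/d(input), shape (batch, 2)
    grad_pred = K.sum(grads, axis=-1)
    grad_true = K.sum(2 * y_true, axis=-1)         # analytical gradient of x^2 + y^2
    return K.square(grad_pred - grad_true)

model.compile(loss=custom_loss, optimizer='adam')

X = np.random.rand(256, 2)
model.fit(X, X, epochs=5, batch_size=32)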
I have a sequence I am trying to classify, using a Keras LSTM with return_sequences=True. I have 'data' and 'labels' datasets both of which are the same shape - 2D matrices with rows by location and columns by time interval (cell values are my 'signal' feature). So an RNN w/ return_sequences=True seems like an intuitive approach.
After reshaping my data (X) and labels (Y) to 3D tensors of shape (rows, cols, 1), I call model.fit(X, Y) but get the following error:
ValueError('Invalid shape for y')
It points me to the code for class KerasClassifier()'s fit method which checks that len(y.shape)==2.
Ok so maybe I was supposed to reshape my 2D 'X' to a 3D Tensor of shape (rows, cols, 1) but leave my labels as 2D for sklearn interface? But then when I try that I get another Keras error:
ValueError: Error when checking model target: expected lstm_17 to have
3 dimensions, but got array with shape (500, 2880)
...So how does one fit a Sklearn-style Keras RNN to return sequences? Different parts of Keras seem to demand that my target be both 2D and 3D. Or (more likely) I'm misunderstanding something.
...
Here's a reproducible code example:
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM
from keras.wrappers.scikit_learn import KerasClassifier
# Raw Data/Targets
X = np.array([1,2,3,4,5,6,7,8,9,10,11,12]).reshape(3,4)
Y = np.array([1,0,1,1,0,1,0,1,0,1,0,1]).reshape(3,4)
# Convert X to 3D tensor per Keras doc for recurrent layers
X = X.reshape(X.shape[0], X.shape[1], 1)
# .fit() at bottom will throw an error whether or not this line is used to reshape Y
Y = Y.reshape(Y.shape[0], Y.shape[1], 1)
# Define function to return compiled Keras Model (to pass to Sklearn API)
def keras_rnn(timesteps, num_features):
    '''Function to return compiled Keras Classifier to pass to sklearn wrapper'''
    model = Sequential()
    model.add(LSTM(8, return_sequences=True, input_shape=(timesteps, num_features)))
    model.add(LSTM(1, return_sequences=True, activation='sigmoid'))
    model.compile(optimizer='RMSprop', loss='categorical_crossentropy')
    return model
# Convert compiled Keras model to Scikit-learn-style classifier (compatible w/ sklearn model-tuning methods)
rnn_sklearn = KerasClassifier(build_fn=keras_rnn,
                              timesteps=4,
                              num_features=1)
# Fit RNN Model to Data, Target
rnn_sklearn.fit(X, Y)
ValueError: Invalid shape for y
This is, I think, a limitation of the KerasClassifier class. I ran into the same problem when I was using the class on a multi-step, multi-feature LSTM. For some reason, if I build the model through Keras and run the fit() method after compile(), the model trains normally with no errors. However, when I have the model created in a function and call that function with KerasClassifier, I run into the error you have. Looking at the KerasClassifier class in the keras module (search for wrappers/scikit_learn.py), I found that 'y' has to be a specific shape or the function raises an exception. This shape is a 2D 'y' tensor (n_samples, n_outputs) or a 1D 'y' tensor (n_samples), which was incompatible with what I was expecting. So I'm just going to use the model's fit() method instead of the wrapper. Hope this helps.
BTW. My Keras version is 2.2.4 and Tensorflow is 1.15.0. This may not be applicable in the newer versions.
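For example, a minimal sketch of that workaround, building the same model and calling its own fit() directly with the 3D targets (the epoch and batch-size settings are arbitrary):
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM

# Same toy data as above, reshaped to (samples, timesteps, features)
X = np.array([1,2,3,4,5,6,7,8,9,10,11,12]).reshape(3, 4, 1)
Y = np.array([1,0,1,1,0,1,0,1,0,1,0,1]).reshape(3, 4, 1)

model = Sequential()
model.add(LSTM(8, return_sequences=True, input_shape=(4, 1)))
model.add(LSTM(1, return_sequences=True, activation='sigmoid'))
model.compile(optimizer='RMSprop', loss='binary_crossentropy')

# Calling fit() on the model itself accepts the 3D target that KerasClassifier rejects
model.fit(X, Y, epochs=2, batch_size=1)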
This code works with Keras 2.0.2:
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Flatten
from keras.wrappers.scikit_learn import KerasClassifier
# Raw Data/Targets
X = np.array([1,2,3,4,5,6,7,8,9,10,11,12]).reshape(3,4)
Y = np.array([1,0,1,1,0,1,0,1,0,1,0,1]).reshape(3,4)
# Convert X to 3D tensor per Keras doc for recurrent layers
X = X.reshape(X.shape[0], X.shape[1], 1)
# .fit() at bottom will throw an error whether or not this line is used to reshape Y
Y = Y.reshape(Y.shape[0], Y.shape[1], 1)
# Define function to return compiled Keras Model (to pass to Sklearn API)
def keras_rnn(timesteps, num_features):
    '''Function to return compiled Keras Classifier to pass to sklearn wrapper'''
    model = Sequential()
    model.add(LSTM(8, return_sequences=True, input_shape=(timesteps, num_features)))
    model.add(LSTM(1, return_sequences=True, activation='sigmoid'))
    model.compile(optimizer='RMSprop', loss='binary_crossentropy')
    return model
# Convert compiled Keras model to Scikit-learn-style classifier (compatible w/ sklearn model-tuning methods)
rnn_sklearn = KerasClassifier(build_fn=keras_rnn,
                              timesteps=4,
                              num_features=1)
# Fit RNN Model to Data, Target
rnn_sklearn.fit(X, Y)