keras. Initializing a bidirectional LSTM. Passing word embeddings - python

In the implementation I am using, the LSTM is initialized in the following way:
l_lstm = Bidirectional(LSTM(64, return_sequences=True))(embedded_sequences)
What I don't really understand, possibly because of my lack of experience in Python generally, is the notation l_lstm = Bidirectional(LSTM(...))(embedded_sequences).
I don't get what I am passing embedded_sequences to. It is not a parameter of LSTM(), but it also does not seem to be an argument for Bidirectional(), since it stands separately.
Here is the documentation for Bidirectional:
def __init__(self, layer, merge_mode='concat', weights=None, **kwargs):
    if merge_mode not in ['sum', 'mul', 'ave', 'concat', None]:
        raise ValueError('Invalid merge mode. '
                         'Merge mode should be one of '
                         '{"sum", "mul", "ave", "concat", None}')
    self.forward_layer = copy.copy(layer)
    config = layer.get_config()
    config['go_backwards'] = not config['go_backwards']
    self.backward_layer = layer.__class__.from_config(config)
    self.forward_layer.name = 'forward_' + self.forward_layer.name
    self.backward_layer.name = 'backward_' + self.backward_layer.name
    self.merge_mode = merge_mode
    if weights:
        nw = len(weights)
        self.forward_layer.initial_weights = weights[:nw // 2]
        self.backward_layer.initial_weights = weights[nw // 2:]
    self.stateful = layer.stateful
    self.return_sequences = layer.return_sequences
    self.return_state = layer.return_state
    self.supports_masking = True
    self._trainable = True
    super(Bidirectional, self).__init__(layer, **kwargs)
    self.input_spec = layer.input_spec
    self._num_constants = None

Let's try to break down what is going on:
You start with LSTM(...), which creates an LSTM layer. Layers in Keras are callable, which means you can use them like functions. For example, lstm = LSTM(...) and then lstm(some_input) will call the LSTM on the given input tensor.
Bidirectional(...) wraps any RNN layer and returns another layer that, when called, applies the wrapped layer in both directions. So l_lstm = Bidirectional(LSTM(...)) is a layer that, when called with some input, will apply the LSTM in both directions. Note: Bidirectional creates a copy of the passed LSTM layer, so the forward and backward passes use different LSTMs.
Finally, when you call Bidirectional(LSTM(...))(embedded_sequences), the bidirectional layer takes the input sequences, passes them to the wrapped LSTMs in both directions, collects their outputs and concatenates them.
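Putting the pieces together, the one-liner can be decomposed into explicit steps. A minimal sketch for illustration (the vocabulary size, sequence length, and embedding width are assumptions, not values from the question):
from keras.layers import Input, Embedding, LSTM, Bidirectional

sequence_input = Input(shape=(100,), dtype='int32')         # 100 word ids per example (assumed)
embedded_sequences = Embedding(10000, 128)(sequence_input)  # -> (batch, 100, 128) (assumed sizes)

lstm = LSTM(64, return_sequences=True)  # step 1: create the layer object; nothing is computed yet
bi_lstm = Bidirectional(lstm)           # step 2: wrap it; Bidirectional copies it for the backward pass
l_lstm = bi_lstm(embedded_sequences)    # step 3: call the layer like a function on a tensor
                                        # -> (batch, 100, 128): 64 forward + 64 backward units, concatenated
So in l_lstm = Bidirectional(LSTM(...))(embedded_sequences), the trailing (embedded_sequences) is the call to the Bidirectional layer itself.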
To understand more about layers and their callable nature, you can look at the functional API guide in the documentation.

Related

Should I set run_functions_eagerly(True) to call hidden layers in the loss function?

Hello, I'm new to TensorFlow/Keras, and I want to build a neural network similar to an autoencoder, with a custom loss function that, among other operations, uses the encoder (or decoder).
Here's my NN architecture
first_ = Input(shape=(2, nt))

def encoder_one(inp):
    second_ = layers.Dense(30, activation='relu')(inp)
    third_ = layers.Dense(30, activation='relu')(second_)
    last_ = layers.Dense(latent_dim, activation='relu')(third_)
    return last_

def linear_one(inp):
    last = layers.Dense(latent_dim)(inp)
    return last

def decoder_one(inp):
    third = layers.Dense(30, activation='relu')(inp)
    second = layers.Dense(30, activation='relu')(third)
    first = layers.Dense(2, activation='relu')(second)
    return first

def encoder(inp):
    encoded = []
    for i in range(inp.shape[2]):
        encoded.append(encoder_one(inp[:, :, i]))
    return encoded

def linear(inp):
    advanced = []
    for i in range(len(inp)):
        advanced.append(linear_one(inp[i]))
    return advanced

def decoder(inp):
    decoded = []
    for i in range(len(inp)):
        decoded.append(decoder_one(inp[i]))
    return tf.stack(decoded, axis=2)

enc = encoder(first_)
adv = linear(enc)
dec = decoder(adv)
model = Model(inputs=first_, outputs=dec)
And I want the loss function to have something like:
def custom_loss(y_true, y_pred):
    return tf.some_function(y_true[:, :, some_index] - decoder_one(encoder_one(y_pred[:, :, some_index])), axis=1)
Now I have to set the option tf.config.run_functions_eagerly(True) in order for this to work; otherwise I get ValueError: tf.function only supports singleton tf.Variables created on the first call. ... I see that the hidden layers cannot be called multiple times (like in the loss function).
So is tf.config.run_functions_eagerly(True) the right way to do this, or should I write more specific layers and training loops from scratch?
Thank you!
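As to why the error appears: each call to encoder_one or decoder_one creates brand-new Dense layers (and therefore new variables), and a loss running inside tf.function is not allowed to create variables on every invocation. A minimal sketch of one possible restructuring, under the assumption that the sub-networks can be built once and shared between the model and the loss:
# Hypothetical restructuring (not from the original post): build each sub-network
# once, then reuse the same instances everywhere.
encoder_net = tf.keras.Sequential([
    layers.Dense(30, activation='relu'),
    layers.Dense(30, activation='relu'),
    layers.Dense(latent_dim, activation='relu'),
])
decoder_net = tf.keras.Sequential([
    layers.Dense(30, activation='relu'),
    layers.Dense(30, activation='relu'),
    layers.Dense(2, activation='relu'),
])

def custom_loss(y_true, y_pred):
    # Reusing the already-built networks creates no new variables, so this
    # can run in a compiled train step without run_functions_eagerly(True).
    recon = decoder_net(encoder_net(y_pred[:, :, some_index]))
    return tf.reduce_mean(tf.square(y_true[:, :, some_index] - recon))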

Short circuit computation in mixture of experts model using tensorflow keras functional api

I am trying to swap between multiple different "expert" layers based on the output of a "gating" layer (as a mixture of experts).
I created a custom layer that takes in the outputs of the expert and gating layers, but this ends up throwing away some outputs rather than not computing them in the first place.
How can I make the model "short circuit" to only evaluate the gating layer and the selected expert layer(s) to save computation time?
I am using TensorFlow 2.0 (GPU) and the Keras functional API.
Keras models can be implemented fully dynamically, to support the efficient routing that you mentioned. The following example shows one way in which this can be done. The example is written with the following premises:
It assumes there are two experts (LayerA and LayerB)
It assumes that a mix-of-experts model (MixOfExpertsModel) switches dynamically between the two expert layer classes depending on the per-example output of a Keras Dense layer
It satisfies the need to run training on the model in a batch fashion.
Pay attention to the comments in the code to see how the switching is done.
import numpy as np
import tensorflow as tf

# This is your Expert A class.
class LayerA(tf.keras.layers.Layer):
    def build(self, input_shape):
        self.weight = self.add_weight("weight_a", shape=input_shape[1:])

    @tf.function
    def call(self, x):
        return x + self.weight

# This is your Expert B class.
class LayerB(tf.keras.layers.Layer):
    def build(self, input_shape):
        self.weight = self.add_weight("weight_b", shape=input_shape[1:])

    @tf.function
    def call(self, x):
        return x * self.weight

class MixOfExpertsModel(tf.keras.models.Model):
    def __init__(self):
        super(MixOfExpertsModel, self).__init__()
        self._expert_a = LayerA()
        self._expert_b = LayerB()
        self._gating_layer = tf.keras.layers.Dense(1, activation="sigmoid")

    @tf.function
    def call(self, x):
        z = self._gating_layer(x)
        # The switching logic:
        # - examples with gating output <= 0.5 are routed to expert A
        # - examples with gating output > 0.5 are routed to expert B.
        mask_a = tf.squeeze(tf.less_equal(z, 0.5), axis=-1)
        mask_b = tf.squeeze(tf.greater(z, 0.5), axis=-1)
        # `input_a` is a subset of slices of the original input (`x`).
        # So is `input_b`. As such, no compute is wasted.
        input_a = tf.boolean_mask(x, mask_a, axis=0)
        input_b = tf.boolean_mask(x, mask_b, axis=0)
        if tf.size(input_a) > 0:
            output_a = self._expert_a(input_a)
        else:
            output_a = tf.zeros_like(input_a)
        if tf.size(input_b) > 0:
            output_b = self._expert_b(input_b)
        else:
            output_b = tf.zeros_like(input_b)
        # Return `mask_a` and `mask_b`, so that the caller can know
        # which example is routed to which expert and whether its output
        # appears in `output_a` or `output_b`. This is necessary
        # for writing a (custom) loss function for this class.
        return output_a, output_b, mask_a, mask_b

# Create an instance of the mix-of-experts model.
mix_of_experts_model = MixOfExpertsModel()

# Generate some dummy data.
num_examples = 32
xs = np.random.random([num_examples, 8]).astype(np.float32)

# Call the model.
print(mix_of_experts_model(xs))
I didn't write a custom loss function that would support the training of this class. But that's doable by using the return values of MixOfExpertsModel.call(), namely the outputs and masks.
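As a rough illustration (assuming a regression target and mean-squared error, neither of which is specified above), such a loss could look like:
def moe_loss(y_true, output_a, output_b, mask_a, mask_b):
    # Route the targets exactly as the inputs were routed in call().
    y_a = tf.boolean_mask(y_true, mask_a, axis=0)
    y_b = tf.boolean_mask(y_true, mask_b, axis=0)
    squared_error = (tf.reduce_sum(tf.square(y_a - output_a)) +
                     tf.reduce_sum(tf.square(y_b - output_b)))
    # Normalize by the full batch size so the loss scale does not depend on routing.
    return squared_error / tf.cast(tf.shape(y_true)[0], tf.float32)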

save and load custom attention model lstm in keras

I want to run a seq2seq model using LSTM for a customer journey analysis. I am able to run the model but unable to load the saved model in a different notebook.
Code for attention model is here:
# RNN "Cell" classes in Keras perform the actual data transformations at each timestep. Therefore, in order to add attention to LSTM, we need to make a custom subclass of LSTMCell.
class AttentionLSTMCell(LSTMCell):
def __init__(self, **kwargs):
self.attentionMode = False
super(AttentionLSTMCell, self).__init__(**kwargs)
# Build is called to initialize the variables that our cell will use. We will let other Keras
# classes (e.g. "Dense") actually initialize these variables.
#tf_utils.shape_type_conversion
def build(self, input_shape):
# Converts the input sequence into a sequence which can be matched up to the internal
# hidden state.
self.dense_constant = TimeDistributed(Dense(self.units, name="AttLstmInternal_DenseConstant"))
# Transforms the internal hidden state into something that can be used by the attention
# mechanism.
self.dense_state = Dense(self.units, name="AttLstmInternal_DenseState")
# Transforms the combined hidden state and converted input sequence into a vector of
# probabilities for attention.
self.dense_transform = Dense(1, name="AttLstmInternal_DenseTransform")
# We will augment the input into LSTMCell by concatenating the context vector. Modify
# input_shape to reflect this.
batch, input_dim = input_shape[0]
batch, timesteps, context_size = input_shape[-1]
lstm_input = (batch, input_dim + context_size)
# The LSTMCell superclass expects no constant input, so strip that out.
return super(AttentionLSTMCell, self).build(lstm_input)
# This must be called before call(). The "input sequence" is the output from the
# encoder. This function will do some pre-processing on that sequence which will
# then be used in subsequent calls.
def setInputSequence(self, input_seq):
self.input_seq = input_seq
self.input_seq_shaped = self.dense_constant(input_seq)
self.timesteps = tf.shape(self.input_seq)[-2]
# This is a utility method to adjust the output of this cell. When attention mode is
# turned on, the cell outputs attention probability vectors across the input sequence.
def setAttentionMode(self, mode_on=False):
self.attentionMode = mode_on
# This method sets up the computational graph for the cell. It implements the actual logic
# that the model follows.
def call(self, inputs, states, constants):
# Separate the state list into the two discrete state vectors.
# ytm is the "memory state", stm is the "carry state".
ytm, stm = states
# We will use the "carry state" to guide the attention mechanism. Repeat it across all
# input timesteps to perform some calculations on it.
stm_repeated = K.repeat(self.dense_state(stm), self.timesteps)
# Now apply our "dense_transform" operation on the sum of our transformed "carry state"
# and all encoder states. This will squash the resultant sum down to a vector of size
# [batch,timesteps,1]
# Note: Most sources I encounter use tanh for the activation here. I have found with this dataset
# and this model, relu seems to perform better. It makes the attention mechanism far more crisp
# and produces better translation performance, especially with respect to proper sentence termination.
combined_stm_input = self.dense_transform(
keras.activations.relu(stm_repeated + self.input_seq_shaped))
# Performing a softmax generates a log probability for each encoder output to receive attention.
score_vector = keras.activations.softmax(combined_stm_input, 1)
# In this implementation, we grant "partial attention" to each encoder output based on
# it's log probability accumulated above. Other options would be to only give attention
# to the highest probability encoder output or some similar set.
context_vector = K.sum(score_vector * self.input_seq, 1)
# Finally, mutate the input vector. It will now contain the traditional inputs (like the seq2seq
# we trained above) in addition to the attention context vector we calculated earlier in this method.
inputs = K.concatenate([inputs, context_vector])
# Call into the super-class to invoke the LSTM math.
res = super(AttentionLSTMCell, self).call(inputs=inputs, states=states)
# This if statement switches the return value of this method if "attentionMode" is turned on.
if(self.attentionMode):
return (K.reshape(score_vector, (-1, self.timesteps)), res[1])
else:
return res
# Custom implementation of the Keras LSTM that adds an attention mechanism.
# This is implemented by taking an additional input (using the "constants" of the RNN class into the LSTM: The encoder output vectors across the entire input sequence.
class LSTMWithAttention(RNN):
def __init__(self, units, **kwargs):
cell = AttentionLSTMCell(units=units)
self.units = units
super(LSTMWithAttention, self).__init__(cell, **kwargs)
#tf_utils.shape_type_conversion
def build(self, input_shape):
self.input_dim = input_shape[0][-1]
self.timesteps = input_shape[0][-2]
return super(LSTMWithAttention, self).build(input_shape)
# This call is invoked with the entire time sequence. The RNN sub-class is responsible
# for breaking this up into calls into the cell for each step.
# The "constants" variable is the key to our implementation. It was specifically added
# to Keras to accomodate the "attention" mechanism we are implementing.
def call(self, x, constants, **kwargs):
if isinstance(x, list):
self.x_initial = x[0]
else:
self.x_initial = x
# The only difference in the LSTM computational graph really comes from the custom
# LSTM Cell that we utilize.
self.cell._dropout_mask = None
self.cell._recurrent_dropout_mask = None
self.cell.setInputSequence(constants[0])
return super(LSTMWithAttention, self).call(inputs=x, constants=constants, **kwargs)
Code defining the encoder and decoder models:
# Encoder Layers
encoder_inputs = Input(shape=(None, len_input), name="attenc_inputs")
encoder = LSTM(units=units, return_sequences=True, return_state=True)
encoder_outputs, state_h, state_c = encoder(encoder_inputs)
encoder_states = [state_h, state_c]

# define inference encoder
encoder_model = Model(encoder_inputs, encoder_states)
# encoder_model.save('atten_enc_model.h5')

# define training decoder
decoder_inputs = Input(shape=(None, n_output))
Attention_dec_lstm = LSTMWithAttention(units=units, return_sequences=True, return_state=True)
# Note that the only real difference here is that we are feeding encoder_outputs to the decoder now.
attdec_lstm_out, _, _ = Attention_dec_lstm(inputs=decoder_inputs,
                                           constants=encoder_outputs,
                                           initial_state=encoder_states)
decoder_dense1 = Dense(units, activation="relu")
decoder_dense2 = Dense(n_output, activation='softmax')
decoder_outputs = decoder_dense2(Dropout(rate=.10)(decoder_dense1(Dropout(rate=.10)(attdec_lstm_out))))
atten_model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
atten_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
atten_model.summary()

# Defining inference decoder
state_input_h = Input(shape=(units,), name="state_input_h")
state_input_c = Input(shape=(units,), name="state_input_c")
decoder_states_inputs = [state_input_h, state_input_c]
attenc_seq_out = Input(shape=encoder_outputs.get_shape()[1:], name="attenc_seq_out")
inf_attdec_inputs = Input(shape=(None, n_output), name="inf_attdec_inputs")
attdec_res, attdec_h, attdec_c = Attention_dec_lstm(inputs=inf_attdec_inputs,
                                                    constants=attenc_seq_out,
                                                    initial_state=decoder_states_inputs)
decoder_states = [attdec_h, attdec_c]
decoder_model = Model(inputs=[inf_attdec_inputs, state_input_h, state_input_c, attenc_seq_out],
                      outputs=[attdec_res, attdec_h, attdec_c])
Code for model fit and save:
history = atten_model.fit([encoder_input_data, decoder_input_data], decoder_target_data,
                          batch_size=batch_size_num,
                          epochs=epochs,
                          validation_split=0.2, verbose=1)
atten_model.save('atten_model_lstm.h5')
Code to load the encoder decoder model with custom Attention layer:
with open('atten_model_lstm.json') as mdl:
    json_string = mdl.read()
model = model_from_json(json_string, custom_objects={'AttentionLSTMCell': AttentionLSTMCell, 'LSTMWithAttention': LSTMWithAttention})
This code to load gives the error:
TypeError: int() argument must be a string, a bytes-like object or a number, not 'AttentionLSTMCell'
Here's a solution inspired by the link in my comment:
# serialize model to JSON
atten_model_json = atten_model.to_json()
with open("atten_model.json", "w") as json_file:
    json_file.write(atten_model_json)
# serialize weights to HDF5
atten_model.save_weights("atten_model.h5")
print("Saved model to disk")

# Different part of your code or different file
# load json and create model
json_file = open('atten_model.json', 'r')
loaded_model_json = json_file.read()
json_file.close()
# the custom classes still have to be registered when deserializing
loaded_model = model_from_json(loaded_model_json,
                               custom_objects={'AttentionLSTMCell': AttentionLSTMCell,
                                               'LSTMWithAttention': LSTMWithAttention})
# load weights into new model
loaded_model.load_weights("atten_model.h5")
print("Loaded model from disk")

Extracting the dropout mask from a keras dropout layer?

I would like to extract and store the dropout mask [array of 1/0s] from a dropout layer in a Sequential Keras model at each batch while training. I was wondering if there was a straightforward way to do this within Keras or if I would need to switch over to TensorFlow (How to get the dropout mask in Tensorflow).
Would appreciate any help! I'm quite new to TensorFlow and Keras.
There are a couple of functions (dropout_layer.get_output_mask(), dropout_layer.get_input_mask()) for the dropout layer that I tried using, but I got None after calling them on the previous layer.
model = tf.keras.Sequential()
model.add(tf.keras.layers.Flatten(name="flat", input_shape=(28, 28, 1)))
model.add(tf.keras.layers.Dense(
    512,
    activation='relu',
    name='dense_1',
    kernel_initializer=tf.keras.initializers.GlorotUniform(seed=123),
    bias_initializer='zeros'))
dropout = tf.keras.layers.Dropout(0.2, name='dropout')  # want this layer's mask
model.add(dropout)
x = dropout.output_mask
y = dropout.input_mask
model.add(tf.keras.layers.Dense(
    10,
    activation='softmax',
    name='dense_2',
    kernel_initializer=tf.keras.initializers.GlorotUniform(seed=123),
    bias_initializer='zeros'))
model.compile(...)
model.fit(...)
It's not easily exposed in Keras. It goes deep until it calls the TensorFlow dropout.
So, although you're using Keras, the mask will also be a tensor in the graph that can be retrieved by name (finding its name: In Tensorflow, get the names of all the Tensors in a graph).
This option, of course, will lack some Keras information; you should probably do it inside a Lambda layer so Keras adds certain information to the tensor. And you must take extra care because the tensor will exist even when not training (when the mask is skipped).
Now, you can also use a less hacky way that may consume a little processing:
def getMask(x):
    boolMask = tf.not_equal(x, 0)
    floatMask = tf.cast(boolMask, tf.float32)  # or tf.float64
    return floatMask
Use Lambda(getMask)(output_of_dropout_layer).
But instead of using a Sequential model, you will need a functional API Model.
inputs = tf.keras.layers.Input((28, 28, 1))
outputs = tf.keras.layers.Flatten(name="flat")(inputs)
outputs = tf.keras.layers.Dense(
    512,
    # activation='relu', # relu will be a problem here
    name='dense_1',
    kernel_initializer=tf.keras.initializers.GlorotUniform(seed=123),
    bias_initializer='zeros')(outputs)
outputs = tf.keras.layers.Dropout(0.2, name='dropout')(outputs)
mask = Lambda(getMask)(outputs)
# there isn't an "input_mask"

# add the missing relu:
outputs = tf.keras.layers.Activation('relu')(outputs)
outputs = tf.keras.layers.Dense(
    10,
    activation='softmax',
    name='dense_2',
    kernel_initializer=tf.keras.initializers.GlorotUniform(seed=123),
    bias_initializer='zeros')(outputs)
model = Model(inputs, outputs)
model.compile(...)
model.fit(...)
Training and predicting
Since you can't train the masks (it doesn't make any sense), they should not be outputs of the model for training.
Now, we could try this:
trainingModel = Model(inputs, outputs)
predictingModel = Model(inputs, [outputs, mask])
But masks don't exist in prediction, because dropout is only applied in training. So this doesn't bring us anything good in the end.
The only way for training is then using a dummy loss and dummy targets:
def dummyLoss(y_true, y_pred):
    return y_true  # but this might evoke a "None" gradient problem since it's not trainable; there is no connection to any weights, etc.

model.compile(loss=[loss_for_main_output, dummyLoss], ...)
model.fit(x_train, [y_train, np.zeros((len(y_train),) + mask_shape)], ...)
It's not guaranteed that these will work.
I found a very hacky way to do this by trivially extending the provided dropout layer. (Almost all code from TF.)
# TF-internal helpers used by the copied code below:
from tensorflow.python.framework import tensor_shape
from tensorflow.python.ops import array_ops, math_ops

class MyDR(tf.keras.layers.Layer):
    def __init__(self, rate, **kwargs):
        super(MyDR, self).__init__(**kwargs)
        self.noise_shape = None
        self.rate = rate

    def _get_noise_shape(self, x, noise_shape=None):
        # If noise_shape is None, return immediately.
        if noise_shape is None:
            return array_ops.shape(x)
        try:
            # Best effort to figure out the intended shape.
            # If not possible, let the op handle it.
            # In eager mode an exception will show up.
            noise_shape_ = tensor_shape.as_shape(noise_shape)
        except (TypeError, ValueError):
            return noise_shape
        if x.shape.dims is not None and len(x.shape.dims) == len(noise_shape_.dims):
            new_dims = []
            for i, dim in enumerate(x.shape.dims):
                if noise_shape_.dims[i].value is None and dim.value is not None:
                    new_dims.append(dim.value)
                else:
                    new_dims.append(noise_shape_.dims[i].value)
            return tensor_shape.TensorShape(new_dims)
        return noise_shape

    def build(self, input_shape):
        self.noise_shape = input_shape
        print(self.noise_shape)
        super(MyDR, self).build(input_shape)

    @tf.function
    def call(self, input):
        self.noise_shape = self._get_noise_shape(input)
        random_tensor = tf.random.uniform(self.noise_shape, seed=1235, dtype=input.dtype)
        keep_prob = 1 - self.rate
        scale = 1 / keep_prob
        # NOTE: if (1.0 + rate) - 1 is equal to rate, then we want to consider that
        # float to be selected, hence we use a >= comparison.
        self.keep_mask = random_tensor >= self.rate
        # NOTE: here is where I save the binary masks.
        # The file grows quite big!
        tf.print(self.keep_mask, output_stream="file://temp/droput_mask.txt")
        ret = input * scale * math_ops.cast(self.keep_mask, input.dtype)
        return ret
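For completeness, a usage sketch (my assumption, not part of the original answer): MyDR drops in wherever Dropout was used, and the masks are streamed to the file configured in call():
# Usage sketch (assumed): MyDR replaces tf.keras.layers.Dropout.
inputs = tf.keras.layers.Input((28, 28, 1))
x = tf.keras.layers.Flatten()(inputs)
x = tf.keras.layers.Dense(512, activation='relu')(x)
x = MyDR(0.2, name='my_dropout')(x)  # keep_mask is written out on every call
outputs = tf.keras.layers.Dense(10, activation='softmax')(x)
model = tf.keras.Model(inputs, outputs)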

How to create Keras model with optional inputs

I'm looking for a way to create a Keras model with optional inputs. In raw TensorFlow, you can create placeholders with optional inputs as follows:
import numpy as np
import tensorflow as tf

def main():
    required_input = tf.placeholder(
        tf.float32,
        shape=(None, 2),
        name='required_input')

    default_optional_input = tf.random_uniform(
        shape=(tf.shape(required_input)[0], 3))
    optional_input = tf.placeholder_with_default(
        default_optional_input,
        shape=(None, 3),
        name='optional_input')

    output = tf.concat((required_input, optional_input), axis=-1)

    with tf.Session() as session:
        with_optional_input_output_np = session.run(output, feed_dict={
            required_input: np.random.uniform(size=(4, 2)),
            optional_input: np.random.uniform(size=(4, 3)),
        })
        print(f"with optional input: {with_optional_input_output_np}")

        without_optional_input_output_np = session.run(output, feed_dict={
            required_input: np.random.uniform(size=(4, 2)),
        })
        print(f"without optional input: {without_optional_input_output_np}")

if __name__ == '__main__':
    main()
In a similar fashion, I would like to be able to have optional inputs for my Keras model. It seems like the tensor argument in keras.layers.Input.__init__ might be what I'm looking for, but it doesn't work the way I expected (i.e. the same way as tf.placeholder_with_default shown above). Here's an example that breaks:
import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp

def create_model(output_size):
    required_input = tf.keras.layers.Input(
        shape=(13, ), dtype='float32', name='required_input')

    batch_size = tf.shape(required_input)[0]

    def sample_optional_input(inputs, batch_size=None):
        base_distribution = tfp.distributions.MultivariateNormalDiag(
            loc=tf.zeros(output_size),
            scale_diag=tf.ones(output_size),
            name='sample_optional_input')
        return base_distribution.sample(batch_size)

    default_optional_input = tf.keras.layers.Lambda(
        sample_optional_input,
        arguments={'batch_size': batch_size}
    )(None)

    optional_input = tf.keras.layers.Input(
        shape=(output_size, ),
        dtype='float32',
        name='optional_input',
        tensor=default_optional_input)

    concat = tf.keras.layers.Concatenate(axis=-1)(
        [required_input, optional_input])
    dense = tf.keras.layers.Dense(
        output_size, activation='relu')(concat)

    model = tf.keras.Model(
        inputs=[required_input, optional_input],
        outputs=[dense])
    return model

def main():
    model = create_model(output_size=3)

    required_input_np = np.random.normal(size=(4, 13))
    outputs_np = model.predict({'required_input': required_input_np})
    print(f"outputs_np: {outputs_np}")

    required_input = tf.random_normal(shape=(4, 13))
    outputs = model({'required_input': required_input})
    print(f"outputs: {outputs}")

if __name__ == '__main__':
    main()
The first call to the model.predict seems to give correct output, but for some reason, the direct call to model fails with the following error:
ValueError: Layer model expects 2 inputs, but it received 1 input tensors. Inputs received: []
Can the tensor argument in Input.__init__ be used to implement optional inputs for Keras model as in my example above? If yes, what should I change in my example to make it run correctly? If not, what is the expected way of creating optional inputs in Keras?
I really don't think it's possible without workarounds. Keras was not meant for that.
But, noticing that you run a different session.run command for each case, it should be easy to do this with two models. One model uses the optional input, the other doesn't. You choose which one to use the same way you choose which session.run() to call.
That said, you can use Input(tensor=...) or simply create the optional input inside a Lambda layer. Both things are fine. But don't use Input(shape=..., tensor=...), these are redundant arguments and sometimes Keras does not deal well with redundancies like this.
Ideally, keep all operations inside Lambda layers, even the tf.shape operation.
That said:
required_input = tf.keras.layers.Input(
    shape=(13, ), dtype='float32', name='required_input')

# needs the input for the case you want to pass it:
optional_input_when_used = tf.keras.layers.Input(shape=(output_size,))

# operations should be inside Lambda layers
batch_size = Lambda(lambda x: tf.shape(x)[0])(required_input)

# updated to use the batch size coming from the Lambda
# (you didn't use "inputs" anywhere in this function)
def sample_optional_input(batch_size):
    base_distribution = tfp.distributions.MultivariateNormalDiag(
        loc=tf.zeros(output_size),
        scale_diag=tf.ones(output_size),
        name='sample_optional_input')
    return base_distribution.sample(batch_size)

# updated to take the batch size as input
default_optional_input = tf.keras.layers.Lambda(sample_optional_input)(batch_size)

# let's skip the concat for now - notice I'm not "using" this layer yet
dense_layer = tf.keras.layers.Dense(output_size, activation='relu')
# you could create the rest of the model here if it's big, so you don't create it twice
# (check the final section of this answer)
Model using passed input:
concat_when_used = tf.keras.layers.Concatenate(axis=-1)(
    [required_input, optional_input_when_used])
dense_when_used = dense_layer(concat_when_used)
# or final_part_of_the_model(concat_when_used)
model_when_used = Model([required_input, optional_input_when_used], dense_when_used)
Model not using the optional input:
concat_not_used = tf.keras.layers.Concatenate(axis=-1)(
    [required_input, default_optional_input])
dense_not_used = dense_layer(concat_not_used)
# or final_part_of_the_model(concat_not_used)
model_not_used = Model(required_input, dense_not_used)
It's ok to create two models like this and choose one to use (both models share the final layers, so they will always be trained together).
Now, at the point where you would choose which session.run to call, you will instead choose which model to use:
model_when_used.predict([x1, x2])
model_when_used.fit([x1,x2], y)
model_not_used.predict(x)
model_not_used.fit(x, y)
How to create a shared final part?
If your final part is big, you will not want to call everything twice to create two models. In this case, create a final model first:
input_for_final = Input(shape_after_concat)
out = Dense(....)(input_for_final)
out = Dense(....)(out)
out = Dense(....)(out)
.......
final_part_of_the_model = Model(input_for_final, out)
Then use this final part in the two models above:
dense_when_used = final_part_of_the_model(concat_when_used)
dense_not_used = final_part_of_the_model(concat_not_used)
