Can I use loops inside a model using the functional API?

I have a trained keras model which takes inputs of size (batchSize,2). This works well and gives good results.
My main problem is building a model that takes as input a tensor of size (batchSize, 2, 16), slices it inside the model into 16 tensors of size (batchSize, 2), applies the trained model to each of them, and concatenates the outputs.
I have used this code:
y = layers.Input(shape=(2,16,))
model_x = load_model('saved_model')
for i in range(16):
    x_input = Lambda(lambda x: x[:, :, i])(y)
    if i == 0:
        x_output = model_x(x_input)
    else:
        x_output = layers.concatenate([x_output,
                                       model_x(x_input)])
x_output = Lambda(lambda x: x[:, :tf.cast(N, tf.int32)])(x_output)
final_model = Model(y, x_output)
Although the saved model performs well on its own, this combined model does not train well and doesn't reach the intended performance.
What can I do to get better results?

I can't say much about the poor performance of your final model, because it might be due to various reasons that are not evident from your question. But to answer your original question: yes, you can use for loops that way, because you are essentially creating layers/tensors and connecting them to each other (i.e. building the graph of the model). So it's a valid thing to do. The problem might be somewhere else, e.g. wrong indexing, a wrong loss function, etc.
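One indexing pitfall worth ruling out in the loop from the question (a hedged suggestion, not a confirmed diagnosis): `lambda x: x[:, :, i]` captures the variable `i`, not its value at that iteration. Python looks `i` up when the lambda actually runs, so any lambda executed after the loop has finished sees `i == 15`. In TF2, a functional model generally re-runs its layers' call logic when the model itself is called (for example during `fit`), so binding the value with a default argument is the safer pattern. The plain-Python sketch below shows the difference; `make_slicers_late` and `make_slicers_bound` are hypothetical names.

```python
# Plain-Python illustration of late-binding closures (no Keras needed).

def make_slicers_late(n):
    # Each lambda refers to the loop variable `i` itself, not its current value.
    return [lambda row: row[i] for i in range(n)]


def make_slicers_bound(n):
    # `i=i` evaluates `i` once per iteration and stores it as a default argument.
    return [lambda row, i=i: row[i] for i in range(n)]


row = list(range(100, 116))  # 16 "segments", like the 16 slices in the question

late = [f(row) for f in make_slicers_late(16)]
bound = [f(row) for f in make_slicers_bound(16)]

print(late[:3])   # every slicer returns row[15] -> [115, 115, 115]
print(bound[:3])  # each slicer returns its own element -> [100, 101, 102]

# In the Keras loop, the same fix would read (hedged sketch):
#     x_input = Lambda(lambda x, i=i: x[:, :, i])(y)
```

If `late` and `bound` differ in your own Keras setup too, every `Lambda` in the loop may be slicing the last segment, which would be consistent with the poor training reported above.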
Further, you can build your final model in a much simpler way. You already have a trained model which gets inputs of shape (batch_size, 2) and gives outputs of shape (batch_size, 8). Now you want to build a model which takes inputs of shape (batch_size, 2, 16), applies the already trained model to each of the 16 (batch_size, 2) segments, and then concatenates the results. You can easily do that with a TimeDistributed wrapper:
from tensorflow.keras import layers, Model
from tensorflow.keras.models import load_model

# load your already trained model
model_x = load_model('saved_model')
inp = layers.Input(shape=(2,16))
# swap the last two axes so that the input shape becomes `(16,2)`
x = layers.Permute((2,1))(inp)
# this applies `model_x` to each of the 16 segments; the output shape is (None, 16, 8)
x = layers.TimeDistributed(model_x)(x)
# flatten to get a shape of (None, 128)
out = layers.Flatten()(x)
final_model = Model(inp, out)
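To check that the TimeDistributed version produces the same layout as the slice-and-concatenate loop in the question, here is a NumPy sketch. `model_x` is replaced by a hypothetical random linear map with output size 8 (matching the answer's assumption), so no Keras is needed; the per-segment loop plays the role of `TimeDistributed`.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, segments, out_dim = 4, 16, 8

# Hypothetical stand-in for `model_x`: maps (batch, 2) -> (batch, 8).
W = rng.normal(size=(2, out_dim))
b = rng.normal(size=out_dim)


def model_x(x):
    return np.tanh(x @ W + b)


y = rng.normal(size=(batch, 2, segments))  # same layout as Input(shape=(2, 16))

# Approach from the question: slice, apply, concatenate along the last axis.
loop_out = np.concatenate([model_x(y[:, :, i]) for i in range(segments)], axis=-1)

# Approach from the answer: Permute((2, 1)) -> (batch, 16, 2),
# apply per segment (what TimeDistributed does), then Flatten.
permuted = np.transpose(y, (0, 2, 1))
per_segment = np.stack([model_x(permuted[:, t, :]) for t in range(segments)], axis=1)
flat_out = per_segment.reshape(batch, -1)

print(loop_out.shape, flat_out.shape)   # (4, 128) (4, 128)
print(np.allclose(loop_out, flat_out))  # True
```

Because Flatten reads the (16, 8) block row by row, the 8 outputs of segment 0 come first, then segment 1, and so on, which is the same order the loop's concatenation produces.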

Related

TimeDistributed(Dense()) vs Dense() after lstm

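from tensorflow.keras.layers import Input, Embedding, SpatialDropout1D, Bidirectional, LSTM, TimeDistributed, Dense
from tensorflow.keras.models import Model

input_word = Input(shape=(max_len,))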
model = Embedding(input_dim=num_words, output_dim=50, input_length=max_len)(input_word)
model = SpatialDropout1D(0.1)(model)
model = Bidirectional(LSTM(units=100, return_sequences=True, recurrent_dropout=0.1))(model)
out = TimeDistributed(Dense(num_tags, activation="softmax"))(model)
#out = Dense(num_tags, activation="softmax")(model)
model = Model(input_word, out)
model.summary()
I get the same result whether I use just a Dense layer or wrap it in TimeDistributed. In which cases should I use TimeDistributed?
TimeDistributed is only necessary for certain layers that cannot handle additional dimensions in their implementation. E.g. MaxPool2D only works on 4D tensors (shape batch x height x width x channels) and will crash if you, say, add a time dimension:
import tensorflow as tf

tfkl = tf.keras.layers
a = tf.random.normal((16, 32, 32, 3))
tfkl.MaxPool2D()(a)  # this works
a = tf.random.normal((16, 5, 32, 32, 3))  # added a 5th dimension
tfkl.MaxPool2D()(a)  # this fails: MaxPool2D expects a 4D input
Here, adding TimeDistributed will fix it:
tfkl.TimeDistributed(tfkl.MaxPool2D())(a) # works with a being 5d!
However, many layers already support arbitrary input shapes and will automatically distribute the computations over those dimensions. One of these is Dense -- it is always applied to the last axis in your input and distributed over all others, so TimeDistributed isn't necessary. In fact, as you noted, it changes nothing about the output.
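A NumPy sketch of why the two give the same output: Dense multiplies only the last axis by its kernel, which is the same as applying the layer separately at every timestep. The sizes are hypothetical (200 features, as from a Bidirectional LSTM with 100 units per direction), and the softmax is omitted because it is also applied per timestep along the last axis.

```python
import numpy as np

rng = np.random.default_rng(1)
batch, timesteps, features, units = 2, 5, 200, 7

x = rng.normal(size=(batch, timesteps, features))  # e.g. the BiLSTM output
kernel = rng.normal(size=(features, units))
bias = rng.normal(size=units)

# What Dense computes on a 3D input: a matmul on the last axis only.
dense_out = x @ kernel + bias

# What TimeDistributed(Dense) conceptually does: the same weights at each timestep.
td_out = np.stack([x[:, t, :] @ kernel + bias for t in range(timesteps)], axis=1)

print(dense_out.shape)                 # (2, 5, 7)
print(np.allclose(dense_out, td_out))  # True
```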
Still, it may change how exactly the computation is done. I'm not sure about this, but I would wager that not using TimeDistributed and relying on the Dense implementation itself may be more efficient.
According to the book Zero to Deep Learning by Francesco Mosconi, chapter 7:
If we want the model return an output sequence to be compared with the sequence of values in the labels, we will use the TimeDistributed layer wrapper around our output Dense layer. This method of training is called Teacher Forcing. If we didn’t create output sequences we wouldn't need Teacher Forcing (i.e. wouldn't need TimeDistributed wrapper).

How to generate dataset for multi output keras model

I followed the structured data classification code examples at keras.io to build a model for a rather simple classification task, similar to the one in the example.
I wanted to extend the model to handle a second output, but I cannot train the extended model. The dataset is generated as in the example (but with two results):
res1 = dataframe.pop("result1")
res2 = dataframe.pop("result2")
ds = tf.data.Dataset.from_tensor_slices((dict(dataframe),(res1,res2)))
The model is also similar to the example but using a two-dimensional output:
x = layers.Dense(32, activation="relu")(all_features)
x = layers.Dropout(0.5)(x)
output = layers.Dense(2, activation="sigmoid")(x)
model = keras.Model(all_inputs, output)
model.compile("adam", "binary_crossentropy", metrics=["accuracy"])
It compiles, but when I try to run fit...
model.fit(train_ds,epochs=30)
I get an error message:
ValueError: logits and labels must have the same shape ((None, 2) vs (None, 1))
How can I prepare the dataset to meet the shape constraints?
I believe you should use the zip() function to combine the two label columns into one:
ds = tf.data.Dataset.from_tensor_slices((dict(dataframe), list(zip(res1, res2))))
This way, from_tensor_slices() receives the labels as a single array of shape (N, 2), matching the model's single output of shape (None, 2), instead of a tuple of two separate label vectors, which Keras treats as targets for two separate outputs.
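A NumPy sketch of the label shapes involved; `res1` and `res2` here are hypothetical small arrays standing in for the popped dataframe columns. `np.stack(..., axis=1)` is shown as a vectorised alternative to `list(zip(...))` that yields the same (N, 2) array.

```python
import numpy as np

# Hypothetical label columns standing in for dataframe.pop("result1") / ("result2").
res1 = np.array([0, 1, 1, 0])
res2 = np.array([1, 1, 0, 0])

# A tuple of two vectors: two separate targets, each of shape (N,).
as_tuple = (res1, res2)
print([v.shape for v in as_tuple])  # [(4,), (4,)]

# Zipped into rows: one target of shape (N, 2), matching Dense(2, ...).
zipped = np.array(list(zip(res1, res2)))

# Equivalent, vectorised.
stacked = np.stack([res1, res2], axis=1)

print(zipped.shape)                     # (4, 2)
print(np.array_equal(zipped, stacked))  # True
```

Either array can then be passed as the label part, e.g. `tf.data.Dataset.from_tensor_slices((dict(dataframe), stacked))`.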

What does applying a layer on a model do?

I'm working with the tensorflow.keras API, and I've encountered a syntax I'm unfamiliar with, i.e., applying a layer to a sub-model's output, as shown in the following example from this tutorial:
from tensorflow.keras import Model, layers
from tensorflow.keras.applications import resnet
target_shape = (200, 200)
base_cnn = resnet.ResNet50(
    weights="imagenet", input_shape=target_shape + (3,), include_top=False
)
flatten = layers.Flatten()(base_cnn.output)
dense1 = layers.Dense(512, activation="relu")(flatten)
dense1 = layers.BatchNormalization()(dense1)
dense2 = layers.Dense(256, activation="relu")(dense1)
dense2 = layers.BatchNormalization()(dense2)
output = layers.Dense(256)(dense2)
embedding = Model(base_cnn.input, output, name="Embedding")
In the official reference of layers.Flatten, for example, I couldn't find an explanation of what applying it to a layer actually does. In the keras.Layer reference I've encountered this explanation:
call(self, inputs, *args, **kwargs): Called in __call__ after making sure build() has been called. call() performs the logic of applying the layer to the input tensors (which should be passed in as argument).
So my question is:
What does flatten = layers.Flatten()(base_cnn.output) do?
Calling a layer instance on a tensor, as in layers.Flatten()(base_cnn.output), first creates the layer and then applies it to that tensor: Keras records the connection and returns a new (symbolic) output tensor. Chaining such calls builds the graph that Model(base_cnn.input, output) later wraps into a model. Here you are creating a model on top of a pre-trained one. Its weights are trainable by default, so they will be updated along with the rest of your layers unless you freeze them (e.g. by setting trainable=False), which is common when you are only interested in extracting its useful features. A flattening operation converts a multidimensional output into a one-dimensional tensor per sample, and that is exactly what happens in this line. One-dimensional feature vectors are often the desired end result, especially in supervised learning. The output of the pre-trained ResNet50 model is (None, 7, 7, 2048), and you want to generate a 1D feature vector for each input and compare them, so you flatten that output, resulting in a tensor of shape (None, 100352), i.e. (None, 7 * 7 * 2048).
Alternatives to Flatten would be GlobalMaxPooling2D and GlobalAveragePooling2D, which downsample an input by taking the max or average value along the spatial dimensions. For more information on this topic check out this post.
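A NumPy sketch of the shapes these three options produce for the ResNet50 output above; the random array is a hypothetical stand-in for `base_cnn.output` with a batch of 2. For channels-last data, Flatten corresponds to a row-major reshape, and the global pooling layers reduce over the two spatial axes.

```python
import numpy as np

# Stand-in for base_cnn.output: 2 feature maps of shape (7, 7, 2048).
features = np.random.default_rng(2).random((2, 7, 7, 2048), dtype=np.float32)

# Flatten keeps every value and folds the spatial and channel axes into one.
flat = features.reshape(features.shape[0], -1)

# Global pooling reduces over height and width, keeping one value per channel.
gmp = features.max(axis=(1, 2))   # like GlobalMaxPooling2D
gap = features.mean(axis=(1, 2))  # like GlobalAveragePooling2D

print(flat.shape)            # (2, 100352)
print(gmp.shape, gap.shape)  # (2, 2048) (2, 2048)
```

The pooled vectors are about 49 times smaller than the flattened one, which is the usual reason to prefer them in front of a Dense layer.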

Why is MobileNet's mini-batch output different from its single-sample output?

I am trying to check whether the MobileNet feature vectors computed for a mini-batch are equal to those computed by feeding the elements of the mini-batch one by one.
Look at the following code:
import tensorflow as tf
import tensorflow_hub as hub

# runs inside a unittest.TestCase method, hence `self.assertTrue`
model = tf.keras.models.Sequential(
    (
        hub.KerasLayer("https://tfhub.dev/google/tf2-preview/mobilenet_v2/feature_vector/4",
                       output_shape=[1280],
                       trainable=False
        ),
    )
)
images = tf.random.uniform(shape=(20, 224, 224, 3))
features = model.predict(images)
for i in range(20):
    image = tf.reshape(images[i, ...], (1, 224, 224, 3))
    image_feature = model.predict(image)
    self.assertTrue((image_feature == features[i, ...]).all())
The assertTrue fails in my test. Shouldn't it give the same feature vector for every image, whether the images are fed as a mini-batch or one by one?
I guess it has something to do with which means and variances (of the BN layers) the model is using. If the moving means and variances from the training stage are used (and they should be, IMO), the single outputs should be exactly the same as the mini-batch output. A difference <= 10e-4 is still large enough to give inconsistent predictions.
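Independently of the BN explanation, float32 arithmetic is not associative, so running the same network with a different batch size can change the order of accumulations inside the kernels and shift results in the last digits; exact `==` comparison is therefore brittle. A hedged NumPy sketch; `batch_features` and `single_features` are hypothetical stand-ins for the two model outputs.

```python
import numpy as np

a = np.float32(1e8)
one = np.float32(1.0)

# Same three numbers, different summation order, different float32 result.
left = (a + one) - a   # 1.0 is lost when added to 1e8 in float32
right = (a - a) + one
print(left, right)     # 0.0 1.0

# Comparing outputs: exact equality vs. a tolerance-based check.
batch_features = np.array([0.12345678, 0.5], dtype=np.float32)
single_features = batch_features + np.float32(1e-6)  # tiny numerical drift

print(np.array_equal(batch_features, single_features))            # False
print(np.allclose(batch_features, single_features, atol=1e-5))    # True
```

In the test above, `np.testing.assert_allclose(image_feature[0], features[i], atol=...)` is one way to express this, with the tolerance chosen for how much drift you consider acceptable.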

Keras - model.predict() returns list instead of numpy array

I am using model.predict() on a testing tensor, which has the same size as the input used for training, (N_tr*70,1025,11,3).
The model is trained by regression, with three outputs as ground-truth, each of size (N_te*70,1025).
For reference, N_te=180 when testing the model.
According to the documentation, the output of model.predict() should be a NumPy array; instead I get a list of three elements, each with shape (N_te*70,1025).
I am afraid that the output might have been somehow shuffled (which would explain my unexpected results).
Do you have any advice on getting a NumPy array which is compatible with the one I used as ground-truth? If not, do you know any other workaround?
EDIT: added the neural network code
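from tensorflow.keras.layers import Input, Flatten, Dense, BatchNormalization, LeakyReLU
from tensorflow.keras.models import Model

input_img = Input(shape=(1025, 11, 3))
x = Flatten()(input_img)
for i in range(0, 4):
    x = Dense(1024*3)(x)
    x = BatchNormalization()(x)
    x = LeakyReLU()(x)
o0 = Dense(1025, activation='sigmoid')(x)
o1 = Dense(1025, activation='sigmoid')(x)
o2 = Dense(1025, activation='sigmoid')(x)
model = Model(input_img, [o0, o1, o2])  # assumed: this line was not shown in the question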
Model prediction:
output = model.predict(X_in, batch_size = batch_size, verbose=1)
It is expected that, for a multi-output model, predict returns a list of NumPy arrays, with each element being the corresponding output. Remember that the loss is computed separately between each output and its ground truth, so this format is already ideal for that purpose.
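If a single array is needed, the list can be stacked. The list follows the order of the outputs passed to `Model(...)`, so nothing is shuffled. A NumPy sketch with dummy predictions; the sizes and the contents of `output` are placeholders for what `model.predict` would return.

```python
import numpy as np

n_samples, n_bins = 6, 1025  # hypothetical sizes

# Stand-in for `model.predict(...)` on a model with outputs [o0, o1, o2]:
# a list of three arrays, in the same order as the outputs were declared.
output = [np.full((n_samples, n_bins), k, dtype=np.float32) for k in range(3)]

# Stack along a new axis; pick the axis that matches your ground-truth layout.
stacked_last = np.stack(output, axis=-1)   # (n_samples, 1025, 3)
stacked_first = np.stack(output, axis=0)   # (3, n_samples, 1025)

print(stacked_last.shape, stacked_first.shape)  # (6, 1025, 3) (3, 6, 1025)
print(stacked_last[0, 0])                       # [0. 1. 2.]
```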
