TensorFlow Probability - want NN to output multiple distributions - python

I have a simple model that currently outputs a single numerical value. I've adapted it to output a distribution using TFP (mean + standard deviation) so that I can understand the model's confidence in the prediction.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, input_shape=[len(df.columns),], activation='relu'),  # Should only be one input, so [1,]
    tf.keras.layers.Dense(10, activation='relu'),
    tf.keras.layers.Dense(2 * len(target.columns)),  # there are 2 outputs, so we want a mean + standard deviation for EACH of the outputs
    tfp.layers.DistributionLambda(
        lambda t: tfd.Normal(loc=t[..., :1],
                             scale=1e-3 + tf.math.softplus(0.05 * t[..., 1:]))
    )
])
The current 2 Dense outputs point to the mean + standard deviation of the output distribution.
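To see why the scale expression in the lambda always yields a valid standard deviation, here is a minimal pure-Python sketch of the same transform. The helper names `softplus` and `to_scale` are made up for illustration; the model itself uses `tf.math.softplus`.

```python
import math

def softplus(x):
    """Numerically stable softplus: log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def to_scale(raw):
    """Mirror of `1e-3 + softplus(0.05 * raw)` from the lambda above."""
    return 1e-3 + softplus(0.05 * raw)

# Any real-valued Dense output maps to a strictly positive scale.
for raw in (-200.0, -1.0, 0.0, 1.0, 200.0):
    print(f"raw={raw:>7.1f} -> scale={to_scale(raw):.6f}")
```

The `1e-3` offset keeps the scale away from zero even when the raw output is strongly negative, which avoids dividing by a vanishing standard deviation in the log-density.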
In my real dataset, I have two numerical values I attempt to predict based on input data. How do I make a model output two distributions? I think the final Dense layer would need to be 4 nodes (2 means and 2 standard deviations), but I'm not sure how to make this properly work with the Distribution Lambda. I'm hoping to have a single model that predicts this rather than having to train one model per target output.
EDIT: I created this Colab notebook so people can see what I'm getting at more easily. I simplified the example a bit further, and hopefully it's more self-explanatory what I'm trying to accomplish:
https://colab.research.google.com/drive/1Wlucked4V0z-Bm_ql8XJnOJL0Gm4EwnE?usp=sharing

Check out this guide on shapes in TFP: https://www.tensorflow.org/probability/examples/Understanding_TensorFlow_Distributions_Shapes
IIUC you'll want to output a distribution with batch_shape = [2]. This is effectively 2 distributions of the same family, with different parameters. Computations done with this batch of distributions (samples, pdf/log_pdf evaluations) will be vectorized (run in parallel).
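As a rough illustration of what a batch of two Normal distributions computes, the NumPy sketch below mimics the parameter slicing and the elementwise log-density by hand. It is a stand-in so the arithmetic can be inspected without TensorFlow; `split_params` and `normal_log_prob` are hypothetical helper names, not TFP functions, and the random inputs are placeholders.

```python
import numpy as np

def split_params(t):
    """Split 4 raw outputs per sample into 2 means and 2 positive scales."""
    loc = t[..., :2]
    # softplus written as logaddexp(0, x) for numerical stability
    scale = 1e-3 + np.logaddexp(0.0, 0.05 * t[..., 2:])
    return loc, scale

def normal_log_prob(y, loc, scale):
    """Elementwise log-density of Normal(loc, scale) evaluated at y."""
    z = (y - loc) / scale
    return -0.5 * z**2 - np.log(scale) - 0.5 * np.log(2.0 * np.pi)

rng = np.random.default_rng(0)
raw = rng.normal(size=(5, 4))      # pretend output of the last Dense layer
y = rng.normal(size=(5, 2))        # two targets per sample

loc, scale = split_params(raw)
log_prob = normal_log_prob(y, loc, scale)
print(log_prob.shape)              # one log-density per sample and per target
nll = -log_prob.sum(axis=-1)       # joint negative log-likelihood per sample
print(nll.shape)
```

Summing over the last axis treats the two targets as independent given the input, which is what a batch of two separate Normals implies.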

IIUC, and assuming you want to leave your tfp.layers.DistributionLambda as it is, you have a few options you can experiment with:
Option 1: Use two Dense layers with the Keras functional API:
# Your code
#[.....]
tfd = tfp.distributions
sample_layer = tfp.layers.DistributionLambda(
    lambda t: tfd.Normal(loc=t[..., :1],
                         scale=1e-3 + tf.math.softplus(0.05 * t[..., 1:])))
def get_df_model():
    inputs = tf.keras.layers.Input(shape=[len(df.columns),])
    x = tf.keras.layers.Dense(10, activation='relu')(inputs)
    x = tf.keras.layers.Dense(10, activation='relu')(x)
    outputs1 = tf.keras.layers.Dense(len(target.columns))(x)
    outputs2 = tf.keras.layers.Dense(len(target.columns))(x)  # there are 2 outputs, so we want a mean + standard deviation for EACH of the outputs
    outputs1 = sample_layer(outputs1)
    outputs2 = sample_layer(outputs2)
    model = tf.keras.Model(inputs, [outputs1, outputs2])
    negloglik = lambda y, rv_y: -rv_y.log_prob(y)
    model.compile(optimizer=tf.optimizers.Adam(learning_rate=0.01), loss=negloglik)
    return model
model = get_df_model()
model.summary()
model.fit(df, target, epochs=10)
Model: "model"
__________________________________________________________________________________________________
 Layer (type)                    Output Shape    Param #    Connected to
==================================================================================================
 input_1 (InputLayer)            [(None, 1)]     0          []
 dense_24 (Dense)                (None, 10)      20         ['input_1[0][0]']
 dense_25 (Dense)                (None, 10)      110        ['dense_24[0][0]']
 dense_26 (Dense)                (None, 2)       22         ['dense_25[0][0]']
 dense_27 (Dense)                (None, 2)       22         ['dense_25[0][0]']
 distribution_lambda_10 (Distri  ((None, 1),     0          ['dense_26[0][0]',
 butionLambda)                    (None, 1))                 'dense_27[0][0]']
==================================================================================================
Total params: 174
Trainable params: 174
Non-trainable params: 0
__________________________________________________________________________________________________
Epoch 1/10
157/157 [==============================] - 1s 2ms/step - loss: 522.2677 - distribution_lambda_10_loss: 247.8716 - distribution_lambda_10_1_loss: 274.3961
Epoch 2/10
157/157 [==============================] - 1s 3ms/step - loss: 20.3496 - distribution_lambda_10_loss: 9.5429 - distribution_lambda_10_1_loss: 10.8067
Epoch 3/10
157/157 [==============================] - 1s 6ms/step - loss: 13.7444 - distribution_lambda_10_loss: 6.6085 - distribution_lambda_10_1_loss: 7.1359
Epoch 4/10
157/157 [==============================] - 1s 7ms/step - loss: 11.3713 - distribution_lambda_10_loss: 5.5506 - distribution_lambda_10_1_loss: 5.8206
Epoch 5/10
157/157 [==============================] - 1s 4ms/step - loss: 10.2081 - distribution_lambda_10_loss: 5.0250 - distribution_lambda_10_1_loss: 5.1830
Epoch 6/10
157/157 [==============================] - 0s 3ms/step - loss: 9.5528 - distribution_lambda_10_loss: 4.7256 - distribution_lambda_10_1_loss: 4.8272
Epoch 7/10
157/157 [==============================] - 0s 2ms/step - loss: 9.1495 - distribution_lambda_10_loss: 4.5393 - distribution_lambda_10_1_loss: 4.6102
Epoch 8/10
157/157 [==============================] - 1s 6ms/step - loss: 8.8837 - distribution_lambda_10_loss: 4.4159 - distribution_lambda_10_1_loss: 4.4678
Epoch 9/10
157/157 [==============================] - 0s 3ms/step - loss: 8.7027 - distribution_lambda_10_loss: 4.3319 - distribution_lambda_10_1_loss: 4.3708
Epoch 10/10
157/157 [==============================] - 0s 3ms/step - loss: 8.5743 - distribution_lambda_10_loss: 4.2724 - distribution_lambda_10_1_loss: 4.3019
<keras.callbacks.History at 0x7f51001c2f50>
Note what the docs state regarding the distributions when using DistributionLambda:
By default, a distribution is represented as a tensor via a random draw, e.g., tfp.distributions.Distribution.sample
Option 2: Use one Dense layer and split the output into two:
def get_df_model():
    sample_layer = tfp.layers.DistributionLambda(
        lambda t: tfd.Normal(loc=t[..., :1],
                             scale=1e-3 + tf.math.softplus(0.05 * t[..., 1:])))
    inputs = tf.keras.layers.Input(shape=[len(df.columns),])
    x = tf.keras.layers.Dense(10, activation='relu')(inputs)
    x = tf.keras.layers.Dense(10, activation='relu')(x)
    x = tf.keras.layers.Dense(2 * len(target.columns))(x)
    x1, x2 = tf.split(x, num_or_size_splits=2, axis=-1)
    outputs1 = sample_layer(x1)
    outputs2 = sample_layer(x2)
    model = tf.keras.Model(inputs, [outputs1, outputs2])
    negloglik = lambda y, rv_y: -rv_y.log_prob(y)
    model.compile(optimizer=tf.optimizers.Adam(learning_rate=0.01), loss=negloglik)
    return model
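Option 3: Keep a single DistributionLambda and slice at :2, so the first two outputs are the means and the last two feed the standard deviations: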
# Your code
#[.....]
tfd = tfp.distributions
sample_layer = tfp.layers.DistributionLambda(
    lambda t: tfd.Normal(loc=t[..., :2],
                         scale=1e-3 + tf.math.softplus(0.05 * t[..., 2:])))
def get_df_model():
    inputs = tf.keras.layers.Input(shape=[len(df.columns),])
    x = tf.keras.layers.Dense(10, activation='relu')(inputs)
    x = tf.keras.layers.Dense(10, activation='relu')(x)
    outputs = tf.keras.layers.Dense(2 * len(target.columns))(x)
    outputs = sample_layer(outputs)
    model = tf.keras.Model(inputs, [outputs])
    negloglik = lambda y, rv_y: -rv_y.log_prob(y)
    model.compile(optimizer=tf.optimizers.Adam(learning_rate=0.01), loss=negloglik)
    return model
model = get_df_model()
model.summary()
model.fit(df, target, epochs=10)
Additionally: If you want to explicitly use independent distributions based on the parameters x1 and x2, try:
def get_df_model():
    inputs = tf.keras.layers.Input(shape=[len(df.columns),])
    x = tf.keras.layers.Dense(10, activation='relu')(inputs)
    x = tf.keras.layers.Dense(10, activation='relu')(x)
    x = tf.keras.layers.Dense(2 * len(target.columns))(x)
    x1, x2 = tf.split(x, num_or_size_splits=2, axis=-1)
    outputs1 = tfp.layers.DistributionLambda(
        lambda t: tfd.Normal(loc=t[..., :1],
                             scale=1e-3 + tf.math.softplus(0.05 * t[..., 1:])))(x1)
    outputs2 = tfp.layers.DistributionLambda(
        lambda t: tfd.Normal(loc=t[..., :1],
                             scale=1e-3 + tf.math.softplus(0.05 * t[..., 1:])))(x2)
    model = tf.keras.Model(inputs, [outputs1, outputs2])
    negloglik = lambda y, rv_y: -rv_y.log_prob(y)
    model.compile(optimizer=tf.optimizers.Adam(learning_rate=0.01), loss=negloglik)
    return model


WARNING:tensorflow:Model was constructed with shape (20, 37, 42) for input Tensor("input_5:0", shape=(20, 37, 42), dtype=float32), but

WARNING:tensorflow:Model was constructed with shape (20, 37, 42) for input Tensor("input_5:0", shape=(20, 37, 42), dtype=float32), but it was called on an input with incompatible shape (None, 37).
Hello! Deep learning noob here... I'm having trouble using LSTM layers.
The input is a length 37 float array containing 2 floats and a length 35 one-hot array converted into float. The output is a length 19 array with 0s and 1s. Like the title suggests, I'm having trouble reshaping my input data to fit the model, and I'm not even sure what input dimensions would be considered 'compatible'
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import random
inputs, outputs = [], []
for x in range(10000):
    tempi, tempo = [], []
    tempi.append(random.random() - 0.5)
    tempi.append(random.random() - 0.5)
    for x2 in range(35):
        if random.random() > 0.5:
            tempi.append(1.)
        else:
            tempi.append(0.)
    for x2 in range(19):
        if random.random() > 0.5:
            tempo.append(1.)
        else:
            tempo.append(0.)
    inputs.append(tempi)
    outputs.append(tempo)
batch = 20
timesteps = 42
training_units = 0.85
cutting_point_i = int(len(inputs)*training_units)
cutting_point_o = int(len(outputs)*training_units)
x_train, x_test = np.asarray(inputs[:cutting_point_i]), np.asarray(inputs[cutting_point_i:])
y_train, y_test = np.asarray(outputs[:cutting_point_o]), np.asarray(outputs[cutting_point_o:])
input_layer = keras.Input(shape=(37,timesteps),batch_size=batch)
dense = layers.LSTM(150, activation="sigmoid", return_sequences=True)
x = dense(input_layer)
hidden_layer_2 = layers.LSTM(150, activation="sigmoid", return_sequences=True)(x)
output_layer = layers.Dense(10, activation="softmax")(hidden_layer_2)
model = keras.Model(inputs=input_layer, outputs=output_layer, name="my_model")
Several problems here.
Your input has no time-step axis; an LSTM needs input of shape (n, time steps, features)
In input_shape, the time-step dimension comes first, not last
Your last LSTM layer returns sequences, so its output keeps a time axis and cannot be compared directly with a flat vector of 0s and 1s
What I did:
I added time steps to your data (7)
I permuted the dimensions in input_shape
I set the final return_sequences=False
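One way to add a time-step axis to flat data is sketched below in NumPy. It assumes that consecutive rows belong to the same sequence, which the original question does not state, and `to_windows` is a hypothetical helper rather than a Keras utility.

```python
import numpy as np

def to_windows(x, timesteps):
    """Group consecutive rows into non-overlapping windows.

    x: array of shape (n_rows, features)
    returns: array of shape (n_rows // timesteps, timesteps, features)
    """
    n_rows, features = x.shape
    usable = (n_rows // timesteps) * timesteps   # drop the incomplete tail
    return x[:usable].reshape(-1, timesteps, features)

flat = np.random.rand(10000, 37)        # 10000 samples, 37 features each
windows = to_windows(flat, timesteps=7)
print(windows.shape)                    # (samples, timesteps, features)
```

The resulting array has the time-step axis in the middle, matching `keras.Input(shape=(timesteps, features))`; the targets would need to be grouped the same way, one per window.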
Completely fixed example with generated data:
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
batch = 20
n_samples = 1000
timesteps = 7
features = 10
x_train = np.random.rand(n_samples, timesteps, features)
y_train = keras.utils.to_categorical(np.random.randint(0, 10, n_samples))
input_layer = keras.Input(shape=(timesteps, features),batch_size=batch)
dense = layers.LSTM(16, activation="sigmoid", return_sequences=True)(input_layer)
hidden_layer_2 = layers.LSTM(16, activation="sigmoid", return_sequences=False)(dense)
output_layer = layers.Dense(10, activation="softmax")(hidden_layer_2)
model = keras.Model(inputs=input_layer, outputs=output_layer, name="my_model")
model.compile(loss='categorical_crossentropy', optimizer='adam')
history = model.fit(x_train, y_train)
Train on 1000 samples
20/1000 [..............................] - ETA: 2:50 - loss: 2.5145
200/1000 [=====>........................] - ETA: 14s - loss: 2.3934
380/1000 [==========>...................] - ETA: 5s - loss: 2.3647
560/1000 [===============>..............] - ETA: 2s - loss: 2.3549
740/1000 [=====================>........] - ETA: 1s - loss: 2.3395
900/1000 [==========================>...] - ETA: 0s - loss: 2.3363
1000/1000 [==============================] - 4s 4ms/sample - loss: 2.3353
The correct input for your model is (20, 37, 42).
Note: Here 20 is the batch_size you have explicitly specified.
Code:
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
batch = 20
timesteps = 42
training_units = 0.85
x1 = tf.constant(np.random.randint(50, size =(1000,37, 42)), dtype = tf.float32)
y1 = tf.constant(np.random.randint(10, size =(1000,)), dtype = tf.int32)
input_layer = keras.Input(shape=(37,timesteps),batch_size=batch)
dense = layers.LSTM(150, activation="sigmoid", return_sequences=True)
x = dense(input_layer)
hidden_layer_2 = layers.LSTM(150, activation="sigmoid", return_sequences=True)(x)
hidden_layer_3 = layers.Flatten()(hidden_layer_2)
output_layer = layers.Dense(10, activation="softmax")(hidden_layer_3)
model = keras.Model(inputs=input_layer, outputs=output_layer, name="my_model")
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
tf.keras.utils.plot_model(model, 'my_first_model.png', show_shapes=True)
Model Architecture (the plot written by plot_model to my_first_model.png is not reproduced here; it shows the input size).
Code to Run:
model.fit(x = x1, y = y1, batch_size = batch, epochs = 10)
Note: Whatever batch_size you set in keras.Input, you have to pass the same batch_size to model.fit().
Output:
Epoch 1/10
50/50 [==============================] - 4s 89ms/step - loss: 2.3288 - accuracy: 0.0920
Epoch 2/10
50/50 [==============================] - 5s 91ms/step - loss: 2.3154 - accuracy: 0.1050
Epoch 3/10
50/50 [==============================] - 5s 101ms/step - loss: 2.3114 - accuracy: 0.0900
Epoch 4/10
50/50 [==============================] - 5s 101ms/step - loss: 2.3036 - accuracy: 0.1060
Epoch 5/10
50/50 [==============================] - 5s 99ms/step - loss: 2.2998 - accuracy: 0.1000
Epoch 6/10
50/50 [==============================] - 4s 89ms/step - loss: 2.2986 - accuracy: 0.1170
Epoch 7/10
50/50 [==============================] - 4s 84ms/step - loss: 2.2981 - accuracy: 0.1300
Epoch 8/10
50/50 [==============================] - 5s 103ms/step - loss: 2.2950 - accuracy: 0.1290
Epoch 9/10
50/50 [==============================] - 5s 106ms/step - loss: 2.2960 - accuracy: 0.1210
Epoch 10/10
50/50 [==============================] - 5s 97ms/step - loss: 2.2874 - accuracy: 0.1210
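The `50/50` step counter in the log follows directly from the sample count and the batch size. Below is a small sketch of that arithmetic, including a check for leftover samples; this matters when the batch size is fixed in `keras.Input`, because a smaller final batch may not match the declared shape. `steps_per_epoch` is a hypothetical helper.

```python
import math

def steps_per_epoch(n_samples, batch_size):
    """Return (number of steps, size of the last batch)."""
    steps = math.ceil(n_samples / batch_size)
    last = n_samples - (steps - 1) * batch_size
    return steps, last

print(steps_per_epoch(1000, 20))   # 1000 samples split into batches of 20
print(steps_per_epoch(1000, 32))   # here the last batch is smaller than 32
```

Choosing a batch size that divides the number of samples evenly, as 20 does for 1000, sidesteps the partial-batch case.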

How to use SVD inside keras layers?

My aim is to use SVD to PCA whiten the latent layer before passing it to the decoder module of an autoencoder. I have used tf.linalg.svd but it does not work since it does not contain necessary Keras parameters. So as a workaround I was trying to wrap it inside Lambda but got this error
AttributeError: 'tuple' object has no attribute 'shape'.
I tried SO (E.g. Using SVD in a custom layer in Keras/tensorflow) and did Google search for SVD in Keras but could not find any answers. I have attached a stripped but functional code here:
import numpy as np
import tensorflow as tf
from sklearn import preprocessing
from keras.layers import Lambda, Input, Dense, Multiply, Subtract
from keras.models import Model
from keras import backend as K
from keras.losses import mse
from keras import optimizers
from keras.callbacks import EarlyStopping
x = np.random.randn(100, 5)
train_data = preprocessing.scale(x)
input_shape = (5, )
original_dim = train_data.shape[1]
intermediate_dim_1 = 64
intermediate_dim_2 = 16
latent_dim = 2
batch_size = 10
epochs = 15
# build encoder model
inputs = Input(shape=input_shape, name='encoder_input')
layer_1 = Dense(intermediate_dim_1, activation='tanh') (inputs)
layer_2 = Dense(intermediate_dim_2, activation='tanh') (layer_1)
encoded_layer = Dense(latent_dim, name='latent_layer') (layer_2)
encoder = Model(inputs, encoded_layer, name='encoder')
encoder.summary()
# build decoder model
latent_inputs = Input(shape=(latent_dim,))
layer_1 = Dense(intermediate_dim_1, activation='tanh') (latent_inputs)
layer_2 = Dense(intermediate_dim_2, activation='tanh') (layer_1)
outputs = Dense(original_dim,activation='sigmoid') (layer_2)
decoder = Model(latent_inputs, outputs, name='decoder')
decoder.summary()
# mean removal and pca whitening
meanX = Lambda(lambda x: tf.reduce_mean(x, axis=0, keepdims=True))(encoded_layer)
standardized = Subtract()([encoded_layer, meanX])
sigma2 = K.dot(K.transpose(standardized), standardized)
sigma2 = Lambda(lambda x: x / batch_size)(sigma2)
s, u ,v = tf.linalg.svd(sigma2,compute_uv=True)
# s ,u ,v = Lambda(lambda x: tf.linalg.svd(x,compute_uv=True))(sigma2)
epsilon = 1e-6
# sqrt of number close to 0 leads to problem hence replace it with epsilon
si = tf.where(tf.less(s, epsilon), tf.sqrt(1 / epsilon) * tf.ones_like(s),
              tf.math.truediv(1.0, tf.sqrt(s)))
whitening_layer = u # tf.linalg.diag(si) # tf.transpose(v)
whitened_encoding = K.dot(standardized, whitening_layer)
# Connect models
z_decoded = decoder(standardized)
# z_decoded = decoder(whitened_encoding)
# Define losses
reconstruction_loss = mse(inputs,z_decoded)
# Instantiate autoencoder
ae = Model(inputs, z_decoded, name='autoencoder')
ae.add_loss(reconstruction_loss)
# callback = EarlyStopping(monitor='val_loss', patience=5)
adam = optimizers.adam(learning_rate=0.002)
ae.compile(optimizer=adam)
ae.summary()
ae.fit(train_data, epochs=epochs, batch_size=batch_size,
       validation_split=0.2, shuffle=True)
To reproduce the error uncomment these lines and comment the one preceding it:
z_decoded = decoder(whitened_encoding)
s ,u ,v = Lambda(lambda x: tf.linalg.svd(x,compute_uv=True))(sigma2)
I would appreciate it if someone could tell me how to wrap the SVD inside Keras layers or an alternate implementation.
Please note that I have not included the reparameterization trick to calculate the loss to keep the code simple.
Thank you !
I solved the problem. To use SVD inside Keras, we need to wrap it in a Lambda layer. However, a Lambda layer is expected to return a tensor (to which Keras attaches some additional attributes), whereas tf.linalg.svd returns a tuple (s, u, v), hence the 'tuple' object has no attribute 'shape' error. It is therefore best to do the remaining work inside the wrapped function and return a single tensor. Another problem with my code was the way the encoder and decoder models were combined, which I fixed by feeding the output of the encoder into the decoder. The working code is as follows:
import numpy as np
import tensorflow as tf
from sklearn import preprocessing
from keras.layers import Lambda, Input, Dense, Multiply, Subtract
from keras.models import Model
from keras import backend as K
from keras.losses import mse
from keras import optimizers
from keras.callbacks import EarlyStopping
def SVD(sigma2):
    s, u, v = tf.linalg.svd(sigma2, compute_uv=True)
    epsilon = 1e-6
    # sqrt of number close to 0 leads to problem hence replace it with epsilon
    si = tf.where(tf.less(s, epsilon),
                  tf.sqrt(1 / epsilon) * tf.ones_like(s),
                  tf.math.truediv(1.0, tf.sqrt(s)))
    whitening_layer = u # tf.linalg.diag(si) # tf.transpose(v)
    return whitening_layer
x = np.random.randn(100, 5)
train_data = preprocessing.scale(x)
input_shape = (5, )
original_dim = train_data.shape[1]
intermediate_dim_1 = 64
intermediate_dim_2 = 16
latent_dim = 2
batch_size = 10
epochs = 15
# build encoder model
inputs = Input(shape=input_shape, name='encoder_input')
layer_1 = Dense(intermediate_dim_1, activation='tanh') (inputs)
layer_2 = Dense(intermediate_dim_2, activation='tanh') (layer_1)
encoded_layer = Dense(latent_dim, name='latent_layer') (layer_2)
encoder = Model(inputs, encoded_layer, name='encoder')
encoder.summary()
# build decoder model
latent_inputs = Input(shape=(latent_dim,))
layer_1 = Dense(intermediate_dim_1, activation='tanh') (latent_inputs)
layer_2 = Dense(intermediate_dim_2, activation='tanh') (layer_1)
outputs = Dense(original_dim,activation='sigmoid') (layer_2)
decoder = Model(latent_inputs, outputs, name='decoder')
decoder.summary()
# mean removal and pca whitening
meanX = Lambda(lambda x: tf.reduce_mean(x, axis=0, keepdims=True))(encoded_layer)
standardized = Subtract()([encoded_layer, meanX])
sigma2 = K.dot(K.transpose(standardized), standardized)
sigma2 = Lambda(lambda x: x / batch_size)(sigma2)
# s, u ,v = tf.linalg.svd(sigma2,compute_uv=True)
whitening_layer = Lambda(SVD)(sigma2)
'''
s ,u ,v = Lambda(lambda x: tf.linalg.svd(x,compute_uv=True))(sigma2)
epsilon = 1e-6
# sqrt of number close to 0 leads to problem hence replace it with epsilon
si = tf.where(tf.less(s, epsilon),
tf.sqrt(1 / epsilon) * tf.ones_like(s),
tf.math.truediv(1.0, tf.sqrt(s)))
whitening_layer = u # tf.linalg.diag(si) # tf.transpose(v)
'''
print('whitening_layer shape=', np.shape(whitening_layer))
print('standardized shape=', np.shape(standardized))
whitened_encoding = K.dot(standardized, whitening_layer)
# Connect models
# z_decoded = decoder(standardized)
z_decoded = decoder(encoder(inputs))
# Define losses
reconstruction_loss = mse(inputs,z_decoded)
# Instantiate autoencoder
ae = Model(inputs, z_decoded, name='autoencoder')
ae.add_loss(reconstruction_loss)
# callback = EarlyStopping(monitor='val_loss', patience=5)
adam = optimizers.adam(learning_rate=0.002)
ae.compile(optimizer=adam)
ae.summary()
ae.fit(train_data, epochs=epochs, batch_size=batch_size,
       validation_split=0.2, shuffle=True)
The output of running the code is as follows:
Model: "encoder"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
encoder_input (InputLayer) (None, 5) 0
_________________________________________________________________
dense_1 (Dense) (None, 64) 384
_________________________________________________________________
dense_2 (Dense) (None, 16) 1040
_________________________________________________________________
latent_layer (Dense) (None, 2) 34
=================================================================
Total params: 1,458
Trainable params: 1,458
Non-trainable params: 0
_________________________________________________________________
Model: "decoder"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 2) 0
_________________________________________________________________
dense_3 (Dense) (None, 64) 192
_________________________________________________________________
dense_4 (Dense) (None, 16) 1040
_________________________________________________________________
dense_5 (Dense) (None, 5) 85
=================================================================
Total params: 1,317
Trainable params: 1,317
Non-trainable params: 0
_________________________________________________________________
whitening_layer shape= (2, 2)
standardized shape= (None, 2)
/home/manish/anaconda3/envs/ica_gpu/lib/python3.7/site-packages/keras/engine/training_utils.py:819: UserWarning: Output decoder missing from loss dictionary. We assume this was done on purpose. The fit and evaluate APIs will not be expecting any data to be passed to decoder.
'be expecting any data to be passed to {0}.'.format(name))
Model: "autoencoder"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
encoder_input (InputLayer) (None, 5) 0
_________________________________________________________________
encoder (Model) (None, 2) 1458
_________________________________________________________________
decoder (Model) (None, 5) 1317
=================================================================
Total params: 2,775
Trainable params: 2,775
Non-trainable params: 0
_________________________________________________________________
Train on 80 samples, validate on 20 samples
Epoch 1/15
2020-05-16 16:01:55.443061: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
80/80 [==============================] - 0s 3ms/step - loss: 1.1739 - val_loss: 1.2238
Epoch 2/15
80/80 [==============================] - 0s 228us/step - loss: 1.0601 - val_loss: 1.0921
Epoch 3/15
80/80 [==============================] - 0s 261us/step - loss: 0.9772 - val_loss: 1.0291
Epoch 4/15
80/80 [==============================] - 0s 223us/step - loss: 0.9385 - val_loss: 0.9875
Epoch 5/15
80/80 [==============================] - 0s 262us/step - loss: 0.9105 - val_loss: 0.9560
Epoch 6/15
80/80 [==============================] - 0s 240us/step - loss: 0.8873 - val_loss: 0.9335
Epoch 7/15
80/80 [==============================] - 0s 217us/step - loss: 0.8731 - val_loss: 0.9156
Epoch 8/15
80/80 [==============================] - 0s 253us/step - loss: 0.8564 - val_loss: 0.9061
Epoch 9/15
80/80 [==============================] - 0s 273us/step - loss: 0.8445 - val_loss: 0.8993
Epoch 10/15
80/80 [==============================] - 0s 235us/step - loss: 0.8363 - val_loss: 0.8937
Epoch 11/15
80/80 [==============================] - 0s 283us/step - loss: 0.8299 - val_loss: 0.8874
Epoch 12/15
80/80 [==============================] - 0s 254us/step - loss: 0.8227 - val_loss: 0.8832
Epoch 13/15
80/80 [==============================] - 0s 227us/step - loss: 0.8177 - val_loss: 0.8789
Epoch 14/15
80/80 [==============================] - 0s 241us/step - loss: 0.8142 - val_loss: 0.8725
Epoch 15/15
80/80 [==============================] - 0s 212us/step - loss: 0.8089 - val_loss: 0.8679
I hope this helps.
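For reference, here is a NumPy-only sketch of PCA whitening via SVD of the covariance matrix, the operation the Lambda layer is meant to apply to a batch of latent codes. It is not the code from the post: the whitening matrix used here, `u @ diag(1/sqrt(s))`, is an assumption about the intended formula, `pca_whiten` is a hypothetical helper, and the data is synthetic.

```python
import numpy as np

def pca_whiten(z, epsilon=1e-6):
    """PCA-whiten a batch z of shape (batch, dim)."""
    centered = z - z.mean(axis=0, keepdims=True)
    sigma2 = centered.T @ centered / z.shape[0]        # (dim, dim) covariance
    # Note the return order: NumPy gives (u, s, vh), tf.linalg.svd gives (s, u, v).
    u, s, _ = np.linalg.svd(sigma2)
    # Guard against division by (near-)zero singular values.
    si = np.where(s < epsilon, np.sqrt(1.0 / epsilon), 1.0 / np.sqrt(s))
    return centered @ u @ np.diag(si)

rng = np.random.default_rng(0)
# Correlated 2-D latent codes, so whitening has something to undo.
z = rng.normal(size=(500, 2)) @ np.array([[2.0, 0.0], [1.5, 0.5]])
w = pca_whiten(z)
cov_w = w.T @ w / w.shape[0]
print(np.round(cov_w, 3))       # close to the identity matrix
```

Checking that the covariance of the whitened batch is close to the identity is a quick way to confirm that a whitening step does what it is supposed to before wiring it into a model.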

Why is accuracy lower than 0.01, but prediction very good (99.99%)?

I built my first neural network with TensorFlow 2 in Python.
My idea was to build a neural network that converts 8-bit binary numbers to decimal numbers.
After a few tries, it works very precisely!
But what I don't understand: the accuracy is very low.
The second thing: the model has to train on over 200,000 values
for only 256 possible answers. Where is the mistake in my code/model?
# dataset
def dataset(length, num):
    global testdata, solution
    testdata = np.random.randint(2, size=(num, length))
    solution = testdata.copy()
    solution = np.zeros((num, 1))
    for i in range(num):
        for n in range(length):
            x = testdata[i, length - n - 1] * (2 ** n)
            solution[i] += x
length = 8
num = 220000
dataset(length, num)
# Model
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(8, activation='relu'),
    tf.keras.layers.Dense(1, activation='relu')
])
model.compile(optimizer='adam',
              loss='mean_squared_error',
              metrics=['accuracy'])
# Training and evaluation
model.fit(testdata, solution, epochs=4)
model.evaluate(t_testdata, t_solution, verbose=2)
model.summary()
loss: 6.6441e-05 - accuracy: 0.0077
Shouldn't it be like 0.77 or higher?
You should not use accuracy as a metric for a regression problem. Accuracy counts exact matches, so when the model outputs a single continuous value, even a tiny deviation from the target counts as a miss. Consider the example below.
Suppose you are trying to predict the value 15 and the model returns 14.99: the resulting accuracy is still zero.
m = tf.keras.metrics.Accuracy()
_ = m.update_state([[15]], [[14.99]])
m.result().numpy()
Result:
0.0
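The same effect can be reproduced without Keras. The sketch below computes exact-match accuracy and mean absolute error by hand for predictions that are off by a hair; the sample values are made up for illustration.

```python
y_true = [15.0, 6.0, 200.0]
y_pred = [14.99, 5.993815, 200.002]

# Exact-match accuracy: a prediction counts only if it equals the target.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Mean absolute error: measures how far off the predictions are.
mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Accuracy after rounding to the nearest integer, which is what the
# binary-to-decimal task actually cares about.
rounded_accuracy = sum(t == round(p) for t, p in zip(y_true, y_pred)) / len(y_true)

print(accuracy, round(mae, 4), rounded_accuracy)
```

Exact-match accuracy is zero even though every prediction rounds to the right integer, which is why an error-based metric is the more informative choice here.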
You can consider the below list of metrics for regression.
Regression metrics
MeanSquaredError class
RootMeanSquaredError class
MeanAbsoluteError class
MeanAbsolutePercentageError class
MeanSquaredLogarithmicError class
CosineSimilarity class
LogCoshError class
I have tried the same problem with one of the above listed metrics and below is the result.
def bin2int(bin_list):
    # bin_list = [0, 0, 0, 1]
    int_val = ""
    for k in bin_list:
        int_val += str(int(k))
    # int_val = 11011011
    return int(int_val, 2)
def dataset(num):
    # num - no of samples
    bin_len = 8
    X = np.zeros((num, bin_len))
    Y = np.zeros((num))
    for i in range(num):
        X[i] = np.around(np.random.rand(bin_len)).astype(int)
        Y[i] = bin2int(X[i])
    return X, Y
no_of_samples = 220000
trainX, trainY = dataset(no_of_samples)
testX, testY = dataset(5)
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(8, activation='relu'),
    tf.keras.layers.Dense(1, activation='relu')
])
model.compile(optimizer='adam',
              loss='mean_absolute_error',
              metrics=['mse'])
model.fit(trainX, trainY, validation_data=(testX, testY), epochs=4)
model.summary()
Output:
Epoch 1/4
6875/6875 [==============================] - 15s 2ms/step - loss: 27.6938 - mse: 2819.9429 - val_loss: 0.0066 - val_mse: 5.2560e-05
Epoch 2/4
6875/6875 [==============================] - 15s 2ms/step - loss: 0.0580 - mse: 0.1919 - val_loss: 0.0066 - val_mse: 6.0013e-05
Epoch 3/4
6875/6875 [==============================] - 16s 2ms/step - loss: 0.0376 - mse: 0.0868 - val_loss: 0.0106 - val_mse: 1.2932e-04
Epoch 4/4
6875/6875 [==============================] - 15s 2ms/step - loss: 0.0317 - mse: 0.0466 - val_loss: 0.0177 - val_mse: 3.2429e-04
Model: "sequential_11"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_24 (Dense) multiple 72
_________________________________________________________________
dense_25 (Dense) multiple 9
_________________________________________________________________
round_4 (Round) multiple 0
=================================================================
Total params: 81
Trainable params: 81
Non-trainable params: 0
Predict:
model.predict([[0., 0., 0., 0., 0., 1., 1., 0.]])
array([[5.993815]], dtype=float32)
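Since the decimal value is just a weighted sum of the bits, the mapping the network has to learn is linear. Here is a NumPy sketch of that closed form, which can serve as a sanity check for the prediction above (`bits_to_int` is a hypothetical helper):

```python
import numpy as np

def bits_to_int(bits):
    """Convert rows of bits (most significant bit first) to integers."""
    bits = np.asarray(bits)
    weights = 2 ** np.arange(bits.shape[-1] - 1, -1, -1)   # [128, 64, ..., 1]
    return bits @ weights

print(bits_to_int([[0, 0, 0, 0, 0, 1, 1, 0]]))   # the input used in model.predict
```

The exact answer for that input is 6, so the model's 5.993815 is correct once rounded.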

Batch Input Keras Shared Parameters

I am building a network to rank a set of N inputs. Ideally they should all be input at the same time and share parameters. Their target vector should be an N-hot vector to match the inputs.
This means my input should be (Batch_size, N, sequence_length, feature_length)
But keras will throw an error for any input larger than 3 dimensions as shown here:
ValueError: Input 0 is incompatible with layer lstm_2: expected
ndim=3, found ndim=4
My current keras set up is:
x = Input(shape=(72,300))
aux_input = Input(shape=(72, 4))
probs = Input(shape=(1,))
#dim_red_1 = Dense(100)(x)
dim_red_2 = Dense(20, activation='tanh')(x)
cat = concatenate([dim_red_2, aux_input])
encoded = LSTM(64)(cat)
cat2 = concatenate([encoded, probs])
output = Dense(1, activation='sigmoid')(cat2)
lstm_model = Model(inputs=[x, aux_input, probs], outputs=output)
lstm_model.compile(optimizer='ADAM', loss='binary_crossentropy', metrics=['accuracy'])
Is there a way to achieve this with Keras?
Although your code seems to be fine, make sure to import the right packages:
import numpy as np
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Dense, LSTM, Concatenate
a = np.zeros(shape=[1000, 72, 300])
b = np.zeros(shape=[1000, 72, 4])
c = np.zeros(shape=[1000, 1])
d = np.zeros(shape=[1000, 1])
x = Input(shape=(72, 300))
aux_input = Input(shape=(72, 4))
probs = Input(shape=(1,))
dim_red_2 = Dense(20, activation='tanh')(x)
cat = Concatenate()([dim_red_2, aux_input])
encoded = LSTM(64)(cat)
cat2 = Concatenate()([encoded, probs])
output = Dense(1, activation='sigmoid')(cat2)
lstm_model = Model(inputs=[x, aux_input, probs], outputs=output)
lstm_model.compile(optimizer='ADAM', loss='binary_crossentropy', metrics=['accuracy'])
lstm_model.summary()
lstm_model.fit([a, b, c], d, batch_size=256)
output:
256/1000 [======>.......................] - ETA: 2s - loss: 0.6931 - acc: 1.0000
512/1000 [==============>...............] - ETA: 1s - loss: 0.6910 - acc: 1.0000
768/1000 [======================>.......] - ETA: 0s - loss: 0.6885 - acc: 1.0000
1000/1000 [==============================] - 1s 1ms/step - loss: 0.6859 - acc: 1.00
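The answer above keeps the 3-D input. If the goal is to score N candidates at once with shared weights, one common pattern is to fold N into the batch axis, run the shared encoder, and unfold again; in Keras, the `TimeDistributed` wrapper may be a way to apply a layer across such an extra axis. The NumPy sketch below only illustrates the reshaping, with a stand-in encoder (`shared_encoder`, hypothetical) in place of the LSTM and with sizes shrunk for readability.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, n_items, seq_len, feat = 4, 3, 72, 8
w = rng.normal(size=(feat,))          # one weight vector shared by all N items

def shared_encoder(x):
    """Stand-in for an LSTM: (rows, seq_len, feat) -> one score per row."""
    return x.mean(axis=1) @ w

x = rng.normal(size=(batch, n_items, seq_len, feat))   # 4-D input
folded = x.reshape(batch * n_items, seq_len, feat)     # N folded into batch
scores = shared_encoder(folded).reshape(batch, n_items)

# A sigmoid per item gives values comparable with an N-hot target vector.
probs = 1.0 / (1.0 + np.exp(-scores))
print(probs.shape)
```

Because every item passes through the same function with the same weights, parameters are shared by construction, and the output has one score per item per example.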

MobileNet transfer learning in Keras for object localization extraction - loss computed as NaN

I'm trying to use Keras and its MobileNet implementation to do object localization (output the x/y coordinates of a few features, instead of classes) and I'm running into some likely very basic issue that I can't figure out.
My code looks like this:
# =============================
# Load MobileNet and change the top layers.
model = applications.MobileNet(weights="imagenet",
                               include_top=False,
                               input_shape=(224, 224, 3))
# Freeze all the layers except the very last 5.
for layer in model.layers[:-5]:
    layer.trainable = False
# Adding custom Layers at the end, after the last Conv2D layer.
x = model.output
x = GlobalAveragePooling2D()(x)
x = Reshape((1, 1, 1024))(x)
x = Dropout(0.5)(x)
x = Conv2D(1024, (1, 1), activation='relu', padding='same', name='conv_preds')(x)
x = Dense(1024, activation="relu")(x)
# I'd like this to output 4 variables, two pairs of x/y coordinates
x = Dense(PREDICT_SIZE, activation="sigmoid")(x)
predictions = Reshape((PREDICT_SIZE,))(x)
# =============================
# Create the new final model.
model_final = Model(input = model.input, output = predictions)
def custom_loss(y_true, y_pred):
    '''Trying to compute the Euclidean distance as a Loss Function'''
    return K.sqrt(K.sum(K.square(y_true - y_pred), axis=-1))
model_final.compile(loss = custom_loss,
                    optimizer = optimizers.adam(lr=0.0001),
                    metrics=["accuracy"])
With this model, then I load the data and try to train it.
x_train, y_train, x_val, y_val = load_data(DATASET_DIR)
# This load_data is my own implementation. It returns the images
# as tensors.
# ==> x_train[0].shape= (224, 224, 3)
#
# y_train and y_val look like this:
# ==> y_train[0]= [ 0.182 -0.0933 0.072 -0.0453]
#
# holding values in the [0, 1] interval for where the pixel
# is relative to the width/height of the image.
#
model_final.fit(x_train, y_train,
                batch_size=batch_size, epochs=5, shuffle=False,
                validation_data=(x_val, y_val))
Unfortunately, when I train this model, I get something like this:
Train on 45 samples, validate on 5 samples
Epoch 1/5
16/45 [=========>....................] - ETA: 2s - loss: nan - acc: 0.0625
32/45 [====================>.........] - ETA: 1s - loss: nan - acc: 0.0312
45/45 [==============================] - 4s - loss: nan - acc: 0.0222 - val_loss: nan - val_acc: 0.0000e+00
Epoch 2/5
16/45 [=========>....................] - ETA: 2s - loss: nan - acc: 0.0625
32/45 [====================>.........] - ETA: 1s - loss: nan - acc: 0.0312
45/45 [==============================] - 4s - loss: nan - acc: 0.0222 - val_loss: nan - val_acc: 0.0000e+00
Epoch 3/5
I'm at a loss about why my loss value is "nan". I must be doing something wrong, and I've tried to change everything - the loss function, the shape of the output... but I can't figure out what I'm doing wrong.
Any help would be appreciated!
UPDATE: it seems like the issue is in the way I load_data.
If I create the image data like this it fails and results in loss:nan
i = pil_image.open(img_filename)
img = image.load_img(img_filename, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = keras.applications.mobilenet.preprocess_input(x)
x_train = np.append(x_train, x, axis=0)
but if I do something trivial like this, 'fit' works just fine and computes real values for loss:
x_train = np.random.random((100, 224, 224, 3))
sigh I wonder what's happening...
UPDATE #2: I figured out what the issue was
Documenting this here in case it helps anybody.
The way to properly generate the input tensors for MobileNet is this one:
test_img=[]
for i in range(len(test)):
    temp_img = image.load_img(test_path + test['filename'][i], target_size=(224, 224))
    temp_img = image.img_to_array(temp_img)
    test_img.append(temp_img)
test_img = np.array(test_img)
test_img = preprocess_input(test_img)
Notice how making it into a numpy.array and running preprocess_input happens on the whole batch of images. Doing it image by image seems to not have worked (what I was doing before).
Hope this helps somebody someday.
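Here is a NumPy sketch of the batch-building pattern described above: collect the per-image arrays in a list, convert once, preprocess the whole batch, and verify that nothing non-finite slipped in. Random arrays stand in for loaded images, and `scale_to_unit_range` is a hypothetical stand-in for `preprocess_input` (it assumes the usual MobileNet scaling of pixel values to [-1, 1]).

```python
import numpy as np

def scale_to_unit_range(batch):
    """Stand-in for preprocess_input: map [0, 255] pixels to [-1, 1]."""
    return batch / 127.5 - 1.0

rng = np.random.default_rng(0)
images = []
for _ in range(8):
    # Placeholder for image.img_to_array(image.load_img(...)).
    img = rng.integers(0, 256, size=(224, 224, 3)).astype("float32")
    images.append(img)

x_train = np.array(images)               # one conversion for the whole batch
x_train = scale_to_unit_range(x_train)

# Cheap guard before calling fit(): NaN or inf inputs produce a NaN loss.
assert np.isfinite(x_train).all()
print(x_train.shape, x_train.min(), x_train.max())
```

Running a finiteness check like this on the real `x_train` and `y_train` is a quick way to tell whether a NaN loss comes from the data or from the model.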
