I built my first neural network with TensorFlow 2 in Python.
My idea was to build a neural network that learns to translate binary numbers (8-bit) into decimal numbers.
After a few tries: yes, it works very precisely!
But what I don't understand: the accuracy is very low.
The second thing is: the model has to train on over 200,000 values!
For 256 possible answers. Where is the failure in my code/model?
import numpy as np
import tensorflow as tf

# dataset
def dataset(length, num):
    global testdata, solution
    testdata = np.random.randint(2, size=(num, length))
    solution = np.zeros((num, 1))
    for i in range(num):
        for n in range(length):
            x = testdata[i, length - n - 1] * (2 ** n)
            solution[i] += x

length = 8
num = 220000
dataset(length, num)
# Model
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(8, activation='relu'),
    tf.keras.layers.Dense(1, activation='relu')
])

model.compile(optimizer='adam',
              loss='mean_squared_error',
              metrics=['accuracy'])
# Training and evaluation
model.fit(testdata, solution, epochs=4)
model.evaluate(t_testdata, t_solution, verbose=2)
model.summary()
loss: 6.6441e-05 - accuracy: 0.0077
Shouldn't it be like 0.77 or higher?
You should not use accuracy as the metric for a regression problem. Since you are trying to output a single continuous value, even a tiny deviation in the prediction is counted as wrong, so the accuracy ends up near zero. Consider the example below.
Say you are trying to predict the value 15 and the model returns 14.99: the resulting accuracy is still zero.
m = tf.keras.metrics.Accuracy()
_ = m.update_state([[15]], [[14.99]])
m.result().numpy()
Result:
0.0
You can consider the below list of metrics for regression (a short usage sketch follows the list).
Regression metrics
MeanSquaredError class
RootMeanSquaredError class
MeanAbsoluteError class
MeanAbsolutePercentageError class
MeanSquaredLogarithmicError class
CosineSimilarity class
LogCoshError class
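Any of these can be attached via model.compile, either by string alias or as a metric object. A minimal sketch with a placeholder model (the single Dense layer here is just for illustration):

import tensorflow as tf

# Placeholder regression model, only to show how regression metrics are attached
model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
model.compile(optimizer='adam',
              loss='mean_squared_error',
              metrics=[tf.keras.metrics.MeanAbsoluteError(),
                       tf.keras.metrics.RootMeanSquaredError()])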
I tried the same problem with one of the metrics listed above; the result is below.
import numpy as np
import tensorflow as tf

def bin2int(bin_list):
    # e.g. bin_list = [0, 0, 0, 1]
    int_val = ""
    for k in bin_list:
        int_val += str(int(k))
    # e.g. int_val = "0001"
    return int(int_val, 2)

def dataset(num):
    # num - number of samples
    bin_len = 8
    X = np.zeros((num, bin_len))
    Y = np.zeros((num))
    for i in range(num):
        X[i] = np.around(np.random.rand(bin_len)).astype(int)
        Y[i] = bin2int(X[i])
    return X, Y
no_of_samples = 220000
trainX, trainY = dataset(no_of_samples)
testX, testY = dataset(5)

model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(8, activation='relu'),
    tf.keras.layers.Dense(1, activation='relu')
])

model.compile(optimizer='adam',
              loss='mean_absolute_error',
              metrics=['mse'])

model.fit(trainX, trainY, validation_data=(testX, testY), epochs=4)
model.summary()
Output:
Epoch 1/4
6875/6875 [==============================] - 15s 2ms/step - loss: 27.6938 - mse: 2819.9429 - val_loss: 0.0066 - val_mse: 5.2560e-05
Epoch 2/4
6875/6875 [==============================] - 15s 2ms/step - loss: 0.0580 - mse: 0.1919 - val_loss: 0.0066 - val_mse: 6.0013e-05
Epoch 3/4
6875/6875 [==============================] - 16s 2ms/step - loss: 0.0376 - mse: 0.0868 - val_loss: 0.0106 - val_mse: 1.2932e-04
Epoch 4/4
6875/6875 [==============================] - 15s 2ms/step - loss: 0.0317 - mse: 0.0466 - val_loss: 0.0177 - val_mse: 3.2429e-04
Model: "sequential_11"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_24 (Dense) multiple 72
_________________________________________________________________
dense_25 (Dense) multiple 9
_________________________________________________________________
round_4 (Round) multiple 0
=================================================================
Total params: 81
Trainable params: 81
Non-trainable params: 0
Predict:
model.predict([[0., 0., 0., 0., 0., 1., 1., 0.]])
array([[5.993815]], dtype=float32)
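Since the target is an integer, the raw regression output can also simply be rounded after prediction. A small sketch, assuming the trained model from above:

import numpy as np

# Hypothetical post-processing: round the regression output to the nearest integer
pred = model.predict([[0., 0., 0., 0., 0., 1., 1., 0.]])
print(int(np.rint(pred[0, 0])))  # 5.993815 rounds to 6, the decimal value of 00000110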
What I'm trying to accomplish is to define a fixed relationship between some custom input data and output data, and then have a neural network figure out this relationship/rule so it can predict future output given the input. I've set up some test code here where a random list of inputs is generated; if an input is more than 0.5 the output is 1, otherwise the output is 0.
from tensorflow import keras
import numpy as np

# generate data
data_input_generate = np.random.random((6400, 1))
data_output_generate = np.random.randint(2, size=(6400, 1))
data_input = np.vstack([data_input_generate, data_input_generate])
data_output = np.vstack([data_output_generate, data_output_generate])

for i in range(len(data_input)):
    if data_input[i] >= 0.5:
        data_output[i] = [1]
    else:
        data_output[i] = [0]

# setup neural network
Inputs = keras.layers.Input(shape=(1, ))
hidden1 = keras.layers.Dense(units=100, activation="sigmoid")(Inputs)
hidden2 = keras.layers.Dense(units=100, activation='softmax')(hidden1)
predictions = keras.layers.Dense(units=1, activation='relu')(hidden2)

# initialize model
model = keras.Model([Inputs], outputs=predictions)
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# fit
model.fit(data_input, data_output, batch_size=10, epochs=5)

# predict
predictions = model.predict(data_input_generate)

# print predictions
for i in range(10):
    print(f"Value: {data_input_generate[i]}, Result: {data_output_generate[i]}, Prediction: {predictions[i]}")
The problem is, after fitting the model, the accuracy stays at 50%. Is this a problem with my layer activation function or the way I set up the model? My goal is to correctly predict the output with fairly high accuracy. Thanks in advance!
Try using a sigmoid activation function on your output layer. Here is a working example:
from tensorflow import keras
import numpy as np

# generate data
data_input_generate = np.random.random((6400, 1))
data_output_generate = np.random.randint(2, size=(6400, 1))
data_input = np.vstack([data_input_generate, data_input_generate])
data_output = np.vstack([data_output_generate, data_output_generate])

for i in range(len(data_input)):
    if data_input[i] >= 0.5:
        data_output[i] = [1]
    else:
        data_output[i] = [0]

# setup neural network
Inputs = keras.layers.Input(shape=(1, ))
hidden1 = keras.layers.Dense(units=64, activation="relu")(Inputs)
hidden2 = keras.layers.Dense(units=32, activation='relu')(hidden1)
dropout = keras.layers.Dropout(0.8)(hidden2)
predictions = keras.layers.Dense(units=1, activation='sigmoid')(dropout)

# initialize model
model = keras.Model([Inputs], outputs=predictions)
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# fit
model.fit(data_input, data_output, batch_size=10, epochs=5)

# predict
predictions = model.predict(data_input_generate)

# print predictions
for i in range(10):
    print(f"Value: {data_input_generate[i]}, Result: {data_output_generate[i]}, Prediction: {predictions[i]}")
Epoch 1/5
1280/1280 [==============================] - 4s 3ms/step - loss: 0.3632 - accuracy: 0.8327
Epoch 2/5
1280/1280 [==============================] - 3s 3ms/step - loss: 0.1870 - accuracy: 0.9427
Epoch 3/5
1280/1280 [==============================] - 3s 3ms/step - loss: 0.1528 - accuracy: 0.9475
Epoch 4/5
1280/1280 [==============================] - 3s 2ms/step - loss: 0.1461 - accuracy: 0.9482
Epoch 5/5
1280/1280 [==============================] - 2s 2ms/step - loss: 0.1384 - accuracy: 0.9493
Value: [0.79415764], Result: [0], Prediction: [0.9997529]
Value: [0.38311113], Result: [1], Prediction: [1.7478478e-05]
Value: [0.05360975], Result: [0], Prediction: [2.3240638e-07]
Value: [0.78635261], Result: [1], Prediction: [0.99970365]
Value: [0.74414175], Result: [1], Prediction: [0.99921006]
Value: [0.47845171], Result: [1], Prediction: [0.07256863]
Value: [0.53008247], Result: [0], Prediction: [0.886382]
Value: [0.40377478], Result: [1], Prediction: [9.9769844e-05]
Value: [0.18209166], Result: [1], Prediction: [5.199377e-07]
Value: [0.00937745], Result: [1], Prediction: [1.7613968e-07]
I have a simple model that currently outputs a single numerical value, which I've adapted to output a distribution instead using TFP (mean + standard deviation), so I can understand the model's confidence around the prediction.
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, input_shape=[len(df.columns),], activation='relu'),  # Should only be one input, so [1,]
    tf.keras.layers.Dense(10, activation='relu'),
    tf.keras.layers.Dense(2 * len(target.columns)),  # there are 2 outputs, so we want a mean + standard deviation for EACH of the outputs
    tfp.layers.DistributionLambda(
        lambda t: tfd.Normal(loc=t[..., :1],
                             scale=1e-3 + tf.math.softplus(0.05 * t[..., 1:]))
    )
])
The current 2 Dense outputs point to the mean + standard deviation of the output distribution.
In my real dataset, I have two numerical values I attempt to predict based on input data. How do I make a model output two distributions? I think the final Dense layer would need to be 4 nodes (2 means and 2 standard deviations), but I'm not sure how to make this properly work with the Distribution Lambda. I'm hoping to have a single model that predicts this rather than having to train one model per target output.
EDIT: I created this colab so people can see what I'm getting at a little more easily. I simplified the example a bit more and hopefully it is more self-explanatory what I'm trying to accomplish:
https://colab.research.google.com/drive/1Wlucked4V0z-Bm_ql8XJnOJL0Gm4EwnE?usp=sharing
Check out this guide on shapes in TFP: https://www.tensorflow.org/probability/examples/Understanding_TensorFlow_Distributions_Shapes
IIUC you'll want to output a distribution with batch_shape = [2]. This is effectively 2 distributions of the same family, with different parameters. Computations done with this batch of distributions (samples, pdf/log_pdf evaluations) will be vectorized (run in parallel).
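For intuition, here is a small standalone sketch (toy numbers, not part of the model) of a Normal with batch_shape = [2], i.e. two Normals of the same family handled as one object:

import tensorflow_probability as tfp

tfd = tfp.distributions

# Two Normals in one batched distribution: loc and scale each carry two entries
dist = tfd.Normal(loc=[0., 5.], scale=[1., 2.])
print(dist.batch_shape)         # (2,)
print(dist.log_prob([0., 5.]))  # log-density of each component, evaluated in parallel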
IIUC, and assuming you want to leave your tfp.layers.DistributionLambda as it is, you have a few options, which you can experiment with:
Option 1: Use two Dense layers with the Keras functional API:
# Your code
# [.....]
tfd = tfp.distributions

sample_layer = tfp.layers.DistributionLambda(
    lambda t: tfd.Normal(loc=t[..., :1],
                         scale=1e-3 + tf.math.softplus(0.05 * t[..., 1:])))

def get_df_model():
    inputs = tf.keras.layers.Input(shape=[len(df.columns),])
    x = tf.keras.layers.Dense(10, activation='relu')(inputs)
    x = tf.keras.layers.Dense(10, activation='relu')(x)
    outputs1 = tf.keras.layers.Dense(len(target.columns))(x)
    outputs2 = tf.keras.layers.Dense(len(target.columns))(x)  # there are 2 outputs, so we want a mean + standard deviation for EACH of the outputs
    outputs1 = sample_layer(outputs1)
    outputs2 = sample_layer(outputs2)
    model = tf.keras.Model(inputs, [outputs1, outputs2])

    negloglik = lambda y, rv_y: -rv_y.log_prob(y)
    model.compile(optimizer=tf.optimizers.Adam(learning_rate=0.01), loss=negloglik)
    return model

model = get_df_model()
model.summary()
model.fit(df, target, epochs=10)
Model: "model"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) [(None, 1)] 0 []
dense_24 (Dense) (None, 10) 20 ['input_1[0][0]']
dense_25 (Dense) (None, 10) 110 ['dense_24[0][0]']
dense_26 (Dense) (None, 2) 22 ['dense_25[0][0]']
dense_27 (Dense) (None, 2) 22 ['dense_25[0][0]']
distribution_lambda_10 (Distri ((None, 1), 0 ['dense_26[0][0]',
butionLambda) (None, 1)) 'dense_27[0][0]']
==================================================================================================
Total params: 174
Trainable params: 174
Non-trainable params: 0
__________________________________________________________________________________________________
Epoch 1/10
157/157 [==============================] - 1s 2ms/step - loss: 522.2677 - distribution_lambda_10_loss: 247.8716 - distribution_lambda_10_1_loss: 274.3961
Epoch 2/10
157/157 [==============================] - 1s 3ms/step - loss: 20.3496 - distribution_lambda_10_loss: 9.5429 - distribution_lambda_10_1_loss: 10.8067
Epoch 3/10
157/157 [==============================] - 1s 6ms/step - loss: 13.7444 - distribution_lambda_10_loss: 6.6085 - distribution_lambda_10_1_loss: 7.1359
Epoch 4/10
157/157 [==============================] - 1s 7ms/step - loss: 11.3713 - distribution_lambda_10_loss: 5.5506 - distribution_lambda_10_1_loss: 5.8206
Epoch 5/10
157/157 [==============================] - 1s 4ms/step - loss: 10.2081 - distribution_lambda_10_loss: 5.0250 - distribution_lambda_10_1_loss: 5.1830
Epoch 6/10
157/157 [==============================] - 0s 3ms/step - loss: 9.5528 - distribution_lambda_10_loss: 4.7256 - distribution_lambda_10_1_loss: 4.8272
Epoch 7/10
157/157 [==============================] - 0s 2ms/step - loss: 9.1495 - distribution_lambda_10_loss: 4.5393 - distribution_lambda_10_1_loss: 4.6102
Epoch 8/10
157/157 [==============================] - 1s 6ms/step - loss: 8.8837 - distribution_lambda_10_loss: 4.4159 - distribution_lambda_10_1_loss: 4.4678
Epoch 9/10
157/157 [==============================] - 0s 3ms/step - loss: 8.7027 - distribution_lambda_10_loss: 4.3319 - distribution_lambda_10_1_loss: 4.3708
Epoch 10/10
157/157 [==============================] - 0s 3ms/step - loss: 8.5743 - distribution_lambda_10_loss: 4.2724 - distribution_lambda_10_1_loss: 4.3019
<keras.callbacks.History at 0x7f51001c2f50>
Note what the docs state regarding the distributions when using DistributionLambda:
By default, a distribution is represented as a tensor via a random draw, e.g., tfp.distributions.Distribution.sample
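If a random draw is not what you want when the layer's output is used as a tensor, DistributionLambda also accepts a convert_to_tensor_fn argument. A minimal sketch, assuming the same loc/scale split as above (swap in whichever statistic you need):

import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

# Represent the layer's output by the distribution mean instead of a random draw
mean_layer = tfp.layers.DistributionLambda(
    lambda t: tfd.Normal(loc=t[..., :1],
                         scale=1e-3 + tf.math.softplus(0.05 * t[..., 1:])),
    convert_to_tensor_fn=tfd.Distribution.mean)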
Option 2: Use one Dense layer and split the output into two:
def get_df_model():
    sample_layer = tfp.layers.DistributionLambda(
        lambda t: tfd.Normal(loc=t[..., :1],
                             scale=1e-3 + tf.math.softplus(0.05 * t[..., 1:])))

    inputs = tf.keras.layers.Input(shape=[len(df.columns),])
    x = tf.keras.layers.Dense(10, activation='relu')(inputs)
    x = tf.keras.layers.Dense(10, activation='relu')(x)
    x = tf.keras.layers.Dense(2 * len(target.columns))(x)
    x1, x2 = tf.split(x, num_or_size_splits=2, axis=-1)
    outputs1 = sample_layer(x1)
    outputs2 = sample_layer(x2)
    model = tf.keras.Model(inputs, [outputs1, outputs2])

    negloglik = lambda y, rv_y: -rv_y.log_prob(y)
    model.compile(optimizer=tf.optimizers.Adam(learning_rate=0.01), loss=negloglik)
    return model
Option 3: Use slice :2
# Your code
# [.....]
tfd = tfp.distributions

sample_layer = tfp.layers.DistributionLambda(
    lambda t: tfd.Normal(loc=t[..., :2],
                         scale=1e-3 + tf.math.softplus(0.05 * t[..., 2:])))

def get_df_model():
    inputs = tf.keras.layers.Input(shape=[len(df.columns),])
    x = tf.keras.layers.Dense(10, activation='relu')(inputs)
    x = tf.keras.layers.Dense(10, activation='relu')(x)
    outputs = tf.keras.layers.Dense(2 * len(target.columns))(x)
    outputs = sample_layer(outputs)
    model = tf.keras.Model(inputs, [outputs])

    negloglik = lambda y, rv_y: -rv_y.log_prob(y)
    model.compile(optimizer=tf.optimizers.Adam(learning_rate=0.01), loss=negloglik)
    return model

model = get_df_model()
model.summary()
model.fit(df, target, epochs=10)
Additionally: If you want to explicitly use independent distributions based on the parameters x1 and x2, try:
def get_df_model():
    inputs = tf.keras.layers.Input(shape=[len(df.columns),])
    x = tf.keras.layers.Dense(10, activation='relu')(inputs)
    x = tf.keras.layers.Dense(10, activation='relu')(x)
    x = tf.keras.layers.Dense(2 * len(target.columns))(x)
    x1, x2 = tf.split(x, num_or_size_splits=2, axis=-1)
    outputs1 = tfp.layers.DistributionLambda(
        lambda t: tfd.Normal(loc=t[..., :1],
                             scale=1e-3 + tf.math.softplus(0.05 * t[..., 1:])))(x1)
    outputs2 = tfp.layers.DistributionLambda(
        lambda t: tfd.Normal(loc=t[..., :1],
                             scale=1e-3 + tf.math.softplus(0.05 * t[..., 1:])))(x2)
    model = tf.keras.Model(inputs, [outputs1, outputs2])

    negloglik = lambda y, rv_y: -rv_y.log_prob(y)
    model.compile(optimizer=tf.optimizers.Adam(learning_rate=0.01), loss=negloglik)
    return model
I am trying to use the TensorFlow maxout implementation (https://www.tensorflow.org/addons/api_docs/python/tfa/layers/Maxout) but struggle with it.
Let me illustrate my problem. If I have the following:
import tensorflow as tf
import tensorflow_addons as tfa
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

d = 3

x_in = Input(shape=d)
x_out = Dense(d, activation='relu')(x_in)
model = Model(inputs=x_in, outputs=x_out)
model.compile(optimizer='adam', loss='MeanAbsoluteError')

X = tf.random.normal([200, 3])
Y = tf.random.normal([200, 3])
model.fit(X, Y, epochs=5, batch_size=32)
Then it is working normally, i.e. the loss is continuously getting smaller and I can get the estimated weights:
model.layers[1].get_weights()
Out[141]:
[array([[-0.15133516, -0.14892222, -0.64674205],
[ 0.34437487, 0.7822309 , -0.08931279],
[-0.8330534 , -0.13827904, -0.23096593]], dtype=float32),
array([-0.03069788, -0.03311999, -0.02603031], dtype=float32)]
However, when I want to use a maxout activation instead, things do not work out:
d = 3

x_in = Input(shape=d)
x_out = tfa.layers.Maxout(3)(x_in)
model = Model(inputs=x_in, outputs=x_out)
model.compile(optimizer='adam', loss='MeanAbsoluteError')

X = tf.random.normal([200, 3])
Y = tf.random.normal([200, 3])
model.fit(X, Y, epochs=5, batch_size=32)
The loss stays constant across all epochs, and
model.layers[1].get_weights()
Out[141]: []
Where is my mistake?
It will only work in combination with another layer, for example a Dense layer. Also, the Maxout layer itself does not have any trainable weights, as you can see in the model summary, but it does have a hyperparameter num_units:
import tensorflow as tf
import tensorflow_addons as tfa
d=3
x_in=tf.keras.layers.Input(shape=d)
x = tf.keras.layers.Dense(3)(x_in)
x_out = tfa.layers.Maxout(3)(x)
model = tf.keras.Model(inputs=x_in, outputs=x_out)
model.compile(optimizer='adam', loss='MeanAbsoluteError')
X=tf.random.normal([200,3])
Y=tf.random.normal([200,3])
model.fit(X, Y, epochs=5, batch_size=32)
print(model.summary())
Epoch 1/5
7/7 [==============================] - 0s 2ms/step - loss: 1.0404
Epoch 2/5
7/7 [==============================] - 0s 3ms/step - loss: 1.0361
Epoch 3/5
7/7 [==============================] - 0s 2ms/step - loss: 1.0322
Epoch 4/5
7/7 [==============================] - 0s 2ms/step - loss: 1.0283
Epoch 5/5
7/7 [==============================] - 0s 3ms/step - loss: 1.0244
Model: "model_5"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_6 (InputLayer) [(None, 3)] 0
dense_5 (Dense) (None, 3) 12
maxout_4 (Maxout) (None, 3) 0
=================================================================
Total params: 12
Trainable params: 12
Non-trainable params: 0
_________________________________________________________________
None
Maybe also take a look at the paper regarding Maxout:
The maxout model is simply a feed-forward architecture, such as a multilayer perceptron or deep convolutional neural network, that uses a new type of activation function: the maxout unit.
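To make the grouping concrete, here is a tiny standalone sketch (toy numbers) of what tfa.layers.Maxout computes: the input channels are split into num_units groups and only the maximum of each group is kept:

import tensorflow as tf
import tensorflow_addons as tfa

# Toy input: one sample with 6 channels
x = tf.constant([[1., 5., 2., 4., 3., 6.]])

# Maxout(3) splits the 6 channels into 3 consecutive pairs and keeps the max of each pair
print(tfa.layers.Maxout(3)(x))  # expected: [[5., 4., 6.]]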
WARNING:tensorflow:Model was constructed with shape (20, 37, 42) for input Tensor("input_5:0", shape=(20, 37, 42), dtype=float32), but it was called on an input with incompatible shape (None, 37).
Hello! Deep learning noob here... I'm having trouble using LSTM layers.
The input is a length-37 float array containing 2 floats and a length-35 one-hot array converted into floats. The output is a length-19 array of 0s and 1s. Like the title suggests, I'm having trouble reshaping my input data to fit the model, and I'm not even sure what input dimensions would be considered 'compatible'.
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import random

inputs, outputs = [], []
for x in range(10000):
    tempi, tempo = [], []
    tempi.append(random.random() - 0.5)
    tempi.append(random.random() - 0.5)
    for x2 in range(35):
        if random.random() > 0.5:
            tempi.append(1.)
        else:
            tempi.append(0.)
    for x2 in range(19):
        if random.random() > 0.5:
            tempo.append(1.)
        else:
            tempo.append(0.)
    inputs.append(tempi)
    outputs.append(tempo)

batch = 20
timesteps = 42
training_units = 0.85

cutting_point_i = int(len(inputs) * training_units)
cutting_point_o = int(len(outputs) * training_units)
x_train, x_test = np.asarray(inputs[:cutting_point_i]), np.asarray(inputs[cutting_point_i:])
y_train, y_test = np.asarray(outputs[:cutting_point_o]), np.asarray(outputs[cutting_point_o:])

input_layer = keras.Input(shape=(37, timesteps), batch_size=batch)
dense = layers.LSTM(150, activation="sigmoid", return_sequences=True)
x = dense(input_layer)
hidden_layer_2 = layers.LSTM(150, activation="sigmoid", return_sequences=True)(x)
output_layer = layers.Dense(10, activation="softmax")(hidden_layer_2)
model = keras.Model(inputs=input_layer, outputs=output_layer, name="my_model")
Several problems here.
Your input didn't have time steps; you need an input shape of (n, timesteps, features).
In input_shape, the timesteps dimension comes first, not last.
Your last LSTM layer returned sequences, so you can't compare its output with 0s and 1s.
What I did:
I added time steps to your data (7)
I permuted the dimensions in input_shape
I set the final return_sequences=False
Completely fixed example with generated data:
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
batch = 20
n_samples = 1000
timesteps = 7
features = 10
x_train = np.random.rand(n_samples, timesteps, features)
y_train = keras.utils.to_categorical(np.random.randint(0, 10, n_samples))
input_layer = keras.Input(shape=(timesteps, features),batch_size=batch)
dense = layers.LSTM(16, activation="sigmoid", return_sequences=True)(input_layer)
hidden_layer_2 = layers.LSTM(16, activation="sigmoid", return_sequences=False)(dense)
output_layer = layers.Dense(10, activation="softmax")(hidden_layer_2)
model = keras.Model(inputs=input_layer, outputs=output_layer, name="my_model")
model.compile(loss='categorical_crossentropy', optimizer='adam')
history = model.fit(x_train, y_train)
Train on 1000 samples
20/1000 [..............................] - ETA: 2:50 - loss: 2.5145
200/1000 [=====>........................] - ETA: 14s - loss: 2.3934
380/1000 [==========>...................] - ETA: 5s - loss: 2.3647
560/1000 [===============>..............] - ETA: 2s - loss: 2.3549
740/1000 [=====================>........] - ETA: 1s - loss: 2.3395
900/1000 [==========================>...] - ETA: 0s - loss: 2.3363
1000/1000 [==============================] - 4s 4ms/sample - loss: 2.3353
The correct input shape for your model is (20, 37, 42).
Note: Here 20 is the batch_size you have explicitly specified.
Code:
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
batch = 20
timesteps = 42
training_units = 0.85
x1 = tf.constant(np.random.randint(50, size =(1000,37, 42)), dtype = tf.float32)
y1 = tf.constant(np.random.randint(10, size =(1000,)), dtype = tf.int32)
input_layer = keras.Input(shape=(37,timesteps),batch_size=batch)
dense = layers.LSTM(150, activation="sigmoid", return_sequences=True)
x = dense(input_layer)
hidden_layer_2 = layers.LSTM(150, activation="sigmoid", return_sequences=True)(x)
hidden_layer_3 = layers.Flatten()(hidden_layer_2)
output_layer = layers.Dense(10, activation="softmax")(hidden_layer_3)
model = keras.Model(inputs=input_layer, outputs=output_layer, name="my_model")
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
tf.keras.utils.plot_model(model, 'my_first_model.png', show_shapes=True)
Model Architecture:
You can clearly see the Input Size.
Code to Run:
model.fit(x = x1, y = y1, batch_size = batch, epochs = 10)
Note: whatever batch_size you have specified in the Input layer, you have to pass the same batch_size to the model.fit() call.
Output:
Epoch 1/10
50/50 [==============================] - 4s 89ms/step - loss: 2.3288 - accuracy: 0.0920
Epoch 2/10
50/50 [==============================] - 5s 91ms/step - loss: 2.3154 - accuracy: 0.1050
Epoch 3/10
50/50 [==============================] - 5s 101ms/step - loss: 2.3114 - accuracy: 0.0900
Epoch 4/10
50/50 [==============================] - 5s 101ms/step - loss: 2.3036 - accuracy: 0.1060
Epoch 5/10
50/50 [==============================] - 5s 99ms/step - loss: 2.2998 - accuracy: 0.1000
Epoch 6/10
50/50 [==============================] - 4s 89ms/step - loss: 2.2986 - accuracy: 0.1170
Epoch 7/10
50/50 [==============================] - 4s 84ms/step - loss: 2.2981 - accuracy: 0.1300
Epoch 8/10
50/50 [==============================] - 5s 103ms/step - loss: 2.2950 - accuracy: 0.1290
Epoch 9/10
50/50 [==============================] - 5s 106ms/step - loss: 2.2960 - accuracy: 0.1210
Epoch 10/10
50/50 [==============================] - 5s 97ms/step - loss: 2.2874 - accuracy: 0.1210
I'm trying to use Keras and its MobileNet implementation to do object localization (output the x/y coordinates of a few features, instead of classes) and I'm running into a likely very basic issue that I can't figure out.
My code looks like this:
# =============================
# Load MobileNet and change the top layers.
model = applications.MobileNet(weights="imagenet",
                               include_top=False,
                               input_shape=(224, 224, 3))

# Freeze all the layers except the very last 5.
for layer in model.layers[:-5]:
    layer.trainable = False

# Adding custom layers at the end, after the last Conv2D layer.
x = model.output
x = GlobalAveragePooling2D()(x)
x = Reshape((1, 1, 1024))(x)
x = Dropout(0.5)(x)
x = Conv2D(1024, (1, 1), activation='relu', padding='same', name='conv_preds')(x)
x = Dense(1024, activation="relu")(x)
# I'd like this to output 4 variables, two pairs of x/y coordinates
x = Dense(PREDICT_SIZE, activation="sigmoid")(x)
predictions = Reshape((PREDICT_SIZE,))(x)

# =============================
# Create the new final model.
model_final = Model(input=model.input, output=predictions)

def custom_loss(y_true, y_pred):
    '''Trying to compute the Euclidean distance as a loss function'''
    return K.sqrt(K.sum(K.square(y_true - y_pred), axis=-1))

model_final.compile(loss=custom_loss,
                    optimizer=optimizers.adam(lr=0.0001),
                    metrics=["accuracy"])
With this model, I then load the data and try to train it.
x_train, y_train, x_val, y_val = load_data(DATASET_DIR)
# This load_data is my own implementation. It returns the images
# as tensors.
# ==> x_train[0].shape = (224, 224, 3)
#
# y_train and y_val look like this:
# ==> y_train[0] = [ 0.182 -0.0933  0.072 -0.0453]
#
# holding values in the [0, 1] interval for where the pixel
# is relative to the width/height of the image.

model_final.fit(x_train, y_train,
                batch_size=batch_size, epochs=5, shuffle=False,
                validation_data=(x_val, y_val))
Unfortunately, when I run this model to train, I get something like this:
Train on 45 samples, validate on 5 samples
Epoch 1/5
16/45 [=========>....................] - ETA: 2s - loss: nan - acc: 0.0625
32/45 [====================>.........] - ETA: 1s - loss: nan - acc: 0.0312
45/45 [==============================] - 4s - loss: nan - acc: 0.0222 - val_loss: nan - val_acc: 0.0000e+00
Epoch 2/5
16/45 [=========>....................] - ETA: 2s - loss: nan - acc: 0.0625
32/45 [====================>.........] - ETA: 1s - loss: nan - acc: 0.0312
45/45 [==============================] - 4s - loss: nan - acc: 0.0222 - val_loss: nan - val_acc: 0.0000e+00
Epoch 3/5
I'm at a loss about why my loss value is "nan". I must be doing something wrong, and I've tried to change everything - the loss function, the shape of the output... but I can't figure out what I'm doing wrong.
Any help would be appreciated!
UPDATE: it seems like the issue is in my load_data implementation.
If I create the image data like this, it fails and results in loss: nan
i = pil_image.open(img_filename)
img = image.load_img(img_filename, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = keras.applications.mobilenet.preprocess_input(x)
x_train = np.append(x_train, x, axis=0)
but if I do something trivial like this, 'fit' works just fine and computes real values for loss:
x_train = np.random.random((100, 224, 224, 3))
sigh I wonder what's happening...
UPDATE #2: I figured out what the issue was
Documenting this here in case it helps anybody.
The proper way to generate the input tensors for MobileNet is this:
test_img = []
for i in range(len(test)):
    temp_img = image.load_img(test_path + test['filename'][i], target_size=(224, 224))
    temp_img = image.img_to_array(temp_img)
    test_img.append(temp_img)

test_img = np.array(test_img)
test_img = preprocess_input(test_img)
Notice how converting to a numpy.array and running preprocess_input happens on the whole batch of images. Doing it image by image (which is what I was doing before) did not seem to work.
Hope this helps somebody someday.