I have built a multivariate model with more than 100 different output layers in parallel.
I am able to get the averaged loss, but I cannot find a way to get averaged accuracy (metric) values across the outputs. (I am doing regression.)
Could you suggest how to do this in Keras?
Thanks
I create a custom callback to do this
class MergeMetrics(Callback):
    def __init__(self,**kargs):
        super(MergeMetrics,self).__init__(**kargs)
    def on_epoch_begin(self,epoch, logs={}):
        return
    def on_epoch_end(self, epoch, logs={}):
        # average every logged metric whose name contains 'mse' (resp. 'mae')
        logs['merge_mse'] = np.mean([logs[m] for m in logs.keys() if 'mse' in m])
        logs['merge_mae'] = np.mean([logs[m] for m in logs.keys() if 'mae' in m])
I use this callback to merge two metrics coming from two different outputs. I use a simple problem as an example, but you can easily integrate it into your own problem and use it with a validation set.
This is the dummy example, where I use mse and mae as metrics:
X = np.random.uniform(0,1, (1000,10))
y1 = np.random.uniform(0,1, 1000)
y2 = np.random.uniform(0,1, 1000)
inp = Input((10,))
x = Dense(32, activation='relu')(inp)
out1 = Dense(1, name='y1')(x)
out2 = Dense(1, name='y2')(x)
m = Model(inp, [out1,out2])
m.compile('adam','mae', metrics=['mse','mae'])
checkpoint = MergeMetrics()
m.fit(X, [y1,y2], epochs=10, callbacks=[checkpoint])
the printed output is:
loss: ... - y1_mse: 0.2227 - y1_mae: 0.3884 - y2_mse: 0.1163 - y2_mae: 0.2805 - merge_mse: 0.1695 - merge_mae: 0.3345
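One caveat when a validation set is added: Keras prefixes validation metrics with val_, so a plain substring test such as 'mse' in m would average training and validation values together. Below is a minimal sketch of a stricter key filter, written in plain Python so it can be tried without Keras. The helper name merge_logs and the two val_ numbers are made up for illustration; the other values are taken from the printed output above.

```python
from statistics import mean

def merge_logs(logs, metric_names=("mse", "mae")):
    """Average per-output metrics, keeping training and validation apart."""
    merged = {}
    for name in metric_names:
        for prefix in ("", "val_"):
            values = [
                value for key, value in logs.items()
                if key.endswith("_" + name)                       # e.g. y1_mse
                and key.startswith("val_") == (prefix == "val_")  # same split only
                and not key.startswith(prefix + "merge_")         # skip merged keys
            ]
            if values:
                merged[f"{prefix}merge_{name}"] = mean(values)
    return merged

logs = {
    "y1_mse": 0.2227, "y1_mae": 0.3884,
    "y2_mse": 0.1163, "y2_mae": 0.2805,
    "val_y1_mse": 0.25, "val_y2_mse": 0.15,  # hypothetical validation values
}
merged = merge_logs(logs)
print(merged)
```

Inside on_epoch_end this would be used as logs.update(merge_logs(logs)).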
I am trying to replicate the code from https://machinelearningmastery.com/neural-networks-are-function-approximators/ using PyTorch, with separate train and test data, but the predictions and loss are not good.
In my first attempt, I tried changing the number of nodes, the number of epochs and the learning rate. Then I added a scheduler to adjust the learning rate and implemented a small checkpoint to save the good models, but that was still not enough to get good results.
I was wondering if the community has any idea how to fix my code.
DATA
X = torch.arange(start, end, step, dtype=torch.float32).unsqueeze(dim=1)
y = torch.tensor([i**2.0 for i in X[0:]]).unsqueeze(dim=1)
train_split = int (0.8 *len(X))
X_train, y_train = X[:train_split], y[:train_split]
X_test , y_test = X[train_split:], y[train_split:]
scaler_x = MinMaxScaler()
scaler_x.fit(X_train)
X_Train = scaler_x.transform(X_train)
scaler_y = MinMaxScaler()
scaler_y.fit(y_train)
y_Train = scaler_y.transform(y_train)
Deep Network
class FunctionEstimatorModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear_layer_1 = nn.Linear(in_features = 1,out_features = 200)
        self.relu = nn.LeakyReLU()
        self.linear_layer_2 = nn.Linear(in_features = 200,out_features = 200)
        self.relu = nn.LeakyReLU()
        self.linear_layer_3 = nn.Linear(in_features = 200,out_features = 1)
    def forward (self, x: torch.Tensor) -> torch.Tensor:
        return self.linear_layer_3(self.relu(self.linear_layer_2(self.relu(self.linear_layer_1(x)))))
for loop
for epoch in range(epochs):
    model_0.train()
    y_preds = model_0(X_train).squeeze()
    loss = loss_fn(y_preds, y_train)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
    #####Testing
    model_0.eval()
    with torch.inference_mode():
        test_pred = model_0(X_test).squeeze()
        test_loss = loss_fn(test_pred, y_test)
    print(f"Epoch: {epoch} | Loss: {loss} | Test Loss: {test_loss}")
Thanks a lot
I should point out that, compared to the tutorial page you are referencing, the points you are trying to predict are much more difficult because they are out of distribution. Your model can only be expected to predict points within [-50, 25], since it was only ever given training points from that interval. In the example from the page, however, the training points cover the whole range (with a different density, of course, but still).
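To make the out-of-distribution point concrete, here is a small sketch in plain Python. The question does not show start, end and step, so the range below is a hypothetical choice that happens to give a training interval of [-50, 25] under a sequential 80/20 split. It compares that split with a shuffled one and reports how many test inputs fall inside the training range.

```python
import random

# Hypothetical data: integers from -50 to 44 (the real start/end/step are not shown).
xs = list(range(-50, 45))
split = len(xs) * 8 // 10  # 80% of the points go to training

# Sequential split, as in the question: test inputs lie beyond the training range.
train_seq, test_seq = xs[:split], xs[split:]

# Shuffled split: test inputs are interleaved with the training inputs.
rng = random.Random(0)
shuffled = xs[:]
rng.shuffle(shuffled)
train_shuf, test_shuf = shuffled[:split], shuffled[split:]

def inside_training_range(train, test):
    """Fraction of test points that fall within [min(train), max(train)]."""
    lo, hi = min(train), max(train)
    return sum(lo <= x <= hi for x in test) / len(test)

print("sequential:", inside_training_range(train_seq, test_seq))
print("shuffled:  ", inside_training_range(train_shuf, test_shuf))
```

With the sequential split no test point lies inside the training interval, so the network is asked to extrapolate; with the shuffled split it mostly interpolates, which is the easier setting the tutorial uses.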
I am using tensorflow/keras and I would like to use the input in the loss function,
as per this answer:
Custom loss function in Keras based on the input data
I have created my loss function as follows:
def custom_Loss_with_input(inp_1):
    def loss(y_true, y_pred):
        b = K.mean(inp_1)
        return y_true - b
    return loss
and set up the model with its layers, ending like this:
model = Model(inp_1, x)
model.compile(loss=custom_Loss_with_input(inp_1), optimizer= Ada)
return model
Nevertheless, I get the following error:
TypeError: Cannot convert a symbolic Keras input/output to a numpy array. This error may indicate that you're trying to pass a symbolic value to a NumPy call, which is not supported. Or, you may be trying to pass Keras symbolic inputs/outputs to a TF API that does not register dispatching, preventing Keras from automatically converting the API call to a lambda layer in the Functional Model.
Any advice on how to eliminate this error?
Thanks in advance
You can use add_loss to pass external layers to your loss, in your case the input tensor.
Here is an example:
def CustomLoss(y_true, y_pred, input_tensor):
    b = K.mean(input_tensor)
    return K.mean(K.square(y_true - y_pred)) + b
X = np.random.uniform(0,1, (1000,10))
y = np.random.uniform(0,1, (1000,1))
inp = Input(shape=(10,))
hidden = Dense(32, activation='relu')(inp)
out = Dense(1)(hidden)
target = Input((1,))
model = Model([inp,target], out)
model.add_loss( CustomLoss( target, out, inp ) )
model.compile(loss=None, optimizer='adam')
model.fit(x=[X,y], y=None, epochs=3)
If your loss is composed of different parts and you want to track them you can add different losses corresponding to the loss parts. In this way, the losses are printed at the end of each epoch and are stored in model.history.history. Remember that the final loss minimized during training is the sum of the various loss parts.
def ALoss(y_true, y_pred):
    return K.mean(K.square(y_true - y_pred))
def BLoss(input_tensor):
    b = K.mean(input_tensor)
    return b
X = np.random.uniform(0,1, (1000,10))
y = np.random.uniform(0,1, (1000,1))
inp = Input(shape=(10,))
hidden = Dense(32, activation='relu')(inp)
out = Dense(1)(hidden)
target = Input((1,))
model = Model([inp,target], out)
model.add_loss(ALoss( target, out ))
model.add_metric(ALoss( target, out ), name='a_loss')
model.add_loss(BLoss( inp ))
model.add_metric(BLoss( inp ), name='b_loss')
model.compile(loss=None, optimizer='adam')
model.fit(x=[X,y], y=None, epochs=3)
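The statement that the minimized loss is the sum of the parts can be checked by hand on a tiny batch. This is only an arithmetic sketch in plain Python, not Keras code: a_loss and b_loss mirror ALoss and BLoss above, and the numbers are made up for illustration.

```python
from statistics import mean

def a_loss(y_true, y_pred):
    # mean squared error, as in ALoss
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))

def b_loss(inputs):
    # mean of all input values, as in BLoss
    return mean(v for row in inputs for v in row)

inputs = [[0.2, 0.4], [0.6, 0.8]]  # hypothetical batch of two samples
y_true = [1.0, 0.0]
y_pred = [0.5, 0.5]

parts = {"a_loss": a_loss(y_true, y_pred), "b_loss": b_loss(inputs)}
total = sum(parts.values())  # the quantity the optimizer would minimize
print(parts, total)
```

Tracking each part with add_metric, as above, is what lets you see which term dominates the total.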
To use the model in inference mode (removing the target from inputs):
final_model = Model(model.input[0], model.output)
final_model.predict(X)
To test a nonlinear Sequential model using Keras,
I generated some random data x1, x2, x3
and y = a + b*x1 + c*x2^2 + d*x3^3 + e (a, b, c, d, e are constants).
The loss gets low really quickly, but the model actually predicts a pretty wrong number. I have done the same with a linear model using similar code and it worked fine. Maybe the Sequential model is designed wrong. Here is my code:
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Dense, Dropout
from tensorflow.keras import initializers
# y = a + b*x1 + c*x2^2 + d*x3^3 + e
def gen_sequential_model():
    model = Sequential([Input(3,name='input_layer'),
                        Dense(16, activation = 'relu', name = 'hidden_layer1', kernel_initializer=initializers.RandomNormal(mean = 0.0, stddev= 0.05, seed=42)),
                        Dense(16, activation = 'relu', name = 'hidden_layer2', kernel_initializer=initializers.RandomNormal(mean = 0.0, stddev= 0.05, seed=42)),
                        Dense(1, activation = 'relu', name = 'output_layer', kernel_initializer=initializers.RandomNormal(mean = 0.0, stddev= 0.05, seed=42)),
                        ])
    model.summary()
    model.compile(optimizer='adam',loss='mse')
    return model
def gen_linear_regression_dataset(numofsamples=500, a=3, b=5, c=7, d=9, e=11):
    np.random.seed(42)
    X = np.random.rand(numofsamples,3)
    # y = a + bx1 + cx2^2 + dx3^3+ e
    for idx in range(numofsamples):
        X[idx][1] = X[idx][1]**2
        X[idx][2] = X[idx][2]**3
    coef = np.array([b,c,d])
    bias = e
    y = a + np.matmul(X,coef.transpose()) + bias
    return X, y
def plot_loss_curve(history):
    import matplotlib.pyplot as plt
    plt.figure(figsize = (15,10))
    plt.plot(history.history['loss'][1:])
    plt.plot(history.history['val_loss'][1:])
    plt.title('model loss')
    plt.ylabel('loss')
    plt.xlabel('epoch')
    plt.legend(['train','test'],loc = 'upper right')
    plt.show()
def predict_new_sample(model, x, a=3, b=5, c=7, d=9, e=11):
    x = x.reshape(1,3)
    y_pred = model.predict(x)[0][0]
    y_actual = a + b*x[0][0] + c*(x[0][1]**2) + d*(x[0][2]**3) + e
    print("y actual value: ", y_actual)
    print("y pred value: ", y_pred)
model = gen_sequential_model()
X,y = gen_linear_regression_dataset(numofsamples=2000)
history = model.fit(X,y,epochs = 100, verbose=2, validation_split=0.3)
plot_loss_curve(history)
predict_new_sample(model, np.array([0.7,0.5,0.5]))
Result:
...
Epoch 99/100
44/44 - 0s - loss: 1.0631e-10 - val_loss: 9.9290e-11
Epoch 100/100
44/44 - 0s - loss: 1.0335e-10 - val_loss: 9.3616e-11
y actual value: 20.375
y pred value: 25.50001
Why is my predicted value so different from the real value?
Despite the improper use of activation = 'relu' in the last layer and the use of non-recommended kernel initializations, your model works fine, and the reported metrics are true and not flukes.
The problem is not in the model; the problem is that your data generating function does not return what you intend it to return.
First, in order to see that your model indeed learns what you have asked it to learn, let's run your code as is and then use your data generating function to produce a sample:
X, y_true = gen_linear_regression_dataset(numofsamples=1)
print(X)
print(y_true)
Result:
[[0.37454012 0.90385769 0.39221343]]
[25.72962531]
So for this particular X, the true output is 25.72962531; let's now pass this X to the model using your predict_new_sample function:
predict_new_sample(model, X)
# result:
y actual value: 22.134424269890232
y pred value: 25.729633
Well, the predicted output 25.729633 is extremely close to the true one as calculated above (25.72962531); thing is, your function thinks that the true output should be 22.134424269890232, which is demonstrably not the case.
What has happened is that your gen_linear_regression_dataset function returns the data X after you have calculated the squared and cubic components, which is not what you want; you want the returned data X to be before calculating the square & cube components, so that your model learns how to do this itself.
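The two "true" values above can be reproduced by hand, which makes the mismatch easy to see. A small arithmetic sketch in plain Python, using the sample row printed earlier and the default constants from the question (the variable names are mine):

```python
a, b, c, d, e = 3, 5, 7, 9, 11

# Row returned by the original gen_linear_regression_dataset:
# columns 2 and 3 already hold x2**2 and x3**3.
row = [0.37454012, 0.90385769, 0.39221343]

# Label as computed inside the data generator (no further powers applied).
y_generator = a + b * row[0] + c * row[1] + d * row[2] + e

# "Actual" value as computed by predict_new_sample, which squares and cubes again.
y_checker = a + b * row[0] + c * row[1] ** 2 + d * row[2] ** 3 + e

print(round(y_generator, 4))  # 25.7296
print(round(y_checker, 4))    # 22.1344
```

The model was trained against the first formula, so it is the checker, not the model, that disagrees with the training labels.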
So, you need to change the function as follows:
def gen_linear_regression_dataset(numofsamples=500, a=3, b=5, c=7, d=9, e=11):
    np.random.seed(42)
    X_init = np.random.rand(numofsamples,3) # data to be returned
    # y = a + bx1 + cx2^2 + dx3^3+ e
    X = X_init.copy() # temporary data
    for idx in range(numofsamples):
        X[idx][1] = X[idx][1]**2
        X[idx][2] = X[idx][2]**3
    coef = np.array([b,c,d])
    bias = e
    y = a + np.matmul(X,coef.transpose()) + bias
    return X_init, y
After modifying the function and re-training the model (you'll notice that the validation error ends up somewhat higher, ~ 1.3), we have
X, y_true = gen_linear_regression_dataset(numofsamples=1)
print(X)
print(y_true)
Result:
[[0.37454012 0.95071431 0.73199394]]
[25.72962531]
and
predict_new_sample(model, X)
# result:
y actual value: 25.729625308532768
y pred value: 25.443237
which is consistent. You will still not be getting perfect predictions of course, especially for unseen data (and remember that the error is now higher):
predict_new_sample(model, np.array([0.07,0.6,0.5]))
# result:
y actual value: 17.995
y pred value: 19.69147
As commented briefly above, you should really change your model to get rid of the kernel initializers (i.e. use the default, recommended ones) and use the correct activation function for your last layer:
def gen_sequential_model():
    model = Sequential([Input(3,name='input_layer'),
                        Dense(16, activation = 'relu', name = 'hidden_layer1'),
                        Dense(16, activation = 'relu', name = 'hidden_layer2'),
                        Dense(1, activation = 'linear', name = 'output_layer'),
                        ])
    model.summary()
    model.compile(optimizer='adam',loss='mse')
    return model
You'll discover that you get a better validation error and better predictions:
predict_new_sample(model, np.array([0.07,0.6,0.5]))
# result:
y actual value: 17.995
y pred value: 18.272991
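As a side note on why relu is considered the wrong choice for a regression output: it clamps every negative pre-activation to zero, so the network can never emit a negative prediction. In this particular dataset all targets are positive (y is at least a + e = 14), which is why the original model still worked, but the sketch below shows what would be lost in general. It is plain Python with made-up pre-activation values, not the Keras layers themselves.

```python
def relu(z):
    return max(0.0, z)

def linear(z):
    return z

# Hypothetical values of the last layer before the activation is applied.
pre_activations = [-2.5, -0.1, 0.0, 3.2]

relu_outputs = [relu(z) for z in pre_activations]      # negatives collapse to 0.0
linear_outputs = [linear(z) for z in pre_activations]  # values pass through unchanged

print(relu_outputs)
print(linear_outputs)
```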
Nice catch from @desertnaut.
Just to add a few things on top of @desertnaut's solution that seem to improve the results:
Scale your data (even though your inputs are already in 0-1, it seems to add a little boost)
Add Dropout between layers
Increase the number of epochs (150-200?)
Add reduce learning rate on plateau (give it a try)
Add more units to the layers
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.2,
                                                 patience=5, min_lr=0.001)
def gen_sequential_model():
    model = Sequential([
        Input(3,name='input_layer'),
        Dense(64, activation = 'relu', name = 'hidden_layer1'),
        Dropout(0.5),
        Dense(64, activation = 'relu', name = 'hidden_layer2'),
        Dropout(0.5),
        Dense(1, name = 'output_layer')
    ])
    model.compile(optimizer='adam',loss='mse')
    return model
model = gen_sequential_model()
history = model.fit(X, y, epochs = 200, verbose=2, validation_split=0.2, callbacks=[reduce_lr])
predict_new_sample(model, x=np.array([0.07, 0.6, 0.5]))
y actual value: 17.995
y pred value: 17.710054
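One thing worth checking in the callback settings above: if the model is compiled with optimizer='adam' at its default learning rate of 0.001 (an assumption, since the compile call is not shown in this answer), then min_lr=0.001 leaves no room for a reduction. The sketch below is a simplified plain-Python imitation of the plateau rule (it ignores min_delta and cooldown, and the function name is mine), just to illustrate the effect of the floor.

```python
def simulate_reduce_on_plateau(val_losses, lr, factor=0.2, patience=5, min_lr=0.001):
    """After `patience` epochs without improvement, multiply lr by `factor`,
    never going below `min_lr`. Returns the learning rate after each epoch."""
    best = float("inf")
    wait = 0
    history = []
    for loss in val_losses:
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                lr = max(lr * factor, min_lr)
                wait = 0
        history.append(lr)
    return history

plateau = [1.0] * 13  # hypothetical validation loss that never improves

floor_equals_start = simulate_reduce_on_plateau(plateau, lr=0.001, min_lr=0.001)
lower_floor = simulate_reduce_on_plateau(plateau, lr=0.001, min_lr=1e-5)

print(floor_equals_start[-1])
print(lower_floor[-1])
```

In the first run the rate stays at its starting value throughout; in the second it is reduced twice. A smaller min_lr may therefore be worth trying.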
I'm trying to implement a loss function that uses the representations of the intermediate layers. As far as I know, a Keras custom loss function only accepts two input arguments (y_true and y_pred). How can I define a loss function with @tf.function and use it for a model that has been defined via Keras?
Any help would be appreciated.
This is a simple workaround to pass additional variables to your loss function. In our case, we pass the hidden output of one of our layers (x1). This output can be used to do something inside the loss function (I do a dummy operation):
def mse(y_true, y_pred, hidden):
    error = y_true-y_pred
    return K.mean(K.square(error)) + K.mean(hidden)
X = np.random.uniform(0,1, (1000,10))
y = np.random.uniform(0,1, 1000)
inp = Input((10,))
true = Input((1,))
x1 = Dense(32, activation='relu')(inp)
x2 = Dense(16, activation='relu')(x1)
out = Dense(1)(x2)
m = Model([inp,true], out)
m.add_loss( mse( true, out, x1 ) )
m.compile(loss=None, optimizer='adam')
m.fit(x=[X, y], y=None, epochs=3)
## final fitted model to compute predictions
final_m = Model(inp, out)
I am doing a slight modification of a standard neural network by defining a custom loss function. The custom loss function depends not only on y_true and y_pred, but also on the training data. I implemented it using the wrapping solution described here.
Specifically, I wanted to define a custom loss function that is the standard mse plus the mse between the input and the square of y_pred:
def custom_loss(x_true):
    def loss(y_true, y_pred):
        return K.mean(K.square(y_pred - y_true) + K.square(y_true - x_true))
    return loss
Then I compile the model using
model_custom.compile(loss = custom_loss( x_true=training_data ), optimizer='adam')
fit the model using
model_custom.fit(training_data, training_label, epochs=100, batch_size = training_data.shape[0])
All of the above works fine, because the batch size is actually the number of all the training samples.
But if I set a different batch_size (e.g., 10) when I have 1000 training samples, there will be an error
Incompatible shapes: [1000] vs. [10].
It seems that Keras is able to automatically adjust the size of the inputs to its own loss function based on the batch size, but cannot do so for the custom loss function.
Do you know how to solve this issue?
Thank you!
==========================================================================
* Update: the batch size issue is solved, but another issue occurred
Thank you, Ori, for the suggestion of concatenating the input and output layers! It "worked", in the sense that the code can run under any batch size. However, it seems that the result from training the new model is wrong... Below is a simplified version of the code to demonstrate the problem:
import numpy as np
import scipy.io
import keras
from keras import backend as K
from keras.models import Model
from keras.layers import Input, Dense, Activation
from numpy.random import seed
from tensorflow import set_random_seed
def custom_loss(y_true, y_pred): # this is essentially the mean_square_error
    mse = K.mean( K.square( y_pred[:,2] - y_true ) )
    return mse
# set the seeds so that we get the same initialization across different trials
seed_numpy = 0
seed_tensorflow = 0
# generate data of x = [ y^3 y^2 ]
y = np.random.rand(5000+1000,1) * 2 # generate 5000 training and 1000 testing samples
x = np.concatenate( ( np.power(y, 3) , np.power(y, 2) ) , axis=1 )
training_data = x[0:5000:1,:]
training_label = y[0:5000:1]
testing_data = x[5000:6000:1,:]
testing_label = y[5000:6000:1]
# build the standard neural network with one hidden layer
seed(seed_numpy)
set_random_seed(seed_tensorflow)
input_standard = Input(shape=(2,)) # input
hidden_standard = Dense(10, activation='relu', input_shape=(2,))(input_standard) # hidden layer
output_standard = Dense(1, activation='linear')(hidden_standard) # output layer
model_standard = Model(inputs=[input_standard], outputs=[output_standard]) # build the model
model_standard.compile(loss='mean_squared_error', optimizer='adam') # compile the model
model_standard.fit(training_data, training_label, epochs=50, batch_size = 500) # train the model
testing_label_pred_standard = model_standard.predict(testing_data) # make prediction
# get the mean squared error
mse_standard = np.sum( np.power( testing_label_pred_standard - testing_label , 2 ) ) / 1000
# build the neural network with the custom loss
seed(seed_numpy)
set_random_seed(seed_tensorflow)
input_custom = Input(shape=(2,)) # input
hidden_custom = Dense(10, activation='relu', input_shape=(2,))(input_custom) # hidden layer
output_custom_temp = Dense(1, activation='linear')(hidden_custom) # output layer
output_custom = keras.layers.concatenate([input_custom, output_custom_temp])
model_custom = Model(inputs=[input_custom], outputs=[output_custom]) # build the model
model_custom.compile(loss = custom_loss, optimizer='adam') # compile the model
model_custom.fit(training_data, training_label, epochs=50, batch_size = 500) # train the model
testing_label_pred_custom = model_custom.predict(testing_data) # make prediction
# get the mean squared error
mse_custom = np.sum( np.power( testing_label_pred_custom[:,2:3:1] - testing_label , 2 ) ) / 1000
# compare the result
print( [ mse_standard , mse_custom ] )
Basically, I have a standard one-hidden-layer neural network, and a custom one-hidden-layer neural network whose output layer is concatenated with the input layer. For testing purposes, I did not use the concatenated input layer in the custom loss function, because I wanted to see if the custom network can reproduce the standard neural network. Since the custom loss function is equivalent to the standard 'mean_squared_error' loss, both networks should have the same training results (I also reset the random seeds to make sure that they have the same initialization).
However, the training results are very different. Does the concatenation make the training process different? Any ideas?
Thank you again for all your help!
Final update: Ori's approach of concatenating input and output layers works, and is verified by using the generator. Thanks!!
The problem is that when compiling the model, you set x_true to be a static tensor the size of all the samples, while the inputs to Keras loss functions are y_true and y_pred, each of size [batch_size, :].
As I see it, there are two ways to solve this. The first is to use a generator to create the batches, so that you control which indices are evaluated each time, and in the loss function slice the x_true tensor to match the samples being evaluated:
def custom_loss(x_true):
    def loss(y_true, y_pred):
        x_true_samples = relevant_samples(x_true)
        return K.mean(K.square(y_pred - y_true) + K.square(y_true - x_true_samples))
    return loss
This solution can be complicated; what I would suggest is a simpler workaround:
concatenate the input layer with the output layer, so that your new output is of the form [original_output, input].
Now you can use a new, modified loss function:
def loss(y_true, y_pred):
    return K.mean(K.square(y_pred[:,:output_shape] - y_true[:,:output_shape]) +
                  K.square(y_true[:,:output_shape] - y_pred[:,output_shape:]))
Now your new loss function will take into account both the input data and the prediction.
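To see why the concatenation trick is independent of the batch size, here is a plain-Python sketch of the same loss. It assumes, for simplicity, one output and one input feature per sample (unlike the two-feature input in the question), and all names and numbers are made up for illustration. Each row of the concatenated output carries its own input, so the loss never has to look up a separate, fixed-size x_true.

```python
def concat_loss(y_true, y_pred_concat):
    """Each row of y_pred_concat is [prediction, input]; y_true has one target per row."""
    terms = [
        (pred - target) ** 2 + (target - x) ** 2
        for target, (pred, x) in zip(y_true, y_pred_concat)
    ]
    return sum(terms) / len(terms)

y_true = [1.0, 2.0]
y_pred_concat = [[1.5, 1.0], [2.0, 4.0]]  # rows are [prediction, input]

loss_full = concat_loss(y_true, y_pred_concat)           # whole data set as one batch
loss_batch = concat_loss(y_true[:1], y_pred_concat[:1])  # batch of size 1

print(loss_full, loss_batch)
```

Both calls work without any shape bookkeeping, which is the property the static x_true closure was missing.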
Edit:
Note that although you set the seed, your models are not trained on exactly the same batches: as you did not use a generator, you let Keras choose the batches, and for different models it might pick different samples.
As your model does not converge, different samples can lead to different results.
I added a generator to your code to control which samples we pick for training; now you can see both results are the same:
def custom_loss(y_true, y_pred): # this is essentially the mean_square_error
    mse = keras.losses.mean_squared_error(y_true, y_pred[:,2])
    return mse
def generator(x, y, batch_size):
    curIndex = 0
    batch_x = np.zeros((batch_size,2))
    batch_y = np.zeros((batch_size,1))
    while True:
        for i in range(batch_size):
            batch_x[i] = x[curIndex,:]
            batch_y[i] = y[curIndex,:]
            curIndex += 1  # advance through the training data
            if curIndex == len(x):
                curIndex = 0  # wrap around at the end of the data
        yield batch_x, batch_y
# set the seeds so that we get the same initialization across different trials
seed_numpy = 0
seed_tensorflow = 0
# generate data of x = [ y^3 y^2 ]
y = np.random.rand(5000+1000,1) * 2 # generate 5000 training and 1000 testing samples
x = np.concatenate( ( np.power(y, 3) , np.power(y, 2) ) , axis=1 )
training_data = x[0:5000:1,:]
training_label = y[0:5000:1]
testing_data = x[5000:6000:1,:]
testing_label = y[5000:6000:1]
batch_size = 32
# build the standard neural network with one hidden layer
seed(seed_numpy)
set_random_seed(seed_tensorflow)
input_standard = Input(shape=(2,)) # input
hidden_standard = Dense(10, activation='relu', input_shape=(2,))(input_standard) # hidden layer
output_standard = Dense(1, activation='linear')(hidden_standard) # output layer
model_standard = Model(inputs=[input_standard], outputs=[output_standard]) # build the model
model_standard.compile(loss='mse', optimizer='adam') # compile the model
#model_standard.fit(training_data, training_label, epochs=50, batch_size = 10) # train the model
model_standard.fit_generator(generator(training_data,training_label,batch_size), steps_per_epoch= 32, epochs= 100)
testing_label_pred_standard = model_standard.predict(testing_data) # make prediction
# get the mean squared error
mse_standard = np.sum( np.power( testing_label_pred_standard - testing_label , 2 ) ) / 1000
# build the neural network with the custom loss
seed(seed_numpy)
set_random_seed(seed_tensorflow)
input_custom = Input(shape=(2,)) # input
hidden_custom = Dense(10, activation='relu', input_shape=(2,))(input_custom) # hidden layer
output_custom_temp = Dense(1, activation='linear')(hidden_custom) # output layer
output_custom = keras.layers.concatenate([input_custom, output_custom_temp])
model_custom = Model(inputs=input_custom, outputs=output_custom) # build the model
model_custom.compile(loss = custom_loss, optimizer='adam') # compile the model
#model_custom.fit(training_data, training_label, epochs=50, batch_size = 10) # train the model
model_custom.fit_generator(generator(training_data,training_label,batch_size), steps_per_epoch= 32, epochs= 100)
testing_label_pred_custom = model_custom.predict(testing_data)
# get the mean squared error
mse_custom = np.sum( np.power( testing_label_pred_custom[:,2:3:1] - testing_label , 2 ) ) / 1000
# compare the result
print( [ mse_standard , mse_custom ] )