I am trying to apply AdaBoost to a Keras model. I have to use a custom loss function (unscaled deviance), which works well when I use the Keras model inside sklearn's RandomizedSearchCV, but when I try to use AdaBoostRegressor I get:
ValueError: y should be a 1d array, got an array of shape (64501, 2) instead.
This error comes from the fact that my custom loss function needs three inputs. However, Keras only accepts two parameters (y_true, y_pred), so I worked around this by passing a pair of values per sample (a two-column array) instead of y_true, like this:
# Loss function
def deviance(data, y_pred):
    y_true = data[:, 0]
    d = data[:, 1]
    lnY = KB.log(y_true)
    bool1 = KB.equal(y_true, 0)
    zeros = KB.zeros_like(y_true)
    lnY = KB.switch(bool1, zeros, lnY)
    lnYp = KB.log(y_pred)
    bool2 = KB.equal(y_pred, 0)
    zeross = KB.zeros_like(y_pred)
    lnYp = KB.switch(bool2, zeross, lnYp)
    loss = 2 * d * (y_true * lnY - y_true * lnYp[:, 0] - y_true + y_pred[:, 0])
    return loss
So the program takes the tuple (call it 'feed') and unpacks it in order to calculate the unscaled deviance. I then use it like this and it works :
grid = RandomizedSearchCV(pipeline, cv = cv, param_distributions=param_grid, verbose=2, n_iter = 40) # more folds could increase the variance
grid.fit(data, feed)
But now I want to use this Keras model in an AdaBoostRegressor, like this:
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.model_selection import KFold
def baseline_model2(dropout = 0.2, kernel_initializer = 'glorot_uniform', nn1 = 15, nn2 = 10, lr = 0.001, act1 = "relu"):
    with tf.device('/gpu:0'):
        # create model
        #building model
        model = keras.Sequential()
        model.add(Dense(nn1, input_dim = 21, activation = act1, kernel_initializer=kernel_initializer))
        model.add(Dropout(dropout))
        #model.add(Dense(2, activation = "exponential"))
        model.add(Dense(nn2, activation = act1))
        model.add(Dense(1, activation = "exponential", kernel_initializer=kernel_initializer))
        optimizer = keras.optimizers.adagrad(lr=lr)
        model.compile(loss=deviance, optimizer=optimizer, metrics = [deviance, "mean_squared_error"])
    return model
clf = KerasRegressor(build_fn=baseline_model2)
from sklearn.ensemble import AdaBoostRegressor
boostedNN = AdaBoostRegressor(base_estimator=clf)
boostedNN.fit(data, feed)
This is the part that gives me the ValueError. I know it comes from the fact that I am supplying a tuple to the algorithm, but I am looking for a way around this, because Keras needs that tuple in order to evaluate the deviance correctly. I think editing a line or two in the sklearn library could do the trick, but I couldn't find the right place to do it myself.
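To make the shape issue concrete, here is a small NumPy sketch of the same deviance formula. It is not the Keras backend code from the question: `deviance_np` is a hypothetical name, the arrays are made-up toy values, and `y_pred` is taken as a 1-D array rather than the `(n, 1)` tensor Keras produces. It shows that packing `y_true` and `d` together gives a target of shape `(n, 2)`, which is the shape the error message complains about.

```python
import numpy as np

def deviance_np(data, y_pred):
    """Unscaled deviance per sample; data[:, 0] is y_true, data[:, 1] is d."""
    y_true = data[:, 0]
    d = data[:, 1]
    # y * log(y) is taken to be 0 where y == 0 (same convention as the switch above).
    with np.errstate(divide="ignore", invalid="ignore"):
        y_log_y = np.where(y_true == 0, 0.0, y_true * np.log(y_true))
    return 2 * d * (y_log_y - y_true * np.log(y_pred) - y_true + y_pred)

# Toy data (assumed values, for illustration only).
y = np.array([0.0, 1.0, 3.0])
d = np.array([1.0, 0.5, 2.0])
feed = np.column_stack([y, d])   # packed target, one (y_true, d) pair per row
print(feed.shape)                # two columns, hence not a 1-D target

loss = deviance_np(feed, np.array([1.0, 1.0, 3.0]))
print(loss)
```

The factor `d` only ever multiplies the per-sample loss, so it plays the role of a per-sample weight; whether that can be exploited to keep the target one-dimensional depends on how the boosting step uses its own weights.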
I'm having trouble implementing a custom loss function in a neural network I'm building in TensorFlow. I want to use one of my features as part of the loss function, so I've tried using model.add_loss instead of passing a loss to model.compile.
My data looks like this:
import tensorflow as tf
import numpy as np
from tensorflow.keras import layers
feature_df = np.random.rand(600, 9)  # placeholder array with 600 rows and 9 columns
training, test, = feature_df[:350,:], feature_df[350:,:]
x_train = training[:,[0,1,2,3,4,5,6]]
y_train = training[:,8]
loss_inp_train = training[:,[6]]
x_test = test[:,[0,1,2,3,4,5,6]]
y_test = test[:,8]
loss_inp_test = test[:,[6]]
I want to use a custom loss function because it's not necessarily the MSE I'm interested in minimizing. I want to maximize the profitability of this model, which depends on whether y_true and y_pred fall above or below loss_inp_train.
I've tried creating a loss function that looks like this:
def profit_loss(y_pred, y_true, inp):
    loss = 0
    if (y_pred < inp):
        if y_true < inp:
            loss = loss + .9
        else:
            loss = loss - 1
    else:
        if y_true > inp:
            loss = loss + .9
        else:
            loss = loss - 1
    loss = loss*-1
    return loss
And the Model
model = tf.keras.Sequential([
normalize,
layers.Dense(18),
layers.Dense(1)
])
model.add_loss(profit_loss(y_pred,y_train,loss_inp_train))
model.compile(loss = None,
optimizer = tf.optimizers.Adam())
I'm having trouble feeding the output of the model to the loss function. I'm still new to TensorFlow; whenever I've accessed predicted values, it's been after training, using model.predict, but obviously I don't have a fitted model yet. How do I reference both a feature of the training data and y_true, y_pred in one function?
Probably the best way to do this is to define a custom loss. Unfortunately, I'm not sure how to handle nested if statements like yours; probably with a combination of K.switch calls. I can try to give you a partial solution that only considers a single if statement. Take the following simplified code:
loss = 0
if (y_pred < inp):
    loss = # assignment 1
else:
    loss = # assignment 2
In this case the loss function could be converted into this:
def profit_loss(inp):
    def loss_function(y_true, y_pred):
        loss = 0
        condition = K.greater(y_pred - inp, 0)
        loss1 = # assignment 1 if y_pred < inp
        loss2 = # assignment 2 if y_pred >= inp
        loss = K.switch(condition, loss2, loss1)
        return - K.sum(loss)
    return loss_function
model.compile(optimizer = tf.optimizers.Adam(), loss=profit_loss(inp))
This way y_true and y_pred are automatically handled and you just have to feed the inp argument.
Hope this helps getting you closer to solving the problem.
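For the nested case, one possible approach is to compute the inner branches first and then select between them with an outer element-wise choice. The sketch below uses NumPy's `np.where(condition, if_true, if_false)` purely to illustrate the selection logic on arrays; `tf.where` and `K.switch` take their arguments in the same order. The name `profit_loss_np` and the sample values are made up for illustration.

```python
import numpy as np

def profit_loss_np(y_true, y_pred, inp):
    """Element-wise version of the nested if/else from the question."""
    # Inner branches: the payoff when the prediction is below / not below inp.
    payoff_if_pred_below = np.where(y_true < inp, 0.9, -1.0)
    payoff_if_pred_above = np.where(y_true > inp, 0.9, -1.0)
    # Outer branch: pick the payoff matching where the prediction fell.
    profit = np.where(y_pred < inp, payoff_if_pred_below, payoff_if_pred_above)
    return -profit  # the question multiplies the result by -1 to get a loss

y_true = np.array([1.0, 3.0, 1.0, 3.0])
y_pred = np.array([1.5, 1.5, 2.5, 2.5])
inp = np.full(4, 2.0)
losses = profit_loss_np(y_true, y_pred, inp)
print(losses)
```

Note that this loss is piecewise constant in y_pred, so its gradient with respect to the prediction is zero almost everywhere. A gradient-based optimizer would probably need a smooth surrogate of it to make progress.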
I'm trying to implement and train a neural network using the JAX library and its little neural network submodule, "Stax". Since this library doesn't come with an implementation of binary cross entropy, I wrote my own:
def binary_cross_entropy(y_hat, y):
    bce = y * jnp.log(y_hat) + (1 - y) * jnp.log(1 - y_hat)
    return jnp.mean(-bce)
I implemented a simple neural network and trained it on MNIST, and started to get suspicious of some of the results I was getting. So I implemented the same setup in Keras, and I immediately got wildly different results! The same model, trained in the same way on the same data, was getting 90% training accuracy in Keras instead of around 50% in JAX. Eventually I tracked down part of the issue to my naive implementation of cross-entropy, which is supposedly numerically unstable. Following this post and this code I found, I wrote the following new version:
def binary_cross_entropy_stable(y_hat, y):
    y_hat = jnp.clip(y_hat, 0.000001, 0.9999999)
    logits = jnp.log(y_hat/(1 - y_hat))
    max_logit = jnp.clip(logits, 0, None)
    bces = logits - logits * y + max_logit + jnp.log(jnp.exp(-max_logit) + jnp.exp(-logits - max_logit))
    return jnp.mean(bces)
This works a little better. Now my JAX implementation gets up to 80% train accuracy, but that's still a lot less than the 90% Keras gets. What I want to know is what is going on? Why are my two implementations not behaving the same way?
Below, I condensed my two implementations down to a single script. In this script, I implement the same model in JAX and in Keras. I initialize both with the same weights, and train them using full-batch gradient descent for 10 steps on 1000 datapoints from MNIST, the same data for each model. JAX finishes with 80% training accuracy, while Keras finishes with 90%. Specifically, I get this output:
Initial Keras accuracy: 0.4350000023841858
Initial JAX accuracy: 0.435
Final JAX accuracy: 0.792
Final Keras accuracy: 0.9089999794960022
JAX accuracy (Keras weights): 0.909
Keras accuracy (JAX weights): 0.7919999957084656
And actually, when I vary the conditions a little (using different random initial weights or a different training set), sometimes I get back the 50% JAX accuracy and 90% Keras accuracy.
I swap the weights at the end to verify that the weights obtained from training are indeed the issue, not something to do with the actual computation of the network predictions, or the way I calculate accuracy.
The code:
import numpy as np
import jax
from jax import jit, grad
from jax.experimental import stax, optimizers
import jax.numpy as jnp
import keras
import keras.datasets.mnist
def binary_cross_entropy(y_hat, y):
    bce = y * jnp.log(y_hat) + (1 - y) * jnp.log(1 - y_hat)
    return jnp.mean(-bce)
def binary_cross_entropy_stable(y_hat, y):
    y_hat = jnp.clip(y_hat, 0.000001, 0.9999999)
    logits = jnp.log(y_hat/(1 - y_hat))
    max_logit = jnp.clip(logits, 0, None)
    bces = logits - logits * y + max_logit + jnp.log(jnp.exp(-max_logit) + jnp.exp(-logits - max_logit))
    return jnp.mean(bces)
def binary_accuracy(y_hat, y):
    return jnp.mean((y_hat >= 1/2) == (y >= 1/2))
########################################
# #
# Create dataset #
# #
########################################
input_dimension = 784
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data(path="mnist.npz")
xs = np.concatenate([x_train, x_test])
xs = xs.reshape((70000, 784))
ys = np.concatenate([y_train, y_test])
ys = (ys >= 5).astype(np.float32)
ys = ys.reshape((70000, 1))
train_xs = xs[:1000]
train_ys = ys[:1000]
########################################
# #
# Create JAX model #
# #
########################################
jax_initializer, jax_model = stax.serial(
stax.Dense(1000),
stax.Relu,
stax.Dense(1),
stax.Sigmoid
)
rng_key = jax.random.PRNGKey(0)
_, initial_jax_weights = jax_initializer(rng_key, (1, input_dimension))
########################################
# #
# Create Keras model #
# #
########################################
initial_keras_weights = [*initial_jax_weights[0], *initial_jax_weights[2]]
keras_model = keras.Sequential([
keras.layers.Dense(1000, activation="relu"),
keras.layers.Dense(1, activation="sigmoid")
])
keras_model.compile(
optimizer=keras.optimizers.SGD(learning_rate=0.01),
loss=keras.losses.binary_crossentropy,
metrics=["accuracy"]
)
keras_model.build(input_shape=(1, input_dimension))
keras_model.set_weights(initial_keras_weights)
if __name__ == "__main__":
    ########################################
    # #
    # Compare untrained models #
    # #
    ########################################
    initial_keras_predictions = keras_model.predict(train_xs, verbose=0)
    initial_jax_predictions = jax_model(initial_jax_weights, train_xs)
    _, keras_initial_accuracy = keras_model.evaluate(train_xs, train_ys, verbose=0)
    jax_initial_accuracy = binary_accuracy(jax_model(initial_jax_weights, train_xs), train_ys)
    print("Initial Keras accuracy:", keras_initial_accuracy)
    print("Initial JAX accuracy:", jax_initial_accuracy)
    ########################################
    # #
    # Train JAX model #
    # #
    ########################################
    L = jit(binary_cross_entropy_stable)
    gradL = jit(grad(lambda w, x, y: L(jax_model(w, x), y)))
    opt_init, opt_apply, get_params = optimizers.sgd(0.01)
    network_state = opt_init(initial_jax_weights)
    for _ in range(10):
        wT = get_params(network_state)
        gradient = gradL(wT, train_xs, train_ys)
        network_state = opt_apply(
            0,
            gradient,
            network_state
        )
    final_jax_weights = get_params(network_state)
    final_jax_training_predictions = jax_model(final_jax_weights, train_xs)
    final_jax_accuracy = binary_accuracy(final_jax_training_predictions, train_ys)
    print("Final JAX accuracy:", final_jax_accuracy)
    ########################################
    # #
    # Train Keras model #
    # #
    ########################################
    for _ in range(10):
        keras_model.fit(
            train_xs,
            train_ys,
            epochs=1,
            batch_size=1000,
            verbose=0
        )
    final_keras_loss, final_keras_accuracy = keras_model.evaluate(train_xs, train_ys, verbose=0)
    print("Final Keras accuracy:", final_keras_accuracy)
    ########################################
    # #
    # Swap weights #
    # #
    ########################################
    final_keras_weights = keras_model.get_weights()
    final_keras_weights_in_jax_format = [
        (final_keras_weights[0], final_keras_weights[1]),
        tuple(),
        (final_keras_weights[2], final_keras_weights[3]),
        tuple()
    ]
    jax_accuracy_with_keras_weights = binary_accuracy(
        jax_model(final_keras_weights_in_jax_format, train_xs),
        train_ys
    )
    print("JAX accuracy (Keras weights):", jax_accuracy_with_keras_weights)
    final_jax_weights_in_keras_format = [*final_jax_weights[0], *final_jax_weights[2]]
    keras_model.set_weights(final_jax_weights_in_keras_format)
    _, keras_accuracy_with_jax_weights = keras_model.evaluate(train_xs, train_ys, verbose=0)
    print("Keras accuracy (JAX weights):", keras_accuracy_with_jax_weights)
Try changing the PRNG seed (the jax.random.PRNGKey(0) call, line 57 of the original script) to a value other than 0 to run the experiment with different initial weights.
Your binary_cross_entropy_stable function does not match the output of keras.losses.binary_crossentropy; for example:
x = np.random.rand(10)
y = np.random.rand(10)
print(keras.losses.binary_crossentropy(x, y))
# tf.Tensor(0.8134677734043875, shape=(), dtype=float64)
print(binary_cross_entropy_stable(x, y))
# 0.9781515
That is where I would start if you're trying to exactly duplicate the model.
You can view the source of the keras loss function here: keras/losses.py#L1765-L1810, with the main part of the implementation here: keras/backend.py#L4972-L5017
One detail: it appears that with a sigmoid activation function, Keras re-uses some cached logits to compute the binary cross entropy while avoiding problematic values: keras/backend.py#L4988-L4997. I'm not sure how to easily replicate that behavior using JAX & stax.
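One thing to keep in mind when comparing the two functions: keras.losses.binary_crossentropy takes its arguments as (y_true, y_pred), whereas binary_cross_entropy_stable takes (y_hat, y), and binary cross-entropy is not symmetric in its arguments. The following NumPy sketch is neither the Keras nor the JAX implementation; the function names and the sample values are made up for illustration. It checks that the textbook formula and a logit-based form agree when the arguments are passed consistently, and that swapping the arguments changes the result.

```python
import numpy as np

def bce_naive(y_true, y_pred):
    """Textbook binary cross-entropy on probabilities (no clipping)."""
    return np.mean(-(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)))

def bce_from_logits(y_true, logits):
    """Same quantity on logits: max(x, 0) - x * y + log(1 + exp(-|x|))."""
    return np.mean(np.maximum(logits, 0) - logits * y_true
                   + np.log1p(np.exp(-np.abs(logits))))

# Soft labels and predictions strictly inside (0, 1), assumed for illustration.
y_true = np.array([0.9, 0.1])
y_pred = np.array([0.6, 0.3])
logits = np.log(y_pred / (1 - y_pred))

consistent_naive = bce_naive(y_true, y_pred)
consistent_logits = bce_from_logits(y_true, logits)
swapped = bce_naive(y_pred, y_true)  # arguments passed in the opposite order

print(consistent_naive, consistent_logits, swapped)
```

So before concluding that two loss implementations disagree, it may be worth confirming that labels and predictions are passed in the order each function expects.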
I have a simple neural network with two outputs, and I need to use a different activation function for each of them. I basically follow what is described in this article, but it looks like my layer with the different activation functions is not working.
See my code below:
X = filled_df.loc[:, "SOUTEZ_MEAN_HOME":"TOTAL_POINTS_AWAY"].values
y = filled_df.loc[:, "HOME_YELLOW_CARDS"].values
X= X.astype("float32")
y= y.astype("float32")
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size= 0.3)
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train= scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
def negative_binomial_layer(x):
    # Get the number of dimensions of the input
    num_dims = len(x.get_shape())
    # Separate the parameters
    n, p = tf.unstack(x, num=2, axis=-1)
    # Add one dimension to make the right shape
    n = tf.expand_dims(n, -1)
    p = tf.expand_dims(p, -1)
    # Apply a softplus to make positive
    n = tf.cast(n, tf.float32)
    p = tf.cast(p, tf.float32)
    n = tf.keras.activations.softplus(n)
    # Apply a sigmoid activation to bound between 0 and 1
    p = tf.keras.activations.sigmoid(p)
    # Join back together again
    out_tensor = tf.concat((n, p), axis=num_dims-1)
    return out_tensor
input_shape = (212, )
# Define inputs with predefined shape
inputs = Input(shape=input_shape)
# Build network with some predefined architecture
Layer1 = Dense(16)
Layer2 = Dense(8)
output1 = Layer1(inputs)
output2 = Layer2(output1)
# Predict the parameters of a negative binomial distribution
outputs = Dense(2)(output2)
#outputs = tf.cast(outputs, tf.float32)
distribution_outputs = Lambda(negative_binomial_layer)(outputs)
# Construct model
model = Model(inputs=inputs, outputs=outputs)
num_epochs = 10
opt = Adam()
model.compile(loss = negative_binomial_loss, optimizer = opt)
history = model.fit(X_train, y_train, epochs = num_epochs,
validation_data = (X_test, y_test))
These are the predicted values when I print y_pred inside the custom loss function:
Epoch 1/10
y_pred = [[2.19472528 3.14479065]
[-1.16056371 1.69369149]
[-1.12327099 2.06830978]
...
[-1.23587477 4.82307]
[0.235431105 3.86740351]
[-2.75554061 1.10352468]] [[[2.19472528 3.14479065]
[-1.16056371 1.69369149]
[-1.12327099 2.06830978]
...
[-1.23587477 4.82307]
[0.235431105 3.86740351]
[-2.75554061 1.10352468]]]
The second predicted value, p, should be between 0 and 1; since it is outside this range, I get NaN when computing the loss.
Any suggestions? Thanks
I can't give an exact programming explanation, but I can give a theoretical answer to this question which you should be able to use to build it.
As I understand it, you are asking how to use a different activation function for each output node in the output layer. I do not know much about the libraries or extensions you are using, but libraries like these usually include some way to create a customised network. From the code you posted, I can see that you are using a pre-defined structure for the network; this may mean that you cannot customise the output layer yourself and will have to create a custom network instead. I am assuming you are using TensorFlow, based on some of the methods in the code you posted.
There is also something else to consider. Usually you have activation functions on the hidden-layer neurons too, which is something you might have to take into consideration as well.
I am sorry I was not able to give a practical answer, but I hope this helps you see what you can do to get it to work - have a nice day!
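One detail in the posted code may be worth checking: the model is constructed with `outputs=outputs`, the raw Dense(2) tensor, rather than `outputs=distribution_outputs`, so the Lambda layer would not be part of the trained model. That would be consistent with the unbounded values printed above. As a sanity check on what the layer is meant to compute, here is a NumPy sketch of the same split-and-activate step; `negative_binomial_params` is a hypothetical name, and the rows are taken from the printed y_pred in the question.

```python
import numpy as np

def negative_binomial_params(x):
    """Map raw outputs (..., 2) to n > 0 (softplus) and 0 < p < 1 (sigmoid)."""
    n_raw, p_raw = x[..., 0], x[..., 1]
    n = np.logaddexp(0.0, n_raw)          # softplus: log(1 + exp(n_raw))
    p = 1.0 / (1.0 + np.exp(-p_raw))      # sigmoid
    return np.stack([n, p], axis=-1)

# Raw values copied from the y_pred printout in the question.
raw = np.array([[2.19472528, 3.14479065],
                [-1.16056371, 1.69369149],
                [-2.75554061, 1.10352468]])
params = negative_binomial_params(raw)
print(params)
```

If the second column of y_pred looked like the second column of `params` here, the loss would be receiving values of p inside (0, 1).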
I want to convert code written in Python into MATLAB code. Is that possible? I am also wondering how to use Python libraries in MATLAB. Please share the procedure for doing the conversion.
Here is the Data I used:
https://drive.google.com/open?id=1GLm87-5E_6YhUIPZ_CtQLV9F9wcGaTj2
Here is my code in Python:
# imports libraries
import numpy as np
import pandas as pd
import os
import tensorflow as tf
import matplotlib.pyplot as plt
import random
from scipy import signal
from tensorflow.keras.layers import Dense, Activation
from tensorflow.keras.models import Sequential
from tensorflow import set_random_seed
from tensorflow.keras.initializers import glorot_uniform
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error
from sklearn.metrics import accuracy_score
from importlib import reload
# useful pandas display settings
pd.options.display.float_format = '{:.3f}'.format
# useful functions
def plot_history(history, metrics_to_plot):
    """
    Function plots history of selected metrics for fitted neural net.
    """
    # plot
    for metric in metrics_to_plot:
        plt.plot(history.history[metric])
    # name X axis informatively
    plt.xlabel('epoch')
    # name Y axis informatively
    plt.ylabel('metric')
    # add informative legend
    plt.legend(metrics_to_plot)
    # plot
    plt.show()
def plot_fit(y_true, y_pred, title='title'):
    """
    Function plots true values and predicted values, sorted in increasing order by true values.
    """
    # create one dataframe with true values and predicted values
    results = y_true.reset_index(drop=True).merge(pd.DataFrame(y_pred), left_index=True, right_index=True)
    # rename columns informatively
    results.columns = ['true', 'prediction']
    # sort for clarity of visualization
    results = results.sort_values(by=['true']).reset_index(drop=True)
    # plot true values vs predicted values
    results.plot()
    # adding scatter on line plots
    plt.scatter(results.index, results.true, s=5)
    plt.scatter(results.index, results.prediction, s=5)
    # name X axis informatively
    plt.xlabel('obs sorted in ascending order with respect to true values')
    # add customizable title
    plt.title(title)
    # plot
    plt.show();
def reset_all_randomness():
    """
    Function assures reproducibility of NN estimation results.
    """
    # reloads
    reload(tf)
    reload(np)
    reload(random)
    # seeds - for reproducibility
    os.environ['PYTHONHASHSEED']=str(984797)
    random.seed(984797)
    set_random_seed(984797)
    np.random.seed(984797)
    my_init = glorot_uniform(seed=984797)
    return my_init
def give_me_mse(true, prediction):
    """
    This function returns mse for 2 vectors: true and predicted values.
    """
    return np.mean((true-prediction)**2)
# Importing the dataset
X = pd.read_excel(r"C:\filelocation\Data.xlsx","Sheet1").values
y = pd.read_excel(r"C:\filelocation\Data.xlsx","Sheet2").values
# Importing the experiment data
Data = pd.read_excel(r"C:\filelocation\Data.xlsx","Sheet1")
v = pd.DataFrame(Data, columns= ['v']).values
c = pd.DataFrame(Data, columns= ['c']).values
ird = pd.DataFrame(Data, columns= ['ird']).values
tmp = pd.DataFrame(Data, columns= ['tmp']).values
# Data preparation
ird = ird.ravel()
tmp = tmp.ravel()
ir = np.nanmax(ird)
tp = np.nanmax(tmp)
p = v*c
p = p.ravel()
peaks, _ = signal.find_peaks(p)
nop = len(peaks)
pv = p.max()
#Experimental Data for testing
E_data = np.array([[ir,tp,pv,nop]])
#importing some more libraries
from sklearn.preprocessing import LabelEncoder
from keras.utils import np_utils
# encode class values as integers
encoder = LabelEncoder()
encoder.fit(np.ravel(y))
y_encoded = encoder.transform(np.ravel(y))
# convert integers to dummy variables (i.e. one hot encoded)
y_dummy = np_utils.to_categorical(y_encoded)
# reset_all_randomness - for reproducibility
my_init = reset_all_randomness()
# Splitting the dataset into the Training set and Test set
X_train, X_test, y_train, y_test, y_train_dummy, y_test_dummy = train_test_split(X, y, y_dummy, test_size = 0.3, random_state = 20)
# Feature Scaling
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
E_data = sc.transform(E_data)
# Initialising the ANN
model0 = Sequential()
# Adding 1 hidden layer: the input layer and the first hidden layer
model0.add(Dense(units = 160, activation = 'tanh', input_dim = 4, kernel_initializer=my_init))
# Adding 2 hidden layer
model0.add(Dense(units = 49, activation = 'tanh', kernel_initializer=my_init))
# Adding 3 hidden layer
model0.add(Dense(units = 24, activation = 'tanh', kernel_initializer=my_init))
# Adding 4 hidden layer
model0.add(Dense(units = 15, activation = 'tanh', kernel_initializer=my_init))
# Adding output layer
model0.add(Dense(units = 6, activation = 'softmax', kernel_initializer=my_init))
# Set up Optimizer
Optimizer = tf.train.AdamOptimizer(learning_rate=0.001, beta1=0.9, beta2=0.99)
# Compiling the ANN
model0.compile(optimizer = Optimizer, loss = 'categorical_crossentropy', metrics=['accuracy','categorical_crossentropy','mse'])
# Fitting the ANN to the Train set, at the same time observing quality on Valid set
history = model0.fit(X_train, y_train_dummy, validation_data=(X_test, y_test_dummy), batch_size = 100, epochs = 1500)
# Generate prediction for all Train, Valid set and Experimental set
y_train_pred_model0 = model0.predict(X_train)
y_test_pred_model0 = model0.predict(X_test)
y_exp_pred_model0 = model0.predict(E_data)
# find final prediction by taking class with highest probability
y_train_pred_model0 = np.array([[list(x).index(max(list(x))) + 1] for x in y_train_pred_model0])
y_test_pred_model0 = np.array([[list(x).index(max(list(x))) + 1] for x in y_test_pred_model0])
y_exp_pred_model0 = np.array([[list(x).index(max(list(x))) + 1] for x in y_exp_pred_model0])
# check what metrics are in fact available in history
history.history.keys()
# Inverse scaling
X_train_inverse = sc.inverse_transform(X_train)
X_test_inverse = sc.inverse_transform(X_test)
E_data_inverse = sc.inverse_transform(E_data)
#Plots
print('#######################################################################')
# look at model fitting history
plot_history(history, ['mean_squared_error', 'val_mean_squared_error'])
plot_history(history, ['categorical_crossentropy', 'val_categorical_crossentropy'])
plot_history(history, ['acc', 'val_acc'])
# look at model fit quality
plot_fit(pd.DataFrame(y_train), y_train_pred_model0, 'Fit on train data')
plot_fit(pd.DataFrame(y_test), y_test_pred_model0, 'Fit on test data')
#Results
print('#######################################################################')
print('=============Mean Squared Error============')
print('MSE on train data is: {}'.format(give_me_mse(y_train, y_train_pred_model0)))
print('MSE on test data is: {}'.format(give_me_mse(y_test, y_test_pred_model0)))
print('#######################################################################')
print('================Accuracy===================')
print('Accuracy of ANN is: {} Percentage'.format((accuracy_score(y_test, y_test_pred_model0))*100))
print('#######################################################################')
print('========Result of Test Data set is=========')
for i in range(len(y_test)):
    print('%s => %d (expected %s)' % (X_test_inverse[i].tolist(), y_test_pred_model0[i], y_test[i].tolist()))
print('#######################################################################')
print('====Result of Experimental Data set is=====')
print('%s => %d' % (E_data_inverse, y_exp_pred_model0))
There is no "direct" way to convert Python code to MATLAB code.
What you can do is translate the approach (the algorithm) and write the code from scratch.
Or, what I think would be preferable for you, call the Python script directly from MATLAB using its Python API.
Here is the link for further reading: https://in.mathworks.com/help/matlab/call-python-libraries.html
For example:
>> py.math.sqrt(4)
ans =
2
To run your own function, you can create a file in your current MATLAB working directory. Here is a file ‘hello.py’ that contains these two lines:
def world():
    return 'hello world'
Then in MATLAB:
>> disp(char(py.hello.world()))
hello world
If you run into errors, make sure you're using a supported version of Python and add
pyversion <path_to_executable>
to the start of your MATLAB file.
That said, I'm not sure how well it will work considering all the Python libraries you're importing (SciPy, TensorFlow, etc.).
After creating my model in Keras, I want to get the gradients and apply them directly in TensorFlow with the tf.train.AdamOptimizer class. However, since I am using a Dropout layer, I don't know how to tell the model whether it is in training mode or not. The training keyword is not accepted. This is the code:
net_input = Input(shape=(1,))
net_1 = Dense(50)
net_2 = ReLU()
net_3 = Dropout(0.5)
net = Model(net_input, net_3(net_2(net_1(net_input))))
#mycost = ...
optimizer = tf.train.AdamOptimizer()
gradients = optimizer.compute_gradients(mycost, var_list=[net.trainable_weights])
# perform some operations on the gradients
# gradients = ...
trainstep = optimizer.apply_gradients(gradients)
I get the same behavior with and without the dropout layer, even with a dropout rate of 1. How can I solve this?
As @Sharky already said, you can use the training argument when invoking the call() method of the Dropout class. However, if you want to train in TensorFlow graph mode, you need to pass a placeholder and feed it a boolean value during training. Here is an example of fitting Gaussian blobs that is applicable to your case:
import tensorflow as tf
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import ReLU
from tensorflow.keras.layers import Input
from tensorflow.keras import Model
x_train, y_train = make_blobs(n_samples=10,
n_features=2,
centers=[[1, 1], [-1, -1]],
cluster_std=1)
x_train, x_test, y_train, y_test = train_test_split(
x_train, y_train, test_size=0.2)
# `istrain` indicates whether it is inference or training
istrain = tf.placeholder(tf.bool, shape=())
y = tf.placeholder(tf.int32, shape=(None))
net_input = Input(shape=(2,))
net_1 = Dense(2)
net_2 = Dense(2)
net_3 = Dropout(0.5)
net = Model(net_input, net_3(net_2(net_1(net_input)), training=istrain))
xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
labels=y, logits=net.output)
loss_fn = tf.reduce_mean(xentropy)
optimizer = tf.train.AdamOptimizer(0.01)
grads_and_vars = optimizer.compute_gradients(loss_fn,
var_list=[net.trainable_variables])
trainstep = optimizer.apply_gradients(grads_and_vars)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    l1 = loss_fn.eval({net_input:x_train,
                       y:y_train,
                       istrain:True}) # apply dropout
    print(l1) # 1.6264652
    l2 = loss_fn.eval({net_input:x_train,
                       y:y_train,
                       istrain:False}) # no dropout
    print(l2) # 1.5676715
    sess.run(trainstep, feed_dict={net_input:x_train,
                                   y:y_train,
                                   istrain:True}) # train with dropout
Keras layers inherit from the tf.keras.layers.Layer class. The Keras API handles this internally in model.fit. When a Keras Dropout layer is used in a pure TensorFlow training loop, it supports a training argument in its call function.
So you can control it with
dropout = tf.keras.layers.Dropout(rate, noise_shape, seed)(prev_layer, training=is_training)
From official TF docs
Note: - The following optional keyword arguments are reserved for
specific uses: * training: Boolean scalar tensor of Python boolean
indicating whether the call is meant for training or inference. *
mask: Boolean input mask. - If the layer's call method takes a mask
argument (as some Keras layers do), its default value will be set to
the mask generated for inputs by the previous layer (if input did come
from a layer that generated a corresponding mask, i.e. if it came from
a Keras layer with masking support.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dropout#call
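To illustrate why the training flag matters, here is a NumPy sketch of what a dropout layer does in each mode. This is not the Keras implementation; `dropout` is a hypothetical helper that follows the documented behavior (units are zeroed with probability `rate` during training and the survivors are scaled by 1/(1 - rate); at inference the input passes through unchanged).

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout: active only when training is True."""
    if not training or rate == 0.0:
        return x  # inference: identity, nothing is dropped
    keep = rng.random(x.shape) >= rate   # True for units that survive
    return x * keep / (1.0 - rate)       # rescale so the expected value is unchanged

rng = np.random.default_rng(0)
x = np.ones((4, 5))
out_infer = dropout(x, 0.5, training=False, rng=rng)
out_train = dropout(x, 0.5, training=True, rng=rng)
print(out_infer)
print(out_train)
```

If the layer is never told that it is in training mode, it behaves like the `training=False` branch, which would explain seeing the same results with and without the Dropout layer.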