Alternatives for loss functions in python CNTK - python

I have created a sequential model in CNTK and pass this model into a loss function like the following:
ce = cross_entropy_with_softmax(model, labels)
As mentioned here and as I have multilabel classifier, I want to use a proper loss function. The problem is I can not find any proper document to find these loss functions in Python. Is there any suggestion or sample code for this requirement.
I should notice that I found these alternatives (logistic and weighted logistic) in BrainScript language, but not in Python.

"my data has more than one label (three label) and each label has more than two values (30 different values)"
Do I understand right, you have 3 network outputs and associated labels, and each one is a 1-in-30 classifier? Then it seems you can just add three cross_entropy_with_softmax() values. Is that what you want?
E.g. if the model function returns a triple (ending in something like return combine([z1, z2, z3])), then your criterion function that you pass to Trainer could look like this (if you don't use Python 3, the syntax is a little different):
from cntk.layers.typing import Tensor, SparseTensor
#Function
def my_criterion(input : Tensor[input_dim], labels1 : SparseTensor[30],
labels2 : SparseTensor[30], labels3 : SparseTensor[30]):
z1, z2, z3 = my_model(input).outputs
loss = cross_entropy_with_softmax(z1, labels1) + \
cross_entropy_with_softmax(z2, labels2) + \
cross_entropy_with_softmax(z3, labels3)
return loss
learner = ...
trainer = Trainer(None, my_criterion, learner)
# in MB loop:
input_mb, L1_mb, L2_mb, L3_mb = my_next_minibatch()
trainer.train_minibatch(my_criterion.argument_map(input_mb, L1_mb, L2_mb, L3_mb))

Update (based on comments below): If you are using a sequential model then you are probably interested in taking a sum over all positions in the sequence of the loss at each position. cross_entropy_with_softmax is appropriate for the per-position loss and CNTK will automatically compute the sum of the loss values over all positions in the sequence.
Note that the terminology multilabel is non-standard here as it is typically referring to problems with multiple binary labels. The wiki page you link to refers to that case which is different from what you are doing.
Original answer (valid for the actual multilabel case): You will want to use binary_cross_entropy or weighted_binary_cross_entropy. (We decided to rename Logistic when porting this to Python). At the time of this writing these operations only support {0,1} labels. If your labels are in (0,1) then you will need to define your loss like this
import cntk as C
my_bce = label*C.log(model)+(1-label)*C.log(1-model)

Currently, most operators are in the cntk.ops package and documented here. The only exception being the sequence related operators, which reside in cntk.ops.sequence.
We have plans to restructure the operator space (without breaking backwards compatibility) to increase discoverability.
For your particular case, cross_entropy_with_softmax seems to be a reasonable choice, and you can find its documentation with examples here. Please also check out this Jupyter Notebook for a complete example.

Related

Tensorflow & Keras: Different output shape depends on training or inferring

I am sub-classing tensorflow.keras.Model to implement a certain model. Expected behavior:
Training (fitting) time: returns a list of tensors including the final output and auxiliary output;
Inferring (predicting) time: returns a single output tensor.
And the code is:
class SomeModel(tensorflow.keras.Model):
# ......
def call(self, x, training=True):
# ......
return [aux1, aux2, net] if training else net
This is how i use it:
model=SomeModel(...)
model.compile(...,
loss=keras.losses.SparseCategoricalCrossentropy(),
loss_weights=[0.4, 0.4, 1],...)
# ......
model.fit(data, [labels, labels, labels])
And got:
AssertionError: in converted code:
ipython-input-33-862e679ab098:140 call *
`return [aux1, aux2, net] if training else net`
...\tensorflow_core\python\autograph\operators\control_flow.py:918 if_stmt
Then the problem is that the if statement is converted into the calculation graph and this would of course cause the problem. I found the whole stack trace is long and useless so it's not included here.
So, is there any way to make TensorFlow generate different graph based on training or not?
Which tensorflow version are you using? You can overwrite behaviour in the .fit, .predict and .evaluate methods in Tensorflow 2.2, which would generate different graphs for these methods (I assume) and potentially work for your use-case.
The problems with earlier versions is that subclassed models get created by tracing the call method. This means Python conditionals become Tensorflow conditionals and face several limitations during graph creation and execution.
First, both branches (if-else) have to be defined, and regarding python collections (eg. lists), the branches have to have the same structure (eg. number of elements). You can read about the limitations and effects of Autograph here and here.
(Also, a conditional may not get evaluated at every run, if the condition is based on a Python variable and not a tensor.)

Setting up an optimization solver on top of a neural network model

I have a trained neural network model developed using the Keras framework in a Jupyter notebook. It is a regression problem, where I am trying to predict an output variable using some 14 input variables or features.
As a next step, I would like to minimize my output and want to determine what configuration/values these 14 inputs would take to get to the minimal value of the output.
So, essentially, I would like to pass the trained model object as my objective function in a solver, and also a bunch of constraints on the input variables to optimize/minimize the objective.
What is the best Python solver that can help me get there?
Thanks in advance!
So you already have your trained model, which we can think of as f(x) = y.
The standard SciPy method to minimize this is appropriately named scipy.optimize.minimize.
To use it, you just need to adapt your f(x) = y function to fit the API that SciPy uses. That is, the first function argument is the list of params to optimize over. The second argument is optional, and can contain any args that are fixed for the entire optimization (i.e. your trained model).
def score_trained_model(params, args):
# Get the model from the fixed args.
model = args[0]
# Run the model on the params, return the output.
return model_predict(model, params)
With this, plus an initial guess, you can use the minimize function now:
# Nelder-Mead is my go-to to start with.
# But it doesn't take advantage of the gradient.
# Something that does, e.g. BGFS, may perform better for your case.
method = 'Nelder-Mead'
# All zeros is fine, but improving this initial guess can help.
guess_params = [0]*14
# Given a trained model, optimize the inputs to minimize the output.
optim_params = scipy.optimize.minimize(
score_trained_model,
guess_params,
args=(trained_model,),
method=method,
)
It is possible to supply constraints and bounds to some of the optimization methods. For Nelder-Mead that is not supported, but you can just return a very large error when constraints are violated.
Older answer.
OP wants to optimize the inputs, x, not the hyperparameters.
It sounds like you want to do hyperparameter optimization. My Python library of choice is hyperopt: https://github.com/hyperopt/hyperopt
Given that you already have some training and scoring code, for example:
def train_and_score(args):
# Unpack args and train your model.
model = make_model(**args)
trained = train_model(model, **args)
# Return the output you want to minimize.
return score_model(trained)
You can easily use hyperopt to tune parameters like the learning rate, dropout, or choice of activations:
from hyperopt import fmin, hp, tpe, space_eval
space = {
'lr': hp.loguniform('lr', np.log(0.01), np.log(0.5)),
'dropout': hp.uniform('dropout', 0, 1),
'activation': hp.choice('activation', ['relu', 'sigmoid']),
}
# Minimize the training score over the space.
trials = Trials()
best = fmin(train_and_score, space, trials=trials, algo=tpe.suggest, max_evals=100)
# Print details about the best results and hyperparameters.
print(best)
print(space_eval(space, best))
There are also libraries that will help you directly integrate this with Keras. A popular choice is hyperas: https://github.com/maxpumperla/hyperas

In the sklearn RandomForestClassifier, is class_weight=None equivalent to class_weight="balanced_subsample"?

The wording in the documentation makes it seem like None and "balanced_subsample" are equivalent, but I want to make sure that this is indeed the case.
Documentation clearly says that they are not equivalent:
class_weight=None - all classes are supposed to have weight one
class_weight='balanced_subsample' - “balanced” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)).
The “balanced_subsample” mode is the same as “balanced” except that weights are computed based on the bootstrap sample for every tree grown.

How can I apply custom regularization in CNTK (using python)?

do you know how can I apply a custom regularization function to CNTK?
In particular, I would like to add to the loss the derivative of the functino wrt to the inputs; something like
newLoss = loss + lambda * gradient_F(inputs)
where F is the function learned by the model and inputs are the inputs to the model.
How can I achieve this in CNTK? I don't know how to access the gradients wrt to the inputs, and how to take the gradient wrt to the weights of the regularizer.
First, gradient is not a scalar, so it doesn't make a lot of sense to optimize it. The gradient norm might be an interesting thing to add to your loss. To do that, CNTK would have to take the gradient of the gradient norm, which at the time of this writing (July 2017) is not supported. It is however an important feature we want to add in the next few months.
Update: One workaround is to do something like this
noisy_inputs = x + C.random.normal_like(x, scale=0.01)
noisy_model = model.clone('share', {x: noisy_inputs})
auxiliary_loss = C.squared_error(model, noisy_model)
but you will have to tune the scale of the noise for your problem.
CNTK learners only accept numbers as regularizer (L1/L2) values. If you really want to add your custom regularizer, you can easily implement your own Learner. You will have access to the gradients you need. You will find couple of examples on how to implement your own Learner here.
Here's the code to do this:
def cross_entropy_with_softmax_plus_regularization(model, labels, l2_regularization_weight):
w_norm = C.Constant(0);
for p in (model.parameters):
w_norm = C.plus(w_norm, 0.5*C.reduce_sum(C.square(p)))
return C.reduce_log_sum_exp(model.output) -
C.reduce_log_sum_exp(C.times_transpose(labels, model.output)) + l2_regularization_weight*w_norm
and my http://www.telesens.co/2017/09/29/spiral_cntk/ blog post about it

What is the loss function that use the DNNRegressor?

I am using DNNRegressor to train my model. I search in the documentation what is the loss function used by this wrapper but i don't find it. On the other hand, it is possible to change that loss function?.
Thank you for your suggestions.
It uses L2 loss (mean squared error) as defined in target_column.py:
def regression_target(label_name=None,
weight_column_name=None,
target_dimension=1):
"""Creates a _TargetColumn for linear regression.
Args:
label_name: String, name of the key in label dict. Can be null if label
is a tensor (single headed models).
weight_column_name: A string defining feature column name representing
weights. It is used to down weight or boost examples during training. It
will be multiplied by the loss of the example.
target_dimension: dimension of the target for multilabels.
Returns:
An instance of _TargetColumn
"""
return _RegressionTargetColumn(loss_fn=_mean_squared_loss,
label_name=label_name,
weight_column_name=weight_column_name,
target_dimension=target_dimension)
and currently API does not support any changes here. However, since it is open source - you can always modify the constructor to call different function internally, with different loss.

Categories

Resources