Tensorflow: KL divergence for categorical probability distribution

Tensorflow: KL divergence for categorical probability distribution - python

I am trying to calculate the accuracy of my network with KL divergence. The prediction is a k-dimensional probability vector, which shall be compared against a gold-standard probability distribution of the same dimensionality.
I tried this:
corr_subj_test = tf.contrib.distributions.kl(pred_subj, y)
accr_subj_test = tf.reduce_mean(corr_subj_test)
But eventually get the following error:
NotImplementedError: No KL(dist_a || dist_b) registered for dist_a
type Tensor and dist_b type Tensor

Checking the tensorflow github and some other Issues that give the same NotImplementedError error (like this one) it seems that the kl() method does not currently accept that specific combination of parameter types.
If it is possible, you could pass your data to kl() in a data type it accepts (maybe transforming your data to achieve so).**
You could also try post it on tensorflow issues to discuss about your problem.
** Edit:
As suggested and explained by the answer in this question, you can obtain your desired result by using Cross Entropy instead with the softmax_cross_entropy_with_logits method, like this:
newY = pred_subj/y
crossE = tf.nn.softmax_cross_entropy_with_logits(pred_subj, newY)
accr_subj_test = tf.reduce_mean(-crossE)

Related

sklearn mixture.GMM in python using univariate GMM

In R, mclust has an argument 'modelNames' where you can define which model to implement. I wish to do a univariate modeling which is also modelNames <- 'V' in mclust under mixture.GMM in python. However, the only thing I find that I can tweak with is the covariance_type. Nonetheless, when I run the same data using R and mixture.GMM under sklearn, I get different fitting despite the same number of fitted components. What could I change in mixture.GMM to indicate I am using a univariate variable variance?
mclust code:
function(x){Mclust(ma78[x,],G=2,modelNames="V",verbose=FALSE)}
GMM code:
gmm = GMM(n_components = 2).fit(data)

With univariate data, the covariance can either be equal or unique (variable). With Mclust these options are modelNames = "E" or "V", respectively.
With sklearn, they appear to be covariance_type = "tied" or "full". Possibly, something like this for variable Gaussian mixture model
gmm = mixture.GaussianMixture(n_components = 2, covariance_type='full').fit(data)
Even using Mclust or sklearn alone there can be instanced that you may not get same parameter values for different runs - this is because the estimates can depend on the initial values. One way to avoid this is using a larger number of starts if such option is available.

found the answer on stats.stackexchange. The only thing you have to do is to reshape your data data.reshape(-1, 1) before you pass it into sklearn.mixture.GaussianMixture
Andreas

DBSCAN with custom metric

I have the following given:
a dataset in the range of thousands
a way of computing the similarity, but the datapoints themselves I cannot plot them in euclidian space
I know that DBSCAN should support custom distance metric but I dont know how to use it.
say I have a function
def similarity(x,y):
return similarity ...
and I have a list of data that can be passed pairwise into that function, how do I specify this when using the DBSCAN implementation of scikit-learn ?
Ideally what I want to do is to get a list of the clusters but I cant figure out how to get started in the first place.
There is a lot of terminology that still confuses me:
http://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html
How do I pass a feature array and what is it ? How do I fit this implementation to my needs ? How will I be able to get my "sublists" from this algorithm ?

A "feature array" is simply an array of the features of a datapoint in your dataset.
metric is the parameter you're looking for. It can be a string (the name of a builtin metric), or a callable. Your similarity function is a callable. This isn't well described in the documentation, but a metric has to do just that, take two datapoints as parameters, and return a number.
def similarity(x, y):
return ...
reduced_dataset = sklearn.cluster.DBSCAN(metric=similarity).fit(dataset)

In case someone is searching the same for strings with a custom metric
def metric(x, y):
return yourDistFunc(string_seqs[int(x[0])],string_seqs[int(y[0])])
def clusterPockets():
global string_seqs
string_seqs = load_data() #["foo","bar"...]
dat = np.arange(len(string_seqs)).reshape(-1, 1)
clustered_dataset = DBSCAN(metric=metric)).fit(X=dat, y=dat)

How can I apply custom regularization in CNTK (using python)?

do you know how can I apply a custom regularization function to CNTK?
In particular, I would like to add to the loss the derivative of the functino wrt to the inputs; something like
newLoss = loss + lambda * gradient_F(inputs)
where F is the function learned by the model and inputs are the inputs to the model.
How can I achieve this in CNTK? I don't know how to access the gradients wrt to the inputs, and how to take the gradient wrt to the weights of the regularizer.

First, gradient is not a scalar, so it doesn't make a lot of sense to optimize it. The gradient norm might be an interesting thing to add to your loss. To do that, CNTK would have to take the gradient of the gradient norm, which at the time of this writing (July 2017) is not supported. It is however an important feature we want to add in the next few months.
Update: One workaround is to do something like this
noisy_inputs = x + C.random.normal_like(x, scale=0.01)
noisy_model = model.clone('share', {x: noisy_inputs})
auxiliary_loss = C.squared_error(model, noisy_model)
but you will have to tune the scale of the noise for your problem.

CNTK learners only accept numbers as regularizer (L1/L2) values. If you really want to add your custom regularizer, you can easily implement your own Learner. You will have access to the gradients you need. You will find couple of examples on how to implement your own Learner here.

Here's the code to do this:
def cross_entropy_with_softmax_plus_regularization(model, labels, l2_regularization_weight):
w_norm = C.Constant(0);
for p in (model.parameters):
w_norm = C.plus(w_norm, 0.5*C.reduce_sum(C.square(p)))
return C.reduce_log_sum_exp(model.output) -
C.reduce_log_sum_exp(C.times_transpose(labels, model.output)) + l2_regularization_weight*w_norm
and my http://www.telesens.co/2017/09/29/spiral_cntk/ blog post about it

Alternatives for loss functions in python CNTK

I have created a sequential model in CNTK and pass this model into a loss function like the following:
ce = cross_entropy_with_softmax(model, labels)
As mentioned here and as I have multilabel classifier, I want to use a proper loss function. The problem is I can not find any proper document to find these loss functions in Python. Is there any suggestion or sample code for this requirement.
I should notice that I found these alternatives (logistic and weighted logistic) in BrainScript language, but not in Python.

"my data has more than one label (three label) and each label has more than two values (30 different values)"
Do I understand right, you have 3 network outputs and associated labels, and each one is a 1-in-30 classifier? Then it seems you can just add three cross_entropy_with_softmax() values. Is that what you want?
E.g. if the model function returns a triple (ending in something like return combine([z1, z2, z3])), then your criterion function that you pass to Trainer could look like this (if you don't use Python 3, the syntax is a little different):
from cntk.layers.typing import Tensor, SparseTensor
#Function
def my_criterion(input : Tensor[input_dim], labels1 : SparseTensor[30],
labels2 : SparseTensor[30], labels3 : SparseTensor[30]):
z1, z2, z3 = my_model(input).outputs
loss = cross_entropy_with_softmax(z1, labels1) + \
cross_entropy_with_softmax(z2, labels2) + \
cross_entropy_with_softmax(z3, labels3)
return loss
learner = ...
trainer = Trainer(None, my_criterion, learner)
# in MB loop:
input_mb, L1_mb, L2_mb, L3_mb = my_next_minibatch()
trainer.train_minibatch(my_criterion.argument_map(input_mb, L1_mb, L2_mb, L3_mb))

Update (based on comments below): If you are using a sequential model then you are probably interested in taking a sum over all positions in the sequence of the loss at each position. cross_entropy_with_softmax is appropriate for the per-position loss and CNTK will automatically compute the sum of the loss values over all positions in the sequence.
Note that the terminology multilabel is non-standard here as it is typically referring to problems with multiple binary labels. The wiki page you link to refers to that case which is different from what you are doing.
Original answer (valid for the actual multilabel case): You will want to use binary_cross_entropy or weighted_binary_cross_entropy. (We decided to rename Logistic when porting this to Python). At the time of this writing these operations only support {0,1} labels. If your labels are in (0,1) then you will need to define your loss like this
import cntk as C
my_bce = label*C.log(model)+(1-label)*C.log(1-model)

Currently, most operators are in the cntk.ops package and documented here. The only exception being the sequence related operators, which reside in cntk.ops.sequence.
We have plans to restructure the operator space (without breaking backwards compatibility) to increase discoverability.
For your particular case, cross_entropy_with_softmax seems to be a reasonable choice, and you can find its documentation with examples here. Please also check out this Jupyter Notebook for a complete example.

What is the loss function that use the DNNRegressor?

I am using DNNRegressor to train my model. I search in the documentation what is the loss function used by this wrapper but i don't find it. On the other hand, it is possible to change that loss function?.
Thank you for your suggestions.

It uses L2 loss (mean squared error) as defined in target_column.py:
def regression_target(label_name=None,
weight_column_name=None,
target_dimension=1):
"""Creates a _TargetColumn for linear regression.
Args:
label_name: String, name of the key in label dict. Can be null if label
is a tensor (single headed models).
weight_column_name: A string defining feature column name representing
weights. It is used to down weight or boost examples during training. It
will be multiplied by the loss of the example.
target_dimension: dimension of the target for multilabels.
Returns:
An instance of _TargetColumn
"""
return _RegressionTargetColumn(loss_fn=_mean_squared_loss,
label_name=label_name,
weight_column_name=weight_column_name,
target_dimension=target_dimension)
and currently API does not support any changes here. However, since it is open source - you can always modify the constructor to call different function internally, with different loss.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Tensorflow: KL divergence for categorical probability distribution - python

Related

sklearn mixture.GMM in python using univariate GMM

DBSCAN with custom metric

How can I apply custom regularization in CNTK (using python)?

Alternatives for loss functions in python CNTK

What is the loss function that use the DNNRegressor?

Categories

Resources