I want to do one-class SVM classification with sklearn.svm.OneClassSVM on physical states that come from a different library (tenpy). I defined a custom kernel:

import numpy as np

def overlap(X, Y):
    # Gram matrix of pairwise overlaps between the states in X and Y
    return np.array([[x.overlap(y) for y in Y] for x in X])
where overlap() is a defined function in said library to calculate the overlap between states. When I try to fit with my data
clf = OneClassSVM(kernel=overlap)
clf.fit(states)
where states is a list of such state objects, I get the error
TypeError: float() argument must be a string or a number, not 'MPS'
Is there a way to tell sklearn to skip this input check (without editing the source code)?
To my understanding, the nature of the data and how it is processed is in principle not essential to the algorithm, as long as there is a well-defined kernel for the objects.
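The only workaround I have come up with is to precompute the Gram matrix myself and pass kernel='precomputed', so sklearn only ever sees a numeric array; a minimal sketch (if the overlaps are complex-valued, they would presumably need to be mapped to real numbers first):

import numpy as np
from sklearn.svm import OneClassSVM

# Precompute the Gram matrix of pairwise overlaps between training states
gram = np.array([[x.overlap(y) for y in states] for x in states])

clf = OneClassSVM(kernel='precomputed')
clf.fit(gram)

# At prediction time, pass the overlaps between new states and training states:
# gram_new[i, j] = new_states[i].overlap(states[j])

Is the callable-kernel route with the original objects possible at all, or is precomputing the only option?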
I am looking for help with the implementation of a logit model with statsmodels for binary variables.
Here is my code:
(I am using the feature selection methods MinimumRedundancyMaximumRelevance (MRMR) and RecursiveFeatureElimination (RFE), available in Python.)
from statsmodels.discrete.discrete_model import Logit

for i_mrmr in range(4, 20):
    for i_rfe in range(3, i_mrmr):
        # Step 1: select i_mrmr features with the MRMR method (code omitted)
        regressors_step1 = ...
        # Step 2: select i_rfe features from that list with the RFE method (code omitted)
        regressors_step2 = ...
        for method in ['newton', 'nm', 'bfgs', 'lbfgs', 'powell', 'cg', 'ncg']:
            logit_model = Logit(y, X.loc[:, regressors_step2])
            try:
                result = logit_model.fit(method=method, cov_type='HC1')
                print(result.summary())  # summary() is a method, not an attribute
            except Exception:
                result = "error"
I am using Logit from statsmodels.discrete.discrete_model.
The y variable, the target, is binary.
All the explanatory variables, the X, are binary too.
The logit model is "functioning" for the different optimization methods; that is to say, I end up with a summary to print. Nonetheless, various warnings are printed, such as: "Maximum Likelihood optimization failed to converge."
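(For completeness, the convergence status can also be checked programmatically rather than by parsing the warnings; a minimal sketch with toy data, relying on mle_retvals, which stores the optimizer diagnostics for these models:)

import numpy as np
from statsmodels.discrete.discrete_model import Logit

# Toy binary data just to illustrate the check
rng = np.random.default_rng(0)
X_toy = rng.integers(0, 2, size=(100, 3)).astype(float)
y_toy = rng.integers(0, 2, size=100)

result = Logit(y_toy, X_toy).fit(method='bfgs', cov_type='HC1')
# mle_retvals includes a 'converged' flag set by the optimizer
print(result.mle_retvals['converged'])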
The optimization methods available in statsmodels are the ones from scipy:
‘newton’ for Newton-Raphson
‘nm’ for Nelder-Mead
‘bfgs’ for Broyden-Fletcher-Goldfarb-Shanno (BFGS)
‘lbfgs’ for limited-memory BFGS with optional box constraints
‘powell’ for modified Powell’s method
‘cg’ for conjugate gradient
‘ncg’ for Newton-conjugate gradient
We can find these methods in scipy.optimize.
Here are my questions:
I did not find anywhere an argument against using these optimization methods on a binary set of variables. But, because of the warnings, I am asking myself whether it is correct to do so. And then, what is the best method, the one that is most appropriate in this case?
Here: Scipy minimize: how to restrict x only to 0 and 1? it is implicitly suggested that a model of the Python MIP (Mixed-Integer Linear Programming) kind could be better in the binary-variables case. In the documentation of the Python MIP package it appears that, to implement this kind of model, I should explicitly give a function to minimize or maximize and also express the constraints (see: https://docs.python-mip.com/en/latest/quickstart.html#creating-models).
Therefore I am wondering: do I need to define a logit function as the objective function? What constraints should I express? Is there an easier way to do this?
I am working on a weighted version of SparseCategoricalCrossentropy. Right now my implementation converts y_true to one-hot form, calculates the cross entropy, and then multiplies it with a weight matrix. I get the same output between my implementation and SparseCategoricalCrossentropy when all weights are 1; however, my problem is with the one-hot encoding. I have a lot of classes (32 + bg), and when using one-hot encoding I run out of memory for large images/batch sizes, which does not happen with SparseCategoricalCrossentropy. I am trying to figure out how the built-in one is implemented (is there a way to avoid one-hot encoding, etc.?). How is the built-in one implemented, or where is it implemented? Looking at [1], it is probably implemented on the native side, but I cannot find it.
[1] https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/keras/losses.py#L692
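For reference, my current implementation looks roughly like this sketch (simplified to a per-class weight vector rather than a full weight matrix; the names are mine):

import tensorflow as tf

def weighted_cce_onehot(y_true, y_pred, class_weights, num_classes=33):
    # One-hot encoding the sparse labels is what blows up memory
    # for large images / batch sizes with many classes.
    onehot = tf.one_hot(tf.cast(y_true, tf.int32), depth=num_classes)
    cce = tf.keras.losses.categorical_crossentropy(onehot, y_pred)
    # Pick out the weight of the true class for every pixel
    pixel_weights = tf.reduce_sum(onehot * class_weights, axis=-1)
    return cce * pixel_weights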
The SparseCategoricalCrossentropy documentation has a "View Source on GitHub" tab you can click on. This will show you the implementation. Doing this leads us to line 666 of tensorflow.python.keras.losses. We can see from the class definition that it wraps the function sparse_categorical_crossentropy, which is defined on line 4867 of tensorflow.keras.backend.
At the bottom of that function definition we can see it is a wrapper around tf.nn.sparse_softmax_cross_entropy_with_logits, whose definition can be found in tensorflow.python.ops.nn_ops. At the bottom of this function definition, we can see it is in turn a wrapper around gen_nn_ops.sparse_softmax_cross_entropy_with_logits. If you look for gen_nn_ops, you won't find it: it is the name of the *.so file that Python imports to run tensorflow's C++ op code.
So what we are really looking for is a sparse softmax C++ kernel, which can be found in tensorflow.core.kernels.sparse_xent_op.cc. This op calls a functor which calls a method SparseXentEigenImpl, whose implementation can be found in the corresponding header file, sparse_xent_op.h. Starting on line 47 of that file you can see how they create the sparse loss.
// Generator for calculation of the sparse Xent loss.
// This generator takes the logits, the sum of the exponentiated
// logits, and the label indices. For each minibatch entry, ignoring
// the batch index b, it calculates:
//
// loss[j] = (log(sum_exp_logits) - logits[j]) * 1{ j == label }
//
// for j = 0 .. num_classes. This value must be summed over all j for
// the final loss.
And on line 224 there is a comment outlining the loss calculation formula.
// sum(-labels *
// ((logits - max_logits) - log(sum(exp(logits - max_logits)))))
// along classes
Not sure if this helps you create your weighted op, but this is how sparse xent is calculated in tensorflow.
Edit:
There is also a method tf.nn.weighted_cross_entropy_with_logits. Not sure if that will work with your sparsity requirement, but it will probably work better than trying to implement something yourself.
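If you mainly want to avoid the one-hot tensor, one option (a sketch of an alternative, not the built-in implementation) is to compute the unweighted sparse loss directly and then gather each pixel's weight by its integer label, which never materializes a one-hot encoding:

import tensorflow as tf

def weighted_sparse_cce(y_true, y_pred, class_weights):
    # Unweighted per-pixel loss, computed directly from the sparse labels
    loss = tf.keras.losses.sparse_categorical_crossentropy(y_true, y_pred)
    # Look up the weight of the true class per pixel: no one-hot tensor needed
    weights = tf.gather(class_weights, tf.cast(y_true, tf.int32))
    return loss * weights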
I am quite a newbie to PyTorch, and I am trying to implement a sort of "post-training" procedure on embeddings.
I have a vocabulary with a set of items, and I have learned one vector for each of them.
I keep the learned vectors in a nn.Embedding object.
What I'd like to do now is to add a new item to the vocabulary without updating the already learned vectors. The embedding for the new item would be initialized randomly, and then trained while keeping all the other embeddings frozen.
I know that in order to prevent an nn.Embedding from being trained, I need to set its requires_grad attribute to False. I have also found this other question that is similar to mine. The best answer proposes to
either store the frozen vectors and the vectors to train in different nn.Embedding objects, the former with requires_grad = False and the latter with requires_grad = True,
or store the frozen vectors and the new one in the same nn.Embedding object, computing the gradient on all vectors but descending only on the dimensions of the vector of the new item. This, however, leads to a relevant degradation in performance (which I want to avoid, of course).
My problem is that I really need to store the vector for the new item in the same nn.Embedding object as the frozen vectors of the old items. The reason for this constraint is the following: when building my loss function with the embeddings of the items (old and new), I need to look up the vectors based on the ids of the items, and for performance reasons I need to use Python slicing. In other words, given a list of item ids item_ids, I need to do something like vecs = embedding[item_ids]. If I used two different nn.Embedding objects for the old items and the new one, I would need an explicit for-loop with if-else conditions, which would lead to worse performance.
Is there any way I can do this?
If you look at the implementation of nn.Embedding it uses the functional form of embedding in the forward pass. Therefore, I think you could implement a custom module that does something like this:
import torch
from torch.nn.parameter import Parameter
import torch.nn.functional as F

weights_freeze = torch.rand(10, 5)  # plain tensor, not a Parameter, so no gradient
weights_train = Parameter(torch.rand(2, 5))  # the new, trainable rows
# One weight matrix, indexable with a single set of ids
weights = torch.cat((weights_freeze, weights_train), 0)

idx = torch.tensor([[11, 1, 3]])
lookup = F.embedding(idx, weights)

# Desired result
print(lookup)
lookup.sum().backward()

# 11 corresponds to row 1 of weights_train, so this has a gradient
print(weights_train.grad)
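Wrapped in a module along these lines, it might look like the following sketch (the class and argument names are mine; note the concatenation happens in forward, so autograd tracks the trainable rows on every pass):

import torch
import torch.nn as nn
import torch.nn.functional as F

class PartiallyFrozenEmbedding(nn.Module):
    def __init__(self, frozen_weights, num_new, dim):
        super().__init__()
        # Buffers are saved with the model but receive no gradient
        self.register_buffer("weights_freeze", frozen_weights)
        self.weights_train = nn.Parameter(torch.rand(num_new, dim))

    def forward(self, idx):
        # Re-concatenate on each forward pass so the graph includes weights_train
        weights = torch.cat((self.weights_freeze, self.weights_train), 0)
        return F.embedding(idx, weights)

emb = PartiallyFrozenEmbedding(torch.rand(10, 5), num_new=2, dim=5)
out = emb(torch.tensor([[11, 1, 3]]))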
The documentation says this:
copy_X : boolean, optional, default True
If True, X will be copied; else, it may be overwritten.
What would X be overwritten with? And which X, during training or testing?
It is written in the source code that the input data needs to be centered and normalized in order to apply the algorithm: https://github.com/scikit-learn/scikit-learn/blob/bac89c2/sklearn/linear_model/base.py#L93.
It seems that the function responsible for that transformation, self._preprocess_data, is only called during the fitting part: https://github.com/scikit-learn/scikit-learn/blob/bac89c2/sklearn/linear_model/base.py#L463. So only the training set can be modified.
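A quick way to see this in action (an illustrative sketch with LinearRegression; whether X is actually modified in place can depend on the version, dtype, and layout of the input):

import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0], [3.0]])
y = np.array([1.0, 2.0, 3.0])
X_before = X.copy()

# With copy_X=False, fit() is allowed to center X in place during preprocessing
LinearRegression(copy_X=False).fit(X, y)
print(np.array_equal(X, X_before))  # may print False: X was centered in place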
Hope it helped.
In scikit-learn's naive Bayes classifiers you can specify the prior probabilities, and the classifier will use those provided probabilities in its calculations. But I don't know how the prior probabilities should be ordered.
import random
from sklearn.naive_bayes import BernoulliNB

data = [[0], [1]]
classes = ['light bulb', 'door mat']
random.shuffle(classes)  # This simulates getting classes from a complex source.

classifier = BernoulliNB(class_prior=[0, 1])  # Here we provide prior probabilities.
classifier.fit(data, classes)
In the above code, how do I know which class is assumed to be the 100% prior? Do I need to consider the order of the classes in the data before specifying prior probabilities?
I would also be interested in knowing where this is documented.
It seems to be undocumented. When fitting, the target is preprocessed by LabelBinarizer, so you can get your data's classes with
from sklearn.preprocessing import LabelBinarizer
labelbin = LabelBinarizer()
labelbin.fit_transform(classes)
Then labelbin.classes_ contains the resulting classes for your target data (classes), in the order that the priors correspond to.
The order is that of the classes after sorting, so P(light bulb) = 0.4 would be specified with class_prior=[0.6, 0.4], because "door mat" < "light bulb" alphabetically.
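You can verify the order on the fitted classifier itself, since classes_ shows the order the priors are matched against:

from sklearn.naive_bayes import BernoulliNB

data = [[0], [1]]
classes = ['light bulb', 'door mat']

# Priors follow the sorted class order: ['door mat', 'light bulb']
classifier = BernoulliNB(class_prior=[0.6, 0.4]).fit(data, classes)
print(classifier.classes_)  # ['door mat' 'light bulb']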
Deep within the code base the following happens: the classes you provide samplewise to the call of fit() are turned into a set, sorted, and then stored in that order in the classifier object (alphabetical or numerical order). The priors provided to __init__() correspond to the classes in this exact order.
Apparently this is undocumented.
For further reading:
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/naive_bayes.py
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/preprocessing/label.py
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/utils/multiclass.py