LightGBM - Binary Classification using train API - python

I have a binary classification problem which I'm trying to solve using LightGBM's train and cv APIs.
First I tuned the hyperparameters using hyperopt with an objective function that wraps the LightGBM cv API call. Since the target classes are highly unbalanced, I used a custom focal loss function with F1-score evaluation to find the best fit.
When I fit the final model with the optimized parameters, the model doesn't treat the problem as binary and outputs continuous values at prediction. See the attached image.
Does anyone know what I'm missing?
Jupyter notebook
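One likely cause, offered as a sketch rather than a diagnosis of the notebook: predict on a Booster returned by the train API gives scores, not class labels, and with a custom objective such as focal loss those scores are raw margins. A sigmoid and a threshold then have to be applied by hand. The helper names, the sample scores and the 0.5 threshold below are illustrative assumptions.

```python
import math

def sigmoid(raw_score):
    """Map a raw margin to a probability in (0, 1) without overflow."""
    if raw_score >= 0:
        return 1.0 / (1.0 + math.exp(-raw_score))
    exp_score = math.exp(raw_score)
    return exp_score / (1.0 + exp_score)

def to_labels(raw_scores, threshold=0.5):
    """Turn raw margins into 0/1 labels by thresholding the probability."""
    return [1 if sigmoid(score) >= threshold else 0 for score in raw_scores]

# Hypothetical raw outputs, standing in for booster.predict(X_test)
raw_scores = [-2.3, 0.1, 4.0, -0.4]
labels = to_labels(raw_scores)
```

With unbalanced classes, 0.5 is rarely the best cut-off, so the threshold could be tuned on validation data with the same F1 metric used during the search.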

Related

Are there any alternate ways to apply class weights to tensorflow neural networks?

I am currently trying to create a Tensorflow DNN model with a multilabel target variable, and whilst my code hasn't had any problems so far, the imbalanced nature of the dataset that I'm working with has caused a few problems.
As per the recommendations in Keras' documentation, I've applied an initial bias to the model. I've also tried to enable the class weight parameter in the model compile function, and this is where I'm stuck.
https://github.com/tensorflow/tensorflow/issues/41448
There seems to be a known bug in this method, as seen in this GitHub link, and my attempts at creating a workaround haven't been successful at all. I'd appreciate any advice on creating a workaround because I'm at a loss myself, to be honest. Currently running TensorFlow 2.4.
You are using a slightly old version of TensorFlow. This worked for me in a multiclass dataset using TensorFlow 2.7 and Keras 2.7:
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Weight each class inversely to its frequency in y_train
class_weights = compute_class_weight(
    class_weight="balanced", classes=np.unique(y_train), y=y_train
)
model.fit(
    ...
    class_weight=dict(enumerate(class_weights))
)
The values of y_train must be integers in the range [0, NUMBER_CLASSES - 1] for this code to work correctly. You can accomplish this using LabelEncoder.
Alternatively, you can use sample_weight instead of class_weight to accomplish the same thing (in fact, Keras internally converts class_weight to sample_weight). Here you can find the documentation about these parameters.
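To make the relation between the two parameters concrete, here is a minimal pure-Python sketch. It assumes the "balanced" heuristic, n_samples / (n_classes * class_count), and expands the per-class weights into one weight per training row. The function names and the toy labels are hypothetical.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

def to_sample_weights(labels, class_weights):
    """Expand per-class weights into one weight per training example."""
    return [class_weights[label] for label in labels]

# Hypothetical integer labels with a strong imbalance towards class 0
y_train = [0, 0, 0, 0, 0, 0, 1, 1, 2, 2]
class_weights = balanced_class_weights(y_train)
sample_weights = to_sample_weights(y_train, class_weights)
```

The resulting list, converted to a NumPy array, is the kind of value that would be passed as sample_weight instead of class_weight.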
Other easy-to-implement and effective methods to combat data imbalance are oversampling and undersampling, which have a similar effect to using class_weight. You can use them in case you have problems using class_weight or sample_weight.

How to train svm model in python with scikit and use it for predictions in C++?

I would like to know whether it is possible to train an SVM classifier using scikit in Python (love this module and its documentation) and import the trained model into C++ for making predictions.
Here is how far I got:
I have written a python script which uses scikit to create a reasonable svm classifier
I can also store that model in pickle format
Now, I had a look at libSVM for C++ but I do not see how that is able to import such a model. I think that the documentation is not that good or I missed something here.
However, I also thought that instead of storing the whole model, I could just store the parameters of the SVM classifier and load only those (I think the needed ones are: support vectors, C, degree) for a linear SVM classifier. Unfortunately, I cannot find any libSVM documentation on how to do that.
A last option, which I would not prefer that much, would be to go with OpenCV, in which I could train an SVM classifier, store it and load it back, all in C++. But this would introduce even more library dependencies (especially such a large one) for my program. If there is a good way to avoid that, I would love to do so.
As always I thank you in advance!
Best,
Tukk
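One dependency-free route for the linear case, sketched under assumptions: with a linear kernel the decision function is w·x + b, so only the weight vector and the intercept need to leave Python (a fitted linear-kernel SVC in scikit-learn exposes them as coef_ and intercept_), while C and degree play no role at prediction time. The file layout, function names and numbers below are hypothetical; a C++ program would read the same text file and compute the same dot product.

```python
import os
import tempfile

def save_linear_model(path, weights, intercept):
    """Write the intercept on the first line, then one weight per line."""
    with open(path, "w") as handle:
        handle.write(f"{float(intercept)!r}\n")
        for weight in weights:
            handle.write(f"{float(weight)!r}\n")

def load_linear_model(path):
    """Read the file back as (weights, intercept)."""
    with open(path) as handle:
        values = [float(line) for line in handle if line.strip()]
    return values[1:], values[0]

def predict(weights, intercept, features):
    """Return 1 when w.x + b is positive, otherwise 0."""
    score = sum(w * x for w, x in zip(weights, features)) + intercept
    return 1 if score > 0 else 0

# Hypothetical values, standing in for clf.coef_[0] and clf.intercept_[0]
weights, intercept = [0.8, -1.5, 0.25], -0.1

path = os.path.join(tempfile.mkdtemp(), "svm_linear.txt")
save_linear_model(path, weights, intercept)
loaded_weights, loaded_intercept = load_linear_model(path)
```

Any feature scaling applied before training would have to be exported and reapplied on the C++ side in the same way.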

Can Optunity be used in Multi class svm?

I ran a program with Optunity to find the hyperparameters of an SVM without deciding the kernel first, as shown here: http://optunity.readthedocs.io/en/latest/notebooks/notebooks/sklearn-svc.html#tune-svc-without-deciding-the-kernel-in-advance. It ran, but when I replaced the data and labels with multiclass data it raised an error. Why is this happening?
Optunity uses ROC-AUC for selecting the optimal hyperparameters. Plain ROC-AUC is defined only for binary problems, so it can't be computed directly for multiclass labels. Some workarounds for using Optunity on multiclass problems are:
Use accuracy rather than AUC as the criterion for selecting the optimal parameters.
Convert the multiclass problem into binary problems and select the optimal parameters for each classifier.
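Both workarounds can be sketched in a few lines of plain Python. The function names and the toy labels are hypothetical; the accuracy function would take the place of the AUC call inside the function being maximised, and the one-vs-rest helper produces the binary targets for the second option.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    matches = sum(1 for truth, guess in zip(y_true, y_pred) if truth == guess)
    return matches / len(y_true)

def one_vs_rest_labels(y, positive_class):
    """Relabel a multiclass target as 1 for one class and 0 for the rest."""
    return [1 if label == positive_class else 0 for label in y]

# Hypothetical multiclass labels and predictions
y_true = ["cat", "dog", "bird", "dog", "cat"]
y_pred = ["cat", "dog", "dog", "dog", "bird"]

score = accuracy(y_true, y_pred)
binary_targets = {cls: one_vs_rest_labels(y_true, cls) for cls in sorted(set(y_true))}
```

Each entry of binary_targets is an ordinary binary problem, so AUC is usable again for tuning that particular classifier.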

Logistic regression multiclass classification with Python API

Currently the Python API does not yet support multiclass classification within Spark, but it will in the future, as described on the Spark page [1].
Is there a release date, or any way to run multiclass logistic regression from Python? I know it works with Scala, but I would like to run it with Python. Thank you.
scikit-learn's LogisticRegression offers a multi_class parameter. From the docs:
Multiclass option can be either ‘ovr’ or ‘multinomial’. If the option
chosen is ‘ovr’, then a binary problem is fit for each label. Else the
loss minimised is the multinomial loss fit across the entire
probability distribution. Works only for the ‘lbfgs’ solver.
Hence, multi_class='ovr' seems to be the right choice for you.
For more information: see this link
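The difference between the two options can be illustrated with a small sketch that turns per-class scores into probabilities. It assumes that one-vs-rest applies an independent sigmoid to each class score and then renormalises, while the multinomial option applies a softmax across all classes. The scores are hypothetical, and in practice the two schemes would also fit different coefficients.

```python
import math

def ovr_probabilities(scores):
    """One-vs-rest: independent sigmoid per class, then renormalise to sum to 1."""
    sigmoids = [1.0 / (1.0 + math.exp(-score)) for score in scores]
    total = sum(sigmoids)
    return [value / total for value in sigmoids]

def multinomial_probabilities(scores):
    """Multinomial: softmax over all class scores at once."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]

# Hypothetical decision scores for three classes
scores = [2.0, 0.5, -1.0]
ovr = ovr_probabilities(scores)
multinomial = multinomial_probabilities(scores)
```

Both lists sum to 1 and rank the classes in the same order here, but the probabilities themselves differ.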
Added:
As per the pyspark documentation, you can still do multiclass logistic regression using their API. The train method of the class pyspark.mllib.classification.LogisticRegressionWithLBFGS takes the optional parameter numClasses for multiclass classification.

Classifying new occurrences - Multinomial Naive Bayes

So I have trained a Multinomial Naive Bayes classifier using scikit-learn.
Now what I can do is classify test data by using predict.
But if I want to run this every night as a script, I clearly need to always have a classifier already trained up! What I'd like to be able to do is take the classifier coefficients and informative words, and use these to classify new data.
Is this possible - to develop my own method for classification? Or should I simply be training the scikit-learn classifier nightly?
EDIT: One thing, it seems I can do, is retain and save my trained classifier.
However with logistic regression, you can take the coefficients and use these on new data. Is there anything similar to this for NB?
Do you mean sklearn? Are you using Python? If so, note that sklearn's get_params(deep=True) and set_params(**params) only read and write the constructor hyperparameters (such as alpha), not what the model learned during fit, so they cannot restore a trained classifier on their own. Saving the fitted model object itself does.
Therefore, a possible procedure could be:
Training stage:
1) Train the model
2) Save the fitted model into a binary file (e.g. by using pickle.dump())
Prediction stage:
1) Load the fitted model from the binary file (e.g. by using pickle.load())
2) Classify new data by using the predict() function
Hope that helps.
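On the question of whether Naive Bayes has something like logistic regression's coefficients: a fitted MultinomialNB in scikit-learn keeps its learned values in class_log_prior_ and feature_log_prob_, and the predicted class is the one that maximises the class log prior plus the count-weighted sum of feature log probabilities. A minimal sketch of that rule follows; the numbers and the function name are hypothetical stand-ins for values exported from a trained model.

```python
import math

def predict_class(class_log_prior, feature_log_prob, counts):
    """Return the index of the class with the highest joint log-likelihood."""
    scores = [
        prior + sum(count * log_prob for count, log_prob in zip(counts, log_probs))
        for prior, log_probs in zip(class_log_prior, feature_log_prob)
    ]
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical values for 2 classes and a 3-word vocabulary,
# standing in for clf.class_log_prior_ and clf.feature_log_prob_
class_log_prior = [math.log(0.5), math.log(0.5)]
feature_log_prob = [
    [math.log(0.7), math.log(0.2), math.log(0.1)],  # class 0
    [math.log(0.1), math.log(0.3), math.log(0.6)],  # class 1
]

first = predict_class(class_log_prior, feature_log_prob, [3, 0, 1])
second = predict_class(class_log_prior, feature_log_prob, [0, 1, 4])
```

The returned index refers to a position in the model's list of classes, and new text must be turned into word counts with the same vocabulary that was used for training.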
