LASSO in Python for classification

I am studying LASSO in Python with sklearn, but the output looks wrong when I run the code on a classification data set: I get only one result after 10-fold cross-validation.
Y is a binary label with values 1 and 2.
import numpy as np
from sklearn.linear_model import LassoCV, Lasso
from sklearn.model_selection import cross_val_score
lasso = Lasso().fit(X,Y)
accs=cross_val_score(lasso, X, Y, scoring=None, cv=10)
print('The results:',accs)
I expect to get ten different results after cross-validation with LASSO in Python.

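LASSO is a regression method. Supervised learning has two main task types, classification and regression, and sklearn's Lasso treats your labels 1 and 2 as continuous targets, so cross_val_score reports R² rather than accuracy. Perhaps you should try a classifier instead, such as random forest classification.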
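If the goal is a lasso-style (L1-penalized) classifier rather than a regressor, one option is logistic regression with an L1 penalty. The sketch below is an assumption about what the asker is after, not the original code: the data comes from make_classification as a stand-in for the unseen X and Y, and the labels are shifted to 1 and 2 to match the question.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Stand-in data (assumption): the question's X and Y are not shown.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
Y = y + 1  # binary labels 1 and 2, as in the question

# L1-penalized logistic regression; liblinear is one solver that supports L1.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)

# For a classifier, scoring=None falls back to accuracy; cv=10 yields 10 scores.
accs = cross_val_score(clf, X, Y, scoring=None, cv=10)
print("The results:", accs)
```

There is no need to call fit before cross_val_score, since the function fits its own copies of the estimator on each training fold.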

Related

Why applying cross validation before training a model

So, I am struggling to understand why, as a common practice, a cross-validation step is applied to a model that has not been trained yet. An example of what I mean can be found here. A piece of the code is pasted below:
from numpy import mean
from numpy import std
from sklearn.datasets import make_classification
from sklearn.model_selection import RepeatedKFold
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
# create dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=1)
# prepare the cross-validation procedure
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
# create model
model = LogisticRegression()
# evaluate model
scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
# report performance
print('Accuracy: %.3f (%.3f)' % (mean(scores), std(scores)))
Questions:
What would be the purpose of the cross-validation at that point?
Does some training procedure take place on any part of that code?
How does RepeatedKFold contribute to tackling an unbalanced dataset (let's assume that this is the case)?
Thanks in advance!
cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
According to the documentation, cross_val_score fits the model itself, using the given cross-validation technique.
In the code above, "model" holds the (still unfitted) estimator, and "cv" describes how cross_val_score will split the data into training and validation sets to evaluate the model.
In other words, those are just definitions; the actual training and validation happen inside the cross_val_score function.
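A small check makes this concrete. In the sketch below (smaller synthetic data and a plain KFold are assumptions chosen for speed), cross_val_score trains clones of the estimator internally, so the original model object has no coef_ attribute until fit is called on it directly.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = make_classification(n_samples=200, n_features=20, random_state=1)
model = LogisticRegression(max_iter=1000)
cv = KFold(n_splits=5, shuffle=True, random_state=1)

# Training happens in here: one fit per fold, each on a clone of `model`.
scores = cross_val_score(model, X, y, scoring="accuracy", cv=cv)
print(len(scores))

# The original object was never fitted, so it has no learned coefficients.
fitted_after_cv = hasattr(model, "coef_")
print(fitted_after_cv)

# Fitting it explicitly is a separate step, usually done on all training data.
model.fit(X, y)
fitted_after_fit = hasattr(model, "coef_")
print(fitted_after_fit)
```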
How does RepeatedKFold contribute to tackling an unbalanced dataset (let's assume that this is the case)?
K-fold CV generally doesn't tackle an unbalanced dataset; it just ensures that the result is not biased by one particular choice of training/validation split.
Repeated k-fold cross-validation provides a way to improve the estimated performance of a machine learning model. This involves simply repeating the cross-validation procedure multiple times and reporting the mean result across all folds from all runs. This mean result is expected to be a more accurate estimate of the true unknown underlying mean performance of the model on the dataset, as calculated using the standard error.
If you want to tackle an unbalanced dataset, you have to use a better metric than accuracy, such as 'balanced_accuracy' or 'roc_auc', and make sure that both the training and validation sets contain both positive and negative cases (for example, by using stratified folds).
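As a minimal sketch of that advice, the question's code can be adapted as follows. The 90/10 class weights and the switch to RepeatedStratifiedKFold are assumptions made for illustration; stratification keeps the class ratio roughly constant in every fold.

```python
from numpy import mean, std
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# Imbalanced synthetic data (assumption): about 90% negatives, 10% positives.
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15,
                           n_redundant=5, weights=[0.9, 0.1], random_state=1)

# Stratified folds preserve the class ratio in each train/validation split.
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
model = LogisticRegression(max_iter=1000)

# balanced_accuracy averages the recall of each class, so the majority
# class cannot dominate the score the way it does with plain accuracy.
scores = cross_val_score(model, X, y, scoring="balanced_accuracy", cv=cv)
print('Balanced accuracy: %.3f (%.3f)' % (mean(scores), std(scores)))
```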

Sklearn - Can we use cross validation and batch training in same model?

Is it possible to train a sklearn random forest model with k-fold cross-validation while feeding the input set batch by batch? I have a memory problem: I cannot fit all the training data at once. Besides, I want to use cross-validation (not a train/test split). Is there any example of how to do that?
Update: I found this website, where they present a library to do that. It has an example as follows:
from dask_ml.wrappers import Incremental
from dask_ml.datasets import make_classification
import sklearn.linear_model
X, y = make_classification(chunks=25)
est = sklearn.linear_model.SGDClassifier()
clf = Incremental(est, scoring='accuracy')
clf.fit(X, y, classes=[0, 1])
However, I cannot understand where I would give my own X and y data to the model. I do not see where they come from. How can I fit the model on my own training data?
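In that snippet, X and y are synthetic arrays produced by dask_ml.datasets.make_classification, so your own arrays would take their place. The Incremental wrapper relies on estimators that expose partial_fit, which RandomForestClassifier does not, hence the SGDClassifier in the example. A rough sketch of the same idea in plain scikit-learn, combining k-fold splits with batch-wise partial_fit, could look like this; the in-memory random data and the batch size are stand-ins (assumptions), and in practice each batch would be loaded from disk.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import KFold

# Stand-in for your own data (assumption): replace with batches read from disk.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

batch_size = 100          # hypothetical value, tune to your memory budget
classes = np.unique(y)    # partial_fit needs all classes on the first call
fold_scores = []

kfold = KFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in kfold.split(X):
    est = SGDClassifier(random_state=0)  # fresh model for every fold
    # Feed the training part of this fold one batch at a time.
    for start in range(0, len(train_idx), batch_size):
        batch = train_idx[start:start + batch_size]
        est.partial_fit(X[batch], y[batch], classes=classes)
    # Score on the held-out part of this fold.
    fold_scores.append(est.score(X[test_idx], y[test_idx]))

print(fold_scores)
```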

What is .linear_model in sklearn.linear_model

I want to know what .linear_model means in the following code:
from sklearn.linear_model import LogisticRegression
My understanding is that sklearn is the library/module (I take both to mean the same thing) and LogisticRegression is a class inside this module.
But I'm not able to understand what .linear_model means.
linear_model is a module. sklearn is a package. A package is basically a module that contains other modules.
linear_model is a submodule of the sklearn package (not a class); it contains classes and functions for performing machine learning with linear models.
The term linear model implies that the model is specified as a linear combination of features. Based on training data, the learning process computes one weight for each feature to form a model that can predict or estimate the target value.
It includes linear regression and classification, Ridge regression and classification, Lasso, Multi-task Lasso, etc.
Check the sklearn doc for further details.
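The distinction between package, module and class can be checked from Python itself. This is only an illustrative sketch using the standard inspect module; the variable names are made up for the example.

```python
import inspect
import sklearn
import sklearn.linear_model
from sklearn.linear_model import LogisticRegression

# A package is a module that has a __path__ attribute (it can contain submodules).
is_package = hasattr(sklearn, "__path__")

# sklearn.linear_model is a module object nested inside the sklearn package.
is_module = inspect.ismodule(sklearn.linear_model)

# LogisticRegression is a class exposed by that module.
is_class = inspect.isclass(LogisticRegression)

print(is_package, is_module, is_class)
```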

How to apply Leave one out cross validation with logistic regression and find the values of Coefficents?

I have written code that performs logistic regression with leave-one-out cross-validation. I need to know the values of the logistic regression coefficients, but the attribute model.coef_ works only after the model has been fitted with the fit function. Since I have only performed cross-validation, I have not called fit to train the model.
Here is the code:
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.linear_model import LogisticRegression
reg=LogisticRegression()
loo=LeaveOneOut()
# train1 is my feature matrix and labels is my target vector
scores=cross_val_score(reg,train1,labels,cv=loo)
print(scores)
print(scores.mean())
coef = reg.coef_  # fails: reg itself was never fitted
I want to know the coefficient values for the features in train1, but since I have not used the fit method, how can I get the values of these coefficients?
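One possible approach, sketched here under the assumption that synthetic data can stand in for train1 and labels, is to use cross_validate with return_estimator=True, which keeps the model fitted on each fold. Each leave-one-out fold then has its own coefficients; for a single final set, the model can simply be fitted on all the data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_validate

# Stand-in data (assumption) for the question's train1 and labels.
train1, labels = make_classification(n_samples=40, n_features=4,
                                     n_informative=2, n_redundant=0,
                                     random_state=0)

reg = LogisticRegression()
results = cross_validate(reg, train1, labels, cv=LeaveOneOut(),
                         return_estimator=True)
print(results["test_score"].mean())

# One row of coefficients per fold (one fold per sample with leave-one-out).
fold_coefs = np.array([est.coef_.ravel() for est in results["estimator"]])
print(fold_coefs.shape)

# A single set of coefficients: fit the model once on all the data.
final_coef = reg.fit(train1, labels).coef_
print(final_coef)
```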

Computing training score using cross_val_score

I am using cross_val_score to compute the mean score for a regressor. Here's a small snippet.
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
cross_val_score(LinearRegression(), X, y_reg, cv = 5)
Using this I get an array of scores. I would like to know how the scores on the validation set (as returned in the array above) differ from those on the training set, to understand whether my model is over-fitting or under-fitting.
Is there a way of doing this with the cross_val_score function?
You can use cross_validate instead of cross_val_score.
According to the doc, the cross_validate function differs from cross_val_score in two ways:
- It allows specifying multiple metrics for evaluation.
- It returns a dict containing training scores, fit-times and score-times in addition to the test score.
Note that the training scores are included only if you pass return_train_score=True.
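A minimal sketch of that comparison follows. The data from make_regression is an assumption standing in for the question's X and y_reg; a large gap between the mean training and validation scores would hint at over-fitting.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_validate

# Stand-in data (assumption) for the question's X and y_reg.
X, y_reg = make_regression(n_samples=200, n_features=10, noise=10.0,
                           random_state=0)

# return_train_score=True adds the per-fold training scores to the result.
results = cross_validate(LinearRegression(), X, y_reg, cv=5,
                         return_train_score=True)

train_scores = results["train_score"]  # R^2 on each training split
test_scores = results["test_score"]    # R^2 on each held-out split
gap = train_scores.mean() - test_scores.mean()

print("train:", train_scores)
print("validation:", test_scores)
print("mean gap:", gap)
```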
Why would you want that? cross_val_score with cv=5 already does this for you: it splits your training data into 5 folds and reports the score (R² for a regressor, not accuracy) on each of the 5 held-out subsets. This already serves as a way to detect whether your model is over-fitting.
Anyway, if you are eager to check the score on separate validation data, then you have to fit your LinearRegression on X and y_reg first.
