The project I am currently working on uses the sklearn svm.SVC class, and at one point in the code I instantiate the following:
self.classifier = OneVsRestClassifier(SVC(kernel='linear', probability=True))
After fitting the classifier, I then try to inspect its support_vectors_ or support_ attribute. However, I get the following error:
'SVC' object has no attribute 'support_vectors_'
I tried changing the kernel to 'poly' or 'rbf', but this does not fix the error. Why is this happening? Shouldn't any linear SVM have something (e.g. None at the very least) for this attribute? I am using sklearn version 0.15.1, if that helps.
Thanks!
Assuming you obtained the error message by trying to evaluate
self.classifier.estimator.support_vectors_
note that OneVsRestClassifier clones your estimator once per class and fits one copy to each binary (one-vs-rest) problem. The fitted copies can be found in the estimators_ attribute of the OneVsRestClassifier. Try
self.classifier.estimators_[0].support_vectors_
That will give you the support vectors for the first OVR problem.
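For example, here is a minimal self-contained sketch on toy data (iris, not the asker's data) that fits the same kind of classifier and reads the support vectors of every per-class estimator:

from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
clf = OneVsRestClassifier(SVC(kernel='linear', probability=True)).fit(X, y)

# estimators_ holds one fitted SVC per class (one per one-vs-rest problem)
for i, est in enumerate(clf.estimators_):
    print(i, est.support_vectors_.shape)  # support vectors of the i-th binary SVC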
I am using a LinearSVC. I pre-processed the numeric and categorical data with a ColumnTransformer and then wrapped everything in a Pipeline. I used GridSearchCV to get the best parameters for the model, which I later put into the pipeline, as you can see below.
I fit and tested the model and got the score as well, but I want to know the most important feature coefficients.
So far, I have tried clf.coef_, since the classifier step is named clf in the pipeline, but I get a message saying clf is not defined.
I also tried gridf.coef_ and pipefinal.steps[1].coef_, but nothing worked.
Any help in this regard would be highly appreciated. Thanks.
preprocessing = ColumnTransformer([('hot', OneHotEncoder(), categ), ('scale', StandardScaler(), num)], n_jobs=-1)
pipefinal = Pipeline([('pre', preprocessing), ('clf', LinearSVC(max_iter=100000, C=0.1))])
gridf = GridSearchCV(pipefinal, param_grid={}, cv=10)
gridf.fit(X_train, y_train)
gridf.score(X_val, y_val)
GridSearchCV will make the best estimator available through its best_estimator_ attribute after you have called the fit() method. Since your estimator is a Pipeline object, you have to further subscript it to access the classifier. Then, you can access its coef_ attribute. In your case, that would be:
gridf.best_estimator_['clf'].coef_
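If you also want to match each coefficient to an input feature, here is a hedged sketch. It assumes a recent scikit-learn (>= 1.0), where ColumnTransformer exposes get_feature_names_out, and a binary target, so coef_ has a single row:

# Pair each coefficient with the name of the transformed feature it belongs to.
best_pipe = gridf.best_estimator_
feature_names = best_pipe['pre'].get_feature_names_out()
coefs = best_pipe['clf'].coef_.ravel()  # binary case: one row of coefficients

# Sort by absolute magnitude to see the most influential features first.
for name, coef in sorted(zip(feature_names, coefs), key=lambda t: abs(t[1]), reverse=True):
    print(name, coef)

For a multi-class target, coef_ has one row per class, so you would inspect each row separately instead of calling ravel().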
In the current version of TF (2.2.0) there is an option to do multi-class classification (i.e., more than two classes, by changing n_classes to the relevant number in the estimator params).
However, all of the examples I have seen, for example the official one here:
https://www.tensorflow.org/tutorials/estimator/boosted_trees_model_understanding
present binary classification, so I'm not sure what to do with the target (class) vector.
If I keep it in the range [0, ..., num_classes-1], then when I try to train the model I get the following error (it comes from TF's gradients.py file):
"'int' object has no attribute 'is_compatible_with'". It feels like a dimension/shape error with respect to the class vector, but I
couldn't find the default loss function or what this model expects to receive. I don't think I need to convert the class vector to a binary matrix (one-hot encoding). I'd appreciate any help!
Indeed, when I changed the TF code manually, everything worked.
Then I found out that there is a bug report on the issue here:
https://github.com/tensorflow/tensorflow/issues/40063
I am new to Python and machine learning. I have searched the internet regarding my question and tried the solutions people have suggested, but I still don't get it. I would really appreciate it if anyone could help me out.
I am working on my first XGboost model. I have tuned the parameters by using xgb.XGBClassifier, and then would like to enforce monotonicity on model variables. Seemingly I have to use xgb.train() to enforce monotonicity as shown in my code below.
The booster returned by xgb.train() has a predict() method, but NOT a predict_proba() method. So how can I get probabilities from xgb.train()?
I have tried using 'objective':'multi:softprob' instead of 'objective':'binary:logistic' and then score = bst_constr.predict(dtrain), but the score does not seem right to me.
Thank you so much.
params_constr = {
    'base_score': 0.5,
    'learning_rate': 0.1,
    'max_depth': 5,
    'min_child_weight': 100,
    'n_estimators': 200,
    'nthread': -1,
    'objective': 'binary:logistic',
    'seed': 2018,
    'eval_metric': 'auc'
}
params_constr['monotone_constraints'] = "(1,1,0,1,-1,-1,0,0,1,-1,1,0,1,0,-1,0,0,0,0,0,0,0,0,0,0,0,0,0,)"
dtrain = xgb.DMatrix(X_train, label = y_train)
bst_constr = xgb.train(params_constr, dtrain)
X_test['score']=bst_constr.predict_proba(X_test)[:,1]
AttributeError: 'Booster' object has no attribute 'predict_proba'
Based on my understanding, you are trying to obtain the probability for each class in the prediction phase. There are two options.
It seems that you are using the XGBoost native API. In that case, just set 'objective':'multi:softprob' as the parameter and use bst_constr.predict instead of bst_constr.predict_proba.
XGBoost also provides a scikit-learn API. With it you would initialize the model with bst_constr = xgb.XGBClassifier(**params_constr) and use bst_constr.fit() for training. Then you can call bst_constr.predict_proba to obtain what you want (see the sketch below). You can refer to the Scikit-Learn API section of the XGBoost docs for more details.
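For the second option, here is a minimal sketch on toy data (not the asker's 28-feature dataset; the 3-entry monotone_constraints string is a hypothetical placeholder with one entry per feature, in column order):

import numpy as np
import xgboost as xgb

# Toy stand-ins for X_train / y_train / X_test.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 3))
y_train = (X_train[:, 0] - X_train[:, 1] + rng.normal(size=500) > 0).astype(int)
X_test = rng.normal(size=(100, 3))

clf = xgb.XGBClassifier(
    objective='binary:logistic',
    learning_rate=0.1,
    max_depth=5,
    n_estimators=200,
    monotone_constraints="(1,-1,0)",  # +1 increasing, -1 decreasing, 0 unconstrained
)
clf.fit(X_train, y_train)

# predict_proba returns one column per class; [:, 1] is the positive class.
proba_pos = clf.predict_proba(X_test)[:, 1]
print(proba_pos[:5])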
I used logistic regression with Python and got an accuracy score of 95%. How do I get the equation so that I can actually implement it?
I wrote:
from sklearn import metrics
from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
model.fit(train_X, train_y)
prediction = model.predict(test_X)
print('Accuracy:', "\n", '%', metrics.accuracy_score(prediction, test_y) * 100)
and my output was:
Accuracy:
%95.5555555556
The model object has an attribute called coef_ where the coefficients of the model are stored. In addition, the attribute intercept_ gives the intercept of the model.
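For example, here is a minimal sketch on toy data (not the asker's) showing how coef_ and intercept_ combine into the prediction equation: the probability of class 1 is the logistic sigmoid of a linear combination of the inputs.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] - X[:, 1] + rng.normal(size=200) > 0).astype(int)

model = LogisticRegression().fit(X, y)

w = model.coef_[0]        # one weight per feature
b = model.intercept_[0]   # scalar intercept

# p(y=1 | x) = 1 / (1 + exp(-(w . x + b)))
manual = 1.0 / (1.0 + np.exp(-(X @ w + b)))
print(np.allclose(manual, model.predict_proba(X)[:, 1]))  # True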
I'm assuming you're using scikit-learn. But what do you mean by implement it? Are you looking to rewrite it in a separate language, or to use a different library (e.g. TensorFlow)?
If you just want to keep the model and use it in a python program later, you can save and load it with Pickle.
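For instance, a minimal sketch of saving and reloading the fitted model with pickle (model and test_X are the objects from the question; "model.pkl" is just an example filename):

import pickle

# Save the fitted model to disk.
with open('model.pkl', 'wb') as f:
    pickle.dump(model, f)

# Load it back later in another Python program.
with open('model.pkl', 'rb') as f:
    loaded = pickle.load(f)

prediction = loaded.predict(test_X)  # the reloaded model predicts as before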
Initially, I used sklearn to perform logistic regression. When I asked how to get the p-values for the coefficients, I was told to use statsmodels, even though I mentioned that it doesn't read in my data (and I've never used it before).
My training data is a np.array and my labels are a list. I get the following error: TypeError: cannot perform reduce with flexible type. I've never worked with this module before, so I have no clue how to use it or why it doesn't accept my data, since sklearn seems to have no problems with it. How do I make this work and get the p-values?
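A "cannot perform reduce with flexible type" error usually means the array has an object/string dtype rather than a numeric one; that is an assumption about the data here, but if it holds, casting everything to numeric NumPy arrays before fitting typically resolves it. A minimal sketch on toy data (not the asker's):

import numpy as np
import statsmodels.api as sm

# Toy stand-ins for the asker's np.array features and list of labels,
# cast explicitly to float so statsmodels gets purely numeric input.
rng = np.random.default_rng(0)
X = np.asarray(rng.normal(size=(200, 3)), dtype=float)
y = np.asarray((X[:, 0] + rng.normal(size=200) > 0), dtype=float)

X = sm.add_constant(X)            # statsmodels does not add an intercept automatically
result = sm.Logit(y, X).fit()
print(result.pvalues)             # one p-value per coefficient, intercept first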