Using Sequential Feature Selection for unsupervised learning - python

I am having problems using sklearn's SequentialFeatureSelector for feature selection prior to clustering.
Please see my code:
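import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_selection import SequentialFeatureSelector

# Shooting_ST is a pandas DataFrame of features (not shown here)
Shooting_ST_array = np.array(Shooting_ST)
kmeans1 = KMeans(n_clusters=4)
kmeans1 = kmeans1.fit(Shooting_ST_array)
sfs = SequentialFeatureSelector(estimator=kmeans1, n_features_to_select=5, direction='backward')
sfs.fit(Shooting_ST_array, y=None)
sfs.get_feature_names_out(Shooting_ST.columns)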
Depending on whether I used forward or backward selection, the algorithm returns the first 5 or last 5 columns.
I am also getting the following attribute error:
AttributeError: 'NoneType' object has no attribute 'split'
Does anyone have an idea what I am doing wrong? I have unfortunately not found any example implementations of the method for unlabelled data.

Related

AttributeError: 'numpy.ndarray' object has no attribute 'predict'

I am trying to predict a numerical Y value from string inputs (e.g. business type, department and region). After running this:
print(model.predict([['Finance and Control'], ['EMEA'], ['Professional Services']]))
it returned this error: AttributeError: 'numpy.ndarray' object has no attribute 'predict'
import pickle
model = pickle.load(open('model3.pkl','rb'))
print(model.predict([['Finance and Control'], ['EMEA'], ['Professional Services']]))
I ran into a similar issue as this, though without additional context, I'm not sure our errors derive from the same problem. However, the resolution I arrived at may help someone in the future as they go down the SO rabbit hole.
Using sklearn-0.20,
import joblib
model = joblib.load('model.pkl')
model.predict(previously_loaded_data)
Resulted in
AttributeError: 'numpy.ndarray' object has no attribute 'predict'
However, the following allowed me to load the actual model and use its predict method:
from sklearn.externals import joblib
model = joblib.load('model.pkl')
model.predict(previously_loaded_data)
sklearn.externals.joblib was deprecated in sklearn-0.21 and removed in sklearn-0.23, but my use case required sklearn-0.20.
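Another possible cause of this error, which the question does not confirm, is that the pickle file holds an array of predictions rather than the fitted model. The sketch below uses only the standard library to show a type check at load time. `MeanModel` and `load_model` are hypothetical names made up for illustration, and the in-memory buffers stand in for the .pkl files.

```python
import io
import pickle


class MeanModel:
    """Hypothetical stand-in for a fitted estimator with a predict method."""

    def __init__(self, mean):
        self.mean = mean

    def predict(self, rows):
        # Return the stored mean once per input row
        return [self.mean for _ in rows]


def load_model(fileobj):
    """Unpickle an object and fail early if it cannot predict."""
    obj = pickle.load(fileobj)
    if not hasattr(obj, "predict"):
        raise TypeError(f"expected a fitted model, got {type(obj).__name__}")
    return obj


model = MeanModel(2.5)

# Correct: the model object itself was pickled
good = io.BytesIO(pickle.dumps(model))
# Mistake: the output of predict() was pickled instead of the model
bad = io.BytesIO(pickle.dumps(model.predict([[1], [2]])))

loaded = load_model(good)
preds = loaded.predict([[0], [1], [2]])

try:
    load_model(bad)
    error_message = None
except TypeError as exc:
    error_message = str(exc)
```

Printing type(model) right after loading gives the same information without a helper function.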

PySAL OLS Model: AttributeError: 'OLS' object has no attribute 'predict'

I have divided my data into training and validation samples and have successfully fit three types of linear models. What I cannot figure out is how to apply a fitted model to the validation sample to evaluate the fit. When I attempt to apply the model to the holdout sample (sorry, I know that this isn't a reproducible example, but I think the issue is pretty clear; I'm just putting this snippet here for completeness. Please be gentle!):
valid = validation.loc[:, x + [ "sale_amt"]]
holdout1 = m1.predict(valid)
I get the following error message:
AttributeError Traceback (most recent call last)
in ()
8
9 valid = validation.loc[:, x + [ "sale_amt"]]
---> 10 holdout1 = m1.predict(valid)
AttributeError: 'OLS' object has no attribute 'predict'
Other Python OLS regression packages have a 'predict' method, but it doesn't seem that PySAL does. I realize that the function coefficients (betas) are available and will pursue applying them to my validation data directly, but I was hoping that there is a simple answer that I just missed.
I apologize if it is bad form to answer my own question, but I did come up with a solution. I contacted Daniel Arribas-Bel, one of the PySAL developers, and he helped guide me to the result I was seeking. Note that my PySAL OLS object is named m1, and my validation dataframe is called 'validation':
m1 = ps.model.spreg.OLS(...)
m1.intercept = m1.betas[0] # Get the intercept from the betas array
m1.coefficients = m1.betas[1:len(m1.betas)] # Get the coefficients from the betas array
validation['predicted_price'] = m1.intercept + validation.loc[:, x].dot( m1.coefficients)
Note that this is the method I would use for a non-spatial model adapted for the KNN model I built in PySAL and might not be technically fully correct for a spatial model. Caveat emptor.
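The manual prediction above can be reproduced with plain NumPy. This is a minimal sketch that assumes the betas array is a column vector of shape (k+1, 1) with the intercept first, which is how the answer indexes m1.betas. The least-squares fit below is only a stand-in for the PySAL model, the data is synthetic, and no spatial terms are involved.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, noise-free training data: y = 3 + 2*x1 - 1*x2
X_train = rng.normal(size=(50, 2))
y_train = 3.0 + X_train @ np.array([2.0, -1.0])

# Stand-in for m1.betas: column vector of shape (k+1, 1), intercept first
design = np.column_stack([np.ones(len(X_train)), X_train])
betas = np.linalg.lstsq(design, y_train, rcond=None)[0].reshape(-1, 1)

intercept = betas[0, 0]       # scalar intercept
coefficients = betas[1:, 0]   # one slope per feature, flattened to 1-D

# Apply the fitted coefficients to held-out rows
X_valid = rng.normal(size=(5, 2))
predicted = intercept + X_valid @ coefficients
```

Flattening the coefficients to one dimension makes the result a 1-D array with one prediction per validation row.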

Semi-supervised learning for regression by scikit-learn

Can Label Propagation be used for semi-supervised regression tasks in scikit-learn?
According to its API, the answer is YES.
http://scikit-learn.org/stable/modules/label_propagation.html
However, I got an error message when I tried to run the following code.
from sklearn import datasets
from sklearn.semi_supervised import label_propagation
import numpy as np
rng=np.random.RandomState(0)
boston = datasets.load_boston()
X=boston.data
y=boston.target
y_30=np.copy(y)
y_30[rng.rand(len(y))<0.3]=-999
label_propagation.LabelSpreading().fit(X,y_30)
It raises "ValueError: Unknown label type: 'continuous'" on the label_propagation.LabelSpreading().fit(X,y_30) line.
How should I solve the problem? Thanks a lot.
It looks like an error in the documentation; the code itself is clearly classification-only (beginning of the fit method of the BasePropagation class):
check_classification_targets(y)
# actual graph construction (implementations should override this)
graph_matrix = self._build_graph()
# label construction
# construct a categorical distribution for classification only
classes = np.unique(y)
classes = (classes[classes != -1])
In theory you could remove the "check_classification_targets" call and use "regression like method", but it will not be the true regression since you will never "propagate" any value which is not encountered in the training set, you will simply treat the regression value as the class identifier. And you will be unable to use value "-1" since it is a codename for "unlabeled"...
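One workaround that stays within the classification-only API is to bin the continuous target into a few classes and mark unlabeled rows with -1. The sketch below assumes synthetic data and quartile bins, both chosen only for illustration. As noted above, the propagated values are bin identifiers and never new continuous values.

```python
import numpy as np
from sklearn.semi_supervised import LabelSpreading

rng = np.random.RandomState(0)

# Synthetic features and a continuous target (illustrative only)
X = rng.rand(200, 3)
y = 10 * X[:, 0] + rng.normal(scale=0.5, size=200)

# Turn the continuous target into 4 classes (0..3) using quartile edges
edges = np.quantile(y, [0.25, 0.5, 0.75])
y_binned = np.digitize(y, edges)

# Hide roughly 30% of the labels; -1 is the marker for "unlabeled"
y_partial = y_binned.copy()
unlabeled = rng.rand(len(y)) < 0.3
y_partial[unlabeled] = -1

model = LabelSpreading(kernel="knn", n_neighbors=7)
model.fit(X, y_partial)

# transduction_ holds a class for every row, including the unlabeled ones
filled = model.transduction_
```

Mapping each bin back to a number, for example its midpoint, gives only a coarse estimate, so this is not a substitute for a real regression model.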

How to use my own classifier in ensemble python

The main aim is to add a deep learning classifier, such as a CNN, as an individual estimator in an ensemble in Python.
The following code works fine:
clf1=CNN()
eclf1=VotingClassifier(estimators=[('lr', clf1)], voting='soft')
eclf1=eclf1.fit(XTrain,YTrain)
But the error
'NoneType' object has no attribute 'predict_proba'
comes up when running eclf1=eclf1.predict(XTest).
For reference, the CNN class has a fit function for training and the following function:
def predict_proba(self,XTest):
    #prediction=np.mean(np.argmax(teY, axis=1) == predict(teX))
    teX=XTest.reshape(len(XTest),3,112,112)
    p=predict(teX)
    i = np.zeros((p.shape[0],p.max()+1))
    for x,y in enumerate(p):
        i[x,y] = 1
    return i
Can you elaborate on what you did and which error you came across?
From your question alone, I can only assume you tried to call 'predict_proba' after the line eclf1=eclf1.predict(XTest). Of course this will raise an error, because eclf1.predict(XTest) returns an array, which doesn't have a predict_proba() method.
Try just changing it to:
pred_results=eclf1.predict(XTest)
pred_result_probs = eclf1.predict_proba(XTest)
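A minimal sketch of a custom classifier that works inside VotingClassifier is shown below. It follows the scikit-learn convention that fit returns self. Code that stores the return value of fit would end up holding None if fit returned nothing, which would match the 'NoneType' message in the question; that is an assumption, because the CNN's fit method is not shown. `NearestMeanClassifier` is a hypothetical stand-in for the CNN wrapper and the data is synthetic.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.ensemble import VotingClassifier


class NearestMeanClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical stand-in for the CNN wrapper: a nearest-class-mean model."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self  # scikit-learn convention: fit returns the estimator

    def predict_proba(self, X):
        X = np.asarray(X, dtype=float)
        # Squared distance from every sample to every class mean
        d = ((X[:, None, :] - self.means_[None, :, :]) ** 2).sum(axis=2)
        # Shift by the row minimum so the exponentials cannot all underflow
        w = np.exp(-(d - d.min(axis=1, keepdims=True)))
        return w / w.sum(axis=1, keepdims=True)

    def predict(self, X):
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]


rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(5, 1, (20, 2))])
y_train = np.array([0] * 20 + [1] * 20)
X_test = np.array([[0.0, 0.0], [0.5, -0.5], [5.0, 5.0], [4.5, 5.5]])

eclf = VotingClassifier(estimators=[("nm", NearestMeanClassifier())], voting="soft")
eclf.fit(X_train, y_train)

# Keep the ensemble and its outputs in separate variables
pred = eclf.predict(X_test)
proba = eclf.predict_proba(X_test)
```

Inheriting from BaseEstimator provides get_params and set_params, which VotingClassifier needs in order to clone the estimator before fitting it.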

Calling goodness of fit value with result.prsquared() in statsmodels results TypeError: 'numpy.float64' object is not callable

I am a complete beginner, and I'm currently doing this tutorial about logit regression models in python 3.4, with statsmodels 0.6.1 and Pycharm community version 4.5.1:
http://blog.yhathq.com/posts/logistic-regression-and-python.html
It runs smoothly. I try to add my own lines, to try out a few things.
After the part when I fit the data
train_cols = data.columns[1:]
logit = sm.Logit(data['admit'], data[train_cols])
result = logit.fit()
and I print out the summary
print(result.summary())
I tried to take a little detour from the tutorial, to print only the Goodness of Fit measurement (in this case, it is a pseudo R-squared value). According to the documentation it is a method of result object (same as summary), so it should work like this:
print(result.prsquared())
However, running this code results in a TypeError on a line containing only print(result.prsquared()):
TypeError: 'numpy.float64' object is not callable
It really bugs me, because if I were to compare several models, pseudo R-squared would be my first choice for doing it.
prsquared is an attribute, not a function. Try:
print(result.prsquared)
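The difference between an attribute and a method can be seen in a small stand-alone sketch. `FitResult` is a hypothetical class, not the statsmodels result object, and the log-likelihood numbers are arbitrary. It exposes prsquared as a computed attribute using McFadden's formula, 1 - llf / llnull, while summary remains a method.

```python
class FitResult:
    """Hypothetical stand-in for a fitted-results object."""

    def __init__(self, llf, llnull):
        self.llf = llf        # log-likelihood of the fitted model
        self.llnull = llnull  # log-likelihood of the intercept-only model

    @property
    def prsquared(self):
        # McFadden's pseudo R-squared, read like a plain attribute
        return 1.0 - self.llf / self.llnull

    def summary(self):
        # A real method: this one must be called with parentheses
        return f"pseudo R-squared: {self.prsquared:.3f}"


result = FitResult(llf=-229.26, llnull=-249.99)  # arbitrary illustrative values

value = result.prsquared  # attribute access, no parentheses

try:
    result.prsquared()    # calls the float that the attribute returned
    message = None
except TypeError as exc:
    message = str(exc)
```

A quick way to tell the two apart on any object is callable(result.summary) versus callable(result.prsquared).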
