How to use my own classifier in an ensemble in Python

The main aim is to add a deep learning classifier, such as a CNN, as a member of an ensemble in Python.
The following code works fine:
from sklearn.ensemble import VotingClassifier

clf1 = CNN()
eclf1 = VotingClassifier(estimators=[('lr', clf1)], voting='soft')
eclf1 = eclf1.fit(XTrain, YTrain)
But when I run eclf1=eclf1.predict(XTest), I get the error:
'NoneType' object has no attribute 'predict_proba'
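For reference, the CNN class has a fit method for training and the following predict_proba method:
def predict_proba(self, XTest):
    # prediction=np.mean(np.argmax(teY, axis=1) == predict(teX))
    teX = XTest.reshape(len(XTest), 3, 112, 112)  # (samples, channels, height, width)
    p = predict(teX)  # predicted class index for each sample
    # one-hot encode the hard predictions: one column per class index up to p.max()
    i = np.zeros((p.shape[0], p.max() + 1))
    for x, y in enumerate(p):
        i[x, y] = 1
    return i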

Can you elaborate on what you did and which error you ran into?
From your question alone, I can only assume that you tried to call predict_proba after the line eclf1=eclf1.predict(XTest). That will of course raise an error: eclf1.predict(XTest) returns an array, so the assignment replaces the fitted ensemble with that array, which has no predict_proba() method.
Try keeping the fitted ensemble and its outputs in separate variables:
pred_results=eclf1.predict(XTest)
pred_result_probs = eclf1.predict_proba(XTest)
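The answer above addresses the reassignment. Another likely cause of this exact message, assuming an older scikit-learn release in which VotingClassifier kept the return value of each estimator's fit, is a custom fit that returns None instead of self. Below is a minimal sketch of the conventions a home-made classifier should follow to work with soft voting: inherit from BaseEstimator (so clone() can copy it), return self from fit, set classes_, and return one probability column per class. CentroidSoftClassifier is a hypothetical stand-in for the CNN, and the synthetic blob data replaces XTrain/XTest.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.datasets import make_blobs
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression


class CentroidSoftClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical stand-in for the CNN: any model following these conventions."""

    def __init__(self, temperature=1.0):
        # Store constructor arguments unchanged so sklearn's clone() can copy them.
        self.temperature = temperature

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)  # fixes the column order of predict_proba
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        # sklearn convention: return self (older VotingClassifier versions stored
        # fit's return value, so returning None put None into the ensemble).
        return self

    def predict_proba(self, X):
        X = np.asarray(X, dtype=float)
        # Distance from every sample to every class centroid: (n_samples, n_classes).
        dist = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        logits = -dist / self.temperature
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        proba = np.exp(logits)
        return proba / proba.sum(axis=1, keepdims=True)  # one column per class

    def predict(self, X):
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]


X, y = make_blobs(n_samples=300, centers=3, random_state=0)
X_train, X_test, y_train, y_test = X[:200], X[200:], y[:200], y[200:]

eclf = VotingClassifier(
    estimators=[("cnn", CentroidSoftClassifier()), ("lr", LogisticRegression(max_iter=1000))],
    voting="soft",
)
eclf.fit(X_train, y_train)           # the fitted ensemble keeps its own name
pred_results = eclf.predict(X_test)  # outputs go into separate variables
pred_result_probs = eclf.predict_proba(X_test)
```

In the question's predict_proba, each row is a one-hot vector built from hard predictions, so under soft voting the CNN effectively casts a hard vote; returning the network's softmax outputs would let its confidence count. Sizing the matrix with p.max()+1 is also fragile: if a test batch never contains the highest class index, the CNN's matrix has fewer columns than the other estimators' and the averaging fails.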

Related

Using Sequential Feature Selection for unsupervised learning

I am having problems using sklearn's SequentialFeatureSelector for feature selection prior to clustering.
Please see my code:
Shooting_ST_array = np.array(Shooting_ST)
kmeans1 = KMeans(n_clusters=4)
kmeans1 = kmeans1.fit(Shooting_ST_array)
sfs = SequentialFeatureSelector(estimator=kmeans1, n_features_to_select=5, direction='backward')
sfs.fit(Shooting_ST_array, y=None)
sfs.get_feature_names_out(Shooting_ST.columns)
Depending on whether I used forward or backward selection, the algorithm returns the first 5 or last 5 columns.
I am also getting the following attribute error:
AttributeError: 'NoneType' object has no attribute 'split'
Does anyone have an idea what I am doing wrong? I have unfortunately not found any example implementations of the method for unlabelled data.

MLPClassifier in BaggingClassifier

Is there any way I can use partial_fit() with a BaggingClassifier() that contains multiple MLPClassifier() estimators?
My problem is binary classification, something like this:
clf = MLPClassifier()
model = BaggingClassifier(base_estimator=clf)
model.partial_fit(x, y, classes=[0, 1])
It keeps giving me this error:
AttributeError: 'BaggingClassifier' object has no attribute 'partial_fit'
It seems it is not possible. The scikit-learn documentation gives the following list of estimators that support partial_fit:
sklearn.naive_bayes.MultinomialNB
sklearn.naive_bayes.BernoulliNB
sklearn.linear_model.Perceptron
sklearn.linear_model.SGDClassifier
sklearn.linear_model.PassiveAggressiveClassifier
sklearn.linear_model.SGDRegressor
sklearn.linear_model.PassiveAggressiveRegressor
sklearn.cluster.MiniBatchKMeans
sklearn.decomposition.MiniBatchDictionaryLearning
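BaggingClassifier itself has no partial_fit, but MLPClassifier does implement partial_fit (with its default 'adam' solver or 'sgd', not 'lbfgs'). One workaround is to do the bagging by hand. The class IncrementalBaggingMLP below is hypothetical, not part of scikit-learn, and resampling each incoming batch independently is only an approximation of bootstrap bagging over the full dataset. (Recent scikit-learn versions also renamed BaggingClassifier's base_estimator parameter to estimator.)

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier


class IncrementalBaggingMLP:
    """Hypothetical helper: a hand-rolled bagging ensemble of MLPClassifier members,
    each updated with partial_fit on its own bootstrap resample of every batch."""

    def __init__(self, n_estimators=5, random_state=0):
        self.rng = np.random.default_rng(random_state)
        self.members = [
            MLPClassifier(hidden_layer_sizes=(16,), random_state=random_state + i)
            for i in range(n_estimators)
        ]

    def partial_fit(self, X, y, classes):
        n = len(X)
        for mlp in self.members:
            idx = self.rng.integers(0, n, size=n)  # bootstrap: rows drawn with replacement
            mlp.partial_fit(X[idx], y[idx], classes=classes)
        return self

    def predict_proba(self, X):
        # Soft voting: average the members' class probabilities.
        return np.mean([mlp.predict_proba(X) for mlp in self.members], axis=0)

    def predict(self, X):
        return self.members[0].classes_[np.argmax(self.predict_proba(X), axis=1)]


X, y = make_classification(n_samples=600, n_features=10, random_state=0)
model = IncrementalBaggingMLP(n_estimators=5)
for start in range(0, 600, 100):  # stream the data in batches of 100 rows
    model.partial_fit(X[start:start + 100], y[start:start + 100], classes=[0, 1])
proba = model.predict_proba(X[:5])
```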

What is the need to return a function object while creating a data set using tensorflow

I am new to machine learning and I am trying to create a machine learning model with the TensorFlow API, following the tutorial in the TensorFlow documentation from here.
But I am having trouble understanding this part of the code:
import tensorflow as tf

def make_input_fn(data_df, label_df, num_epochs=10, shuffle=True, batch_size=32):
    def input_function():  # inner function, this will be returned
        ds = tf.data.Dataset.from_tensor_slices((dict(data_df), label_df))  # create tf.data.Dataset object with data and its label
        if shuffle:
            ds = ds.shuffle(1000)  # randomize order of data
        ds = ds.batch(batch_size).repeat(num_epochs)  # split dataset into batches of batch_size (32 by default) and repeat for num_epochs epochs
        return ds  # return the batched, repeated dataset
    return input_function  # return a function object for use
Then the output of the function is stored in a variable:
train_input_fn = make_input_fn(dftrain, y_train)
And finally the model is trained with it:
linear_est.train(train_input_fn)
I don't understand what we gain by returning the inner function from make_input_fn instead of simply returning our dataset and passing it to train the model.
I am a beginner in Python and just started to learn Machine Learning and I am unable to find a proper answer to my question so if anyone can kindly explain it in a beginner friendly way I would be very much obliged.
In Python this pattern is a closure, or function factory, and it is closely related to currying (transforming a multiple-argument function into a chain of single-argument functions). make_input_fn captures data_df, label_df and the other arguments, and returns a zero-argument function that builds the dataset only when it is called.
In TensorFlow, according to the documentation (https://www.tensorflow.org/api_docs/python/tf/estimator/LinearClassifier#train), the signature is:
train(
input_fn, hooks=None, steps=None, max_steps=None, saving_listeners=None
)
The estimator's train method expects an input_fn argument. The reason is that every time you call Estimator.train(), it creates a new graph by invoking both input_fn and model_fn and connecting them together. If you pass a tensor or a dataset directly instead of a function, you will get errors.
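The difference between a closure, partial application and currying can be seen in plain Python, without TensorFlow. In this sketch build_batches, make_input_fn (mirroring the tutorial's name), train and curried_build are all illustrative; train only mimics the one relevant behaviour of Estimator.train, namely that it receives a function and calls it itself.

```python
from functools import partial


def build_batches(data, batch_size):
    """Split a list into consecutive batches (stand-in for building a tf.data.Dataset)."""
    return [data[i:i + batch_size] for i in range(0, len(data), batch_size)]


def make_input_fn(data, batch_size=2):
    """Factory returning a closure: input_function remembers data and batch_size."""
    def input_function():
        # Nothing is built until the caller invokes input_function().
        return build_batches(data, batch_size)
    return input_function


def train(input_fn):
    """Hypothetical stand-in for Estimator.train: it calls input_fn itself."""
    batches = input_fn()  # called inside train, so the pipeline is rebuilt on every call
    return sum(len(batch) for batch in batches)


def curried_build(data):
    """Curried form: one argument per call, curried_build(data)(n) == build_batches(data, n)."""
    return lambda batch_size: build_batches(data, batch_size)


train_input_fn = make_input_fn([1, 2, 3, 4, 5])
print(train(train_input_fn))                        # 5
print(train(partial(build_batches, [1, 2, 3], 2)))  # 3
print(curried_build([1, 2, 3, 4])(3))               # [[1, 2, 3], [4]]
```

functools.partial gives the same zero-argument callable without writing an inner function; either way, the point is that train decides when the data pipeline is constructed.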

PySAL OLS Model: AttributeError: 'OLS' object has no attribute 'predict'

I have divided my data into training and validation samples and have successfully fit my model with three types of linear models. What I cannot figure out is how to apply the model to the validation sample to evaluate the fit. When I attempt to apply the model to the holdout sample (sorry, I know that this isn't a reproducible example, but I think the issue is pretty clear; I'm just putting this snippet here for completeness. Please be gentle!):
valid = validation.loc[:, x + [ "sale_amt"]]
holdout1 = m1.predict(valid)
I get the following error message:
AttributeError Traceback (most recent call last)
in ()
8
9 valid = validation.loc[:, x + [ "sale_amt"]]
---> 10 holdout1 = m1.predict(valid)
AttributeError: 'OLS' object has no attribute 'predict'
Other Python OLS regression packages have a 'predict' method, but PySAL doesn't seem to. I realize that the fitted coefficients (betas) are available and will pursue applying them to my validation data directly, but I was hoping that there is a simple answer that I just missed.
I apologize if it is bad form to answer my own question, but I did come up with a solution. I contacted Daniel Arribas-Bel, one of the PySAL developers, and he helped guide me to the result I was seeking. Note that my PySAL OLS object is named m1, and my validation dataframe is called 'validation':
import pysal as ps

m1 = ps.model.spreg.OLS(...)
m1.intercept = m1.betas[0, 0]  # betas is a k x 1 array; the first row is the constant
m1.coefficients = m1.betas[1:].ravel()  # the remaining rows are the slopes, in the order of the x columns
validation['predicted_price'] = m1.intercept + validation.loc[:, x].dot(m1.coefficients)
Note that this is the approach I would use for a non-spatial model; I adapted it for the KNN model I built in PySAL, so it might not be fully correct for a spatial model. Caveat emptor.
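The prediction step above is plain linear algebra: the intercept plus the dot product of the validation columns with the slope coefficients. A minimal numpy sketch, assuming a k x 1 betas array whose first row is the constant (which is how spreg's OLS stores betas), with synthetic noiseless data so the fit is exact; the helper name predict_from_betas is hypothetical. It also computes a holdout RMSE, one simple way to evaluate the fit on the validation sample.

```python
import numpy as np


def predict_from_betas(betas, X_new):
    """Apply a k x 1 coefficient vector (intercept first) to rows X_new of shape (n, k-1)."""
    betas = np.asarray(betas).ravel()
    return betas[0] + X_new @ betas[1:]


rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 2))
y_train = 3.0 + X_train @ np.array([1.5, -2.0])  # noiseless, so OLS recovers the betas

# Ordinary least squares with an explicit constant column; betas comes back as 3 x 1.
design = np.column_stack([np.ones(len(X_train)), X_train])
betas, *_ = np.linalg.lstsq(design, y_train.reshape(-1, 1), rcond=None)

X_valid = np.array([[0.0, 0.0], [1.0, 1.0]])
y_valid = 3.0 + X_valid @ np.array([1.5, -2.0])
pred = predict_from_betas(betas, X_valid)
rmse = np.sqrt(np.mean((pred - y_valid) ** 2))  # holdout error
print(pred)  # approximately [3.0, 2.5]
```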

Calling the goodness-of-fit value with result.prsquared() in statsmodels results in TypeError: 'numpy.float64' object is not callable

I am a complete beginner, and I'm currently working through this tutorial on logit regression models in Python 3.4, with statsmodels 0.6.1 and PyCharm Community 4.5.1:
http://blog.yhathq.com/posts/logistic-regression-and-python.html
It runs smoothly. I am adding a few lines of my own to try things out.
After the part where I fit the data:
import statsmodels.api as sm

train_cols = data.columns[1:]
logit = sm.Logit(data['admit'], data[train_cols])  # Logit (capitalized) is the model class in statsmodels.api
result = logit.fit()
and I print out the summary
print(result.summary())
I tried a small detour from the tutorial to print only the goodness-of-fit measure (in this case, a pseudo R-squared value). From the documentation I took it to be a method of the result object (like summary), so I expected this to work:
print(result.prsquared())
However, running this code results in a TypeError on a line containing only print(result.prsquared()):
TypeError: 'numpy.float64' object is not callable
It really bugs me, because if I wanted to compare several models, pseudo R-squared would be my first choice.
prsquared is an attribute, not a function. Try:
print(result.prsquared)
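The traceback makes sense once you see that accessing result.prsquared already evaluates to a number, so the trailing () tries to call that number. statsmodels exposes prsquared as a cached, read-only attribute; the pure-Python sketch below mimics that with functools.cached_property. FakeLogitResults is hypothetical (not the statsmodels class) and the log-likelihood values are made up; the formula is McFadden's pseudo R-squared, 1 - llf/llnull, which is how statsmodels documents prsquared. Comparing models this way is only meaningful when they are fit to the same data.

```python
from functools import cached_property


class FakeLogitResults:
    """Hypothetical stand-in for a statsmodels results object (not the real class)."""

    def __init__(self, llf, llnull):
        self.llf = llf        # log-likelihood of the fitted model
        self.llnull = llnull  # log-likelihood of the intercept-only model

    @cached_property
    def prsquared(self):
        # McFadden's pseudo R-squared: 1 - llf / llnull
        return 1 - self.llf / self.llnull


result = FakeLogitResults(llf=-200.0, llnull=-250.0)
print(result.prsquared)  # attribute access: computed once, then cached

error_message = None
try:
    result.prsquared()   # the mistake from the question: calling the float itself
except TypeError as exc:
    error_message = str(exc)
print(error_message)     # 'float' object is not callable

models = {"m1": result, "m2": FakeLogitResults(llf=-180.0, llnull=-250.0)}
best = max(models, key=lambda name: models[name].prsquared)
print(best)  # m2
```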
