AttributeError: 'RandomForestClassifier' object has no attribute 'fit_transform' - python

I am getting this error:
AttributeError: 'RandomForestClassifier' object has no attribute 'fit_transform'
However, the documentation I found lists a method named fit_transform(X, y) for sklearn.ensemble.RandomForestClassifier.
I don't understand why I am getting this error or how to resolve it.
Here is the code snippet:
from sklearn.ensemble import RandomForestClassifier
import pickle
import sys
import numpy as np
X1=np.array(pickle.load(open('X2g_train.p','rb')))
X2=np.array(pickle.load(open('X3g_train.p','rb')))
X3=np.array(pickle.load(open('X4g_train.p','rb')))
X4=np.array(pickle.load(open('Xhead_train.p','rb')))
X=np.hstack((X2,X1,X3,X4))
y = np.array(pickle.load(open('y.p','rb')))
rf=RandomForestClassifier(n_estimators=200)
Xr=rf.fit_transform(X,y)

RandomForestClassifier has no fit_transform method in the scikit-learn API documentation.
To train your model and get predictions, use fit and predict like this:
rf = RandomForestClassifier()
# train the model
rf.fit(X_train, y_train)
# get predictions
predictions = rf.predict(X_test)
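If the goal of the original fit_transform call was a reduced feature matrix rather than predictions, one possible route is SelectFromModel. As far as I know, older scikit-learn releases exposed fit_transform on forest estimators as a feature-selection shortcut, and it was later removed in favour of this wrapper. The following is only a sketch under that assumption; the synthetic X and y are made up for illustration and stand in for the pickled arrays from the question.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic stand-in data (assumption): 200 samples, 10 features,
# where only the first two features determine the label.
rng = np.random.RandomState(0)
X = rng.rand(200, 10)
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)

# Wrap the forest; SelectFromModel fits it and keeps the features whose
# importance is at or above the threshold (the mean importance by default).
selector = SelectFromModel(RandomForestClassifier(n_estimators=50, random_state=0))
Xr = selector.fit_transform(X, y)

print(X.shape, Xr.shape)
```

The fitted forest is still available afterwards as selector.estimator_, so it can be used for predictions on the full feature matrix as well.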

Related

AttributeError: 'ColumnTransformer' object has no attribute '_name_to_fitted_passthrough'

I am predicting IPL match win probability. While deploying the model using Streamlit, it shows this error:
AttributeError: 'ColumnTransformer' object has no attribute '_name_to_fitted_passthrough'
This is my code:
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
trf = ColumnTransformer([('trf', OneHotEncoder(sparse=False, drop='first'),
                          ['batting_team', 'bowling_team', 'city'])], remainder='passthrough')
# Pipeline code
pipe = Pipeline(steps=[
    ('step1', trf),
    ('step2', LogisticRegression(solver='liblinear'))])

kds library gives AttributeError: module 'kds' has no attribute 'metrics'

Hi, I am trying to use the kds package from PyPI.
I installed it with: pip install kds
I didn't have any installation problems. But when I ran the following example script:
# REPRODUCIBLE EXAMPLE
# Load Dataset and train-test split
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn import tree
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33,random_state=3)
clf = tree.DecisionTreeClassifier(max_depth=1,random_state=3)
clf = clf.fit(X_train, y_train)
y_prob = clf.predict_proba(X_test)
# The magic happens here
import kds
kds.metrics.report(y_test, y_prob)
It gives an error:
AttributeError Traceback (most recent call last)
<ipython-input-4-fa00bcb248e7> in <module>
13 # The magic happens here
14 import kds
---> 15 kds.metrics.report(y_test, y_prob)
AttributeError: module 'kds' has no attribute 'metrics'
The issue is resolved in the latest version. Please update the package using pip install --upgrade kds
Also, don't forget the y_prob[:,1] in the last line (predict_proba returns one probability column per class, so select a single column).
# The magic happens here
import kds
kds.metrics.plot_lift(y_test, y_prob[:,1])
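To see why a single column has to be selected, it can help to inspect the shape of the predict_proba output, which has one probability column per class. This is a small sketch on the same iris split as the question; it only uses scikit-learn and makes no claims about what kds accepts beyond what the answer states.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn import tree

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=3)
clf = tree.DecisionTreeClassifier(max_depth=1, random_state=3).fit(X_train, y_train)

y_prob = clf.predict_proba(X_test)

print(clf.classes_)        # class labels, in the same order as the columns
print(y_prob.shape)        # (n_samples, n_classes)
print(y_prob[:, 1].shape)  # 1-D array: probability of the class at index 1
```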

How to add a function into utils module on titanic data set?

When I run the following code, I get an error saying AttributeError: module 'python_utils' has no attribute 'clean_data', but I don't know how to fix it.
import python_utils
import pandas as pd
from sklearn import linear_model
def clean_data(data):
    data["Fare"]=data["Fare"].fillna(data["Fare"].dropna().median())
    data["Age"]=data["Age"].fillna(data["Age"].dropna().median())
    data.loc[data["Sex"]=="male", "Sex"]=0
    data.loc[data["Sex"]=="female", "Sex"]=1
    data["Embarked"]=data["Embarked"].fillna("S")
    data.loc[data["Embarked"]=="S","Embarked"]=0
    data.loc[data["Embarked"]=="C","Embarked"]=1
    data.loc[data["Embarked"]=="Q","Embarked"]=2
train=pd.read_csv('train.csv')
python_utils.clean_data(train)
target=train["Survived"].values
features=train[["Pclass","Age","Sex","SibSp","Parch"]].values
classifier=linear_model.LogisticRegression()
classifier=classifier.fit(features,target)
print(classifier.score(features,target))
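A likely reading of this error is that clean_data is defined in the script itself, not inside the python_utils module, so the attribute lookup python_utils.clean_data fails. The sketch below illustrates the distinction with a hypothetical stand-in module built with types.ModuleType and a toy cleaner working on plain dicts; neither is the asker's real module or data.

```python
import types

# Hypothetical stand-in for an imported module that does not define clean_data.
python_utils = types.ModuleType("python_utils")

def clean_data(rows):
    """Toy cleaner (illustration only): replace missing ages with 0."""
    return [{**row, "Age": row["Age"] if row["Age"] is not None else 0}
            for row in rows]

rows = [{"Age": 22}, {"Age": None}]

# The function lives in this script's namespace, not in the module.
defined_in_module = hasattr(python_utils, "clean_data")
print(defined_in_module)

# Option 1: call the local function directly.
cleaned = clean_data(rows)

# Option 2: make the function part of the module, then call it through the module.
python_utils.clean_data = clean_data
cleaned_via_module = python_utils.clean_data(rows)
```

In a real project, option 2 would normally mean moving the def into your own module file and importing it from there, rather than attaching it at runtime.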

Why am I facing the Unpickling Error while trying to load my model in Flask?

I have tried pickling and unpickling in JupyterLab and it works as it is supposed to, but when I run my app.py it gives me the following error.
C:\ProgramData\Microsoft\Windows\Start Menu\Programs\Python 3.7\fakenews\venv\lib\site-packages\sklearn\utils\deprecation.py:144: FutureWarning: The sklearn.linear_model.passive_aggressive module is deprecated in version 0.22 and will be removed in version 0.24. The corresponding classes / functions should instead be imported from sklearn.linear_model. Anything that cannot be imported from sklearn.linear_model is now part of the private API.
warnings.warn(message, FutureWarning)
Traceback (most recent call last):
File "app.py", line 9, in <module>
model = pickle.load(open('model.pkl', 'rb'))
_pickle.UnpicklingError: invalid load key, '\x17'.
Here are my files.
Model.py
import pandas as pd
import itertools
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
#Read the data
df=pd.read_csv('C:\\Users\\Hp\\Desktop\\mini project\\news\\news.csv')
#Get shape and head
df.shape
df.head()
#DataFlair - Get the labels
labels=df.label
labels.head()
#DataFlair - Split the dataset
x_train,x_test,y_train,y_test=train_test_split(df['text'], labels, test_size=0.2, random_state=7)
#DataFlair - Initialize a TfidfVectorizer
tfidf_vectorizer=TfidfVectorizer(stop_words='english', max_df=0.7)
#DataFlair - Fit and transform train set, transform test set
tfidf_train=tfidf_vectorizer.fit_transform(x_train)
tfidf_test=tfidf_vectorizer.transform(x_test)
#DataFlair - Initialize a PassiveAggressiveClassifier
pac=PassiveAggressiveClassifier(max_iter=50)
pac.fit(tfidf_train,y_train)
#DataFlair - Predict on the test set and calculate accuracy
y_pred=pac.predict(tfidf_test)
score=accuracy_score(y_test,y_pred)
print(f'Accuracy: {round(score*100,2)}%')
#DataFlair - Build confusion matrix
confusion_matrix(y_test,y_pred, labels=['FAKE','REAL'])
App.py
import numpy as np
from flask import Flask, request, jsonify, render_template
import pickle
import pandas as pd
app = Flask(__name__)
model = pickle.load(open('model.pkl', 'rb'))
session.clear()
@app.route('/')
def home():
    return render_template('index.html')
@app.route('/predict',methods=['POST'])
def predict():
    news = request.form["newsT"]
    test1 = pd.Series(news, index=[11000])
    prediction = model.predict(test1)
    return render_template('index.html', prediction_text='Sales should be $ {}'.format(prediction))
if __name__ == "__main__":
    app.run(debug=True)
I used this code to pickle the model:
# Save the model as a pickle in a file
joblib.dump(pac, 'model.pkl')
# Load the model from the file
pac_from_joblib = joblib.load('model.pkl')
# Use the loaded model to make predictions
pac_from_joblib.predict(tfidf_test)
This works fine in JupyterLab but gives an unpickling error when app.py loads the model.
I am fairly new to this field and am unable to figure out what's wrong, even after an extensive search online.
It seems to be an encoding issue. It may be because you are using joblib to save the model and then trying to load that same file via the pickle library. Try loading the model with joblib instead:
model = joblib.load('model.pkl')
I hope it helps.
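The general rule behind this advice is to read a file with the same library that wrote it. Here is a minimal sketch of a matched save/load pair using only the standard-library pickle module; the dict is a hypothetical placeholder for a trained model, and the temporary path is made up for the example.

```python
import os
import pickle
import tempfile

# Hypothetical placeholder standing in for a trained model object.
model = {"name": "stand-in model", "weights": [0.1, 0.2]}

path = os.path.join(tempfile.mkdtemp(), "model.pkl")

# Write with pickle...
with open(path, "wb") as f:
    pickle.dump(model, f)

# ...and read back with pickle as well.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == model)
```

The same pairing applies to joblib: a file written with joblib.dump should be read with joblib.load.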

Converting pmml file for random forest in python

I need to convert my random forest model into PMML format in Python. I've imported sklearn2pmml from GitHub and tried to create a PMML file. I ran the code below:
import pandas
import sklearn_pandas
from sklearn.datasets import load_iris
iris = load_iris()
iris_df = pandas.concat((pandas.DataFrame(iris.data[:, :], columns = ["sepal_length", "sepal_width", "petal_length", "petal_width"]), pandas.DataFrame(iris.target, columns = ["species"])), axis = 1)
iris_mapper = sklearn_pandas.DataFrameMapper([('sepal_length', None),
    ('sepal_width', None),
    ('petal_length', None),
    ('petal_width', None),
    ('species', None)])
iris = iris_mapper.fit_transform(iris_df)
from sklearn.ensemble import RandomForestClassifier
iris_X = iris[:, 0:4]
iris_y = iris[:, 4]
iris_classifier = RandomForestClassifier(n_estimators=10)
iris_classifier.fit(iris_X, iris_y)
from sklearn2pmml import sklearn2pmml
sklearn2pmml(iris_classifier, iris_mapper, "randomforest.pmml")
However, I get an error:
TypeError: The pipeline object is not an instance of PMMLPipeline
Any suggestions on what I am missing, or another way to create the PMML format?
TypeError: The pipeline object is not an instance of PMMLPipeline
The first argument of the sklearn2pmml function call must be an instance of sklearn2pmml.PMMLPipeline. You're passing an instance of sklearn.ensemble.RandomForestClassifier instead.
Any suggestions on what I am missing, or another way to create the PMML format?
You're pairing a pre-historic code example with the latest version of the sklearn2pmml library. These are your options:
Upgrade the code example to the latest sklearn2pmml library version. Please take two minutes to read through the "Usage" section of its README file.
Downgrade the sklearn2pmml library to version 0.13.0 (or older).
sklearn2pmml() needs a PMMLPipeline, so try wrapping iris_classifier in a PMMLPipeline like this:
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline
d = load_iris()
iris_X = d.data
iris_y = d.target
iris_classifier = RandomForestClassifier(n_estimators=10)
# Wrap the classifier in a PMMLPipeline and fit the pipeline instead of the bare classifier
pipeline_model = PMMLPipeline([('iris_classifier', iris_classifier)]).fit(iris_X, iris_y)
# Export the fitted pipeline to a PMML file
sklearn2pmml(pipeline_model, 'rfc.pmml', with_repr = True)
