ModuleNotFoundError: No module named 'sklearn.tree.tree' - python

I'm trying to learn how to create a machine learning API with Flask, however, following this tutorial, the following error appears when I type the command python app.py:
Traceback (most recent call last):
File "C:\Users\Breno\Desktop\flask-api\app.py", line 24, in <module>
model = p.load(open(modelfile, 'rb'))
ModuleNotFoundError: No module named 'sklearn.tree.tree'
My code:
from flask import Flask, request, redirect, url_for, flash, jsonify
import numpy as np
import pickle as p
import pandas as pd
import json
#from sklearn.tree import DecisionTreeClassifier
app = Flask(__name__)
#app.route('/api/', methods=['POST'])
def makecalc():
j_data = request.get_json()
prediction = np.array2string(model.predict(j_data))
return jsonify(prediction)
if __name__ == '__main__':
modelfile = 'models/final_prediction.pickle'
model = p.load(open(modelfile, 'rb'))
app.run(debug=True,host='0.0.0.0')
Could someone help me please?

Pickles are not necessarily compatible across scikit-learn versions so this behavior is expected (and the use case is not supported). For more details, see https://scikit-learn.org/dev/modules/model_persistence.html#model-persistence. Replace pickle by joblib.
by example :
>>> from sklearn import svm
>>> from sklearn import datasets
>>> clf = svm.SVC()
>>> X, y= datasets.load_iris(return_X_y=True)
>>> clf.fit(X, y)
SVC()
>>> from joblib import dump, load
>>> dump(clf, open('filename.joblib','wb'))
>>> clf2 = load(open('filename.joblib','rb'))
>>> clf2.predict(X[0:1])
array([0])
>>> y[0]
0

For anyone coming across this issue (perhaps dealing with code written long ago), sklearn.tree.tree is now under sklearn.tree (as from v0.24). This can be see from the import error warning:
from sklearn.tree.tree import BaseDecisionTree
/usr/local/lib/python3.7/dist-packages/sklearn/utils/deprecation.py:144: FutureWarning: The sklearn.tree.tree module is deprecated in version 0.22 and will be removed in version 0.24. The corresponding classes / functions should instead be imported from sklearn.tree. Anything that cannot be imported from sklearn.tree is now part of the private API.
warnings.warn(message, FutureWarning)
Instead, use:
from sklearn.tree import BaseDecisionTree

The problem is with the version of sklearn. Module sklearn.tree.tree is removed since version 0.24. Most probably, your model has been generated with the older version. Try installing an older version of sklearn:
pip uninstall scikit-learn
pip install scikit-learn==0.20.4

Related

Data frames, importing csv files in jupyter notebook

import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.externals import joblib
music_data = pd.read_csv('music.csv')
X = music_data.drop(columns=['genre'])
y = music_data['genre']
model = DecisionTreeClassifier()
model.fit(X, y)
joblib.dump(model, 'music-recommender.joblib' )
This is the output:
Traceback (most recent call last)
<ipython-input-28-802088a85507> in <module>
1 import pandas as pd
2 from sklearn.tree import DecisionTreeClassifier
----> 3 from sklearn.externals import joblib
4
5 music_data = pd.read_csv('music.csv')
ImportError: cannot import name 'joblib' from 'sklearn.externals' (C:\Users\christian\anaconda3\lib\site-packages\sklearn\externals\__init__.py)*****
I would appreciate if someone could help out, is there another syntax that I should use to import joblib from sklearn?
I had the same problem, I never knew that joblib was already install on my machine. just replace this line of code
from sklearn.externals import joblib
with
import joblib
The issue is here that you are trying to import joblib from scikit learn not reading csv file if you're wondering. Install joblib using pip install joblib and import it like this
import joblib
joblib.dump(model, 'music-recommender.joblib')
Remember to remove the import line from sklearn.externals. You can find more details here from the official documentation of scikit-learn.

name 'StandardScaler' is not defined

I have installed scikit-learn 0.23.2 via pip3, however, I get this error from my code
Traceback (most recent call last):
File "pca_iris.py", line 12, in <module>
X = StandardScaler().fit_transform(X)
NameError: name 'StandardScaler' is not defined
I searched the web and saw similar topics, however the version is correct and I don't know what to do further. The line import sklearn is in the top of the script.
Any thought?
StandardScaler is a method under sklearn.preprocessing. You need to import the StandardScaler like this:
from sklearn.preprocessing import StandardScaler
X = StandardScaler().fit_transform(X)
Or
import sklearn
X = sklearn.preprocessing.StandardScaler().fit_transform(X)

Why am I facing the Unpickling Error while trying to load my model in Flask?

I have tried the pickling and unpickling in jupyter lab and it seems to work as its supposed to but when i run my app.py it gives me following error.
C:\ProgramData\Microsoft\Windows\Start Menu\Programs\Python 3.7\fakenews\venv\lib\site-packages\sklearn\utils\deprecation.py:144: FutureWarning: The sklearn.linear_model.passive_aggressive module is deprecated in version 0.22 and will be removed in version 0.24. The corresponding classes / functions should instead be imported from sklearn.linear_model. Anything that cannot be imported from sklearn.linear_model is now part of the private API.
warnings.warn(message, FutureWarning)
Traceback (most recent call last):
File "app.py", line 9, in <module>
model = pickle.load(open('model.pkl', 'rb'))
_pickle.UnpicklingError: invalid load key, '\x17'.
Here are my files.
Model.py
import pandas as pd
import itertools
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
#Read the data
df=pd.read_csv('C:\\Users\\Hp\\Desktop\\mini project\\news\\news.csv')
#Get shape and head
df.shape
df.head()
#DataFlair - Get the labels
labels=df.label
labels.head()
#DataFlair - Split the dataset
x_train,x_test,y_train,y_test=train_test_split(df['text'], labels, test_size=0.2, random_state=7)
#DataFlair - Initialize a TfidfVectorizer
tfidf_vectorizer=TfidfVectorizer(stop_words='english', max_df=0.7)
#DataFlair - Fit and transform train set, transform test set
tfidf_train=tfidf_vectorizer.fit_transform(x_train)
tfidf_test=tfidf_vectorizer.transform(x_test)
#DataFlair - Initialize a PassiveAggressiveClassifier
pac=PassiveAggressiveClassifier(max_iter=50)
pac.fit(tfidf_train,y_train)
#DataFlair - Predict on the test set and calculate accuracy
y_pred=pac.predict(tfidf_test)
score=accuracy_score(y_test,y_pred)
print(f'Accuracy: {round(score*100,2)}%')
#DataFlair - Build confusion matrix
confusion_matrix(y_test,y_pred, labels=['FAKE','REAL'])
App.py
import numpy as np
from flask import Flask, request, jsonify, render_template
import pickle
import pandas as pd
app = Flask(__name__)
model = pickle.load(open('model.pkl', 'rb'))
session.clear()
#app.route('/')
def home():
return render_template('index.html')
#app.route('/predict',methods=['POST'])
def predict():
news = request.form["newsT"]
test1 = pd.Series(news, index=[11000])
prediction = model.predict(test1)
return render_template('index.html', prediction_text='Sales should be $ {}'.format(prediction))
if __name__ == "__main__":
app.run(debug=True)
I used this code to pickle---------------
# Save the model as a pickle in a file
joblib.dump(pac, 'model.pkl')
# Load the model from the file
pac_from_joblib = joblib.load('model.pkl')
# Use the loaded model to make predictions
pac_from_joblib.predict(tfidf_test)
This works fine in the lab but seems to give unpickling error while loading app.py.
I am fairly new to this field and am unable to figure out whats wrong even after extensive search online.
It seems like it is an encoding issue. It may because you are joblib to save a pickle model and try to load that same model vai pickle library. Try loading the model again by using joblib
model = joblib.load('model.pkl')
I hope it helps.

'ValueError: non-string names in Numpy dtype unpickling' in flask(python2.7)

I have tried using pickle in Flask(python 2.7) but when I run the Flask script I got an error ValueError: non-string names in Numpy dtype unpickling. I have used RandomForestClassifier object to predict the request.I have attached my code below.Unable to solve this error from two days.Any help is appreciated.
import numpy as np
from flask import Flask, jsonify, abort, request
import pickle
from sklearn.externals import joblib
my_random_forest = pickle.load(open("iris_rfc.pkl", "rb"))
app = Flask(__name__)
#app.route('/api', methods=['POST'])
def make_predict():
data = request.get_json(force=True)
predict_request = [[data['sl'], data['sw'], data['pl'], data['pw']]
predict_request = np.array(predict_request)
y_hat = my_random_forest.predict(predict_request)
output = [y_hat[0]]`enter code here`
return jsonify(results=output)
if __name__ == '__main__':
app.run(debug=True)
I have the same problem. The reason seems to be a library version mismatch. The random forest classifier model was certainly generated with another sklearn version with a newer numpy version. Updating your libraries (numpy, etc.) should solve the problem.
The same behavior was reported by user drishit96 ValueError: non-string names in Numpy dtype unpickling only on AWS Lambda
I think the open("iris_rfc.pkl", "rb") is not necessary.
Try:
my_random_forest = pickle.load(path_to_iris_rfc.pkl)

Converting pmml file for random forest in python

I need to convert my random forest model into pmml format in python. I've imported sklearn2pmml from github and tried create a pmml file. I run the code below;
import pandas
import sklearn_pandas
iris = iris.csv
iris_df = pandas.concat((pandas.DataFrame(iris.data[:, :], columns = ["Sepal.Length", "sepal_width", "petal_length", "petal_width"]), pandas.DataFrame(iris.target, columns = ["species"])), axis = 1)
iris_mapper = sklearn_pandas.DataFrameMapper([('sepal_length',None),
('sepal_width', None),
('petal_width', None),
('petal_width', None),
('species',None)])
iris = iris_mapper.fit_transform(iris_df)
from sklearn.ensemble import RandomForestClassifier
iris_X = iris[:, 0:4]
iris_y = iris[:, 4]
iris_classifier = RandomForestClassifier(n_estimators=10)
iris_classifier.fit(iris_X, iris_y)
from sklearn2pmml import sklearn2pmml
sklearn2pmml(iris_classifier, iris_mapper, "randomforest.pmml")
However, I get an error;
TypeError: The pipeline object is not an instance of PMMLPipeline
Any suggestion what I am missing or another way to creat pmml format?
TypeError: The pipeline object is not an instance of PMMLPipeline
The first argument of the sklearn2pmml function call must be an instance of sklearn2pmml.PMMLPipeline. You're passing an instance of sklearn.ensemble.RandomForestClassifier instead.
Any suggestion what I am missing or another way to creat pmml format?
You're pairing a pre-historic code example with the latest version of the sklearn2pmml library. These are your options:
Upgrade code example to latest sklearn2pmml library version. Please take two minutes to read through the "Usage" section of its README.file.
Downgrade the sklearn2pmml library to 0.13.0 (or older) version.
sklearn2pmml() need a PMMLPipeline model, so try to pack iris_classifier with PMMLPipeline like this:
import pandas
import sklearn_pandas
from sklearn.datasets import load_iris
from sklearn2pmml.pipeline import PMMLPipeline
from sklearn.ensemble import RandomForestClassifier
d = load_iris()
iris_X = d.data
iris_y = d.target
iris_classifier = RandomForestClassifier(n_estimators=10)
#rfc_model = iris_classifier.fit(iris_X, iris_y)
pipeline_model = PMMLPipeline([('iris_classifier',
iris_classifier)]).fit(iris_X, iris_y)
from sklearn2pmml import sklearn2pmml
sklearn2pmml(pipeline_model, 'rfc.pmml', with_repr = True)

Categories

Resources