I am on my second day of re-taking Python for the gazillionth time!
I am doing a tutorial on ML in Python, using the following code:
import sklearn.tree
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.tree import tree
music_data = pd.read_csv('music.csv')
x = music_data.drop(columns=['genre'])
y = music_data['genre']
model = DecisionTreeClassifier()
model.fit(x,y)
tree.export_graphviz(model, out_file='music-recommender.dot',
feature_names=['age','gender'],
class_names= sorted(y.unique()),
label='all',
rounded=True,
filled=True)
I keep getting the following error:
ImportError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_13088/3820271611.py in <module>
2 import pandas as pd
3 from sklearn.tree import DecisionTreeClassifier
----> 4 from sklearn.tree import tree
5
6 music_data = pd.read_csv('music.csv')
ImportError: cannot import name 'tree' from 'sklearn.tree' (C:\Anaconda\lib\site-packages\sklearn\tree\__init__.py)
I've tried to find a solution online, but I don't think it's the version of Python/Anaconda because I literally just installed both. I also don't think it's the sklearn.tree since I was able to import DecisionClassifer.
As this answer indicates, you're looking at some older code; this is always a risk with programming. But there's another thing you need to know about your code.
First off, scikit-learn contains several modules, and almost everything you need from it is in one of those. In my experience, most people import things like this:
from sklearn.tree import DecisionTreeRegressor # A regressor class.
from sklearn.tree import plot_tree # A helpful function.
from sklearn.metrics import mean_squared_error # An evaluation function.
It looks like the tutorial wants something similar to plot_tree(). This new-ish function is much easier to use than the older Graphviz visualization. So unless you really need the DOT file for some reasons, you should be able to do this:
from sklearn.tree import plot_tree
sklearn.tree.plot_tree(model)
Bottom line: there will probably be more broken things in that material. So if I were you I'd either make a new environment with a version of sklearn matching whatever material you're using... or ditch that material and look for something newer.
from sklearn.tree import tree looks wrong. Did you mean from sklearn import tree ?
According to the official Scikit Learn Decision Trees Documentation you really do not need too much of importing.
It can be done simply as follows:
from sklearn import tree
import pandas as pd
music_data = pd.read_csv('music.csv')
X = music_data.drop(columns=['genre'])
y = music_data['genre']
model = tree.DecisionTreeClassifier()
model.fit(X,y)
Related
This is my code.
I use jupyter notebook trying to start machine learning after learning the basics so if you have other tips much appreciated.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
music_data = pd.read_csv('music.csv')
X = music_data.drop(columns = ['genre'])
y = music_data['genre']
model = DecisionTreeClassifier
model.fit(X,y)
music_data
and I get an error "model.fit is missing the positional argument y".
I want to plot the loss_curve by using the following code:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
def plotCurves(Xtrain,ytrain,Xval,yval):
solver=["lbfgs", "sgd", "adam"]
for i in solver:
mlp=MLPRegressor(activation='relu',max_iter=1000,solver=i)
mlp.fit(Xtrain,ytrain)
pred=mlp.predict(Xval)
print (mlp.score(Xval,yval))
pd.DataFrame(mlp.loss_curve_).plot()
However, when I run my code the following error appears:
'MLPRegressor' object has no attribute 'loss_curve_'
and in the Anaconda IDE version 1.9.7 it appears this method when I am coding.
What can I try to solve this?
Only the stochastic solvers will expose a loss_curve_ attribute on the estimator after fit, so in your first iteration it fails with the lbfgs solver. You can verify this with the following:
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPRegressor
X, y = make_classification(n_samples=5)
solver=[
"lbfgs",
"sgd",
"adam"
]
for i in solver:
mlp = MLPRegressor(activation='relu',solver=i)
mlp.fit(X,y)
print(hasattr(mlp, "loss_curve_"))
False
True
True
If you want to access this attribute, you'll want to stick with either the adam or sgd solver.
I have the code below and this code work only with the binary class so how can I use with three classes.
from sklearn.tree import DecisionTreeClassifier
import pandas as pd
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import scikitplot as skp
orgnal_data = pd.read_excel("movie.xls")
# Program extracting first column
text = orgnal_data.iloc[:,0]
lable = orgnal_data.iloc[:,1]
x_train,x_test,y_train,y_test=train_test_split(fe,lable,test_size=0.30,random_state=40)
DT = DecisionTreeClassifier()
DT_y = DT.fit(x_train,y_train).predict(x_test)
clf_names = ['Decision Tree']
skp.metrics.plot_calibration_curve(y_test,DT_y,clf_names)
plt.show()
Since you use scikit-plot module, there is no function for a multiclass problem.
Read the source code here:
This function currently only works for binary classification.
So you can either 1) modify the source code or 2) open a github issue and request a function for multiclass problems.
EDIT 1:
Using scikit-learn you have some ML models that can handle multiclass problems. For example for the LinearSVC function here, the multiclass support is handled according to a one-vs-the-rest scheme.
So you can actually have models like this and then use the plot_calibration_curve function for each case (one VS rest) separately.
I am trying to follow scikit learn example on decision trees:
from sklearn.datasets import load_iris
from sklearn import tree
X, y = load_iris(return_X_y=True)
clf = tree.DecisionTreeClassifier()
clf = clf.fit(X, y)
When I try to plot the tree:
tree.plot_tree(clf.fit(iris.data, iris.target))
I get
NameError Traceback (most recent call last)
<ipython-input-2-e72b33a93ee6> in <module>
----> 1 tree.plot_tree(clf.fit(iris.data, iris.target))
NameError: name 'iris' is not defined
Your problem was different, but I ended up here through googling this issue and you have also same-ish issue present.
At least on windows matplotlib (which is used to show the tree with tree.plot_tree) will not show anything if you don't have plt.show() somewhere.
from sklearn import tree
import matplotlib.pyplot as plt
sometree = ....
tree.plot_tree(sometree)
plt.show() # mandatory on Windows
iris doesn't exist if you don't assign it. Use this line to plot:
tree.plot_tree(clf.fit(X, y))
You already assigned the X and y of load_iris() to a variable so you can use them.
Additionally, make sure the graphviz library's bin folder is in PATH.
I have 52 CSV files in a folder. I want to build a model based on this data. That's why I want to do Leave one out cross-validation on these data. How can I do this using sci-kit learn in python?
I tried from sci kit document and also search many resources.But I didn't found the solution. I have tried this code.
import glob
import numpy as np
import pandas as pd
from sklearn.cross_validation import LeaveOneOut
path=r'...................\Data\New design process data'
filelist=glob.glob(path + "/*.csv")
loo=LeaveOneOut()
for train,test in loo.split(filelist):
print("%s %s" % (train, test))
But it showed errors.
init() missing 1 required positional argument: 'n'
I am new in python as well as sci-kit learn. If anyone can help me, It would be a great convenience.
You should use the newer version of the module, which is located in sklearn.model_selection instead of sklearn.cross_validation. (The cross_validation module was depricated in 0.18.) Using this version, you can instantiate the class without the positional argument, and it also does not fail when you try to call split.
from sklearn.model_selection import LeaveOneOut
X = np.array([[1, 2], [3, 4]])
y = np.array([1, 2])
loo = LeaveOneOut() # works without passing an argument
loo.get_n_splits(X) # returns 2