Sklearn decision tree plot does not appear - python

I am trying to follow the scikit-learn example on decision trees:
from sklearn.datasets import load_iris
from sklearn import tree
X, y = load_iris(return_X_y=True)
clf = tree.DecisionTreeClassifier()
clf = clf.fit(X, y)
When I try to plot the tree:
tree.plot_tree(clf.fit(iris.data, iris.target))
I get
NameError Traceback (most recent call last)
<ipython-input-2-e72b33a93ee6> in <module>
----> 1 tree.plot_tree(clf.fit(iris.data, iris.target))
NameError: name 'iris' is not defined

Your problem was different, but I found this question while searching for a similar issue, and your code has the same problem as well.
At least on Windows, matplotlib (which tree.plot_tree uses to draw the tree) will not display anything unless you call plt.show() somewhere.
from sklearn import tree
import matplotlib.pyplot as plt
sometree = ....
tree.plot_tree(sometree)
plt.show() # mandatory on Windows

iris is never defined, because load_iris(return_X_y=True) returns only X and y. Use this line to plot:
tree.plot_tree(clf.fit(X, y))
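You already assigned the X and y returned by load_iris() to variables, so you can use them.
Additionally, if you render the tree with export_graphviz instead, make sure the Graphviz bin folder is on PATH; plot_tree itself draws with matplotlib and does not need Graphviz.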
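Putting both answers together, a minimal sketch of the full script might look like the following. It reuses the X and y returned by load_iris and ends with plt.show(). The random_state, figsize, and filled arguments are my own additions for a reproducible, readable figure and are not part of the original question.

```python
import matplotlib.pyplot as plt
from sklearn import tree
from sklearn.datasets import load_iris

# return_X_y=True gives back only the arrays, so no `iris` object exists.
X, y = load_iris(return_X_y=True)

# random_state is an assumption added here for reproducible splits.
clf = tree.DecisionTreeClassifier(random_state=0)
clf = clf.fit(X, y)

# Draw the already fitted tree on an explicit axes object.
fig, ax = plt.subplots(figsize=(12, 8))
annotations = tree.plot_tree(clf, ax=ax, filled=True)

# Without this call, a plain script exits without displaying the figure.
plt.show()
```

In a Jupyter notebook with inline plotting the figure usually appears without plt.show(), but calling it does no harm.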

Related

SVM problem - name 'model_SVC' is not defined

I have a problem with this code:
from sklearn import svm
model_SVC = SVC()
model_SVC.fit(X_scaled_df_train, y_train)
svm_prediction = model_SVC.predict(X_scaled_df_test)
The error message is
NameError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_14392/1339209891.py in <module>
----> 1 svm_prediction = model_SVC.predict(X_scaled_df_test)
NameError: name 'model_SVC' is not defined
Any ideas?
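Use:
from sklearn.svm import SVC
The line from sklearn import svm imports only the svm module, so the bare name SVC is undefined; SVC() fails and model_SVC is never created. Either call svm.SVC() or import the class directly:
from sklearn.svm import SVC
See the documentation for sklearn.svm.SVC. When I choose this model, I'm mindful of the dataset size. From the documentation: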
The fit time scales at least quadratically with the number of samples and may be impractical beyond tens of thousands of samples. For large datasets consider using LinearSVC instead.
from sklearn.svm import LinearSVC
For more info you could read When should one use LinearSVC or SVC?
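As a rough sketch, both import styles below create the same kind of estimator. The iris dataset and train_test_split are assumptions standing in for X_scaled_df_train and X_scaled_df_test, which the question does not show.

```python
from sklearn import svm
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Stand-in data (assumption): the question's own DataFrames are not available.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Style 1: import the class directly and use the bare name.
model_SVC = SVC()
model_SVC.fit(X_train, y_train)

# Style 2: import the module and qualify the class name.
model_via_module = svm.SVC()
model_via_module.fit(X_train, y_train)

svm_prediction = model_SVC.predict(X_test)
print(svm_prediction[:5])
```

In a notebook, remember to re-run the cell that defines model_SVC after fixing the import; otherwise the prediction cell still raises the NameError.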

Import Error: cannot import name 'tree' from 'sklearn.tree'

I am on my second day of re-taking Python for the gazillionth time!
I am doing a tutorial on ML in Python, using the following code:
import sklearn.tree
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.tree import tree
music_data = pd.read_csv('music.csv')
x = music_data.drop(columns=['genre'])
y = music_data['genre']
model = DecisionTreeClassifier()
model.fit(x,y)
tree.export_graphviz(model, out_file='music-recommender.dot',
feature_names=['age','gender'],
class_names= sorted(y.unique()),
label='all',
rounded=True,
filled=True)
I keep getting the following error:
ImportError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_13088/3820271611.py in <module>
2 import pandas as pd
3 from sklearn.tree import DecisionTreeClassifier
----> 4 from sklearn.tree import tree
5
6 music_data = pd.read_csv('music.csv')
ImportError: cannot import name 'tree' from 'sklearn.tree' (C:\Anaconda\lib\site-packages\sklearn\tree\__init__.py)
I've tried to find a solution online, but I don't think it's the version of Python/Anaconda because I literally just installed both. I also don't think the problem is sklearn.tree itself, since I was able to import DecisionTreeClassifier.
As this answer indicates, you're looking at some older code; this is always a risk with programming. But there's another thing you need to know about your code.
First off, scikit-learn contains several modules, and almost everything you need from it is in one of those. In my experience, most people import things like this:
from sklearn.tree import DecisionTreeRegressor # A regressor class.
from sklearn.tree import plot_tree # A helpful function.
from sklearn.metrics import mean_squared_error # An evaluation function.
It looks like the tutorial wants something similar to plot_tree(). This new-ish function is much easier to use than the older Graphviz visualization. So unless you really need the DOT file for some reason, you should be able to do this:
from sklearn.tree import plot_tree
plot_tree(model)
Bottom line: there will probably be more broken things in that material. So if I were you I'd either make a new environment with a version of sklearn matching whatever material you're using... or ditch that material and look for something newer.
from sklearn.tree import tree looks wrong. Did you mean from sklearn import tree?
According to the official scikit-learn decision trees documentation, you do not need many imports.
It can be done simply as follows:
from sklearn import tree
import pandas as pd
music_data = pd.read_csv('music.csv')
X = music_data.drop(columns=['genre'])
y = music_data['genre']
model = tree.DecisionTreeClassifier()
model.fit(X,y)
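If you do need the DOT file, a sketch along these lines should work with from sklearn import tree. The small DataFrame is made-up placeholder data standing in for music.csv, whose contents are not shown in the question, and out_file=None is used so the DOT text comes back as a string instead of being written to disk.

```python
import pandas as pd
from sklearn import tree

# Placeholder rows (assumption): replace with pd.read_csv('music.csv').
music_data = pd.DataFrame({
    "age": [20, 23, 25, 26, 30, 31],
    "gender": [1, 1, 1, 0, 0, 0],
    "genre": ["HipHop", "HipHop", "Jazz", "Dance", "Acoustic", "Acoustic"],
})

X = music_data.drop(columns=["genre"])
y = music_data["genre"]

model = tree.DecisionTreeClassifier(random_state=0)
model.fit(X, y)

# With out_file=None, export_graphviz returns the DOT source as a string.
dot_text = tree.export_graphviz(
    model,
    out_file=None,
    feature_names=["age", "gender"],
    class_names=sorted(y.unique()),
    label="all",
    rounded=True,
    filled=True,
)
print(dot_text[:60])
```

Passing out_file='music-recommender.dot' instead, as in the question, writes the same text to that file.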

Scikit learn and data set analysis

I am new to scikit-learn and tried the first program given on their website. The code is below:
from sklearn import svm
from sklearn import datasets
clf = svm.SVC()
iris = datasets.load_iris()
X, y = iris.data, iris.target
clf.fit(X, y)
When I run the last line, I get the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: fit() missing 1 required positional argument: 'y'
Please help me with this issue.
The code runs fine as posted in a clean setup, so most likely you have multiple environments interfering. Try running it in a new virtualenv.
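One other cause worth ruling out, which the question as posted does not confirm, is calling fit on the class rather than on an instance (for example, writing clf = svm.SVC without parentheses in the session). The sketch below uses a hypothetical ToyClassifier in place of a real estimator to show how plain Python produces this kind of message.

```python
class ToyClassifier:
    """Hypothetical stand-in for an estimator such as svm.SVC."""

    def fit(self, X, y):
        self.n_samples_ = len(X)
        return self


X, y = [[0.0], [1.0]], [0, 1]

# Correct: instantiate first, then call fit on the instance.
clf = ToyClassifier()
clf.fit(X, y)

# Mistake: no parentheses, so clf_class is the class itself.
clf_class = ToyClassifier
try:
    # X is bound to `self` and y to `X`, leaving the `y` parameter unfilled.
    clf_class.fit(X, y)
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

If type(clf) in your session prints a class rather than an instance of it, this is the likely culprit.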

Python regression analysis error

I'm trying to run a regression analysis with the below mentioned code. I encounter ImportError: No module named statsmodels.api and No module named matplotlib.pyplot. Any suggestions will be appreciated to overcome this error.
import pandas as pd
import numpy as np
import seaborn as sns
from scipy import stats, integrate
import matplotlib.pyplot as plt
import statsmodels.api as sm
data = pd.read_csv("F:\Projects\Poli_Map\DAT_OL\MASTRTAB.csv")
# define the data/predictors as the pre-set feature names
df = pd.DataFrame(data.data, columns=data.feature_names)
# Put the target (IMR) in another DataFrame
target = pd.DataFrame(data.target, columns=["IMR"])
X = df["HH_LATR","COMM_TOILT","PWS"]
y = target["IMR"]
model = sm.OLS(y, X).fit()
predictions = model.predict(X) # make the predictions by the model
# Print out the statistics
model.summary()
plt.scatter(predictions, y, s=30, c='r', marker='+', zorder=10) #Plot graph
plt.xlabel("Independent variables")
plt.ylabel("Outcome variables")
plt.show()
I highly recommend installing Anaconda. It sets the environment variables automatically, and it bundles many useful packages (e.g. numpy, sympy, scipy).
Moreover, from personal experience, using pip on Windows and compiling from source (which requires Visual Studio) can be a pain in the neck. That is the problem Anaconda was designed to solve.
See: https://www.anaconda.com/download/
Hope this helps.
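Whichever distribution you use, it can help to check which interpreter is running and which packages it can actually see. This is a small sketch using only the standard library; the helper name missing_modules is my own.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# The interpreter path shows which environment is actually in use.
print("Interpreter:", sys.executable)

# Top-level packages imported by the script in the question.
required = ["pandas", "numpy", "seaborn", "scipy", "matplotlib", "statsmodels"]
print("Missing:", missing_modules(required))
```

Anything listed as missing has to be installed into the environment shown on the first line, not into some other Python installation on the machine.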

Cannot find reference to Python package (plt.cm.py)

I have a small issue with running code from a tutorial that isn't working as it should. It's not a syntax problem for sure. I'm working with scikit-learn and matplotlib, and I'm getting a warning message in my IDE "Cannot find reference 'gray_r' in 'cm.py'..." All my packages are installed properly (via pip) and have worked for sample programs except this.
Any advice?
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn import svm
digits = datasets.load_digits()
print(digits.data)
print(digits.target)
print(digits.images[0])
clf = svm.SVC(gamma=0.001, C=100)
print(len(digits.data))
x, y = digits.data[:-1], digits.target[:-1]
clf.fit(x,y)
print('Prediction:', clf.predict(digits.data[-1])
plt.imshow(digits.images[-1], cmap=plt.cm.gray_r, interpolation='nearest')
plt.show()
Well, for starters, you're missing a closing parenthesis on your last print statement. It should be print('Prediction:', clf.predict(digits.data[-1])). Other than that, this code runs on my computer with only a deprecation warning. What does the traceback say?
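For reference, here is a sketch of the corrected script. Beyond the missing parenthesis, it makes two changes that are my own suggestions rather than part of the answer above: it slices with digits.data[-1:] so that predict receives a 2D array (recent scikit-learn versions reject a single 1D sample instead of only warning), and it passes the colormap by name as cmap='gray_r', which sidesteps the plt.cm.gray_r attribute lookup that the IDE flags.

```python
import matplotlib.pyplot as plt
from sklearn import datasets, svm

digits = datasets.load_digits()

clf = svm.SVC(gamma=0.001, C=100)

# Train on everything except the last sample.
x, y = digits.data[:-1], digits.target[:-1]
clf.fit(x, y)

# Slicing with [-1:] keeps the 2D shape (1, 64) that predict expects.
last_sample = digits.data[-1:]
prediction = clf.predict(last_sample)
print('Prediction:', prediction)

# Colormaps can be referenced by their registered name as a string.
plt.imshow(digits.images[-1], cmap='gray_r', interpolation='nearest')
plt.show()
```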
