Using graphviz to plot decision tree in python - python

I am following the answer presented to a previous post: Is it possible to print the decision tree in scikit-learn?
from sklearn.datasets import load_iris
from sklearn import tree
from sklearn.externals.six import StringIO
import pydot
clf = tree.DecisionTreeClassifier()
iris = load_iris()
clf = clf.fit(iris.data, iris.target)
tree.export_graphviz(clf, out_file='tree.dot')
dot_data = StringIO()
tree.export_graphviz(clf, out_file=dot_data)
graph = pydot.graph_from_dot_data(dot_data.getvalue())
graph.write_pdf("iris.pdf")
Unfortunately, I cannot figure out the following error:
'list' object has no attribute 'write_pdf'
Does anyone know a way around this as the structure of the generated tree.dot file is a list?
Update
I have tried the web application http://webgraphviz.com/. It renders the tree, but the decision conditions and the classes are not displayed. Is there any way to include these in the tree.dot file?

It looks like the value you store in graph is of type list.
graph = pydot.graph_from_dot_data(dot_data.getvalue())
type(graph)
<type 'list'>
We are only interested in the first element of that list.
So you can fix this in one of the following two ways:
1) Change the line where you assign graph so that it unpacks the single element:
(graph, ) = pydot.graph_from_dot_data(dot_data.getvalue())
2) Or keep the entire list in graph and call write_pdf on its first element only:
graph[0].write_pdf("iris.pdf")
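Both options can also be wrapped in a small helper that accepts either return shape, which may be useful if the same script has to run against pydot versions that return a single graph and versions that return a list. This is only a sketch: first_graph is a hypothetical helper name, not part of pydot.

```python
def first_graph(result):
    """Return one graph whether the parser gave back a graph or a list of graphs."""
    if isinstance(result, (list, tuple)):
        if not result:
            raise ValueError("no graph found in the DOT data")
        return result[0]
    return result


# Usage with the code from the question (needs pydot and Graphviz installed):
# graph = first_graph(pydot.graph_from_dot_data(dot_data.getvalue()))
# graph.write_pdf("iris.pdf")
```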
Here is what I get as output of iris.pdf
Update
To get around this path error:
Exception: "dot.exe" not found in path.
install Graphviz, then do one of the following.
Either extend PATH in your code:
import os
os.environ["PATH"] += os.pathsep + 'C:/Program Files (x86)/Graphviz2.38/bin/'
Or simply add the following to your Windows path in Control Panel:
C:\Program Files (x86)\Graphviz2.38\bin
As per the Graphviz documentation, it is not added to the Windows path during installation.
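A slightly more defensive version of the in-code approach might look like the sketch below. It avoids appending the same directory twice and then checks whether dot can actually be found. The helper name add_to_path is hypothetical, and the directory is the one quoted in this answer; your install location may differ.

```python
import os
import shutil

# Directory taken from the answer above; adjust it to your own installation.
GRAPHVIZ_BIN = "C:/Program Files (x86)/Graphviz2.38/bin/"


def add_to_path(directory):
    """Append directory to this process's PATH unless it is already listed."""
    entries = [p for p in os.environ.get("PATH", "").split(os.pathsep) if p]
    if directory not in entries:
        entries.append(directory)
    os.environ["PATH"] = os.pathsep.join(entries)


if os.name == "nt":  # the directory above only makes sense on Windows
    add_to_path(GRAPHVIZ_BIN)

# shutil.which returns the full path of the executable, or None if not found.
dot_location = shutil.which("dot")
print("dot found at:", dot_location)
```

The change only affects the running Python process; the Control Panel route is the one that persists across sessions.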

Related

Python Databricks cannot visualise dtreeviz decision tree

I need to visualize a decision tree in dtreeviz in Databricks.
The code seems to be working fine.
However, instead of showing the decision tree, it only prints the following:
Out[23]: <dtreeviz.trees.DTreeViz at 0x7f5b27a91160>
Running the following code:
import pandas as pd
from sklearn import preprocessing, tree
from dtreeviz.trees import dtreeviz
Things = {'Feature01': [3,4,5,0],
'Feature02': [4,5,6,0],
'Feature03': [1,2,3,8],
'Target01': ['Red','Blue','Teal','Red']}
df = pd.DataFrame(Things,
columns= ['Feature01', 'Feature02',
'Feature03', 'Target01'])
label_encoder = preprocessing.LabelEncoder()
label_encoder.fit(df.Target01)
df['target'] = label_encoder.transform(df.Target01)
classifier = tree.DecisionTreeClassifier()
classifier.fit(df.iloc[:,:3], df.target)
dtreeviz(classifier,
df.iloc[:,:3],
df.target,
target_name='toy',
feature_names=df.columns[0:3],
class_names=list(label_encoder.classes_)
)
If you look at the dtreeviz documentation, you'll see that the dtreeviz function just creates an object; you then need to call a method such as .view() to show it. On Databricks, .view() won't work, but you can use the .svg() method to generate the output as SVG and then pass it to the displayHTML function. The following code:
viz = dtreeviz(classifier,
...)
displayHTML(viz.svg())
will give you the desired output.
P.S. You need the dot command-line tool to generate the output. It can be installed by running the following in a notebook cell:
%sh apt-get install -y graphviz

Why is my code unable to read from my current file directory?

I've been working on a data science project with some friends, and I need to import a data set from a file stored in my machine's default Downloads directory. The code I'm working from was written for a Windows machine, so its file path differs from the one on my machine, and I'm not sure how to change it so that the import works. I've tried to correct the path myself, but the code still can't find the file, even though I think the path is correct. Below is my code.
import numpy as np
import pandas as pd
import os, sys
from sklearn.preprocessing import MinMaxScaler
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
#DataFlair - Read the data
df= pd.read_csv('/home/morgankemp/Downloads/parkinsons.data.')
df.head()
#DataFlair - Get the features and labels
features=df.loc[:,df.columns!='status'].values[:,1:]
labels=df.loc[:,'status'].values
#DataFlair - Get the count of each label (0 and 1) in labels
print(labels[labels==1].shape[0], labels[labels==0].shape[0])
#DataFlair - Scale the features to between -1 and 1
scaler=MinMaxScaler((-1,1))
x=scaler.fit_transform(features)
y=labels
#DataFlair - Split the dataset
x_train,x_test,y_train,y_test=train_test_split(x, y, test_size=0.2, random_state=7)
#DataFlair - Train the model
model=XGBClassifier()
model.fit(x_train,y_train)
# DataFlair - Calculate the accuracy
y_pred=model.predict(x_test)
print(accuracy_score(y_test, y_pred)*100)
I can't get the file path right. I have entered what I think is the correct path, the default Downloads directory for my user, and I have made sure the case of every character matches.
Whenever I run the code I get the following error: "FileNotFoundError: [Errno 2] No such file or directory: '/morgankemp#pop-os/Home/morgankemp/Downloads/parkinsons.data'"
Additional information: I'm running Pop_OS and using VSC. Any and all help would be appreciated.
Most probably, your file is actually named 'parkinsons.data.csv', so the path in the code does not match it.
Replace
df= pd.read_csv('/home/morgankemp/Downloads/parkinsons.data.')
With
df= pd.read_csv('/home/morgankemp/Downloads/parkinsons.data.csv')
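If you are not sure what the file is really called, you can list what is in the directory before guessing. The sketch below uses only the standard library; find_candidates is a hypothetical helper name, and the "~/Downloads" location and "parkinsons" prefix are taken from the question.

```python
from pathlib import Path


def find_candidates(directory, prefix):
    """Return the files in directory whose names start with prefix, sorted."""
    directory = Path(directory).expanduser()  # turns "~" into the home directory
    if not directory.is_dir():
        return []
    return sorted(
        p for p in directory.iterdir()
        if p.is_file() and p.name.startswith(prefix)
    )


matches = find_candidates("~/Downloads", "parkinsons")
for path in matches:
    # Each printed path can be passed to pd.read_csv as it is.
    print(path)
if not matches:
    print("No matching file found; check the directory and the file name.")
```

Building the path from the home directory also avoids hard-coding a user name or an operating-system-specific prefix.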

Decision trees graph not working python 3.6 not saving

I am trying to plot a decision tree in Python, but for some reason I am getting an error message:
InvocationException: GraphViz's executables not found
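import graphviz
import pydotplus
from io import StringIO
from sklearn.tree import DecisionTreeClassifier, export_graphviz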
tree = DecisionTreeClassifier(criterion='entropy',max_depth=18,random_state=0)
tree.fit(X_train, y_train)
dot_data = StringIO()
export_graphviz(tree,out_file = dot_data,filled=True,rounded=True,feature_names=X_train.columns.values.tolist(),class_names = ['0', '1'],special_characters=True)
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
graph.write_png("C:/Temp/Tree.png")
print('Visible tree plot saved as png.')
graph
You need to add Graphviz to PATH. Find your own equivalent of this directory:
C:\Users\Env\Library\bin\graphviz
and add it to PATH.
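If you do not know where Graphviz ended up on your machine, a short search for the dot executable can locate the directory to add. This is a sketch using only the standard library; find_dot_dirs is a hypothetical helper name, and the root you search is up to you.

```python
import os


def find_dot_dirs(roots):
    """Return directories under the given roots that contain dot or dot.exe."""
    found = []
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            if "dot" in filenames or "dot.exe" in filenames:
                found.append(dirpath)
    return found


# Example call (searching a whole environment folder can take a while):
# for directory in find_dot_dirs([r"C:\Users\Env\Library"]):
#     print(directory)
```

Any directory it reports is a candidate for the PATH entry described above.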

Python - Graphviz - Remove legend on nodes of DecisionTreeClassifier

I have a decision tree classifier from sklearn and I use pydotplus to show it.
However, for my presentation I don't like having so much information on each node (entropy, samples and value).
To make it easier to explain to people, I would like to keep only the decision and the class on each node.
Where can I modify the code to do this?
Thank you.
According to the documentation, it is not possible to suppress the additional information inside the boxes. The only thing you can omit directly is the impurity, via the impurity parameter.
However, I have done it another, somewhat roundabout way. First, I save the .dot file with impurity set to False. Then I load it and convert it to a string, use regular expressions to strip the redundant labels, and save the result.
The code goes like this:
import re
import pydotplus  # install it via pip install pydotplus
from sklearn.tree import DecisionTreeClassifier
from sklearn.tree import export_graphviz
from sklearn.datasets import load_iris
data = load_iris()
clf = DecisionTreeClassifier()
clf.fit(data.data, data.target)
export_graphviz(clf, out_file='tree.dot', impurity=False, class_names=True)
PATH = '/path/to/dotfile/tree.dot'  # location of the tree.dot written above
f = pydotplus.graph_from_dot_file(PATH).to_string()
# Labels in the .dot text separate lines with a literal backslash followed by n
f = re.sub(r'(\\nsamples = [0-9]+)(\\nvalue = \[[0-9]+, [0-9]+, [0-9]+\])', '', f)
f = re.sub(r'(samples = [0-9]+)(\\nvalue = \[[0-9]+, [0-9]+, [0-9]+\])\\n', '', f)
with open('tree_modified.dot', 'w') as file:
    file.write(f)
Here are the images before and after modification:
In your case, there seem to be more parameters in the boxes, so you may want to tweak the code a little bit.
I hope that helps!
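To see what the substitution does without installing anything, the same idea can be tried on a single label string. The sketch below is a variation, not the exact regex above: it accepts any number of class counts inside the value brackets. strip_node_stats is a hypothetical helper name, and the sample labels are hand-written to resemble export_graphviz output with impurity=False.

```python
import re


def strip_node_stats(dot_source):
    """Remove the 'samples = N' and 'value = [...]' parts from DOT node labels."""
    # Inside a .dot label, a line break is the two characters backslash and n.
    stats = re.compile(r'(?:\\n)?(?:samples = \d+|value = \[[^\]]*\])')
    cleaned = stats.sub('', dot_source)
    # Leaf labels start with a removed field, which leaves a leading break behind.
    return re.sub(r'label="\\n', 'label="', cleaned)


split_node = r'0 [label="petal width (cm) <= 0.8\nsamples = 150\nvalue = [50, 50, 50]\nclass = y[1]"] ;'
leaf_node = r'1 [label="samples = 50\nvalue = [50, 0, 0]\nclass = y[0]"] ;'

print(strip_node_stats(split_node))
print(strip_node_stats(leaf_node))
```

Regression trees write value as a plain number rather than a bracketed list, so the pattern would need adjusting for them.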

Python, PyDot and DecisionTree

I'm trying to visualize my DecisionTree, but I'm getting an error.
The code is:
X = [i[1:] for i in dataset]#attribute
y = [i[0] for i in dataset]
clf = tree.DecisionTreeClassifier()
dot_data = StringIO()
tree.export_graphviz(clf.fit(train_X, train_y), out_file=dot_data)
graph = pydot.graph_from_dot_data(dot_data.getvalue())
graph.write_pdf("tree.pdf")
And the error is
Traceback (most recent call last):
if data.startswith(codecs.BOM_UTF8):
TypeError: startswith first arg must be str or a tuple of str, not bytes
Can anyone explain what the problem is? Thanks a lot!
If you are using Python 3, just use pydotplus instead of pydot. It also installs easily with pip.
import pydotplus
<your code>
dot_data = StringIO()
tree.export_graphviz(clf, out_file=dot_data)
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
graph.write_pdf("iris.pdf")
I had the same exact problem and just spent a couple hours trying to figure this out. I can't guarantee what I share here will work for others but it may be worth a shot.
1) I tried installing the official pydot packages, but I have Python 3 and they simply did not work. After finding a note in a thread on one of the many websites I scoured, I ended up installing this forked repository of pydot.
2) I went to graphviz.org and installed their software on my Windows 7 machine. If you don't have Windows, look under their Download section for your system.
3) After a successful install, I opened Environment Variables (Control Panel\All Control Panel Items\System\Advanced system settings > click the Environment Variables button), found the variable path under System variables, clicked Edit... and added ;C:\Program Files (x86)\Graphviz2.38\bin to the end of the Variable value: field.
4) To confirm I could now use dot commands in the Command Line (Windows Command Processor), I typed dot -V, which returned dot - graphviz version 2.38.0 (20140413.2041).
5) In the code below, keep in mind that I'm reading a dataframe from my clipboard. You might be reading it from a file or something else.
In IPython Notebook:
import pandas as pd
import numpy as np
from sklearn import tree
import pydot
from IPython.display import Image
from sklearn.externals.six import StringIO
df = pd.read_clipboard()
X = df[df.columns[:-1]]
y = df[df.columns[-1]]
dtr = tree.DecisionTreeRegressor(max_depth=3)
dtr.fit(X, y)
dot_data = StringIO()
tree.export_graphviz(dtr, out_file=dot_data, feature_names=X.columns)
graph = pydot.graph_from_dot_data(dot_data.getvalue())
Image(graph.create_png())
Alternatively, if you're not using IPython, you can generate the image from the command line as long as you have graphviz installed (step 2 above). With the same example code as above, add this line after fitting the model:
tree.export_graphviz(dtr, out_file='treepic.dot', feature_names=X.columns)
Then open a command prompt in the directory containing treepic.dot and enter this command:
dot -T png treepic.dot -o treepic.png
A .png file should be created with your decision tree.
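The same command can be run from Python if you would rather not switch to a terminal. This is a sketch under the assumption that dot is on PATH; render_png is a hypothetical helper name, and it simply reports failure when the executable cannot be found instead of raising.

```python
import shutil
import subprocess


def render_png(dot_file, png_file):
    """Run `dot -T png dot_file -o png_file`; return False if dot is not on PATH."""
    if shutil.which("dot") is None:
        return False
    # check=True raises CalledProcessError if dot exits with an error.
    subprocess.run(["dot", "-T", "png", dot_file, "-o", png_file], check=True)
    return True


# Example call, using the file names from the answer above:
# render_png("treepic.dot", "treepic.png")
```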
The line in question checks whether the stream/file starts with the UTF-8 byte order mark (BOM).
Instead of:
if data.startswith(codecs.BOM_UTF8):
use:
if codecs.BOM_UTF8 in data:
You will likely have more success...
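For context, the TypeError itself can be reproduced without pydot. In Python 3, codecs.BOM_UTF8 is a bytes object, so testing a str against it fails; the check works when both sides have the same type. The sketch below assumes data is text (str), as it is when it comes from StringIO; the sample string is made up.

```python
import codecs

data = "digraph Tree { }"  # text (str), as produced by StringIO.getvalue()

error_message = None
try:
    data.startswith(codecs.BOM_UTF8)  # str compared against bytes
except TypeError as exc:
    error_message = str(exc)
print("TypeError:", error_message)

# Compare like with like: either encode the text or decode the BOM.
has_bom_as_bytes = data.encode("utf-8").startswith(codecs.BOM_UTF8)
has_bom_as_text = data.startswith(codecs.BOM_UTF8.decode("utf-8"))
print(has_bom_as_bytes, has_bom_as_text)
```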
