An empty block always appears on my decision tree created by Python

[Image: screenshot of the decision tree, hosted on Imgur]
As the picture shows, my decision tree always contains an empty block.
I have searched for a while but still can't find a solution.
My code, which runs in a Jupyter notebook, is listed below.
Any help would be appreciated.
from sklearn import tree
from sklearn import datasets
import pydotplus
wine = datasets.load_wine()
X = wine.data
Y = wine.target
from sklearn.model_selection import train_test_split
X_train,X_test,Y_train,Y_test = train_test_split(X,Y,test_size = 0.3)
clf = tree.DecisionTreeClassifier(criterion = 'gini').fit(X_train,Y_train)
clf.score(X_train,Y_train)
clf.predict(X_test)
feature_names = wine.feature_names
target_name = wine.target_names
import graphviz
dot_data = tree.export_graphviz(clf,
                                out_file = None,
                                feature_names = feature_na,
                                class_names = target_name,
                                filled = None,
                                rounded = True,)
dot_data = dot_data.replace('helvetica', 'Microsoft JhengHei')
graph = pydotplus.graph_from_dot_data(dot_data)
graph.write_pdf('wine.pdf')

I think it is quite likely that your package installation is broken. Did you install the packages with pip? Try installing them with conda instead (I recommend creating a dedicated conda environment).
I also think you have a typo at line 24: feature_names = feature_na should read feature_names = feature_names. After installing the packages via conda, fixing the typo, and running your code, I got the following tree.
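For reference, here is a minimal sketch of the export step with the variable name spelled out. It stops at the DOT source and leaves out the font replacement and the PDF rendering, so it needs scikit-learn but no Graphviz executables. The random_state values and filled=True are assumptions added for illustration (the question passed filled=None, which newer scikit-learn releases may reject because they validate parameter types).

```python
from sklearn import datasets, tree
from sklearn.model_selection import train_test_split

wine = datasets.load_wine()
X_train, X_test, Y_train, Y_test = train_test_split(
    wine.data, wine.target, test_size=0.3, random_state=0  # random_state is an added assumption
)
clf = tree.DecisionTreeClassifier(criterion="gini", random_state=0).fit(X_train, Y_train)

# Same export call as in the question, with the feature names variable spelled correctly.
dot_data = tree.export_graphviz(
    clf,
    out_file=None,                         # return the DOT source as a string
    feature_names=wine.feature_names,      # was `feature_na` in the question
    class_names=list(wine.target_names),
    filled=True,                           # assumption: a boolean instead of None
    rounded=True,
)
print(dot_data.splitlines()[0])  # first line of the generated DOT source
```

The resulting dot_data string can then be passed to pydotplus.graph_from_dot_data as in the question.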

Related

Why is my joblib file not being saved in the same directory as my Jupyter file?

So, I'm trying to create a persistent model for a machine learning project. I'm using joblib.dump to do so. Here is the code:
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from joblib import dump
music_data = pd.read_csv(r"C:\Users\obaro\OneDrive\Documents\music.csv")
X = music_data.drop(columns=['genre'])
y = music_data['genre']
model = DecisionTreeClassifier()
model.fit(X, y)
dump(model, "music-recommender.joblib")
In the Jupyter notebook, the output seems to be what I'm looking for:
['music-recommender.joblib'], just the file name. I can't find this file anywhere, though. What's going on? I'm running Windows 10, if that helps. Thanks!
You need to either save the file with an absolute path, or keep in mind that a relative path is resolved against your current working directory:
music_data = pd.read_csv(r"C:\Users\obaro\OneDrive\Documents\music.csv")
X = music_data.drop(columns=['genre'])
y = music_data['genre']
model = DecisionTreeClassifier()
model.fit(X, y)
dump(model, r"C:\Users\obaro\OneDrive\Documents\music-recommender.joblib")
To find out your current working directory, use
import os
os.getcwd()
and then build your relative path from there.
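As a small sketch of that check, the snippet below shows where a relative file name would end up. It only uses the standard library and does not call joblib; the file name is the one from the question, and the variable names are illustrative.

```python
import os
from pathlib import Path

filename = "music-recommender.joblib"  # relative name used in the question

# A relative path is resolved against the current working directory,
# which is not necessarily the folder that holds the notebook.
resolved = Path.cwd() / filename

print("Working directory:", os.getcwd())
print("dump() would write to:", resolved)
```

Passing resolved (or any other absolute path) to dump removes the guesswork about where the file goes.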

Huge problem to display decision tree in Jupyter Notebook in Python: ExecutableNotFound?

I have a problem creating and displaying a decision tree in a Jupyter Notebook using Python.
My code is below:
import graphviz
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_graphviz
X = data.drop(["Risk"], axis=1)
y = data["Risk"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
klasyfikator = DecisionTreeClassifier(criterion = "gini", random_state=0, max_depth=4, min_samples_leaf=1)
klasyfikator.fit(X = X, y = y)
# Store the DOT source under its own name so the DataFrame `data` is not overwritten
dot_data = export_graphviz(klasyfikator, out_file=None, feature_names=X.columns, class_names=["0", "1"],
                           filled=True, rounded=True,
                           special_characters=True)
graph = graphviz.Source(dot_data)
graph
The decision tree concerns credit risk: 0 means the client will not pay, 1 means the client will pay.
When I run the code above, I get this error:
ExecutableNotFound: failed to execute ['dot', '-Tsvg'], make sure the Graphviz executables are on your systems' PATH
I have already tried many solutions from StackOverflow, for example:
pip install graphviz
conda install graphviz
I downloaded Graphviz from http://www.graphviz.org/download/
I added this to the PATH environment variable:
C:\Program Files (x86)\Graphviz2.38\bin
The error described above still occurs. What should I do? Please help, because I'm losing hope of being able to draw this tree. Thank you!
Moreover, when I add the folder to PATH using this code:
import os
os.environ["PATH"] += os.pathsep + 'C:\Program Files (x86)\Graphviz2.38\bin'
PATH ends up containing something like this: C:\\Program Files (x86)\\Graphviz2.38\x08in. That is not the same as what I typed. What can I do?
With recent versions of scikit-learn (0.21 and later), you can plot the decision tree directly, without Graphviz.
Use:
from sklearn.tree import plot_tree
plot_tree(klasyfikator)
Read more here.
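On the follow-up about PATH in the question: the \x08 appears because "\b" inside an ordinary string literal is the backspace escape. A raw string keeps the backslash. The sketch below illustrates this with the folder from the question; whether Graphviz is actually installed there is an assumption.

```python
import os

plain = "Graphviz2.38\bin"   # "\b" is parsed as a backspace character
raw = r"Graphviz2.38\bin"    # raw string: the backslash stays literal

print(repr(plain))  # 'Graphviz2.38\x08in'
print(repr(raw))    # 'Graphviz2.38\\bin'

# Append the folder from the question using a raw string.
graphviz_bin = r"C:\Program Files (x86)\Graphviz2.38\bin"
os.environ["PATH"] = os.environ.get("PATH", "") + os.pathsep + graphviz_bin
```

Doubling the backslashes ('C:\\Program Files (x86)\\Graphviz2.38\\bin') has the same effect as the raw string.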

InvalidArgumentError: input_1_1:0 is both fed and fetched

I am using the visualize_activation function in keras-vis:
from vis.visualization import visualize_activation, visualize_cam
from vis.utils import utils
from keras import activations
import numpy as np
from matplotlib import pyplot as plt
%matplotlib inline
plt.rcParams['figure.figsize'] = (18, 6)
# Utility to search for layer index by name.
# Alternatively we can specify this as -1 since it corresponds to the last
# layer.
layer_idx = utils.find_layer_idx(my_model, 're_lu_3')
patches = np.expand_dims(patches,axis=3)
# This is the output node we want to maximize.
filter_idx = None
img = visualize_activation(my_model, layer_idx, filter_indices=filter_idx,
                           seed_input=patches[0])
plt.imshow(img[..., 0])
However, this throws the error: InvalidArgumentError: input_1_1:0 is both fed and fetched.
How can I resolve this? I tried creating a copy of my_model using tf.identity, but that didn't work.
The keras-vis release on pip seems broken; try installing directly from the GitHub master branch:
pip uninstall vis
pip install git+https://github.com/raghakot/keras-vis.git -U
With the version on pip, both the MNIST and ResNet examples output the error: InvalidArgumentError: input_1_1:0 is both fed and fetched. After updating, they both work well.

Attribute error after pickle

First I followed an iris tutorial and it worked great! The program ran fine and did everything it was supposed to do. Then I started working on a pickle tutorial to pickle data and open it again, and everything went wrong. Now I have a __pycache__ folder in my code folder that wasn't there before, and I am getting the following error:
AttributeError: module 'numpy' has no attribute 'dtype'
So far I have tried completely wiping scipy, numpy, sklearn, and pandas from my computer and reinstalling them. Then I tried disabling apport (I'm on an Ubuntu machine) because part of the long error output kept mentioning it.
Below is the program I ran that I think caused this.
# Save Model Using Pickle
import pandas
from sklearn import model_selection
from sklearn.linear_model import LogisticRegression
import pickle
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
dataframe = pandas.read_csv(url, names=names)
array = dataframe.values
X = array[:,0:8]
Y = array[:,8]
test_size = 0.33
seed = 7
X_train, X_test, Y_train, Y_test = model_selection.train_test_split(X, Y, test_size=test_size, random_state=seed)
# Fit the model on the training set (67% of the data; 33% is held out for testing)
model = LogisticRegression()
model.fit(X_train, Y_train)
# save the model to disk; the with-block closes the file once the dump is done
filename = 'finalized_model.sav'
with open(filename, 'wb') as f:
    pickle.dump(model, f)
Upon further investigation, I realized I had saved the code as pickle.py on my computer (in the same folder where the __pycache__ folder was appearing). I renamed it to pickle1.py and now everything works. Lesson learned: don't name your scripts after modules.
My guess is that your numpy installation somehow got stepped on. Maybe try "pip install --upgrade --force-reinstall numpy" on your command line?
Or maybe a line that uses "numpy.dtype" somewhere is wrong. You'd have to share at least that line of code for anyone to tell, though.
These are just wild guesses without seeing your entire setup.

Cannot find reference to Python package (plt.cm.py)

I have a small issue with running code from a tutorial that isn't working as it should. It's not a syntax problem for sure. I'm working with scikit-learn and matplotlib, and I'm getting a warning message in my IDE "Cannot find reference 'gray_r' in 'cm.py'..." All my packages are installed properly (via pip) and have worked for sample programs except this.
Any advice?
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn import svm
digits = datasets.load_digits()
print(digits.data)
print(digits.target)
print(digits.images[0])
clf = svm.SVC(gamma=0.001, C=100)
print(len(digits.data))
x, y = digits.data[:-1], digits.target[:-1]
clf.fit(x,y)
print('Prediction:', clf.predict(digits.data[-1])
plt.imshow(digits.images[-1], cmap=plt.cm.gray_r, interpolation='nearest')
plt.show()
Well, for starters, you're missing a closing parenthesis on your last print statement: print('Prediction:', clf.predict(digits.data[-1])). Other than that, this code runs on my computer with only a deprecation warning. What does the traceback say?
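A note on that deprecation warning: it came from passing a single 1D sample to predict, and recent scikit-learn releases raise an error for that instead of warning. A minimal sketch of the prediction step with a 2D input follows; it leaves out the matplotlib part of the question, and the name last_sample is illustrative.

```python
from sklearn import datasets, svm

digits = datasets.load_digits()
clf = svm.SVC(gamma=0.001, C=100)

# Train on everything except the last sample, as in the question.
x, y = digits.data[:-1], digits.target[:-1]
clf.fit(x, y)

# digits.data[-1] has shape (64,); slicing with [-1:] keeps a 2D shape (1, 64),
# which is the layout predict() expects (one row per sample).
last_sample = digits.data[-1:]
prediction = clf.predict(last_sample)
print("Prediction:", prediction)
```

digits.data[-1].reshape(1, -1) is an equivalent way to get the same 2D shape.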
