I am trying to convert a .dot file into a PNG or JPEG file so that I can view a tree from the random forest. I am following this tutorial: https://towardsdatascience.com/how-to-visualize-a-decision-tree-from-a-random-forest-in-python-using-scikit-learn-38ad2d75f21c.
I am getting the error FileNotFoundError: [WinError 2] The system cannot find the file specified.
I can see that tree.dot is there and I am able to open it. I am trying to figure out why it is not being read. Thanks.
from sklearn.datasets import load_iris
iris = load_iris()
# Model (can also use single decision tree)
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier(n_estimators=10)
# Train
model.fit(iris.data, iris.target)
# Extract single tree
estimator = model.estimators_[5]
from sklearn.tree import export_graphviz
# Export as dot file
export_graphviz(estimator, out_file='tree.dot',
feature_names = iris.feature_names,
class_names = iris.target_names,
rounded = True, proportion = False,
precision = 2, filled = True)
# <<error occurs here>>
# Convert to png using system command (requires Graphviz)
from subprocess import call
call(['dot', '-Tpng', 'tree.dot', '-o', 'tree.png', '-Gdpi=600'])
# Display in jupyter notebook
from IPython.display import Image
Image(filename = 'tree.png')
I ran it through Docker with an Ubuntu image and added RUN apt-get install graphviz -y to the Dockerfile. It started to work. Then I used dot -Tpng tree.dot -o tree.png.
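If you want to stay on Windows instead of using Docker, here is a minimal sketch of a pre-flight check before the subprocess call; the Graphviz install directory below is an assumption, adjust it to your machine:
import os
import shutil
from subprocess import call
# WinError 2 usually means the 'dot' executable is not on PATH; verify it and,
# if needed, prepend the (assumed) Graphviz bin directory before calling it.
if shutil.which('dot') is None:
    os.environ['PATH'] += os.pathsep + r'C:\Program Files\Graphviz\bin'  # assumed install location
call(['dot', '-Tpng', 'tree.dot', '-o', 'tree.png', '-Gdpi=600'])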
Related
I am trying to read a CT scan DICOM file using the pydicom Python library, but I just can't get rid of the error below, even after installing gdcm and pylibjpeg:
RuntimeError: The following handlers are available to decode the pixel data however they are missing required dependencies: GDCM (req. ), pylibjpeg (req. )
Here is my code
!pip install pylibjpeg pylibjpeg-libjpeg pylibjpeg-openjpeg
!pip install python-gdcm
import gdcm
import pylibjpeg
import numpy as np
import pydicom
from pydicom.pixel_data_handlers.util import apply_voi_lut
import matplotlib.pyplot as plt
%matplotlib inline
def read_xray(path, voi_lut = True, fix_monochrome = True):
    dicom = pydicom.read_file(path)
    # VOI LUT (if available by DICOM device) is used to transform raw DICOM data to "human-friendly" view
    if voi_lut:
        data = apply_voi_lut(dicom.pixel_array, dicom)
    else:
        data = dicom.pixel_array
    # depending on this value, X-ray may look inverted - fix that:
    if fix_monochrome and dicom.PhotometricInterpretation == "MONOCHROME1":
        data = np.amax(data) - data
    data = data - np.min(data)
    data = data / np.max(data)
    data = (data * 255).astype(np.uint8)
    return data
img = read_xray('/content/ActiveTB/2018/09/17/1.2.392.200036.9116.2.6.1.44063.1797735841.1537157438.869027/1.2.392.200036.9116.2.6.1.44063.1797735841.1537157440.863887/1.2.392.200036.9116.2.6.1.44063.1797735841.1537154539.142332.dcm')
plt.figure(figsize = (12,12))
plt.imshow(img)
Here is a link to the image I am trying to run this code on:
https://drive.google.com/file/d/1-xuryA5VlglaumU2HHV7-p6Yhgd6AaCC/view?usp=sharing
Try running the following:
!pip install pylibjpeg
!pip install python-gdcm
In a similar problem to yours, the issue turned out to be at the Pixel Data level. You need to install one or more of the optional libraries so that pydicom can handle the various compressions.
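As a hedged diagnostic sketch (the file path is a placeholder), you can print the file's transfer syntax and check which of pydicom's pixel data handlers are actually usable in your environment:
import pydicom
# Show which compression the file uses and which decoders pydicom can call
ds = pydicom.dcmread('/path/to/file.dcm')  # placeholder path
print(ds.file_meta.TransferSyntaxUID, '-', ds.file_meta.TransferSyntaxUID.name)
for handler in pydicom.config.pixel_data_handlers:
    print(handler.__name__, 'available:', handler.is_available())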
First, you should do:
$ pip uninstall pycocotools
$ pip install pycocotools --no-binary :all: --no-build-isolation
From here, you should do as follows:
$ pip install pylibjpeg pylibjpeg-libjpeg pydicom
Then, your code should look like this:
from pydicom import dcmread
import pylibjpeg
import gdcm
import numpy as np
import pydicom
from pydicom.pixel_data_handlers.util import apply_voi_lut
import matplotlib.pyplot as plt
%matplotlib inline
def read_xray(path, voi_lut = True, fix_monochrome = True):
    dicom = dcmread(path)
    # VOI LUT (if available by DICOM device) is used to transform raw DICOM data to "human-friendly" view
    if voi_lut:
        data = apply_voi_lut(dicom.pixel_array, dicom)
    else:
        data = dicom.pixel_array
    # depending on this value, X-ray may look inverted - fix that:
    if fix_monochrome and dicom.PhotometricInterpretation == "MONOCHROME1":
        data = np.amax(data) - data
    data = data - np.min(data)
    data = data / np.max(data)
    data = (data * 255).astype(np.uint8)
    return data
img = read_xray('/content/ActiveTB/2018/09/17/1.2.392.200036.9116.2.6.1.44063.1797735841.1537157438.869027/1.2.392.200036.9116.2.6.1.44063.1797735841.1537157440.863887/1.2.392.200036.9116.2.6.1.44063.1797735841.1537154539.142332.dcm')
plt.figure(figsize = (12,12))
plt.imshow(img)
I am trying to learn machine learning with Python from W3Schools. I am trying to generate mydecisiontree.png using PyDotPlus.
I am using pip with PyCharm Professional 2020.3.
The code is as follows:
import numpy as np
import matplotlib.pyplot as plt
import pandas
from sklearn import tree
import pydotplus
from sklearn.tree import DecisionTreeClassifier
import matplotlib.image as pltimg
df = pandas.read_csv("shows.csv")
d = {'UK' : 0,'USA' : 1, 'N' : 2}
df['Nationality'] = df['Nationality'].map(d)
d = {'YES' : 1, "NO" : 0}
df['Go'] = df['Go'].map(d)
features = ['Age', 'Experience', 'Rank', 'Nationality']
X = df[features]
y = df['Go']
dtree = DecisionTreeClassifier()
dtree = dtree.fit(X, y)
data = tree.export_graphviz(dtree, out_file=None, feature_names=features)
graph = pydotplus.graph_from_dot_data(data)
graph.write_png('mydecisiontree.png')
img=pltimg.imread('mydecisiontree.png')
img_plot = plt.imshow(img)
plt.show()
Although PyCharm shows no errors, when I run the code it can't create the PNG file and raises an error on the line:
graph.write_png('mydecisiontree.png')
It shows the following error:
File "direc.....\venv\lib\site-packages\pydotplus\graphviz.py", line 1960, in create
'GraphViz\'s executables not found')
pydotplus.graphviz.InvocationException: GraphViz's executables not found
I can't see the problem. How can I solve this?
PyCharm doesn't show an error because your code doesn't contain any. The problem is in your environment. Have you installed GraphViz in the environment that you are using to run it? Note that pip install graphviz only installs the Python bindings; the GraphViz executables themselves also need to be installed and discoverable on your PATH (a small sketch follows the links below).
Also see the answers here:
GraphViz not working when imported inside PydotPlus (GraphViz's executables not found)
Graphviz's executables are not found (Python 3.4)
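If GraphViz is installed but its bin folder is not on PATH, here is a minimal sketch of a workaround; the install path below is an assumption for a typical Windows setup, adjust it to yours:
import os
import pydotplus
# Assumed Graphviz install location; point this at your actual bin directory
os.environ['PATH'] += os.pathsep + r'C:\Program Files\Graphviz\bin'
graph = pydotplus.graph_from_dot_data(data)
graph.write_png('mydecisiontree.png')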
I had that problem too. I installed GraphViz from here and that solved it.
I want to create an endpoint for a scikit-learn logistic regression model in AWS SageMaker. I have a train.py file which contains the training code for the SageMaker scikit-learn estimator.
import subprocess as sb
import pandas as pd
import numpy as np
import pickle,json
import sys
def install(package):
    sb.call([sys.executable, "-m", "pip", "install", package])
install('s3fs')
import argparse
import os
if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    # hyperparameters sent by the client are passed as command-line arguments to the script.
    parser.add_argument('--solver', type=str, default='liblinear')
    # Data, model, and output directories
    parser.add_argument('--output_data_dir', type=str, default=os.environ.get('SM_OUTPUT_DIR'))
    parser.add_argument('--model_dir', type=str, default=os.environ.get('SM_MODEL_DIR'))
    parser.add_argument('--train', type=str, default=os.environ.get('SM_CHANNEL_TRAIN'))
    args, _ = parser.parse_known_args()
    # ... load from args.train and args.test, train a model, write model to args.model_dir.
    input_files = [ os.path.join(args.train, file) for file in os.listdir(args.train) ]
    if len(input_files) == 0:
        raise ValueError(('There are no files in {}.\n' +
                          'This usually indicates that the channel ({}) was incorrectly specified,\n' +
                          'the data specification in S3 was incorrectly specified or the role specified\n' +
                          'does not have permission to access the data.').format(args.train, "train"))
    raw_data = [ pd.read_csv(file, header=None, engine="python") for file in input_files ]
    df = pd.concat(raw_data)
    y = df.iloc[:,0]
    X = df.iloc[:,1:]
    solver = args.solver
    from sklearn.linear_model import LogisticRegression
    lr = LogisticRegression(solver=solver).fit(X, y)

from sklearn.externals import joblib

def model_fn(model_dir):
    lr = joblib.dump(lr, "model.joblib")
    return lr
In my SageMaker notebook I ran the following code:
import os
import boto3
import re
import copy
import time
from time import gmtime, strftime
from sagemaker import get_execution_role
import sagemaker
role = get_execution_role()
region = boto3.Session().region_name
bucket=<bucket> # Replace with your s3 bucket name
prefix = <prefix>
output_path = 's3://{}/{}/{}'.format(bucket, prefix,'output_data_dir')
train_data = 's3://{}/{}/{}'.format(bucket, prefix, 'train')
train_channel = sagemaker.session.s3_input(train_data, content_type='text/csv')
from sagemaker.sklearn.estimator import SKLearn
sklearn = SKLearn(
entry_point='train.py',
train_instance_type="ml.m4.xlarge",
role=role,output_path = output_path,
sagemaker_session=sagemaker.Session(),
hyperparameters={'solver':'liblinear'})
I'm fitting my model here
sklearn.fit({'train': train_channel})
Now, to create the endpoint:
from sagemaker.predictor import csv_serializer
predictor = sklearn.deploy(1, 'ml.m4.xlarge')
While trying to create the endpoint, it throws:
ClientError: An error occurred (ValidationException) when calling the CreateModel operation: Could not find model data at s3://<bucket>/<prefix>/output_data_dir/sagemaker-scikit-learn-x-y-z-000/output/model.tar.gz.
I checked my S3 bucket. Inside my output_data_dir there is a sagemaker-scikit-learn-x-y-z-000 directory which contains a debug-output\training_job_end.ts file. An additional directory named sagemaker-scikit-learn-x-y-z-000 got created outside my <prefix> folder, and it contains a source\sourcedir.tar.gz file. Generally, whenever I trained my models with SageMaker built-in algorithms, files like output_data_dir\sagemaker-scikit-learn-x-y-z-000\output\model.tar.gz got created. Can someone please tell me where my scikit-learn model got stored, how to push source\sourcedir.tar.gz inside my prefix from code without having to do it manually, and how to see the contents of sourcedir.tar.gz?
Edit: I elaborated the question regarding the prefix. Whenever I run sklearn.fit(), two objects with the same name sagemaker-scikit-learn-x-y-z-000 are created in my S3 bucket. One is created inside <bucket>/<prefix>/output_data_dir/sagemaker-scikit-learn-x-y-z-000/debug-output/training_job_end.ts, and the other is created at <bucket>/sagemaker-scikit-learn-x-y-z-000/source/sourcedir.tar.gz. Why is the second one not created inside my <prefix> like the first one? What is contained in the sourcedir.tar.gz file?
I am not sure your model is really being stored if you can't find it in S3. While you define a function with the joblib.dump call in your entry-point script, I have the call at the end of the main block. For example:
# persist model
path = os.path.join(args.model_dir, "model.joblib")
joblib.dump(myestimator, path)
print('model persisted at ' + path)
Then the file can be found in ..\output\model.tar.gz, just as in your other cases. To double-check that it is created, you may want to add a print statement whose output you can find in the training job's log.
You must dump the model as the last step of your training code. Currently you are doing it in the wrong place, as model_fn's purpose is to load the model for inference, not to save it during training.
Add the dump after training:
lr = LogisticRegression(solver=solver).fit(X, y)
joblib.dump(lr, os.path.join(args.model_dir, "model.joblib"))
Change model_fn() to load the model instead of dumping it.
See more here.
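A minimal sketch of what the loading model_fn could look like, assuming the model was dumped as model.joblib into args.model_dir during training as shown above:
import os
from sklearn.externals import joblib  # on newer scikit-learn versions, use `import joblib` instead

def model_fn(model_dir):
    # Load the model that the training step dumped into model_dir
    return joblib.load(os.path.join(model_dir, "model.joblib"))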
This post here explains it well:
https://towardsdatascience.com/deploying-a-pre-trained-sklearn-model-on-amazon-sagemaker-826a2b5ac0b6
In short, the tar.gz gets created by tar-gz-ing the model.joblib binary, which was first created with joblib.dump. To quote the article:
#Build tar file with model data + inference code
bashCommand = "tar -cvpzf model.tar.gz model.joblib inference.py"
process = subprocess.Popen(bashCommand.split(), stdout=subprocess.PIPE)
output, error = process.communicate()
The inference.py is probably optional.
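For completeness, here is a rough sketch of deploying such a pre-built model.tar.gz, in the spirit of the linked article; the S3 path and framework_version are placeholders you would need to adjust:
from sagemaker import get_execution_role
from sagemaker.sklearn.model import SKLearnModel

# Point SKLearnModel at the tar.gz built above (path is a placeholder)
sklearn_model = SKLearnModel(
    model_data='s3://<bucket>/<prefix>/model.tar.gz',
    role=get_execution_role(),
    entry_point='inference.py',        # the inference script packed into the tar.gz
    framework_version='0.23-1')        # assumption: match the scikit-learn version used for training
predictor = sklearn_model.deploy(initial_instance_count=1, instance_type='ml.m4.xlarge')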
I'm using scikit-learn to introduce myself to machine learning, and I'm following this tutorial (link to YouTube).
But when I try to export the decision tree as a PDF I get an error.
I try to run: open -w review iris.pdf
and the result is:
Impossibile ottenere un descrittore di file che si riferisce alla console (Unable to get a file descriptor referring to the console)
If I run it from the terminal I get the error:
Traceback (most recent call last)
  File "fstraining.py", line 2, in <module>
    import graphviz
ImportError: No module named graphviz
Thanks for your attention.
Once you have built your decision tree clf, simply:
from sklearn.externals.six import StringIO
from sklearn.tree import export_graphviz
import pydotplus
# Export resulting tree to DOT source code string
dot_data = export_graphviz(clf,
feature_names=col_names,
out_file=None,
filled=True,
rounded=True)
#Export to pdf
pydot_graph = pydotplus.graph_from_dot_data(dot_data)
pydot_graph.write_pdf('tree.pdf')
This answer is adapted from here:
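Alternatively, since the tutorial imports graphviz directly, here is a minimal sketch using the graphviz package itself (pip install graphviz; it still requires the Graphviz binaries to be installed):
import graphviz
from sklearn.tree import export_graphviz

# Render the DOT source straight to a PDF next to the script
dot_data = export_graphviz(clf, feature_names=col_names, out_file=None,
                           filled=True, rounded=True)
graphviz.Source(dot_data).render('iris', format='pdf', cleanup=True)  # writes iris.pdf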
I am receiving this error when I try to run the code (from CMD):
ModuleNotFoundError: No module named 'numbers.hog'; numbers is not a package
Here is the hog.py file code...
from skimage import feature
class HOG:
    def __init__(self, orientations = 9, pixelsPerCell = (8, 8),
                 cellsPerBlock = (3, 3), normalize = False):
        self.orienations = orientations
        self.pixelsPerCell = pixelsPerCell
        self.cellsPerBlock = cellsPerBlock
        self.normalize = normalize

    def describe(self, image):
        hist = feature.hog(image,
                           orientations = self.orienations,
                           pixels_per_cell = self.pixelsPerCell,
                           cells_per_block = self.cellsPerBlock,
                           normalize = self.normalize)
        return hist
...and the main script (train.py) which raises the error.
from sklearn.svm import LinearSVC
from numbers.hog import HOG
from numbers import dataset
import argparse
import pickle as cPickle
ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", required = True,
help = "path to the dataset file")
ap.add_argument("-m", "--model", required = True,
help = "path to where the model will be stored")
args = vars(ap.parse_args())
(digits, target) = dataset.load_digits(args["dataset"])
data = []
hog = HOG(orientations = 18, pixelsPerCell = (10, 10),
cellsPerBlock = (1, 1), normalize = True)
for image in digits:
    image = dataset.deskew(image, 20)
    image = dataset.center_extent(image, (20, 20))
    hist = hog.describe(image)
    data.append(hist)
model = LinearSVC(random_state = 42)
model.fit(data, target)
f = open(args["model"], "w")
f.write(cPickle.dumps(model))
f.close()
I don't understand why it gives me an error about the package. numbers is a package, so why doesn't it get imported (as it seems it should)?
UPDATE: I tried to put from .hog import HOG and then executed from CMD. It prints:
No module named '__main__.hog'; '__main__' is not a package
Is that crazy? hog.py is in the main package together with the other files. As you can see, it also contains the HOG class... I can't understand it. Can someone reproduce the error?
In the IDE console it prints:
usage: train.py [-h] -d DATASET -m MODEL
train.py: error: the following arguments are required: -d/--dataset, -m/--model
This is expected when it is executed in the IDE, because the program must be run from CMD.
UPDATE 2: for anyone interested, this is the project: https://github.com/VAUTPL/Number_Detection
Change from numbers.hog import HOG to from hog import HOG, and change from numbers import dataset to import dataset.
You are already inside the "numbers" package, so you don't have to specify it again when you import.
When you type from numbers import dataset, Python will look for a package numbers (inside the current package) that contains a dataset.py file.
If your train.py were outside the numbers package, then you would have to prefix the imports with the package name (numbers).
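As a small sketch, assuming train.py, hog.py and dataset.py all sit in the same numbers directory and you run the script from there, the top of train.py becomes:
from sklearn.svm import LinearSVC
from hog import HOG   # instead of `from numbers.hog import HOG`
import dataset        # instead of `from numbers import dataset`
import argparse
import pickle as cPickle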
Important
numbers is a Python standard library module (https://docs.python.org/2/library/numbers.html).
Check that you are not actually importing that module, or rename your package to a more specific name.
Also:
It might look like Python doesn't recognize your package.
Open a Python shell and write:
import sys
print(sys.path)
Check if your numbers path is there.
If it's not there, you have to add it:
sys.path.insert(0, "/path/to/your/package_or_module")
Your train.py file is already in the package "numbers", so you don't have to import numbers.
Try this instead:
from hog import HOG
I saw in a comment that it gives you an "error (red line)".
Can you be more precise? I don't see any errors there.