Python, PyDot and DecisionTree


I'm trying to visualize my DecisionTree, but I get an error.
The code is:
X = [i[1:] for i in dataset]#attribute
y = [i[0] for i in dataset]
clf = tree.DecisionTreeClassifier()
dot_data = StringIO()
tree.export_graphviz(clf.fit(train_X, train_y), out_file=dot_data)
graph = pydot.graph_from_dot_data(dot_data.getvalue())
graph.write_pdf("tree.pdf")
And the error is
Traceback (most recent call last):
if data.startswith(codecs.BOM_UTF8):
TypeError: startswith first arg must be str or a tuple of str, not bytes
Can anyone explain what the problem is? Thanks a lot!

If you are using Python 3, just use pydotplus instead of pydot. It also installs easily with pip.
import pydotplus
<your code>
dot_data = StringIO()
tree.export_graphviz(clf, out_file=dot_data)
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
graph.write_pdf("iris.pdf")

I had the same exact problem and just spent a couple hours trying to figure this out. I can't guarantee what I share here will work for others but it may be worth a shot.
1. I tried installing the official pydot packages, but I have Python 3 and they simply did not work. After finding a note in a thread on one of the many websites I scoured, I ended up installing this forked repository of pydot.
2. I went to graphviz.org and installed their software on my Windows 7 machine. If you don't have Windows, look under their Download section for your system.
3. After a successful install, I added Graphviz to the path: Control Panel\All Control Panel Items\System\Advanced system settings > Environment Variables button > under System variables, select the variable path > Edit... > append ;C:\Program Files (x86)\Graphviz2.38\bin to the end of the Variable value: field.
4. To confirm that I could now use dot commands in the Command Line (Windows Command Processor), I typed dot -V, which returned dot - graphviz version 2.38.0 (20140413.2041).
5. In the code below, keep in mind that I'm reading a dataframe from my clipboard. You might be reading it from a file or somewhere else.
In IPython Notebook:
import pandas as pd
import numpy as np
from sklearn import tree
import pydot
from IPython.display import Image
from io import StringIO  # sklearn.externals.six is no longer shipped with recent scikit-learn
df = pd.read_clipboard()
X = df[df.columns[:-1]]
y = df[df.columns[-1]]
dtr = tree.DecisionTreeRegressor(max_depth=3)
dtr.fit(X, y)
dot_data = StringIO()
tree.export_graphviz(dtr, out_file=dot_data, feature_names=X.columns)
graph = pydot.graph_from_dot_data(dot_data.getvalue())
Image(graph.create_png())
Alternatively, if you're not using IPython, you can generate the image from the command line as long as you have Graphviz installed (see the installation step above). Using the same example code, add this line after fitting the model:
tree.export_graphviz(dtr, out_file='treepic.dot', feature_names=X.columns)
Then open a command prompt in the folder containing treepic.dot and run:
dot -T png treepic.dot -o treepic.png
A .png file should be created with your decision tree.

The line in question checks whether the stream/file begins with the UTF-8 byte-order mark (BOM), i.e. whether it is encoded as UTF-8.
Instead of:
if data.startswith(codecs.BOM_UTF8):
use:
if codecs.BOM_UTF8 in data:
You will likely have more success...
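To see why the original check fails on Python 3, here is a minimal sketch that reproduces the type mismatch outside pydot. It assumes the DOT source arrives as a str (which is what StringIO.getvalue() returns); the helper name starts_with_bom is made up for illustration and is not part of pydot.

```python
import codecs

def starts_with_bom(data):
    """Return True if data begins with the UTF-8 byte-order mark.

    Hypothetical helper, not pydot's API. Accepts str or bytes.
    """
    if isinstance(data, str):
        # Compare text against the BOM decoded to text ('\ufeff').
        return data.startswith(codecs.BOM_UTF8.decode("utf-8"))
    return data.startswith(codecs.BOM_UTF8)

dot_source = "digraph Tree { a -> b; }"

try:
    dot_source.startswith(codecs.BOM_UTF8)  # str tested against a bytes prefix
    mismatch_raises = False
except TypeError:
    mismatch_raises = True

print(mismatch_raises)
print(starts_with_bom(dot_source))
```

Note that on Python 3 a bytes pattern cannot be matched against a str with the in operator either, so the data and the BOM need to be the same type for any variant of this check.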

Related

Is there any problem with the OpenSlide.read_region function?

I am using the Python API of the openslide package to read some ndpi files. When I use the read_region function, it sometimes returns an odd image. What could be causing this?
I have tried reading the full image, and that works well, so I don't think there is a problem with the original file.
from openslide import OpenSlide
import cv2
import numpy as np
slide = OpenSlide('/Users/xiaoying/django/ndpi-rest-api/slide/read/21814102D-PAS - 2018-05-28 17.18.24.ndpi')
image = slide.read_region((1, 0),6, (780, 960))
image.save('image1.png')
The output image looks strange.

As the read_region documentation says, the x and y parameters are always in the coordinate space of level 0. For the behavior you want, you'll need to multiply those parameters by the downsample of the level you're reading.
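The coordinate conversion described above can be sketched as follows. The helper level_to_level0 is a hypothetical name introduced here for illustration; the only OpenSlide feature it relies on is the level_downsamples attribute, and the OpenSlide calls are shown as comments because they need the package and a slide file.

```python
def level_to_level0(x, y, downsample):
    """Convert a pixel position at some pyramid level to level-0 coordinates.

    Hypothetical helper: multiplies by the downsample factor of that level.
    """
    return (int(round(x * downsample)), int(round(y * downsample)))

# Usage with OpenSlide (not executed here):
#   slide = OpenSlide(path)
#   level = 6
#   location = level_to_level0(1, 0, slide.level_downsamples[level])
#   image = slide.read_region(location, level, (780, 960))

print(level_to_level0(100, 50, 64.0))
```

The size argument of read_region stays in the pixel space of the requested level; only the location is expressed in level-0 coordinates.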

This appears to be a version-related bug, see also
https://github.com/openslide/openslide/issues/291#issuecomment-722935212
The problem seems to relate to libpixman versions 0.38.x. There is a Workaround section written by GunnarFarneback suggesting that you load a different version first, e.g.
export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libpixman-1.so.0.34.0
Update: an easier solution is the following.
We are using Python 3.6.8+ and this did the trick for us: conda install pixman=0.36.0

Using graphviz to plot decision tree in python

I am following the answer presented to a previous post: Is it possible to print the decision tree in scikit-learn?
from sklearn.datasets import load_iris
from sklearn import tree
from sklearn.externals.six import StringIO
import pydot
clf = tree.DecisionTreeClassifier()
iris = load_iris()
clf = clf.fit(iris.data, iris.target)
tree.export_graphviz(clf, out_file='tree.dot')
dot_data = StringIO()
tree.export_graphviz(clf, out_file=dot_data)
graph = pydot.graph_from_dot_data(dot_data.getvalue())
graph.write_pdf("iris.pdf")
Unfortunately, I cannot figure out the following error:
'list' object has no attribute 'write_pdf'
Does anyone know a way around this as the structure of the generated tree.dot file is a list?
Update
I have attempted using the web application http://webgraphviz.com/. This works, however, the decision tree conditions, together with the classes are not displayed. Is there any way to include these in the tree.dot file?

It looks like the value you collect in graph is a list.
graph = pydot.graph_from_dot_data(dot_data.getvalue())
type(graph)
<type 'list'>
We are only interested in the first element of the list.
You can handle this in one of two ways:
1) Unpack the single element when you assign to graph:
(graph, ) = pydot.graph_from_dot_data(dot_data.getvalue())
2) Or keep the entire list in graph and write only its first element to the PDF:
graph[0].write_pdf("iris.pdf")
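If the same script has to run against pydot versions that return a single graph and versions that return a list, a small wrapper can hide the difference. This is a sketch under that assumption; first_graph is a hypothetical helper, not something pydot provides.

```python
def first_graph(result):
    """Return a single graph from whatever graph_from_dot_data returned.

    Hypothetical helper: accepts either one graph object or a list/tuple of graphs.
    """
    if isinstance(result, (list, tuple)):
        if not result:
            raise ValueError("no graphs were parsed from the DOT data")
        return result[0]
    return result

# Intended use (not executed here; needs pydot and Graphviz):
#   graph = first_graph(pydot.graph_from_dot_data(dot_data.getvalue()))
#   graph.write_pdf("iris.pdf")
```

Unlike the tuple unpacking in option 1, this does not fail when the list holds more than one graph; it simply takes the first.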
Here is what I get as output of iris.pdf
Update
To get around the path error
Exception: "dot.exe" not found in path.
install Graphviz from here, then use either of the following.
Add the Graphviz bin folder to PATH in your code:
import os
os.environ["PATH"] += os.pathsep + 'C:/Program Files (x86)/Graphviz2.38/bin/'
Or simply add the following to your Windows path in Control Panel:
C:\Program Files (x86)\Graphviz2.38\bin
As per the Graphviz documentation, it is not added to the Windows path during installation.
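Before calling write_pdf, it can help to confirm that the dot executable is actually reachable. The following is a minimal sketch using only the standard library; graphviz_available is a hypothetical helper name chosen for this example.

```python
import shutil

def graphviz_available(executable="dot"):
    """Return True if the given executable can be found on PATH."""
    return shutil.which(executable) is not None

if not graphviz_available():
    print("Graphviz 'dot' was not found on PATH; add its bin folder before rendering.")
```

Run the check after any os.environ["PATH"] change so that it reflects the path the rendering call will see.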

Reading an image in python - experimenting with images

I'm experimenting a little bit working with images in Python for a project I'm working on.
This is the first time ever for me programming in Python and I haven't found a tutorial that deals with the issues I'm facing.
I'm experimenting with different image decompositions, and I want to define some variable A as a set image from a specified folder. Basically I'm looking for Python's analog of Matlab's imread.
After googling for a bit, I found many solutions but none seem to work for me for some reason.
For example even this simple code
import numpy as np
import cv2
# Load an color image in grayscale
img = cv2.imread('messi5.jpg',0)
which is supposed to work (taken from http://opencv-python-tutroals.readthedocs.org/en/latest/py_tutorials/py_gui/py_image_display/py_image_display.html) yields the error "No module named cv2".
Why does this happen? How can I read an image?
Another thing I tried is
import numpy as np
import skimage.io as io
A=io.imread('C:\Users\Oria\Desktop\test.jpg')
io.imshow(A)
which yields the error "SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape"
All I want to do is read an image from a specified folder, which shouldn't be hard. It should also be noted that the database I work with consists of ppm files, so I want to read and show ppm images.
Edit: My environment is Pyzo, if that matters.
Edit2: Changing the backslashes into forward slashes changes the error to
Traceback (most recent call last):
File "<tmp 1>", line 3, in <module>
A=io.imread('C:/Users/Oria/Desktop/test.jpg')
File "F:\pyzo2015a\lib\site-packages\skimage\io\_io.py", line 97, in imread
img = call_plugin('imread', fname, plugin=plugin, **plugin_args)
File "F:\pyzo2015a\lib\site-packages\skimage\io\manage_plugins.py", line 209, in call_plugin
return func(*args, **kwargs)
File "F:\pyzo2015a\lib\site-packages\matplotlib\pyplot.py", line 2215, in imread
return _imread(*args, **kwargs)
File "F:\pyzo2015a\lib\site-packages\matplotlib\image.py", line 1258, in imread
'more images' % list(six.iterkeys(handlers.keys)))
File "F:\pyzo2015a\lib\site-packages\six.py", line 552, in iterkeys
return iter(d.keys(**kw))
AttributeError: 'builtin_function_or_method' object has no attribute 'keys'

The closest analogue to Matlab's imread is scipy.misc.imread, part of the scipy package (it was removed in SciPy 1.2, where imageio.imread is the suggested replacement). I would write this code as:
import scipy.misc
image_array = scipy.misc.imread('filename.jpg')
Now to your broader questions. The reason this seems hard is because you're coming from Matlab, which uses a different philosophy. Matlab is a monolithic install that comes out of the box with a huge number of functions. Python is modular. The built-in library is relatively small, and then you install packages depending on what you want to do. For instance, the packages scipy (scientific computing), cv2 (computer vision), and PIL (image processing) can all read simple images from disk, so you choose between them depending on what else from the package you might want to use.
This provides a lot more flexibility, but it does require you to become comfortable installing packages. Sadly this is much more difficult on Windows than on Linux-like systems, due to the lack of a "package manager". On Linux I can sudo apt-get install scipy and install all of scipy in one line. In Windows, you might be better off installing something like conda that smooths the package installation process.
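As for the 'unicodeescape' error in the question, it comes from the \U in 'C:\Users\...' being read as the start of a Unicode escape inside a normal string literal. A short sketch of the usual ways to write the same Windows path safely, using the path from the question purely as an example:

```python
from pathlib import PureWindowsPath

raw_path = r'C:\Users\Oria\Desktop\test.jpg'          # raw string: backslashes kept literally
escaped_path = 'C:\\Users\\Oria\\Desktop\\test.jpg'   # each backslash escaped by hand
forward_path = 'C:/Users/Oria/Desktop/test.jpg'       # forward slashes

print(raw_path == escaped_path)
print(PureWindowsPath(forward_path) == PureWindowsPath(raw_path))
```

Any of the three spellings can be passed to an image-reading function; which reader to use is a separate choice, as discussed above.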

Python 2.7 Anaconda Spyder IDE Scitools Movie encoder='html' Not Working

Right now, I'm trying to create a movie in the Anaconda Spyder IDE (running Python 2.7). I have the following import statement at the top of my program:
from scitools.std import cos, exp, linspace, plot, movie
import time, glob, sys, os
After creating plots to make a movie from, I use the call:
movie('tmp_*.png', encoder='html', output_file='tmp_heatwave.html')
to try and create a movie in .html format. After running my program, I get the error:
ValueError: encoder must be ['mencoder', 'ffmpeg', 'mpeg_encode', 'ppmtompeg',
'mpeg2enc', 'convert'], not 'html'
Why is this happening? According to my textbook, A Primer on Scientific Programming with Python, "The HTML format can always be made and played. Hence, this format is the natural choice if problems with other formats occur."
Thanks!

Which version of scitools are you using? 0.9.0?
In the source code of the movie class, html is among the legal encoders.
...
_legal_encoders = 'convert mencoder ffmpeg mpeg_encode ppmtompeg '\
'mpeg2enc html'.split()
_legal_file_types = 'png gif jpg ps eps bmp tif tga pnm'.split()
...
EDIT:
Check your version of scitools:
>>> import scitools
>>> scitools.__version__
'0.9.0'
>>>

Image.open() cannot identify image file - Python?

I am running Python 2.7 in Visual Studio 2013. The code previously worked ok when in Spyder, but when I run:
import numpy as np
import scipy as sp
import math as mt
import matplotlib.pyplot as plt
import Image
import random
# (0, 1) is N
SCALE = 2.2666 # the scale is chosen to be 1 m = 2.266666666 pixels
MIN_LENGTH = 150 # pixels
PROJECT_PATH = 'C:\\cimtrack_v1'
im = Image.open(PROJECT_PATH + '\\ST.jpg')
I end up with the following errors:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\cimtrack_v1\PythonApplication1\dr\trajgen.py", line 19, in <module>
im = Image.open(PROJECT_PATH + '\\ST.jpg')
File "C:\Python27\lib\site-packages\PIL\Image.py", line 2020, in open
raise IOError("cannot identify image file")
IOError: cannot identify image file
Why is it so and how may I fix it?
As suggested, I have used the Pillow installer to my Python 2.7. But weirdly, I end up with this:
>>> from PIL import Image
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: No module named PIL
>>> from pil import Image
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: No module named pil
>>> import PIL.Image
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: No module named PIL.Image
>>> import PIL
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: No module named PIL
All fail!

I had the same issue. Using
from PIL import Image
instead of
import Image
fixed it.

So after struggling with this issue for quite some time, this is what could help you:
from PIL import Image
instead of
import Image
Also, if your image file is not loading and you get a "No file or directory" error, define the path as a raw string:
path=r'C:\ABC\Users\Pictures\image.jpg'
and then open the file:
image=Image.open(path)

In my case.. I already had "from PIL import Image" in my code.
The error occurred for me because the image file was still in use (locked) by a previous operation in my code. I had to add a small delay or attempt to open the file in append mode in a loop, until that did not fail. Once that did not fail, it meant the file was no longer in use and I could continue and let PIL open the file. Here are the functions I used to check if the file is in use and wait for it to be available.
import os
import time
def is_locked(filepath):
    """Return True if the file is locked, False if it is not, None if it does not exist."""
    locked = None
    file_object = None
    if os.path.exists(filepath):
        try:
            buffer_size = 8
            # Try to open the file in append mode; this fails while another process holds it.
            file_object = open(filepath, 'a', buffer_size)
            if file_object:
                locked = False
        except IOError:
            locked = True
        finally:
            if file_object:
                file_object.close()
    return locked
def wait_for_file(filepath):
    """Block until the file is no longer locked, polling once per second."""
    wait_time = 1
    while is_locked(filepath):
        time.sleep(wait_time)
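One caveat with the loop above is that it never gives up if the file stays locked. A bounded variant could look like the sketch below; wait_until and its timeout and interval parameters are hypothetical names introduced for this example, not part of the original answer.

```python
import time

def wait_until(predicate, timeout=10.0, interval=0.1):
    """Poll predicate() until it returns a truthy value or timeout seconds pass.

    Returns True if the predicate succeeded, False if the deadline was reached.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)

# Intended use with the is_locked function from the answer (not executed here):
#   if wait_until(lambda: not is_locked(path), timeout=30):
#       image = Image.open(path)
```

Returning a boolean rather than raising leaves it to the caller to decide whether a timeout is an error.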

first, check your pillow version
python -c 'import PIL; print PIL.PILLOW_VERSION'
I used pip install --upgrade pillow to upgrade from version 2.7 to 2.9 (or 3.0), which fixed this.

In my case, the image was corrupted during download (using wget with github url)
Try with multiple images from different sources.
from PIL import Image
Image.open()

Often this happens because the image file was not closed by the previous program.
It is better to use a with block, which closes the file for you:
with Image.open(file_path) as img:
    #do something

In my case, it was because the images I used were stored on a Mac, which generates many hidden files like .image_file.png, so they turned out to not even be the actual images I needed and I could safely ignore the warning or delete the hidden files. It was just an oversight in my case.

Just a note for people having the same problem as me.
I've been using OpenCV/cv2 to export numpy arrays to TIFFs, but I had problems opening these TIFFs with PIL's Image.open and got the same error as in the title.
The problem turned out to be that PIL could not open TIFFs that were created by exporting numpy float64 arrays. When I changed the arrays to float32, PIL could open the TIFFs again.

If you are using Anaconda on Windows, you can open the Anaconda Navigator app, go to the Environment section, search for pillow in the installed libraries, and mark it for upgrade to the latest version by right-clicking on the checkbox.
This has fixed the following error:
PermissionError: [WinError 5] Access is denied: 'e:\\work\\anaconda\\lib\\site-packages\\pil\\_imaging.cp36-win_amd64.pyd'

Seems like a Permissions Issue. I was facing the same error. But when I ran it from the root account, it worked. So either give the read permission to the file using chmod (in linux) or run your script after logging in as a root user.

In my case there was an empty picture in the folder. After deleting the empty .jpg's it worked normally.

This error can also occur when trying to open a multi-band image with PIL. It seems to do fine with 4 bands (probably because it assumes an alpha channel) but anything more than that and this error pops out. In my case, I fixed it by using tifffile.imread instead.

I had the same issue. In my case, the image file size was 0(zero). Check the file size before opening the image.
fsize = os.path.getsize(fname_image)
if fsize > 0:
    img = Image.open(fname_image)
    #do something
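The zero-size check combines naturally with the hidden-file case mentioned in an earlier answer. Here is a sketch of a pre-filter that skips both before anything is handed to Image.open; looks_like_real_image_file is a hypothetical helper, and the temporary files exist only to exercise it.

```python
import os
import tempfile

def looks_like_real_image_file(path):
    """Return False for hidden dotfiles, missing files, and empty files."""
    if os.path.basename(path).startswith("."):
        return False
    return os.path.isfile(path) and os.path.getsize(path) > 0

# Small demonstration with throwaway files (the contents are not real images).
with tempfile.TemporaryDirectory() as folder:
    empty = os.path.join(folder, "empty.jpg")
    hidden = os.path.join(folder, ".image_file.png")
    filled = os.path.join(folder, "filled.jpg")
    open(empty, "wb").close()
    for name in (hidden, filled):
        with open(name, "wb") as handle:
            handle.write(b"placeholder")
    results = {os.path.basename(p): looks_like_real_image_file(p)
               for p in (empty, hidden, filled)}

print(results)
```

This only filters out obvious non-candidates; a file that passes can still be corrupt, so Image.open may need its own error handling.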

In my case the image file had just been written to and needed to be flushed before it could be opened, like so:
img_file.flush()
img = Image.open(img_file.name)
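The effect of the flush can be illustrated without PIL. In this sketch, the payload is stand-in data rather than a valid image, and the temporary file is only there to show that the full contents are on disk once flush() has run.

```python
import os
import tempfile

payload = b"\x89PNG placeholder bytes"  # stand-in data, not a valid image

with tempfile.NamedTemporaryFile(suffix=".png", delete=False) as img_file:
    img_file.write(payload)
    img_file.flush()  # push buffered bytes to the OS before the file is reopened by name
    size_after_flush = os.path.getsize(img_file.name)
    temp_name = img_file.name

os.remove(temp_name)  # clean up the throwaway file
print(size_after_flush == len(payload))
```

Closing the file (for example by leaving a with block) flushes it as well, so reopening after the block ends is another way to get the same guarantee.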

For anyone doing this at a larger scale, you might also check how many file descriptors you have available. PIL will throw this error if you run out at a bad moment.

For anyone who arrives here with the error PIL UnidentifiedImageError: cannot identify image file in Google Colab, with a newer PIL version, and for whom none of the previous solutions work:
Simply restart the environment; the PIL version that is loaded is probably outdated.

For me it was fixed by downloading the image data set I was using again (in fact I forwarded the copy I had locally using VS Code's SFTP). Here is the Jupyter notebook I used (in VS Code) with its output:
from pathlib import Path
import PIL
import PIL.Image as PILI
#from PIL import Image
print(PIL.__version__)
img_path = Path('PATH_UR_DATASET/miniImagenet/train/n03998194/n0399819400000585.jpg')
print(img_path.exists())
img = PILI.open(img_path).convert('RGB')
print(img)
output:
7.0.0
True
<PIL.Image.Image image mode=RGB size=158x160 at 0x7F4AD0A1E050>
note that open always opens in r mode and even has a check to throw an error if that mode is changed.

In my case the error was caused by alpha channels in a TIFF file.

I'll add my particular case.
I was processing images uploaded as multipart/form-data through AWS API Gateway. Images that loaded without this error locally raised an UnidentifiedImageError in PIL once they had been uploaded. To fix this, I had to add multipart/form-data in the settings of the service.

I'm working in Google Colab, and I had the same problem.
UnidentifiedImageError: cannot identify image file '/content/drive/MyDrive/Python/test.jpg'
The problem is that the default version of PIL in Colab (as of today, 24/11/2022) is 9.3.0, but when you run !pip install pillow the version that gets installed is 7.1.2.
So what I did was open a new Colab notebook and NOT pip install pillow. It worked.
