I'm trying to make pytesser (downloadable here) work on my mac OS, but I don't succeed.
I installed Tesseract, PIL and all the dependencies.
I unzipped pytesser in my python lib folder and modified the script file into __init__.py
in the init file I modified the path to the tesseract.exe file as suggested here and here
that is:
tesseract_exe_name = 'my lib path/pytesser/tesseract' # Name of executable to be called at command line
that's what I get as error:
Traceback (most recent call last):
File "<pyshell#50>", line 1, in <module>
print image_to_string(picz)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pytesser/__init__.py", line 31, in image_to_string
call_tesseract(scratch_image_name, scratch_text_name_root)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pytesser/__init__.py", line 21, in call_tesseract
proc = subprocess.Popen(args)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 679, in __init__
errread, errwrite)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 1228, in _execute_child
raise child_exception
OSError: [Errno 8] Exec format error
it seems that the module does not manage to run the .exe file. I tried to change the path, add the extension .exe but I always get the same error.
Several solutions for a python tesseract wrapper:
Python-Tesseract:
First of get homebrew and brew install python, then easy_install https://bitbucket.org/3togo/python-tesseract/downloads/python_tesseract-0.9.1-py2.7-macosx-10.10-x86_64.egg
source: https://code.google.com/p/python-tesseract/wiki/HowToCompileForHomebrewMac
pytesseract:
This what I was using previously before getting python-tesseract, pip install pytesseract. Then you have to go to /usr/local/lib/python2.7/site-packages and go to pytesseract then pytesseract.py. Change the file path in the python script to where tesseract is located on your computer.
Related
So I've had Python 3.6 on my Windows 10 computer for a while now, and today I just downloaded and installed the graphviz 0.8.2 (https://pypi.python.org/pypi/graphviz) package via the admin commandline with:
pip3 install graphviz
It was only after this point that I downloaded the Graphviz 2.38 MSI installer file and installed the program at:
C:\Program Files (x86)\Graphviz2.38
So then I tried to run this simple Python program:
from graphviz import Digraph
dot = Digraph(comment="The round table")
dot.node('A', 'King Arthur')
dot.node('B', 'Sir Bedevere the Wise')
dot.node('L', 'Sir Lancelot the Brave')
dot.render('round-table.gv', view=True)
But unfortunately, I received the following error when I try to run my Python program from commandline:
Traceback (most recent call last):
File "C:\Program Files\Python36\lib\site-packages\graphviz\backend.py", line 124, in render
subprocess.check_call(args, startupinfo=STARTUPINFO, stderr=stderr)
File "C:\Program Files\Python36\lib\subprocess.py", line 286, in check_call
retcode = call(*popenargs, **kwargs)
File "C:\Program Files\Python36\lib\subprocess.py", line 267, in call
with Popen(*popenargs, **kwargs) as p:
File "C:\Program Files\Python36\lib\subprocess.py", line 709, in __init__
restore_signals, start_new_session)
File "C:\Program Files\Python36\lib\subprocess.py", line 997, in _execute_child
startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:\foldername\testing.py", line 11, in <module>
dot.render('round-table.gv', view=True)
File "C:\Program Files\Python36\lib\site-packages\graphviz\files.py", line 176, in render
rendered = backend.render(self._engine, self._format, filepath)
File "C:\Program Files\Python36\lib\site-packages\graphviz\backend.py", line 127, in render
raise ExecutableNotFound(args)
graphviz.backend.ExecutableNotFound: failed to execute ['dot', '-Tpdf', '-O', 'round-table.gv'], make sure the Graphviz executables are on your systems' PATH
Notice how what I've asked seems VERY similar to this question asked here:
"RuntimeError: Make sure the Graphviz executables are on your system's path" after installing Graphviz 2.38
But for some reason, adding those paths (suggested in the solutions at the link above) to the system variables isn't working, and I don't know why! I tried restarting the computer after adding the paths as well, still to no success. See the image below:
Although the other suggested solution, which was to add these few lines in front of my Python code, did work:
import os
os.environ["PATH"] += os.pathsep + 'C:/Program Files (x86)/Graphviz2.38/bin/'
But here's the issue: I don't understand why adding to the environment variables didn't work, and this is my primary concern. So my question is this: why did adding those lines of code in front of the Python script work but changing the environment variables didn't? What do I need to do to get my script to run without adding those lines of code in front?
Can you please post the output you get when you type SET in a cmd window after setting the PATH environment variable?
Does it contain C:/Program Files (x86)/Graphviz2.38/bin/ ?
A cmd window must be restarted before updated environment variables become effective!
I am trying to convert a pdf document to text document using pdftotext software.
I need to call this application inc command prompt from python script to convert the file.
I have following code:
import os
import subprocess
path = "C:\\Users\\..."
pdffname = "pdffilename.pdf"
txtfname = "txtfilename.txt"
subprocess.call(['pdftotext', '-layout',
os.path.join(path, pdffname),
os.path.join(path, txtfname)])
When I run this code, I get error
File "C:/Users/.../code-1.py", line 44, in <module>
os.path.join(path, txtfname)])
File "C:\Anaconda\lib\subprocess.py", line 522, in call
return Popen(*popenargs, **kwargs).wait()
File "C:\Anaconda\lib\subprocess.py", line 710, in __init__
errread, errwrite)
File "C:\Anaconda\lib\subprocess.py", line 958, in _execute_child
startupinfo)
WindowsError: [Error 2] The system cannot find the file specified
Can you help to call pdftotext application from python to convert pdf to text file.
I had this same error, except with Popen. I fixed it by providing the full path to pdftotext.exe in the subprocess call. Don't forget to escape your backslashes.
I do not know much about Anaconda, and I have not tested this myself, but I believe Conda may have an issue referencing scripts on Windows: fix references to scripts on windows
I am using Ubuntu 14.04. I have the following code:
import Image
import pytesseract
im = Image.open('test.png')
print pytesseract.image_to_string(im)
but I keep getting the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/dist-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 540, in runfile
execfile(filename, namespace)
File "/home/chaitanya/pythonapp/localcopy.py", line 4, in <module>
print pytesseract.image_to_string(im)
File "/usr/local/lib/python2.7/dist-packages/pytesseract/pytesseract.py", line 142, in image_to_string
config=config)
File "/usr/local/lib/python2.7/dist-packages/pytesseract/pytesseract.py", line 75, in run_tesseract
stderr=subprocess.PIPE)
File "/usr/lib/python2.7/subprocess.py", line 710, in __init__
errread, errwrite)
File "/usr/lib/python2.7/subprocess.py", line 1327, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory
Both the python program and the image are in the same location.What could be the problem??
You need to install tesseract-ocr:
sudo apt-get install tesseract-ocr
If you're on windows and have PIP installed go to your project directory and run:
pip install tesseract-ocr
Based off of #padraic cunningham's answer which I tailored to my setting.
If you are on Linux (ubuntu 16, should not matter) and have a conda installation:
First search for what you need to be installing:
$ anaconda search -t conda tesserocr
You will get a few options, you need to look at the platforms and builds to identify what makes sense for you.
As I have python 3.6 and linux-64 I chose mcs07/tesserocr
To install:
$ conda install -c mcs07 tesserocr
That's it. I didn't need a restart of the terminal or anything. I just kept going.
I'm trying to follow this example of pytesser (link) in a Mac Maverick.
>>> from pytesser import *
>>> im = Image.open('phototest.tif')
>>> text = image_to_string(im)
But, in the last line I get this error message:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "pytesser.py", line 31, in image_to_string
call_tesseract(scratch_image_name, scratch_text_name_root)
File "pytesser.py", line 21, in call_tesseract
proc = subprocess.Popen(args)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 711, in __init__
errread, errwrite)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 1308, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory
But, I don't understand what I should do. The file phototest is in the same folder I'm running the script. How to fix this?
UPDATE:
When I try
brew install tesseract
I get this error:
Warning: It appears you have MacPorts or Fink installed.
Software installed with other package managers causes known problems for
Homebrew. If a formula fails to build, uninstall MacPorts/Fink and try again.
Error: You must `brew link libtiff libpng jpeg' before tesseract can be installed
I actually had the same error as you, which is how I found this post. I also have the solution to my problem, because you gave it to me!
I was seeing:
ryan.davis$ python tesseract.py
Traceback (most recent call last):
File "tesseract.py", line 52, in <module>
print (image_to_string(big))
File "/usr/local/lib/python2.7/site-packages/pytesseract/pytesseract.py", line 161, in image_to_string
config=config)
File "/usr/local/lib/python2.7/site-packages/pytesseract/pytesseract.py", line 94, in run_tesseract
stderr=subprocess.PIPE)
File "/usr/local/Cellar/python/2.7.10_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 710, in __init__
errread, errwrite)
File "/usr/local/Cellar/python/2.7.10_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 1335, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory
Want to know what I had to do to fix this? Exactly what you tried: brew install tesseract I had installed the tesseract python library, but hadn't installed it at the system level. So that solves my problem. How about yours?
I think you might have been distracted by this:
Warning: It appears you have MacPorts or Fink installed. Software
installed with other package managers causes known problems for
Homebrew. If a formula fails to build, uninstall MacPorts/Fink and try
again.
And not noticed your answer was already provided in the brew response:
You must brew link libtiff libpng jpeg before tesseract can be
installed.
So do:
brew link libtiff
brew link libpng
brew link jpeg
Then:
brew install tesseract
Finally:
:)
I've downloaded PyTesser and extracted it.
I was in the pytesser_v0.0.1 folder and tried to run the sample usage code in the python interpreter:
from pytesser import *
print image_file_to_string('fnord.tif')
and the output:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "pytesser.py", line 44, in image_file_to_string
call_tesseract(filename, scratch_text_name_root)
File "pytesser.py", line 21, in call_tesseract
proc = subprocess.Popen(args)
File "/usr/lib/python2.7/subprocess.py", line 679, in __init__
errread, errwrite)
File "/usr/lib/python2.7/subprocess.py", line 1259, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory
NOTE: I'm in Ubuntu 12.10 with Python 2.7.3
can anyone help me understand this error, and what can I do to fix it ?
This isn't as well documented as it could be, but if you are not on Windows you need to install the tesseract binary for your platform. On Ubuntu and other Debian based Linux distributions, apt-get install tesseract-ocr. Then you can run:
python pytesser.py
which uses the test files phototest.tif, fnord.tif and fonts_test.png to test the library.
For beginners on windows to use pytesseract:
Open command prompt
Type: pip install pytesseract
(this will install pytesseract last version module on your python easily)
Go to this link and download and install tesseract-ocr engine:
https://code.google.com/p/tesseract-ocr/downloads/detail?name=tesseract-ocr-setup-3.02.02.exe&can=2&q=
Now you are ready to use pytesseract
For more information and see code example check this link:
http://www.manejandodatos.es/2014/11/ocr-python-easy/