Install tesseract/pytesser on Mac OS X - python

I am trying to install this (and additionally pytesser) for OS X 10.9 (with Anaconda as the default Python). I have looked around online, but I can't get any of the tutorials to work, as they all seem to be out of date (Homebrew doesn't have a formula for leptonica, for instance). I have been struggling to install this for the best part of a week with absolutely no luck at all.
Has anyone managed to do this recently? How did you do it?
Thanks
Edit: Strangely, the brew formula for leptonica has spluttered into life. I now get the fairly strange error below.
brew install tesseract
==> Downloading https://bitbucket.org/3togo/python-tesseract/downloads/tesseract
Already downloaded: /Library/Caches/Homebrew/tesseract-3.03-rc1.tar.gz
==> ./configure --prefix=/usr/local/Cellar/tesseract/3.03-rc1
checking for leptonica... yes
checking for pixCreate in -llept... yes
checking leptonica version >= 1.70... configure: error: in `/private/tmp/tesseract-19Ol/tesseract-3.03':
configure: error: leptonica 1.70 or higher is required
See `config.log' for more details
READ THIS: https://github.com/Homebrew/homebrew/wiki/troubleshooting
i.e. it is registering the install but still not working. I will check config.log as instructed.
Edit 2:
Upon trying to import the library in python I get this:
import tesseract
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "//anaconda/lib/python2.7/site-packages/python-tesseract_0.8-3.0-py2.7_macosx-10.9-intel.egg/tesseract.py", line 28, in <module>
_tesseract = swig_import_helper()
File "//anaconda/lib/python2.7/site-packages/python-tesseract_0.8-3.0-py2.7_macosx-10.9-intel.egg/tesseract.py", line 24, in swig_import_helper
_mod = imp.load_module('_tesseract', fp, pathname, description)
ImportError: dlopen(//anaconda/lib/python2.7/site-packages/python-tesseract_0.8-3.0-py2.7_macosx-10.9-intel.egg/_tesseract.so, 2): Library not loaded: /usr/local/lib/libtesseract.3.dylib
Referenced from: //anaconda/lib/python2.7/site-packages/python-tesseract_0.8-3.0-py2.7_macosx-10.9-intel.egg/_tesseract.so
Reason: image not found
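The dlopen message says the extension cannot find /usr/local/lib/libtesseract.3.dylib, which is consistent with the brew build never having finished. A minimal diagnostic sketch (runs on Python 2.7 or 3; the function name check_shared_library is hypothetical) that reports whether that exact file exists and what ctypes.util.find_library locates for the short name:

```python
import ctypes.util
import os


def check_shared_library(path, short_name):
    """Report whether a shared library exists at an exact path, and what the
    platform's library search finds for a short name such as "tesseract"."""
    return {
        # The dlopen error names this exact file, so check it directly.
        "exists_at_path": os.path.isfile(path),
        # find_library returns a path or file name when it finds a match,
        # and None when it finds nothing.
        "found_by_name": ctypes.util.find_library(short_name),
    }


if __name__ == "__main__":
    # Path copied from the ImportError above.
    print(check_shared_library("/usr/local/lib/libtesseract.3.dylib", "tesseract"))
```

If exists_at_path is False, the Python wrapper has nothing to load, and the fix lies in the tesseract install rather than in Python.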
To be honest, I am a complete amateur with respect to this kind of behind-the-scenes installation and had to google extensively to even get this far. I would be really grateful if someone with a bit of knowledge could shed light on the obvious things to try. I feel as though I have exhausted the web looking for solutions and am getting close to considering this library unusable and attempting to write my own OCR library, which is 100% not a job I am looking forward to. Alternatively, if anyone knows any decent Python OCR libraries with good support and install maintenance, I would love to know about them. (From my Google searching, I suspect that tesseract is by far the best known, which is why it is so frustrating that the install is so tricky.)
I will happily provide any more info about my system etc. to any warrior willing to have a crack at helping with this.
Thanks!

You need to install tesseract first
https://bitbucket.org/3togo/python-tesseract/downloads/tesseract.rb
For details,
https://code.google.com/p/python-tesseract/wiki/HowToCompileForHomebrewMac

I have just installed tesseract 3.02 using brew without any issues (osx 10.9). If you don't need version 3.03, you may want to try installing 3.02. Instructions on installing a different version using brew: Homebrew install specific version of formula?
Otherwise, based on your log, the brew install did not complete successfully, so tesseract cannot be imported. Brew downloads the source, runs configure, then runs make install. The configure step is failing because you need leptonica 1.70 or higher. Usually brew would detect this dependency and install leptonica 1.70 for you.
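The failing configure check is a numeric comparison of dotted versions. A small sketch of that comparison (hypothetical helper names; it assumes you already have the installed leptonica version string, which brew info leptonica will display). Comparing components as integers matters: as plain strings, "1.100" sorts before "1.70".

```python
def parse_version(text):
    """Turn a dotted version such as '1.70' into a tuple of ints: (1, 70)."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(installed, required):
    """True if the installed version is at least the required one."""
    # Tuples compare element by element, so (1, 69) < (1, 70) < (1, 100).
    return parse_version(installed) >= parse_version(required)
```

For the error above, meets_minimum(installed_version, "1.70") must be true before tesseract 3.03's configure step will pass.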
You may want to try installing leptonica yourself: http://www.leptonica.com/download.html. Instructions on building: http://www.leptonica.com/source/README.html

Related

RDKit installation under Windows and Python3.7.4

RDKit could be a nice package if it wasn't so complicated to install.
Here on SO, there are several questions about problems installing RDKit, but on different operating systems or in different environments.
My configuration is:
Win10, Python 3.7.4, pip is installed, PATH is set, PYTHONPATH is set.
The installation of other modules is working fine via python -m pip install <package>.
I'm aware that the site recommends the fastest installation with Anaconda.
However, I don't have and don't want Anaconda.
On the webpage it says:
"Get the appropriate windows binary build from: https://github.com/rdkit/rdkit/releases".
However, there are no binaries of the latest versions.
This means I would have to build it from source. I'm hesitant because the process seems pretty complicated, with many extra installations bringing new problems and unknowns; furthermore, the instructions seem outdated and incomplete for somebody building binaries from source for the first time.
So, then I tried some unofficial binaries of RDKit.
If I unpack them and set the paths according to the instructions, I get this error message:
>>> from rdkit import Chem
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\xyz\Programs\RDKit\rdkit\__init__.py", line 2, in <module>
from .rdBase import rdkitVersion as __version__
ImportError: DLL load failed: The specified module could not be found.
So, finally my questions:
How to properly install RDKit with the above mentioned configuration?
What is the specified DLL which is missing?
Where is it expecting it and searching it?
Are these RDKit 3.6 binaries maybe incompatible with Python 3.7.4?
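On the last question: compiled extension modules are built against one CPython minor version, so binaries made for 3.6 are not expected to load under 3.7. One plausible reading of "DLL load failed" here is that a 3.6 build looks for python36.dll, which a 3.7 install does not provide; that is an assumption, not something the traceback confirms. If the .pyd filenames carry a version tag, this sketch (hypothetical helper names) reads it and compares it with the running interpreter:

```python
import re
import sys

# Matches tags such as ".cp36-win_amd64.pyd" (Windows) or
# ".cpython-37m-x86_64-linux-gnu.so" (Linux).
_TAG = re.compile(r"\.cp(?:ython-)?(\d)(\d+)")


def built_for(filename):
    """Return (major, minor) from the filename's CPython tag, or None if untagged."""
    match = _TAG.search(filename)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def matches_interpreter(filename, version=None):
    """True/False if the tag matches the given (or running) Python version,
    None if the filename carries no tag at all."""
    if version is None:
        version = sys.version_info[:2]
    tag = built_for(filename)
    return None if tag is None else tag == tuple(version)
```

An untagged name such as rdBase.pyd returns None: it gives no version information, so the mismatch can only show up at load time.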
I'm pretty sure it is probably a "small" thing (a path here or a check there), but I'm stuck. Thank you for any hints.
Update:
Apparently, it is not just a "small" thing. The chances of getting this to work are most likely very low.
In the meantime I found this:
https://github.com/rdkit/rdkit/issues/1812
https://github.com/rdkit/rdkit/issues/2389
The author of RDKit wrote (April 2019):
I would be happy to be able to do pip distributions of the RDKit, but
to the best of my knowledge no one has managed to figure out how to
make it actually work.
I'd be happy to accept a PR from someone who has figured this out, but
I am not likely to have the time to do this myself anytime in the near
future.
So, if anybody feels capable achieving this, please feel free.
I will invest time in something else or will have to switch to Anaconda if I want to use RDKit.
On the webpage you linked there is a section about missing DLLs:
"In Win7 systems, you may run into trouble due to missing DLLs, see one thread from the mailing list: http://www.mail-archive.com/rdkit-discuss#lists.sourceforge.net/msg01632.html You can download the missing DLLs from here: http://www.microsoft.com/en-us/download/details.aspx?id=5555"
Not sure if this helps
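For "where is it searching": on Python 3.7 and earlier on Windows, dependent DLLs of an extension module are resolved through the standard Windows search order, which includes the directories on PATH (Python 3.8 stopped using PATH for this and added os.add_dll_directory). A small sketch, with a hypothetical helper name, that lists which PATH entries contain a given DLL; the name in the usage line is only an example:

```python
import os


def find_on_path(filename, path=None):
    """Return every directory in a PATH-style string that contains filename."""
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        # Empty entries (e.g. from a trailing separator) are skipped.
        if directory and os.path.isfile(os.path.join(directory, filename)):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    # Example only: a build made for Python 3.6 would typically need python36.dll.
    print(find_on_path("python36.dll"))
```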

How do I install OpenCV 3 on Centos 6.8?

I'm working on a CentOS cluster right now and have Python 2.7 installed. I've managed to get OpenCV 2.4 installed (using these helpful instructions), but it does not have all of the functionality of 3 (I need the connectedComponents function and a couple of others that are not available). Omitting the "checkout tags" step results in errors during "cmake". Something else to note: when I attempt to install the ffmpeg package, it tells me no such package is available. Error:
CMake Error at 3rdparty/ippicv/downloader.cmake:77 (message):
ICV: Failed to download ICV package: ippicv_linux_20151201.tgz.
Status=6;"Couldn't resolve host name"
Call Stack (most recent call first):
3rdparty/ippicv/downloader.cmake:110 (_icv_downloader)
cmake/OpenCVFindIPP.cmake:237 (include)
...
I've managed to get OpenCV 2.4 installed (using these helpful instructions) but it does not have all of the functionality of 3 (I need the connectedComponents function and a couple others not available).
Why don't you just download OpenCV 3 then?
Something else to note is when I attempt to install the ffmpeg package it tells me no such package is available.
You can download the file yourself from here (the package that is not available for you).
Then place it in the folder where it initially would have been downloaded to:
<your opencv build>/3rdparty/ippicv/
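The manual step can be scripted. This sketch (Python 3; place_package and md5sum are hypothetical names) copies the archive into the directory named above and returns its MD5, because the CMake download script may verify a checksum before using the file. Depending on the OpenCV version, the script may also expect a deeper subdirectory than 3rdparty/ippicv/, so check downloader.cmake in your tree for the exact location; that detail is an assumption here.

```python
import hashlib
import os
import shutil


def md5sum(path, chunk_size=1 << 20):
    """MD5 hex digest of a file, read in chunks so large archives are fine."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def place_package(archive, opencv_dir):
    """Copy a manually downloaded archive into <opencv_dir>/3rdparty/ippicv/
    and return (destination, md5) for comparison with the expected hash."""
    target_dir = os.path.join(opencv_dir, "3rdparty", "ippicv")
    os.makedirs(target_dir, exist_ok=True)
    destination = os.path.join(target_dir, os.path.basename(archive))
    shutil.copy2(archive, destination)
    return destination, md5sum(destination)
```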
It seems like OpenCV 3 would be better suited for what you are doing; you even said yourself that you need features that aren't available in 2.4.
The OpenCV 3.0 documentation actually has a full guide on installing the latest version of the library using yum from the terminal. It walks you through every step and explains them all in detail, including the CMake steps which seem to be giving you trouble. I would recommend taking a look at the guide which is linked here.
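Whichever route you take, it is worth confirming that the cv2 module Python actually imports exposes the functions you need. A minimal sketch (hypothetical helper name; the cv2 import is guarded so the script runs even where OpenCV is absent):

```python
def missing_functions(module, names):
    """Return the names the given module does not expose, in the order given."""
    return [name for name in names if not hasattr(module, name)]


if __name__ == "__main__":
    try:
        import cv2  # assumes OpenCV's Python bindings may or may not be installed
    except ImportError:
        print("cv2 is not importable in this interpreter")
    else:
        print(cv2.__version__, missing_functions(cv2, ["connectedComponents"]))
```

An empty list means every requested name is present; with a 2.4 build, connectedComponents should show up as missing.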

ImportError: Permission Denied while using LXML

I've been having a ton of trouble using lxml after installing it from https://pypi.python.org/pypi/lxml/3.2.1 using easy_install-2.7. I installed it on Windows using Cygwin, and at first the package seemed to be okay. However, upon further testing I ran into problems.
When I run code with:
import lxml
it works completely fine. But as soon as I try:
import lxml.etree
I get this error:
Traceback (most recent call last):
File "D:\Nick_Code\NewsScraper\testdummy.py", line 7, in <module>
import lxml.etree
File "/usr/lib/python2.7/site-packages/lxml-3.2.0-py2.7-cygwin-1.7.20-i686.egg/lxml/etree.py", line 7, in <module>
__bootstrap__()
File "/usr/lib/python2.7/site-packages/lxml-3.2.0-py2.7-cygwin-1.7.20-i686.egg/lxml/etree.py", line 6, in __bootstrap__
imp.load_dynamic(__name__,__file__)
ImportError: Permission denied
I've been trying to find information/work arounds for quite a while but no success. Please let me know if you have any insight or need information.
Thanks!
Michael
This is not a solid answer, but I will highlight several of the problems involved in obtaining a solution. Most likely, the problem above is like a cancer caused by several factors acting catastrophically together.
I have the same exact problem as in the OP, when attempting to use the native Cygwin supplied Python packages on my Windows Vista machine. Being new to Python I have spent several days in trying to get this to work, and understand why it is not working. But all my Google-fu returned nothing but countless dead ends. So here's my take on this.
There are many reasons why Python could have trouble under Cygwin, some which you can do something about and some which are beyond most peoples control. What it boils down to, are the following key issues:
Windows is a complete mess when it comes to file permissions, and Cygwin cannot handle windows file permissions very well. So what you see in Cygwin is far from the whole story.
Windows is shamefully case-insensitive, which causes loads of trouble, especially when you need to (cross-)compile anything that was originally developed on a *nix-based system (i.e. everything). In fact, if you extract, under Windows or Cygwin, any archive containing files whose names differ only in capitalization (e.g. "makefile" vs "Makefile"), you lose all but one of those files. You need to enable case sensitivity to do anything more than "hello world" *nix compilations.
Windows handles symlinks completely differently from Cygwin, and if your ZIP, TAR etc. archives contain any symlinks, they will be broken after extraction to a Windows environment.
Sloppy coding practices, where developers have not properly tested their creations on various environments, carefully set proper file permissions on their *.tar.gz archives, given correct dependency specifications, or mentioned whether or not binaries have been statically linked.
For the full gory details and further (Win-Cygwin) issues, look HERE.
At first I tried to use Cygwin's own Python without any additional packages, installing lxml using pip and easy_install. Then I tried to use Cygwin's own libxml2, libxslt and xml Python packages, and I had the same problems.
At first, after installing the static windows binaries (as suggested elsewhere),
I got this error:
File "/usr/lib/python2.7/site-packages/lxml-3.2.4-py2.7-cygwin-1.7.24-i686.egg/lxml/etree.py", line 6, in __bootstrap__
imp.load_dynamic(__name__,__file__)
ImportError: Permission denied
Aborted (core dumped)
Then I investigated the file permissions and changed those with: chmod -R 755 /usr/lib/python2.7/
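The chmod suggests the first error came from DLLs lacking execute permission, which Cygwin appears to enforce when loading them (an inference from the behaviour described here, not a documented guarantee). A sketch, with a hypothetical function name, that lists extension files under a directory missing the owner execute bit, so you can see what a recursive chmod would change:

```python
import os
import stat

# Suffixes treated as loadable extension modules; Cygwin builds use .dll.
EXTENSION_SUFFIXES = (".dll", ".so", ".pyd")


def non_executable_extensions(root):
    """Sorted list of extension files under root without the owner execute bit."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(EXTENSION_SUFFIXES):
                full = os.path.join(dirpath, name)
                if not os.stat(full).st_mode & stat.S_IXUSR:
                    found.append(full)
    return sorted(found)
```

Running it on /usr/lib/python2.7/site-packages before and after the chmod shows whether permissions were the issue.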
I got one step further and isolated the problem to an apparently missing file.
Enabling verbose and diagnostic modes didn't help much either.
File "/usr/lib/python2.7/site-packages/lxml-3.2.4-py2.7-cygwin-1.7.24-i686.egg/lxml/etree.py", line 6, in __bootstrap__
imp.load_dynamic(__name__,__file__)
ImportError: No such file or directory
Aborted (core dumped)
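When import lxml succeeds but import lxml.etree fails, it helps to capture the exact exception for each module rather than the whole traceback. A minimal sketch (works on Python 2.7 and 3; try_import is a hypothetical name):

```python
import importlib


def try_import(name):
    """Import a module by dotted name; return (True, None) on success or
    (False, 'ExceptionType: message') on failure."""
    try:
        importlib.import_module(name)
    except Exception as exc:  # ImportError and anything raised during init
        return False, "%s: %s" % (type(exc).__name__, exc)
    return True, None


if __name__ == "__main__":
    for name in ("lxml", "lxml.etree"):
        print(name, try_import(name))
```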
HERE is the exact specification of imp.load_dynamic:
Load and initialize a module implemented as a dynamically loadable shared
library and return its module object. If the module was already initialized, it
will be initialized again. Re-initialization involves copying the __dict__
attribute of the cached instance of the module over the value used in the module
cached in sys.modules. The pathname argument must point to the shared library.
The name argument is used to construct the name of the initialization function:
an external C function called initname() in the shared library is called. The
optional file argument is ignored. (Note: using shared libraries is highly
system dependent, and not all systems support it.)
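imp.load_dynamic is a Python 2-era API; in Python 3 the same "load this file as module name" step can be expressed with importlib. A sketch, assuming Python 3.5+ (load_from_path is a hypothetical name): spec_from_file_location picks the loader from the file suffix, so an extension file gets the extension loader, whose initialization function name is derived from the module name as the documentation above describes, while a .py file gets the source loader. It raises ImportError when no loader handles the suffix.

```python
import importlib.util


def load_from_path(name, path):
    """Load and return a module from an explicit file path."""
    spec = importlib.util.spec_from_file_location(name, path)
    if spec is None:
        # No registered loader matches this file suffix.
        raise ImportError("no loader for %r" % path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module
```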
So I started reading the lxml website, which clearly states lxml's dependencies on both libxml2 and libxslt; unless those are statically linked, iconv and zlib are needed too. So you're led to believe you need to install all of these. Don't! Continue reading. But if you're going to build from source (as easy_install may try to do), you'll need everything, including the development headers: libxml2-devel, libxslt-devel. Another place states that you also need Cython and should install with:
easy_install lxml==dev
The dependencies are shown in this picture from HERE:
So you think you may get away with something like:
STATIC_DEPS=true pip install lxml
But that doesn't do it either, probably because the libraries used to compile Cygwin's Python have to be the same as those used for compiling lxml, but I don't know. Notice how the lxml package refers to Cygwin "1.7.24". My Cygwin is already "1.7.25", and you can check this with uname -a. Then you can check your static Python executable with file and ldd. Then you understand that this also depends on the C compiler used for building Python/Cygwin under Windows or *nix. Smelling a nightmare, I decided that building my own was not the way to go. So next I tried to install the Python libraries (supplied as executables) meant for Windows Python. This didn't work, since I never had a Windows-native Python installed, and I was greeted with an error that the installer could not find Python in my registry. I could of course just extract the executable, but I wouldn't know where to put the binaries without the installer. So I had another idea...
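The platform part of an egg's filename records what it was built for, which is how the "1.7.24" above can be spotted. A sketch of a parser for the name-version-pyX.Y-platform.egg pattern used by the eggs in this thread (hypothetical helper; it assumes the project name and version contain no hyphens, which holds for setuptools egg filenames because hyphens in those fields are escaped to underscores):

```python
import re

# e.g. lxml-3.2.4-py2.7-cygwin-1.7.24-i686.egg; pure-Python eggs omit the platform.
_EGG = re.compile(
    r"^(?P<name>[^-]+)-(?P<version>[^-]+)-py(?P<python>\d+\.\d+)"
    r"(?:-(?P<platform>.+))?\.egg$"
)


def parse_egg_name(filename):
    """Return a dict with name, version, python and platform, or None."""
    match = _EGG.match(filename)
    return match.groupdict() if match else None
```

The platform field (cygwin-1.7.24-i686 for the lxml egg) can then be compared with the output of uname -a.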
There are 3 possible solutions to getting this to work, as far as I can see.
1) The easy way: install a Windows-native Python interpreter. You lose some native Cygwin functionality unless you install it in the correct place (/usr/lib/python2.7) and make sure Cygwin can find and use it. It also uses different file permissions, case sensitivity and character set (UTF-16LE) than Cygwin (UTF-8), potentially creating many other issues down the line! Difficulty: Easy
2) Continue hacking Cygwin's Python to make it work with the binary libraries used in (1). This requires:
a) Uninstall and remove all Cygwin Python packages, except the bare Python interpreter.
b) Remove all traces of pip and easy_install.
c) Hack the Windows registry to pretend to have Python27 installed:
HKEY_LOCAL_MACHINE\SOFTWARE\Python\PythonCore\2.7\InstallPath C:\Python27\
HKEY_LOCAL_MACHINE\SOFTWARE\Python\PythonCore\2.7\PythonPath C:\Python27\Lib;C:\Python27\DLLs;C:\Python27\Lib\lib-tk
HKEY_CLASSES_ROOT...
d) Install the Windows binary libraries.
e) All the rest should now hopefully work with pip or easy_install.
Difficulty: Medium!
3) Do it properly by compiling Python and all libraries from scratch. Difficulty: Hard!
I successfully did (1), but I still think (2) is the smarter way of doing it. I have not tested it, though, which is why I don't consider this a good answer. BTW, one more quirk: I have to run the interpreter with python.exe -E to avoid an annoying "SyntaxError: invalid syntax" when hitting return!
Conclusion:
Apparently, you don't need the libxml2 and libxslt python packages to use lxml!
In my case I needed Scrapy, so I also had to install a few other packages.
$ pip.exe list
cssselect (0.9.1)
lxml (3.2.4)
pip (1.4.1)
pyOpenSSL (0.11)
pywin32 (218)
queuelib (1.1.1)
Scrapy (0.20.0)
setuptools (1.4.1)
six (1.4.1)
Twisted (13.2.0)
w3lib (1.5)
zope.interface (4.0.5)
$ ll /cygdrive/c/Python27/Lib/site-packages/
adodbapi
cssselect
isapi
lxml
OpenSSL
pip
pythonwin
pywin32_system32
queuelib
scrapy
twisted
w3lib
win32
win32com
win32comext
zope
cssselect-0.9.1-py2.7.egg-info
lxml-3.2.4-py2.7.egg-info
pip-1.4.1-py2.7.egg-info
queuelib-1.1.1-py2.7.egg-info
Scrapy-0.20.0-py2.7.egg-info
six-1.4.1-py2.7.egg-info
Twisted-13.2.0-py2.7.egg-info
w3lib-1.5-py2.7.egg-info
zope.interface-4.0.5-py2.7.egg-info
PyWin32.chm
setuptools-1.4.1-py2.7.egg
pyOpenSSL-0.11-py2.7.egg-info
pywin32-218-py2.7.egg-info
easy-install.pth
pywin32.pth
setuptools.pth
zope.interface-4.0.5-py2.7-nspkg.pth
pythoncom.py
six.py
pythoncom.pyc
six.pyc
pythoncom.pyo
pywin32.version.txt
README.txt
Useful References:
HERE
HERE
HERE HERE HERE HERE
HERE

Python PyAudio installation problems (with PortAudio)

I'm trying to write a program to record information from my computer's microphone and save it to a file. PyAudio seems like one of the better packages for doing this, and they even have a binary for Windows 7 (Python 2.7). I downloaded the executable file and ran it to set up PyAudio, but when I try to import PyAudio into a Python script now, I get an error:
Please build and install the PortAudio Python bindings first.
Traceback (most recent call last):
File "<pyshell#0>", line 1, in <module>
import pyaudio
File "C:\Python27\lib\site-packages\pyaudio.py", line 103, in <module>
sys.exit(-1)
SystemExit: -1
If I look at pyaudio.py, the code that it's failing on is:
# attempt to import PortAudio
try:
    import _portaudio as pa
except ImportError:
    print "Please build and install the PortAudio Python " +\
        "bindings first."
    sys.exit(-1)
Also, in case it's relevant, if I go to Python27\Lib\site-packages (where pyaudio.py is) there is a file called portaudio_x64.dll.
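Two quick checks are possible here. The file name portaudio_x64.dll suggests a 64-bit build, and a 64-bit extension will not load into a 32-bit interpreter, so it is worth confirming the interpreter's bitness; it is also worth checking whether _portaudio can be found at all. A sketch for Python 3 (hypothetical helper names); on Python 2.7, imp.find_module("_portaudio") plays the role of find_spec and raises ImportError when the module is missing:

```python
import importlib.util
import struct


def interpreter_bits():
    """Pointer size of the running interpreter in bits (32 or 64)."""
    return struct.calcsize("P") * 8


def has_module(name):
    """True if the import system can locate a top-level module by name."""
    return importlib.util.find_spec(name) is not None
```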
The documentation on their site only seems to have instructions for installing PyAudio by building it from source. Additionally, it says that PortAudio v19 is included in the binary, so I assumed it would just work after running the setup executable.
I have no idea what's going wrong and I really need this running soon. Any ideas on what's going wrong? Or if anyone has recommendations for similar packages that work better specifically with Windows 7 (64-bit) and Python 2.7 (Enthought distribution), as well as cross-platform, I'd love to hear them.
Copying the answer from the comments in order to remove this question from the "Unanswered" filter:
Try the binaries from http://www.lfd.uci.edu/~gohlke/pythonlibs/#pyaudio
~ answer per cgohlke

Has anybody been able to install PyWeka?

I need to install the library PyWeka 0.3dev in Python 2.6 or 2.7 on Windows. It says it requires setuptools, which I installed, but then I was told it was a deprecated installation library, so I installed distribute. Then I downloaded the PyWeka compressed package, but every time I try to install it, it fails, both with setup.py and with easy_install (where it says something like "no module ez_setup"). Can anybody give me a clue about how to do this?
As mentioned to you via Aardvark (yes, I am omnipresent), the module in question is broken. You can't easy_install it. It's a bug in PyWeka.
You can download the file from PyPI, http://pypi.python.org/pypi/PyWeka/0.3dev, and unpack it.
In the file setup.py, remove the following two lines:
from ez_setup import use_setuptools
use_setuptools()
And install it by running
python setup.py install
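The setup.py edit can also be scripted. A sketch (hypothetical function and script name) that drops exactly those two lines and leaves the rest of setup.py untouched; it runs on Python 2.7 and 3:

```python
import sys


def strip_ez_setup(source):
    """Return setup.py text without the two ez_setup bootstrap lines."""
    unwanted = {"from ez_setup import use_setuptools", "use_setuptools()"}
    kept = [line for line in source.splitlines(True) if line.strip() not in unwanted]
    return "".join(kept)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python strip_ez_setup.py path/to/setup.py  (rewrites the file in place)
    path = sys.argv[1]
    with open(path) as fh:
        text = fh.read()
    with open(path, "w") as fh:
        fh.write(strip_ez_setup(text))
```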
You need to have installed numpy and NamedMatrix (which has the same bug as PyWeka) first.
However, you mentioned you are on Windows. I strongly doubt that PyWeka will work on Windows. There is some Unix-specific code in it.
And I still really want to know why the authors are reading files by calling cat from subprocess. That seems pretty pointless and, together with the broken install, is reason enough for me to keep far away from that module. I suspect its authors simply have no idea what they are doing.
That, or they are geniuses.
A punk/goth approach to programming probably has the right to be..
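For comparison, the two ways of reading a file look like this (hypothetical function names). Shelling out to cat starts a separate process and fails on a stock Windows install, where no cat executable exists, which is one example of the Unix-specific code mentioned above; Python's own file API does the same job portably:

```python
import io
import subprocess


def read_with_cat(path):
    # The pattern criticised above: spawn an external "cat" process.
    # This only works where a cat executable exists (Unix, Cygwin).
    return subprocess.check_output(["cat", path]).decode("utf-8")


def read_directly(path):
    # Portable equivalent using Python's own file API; no child process.
    with io.open(path, encoding="utf-8") as fh:
        return fh.read()
```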
To get the C compilation part to work on Windows, you either need (1) the same version of Visual Studio that was used to compile the Python version you are using, or (2) MinGW, which is a bit trickier to set up.
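To see which Visual Studio version option (1) refers to, check the compiler string of the running interpreter. A sketch (hypothetical helper name) that extracts the MSC version number; the official Python 2.6 and 2.7 Windows builds report MSC v.1500, which corresponds to Visual Studio 2008:

```python
import platform
import re


def msc_version(compiler=None):
    """Return the MSC version number from a compiler string such as
    'MSC v.1500 32 bit (Intel)', or None if the build did not use MSVC."""
    if compiler is None:
        compiler = platform.python_compiler()
    match = re.search(r"MSC v\.(\d+)", compiler)
    return int(match.group(1)) if match else None
```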
