Mac Poppler + pdf2image PATH bug - python

Problem:
I've been attempting to use the Python library pdf2image, which I am aware requires the prior installation of poppler. Poppler is installed (via homebrew) and the package via pip.
However, when running convert_to_path(my_pdf) I get the following:
Traceback (most recent call last):
File "<ipython-input-9-ba107659b495>", line 1, in <module>
test_image = convert_from_path(testfile,
File "/Users/<myuser>/.pyenv/versions/3.8.0/lib/python3.8/site-packages/pdf2image/pdf2image.py", line 97, in convert_from_path
page_count = pdfinfo_from_path(pdf_path, userpw, poppler_path=poppler_path)["Pages"]
File "/Users/<myuser>/.pyenv/versions/3.8.0/lib/python3.8/site-packages/pdf2image/pdf2image.py", line 467, in pdfinfo_from_path
raise PDFInfoNotInstalledError(
PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH?
My System:
Mac, OC 10.15.7
Python (via Homebrew) version 3.8.0
which python
/Users/<myuser>/.pyenv/shims/python
which pip
/Users/<myuser>/.pyenv/shims/pip
What I've tried so far:
Poppler is installed via homebrew (brew install poppler) and pdf2image installed with pip3 install pdf2image
I've run brew cleanup too.
Attempted to force the poppler path in convert_to_path with the following,
pop_path = "/usr/local/Cellar/poppler/21.03.0_1"
convert_to_path(my_pdf_file,poppler_path = pop_path)
but still get the same error.
Had a decent look online and found many people with similar, yet not identical, problems. I feel like I must be doing something wrong, so any guidance would be great.

Partial Solution?
In manually entering the pop_path file path, I forgot to append /bin to the path,
pop_path = "/usr/local/Cellar/poppler/21.03.0_1/bin"
The code works now. Though my pride will take some time to recover...
I feel like I might still be sitting on a bad configuration issue? The many posts on similar issues seem to imply a homebrew installed-popper shouldn't have this issue. Is it maybe because I am using pyenv too?

if you use conda environment run the below code in mac terminal
conda install -c conda-forge pdf2image
conda install -c conda-forge poppler
It solve the problem

Related

Installing PIP Python 3.6.3 Ubuntu 16.04 Zlib Not Available, But It's Installed

I'm trying to follow this tutorial on installing Python 3.6.3 and PIP with virtual environments, but when I get to sudo python3.6 get-pip.py I get the error
Traceback (most recent call last):
File "get-pip.py", line 20061, in <module>
main()
File "get-pip.py", line 194, in main
bootstrap(tmpdir=tmpdir)
File "get-pip.py", line 82, in bootstrap
import pip
zipimport.ZipImportError: can't decompress data; zlib not available
but I have zlib1g-dev installed and don't know how to fix this problem. I've googled a lot and tried reinstalling, but haven't had any success.
Sorry to start a new question, but I did not have enough Karma to comment on the other one. Any help would be greatly appreciated.
Update: I ended up installing everything from source instead of using any packages and it seems to be working. I was not able to solve the problem, but found an alternative way to get things working.
For pip to work, Python needs to be linked to the zlib library when Python itself is installed. It appears that either zlib was not installed when you installed Python, or at least that the Python installer couldn't locate it. To help it along, you may issue the following before installing Python. In bash syntax,
zlib_lib="/usr/lib32"
zlib_inc="/usr/include"
export CPPFLAGS="-I${zlib_inc} ${CPPFLAGS}"
export LD_LIBRARY_PATH="${zlib_lib}:${LD_LIBRARY_PATH}"
export LDFLAGS="-L${zlib_lib} -Wl,-rpath=${zlib_lib} ${LDFLAGS}"
Here I've assumed that zlib is installed under /usr/lib32 and /usr/include/. To check this, look for the libz.so.1 file in the "lib" directory and the zlib.h file in the "inc" directory. If you find them somewhere else, simply change zlib_lib and zlib_inc accordingly.
IF you have different versions of Python installed it is likely the install is on another version. for example. I have pyperclip in 3.6.3 32bit version, but I an not access it in 3.6.3 64bit or 3.7.2dev.

Python: ImportError: lxml not found, please install it

I have the following code (in PyCharm (MacOS)):
import pandas as pd
fiddy_states = pd.read_html('https://simple.wikipedia.org/wiki/List_of_U.S._states')
print(fiddy_states)
And I get the following error:
/Library/Frameworks/Python.framework/Versions/3.6/bin/python3.6 /Users/user_name/PycharmProjects/PandasTest/Doc3.py
Traceback (most recent call last):
File "/Users/user_name/PycharmProjects/PandasTest/Doc3.py", line 9, in <module>
fiddy_states = pd.read_html('https://simple.wikipedia.org/wiki/List_of_U.S._states')
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/io/html.py", line 906, in read_html
keep_default_na=keep_default_na)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/io/html.py", line 733, in _parse
parser = _parser_dispatch(flav)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/io/html.py", line 693, in _parser_dispatch
raise ImportError("lxml not found, please install it")
ImportError: lxml not found, please install it
In Anaconda does appear installed the last version of lxml (3.8.0). Despite of that, I have tried to reinstall it by: 1) writing pip install lxml and 2) downloading the lxml wheel corresponding to my python version (lxml-3.8.0-cp36-cp36m-win_amd64.whl), but in any case all remains the same (in the second case I get that it is not a supported wheel on this platform, even though the version of python is correct (3.6, 64 bits)).
I've read similar questions here (even with the same code above, since it's from a tutorial), but the problem still persists.
Based on the fact that the error is:
/Library/Frameworks/Python.framework/Versions/3.6/bin/python3.6
This means that you are working with python-3.6. Now usually the package manager for python-3.x is pip3. So you probably should install it with:
pip3 install lxml
For people reached here using Jupyter notebook, I restarted the kernel after pip install lxml and the error is gone.
I got same error, it seems that my python3 was pointing to pandas in python2 (since I have not install pandas in python3). After doing pip3 install pandas and restarting a notebook, it worked fine.
you may have to (re)install some of your libraries pip install lxml bs4 html5lib
pd.read_html() reads with 'lxml' library by default, so try another library that you installed above like pd.read_html(some_url, flavor='html5lib')
You can go to Settings > Project Interpreter > Click on '+' icon
Find 'lxml' from the list of packages and click 'Install Package' button found below.
I am using PyCharm 2019.2.1 (Community Edition)
Build #PC-192.6262.63, built on August 22, 2019
Runtime version: 11.0.3+12-b304.39 amd64
VM: OpenJDK 64-Bit Server VM by JetBrains s.r.o
Linux 4.15.0-58-generic
GC: ParNew, ConcurrentMarkSweep
Memory: 937M
Cores: 4
I tried to reinstall lxml without any progress.
I ended uninstalling pandas and reinstalling and updating and that solved my issues!
pip uninstall pandas
pip install pandas
pip3 install --upgrade pandas
I got the same error when trying to run some code that was using pandas. I tried some suggestions here but those did not work. Finally, what worked for me was the following two steps :
conda update anaconda
conda install spyder=5.0.5
Now when I restarted Spyder and ran my code it worked fine.
I have just installed and starting using anaconda so I don't know the root cause of this issue, but my guess is there seemed to be some "cross-connection" in the packages I had installed prior to my installation of Anaconda, and by running the above two steps now everything is running from within the Anaconda environment.
This error occurs when lxml is not installed, so just go to the terminal
and run: pip3 install lxml
I got the same problem. Trying to reinstall lxml does not work. After rereading the error message and tracing the error ~\Miniconda3\envs\mini_ds\lib\site-packages\pandas\io\html.py:872, I think I found the problem lies in the function _importers() in ~/pandas/io/html.py.
Here is the function:
def _importers() -> None:
# import things we need
# but make this done on a first use basis
global _IMPORTS
if _IMPORTS:
return
global _HAS_BS4, _HAS_LXML, _HAS_HTML5LIB
bs4 = import_optional_dependency("bs4", errors="ignore")
_HAS_BS4 = bs4 is not None
lxml = import_optional_dependency("lxml.etree", errors="ignore")
_HAS_LXML = lxml is not None
html5lib = import_optional_dependency("html5lib", errors="ignore")
_HAS_HTML5LIB = html5lib is not None
_IMPORTS = True
You can see that for lxml option, it actually tries importing "lxml.etree" instead of "lxml". So this is probably why reinstalling "lxml" would not help.
Conclusion, I think this is perhaps a problem of pandas version (mine is 1.4.1). For me, a quick solution is to specify the flavor ='html5lib' in pd.read_html().
I installed lxml 4.9.1, but it didn't work. So I tried to install lxml 4.8.0 instead, and it worked!
pip install lxml==4.8
As OP is using Anaconda, in order to solve that issue, install lxml by opening the CMD.Exe Prompt for the environment one is working on, and run
conda install -c anaconda lxml
(Source)
One can also do it by specifying the version as follows
conda install -c anaconda lxml=4.8.0
Notes:
pip doesn't manage dependencies the same way conda does and can, potentially, damage one's installation. Therefore, would recommend to use it only if conda doesn't work.
pip install lxml
# or
pip install lxml==4.9.1
If one is using pip and one has already the package installed and one is getting errors, one can pass -I (--ignore-installed) and -v as follows
pip install -Iv lxml==4.9.1
lxml official documentation can be found here.
This is their official GitHub repo.
I was seeing this issue as well on my RPi.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/pi/python3-ml/lib/python3.7/site-packages/pandas/util/_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "/home/pi/python3-ml/lib/python3.7/site-packages/pandas/io/html.py", line 1113, in read_html
displayed_only=displayed_only,
File "/home/pi/python3-ml/lib/python3.7/site-packages/pandas/io/html.py", line 902, in _parse
parser = _parser_dispatch(flav)
File "/home/pi/python3-ml/lib/python3.7/site-packages/pandas/io/html.py", line 859, in _parser_dispatch
raise ImportError("lxml not found, please install it")
ImportError: lxml not found, please install it
Looking into /home/pi/python3-ml/lib/python3.7/site-packages/pandas/io/html.py it was attempting to use lxml.etree, so I attempted to just use that module
>>> from lxml import etree
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: libxslt.so.1: cannot open shared object file: No such file or directory
I searched for that error and found that the following packages needed to be installed on the RPi
sudo apt-get install libxslt
After installing I was successfully able to use pandas
import pandas as pd
from urllibenter code here.request import Request, urlopen
url = 'WEB-SITE'
request_site = Request(url, headers={"User-Agent": "Mozilla/5.0"})
webpage = urlopen(request_site)
dfk1 = pd.read_html(webpage, flavor='html5lib')
print(dfk1)

'Library not loaded: #rpath/libcudart.7.5.dylib' TensorFlow Error on Mac

I'm using OS X El Capitan (10.11.4).
I just downloaded TensorFlow using the pip install instructions here.
Everything went pretty smoothly, though I did get a few warning messages like:
The directory '/Users/myusername/Library/Caches/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want the -H flag.
and
You are using pip version 6.0.8, however version 8.1.2 is available. Even though I just installed pip.
Then, when I tested TensorFlow in Python, I got the error:
>>> import tensorflow as tf
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/__init__.py", line 23, in <module>
from tensorflow.python import *
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/__init__.py", line 48, in <module>
from tensorflow.python import pywrap_tensorflow
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/pywrap_tensorflow.py", line 28, in <module>
_pywrap_tensorflow = swig_import_helper()
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/pywrap_tensorflow.py", line 24, in swig_import_helper
_mod = imp.load_module('_pywrap_tensorflow', fp, pathname, description)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/imp.py", line 243, in load_module
return load_dynamic(name, filename, file)
ImportError: dlopen(/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/_pywrap_tensorflow.so, 10): Library not loaded: #rpath/libcudart.7.5.dylib
Referenced from: /Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/_pywrap_tensorflow.so
Reason: image not found
Now, when I try to do pip uninstall tensorflow-0.10.0rc0 it tells me that it's not installed.
The closest thing I've found to resembling this problem is this issue in the TensorFlow GitHub docs (which I have not tried).
How can I uninstall whatever it did install and get TensorFlow up and running correctly?
This error message is displayed if you install the GPU-enabled Mac OS version of TensorFlow (available from release 0.10 onwards) on a machine that does not have CUDA installed.
To fix the error, install the CPU version for Python 2.7 or 3.x, as follows:
# Mac OS X, CPU only, Python 2.7:
$ export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/mac/cpu/tensorflow-0.12.0-py2-none-any.whl
$ sudo pip install --upgrade $TF_BINARY_URL
# Mac OS X, CPU only, Python 3.4 or 3.5:
$ export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/mac/cpu/tensorflow-0.12.0-py3-none-any.whl
$ sudo pip3 install --upgrade $TF_BINARY_URL
See tensorflow versions: https://www.tensorflow.org/versions/
To add to #mrry's answer, if you already have CUDA installed but you still get the error, it could be because the CUDA libraries are not on your path. Add the following to your ~/.bashrc or ~/.zshrc:
# export CUDA_HOME=/Developer/NVIDIA/CUDA-7.5 ## This is the default location on macOS
export CUDA_HOME=/usr/local/cuda
export DYLD_LIBRARY_PATH="$CUDA_HOME/lib:$DYLD_LIBRARY_PATH"
export PATH="$CUDA_HOME/bin:$PATH"
Uncomment either of the CUDA_HOMEs or edit it so that it contains your CUDA install. If you do not know where it is installed, try:
find / -name "*libcudart*"
Certainly installing CUDA is essential, as is ensuring that all the paths are correct. I'm running:
TensorFlow 0.12r0
OSX 10.12.1
python 2.7 from brew
virtualenv to separate my python environments
CUDA 8.0.55
cudnn-8.0-osx-x64-v5.1
On my system I have also had further issues where it appears that the problem originates from the dynamic libraries internally referencing relative paths.
To discover the #rpath being referenced from _pywrap_tensorflow.so the following code is run:
otool -l /Users/norman_h/.virtualenvs/env_name/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow.so
which returned, amongst other things, the following:
Load command 15
cmd LC_RPATH
cmdsize 128
path $ORIGIN/../../_solib_darwin/_U#local_Uconfig_Ucuda_S_Scuda_Ccudart___Uexternal_Slocal_Uconfig_Ucuda_Scuda_Slib (offset 12)
Load command 16
cmd LC_RPATH
cmdsize 48
path ../local_config_cuda/cuda/lib (offset 12)
Load command 17
cmd LC_RPATH
cmdsize 56
path ../local_config_cuda/cuda/extras/CUPTI/lib (offset 12)
It can be seen that the dynamic library is attempting to find the CUDA libraries within my virtual environment where I installed TensorFlow with pip. It's not looking within my systems environment paths.
A hack around of a solution is to dynamically link the CUDA libraries from their /usr/local/cuda/lib location into the site-packages where pip installed TensorFlow inside my virtual environment.
mkdir /Users/norman_h/.virtualenvs/env_name/lib/python2.7/site-packages/tensorflow/local_config_cuda
cd /Users/norman_h/.virtualenvs/env_name/lib/python2.7/site-packages/tensorflow/local_config_cuda
ln -s /usr/local/cuda .
Will need to re-link when pip upgrades TensorFlow from within the virtual environment.
I think this all goes back to the original compilation of TensorFlow that is done for the pip install and I have no idea how to submit a fix, or even if I am correct. Perhaps the original compilation of Tensorflow needs to be more dynamic and not static.
Best of luck!
This issue popped up on my macOS when i tried to import pyTorch. I found the solution in a japanese site which can't make heads and tails out of which simply gave the solution as brew install libomp . Cheers! Sorry for posting to an old thread, but I thought it necessary.

Ghostscript Can not find Ghostscript library (libgs)

I'm on a macbookpro 10.6.8 and I get this error message when trying to use ghostscript:
Traceback (most recent call last):
File "/Users/arnoutaertgeerts/Documents/Eclips/SlideTalk 2.0/slidetalk.py", line 13, in <module>
import ghostscript
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/ghostscript-0.4.1-py2.7.egg/ghostscript/__init__.py", line 33, in <module>
import _gsprint as gs
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/ghostscript-0.4.1-py2.7.egg/ghostscript/_gsprint.py", line 290, in <module>
raise RuntimeError('Can not find Ghostscript library (libgs)')
RuntimeError: Can not find Ghostscript library (libgs)
Installed the package with:
pip install ghostscript
For newer user who are using M1 mac, ghostscript might show a missing libgs file error and the file would be unavailable at usr/local/lib
The issue can be resolved by following these steps, in the same order:
brew install ghostscript
conda install ghostscript, which installs the arm_64 based library from conda-forge
If conda throws a channel error try using conda install -c conda-forge ghostscript
pip install ghostscript
Note:
The libgs.dylib file can be found in the home brew installation of ghostscript
Changing the address in _gsprint.py will throw an error as the brew installed version will be arm_64 based and the pip installed version will be OS_X86 based
Python doesn't recognise ghostscript module from conda install unless pip install is run, so that is an essential step
How are you trying to 'use' Ghostscript ? This seems to be an error from Python which can't find libgs (I do not speak Python I'm afraid)
I'm not certain that libgs is is included in the Mac installation but if it is, then libgs should be in the Ghostscript folder. Have you checked to see if it is present ?
If it is, then the most likely problem is that its not in the search path, I have no idea how searches are resolved on a Mac though.
The ctypes.find_library searches in /urs/local/lib.
I added this path to my ghostscript module:
/opt/local/lib/libgs
I changed the path for libgs in the file "_gsprint.py" and it works~
Instead of libgs.so (libgs = cdll.LoadLibrary("libgs.so"), I used
libgs = cdll.LoadLibrary("Corresponding_Path_in_my_laptop/libgs.dylib").
Ps: There's no libgs.so on my Mac, only one libgs.dylib file.
Thanks #KenS and #arnoutaertgeerts!

Problems importing python-Xlib

I installed a new module and it appears as if one of its dependencies was not already installed. The module is called Xlib.display.
Here is the error message I received:
from Xlib.display import Display
ImportError: No module named Xlib.display
Where can I find this module that I am apparently lacking? Google yielded no leads.
"Edit: I already have that sourceforge module downloaded but I still get the same results.
Please try.
This shall install Xlib
sudo apt-get install python-xlib
Then you can check
>>from Xlib.display import Display
To install PyMouse if you want to control and capture mouse events please use:
sudo easy_install https://github.com/pepijndevos/PyMouse/zipball/master
Below worked for me!
pip install python3_xlib
I have also used pyuserinput for automation which requires this.
I was having the same problem, but the solutions above didn't work for me. Since I had installed python through the anaconda package, when I used:
sudo apt-get install python-xlib
Xlib was still undetectable by python2. The solution in my case was to use:
anaconda search -t conda python-xlib
Then find the package from the anaconda api, mine was erik/python-xlib. Install it using:
conda install --channel https://conda.anaconda.org/erik python-xlib
Then it worked.
On Debian systems install python-xlib.
On other systems there's a high probability that the package carries the same name.
I don't think the Xlib library works in Python 3.
Source:
Requirements
The Python X Library requires Python 1.5.2 or newer. It has been tested to various extents with Python 1.5.2 and 2.0 through 2.6.
I honestly cant explain why this works... but here is the command that got it working for me.
sudo apt-get install python3-xlib
Should not work because xlib apparently does not work with python 3.x, but everything installed alright, so I'm not complaining!
I was looking for the same answer, however after some more digging it seems that XCB (X protocol C-language Binding) will obsolete Xlib in general. From the XCB website:
The X protocol C-language Binding (XCB) is a replacement for Xlib featuring a small footprint, latency hiding, direct access to the protocol, improved threading support, and extensibility.
Fortunately there are python bindings available as python-xpyb in apt or xpyb on PyPi. I've not gotten that far in my project so I haven't tested if this works with Python3, but this is probably the way to go and the proper place to file any Python3 support bugs if necessary.
Scenario:
I was trying to use screenshot functionalities of pyautogui package. I was getting this error:
Traceback (most recent call last):
File "test_screenshot.py", line 1, in <module>
import pyautogui
File ".../miniconda3/envs/myenv/lib/python3.7/site-packages/pyautogui/__init__.py", line 152, in <module>
from . import _pyautogui_x11 as platformModule
File ".../miniconda3/envs/myenv/lib/python3.7/site-packages/pyautogui/_pyautogui_x11.py", line 7, in <module>
from Xlib.display import Display
ModuleNotFoundError: No module named 'Xlib'
Python code (test_screenshot.py):
import pyautogui
img = pyautogui.screenshot('test.png')
Environment:
Ubuntu 16.04 (LTS)
conda 4.5.11
Python 3.7 (Miniconda)
requirements.txt:
certifi==2019.3.9
Pillow==5.4.1
PyAutoGUI==0.9.42
PyGetWindow==0.0.4
PyMsgBox==1.0.6
PyRect==0.1.4
PyScreeze==0.1.20
PyTweening==1.0.3
Solution:
I installed python-xlib package in the conda environment using:
pip install python-xlib
Now test_screenshot.py is running without any error.
Updated requirements.txt:
certifi==2019.3.9
Pillow==5.4.1
PyAutoGUI==0.9.42
PyGetWindow==0.0.4
PyMsgBox==1.0.6
PyRect==0.1.4
PyScreeze==0.1.20
python-xlib==0.25
PyTweening==1.0.3
six==1.12.0

Categories

Resources