Tabula-py - ImportError: No module named tabula - python

I am trying to use Tabula-py to read a pdf. I installed tabula-py through pip install tabula-py
I have also installed the required dependencies
requests
pandas
pytest
flake8
My code is currently as follows:
import tabula
import pandas as pd
df = tabula.read_pdf("report.pdf", pages=2)
print(df)
I am getting the following error:
Traceback (most recent call last):
File "tabula_pdf_reader.py", line 1, in <module>
import tabula
ImportError: No module named tabula
Any inputs to what I am missing here?

I faced this same issue in Ubuntu.
First, check the versions of the JDK and JRE installed on your machine by running java -version and javac -version. Each should be newer than version 7.
Then use pip3 to install tabula-py: pip3 install tabula-py

I got the same issue when running from the terminal.
However, when I started 'ipython3' instead of 'ipython', it worked perfectly.
You have to make sure that the tabula-py module is installed for Python 3, not Python 2.
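One way to confirm which interpreter is running, and whether it can see the package at all, is a small check like the one below. This is only a sketch using the standard library; the helper name describe_module is made up for illustration.

```python
import importlib.util
import sys


def describe_module(name):
    """Return (interpreter path, module file or None) for a top-level module."""
    # find_spec returns None when the running interpreter cannot find the module.
    spec = importlib.util.find_spec(name)
    location = spec.origin if spec is not None else None
    return sys.executable, location


interpreter, location = describe_module("tabula")
print("interpreter:", interpreter)
print("tabula found at:", location)
```

If the second line prints None, tabula-py is not installed for the interpreter shown on the first line, even if pip reported a successful install for another one.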

use this
import camelot
tables = camelot.read_pdf('foo.pdf')
tables.export('foo.csv', f='csv', compress=True)

For macOS users: updating to macOS Monterey may solve the problem.

Related

ModuleNotFoundError: No module named 'camelot'

I want to extract tables from pdf and for that
I used Camelot. But I'm getting this error whenever I try to import it:
import camelot
Traceback (most recent call last):
File "<ipython-input-11-679d8f55abf0>", line 1, in <module>
import camelot
ModuleNotFoundError: No module named 'camelot'
I've tried installing camelot using:
pip install camelot-py[cv]
and
pip install camelot-py[all]
but I'm getting the same error again and again. How do I remove this?
Your help would be appreciated!
Check your Python version by running python --version in the command prompt, using the path where Python is installed.
For python 3.7, try:
pip install camelot-py
https://pypi.org/project/camelot-py/
I hope this works for you.
If using conda (this is what I'd recommend):
conda install -c conda-forge camelot-py
If using pip (may have to manually handle dependencies): pip install camelot-py[cv]
Official installation instructions: https://camelot-py.readthedocs.io/en/master/user/install.html#install
Try installing Camelot into the correct Python version's directory using
python2.7 -m pip install camelot-py
Use your own Python version number in place of 2.7 above.
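The same idea can be expressed from inside Python, so that pip is always tied to the interpreter that will later run the import. A minimal sketch, assuming nothing beyond the standard library; pip_install_command is a hypothetical helper name.

```python
import sys


def pip_install_command(package):
    """Build a pip command bound to the interpreter running this script."""
    # sys.executable is the full path of the current Python binary, so the
    # package lands in this interpreter's site-packages and not another one's.
    return [sys.executable, "-m", "pip", "install", package]


command = pip_install_command("camelot-py")
print(" ".join(command))
# To actually run it: import subprocess; subprocess.check_call(command)
```

Building the command separately from running it makes it easy to inspect which Python binary is being used before anything is installed.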
You also need the pandas library installed in your Python environment.
You can install Camelot with the following command (the package name on PyPI is camelot-py, not Camelot):
pip install camelot-py
After installing it, the ModuleNotFoundError: No module named 'camelot' error should be resolved.
Thanks

AttributeError: module 'camelot' has no attribute 'read_pdf'

I am trying to extract tables from pdf using camelot and I get this attribute error. Could you please help?
import camelot
import pandas as pd
pdf = camelot.read_pdf("Gordian.pdf")
AttributeError Traceback (most recent call last)
in
----> 1 pdf = camelot.read_pdf("Gordian.pdf")
AttributeError: module 'camelot' has no attribute 'read_pdf'
Note: if you are using a virtual environment, activate it before doing the following.
I have faced this error before. There is no bug in your code; the problem is with the camelot installation.
1 Remove the installed camelot version.
2 Install it again. There are multiple ways to install camelot; try them one at a time:
pip install camelot-py
pip install camelot-py[cv]
pip install camelot-py[all]
3 Run your code. Here is some sample code:
import camelot
data = camelot.read_pdf("test_file.pdf", pages='all')
print(data)
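To see whether the installation is the problem, it can help to check which file the name camelot actually resolves to and whether that module exposes read_pdf. The sketch below is an illustration only; inspect_attribute is a made-up helper name.

```python
import importlib


def inspect_attribute(module_name, attribute):
    """Import a module and return (its file path, whether it has the attribute)."""
    module = importlib.import_module(module_name)
    # A module loaded from an unexpected path (a different package named
    # camelot, or a local camelot.py) can be a cause of this error.
    return getattr(module, "__file__", None), hasattr(module, attribute)


try:
    print(inspect_attribute("camelot", "read_pdf"))
except ImportError as exc:
    print("camelot could not be imported:", exc)
```

If the printed path points at your own project directory or at a package other than camelot-py, the import is picking up the wrong module.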
Try this: import camelot.io as camelot
That worked for me.
Please check whether Java is installed on your machine: open a terminal and run "java -version". Without Java you won't be able to read PDFs using tabula-py, which wraps the Java library tabula-java.
Once Java is installed, install tabula-py using the command
pip install tabula-py
from tabula.io import read_pdf
tables = read_pdf('file.pdf') # substitute your file name
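I abandoned trying to get camelot to work in Jupyter Notebooks to read tables and instead installed the following:
import sys
from pathlib import Path
# Install into the same interpreter the notebook kernel is running
!{sys.executable} -m pip install tabula-py tabulate
from tabula import read_pdf
from tabulate import tabulate
pdf_path = Path.home() / "my_pdf.pdf"
# read_pdf returns a list of DataFrames, one per table found
df = read_pdf(str(pdf_path), pages=1)
df[0]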
Here's the link with full installation steps:
https://camelot-py.readthedocs.io/en/master/user/install.html#using-pip
After you install
pip install camelot-py[cv]
Write this:
import camelot.io as camelot
pip uninstall camelot
pip uninstall camelot-py
pip install camelot-py[cv]
install ghostscript app from internet
! apt install ghostscript python3-tk
pip install ghostscript
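Since camelot and camelot-py are separate distributions on PyPI, it can be useful to check which of them is installed before and after the uninstall/reinstall steps. A minimal sketch using importlib.metadata (Python 3.8+); installed_version is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


for name in ("camelot", "camelot-py"):
    print(name, "->", installed_version(name))
```

After the steps above, one would expect camelot to report None and camelot-py to report a version number.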
When installing the library, pay attention to where it gets installed: it may have ended up under a different Python version than the one you are running.

No module named 'pandas._libs.tslib'

I am not able to import pandas
C:\Users\Yash\Desktop\Python\Twitter Sentimental Analysis>python import.py
Traceback (most recent call last):
File "C:\Users\Yash\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\__init__.py", line 26, in <module>
from pandas._libs import (hashtable as _hashtable,
File "C:\Users\Yash\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\_libs\__init__.py", line 4, in <module>
from .tslib import iNaT, NaT, Timestamp, Timedelta, OutOfBoundsDatetime
ModuleNotFoundError: No module named 'pandas._libs.tslib'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "import.py", line 4, in <module>
import pandas as pd
File "C:\Users\Yash\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\__init__.py", line 35, in <module>
"the C extensions first.".format(module))
ImportError: C extension: No module named 'pandas._libs.tslib' not built. If you want to import pandas from the source directory, you may need to run 'python setup.py build_ext --inplace --force' to build the C extensions first.
I tried screwing around but this error stayed the same.
I have updated the modules already along with pip and python!
This is the full traceback of the command.
I am currently using python 3.6.6 (downloaded from the official site)
pip version : 18.1 running on windows 10 laptop!!
I faced a similar issue and solved it by manually uninstalling pandas and then installing pandas using pip. You have mentioned that you have only updated pandas. So I assume you haven't tried re-installing it.
While doing so pandas version in my environment changed from 0.23.4 to 0.24.1
My Environment :
python 3.6.7
pip 18.1
Note : I am also a beginner in Python usage. More experienced users may know a better way.
pip uninstall pandas
pip install pandas
The above steps solved my issues and I am able to import pandas.
I checked the release notes in pandas community and it seems like the dependency on tslib has been removed.
Check section 1.5 in the below link and search for tslib.
http://pandas.pydata.org/pandas-docs/version/0.24/pandas.pdf
I faced the same error and resolved it by calling the following commands:
pip uninstall pandas
pip install pandas
pip3 install --upgrade pandas
I was facing the same error. The solutions above didn't work for me; here is what did.
If you have two different Python environments and run files from both, first uninstall pandas from both environments, then install it in the new one.
For example, I have python3.6 and python3.9 installed, so I first uninstalled pandas from 3.6:
sudo pip3.6 uninstall pandas
I repeated this command several times until all versions of pandas were uninstalled. After that, I installed pandas in 3.9 using this command:
/usr/bin/python3.9 -m pip install pandas
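To see whether more than one copy of a package is visible to a given interpreter, something like the following could be run under each Python version in turn. It is a rough sketch that only looks for directories directly on sys.path; find_package_copies is a made-up name.

```python
import os
import sys


def find_package_copies(name, search_paths=None):
    """List directories called `name` found directly under the search paths."""
    paths = sys.path if search_paths is None else search_paths
    copies = []
    for path in paths:
        # An empty sys.path entry stands for the current working directory.
        candidate = os.path.join(path or os.getcwd(), name)
        if os.path.isdir(candidate) and candidate not in copies:
            copies.append(candidate)
    return copies


print(find_package_copies("pandas"))
```

More than one entry in the output suggests that leftover copies are still installed for that interpreter.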

No module named 'gspread', ModuleNotFoundError. Problem with import in Python

I seem to have a very common problem although nothing I try works for me:
I have installed Python 3.6.5 for Windows 64 bit and am using Vs Code for editing.
I used Ubuntu to install pip3 and then installed gspread as follows:
pip3 install gspread
However, import gspread gives an error:
Traceback (most recent call last):
File "c:\Users\User\Documents\Vs Code\test.py", line 2, in <module>
import gspread
ModuleNotFoundError: No module named 'gspread'
How do I fix this problem to import gspread?
You can enter this in an IPython console:
In [1]: pip install gspread

Python: ImportError: lxml not found, please install it

I have the following code (in PyCharm (MacOS)):
import pandas as pd
fiddy_states = pd.read_html('https://simple.wikipedia.org/wiki/List_of_U.S._states')
print(fiddy_states)
And I get the following error:
/Library/Frameworks/Python.framework/Versions/3.6/bin/python3.6 /Users/user_name/PycharmProjects/PandasTest/Doc3.py
Traceback (most recent call last):
File "/Users/user_name/PycharmProjects/PandasTest/Doc3.py", line 9, in <module>
fiddy_states = pd.read_html('https://simple.wikipedia.org/wiki/List_of_U.S._states')
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/io/html.py", line 906, in read_html
keep_default_na=keep_default_na)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/io/html.py", line 733, in _parse
parser = _parser_dispatch(flav)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/io/html.py", line 693, in _parser_dispatch
raise ImportError("lxml not found, please install it")
ImportError: lxml not found, please install it
Anaconda shows the latest version of lxml (3.8.0) as installed. Despite that, I have tried to reinstall it by 1) running pip install lxml and 2) downloading the lxml wheel corresponding to my Python version (lxml-3.8.0-cp36-cp36m-win_amd64.whl), but nothing changes. In the second case I am told that the wheel is not supported on this platform, even though the Python version is correct (3.6, 64-bit).
I've read similar questions here (even with the same code above, since it's from a tutorial), but the problem still persists.
Based on the fact that the error is:
/Library/Frameworks/Python.framework/Versions/3.6/bin/python3.6
This means that you are working with python-3.6. Now usually the package manager for python-3.x is pip3. So you probably should install it with:
pip3 install lxml
For people who reached here using a Jupyter notebook: I restarted the kernel after pip install lxml and the error was gone.
I got the same error. It seems that my python3 was pointing to pandas in python2 (since I had not installed pandas in python3). After running pip3 install pandas and restarting the notebook, it worked fine.
You may have to (re)install some of your libraries: pip install lxml bs4 html5lib
pd.read_html() parses with the 'lxml' library by default, so try one of the other libraries installed above, e.g. pd.read_html(some_url, flavor='html5lib')
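The choice of flavor can also be made automatically by testing which parsers actually import. This is a sketch under the assumption that "lxml" and "html5lib" are the flavor names you want to choose between; can_import and choose_flavor are hypothetical helpers, not part of pandas.

```python
import importlib


def can_import(name):
    """Return True if the module imports cleanly in this interpreter."""
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False


def choose_flavor():
    """Pick a pandas.read_html flavor based on the parsers that import."""
    if can_import("lxml.etree"):
        return "lxml"
    # The html5lib flavor needs both bs4 and html5lib.
    if can_import("bs4") and can_import("html5lib"):
        return "html5lib"
    return None


print(choose_flavor())
```

When the result is not None, it can be passed on as pd.read_html(some_url, flavor=choose_flavor()).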
You can go to Settings > Project Interpreter > Click on '+' icon
Find 'lxml' from the list of packages and click 'Install Package' button found below.
I am using PyCharm 2019.2.1 (Community Edition)
Build #PC-192.6262.63, built on August 22, 2019
Runtime version: 11.0.3+12-b304.39 amd64
VM: OpenJDK 64-Bit Server VM by JetBrains s.r.o
Linux 4.15.0-58-generic
GC: ParNew, ConcurrentMarkSweep
Memory: 937M
Cores: 4
I tried to reinstall lxml without any progress.
I ended up uninstalling pandas, reinstalling it, and upgrading it, and that solved my issues!
pip uninstall pandas
pip install pandas
pip3 install --upgrade pandas
I got the same error when trying to run some code that was using pandas. I tried some suggestions here but those did not work. Finally, what worked for me was the following two steps :
conda update anaconda
conda install spyder=5.0.5
Now when I restarted Spyder and ran my code it worked fine.
I have just installed and starting using anaconda so I don't know the root cause of this issue, but my guess is there seemed to be some "cross-connection" in the packages I had installed prior to my installation of Anaconda, and by running the above two steps now everything is running from within the Anaconda environment.
This error occurs when lxml is not installed, so just go to the terminal
and run: pip3 install lxml
I got the same problem. Trying to reinstall lxml does not work. After rereading the error message and tracing the error ~\Miniconda3\envs\mini_ds\lib\site-packages\pandas\io\html.py:872, I think I found the problem lies in the function _importers() in ~/pandas/io/html.py.
Here is the function:
def _importers() -> None:
# import things we need
# but make this done on a first use basis
global _IMPORTS
if _IMPORTS:
return
global _HAS_BS4, _HAS_LXML, _HAS_HTML5LIB
bs4 = import_optional_dependency("bs4", errors="ignore")
_HAS_BS4 = bs4 is not None
lxml = import_optional_dependency("lxml.etree", errors="ignore")
_HAS_LXML = lxml is not None
html5lib = import_optional_dependency("html5lib", errors="ignore")
_HAS_HTML5LIB = html5lib is not None
_IMPORTS = True
You can see that for lxml option, it actually tries importing "lxml.etree" instead of "lxml". So this is probably why reinstalling "lxml" would not help.
In conclusion, I think this is perhaps a problem with the pandas version (mine is 1.4.1). For me, a quick solution is to specify flavor='html5lib' in pd.read_html().
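Because the import errors are ignored inside that function, the real reason lxml.etree fails to load never reaches the user. Trying the imports by hand surfaces it. A minimal sketch; probe_imports is a made-up helper name.

```python
import importlib


def probe_imports(names):
    """Try importing each module; map name -> None on success or the error text."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = None
        except ImportError as exc:
            # Covers a missing module as well as a missing shared library.
            results[name] = str(exc)
    return results


for name, error in probe_imports(["lxml", "lxml.etree", "bs4", "html5lib"]).items():
    print(name, "->", "ok" if error is None else error)
```

If lxml imports but lxml.etree does not, the error text usually says what the compiled extension is missing.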
I installed lxml 4.9.1, but it didn't work. So I tried to install lxml 4.8.0 instead, and it worked!
pip install lxml==4.8
As OP is using Anaconda, in order to solve that issue, install lxml by opening the CMD.Exe Prompt for the environment one is working on, and run
conda install -c anaconda lxml
One can also do it by specifying the version as follows
conda install -c anaconda lxml=4.8.0
Notes:
pip doesn't manage dependencies the same way conda does and can potentially damage one's installation. I would therefore recommend using it only if conda doesn't work.
pip install lxml
# or
pip install lxml==4.9.1
If one is using pip, already has the package installed, and is still getting errors, one can pass -I (--ignore-installed) and -v as follows
pip install -Iv lxml==4.9.1
I was seeing this issue as well on my RPi.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/pi/python3-ml/lib/python3.7/site-packages/pandas/util/_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "/home/pi/python3-ml/lib/python3.7/site-packages/pandas/io/html.py", line 1113, in read_html
displayed_only=displayed_only,
File "/home/pi/python3-ml/lib/python3.7/site-packages/pandas/io/html.py", line 902, in _parse
parser = _parser_dispatch(flav)
File "/home/pi/python3-ml/lib/python3.7/site-packages/pandas/io/html.py", line 859, in _parser_dispatch
raise ImportError("lxml not found, please install it")
ImportError: lxml not found, please install it
Looking into /home/pi/python3-ml/lib/python3.7/site-packages/pandas/io/html.py it was attempting to use lxml.etree, so I attempted to just use that module
>>> from lxml import etree
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: libxslt.so.1: cannot open shared object file: No such file or directory
I searched for that error and found that the following package needed to be installed on the RPi
sudo apt-get install libxslt1.1
After installing I was successfully able to use pandas
import pandas as pd
from urllib.request import Request, urlopen
url = 'WEB-SITE'
request_site = Request(url, headers={"User-Agent": "Mozilla/5.0"})
webpage = urlopen(request_site)
dfk1 = pd.read_html(webpage, flavor='html5lib')
print(dfk1)
