I'm facing the issue below. Can anyone help, please?
I get the error below while trying to extract table data from PDFs.
import camelot
# PDF file to extract tables from
file = input_folder+file_name
tables = camelot.read_pdf(file)
# number of tables extracted
print("Total tables extracted:", tables.n)
# print the first table as Pandas DataFrame
print(tables[0].df)
Error: AttributeError: module 'camelot' has no attribute 'read_pdf'
This error most likely occurred because you installed the wrong package.
When you installed the camelot module, you should have used this:
pip install camelot-py[cv]
If not, uninstall the package you installed and use the above command.
I encountered the same problem and tried many things, including installing and uninstalling various camelot packages, cloning the git repository, etc. None of it worked for me. I found that the issue was related to cv2. Server (headless) environments do not have GUI packages installed, so if you are using Camelot on a server with no GUI, you should install opencv-python-headless first:
pip install opencv-python-headless
and then import it along with camelot.io instead of camelot:
import camelot.io as camelot
import cv2
I've been using dataframe_image for a while and have had great results so far. Last week, all of a sudden, all my code that calls dfi.export() stopped working, with this error as output:
raise SyntaxError("not a PNG file")
File <string>
SyntaxError: not a PNG file
I can export the images passing the argument table_conversion='matplotlib' but they do not come out styled...
This is my code:
now = str(datetime.now())
filename = ("Extracciones-"+now[0:10]+".png")
df_styled = DATAFINAL.reset_index(drop=True).style.apply(highlight_rows, axis=1)
dfi.export(df_styled, filename,max_rows=-1)
IMAGEN = Image.open(filename)
IMAGEN.show()
Any clues on why this just suddenly stopped working?
Or any ideas to export dataframes as images (not using html)?
These were the outputs i used to get:
fully styled dataframe images
and this is the only thing I can get right now
Thank you in advance
dataframe_image has a dependency on Chrome, and a recent Chrome update (possibly v109 on 2023-01-10) broke dataframe_image. v0.1.5 was released on 2023-01-14 to fix this.
pip install --upgrade dataframe_image
pip show dataframe_image
The version should now be v0.1.5 or later, which should resolve the problem.
Some users have reported still having the error even after upgrading. This could be due to upgrading the package in the wrong directory (due to multiple installations of python, pip, virtual envs, etc). The reliable way to check the actual version of dataframe_image that the code is using, is to add this debugging code to the top of your python code:
import pandas as pd
import dataframe_image as dfi
from importlib.metadata import version
print(version('dataframe_image'))
df = pd.DataFrame({'x':[1,2]})
dfi.export(df, 'out.png')
exit()
Also check chrome://version/ in your Chrome browser.
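The version check can also be made explicit in code. The sketch below compares the installed version against 0.1.5 using only the standard library. version_tuple and meets_minimum are hypothetical helper names, and the parser only handles plain dotted numeric versions (anything after a non-numeric part is ignored).

```python
from importlib.metadata import PackageNotFoundError, version


def version_tuple(text):
    """Turn '0.1.5' into (0, 1, 5); parsing stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(dist_name, minimum):
    """Return True/False, or None when the distribution is not installed
    for the interpreter running this code."""
    try:
        installed = version(dist_name)
    except PackageNotFoundError:
        return None
    return version_tuple(installed) >= version_tuple(minimum)


if __name__ == "__main__":
    # 0.1.5 is the fixed release mentioned above.
    print(meets_minimum("dataframe_image", "0.1.5"))
```

A result of None here would suggest the package was upgraded for a different Python installation than the one running the script.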
I had the same issue, and we finally figured it out; looks like the new release of dataframe_image on 1/14 broke something on the previous version.
We upgraded the package and the issue was resolved.
pip install -U dataframe_image
I fixed this issue.
It was related to resources that are busy in the background and chrome can't use these resources.
https://github.com/dexplo/dataframe_image/pull/70
Please update the library to v0.1.5.
You can do it via pip:
python -m pip install --upgrade dataframe-image
Good luck.
I am trying to replicate the results of the passing-networks-in-python repository. I have installed the dependencies listed in requirements.txt and downloaded the StatsBomb and Metrica Sports data into the eventing and tracking folders.
However, when trying to run prepare_vaep.py I get ModuleNotFoundError: No module named 'socceraction.classification' returned.
Could this be an issue with the version I am using (3.7.6)?
It seems the socceraction module was updated and no longer includes the classification package (or it was moved). Either update the import of socceraction.classification to the correct module, or install a specific version using pip install socceraction==<version_num>
Check socceraction Github for source code
Here is a specific commit in package structure changes
EDIT: change any import socceraction.classification to import socceraction.vaep (change any children that use classification as well) if you want to use latest code.
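If the same script has to run against both old and new releases, one option is to try the candidate module names in order. This is a rough sketch: first_importable is a hypothetical helper, and the only socceraction names it relies on are the two mentioned in this answer.

```python
import importlib


def first_importable(candidates):
    """Import and return the first module in `candidates` that exists.

    Hypothetical helper; it also skips a module that exists but fails
    with ModuleNotFoundError because one of its own imports is missing.
    """
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ModuleNotFoundError:
            continue
    raise ModuleNotFoundError(f"none of {candidates!r} could be imported")


# Usage (assumes socceraction is installed; names taken from the answer above):
# vaep_module = first_importable(
#     ["socceraction.vaep", "socceraction.classification"]
# )
```

Whether the functions inside the two modules keep the same names across releases is not something this sketch checks, so any children of classification still need to be reviewed by hand.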
I know that this is a simple question... I have gone through a few of the other questions related to ModuleNotFoundError with PyCharm, and I tried uninstalling docxtpl via pip, but to no avail.
Looking in the library I see docxtpl, so I am a bit confused.
I also uninstalled and reinstalled lxml via pip, as that seemed to cause issues with docxtpl for other people.
code:
from docxtpl import DocxTemplate
error message:
ModuleNotFoundError: No module named 'docxtpl'
Follow my steps carefully:
Go to PyCharm settings
Search for Project Interpreter
Click on the + icon in the window
Then search for docxtpl in the new window and then click on it
Then select Install Package
Full fledged tutorial here: https://www.jetbrains.com/help/pycharm/installing-uninstalling-and-upgrading-packages.html
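Before or after following those steps, it can be worth checking which interpreter PyCharm is actually running and whether that interpreter can see the module. A minimal sketch, run from inside the PyCharm project (locate is a hypothetical helper name):

```python
import importlib.util
import sys


def locate(module_name):
    """Return (interpreter path, module file path or None if not found)."""
    # find_spec returns None for a missing top-level module
    # instead of raising, so no import is attempted here.
    spec = importlib.util.find_spec(module_name)
    origin = spec.origin if spec is not None else None
    return sys.executable, origin


if __name__ == "__main__":
    interpreter, origin = locate("docxtpl")
    print("interpreter:", interpreter)
    print("docxtpl found at:", origin)
```

If the interpreter path differs from the Python that pip installed into, the package is probably installed, just not for the project's interpreter.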
A .py program works, but the exact same code doesn't work when exposed as an API.
The code reads the PDF with Tabula and returns the table content as output.
I've tried :
import tabula
df = tabula.read_pdf("my_pdf")
print(df)
and
from tabula import wrapper
df = wrapper.read_pdf("my_pdf")
print(df)
I've installed tabula-py (not tabula) on AWS EC2 running Ubuntu.
More than read_pdf, I actually want to convert to CSV and return the output. But that doesn't work either. I get the same no-attribute error, i.e. module 'tabula' has no attribute 'convert_into'.
The .py file and the API file (.py as well) are in the same directory and are accessed with the same user.
Any help will be highly appreciated.
EDIT: I tried to run the same Python file from the API as an OS command (os.system("python3 /home/ubuntu/flaskapp/tabler.py")). But that didn't work either.
make sure that you installed tabula-py not just tabula
use
!pip install tabula-py
and to import it use
from tabula.io import read_pdf
There is actually an entry in the FAQ about this issue specifically :
If you’ve installed tabula, it will be conflict the namespace. You should install tabula-py after removing tabula.
Although using read_pdf() from tabula.io worked, as suggested by other answers, I was also able to use tabula.read_pdf() after having removed tabula and reinstalled tabula-py (using pip install --force-reinstall tabula-py).
If you accidentally installed tabula before installing tabula-py, they'll conflict in the namespace (even after uninstalling tabula).
Uninstall tabula-py and re-install it. That did the trick for me.
There is something off with tabula package. I looked inside and there is no __init__.py. You can do:
from tabula.io import read_pdf
it worked for me.
from tabula import read_pdf didn't work for me. I've replaced tabula.read_pdf() by tabula.io.read_pdf() to make it work.
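The workaround in these answers (use the top-level name if it exists, otherwise reach into the io submodule) can be written once as a small fallback. This is only a sketch: resolve is a hypothetical helper, and the tabula names in the usage comment are the ones quoted in this thread.

```python
import importlib


def resolve(module_name, attr, submodule):
    """Return `attr` from the module, else from `module_name.submodule`.

    Hypothetical helper for illustration only.
    """
    module = importlib.import_module(module_name)
    if hasattr(module, attr):
        return getattr(module, attr)
    # Fall back to the submodule, e.g. tabula.io in the answers above.
    fallback = importlib.import_module(f"{module_name}.{submodule}")
    return getattr(fallback, attr)


# Usage (assumes tabula-py is installed):
# read_pdf = resolve("tabula", "read_pdf", "io")
# convert_into = resolve("tabula", "convert_into", "io")
```

This hides the namespace conflict rather than fixing it, so removing tabula and reinstalling tabula-py is still the cleaner route.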
If you are working in Colab, you have to install it with this command:
!pip install -q tabula-py
import tabula
and to use functions like read_pdf and convert_into, we have to use
dfs = tabula.io.read_pdf(path, stream=True)
Note: tabula.io should be used to access these functions in Colab.
have a good day and long live Data science community.
try
from tabula import read_pdf
I had the same problem, and this fixed it.
It is working this way:
import tabula # just this here!
#declare the path of your file
file_path = "/path/to/pdf_file/data.pdf"
#Convert your file
df = tabula.io.read_pdf(file_path)
That is all!
Trying to access the todoist api, and I copied some code from the api documentation. However, on my system I get an error stating:
Unable to import 'todoist.api' pylint(import-error).
I installed it with:
pip install todoist-python
as mentioned in the documentation
from todoist.api import TodoistAPI
I get my error on the very first line. How do I not get this?
You did everything right, so it's probably related to the way your installation is set.
Be sure you are using the same python you used to install the library. Check if the library is installed (pip list) and check if you're using the right Python when running the code. It's possible that the library was installed in one version and you're using the other.
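One way to rule out a pip/Python mismatch is to call pip through the interpreter that runs your code, rather than through whichever pip is first on the PATH. A small sketch (pip_command is a hypothetical helper; it only builds the command, running it is left to you):

```python
import shlex
import sys


def pip_command(*args):
    """Build a pip invocation bound to the interpreter running this code."""
    return [sys.executable, "-m", "pip", *args]


if __name__ == "__main__":
    # Print the command; it can be pasted into a shell or passed to
    # subprocess.run(). 'todoist-python' is the name from the question.
    print(shlex.join(pip_command("show", "todoist-python")))
```

If the editor's linter is configured with a different interpreter than the one printed here, pylint may keep reporting the import error even though the code runs.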
I had the same problem, I solved it by following the GitHub instructions, but the name of the module to install using pip is todoist.