A .py program works, but the exact same code doesn't work when exposed as an API.
The code reads a PDF with Tabula and returns the table content as output.
I've tried :
import tabula
df = tabula.read_pdf("my_pdf")
print(df)
and
from tabula import wrapper
df = wrapper.read_pdf("my_pdf")
print(df)
I've installed tabula-py (not tabula) on AWS EC2 running Ubuntu.
More than read_pdf, I actually want to convert to CSV and return the output. But that doesn't work either. I get the same no-attribute error, i.e. module 'tabula' has no attribute 'convert_into'.
The .py file and the API file (.py as well) are in the same directory and are accessed with the same user.
Any help will be highly appreciated.
EDIT: I tried to run the same Python file from the API as an OS command (os.system("python3 /home/ubuntu/flaskapp/tabler.py")). But that didn't work either.
Make sure that you installed tabula-py, not just tabula.
Use (the leading ! is for notebooks such as Colab or Jupyter; omit it in a shell):
!pip install tabula-py
and to import it, use:
from tabula.io import read_pdf
There is actually an entry in the FAQ about this issue specifically :
If you’ve installed tabula, it will conflict with the namespace. You should install tabula-py after removing tabula.
Although using read_pdf() from tabula.io worked, as suggested by other answers, I was also able to use tabula.read_pdf() after having removed tabula and reinstalled tabula-py (using pip install --force-reinstall tabula-py).
If you accidentally installed tabula before installing tabula-py, they'll conflict in the namespace (even after uninstalling tabula).
Uninstall tabula-py and re-install it. That did the trick for me.
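Before reinstalling, it may help to check which package the name tabula actually resolves to in each environment (the standalone script versus the API process). The following is only a minimal sketch: describe_module is a made-up helper name, not part of tabula-py, and it is demonstrated on the standard-library json module so that it runs anywhere.

```python
import importlib
import sys


def describe_module(name, attrs):
    """Report where a module is loaded from and which attributes it exposes.

    `describe_module` is a hypothetical helper, not part of tabula-py.
    """
    try:
        module = importlib.import_module(name)
    except ImportError:
        return {"interpreter": sys.executable, "found": False, "file": None, "attrs": {}}
    return {
        "interpreter": sys.executable,
        "found": True,
        # Namespace packages (no __init__.py) may have no usable __file__.
        "file": getattr(module, "__file__", None),
        "attrs": {attr: hasattr(module, attr) for attr in attrs},
    }


# Demonstrated on a stdlib module; swap in "tabula" and
# ["read_pdf", "convert_into"] when debugging the real problem.
report = describe_module("json", ["loads", "no_such_function"])
print(report)
```

Running this once from the working script and once from inside the API, then comparing the interpreter and file entries, should show whether the two are using different Python installations or a leftover tabula package.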
There is something off with the tabula package. I looked inside and there is no __init__.py. You can do:
from tabula.io import read_pdf
It worked for me.
from tabula import read_pdf didn't work for me. I've replaced tabula.read_pdf() by tabula.io.read_pdf() to make it work.
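Since different answers report different import paths working, one option is to try them in order. This is a rough sketch, not an official tabula-py pattern: first_available is a hypothetical helper, and the demonstration uses stdlib modules so it can run without tabula-py installed.

```python
import importlib


def first_available(module_names, attr):
    """Return `attr` from the first module in `module_names` that provides it.

    Hypothetical helper for illustration; raises ImportError if none match.
    """
    for module_name in module_names:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # not importable under this name, try the next one
        if hasattr(module, attr):
            return getattr(module, attr)
    raise ImportError(f"none of {module_names!r} provides {attr!r}")


# Stdlib demonstration: the first name does not exist, the second does.
loads = first_available(["surely_not_a_real_module_xyz", "json"], "loads")
print(loads("[1, 2, 3]"))

# For the case discussed here the call would presumably be:
#   read_pdf = first_available(["tabula.io", "tabula"], "read_pdf")
```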
If you are working in Colab, install it with these commands:
!pip install -q tabula-py
import tabula
To use functions like read_pdf and convert_into, go through tabula.io:
dfs = tabula.io.read_pdf(path, stream=True)
Note: in Colab, these functions should be accessed through tabula.io.
have a good day and long live Data science community.
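For the CSV conversion the question asks about, a sketch along the same lines could look like the following. It assumes tabula-py (not tabula) and Java are installed; default_csv_path and pdf_tables_to_csv are names invented for this example, and pages="all" is one possible choice rather than something taken from the question.

```python
from pathlib import Path


def default_csv_path(pdf_path):
    """Derive an output name next to the PDF (data.pdf -> data.csv)."""
    return str(Path(pdf_path).with_suffix(".csv"))


def pdf_tables_to_csv(pdf_path, csv_path=None):
    """Write the tables found in a PDF to a CSV file and return its path.

    Assumes tabula-py and Java are installed. The import is done inside
    the function so the path helper above works without them.
    """
    from tabula.io import convert_into

    csv_path = csv_path or default_csv_path(pdf_path)
    convert_into(pdf_path, csv_path, output_format="csv", pages="all")
    return csv_path


# Only the path helper is exercised here; the conversion needs a real PDF.
print(default_csv_path("/path/to/pdf_file/data.pdf"))
```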
try
from tabula import read_pdf
I had the same problem, and this fixed it.
It is working this way:
import tabula # just this here!
#declare the path of your file
file_path = "/path/to/pdf_file/data.pdf"
#Convert your file
df = tabula.io.read_pdf(file_path)
That is all!
I've been using dataframe_image for a while and have had great results so far. Last week, all of a sudden, all my code containing the method dfi.export() stopped working with this error as output:
raise SyntaxError("not a PNG file")
File <string>
SyntaxError: not a PNG file
I can export the images passing the argument table_conversion='matplotlib' but they do not come out styled...
This is my code:
now = str(datetime.now())
filename = ("Extracciones-"+now[0:10]+".png")
df_styled = DATAFINAL.reset_index(drop=True).style.apply(highlight_rows, axis=1)
dfi.export(df_styled, filename,max_rows=-1)
IMAGEN = Image.open(filename)
IMAGEN.show()
Any clues on why this just suddenly stopped working?
Or any ideas to export dataframes as images (not using html)?
These were the outputs i used to get:
fully styled dataframe images
and this is the only thing I can get right now
Thank you in advance
dataframe_image has a dependency on Chrome, and a recent Chrome update (possibly v109 on 2023-01-10) broke dataframe_image. v0.1.5 was released on 2023-01-14 to fix this.
pip install --upgrade dataframe_image
pip show dataframe_image
The version should now be v0.1.5 or later, which should resolve the problem.
Some users have reported still having the error even after upgrading. This could be due to upgrading the package in the wrong directory (due to multiple installations of python, pip, virtual envs, etc). The reliable way to check the actual version of dataframe_image that the code is using, is to add this debugging code to the top of your python code:
import pandas as pd
import dataframe_image as dfi
from importlib.metadata import version
print(version('dataframe_image'))
df = pd.DataFrame({'x':[1,2]})
dfi.export(df, 'out.png')
exit()
Also check chrome://version/ in your Chrome browser.
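To turn the version check into a yes/no answer, something like this sketch could be used. The helper names are invented for the example, and the comparison is a deliberately simple numeric one (it ignores suffixes such as rc1), so treat it as an approximation rather than a full version parser.

```python
from importlib.metadata import PackageNotFoundError, version


def version_tuple(text):
    """Turn '0.1.5' into (0, 1, 5); trailing non-numeric parts are dropped."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for char in piece:
            if not char.isdigit():
                break
            digits += char
        if digits == "":
            break
        parts.append(int(digits))
    return tuple(parts)


def is_at_least(installed, minimum):
    """Compare two dotted version strings numerically."""
    return version_tuple(installed) >= version_tuple(minimum)


def installed_version(distribution):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


found = installed_version("dataframe_image")
if found is None:
    print("dataframe_image is not installed for this interpreter")
else:
    print(found, "OK" if is_at_least(found, "0.1.5") else "too old")
```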
I had the same issue, and we finally figured it out; looks like the new release of dataframe_image on 1/14 broke something on the previous version.
We upgraded the package and the issue was resolved.
pip install -U dataframe_image
I fixed this issue.
It was related to resources that are busy in the background and chrome can't use these resources.
https://github.com/dexplo/dataframe_image/pull/70
Please update the library to version v0.1.5.
You can do it via pip:
python -m pip install --upgrade dataframe-image
Good luck.
I'm facing the issue below; can anyone help, please?
I get the error below while trying to extract table data from PDFs.
import camelot
# PDF file to extract tables from
file = input_folder+file_name
tables = camelot.read_pdf(file)
# number of tables extracted
print("Total tables extracted:", tables.n)
# print the first table as Pandas DataFrame
print(tables[0].df)
Error: AttributeError: module 'camelot' has no attribute 'read_pdf'
This error most likely occurred because you installed the wrong package.
When you installed the camelot module, you should have used this:
pip install camelot-py[cv]
If not, uninstall the package you installed and use the above command.
I encountered the same problem and tried many things, including installing and uninstalling various camelot packages, cloning the git repository, etc. None of it worked for me. I found that the issue was related to cv2. Server (headless) environments do not have GUI packages installed, so if you are using Camelot on a server with no GUI, you should install opencv-python-headless first:
pip install opencv-python-headless
and then import it along with camelot.io instead of camelot:
import camelot.io as camelot
import cv2
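A quick way to see which of the two modules is missing on a given server, without triggering the full import, might be the following sketch. missing_modules is a hypothetical helper name; it only checks whether Python can locate each top-level module.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be located for import."""
    missing = []
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            spec = None
        if spec is None:
            missing.append(name)
    return missing


# "cv2" is provided by opencv-python or opencv-python-headless,
# "camelot" by camelot-py.
print(missing_modules(["camelot", "cv2"]))
```

An empty list means both can be found; note that a module being found does not guarantee that importing it succeeds, for example when a system library is absent.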
I am working with Pandas for the 1st time and don't know much about it.
While trying to read an Excel file, Visual Studio code shows the "missing dependency xlrd". I don't know what to do.
Info:
Anaconda, VS code installed on the same drive. Excel file also on the same drive. I am using Windows 10 64bit.
That is a very short description; it would be nice if it were a little more detailed. Try installing the module:
pip install xlrd
If using python3 then:
pip3 install xlrd
If you are using conda:
conda install -c anaconda xlrd
Maybe there are multiple Python versions on the system, and the requirement is satisfied for one but not for the other. I faced this problem, and invoking pip through python3 rather than pip3 worked for me:
python3 -m pip install xlrd
It should then work; otherwise, upgrade:
pip3 install --upgrade pandas
pip3 install --upgrade xlrd
I hope this will work.
import xlrd
import pandas as pd
sp = pd.ExcelFile("data.xlsx")
print(sp.parse(sp.sheet_names[0]))
If it doesn't work even after the upgrade, my guess is that there is another problem that is not known from your description. (Please include the full error message in the description as a code block, not in image format.)
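One more thing that may be relevant, depending on the installed versions: xlrd 2.0 and later only reads the old .xls format, and pandas reads .xlsx files through openpyxl instead. The sketch below picks an engine from the file extension; ENGINE_BY_SUFFIX, excel_engine_for and read_first_sheet are names made up for this example.

```python
from pathlib import Path

# Mapping assumed by this sketch; pandas accepts these engine names.
ENGINE_BY_SUFFIX = {
    ".xls": "xlrd",
    ".xlsx": "openpyxl",
    ".xlsm": "openpyxl",
}


def excel_engine_for(path):
    """Pick a pandas Excel engine name based on the file extension."""
    suffix = Path(path).suffix.lower()
    try:
        return ENGINE_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"unsupported Excel extension: {suffix!r}") from None


def read_first_sheet(path):
    """Read the first sheet; pandas and the chosen engine must be installed."""
    import pandas as pd  # imported lazily so the helper above needs no pandas

    return pd.read_excel(path, engine=excel_engine_for(path))


print(excel_engine_for("data.xlsx"))
```

If the reported engine is openpyxl, installing it (pip install openpyxl) may be what is needed rather than xlrd.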
First make sure you have all the required libraries installed.
pip install pandas
Pandas also requires the NumPy library
pip install numpy
In order to work with Pandas in your script, you will need to import it into your code. This is done with one line of code:
import pandas as pd
For working with Excel, Pandas also provides an object named ExcelFile. It is part of Pandas itself, so you import it directly from Pandas (the read_excel call used below does not strictly require it):
from pandas import ExcelFile
Note the path to your Excel file, for example: /Users/Desktop/file.xlsx
Rather than writing the path inside the read_excel call, keep the code clean by storing the path in a variable:
file_path = '/Users/Desktop/file.xlsx'
The read_excel function takes the file path of an Excel workbook and returns a DataFrame object with the contents.
Put it all together and assign the DataFrame object to a variable named “df”:
df = pd.read_excel(file_path)
Lastly, you want to view the DataFrame so print the result. Add a print statement to the end of your script, using the DataFrame variable as the argument
print(df)
I am building a Python script for processing json data according to some criteria.
Also I have built a custom module which consists of methods for retrieving json data and generation of json file which consist of processed data.
But that module file is stored into S3 bucket and I need to import that module into my script so that I can invoke functions defined in module.
Please suggest an appropriate way to import a Python module from an external URL.
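Well, you could download the file using urllib2 (Python 2; in Python 3 the equivalent is urllib.request) and then import it, if the online module is all in one file:
from urllib2 import urlopen
r = urlopen('http://urlHere/fileHere')
# save the module under a .py name so it can be imported
f = open('filenameToWrite.py', 'w')
f.write(r.read())
f.close()
# import using the file name without the .py extension
import filenameToWrite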
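On Python 3, the same idea could be sketched with urllib.request and importlib. This is an illustration under assumptions, not a recommended production pattern: import_module_from_url is an invented name, the URL must be readable without extra credentials (for S3 that would mean a public or pre-signed URL), and the downloaded code is executed, so it should only be used with a source you trust. The demonstration uses a local file:// URL so it runs without a server.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path
from urllib.request import urlopen


def import_module_from_url(url, module_name):
    """Download a single-file module and import it under `module_name`.

    Hypothetical helper. The downloaded code is executed on import.
    """
    with urlopen(url) as response:
        source = response.read()

    # Save the source as <module_name>.py in a fresh temporary directory.
    target = Path(tempfile.mkdtemp()) / f"{module_name}.py"
    target.write_bytes(source)

    spec = importlib.util.spec_from_file_location(module_name, target)
    module = importlib.util.module_from_spec(spec)
    sys.modules[module_name] = module
    spec.loader.exec_module(module)
    return module


# Self-contained demonstration using a file:// URL instead of a real server.
demo_file = Path(tempfile.mkdtemp()) / "remote_source.py"
demo_file.write_text("def greet(name):\n    return 'hello ' + name\n")
demo = import_module_from_url(demo_file.as_uri(), "demo_remote_module")
print(demo.greet("world"))
```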
Package your module in your favorite format (tarball, wheel, etc.) using setuptools, and then you will be able to install it with pip, as bruno suggested, by doing something like:
pip install --no-index --trusted-host s3_ip/host --find-links http://s3.com...
Please see my answer to question # 48905127 - believed to be the best solution I found so far and comes with detailed steps and code snippets ready for you to copy and use.
I am trying to use BeautifulSoup, and despite using the import statement:
from bs4 import BeautifulSoup
I am getting the error: ImportError: cannot import name BeautifulSoup
import bs4 does not give any errors.
I have also tried import bs4.BeautifulSoup and just importing bs4 and creating a BeautifulSoup object with: bs4.BeautifulSoup()
Any guidance would be appreciated.
The issue was that I had named the file HTMLParser.py, and that name is already used somewhere in the bs4 module.
Thanks to everyone that helped!
After numerous attempts to solve the ImportError: cannot import name 'BeautifulSoup4', I found out that the class is actually called BeautifulSoup, so the import should be:
from bs4 import BeautifulSoup
Make sure the directory from which you are running your script does not contain a file named bs4.py.
I solved it by installing beautifulsoup4, the "4" is essential.
pip install beautifulsoup4
I experienced a variation of this problem and am posting for others' benefit.
I named my Python example script bs4.py.
Inside this script, the following import raised an ImportError:
from bs4 import BeautifulSoup
Confusingly (for me), the same import worked perfectly from an interactive shell within the same venv.
After renaming the Python script, imports work as expected. The error occurred because Python imported the script itself from the local directory rather than the installed copy of bs4.
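To spot this kind of name clash quickly, a small check like the one below can list scripts in a project folder whose names collide with modules you intend to import. It is only a sketch: shadowing_files is a made-up helper, and the module names to check are passed in by hand.

```python
import tempfile
from pathlib import Path


def shadowing_files(directory, module_names):
    """List .py files in `directory` whose name matches one of `module_names`."""
    wanted = set(module_names)
    return sorted(
        path.name
        for path in Path(directory).glob("*.py")
        if path.stem in wanted
    )


# Demonstration in a throwaway directory.
project = Path(tempfile.mkdtemp())
(project / "bs4.py").write_text("# my example script\n")
(project / "scraper.py").write_text("# unrelated script\n")
print(shadowing_files(project, ["bs4", "HTMLParser"]))
```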
Copy bs4 and beautifulsoup4-4.6.0.dist-info from C:\python\Lib\site-packages to your local project directory. It worked for me. Here, python actually looks for the library in local directory rather than the place where the library was installed!
The bs4 and beautifulsoup4 folders might be in the site-packages folder. So copy BeautifulSoup4 folder in bs4 and then try the below code. It worked for me.
from bs4 import BeautifulSoup
You were importing BeautifulSoup from bs4, but there was no BeautifulSoup folder in bs4, which is why it was showing ImportError: cannot import name BeautifulSoup.
One possible reason: if you have more than one Python version installed and, say, you installed beautifulsoup4 using pip3, it will only be available for import when you run a python3 shell.
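One way to avoid the pip/pip3 mismatch is to run pip through the same interpreter that runs your script. A minimal sketch, with pip_install_command as an invented helper name; the actual install is left commented out so the example has no side effects:

```python
import sys


def pip_install_command(package):
    """Build a pip command bound to the interpreter running this script."""
    return [sys.executable, "-m", "pip", "install", package]


command = pip_install_command("beautifulsoup4")
print(" ".join(command))

# To actually run it:
#   import subprocess
#   subprocess.run(command, check=True)
```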
I was also facing this type of error in the beginning, even after installing all the required modules, including pip install bs4 (if you have installed this, there is no need to install beautifulsoup4 separately through pip or anywhere else; it comes with bs4 itself).
Solution: go to the folder where your Python packages are installed, C:\python\Lib\site-packages,
then copy the bs4 and beautifulsoup4-4.6.0.dist-info folders and paste them into the folder where you have saved your working project.
The best way to resolve this is to select your system's global Python path (/usr/local/bin/python3.7) when creating your interpreter.
Make sure that in pycharm shell, python --version appears as 3.7. It shouldn't show 2.7
There is no problem with the package; you just need to copy bs4 and beautifulsoup4-4.6.0.dist-info into your project directory.
When I used
pip3 install beautifulsoup4
instead of
pip install beautifulsoup4
it reported that all requirements were already satisfied, but I ran it again and it worked. I'm using a virtualenv with Python 3.8.10. I don't really know the logic behind it, but hey, it worked.
I had the same problem. The cause was that the file in which I was importing BeautifulSoup from bs4 was in another folder. I just moved the file out of the inner folder and it worked.
For anyone else who might have the same issue as me: I tried all of the above, but it still didn't work. First, I was using a virtual environment, so I needed to run pip install in the PyCharm terminal instead of a command prompt to install it there. Second, I had typed import Beautifulsoup with the S not capitalized. I changed it to BeautifulSoup and it worked.
For me it was a permissions issue. Directory "/usr/local/lib/python#.#/site-packages/bs4" was only 'rwx' by root and no other groups/users. Please check permissions on that directory.
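A permissions check of this kind can also be done from Python, run as the same user that runs the script. The sketch below uses a hypothetical helper, can_import_from, and a temporary directory for the demonstration; for the real case you would pass the bs4 directory mentioned above.

```python
import os
import tempfile


def can_import_from(directory):
    """True if the current user can enter and read `directory`."""
    return os.path.isdir(directory) and os.access(directory, os.R_OK | os.X_OK)


readable = tempfile.mkdtemp()
print(can_import_from(readable))
print(can_import_from(os.path.join(readable, "missing")))
```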