Can't catch python warning with warnings.catch_warnings() - python

I am using the Camelot library in Python to read tables from PDFs.
If no table is recognized but something else is (such as text), the library emits a warning: UserWarning: No tables found in table area 1 [stream.py:365].
My idea was to catch this warning with warnings.catch_warnings().
This is my code:
import warnings

import camelot

with warnings.catch_warnings(record=True) as w:
    # reading tables from pdf
    parsed_tables = camelot.read_pdf(
        tmp_file.name,
        pages=page,
        flavor="stream",
        row_tol=row_tol,
        table_areas=["30,480,790,100"],
        suppress_stdout=False,
    )
    # warnings.warn("TEST")
    print("warning", w)
My problem is that the variable w is always empty. If I uncomment the "TEST" warning, that warning does appear in w, so it works with my own warning.
I searched the library for warning filters but didn't find any.
I also tried adding warnings.filterwarnings("default") and warnings.simplefilter("always").
Why can't I catch this warning? Is it because it occurs in the library and not in my code?
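For comparison, here is a minimal sketch of the pattern that normally does capture warnings raised inside third-party code. `library_call` and `call_and_collect` are hypothetical names used only for illustration, with `library_call` standing in for the Camelot call. The sketch assumes the library emits the message through `warnings.warn`; if the text reaches the console through a `logging` handler instead, `catch_warnings` will never see it.

```python
import warnings


def library_call():
    """Hypothetical stand-in for a third-party function that warns."""
    warnings.warn("No tables found in table area 1", UserWarning)
    return []


def call_and_collect(func):
    """Run func and return its result plus the text of every warning it raised."""
    with warnings.catch_warnings(record=True) as caught:
        # Set the filter inside the block so that "once per location"
        # suppression cannot hide a warning that was already shown earlier.
        warnings.simplefilter("always")
        result = func()
    return result, [str(w.message) for w in caught]


result, messages = call_and_collect(library_call)
print(messages)  # ['No tables found in table area 1']
```

Because `simplefilter("always")` is set inside the `with` block, the filter change is undone when the block exits and does not leak into the rest of the program.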

Related

"pd.read_excel(filename, sheet_name=None)" causes UserWarning: Slicer List extension is not supported

I noticed this message only today and could not find any notification on the pandas documentation web...
I use a simple way to load all sheets into a dictionary of DataFrames:
filename = "data.xlsx"
sheets_dict = pd.read_excel(filename, sheet_name=None)
and it started to produce the warning shown below.
Is it a bug, or should I start using a different method?
If it is not a bug, please advise on the options.
openpyxl\worksheet\_reader.py:312: UserWarning: Slicer List extension is not supported and will be removed
It's a warning from openpyxl. If you use the default engine on pandas to load each Excel file, the warning goes away, but the time it takes for each file goes up to 5 or even 6 seconds. You can ignore those warnings:
import warnings
warnings.filterwarnings('ignore', category=UserWarning, module='openpyxl')

Tabula font error in reading table from PDF

I saw a lot of people had similar issues, but not this one. And many of the similar issues do not have an applicable solution, unfortunately.
I am getting this warning from tabula. And when I look at the result or test the length of what it extracts, there is nothing there. Here is the message:
Got stderr: Apr 12, 2022 5:34:12 PM org.apache.pdfbox.pdmodel.font.PDTrueTypeFont <init>
WARNING: Using fallback font 'Helvetica-Oblique' for 'CenturyGothic-Italic'
All I am using is:
table = tabula.read_pdf(pdf_path, pages=page, multiple_tables=True)
Any ideas??
The correct approach would be to install the missing fonts, as recommended in the answer here:
Using fallback font while parsing file content using pdfbox - can it cause mistakes?
However, for my application, which reads PDF files inside a Docker container, installing extra fonts in the OS might be unnecessary. Because what you see in the logs is only a warning, the missing fonts do not really impact the parsing of the PDF.
To remove these warnings from the tabula-py output, I just added silent=True to the arguments of the method call, as follows:
table_df = tabula.read_pdf(
    input_path=pdf_file,
    output_format="dataframe",
    pages="all",
    silent=True,
)

How to deal with warning : "Workbook contains no default style, apply openpyxl's default "

I have the current latest versions of pandas, openpyxl, and xlrd:
openpyxl : 3.0.6.
pandas : 1.2.2.
xlrd : 2.0.1.
I have a generated Excel .xlsx file (an export from a web application).
I read it in pandas:
myexcelfile = pd.read_excel(easy_payfile, engine="openpyxl")
Everything goes ok, I can successfully read the file.
But I do get a warning:
/Users/*******/projects/environments/venv/lib/python3.8/site-packages/openpyxl/styles/stylesheet.py:214: UserWarning: Workbook contains no default style, apply openpyxl's default
warn("Workbook contains no default style, apply openpyxl's default")
The documentation doesn't shed too much light on it.
Is there any way I can add an option to avoid this warning?
I prefer not to suppress it.
I don't think the library offers a way to disable this, so you will need to use the warnings package directly.
A simple and targeted solution to the problem would be:
import warnings

with warnings.catch_warnings(record=True):
    # Warnings raised inside this block are recorded instead of printed
    warnings.simplefilter("always")
    myexcelfile = pd.read_excel(easy_payfile, engine="openpyxl")
Passing the engine parameter, as in df = pd.read_excel("my.xlsx", engine="openpyxl"), got rid of the warning for me. The default is engine=None, so I think it is just warning you that it is using openpyxl for the default style.
I had the same warning. I just changed the sheet name in my Excel file from "sheet_1" to "Sheet1", and the warning disappeared. This is very similar to Yoan's answer. I think pandas should fix this warning later.
@ruhanbidart's solution is better because it turns off warnings just for the call to read_excel, but if you have dozens of calls to pd.read_excel, you can simply disable all warnings:
import warnings
warnings.simplefilter("ignore")
I had the exact same warning and was unable to read the file. In my case the problem was coming from the Sheet name in the Excel file.
The initial name contained a . (e.g. MDM.TARGET). I simply replaced the . with _ and everything was fine.
In my situation, some column names had a dollar sign ($) in them. Replacing '$' with '_' solved the issue.
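If silencing every warning feels too broad, a middle ground is to ignore only the one message and only around the call that triggers it. The following is a sketch under assumptions: `ignore_warning` is a hypothetical helper, and `noisy_reader` is a stand-in for `pd.read_excel(...)` so that the example runs without pandas or an Excel file.

```python
import warnings
from contextlib import contextmanager


@contextmanager
def ignore_warning(message_regex, category=UserWarning):
    """Temporarily ignore warnings whose message starts with message_regex."""
    with warnings.catch_warnings():
        warnings.filterwarnings("ignore", message=message_regex, category=category)
        yield


def noisy_reader():
    # Hypothetical stand-in for pd.read_excel(...): emits two warnings.
    warnings.warn("Workbook contains no default style, apply openpyxl's default", UserWarning)
    warnings.warn("something unrelated", UserWarning)
    return "data"


# The outer recorder exists only to show which warnings still get through.
with warnings.catch_warnings(record=True) as seen:
    warnings.simplefilter("always")
    with ignore_warning("Workbook contains no default style"):
        data = noisy_reader()

remaining = [str(w.message) for w in seen]
print(remaining)  # ['something unrelated']
```

The `message` argument is a regular expression matched against the start of the warning text, so unrelated warnings from the same call remain visible.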

WEKA - Can't read CSV generated with Python pandas

I've been working on some DataFrames with Python. I load them using readCSV(filename, index=0) and it's all fine. The files also open fine in Excel. I also opened them in Notepad, and they seem alright; below is an example line:
851,1.218108787,0.636454978,0.269719611,-0.849476404,-0.143909689,0.050626813,-0.094248374,-0.3096134,-0.131347142,0.671271112,0.167593329,0.439417259,-0.198164647,-0.031552824,-0.215189948,-0.1791156,0.092648696,-0.107840318,-0.162596466,0.019324121,0.040572892,-0.008307331,-0.077819297,-0.023809355,-0.148229913,-0.041082835,0.138234498,-0.070986117,0.024788437,-0.050982962,0.24689969,0
The first column is as I understand it an index column. Then there's a bunch of Principal Components, and at the end is a 1/0.
When I try and load the file into WEKA, however, it gives me a nasty error and urges me to use the converter, saying:
Reason:
32 Problem encountered on line: 2
When I attempt to use the converter with the default settings, it states a new error:
Couldn't read object file_name.csv invalid stream header: 2C636F6D
Could anyone help with any of this? I can't provide the entire data file but if requested I can try and maybe cut out a few rows and only paste those if the error still occurs. Are there any flags I need to specify when saving a file to CSV in python? At the moment I just use a .toCSV('x.csv').
I think the index column not having a header name is what prevents WEKA from reading the file. When you write it using DataFrame.to_csv(), set index=False:
df.to_csv('x.csv', index=False)
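The "invalid stream header" value in the converter error points in the same direction: read as ASCII, 2C636F6D gives the first four bytes of the file. A small sketch using only the standard library decodes it. The reading that the leading comma comes from an unnamed index column is an inference, since the actual header row is not shown in the question.

```python
header_hex = "2C636F6D"  # copied from the WEKA converter error message

# Interpret the hex digits as raw bytes and decode them as ASCII text.
first_bytes = bytes.fromhex(header_hex).decode("ascii")
print(repr(first_bytes))  # ',com'

# A leading comma means the first header field is empty.
starts_with_empty_field = first_bytes.startswith(",")
print(starts_with_empty_field)  # True
```

An empty first field in the header row is what pandas writes when the index has no name, which is consistent with the index=False suggestion.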

Tableau SDK TableException (40200)

Issue: Error being thrown: tableausdk.Exceptions.TableauException: TableauException (40200): The system cannot find the path specified.
- OS::mkdir(CreateDirectory path="C:\PATH\Tableau-SDK\tdetmp2A0E0E5E")
I am attempting to create a Tableau extract from Oracle data using Python and the Tableau SDK.
The code seems to run correctly if the extract already exists (although the produced .tde is unreadable).
According to the Tableau community I should be able to create an extract from any source data without the extract already existing...
Any idea why this is occurring?
tde_path = r'C:\PATH\test.tde'
tde_file = Extract(path=tde_path) ## ERROR Thrown here
The reason now seems obvious: the error message contained the answer.
OS::mkdir(CreateDirectory path="C:\PATH\Tableau-SDK\tdetmp2A0E0E5E")
The directory C:\PATH\Tableau-SDK\ did not exist. After I created the directory, the code ran without error.
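The missing directory can also be created from the script itself before the extract is opened. This is a sketch using only the standard library. `ensure_directory` is a hypothetical helper, and the example runs against a throwaway temporary folder, so substitute the directory named in the error message.

```python
import os
import tempfile


def ensure_directory(path):
    """Create path (and any missing parents) if it does not exist yet."""
    os.makedirs(path, exist_ok=True)
    return path


# Demonstrated in a temporary location so the sketch is safe to run anywhere.
base = tempfile.mkdtemp()
work_dir = ensure_directory(os.path.join(base, "Tableau-SDK"))
print(os.path.isdir(work_dir))  # True
```

With exist_ok=True the call is safe to repeat, so it can run unconditionally at the start of the script.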
