Python - Openpyxl - "UserWarning: Unknown extension" issue

Python - Openpyxl - "UserWarning: Unknown extension" issue - python

I am trying to learn Python (day 2) and am hoping to practice with Excel books first as this is where I am comfortable/fluent.
Right off the bat I am having an error that I don't quit understand when running the below code:
import openpyxl
wb = openpyxl.load_workbook("/Users/Scott/Desktop/Workbook1.xlsx")
print(wb.sheetnames)
This does print my sheet names as requested, but it is followed by:
/Users/Scott/PycharmProjects/Excel/venv/lib/python3.7/site-packages/openpyxl/worksheet/_reader.py:293: UserWarning: Unknown extension is not supported and will be removed
warn(msg)
I have found other questions that point to slicers/conditional formatting etc, but that does not apply here. This is a book I just made and only added 3 sheets before saving. It has no data, no formatting, and the extension is valid. I have no add-ons installed on my excel either.
Any idea why why I am getting this error? How do I resolve?
Python: 3.7
openpyxl: 2.6

I had a similar issue. I developed an application which read and write Excel files. It woked well on my Windows computer, but then I tried to run it on a friends mac. It showed the same error. I could "fix" it by changing the configuration of the workbook, like this:
import openpyxl as op
wb = op.load_workbook(file, read_only=True, data_only=True)
But, as you can see, you can only read Excel files with this configuration. At the end, I realized that my friend didn't have Microsoft Office installed on his computer. Install it truly solved my problem.

This question was from a couple years ago but I'm encountering it now with openpyxl and require a fix, as the warning is confounding and misleading to my end users.
The warning from openpyxl comes via the stdlib warnings library, which can be suppressed.
import warnings
warnings.simplefilter("ignore")
That's the "hit it with a hammer" approach. More granular levels of warnings suppression can be found here: https://docs.python.org/3/library/warnings.html

This is exactly the problem I encountered just now..
And to my situation (not to everyone) I discovered that you just need to close your excel and rerun the code, very simple.
If this doesn't work, you can refer to other answers.
Thanks

Python - Openpyxl - "UserWarning: Unknown extension" issue
To understand the error, you need to know what's inside an XLSX file. The best way to take a look is to change the extension to zip and open that. Inside you will see a file called [Content_Types].xml and directories for the other content. If you check out the XML in Content_Types you will see a <Types ...> tag containing other tags like this:
<Default Extension="png" ContentType="image/png"/>
<Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
<Default Extension="xml" ContentType="application/xml"/>
Note the "Extension" property. This is what the warning refers to. In the example above, my file included Extension="png" - the unknown extension.
For me, it was enough to specify read_only=True and the error went away eg:
wb = openpyxl.load_workbook(file, read_only=True)
I could also fix the issue by copying everything except the images to a new workbook and saving that. After checking, the xml in the new workbook no longer contained the png property.
Note, reading into pandas with pd.read_excel uses openpyxl and generates the same "Unknown extension" error but there is no way to pass through the read_only parameter. You can suppress the specific warning with:
import warnings
warnings.filterwarnings('ignore', category=UserWarning, module='openpyxl')

Related

Python does not read the Excel Sheet

I am trying to read an excel sheet and create a pandas data frame out of it, but it keeps saying that the sheet does not exist, even tho it actually exists. Has anyone faced something similar?
This is the code I used:
excel=pd.ExcelFile("Berlin_Club_List.xlsx")
clubs=pd.read_excel(excel, 'Berlin_Club_List')

Is the file in your working directory?
If it is not you need to provide full path
(Would be a comment but not enough rep)

To check your actual working directory you can do the following:
import os
os.getcwd()
Do show the actual error message that you got. This will help others to understand and debug the issue you are facing.
Definitely check the documentation for pd.read_excel by typing
pd.read_excel?
you will see that you can specify the sheet you want to by
pd.read_excel(excel, sheet_name='sheetname')
and so on.

Reading excel file with pylightxl

My organization needs me to use pylightxl library to read some bulky excel xlsx files. I have never used this library before and I'm getting a strange error in pycharm. I simply do not understand what it is.
I've tried googling but there isn't much support for pylightxl on the web. Does anyone know how to help?

for completeness of this post it looks like there was a bug in early days of pylightxl for files that were converted from xls to xlsx, however this issue has been resolved with #31 with version 1.52+

For those who might meet the same issue in the future, here's how i solved my issue, or rather a work around of it.
The excel files i had been given were initially in xls format, which pylightxl does not support.
I converted them to xlsx by just clicking "save as" in excel and then tried to read them in pylightxl, of which i was getting the strange error above. Must be something to do with the format
So I ended up saving it in csv instead, of which reading them was successful.
So if anyone meets this error, try different formats for the document you're trying to read

Outputting A .xls File In Python

I have been teaching myself Python to automate some of our work processes. So far reading from Excel files (.xls, .xlsx) has gone great.
Currently I have hit a bit of a snag. Although I can output .xlsx files fine, the software system that we have to use for our primary work task can only take .xls files as an input - it cannot handle .xlsx files, and the vendor sees no reason to add .xlsx support at any point in the foreseeable future.
When I try to output a .xls file using either Pandas or OpenPyXl, and open that file in Excel, I get a warning that the file format and extension of the file do not match, which leads me to think that attempting to open this file using our software could lead to some pretty unexpected consequences (because it's actually a .xlsx file, just not named as such)
I've tried to search for how to fix this all on Google, but all I can find are guides for how to convert a .xls file to a .xlsx file (which is almost the opposite of what I need). So I was wondering if anybody could please help me on whether this can be achieved, and if it can, how.
Thank you very much for your time

Under the pandas.DataFrame.to_excel documentation you should notice a parameter called engine, which states:
engine : str, optional
Write engine to use, openpyxl or xlsxwriter. You can also set this via the options io.excel.xlsx.writer, io.excel.xls.writer, and io.excel.xlsm.writer.
What it does not state is that the engine param is automatically picked based on your file extension -- therefore, easy fix:
import pandas as pd
df = pd.DataFrame({"data": [1, 2, 3]})
df.to_excel("file.xls") # Notice desired file extension.
This will automatically use the xlwt engine, so make sure you have it installed via pip install xlwt.

Felipe is right the filename extension will set the engine parameter.
So basically all it's saying is that the old Excel format ".xls" extension is no longer supported in Pandas. So if you specify the output spreadsheet with the ".xlsx" extension the warning message disappears.

I FINALLY have the answer!
I have libreoffice installed and am using the following in the command line on windows:
"C:\Program Files\LibreOffice\program\soffice.exe" --headless --convert-to xlsx test2.xls
Currently trying to use subprocess to automate this.

Xlwings: avoid to open the file

Is there any way to avoid the file to be opened while working with xlwings?
I have read there was an update going on one year ago but I do not know if the issue has been solved.

As of version 0.10.4, xlwings is purely manipulating Excel files via a running Excel instance. That means, yes, you need to have your file open.
You can set the Excel instance to visible=False, see here, but I doubt that this is what you want. To manipulate the files directly without Excel, you have to use xlrd/xlwt or xlsxwriter or openpyxl.

According with Felix Zumstein, the solution lies in that documentation. Also I had this problem and I solved with the following line of code:
import xlwings
xlwings.App().visible = False
Personally, before I had other problems, even wider! And thanks to this solution I solved them.

xlsx file extension not valid after saving with openpyxl and keep_vba=true. Which is the best way?

In the environment, we have an excel file, which includes rawdata in one sheet and pivot table and charts in another sheet.
I need to append rows every day to raw data automatically using a python job.
I am not sure, but there may be some VB Script running on the front end which will refresh the pivot tables.
I used openpyxl and by following its online documentation, I was able to append rows and save the workbook. I used keep_vba=true while loading the workbook to keep the VBA modules inside to enable pivoting. But after saving the workbook, the xlsx is not being opened anymore using MS office and saying the format or the extension is not valid. I can see the data using python but with office, its not working anymore. If I don't use keep_vba=true, then pivoting is not working, only the previous values are present (ofcourse as I understood, as VBA script is needed for pivoting).
Could you explain me what's happening? I am new to python and don't know its concepts much.
How can I fix this in openpyxl or is there any better alternative other than openpyxl. Data connections in MS office is not an option for me.
As I understood, xlsx may need special modules to save the VB script to save in the same way as it may be saved using MS office. If it is, then what is the purpose of keep_vba=true ?
I would be grateful if you could explain in more detail. I would love to know.
As I have very short time to complete this task, I am looking for a quick answer here, instead of going through all the concepts.
Thankyou!

You have to save the files with the extension ".xlsm" rather than ".xlsx". The .xlsx format exists specifically to provide the user with assurance that there is no VBA code within the file. This is an Excel standard and not a problem with openpyxl. With that said, I haven't worked with openpyxl, so I'm not sure what you need to do to be sure your files are properly converted to .xlsm.
Edit: Sorry, misread your question first time around. Easiest step would be to set keep_vba=False. That might resolve your issue right there, since you're telling openpyxl to look for VBA code that can't possibly exist in an xlsx file. Hard to say more than that until you post the relevant section of your code.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.