Can't load xlsx file

I am trying to read an xlsx file using Python's openpyxl, but the workbook cannot be loaded. Here is my attempt to open the file:
>>> from openpyxl import load_workbook
>>> workbook = load_workbook(filename = "test.xlsx")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python27\lib\site-packages\openpyxl\reader\excel.py", line 136, in load_workbook
_load_workbook(wb, archive, filename, use_iterators, keep_vba)
File "C:\Python27\lib\site-packages\openpyxl\reader\excel.py", line 198, in _load_workbook
keep_vba=keep_vba)
File "C:\Python27\lib\site-packages\openpyxl\reader\worksheet.py", line 332, in read_worksheet
fast_parse(ws, xml_source, string_table, style_table, color_index)
File "C:\Python27\lib\site-packages\openpyxl\reader\worksheet.py", line 320, in fast_parse
parser.parse()
File "C:\Python27\lib\site-packages\openpyxl\reader\worksheet.py", line 137, in parse
dispatcher[tag_name](element)
File "C:\Python27\lib\site-packages\openpyxl\reader\worksheet.py", line 176, in parse_merge
self.ws.merge_cells(mergeCell.get('ref'))
File "C:\Python27\lib\site-packages\openpyxl\worksheet.py", line 815, in merge_cells
raise InsufficientCoordinatesException(msg)
openpyxl.shared.exc.InsufficientCoordinatesException: Range must be a cell range (e.g. A1:E1)

It appears that your .xlsx file is damaged or corrupted. There could be many reasons. One possibility is that a file in another format was renamed to have the .xlsx extension, which does not make it a valid workbook. To confirm this behaviour, please try to open the file in Microsoft Excel.
I tried reading the file with openpyxl, xlrd and pandas, but none of them worked.
>>> import xlrd
>>> xlrd.open_workbook('test.xlsx')
XLRDError: Unsupported format, or corrupt file: Expected BOF record; found '<html> <'
>>> from openpyxl import load_workbook
>>> workbook = load_workbook(filename = "test.xlsx")
InvalidFileException: File is not a zip file
>>> import pandas
>>> pandas.ExcelFile('test.xlsx')
InvalidFileException: File is not a zip file
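All three errors above say the same thing: the file does not look like a real .xlsx, which is a zip archive. As a minimal sketch using only the standard library, the check below reports whether a file is a zip archive or starts like an HTML page. The function name describe_file is hypothetical, and the HTML test is a rough heuristic rather than a full format detector.

```python
import zipfile


def describe_file(path):
    """Return a rough guess at what a supposed .xlsx file really contains."""
    if zipfile.is_zipfile(path):
        return "zip archive (plausibly a real .xlsx)"
    # Not a zip: peek at the first bytes to see what the file starts with.
    with open(path, "rb") as handle:
        head = handle.read(64).lstrip().lower()
    if head.startswith(b"<html") or head.startswith(b"<!doctype html"):
        return "HTML page saved with an .xlsx extension"
    return "unknown format"
```

Calling describe_file("test.xlsx") before load_workbook gives a clearer hint than the reader's exception when the download was really an HTML page.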

I ran into this issue while trying to open every file in a directory matching *.xlsx.
I later found that the file that caused the error was named ~$filename.xlsx. Microsoft Excel marks a workbook as currently open by creating a small file with the same name, prefixed with ~$. Once I closed the workbook in Excel, everything worked as expected.
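If closing the workbook is not always an option, the loop over the directory can skip these temporary files instead. A minimal sketch, assuming the files are collected with pathlib; the function name real_workbooks is hypothetical:

```python
from pathlib import Path


def real_workbooks(directory):
    """Return .xlsx paths in a directory, skipping Excel's '~$' owner files."""
    return sorted(
        path
        for path in Path(directory).glob("*.xlsx")
        if not path.name.startswith("~$")
    )
```

Each returned path can then be passed to load_workbook as usual.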

The problem was that some merged cells were, in fact, merged with themselves. openpyxl expected a merged cell reference always to be a range of cells. A fix for the problem which ignores meaningless merges has been added to the 2.0 branch.
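To see whether a given workbook contains such merges without loading it through openpyxl, the sheet XML inside the archive can be scanned directly. This is a rough sketch, not openpyxl's own fix: it assumes the worksheet parts live under xl/worksheets/ and that mergeCell elements are written without a namespace prefix. The function name single_cell_merges is hypothetical.

```python
import re
import zipfile


def single_cell_merges(path):
    """Map each worksheet part to its mergeCell refs that are not ranges."""
    found = {}
    with zipfile.ZipFile(path) as archive:
        for name in archive.namelist():
            if not name.startswith("xl/worksheets/") or not name.endswith(".xml"):
                continue
            xml = archive.read(name).decode("utf-8")
            # Collect the ref attribute of every <mergeCell .../> element.
            refs = re.findall(r'<mergeCell\b[^>]*\bref="([^"]*)"', xml)
            # A proper merge is a range such as A1:E1; a lone cell has no colon.
            bad = [ref for ref in refs if ":" not in ref]
            if bad:
                found[name] = bad
    return found
```

An empty result means no single-cell merge references were found under those assumptions.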

I like openpyxl and use it for creating xlsx documents. This could be a bug, or an Excel feature used in your specific document that openpyxl does not support yet. I would report it to the openpyxl community.

OK, guys. I have reported this bug to the openpyxl developers and they have provided a quick fix. Here is the complete thread.

I have never tried openpyxl, but I use xlrd for reading Excel files (.xls and .xlsx). It works great.
See the examples and documentation at http://www.python-excel.org/

Related

Export Python Dataframe to Excel

I'm trying to export a Python Dataframe to excel using xlsx or csv...
Here is the code I tried to use:
export_word_count = word_count.to_excel (r'C:\Users\OTR\PycharmProjects\MyProjects\word_count.xlsx', index = None, header=True)
I keep getting the following error messages:
Traceback (most recent call last):
File "C:/Users/OTR/PycharmProjects/MyProjects/CAP_Test_MotsCles.py", line 35, in <module>
export_word_count = word_count.to_excel (r'C:\Users\OTR\PycharmProjects\MyProjects\word_count_CAP.xlsx', index = None, header=True)
File "C:\Users\OTR\PycharmProjects\MyProjects\venv\lib\site-packages\pandas\core\generic.py", line 2127, in to_excel
engine=engine)
File "C:\Users\OTR\PycharmProjects\MyProjects\venv\lib\site-packages\pandas\io\formats\excel.py", line 656, in write
writer = ExcelWriter(_stringify_path(writer), engine=engine)
File "C:\Users\OTR\PycharmProjects\MyProjects\venv\lib\site-packages\pandas\io\excel.py", line 1204, in __init__
from openpyxl.workbook import Workbook
ModuleNotFoundError: No module named 'openpyxl'
Any input on this would be greatly appreciated. I tried to tweak the code, but it still wouldn't export. Thank you.
EDIT:
Managed to export, but having issues with full data export
Sample Python Data:
products 58
company 53
cannabis 42
business 39

You don't have the Python openpyxl module installed in the environment that runs your script.
Install it with:
pip install openpyxl
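Because the traceback comes from a virtual environment, it can help to confirm that the module is visible to the same interpreter that runs the script. A small sketch using only the standard library; the function name missing_modules is hypothetical:

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# pandas needs openpyxl to write .xlsx files; an empty list means it is found.
print(missing_modules(["openpyxl"]))
```

If openpyxl still shows up as missing after pip install, pip most likely installed it into a different interpreter than the one running the script.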

Your words are your index, and right now you are not exporting the index (index = None drops it).
Try changing your code to:
word_count.to_excel(r'C:\Users\OTR\PycharmProjects\MyProjects\word_count.xlsx', index=True, header=True)
index=True is the default behavior, so it is not actually necessary to pass it.

Openpyxl ['MergedCell' object attribute 'hyperlink' is read-only] Problem

I ran into a problem while loading an xlsx file. The worksheet contains a hyperlink in a merged cell, and an error occurred while loading the file. Can anybody help?
The code is simply:
workbook = openpyxl.load_workbook(report_filepath)
Error Info:
File "F:\mainfunc_new.py", line 733, in read_report
workbook = openpyxl.load_workbook(report_filepath)
File "C:\Users\10225167\AppData\Local\Programs\Python\Python36\lib\site-packages\openpyxl\reader\excel.py", line 312, in load_workbook
reader.read()
File "C:\Users\10225167\AppData\Local\Programs\Python\Python36\lib\site-packages\openpyxl\reader\excel.py", line 274, in read
self.read_worksheets()
File "C:\Users\10225167\AppData\Local\Programs\Python\Python36\lib\site-packages\openpyxl\reader\excel.py", line 228, in read_worksheets
ws_parser.bind_all()
File "C:\Users\10225167\AppData\Local\Programs\Python\Python36\lib\site-packages\openpyxl\worksheet\_reader.py", line 389, in bind_all
self.bind_hyperlinks()
File "C:\Users\10225167\AppData\Local\Programs\Python\Python36\lib\site-packages\openpyxl\worksheet\_reader.py", line 355, in bind_hyperlinks
cell.hyperlink = link
AttributeError: 'MergedCell' object attribute 'hyperlink' is read-only
Thanks.

Use version 2.5.14 instead of yours. It worked for me.
pip install openpyxl==2.5.14

Double-check whether the cells that you're merging are empty. I don't think you can merge cells that contain values.

Maybe this error was caused by the format of the Excel file. I deleted the current Excel file and replaced it with a good one, and the error disappeared.

Pandas read_excel() fails with xlrd.biffh.XLRDError: Can't find workbook in OLE2 compound document

I'm trying to use pandas to parse an .xlsm document. My code worked perfectly with the example file I was given, but once I got the rest of the documents, it failed with the above error. Here's the offending stack trace:
Traceback (most recent call last):
File "########/UnsupervisedCAM.py", line 9, in <module>
info_dict = read_excel_to_dict('files/' + filename)
File "########\readCAM.py", line 7, in read_excel_to_dict
df = pandas.read_excel(filename, parse_cols='E,G,I,K,Q,O')
File "########\Anaconda3\envs\tensorflow\lib\site-packages\pandas\io\excel.py", line 191, in read_excel
io = ExcelFile(io, engine=engine)
File "########\Anaconda3\envs\tensorflow\lib\site-packages\pandas\io\excel.py", line 249, in __init__
self.book = xlrd.open_workbook(io)
File "########\Anaconda3\envs\tensorflow\lib\site-packages\xlrd\__init__.py", line 441, in open_workbook
ragged_rows=ragged_rows,
File "########\Anaconda3\envs\tensorflow\lib\site-packages\xlrd\book.py", line 87, in open_workbook_xls
ragged_rows=ragged_rows,
File "########\Anaconda3\envs\tensorflow\lib\site-packages\xlrd\book.py", line 595, in biff2_8_load
raise XLRDError("Can't find workbook in OLE2 compound document")
xlrd.biffh.XLRDError: Can't find workbook in OLE2 compound document
I'm not even sure where to start... Haven't found anything of use online.

I got the same error message and could solve it by removing the password protection from the xlsx file.
(I'm not saying that it's the only reason for the error, but it's worth checking!)
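This fits the error message: a password-protected workbook is stored inside an OLE2 compound document rather than as a plain zip archive, so the reader finds the OLE2 container but no workbook stream in it. A minimal sketch for spotting such files by their first eight bytes; the function name is_ole2 is hypothetical, and a legacy .xls file has the same signature, so a match is only a hint:

```python
# Signature found at the start of every OLE2 compound document.
OLE2_SIGNATURE = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


def is_ole2(path):
    """Return True if the file begins with the OLE2 compound-document signature."""
    with open(path, "rb") as handle:
        return handle.read(len(OLE2_SIGNATURE)) == OLE2_SIGNATURE
```

An .xlsx or .xlsm file for which is_ole2 returns True is a good candidate for the password check above.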

After a lot of searching, the only way I've found to do this is to open and save all the Excel documents, which seems to 'strip' them of their OLE2 format. I automated the process with the following VBScript:
' Open and re-save every workbook with the given extension in a folder.
Dim objFSO, objFolder, objFile
Dim objExcel, objWB, myFolder
Set objExcel = CreateObject("Excel.Application")
Set objFSO = CreateObject("Scripting.FileSystemObject")
myFolder = "<PATH/TO/FILES>"
Set objFolder = objFSO.GetFolder(myFolder)
For Each objFile In objFolder.Files
    ' Compare the extension without the dot and ignoring case, e.g. "xlsm".
    If LCase(objFSO.GetExtensionName(objFile.Name)) = "<EXTENSION>" Then
        Set objWB = objExcel.Workbooks.Open(objFile.Path)
        objWB.Save
        objWB.Close
    End If
Next
objExcel.Quit
Set objExcel = Nothing
Set objFSO = Nothing
WScript.Echo "Done"
Make sure to change the path to the folder and the extension (lowercase, without the leading dot).

In case you face this issue over Jupyter notebook as I did when searching for the error, you can simply restart the kernel and the issue gets resolved.

Why am I not able to load Excel files generated in the morning, but can load them in the afternoon, in Python using openpyxl?

I am using Python's openpyxl to import Excel files which are generated by an online tool. When I import the files generated in the morning, I get an error like this:
Traceback (most recent call last):
File "test4.py", line 8, in <module>
wb = openpyxl.load_workbook (temp2)
File "C:\Python27\lib\site-packages\openpyxl\reader\excel.py", line 201, in load_workbook
wb.properties = DocumentProperties.from_tree(src)
File "C:\Python27\lib\site-packages\openpyxl\descriptors\serialisable.py", line 89, in from_tree
return cls(**attrib)
File "C:\Python27\lib\site-packages\openpyxl\packaging\core.py", line 106, in __init__
self.modified = modified
File "C:\Python27\lib\site-packages\openpyxl\descriptors\base.py", line 267, in __set__
value = W3CDTF_to_datetime(value)
File "C:\Python27\lib\site-packages\openpyxl\utils\datetime.py", line 40, in W3CDTF_to_datetime
dt = [int(v) for v in match.groups()[:6]]
AttributeError: 'NoneType' object has no attribute 'groups'
The strange thing is that I only get this error when importing files that the online tool generated in the morning. When I try the same file generated in the afternoon, it works fine. I'm confused about where the problem is. There are no time-related fields in the Excel files, and the files generated in the morning and in the afternoon are exactly the same except for the modified time. Can anybody help me with this? Thank you.

Excel files created by this online tool are not fully compatible with openpyxl.
The function load_workbook opens the Excel file as a zip archive, reads the workbook-level information from 'docProps/core.xml' and assigns it to wb.properties. One piece of that information is the modified time.
The value of modified raises the error because it cannot be converted into a datetime. The value of 'modified' must match the pattern openpyxl.utils.datetime.W3CDTF_REGEX, that is, W3CDTF (W3C Date and Time Formats).
You can check whether the file's modified time conforms to W3CDTF. Here is the code:
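import zipfile
# An .xlsx file is a zip archive, so the standard zipfile module can open it
# (used here in place of openpyxl's private _validate_archive helper).
archive = zipfile.ZipFile('/path/to/yourexcel.xlsx')
# The list should contain the core properties part, normally 'docProps/core.xml'.
print(archive.namelist())
# Read the core properties, which hold the modified timestamp.
wb_info = archive.read('docProps/core.xml').decode('utf-8')
print(wb_info)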
In wb_info, you will find something like
<dcterms:modified xsi:type="dcterms:W3CDTF">2017-04-01T22:48:48Z</dcterms:modified>.
Compare the wb_info of an Excel file from the online tool with that of one saved on your PC.
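The comparison can also be automated. The sketch below pulls the modified timestamp out of docProps/core.xml and tests it against a simplified date-time pattern. The pattern is an assumption written for illustration, not openpyxl's own W3CDTF_REGEX, and the function name check_modified is hypothetical. One guess at the morning failures is an hour written without a leading zero, which this pattern would reject, but what the tool actually emits has to be checked in the file itself.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# Qualified name of <dcterms:modified> in the Dublin Core terms namespace.
DCTERMS_MODIFIED = "{http://purl.org/dc/terms/}modified"
# Simplified pattern (an assumption, not openpyxl's own regex).
SIMPLE_W3CDTF = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?Z?$")


def check_modified(path):
    """Return (timestamp text, matches pattern?) from docProps/core.xml."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("docProps/core.xml"))
    node = root.find(DCTERMS_MODIFIED)
    text = node.text if node is not None else None
    return text, bool(text and SIMPLE_W3CDTF.match(text))
```

Running check_modified on one morning file and one afternoon file should show whether the two timestamps differ in form.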

Getting error while opening excel file in Python

Hi, I am very new to Python. I am trying to open an xlsx file in Python code, but it shows the error below.
Code:
from xlrd import open_workbook
import os.path
wb = open_workbook('C:\Users\xxxx\Desktop\a.xlsx')
Error:
Traceback (most recent call last):
File "C:\Python27\1.py", line 3, in <module>
wb = open_workbook('C:\Users\xxxx\Desktop\a.xlsx')
File "C:\Python27\lib\site-packages\xlrd\__init__.py", line 429, in open_workbook
biff_version = bk.getbof(XL_WORKBOOK_GLOBALS)
File "C:\Python27\lib\site-packages\xlrd\__init__.py", line 1545, in getbof
bof_error('Expected BOF record; found %r' % self.mem[savpos:savpos+8])
File "C:\Python27\lib\site-packages\xlrd\__init__.py", line 1539, in bof_error
raise XLRDError('Unsupported format, or corrupt file: ' + msg)
xlrd.biffh.XLRDError: Unsupported format, or corrupt file: Expected BOF record; found 'PK\x03\x04\x14\x00\x06\x00'
Need help, guys.

This is a version conflict issue. Your Excel sheet format and the format that xlrd expects are different. You could try to save the Excel sheet in a different format until you find what xlrd expects.

Not familiar with xlrd, but nothing wrong appears on my Mac.
According to @jewirth, you can try renaming the suffix to xls, which is the old format, and then reopen the file or convert it to xlsx.

from xlrd import open_workbook
import os.path
wb = open_workbook(r'C:\Users\XXXX\Desktop\a.xlsx')
print(wb)
Output: <xlrd.book.Book object at 0x0260E490>
I passed the path as a raw string (the r prefix) and it prints the workbook object, so it is working normally here. Check your xlrd version and update it. Alternatively, save the Excel file as '.xls' instead of '.xlsx' and try again.

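You are getting that error because you are using an old version of xlrd which doesn't support xlsx. The 'PK\x03\x04' bytes in the message are the start of a zip archive, which is what a genuine .xlsx file is, so the file itself is fine.
You need to upgrade to a version of xlrd that reads xlsx. Note that xlrd 2.0 and later dropped xlsx support again and read only .xls files, so on a current installation use openpyxl for .xlsx instead.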
