Python raises this error when reading the Excel file with every method I have tried, i.e. pandas and openpyxl.
raise CompDocError(msg)
xlrd.compdoc.CompDocError: MSAT extension: accessing sector 131072 but only 22863 in file
You might be trying to open a corrupt Excel file. Assuming you're opening the file using xlrd, you could try adding the ignore_workbook_corruption=True parameter:
workbook = xlrd.open_workbook('file_name.xls', ignore_workbook_corruption=True)
A few of my users (all of whom use Mac) have uploaded an Excel file into my application, which then rejected it because the file appeared to be empty. After some debugging, I've determined that the file was saved in Strict Open XML Spreadsheet format, and that openpyxl (2.6.0) doesn't raise an error, but rather prints a warning to stderr.
To reproduce, open a file, add a few rows and save it in Strict Open XML Spreadsheet (*.xlsx) format.
import openpyxl
with open('excel_open_strict.xlsx', 'rb') as f:
    workbook = openpyxl.load_workbook(filename=f)
This will print the following warning, but will not throw any exception:
UserWarning: File contains an invalid specification for Sheet1. This will be removed
Furthermore, the workbook appears to have no sheets:
assert workbook.get_sheet_names() == []
I've now had three Mac users experience this issue. It seems that Mac will sometimes default to this Strict Open XML Spreadsheet format. If this is a normal case, then openpyxl should be able to handle it. Otherwise, it would be great if openpyxl would just raise an exception. As a workaround, it seems I can do the following:
import openpyxl
with open('excel_open_strict.xlsx', 'rb') as f:
    workbook = openpyxl.load_workbook(filename=f)
if not workbook.get_sheet_names():
    raise Exception("The Excel was saved in an incorrect format")
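Since the report asks for an exception rather than a warning, another option is to escalate warnings with the standard warnings module. The following is only a sketch: load_strictly is a hypothetical helper name (not part of openpyxl), and the loader is passed in as an argument so the wrapper itself does not depend on any particular library.

```python
import warnings


def load_strictly(loader, *args, **kwargs):
    """Call loader(*args, **kwargs), turning any UserWarning into an exception.

    Hypothetical helper for illustration; it is not an openpyxl function.
    """
    with warnings.catch_warnings():
        # Within this block a UserWarning is raised instead of being printed.
        warnings.simplefilter("error", UserWarning)
        return loader(*args, **kwargs)
```

Assuming openpyxl emits the message through warnings.warn, a call such as load_strictly(openpyxl.load_workbook, 'excel_open_strict.xlsx') should then raise UserWarning instead of returning a workbook without sheets.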
I had similar problems with XLSX files created using the R library openxlsx. Here is a sample message from a simple Python program that opens the file and retrieves a single value from the sheet Crops:
Warning (from warnings module):
File "C:\Python38\lib\site-packages\openpyxl\reader\workbook.py", line 88
warn(msg)
UserWarning: File contains an invalid specification for Crops. This will be removed
My first, very clumsy solution:
Open with Excel
Save the file as *.xls, which triggered a warning about compatibility.
Re-save as *.xlsx
My second solution works if you only need to read the file:
Impose a read-only restriction:
wb = load_workbook(filename = 'CAF_LTAR_crops_out_0.3.xlsx', read_only=True)
The broad lesson seems to be that the XLSX file specification is not uniformly (correctly?) implemented across programming languages.
I am working on a Windows PC and had the same problem with openpyxl. I was given an Excel template that had been saved as Strict Open XML Spreadsheet (*.xlsx). When I tried to fill out the template, I got a warning like the one below for each worksheet, and when I printed the list of worksheet names it was empty ([]).
UserWarning: File contains an invalid specification for Sheetname. This will be removed
Solution
I saved the file as Excel Workbook (*.xlsx) instead of Strict Open XML Spreadsheet (*.xlsx). After that the warning no longer appeared, the list contained all worksheets, and I could fill out the template with openpyxl.
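To tell the two *.xlsx variants apart before handing a file to openpyxl, one could inspect the workbook part inside the ZIP archive. This is a rough sketch using only the standard library. It assumes the workbook part sits at the conventional xl/workbook.xml path and that Strict files declare the purl.oclc.org namespace there; is_strict_ooxml is a hypothetical name.

```python
import zipfile

# Namespace declared by the main workbook part in Strict Open XML files.
STRICT_NS = b"http://purl.oclc.org/ooxml/spreadsheetml/main"


def is_strict_ooxml(path_or_file, workbook_part="xl/workbook.xml"):
    """Return True if the workbook part mentions the Strict namespace.

    Hypothetical helper; assumes the workbook part is at xl/workbook.xml.
    """
    with zipfile.ZipFile(path_or_file) as archive:
        try:
            xml = archive.read(workbook_part)
        except KeyError:
            # No workbook part at the assumed location, so we cannot tell.
            return False
    return STRICT_NS in xml
```

A file flagged this way could be rejected with a message asking the user to re-save it as Excel Workbook (*.xlsx).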
I'm using Python 3.5 and xlsxwriter to create reports in Excel. These reports need to have Outlook .msg files embedded in the Excel rows (which is usually done by adding them as "objects" in Excel, just as you would with a .pdf file, for example).
example of object in excel:
Unfortunately xlsxwriter seems to have methods for inserting images (worksheet.insert_image), buttons (worksheet.insert_button) and charts but no objects. I tried using insert_image with the .msg file but it gave an error.
worksheet.insert_image(row, col + 6,'C:/object1.msg')
Error: "Exception: C:/object1.msg: Unknown or unsupported image file format."
My question is, is there any library or xlsxwriter method to insert an object (.pdf or .msg file or whatever) into an xlsx file?
So I have an .xls file which I am able to open with Excel and also with Notepad (can see the numbers along with some other text) but I cannot read the file using pandas module.
df = pd.read_excel(r'R:\Project\Projects\429 - Buchner Höhe\Analysis Data\scada\20171101.xls',parse_dates=[[0,1,2,3]])
The error which pops up is as follows:
XLRDError: Unsupported format, or corrupt file: Expected BOF record;
found b'\x03\x11\x0b\x02 \x01\x00\x00'
I tried renaming the file to .xlsx using os.rename, but it still does not work.
It is quite likely that the file is actually a CSV file that was given an .xls extension through the file system, rather than a file in a real Excel format (.xls or .xlsx). This is the error xlrd raises when you try to open a CSV file with it.
The indicator that this is the case is you can open it with Notepad.
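A quicker check than opening the file in Notepad is to look at its first bytes. This is a minimal sketch with a hypothetical helper name, sniff_spreadsheet. It only recognises the OLE2 container used by most .xls files and the ZIP container used by .xlsx, so "other" covers plain text such as CSV but also older Excel formats that are not wrapped in OLE2.

```python
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # OLE2 compound document (.xls)
ZIP_MAGIC = b"PK\x03\x04"  # ZIP local file header (.xlsx)


def sniff_spreadsheet(path):
    """Guess the container format from the first bytes of the file.

    Hypothetical helper; returns "xls", "xlsx" or "other".
    """
    with open(path, "rb") as handle:
        head = handle.read(8)
    if head.startswith(OLE2_MAGIC):
        return "xls"
    if head.startswith(ZIP_MAGIC):
        return "xlsx"
    return "other"
```

If the result is "other" and the file reads as text, trying pd.read_csv with a suitable separator is a reasonable next step.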
I have a program (zTree) that is writing an Excel file and updating it constantly. What I need this Python program to do is read in the data from the Excel file as it's updating. The problem I'm having is that when I try to read in the data using xlrd, I get the error:
peek = f.read(peeksz)
IO Error: [Errno 13] Permission denied
which comes up because Excel is in read-only mode. Is there any way to read in the data of an Excel file in read-only mode using Python?
I just tested it on Windows 7 (64-bit), and in this case it works:
import xlrd
workbook = xlrd.open_workbook('C:/User/myaccount/Book1.xls')
worksheet = workbook.sheet_by_name('Sheet1')
print(worksheet)
Could it be that you are trying to copy the file first, or that Python is trying to put a temporary copy of the file in the Python directory? Either would cause the IOError.
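If the error only appears at the moments when the other program holds the file, retrying the read a few times may be enough. This is a sketch under that assumption; whether it helps depends on how zTree locks the file. read_snapshot and its parameters are hypothetical names.

```python
import time


def read_snapshot(path, attempts=5, delay=0.5):
    """Return the file's bytes, retrying when the open is denied.

    Hypothetical helper; in Python 3, errno 13 surfaces as PermissionError.
    """
    for attempt in range(attempts):
        try:
            with open(path, "rb") as handle:
                return handle.read()
        except PermissionError:
            if attempt == attempts - 1:
                raise  # give up after the last attempt
            time.sleep(delay)  # wait before trying again
```

The returned bytes can then be parsed in memory, for example with xlrd.open_workbook(file_contents=data), so the file on disk is only touched briefly.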