Pandas unable to open this Excel file

I am trying to use pandas to open an Excel file. The code is simple:
import pandas as pd
df = pd.read_excel('../TestXLWings.xlsm', sheetname="TestSheet")
I got the error below:
Traceback (most recent call last):
File "C:\Program Files\JetBrains\PyCharm Community Edition 2017.2\helpers\pydev\pydevd.py", line 1599, in <module>
globals = debugger.run(setup['file'], None, None, is_module)
File "C:\Program Files\JetBrains\PyCharm Community Edition 2017.2\helpers\pydev\pydevd.py", line 1026, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "C:\Program Files\JetBrains\PyCharm Community Edition 2017.2\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "C:/Users/testing/Dropbox/Test-XLwings/test.py", line 3, in <module>
df = pd.read_excel('../TestXLWings.xlsm', sheetname="TestSheet")
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\excel.py", line 203, in read_excel
io = ExcelFile(io, engine=engine)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\excel.py", line 260, in __init__
self.book = xlrd.open_workbook(io)
File "C:\ProgramData\Anaconda3\lib\site-packages\xlrd\__init__.py", line 441, in open_workbook
ragged_rows=ragged_rows,
File "C:\ProgramData\Anaconda3\lib\site-packages\xlrd\book.py", line 87, in open_workbook_xls
ragged_rows=ragged_rows,
File "C:\ProgramData\Anaconda3\lib\site-packages\xlrd\book.py", line 595, in biff2_8_load
raise XLRDError("Can't find workbook in OLE2 compound document")
xlrd.biffh.XLRDError: Can't find workbook in OLE2 compound document
My Excel file is an .xlsm file and is protected by a password. What exactly does "OLE2 compound document" mean? Does pandas have problems opening this kind of Excel file? I am using Python 3.6.

I will answer my own question. As ayhan pointed out in a comment, xlrd cannot read password-protected Excel files. One solution is to remove the protection.
I need the command to unprotect an Excel file from python
Another solution is to read the protected file with xlwings. I have verified that xlwings can read a protected Excel file while the file is open in Excel.
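On what "OLE2 compound document" means: as far as I know, Excel stores a password-encrypted .xlsx/.xlsm workbook inside an OLE2 (Compound File Binary) container instead of the usual ZIP archive, which would explain why xlrd takes its .xls code path and then cannot find a workbook stream. Below is a minimal sketch for checking which container a file uses from its first bytes. The function name sniff_container is made up for illustration and is not part of pandas or xlrd.

```python
# Signatures of the two container formats Excel files normally use.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE2 / Compound File Binary
ZIP_MAGIC = b"PK\x03\x04"                        # ZIP local file header


def sniff_container(path):
    """Return 'ole2', 'zip' or 'unknown' based on the file's leading bytes.

    Hypothetical helper: it only inspects the signature, it does not
    validate the rest of the file.
    """
    with open(path, "rb") as fh:
        head = fh.read(8)
    if head.startswith(OLE2_MAGIC):
        return "ole2"   # legacy .xls, or (assumed) an encrypted .xlsx/.xlsm
    if head.startswith(ZIP_MAGIC):
        return "zip"    # ordinary, unencrypted .xlsx/.xlsm
    return "unknown"
```

If this returns 'ole2' for a file with an .xlsm extension, the workbook is most likely encrypted, so it has to be unprotected or opened through Excel itself (for example with xlwings) before pandas can read it.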

I would create a new Excel file and remove the sensitivity label in Excel. After that, the file can be read with pandas.

Related

Getting "At least one sheet must be visible" when trying to open excel spreadsheet. What to do?

When trying to open an Excel spreadsheet, make some changes, and save it:
import openpyxl
workbook = openpyxl.load_workbook(filename = 'sample.xlsx', read_only=False)
workbook.save('test.xlsx')
I get the following upon calling save():
Traceback (most recent call last):
File "read_pyxl.py", line 4, in <module>
workbook.save('test.xlsx')
File "/home/martin/myvenv/lib/python3.8/site-packages/openpyxl/workbook/workbook.py", line 407, in save
save_workbook(self, filename)
File "/home/martin/myvenv/lib/python3.8/site-packages/openpyxl/writer/excel.py", line 293, in save_workbook
writer.save()
File "/home/martin/myvenv/lib/python3.8/site-packages/openpyxl/writer/excel.py", line 275, in save
self.write_data()
File "/home/martin/myvenv/lib/python3.8/site-packages/openpyxl/writer/excel.py", line 89, in write_data
archive.writestr(ARC_WORKBOOK, writer.write())
File "/home/martin/myvenv/lib/python3.8/site-packages/openpyxl/workbook/_writer.py", line 148, in write
self.write_views()
File "/home/martin/myvenv/lib/python3.8/site-packages/openpyxl/workbook/_writer.py", line 135, in write_views
active = get_active_sheet(self.wb)
File "/home/martin/myvenv/lib/python3.8/site-packages/openpyxl/workbook/_writer.py", line 33, in get_active_sheet
raise IndexError("At least one sheet must be visible")
IndexError: At least one sheet must be visible
How do I fix this?

It turned out that in this case sample.xlsx had all of its sheets set to "hidden".
If at least one of the sheets is set to visible, the code works.
It is a weird limitation, but it exists.
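A minimal sketch of working around this from Python, assuming openpyxl's sheet_state attribute (with the values 'visible', 'hidden' and 'veryHidden') is what controls visibility. The helper names ensure_visible_sheet and resave are invented for this example.

```python
def ensure_visible_sheet(workbook):
    """Unhide the first sheet if no sheet in the workbook is visible.

    Works on any object exposing `worksheets`, where each sheet has a
    `sheet_state` attribute, which is how openpyxl models visibility.
    """
    sheets = workbook.worksheets
    if sheets and not any(ws.sheet_state == "visible" for ws in sheets):
        sheets[0].sheet_state = "visible"
    return workbook


def resave(src, dst):
    """Load src with openpyxl, guarantee a visible sheet, and save to dst."""
    import openpyxl  # imported here so the helper above has no dependency

    workbook = openpyxl.load_workbook(filename=src)
    ensure_visible_sheet(workbook)
    workbook.save(dst)
```

Calling resave('sample.xlsx', 'test.xlsx') would then mirror the code from the question. Note that it changes the saved copy by unhiding one sheet, which may or may not be acceptable for your file.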

Tablib xlsx file badZip file issue

I am getting an error when opening a file with the .xlsx extension on Windows 8 using the tablib library.
Python version: 2.7.14
The error is as follows:
python suit_simple_sheet_product.py
Traceback (most recent call last):
File "suit_simple_sheet_product.py", line 19, in <module>
data = tablib.Dataset().load(open(BASE_PATH).read())
File "C:\Python27\lib\site-packages\tablib\core.py", line 446, in load
format = detect_format(in_stream)
File "C:\Python27\lib\site-packages\tablib\core.py", line 1157, in detect_format
if fmt.detect(stream):
File "C:\Python27\lib\site-packages\tablib\formats\_xls.py", line 25, in detect
xlrd.open_workbook(file_contents=stream)
File "C:\Python27\lib\site-packages\xlrd\__init__.py", line 120, in open_workbook
zf = zipfile.ZipFile(timemachine.BYTES_IO(file_contents))
File "C:\Python27\lib\zipfile.py", line 770, in __init__
self._RealGetContents()
File "C:\Python27\lib\zipfile.py", line 811, in _RealGetContents
raise BadZipfile, "File is not a zip file"
zipfile.BadZipfile: File is not a zip file
The path is defined as follows:
BASE_PATH = 'C:\Users\anju\Downloads\automate\catalog-5090 fabric detail and price list.xlsx'

Excel .xlsx files are actually zip files. For the unzip to work correctly, the file must be opened in binary mode, so you need to open the file using:
import tablib
BASE_PATH = r'c:\my folder\my_test.xlsx'
data = tablib.Dataset().load(open(BASE_PATH, 'rb').read())
print data
Add r before your string to stop Python from trying to interpret the backslash characters in your path.
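Both points can be checked with the standard library alone. The sketch below is written for Python 3 (unlike the Python 2 snippet above). The function name open_xlsx_archive and the sample path are made up for illustration.

```python
import io
import zipfile


def open_xlsx_archive(path):
    """Read an .xlsx file in binary mode and open it as a ZIP archive."""
    with open(path, "rb") as fh:      # 'rb': no newline or encoding translation
        data = fh.read()
    return zipfile.ZipFile(io.BytesIO(data))


# A raw string keeps every backslash literal. Without the r prefix,
# a sequence such as '\a' is read as the single BEL control character.
escaped = "C:\\my folder\\automate"   # backslashes escaped by hand
raw = r"C:\my folder\automate"        # the same path as a raw string
```

Here escaped and raw compare equal, while "\a" has length 1 and r"\a" has length 2, which is the kind of silent path corruption the r prefix avoids.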

Python Error when reading data from .xls file

I need to read a few .xls files into Python. A sample data file was linked as data.file. I tried:
import pandas as pd
pd.read_excel('data.xls',sheet=1)
But it gives an error message:
ERROR *** codepage 21010 -> encoding 'unknown_codepage_21010' -> LookupError: unknown encoding: unknown_codepage_21010
Traceback (most recent call last):
File "", line 1, in
pd.read_excel('data.xls',sheet=1)
File "C:\Anaconda3\lib\site-packages\pandas\io\excel.py", line 113, in read_excel
return ExcelFile(io, engine=engine).parse(sheetname=sheetname, **kwds)
File "C:\Anaconda3\lib\site-packages\pandas\io\excel.py", line 150, in __init__
self.book = xlrd.open_workbook(io)
File "C:\Anaconda3\lib\site-packages\xlrd\__init__.py", line 435, in open_workbook
ragged_rows=ragged_rows,
File "C:\Anaconda3\lib\site-packages\xlrd\book.py", line 116, in open_workbook_xls
bk.parse_globals()
File "C:\Anaconda3\lib\site-packages\xlrd\book.py", line 1170, in parse_globals
self.handle_codepage(data)
File "C:\Anaconda3\lib\site-packages\xlrd\book.py", line 794, in handle_codepage
self.derive_encoding()
File "C:\Anaconda3\lib\site-packages\xlrd\book.py", line 775, in derive_encoding
_unused = unicode(b'trial', self.encoding)
File "C:\Anaconda3\lib\site-packages\xlrd\timemachine.py", line 30, in
unicode = lambda b, enc: b.decode(enc)
LookupError: unknown encoding: unknown_codepage_21010
Could anyone help with this problem?
PS: I know that if I open the file in Excel on Windows and re-save it, the code works, but I am looking for a solution that needs no manual adjustment.

Using the ExcelFile class, I was able to read the file into Python.
Let me know if this helps!
import xlrd
import pandas as pd
# Raw string so the backslash in the Windows path is taken literally.
xls = pd.ExcelFile(r'C:\data.xls')
xls.parse('Index Constituents Data', index_col=None, na_values=['NA'])

The following worked for me:
import xlrd
my_xls = xlrd.open_workbook('//myshareddrive/something/test.xls',encoding_override="gb2312")
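The LookupError itself comes from Python's codec registry: judging by the traceback, xlrd turns the workbook's codepage number into the name unknown_codepage_21010, which is not a codec Python knows, and encoding_override replaces it with one that is. A small sketch for checking a candidate encoding name before passing it on; is_known_encoding is a made-up helper name.

```python
import codecs


def is_known_encoding(name):
    """Return True if Python's codec registry recognises the encoding name."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True
```

This only confirms that the name is valid. Whether gb2312 (or any other codec) is the right choice depends on how the .xls file was actually written.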

TypeError when trying to open a workbook using openpyxl

I'm trying to use openpyxl to open and modify an existing excel workbook, but I can't even open the file without getting an error.
from openpyxl import load_workbook
ws = load_workbook('PO-Copy.xlsx')
I get a long TypeError as a result:
Traceback (most recent call last):
File "<module1>", line 6, in <module>
File "C:\Python27\Lib\site-packages\openpyxl\reader\excel.py", line 151, in load_workbook
_load_workbook(wb, archive, filename, read_only, keep_vba)
File "C:\Python27\Lib\site-packages\openpyxl\reader\excel.py", line 224, in _load_workbook
keep_vba=keep_vba)
File "C:\Python27\Lib\site-packages\openpyxl\reader\worksheet.py", line 308, in read_worksheet
fast_parse(ws, xml_source, shared_strings, style_table, color_index)
File "C:\Python27\Lib\site-packages\openpyxl\reader\worksheet.py", line 296, in fast_parse
parser.parse()
File "C:\Python27\Lib\site-packages\openpyxl\reader\worksheet.py", line 84, in parse
dispatcher[tag_name](element)
File "C:\Python27\Lib\site-packages\openpyxl\reader\worksheet.py", line 282, in parse_data_validation
dv = parser(tag)
File "C:\Python27\Lib\site-packages\openpyxl\worksheet\datavalidation.py", line 179, in parser
dv = DataValidation(**element.attrib)
TypeError: __init__() got an unexpected keyword argument 'errorStyle'
Has anyone else run into this error? Is there a fix I can use to keep going?

The ability to read DataValidation in existing files was added in openpyxl 2.1 but was limited to what DataValidation in Python supported. Work has started on supporting DataValidation fully and is available in the 2.2 branch at https://bitbucket.org/habub68/openpyxl

Error when trying to convert .sxw file to .rml in OpenERP 6.0.3

[2012-06-01 15:33:10,638][molisamples] ERROR:web-services:Uncaught exception
Traceback (most recent call last):
File "osv\osv.pyo", line 122, in wrapper
File "osv\osv.pyo", line 176, in execute
File "osv\osv.pyo", line 167, in execute_cr
File "C:\Program Files (x86)\OpenERP 6.0\Server\addons\base_report_designer\base_report_designer.py", line 42, in sxwtorml
File "C:\Program Files (x86)\OpenERP 6.0\Server\addons\base_report_designer\openerp_sxw2rml\openerp_sxw2rml.py", line 309, in sxw2rml
File "C:\Program Files (x86)\OpenERP 6.0\Server\addons\base_report_designer\openerp_sxw2rml\openerp_sxw2rml.py", line 294, in unpackNormalize
File "C:\Program Files (x86)\OpenERP 6.0\Server\addons\base_report_designer\openerp_sxw2rml\openerp_sxw2rml.py", line 269, in oo_read
File "zipfile.pyo", line 346, in __init__
File "zipfile.pyo", line 366, in _GetContents
File "zipfile.pyo", line 378, in _RealGetContents
BadZipfile: File is not a zip file
I get the above error when I try to convert a report I just designed (using OpenOffice Writer) to .rml. What could be the issue? I am seriously confused.

You can convert your .sxw to .rml using the base_report_designer module.
Try the following steps:
Open a terminal and go to the openerp_sxw2rml folder:
cd addons/base_report_designer/openerp_sxw2rml
Then run this command, substituting the absolute paths of the .sxw and .rml files:
python openerp_sxw2rml.py <absolute path of sxw> > <absolute path of rml>
For example:
python openerp_sxw2rml.py /home/arya/my_module/report/my_report.sxw > /home/arya/my_module/report/my_report.rml
This converts the .sxw file to .rml and writes the result to the .rml path you gave.
Thank you.

The error says that the file is not a zip file, so the converter is probably expecting the compressed format of the .sxw file. Is there any chance you saved the file in OpenOffice's uncompressed format?
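One way to test that theory before running the converter is to check the file with Python's zipfile module. This is only a sketch: describe_sxw is a made-up name, and the assumption that a compressed .sxw archive contains a member called content.xml is mine, not something taken from the OpenERP code.

```python
import zipfile


def describe_sxw(path):
    """Report whether path is a ZIP archive and whether it holds content.xml."""
    if not zipfile.is_zipfile(path):
        return "not a zip file"
    with zipfile.ZipFile(path) as archive:
        names = archive.namelist()
    if "content.xml" in names:      # assumed member of a compressed .sxw
        return "zip with content.xml"
    return "zip without content.xml"
```

A result of "not a zip file" would match the BadZipfile error in the traceback and point at how the report was saved rather than at the converter.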

When you save in OpenOffice Writer, make sure you select the old format, the one with the SXW extension.
Don't just type .sxw yourself. Let the program add it by selecting the correct entry in the file format selection box (I forget the full title and cannot check at the moment).

I figured it out. I had some errors in the python parser file for the report. That's what was causing the issue. It's been fixed now. Thanks y'all for the help
