Book has no extract_formulas attribute calling xlrd.open_workbook() - python

I have this code:
import xlrd
path = "C:\\Users\\m.macapanas\\Desktop\\OFCCP_Default_Values.xlsm"
excel_workbook = xlrd.open_workbook(path)
excel_worksheet = excel_workbook.sheet_by_index(0)
#Read from Excel Worksheet
print("Your Worksheet has " + str(excel_worksheet.ncols) + " columns")
print("Your Worksheet has " + str(excel_worksheet.nrows) + " rows")
for row in range(excel_worksheet.nrows):
    for col in range(excel_worksheet.ncols):
        print(excel_worksheet.cell_value(row, col), end='')
        print('\t', end='')
    print()
Running it produces this error:
Traceback (most recent call last):
File "C:/Users/m.macapanas/IdeaProjects/OFCCP Tool/Read Excel File with Python/Pandas.py", line 4, in <module>
excel_workbook = xlrd.open_workbook(path)
File "C:\Users\m.macapanas\AppData\Roaming\Python\Python36\site-packages\xlrd\__init__.py", line 141, in open_workbook
ragged_rows=ragged_rows,
File "C:\Users\m.macapanas\AppData\Roaming\Python\Python36\site-packages\xlrd\xlsx.py", line 808, in open_workbook_2007_xml
x12book.process_stream(zflo, 'Workbook')
File "C:\Users\m.macapanas\AppData\Roaming\Python\Python36\site-packages\xlrd\xlsx.py", line 265, in process_stream
meth(self, elem)
File "C:\Users\m.macapanas\AppData\Roaming\Python\Python36\site-packages\xlrd\xlsx.py", line 392, in do_sheet
sheet = Sheet(bk, position=None, name=name, number=sheetx)
File "C:\Users\m.macapanas\AppData\Roaming\Python\Python36\site-packages\xlrd\sheet.py", line 326, in __init__
self.extract_formulas = book.extract_formulas
AttributeError: 'Book' object has no attribute 'extract_formulas'

The xlrd documentation states in a warning:
This library will no longer read anything other than .xls files.
Your error is popping up when you attempt to open a workbook for the file "C:\\Users\\m.macapanas\\Desktop\\OFCCP_Default_Values.xlsm", which has a .xlsm extension.
The xlrd library explicitly doesn't support reading the newer file formats like .xlsm. So you'll either have to switch libraries or find a way to convert your input file to the supported .xls format.
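One way to avoid failing deep inside xlrd is to choose the reader from the file extension before opening anything. The following is only a sketch: `pick_excel_reader` is a hypothetical helper name (it is not part of xlrd or openpyxl), and the mapping simply encodes the xlrd warning quoted above plus the assumption that openpyxl is the library you switch to for the newer formats.

```python
import os

# Hypothetical helper for illustration; not part of xlrd or openpyxl.
XLRD_EXTENSIONS = {".xls"}                 # per the xlrd warning quoted above
OPENPYXL_EXTENSIONS = {".xlsx", ".xlsm"}   # assumption: openpyxl is used for these


def pick_excel_reader(path):
    """Return the name of the library to use for `path`, based on its extension."""
    ext = os.path.splitext(path)[1].lower()
    if ext in XLRD_EXTENSIONS:
        return "xlrd"
    if ext in OPENPYXL_EXTENSIONS:
        return "openpyxl"
    raise ValueError("Unsupported Excel extension: " + repr(ext))


if __name__ == "__main__":
    print(pick_excel_reader("C:\\Users\\m.macapanas\\Desktop\\OFCCP_Default_Values.xlsm"))
```

Checking the extension up front turns the confusing AttributeError into a clear message about which library the file needs.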

Issue
Analyze the error
line 4, in excel_workbook = xlrd.open_workbook(path)
Your script fails to open the workbook.
AttributeError: 'Book' object has no attribute 'extract_formulas'
The AttributeError says that xlrd's Book object has no attribute named extract_formulas.
Caused by an unsupported file format (.xlsx / .xlsm)
As Nathaniel Ford's answer explained:
xlrd (as of current version 2.0.1) only supports the older Excel file format .xls
See also
Pandas cannot open an Excel (.xlsx) file
Why is python xlrd errors when opening a .xlsm instead of .xls
Alternative solution
Research on Stackoverflow gave:
How can I open an Excel file in Python?
Working with Excel Files in Python is a great collection of resources that lists popular libraries.
Ported to OpenPyXL
There on top: openpyxl
The recommended package for reading and writing Excel 2010 files (ie: .xlsx)
After installing using:
pip install openpyxl
Your code might be ported to this library like:
from openpyxl import load_workbook
path = "C:\\Users\\m.macapanas\\Desktop\\OFCCP_Default_Values.xlsm"
excel_workbook = load_workbook(filename=path)
excel_worksheet = excel_workbook.worksheets[0]  # first worksheet
# Read from Excel Worksheet
# openpyxl exposes max_column / max_row instead of xlrd's ncols / nrows
print("Your Worksheet has " + str(excel_worksheet.max_column) + " columns")
print("Your Worksheet has " + str(excel_worksheet.max_row) + " rows")
# values_only=True yields plain cell values instead of Cell objects
for row in excel_worksheet.iter_rows(values_only=True):
    for value in row:
        print(value, end='')
        print('\t', end='')
    print()

Related

Win32com Module Problems

I want to convert .xls files to .xlsx, so I am using the win32com module.
This is my code:
import os
import win32com.client as win32
address = os.getcwd()
fname = address + "\\Bundles.xls"
fname2 = address + "\\searchresults.xls"
excel = win32.gencache.EnsureDispatch('Excel.Application')
excel2 = win32.gencache.EnsureDispatch('Excel.Application')
wb = excel.Workbooks.Open(fname)
wb5 = excel.Workbooks.Open(fname2)
wb.SaveAs(fname+"x", FileFormat = 51)
wb5.SaveAs(fname2+"x", FileFormat = 51) #FileFormat = 51 is for .xlsx extension
wb.Close()
wb5.Close() #FileFormat = 56 is for .xls extension
excel.Application.Quit()
excel2.Application.Quit()
print('File .xls convert .xlsx successful!!')
Then I got an error. Here is the traceback:
Traceback (most recent call last):
File "c:/Users/shenshuaic/Desktop/SFP Program/win32test.py", line 7, in <module>
excel = win32.gencache.EnsureDispatch('Excel.Application')
File "C:\Users\shenshuaic\AppData\Roaming\Python\Python37\site-packages\win32com\client\gencache.py", line 527, in EnsureDispatch
disp = win32com.client.Dispatch(prog_id)
File "C:\Users\shenshuaic\AppData\Roaming\Python\Python37\site-packages\win32com\client\__init__.py", line 96, in Dispatch
return __WrapDispatch(dispatch, userName, resultCLSID, typeinfo, clsctx=clsctx)
File "C:\Users\shenshuaic\AppData\Roaming\Python\Python37\site-packages\win32com\client\__init__.py", line 37, in __WrapDispatch
klass = gencache.GetClassForCLSID(resultCLSID)
File "C:\Users\shenshuaic\AppData\Roaming\Python\Python37\site-packages\win32com\client\gencache.py", line 183, in GetClassForCLSID
mod = GetModuleForCLSID(clsid)
File "C:\Users\shenshuaic\AppData\Roaming\Python\Python37\site-packages\win32com\client\gencache.py", line 226, in GetModuleForCLSID
mod = GetModuleForTypelib(typelibCLSID, lcid, major, minor)
File "C:\Users\shenshuaic\AppData\Roaming\Python\Python37\site-packages\win32com\client\gencache.py", line 266, in GetModuleForTypelib
AddModuleToCache(typelibCLSID, lcid, major, minor)
File "C:\Users\shenshuaic\AppData\Roaming\Python\Python37\site-packages\win32com\client\gencache.py", line 552, in AddModuleToCache
dict = mod.CLSIDToClassMap
AttributeError: module 'win32com.gen_py.00020813-0000-0000-C000-000000000046x0x1x9' has no attribute 'CLSIDToClassMap'
It looks like you're running into a bug that occurs when you use early binding with win32com. If you can, use late binding instead, since it does not trigger this error. If you do need early binding, you need to delete the Python code that win32com auto-generated. Go to this location on your system:
C:\Users\<USERNAME>\AppData\Local\Temp\gen_py
You'll see some folders in there, each representing the version of Python the code was generated for. For example, 3.7 means Python 3.7. Go inside whichever one you see and you should find a number of Python files, each representing a different object that you've used early binding for.
All you need to do is delete the 3.7 folder (or whatever it is called) and re-run your code. That fixes the issue 90% of the time.
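If you would rather script the cleanup than delete the folder by hand, a minimal sketch could look like the following. `clear_gen_py_cache` is a hypothetical helper name, and the default location (`gen_py` inside the system temp directory) is an assumption based on the path given above; pass the real folder explicitly if yours differs.

```python
import os
import shutil
import tempfile


def clear_gen_py_cache(cache_root=None):
    """Delete the per-version folders inside the win32com gen_py cache.

    cache_root defaults to <temp dir>/gen_py, which is an assumption based on
    the location mentioned above. Returns the list of folders removed.
    """
    if cache_root is None:
        cache_root = os.path.join(tempfile.gettempdir(), "gen_py")
    removed = []
    if not os.path.isdir(cache_root):
        return removed  # nothing has been generated yet
    for name in os.listdir(cache_root):
        folder = os.path.join(cache_root, name)
        if os.path.isdir(folder):
            shutil.rmtree(folder)  # win32com regenerates this on the next run
            removed.append(folder)
    return removed


if __name__ == "__main__":
    for folder in clear_gen_py_cache():
        print("Removed", folder)
```

Run it once before re-running the conversion script; win32com rebuilds the generated modules the next time early binding is requested.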
Now with your code, I would recommend you modify it a little since you really do not need to have two instances of Excel opened at the same time.
import os
import win32com.client as win32
address = os.getcwd()
file_name_1 = address + "\\Bundles.xls"
file_name_2 = address + "\\searchresults.xls"
# Drop the old .xls extension so the saved files end in .xlsx
new_file_name_1 = os.path.splitext(file_name_1)[0] + "_converted.xlsx"
new_file_name_2 = os.path.splitext(file_name_2)[0] + "_converted.xlsx"
# A single Excel instance can open both workbooks
excel = win32.gencache.EnsureDispatch('Excel.Application')
file_1 = excel.Workbooks.Open(file_name_1)
file_2 = excel.Workbooks.Open(file_name_2)
# FileFormat = 51 is the .xlsx format
file_1.SaveAs(new_file_name_1, FileFormat = 51)
file_2.SaveAs(new_file_name_2, FileFormat = 51)
file_1.Close()
file_2.Close()
excel.Application.Quit()
print('Converted .xls files to .xlsx successfully!')

Openpyxl FileNotFoundError

I have been trying to get openpyxl working with PyCharm, but the Excel documents appear with a question mark, and when I try to run my code it raises FileNotFoundError.
import openpyxl as xl
wb = xl.load_workbook("transactions.xlsx")
print(wb)
I expect the output to be the cell values, but instead I get this:
Traceback (most recent call last):
File "C:/Users/nicol/.PyCharmCE2019.1/config/scratches/excel_work.py", line 3, in <module>
wb = xl.load_workbook("transactions.xlsx")
File "C:\Users\nicol\PycharmProjects\FirstProject\venv\lib\site-packages\openpyxl\reader\excel.py", line 311, in load_workbook
data_only, keep_links)
File "C:\Users\nicol\PycharmProjects\FirstProject\venv\lib\site-packages\openpyxl\reader\excel.py", line 126, in __init__
self.archive = _validate_archive(fn)
File "C:\Users\nicol\PycharmProjects\FirstProject\venv\lib\site-packages\openpyxl\reader\excel.py", line 98, in _validate_archive
archive = ZipFile(filename, 'r')
File "C:\Users\nicol\AppData\Local\Programs\Python\Python37-32\lib\zipfile.py", line 1204, in __init__
self.fp = io.open(file, filemode)
FileNotFoundError: [Errno 2] No such file or directory: 'transactions.xlsx'
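Use the full path to the file. For example, if the file is in:
C:\Users\mee\Desktop\Test
import openpyxl as xl
# Raw string so the backslashes are not treated as escape sequences
wb = xl.load_workbook(r"C:\Users\mee\Desktop\Test\transactions.xlsx")  # change to your path
print(wb)
You must either use the full path or change the working directory, for example: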
import openpyxl as xl
import os
os.chdir("c:/user/sam/desktop/test")
wb = xl.load_workbook("transactions.xlsx")
print(wb)
Here transactions.xlsx is in the test folder.
It is also always a good idea to check if the "filename string" actually refers to a file. In order to check this, use something like
import os
absolute_filename = r"C:\Users\mee\Desktop\Test\transactions.xlsx"
if not os.path.isfile(absolute_filename):
    print("ERROR: File not found!")
    exit(-1)
This way you can be sure the file is actually there! If it isn't, all libraries (e.g. openpyxl) will throw some sort of error/exception.
I've faced the same issue. It worked for me when I copied the relative path, which is the path starting from the project name.
from openpyxl import Workbook, load_workbook
wb = load_workbook('Projects/automate_excel/book1.xlsx')
I hope it works for you! Also, make sure that your Excel file and your Python file are in the same folder.
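A relative name like "transactions.xlsx" is resolved against the current working directory, which in an IDE is not necessarily the folder the script lives in. A minimal sketch of one way around that is to build the path from the script's own location; `path_beside` is a hypothetical helper name introduced here for illustration.

```python
from pathlib import Path


def path_beside(script_file, filename):
    """Return an absolute path to `filename` in the same folder as `script_file`."""
    return Path(script_file).resolve().parent / filename


if __name__ == "__main__":
    # Shows where relative file names are currently being looked up
    print("Current working directory:", Path.cwd())
    # In a real script you would typically pass __file__ and hand the result
    # to the loader, e.g. load_workbook(str(path_beside(__file__, "transactions.xlsx")))
    print(path_beside(__file__, "transactions.xlsx"))
```

With this approach the workbook is found no matter which directory the script is launched from, as long as it sits next to the Python file.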

What does the Permission error mean when trying to read in Excel files using the xlrd module?

I am new to python and I am just trying to figure out how to read in a data set from Excel using the xlrd module. When I run my code I am getting the permission error [errno 13]. I'm not sure what the error means or why I am getting it.
Here is my code I am using:
import xlrd
loc = ("path to the file I'm trying to read in")
wb = xlrd.open_workbook(loc)
sheet = wb.sheet_by_index(0)
sheet.cell_value(0,0)
print(sheet.nrows)
and this is the output I get:
Traceback (most recent call last):
File "GaitOptMain.py", line 46, in <module>
wb = xlrd.open_workbook(loc)
File "C:\Users\mleef\AppData\Local\Programs\Python\Python37\lib\site-packages\xlrd\__init__.py", line 116, in open_workbook
with open(filename, "rb") as f:
PermissionError: [Errno 13] Permission denied: [path that I used in the code]
I was actually able to figure it out. The problem was that I was trying to read a directory and not an actual file: the path I was using ended at the folder, not at the file.
loc = ("C:/Users/mleef/Desktop/python text/practice_data.xlsx")
wb = xlrd.open_workbook(loc)
sheet = wb.sheet_by_index(0)
sheet.cell_value(0,0)
print(sheet.nrows)
output:
1429 (number of rows in the data set)
You will also see this error if the file is already open in another program.
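Both causes mentioned above (a path that ends at a folder, and a file that is held open elsewhere) can be checked before the path is handed to xlrd. This is only a sketch: `describe_path_problem` is a hypothetical helper, and the wording of the messages is my own.

```python
import os


def describe_path_problem(path):
    """Return a short description of why `path` may fail to open, or None if it looks fine."""
    if os.path.isdir(path):
        return "path is a directory, not a file"
    if not os.path.exists(path):
        return "path does not exist"
    try:
        # Try the same kind of open that the traceback above shows failing
        with open(path, "rb"):
            pass
    except PermissionError:
        return "file cannot be opened for reading (locked or access denied)"
    return None


if __name__ == "__main__":
    problem = describe_path_problem(os.getcwd())
    print(problem)
```

Calling it on the path before `open_workbook` gives a more specific hint than the bare `[Errno 13]`.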

Getting error while opening excel file in Python

Hi, I am very new to Python. I am trying to open an Excel file in Python code, but it gives me the error below.
Code:
from xlrd import open_workbook
import os.path
wb = open_workbook('C:\Users\xxxx\Desktop\a.xlsx')
Error:
Traceback (most recent call last):
File "C:\Python27\1.py", line 3, in <module>
wb = open_workbook('C:\Users\xxxx\Desktop\a.xlsx')
File "C:\Python27\lib\site-packages\xlrd\__init__.py", line 429, in open_workbook
biff_version = bk.getbof(XL_WORKBOOK_GLOBALS)
File "C:\Python27\lib\site-packages\xlrd\__init__.py", line 1545, in getbof
bof_error('Expected BOF record; found %r' % self.mem[savpos:savpos+8])
File "C:\Python27\lib\site-packages\xlrd\__init__.py", line 1539, in bof_error
raise XLRDError('Unsupported format, or corrupt file: ' + msg)
xlrd.biffh.XLRDError: Unsupported format, or corrupt file: Expected BOF record; found 'PK\x03\x04\x14\x00\x06\x00'
Any help would be appreciated.
This is a version conflict issue. Your Excel sheet format and the format that xlrd expects are different. You could try to save the Excel sheet in a different format until you find what xlrd expects.
I'm not familiar with xlrd, but the same code runs without error on my Mac.
As @jewirth suggested, you can try saving the file in the older .xls format and opening that instead (renaming the extension alone does not convert the file).
from xlrd import open_workbook
import os.path
wb = open_workbook(r'C:\Users\XXXX\Desktop\a.xlsx')
print wb
Output : <xlrd.book.Book object at 0x0260E490>
I passed the path as a raw string (the r prefix) and got the workbook object back, so it works normally for me. Check your xlrd version and update it, or save the Excel file as .xls instead of .xlsx and try again.
You are getting that error because your version of xlrd doesn't support .xlsx.
xlrd versions 0.8.0 through 1.2.0 could read .xlsx files. From 2.0.0 onward xlrd reads only .xls, so on a current installation use a library such as openpyxl for .xlsx files.
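The bytes quoted in the error message are a useful clue: 'PK\x03\x04' is the ZIP signature, which is how .xlsx files start, whereas a genuine .xls file starts with the OLE2 compound-document signature. The sketch below reads the first bytes and reports which container the file appears to be, regardless of its extension. `sniff_excel_format` is a hypothetical helper name, and the result is a likely format, not a guarantee.

```python
ZIP_SIGNATURE = b"PK\x03\x04"                        # start of .xlsx / .xlsm (ZIP container)
OLE2_SIGNATURE = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # start of legacy .xls (OLE2 container)


def sniff_excel_format(path):
    """Guess the container format of `path` from its first bytes."""
    with open(path, "rb") as f:
        head = f.read(8)
    if head.startswith(ZIP_SIGNATURE):
        return "xlsx"     # ZIP-based, needs an .xlsx-capable reader
    if head.startswith(OLE2_SIGNATURE):
        return "xls"      # legacy binary workbook
    return "unknown"      # e.g. an HTML or CSV file with an Excel extension
```

A file that sniffs as "xlsx" but is opened with an .xls-only reader produces exactly the "Expected BOF record; found 'PK...'" message shown above.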

Can't load xlsx file

I am trying to read the attached xlsx file using Python openpyxl. However, the workbook cannot be loaded. Here is my attempt to open the xlsx file in Python:
>>> from openpyxl import load_workbook
>>> workbook = load_workbook(filename = "test.xlsx")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python27\lib\site-packages\openpyxl\reader\excel.py", line 136, in load_workbook
_load_workbook(wb, archive, filename, use_iterators, keep_vba)
File "C:\Python27\lib\site-packages\openpyxl\reader\excel.py", line 198, in _load_workbook
keep_vba=keep_vba)
File "C:\Python27\lib\site-packages\openpyxl\reader\worksheet.py", line 332, in read_worksheet
fast_parse(ws, xml_source, string_table, style_table, color_index)
File "C:\Python27\lib\site-packages\openpyxl\reader\worksheet.py", line 320, in fast_parse
parser.parse()
File "C:\Python27\lib\site-packages\openpyxl\reader\worksheet.py", line 137, in parse
dispatcher[tag_name](element)
File "C:\Python27\lib\site-packages\openpyxl\reader\worksheet.py", line 176, in parse_merge
self.ws.merge_cells(mergeCell.get('ref'))
File "C:\Python27\lib\site-packages\openpyxl\worksheet.py", line 815, in merge_cells
raise InsufficientCoordinatesException(msg)
openpyxl.shared.exc.InsufficientCoordinatesException: Range must be a cell range (e.g. A1:E1)
It appears that your .xlsx file is damaged or corrupted. There could be many reasons. One is that the file's extension was simply renamed to .xlsx, which does not make it a valid xlsx file. To confirm this behaviour, please try to open the file in Microsoft Excel.
I tried reading the file with openpyxl, xlrd and pandas, but none of them worked.
>>> import xlrd
>>> xlrd.open_workbook('test.xlsx')
XLRDError: Unsupported format, or corrupt file: Expected BOF record; found '<html> <'
>>> from openpyxl import load_workbook
>>> workbook = load_workbook(filename = "test.xlsx")
InvalidFileException: File is not a zip file
>>> import pandas
>>> pandas.ExcelFile('test.xlsx')
InvalidFileException: File is not a zip file
I ran into this issue trying to open every file in a directory ending in *.xlsx .
I later found the file that caused the error was named ~$filename.xlsx . I'm guessing that Microsoft indicates that a file is currently opened by creating a file with the same name, prepended with the ~$. Once I closed the file, everything worked as expected.
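When looping over every *.xlsx file in a directory, the ~$ files described above can be filtered out before loading. A minimal sketch, where `list_workbooks` is a hypothetical helper name:

```python
import glob
import os


def list_workbooks(directory):
    """Return the .xlsx paths in `directory`, skipping ~$ files left by an open workbook."""
    paths = glob.glob(os.path.join(directory, "*.xlsx"))
    return sorted(p for p in paths if not os.path.basename(p).startswith("~$"))
```

The returned paths can then be passed to `load_workbook` one by one without tripping over a workbook that happens to be open in Excel.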
The problem was that some merged cells were, in fact, merged with themselves. openpyxl expected a merged cell reference always to be a range of cells. A fix for the problem which ignores meaningless merges has been added to the 2.0 branch.
I like openpyxl and use it for creating xlsx documents. This could be a bug, or an Excel feature used in your specific document that openpyxl does not yet handle. I would report it to the openpyxl community.
OK, I have reported this bug to the openpyxl developers and they have provided a quick fix for it. Here is the complete thread.
I have never tried openpyxl, but I use xlrd for reading Excel files (.xls and .xlsx). It works great.
see the examples and documentation at http://www.python-excel.org/
