Getting error while opening excel file in Python

Hi, I am very new to Python. I am trying to open an Excel file (.xlsx) with xlrd, but it raises the error shown below.
Code:
from xlrd import open_workbook
import os.path
wb = open_workbook('C:\Users\xxxx\Desktop\a.xlsx')
Error:
Traceback (most recent call last):
File "C:\Python27\1.py", line 3, in <module>
wb = open_workbook('C:\Users\xxxx\Desktop\a.xlsx')
File "C:\Python27\lib\site-packages\xlrd\__init__.py", line 429, in open_workbook
biff_version = bk.getbof(XL_WORKBOOK_GLOBALS)
File "C:\Python27\lib\site-packages\xlrd\__init__.py", line 1545, in getbof
bof_error('Expected BOF record; found %r' % self.mem[savpos:savpos+8])
File "C:\Python27\lib\site-packages\xlrd\__init__.py", line 1539, in bof_error
raise XLRDError('Unsupported format, or corrupt file: ' + msg)
xlrd.biffh.XLRDError: Unsupported format, or corrupt file: Expected BOF record; found 'PK\x03\x04\x14\x00\x06\x00'
Any help would be appreciated.

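This is a file format mismatch. The bytes 'PK\x03\x04' in the error message are the ZIP signature, which means the file really is an .xlsx workbook, while your xlrd version expects the older binary .xls format (which starts with a BOF record). You could save the workbook from Excel in the older .xls format so that xlrd can read it.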

I'm not familiar with xlrd, but the same code runs without errors on my Mac.
As @jewirth suggests, you can try saving the file in the older .xls format and opening that instead. Note that merely renaming the extension does not convert the file.

from xlrd import open_workbook
import os.path
wb = open_workbook(r'C:\Users\XXXX\Desktop\a.xlsx')
print wb
Output : <xlrd.book.Book object at 0x0260E490>
I passed the path as a raw string (the r prefix) and got the Book object back, so it works normally here. Check your xlrd version and update it. Alternatively, save the Excel file as '.xls' instead of '.xlsx' and try again.
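The r prefix used above makes the path a raw string, so backslashes are kept literally instead of being read as escape sequences. A minimal sketch of the difference, using an illustrative path fragment rather than the asker's real path:

```python
# Without the r prefix, "\a" is interpreted as the BEL control character.
plain = "Desktop\a.xlsx"
# With the r prefix, the backslash and the letter "a" are both kept.
raw = r"Desktop\a.xlsx"

print(repr(plain))
print(repr(raw))
print(len(plain), len(raw))
```

In Python 3, a non-raw literal such as 'C:\Users\...' fails outright with a SyntaxError, because \U starts a Unicode escape.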

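You are getting that error because you are using an old version of xlrd which doesn't support xlsx.
You need a version of xlrd that reads xlsx (0.8.0 up to 1.2.0). Note that xlrd 2.0 and later dropped xlsx support again and read only .xls, so with a current installation use openpyxl instead.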
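The bytes quoted in the error message ('PK\x03\x04...') can be used to tell the two Excel container formats apart before choosing a library. The following is a minimal sketch; the helper names sniff_excel_container and sniff_file are hypothetical and not part of xlrd or openpyxl.

```python
# Signatures of the two container formats Excel uses.
ZIP_MAGIC = b"PK\x03\x04"                          # .xlsx / .xlsm (ZIP archive)
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"   # legacy .xls (OLE2 compound file)


def sniff_excel_container(head: bytes) -> str:
    """Classify a file by its first bytes (hypothetical helper)."""
    if head.startswith(ZIP_MAGIC):
        return "zip"
    if head.startswith(OLE2_MAGIC):
        return "ole2"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first 8 bytes of a file and classify them."""
    with open(path, "rb") as handle:
        return sniff_excel_container(handle.read(8))


# The bytes quoted in the XLRDError above start with the ZIP signature.
print(sniff_excel_container(b"PK\x03\x04\x14\x00\x06\x00"))  # prints: zip
```

A "zip" result points to the newer XML-based format, which old xlrd versions (and xlrd 2.0 and later) cannot read.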

Related

Book has no extract_formulas attribute calling xlrd.open_workbook()

I have this code:
import xlrd
path = "C:\\Users\\m.macapanas\\Desktop\\OFCCP_Default_Values.xlsm"
excel_workbook = xlrd.open_workbook(path)
excel_worksheet = excel_workbook.sheet_by_index(0)
#Read from Excel Worksheet
print("Your Worksheet has " + str(excel_worksheet.ncols) + " columns")
print("Your Worksheet has " + str(excel_worksheet.nrows) + " rows")
for row in range(excel_worksheet.nrows):
    for col in range(excel_worksheet.ncols):
        print(excel_worksheet.cell_value(row, col), end='')
        print('\t', end='')
    print()
Running it produces this error:
Traceback (most recent call last):
File "C:/Users/m.macapanas/IdeaProjects/OFCCP Tool/Read Excel File with Python/Pandas.py", line 4, in <module>
excel_workbook = xlrd.open_workbook(path)
File "C:\Users\m.macapanas\AppData\Roaming\Python\Python36\site-packages\xlrd\__init__.py", line 141, in open_workbook
ragged_rows=ragged_rows,
File "C:\Users\m.macapanas\AppData\Roaming\Python\Python36\site-packages\xlrd\xlsx.py", line 808, in open_workbook_2007_xml
x12book.process_stream(zflo, 'Workbook')
File "C:\Users\m.macapanas\AppData\Roaming\Python\Python36\site-packages\xlrd\xlsx.py", line 265, in process_stream
meth(self, elem)
File "C:\Users\m.macapanas\AppData\Roaming\Python\Python36\site-packages\xlrd\xlsx.py", line 392, in do_sheet
sheet = Sheet(bk, position=None, name=name, number=sheetx)
File "C:\Users\m.macapanas\AppData\Roaming\Python\Python36\site-packages\xlrd\sheet.py", line 326, in __init__
self.extract_formulas = book.extract_formulas
AttributeError: 'Book' object has no attribute 'extract_formulas'

The xlrd documentation states in a warning:
This library will no longer read anything other than .xls files.
Your error is popping up when you attempt to open a workbook for the file "C:\\Users\\m.macapanas\\Desktop\\OFCCP_Default_Values.xlsm", which has a .xlsm extension.
The xlrd library explicitly doesn't support reading the newer file formats like .xlsm. So you'll either have to switch libraries or find a way to downgrade your input file to supported .xls format.
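One way to act on this is to pick the reader library from the file extension before opening anything. The sketch below is only an illustration: the mapping and the reader_for helper are assumptions made for this example, not part of xlrd or openpyxl.

```python
from pathlib import Path

# Assumed mapping from file extension to a reader library; adjust as needed.
READER_BY_SUFFIX = {
    ".xls": "xlrd",
    ".xlsx": "openpyxl",
    ".xlsm": "openpyxl",
}


def reader_for(path: str) -> str:
    """Return the name of a library that can read the file (hypothetical helper)."""
    suffix = Path(path).suffix.lower()
    try:
        return READER_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"No known Excel reader for {suffix!r} files") from None


print(reader_for("OFCCP_Default_Values.xlsm"))  # prints: openpyxl
```

The extension is only a hint. A file that was renamed rather than converted will still fail to load.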

Issue
Analyze the error
line 4, in excel_workbook = xlrd.open_workbook(path)
Your script fails to open the workbook.
AttributeError: 'Book' object has no attribute 'extract_formulas'
The AttributeError says that xlrd's Book object has no attribute named extract_formulas.
Caused by unsupported file format .xlsm
As Nathaniel Ford's answer explained:
xlrd (as of current version 2.0.1) only supports the older Excel file format .xls
See also
Pandas cannot open an Excel (.xlsx) file
Why is python xlrd errors when opening a .xlsm instead of .xls
Alternative solution
A search on Stack Overflow turned up:
How can I open an Excel file in Python?
Working with Excel Files in Python is a great collection of resources that lists popular libraries.
Ported to OpenPyXL
At the top of that list is openpyxl:
The recommended package for reading and writing Excel 2010 files (ie: .xlsx)
After installing it with:
pip install openpyxl
your code might be ported to this library like this:
from openpyxl import load_workbook
path = "C:\\Users\\m.macapanas\\Desktop\\OFCCP_Default_Values.xlsm"
excel_workbook = load_workbook(filename=path)
excel_worksheet = excel_workbook.worksheets[0]  # first worksheet
# Read from Excel Worksheet
# openpyxl exposes the sheet size as max_column / max_row (not ncols / nrows)
print("Your Worksheet has " + str(excel_worksheet.max_column) + " columns")
print("Your Worksheet has " + str(excel_worksheet.max_row) + " rows")
# worksheet.rows yields one tuple of Cell objects per row
for row in excel_worksheet.rows:
    for cell in row:
        print(cell.value, end='')
        print('\t', end='')
    print()

Openpyxl FileNotFoundError

I have been trying to get openpyxl working with PyCharm, but the Excel documents appear with a question mark, and when I try to run the code it raises FileNotFoundError.
import openpyxl as xl
wb = xl.load_workbook("transactions.xlsx")
print(wb)
I expect the output to be the cell values, but instead I get this:
Traceback (most recent call last):
File "C:/Users/nicol/.PyCharmCE2019.1/config/scratches/excel_work.py", line 3, in <module>
wb = xl.load_workbook("transactions.xlsx")
File "C:\Users\nicol\PycharmProjects\FirstProject\venv\lib\site-packages\openpyxl\reader\excel.py", line 311, in load_workbook
data_only , keep_links)
File "C:\Users\nicol\PycharmProjects\FirstProject\venv\lib\site-packages\openpyxl\reader\excel.py", line 126, in __init__
self.archive = _validate_archive(fn)
File "C:\Users\nicol\PycharmProjects\FirstProject\venv\lib\site-packages\openpyxl\reader\excel.py", line 98, in _validate_archive
archive = ZipFile(filename, 'r')
File "C:\Users\nicol\AppData\Local\Programs\Python\Python37-32\lib\zipfile.py", line 1204, in __init__
self.fp = io.open(file, filemode)
FileNotFoundError: [Errno 2] No such file or directory: 'transactions.xlsx'

Add the full path to the file, for example if it is in C:\Users\mee\Desktop\Test:
import openpyxl as xl
# raw string, so the backslashes are not treated as escape sequences
wb = xl.load_workbook(r"C:\Users\mee\Desktop\Test\transactions.xlsx")  # change to your path
print(wb)

You must use the full path, or change the working directory first.
For example:
import openpyxl as xl
import os
os.chdir("c:/user/sam/desktop/test")
wb = xl.load_workbook("transactions.xlsx")
print(wb)
Here transactions.xlsx is in the test folder.
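A relative name like "transactions.xlsx" is resolved against the current working directory, which in an IDE is not necessarily the folder containing the script. As a hedged alternative to os.chdir, the sketch below builds the path from the script's own location; the helpers beside and explain_missing are hypothetical names introduced for this example.

```python
from pathlib import Path


def beside(script_file: str, filename: str) -> Path:
    """Build a path to `filename` in the same folder as `script_file`."""
    return Path(script_file).resolve().parent / filename


def explain_missing(path: Path) -> str:
    """Describe where a path points and whether a file exists there."""
    status = "exists" if path.is_file() else "is missing"
    return f"{path} {status} (working directory: {Path.cwd()})"


# In a real script, pass __file__ as the first argument.
target = beside("excel_work.py", "transactions.xlsx")
print(explain_missing(target))
```

The resulting path can be passed to load_workbook as str(target), and the printed message shows which folder was actually searched.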

It is also a good idea to check that the filename string actually refers to a file. To check this, use something like
import os
import sys
absolute_filename = r"C:\Users\mee\Desktop\Test\transactions.xlsx"
if not os.path.isfile(absolute_filename):
    print("ERROR: File not found!")
    sys.exit(1)
This way you can be sure the file is actually there. If it isn't, any library (e.g. openpyxl) will raise some sort of error or exception.

I faced the same issue. It worked for me when I copied the relative path, which is the path starting from the project name.
from openpyxl import Workbook, load_workbook
wb = load_workbook('Projects/automate_excel/book1.xlsx')
I hope it works for you! Also, make sure that your Excel file and your Python file are in the same folder.

Export Python Dataframe to Excel

I'm trying to export a pandas DataFrame to Excel (xlsx) or csv...
Here is the code I tried to use:
export_word_count = word_count.to_excel (r'C:\Users\OTR\PycharmProjects\MyProjects\word_count.xlsx', index = None, header=True)
I keep getting the following error messages:
Traceback (most recent call last):
File "C:/Users/OTR/PycharmProjects/MyProjects/CAP_Test_MotsCles.py", line 35, in <module>
export_word_count = word_count.to_excel (r'C:\Users\OTR\PycharmProjects\MyProjects\word_count_CAP.xlsx', index = None, header=True)
File "C:\Users\OTR\PycharmProjects\MyProjects\venv\lib\site-packages\pandas\core\generic.py", line 2127, in to_excel
engine=engine)
File "C:\Users\OTR\PycharmProjects\MyProjects\venv\lib\site-packages\pandas\io\formats\excel.py", line 656, in write
writer = ExcelWriter(_stringify_path(writer), engine=engine)
File "C:\Users\OTR\PycharmProjects\MyProjects\venv\lib\site-packages\pandas\io\excel.py", line 1204, in __init__
from openpyxl.workbook import Workbook
ModuleNotFoundError: No module named 'openpyxl'
Any input on this would be greatly appreciated. I tried to tweak the code, but it still wouldn't export. Thank you.
EDIT:
Managed to export, but having issues with full data export
Sample Python Data:
products 58
company 53
cannabis 42
business 39

You don't have the openpyxl module installed.
Install it with:
pip install openpyxl
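To confirm which modules the running interpreter can actually see, a small standard-library check can help. This is a sketch only; missing_modules is a hypothetical helper name.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# pandas needs openpyxl to write .xlsx files, so check for both up front.
for name in missing_modules(["pandas", "openpyxl"]):
    print(f"Missing module: {name} (try: pip install {name})")
```

Run pip from the same environment that runs the script (here the project's venv), otherwise the module is installed somewhere the script does not look.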

Your words are your index. Right now you are not exporting the index.
Try changing your code to:
word_count.to_excel(r'C:\Users\OTR\PycharmProjects\MyProjects\word_count.xlsx', index=True, header=True)
'index=True' is the default behavior, so it is not actually necessary.

Why I am not able to load excel files generated in the morning, but can load them in the afternoon in Python using Openpyxl

I am using openpyxl to import Excel files which are generated by an online tool. When I import the files generated in the morning, I get an error like this:
Traceback (most recent call last):
File "test4.py", line 8, in <module>
wb = openpyxl.load_workbook (temp2)
File "C:\Python27\lib\site-packages\openpyxl\reader\excel.py", line 201, in load_workbook
wb.properties = DocumentProperties.from_tree(src)
File "C:\Python27\lib\site-packages\openpyxl\descriptors\serialisable.py", line 89, in from_tree
return cls(**attrib)
File "C:\Python27\lib\site-packages\openpyxl\packaging\core.py", line 106, in __init__
self.modified = modified
File "C:\Python27\lib\site-packages\openpyxl\descriptors\base.py", line 267, in __set__
value = W3CDTF_to_datetime(value)
File "C:\Python27\lib\site-packages\openpyxl\utils\datetime.py", line 40, in W3CDTF_to_datetime
dt = [int(v) for v in match.groups()[:6]]
AttributeError: 'NoneType' object has no attribute 'groups'
The strange thing is that I only get this error when importing files that the online tool generated in the morning. When I generate the same file in the afternoon, it loads without problems. I can't work out where the problem is. There are no time-related fields in the Excel files, and the files generated in the morning and in the afternoon are exactly the same except for the modified time. Can anybody help me with this? Thank you.

Excel files created by this online tool are not fully compatible with openpyxl.
load_workbook opens the Excel file as a zip archive, reads the workbook-level information from 'docProps/core.xml', and assigns it to the workbook's wb.properties. One piece of that information is the modified time.
The value of modified raises the error because it cannot be converted into a datetime. It must match the pattern openpyxl.utils.datetime.W3CDTF_REGEX, which implements W3CDTF (W3C Date and Time Formats).
You can check whether the file's modified time conforms to W3CDTF. Here is the code:
from openpyxl.reader.excel import _validate_archive
archive = _validate_archive('/path/to/yourexcel.xlsx')
valid_files = archive.namelist()
# the list should contain the core properties file, normally 'docProps/core.xml'
print(valid_files)
# read the core properties
wb_info = archive.read('docProps/core.xml')
print(wb_info)
In wb_info, you will find something like
<dcterms:modified xsi:type="dcterms:W3CDTF">2017-04-01T22:48:48Z</dcterms:modified>.
Compare the wb_info of a file from the online tool with that of a file saved on your PC.
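The same check can be sketched with the standard library alone, without relying on openpyxl's private _validate_archive. The regular expression below is a simplified W3CDTF-like pattern written for this example, not openpyxl's own W3CDTF_REGEX, and the helper names are hypothetical.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by the <dcterms:modified> element in docProps/core.xml.
DCTERMS = "{http://purl.org/dc/terms/}"

# Simplified W3CDTF pattern (an assumption, not openpyxl's exact regex).
W3CDTF_LIKE = re.compile(
    r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})?$"
)


def read_modified(xlsx_path):
    """Return the raw text of <dcterms:modified>, or None if it is absent."""
    with zipfile.ZipFile(xlsx_path) as archive:
        root = ET.fromstring(archive.read("docProps/core.xml"))
    element = root.find(DCTERMS + "modified")
    return None if element is None else element.text


def looks_like_w3cdtf(value):
    """Check a timestamp string against the simplified pattern above."""
    return bool(value) and W3CDTF_LIKE.match(value) is not None


# The timestamp from the example element above passes the check.
print(looks_like_w3cdtf("2017-04-01T22:48:48Z"))  # prints: True
```

Running read_modified on a morning file and an afternoon file and comparing the two strings should show which part of the timestamp the online tool writes differently.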

Can't load xlsx file

I am trying to read an xlsx file using openpyxl, but the workbook cannot be loaded. Here is my attempt to open the file in Python:
>>> from openpyxl import load_workbook
>>> workbook = load_workbook(filename = "test.xlsx")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python27\lib\site-packages\openpyxl\reader\excel.py", line 136, in load_workbook
_load_workbook(wb, archive, filename, use_iterators, keep_vba)
File "C:\Python27\lib\site-packages\openpyxl\reader\excel.py", line 198, in _load_workbook
keep_vba=keep_vba)
File "C:\Python27\lib\site-packages\openpyxl\reader\worksheet.py", line 332, in read_worksheet
fast_parse(ws, xml_source, string_table, style_table, color_index)
File "C:\Python27\lib\site-packages\openpyxl\reader\worksheet.py", line 320, in fast_parse
parser.parse()
File "C:\Python27\lib\site-packages\openpyxl\reader\worksheet.py", line 137, in parse
dispatcher[tag_name](element)
File "C:\Python27\lib\site-packages\openpyxl\reader\worksheet.py", line 176, in parse_merge
self.ws.merge_cells(mergeCell.get('ref'))
File "C:\Python27\lib\site-packages\openpyxl\worksheet.py", line 815, in merge_cells
raise InsufficientCoordinatesException(msg)
openpyxl.shared.exc.InsufficientCoordinatesException: Range must be a cell range (e.g. A1:E1)

It appears that your .xlsx file is damaged or corrupted. There could be many reasons. One is that a file in another format was simply renamed to .xlsx, which does not make it a valid workbook. To confirm this behaviour, please try to open the file in Microsoft Excel.
I tried reading the file with openpyxl, xlrd and pandas, but none of them worked.
>>> import xlrd
>>> xlrd.open_workbook('test.xlsx')
XLRDError: Unsupported format, or corrupt file: Expected BOF record; found '<html> <'
>>> from openpyxl import load_workbook
>>> workbook = load_workbook(filename = "test.xlsx")
InvalidFileException: File is not a zip file
>>> import pandas
>>> pandas.ExcelFile('test.xlsx')
InvalidFileException: File is not a zip file
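The "File is not a zip file" message and the '<html> <' bytes above suggest a quick pre-check before handing a file to any Excel library. A minimal sketch using the standard library's zipfile.is_zipfile follows; describe_xlsx_candidate is a hypothetical helper name.

```python
import zipfile


def describe_xlsx_candidate(path):
    """Say whether a file could be an .xlsx workbook, judging by its container."""
    if zipfile.is_zipfile(path):
        return "ZIP archive: plausibly a real .xlsx file"
    # Not a ZIP archive, so look at the first bytes for an HTML page.
    with open(path, "rb") as handle:
        head = handle.read(64).lstrip().lower()
    if head.startswith(b"<html") or head.startswith(b"<!doctype html"):
        return "HTML page saved with an .xlsx extension"
    return "not a ZIP archive: not a valid .xlsx file"
```

Calling describe_xlsx_candidate("test.xlsx") on a download that is really an HTML page (for example an error or login page) would report that directly instead of a parser error.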

I ran into this issue while trying to open every file in a directory matching *.xlsx.
I later found that the file causing the error was named ~$filename.xlsx. Excel marks a workbook as currently open by creating a lock file with the same name, prefixed with ~$. Once I closed the workbook, everything worked as expected.
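When looping over a directory, such "~$" lock files can be filtered out before loading. A minimal sketch with pathlib follows; workbooks_in is a hypothetical helper name.

```python
from pathlib import Path


def workbooks_in(directory):
    """List .xlsx files in `directory`, skipping Excel's "~$" lock files."""
    return sorted(
        path
        for path in Path(directory).glob("*.xlsx")
        if not path.name.startswith("~$")
    )


# List the real workbooks in the current directory.
for workbook_path in workbooks_in("."):
    print(workbook_path.name)
```

Each returned path can then be passed to load_workbook without tripping over a workbook that happens to be open in Excel.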

The problem was that some merged cells were, in fact, merged with themselves. openpyxl expected a merged cell reference always to be a range of cells. A fix for the problem which ignores meaningless merges has been added to the 2.0 branch.

I like openpyxl and use it for creating xlsx documents. This could be a bug, or an Excel feature used in your specific document that openpyxl does not support. I would report it to the openpyxl community.

OK Guys.. I have reported this bug to openpyxl developers and they have provided a quick fix on this. Here is the complete thread.

I have never tried openpyxl, but I use xlrd for reading Excel files (.xls and .xlsx; xlsx only with xlrd versions before 2.0). It works great.
see the examples and documentation at http://www.python-excel.org/
