Index out of range when reading with Openpyxl - python

I am trying to open a .xlsx file with Openpyxl, using the "Optimized reader" tips from the documentation:
# -*- coding: iso-8859-1 -*-
from openpyxl import load_workbook
wb = load_workbook(filename = r'/path/to/the/file.xlsx', use_iterators = True)
This gives me the following error:
Traceback (most recent call last):
File "/home/me/test.py", line 5, in <module>
wb = load_workbook(filename = r'/path/to/the/file.xlsx', use_iterators = True)
File "/usr/local/lib/python2.6/dist-packages/openpyxl/reader/excel.py", line 151, in load_workbook
_load_workbook(wb, archive, filename, read_only, keep_vba)
File "/usr/local/lib/python2.6/dist-packages/openpyxl/reader/excel.py", line 240, in _load_workbook
wb._named_ranges = list(read_named_ranges(archive.read(ARC_WORKBOOK), wb))
File "/usr/local/lib/python2.6/dist-packages/openpyxl/reader/workbook.py", line 160, in read_named_ranges
named_range.scope = workbook.worksheets[int(location_id)]
IndexError: list index out of range
I also tried the flags (keep_vba = True|False, guess_types = True|False, data_only = True|False) in every combination, with the same error.
The .xlsx file I am trying to open has 13 worksheets, and no worksheet has more than 200 rows, so I suppose this is not a size problem.
I can't edit anything in this .xlsx file: I don't have permission, so it is read-only for me.
I am using Python 2.6 on 64-bit Debian Squeeze, and the version of Openpyxl is 2.1.0.
If I try to open another file (an empty test file), it works fine (no error is raised and the script carries on).
So I suppose the question is: what is wrong with the .xlsx file I am trying to open?

The problem is related to the defined names / ranges in use. I've seen it in another file but am not yet sure quite what's triggering it. Can you please submit a bug, preferably with a sample file, as this will make tracking the problem down a lot easier?
The 2.1 branch should contain a fix for this if you can try a checkout. As far as I can tell, the issue is related to the use of defined names from other workbooks or to some of the reserved names for print areas, etc. Such definitions are likely lost when the file is processed by openpyxl, but this shouldn't affect the data itself.
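To check whether a workbook carries this kind of definition without going through openpyxl, the workbook part can be read directly with the standard library. This is only a diagnostic sketch, not openpyxl's fix: it assumes the usual package layout (xl/workbook.xml), the helper name find_out_of_range_names is made up for illustration, and it reports defined names whose localSheetId points past the last sheet, which appears to be the lookup that fails in the traceback.

```python
import xml.etree.ElementTree as ET
import zipfile

# Namespace used by the main SpreadsheetML workbook part.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def find_out_of_range_names(xlsx):
    """Return (name, localSheetId) pairs whose sheet index does not exist.

    `xlsx` is a path or a binary file-like object (hypothetical helper).
    """
    with zipfile.ZipFile(xlsx) as archive:
        root = ET.fromstring(archive.read("xl/workbook.xml"))

    sheet_count = len(root.findall("m:sheets/m:sheet", NS))
    suspicious = []
    for defined_name in root.findall("m:definedNames/m:definedName", NS):
        sheet_id = defined_name.get("localSheetId")
        # Names without localSheetId are workbook-wide and cannot overflow.
        if sheet_id is not None and int(sheet_id) >= sheet_count:
            suspicious.append((defined_name.get("name"), int(sheet_id)))
    return suspicious
```

Because the file is only read, this works on a read-only workbook. An empty result would suggest the cause lies elsewhere.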

Related

How to save an excel using pywin32?

I am trying to save an Excel file that was generated by another application and is currently open, i.e. the Excel application is in the foreground. The file contains some data and needs to be saved, i.e. written to disk.
In other words, I need to do an operation like File->SaveAs.
Steps to reproduce:
Open the Excel application. By default the title will show Book1 - Excel.
Write this code and run it:
import win32com.client as win32
app = win32.gencache.EnsureDispatch('Excel.Application')
app.Workbooks(1).SaveAs(r"C:\Users\test\Desktop\test.xlsx")
app.Application.Quit()
Error -
Traceback (most recent call last):
File "c:/Users/test/Downloads/automate_excel.py", line 6, in <module>
ti = disp._oleobj_.GetTypeInfo()
pywintypes.com_error: (-2147418111, 'Call was rejected by callee.', None, None)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "c:/Users/test/Downloads/automate_excel.py", line 6, in <module>
app = win32.gencache.EnsureDispatch('Excel.Application')
File "C:\Users\test\AppData\Local\Programs\Python\Python38\lib\site-packages\win32com\client\gencache.py", line 633, in EnsureDispatch
raise TypeError(
TypeError: This COM object can not automate the makepy process - please run makepy manually for this object
There could be many sources for your problem, so I would appreciate it if you shared more code. The second error can occur, for example, when the line excel = win32.gencache.EnsureDispatch('Excel.Application') runs multiple times, such as inside a for loop.
Also make sure your version of Excel is fully activated and licensed.
This is working for me (on python==3.9.8 and pywin32==305). You'll see that the line that creates the Excel object is different from yours (Dispatch instead of gencache.EnsureDispatch), but I think that's really it.
In the course of this we kept getting AttributeErrors for the Workbook or when setting DisplayAlerts. We found (from this question: Excel.Application.Workbooks attribute error when converting excel to pdf) that if Excel is busy (for example, a cell is being edited or a pop-up is open), you will get an error. So be sure to press Enter to leave the cell so that you aren't editing it.
import win32com.client as win32
savepath = 'c:\\my\\file\\path\\test\\'
xl = win32.Dispatch('Excel.Application')
wb = xl.Workbooks['Book1']
xl.DisplayAlerts = False # DisplayAlerts is a property of the Application object, not the workbook. Helpful if saving multiple times to the same file: you won't get an overwrite pop-up and Excel will default to saving.
filename = 'new_xl.xlsx'
wb.SaveAs(savepath+filename)
wb.Close()
xl.Quit()
edit: add pywin32 version, include some more tips
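Since the failures described above come from Excel being temporarily busy, one possible complement is to retry the call a few times before giving up. The sketch below is an assumption on my part rather than something from this answer: call_with_retry is a hypothetical helper, and it will not help if the cell stays in edit mode for the whole retry window.

```python
import time


def call_with_retry(action, attempts=5, delay=1.0, retry_on=(Exception,)):
    """Call action() until it succeeds or the attempts run out.

    Hypothetical helper: re-raises the last error after the final attempt.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_on:
            if attempt == attempts:
                raise
            time.sleep(delay)  # give the busy application time to become idle
```

With pywin32 this could be used as call_with_retry(lambda: wb.SaveAs(savepath + filename), retry_on=(pywintypes.com_error,)) after import pywintypes, so that only COM errors are retried.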
This is the version that worked for me, based on @scotscotmcc's answer. The issue was that a cell was in edit mode while I was running the program. Make sure you press Enter in the current cell to leave edit mode in Excel.
import win32com.client as win32
import random
xl = win32.Dispatch('Excel.Application')
wb = xl.Workbooks['Book1']
wb.SaveAs(r"C:\Users\...\Desktop\Form"+str(random.randint(0,1000))+".xlsx")
wb.Close()
xl.Quit()

Modifying the xlsx file using openpyxl in databricks directly without pandas/dataframe

import openpyxl
input_workbook1 = openpyxl.load_workbook('/dbfs/FileStore/Test/my_excel.xlsx')
sheet_1 = input_workbook1.active
sheet_1['A2'] = 'A2'
input_workbook1.save('/dbfs/FileStore/Test/Output.xlsx')
OSError: [Errno 95] Operation not supported
I read the Excel file directly with openpyxl in Databricks and can read and modify it without pandas/dataframes, but the save call (the last line in the code above) raises this error. Can anyone help me, please?
I tried the same procedure and got the same error, OSError: [Errno 95] Operation not supported. The reason is a documented limitation: random writes are not supported on paths under /dbfs. The official Microsoft documentation (Local File API limitations) describes this issue.
So instead of saving directly to /dbfs, write the file to the /databricks/driver/ path and then copy or move it to the required directory.
Modify your code as follows:
import openpyxl
input_workbook1 = openpyxl.load_workbook('/dbfs/FileStore/Test/my_excel.xlsx')
sheet_1 = input_workbook1.active
sheet_1['A2'] = 'A2'
input_workbook1.save('Output.xlsx')
#will be saved to '/databricks/driver/'.
#Use dbutils.fs.ls('/databricks/driver/') to view.
from shutil import move
move('/databricks/driver/Output.xlsx','/dbfs/FileStore/Test/')
# Read the moved file back to confirm that it was written.
wb1 = openpyxl.load_workbook('/dbfs/FileStore/Test/Output.xlsx')
ws1 = wb1.active
for row in ws1.iter_rows():
    print([col.value for col in row])
The above code will successfully move your file to the required path without any errors.
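The same write-locally-then-copy idea can be wrapped in a small helper. This is a sketch under assumptions: save_via_local_disk is a hypothetical name, it uses a temporary directory rather than /databricks/driver/, and it relies on shutil.copyfile writing the target sequentially.

```python
import os
import shutil
import tempfile


def save_via_local_disk(save, destination):
    """Run save(local_path) on local disk, then copy the result to destination.

    `save` is any callable that takes a file path, e.g. a workbook's save method.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        local_path = os.path.join(tmp_dir, os.path.basename(destination))
        save(local_path)  # the writer is free to seek within the local file
        shutil.copyfile(local_path, destination)  # plain sequential copy
    return destination
```

With the workbook from the question, the call would be save_via_local_disk(input_workbook1.save, '/dbfs/FileStore/Test/Output.xlsx').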

Pandas suddenly cannot open Excel file (can't find workbook in OLE2 compound document)

I have a script that reads an xlsx excel file that was working fine until a week ago. The error message is:
xlrd.biffh.XLRDError: Can't find workbook in OLE2 compound document
By debugging the script, I've found the whole stack:
File "C:\MyFolder\MyScript.py", line 42, in PandasReadExcel
ef=pd.read_excel(excfile,sheetname,header,skiprows)
File "C:\Python\Python36\lib\site-packages\pandas\io\excel.py", line 191, in read_excel
io = ExcelFile(io, engine=engine)
File "C:\Python\Python36\lib\site-packages\pandas\io\excel.py", line 249, in __init__
self.book = xlrd.open_workbook(io)
File "C:\Python\Python36\lib\site-packages\xlrd\__init__.py", line 441, in open_workbook
ragged_rows=ragged_rows,
File "C:\Python\Python36\lib\site-packages\xlrd\book.py", line 87, in open_workbook_xls
ragged_rows=ragged_rows,
File "C:\Python\Python36\lib\site-packages\xlrd\book.py", line 595, in biff2_8_load
raise XLRDError("Can't find workbook in OLE2 compound document")
xlrd.biffh.XLRDError: Can't find workbook in OLE2 compound document
By reviewing similar cases here and on GitHub, I've found that this error usually occurs with xlsm files or password-protected files. But the workbook in question is not password protected and is an xlsx file. Unluckily, I don't know the person who changes the file; it is updated regularly by a team that records laboratory analyses, so I have no idea what they changed in it. All I know is that I can open and edit the file with no problem.
Some threads suggest updating the pandas or xlrd version (I am using pandas 0.19.2), which I want to avoid, since the script runs on a remote server and updating could break other scripts that depend on this routine.
Thanks to anybody who has any clue on how to solve this problem.
After months of struggling with this error, I learned that the files in question are being edited with an older version of Microsoft Office (Office 2007, in this case). I then decided to implement a clumsy workaround:
Open the file with a compatible Excel version and save a copy in a different folder; then open the copy with the pandas read_excel function, and it should open normally.
To automate this task I wrote a PowerShell script that just opens the original file and saves the copy. It must be run as often as the data is updated:
$FileName = "\\path\to\the\source\file.xlsx"
$FileNameCopy = "\\path\to\the\copy\file.xlsx"
$xl = New-Object -comobject Excel.Application
# repeat this for every file concerned
$wb = $xl.Workbooks.open("$FileName",3)
$wb.SaveAs($FileNameCopy)
$wb.Close($False)
$xl.Quit()
Now I can have my data loaded normally again.
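The error text itself is a clue: an OLE2 compound document is the container used by legacy .xls files, whereas a regular .xlsx is a ZIP archive. A quick way to see which container a file really uses, regardless of its extension, is to look at its first bytes. This is a diagnostic sketch and not part of the workaround above; sniff_excel_container is a hypothetical helper name.

```python
ZIP_MAGIC = b"PK\x03\x04"  # start of a ZIP archive (normal .xlsx)
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # OLE2 compound document


def sniff_excel_container(path):
    """Return 'zip', 'ole2' or 'unknown' based on the file signature."""
    with open(path, "rb") as handle:
        head = handle.read(8)
    if head.startswith(ZIP_MAGIC):
        return "zip"
    if head == OLE2_MAGIC:
        return "ole2"
    return "unknown"
```

If a file named .xlsx reports 'ole2', the reader is being handed something other than a plain ZIP-based workbook, which would be consistent with the error in the question.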

TypeError when trying to add CSV entry into Spreadsheet using Pandas and XLSXwriter

I am currently trying to create a program that scans a CSV file and searches entries in the file using the BING API, the results are then pasted into a spreadsheet.
Part of this macro involves also pasting onto the spreadsheet what term is being searched, so I am effectively copying an entry from the CSV into a spreadsheet, which sounds pointless but serves a vital role.
My CSV looks like this:
EntryNumber Name Company TitleNumber
123 john hsbc 5555
124 chris ford 6666
125 adam apple 7777
I use Pandas to extract the data from the CSV by iterating through it row by row, using this code:
for index,row in df.iterrows():
    entrynumber = row['EntryNumber']
    name = row['Name']
    company = row['Company']
    title = row['TitleNumber']
Then I try to write one of the variables to a cell in the spreadsheet using xlsxwriter:
worksheet.write(row, col, entrynumber)
However this generates a type error, the traceback is below:
Traceback (most recent call last):
File "CSVtest.py", line 68, in <module>
worksheet.write(row, col, entrynumber)
File "/usr/local/lib/python3.5/site-packages/xlsxwriter/worksheet.py", line 57, in cell_wrapper
int(args[0])
File "/usr/local/lib/python3.5/site-packages/pandas/core/series.py", line 92, in wrapper
"{0}".format(str(converter)))
TypeError: cannot convert the series to <class 'int'>
Exception ignored in: <bound method Workbook.__del__ of <xlsxwriter.workbook.Workbook object at 0x1088118d0>>
Traceback (most recent call last):
File "/usr/local/lib/python3.5/site-packages/xlsxwriter/workbook.py", line 148, in __del__
Exception: Exception caught in workbook destructor. Explicit close() may be required for workbook.
I have no idea why this is happening. I've tried converting the variables to strings, but the error still pops up. Does anyone have any ideas?
Any help is greatly appreciated :) Thanks.
Hey everyone, I figured out a solution in case anyone else makes the same silly mistake I did.
Basically, as I was using XLSXWRITER, I had a variable called 'row' to tell the module where to start writing data to the spreadsheet.
In my haze I completely forgot that I had also used that same name when iterating over the CSV file with PANDAS, using the code:
for index,row in df.iterrows():
The loop rebinds row to a pandas Series on every iteration, so worksheet.write(row, col, entrynumber) received a Series instead of an integer row index, which matches the "cannot convert the series to <class 'int'>" message in the traceback.
Anyway, it's unlikely, but hopefully this can help someone who makes a similar mistake while learning!
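A minimal sketch of the fix, kept free of pandas and xlsxwriter so it runs anywhere: the output row counter and the loop variable get different names. The helper plan_writes and the names out_row and record are made up for illustration; the tuples it returns are the arguments that would be passed to worksheet.write.

```python
def plan_writes(records, field, start_row=0, col=0):
    """Return (row, col, value) triples, one per record, for worksheet.write."""
    writes = []
    out_row = start_row  # spreadsheet row index, always an int
    for record in records:  # 'record' plays the role of the pandas row
        writes.append((out_row, col, record[field]))
        out_row += 1  # advance to the next spreadsheet row
    return writes
```

In the original script the equivalent change would be for index, record in df.iterrows(): followed by worksheet.write(out_row, col, record['EntryNumber']), so that row is never reused for the Series.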

pyExcelerator has problems reading some files

I've got a problem using pyExcelerator when reading some XLS files.
I wrote some Python scripts that use this library to parse XLS files and populate a database with the information.
The templates for the files these scripts parse may vary, and I sometimes reconfigure the script to handle them. With one of the templates I ran into a problem: pyExcelerator just raises an exception:
Traceback (most recent call last):
File "/home/* * */parsexls.py", line 64, in handle_label
parser.parse()
File "/home/* * */parsers.py", line 335, in parse
self.contents = pyExcelerator.parse_xls(self.file_record.file, self.encoding)
File "/usr/local/lib/python2.6/dist-packages/pyExcelerator/ImportXLS.py", line 327, in parse_xls
ole_streams = CompoundDoc.Reader(filename).STREAMS
File "/usr/local/lib/python2.6/dist-packages/pyExcelerator/CompoundDoc.py", line 67, in __init__
self.__build_short_sectors_data()
File "/usr/local/lib/python2.6/dist-packages/pyExcelerator/CompoundDoc.py", line 256, in __build_short_sectors_data
dentry_start_sid, stream_size) = self.dir_entry_list[0]
IndexError: list index out of range
Some of the problem XLS files contained empty sheets, and removing these sheets helped, but many of the files can't be handled even without empty sheets. There's nothing extraordinary in these files, and they contain no formulas or pictures, just strings, numbers and dates.
As far as I can see, pyExcelerator has been abandoned by its author :(
Any suggestions on fixing this issue are much appreciated.
I'm the author of xlrd. It reads XLS files and is not a fork of anything. I maintain a package called xlwt which writes XLS files and is a fork of pyExcelerator. The parse_xls functionality in pyExcelerator was deprecated to the point of removal from xlwt. Use xlrd instead.
Given the traceback that you reproduced, it looks like the file may be corrupted. What it is doing there happens well before the sheet data is parsed. What software produces these files? Can you open them with Excel or OpenOffice.org's Calc or Gnumeric? xlrd may give you a more meaningful error message. You may like to send me (insert_punctuation('sjmachin', 'lexicon', 'net')) copies of your failing file(s); please include some with and some without empty sheets. By the way, what are you using to remove empty sheets? What error message do you get from pyExcelerator when processing files with empty sheets?
You might wish to give xlrd a try... it started (I believe) as a fork of pyExcelerator, so incorporating requires few code changes, but it is actively maintained:
http://pypi.python.org/pypi/xlrd
Project website
General info, release notes and history from the documentation
