Python XLWT attempt to overwrite cell workaround

Python XLWT attempt to overwrite cell workaround - python

Using the python module xlwt, writing to the same cell twice throws an error:
Message File Name Line Position
Traceback
<module> S:\********
write C:\Python26\lib\site-packages\xlwt\Worksheet.py 1003
write C:\Python26\lib\site-packages\xlwt\Row.py 231
insert_cell C:\Python26\lib\site-packages\xlwt\Row.py 150
Exception: Attempt to overwrite cell: sheetname=u'Sheet 1' rowx=1 colx=12
with the code snippet
def insert_cell(self, col_index, cell_obj):
if col_index in self.__cells:
if not self.__parent._cell_overwrite_ok:
msg = "Attempt to overwrite cell: sheetname=%r rowx=%d colx=%d" \
% (self.__parent.name, self.__idx, col_index)
raise Exception(msg) #row 150
prev_cell_obj = self.__cells[col_index]
sst_idx = getattr(prev_cell_obj, 'sst_idx', None)
if sst_idx is not None:
self.__parent_wb.del_str(sst_idx)
self.__cells[col_index] = cell_obj
Looks like the code 'raise'es an exception which halts the entire process. Is removing the 'raise' term enough to allow for overwriting cells? I appreciate xlwt's warning, but i thought the pythonic way is to assume "we know what we're doing". I don't want to break anything else in touching the module.

The problem is that overwriting of worksheet data is disabled by default in xlwt. You have to allow it explicitly, like so:
worksheet = workbook.add_sheet("Sheet 1", cell_overwrite_ok=True)

What Ned B. has written is valuable advice -- except for the fact that as xlwt is a fork of pyExcelerator, "author of the module" is ill-defined ;-)
... and Kaloyan Todorov has hit the nail on the head.
Here's some more advice:
(1) Notice the following line in the code that you quoted:
if not self.__parent._cell_overwrite_ok:
and search the code for _cell_overwrite_ok and you should come to Kaloyan's conclusion.
(2) Ask questions on (and search the archives of) the python-excel google-group
(3) Check out this site which gives pointers to the google-group and to a tutorial.
Background: the problem was that some people didn't know what they were doing (and in at least one case were glad to be told), and the behaviour that xlwt inherited from pyExcelerator was to blindly write two (or more) records for the same cell, which led not only to file bloat but also confusion, because Excel would complain and show the first written and OpenOffice and Gnumeric would silently show the last written. Removing all trace of the old data from the shared string table so that it wouldn't waste space or (worse) be visible in the file was a PITA.
The whole saga is recorded in the google-group. The tutorial includes a section on overwriting cells.

If you:
don't want to set the entire worksheet to be able to be overwritten in the constructor, and
still catch the exception on a case-by-case basis
...try this:
try:
worksheet.write(row, col, "text")
except:
worksheet._cell_overwrite_ok = True
# do any required operations since we found a duplicate
worksheet.write(row, col, "new text")
worksheet._cell_overwrite_ok = False

You should get in touch with the author of the module. Simply removing a raise is unlikely to work well. I would guess that it would lead to other problems further down the line. For example, later code may assume that any given cell is only in the intermediate representation once.

Related

workbook save failing, not sure why

I apologize for the length of this. I am a relative Neophyte to Excel VBA and even more junior with Python. I have run into an issue with an error that occasionally occurs in python using OpenPyXl (just trying that for the first time).
Background: I have a series of python scripts (12) running and querying an API to gather data and populate 12 different, though similar, workbooks. Separately, I have a equal number of Excel instances periodically looking for that data and doing near-real-time analysis and reporting. Another python script looks for key information to be reported from the spreadsheets and will text it to me when identified. The problem seems to occur between the data gathering python scripts and a copy command in the data analysis workbooks.
The way the python data gathering scripts "talk" to the analysis workbooks is via the sheets they build in their workbooks. The existing vba in the analysis workbooks will copy the data workbooks to another directory (so that they can be opened and manipulated without impacting their use by the python scripts) and then interpret and copy the data into the Excel analysis workbook. Although I recently tested a method to read the data directly from those python-created workbooks without opening them, the vba will require some major surgery to convert to that method and is likely not going to happen soon.
TL,DR: There are data workbooks and analysis workbooks. Python builds the data workbooks and the analysis workbooks use VBA to copy the data workbooks to another directory and load specific data from the copied data workbooks. There is a one-to-one correspondence between the data and analysis workbooks.
Based on the above, I believe that the only "interference" that occurs with the data workbooks is when the macro in the analysis workbook copies the workbook. I thought this would be a relatively safe level of interference, but it apparently is not.
The copy is done in VBA with this set of commands (the actual VBA sub is about 500 lines):
fso.CopyFile strFromFilePath, strFilePath, True
where fso is set thusly:
Set fso = CreateObject("Scripting.FileSystemObject")
and the strFromFilePath and strFilePath both include a fully qualified file name (with their respective paths). This has not generated any errors on the VBA side.
The data is copied about once a minute (though it varies from 40 seconds to about 5 minutes) and seems to work fine from a VBA perspective.
What fails is the python side about 1% of the time (which is probably 12 or fewer times daily. While that seems small, the associated data capture process halts until I notice and restart it. This means anywhere from 1 to all 12 of the data capture processes will fail at some point each day.
Here is what a failure looks like:
Traceback (most recent call last):
File "<pyshell#2>", line 1, in <module>
monitor('DLD',1,13,0)
File "<string>", line 794, in monitor
File "C:\Users\abcd\AppData\Local\Programs\Python\Python39\lib\site-packages\openpyxl\workbook\workbook.py", line 407, in save
save_workbook(self, filename)
File "C:\Users\abcd\AppData\Local\Programs\Python\Python39\lib\site-packages\openpyxl\writer\excel.py", line 291, in save_workbook
archive = ZipFile(filename, 'w', ZIP_DEFLATED, allowZip64=True)
File "C:\Users\abcd\AppData\Local\Programs\Python\Python39\lib\zipfile.py", line 1239, in __init__
self.fp = io.open(file, filemode)
PermissionError: [Errno 13] Permission denied: 'DLD20210819.xlsx'
and I believe it occurs as a result of the following lines of python code (which comes after a while statement with various if conditions to populate the worksheets). The python script itself is about 200 lines long:
time.sleep(1) # no idea why wb.save sometimes fails; trying a delay
wb.save(FileName)
Notice, I left in one of the attempts to correct this. I have tried waiting as much as 3 seconds with no noticeable difference.
I admit I have no idea how to detect errors thrown by OpenPyXl and am quite unskilled at python error handling, but I had tried this code yesterday:
retries = 1
success = False
while not success and retries < 3:
try:
wb.save
success = True
except PermissionError as saveerror:
print ('>>> Save Error: ',saveerror)
wait = 3
print('=== Waiting %s secs and re-trying... ===' % wait)
#sys.stdout.flush()
time.sleep(wait)
retries += 1
My review of the output tells me that the except code never executed while testing the data capture routine over 3000 times. However, the "save" also never happened so the analysis spreadsheets did not receive any information until later when the python code saved the workbook and closed it.
I also tried adding a wb.close after setting the success variable to true, but got the same results.
I am considering either rewriting the VBA to try to grab the data directly from the unopened data workbooks without first copying them (which actually sounds more dangerous) or using an external synching tool to copy them outside of VBA (which could potentially cause exactly the same problem).
Does anyone have an idea of what may be happening and how to address it? It works nearly all the time but just fails several times a day.
Can someone help me to better understand how to trap the error thrown by OpenPyXl so that I can have it retry rather than just abending?
Any suggestions are appreciated. Thank you for reading.

Not sure if this is the best way, but the comment from simpleApp gave me an idea that I may want to use a technique I used elsewhere in the VBA. Since I am new to these tools, perhaps someone can suggest a cleaner approach, but I am going to try using a semaphore file to signal when I am copying the file to alert the python script that it should avoid saving.
In the below I am separating out the directory the prefix and the suffix. The prefix would be different for each of the 12 or more instances I am running and I have not figured out where I want to put these files nor what suffix I should use, so I made them variables.
For example, in the VBA I will have something like this to create a file saying currently available:
Dim strSemaphoreFolder As String
Dim strFilePrefix As String
Dim strFileDeletePath As String
Dim strFileInUseName As String
Dim strFileAvailableName As String
Dim strSemaphoreFileSuffix As String
Dim fso As Scripting.FileSystemObject
Dim fileTemp As TextStream
Set fso = CreateObject("Scripting.FileSystemObject")
strSemaphoreFileSuffix = ".txt"
strSemaphoreFolder = "c:\temp\monitor\"
strFilePrefix = "RJD"
strFileDeletePath = strSemaphoreFolder & strFilePrefix & "*" & strSemaphoreFileSuffix
' Clean up remnants from prior activities
If Len(Dir(strFileDeletePath)) > 0 Then
Kill strFileDeletePath
End If
' files should be gone
' Set the In-use and Available Names
strFileInUseName = strFilePrefix & "InUse" & strSemaphoreFileSuffix
strFileAvailableName = strFilePrefix & "Available" & strSemaphoreFileSuffix
' Create an available file
Set fileTemp = fso.CreateTextFile(strSemaphoreFolder & strFileAvailableName, True)
fileTemp.Close
' available file should be there
Then, when I am about to copy the file, I will briefly change the filename to indicate that the file is in use, perform the potentially problematic copy and then change it back with something like this:
' Temporarily name the semaphore file to "In Use"
Name strSemaphoreFolder & strFileAvailableName As strSemaphoreFolder & strFileInUseName
fso.CopyFile strFromFilePath, strFilePath, True
' After copying the file name it back to "Available"
Name strSemaphoreFolder & strFileInUseName As strSemaphoreFolder & strFileAvailableName
Over in the Python script, before I do the wb.save command, I will insert a check to see whether the file indicates that it is available or in use with something like this:
prefix = 'RJD'
directory = 'c:\\temp\\monitor\\'
suffix = '.txt'
filepathname = directory + prefix + 'Available' + suffix
while not (os.path.isfile(directory + prefix + 'Available' + suffix)):
time.sleep(1)
wb.save
Does this seem like it would work?
I am thinking that it should avoid the failure if I have properly identified it as an attempt to save the file in the Python script while the VBA script is telling the operating system to copy it.
Thoughts?
afterthoughts:
Using the technique I described, I probably need to create the "Available" semaphore file in the Python script and simply assume it will be there in the VBA script since the Python script is collecting the data and may be doing so before the VBA is even started.
A better alternative may be to simply check for the existence of the "In Use" file which will never be there unless the VBA wants it there, like this:
while (os.path.isfile(directory + prefix + 'InUse' + suffix)):
time.sleep(1)
wb.save

Removing excel sheet with openpyxl raise error when I open the file

I tested the openpyxl .remove() function and it's working on multiple empty file.
Problem: I have a more complex Excel file with multiple sheet that I need to remove. If I remove one or two it works, when I try to remove three or more, Excel raise an error when I open the file.
Sorry, we have troubles getting info in file bla bla.....
logs talking about pictures troubles
logs about error105960_01.xml ?
The strange thing is that it's talking about pictures trouble but I don't have this error if I don't remove 3 or more sheet. And I don't try to remove sheet with images !
Even more strange, It's always about the number, every file can be deleted without trouble but if I remove 3 or more, Excel yell at me.
The thing is that, it's ok when Excel "repair" the "error" but sometimes, excel reinitialize the format of the sheets (size of cell, bold and length of the characters, etc...) and everything fail :(
bad visual that I want to avoid
If someone have an idea, i'm running out of creativity !
For the code, I only use basic functions (simplify here but it would be long to present more...).
INPUT_EXCEL_PATH = "my_excel.xlsx"
OUTPUT_EXCEL_PATH = "new_excel.xlsx"
wb = openpyxl.load_workbook(INPUT_EXCEL_PATH)
ws = wb["sheet1"]
wb.remove(ws)
ws = wb["sheet2"]
wb.remove(ws)
ws = wb["sheet3"]
wb.remove(ws)
wb.save(OUTPUT_EXCEL_PATH)

In my case it was some left over empty CalculationChainPart. I used DocxToSource to investigate the corrupted file. Excel will attempt to fix the file on load. Save this file and compare it's structure to the original file. To delete descendant parts you can use the DeletePart() method.
using (SpreadsheetDocument doc = SpreadsheetDocument .Open(document, true)) {
MainDocumentPart mainPart = doc.MainDocumentPart;
if (mainPart.DocumentSettingsPart != null) {
mainPart.DeletePart(mainPart.DocumentSettingsPart);
}
}
CalculationChainPart can be also removed anytime.
While calculation chain information can be loaded by a spreadsheet application, it is not required. A calculation chain can be constructed in memory at load-time (source)

Constant first row of a .csv file?

I have a Python code which is logging some data into a .csv file.
logging_file = 'test.csv'
dt = datetime.datetime.now()
f = open(logging_file, 'a')
f.write('\n "{:%H:%M:%S}",{},{}'.format(dt,x,y,))
The above code is the core part and this produces continuous data in .csv file as
"00:34:09" ,23.05,23.05
"00:36:09" ,24.05,24.05
"00:38:09" ,26.05,26.05
... etc.,
Now I wish to add the following lines in first row of this data. time, data1,data2.I expect output as
time, data1, data2
"00:34:09" ,23.05,23.05
"00:36:09" ,24.05,24.05
"00:38:09" ,26.05,26.05
... etc.,
I tried many ways. Those ways not produced me the result as preferred format.But I am unable to get my expected result.
Please help me to solve the problem.

I would recommend writing a class specifically for creating and managing logs.Have it initialize a file, on creation, with the expected first line (don't forget a \n character!), and keep track of any necessary information about that log(the name of the log it created, where it is, etc). You can then have the class 'write' to the log (append the log, really), you can create new logs as necessary, and, you can have it check for existing logs, and make decisions about either updating what is existing, or scrapping it and starting over.

Python: Iteratively Writing to Excel Files

Python 2.7. I'm using xlsxwriter.
Let's say I have myDict = {1: 'One', 2: 'Two', 3: 'Three'}
I need to perform some transformation on the value and write the result to a spreadsheet.
So I write a function to create a new file and put some headers in there and do formatting, but don't close it so I can write further with my next function.
Then I write another function for transforming my dict values and writing them to the worksheet.
I'm a noob when it comes to classes so please forgive me if this looks silly.
import xlsxwriter
class ReadWriteSpreadsheet(object):
def __init__(self, outputFile=None, writeWorkbook=None, writeWorksheet=None):
self.outputFile = outputFile
self.writeWorksheet = writeWorksheet
self.writeWorkbook = writeWorkbook
# This function works fine
def setup_new_spreadsheet(self):
self.writeWorkbook = xlsxwriter.Workbook(self.outputFile)
self.writeWorksheet = self.writeWorkbook.add_worksheet('My Worksheet')
self.writeWorksheet.write('A1', 'TEST')
# This one does not
def write_data(self):
# Forget iterating through the dict for now
self.writeWorksheet.write('A5', myDict[1])
x = ReadWriteSpreadsheet(outputFile='test.xlsx')
x.setup_new_spreadsheet()
x.write_data()
I get:
Exception Exception: Exception('Exception caught in workbook destructor. Explicit close() may be required for workbook.',) in <bound method Workbook.__del__ of <xlsxwriter.workbook.Workbook object at 0x00000000023FDF28>> ignored
The docs say this error is due to not closing the workbook, but if I close it then I can't write to it further...
How do I structure this class so that the workbook and worksheet from setup_new_spreadsheet() is able to be written to by write_data()?

The exception mentioned in your question is triggered when python realises you will not need to use your Workbook any more in the rest of your code and therefore decides to delete it from his memory (garbage collection). When doing so, it will realise you haven't closed your workbook yet and so will not have persisted your excel spreadsheet at all on the disk (only happen on close I assume) and will raise that exception.
If you had another method close on your class that did: self.writeWorkbook.close() and made sure to call it last you would not have that error.

When you do ReadWriteSpreadsheet() you get a new instance of the class you've defined. That new instance doesn't have any knowledge of any workbooks that were set up in a different instance.
It looks like what you want to do is get a single instance, and then issue the methods on that one instance:
x = ReadWriteSpreadsheet(outputFile='test.xlsx')
x.setup_new_spreadsheet()
x.write_data()
To address your new concern:
The docs say this error is due to not closing the workbook, but if I close it then I can't write to it further...
Yes, that's true, you can't write to it further. That is one of the fundamental properties of Excel files. At the level we're working with here, there's no such thing as "appending" or "updating" an Excel file. Even the Excel program itself cannot do it. You only have two viable approaches:
Keep all data in memory and only commit to disk at the very end.
Reopen the file, reading the data into memory; modify the in-memory data; and write all the in-memory data back out to a new disk file (which can have the same name as the original if you want to overwrite).
The second approach requires using a package that can read Excel files. The main choices there are xlrd and OpenPyXL. The latter will handle both reading and writing, so if you use that one, you don't need XlsxWriter.

Found determinated string and delete cell

I created this topic ( Delete lines found in file with many lines ) and they suggested using the package "xlrd". I used and got interact with the file, but could not compare the contents of the cell with some string.
Here's my code:
import xlrd
arquivo = xlrd.open_workbook('/media/IRREMOVIVEL/arquivo.xls',)
planilha = arquivo.sheet_by_index(0)
def lerPlanilha():
for i in range(planilha.ncols):
if (planilha.cell(8,9) == "2010"):
print 'it works =>'
break
else:
print 'not works'
break
lerPlanilha()
But I got error: not works
Sorry for duplicate, maybe, and bad english.

xlrd.sheet.Sheet.cell method returns xlrd.sheet.Cell instance, which represents cell with value stored in it's value attribute. So something like sheet.cell(x,y).value should work.
And about deleting - you can't modify document using xlrd, you should use xlwt for writing excel, and xlutils for reading, modifying and writing document. A little intense in googling gives you something like http://www.python-excel.org/

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.