I've got an .xlsx file. Some cells in it have comments whose content will be used later. How can I check, while iterating through every cell, whether a cell has a comment or not?
This code (in which I tried to iterate over the third column and nothing else) returns an error:
import win32com.client, win32gui, re
xl = win32com.client.Dispatch("Excel.Application")
xl.Visible = 1
TempExchFilePath = win32gui.GetOpenFileNameW()[0]
wb = xl.Workbooks.Open(TempExchFilePath)
sh = wb.Sheets("Sheet1")
comments = None
for i in range (0,201,1):
    if sh.Cells(2,i).Comment.Text() != None:
        comment = sh.Cells(2,i).Comment.Text()
        comments += comment
print(comments)
input()
I am very new to Python and sorry for my English.
Thanks! :3
Here is what I think is the best way, using the Python Excel modules, specifically xlrd.
Suppose you have a workbook with a cell A1 that has a comment written by Joe Schmo which says "Hi!". Here's how you'd get at it:
>>> from xlrd import *
>>> wb = open_workbook("test.xls")
>>> sheet = wb.sheet_by_index(0)
>>> notes = sheet.cell_note_map
>>> print notes
{(0, 0): <xlrd.sheet.Note object at 0x00000000033FE9E8>}
>>> notes[0,0].text
u'Schmo, Joe:\nHi!'
A Quick Explanation of What's Going On
The xlrd module is pretty handy once you figure it out (see the xlrd documentation). The first two lines import the module and create a workbook object called wb. Next, we get a sheet object for the first sheet (index 0) and call it sheet (I'm feeling creative today). Then we create a dictionary of note objects called notes from the cell_note_map attribute of our sheet object. This dictionary has the (row, col) index of each comment as the key and a note object as the value. We can then extract the text of a note using the text attribute of the note object.
For multiple notes, you can iterate through the dictionary to get at all the text, as shown below:
>>> comments = []
>>> for key in notes.keys():
...     comments.append(notes[key].text)
...
>>> print comments
[u"Schmo, Joe:\nHere's another\n", u'Schmo, Joe:\nhi!']
Some Things to Note
This will only work with .xls files, not .xlsx, but you can save any .xlsx as an .xls, so that shouldn't be a problem.
The author of the comment will always be listed first in the text, but can also be accessed separately by using the author attribute instead of text. There will always be a \n between the author and the text.
Cells which do not have comments will not be mapped by cell_note_map, so a sheet without any comments will yield an empty dictionary.
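If converting to .xls is not an option, openpyxl (a different library from the one used above) can read .xlsx directly, and as far as I know each of its cells has a comment attribute that is None when the cell has no comment. Here is a minimal sketch under that assumption. collect_comments is a made-up helper name, and it only relies on cells having coordinate and comment attributes, so it can be tried without a real workbook:

```python
from types import SimpleNamespace


def collect_comments(rows):
    """Return {coordinate: comment text} for every cell that has a comment.

    `rows` is any iterable of rows of cell-like objects, for example
    openpyxl's `sheet.iter_rows()`. Each cell is assumed to have
    `.coordinate` and `.comment` (None when there is no comment).
    """
    found = {}
    for row in rows:
        for cell in row:
            if cell.comment is not None:
                found[cell.coordinate] = cell.comment.text
    return found


# Stand-in cells (hypothetical data) so the sketch runs without a workbook.
demo_rows = [
    [SimpleNamespace(coordinate="A1", comment=SimpleNamespace(text="Hi!")),
     SimpleNamespace(coordinate="B1", comment=None)],
]
print(collect_comments(demo_rows))
```

With a real file you would pass something like load_workbook("test.xlsx")["Sheet1"].iter_rows() instead of demo_rows.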
I think defining comments as None and then trying to add something to it (I guess a string) won't work.
Try comments = "" instead of comments = None.
Other than that, it would definitely help to see the error.
I think this should work. However, you have
comments = None
and then
comments += comment
I don't think you can do None + anything. Most likely, you either want to do
comments = ''
comments += comment
or
comments = []
comments.append(comment)
Another thing you probably need to fix:
    if sh.Cells(2,i).Comment.Text() != None:
Cells(2,i) means row 2, column i, and Excel's indices start at 1, so the first pass with i = 0 fails; start the range at 1 (and for the third column you would want Cells(i,3)). Also, if Comment doesn't exist, then it will be None and won't have a Text() function, so test the Comment itself first, i.e.:
    if sh.Cells(2,i).Comment is not None:
        comment = sh.Cells(2,i).Comment.Text()
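For what it's worth, here is a rough sketch of the whole loop with the None check in place. It assumes the Cells(row, col) call form, which pywin32 accepts for Excel worksheets, and 1-based indices. row_comments and FakeSheet are names I made up; FakeSheet only exists so the logic can be run without Excel:

```python
from types import SimpleNamespace


def row_comments(sheet, row, first_col, last_col):
    """Collect the comment texts found in one row of an Excel sheet.

    `sheet` is expected to behave like a win32com worksheet:
    `sheet.Cells(row, col)` returns a cell whose `.Comment` is None
    when there is no comment. Columns are 1-based, as in Excel.
    """
    texts = []
    for col in range(first_col, last_col + 1):
        comment = sheet.Cells(row, col).Comment
        if comment is not None:          # check before calling Text()
            texts.append(comment.Text())
    return texts


class FakeSheet:
    """Tiny stand-in for a worksheet so the sketch runs without Excel."""

    def __init__(self, notes):
        self._notes = notes              # {(row, col): comment text}

    def Cells(self, row, col):
        text = self._notes.get((row, col))
        if text is None:
            return SimpleNamespace(Comment=None)
        return SimpleNamespace(Comment=SimpleNamespace(Text=lambda: text))


print(row_comments(FakeSheet({(2, 3): "check this"}), row=2, first_col=1, last_col=5))
```

With the real workbook you would pass sh in place of the FakeSheet instance and join the returned list if you want a single string.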
Related
I'm looking to import a row of strings from an xlsx document and iterate over them in Python so they can be used as a variable elsewhere. For now I'm simply trying to have them printed out as proof the code works.
For some reason, the code runs without any errors but does not print anything out.
From what I've been reading in openpyxl's documentation this should be working fine, and I can't figure out what the problem is.
I'm assuming it's something to do with my if statement but as far as I can tell everything checks out.
Perhaps there's an issue identifying the column?
For clarification, the column I'm trying to access is 'B'. The first cell of B is the header, and the cells from row 2 to the end of the sheet are the data.
My code:
from openpyxl import load_workbook
path = "/Users/xxx/Desktop/alpha list test.xlsx"
book = load_workbook(path)
sheet = book['Sheet1']
column_name = 'username'
for column_cell in sheet.iter_cols(1, sheet.max_column):
    if column_cell[0] == column_name:
        B = 0
        for data in column_cell[1:]:
            htag = data.value
            print(htag)
Which results in:
Process finished with exit code 0
The /xxx/ in the path is to hide personal information.
column_cell is a tuple of cell objects, so column_cell[0] is the first cell in the tuple, and a cell object is never going to equal a string value:
if column_cell[0] == column_name:
Try using the value attribute; otherwise it will not match your header and will never reach the print part of the code.
if column_cell[0].value == column_name:
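To make the difference concrete, here is a small sketch of the same lookup written as a function. values_under_header and _col are hypothetical names, and the demo columns are stand-ins with only a value attribute, which is all the comparison needs:

```python
from types import SimpleNamespace


def values_under_header(columns, header):
    """Return the values below the first column whose top cell equals `header`.

    `columns` is an iterable of tuples of cell-like objects with a
    `.value` attribute, e.g. what openpyxl's `sheet.iter_cols()` yields.
    Returns None if no column has that header.
    """
    for column in columns:
        if column[0].value == header:    # compare the value, not the cell
            return [cell.value for cell in column[1:]]
    return None


def _col(*values):
    """Build a fake column (tuple of cell-like objects) for the demo."""
    return tuple(SimpleNamespace(value=v) for v in values)


demo = [_col("id", 1, 2), _col("username", "alice", "bob")]
print(values_under_header(demo, "username"))
```

With the real sheet, the call would be values_under_header(sheet.iter_cols(1, sheet.max_column), column_name).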
I would like to replace a table in a docx file with a table from another docx file.
While searching, I found that the method below can delete a whole table (or paragraph) easily.
import docx
doc = docx.Document('test.docx')
def delete_paragraph(paragraph):
    p = paragraph._element
    p.getparent().remove(p)
    p._p = p._element = None
delete_paragraph(doc.tables[3])
So I guess it's possible to replace a table, too.
I tried the code below:
doc = docx.Document('test.docx')
stand = docx.Document('stand.docx')
def replace_paragraph(p1, p2):
    p1.element = p2._element
replace_paragraph(doc.tables[3],stand.tables[0])
But it didn't work. How can I do this?
===UPDATE===
I found another method, shown below.
from copy import deepcopy
def copy_table_after(table, paragraph):
    tbl, p = table._tbl, paragraph._p
    new_tbl = deepcopy(tbl)
    p.addnext(new_tbl)
First use delete_paragraph to delete the old table, then use copy_table_after to copy in the new table.
However, this way you have to know which paragraph the old table came after.
If someone knows a better way, please tell me. Thank you.
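One way to avoid tracking the paragraph might be to insert the copy right next to the old table element before removing it. This is only a sketch: replace_table is a name I made up, it leans on the same private _tbl attribute as the snippet above, and it assumes the underlying lxml element supports addprevious() and getparent():

```python
from copy import deepcopy


def replace_table(old_table, new_table):
    """Put a copy of `new_table` where `old_table` is, then drop the old one.

    Both arguments are assumed to be python-docx Table objects whose
    private `_tbl` attribute is the underlying lxml element.
    """
    old_tbl = old_table._tbl
    new_tbl = deepcopy(new_table._tbl)   # copy, so the source document is untouched
    old_tbl.addprevious(new_tbl)         # insert the copy right before the old table
    old_tbl.getparent().remove(old_tbl)  # then remove the old table element
    return new_tbl
```

Usage would be replace_table(doc.tables[3], stand.tables[0]) followed by doc.save(...). Styles that exist only in the other document probably won't come along with the copied table.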
I really am very new to this so please be gentle. I've been looking for a couple of hours at how to sort this out. Essentially I am trying to open a word document, find the "X" character in a very simple table I have put in, then update it to whatever the user inputs. The last thing I did here was make this a function and call it, to see if I could get round some issues I thought I was having with it correctly capturing the user's input. It looks like the below in IDLE. I'm trying to get X replaced by Cabbage, so this is what the below shows. The issue is that after I run this I open the word document (for the Nth time now) and it just is not updating to say "Cabbage". What might I be doing wrong here? I am not getting any error messages to go on. I've tried this without the function and function call, but it isn't having it:
>>> import os
>>> from docx import Document
>>> import docx
>>> doc=Document('Temp.docx')
>>> def tupdate(rep):
        for table in doc.tables:
            for col in table.columns:
                for cell in col.cells:
                    for p in cell.paragraphs:
                        if 'X' in p.text:
                            p.text.replace("X", rep)
>>> rep = input()
Cabbage
>>> tupdate(rep)
>>> doc.save('Temp.docx')
Any help would be appreciated. I am using the latest version of python on windows.
Thank you.
p.text.replace("X", rep) does not do an in-place substitution; it returns a new string, which has to be assigned back to p.text.
I've tested the code below and I was able to replace Xs with Zs.
import os
from docx import Document
doc = Document('Temp.docx')
rep = 'Z' # input()
for table in doc.tables:
    for col in table.columns:
        for cell in col.cells:
            for p in cell.paragraphs:
                if 'X' in p.text:
                    p.text = p.text.replace("X", rep)
doc.save('Temp.docx')
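The same loop can be wrapped in a function that reports how many paragraphs it changed, which makes it easier to tell whether anything matched at all. This is a sketch, not part of python-docx: replace_in_tables is a made-up name, and the demo document is a stand-in built from plain objects so it runs without a .docx file:

```python
from types import SimpleNamespace


def replace_in_tables(doc, old, new):
    """Replace `old` with `new` in every table paragraph; return the hit count.

    `doc` is assumed to look like a python-docx Document: `doc.tables`,
    `table.columns`, `column.cells`, `cell.paragraphs` and a writable
    `paragraph.text`.
    """
    changed = 0
    for table in doc.tables:
        for column in table.columns:
            for cell in column.cells:
                for paragraph in cell.paragraphs:
                    if old in paragraph.text:
                        # str.replace returns a new string; assign it back
                        paragraph.text = paragraph.text.replace(old, new)
                        changed += 1
    return changed


# Stand-in document (hypothetical data): one table, one column, two cells.
p1, p2 = SimpleNamespace(text="X marks"), SimpleNamespace(text="nothing here")
cells = [SimpleNamespace(paragraphs=[p1]), SimpleNamespace(paragraphs=[p2])]
demo_doc = SimpleNamespace(
    tables=[SimpleNamespace(columns=[SimpleNamespace(cells=cells)])]
)
print(replace_in_tables(demo_doc, "X", "Cabbage"), p1.text)
```

One caveat with a real document: assigning to paragraph.text replaces the paragraph's runs with a single run, so character formatting inside that paragraph is lost.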
I have old versions of a few Word documents (with the '.doc' extension), all of which have a lot of tracked changes in them. Most of the changes have comments associated with them.
I need to figure out a way to use Python to reject all the changes that have been made in the documents, while retaining the comments.
I tried this with the newer Word format ('.docx' files) and faced no issues. All the changes were rejected and the Word document still had all the comments in it. But when I tried to do it with the older '.doc' files, all my comments got deleted.
At first I was using the following function with a few different versions of the Word file.
def reject_changes(path):
    doc = word.Documents.Open(path)
    doc.Activate()
    word.ActiveDocument.TrackRevisions = False
    word.ActiveDocument.Revisions.RejectAll()
    word.ActiveDocument.Save()
    doc.Close(False)
I tried to use the above function with the original word document
I changed the extension of the file to '.docx' and tried the above function
I made a copy of the document and saved it in '.docx' format.
In all these cases the comments were deleted.
I then tried the following code:
def reject_changes(path):
    doc = word.Documents.Open(path)
    doc.Activate()
    word.ActiveDocument.TrackRevisions = False
    nextRev = word.Selection.NextRevision()
    while nextRev:
        nextRev.Reject()
        nextRev = word.Selection.NextRevision()
    word.ActiveDocument.Save()
    doc.Close(False)
For some reason this code was almost working. But on checking a few of the documents again, I found that while most of the comments remained, a couple of them were still deleted.
I think that since the comments are being deleted, they are probably part of Revisions. In that case, is it possible to check whether a revision is a comment or not? If not, can someone please suggest a way to ensure that no comments are deleted from the document when rejecting the changes?
Edit:
So, I found out that the comments that were getting deleted had been added to the document while the 'Track Changes' option was active. I guess that made those comments part of the revisions. So my first function works pretty well when the comments were made while the 'Track Changes' option was not active.
But then, I have more than twenty Word documents (a mix of doc and docx files), each of which has at least fifteen pages and over fifty comments.
I am using win32com.client. I am not too familiar with other packages that work with MS word. Any help would be appreciated.
Thanks!
Okay, so I was able to get a workaround for this by:
1. Creating a selection object and selecting the scope of the text marked by the comment.
2. Saving the range of the commented text into a range object.
3. Rejecting the tracked changes for the selected text.
4. Getting the new text based on the range object that was created in step 2.
This method takes a lot of time, though, and the easiest way to extract the marked text is to make sure that comments are added while Word is not tracking changes.
This is the code I am using now.
import os
import win32com.client as win32

def reject_changes(path, doc_names):
    word = win32.gencache.EnsureDispatch('Word.Application')
    rejected_changes = []
    for doc in doc_names:
        # open the Word document
        # assumption: rejected_doc was not defined in the original snippet;
        # here it is taken to be the file doc inside the folder path
        rejected_doc = os.path.join(path, doc)
        wb = word.Documents.Open(rejected_doc)
        wb.Activate()
        current_doc = word.ActiveDocument
        current_doc.TrackRevisions = False
        text = ''
        # iterate over the comments
        for c in current_doc.Comments:
            sentence_range = c.Scope  # returns a range object of the text marked by the comment
            select_sentence = sentence_range.Select()  # select the sentence marked by sentence_range
            nextRev = word.Selection.NextRevision()  # checks for the next revision in Word
            while nextRev:
                # if the next revision is not within sentence_range then stop
                if nextRev.Range.Start < sentence_range.Start or nextRev.Range.End > sentence_range.End:
                    break
                else:
                    nextRev.Reject()
                    new_range = current_doc.Range(sentence_range.Start, sentence_range.End)
                    text = new_range.Text
                    nextRev = word.Selection.NextRevision()
            author = c.Author
            rejected_changes.append((doc,author,text,path))
        current_doc.Save()
        wb.Close(False)
    return rejected_changes
I am trying to parse some unicode text from an Excel 2007 cell read by using xlrd (actually xlsxrd).
For some reason xlrd attaches "text: " to the beginning of the unicode string, which is making it difficult for me to typecast. I eventually want to reverse the order of the string, since it is a name and will be put in alphabetical order with several others. Any help would be greatly appreciated, thanks.
Here is a simple example of what I'm trying to do:
>>> import xlrd, xlsxrd
>>> book = xlsxrd.open_workbook('C:\\fileDir\\fileName.xlsx')
>>> book.sheet_names()
[u'Sheet1', u'Sheet2']
>>> sh = book.sheet_by_index(1)
>>> print sh
<xlrd.sheet.Sheet object at 0x(hexaddress)>
>>> name = sh.cell(0, 0)
>>> print name
text: u'First Last'
From here I would like to parse "name", exchanging 'First' with 'Last' or just separating the two for storage in two different variables, but every attempt I have made to typecast the unicode gives an error. Perhaps I am going about it the wrong way?
Thanks in advance!
I think you may need
name = sh.cell(0,0).value
to get the unicode object. Then, to split into two variables, you can obtain a list with the first and last name, using an empty space as separator:
split_name = name.split(' ')
print split_name
This gives [u'First', u'Last']. You can easily reverse the list in place (reverse() returns None, so don't assign its result back to the variable):
split_name.reverse()
print split_name
giving [u'Last', u'First'].
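Since the goal is to sort names alphabetically, a small helper that turns each name into "Last, First" form might be handier than reversing a list. This is a sketch in Python 3 syntax; last_first is a made-up name and the sample names are invented:

```python
def last_first(full_name):
    """Turn 'First Last' into 'Last, First' (single-word names are returned as-is)."""
    parts = full_name.split()
    if len(parts) < 2:
        return full_name
    # Treat the final word as the last name and everything before it as the first name(s).
    return "{}, {}".format(parts[-1], " ".join(parts[:-1]))


names = ["First Last", "Ada Lovelace", "Alan Turing"]
print(sorted(last_first(n) for n in names))
```

Treating the final word as the last name is an assumption that will not hold for every name, so adjust the split if your data has multi-word surnames.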
Read about the Cell class in the xlrd documentation. Work through the tutorial that you can get via www.python-excel.org.