Python Getting cursor position in WordApplication with win32 - python

I would like to add some text after the current position of the cursor in an active word document.
The code I need would have to be something like this:
import win32com.client as win32
word = win32.gencache.EnsureDispatch('Word.Application')
def InsertTextAfterCursor(text,document):
document.Range(0,document.GETCURSORPOSITION).InsertAfter(text) #GETCURSORPOSITION is what I don't know how to implement
word.Documents().Open("TheDocI'mWorkingOn.docx")
doc = word.Documents().Item("TheDocI'mWorkingOn.docx")
InsertTextAfterCursor(doc,'The text I want to insert')
The only idea I've come up with is to constantly check for changes of the entire Range of the document, so I can get where I'm writing, but:
this is useless if I just click anywhere in the document without changing anything.
this becomes expensive and slow if I'm dealing with a, say, 200 pages document.
I've been reading the official documentation for a while now but I can't figure this out. Would appreciate any indication

Related

Is there a way to programmatically reject changes to a word document using python, while not deleting comments from it?

I have old version of a few word documents (word document with '.doc' extension) all of which have a lot of tracked changes in them. Most of the changes have comments associated with them.
I need to figure out a way to use python to reject all the changes that have been made in the documents, while retaining the comments.
I tried this with the new versions of word document('.docx' files) and faced no issues. All the changes were rejected and the word document still had all the comments in it. But when I tried to do it with the older versions of word document, all my comments got deleted.
I was using the following function at first with few different versions of the word file.
def reject_changes(path):
doc = word.Documents.Open(path)
doc.Activate()
word.ActiveDocument.TrackRevisions = False
word.ActiveDocument.Revisions.RejectAll()
word.ActiveDocument.Save()
doc.Close(False)
I tried to use the above function with the original word document
I changed the extension of the file to '.docx' and tried the above function
I made a copy of the document and saved it in '.docx' format.
In all these cases the comments were deleted.
I then tried the following code:
def reject_changes(path):
doc = word.Documents.Open(path)
doc.Activate()
word.ActiveDocument.TrackRevisions = False
nextRev = word.Selection.NextRevision()
while nextRev:
nextRev.Reject()
nextRev = word.Selection.NextRevision()
word.ActiveDocument.Save()
doc.Close(False)
For some reason this code was almost working. But on checking few of the documents again, I found that while most of the comments remained a couple of them were still deleted.
I think that since the comments are being deleted, they are probably a part of Revisions, in that case, is it possible to check if the revision is a comment or not. If not, can someone please suggest a way to ensure that no comments are deleted in the document on rejecting the changes.
Edit:
So, I found out that the comments that were getting deleted were added to the document when the 'Track Changes' option was active. I guess it made the comments as a part of the revision. So my first function works pretty well in case the comments are made once the 'Track Changes' option was not active.
But then, I have about more then twenty word documents (all of them a mix of doc and docx files), each of them have at least fifteen pages and over fifty comments.
I am using win32com.client. I am not too familiar with other packages that work with MS word. Any help would be appreciated.
Thanks!
Okay, so I was able to get a workaround for this by:
Creating a selection object and selecting the scope of the text marked by the comment.
Saving the range of the commented text into a range object.
Rejecting the tracked changes for the selected text.
Getting the new text based on the range object that was created in step 2.
This method takes a lot of time, though and the easiest way to extract the marked text is to ensure that comments are made when the word is not tracking the changes.
This is the code I am using now.
def reject_changes(path, doc_names):
word = win32.gencache.EnsureDispatch('Word.Application')
rejected_changes = []
for doc in doc_names:
#open the word document
wb = word.Documents.Open(rejected_doc)
wb.Activate()
current_doc = word.ActiveDocument
current_doc.TrackRevisions = False
text = ''
#iterating over the comments
for c in current_doc.Comments:
sentence_range = c.Scope #returns a range object of the text marked by comment
select_sentence = sentence_range.Select() #select the sentence marked by sentence_range
nextRev = word.Selection.NextRevision() #checks for the next revision in word
while nextRev:
#if the next revision is not within the sentence_range then skip.
if nextRev.Range.Start < sentence_range.Start or nextRev.Range.End > sentence_range.End:
break
else:
nextRev.Reject()
new_range = current_doc.Range(sentence_range.Start, sentence_range.End)
text = new_range.Text
nextRev = word.Selection.NextRevision()
author = c.Author
rejected_changes.append((doc,author,text,path))
current_doc.Save()
wb.Close(False)
return rejected_changes

Arcpy, select features based on part of a string

So for my example, I have a large shapefile of state parks where some of them are actual parks and others are just trails. However there is no column defining which are trails vs actual parks, and I would like to select those that are trails and remove them. I DO have a column for the name of each feature, that usually contains the word "trail" somewhere in the string. It's not always at the beginning or end however.
I'm only familiar with Python at a basic level and while I could go through manually selecting the ones I want, I was curious to see if it could be automated. I've been using arcpy.Select_analysis and tried using "LIKE" in my where_clause and have seen examples using slicing, but have not been able to get a working solution. I've also tried using the 'is in' function but I'm not sure I'm using it right with the where_clause. I might just not have a good enough grasp of the proper terms to use when asking and searching. Any help is appreciated. I've been using the Python Window in ArcMap 10.3.
Currently I'm at:
arcpy.Select_analysis ("stateparks", "notrails", ''trail' is in \"SITE_NAME\"')
Although using the Select tool is a good choice, the syntax for the SQL expression can be a challenge. Consider using an Update Cursor to tackle this problem.
import arcpy
stateparks = r"C:\path\to\your\shapefile.shp"
notrails = r"C:\path\to\your\shapefile_without_trails.shp"
# Make a copy of your shapefile
arcpy.CopyFeatures_management(stateparks, notrails)
# Check if "trail" exists in the string--delete row if so
with arcpy.da.UpdateCursor(notrails, "SITE_NAME") as cursor:
for row in cursor:
if "trails" in row[0]: # row[0] refers to the current row in the "SITE_NAME" field
cursor.deleteRow() # Delete the row if condition is true

Python: Writing text to a 2003 Word Doc in a specific place on the page

I'm using Python 2.7, Windows 7, and Word 2003. Those three cannot change (well except for maybe the python version). I work in Law and the attorneys have roughly 3 boiler plate objections (just a large piece of text, maybe 5 paragraphs) that need to be inserted into a word document at a specific spot. Now instead of going through and copying and pasting the objection where its needed, my idea is for the user to go through the document adding a special word/phrase (place holder if you will) that wont be found anywhere in the document. Then run some code and have python fill in the rest. Maybe not the cleverest way to go about it, but I'm a noob. I've been practicing with a test page and inserted the below text as place holders (the extra "o" stands for objection)
oone
otwo
othree
Below is what I have so far. I have two questions
Do you have any other suggestions to go about this?
My code does insert the string in the correct order, but the formatting goes out the window and it writes in my string 6 times instead of 1. How can I resolve the formatting issue so it simply writes the text into the spot the place holder is at?
import sys
import fileinput
f = open('work.doc', 'r+')
obj1 = "oone"
obj2 = "otwo"
obj3 = "othree"
for line in fileinput.input('work.doc'):
if obj1 in line:
f.write("Objection 1")
elif obj2 in line:
f.write("Objection 2")
elif obj3 in line:
f.write("Objection 3")
else:
f.write("No Objection")
f.close
You could use python-uno to load the document into OpenOffice and manipulate it using the UNO interface. There is some example code on the site I just linked to which can get you started.

Pyqt: Get text under cursor

How can I get the text under the cursor? So if I hover over it and the word was "hi" I could read it? I think I need to do something with QTextCursor.WordUnderCursor but I am not really sure what. Any help?
This is what I am trying to work with right now:
textCursor = text.cursorForPosition(event.pos());
textCursor.select(QTextCursor.WordUnderCursor);
text.setTextCursor(textCursor);
word = textCursor.selectedText();
I have it selecting the text right now just so I can see it.
Edit 2:
What I am really trying to do is display a tooltip over certain words in the text.
Unfortunately, I can't test this at the moment, so this is a best guess at what you need. This is based on some code I wrote that had a textfield that showed errors in a tooltip as you typed, but should work.
You've already got code to select the word under the hover over, you just need the tooltip in the right spot.
textCursor = text.cursorForPosition(event.pos())
textCursor.select(QTextCursor.WordUnderCursor)
text.setTextCursor(textCursor)
word = textCursor.selectedText()
if meetsSomeCondition(word):
toolTipText = toolTipFromWord(word)
# Put the hover over in an easy to read spot
pos = text.cursorRect(text.textCursor()).bottomRight()
# The pos could also be set to event.pos() if you want it directly under the mouse
pos = text.mapToGlobal(pos)
QtGui.QToolTip.showText(pos,toolTipText)
I've left meetsSomeCondition() and toolTipFromWord() up to you to fill in as you don't describe those, but they are pretty descriptive in what needs to go there.
Regarding your comment on doing it without selecting the word, the easiest way to do this is to cache the cursor before you select a new one and then set it back. You can do this by calling QTextEdit.textCursor() and then setting it like you did previously.
Like so:
oldCur = text.textCursor()
textCursor.select(QTextCursor.WordUnderCursor) # line from above
text.setTextCursor(textCursor) # line from above
word = textCursor.selectedText() # line from above
text.setTextCursor(oldCur)
# if condition as above

Extracting data from MS Word

I am looking for a way to extract / scrape data from Word files into a database. Our corporate procedures have Minutes of Meetings with clients documented in MS Word files, mostly due to history and inertia.
I want to be able to pull the action items from these meeting minutes into a database so that we can access them from a web-interface, turn them into tasks and update them as they are completed.
Which is the best way to do this:
VBA macro from inside Word to create CSV and then upload to the DB?
VBA macro in Word with connection to DB (how does one connect to MySQL from VBA?)
Python script via win32com then upload to DB?
The last one is attractive to me as the web-interface is being built with Django, but I've never used win32com or tried scripting Word from python.
EDIT: I've started extracting the text with VBA because it makes it a little easier to deal with the Word Object Model. I am having a problem though - all the text is in Tables, and when I pull the strings out of the CELLS I want, I get a strange little box character at the end of each string. My code looks like:
sFile = "D:\temp\output.txt"
fnum = FreeFile
Open sFile For Output As #fnum
num_rows = Application.ActiveDocument.Tables(2).Rows.Count
For n = 1 To num_rows
Descr = Application.ActiveDocument.Tables(2).Cell(n, 2).Range.Text
Assign = Application.ActiveDocument.Tables(2).Cell(n, 3).Range.Text
Target = Application.ActiveDocument.Tables(2).Cell(n, 4).Range.Text
If Target = "" Then
ExportText = ""
Else
ExportText = Descr & Chr(44) & Assign & Chr(44) & _
Target & Chr(13) & Chr(10)
Print #fnum, ExportText
End If
Next n
Close #fnum
What's up with the little control character box? Is some kind of character code coming across from Word?
Word has a little marker thingy that it puts at the end of every cell of text in a table.
It is used just like an end-of-paragraph marker in paragraphs: to store the formatting for the entire paragraph.
Just use the Left() function to strip it out, i.e.
Left(Target, Len(Target)-1))
By the way, instead of
num_rows = Application.ActiveDocument.Tables(2).Rows.Count
For n = 1 To num_rows
Descr = Application.ActiveDocument.Tables(2).Cell(n, 2).Range.Text
Try this:
For Each row in Application.ActiveDocument.Tables(2).Rows
Descr = row.Cells(2).Range.Text
Well, I've never scripted Word, but it's pretty easy to do simple stuff with win32com. Something like:
from win32com.client import Dispatch
word = Dispatch('Word.Application')
doc = word.Open('d:\\stuff\\myfile.doc')
doc.SaveAs(FileName='d:\\stuff\\text\\myfile.txt', FileFormat=?) # not sure what to use for ?
This is untested, but I think something like that will just open the file and save it as plain text (provided you can find the right fileformat) – you could then read the text into python and manipulate it from there. There is probably a way to grab the contents of the file directly, too, but I don't know it off hand; documentation can be hard to find, but if you've got VBA docs or experience, you should be able to carry them across.
Have a look at this post from a while ago: http://mail.python.org/pipermail/python-list/2002-October/168785.html Scroll down to COMTools.py; there's some good examples there.
You can also run makepy.py (part of the pythonwin distribution) to generate python "signatures" for the COM functions available, and then look through it as a kind of documentation.
You could use OpenOffice. It can open word files, and also can run python macros.
I'd say look at the related questions on the right -->
The top one seems to have some good ideas for going the python route.
how about saving the file as xml. then using python or something else and pull the data out of word and into the database.
It is possible to programmatically save a Word document as HTML and to import the table(s) contained into Access. This requires very little effort.

Categories

Resources