I am using python-docx to create a new document and then I add a table (rows=1,cols=5). Then I add a picture to each of the five cells. I have the code working but what I see from docx is not what I see when I use Word manually.
Specifically, if I turn on "Show Formatting Marks" and look at what docx generated, there is always a hard return at the beginning of each cell (put there by the add_paragraph method). When I use Word manually, there is no hard return.
The result of the hard return is that each picture is down one line from where I want it to be. If I use Word, the pictures are where I expect them to be.
What is also strange is that on the docx document I can manually go in and single click next to the hard return, press the down cursor key once, and then press the Backspace key once and the hard return is deleted and the picture moves to the top of the cell.
So my question is, does anyone know of a way to get a picture in a table cell without having a hard return put in when the add_paragraph method is executed?
Any help would be greatly appreciated.
from docx import Document
from docx.enum.table import WD_ALIGN_VERTICAL
from docx.enum.text import WD_ALIGN_PARAGRAPH
from docx.shared import Inches, Pt

def paragraph_format_run(cell):
    # add_paragraph() appends a new paragraph after the cell's existing one
    paragraph = cell.add_paragraph()
    format = paragraph.paragraph_format
    run = paragraph.add_run()
    format.space_before = Pt(0)
    format.space_after = Pt(0)
    format.line_spacing = 1.0
    format.alignment = WD_ALIGN_PARAGRAPH.CENTER
    return paragraph, format, run

def main():
    document = Document()
    sections = document.sections
    section = sections[0]
    section.top_margin = Inches(1.0)
    section.bottom_margin = Inches(1.0)
    section.left_margin = Inches(0.75)
    section.right_margin = Inches(0.75)
    table = document.add_table(rows=1, cols=5)
    table.allow_autofit = False
    cells = table.rows[0].cells
    for i in range(5):
        # forward slash avoids the invalid "\p" escape and also works on Windows
        pic_path = f"Table_Images/pic_{i}.jpg"
        cell = cells[i]
        cell.vertical_alignment = WD_ALIGN_VERTICAL.TOP
        cell_p, cell_f, cell_r = paragraph_format_run(cell)
        cell_r.add_picture(pic_path, width=Inches(1.25))
    doc_path = "TableTest_1.docx"
    document.save(doc_path)

main()
Each blank cell in a newly created table contains a single empty paragraph. This is just one of those things about the Word format. I suppose it gives a place to put the insertion mark (flashing vertical cursor) when you're using the Word application. A completely empty cell would have no place to "click" into.
This requires that any code that adds content to a cell must treat the first paragraph differently. In short, you access the first paragraph as cell.paragraphs[0] and only create second and later paragraphs with cell.add_paragraph().
So in this particular case, the paragraph_format_run() function would change like this:
def paragraph_format_run(cell):
    paragraph = cell.paragraphs[0]
    ...
This assumes a lot (for example, it only works when the cell is empty), but given what you now know about cell paragraphs you may be able to adapt it to add multiple images to a cell if you later decide you need that.
Related
What I want to do is insert an image into a specific location in an existing Word document using Python. I've looked at various libraries to do this; I'm using the docx-mailmerge package to insert text and tables using Word merge fields, but unfortunately image merging is just a TODO/wishlist feature. python-docx meanwhile allows image insertion, but only at the end of a document, not in specific places.
Is there another library that does this, or a good trick to accomplish it?
Fiddling around with the underlying API (and thanks to this SO answer) I hacked my way to success:
add a placeholder in your Word document where the image should go, something like a single line that says [ChartImage1]
find the paragraph object in the document that contains that text
replace the text of that paragraph with an empty string
add a run, and inside that add your image
So something like:
document = Document("template.docx")
image_paras = [i for i, p in enumerate(document.paragraphs) if "[ChartImage1]" in p.text]
p = document.paragraphs[image_paras[0]]
p.text = ""
r = p.add_run()
r.add_picture("path/to/image.png")
document.save("my_doc.docx")
I use python-docx to generate Microsoft Word documents. The user wants to be able to write, for example, "Good Morning every body,This is my %(profile_img)s do you like it?"
in an HTML field. I then create a Word document, retrieve the user's picture from the database, and replace the keyword %(profile_img)s with that picture, NOT at the END OF THE DOCUMENT. With python-docx we use this instruction to add a picture:
document.add_picture('profile_img.png', width=Inches(1.25))
The picture is added to the document, but the problem is that it is added at the end of the document.
Is it impossible to add a picture at a specific position in a Microsoft Word document with Python? I have not found any answers to this on the net, but I have seen people asking the same question elsewhere with no solution.
Thanks (note: I'm not a hugely experienced programmer, and other than this awkward part the rest of my code will be very basic)
Quoting the python-docx documentation:
The Document.add_picture() method adds a specified picture to the end of the document in a paragraph of its own. However, by digging a little deeper into the API you can place text on either side of the picture in its paragraph, or both.
When we "dig a little deeper", we discover the Run.add_picture() API.
Here is an example of its use:
from docx import Document
from docx.shared import Inches
document = Document()
p = document.add_paragraph()
r = p.add_run()
r.add_text('Good Morning every body,This is my ')
r.add_picture('/tmp/foo.jpg')
r.add_text(' do you like it?')
document.save('demo.docx')
Well, I don't know if this will apply to you, but here is what I did to place an image at a specific spot in a docx document:
I created a base docx document (template document). In this file, I've inserted some tables without borders, to be used as placeholders for images. When creating the document, first I open the template, and update the file creating the images inside the tables. So the code itself is not much different from your original code, the only difference is that I'm creating the paragraph and image inside a specific table.
from docx import Document
from docx.shared import Inches
doc = Document('addImage.docx')
tables = doc.tables
p = tables[0].rows[0].cells[0].add_paragraph()
r = p.add_run()
r.add_picture('resized.png',width=Inches(4.0), height=Inches(.7))
p = tables[1].rows[0].cells[0].add_paragraph()
r = p.add_run()
r.add_picture('teste.png',width=Inches(4.0), height=Inches(.7))
doc.save('addImage.docx')
Here's my solution. Its advantage over the first proposal is that it surrounds the picture with a title (style Heading 1) and a section for additional comments. Note that you have to do the insertions in the reverse of the order in which they appear in the Word document.
This snippet is particularly useful if you want to programmatically insert pictures in an existing document.
from docx import Document
from docx.shared import Inches
# ------- initial code -------
document = Document()
p = document.add_paragraph()
r = p.add_run()
r.add_text('Good Morning every body,This is my ')
picPath = 'D:/Development/Python/aa.png'
r.add_picture(picPath)
r.add_text(' do you like it?')
document.save('demo.docx')
# ------- improved code -------
document = Document()
p = document.add_paragraph('Picture bullet section', 'List Bullet')
p = p.insert_paragraph_before('')
r = p.add_run()
r.add_picture(picPath)
p = p.insert_paragraph_before('My picture title', 'Heading 1')
document.save('demo_better.docx')
This adapts the answer written by Robᵩ to allow more flexible user input.
My assumption is that the HTML field text mentioned by Kais Dkhili (the original enquirer) is already loaded in docx.Document(). So...
Identify where the related HTML text is in the document.
import re  # regex module
img_tag = re.compile(r'%\(profile_img\)s')  # declare pattern
for _p in document.paragraphs:
    # search() finds the tag anywhere in the text; match() would only test the start
    if img_tag.search(_p.text):
        img_paragraph = _p
        # first hit only; for a full document search, make img_paragraph
        # a list and use its append method instead
        break  # drop the break for a full document search
Replace the placeholder identified by img_tag = '%(profile_img)s' with the desired image.
The following code assumes the paragraph text is contained in a single run.
Adjust it accordingly if that is not the case.
temp_text = img_tag.split(img_paragraph.text)
img_paragraph.runs[0].text = temp_text[0]
_r = img_paragraph.add_run()
_r.add_picture('profile_img.png', width = Inches(1.25))
img_paragraph.add_run(temp_text[1])
and done. document.save() it if finalised.
In case you are wondering what to expect from the temp_text...
[In]
img_tag.split(img_paragraph.text)
[Out]
['This is my ', ' do you like it?']
I spent a few hours on this. If you need to add images to a template doc file using Python, the best solution is to use the python-docx-template library.
Documentation is available here
Examples are available here
This is a variation on a theme. Letting I be the index of the paragraph in the specific document, then:
p = doc.paragraphs[I].insert_paragraph_before('\n')
p.add_run().add_picture('Fig01.png', width=Cm(15))
I have a few Word documents in the old format ('.doc' extension), all of which contain a lot of tracked changes. Most of the changes have comments associated with them.
I need to figure out a way to use Python to reject all the changes that have been made in the documents while retaining the comments.
I tried this with the newer Word format ('.docx' files) and had no issues: all the changes were rejected and the document still had all its comments. But when I tried it with the older format, all my comments were deleted.
At first I used the following function with a few different versions of the Word file.
def reject_changes(path):
    doc = word.Documents.Open(path)
    doc.Activate()
    word.ActiveDocument.TrackRevisions = False
    word.ActiveDocument.Revisions.RejectAll()
    word.ActiveDocument.Save()
    doc.Close(False)
I tried to use the above function with the original word document
I changed the extension of the file to '.docx' and tried the above function
I made a copy of the document and saved it in '.docx' format.
In all these cases the comments were deleted.
I then tried the following code:
def reject_changes(path):
    doc = word.Documents.Open(path)
    doc.Activate()
    word.ActiveDocument.TrackRevisions = False
    nextRev = word.Selection.NextRevision()
    while nextRev:
        nextRev.Reject()
        nextRev = word.Selection.NextRevision()
    word.ActiveDocument.Save()
    doc.Close(False)
For some reason this code almost worked. But on checking a few of the documents again, I found that while most of the comments remained, a couple of them were still deleted.
Since the comments are being deleted, I think they are probably part of the Revisions collection. In that case, is it possible to check whether a revision is a comment or not? If not, can someone please suggest a way to ensure that no comments are deleted when rejecting the changes?
Edit:
So, I found out that the comments that were getting deleted had been added to the document while the 'Track Changes' option was active. I guess that made the comments part of the revision. So my first function works pretty well when the comments were made while 'Track Changes' was not active.
But I have more than twenty Word documents (a mix of doc and docx files), each with at least fifteen pages and over fifty comments.
I am using win32com.client. I am not too familiar with other packages that work with MS word. Any help would be appreciated.
Thanks!
Okay, so I was able to get a workaround for this by:
1. Creating a selection object and selecting the scope of the text marked by the comment.
2. Saving the range of the commented text into a range object.
3. Rejecting the tracked changes for the selected text.
4. Getting the new text based on the range object that was created in step 2.
This method takes a lot of time, though, and the easiest way to extract the marked text is to ensure that comments are made when Word is not tracking changes.
This is the code I am using now.
import os
import win32com.client as win32

def reject_changes(path, doc_names):
    word = win32.gencache.EnsureDispatch('Word.Application')
    rejected_changes = []
    for doc in doc_names:
        # open the word document
        # assumption: each entry in doc_names is a file name inside the folder `path`
        rejected_doc = os.path.join(path, doc)
        wb = word.Documents.Open(rejected_doc)
        wb.Activate()
        current_doc = word.ActiveDocument
        current_doc.TrackRevisions = False
        text = ''
        # iterating over the comments
        for c in current_doc.Comments:
            sentence_range = c.Scope  # range object of the text marked by the comment
            sentence_range.Select()  # select the sentence marked by sentence_range
            nextRev = word.Selection.NextRevision()  # checks for the next revision in word
            while nextRev:
                # if the next revision is not within the sentence_range then skip.
                if nextRev.Range.Start < sentence_range.Start or nextRev.Range.End > sentence_range.End:
                    break
                else:
                    nextRev.Reject()
                    new_range = current_doc.Range(sentence_range.Start, sentence_range.End)
                    text = new_range.Text
                nextRev = word.Selection.NextRevision()
            author = c.Author
            rejected_changes.append((doc, author, text, path))
        current_doc.Save()
        wb.Close(False)
    return rejected_changes
I created a word document which contains the text
Hello. You owe me ${debt}. Please pay me back soon.
in Times New Roman size 12. The file name is debtTemplate.docx. I would like to replace {debt} with an actual number (1.20) using python-docx. I tried the following code:
from docx import Document
document = Document("debtTemplate.docx")
paragraphs = document.paragraphs
debt = "1.20"
paragraph = paragraphs[0]
text = paragraph.text
newText = text.format(debt=debt)
paragraph.clear()
paragraph.add_run(newText)
document.save("debt.docx")
This results in a new document with the desired text, but in Calibri font size 11. I would like the font to be like the original: Times New Roman size 12.
I know that you can add a style variable to paragraph.add_run(), so I tried that, but nothing worked. E.g. paragraph.add_run(newText,style="Strong") didn't change anything.
Does anyone know what I can do?
EDIT: here's a modified version of my code that I had hoped would work but didn't.
from docx import Document
document = Document("debtTemplate.docx")
document.save("debt.docx")
paragraphs = document.paragraphs
debt = "1.20"
paragraph = paragraphs[0]
style = paragraph.style
text = paragraph.text
newText = text.format(debt=debt)
paragraph.clear()
paragraph.add_run(newText,style)
document.save("debt.docx")
This page in the docs should help you understand why the style is not having an effect. It's a pretty easy fix: http://python-docx.readthedocs.org/en/latest/user/styles.html
I like a couple other things about what you've found though:
Using the str.format() method to do placeholder replacement is a nice, easy way to do lightweight text replacement. I'll have to add that to the documentation as an approach to simple custom document generation.
In the XML for a paragraph, there is an optional element called <w:defRPr> which Word uses to indicate the default formatting for any new text added to the paragraph, for example if you started typing after placing your insertion point at the end of the paragraph. Right now, python-docx ignores that element. That's why you're getting the default Calibri 11 instead of the Times New Roman 12 you started with. But a useful feature might be to use that element, if present, to assign run properties to any new runs added at the end of the paragraph. If you want to add that as a feature request to the GitHub tracker we'll take a look at getting it implemented.
How can I get the text under the cursor? So if I hover over it and the word was "hi" I could read it? I think I need to do something with QTextCursor.WordUnderCursor but I am not really sure what. Any help?
This is what I am trying to work with right now:
textCursor = text.cursorForPosition(event.pos());
textCursor.select(QTextCursor.WordUnderCursor);
text.setTextCursor(textCursor);
word = textCursor.selectedText();
I have it selecting the text right now just so I can see it.
Edit 2:
What I am really trying to do is display a tooltip over certain words in the text.
Unfortunately, I can't test this at the moment, so this is a best guess at what you need. This is based on some code I wrote that had a textfield that showed errors in a tooltip as you typed, but should work.
You've already got code to select the word under the hover over, you just need the tooltip in the right spot.
textCursor = text.cursorForPosition(event.pos())
textCursor.select(QTextCursor.WordUnderCursor)
text.setTextCursor(textCursor)
word = textCursor.selectedText()
if meetsSomeCondition(word):
    toolTipText = toolTipFromWord(word)
    # Put the hover over in an easy to read spot
    pos = text.cursorRect(text.textCursor()).bottomRight()
    # The pos could also be set to event.pos() if you want it directly under the mouse
    pos = text.mapToGlobal(pos)
    QtGui.QToolTip.showText(pos,toolTipText)
I've left meetsSomeCondition() and toolTipFromWord() for you to fill in, since you don't describe them, but their names indicate what needs to go there.
Regarding your comment on doing it without selecting the word, the easiest way to do this is to cache the cursor before you select a new one and then set it back. You can do this by calling QTextEdit.textCursor() and then setting it like you did previously.
Like so:
oldCur = text.textCursor()
textCursor.select(QTextCursor.WordUnderCursor) # line from above
text.setTextCursor(textCursor) # line from above
word = textCursor.selectedText() # line from above
text.setTextCursor(oldCur)
# if condition as above