Python-pptx compatibility with text boxes in LibreOffice

Just starting to try to use python-pptx, and have fallen at the first hurdle. Linux Mint 20.1, Python 3.8.5, LibreOffice 6.4.
This is basically the 'Hello World' from the documentation.
from pptx import Presentation
from pptx.util import Inches, Pt
prs = Presentation()
blank_slide_layout = prs.slide_layouts[6]
slide = prs.slides.add_slide(blank_slide_layout)
left = top = width = height = Inches(1)
txBox = slide.shapes.add_textbox(left, top, width, height)
tf = txBox.text_frame
print('before text ', txBox.left, txBox.top, txBox.width, txBox.height)
tf.text = "This is a long line of text inside a textbox"
print('after text ', txBox.left, txBox.top, txBox.width, txBox.height)
prs.save('test.pptx')
The text is too long to fit on a single line in the textbox. Printing its bounds before and after inserting the text shows that, as far as python-pptx is concerned, the textbox width hasn't changed.
When the resulting presentation is viewed in LibreOffice, instead of wrapping the text within its boundaries, the textbox has expanded symmetrically around its midpoint, pushing the start of the text off the left-hand edge of the page.
I was hoping that LibreOffice was only incompatible with PowerPoint in rare edge cases, but text in text boxes is the bread and butter of presentations.
When I upload it to Google Slides, the text wraps within the left and right text box boundaries, but spills out of the bottom edge. The textbox shows up as 1" x 1" in the right place.
If I use onedrive.live.com, the text is left-justified in the box and spills out of the right-hand side without wrapping, and the textbox is shown as being 1" x 1" in the right place.
If I use onlinedocumentviewer.com, the display is the same as in OneDrive, though I can't get to see the text box itself.
Unfortunately I can't test the behaviour on a native PowerPoint installation.
Maybe there's an autosize or fixed flag which, left unset, leaves each viewer to default it in its own idiosyncratic way? How do we control text boxes / frames when targeting LibreOffice?
I possibly have a workaround: break up my text into single lines and use one text box per line. But I'd rather understand whether it can be done the proper way.

After some floundering around in the docs, I stumbled across the text_frame.word_wrap property. Setting it to True gets the text to wrap in LibreOffice. While I was there, I found that setting text_frame.auto_size = MSO_AUTO_SIZE.TEXT_TO_FIT_SHAPE reduces the font size to get it all to fit in the box.
Is there a list of properties like .word_wrap in one place anywhere?

Related

change all fonts in powerpoint without opening the file

I wanted to change all the fonts in about 100 PowerPoint files without opening them. There are several shape types in each slide, and each might have a different font. I used the python-pptx package and wrote the following code to change the fonts of all text in a PowerPoint presentation. Although it does not raise any error, it does not work: the fonts in the file are still whatever they were, for example Arial. I also added print(shape.text) to make sure that it finds all the text, and there seems to be no issue there. Is it a bug? Or am I missing something?
from pptx import Presentation
prs = Presentation('f10.pptx')
for i, slide in enumerate(prs.slides):
    for shape in slide.shapes:
        print(shape.has_text_frame)
        if shape.has_text_frame:
            print(shape.text)
            for p in shape.text_frame.paragraphs:
                for r in p.runs:
                    print(r.font.name)
                    r.font.name = 'Tahoma'
                    print(r.font.name)
prs.save('f10_tahoma.pptx')
Besides, it seems that the package does not work for utf-8 characters. I added a text-box on the last slide by adding:
text_frame = shape.text_frame
text_frame.clear() # not necessary for newly-created shape
p = text_frame.paragraphs[0]
run = p.add_run()
run.text = 'سلام '
font = run.font
font.name = 'Andalus'
font.size = Pt(18)
before saving the file. The text box is added, and when I check the font it is reported as Andalus, but the text is not actually rendered in Andalus.

With Aspose.Slides for Python via .NET, you can easily change all fonts for all texts in your presentations. The following code example shows you how to do this:
import aspose.slides as slides
with slides.Presentation('example.pptx') as presentation:
    for slide in presentation.slides:
        for shape in slide.shapes:
            if isinstance(shape, slides.AutoShape):
                for paragraph in shape.text_frame.paragraphs:
                    for portion in paragraph.portions:
                        portion.portion_format.latin_font = slides.FontData('Tahoma')
You can also evaluate Aspose.Slides Cloud SDK for Python for manipulating presentations. This REST-based API allows you to make 150 free API calls per month for API learning and presentation processing.
Aspose Slides Online Viewer can be used to view presentations without PowerPoint installed.
I work at Aspose.

What language is the text of the file? Run.font properties work fine for UTF-8, but there is a separate font for cursive scripts like Arabic. Access to that secondary font is not implemented in python-pptx unfortunately, but that could explain at least part of the behavior you're seeing.
For Roman-character text (like what we're using here), there are a couple of things to check.
The font in question needs to be installed on the machine PowerPoint is running on when the document is opened. Otherwise PowerPoint will substitute a font.
The font (typeface) name used in the XML will not always exactly match what appears in the PowerPoint drop-down selection box. You need to give that name to python-pptx in the exact form it should appear in the XML. You may need to make an example file that works by hand, perhaps containing a single slide with a single textbox for simplicity, and then inspect the XML of that file to find the "spelling" used for that typeface by PowerPoint.
You could do that with code like this:
prs = Presentation("example.pptx")
shape = prs.slides[0].shapes[0]
print(shape._element.xml)
You should be able to locate the typeface name somewhere in an element like <a:rPr> or <a:defRPr>, typically as the typeface attribute of a child element such as <a:latin>.

add two images in same line in python-docx

I am trying to add two images to a docx file.
One image should be on the left side and the other on the right.
With the code below, the images are positioned left and right as I want, but they are not on the same line.
One is above and the other is below it.
I have tried WD_ALIGN_PARAGRAPH.RIGHT but I am not getting the result I want.
## Image for Left Side
my_img = document.add_picture(i,width=Inches(0.8),height=Inches(0.8))
last_paragraph = document.paragraphs[-1]
last_paragraph.alignment = WD_ALIGN_PARAGRAPH.LEFT
## Image for Right Side
my_img2 = document.add_picture(i,width=Inches(0.8),height=Inches(0.8))
last_paragraph = document.paragraphs[-1]
last_paragraph.alignment = WD_ALIGN_PARAGRAPH.RIGHT
Please help me. I want both images on the same line, side by side with a little space between them.

Use Run.add_picture() instead of Document.add_picture(). This allows multiple images in the same paragraph, which, if they both fit within the page margins, results in side-by-side images:
paragraph = document.add_paragraph()
run = paragraph.add_run()
run.add_picture(...)
run_2 = paragraph.add_run()
run_2.add_picture(...)
As far as alignment is concerned, when using paragraphs, inserting tabs is probably most reliable. The other alternative is to add a table and place the images in side-by-side cells.

Python- How to add text on desired coordinates on new word file

If the size of the document (.doc) is, say, 1000x3000 (width x height), I need to place the text "Hello" at the coordinates (400, 300).
How can I do this using Python and any libraries? (Ubuntu platform)
A similar problem has been addressed using Java:
How to add text on desired coordinates on new word file using openxml

You can't place regular text at an arbitrary position. But you can use a textbox instead. Make sure to set top and left margins to 0. The top left corner of the text will then be at the specified coordinates.
Unfortunately, python-docx doesn't support floating shapes. But you can do it using COM automation on a Windows computer where Word is installed. You need PyWin32 for this.
import win32com.client as win32
word = win32.gencache.EnsureDispatch('Word.Application')
doc = word.Documents.Add()
tb = doc.Shapes.AddTextbox(1, 400, 300, 100, 100)
tb.TextFrame.TextRange.Text = "Hello"
tb.TextFrame.MarginTop = 0
tb.TextFrame.MarginLeft = 0
tb.Fill.Visible = 0
tb.Line.Visible = 0
doc.SaveAs2("Hello.docx")
doc.Close()
word.Application.Quit()

How could I replace text in a PPT with python-pptx?

I want to replace the text in a textbox in Powerpoint with Python-pptx.
Everything I found online didn't work for me and the documentation isn't that helpful for me.
So I have a Textbox with the Text:
$$Name 1$$
$$Name 2$$
and I want to change the $$Name1 $$ to Tom.
How can I achieve that?

A TextFrame object defined in python-pptx helps in manipulating the contents of a textbox. You can do something like:
from pptx import Presentation
# Open the file
prs = Presentation('pptfile.pptx')
# Get the required slide
slide = prs.slides[0]
# Find the required text box and replace the placeholder in it
for shape in slide.shapes:
    if not shape.has_text_frame:
        continue
    text_frame = shape.text_frame
    if "$$Name 1$$" in text_frame.text:
        text_frame.text = text_frame.text.replace("$$Name 1$$", "Tom")
# Save the file
prs.save("pptfile.pptx")

Try this:
import pptx
input_pptx = "Input File Path"
prs = pptx.Presentation(input_pptx)
testString = "$$Name1 $$"
replaceString = 'Tom'
for slide in prs.slides:
    for shape in slide.shapes:
        if shape.has_text_frame:
            if testString in shape.text:
                shape.text = shape.text.replace(testString, replaceString)
prs.save('C:/test.pptx')

OK, thank you. I just found out that my PowerPoint example was totally messed up. Now everything works fine with a new blank PowerPoint file.

In order to keep original formatting, we need to replace the text at the run level.
from pptx import Presentation
ppt = Presentation(file_path_of_pptx)
search_str = '$$Name1$$'
replace_str = 'Tom'
for slide in ppt.slides:
    for shape in slide.shapes:
        if shape.has_text_frame:
            for paragraph in shape.text_frame.paragraphs:
                for run in paragraph.runs:
                    print(run.text)
                    if search_str in run.text:
                        run.text = run.text.replace(search_str, replace_str)
ppt.save(file_path_of_new_pptx)

Since PowerPoint splits the text of a paragraph into seemingly random runs (and, on top of that, each run carries its own, possibly different, character formatting), you cannot just look for the text in every run: the text may actually be distributed over several runs, and in each of those you'll only find part of the text you are looking for.
Doing it at the paragraph level is possible, but you'll lose all character formatting of that paragraph, which might screw up your presentation quite a bit.
Using the text at the paragraph level, doing the replacement, and assigning the result to the paragraph's first run while removing the other runs from the paragraph is better, but it will change the character formatting of all runs to that of the first one, again screwing around in places where it shouldn't.
Therefore I wrote a rather comprehensive script that can be installed with
python -m pip install python-pptx-text-replacer
and that creates a command python-pptx-text-replacer that you can use to do those replacements from the command line, or you can use the class TextReplacer in that package in your own Python scripts. It is able to change text in tables, charts and wherever else some text might appear, while preserving any character formatting specified for that text.
Read the README.md at https://github.com/fschaeck/python-pptx-text-replacer for more detailed information on usage. And open an issue there if you have any problems with the code!
Also see my answer at python-pptx - How to replace keyword across multiple runs? for an example of how the script deals with character formatting...
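To illustrate the run-splitting problem described in this answer, here is a small pure-Python sketch. It models a paragraph's runs as a plain list of strings (an assumption for illustration; real runs are python-pptx objects with a .text attribute), and the helper name replace_across_runs is hypothetical. It is a simplified illustration of the idea, not the implementation used by python-pptx-text-replacer.

```python
def replace_across_runs(runs, search, replacement):
    """Replace the first occurrence of `search` in the joined run texts.

    `runs` is a list of strings standing in for each run's text. The
    replacement is written into the run where the match starts, and the
    matched characters are removed from any later runs the match spills
    into. The number of runs stays the same, so each run could keep its
    own character formatting.
    """
    full = "".join(runs)
    start = full.find(search)
    if start == -1:
        return list(runs)  # nothing to replace
    end = start + len(search)

    result = []
    pos = 0  # offset of the current run within the joined text
    for text in runs:
        run_start, run_end = pos, pos + len(text)
        pos = run_end
        if run_end <= start or run_start >= end:
            result.append(text)  # run lies entirely outside the match
            continue
        before = text[:max(start - run_start, 0)]
        after = text[max(end - run_start, 0):]
        if run_start <= start:
            # The match begins in this run: the replacement goes here.
            result.append(before + replacement + after)
        else:
            # The match continues into this run: keep only what follows it.
            result.append(after)
    return result


# The placeholder is split over three runs, so no single run contains it.
runs = ["Hello $$Na", "me1", "$$!"]
search = "$$Name1$$"
found_in_single_run = any(search in r for r in runs)
new_runs = replace_across_runs(runs, search, "Tom")
print(found_in_single_run, new_runs)
```

With python-pptx, the same idea could be applied by collecting run.text for each run in paragraph.runs, computing the new texts, and assigning them back run by run, which leaves each run's font settings in place.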

Pdftron - creating new element with same styles as existing element

I am trying to create a PDF editing prototype using PDFTron software.
I have successfully created an interface where the user can click on an image created from the PDF, select a region, and be presented with a text input for entering the text that will replace the content in the PDF file.
Now the text replacement part is problematic. Since there is no API documentation for Python (only examples), I am following the Java / Android API documentation.
Here is where I am for now. I have the following code to find the elements that are in the user-selected rectangle. The values x1, y1, x2, y2 are PDF coordinates based on the user's selection in the front end.
rect = Rect(x1, y1, x2, y2)
text = ''
extractor = TextExtractor()
extractor.Begin(page)
line = extractor.GetFirstLine()
words = []
while line.IsValid():
    word = line.GetFirstWord()
    while word.IsValid():
        elRect = word.GetBBox()
        elRect.Normalize()
        if elRect.IntersectRect(elRect, rect):
            text += ' ' + word.GetString()
            words.append(word)
        word = word.GetNextWord()
    line = line.GetNextLine()
words is basically an array where I store the content that will later need to be replaced by the new element.
Now the problem. I want the new element to have the same style and font as the old text.
The API tells me that using
style = words[0].GetStyle()
gives me style of the word and I can get font from style using
font = style.GetFont()
doc : https://www.pdftron.com/pdfnet/mobile/docs/Android/pdftron/PDF/TextExtractor.Style.html
But the returned font is an instance of the Obj class, not the Font class.
And apparently creating a new text element with a font requires an object of the Font class.
Because
element = eb.CreateTextBegin(font, 10.0);
generates an error:
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "/home/alan/.virtualenvs/pdfprint/local/lib/python2.7/site-packages/PDFNetPython2.py", line 5056, in CreateTextBegin
def CreateTextBegin(self, *args): return _PDFNetPython2.ElementBuilder_CreateTextBegin(self, *args)
NotImplementedError: Wrong number or type of arguments for overloaded function 'ElementBuilder_CreateTextBegin'.
Possible C/C++ prototypes are:
pdftron::PDF::ElementBuilder::CreateTextBegin(pdftron::PDF::Font,double)
pdftron::PDF::ElementBuilder::CreateTextBegin()
Perhaps there is better approach to achieving same result?
Edit1
Reading the docs, I found that you can create a Font object from an Obj like this:
font = Font(style.GetFont())
Still stuck on creating element with those styles though.
/edit1
Edit2
I use the following code to test writing into the file:
style = elements[0].GetStyle()
font = Font(style.GetFont())
fontsize = style.GetFontSize()
eb = ElementBuilder()
element = eb.CreateTextBegin(font, 10.0)
writer.WriteElement(element)
element = eb.CreateTextRun('My Name')
element.SetTextMatrix(10, 0, 0, 10, 100, 100)
gstate = element.GetGState()
gstate.SetTextRenderMode(GState.e_fill_text)
gstate.SetStrokeColorSpace(ColorSpace.CreateDeviceRGB())
gstate.SetStrokeColor(ColorPt(1, 1, 1))
element.UpdateTextMetrics()
writer.WriteElement(element)
writer.WriteElement(eb.CreateTextEnd())
writer.End()
from core.helpers import ensure_dir
ensure_dir(output_filename)
doc.Save(output_filename, SDFDoc.e_linearized)
doc.Close()
What I can't figure out is:
How to copy styles from an existing element.
How to position the new element in the document.
Why this test code does not give me visible results. As far as I can see, a new file gets created, but it does not have "My Name" anywhere in it.
/Edit2

Based on the code above, it looks like you want to append some text to an existing page using the font style (font name + color) of the first word on the page.
There are a couple of issues with the above code. You are setting the stroke color rather than the fill color:
gstate.SetTextRenderMode(GState.e_fill_text)
gstate.SetStrokeColorSpace(ColorSpace.CreateDeviceRGB())
gstate.SetStrokeColor(ColorPt(1, 1, 1))
Try this instead:
gstate.SetTextRenderMode(GState.e_fill_text)
gstate.SetFillColorSpace(ColorSpace.CreateDeviceRGB())
gstate.SetFillColor(ColorPt(1, 0, 0))  # hardcoded to red, for testing purposes only
The main issue is most likely related to Font handling. You are hijacking an existing font and are assuming that this font is using ‘standard encoding’. However this font is likely not using standard encoding. Also fonts in existing PDFs are often sub-setted (this means that the font does not contain a full list of glyphs, but only character references that are present in the document). As a result, you may see notdef or whitespace instead of the expected text. This and some other issues are covered here:
https://groups.google.com/d/msg/pdfnet-sdk/RBTuJG2uILk/pGkrKnqZ_YIJ
https://groups.google.com/d/msg/pdfnet-sdk/2y8s5aehq-c/xyknr9W5r-cJ
As a solution, instead of using the embedded font directly, you can find a matching system font (e.g. based on the font name and other properties) and create a new font. PDFNet offers the utility methods Font.Create(doc, font) and Font.Create(doc, "Font name").
These methods create a Unicode font, so you should use eb.CreateUnicodeTextRun() rather than eb.CreateTextRun().
Alternatively, you could use an AcroForm as a template (see the InteractiveForms sample) and pdfdoc.FlattenAnnotations() to end up with a read-only version of the document.
