Replacing particular text in all slides of a ppt using python-pptx - python

I am new to python-pptx, but I am familiar with its basic workings. I have searched a lot but could not find a way to replace a particular text with another text in all slides. The text may be in any text_frame of a slide. For example, if all slides in a ppt contain the keyword 'java', I want to change it to 'python' using python-pptx.
for slide in ppt.slides:
    if slide.has_text_frame:
        # do something with text frames

Something like this should help. You'll need to iterate over the shape objects in each slide.shapes and check for a TextFrame and the presence of your keyword:
def replace_text_by_keyword(ppt, keyword, replacement):
    for slide in ppt.slides:
        for shp in slide.shapes:
            if shp.has_text_frame and keyword in shp.text:
                thisText = shp.text.replace(keyword, replacement)
                shp.text = thisText
This example is just a simple str.replace. Of course, if you have a more complicated replacement or text-updating algorithm, you can modify it as needed.

In addition, you cannot simply replace the shape's text: you will lose all formatting.
Your text is contained in a text_frame. A text_frame contains paragraphs, and paragraphs are made up of runs. A run contains all your character-level formatting. You need to get to the paragraph, then the run, then update the run's text.
"A run exists to provide character level formatting, including font typeface, size, and color, an optional hyperlink target URL, bold, italic, and underline styles, strikethrough, kerning, and a few capitalization styles like all caps." (see reference below)
You'll need to do something like this:
from pptx import Presentation

prs = Presentation('data/p1.pptx')
for slide in prs.slides:
    for shape in slide.shapes:
        if not shape.has_text_frame:
            continue
        for paragraph in shape.text_frame.paragraphs:
            for run in paragraph.runs:
                # newText is your own function that returns the updated string
                run.text = newText(run.text)
prs.save('data/p1.pptx')
Official documentation (working with text): python-pptx.readthedocs.io
Visual representation of what this means Duplicate post
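One thing the run-level loop does not cover is a keyword that has been split across two runs (for example "ja" in one run and "va" in the next). Below is a minimal sketch of one way to handle that, assuming it is acceptable for an affected paragraph to take on the first run's formatting. The helper names replace_in_paragraph and replace_in_presentation are made up for this example and are not part of python-pptx; they only rely on the .slides, .shapes, .has_text_frame, .text_frame.paragraphs, .runs and .text attributes already used above.

```python
def replace_in_paragraph(paragraph, keyword, replacement):
    """Replace keyword in one paragraph; return the number of occurrences found.

    Works on any object with `.runs`, each run having a writable `.text`.
    """
    runs = list(paragraph.runs)
    if not runs or not keyword:
        return 0
    in_run_hits = sum(run.text.count(keyword) for run in runs)
    joined = "".join(run.text for run in runs)
    total_hits = joined.count(keyword)
    if total_hits == 0:
        return 0
    if total_hits == in_run_hits:
        # Heuristic: the counts agree, so treat every hit as lying inside a
        # single run and edit run by run, which keeps each run's formatting.
        for run in runs:
            if keyword in run.text:
                run.text = run.text.replace(keyword, replacement)
    else:
        # At least one hit spans a run boundary: put the whole replaced text
        # into the first run and blank the rest (later runs lose their styling).
        runs[0].text = joined.replace(keyword, replacement)
        for run in runs[1:]:
            run.text = ""
    return total_hits


def replace_in_presentation(prs, keyword, replacement):
    """Apply replace_in_paragraph to every text frame on every slide."""
    total = 0
    for slide in prs.slides:
        for shape in slide.shapes:
            if not shape.has_text_frame:
                continue
            for paragraph in shape.text_frame.paragraphs:
                total += replace_in_paragraph(paragraph, keyword, replacement)
    return total
```

With python-pptx installed, you would call replace_in_presentation(prs, 'java', 'python') on an opened Presentation and then prs.save(...). Note that this loop does not visit text inside tables or grouped shapes.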

Related

how to edit/modify text in PDF

I am working on my final-year project, a website where a user can come and read PDFs. I am adding some features, such as converting currency amounts to the user's local currency. I am using Flask and PyMuPDF, and I don't know how to modify text in a PDF.
Can anyone help me with this problem?
I have heard that PyMuPDF or PyPDF2 can do this, but I didn't find any solution for replacing text.
Using the redaction facility of PyMuPDF is probably the adequate thing to do.
The approach:
Identify the location of the text to replace
Erase the text and replace it using redactions
Care must be taken to get hold of the original font, and to check whether the new text is longer or shorter than the original.
import fitz  # PyMuPDF

doc = fitz.open("myfile.pdf")
page = doc[0]  # page numbers are 0-based; pick the page you need
# suppose you want to replace all occurrences of some text
disliked = "delete this"
better = "better text"
hits = page.search_for(disliked)  # list of rectangles where to replace
for rect in hits:
    # add_redact_annot accepts more parameters, e.g. fill and text_color
    page.add_redact_annot(rect, better, fontname="helv", fontsize=11,
                          align=fitz.TEXT_ALIGN_CENTER)
page.apply_redactions(images=fitz.PDF_REDACT_IMAGE_NONE)  # don't touch images
doc.save("replaced.pdf", garbage=3, deflate=True)
This works well with short text and medium quality expectations.
With some more effort, the original font properties, color, font size, etc. can be identified to produce a close-to-perfect result.
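As a rough illustration of that extra effort: PyMuPDF's page.get_text("dict") returns a nested dict of blocks, lines and spans, where each span carries its font name, size and colour. The sketch below walks such a dict and collects the style of every span containing the search text. find_span_styles is a name invented for this example, and it assumes the text you are looking for sits inside a single span, which will not always be the case.

```python
def find_span_styles(text_dict, needle):
    """Return font name, size and RGB colour of every span containing `needle`.

    `text_dict` is expected to have the shape of PyMuPDF's
    page.get_text("dict") result: blocks -> lines -> spans.
    """
    styles = []
    for block in text_dict.get("blocks", []):
        # Image blocks carry no "lines" key, so .get() skips them.
        for line in block.get("lines", []):
            for span in line.get("spans", []):
                if needle in span.get("text", ""):
                    c = span.get("color", 0)  # packed sRGB integer
                    rgb = ((c >> 16) & 255, (c >> 8) & 255, c & 255)
                    styles.append({
                        "font": span.get("font"),
                        "size": span.get("size"),
                        # scaled to floats in 0..1, as PyMuPDF colour arguments expect
                        "color": tuple(v / 255 for v in rgb),
                    })
    return styles
```

You would call it as find_span_styles(page.get_text("dict"), disliked) and could then pass the size and colour on as fontsize= and text_color= when adding the redaction. Mapping the reported font name to a font that PyMuPDF can actually write with is a separate problem and is left out here.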

How to make bolding text in the text widget work optimally?

I want bolding text to work as intended. For example, suppose you make a selected word bold but mistakenly leave out the last letter, and then you select the whole word to correct the mistake: instead of bolding the whole word, the code changes its weight back to normal. Specifically, I'm talking about this way of doing it:
`
def bold_it():
    bold_font = font.Font(my_text, my_text.cget("font"))
    bold_font.configure(weight="bold")
    my_text.tag_configure("bold", font=bold_font)
    current_tags = my_text.tag_names("sel.first")
    if "bold" in current_tags:
        my_text.tag_remove("bold", "sel.first", "sel.last")
    else:
        my_text.tag_add("bold", "sel.first", "sel.last")
`
I am fully aware of what the problem is: it is in the current_tags variable. It will contain "bold" because tag_names() only looks at the tags at the first selected position. In turn, this makes the if statement remove the bold tag instead of applying it.
So my question is, how do you fix this, or optimize this?
Codemy.com did a video on this, and this question is based on this video, https://www.youtube.com/watch?v=X6zqePBPDVU.
I tried utilizing the tag_ranges() method so I could get two indexes instead of just where the selecting begins, but it did not work because tag_names() accepts only one argument.
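One possible direction, offered only as a sketch: rather than asking which tags sit at "sel.first", check whether some range of the tag covers the whole selection, using tag_ranges() together with the Text widget's compare() method. The function names below are invented for this example; the code assumes a tkinter.Text-like widget with an active selection (a real widget raises TclError when nothing is selected).

```python
def selection_fully_tagged(text_widget, tag):
    """Return True if a single range of `tag` covers the whole selection.

    `text_widget` is expected to behave like a tkinter.Text: it must offer
    tag_ranges(tag) and compare(index1, op, index2).
    """
    ranges = text_widget.tag_ranges(tag)  # flat tuple: start1, end1, start2, ...
    for start, end in zip(ranges[0::2], ranges[1::2]):
        covers_start = text_widget.compare(start, "<=", "sel.first")
        covers_end = text_widget.compare(end, ">=", "sel.last")
        if covers_start and covers_end:
            return True
    return False


def toggle_tag_on_selection(text_widget, tag):
    """Remove `tag` only when the entire selection has it; otherwise add it."""
    if selection_fully_tagged(text_widget, tag):
        text_widget.tag_remove(tag, "sel.first", "sel.last")
    else:
        text_widget.tag_add(tag, "sel.first", "sel.last")
```

In bold_it() this would replace the current_tags check with toggle_tag_on_selection(my_text, "bold"), so a partly bold selection becomes fully bold instead of losing its bold part.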

Python pptx - part of text in cell with different color

I am using the pptx module to generate a slide with a table. I am able to change the font in each cell, but I also need to change the font of a specific word in the text. For example, in "Generating random sentence as example", the word "random" should be bold.
I found a similar case at "text color in python-pptx module", but that one works with a text frame and not with a cell.
Any advice/suggestions are welcome!
Thanks
If you use cell where that example uses text_frame (or perhaps tf in that particular example) the rest of the code is the same. So to create "A sentence with a red word" where "red" appears in red color:
from docx.shared import RGBColor
# ---reuse existing default single paragraph in cell---
paragraph = cell.paragraphs[0]
# ---add distinct runs of text one after the other to
# --- form paragraph contents
paragraph.add_run("A sentence with a ")
# ---colorize the run with "red" in it---
red_run = paragraph.add_run("red")
red_run.font.color.rgb = RGBColor(255, 0, 0)
paragraph.add_run(" word.")
You can find additional details in the documentation here:
https://python-docx.readthedocs.io/en/latest/user/text.html#font-color
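Note that the snippet above uses the python-docx API (docx.shared, cell.paragraphs, add_run(text)). For a python-pptx table cell the same idea looks slightly different: the paragraphs hang off cell.text_frame, add_run() takes no arguments, and RGBColor comes from pptx.dml.color. A minimal sketch under those assumptions follows; add_styled_runs and demo are names made up for this example, and the slide layout index and table geometry in demo are arbitrary.

```python
def add_styled_runs(paragraph, parts):
    """Append one run per (text, style) pair to a python-pptx paragraph.

    `style` is a dict with optional keys "bold" (bool) and "rgb"
    (an RGBColor value); an empty dict leaves the run unstyled.
    """
    runs = []
    for text, style in parts:
        run = paragraph.add_run()  # python-pptx: add_run() takes no text
        run.text = text
        if "bold" in style:
            run.font.bold = style["bold"]
        if "rgb" in style:
            run.font.color.rgb = style["rgb"]
        runs.append(run)
    return runs


def demo(path="cell_runs.pptx"):
    """Build a one-cell table with a bold, red word (requires python-pptx)."""
    from pptx import Presentation
    from pptx.dml.color import RGBColor
    from pptx.util import Inches

    prs = Presentation()
    slide = prs.slides.add_slide(prs.slide_layouts[6])  # blank layout in the default template
    frame = slide.shapes.add_table(1, 1, Inches(1), Inches(1), Inches(6), Inches(1))
    paragraph = frame.table.cell(0, 0).text_frame.paragraphs[0]
    add_styled_runs(paragraph, [
        ("Generating ", {}),
        ("random", {"bold": True, "rgb": RGBColor(255, 0, 0)}),
        (" sentence as example", {}),
    ])
    prs.save(path)
```

Calling demo() writes a small presentation you can open to check the result.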

Python PPTX Slide Layout Import

I apologize, I have been looking for a solution but can't find enough documentation to figure it out. I am trying to import a default slide layout required for school, it has a special background and a Title Block and a Subtitle Block. I assumed when I import this, python-pptx would just automatically create placeholders 0 and 1 for those two text blocks but when I try and edit the placeholders, I get an attribute error:
AttributeError: 'Presentation' object has no attribute 'placeholders'
My code is as follows:
from pptx import Presentation
prs = Presentation('SeniorDesignTitleSlide.pptx')
Presentation_Title = prs.placeholders[0]
Presentation_Subtitle = prs.placeholders[1]
Presentation_Title.text = 'This Is a Test'
Presentation_Subtitle.text = 'Is This Working?'
prs.save('SlideLayoutImportTest.pptx')
Edit[0]: I do realize I am just opening that particular presentation, but how do I access and edit the single slide that’s in it ?
Edit[1]: I’ve found a few posts from 2015 about python-pptx expanding on this feature, but there’s no further information that it actually occurred.
How does python-pptx assign placeholders for imported slide layouts? Or does it even do this? Does it need to be a .potx file?
Thank you in advance.
Placeholders belong to a slide object, not a presentation object. So the first thing is to get ahold of a slide.
A slide is created from a slide layout, which it essentially clones to get some starting shapes, including placeholders in many cases.
So the first step is to figure out which slide layout you want. The easiest way to do this is to open the "starting" presentation (sometimes called a "template" presentation) and inspect its slide master and layouts using the View > Master > Slide Master... menu option.
Find the one you want, count down to it from the first layout, starting at 0, and that gives you the index of that slide layout.
Then your code looks something like this:
from pptx import Presentation
prs = Presentation('SeniorDesignTitleSlide.pptx')
slide_layout = prs.slide_layouts[0] # assuming you want the first one
slide = prs.slides.add_slide(slide_layout)
Presentation_Title = slide.placeholders[0]
Presentation_Subtitle = slide.placeholders[1]
Presentation_Title.text = 'This Is a Test'
Presentation_Subtitle.text = 'Is This Working?'
prs.save('SlideLayoutImportTest.pptx')
The placeholders collection behaves like a dict as far as indexed access goes, so the 0 and 1 used as indices above are unlikely to match exactly in your case (although the 0 will probably work; the title is always 0).
This page of the documentation explains how to discover what indices your template has available: http://python-pptx.readthedocs.io/en/latest/user/placeholders-using.html
The page before that one has more on placeholder concepts:
http://python-pptx.readthedocs.io/en/latest/user/placeholders-understanding.html
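To see which indices your own file actually offers, something along these lines may help. It is only a sketch: describe_placeholders and demo are names invented here, while placeholder_format.idx, placeholder_format.type and name are the attributes python-pptx exposes on placeholder shapes. The demo function also looks at prs.slides[0], which is how you would reach the single slide already present in the file.

```python
def describe_placeholders(slide):
    """Return (idx, type, name) for each placeholder on a slide or slide layout."""
    return [
        (ph.placeholder_format.idx, ph.placeholder_format.type, ph.name)
        for ph in slide.placeholders
    ]


def demo(path):
    """Print the placeholders of every layout and of the first slide, if any."""
    from pptx import Presentation  # requires python-pptx

    prs = Presentation(path)
    for i, layout in enumerate(prs.slide_layouts):
        print(i, layout.name, describe_placeholders(layout))
    if len(prs.slides) > 0:
        # An existing slide is reached by position, not via prs.placeholders.
        print("first slide:", describe_placeholders(prs.slides[0]))
```

The idx values printed are the keys to use in slide.placeholders[idx].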

I can't change the style of text in Word documents with Python-docx

I created a word document which contains the text
Hello. You owe me ${debt}. Please pay me back soon.
in Times New Roman size 12. The file name is debtTemplate.docx. I would like to replace {debt} with an actual number (1.20) using python-docx. I tried the following code:
from docx import Document
document = Document("debtTemplate.docx")
paragraphs = document.paragraphs
debt = "1.20"
paragraph = paragraphs[0]
text = paragraph.text
newText = text.format(debt=debt)
paragraph.clear()
paragraph.add_run(newText)
document.save("debt.docx")
This results in a new document with the desired text, but in Calibri font size 11. I would like the font to be like the original: Times New Roman size 12.
I know that you can pass a style argument to paragraph.add_run(), so I tried that, but nothing worked. E.g. paragraph.add_run(newText, style="Strong") didn't change anything.
Does anyone know what I can do?
EDIT: here's a modified version of my code that I had hoped would work but didn't.
from docx import Document
document = Document("debtTemplate.docx")
document.save("debt.docx")
paragraphs = document.paragraphs
debt = "1.20"
paragraph = paragraphs[0]
style = paragraph.style
text = paragraph.text
newText = text.format(debt=debt)
paragraph.clear()
paragraph.add_run(newText,style)
document.save("debt.docx")
This page in the docs should help you understand why the style is not having an effect. It's a pretty easy fix: http://python-docx.readthedocs.org/en/latest/user/styles.html
I like a couple other things about what you've found though:
Using the str.format() method to do placeholder replacement is a nice, easy way to do lightweight text replacement. I'll have to add that to the documentation as an approach to simple custom document generation.
In the XML for a paragraph, there is an optional element called <w:defRPr> which Word uses to indicate the default formatting for any new text added to the paragraph, like if you started typing after placing your insertion point at the end of the paragraph. Right now, python-docx ignores that element. That's why you're getting the default Calibri 11 instead of the Times New Roman 12 you started with. But a useful feature might be to use that element, if present, to assign run properties to any new runs added at the end of the paragraph. If you want to add that as a feature request to the GitHub tracker we'll take a look at getting it implemented.
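In the meantime, a workaround that may be enough for a template like this one is to reuse an existing run instead of clearing the paragraph and adding a fresh one, since character formatting set on a run stays with that run. A minimal sketch, assuming the font was applied directly to the paragraph's first run; fill_paragraph is a name made up for this example and relies only on paragraph.runs and run.text:

```python
def fill_paragraph(paragraph, **values):
    """Format `{name}` fields in a paragraph while keeping the first run's font.

    Expects python-docx style objects: `paragraph.runs`, each with `.text`.
    Returns True if the paragraph text changed.
    """
    runs = list(paragraph.runs)
    if not runs:
        return False
    # Join first: Word may have split "{debt}" across several runs.
    original = "".join(run.text for run in runs)
    filled = original.format(**values)
    if filled == original:
        return False
    # Reuse the first run so its character formatting is preserved, then
    # blank the others rather than clearing the paragraph.
    runs[0].text = filled
    for run in runs[1:]:
        run.text = ""
    return True
```

In the script above this would replace the clear()/add_run() pair with fill_paragraph(paragraph, debt=debt). As with any str.format() call, a field with no matching keyword raises KeyError, and stray braces in the text raise ValueError.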
