Edit Header and Footer using python-pptx

Can I edit the Header & Footer of an existing Presentation using python-pptx? The values I want to set are as shown in the attached image. Thanks.

I asked this a long time ago, but I can't remember where and couldn't find it on SO. Scanny answered the question, so I'm relaying his answer here (probably poorly).
By default, python-pptx doesn't include footer or page number placeholders when listing slide placeholders. It's common practice to recommend inserting text boxes instead when these are needed, but that's not useful when dealing with multiple templates or layouts.
The first thing you'll need to add somewhere is a patch so that the placeholders are included:
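from pptx.slide import SlideLayout

def footer_patch(self):
    # Yield every layout placeholder, footer and slide number included.
    for ph in self.placeholders:
        yield ph

SlideLayout.iter_cloneable_placeholders = footer_patch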
You should then be able to grab the footer from the placeholders with simple means:
footer_copy = "Hi, it's me, the footer"
for shape in slide.placeholders:
    if "FOOTER" in str(shape.placeholder_format.type):
        footer = slide.placeholders[shape.placeholder_format.idx]
        footer_text_frame = footer.text_frame
        # insert_text() is a helper not shown here; assigning .text is the simplest equivalent.
        footer_text_frame.text = footer_copy
The above is old code, and probably a poor example of how to do this, but I hope it gives a starting point. A similar approach should work for the other values you listed there. Some values, like the page number, may require additional XML editing, which you can read about in another post where Scanny was my savior.
Please note, if you're using placeholders for other tasks, adding the Footer placeholder to the list of placeholders may have unforeseen consequences.
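For an existing presentation with many slides, the same lookup can be applied in a loop. The following is only a sketch under assumptions: `set_footer_everywhere` is a hypothetical helper name, it matches placeholders by the string form of their type as in the snippet above, and it only touches slides that already carry a footer placeholder (for example because the patch was in place when the slide was created, or the footer was enabled in PowerPoint).

```python
def set_footer_everywhere(slides, text):
    """Write `text` into every footer placeholder found on `slides`.

    Returns the 1-based numbers of slides that had no footer placeholder,
    so the caller can decide how to handle them.
    `slides` is expected to behave like python-pptx's `prs.slides`.
    """
    missing = []
    for number, slide in enumerate(slides, start=1):
        footers = [
            shape for shape in slide.placeholders
            # Same string-based type check as the original snippet.
            if "FOOTER" in str(shape.placeholder_format.type)
        ]
        if not footers:
            missing.append(number)
        for footer in footers:
            footer.text_frame.text = text
    return missing
```

With python-pptx this would be called roughly as `set_footer_everywhere(prs.slides, "My footer")` on a loaded `Presentation`, followed by `prs.save(...)`.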

Related

Python Report Writing with flexible templates for report body rows

I need to create reports from Python. The existing reporting system for my Volunteer Fire Company is being deprecated, and it had some idiosyncrasies anyway. Everything I'm looking at that seems feasible uses a template for the report body. Before I get too far down a dead-end path trying different methods, I'd like to know if anyone knows of anything out there that can do something like this. Specifically, I need conditional formatting of a row in the body. Below is what we get currently; today I bold the rows manually based on our criterion of 33% or better. In my code I'll obviously know if the row deserves to be bolded, but everything I'm looking at that uses a template wouldn't allow this. The end result will be a PDF, but if I have to go through Excel, Word, HTML, or whatever to get there, I'll check it out. Sorry for the non-specific question, but I could potentially waste a lot more time looking for something when somebody may already know what to use. Thanks.

Creating python PPTX Internal Hyperlinks prior to creating slides

On the topic of Python PPTX Internal Hyperlink
Is there a way to create Hyperlinks prior to creating the slides that they will be linked to?
E.g. create a table of contents slide with hyperlinks to the slides that will be added later. The number of slides won't always be the same, as it will be edited by the user.

Interesting question. Here's what I found working with a currently selected shape on a slide. I expect you'd be applying hyperlinks to a textrange instead, but the general idea's the same.
A hyperlink to another slide contains no .Address and the .SubAddress looks like:
SlideID,SlideIndex,SlideTitle
SlideTitle can be blank, so I tested with this to link to a third slide that wasn't there:
ActiveWindow.Selection.ShapeRange(1).ActionSettings(1).Hyperlink.SubAddress = ",3,"
No errors, but the link doesn't work either.
This DOES work, however:
ActiveWindow.Selection.ShapeRange(1).ActionSettings(1).Hyperlink.SubAddress = "258,3,"
I run this on a selected shape on the first slide then later add a third slide and the link jumps to it.
The trick then becomes: how will you know the SlideID of a slide you haven't added yet? In a new presentation, PPT gives the first slide an ID of 256 and increments the ID for each new slide you add, but it'll be quite tricky to keep track of SlideIDs and SlideIndexes if you're adding new slides at random places in a presentation that may have had slides added and deleted beforehand.
Personally, I think I'd add blank slides ahead of time, link to them, then add whatever content's necessary.
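To make the SubAddress format concrete, here is a small Python sketch that only builds the string; it does not talk to PowerPoint. Both helper names are hypothetical, and the ID guess rests entirely on the observation above that a new presentation starts at 256 and increments by one, so it stops being valid once slides have been deleted or inserted out of order.

```python
FIRST_SLIDE_ID = 256  # ID observed above for the first slide of a new presentation


def build_subaddress(slide_id, slide_index, title=""):
    """Return a 'SlideID,SlideIndex,SlideTitle' string; the title may be blank."""
    return f"{slide_id},{slide_index},{title}"


def guess_slide_id(slide_index):
    """Guess the ID of the slide at 1-based `slide_index` in a fresh presentation.

    Only meaningful while slides have been appended in order and none deleted.
    """
    return FIRST_SLIDE_ID + slide_index - 1


print(build_subaddress(guess_slide_id(3), 3))  # prints "258,3,"
```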

I doubt it. Certainly the straightforward way won't work, but it's hard to say never against human ingenuity.
Anyway, this mechanism is built on PowerPoint's hyperlink behavior: there are special "action" verbs that mean "jump internally" rather than "jump to web address". The verb in this case would be NAMED_SLIDE (although NEXT_SLIDE, LAST_SLIDE, etc. are also available). The "name" of the slide in the XML is its relationship id, basically a keyword like "rId7" that maps one PPTX package part (e.g. a slide) to another.
Since there isn't a slide yet, there can be no relationship. Having such a "dangling" relationship would very likely trigger a repair error, but I can't see it working out in any case. Creating a new slide will create a new relationship without regard to relationships that are already there, so best case is your "pre-creation" relationship just gets ignored.
I think you're going to need a different strategy.
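One such strategy is to create the target slides first, even if they are still empty, and wire up the table of contents afterwards. The sketch below assumes objects that behave like python-pptx shapes and slides, and it relies on assigning to `click_action.target_slide`, which recent python-pptx versions are understood to support on shapes (treat that as an assumption and check your version). `link_toc_entries` is a hypothetical helper name.

```python
def link_toc_entries(toc_shapes, target_slides):
    """Point each table-of-contents shape at its slide, pairing them in order.

    Both arguments are sequences of equal length. The slides must already
    exist, since a link needs a real slide to form a relationship with.
    """
    if len(toc_shapes) != len(target_slides):
        raise ValueError("need exactly one target slide per TOC shape")
    for shape, slide in zip(toc_shapes, target_slides):
        # Assumed python-pptx API: assigning a slide creates the internal jump.
        shape.click_action.target_slide = slide
```

Because the links are created only after every slide exists, there is never a dangling relationship for PowerPoint to repair.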

Extracting headings' text from word doc

I am trying to extract the text of headings (of any level) in an MS Word document (.docx file). Currently I am trying to solve this using python-docx, but after reading its documentation I still can't figure out whether it is even feasible (maybe I am mistaken).
I tried to look for the solutions online but found nothing specific to my task. It would be great if someone could guide me here.

The fundamental challenge is identifying heading paragraphs. There's nothing stopping an author from formatting a "regular" paragraph to look like (and serve as) a heading as far as a reader is concerned.
However, it's not uncommon for authors to reliably use styles to create headings, because doing so makes it possible to automatically compile those headings into a table of contents.
In that case, you can just iterate over the paragraphs, and pick out those with one of the heading styles.
def iter_headings(paragraphs):
    for paragraph in paragraphs:
        if paragraph.style.name.startswith('Heading'):
            yield paragraph

for heading in iter_headings(document.paragraphs):
    print(heading.text)
Heading levels may be parsed from the full style name if they've kept the defaults (like 'Heading 1', 'Heading 2', ...).
This may need to be adjusted if the author has renamed the heading styles.
There are more sophisticated approaches which are more reliable (as far as being style-name independent), but those don't have API support, so I expect you'd need to dig into the internal code and interact with some of the style XML directly.
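As a sketch of the level parsing mentioned above, the variant below yields the heading level alongside the text. It assumes the default English style names ('Heading 1', 'Heading 2', ...); renamed or localized styles, and the 'Title' style, are skipped. The function name and the regular expression are illustrative, not part of python-docx.

```python
import re

# Matches only the default built-in names such as "Heading 1".
HEADING_STYLE = re.compile(r"^Heading (\d+)$")


def iter_headings_with_level(paragraphs):
    """Yield (level, text) for paragraphs using a default 'Heading N' style."""
    for paragraph in paragraphs:
        match = HEADING_STYLE.match(paragraph.style.name)
        if match:
            yield int(match.group(1)), paragraph.text
```

It would be used the same way as `iter_headings`, for example by looping over `iter_headings_with_level(document.paragraphs)` and indenting each heading by its level.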

How to iterate over everything in a python-docx document?

I am using python-docx to convert a Word docx to a custom HTML equivalent. The document that I need to convert has images and tables, but I haven't been able to figure out how to access the images and the tables within a given run. Here is what I am thinking...
for para in doc.paragraphs:
    for run in para.runs:
        # How to tell if this run has images or tables?
...but I don't see anything on the Run that has info on the InlineShape or Table. Do I have to fall back to the XML directly or is there a better, cleaner way to iterate over everything in the document?
Thanks!

There are actually two problems to solve for what you're trying to do. The first is iterating over all the block-level elements in the document, in document order. The second is iterating over all the inline elements within each block element, in the order they appear.
python-docx doesn't yet have the features you would need to do this directly. However, for the first problem there is some example code here that will likely work for you:
https://github.com/python-openxml/python-docx/issues/40
There is no exact counterpart I know of to deal with inline items, but I expect you could get pretty far with paragraph.runs. All inline content will be within a paragraph. If you got most of the way there and were just hung up on getting pictures or something, you could go down to the lxml level and decode some of the XML to get what you needed. If you get that far along and are still keen, post a feature request on the GitHub issues list for something like "feature: Paragraph.iter_inline_items()" and I can probably provide you with some similar code to get you what you need.
This requirement comes up from time to time so we'll definitely want to add it at some point.
Note that block-level items (paragraphs and tables primarily) can appear recursively, and a general solution will need to account for that. In particular, a paragraph can (and in fact at least one always must) appear in a table cell. A table can also appear in a table cell. So theoretically it can get pretty deep. A recursive function/method is the right approach for getting to all of those.
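The recursion described above can be sketched as a short generator. This is not the code from the linked issue: it assumes a python-docx version (1.0 or later, as far as can be told) in which the document and table cells offer `iter_inner_content()`, and it treats any block that has a `rows` attribute as a table. `walk_blocks` is a hypothetical name.

```python
def walk_blocks(container, depth=0):
    """Yield (depth, block) for every paragraph and table, in document order.

    `container` is a document or a table cell, i.e. anything providing
    iter_inner_content(). Tables nested in cells are visited recursively.
    """
    for block in container.iter_inner_content():
        yield depth, block
        if hasattr(block, "rows"):  # assumption: only tables have rows
            for row in block.rows:
                for cell in row.cells:
                    yield from walk_blocks(cell, depth + 1)
```

Note that a merged cell may be reported more than once by `row.cells`, so a real converter would probably need to de-duplicate cells before recursing.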

Assuming doc is of type Document, then what you want to do is have 3 separate iterations:
One for the paragraphs, as you have in your code
One for the tables, via doc.tables
One for the shapes, via doc.inline_shapes
The reason your code wasn't working is that paragraphs don't hold references to the tables or shapes in the document; those are stored on the Document object.
Here is the documentation for more info: python-docx

Possible to Insert page in word document with python-docx?

I just read through the documentation on python-docx.
They mention several times that added content is created at the end of the document, but I didn't notice any way to alter this functionality.
Does anyone know how to add a new page to a pre-existing document, but make it page 1?
Thanks!

The short answer is the library doesn't support that just yet, although those features are high on the backlog so will be among the next to be implemented.
To get it done in the meantime you'll need to go down to the XML level with a "workaround" function. If you want to add this use case on this issue on GitHub I'll put together some workaround code you can use.
https://github.com/python-openxml/python-docx/issues/27
