Make a PowerPoint presentation using Python? - python

So I have a collection of close to 90 photographs along with a caption and the date stored in a text file. The images are of variable sizes and I would like to automate the procedure of converting this data into a PowerPoint presentation, with one picture on one slide along with its date and caption as the Title. Any reliable methods present?

Check out the python-pptx library. Its useful for creating and updating PowerPoint .pptx files.
Also for some quick examples in python-pptx with screenshots, you can check this link.

Another approach is to automate powerpoint through win32com. I have succesfully done this to augenerate presentations with lots of images following the instructions at http://www.s-anand.net/blog/automating-powerpoint-with-python/ .

Related

Removing formatted images from a word document using Python

I have a word document from a colleague who gave me a .docx Microsoft Word file with 90 images on it that need to be extracted so they can be turned into flashcards. I tried using the Python module "docx2txt" which worked ok, but only extracted 34 images. Upon further inspection, I found that it was because when my coworker made the original file, he took screenshots of PowerPoint slides that he had made with about 4-6 of the images on one slide. Then, he would put them in Word and use the built in Word trimming tool to copy the picture several times and trim down to each individual picture he needed in a particular line of the document. Docx2txt copied the pictures files to my designated directly perfectly, but did not keep the formatting. Any picture file he had inserted and "trimmed down" to size, was copied as the full image. Does anyone know of a way to keep the formatting so I don't have to go through and manually copy 90 pictures one by one? Perhaps converting to a .pdf file and using a pdf related module or something? Or might be there some way of using another Python library which will keep the picture formatting? Thanks for any help you can provide! I'm somewhat of a beginner with Python, but love it when I can get it to automate stuff... even if it ends up taking longer to figure out how to do it than just boring myself to death saving the photos manually, lol.
https://support.microsoft.com/en-us/topic/reduce-the-file-size-of-a-picture-in-microsoft-office-8db7211c-d958-457c-babd-194109eb9535
Important: Cropped parts of the picture are not removed from the file, and can potentially be seen by others; including search engines if the cropped image is posted online. Only the Office desktop apps have the ability to remove cropped areas from the underlying image file.
Follow the relevant section for Desktop Office (Windows or Mac) note from above it CANNOT work on Web 365.
go to "Other kinds of cropping"
Important: If you delete cropped areas and later change your mind, you can click the Undo Button Image button to restore them. Deletions can [ONLY] be undone until the file is saved.
So make a backup copy of the file
Select the picture or pictures (If you want all selected that should be easy with CTRL + A to highlight everything)
Then follow the instructions
Picture Tools > Format, and in the Adjust group, click Compress Pictures
Be sure that the Delete cropped areas of pictures check box is selected
DEselect the Apply only to this picture check box.
Double check a few manually to verify all is well then save a copy.

How to use python-pptx to replace specific categories in the chart

Currently, I am doing repeated work like find and replace some content in PowerPoint. And I find some codes to F&R the text in the text-box or table. However, I cannot F&R those words in the chart that I need to edit in excel. Is there any possible solution? Instead of extracting the data to create a new chart.
Charts in Powerpoint are a little tricky, as they are stored in two places.
You'll need to first edit the "real" chart data by loading the excel behind (in C# this is the EmbeddedData section), and then update the cached values that Powerpoint has. These are descendants of the chartspace. If you don't do this, then you have to click in and out of the slide deck to trick Powerpoint into updating. I'm afraid I don't have python code to do that as I did the same in C#/Dotnet for slidetemplater.com
That does however have a python API too (https://pypi.org/project/slidetemplater/)

Is it possible for a PDF data parser to read PowerPoint PDFs?

I am currently developing a proprietary PDF parser that can read multiple types of documents with various types of data. Before starting, I was thinking about if reading PowerPoint slides was possible. My employer uses presentation guidelines that requires imagery and background designs - is it possible to build a parser that can read the data from these PowerPoint PDFs without the slide decor getting in the way?
So the workflow would basically be this:
At the end of a project, the project report is delivered in the form of a presentation.
The presentation would be converted to PDF.
The PDF would be submitted to my application.
The application would read the slides and create a data-focused report for quick review.
The goal of the application is to cut down on the amount of reading that needs to be done by a significant amount as some of these presentation reports can be many pages long with not enough time in the day.
Parsing PDFs into structured data is always tricky, as the format is geared towards precise printing, rather than ease of editing or data extraction.
Basically, a PDF contains information like "there's a label with such text at such (x,y) position on a certain page", or things like that.
Basically, you will very likely need some heuristics in order to turn that into structured data.
It will basically be a form of scraping.
Search on your favorite search engine for PDF scraping, or something like that, and it would be a good start.
Also, you may want to look at those similar posts:
PDF Data and Table Scraping to Excel
How to extract table as text from the PDF using Python?
A PowerPoint PDF isn't a type of PDF.
There isn't going to be anything natively in the PDF that identifies elements on the page as being 'slide' graphics the originated from a PowerPoint file for example.
You could try building an algorithm that makes decision about content to drop from the created PDF but that would be tricky and seems like the wrong approach to me.
A better approach would be to "Export" the PPT to text first, e.g. in Microsoft PowerPoint Export it to a RTF file so you get all of the text out and use that directly or then convert that to PDF.

How to Embed Excel Files and Link Data into PowerPoint by python in specific place of slide

There is an excel file with different sheets which is included by different charts and data. I want to make a powerpoint for my daily presentation automatically by python(PPTx lib in python).
my problem is I have to copy the charts which exist in excel and past in my powerpoint which is created by python (pptx). I want to know is there any possibility to export charts from excel file to powerpoint by python?
There is no direct API support for this in python-pptx. However, there are other approaches that might work for you.
Perhaps the simplest would be to use a package like openpyxl to read the data from the spreadsheet and recreate the chart using python-pptx, based on the data read from Excel.
If you wanted to copy the chart exactly, this is also possible but would require detailed knowledge of the Open Packaging Convention (OPC) file format and XML schemas to accomplish. Essentially, you would copy the chart-part for the chart into the PowerPoint package (zip file) and connect it to a graphic-frame shape on a slide. You'd also need to embed the Excel worksheet into the PowerPoint, perhaps repeatedly (once for each chart) and make any format-specific adjustments (Excel and PowerPoint handle charts slightly differently in certain details).
This latter approach would be a big job, so I would recommend trying the simpler approach first and see if that will get it done for you.

Django reportlab - Print current page (admin console) to PDF

What I am trying to accomplish is to allow users to view information in the django admin console and allow them to save and print out a PDF of the information infront of them based upon how ever they sorted/filtered the data.
I have seen a lot of documentation on report lab but mostly for just drawing lines and what not. How can I simply output the admin results to a PDF? If that is even possible. I am open to other suggestions if report lab is not the ideal way to get this done.
Thanks in advance.
Better use some kind of html2pdf because you already have html there.
If html2pdf doesn't do what you need, you can do everything you want to do with ReportLab. Have a look at the ReportLab manual, in particular the parts on Platypus. This is a part of the ReportLab library that allows you to build PDFs out of objects representing page parts (paragraphs, tables, frames, layouts, etc.).

Categories

Resources