Remove Text Shapes from Powerpoint slides where the shapes are empty - python

My md2pptx code creates slides using python-pptx. It sometimes leaves shapes on a slide empty because it doesn't need them.
In a PowerPoint slide show these empty shapes don't appear. In LibreOffice they seem to.
I'm pretty adept at manipulating the underlying XML for a slide.
Is it feasible to remove empty shapes - perhaps by deleting their XML elements? Or does python-pptx itself offer the capability to delete a shape? (I think not.)
Assume I can navigate to the shapes and figure out which ones are empty.
Note: I'm not aiming to delete whole slides, just empty shapes.

Deleting a "stand-alone" shape is reliable and pretty easy, something like:
sp = shape._element
sp.getparent().remove(sp)
The problem arises when the shape has a relationship to some other "package part". For example, a Picture shape has a relationship (identified by an rId) to an image part (file) in the package (the .pptx zip archive). In those cases, if you don't also properly deal with the relationship, you may get a "repair error" when you try to open the resulting file in PowerPoint.
A "regular" shape (so-called "auto-shape") such as a rectangle, text-box, line, or other geometric shape has no relationships and can be reliably deleted with this method. A table is probably safe too, but not a chart. A group shape is probably okay too, but only if it does not contain a picture or a chart. Both a picture and a chart may be a problem if you don't also remove their relationship.
Whether a repair error is triggered may differ between PowerPoint and LibreOffice (or another PPTX client). You can try just deleting a picture or chart shape without dealing with the relationship and see what happens, but to be reliable you'd need to test it with all the possible clients.
Removing a relationship is a little more involved and is either covered in another python-pptx question here on SO or would make a good new question.

Related

how to delete a text layer using fitz?

This is a very straightforward issue. I added an invisible text layer using page.insert_text().
After saving the modified pdf, I can use page.get_text() to retrieve the created text layer.
I would like to be able to eliminate that layer, but couldn't find a function to do it.
The solution I've come up with is to render the pages as images and create a new PDF from them. But that seems like a very inefficient solution.
I would like to solve this without using any library other than fitz, and it feels like there should be a solution within fitz, considering that page.get_text() can access the exact information I'm trying to eliminate.
If you are certain of the whereabouts of your text on the page (and I understood that you are), simply use PDF redactions:
page.add_redact_annot(rect1) # remove text inside this rectangle
page.add_redact_annot(rect2)
...
page.apply_redactions(images=fitz.PDF_REDACT_IMAGE_NONE)
# the above removes everything intersecting any of the rects,
# but leaves images untouched
Obviously you can remove all text on the page by taking page.rect as the redaction rectangle.

Python pptx insert a picture in a table cell

Is there a way to insert a picture inside a table cell using python-pptx?
I'm also thinking of finding the coordinates of the cell and adjusting the numbers when inserting the picture, but I cannot find anything on how to do that.
Thank you.
No, unfortunately not. Note that this is not a limitation of python-pptx, it is a limitation of PowerPoint in general. Only text can be placed in a table cell.
There is nothing stopping you from placing a picture shape above (in z-order) a table cell, which will look like the picture is inside. This is a common approach but unfortunately is somewhat brittle. In particular, the row height is not automatically adjusted to "fit" the picture and changes in the content of cells in prior rows can cause lower rows to "move down" and no longer be aligned with the picture. So this approach has some drawbacks.
Another possible approach is to use a picture as the background for a cell (like you might use colored shading or a texture). There is no API support for this in python-pptx and it's not without its own problems, but might be an approach worth considering.

Get stats from specific Connected Components with opencv (with mask)

I am new to opencv (python) and don't really know how to tackle my new task.
I have several images (binarized) and masks for them. I want to extract all Connected Components of the original image that are masked and see their shapes (bounding boxes). I'm mainly interested in their length to height ratio. I'd also like to get a mean (or better: median?) for those, because I'd like to analyse them.
I played around with cv2.connectedComponentsWithStats(), but I can't seem to get the information I want with it. The documentation sadly also didn't help me.
So: Is there a way to get all desired CCs in, e.g., an array, where their location and shape are listed? That would be tremendously helpful!
(Also I have quite a few of those images and would like to get a good average of all of them. Is there a way to do this for a whole folder full of images?)

Matplotlib export text as line elements in Python

I have to build some rudimentary CAD Tool in Python based on matplotlib for handling the display of the content.
After all the parts have been put together, the whole layout shall be exported as line elements (basically just tuples of the start / end coordinates of the lines, e.g. [x1,y1,x2,y2]) and just points.
So far I have all the basic geometric stuff implemented, but I cannot figure out how to implement text properly. To be able to use different fonts etc. I want to use the text capabilities of matplotlib, but I can't find a way to export the text properly from matplotlib.
Is there a way to get a vectorized output right away? Or at least an array of the plotted text?
After some days of struggling, I found a way to get the outline of the text: https://github.com/rougier/freetype-py , more precisely the example https://github.com/rougier/freetype-py/blob/master/examples/glyph-vector.py
If you just want to get the outline as a vector array, you can delete everything after line 78 and do this:
path = Path(VERTS, CODES)
outline = path.to_polygons()
This will give you an array of polygons, and each polygon is again an array of points (x,y) of the polygon.
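For comparison, a sketch of a different route that stays inside matplotlib instead of using freetype-py: `matplotlib.textpath.TextPath` builds a `Path` for a string, and `to_polygons()` on it gives the same kind of polygon list. The function name `text_to_segments` is made up, and the default font is whatever matplotlib picks, so this has not been checked against the freetype output.

```python
from matplotlib.textpath import TextPath


def text_to_segments(text, size=12.0):
    """Return the outline of `text` as a list of [x1, y1, x2, y2] lines.

    Hypothetical helper. Curves in the glyph outlines are approximated
    by straight pieces when the path is converted to polygons.
    """
    path = TextPath((0, 0), text, size=size)
    segments = []
    for poly in path.to_polygons():
        # Consecutive polygon vertices form one line element each.
        for (x1, y1), (x2, y2) in zip(poly[:-1], poly[1:]):
            segments.append([float(x1), float(y1), float(x2), float(y2)])
    return segments
```

A specific font can be selected by passing `prop=matplotlib.font_manager.FontProperties(...)` to `TextPath`.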
Though it was some trouble to get freetype running on windows and I still have not figured out how to make it portable, I think I will stick with this solution, because it is fast, reliable and allows one to use all the nice system fonts.

Easiest way to parse simple SVG into Python, applying all transforms

I'd like to draw a few simple objects in Inkscape (lines, circles, rectangles), group them, move them around, scale, rotate, make copies, etc. and then I need something which will let me load the SVG into Python and iterate over all these shapes, getting the relevant attributes (for circles: centre and radius; for lines: the two end-points; etc.).
I've installed and tried svg-utils, pysvg, svgfig and read about inkex, and while all of these seem to allow me to iterate through the XML structure, with varying degrees of awkwardness, as far as I can see none of them apply the transforms to the elements. So, if I draw a line from (0,0) to (1,1), group it, move it to (100,100), then its XML tag is still going to say (0,0) to (1,1), but its real position is computed by applying the transform in its containing group, to these end-points.
I don't want to write all this transform-application code myself, because that would be re-inventing the bicycle. But I need help finding a convenient existing bicycle...
One likely useful route is to find an exporter into a simple format, which would already have had to solve all these problems. Here is an example I found: http://en.wikipedia.org/wiki/SK1_%28program%29#Supported_formats
But which of the export formats listed there is likely to be the simplest?
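To illustrate what "applying the transform in its containing group" involves, here is a small standard-library sketch. It is not an existing package, all names are made up, and it only handles `<line>` elements with `translate`, `scale`, `rotate` and `matrix` transforms. It also assumes arguments are separated by commas or spaces, so this is the wheel itself in miniature rather than a replacement for a proper SVG library.

```python
import math
import re
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"
IDENTITY = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)  # SVG matrix(a, b, c, d, e, f)


def multiply(m1, m2):
    """Compose two affine matrices; m2 is applied to a point first."""
    a1, b1, c1, d1, e1, f1 = m1
    a2, b2, c2, d2, e2, f2 = m2
    return (a1 * a2 + c1 * b2, b1 * a2 + d1 * b2,
            a1 * c2 + c1 * d2, b1 * c2 + d1 * d2,
            a1 * e2 + c1 * f2 + e1, b1 * e2 + d1 * f2 + f1)


def parse_transform(text):
    """Turn a transform attribute such as 'translate(5,5) scale(2)'
    into a single matrix."""
    m = IDENTITY
    for name, args in re.findall(r"(\w+)\s*\(([^)]*)\)", text or ""):
        v = [float(x) for x in re.split(r"[\s,]+", args.strip()) if x]
        if name == "translate":
            t = (1.0, 0.0, 0.0, 1.0, v[0], v[1] if len(v) > 1 else 0.0)
        elif name == "scale":
            t = (v[0], 0.0, 0.0, v[1] if len(v) > 1 else v[0], 0.0, 0.0)
        elif name == "rotate":
            c, s = math.cos(math.radians(v[0])), math.sin(math.radians(v[0]))
            t = (c, s, -s, c, 0.0, 0.0)
            if len(v) == 3:  # rotate(angle, cx, cy): rotate about a centre
                t = multiply(multiply((1, 0, 0, 1, v[1], v[2]), t),
                             (1, 0, 0, 1, -v[1], -v[2]))
        elif name == "matrix":
            t = tuple(v)
        else:
            raise ValueError("unsupported transform: " + name)
        m = multiply(m, t)
    return m


def apply(m, x, y):
    """Map the point (x, y) through matrix m."""
    a, b, c, d, e, f = m
    return (a * x + c * y + e, b * x + d * y + f)


def iter_lines(element, parent=IDENTITY):
    """Yield (x1, y1, x2, y2) for every <line>, in root coordinates."""
    m = multiply(parent, parse_transform(element.get("transform")))
    if element.tag == SVG_NS + "line":
        p1 = apply(m, float(element.get("x1", 0)), float(element.get("y1", 0)))
        p2 = apply(m, float(element.get("x2", 0)), float(element.get("y2", 0)))
        yield p1 + p2
    for child in element:
        yield from iter_lines(child, m)


def lines_from_svg(svg_text):
    """Parse SVG source text and return all transformed lines."""
    return list(iter_lines(ET.fromstring(svg_text)))
```

With the example from the question, a line from (0,0) to (1,1) inside a group carrying `transform="translate(100,100)"` comes out as (100, 100, 101, 101). Circles and rectangles would need the same matrix applied to their own attributes, and a non-uniform scale or a rotation turns a circle or an axis-aligned rectangle into a shape that those attributes can no longer describe directly.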
