SVG to black-and-white - python

I would like to be able to convert SVG documents to black and white. My attempt is the following Makefile rule using sed:
%.bw.svg: %.svg
	sed '/stroke:none/!s/stroke:[^;\"]*/stroke:black/g' $< > $@
This works for lines etc. but not for fills. Basically, if the stroke is not invisible (none), I convert it to black. I would like to do the same for fills: if a fill is not white or invisible, convert it to black.
I wonder whether there is a better way to do this, perhaps using XSLT, but I have no experience with it. Can anyone help?

Two options that I would try:
1- Inkscape appears to be able to do it - Inkscape Convert
2- SVG supports a ColorProfile attribute on the SVG element that can reference an ICC Color Profile. I would try to reference a GrayScale color profile there and see what happens. Looks like there is one available here.

My first thought is that it can be dangerous to manipulate XML (in this case SVG) via sed etc. since it won't escape XML chars properly or respect character encodings.
Having said that, your dataset may be sufficiently constrained and limited such that this isn't a particular problem.
Considering XPath solutions (including XSLT) sounds good, since you'll be able to precisely identify the components you want to change. Some implementation of XQuery may be of use here.
A very different alternative is XMLStarlet, a command-line toolset for processing XML in scripts.
Finally, can you use a programmatic toolset to do this? Batik would be my first choice (in the Java world).

I think you need a full CSS parser to do this job for all of SVG; but for "SVG as generated by some particular vector editing application", XSLT containing string editing of the style attributes (as you're doing now, except that it will properly stick to the styles and avoid e.g. <text>) might be adequate.
It would be useful if you'd edit your question to explain how your strategy fails for fill colors.
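The style-attribute editing discussed in this thread can be sketched in Python with a real XML parser instead of sed. This is only a sketch under assumptions: the names svg_to_bw, convert_style and FILL_KEEP are made up for illustration, colors are assumed to live in style attributes or in fill/stroke presentation attributes, and only a few spellings of white are recognized. CSS inside <style> elements and url(#...) paint references are not handled properly.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # serialize without an ns0: prefix

# Assumption: these are the fill values that should be left untouched.
FILL_KEEP = {"none", "white", "#fff", "#ffffff"}
STROKE_KEEP = {"none"}


def to_black(value, keep):
    """Return the value unchanged if it is in `keep`, otherwise 'black'."""
    value = value.strip()
    return value if value.lower() in keep else "black"


def convert_style(style):
    """Rewrite the fill and stroke declarations of one style attribute."""
    parts = []
    for decl in style.split(";"):
        if ":" not in decl:
            continue  # skip empty or malformed declarations
        name, value = decl.split(":", 1)
        name = name.strip()
        if name == "stroke":
            value = to_black(value, STROKE_KEEP)
        elif name == "fill":
            value = to_black(value, FILL_KEEP)
        parts.append(f"{name}:{value.strip()}")
    return ";".join(parts)


def svg_to_bw(svg_text):
    """Turn every visible stroke and every non-white fill black."""
    root = ET.fromstring(svg_text)
    for el in root.iter():
        if "style" in el.attrib:
            el.set("style", convert_style(el.get("style")))
        for attr, keep in (("stroke", STROKE_KEEP), ("fill", FILL_KEEP)):
            if attr in el.attrib:
                el.set(attr, to_black(el.get(attr), keep))
    return ET.tostring(root, encoding="unicode")
```

Because the document is parsed and re-serialized, text content that merely contains "stroke:" is left alone, which the sed version cannot guarantee.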

First of all: Don't try doing this with sed. Editing XML is a little more complex than that.
You can use an SVG filter effect on the image. The feColorMatrix filter primitive can desaturate an image (note that this gives grayscale rather than pure black and white).
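A minimal sketch of the filter approach, in Python: it wraps the existing content in a group and applies a feColorMatrix filter with type="saturate" and values="0". The function name add_desaturate_filter and the filter id are assumptions chosen for illustration, and the input is assumed to use the standard SVG namespace.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # serialize without an ns0: prefix


def add_desaturate_filter(svg_text, filter_id="desaturate"):
    """Wrap all top-level content in a <g> that uses a desaturating filter."""
    root = ET.fromstring(svg_text)

    # Move every existing child of <svg> into a new filtered group.
    group = ET.Element(f"{{{SVG_NS}}}g", {"filter": f"url(#{filter_id})"})
    for child in list(root):
        root.remove(child)
        group.append(child)

    # Define the filter: saturation 0 removes all color information.
    defs = ET.SubElement(root, f"{{{SVG_NS}}}defs")
    flt = ET.SubElement(defs, f"{{{SVG_NS}}}filter", {"id": filter_id})
    ET.SubElement(
        flt, f"{{{SVG_NS}}}feColorMatrix", {"type": "saturate", "values": "0"}
    )

    root.append(group)
    return ET.tostring(root, encoding="unicode")
```

The original fill and stroke values stay in the file; only the rendering changes, so the result depends on the viewer supporting SVG filters.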

Related

Find phoneme and duration

I was wondering if anyone knows of a python tool that finds phonemes from a text, as well as their duration.
In short, I want a forced alignment tool like aeneas, but I want the phonemes and their duration.
Thank you!
You didn't specify what kind of data you have, but I assume it's audio files with their corresponding orthographic transcriptions.
In that case, the Montreal Forced Aligner might be suitable (there is a link to the executable on that page).
It is based on Kaldi, so for a more robust and comprehensive solution, the kaldi-dnn-ali-gop repo provides more powerful options.

How are font variations written

I have been thinking for some time that variable fonts were simply combinations of multiple fonts, with values interpolated between them. However, I just read about a project, Prototypo (which is unfortunately discontinued), and discovered how they were storing their fonts as variables. See this screenshot from a promotional video from a few years ago:
And it seemed just so logical! Why not use a real language-like format, with variables and all. In the picture above, it (kinda) looks like python code.
And then I thought "It must have been thought through, let's look at how OpenType font variations are implemented."
And I looked on the web for the schema and the specification, but could not find it.
So the actual question(s):
How are variable fonts stored in otf files? Is it simply, as I thought before, multiple fonts and the other values are interpolated between them? Is there a variable language like the one above used to write the variable parts of the font (obviously)?
Where can I find the TTF specification for variable fonts? Is there any?
Is there a way to write a variable font in a regular text file (of course involving vector graphics of some sort, like: const d = 'M23.6,0c-3.4,0-6.3,2.7-7.6,5.6C14.7,2.7,11.8,0,8.4,0C3.8,0,0,3.8,0,8.4c0,9.4,9.5,11.9,16,21.2 c6.1-9.3,16-12.1,16-21.2C32,3.8,28.2,0,23.6,0z'; this one makes a heart)?
Thank you (that's what the heart is for :)

Get dynamic text with python and present it in LaTeX, format depending on content

I am trying to gather some graphics and text from different folders and present them in a comprehensive way. For this I use Python to copy them into one folder and generate a dynamic LaTeX presentation, where I plot the copied graphics and print the text. The problem I'm facing now is that I can derive the title for a slide dynamically from a text file, but if it's too long it will obviously wrap around. This dynamic title can be pretty long, so it might fill the whole slide. What I'd like to do is limit the space used by this text without losing its information. The inelegant solution I have is to count the characters and, if the count is over a certain threshold, use a smaller font. This solution is tedious and not optimal; I'd love to hear a better idea.
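For reference, the character-counting workaround described above could look roughly like the sketch below. Everything specific here is an assumption: the thresholds are arbitrary, the function names are hypothetical, and the output assumes a beamer-style \frametitle, which the question does not actually state. LaTeX special characters in the title are not escaped.

```python
# Hypothetical thresholds: (maximum title length, LaTeX size command).
DEFAULT_THRESHOLDS = (
    (40, r"\Large"),
    (80, r"\large"),
    (120, r"\normalsize"),
    (180, r"\small"),
)


def title_size_command(title, thresholds=DEFAULT_THRESHOLDS):
    """Pick a LaTeX font-size command from the length of the title."""
    for limit, command in thresholds:
        if len(title) <= limit:
            return command
    return r"\footnotesize"  # fallback for very long titles


def frame_title(title):
    """Build a frame title line with the chosen size (assumes beamer)."""
    return r"\frametitle{" + title_size_command(title) + " " + title + "}"
```

The Python script generating the .tex file would call frame_title(...) once per slide and write the result into the frame.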

Is there any document for the Python win32com operations on Powerpoint?

I have got some samples showing how to open a presentation and access the slides and shapes. But I want to do some other operations (e.g. generate a thumbnail from a specified slide). What methods can I use? Is there any documentation covering all the functionality?
Not to discourage you, but my experience using COM from Python is that you won't find many examples.
I would be shocked (but happy) if anybody posted a big tutorial or reference on using PowerPoint from Python. Probably the best you'll find, which you've probably already found, is this article.
However, if you follow along through that article and some of the other Python+COM code around, you start to see the patterns of how VB and C# code converts to Python code using the same interfaces.
Once you understand that, your best source of information is probably the PowerPoint API reference on MSDN.
From looking at the samples Jeremiah pointed to, it looks like you'd start there then do something like this, assuming you wanted to export slide #42:
Slide = Presentation.Slides(42)
Slide.Export(FileName, "PNG", 1024, 768)
FileName is a string: the full path and file name (path\filename.ext) of the file you want to export to.
The second argument is a string naming the format: PNG, JPG, GIF, WMF, EMF, TIF (not always a good idea from PowerPoint), etc.
The last two numbers are the width and height (in pixels) at which to export the image; they are VB Longs (signed 32-bit integers ranging from -2,147,483,648 to 2,147,483,647).
I've petted pythons but never coded in them; this is my best guess as to syntax. Shouldn't be too much of a stretch to fix any errors.

Programmatically change font color of text in PDF

I'm not familiar with the PDF specification at all. I was wondering if it's possible to directly manipulate a PDF file so that certain blocks of text that I've identified as important are highlighted in colors of my choice. Language of choice would be python.
It's possible, but not necessarily easy, because the PDF format is so rich. You can find a document describing it in detail here. The first elementary example it gives about how PDFs display text is:
BT
/F13 12 Tf
288 720 Td
(ABC) Tj
ET
BT and ET are commands to begin and end a text object; Tf is a command to use external font resource F13 (which happens to be Helvetica) at size 12; Td is a command to position the cursor at the given coordinates; Tj is a command to write the glyphs for the previous string. The flavor is somewhat "reverse-polish notation"-oid, and indeed quite close to the flavor of Postscript, one of Adobe's other great contributions to typesetting.
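To make the postfix flavor concrete, here is a small Python sketch that splits a decoded content stream like the one above into (operands, operator) pairs and then inserts a color-setting operator at the start of each text object. The function names parse_ops and color_text are hypothetical. It relies on rg, the PDF operator that sets the non-stroking (fill) RGB color, which is what text is normally painted with. The tokenizer is deliberately naive: it assumes an uncompressed stream and does not handle nested parentheses, hex strings, arrays, dictionaries or inline images.

```python
import re

# A literal string in parentheses, or any run of non-space characters.
TOKEN = re.compile(r"\((?:\\.|[^\\()])*\)|[^\s()]+")
# Operators are bare keywords; names start with '/', numbers with a digit or sign.
OPERATOR = re.compile(r"[A-Za-z][A-Za-z0-9*]*|'|\"")


def parse_ops(stream):
    """Group each operator with the operands that precede it (postfix order)."""
    ops, operands = [], []
    for tok in TOKEN.findall(stream):
        if OPERATOR.fullmatch(tok):
            ops.append((operands, tok))
            operands = []
        else:
            operands.append(tok)
    return ops


def color_text(stream, r, g, b):
    """Insert 'r g b rg' right after every BT so the text is drawn in that color."""
    lines = []
    for operands, op in parse_ops(stream):
        lines.append(" ".join(operands + [op]))
        if op == "BT":
            lines.append(f"{r} {g} {b} rg")
    return "\n".join(lines)
```

Note that the fill color is part of the graphics state and stays in effect after ET, so a more careful version would probably save and restore the state (the q and Q operators) around the change, and would have to decode and re-encode the stream inside the PDF file.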
The problem is, there is nothing in the PDF specs that says that text that "looks" like it belongs together on the page as displayed must actually "be" together; since precise coordinates can always be given, if the PDF is generated by a sophisticated typography layout system, it might position text precisely, character by character, by coordinates. Reconstructing text in form of words and sentences is therefore not necessarily easy -- it's almost as hard as optical text recognition, except that you are given the characters precisely (well -- almost... some alleged "images" might actually display as characters...;-).
pyPdf is a very simple pure-Python library that's a good starting point for playing around with PDF files. Its "text extraction" function is quite elementary and does nothing but concatenate the arguments of a few text-drawing commands; you'll see that suffices on some docs, and is quite unusable on others, but at least it's a start. As distributed, pyPdf does just about nothing with colors, but with some hacking that could be remedied.
reportlab's powerful Python library is entirely focused on generating new PDFs, not on interpreting or modifying existing ones. At the other extreme, the pure-Python library pdfminer focuses entirely on parsing PDF files; it does do some clustering to try to reconstruct text in cases where simpler libraries would be stumped.
I don't know of an existing library that performs the transformational tasks you desire, but it should be feasible to mix and match some of these existing ones to get most of it done... good luck!
Highlighting is possible in a PDF file using PDF annotations, but doing it natively is not an easy job. Whether any of the libraries mentioned provides such a facility is something you may want to look into.
