To whomever can lend a hand.
I'm building an app with BoaConstructor in Python which uses a wx.STC.StyledTextCtrl. In this styledtextctrl I am outputting hexadecimal data through Scapy's hexdump function. It adds the line numbers, hexadecimal dump and character transcode. Unfortunately, I cannot figure out how to format this text in the StyledTextCtrl so it displays like a regular hex editor would (see images here http://imgur.com/a/tqE02). Thanks!
Since you said that the output looks OK in other editors, I'm guessing the issue is that the font uses variable pixel widths for characters (e.g. if a wide character like "w" is 15 pixels wide, then a thinner character like "i" may be only 10 pixels). In a fixed-width font (sometimes called a monospaced or typewriter font), all characters have exactly the same width.
You can see evidence of this in your screen shot; there is a slight extra indent on the second row right before the ".(." part. It seems that the space character is much thinner than numerals, so all those missing pixels make the third "column" of the bottom row appear too far left (while extra width in the second row pushes it slightly right)
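To make the arithmetic concrete, here is a small sketch with made-up pixel widths. The numbers in PROPORTIONAL and FIXED_WIDTH and the helper pixel_width are invented for illustration and are not measured from any real font. It shows that two rows with the same number of characters end at different x positions in a proportional font, but at the same position in a fixed-width one.

```python
# Hypothetical per-character pixel widths for a proportional font.
# These numbers are invented for illustration only.
PROPORTIONAL = {" ": 4, ".": 4, "i": 5, "0": 9, "w": 15}
FIXED_WIDTH = 9  # in the fixed-width case every character is this wide


def pixel_width(text, widths=None, default=9):
    """Return the rendered width of text in pixels.

    With widths=None, every character counts as FIXED_WIDTH pixels.
    Characters missing from widths fall back to default.
    """
    if widths is None:
        return len(text) * FIXED_WIDTH
    return sum(widths.get(ch, default) for ch in text)


row_a = "00 00 00"  # digits and spaces only
row_b = "00 w0 i."  # same length, different characters

prop_a = pixel_width(row_a, PROPORTIONAL)
prop_b = pixel_width(row_b, PROPORTIONAL)
fixed_a = pixel_width(row_a)
fixed_b = pixel_width(row_b)

# Whatever is drawn after these 8 characters starts at a different x
# in each row with the proportional widths, but lines up with fixed widths.
print("proportional:", prop_a, prop_b)
print("fixed-width: ", fixed_a, fixed_b)
```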
You need to find a font such that IsFixedWidth() returns True, then set up your STC to use that font. Normally you'd set this using the SetFont() method, but I happen to know that STCs prefer to use their own methods. I found this StyleSetFont() method that is specific to STCs, so that's probably a better bet.
As for choosing your font, here is a good tutorial about how fonts work in wxPython. The author is actually an active member here. Specifically, search that article for his "wx.FontEnumerator widget" example. It should allow you to find fixed-width fonts and view them to see if you like them. For a quick-and-dirty solution, this forum post talks about the wx.TELETYPE flag that guarantees fixed width, and these code samples should give you some idea how to use it.
Good luck!
----EDIT----
I'm not very familiar with STCs but I remember from a previous answer I gave to a question involving STCs that you might need to call StyleClearAll() after setting your font. See the EDIT section of that answer for more information about why this may be required.
I had long assumed that variable fonts were simply combinations of multiple fonts, with values interpolated between them. However, I just read about a project called Prototypo (which is unfortunately discontinued) and discovered how they stored their fonts as variables. See this screenshot from a promotional video from a few years ago:
And it seemed just so logical! Why not use a real language-like format, with variables and all? In the picture above, it (kinda) looks like Python code.
And then I thought "It must have been thought through, let's look at how OpenType font variations are implemented."
And I looked on the web for the schema and the specification, but could not find it.
So the actual question(s):
How are variable fonts stored in otf files? Is it simply, as I thought before, multiple fonts and the other values are interpolated between them? Is there a variable language like the one above used to write the variable parts of the font (obviously)?
Where can I find the TTF specification for variable fonts? Is there any?
Is there a way to write a variable font as a regular text file? It would of course involve vector graphics of some sort, like: const d = 'M23.6,0c-3.4,0-6.3,2.7-7.6,5.6C14.7,2.7,11.8,0,8.4,0C3.8,0,0,3.8,0,8.4c0,9.4,9.5,11.9,16,21.2 c6.1-9.3,16-12.1,16-21.2C32,3.8,28.2,0,23.6,0z' (this one makes a heart).
Thank you (that's what the heart is for :)
FontTools is producing some XML with all sorts of details in this structure
<cmap>
<tableVersion version="0"/>
<cmap_format_4 platformID="0" platEncID="3" language="0">
<map code="0x20" name="space"/><!-- SPACE -->
<!--many, many more characters-->
</cmap_format_4>
<cmap_format_0 platformID="1" platEncID="0" language="0">
<map code="0x0" name=".notdef"/>
<!--many, many more characters again-->
</cmap_format_0>
<cmap_format_4 platformID="0" platEncID="3" language="0"> <!--"cmap_format_4" again-->
<map code="0x20" name="space"/><!-- SPACE -->
<!--more "map" nodes-->
</cmap_format_4>
</cmap>
I'm trying to figure out every character this font supports, so these code attributes are what I'm interested in. I believe that all code attributes are UTF-8 values: is this correct? I am also curious why there are two cmap_format_4 nodes. They seem to be identical, but I haven't tested that with a large number of fonts, so if someone familiar with this module knows for certain, that is my first question.
To be sure I am seeing all characters contained in the typeface, do I need to combine all code attribute values, or just one or two? Will FontTools always produce these three XML nodes, or is the quantity variable? Any idea why? The documentation is a little vague.
The number of cmap_format_N nodes ("cmap subtables") is variable, as is the 'N' (the format). There are several formats; the most common is 4, but there is also format 12, format 0, format 6, and a few others.
Fonts may have multiple cmap subtables, but are not required to. The reason for this is the history of the development of TrueType (which has evolved into OpenType). The format was invented before Unicode, at a time when each platform had its own way(s) of character mapping. The different formats and the ability to have multiple mappings were a necessity at the time in order to have a single font file that could map everything without multiple files, duplication, etc. Nowadays most fonts that are produced will only have a single Unicode subtable, but there are many floating around that have multiple subtables.
The code values in the map node are code point values expressed as hexadecimal. They might be Unicode values, but not necessarily (see the next point).
I think your font may be corrupted (or possibly there was a copy/paste mix-up). It is possible to have multiple cmap_format_N entries in the cmap, but each combination of platformID/platEncID/language should be unique. Also, it is important to note that not all cmap subtables map Unicodes; some express older, pre-Unicode encodings. You should look at tables where platformID="3" first, then platformID="0", and finally platformID="2" as a last resort. Other platformIDs do not necessarily map Unicode values.
As for discovering "all Unicodes mapped in a font": that can be a bit tricky when there are multiple Unicode subtables, especially if their contents differ. You might get close by taking the union of all code values in all of the subtables that are known to be Unicode maps, but it is important to understand that most platforms will only use one of the maps at a time. Usually there is a preferred picking order similar to what I stated above; when one is found, that is the one used. There's no standardized order of preference that applies to all platforms (that I'm aware of), but most of the popular ones follow an order pretty close to what I listed.
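As a rough sketch of the "union of all Unicode subtables" idea, the following reads a TTX-style XML dump with the standard library and collects the code values from subtables whose platformID is 3, 0 or 2. SAMPLE_TTX is a made-up stand-in for a real dump, and unicode_code_points is a hypothetical helper name. Treating every subtable on those platforms as a Unicode map is a simplification, so check platEncID as well for a real font.

```python
import xml.etree.ElementTree as ET

# A tiny made-up stand-in for a TTX dump; a real one would be read from a file.
SAMPLE_TTX = """
<ttFont>
  <cmap>
    <tableVersion version="0"/>
    <cmap_format_4 platformID="0" platEncID="3" language="0">
      <map code="0x20" name="space"/>
      <map code="0x41" name="A"/>
    </cmap_format_4>
    <cmap_format_0 platformID="1" platEncID="0" language="0">
      <map code="0x0" name=".notdef"/>
    </cmap_format_0>
    <cmap_format_4 platformID="3" platEncID="1" language="0">
      <map code="0x20" name="space"/>
      <map code="0x20ac" name="Euro"/>
    </cmap_format_4>
  </cmap>
</ttFont>
"""

# Platforms the answer above suggests looking at, in order of preference.
# Other platformIDs are skipped because they may not map Unicode values.
UNICODE_PLATFORMS = ("3", "0", "2")


def unicode_code_points(ttx_xml):
    """Return the set of code points found in all assumed-Unicode subtables."""
    root = ET.fromstring(ttx_xml)
    cmap = root.find("cmap")
    points = set()
    for subtable in cmap:
        # tableVersion has no platformID, so it is skipped here as well.
        if subtable.get("platformID") not in UNICODE_PLATFORMS:
            continue
        for entry in subtable.findall("map"):
            # int() with base 16 accepts the "0x" prefix used in the dump.
            points.add(int(entry.get("code"), 16))
    return points


if __name__ == "__main__":
    for cp in sorted(unicode_code_points(SAMPLE_TTX)):
        print("U+%04X" % cp)
```

Keep in mind that this union can be larger than what any single platform will actually use, since most platforms pick just one subtable.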
Finally, regarding Unicode vs UTF-8: the code values are Unicode code points; NOT UTF-8 byte sequences. If you're not sure of the difference, spend some time reading about character encodings and byte serialization at Unicode.org.
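A quick way to see the difference in Python: the code attribute is a number naming a character, while UTF-8 is one way of serializing that character into bytes. The value 0x20ac below is just an example chosen for illustration.

```python
code_attr = "0x20ac"             # a code attribute as it might appear in the XML
code_point = int(code_attr, 16)  # the Unicode code point as an integer
char = chr(code_point)           # the character itself (the euro sign)
utf8 = char.encode("utf-8")      # the UTF-8 byte sequence for that character

print("U+%04X" % code_point)     # the code point, written the usual way
print(utf8.hex())                # the bytes, which are not the same digits
print(len(utf8), "bytes in UTF-8 for a single code point")
```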
I'm trying to allow wrapping of the text in a Gtk.CellRendererText, but I have a small problem:
Those rows are very large.
The only code I changed was this:
cell = Gtk.CellRendererText(markup=0)
cell.set_property("wrap_mode", Pango.WrapMode.WORD)
cell.set_property("wrap_width", 20)
And that makes it wrap, but it also seems to have made this visual issue appear.
I seem to remember reading something about this on a blog at Planet GNOME quite a while ago. From what I remember, it has to do with the height-for-width drawing model: when wrapping is enabled, GtkLabel and similar widgets request enough height to reflow the text at wrap-width even if more horizontal space is available, which leaves loads of empty space when the actual width is wider. There was a fix, but I'm afraid I can't remember it at the moment. I'll try to find the original post later.
I've tried but I can't find the post; however, having read some more, I'm pretty sure this is the problem. There is some discussion related to GtkTable doing similar things (https://bugs.launchpad.net/ubuntu/+source/gtk+3.0/+bug/825173). I've a nasty feeling the fix I can't remember properly might have been to turn off wrapping. I guess it would be possible to get a notification when the column width changes and set wrap-width to match that width, but that's a bit of a pain.
If you can live with the column being a fixed width, set the expand property of the column to False and the fixed-width property to True, then set the wrap-width, width-chars and max-width-chars properties of the renderer all to the same value. The text then wraps without any extra space.
I'm trying to gather some graphics and text from different folders and present them in a comprehensive way. For this I use Python to copy them into one folder and generate a dynamic LaTeX presentation in which I plot the copied graphics and print the text. The problem I'm facing now is that I derive the title for a slide dynamically from a text file, and if it's too long it will obviously wrap around. This dynamic title can be pretty long, so it might fill the whole slide. What I'd like to do is limit the space used by this text without losing its information. The non-elegant solution I have is to count the characters and, if the count is over a certain threshold, use a smaller font. This solution is tedious and not optimal; I'd love to hear a better idea.
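For reference, the character-counting workaround looks roughly like the sketch below. The thresholds in SIZE_STEPS, the helper names, and the use of beamer's \frametitle are all assumptions for illustration; the real presentation class and limits may differ, and LaTeX special characters in the title are not escaped here.

```python
# (max characters, LaTeX size command) pairs, checked in order.
# These thresholds are guesses and would need tuning for a real slide layout.
SIZE_STEPS = [
    (40, r"\Large"),
    (80, r"\normalsize"),
    (140, r"\small"),
]
FALLBACK_SIZE = r"\footnotesize"  # used for anything longer than the last step


def title_size_command(title):
    """Pick a LaTeX font size command based only on the title length."""
    for limit, command in SIZE_STEPS:
        if len(title) <= limit:
            return command
    return FALLBACK_SIZE


def frame_title(title):
    """Build a frame title line, assuming a beamer-style \\frametitle."""
    return r"\frametitle{%s %s}" % (title_size_command(title), title)


if __name__ == "__main__":
    print(frame_title("Short title"))
    print(frame_title("A much longer title " * 5))
```

Counting characters ignores how wide each character actually is, which is part of why this approach feels fragile.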
I'm not familiar with the PDF specification at all. I was wondering if it's possible to directly manipulate a PDF file so that certain blocks of text that I've identified as important are highlighted in colors of my choice. Language of choice would be python.
It's possible, but not necessarily easy, because the PDF format is so rich. You can find a document describing it in detail here. The first elementary example it gives about how PDFs display text is:
BT
/F13 12 Tf
288 720 Td
(ABC) Tj
ET
BT and ET are commands to begin and end a text object; Tf is a command to use external font resource F13 (which happens to be Helvetica) at size 12; Td is a command to position the cursor at the given coordinates; Tj is a command to write the glyphs for the previous string. The flavor is somewhat "reverse-polish notation"-oid, and indeed quite close to the flavor of Postscript, one of Adobe's other great contributions to typesetting.
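To make the postfix flavor concrete, here is a rough sketch that splits the example above into (operator, operands) pairs. TOKEN and parse_operations are hypothetical names, and the tokenizer only handles the small subset of syntax seen in this example (simple strings, names, numbers and alphabetic operators). Real content streams also contain arrays, hex strings, nested parentheses, inline images and more.

```python
import re

# Simplified tokenizer for the example above, not for PDF in general.
TOKEN = re.compile(
    r"""
    (?P<string>\((?:\\.|[^\\()])*\))        # literal string, no nested parentheses
  | (?P<name>/[^\s/()<>\[\]]+)              # name object such as /F13
  | (?P<number>[-+]?(?:\d+\.?\d*|\.\d+))    # integer or real number
  | (?P<operator>[A-Za-z*]+)                # operator such as BT, Tf, Td, Tj, ET
    """,
    re.VERBOSE,
)


def parse_operations(stream):
    """Return a list of (operator, operands) in the order they appear.

    Operands are collected until an operator is seen, which is what gives
    the syntax its reverse-Polish feel.
    """
    operations = []
    operands = []
    for match in TOKEN.finditer(stream):
        kind, text = match.lastgroup, match.group()
        if kind == "operator":
            operations.append((text, operands))
            operands = []
        elif kind == "number":
            operands.append(float(text) if "." in text else int(text))
        elif kind == "string":
            operands.append(text[1:-1])  # drop the surrounding parentheses
        else:
            operands.append(text)        # keep names like /F13 as they are
    return operations


EXAMPLE = "BT\n/F13 12 Tf\n288 720 Td\n(ABC) Tj\nET"

if __name__ == "__main__":
    for operator, args in parse_operations(EXAMPLE):
        print(operator, args)
```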
The problem is, there is nothing in the PDF specs that says that text that "looks" like it belongs together on the page as displayed must actually "be" together; since precise coordinates can always be given, if the PDF is generated by a sophisticated typography layout system, it might position text precisely, character by character, by coordinates. Reconstructing text in form of words and sentences is therefore not necessarily easy -- it's almost as hard as optical text recognition, except that you are given the characters precisely (well -- almost... some alleged "images" might actually display as characters...;-).
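As a toy illustration of what that reconstruction involves, the sketch below takes individually positioned glyphs and groups them into lines by their y coordinate, then inserts spaces where the horizontal gap is large. The (x, y, width, char) tuple layout, the tolerance values and the name rebuild_lines are all assumptions for this example. Real documents add rotation, multiple columns, kerning and font-dependent widths, which is where it gets hard.

```python
def rebuild_lines(glyphs, line_tolerance=2.0, space_gap=3.0):
    """Group positioned glyphs into text lines.

    glyphs: iterable of (x, y, width, char) tuples in page coordinates,
            where larger y means higher on the page (as in PDF).
    line_tolerance: maximum y difference for two glyphs to share a line.
    space_gap: horizontal gap above which a space is inserted.
    """
    lines = []  # each entry is [reference_y, [glyphs on that line]]
    # Visit glyphs from the top of the page down, left to right.
    for glyph in sorted(glyphs, key=lambda g: (-g[1], g[0])):
        y = glyph[1]
        if lines and abs(lines[-1][0] - y) <= line_tolerance:
            lines[-1][1].append(glyph)
        else:
            lines.append([y, [glyph]])

    result = []
    for _, row in lines:
        row.sort(key=lambda g: g[0])  # left to right within the line
        text = ""
        previous_end = None
        for x, _, width, char in row:
            if previous_end is not None and x - previous_end > space_gap:
                text += " "
            text += char
            previous_end = x + width
        result.append(text)
    return result


# Glyphs given out of order, as a layout engine might emit them.
SAMPLE_GLYPHS = [
    (116.0, 700.0, 6.0, "y"),
    (100.0, 686.0, 6.0, "o"),
    (100.0, 700.0, 8.0, "H"),
    (106.0, 686.4, 6.0, "k"),
    (122.0, 700.0, 6.0, "o"),
    (108.0, 700.5, 3.0, "i"),
]

if __name__ == "__main__":
    for line in rebuild_lines(SAMPLE_GLYPHS):
        print(line)
```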
pyPdf is a very simple pure-Python library that's a good starting point for playing around with PDF files. Its "text extraction" function is quite elementary and does nothing but concatenate the arguments of a few text-drawing commands; you'll see that suffices on some docs, and is quite unusable on others, but at least it's a start. As distributed, pyPdf does just about nothing with colors, but with some hacking that could be remedied.
reportlab's powerful Python library is entirely focused on generating new PDFs, not on interpreting or modifying existing ones. At the other extreme, the pure-Python library pdfminer focuses entirely on parsing PDF files; it does do some clustering to try to reconstruct text in cases in which simpler libraries would be stumped.
I don't know of an existing library that performs the transformational tasks you desire, but it should be feasible to mix and match some of these existing ones to get most of it done... good luck!
Highlighting is possible in a PDF file using PDF annotations, but doing it natively is not an easy job. You may want to check whether any of the libraries mentioned provides such a facility.
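For a sense of what such an annotation looks like at the file level, here is a sketch that only builds the text of a highlight annotation dictionary for a given rectangle. The function name highlight_annotation is hypothetical. The keys (/Subtype /Highlight, /Rect, /QuadPoints, /C) come from the PDF specification, but the vertex order used for /QuadPoints is an assumption you should verify against your viewer, since viewers are known to be picky about it. Actually attaching the annotation means adding it as an object to the page's /Annots array and rewriting the file's cross-reference data, which is the part a library would need to handle.

```python
def highlight_annotation(x0, y0, x1, y1, rgb=(1.0, 1.0, 0.0)):
    """Return the text of a highlight annotation dictionary.

    (x0, y0) is the lower-left and (x1, y1) the upper-right corner of the
    area to highlight, in default user space units. rgb components are 0..1.
    This does not modify any PDF file; it only formats the dictionary.
    """
    rect = f"[{x0:g} {y0:g} {x1:g} {y1:g}]"
    # One quadrilateral covering the rectangle. The order used here
    # (top-left, top-right, bottom-left, bottom-right) is an assumption.
    quad = f"[{x0:g} {y1:g} {x1:g} {y1:g} {x0:g} {y0:g} {x1:g} {y0:g}]"
    color = "[" + " ".join(f"{c:g}" for c in rgb) + "]"
    return (
        "<< /Type /Annot /Subtype /Highlight "
        f"/Rect {rect} /QuadPoints {quad} /C {color} >>"
    )


if __name__ == "__main__":
    print(highlight_annotation(72, 700, 200, 712))
```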