Python: How to specify and view high-numbered Unicode characters? - python

The Unicode character U+1d134 is the musical symbol for "common time"; it looks like a capital 'C'.
But using Python 3.6, when I specify '\U0001d134' I get a glyph that seems to indicate an unknown symbol. On my Mac, it looks like a square with a question-mark in it.
Is the inability to display the corresponding glyph simply a font limitation, or is it something else? (Like maybe something I'm doing wrong....)
For clarity, I want to use this and other such symbols in an app I'm writing, and would like to find out if there's a way to do this.

The problem lies not in your code but in your local system: you don't have any font installed that contains the character 𝄴 "MUSICAL SYMBOL COMMON TIME".
That is also the reason none of your browsers can display it. Browsers are usually quite good at hunting down a font that can display a given character; they all fail here because no installed font has the glyph.
But, as it happened,
>>> print ('\U0001d134')
𝄴
worked for me and displayed the glyph.
I pasted it into UnicodeChecker, which helpfully listed 'all' fonts that contain this character: only one, Bravura. It's an Open Source font so go ahead and download it. (Be careful to follow proper procedures if you want to distribute it along with your app.)
To think that I only had that font installed because of an earlier SO question.
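One way to confirm that the string itself is fine, and that only the font is missing, is to inspect the character from Python. This is a minimal sketch using only the standard library; the variable name is illustrative.

```python
import unicodedata

ch = '\U0001d134'

# In Python 3.3+ this is a single code point, not a surrogate pair.
print(len(ch))
# The code point and its official Unicode name.
print(hex(ord(ch)))
print(unicodedata.name(ch))
# The four bytes that end up in a UTF-8 encoded file.
print(ch.encode('utf-8').hex())
```

If the name comes back as MUSICAL SYMBOL COMMON TIME, Python is handling the character correctly and the placeholder glyph is purely a display issue.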

Related

How do I replace all xA6 in my code

I obfuscated my Python 3.5 code and now I am left with these strange black boxes. I am not able to copy them and "search and replace" with some character. How am I supposed to get rid of them? The code won't run; it keeps raising "SyntaxError: invalid character in identifier".
This is very frustrating. I have been stuck on this issue for hours.
If you use Notepad++, you can use the HEX-Editor plugin and view the raw hex of your source, and then use Ctrl+H to find and replace "A6" with whatever you desire.
If you do not have that plugin, it's very easy to download it using Notepad++'s built-in plugin manager.
The extended ASCII Tables tell me that 0xA6 is a broken vertical bar, which certainly doesn't seem like a valid character for an identifier, so no questions there.
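If you would rather not rely on an editor, the same search and replace can be sketched in Python. The sample text below is hypothetical, and the sketch assumes the source has already been decoded to a string so that the stray character shows up as U+00A6; which replacement character is right depends on what the obfuscator did.

```python
import unicodedata

def find_non_ascii(source):
    """Return (line_no, column, char, name) for every non-ASCII character."""
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                hits.append((line_no, col, ch, unicodedata.name(ch, 'UNKNOWN')))
    return hits

# Hypothetical sample: an identifier polluted with U+00A6.
sample = "to\xa6tal = 1\nprint(to\xa6tal)\n"
for hit in find_non_ascii(sample):
    print(hit)

# Replace the stray character with something that is valid in an identifier.
cleaned = sample.replace('\xa6', '_')
print(cleaned)
```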

How to change the characters used for QTextOption.ShowTabsAndSpaces?

Is there a way to change which character is used for Qt's QTextOption.ShowTabsAndSpaces flag?
I find that the default character that's used for viewing whitespace (specifically spaces) stands out a little too much. I'd like to change the font or character used so that it's less distinct.
It looks like the character used is Unicode "Middle Dot", · (U+00B7) and I'd like to use, say, U+02D1 ˑ.
Ideally I'd like to be able to set it to whatever the user wants.
I've been searching through the Qt docs and have only been able to find how to turn this flag on (here).
EDIT:
I guess I should show some code... Here's how I'm currently adding the whitespace indicators:
opts = self.document().defaultTextOption()
opts.setFlags(opts.flags() | QTextOption.ShowTabsAndSpaces)
self.document().setDefaultTextOption(opts)
Running Python 3.4 and PyQt4, but should be able to port C++ code over.
EDIT2:
Thanks to Andrei Shikalev's answer below, I've posted a feature request for this on the Qt tracker: https://bugreports.qt.io/browse/QTBUG-46072
Currently you cannot change the characters used for tabs and spaces. These characters are hardcoded in the Qt source for QTextLayout:
QChar visualTab(0x2192);
...
QChar visualSpace((ushort)0xb7);
More info in source for QTextLayout on GitHub.
You can file a feature request for custom tab and space characters. IMHO this feature would be useful for custom-looking editors based on Qt.
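To see which characters those hardcoded values are, along with the U+02D1 alternative mentioned in the question, they can be looked up from Python. This is only an illustrative lookup with the standard library; it does not change anything in Qt.

```python
import unicodedata

# 0x2192 and 0xB7 are the values from the Qt source quoted above;
# 0x02D1 is the replacement the question would like to use.
for code_point in (0x2192, 0xB7, 0x02D1):
    ch = chr(code_point)
    print('U+{:04X} {} {}'.format(code_point, ch, unicodedata.name(ch)))
```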

How can one find the Unicode codepoints that a font has glyphs for, on a Debian-based system?

From a scripting language (Python or Ruby, say) on a Debian-based system, I would like to find either one of:
1. All the Unicode codepoints that a particular font has glyphs for
2. All the fonts that have glyphs for a particular Unicode codepoint
(Obviously either 1 or 2 can be derived from the other, so whichever is easier would be great.) I have done this in the past by running:
fc-list : file charset
... and parsing the output at the end of each line, based on this code from fontconfig
but it seems to me that there ought to be a much simpler way of doing this.
(I'm not completely sure this is the right StackExchange site for this question, but I am looking for an answer that can be used programmatically.)
I would try any of the FreeType 2 language bindings. Here's a Perl solution to list the Unicode code points of a font using Font::FreeType:
use Font::FreeType;
Font::FreeType->new->face('DejaVuSans.ttf')->foreach_char(sub {
printf("%04X\n", $_->char_code);
});
I've recently listed the mapping from Unicode codepoints to glyphs in a TTF using TTX/FontTools. That tool is written in Python, so it matches the Python tag in your post. The command
ttx -t cmap foo.ttf
will generate an XML file foo.ttx which describes that mapping, for various environments and encodings. See e.g. this reference for a description of what the platform and encoding identifiers actually mean. I assume that the package can be used as a library as well as a command line tool, but I have no experience there.
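For the fc-list approach from the question, the parsing step can be sketched in a few lines of Python. This assumes a fontconfig version that prints the charset as space-separated hexadecimal ranges after "charset="; older versions may use a different encoding, so treat the format as an assumption. The sample line and font path are hypothetical.

```python
def parse_charset(spec):
    """Parse a string like '20-7e a0-17f 18f' into (start, end) tuples."""
    ranges = []
    for token in spec.split():
        start, _, end = token.partition('-')
        lo = int(start, 16)
        hi = int(end, 16) if end else lo  # a lone value is a one-element range
        ranges.append((lo, hi))
    return ranges

def covers(ranges, code_point):
    """True if any parsed range contains the code point."""
    return any(lo <= code_point <= hi for lo, hi in ranges)

# Hypothetical output line in the assumed "path: charset=..." layout.
line = '/usr/share/fonts/example.ttf: charset=20-7e a0-17f 1d100-1d126 1d129-1d1dd'
path, _, spec = line.partition(': charset=')
ranges = parse_charset(spec)
print(path, covers(ranges, 0x1D134))
```

Running this over every line of the fc-list output would give the list of fonts that cover a particular codepoint.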

How to detect the right font to use depending on the language

For a program of mine I have a database full of street names (using GIS stuff) in Unicode. The user selects any part of the world he wants to see (using OpenStreetMap, Google Maps or whatever) and my program displays every selected street, using a nice font to show its name. As you may know, not every font can display non-Latin characters... and it gives me headaches. I wonder how to tell my program "if this word is written in Chinese, then use a Chinese font".
EDIT: I forgot to mention that I want to use non-standard fonts. Arial, Courier and some others can display non-Latin words, but I want to use other fonts (I have a specific font for Chinese, another one for Japanese, another one for Arabic...). I just need to know which font to choose depending on the word I want to write.
You need information about the language of the text.
And when you decide what fonts you want, you do a mapping from language to font.
If you try to do it automatically, it does not work. The fonts for Japanese, Traditional Chinese, and Simplified Chinese look different even for the same character. The text might be intelligible, but a native reader would be able to tell (ok, complain) that the font is wrong.
Plus, if you do anything algorithmically, there is no way to consider the aesthetic part (for instance the fact that you don't like Arial :-).
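The language-to-font mapping described above can be as simple as a dictionary lookup. This is a minimal sketch, assuming the database can supply a language tag for each street name; the font names and the fallback rule are hypothetical.

```python
# Hypothetical font names; substitute the fonts you actually ship.
FONT_BY_LANGUAGE = {
    'zh-Hans': 'MySimplifiedChineseFont',
    'zh-Hant': 'MyTraditionalChineseFont',
    'ja': 'MyJapaneseFont',
    'ar': 'MyArabicFont',
}
DEFAULT_FONT = 'MyLatinFont'

def font_for(language_tag):
    """Pick a font for a language tag, dropping trailing subtags until one matches."""
    tag = language_tag
    while tag:
        if tag in FONT_BY_LANGUAGE:
            return FONT_BY_LANGUAGE[tag]
        # 'ja-JP' -> 'ja'; a tag without '-' becomes '' and ends the loop.
        tag = tag.rpartition('-')[0]
    return DEFAULT_FONT

print(font_for('ja-JP'))
print(font_for('zh-Hant'))
print(font_for('fr'))
```

Keeping the table explicit leaves the aesthetic choice with you rather than with an algorithm.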
Use UTF-8 text and a font that has glyphs for as many characters as possible, like Arial/Verdana on Windows. That sidesteps the detection problem for every character the font covers, since one font handles them all.

page separator in Kate editor

PEP 8 says:
Python accepts the control-L (i.e. ^L) form feed character as whitespace; Many tools treat these characters as page separators, so you may use them to separate pages of related sections of your file
This looks like a great idea to me, but in the text editor I use (Kate), Ctrl+L is the shortcut for saving all files. Does anyone have a solution?
... or am I missing something here?
Ctrl-L simply refers to the character with ASCII code 12 (form feed, new page). It is called Ctrl-L only because some editors allow you to enter it with Ctrl-L. (For instance, in vim, one can type Ctrl-Q Ctrl-L to enter that character, and it also appears as ^L). In Kate, Ctrl-L is a shortcut for saving all files, so you cannot type it that way and I'm not sure there is any way of entering that character easily.
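Since the form feed is just character 12, it can also be produced and inspected from Python without typing it in an editor. This is a minimal sketch with a hypothetical two-page source string; it is an illustration, not a recommendation to add form feeds to files edited in Kate.

```python
FORM_FEED = '\f'   # the same character as '\x0c' and chr(12)

print(ord(FORM_FEED))
print(FORM_FEED.isspace())

# Hypothetical source text with two "pages" separated by a form feed.
source = 'import os\n' + FORM_FEED + '\ndef main():\n    pass\n'
pages = source.split(FORM_FEED)
print(len(pages))

# A line holding only a form feed counts as blank, so the text still compiles.
compile(source, '<example>', 'exec')
```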
As a Kate developer, I unfortunately have to tell you that such control sequences are not supported. In fact, Kate often treats these files as binary files, since such characters are not human readable text. So in short: Try to avoid ^L.
You can create a plugin, and disable the shortcut Ctrl + L from the menu: Settings -> Configure shortcuts
