Python counting zero-length control characters in string formatting width field?

In Python 3.4.3, I was trying to width-align some fields using the str.format() method, and it appears to count zero-width control characters against the width total. Sample code:
ANSI_RED = "\033[31m"
ANSI_DEFAULT = "\033[39m\033[49m"
string1 = "12"
string2 = ANSI_RED+"12"+ANSI_DEFAULT
print("foo{:4s}bar".format(string1))
print("foo{:4s}bar".format(string2))
This will output:
foo12  bar
foo12bar
(with the second output having '12' in red, but I can't reproduce that in SO)
In the second case, I've lost my field width. I assume this is because Python counts every character in the string (17 code points here, against a width of 4), even though most of those characters take up zero width on an ANSI-conforming terminal.
What's a clean way of having ANSI colors and working field widths?

Unfortunately, you will have to strip the escape sequences to compute the displayed field width.
The len() function returns the number of bytes in a Python 2 str type and the number of code points in a Python 3 str type. That length has never been guaranteed to match the display width (which is a more challenging problem):
>>> s = 'abc\bde'
>>> print(s)
abde
>>> len(s)
6
In general, you can't know the display width for certain unless you know something about how the display will interpret the codes (e.g. the width differs depending on whether the device supports ANSI escape sequences).
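A minimal sketch of the stripping approach, assuming the only escapes in play are CSI sequences like the SGR color codes in the question, and that every remaining code point takes exactly one column. The regex and the helper names visible_len and ljust_ansi are illustrative, not from any library:

```python
import re

# CSI escape sequences: ESC "[" followed by parameter bytes 0x30-0x3F,
# intermediate bytes 0x20-0x2F and one final byte 0x40-0x7E (e.g. "\033[31m").
# Assumption: the strings contain no other kinds of escape sequences.
ANSI_CSI = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")

def visible_len(s):
    """Number of code points left after removing CSI escape sequences."""
    return len(ANSI_CSI.sub("", s))

def ljust_ansi(s, width):
    """Pad s with spaces so its visible part fills `width` columns.

    Assumes every remaining code point occupies exactly one column.
    """
    return s + " " * max(0, width - visible_len(s))

ANSI_RED = "\033[31m"
ANSI_DEFAULT = "\033[39m\033[49m"
string2 = ANSI_RED + "12" + ANSI_DEFAULT

print("foo{}bar".format(ljust_ansi(string2, 4)))
```

Because the padding is computed on the stripped text but appended to the original string, the color codes survive intact.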

I don't know if it will qualify as "clean", but something in the vein of the following is workable:
print("foo{0}{1:4s}{2}bar".format(ANSI_RED, string1, ANSI_DEFAULT))
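The same idea can be wrapped in a small helper so the width becomes a parameter. This is a sketch: colored_field is a hypothetical name, and it relies on a str.format() nested replacement field ({w}) to pass the width in:

```python
ANSI_RED = "\033[31m"
ANSI_DEFAULT = "\033[39m\033[49m"

def colored_field(text, width, color=ANSI_RED, reset=ANSI_DEFAULT):
    # Pad the plain text first, so str.format() only counts visible characters,
    # then wrap the padded field in the escape codes. The width comes in
    # through the nested replacement field {w}.
    return "{}{:<{w}}{}".format(color, text, reset, w=width)

print("foo" + colored_field("12", 4) + "bar")
```

The padding spaces end up inside the colored span; with foreground-only colors that is invisible, but with a background color the padding would be highlighted too.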

Getting terminal control codes right is really difficult (not all of them have a well-defined width), so your best bet is probably to use explicit column movement.
# string2 defined as above
def col(n):
    # CSI n G: move the cursor to absolute column n (counted from 1)
    return "\033[{:d}G".format(n)

print("foo{:s}{:s}bar".format(string2, col(8)))
Output:
foo12  bar
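Extending the column-movement idea to several fields, one could emit a cursor-to-column (CSI n G) sequence before each field. at_columns is a hypothetical helper, and the sketch assumes the terminal honors that sequence:

```python
def col(n):
    # CSI n G (Cursor Horizontal Absolute): move to column n, counted from 1.
    return "\033[{:d}G".format(n)

def at_columns(fields, starts):
    """Place each field at its 1-based start column, ignoring escape codes.

    Hypothetical helper: a field longer than the gap to the next start
    column will be partly overwritten by the next field.
    """
    return "".join(col(c) + f for f, c in zip(fields, starts))

ANSI_RED = "\033[31m"
ANSI_DEFAULT = "\033[39m\033[49m"
print(at_columns(["foo", ANSI_RED + "12" + ANSI_DEFAULT, "bar"], [1, 4, 8]))
```

Since the terminal does the positioning, the escape codes inside a field never affect where the next field starts; the start columns just have to leave enough room.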

Related

python in Visual Studio Code - how to print funky stuff

I have been testing printing colors and characters in VS Code (version 1.69) using Python 3. To print colored text in VS Code, you would use:
print("\033[31mThis is red font.\033[0m")
print("\033[32mThis is green font.\033[0m")
print("\033[33mThis is yellow font.\033[0m")
print("\033[34mThis is blue font.\033[0m")
print("\033[37mThis is the default font. \033[0m")
Special characters would be like the following:
print("\1\2\3\4\05\06\07\016\017\013\014\020")
print("\21\22\23\24\25\26\27\36\37\31\32\34\35")
Part 1 of my question: How would you print special characters from a loop? What I tried is:
for i in range(1, 99):
    t = "\\" + str(i)
    print(t)
Part 2: Is there a way to print dark text with a colored highlighted background?
The first example uses ANSI escape sequences. The second uses a convention common to many languages, including Python, for putting non-printable characters in a string by escaping their character value; however, you may not realize that in your example you're escaping octal values, not decimal ones.
Printing them is no different from printing any other character, though. I think you may be confusing strings that represent values with the actual values of variables, a very common confusion for beginning programmers. If you want to print '\21' without writing out the string literal, you can just use print(chr(17)), because 17 is the decimal equivalent of octal 21.
Have a look at the documentation for string literals for more detail.
The loop you're trying to create would be something like:
for i in range(1, 99):
    print(chr(i))
Keep in mind, though, that when i reaches 21, it does not print the character written as '\21' but the one written as '\25', since 25 is the octal representation of decimal 21.
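To get from the loop variable to the escape the question tries to build with "\\" + str(i), the number has to be written in octal, and the resulting text then decoded. A sketch using the standard unicode_escape codec (from_escape is a hypothetical helper name); it prints repr() rather than the raw control characters so the terminal isn't disturbed:

```python
def from_escape(text):
    # Interpret backslash escapes in `text` (e.g. the 3 characters "\21")
    # the way a Python string literal would, via the unicode_escape codec.
    return text.encode("ascii").decode("unicode_escape")

# In a literal, "\21" is an OCTAL escape, so it equals chr(0o21) == chr(17).
assert "\21" == chr(0o21) == chr(17)

for i in range(1, 32):
    escape = "\\" + format(i, "o")   # octal digits, e.g. 17 -> "\21"
    ch = from_escape(escape)
    assert ch == chr(i)
    print(i, escape, repr(ch))
```

In practice chr(i) alone is enough; the decode step only shows that the octal text and the character are the same thing.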
Note: you're also asking specifically about VS Code, but that's a different question altogether. Whether the VS Code console renders ANSI escape sequences depends on the type of terminal; it doesn't have much to do with your code. However, if you want ANSI escape sequences to render in text files, there are extensions for that.
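Part 2 of the question (dark text on a colored background) isn't covered above. As a sketch using the standard SGR parameters: 30-37 select the foreground color, 40-47 the background, 0 resets, and several parameters can be combined with ";". Whether it renders still depends on the terminal; highlight is a hypothetical helper:

```python
# SGR parameters: 30-37 foreground, 40-47 background, 0 reset.
COLORS = ["black", "red", "green", "yellow", "blue", "magenta", "cyan", "white"]

def highlight(text, bg, fg="black"):
    """Wrap text in SGR codes for a foreground/background color pair."""
    fg_code = 30 + COLORS.index(fg)
    bg_code = 40 + COLORS.index(bg)
    return "\033[{};{}m{}\033[0m".format(fg_code, bg_code, text)

for name in COLORS[1:]:
    print(highlight(" dark text on {} ".format(name), name))
```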

Total number of alphanum values in python

I am writing code in which every character is associated with a number. To make my life easier, I decided to use the characters' code values (a=97, b=98, ..., z=122). The first step in my code is to get a number out of a character. For example:
char = input("Write a character: ")
print(ord(char.lower()))
Afterwards, though, I need to know the total number of alphanumeric characters that exist, and I haven't found an answer anywhere.
Your question is not very clear. Why would you need total number of alphanum characters?
The thing is, that number depends on the character set in question. For ASCII:
>>> import string
>>> len(string.ascii_letters + string.digits)
62
which is something you could also count by hand.
And even this is not the whole story, because 8-bit extensions of ASCII such as Latin-1 add more letters from other languages in the 128-255 range.
For Unicode, you will have to consult the specification to see how many there are. I don't even know how many scripts are encoded in Unicode (UTF-8 is just one way of encoding it).
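Rather than reading the specification, one can ask Python itself what str.isalnum() accepts. This sketch counts ASCII and then every code point; the Unicode total depends on the Unicode database version bundled with your interpreter, so no particular number is claimed for it here:

```python
import string
import sys
import unicodedata

# ASCII: code points 0-127
ascii_alnum = [chr(cp) for cp in range(128) if chr(cp).isalnum()]
print(len(ascii_alnum))   # 62

# Every code point Python knows about; the result varies with
# unicodedata.unidata_version.
unicode_alnum = sum(1 for cp in range(sys.maxunicode + 1) if chr(cp).isalnum())
print(unicodedata.unidata_version, unicode_alnum)
```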
If the question is how to recognize alphanumeric characters in a string, Python has a nice method for that, str.isalnum():
>>> "A".isalnum()
True
>>> "0".isalnum()
True
>>> "[".isalnum()
False
So please, express yourself more clearly.

ZWNJ not shown properly in python 3.3

I am trying to replace the space between two tokens written in the Arabic script with a ZWNJ, but what the function returns is not displayed properly on the screen:
>>> nm.normalize("رشته ها")
'رشته\u200cها'
\u200c should be rendered as a half-space between 'رشته' and 'ها' here, but instead it shows up as above. I am using Python 3.3.3.
The function returned a string object containing the \u200c character, but the interactive interpreter shows you its representation (its repr()). The \uxxxx syntax makes the representation useful as a debugging value: you can copy that representation, paste it back into Python, and get exactly the same value.
In other words, the function worked exactly as advertised; the space was indeed replaced by a U+200C ZERO WIDTH NON-JOINER codepoint.
If you wanted to write the string to your terminal or console, use print():
print(nm.normalize("رشته ها"))
Demo:
>>> result = 'رشته\u200cها'
>>> len(result)
7
>>> result[4]
'\u200c'
>>> print(result)
رشته‌ها
You can see that character 5 (index 4) is a single character here, not the 6 separate characters of the \u200c escape.
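A short sketch reproducing the demo without the nm object, assuming nm.normalize simply replaces the space with U+200C (the real normalizer may do more). It contrasts the repr() the prompt echoes with what print() writes:

```python
import unicodedata

# Stand-in for nm.normalize (assumption): replace the space with ZWNJ.
result = "رشته ها".replace(" ", "\u200c")

print(len(result))                    # 7
print(unicodedata.name(result[4]))    # ZERO WIDTH NON-JOINER
print(repr(result))                   # escaped form, as echoed by the prompt
print(result)                         # what print() sends to the terminal
```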

Display width of unicode strings in Python [duplicate]

This question already has answers here:
Normalizing Unicode
(2 answers)
Closed 8 years ago.
How can I determine the display width of a Unicode string in Python 3.x, and is there a way to use that information to align those strings with str.format()?
Motivating example: Printing a table of strings to the console. Some of the strings contain non-ASCII characters.
>>> for title in d.keys():
...     print("{:<20} | {}".format(title, d[title]))
zootehni-            | zooteh.
zootekni-            | zootek.
zoothèque           | zooth.
zooveterinar-        | zoovet.
zoovetinstitut-      | zoovetinst.
母                    | 母母
>>> s = 'è'
>>> len(s)
2
>>> [ord(c) for c in s]
[101, 768]
>>> import unicodedata
>>> unicodedata.name(s[1])
'COMBINING GRAVE ACCENT'
>>> s2 = '母'
>>> len(s2)
1
As can be seen, str.format() simply uses the number of code points in the string (len(s)) as its width, leading to skewed columns in the output. Searching through the unicodedata module, I have not found anything suggesting a solution.
Unicode normalization can fix the problem for è, but not for East Asian characters, which are often displayed two columns wide. Similarly, zero-width Unicode characters exist (e.g. the zero-width space, which allows line breaks within words). You can't work around these issues with normalization, so please do not suggest "normalize your strings".
Edit: Added info about normalization.
Edit 2: My original dataset also has some European combining characters that don't collapse to a single code point even after normalization:
zwemwater            | zwemw.
zwia̢z-              | zw.
>>> s3 = 'a\u0322' # The 'a + combining retroflex hook below' from zwiaz
>>> len(unicodedata.normalize('NFC', s3))
2
You have several options:
1. Some consoles support escape sequences for pixel-exact positioning of the cursor. This might cause some overprinting, though.
   Historical note: This approach was used in the Amiga terminal to display images in a console window by printing a line of text and then advancing the cursor down by one pixel. The leftover pixels of the text line slowly built up an image.
2. Create a table in your code that contains the real (pixel) widths of all Unicode characters in the font used by the console / terminal window. Use a UI framework and a small Python script to generate this table.
   Then add code that calculates the real width of the text using this table. The result might not be a multiple of the character width in the console, though. Together with pixel-exact cursor movement, this might solve your issue.
   Note: You'll have to add special handling for ligatures (fi, fl) and composites. Alternatively, you can load a UI framework without opening a window and use its graphics primitives to calculate the string widths.
3. Use the tab character (\t) to indent. But that will only help if your terminal actually uses the real text width to place the cursor; many terminals simply count characters.
4. Create an HTML file with a table and look at it in a browser.
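The unicodedata module the question mentions does carry enough information for a rough heuristic: treat nonspacing/enclosing marks (categories Mn, Me) and format characters (Cf, e.g. ZWSP, ZWNJ) as zero columns, East Asian Wide/Fullwidth characters as two, and everything else as one. This is an approximation in the spirit of wcwidth-style tables, not an exact terminal model; ambiguous-width characters, emoji sequences and fonts can still disagree. display_width and ljust_display are hypothetical names:

```python
import unicodedata

def display_width(s):
    """Approximate number of terminal columns needed for s (heuristic)."""
    width = 0
    for ch in s:
        if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue                # combining marks, ZWSP, ZWNJ: no column
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2              # East Asian Wide / Fullwidth: two columns
        else:
            width += 1              # everything else: assume one column
    return width

def ljust_display(s, width):
    """Left-justify s to `width` columns using display_width()."""
    return s + " " * max(0, width - display_width(s))

rows = [
    ("zoothe\u0300que", "zooth."),   # e + COMBINING GRAVE ACCENT
    ("\u6bcd", "\u6bcd\u6bcd"),      # a wide CJK character
    ("zwia\u0322z-", "zw."),         # a + COMBINING RETROFLEX HOOK BELOW
]
for title, abbr in rows:
    print("{} | {}".format(ljust_display(title, 20), abbr))
```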

How can I get Python to use upper case letters when printing hexadecimal values?

In Python v2.6 I can get hexadecimal for my integers in one of two ways:
print("0x%x" % value)
print(hex(value))
However, in both cases, the hexadecimal digits are lower case. How can I get these in upper case?
Capital X (Python 2 and 3 using sprintf-style formatting):
print("0x%X" % value)
Or in Python 3+ (using .format() string syntax):
print("0x{:X}".format(value))
Or in Python 3.6+ (using formatted string literals):
print(f"0x{value:X}")
Just use upper().
intNum = 1234
hexNum = hex(intNum).upper()
print('Upper hexadecimal number =', hexNum)
Output:
Upper hexadecimal number = 0X4D2
print(hex(value).upper().replace('X', 'x'))
This also handles negative numbers correctly: hex(-255).upper().replace('X', 'x') gives '-0xFF', whereas "0x%X" % -255 gives '0x-FF'.
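An alternative sketch that avoids the string replacement: put the sign in front of the prefix yourself and format the magnitude with 'X'. to_hex is a hypothetical name:

```python
def to_hex(value):
    # Format the magnitude with uppercase digits ("X") and put any minus sign
    # in front of the lowercase "0x" prefix, as hex() does.
    sign = "-" if value < 0 else ""
    return "{}0x{:X}".format(sign, abs(value))

print(to_hex(1234))    # 0x4D2
print(to_hex(-255))    # -0xFF
```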
By using uppercase %X:
>>> print("%X" % 255)
FF
Updating for the Python 3.6 era: just use 'X' in the format spec inside f-strings:
print(f"{255:X}")
(f-strings accept any valid Python expression before the :, including numeric literals and variable names.)
The more idiomatic Python 3 version, using f-strings, would be:
value = 1234
print(f'0x{value:X}')
0x4D2
Notes (and why this is not a duplicate):
shows how to avoid capitalizing the '0x' prefix, which was an issue in other answers
shows how to get variable interpolation f'{value}'; nobody actually ever puts (hardcoded) hex literals in real code. There are plenty of pitfalls in doing variable interpolation: it's not f'{x:value}' nor f'{0x:value}' nor f'{value:0x}' nor even f'{value:%x}' as I also tried. So many ways to trip up. It still took me 15 minutes of trial-and-error after rereading four tutorials and whatsnew docs to get the syntax. This answer shows how to get f-string variable interpolation right; others don't.
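Since most of the trouble is the format-spec syntax, here is a sketch of a few specs side by side; the comments show what each line prints:

```python
value = 255
print(f"0x{value:X}")     # 0xFF   : literal lowercase prefix, uppercase digits
print(f"0x{value:04X}")   # 0x00FF : digits zero-padded to width 4
print(f"{value:#X}")      # 0XFF   : '#' adds the prefix, but uppercases it too
print(f"{value:#06x}")    # 0x00ff : with '#', the width includes the prefix
print(f"{value:0x}")      # ff     : '0' is a zero-pad flag here, not a prefix
```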
