Aligning strings in Python

I searched for ways to create aligned strings in Python and found some relevant material, but it didn't work for me. Here's one example:
for line in [[1, 128, 1298039], [123388, 0, 2]]:
    print('{:>8} {:>8} {:>8}'.format(*line))
Output:
       1      128  1298039
  123388        0        2
This is what I see in the shell:
As you can see, the alignment didn't happen. The same problem arises when using \t.
What can I do to align the strings in a neat, tabular format?

You have configured your IDLE shell to use a proportional font, one that uses different widths for different characters. Notice how the () pair takes almost the same amount of horizontal space as the > character above it.
Your code is otherwise entirely correct; with a fixed-width font the numbers will line up correctly.
Switch to a fixed-width font instead. Courier is a good default choice, but Windows has various other fixed-width fonts installed, including Consolas.
Configure the font in the Options -> Configure IDLE menu. Pick a different font from the Font Face list. The sample characters in the panel below should line up (except for the k at the end of the second line, which should stick out).
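To confirm that the formatting itself is correct regardless of the font, one can check that every formatted line contains the same number of characters. A minimal sketch using the rows from the question (the helper name format_rows is made up for illustration):

```python
def format_rows(rows, width=8):
    """Right-align every value in a column `width` characters wide."""
    template = ' '.join(['{:>%d}' % width] * len(rows[0]))
    return [template.format(*row) for row in rows]

lines = format_rows([[1, 128, 1298039], [123388, 0, 2]])
for line in lines:
    print(repr(line))  # repr() makes the padding spaces visible

# Every line has the same character count, so a fixed-width font
# will line the columns up; a proportional font will not.
print({len(line) for line in lines})
```

If the set printed at the end contains a single length, any remaining misalignment comes from the font rather than from str.format().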

Related

tkinter wraplength units is pixels?

Re: wraplength for tkinter.Label, from the NM Tech docs by John Shipman: "You can limit the number of characters in each line by setting this option to the desired number. The default value, 0, means that lines will be broken only at newlines." Other sources agree that the units for wraplength are characters.
The code below seems to be breaking the line as if the units of wraplength were pixels, not characters. If I set wraplength to 10, for example, the label displays a column of text one or two characters wide. If I set wraplength to 20, the lines are 3 or 4 characters long.
In my application, the user will be creating his own simple widgets for custom forms, and it would be better if the wraplength option used character count for units instead of whatever it is doing. Since the NMTech docs are for Tkinter 8.5, but 8.6 is what comes with my Python 3.5, maybe that explains the difference, but I don't see docs for 8.6.
But ideally, lines should wrap at the nearest space between words with wraplength just used as a maximum line length. So if the user has to type in his own \n anyway, wraplength seems useless to me.
Summary: in a GUI for users who will be inputting option values for their own simple widgets, 1) is there a way to get Tkinter to accept characters for wraplength units and 2) can I get the line to break at the nearest space short of the wraplength with wraplength used as a maximum line length only, instead of an absolute line length?
Thanks for any solutions or suggestions.
import tkinter as tk
root = tk.Tk()
sib = tk.Label(root, text='Give em hell Harry', wraplength=10)
sib.grid()
root.mainloop()

Is there a way to get Tkinter to accept characters for wraplength units
No, there is not. If you need wrapping to happen at word boundaries you can use a text widget rather than a label.
Unless specified otherwise, the units are in pixels. From the canonical tk documentation:
wraplength For widgets that can perform word-wrapping, this option specifies the maximum line length. Lines that would exceed this length are wrapped onto the next line, so that no line is longer than the specified length. The value may be specified in any of the standard forms for screen distances. If this value is less than or equal to 0 then no wrapping is done: lines will break only at newline characters in the text.
In the above text, "any of the standard forms for screen distances" refers to the fact that you can use a suffix to specify the distance in printers points (eg: "72p"), centimeters (eg: "2.54c"), millimeters (eg: "1000m"), or inches (eg: "1i")

This answer comes two years late, but if anyone sees it and needs to wrap text in a label based on characters or words, the textwrap module can help. It's very useful if you don't want words to get cut off between lines but still want to keep them in a label widget.
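A minimal sketch of the textwrap approach, using the label text from the question. The helper name wrap_for_label is made up for illustration; textwrap.fill is the standard-library call doing the work:

```python
import textwrap

def wrap_for_label(text, max_chars):
    """Break `text` at spaces so that no line exceeds `max_chars` characters."""
    return textwrap.fill(text, width=max_chars)

wrapped = wrap_for_label('Give em hell Harry', 10)
print(wrapped)
# Give em
# hell Harry
```

The wrapped string can then be passed as the text option of a tk.Label while leaving wraplength at its default of 0, so that the label breaks lines only at the newlines that textwrap inserted. Note that by default textwrap splits a single word that is longer than the requested width.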

Formatting string not working

I have done some research on formatting string but it does not want to work for me. I have this
for i in range(0,10):
    stat = arr[i]
    highscoreText = GameFont.render('{0:12}{1:>0}'.format(stat["Name"],stat["Score"]),2,(255,255,255))
    Screen.blit(highscoreText,[50,50 + (i*30)])
Output: http://prntscr.com/b9abfw
The name works but I can't seem to make the Score align to the right.

The string formatting works as expected; try printing the formatted string in a console. The problem is the font you use. Notice that the ll in hello takes up the same width as the k below it.
To solve this you have to render names and scores separately and then blit them at appropriate positions.
Or you can change the font you use to a monospace one, like Courier or DejaVu Sans Mono.

String formatting assumes that you are using a monospace font. Since you have decided to use a proportional font you will need to draw as separate blocks and use the graphics routines to align each block to the right.
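The separate-blocks idea can be sketched without pygame by computing only the blit positions. In the sketch below, the names layout_scores, right_aligned_x and score_right are hypothetical, and measure stands for any function that returns the rendered pixel width of a string; with pygame this could be lambda s: GameFont.size(s)[0], assuming GameFont is a pygame Font object as in the question.

```python
def right_aligned_x(right_edge, text_width):
    """x coordinate at which text of `text_width` pixels must start
    so that it ends exactly at `right_edge`."""
    return right_edge - text_width

def layout_scores(entries, measure, name_x=50, score_right=400,
                  top=50, line_height=30):
    """Return (text, (x, y)) pairs: names left-aligned at `name_x`,
    scores right-aligned at `score_right` (a hypothetical column edge)."""
    placements = []
    for i, entry in enumerate(entries):
        y = top + i * line_height
        score = str(entry["Score"])
        placements.append((entry["Name"], (name_x, y)))
        placements.append((score, (right_aligned_x(score_right, measure(score)), y)))
    return placements

# Stand-in for a proportional font: 'l' and '1' are narrow, the rest wide.
def fake_measure(text):
    return sum(4 if ch in 'l1' else 10 for ch in text)

placements = layout_scores(
    [{"Name": "hello", "Score": 1100}, {"Name": "kk", "Score": 25}],
    fake_measure)
for text, pos in placements:
    print(text, pos)
```

Each returned pair would then be rendered and blitted on its own, so the right edge of every score lands on the same pixel column no matter how wide the individual glyphs are.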

Force LaTeX font to match default matplotlib font

I have seen this issue pop up here and there but have yet to find a suitable answer.
When making a plot in matplotlib, the only way to insert symbols and math functions (like fractions, exponents, etc...) is to use TeX formatting. However, by default TeX formatting uses a different font AND italicizes the text. So for example, if I wanted an axis label to say the following:
photons/cm^2/s/Angstrom
I have to do the following:
ax1.set_ylabel(r'Photons/$cm^2$/s/$\AA$')
This produces a very ugly label that uses 2 different fonts and has bits and pieces italicized.
How do I permanently change the font of TeX (Not the other way around) so that it matches the default font used by matplotlib?
I have seen other solutions that tell the user to manually make all text the same in a plot by using \mathrm{} for example but this is ridiculously tedious. I have also seen solutions which change the default font of matplotlib to match TeX which seem utterly backwards to me.

It turns out the solution was rather simple and a colleague of mine had the solution.
If I were to use this line of code to create a title:
fig.suptitle(r'$H_2$ Emission from GJ832')
The result would be "H2 Emission from GJ832" with the H italicized, which illustrates the problem I was having. It turns out that anything between the dollar signs is typeset as math, and math is italicized by default.
If we change that line of code to the following:
fig.suptitle(r'H$_2$ Emission from GJ832')
Then the result is "H2 Emission from GJ832" without the italics. So this is an example of where we can constrain the math type to include only the math parts of the text, namely creating the subscript of 2.
However, if I were to change the code to the following:
fig.suptitle(r'H$_{two}$ Emission from GJ832')
the result would be "Htwo Emission from GJ832" with "two" italicized, which introduces the italics again. In this case, and for any case where you must have text (or are creating unit symbols) inside the dollar signs, you can easily remove the italics the following way:
fig.suptitle(r'H$_{\rm two}$ Emission from GJ832')
or in the case of creating a symbol:
ax2.set_xlabel(r'Wavelength ($\rm \AA$)')
The former results in "Htwo Emission from GJ832"
and the latter in "Wavelength (A)"
where A is the Angstrom symbol.
Both of these produce the desired result with nothing italicized by calling \rm before the text or symbol in the dollar signs. The result is nothing italicized INCLUDING the Angstrom symbol created by \AA.
While this doesn't change the default of the TeX formatting, it IS a simple solution to the problem and doesn't require any new packages. Thank you Roland Smith for the suggestions anyway. I hope this helps others who have been struggling with the same issue.

For typesetting units, use the siunitx package (with mode=text) rather than math mode.
Update: The above is only valid when you have defined text.usetex : True in your rc settings.
From the matplotlib docs:
Note that you do not need to have TeX installed, since matplotlib ships its own TeX expression parser, layout engine and fonts.
And:
Regular text and mathtext can be interleaved within the same string. Mathtext can use the Computer Modern fonts (from (La)TeX), STIX fonts (which are designed to blend well with Times) or a Unicode font that you provide. The mathtext font can be selected with the customization variable mathtext.fontset.
Reading this, it sounds as though setting mathtext.fontset to match the regular font that matplotlib uses would solve the problem if you don't use TeX.
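If TeX is not in use (text.usetex is False), one related setting worth trying is the mathtext.default rc parameter, whose special value 'regular' asks mathtext to use the same font as regular text. A minimal sketch; the helper name apply_mathtext_settings is made up for illustration, and it accepts any dict-like object so that it can be tried without matplotlib installed:

```python
# rc settings for matplotlib's built-in mathtext (assumes text.usetex is False).
MATHTEXT_SETTINGS = {
    # 'regular' means: typeset math in the same font as ordinary text,
    # which also removes the default italics.
    "mathtext.default": "regular",
}

def apply_mathtext_settings(rc_params):
    """Update an rcParams-like mapping in place and return it."""
    rc_params.update(MATHTEXT_SETTINGS)
    return rc_params

# Typical use (requires matplotlib):
#   import matplotlib as mpl
#   apply_mathtext_settings(mpl.rcParams)
#   ax1.set_ylabel(r'Photons/cm$^2$/s/$\AA$')

rc = apply_mathtext_settings({})
print(rc)
```

Whether every symbol (for example the Angstrom sign) is available in the regular font depends on the font, so the result is worth checking on the actual plot.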

Lines vs rows in the terminal

There appears to be some concept of lines vs rows in terminal emulators, about which I'd like to know more.
Demonstration of what I mean by rows vs lines
The Python script below displays three lines of 'a' and waits, then does the same with three lines of 'b'.
import sys, struct, fcntl, termios
write = sys.stdout.write
def clear_screen(): write('\x1b[2J')
def move_cursor(row, col): write('\x1b['+str(row)+';'+str(col)+'H')
def current_width(): #taken from blessings so this example doesn't have dependencies
    return struct.unpack('hhhh', fcntl.ioctl(sys.stdout.fileno(), termios.TIOCGWINSZ, '\000' * 8))[1]
clear_screen()
for c in 'ab':
    #clear_screen between loops changes this behavior
    width = current_width()
    move_cursor(5, 1)
    write(c*width+'\n')
    move_cursor(6, 1)
    write(c*width+'\n')
    move_cursor(7, 1)
    write(c*width+'\n')
    sys.stdout.flush()
    try: input() # pause and wait for ENTER in python 2 and 3
    except: pass
If you narrow the terminal window width by one character during this break, you see
That seems pretty reasonable - each line has been separately wrapped. When we hit enter again to print bs,
Everything works as expected. I've used absolute cursor positioning, and written to the same rows I wrote to previously - which of course doesn't overwrite all of the a's, because many of them are on other rows.
However, when we narrow the window by one more character, the wrapping works differently:
Why did the second and third rows of b wrap together, and why did the last line of a's merge with the first line of b's? A hint is in the top visible row above: we see two a's because these two rows are still linked, so of course if we resize the window again, that one line will continue to wrap the same way. This seems to happen even for lines in which we replaced a whole row.
It turns out that the rows that had wrapped before are now linked to their corresponding parent rows; it's more obvious that they belong to the same logical line once we widen the terminal a lot:
My question
Practically, my question is how to prevent or predict this rows-being-combined-into-lines. Clearing the whole screen eliminates the behavior, but it would be nice to do this only for individual lines that need it if possible so I can keep the caching by line that is significantly speeding up my application. Clearing to the end of a row unlinks that row from the row below it, but clearing to the beginning of a row does not unlink that row from the one above it.
I'm curious - what are these line things? Where can I read about them? Can I find out which rows are part of the same line?
I've observed this behavior with terminal.app and iterm, with and w/o tmux. I imagine source-diving into any of these would yield an answer even if there's no spec - but I imagine there's a spec somewhere!
Background: I'd like to make a terminal user interface that can predict the way terminal wrapping will occur if the user decreases the window width. I'm aware of things like fullscreen mode (tput smcup, or python -c 'print "\x1b[?1049h"', which are what ncurses uses) which would work for preventing line wrap, but don't want to use it here.
Edit: made it more clear that I understand the overwriting behavior of the script already and want an explanation of the wrapping behavior.

OK. Let's start with the causes for the behavior you are seeing:
I tested your code and noticed that it only happened when you resized the window. When the window was left alone, it would write out the a's, and upon pressing enter would over-write them with b's (I assume that's the intended behavior).
What appears to be happening is that when you resize the window partway through, the line indices change, so that on your next iteration, you can't trust the same coordinates when you call move_cursor().
Interestingly, when you resize the window, the word wrapping pushes the text before the cursor upwards. I assume this is part of the terminal emulator's code (since we almost always want to retain focus on the cursor and if the cursor is at the bottom of the screen, resizing might obscure it beyond the window's height if the word-wrapping pushed it downwards).
You'll notice that after a resize when you hit enter, only two lines of a's remain visible (and not all 3). Here's what appears to be happening:
First we begin with the initial output. (line numbers added for clarity)
1
2
3
4
5 aaaaaaaaaaaaaaa\n
6 aaaaaaaaaaaaaaa\n
7 aaaaaaaaaaaaaaa\n
8
Note that there is a new line character at the end of each of these lines (which is why your cursor appears below the last despite your not having moved the cursor again)
When you shrink the window by one character, this happens:
1
2 aaaaaaaaaaaaaa
3 a\n
4 aaaaaaaaaaaaaa
5 a\n
6 aaaaaaaaaaaaaa
7 a\n
8
You'll notice what I mean by "pushing the text upwards"
Now when you hit enter and your loop reiterates, the cursor is sent to row 5 col 1 (as specified by your code) and is placed directly over the last a of the second line. When it starts writing b's it overwrites the last a of the second line with b's and the subsequent line as well.
1
2 aaaaaaaaaaaaaa
3 a\n
4 aaaaaaaaaaaaaa
5 bbbbbbbbbbbbbb\n
6 bbbbbbbbbbbbbb
7 bbbbbbbbbbbbbb\n
8
Importantly, this also overwrites the new-line character at the end of the second line of a's. This means that there is now no new-line dividing the second line of a's and the first line of b's, so when you expand the window: they appear as a single line.
1
2
3
4
5 aaaaaaaaaaaaaaa\n
6 aaaaaaaaaaaaaabbbbbbbbbbbbbb\n
7 bbbbbbbbbbbbbbbbbbbbbbbbbbbb\n
8
I'm not totally sure why this second line of b's also gets joined, but it likely has something to do with the fact that the line of a's which the first one overwrites is now missing its own new-line termination. However, that's just a guess.
The reason why you get two characters of line-wrap if you try to shrink the window by yet another character is because now you are shrinking two halves of the same line of text, which means that one pushes on the other, causing two characters instead of one at the end.
For example: in these test windows I've shown, the width begins at 15 characters, I then shrink it to 14 and print out the b's. There is still one line of a's which is 15 chars long, and now a line of 14 a's & 14 b's which is line-wrapped at 14 chars. The same (for some reason) is true of the last two rows of b's (they are one line of 28 chars, wrapped at 14). So when you shrink the window by one more character (down to 13): the first line of 15 a's now has two trailing characters (15 - 13 = 2); the next line of 28 chars now has to fit in a 13 character-wide window (28 / 13 = 2 R2), and the same applies to the last b's as well.
0 aaaaaaaaaaaaa
1 aa\n
2 aaaaaaaaaaaaa
3 abbbbbbbbbbbb
4 bb\n
5 bbbbbbbbbbbbb
6 bbbbbbbbbbbbb
7 bb\n
8
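The arithmetic above can be reproduced with a simplified model in which the terminal stores logical lines (the text between two newlines) and re-splits each of them into rows whenever the width changes. This is only a sketch of that model, not the code of any particular emulator, and the function name rewrap is made up for illustration:

```python
def rewrap(paragraphs, width):
    """Split each logical line into rows of at most `width` characters,
    the way a rewrapping terminal might after a resize."""
    rows = []
    for paragraph in paragraphs:
        # An empty logical line still occupies one row.
        chunks = [paragraph[i:i + width]
                  for i in range(0, len(paragraph), width)] or ['']
        rows.extend(chunks)
    return rows

# Logical lines as described above, after the b's were written at width 14:
# one line of 15 a's, one of 14 a's + 14 b's, and one of 28 b's.
paragraphs = ['a' * 15, 'a' * 14 + 'b' * 14, 'b' * 28]
rows = rewrap(paragraphs, 13)
for row in rows:
    print(row)
```

At width 13 this prints the same eight rows as the diagram above: the 15-character line leaves two trailing a's, and each 28-character line becomes two full rows plus a row of two characters.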
Why does it work like this?:
This sort of stuff is the difficulty you run into when you are trying to run your program within another program that has the power to reposition the text as it sees fit. In the event of a resize your indices become unreliable. Your terminal emulator is trying to handle the realignment for you and is pushing the text before your prompt (which is fixed at row 8) up and down in the scroll-back to ensure you can always see your active prompt.
Rows and columns are something defined by the terminal/terminal emulator and it is up to it to interpret their location accordingly. When the appropriate control sequences are given it is the terminal which interprets them accordingly for proper display.
Note that SOME terminals do behave differently and in an emulated terminal there is often a setting to change what sort of terminal it is emulating, which may also affect how certain escape sequences respond. This is why a UNIX environment usually has a setting or environment variable ($TERM) which tells it which type of terminal it is communicating with so that it knows what control sequences to send.
Most Terminals use standard ANSI compliant control sequences, or systems based on the DEC VT series of Hardware Terminals.
In the Terminal.app preferences under Preferences->Settings->Advanced you can actually see (or change) which type of Terminal is being emulated by your window in the drop-down menu next to "Declare terminal as:"
How to overcome this:
You might be able to mitigate this by storing the last known width and checking to see if there has been a change. In which case you can change your cursor logic to compensate for the changes.
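A minimal sketch of that idea, using shutil.get_terminal_size from the standard library (Python 3.3+). The class name WidthTracker is made up for illustration, and the width getter can be swapped out, which also makes the logic easy to try without resizing a real window:

```python
import shutil

class WidthTracker:
    """Remember the last known terminal width and report changes."""

    def __init__(self, get_width=None):
        # By default, ask the operating system for the current column count.
        self._get_width = get_width or (lambda: shutil.get_terminal_size().columns)
        self.last_width = self._get_width()

    def changed(self):
        """Return True if the width differs from the last recorded value."""
        current = self._get_width()
        if current != self.last_width:
            self.last_width = current
            return True
        return False

tracker = WidthTracker()
print(tracker.last_width, tracker.changed())
```

Calling changed() before each redraw tells the program whether the row coordinates it cached earlier can still be trusted.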
Alternately you might consider using escape sequences designed for relative cursor movement (as opposed to absolute) to avoid accidentally overwriting previous lines after a resize. There is also the ability to save and restore specific cursor locations using only escape sequences.
Esc[<value>A Up
Esc[<value>B Down
Esc[<value>C Forward
Esc[<value>D Backward
Esc[s Save Current Position
Esc[u Restore Last Saved Position
Esc[K Erase from cursor position to end of line
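The sequences in the list above can be wrapped in small helpers. This is a sketch with made-up helper names; the save and restore sequences in particular are not supported by every terminal, so treat their availability as an assumption:

```python
import sys

CSI = '\x1b['  # the "Esc[" prefix used in the list above

def cursor_up(n=1):
    return '%s%dA' % (CSI, n)

def cursor_down(n=1):
    return '%s%dB' % (CSI, n)

def cursor_forward(n=1):
    return '%s%dC' % (CSI, n)

def cursor_backward(n=1):
    return '%s%dD' % (CSI, n)

SAVE_CURSOR = CSI + 's'
RESTORE_CURSOR = CSI + 'u'
ERASE_TO_EOL = CSI + 'K'

def rewrite_previous_line(text):
    """Move up one row, go to column 1, clear to the end of the row,
    then write `text` followed by a newline."""
    return cursor_up(1) + '\r' + ERASE_TO_EOL + text + '\n'

sys.stdout.write('first version\n')
sys.stdout.write(rewrite_previous_line('second version'))
sys.stdout.flush()
```

Because the movement is relative to wherever the cursor currently is, this does not depend on absolute row numbers that a resize may have shifted.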
However you have no real guarantee that all Terminal emulators will deal with window resizes the same way (that's not really part of any terminal standard, AFAIK), or that it won't change in the future. If you are hoping to make a true terminal emulator, I suggest first getting your GUI window setup so that you can be in control of all resizing logic.
However, if you want to run in a terminal-emulator window and mitigate window resizes for a command-line utility that you're writing, I'd suggest looking at the curses library for Python. This is the sort of functionality used by all window-resize-aware programs that I know of off the top of my head (vim, yum, irssi), and it can deal with this sort of change, though I don't personally have any experience using it.
It's available for python via the curses module.
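A minimal sketch of a resize-aware redraw loop with curses, assuming a Unix-like terminal. The function names clip and main are made up for illustration; getmaxyx, getch and KEY_RESIZE are part of the standard curses module:

```python
import curses

def clip(text, width):
    """Trim text so it never reaches the last column
    (writing into the bottom-right cell can raise curses.error)."""
    return text[:max(width - 1, 0)]

def main(stdscr):
    while True:
        height, width = stdscr.getmaxyx()  # current size, updated after a resize
        stdscr.erase()
        for offset, ch in enumerate('ab'):
            row = 4 + offset
            if row < height:
                stdscr.addstr(row, 0, clip(ch * width, width))
        stdscr.refresh()
        key = stdscr.getch()               # returns curses.KEY_RESIZE on a resize
        if key == ord('q'):
            break
        # Any other key, including curses.KEY_RESIZE, simply triggers a redraw.

# To try it in a real terminal: curses.wrapper(main)
```

Note that curses normally switches to the alternate screen, which the question wanted to avoid, so this fits only if that constraint can be relaxed.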
(and please, if you plan on redistributing your program, consider writing it in Python3. Do it for the children :D)
Resources:
These links might be helpful:
ANSI Escape Sequences
VT100 Escape Sequences
I hope that helps!

As 0x783czar pointed out, the key difference is whether an explicit newline was printed which caused the terminal to begin a new row, or there was an implicit overflow because there was no room left on the right to print the desired characters.
It's important to remember this at the end of each line for copy-pasting purposes (whether there'll be a newline character in the buffer or not), for triple-click highlight behavior in many terminals, and for rewrapping the contents when the window is resized (in those terminals that support it).
Applications running inside terminals hardly ever care about this difference, and they use the words "line" and "row" interchangeably. Hence, when we implemented rewrapping the contents on resize in gnome-terminal, we preferred the words "row" or "line" for one single visual line of the terminal, and the word "paragraph" for the contents between two adjacent newline characters. A paragraph wraps into multiple lines if it's wider than the terminal. (This is not by any means an official terminology, but IMO is quite reasonable and helps talk about these concepts.)

Display width of unicode strings in Python [duplicate]

How can I determine the display width of a Unicode string in Python 3.x, and is there a way to use that information to align those strings with str.format()?
Motivating example: Printing a table of strings to the console. Some of the strings contain non-ASCII characters.
>>> for title in d.keys():
...     print("{:<20} | {}".format(title, d[title]))
zootehni-            | zooteh.
zootekni-            | zootek.
zoothèque           | zooth.
zooveterinar-        | zoovet.
zoovetinstitut-      | zoovetinst.
母                    | 母母
>>> s = 'è'
>>> len(s)
2
>>> [ord(c) for c in s]
[101, 768]
>>> unicodedata.name(s[1])
'COMBINING GRAVE ACCENT'
>>> s2 = '母'
>>> len(s2)
1
As can be seen, str.format() simply takes the number of code-points in the string (len(s)) as its width, leading to skewed columns in the output. Searching through the unicodedata module, I have not found anything suggesting a solution.
Unicode normalization can fix the problem for è, but not for Asian characters, which often have larger display width. Similarly, zero-width unicode characters exist (e.g. zero-width space for allowing line breaks within words). You can't work around these issues with normalization, so please do not suggest "normalize your strings".
Edit: Added info about normalization.
Edit 2: My original dataset also has some European combining characters that don't result in a single code-point even after normalization:
zwemwater | zwemw.
zwia̢z- | zw.
>>> s3 = 'a\u0322' # The 'a + combining retroflex hook below' from zwiaz
>>> len(unicodedata.normalize('NFC', s3))
2

You have several options:
Some consoles support escape sequences for pixel-exact positioning of the cursor. Might cause some overprinting, though.
Historical note: This approach was used in the Amiga terminal to display images in a console window by printing a line of text and then advancing the cursor down by one pixel. The leftover pixels of the text line slowly built an image.
Create a table in your code which contains the real (pixel) widths of all Unicode characters in the font that is used in the console / terminal window. Use a UI framework and a small Python script to generate this table.
Then add code which calculates the real width of the text using this table. The result might not be a multiple of the character width in the console, though. Together with pixel-exact cursor movement, this might solve your issue.
Note: You'll have to add special handling for ligatures (fi, fl) and composites. Alternatively, you can load a UI framework without opening a window and use the graphics primitives to calculate the string widths.
Use the tab character (\t) to indent. But that will only help if your shell actually uses the real text width to place the cursor. Many terminals will simply count characters.
Create a HTML file with a table and look at it in a browser.
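None of the options above measures width in terminal cells. A different, commonly used approximation (not taken from the answer above) is to estimate the cell count with the standard unicodedata module: combining marks and format characters count as zero cells, East Asian wide and fullwidth characters as two, everything else as one. The helper names display_width and pad_to_width are made up for illustration, and real terminals and fonts may disagree with this estimate:

```python
import unicodedata

def display_width(text):
    """Estimate the number of terminal cells `text` occupies."""
    width = 0
    for ch in text:
        if unicodedata.category(ch) in ('Mn', 'Me', 'Cf'):
            continue  # combining marks and format characters take no cell
        # 'W' (wide) and 'F' (fullwidth) characters usually take two cells.
        width += 2 if unicodedata.east_asian_width(ch) in ('W', 'F') else 1
    return width

def pad_to_width(text, width):
    """Left-align `text` in a field of `width` cells, like '{:<N}'."""
    return text + ' ' * max(width - display_width(text), 0)

for title in ['zootehni-', 'zoothe\u0300que', 'zwia\u0322z-', '母']:
    print(pad_to_width(title, 20) + ' | ' + title)
```

Padding with pad_to_width before calling str.format() sidesteps the code-point counting that the question describes. Characters with ambiguous East Asian width are counted as one cell here, which is an assumption that does not hold in every terminal configuration.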
