Chinese text in Tkinter, Python

Is there anything I should do if I want to display Chinese characters in Tkinter in Python?
I want to display Chinese text on my labels. It mostly works, but the characters are not rendered evenly: some of them are bold and some are not.
label = ttk.Label(text = "晚上好")
label.pack()
In this the "晚" appears bold, whereas "上好" does not. Is there anything I can do about this?

Related

Unusual font when extracting text from PDF

I have been trying to extract text from PDF files and most of the files seem to work fine. However, one particular document has text in this unusual font: ｉｎ ｓｏｌｉｄ
I have tried extraction using PHP and then Python, and neither could fix this font. I also copied the text into text editing tools to see if I could fix it there, but couldn't do much. Please note that the original PDF document looks fine, but when the text is copied and pasted into a text editing tool, gaps appear between the characters. I am completely clueless about what to do. Please suggest a solution to fix this in PHP/Python (preferably PHP).

Before Unicode, some character encodings allowed you to compose Japanese/Korean/Chinese characters either as two half-width characters or as one full-width character. In that case, Latin characters could also be full width so that they mixed evenly with the other characters. You have Fullwidth Latin characters on your hands, and that's why they space out oddly.
You can normalize the string with NFKC (or NFKD) compatibility normalization to get regular Latin. This will also change any half-width/full-width Japanese/Korean/Chinese characters. I'm not sure exactly how, but I think they become their standard-width equivalents, possibly built from multiple code points when you use NFKD.
>>> import unicodedata
>>> t="ｉｎ ｓｏｌｉｄ"
>>> unicodedata.normalize("NFKC", t)
'in solid'
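As a slightly fuller sketch of the same idea, the normalization can be wrapped in a small helper. The function name `to_plain_width` is made up for illustration, and the sample string is written with escapes so the Fullwidth code points are unambiguous.

```python
import unicodedata

def to_plain_width(text):
    """Fold full-width/half-width compatibility forms using NFKC."""
    return unicodedata.normalize("NFKC", text)

# "in solid" spelled with Fullwidth Latin letters (block U+FF01..U+FF5E)
fullwidth = "\uff49\uff4e \uff53\uff4f\uff4c\uff49\uff44"

print(to_plain_width(fullwidth))
```

Note that NFKC also folds other compatibility characters (ligatures, superscripts and so on), so it may change more than just the width of the text.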

Unicode Displays Incorrectly in tkinter

I am trying to display Indian languages in tkinter GUI. I am using Python 3 and tkinter is of version 8.6.
In my code, python seems to handle the languages correctly because when I print them, the font and the character sequence are correct. But when I display the text on the tkinter GUI (Label, Text or Canvas) they are getting jumbled up or are not handled correctly.
The font seems to be not the issue as the language itself is correctly picked and many of the letters are correct.
I had a look at an 8-year-old thread, "tkinter cannot display unicode characters correctly". My problem seems similar, but no solution was given there either.
I am pasting the simplified version of the code below. Please note - all fonts used are installed in my system.
import tkinter as tk
from tkinter import ttk

root = tk.Tk()
text = 'श्वसन प्रणाली में नाक गुहा, ट्रेकिआ और फेफड़े होते हैं'
# Braces keep the two-word family name together without a backslash escape
labelcheck = ttk.Label(text=text, font="{Lohit Devnagri}")
textcheck = tk.Text()
textcheck.insert(tk.END, text)
canvascheck = tk.Canvas(root, width=800, height=200)
canvascheck.create_text(200, 20, font="{Lohit Devnagri}", text=text)
labelcheck.grid(row=0, column=0)
textcheck.grid(row=0, column=1)
canvascheck.grid(row=1, column=0)
print(text)
root.mainloop()
The text printed in the console is an exact match of the text in the code. The text in the tkinter UI is in the image link below.
Note that there are minor differences for a person who does not know the language but for the native speaker/reader the changes are not trivial.
So the question is: does tkinter not handle all Unicode characters correctly? Should I stop trying to make this work? I am developing an application on a Raspberry Pi, so I do not want to move away from tkinter, as it is very responsive and lightweight.
Any help here would be invaluable to me.
Edit 1:
Following progmatico's suggestion, I took the Unicode sequence of the first few words and applied the normalize API to them. I still see the same issue: what Python prints is correct, while what tkinter's GUI shows is incorrect.
import unicodedata

# \u0020 is the space between the two words ('\20' would be an octal escape for U+0010)
unicodetext = '\u0936\u094D\u0935\u0938\u0928\u0020\u092A\u094D\u0930\u0923\u093E\u0932\u0940'
text1 = unicodedata.normalize('NFC', unicodetext)
text2 = unicodedata.normalize('NFD', unicodetext)
text3 = unicodedata.normalize('NFKD', unicodetext)

The issue is not with tkinter; it looks like an OS issue. The same application with the same Python version (3.8) displays the Unicode characters correctly on Windows. On Ubuntu and Raspbian the problem persists. I will look into that in the coming days. But the issue is neither tkinter's nor Python's. Thanks all for helping out.
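One way to check that the string itself is not the culprit is to see whether normalization changes it at all. The sketch below is only an illustration: `sample` holds the first two words from the question (assuming the separator is an ordinary space), and `unchanged_forms` is a made-up helper name.

```python
import unicodedata

# First two words of the question's text, separated by a plain space (U+0020)
sample = "\u0936\u094D\u0935\u0938\u0928\u0020\u092A\u094D\u0930\u0923\u093E\u0932\u0940"

def unchanged_forms(text):
    """Return the normalization forms that leave `text` exactly as it is."""
    forms = ("NFC", "NFD", "NFKC", "NFKD")
    return [form for form in forms if unicodedata.normalize(form, text) == text]

print(unchanged_forms(sample))

# List each code point with its Unicode name for a manual comparison
for ch in sample:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
```

If every form leaves the string unchanged, normalizing before handing the text to a widget cannot alter what is drawn, which fits the conclusion that the difference comes from outside Python.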

How to change the font family/style when printing to the console?

The question is straightforward, is it possible to change the font family of text in a Python print() output? Like Times New Roman, Arial, or Comic Sans?
I only want to change some of the output. Not all of the text like in this question.
I'm using Python 3 and Jupyter Notebook on a Mac.
I know it's possible to make certain text bold like so:
bold_start = '\033[1m'
bold_end = '\033[0m'
print(bold_start, "Hello", bold_end, "World")
This prints "Hello World" with "Hello" in bold, but what I want is "Hello World" with part of the text in a different font family.

Python strings are just strings of Unicode characters; they don't say anything about font one way or another. The font is determined by whatever is rendering the characters, e.g. the terminal program or the browser you're using. The print function just spits out the resulting string.
As you pointed out, if you're in a terminal that understands those escape sequences, then you can use them to affect the output. If your output is a web page, then you can embed HTML to specify whatever you like, but all the Python interpreter sees is a string of characters, not a string of characters in any particular font.
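To make the distinction concrete, here is a minimal sketch that builds both kinds of string. The helper names `ansi_bold` and `html_font` are invented for this example; either way Python only produces characters, and the renderer decides what they look like.

```python
import html

BOLD, RESET = "\033[1m", "\033[0m"

def ansi_bold(text):
    """Wrap text in ANSI escapes; only terminals that understand them show bold."""
    return f"{BOLD}{text}{RESET}"

def html_font(text, family):
    """Wrap text in a span that asks an HTML renderer for a font family."""
    safe_text = html.escape(text)
    safe_family = html.escape(family, quote=True)
    return f'<span style="font-family: {safe_family}">{safe_text}</span>'

print(ansi_bold("Hello"), "World")
print(html_font("Hello", "Times New Roman"))
```

In a notebook, the HTML string would still print as plain markup unless it is handed to something that renders HTML, for example `IPython.display.HTML` (assuming IPython is available, as it is in Jupyter).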

the Georgian language in tkinter. Python

I cannot type standard Georgian in the Text widget; it shows question marks instead of letters.
Outside tkinter, i.e. when writing code, the Georgian characters are recognized without problems. Also, if I copy a word written in Georgian and paste it into the Text widget, it is displayed correctly.
This is the elementary code that displays the text box on the screen, where I want to type a word in Georgian.
import tkinter
root = tkinter.Tk()
txt = tkinter.Text(root)
txt.pack()
root.mainloop()
The first image shows how the word is displayed when the Georgian language is selected.
The second shot shows what happens when I write the Georgian text in the code and set the value of the text field in advance. In this case, the text in the field is displayed normally.

Okay, so here is how I achieved it:
First, make sure you have a Georgian font installed on your computer; if you don't have one, download one (I downloaded mine from here);
Now, go to your tkinter program, and add your font to your Text widget:
txt = tkinter.Text(root, font=("AcadNusx", 16))
NOTE 1: My font name that supports Georgian is AcadNusx, but yours can be different;
NOTE 2: If you have not imported Font, then import it at the beginning of your program;
NOTE 3: Do not change your computer's font to Georgian, because you have already changed it inside the program, so make sure it is set to English.

The best answer I can determine so far is that there is something about Georgian and keyboard entry that tk does not like, at least not on Windows.
Character 'translation' is usually called 'transliteration'.
Tk text uses the Basic Multilingual Plane (the BMP, the first 2**16 codepoints) of Unicode. This includes the Georgian alphabet. The second image shows that the default Text widget font on your system is quite capable of displaying Georgian characters once the characters are in the widget. So a new display font does not seem to be the solution to your problem.
('ქართული ენა' is visible on Firefox because FF is unicode based and can display most if not all of the BMP.)
It looks like the problem is getting the proper codes to tk without going through your editor. What OS and editor are you using? How did you enter the mixed-alphabet line similar to
txt.insert('1.0', 'ქართული ენა') # ? (I cannot copy the image string.)
How are you running the Python code? If you cut the question marks from the first image and insert into
for c in '<insert here>': print(ord(c))
what do you see?
You need a 'Georgian keyboard entry' or 'input method' program (Google shows several) for your OS that will let you switch back and forth between sending ASCII and Georgian codes to any program reading the keyboard.
Windows now comes with this, with languages activated on a case-by-case basis. I already had Spanish entry, and with it I can enter á and ñ both here and into IDLE and a fresh Text box. However, when I add Georgian, I can type (randomly ;-) ჰჯჰფგეუგსკფ here (in Firefox, also MS Edge) but only get ?????? in tk Text boxes. And these are actual ASCII question marks, ord('?') = 63, rather than replacements for codes that cannot be represented. Japanese also works with the tk-Text-based IDLE. So the problem with Georgian is not generic to all non-Latin alphabets.
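The ord() check suggested above can be sketched a little more fully. The helper names below are made up for illustration; the Georgian range used is the Unicode Georgian block, U+10A0 to U+10FF, which lies inside the BMP.

```python
def code_points(text):
    """Return the integer code point of every character in text."""
    return [ord(ch) for ch in text]

def in_bmp(ch):
    """True if the character is in the Basic Multilingual Plane (below 2**16)."""
    return ord(ch) < 2**16

def is_georgian(ch):
    """True if the character falls in the Georgian block U+10A0..U+10FF."""
    return 0x10A0 <= ord(ch) <= 0x10FF

typed = "??????"          # what ends up in the Text widget
intended = "ქართული ენა"   # what was meant to be typed

print(code_points(typed))
print([hex(cp) for cp in code_points(intended)])
print(all(is_georgian(ch) for ch in intended if ch != " "))
```

If the widget's contents come back as 63 for every character, the letters were already replaced before tk displayed them, which points at keyboard input rather than at the font.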

Returning position where a label wraps text (tkinter)

I was wondering if it was possible to return the position or character where a label wraps text, as I would like to append a "\n" into the string being printed in the label after said character. Believe it or not, I do have a good reason for doing this, although it is somewhat complicated to explain.
