How to print overline character in python 3? - python

Please don't mark my question as already answered: none of the questions on Stack Overflow, and nothing in the Unicode HOWTO, made it clear to me how to print the overline (U+203E) character in Python 3. Can someone please explain in baby programmer language how to print Unicode characters like this one? I have tried some things, but to be honest I had no idea what I was doing.
I am working on Kubuntu Xenial (16.04).
When I try to print the character I get a UnicodeEncodeError. My question is: how do I work around this error?
EDIT 1: Problem located
I have now figured out that my locale is set to POSIX, which means ASCII encoding. I will try to set it to a UTF-8 encoding.
EDIT 2: Still no solution
I have found out what I need to change; I just haven't found out how. For anyone with the same issue, there's a comment with a link to a post where a similar problem is solved.
EDIT 3: Final answer
Here is a link to an Ask Ubuntu question where I asked how to edit my /etc/default/locale file. It turns out one command in the Linux shell was enough. A lot of things didn't work for me, but this command set my locale to en_US.UTF-8: sudo /usr/sbin/update-locale LANG=en_US.UTF-8. After I rebooted, the setting had been applied and my locale was changed.
Now I don't need the overline character anymore, because I have learned to work with graphics libraries, but I have had multiple problems because of my locale. Thanks to everyone for the advice!

Use \u to indicate a unicode character: print("\u203e").
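The same character can be spelled several ways in a Python 3 string literal. As a quick sketch, the \u escape, the named escape \N{OVERLINE} and chr() with the numeric code point all produce U+203E; the variable names are only for illustration. None of this changes what happens when the terminal's encoding cannot represent the character, which is the UnicodeEncodeError from the question.

```python
import unicodedata

# Three spellings of the same code point, U+203E OVERLINE.
by_escape = "\u203e"        # \u followed by exactly four hex digits
by_name = "\N{OVERLINE}"    # \N{...} looks the character up by its Unicode name
by_number = chr(0x203E)     # chr() turns an integer code point into a str

assert by_escape == by_name == by_number
print(unicodedata.name(by_escape))  # OVERLINE

# print(by_escape) only works when sys.stdout can encode U+203E (e.g. UTF-8).
```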

You need to use the combining character U+0304 instead.
print(u'a\u0304')
ā
U+0305 is probably a better choice (as viraptor suggests). You can also use the Unicode Roman numerals (U+2160 through U+217f) instead of regular uppercase Latin letters, although (at least in my terminal) they don't render as well with the overline.
print(u'\u2163\u0305')
Ⅳ̅
print(u'I\u0305V\u0305')
I̅V̅
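To overline a whole string with the combining character, one option is to append U+0305 after every character. This is a minimal sketch; the helper name overline is hypothetical, and how well the marks line up still depends on the terminal font, as noted above.

```python
COMBINING_OVERLINE = "\u0305"  # attaches to the character before it


def overline(text):
    """Return text with U+0305 COMBINING OVERLINE after every character.

    Hypothetical helper for illustration only.
    """
    return "".join(ch + COMBINING_OVERLINE for ch in text)


numeral = overline("IV")
print(ascii(numeral))  # 'I\u0305V\u0305'
```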

It seems like your sys.stdout.encoding is 'ascii'.
Try to set it to 'utf-8':
import sys, codecs
sys.stdout = codecs.getwriter('utf-8')(sys.stdout.detach())
print ('\u203e')
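To see why the stream encoding matters without touching the real terminal, the sketch below writes U+203E to in-memory text streams. It assumes Python 3.7 or later, where TextIOWrapper.reconfigure() exists; in a real program the equivalent would be sys.stdout.reconfigure(encoding='utf-8'), an alternative to wrapping stdout with codecs. The helper names are hypothetical.

```python
import io

OVERLINE = "\u203e"


def encode_through(encoding):
    """Write OVERLINE through a text stream that uses `encoding`.

    Returns the bytes that reached the underlying buffer, or None if the
    encoding cannot represent the character (the UnicodeEncodeError case).
    """
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding)
    try:
        stream.write(OVERLINE)
        stream.flush()
    except UnicodeEncodeError:
        return None
    return raw.getvalue()


def encode_after_reconfigure():
    """Start with an ASCII stream, then switch it to UTF-8 before writing."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding="ascii")
    stream.reconfigure(encoding="utf-8")  # same idea as sys.stdout.reconfigure(...)
    stream.write(OVERLINE)
    stream.flush()
    return raw.getvalue()


print(encode_through("ascii"))      # None
print(encode_through("utf-8"))      # b'\xe2\x80\xbe'
print(encode_after_reconfigure())   # b'\xe2\x80\xbe'
```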

print(u' \u0305'*k)
where k is the length of the line in characters.
e.g.
print(u' \u0305'*5)
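Note that len() of such a bar counts code points, not columns: each position is a space plus a combining mark. A rough sketch for checking the visible width, assuming only combining marks need to be skipped (wide East Asian characters and similar cases are ignored); both helper names are hypothetical.

```python
import unicodedata


def overline_bar(k):
    """A line of k spaces, each carrying U+0305 COMBINING OVERLINE."""
    return " \u0305" * k


def visible_length(s):
    """Count characters that take up a column by skipping combining marks.

    Rough approximation: ignores wide characters and other width rules.
    """
    return sum(1 for ch in s if not unicodedata.combining(ch))


bar = overline_bar(5)
print(len(bar), visible_length(bar))  # 10 5
```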

Related

Get rid of unicode characters in VSCode Interactive Python Environment

I'm using VSCode version 1.46.1 on macOS Catalina, with the built-in Python interactive terminal (Python 3.7.4). Whenever I print strings, they show up with Unicode escapes, making them difficult to read, like so:
\\u201cI like what we have.\\u201d It is quiet and there is somebody else in the room. I tell my dog that I need to go and he says, \\u201cjust alright.\\u201d ~~I am hungry.\\n\\n
I have tried every flavor of un-escaping escaped characters. See here:
Unescaping escaped characters in a string using Python 3.2
And
Using unicode character u201c
But to no avail. I think the problem lies in the encoding options built into VSCode itself, but I'm not sure how to modify that.
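Maybe this page could provide some information for you.
"\u201c" and "\u201d" stand for “ and ”, but in your output they appear as the literal text "\\u201c" and "\\u201d" (with the backslash itself escaped), so they are not converted; they should be "\u201c" and "\u201d".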

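If the text really contains literal backslash sequences (a backslash followed by u201c, which the REPL's repr shows as \\u201c), one way to turn them back into characters is the unicode_escape codec. This sketch assumes the input is pure ASCII: unicode_escape decodes bytes as Latin-1, so it would garble text that already contains non-ASCII characters. The helper name is hypothetical.

```python
def unescape_ascii(s):
    r"""Turn literal escape sequences such as \u201c or \n into real characters.

    Assumes s is pure ASCII; encode("ascii") raises UnicodeEncodeError
    for anything else instead of silently corrupting it.
    """
    return s.encode("ascii").decode("unicode_escape")


# Literal backslashes, as in the question's output.
raw = "\\u201cI like what we have.\\u201d\\n"
text = unescape_ascii(raw)
print(text == "\u201cI like what we have.\u201d\n")  # True
```
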
ANSI escape code won't work in the Python interpreter [duplicate]

This question already has answers here:
Python: How can I make the ANSI escape codes to work also in Windows?
ANSI codes won't work in my Python interpreter.
I wanted to color some of the output in my project. I looked up how to color printed characters and found the ANSI escape codes, so I tried them in the interpreter, but they won't work.
For example:
print("\033[32m Hello")
It gives me ←[32m Hello (with a left-arrow sign).
How do I make it work? Can it work in Python's interpreter? If not, where should I use it?
Note: this is a possible duplicate of this question, answered by this post. A reiteration of the answer by gary-vernon-grubb is posted below for convenience.
Use os.system('') to make sure the ANSI escape sequences are processed correctly in the Windows Command Prompt.
Ensure that there are no spaces between the ANSI escape sequence and the color code! This was a bit of a pain in the neck for me.
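A minimal sketch of that workaround, assuming Windows 10 or later. The os.system('') call is a widely shared trick rather than a documented switch: running an empty command through the shell appears to leave ANSI processing enabled in the console as a side effect. The helper green is hypothetical.

```python
import os

GREEN = "\033[32m"  # ESC, '[', colour code, 'm' with no spaces in between
RESET = "\033[0m"   # restore the default colour afterwards


def green(text):
    """Wrap text in ANSI codes for a green foreground (hypothetical helper)."""
    return GREEN + text + RESET


if os.name == "nt":
    # Workaround: an empty shell command makes the Windows console
    # interpret ANSI sequences for the rest of the program.
    os.system("")

print(green("Hello"))
```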
You are best off installing a package that helps generate the ANSI sequences; if I recall correctly, the win32 console does not support ANSI colours natively. One option is to install colorama. The following snippet prints red text.
import colorama
from colorama import Fore, Back, Style
colorama.init()
print(Fore.RED + 'This is red')
Edit: After researching a little more, I've realised that Windows 10 supports ANSI escape sequences, and you can use them by calling echo through os.system. Use this if you intend to copy and paste:
import os
os.system("echo \033[31mThis is red\033[0m")
However, I'd still prefer the former.
You could be using IDLE; in that case you can't have ANSI colours. The IDLE 'terminal' isn't really a terminal, so ANSI codes will show up as a character whether you type chr(0x1B), \033 or \x1b; they're all the same.
Your arrow character is normal; I just get a box, probably because the default font doesn't support left arrows.
But thatotherguy's explanation might be right, unless you're using IDLE, in which case that's definitely the problem.
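To make the "they're all the same" point concrete: every spelling below is one and the same ESC character (code point 27). Whether ESC[32m turns into a colour is decided by whatever displays the output, not by Python.

```python
# Four literals for the same single ESC control character.
ESC_SPELLINGS = [chr(0x1B), "\033", "\x1b", "\u001b"]

assert len(set(ESC_SPELLINGS)) == 1
print(ord(ESC_SPELLINGS[0]))  # 27
print(len("\033[32m"))        # 5: ESC, '[', '3', '2', 'm'
```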

Python UTF-8 REGEX

I have a problem while trying to find text specified in regex.
Everything works fine, but when I added "\£" to my regex it started causing problems. I get a SyntaxError: "Non-ASCII character '\xc2' in file (...) but no encoding declared...
I've tried to solve this problem by using
import sys
reload(sys) # to enable `setdefaultencoding` again
sys.setdefaultencoding("UTF-8")
but it doesn't help. The re.UNICODE flag doesn't help, and saving the string as unicode (pat) doesn't help either. Is there any way to fix this regex? I just want to build a regular expression and use the pound sign in it. Thanks for the help.
k = text.encode('utf-8')
pat = u'salar.{1,6}?([0-9\-,\. \tkFFRroOMmTtAanNuUMm\$\&\;\£]{2,})'
pattern = re.compile(pat, flags = re.DOTALL|re.I|re.UNICODE)
salary = pattern.search(k).group(1)
print (salary)
The error is still there even if I comment out all of those lines (prefix them with "#"). Maybe it's not connected with the re library but with my settings?
The error message means Python cannot guess which character set you are using. It also tells you that you can fix it by telling it the encoding of your script.
# coding: utf-8
string = "£"
or equivalently
string = u"\u00a3"
Without an encoding declaration, Python sees a bunch of bytes which mean different things in different encodings. Rather than guess, it forces you to tell it what they mean. This is codified in PEP-263.
(ASCII is unambiguous [except if your system is EBCDIC I guess] so it knows what you mean if you use a pure-ASCII representation for everything.)
The encoding settings you were fiddling with affect how files and streams are read, and program I/O generally, but not how the program source is interpreted.
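On Python 3 this particular error largely disappears, because source files are read as UTF-8 by default (PEP 3120), so a literal £ needs no declaration. A sketch of the question's pattern under that assumption, written as a raw string so the backslashes reach the re module untouched; the character class drops the duplicate letters and the unneeded escapes, and the find_salary helper and sample text are hypothetical.

```python
import re

# Python 3 reads source as UTF-8 by default, so "£" can appear literally.
SALARY_RE = re.compile(
    r"salar.{1,6}?([0-9\-,. \tkFRroOMmTtAanNuU$&;£]{2,})",
    re.DOTALL | re.IGNORECASE,  # re.UNICODE is already the default for str patterns
)


def find_salary(text):
    """Return the captured salary text, stripped, or None (hypothetical helper).

    Searches a str directly: in Python 3 a str pattern cannot search bytes,
    so there is no text.encode('utf-8') step.
    """
    match = SALARY_RE.search(text)
    return match.group(1).strip() if match else None


print(find_salary("Salary: £25,000 per year") == "£25,000")  # True
```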

Python Printing in Terminal

I am sorry to post what I think may be a very basic question, but my attempts at solving this have been futile, and I can't find a useful solution that has already been suggested to similar questions on this site.
My basic issue is this: I am attempting to run a file (coding UTF-8) as a program in the Mac Terminal (running Python 2.7.5). This works fine when I print the results of mathematical operations, but for some reason I cannot print a simple string of characters.
I have tried running both:
# coding: utf-8
print "Hello, World."
exit()
and
# coding: utf-8
print("Hello, World.")
exit()
Both return an invalid syntax error, with the caret pointing at the first quotation mark I've used. What am I missing here?
Thank you for your help!
It turned out that I needed to disable smart quotes in TextEdit.
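Smart quotes are easy to miss because they look almost like the ASCII ones. A small sketch (written for Python 3, unlike the Python 2.7 script in the question) that maps the common curly quotes back to straight ones and checks that the result compiles; the helper name is hypothetical.

```python
# Curly quotes that editors such as TextEdit may substitute automatically.
SMART_QUOTES = str.maketrans({
    "\u201c": '"',  # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',  # RIGHT DOUBLE QUOTATION MARK
    "\u2018": "'",  # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK
})


def straighten_quotes(source):
    """Replace curly quotes with straight ASCII ones (hypothetical helper)."""
    return source.translate(SMART_QUOTES)


broken = "print(\u201cHello, World.\u201d)"
fixed = straighten_quotes(broken)
compile(fixed, "fixed.py", "exec")  # no SyntaxError once the quotes are straight
print(fixed)  # print("Hello, World.")
```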

How do I handle Python unicode strings with null-bytes the 'right' way?

Question
It seems that PyWin32 is comfortable with giving null-terminated unicode strings as return values. I would like to deal with these strings the 'right' way.
Let's say I'm getting a string like: u'C:\\Users\\Guest\\MyFile.asy\x00\x00sy'. This appears to be a C-style null-terminated string hanging out in a Python unicode object. I want to trim this bad boy down to a regular ol' string of characters that I could, for example, display in a window title bar.
Is trimming the string off at the first null byte the right way to deal with it?
I didn't expect to get a return value like this, so I wonder if I'm missing something important about how Python, Win32, and unicode play together... or if this is just a PyWin32 bug.
Background
I'm using the Win32 file chooser function GetOpenFileNameW from the PyWin32 package. According to the documentation, this function returns a tuple containing the full filename path as a Python unicode object.
When I open the dialog with an existing path and filename set, I get a strange return value.
For example I had the default set to: C:\\Users\\Guest\\MyFileIsReallyReallyReallyAwesome.asy
In the dialog I changed the name to MyFile.asy and clicked save.
The full path part of the return value was: u'C:\\Users\\Guest\\MyFile.asy\x00wesome.asy'
I expected it to be: u'C:\\Users\\Guest\\MyFile.asy'
The function is returning a recycled buffer without trimming off the terminating bytes. Needless to say, the rest of my code wasn't set up for handling a C-style null-terminated string.
Demo Code
The following code demonstrates null-terminated string in return value from GetSaveFileNameW.
Directions: In the dialog change the filename to 'MyFile.asy' then click Save. Observe what is printed to the console. The output I get is u'C:\\Users\\Guest\\MyFile.asy\x00wesome.asy'.
import win32gui, win32con

if __name__ == "__main__":
    initial_dir = 'C:\\Users\\Guest'
    initial_file = 'MyFileIsReallyReallyReallyAwesome.asy'
    filter_string = 'All Files\0*.*\0'
    (filename, customfilter, flags) = \
        win32gui.GetSaveFileNameW(InitialDir=initial_dir,
                                  Flags=win32con.OFN_EXPLORER, File=initial_file,
                                  DefExt='txt', Title="Save As", Filter=filter_string,
                                  FilterIndex=0)
    print repr(filename)
Note: If you don't shorten the filename enough (for example, if you try MyFileIsReally.asy) the string will be complete without a null byte.
Environment
Windows 7 Professional 64-bit (no service pack), Python 2.7.1, PyWin32 Build 216
UPDATE: PyWin32 Tracker Artifact
Based on the comments and answers I have received so far, this is likely a pywin32 bug so I filed a tracker artifact.
UPDATE 2: Fixed!
Mark Hammond reported in the tracker artifact that this is indeed a bug. A fix was checked in to rev f3fdaae5e93d, so hopefully that will make the next release.
I think Aleksi Torhamo's answer below is the best solution for versions of PyWin32 before the fix.
I'd say it's a bug. The right way to deal with it would probably be fixing pywin32, but in case you aren't feeling adventurous enough, just trim it.
You can get everything before the first '\x00' with filename.split('\x00', 1)[0].
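A sketch of that trimming approach as a small function that behaves the same on Python 2 and 3; the function name is hypothetical.

```python
def trim_at_null(s):
    """Return the part of s before the first NUL, C-string style.

    If s contains no NUL, split() returns [s], so s comes back unchanged.
    """
    return s.split("\x00", 1)[0]


returned = u"C:\\Users\\Guest\\MyFile.asy\x00wesome.asy"  # value from the question
print(trim_at_null(returned))  # C:\Users\Guest\MyFile.asy
```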
This doesn't happen on the version of PyWin32/Windows/Python I tested; I don't get any nulls in the returned string even if it's very short. You might investigate if a newer version of one of the above fixes the bug.
I seem to recall having this issue some years ago; then I discovered that such Win32 filename-dialog functions return a sequence of 'filename1\0filename2\0...filenameN\0\0', possibly followed by garbage characters depending on the buffer that Windows allocated.
Now, you might prefer a list instead of the raw return value, but that would be an RFE (request for enhancement), not a bug.
PS: When I had this issue, I quite understood why one would expect GetOpenFileName to possibly return a list of filenames, but I couldn't imagine why GetSaveFileName would. Perhaps this is considered API uniformity. Who am I to know, anyway?
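Following that description, a sketch that turns such a NUL-separated buffer into a Python list and drops whatever comes after the double NUL. It assumes the 'filename1\0...filenameN\0\0' layout recalled above; the exact layout depends on the dialog flags, so check the Win32 documentation. The helper name is hypothetical.

```python
def split_nul_list(buf):
    r"""Parse 'name1\0name2\0...nameN\0\0' into a list of names.

    Everything after the first double NUL is treated as leftover buffer
    contents and ignored.
    """
    head = buf.split("\x00\x00", 1)[0]
    return [name for name in head.split("\x00") if name]


print(split_nul_list("a.txt\x00b.txt\x00\x00leftover"))  # ['a.txt', 'b.txt']
```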
