I have C code which draws a vertical and a horizontal line in the center of the screen, as below:
#include <stdio.h>
#include <conio.h>   /* Turbo C header for clrscr() and gotoxy() */
#define HLINE for(i=0;i<79;i++)\
              printf("%c",196);
#define VLINE(X,Y) {\
                   gotoxy(X,Y);\
                   printf("%c",179);\
                   }
int main()
{
    int i, y;   /* y is the loop variable used below */
    clrscr();
    gotoxy(1,12);
    HLINE
    for(y=1;y<25;y++)
        VLINE(39,y)
    return 0;
}
I am trying to convert it literally to Python 2.7.6:
import curses

def HLINE():
    for i in range(0, 79):
        print "%c" % 45

def VLINE(X, Y):
    curses.setsyx(Y, X)
    print "%c" % 124

curses.setsyx(12, 1)
HLINE()
for y in range(1, 25):
    VLINE(39, y)
My questions:
1. Do we have to swap the positions of x and y in the setsyx function, i.e. is gotoxy(1,12) equivalent to setsyx(12,1)?
2. Is the curses module only available for Unix, not for Windows? If so, what are the options on Windows (Python 2.7.6)?
3. Why do the character values 179 and 196 print as � in Python, when in C they are | and - respectively?
4. Is the above Python code literally right, or does it need some improvement?
Yes, you will have to swap the argument positions: curses uses setsyx(y, x), while the C library uses gotoxy(x, y).
There are curses libraries made available for Windows. I find the most useful binaries here: link
This most likely has to do with Unicode formatting. What you could try is adding the following line to the top of your Python file (after the #!/usr/bin/python line), as it tells Python to interpret the string literals in your source as UTF-8:
# -*- coding: utf-8 -*-
Your Python code looks acceptable enough to me; I wouldn't worry about it.
Yes.
Duplicate of Curses alternative for windows
Presumably you are using Python 2.x, thus your characters are bytes and therefore encoding-dependent. The meaning of a particular numeric value is determined by the encoding used. Most likely you are using utf8 on Linux and something non-utf8 in your Windows program, so you cannot compare the values. In curses you should use curses.ACS_HLINE and curses.ACS_VLINE.
You cannot mix print and curses functions, it will mess up the display. Use curses.addch or variants instead.
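Putting those two points together, here is a minimal sketch of what the program could look like done curses-style (Python 2.7; it uses window.hline()/vline() and the ACS_* constants, and the exact coordinates are my guess at the C program's intent):

import curses

def main(stdscr):
    stdscr.clear()
    # hline(y, x, ch, n) draws n copies of ch starting at (y, x); note y comes first
    stdscr.hline(12, 0, curses.ACS_HLINE, 79)
    stdscr.vline(0, 39, curses.ACS_VLINE, 24)
    stdscr.refresh()
    stdscr.getch()  # wait for a keypress so the drawing stays visible

curses.wrapper(main)  # wrapper() handles initscr()/endwin() and restores the terminal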
I'm developing a cross-platform Python (3.7+) application, and I need to rely on the sort order of TEXT columns in SQLite, meaning the comparison algorithm for TEXT values must be based on UTF-8 bytes, even if the system encoding (sys.getdefaultencoding()) is not utf-8.
But in the documentation of the sqlite3 module I can't find an encoding option for sqlite3.connect.
And I read that using sys.setdefaultencoding("utf-8") is an ugly hack and highly discouraged (which is why we need to reload(sys) before calling it).
So what's the solution?
Looking at Python's _sqlite/connection.c code, either sqlite3_open_v2 or sqlite3_open is called (depending on a compile flag). Based on the SQLite docs, both of them use UTF-8 as the default database encoding. I'm still not sure about the meaning of the word "default", since the docs don't mention any way to override it, but it doesn't look like Python can open a database with another encoding.
#ifdef SQLITE_OPEN_URI
    Py_BEGIN_ALLOW_THREADS
    rc = sqlite3_open_v2(database, &self->db,
                         SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE |
                         (uri ? SQLITE_OPEN_URI : 0), NULL);
#else
    if (uri) {
        PyErr_SetString(pysqlite_NotSupportedError, "URIs not supported");
        return -1;
    }
    Py_BEGIN_ALLOW_THREADS
    rc = sqlite3_open(database, &self->db);
#endif
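A quick check from Python seems to confirm this (a minimal sketch; it relies on PRAGMA encoding reporting the database's text encoding, and on SQLite's default BINARY collation comparing the stored bytes):

import sqlite3

conn = sqlite3.connect(':memory:')
print(conn.execute('PRAGMA encoding').fetchone())  # ('UTF-8',)

# With UTF-8 storage, the default BINARY collation orders TEXT columns
# by their UTF-8 bytes, independent of sys.getdefaultencoding().
conn.execute('CREATE TABLE t (s TEXT)')
conn.executemany('INSERT INTO t VALUES (?)', [('b',), ('ä',), ('a',)])
rows = [r[0] for r in conn.execute('SELECT s FROM t ORDER BY s')]
assert rows == sorted(rows, key=lambda s: s.encode('utf-8'))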
I'm trying to port my Vim 8.0 configuration (~/.vimrc) to Python. That is, I'm setting Vim options as entries in the vim.options mapping:
import vim
# set wildmenu
vim.options['wildmenu'] = True
# set wildcharm=<C-Z>
vim.options['wildcharm'] = ord('^Z') # [Literal ^Z (ASCII 26), CTRL-V CTRL-Z]
# set wildchar=<F10>
vim.options['wildchar'] = -15211 # extracted from Vim
The wildchar and wildcharm Vim options are of type "number". As far as I understand, they expect a kind of keycode (at least in simple cases it is the ASCII code of the character in question).
In Vimscript, when you say something like set wildchar=<F10>, Vim translates the Vim-specific textual representation into a numeric keycode.
In Python, this is not the case (vim.options['wildchar'] = '<F10>' gives a TypeError).
For simple cases, it is possible to use ord() on a string containing the literally typed control character (see above with Ctrl-Z). However, a key like F10 produces multiple characters, so I can't use ord() on it.
In the end, I want to be able to do something like this:
vim.options['wildchar'] = magic('<F10>')
Does this magic() function exist?
Edit: I'm not asking how to invoke Vimscript code from Python (i. e. vim.command(...)). I understand that the encompassing problem can be trivially solved this way, but I'm asking a different question here.
:python vim.command("set wildchar=<F10>")
See the vim.command documentation for more explanation.
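There is no documented translation function in the Python API that I know of, but a hypothetical magic() can be sketched on top of the same idea: let Vim itself perform the translation once, read the resulting number back, and restore the option (magic is my name for it; the save/restore dance is an assumption about acceptable side effects):

import vim

def magic(key):
    # Let Vim translate its key notation (e.g. '<F10>') into the numeric
    # keycode by setting the option, then read the number back.
    saved = vim.options['wildchar']
    try:
        vim.command('set wildchar=' + key)
        return vim.options['wildchar']
    finally:
        vim.options['wildchar'] = saved

vim.options['wildchar'] = magic('<F10>')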
I am using a dictionary to store some character pairs in Python (I am replacing umlaut characters). Here is what it looks like:
umlautdict = {
    'ae': 'ä',
    'ue': 'ü',
    'oe': 'ö'
}
Then I run my inputwords through it like so:
for item in umlautdict.keys():
    outputword = inputword.replace(item, umlautdict[item])
But this does not do anything (no replacement happens). When I printed out my umlautdict, I saw that it looks like this:
{'ue': '\xfc', 'oe': '\xf6', 'ae': '\xc3\xa4'}
Of course that is not what I want; however, trying things like unicode() (--> Error) or prefixing u did not improve things.
If I type the 'ä' or 'ö' into the replace() command by hand, everything works just fine. I also added # -*- coding: utf-8 -*- to my script (working in TextWrangler), as it would not even let me execute the script containing umlauts without it.
So I don't get...
Why does this happen? Why and when do the umlauts change from "good to evil" when I store them in the dictionary?
How do I fix it?
Also, if anyone knows: what is a good resource to learn about encoding in Python? I have issues all the time, and so many things don't make sense to me / I can't wrap my head around them.
I'm working on a Mac in Python 2.7.10. Thanks for your help!
Converting to Unicode is done by decoding your string (assuming you're getting bytes):
data = "haer ueber loess"
word = data.decode('utf-8') # actual encoding depends on your data
Define your dict with unicode strings as well:
umlautdict = {
    u'ae': u'ä',
    u'ue': u'ü',
    u'oe': u'ö'
}
Finally, print umlautdict will print out some representation of that dict, usually involving escapes. That's normal; you don't have to worry about it.
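Putting both steps together, a minimal sketch for Python 2.7 (note that the loop writes its result back into the same variable, so each replacement survives into the next iteration):

# -*- coding: utf-8 -*-
umlautdict = {u'ae': u'ä', u'ue': u'ü', u'oe': u'ö'}

inputword = "haer ueber loess".decode('utf-8')  # decode bytes to unicode first
for item, repl in umlautdict.items():
    inputword = inputword.replace(item, repl)
print inputword.encode('utf-8')  # här über löss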
Declare your coding.
Use raw format for the special characters.
Iterate properly on your string: keep the changes from each loop iteration as you head to the next.
Here's code to get the job done:
# -*- coding: utf-8 -*-
umlautdict = {
    'ae': r'ä',
    'ue': r'ü',
    'oe': r'ö'
}
print umlautdict

inputword = "haer ueber loess"
for item in umlautdict.keys():
    inputword = inputword.replace(item, umlautdict[item])
print inputword
Output:
{'ue': '\xc3\xbc', 'oe': '\xc3\xb6', 'ae': '\xc3\xa4'}
här über löss
I am experiencing an odd behavior when using the locale library with unicode input. Below is a minimum working example:
>>> x = '\U0010fefd'
>>> ord(x)
1113853
>>> ord('\U0010fefd') == 0X10fefd
True
>>> ord(x) <= 0X10ffff
True
>>> import locale
>>> locale.strxfrm(x)
'\U0010fefd'
>>> locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')
'en_US.UTF-8'
>>> locale.strxfrm(x)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: character U+110000 is not in range [U+0000; U+10ffff]
I have seen this on Python 3.3, 3.4 and 3.5. I do not get an error on Python 2.7.
As far as I can see, my unicode input is within the appropriate Unicode range, so it seems that something internal to strxfrm under the 'en_US.UTF-8' locale is moving the input out of range.
I am running Mac OS X, and this behavior may be related to http://bugs.python.org/issue23195... but I was under the impression this bug would only manifest as incorrect results, not a raised exception. I cannot replicate on my SLES 11 machine, and others confirm they cannot replicate on Ubuntu, Centos, or Windows. It may be instructive to hear about other OS's in the comments.
Can someone explain what may be happening here under the hood?
In Python 3.x, the function locale.strxfrm(s) internally uses the POSIX C function wcsxfrm(), which is based on the current LC_COLLATE setting. The POSIX standard defines the transformation in this way:
The transformation shall be such that if wcscmp() is applied to two transformed wide strings, it shall return a value greater than, equal to, or less than 0, corresponding to the result of wcscoll() applied to the same two original wide-character strings.
This definition can be implemented in multiple ways, and doesn't even require that the resulting string be readable.
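In Python terms, the guaranteed property is only that sorting by the strxfrm() key agrees with pairwise strcoll() comparisons; a minimal sketch (the word list is just an illustration):

import functools
import locale

locale.setlocale(locale.LC_COLLATE, 'en_US.UTF-8')
words = ['peach', 'péché', 'pêche']
# The transformed keys themselves may be unreadable; only the ordering matters.
assert (sorted(words, key=locale.strxfrm)
        == sorted(words, key=functools.cmp_to_key(locale.strcoll)))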
I've created a little C code example to demonstrate how it works:
#include <stdio.h>
#include <wchar.h>
#include <locale.h>

int main() {
    wchar_t buf[10];
    wchar_t *in = L"\x10fefd";
    int i;

    setlocale(LC_COLLATE, "en_US.UTF-8");

    printf("in : ");
    for (i = 0; i < 10 && in[i]; i++)
        printf(" 0x%x", in[i]);
    printf("\n");

    i = wcsxfrm(buf, in, 10);

    printf("out: ");
    for (i = 0; i < 10 && buf[i]; i++)
        printf(" 0x%x", buf[i]);
    printf("\n");
}
It prints the string before and after the transformation.
Running it on Linux (Debian Jessie) this is the result:
in : 0x10fefd
out: 0x1 0x1 0x1 0x1 0x552
while running it on OSX (10.11.1) the result is:
in : 0x10fefd
out: 0x103 0x1 0x110000
You can see that the output of wcsxfrm() on OSX contains the character U+110000 which is not permitted in a Python string, so this is the source of the error.
On Python 2.7 the error is not raised because its locale.strxfrm() implementation is based on the strxfrm() C function.
UPDATE:
Investigating further, I see that the LC_COLLATE definition for en_US.UTF-8 on OSX is a link to the la_LN.US-ASCII definition.
$ ls -l /usr/share/locale/en_US.UTF-8/LC_COLLATE
lrwxr-xr-x 1 root wheel 28 Oct 1 14:24 /usr/share/locale/en_US.UTF-8/LC_COLLATE -> ../la_LN.US-ASCII/LC_COLLATE
I found the actual definition in the sources from Apple. The content of file la_LN.US-ASCII.src is the following:
order \
\x00;...;\xff
2nd UPDATE:
I've further tested the wcsxfrm() function on OSX. Using the la_LN.US-ASCII collate, given a sequence of wide characters C1..Cn as input, the output is a string of this form:
W1..Wn \x01 U1..Un
where
Wx = 0x103 if Cx > 0xFF else Cx+0x3
Ux = Cx+0x103 if Cx > 0xFF else Cx+0x3
Using this algorithm, \x10fefd becomes 0x103 0x1 0x110000.
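A quick Python model of the observed transformation (my own sketch, not Apple's code) reproduces the out-of-range value:

def osx_wcsxfrm_model(s):
    # W part, the 0x1 separator, then the U part, per the formulas above
    cs = [ord(c) for c in s]
    w = [0x103 if c > 0xFF else c + 0x3 for c in cs]
    u = [c + 0x103 if c > 0xFF else c + 0x3 for c in cs]
    return w + [0x1] + u

print([hex(v) for v in osx_wcsxfrm_model('\U0010fefd')])
# ['0x103', '0x1', '0x110000'] -> U+110000 is beyond the U+10FFFF limit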
I've checked, and every UTF-8 locale uses this collate on OSX, so I'm inclined to say that the collate support for UTF-8 on Apple systems is broken. The resulting ordering is almost the same as the one obtained with plain byte comparison, with the bonus of the ability to produce illegal Unicode characters.
I've been reading tutorials about Curses programming in Python, and many refer to an ability to use extended characters, such as line-drawing symbols. They're characters > 255, and the curses library knows how to display them in the current terminal font.
Some of the tutorials say you use it like this:
c = ACS_ULCORNER
...and some say you use it like this:
c = curses.ACS_ULCORNER
(That's supposed to be the upper-left corner of a box, like an L flipped vertically)
Anyway, regardless of which method I use, the name is not defined and the program thus fails. I tried "import curses" and "from curses import *", and neither works.
Curses' window() function makes use of these characters, so I even tried poking around on my box for the source to see how it does it, but I can't find it anywhere.
You have to set your locale, then encode your output as UTF-8, as follows:
import curses
import locale

locale.setlocale(locale.LC_ALL, '')  # use the user's default locale
scr = curses.initscr()
scr.clear()
scr.addstr(0, 0, u'\u3042'.encode('utf-8'))
scr.refresh()
scr.getch()  # wait for a keypress before quitting
curses.endwin()  # endwin() is a module-level function, not a window method
output:
あ
From curses/__init__.py:
Some constants, most notably the ACS_* ones, are only added to the C _curses module's dictionary after initscr() is called. (Some versions of SGI's curses don't define values for those constants until initscr() has been called.) This wrapper function calls the underlying C initscr(), and then copies the constants from the _curses module to the curses package's dictionary. Don't do 'from curses import *' if you'll be needing the ACS_* constants.
In other words:
>>> import curses
>>> curses.ACS_ULCORNER
AttributeError: 'module' object has no attribute 'ACS_ULCORNER'
>>> curses.initscr()
>>> curses.ACS_ULCORNER
4194412
I believe the below is appropriately related, so I'm posting it under this question. Here I'll be using utfinfo.pl (see also on Super User).
First of all, for the standard ASCII character set, the Unicode code point and the byte encoding are the same:
$ echo 'a' | perl utfinfo.pl
Char: 'a' u: 97 [0x0061] b: 97 [0x61] n: LATIN SMALL LETTER A [Basic Latin]
So in Python's curses we can do:
window.addch('a')
window.border('a')
... and it works as intended
However, if a character is above basic ASCII, then there are differences, which the addch docs don't necessarily make explicit. First, I can do:
window.addch(curses.ACS_PI)
window.border(curses.ACS_PI)
... in which case, in my gnome-terminal, the Unicode character 'π' is rendered. However, if you inspect ACS_PI, you'll see it's an integer with a value of 4194427 (0x40007b); so the following will also render the same character (or rather, glyph?) 'π':
window.addch(0x40007b)
window.border(0x40007b)
To see what's going on, I grepped through the ncurses source, and found the following:
#define ACS_PI NCURSES_ACS('{') /* Pi */
#define NCURSES_ACS(c) (acs_map[NCURSES_CAST(unsigned char,c)])
#define NCURSES_CAST(type,value) static_cast<type>(value)
#lib_acs.c: NCURSES_EXPORT_VAR(chtype *) _nc_acs_map(void): MyBuffer = typeCalloc(chtype, ACS_LEN);
#define typeCalloc(type,elts) (type *)calloc((elts),sizeof(type))
#./widechar/lib_wacs.c: { '{', { '*', 0x03c0 }}, /* greek pi */
Note here:
$ echo '{π' | perl utfinfo.pl
Got 2 uchars
Char: '{' u: 123 [0x007B] b: 123 [0x7B] n: LEFT CURLY BRACKET [Basic Latin]
Char: 'π' u: 960 [0x03C0] b: 207,128 [0xCF,0x80] n: GREEK SMALL LETTER PI [Greek and Coptic]
... neither of which relates to the value of 4194427 (0x40007b) for ACS_PI.
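The structure of that number is visible from Python as well; on my ncurses build (an assumption worth verifying on other systems), ACS_PI is just the '{' lookup index with the A_ALTCHARSET attribute bit set:

import curses

stdscr = curses.initscr()
try:
    # 0x400000 (A_ALTCHARSET) | 0x7b ('{') == 0x40007b == 4194427
    assert curses.ACS_PI == (curses.A_ALTCHARSET | ord('{'))
finally:
    curses.endwin()  # restore the terminal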
Thus, when addch and/or border see a character above ASCII (basically an unsigned int, as opposed to an unsigned char), they (at least in this instance) use that number not as a Unicode code point, nor as a UTF-8 encoded byte representation, but as a look-up index into acs_map (which ultimately returns the Unicode code point, even when emulating VT-100). That is why the following specification:
window.addch('π')
window.border('π')
will fail in Python 2.7 with argument 1 or 3 must be a ch or an int; and in Python 3.2 it simply renders a space instead of the character. When we specify 'π', we've actually specified the UTF-8 encoding [0xCF,0x80] - but even if we specify the Unicode code point:
window.addch(0x03C0)
window.border(0x03C0)
... it simply renders nothing (space) in both Python 2.7 and 3.2.
That being said - the function addstr does accept UTF-8 encoded strings, and works fine:
window.addstr('π')
... but for borders - since border() apparently handles characters the same way addch() does - we're apparently out of luck for anything not explicitly specified as an ACS constant (and there aren't that many of them, either). A sketch contrasting the two paths is below.
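For completeness, a minimal sketch contrasting the two paths (addstr() takes a byte string here; border() takes per-edge chtype ints, so only ACS_*/integer values work there):

import curses

def main(stdscr):
    stdscr.addstr(1, 1, u'π'.encode('utf-8'))           # strings: OK
    stdscr.border(curses.ACS_VLINE, curses.ACS_VLINE,   # chtypes only
                  curses.ACS_HLINE, curses.ACS_HLINE)
    stdscr.refresh()
    stdscr.getch()  # wait for a keypress before exiting

curses.wrapper(main)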
Hope this helps someone,
Cheers!