I am wondering why I can use print to print a Unicode string in my OS X Terminal.app, but if I redirect stdout to a file or pipe it to 'more', I get a UnicodeEncodeError.
How does Python decide whether to print the Unicode string or throw an exception?
Because your terminal's encoding is set correctly, while when you redirect to a file (or a pipe) the encoding falls back to the default encoding (ASCII in Python 2). Try printing sys.stdout.encoding in both cases (when your script writes to the terminal and when you redirect to a file) and you will see the difference.
Try also this in your command line:
$ python -c 'import sys; print sys.stdout.encoding;'
UTF8
$ python -c 'import sys; print sys.stdout.encoding;' | cat
None
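If you need the script to keep working when its output is redirected, a common workaround is to set PYTHONIOENCODING=utf-8 in the environment, or to wrap sys.stdout explicitly. Here is a minimal Python 2 sketch (the euro sign is just an illustrative character):
import codecs
import sys

# When stdout is a pipe or a file, sys.stdout.encoding is None and print
# falls back to ASCII; wrap the stream in a UTF-8 writer before printing.
if sys.stdout.encoding is None:
    sys.stdout = codecs.getwriter('utf-8')(sys.stdout)

print u'\u20ac'  # safe even when piped to 'more' or redirected to a file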
If I run this command in an Ubuntu shell:
debconf-set-selections <<< 'postfix postfix/mailname string server.exmaple.com'
It runs successfully, but if I run it via python:
>>> from subprocess import run
>>> run("debconf-set-selections <<< 'postfix postfix/mailname string server.exmaple.com'", shell=True)
/bin/sh: 1: Syntax error: redirection unexpected
CompletedProcess(args="debconf-set-selections <<< 'postfix postfix/mailname string server.exmaple.com'", returncode=2)
I don't understand why Python is trying to interpret whether there is a redirection, etc. How does one make this command run successfully, so that the installation of an application (postfix in this case) can be scripted from Python rather than from a normal bash script?
I have tried various forms with double and single quotes (as recommended in other posts), with no success.
subprocess uses /bin/sh as the shell, and presumably your system's /bin/sh does not support here-strings (<<<), hence the error.
From subprocess source:
if shell:
    # On Android the default shell is at '/system/bin/sh'.
    unix_shell = ('/system/bin/sh' if
                  hasattr(sys, 'getandroidapilevel') else '/bin/sh')
You can run the command as an argument to any shell that supports here-strings, e.g. bash:
run('bash -c "debconf-set-selections <<< \"postfix postfix/mailname string server.exmaple.com\""', shell=True)
Be careful with the quoting.
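If you want to keep the original command untouched, a simpler variant (a sketch; it assumes bash lives at /bin/bash) is to tell subprocess which shell to use via the executable argument:
from subprocess import run

# Keep shell=True but use bash as the shell, so the here-string works and the
# original quoting stays unchanged.
run("debconf-set-selections <<< 'postfix postfix/mailname string server.exmaple.com'",
    shell=True, executable='/bin/bash')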
Or, better, you can stay POSIX-compliant and use echo and a pipe to pass the data via stdin:
run("echo 'postfix postfix/mailname string server.exmaple.com' | debconf-set-selections", shell=True)
This looks like a Python issue, because I get different results depending on whether I use Python's print or call echo through subprocess (see the examples below).
The question is how to properly print wide characters in cmd.exe under chcp 65001.
Is there an option to Python's print, or anything else I need to do, to adapt to the Windows environment?
OS Win7 64bit.
==GOOD==
C:\>python -c "import subprocess; subprocess.run('cmd.exe /c echo 啊啊啊啊啊啊啊啊啊啊啊啊啊啊啊啊', shell=True)"
啊啊啊啊啊啊啊啊啊啊啊啊啊啊啊啊
==BAD==
C:\> python -c "print('啊啊啊啊啊啊啊啊啊啊啊啊啊啊啊啊啊啊')"
啊啊啊啊啊啊啊啊啊啊啊啊啊啊啊啊啊啊
�啊啊啊啊啊啊啊啊啊啊啊
��啊啊啊啊啊啊
�啊啊啊
�啊
C:\>chcp
Active code page: 65001
C:\>python -V
Python 3.5.2
There is already a Perl-based fix:
C:\>python -c "print('啊啊啊啊啊啊啊啊啊啊啊啊啊')" | perl -Mutf8 -ne "BEGIN{binmode(STDIN,':unix:encoding(utf8):crlf');binmode(STDOUT, ':unix:encoding(utf8):crlf');}print"
啊啊啊啊啊啊啊啊啊啊啊啊啊
C:\>chcp
Active code page: 65001
Why am I getting the last octet repeated when my Perl program outputs a UTF-8 encoded string in cmd.exe?
It would be good to have a Python solution.
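One workaround I can suggest (not part of the original question, and it assumes the third-party win_unicode_console package is installed): on Python 3.5 and earlier, replace the console streams so that output goes through the wide-character console API; Python 3.6+ does this natively (PEP 528), so upgrading is the simplest fix.
import win_unicode_console

# Route console I/O through the wide-character Windows API instead of the
# cp65001 byte streams that cmd.exe mangles.
win_unicode_console.enable()

print('啊啊啊啊啊啊啊啊啊啊啊啊啊啊啊啊啊啊')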
I am trying to run the following subprocess.Popen() command in Python 2.6:
output = subprocess.Popen("psexec -accepteula \\machineName -u username -p password cmd.exe /C fsutil fsinfo drives", stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
print(output.communicate())
But I get the following result with output.communicate()
('\r\n', '\r\nPsExec v2.11 - Execute processes remotely\r\nCopyright (C) 2001-2014 Mark Russinovich\r\nSysinternals - www.sysinternals.com\r\n\r\nConnecting to machineNameHere...\r\r\rStarting PSEXESVC service on machineNameHere...\r\r\rConnecting with PsExec service on machineNameHere...\r\r\rStarting cmd.exe on machineNameHere...\r\r\r\r\ncmd.exe exited on machineNameHere with error code 0.\r\n')
When I run the same psexec command from the command line in Windows, I get the correct output.
PsExec v2.11 - Execute processes remotely
Copyright (C) 2001-2014 Mark Russinovich
Sysinternals - www.sysinternals.com
**Drives: A:\ C:\ D:\**
cmd.exe exited on machineNameHere with error code 0.
I am looking for the output **Drives: A:\ C:\ D:\** even while running the psexec command using subprocess.Popen(). Is there any way I can do that?
I have now narrowed down the issue by running different commands, like dir and echo "test". The issue seems to be that Popen is reading only the first line into stdout and not the complete output. Any suggestions on how to fix this?
(expanding on John Gordon's comment on the question)
You are trying to run this command through a shell:
"psexec -accepteula \\machineName -u username -p password cmd.exe /C fsutil fsinfo drives"
Notice that you have double backslashes within the command (\\machineName). The backslash happens to be an escape character in string literals (it is used to escape characters that otherwise have a special meaning), so \\machineName is translated to \machineName before it is passed to the spawned process.
Here are two ways to handle this:
1) prepend an escape character (another backslash) before each backslash:
"psexec -accepteula \\\\machineName"
2) add an 'r' prefix before the string literal, which makes it a raw string, so the backslashes are not interpreted as escape characters:
r"psexec -accepteula \\machineName"
Here is an example in the python interpreter that should make it clear. Notice the output that gets printed:
>>> print("\\machineName")
\machineName
>>> print("\\\\machineName")
\\machineName
>>> print(r"\\machineName")
\\machineName
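Putting option 2 together with the original call, a usage sketch (machine name and credentials are the placeholders from the question):
import subprocess

# The raw string keeps both backslashes in \\machineName intact.
cmd = r"psexec -accepteula \\machineName -u username -p password cmd.exe /C fsutil fsinfo drives"
output = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
print(output.communicate())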
When I type sys.getfilesystemencoding() in the interactive shell, I get the result 'UTF-8':
>>>
>>> import sys
>>> sys.getfilesystemencoding()
'UTF-8'
>>>
But when I run it in a WSGI script, I get the result "ANSI_X3.4-1968".
So why is it different?
This happens because the WSGI script runs in a different environment.
Notice what happens when I change LC_CTYPE in the following example:
└> LC_CTYPE=ANSI python -c 'import sys; print sys.getfilesystemencoding()'
ANSI_X3.4-1968
└> LC_CTYPE=en_US.UTF-8 python -c 'import sys; print sys.getfilesystemencoding()'
UTF-8
To fix this, set the LC_CTYPE environment variable to en_US.UTF-8 in the environment of your WSGI script.
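Note that you cannot change it from inside the running script: the filesystem encoding is determined at interpreter start-up from the locale, so LC_CTYPE has to be set in the environment that launches Python (e.g. your WSGI server's environment). A small sketch that demonstrates this (the en_US.UTF-8 locale is assumed to be available on the system):
import os
import subprocess
import sys

# Launch a child interpreter with LC_CTYPE set; the parent's own
# getfilesystemencoding() is already fixed and will not change.
env = dict(os.environ, LC_CTYPE='en_US.UTF-8')
out = subprocess.check_output(
    [sys.executable, '-c', 'import sys; print(sys.getfilesystemencoding())'],
    env=env)
print(out)  # expected to contain 'utf-8' (exact value depends on platform/version)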
I have a native program written in Python that expects its input on stdin. As a simple example,
#!python3
import sys
with open('foo.txt', 'w', encoding='utf8') as f:
    f.write(sys.stdin.read())
I want to be able to pass a (PowerShell) string to this program as standard input. Python expects its standard input in the encoding specified in $env:PYTHONIOENCODING, which I will typically set to UTF8 (so that I don't get any encoding errors).
But no matter what I do, characters get corrupted. I've searched the net and found suggestions to change [Console]::InputEncoding/[Console]::OutputEncoding, or to use chcp, but nothing seems to work.
Here's my basic test:
PS >[Console]::OutputEncoding.EncodingName
Unicode (UTF-8)
PS >[Console]::InputEncoding.EncodingName
Unicode (UTF-8)
PS >$env:PYTHONIOENCODING
utf-8
PS >python -c "print('\N{Euro sign}')" | python -c "import sys; print(sys.stdin.read())"
´╗┐?
PS >chcp 1252
Active code page: 1252
PS >python -c "print('\N{Euro sign}')" | python -c "import sys; print(sys.stdin.read())"
?
PS >chcp 65001
Active code page: 65001
PS >python -c "print('\N{Euro sign}')" | python -c "import sys; print(sys.stdin.read())"
?
How can I fix this problem?
I can't even explain what's going on here. Basically, I want the test (python -c "print('\N{Euro sign}')" | python -c "import sys; print(sys.stdin.read())") to print a Euro sign, and I need to understand what it takes to make that work :-) (Then I can translate that knowledge to my real scenario, which is writing pipelines of Python programs that don't break when they encounter Unicode characters.)
Thanks to mike z, the following works:
$OutputEncoding = [Console]::OutputEncoding = (new-object System.Text.UTF8Encoding $false)
$env:PYTHONIOENCODING = "utf-8"
python -c "print('\N{Euro sign}')" | python -c "import sys; print(sys.stdin.read())"
The new-object is needed to get a UTF-8 encoding without a BOM. The $OutputEncoding variable and [Console]::OutputEncoding both appear to need to be set.
I still don't fully understand the difference between the two encoding values, and why you would ever have them set differently (which appears to be the default).
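As I understand it, $OutputEncoding is what PowerShell uses to encode text it pipes to a native program, while [Console]::OutputEncoding is the console's own code page, also used to decode output coming back from native programs, so a pipeline only round-trips cleanly when both agree. A small debugging sketch (not part of the fix itself) to confirm what actually arrives on the Python side:
import sys

# Read the raw bytes from stdin; repr() makes a stray BOM or a wrong encoding
# immediately visible.
data = sys.stdin.buffer.read()
print(repr(data))  # e.g. b'\xe2\x82\xac\r\n' means clean, BOM-free UTF-8 arrived
Run it as the right-hand side of the pipe in place of the print(sys.stdin.read()) one-liner.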