I have a native program written in Python that expects its input on stdin. As a simple example,
#!python3
import sys
with open('foo.txt', 'w', encoding='utf8') as f:
    f.write(sys.stdin.read())
I want to be able to pass a (PowerShell) string to this program as standard input. Python expects its standard input in the encoding specified in $env:PYTHONIOENCODING, which I will typically set to UTF8 (so that I don't get any encoding errors).
But no matter what I do, characters get corrupted. I've searched the net and found suggestions to change [Console]::InputEncoding/[Console]::OutputEncoding, or to use chcp, but nothing seems to work.
Here's my basic test:
PS >[Console]::OutputEncoding.EncodingName
Unicode (UTF-8)
PS >[Console]::InputEncoding.EncodingName
Unicode (UTF-8)
PS >$env:PYTHONIOENCODING
utf-8
PS >python -c "print('\N{Euro sign}')" | python -c "import sys; print(sys.stdin.read())"
´╗┐?
PS >chcp 1252
Active code page: 1252
PS >python -c "print('\N{Euro sign}')" | python -c "import sys; print(sys.stdin.read())"
?
PS >chcp 65001
Active code page: 65001
PS >python -c "print('\N{Euro sign}')" | python -c "import sys; print(sys.stdin.read())"
?
How can I fix this problem?
I can't even explain what's going on here. Basically, I want the test (python -c "print('\N{Euro sign}')" | python -c "import sys; print(sys.stdin.read())") to print out a Euro sign, and I want to understand why whatever is needed to make that work is needed :-) (Then I can translate that knowledge to my real scenario, which is writing pipelines of Python programs that don't break when they encounter Unicode characters.)
Thanks to mike z, the following works:
$OutputEncoding = [Console]::OutputEncoding = (new-object System.Text.UTF8Encoding $false)
$env:PYTHONIOENCODING = "utf-8"
python -c "print('\N{Euro sign}')" | python -c "import sys; print(sys.stdin.read())"
The new-object is needed to get a UTF-8 encoding without a BOM. The $OutputEncoding variable and [Console]::OutputEncoding both appear to need to be set.
I still don't fully understand the difference between the two encoding values, and why you would ever have them set differently (which appears to be the default).
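The stray characters in the first test and the need for a BOM-less encoding can be illustrated from the Python side. This is a minimal sketch, assuming the console was on code page 850 when the first test ran (the post does not say which code page was active); the function names are made up for illustration.

```python
import codecs

# Assumption: the console used code page 850 for the first test run;
# the original post does not state the active code page.
ASSUMED_CONSOLE_CODEPAGE = "cp850"


def show_bom_as_console_text(codepage=ASSUMED_CONSOLE_CODEPAGE):
    """Decode the UTF-8 BOM bytes the way a legacy code page would render them."""
    return codecs.BOM_UTF8.decode(codepage)


def encode_euro(with_bom):
    """Encode a Euro sign as UTF-8, optionally prefixed with a BOM."""
    return "\N{EURO SIGN}".encode("utf-8-sig" if with_bom else "utf-8")


if __name__ == "__main__":
    # The three BOM bytes EF BB BF show up as three unrelated characters.
    print(ascii(show_bom_as_console_text()))
    # With a BOM the payload is preceded by EF BB BF; without it, only E2 82 AC.
    print(encode_euro(with_bom=True))
    print(encode_euro(with_bom=False))
```

Under that assumption, the three characters before the question mark in the first test output are simply the BOM bytes rendered in the wrong code page, which is consistent with needing UTF8Encoding $false.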
Related
This looks like a Python issue, because I get different results depending on whether I use Python's print or call cmd's echo through subprocess (see the examples below).
The question is: how do I properly print wide characters in cmd.exe under chcp 65001?
Is there an option for Python's print function, or some other special step, needed to adapt to the Windows environment?
OS: Windows 7, 64-bit.
==GOOD==
C:\>python -c "import subprocess; subprocess.run('cmd.exe /c echo 啊啊啊啊啊啊啊啊啊啊啊啊啊啊啊啊', shell=True)"
啊啊啊啊啊啊啊啊啊啊啊啊啊啊啊啊
==BAD==
C:\> python -c "print('啊啊啊啊啊啊啊啊啊啊啊啊啊啊啊啊啊啊')"
啊啊啊啊啊啊啊啊啊啊啊啊啊啊啊啊啊啊
�啊啊啊啊啊啊啊啊啊啊啊
��啊啊啊啊啊啊
�啊啊啊
�啊
C:\>chcp
Active code page: 65001
C:\>python -V
Python 3.5.2
There is already a Perl-based fix:
C:\>python -c "print('啊啊啊啊啊啊啊啊啊啊啊啊啊')" | perl -Mutf8 -ne "BEGIN{binmode(STDIN,':unix:encoding(utf8):crlf');binmode(STDOUT, ':unix:encoding(utf8):crlf');}print"
啊啊啊啊啊啊啊啊啊啊啊啊啊
C:\>chcp
Active code page: 65001
Why am I getting the last octet repeated when my Perl program outputs a UTF-8 encoded string in cmd.exe?
It would be good to have a Python solution.
I am writing shellcode exploits with python3. However, when I try to output some hex bytes, e.g. with the line python3 -c 'print("\x8c")' | xxd
the value shown by xxd is c28c, rather than the expected 8c.
This issue does not occur in python2.
Your issue arises because Python 3 strings are Unicode, and print encodes them for output (here as UTF-8, where U+008C becomes the two bytes c2 8c). To bypass this, write the bytes directly:
python3 -c "import sys; sys.stdout.buffer.write(b'\x8c')" | xxd
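The difference between the two outputs can be checked without a pipe. This is a small sketch, assuming stdout was being encoded as UTF-8 (which the c28c output suggests); the variable names are only illustrative.

```python
# "\x8c" in a str literal is the code point U+008C, not the byte 0x8c.
char = "\x8c"

# Encoding that code point as UTF-8 takes two bytes.
utf8_bytes = char.encode("utf-8")

# Latin-1 maps code points 0-255 one-to-one onto single bytes.
latin1_bytes = char.encode("latin-1")

# A bytes literal is already the raw byte, so nothing gets encoded.
raw = b"\x8c"

if __name__ == "__main__":
    print(utf8_bytes.hex(), latin1_bytes.hex(), raw.hex())
```

Writing bytes to sys.stdout.buffer, as in the answer, skips the text-encoding layer entirely, so the single byte reaches xxd unchanged.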
Just like the title addresses, how can this be done? I stupidly tried the following, but I will share the stupidity here so you can get an idea as to what I want to happen:
myself$ python help('modules') | pbcopy
Is this a good idea:
fout = open('output.txt', 'w')
fout.write(help('modules'))
On my Ubuntu, and hopefully on your boxen too (as it is a standard Python feature), there is the handy pydoc command, so you can simply type
pydoc modules | pbcopy
Use pydoc to look up documentation and print it.
Example:
$ python -c 'import pydoc; print pydoc.getdoc(id)'
id(object) -> integer
Return the identity of an object. This is guaranteed to be unique among
simultaneously existing objects. (Hint: it's the object's memory address.)
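The example above uses the Python 2 print statement. A Python 3 sketch of the same idea follows; it uses pydoc.render_doc with the plain-text renderer to get the help text as a string instead of having help() page it to the terminal. The helper name doc_text is made up for illustration.

```python
import pydoc


def doc_text(obj):
    """Return the help() text for obj as a plain string, without terminal bold markup."""
    return pydoc.render_doc(obj, renderer=pydoc.plaintext)


if __name__ == "__main__":
    # Printing the string lets the shell pipe or redirect it like any other output.
    print(doc_text(len))
```

Because the result is an ordinary string, it could also be written to a file with f.write(doc_text(obj)), which is what the fout.write(help('modules')) attempt in the question was aiming for.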
I don't know what pbcopy is, but I guess this would do the trick:
python -c 'import urllib; help(urllib)' | pbcopy
At least this definitely works:
python -c 'import urllib; help(urllib)' > file
From man python:
-c command
Specify the command to execute (see next section). This terminates the option list (following options are passed as arguments to the command).
Update:
In order to copy this to clipboard you can add this to ~/.bashrc:
pc() { python -c "import $1; help($1);" | xclip -i -selection clipboard; }
then just call pc logging or pc my_module
Or you can pipe it to pbcopy or whatever works for you.
I am wondering why I can use print to print a Unicode string in my OS X Terminal.app, but if I redirect stdout to a file or pipe it to 'more', I get a UnicodeEncodeError.
How does Python decide whether it prints Unicode or throws an exception?
Because your terminal encoding is set correctly, whereas when you redirect to a file (or pipe) the encoding falls back to the default encoding (ASCII in Python 2). Try print sys.stdout.encoding in both cases (when your script runs with the terminal as stdout, and when you redirect to a file) and you will see the difference.
Try also this in your command line:
$ python -c 'import sys; print sys.stdout.encoding;'
UTF8
$ python -c 'import sys; print sys.stdout.encoding;' | cat
None
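One way to make a script behave the same on a terminal and in a pipe is to choose the encoding explicitly when the stream does not report one. This is a hedged sketch in Python 3 syntax; the helper name encode_for and the UTF-8 fallback are assumptions for illustration, not something the answer prescribes.

```python
import sys


def encode_for(stream, text, fallback="utf-8"):
    """Encode text with the stream's encoding, or with fallback if it has none."""
    # stream.encoding can be None, as the piped Python 2 example above shows.
    encoding = getattr(stream, "encoding", None) or fallback
    # errors="replace" substitutes "?" rather than raising UnicodeEncodeError.
    return text.encode(encoding, errors="replace")


if __name__ == "__main__":
    # Compare a direct run with a run piped through cat or more.
    print(sys.stdout.encoding, sys.stdout.isatty())
```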
How can I make the following one liner print every file through Python?
python -c "import sys;print '>>',sys.argv[1:]" | dir *.*
Specifically would like to know how to pipe into a python -c.
DOS or Cygwin responses accepted.
python -c "import os; print os.listdir('.')"
If you want to apply some formatting like you have in your question,
python -c "import os; print '\n'.join(['>>%s' % x for x in os.listdir('.')])"
If you want to use a pipe, use xargs:
ls | xargs python -c "import sys; print '>>', sys.argv[1:]"
or backticks:
python -c "import sys; print '>>', sys.argv[1:]" `ls`
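The os.listdir one-liners above use the Python 2 print statement. A Python 3 sketch of the same formatting follows; the function name format_listing and the sorting are additions for illustration only.

```python
import os


def format_listing(directory=".", prefix=">>"):
    """Return one line per directory entry, each prefixed, joined by newlines."""
    # sorted() is an addition here; os.listdir returns entries in arbitrary order.
    return "\n".join(prefix + name for name in sorted(os.listdir(directory)))


if __name__ == "__main__":
    print(format_listing("."))
```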
You can read data piped into a Python script by reading sys.stdin. For example:
ls -al | python -c "import sys; print sys.stdin.readlines()"
It is not entirely clear what you want to do (maybe I am stupid). The confusion comes from your example which is piping data out of a python script.
If you want to print all files:
find . -type f
If you want to print only the current directory's files
find . -type f -maxdepth 1
If you want to include the ">>" before each line
find . -type f -maxdepth 1 | xargs -L 1 echo ">>"
If you don't want the space between ">>" and $path from echo
find . -type f -maxdepth 1 | xargs -L 1 printf ">>%s\n"
This is all using cygwin, of course.
ls | python -c "import sys; print sys.stdin.read()"
just read stdin as normal for pipes
would like to know how to pipe though
You had the pipe the wrong way round: if you wanted to feed the output of ‘dir’ into Python, ‘dir’ would have to be on the left, e.g.:
dir "*.*" | python -c "import sys;[sys.stdout.write('>>%s\n' % line) for line in sys.stdin]"
(The hack with the list comprehension is because you aren't allowed a block-introducing ‘for’ statement on one line after a semicolon.)
Clearly the Python-native solution (‘os.listdir’) is much better in practice.
Specifically would like to know how to pipe into a python -c
see cobbal's answer
Piping is transparent from the program's point of view: all the program knows is that it is getting input from the standard input stream.
Generally speaking, a shell command of the form
A | B
redirects the output of A to be the input of B.
So if A writes "asdf" to standard output, then B gets "asdf" on its standard input.
The standard input stream in Python is sys.stdin.
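Put together, the B side of such a pipe can be a small filter that reads sys.stdin line by line. This is a minimal Python 3 sketch; the names prefix_lines and main are hypothetical, and the ">>" prefix is taken from the question.

```python
import sys


def prefix_lines(lines, prefix=">>"):
    """Yield each input line with prefix prepended (line endings are kept)."""
    for line in lines:
        yield prefix + line


def main(stdin=None, stdout=None):
    """Copy stdin to stdout, prefixing every line."""
    # Streams are looked up at call time so they can be swapped out in tests.
    stdin = sys.stdin if stdin is None else stdin
    stdout = sys.stdout if stdout is None else stdout
    stdout.writelines(prefix_lines(stdin))
```

Saved as a hypothetical prefix.py with a final main() call, it would sit on the right-hand side of the pipe, for example dir | python prefix.py.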