I am writing shellcode exploits with Python 3. However, when I try to output some raw hex bytes, e.g. using the line
python3 -c 'print("\x8c")' | xxd
the value shown by xxd is c28c rather than the expected 8c.
This issue does not occur in Python 2.
Your issue arises because Python 3 handles strings as Unicode, and print encodes them (here as UTF-8) to produce output for your terminal. Try the following to bypass the text layer and write raw bytes:
python3 -c "import sys; sys.stdout.buffer.write(b'\x8c')" | xxd
Related
Using Perl,
$ perl -e 'print "\xca"' > out
and then running
$ xxd out
we get
00000000: ca
But with Python, I tried
$ python3 -c 'print("\xca", end="")' > out
$ xxd out
what I got is
00000000: c38a
I'm not sure what is going on.
So in Python, a str object is a series of Unicode code points. How it is printed to the screen depends on the encoding of sys.stdout, which is picked based on your locale (various environment variables can override it, but by default it comes from your locale). So yours must be set to UTF-8. That's my default too:
(py311) Juans-MBP:~ juan$ locale
LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL=
(py311) Juans-MBP:~ juan$ python -c "print('\xca', end='')" | xxd
00000000: c38a
However, if I override my locale and tell it to use en_US.ISO8859-1 (latin-1), a single-byte encoding, we get what you expect:
(py311) Juans-MBP:~ juan$ LC_ALL="en_US.ISO8859-1" python -c "print('\xca', end='')" | xxd
00000000: ca
The solution is to work with raw bytes if you want raw bytes. The way to do that in Python source code is to use a bytes literal (or a string literal and then .encode it). We can use the raw buffer at sys.stdout.buffer:
(py311) Juans-MBP:~ juan$ python -c "import sys; sys.stdout.buffer.write(b'\xca')" | xxd
00000000: ca
Or by encoding a string to a bytes object:
(py311) Juans-MBP:~ juan$ python -c "import sys; sys.stdout.buffer.write('\xca'.encode('latin'))" | xxd
00000000: ca
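A quick way to confirm that the locale is what drives this is to ask Python which encoding it picked for sys.stdout (the exact name reported may vary by platform):
$ python -c "import sys; print(sys.stdout.encoding)"
utf-8
$ LC_ALL="en_US.ISO8859-1" python -c "import sys; print(sys.stdout.encoding)"
ISO8859-1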
In Python, '\xca' is a one-character string (the code point U+00CA), not a byte. When it is written to a file through a text stream that uses the UTF-8 encoding, that one character is encoded to two bytes, which is why c3 8a is what ends up in the file.
In Perl, by contrast, \xca is a single byte with the hexadecimal value 0xca, so when the value is stored in the file it is written as-is, with no encoding step.
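You can verify both halves of that from Python itself (a quick check; any Python 3 will do):
$ python3 -c "s = '\xca'; b = s.encode('utf-8'); print(len(s), len(b), b)"
1 2 b'\xc3\x8a'
One code point, two bytes once encoded, and those two bytes are exactly the c3 8a seen in the xxd output.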
I am trying to construct, in Bash, an AWS Signature v4 Auth header to call the STS GetCallerIdentity API, as per the documentation at https://docs.aws.amazon.com/AmazonS3/latest/API/sig-v4-header-based-auth.html.
Now, I have the same process working in Python, and after poring minutely over my scripts and outputs in Python and Bash, I see that the SHA256 calculated in Bash for the string is different from the one calculated in Python.
The strings look the same in my text editor - character by character.
But since the SHA256 differs, I am assuming that this must be a problem with the encoding of the string.
The Python script uses UTF-8, and even though I have tried printf "%s" "${string}" | iconv -t utf-8 | openssl dgst -sha256 in the Bash script, the hash values still differ.
How do I convert Bash strings/variables to UTF-8 before calculating the SHA256 sum?
It might be helpful to see how you're calculating it in Python. From what I can see, the output is the same:
$ python -c "import hashlib; \
print(hashlib.sha256('test'.encode('utf8')).digest().hex())"
9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08
$ printf "%s" test | openssl dgst -sha256
(stdin)= 9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08
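If the digests still differ, compare the exact bytes each side is hashing rather than the strings; invisible differences (a trailing newline, CRLF line endings) are the usual culprits with Signature v4 canonical strings. A sketch of the idea, using a stand-in string:
$ python3 -c "import sys; sys.stdout.buffer.write('GET\n/\n'.encode('utf-8'))" | xxd
00000000: 4745 540a 2f0a
$ printf 'GET\n/\n' | xxd
00000000: 4745 540a 2f0a
If the two dumps differ at any byte, that byte is your answer; the SHA256 values cannot match until the dumps do.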
How can I get the same sha256 hash in terminal (Mac/Linux) and Python?
I tried different versions of the examples below, and searched on Stack Overflow.
Terminal:
echo 'test text' | shasum -a 256
c2a4f4903509957d138e216a6d2c0d7867235c61088c02ca5cf38f2332407b00
Python3:
import hashlib
hashlib.sha256(str("test text").encode('utf-8')).hexdigest()
'0f46738ebed370c5c52ee0ad96dec8f459fb901c2ca4e285211eddf903bf1598'
Update:
Different from "Why is an MD5 hash created by Python different from one created using echo and md5sum in the shell?" because in Python 3 you need to explicitly encode, and I need the solution in Python, not just in the terminal. The "duplicate" will not work on files:
example.txt content:
test text
Terminal:
shasum -a 256 example.txt
c2a4f4903509957d138e216a6d2c0d7867235c61088c02ca5cf38f2332407b00
The echo built-in will add a trailing newline, yielding a different string and thus a different hash. Do it like so:
echo -n 'test text' | shasum -a 256
If you indeed intended to also hash the newline (I advise against this, as it violates POLA), it needs to be fixed up in Python like so:
hashlib.sha256("{}\n".format("test text").encode('utf-8')).hexdigest()
I have a native program written in Python that expects its input on stdin. As a simple example,
#!python3
import sys

# Copy standard input to foo.txt, encoded as UTF-8.
with open('foo.txt', 'w', encoding='utf8') as f:
    f.write(sys.stdin.read())
I want to be able to pass a (PowerShell) string to this program as standard input. Python expects its standard input in the encoding specified in $env:PYTHONIOENCODING, which I will typically set to UTF8 (so that I don't get any encoding errors).
But no matter what I do, characters get corrupted. I've searched the net and found suggestions to change [Console]::InputEncoding/[Console]::OutputEncoding, or to use chcp, but nothing seems to work.
Here's my basic test:
PS >[Console]::OutputEncoding.EncodingName
Unicode (UTF-8)
PS >[Console]::InputEncoding.EncodingName
Unicode (UTF-8)
PS >$env:PYTHONIOENCODING
utf-8
PS >python -c "print('\N{Euro sign}')" | python -c "import sys; print(sys.stdin.read())"
´╗┐?
PS >chcp 1252
Active code page: 1252
PS >python -c "print('\N{Euro sign}')" | python -c "import sys; print(sys.stdin.read())"
?
PS >chcp 65001
Active code page: 65001
PS >python -c "print('\N{Euro sign}')" | python -c "import sys; print(sys.stdin.read())"
?
How can I fix this problem?
I can't even explain what's going on here. Basically, I want the test (python -c "print('\N{Euro sign}')" | python -c "import sys; print(sys.stdin.read())") to print a Euro sign, and to understand why, I have to do whatever is needed to get that to work :-) (Because then I can translate that knowledge to my real scenario: writing working pipelines of Python programs that don't break when they encounter Unicode characters.)
Thanks to mike z, the following works:
$OutputEncoding = [Console]::OutputEncoding = (new-object System.Text.UTF8Encoding $false)
$env:PYTHONIOENCODING = "utf-8"
python -c "print('\N{Euro sign}')" | python -c "import sys; print(sys.stdin.read())"
The new-object is needed to get a UTF-8 encoding without a BOM. The $OutputEncoding variable and [Console]::OutputEncoding both appear to need to be set.
I still don't fully understand the difference between the two encoding values, and why you would ever have them set differently (which appears to be the default).
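As a side note, since Python 3.7 the scripts themselves can pin their stdio encoding with reconfigure, which replaces the need for $env:PYTHONIOENCODING (the PowerShell settings above are still needed, since PowerShell re-encodes data crossing the pipe). A sketch of the same test:
python -c "import sys; sys.stdout.reconfigure(encoding='utf-8'); print('\N{Euro sign}')" | python -c "import sys; sys.stdin.reconfigure(encoding='utf-8'); print(sys.stdin.read())"
Running Python with the -X utf8 flag (PEP 540) is another way to get the same effect without code changes.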
I am wondering why I can use print to print a Unicode string in my OS X Terminal.app, but if I redirect stdout to a file or pipe it to more, I get a UnicodeEncodeError.
How does Python decide whether it prints the string or throws an exception?
Because your terminal's encoding is set correctly, whereas when you redirect to a file (or pipe) the encoding falls back to the default encoding (ASCII in Python 2). Try print sys.stdout.encoding in both cases (when your script runs with the terminal as stdout, and when you redirect to a file) and you will see the difference.
Try also this in your command line:
$ python -c 'import sys; print sys.stdout.encoding;'
UTF8
$ python -c 'import sys; print sys.stdout.encoding;' | cat
None
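One way to make a Python 2 script immune to this is to encode explicitly before writing, so the implicit ASCII encoding never runs. A minimal sketch:
# -*- coding: utf-8 -*-
import sys

text = u'\u20ac'                        # a unicode string (the Euro sign)
sys.stdout.write(text.encode('utf-8'))  # encode explicitly; safe under redirection
Alternatively, setting the PYTHONIOENCODING environment variable to utf-8 makes Python use UTF-8 for stdout even when it is not a terminal.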