SHA256 of visually similar strings differs in Bash vs Python - python

I am trying to construct, in Bash, an AWS Signature v4 Auth header to call the STS GetCallerIdentity API as per the documentation at https://docs.aws.amazon.com/AmazonS3/latest/API/sig-v4-header-based-auth.html.
Now, I have the same process working in Python, and after poring minutely over my scripts and outputs in Python and Bash, I see that the SHA256 calculated in Bash for the string is different from the one calculated in Python.
The strings look the same in my text editor - character by character.
But since the SHA256 differs, I am assuming that this must be a problem with the encoding of the string.
The Python script uses UTF8, and even though I have tried doing a printf "%s" "${string}" | iconv -t utf-8 | openssl dgst -sha256 in the Bash script, the hash values still differ.
How do I convert Bash strings/variables to UTF-8 before calculating the SHA256 sum?

It might be helpful to see how you're calculating it in Python. From what I can see, the output is the same:
$ python -c "import hashlib; \
print(hashlib.sha256('test'.encode('utf8')).digest().hex())"
9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08
$ printf "%s" test | openssl dgst -sha256
(stdin)= 9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08
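For plain ASCII text (which a SigV4 canonical request normally is), UTF-8 encoding leaves every byte unchanged, so piping through iconv -t utf-8 cannot make the hashes agree. When two strings look identical but hash differently, the cause is usually an invisible byte. Likely suspects (assumptions, not confirmed for this script) are a trailing newline, a \r from CRLF line endings, or trailing newlines silently removed by Bash's $(...) command substitution, which matters because the canonical request is built from newline-separated lines. A minimal Python sketch for locating the first differing byte; sha256_hex and first_difference are hypothetical helper names:

```python
import hashlib


def sha256_hex(text: str) -> str:
    """SHA-256 of the UTF-8 bytes of text, the bytes printf "%s" ... | openssl dgst -sha256 sees."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def first_difference(a: bytes, b: bytes):
    """Index of the first byte where a and b differ, or None if they are identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # One is a prefix of the other: they differ where the shorter one ends.
    return None if len(a) == len(b) else min(len(a), len(b))


# Hypothetical example: same visible text, but one side has CRLF line endings.
python_side = "GET\n/\n"
bash_side = "GET\r\n/\r\n"

print(sha256_hex(python_side) == sha256_hex(bash_side))  # False
i = first_difference(python_side.encode("utf-8"), bash_side.encode("utf-8"))
# For ASCII text the byte index equals the character index, so slicing the str is safe.
print(i, repr(python_side[i:i + 3]), repr(bash_side[i:i + 3]))
```

On the Bash side, printf "%s" "$string" | od -c shows the same bytes, so the two dumps can be compared directly.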

Related

How to use here-string in PowerShell to write pip.ini and .condarc file properly?

Disclaimer: I am getting more and more accustomed to PowerShell, but I am still rather inexperienced with it.
I would like to use PowerShell together with here-string syntax to write the pip.ini and .condarc configuration files for the Python package managers pip and conda, respectively.
With the .condarc, there is no error, but I think I had to rewrite it in Notepad++ to really make it work; I suspect it is a file encoding issue:
mkdir 'C:\ProgramData\conda\.condarc'
echo @"
show_channel_urls: true
allow_other_channels: false
report_errors: false
remote_read_timeout_secs: 120
"@ > C:\ProgramData\conda\.condarc
And the following gives an error for pip.ini because of [global]:
mkdir 'C:\ProgramData\pip\pip.ini'
echo @"
[global]
index = https://xxxx/nexus/repository/xxxx/pypi
index-url = https://xxxx:8443/nexus/repository/xxxx/simple
trusted-host = xxxx:8443
"@ > C:\ProgramData\pip\pip.ini
Get-Content pip.ini works well, but pip config list -v returns:
PS C:\Program Files> pip config list -v
Configuration file could not be loaded.
File contains no section headers.
file: 'C:\\ProgramData\\pip\\pip.ini', line: 1
'ÿþ[\x00g\x00l\x00o\x00b\x00a\x00l\x00]\x00\n'
Remark: xxxx stands in for sensitive company information and replaces the real text.
I also tried to escape the square brackets with `, but without success.
Is there a way to specify a file encoding such as UTF-8 in the code above, or can the problem be solved in some other automated way?
In Windows PowerShell, the redirection operators use Unicode encoding (in other words, UTF-16 with little-endian byte order). That's why you see such weird file content. Either
run your code from PowerShell Core (pwsh.exe), or
use the Out-File cmdlet with its Encoding parameter instead of > redirector.
However, note that in Windows PowerShell the following code snippet writes a UTF-8 byte order mark (BOM) at the start of the file:
$MyRawString = @"
[global]
index = https://xxxx/nexus/repository/xxxx/pypi
index-url = https://xxxx:8443/nexus/repository/xxxx/simple
trusted-host = xxxx:8443
"@
$MyPath = "C:\ProgramData\pip\pip.ini"
$MyRawString | Out-File -FilePath $MyPath -Encoding utf8
Solution for Windows PowerShell: using .NET's UTF8Encoding class and passing $False to its constructor (which turns the BOM off) seems to work (taken from M. Dudley's answer):
$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding $False
[System.IO.File]::WriteAllLines($MyPath, $MyRawString, $Utf8NoBomEncoding)
Explanation (from Get-Help About_Redirection):
Windows PowerShell (powershell.exe)
When you are writing to files, the redirection operators use Unicode
encoding. If the file has a different encoding, the output might not
be formatted correctly. To redirect content to non-Unicode files, use
the Out-File cmdlet with its Encoding parameter.
PowerShell Core (pwsh.exe, version 6+)
When you are writing to files, the redirection operators use
UTF8NoBOM encoding. If the file has a different encoding, the output
might not be formatted correctly. To write to files with a different
encoding, use the Out-File cmdlet with its Encoding
parameter.
Note that there is an error in the online version of the About_Redirection help topic for PowerShell 5.1…
The default Out-File encoding is Unicode (UTF-16LE) in Windows PowerShell 5.1; in PowerShell Core 7+ it is UTF-8 without a BOM.
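You're using mkdir to create a file, which doesn't make sense to me: mkdir creates directories.
There's no need to pre-create the file anyway. So, combining the above, something like this should work in PowerShell Core, where Out-File defaults to UTF-8 without a BOM (in Windows PowerShell 5.1, use the UTF8Encoding approach above instead):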
@"
show_channel_urls: true
allow_other_channels: false
report_errors: false
remote_read_timeout_secs: 120
"@ | Out-File C:\ProgramData\conda\.condarc
@"
[global]
index = https://xxxx/nexus/repository/xxxx/pypi
index-url = https://xxxx:8443/nexus/repository/xxxx/simple
trusted-host = xxxx:8443
"@ | Out-File C:\ProgramData\pip\pip.ini
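To see why pip chokes, the bytes can be reproduced in Python. The sketch assumes the file was written the way Windows PowerShell's > redirection does it, as UTF-16LE with an FF FE BOM. Decoding those bytes as Latin-1 yields exactly the line shown in pip's error, and re-encoding them as UTF-8 without a BOM lets the standard-library configparser read the [global] section. ini_bytes_to_utf8 is a hypothetical helper, for example for repairing a file that has already been written:

```python
import codecs
import configparser

# Bytes as Windows PowerShell's > redirection writes them: a UTF-16LE BOM (FF FE)
# followed by UTF-16LE text, so every ASCII character is followed by a 0x00 byte.
ini_text = "[global]\ntrusted-host = xxxx:8443\n"
utf16_bytes = codecs.BOM_UTF16_LE + ini_text.encode("utf-16-le")

# Read as a single-byte encoding, the first line matches pip's error message.
first_line = utf16_bytes.decode("latin-1").splitlines(keepends=True)[0]
print(repr(first_line))  # 'ÿþ[\x00g\x00l\x00o\x00b\x00a\x00l\x00]\x00\n'

try:
    configparser.ConfigParser().read_string(utf16_bytes.decode("latin-1"))
except configparser.MissingSectionHeaderError as exc:
    print("misread:", type(exc).__name__)


def ini_bytes_to_utf8(raw: bytes) -> bytes:
    """Re-encode config bytes as UTF-8 without a BOM, honouring a UTF-16 or UTF-8 BOM."""
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        text = raw.decode("utf-16")     # the utf-16 codec reads and drops the BOM
    else:
        text = raw.decode("utf-8-sig")  # drops a UTF-8 BOM if there is one
    return text.encode("utf-8")


parser = configparser.ConfigParser()
parser.read_string(ini_bytes_to_utf8(utf16_bytes).decode("utf-8"))
print(parser["global"]["trusted-host"])  # xxxx:8443
```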

Evaluate "Shell command line with shell variables" from python OR evaluate python string as shell command line

CONTEXT
I am working on a simulation cluster.
To make it as flexible as possible (so that it works with different simulation software), we created a Python file that parses a config file defining environment variables and the command line that starts the simulation. This command is launched through the SLURM sbatch command (shell $COMMAND).
ISSUE
From Python, all environment variables are set by reading the config file.
I have an issue with the variable COMMAND, which uses other environment variables (written as shell variables).
For example:
import os
from subprocess import Popen, PIPE

COMMAND = "fluent -3ddp -n$NUMPROCS -hosts=./hosts -file $JOBFILE"
os.environ['COMMAND'] = COMMAND
NUMPROCS = "32"
os.environ['NUMPROCS'] = NUMPROCS
[...]
exe = Popen(['sbatch', 'template_document.sbatch'], stdout=PIPE, stderr=PIPE)
sbatch distributes COMMAND to all simulation nodes as a command line.
COMMAND refers to other saved environment variables, but the shell treats it strictly as text, which makes the command line fail: it stays a plain string in which $ does not mark a variable, for example:
'fluent -3ddp -n$NUMPROCS -hosts=./hosts -file $JOBFILE'
SOLUTION I AM LOOKING FOR
I am looking for a simple solution
Solution 1: one to three lines of Python that evaluate COMMAND the way a shell would when echoing it
Solution 2: a shell command that expands the variables inside the "string" $COMMAND
In the end, the command launched from within sbatch should be:
fluent -3ddp -n32 -hosts=./hosts -file /path/to/JOBFILE
You have a few options:
Partial or no support for bash's variable substitution, e.g. implement some Python functionality that reproduces bash's $VARIABLE syntax.
Reproduce all of bash's variable substitution facilities that the config file should support ($VARIABLE, ${VARIABLE}, ${VARIABLE/x/y}, $(cmd), whatever).
Let bash do the heavy lifting, for the cost of performance and possibly security, depending on your trust of the content of the config files.
I'll show the third one here, since it's the most resilient (again, security issues notwithstanding). Let's say you have this config file, config.py:
REGULAR = "some-text"
EQUALS = "hello = goodbye" # trap #1: search of '='
SUBST = "decorated $REGULAR"
FANCY = "xoxo${REGULAR}xoxo"
CMDOUT = "$(date)"
BASH_A = "trap" # trap #2: avoid matching variables like BASH_ARGV
QUOTES = "'\"" # trap #3: quoting
Then your python program can run the following incantation:
bash -c 'source <(sed "s/^/export /" config.py | sed "s/[[:space:]]*=[[:space:]]*/=/") && env | grep -f <(cut -d= -f1 config.py | grep -E -o "\w+" | sed "s/.*/^&=/")'
which will produce the following output:
SUBST=decorated some-text
CMDOUT=Thu Nov 28 12:18:50 PST 2019
REGULAR=some-text
QUOTES='"
FANCY=xoxosome-textxoxo
EQUALS=hello = goodbye
BASH_A=trap
You can then read this output with Python, but note that the quotes are now gone, so you'll have to account for that.
Explanation of the incantation:
bash -c 'source ...instructions... && env | grep ...expressions...' tells bash to read & interpret the instructions, then grep the environment for the expressions. We're going to turn the config file into instructions which modify bash's environment.
If you try using set instead of env, the output will be inconsistent with respect to quoting. Using env avoids trap #3.
Instructions: We're going to create instructions of the form:
export FANCY="xoxo${REGULAR}xoxo"
so that bash can interpret them and env can read them.
sed "s/^/export /" config.py prefixes the variables with export.
sed "s/[[:space:]]*=[[:space:]]*/=/" converts the assignment format to syntax that bash can read with source. Using s/x/y/ instead of s/x/y/g avoids trap #1.
source <(...command...) causes bash to treat the output of the command as a file and run its lines, one by one.
Of course, one way to avoid this complexity is to have the file use bash syntax to begin with. If that were the case, we would use source config.sh instead of source <(...command...).
Expressions: We want to grep the output of env for patterns like ^FANCY=.
cut -d= -f1 config.py | grep -E -o "\w+" finds the variable names in config.py.
sed "s/.*/^&=/" turns variable names like FANCY to grep search expressions such as ^FANCY=. This is to avoid trap #2.
grep -f <(...command...) gets grep to treat the output of the command as a file containing one search expression in each line, which in this case would be ^FANCY=, ^CMDOUT= etc.
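The two text transformations in the incantation can be mirrored in Python to see exactly what bash is asked to source and which patterns grep searches for. This is a sketch under the assumption that Python's \s behaves like [[:space:]] for an ASCII config file; to_export_line and grep_patterns are hypothetical names:

```python
import re

CONFIG_LINES = [
    'REGULAR = "some-text"',
    'EQUALS = "hello = goodbye" # trap #1: search of \'=\'',
    'FANCY = "xoxo${REGULAR}xoxo"',
    'BASH_A = "trap" # trap #2: avoid matching variables like BASH_ARGV',
]


def to_export_line(line: str) -> str:
    # sed "s/^/export /" followed by sed "s/[[:space:]]*=[[:space:]]*/=/":
    # count=1 mirrors s/x/y/ without the g flag, so only the first '=' changes (trap #1).
    return re.sub(r"\s*=\s*", "=", "export " + line, count=1)


def grep_patterns(line: str) -> list[str]:
    # cut -d= -f1 keeps the text before the first '=', grep -E -o "\w+" extracts the
    # words in it, and sed "s/.*/^&=/" anchors each one as ^NAME= (trap #2).
    before_equals = line.split("=", 1)[0]
    return ["^" + word + "=" for word in re.findall(r"\w+", before_equals)]


for line in CONFIG_LINES:
    print(to_export_line(line), grep_patterns(line))
```

For the FANCY line this prints export FANCY="xoxo${REGULAR}xoxo" together with ['^FANCY='], which is the instruction bash sources and the pattern grep later uses to pick the variable out of env.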
EDIT
Since you actually want to just pass this environment to another bash command rather than use it in python, you can actually just have python run this:
bash -c 'source <(sed "s/^/export /" config.py | sed "s/[[:space:]]*=[[:space:]]*/=/") && $COMMAND'
(assuming that COMMAND is specified in the config file).
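One caveat, sketched here under stated assumptions: in the EDIT, $NUMPROCS is expanded while the double-quoted export line is sourced, so NUMPROCS must be defined earlier in the file. If COMMAND already holds literal text such as fluent -3ddp -n$NUMPROCS ... (for example because it was exported with single quotes), an unquoted $COMMAND is only split into words; bash does not expand the $NUMPROCS inside it a second time. eval performs that second pass. The Python driver below is a hypothetical test harness that uses echo as a dry run in place of the real solver; inside the batch script the equivalent line would be eval "$COMMAND". As with the answer above, eval runs whatever the variable contains, so use it only with trusted config files.

```python
import os
import shutil
import subprocess

# Assumed environment, matching the question: COMMAND holds literal $NAME references.
env = dict(
    os.environ,
    COMMAND="fluent -3ddp -n$NUMPROCS -hosts=./hosts -file $JOBFILE",
    NUMPROCS="32",
    JOBFILE="/path/to/jobfile",
)

# Prefer bash; eval is POSIX, so a plain sh gives the same result here.
SHELL = shutil.which("bash") or "sh"


def run_in_shell(script: str) -> str:
    """Run script with SHELL -c in the environment above and return its stdout."""
    result = subprocess.run(
        [SHELL, "-c", script], env=env, capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


# Without eval: word splitting only, the inner references stay literal.
print(run_in_shell("echo $COMMAND"))
# fluent -3ddp -n$NUMPROCS -hosts=./hosts -file $JOBFILE

# With eval: the text of COMMAND is parsed again, so $NUMPROCS and $JOBFILE expand.
print(run_in_shell('eval "echo $COMMAND"'))
# fluent -3ddp -n32 -hosts=./hosts -file /path/to/jobfile
```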
It seems I have not explained the issue well enough, but your third solution seems to meet my expectations, though so far I have not managed to adapt it.
Based on your third (bash) solution, let me state it more directly:
Let's say I have the following after running Python, and this cannot be modified:
export COMMAND='fluent -3ddp -n$NUMPROCS -hosts=./hosts -file $JOBFILE'
export JOBFILE='/path/to/jobfile'
export NUMPROCS='32'
export WHATSOEVER='SPECIFIC VARIABLE TO SIMULATION SOFTWARE'
I want to execute the following from the SLURM batch file (bash), using $COMMAND, $JOBFILE and $NUMPROCS:
fluent -3ddp -n32 -hosts=./hosts -file /path/to/jobfile
Please note: I have a backup solution in Python. Using regex substitution, I managed to replace $VARIABLE with its value, assuming that $VARIABLE is not itself composed of another $variable. It just takes a lot of lines for what seemed to me a simple request.
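For the regex-based fallback, the Python standard library already covers the simple case. os.path.expandvars replaces $NAME and ${NAME} with values from os.environ and leaves unknown references untouched; string.Template does the same with an explicit mapping, where substitute raises KeyError for a missing name and safe_substitute leaves it in place. This is a minimal sketch reusing the question's variables, offered as an alternative to the regex approach rather than a description of the poster's script. Like that approach, each call makes a single pass, so a value that itself contains $other is not expanded again.

```python
import os
from string import Template

# Values as in the question; in the real script they come from the config file.
os.environ["NUMPROCS"] = "32"
os.environ["JOBFILE"] = "/path/to/JOBFILE"
COMMAND = "fluent -3ddp -n$NUMPROCS -hosts=./hosts -file $JOBFILE"

# Option 1: expand against os.environ; unknown $NAMES are left as they are.
expanded = os.path.expandvars(COMMAND)

# Option 2: explicit mapping; substitute() raises KeyError if a name is missing.
templated = Template(COMMAND).substitute(os.environ)

print(expanded)               # fluent -3ddp -n32 -hosts=./hosts -file /path/to/JOBFILE
print(expanded == templated)  # True
```

The expanded string can then be stored in os.environ['COMMAND'] before sbatch is called, so the batch script receives a command line with no $ references left.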

How to get the same hash in Python3 and Mac / Linux terminal?

How can I get the same sha256 hash in terminal (Mac/Linux) and Python?
I tried different versions of the examples below and searched Stack Overflow.
Terminal:
echo 'test text' | shasum -a 256
c2a4f4903509957d138e216a6d2c0d7867235c61088c02ca5cf38f2332407b00
Python3:
import hashlib
hashlib.sha256(str("test text").encode('utf-8')).hexdigest()
'0f46738ebed370c5c52ee0ad96dec8f459fb901c2ca4e285211eddf903bf1598'
Update:
Different from Why is an MD5 hash created by Python different from one created using echo and md5sum in the shell? because in Python 3 you need to encode explicitly, and I need the solution in Python, not just in the terminal. The "duplicate" will not work on files:
example.txt content:
test text
Terminal:
shasum -a 256 example.txt
c2a4f4903509957d138e216a6d2c0d7867235c61088c02ca5cf38f2332407b00
The echo built-in adds a trailing newline, yielding a different string and thus a different hash. Do it like so:
echo -n 'test text' | shasum -a 256
If you did intend to hash the newline as well (I advise against this, as it violates POLA, the principle of least astonishment), it needs to be added in Python like so:
hashlib.sha256("{}\n".format("test text").encode('utf-8')).hexdigest()
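Since the update asks about files: hashing a file's raw bytes in Python matches shasum -a 256 example.txt, trailing newline included, as long as the file is opened in binary mode so that nothing is translated. A sketch with a hypothetical helper sha256_file; the temporary file stands in for example.txt and assumes it ends with a newline, as echo 'test text' > example.txt would write it:

```python
import hashlib
import os
import tempfile


def sha256_file(path: str, chunk_size: int = 65536) -> str:
    """Hex SHA-256 of a file's raw bytes, comparable to `shasum -a 256 path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:  # binary mode: no newline or encoding translation
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.NamedTemporaryFile("wb", delete=False, suffix=".txt") as tmp:
    tmp.write(b"test text\n")  # the bytes echo 'test text' > example.txt produces
try:
    # The file hash equals the hash of the string with its trailing newline.
    print(sha256_file(tmp.name) == hashlib.sha256(b"test text\n").hexdigest())  # True
finally:
    os.remove(tmp.name)
```

On Python 3.11+, hashlib.file_digest(f, "sha256") performs the same chunked loop for a file opened in binary mode.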

Outputting hex values in python3

I am writing shellcode exploits with Python 3. However, when I try to output some hex bytes, e.g. with python3 -c 'print("\x8c")' | xxd,
the value in xxd is c28c rather than the expected 8c.
This issue does not occur in python2.
Your issue arises because Python 3 handles strings as Unicode, and print has to encode that Unicode text (here as UTF-8) before writing it to your terminal. Try the following to bypass this:
python3 -c "import sys; sys.stdout.buffer.write(b'\x8c')" | xxd
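To see where the two bytes come from: "\x8c" in a Python 3 str literal is the code point U+008C, not a byte, and print() encodes it with stdout's encoding. With UTF-8 (which the c28c output suggests the asker's system uses), that code point takes two bytes, C2 8C. A short sketch; payload is a hypothetical stand-in for real shellcode bytes:

```python
import sys

text = "\x8c"  # a str: the code point U+008C, not a byte
print(text.encode("utf-8"))    # b'\xc2\x8c', what xxd showed as c28c
print(text.encode("latin-1"))  # b'\x8c', latin-1 maps U+0000..U+00FF to single bytes

# Keep binary data as bytes from the start and bypass the text layer entirely.
payload = b"\x41\x42\x8c"      # hypothetical stand-in for real shellcode
sys.stdout.flush()             # flush pending text output before writing raw bytes
sys.stdout.buffer.write(payload)
sys.stdout.buffer.flush()
```

Unlike print(), sys.stdout.buffer.write adds no trailing newline, so xxd shows exactly the payload bytes.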

How do I pipe Unicode into a native application in PowerShell

I have a native program written in Python that expects its input on stdin. As a simple example,
#!python3
import sys
with open('foo.txt', 'w', encoding='utf8') as f:
    f.write(sys.stdin.read())
I want to be able to pass a (PowerShell) string to this program as standard input. Python expects its standard input in the encoding specified in $env:PYTHONIOENCODING, which I will typically set to UTF8 (so that I don't get any encoding errors).
But no matter what I do, characters get corrupted. I've searched the net and found suggestions to change [Console]::InputEncoding/[Console]::OutputEncoding, or to use chcp, but nothing seems to work.
Here's my basic test:
PS >[Console]::OutputEncoding.EncodingName
Unicode (UTF-8)
PS >[Console]::InputEncoding.EncodingName
Unicode (UTF-8)
PS >$env:PYTHONIOENCODING
utf-8
PS >python -c "print('\N{Euro sign}')" | python -c "import sys; print(sys.stdin.read())"
´╗┐?
PS >chcp 1252
Active code page: 1252
PS >python -c "print('\N{Euro sign}')" | python -c "import sys; print(sys.stdin.read())"
?
PS >chcp 65001
Active code page: 65001
PS >python -c "print('\N{Euro sign}')" | python -c "import sys; print(sys.stdin.read())"
?
How can I fix this problem?
I can't even explain what's going on here. Basically, I want the test (python -c "print('\N{Euro sign}')" | python -c "import sys; print(sys.stdin.read())") to print out a Euro sign. And to understand why, I first need to get that to work by whatever means :-) (Because then I can translate that knowledge to my real scenario, which is writing working pipelines of Python programs that don't break when they encounter Unicode characters.)
Thanks to mike z, the following works:
$OutputEncoding = [Console]::OutputEncoding = (new-object System.Text.UTF8Encoding $false)
$env:PYTHONIOENCODING = "utf-8"
python -c "print('\N{Euro sign}')" | python -c "import sys; print(sys.stdin.read())"
The new-object is needed to get a UTF-8 encoding without a BOM. The $OutputEncoding variable and [Console]::OutputEncoding both appear to need to be set.
I still don't fully understand the difference between the two encoding values, and why you would ever have them set differently (which appears to be the default).
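A complementary, defensive step on the receiving end (a sketch, not part of the fix above): reading stdin as bytes via sys.stdin.buffer and decoding with Python's utf-8-sig codec bypasses PYTHONIOENCODING for input and drops a leading UTF-8 BOM (EF BB BF) if one arrives; the ´╗┐ in the first test appears to be exactly such a BOM rendered in the console's code page. It cannot recover a € that was already turned into ? before reaching Python, so the $OutputEncoding setting is still needed. decode_pipe_input and read_stdin_text are hypothetical names:

```python
import codecs
import sys


def decode_pipe_input(raw: bytes) -> str:
    """Decode piped bytes as UTF-8, dropping a leading UTF-8 BOM if present."""
    return raw.decode("utf-8-sig")


def read_stdin_text() -> str:
    """Read all of stdin as bytes, so PYTHONIOENCODING plays no part in decoding."""
    return decode_pipe_input(sys.stdin.buffer.read())


# What arrives when the sender prepends a BOM, versus when it does not.
with_bom = codecs.BOM_UTF8 + "\N{EURO SIGN}\n".encode("utf-8")
without_bom = "\N{EURO SIGN}\n".encode("utf-8")

print(ascii(with_bom.decode("utf-8")))     # '\ufeff\u20ac\n'
print(ascii(decode_pipe_input(with_bom)))  # '\u20ac\n'
print(decode_pipe_input(with_bom) == decode_pipe_input(without_bom))  # True
```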
