I'm trying to translate the following Python script to R, but I'm having difficulty that reflects the fact that I am not well versed in either Python or R.
Here is what I have for Python:
import hashlib, hmac
print hmac.new('123456', 'hello'.encode('utf-8'),hashlib.sha256).digest()
When I run this in Python I'm getting a message that says standard output is empty.
Question: What am I doing wrong?
Here's what I'm using for R
library('digest')
hmac('123456','hello', algo='sha256', serialize=FALSE)
My questions with the R code are:
How do I encode to UTF-8 in R? I couldn't find a package.
What are the correct settings for the serialize and raw parameters in R, given that I want to match the output of the Python function above (once it's working)?
If you want to get the bytes of the hash in R, set raw=TRUE. Then you can write it out as a binary file:
library('digest')
x <- hmac('123456', enc2utf8('hello'), algo='sha256', serialize=FALSE, raw=TRUE)
writeBin(x, "Rout.txt")
If you're not outputting text, the encoding doesn't matter: these are raw bytes. The only difference in the output is that the Python print seems to add a newline character. If I hexdump the R file I see:
0000000 ac 28 d6 02 c7 67 42 4d 0c 80 9e de bf 73 82 8b
0000010 ed 5c e9 9c e1 55 6f 4d f8 e2 23 fa ee c6 0e dd
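For comparison, here is a sketch of the Python side updated for Python 3 (the print statement in the original is Python 2 syntax, which may be why standard output appeared empty or errored): the key must be a bytes object, and .hex() makes the digest easy to compare with the R hexdump:

```python
import hashlib
import hmac

# In Python 3, both the key and the message must be bytes
digest = hmac.new(b'123456', 'hello'.encode('utf-8'), hashlib.sha256).digest()
print(digest.hex())
```

This prints the same 32 bytes shown in the hexdump above, minus the trailing newline that print adds.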
I have a private key with the below format. The private key is just for testing.
xprvA3bdZ5Dz3QFmyC6Y7tKeJahknnUZPgpw2Zhf7LNmZ1uLfJo2b557DpPBeBVW6Etbggpnd6VRUEWvKUj3NnBuU1MeWH8CY7eVTQ2yvZUXYSq
That is very likely in an Ethereum format. I need to convert it to base64 in order to use it in signatures (Filecoin transactions need the private key in base64 format).
For sending Filecoin transaction I need to use this method:
transactionSignLotus(unSignedMessage, privatekey)
## transactionSignLotus (support Lotus schema)
Sign a transaction and return a JSON string of the signed transaction which can then be sent to a lotus node.
Arguments:
* **transaction**: a filecoin transaction;
* **privatekey**: a private key (hexstring or buffer);
My private key doesn't work with this method. I think I need to convert it to base64, but I have tried most of the usual ways of converting the private key to base64 and cannot get a result.
Note: When I say base64 I mean something like this:
px2g1zwEd1+EMfj4nX1oh0roouBGHhPo7QUNkPBHk1Q=
$ cat 68879302.js
let bip32 = require('bitcoinjs-lib').bip32
let x = bip32.fromBase58('xprvA3bdZ5Dz3QFmyC6Y7tKeJahknnUZPgpw2Zhf7LNmZ1uLfJo2b557DpPBeBVW6Etbggpnd6VRUEWvKUj3NnBuU1MeWH8CY7eVTQ2yvZUXYSq')
console.log(x.__D)
$ node 68879302.js
<Buffer cf 8c f7 64 cb d5 6b 5c 40 fa d6 7c 40 9d 52 2e c1 17 e4 93 16 dd 03 ec ec c4 1f 6c 80 6c ee d4>
import base64
key = b'xprvA3bdZ5Dz3QFmyC6Y7tKeJahknnUZPgpw2Zhf7LNmZ1uLfJo2b557DpPBeBVW6Etbggpnd6VRUEWvKUj3NnBuU1MeWH8CY7eVTQ2yvZUXYSq'
b64 = base64.b64encode(key)
print(b64)
# got b'eHBydkEzYmRaNUR6M1FGbXlDNlk3dEtlSmFoa25uVVpQZ3B3MlpoZjdMTm1aMXVMZkpvMmI1NTdEcFBCZUJWVzZFdGJnZ3BuZDZWUlVFV3ZLVWozTm5CdVUxTWVXSDhDWTdlVlRRMnl2WlVYWVNx'
Did it help?
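What needs base64-encoding is the raw 32-byte key (the Buffer printed by the node script above), not the base58 xprv string. A minimal Python sketch, using the hex transcribed from that Buffer:

```python
import base64

# Raw private key bytes, transcribed from the Buffer printed above
priv_hex = "cf8cf764cbd56b5c40fad67c409d522ec117e49316dd03ececc41f6c806ceed4"
priv_b64 = base64.b64encode(bytes.fromhex(priv_hex)).decode('ascii')
print(priv_b64)
```

The result is a 44-character base64 string of the same shape as the example in the question.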
Assuming you use *nix with bash tools:
$> echo -n 'PUT-YOUR-HEX-HERE' | xxd -r -p | base64 -w 0  # xxd -r -p converts the hex to raw bytes first
The following code retrieves an iterable of strings in rows, which contains a PDF byte stream. Each row was of type str. The resulting file was in PDF format and could be opened.
with open(fname, "wb") as fd:
    for row in rows:
        fd.write(row)
Due to a new C library and changes in the Python implementation, the str changed to unicode, and the corresponding content changed as well, so my PDF file is broken.
Starting bytes of first row object:
old row[0]: 25 50 44 46 2D 31 2E 33 0D 0A 25 E2 E3 CF D3 0D 0A ...
new row[0]: 25 50 44 46 2D 31 2E 33 0D 0A 25 C3 A2 C3 A3 C3 8F C3 93 0D 0A ...
I have aligned the corresponding byte positions above; it looks like a unicode problem.
I think this is a good start but I still have a unicode string as input...
>>> "\xc3\xa2".decode('utf8') # but as input I have u"\xc3\xa2"
u'\xe2'
I have already tried several calls to encode and decode, so I need a more analytical way to fix this. I can't see the wood for the trees. Thank you.
When you find u"\xc3\xa2" in a Python unicode string, it often means that you have read a UTF-8 encoded file as if it were Latin-1 encoded. So the best thing to do is certainly to fix the initial read.
That being said if you have to depend on broken code, the fix is still easy: you just encode the string as Latin1 and then decode it as UTF-8:
fixed_u_str = broken_u_str.encode('Latin1').decode('UTF-8')
For example:
u"\xc3\xa2\xc3\xa3".encode('Latin1').decode('utf8')
correctly gives u"\xe2\xe3", which displays as âã.
This looks like you should be doing
fd.write(row.encode('utf-8'))
assuming the type of row is now unicode (this is my understanding of how you presented things).
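Combining the two answers for the binary-writing case, a minimal sketch (Python 3 semantics, where str is unicode): encode as Latin-1 to undo the misread, decode as UTF-8 to recover the real text, then encode as Latin-1 again if you need the raw bytes for fd.write:

```python
broken = u"\xc3\xa2\xc3\xa3"  # UTF-8 bytes that were misread as Latin-1

fixed_text = broken.encode('latin-1').decode('utf-8')  # u'\xe2\xe3', displays as "âã"
original_bytes = fixed_text.encode('latin-1')          # b'\xe2\xe3', the bytes to write
```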
I am creating a Python function to perform counter mode encryption using the PyCrypto module. I am aware of the builtin, but want to implement it myself.
I'm trying Test Vector #1 from RFC 3686, and have the correct Counter Block and the correct Key in ASCII form. But when I encrypt the Counter Block using the Key, I don't get the expected Key Stream.
The relevant parts of my code:
cipher = AES.new(key)
ctr_block = iv + nonce + ctr
key_stream = base64.b64decode(cipher.encrypt(ctr_block))
I can provide more code if needed, but I'm not sure how useful it would be, because ctr_block and key show many question-mark characters when I print them.
Why am I not getting the expected answer? It seems like everything should go right. Perhaps I made some mistake with the encoding of the string.
Edit
Self-contained code:
from Crypto.Cipher import AES
import base64
def hex_to_str(hex_str):
    return str(bytearray([int(n, 16) for n in hex_str.split()]))
key = hex_to_str("AE 68 52 F8 12 10 67 CC 4B F7 A5 76 55 77 F3 9E")
iv = hex_to_str("00 00 00 00 00 00 00 00")
nonce = hex_to_str("00 00 00 30")
ctr = hex_to_str("00 00 00 01")
cipher = AES.new(key)
ctr_block = iv + nonce + ctr
key_stream = base64.b64decode(cipher.encrypt(ctr_block))
print "".join([hex(ord(char)) for char in key_stream])
# 0xd90xda0x72
First, the correct CTR block order is nonce + iv + ctr. Second, that base64.b64decode call is wrong: cipher.encrypt returns raw ciphertext bytes, not base64. After these two fixes your code prints 0xb70x600x330x280xdb0xc20x930x1b0x410xe0x160xc80x60x7e0x620xdf, which is the correct key stream.
First, use byte strings:
In [14]: keystring = "AE 68 52 F8 12 10 67 CC 4B F7 A5 76 55 77 F3 9E"
In [15]: keystring.replace(' ', '').decode('hex')
Out[15]: '\xaehR\xf8\x12\x10g\xccK\xf7\xa5vUw\xf3\x9e'
Second, you shouldn't use base64.
I'm working on Windows with Python 2.6.1.
I have a Unicode UTF-16 text file containing the single string Hello, if I look at it in a binary editor I see:
FF FE 48 00 65 00 6C 00 6C 00 6F 00 0D 00 0A 00
BOM H e l l o CR LF
What I want to do is read in this file, run it through Google Translate API, and write both it and the result to a new Unicode UTF-16 text file.
I wrote the following Python script (actually I wrote something more complex than this with more error checking, but this is stripped down as a minimal test case):
#!/usr/bin/python
import urllib
import urllib2
import sys
import codecs
def translate(key, line, lang):
    ret = ""
    print "translating " + line.strip() + " into " + lang
    url = "https://www.googleapis.com/language/translate/v2?key=" + key + "&source=en&target=" + lang + "&q=" + urllib.quote(line.strip())
    f = urllib2.urlopen(url)
    for l in f.readlines():
        if l.find("translatedText") > 0 and l.find('""') == -1:
            a,b = l.split(":")
            ret = unicode(b.strip('"'), encoding='utf-16', errors='ignore')
            break
    return ret
rd_file_name = sys.argv[1]
rd_file = codecs.open(rd_file_name, encoding='utf-16', mode="r")
rd_file_new = codecs.open(rd_file_name+".new", encoding='utf-16', mode="w")
key_file = open("api.key","r")
key = key_file.readline().strip()
for line in rd_file.readlines():
    new_line = translate(key, line, "ja")
    rd_file_new.write(unicode(line) + "\n")
    rd_file_new.write(new_line)
    rd_file_new.write("\n")
This gives me an almost-Unicode file with some extra bytes in it:
FF FE 48 00 65 00 6C 00 6C 00 6F 00 0D 00 0A 00 0A 00
20 22 E3 81 93 E3 82 93 E3 81 AB E3 81 A1 E3 81 AF 22 0A 00
I can see that 20 is a space and 22 is a quote. I assume that "E3" is an escape character that urllib2 is using to indicate that the next character is UTF-16 encoded?
If I run the same script but with "cs" (Czech) instead of "ja" (Japanese) as the target language, the response is all ASCII and I get the Unicode file with my "Hello" first as UTF-16 chars and then "Ahoj" as single byte ASCII chars.
I'm sure I'm missing something obvious but I can't see what. I tried urllib.unquote() on the result from the query but that didn't help. I also tried printing the string as it comes back in f.readlines() and it all looks pretty plausible, but it's hard to tell because my terminal window doesn't support Unicode properly.
Any other suggestions for things to try? I've looked at the suggested dupes but none of them seem to quite match my scenario.
I believe the output from Google is UTF-8, not UTF-16. Try this fix:
ret = unicode(b.strip('"'), encoding='utf-8', errors='ignore')
Those E3 bytes are not "escape characters". If one had no access to documentation, and was forced to make a guess, the most likely suspect for the response encoding would be UTF-8. Expectation (based on a one-week holiday in Japan): something like "konnichiwa".
>>> response = "\xE3\x81\x93\xE3\x82\x93\xE3\x81\xAB\xE3\x81\xA1\xE3\x81\xAF"
>>> ucode = response.decode('utf8')
>>> print repr(ucode)
u'\u3053\u3093\u306b\u3061\u306f'
>>> import unicodedata
>>> for c in ucode:
...     print unicodedata.name(c)
...
HIRAGANA LETTER KO
HIRAGANA LETTER N
HIRAGANA LETTER NI
HIRAGANA LETTER TI
HIRAGANA LETTER HA
>>>
Looks close enough to me ...
I am trying to read data from a file in binary mode and manipulate that data.
import os

try:
    resultfile = open("binfile", "rb")
except:
    print "Error"
resultsize = os.path.getsize("binfile")
There is a 32-byte header, which I parse fine, and then the buffer of binary data starts. The data can be anywhere from 16 to 4092 bytes and can be in any format, from text to a PDF or an image or anything else. The header has the size of the data, so to get this information I do
contents = resultfile.read(resultsize)
and this puts the entire file into a string buffer. I found out this is probably my problem, because when I try to copy chunks of the hex data from "contents" into a new file, some bytes do not copy correctly, so PDFs and images come out corrupted.
Printing out a bit of the file string buffer in the interpreter yields for example something like "%PDF-1.5\r\n%\xb5\xb5\xb5\xb5\r\n1 0 obj\r\n" when I just want the bytes themselves in order to write them to a new file. Is there an easy solution to this problem that I am missing?
Here is an example hex dump comparing the PDF written by my Python (first) with the real PDF (second):
25 50 44 46 2D 31 2E 35 0D 0D 0A 25 B5 B5 B5 B5 0D 0D 0A 31 20 30 20 6F 62 6A 0D 0D 0A
25 50 44 46 2D 31 2E 35 0D 0A 25 B5 B5 B5 B5 0D 0A 31 20 30 20 6F 62 6A
It seems like a 0D is being added whenever there is a 0D 0A. In image files it might be a different byte; I don't remember and might have to test it.
My code to write the new file is pretty simple, using contents as the string buffer holding all the data:
fbuf = contents[offset+8:size+offset]
fl = open(fname, 'a')
fl.write(fbuf)
This is being called in a loop based on a signature found in the header. offset+8 is the beginning of the actual PDF data, and size is the size of the chunk to copy.
You need to open your output file in binary mode, as you do your input file. Otherwise, newline characters may get changed. You can see that this is what happens in your hex dump: 0A characters ('\n') are changed into 0D 0A ('\r\n').
This should work:
input_file = open('f1', 'rb')
contents = input_file.read()
#....
data = contents[offset+8:size+offset] #for example
output_file = open('f2', 'wb')
output_file.write(data)
The result you get is "just the bytes themselves". You can write() them to a file opened in binary mode to copy them.
"It seems like a 0D is being added whenever there is a 0D 0A"
Sounds like you are on windows, and you are opening one of your files in text mode instead of binary.