A bit of a weird question perhaps, but I'm trying to replicate a python example where they are creating a HMAC SHA256 hash from a series of parameters.
I've run into a problem where I'm supposed to translate an api key in hex to ascii and use it as secret, but I just can't get the output to be the same as python.
>>> import hmac
>>> import hashlib
>>> apiKey = "76b02c0c4543a85e45552466694cf677937833c9cce87f0a628248af2d2c495b"
>>> apiKey.decode('hex')
'v\xb0,\x0cEC\xa8^EU$fiL\xf6w\x93x3\xc9\xcc\xe8\x7f\nb\x82H\xaf-,I['
If I've understood the material online this is supposed to represent the hex string in ascii characters.
Now to the powershell script:
$apikey = '76b02c0c4543a85e45552466694cf677937833c9cce87f0a628248af2d2c495b';
$hexstring = ""
for($i=0; $i -lt $apikey.Length;$i=$i+2){
$hexelement = [string]$apikey[$i] + [string]$apikey[$i+1]
$hexstring += [CHAR][BYTE][CONVERT]::toint16($hexelement,16)
}
That outputs the following:
v°,♀EC¨^EU$fiLöw?x3ÉÌè⌂
b?H¯-,I[
They are almost the same, but not quite and using them as secret in the HMAC generates different results. Any ideas?
Stating the obvious: The key in this example is no the real key.
Update:
They look more or less the same, but the encoding of the output is different. I also verified the hex to ASCII in multiple online functions and the powershell version seems right.
Does anyone have an idea how to compare the two different outputs?
Update 2:
I converted each character to integer and both Python and Powershell generates the same numbers, aka the content should be the same.
Attaching the scripts
Powershell:
Function generateToken {
Param($apikey, $url, $httpMethod, $queryparameters=$false, $postData=$false)
#$timestamp = [int]((Get-Date -UFormat %s).Replace(",", "."))
$timestamp = "1446128942"
$datastring = $httpMethod + $url
if($queryparameters){ $datastring += $queryparameters }
$datastring += $timestamp
if($postData){ $datastring += $postData }
$hmacsha = New-Object System.Security.Cryptography.HMACSHA256
$apiAscii = HexToString -hexstring $apiKey
$hmacsha.key = [Text.Encoding]::ASCII.GetBytes($apiAscii)
$signature = $hmacsha.ComputeHash([Text.Encoding]::ASCII.GetBytes($datastring))
$signature
}
Function HexToString {
Param($hexstring)
$asciistring = ""
for($i=0; $i -lt $hexstring.Length;$i=$i+2){
$hexelement = [string]$hexstring[$i] + [string]$hexstring[$i+1]
$asciistring += [CHAR][BYTE][CONVERT]::toint16($hexelement,16)
}
$asciistring
}
Function TokenToHex {
Param([array]$Token)
$hexhash = ""
Foreach($element in $Token){
$hexhash += '{0:x}' -f $element
}
$hexhash
}
$apiEndpoint = "http://test.control.llnw.com/traffic-reporting-api/v1"
#what you see in Control on Edit My Profile page#
$apikey = '76b02c0c4543a85e45552466694cf677937833c9cce87f0a628248af2d2c495b';
$queryParameters = "shortname=bulkget&service=http&reportDuration=day&startDate=2012-01-01"
$postData = "{param1: 123, param2: 456}"
$token = generateToken -uri $apiEndpoint -httpMethod "GET" -queryparameters $queryParameters, postData=postData, -apiKey $apiKey
TokenToHex -Token $token
Python:
import hashlib
import hmac
import time
try: import simplejson as json
except ImportError: import json
class HMACSample:
def generateSecurityToken(self, url, httpMethod, apiKey, queryParameters=None, postData=None):
#timestamp = str(int(round(time.time()*1000)))
timestamp = "1446128942"
datastring = httpMethod + url
if queryParameters != None : datastring += queryParameters
datastring += timestamp
if postData != None : datastring += postData
token = hmac.new(apiKey.decode('hex'), msg=datastring, digestmod=hashlib.sha256).hexdigest()
return token
if __name__ == '__main__':
apiEndpoint = "http://test.control.llnw.com/traffic-reporting-api/v1"
#what you see in Control on Edit My Profile page#
apiKey = "76b02c0c4543a85e45552466694cf677937833c9cce87f0a628248af2d2c495b";
queryParameters = "shortname=bulkget&service=http&reportDuration=day&startDate=2012-01-01"
postData = "{param1: 123, param2: 456}"
tool = HMACSample()
hmac = tool.generateSecurityToken(url=apiEndpoint, httpMethod="GET", queryParameters=queryParameters, postData=postData, apiKey=apiKey)
print json.dumps(hmac, indent=4)
apiKey with "test" instead of the converted hex to ASCII string outputs the same value which made me suspect that the conversion was the problem. Now I'm not sure what to believe anymore.
/Patrik
ASCII encoding support characters from this code point range 0–127. Any character outside this range, encoded with byte 63, which correspond to ?, in case you decode byte array back to string. So, with your code, you ruin your key by applying ASCII encoding to it. But if what you want is a byte array, then why do you do Hex String -> ASCII String -> Byte Array instead of just Hex String -> Byte Array?
Here is PowerShell code, which generate same results, as your Python code:
function GenerateToken {
param($apikey, $url, $httpMethod, $queryparameters, $postData)
$datastring = -join #(
$httpMethod
$url
$queryparameters
#[DateTimeOffset]::Now.ToUnixTimeSeconds()
1446128942
$postData
)
$hmacsha = New-Object System.Security.Cryptography.HMACSHA256
$hmacsha.Key = #($apikey -split '(?<=\G..)(?=.)'|ForEach-Object {[byte]::Parse($_,'HexNumber')})
[BitConverter]::ToString($hmacsha.ComputeHash([Text.Encoding]::UTF8.GetBytes($datastring))).Replace('-','').ToLower()
}
$apiEndpoint = "http://test.control.llnw.com/traffic-reporting-api/v1"
#what you see in Control on Edit My Profile page#
$apikey = '76b02c0c4543a85e45552466694cf677937833c9cce87f0a628248af2d2c495b';
$queryParameters = "shortname=bulkget&service=http&reportDuration=day&startDate=2012-01-01"
$postData = "{param1: 123, param2: 456}"
GenerateToken -url $apiEndpoint -httpMethod "GET" -queryparameters $queryParameters -postData $postData -apiKey $apiKey
I also fix some other errors in your PowerShell code. In particular, arguments to GenerateToken function call. Also, I change ASCII to UTF8 for $datastring encoding. UTF8 yields exactly same bytes if all characters are in ASCII range, so it does not matter in you case. But if you want to use characters out of ASCII range in $datastring, than you should choose same encoding, as you use in Python, or you will not get the same results.
Related
I have an Api code from Python 2 is as below:
dataStr = request.POST['data']
data = json.loads(dataStr)
key_string = data.get('key')
seed = data.get('seed')
guid = data.get('guid')
sha1 = hashlib.sha1()
yesStr = seed + 'yes' + key_string + 'ke.yp.ad' + guid
sha1.update(yesStr)
yesValue = sha1.hexdigest()
resp = {
'data': yesValue,
'place_id': str(record.GroupNumber)
}
return HttpResponse(resp)
now I am upgrading it to python 3, but sha1.update must use an encoded string. but I don't know what the encoding format was. so far I tried
sha1.udpate(yesStr.encode("utf-8"))
sha1.udpate(yesStr.encode("ascii"))
but none of them is matching the old string. Can anyone help? thanks.
Finally I figured this out with my friend's help. The default encoding for Python 2 is 'latin1'. So what I did to fix the problem is to change
sha1.update(yesStr)
to:
sha1.update(yesStr.encode('latin1'))
and it worked in python 3.
i am trying to encode a particular string with python with pycrypto and encode the same string with nodejs with crypto.
i am getting different results in both the cases for the same input string
python code:
from Crypto.Cipher import AES
from hashlib import md5
import base64
password = 'aquickbrownfoxjumpsoverthelazydog'
input = 'hello+world'
BLOCK_SIZE = 16
def pad (data):
pad = BLOCK_SIZE - len(data) % BLOCK_SIZE
return data + pad * chr(pad)
def unpad (padded):
pad = ord(padded[-1])
return padded[:-pad]
def text_encrypt(data, nonce, password):
m = md5()
m.update(password)
key = m.hexdigest()
m = md5()
m.update(password + key)
iv = m.hexdigest()
data = pad(data)
aes = AES.new(key, AES.MODE_CBC, iv[:16])
encrypted = aes.encrypt(data)
return base64.urlsafe_b64encode(encrypted)
output = text_encrypt(input, "", password)
print output
and the nodejs code is as follows:
var crypto = require('crypto');
var password = 'aquickbrownfoxjumpsoverthelazydog';
var input = 'hello+world';
var encrypt = function (input, password, callback) {
var m = crypto.createHash('md5');
m.update(password)
var key = m.digest('hex');
m = crypto.createHash('md5');
m.update(password + key)
var iv = m.digest('hex');
var data = new Buffer(input, 'utf8').toString('binary');
var cipher = crypto.createCipheriv('aes-256-cbc', key, iv.slice(0,16));
var nodev = process.version.match(/^v(\d+)\.(\d+)/);
var encrypted;
if( nodev[1] === '0' && parseInt(nodev[2]) < 10) {
encrypted = cipher.update(data, 'binary') + cipher.final('binary');
} else {
encrypted = cipher.update(data, 'utf8', 'binary') + cipher.final('binary');
}
var encoded = new Buffer(encrypted, 'binary').toString('base64');
callback(encoded);
};
encrypt(input, password, function (encoded) {
console.log(encoded);
});
the results for both the cases is different but after decryption they both tend to give the same correct result.
what might be the issue here?
You didn't specify what different results are you getting but those two should produce same-ish result. The only difference I see is in the base64 alphabet you're using.
In Python you're calling base64.urlsafe_b64encode() which differs from the standard Base64 in what characters it uses for values for 62 and 63 (- and _ instead of + and /). To get the same result, either in Python return:
return base64.b64encode(encrypted)
Or post-process the base64 encoded string in Node.js:
encoded = encoded.replace(/_/g, '/').replace(/-/g, '+');
All this being said, as I've mentioned in my comment, never derive an IV from your password/key (or anything else deterministic and unchanging). Use a cryptographically secure PRNG for it.
I downloaded my Facebook messenger data (in your Facebook account, go to settings, then to Your Facebook information, then Download your information, then create a file with at least the Messages box checked) to do some cool statistics
However there is a small problem with encoding. I'm not sure, but it looks like Facebook used bad encoding for this data. When I open it with text editor I see something like this: Rados\u00c5\u0082aw. When I try to open it with python (UTF-8) I get RadosÅ\x82aw. However I should get: Radosław.
My python script:
text = open(os.path.join(subdir, file), encoding='utf-8')
conversations.append(json.load(text))
I tried a few most common encodings. Example data is:
{
"sender_name": "Rados\u00c5\u0082aw",
"timestamp": 1524558089,
"content": "No to trzeba ostatnie treningi zrobi\u00c4\u0087 xD",
"type": "Generic"
}
I can indeed confirm that the Facebook download data is incorrectly encoded; a Mojibake. The original data is UTF-8 encoded but was decoded as Latin-1 instead. I’ll make sure to file a bug report.
What this means is that any non-ASCII character in the string data was encoded twice. First to UTF-8, and then the UTF-8 bytes were encoded again by interpreting them as Latin-1 encoded data (which maps exactly 256 characters to the 256 possible byte values), by using the \uHHHH JSON escape notation (so a literal backslash, a literal lowercase letter u, followed by 4 hex digits, 0-9 and a-f). Because the second step encoded byte values in the range 0-255, this resulted in a series of \u00HH sequences (a literal backslash, a literal lower case letter u, two 0 zero digits and two hex digits).
E.g. the Unicode character U+0142 LATIN SMALL LETTER L WITH STROKE in the name Radosław was encoded to the UTF-8 byte values C5 and 82 (in hex notation), and then encoded again to \u00c5\u0082.
You can repair the damage in two ways:
Decode the data as JSON, then re-encode any string values as Latin-1 binary data, and then decode again as UTF-8:
>>> import json
>>> data = r'"Rados\u00c5\u0082aw"'
>>> json.loads(data).encode('latin1').decode('utf8')
'Radosław'
This would require a full traversal of your data structure to find all those strings, of course.
Load the whole JSON document as binary data, replace all \u00hh JSON sequences with the byte the last two hex digits represent, then decode as JSON:
import re
from functools import partial
fix_mojibake_escapes = partial(
re.compile(rb'\\u00([\da-f]{2})').sub,
lambda m: bytes.fromhex(m[1].decode()),
)
with open(os.path.join(subdir, file), 'rb') as binary_data:
repaired = fix_mojibake_escapes(binary_data.read())
data = json.loads(repaired)
(If you are using Python 3.5 or older, you'll have to decode the repaired bytes object from UTF-8, so use json.loads(repaired.decode())).
From your sample data this produces:
{'content': 'No to trzeba ostatnie treningi zrobić xD',
'sender_name': 'Radosław',
'timestamp': 1524558089,
'type': 'Generic'}
The regular expression matches against all \u00HH sequences in the binary data and replaces those with the bytes they represent, so that the data can be decoded correctly as UTF-8. The second decoding is taken care of by the json.loads() function when given binary data.
Here is a command-line solution with jq and iconv. Tested on Linux.
cat message_1.json | jq . | iconv -f utf8 -t latin1 > m1.json
I would like to extend #Geekmoss' answer with the following recursive code snippet, I used to decode my facebook data.
import json
def parse_obj(obj):
if isinstance(obj, str):
return obj.encode('latin_1').decode('utf-8')
if isinstance(obj, list):
return [parse_obj(o) for o in obj]
if isinstance(obj, dict):
return {key: parse_obj(item) for key, item in obj.items()}
return obj
decoded_data = parse_obj(json.loads(file))
I noticed this works better, because the facebook data you download might contain list of dicts, in which case those dicts would be just returned 'as is' because of the lambda identity function.
My solution for parsing objects use parse_hook callback on load/loads function:
import json
def parse_obj(dct):
for key in dct:
dct[key] = dct[key].encode('latin_1').decode('utf-8')
pass
return dct
data = '{"msg": "Ahoj sv\u00c4\u009bte"}'
# String
json.loads(data)
# Out: {'msg': 'Ahoj svÄ\x9bte'}
json.loads(data, object_hook=parse_obj)
# Out: {'msg': 'Ahoj světe'}
# File
with open('/path/to/file.json') as f:
json.load(f, object_hook=parse_obj)
# Out: {'msg': 'Ahoj světe'}
pass
Update:
Solution for parsing list with strings does not working. So here is updated solution:
import json
def parse_obj(obj):
for key in obj:
if isinstance(obj[key], str):
obj[key] = obj[key].encode('latin_1').decode('utf-8')
elif isinstance(obj[key], list):
obj[key] = list(map(lambda x: x if type(x) != str else x.encode('latin_1').decode('utf-8'), obj[key]))
pass
return obj
Based on #Martijn Pieters solution, I wrote something similar in Java.
public String getMessengerJson(Path path) throws IOException {
String badlyEncoded = Files.readString(path, StandardCharsets.UTF_8);
String unescaped = unescapeMessenger(badlyEncoded);
byte[] bytes = unescaped.getBytes(StandardCharsets.ISO_8859_1);
String fixed = new String(bytes, StandardCharsets.UTF_8);
return fixed;
}
The unescape method is inspired by the org.apache.commons.lang.StringEscapeUtils.
private String unescapeMessenger(String str) {
if (str == null) {
return null;
}
try {
StringWriter writer = new StringWriter(str.length());
unescapeMessenger(writer, str);
return writer.toString();
} catch (IOException ioe) {
// this should never ever happen while writing to a StringWriter
throw new UnhandledException(ioe);
}
}
private void unescapeMessenger(Writer out, String str) throws IOException {
if (out == null) {
throw new IllegalArgumentException("The Writer must not be null");
}
if (str == null) {
return;
}
int sz = str.length();
StrBuilder unicode = new StrBuilder(4);
boolean hadSlash = false;
boolean inUnicode = false;
for (int i = 0; i < sz; i++) {
char ch = str.charAt(i);
if (inUnicode) {
unicode.append(ch);
if (unicode.length() == 4) {
// unicode now contains the four hex digits
// which represents our unicode character
try {
int value = Integer.parseInt(unicode.toString(), 16);
out.write((char) value);
unicode.setLength(0);
inUnicode = false;
hadSlash = false;
} catch (NumberFormatException nfe) {
throw new NestableRuntimeException("Unable to parse unicode value: " + unicode, nfe);
}
}
continue;
}
if (hadSlash) {
hadSlash = false;
if (ch == 'u') {
inUnicode = true;
} else {
out.write("\\");
out.write(ch);
}
continue;
} else if (ch == '\\') {
hadSlash = true;
continue;
}
out.write(ch);
}
if (hadSlash) {
// then we're in the weird case of a \ at the end of the
// string, let's output it anyway.
out.write('\\');
}
}
Facebook programmers seem to have mixed up the concepts of Unicode encoding and escape sequences, probably while implementing their own ad-hoc serializer. Further details in Invalid Unicode encodings in Facebook data exports.
Try this:
import json
import io
class FacebookIO(io.FileIO):
def read(self, size: int = -1) -> bytes:
data: bytes = super(FacebookIO, self).readall()
new_data: bytes = b''
i: int = 0
while i < len(data):
# \u00c4\u0085
# 0123456789ab
if data[i:].startswith(b'\\u00'):
u: int = 0
new_char: bytes = b''
while data[i+u:].startswith(b'\\u00'):
hex = int(bytes([data[i+u+4], data[i+u+5]]), 16)
new_char = b''.join([new_char, bytes([hex])])
u += 6
char : str = new_char.decode('utf-8')
new_chars: bytes = bytes(json.dumps(char).strip('"'), 'ascii')
new_data += new_chars
i += u
else:
new_data = b''.join([new_data, bytes([data[i]])])
i += 1
return new_data
if __name__ == '__main__':
f = FacebookIO('data.json','rb')
d = json.load(f)
print(d)
This is #Geekmoss' answer, but adapted for Python 3:
def parse_facebook_json(json_file_path):
def parse_obj(obj):
for key in obj:
if isinstance(obj[key], str):
obj[key] = obj[key].encode('latin_1').decode('utf-8')
elif isinstance(obj[key], list):
obj[key] = list(map(lambda x: x if type(x) != str else x.encode('latin_1').decode('utf-8'), obj[key]))
pass
return obj
with json_file_path.open('rb') as json_file:
return json.load(json_file, object_hook=parse_obj)
# Usage
parse_facebook_json(Path("/.../message_1.json"))
Extending Martijn solution #1, that I see it can lead towards recursive object processing (It certainly lead me initially):
You can apply this to the whole string of json object, if you don't ensure_ascii
json.dumps(obj, ensure_ascii=False, indent=2).encode('latin-1').decode('utf-8')
then write it to file or something.
PS: This should be comment on #Martijn answer: https://stackoverflow.com/a/50011987/1309932 (but I can't add comments)
This is my approach for Node 17.0.1, based on #hotigeftas recursive code, using the iconv-lite package.
import iconv from 'iconv-lite';
function parseObject(object) {
if (typeof object == 'string') {
return iconv.decode(iconv.encode(object, 'latin1'), 'utf8');;
}
if (typeof object == 'object') {
for (let key in object) {
object[key] = parseObject(object[key]);
}
return object;
}
return object;
}
//usage
let file = JSON.parse(fs.readFileSync(fileName));
file = parseObject(file);
I need an equivalent of this golang function in Python :
func RsaDecrypt(ciphertext []byte) ([]byte, error) {
block, _ := pem.Decode(privateKey)
if block == nil {
return nil, errors.New("private key error!")
}
priv, err := x509.ParsePKCS1PrivateKey(block.Bytes)
if err != nil {
return nil, err
}
return rsa.DecryptPKCS1v15(rand.Reader, priv, ciphertext)
}
I'm a python developer, and do not understand how this works. I have created this function, but it is not the same:
import rsa
with open("rsa.key") as f:
priv_key_pkcs1 = f.read()
priv_key = rsa.PrivateKey.load_pkcs1(priv_key_pkcs1)
line = '''
Lyzkh2pqrisgM_p32O6FmA8oDvzaimvrU9zyd0vyW6HBM2BznuHLbAYUMGp5oYgEHCxmZTWDs67Jt5AGulfn-LrcewCQi89wrb00ZvP69YdjwBe-7aoXBG4_zNMZ7ecLgd8WzUqBGGtVvUhCTVSBBi85mNMSCcgYHt__PFefRHZE09nHnEX25w6iR0ZZlQxuESBkuqTcs8qjUhs2Guin1xBMSWRINj4JDdCjIVHV4hdSjrINgFU-VF1sYFRibWcboYlXifROOxCF50MGtIBkcf7dnqsrR8HEXgZLnCyikhhlQAFoh2hsj4lPWNpWum-dBWj-B0b8P-hRmermDzcPqA==
'''
encrypted = line.decode('base64')
decrypted = rsa.decrypt(encrypted, priv_key)
print decrypted
Can anybody can help me to convert golang function in Python? Or give me some info where I was wrong in my actual python code?
You're using the wrong base64 decoder to decode your ciphertext. It's evident from the "-" and "_" characters present in your ciphertext that it was encoded using the URL-safe variant of base64. To decode this you should use the base64 module, e.g.
import base64
line = '''
Lyzkh2pqrisgM_p32O6FmA8oDvzaimvrU9zyd0vyW6HBM2BznuHLbAYUMGp5oYgEHCxmZTWDs67Jt5AGulfn-LrcewCQi89wrb00ZvP69YdjwBe-7aoXBG4_zNMZ7ecLgd8WzUqBGGtVvUhCTVSBBi85mNMSCcgYHt__PFefRHZE09nHnEX25w6iR0ZZlQxuESBkuqTcs8qjUhs2Guin1xBMSWRINj4JDdCjIVHV4hdSjrINgFU-VF1sYFRibWcboYlXifROOxCF50MGtIBkcf7dnqsrR8HEXgZLnCyikhhlQAFoh2hsj4lPWNpWum-dBWj-B0b8P-hRmermDzcPqA==
'''
encrypted = base64.urlsafe_b64decode(line)
print len(encrypted)
print encrypted.encode('hex')
I am trying to encode a text string to base64.
i tried doing this :
name = "your name"
print('encoding %s in base64 yields = %s\n'%(name,name.encode('base64','strict')))
But this gives me the following error:
LookupError: 'base64' is not a text encoding; use codecs.encode() to handle arbitrary codecs
How do I go about doing this ? ( using Python 3.4)
Remember to import base64 and that b64encode takes bytes as an argument.
import base64
b = base64.b64encode(bytes('your string', 'utf-8')) # bytes
base64_str = b.decode('utf-8') # convert bytes to string
It turns out that this is important enough to get it's own module...
import base64
base64.b64encode(b'your name') # b'eW91ciBuYW1l'
base64.b64encode('your name'.encode('ascii')) # b'eW91ciBuYW1l'
For py3, base64 encode and decode string:
import base64
def b64e(s):
return base64.b64encode(s.encode()).decode()
def b64d(s):
return base64.b64decode(s).decode()
1) This works without imports in Python 2:
>>>
>>> 'Some text'.encode('base64')
'U29tZSB0ZXh0\n'
>>>
>>> 'U29tZSB0ZXh0\n'.decode('base64')
'Some text'
>>>
>>> 'U29tZSB0ZXh0'.decode('base64')
'Some text'
>>>
(although this doesn't work in Python3 )
2) In Python 3 you'd have to import base64 and do base64.b64decode('...')
- will work in Python 2 too.
To compatibility with both py2 and py3
import six
import base64
def b64encode(source):
if six.PY3:
source = source.encode('utf-8')
content = base64.b64encode(source).decode('utf-8')
It looks it's essential to call decode() function to make use of actual string data even after calling base64.b64decode over base64 encoded string. Because never forget it always return bytes literals.
import base64
conv_bytes = bytes('your string', 'utf-8')
print(conv_bytes) # b'your string'
encoded_str = base64.b64encode(conv_bytes)
print(encoded_str) # b'eW91ciBzdHJpbmc='
print(base64.b64decode(encoded_str)) # b'your string'
print(base64.b64decode(encoded_str).decode()) # your string
Whilst you can of course use the base64 module, you can also to use the codecs module (referred to in your error message) for binary encodings (meaning non-standard & non-text encodings).
For example:
import codecs
my_bytes = b"Hello World!"
codecs.encode(my_bytes, "base64")
codecs.encode(my_bytes, "hex")
codecs.encode(my_bytes, "zip")
codecs.encode(my_bytes, "bz2")
This can come in useful for large data as you can chain them to get compressed and json-serializable values:
my_large_bytes = my_bytes * 10000
codecs.decode(
codecs.encode(
codecs.encode(
my_large_bytes,
"zip"
),
"base64"),
"utf8"
)
Refs:
https://docs.python.org/3/library/codecs.html#binary-transforms
https://docs.python.org/3/library/codecs.html#standard-encodings
https://docs.python.org/3/library/codecs.html#text-encodings
Use the below code:
import base64
#Taking input through the terminal.
welcomeInput= raw_input("Enter 1 to convert String to Base64, 2 to convert Base64 to String: ")
if(int(welcomeInput)==1 or int(welcomeInput)==2):
#Code to Convert String to Base 64.
if int(welcomeInput)==1:
inputString= raw_input("Enter the String to be converted to Base64:")
base64Value = base64.b64encode(inputString.encode())
print "Base64 Value = " + base64Value
#Code to Convert Base 64 to String.
elif int(welcomeInput)==2:
inputString= raw_input("Enter the Base64 value to be converted to String:")
stringValue = base64.b64decode(inputString).decode('utf-8')
print "Base64 Value = " + stringValue
else:
print "Please enter a valid value."
Base64 encoding is a process of converting binary data to an ASCII
string format by converting that binary data into a 6-bit character
representation. The Base64 method of encoding is used when binary
data, such as images or video, is transmitted over systems that are
designed to transmit data in a plain-text (ASCII) format.
Follow this link for further details about understanding and working of base64 encoding.
For those who want to implement base64 encoding from scratch for the sake of understanding, here's the code that encodes the string to base64.
encoder.py
#!/usr/bin/env python3.10
class Base64Encoder:
#base64Encoding maps integer to the encoded text since its a list here the index act as the key
base64Encoding:list = None
#data must be type of str or bytes
def encode(data)->str:
#data = data.encode("UTF-8")
if not isinstance(data, str) and not isinstance(data, bytes):
raise AttributeError(f"Expected {type('')} or {type(b'')} but found {type(data)}")
if isinstance(data, str):
data = data.encode("ascii")
if Base64Encoder.base64Encoding == None:
#construction base64Encoding
Base64Encoder.base64Encoding = list()
#mapping A-Z
for key in range(0, 26):
Base64Encoder.base64Encoding.append(chr(key + 65))
#mapping a-z
for key in range(0, 26):
Base64Encoder.base64Encoding.append(chr(key + 97))
#mapping 0-9
for key in range(0, 10):
Base64Encoder.base64Encoding.append(chr(key + 48))
#mapping +
Base64Encoder.base64Encoding.append('+')
#mapping /
Base64Encoder.base64Encoding.append('/')
if len(data) == 0:
return ""
length=len(data)
bytes_to_append = -(length%3)+(3 if length%3 != 0 else 0)
#print(f"{bytes_to_append=}")
binary_list = []
for s in data:
ascii_value = s
binary = f"{ascii_value:08b}"
#binary = bin(ascii_value)[2:]
#print(s, binary, type(binary))
for bit in binary:
binary_list.append(bit)
length=len(binary_list)
bits_to_append = -(length%6) + (6 if length%6 != 0 else 0)
binary_list.extend([0]*bits_to_append)
#print(f"{binary_list=}")
base64 = []
value = 0
for index, bit in enumerate(reversed(binary_list)):
#print (f"{bit=}")
#converting block of 6 bits to integer value
value += ( 2**(index%6) if bit=='1' else 0)
#print(f"{value=}")
#print(bit, end = '')
if (index+1)%6 == 0:
base64.append(Base64Encoder.base64Encoding[value])
#print(' ', end="")
#resetting value
value = 0
pass
#print()
#padding if there is less bytes and returning the result
return ''.join(reversed(base64))+''.join(['=']*bytes_to_append)
testEncoder.py
#!/usr/bin/env python3.10
from encoder import Base64Encoder
if __name__ == "__main__":
print(Base64Encoder.encode("Hello"))
print(Base64Encoder.encode("1 2 10 13 -7"))
print(Base64Encoder.encode("A"))
with open("image.jpg", "rb") as file_data:
print(Base64Encoder.encode(file_data.read()))
Output:
$ ./testEncoder.py
SGVsbG8=
MSAyIDEwIDEzIC03
QQ==