Encrypt a string in Python. Restrict the characters used to only alphanumeric - python

I would like to encrypt a 10 Character (alpha-numeric only) string into a 16 or 32 character alpha-numeric string.
The string I am encrypting is an asset tag. So in itself it carries no information, but I would like to hide all valid possible strings within a larger group of possible strings. I was hoping that encrypting the string would be a good way to do this.
Is it possible to do this with the Python PyCrypto library?
Here is an example I found regarding using PyCrypto.

You're better off with simple hashing (which is like one way encryption). To do this just use the md5 function to make a digest and then base64 or base16 encode it. Please note that base64 strings can include +, = or /.
import md5
import base64
def obfuscate(s):
return base64.b64encode( md5.new(s).digest())
def obfuscate2(s):
return base64.b16encode( md5.new(s).digest())
# returns alphanumeric string but strings can also include slash, plus or equal i.e. /+=
print obfuscate('Tag 1')
print obfuscate('Tag 2')
print obfuscate('Tag 3')
# return hex string
print obfuscate2('Tag 1')
As has been commented md5 is rapidly losing its security, so if you want to have something more reliable for the future, use the SHA-2 example below.
import hashlib
def obfuscate(s):
m = hashlib.sha256()
m.update(s)
return m.hexdigest()
print obfuscate('Tag 1')
print obfuscate('Tag 2')
print obfuscate('Tag 3')
One more function - this time generate about 96-bit* digest using SHA-2 and truncating the output so that we can restrict it to 16 alphanum chars. This give slightly more chance of collision but should be good enough for most practical purposes.
import hashlib
import base64
def obfuscate(s):
m = hashlib.sha256()
m.update(s)
hash = base64.b64encode(m.digest(), altchars="ZZ") # make one way base64 encode, to fit characters into alphanum space only
return hash[:16] # cut of hash at 16 chars - gives about 96 bits which should
# 96 bits means 1 in billion chance of collision if you have 1 billion tags (or much lower chance with fewer tags)
# http://en.wikipedia.org/wiki/Birthday_attack
print obfuscate('Tag 1')
print obfuscate('Tag 2')
print obfuscate('Tag 3')
*The actual digest is only 95.2 bits as we use 62 character alphabet for encoding.
>>> math.log(62**16,2)
95.26714096618998

To make a string longer, you could try the following;
first compress it with bzip2
then make it readable again with base64 encoding
Like this:
import bz2
import base64
base64.b64encode(bz2.compress('012345'))
This will yield:
'QlpoOTFBWSZTWeEMDLgAAAAIAH4AIAAhgAwDJy7i7kinChIcIYGXAA=='
Due to the bzip2 header, the first 13 character will always be the same, so you should discard them;
base64.b64encode(bz2.compress('012345'))[14:]
This gives:
'EMDLgAAAAIAH4AIAAhgAwDJy7i7kinChIcIYGXAA=='
Note that this is not cryptographically secure; it is trivial to invert if you know the recipe that is used:
foo = base64.b64encode(bz2.compress('012345'))
bz2.decompress(base64.b64decode(foo))
gives:
'012345'

I think shake256 fit your needs:
You need to install pycryptodome.
https://pycryptodome.readthedocs.io/en/latest/src/hash/shake256.html
#!/usr/bin/env python
from Crypto.Hash import SHAKE256
from binascii import hexlify
def encrypt_shake256(s, hash_size):
shake = SHAKE256.new()
shake.update(s.encode())
return hexlify(shake.read(hash_size//2))
def main():
hash = encrypt_shake256("holahola", 16)
print(hash)
print(len(hash))
if __name__ == '__main__':
main()
Output:
b'c126f8fb14fb21d8'
16

Yes, you can also use PyCrypto :
from Crypto.Hash import SHA256
aHash = SHA256.new("somethingToHash")
print(aHash.hexdigest()) #will print out the hashed password
The Crypto.Hash module is what comes from installing the pycrypto module (sudo pip install pycrypto).
This is basically the same thing as hashlib, however the PyCrypto library comes with ciphering modules.

Related

convert base64 string into image in python [duplicate]

I have some data that is base64 encoded that I want to convert back to binary even if there is a padding error in it. If I use
base64.decodestring(b64_string)
it raises an 'Incorrect padding' error. Is there another way?
UPDATE: Thanks for all the feedback. To be honest, all the methods mentioned sounded a bit hit
and miss so I decided to try openssl. The following command worked a treat:
openssl enc -d -base64 -in b64string -out binary_data
It seems you just need to add padding to your bytes before decoding. There are many other answers on this question, but I want to point out that (at least in Python 3.x) base64.b64decode will truncate any extra padding, provided there is enough in the first place.
So, something like: b'abc=' works just as well as b'abc==' (as does b'abc=====').
What this means is that you can just add the maximum number of padding characters that you would ever need—which is two (b'==')—and base64 will truncate any unnecessary ones.
This lets you write:
base64.b64decode(s + b'==')
which is simpler than:
base64.b64decode(s + b'=' * (-len(s) % 4))
Note that if the string s already has some padding (e.g. b"aGVsbG8="), this approach will only work if the validate keyword argument is set to False (which is the default). If validate is True this will result in a binascii.Error being raised if the total padding is longer than two characters.
From the docs:
If validate is False (the default), characters that are neither in the normal base-64 alphabet nor the alternative alphabet are discarded prior to the padding check. If validate is True, these non-alphabet characters in the input result in a binascii.Error.
However, if validate is False (or left blank to be the default) you can blindly add two padding characters without any problem. Thanks to eel ghEEz for pointing this out in the comments.
As said in other responses, there are various ways in which base64 data could be corrupted.
However, as Wikipedia says, removing the padding (the '=' characters at the end of base64 encoded data) is "lossless":
From a theoretical point of view, the padding character is not needed,
since the number of missing bytes can be calculated from the number
of Base64 digits.
So if this is really the only thing "wrong" with your base64 data, the padding can just be added back. I came up with this to be able to parse "data" URLs in WeasyPrint, some of which were base64 without padding:
import base64
import re
def decode_base64(data, altchars=b'+/'):
"""Decode base64, padding being optional.
:param data: Base64 data as an ASCII byte string
:returns: The decoded byte string.
"""
data = re.sub(rb'[^a-zA-Z0-9%s]+' % altchars, b'', data) # normalize
missing_padding = len(data) % 4
if missing_padding:
data += b'='* (4 - missing_padding)
return base64.b64decode(data, altchars)
Tests for this function: weasyprint/tests/test_css.py#L68
Just add padding as required. Heed Michael's warning, however.
b64_string += "=" * ((4 - len(b64_string) % 4) % 4) #ugh
"Incorrect padding" can mean not only "missing padding" but also (believe it or not) "incorrect padding".
If suggested "adding padding" methods don't work, try removing some trailing bytes:
lens = len(strg)
lenx = lens - (lens % 4 if lens % 4 else 4)
try:
result = base64.decodestring(strg[:lenx])
except etc
Update: Any fiddling around adding padding or removing possibly bad bytes from the end should be done AFTER removing any whitespace, otherwise length calculations will be upset.
It would be a good idea if you showed us a (short) sample of the data that you need to recover. Edit your question and copy/paste the result of print repr(sample).
Update 2: It is possible that the encoding has been done in an url-safe manner. If this is the case, you will be able to see minus and underscore characters in your data, and you should be able to decode it by using base64.b64decode(strg, '-_')
If you can't see minus and underscore characters in your data, but can see plus and slash characters, then you have some other problem, and may need the add-padding or remove-cruft tricks.
If you can see none of minus, underscore, plus and slash in your data, then you need to determine the two alternate characters; they'll be the ones that aren't in [A-Za-z0-9]. Then you'll need to experiment to see which order they need to be used in the 2nd arg of base64.b64decode()
Update 3: If your data is "company confidential":
(a) you should say so up front
(b) we can explore other avenues in understanding the problem, which is highly likely to be related to what characters are used instead of + and / in the encoding alphabet, or by other formatting or extraneous characters.
One such avenue would be to examine what non-"standard" characters are in your data, e.g.
from collections import defaultdict
d = defaultdict(int)
import string
s = set(string.ascii_letters + string.digits)
for c in your_data:
if c not in s:
d[c] += 1
print d
Use
string += '=' * (-len(string) % 4) # restore stripped '='s
Credit goes to a comment somewhere here.
>>> import base64
>>> enc = base64.b64encode('1')
>>> enc
>>> 'MQ=='
>>> base64.b64decode(enc)
>>> '1'
>>> enc = enc.rstrip('=')
>>> enc
>>> 'MQ'
>>> base64.b64decode(enc)
...
TypeError: Incorrect padding
>>> base64.b64decode(enc + '=' * (-len(enc) % 4))
>>> '1'
>>>
If there's a padding error it probably means your string is corrupted; base64-encoded strings should have a multiple of four length. You can try adding the padding character (=) yourself to make the string a multiple of four, but it should already have that unless something is wrong
Incorrect padding error is caused because sometimes, metadata is also present in the encoded string
If your string looks something like: 'data:image/png;base64,...base 64 stuff....'
then you need to remove the first part before decoding it.
Say if you have image base64 encoded string, then try below snippet..
from PIL import Image
from io import BytesIO
from base64 import b64decode
imagestr = 'data:image/png;base64,...base 64 stuff....'
im = Image.open(BytesIO(b64decode(imagestr.split(',')[1])))
im.save("image.png")
You can simply use base64.urlsafe_b64decode(data) if you are trying to decode a web image. It will automatically take care of the padding.
Check the documentation of the data source you're trying to decode. Is it possible that you meant to use base64.urlsafe_b64decode(s) instead of base64.b64decode(s)? That's one reason you might have seen this error message.
Decode string s using a URL-safe alphabet, which substitutes - instead
of + and _ instead of / in the standard Base64 alphabet.
This is for example the case for various Google APIs, like Google's Identity Toolkit and Gmail payloads.
Adding the padding is rather... fiddly. Here's the function I wrote with the help of the comments in this thread as well as the wiki page for base64 (it's surprisingly helpful) https://en.wikipedia.org/wiki/Base64#Padding.
import logging
import base64
def base64_decode(s):
"""Add missing padding to string and return the decoded base64 string."""
log = logging.getLogger()
s = str(s).strip()
try:
return base64.b64decode(s)
except TypeError:
padding = len(s) % 4
if padding == 1:
log.error("Invalid base64 string: {}".format(s))
return ''
elif padding == 2:
s += b'=='
elif padding == 3:
s += b'='
return base64.b64decode(s)
There are two ways to correct the input data described here, or, more specifically and in line with the OP, to make Python module base64's b64decode method able to process the input data to something without raising an un-caught exception:
Append == to the end of the input data and call base64.b64decode(...)
If that raises an exception, then
i. Catch it via try/except,
ii. (R?)Strip any = characters from the input data (N.B. this may not be necessary),
iii. Append A== to the input data (A== through P== will work),
iv. Call base64.b64decode(...) with those A==-appended input data
The result from Item 1. or Item 2. above will yield the desired result.
Caveats
This does not guarantee the decoded result will be what was originally encoded, but it will (sometimes?) give the OP enough to work with:
Even with corruption I want to get back to the binary because I can still get some useful info from the ASN.1 stream").
See What we know and Assumptions below.
TL;DR
From some quick tests of base64.b64decode(...)
it appears that it ignores non-[A-Za-z0-9+/] characters; that includes ignoring =s unless they are the last character(s) in a parsed group of four, in which case the =s terminate the decoding (a=b=c=d= gives the same result as abc=, and a==b==c== gives the same result as ab==).
It also appears that all characters appended are ignored after the point where base64.b64decode(...) terminates decoding e.g. from an = as the fourth in a group.
As noted in several comments above, there are either zero, or one, or two, =s of padding required at the end of input data for when the [number of parsed characters to that point modulo 4] value is 0, or 3, or 2, respectively. So, from items 3. and 4. above, appending two or more =s to the input data will correct any [Incorrect padding] problems in those cases.
HOWEVER, decoding cannot handle the case where the [total number of parsed characters modulo 4] is 1, because it takes a least two encoded characters to represent the first decoded byte in a group of three decoded bytes. In uncorrupted encoded input data, this [N modulo 4]=1 case never happens, but as the OP stated that characters may be missing, it could happen here. That is why simply appending =s will not always work, and why appending A== will work when appending == does not. N.B. Using [A] is all but arbitrary: it adds only cleared (zero) bits to the decoded, which may or not be correct, but then the object here is not correctness but completion by base64.b64decode(...) sans exceptions.
What we know from the OP and especially subsequent comments is
It is suspected that there are missing data (characters) in the
Base64-encoded input data
The Base64 encoding uses the standard 64 place-values plus padding:
A-Z; a-z; 0-9; +; /; = is padding. This is confirmed, or at least
suggested, by the fact that openssl enc ... works.
Assumptions
The input data contain only 7-bit ASCII data
The only kind of corruption is missing encoded input data
The OP does not care about decoded output data at any point after that corresponding to any missing encoded input data
Github
Here is a wrapper to implement this solution:
https://github.com/drbitboy/missing_b64
I got this error without any use of base64. So i got a solution that error is in localhost it works fine on 127.0.0.1
In my case Gmail Web API was returning the email content as a base64 encoded string, but instead of encoded with the standard base64 characters/alphabet, it was encoded with the "web-safe" characters/alphabet variant of base64. The + and / characters are replaced with - and _. For python 3 use base64.urlsafe_b64decode().
This can be done in one line - no need to add temporary variables:
b64decode(f"{s}{'=' * (4 - len(s) % 4)}")
In case this error came from a web server: Try url encoding your post value. I was POSTing via "curl" and discovered I wasn't url-encoding my base64 value so characters like "+" were not escaped so the web server url-decode logic automatically ran url-decode and converted + to spaces.
"+" is a valid base64 character and perhaps the only character which gets mangled by an unexpected url-decode.
You should use
base64.b64decode(b64_string, ' /')
By default, the altchars are '+/'.
I ran into this problem as well and nothing worked.
I finally managed to find the solution which works for me. I had zipped content in base64 and this happened to 1 out of a million records...
This is a version of the solution suggested by Simon Sapin.
In case the padding is missing 3 then I remove the last 3 characters.
Instead of "0gA1RD5L/9AUGtH9MzAwAAA=="
We get "0gA1RD5L/9AUGtH9MzAwAA"
missing_padding = len(data) % 4
if missing_padding == 3:
data = data[0:-3]
elif missing_padding != 0:
print ("Missing padding : " + str(missing_padding))
data += '=' * (4 - missing_padding)
data_decoded = base64.b64decode(data)
According to this answer Trailing As in base64 the reason is nulls. But I still have no idea why the encoder messes this up...
def base64_decode(data: str) -> str:
data = data.encode("ascii")
rem = len(data) % 4
if rem > 0:
data += b"=" * (4 - rem)
return base64.urlsafe_b64decode(data).decode('utf-8')
Simply add additional characters like "=" or any other and make it a multiple of 4 before you try decoding the target string value. Something like;
if len(value) % 4 != 0: #check if multiple of 4
while len(value) % 4 != 0:
value = value + "="
req_str = base64.b64decode(value)
else:
req_str = base64.b64decode(value)
In my case I faced that error while parsing an email. I got the attachment as base64 string and extract it via re.search. Eventually there was a strange additional substring at the end.
dHJhaWxlcgo8PCAvU2l6ZSAxNSAvUm9vdCAxIDAgUiAvSW5mbyAyIDAgUgovSUQgWyhcMDAyXDMz
MHtPcFwyNTZbezU/VzheXDM0MXFcMzExKShcMDAyXDMzMHtPcFwyNTZbezU/VzheXDM0MXFcMzEx
KV0KPj4Kc3RhcnR4cmVmCjY3MDEKJSVFT0YK
--_=ic0008m4wtZ4TqBFd+sXC8--
When I deleted --_=ic0008m4wtZ4TqBFd+sXC8-- and strip the string then parsing was fixed up.
So my advise is make sure that you are decoding a correct base64 string.
Clear your browser cookie and recheck again, it should work.
In my case I faced this error, after deleting the venv for the perticular project and it showing error for each fields so I tried by changing the BROWSER(Chrome to Edge), And actually it worked..

Encode IP address using all printable characters in Python 2.7.x

I would like to encode an IP address in as short a string as possible using all the printable characters. According to https://en.wikipedia.org/wiki/ASCII#Printable_characters these are codes 20hex to 7Ehex.
For example:
shorten("172.45.1.33") --> "^.1 9" maybe.
In order to make decoding easy I also need the length of the encoding always to be the same. I also would like to avoid using the space character in order to make parsing easier in the future.
How can one do this?
I am looking for a solution that works in Python 2.7.x.
My attempt so far to modify Eloims's answer to work in Python 2:
First I installed the ipaddress backport for Python 2 (https://pypi.python.org/pypi/ipaddress) .
#This is needed because ipaddress expects character strings and not byte strings for textual IP address representations
from __future__ import unicode_literals
import ipaddress
import base64
#Taken from http://stackoverflow.com/a/20793663/2179021
def to_bytes(n, length, endianess='big'):
h = '%x' % n
s = ('0'*(len(h) % 2) + h).zfill(length*2).decode('hex')
return s if endianess == 'big' else s[::-1]
def def encode(ip):
ip_as_integer = int(ipaddress.IPv4Address(ip))
ip_as_bytes = to_bytes(ip_as_integer, 4, endianess="big")
ip_base85 = base64.a85encode(ip_as_bytes)
return ip_base
print(encode("192.168.0.1"))
This now fails because base64 doesn't have an attribute 'a85encode'.
An IP stored in binary is 4 bytes.
You can encode it in 5 printable ASCII characters using Base85.
Using more printable characters won't be able to shorten the resulting string more than that.
import ipaddress
import base64
def encode(ip):
ip_as_integer = int(ipaddress.IPv4Address(ip))
ip_as_bytes = ip_as_integer.to_bytes(4, byteorder="big")
ip_base85 = base64.a85encode(ip_as_bytes)
return ip_base85
print(encode("192.168.0.1"))
I found this question looking for a way to use base85/ascii85 on python 2. Eventually I discovered a couple of projects available to install via pypi. I settled on one called hackercodecs because the project is specific to encoding/decoding whereas the others I found just offered the implementation as a byproduct of necessity
from __future__ import unicode_literals
import ipaddress
from hackercodecs import ascii85_encode
def encode(ip):
return ascii85_encode(ipaddress.ip_address(ip).packed)[0]
print(encode("192.168.0.1"))
https://pypi.python.org/pypi/hackercodecs
https://github.com/jdukes/hackercodecs

Customized hash function for Python

I would like to generate a human-readable hash with customized properties -- e.g., a short string of specified length consisting entirely of upper case letters and digits excluding 0, 1, O, and I (to eliminate visual ambiguity):
"arbitrary string" --> "E3Y7UM8"
A 7-character string of the above form could take on over 34 billion unique values which, for my purposes, makes collisions extremely unlikely. Security is also not a major concern.
Is there an existing module or routine that implements something like the above? Alternatively, can someone suggest a straightforward algorithm?
The method you should be using has similarities with password one-way encryption. Of course since you are going for readable, a good password function is probably out of the question.
Here's what I would do:
Take an MD5 hash of the email
Convert base32 which already eliminates O and I
Replace any non-readable characters with readable ones
Here's an example based on the above:
import base64 # base32 is a function in base64
import hashlib
email = "somebody#example.com"
md5 = hashlib.md5()
md5.update(email.encode('utf-8'))
hash_in_bytes = md5.digest()
result = base64.b32encode(hash_in_bytes)
print(result)
# Or you can remove the extra "=" at the end
result = result.strip(b'=')
Since it's a one-way function (hash), you obviously don't need to worry about reversing the process (you can't anyway). You can also replace any other characters you find non-readable with readable ones (I would go for lowercase versions of the characters, e.g. q instead of Q)
More about base32 here: https://docs.python.org/3/library/base64.html
You can simply truncate the beginning of an MD5sum algorithm. It should have approximately the same statistical properties than the whole string anyway:
import md5
m = md5.new()
m.update("arbitrary string")
print(m.hexdigest()[:7])
Same code with hashlib module:
import hashlib
m = hashlib.md5()
m.update("arbitrary string")
print(m.hexdigest()[:7])

How to generate a mixed-case hash in Python?

I am having a hard time figuring out a reasonable way to generate a mixed-case hash in Python.
I want to generate something like: aZeEe9E
Right now I'm using MD5, which doesn't generate case-sensitive hashes.
Do any of you know how to generate a hash value consisting of upper- and lower- case characters + numbers?
-
Okay, GregS's advice worked like a charm (on the first try!):
Here is a simple example:
>>> import hashlib, base64
>>> s = 'http://gooogle.com'
>>> hash = hashlib.md5(s).digest()
>>> print hash
46c4f333fae34078a68393213bb9272d
>>> print base64.b64encode(hash)
NDZjNGYzMzNmYWUzNDA3OGE2ODM5MzIxM2JiOTI3MmQ=
you can base64 encode the output of the hash. This has a couple of additional characters beyond those you mentioned.
Maybe you can use base64-encoded hashes?

Random hash in Python

What is the easiest way to generate a random hash (MD5) in Python?
A md5-hash is just a 128-bit value, so if you want a random one:
import random
hash = random.getrandbits(128)
print("hash value: %032x" % hash)
I don't really see the point, though. Maybe you should elaborate why you need this...
I think what you are looking for is a universal unique identifier.Then the module UUID in python is what you are looking for.
import uuid
uuid.uuid4().hex
UUID4 gives you a random unique identifier that has the same length as a md5 sum. Hex will represent is as an hex string instead of returning a uuid object.
http://docs.python.org/2/library/uuid.html
https://docs.python.org/3/library/uuid.html
The secrets module was added in Python 3.6+. It provides cryptographically secure random values with a single call. The functions take an optional nbytes argument, default is 32 (bytes * 8 bits = 256-bit tokens). MD5 has 128-bit hashes, so provide 16 for "MD5-like" tokens.
>>> import secrets
>>> secrets.token_hex(nbytes=16)
'17adbcf543e851aa9216acc9d7206b96'
>>> secrets.token_urlsafe(16)
'X7NYIolv893DXLunTzeTIQ'
>>> secrets.token_bytes(128 // 8)
b'\x0b\xdcA\xc0.\x0e\x87\x9b`\x93\\Ev\x1a|u'
This works for both python 2.x and 3.x
import os
import binascii
print(binascii.hexlify(os.urandom(16)))
'4a4d443679ed46f7514ad6dbe3733c3d'
Yet another approach. You won't have to format an int to get it.
import random
import string
def random_string(length):
pool = string.letters + string.digits
return ''.join(random.choice(pool) for i in xrange(length))
Gives you flexibility on the length of the string.
>>> random_string(64)
'XTgDkdxHK7seEbNDDUim9gUBFiheRLRgg7HyP18j6BZU5Sa7AXiCHP1NEIxuL2s0'
Another approach to this specific question:
import random, string
def random_md5like_hash():
available_chars= string.hexdigits[:16]
return ''.join(
random.choice(available_chars)
for dummy in xrange(32))
I'm not saying it's faster or preferable to any other answer; just that it's another approach :)
import uuid
from md5 import md5
print md5(str(uuid.uuid4())).hexdigest()
import os, hashlib
hashlib.md5(os.urandom(32)).hexdigest()
The most proper way is to use random module
import random
format(random.getrandbits(128), 'x')
Using secrets is an overkill. It generates cryptographically strong randomness sacrifying performance.
All responses that suggest using UUID are intrinsically wrong because UUID (even UUID4) are not totally random. At least they include fixed version number that never changes.
import uuid
>>> uuid.uuid4()
UUID('8a107d39-bb30-4843-8607-ce9e480c8339')
>>> uuid.uuid4()
UUID('4ed324e8-08f9-4ea5-bc0c-8a9ad53e2df6')
All MD5s containing something other than 4 at 13th position from the left will be unreachable this way.
from hashlib import md5
plaintext = input('Enter the plaintext data to be hashed: ') # Must be a string, doesn't need to have utf-8 encoding
ciphertext = md5(plaintext.encode('utf-8')).hexdigest()
print(ciphertext)
It should also be noted that MD5 is a very weak hash function, also collisions have been found (two different plaintext values result in the same hash)
Just use a random value for plaintext.

Categories

Resources