Random hash in Python - python

What is the easiest way to generate a random hash (MD5) in Python?

An MD5 hash is just a 128-bit value, so if you want a random one:
import random
hash = random.getrandbits(128)
print("hash value: %032x" % hash)
I don't really see the point, though. Maybe you should elaborate why you need this...

I think what you are looking for is a universally unique identifier (UUID). Python's uuid module is what you need:
import uuid
uuid.uuid4().hex
uuid4() gives you a random unique identifier that has the same length as an MD5 sum. .hex returns it as a hex string instead of a UUID object.
http://docs.python.org/2/library/uuid.html
https://docs.python.org/3/library/uuid.html

The secrets module was added in Python 3.6. It provides cryptographically secure random values with a single call. The token functions take an optional nbytes argument; the default is currently 32 bytes (32 * 8 = 256-bit tokens). MD5 hashes are 128 bits, so pass 16 for "MD5-like" tokens.
>>> import secrets
>>> secrets.token_hex(nbytes=16)
'17adbcf543e851aa9216acc9d7206b96'
>>> secrets.token_urlsafe(16)
'X7NYIolv893DXLunTzeTIQ'
>>> secrets.token_bytes(128 // 8)
b'\x0b\xdcA\xc0.\x0e\x87\x9b`\x93\\Ev\x1a|u'

This works for both Python 2.x and 3.x (in Python 3, hexlify returns bytes, so decode it to get a str):
import os
import binascii
print(binascii.hexlify(os.urandom(16)).decode('ascii'))
4a4d443679ed46f7514ad6dbe3733c3d

Yet another approach. You won't have to format an int to get it.
import random
import string
def random_string(length):
    pool = string.ascii_letters + string.digits
    return ''.join(random.choice(pool) for _ in range(length))
Gives you flexibility on the length of the string.
>>> random_string(64)
'XTgDkdxHK7seEbNDDUim9gUBFiheRLRgg7HyP18j6BZU5Sa7AXiCHP1NEIxuL2s0'
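If the string needs to be hard to guess (a token rather than a test fixture), the random module is not designed for that. A minimal sketch of the same idea using secrets.choice (Python 3.6+); the name secure_random_string is hypothetical:

```python
import secrets
import string

# Characters to draw from: ASCII letters and digits, as in random_string
POOL = string.ascii_letters + string.digits

def secure_random_string(length):
    """Return `length` characters chosen with the OS-backed CSPRNG."""
    # secrets.choice picks uniformly from POOL using secrets.SystemRandom
    return ''.join(secrets.choice(POOL) for _ in range(length))

print(secure_random_string(64))
```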

Another approach to this specific question:
import random
import string
def random_md5like_hash():
    available_chars = string.hexdigits[:16]  # '0123456789abcdef'
    return ''.join(
        random.choice(available_chars)
        for _ in range(32))
I'm not saying it's faster or preferable to any other answer; just that it's another approach :)

import hashlib
import uuid
print(hashlib.md5(str(uuid.uuid4()).encode()).hexdigest())

import os, hashlib
hashlib.md5(os.urandom(32)).hexdigest()
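Hashing random bytes does not make them more random. As a sketch (assuming Python 3.5+ for bytes.hex()), hex-encoding 16 bytes from os.urandom directly gives a string of the same shape as the MD5 version:

```python
import hashlib
import os

# Hashing 32 random bytes with MD5 gives a 32-character hex string...
via_md5 = hashlib.md5(os.urandom(32)).hexdigest()

# ...and so does hex-encoding 16 random bytes directly.
# The MD5 step adds no randomness; it only maps 256 random bits onto 128.
direct = os.urandom(16).hex()

print(via_md5)
print(direct)
```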

The most proper way is to use the random module:
import random
format(random.getrandbits(128), '032x')  # zero-pad so the result is always 32 hex digits
Using secrets is overkill: it generates cryptographically strong randomness at the cost of performance.
All responses that suggest using UUIDs are intrinsically wrong, because UUIDs (even UUID4) are not totally random. At the very least, they include a fixed version number that never changes.
>>> import uuid
>>> uuid.uuid4()
UUID('8a107d39-bb30-4843-8607-ce9e480c8339')
>>> uuid.uuid4()
UUID('4ed324e8-08f9-4ea5-bc0c-8a9ad53e2df6')
All MD5 values with anything other than 4 as the 13th hex digit from the left (and, because of the fixed variant bits, anything other than 8, 9, a or b as the 17th) are unreachable this way.
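The fixed positions can be checked empirically. A version-4 UUID (RFC 4122) fixes 4 version bits and 2 variant bits, leaving 122 random bits; this sketch samples uuid.uuid4() and looks at the 13th and 17th hex digits:

```python
import uuid

# Collect the 32-character hex form of many UUID4 values
samples = [uuid.uuid4().hex for _ in range(10_000)]

# 13th hex digit (index 12) holds the version; 17th (index 16) starts with the variant bits
version_digits = {h[12] for h in samples}
variant_digits = {h[16] for h in samples}

print(version_digits)          # {'4'}
print(sorted(variant_digits))  # drawn only from '8', '9', 'a', 'b'
```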

from hashlib import md5
plaintext = input('Enter the plaintext data to be hashed: ')  # a str; it is encoded to UTF-8 bytes below
digest = md5(plaintext.encode('utf-8')).hexdigest()
print(digest)
It should also be noted that MD5 is a very weak hash function: collisions have been found (two different plaintext values that produce the same hash).
Just use a random value for plaintext.

Related

Hex digest for Python's built-in hash function

I need to create an identifier token from a set of nested configuration values.
The token can be part of a URL, so – to make processing easier – it should contain only hexadecimal digits (or something similar).
The config values are nested tuples with elements of hashable types like int, bool, str etc.
My idea was to use the built-in hash() function, as this will continue to work even if the config structure changes.
This is my first attempt:
def token(config):
h = hash(config)
return '{:X}'.format(h)
This will produce tokens of variable length, but that doesn't matter.
What bothers me, though, is that the token might contain a leading minus sign, since the return value of hash() is a signed integer.
As a way to avoid the sign, I thought of the following work-around, which is adding a constant to the hash value.
This constant should be half the size of the range the value of hash() can take (which is platform-dependent, e.g. different for 32-/64-bit systems):
import sys
HALF_HASH_RANGE = 2**(sys.hash_info.width-1)
Is this a sane and portable solution?
Or will I shoot myself in the foot with this?
I also saw suggestions for using struct.pack() (which returns a bytes object, on which one can call the .hex() method), but it also requires knowing the range of the hash value in advance (for the choice of the right format character).
Addendum:
Encryption strength or collisions by chance are not an issue.
The drawback of the hashlib library in this scenario is that it requires writing a converter that traverses the input structure and converts everything into a bytes representation, which is cumbersome.
You can use any of the hash functions to get a unique string. Python supports many algorithms out of the box, such as md5, sha1, sha224, sha256, sha384 and sha512. You can read more about them here: https://docs.python.org/2/library/hashlib.html
This example shows how to use the hashlib library (Python 3):
>>> import hashlib
>>> sha = hashlib.sha256()
>>> sha.update('somestring'.encode())
>>> sha.hexdigest()
'63f6fe797026d794e0dc3e2bd279aee19dd2f8db67488172a644bb68792a570c'
You can also try the hashids library, but note that it's not a hash algorithm: you (and anyone who knows the salt) can decode the data.
$ pip install hashids
Basic usage:
>>> from hashids import Hashids
>>> hashids = Hashids()
>>> hashids.encode(123)
'Mj3'
>>> hashids.decode('Mj3')
123
I need to create an identifier token from a set of nested configuration values
I came across this question while trying to solve the same problem, after realizing that some calls to hash return negative integers.
Here's how I would implement your token function:
import sys
def token(config) -> str:
    """Generates a hex token that identifies a hashable config."""
    # `sign_mask` is used to make `hash` return unsigned values
    sign_mask = (1 << sys.hash_info.width) - 1
    # Get the hash as a positive hex value, zero-padded to a consistent width
    return f'{hash(config) & sign_mask:0{sys.hash_info.width//4}x}'
In my case I needed it to work with a broad range of inputs for the config. It did not need to be particularly performant (it was not on a hot path), and it was acceptable if it occasionally had collisions (more than what would normally be expected from hash). All it really needed to do was produce short (e.g. 16 chars long) outputs that are consistent for consistent inputs within a single run (hash() of a str is randomized per interpreter process unless PYTHONHASHSEED is set). So for my case I used the above function with a small modification to ensure the provided config is hashable, at the cost of increased collision risk and processing time:
import json
import sys
def token(config) -> str:
    """Generates a hex token that identifies a config."""
    # `sign_mask` is used to make `hash` return unsigned values
    sign_mask = (1 << sys.hash_info.width) - 1
    # Use `json.dumps` with `repr` as a fallback to turn any config into a (hashable) str
    json_config = json.dumps(config, default=repr)
    # Get the hash as a positive hex value, zero-padded to a consistent width
    return f'{hash(json_config) & sign_mask:0{sys.hash_info.width//4}x}'
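On the struct.pack() concern from the question: int.to_bytes() with signed=True avoids choosing a format character, because the byte count can be derived from sys.hash_info.width. A minimal sketch (the name token_via_bytes is hypothetical) that should give the same fixed-width string as masking the hash; it still inherits hash()'s per-process randomization for str values:

```python
import sys

def token_via_bytes(config) -> str:
    """Hypothetical alternative: fixed-width, sign-free hex token for a hashable config."""
    nbytes = sys.hash_info.width // 8  # 8 on typical 64-bit builds
    # signed=True writes negative hashes in two's complement, so no '-' appears
    # and every token has exactly 2 * nbytes hex digits.
    return hash(config).to_bytes(nbytes, 'big', signed=True).hex()

print(token_via_bytes((1, True, 'x', (2, 3))))
print(token_via_bytes(-2))  # hash(-2) is -2 in CPython
```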
I'd recommend using hashlib.
Cast the token to a string, then parse the hexdigest as a base-16 integer. Below is an example with the sha256 algorithm, but you can use any hashing algorithm hashlib supports:
import hashlib as hl
def shasum(token):
    return int(hl.sha256(str(token).encode('utf-8')).hexdigest(), 16)
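If the token has to stay the same across interpreter runs (for example because it ends up in a URL), hash() is a poor fit, since str hashes are randomized per process unless PYTHONHASHSEED is set. A sketch that keeps the hashlib idea but returns a hex string, using repr() as the converter; the name stable_token and the 16-character default are assumptions, and it relies on the config's repr being deterministic:

```python
import hashlib

def stable_token(config, length=16) -> str:
    """Hex token that depends only on repr(config), not on the process.

    Assumes nested values with deterministic reprs (true for int, bool,
    str and tuples of them; not guaranteed for sets of strings).
    """
    digest = hashlib.sha256(repr(config).encode('utf-8')).hexdigest()
    return digest[:length]  # truncation trades collision resistance for length

print(stable_token((1, True, 'x', (2, 3))))
```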

Customized hash function for Python

I would like to generate a human-readable hash with customized properties -- e.g., a short string of specified length consisting entirely of upper case letters and digits excluding 0, 1, O, and I (to eliminate visual ambiguity):
"arbitrary string" --> "E3Y7UM8"
A 7-character string of the above form could take on over 34 billion unique values which, for my purposes, makes collisions extremely unlikely. Security is also not a major concern.
Is there an existing module or routine that implements something like the above? Alternatively, can someone suggest a straightforward algorithm?
The method you should be using has similarities with one-way password hashing. Of course, since you are going for readable, a good password-hashing function is probably out of the question.
Here's what I would do:
Take an MD5 hash of the input (an email address in the example below)
Convert it to base32, whose alphabet (A-Z and 2-7) already excludes 0 and 1
Replace any remaining ambiguous characters, such as O and I, with readable ones
Here's an example based on the above:
import base64  # b32encode lives in the base64 module
import hashlib
email = "somebody@example.com"
md5 = hashlib.md5()
md5.update(email.encode('utf-8'))
hash_in_bytes = md5.digest()
result = base64.b32encode(hash_in_bytes)
print(result)
# Or you can remove the extra "=" at the end
result = result.strip(b'=')
Since it's a one-way function (hash), you obviously don't need to worry about reversing the process (you can't anyway). You can also replace any other characters you find non-readable with readable ones (I would go for lowercase versions of the characters, e.g. q instead of Q)
More about base32 here: https://docs.python.org/3/library/base64.html
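Instead of encoding to base32 and then swapping characters, the digest can be mapped directly onto the alphabet from the question (uppercase letters and digits minus 0, 1, O and I, exactly 32 symbols). A sketch, assuming SHA-256 (any hashlib digest would do) and a hypothetical helper name readable_hash. Because 32 is a power of two, each output character is uniformly distributed, and 7 characters give 32**7 = 34,359,738,368 values, matching the question's estimate:

```python
import hashlib

# Uppercase letters and digits without 0, 1, O and I: 8 digits + 24 letters = 32 symbols
ALPHABET = '23456789ABCDEFGHJKLMNPQRSTUVWXYZ'

def readable_hash(text: str, length: int = 7) -> str:
    """Hypothetical helper: hash `text` and render it in ALPHABET."""
    # Treat the 32-byte digest as one big integer
    n = int.from_bytes(hashlib.sha256(text.encode('utf-8')).digest(), 'big')
    chars = []
    for _ in range(length):
        # Peel off one base-32 digit at a time
        n, rem = divmod(n, len(ALPHABET))
        chars.append(ALPHABET[rem])
    return ''.join(chars)

print(readable_hash("arbitrary string"))
```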
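You can simply take the beginning of an MD5 digest; it should have approximately the same statistical properties as the whole string. Keep in mind that 7 hex digits give only 16**7 (about 268 million) possible values, fewer than the 34 billion of the alphabet in the question:
import md5  # Python 2 only; this module was removed in Python 3
m = md5.new()
m.update("arbitrary string")
print(m.hexdigest()[:7])
Same code with the hashlib module (Python 3 needs bytes):
import hashlib
m = hashlib.md5()
m.update(b"arbitrary string")
print(m.hexdigest()[:7])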

Python uuid4, How to limit the length of Unique chars

In Python, I am using the uuid4() method to create a unique string of characters, but I cannot find a way to limit it to 10 or 8 characters. Is there any way?
uuid4()
ffc69c1b-9d87-4c19-8dac-c09ca857e3fc
Thanks.
You can generate a short UUID with the shortuuid package:
import shortuuid
shortuuid.uuid()
'vytxeTZskVKR7C7WgdSP3d'
Native solution, with a big risk of collision:
Try:
from uuid import uuid4
x = uuid4()
str(x)[:8]
Output:
"ffc69c1b"
See also: How do I get a substring of a string in Python?
You can use the shortuuid package:
pip install shortuuid
Then usage is similar to the uuid module:
import shortuuid
shortuuid.uuid()
Output
'vytxeTZskVKR7C7WgdSP3d'
Custom Length UUID
shortuuid.ShortUUID().random(length=22)
Output
'RaF56o2r58hTKT7AYS9doj'
Source - https://pypi.org/project/shortuuid/
The previous answers do not provide a UUID, either because they truncate the string or because they don't generate a UUID to begin with. According to the documentation, if you truncate the string "[t]he IDs won’t be universally unique any longer [...]", and the documentation describes ShortUUID().random() as generating a cryptographically secure string rather than a UUID.
However, you can change the UUID length indirectly by changing the number of characters in the alphabet. In the implementation of ShortUUID.encoded_length() you can see that the UUID length is int(math.ceil(16 * math.log(256) / math.log(len(alphabet)))). You can change the alphabet by shortuuid.set_alphabet().
The more characters in the alphabet, the shorter the UUID can be and still be unique.
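To see how alphabet size translates into length, here is an integer-only sketch of the same formula (the function name encoded_length is hypothetical, not part of shortuuid's API). It finds the smallest n such that alphabet_size**n covers all 2**128 UUID values. The 22-character outputs shown above are consistent with a 57-character alphabet, which I believe is shortuuid's default:

```python
def encoded_length(alphabet_size: int, num_bytes: int = 16) -> int:
    """Smallest n with alphabet_size**n >= 256**num_bytes.

    Integer version of ceil(16 * log(256) / log(len(alphabet))), which avoids
    floating-point rounding when the alphabet size is an exact power of two.
    """
    if alphabet_size < 2:
        raise ValueError("alphabet needs at least 2 characters")
    n, capacity, target = 0, 1, 256 ** num_bytes
    while capacity < target:
        capacity *= alphabet_size
        n += 1
    return n

for size in (16, 32, 57, 62):
    print(size, encoded_length(size))
```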

Encrypt a string in Python. Restrict the characters used to only alphanumeric

I would like to encrypt a 10 Character (alpha-numeric only) string into a 16 or 32 character alpha-numeric string.
The string I am encrypting is an asset tag. So in itself it carries no information, but I would like to hide all valid possible strings within a larger group of possible strings. I was hoping that encrypting the string would be a good way to do this.
Is it possible to do this with the Python PyCrypto library?
Here is an example I found regarding using PyCrypto.
You're better off with simple hashing (which is like one way encryption). To do this just use the md5 function to make a digest and then base64 or base16 encode it. Please note that base64 strings can include +, = or /.
import base64
import hashlib
def obfuscate(s):
    return base64.b64encode(hashlib.md5(s.encode()).digest()).decode()
def obfuscate2(s):
    return base64.b16encode(hashlib.md5(s.encode()).digest()).decode()
# returns an alphanumeric string, but it can also include slash, plus or equals, i.e. /+=
print(obfuscate('Tag 1'))
print(obfuscate('Tag 2'))
print(obfuscate('Tag 3'))
# returns a hex string
print(obfuscate2('Tag 1'))
As has been noted in the comments, MD5 is rapidly losing its security, so if you want something more reliable for the future, use SHA-2, as in the example below.
import hashlib
def obfuscate(s):
    m = hashlib.sha256()
    m.update(s.encode())
    return m.hexdigest()
print(obfuscate('Tag 1'))
print(obfuscate('Tag 2'))
print(obfuscate('Tag 3'))
One more function: this time it generates an approximately 96-bit* digest using SHA-2, truncating the output so that it is restricted to 16 alphanumeric characters. This gives a slightly higher chance of collision but should be good enough for most practical purposes.
import base64
import hashlib
def obfuscate(s):
    m = hashlib.sha256()
    m.update(s.encode())
    # One-way base64 encoding that stays alphanumeric: '+' and '/' are both mapped to 'Z'
    digest = base64.b64encode(m.digest(), altchars=b"ZZ").decode()
    # Cut the hash at 16 chars, which keeps about 96 bits. With 1 billion tags that is well
    # under a 1-in-a-billion chance of collision (roughly 1 in 10**11), and much lower with
    # fewer tags; see http://en.wikipedia.org/wiki/Birthday_attack
    return digest[:16]
print(obfuscate('Tag 1'))
print(obfuscate('Tag 2'))
print(obfuscate('Tag 3'))
*The actual digest is only about 95.3 bits, as we use a 62-character alphabet for encoding.
>>> import math
>>> math.log(62**16,2)
95.26714096618998
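The collision estimate in the code comment can be checked with the birthday approximation p ≈ 1 - exp(-n^2 / (2N)). A sketch, assuming all N = 62**16 strings are equally likely (the altchars="ZZ" mapping makes Z somewhat more common, so the real figure is a small factor higher); collision_probability is a hypothetical helper:

```python
import math

def collision_probability(n_items: int, space: int) -> float:
    """Birthday approximation: P(any collision) ~ 1 - exp(-n^2 / (2 * space))."""
    # expm1 keeps precision when the probability is tiny
    return -math.expm1(-(n_items ** 2) / (2 * space))

space = 62 ** 16  # distinct 16-character strings over a 62-symbol alphabet
p = collision_probability(10 ** 9, space)
print(p)  # on the order of 1e-11
```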
To make a string longer, you could try the following:
first compress it with bzip2
then make it readable again with base64 encoding
Like this:
import bz2
import base64
base64.b64encode(bz2.compress(b'012345'))
This will yield:
b'QlpoOTFBWSZTWeEMDLgAAAAIAH4AIAAhgAwDJy7i7kinChIcIYGXAA=='
Due to the bzip2 header, the first 13 characters will always be the same, so you should discard them:
base64.b64encode(bz2.compress(b'012345'))[13:]
This gives:
b'eEMDLgAAAAIAH4AIAAhgAwDJy7i7kinChIcIYGXAA=='
Note that this is not cryptographically secure; it is trivial to invert if you know the recipe that is used:
foo = base64.b64encode(bz2.compress(b'012345'))
bz2.decompress(base64.b64decode(foo))
gives:
b'012345'
I think SHAKE256 fits your needs.
You need to install pycryptodome:
https://pycryptodome.readthedocs.io/en/latest/src/hash/shake256.html
#!/usr/bin/env python
from Crypto.Hash import SHAKE256
from binascii import hexlify
def encrypt_shake256(s, hash_size):
    shake = SHAKE256.new()
    shake.update(s.encode())
    return hexlify(shake.read(hash_size//2))
def main():
    hash = encrypt_shake256("holahola", 16)
    print(hash)
    print(len(hash))
if __name__ == '__main__':
    main()
Output:
b'c126f8fb14fb21d8'
16
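Since Python 3.6, hashlib also ships SHAKE256 as hashlib.shake_256, so no extra package is needed. A sketch with a hypothetical name shake256_hex; note that hexdigest() takes the output length in bytes, and the result is a str rather than bytes. Both implement the FIPS 202 SHAKE256 function, so the output should match the pycryptodome version for the same input, though that is not verified here:

```python
import hashlib

def shake256_hex(s: str, hex_length: int) -> str:
    """SHAKE256 digest of `s` as a hex string of `hex_length` characters."""
    if hex_length % 2:
        raise ValueError("hex_length must be even (two hex chars per byte)")
    # hexdigest(n) takes the output length in *bytes*
    return hashlib.shake_256(s.encode('utf-8')).hexdigest(hex_length // 2)

h = shake256_hex("holahola", 16)
print(h, len(h))
```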
Yes, you can also use PyCrypto:
from Crypto.Hash import SHA256
aHash = SHA256.new(b"somethingToHash")
print(aHash.hexdigest())  # prints the hex digest of the input
The Crypto.Hash module comes from installing the pycrypto package (sudo pip install pycrypto). PyCrypto is no longer maintained; pycryptodome installs the same Crypto package and is the usual replacement.
This is basically the same thing as hashlib, but the PyCrypto library also comes with cipher modules.

How to generate a mixed-case hash in Python?

I am having a hard time figuring out a reasonable way to generate a mixed-case hash in Python.
I want to generate something like: aZeEe9E
Right now I'm using MD5, which doesn't generate case-sensitive hashes.
Do any of you know how to generate a hash value consisting of upper- and lower- case characters + numbers?
Okay, GregS's advice worked like a charm (on the first try!):
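Here is a simple example (note that it base64-encodes the hex string, not the raw digest):
>>> import hashlib, base64
>>> s = 'http://gooogle.com'
>>> h = hashlib.md5(s.encode()).hexdigest()
>>> print(h)
46c4f333fae34078a68393213bb9272d
>>> print(base64.b64encode(h.encode()).decode())
NDZjNGYzMzNmYWUzNDA3OGE2ODM5MzIxM2JiOTI3MmQ=
Encoding the raw .digest() bytes instead would give a shorter, 24-character string carrying the same information.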
You can base64-encode the output of the hash. Base64 uses a couple of characters beyond those you mentioned (+ and /, plus = for padding).
Maybe you can use base64-encoded hashes?
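A minimal sketch of the base64 idea applied to the raw digest (Python 3; the helper name mixed_case_hash is hypothetical). URL-safe base64 swaps + and / for - and _, and stripping the = padding leaves 22 characters for a 16-byte MD5 digest:

```python
import base64
import hashlib

def mixed_case_hash(s: str) -> str:
    """MD5 digest of `s` rendered as unpadded URL-safe base64."""
    digest = hashlib.md5(s.encode('utf-8')).digest()  # 16 raw bytes
    # urlsafe_b64encode uses '-' and '_' in place of '+' and '/'
    return base64.urlsafe_b64encode(digest).decode('ascii').rstrip('=')

print(mixed_case_hash('http://gooogle.com'))
```

If only letters and digits are acceptable, - and _ still need handling (for example by replacing them), at a small cost in entropy.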
