In Python, I am using uuid4() method to create a unique char set. But I can not find a way to limit that to 10 or 8 characters. Is there any way?
uuid4()
ffc69c1b-9d87-4c19-8dac-c09ca857e3fc
Thanks.
You can then generate a short UUID with shortuuid:
import shortuuid
shortuuid.uuid()
'vytxeTZskVKR7C7WgdSP3d'
Native solution with big risk of collision:
Try :
x = uuid4()
str(x)[:8]
Output :
"ffc69c1b"
How do I get a substring of a string in Python?
You can use shortuuid package.
pip install shortuuid
then it would be similar to UUID package.
import shortuuid
shortuuid.uuid()
Output
'vytxeTZskVKR7C7WgdSP3d'
Custom Length UUID
shortuuid.ShortUUID().random(length=22)
Output
'RaF56o2r58hTKT7AYS9doj'
Source - https://pypi.org/project/shortuuid/
The previous answers do not provide a UUID, either because they truncate the string or because they didn't generate a UUID to begin with. According to the documentation, if you truncate the string "[t]he IDs won’t be universally unique any longer [...]" and the documentation describes ShortUUID().random() to generate a cryptographically secure string instead of a UUID.
However, you can change the UUID length indirectly by changing the number of characters in the alphabet. In the implementation of ShortUUID.encoded_length() you can see that the UUID length is int(math.ceil(16 * math.log(256) / math.log(len(alphabet)))). You can change the alphabet by shortuuid.set_alphabet().
The more characters in the alphabet, the shorter the UUID can be and still be unique.
Related
I want to generate 500 random binary strings of length 32 (i.e., from 00000...0 to 1111...1 where each string is length 32). I'm wondering how I could do this efficiently or Pythonically.
Right now, I randomly choose between 0 and 1 32 times and then append the results into a string. This seems pretty inefficient and not pythonic. Is there some library functions i could use for this?
import random
rnd_32bit = f'{random.getrandbits(32):=032b}'
Since python 3.9 you can do it this way:
from random import randbytes
def rand_32bit_string(byte_count):
return "{0:b}".format(int.from_bytes(randbytes(4), "big"))
Easiest way is probably to use the secrets library (part of the python standard library as of python 3.6) and its token_bytes function.
import secrets
s = secrets.token_bytes(nbytes=4)
This question has various solutions for converting from bytes to a str of bit digits. You could do the following:
s = bin(int(secrets.token_hex(nbytes=4), 16))[2:].rjust(32, '0')
Throw this in a for loop and you're done!
In python 3.9 you also have the option of random.randbytes, e.g. random.randbytes(4), which behaves similarly (though secrets will use a cryptographically secure source of randomness, if that matters to you).
I need to create an identifier token from a set of nested configuration values.
The token can be part of a URL, so – to make processing easier – it should contain only hexadecimal digits (or something similar).
The config values are nested tuples with elements of hashable types like int, bool, str etc.
My idea was to use the built-in hash() function, as this will continue to work even if the config structure changes.
This is my first attempt:
def token(config):
h = hash(config)
return '{:X}'.format(h)
This will produce tokens of variable length, but that doesn't matter.
What bothers me, though, is that the token might contain a leading minus sign, since the return value of hash() is a signed integer.
As a way to avoid the sign, I thought of the following work-around, which is adding a constant to the hash value.
This constant should be half the size of the range the value of hash() can take (which is platform-dependent, eg. different for 32-/64-bit systems):
HALF_HASH_RANGE = 2**(sys.hash_info.width-1)
Is this a sane and portable solution?
Or will I shoot myself in the foot with this?
I also saw suggestions for using struct.pack() (which returns a bytes object, on which one can call the .hex() method), but it also requires knowing the range of the hash value in advance (for the choice of the right format character).
Addendum:
Encryption strength or collisions by chance are not an issue.
The drawback of the hashlib library in this scenario is that it requires writing a converter that traverses the input structure and converts everything into a bytes representation, which is cumbersome.
You can use any of hash functions for getting unique string. Right now python support out of the box many algorithms, like: md5, sha1, sha224, sha256, sha384, sha512. You can read more about it here - https://docs.python.org/2/library/hashlib.html
This example shows how to use library hashlib. (Python 3)
>>> import hashlib
>>> sha = hashlib.sha256()
>>> sha.update('somestring'.encode())
>>> sha.hexdigest()
>>> '63f6fe797026d794e0dc3e2bd279aee19dd2f8db67488172a644bb68792a570c'
Also you can try library hashids. But note that it's not a hash algorithm and you (and anyone who knows salt) can decrypt data.
$ pip install hashids
Basic usage:
>>> from hashids import Hashids
>>> hashids = Hashids()
>>> hashids.encode(123)
'Mj3'
>>> hashids.decode('Mj3')
123
I need to create an identifier token from a set of nested configuration values
I came across this question while trying to solve the same problem, and realizing that some of the calls to hash return negative integers.
Here's how I would implement your token function:
import sys
def token(config) -> str:
"""Generates a hex token that identifies a hashable config."""
# `sign_mask` is used to make `hash` return unsigned values
sign_mask = (1 << sys.hash_info.width) - 1
# Get the hash as a positive hex value with consistent padding without '0x'
return f'{hash(config) & sign_mask:#0{sys.hash_info.width//4}x}'[2:]
In my case I needed it to work with a broad range of inputs for the config. It did not need to be particularly performant (it was not on a hot path), and it was acceptable if it occasionally had collisions (more than what would normally be expected from hash). All it really needed to do is produce short (e.g. 16 chars long) consistent outputs for consistent inputs. So for my case I used the above function with a small modification to ensure the provided config is hashable, at the cost of increased collision risk and processing time:
import sys
def token(config) -> str:
"""Generates a hex token that identifies a config."""
# `sign_mask` is used to make `hash` return unsigned values
sign_mask = (1 << sys.hash_info.width) - 1
# Use `json.dumps` with `repr` to ensure the config is hashable
json_config = json.dumps(config, default=repr)
# Get the hash as a positive hex value with consistent padding without '0x'
return f'{hash(json_config) & sign_mask:#0{sys.hash_info.width//4}x}'[2:]
I'd reccomend using hashlib
cast the token to a string, and then cast the hexdigest to a hex integer. Bellow is an example with the sha256 algorithm but you can use any hashing algorithm hashlib supports
import hashlib as hl
def shasum(token):
return int(hl.sha256(str(token).encode('utf-8')).hexdigest(), 16)
I would like to generate a human-readable hash with customized properties -- e.g., a short string of specified length consisting entirely of upper case letters and digits excluding 0, 1, O, and I (to eliminate visual ambiguity):
"arbitrary string" --> "E3Y7UM8"
A 7-character string of the above form could take on over 34 billion unique values which, for my purposes, makes collisions extremely unlikely. Security is also not a major concern.
Is there an existing module or routine that implements something like the above? Alternatively, can someone suggest a straightforward algorithm?
The method you should be using has similarities with password one-way encryption. Of course since you are going for readable, a good password function is probably out of the question.
Here's what I would do:
Take an MD5 hash of the email
Convert base32 which already eliminates O and I
Replace any non-readable characters with readable ones
Here's an example based on the above:
import base64 # base32 is a function in base64
import hashlib
email = "somebody#example.com"
md5 = hashlib.md5()
md5.update(email.encode('utf-8'))
hash_in_bytes = md5.digest()
result = base64.b32encode(hash_in_bytes)
print(result)
# Or you can remove the extra "=" at the end
result = result.strip(b'=')
Since it's a one-way function (hash), you obviously don't need to worry about reversing the process (you can't anyway). You can also replace any other characters you find non-readable with readable ones (I would go for lowercase versions of the characters, e.g. q instead of Q)
More about base32 here: https://docs.python.org/3/library/base64.html
You can simply truncate the beginning of an MD5sum algorithm. It should have approximately the same statistical properties than the whole string anyway:
import md5
m = md5.new()
m.update("arbitrary string")
print(m.hexdigest()[:7])
Same code with hashlib module:
import hashlib
m = hashlib.md5()
m.update("arbitrary string")
print(m.hexdigest()[:7])
I am having a hard time figuring out a reasonable way to generate a mixed-case hash in Python.
I want to generate something like: aZeEe9E
Right now I'm using MD5, which doesn't generate case-sensitive hashes.
Do any of you know how to generate a hash value consisting of upper- and lower- case characters + numbers?
-
Okay, GregS's advice worked like a charm (on the first try!):
Here is a simple example:
>>> import hashlib, base64
>>> s = 'http://gooogle.com'
>>> hash = hashlib.md5(s).digest()
>>> print hash
46c4f333fae34078a68393213bb9272d
>>> print base64.b64encode(hash)
NDZjNGYzMzNmYWUzNDA3OGE2ODM5MzIxM2JiOTI3MmQ=
you can base64 encode the output of the hash. This has a couple of additional characters beyond those you mentioned.
Maybe you can use base64-encoded hashes?
What is the easiest way to generate a random hash (MD5) in Python?
A md5-hash is just a 128-bit value, so if you want a random one:
import random
hash = random.getrandbits(128)
print("hash value: %032x" % hash)
I don't really see the point, though. Maybe you should elaborate why you need this...
I think what you are looking for is a universal unique identifier.Then the module UUID in python is what you are looking for.
import uuid
uuid.uuid4().hex
UUID4 gives you a random unique identifier that has the same length as a md5 sum. Hex will represent is as an hex string instead of returning a uuid object.
http://docs.python.org/2/library/uuid.html
https://docs.python.org/3/library/uuid.html
The secrets module was added in Python 3.6+. It provides cryptographically secure random values with a single call. The functions take an optional nbytes argument, default is 32 (bytes * 8 bits = 256-bit tokens). MD5 has 128-bit hashes, so provide 16 for "MD5-like" tokens.
>>> import secrets
>>> secrets.token_hex(nbytes=16)
'17adbcf543e851aa9216acc9d7206b96'
>>> secrets.token_urlsafe(16)
'X7NYIolv893DXLunTzeTIQ'
>>> secrets.token_bytes(128 // 8)
b'\x0b\xdcA\xc0.\x0e\x87\x9b`\x93\\Ev\x1a|u'
This works for both python 2.x and 3.x
import os
import binascii
print(binascii.hexlify(os.urandom(16)))
'4a4d443679ed46f7514ad6dbe3733c3d'
Yet another approach. You won't have to format an int to get it.
import random
import string
def random_string(length):
pool = string.letters + string.digits
return ''.join(random.choice(pool) for i in xrange(length))
Gives you flexibility on the length of the string.
>>> random_string(64)
'XTgDkdxHK7seEbNDDUim9gUBFiheRLRgg7HyP18j6BZU5Sa7AXiCHP1NEIxuL2s0'
Another approach to this specific question:
import random, string
def random_md5like_hash():
available_chars= string.hexdigits[:16]
return ''.join(
random.choice(available_chars)
for dummy in xrange(32))
I'm not saying it's faster or preferable to any other answer; just that it's another approach :)
import uuid
from md5 import md5
print md5(str(uuid.uuid4())).hexdigest()
import os, hashlib
hashlib.md5(os.urandom(32)).hexdigest()
The most proper way is to use random module
import random
format(random.getrandbits(128), 'x')
Using secrets is an overkill. It generates cryptographically strong randomness sacrifying performance.
All responses that suggest using UUID are intrinsically wrong because UUID (even UUID4) are not totally random. At least they include fixed version number that never changes.
import uuid
>>> uuid.uuid4()
UUID('8a107d39-bb30-4843-8607-ce9e480c8339')
>>> uuid.uuid4()
UUID('4ed324e8-08f9-4ea5-bc0c-8a9ad53e2df6')
All MD5s containing something other than 4 at 13th position from the left will be unreachable this way.
from hashlib import md5
plaintext = input('Enter the plaintext data to be hashed: ') # Must be a string, doesn't need to have utf-8 encoding
ciphertext = md5(plaintext.encode('utf-8')).hexdigest()
print(ciphertext)
It should also be noted that MD5 is a very weak hash function, also collisions have been found (two different plaintext values result in the same hash)
Just use a random value for plaintext.