This is the code I use to generate a password reset link for my app:
import uuid

def create_unique_code():
    return str(uuid.uuid4())
Is that strong enough? I use a one or two day expiry time.
In CPython, yes. In other Python implementations, probably, but you might want to double-check that a cryptographically strong source of randomness is used to generate the UUID.
There are two factors you might care about when judging whether some way of generating secure random tokens - such as UUIDs - is "strong enough":
1. Are there enough possible values for it not to be brute-forced?
2. Is the source of randomness used cryptographically secure?
Since there are 2^122 version 4 UUIDs (that's a little over 5 trillion trillion trillion), the answer to point 1 is definitely "yes", in this case. The space of all possible UUIDs ain't going to be brute forceable any time soon.
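The 122 comes from the 128 bits of a UUID minus the 4 version bits and 2 variant bits that are fixed in a version 4 UUID. A quick sketch of the arithmetic; the guess rate is an arbitrary assumption picked only to give a sense of scale:

```python
import uuid

# A UUID is 128 bits; version 4 fixes 4 version bits and 2 variant bits.
RANDOM_BITS = 128 - 4 - 2
uuid4_space = 2 ** RANDOM_BITS

# Assumption for illustration only: an attacker testing one billion codes per second.
GUESSES_PER_SECOND = 10 ** 9
SECONDS_PER_YEAR = 60 * 60 * 24 * 365
years_to_exhaust = uuid4_space // (GUESSES_PER_SECOND * SECONDS_PER_YEAR)

sample = uuid.uuid4()
print(f"{uuid4_space:.3e} possible values, roughly {years_to_exhaust:.1e} years to try them all")
print(sample, "version:", sample.version)
```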
Point 2 is not currently answered by the official Python docs on uuid.uuid4(), which make no mention of security or whether the randomness source used is strong. Indeed, the entire documentation of uuid4() is just:
Generate a random UUID.
which clearly provides no security guarantees.
Nor is it addressed by the UUID specification, which does not mandate a cryptographically strong source of randomness be used in UUID generation and indeed explicitly contemplates the possibility of a "predictable random number source" being used to generate UUIDs in the Security Considerations section.
However, we can look at the implementation at https://github.com/python/cpython/blob/master/Lib/uuid.py:
def uuid4():
    """Generate a random UUID."""
    return UUID(bytes=os.urandom(16), version=4)
Since this uses os.urandom as its randomness source, it's secure. See the docs at https://docs.python.org/3/library/os.html#os.urandom which note that os.urandom returns:
a string of size random bytes suitable for cryptographic use.
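If you would rather not rely on an implementation detail of uuid4(), the standard library's secrets module (Python 3.6+) is documented as suitable for security tokens. A minimal sketch; the function name and the 32-byte default are choices made here for illustration, not something from the question:

```python
import secrets

def create_reset_code(nbytes: int = 32) -> str:
    """Return a URL-safe token built from `nbytes` bytes of OS-provided randomness."""
    return secrets.token_urlsafe(nbytes)

code = create_reset_code()
print(code)
```

The result only contains letters, digits, "-" and "_", so it can go straight into a reset URL.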
Yes, a UUID4 is fully random and long enough to rule out brute forcing or lucky guesses. So as long as whatever RNG uuid.uuid4() uses provides sufficiently good randomness, you should be fine.
However, consider using e.g. a cryptographically signed token (the itsdangerous lib can take care of it) - not only can you specify an expiry time right when generating it, you also won't necessarily have to store anything about the token on your server.
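To show the idea behind a signed, expiring token without pulling in a dependency, here is a standard-library sketch using hmac. This is not the itsdangerous API or its token format; SECRET_KEY, the function names and the "payload.signature" layout are all made up for this example:

```python
import base64
import hashlib
import hmac
import time
from typing import Optional

# Hypothetical server-side secret; a real app would load it from configuration.
SECRET_KEY = b"change-me"

def _sign(payload: bytes) -> bytes:
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()

def make_reset_token(user_id: str, now: Optional[float] = None) -> str:
    """Return 'payload.signature', both URL-safe base64; the payload carries the issue time."""
    issued_at = int(time.time() if now is None else now)
    payload = f"{user_id}:{issued_at}".encode()
    parts = (payload, _sign(payload))
    return ".".join(base64.urlsafe_b64encode(p).decode() for p in parts)

def verify_reset_token(token: str, max_age_seconds: int,
                       now: Optional[float] = None) -> Optional[str]:
    """Return the user id if the signature is valid and the token is fresh, else None."""
    try:
        payload_b64, signature_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        signature = base64.urlsafe_b64decode(signature_b64)
    except ValueError:  # wrong number of parts or malformed base64
        return None
    # Constant-time comparison of the supplied and expected signatures.
    if not hmac.compare_digest(signature, _sign(payload)):
        return None
    user_id, _, issued_at = payload.decode().rpartition(":")
    current = time.time() if now is None else now
    if current - int(issued_at) > max_age_seconds:
        return None
    return user_id
```

Nothing is stored server-side here, which also means a token stays usable until it expires unless you add your own invalidation (for example by mixing the current password hash into the payload).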
>>> import sys
>>> sys.set_int_max_str_digits(4300) # Illustrative, this is the default.
>>> _ = int('2' * 5432)
Traceback (most recent call last):
...
ValueError: Exceeds the limit (4300) for integer string conversion: value has 5432 digits.
Python 3.10.7 introduced this breaking change for type conversion.
Documentation: Integer string conversion length limitation
Actually, I don't understand why this was introduced, or where the default value of 4300 comes from. It sounds like an arbitrary number.
See GitHub issue CVE-2020-10735: Prevent DoS by large int<->str conversions #95778:
Problem
A Denial Of Service (DoS) issue was identified in CPython
because we use binary bignum’s for our int implementation. A huge
integer will always consume a near-quadratic amount of CPU time in
conversion to or from a base 10 (decimal) string with a large number
of digits. No efficient algorithm exists to do otherwise.
It is quite common for Python code implementing network protocols and
data serialization to do int(untrusted_string_or_bytes_value) on input
to get a numeric value, without having limited the input length or to
do log("processing thing id %s", unknowingly_huge_integer) or any
similar concept to convert an int to a string without first checking
its magnitude. (http, json, xmlrpc, logging, loading large values into
integer via linear-time conversions such as hexadecimal stored in
yaml, or anything computing larger values based on user controlled
inputs… which then wind up attempting to output as decimal later on).
All of these can suffer a CPU consuming DoS in the face of untrusted
data.
Everyone auditing all existing code for this, adding length guards,
and maintaining that practice everywhere is not feasible nor is it
what we deem the vast majority of our users want to do.
This issue has been reported to the Python Security Response Team
multiple times by a few different people since early 2020, most
recently a few weeks ago while I was in the middle of polishing up the
PR so it’d be ready before 3.11.0rc2.
Mitigation
After discussion on the Python Security Response Team
mailing list the conclusion was that we needed to limit the size of
integer to string conversions for non-linear time conversions
(anything not a power-of-2 base) by default. And offer the ability to
configure or disable this limit.
The Python Steering Council is aware of this change and accepts it as
necessary.
Further discussion can be found on the Python Core Developers Discuss thread Int/str conversions broken in latest Python bugfix releases.
I found this comment by Steve Dower to be informative:
Our apologies for the lack of transparency in the process here. The
issue was first reported to a number of other security teams, and
converged in the Python Security Response Team where we agreed that
the correct fix was to modify the runtime.
The delay between report and fix is entirely our fault. The security
team is made up of volunteers, our availability isn’t always reliable,
and there’s nobody “in charge” to coordinate work. We’ve been
discussing how to improve our processes. However, we did agree that
the potential for exploitation is high enough that we didn’t want to
disclose the issue without a fix available and ready for use.
We did work through a number of alternative approaches, implementing
many of them. The code doing int(gigabyte_long_untrusted_string) could
be anywhere inside a json.load or HTTP header parser, and can run very
deep. Parsing libraries are everywhere, and tend to use int
indiscriminately (though they usually handle ValueError already).
Expecting every library to add a new argument to every int() call
would have led to thousands of vulnerabilities being filed, and made
it impossible for users to ever trust that their systems could not be
DoS’d.
We agree it’s a heavy hammer to do it in the core, but it’s also the
only hammer that has a chance of giving users the confidence to keep
running Python at the boundary of their apps.
Now, I’m personally inclined to agree that int->str conversions should
do something other than raise. I was outvoted because it would break
round-tripping, which is a reasonable argument that I accepted. We can
still improve this over time and make it more usable. However, in most
cases we saw, rendering an excessively long string isn’t desirable
either. That should be the opt-in behaviour.
Raising an exception from str may prove to be too much, and could be
reconsidered, but we don’t see a feasible way to push out updates to
every user of int, so that will surely remain global.
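For comparison, this is the kind of per-call length guard that the comment argues cannot realistically be added everywhere. A minimal sketch; parse_untrusted_int and its max_digits default are hypothetical names and values chosen for this example:

```python
def parse_untrusted_int(text: str, max_digits: int = 20) -> int:
    """Parse a decimal integer from untrusted input, rejecting oversized strings early."""
    stripped = text.strip()
    # Count digits without any leading sign; the check is linear in the input length.
    digits = stripped.lstrip("+-")
    if len(digits) > max_digits:
        raise ValueError(f"integer literal longer than {max_digits} digits")
    # int() still validates the syntax (e.g. it rejects "--5" or "12a").
    return int(stripped)
```

It works for one call site, but every parser in every dependency would need the same treatment, which is the argument for enforcing the limit in the runtime instead.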
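If you get this error:
ValueError: Exceeds the limit (4300) for integer string conversion
you can raise the limit to a value that fits your data, or disable it entirely by passing 0:
import sys
sys.set_int_max_str_digits(0)  # 0 disables the limit; pass a larger number to merely raise it
With the limit lifted you can convert bigger integers again. Only do this where the input is trusted, since the limit exists to prevent the DoS described above.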
Link to documentation:
https://docs.python.org/3/library/stdtypes.html#integer-string-conversion-length-limitation
What exactly is the purpose of using the time.sleep() function in this snippet:
import random
import secrets
from time import sleep

def check_user(user: User, password: str) -> bool:
    hashpass, salt = user_info[user].hashed_password
    target_hash_pass = hash_password(password, salt)[0]
    sleep(random.expovariate(10))
    return secrets.compare_digest(hashpass, target_hash_pass)
It's an attempt to introduce a random time delay when verifying passwords, presumably to counter timing attacks, where an attacker makes use of the fact that incorrect passwords result in a faster response.
I'd not expect this to be that useful. secrets.compare_digest() already does everything right to mitigate timing attacks. Provided hashpass and target_hash_pass are the same type (both are bytes or both are strings, always), and have equal length, the usual timing attack vectors are not available here.
However, it could be that the author doesn't trust that those two conditions are always true. Perhaps the user_info structure contains shorter or longer password hashes or there is a chance that you'll get a different type. If so, then those issues should be addressed directly, instead.
It should be noted that because a timing attack compares the statistical differences between multiple attempts using different passwords, random noise is only going to marginally slow such attacks; it only adds some more noise on top of the timing variation that network connections and normal computer operations already introduce. See Can I prevent timing attacks with random delays? and Could a random sleep prevent timing attacks?. Worse, the code uses the standard random module's generator, which isn't cryptographically secure, so a determined enough attacker could account for the sleep variations.
I'd strongly recommend that the author remove that line; it doesn't offer any actual security here.
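For reference, a version of the check without the sleep might look like the sketch below. The hash_password here is a hypothetical stand-in built on hashlib.pbkdf2_hmac (the question does not show the real one), and the iteration count is illustrative. Because both digests always come from the same function, they have the same type and length, which is what secrets.compare_digest() needs:

```python
import hashlib
import secrets
from typing import Optional, Tuple

ITERATIONS = 100_000  # illustrative work factor, not a recommendation

def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (digest, salt); a fresh random salt is generated when none is given."""
    if salt is None:
        salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return digest, salt

def check_password(password: str, stored_digest: bytes, salt: bytes) -> bool:
    """Recompute the digest and compare it to the stored one in constant time."""
    candidate = hash_password(password, salt)[0]
    return secrets.compare_digest(stored_digest, candidate)
```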
If you start with a list of hundreds or perhaps thousands of separate items, and you want Python to choose one (at a time) at random (for creating a ciphertext), how "random" will it really be? It's highly important that there be no repeats of the same item (integers, strings) whatsoever, because of the cryptographic nature of the app. But is there some way to confidently perform random selection from dictionaries?
Thanks for the suggestions of such, but this question is not a duplicate of the two possibilities listed. For one thing, the range of items up for selection needs to be entirely dynamic, yet for brevity's sake, I've limited the description of the mechanics of the app, which is intended for educational/entertainment purposes and not for saving the world ;-)
From random module docs:
Warning: The pseudo-random generators of this module should not be used for security purposes. Use os.urandom() or SystemRandom if you require a cryptographically secure pseudo-random number generator.
If you're using Python 3.6 or later you can use:
from secrets import choice
choice(your_options)
According to the module documentation:
The secrets module is used for generating cryptographically strong random numbers suitable for managing data such as passwords, account authentication, security tokens, and related secrets.
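Since the question asks about dictionaries and about avoiding repeats, here is a small sketch of both using the same OS-backed generator. The helper names and the sample dictionary are invented for this example; secrets.choice and secrets.SystemRandom().sample are the real calls:

```python
import secrets

_rng = secrets.SystemRandom()  # random.Random interface backed by os.urandom

def pick_one(options):
    """Pick a single item; for a dict this picks one of its keys."""
    return secrets.choice(list(options))

def draw_without_repeats(options, count):
    """Pick `count` distinct items (no item is returned twice)."""
    return _rng.sample(list(options), count)

symbols = {"alpha": 1, "beta": 2, "gamma": 3, "delta": 4}
print(pick_one(symbols))
print(draw_without_repeats(symbols, 3))
```

sample() raises ValueError if count is larger than the number of items, so running out of unused items is reported instead of silently repeating.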
First off, what you're talking about is how random a human perceives a generator to be; not how random something is. There's a good post on how Spotify shuffles music to seem more random to humans, while actually reducing entropy. (or at least how they used to do it).
Not ever using the same number/string twice in the same message is a worse security flaw than the one used to crack the Enigma during WW2.
Second, by "how random", you probably mean "how much entropy".
Third, the random module in Python is not cryptographically secure, as others have pointed out. Don't use it for cryptography-related code. There's os.urandom(), SystemRandom or secrets, but you should probably not use any of them, because:
Fourth, and most important, you should never roll your own crypto unless you have a degree in cryptography. Check what the state of the art is, and use that instead. Crypto SE knows their stuff, and so does Security SE.
One of the big additions in the recently released Python 3.6 is the addition of a secrets module for generating cryptographically strong random numbers.
I'm testing some code that uses python-gnupg to encrypt/sign/decrypt some plaintext, and I'd like to generate a key pair on the fly. GnuPG is (of course) super paranoid in generating the key pair, and it sucks a lot of entropy from my system.
I found this answer on unix.stackexchange.com, but using rngd to have /dev/random pull from /dev/urandom sounds like a bad idea.
Since I'm testing I don't need high security, I just need the key pair to be generated as quickly as possible.
An idea is to pre-generate some keys offline, and use those keys on my tests. Anyway, I'd like to programmatically generate my temporary key pairs while executing the tests.
This is the code I'm using now (that is, again, super slow and not good for testing):
from tempfile import mkdtemp
import gnupg

def temp_identity():
    identity = gnupg.GPG(gnupghome=mkdtemp())
    # Build the batch parameters for an unattended 1024-bit RSA key.
    input_data = identity.gen_key_input(key_type='RSA', key_length=1024)
    identity.gen_key(input_data)
    return identity
Using any method to change /dev/random to pull out of /dev/urandom is totally fine once the entropy pool has been initialized with a proper random state (which is not a problem on hardware x86 machines, but might require discussion for other devices). I strongly recommend watching The plain simple reality of entropy -- Or how I learned to stop worrying and love urandom, a lecture at 32C3.
If you want to speed up on-the-fly key generation, consider going for smaller key sizes like RSA 512 (1k keys aren't really secure, either). This will render keys insecure, but if that's fine for testing -- go for it. Using another algorithm (for example elliptic curves if you already have GnuPG 2.1) might also speed up key generation.
If you really want to stick with /dev/random and smaller key sizes don't provide adequate performance, you can very well pre-generate keys, export them using gpg --export-secret-keys and import them instead of creating new ones.
gpg-agent also knows the option --debug-quick-random, which seems to fit your use case, but I've never used it before. From man gpg-agent:
--debug-quick-random
This option inhibits the use of the very secure random quality level (Libgcrypt’s GCRY_VERY_STRONG_RANDOM) and degrades all request down to standard random quality. It is
only used for testing and shall not be used for any production quality keys. This option is only effective when given on the command line.
Project Euler
I have recently begun to solve some of the Project Euler riddles. I found the discussion forum on the site a bit frustrating (most of the discussions are closed and poorly threaded), so I have decided to publish my Python solutions on Launchpad for discussion.
The problem is that it seems quite unethical to publish these solutions, as it would let other people gain reputation without doing the programming work, which the site deeply discourages.
My Encryption problem
I want to encrypt my answers so that only those who have already solved the riddles can see my code. The logical key would be the answer to the riddle, which is always numeric.
In order to prevent brute-force attacks on my answers, I want to find an encryption algorithm that takes a significantly long time (a few seconds) to run.
Do you know any such algorithm? I would fancy a Python package, which I can attach to the code, over an external program that might have portability issues.
Thanks,
Adam
It sounds like people will have to write their own decryption utility, or use something off-the-shelf, or use off-the-shelf components to decrypt your posts.
PBKDF2 is a standardized algorithm for password-based key derivation, defined in PKCS #5. Basically, you can tune the "iterations" parameter so that deriving the key from a password (the answer to the Euler problem) takes several seconds. The key can then be used with any common symmetric encryption algorithm, like AES-128.
This has the advantage that most crypto libraries already support PBKDF2. In fact, you might find mail clients that support password-based encryption for S/MIME messages. Then you could just post an S/MIME and people could read it with the mail client. Unfortunately, my mail client (Thunderbird) only supports public-key encryption.
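In Python the derivation step is available in the standard library as hashlib.pbkdf2_hmac. A minimal sketch; derive_key, the per-problem salt format and the default iteration count are assumptions made for this example, and the iteration count would need tuning on real hardware to reach the "several seconds" target:

```python
import hashlib

def derive_key(answer: int, problem_number: int, iterations: int = 200_000) -> bytes:
    """Derive a 16-byte (AES-128 sized) key from a numeric Project Euler answer."""
    # Hypothetical salt scheme: one fixed, public salt per problem.
    salt = f"project-euler-{problem_number}".encode()
    return hashlib.pbkdf2_hmac("sha256", str(answer).encode(), salt, iterations, dklen=16)

key = derive_key(1234567, 123, iterations=1_000)
print(key.hex())
```

The resulting bytes would then be handed to whichever AES implementation you choose; that part is left out because it needs a third-party package.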
I think Yin Zhu pegged the social aspect of it and Whirlwind the technical. Using your preferred approach of:
python decrypt.py --problem=123 --key=1234567
the key number is readily available to Google, and even without that, slamming through a million keys (assuming a median key length of 5 decimal digits yields less than 20 bits of key) is pretty fast. If I wanted to be more clever I could use plain-text assumptions (e.g. import, for) and vastly reduce my search space.
For all the trouble you're probably best off using something really complicated like:
>>> import codecs
>>> print(codecs.getencoder('rot_13')('import codecs')[0])
vzcbeg pbqrpf
And if you want the solution to Project Euler problem 123, you'll have to beat it out of me...
Yes, you can do this with virtually any symmetric encryption algorithm, DES or AES for example: use the integer as the key, pad it out to the key length the algorithm requires, and use that key to decrypt the answer.
Keep in mind that if you extend a short key, the encryption won't be very good. The strength of the encryption has a lot more to do with key length and the algorithm itself than how long it takes to run.
This question seems to have some examples of libraries to use with python.
Just use Triple DES with a different key for each of the three passes, and use the number to generate each of the three keys. Pad up the key length with some text, and you're good.
Triple DES was designed to increase resistance to brute force.
It's not the world's most secure option, but it'll keep most brute-forcers at bay.
If you encrypt your answers, those who have solved the problem are unlikely to go to that much effort to see them, given that they already have plenty of answers to read on the answer page. Those who haven't solved it cannot see them at all. So your work becomes less useful.
Btw, there are many places providing answers to Project Euler, e.g. Haskell answers, Clojure answers, F# answers. If somebody only wants the answer to a question, he/she could simply run the program. Given that Python is so popular, googling "Python Euler xx" would give you plenty of blogs solving a specific problem.
The simplest approach would be to hash the answer using a secure hash function such as SHA-1, then provide the hash so users can verify their answer. If you want to make brute-forcing more difficult, iterate the hash, e.g. provide the result of n recursive applications of SHA-1, where n is some parameter you choose to make it difficult to brute-force.
If the number of possible answers is small, though, it'll be difficult to impossible to prevent someone from brute-forcing it even with an expensive hash function.
Edit: Sorry, I misread your original question. If you want to encrypt your answer, you could do that by using the resulting hash, above, as the encryption key for your answer, rather than posting the hash.
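A sketch of the iterated hash, using SHA-256 from hashlib in place of the SHA-1 mentioned above (a substitution made here, not part of the original suggestion). The function name, the 5-digit answer size and the 2 seconds per guess are illustrative assumptions:

```python
import hashlib

def iterated_hash(answer: int, rounds: int) -> bytes:
    """Apply SHA-256 `rounds` times, starting from the decimal text of the answer."""
    digest = str(answer).encode()
    for _ in range(rounds):
        digest = hashlib.sha256(digest).digest()
    return digest

# Rough cost of exhausting all 5-digit answers if one guess takes 2 seconds.
CANDIDATES = 10 ** 5
SECONDS_PER_GUESS = 2
worst_case_hours = CANDIDATES * SECONDS_PER_GUESS / 3600

print(iterated_hash(42, 1_000).hex())
print(f"{worst_case_hours:.1f} hours to try every candidate on one core")
```

The estimate illustrates the caveat about small answer spaces: the per-guess cost slows an attacker down linearly, and the search parallelizes trivially.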
If you want an encryption routine that is easy to use and distribute, I recommend Paul Rubin's p3.py. It's probably on the fast side, for how secure it is, but since you seem to be in need of a hurdle to be jumped rather than a siege-resistant wall, it may be a good choice for your purposes.
You could also look into rijndael.py, which is an implementation of AES, and slower than p3.py.