AES Key Generation from variables

I want to write a custom key generator in Python. The key will be used as an input (along with the plain text) to the AES algorithm for encryption (I will probably use the pycrypto or m2crypto libraries for that).
But the key generator has to be custom, as it would generate the key based on the string that would be supplied by the user.
str = date + case-id + name
where:
date = current date when a case was submitted
(we work on separate security analysis cases, submitted on our ticketing tool)
name = person handling the case
case-id = the ticket id with which it was submitted.
This same key needs to be known to the decryptor (on a different system) so that it can decrypt the data.
So the key has to be fixed for a specific set of date, name and case-id in a specific order. It should only differ if any of these three change in value or order, and it should not be random every time.
I've gone through some Stack Overflow posts where it is suggested to use
random_key = os.urandom(16)
but I don't believe this will serve my purpose.
Suggestions for articles on where to start if I want to design a key generator from scratch, or pointers to existing libraries, would be highly appreciated.

You're looking for a password hashing algorithm (more precisely, a password-based key derivation function), such as Argon2 or PBKDF2. It will allow you to deterministically stretch the 'password' built from the input values into a suitable key.
However, note that your passwords may still be very weak. I suspect that there is a strong correlation between case-id and date. The names probably come from a small list of people that is easy to find out. Also, isn't this data sent along with the encrypted data by your system? That would make using it as a password a bad idea.
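As a rough illustration of the deterministic derivation described above, here is a minimal sketch using PBKDF2 from Python's standard hashlib module. The function name derive_key, the separator, the example salt and the iteration count are all assumptions made for this sketch, not anything from the question. Both systems would need to share the same salt and iteration count, and the weakness of the inputs noted above still applies.

```python
import hashlib

def derive_key(date, case_id, name, salt, iterations=200_000):
    """Derive a 16-byte (AES-128) key from the three case fields.

    The fields are joined with a separator so that ("ab", "c") and
    ("a", "bc") do not produce the same input string.
    """
    material = "\x1f".join([date, case_id, name]).encode("utf-8")
    return hashlib.pbkdf2_hmac("sha256", material, salt, iterations, dklen=16)

# Hypothetical values: the salt must be identical on both systems.
shared_salt = b"example-shared-salt"
key = derive_key("2015-06-01", "CASE-1234", "alice", shared_salt)
same_key = derive_key("2015-06-01", "CASE-1234", "alice", shared_salt)
reordered = derive_key("CASE-1234", "2015-06-01", "alice", shared_salt)
print(key == same_key, key == reordered)
```

The same inputs in the same order always give the same key, while changing a value or the order gives a different one.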

Related

What's the purpose of a password in symmetric cryptography?

I found a Python package for encrypting data and saw this in the documentation of the cryptography library:
It is possible to use passwords with Fernet (symmetric key). To do this, you need to run the password through a key derivation function such as PBKDF2HMAC, bcrypt or scrypt.
But it turns out that a password works in the same way as a key (you use the password/key to en/decrypt). So why bother using a password instead of the key itself?
I mean, why not just use the key itself:
from cryptography.fernet import Fernet
key = Fernet.generate_key()
token = Fernet(key).encrypt(b"my deep dark secret")
Fernet(key).decrypt(token)

A password is something a person can remember, whereas a key usually is not, because it is long (at least 128 bits, or 32 characters when hex-encoded) and is supposed to be truly random (indistinguishable from random noise). If you want to encrypt something with a key, but the key cannot be exchanged using asymmetric cryptography and instead has to be given over the phone or must never be written down, then you can't simply generate a random key and use that. You need to have a password/passphrase in place to derive the key from.
Example 1:
A personal password safe like KeePass needs a key for encryption/decryption. A user will not be able to simply remember that key, therefore we have a much shorter password, which can be remembered. Now, the security lies in the fact that a slow key derivation function is used to derive a key from the password, so an attacker still has trouble brute-forcing the key even though it depends on a much shorter password.
Example 2:
You compress a document and use the encryption of the compression software. Now you can send the container via e-mail, but you can't send the password along with it. So, you call the person you've sent the e-mail to and tell them the password. A password is much easier to transmit than a long and random key in this way.
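To make the password-to-key step concrete, here is a small sketch using only the standard library. It stretches a password with PBKDF2 and encodes the result as 32 url-safe base64 bytes, which is the key format Fernet expects. The function name key_from_password, the example passphrase and the iteration count are assumptions for illustration. The salt is random but not secret, so it can be stored next to the ciphertext.

```python
import base64
import hashlib
import os

def key_from_password(password, salt, iterations=200_000):
    """Stretch a memorable password into a 32-byte key, url-safe base64 encoded."""
    raw = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=32
    )
    return base64.urlsafe_b64encode(raw)

salt = os.urandom(16)  # random, not secret; keep it with the encrypted data
key = key_from_password("correct horse battery staple", salt)
key_again = key_from_password("correct horse battery staple", salt)
key_other_salt = key_from_password("correct horse battery staple", os.urandom(16))
print(len(key), key == key_again)
```

Anyone who knows the password and the stored salt can recompute the same key, which is what lets the password be passed on by phone while the key itself is never transmitted.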

Secure cookie strategy

After reading about how to ensure that "remember me" tokens are kept secure and reading the source code for psecio's Gatekeeper PHP library, I've come up with the following strategy for keeping things secure, and I wanted to find out if this is going to go horribly wrong. I'm basically doing the following things:
When a user logs in, generate a cryptographically-secure string using the system's random number generator. (random.SystemRandom() in Python) This is generated by picking random characters from the selection of all lower and uppercase ASCII letters and digits. (''.join(_random_gen.choice(_random_chars) for i in range(length)), as per how Django does the same. _random_gen is the secure random number generator)
The generated token is inserted into a RethinkDB database along with the user ID it belongs to and an expiration time 1 minute into the future. A cookie value is then created from the unique ID that RethinkDB generates to identify that entry and the sha256-hashed token from before. Basically: ':'.join([unique_id, sha256_crypt.encrypt(token)]). sha256_crypt is from Python's passlib library.
When a user accesses a page that would require them to be logged in, the actual cookie value is retrieved from the database using the ID that was stored. The hashed cookie is then verified against the actual cookie using sha256_crypt.verify.
If the verification passes and the time value previously stored is less than the current time, then the previous entry in the database is removed and a new ID/token pair is generated to be stored as a cookie.
Is this a good strategy, or is there an obvious flaw that I'm not seeing?
EDIT: After re-reading some Stack Overflow posts that I linked in a comment, I have changed the process above so that the database stores the hashed token, and the actual token is sent back as a cookie. (which will only happen over https, of course)
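The revised flow from the EDIT (store the hash, send the real token) could look roughly like the sketch below. It is only an outline under several assumptions: a plain dict stands in for the RethinkDB table, the function names issue_token and check_cookie are made up, and a single SHA-256 is used instead of passlib's sha256_crypt, on the assumption that a long random token does not need a slow password hash.

```python
import hashlib
import hmac
import secrets
import time

sessions = {}  # stand-in for the database table (assumption for this sketch)

def issue_token(user_id, lifetime_seconds=60):
    """Create a token, store only its hash, and return the cookie value."""
    entry_id = secrets.token_hex(8)      # stand-in for the database-generated ID
    token = secrets.token_urlsafe(32)    # 32 random bytes from the OS CSPRNG
    sessions[entry_id] = {
        "user_id": user_id,
        "token_hash": hashlib.sha256(token.encode("ascii")).hexdigest(),
        "expires": time.time() + lifetime_seconds,
    }
    return entry_id + ":" + token

def check_cookie(cookie):
    """Return the user ID for a valid, unexpired cookie, otherwise None."""
    entry_id, _, token = cookie.partition(":")
    entry = sessions.get(entry_id)
    if entry is None or entry["expires"] < time.time():
        return None
    candidate = hashlib.sha256(token.encode("utf-8")).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    if hmac.compare_digest(candidate, entry["token_hash"]):
        return entry["user_id"]
    return None

cookie = issue_token("user-42")
print(check_cookie(cookie), check_cookie(cookie + "x"))
```

With this layout, someone who reads the session table sees only hashes and cannot replay them as cookies.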

You should make sure you generate enough characters in your secure string. I would aim for 64 bits of entropy, which means you need at least 11 characters in your string to prevent any type of practical brute force.
This is as per OWASP's recommendation for Session Identifiers:
With a very large web site, an attacker might try 10,000 guesses per
second with 100,000 valid session identifiers available to be guessed.
Given these assumptions, the expected time for an attacker to
successfully guess a valid session identifier is greater than 292
years.
Given 292 years, generating a new one every minute seems a little excessive. Maybe you could change this to refresh it once per day.
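The numbers above can be checked with a few lines of arithmetic. This sketch assumes a 62-character alphabet (upper case, lower case and digits, as in the question) and assumes the quoted estimate follows the formula (2^B + 1) / (2 * A * S), with B bits of entropy, A guesses per second and S valid identifiers.

```python
import math

alphabet_size = 26 + 26 + 10                  # a-z, A-Z, 0-9
bits_per_char = math.log2(alphabet_size)      # just under 6 bits per character
chars_needed = math.ceil(64 / bits_per_char)  # characters needed for 64 bits

guesses_per_second = 10_000
valid_sessions = 100_000
expected_seconds = (2**64 + 1) / (2 * guesses_per_second * valid_sessions)
expected_years = expected_seconds / (365 * 24 * 3600)

print(chars_needed, int(expected_years))
```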
I would also add a system-wide salt to your hashed, stored value (known as a pepper). This will prevent precomputed rainbow tables from being used to recover the original session value if an attacker manages to gain access to your session table. Create a cryptographically secure random value of at least 128 bits (16 bytes) to use as your pepper.
Apart from this, I don't see any inherent problems with what you've described. The usual advice applies though: Also use HSTS, TLS/SSL and Secure cookie flags.
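One possible way to apply a pepper is to key the hash with it using HMAC, as sketched below. The 16-byte size, the PEPPER name and the stored_value helper are assumptions of this sketch. In a real deployment the pepper would be loaded from configuration kept outside the database rather than regenerated on each start.

```python
import hashlib
import hmac
import secrets

# Assumption: generated once and kept outside the database (e.g. in app config).
PEPPER = secrets.token_bytes(16)

def stored_value(token):
    """Hash a session token with the system-wide pepper before storing it."""
    return hmac.new(PEPPER, token.encode("utf-8"), hashlib.sha256).hexdigest()

print(stored_value("example-token") == stored_value("example-token"))
```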

Create a receipt for a user form submission

There is a requirement that our users should complete and submit a form once a month. So, each month we should have a form that will contain data for the triplet (username, month, year). I want our users to be able to certify that they did actually submit the form for that particular month by creating a receipt for them. So, for each month there will be a report containing the data the user submitted along with the receipt. I don't want the users to be able to create that receipt by themselves though.
What I was thinking was to create a string that contains username, month, year and secret_word, and give the md5 hash of that string to the users as their receipt. That way, because the users won't have the secret word, they won't be able to generate the md5 hash. However, my users will probably complain when they see how complex that md5 hash looks. Also, if they find out the secret word, they will be able to create receipts for everybody.
Is there a standard way of doing what I ask? Could you recommend any other possible solutions?
I am using Python but some pseudocode or link to the appropriate methods would be ok.

@Serafeim, your approach is very good for the situation. Here are some ideas for extending it:
Make sure that the secret_word (loosely a salt in hashing terms, although strictly it acts as a secret key) is long enough.
Make the end function a bit more complex, e.g.
hash = h(h(username) + month + year + h(salt))
Use a bit more complex hash function, e.g. SHA1
Don't give the end user the whole hash value. E.g. md5 hex digest contains 32 digits, but it would be enough to have first 5-10 digits of the hash in the report.
Updated:
If you have the resources, generate a random salt per user. Then even if a user somehow learns their salt and the hash function, it will still be useless for forging receipts for other users.
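A common standard construction for this kind of receipt is a keyed hash (HMAC) instead of hashing the secret together with the data by hand. The sketch below swaps in HMAC-SHA256 for the md5 scheme discussed above and truncates the result to 10 hex digits, in line with the suggestion to show only part of the hash. The function names, the "|" separator and the example key are assumptions for illustration.

```python
import hashlib
import hmac

RECEIPT_LENGTH = 10  # hex digits shown to the user

def make_receipt(secret_key, username, month, year):
    """Return a short receipt that only the holder of secret_key can compute."""
    message = "{}|{:02d}|{}".format(username, month, year).encode("utf-8")
    digest = hmac.new(secret_key, message, hashlib.sha256).hexdigest()
    return digest[:RECEIPT_LENGTH]

def verify_receipt(secret_key, username, month, year, receipt):
    """Recompute the receipt and compare it in constant time."""
    expected = make_receipt(secret_key, username, month, year)
    return hmac.compare_digest(expected, receipt)

secret_key = b"0123456789abcdef0123456789abcdef"  # hypothetical server-side key
receipt = make_receipt(secret_key, "serafeim", 3, 2013)
print(len(receipt), verify_receipt(secret_key, "serafeim", 3, 2013, receipt))
```

A shorter receipt is easier for users to read but also easier to guess, so the length is a trade-off.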

Python encryption scheme that supports multiple decryption keys

Is there a python library that supports (symmetric) encryption of data with the possibility of using multiple decryption keys.
I have (sensitive) user data that must be stored encrypted in a database, but it must be possible for multiple 3rd parties to access the data without giving them all the same secret.
This could be implemented by generating a random key K and encrypting the original data D with it to get D_K. I then encrypt K with as many access keys (ak_1 to ak_n) as needed, store the results for later use and destroy K. Whenever a 3rd party tries to access D, they submit ak_i, and I use it to decrypt K and then use K to decrypt D_K to get D.
However, it would be nice to have this already implemented, since a) I don't like to reinvent the wheel and b) this is security and you probably won't get it 100% right.

Due to the confusion and issues surrounding export controls of hard encryption, there's not a lot of 3rd party libraries that directly provide this sort of higher-level explicit encryption scheme.
For the most part you're going to have to wrap the toolset of something like PyCrypto with your own key logic. Seeing as it's crypto we're talking about, however, I'd be remiss if I didn't point out the other libraries for lower-level hard encryption tools in Python.

Exploiting hash function in python

I made a topic about the built-in python hash function: Old python hashing done left to right - why is it bad?
The previous topic was about why it was bad for encryption, because we have an application called Gruyere which is filled with security holes, and it uses the hash() to encrypt cookies.
# global cookie_secret; only use positive hash values
h_data = str(hash(cookie_secret + c_data) & 0x7FFFFFF)
c_data is a username; cookie_secret is salt (which is just '' by default)
I have implemented a more secure encryption method using md5 hashing with salt, but one exercise is to beat this old encryption and I still cannot understand how :-( I've read the string_hash code from the Python source code but it's not documented and I can't figure it out.
EDIT: The idea is to write a program which can create a valid cookie for any valid user, so I think I need to find out cookie_secret somehow

Zack described the answer already in your last question: It's easy to find a collision.
Let's say you save hash("pwd") in the database (that you actually do something different doesn't matter). Now, if you enter "pwd" on the site, you get in. But how is this checked? Again, the hash of "pwd" is taken and compared to the value in the database. But what if there is a second string, say "hello", and hash("hello") == hash("pwd")? Then you could also use "hello" as the password. So to beat the encryption, you don't need to find "pwd", you just need any string which has the same hash value. You can simply search for such a string by brute force (and I guess you can do some optimizations based on knowledge of the source of hash).
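A brute-force collision search of the kind described above might look like this sketch. It uses a deliberately small 16-bit mask so the search finishes quickly; the question's mask is 0x7FFFFFF, which would need far more attempts. The names weak_hash and find_collision are made up for the sketch. Note that modern Python randomizes str hashes per process, so any collision found is only valid within the same run.

```python
import itertools

MASK = 0xFFFF  # assumption: much smaller than the question's 0x7FFFFFF

def weak_hash(data, secret=""):
    """Masked built-in hash, in the style of the cookie code in the question."""
    return hash(secret + data) & MASK

def find_collision(target, secret=""):
    """Try candidate strings until one has the same masked hash as target."""
    wanted = weak_hash(target, secret)
    for i in itertools.count():
        candidate = "guess%d" % i
        if candidate != target and weak_hash(candidate, secret) == wanted:
            return candidate

target = "pwd"
other = find_collision(target)
print(other, weak_hash(other) == weak_hash(target))
```

The search needs about 65,000 attempts on average with this mask, because each candidate matches with probability 1 in 2^16.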
