There is a requirement that our users should complete and submit a form once a month. So, each month we should have a form that will contain data for the triplet (username, month, year). I want our users to be able to certify that they did actually submit the form for that particular month by creating a receipt for them. So, for each month there will be a report containing the data the user submitted along with the receipt. I don't want the users to be able to create that receipt by themselves though.
What I was thinking was to create a string that contained username, month, year, secret_word and give the md5 hash of that string to the users as their receipt. That way because the users won't have the secret word they won't be able to generate the md5 hash. However my users will probably complain when they see the complexity of that md5 hash. Also if the find out the secret word they will be able to create receipts for everybody.
Is there a standard way of doing what I ask ? Could you recommend me any other possible solutions ?
I am using Python but some pseudocode or link to the appropriate methods would be ok.
#Serafeim, your approach is very good for the situation. Here are some ideas of extending it:
Make sure that the secret_word (in hashing terms it is called salt) is long enough.
Make the end function a bit more complex, e.g.
hash = h(h(username) + month + year + h(salt))
Use a bit more complex hash function, e.g. SHA1
Don't give the end user the whole hash value. E.g. md5 hex digest contains 32 digits, but it would be enough to have first 5-10 digits of the hash in the report.
Updated:
In case you have resources, generate a random salt per user. Then even if somehow a user will learn the salt and the hash function, it will be still useless for the others.
Related
I am making a Django project that will be hosted locally in different environments.
I want users to be able to login by just entering a six-digit PIN on a touch screen or keyboard instead of having to type out a lengthy username/password.
I need to store a PIN for users in the DB. I want the PIN to be hashed or encrypted in some way so that it is not visible in the database. The PIN (and therefore its hash) must be unique but it also must be converted to the same value each time. For instance, every time 123456 is entered it needs to be converted to "jhs8d67RandomString34kds" so that no two users can save the same PIN as the DB column will be unique.
I need to know how to change a user-entered integer and hash it to save in the database.
Then I need to know how to compare it when a user enters the PIN.
I really need some examples on how to implement this and not a lesson in telling me why this is "insecure" or won't work.
Any ideas would be greatly appreciated.
Hashing something doesn't make it secure
All hash function have clashes, the only difference is the probability
Integers have hash function implemented, just use that
Note that for security reasons hashing for strings in randomized in each python process. so those hashes cannot be used for persistent data
You can use module-hashlib:
import hashlib
pincode = "123456"
hashlib.md5(pincode).hexdigest()
'e10adc3949ba59abbe56e057f20f883e'
And the to compare you can do the same:
if hashlib.md5(pincode).hexdigest() == 'e10adc3949ba59abbe56e057f20f883e':
you code here
...
Or use hashlib.pbkdf2_hmac with salt:
hashlib.pbkdf2_hmac(hash_name, password, salt, iterations, dklen=None)
import hashlib
dk = hashlib.pbkdf2_hmac('sha256', b'password', b'salt', 100000)
dk.hex()
'0394a2ede332c9a13eb82e9b24631604c31df978b4e2f0fbd2c549944f9d79a5'
I want to code a custom key generator in Python. This key will be used as an input (along with the plain text) to AES algorithm for encryption (I will probably use pycrypto or m2crypto libraries for that).
But the key generator has to be custom, as it would generate the key based on the string that would be supplied by the user.
str = date + case-id + name
where:
date = current date when a case was submitted
(we work on separate security analysis cases, submitted on our ticketing tool)
name = person handling the case
case-id = the ticket id with which it was submitted.
This same key needs to be known to the decryptor (on a different system) so that it can decrypt the data.
So the key will have to be fixed for a specific set of date name and case-id for a specific order and will only be different if any of these 3 change in value or order and should not be random every time.
I've gone through some of stackoverflow articles, where it is suggested to use
random_key = os.urandom(16)
but I don't believe this will serve my purpose.
Suggestion on some articles where to start with if I want to design a key generator from scratch, or some pointers on existing libraries will be highly appreciated.
You're looking for a Password hashing algorithm, such as Argon2 or PBKDF2. It will allow you to deterministically extend the 'password' generated from the input values into a suitable key.
However, note that your passwords may still be very weak. I suspect that there is a strong correlation between case-id and date. Names are probably only a small list of people easily found out. Also, isn't this data sent along with the encrypted data by your system? This makes using it as a password a bad idea.
I'm trying to reduce the size of a string like this:
'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpYXQiOjE0NDU0OTk3NDUsImQiOnsiYXV0aF9kYXRhIjoiZm9vIiwib3RoZXJfYXV0aF9kYXRhIjoiYmFyIiwidWlkIjoidW5pcXVlSWQxIn0sInYiOjB9.h6LV3boj0ka2PsyOjZJb8Q48ugiHlEkNksusRGtcUBk'
to something that someone could type in less then 30 seconds like this:
'aF9kYX'
and be able to turn it back to the original string too. How could I achieve that?
EDIT: I guess I'm not being clear, first I don't know if what I want is possible.
So, I have my app which asks for a token to log in, which is that JWT. But it is way too long for someone to manually type. So I supposed there was an algorithm to make this string smaller (compress it) so that it could be easier and faster to type. An example that comes to my mind of how I would use such algorithm is:
short_to_big(small_string) //Returns the original JWT
big_to_short(JWT_string) //Returns the smaller string
Stupid simple answer: use a dict to store the short string as key and the long one as value. Then you just have to generate the short string the way you like and make sure it's not already in the dict. If you need to persist the key/value, you can use almost any kind of database (sql, key:value, document, or even a csv file FWIW).
Oh and if that doesn't solve your problem then you may want to consider giving more context ;)
You need more constraints. A 200 character string contains a lot more information than a 6 character string, so either need to a lot more about the original strings (e.g. that they come from some known set of strings, or have a limited character set) or you need to store the original strings somewhere and use the string the user type as a key to a map or similar.
There are lossless compression algorithms, but these depend on knowing some probabilistic information about the string (e.g. that repeated characters are likely) and will typically expand the strings if the probabilities are wrong.
UPDATE (After question clarification and comments suggestion)
You could implement an algorithm that uniquely maps this big string into a short representation of the string and store this mapping in a dictionary. The following algorithm does not guarantee the uniqueness but should give you some path to follow.
import random
import string
def long_string_to_short(original_string, length=10):
random.seed(original_string)
filling_values = string.digits + string.ascii_letters
short_string = ''.join(random.choice(filling_values) for char_ in xrange(length))
return short_string
When calling the function you can specify an appropriate length for the short string.
Then you could:
my_mapping_dict = {}
my_long_string = 'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpYXQiOjE0NDU0OTk3NDUsImQiOnsiYXV0aF9kYXRhIjoiZm9vIiwib3RoZXJfYXV0aF9kYXRhIjoiYmFyIiwidWlkIjoidW5pcXVlSWQxIn0sInYiOjB9.h6LV3boj0ka2PsyOjZJb8Q48ugiHlEkNksusRGtcUBk'
short_string = long_string_to_short(my_long_string)
my_mapping_dict[short_string] = my_long_string
Ok, so, because I couldn't find a solution for shrinking the string, I tried to give it a different approach, and found a solution.
Now to clarify why I wanted to log in with the token, I'm going to write what I want to do with my app:
In Firebase anyone can create an account, but I don't want that, so for that I made a group of users that were the only ones that could write or read the data.
So in order to create an account, the user would have to request a register code, (Which in reality is a JWT generated from Firebase, so that you have permission to add a user to that group I was talking about).
This app is for local use, meaning that only people that lives here are going to use it. So, back to the original question, the token is too big for someone to type (as I have said many times), and I wanted to know if I could shrink it and how. But without success I tried a different approach, which is to generate the token (from a different program), encrypt it with a random code, and upload it to a firebase, that way I give the random code to people so that users can type it in the app so that it can retrieve and decrypt the token and authenticate with it, so that finally the user has an account that has the privilege to read or write data.
Thanks for your responses and sorry if I wasted your time.
After reading about how to ensure that "remember me" tokens are kept secure and reading the source code for psecio's Gatekeeper PHP library, I've come up with the following strategy for keeping things secure, and I wanted to find out if this is going to go horribly wrong. I'm basically doing the following things:
When a user logs in, generate a cryptographically-secure string using the system's random number generator. (random.SystemRandom() in Python) This is generated by picking random characters from the selection of all lower and uppercase ASCII letters and digits. (''.join(_random_gen.choice(_random_chars) for i in range(length)), as per how Django does the same. _random_gen is the secure random number generator)
The generated token is inserted into a RethinkDB database along with the userid it goes along with and an expiration time 1 minute into the future. A cookie value is then created by using the unique ID that RethinkDB generates to identify that entry and the sha256-hashed token from before. Basically: ':'.join(unique_id, sha256_crypt.encrypt(token)). sha256_crypt is from Python's passlib library.
When a user accesses a page that would require them to be logged in, the actual cookie value is retrieved from the database using the ID that was stored. The hashed cookie is then verified against the actual cookie using sha256_crypt.verify.
If the verification passes and the time value previously stored is less than the current time, then the previous entry in the database is removed and a new ID/token pair is generated to be stored as a cookie.
Is this a good strategy, or is there an obvious flaw that I'm not seeing?
EDIT: After re-reading some Stack Overflow posts that I linked in a comment, I have changed the process above so that the database stores the hashed token, and the actual token is sent back as a cookie. (which will only happen over https, of course)
You should make sure you generate enough characters in your secure string. I would aim for 64 bits of entropy, which means you need at least 11 characters in your string to prevent any type of practical brute force.
This is as per OWASP's recommendation for Session Identifiers:
With a very large web site, an attacker might try 10,000 guesses per
second with 100,000 valid session identifiers available to be guessed.
Given these assumptions, the expected time for an attacker to
successfully guess a valid session identifier is greater than 292
years.
Given 292 years, generating a new one every minute seems a little excessive. Maybe you could change this to refresh it once per day.
I would also add a system wide salt to your hashed, stored value (known as a pepper). This will prevent any precomputed rainbow tables from extracting the original session value if an attacker manages to gain access to your session table. Create a 16 bit cryptographically secure random value to use as your pepper.
Apart from this, I don't see any inherent problems with what you've described. The usual advice applies though: Also use HSTS, TLS/SSL and Secure cookie flags.
I made a topic about the built-in python hash function: Old python hashing done left to right - why is it bad?
The previous topic was about why it was bad for encryption, because we have an application called Gruyere which is filled with security holes, and it uses the hash() to encrypt cookies.
# global cookie_secret; only use positive hash values
h_data = str(hash(cookie_secret + c_data) & 0x7FFFFFF)
c_data is a username; cookie_secret is salt (which is just '' by default)
I have implemented a more secure encryption method using md5 hashing with salt, but one excercise is to beat this old encryption and I still cannot understand how :-( I've read the string_hash code from python sourcecode but it's not documented and I can't figure it out.
EDIT: The idea is to write a program which can create a valid cookie any valid user, so I think I need to find out cookie_secret somehow
Zack described the answer already in your last question: It's easy to find a collision.
Let's say you save hash("pwd") in the database (that you actually do something different doesn't matter. Now, if you enter "pwd" in the site, you can enter. But how is this checked? Again, the hash of "pwd" is token, and compared to the value in the database. But what if there is a second string, say "hello", and hash("hello") == hash("pwd")? Then you could also use "hello" as password. So to beat the encryption, you don't need to find "pwd", you just need any string which has the same hash-value. You can just search for such a string brute-force (and I guess you can do some optimizations based on the knowledge of the source of hash)