I would like to use a string that was input by the user in a web form as part of a key name:
user_input = self.request.POST.get('foo')
if user_input:
    # use the (truncated) user input as the entity's key name
    foo = Foo.get_or_insert(user_input[:100], parent=my_parent)
Is this safe? Or should I do some inexpensive encoding or hash? If yes, which one?
It's safe as long as you don't care about a malicious user filling up your database with junk. get_or_insert won't let them overwrite existing entries, just add new ones.
Make sure you limit its length (both in the UI and after it's been received), even if you do no other validation on it, so at least they can't give you crazy big keys either to fill up the database quickly or to crash your app.
Edit: You just commented that you do, in fact, verify that it's a reasonable key. In that case, yes, it's safe.
Keep in mind that the user can probably still figure out which keys are already in your database, based on how long it takes you to respond to what they've provided. You still need to make sure they're authorized to see whatever content they request, or limit them to a small number of requests so they can't just brute-force retrieve all the information linked to the keys you're generating.
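If you'd rather not use the raw text as the key name at all, one inexpensive option (purely optional, given the answer above) is to derive the name from a hash of the input; that caps the length and keeps the name printable. A sketch reusing the names from the question:

import hashlib

user_input = self.request.POST.get('foo')
if user_input:
    # a hex digest is always 64 printable characters, regardless of input size
    key_name = hashlib.sha256(user_input.encode('utf-8')).hexdigest()
    foo = Foo.get_or_insert(key_name, parent=my_parent)

The trade-off is that you can no longer recover the original text from the key name, so store it as a property on the entity if you need it later.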
My idea is to create a hash of a queryset result. For example, product inventory.
Each update of this stock would generate a hash.
The idea is that the API would only be asked for this queryset when there is a change (for example, a new product in the inventory).
Example for this use:
no change, same hash: no request to get the queryset
there was a change, different hash: a request will be made
This would be a feature designed for those who are consuming the data, not for the Django application that is serving it.
Does this make any sense? I saw that in Python there is a way to generate a hash from a tuple; in my case it would be to build a frozenset and hash that. I don't know if it's a good idea.
I would comment, but I'm waiting on the 50 rep to be able to do that. It sounds like you're trying to cache results so you aren't re-querying data that hasn't changed. If you're not familiar with caching, the idea is to save hard-to-compute answers in memory for frequently queried endpoints/functions.
For example, if I had a program that calculated the first n digits of pi, I may choose to save a map of [digit count -> value] so that if 10 people asked me for the first thousand, I would only calculate it once. Redis is a popular option for caching, and I believe it exists for Django. It allows you to cache some information, set a time before expiration on it, and then wipe specific parts of that information (to force it to recalculate) every time something specific changes (like a new product in inventory).
Everybody should try writing their own cache at least once, like what you're describing, but the de facto professional option is to use a caching library. Your idea is good and will definitely work; you will probably want a dict of [hash -> result], where result is the information you would send back over your API. If you plan to save data so it persists across multiple program starts, remember that Python randomizes the seed for its built-in hash(), so the values are not consistent between runs. Check out this post for more info.
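A minimal sketch of that idea, assuming a hypothetical Product model and using hashlib instead of the built-in hash() so the digest stays stable across processes:

import hashlib
import json

from myapp.models import Product  # hypothetical model

def inventory_fingerprint():
    # Serialize the fields that matter, in a deterministic order,
    # then hash the result so clients can compare a short digest.
    rows = Product.objects.values_list('id', 'name', 'stock').order_by('id')
    payload = json.dumps(list(rows), default=str).encode('utf-8')
    return hashlib.sha256(payload).hexdigest()

A client can then poll a tiny endpoint that returns only this digest and fetch the full queryset only when the digest changes.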
I am using Django 1.9.7. The hashed passwords stored in the database are significantly different in format.
Some passwords are of the format <algorithm>$<iterations>$<salt>$<hash>:
pbkdf2_sha256$24000$61Rm3LxOPsCA$5kV2bzD32bpXoF6OO5YuyOlr5UHKUPlpNKwcNVn4Bt0=
While others look like this:
!9rPYViI1oqrSMfkDCZSDeJxme4juD2niKcyvKdpB
Passwords are set using either User.objects.create_user() or user.set_password(). Is this difference expected?
You'll be fine. You just have some users with unusable (blank) passwords in your database.
Going back as far as v0.95, Django used the $ separators for delimiting algorithm/salt/hash. These days, Django pulls out the algorithm first by looking at what comes before the first $ and then passes the whole string to that hasher to decode. This allows for a wider set of formats, including the PBKDF2 one, which adds an extra iterations parameter to the list (as in your first example).
However, it also recognises that some users may not be allowed to log in and/or have no password. This is encoded using the second format you've seen. As you can see here:
If password is None then a concatenation of UNUSABLE_PASSWORD_PREFIX and a random string will be returned which disallows logins.
You can also see that the random string is exactly 40 characters long - just like your second example.
In short, then, this is all as expected.
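If you want to confirm this for your own users, a quick sketch using standard django.contrib.auth helpers:

from django.contrib.auth import get_user_model
from django.contrib.auth.hashers import make_password

# make_password(None) produces the unusable '!<40 random chars>' form
print(make_password(None))

User = get_user_model()
for user in User.objects.all():
    # has_usable_password() is False for the '!'-prefixed entries
    print(user.get_username(), user.has_usable_password())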
There is no significant difference between User.objects.create_user() and user.set_password(), since the first uses the second.
Basically, passwords are stored as strings in the format <algorithm>$<iterations>$<salt>$<hash>, according to the docs. The differences might come from the PASSWORD_HASHERS settings variable: maybe one password was created with one hasher and the other password with another. As long as you keep those hashers in the variable mentioned above, everything should be fine and users will be able to change their passwords, etc. You can read about it in the little notice after the bcrypt section.
The docs for the django.contrib.auth package might be helpful too. Link.
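For reference, a sketch of that settings variable; these hasher paths are standard ones shipped with Django, but check the default list for your exact version:

# settings.py
PASSWORD_HASHERS = [
    'django.contrib.auth.hashers.PBKDF2PasswordHasher',        # new passwords use the first entry
    'django.contrib.auth.hashers.PBKDF2SHA1PasswordHasher',    # the rest are still accepted
    'django.contrib.auth.hashers.BCryptSHA256PasswordHasher',  # when checking existing hashes
]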
UPDATE:
If you look at the documentation for an older Django version (1.3, for example), you will see that:
Previous Django versions, such as 0.90, used simple MD5 hashes without password salts. For backwards compatibility, those are still supported; they'll be converted automatically to the new style the first time check_password() works correctly for a given user.
So I think the answer might be somewhere around there, but it really depends on how legacy your project is, so you can decide whether this is normal for you. In any case, you can call check_password() to be sure, or just email your users with a "please change your password" notification. There are many factors involved, really.
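If you want to verify a specific stored hash (or trigger the automatic upgrade the docs mention), the standard helpers are enough; the candidate password below is just a placeholder:

from django.contrib.auth.hashers import check_password

stored = 'pbkdf2_sha256$24000$61Rm3LxOPsCA$5kV2bzD32bpXoF6OO5YuyOlr5UHKUPlpNKwcNVn4Bt0='
print(check_password('candidate-password', stored))  # True only if it matches

# or, on a user instance, which also re-hashes old-style hashes on success:
# user.check_password('candidate-password')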
I am currently working on a text-sharing website and I came across the following problem. Each post gets an ID, and I would like to be able to easily access a post by giving its ID as a parameter in the link. But since anyone can simply enter the numbers manually, this feels very insecure. My idea is to calculate a longer unique number from the ID; of course, I also need to be able to turn that number back into the original ID. The ideal would be a solution in Python. Thanks in advance!
Edit: Correct me if I am wrong, but there is no way to reverse a UUID back to the original number?
The first thing that needs to be said is that this is not really insecure. Even if you calculate some longer number, there is still a chance of someone accessing it anyway; imagine someone writing a generator script that tries such numbers. Giving a post an ID and securing it shouldn't be mixed up together.
The best solution would be to add some kind of privileges system or password protection. You can of course use a hash function to make the ID longer if you insist. I'm not sure what exactly the idea behind the website is; do you mean something like Pastebin? Simply add an option for password protection, as I suggested before. Some will use it, some won't.
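If you do just want an opaque, reversible token instead of the raw ID (it obfuscates the number, it does not replace real access control), here is one possible sketch using only the standard library; the SECRET value is a placeholder you would keep out of source control:

import base64
import hashlib
import hmac
import struct

SECRET = b'change-me'  # placeholder key

def id_to_token(post_id):
    # Pack the ID into 8 bytes and append a truncated HMAC so the token
    # can't be forged or altered without knowing SECRET.
    payload = struct.pack('>Q', post_id)
    sig = hmac.new(SECRET, payload, hashlib.sha256).digest()[:8]
    return base64.urlsafe_b64encode(payload + sig).decode().rstrip('=')

def token_to_id(token):
    raw = base64.urlsafe_b64decode(token + '=' * (-len(token) % 4))
    payload, sig = raw[:8], raw[8:]
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()[:8]
    if not hmac.compare_digest(sig, expected):
        raise ValueError('tampered or invalid token')
    return struct.unpack('>Q', payload)[0]

Anyone who has the link can still open it, which is why the password/privilege suggestion above still applies.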
I'm trying to reduce the size of a string like this:
'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpYXQiOjE0NDU0OTk3NDUsImQiOnsiYXV0aF9kYXRhIjoiZm9vIiwib3RoZXJfYXV0aF9kYXRhIjoiYmFyIiwidWlkIjoidW5pcXVlSWQxIn0sInYiOjB9.h6LV3boj0ka2PsyOjZJb8Q48ugiHlEkNksusRGtcUBk'
to something that someone could type in less than 30 seconds, like this:
'aF9kYX'
and be able to turn it back to the original string too. How could I achieve that?
EDIT: I guess I'm not being clear. First of all, I don't know if what I want is even possible.
So, I have my app, which asks for a token to log in, and that token is the JWT above. But it is way too long for someone to type manually. So I assumed there was an algorithm to make this string smaller (compress it) so that it would be easier and faster to type. An example of how I would use such an algorithm:
short_to_big(small_string)  # returns the original JWT
big_to_short(JWT_string)    # returns the smaller string
Stupid simple answer: use a dict that stores the short string as the key and the long one as the value. Then you just have to generate the short string however you like and make sure it's not already in the dict. If you need to persist the key/value pairs, you can use almost any kind of database (SQL, key:value, document, or even a CSV file FWIW).
Oh and if that doesn't solve your problem then you may want to consider giving more context ;)
You need more constraints. A 200-character string contains a lot more information than a 6-character string, so you either need to know a lot more about the original strings (e.g. that they come from some known set of strings, or have a limited character set), or you need to store the original strings somewhere and use the string the user types as a key into a map or similar.
There are lossless compression algorithms, but these depend on knowing some probabilistic information about the string (e.g. that repeated characters are likely) and will typically expand the strings if the probabilities are wrong.
UPDATE (After question clarification and comments suggestion)
You could implement an algorithm that maps this big string to a short representation and store the mapping in a dictionary. The following does not guarantee uniqueness, but it should give you a path to follow.
import random
import string

def long_string_to_short(original_string, length=10):
    # Seed the RNG with the original string so the same input
    # always produces the same short string.
    random.seed(original_string)
    filling_values = string.digits + string.ascii_letters
    short_string = ''.join(random.choice(filling_values) for _ in range(length))
    return short_string
When calling the function you can specify an appropriate length for the short string.
Then you could:
my_mapping_dict = {}
my_long_string = 'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpYXQiOjE0NDU0OTk3NDUsImQiOnsiYXV0aF9kYXRhIjoiZm9vIiwib3RoZXJfYXV0aF9kYXRhIjoiYmFyIiwidWlkIjoidW5pcXVlSWQxIn0sInYiOjB9.h6LV3boj0ka2PsyOjZJb8Q48ugiHlEkNksusRGtcUBk'
short_string = long_string_to_short(my_long_string)
my_mapping_dict[short_string] = my_long_string
OK, so, because I couldn't find a solution for shrinking the string, I gave it a different approach and found a solution.
Now, to clarify why I wanted to log in with the token, here is what I want to do with my app:
In Firebase anyone can create an account, but I don't want that, so I made a group of users who are the only ones allowed to write or read the data.
So in order to create an account, the user has to request a register code (which in reality is a JWT generated by Firebase, so that they have permission to be added to the group I was talking about).
This app is for local use, meaning that only people who live here are going to use it. So, back to the original question: the token is too big for someone to type (as I have said many times), and I wanted to know whether I could shrink it and how. Having no success with that, I tried a different approach: generate the token (from a different program), encrypt it with a random code, and upload it to Firebase. That way I can give the random code to people, they type it into the app, the app retrieves and decrypts the token and authenticates with it, and the user ends up with an account that has the privilege to read or write data.
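A rough sketch of that encrypt-with-a-short-code idea, assuming the third-party cryptography package; the key derivation, salt, and placeholder strings are illustrative only, and the Firebase upload/download step is left out:

import base64
import hashlib
from cryptography.fernet import Fernet

def key_from_code(code):
    # Derive a 32-byte urlsafe key from the short code (PBKDF2 with a
    # fixed, app-wide salt here purely for illustration).
    raw = hashlib.pbkdf2_hmac('sha256', code.encode(), b'app-salt', 100000)
    return base64.urlsafe_b64encode(raw)

jwt_string = 'the-long-firebase-jwt-goes-here'
short_code = 'aF9kYX'  # what the user actually types
token = Fernet(key_from_code(short_code)).encrypt(jwt_string.encode())
# ...store `token` in Firebase; the app later reverses it:
recovered = Fernet(key_from_code(short_code)).decrypt(token).decode()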
Thanks for your responses and sorry if I wasted your time.
I want to add a 'check username available' feature to my signup page using AJAX. I have a few doubts about the way I should implement it.
1. With which event should I register my AJAX requests? We can send the requests when the user focuses out of the 'username' input field (blur event) or as they type (keyup event). Which provides the better user experience?
2. On the server side, a simple way of dealing with requests would be to query my main 'Accounts' database. But this could lead to a lot of requests hitting my database (even more so if we POST on the keyup event). Should I maintain a separate model for registered usernames only and use that to get better results?
3. Is it possible to use Memcache in this case? For example, initializing the cache with every username as a key, updating it as users register, and using a random key to check whether the cache is actually initialized or whether to pass the queries directly to the db.
Answers -
1. Do the check on blur. If you do it on keyup, you will hammer your server with unnecessary queries, annoy the user who is not yet done typing, and likely make the typing lag anyway.
2. If your Account entity is very large, you may want to create a separate AccountName entity, and create a matching entity whenever you create a real Account (but this is probably an unnecessary optimization). When you create the Account (or AccountName), be sure to assign id=name. Then you can do AccountName.get_by_id(name) to quickly see if the name has already been taken, and it will automatically be pulled from memcache if it has been dealt with recently.
3. By default, GAE NDB will automatically populate memcache for you when you put or get entities. If you follow my advice in step 2, things will be very fast and you won't have to mess around with pre-populating memcache.
4. If you are concerned about two people simultaneously requesting the same username, put your create method in a transaction:
from google.appengine.ext import ndb

class Account(ndb.Model):
    # ...your other properties...

    @classmethod
    @ndb.transactional()
    def create_account(cls, name, **other_params):
        acct = cls.get_by_id(name)
        if not acct:
            acct = cls(id=name, **other_params)  # id=name makes the name the key
            acct.put()
        return acct
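To tie this back to the AJAX check itself, a possible GAE handler is sketched below; webapp2 is assumed (matching the self.request style used elsewhere in this thread), the handler name is made up, and Account is the model from the snippet above:

import json
import webapp2

class CheckUsernameHandler(webapp2.RequestHandler):
    def post(self):
        self.response.headers['Content-Type'] = 'application/json'
        name = self.request.POST.get('username', '').strip()[:100]
        if not name:
            self.response.write(json.dumps({'available': False}))
            return
        # get_by_id goes through memcache when NDB has seen the entity recently
        taken = Account.get_by_id(name) is not None
        self.response.write(json.dumps({'available': not taken}))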
I would recommend the blur event of the username field, combined with some sort of inline error/warning display.
I would also suggest maintaining a memcache of registered usernames to reduce DB hits and improve the user experience, although probably not populating it with a warm-up, but instead only as requests are made. This is sometimes called a "Repository" pattern.
BUT, you can only populate the cache with USED usernames - you should not store the "available" usernames here (or if you do, use a much lower timeout).
You should always check directly against the DB/Datastore when actually performing the registration, and ideally in some sort of transactional method, so that you don't have race conditions when multiple people register at once.
BUT, all of this work depends on several things, including how busy your app is and what data storage tech you are using!
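A minimal sketch of that used-username cache on App Engine, using the standard google.appengine.api.memcache client; the key prefix, timeout, and lookup callback are arbitrary illustrative choices:

from google.appengine.api import memcache

USED_PREFIX = 'used-username:'

def username_taken(name, lookup_in_db):
    # lookup_in_db is your real datastore check, e.g.
    # lambda n: Account.get_by_id(n) is not None
    if memcache.get(USED_PREFIX + name):
        return True
    taken = lookup_in_db(name)
    if taken:
        # Only cache positives; 'available' can change at any moment.
        memcache.set(USED_PREFIX + name, True, time=3600)
    return taken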