How to reduce the length of a string? - python

I'm trying to reduce the size of a string like this:
'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpYXQiOjE0NDU0OTk3NDUsImQiOnsiYXV0aF9kYXRhIjoiZm9vIiwib3RoZXJfYXV0aF9kYXRhIjoiYmFyIiwidWlkIjoidW5pcXVlSWQxIn0sInYiOjB9.h6LV3boj0ka2PsyOjZJb8Q48ugiHlEkNksusRGtcUBk'
to something that someone could type in less then 30 seconds like this:
'aF9kYX'
and be able to turn it back to the original string too. How could I achieve that?
EDIT: I guess I'm not being clear, first I don't know if what I want is possible.
So, I have my app which asks for a token to log in, which is that JWT. But it is way too long for someone to manually type. So I supposed there was an algorithm to make this string smaller (compress it) so that it could be easier and faster to type. An example that comes to my mind of how I would use such algorithm is:
short_to_big(small_string) //Returns the original JWT
big_to_short(JWT_string) //Returns the smaller string

Stupid simple answer: use a dict to store the short string as key and the long one as value. Then you just have to generate the short string the way you like and make sure it's not already in the dict. If you need to persist the key/value, you can use almost any kind of database (sql, key:value, document, or even a csv file FWIW).
Oh and if that doesn't solve your problem then you may want to consider giving more context ;)

You need more constraints. A 200 character string contains a lot more information than a 6 character string, so either need to a lot more about the original strings (e.g. that they come from some known set of strings, or have a limited character set) or you need to store the original strings somewhere and use the string the user type as a key to a map or similar.
There are lossless compression algorithms, but these depend on knowing some probabilistic information about the string (e.g. that repeated characters are likely) and will typically expand the strings if the probabilities are wrong.

UPDATE (After question clarification and comments suggestion)
You could implement an algorithm that uniquely maps this big string into a short representation of the string and store this mapping in a dictionary. The following algorithm does not guarantee the uniqueness but should give you some path to follow.
import random
import string
def long_string_to_short(original_string, length=10):
random.seed(original_string)
filling_values = string.digits + string.ascii_letters
short_string = ''.join(random.choice(filling_values) for char_ in xrange(length))
return short_string
When calling the function you can specify an appropriate length for the short string.
Then you could:
my_mapping_dict = {}
my_long_string = 'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpYXQiOjE0NDU0OTk3NDUsImQiOnsiYXV0aF9kYXRhIjoiZm9vIiwib3RoZXJfYXV0aF9kYXRhIjoiYmFyIiwidWlkIjoidW5pcXVlSWQxIn0sInYiOjB9.h6LV3boj0ka2PsyOjZJb8Q48ugiHlEkNksusRGtcUBk'
short_string = long_string_to_short(my_long_string)
my_mapping_dict[short_string] = my_long_string

Ok, so, because I couldn't find a solution for shrinking the string, I tried to give it a different approach, and found a solution.
Now to clarify why I wanted to log in with the token, I'm going to write what I want to do with my app:
In Firebase anyone can create an account, but I don't want that, so for that I made a group of users that were the only ones that could write or read the data.
So in order to create an account, the user would have to request a register code, (Which in reality is a JWT generated from Firebase, so that you have permission to add a user to that group I was talking about).
This app is for local use, meaning that only people that lives here are going to use it. So, back to the original question, the token is too big for someone to type (as I have said many times), and I wanted to know if I could shrink it and how. But without success I tried a different approach, which is to generate the token (from a different program), encrypt it with a random code, and upload it to a firebase, that way I give the random code to people so that users can type it in the app so that it can retrieve and decrypt the token and authenticate with it, so that finally the user has an account that has the privilege to read or write data.
Thanks for your responses and sorry if I wasted your time.

Related

Why is there a difference in format of encrypted passwords in Django

I am using Django 1.97. The encrypted passwords are significantly different (in terms of the format).
Some passwords are of format $$$:
pbkdf2_sha256$24000$61Rm3LxOPsCA$5kV2bzD32bpXoF6OO5YuyOlr5UHKUPlpNKwcNVn4Bt0=
While others are of format :
!9rPYViI1oqrSMfkDCZSDeJxme4juD2niKcyvKdpB
Passwords are set either using User.objects.create_user() or user.set_password(). Is this difference an expected one ?
You'll be fine. You just have some blank passwords in your database.
Going back as far as V0.95, django used the $ separators for delimiting algorithm/salt/hash. These days, django pulls out the algorithm first by looking at what is in front of the first $ and then passes the whole lot to the hasher to decode. This allows for a wider set of formats, including the one for PBKDF2 which adds an extra iterations parameter in this list (as per your first example).
However, it also recognises that some users may not be allowed to login and/or have no password. This is encoded using the second format you've seen. As you can see here:
If password is None then a concatenation of UNUSABLE_PASSWORD_PREFIX and a random string will be returned which disallows logins.
You can also see that the random string is exactly 40 characters long - just like your second example.
In short, then, this is all as expected.
There is no significant difference between User.objects.create_user() and user.set_password() since first uses second.
Basically, passwords are in string with format <algorithm>$<iterations>$<salt>$<hash> according to docs. The differences might come from PASSWORD_HASHERS settings variable. May be one password was created with one hasher and other password with another. But if you'll keep those hashers in variable mentioned above all should be fine, users will able to change it etc. You can read about it in little notice after bcrypt section.
Also docs for django.contrib.auth package might be helpful too. Link.
UPDATE:
If you find documentation of an old django versions (1.3 for example), you will see that
Previous Django versions, such as 0.90, used simple MD5 hashes without password salts. For backwards compatibility, those are still supported; they'll be converted automatically to the new style the first time check_password() works correctly for a given user.
So I think that the answer might be somewhere here. But it really depends on how legacy your project is, so you can decide if it's normal or what. Anyway you can issue check_password() to be sure. Or you can just email your user with "change password please" notification. There are many factors involved really.

How to create a unique key from an ID

I am currently working on a text sharing website and I came across the following problem. Each post gets an ID and I would like to be able to easily access the post this by giving the id in the link as a parameter. But since you can simply enter the numbers manually, it is very insecure. My idea is to calculate a longer unique number from the ID. Of course, the number needs to be brought into its original state. The ideal would be a solution in a python. Thanks in advance!
Edit: Correct me if I am wrong but there is no way to reverse the uuid back to the original number?
First thing that needs to be said is that it's not insecure. Even if you calculate some longer number, there is still a chance to access it anyway. Imagine someone creating a generator script trying such numbers. Giving post an ID and security shouldn't be mixed up together.
The best solution would be to add some kind of privileges system or password protection. You can of course use some hash functions for making the id longer if you insist. Not sure what exactly is the idea behind the website, you mean something like Pastebin? Simply add an option for the password protection as I suggested before. Some might use it, some don't.

django: How do I hash a URL from the database object's primary key?

I'm trying to generate URLs for my database objects. I've read I should not use the primary key for URLs, and a stub is not a good option for this particular model. Based on the advice in that link, I played around with zlib.crc32() in a Python interpreter and found that values often return negative numbers which I don't want in my URLs. Is there a better hash I should be using to generate my URLs?
UPDATE: I ended up using the bitwise XOR masking method suggested by David below, and it works wonderfully. Thanks to everyone for your input.
First, "don't use primary keys in URLs" is only a very weak guideline. If you are using incremental integer IDs and you don't want to reveal those numbers, then you could obfuscate them a little bit. For example, you could use: masked_id = entity.id ^ 0xABCDEFAB and unmasked_id = masked_id ^ 0xABCDEFAB.
Second, the article you linked to is highly suspicious. I would not trust it. First, CRC32 is a one-way hashing function: it's impossible (in general) to take a CRC32 hash and get back the string used to create that hash. You'll notice that he doesn't show you how to look up a Customer given the CRC32 of their pk. Second, the code in the article doesn't even make sense. The zlib.crc32 function expects a byte string, while Customer.id will be an integer.
Third, be careful if you want to use a slug for a URL: if the slug changes, your URLs will also change. This may be okay, but it's something you'll need to consider.

Safe using user input as key_name?

I would like to use a string that was input by the user in a web form as part of a key name:
user_input = self.request.POST.get('foo')
if user_input:
foo = db.get_or_insert(db.Key('Foo', user_input[:100], parent=my_parent))
Is this safe? Or should I do some inexpensive encoding or hash? If yes, which one?
It's safe as long as you don't care about a malicious user filling up your database with junk. get_or_insert won't let them overwrite existing entries, just add new ones.
Make sure you limit it's length (both in the UI and after it's been recieved), even if you do no other validation on it, so at least they can't just give you crazy big keys either to fill up the database quickly or to crash your app.
Edit: You just commented that you do, in fact, verify that it's a reasonable key. In that case, yes, it's safe.
Keep in mind that the user can probably still figure out what key are already in your database, based on how long it takes you to respond to what they've provided, and you still need to make sure they're authorized to see whatever content they request, or limit them to a small number of requests to they can't just brute-force retrieve all the information linked to the keys you're generating.

Python - Why use anything other than uuid4() for unique strings?

I see quit a few implementations of unique string generation for things like uploaded image names, session IDs, et al, and many of them employ the usage of hashes like SHA1, or others.
I'm not questioning the legitimacy of using custom methods like this, but rather just the reason. If I want a unique string, I just say this:
>>> import uuid
>>> uuid.uuid4()
UUID('07033084-5cfd-4812-90a4-e4d24ffb6e3d')
And I'm done with it. I wasn't very trusting before I read up on uuid, so I did this:
>>> import uuid
>>> s = set()
>>> for i in range(5000000): # That's 5 million!
>>> s.add(str(uuid.uuid4()))
...
...
>>> len(s)
5000000
Not one repeater (I wouldn't expect one now considering the odds are like 1.108e+50, but it's comforting to see it in action). You could even half the odds by just making your string by combining 2 uuid4()s.
So, with that said, why do people spend time on random() and other stuff for unique strings, etc? Is there an important security issue or other regarding uuid?
Using a hash to uniquely identify a resource allows you to generate a 'unique' reference from the object. For instance, Git uses SHA hashing to make a unique hash that represents the exact changeset of a single a commit. Since hashing is deterministic, you'll get the same hash for the same file every time.
Two people across the world could make the same change to the same repo independently, and Git would know they made the same change. UUID v1, v2, and v4 can't support that since they have no relation to the file or the file's contents.
Well, sometimes you want collisions. If someone uploads the same exact image twice, maybe you'd rather tell them it's a duplicate rather than just make another copy with a new name.
One possible reason is that you want the unique string to be human-readable. UUIDs just aren't easy to read.
uuids are long, and meaningless (for instance, if you order by uuid, you get a meaningless result).
And, because it's too long, I wouldn't want to put it in a URL or expose it to the user in any shape or form.
In addition to the other answers, hashes are really good for things that should be immutable. The name is unique and can be used to check the integrity of whatever it is attached to at any time.
Also note other kinds of UUID could even be appropriate. For example, if you want your identifier to be orderable, UUID1 is based in part on a timestamp. It's all really about your application requirements...

Categories

Resources