I am currently working on a text sharing website and I came across the following problem. Each post gets an ID and I would like to be able to easily access the post this by giving the id in the link as a parameter. But since you can simply enter the numbers manually, it is very insecure. My idea is to calculate a longer unique number from the ID. Of course, the number needs to be brought into its original state. The ideal would be a solution in a python. Thanks in advance!
Edit: Correct me if I am wrong but there is no way to reverse the uuid back to the original number?
First thing that needs to be said is that it's not insecure. Even if you calculate some longer number, there is still a chance to access it anyway. Imagine someone creating a generator script trying such numbers. Giving post an ID and security shouldn't be mixed up together.
The best solution would be to add some kind of privileges system or password protection. You can of course use some hash functions for making the id longer if you insist. Not sure what exactly is the idea behind the website, you mean something like Pastebin? Simply add an option for the password protection as I suggested before. Some might use it, some don't.
Related
My idea is to create a hash of a queryset result. For example, product inventory.
Each update of this stock would generate a hash.
This use would be intended to only request this queryset in the API, when there is a change (example: a new product in invetory).
Example for this use:
no change, same hash - no request to get queryset
there was change, different hash. Then a request will be made.
This would be a feature designed for those who are consuming the data and not for the Django that is serving.
Does this make any sense? I saw that in python there is a way to generate a hash from a tuple, in my case it would be to use the frozenset and generate the hash. I don't know if it's a good idea.
I would comment, but I'm waiting on the 50 rep to be able to do that. It sounds like you're trying to cache results so you aren't querying on data that hasn't been changed. If you're not familiar with caching, the idea is to save hard-to-compute answers in memory for frequently queried endpoints/functions.
For example, if I had a program that calculated the first n digits of pi, I may choose to save a map of [digit count -> value] so that if 10 people asked me for the first thousand, I would only calculate it once. Redis is a popular option for caching, and I believe it exists for Django. It allows you to cache some information, set a time before expiration on it, and then wipe specific parts of that information (to force it to recalculate) every time something specific changes (like a new product in inventory).
Everybody should try writing their own cache at least once, like what you're describing, but the de facto professional option is to use a caching library. Your idea is good, it will definitely work, and you will probably want a dict of [hash->result] for each hash, where result is the information you would send back over your API. If you plan to save data so it persists across multiple program starts, remember Python forces random seeds for hashes, causing inconsistent values. Check out this post for more info.
I'm trying to reduce the size of a string like this:
'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpYXQiOjE0NDU0OTk3NDUsImQiOnsiYXV0aF9kYXRhIjoiZm9vIiwib3RoZXJfYXV0aF9kYXRhIjoiYmFyIiwidWlkIjoidW5pcXVlSWQxIn0sInYiOjB9.h6LV3boj0ka2PsyOjZJb8Q48ugiHlEkNksusRGtcUBk'
to something that someone could type in less then 30 seconds like this:
'aF9kYX'
and be able to turn it back to the original string too. How could I achieve that?
EDIT: I guess I'm not being clear, first I don't know if what I want is possible.
So, I have my app which asks for a token to log in, which is that JWT. But it is way too long for someone to manually type. So I supposed there was an algorithm to make this string smaller (compress it) so that it could be easier and faster to type. An example that comes to my mind of how I would use such algorithm is:
short_to_big(small_string) //Returns the original JWT
big_to_short(JWT_string) //Returns the smaller string
Stupid simple answer: use a dict to store the short string as key and the long one as value. Then you just have to generate the short string the way you like and make sure it's not already in the dict. If you need to persist the key/value, you can use almost any kind of database (sql, key:value, document, or even a csv file FWIW).
Oh and if that doesn't solve your problem then you may want to consider giving more context ;)
You need more constraints. A 200 character string contains a lot more information than a 6 character string, so either need to a lot more about the original strings (e.g. that they come from some known set of strings, or have a limited character set) or you need to store the original strings somewhere and use the string the user type as a key to a map or similar.
There are lossless compression algorithms, but these depend on knowing some probabilistic information about the string (e.g. that repeated characters are likely) and will typically expand the strings if the probabilities are wrong.
UPDATE (After question clarification and comments suggestion)
You could implement an algorithm that uniquely maps this big string into a short representation of the string and store this mapping in a dictionary. The following algorithm does not guarantee the uniqueness but should give you some path to follow.
import random
import string
def long_string_to_short(original_string, length=10):
random.seed(original_string)
filling_values = string.digits + string.ascii_letters
short_string = ''.join(random.choice(filling_values) for char_ in xrange(length))
return short_string
When calling the function you can specify an appropriate length for the short string.
Then you could:
my_mapping_dict = {}
my_long_string = 'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpYXQiOjE0NDU0OTk3NDUsImQiOnsiYXV0aF9kYXRhIjoiZm9vIiwib3RoZXJfYXV0aF9kYXRhIjoiYmFyIiwidWlkIjoidW5pcXVlSWQxIn0sInYiOjB9.h6LV3boj0ka2PsyOjZJb8Q48ugiHlEkNksusRGtcUBk'
short_string = long_string_to_short(my_long_string)
my_mapping_dict[short_string] = my_long_string
Ok, so, because I couldn't find a solution for shrinking the string, I tried to give it a different approach, and found a solution.
Now to clarify why I wanted to log in with the token, I'm going to write what I want to do with my app:
In Firebase anyone can create an account, but I don't want that, so for that I made a group of users that were the only ones that could write or read the data.
So in order to create an account, the user would have to request a register code, (Which in reality is a JWT generated from Firebase, so that you have permission to add a user to that group I was talking about).
This app is for local use, meaning that only people that lives here are going to use it. So, back to the original question, the token is too big for someone to type (as I have said many times), and I wanted to know if I could shrink it and how. But without success I tried a different approach, which is to generate the token (from a different program), encrypt it with a random code, and upload it to a firebase, that way I give the random code to people so that users can type it in the app so that it can retrieve and decrypt the token and authenticate with it, so that finally the user has an account that has the privilege to read or write data.
Thanks for your responses and sorry if I wasted your time.
I apologize if the question is a little vague but I'm trying to find the simplest way to do this.
I have a small group of people, for whom I have written a python script. Now this python script summarizes articles mined from a website (that are unique by an id number) when the user runs it with some parameters. Now each user might choose to "claim" one or more articles, which means that they will be working on it. Thus any future execution of the script should omit using a "claimed" article in its summary.
I need a way to have a globally accessible file, which my script accesses and checks its output against.
I also need to have a way for the user to add multiple id numbers to this global file.
I understand that a rudimentary database might be the best way to go, but is there a simpler way to read and edit files remotely over python? I can host this file on my personal webspace, but I'm not sure of what would be the simplest way to edit and read it since I'm relatively new to python.
The number of users is small and constant so it does not have to be very robust, just needs to work.
Language: Python
Thanks!
I am curious about something I encountered when I was registering on the wakari website. I entered my username which was something like abc.def.ghi and all other information and submitted the form ( or at least tried to submit! ). It threw up an error which said "username must be a valid python variable", so they were obviously doing something in their back-end with usernames as python variables. Would anyone explain to me if this is some sort of design scheme that they are using wherein they store user information as python variables or something like that. Again I apologize since this is not really a specific programming question but this is eating me up and I must know why that happened.
The following is the URL:
https://www.wakari.io/usermgmt/loginorregister
This is pure conjecture. One thing I could see wakiri doing is using the usernames as a module name for your code. That might be interesting. So storing user code as wakiri.<username>. Then the application might be doing an import wakiri.<username> with some interesting stuff in the __init__.py that runs whatever it finds.
Maybe that's it. Or maybe they are storing user code in files on disk. Maybe user code is written out to a file that contains lots of dictionaries that contain code and are named after the username?
Maybe they aren't even using it and just think it is cute to restrict people to valid Python variables.
I'm a Wakari developer, and we've only just caught this question. The short version is that you are pretty safe with a valid UNIX username, and the "error" text should say something using better "plain english" to this effect.
The reason we say the username needs to be a valid Python module name is that we're imagining a day when users could have something like ~/public_python as a place to put directly-shareable code, and then other users could access this via something like from wakari.users import steve. We'd leave it up to you to figure out if you trust user steve enough to import his code directly.
I would like to use a string that was input by the user in a web form as part of a key name:
user_input = self.request.POST.get('foo')
if user_input:
foo = db.get_or_insert(db.Key('Foo', user_input[:100], parent=my_parent))
Is this safe? Or should I do some inexpensive encoding or hash? If yes, which one?
It's safe as long as you don't care about a malicious user filling up your database with junk. get_or_insert won't let them overwrite existing entries, just add new ones.
Make sure you limit it's length (both in the UI and after it's been recieved), even if you do no other validation on it, so at least they can't just give you crazy big keys either to fill up the database quickly or to crash your app.
Edit: You just commented that you do, in fact, verify that it's a reasonable key. In that case, yes, it's safe.
Keep in mind that the user can probably still figure out what key are already in your database, based on how long it takes you to respond to what they've provided, and you still need to make sure they're authorized to see whatever content they request, or limit them to a small number of requests to they can't just brute-force retrieve all the information linked to the keys you're generating.