Dealing with python hash() collision

I've created a program that takes a user's predetermined unique identifier, hashes it, and stores it in a dictionary mapping to the user's name. When I later receive the unique identifier, I rehash it and look up the user's name.
I've run into a problem where one individual's 9-digit unique ID hashes to the same number as somebody else's. This occurred after gathering data for about 40 users.
Is there a common workaround for this? I believe this is different from just using a hashmap, because if I create a bucket for the hashed ID, I won't be able to tell which user it was (the first item in the bucket or the second).
Edit:
id = raw_input()
hashed_id = hash(id)
if not dictionary.has_key(hashed_id):
    name = raw_input()
    dictionary[hashed_id] = name
check_in_user(dictionary[hashed_id])

I have never seen hash() used for this. hash() is meant for data structures, as a shorthand for the entire object, such as keys in the internal implementation of dictionaries. Its values are not guaranteed to be unique.
I would suggest using a UUID (universally unique identifier) for your users instead.
import uuid
uuid.uuid4()
# UUID('d36b850c-2433-42c6-9252-6371ea3d33c2')
You'll be very hard pressed to get a collision out of UUIDs.
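If the 9-digit ID is already unique, another option is to skip hash() and use the ID itself as the dictionary key. A minimal sketch, assuming Python 3; check_in and ask_name are hypothetical names for illustration. The last lines rely on a CPython implementation detail (hash(-1) and hash(-2) are both -2) to show that a dictionary keeps colliding keys apart on its own:

```python
def check_in(names, user_id, ask_name):
    """Return the name stored for user_id, asking for it on first sight."""
    if user_id not in names:
        names[user_id] = ask_name()
    return names[user_id]


names = {}
first = check_in(names, "123456789", lambda: "Alice")
second = check_in(names, "987654321", lambda: "Bob")
# The ID is already known, so the callback is not used this time.
again = check_in(names, "123456789", lambda: "never called")

# CPython implementation detail: hash(-1) and hash(-2) are both -2,
# yet the dictionary still tells the two keys apart by comparing them with ==.
colliding = {-1: "first", -2: "second"}
same_hash = hash(-1) == hash(-2)
```

The dictionary hashes the key internally and resolves collisions by comparing the keys themselves, which is exactly the information that is lost when only hash(id) is stored.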

Related

How to generate unique 8 length number, for account ID for example (Python,Django)

I need to generate a unique account ID for each user (numeric only).
UUID can't solve this problem. Please help!

Here you go
import random
import string
''.join(random.choice(string.digits) for _ in range(8))
Even shorter with Python 3.6+ using random.choices():
import random
import string
''.join(random.choices(string.digits, k=8))
Avoid possible collisions:
Try to create the object with the generated id; if the database raises IntegrityError, generate a new id and try again.
e.g.
import random
import string
from django.db import IntegrityError

def create_obj():
    id = ''.join(random.choices(string.digits, k=8))
    try:
        return MyModel.objects.create(id=id)
    except IntegrityError:
        # id already taken: retry with a fresh one
        return create_obj()
OR
def create_unique_id():
    return ''.join(random.choices(string.digits, k=8))

def create_object():
    id = create_unique_id()
    # get() raises DoesNotExist rather than returning None, so test with exists()
    while MyModel.objects.filter(pk=id).exists():
        id = create_unique_id()
    return MyModel.objects.create(id=id)
Thanks to @WillemVanOnsem for pointing out the chance of generating a duplicate id. The two examples I provided will generate a new id as many times as needed to get a unique one, but as the number of rows in your database grows, finding a unique id takes longer and longer. Once the table holds all 10^8 possible 8-digit ids, no new record can be created and you will be stuck in an infinite loop (or unbounded recursion) while trying to create a new object.
If the stats provided by Willem are correct, I would say the chance of a collision is too high. So I would recommend not creating ids yourself; go with Django's default auto field, or a UUID, which is designed to be unique across space and time.
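To get a feel for how quickly duplicates show up, here is a rough birthday-problem estimate. It is only a sketch, assuming ids are drawn uniformly at random from the 10^8 possible 8-digit strings; collision_probability is a hypothetical helper name, not something from Django:

```python
import math


def collision_probability(n_ids, id_space=10**8):
    """Chance that n_ids uniformly random ids contain at least one duplicate."""
    if n_ids > id_space:
        return 1.0  # pigeonhole: more ids than possible values
    # log P(no duplicate) = sum of log(1 - i / id_space) for i = 0 .. n_ids - 1
    log_no_dup = sum(math.log1p(-i / id_space) for i in range(n_ids))
    return 1.0 - math.exp(log_no_dup)


for n in (1_000, 10_000, 100_000):
    print(n, round(collision_probability(n), 4))
```

Under these assumptions, 10,000 users already give roughly a 39% chance that at least one generated id was a repeat, and by 100,000 users a repeat is practically certain. The retry logic copes with that, but it happens far earlier than the 10^8 limit suggests.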

Assuming you are using MySQL, and given your comment that you avoided the database PK because it starts at 1, 2, ...
Why not just make the PK start at the beginning of your range?
e.g.
ALTER TABLE user AUTO_INCREMENT = 10000000;
You can put this into a custom migration; see manage.py makemigrations --empty.
I presume other databases offer a similar approach.

For a Django app I would suggest using get_random_string from the django.utils.crypto package. Internally it uses Python's secrets.choice, so you will not have to change your code if the secrets interface changes.
from django.utils.crypto import get_random_string

def generate_account_id():
    return get_random_string(8, allowed_chars='0123456789')
Reference: django.utils.crypto

If you work with Python >= 3.6, an alternative is secrets.token_hex(nbytes), which returns a hex string that you then convert to an integer. Note that 4 bytes give a value between 0 and 2**32 - 1, so the result can have up to 10 decimal digits rather than exactly 8. To detect collisions, you can also check whether an instance with that ID already exists in your Django model (as shown in the accepted answer).
Code example:
import secrets
hexstr = secrets.token_hex(4)
your_id = int(hexstr, 16)
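If the ID must be exactly eight digits, a variation on the same idea is to draw a number below 10**8 and zero-pad it. This is a sketch using only the standard library; generate_account_id is a hypothetical helper name, and the result is kept as a string so that leading zeros survive:

```python
import secrets


def generate_account_id(digits=8):
    """Return a random numeric string with exactly `digits` characters."""
    number = secrets.randbelow(10 ** digits)  # 0 <= number < 10**digits
    return str(number).zfill(digits)          # pad with leading zeros


account_id = generate_account_id()
```

Uniqueness is still not guaranteed, so the existence check against the model applies here as well.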

Pythonic way to use a variable as any integer

I am having trouble with the parameter of an SNMP query in a Python script. An SNMP query takes an OID as a parameter. The OID I use here is written in the code below and, if used alone in a query, should return a list of states for the interfaces of the IP addresses I am querying.
What I want is to use that OID with a variable appended to it in order to get one precise piece of information (if I use the OID alone, I only get a list of things, which would complicate my problem).
The query goes like this:
oid = "1.3.6.1.4.1.2011.5.25.119.1.1.3.1.2."
variable = "84.79.84.79"
query = session.get(oid + variable)
Here, this query returns a corrupted SNMPObject because, when the device I am querying was configured, another number was added between these two elements of the parameter, for reasons that do not matter here.
Below is a screenshot showing some examples of an SNMP request that takes only the OID above as a parameter, without the variable appended. You can see that my variable varies, and so does the highlighted additional number:
Basically, what I am looking for here is the response, but unfortunately I cannot predict what that "random" number will be for each IP address I am querying.
I could use a loop that tries 20 or 50 queries and keeps only the response of the one that worked, but it's ugly. What would be better is some built-in function or library that would just tell the query:
"SNMP query on that OID, with any integer appended to it, and with my variable appended to that".
I definitely don't want to generate a random int, as the number is already generated in the configuration of the device I am querying. I just want to avoid looping just to get a proper response to a precise query.
I hope that was clear enough.

Something like this should work:
from random import randint
variable = "84.79.84.79"
numbers = "1.3.6.1.4.1.2011.5.25.119.1.1.3.1.2"
# join the base OID, a guessed integer, and the variable with dots
query = session.get('.'.join([numbers, str(randint(1, 100)), variable]))
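An alternative to guessing the number is to fetch the whole subtree once (the query with the bare OID that the question describes) and pick the entry whose OID ends with the variable. The sketch below only shows the matching step on plain (oid, value) pairs. How those pairs are obtained depends on the SNMP library, and match_by_suffix plus the sample rows and their values are invented for illustration:

```python
def match_by_suffix(rows, base_oid, suffix):
    """Return (index, value) pairs for OIDs shaped <base_oid>.<integer>.<suffix>."""
    prefix = base_oid.strip(".") + "."
    tail = "." + suffix
    hits = []
    for oid, value in rows:
        oid = oid.lstrip(".")
        if oid.startswith(prefix) and oid.endswith(tail):
            # whatever sits between the base OID and the suffix
            middle = oid[len(prefix):len(oid) - len(tail)]
            if middle.isdigit():
                hits.append((int(middle), value))
    return hits


# Invented sample data standing in for the result of querying the bare OID.
rows = [
    ("1.3.6.1.4.1.2011.5.25.119.1.1.3.1.2.7.84.79.84.79", "up"),
    ("1.3.6.1.4.1.2011.5.25.119.1.1.3.1.2.9.10.0.0.1", "down"),
]
matches = match_by_suffix(rows, "1.3.6.1.4.1.2011.5.25.119.1.1.3.1.2.", "84.79.84.79")
print(matches)  # [(7, 'up')]
```

This still reads the full list, but it is a single request rather than a loop of 20 or 50 guesses.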

Implementing HWID checking system for Python scripts?

Let's say I was selling software and didn't want it to be leaked (of course Python isn't the best choice, since all the code is open, but let's just go with it). I would want to get a unique ID that only the user's PC has and store it in a list in my script. Then, every time someone executed the script, it would iterate through the list and check whether that unique ID matches one from the list.
Would this even be possible and if so how would one implement it in Python?

Python has a unique ID library, uuid: https://docs.python.org/3.5/library/uuid.html
import uuid
# Create a uuid
customer_id = str(uuid.uuid4())
software_ids = [customer_id] # Store in a safe secure place
can_run = customer_id in software_ids
print(can_run)
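Note that uuid.uuid4() is random, so it produces a different value on every run. For something tied to the machine, the same module offers uuid.getnode(), which returns the hardware (MAC) address as an integer. A minimal sketch, assuming that address is an acceptable fingerprint; machine_fingerprint and ALLOWED_FINGERPRINTS are hypothetical names, and the allow-list here simply contains the current machine:

```python
import hashlib
import uuid


def machine_fingerprint():
    """Hash the hardware address reported by uuid.getnode()."""
    node = uuid.getnode()  # 48-bit integer, normally the MAC address
    return hashlib.sha256(str(node).encode("ascii")).hexdigest()


# Hypothetical allow-list; in practice it would hold your customers' fingerprints.
ALLOWED_FINGERPRINTS = {machine_fingerprint()}


def can_run():
    return machine_fingerprint() in ALLOWED_FINGERPRINTS


print(can_run())
```

Bear in mind that getnode() falls back to a random number when no hardware address can be found, and that MAC addresses can be changed, so this is a deterrent rather than real protection.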

Fun with GAE: using key_name as PK?

I want to insert new entities programmatically as well as manually. For this I was thinking about using key_name to uniquely identify an entity.
The problem is that I don't know how to get the model to generate a new unique key name when I create the entity.
On the other hand, I cannot create the ID (which is unique across data store) manually.
How can I do "create unique key name if provided value is None"?
Thanks for your help!

If you really need a string id (as opposed to an automatically assigned integer id), you could use a random string generator, or unique id generator like uuid.uuid4.
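The "create a unique key name if the provided value is None" part could look roughly like this. It is a sketch using only the standard library; resolve_key_name is a hypothetical helper, and the letter prefix is an assumption added in case key names that begin with a digit are not accepted:

```python
import uuid


def resolve_key_name(provided=None):
    """Return the given key name, or a freshly generated one if it is None."""
    if provided is not None:
        return provided
    # "k" prefix is an assumption so the name never starts with a digit
    return "k" + uuid.uuid4().hex


manual = resolve_key_name("bob")
generated = resolve_key_name()
```

The result would then be passed as key_name when the entity is created.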

I don't really understand your question. If you want an automatically-generated key name, just leave out the key when you instantiate the object - one will be automatically assigned when you call put().

If most of the time you want your entities to have automatically assigned ids, then just go around creating your entities (without passing a key or key_name), the key will be auto-assigned and your entities will have .key().id() available.
If sometimes you need to assign the numeric ids manually, then you can reserve a block of ids so that AppEngine will never auto-assign them, and then you can use an id from this reserved range whenever you want to assign an id to a known entity. To assign an id to an entity you create a Key for that entity:
# reserve ids 4000 to 5000 for manual use on the Customer entity
db.allocate_id_range(db.Key.from_path('Customer', 1), 4000, 5000)
# manually create a customer with id 4501
bob = Customer(key=db.Key.from_path('Customer', 4501), name='bob')
bob.put()

How to check if key exists in datastore without returning the object

I want to be able to check if a key_name for my model exists in the datastore.
My code goes:
t = MyModel.get_by_key_name(c)
if t is None:
    pass  # key_name does not exist
I don't need the object, so is there a way (which would be faster and cost less resource) to check if the object exist without returning it? I only know the key name, not the key.

You can't avoid get_by_key_name() or key-related equivalents to check if a key exists. Your code is fine.

The API documents Model.all(keys_only=False); when keys_only is set to True, the query returns only the keys instead of the full entities.
Look at the query that is fired for this; you can then write a similar query for just your object and see whether any row is fetched.
