tl;dr: Why does Django's uniqueness check on INSERT require a SELECT query, and can I permanently disable it?
I'm working on heavily optimizing a Django app that writes to a PostgreSQL database. My model has a uuid column as its primary_key; it is the only unique field in the model.
id = models.UUIDField(
    primary_key=True,
    default=uuid.uuid4,
    editable=False,
    null=False,
    blank=False,
    help_text='The unique identifier of the Node.',
)
The issue I'm encountering is that when attempting to save a new item, Django automatically performs a uniqueness check query prior to the insert:
SELECT (1) AS "a" FROM "customer" WHERE "customer"."id" = '1271a1c8-5f6d-4961-b2e9-5e93e450fd4e'::uuid LIMIT 1
This results in an extra round trip to the database. I know the field must be unique, but Django has of course already enforced that at the database level, so an attempt to insert a row with a duplicate value will fail with a database error anyway.
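In other words, relying on the constraint alone should be enough, roughly like this (a sketch; Customer and some_uuid are stand-ins for my actual model and value):

from django.db import IntegrityError

try:
    Customer.objects.create(id=some_uuid)
except IntegrityError:
    # The PRIMARY KEY constraint rejected the duplicate; no pre-SELECT needed.
    ...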
I've implemented a workaround which suppresses the query by adding the following to my model:
def validate_unique(self, exclude=None):
    # Skip the uniqueness check for id; the database constraint enforces it.
    exclude = set(exclude or [])
    exclude.add('id')
    super().validate_unique(exclude=list(exclude))
This ensures that Django never checks uniqueness of the id field. It works: the extra query is gone. I also verified that attempting to re-insert a duplicate UUID still raises an error, with the database as its source.
My question is this: why does Django do this? I'm tempted to prevent Django from ever checking uniqueness, unless the extra round trip to the DB accomplishes some valuable purpose.
Env:
django==2.2.12
psycopg2-binary==2.8.5
In Django, model validation is a distinct step from model saving. It appears that whatever you're doing is triggering validation.
There are a number of good reasons for those to be separate steps. One is that you can express many more constraints in arbitrary Python code than you can with database constraints. Another is that it lets you generate much more descriptive error messages than you would get by trying to parse non-standardized database errors. A third is that sometimes you simply want to know whether something is valid without actually saving it.
By default, Django does not validate models before saving them. Some Django components, though, like the admin (more generally, ModelForms) do trigger validation.
So, you need to figure out why validation is being triggered in your case, and if that's not what you want, prevent it.
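To illustrate the separation, a minimal sketch (Node stands in for the model that owns the UUID field above):

import uuid

node = Node(id=uuid.uuid4())
node.save()        # INSERT only; plain save() does not validate

node.full_clean()  # explicit validation; this is where the uniqueness
                   # SELECT comes from, unless the field is excluded

ModelForms (and therefore the admin) run this model validation, including validate_unique(), during is_valid(), which is why those code paths trigger the extra query.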
Related
Note:
I understand and am well aware of the difference between passing a function as a parameter and invoking a function and passing the result as a parameter. I believe I am passing the function correctly.
Specs
Django 1.11
PostgreSQL 10.4
Scenario:
I have dozens of models in my application, with many existing records. I need to add a random seed field to each of these models, set when a new instance is created. I also want to generate the seed for the existing instances.
My understanding of how Django model defaults and migrations work is that when a new field with a default value is added to a model, Django will update all existing rows with that default.
However, despite the fact that I'm definitely passing a function as the default, and the function produces a random number, Django uses the same random number when updating the existing rows (i.e., it seems that Django calls the function only once and then reuses the return value for every row).
Example
A shortened version of my code:
import random

from django.contrib.auth.models import User
from django.contrib.postgres.fields import JSONField
from django.db import models

def get_random():
    return str(random.random())

class Response(models.Model):
    # various properties
    random = models.CharField(max_length=40, default=get_random)
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    content = JSONField(null=True)
The random field is being added after the model and many instances of it have already been created. A makemigrations command appears to generate the proper migration file, with a migrations.AddField() call, passing in default=get_random as a parameter.
However, after applying the migration, all existing Response instances contain the exact same number in their random field. Creating and saving new instances of the model works as expected (each gets a pseudo-unique random number).
Workaround
An easy workaround is to run a one-time script that does:
for r in Response.objects.all():
    r.random = get_random()
    r.save()
Or override the model's save() method and do a mass save. But I don't think these workarounds should be necessary. It also means that if I want a unique field with a random default, I will need multiple migrations: first add the field with the assigned default, then apply the migration and manually re-initialize the field values, and finally run a second migration to add unique=True.
It seems that if Django applies default values to existing rows during a migration, it should apply them with the same semantics as creating a new instance. Is there any way to force Django to call the function for each row when migrating?
To add a non-null column to an existing table, Django has to use ALTER TABLE ... ADD COLUMN ... DEFAULT <default_value>. That only lets Django call the default function once, which is why you see every row with the same value.
Your workaround is pretty much spot on, except that you can populate the existing rows with unique values in a data migration, so no manual steps are required. The entire procedure for this use case is described in the docs: https://docs.djangoproject.com/en/2.1/howto/writing-migrations/#migrations-that-add-unique-fields
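Roughly, the data migration from that procedure would look like this (a sketch adapted to the Response model; the app label and migration names are assumptions):

import random

from django.db import migrations

def gen_random(apps, schema_editor):
    # Use the historical model so the migration stays valid as models change.
    Response = apps.get_model('myapp', 'Response')
    for row in Response.objects.all():
        row.random = str(random.random())
        row.save(update_fields=['random'])

class Migration(migrations.Migration):
    dependencies = [
        ('myapp', '0002_response_random'),  # the AddField migration
    ]
    operations = [
        migrations.RunPython(gen_random, migrations.RunPython.noop),
    ]

A final AlterField migration can then add unique=True once every row holds a distinct value.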
I have a model with a foreign key that references the username field of auth.User. The original field has a maximum length of 150. But Django generates a foreign key with a maximum length of 30.
In my app's models.py:
class Profile(models.Model):
    user = models.ForeignKey('auth.User', to_field='username')
In django.contrib.auth.models:
username = models.CharField(
    _('username'),
    max_length=150,
Generated SQL:
CREATE TABLE "myapp_profile" (
"id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
"user_id" varchar(30) NOT NULL REFERENCES "auth_user" ("username")
);
This only happens when referencing auth.User.username. If I reference a long field on one of my own models, the foreign key is generated with the full length.
Why is that? How can I overcome it?
Using Django 1.11.4 and Python 3.6.2. I tried PostgreSQL and SQLite and the problem occurs on both.
CLARIFICATION:
From the answers so far I think my question was misunderstood. I am not looking for a way to have long usernames. My problem is that the stock User model that ships with Django has one max_length (150), but when your model refers to it, the foreign key has a shorter max_length of 30. Therefore, if a user registers with a username of 31 characters, I will not be able to create child objects for that user, because the foreign key constraint will be violated. And I need this because I have a REST API whose URLs nest resources under users, which are referred to by username, not ID. For example: /users/<username>/profiles/...
UPDATE:
I think the reason for this behavior is the undocumented swappable property of the User model. It is designed to be replaceable by custom models. However, the configured model must have its definition in the initial migration of the app that defines it, and the migrations code seems to generate references against the initial migration of swappable models. I am using the default User model, and its initial migration sets the username to 30 characters; hence my username FKs are 30 characters long. I am able to work around this with a RunSQL migration that alters the FK column type to varchar(150), but I am in doubt whether it's the right thing to do.
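For reference, that RunSQL workaround might look roughly like this (a sketch; the app label, migration name, and PostgreSQL syntax are assumptions):

from django.db import migrations

class Migration(migrations.Migration):
    dependencies = [
        ('myapp', '0001_initial'),
    ]
    operations = [
        migrations.RunSQL(
            'ALTER TABLE myapp_profile ALTER COLUMN user_id TYPE varchar(150);',
            reverse_sql='ALTER TABLE myapp_profile ALTER COLUMN user_id TYPE varchar(30);',
        ),
    ]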
Using a short identifier is recommended. varchar(30) already allows a very large number, something like 999999999999999999999999999999, and Django always uses the same kind of identifier when it generates them. I don't think you are going to have that many users; if you ever reach that number, you should create another type of identifier. Remember that the length of the user_id field refers to the id of the username, not the string.
You can use the hack described in this SO answer, but be very careful!
Or you can use this package.
However, I think that, as described in this discussion, the best way would be to create a custom User model and do whatever you want there.
Hope it helps!
You must use a custom user model. Taken from the Django docs:
150 characters or fewer. Usernames may contain alphanumeric, _, @, +, . and - characters.
The max_length should be sufficient for many use cases. If you need a longer length, please use a custom user model. If you use MySQL with the utf8mb4 encoding (recommended for proper Unicode support), specify at most max_length=191 because MySQL can only create unique indexes with 191 characters in that case by default.
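A minimal sketch of that advice (the app label and chosen length are my assumptions; fields inherited from an abstract base can be overridden, though doing so replaces the stock username validators):

# accounts/models.py
from django.contrib.auth.models import AbstractUser
from django.db import models

class User(AbstractUser):
    # Override the inherited 150-char field with a longer one.
    username = models.CharField(max_length=191, unique=True)

# settings.py
AUTH_USER_MODEL = 'accounts.User'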
Recently I have been working on a feature that needs to customize the max length of fields on the User model, and I have found a way to meet my requirement.
When I write this code, I can extend max_length:
User._meta.get_field('username').max_length = 100
User._meta.get_field('email').max_length = 100
I still can't understand what Meta is. When I read the Django documentation about model Meta, I only learn how to use it. I need a deeper explanation to really understand what Meta actually means.
Model Meta related links:
https://docs.djangoproject.com/en/1.7/ref/models/options/
https://docs.djangoproject.com/en/1.7/topics/db/models/#meta-options
From docs:
Model metadata is “anything that’s not a field”, such as ordering options (ordering), database table name (db_table), or human-readable singular and plural names (verbose_name and verbose_name_plural). None are required, and adding class Meta to a model is completely optional.
Thus, Meta is just a container class responsible for holding metadata information attached to the model. It defines such things as available permissions, associated database table name, whether the model is abstract or not, singular and plural versions of the name etc.
For the available Meta options, you can take a look at here.
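For example, a small sketch:

from django.db import models

class Person(models.Model):
    name = models.CharField(max_length=64)

    class Meta:
        db_table = 'people'              # table name instead of appname_person
        ordering = ['name']              # default queryset ordering
        verbose_name_plural = 'people'   # human-readable plural name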
As for your question, I would definitely avoid changing max_length at runtime like that. As you know, max_length also creates a database constraint, such as VARCHAR(64), and that constraint cannot be automatically updated to the new value (100) by Django.
Thus, if you want to change max length, make sure you also update the size of the column in the corresponding table column in the database:
For MySQL:
ALTER TABLE auth_user MODIFY username VARCHAR(100);
For PostgreSQL:
ALTER TABLE auth_user ALTER COLUMN username TYPE VARCHAR(100);
I have a form with some Model(Multiple)Choice fields that have so many options that I would like to trim down the available options based on user responses on the front-end, and then populate the select options through AJAX.
I am a little confused as to when Django will query the database in this case, and what are considered the best practices for Django ModelChoice fields that are populated with AJAX data.
Originally, I had been doing things like this:
contact = forms.ModelChoiceField(queryset=aRelatedModel.objects.all())
or a restricted queryset:
contact = forms.ModelChoiceField(queryset=aRelatedModel.objects.filter(somefield=someValue))
So, my question is, when does the DB get queried for ModelChoice options?
The confusion stems from another form I built, where I had a ModelChoiceField whose options could be added to dynamically. In that case, unless I instantiated the form after saving the new option, I would get an error. This makes me think the database is queried on form instantiation. But, given the lazy nature of Django querysets, it would also make sense that the DB is not queried until you iterate over the choices (i.e., when rendering the form options).
So, in this kind of case is there a way to avoid potentially needless DB queries? What is the best practice for ModelChoiceFields that will be populated with AJAX data?
I've seen mentions of:
contact = forms.ModelChoiceField(queryset=aRelatedModel.objects.none())
...but never any explicit explanation on why to use this.
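For context, the usual reason for objects.none() seems to be exactly the AJAX case: render the field with no options so the page load issues no query, then widen the queryset before validation so the submitted value passes the invalid-choice check. A sketch (Contact and OrderForm are made-up names):

from django import forms
from myapp.models import Contact  # hypothetical related model

class OrderForm(forms.Form):
    # Rendered with no options; the browser fills the <select> via AJAX.
    contact = forms.ModelChoiceField(queryset=Contact.objects.none())

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        if self.is_bound:
            # Widen the queryset only for submitted forms, so validation
            # can find the AJAX-chosen value.
            self.fields['contact'].queryset = Contact.objects.all()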
Edit:
In that case, I had a form with
field = forms.ModelChoiceField(queryset=relatedModel.objects.all())
Subsequently in the view, I naively did:
myForm = modelForm(request.POST). This produced an "invalid choice" error if I instantiated the form before first saving the dynamically added option. After saving the option and then calling modelForm(request.POST), I no longer got the error - presumably because the new option was now included in the field's queryset.
I am not sure how that is relevant to the question, however. The question is when a modelForm's queryset is populated with data from the DB.
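To make the timing concrete, here is my understanding of when the database actually gets hit for a field declared with objects.all() (a sketch; ContactForm is an illustrative name):

form = ContactForm()             # no query yet: the queryset is stored unevaluated
html = str(form['contact'])      # rendering iterates the choices -> one query
bound = ContactForm(request.POST)
bound.is_valid()                 # validation does queryset.get() on the
                                 # submitted value -> another query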