Migration of Django field with default value to PostgreSQL database - python

https://docs.djangoproject.com/en/1.10/topics/migrations/
Here it says:
"PostgreSQL is the most capable of all the databases here in terms of schema support; the only caveat is that adding columns with default values will cause a full rewrite of the table, for a time proportional to its size.
"For this reason, it’s recommended you always create new columns with null=True, as this way they will be added immediately."
I am asking if I understand this correctly.
From what I understand, I should first create the field with null=True and no default value and migrate it, then give it a default value and migrate again; the values will then be added immediately. Otherwise the whole table would be rewritten, and Django's migration won't do the trick by itself?

It's also mentioned in that same page that:
In addition, MySQL will fully rewrite tables for almost every schema operation and generally takes a time proportional to the number of rows in the table to add or remove columns. On slower hardware this can be worse than a minute per million rows - adding a few columns to a table with just a few million rows could lock your site up for over ten minutes.
and
SQLite has very little built-in schema alteration support, and so Django attempts to emulate it by:
Creating a new table with the new schema
Copying the data across
Dropping the old table
Renaming the new table to match the original name
So in short, what that statement you are referring to above really says is that PostgreSQL exhibits MySQL-like behaviour when adding a new column with a default value.
The approach you are trying would work. Adding a column as nullable means no table rewrite. You can then alter the column to have a default value. However, existing NULLs will continue to be NULL.

The way I understand it, on the second migration the default value will not be written to the existing rows. It will only be written when a new row is created without a value for that field.
I think the warning to use null=True for a new column is only related to performance. If you really want all the existing rows to have the default value, just use default= and accept the performance consequence of a table rewrite.
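The two-step approach described above can be sketched with a self-contained SQLite example (the question is about PostgreSQL, where the same principle applies; the table and column names here are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# An existing table with a couple of rows already in it.
cur.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT)")
cur.executemany("INSERT INTO book (title) VALUES (?)", [("A",), ("B",)])

# Step 1: add the column as nullable -- existing rows simply get NULL.
cur.execute("ALTER TABLE book ADD COLUMN rating INTEGER")
before = cur.execute("SELECT rating FROM book").fetchall()   # [(None,), (None,)]

# Step 2: a default added later only applies to new rows, so the old
# NULLs have to be backfilled by hand.
cur.execute("UPDATE book SET rating = 0 WHERE rating IS NULL")
after = cur.execute("SELECT rating FROM book").fetchall()    # [(0,), (0,)]
```

In a Django migration the backfill step would typically be a RunPython or RunSQL operation between the two schema changes.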

Related

When I delete all the items from my query in Django the item IDs don't reset [duplicate]

I have been working on an offline version of my Django web app and have frequently deleted model instances for a certain ModelX.
I have done this from the admin page and have experienced no issues. The model only has two fields: name and order and no other relationships to other models.
New instances are given the next available pk which makes sense, and when I have deleted all instances, adding a new instance yields a pk=1, which I expect.
Moving the code online to my actual database, I noticed that this is not the case. I needed to change the model instances, so I deleted them all, but to my surprise the primary keys kept incrementing without resetting back to 1.
Going into the database using the Django API I have checked and the old instances are gone, but even adding new instances yields a primary key that picks up where the last deleted instance left off, instead of at 1.
Wondering if anyone knows what might be the issue here.
I wouldn't call it an issue. This is default behaviour for many database systems. Basically, the auto-increment counter for a table is persistent, and deleting entries does not affect the counter. The actual value of the primary key does not affect performance or anything, it only has aesthetic value (if you ever reach the 2 billion limit you'll most likely have other problems to worry about).
If you really want to reset the counter, you can drop and recreate the table:
python manage.py sqlclear <app_name> | python manage.py dbshell
Or, if you need to keep the data from other tables in the app, you can manually reset the counter:
python manage.py dbshell
mysql> ALTER TABLE <table_name> AUTO_INCREMENT = 1;
The most probable reason you see different behaviour in your offline and online apps is that the auto-increment value is only stored in memory, not on disk. It is recalculated as MAX(<column>) + 1 each time the database server is restarted. If the table is empty, the counter is completely reset on a restart. That probably happens very often in your offline environment, and close to never in your online one.
As others have stated, this is entirely the responsibility of the database.
But you should realize that this is the desirable behaviour. An ID uniquely identifies an entity in your database. As such, it should only ever refer to one row. If that row is subsequently deleted, there's no reason why you should want a new row to re-use that ID: if you did that, you'd create a confusion between the now-deleted entity that used to have that ID, and the newly-created one that's reused it. There's no point in doing this and you should not want to do so.
Did you actually drop them from your database or did you delete them using Django? Django won't change AUTO_INCREMENT for your table just by deleting rows from it, so if you want to reset your primary keys, you might have to go into your db and:
ALTER TABLE <my-table> AUTO_INCREMENT = 1;
(This assumes you're using MySQL or similar).
There is no issue; that's the way databases work. Django doesn't have anything to do with generating IDs; it just tells the database to insert a row and gets the ID back in response. The ID starts at 1 for each table and increments every time you insert a row. Deleting rows doesn't cause the ID to go back. You shouldn't usually be concerned with that; all you need to know is that each row has a unique ID.
You can of course change the counter that generates the id for your table with a database command and that depends on the specific database system you're using.
If you are using SQLite you can reset the primary key with the following shell commands:
DELETE FROM your_table;
DELETE FROM sqlite_sequence WHERE name='your_table';
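The persistent-counter behaviour, and the sqlite_sequence reset above, can be demonstrated end to end with the sqlite3 module (the table name here is made up; a table declared with AUTOINCREMENT is assumed, since that is what keeps the counter in sqlite_sequence):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE modelx (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)")
cur.executemany("INSERT INTO modelx (name) VALUES (?)", [("a",), ("b",), ("c",)])

# Deleting every row does not touch the counter ...
cur.execute("DELETE FROM modelx")
cur.execute("INSERT INTO modelx (name) VALUES ('d')")
id_after_delete = cur.lastrowid          # 4, not 1

# ... until the sqlite_sequence entry is cleared as well.
cur.execute("DELETE FROM modelx")
cur.execute("DELETE FROM sqlite_sequence WHERE name='modelx'")
cur.execute("INSERT INTO modelx (name) VALUES ('e')")
id_after_reset = cur.lastrowid           # back to 1
```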
Another solution for PostgreSQL databases is from the UI: select your table, look for the 'sequences' dropdown, open its settings, and adjust the sequence that way.
I'm not sure when this was added, but the following management command will delete all data from all tables and will reset the auto increment counters to 1.
./manage.py sqlflush | psql DATABASE_NAME

sqlalchemy create a column that is autoupdated depending on other columns

I need to create a column in a table that is auto-updated when one or more columns (possibly in another table) are updated, but it should also be possible to edit this column directly (and the value should be kept in SQL unless said other columns are updated, in which case the first logic applies).
I tried column_property, but it seems that it's merely a construction inside Python and doesn't represent an actual column.
I also tried hybrid_property and default; neither accomplished this.
This looks like an insert/update trigger; however, I want to know an "elegant" way to declare it, if that's even possible.
I use the declarative style for tables on Postgres.
I don't make any updates to the SQL outside of SQLAlchemy.
It definitely looks like insert/update triggers. But if I were you, I would encapsulate this logic in Python by using two queries, so it will be clearer.
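The trigger idea can be sketched in a self-contained way with SQLite (the question targets Postgres, where you would instead write a plpgsql function plus CREATE TRIGGER, possibly emitted via SQLAlchemy's DDL/event machinery; the item/price/qty names are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, price REAL, qty INTEGER, total REAL)")

# Recompute `total` whenever price or qty changes; updating `total`
# directly does not fire the trigger, so manual edits are kept.
cur.execute("""
    CREATE TRIGGER item_total AFTER UPDATE OF price, qty ON item
    BEGIN
        UPDATE item SET total = NEW.price * NEW.qty WHERE id = NEW.id;
    END
""")

cur.execute("INSERT INTO item (price, qty, total) VALUES (2.0, 3, 6.0)")
cur.execute("UPDATE item SET total = 99.0 WHERE id = 1")
manual = cur.execute("SELECT total FROM item").fetchone()[0]      # 99.0: edit kept
cur.execute("UPDATE item SET qty = 5 WHERE id = 1")
recomputed = cur.execute("SELECT total FROM item").fetchone()[0]  # 10.0: trigger fired
```

Restricting the trigger to `UPDATE OF price, qty` is what makes direct edits of the column survive, which matches the requirement in the question.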

Are there disadvantages to making all columns except the primary key column in a table a unique index?

I want to avoid making duplicate records, but there are some occasions when updating the record, the values I receive are exactly the same as the record's version. This results in 0 affected rows which is a value I retain to help me determine if I need to insert a new transaction.
I've tried using a SELECT statement to look for the exact transaction, but some fields (out of many) can be NULL, which doesn't bode well: my query strings all use 'field1 = %s' in their WHERE clauses, when I'd need something like 'field1 IS NULL' instead to get an accurate result back.
My last thought is using a unique index on all of the columns except the one for the table's primary key, but I'm not too familiar with using unique indexes. Should I be able to update these records after the fact? Are there risks to consider when implementing this solution?
Or is there another way I can tell whether I have an unchanged transaction or a new one when provided with values to update with?
The language I'm using is Python with mysql.connector
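One caveat before relying on a multi-column unique index: in most databases (MySQL/InnoDB and SQLite included), NULLs compare as distinct, so two rows that differ only in a NULL column are not treated as duplicates. A minimal SQLite sketch of the pitfall (table and column names are made up; the question uses MySQL, which behaves the same way here):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE txn (id INTEGER PRIMARY KEY, acct TEXT, ref TEXT)")
cur.execute("CREATE UNIQUE INDEX txn_uniq ON txn (acct, ref)")

# Rows that differ only in a NULL column are NOT considered duplicates.
cur.execute("INSERT INTO txn (acct, ref) VALUES ('a', NULL)")
cur.execute("INSERT INTO txn (acct, ref) VALUES ('a', NULL)")
null_rows = cur.execute("SELECT COUNT(*) FROM txn").fetchone()[0]  # 2, not 1

# Fully populated duplicates are rejected as expected.
rejected = False
cur.execute("INSERT INTO txn (acct, ref) VALUES ('a', 'r1')")
try:
    cur.execute("INSERT INTO txn (acct, ref) VALUES ('a', 'r1')")
except sqlite3.IntegrityError:
    rejected = True
```

So the unique-index approach only fully prevents duplicates if the indexed columns are NOT NULL (or NULLs are mapped to a sentinel value first).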

Database design, adding an extra column versus converting existing column with a function

Suppose there was a database table with one column, and it's the PK. To make things more specific, this is a Django project and the database is MySQL.
If I needed an additional column with all unique values, should I create a new UniqueField with unique integers, or just write a hash-like function to convert the existing PKs for each existing row (model instance) into a new unique value? The current PK is a varchar/string.
Creating a new column consumes more memory, but I think writing a new function and converting fields frequently has disadvantages also. Any ideas?
Having a string-valued PK should not be a problem in any modern database system. A PK is automatically indexed, so when you perform a look-up with a condition like table1.pk = 'long-string-key', it won't be a full scan of string comparisons but an index look-up. So it's OK to have a string-valued PK, regardless of the length of the key values.
In any case, if you need an additional column with all unique values, then I think you should just add a new column.

South initial migrations are not forced to have a default value?

I see that when you add a column and want to create a schemamigration, the field has to have either null=True or default=something.
What I don't get is that many of the fields that I've written in my models initially (say, before initial schemamigration --init or from a converted_to_south app, I did both) were not run against this check, since I didn't have the null/default error.
Is it normal?
Why is it so? And why is South checking this null/default thing anyway?
If you add a column to a table, which already has some rows populated, then either:
the column is nullable, and the existing rows simply get a null value for the column
the column is not nullable but has a default value, and the existing rows are updated to have that default value for the column
To produce a non-nullable column without a default, you need to add the column in multiple steps. Either:
add the column as nullable, populate the defaults manually, and then mark the column as not-nullable
add the column with a default value, and then remove the default value
These are effectively the same; both will go through and update each row.
I don't know South, but from what you're describing, it is aiming to produce a single DDL statement to add the column, and doesn't have the capability to add it in multiple steps like this. Maybe you can override that behaviour, or maybe you can use two migrations?
By contrast, when you are creating a table, there clearly is no existing data, so you can create non-nullable columns without defaults freely.
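The null/default rule the answer describes can be shown concretely with SQLite (used here only because it is self-contained; the table name is made up, and South-era MySQL/PostgreSQL enforce the same idea):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")
cur.execute("INSERT INTO t DEFAULT VALUES")

# A NOT NULL column with no default is refused outright ...
error = None
try:
    cur.execute("ALTER TABLE t ADD COLUMN a TEXT NOT NULL")
except sqlite3.OperationalError as e:
    error = str(e)

# ... because the database must put *something* into the old rows.
cur.execute("ALTER TABLE t ADD COLUMN b TEXT NOT NULL DEFAULT 'x'")
filled = cur.execute("SELECT b FROM t").fetchall()   # [('x',)]
```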
When you have existing records in your database and you add a column to one of your tables, you will have to tell the database what to put in there; South can't read your mind :-)
So unless you mark the new field null=True or opt for a default value, it will raise an error. If you had an empty database, there are no values to be set, but a model field would still require basic properties. If you look deeper at the field class you're using, you will see Django sets some default values, like max_length and null (depending on the field).
