How to keep multiple machines synced with South and Git


So, although I enjoy South, I've had constant issues with this particular workflow:
migrate a few times on machine A
periodically push changes to Git
after a long period, return to machine B
pull from Git and migrations throw various errors for machine B
These are usually "table already exists" errors.
Now I've read through numerous blog posts and stack questions, and frankly, there doesn't seem to be a clear answer on how to properly check in migration files (and whether you should at all) and how to really integrate South with Git.
What I'm looking for is a detailed run-through of how to use Git and South properly together, and to show what the workflow would be like between two machines.
Currently, what I'm having to do, is after a while completely clear out the migration folders and start from scratch. This doesn't seem like a good way to handle things.

I'd love to know where the doubt has arisen about committing South migration files. I certainly wasn't aware of any suggestion that you wouldn't.
With your workflow you don't specify whether the machines A and B are using the same or different databases. If your code is going to be significantly different between the two machines then they should use different databases. If the database schema gets ahead of the code then you will get errors. Obviously the schema cannot get behind the code because you should always run migrate after a code update.
My workflow is as follows:
A: create schema migrations and apply as they are created.
A: add schema migration files to subversion and commit
B: svn up
B: python manage.py migrate
B: continue coding!
Because migration files can contain code that translates data in the database you shouldn't delete the migrations because you will lose that code. I have a three person development team who have created 80+ migrations and not encountered any problems of the form you describe.

The problem with South in a team workflow appears when two people create migrations without syncing with each other.
Imagine that we have a repo. Persons A and B clone it, each changes some model, creates a migration, and pushes it all back. We will then have two migrations with the same number in the repo.
South will complain if you try to migrate with such a history:
Inconsistent migration history
The following options are available:
--merge: will just attempt the migration ignoring any potential dependency conflicts.
As stated in south docs http://south.readthedocs.org/en/latest/tutorial/part5.html you could try to use --merge option and south will try to merge migrations. It will fail if conflicting migrations changed same model(s).
./manage.py schemamigration --auto --merge appname
So the main rule for a team is: only one developer should change a given model at a time. If somebody has started changing a model, nobody else should touch it until their migration files are up to date.
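To spot the situation described above before South (or Django) complains, one option is to scan a migrations directory for numeric prefixes that are used more than once. This is only a minimal sketch using the standard library; the file names and the `find_number_collisions` helper are hypothetical and not part of South.

```python
import re
from collections import defaultdict

# Matches names such as "0003_auto.py" and captures the numeric prefix.
MIGRATION_RE = re.compile(r"^(\d{4})_.+\.py$")


def find_number_collisions(filenames):
    """Return {prefix: [filenames]} for prefixes used by more than one file."""
    by_prefix = defaultdict(list)
    for name in filenames:
        match = MIGRATION_RE.match(name)
        if match:  # skips __init__.py and anything else that is not numbered
            by_prefix[match.group(1)].append(name)
    return {p: sorted(names) for p, names in by_prefix.items() if len(names) > 1}


# Hypothetical contents of appname/migrations/ after both developers have pushed.
files = [
    "__init__.py",
    "0001_initial.py",
    "0002_add_email.py",     # created by developer A
    "0002_add_nickname.py",  # created by developer B
]
collisions = find_number_collisions(files)
print(collisions)
```

A non-empty result means two people migrated from the same starting point, which is the cue to coordinate before anyone runs migrate.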
Rules for team workflow with south and multiple git branches:
Before changing a model, double-check whether somebody else is already changing it
Notify other team members about your changes as soon as possible
Sync your migrations directories as soon as possible
Also from south docs:
when you pull in someone else’s model changes complete with their own migration, you’ll need to make a new empty migration that has the changes from both branches of development frozen in (if you’ve used mercurial, this is equivalent to a merge commit). To do so, simply run:
./manage.py schemamigration --empty appname merge_models
Here merge_models is just the name of the new migration.
Rules for team workflow with south and single git branch:
If all your team members commit to single branch, then best strategy will be to make model changes first, make migration and push it as soon as possible. Then work on other code.
These articles may also be of interest:
http://andrewingram.net/2012/dec/common-pitfalls-django-south/
http://anthony-tresontani.github.io/Django/2013/03/15/south-workflow/

Related

Django: Best way to merge migrations conflicts

I'm currently working on a dev branch and I will need to merge it to master one day. I have up to 20 migration files on my dev branch and about the same number on master at the moment. I needed to make migrations on both branches, which will result in migrations having the same prefix (e.g. 0003_auto).
In other words, if you have migration files generated by makemigrations with the same prefix, what is the best and safest way of handling this?
Here are two ways I have figured myself (maybe entirely wrong):
Deleting all migrations files, merge the code and then running a fresh makemigrations and migrate which will result in only one migration file.
Using the --merge flag to let django make the merge:
makemigrations --merge
Now, knowing all this I'd like to know what is the best way of handling this. In general, what should I use that will correctly merge conflicts and get me a fresh version of my project with every model updates.
EDIT
I think providing a step-by-step solution would be ideal for me and future users, since there is a ton of information on the subject but none of it seems concise and clear.

From the Django docs:
Because migrations are stored in version control, you’ll occasionally come across situations where you and another developer have both committed a migration to the same app at the same time, resulting in two migrations with the same number.
Don’t worry - the numbers are just there for developers’ reference, Django just cares that each migration has a different name [emphasis added].

The simplest way to do this without any worry is this:
Revert to a stable point (before conflicts):
python manage.py migrate usersmaster 0021_signup_status
Delete new migration files.
Re-make migrations:
python manage.py makemigrations

When you are ready, you should merge from master into your development branch. At that time you should fix all conflicts, your migrations should go after master's migrations, and after all of that your database should look the way you want it to.
Since that process takes time and is quite painful, most people prefer short-lived development branches. That way you need to deal with only one or two migration files at a time.

You can resolve migration errors using django-migration-fixer
Fixing migrations on your dev branch can be done using
$ git checkout [dev-branch]
$ git merge [main/master]
Follow the installation instructions here
Run
$ python manage.py makemigrations --fix -b [main/master]
commit the changes and push to the remote branch
$ git add .
$ git commit -am ...
$ git push ...

Why there is need to push django migrations to version control system

It is common practice for people working on a Django project to push migrations to the version control system along with the rest of the code.
My question is: why is this practice so common? Why not just push the updated models and let everyone generate migrations locally? This approach could also reduce the effort of resolving migration conflicts.

If you didn't commit them to a VCS then what would happen is people would make potentially conflicting changes to the model.
When finally ready to deploy, you would still need Django to make new migrations that would then merge everybody's changes together. And this just creates an additional unnecessary step that can introduce bugs.
You also are assuming everybody will always be able to work on an up to date version of the code which isn't always possible when you start working on branches that are not ready to be merged into mainline.

Migrations synchronize the state of your database with the state of your code. If you don't check in the migrations into version control, you lose the intermediate steps. You won't be able to go back in the version control history and just run the code, as the database won't match the models at that point in time.
Migrations, like any code, should be tested, at the very least on a basic level. Even though they are auto-generated, that's not a guarantee that they will work 100% of the time. So the safe path is to create the migrations in your development environment, test them, and then push them to the production environment to apply them there.

Firstly, migrations in version control allows you to run them in production.
Secondly, migrations are not always automatically generated. For example, if you add a new field to a model, you might write a migration to populate the field. That migration cannot be re-created from the models. If that migration is not in version control, then no-one else will be able to run it.

Can I delete the django migration files inside migrations directory

I personally like Django for its MVC ideals. But when I run migrations in Django 1.7, every migration I make is stored inside the migrations directory. If I delete those files, migrating throws an error.
I tested it like this: I created a new Django project and initialized a Git repo. I made 3-4 model changes, which resulted in 3-4 migration files under the migrations directory. I then deleted the oldest migration files (the 1st and 2nd) and tried to run
python manage.py makemigrations
which caused an error like "migration files not found". Later I did a git stash, which restored the deleted files. I then ran the same command again and it worked fine.
My question is: if a person makes some 50 changes to the DB during development, all the migration files are stored in the migrations directory. Is it possible to delete those files and keep making changes to the DB without any interruption?

The answer is "it depends".
If you are working against a production DB, or some DB that you can't periodically blow away for whatever reason, then you absolutely want to keep around the migration files that you've applied to your DB. They should be checked into source control with the rest of your code.
Now, for a situation like yours, the easiest way to discard your 50 migrations would be to just blow away the DB (and its 50 migrations) and start from scratch given your current models. It's oftentimes a good idea to do this periodically as you evolve your models during development.
It's OK to blow away your migrations when you blow away your DB because syncdb will build a blank DB using your current models. It'll then optionally populate the DB using any initial fixtures. Conceptually, there is no longer anything that you've migrated from at such a point, so you don't need to keep around your old migrations for your old DB. They are no longer relevant.
It's not usually good to delete migration files that have been applied to your DB unless you are either 1) blowing away the DB altogether, or 2) reverting the migrations first.
You might also appreciate knowing that when you apply migrations to a DB, those migrations are also recorded in a special table in the DB itself. That's why things go haywire when you just delete the migration files. They have to stay in sync with the migration table.
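The "stay in sync" point can be illustrated by comparing what the database has recorded as applied with the files that actually exist. The sketch below uses an in-memory SQLite database with a simplified stand-in for Django's django_migrations table; the app name, migration names, and the set of files on disk are all made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for Django's bookkeeping table (the real one also has an id).
conn.execute("CREATE TABLE django_migrations (app TEXT, name TEXT, applied TEXT)")
conn.executemany(
    "INSERT INTO django_migrations VALUES (?, ?, datetime('now'))",
    [("blog", "0001_initial"), ("blog", "0002_add_slug"), ("blog", "0003_add_tags")],
)

# Hypothetical state of blog/migrations/ after 0001 and 0002 were deleted by hand.
files_on_disk = {"0003_add_tags"}

applied = {
    row[0]
    for row in conn.execute(
        "SELECT name FROM django_migrations WHERE app = ?", ("blog",)
    )
}

missing_files = sorted(applied - files_on_disk)  # recorded as applied, file is gone
unapplied = sorted(files_on_disk - applied)      # file exists, never applied
print(missing_files, unapplied)
```

Anything in `missing_files` is history the database still remembers but the code can no longer describe, which is the mismatch that makes things go wrong.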

The answer is "Do not delete migration files".
To understand why we shouldn't delete migration files, you need to understand how migration works in frameworks.
Migration files are the history of your database. Each migration file builds on the migration files created before it, so deleting migration files means losing your history. This history is also recorded in the django_migrations table in your database. If you delete migration files, you will get dependency errors. So don't throw away your history by deleting your migration files.

If you want to keep your DB, but decrease the number of migration files, one option is squashing the migrations into one (or few, if complex dependencies) migration.
From the official documentation:
You are encouraged to make migrations freely and not worry about how many you have; the migration code is optimized to deal with hundreds at a time without much slowdown. However, eventually you will want to move back from having several hundred migrations to just a few, and that’s where squashing comes in.
Before squashing, you should be aware that "model interdependencies in Django can get very complex, and squashing may result in migrations that do not run", and therefore manual work may be needed.
For detailed information about how to make the squashing, refer to the docs: https://docs.djangoproject.com/en/dev/topics/migrations/#squashing-migrations

If models match database it is safe to delete migration files.
Currently, with Django 3 I can safely remove the migrations directory, then run python manage.py makemigrations myapp and python manage.py migrate. After that I have 0001_initial.py migration file and my production database is intact. This works when models already match database.

In my opinion, it would be a bad idea. You can always roll back migrations if you make a mistake. Also, as migrations grow too large, you can also "squash" them. I learned about this from an article written by DoorDash.
You are encouraged to make migrations freely and not worry about how many you have; the migration code is optimized to deal with hundreds at a time without much slowdown. However, eventually you will want to move back from having several hundred migrations to just a few, and that’s where squashing comes in.
Squashing migrations: https://docs.djangoproject.com/en/3.2/topics/migrations/#squashing-migrations
DoorDash article: https://doordash.engineering/2017/05/15/tips-for-building-high-quality-django-apps-at-scale/

It probably isn't a good idea (apparently), but if you are going to do it...
do not remove the __init__.py files.
In *nix:
cd [your project directory]
find . -path "*/migrations/[0-9][0-9][0-9][0-9]_*.py" -delete
find . -path "*/migrations/*.pyc" -delete
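For platforms without find, roughly the same cleanup can be sketched in Python with pathlib. This is an assumption-laden illustration rather than a drop-in replacement: the helper names are hypothetical, it only looks at files whose direct parent directory is called migrations, it leaves .pyc files alone, and it defaults to a dry run so nothing is deleted by accident.

```python
import re
import tempfile
from pathlib import Path

# Same shape as the find pattern above: four digits, underscore, anything, .py
NUMBERED = re.compile(r"^\d{4}_.*\.py$")


def numbered_migrations(project_root):
    """List numbered migration files that sit directly in a migrations/ directory."""
    root = Path(project_root)
    return sorted(
        path
        for path in root.rglob("*.py")
        if path.parent.name == "migrations" and NUMBERED.match(path.name)
    )


def delete_migrations(project_root, dry_run=True):
    """Delete the files found above; with dry_run=True only report them."""
    targets = numbered_migrations(project_root)
    if not dry_run:
        for path in targets:
            path.unlink()
    return targets


# Demonstration on a throwaway directory (hypothetical app name "blog").
with tempfile.TemporaryDirectory() as tmp:
    migrations = Path(tmp) / "blog" / "migrations"
    migrations.mkdir(parents=True)
    for name in ("__init__.py", "0001_initial.py", "0002_add_slug.py"):
        (migrations / name).touch()

    would_delete = [p.name for p in delete_migrations(tmp)]  # dry run
    delete_migrations(tmp, dry_run=False)                    # real run
    remaining = sorted(p.name for p in migrations.iterdir())

print(would_delete, remaining)
```

Note that __init__.py never matches the numbered pattern, so the migrations package itself survives.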

Should I be adding the Django migration files in the .gitignore file?

I've recently been getting a lot of git issues due to migration conflicts and was wondering if I should be marking migration files as ignore.
If so, how would I go about adding all of the migrations that I have in my apps, and adding them to the .gitignore file?

Quoting from the Django migrations documentation:
The migration files for each app live in a “migrations” directory inside of that app, and are designed to be committed to, and distributed as part of, its codebase. You should be making them once on your development machine and then running the same migrations on your colleagues’ machines, your staging machines, and eventually your production machines.
If you follow this process, you shouldn't be getting any merge conflicts in the migration files.
When merging version control branches, you may still encounter a situation where you have multiple migrations based on the same parent migration, e.g. if two different developers introduced a migration concurrently. One way of resolving this situation is to introduce a merge migration. Often this can be done automatically with the command
./manage.py makemigrations --merge
which will introduce a new migration that depends on all current head migrations. Of course this only works when there is no conflict between the head migrations, in which case you will have to resolve the problem manually.
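The idea of "head migrations" can be made concrete with a toy dependency graph. The following is a sketch of the concept only, not Django's actual implementation: the migration names are hypothetical, and `find_heads` is a helper written for this example.

```python
def find_heads(dependencies):
    """Return migrations that no other migration depends on (the graph's leaves)."""
    depended_on = {parent for parents in dependencies.values() for parent in parents}
    return sorted(name for name in dependencies if name not in depended_on)


# Hypothetical history: two developers both branched from 0002_profile.
graph = {
    "0001_initial": [],
    "0002_profile": ["0001_initial"],
    "0003_add_email": ["0002_profile"],
    "0003_add_nickname": ["0002_profile"],
}
heads_before = find_heads(graph)

# A merge migration changes no schema itself; it only depends on every head.
graph["0004_merge"] = list(heads_before)
heads_after = find_heads(graph)
print(heads_before, heads_after)
```

Two heads before the merge and one after is the whole point: the history has a single tip again, so the next migration has an unambiguous parent.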
Given that some people here suggested that you shouldn't commit your migrations to version control, I'd like to expand on the reasons why you actually should do so.
First, you need a record of the migrations applied to your production systems. If you deploy changes to production and want to migrate the database, you need a description of the current state. You can create a separate backup of the migrations applied to each production database, but this seems unnecessarily cumbersome.
Second, migrations often contain custom, handwritten code. It's not always possible to automatically generate them with ./manage.py makemigrations.
Third, migrations should be included in code review. They are significant changes to your production system, and there are lots of things that can go wrong with them.
So in short, if you care about your production data, please check your migrations into version control.

You can follow the below process.
You can run makemigrations locally and this creates the migration file. Commit this new migration file to repo.
In my opinion you should not run makemigrations in production at all. You can run migrate in production and you will see the migrations are applied from the migration file that you committed from local. This way you can avoid all conflicts.
IN LOCAL ENV, to create the migration files,
python manage.py makemigrations
python manage.py migrate
Now commit these newly created files, something like below.
git add app/migrations/...
git commit -m 'add migration files' app/migrations/...
IN PRODUCTION ENV, run only the below command.
python manage.py migrate

Quote from the 2022 docs, Django 4.0. (two separate commands = makemigrations and migrate)
The reason that there are separate commands to make and apply
migrations is because you’ll commit migrations to your version control
system and ship them with your app; they not only make your
development easier, they’re also useable by other developers and in
production.
https://docs.djangoproject.com/en/4.0/intro/tutorial02/

TL;DR: commit migrations, resolve migration conflicts, adjust your git workflow.
Feels like you'd need to adjust your git workflow, instead of ignoring conflicts.
Ideally, every new feature is developed in a different branch, and merged back with a pull request.
PRs cannot be merged if there's a conflict, so whoever needs to merge a feature has to resolve the conflicts first, migrations included. This might need coordination between different teams.
It is important though to commit migration files! If a conflict arises, Django might even help you solve those conflicts ;)

I can't imagine why you would be getting conflicts, unless you're editing the migrations somehow? That usually ends badly - if someone misses some intermediate commits then they won't be upgrading from the correct version, and their copy of the database will be corrupted.
The process that I follow is pretty simple - whenever you change the models for an app, you also commit a migration, and then that migration doesn't change - if you need something different in the model, then you change the model and commit a new migration alongside your changes.
In greenfield projects, you can often delete the migrations and start over from scratch with a 0001_ migration when you release, but if you have production code, then you can't (though you can squash migrations down into one).

The solution usually used is that, before anything is merged into master, the developer must pull any remote changes. If there's a conflict in migration versions, he should rename his local migration to N+1 (the remote one has been run by other devs and, potentially, in production). You then need to edit the file and change the dependencies to the latest remote version.
During development it might be okay to just not commit migrations (don't add an ignore though, just don't add them). But once you've gone into production, you'll need them in order to keep the schema in sync with model changes.
This works for Django migrations, as well as other similar apps (sqlalchemy+alembic, RoR, etc).
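The renaming step for a local migration that collides with a remote one can be sketched as follows. This only computes the new name and the new parent; editing the dependencies inside the migration file still has to be done by hand. The `renumber_after_remote` helper and all migration names are hypothetical.

```python
import re

NUMBERED = re.compile(r"^(\d{4})_(.+)$")


def renumber_after_remote(local_name, remote_names):
    """Return (new_name, new_parent) for a local migration that collides with remote.

    The local migration moves to N+1, where N is the highest remote number,
    and its parent becomes the latest remote migration.
    """
    latest_remote = max(remote_names, key=lambda n: int(NUMBERED.match(n).group(1)))
    next_number = int(NUMBERED.match(latest_remote).group(1)) + 1
    suffix = NUMBERED.match(local_name).group(2)
    return f"{next_number:04d}_{suffix}", latest_remote


# Hypothetical state after pulling: remote already has its own 0003.
remote = ["0001_initial", "0002_add_email", "0003_add_index"]
new_name, parent = renumber_after_remote("0003_add_nickname", remote)
print(new_name, parent)
```

Here the local 0003_add_nickname would be renamed to follow the remote 0003_add_index, and its dependency would point at that remote migration.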

Gitignore the migrations, if You have separate DBs for Development, Staging and Production environment. For dev. purposes You can use local sqlite DB and play with migrations locally.
I would recommend You to create four additional branches:
Master - Clean fresh code without migrations. Nobody is connected to this branch. Used for code reviews only
Development - daily development. Push/pull accepted. Each developer is working on sqlite DB
Cloud_DEV_env - remote cloud/server DEV environment. Pull only. Keep migrations locally on machine, which is used for the code deployment and remote migrations of Dev database
Cloud_STAG_env - remote cloud/server STAG environment. Pull only. Keep migrations locally on machine, which is used for the code deployment and remote migrations of Stag database
Cloud_PROD_env - remote cloud/server PROD environment. Pull only. Keep migrations locally on machine, which is used for the code deployment and remote migrations of Prod database
Notes:
2, 3, 4 - migrations can be kept in repos but there should be strict rules of pull requests merging, so we decided to find a person, responsible for deployments, so the only guy who has all the migration files - our deploy-er. He keeps the remote DB migrations each time we have any changes in Models.

You should think of migrations as a version control system for your database schema. makemigrations is responsible for packaging up your model changes into individual migration files - analogous to commits - and migrate is responsible for applying those to your database.
The migration files for each app live in a “migrations” directory inside of that app, and are designed to be committed to, and distributed as part of, its codebase. You should be making them once on your development machine and then running the same migrations on your colleagues’ machines, your staging machines, and eventually your production machines.
golden rule : Make once on dev and migrate on all

Having a bunch of migration files in git is messy. There is only one file in the migrations folder that you should not ignore: the __init__.py file. If you ignore it, Python will no longer look for submodules inside the directory, so any attempt to import the modules will fail. So the question should be: how do you ignore all migration files except __init__.py?
The solution is:
Add '0*.py' to .gitignore files and it does the job perfectly.
Hope this helps someone.
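To see which files a pattern like '0*.py' would catch, the match can be approximated in Python with fnmatch. This is an approximation under the assumption that a slash-free .gitignore pattern is compared against each file's base name; the file list is hypothetical.

```python
from fnmatch import fnmatchcase

PATTERN = "0*.py"

# Hypothetical file names from an app directory and its migrations/ folder.
names = ["__init__.py", "0001_initial.py", "0002_add_slug.py", "models.py"]

ignored = [n for n in names if fnmatchcase(n, PATTERN)]
kept = [n for n in names if not fnmatchcase(n, PATTERN)]
print(ignored, kept)
```

The pattern is not limited to migrations directories, so any other Python file whose name starts with 0 would be ignored as well.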

Committing your migrations is just a recipe for disaster. Because migrations form a chain that can be traced back, a dependency from a former migration (e.g. a pip module that you used at some point in your project lifecycle and then stopped using) can linger. You might find breadcrumbs of such dependencies in your migration chain, and you then have to manually remove these imports from the migration files.
Verdict: unless you are a god-tier Django dev, probably avoid adding migrations to your commits.

Short answer
I propose excluding migrations in the repo. After code merge, just run ./manage.py makemigrations and you are all set.
Long answer
I don't think you should put migrations files into repo. It will spoil the migration states in other person's dev environment and other prod and stage environment. (refer to Sugar Tang's comment for examples).
From my point of view, the purpose of Django migrations is to find the gaps between previous model states and new model states, and then serialise those gaps. If your models change after a code merge, you can simply run makemigrations to find the gap. Why would you want to manually and carefully merge other people's migrations when you can achieve the same thing automatically and bug-free? The Django documentation says migrations are "designed to be mostly automatic"; please keep it that way. To merge migrations manually, you have to fully understand what others have changed and any dependencies of those changes. That's a lot of overhead and error-prone. So tracking the models file is sufficient.
It is a good topic on the workflow. I am open to other options.

Parallel South migration in Django causes errors

I ran a code update that points at two front end servers (Amazon Web Service Instances).
A South migration was included as part of the update. Since the migration, the live site appears to flit between the current code revision and the previous revision at will.
Since discovering this, a previous developer (who left the company before I turned up) said, and I quote:
"never run migrations in parallel. Running migrations twice causes duplication of new objects and other errors!"
My code changes did not involve any models.py changes ; the migrate commands were just part of the fabric update script. Also no errors were thrown during the migrations, they seemingly ran as normal.
I have database backups, so I can roll back the database as a last resort.
Is there any other way to sort the issue without doing this?
Thanks for reading
Edit: I should add that I pushed the same code to a staging server and it worked fine, so the issue isn't the code.

Firstly, if one migration tried to add a table or column that another migration already added, while the other migration was running, the first migration would fail, rolling back the transaction (i.e., not changing anything). So, you shouldn't really be able to run into a problem like your peer described.
However, if you really do somehow have that problem (partially applied migrations, etc.), the following are a couple of options:
Option 1
Reverse the migration, if it's safely reversible, then run it again (only once this time :)
Remove migrations from your deployment script
Option 2
Open a Postgres (or other DBMS) shell, and check for the existence of columns or tables that you fear might have been created, then remove them, if needed, the goal being to reverse all of the (partial) effects of the migration having been run
Open a Django shell and import MigrationHistory from South's models module
Find the MigrationHistory objects that pertain to the migrations that were run, and delete them
Run the migration again
Remove migrations from your deployment script
Note, the final step of each option is "remove migrations from your deployment script". This is because you shouldn't have a database altering statement running while your deployment goes. Even though it feels all wrong, and old-school, you should really run your migrations after the deployment, stopping your service entirely during deployment in cases where there could be problems if new code was deployed with old database tables, and leaving your app running when the migrations are less prone to problems arising from the aforementioned.

It turns out the problem was that the git fetch on one of the front-end servers didn't take, which is what was causing the problem. It had nothing to do with running migrations in parallel (though I shouldn't have done that anyway).
