I'm programming a web application using sqlalchemy. Everything was smooth during the first phase of development when the site was not in production. I could easily change the database schema by simply deleting the old sqlite database and creating a new one from scratch.
Now the site is in production and I need to preserve the data, but I still want to keep my original development speed by easily converting the database to the new schema.
So let's say that I have model.py at revision 50 and model.py a revision 75, describing the schema of the database. Between those two schema most changes are trivial, for example a new column is declared with a default value and I just want to add this default value to old records.
Eventually a few changes may not be trivial and require some pre-computation.
How do (or would) you handle fast changing web applications with, say, one or two new version of the production code per day ?
By the way, the site is written in Pylons if this makes any difference.
Alembic is a new database migrations tool, written by the author of SQLAlchemy. I've found it much easier to use than sqlalchemy-migrate. It also works seamlessly with Flask-SQLAlchemy.
Auto generate the schema migration script from your SQLAlchemy models:
alembic revision --autogenerate -m "description of changes"
Then apply the new schema changes to your database:
alembic upgrade head
More info here: http://readthedocs.org/docs/alembic/
What we do.
Use "major version"."minor version" identification of your applications. Major version is the schema version number. The major number is no some random "enough new functionality" kind of thing. It's a formal declaration of compatibility with database schema.
Release 2.3 and 2.4 both use schema version 2.
Release 3.1 uses the version 3 schema.
Make the schema version very, very visible. For SQLite, this means keep the schema version number in the database file name. For MySQL, use the database name.
Write migration scripts. 2to3.py, 3to4.py. These scripts work in two phases. (1) Query the old data into the new structure creating simple CSV or JSON files. (2) Load the new structure from the simple CSV or JSON files with no further processing. These extract files -- because they're in the proper structure, are fast to load and can easily be used as unit test fixtures. Also, you never have two databases open at the same time. This makes the scripts slightly simpler. Finally, the load files can be used to move the data to another database server.
It's very, very hard to "automate" schema migration. It's easy (and common) to have database surgery so profound that an automated script can't easily map data from old schema to new schema.
Use sqlalchemy-migrate.
It is designed to support an agile approach to database design, and make it easier to keep development and production databases in sync, as schema changes are required. It makes schema versioning easy.
Think of it as a version control for your database schema. You commit each schema change to it, and it will be able to go forwards/backwards on the schema versions. That way you can upgrade a client and it will know exactly which set of changes to apply on that client's database.
It does what S.Lott proposes in his answer, automatically for you. Makes a hard thing easy.
The best way to deal with your problem is to reflect your schema instead doing it the declarative way. I wrote an article about the reflective approach here:
http://petrushev.wordpress.com/2010/06/16/reflective-approach-on-sqlalchemy-usage/
but there are other resources about this also. In this manner, every time you make changes to your schema, all you need to do is restart the app and the reflection will fetch the new metadata for the changes in tables. This is quite fast and sqlalchemy does it only once per process. Of course, you'll have to manage the relationships changes you make yourself.
Related
I'm working on a project that I inherited, and I want to add a table to my database that is very similar to one that already exists. Basically, we have a table to log users for our website, and I want to create a second table to specifically log users that our site fails to do a task for.
Since I didn't write the site myself, and am pretty new to both SQL and Django, I'm a little paranoid about running a migration (we have a lot of really sensitive data that I'm paranoid about wiping).
Instead of having a django migration create the table itself, can I create the second table in MySQL, and the corresponding model in Django, and then have this model "recognize" the SQL table? without explicitly using a migration?
SHORT ANSWER: Yes.
MEDIUM ANSWER: Yes. But you will have to figure out how Django would have created the table, and do it by hand. That's not terribly hard.
Django may also spit out some warnings on startup about migrations being needed...but those are warnings, and if the app works, then you're OK.
LONG ANSWER: Yes. But for the sake of your sanity and sleep quality, get a completely separate development environment and test your backups. (But you knew that already.)
I want to use flask peewee as ORM for a relational db (MySQL) but my problem is changes in structure of models... like adding new attributes for a model (this means columns in db).
I want to know if I can do this automatically without writing SQL manually?
It looks like the Peewee module does support migrations.
http://peewee.readthedocs.org/en/latest/peewee/playhouse.html#schema-migrations
We developed https://github.com/keredson/peewee-db-evolve for our company's use that sounds like it may be helpful for you.
Rather than manually writing migrations, db-evolve calculates the diff between the existing schema and your defined models. It then previews and applies the non-destructive SQL commands to bring your schema into line. We've found it to be a much more robust model for schema management. (For example, switching between arbitrary branches with different schema changes is trivial this way, vs. virtually impossible w/ manually authored migrations.)
Example:
Think of it as a non-destructive version of Peewee's create_tables(). (In fact we use it for exactly that all the time, to build the schema from scratch in tests.)
I've wrote a simple migration engine for Peewee https://github.com/klen/peewee_migrate
I'm working on a web-app that's very heavily database driven. I'm nearing the initial release and so I've locked down the features for this version, but there are going to be lots of other features implemented after release. These features will inevitably require some modification to the database models, so I'm concerned about the complexity of migrating the database on each release. What I'd like to know is how much should I concern myself with locking down a solid database design now so that I can release quickly, against trying to anticipate certain features now so that I can build it into the database before release? I'm also anticipating finding flaws with my current model and would probably then want to make changes to it, but if I release the app and then data starts coming in, migrating the data would be a difficult task I imagine. Are there conventional methods to tackle this type of problem? A point in the right direction would be very useful.
For a bit of background I'm developing an asset management system for a CG production pipeline. So lots of pieces of data with lots of connections between them. It's web-based, written entirely in Python and it uses SQLAlchemy with a SQLite engine.
Some thoughts for managing databases for a production application:
Make backups nightly. This is crucial because if you try to do an update (to the data or the schema), and you mess up, you'll need to be able to revert to something more stable.
Create environments. You should have something like a local copy of the database for development, a staging database for other people to see and test before going live and of course a production database that your live system points to.
Make sure all three environments are in sync before you start development locally. This way you can track changes over time.
Start writing scripts and version them for releases. Make sure you store these in a source control system (SVN, Git, etc.) You just want a historical record of what has changed and also a small set of scripts that need to be run with a given release. Just helps you stay organized.
Do your changes to your local database and test it. Make sure you have scripts that do two things, 1) Scripts that modify the data, or the schema, 2) Scripts that undo what you've done in case things go wrong. Test these over and over locally. Run the scripts, test and then rollback. Are things still ok?
Run the scripts on staging and see if everything is still ok. Just another chance to prove your work is good and that if needed you can undo your changes.
Once staging is good and you feel confident, run your scripts on the production database. Remember you have scripts to change data (update, delete statements) and scripts to change schema (add fields, rename fields, add tables).
In general take your time and be very deliberate in your actions. The more disciplined you are the more confident you'll be. Updating the database can be scary, so don't rush things, write out your plan of action, and test, test, test!
One approach that I saw (and liked) was a table called versions that contained an id only.
Then there was an updates.sql script that had a structure similar to this:
DELIMITER $$
CREATE PROCEDURE IF NOT EXISTS DBUpdate()
BEGIN
IF (SELECT id FROM versions) = 1 THEN
CREATE TABLE IF NOT EXISTS new_feature_table(
id INT PRIMARY KEY AUTO-INCREMENT,
blah VARCHAR(128) ...,
);
IF (SELECT id FROM versions) = 2 THEN
CREATE TABLE IF NOT EXISTS newer_feature_table(
id INT PRIMARY KEY AUTO-INCREMENT,
blah VARCHAR(128) ...,
);
END$$
DELIMITER ;
CALL PROCEDURE DBUpdate();
Then you write a python script to check the repository for updates, connect to the db, and run any changes to the schema via this procedure. It's nice because you only need a versions table with the appropriate id value to build out the entire database (with no data, that is; see ryan1234's answer concerning data backups).
I'm wondering if someone can recommend a good pattern for deploying database changes via python.
In my scenario, I've got one or more PostgreSQL databases and I'm trying to deploy a code base to each one. Here's an example of the directory structure for my SQL scripts:
my_db/
main.sql
some_directory/
foo.sql
bar.sql
some_other_directory/
baz.sql
Here's an example of what's in main.sql
/* main.sql has the following contents: */
BEGIN TRANSACTION
\i some_directory/bar.sql
\i some_directory/foo.sql
\i some_other_directory/baz.sql
COMMIT;
As you can see, main.sql defines a specific order of operations and a transaction for the database updates.
I've also got a python / twisted service monitoring SVN for changes in this db code, and I'd like to automatically deploy this code upon discovery of new stuff from the svn repository.
Can someone recommend a good pattern to use here?
Should I be parsing each file?
Should I be shelling out to psql?
...
What you're doing is actually a decent approach if you control all the servers and they're all postgresql servers.
A more general approach is to have a directory of "migrations" which are generally classes with an apply() and undo() that actually do the work in your database, and often come with
abstractions like .create_table() that generate the DDL instructions specific to whatever RDBMS you're using.
Generally, you have some naming convention that ensures the migrations run in the order they were created.
There's a migration library for python called South, though it appears to be geared specifically toward django development.
http://south.aeracode.org/docs/about.html
We just integrated sqlalchemy-migrate which has some pretty Rails-like conventions but with the power of SQLAlchemy. It's shaping up to be a really awesome product, but it does have some downfalls. It's pretty seamless to integrate though.
I'm starting a web project that likely should be fine with SQLite. I have SQLObject on top of it, but thinking long term here -- if this project should require a more robust (e.g. able to handle high traffic), I will need to have a transition plan ready. My questions:
How easy is it to transition from one DB (SQLite) to another (MySQL or Firebird or PostGre) under SQLObject?
Does SQLObject provide any tools to make such a transition easier? Is it simply take the objects I've defined and call createTable?
What about having multiple SQLite databases instead? E.g. one per visitor group? Does SQLObject provide a mechanism for handling this scenario and if so, what is the mechanism to use?
Thanks,
Sean
3) Is quite an interesting question. In general, SQLite is pretty useless for web-based stuff. It scales fairly well for size, but scales terribly for concurrency, and so if you are planning to hit it with a few requests at the same time, you will be in trouble.
Now your idea in part 3) of the question is to use multiple SQLite databases (eg one per user group, or even one per user). Unfortunately, SQLite will give you no help in this department. But it is possible. The one project I know that has done this before is Divmod's Axiom. So I would certainly check that out.
Of course, it would probably be much easier to just use a good concurrent DB like the ones you mention (Firebird, PG, etc).
For completeness:
1 and 2) It should be straightforward without you actually writing much code. I find SQLObject a bit restrictive in this department, and would strongly recommend SQLAlchemy instead. This is far more flexible, and if I was starting a new project today, I would certainly use it over SQLObject. It won't be moving "Objects" anywhere. There is no magic involved here, it will be transferring rows in tables in a database. Which as mentioned you could do by hand, but this might save you some time.
Your success with createTable() will depend on your existing underlying table schema / data types. In other words, how well SQLite maps to the database you choose and how SQLObject decides to use your data types.
The safest option may be to create the new database by hand. Then you'll have to deal with data migration, which may be as easy as instantiating two SQLObject database connections over the same table definitions.
Why not just start with the more full-featured database?
I'm not sure I understand the question.
The SQLObject documentation lists six kinds of connections available. Further, the database connection (or scheme) is specified in a connection string. Changing database connections from SQLite to MySQL is trivial. Just change the connection string.
The documentation lists the different kinds of schemes that are supported.