I have an ORM app that uses SQLAlchemy, Alembic for migrations, and pytest for testing. In my tests, I have a database as a fixture. Before I used migrations, I dropped all the tables and recreated them for each testing session.
Now that I am using migrations, I want to use Alembic in creating my fixtures too, because I believe that mimics a production environment more closely. (Is that a good rationale?)
One way to do it is to downgrade() all the way down and upgrade() up each time. I don't really like this. I might be wrong.
Another would be to drop_all() and create_all() for unit tests, and just write another test that stamps the database with head and tests an upgrade and downgrade.
Is there another good/standard way to integrate migrations with fixtures so I do not have to use drop_tables?
Or is there a way, after drop_tables, to stamp the database as "tail" or empty, without explicitly using the revision hash of the very first migration (since that creates dependencies)? Something like alembic downgrade -1 that takes it back to year zero. Thank you.
I recommend starting a temporary database instance each time, e.g. with testing.mysqld or testing.postgresql. The advantage of this approach is that you're guaranteed to start fresh each time; the success of your tests will not depend on external factors. The downside is the extra handful of seconds that it takes to start the instance.
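A minimal sketch of how that can look with pytest and testing.postgresql (the fixture names and scope are my own choices, not anything prescribed by the library):

import pytest
import testing.postgresql
from sqlalchemy import create_engine

@pytest.fixture(scope="session")
def pg_url():
    # Spins up a throwaway PostgreSQL instance; it is destroyed on exit.
    with testing.postgresql.Postgresql() as postgresql:
        yield postgresql.url()

@pytest.fixture(scope="session")
def engine(pg_url):
    engine = create_engine(pg_url)
    yield engine
    engine.dispose()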
If you insist on using an existing database instance, you can, like you said, use create_all() + alembic stamp head. However, instead of doing drop_all(), simply drop the entire database (or schema, in the case of PostgreSQL) and recreate it.
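For example, something along these lines (a sketch only; the alembic.ini path and Base are assumed to come from your project):

from alembic import command
from alembic.config import Config
from myapp.models import Base  # your declarative base (assumed name)

def init_test_db(engine, alembic_ini="alembic.ini"):
    # Build the schema directly, then tell Alembic it is already at head.
    Base.metadata.create_all(engine)
    cfg = Config(alembic_ini)
    cfg.set_main_option("sqlalchemy.url", str(engine.url))  # or set it in alembic.ini
    command.stamp(cfg, "head")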
If you insist on using drop_all(), you can drop the alembic_version table to tell alembic that the current version is "tail".
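Roughly (again a sketch, reusing the assumed Base from above):

from sqlalchemy import text

def reset_db(engine):
    # Drop the model tables, then drop alembic_version so Alembic sees an
    # unversioned ("tail") database.
    Base.metadata.drop_all(engine)
    with engine.begin() as conn:
        conn.execute(text("DROP TABLE IF EXISTS alembic_version"))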
Background: Airflow uses Alembic to apply migrations to the database it uses to store DAG/task metadata. I want to store some other data in this database, and would like to track my schema changes through Alembic migrations. It can be assumed that my migrations will be limited to creating/modifying new tables, without altering any of the tables that Airflow creates and uses.
Will the fact that there are two sets of migrations (one in the Airflow source code, and one in my application code) cause any issues?
Even if you use the same DB server, I suggest using a different schema/database for your application's own tables.
This way, when you pass a connection string in the env.py that runs your migrations, Alembic will use a different alembic_version table, so the two sets of migrations won't collide.
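For PostgreSQL, the relevant part of your application's env.py might look something like this (a sketch; the schema name, connection string, and metadata are placeholders, and the configure options should be checked against the Alembic docs for your version):

from alembic import context
from sqlalchemy import create_engine

target_metadata = None  # or your application's MetaData

def run_migrations_online():
    # Connection string and schema name ("my_app") are illustrative.
    connectable = create_engine("postgresql://user:pass@localhost/airflow_db")
    with connectable.connect() as connection:
        context.configure(
            connection=connection,
            target_metadata=target_metadata,
            version_table_schema="my_app",  # keeps alembic_version out of Airflow's way
            include_schemas=True,
        )
        with context.begin_transaction():
            context.run_migrations()

For MySQL, pointing the connection string at a separate database achieves the same effect.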
I am close to finishing an ORM for RethinkDB in Python and I got stuck writing tests, particularly those involving save(), get() and delete() operations. What's the recommended way to test whether my ORM does what it is supposed to do when saving, deleting, or getting a document?
Right now, for each test in my suite I create a database, populate it with all tables needed by the test models (this takes a lot of time, almost 5 seconds/test!), run the operation on my model (e.g.: save()) and then manually run a query against the database (using RethinkDB's Python driver) to see whether everything has been updated in the database.
Now, I feel this just isn't right; maybe there is another way to write these tests, or maybe I can design the tests without running that many queries against the database. Any ideas on how I can improve this, or suggestions on how this really ought to be done?
You can create all your databases/tables just once for all your tests.
You can also use the raw data directory:
- Start RethinkDB
- Create all your databases/tables
- Commit it.
Before each test, copy the data directory, start RethinkDB on the copy, then when your test is done, delete the copied data directory.
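A rough sketch of that copy-and-run step (paths, ports, and CLI flags are illustrative; check rethinkdb --help for the exact options of your version):

import os
import shutil
import subprocess
import tempfile

def start_temp_rethinkdb(template_dir):
    # Copy the pre-built data directory and run a throwaway server on it.
    work_dir = tempfile.mkdtemp()
    data_dir = shutil.copytree(template_dir, os.path.join(work_dir, "data"))
    proc = subprocess.Popen([
        "rethinkdb",
        "--directory", data_dir,
        "--driver-port", "28016",
        "--cluster-port", "29016",
        "--no-http-admin",
    ])
    return proc, work_dir

def stop_temp_rethinkdb(proc, work_dir):
    proc.terminate()
    proc.wait()
    shutil.rmtree(work_dir)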
I want to use Flask-Peewee as the ORM for a relational database (MySQL), but my problem is changes to the structure of the models, like adding new attributes to a model (which means new columns in the database).
Can I do this automatically, without writing the SQL by hand?
It looks like the Peewee module does support migrations.
http://peewee.readthedocs.org/en/latest/peewee/playhouse.html#schema-migrations
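A small sketch using playhouse.migrate (the table and column names are made up for illustration):

from peewee import CharField, MySQLDatabase
from playhouse.migrate import MySQLMigrator, migrate

db = MySQLDatabase("my_app")  # connection details assumed
migrator = MySQLMigrator(db)

# Each call describes one schema change; migrate() applies them.
migrate(
    migrator.add_column("user", "email", CharField(default="")),
    migrator.drop_column("user", "legacy_flag"),
)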
We developed https://github.com/keredson/peewee-db-evolve for our company's use; it sounds like it may be helpful for you.
Rather than manually writing migrations, db-evolve calculates the diff between the existing schema and your defined models. It then previews and applies the non-destructive SQL commands to bring your schema into line. We've found it to be a much more robust model for schema management. (For example, switching between arbitrary branches with different schema changes is trivial this way, vs. virtually impossible w/ manually authored migrations.)
Example:
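(The original example isn't reproduced here; below is a minimal sketch of what usage typically looks like, based on the project's README, so double-check it against the current docs. The model and database names are placeholders.)

import peeweedbevolve  # must be imported before your models are defined
from peewee import CharField, Model, PostgresqlDatabase

db = PostgresqlDatabase("my_app")  # connection details assumed

class User(Model):
    email = CharField(default="")

    class Meta:
        database = db

if __name__ == "__main__":
    # Diffs the live schema against the models, previews the SQL, and applies it.
    db.evolve()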
Think of it as a non-destructive version of Peewee's create_tables(). (In fact we use it for exactly that all the time, to build the schema from scratch in tests.)
I've written a simple migration engine for Peewee: https://github.com/klen/peewee_migrate
I'm programming a web application using sqlalchemy. Everything was smooth during the first phase of development when the site was not in production. I could easily change the database schema by simply deleting the old sqlite database and creating a new one from scratch.
Now the site is in production and I need to preserve the data, but I still want to keep my original development speed by easily converting the database to the new schema.
So let's say that I have model.py at revision 50 and model.py at revision 75, each describing the schema of the database. Between those two schemas most changes are trivial; for example, a new column is declared with a default value and I just want to add this default value to old records.
A few changes may not be trivial, though, and will require some pre-computation.
How do (or would) you handle fast-changing web applications with, say, one or two new versions of the production code per day?
By the way, the site is written in Pylons if this makes any difference.
Alembic is a new database migrations tool, written by the author of SQLAlchemy. I've found it much easier to use than sqlalchemy-migrate. It also works seamlessly with Flask-SQLAlchemy.
Auto generate the schema migration script from your SQLAlchemy models:
alembic revision --autogenerate -m "description of changes"
Then apply the new schema changes to your database:
alembic upgrade head
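The autogenerated revision file is plain Python you can review and edit; a typical one looks roughly like this (revision ids and the column being added are made up):

"""add email column to user

Revision ID: abc123
Revises: def456
"""
from alembic import op
import sqlalchemy as sa

revision = "abc123"
down_revision = "def456"

def upgrade():
    op.add_column("user", sa.Column("email", sa.String(length=255), nullable=True))

def downgrade():
    op.drop_column("user", "email")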
More info here: http://readthedocs.org/docs/alembic/
What we do.
Use "major version"."minor version" identification of your applications. Major version is the schema version number. The major number is no some random "enough new functionality" kind of thing. It's a formal declaration of compatibility with database schema.
Release 2.3 and 2.4 both use schema version 2.
Release 3.1 uses the version 3 schema.
Make the schema version very, very visible. For SQLite, this means keep the schema version number in the database file name. For MySQL, use the database name.
Write migration scripts: 2to3.py, 3to4.py. These scripts work in two phases. (1) Query the old data into the new structure, creating simple CSV or JSON files. (2) Load the new structure from the simple CSV or JSON files with no further processing. Because these extract files are already in the proper structure, they are fast to load and can easily be used as unit test fixtures. Also, you never have two databases open at the same time, which makes the scripts slightly simpler. Finally, the load files can be used to move the data to another database server.
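A bare-bones sketch of such a two-phase script (table names, columns, and connection URLs are invented for illustration):

import json
from sqlalchemy import create_engine, text

def extract(old_url="sqlite:///app_v2.db", dump_file="users_v3.json"):
    # Phase 1: read old rows and write them out already reshaped for schema 3.
    engine = create_engine(old_url)
    with engine.connect() as conn:
        rows = conn.execute(text("SELECT id, name FROM user")).mappings().all()
    with open(dump_file, "w") as f:
        json.dump([{"id": r["id"], "name": r["name"], "email": ""} for r in rows], f)

def load(new_url="sqlite:///app_v3.db", dump_file="users_v3.json"):
    # Phase 2: bulk-load the prepared records with no further processing.
    engine = create_engine(new_url)
    with open(dump_file) as f:
        records = json.load(f)
    with engine.begin() as conn:
        conn.execute(
            text("INSERT INTO user (id, name, email) VALUES (:id, :name, :email)"),
            records,
        )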
It's very, very hard to "automate" schema migration. It's easy (and common) to have database surgery so profound that an automated script can't easily map data from old schema to new schema.
Use sqlalchemy-migrate.
It is designed to support an agile approach to database design, and make it easier to keep development and production databases in sync, as schema changes are required. It makes schema versioning easy.
Think of it as a version control for your database schema. You commit each schema change to it, and it will be able to go forwards/backwards on the schema versions. That way you can upgrade a client and it will know exactly which set of changes to apply on that client's database.
It does what S.Lott proposes in his answer, automatically for you. Makes a hard thing easy.
The best way to deal with your problem is to reflect your schema instead of doing it the declarative way. I wrote an article about the reflective approach here:
http://petrushev.wordpress.com/2010/06/16/reflective-approach-on-sqlalchemy-usage/
but there are other resources about this as well. With this approach, every time you change your schema, all you need to do is restart the app and reflection will fetch the new metadata for the changed tables. This is quite fast, and SQLAlchemy does it only once per process. Of course, you'll have to manage any relationship changes yourself.
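A short sketch of reflection (the URL and table name are placeholders):

from sqlalchemy import MetaData, Table, create_engine

engine = create_engine("sqlite:///app.db")
metadata = MetaData()
metadata.reflect(bind=engine)  # load all table definitions from the live database

users = metadata.tables["user"]  # a Table built from the live schema
# or reflect a single table:
# users = Table("user", metadata, autoload_with=engine)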
As I usually don't do up-front design of my models in Django projects, I end up modifying the models a lot and thus deleting my test database every time (because "syncdb" won't ever alter the tables automatically for you). Below is my workflow, and I'd like to hear about yours. Any thoughts welcome.
1. Modify the model.
2. Delete the test database. (Always a simple SQLite database for me.)
3. Run "syncdb".
4. Generate some test data via code.
5. Go to 1.
A secondary question regarding this: if your workflow is like the above, how do you execute step 4? Do you generate the test data manually, or is there a proper hook point in Django apps where you can inject the test-data-generating code at server startup?
TIA.
Steps 2 & 3 can be done in one step:
manage.py reset appname
Step 4 is most easily managed, from my understanding, by using fixtures.
This is a job for Django's fixtures. They are convenient because they are database-independent, and the test harness (and manage.py) has built-in support for them.
To use them:
- Set up your data in your app (call it "foo") using the admin tool
- Create a fixtures directory in your "foo" app directory
- Type: python manage.py dumpdata --indent=4 foo > foo/fixtures/foo.json
Now, after your syncdb stage, you just type:
python manage.py loaddata foo.json
And your data will be re-created.
If you want them in a test case:
class FooTests(TestCase):
    fixtures = ['foo.json']
Note that you will have to recreate or manually update your fixtures if your schema changes drastically.
You can read more about fixtures in the django docs for Fixture Loading
Here's what we do.
Apps are named with a schema version number: appa_2, appb_1, etc.
Minor changes don't change the number.
Major changes increment the number. Syncdb works. And a "data migration" script can be written.
def migrate_appa_2_to_3():
    # Copy rows from the version-2 app's tables into the version-3 app's tables.
    for a in appa_2.SomeThing.objects.all():
        appa_3.AnotherThing.objects.create(this=a.this, that=a.that)
        appa_3.NewThing.objects.create(another=a.another, yetAnother=a.yetAnother)
    for b in ...
The point is that drop and recreate isn't always appropriate. It's sometimes helpful to move data from the old model to the new model without rebuilding from scratch.
South is the coolest.
Though good ol' reset works best when data doesn't matter.
http://south.aeracode.org/
To add to Matthew's response, I often also use custom SQL to provide initial data as documented here.
Django just looks for files in <app>/sql/<modelname>.sql and runs them after creating tables during syncdb or sqlreset. I use custom SQL when I need to do something like populate my Django tables from other non-Django database tables.
Personally, my development DB for the project I'm working on right now is rather large, so I use dmigrations to create DB migration scripts to modify the DB (rather than wiping out the DB every time like I did in the beginning).
Edit: Actually, I'm using South now :-)