Managing seed data with SQLAlchemy and SQLAlchemy-migrate - python

I'm using SQLAlchemy in my Pylons application to access data, and SQLAlchemy-migrate to maintain the database schema.
It works fine for managing the schema itself. However, I also want to manage seed data in a migrate-like way. E.g. when the ProductCategory table is created, it would make sense to seed it with category data.
It looks like SQLAlchemy-migrate does not support this directly. What would be a good approach to doing this with Pylons + SQLAlchemy + SQLAlchemy-migrate?

Well, what format is your seed data starting out in? The migrate calls are just Python methods, so you're free to open a CSV file, create SQLAlchemy object instances, loop, etc. I usually have my seed data as a series of SQL insert statements and just loop over them, executing a migrate.execute(query) for each one.
So in the upgrade method I first create the table, then loop over the seed data and insert it; the downgrade method empties or drops the table.
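For illustration, a minimal sketch of such a change script, assuming a product_category table with id and name columns (the table, columns, and seed values are invented for the example):

from sqlalchemy import Table, Column, Integer, String, MetaData

meta = MetaData()

product_category = Table(
    'product_category', meta,
    Column('id', Integer, primary_key=True),
    Column('name', String(100)),
)

SEED_CATEGORIES = ['Books', 'Music', 'Clothing']

def upgrade(migrate_engine):
    meta.bind = migrate_engine
    product_category.create()
    # Seed the freshly created table, one insert per category.
    for name in SEED_CATEGORIES:
        migrate_engine.execute(product_category.insert().values(name=name))

def downgrade(migrate_engine):
    # Dropping the table removes the seed data along with it.
    meta.bind = migrate_engine
    product_category.drop()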

Related

What should I use to enter data into my database in Django? The Django admin or SQL code?

I am a newbie in programming, but I have now connected my project to PostgreSQL. I learned how to enter data with SQL code and also found out that we can enter it through /admin (by creating a superuser and adding data there). So which one is widely used in web dev?
It will depend completely on your application.
You can add rows to a table using SQL if that's the easiest way for you. Or you can add rows by creating new object instances in Python code and .save()ing them. Or you can create instances through a CreateView or through the Django admin.
Adding data with SQL has the drawback that you will lose the benefit of any validators declared on the model's fields. You may end up with data stored in your SQL tables which your app regards as "impossible", which may cause you minor or even major difficulties.
I have several times written management commands which all have the same general format: for each "row" in a data source (often a spreadsheet), construct one or more Django objects and save them. You can process each data "row" within a transaction (with transaction.atomic()) so that if anything goes wrong, that row is not committed. Or you can treat the entire process as a single transaction (not recommended for vast numbers of "rows", though).
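A rough sketch of such a command, assuming a hypothetical Product model and CSV column names (both invented for the example):

from csv import DictReader

from django.core.management.base import BaseCommand
from django.db import transaction

from myapp.models import Product  # hypothetical app and model


class Command(BaseCommand):
    help = "Load products from a CSV export, one transaction per row"

    def add_arguments(self, parser):
        parser.add_argument('csv_path')

    def handle(self, *args, **options):
        with open(options['csv_path']) as f:
            for row in DictReader(f):
                try:
                    with transaction.atomic():
                        product = Product(name=row['name'], price=row['price'])
                        product.full_clean()  # run the field validators that raw SQL would skip
                        product.save()
                except Exception as exc:
                    self.stderr.write("Skipping row %r: %s" % (row, exc))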

Does Django have an equivalent of Rails's “rails db:seed”?

One thing I like about Rails projects is that you can create test content, place it in seeds.rb, and seed it into the database by running rake db:seed, instead of having to add records one by one directly.
Is there something similar for Python/Django?
It sounds like migrations or fixtures may be what you are looking for.
Migrations are Python code that can add data or manipulate the database schema. Fixtures are JSON-formatted data that can be used as an initial data seed via python manage.py loaddata $FIXTURE.
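As a sketch, a data migration seeding a table might look like this (the app, model, and category names are invented for the example):

from django.db import migrations

CATEGORIES = ('Books', 'Music', 'Clothing')

def seed_categories(apps, schema_editor):
    # Use the historical model, not a direct import, so the migration
    # keeps working as the model evolves.
    Category = apps.get_model('shop', 'Category')
    for name in CATEGORIES:
        Category.objects.get_or_create(name=name)

def unseed_categories(apps, schema_editor):
    Category = apps.get_model('shop', 'Category')
    Category.objects.filter(name__in=CATEGORIES).delete()

class Migration(migrations.Migration):
    dependencies = [('shop', '0001_initial')]
    operations = [migrations.RunPython(seed_categories, unseed_categories)]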
I would also recommend looking at Factory Boy for test data creation. You can create records in a series of loops to populate any number of tables and build sensible relationships between records.

How to handle customized schema with sqlalchemy

I'm pretty new to databases and server-related tasks. I currently have two tables stored in an MSSQL database on a server, and I'm trying to use the Python package SQLAlchemy to pull some of the data to my local machine. The first table has the default schema dbo, and I was able to use the connect string
'mssql+pyodbc://<username>:<password>@<dsnname>'
to inspect the table, but the other table has a customized schema, and I don't see any information about it when I use the previous commands. I assume it is because the second table has a different schema and the Python package can't find it anymore.
I was looking at automap, hoping the package offers a way to deal with a customized schema, but I don't quite understand many of the concepts described there. I'm not trying to alter the database, just pull data, so I'm not sure if it's the right approach. Any suggestions?
Thanks
In the case of automap, you should pass the schema argument when preparing reflectively:
AutomapBase.prepare(reflect=True, schema='myschema')
If you wish to reflect both the default schema and your "customized schema" using the same automapper, then first reflect both schemas using the MetaData instance and after that prepare the automapper:
AutomapBase.metadata.reflect()
AutomapBase.metadata.reflect(schema='myschema')
AutomapBase.prepare()
If you call AutomapBase.prepare(reflect=True, ...) consecutively for both schemas, the automapper will recreate and replace the classes from the first prepare, because the tables already exist in the metadata. This will then raise warnings.
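Put together, a sketch of the two-schema case (the connection URL and schema name are placeholders):

from sqlalchemy import create_engine
from sqlalchemy.ext.automap import automap_base

engine = create_engine('mssql+pyodbc://<username>:<password>@<dsnname>')

Base = automap_base()
# Reflect both schemas into the shared MetaData first...
Base.metadata.reflect(bind=engine)                     # default schema (dbo)
Base.metadata.reflect(bind=engine, schema='myschema')  # the customized schema
# ...then generate the mapped classes in a single prepare() call.
Base.prepare()

# Mapped classes (tables need a primary key to be mapped) are available as:
# Base.classes.<table_name>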

Loading data from a (MySQL) database into Django without models

This might sound like a bit of an odd question - but is it possible to load data from a (in this case MySQL) table to be used in Django without the need for a model to be present?
I realise this isn't really the Django way, but given my current scenario, I don't really know how better to solve the problem.
I'm working on a site which, for one aspect, makes use of a table of data which has been bought from a third party. The columns of interest are likely to remain stable; however, the structure of the table could change with subsequent updates to the data set. The table is also massive (in terms of columns), so I'm not keen on typing out each field in the model one by one. I'd also like to leave the table intact, so coming up with a model which represents the set of columns I am interested in is not really an ideal solution.
Ideally, I want to have this table in a database somewhere (possibly separate to the main site database) and access its contents directly using SQL.
You can always execute raw SQL directly against the database: see the docs.
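For example, something along these lines (the table name and query are placeholders):

from django.db import connection  # or connections['third_party'] for a second database

def fetch_rows(min_id):
    with connection.cursor() as cursor:
        cursor.execute("SELECT * FROM third_party_table WHERE id >= %s", [min_id])
        columns = [col[0] for col in cursor.description]
        # Turn each result row into a dict keyed by column name, so the
        # code keeps working even if the vendor adds or reorders columns.
        return [dict(zip(columns, row)) for row in cursor.fetchall()]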
Django has a feature called inspectdb. For legacy databases like MySQL, it creates models automatically by inspecting your database tables and stores them in your app's models.py, so you don't need to type out every column manually. But read the documentation carefully before creating the models, because it may affect the DB data. I hope this will be useful for you.
I guess you can use any SQL library available for Python, for example: http://www.sqlalchemy.org/
You then just have to connect to your database, perform your query, and use the data as you wish. I don't think you can use Django without its model system, but nothing prevents you from using another library for this in parallel.
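A sketch of that parallel approach with SQLAlchemy's table reflection, so no column list has to be typed out (the URL and table name are placeholders, and this uses the pre-1.4 SQLAlchemy API contemporary with the question):

from sqlalchemy import create_engine, MetaData, Table, select

engine = create_engine('mysql://user:password@localhost/thirdparty')
meta = MetaData()
# Reflect the table definition from the live database at import time;
# a restructured data set just needs an app restart, not model edits.
bought_data = Table('bought_data', meta, autoload=True, autoload_with=engine)

with engine.connect() as conn:
    for row in conn.execute(select([bought_data]).limit(10)):
        print(row)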

How to efficiently manage frequent schema changes using sqlalchemy?

I'm programming a web application using sqlalchemy. Everything was smooth during the first phase of development when the site was not in production. I could easily change the database schema by simply deleting the old sqlite database and creating a new one from scratch.
Now the site is in production and I need to preserve the data, but I still want to keep my original development speed by easily converting the database to the new schema.
So let's say that I have model.py at revision 50 and model.py at revision 75, describing the schema of the database. Between those two schemas most changes are trivial; for example, a new column is declared with a default value and I just want to add this default value to old records.
Occasionally, a few changes may not be trivial and will require some pre-computation.
How do (or would) you handle fast-changing web applications with, say, one or two new versions of the production code per day?
By the way, the site is written in Pylons if this makes any difference.
Alembic is a new database migrations tool, written by the author of SQLAlchemy. I've found it much easier to use than sqlalchemy-migrate. It also works seamlessly with Flask-SQLAlchemy.
Auto generate the schema migration script from your SQLAlchemy models:
alembic revision --autogenerate -m "description of changes"
Then apply the new schema changes to your database:
alembic upgrade head
More info here: http://readthedocs.org/docs/alembic/
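The generated revision file is plain Python that you can review and edit before upgrading; a typical one looks roughly like this (the revision IDs, table, and column here are illustrative):

"""add price column to product

Revision ID: 1a2b3c4d5e6f
Revises: 0f1e2d3c4b5a
"""
from alembic import op
import sqlalchemy as sa

revision = '1a2b3c4d5e6f'
down_revision = '0f1e2d3c4b5a'

def upgrade():
    op.add_column('product', sa.Column('price', sa.Numeric(10, 2), nullable=True))

def downgrade():
    op.drop_column('product', 'price')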
What we do.
Use "major version"."minor version" identification of your applications. Major version is the schema version number. The major number is no some random "enough new functionality" kind of thing. It's a formal declaration of compatibility with database schema.
Release 2.3 and 2.4 both use schema version 2.
Release 3.1 uses the version 3 schema.
Make the schema version very, very visible. For SQLite, this means keep the schema version number in the database file name. For MySQL, use the database name.
Write migration scripts: 2to3.py, 3to4.py. These scripts work in two phases. (1) Query the old data into the new structure, creating simple CSV or JSON files. (2) Load the new structure from the simple CSV or JSON files with no further processing. These extract files, because they're already in the proper structure, are fast to load and can easily be used as unit test fixtures. Also, you never have two databases open at the same time, which makes the scripts slightly simpler. Finally, the load files can be used to move the data to another database server.
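A sketch of what one of those scripts might look like, with the table, column, and file names invented for the example:

# 2to3.py -- two-phase migration: extract with only the old database open,
# then load with only the new database open.
import json
from sqlalchemy import create_engine, text

EXTRACT_FILE = 'product_v3.json'

def extract(old_db_url):
    # Phase 1: query the old data into the *new* structure and dump it flat.
    engine = create_engine(old_db_url)
    rows = engine.execute(text("SELECT id, name FROM product"))
    with open(EXTRACT_FILE, 'w') as f:
        json.dump([{'id': r.id, 'name': r.name, 'price': None} for r in rows], f)

def load(new_db_url):
    # Phase 2: load the new structure from the flat file, no further processing.
    engine = create_engine(new_db_url)
    with open(EXTRACT_FILE) as f:
        for row in json.load(f):
            engine.execute(
                text("INSERT INTO product (id, name, price) "
                     "VALUES (:id, :name, :price)"),
                **row)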
It's very, very hard to "automate" schema migration. It's easy (and common) to have database surgery so profound that an automated script can't easily map data from old schema to new schema.
Use sqlalchemy-migrate.
It is designed to support an agile approach to database design, and make it easier to keep development and production databases in sync, as schema changes are required. It makes schema versioning easy.
Think of it as a version control for your database schema. You commit each schema change to it, and it will be able to go forwards/backwards on the schema versions. That way you can upgrade a client and it will know exactly which set of changes to apply on that client's database.
It does what S.Lott proposes in his answer, automatically for you. Makes a hard thing easy.
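The basic workflow, roughly (the repository name and database URL are placeholders, and the exact commands are from memory of the sqlalchemy-migrate docs, so double-check them there):

migrate create my_repository "My Project"                    # create a migrations repository
migrate version_control sqlite:///app.db my_repository       # stamp the DB with a version
migrate script "Add product_category table" my_repository    # new change script to edit
migrate upgrade sqlite:///app.db my_repository               # apply pending change scripts
migrate downgrade sqlite:///app.db my_repository 2           # roll back to schema version 2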
The best way to deal with your problem is to reflect your schema instead of doing it the declarative way. I wrote an article about the reflective approach here:
http://petrushev.wordpress.com/2010/06/16/reflective-approach-on-sqlalchemy-usage/
but there are other resources about this as well. In this manner, every time you make changes to your schema, all you need to do is restart the app and the reflection will fetch the new metadata for the changed tables. This is quite fast, and SQLAlchemy does it only once per process. Of course, you'll have to manage the relationship changes yourself.
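A minimal sketch of the reflective style, assuming an existing users table (the names and URL are placeholders; this is the classic-mapping API from the SQLAlchemy versions of that era):

from sqlalchemy import create_engine, MetaData
from sqlalchemy.orm import mapper, sessionmaker

engine = create_engine('sqlite:///app.db')
meta = MetaData()
meta.reflect(bind=engine)  # read the current table definitions from the live DB

class User(object):
    pass

# Map a plain class onto the reflected table; a new column shows up
# automatically after an app restart, with no declarative model to edit.
mapper(User, meta.tables['users'])

Session = sessionmaker(bind=engine)
session = Session()
print(session.query(User).count())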
