I'm coming from EntityFramework. I'm used to creating my db, running EF, and having a bunch of class files generated for each of my db objects (usually tables) filled with properties (usually columns).
Following this basic use example, I've figured out how to use reflection to generate models in memory. But how does one save the models to disk as classes? Since python code isn't compiled I guess the entire ORM could just be generated every time I run my application, but this feels very strange coming from my EF background. What's the best practice here? (BTW I'm using this in the context of Flask).
Generally you reflect them every time you run your app, yes. Otherwise they'd break when you update your schema, which would sort of defeat the point of using reflection.
It is possible to generate declarative classes based on the current state of your schema; there's a sqlacodegen module on PyPI that does this, and the author of SQLA has a database migration project called Alembic that tackles the similar problem of comparing two schemata.
Related
I have a specific question around generation and application of migrations to a pre-existing postgres database, the models have been created in SQLAlchemy (after the original DDLs), and I intend to use Alembic (wrapped by Flask-Migrate) to migrate our application away from the currently defined DDLs, which were run against the database a long time ago. We are moving towards a more defined migration process where schema upgrades can be handled in a sane way.
My question is what should I be aware of when doing this? I am aware of Automap, but I'd much prefer to just add the ORM and standardise our schema creation.
My concern at the moment is that because the tables weren't created by the ORM/migration process, then I will have problems moving it across, basically, I'm aware that I'm unaware of how some things work under the hood. Feel free to tell me if I'm concerned about a null threat.
Cheers.
I am building a database interface using Python's peewee module. I am trying to figure out how to insert data into an existing database where I do not know the schema.
My idea is to use playhouse.reflection.Introspector to find out the database schema, then use that information to create class objects which can then be inserted into the existing database.
So far I've gotten to:
introspector = Introspector.from_database(database)
models = introspector.generate_models()
I'm don't know where to go from there.
1) Can I create database objects in this manner? What is the next step?
2) Is there an easier way to do this?
peewee includes an introspection tool called pwiz that can (basically) introspect a database and produce model definitions. It is run as a command line script and dumps the model definitions to stdout, so invokation is like any other unix tool. Here is an example from the docs:
python -m pwiz -e postgresql my_postgres_db > mymodels.py
From there edit mymodels.py to get what you need.
You could do this on the fly, but it would require a few steps and is hackish (not to mention pointless if you really don't know anything about the schema):
Run pwiz as an os command
Read it to pick out the model names
Import whatever you find
BUT
If you really don't know the schema to start with then you have no idea what the semantics of the database are anyway, which means whatever you find is literally meaningless. Unless you at least know some schema/table/column names you are hunting for (in which case you do know something about the schema) there isn't really much you can do with regard to inserting data (not in a sane way), though you could certainly dump data from the db. But if you just wanted a database dump then pg_dump would have been easier.
I suspect this is actually an X-Y problem. What problem is it you are trying to solve by using this technique? What effect is it supposed to achieve within the context of your system?
If you want to create a GUI, check out the sqlite_web project. It uses Peewee to create a web-based SQLite database manager.
I want to use flask peewee as ORM for a relational db (MySQL) but my problem is changes in structure of models... like adding new attributes for a model (this means columns in db).
I want to know if I can do this automatically without writing SQL manually?
It looks like the Peewee module does support migrations.
http://peewee.readthedocs.org/en/latest/peewee/playhouse.html#schema-migrations
We developed https://github.com/keredson/peewee-db-evolve for our company's use that sounds like it may be helpful for you.
Rather than manually writing migrations, db-evolve calculates the diff between the existing schema and your defined models. It then previews and applies the non-destructive SQL commands to bring your schema into line. We've found it to be a much more robust model for schema management. (For example, switching between arbitrary branches with different schema changes is trivial this way, vs. virtually impossible w/ manually authored migrations.)
Example:
Think of it as a non-destructive version of Peewee's create_tables(). (In fact we use it for exactly that all the time, to build the schema from scratch in tests.)
I've wrote a simple migration engine for Peewee https://github.com/klen/peewee_migrate
I'm programming a web application using sqlalchemy. Everything was smooth during the first phase of development when the site was not in production. I could easily change the database schema by simply deleting the old sqlite database and creating a new one from scratch.
Now the site is in production and I need to preserve the data, but I still want to keep my original development speed by easily converting the database to the new schema.
So let's say that I have model.py at revision 50 and model.py a revision 75, describing the schema of the database. Between those two schema most changes are trivial, for example a new column is declared with a default value and I just want to add this default value to old records.
Eventually a few changes may not be trivial and require some pre-computation.
How do (or would) you handle fast changing web applications with, say, one or two new version of the production code per day ?
By the way, the site is written in Pylons if this makes any difference.
Alembic is a new database migrations tool, written by the author of SQLAlchemy. I've found it much easier to use than sqlalchemy-migrate. It also works seamlessly with Flask-SQLAlchemy.
Auto generate the schema migration script from your SQLAlchemy models:
alembic revision --autogenerate -m "description of changes"
Then apply the new schema changes to your database:
alembic upgrade head
More info here: http://readthedocs.org/docs/alembic/
What we do.
Use "major version"."minor version" identification of your applications. Major version is the schema version number. The major number is no some random "enough new functionality" kind of thing. It's a formal declaration of compatibility with database schema.
Release 2.3 and 2.4 both use schema version 2.
Release 3.1 uses the version 3 schema.
Make the schema version very, very visible. For SQLite, this means keep the schema version number in the database file name. For MySQL, use the database name.
Write migration scripts. 2to3.py, 3to4.py. These scripts work in two phases. (1) Query the old data into the new structure creating simple CSV or JSON files. (2) Load the new structure from the simple CSV or JSON files with no further processing. These extract files -- because they're in the proper structure, are fast to load and can easily be used as unit test fixtures. Also, you never have two databases open at the same time. This makes the scripts slightly simpler. Finally, the load files can be used to move the data to another database server.
It's very, very hard to "automate" schema migration. It's easy (and common) to have database surgery so profound that an automated script can't easily map data from old schema to new schema.
Use sqlalchemy-migrate.
It is designed to support an agile approach to database design, and make it easier to keep development and production databases in sync, as schema changes are required. It makes schema versioning easy.
Think of it as a version control for your database schema. You commit each schema change to it, and it will be able to go forwards/backwards on the schema versions. That way you can upgrade a client and it will know exactly which set of changes to apply on that client's database.
It does what S.Lott proposes in his answer, automatically for you. Makes a hard thing easy.
The best way to deal with your problem is to reflect your schema instead doing it the declarative way. I wrote an article about the reflective approach here:
http://petrushev.wordpress.com/2010/06/16/reflective-approach-on-sqlalchemy-usage/
but there are other resources about this also. In this manner, every time you make changes to your schema, all you need to do is restart the app and the reflection will fetch the new metadata for the changes in tables. This is quite fast and sqlalchemy does it only once per process. Of course, you'll have to manage the relationships changes you make yourself.
What is the best way to migrate MySQL tables to Google Datastore and create python models for them?
I have a PHP+MySQL project that I want to migrate to Python+GAE project. So far the big obstacle is migrating the tables and creating corresponding models. Each table is about 110 columns wide. Creating a model for the table manually is a bit tedious, let alone creating a loader and importing a generated csv table representation.
Is there a more efficient way for me to do the migration?
In general, generating your models automatically shouldn't be too difficult. Suppose you have a csv file for each table, with lines consisting of (field name, data type), then something like this would do the job:
# Maps MySQL types to Datastore property classes
type_map = {
'char': 'StringProperty',
'text': 'TextProperty',
'int': 'IntegerProperty',
# ...
}
def generate_model_class(classname, definition_file):
ret = []
ret.append("class %s(db.Model):" % (classname,))
for fieldname, type in csv.reader(open(definition_file)):
ret.append(" %s = db.%s()" % (fieldname, type_map[type]))
return "\n".join(ret)
Once you've defined your schema, you can bulk load directly from the DB - no need for intermediate CSV files. See my blog post on the subject.
approcket can mysql⇌gae or gae builtin remote api from google
In your shoes, I'd write a one-shot Python script to read the existing MySQL schema (with MySQLdb), generating a models.py to match (then do some manual checks and edits on the generated code, just in case). That's assuming that a data model with "about 110" properties per entity is something you're happy with and want to preserve, of course; it might be worth to take the opportunity to break things up a bit (indeed you may have to if your current approach also relies on joins or other SQL features GAE doesn't give you), but that of course requires more manual work.
Once the data model is in place, bulk loading can happen, typically via intermediate CSV files (there are several ways you can generate those).
you don't need to
http://code.google.com/apis/sql/
:)
You could migrate them to django models first
In particular use
python manage.py inspectdb > models.py
And edit models.py until satisfied. You might have to put ForeignKeys in, adjusts the length of CharFields etc.
I've converted several legacy databases to django like this with good success.
Django models however are different to GAE models (which I'm not very familiar with) so that may not be terribly helpful I don't know!