Insert row into database with unknown schema using Python module peewee

I am building a database interface using Python's peewee module. I am trying to figure out how to insert data into an existing database where I do not know the schema.
My idea is to use playhouse.reflection.Introspector to find out the database schema, then use that information to create class objects which can then be inserted into the existing database.
So far I've gotten to:
from playhouse.reflection import Introspector

introspector = Introspector.from_database(database)
models = introspector.generate_models()
I don't know where to go from there.
1) Can I create database objects in this manner? What is the next step?
2) Is there an easier way to do this?

peewee includes an introspection tool called pwiz that can (basically) introspect a database and produce model definitions. It is run as a command-line script and dumps the model definitions to stdout, so invocation is like any other Unix tool. Here is an example from the docs:
python -m pwiz -e postgresql my_postgres_db > mymodels.py
From there edit mymodels.py to get what you need.
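For illustration only, using the generated module then looks like working with any hand-written peewee models. The model and field names below are hypothetical and depend entirely on what pwiz finds in your schema:
from mymodels import database, SomeTable  # names come from the generated file

database.connect()
# Insert a row through the reflected model; fields mirror the table's columns.
SomeTable.create(name="example", value=42)
database.close()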
You could do this on the fly, but it would require a few steps and is hackish (not to mention pointless if you really don't know anything about the schema); a rough sketch follows the steps below:
Run pwiz as an OS command
Read its output to pick out the model names
Import whatever you find
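A hedged sketch of those steps (the database name, engine, and output filename are placeholders, and this assumes pwiz and peewee are importable):
import importlib
import subprocess

import peewee

# 1) Run pwiz as an OS command and capture the generated module.
with open("mymodels.py", "w") as fh:
    subprocess.run(
        ["python", "-m", "pwiz", "-e", "postgresql", "my_postgres_db"],
        stdout=fh,
        check=True,
    )

# 2) Import whatever it produced and pick out the model classes.
mymodels = importlib.import_module("mymodels")
models = {
    name: obj
    for name, obj in vars(mymodels).items()
    if isinstance(obj, type)
    and issubclass(obj, peewee.Model)
    and name not in ("Model", "BaseModel")
}
print(sorted(models))  # table models discovered in the generated module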
BUT
If you really don't know the schema to start with then you have no idea what the semantics of the database are anyway, which means whatever you find is literally meaningless. Unless you at least know some schema/table/column names you are hunting for (in which case you do know something about the schema) there isn't really much you can do with regard to inserting data (not in a sane way), though you could certainly dump data from the db. But if you just wanted a database dump then pg_dump would have been easier.
I suspect this is actually an X-Y problem. What problem is it you are trying to solve by using this technique? What effect is it supposed to achieve within the context of your system?

If you want to create a GUI, check out the sqlite_web project. It uses Peewee to create a web-based SQLite database manager.

Related

Can flask-peewee do migration?

I want to use flask-peewee as the ORM for a relational DB (MySQL), but my problem is changes to the structure of models... like adding new attributes to a model (which means new columns in the DB).
I want to know if I can do this automatically, without writing SQL manually?
It looks like the Peewee module does support migrations.
http://peewee.readthedocs.org/en/latest/peewee/playhouse.html#schema-migrations
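For reference, a minimal sketch of what those built-in migrations look like with playhouse.migrate; the database credentials, table, and column names here are made up:
from peewee import CharField, MySQLDatabase
from playhouse.migrate import MySQLMigrator, migrate

db = MySQLDatabase("my_db", user="root", password="secret")
migrator = MySQLMigrator(db)

# Add a new nullable column to an existing table without writing SQL by hand.
migrate(
    migrator.add_column("user", "nickname", CharField(null=True)),
)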
We developed https://github.com/keredson/peewee-db-evolve for our company's use; it sounds like it may be helpful for you.
Rather than manually writing migrations, db-evolve calculates the diff between the existing schema and your defined models. It then previews and applies the non-destructive SQL commands to bring your schema into line. We've found it to be a much more robust model for schema management. (For example, switching between arbitrary branches with different schema changes is trivial this way, vs. virtually impossible w/ manually authored migrations.)
Think of it as a non-destructive version of Peewee's create_tables(). (In fact we use it for exactly that all the time, to build the schema from scratch in tests.)
I've written a simple migration engine for Peewee: https://github.com/klen/peewee_migrate

Loading data from a (MySQL) database into Django without models

This might sound like a bit of an odd question - but is it possible to load data from a (in this case MySQL) table to be used in Django without the need for a model to be present?
I realise this isn't really the Django way, but given my current scenario, I don't really know how better to solve the problem.
I'm working on a site which, for one aspect, makes use of a table of data that has been bought from a third party. The columns of interest are likely to remain stable; however, the structure of the table could change with subsequent updates to the data set. The table is also massive (in terms of columns), so I'm not keen on typing out each field in the model one by one. I'd also like to leave the table intact, so coming up with a model which represents only the set of columns I am interested in is not really an ideal solution.
Ideally, I want to have this table in a database somewhere (possibly separate to the main site database) and access its contents directly using SQL.
You can always execute raw SQL directly against the database: see the docs.
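For example, a hedged sketch of that raw-SQL route (the table, columns, and the "thirdparty" database alias are placeholders):
from django.db import connections  # or django.db.connection for the default DB

def latest_rows(limit=10):
    # "thirdparty" would be an extra entry in settings.DATABASES pointing
    # at the database holding the bought-in table.
    with connections["thirdparty"].cursor() as cursor:
        cursor.execute(
            "SELECT col_a, col_b FROM thirdparty_table ORDER BY col_a DESC LIMIT %s",
            [limit],
        )
        return cursor.fetchall()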
Django has a feature called inspectdb for exactly this. For legacy databases like MySQL, it creates models automatically by inspecting your DB tables and stores them in your app's models.py, so you don't need to type out every column manually. But read the documentation carefully before creating the models, because it may affect the data in the DB. I hope this will be useful for you.
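For reference, the command is run from the project directory and its output can be redirected into the app's models.py:
python manage.py inspectdb > models.py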
I guess you can use any SQL library available for Python, for example: http://www.sqlalchemy.org/
You then just have to connect to your database, run your query, and use the data as you wish. I don't think you can use Django without its model system, but nothing prevents you from using another library for this in parallel.
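A minimal sketch of that approach with SQLAlchemy Core, no model classes involved; the connection URL and table name are placeholders:
from sqlalchemy import create_engine, text

engine = create_engine("mysql+pymysql://user:password@localhost/thirdparty_db")
with engine.connect() as conn:
    result = conn.execute(text("SELECT col_a, col_b FROM thirdparty_table"))
    for row in result:
        print(row.col_a, row.col_b)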

Transferring data from a DB2 DB to a greenplum DB

My company has decided to implement a datamart using Greenplum and I have the task of figuring out how to go about it. A ballpark figure of the amount of data to be transferred from the existing DB2 DB to the Greenplum DB is about 2 TB.
I would like to know :
1) Is the Greenplum DB the same as vanilla PostgreSQL? (I've worked on Postgres AS 8.3)
2) Are there any (free) tools available for this task (extract and import)?
3) I have some knowledge of Python. Is it feasible, even easy, to do this in a reasonable amount of time?
I have no idea how to do this. Any advice, tips and suggestions will be hugely welcome.
1) Greenplum is not vanilla postgres, but it is similar. It has some new syntax, but in general, is highly consistent.
2) Greenplum itself provides something called "gpfdist" which lets you listen on a port that you specify in order to bring in a file (but the file has to be split up). You want readable external tables. They are quite fast. Syntax looks like this:
CREATE READABLE EXTERNAL TABLE schema.ext_table
( thing int, thing2 int )
LOCATION (
'gpfdist://server:port1/path/to/filep1.txt',
'gpfdist://server:port2/path/to/filep2.txt',
'gpfdist://server:port3/path/to/filep3.txt'
) FORMAT 'text' (delimiter E'\t' null 'null' escape 'off') ENCODING 'UTF8';
CREATE TEMP TABLE import AS SELECT * FROM schema.ext_table DISTRIBUTED RANDOMLY;
If you play by their rules and your data is clean, the loading can be blazing fast.
3) You don't need python to do this, although you could automate it by using python to kick off the gpfdist processes, and then sending a command to psql that creates the external table and loads the data. Depends on what you want to do though.
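A rough automation sketch along those lines; the host names, port, paths, and load.sql file are all placeholders:
import subprocess
import time

# Serve the split files in /data/exports over HTTP on port 8081.
gpfdist = subprocess.Popen(["gpfdist", "-d", "/data/exports", "-p", "8081"])
time.sleep(2)  # crude wait for gpfdist to start listening

try:
    # load.sql would contain the CREATE READABLE EXTERNAL TABLE and
    # INSERT ... SELECT statements shown above.
    subprocess.run(
        ["psql", "-h", "gp-master", "-d", "datamart", "-f", "load.sql"],
        check=True,
    )
finally:
    gpfdist.terminate()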
Many of Greenplum's utilities are written in python and the current DBMS distribution comes with python 2.6.2 installed, including the pygresql module which you can use to work inside the GPDB.
For data transfer into Greenplum, I've written Python scripts that connect to the source (Oracle) DB using cx_Oracle and then dump the output either to flat files or named pipes. gpfdist can read from either sort of source and load the data into the system.
Generally, it is really slow if you use SQL insert or merge to import big bulk data.
The recommended way is to use external tables defined over file-based, web-based, or gpfdist-protocol hosted files.
Greenplum also has a utility named gpload, which can be used to define your transfer jobs, including source, output, and mode (insert, update, or merge).
1) It's not vanilla postgres
2) I have used pentaho data integration with good success in various types of data transfer projects.
It allows for complex transformations and multi-threaded, multi-step loading of data if you design your steps carefully.
Also, I believe Pentaho supports Greenplum specifically, though I have no experience of this.

What should I use for a python program to do stuff with tables of data?

I want to be able to add daily info to each object and want to have the ability to delete info x days old easily. With the tables I need to look at the trends and do stuff like selecting objects which match some criteria.
Edit: I asked this because I'm not able to think of a way to implement deleting old data easily because you cannot delete tables in sqlite
Using sqlite would be the best option: it's file based, easy to use, you can do lookups with SQL, and it's built into Python so you don't need to install anything.
→ http://docs.python.org/library/sqlite3.html
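A minimal sqlite3 sketch of the add-daily-info / delete-old-rows workflow; the table and column names are invented:
import sqlite3

conn = sqlite3.connect("daily_data.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS daily_info ("
    " object_id INTEGER,"
    " recorded_on DATE,"
    " value REAL)"
)
# Add today's info for one object.
conn.execute(
    "INSERT INTO daily_info (object_id, recorded_on, value) "
    "VALUES (?, DATE('now'), ?)",
    (42, 3.14),
)
# Delete info more than 30 days old instead of dropping whole tables.
conn.execute("DELETE FROM daily_info WHERE recorded_on < DATE('now', '-30 days')")
conn.commit()
conn.close()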
If your question means that you are just going to be using "table like data" but not bound to a DB, look into using this Python module: Module for table-like syntax
If you are going to be binding to a back end, and not distributing your data among computers, then SQLite is the way to go.
A "proper" database would probably be the way to go. If your application only runs on one computer and the database doesn't get to big, sqlite is good and easy to use with python (standard module sqlite3, see the Library Reference for more information)
take a look at the sqlite3 module, it lets you create a single-file database (no server to setup) that will let you perform sql queries. It's part of the standard library in python, so you don't need to install anythin additional.
http://docs.python.org/library/sqlite3.html

How to efficiently manage frequent schema changes using sqlalchemy?

I'm programming a web application using sqlalchemy. Everything was smooth during the first phase of development when the site was not in production. I could easily change the database schema by simply deleting the old sqlite database and creating a new one from scratch.
Now the site is in production and I need to preserve the data, but I still want to keep my original development speed by easily converting the database to the new schema.
So let's say that I have model.py at revision 50 and model.py at revision 75, describing the schema of the database. Between those two schemas most changes are trivial; for example, a new column is declared with a default value and I just want to add this default value to old records.
Eventually a few changes may not be trivial and require some pre-computation.
How do (or would) you handle fast-changing web applications with, say, one or two new versions of the production code per day?
By the way, the site is written in Pylons if this makes any difference.
Alembic is a new database migrations tool, written by the author of SQLAlchemy. I've found it much easier to use than sqlalchemy-migrate. It also works seamlessly with Flask-SQLAlchemy.
Auto generate the schema migration script from your SQLAlchemy models:
alembic revision --autogenerate -m "description of changes"
Then apply the new schema changes to your database:
alembic upgrade head
More info here: http://readthedocs.org/docs/alembic/
What we do.
Use "major version"."minor version" identification of your applications. Major version is the schema version number. The major number is no some random "enough new functionality" kind of thing. It's a formal declaration of compatibility with database schema.
Release 2.3 and 2.4 both use schema version 2.
Release 3.1 uses the version 3 schema.
Make the schema version very, very visible. For SQLite, this means keep the schema version number in the database file name. For MySQL, use the database name.
Write migration scripts. 2to3.py, 3to4.py. These scripts work in two phases. (1) Query the old data into the new structure creating simple CSV or JSON files. (2) Load the new structure from the simple CSV or JSON files with no further processing. These extract files -- because they're in the proper structure, are fast to load and can easily be used as unit test fixtures. Also, you never have two databases open at the same time. This makes the scripts slightly simpler. Finally, the load files can be used to move the data to another database server.
It's very, very hard to "automate" schema migration. It's easy (and common) to have database surgery so profound that an automated script can't easily map data from old schema to new schema.
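A hypothetical sketch of such a 2to3.py-style script; the table, columns, and file names are invented, and it assumes SQLite for brevity:
import json
import sqlite3

def extract(old_db_path, out_path):
    """Phase 1: query the old data into a simple JSON file, already reshaped."""
    conn = sqlite3.connect(old_db_path)
    rows = conn.execute("SELECT id, name, price FROM product").fetchall()
    conn.close()
    # Reshape into the *new* structure while extracting (here: price -> cents).
    records = [{"id": r[0], "name": r[1], "price_cents": int(r[2] * 100)} for r in rows]
    with open(out_path, "w") as fh:
        json.dump(records, fh)

def load(new_db_path, in_path):
    """Phase 2: load the new structure with no further processing."""
    with open(in_path) as fh:
        records = json.load(fh)
    conn = sqlite3.connect(new_db_path)
    conn.executemany(
        "INSERT INTO product (id, name, price_cents) VALUES (:id, :name, :price_cents)",
        records,
    )
    conn.commit()
    conn.close()

if __name__ == "__main__":
    extract("app_schema2.db", "product.json")
    load("app_schema3.db", "product.json")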
Use sqlalchemy-migrate.
It is designed to support an agile approach to database design, and make it easier to keep development and production databases in sync, as schema changes are required. It makes schema versioning easy.
Think of it as a version control for your database schema. You commit each schema change to it, and it will be able to go forwards/backwards on the schema versions. That way you can upgrade a client and it will know exactly which set of changes to apply on that client's database.
It does what S.Lott proposes in his answer, automatically for you. Makes a hard thing easy.
The best way to deal with your problem is to reflect your schema instead of doing it the declarative way. I wrote an article about the reflective approach here:
http://petrushev.wordpress.com/2010/06/16/reflective-approach-on-sqlalchemy-usage/
but there are other resources about this too. In this manner, every time you make changes to your schema, all you need to do is restart the app and the reflection will fetch the new metadata for the changed tables. This is quite fast and sqlalchemy does it only once per process. Of course, you'll have to manage the relationship changes you make yourself.
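A minimal sketch of the reflective approach; the database URL and table name are placeholders:
from sqlalchemy import MetaData, create_engine, select

engine = create_engine("sqlite:///app.db")
metadata = MetaData()
metadata.reflect(bind=engine)  # runs once per process, at startup

product = metadata.tables["product"]  # columns come from the live schema
with engine.connect() as conn:
    for row in conn.execute(select(product)):
        print(row)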
