I want to create a SQL-like interface for a special data source which I can query using Python. That is, I have a data source with a number of named iterable containers of entities, and I would like to be able to use SQL to filter, join, sort and preferably update/insert/delete as well.
To my understanding, sqlite3's virtual table functionality is quite suitable for this task. Is it possible to create the necessary bindings in Python? I understand that the glue has to be C-like, but my hope is that someone has already written a Python wrapper in C or with ctypes.
I will also accept an answer for a better/easier way of doing this.
You can do this by registering a virtual table in SQLite with the APSW Python bindings.
There is an example of talking to CouchDB using APSW.
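Below is a minimal, untested sketch of what the APSW virtual-table protocol looks like, exposing an in-memory list of dicts as a read-only table; the data, column names and module name are made up, and depending on your APSW version the registration call is spelled createmodule or create_module.
import apsw

ROWS = [{"name": "alpha", "size": 10}, {"name": "beta", "size": 20}]
COLUMNS = ["name", "size"]

class Source:
    # Called when the virtual table is created/connected; returns the table schema.
    def Create(self, db, modulename, dbname, tablename, *args):
        schema = "CREATE TABLE x (%s)" % ", ".join(COLUMNS)
        return schema, Table()
    Connect = Create

class Table:
    def BestIndex(self, constraints, orderbys):
        return None  # no index support: SQLite does a full scan and filters itself
    def Open(self):
        return Cursor()
    def Disconnect(self):
        pass
    Destroy = Disconnect

class Cursor:
    def Filter(self, indexnum, indexname, constraintargs):
        self.pos = 0
    def Eof(self):
        return self.pos >= len(ROWS)
    def Rowid(self):
        return self.pos
    def Column(self, col):
        if col == -1:
            return self.Rowid()
        return ROWS[self.pos][COLUMNS[col]]
    def Next(self):
        self.pos += 1
    def Close(self):
        pass

connection = apsw.Connection(":memory:")
connection.createmodule("pysource", Source())  # create_module in newer APSW releases
cursor = connection.cursor()
cursor.execute("CREATE VIRTUAL TABLE items USING pysource()")
for row in cursor.execute("SELECT name FROM items WHERE size > 15"):
    print(row)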
There's a similar capability for Perl, namely:
Create SQLite Virtual Table extensions in Perl
Finally, if you want to make a Python-based virtual table in PostgreSQL 9.1, check out http://multicorn.org/.
Sounds like you can use SQLAlchemy to persist these objects to sqlite3, possibly to an in-memory (:memory:) db, and issue both object-level and table-level (raw SQL) queries. You can also update/insert/delete them easily.
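A minimal sketch of that idea, assuming SQLAlchemy 1.4+ and a made-up Item class: persist objects to an in-memory SQLite database, then query them back both through the ORM and with raw SQL.
from sqlalchemy import Column, Integer, String, create_engine, text
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Item(Base):
    __tablename__ = "items"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    size = Column(Integer)

engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([Item(name="alpha", size=10), Item(name="beta", size=20)])
    session.commit()
    big = session.query(Item).filter(Item.size > 15).all()         # object-level query
    rows = session.execute(text("SELECT name FROM items")).all()   # raw SQL
    session.delete(big[0])                                         # delete works too
    session.commit()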
I need to read data from MySQL, process it with a Python script and write the result into Sqlite.
Also, I need to convert MySQL create definitions to Sqlite create definitions.
Are there any existing libraries for Python to convert MySQL data types (including set, enum, timestamp, etc.) to Sqlite data types, or should I write it myself?
Depending on your use case, you could use an ORM library like peewee for abstracting away the MySQL and Sqlite databases.
One possible way of approaching your problem would be to use peewee's model generator to create models for the MySQL database, which you can later reuse for the Sqlite one, using this example as a reference.
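A rough sketch of that approach, with made-up table/column names and connection details (in practice the model class would come from the generator): read the rows out of MySQL and write them into Sqlite by temporarily re-binding the model.
from peewee import CharField, IntegerField, Model, MySQLDatabase, SqliteDatabase

mysql_db = MySQLDatabase("source_db", host="localhost", user="user", password="secret")
sqlite_db = SqliteDatabase("target.db")

class Measurement(Model):          # in practice, generated by pwiz from the MySQL db
    sensor = CharField()
    value = IntegerField()
    class Meta:
        database = mysql_db

rows = list(Measurement.select().dicts())     # read everything from MySQL
with sqlite_db.bind_ctx([Measurement]):       # temporarily bind the model to Sqlite
    sqlite_db.create_tables([Measurement])
    Measurement.insert_many(rows).execute()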
I am building a database interface using Python's peewee module. I am trying to figure out how to insert data into an existing database where I do not know the schema.
My idea is to use playhouse.reflection.Introspector to find out the database schema, then use that information to create class objects which can then be inserted into the existing database.
So far I've gotten to:
from playhouse.reflection import Introspector

introspector = Introspector.from_database(database)
models = introspector.generate_models()
I don't know where to go from there.
1) Can I create database objects in this manner? What is the next step?
2) Is there an easier way to do this?
peewee includes an introspection tool called pwiz that can (basically) introspect a database and produce model definitions. It is run as a command-line script and dumps the model definitions to stdout, so invocation is like any other unix tool. Here is an example from the docs:
python -m pwiz -e postgresql my_postgres_db > mymodels.py
From there edit mymodels.py to get what you need.
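As a hypothetical sketch of what using the output looks like (SomeTable and its fields are invented here), the generated module gives you a database handle plus one model class per table:
from mymodels import database, SomeTable

database.connect()
SomeTable.create(name="example", value=42)    # insert into the existing table
for row in SomeTable.select().limit(5):       # or read it back
    print(row.name)
database.close()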
You could do this on the fly, but it would require a few steps and is hackish (not to mention pointless if you really don't know anything about the schema):
Run pwiz as an os command
Read it to pick out the model names
Import whatever you find
BUT
If you really don't know the schema to start with then you have no idea what the semantics of the database are anyway, which means whatever you find is literally meaningless. Unless you at least know some schema/table/column names you are hunting for (in which case you do know something about the schema) there isn't really much you can do with regard to inserting data (not in a sane way), though you could certainly dump data from the db. But if you just wanted a database dump then pg_dump would have been easier.
I suspect this is actually an X-Y problem. What problem is it you are trying to solve by using this technique? What effect is it supposed to achieve within the context of your system?
If you want to create a GUI, check out the sqlite_web project. It uses Peewee to create a web-based SQLite database manager.
In Perl, the DBI module is the standard way of interacting with DBs, where each DB vendor provides its own DBD module which is used by DBI. (It's somewhat similar to JDBC.) I can't figure out whether a similar model exists in Python. In the case of Postgres, I see there are pg and pgdb modules, where pgdb follows DB-API 2.0 and pg doesn't. Should I care about that? If I go with pgdb, should I expect the same interface from a MySQL db module which follows DB-API 2.0?
Thank you!
A popular module for interacting with Postgres in Python which is DB API 2.0 compliant is psycopg2 (http://initd.org/psycopg/docs/index.html).
That's the one I always use in my Python code to interact with Postgres. I find it straightforward to use, and it offers some nice extras that are fairly easy to add, such as dictionary-based cursors (i.e. DictCursor, where the rows are in a dictionary with the column names as keys, as opposed to an array).
There are also named cursors, where all you have to do is supply the cursor with a name, and psycopg2 will automatically create a server-side cursor for you with a default chunk size of 2000, which you can iterate over like any other Python iterable, with the subsequent fetches happening transparently in the background.
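For illustration, a short sketch of both extras (connection parameters and table names are placeholders):
import psycopg2
import psycopg2.extras

conn = psycopg2.connect("dbname=mydb user=me")

# Dictionary-based rows: access columns by name instead of by index.
with conn.cursor(cursor_factory=psycopg2.extras.DictCursor) as cur:
    cur.execute("SELECT id, name FROM things LIMIT 5")
    for row in cur:
        print(row["id"], row["name"])

# Named cursor: psycopg2 creates a server-side cursor and fetches rows in
# chunks (cur.itersize, 2000 by default) as you iterate.
with conn.cursor(name="big_scan") as cur:
    cur.execute("SELECT id FROM big_table")
    for row in cur:
        pass  # process rows without loading the whole result set into memory

conn.close()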
Yes, Python DB-API 2.0 is the standard API for interacting with databases in Python. Note, though, that DB-API is a very simple, low-level interface; by itself it does not make it easy to write database queries that are portable across different databases, since different databases implement SQL differently.
For a higher-level interface that does help you write portable database applications, check out SQLAlchemy. Both SQLAlchemy Core and the ORM provide a language for querying databases in a portable way.
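A small sketch of the portability point, assuming SQLAlchemy 1.4+: the same Core query runs unchanged against SQLite or Postgres; only the engine URL differs.
from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine, select

engine = create_engine("sqlite:///:memory:")   # or "postgresql+psycopg2://user:pw@host/db"
metadata = MetaData()
users = Table("users", metadata,
              Column("id", Integer, primary_key=True),
              Column("name", String(50)))
metadata.create_all(engine)

with engine.begin() as conn:                   # transaction commits on exit
    conn.execute(users.insert(), [{"name": "alice"}, {"name": "bob"}])
    result = conn.execute(select(users).where(users.c.name == "alice"))
    print(result.fetchall())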
My company has decided to implement a datamart using Greenplum and I have the task of figuring out how to go about it. A ballpark figure of the amount of data to be transferred from the existing DB2 DB to the Greenplum DB is about 2 TB.
I would like to know :
1) Is the Greenplum DB the same as vanilla PostgreSQL? (I've worked on Postgres AS 8.3)
2) Are there any (free) tools available for this task (extract and import)?
3) I have some knowledge of Python. Is it feasible, even easy, to do this in a reasonable amount of time?
I have no idea how to do this. Any advice, tips and suggestions will be hugely welcome.
1) Greenplum is not vanilla postgres, but it is similar. It has some new syntax, but in general, is highly consistent.
2) Greenplum itself provides something called "gpfdist" which lets you listen on a port that you specify in order to bring in a file (but the file has to be split up). You want readable external tables. They are quite fast. Syntax looks like this:
CREATE READABLE EXTERNAL TABLE schema.ext_table
( thing int, thing2 int )
LOCATION (
'gpfdist://server:port1/path/to/filep1.txt',
'gpfdist://server:port2/path/to/filep2.txt',
'gpfdist://server:port3/path/to/filep3.txt'
) FORMAT 'text' (delimiter E'\t' null 'null' escape 'off') ENCODING 'UTF8';
CREATE TEMP TABLE import AS SELECT * FROM schema.ext_table DISTRIBUTED RANDOMLY;
If you play to their rules and your data is clean, the loading can be blazing fast.
3) You don't need python to do this, although you could automate it by using python to kick off the gpfdist processes, and then sending a command to psql that creates the external table and loads the data. Depends on what you want to do though.
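A rough sketch of that automation, with hosts, paths, ports, database and table names all as placeholders: start one gpfdist per file chunk, run the load SQL through psql, then shut gpfdist down. It assumes the external table from the CREATE READABLE EXTERNAL TABLE statement above already exists.
import subprocess

ports = [8081, 8082, 8083]
gpfdist_procs = [
    subprocess.Popen(["gpfdist", "-d", "/path/to", "-p", str(port)])  # serve the split files
    for port in ports
]

load_sql = "INSERT INTO schema.target_table SELECT * FROM schema.ext_table;"
subprocess.run(["psql", "-h", "gp-master", "-d", "mydb", "-c", load_sql], check=True)

for proc in gpfdist_procs:
    proc.terminate()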
Many of Greenplum's utilities are written in python and the current DBMS distribution comes with python 2.6.2 installed, including the pygresql module which you can use to work inside the GPDB.
For data transfer into Greenplum, I've written python scripts that connect to the source (Oracle) DB using cx_Oracle and then dump that output either to flat files or named pipes. gpfdist can read from either sort of source and load the data into the system.
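A simplified sketch of that pattern (connection string, query and output path are placeholders): read from Oracle with cx_Oracle and write tab-delimited rows to a flat file that gpfdist can then serve.
import csv
import cx_Oracle

conn = cx_Oracle.connect("user/password@source-host/ORCL")
cur = conn.cursor()
cur.arraysize = 5000                          # fetch rows in larger batches

with open("/path/to/filep1.txt", "w", newline="") as out:
    writer = csv.writer(out, delimiter="\t")
    cur.execute("SELECT thing, thing2 FROM source_table")
    for row in cur:
        writer.writerow(row)

conn.close()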
Generally, it is really slow if you use SQL INSERT or MERGE to import bulk data.
The recommended way is to use external tables that you define over file-based, web-based or gpfdist-protocol hosted files.
Greenplum also has a utility named gpload, which can be used to define your transfer jobs: source, output, and mode (insert, update or merge).
1) It's not vanilla postgres
2) I have used Pentaho Data Integration with good success in various types of data transfer projects.
It allows for complex transformations and multi-threaded, multi-step loading of data if you design your steps carefully.
Also, I believe Pentaho supports Greenplum specifically, though I have no experience of this.
I want to be able to add daily info to each object and want to have the ability to delete info x days old easily. With the tables I need to look at the trends and do stuff like selecting objects which match some criteria.
Edit: I asked this because I'm not able to think of a way to implement deleting old data easily because you cannot delete tables in sqlite
Using sqlite would be the best option: it is file-based, easy to use, you can do lookups with SQL, and it's built into Python, so you don't need to install anything.
→ http://docs.python.org/library/sqlite3.html
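For example, a small sketch (table and column names are made up) of adding daily info and deleting rows older than x days with plain SQL, no table dropping needed:
import sqlite3

conn = sqlite3.connect("daily.db")
conn.execute("""CREATE TABLE IF NOT EXISTS daily_info (
                    object_id INTEGER,
                    recorded  DATE,
                    value     REAL)""")

# add today's info for an object
conn.execute("INSERT INTO daily_info VALUES (?, date('now'), ?)", (1, 42.0))

# drop everything older than 30 days
conn.execute("DELETE FROM daily_info WHERE recorded < date('now', '-30 days')")

# trend-style query: objects matching some criteria
rows = conn.execute("""SELECT object_id, avg(value)
                       FROM daily_info
                       GROUP BY object_id
                       HAVING avg(value) > 10""").fetchall()
conn.commit()
conn.close()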
If your question means that you are just going to be using "table-like data" but not bound to a db, look into using this Python module: Module for table-like syntax
If you are going to be binding to a back end, and not distributing your data among computers, then SQLite is the way to go.
A "proper" database would probably be the way to go. If your application only runs on one computer and the database doesn't get too big, sqlite is good and easy to use with Python (standard module sqlite3; see the Library Reference for more information).
Take a look at the sqlite3 module: it lets you create a single-file database (no server to set up) that will let you perform SQL queries. It's part of the standard library in Python, so you don't need to install anything additional.
http://docs.python.org/library/sqlite3.html