There is a Java paradigm for database access embodied in the Java DataSource. This object creates a useful abstraction around the creation of database connections. The DataSource object holds the database configuration, but only creates database connections on request. This allows you to keep all database configuration and initialization code in one place, and makes it easy to change the database implementation, or use a mock database for testing.
I am currently working on a Python project which uses cx_Oracle. In cx_Oracle, one gets a connection directly from the module:
import cx_Oracle as dbapi
connection = dbapi.connect(connection_string)
# At this point I am assuming that a real connection has been made to the database.
# Is this true?
I am trying to find a parallel to the DataSource in cx_Oracle. I could easily build one by writing a new class that wraps cx_Oracle, but I was wondering if this is the right way to do it in Python.
You'll find relevant information on how to access databases in Python by looking at PEP 249: Python Database API Specification v2.0. cx_Oracle conforms to this specification, as do many database drivers for Python.
In this specification a Connection object represents a database connection, but there is no built-in pooling. Tools such as SQLAlchemy do provide pooling facilities, and although SQLAlchemy is often billed as an ORM, it does not have to be used as such and offers nice abstractions for use on top of SQL engines.
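As a concrete illustration, an SQLAlchemy Engine is probably the closest parallel to a Java DataSource: it holds the configuration and only hands out connections (from its pool) on request. A minimal sketch, assuming SQLAlchemy is installed; the URL and credentials are placeholders:

from sqlalchemy import create_engine, text

# the engine stores configuration; no connection is opened yet
engine = create_engine("oracle+cx_oracle://user:secret@oracletestdb")

with engine.connect() as conn:  # a pooled connection is checked out here
    result = conn.execute(text("SELECT 1 FROM dual"))
    print(result.scalar())
# leaving the block returns the connection to the pool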
If you do want to do object-relational mapping, then SQLAlchemy does the business, and you can consider either its own declarative syntax or another layer such as Elixir, which sits on top of SQLAlchemy and provides increased ease of use for the more common use cases.
I don't think there is a "right" way to do this in Python, except maybe to go one step further and use another layer between yourself and the database.
Depending on the reason for wanting to use the DataSource concept (which I've only ever come across in Java), SQLAlchemy (or something similar) might solve the problems for you, without you having to write something from scratch.
If that doesn't fit the bill, writing your own wrapper sounds like a reasonable solution.
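For what it's worth, such a wrapper can be very small. A sketch of a DataSource-style class around cx_Oracle (the class and method names here are made up):

import cx_Oracle

class DataSource(object):
    """Holds connection settings; opens a connection only on request."""

    def __init__(self, connection_string):
        self.connection_string = connection_string

    def get_connection(self):
        # the real connection to the database is made here, not in __init__
        return cx_Oracle.connect(self.connection_string)

ds = DataSource("user/secret@oracletestdb")
conn = ds.get_connection()

Swapping in a mock for testing is then just a matter of passing any object with a get_connection() method.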
Yes, Python has a similar abstraction.
This is from our local build regression test, where we make sure we can talk to all of our databases whenever we build a new Python.
if database == SYBASE:
    import Sybase
    conn = Sybase.connect('sybasetestdb', 'mh', 'secret')
elif database == POSTGRESQL:
    import pgdb
    conn = pgdb.connect('pgtestdb:mh:secret')
elif database == ORACLE:
    import cx_Oracle
    conn = cx_Oracle.connect("mh/secret@oracletestdb")

curs = conn.cursor()
curs.execute('select a, b from testtable')
for row in curs.fetchall():
    print(row)
(Note: this is the simple version; in our multidb-aware code we have a dbconnection class that wraps this logic.)
I just sucked it up and wrote my own. It let me add things like abstracting the database (Oracle/MySQL/Access/etc.), logging, and error handling with transaction rollbacks.
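The error-handling part of such a wrapper can be as small as a context manager around any DB-API connection. A sketch (not the poster's actual code):

import logging
from contextlib import contextmanager

log = logging.getLogger(__name__)

@contextmanager
def transaction(conn):
    """Yield a cursor; commit on success, log and roll back on failure."""
    cursor = conn.cursor()
    try:
        yield cursor
        conn.commit()
    except Exception:
        log.exception("transaction failed, rolling back")
        conn.rollback()
        raise
    finally:
        cursor.close()

# usage, with any DB-API 2.0 connection:
# with transaction(conn) as cur:
#     cur.execute("UPDATE testtable SET a = 1")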
Related
I am writing a Django project that makes extensive use of a Python package I'm writing (let's call it foo, for convenience).
The package foo will consist mainly of functions and classes that munge data obtained from a backend database. I want to write the package in such a way that it has no dependency on Django, and can be used in other projects outside of Django.
I am thinking of writing the package so that functions accept a database connection, and classes use IoC for the database connection. That way I can obtain a database connection from Django and pass it to the DBAL package when using it in Django, and instantiate a DB connection by other means when using the package outside Django.
I have two questions:
Is this an acceptable way of approaching this problem (i.e. are there any gotchas)?
Where/how do I obtain a database connection within django?
You can get a cursor object from Django like so:
from django.db import connection

def my_custom_sql(self):
    with connection.cursor() as cursor:
        cursor.execute("UPDATE bar SET foo = 1 WHERE baz = %s", [self.baz])
        cursor.execute("SELECT foo FROM bar WHERE baz = %s", [self.baz])
        row = cursor.fetchone()
    return row
This example is taken from the Django documentation on performing raw SQL queries.
In terms of being an acceptable way to write a package, sure. It's just a loose form of Dependency Injection and makes unit testing much easier, for one thing.
The only 'gotcha' is that the cursors passed in by users of your package may not present the same interface. SQLAlchemy cursors may not have the same properties and methods as Django ORM cursors.
I'd say ensuring that the object conforms to your required interface is an acceptable burden to push onto users, as long as you thoroughly document what that object should expose.
You can always provide custom workarounds for the most popular options in future, if you find it would be valuable to do so. (e.g. provide your own cursor adaptor classes).
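To make the injection pattern concrete, here is a sketch of a package function that accepts any DB-API 2.0 connection (the function and table names are hypothetical):

def count_rows(conn, table):
    """Works with any DB-API 2.0 connection the caller hands in."""
    cursor = conn.cursor()
    try:
        # table must come from trusted code, never from user input
        cursor.execute("SELECT COUNT(*) FROM %s" % table)
        return cursor.fetchone()[0]
    finally:
        cursor.close()

# inside Django:
#   from django.db import connection
#   n = count_rows(connection, "myapp_bar")
# outside Django:
#   import psycopg2
#   n = count_rows(psycopg2.connect("dbname=foo"), "bar")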
I'm prototyping an idea of mine for which I must be able to connect to multiple databases (of multiple types) using Django. I'm aware that it's possible to define several databases in settings.py, and that we can specify which database a manager should use with using('db_name'). Unfortunately I can't hard-code my multiple databases in the settings file, since they are dynamic (I mean I don't know at "compile time" which, and how many, external databases I will use). The problem is similar to this one, already asked and answered here: Django: dynamic database file
...but the accepted answer is, IMO, just a hack, and I have several concerns about the reliability and security of such an approach.
So my question is: is there a clean and safe way to establish a database connection dynamically via a somewhat lower-level API (like SQLAlchemy's create_engine('db_url'))? If not, is it possible to integrate SQLAlchemy into Django (in a reliable and fully working way)?
P.S. Another thing I would like to avoid is having to specify, for each ORM action, which database to use with using(); instead I like the idea of SQLAlchemy's transaction, or alternatively a context manager with which I can write something like:
with active_db('some_db') as db:
# do ORM operations...
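On the SQLAlchemy side, an active_db-style context manager of the kind asked about is straightforward to sketch (assuming SQLAlchemy; the URL is a placeholder). Whether it can be wired cleanly into Django's ORM is exactly the open question here; this covers only the raw-connection side:

from contextlib import contextmanager
from sqlalchemy import create_engine, text

@contextmanager
def active_db(db_url):
    """Open an engine for a URL known only at runtime; dispose on exit."""
    engine = create_engine(db_url)
    try:
        with engine.connect() as conn:
            yield conn
    finally:
        engine.dispose()  # release any pooled connections

# with active_db("postgresql://user:secret@host/some_db") as db:
#     rows = db.execute(text("SELECT 1")).fetchall()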
In Perl, the DBI module is the standard way of interacting with databases; each DB vendor provides its own DBD module which is used by DBI. (It's somewhat similar to JDBC.) I can't figure out whether a similar model exists in Python. In the case of Postgres, I see there are pg and pgdb modules, where pgdb follows DB-API 2.0 and pg doesn't. Should I care about that? If I go with pgdb, should I expect the same interface from a MySQL db module that follows DB-API 2.0?
Thank you!
A popular module for interacting with Postgres in Python which is DB API 2.0 compliant is psycopg2 (http://initd.org/psycopg/docs/index.html).
That's the one I always use in my Python code to interact with Postgres. I find it straightforward to use, and it offers some nice extras that are fairly easy to add, such as dictionary-based cursors (i.e. DictCursor, where the rows are in a dictionary with the column names as keys, as opposed to an array).
There are also named cursors, where all you have to do is give the cursor a name and psycopg2 will automatically create a server-side cursor for you with a default chunk size of 2000, which you can iterate over like any other Python iterable, with subsequent fetches happening transparently in the background.
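A sketch of both features, with placeholder connection parameters and table names:

import psycopg2
import psycopg2.extras

conn = psycopg2.connect("dbname=test user=mh")

# dictionary-based rows via DictCursor
with conn.cursor(cursor_factory=psycopg2.extras.DictCursor) as cur:
    cur.execute("SELECT id, name FROM users")
    for row in cur:
        print(row["name"])

# a named cursor becomes a server-side cursor; rows arrive in chunks
# (cur.itersize, default 2000) as you iterate
with conn.cursor(name="big_scan") as cur:
    cur.execute("SELECT * FROM big_table")
    for row in cur:
        print(row)

Note that named cursors must be used inside a transaction, which psycopg2 opens for you by default.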
Yes, the Python DB-API 2.0 is the standard API for interacting with databases in Python. Note, though, that the DB-API is a very simple, low-level interface; by itself it does not make it easy to write queries that are portable across databases, because different databases implement SQL differently.
For a higher-level interface that does help you write portable database applications, check out SQLAlchemy. Both SQLAlchemy Core and the ORM provide a language for querying databases in a portable way.
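To make the trade-off concrete: with the DB-API, only the connect() call (and the driver's paramstyle) is driver-specific; the cursor/execute/fetch shape is the same everywhere. A sketch:

def fetch_all(conn, sql, params=()):
    # identical for any DB-API 2.0 driver; only how you obtained conn differs
    cursor = conn.cursor()
    try:
        cursor.execute(sql, params)
        return cursor.fetchall()
    finally:
        cursor.close()

# import pgdb;    conn = pgdb.connect('pgtestdb:mh:secret')
# import MySQLdb; conn = MySQLdb.connect(db='testdb', user='mh')
# either way:
# rows = fetch_all(conn, 'SELECT a, b FROM testtable')

What the DB-API does not standardize is the SQL itself, which is where SQLAlchemy comes in.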
I am writing a quick and dirty script which requires interaction with a database (PG).
The script is a pragmatic, tactical solution to an existing problem. However, I envisage that the script will evolve over time into a more "refined" system. Given that it is currently being put together very quickly (i.e. I don't have time to pore over huge reams of documentation), I am tempted to go the quick and dirty route, using psycopg2.
The advantages of psycopg2 (as I currently understand them) are:
written in C, so faster than SQLAlchemy (written in Python)?
no abstraction layer over the DB-API, since it works with one database and one database only (implication: fast)
(for now) I don't need an ORM, so I can directly execute my SQL statements without having to learn a new ORM syntax (i.e. it's lightweight)
Disadvantages:
I KNOW that I will want an ORM further down the line
psycopg2 is ("dated"?) - don't know how long it will remain around for
Are my perceptions of SqlAlchemy (slow/interpreted, bloated, steep learning curve) true - IS there anyway I can use sqlAlchemy in the "rough and ready" way I want to use psycopg - namely:
execute SQL statements directly without having to mess about with the ORM layer, etc.
Any examples of doing this available?
SQLAlchemy is an ORM; psycopg2 is a database driver. These are completely different things: SQLAlchemy generates SQL statements, and psycopg2 sends SQL statements to the database. SQLAlchemy depends on psycopg2 or another database driver to communicate with the database!
As a rather complex software layer, SQLAlchemy does add some overhead, but it is also a huge boost to development speed, at least once you have learned the library. SQLAlchemy is an excellent library and will teach you the whole ORM concept, but if you don't want to have SQL statements generated for you in the first place, then you don't want SQLAlchemy.
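That said, SQLAlchemy can also run hand-written SQL with no ORM involved, which is the "rough and ready" usage asked about. A sketch (the URL and table are placeholders):

from sqlalchemy import create_engine, text

engine = create_engine("postgresql+psycopg2://mh:secret@localhost/testdb")

with engine.connect() as conn:
    result = conn.execute(
        text("SELECT a, b FROM testtable WHERE a > :min"),
        {"min": 10},
    )
    for row in result:
        print(row)

You get psycopg2 underneath, plus connection pooling, and a painless upgrade path to the ORM later.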
To talk to a database at all, you need a driver. If you use a client such as SQL*Plus for Oracle or the mysql command-line client for MySQL, it runs your queries directly; such clients ship with the database server.
To communicate with a database from a language such as Java, C, Python, or C#, you need a driver for that database. psycopg2 is the driver for running queries against PostgreSQL from Python.
SQLAlchemy is an ORM, which is not the same thing as a database driver. It gives you flexibility, so you can write your code without database-specific SQL: an ORM provides database independence for the programmer. If you call object.save in an ORM, it checks which database is associated with that object and generates an INSERT query appropriate to the backend database.
I have a script with several functions that all need to make database calls. I'm trying to get better at writing clean code rather than just throwing together scripts with horrible style. What is generally considered the best way to establish a global database connection that can be accessed anywhere in the script, but is not susceptible to errors such as accidentally redefining the variable holding the connection? I imagine I should be putting everything in a module? Any links to actual code would be very useful as well. Thanks.
If you are working with Python and databases, you cannot afford not to look at SQLAlchemy:
SQLAlchemy is the Python SQL toolkit and Object Relational Mapper that gives application developers the full power and flexibility of SQL. It provides a full suite of well known enterprise-level persistence patterns, designed for efficient and high-performing database access, adapted into a simple and Pythonic domain language.
I have built very complex databases with a surprisingly small amount of code (a few hundred lines). The schema definition is almost self-documenting, the objects used for the Object Relational Mapper are Plain Old Python Objects (i.e., what you already have), and the querying API is almost obvious. In addition, the documentation is excellent: many online examples, fully documented API, and an O'Reilly book which, while far from perfect, does take you from zero to dangerous in a few evenings.
If you don't want to use the Object Relational Mapper, you can always fall back to plain connections and literal SQL. Also, the code is portable and database independent (the same code will work with MySQL, Oracle, SQLite, and other database managers).
SQLAlchemy's Engine maintains a connection pool and will automatically take care of connection handling (what you mention as your concern).
The best way to understand its power is probably to work through one of the many tutorials that a search for sqlalchemy tutorial will turn up.
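One common shape for the module the questioner describes, keeping the engine private so other code cannot accidentally rebind it (a sketch; the URL is a placeholder):

# db.py -- the only module that knows how to reach the database
from sqlalchemy import create_engine

_engine = None  # leading underscore discourages rebinding from outside

def get_engine():
    global _engine
    if _engine is None:
        _engine = create_engine("postgresql://user:secret@localhost/appdb")
    return _engine

def get_connection():
    # connections come from the engine's pool; callers should close them
    return get_engine().connect()

# elsewhere in the script:
#   from db import get_connection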
Use a model system/ORM system.