Developing a django project that uses a third party DBAL package - python

I am writing a Django project that makes extensive use of a Python package I'm writing (let's call it foo, for convenience).
The package foo will consist mainly of functions and classes that munge data obtained from a backend database. I want to write the package in such a way that it has no dependency on Django, and can be used in other projects outside of Django.
I am thinking of writing the package so that functions accept a database connection, and classes use IoC (inversion of control) for the database connection. That way, I can obtain a database connection from Django and pass it to the DBAL package when using it in Django, and instantiate a DB connection by other means when using the package outside Django.
I have two questions:
Is this an acceptable way of approaching this problem (i.e. are there any gotchas)?
Where/how do I obtain a database connection within Django?

You can get a cursor object from Django like so:
from django.db import connection

def my_custom_sql(self):
    with connection.cursor() as cursor:
        cursor.execute("UPDATE bar SET foo = 1 WHERE baz = %s", [self.baz])
        cursor.execute("SELECT foo FROM bar WHERE baz = %s", [self.baz])
        row = cursor.fetchone()
    return row
This example is taken from their docs for custom SQL queries here
In terms of being an acceptable way to write a package, sure. It's just a loose form of Dependency Injection and makes unit testing much easier, for one thing.
The only 'gotcha' is that the cursors passed in by users of your package may not present the same interface. SQLAlchemy cursors may not have the same properties and methods as Django ORM cursors.
I'd say ensuring that the object conforms to your required interface is an acceptable burden to push onto users, as long as you thoroughly document what that object should expose.
You can always provide custom workarounds for the most popular options in future, if you find it would be valuable to do so. (e.g. provide your own cursor adaptor classes).
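To make the idea concrete, here is a minimal sketch of the injection pattern being discussed; TermStore and its table are hypothetical names, not part of any real package:

class TermStore(object):
    def __init__(self, connection):
        # Inject any DB-API 2.0 style connection here: Django's
        # django.db.connection, sqlite3.connect(...), psycopg2, etc.
        self.connection = connection

    def get_term(self, term_id):
        cursor = self.connection.cursor()
        try:
            # Note: the parameter marker (%s, ?, :1, ...) depends on the
            # driver's paramstyle; see the DB-API discussion further below.
            cursor.execute("SELECT name FROM term WHERE id = %s", [term_id])
            return cursor.fetchone()
        finally:
            cursor.close()

Inside Django you would then simply pass in Django's connection:

from django.db import connection
store = TermStore(connection)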

Related

Dynamic database connection in Django

I'm just prototyping an idea of mine, for which I must be able to connect to multiple databases (of multiple types) using Django. I'm aware that it's possible to define several databases in settings.py, and that we can specify which database a manager should use with using('db_name'), but unfortunately I can't hard-code my multiple databases in the settings file, since they are dynamic (I mean, I don't know at "compile time" which, and how many, external databases I will use). The problem is similar to this one, already asked and answered here: Django: dynamic database file
...but the accepted answer is IMO just a hack, and I have several concerns about the reliability and security of such an approach.
So my question is: is there a clean and safe way to establish a database connection dynamically via a somewhat lower-level API (like SQLAlchemy's create_engine('db_url'))? If not, is it possible to integrate SQLAlchemy into Django (in a reliable and fully working way)?
P.S. Another thing I would like to avoid is having to specify for each ORM action which database to use with using(); instead, I like the idea of SQLAlchemy's transaction, or alternatively a context manager with which I can write something like:
with active_db('some_db') as db:
    # do ORM operations...
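For reference, the lower-level API the question points to, SQLAlchemy's create_engine, does allow exactly this kind of runtime connection. A minimal sketch (the URL and query are placeholders):

from sqlalchemy import create_engine

# The database URL is only known at runtime, e.g. read from user input
# or from a configuration table.
engine = create_engine('postgresql://user:secret@somehost/somedb')
conn = engine.connect()
rows = conn.execute('SELECT id, name FROM term').fetchall()
conn.close()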

Pattern for a Flask App using (only) SQLAlchemy Core

I have a Flask application with which I'd like to use SQLAlchemy Core (i.e. I explicitly do not want to use an ORM), similar to the "fourth way" described in the Flask docs:
http://flask.pocoo.org/docs/patterns/sqlalchemy/#sql-abstraction-layer
I'd like to know what would be the recommended pattern in terms of:
How to connect to my database (can I simply store a connection instance in the g.db variable, in before_request?)
How to perform reflection to retrieve the structure of my existing database (if possible, I'd like to avoid having to explicitly create any "model/table classes")
Correct: You would create a connection once per thread and access it using a threadlocal variable. As usual, SQLAlchemy has thought of this use-case and provided you with a pattern: Using the Threadlocal Execution Strategy
from sqlalchemy import create_engine

db = create_engine('mysql://localhost/test', strategy='threadlocal')
db.execute('SELECT * FROM some_table')
Note: If I am not mistaken, the example seems to mix up the names db and engine (which should be db as well, I think).
I think you can safely disregard the Note posted in the documentation, as this is explicitly what you want. As long as each transaction scope is linked to a thread (as is the case with the usual Flask setup), you are safe to use this. Just don't start messing with threadless stuff (but Flask chokes on that anyway).
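For comparison, the g-based pattern the question asks about (and which the linked Flask docs describe) would look roughly like this; a sketch, not part of the original answer:

from flask import Flask, g
from sqlalchemy import create_engine

app = Flask(__name__)
engine = create_engine('mysql://localhost/test')

@app.before_request
def before_request():
    # One connection per request, stored on the request-local g object.
    g.db = engine.connect()

@app.teardown_request
def teardown_request(exception):
    db = getattr(g, 'db', None)
    if db is not None:
        db.close()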
Reflection is pretty easy as described in Reflecting Database Objects. Since you don't want to create all the tables manually, SQLAlchemy offers a nice way, too: Reflecting All Tables at Once
from sqlalchemy import MetaData

meta = MetaData()
meta.reflect(bind=someengine)

users_table = meta.tables['users']
addresses_table = meta.tables['addresses']
I suggest you check that complete chapter concerning reflection.
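Once reflected, the tables can be queried with plain Core constructs. For example, reusing users_table and someengine from the snippet above:

from sqlalchemy import select

query = select([users_table.c.name]).where(users_table.c.id == 1)
row = someengine.execute(query).fetchone()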

flask-sqlalchemy or sqlalchemy

I am new to both Flask and SQLAlchemy. I have just started working on a Flask app, and I am using SQLAlchemy for now. I was wondering if there is any significant benefit I can get from using Flask-SQLAlchemy vs plain SQLAlchemy. I could not find enough motivation in http://packages.python.org/Flask-SQLAlchemy/index.html, or maybe I did not understand the value! I would appreciate your clarifications.
The main feature of Flask-SQLAlchemy is proper integration with a Flask application: it creates and configures the engine, connection and session, and configures them all to work with the Flask app.
This setup is quite complex, as we need to create the scoped session and handle it properly according to the Flask application request/response life-cycle.
In an ideal world that would be the only feature of Flask-SQLAlchemy, but actually it adds a few more things. Check out the docs for more info, or see this blog post with an overview of them: Demystifying Flask-SQLAlchemy (update: the original article is not available at the moment; there is a snapshot on the Web Archive).
When I first worked with Flask and SQLAlchemy, I didn't like this overhead, so I went over and extracted the session management code from the extension. This approach works, although I discovered that it is quite difficult to do this integration properly.
So the easier approach (which is used in another project I am working on) is to just drop Flask-SQLAlchemy in and not use any of the additional features it provides. You will have db.session, and you can use it as if it were a pure SQLAlchemy setup.
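A minimal sketch of that drop-in approach (the URI is a placeholder):

from flask import Flask
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:///app.db'
db = SQLAlchemy(app)

# db.session is an ordinary SQLAlchemy scoped session; use it exactly
# as you would in a pure SQLAlchemy setup:
#   db.session.add(obj); db.session.commit(); db.session.query(...)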
Flask-SQLAlchemy gives you a number of nice extras you would otherwise end up implementing yourself using SQLAlchemy.
Positive sides of using Flask-SQLAlchemy
Flask-SQLAlchemy handles session configuration, setup and teardown for you.
Gives you a declarative base model that makes querying and pagination easier (illustrated in the sketch after this list).
Backend-specific settings: Flask-SQLAlchemy scans installed libraries for Unicode support and, if that fails, automatically uses SQLAlchemy's Unicode handling.
Has a method called apply_driver_hacks that automatically sets sane defaults for things like the MySQL pool size.
Has nice built-in methods create_all() and drop_all() for creating and dropping all tables. Useful for testing, and in the Python command line if you did something stupid.
Gives you get_or_404() instead of get() and first_or_404() instead of first(). Code example at http://flask-sqlalchemy.pocoo.org/2.1/queries/
Automatically sets table names: Flask-SQLAlchemy converts your ClassName to class_name; this can be overridden by setting the __tablename__ class attribute.
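A short illustration of a few of these extras, assuming a db object from a Flask-SQLAlchemy setup like the one above; the User model is hypothetical:

class User(db.Model):  # table name 'user' is derived automatically
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(80))

# In a view: aborts with a 404 automatically if the row does not exist.
user = User.query.get_or_404(42)
users_page = User.query.paginate(page=1, per_page=20)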
Negative sides of using Flask-SQLAlchemy
Using Flask-SQLAlchemy will add extra difficulties if you ever need to migrate from Flask to, let's say, Pyramid. This is mainly due to the custom declarative base model of Flask-SQLAlchemy.
Using Flask-SQLAlchemy, you risk using a package with a much smaller community than SQLAlchemy itself, which is not likely to drop out of active development any time soon.
Some of the nice extras Flask-SQLAlchemy has can confuse you if you do not know they are there.
To be honest, I don't see any benefits. IMHO, Flask-SQLAlchemy creates an additional layer you don't really need. In our case, we have a fairly complex Flask application with multiple databases/connections (master-slave) using both ORM and Core, where, among other things, we need to control our sessions / DB transactions (e.g. dry-run vs commit modes). Flask-SQLAlchemy adds some additional functionality, such as automatic destruction of the session, which assumes some things for you that are very often not what you need.
The SQLAlchemy documentation clearly states that you should use Flask-SQLAlchemy (especially if you don't understand its benefits!):
[...] products such as Flask-SQLAlchemy [...] SQLAlchemy strongly recommends that these products be used as available.
You can find this quote, and a detailed motivation, in the second question of the Session FAQ.
As @schlamar suggests, Flask-SQLAlchemy is definitely a good thing. I'd just like to add some extra context to the point made there.
Don't feel like you are choosing one over the other. For example, let's say we want to grab all records from a table using a model with Flask-SQLAlchemy. It is as simple as:
Model.query.all()
For a lot of the simple cases, Flask-SQLAlchemy is going to be totally fine. The extra point I would like to make is that, if Flask-SQLAlchemy is not going to do what you want, there's no reason you can't use SQLAlchemy directly:
from sqlalchemy import func
from myapp.database import db

num_foo = db.session.query(func.count(OtherModel.id)).filter(OtherModel.is_deleted == False).as_scalar()
db.session.query(Model.id, num_foo.label('num_foo')).order_by('num_foo').all()
As you can see, we can easily jump from one to the other with no trouble, and in the second example we are in fact using the models defined with Flask-SQLAlchemy.
Here is an example of a benefit Flask-SQLAlchemy gives you over plain SQLAlchemy.
Suppose you're using flask_user.
flask_user automates the creation and authentication of user objects, so it needs to access your database. The class UserManager does this by calling through to something called an "adapter" which abstracts the database calls. You provide an adapter in the UserManager constructor, and the adapter must implement these functions:
class MyAdapter(DBAdapter):
    def get_object(self, ObjectClass, id):
        """ Retrieve one object specified by the primary key 'pk' """
        pass

    def find_all_objects(self, ObjectClass, **kwargs):
        """ Retrieve all objects matching the case sensitive filters in 'kwargs'. """
        pass

    def find_first_object(self, ObjectClass, **kwargs):
        """ Retrieve the first object matching the case sensitive filters in 'kwargs'. """
        pass

    def ifind_first_object(self, ObjectClass, **kwargs):
        """ Retrieve the first object matching the case insensitive filters in 'kwargs'. """
        pass

    def add_object(self, ObjectClass, **kwargs):
        """ Add an object of class 'ObjectClass' with fields and values specified in '**kwargs'. """
        pass

    def update_object(self, object, **kwargs):
        """ Update object 'object' with the fields and values specified in '**kwargs'. """
        pass

    def delete_object(self, object):
        """ Delete object 'object'. """
        pass

    def commit(self):
        pass
If you're using Flask-SQLAlchemy, you can use the built-in SQLAlchemyAdapter. If you're using SQLAlchemy (not Flask-SQLAlchemy), you might make different assumptions about the way objects are saved to the database (like the names of the tables), so you'll have to write your own adapter class.
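With Flask-SQLAlchemy, that wiring is roughly the following (using the older flask_user API this answer describes; db and User are assumed to be defined in your app):

from flask_user import UserManager, SQLAlchemyAdapter

db_adapter = SQLAlchemyAdapter(db, User)
user_manager = UserManager(db_adapter, app)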

Python DB-API: how to handle different paramstyles?

I'm implementing a Python ontology class that uses a database backend to store and query the ontology. The database schema is fixed (specified in advance), but I don't know what type of database engine is being used. However, I can rely on the fact that the Python interface of the database engine uses the Python DB-API 2.0 (PEP 249). A straightforward idea is to let the user pass a PEP 249-compliant Connection object to the constructor of my ontology, which will then use various hardcoded SQL queries to query the database:
class Ontology(object):
    def __init__(self, connection):
        self.connection = connection

    def get_term(self, term_id):
        cursor = self.connection.cursor()
        query = "SELECT * FROM term WHERE id = %s"
        cursor.execute(query, (term_id, ))
        [...]
My problem is that different database backends are allowed to support different parameter markers in the queries, defined by the paramstyle attribute of the backend module. For instance, if paramstyle = 'qmark', the interface supports the question mark style (SELECT * FROM term WHERE id = ?); paramstyle = 'numeric' means the numeric, positional style (SELECT * FROM term WHERE id = :1); paramstyle = 'format' means the ANSI C format string style (SELECT * FROM term WHERE id = %s). If I want to make my class be able to handle different database backends, it seems that I have to prepare for all the parameter marker styles. This seems to defeat the whole purpose of a common DB API for me as I can't use the same parameterised query with different database backends.
Is there a way around it, and if so, what is the best approach? The DB API does not specify the existence of a generic escaping function with which I can sanitize my values in the query, so doing the escaping manually is not an option. I don't want to add an extra dependency to the project either by using an even higher level of abstraction (SQLAlchemy, for instance).
This Python recipe might be able to help. It introduces an extra layer of abstraction to wrap parameters in its own Param class.
The PyDal project may also be closer to what you're trying to achieve: "PyDal makes it possible to use the same paramstyle and datetime types with any module that conforms to DBAPI 2.0. In addition, paramstyles and datetime types are configurable."
Strictly speaking, the problem is not caused by the DB API allowing this, but by the different databases which use different SQL syntaxes. The DB API module passes the exact query string to the database, along with the parameters. "Resolving" the parameter markers is done by the database itself, not by the DB API module.
That means that if you want to solve this, you have to introduce some higher level of abstraction. If you do not want to add extra dependencies, you will have to do it yourself. But rather than manually escaping and substituting values, you could dynamically replace the parameter markers in the query string with the ones the backend wants, based on the paramstyle of the backend module, and then pass the string, WITH parameter markers, to the db. For example, you could use '%s' everywhere and use Python string substitution to replace each '%s' with ':1', ':2', etc. if the db uses the 'numeric' style, and so on.
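A rough sketch of that substitution idea, assuming queries are written with '%s' markers and contain no literal '%s' inside string constants:

def adapt_query(query, paramstyle):
    if paramstyle in ('format', 'pyformat'):
        return query  # '%s' is already what the driver expects
    if paramstyle == 'qmark':
        return query.replace('%s', '?')
    if paramstyle == 'numeric':
        n = query.count('%s')
        for i in range(1, n + 1):
            # Replace one marker at a time with :1, :2, ...
            query = query.replace('%s', ':%d' % i, 1)
        return query
    raise ValueError('unsupported paramstyle: %s' % paramstyle)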
The thing that tripped me up here was how to figure out what paramstyle is required if your code is just being passed a connection or cursor object. Here's what I came up with:
import importlib

def get_paramstyle(conn):
    name = conn.__class__.__module__.split('.')[0]
    mod = importlib.import_module(name)
    return mod.paramstyle
You should probably do more sanity checking of the conn object, or at least wrap this up in a try block, depending on what assumptions you're willing to make.
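Combined with a marker-conversion helper like the adapt_query sketch above, usage would look like this (conn and term_id assumed from context):

paramstyle = get_paramstyle(conn)
cursor = conn.cursor()
cursor.execute(adapt_query("SELECT * FROM term WHERE id = %s", paramstyle),
               (term_id,))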
I don't want to add an extra dependency to the project either by using an even higher level of abstraction (SQLAlchemy, for instance).
That's too bad, because SQLAlchemy would be a perfect solution for this problem. In theory, DB-API 2.0 is built to offer this kind of flexibility. But that would require every driver developer (for Oracle, MySQLdb, Postgres, etc) to implement all the different paramstyles in their drivers. They don't. So you get stuck with the "preferred" paramstyle for each database engine.
If you refuse to use SQLAlchemy or any other higher abstraction layer or modern MVC class library, then yes, you have to write your own higher level of abstraction. I don't recommend that, despite it being your chosen solution here: you're facing some devilish details, and you will waste time figuring out bugs that others have already solved.
Don't view an external library dependency as a bad thing. If that's your approach to Python, you are going to be missing out on some of the most powerful features of the language.
Pick your poison.

cx_Oracle and the data source paradigm

There is a Java paradigm for database access implemented in the Java DataSource. This object creates a useful abstraction around the creation of database connections. The DataSource object keeps the database configuration, but will only create database connections on request. This allows you to keep all database configuration and initialization code in one place, and makes it easy to change the database implementation, or to use a mock database for testing.
I am currently working on a Python project which uses cx_Oracle. In cx_Oracle, one gets a connection directly from the module:
import cx_Oracle as dbapi
connection = dbapi.connect(connection_string)
# At this point I am assuming that a real connection has been made to the database.
# Is this true?
I am trying to find a parallel to the DataSource in cx_Oracle. I can easily create this by creating a new class and wrapping cx_Oracle, but I was wondering if this is the right way to do it in Python.
You'll find relevant information of how to access databases in Python by looking at PEP-249: Python Database API Specification v2.0. cx_Oracle conforms to this specification, as do many database drivers for Python.
In this specification a Connection object represents a database connection, but there is no built-in pooling. Tools such as SQLAlchemy do provide pooling facilities, and although SQLAlchemy is often billed as an ORM, it does not have to be used as such and offers nice abstractions for use on top of SQL engines.
If you do want to do object-relational-mapping, then SQLAlchemy does the business, and you can consider either its own declarative syntax or another layer such as Elixir which sits on top of SQLAlchemy and provides increased ease of use for more common use cases.
I don't think there is a "right" way to do this in Python, except maybe to go one step further and use another layer between yourself and the database.
Depending on the reason for wanting to use the DataSource concept (which I've only ever come across in Java), SQLAlchemy (or something similar) might solve the problems for you, without you having to write something from scratch.
If that doesn't fit the bill, writing your own wrapper sounds like a reasonable solution.
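For illustration, such a wrapper could be as small as this (a sketch; DataSource is a made-up name, not a standard Python API):

class DataSource(object):
    # Holds connection configuration; creates connections only on request.
    def __init__(self, connect, *args, **kwargs):
        self._connect = connect
        self._args = args
        self._kwargs = kwargs

    def get_connection(self):
        return self._connect(*self._args, **self._kwargs)

import cx_Oracle
ds = DataSource(cx_Oracle.connect, "mh/secret#oracletestdb")
conn = ds.get_connection()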
Yes, Python has a similar abstraction.
This is from our local build regression test, where we make sure that we can talk to all of our databases whenever we build a new Python.
if database == SYBASE:
    import Sybase
    conn = Sybase.connect('sybasetestdb', 'mh', 'secret')
elif database == POSTRESQL:
    import pgdb
    conn = pgdb.connect('pgtestdb:mh:secret')
elif database == ORACLE:
    import cx_Oracle
    conn = cx_Oracle.connect("mh/secret#oracletestdb")

curs = conn.cursor()
curs.execute('select a,b from testtable')
for row in curs.fetchall():
    print(row)
(Note: this is the simple version; in our multidb-aware code we have a dbconnection class that has this logic inside.)
I just sucked it up and wrote my own. It allowed me to add things like abstracting the database (Oracle/MySQL/Access/etc), adding logging, error handling with transaction rollbacks, etc.
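A sketch of the kind of wrapper described, with commit-on-success and rollback-on-error (names are hypothetical):

class Database(object):
    def __init__(self, connection):
        self.connection = connection

    def execute(self, sql, params=()):
        cursor = self.connection.cursor()
        try:
            cursor.execute(sql, params)
            self.connection.commit()
            return cursor
        except Exception:
            self.connection.rollback()
            raise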
