I'm implementing a Python ontology class that uses a database backend to store and query the ontology. The database schema is fixed (specified in advance), but I don't know what type of database engine is being used. However, I can rely on the fact that the Python interface of the database engine uses the Python DB-API 2.0 (PEP 249). A straightforward idea is to let the user pass a PEP 249-compliant Connection object to the constructor of my ontology, which will then use various hardcoded SQL queries to query the database:
class Ontology(object):
def __init__(self, connection):
self.connection = connection
def get_term(self, term_id):
cursor = self.connection.cursor()
query = "SELECT * FROM term WHERE id = %s"
cursor.execute(query, (term_id, ))
[...]
My problem is that different database backends are allowed to support different parameter markers in the queries, defined by the paramstyle attribute of the backend module. For instance, if paramstyle = 'qmark', the interface supports the question mark style (SELECT * FROM term WHERE id = ?); paramstyle = 'numeric' means the numeric, positional style (SELECT * FROM term WHERE id = :1); paramstyle = 'format' means the ANSI C format string style (SELECT * FROM term WHERE id = %s). If I want to make my class be able to handle different database backends, it seems that I have to prepare for all the parameter marker styles. This seems to defeat the whole purpose of a common DB API for me as I can't use the same parameterised query with different database backends.
Is there a way around it, and if so, what is the best approach? The DB API does not specify the existence of a generic escaping function with which I can sanitize my values in the query, so doing the escaping manually is not an option. I don't want to add an extra dependency to the project either by using an even higher level of abstraction (SQLAlchemy, for instance).
This Python recipe might be able to help. It introduces an extra layer of abstraction to wrap parameters in its own Param class.
The PyDal project may also be closer to what you're trying to achieve: "PyDal makes it possible to use the same paramstyle and datetime types with any module that conforms to DBAPI 2.0. In addition, paramstyles and datetime types are configurable."
Strictly speaking, the problem is not caused by the DB API allowing this, but by the different databases which use different SQL syntaxes. The DB API module passes the exact query string to the database, along with the parameters. "Resolving" the parameter markers is done by the database itself, not by the DB API module.
That means that if you want to solve this, you have to introduce some higher level of abstraction. If you do not want to add extra dependencies, you will have to do it yourself. But rather than manually escaping and substituting, you could try to dynamically replace parameter markers in the query string with the desired parameter markers, based on the paramstyle of the backend module. Then pass the string, WITH parameter markers to the db. For example, you could use '%s' everywhere, and use python string substitution to replace the '%s' with ':1', ':2' etc. if the db uses 'numeric' style, and so on....
The thing that tripped me up here was how to figure out what paramstyle is required if your code is just being passed a connection or cursor object. Here's what I came up with:
import importlib
def get_paramstyle(conn):
name = conn.__class__.__module__.split('.')[0]
mod = importlib.import_module(name)
return mod.paramstyle
You should probably do more sanity checking of the conn object, or at least wrap this up in a try block, depending on what assumptions you're willing to make.
I don't want to add an extra dependency to the project either by using
an even higher level of abstraction (SQLAlchemy, for instance).
That's too bad, because SQLAlchemy would be a perfect solution for this problem. In theory, DB-API 2.0 is built to offer this kind of flexibility. But that would require every driver developer (for Oracle, MySQLdb, Postgres, etc) to implement all the different paramstyles in their drivers. They don't. So you get stuck with the "preferred" paramstyle for each database engine.
If you refuse to use SQLAlchemy or any other higher abstraction layer or modern MVC class library, yes you have to write your own higher level of abstraction for this. I don't recommend that, despite that being your chosen solution here. You're facing some devilish details there, and will waste time figuring out bugs that others have already solved.
Don't view an external library dependency as a bad thing. If that's your approach to Python, you are going to be missing out on some of the most powerful features of the language.
Pick your poison.
Related
I have this code:
def advertiser_table(engine):
return Table('advertiser', metadata, autoload=True, autoload_with=engine)
And later I try this:
advertisers = advertiser_table(engine)
...
session.bulk_insert_mappings(
advertisers.name,
missing_advetisers.to_dict('records'),
)
where missing_adverisers is a Pandas DataFrame (but it's not important for this question).
The error this gives me is:
sqlalchemy.orm.exc.UnmappedClassError: Class ''advertiser'' is not mapped
From reading the documentation I could scramble enough to ask the question, but not much more than that... What is Mapper and why is it so detrimental to the functioning of this library?.. Why isn't "the class" mapped? Obviously, what am I to do to "map" it to whatever this library wants it to map?
A Mapper is the M in ORM. It is the thing that maps your table (advertisers in this case) to instances of a class (which you are missing in this case) in order for you to operate on it.
The reason it's confusing for you is because SQLAlchemy is actually two libraries in one -- one is called SQLAlchemy Core, and the other is the SQLAlchemy ORM. Core provides the ability to work with tables and to construct queries that return rows, while the ORM builds on top of Core to provide the ability to work with instances of classes and their relationships as an abstraction. Core roughly corresponds to things you can do on Connection and Engine, while ORM roughly corresponds to things you can do on Session.
So, all of that is to say, session.bulk_insert_mappings is an ORM functionality, and you cannot use it without having a mapped class.
What can you do instead? Use the equivalent Core functionality:
query = advertisers.insert().values(missing_advetisers.to_dict('records'))
engine.execute(query) # or session.execute(query)
Or even use the pandas-provided to_sql function:
missing_advetisers.to_sql("advertiser", engine, if_exists="append")
If you insist on using the ORM, you need to declare a mapped class for your table. The easiest way when using reflection is to use automap. The linked documentation has many examples, so I won't go into detail here.
I have a Flask application with which I'd like to use SQLAlchemy Core (i.e. I explicitly do not want to use an ORM), similarly to this "fourth way" described in the Flask doc:
http://flask.pocoo.org/docs/patterns/sqlalchemy/#sql-abstraction-layer
I'd like to know what would be the recommended pattern in terms of:
How to connect to my database (can I simply store a connection instance in the g.db variable, in before_request?)
How to perform reflection to retrieve the structure of my existing database (if possible, I'd like to avoid having to explicitly create any "model/table classes")
Correct: You would create a connection once per thread and access it using a threadlocal variable. As usual, SQLAlchemy has thought of this use-case and provided you with a pattern: Using the Threadlocal Execution Strategy
db = create_engine('mysql://localhost/test', strategy='threadlocal')
db.execute('SELECT * FROM some_table')
Note: If I am not mistaken, the example seems to mix up the names db and engine (which should be db as well, I think).
I think you can safely disregard the Note posted in the documentation as this is explicitly what you want. As long as each transaction scope is linked to a thread (as is with the usual flask setup), you are safe to use this. Just don't start messing with threadless stuff (but flask chokes on that anyway).
Reflection is pretty easy as described in Reflecting Database Objects. Since you don't want to create all the tables manually, SQLAlchemy offers a nice way, too: Reflecting All Tables at Once
meta = MetaData()
meta.reflect(bind=someengine)
users_table = meta.tables['users']
addresses_table = meta.tables['addresses']
I suggest you check that complete chapter concerning reflection.
Rather than use an ORM, I am considering the following approach in Python and MySQL with no ORM (SQLObject/SQLAlchemy). I would like to get some feedback on whether this seems likely to have any negative long-term consequences since in the short-term view it seems fine from what I can tell.
Rather than translate a row from the database into an object:
each table is represented by a class
a row is retrieved as a dict
an object representing a cursor provides access to a table like so:
cursor.mytable.get_by_ids(low, high)
removing means setting the time_of_removal to the current time
So essentially this does away with the need for an ORM since each table has a class to represent it and within that class, a separate dict represents each row.
Type mapping is trivial because each dict (row) being a first class object in python/blub allows you to know the class of the object and, besides, the low-level database library in Python handles the conversion of types at the field level into their appropriate application-level types.
If you see any potential problems with going down this road, please let me know. Thanks.
That doesn't do away with the need for an ORM. That is an ORM. In which case, why reinvent the wheel?
Is there a compelling reason you're trying to avoid using an established ORM?
You will still be using SQLAlchemy. ResultProxy is actually a dictionary once you go for .fetchmany() or similar.
Use SQLAlchemy as a tool that makes managing connections easier, as well as executing statements. Documentation is pretty much separated in sections, so you will be reading just the part that you need.
web.py has in a decent db abstraction too (not an ORM).
Queries are written in SQL (not specific to any rdbms), but your code remains compatible with any of the supported dbs (sqlite, mysql, postresql, and others).
from http://webpy.org/cookbook/select:
myvar = dict(name="Bob")
results = db.select('mytable', myvar, where="name = $name")
There is a Java paradigm for database access implemented in the Java DataSource. This object create a useful abstraction around the creation of database connections. The DataSource object keeps database configuration, but will only create database connections on request. This is allows you to keep all database configuration and initialization code in one place, and makes it easy to change database implementation, or use a mock database for testing.
I currently working on a Python project which uses cx_Oracle. In cx_Oracle, one gets a connection directly from the module:
import cx_Oracle as dbapi
connection = dbapi.connect(connection_string)
# At this point I am assuming that a real connection has been made to the database.
# Is this true?
I am trying to find a parallel to the DataSource in cx_Oracle. I can easily create this by creating a new class and wrapping cx_Oracle, but I was wondering if this is the right way to do it in Python.
You'll find relevant information of how to access databases in Python by looking at PEP-249: Python Database API Specification v2.0. cx_Oracle conforms to this specification, as do many database drivers for Python.
In this specification a Connection object represents a database connection, but there is no built-in pooling. Tools such as SQLAlchemy do provide pooling facilities, and although SQLAlchemy is often billed as an ORM, it does not have to be used as such and offers nice abstractions for use on top of SQL engines.
If you do want to do object-relational-mapping, then SQLAlchemy does the business, and you can consider either its own declarative syntax or another layer such as Elixir which sits on top of SQLAlchemy and provides increased ease of use for more common use cases.
I don't think there is a "right" way to do this in Python, except maybe to go one step further and use another layer between yourself and the database.
Depending on the reason for wanting to use the DataSource concept (which I've only ever come across in Java), SQLAlchemy (or something similar) might solve the problems for you, without you having to write something from scratch.
If that doesn't fit the bill, writing your own wrapper sounds like a reasonable solution.
Yes, Python has a similar abstraction.
This is from our local build regression test, where we assure that we can talk to all of our databases whenever we build a new python.
if database == SYBASE:
import Sybase
conn = Sybase.connect('sybasetestdb','mh','secret')
elif database == POSTRESQL:
import pgdb
conn = pgdb.connect('pgtestdb:mh:secret')
elif database == ORACLE:
import cx_Oracle
conn = cx_Oracle.connect("mh/secret#oracletestdb")
curs=conn.cursor()
curs.execute('select a,b from testtable')
for row in curs.fetchall():
print row
(note, this is the simple version, in our multidb-aware code we have a dbconnection class that has this logic inside.)
I just sucked it up and wrote my own. It allowed me to add things like abstracting the database (Oracle/MySQL/Access/etc), adding logging, error handling with transaction rollbacks, etc.
Yes, very basic question.
I've successfully created my db using declarative_base, and can perform inserts into the db too. I just have a few questions about SqlAlchemy sql statements.
I've create a table called Location.
A few issues/questions (see code below):
For statement, "print row", I have to specify each column name that I want to have output. i.e. "print row.name, row.lat, etc" Why? (Otherwise the print statement outputs "<classname.Location at <...>>"
Also, what is the preferred way to interact with a db and perform queries (select, insert, update, etc.)- there seem to be a bunch of options: using sqlalchemy.orm.select for example, or engine.text(<sql query>).execute().fetchall(), or even conn.execute(<select>). Options are great, but right now they're all just confusing me.
Thanks so much for the tips!
Here's my code:
from sqlalchemy import create_engine
from sqlalchemy.sql import select
from location_db_setup import *
db_path = "sqlite:////volumes/users/shared/programming/python/web/map.db"
engine = create_engine(db_path, echo= True)
Session = sessionmaker(bind= engine)
session = Session()
session.query(Location).fetchall()
for row in locations:
print row
You code in sample is incomplete and has errors. So it's impossible to say for sure what is Location here. I assume it's a mapped class, so you are requesting a list of all Location objects, not rows. When you print an object you get its string representation. String representation of objects can be changed by defining custom __str__ method.
Although ORM is the most important part of SQLAlchemy, it's not the only. It also expose a lot of functionality not related to ORM directly. When you work with objects the preferred way to create queries are corresponding session method. But sometimes you need selectable objects not bound to particular session (they are not executed directly, but are used in expressions passed to session methods). That's why there are functions in sqlalchemy.orm package.
The preferred way to interact with a db when using an ORM is not to use queries but to use objects that correspond to the tables you are manipulating, typically in conjunction with the session object. SELECT queries become get() or find() calls in some ORMs, query() calls in others. INSERT becomes creating a new object of the type you want (and maybe explicitly adding it, eg session.add() in sqlalchemy). UPDATE becomes editing such an object, and DELETE becomes deleting an object (eg. session.delete() ). The ORM is meant to handle the hard work of translating these operations into SQL for you.
Have you read the tutorial?
Denis and Kylotan gave you good answers. I'm just gonna focus on point 2.
Sometimes depends on your taste. There are times when you need database specific features that an ORM can't do, that's a case when you should use Session(<sql here>).execute() or conn.execute(<sql here>). Another case is when you have a very complex query which is beyond you and you don't find a suitable ORM expression.
Usually, using ORM features like select([...]).where(... or Session.query(<Model here>).filter(... (declarative base) are enough. Almost every sql query has an ORM equivalent.