I'm just prototyping an idea of mine for which I must be able to connect to multiple databases (of multiple types) using Django. I'm aware that it's possible to define several databases in settings.py and that we can specify which database a manager should use with using('db_name'), but unfortunately I can't hard-code my multiple databases in the settings file, since they are dynamic (I mean I don't know at "compile time" which and how many external databases I will use). The problem is similar to this one, already asked and answered here: Django: dynamic database file
...but the accepted answer is IMO just a hack, and I have several concerns about the reliability and security of a similar approach.
So my question is: is there a clean and safe way to establish a database connection dynamically via a somewhat lower-level API (like SQLAlchemy's create_engine('db_url'))? If not, is it possible to integrate SQLAlchemy into Django (in a reliable and fully working way)?
P.S. Another thing I would like to avoid is having to specify for each ORM action which DB to use with using(); instead I like the idea of SQLAlchemy's transaction or, alternatively, a context manager with which I could write something like:
with active_db('some_db') as db:
    # do ORM operations...
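To make that concrete, here is a rough sketch of the SQLAlchemy-style approach I have in mind; active_db() is a hypothetical helper and the URL would come from wherever the dynamic database definitions are stored:

from contextlib import contextmanager
from sqlalchemy import create_engine, text
from sqlalchemy.orm import sessionmaker

@contextmanager
def active_db(db_url):
    # the connection details are only known at runtime, so build the engine on the fly
    engine = create_engine(db_url)
    Session = sessionmaker(bind=engine)
    session = Session()
    try:
        yield session
        session.commit()
    except Exception:
        session.rollback()
        raise
    finally:
        session.close()
        engine.dispose()

# usage
with active_db("postgresql://user:password@host/some_db") as db:
    rows = db.execute(text("SELECT 1")).fetchall()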
Related
I have been playing around with Django for a couple of days and it seems great, but I find it a pain if I want to change the structure of my database; I'm then stuck with a few rather awkward options.
Is there a way to completely bypass Django's database abstraction, so that if I change the structure of the database I don't have to guess what model would have generated it or use a tool (South or ...) to change things?
I essentially want this: https://docs.djangoproject.com/en/dev/topics/db/sql/ (Raw SQL Queries), but instead of referring to a model, referring to an external database.
Could I just create an empty model and then only perform raw queries on it? (and set up the DB externally)
Thanks
P.S. I don't really mind if I have separate databases for the admin stuff and the app data.
It's in your question already, just read the docs article from here: Executing custom SQL directly
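For reference, a small sketch of what that docs page describes, applied to this case; "external" is a made-up alias you would add to DATABASES, pointing at the externally managed database, and the table/column names are placeholders:

from django.db import connections

def fetch_rows(min_id):
    # bypasses the ORM entirely and runs plain SQL against the chosen connection
    with connections["external"].cursor() as cursor:
        cursor.execute("SELECT id, name FROM some_table WHERE id > %s", [min_id])
        return cursor.fetchall()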
How can I create one model that talks to two databases in Flask, where one is, say, sqlite, and the other is specifically neo4j?
I'd like to have login and password stuff in a traditional DB, and keep the other graphy information in Neo4j. I'm told Neo4j is a poor fit for data that doesn't need graph traversals. Perhaps I'm wrong in needing this, but I have an instance where I'd like to say something like...
"return a dict(person.x,person.y,person.z) from all nodes where type==person", and then feed that into the view of my index page.
I've seen related questions about ORMs with neo4j:
ORM with Graph-Databases like Neo4j in Python
...and this about multiple DBs in Flask:
http://packages.python.org/Flask-SQLAlchemy/binds.html
Specifically, I see this taking the form of my create statement writing to the SQLite DB connection and then writing a key from there to the additional related information in Neo4j.
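To make that concrete, here is a rough sketch of the dual-write idea, assuming the standard-library sqlite3 module and the official neo4j driver (pip install neo4j); the table, label and property names are all placeholders:

import sqlite3
from neo4j import GraphDatabase

sql = sqlite3.connect("users.db")
sql.execute("CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, login TEXT, pw_hash TEXT)")

graph = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def create_person(login, pw_hash, name):
    # 1) credentials go into the relational store
    cur = sql.execute("INSERT INTO users (login, pw_hash) VALUES (?, ?)", (login, pw_hash))
    sql.commit()
    user_id = cur.lastrowid
    # 2) the graphy data goes into Neo4j, keyed by the relational primary key
    with graph.session() as session:
        session.run("CREATE (p:Person {user_id: $uid, name: $name})", uid=user_id, name=name)
    return user_id

def all_people():
    # roughly the "return a dict for all person nodes" query for the index view
    with graph.session() as session:
        result = session.run("MATCH (p:Person) RETURN p.user_id AS uid, p.name AS name")
        return [record.data() for record in result]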
I've recently released an OGM (Object-Graph Mapping) module for py2neo (http://book.py2neo.org/en/latest/ogm.html). This might help with what you're trying to do.
Otherwise, you could also look at neomodel (https://github.com/robinedwards/neomodel). It's written for Django but should be usable in Flask too.
I don't know about mixed backend models, but I think depending on your user count, you can use neo4j for your users, too. If you put the user nodes into an index, you can get all users without searching the graph.
If you find that this actually is a bottleneck, migrating it to a split storage should not be too difficult.
It is not that hard to adapt neo4j-driver and py2neo to work with e.g. Flask-Login.
I have used py2neo for that and it worked well, but have now migrated to neo4j-driver.
The downside is that I did not manage to get it working with e.g. SQLAlchemy.
Using a double-backend solution is not a problem; in an earlier project I used SQLAlchemy with SQLite3 and PostgreSQL, Neo4j and Redis together.
Using that, I found no issues other than some design issues.
I'm working on a project (written in Django) which has only a few entities, but many rows for each entity.
In my application I have several static "reports", written directly in plain SQL. The users can also search the database via a generic filter form. Since the target audience is really tech-savvy and at some point the filter won't fit their needs, I'm thinking about creating a query language for my database, like YQL or Jira's advanced search.
I found http://sourceforge.net/projects/littletable/ and http://www.quicksort.co.uk/DeeDoc.html, but it seems that they only operate on in-memory objects. Since the database can be too large to hold in memory, I would prefer that the query be translated into SQL (or, better, a Django query) before doing the actual work.
Are there any libraries or best practices on how to do this?
Writing such a DSL is actually surprisingly easy with PLY, and, what ho, there's already an example available for doing just what you want, in Django. You see, Django has this fancy thing called a Q object which makes the Django querying side of things fairly easy.
At DjangoCon EU 2012, Matthieu Amiguet gave a session entitled Implementing Domain-specific Languages in Django Applications in which he went through the process, right down to implementing such a DSL as you desire. His slides, which include all you need, are available on his website. The final code (linked to from the last slide, anyway) is available at http://www.matthieuamiguet.ch/media/misc/djangocon2012/resources/compiler.html.
Reinout van Rees also produced some good comments on that session. (He normally does!) These cover a little of the missing context.
In there you'll see something very similar to YQL and JQL in the examples given:
groups__name="XXX" AND NOT groups__name="YYY"
(modified > 1/4/2011 OR NOT state__name="OK") AND groups__name="XXX"
It can also be tweaked very easily; for example, you might want to use groups.name rather than groups__name (I would). This modification could be made fairly trivially (allow . in the FIELD token, by modifying t_FIELD, and then replacing . with __ before constructing the Q object in p_expression_ID).
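For reference, the two example queries above translate into Django Q objects roughly like this (make_q is just an illustrative helper showing the dot-to-double-underscore translation):

from django.db.models import Q

def make_q(field, value):
    # let users write groups.name while the ORM wants groups__name
    return Q(**{field.replace(".", "__"): value})

q1 = make_q("groups.name", "XXX") & ~make_q("groups.name", "YYY")
q2 = (Q(modified__gt="2011-04-01") | ~Q(state__name="OK")) & Q(groups__name="XXX")
# SomeModel.objects.filter(q1)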
So, that satisfies simple querying; it also gives you a good starting point should you wish to make a more complex DSL.
I've faced exactly this problem - a large database which needs searching. I made some static reports and several fancy filters using django (very easy with django) just like you have.
However the power users were clamouring for more. I decided that there already was a DSL that they all knew - SQL. The question was how to make it secure enough.
So I used Django permissions to give the power users permission to make SQL queries in a new table. I then made a view for the not-quite-so-power users to use these queries. I made them take optional parameters. The queries were run using Python's lower-level DB-API, which Django uses under the hood for its ORM anyway.
The real trick was opening a read only database connection to run these queries just to make sure that no updates were ever run. I made a read only connection by creating a different user in the database with lower permissions and opening a specific connection for that in the view.
TL;DR - SQL is the way to go!
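A minimal sketch of that setup, assuming Django with a second DATABASES entry that points at the same database but authenticates as a read-only user (the alias, credentials and engine are made up for illustration):

# settings.py
DATABASES = {
    # "default": your normal read/write connection goes here
    "reporting": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "mydb",
        "USER": "report_ro",      # database role with SELECT-only privileges
        "PASSWORD": "secret",
        "HOST": "localhost",
    },
}

# views.py
from django.db import connections

def run_report(sql, params):
    # user-supplied values only ever travel as bound parameters, never in the SQL string
    with connections["reporting"].cursor() as cursor:
        cursor.execute(sql, params)
        return cursor.fetchall()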
Depending on the form of your data, the types of queries your users need to use, and the frequency that your data is updated, an alternative to the pure SQL solution suggested by Nick Craig-Wood is to index your data in Solr and then run queries against it.
Solr is an added layer of complexity (configuration, data synchronization) but it is super-fast, can handle large datasets, and provides a (relatively) intuitive query language.
You could actually write your own SQL-ish language using pyparsing. There is even a fairly verbose example you could extend.
There is a Java paradigm for database access implemented in the Java DataSource. This object creates a useful abstraction around the creation of database connections. The DataSource object keeps the database configuration, but will only create database connections on request. This allows you to keep all database configuration and initialization code in one place, and makes it easy to change the database implementation or use a mock database for testing.
I'm currently working on a Python project which uses cx_Oracle. In cx_Oracle, one gets a connection directly from the module:
import cx_Oracle as dbapi
connection = dbapi.connect(connection_string)
# At this point I am assuming that a real connection has been made to the database.
# Is this true?
I am trying to find a parallel to the DataSource in cx_Oracle. I can easily create this by creating a new class and wrapping cx_Oracle, but I was wondering if this is the right way to do it in Python.
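For what it's worth, this is roughly the kind of wrapper I had in mind (the class and method names are my own, not a standard Python API):

import cx_Oracle

class DataSource(object):
    """Keeps the connection configuration; opens connections only on request."""

    def __init__(self, connection_string):
        self.connection_string = connection_string

    def get_connection(self):
        # the real database connection is established here, not at construction time
        return cx_Oracle.connect(self.connection_string)

ds = DataSource("scott/tiger@ORCLPDB1")   # made-up credentials/DSN
connection = ds.get_connection()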
You'll find relevant information of how to access databases in Python by looking at PEP-249: Python Database API Specification v2.0. cx_Oracle conforms to this specification, as do many database drivers for Python.
In this specification a Connection object represents a database connection, but there is no built-in pooling. Tools such as SQLAlchemy do provide pooling facilities, and although SQLAlchemy is often billed as an ORM, it does not have to be used as such and offers nice abstractions for use on top of SQL engines.
If you do want to do object-relational-mapping, then SQLAlchemy does the business, and you can consider either its own declarative syntax or another layer such as Elixir which sits on top of SQLAlchemy and provides increased ease of use for more common use cases.
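A short illustration of the pooling point: create_engine() holds the configuration and maintains a pool, and connections are only handed out on demand (the DSN below is a placeholder):

from sqlalchemy import create_engine, text

engine = create_engine("oracle+cx_oracle://user:password@tnsname", pool_size=5)

with engine.connect() as conn:           # borrows a connection from the pool
    for row in conn.execute(text("SELECT sysdate FROM dual")):
        print(row)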
I don't think there is a "right" way to do this in Python, except maybe to go one step further and use another layer between yourself and the database.
Depending on the reason for wanting to use the DataSource concept (which I've only ever come across in Java), SQLAlchemy (or something similar) might solve the problems for you, without you having to write something from scratch.
If that doesn't fit the bill, writing your own wrapper sounds like a reasonable solution.
Yes, Python has a similar abstraction.
This is from our local build regression test, where we assure that we can talk to all of our databases whenever we build a new python.
# simplified version of the multi-DB regression check: pick the right
# DB-API module for the backend under test, then run a trivial query
if database == SYBASE:
    import Sybase
    conn = Sybase.connect('sybasetestdb', 'mh', 'secret')
elif database == POSTGRESQL:
    import pgdb
    conn = pgdb.connect('pgtestdb:mh:secret')
elif database == ORACLE:
    import cx_Oracle
    conn = cx_Oracle.connect("mh/secret@oracletestdb")

curs = conn.cursor()
curs.execute('select a,b from testtable')
for row in curs.fetchall():
    print row
(note, this is the simple version, in our multidb-aware code we have a dbconnection class that has this logic inside.)
I just sucked it up and wrote my own. It allowed me to add things like abstracting the database (Oracle/MySQL/Access/etc), adding logging, error handling with transaction rollbacks, etc.
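Not the actual code, but a sketch of the kind of wrapper I mean, assuming any DB-API compatible driver is passed in:

import logging

log = logging.getLogger("db")

class Database(object):
    def __init__(self, connect, *args, **kwargs):
        # `connect` is any DB-API connect callable (cx_Oracle.connect, MySQLdb.connect, ...),
        # which is what makes the backend swappable
        self.conn = connect(*args, **kwargs)

    def execute(self, sql, params=()):
        cursor = self.conn.cursor()
        try:
            cursor.execute(sql, params)
            self.conn.commit()
            return cursor.fetchall() if cursor.description else None
        except Exception:
            log.exception("query failed, rolling back: %s", sql)
            self.conn.rollback()
            raise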
I'm starting a web project that likely should be fine with SQLite. I have SQLObject on top of it, but thinking long term here -- if this project should require something more robust (e.g. able to handle high traffic), I will need to have a transition plan ready. My questions:
1) How easy is it to transition from one DB (SQLite) to another (MySQL, Firebird or PostgreSQL) under SQLObject?
2) Does SQLObject provide any tools to make such a transition easier? Is it simply a matter of taking the objects I've defined and calling createTable?
3) What about having multiple SQLite databases instead? E.g. one per visitor group? Does SQLObject provide a mechanism for handling this scenario, and if so, what is the mechanism to use?
Thanks,
Sean
3) is quite an interesting question. In general, SQLite is pretty useless for web-based stuff. It scales fairly well for size, but scales terribly for concurrency, so if you are planning to hit it with a few requests at the same time, you will be in trouble.
Now your idea in part 3) of the question is to use multiple SQLite databases (e.g. one per user group, or even one per user). Unfortunately, SQLite will give you no help in this department. But it is possible. The one project I know that has done this before is Divmod's Axiom. So I would certainly check that out.
Of course, it would probably be much easier to just use a good concurrent DB like the ones you mention (Firebird, PG, etc).
For completeness:
1 and 2) It should be straightforward without you actually writing much code. I find SQLObject a bit restrictive in this department, and would strongly recommend SQLAlchemy instead. This is far more flexible, and if I was starting a new project today, I would certainly use it over SQLObject. It won't be moving "Objects" anywhere. There is no magic involved here, it will be transferring rows in tables in a database. Which as mentioned you could do by hand, but this might save you some time.
Your success with createTable() will depend on your existing underlying table schema / data types. In other words, how well SQLite maps to the database you choose and how SQLObject decides to use your data types.
The safest option may be to create the new database by hand. Then you'll have to deal with data migration, which may be as easy as instantiating two SQLObject database connections over the same table definitions.
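Roughly what that two-connection migration could look like with SQLObject; the class and column are made up, and the per-call connection arguments should be checked against the SQLObject version you use:

from sqlobject import SQLObject, StringCol, connectionForURI

class Visitor(SQLObject):
    name = StringCol()

old_conn = connectionForURI("sqlite:///path/to/app.db")
new_conn = connectionForURI("mysql://user:password@localhost/app")

# create the same table layout in the new database, then copy the rows across
Visitor.createTable(ifNotExists=True, connection=new_conn)
for row in Visitor.select(connection=old_conn):
    Visitor(name=row.name, connection=new_conn)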
Why not just start with the more full-featured database?
I'm not sure I understand the question.
The SQLObject documentation lists six kinds of connections available. Further, the database connection (or scheme) is specified in a connection string. Changing database connections from SQLite to MySQL is trivial. Just change the connection string.
The documentation lists the different kinds of schemes that are supported.
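For example, the only thing that changes between backends is the URI passed to connectionForURI() (paths and credentials below are placeholders):

from sqlobject import connectionForURI, sqlhub

# development: SQLite
sqlhub.processConnection = connectionForURI("sqlite:///path/to/dev.db")

# later, the same models can run against MySQL just by swapping the URI:
# sqlhub.processConnection = connectionForURI("mysql://user:password@host/dbname")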