i wonder wether there is a solution (or a need for) an ORM with Graph-Database (f.e. Neo4j). I'm tracking relationships (A is related to B which is related to A via C etc., thus constructing a large graph) of entities (including additional attributes for those entities) and need to store them in a DB, and i think a graph database would fit this task perfectly.
Now, with sql-like DBs, i use sqlalchemyś ORM to store my objects, especially because of the fact that i can retrieve objects from the db and work with them in a pythonic style (use their methods etc.).
Is there any object-mapping solution for Neo4j or other Graph-DB, so that i can store and retrieve python objects into and from the Graph-DB and work with them easily?
Or would you write some functions or adapters like in the python sqlite documentation (http://docs.python.org/library/sqlite3.html#letting-your-object-adapt-itself) to retrieve and store objects?
Shameless plug... there is also my own ORM which you may also want to checkout: https://github.com/robinedwards/neomodel
It's built on top of py2neo, using cypher and rest API calls under hood, i.e no dependency on gremlin.
There are a couple choices in Python out there right now, based on databases' REST interfaces.
As I mentioned in the link #Peter provided, we're working on neo4django, which updates the old Neo4j/Django integration. It's a good choice if you need complex queries and want an ORM that will manage node indexing as well- or if you're already using Django. It works very similarly to the native Django ORM. Find it on PyPi or GitHub.
There's also a more general solution called Bulbflow that is supposed to work with any graph database supported by Blueprints. I haven't used it, but from what I've seen it focuses on domain modeling - Bulbflow already has working relationship models, for example, which we're still working on- but doesn't much support complex querying (as we do with Django querysets + index use). It also lets you work a bit closer to the graph.
Maybe you could take a look on Bulbflow, that allows to create models in Django, Flask or Pyramid. However, it works over a REST client instead of the python-binding provided by Neo4j, so perhaps it's not as fast as the native binding is.
Related
I am looking for a generic way to store python objects in a database. Of course I could just pickle the objects, but that way I would have binary blobs in my database. That way I can not search my objects. Also it seems to be easier to put it together with other applications.
So in my fantasy, I have on object like
class myClass
data1=1
data2='foobar'
data3=some_html_object
...
and could do something like
mydata=myClass()
mydata.add_data(various_things)
mydata.save_to_database()
and would end up with a database which has colums called data1,data2, data3, where I have the values of the of the objects attributes in the rows stored as text which would be searchable. Of course some inital setup would have to be done.
And of course it would be nice if I could plug any database I want (well, at least not just one database) and would not be bothered with the details.
Now of course I could program my own framework to let me do this, but I was hoping that this has been done bevore by someone else :)
Any suggestions?
Your fantasy in fact exists!
You describe something called the Active Record Pattern. It is usually implemented by using Object-Relational Mapping. One common solution for Python is SQLAlchemy, but Storm is somehow popular too:
See What are some good Python ORM solutions?
If your are developing for the Web, Django possess its own ORM.
It sounds like what you want is an Object Relational Mapper (ORM) to map SQL tables to objects.
The most popular ORMs that support different dialects by community are the following:
Python -- SQLAlchemy, Storm, Django (built into web framework)
Ruby -- ActiveRecord, Sequel
Node -- Sequelize
For a specific example of implementing what you described in Python using SQLAlchemy, check out this blog post that walks through a simple example
Now I want to use mongodb as my Python website backend storage, but I am wondering whether it's necessary to use an ODM such as MongoEngine? Or just use mongodb python driver directly?
Any good advice?
Is it strictly necessary? no - you can use the python driver directly without an ODM in the middle. If you prefer defining schemas and models to crafting/modifying your own schema via normal database operations, then an ODM is probably something you should look into.
A lot of people got used to using this kind of solution when mapping their development data model into a relational database (in that case an ORM). Because the MongoDB document model more closely maps to an object in your code (for example), you may feel you no longer need this mapping.
It can still be convenient though (as you can see from the popularity of mongoengine, mongoid, morphia and others) - the choice, in the end, is yours.
i'm working on a project (written in Django) which has only a few entities, but many rows for each entity.
In my application i have several static "reports", directly written in plain SQL. The users can also search the database via a generic filter form. Since the target audience is really tech-savvy and at some point the filter doesn't fit their needs, i think about creating a query language for my database like YQL or Jira's advanced search.
I found http://sourceforge.net/projects/littletable/ and http://www.quicksort.co.uk/DeeDoc.html, but it seems that they only operate on in-memory objects. Since the database can be too large for holding it in-memory, i would prefer that the query is translated in SQL (or better a Django query) before doing the actual work.
Are there any library or best practices on how to do this?
Writing such a DSL is actually surprisingly easy with PLY, and what ho—there's already an example available for doing just what you want, in Django. You see, Django has this fancy thing called a Q object which make the Django querying side of things fairly easy.
At DjangoCon EU 2012, Matthieu Amiguet gave a session entitled Implementing Domain-specific Languages in Django Applications in which he went through the process, right down to implementing such a DSL as you desire. His slides, which include all you need, are available on his website. The final code (linked to from the last slide, anyway) is available at http://www.matthieuamiguet.ch/media/misc/djangocon2012/resources/compiler.html.
Reinout van Rees also produced some good comments on that session. (He normally does!) These cover a little of the missing context.
You see in there something very similar to YQL and JQL in the examples given:
groups__name="XXX" AND NOT groups__name="YYY"
(modified > 1/4/2011 OR NOT state__name="OK") AND groups__name="XXX"
It can also be tweaked very easily; for example, you might want to use groups.name rather than groups__name (I would). This modification could be made fairly trivially (allow . in the FIELD token, by modifying t_FIELD, and then replacing . with __ before constructing the Q object in p_expression_ID).
So, that satisfies simple querying; it also gives you a good starting point should you wish to make a more complex DSL.
I've faced exactly this problem - a large database which needs searching. I made some static reports and several fancy filters using django (very easy with django) just like you have.
However the power users were clamouring for more. I decided that there already was a DSL that they all knew - SQL. The question was how to make it secure enough.
So I used django permissions to give the power users permission to make SQL queries in a new table. I then made a view for the not-quite-so-power users to use these queries. I made them take optional parameters. The queries were run using Python's lower level DB-API which django is using under the hood for its ORM anyway.
The real trick was opening a read only database connection to run these queries just to make sure that no updates were ever run. I made a read only connection by creating a different user in the database with lower permissions and opening a specific connection for that in the view.
TL;DR - SQL is the way to go!
Depending on the form of your data, the types of queries your users need to use, and the frequency that your data is updated, an alternative to the pure SQL solution suggested by Nick Craig-Wood is to index your data in Solr and then run queries against it.
Solr is an added layer of complexity (configuration, data synchronization) but it is super-fast, can handle large datasets, and provides a (relatively) intuitive query language.
You could write your own SQL-ish language using pyparsing, actually. There is even pretty verbose example you could extend.
We're trying to set up a Pyramid project that will use MySQL instead of SQLAlchemy.
My experience with Pyramid/Python is limited, so I was hoping to find a guide online. Unfortunately, I haven't been able to find anything to push us in the right direction. Most search results were for people trying to use raw SQL/MySQL commands with SQLAlchemy (many were re-posted links).
Anyone have a useful tutorial on this?
Pyramid at its base does not assume that you will use any one specific library to help you with your persistence. In order to make things easier, then, for people who DO wish to use libraries such as SQLALchemy, the Pyramid library contains Scaffolding, which is essentially some auto-generated code for a basic site, with some additions to set up items like SQLAlchemy or a specific routing strategy. The pyramid documentation should be able to lead you through creating a new project using the "pyramid_starter" scaffolding, which sets up the basic site without SQLAlchemy.
This will give you the basics you need to set up your views, but next you will need to add code to allow you to connect to a database. Luckily, since your site is just python code, learning how to use MySQL in Pyramid is simply learning how to use MySQL in Python, and then doing the exact same steps within your Pyramid project.
Keep in mind that even if you'd rather use raw SQL queries, you might still find some usefulness in SQLAlchemy. At it's base level, SQLAlchemy simply wraps around the DBAPI calls and adds in useful features like connection pooling. The ORM functionality is actually a large addition to the tight lower-level SQLAlchemy toolset.
sqlalchemy does not make any assumption that you will be using it's orm. If you wish to use plain sql, you can do so, with nothing more than what sqlalchemy already provides. For instance, if you followed the recipe in the cookbook, you would have access to the sqlalchemy session object as request.db, your handler would look something like this:
def someHandler(request):
rows = request.db.execute("SELECT * FROM foo").fetchall()
The Quick Tutorial shows a Pyramid application that uses SQL but not SQLAlchemy. It uses SQLite, but should be reasonably easy to adapt for MySQL.
I have a script with several functions that all need to make database calls. I'm trying to get better at writing clean code rather than just throwing together scripts with horrible style. What is generally considered the best way to establish a global database connection that can be accessed anywhere in the script but is not susceptible to errors such as accidentally redefining the variable holding a connection. I'd imagine I should be putting everything in a module? Any links to actual code would be very useful as well. Thanks.
If you are working with Python and databases, you cannot afford not to look at SQLAlchemy:
SQLAlchemy is the Python SQL toolkit
and Object Relational Mapper that
gives application developers the full
power and flexibility of SQL.
It provides a full suite of well known
enterprise-level persistence patterns,
designed for efficient and
high-performing database access,
adapted into a simple and Pythonic
domain language.
I have built very complex databases with a surprisingly small amount of code (a few hundred lines). The schema definition is almost self-documenting, the objects used for the Object Relational Mapper are Plain Old Python Objects (i.e., what you already have), and the querying API is almost obvious. In addition, the documentation is excellent: many online examples, fully documented API, and an O'Reilly book which, while far from perfect, does take you from zero to dangerous in a few evenings.
If you don't want to use the Object Relational Mapper, you can always fall back to plain connections and literal SQL. Also, the code is portable and database independent (the same code will work with MySQL, Oracle, SQLite, and other database managers).
The Session object will automatically take care of the pooling (what you mention as your concern).
The best way to understand its power is probably to follow the tutorials obtained in the first result page of the Google query sqlalchemy tutorial.
Use a model system/ORM system.