Can anyone describe an OODBMS implementation that can be used for a production web application that stores all of its persistent data in it?
Maybe you should give ZODB a look. It is pretty easy to use, even if its user community is not as big as those of some other DBMSs.
After some research and benchmarking, I found MongoDB productive, because:
MongoDB (from "humongous") is a scalable, high-performance, open source NoSQL database. Written in C++, MongoDB features:
- Document-oriented storage: JSON-style documents with dynamic schemas offer simplicity and power.
- Full index support: index on any attribute, just like you're used to.
- Replication & high availability: mirror across LANs and WANs for scale and peace of mind.
- Auto-sharding: scale horizontally without compromising functionality.
- Querying: rich, document-based queries.
- Fast in-place updates: atomic modifiers for contention-free performance.
- Map/reduce: flexible aggregation and data processing.
- GridFS: store files of any size without complicating your stack.
- Commercial support: enterprise-class support, training, and consulting available.
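For a feel of what this looks like from Python, here is a minimal PyMongo sketch; it assumes a local mongod, and the database, collection, and field names are just examples:

    # Minimal PyMongo sketch; assumes a local mongod is running.
    # Database/collection/field names are illustrative, not prescribed.
    from pymongo import MongoClient, ASCENDING

    client = MongoClient("mongodb://localhost:27017")
    db = client.blog  # databases and collections are created lazily

    # Document-oriented storage: insert a plain dict with a dynamic schema.
    post_id = db.posts.insert_one(
        {"title": "Hello", "tags": ["mongodb", "python"], "views": 0}
    ).inserted_id

    # Full index support: index on any attribute.
    db.posts.create_index([("tags", ASCENDING)])

    # Rich, document-based queries.
    for post in db.posts.find({"tags": "mongodb"}):
        print(post["title"])

    # Fast in-place updates with atomic modifiers.
    db.posts.update_one({"_id": post_id}, {"$inc": {"views": 1}})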
EDIT 1:
Python packages that provide MongoDB support:
Humongolus
Humongolus is a lightweight ORM framework for Python and MongoDB. The name comes from combining MongoDB and Homunculus (the concept of a miniature though fully formed human body). Humongolus allows you to create models/schemas with robust validation. It attempts to be as Pythonic as possible and exposes the PyMongo cursor objects whenever possible. The code, tutorials, and usage examples are all available on GitHub.
MongoKit
The MongoKit framework is an ORM-like layer on top of PyMongo. There is also a MongoKit google group.
Ming
Ming (the Merciless) is a library that allows you to enforce schemas on a MongoDB database in your Python application. It was developed by SourceForge in the course of their migration to MongoDB. See the introductory blog post for more details.
MongoAlchemy
MongoAlchemy is another ORM-like layer on top of PyMongo. Its API is inspired by SQLAlchemy. The code is available on github; for more information, see the tutorial.
MongoEngine
MongoEngine is another ORM-like layer on top of PyMongo. It allows you to define schemas for documents and query collections using syntax inspired by the Django ORM (a short sketch follows this list). The code is available on GitHub; for more information, see the tutorial.
Minimongo
minimongo is a lightweight, pythonic interface to MongoDB. It retains pymongo’s query and update API, and provides a number of additional features, including a simple document-oriented interface, connection pooling, index management, and collection & database naming helpers. The source is on github.
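To give a flavour of what these ODM layers look like, here is a minimal MongoEngine sketch; the model and field names are just examples:

    # Minimal MongoEngine sketch; model/field names are illustrative.
    import mongoengine

    mongoengine.connect("blog")  # connects to a local mongod by default

    class Post(mongoengine.Document):
        title = mongoengine.StringField(required=True, max_length=120)
        tags = mongoengine.ListField(mongoengine.StringField())
        views = mongoengine.IntField(default=0)

    Post(title="Hello", tags=["mongodb", "python"]).save()

    # Django-ORM-inspired querying.
    for post in Post.objects(tags="mongodb"):
        print(post.title)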
Related
There is a way to define a MongoDB collection schema using Mongoose in NodeJS. Mongoose validates documents against the schema when queries are run.
I have been unable to find a similar thing for Motor in Python/Tornado. Is there a way to achieve a similar effect in Motor, or is there a package which can do that for me?
No, there isn't. Motor is a MongoDB driver; it does basic operations but doesn't provide many conveniences. An Object Document Mapper (ODM) library like MongoTor, built on Motor, provides higher-level features like schema validation.
I don't vouch for MongoTor. Proceed with caution. Consider whether you really need an ODM: MongoDB's raw data format is close enough to Python types that most applications don't need a layer between their code and the driver.
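To illustrate the point, here is a minimal Motor (asyncio) sketch; the database, collection, and field names are just examples:

    # Documents go in and come out as plain dicts; no mapping layer needed.
    import asyncio
    from motor.motor_asyncio import AsyncIOMotorClient

    async def main():
        users = AsyncIOMotorClient().app.users  # db/collection names are illustrative
        await users.insert_one({"name": "Ada", "roles": ["admin"]})
        user = await users.find_one({"name": "Ada"})  # a plain dict comes back
        print(user["roles"])

    asyncio.run(main())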
Currently (2019), Umongo (https://github.com/Scille/umongo) seems the most active and useful project if you need a sync/async Python MongoDB ODM. It works with multiple drivers, such as PyMongo, or Motor for async use.
Documentation is here: http://umongo.readthedocs.io
You can also use ODMantic, as it has the best documentation and its engine supports the Motor client.
Currently (2023), Beanie [Github Link] is the best ODM I have ever used. It works beautifully with FastAPI and is deeply integrated with Pydantic, making it very easy to define data models.
They have very nice documentation here.
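For illustration, a minimal Beanie sketch, assuming a local mongod; the model and field names are just examples:

    # Minimal Beanie sketch; assumes a local mongod.
    # Model/field names are illustrative.
    import asyncio
    from beanie import Document, init_beanie
    from motor.motor_asyncio import AsyncIOMotorClient

    class Product(Document):  # a Pydantic model that is also a document
        name: str
        price: float

    async def main():
        client = AsyncIOMotorClient("mongodb://localhost:27017")
        await init_beanie(database=client.shop, document_models=[Product])

        await Product(name="Tea", price=3.50).insert()
        cheap = await Product.find(Product.price < 5).to_list()
        print([p.name for p in cheap])

    asyncio.run(main())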
Now I want to use MongoDB as my Python website's backend storage, but I am wondering whether it's necessary to use an ODM such as MongoEngine, or whether I should just use the MongoDB Python driver directly.
Any good advice?
Is it strictly necessary? No - you can use the Python driver directly without an ODM in the middle. If you prefer defining schemas and models to crafting/modifying your own schema via normal database operations, then an ODM is probably something you should look into.
A lot of people got used to this kind of solution when mapping their development data model onto a relational database (in that case, an ORM). Because the MongoDB document model maps more closely to an object in your code, you may feel you no longer need this mapping.
It can still be convenient though (as you can see from the popularity of mongoengine, mongoid, morphia and others) - the choice, in the end, is yours.
I'm working on a project (written in Django) which has only a few entities, but many rows for each entity.
In my application I have several static "reports", written directly in plain SQL. The users can also search the database via a generic filter form. Since the target audience is really tech-savvy and at some point the filter doesn't fit their needs, I'm thinking about creating a query language for my database, like YQL or Jira's advanced search.
I found http://sourceforge.net/projects/littletable/ and http://www.quicksort.co.uk/DeeDoc.html, but it seems that they only operate on in-memory objects. Since the database can be too large to hold in memory, I would prefer that the query be translated into SQL (or, better, a Django query) before doing the actual work.
Are there any libraries or best practices for doing this?
Writing such a DSL is actually surprisingly easy with PLY, and, what ho, there's already an example available for doing just what you want in Django. You see, Django has this fancy thing called a Q object which makes the Django querying side of things fairly easy.
At DjangoCon EU 2012, Matthieu Amiguet gave a session entitled Implementing Domain-specific Languages in Django Applications in which he went through the process, right down to implementing such a DSL as you desire. His slides, which include all you need, are available on his website. The final code (linked to from the last slide, anyway) is available at http://www.matthieuamiguet.ch/media/misc/djangocon2012/resources/compiler.html.
Reinout van Rees also produced some good comments on that session. (He normally does!) These cover a little of the missing context.
In there you can see something very similar to YQL and JQL, as in these examples:
groups__name="XXX" AND NOT groups__name="YYY"
(modified > 1/4/2011 OR NOT state__name="OK") AND groups__name="XXX"
It can also be tweaked very easily; for example, you might want to use groups.name rather than groups__name (I would). This modification could be made fairly trivially (allow . in the FIELD token, by modifying t_FIELD, and then replacing . with __ before constructing the Q object in p_expression_ID).
So, that satisfies simple querying; it also gives you a good starting point should you wish to make a more complex DSL.
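For instance, the two example queries above would compile down to Q objects roughly like this (Entry is a hypothetical model standing in for whatever you are filtering):

    # Rough hand-built equivalents of the two DSL queries above.
    # "Entry" is a hypothetical model; field names follow the examples.
    from datetime import date
    from django.db.models import Q

    q1 = Q(groups__name="XXX") & ~Q(groups__name="YYY")
    q2 = (Q(modified__gt=date(2011, 4, 1))  # "1/4/2011", read as 1 April 2011
          | ~Q(state__name="OK")) & Q(groups__name="XXX")

    # Entry.objects.filter(q1)
    # Entry.objects.filter(q2)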
I've faced exactly this problem - a large database which needs searching. I made some static reports and several fancy filters using Django (very easy with Django), just like you have.
However, the power users were clamouring for more. I decided that there already was a DSL that they all knew - SQL. The question was how to make it secure enough.
So I used Django permissions to give the power users permission to save SQL queries into a new table. I then made a view for the not-quite-so-power users to run these queries. I made the queries take optional parameters. They were run using Python's lower-level DB-API, which Django uses under the hood for its ORM anyway.
The real trick was opening a read-only database connection to run these queries, just to make sure that no updates were ever run. I made a read-only connection by creating a different database user with lower permissions and opening a dedicated connection for that user in the view.
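A minimal sketch of that setup, assuming a "readonly" alias in settings.DATABASES whose database user has SELECT-only privileges (names are illustrative):

    # Run a stored query on the read-only connection.
    # Assumes settings.DATABASES defines a "readonly" alias backed by a
    # database user with SELECT-only privileges.
    from django.db import connections

    def run_report(sql, params):
        with connections["readonly"].cursor() as cursor:
            cursor.execute(sql, params)  # params are escaped by the DB-API
            columns = [col[0] for col in cursor.description]
            return [dict(zip(columns, row)) for row in cursor.fetchall()]

    # e.g. run_report("SELECT * FROM app_entry WHERE state = %s", ["OK"])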
TL;DR - SQL is the way to go!
Depending on the form of your data, the types of queries your users need to use, and the frequency that your data is updated, an alternative to the pure SQL solution suggested by Nick Craig-Wood is to index your data in Solr and then run queries against it.
Solr is an added layer of complexity (configuration, data synchronization) but it is super-fast, can handle large datasets, and provides a (relatively) intuitive query language.
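Querying Solr from Python is then straightforward, for example with the pysolr package (the core URL and field names here are illustrative):

    # Minimal pysolr sketch; core URL and field names are illustrative.
    import pysolr

    solr = pysolr.Solr("http://localhost:8983/solr/entries", timeout=10)

    # Solr's query language is (relatively) user-friendly:
    results = solr.search('groups_name:"XXX" AND NOT groups_name:"YYY"', rows=20)
    for doc in results:
        print(doc["id"])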
You could actually write your own SQL-ish language using pyparsing. There is even a pretty verbose example you could extend.
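As a rough idea, here is a minimal pyparsing sketch of such a filter grammar; this is an assumption of what the DSL could look like, not the linked example itself:

    # Minimal pyparsing sketch of a filter grammar (an assumption, not
    # the linked example). Parses expressions like the ones shown earlier.
    from pyparsing import (Keyword, Suppress, Word, alphanums, alphas,
                           infixNotation, opAssoc, quotedString, removeQuotes)

    field = Word(alphas + "_", alphanums + "_")
    value = quotedString.setParseAction(removeQuotes)
    condition = field + Suppress("=") + value

    expr = infixNotation(condition, [
        (Keyword("NOT"), 1, opAssoc.RIGHT),
        (Keyword("AND"), 2, opAssoc.LEFT),
        (Keyword("OR"), 2, opAssoc.LEFT),
    ])

    print(expr.parseString('groups__name="XXX" AND NOT groups__name="YYY"'))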
I am building a project requiring high performance and scalability, entailing:
Role-based authentication with API-key licensing to access data of specific users
API exposed with REST (XML, JSON), XMLRPC, JSONRPC and SOAP
"Easily" configurable getters and setters to create APIs accessing the same data but with input/output in different schemas
A conservative estimate of the number of tables, whose queries will often require joins, is 20.
Which type of database (e.g. a NoSQL key-value data store such as Redis, or an object-relational DBMS such as PostgreSQL) and which web framework (e.g. Django, Web2py, or Flask) would you recommend?
The bullet-point description you have provided would adequately describe just about any data-backed web site on the planet that has an API. You haven't given anywhere near enough detail to provide meaningful recommendations.
Each of the technologies you have mentioned has strengths and weaknesses. However, those strengths and weaknesses are only relevant in the context of a specific problem.
The choice of RDBMS over NoSQL solution depends on the type of data you're intending to store, and the relationship between those data. It depends on the read/write patterns that you're expecting to see. It depends on the volume of data you're expecting to be storing and retrieving. It depends on the specific set of features you need your data store to have.
The choice of web framework depends on the complexity of what you're building, and the resources the community around those tools can provide to support you building your site. That means the ecosystem of libraries, and the community of developers you can call on to support your development.
In short, there is no silver bullet. Work out your requirements, and pick the tool or tools that suit those requirements. The answer won't always be the same.
The web frameworks you mention all scale the same, and with 20 tables they will probably have similar performance. The bottleneck will always be the database. You mention joins; that means you need a relational database. I would not over-engineer it. Make it work. Polish it. Then make it scale by adding extensive caching.
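As a sketch of that last step, Django's cache framework makes this kind of caching a few lines; the key name, timeout, and compute_report are illustrative:

    # Cache an expensive query with Django's cache framework.
    # Key name, timeout, and compute_report() are illustrative.
    from django.core.cache import cache

    def expensive_report():
        report = cache.get("expensive_report")
        if report is None:
            report = compute_report()  # hypothetical expensive query
            cache.set("expensive_report", report, timeout=300)
        return report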
I wonder whether there is a solution (or a need) for an ORM with a graph database (e.g. Neo4j). I'm tracking relationships (A is related to B, which is related to A via C, etc., thus constructing a large graph) between entities (including additional attributes for those entities) and need to store them in a DB, and I think a graph database would fit this task perfectly.
Now, with SQL-like DBs, I use SQLAlchemy's ORM to store my objects, especially because I can retrieve objects from the DB and work with them in a Pythonic style (use their methods, etc.).
Is there any object-mapping solution for Neo4j or another graph DB, so that I can store and retrieve Python objects in and from the graph DB and work with them easily?
Or would you write some functions or adapters, as in the Python sqlite3 documentation (http://docs.python.org/library/sqlite3.html#letting-your-object-adapt-itself), to store and retrieve objects?
Shameless plug... there is also my own ORM which you may want to check out: https://github.com/robinedwards/neomodel
It's built on top of py2neo, using Cypher and REST API calls under the hood, i.e. no dependency on Gremlin.
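A minimal sketch of what neomodel models look like; note that current releases connect over Bolt, and the URL, model, and relationship names here are illustrative:

    # Minimal neomodel sketch; URL, model, and relationship names are illustrative.
    from neomodel import StructuredNode, StringProperty, RelationshipTo, config

    config.DATABASE_URL = "bolt://neo4j:password@localhost:7687"  # adjust credentials

    class Person(StructuredNode):
        name = StringProperty(unique_index=True)
        knows = RelationshipTo("Person", "KNOWS")

    alice = Person(name="Alice").save()
    bob = Person(name="Bob").save()
    alice.knows.connect(bob)  # creates a KNOWS relationship in the graph

    for friend in alice.knows.all():  # traverse it back out
        print(friend.name)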
There are a couple of choices in Python out there right now, based on the databases' REST interfaces.
As I mentioned in the link @Peter provided, we're working on neo4django, which updates the old Neo4j/Django integration. It's a good choice if you need complex queries and want an ORM that will manage node indexing as well, or if you're already using Django. It works very similarly to the native Django ORM. Find it on PyPI or GitHub.
There's also a more general solution called Bulbflow that is supposed to work with any graph database supported by Blueprints. I haven't used it, but from what I've seen it focuses on domain modeling - Bulbflow already has working relationship models, for example, which we're still working on - but it doesn't have much support for complex querying (as we do with Django querysets + index use). It also lets you work a bit closer to the graph.
Maybe you could take a look at Bulbflow, which allows you to create models in Django, Flask, or Pyramid. However, it works over a REST client instead of the Python binding provided by Neo4j, so perhaps it's not as fast as the native binding.