I am building a project requiring high performance and scalability, entailing:
Role-based authentication with API-key licensing to access data of specific users
API exposed via REST (XML, JSON), XML-RPC, JSON-RPC, and SOAP
"Easily" configurable getters and setters to create APIs accessing the same data but with input/output in different schemas
A conservative estimate of the number of tables is 20, and queries against them will often require joins.
Which type of database (e.g. a NoSQL key-value store such as Redis, or an object-relational DBMS such as PostgreSQL) and which web framework (e.g. Django, Web2py, or Flask) would you recommend?
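To make the getters-and-setters requirement concrete, what I have in mind is roughly the following sketch (using Flask only because it is one of the candidates; the data and the XML helper are made up), where the same getter serves two output schemas:

    from flask import Flask, Response, jsonify, request

    app = Flask(__name__)

    USERS = {1: {"name": "alice", "country": "FR"}}  # made-up in-memory data

    def to_xml(record):
        # naive XML rendering, purely for illustration
        fields = "".join("<%s>%s</%s>" % (k, v, k) for k, v in record.items())
        return "<user>%s</user>" % fields

    @app.route("/users/<int:user_id>")
    def get_user(user_id):
        user = USERS[user_id]
        if request.args.get("format") == "xml":
            return Response(to_xml(user), mimetype="application/xml")
        return jsonify(user)  # JSON by default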
The bullet point description you have provided would adequately describe just about any data-backed web site on the planet with an API. You haven't provided anywhere near enough detail to provide meaningful recommendations.
Each of the technologies you have mentioned has strengths and weaknesses. However, those strengths and weaknesses are only relevant in the context of a specific problem.
The choice of RDBMS over NoSQL solution depends on the type of data you're intending to store, and the relationship between those data. It depends on the read/write patterns that you're expecting to see. It depends on the volume of data you're expecting to be storing and retrieving. It depends on the specific set of features you need your data store to have.
The choice of web framework depends on the complexity of what you're building, and the resources the community around those tools can provide to support you building your site. That means the community of libraries, and the community of developers who you can call on to support your development.
In short, there is no silver bullet. Work out your requirements, and pick the tool or tools that suit those requirements. The answer won't always be the same.
The web frameworks you mention all scale roughly the same, and with 20 tables they will probably have similar performance. The bottleneck will almost always be the database. You mention joins; that suggests you need a relational database. I would not over-engineer it. Make it work. Polish it. Then make it scale by adding extensive caching.
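To illustrate the caching point, here is a minimal sketch of per-view caching in Django (the backend choice and the timeout are assumptions, not recommendations; see the Django cache framework docs):

    # settings.py -- an assumed memcached backend (Django 3.2+)
    CACHES = {
        "default": {
            "BACKEND": "django.core.cache.backends.memcached.PyMemcacheCache",
            "LOCATION": "127.0.0.1:11211",
        }
    }

    # views.py -- cache the rendered response for 15 minutes
    from django.views.decorators.cache import cache_page

    @cache_page(60 * 15)
    def user_report(request, user_id):
        ...  # the expensive joins run at most once per 15 minutes per URL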
I'm about to start the development of a web analytics tool for an e-commerce website.
I'm going to log several different events, basically clicks on various elements of the page and page views.
These events carry metadata (the username of the logged-in user, their country, their age, etc.) and the page itself carries other metadata (category, subcategory, product, etc.).
My company would like something like an OLAP cube, to be able to answer questions like:
How many customers from country x visited category y?
How many pageviews for category x in January 2012?
My understanding is that I should use an OLAP engine to record these events, and then build a reporting interface to allow my colleagues to use it.
Am I right? Do you have advice on the engine and frontend/reporting tool I should use? I'm a Python programmer, so anything Python-friendly would be nice.
Thank you!
The main question is how big your cube is going to be and whether you need an open source OLAP solution or not.
If you're dealing with big cubes and want room for future features, you might go for a real OLAP server. A few are open source (Mondrian), and others have a 'limited' community edition (Palo, icCube). The important point here is compatibility with MDX and XMLA, the de facto OLAP standards, so you can plug in different reporting tools and/or use existing libraries. To my understanding there is no Python XMLA library as there is for Java or .NET, so I'm not sure this is the way to go.
If your cubes are small, you can develop something on your own or go for other, quicker solutions, as Charlax's comment indicates.
As mentioned in the selected answer, it depends on your data volume. However, if you run into a case where a lightweight Python OLAP framework would be sufficient, you might try Cubes; the sources are on GitHub. It contains a SQL backend (others could be implemented as well) and provides a light HTTP OLAP server. An example application using it (a PHP front-end with the HTTP Slicer OLAP server backend) can be found here. It does not contain a visualization layer or complex queries, though, but that is the trade-off for being small.
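If even Cubes is more than you need, note that the questions in the original post reduce to GROUP BY rollups over an events table. A minimal sketch using the standard-library sqlite3 module (the schema and data are invented for illustration):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE events (
            event_type TEXT,   -- 'pageview' or 'click'
            country    TEXT,   -- user metadata
            category   TEXT,   -- page metadata
            day        TEXT    -- ISO date
        );
        INSERT INTO events VALUES
            ('pageview', 'FR', 'books', '2012-01-15'),
            ('pageview', 'FR', 'books', '2012-01-20'),
            ('pageview', 'DE', 'music', '2012-02-03');
    """)

    # "How many pageviews for category x in January 2012?"
    rows = conn.execute("""
        SELECT category, COUNT(*)
        FROM events
        WHERE event_type = 'pageview'
          AND day BETWEEN '2012-01-01' AND '2012-01-31'
        GROUP BY category
    """).fetchall()
    print(rows)  # [('books', 2)]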
EDIT: Since you are asking for specifics, consider a photo-sharing site (like Flickr or Picasa; I know that one uses PHP while the other uses Python), for instance. If it proves to be successful, it needs to scale enormously. I hope this is specific enough.
It's been some time since I heard any discussion on this, and since I am in the decision-making process of choosing between Ruby and Python for a web project, here come the questions:
[1] Can current versions of Rails (Ruby) and Django (Python) query more than one database at a time?
[2] I also read on SO that "If your focus is building web-sites or web-applications go Ruby" (because it has the fully featured, web-focused Rails). But that was about two years ago. What's the state of the Python web framework Django today? Is it head-to-head with Rails now?
EDIT: [3] Don't know if I can ask this here, but it's really surprising how quickly the Stack Exchange sites load. Do SE sites still use the same technology mentioned here? If not, does anyone have an update?
There's nothing in either of the languages that would prevent you from connecting to more than one database at a time. The real question is why would you want to?
The reason the StackOverflow sites are so fast is not so much the choice of technology as the way it's applied. Database optimization techniques are largely independent of the platform involved; they are based on common-sense principles and proven methods of scaling.
Ruby on Rails offers a number of methods for connecting to multiple databases, though you might mean connecting to a system that's divided into shards, into multi-tenant partitions, or where different forms of data are stored on different databases. All of these approaches are supported, but they are quite different in implementation.
You should post a new question with an outline of the problem you're trying to solve if you want a specific answer.
Multi-database support exists in Django. In our Django project we have models pulling data from Postgres, MySQL, Oracle, and MS SQL Server (there are various issues depending on the database, but it all generally works). From what I've read, RoR supports multiple databases as well. Each framework comes with its own set of strengths and weaknesses which you have to evaluate against your particular needs and requirements. I don't think anyone can give you a (valid/useful) general answer without knowing the specifics of your situation.
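For illustration, the Django side of that looks roughly like this (the aliases and the Report model are invented; the DATABASES setting and .using() are standard Django):

    # settings.py -- two configured connections
    DATABASES = {
        "default": {
            "ENGINE": "django.db.backends.postgresql",
            "NAME": "app",
        },
        "reporting": {
            "ENGINE": "django.db.backends.mysql",
            "NAME": "analytics",
        },
    }

    # elsewhere: route a queryset explicitly to the second database
    # (Report is a hypothetical model defined elsewhere)
    rows = Report.objects.using("reporting").filter(year=2012)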
Is it even possible to create an abstraction layer that can accommodate relational and non-relational databases? The purpose of this layer is to minimize repetition and allow a web application to use any kind of database by just changing/modifying the code in one place (i.e., the abstraction layer). The part that sits on top of the abstraction layer must not need to worry whether the underlying database is relational (SQL) or non-relational (NoSQL) or whatever new kind of database that may come out in the future.
There's a Summer of Code project going on right now to add non-relational support to Django's ORM. It seems to be going well and chances are good that it will be merged into core in time for Django 1.3.
You could use stock Django and django-nonrel ( http://www.allbuttonspressed.com/projects/django-nonrel ) together to get a fairly unified experience. Some limits apply, though; read the docs carefully, remembering Spolsky's "All abstractions are leaky".
You may also check web2py; it supports relational databases and GAE in the core.
Regarding App Engine, all existing attempts limit you in some way (web2py doesn't support transactions or namespaces, and probably many other things, for example). If you plan to work with GAE, use what GAE provides and forget about looking for a SQL-NoSQL holy grail. Existing solutions are inevitably limited and affect performance negatively.
Thank you for all the answers. To summarize: currently only web2py and Django support this kind of abstraction.
It is not about a SQL-NoSQL holy grail; using abstraction can make an app more flexible. Let's assume you started a project using NoSQL and later on need to switch over to SQL. It is desirable to only make changes to the code in a few spots instead of all over the place. In some cases it does not really matter whether you store the data in a relational or non-relational db: for example, user profiles, text content for a dynamic page, or blog entries.
I know there must be a trade-off to using the abstraction, but my question is more about existing solutions and technical insight than about the consequences.
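To make the idea concrete, here is the sort of thing I mean: a minimal repository-style sketch (all names invented) where the application codes against one interface and the backend is swapped in a single place:

    from abc import ABC, abstractmethod

    class UserStore(ABC):
        """The single interface the web app talks to."""
        @abstractmethod
        def get(self, user_id): ...
        @abstractmethod
        def put(self, user_id, profile): ...

    class SQLiteUserStore(UserStore):
        def __init__(self, conn):
            self.conn = conn
        def get(self, user_id):
            row = self.conn.execute(
                "SELECT profile FROM users WHERE id = ?", (user_id,)).fetchone()
            return row[0] if row else None
        def put(self, user_id, profile):
            self.conn.execute(
                "INSERT OR REPLACE INTO users (id, profile) VALUES (?, ?)",
                (user_id, profile))

    class DictUserStore(UserStore):
        """Stand-in for a key-value store such as Redis or the GAE datastore."""
        def __init__(self):
            self.data = {}
        def get(self, user_id):
            return self.data.get(user_id)
        def put(self, user_id, profile):
            self.data[user_id] = profile

    # switching from NoSQL-style to SQL is one line here, not "all over the place"
    store = DictUserStore()
    store.put(1, "alice")
    print(store.get(1))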
I am having to look into some code and consider working in a Python framework called Glashammer.
I know and love Django. I have some experience with the App Engine native framework and with Django on App Engine.
I'd like to know, from those of you who have used one or more of these, how Glashammer compares and contrasts with the others. What are the pros and cons, and what else do I need to be aware of?
I am highly biased, because I am the Glashammer author. But the pros for me are:
Werkzeug-based: removes much of the boilerplate in creating Werkzeug-based applications (a sketch of that boilerplate follows below)
Easy pluggability and high flexibility: two levels of plugins, Appliances (reusable components) and Bundles (behavioural modifiers)
Well unit tested
Documentation is not bad (for an open source project)
Versus something like Django, I would just have to say "Werkzeug-based, with a nicer plugin framework."
Did I mention the code is beautiful like a glowing orb of ... (oh maybe this is subjective)
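For readers who haven't used Werkzeug: it is a WSGI toolkit rather than a full framework, so even a bare application carries boilerplate like the following sketch (plain Werkzeug, no Glashammer), which is the kind of thing the framework removes:

    from werkzeug.routing import Map, Rule
    from werkzeug.wrappers import Request, Response

    url_map = Map([Rule("/", endpoint="index")])

    @Request.application
    def application(request):
        # routing, dispatch, and response creation are all manual
        adapter = url_map.bind_to_environ(request.environ)
        endpoint, values = adapter.match()
        return Response("Hello from %s" % endpoint)

    if __name__ == "__main__":
        from werkzeug.serving import run_simple
        run_simple("localhost", 5000, application)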
After a bit of googling (and finding your question :) and half an hour of reading docs and code, I can say that Glashammer is great because it:
is well-documented;
is lightweight and very flexible;
provides almost everything to rapidly build a complex web app -- unlike Werkzeug itself;
does not suffer from NIH syndrome, which is arguably Django's greatest wart;
does not impose database-related libraries and thus supports whatever storage one can use from Python. Django only supports a number of relational databases and assumes you are happy with them. Of course you can drop the Django ORM, but that renders the admin (Django's strongest point) useless;
appliances are the best way to define views I've seen so far.
Glashammer is not so great because it:
has shorter development history and much smaller community than Django's, which leads to:
inevitably lower quality of core code, and
inevitably lower quantity of contributed code;
makes use of some components that may be unstable (e.g. flatland, which is in alpha stage, though that's an arbitrary label and may say nothing about quality; moreover, it's only used in glashammer.utils.yconfig);
does not provide an API to define models (e.g. some declarative wrapper with backends), so the "pluginability" of applications can be significantly weaker than in Django (applications will make too many assumptions about the environment).
Anyway, I think this framework is worth diving into.
I have been using Turbogears 1 for prototyping small sites for the last couple of years and it is getting a little long in the tooth. Any suggestions on making the call between upgrading to Turbogears 2 or switching to something like Django? I'm torn between the familiarity of the TG community who are pretty responsive and do pretty good documentation vs the far larger community using Django. I am quite tempted by the built-in CMS features and the Google AppEngine support.
Any advice?
TG2 is built on top of Pylons, which has a fairly large community as well. TG2 is faster than TG1, and it includes a per-method (not just per-page) caching engine.
I think it's more AJAX-friendly than Django in the way pages can be easily published as HTML or JSON.
2011 update: after 3 years of bloated frameworks, I'm a happy user of http://bottlepy.org/
I have experience with both Django and TG1.1.
IMO, TurboGears' strong point is its ORM: SQLAlchemy. I prefer TurboGears when the database side of things is non-trivial.
Django's ORM is just not that flexible and powerful.
That being said, I prefer Django. If the database schema is a good fit with Django's ORM I would go with Django.
In my experience, it is simply less hassle to use Django compared with TurboGears.
I have been using Django for a year now; when I started I had no experience of Python or Django, and I found it very intuitive to use.
I have created a number of hobbyist Google App Engine apps using Django, the latest one being a CMS for my site. Using Django has meant that I have been able to code a lot quicker and with far fewer bugs.
I'm sure you will have read plenty of comparisons between TurboGears and Django on the web.
But as for your temptation regarding CMS and GAE, I really think you have to go the Django way.
Check these out, and decide yourself.
Django with GAE
Django for CMS
I've only got one question... is the app you are developing directed towards social networking or customized business logic?
I personally find Django good for social networking, and Pylons/TurboGears if you really want flexibility and no boundaries...
just my 2c
TG2 seems much more complicated and confusing, even for doing something simple like a login page with multiple error messages:
How to extend the Turbogears 2.1 login functionality
I think that's because of intemperance in modularity...
Django's ORM uses the Active Record pattern, the implementation you'll see in most ORMs. Basically, it means that each row in the database is mapped directly to an object in the code, and vice versa. Frameworks in this style can often infer the structure by looking at the database schema (Django has you declare fields on the model, but the one-object-per-row mapping is the same). Because each object is tied to a specific row in a table, you can simply save the record to the database.
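To make that concrete, here is a minimal Active Record sketch using the standard Django model API (the Article model is a made-up example):

    # models.py -- a hypothetical model; each instance maps to one table row
    from django.db import models

    class Article(models.Model):
        title = models.CharField(max_length=200)
        views = models.IntegerField(default=0)

    # Active Record style: the object itself knows how to persist
    article = Article(title="Hello")
    article.views += 1
    article.save()  # issues the INSERT/UPDATE for the mapped row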
SQLAlchemy uses the Data Mapper pattern. With this kind of implementation there is a separation between the database structure and the object structure (they are not 1:1 as in Active Record). In most cases you'll have to use a separate persistence layer to interact with the database (for example, to save an object), so you can't just call a save() method as you can with Active Record (which is a con). On the other hand, your code doesn't have to know the entire relational structure of the database to function, as there is no direct coupling between the code and the database.
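And the Data Mapper counterpart, a minimal sketch assuming SQLAlchemy 1.4+ (the Article class is again made up):

    from sqlalchemy import Column, Integer, String, create_engine
    from sqlalchemy.orm import Session, declarative_base

    Base = declarative_base()

    class Article(Base):
        __tablename__ = "articles"
        id = Column(Integer, primary_key=True)
        title = Column(String)

    engine = create_engine("sqlite://")  # in-memory database for the demo
    Base.metadata.create_all(engine)

    # Data Mapper style: a separate Session persists plain mapped objects
    with Session(engine) as session:
        session.add(Article(title="Hello"))  # no save() on the object itself
        session.commit()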
So which of them wins this battle? Neither. It depends on what you're trying to accomplish. My belief is that if your application is mostly a CRUD (Create, Read, Update, Delete) application with no hard and complex rules governing the relationships between the different data entities, you should use the Active Record implementation (Django). It will let you easily and quickly set up an MVP for your product, without any hassle. If you have a lot of "business rules" and restrictions in your application, you might be better off with the Data Mapper model, as it won't tie you down and force you to think as strictly as Active Record does.