django - model - what are the benefits?

I have configured my Django project to work with MariaDB (MySQL).
It works fine, but I have some issues (or concerns) with models.
First of all, I am not sure why I need them when, for me personally, it's much easier to use SQL statements to get the data.
Using an API for DB queries might be useful for people who do not know SQL, but for me it's less flexible.
Can anybody explain the main benefits of using models?
Here is one of the issues I have. See the code below.
class Quotes(models.Model):
    updated = models.DateTimeField()
    tdate = models.DateField(default='1900-01-01')
    ticker = models.CharField(max_length=15)
    open = models.FloatField(default=0)
    vol = models.BigIntegerField(default=0)
Why does the program not consider 'default' when the DB table and fields are created?
Why is what I define as FloatField a 'double' and not a 'float' in the DB? (I checked this using phpMyAdmin.)
How can I properly set a default value?
My table will have at least 1 million entries.
Do I need to be concerned about performance when using the API instead of direct SQL queries? Usually one query will select 700-800 entries.
Is it a good approach to use MySQLdb and direct SQL instead of models?
Sorry that some questions might sound too simple, but I just started with Django. Before this I worked with PHP. The main reason I want to use Python for web page development is a library which I have developed.

Question zero, i.e. why models: Django's models are a nice abstraction on top of relational database tables – most (if not all) web apps end up having (or being) some sort of CRUD where you manipulate objects or graphs of objects saved in the database, so an object-oriented approach is nice to work with there.
In addition, many features in Django (and libraries that work with Django) are built around models (such as the admin, ModelForms, serialization, etc.).
Question 1: That date should preferably be datetime.date(1900, 1, 1), not a string, but that aside, Django deals with defaults on model instantiation, not necessarily in the database.
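To make the "defaults are applied at instantiation" point concrete, here is a minimal sketch that does not use Django at all. A plain dataclass (the hypothetical QuoteRow) stands in for the model and sqlite3 stands in for MariaDB; the table is deliberately created without any DEFAULT clauses, yet the stored row still carries the default values because they were filled in on the Python side.

```python
import datetime
import sqlite3
from dataclasses import dataclass


@dataclass
class QuoteRow:
    # Hypothetical stand-in for the Quotes model: the defaults live in Python.
    ticker: str
    tdate: datetime.date = datetime.date(1900, 1, 1)
    open: float = 0.0
    vol: int = 0


conn = sqlite3.connect(":memory:")
# No DEFAULT clauses in the schema, similar to what the question observed.
conn.execute("CREATE TABLE quotes (ticker TEXT, tdate TEXT, open REAL, vol INTEGER)")

row = QuoteRow(ticker="ABC")  # defaults are filled in here, at instantiation
conn.execute(
    "INSERT INTO quotes (ticker, tdate, open, vol) VALUES (?, ?, ?, ?)",
    (row.ticker, row.tdate.isoformat(), row.open, row.vol),
)

stored = conn.execute("SELECT ticker, tdate, open, vol FROM quotes").fetchone()
print(stored)
```

The flip side is that rows inserted by hand-written SQL that bypasses the Python layer would not get these defaults.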
Question 2: Because that's how it's mapped, presumably to avoid programmers accidentally losing floating-point precision (since MySQL is rather notorious about doing precision-losing conversions "behind your back").
Question 3: Django's ORM is, to be absolutely honest, not the fastest when it generates queries and instantiates model instances. Most of the time, in regular operations, that's not a problem. Depending on what you're doing with those 700 to 800 instances, you may be able to work around that anyway; for instance, using .values() or .values_list() on a queryset if you don't need the actual instances, just the data.
Regarding direct SQL, please don't hard-code any MySQLdb calls in a Django app though; Django has very nice "escape hatches" for doing raw SQL:
You can perform .raw() SQL queries that map into models, or if that's not enough,
you can just execute SQL against the database connection like you would with MySQLdb.
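As a rough sketch of the second escape hatch: the cursor you get from Django's database connection follows the same DB-API pattern as the code below, which uses sqlite3 purely so that it runs without a Django project. The table, the sample rows, and the helper name volumes_for are made up for the illustration. Note that sqlite3 uses "?" placeholders, whereas Django's cursor expects "%s".

```python
import sqlite3

# sqlite3 stands in for the MariaDB connection so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE quotes (ticker TEXT, tdate TEXT, vol INTEGER)")
conn.executemany(
    "INSERT INTO quotes VALUES (?, ?, ?)",
    [("ABC", "2014-01-02", 100), ("ABC", "2014-01-03", 250), ("XYZ", "2014-01-02", 75)],
)


def volumes_for(connection, ticker):
    """Run a parameterized query and return plain tuples, not model objects."""
    cursor = connection.cursor()
    # The value is passed separately from the SQL text, never interpolated.
    cursor.execute(
        "SELECT tdate, vol FROM quotes WHERE ticker = ? ORDER BY tdate", (ticker,)
    )
    return cursor.fetchall()


rows = volumes_for(conn, "ABC")
print(rows)
```

Because only tuples come back, this also avoids the cost of building model instances, which is the same idea as .values_list() on a queryset.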
Oh, and one more thing: your model name should be singular (Quote) :)

Related

The maximum number of objects that can be instantiated with a Django model?

I wrote an app to record user interactions with the website search box.
The query string is saved as an object of the model SearchQuery. Whenever a user enters some data in the search box, I save the search query and some info related to the query in the database.
The idea is to track search trends.
The fields in my database model are:
A Character Field (max_length=30)
A PositiveIntegerField
A BooleanField
My questions are:
How many objects can be instantiated from the model SearchQuery? Is there a limit on the number?
As the objects are not related (no DB relationships), should I use MongoDB or some other NoSQL store for performance?
Is this a good design or should I do some more work to make it efficient?
Django version 1.6.5
Python version 2.7

How many objects can be instantiated from the model SearchQuery? Is there a limit on the number?
As many as your chosen database can handle, which is probably in the millions. If you are concerned, you can use a scheduler to delete older queries when they are no longer useful.
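The cleanup job could look roughly like the sketch below. It assumes the table has a creation-date column, which the model described in the question does not have, so treat the created column, the table name, and purge_older_than as hypothetical. sqlite3 is used only so the sketch runs on its own; a scheduler such as cron would call the function periodically.

```python
import datetime
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE search_query (query TEXT, hits INTEGER, created TEXT)")
conn.executemany(
    "INSERT INTO search_query VALUES (?, ?, ?)",
    [("django", 3, "2014-01-01"), ("python", 5, "2014-06-01"), ("orm", 1, "2014-06-20")],
)


def purge_older_than(connection, today, max_age_days):
    """Delete rows created more than max_age_days before today."""
    cutoff = (today - datetime.date.resolution * max_age_days).isoformat()
    # ISO-formatted dates sort correctly when compared as strings.
    cursor = connection.execute(
        "DELETE FROM search_query WHERE created < ?", (cutoff,)
    )
    connection.commit()
    return cursor.rowcount


deleted = purge_older_than(conn, datetime.date(2014, 7, 1), max_age_days=90)
remaining = [r[0] for r in conn.execute("SELECT query FROM search_query ORDER BY created")]
print(deleted, remaining)
```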
As the objects are not related (no DB relationships), should I use MongoDB or some other NoSQL store for performance?
You could, but it's unlikely to give you much (if any) efficiency gain. Because you are doing frequent writes and (presumably) infrequent reads, it's unlikely to hit the database very hard at all.
Is this a good design or should I do some more work to make it efficient?
There are probably two recommendations I'd make.
a. If you are going to be doing frequent reads on the Search log, look at using multiple databases. One for your log, and one for everything else.
b. Consider just using a regular log file for this information. Again, you will probably only be examining this data infrequently, so there are strong arguments for piping it into a log file, probably CSV-like, to make analysis of the data easier.
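A minimal sketch of option (b), using only the standard csv module. The column names are guesses based on the three fields listed in the question, and an in-memory buffer stands in for the real log file so the example is self-contained.

```python
import csv
import io

FIELDS = ["query", "result_count", "clicked"]  # hypothetical column names


def log_search(stream, query, result_count, clicked):
    """Append one search event as a CSV row."""
    # Truncate to 30 characters to mirror the max_length of the character field.
    csv.writer(stream).writerow([query[:30], result_count, int(clicked)])


buffer = io.StringIO()  # in a real app: a file opened in append mode
csv.writer(buffer).writerow(FIELDS)
log_search(buffer, "django orm", 12, True)
log_search(buffer, "raw, sql", 0, False)  # the comma is quoted by the csv module

# Later analysis can read the log back as dictionaries.
buffer.seek(0)
rows = list(csv.DictReader(buffer))
print(rows[1]["query"], rows[0]["result_count"])
```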

Building a DSL query language

I'm working on a project (written in Django) which has only a few entities, but many rows for each entity.
In my application I have several static "reports", written directly in plain SQL. The users can also search the database via a generic filter form. Since the target audience is really tech-savvy and at some point the filter doesn't fit their needs, I am thinking about creating a query language for my database like YQL or Jira's advanced search.
I found http://sourceforge.net/projects/littletable/ and http://www.quicksort.co.uk/DeeDoc.html, but it seems that they only operate on in-memory objects. Since the database can be too large to hold in memory, I would prefer that the query be translated into SQL (or better, a Django query) before doing the actual work.
Are there any libraries or best practices for doing this?

Writing such a DSL is actually surprisingly easy with PLY, and what ho, there's already an example available for doing just what you want, in Django. You see, Django has this fancy thing called a Q object which makes the Django querying side of things fairly easy.
At DjangoCon EU 2012, Matthieu Amiguet gave a session entitled Implementing Domain-specific Languages in Django Applications in which he went through the process, right down to implementing such a DSL as you desire. His slides, which include all you need, are available on his website. The final code (linked to from the last slide, anyway) is available at http://www.matthieuamiguet.ch/media/misc/djangocon2012/resources/compiler.html.
Reinout van Rees also produced some good comments on that session. (He normally does!) These cover a little of the missing context.
You see in there something very similar to YQL and JQL in the examples given:
groups__name="XXX" AND NOT groups__name="YYY"
(modified > 1/4/2011 OR NOT state__name="OK") AND groups__name="XXX"
It can also be tweaked very easily; for example, you might want to use groups.name rather than groups__name (I would). This modification could be made fairly trivially (allow . in the FIELD token, by modifying t_FIELD, and then replacing . with __ before constructing the Q object in p_expression_ID).
So, that satisfies simple querying; it also gives you a good starting point should you wish to make a more complex DSL.
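The dotted-name tweak can be sketched without PLY. The snippet below is not the grammar from the slides; it uses a single regular expression as a stand-in, handles only one condition of the form field="value", and the names CONDITION and to_lookup are made up. It only shows the step of turning groups.name into the groups__name lookup key.

```python
import re

# One condition of the form  field="value", where field may contain dots.
CONDITION = re.compile(
    r'^\s*(?P<field>[A-Za-z_][\w.]*)\s*=\s*"(?P<value>[^"]*)"\s*$'
)


def to_lookup(condition):
    """Turn 'groups.name="XXX"' into ('groups__name', 'XXX')."""
    match = CONDITION.match(condition)
    if match is None:
        raise ValueError("cannot parse condition: %r" % condition)
    # Django lookups separate relation hops with a double underscore.
    return match.group("field").replace(".", "__"), match.group("value")


key, value = to_lookup('groups.name="XXX"')
lookup = {key: value}  # in Django this dict could be passed on as Q(**lookup)
print(lookup)
```

Combining conditions with AND, OR and NOT is where a real parser such as the PLY one earns its keep.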

I've faced exactly this problem - a large database which needs searching. I made some static reports and several fancy filters using django (very easy with django) just like you have.
However the power users were clamouring for more. I decided that there already was a DSL that they all knew - SQL. The question was how to make it secure enough.
So I used django permissions to give the power users permission to make SQL queries in a new table. I then made a view for the not-quite-so-power users to use these queries. I made them take optional parameters. The queries were run using Python's lower level DB-API which django is using under the hood for its ORM anyway.
The real trick was opening a read only database connection to run these queries just to make sure that no updates were ever run. I made a read only connection by creating a different user in the database with lower permissions and opening a specific connection for that in the view.
TL;DR - SQL is the way to go!
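For illustration, here is the read-only connection idea in miniature. The setup described above used a separate database user with lower permissions; this sketch swaps in sqlite3's read-only URI mode instead, simply because it runs without a database server. The table and the helper run_user_query are hypothetical.

```python
import os
import sqlite3
import tempfile

# Set up a throwaway database file with one table.
path = os.path.join(tempfile.mkdtemp(), "reports.db")
writer = sqlite3.connect(path)
writer.execute("CREATE TABLE report (name TEXT)")
writer.execute("INSERT INTO report VALUES ('weekly')")
writer.commit()
writer.close()

# Open a second connection that is read-only at the database level.
readonly = sqlite3.connect("file:" + path + "?mode=ro", uri=True)


def run_user_query(sql):
    """Run user-supplied SQL on the read-only connection."""
    return readonly.execute(sql).fetchall()


print(run_user_query("SELECT name FROM report"))

try:
    run_user_query("DELETE FROM report")
    write_blocked = False
except sqlite3.OperationalError:
    # The database itself refuses the write, whatever the SQL text says.
    write_blocked = True
print(write_blocked)
```

The point of doing this at the connection level is that the application does not need to inspect or sanitize the SQL to prevent updates.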

Depending on the form of your data, the types of queries your users need to use, and the frequency that your data is updated, an alternative to the pure SQL solution suggested by Nick Craig-Wood is to index your data in Solr and then run queries against it.
Solr is an added layer of complexity (configuration, data synchronization) but it is super-fast, can handle large datasets, and provides a (relatively) intuitive query language.

You could write your own SQL-ish language using pyparsing, actually. There is even a pretty verbose example you could extend.

Converting Django project from MySQL to Mongo, any major pitfalls?

I want to try MongoDB with mongoengine. I'm new to Django and databases and I'm having a fit with foreign keys, joins, circular imports (you name it). I know I could eventually work through these issues, but Mongo just seems like a simpler solution for what I am doing. I'm using a lot of pluggable apps (Imagekit, Haystack, Registration, etc.) and want to know whether these apps will continue to work if I make the switch. Are there any known headaches that I will encounter? If so, I might just keep banging my head against MySQL.

There's no reason why you can't use one of the standard RDBMSs for all the standard Django apps, and then Mongo for your app. You'll just have to replace all the standard ways of processing things from the Django ORM with doing it the Mongo way.
So you can keep urls.py and its neat pattern matching, views will still get parameters, and templates can still take objects.
You'll lose querysets because I suspect they are too closely tied to the RDBMS models - but they are just lazily evaluated lists really. Just ignore the Django docs on writing models.py and code up your database business logic in a Mongo paradigm.
Oh, and you won't have the Django Admin interface for easy access to your data.

You might want to check out django-nonrel, which is a young but promising attempt at a NoSQL backend for Django. Documentation is lacking at the moment, but it works great if you just work it out.

I've used mongoengine with Django, but you need to create a separate file, for example mongo_models.py. In that file you define your Mongo documents. You then create forms to match each Mongo document. Each form has a save method which inserts or updates what's stored in Mongo. Django forms are designed to plug into any data back end (with a bit of craft).
BEWARE: If you have very well defined and structured data that can be described in documents or models then don't use Mongo. It's not designed for that, and something like PostgreSQL will work much better.
I use PostgreSQL for relational or well-structured data because it's good for that: small memory footprint and good response.
I use Redis to cache or to operate in-memory queues/lists because it's very good for that: great performance, provided you have the memory to cope with it.
I use Mongo to store large JSON documents and to perform map and reduce on them (if needed) because it's very good for that. Be sure to index the fields you look up by, if you can, to speed up lookups.
Don't use a circle to fill a square hole. It won't fit.
I've seen too many posts where someone wanted to swap a relational DB for Mongo because Mongo is a buzz word. Don't get me wrong, Mongo is really great... when you use it appropriately. I love using Mongo appropriately.

Upfront: it won't work for any existing Django app that ships its models. There's no backend for storing Django's Model data in MongoDB or other NoSQL storages at the moment and, database backends aside, models themselves are somewhat of a moot point, because once you get into using someone's app (django.contrib apps included) that ships model-template-view triads, whenever you require a slightly different model for your purposes you either have to edit the application code (plain wrong), dynamically edit the contents of imported Python modules at runtime (magical), fork the application source altogether (cumbersome) or provide additional settings (good, but it's a rare encounter, with django.contrib.auth probably being the only widely known example of an application that allows you to dynamically specify which model it will use, as is the case with user profile models through the AUTH_PROFILE_MODULE setting).
This might sound bad, but what it really means is that you'll have to deploy SQL and NoSQL databases in parallel and go on an app-by-app basis, like Spacedman suggested, and if MongoDB is the best fit for a certain app, hell, just roll your own custom app.
There's a lot of fine Djangonauts with NoSQL storages on their minds. If you followed the streams from the past Djangocon presentations, every year there's been important discussions about how Django should leverage NoSQL storages. I'm pretty sure, in this year or the next, someone will refactor the apps and models API to pave the path to a clean design that can finally unify all the different flavors of NoSQL storages as part of the Django core.

I have recently tried this (although without Mongoengine). There are a huge number of pitfalls, IMHO:
No admin interface.
No Auth: django.contrib.auth relies on the DB interface.
Many things rely on django.contrib.auth.User. For example, the RequestContext class. This is a huge hindrance.
No Registration (Relies on the DB interface and django.contrib.auth)
Basically, search through the django interface for references to django.contrib.auth and you'll see how many things will be broken.
That said, it's possible that MongoEngine provides some support to replace/augment django.contrib.auth with something better, but there are so many things that depend on it that it's hard to say how you'd monkey patch something that much.

Primary pitfall (for me): no JOINs!

Does Python Django support custom SQL and denormalized databases with no Foreign Key relationships?

I've just started learning Python Django and have a lot of experience building high traffic websites using PHP and MySQL. What worries me so far is Django's overly optimistic assumption that you will never need to write custom SQL, and that it automatically creates all these foreign key relationships in your database. The one thing I've learned in the last few years of building Chess.com is that it's impossible to NOT write custom SQL when you're dealing with something like MySQL that frequently needs to be told what indexes it should use (or avoid), and that foreign keys are a death sentence. Percona's strongest recommendation was for us to remove all FKs for optimal performance.
Is there a way in Django to do this in the models file, that is, to create relationships without creating actual DB FKs? Or is there a way to start at the database level, design/create my database, and then have Django reverse engineer the models file?

If you don't want foreign keys, then avoid using
models.ForeignKey(),
models.ManyToManyField(), and
models.OneToOneField().
Django will automatically create an auto-increment int field named id that you can use to refer to individual records, or you can override that by marking a field as primary_key=True.
There is also documentation on running raw SQL queries on the database.

Raw SQL is as easy as this:
for obj in MyModel.objects.raw('SELECT * FROM myapp_mymodel'):
    print(obj)
Denormalizing a database is up to you at model definition time.
You can use non-relational databases (MongoDB, ...) too with Django NonRel.

django-admin inspectdb allows you to reverse engineer a models file from existing tables. That is only a very partial response to your question ;)

You can just create the models.py and avoid having Django automatically create the tables, leaving it up to you to define the actual tables as you please. So although there are foreign key relationships in the models.py, this does not mean that they must exist in the actual tables. This is a very good thing considering how ludicrously foreign key constraints are implemented in MySQL: MyISAM just ignores them, and InnoDB creates a non-optional index on every single one regardless of whether it makes sense.

I concur with the 'no foreign keys' advice (with the disclaimer: I also work for Percona).
The reason it is recommended is concurrency: it reduces internal locking.
It can be a difficult "optimization" to sell, but if you consider that the database has transactions (and is more or less ACID compliant) then it should only be application-logic errors that cause foreign-key violations. Not to say they don't exist, but if you enable foreign keys in development hopefully you should find at least a few bugs.
In terms of whether or not you need to write custom SQL:
The explanation I usually give is that "optimization rarely decreases complexity". I think it is okay to stick with an ORM by default, but if in a profiler it looks like one particular piece of functionality is taking a lot more time than you suspect it would when written by hand, then you need to be prepared to fix it (assuming the code is called often enough).
The real secret here is that you need good instrumentation / profiling in order to be frugal with your complexity-adding optimizations.
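As a very small example of what "instrumentation" can mean in practice, the sketch below records how long labelled pieces of code take, so you can see where the time goes before rewriting anything by hand. The timed helper and the build_report label are made up, and a real project would more likely reach for a proper profiler or a query log.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

timings = defaultdict(list)  # label -> list of elapsed seconds


@contextmanager
def timed(label):
    """Record the wall-clock duration of the enclosed block under label."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label].append(time.perf_counter() - start)


# Wrap the suspect piece of functionality (a stand-in computation here).
with timed("build_report"):
    total = sum(i * i for i in range(10000))

for label, samples in timings.items():
    print(label, "calls:", len(samples), "total seconds:", round(sum(samples), 6))
```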

Map raw SQL to multiple related Django models

Due to performance reasons I can't use the ORM query methods of Django and I have to use raw SQL for some complex queries. I want to find a way to map the results of a SQL query to several models.
I know I can use the following statement to map the query results to one model, but I can't figure how to use it to be able to map to related models (like I can do by using the select_related statement in Django).
model_instance = MyModel(**dict(zip(field_names, row_data)))
Is there a relatively easy way to be able to map fields of related tables that are also in the query result set?
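One possible way to extend the one-line mapping above to a joined result set is to group the columns by table before building the objects. This is only a sketch with plain Python classes: Book, Author and split_row are hypothetical, and it assumes the query aliases every selected column as "table.column".

```python
class Author:
    def __init__(self, id, name):
        self.id = id
        self.name = name


class Book:
    def __init__(self, id, title, author):
        self.id = id
        self.title = title
        self.author = author  # related object, already populated


def split_row(field_names, row_data):
    """Group the values of one joined row by the table prefix of each column."""
    groups = {}
    for name, value in zip(field_names, row_data):
        table, column = name.split(".", 1)
        groups.setdefault(table, {})[column] = value
    return groups


# Column names as they might come back from a query that aliases every
# selected column as "table.column" (an assumed convention).
field_names = ["book.id", "book.title", "author.id", "author.name"]
row_data = (7, "Dune", 3, "Frank Herbert")

parts = split_row(field_names, row_data)
book = Book(author=Author(**parts["author"]), **parts["book"])
print(book.title, "by", book.author.name)
```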

First, can you prove the ORM is stopping your performance? Sometimes performance problems are simply poor database design, or improper indexes. Usually this comes from trying to force-fit Django's ORM onto a legacy database design. Stored procedures and triggers can have adverse impact on performance -- especially when working with Django where the trigger code is expected to be in the Python model code.
Sometimes poor performance is an application issue. This includes needless order-by operations being done in the database.
The most common performance problem is an application that "over-fetches" data. Casually using the .all() method and creating large in-memory collections. This will crush performance. The Django query sets have to be touched as little as possible so that the query set iterator is given to the template for display.
Once you choose to bypass the ORM, you have to fight out the Object-Relational Impedance Mismatch problem. Again. Specifically, relational "navigation" has no concept of "related": it has to be a first-class fetch of a relational set using foreign keys. To assemble a complex in-memory object model via SQL is simply hard. Circular references make this very hard; resolving FK's into collections is hard.
If you're going to use raw SQL, you have two choices.
Eschew "select related" -- it doesn't exist -- and it's painful to implement.
Invent your own ORM-like "select related" features. A common approach is to add stateful getters that (a) check a private cache to see if they've fetched the related object and if the object doesn't exist, (b) fetch the related object from the database and update the cache.
In the process of inventing your own stateful getters, you'll be reinventing Django's, and you'll probably discover that it isn't the ORM layer, but a database design or an application design issue.
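The "stateful getter" from the second choice might look roughly like this. It is a sketch, not Django's implementation: the Book class, the author table, and the fetch_log list (used only to show that the database is hit once) are all made up, and sqlite3 stands in for the real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO author VALUES (3, 'Frank Herbert')")

fetch_log = []  # records every real database fetch, for demonstration only
_MISSING = object()  # sentinel, so a legitimately missing author is cached too


def fetch_author_name(author_id):
    fetch_log.append(author_id)
    row = conn.execute(
        "SELECT name FROM author WHERE id = ?", (author_id,)
    ).fetchone()
    return row[0] if row else None


class Book:
    def __init__(self, title, author_id):
        self.title = title
        self.author_id = author_id
        self._author_name = _MISSING  # private cache for the related value

    @property
    def author_name(self):
        # (a) check the private cache; (b) on a miss, fetch and remember.
        if self._author_name is _MISSING:
            self._author_name = fetch_author_name(self.author_id)
        return self._author_name


book = Book("Dune", 3)
first = book.author_name   # triggers the query
second = book.author_name  # served from the cache
print(first, len(fetch_log))
```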
