For performance reasons I can't use Django's ORM query methods and have to use raw SQL for some complex queries. I want to find a way to map the results of a SQL query to several models.
I know I can use the following statement to map the query results to one model, but I can't figure out how to use it to map to related models as well (the way select_related() does in Django).
model_instance = MyModel(**dict(zip(field_names, row_data)))
Is there a relatively easy way to map fields of related tables that are also in the query result set?
First, can you prove the ORM is stopping your performance? Sometimes performance problems are simply poor database design, or improper indexes. Usually this comes from trying to force-fit Django's ORM onto a legacy database design. Stored procedures and triggers can have an adverse impact on performance -- especially when working with Django, where the trigger code is expected to be in the Python model code.
Sometimes poor performance is an application issue. This includes needless order-by operations being done in the database.
The most common performance problem is an application that "over-fetches" data: casually using the .all() method and building large in-memory collections will crush performance. Django querysets should be touched as little as possible, so that the lazy queryset iterator is handed to the template for display.
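To make the over-fetching point concrete, here is a minimal sketch (the model and template names are made up for illustration):

from django.shortcuts import render
from myapp.models import MyModel

def item_list(request):
    # Over-fetching: list() forces every row into memory at once.
    # items = list(MyModel.objects.all())
    # Better: pass the lazy queryset straight to the template and let it iterate.
    return render(request, 'items.html', {'items': MyModel.objects.all()})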
Once you choose to bypass the ORM, you have to fight the Object-Relational Impedance Mismatch problem all over again. Specifically, relational "navigation" has no concept of "related": every related row has to be a first-class fetch of a relational set using foreign keys. Assembling a complex in-memory object model via SQL is simply hard; circular references make it very hard, and resolving FKs into collections is hard.
If you're going to use raw SQL, you have two choices.
Eschew "select related" -- it doesn't exist -- and it's painful to implement.
Invent your own ORM-like "select related" features. A common approach is to add stateful getters that (a) check a private cache to see if the related object has already been fetched and, if it hasn't, (b) fetch it from the database and update the cache.
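A minimal sketch of such a stateful getter (Customer is assumed to be a similar plain wrapper class; the table and column names are invented):

from django.db import connection

class Order:
    def __init__(self, row):
        self.id, self.customer_id, self.total = row
        self._customer = None  # private cache for the related object

    @property
    def customer(self):
        # (a) check the private cache first
        if self._customer is None:
            # (b) on a miss, fetch the related row and cache it
            with connection.cursor() as cursor:
                cursor.execute("SELECT id, name FROM myapp_customer WHERE id = %s",
                               [self.customer_id])
                self._customer = Customer(cursor.fetchone())
        return self._customer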
In the process of inventing your own stateful getters, you'll be reinventing Django's, and you'll probably discover that it isn't the ORM layer, but a database design or an application design issue.
I am kind of new to SQL and databases, and I am currently developing a website with the Django framework.
While reading the Django documentation I came across raw SQL queries, which are executed using Manager.raw() like below.
for p in Person.objects.raw('SELECT * FROM myapp_person'):
    print(p)

Manager.raw(raw_query, params=None, translations=None)
How do raw queries differ from normal SQL queries, and when should I use raw SQL queries instead of the Django ORM?
Django (like other similar ORM tools) is a connection between relational databases and object-oriented programming. One of the very important functions that it implements is providing a uniform interface to the database -- regardless of the underlying database.
When you use the built-in Django functionality, the code should work on any supported database (there may be specific limits on this). This makes it particularly easy to port to another database. It also helps ensure that the generated queries do what you intend.
When you use raw SQL, the code is likely to be specific to one database (creating a porting problem). The code is also not checked, which can result in hard-to-understand errors.
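For instance (a rough sketch using the Person model from the example above), the ORM call is identical on every backend, while the raw version bakes in one database's dialect:

# Portable: Django emits the right SQL for whichever backend is configured
people = Person.objects.filter(last_name__icontains='smith')[:10]

# Not portable: ILIKE and this LIMIT syntax are PostgreSQL-flavoured
people = Person.objects.raw(
    "SELECT * FROM myapp_person WHERE last_name ILIKE %s LIMIT 10", ['%smith%'])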
I have a strong preference for using SQL directly -- but that is because I am not a programmer using an ORM framework. If you are going to use such a framework, it is probably better to use the built-in functionality wherever possible.
This is a borderline opinion question so it might get flagged, but it is a good point. Essentially, raw SQL queries are intended only for the edge cases where the Django ORM does not fulfil your needs (and with each new version Django supports more and more query types, so raw() becomes less useful).
In general I would suggest using the ORM for the more helpful error messages, maintainability, and plain ease of use, and only using raw() as a last resort.
I have configured my Django project to work with MariaDB (MySQL).
It works fine; however, I have some issues (or concerns) with models.
First of all, I am not sure why I need them when, for me personally, it's much easier to use SQL statements to get the data.
Using an API for DB queries might be useful for people who do not know SQL, but for me it's less flexible.
Can anybody explain the main benefits of using models?
Here is one of the issues I have. See the code below.
from django.db import models

class Quotes(models.Model):
    updated = models.DateTimeField()
    tdate = models.DateField(default='1900-01-01')
    ticker = models.CharField(max_length=15)
    open = models.FloatField(default=0)
    vol = models.BigIntegerField(default=0)
Why does the program not apply 'default' when the DB table and fields are created?
Why is what I define as a FloatField created in the DB as 'double' and not 'float'? (I checked this using phpMyAdmin.)
How can I properly set a default value?
My table will have at least 1 million entries.
Do I need to be concerned about performance when using the API instead of direct SQL queries? Usually one query will select 700-800 entries.
Is it a good approach to use MySQLdb and direct SQL instead of models?
Sorry if some questions sound too simple, but I have just started with Django. Before this I worked with PHP. The main reason I want to use Python for web development is a library I have developed.
Question zero, i.e. why models: Django's models are a nice abstraction on top of relational database tables – most (if not all) web apps end up having (or being) some sort of CRUD where you manipulate objects or graphs of objects saved in the database, so an object-oriented approach is nice to work with there.
In addition, many features in Django (and libraries that work with Django) are built around models (such as the admin, ModelForms, serialization, etc.).
Question 1: That date should preferably be datetime.date(1900, 1, 1), not a string, but that aside, Django deals with defaults on model instantiation, not necessarily in the database.
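So a hedged version of that field, using a date object rather than a string, would be:

import datetime
from django.db import models

class Quotes(models.Model):
    tdate = models.DateField(default=datetime.date(1900, 1, 1))
    # Django applies this default when the instance is created in Python;
    # it does not become a DEFAULT clause on the MySQL column.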
Question 2: Because that's how it's mapped, presumably to avoid programmers accidentally losing floating-point precision (since MySQL is rather notorious about doing precision-losing conversions "behind your back").
Question 3: Django's ORM is, to be absolutely honest, not the fastest when it generates queries and instantiates model instances. Most of the time, in regular operations, that's not a problem. Depending on what you're doing with those 700 to 800 instances, you may be able to work around that anyway; for instance, using .values() or .values_list() on a queryset if you don't need the actual instances, just the data.
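For example, if you only need the plain values and not model instances, something like this (field names taken from the Quotes model above) skips instantiation entirely:

# dicts instead of full model instances
rows = Quotes.objects.filter(ticker='AAPL').values('tdate', 'vol')

# or flat tuples
pairs = Quotes.objects.filter(ticker='AAPL').values_list('tdate', 'vol')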
Regarding direct SQL, please don't hard-code any MySQLdb calls in a Django app though; Django has very nice "escape hatches" for doing raw SQL:
You can perform .raw() SQL queries that map into models, or if that's not enough,
you can just execute SQL against the database connection like you would with MySQLdb.
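Both escape hatches look roughly like this (the table name myapp_quotes is an assumption based on the model above):

from django.db import connection

# 1. Map rows onto model instances with .raw()
quotes = Quotes.objects.raw(
    'SELECT * FROM myapp_quotes WHERE ticker = %s', ['AAPL'])

# 2. Execute SQL against the connection directly, as you would with MySQLdb
with connection.cursor() as cursor:
    cursor.execute('SELECT tdate, vol FROM myapp_quotes WHERE ticker = %s', ['AAPL'])
    rows = cursor.fetchall()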
Oh, and one more thing: your model name should be singular (Quote) :)
I'm learning Django and its ORM data access methodology and there is something that I'm curious about. In one particular endpoint, I'm making a number of database calls (to Postgres) - below is an example of one:
projects = Project.objects\
    .filter(Q(first_appointment_scheduled=True) | (Q(active=True) & Q(phase=ProjectPhase.meet.value)))\
    .select_related('customer__first_name', 'customer__last_name',
                    'lead_designer__user__first_name', 'lead_designer__user__last_name')\
    .values('id')\
    .annotate(project=F('name'),
              buyer=Concat(F('customer__first_name'), Value(' '), F('customer__last_name')),
              designer=Concat(F('lead_designer__user__first_name'), Value(' '), F('lead_designer__user__last_name')),
              created=F('created_at'),
              meeting=F('first_appointment_date'))\
    .order_by('id')[:QUERY_SIZE]
As you can see, that's not a small query - I'm pulling in a lot of specific, related data and doing some string manipulation. I'm relatively concerned with performance so I'm doing the best I can to make things more efficient by using select_related() and values() to only get exactly what I need.
The question I have is, conceptually and in broad terms, at what point does it become faster to just write my queries using parameterized SQL instead of using the ORM (since the ORM has to first "translate" the above "mess")? At what approximate level of query complexity should I switch over to raw SQL?
Any insight would be helpful. Thanks!
The question I have is, conceptually and in broad terms, at what point does it become faster to just write my queries using parameterized SQL instead of using the ORM (since the ORM has to first "translate" the above "mess")?
If you are asking about performance, Never.
The time taken to convert the ORM query into SQL will be very small compared to the time taken to actually execute that query. Brain cells are irreplaceable, servers are cheap.
If you really do have performance issues, the first place to look is at the indexes in your models. Try printing out each of the queries generated by the ORM and running them in your psql console, prefixing each with EXPLAIN ANALYSE.
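For example, using the projects queryset from the question:

# Print the SQL the ORM generates for the queryset built above
print(projects.query)

# Then paste it into psql behind EXPLAIN ANALYSE:
#   EXPLAIN ANALYSE SELECT ... ;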
You can also use django-debug-toolbar to automate this. In fact, django-debug-toolbar is an essential tool for hunting down bottlenecks. You will be surprised how often you have missed a simple select_related and how that causes hundreds of additional queries to be executed.
At what approximate level of query complexity should I switch over to raw SQL?
If you are asking about the ease of coding, it depends.
If the query is very hard to write using the ORM and ends up unreadable, then yes, it's perfectly fine to use a raw query. For example, a query that has multiple aggregations, common table expressions, multiple joins, etc. can sometimes be hard to express as an ORM query; in that case, if you are comfortable with raw SQL, writing it that way is fine.
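As a hedged illustration of that kind of query (the table and columns are invented, not taken from the question):

latest_per_customer = Project.objects.raw("""
    WITH ranked AS (
        SELECT p.*,
               ROW_NUMBER() OVER (PARTITION BY customer_id
                                  ORDER BY created_at DESC) AS rn
        FROM myapp_project p
    )
    SELECT * FROM ranked WHERE rn = 1
""")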
Agreed with what e4c5 said.
The additional translation layer that converts an ORM query into raw SQL will affect performance.
However, how much it matters depends on how complex your query is.
When you use the ORM, you can control the load on the DB by doing more of the processing in the app. In addition, this gives you the opportunity to cache results in the application itself.
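For instance, with Django's cache framework (a sketch; the key, timeout and queryset are arbitrary):

from django.core.cache import cache

def active_projects():
    projects = cache.get('active_projects')
    if projects is None:
        # evaluate the queryset once and keep the result in the app-level cache
        projects = list(Project.objects.filter(active=True))
        cache.set('active_projects', projects, timeout=300)
    return projects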
In the end, it totally depends on your schema, how complex your queries are, and how you are scaling your DB (indexes, replicas, etc.).
ORM tools are great when the queries we need are simple SELECT or INSERT statements.
But sometimes we have to fall back to raw SQL queries, because we may need queries so complex that the ORM API alone cannot give us an efficient and effective solution.
What do you do to deal with the difference between objects returned from raw queries and orm queries?
I personally strive to design my models so I don't have to resort to writing raw SQL queries, or fall back to mixing in the ContentTypes framework for complex relationships, so I have no experience on the topic.
The documentation covers the APIs for performing raw SQL queries. You can either use Manager.raw() on your models (MyModel.objects.raw()) for queries where you can map columns back to actual model fields, or use a cursor to query raw rows on your database connection.
If you're going to use Manager.raw(), you'll work with a RawQuerySet instead of the usual QuerySet. For all intents and purposes, when working with the result objects the two behave identically as containers, but the QuerySet is a more feature-packed monad.
I can imagine that performing raw SQL queries in Django is still more pleasant than working with a framework with no ORM support—Django can manage your database schema and provide you with a database connection, and you only have to write the queries and bind the query parameters yourself. The resulting rows can be accessed as lists or dictionaries, both of which are suitable for displaying in templates or for additional processing.
SQLAlchemy allows a fair bit of complexity in formulating queries, so you can usually get away without raw SQL. If you do need to drop down to raw SQL, you can use connection.execute(), and there are helpers like the text() and select() functions to make this easier. As far as the returned objects are concerned, you get a list of tuples, which are simple to deal with in a Pythonic way.
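A minimal sketch of that in SQLAlchemy (assuming you already have an engine; the table and column names are illustrative):

from sqlalchemy import text

with engine.connect() as conn:
    result = conn.execute(
        text("SELECT id, name FROM person WHERE name LIKE :pattern"),
        {"pattern": "%smith%"})
    for row in result:
        print(row.id, row.name)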
In general, if you need to treat rows (list of tuples, etc) as what your ORM returns, one approach would be to write an adapter class which mimics a queryset interface. This could be initialized with the "schema" of the returned tuples, and then iterate and return objects with properties instead of tuples. I haven't really needed this, but I can see how it might be useful if, for example, you have a framework which relies on querysets being passed around.
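A hedged sketch of that adapter idea (the schema is just the list of column names your raw query returns):

from collections import namedtuple

class RawRowSet:
    """Wraps raw tuple rows so callers can iterate objects with named
    attributes, queryset-style."""
    def __init__(self, schema, rows):
        self._row_cls = namedtuple('Row', schema)
        self._rows = list(rows)

    def __iter__(self):
        return (self._row_cls(*row) for row in self._rows)

    def __len__(self):
        return len(self._rows)

# e.g. RawRowSet(['id', 'name'], cursor.fetchall())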
I've just started learning Python and Django and have a lot of experience building high-traffic websites using PHP and MySQL. What worries me so far is Django's overly optimistic approach that you will never need to write custom SQL and that it automatically creates all these foreign key relationships in your database. The one thing I've learned in the last few years of building Chess.com is that it's impossible NOT to write custom SQL when you're dealing with something like MySQL that frequently needs to be told which indexes it should use (or avoid), and that foreign keys are a death sentence. Percona's strongest recommendation was for us to remove all FKs for optimal performance.
Is there a way in Django to do this in the models file? create relationships without creating actual DB FKs? Or is there a way to start at the database level, design/create my database, and then have Django reverse engineer the models file?
If you don't want foreign keys, then avoid using
models.ForeignKey(),
models.ManyToManyField(), and
models.OneToOneField().
Django will automatically create an auto-increment int field named id that you can use to refer to individual records, or you can override that by marking a field as primary_key=True.
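A sketch of what that looks like in practice (the model and field names are invented for the chess example, not taken from the question):

from django.db import models

class Player(models.Model):
    # overrides the automatic auto-increment "id" column
    username = models.CharField(max_length=30, primary_key=True)

class Game(models.Model):
    # plain indexed columns instead of models.ForeignKey, so no FK constraint
    # is created in MySQL
    white_player_id = models.CharField(max_length=30, db_index=True)
    black_player_id = models.CharField(max_length=30, db_index=True)
    result = models.CharField(max_length=7)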
There is also documentation on running raw SQL queries on the database.
Raw SQL is as easy as this:
for obj in MyModel.objects.raw('SELECT * FROM myapp_mymodel'):
    print(obj)
Denormalizing a database is up to you at model definition time.
You can also use non-relational databases (MongoDB, ...) with Django NonRel.
django-admin inspectdb (e.g. python manage.py inspectdb > models.py) allows you to reverse engineer a models file from existing tables. That is only a very partial response to your question ;)
You can just create models.py yourself and avoid having Django automatically create the tables, leaving it up to you to define the actual tables as you please. So although there are foreign key relationships in models.py, this does not mean that they must exist in the actual tables. This is a very good thing considering how ludicrously foreign key constraints are implemented in MySQL: MyISAM simply ignores them and InnoDB creates a non-optional index on every single one, regardless of whether it makes sense.
I concur with the 'no foreign keys' advice (with the disclaimer: I also work for Percona).
The reason it is recommended is concurrency / reducing internal locking.
It can be a difficult "optimization" to sell, but if you consider that the database has transactions (and is more or less ACID compliant), then it should only be application-logic errors that cause foreign-key violations. Not to say such errors don't exist, but if you enable foreign keys in development you should hopefully find at least a few bugs.
In terms of whether or not you need to write custom SQL:
The explanation I usually give is that "optimization rarely decreases complexity". I think it is okay to stick with an ORM by default, but if in a profiler it looks like one particular piece of functionality is taking a lot more time than you suspect it would when written by hand, then you need to be prepared to fix it (assuming the code is called often enough).
The real secret here is that you need good instrumentation / profiling in order to be frugal with your complexity-adding optimizations.