Django select related performance - python

So I am debating whether or not to use Django's select_related for performance reasons.
The documentation calls it a "performance booster" because it avoids further database queries, but that would also mean it has to load a lot more data up front, which could become expensive if you need to make a lot of separate calls for a lot of different users.
What are the pros and cons of performance with Django's select_related? And when should (or shouldn't) it be used?

Whether you use select_related or not, you will eat memory each time you access a related object, so if you have to access related objects anyway it won't make much of a difference wrt memory usage, and it can indeed save a lot of db access cost - especially if your db server is not on the same node as your django instance(s). To make a long story short:
as a general guideline: use select_related (with appropriate params to limit what relationships should be followed) when you know you'll need the related object.
if in doubt, don't try to guess: test and profile (yes, it requires quite some infrastructure to do proper testing and profiling here, but hey, that's how it is).
My own experience: careful use of select_related can vastly improve execution time; I've never had a problem with memory, but we usually do our best to avoid loading millions of rows when we just need a couple (doing proper filtering, slicing etc. before the query is actually evaluated).
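To make the difference concrete, here is a minimal sketch assuming hypothetical Book and Author models (a Book has a ForeignKey to Author) - the names are illustrative, not from the question:

# Without select_related: one query for the books, plus one query per book (the N+1 pattern)
for book in Book.objects.all():
    print(book.author.name)

# With select_related: a single query using a JOIN; the Author rows come along for free
for book in Book.objects.select_related("author"):
    print(book.author.name)  # no extra query here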

Related

Do Django filters increase RAM consumption per user? : Python

I don't know where else I could have asked this question, so I'm asking it here. If I apply multiple Django filters on a page that use multiple db tables, will that affect RAM consumption whenever a user visits this page, given that only the filtered data is shown to the user? I'm using Django with PostgreSQL on an Ubuntu-based VM. Also, if there is any documentation that could help in understanding RAM utilization, please suggest it.
Django filters and querysets are lazy. What that actually means is that you are not hitting the database until you evaluate them. Quoting the official documentation:
Internally, a QuerySet can be constructed, filtered, sliced, and generally passed around without actually hitting the database. No database activity actually occurs until you do something to evaluate the queryset.
So the only space taken up in your RAM is the queryset object itself and your program. It is only when the query is evaluated and data is extracted from the database that memory fills up (depending on how much data is extracted). It'd also be a good idea to look at iterators.
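A small sketch of both points, assuming a hypothetical Article model - the filter chain builds SQL lazily, and iterator() streams rows instead of caching the whole result set in memory:

qs = Article.objects.filter(published=True)   # no database access yet
qs = qs.exclude(category="draft")             # still nothing, just a bigger WHERE clause
first_ten = list(qs[:10])                     # SQL runs here; only 10 rows are loaded

# For large result sets, iterator() streams rows in chunks instead of
# building the full result cache in RAM.
for article in qs.iterator(chunk_size=500):
    do_something(article)                     # do_something is a placeholder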

Python Django cache vs store in model field? Which is more efficient?

My view displays a table of data (specific to a customer, who may have many users) and this table takes a lot of computational resources to populate. A customer's data changes 4-5 times a week, usually on the same day.
Caching is an obvious solution to this, but I was wondering if Django's cache framework is significantly more efficient than creating a TextField at the customer level and storing the data there instead?
I feel the TextField is easier to implement (and to clear when the data changes), but what are the drawbacks, and is there anything else I need to look out for? (Problems if the dataset gets too big? Additional fields in the model, etc.?)
Any help would be much appreciated!
A cache is a cache is a cache, however you implement it, and the main problem with caches is invalidation.
As Melvyn rightly answered, the case for the cache framework is that it's (well, can be, depending on which backend you choose) outside your database. Whether that's a pro or a con really depends on your database load, infrastructure and whatnot... If you already use the cache framework (for more than plain unconditional full-page caching, I mean) and want to minimize the load on your database, then it's possibly worth the added complexity.
Otherwise, storing your computed result in the db is quite straightforward and doesn't require additional servers, installs etc. I'd personally go for a dedicated model - to avoid unnecessary overhead at the db level - including both the cached result and a checksum of the params on which this result depends (the canonical memoization pattern), so you can easily detect whether it needs to be recomputed. I found this solution to be easier to maintain than trying to detect changes to each and every one of those params and invalidate/recompute the cache "on the fly" (which is what can make proper cache invalidation difficult or at least complex to implement), but this again depends on what those params are and where they come from.
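A rough sketch of that memoization pattern - the model, the "Customer" relation and compute_result are made up for illustration; compute_result stands for whatever expensive computation you currently run:

import hashlib
import json

from django.db import models


class CustomerReportCache(models.Model):
    # One cached result per customer, plus a checksum of the params it was built from.
    customer = models.OneToOneField("Customer", on_delete=models.CASCADE)
    params_checksum = models.CharField(max_length=64)
    result = models.TextField()
    computed_at = models.DateTimeField(auto_now=True)


def checksum_of(params):
    # Canonical serialization so identical params always hash to the same value.
    return hashlib.sha256(json.dumps(params, sort_keys=True, default=str).encode()).hexdigest()


def get_report(customer, params, compute_result):
    checksum = checksum_of(params)
    cached = CustomerReportCache.objects.filter(customer=customer).first()
    if cached is not None and cached.params_checksum == checksum:
        return cached.result          # params unchanged, reuse the stored result
    result = compute_result(customer, params)
    CustomerReportCache.objects.update_or_create(
        customer=customer,
        defaults={"params_checksum": checksum, "result": result},
    )
    return result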
The upside to using the cache framework is that you don't have to use the database. You can scale your cache store independent of your database and run the cache on different physical (or virtual) machines.
In addition you don't have to implement the stale vs fresh logic, but that's a one-off.
4-5 times a week doesn't look like a big challenge, but nobody except you knows what kind of computation you have, how much data you need to store, how many users you have, and so on.
If you want to implement this with a TextField, it's still some kind of caching system, so I suggest using Django's caching framework with the database backend first: https://docs.djangoproject.com/en/1.11/topics/cache/#database-caching. You can't retrieve the data with 1 query like in the TextField case, but later you can replace the database with another cache layer if necessary.
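For reference, the database backend only needs a settings entry and a manage.py command to create the table; the cache key and the customer/rendered_table names below are just examples:

# settings.py
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.db.DatabaseCache",
        "LOCATION": "computed_table_cache",   # table name; create it with: python manage.py createcachetable
    }
}

# usage in a view or task
from django.core.cache import cache

cache.set("customer-table-%s" % customer.pk, rendered_table, timeout=None)  # keep until explicitly invalidated
cached = cache.get("customer-table-%s" % customer.pk)
cache.delete("customer-table-%s" % customer.pk)  # call this when the customer's data changes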

Advantage of Django ORM V/S Performing raw SQL queries [duplicate]

Closed 9 years ago. This question is opinion-based and is not currently accepting answers.
If you had to pitch the "pros" of an ORM to management or a client - why you would use an ORM at all - what would those reasons be?
Try and keep one reason per answer so that we can see which one gets voted up as the best reason.
The most important reason to use an ORM is so that you can have a rich, object oriented business model and still be able to store it and write effective queries quickly against a relational database. From my viewpoint, I don't see any real advantages that a good ORM gives you when compared with other generated DAL's other than the advanced types of queries you can write.
One type of query I am thinking of is a polymorphic query. A simple ORM query might select all shapes in your database. You get a collection of shapes back. But each instance is a square, circle or rectangle according to its discriminator.
Another type of query would be one that eagerly fetches an object and one or more related objects or collections in a single database call. e.g. Each shape object is returned with its vertex and side collections populated.
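In Django, that eager-fetch style of query maps onto select_related() (for single-valued relations) and prefetch_related() (for collections). A minimal sketch, assuming hypothetical Shape, Vertex and Side models where "vertices" and "sides" are reverse foreign-key collections:

# Three queries total (shapes, vertices, sides), no matter how many shapes there are.
shapes = Shape.objects.prefetch_related("vertices", "sides")
for shape in shapes:
    # Both collections are already populated; no per-shape queries are issued here.
    print(shape, list(shape.vertices.all()), list(shape.sides.all()))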
I'm sorry to disagree with so many others here, but I don't think that code generation is a good enough reason by itself to go with an ORM. You can write or find many good DAL templates for code generators that do not have the conceptual or performance overhead that ORM's do.
Or, if you think that you don't need to know how to write good SQL to use an ORM, again, I disagree. It might be true that from the perspective of writing single queries, relying on an ORM is easier. But, with ORM's it is far too easy to create poor performing routines when developers don't understand how their queries work with the ORM and the SQL they translate into.
Having a data layer that works against multiple databases can be a benefit. It's not one that I have had to rely on that often though.
In the end, I have to reiterate that in my experience, if you are not using the more advanced query features of your ORM, there are other options that solve the remaining problems with less learning and fewer CPU cycles.
Oh yeah, some developers do find working with ORM's to be fun so ORM's are also good from the keep-your-developers-happy perspective. =)
Speeding development. For example, eliminating repetitive code like mapping query result fields to object members and vice-versa.
Making data access more abstract and portable. ORM implementation classes know how to write vendor-specific SQL, so you don't have to.
Supporting OO encapsulation of business rules in your data access layer. You can write (and debug) business rules in your application language of preference, instead of clunky trigger and stored procedure languages.
Generating boilerplate code for basic CRUD operations. Some ORM frameworks can inspect database metadata directly, read metadata mapping files, or use declarative class properties.
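To make the boilerplate point concrete with Django's ORM (the Product model here is hypothetical), the framework generates the SQL for the basic CRUD operations and maps rows to objects for you:

from django.db import models


class Product(models.Model):
    name = models.CharField(max_length=100)
    price = models.DecimalField(max_digits=8, decimal_places=2)


product = Product.objects.create(name="Widget", price="9.99")   # INSERT
cheap = Product.objects.filter(price__lt=10)                    # SELECT ... WHERE price < 10
product.price = "12.50"
product.save()                                                  # UPDATE
product.delete()                                                # DELETE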
You can move to different database software easily because you are developing to an abstraction.
Development happiness, IMO. ORM abstracts away a lot of the bare-metal stuff you have to do in SQL. It keeps your code base simple: fewer source files to manage and schema changes don't require hours of upkeep.
I'm currently using an ORM and it has sped up my development.
So that your object model and persistence model match.
To minimise duplication of simple SQL queries.
The reason I'm looking into it is to avoid the generated code from VS2005's DAL tools (schema mapping, TableAdapters).
The DAL/BLL I created over a year ago was working fine (for what I had built it for) until someone else started using it to take advantage of some of the generated functions (which I had no idea were there).
It looks like it will provide a much more intuitive and cleaner solution than the DAL/BLL solution from http://wwww.asp.net
I was thinking about creating my own SQL Command C# DAL code generator, but the ORM looks like a more elegant solution.
Abstracting the SQL away 95% of the time, so that not everyone on the team needs to know how to write super-efficient, database-specific queries.
I think there are a lot of good points here (portability, ease of development/maintenance, focus on OO business modeling etc), but when trying to convince your client or management, it all boils down to how much money you will save by using an ORM.
Do some estimations for typical tasks (or even larger projects that might be coming up) and you'll (hopefully!) get a few arguments for switching that are hard to ignore.
Compilation and testing of queries.
As the tooling for ORM's improves, it is easier to determine the correctness of your queries faster through compile time errors and tests.
Compiling your queries helps developers find errors faster. Right? Right. This compilation is made possible because developers are now writing queries in code using their business objects or models instead of just strings of SQL or SQL-like statements.
If using the correct data access patterns in .NET, it is easy to unit test your query logic against in-memory collections. This speeds up the execution of your tests because you don't need to access the database, set up data in the database, or even spin up a full-blown data context. [EDIT] This isn't as true as I thought it was, as unit testing in memory can present difficult challenges to overcome. But I still find these integration tests easier to write than in previous years. [/EDIT]
This is definitely more relevant today than a few years ago when the question was asked, but that may only be the case for Visual Studio and Entity Framework, where my experience lies. Plug in your own environment if possible.
.netTiers using CodeSmith templates
http://nettiers.com/default.aspx?AspxAutoDetectCookieSupport=1
Why code something that can be generated just as well?
Convince them of how much time/money you will save when changes come in and you don't have to rewrite your SQL, since the ORM tool will do that for you.
I think one con is that an ORM requires some updates to your POJOs, mainly related to the schema, relations and queries. So in scenarios where you are not supposed to make changes to the model objects - maybe because they are shared among more than one project, or between client and server - you will need to split them into two layers, which requires additional effort.
I am an Android developer, and as you know mobile apps are usually not huge in size, so this additional effort to segregate a pure model from an ORM-affected model does not seem worthwhile.
I understand that the question is a generic one, but mobile apps also fall under the generic umbrella.

Optimizing Django: nested queries vs relation lookups

I have legacy code which uses a nested ORM query that produces a SQL SELECT query with a JOIN, and conditions which also contain a SELECT and a JOIN. Execution of this query takes an enormous amount of time. By the way, when I execute the same query as raw SQL, taken from Django_ORM_query.query, it runs in reasonable time.
What are best practices for optimization in such cases?
Would the query perform faster if I used ManyToMany and ForeignKey relations?
Performance issues in Django are usually caused by following relations in a loop, which causes multiple database queries. If you have django-debug-toolbar installed, you can check how many queries you're doing and figure out which query needs to be optimized. The debug toolbar also shows you the time of each query, which is essential for optimizing Django; you're missing out on a lot if you don't have it installed or don't use it.
You'd generally solve the problem of following relations by using select_related() or prefetch_related().
A page should generally have at most 20-30 queries; any more and it's going to seriously affect performance. Most pages should only have 5-10 queries. You want to reduce the number of queries because the round trip is the number one killer of database performance. In general, one big query is faster than 100 small queries.
The number two killer of database performance is a much rarer problem, though it sometimes arises because of techniques that reduce the number of queries: your query might simply be too big. If this is the case, you should use defer() or only() so you don't load large fields that you know you won't be using.
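A sketch of both techniques, assuming a hypothetical Article model with a large body field and a tags many-to-many:

# prefetch_related turns per-row queries for a M2M or reverse FK into one extra query;
# only()/defer() keep the large column out of the SELECT entirely.
articles = (
    Article.objects
    .filter(published=True)
    .only("id", "title")              # or .defer("body")
    .prefetch_related("tags")
)
for article in articles:
    print(article.title, [t.name for t in article.tags.all()])  # no query per article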
When in doubt, use raw SQL. That's a completely valid optimization in Django world.

Map raw SQL to multiple related Django models

For performance reasons I can't use Django's ORM query methods and I have to use raw SQL for some complex queries. I want to find a way to map the results of a SQL query to several models.
I know I can use the following statement to map the query results to one model, but I can't figure out how to use it to map to related models as well (like I can do by using select_related in Django).
model_instance = MyModel(**dict(zip(field_names, row_data)))
Is there a relatively easy way to be able to map fields of related tables that are also in the query result set?
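For example, with hypothetical Book and Author models where the raw query returns columns from both tables, I'm hoping for something along these lines (assigning the related instance populates Django's relation cache, so book.author won't trigger another query):

from django.db import connection

# Book and Author are assumed to be imported from the app's models; table names are illustrative.
sql = """
    SELECT b.id, b.title, a.id, a.name
    FROM app_book b
    JOIN app_author a ON a.id = b.author_id
"""
books = []
with connection.cursor() as cursor:
    cursor.execute(sql)
    for b_id, b_title, a_id, a_name in cursor.fetchall():
        author = Author(id=a_id, name=a_name)
        books.append(Book(id=b_id, title=b_title, author=author))  # author is cached on the instance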
First, can you prove the ORM is what is hurting your performance? Sometimes performance problems are simply poor database design or improper indexes. Usually this comes from trying to force-fit Django's ORM onto a legacy database design. Stored procedures and triggers can have an adverse impact on performance -- especially when working with Django, where the trigger code is expected to be in the Python model code.
Sometimes poor performance is an application issue. This includes needless order-by operations being done in the database.
The most common performance problem is an application that "over-fetches" data: casually using the .all() method and creating large in-memory collections will crush performance. Django querysets have to be touched as little as possible so that the queryset iterator can be handed to the template for display.
Once you choose to bypass the ORM, you have to fight out the Object-Relational Impedance Mismatch problem. Again. Specifically, relational "navigation" has no concept of "related": it has to be a first-class fetch of a relational set using foreign keys. To assemble a complex in-memory object model via SQL is simply hard. Circular references make this very hard; resolving FK's into collections is hard.
If you're going to use raw SQL, you have two choices.
Eschew "select related" -- it doesn't exist -- and it's painful to implement.
Invent your own ORM-like "select related" features. A common approach is to add stateful getters that (a) check a private cache to see whether the related object has already been fetched, and (b) if it hasn't, fetch it from the database and update the cache.
In the process of inventing your own stateful getters, you'll be reinventing Django's, and you'll probably discover that it isn't the ORM layer, but a database design or an application design issue.
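A bare-bones sketch of that stateful-getter pattern (the plain classes, column names and table name are invented for illustration; the cursor is a standard DB-API cursor):

from collections import namedtuple

Customer = namedtuple("Customer", "id name")


class Order(object):
    """Plain (non-ORM) object hydrated from a raw SQL row."""

    def __init__(self, order_id, customer_id):
        self.order_id = order_id
        self.customer_id = customer_id
        self._customer = None            # private cache for the related object

    def get_customer(self, cursor):
        # Stateful getter: hit the database at most once, then reuse the cached row.
        if self._customer is None:
            cursor.execute("SELECT id, name FROM customer WHERE id = %s", [self.customer_id])
            row = cursor.fetchone()
            if row is not None:
                self._customer = Customer(*row)
        return self._customer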
