I have a Django project with 5 different models in it. All of them have a date field. Let's say I want to get all entries from all models with today's date. Of course, I could just filter every model and put the results in one big list, but I believe that's bad. What would be an efficient way to do that?
I don't think that it's a bad idea to query each model separately - indeed, from a database perspective, I can't see how you'd be able to do otherwise, as each model will need a separate SQL query. Even if, as #Nagaraj suggests, you set up a common Date model every other model references, you'd still need to query each model separately. You are probably correct, however, that putting the results into a list is bad practice, unless you actually need to load every object into memory, as explained here:
Be warned, though, that [evaluating a QuerySet as a list] could have a large memory overhead, because Django will load each element of the list into memory. In contrast, iterating over a QuerySet will take advantage of your database to load data and instantiate objects only as you need them.
It's hard to suggest other options without knowing more about your use case. However, I think I'd probably approach this by making a list or dictionary of QuerySets, which I could then use in my view, e.g.:
querysets = [cls.objects.filter(date=now) for cls in [Model1, Model2, Model3]]
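For example, a view could hand those QuerySets straight to the template without collapsing them into one list, so each one is only evaluated when it is iterated. A minimal sketch, assuming hypothetical models Model1-Model3 that each have a date field, and a hypothetical template name:
from django.shortcuts import render
from django.utils import timezone

def todays_entries(request):
    today = timezone.now().date()
    # Still lazy: nothing hits the database until the template iterates them
    querysets = [cls.objects.filter(date=today) for cls in [Model1, Model2, Model3]]
    return render(request, 'todays_entries.html', {'querysets': querysets})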
Take a look at using model inheritance (docs here) to define that date field in a base class that you can subclass in the models you want to include in the query.
For example:
class DateStuff(models.Model):
    date = models.DateField()

class MyClass1(DateStuff):
    ...

class MyClass2(DateStuff):
    ...
I believe Django will let you query over the DateStuff class, and it'll return objects from MyClass1 and MyClass2.
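A minimal sketch of such a query, assuming non-abstract (multi-table) inheritance so that DateStuff gets its own table:
from datetime import date

# Returns a row for every subclass instance with today's date; each result is a
# DateStuff instance, and the concrete subclass is reachable through the implicit
# one-to-one link (e.g. obj.myclass1), at the cost of an extra query per object.
todays_items = DateStuff.objects.filter(date=date.today())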
Thanks to #nrabinowitz for pointing out my previous error.
I want to limit the queries for a detail view. I want to access multiple many-to-many fields for one class instance in fewer queries. It seems prefetch_related doesn't work with get, and the server hits the database for every many-to-many field.
JobInstance = Job.objects.get(pk=id).prefetch_related('cities').prefetch_related('experience_level')
You can make it work by reordering it, like:
job_instance = Job.objects.prefetch_related('cities', 'experience_level').get(pk=id)
.prefetch_related(..) is defined on a QuerySet; once you perform a .get(..), you have fetched the object and are no longer working with a QuerySet.
But for a single object, .prefetch_related(..) will not improve efficiency. After all, .prefetch_related(..) will still make two extra queries here to fetch the related objects, exactly as many as not prefetching and evaluating the related objects of job_instance later.
.prefetch_related(..) is therefore useful when you want to fetch the related objects of multiple objects in bulk.
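For example, when rendering a list of many jobs, prefetching does pay off; a sketch using the model above:
# Two extra queries in total, no matter how many jobs are in the list
jobs = Job.objects.prefetch_related('cities', 'experience_level')
for job in jobs:
    cities = job.cities.all()            # served from the prefetch cache
    levels = job.experience_level.all()  # no additional query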
I have an abstract base model ('Base') from which two models inherit: 'Movie' and 'Cartoon'. I display a list of both movies and cartoons to the user (with the help of itertools.chain). Then I want to give the user the opportunity to delete any of these items, without knowing in advance whether it is a movie or a cartoon. I am trying to do it like this:
...
movies = Movie.objects.filter(user_created=userlist).order_by('title')
cartoons = Cartoon.objects.filter(user_created=userlist).order_by('title')
all_items = list(chain(movies, cartoons))
item = all_items.get(id=item_id)
item.delete()
But then PyCharm states,
Unresolved attribute reference 'get' for class 'list'
I understand why this happens but I don't know how to avoid it. Is there any way to merge two querysets from different models and apply get or filter without removing the abstract base model and creating a physical parent model?
You could use the ContentTypes framework for a generic and reusable solution to this for an arbitrary number of different models. But I also wonder why Cartoon and Movie must be different types to begin with; it may be worth spending a little time thinking about whether you can use a single model for both types of media - deletion of an arbitrary instance is just one of many cases where a single model will be more straightforward than relying on something like ContentTypes.
EDIT: for more info on ContentTypes: you could either create a base model with a generic relation (you said you didn't want to do this), or, for the deletion, include the app label and model name in the request data alongside the item id, enabling lookups like:
media_type = ContentType.objects.get(app_label=app_label, model=model_name)
instance = media_type.get_object_for_this_type(id=item_id)
instance.delete()
What's nice about this approach is that you'd barely have to change your model structure.
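A sketch of what the delete view could look like, assuming the request carries hypothetical app_label, model_name, and item_id parameters and that a URL named 'item_list' exists:
from django.contrib.contenttypes.models import ContentType
from django.shortcuts import redirect

def delete_item(request):
    # app_label / model_name tell us whether we are deleting a Movie or a Cartoon
    media_type = ContentType.objects.get(
        app_label=request.POST['app_label'],
        model=request.POST['model_name'],
    )
    media_type.get_object_for_this_type(id=request.POST['item_id']).delete()
    return redirect('item_list')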
You can first find the item's index using the index() method and then delete it with all_items[given_index].delete().
I searched a lot and did not find what I'm looking for.
What would be the best design for a model class in Django?
To extend User, would it be better to have one class with several attributes, or to break it into several classes with a few attributes each? I'm using the Django ORM.
Say I have a class called Person that extends User. Would it be better to do this:
class Person(models.Model):
    user = models.ForeignKey(User)
    attribute1 = ...
    ...
    attributeN = ...
Or, would it be better to do this:
class PersonContact(models.Model):
    user = models.ForeignKey(User)
    attribute1 = ...
    ...
    attribute3 = ...

class PersonAddress(models.Model):
    user = models.ForeignKey(User)
    attribute1 = ...
    ...
    attribute3 = ...

class PersonHobby(models.Model):
    user = models.ForeignKey(User)
    attribute1 = ...
    ...
    attribute3 = ...
Each of my views would (probably) use the data from only the smaller classes.
Over time, the number of attributes can grow.
What I want is to do this once and then touch it as little as possible.
Various attributes can be left unfilled by the user; they are not required.
The number of users is indefinite (it can be a lot).
I'm concerned about long-term performance and maintainability.
Could someone explain to me which would be better for my code, and why?
And what would be better in general (fewer classes/more attributes, or more classes/fewer attributes) when using the Django ORM?
Is it better if each view uses the data of only one model class, or does it make no (or little) difference?
Edit:
In the rush of writing I used bad names for the classes. None of these attributes are many-to-many fields; the User will have only one value for each attribute, or leave it blank.
The number of attributes can expand over time, but not by much.
Put any data that is specific to only one User directly in the model. This would probably be things like "Name", "Birthday", etc.
Some things might be better served by a separate model, though. For example multiple people might have the same Hobby or one User might have multiple Hobby(s). Make this a separate class and use a ForeignKeyField or ManyToManyField as necessary.
Whatever you choose, the real trick is to optimize the number of database queries. The django-debug-toolbar is helpful here.
Splitting up your models would by default result in multiple database queries, so make sure to read up on select_related to condense that down to one.
Also take a look at the defer method when retrieving a queryset. You can exclude some of those fields that aren't necessary if you know you won't use them in a particular view.
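For instance, with the split-model layout above, something along these lines keeps a view at a single query. A sketch using the hypothetical PersonContact model and attribute names from the question:
# select_related('user') joins the user table in the same query;
# defer() skips columns this particular view never reads.
contacts = (
    PersonContact.objects
    .select_related('user')
    .defer('attribute2', 'attribute3')
)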
I think it's all up to your interface.
If you have to expose ALL of a user's data on a single page and you have a single, large model, you will end up with a single SQL join instead of one for each smaller table.
Conversely, if you just need a few of these attributes, you might obtain a small gain in memory usage by joining the user table with a smaller one, because you don't have to load a lot of attributes that aren't going to be used (though this might also be mitigated through values(); documentation here).
Also, if your attributes are not mandatory, you should at least have an idea of how many of them are going to be filled. Having a large table of almost empty records could be a waste of space. Maybe a problem, maybe not. It depends on your hardware resources.
Lastly, if you really think that your attributes can expand a lot, you could try the EAV approach.
I've got a Django view that I'm trying to optimise. It shows a list of parent objects on a page, along with their children. The child model has the foreign key back to the parent, so select_related doesn't seem to apply.
class Parent(models.Model):
    name = models.CharField(max_length=31)

class Child(models.Model):
    name = models.CharField(max_length=31)
    parent = models.ForeignKey(Parent)
A naive implementation uses n+1 queries, where n is the number of parent objects, ie. one query to fetch the parent list, then one query to fetch the children of each parent.
I've written a view that does the job in two queries - one to fetch the parent objects, another to fetch the related children, then some Python (that I'm far too embarrassed to post here) to put it all back together again.
Once I found myself importing the standard library's collections module I realised that I was probably doing it wrong. There is probably a much easier way, but I lack the Django experience to find it. Any pointers would be much appreciated!
Add a related_name to the foreign key, then use the prefetch_related method, which was added in Django 1.4:
Returns a QuerySet that will automatically retrieve, in a single
batch, related objects for each of the specified lookups.
This has a similar purpose to select_related, in that both are
designed to stop the deluge of database queries that is caused by
accessing related objects, but the strategy is quite different:
select_related works by creating a SQL join and including the fields
of the related object in the SELECT statement. For this reason,
select_related gets the related objects in the same database query.
However, to avoid the much larger result set that would result from
joining across a 'many' relationship, select_related is limited to
single-valued relationships - foreign key and one-to-one.
prefetch_related, on the other hand, does a separate lookup for each
relationship, and does the 'joining' in Python. This allows it to
prefetch many-to-many and many-to-one objects, which cannot be done
using select_related, in addition to the foreign key and one-to-one
relationships that are supported by select_related. It also supports
prefetching of GenericRelation and GenericForeignKey.
class Parent(models.Model):
    name = models.CharField(max_length=31)

class Child(models.Model):
    name = models.CharField(max_length=31)
    parent = models.ForeignKey(Parent, related_name='children')
>>> Parent.objects.all().prefetch_related('children')
All the relevant children will be fetched in a single query, and used
to make QuerySets that have a pre-filled cache of the relevant
results. These QuerySets are then used in the self.children.all()
calls.
Note 1: as always with QuerySets, any subsequent chained methods which imply a different database query will ignore previously cached results, and retrieve data using a fresh database query.
Note 2: if you use iterator() to run the query, prefetch_related() calls will be ignored, since these two optimizations do not make sense together.
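Putting it together, the whole listing then costs two queries in total, for example:
# Query 1 fetches all parents; query 2 fetches all of their children in one batch
for parent in Parent.objects.prefetch_related('children'):
    for child in parent.children.all():  # served from the prefetch cache
        print(parent.name, child.name)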
If you ever need to work with more than two levels at once, you can consider a different approach to storing trees in the database, using MPTT.
In a nutshell, it adds extra fields to your model which are maintained during updates and allow much more efficient retrieval.
Actually, select_related is what you are looking for. select_related creates a JOIN so that all the data that you need is fetched in one statement. prefetch_related runs all the queries at once then caches them.
The trick here is to "join in" only what you absolutely need to in order to reduce the performance penalty of the join. "What you absolutely need to" is the long way of saying that you should pre-select only the fields that you will read later in your view or template. There is good documentation here: https://docs.djangoproject.com/en/1.4/ref/models/querysets/#select-related
This is a snippet from one of my models where I faced a similar problem:
return QuantitativeResult.objects.select_related(
    'enrollment__subscription__configuration__analyte',
    'enrollment__subscription__unit',
    'enrollment__subscription__configuration__analyte__unit',
    'enrollment__subscription__lab',
    'enrollment__subscription__instrument_model',
    'enrollment__subscription__instrument',
    'enrollment__subscription__configuration__method',
    'enrollment__subscription__configuration__reagent',
    'enrollment__subscription__configuration__reagent__manufacturer',
    'enrollment__subscription__instrument_model__instrument__manufacturer'
).filter(<snip, snip - stuff edited out>)
In this pathological case, I went down from 700+ queries to just one. The django debug toolbar is your friend when it comes to this sort of issue.
I need to store some data in a Django model. This data is not the same for all instances of the model.
At first I thought about subclassing the model, but I’m trying to keep the application flexible. If I use subclasses, I’ll need to create a whole class each time I need a new kind of object, and that’s no good. I’ll also end up with a lot of subclasses only to store a pair of extra fields.
I really feel that a dictionary would be the best approach, but there’s nothing in the Django documentation about storing a dictionary in a Django model (or I can’t find it).
Any clues?
If it's really dictionary like arbitrary data you're looking for you can probably use a two-level setup with one model that's a container and another model that's key-value pairs. You'd create an instance of the container, create each of the key-value instances, and associate the set of key-value instances with the container instance. Something like:
class Dicty(models.Model):
    name = models.CharField(max_length=50)

class KeyVal(models.Model):
    container = models.ForeignKey(Dicty, db_index=True)
    key = models.CharField(max_length=240, db_index=True)
    value = models.CharField(max_length=240, db_index=True)
It's not pretty, but it'll let you access/search the innards of the dictionary using the DB whereas a pickle/serialize solution will not.
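A sketch of how values could be written and looked up with this layout:
# Create a container and attach a key/value pair to it
prefs = Dicty.objects.create(name='prefs')
KeyVal.objects.create(container=prefs, key='theme', value='dark')

# The lookup happens in the database, not in Python
theme = KeyVal.objects.get(container=prefs, key='theme').value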
Another clean and fast solution can be found here: https://github.com/bradjasper/django-jsonfield
For convenience I copied the simple instructions.
Install
pip install jsonfield
Usage
from django.db import models
from jsonfield import JSONField
class MyModel(models.Model):
    json = JSONField()
If you don't need to query by any of this extra data, then you can store it as a serialized dictionary. Use repr to turn the dictionary into a string, and eval to turn the string back into a dictionary. Take care with eval that there's no user data in the dictionary, or use a safe_eval implementation.
For example, in the create and update methods of your views, you can add:
if not isinstance(request.data, dict):
    req_data = request.data.dict().copy()
else:
    req_data = request.data.copy()

dict_key = 'request_parameter_that_has_a_dict_inside'
if dict_key in req_data and isinstance(req_data[dict_key], dict):
    req_data[dict_key] = repr(req_data[dict_key])
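A minimal sketch of the repr/eval round trip mentioned above; ast.literal_eval is one way to get the "safe eval" behaviour, since it only evaluates Python literals:
import ast

extra = {'colour': 'blue', 'size': 3}
serialized = repr(extra)                 # "{'colour': 'blue', 'size': 3}", fit for a text column
restored = ast.literal_eval(serialized)  # back to a dict without executing arbitrary code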
I came to this post via Google's 4th result for "django store object".
A little bit late, but django-picklefield looks like a good solution to me.
Example from doc:
To use, just define a field in your model:
>>> from picklefield.fields import PickledObjectField
>>> class SomeObject(models.Model):
>>> args = PickledObjectField()
and assign whatever you like (as long as it's picklable) to the field:
>>> obj = SomeObject()
>>> obj.args = ['fancy', {'objects': 'inside'}]
>>> obj.save()
As Ned answered, you won't be able to query "some data" if you use the dictionary approach.
If you still need to store dictionaries, then the best approach by far is the PickleField class documented in Marty Alchin's book Pro Django. This method uses Python class properties to pickle/unpickle a Python object, only on demand, that is stored in a model field.
The basics of this approach are to use Django's contribute_to_class method to dynamically add a new field to your model, and to use getattr/setattr to do the serializing on demand.
One of the few online examples I could find that is similar is this definition of a JSONField.
I'm not exactly sure of the nature of the problem you're trying to solve, but it sounds curiously similar to Google App Engine's BigTable Expando.
Expandos allow you to specify and store additional fields on an database-backed object instance at runtime. To quote from the docs:
import datetime
from google.appengine.ext import db

class Song(db.Expando):
    title = db.StringProperty()

crazy = Song(title='Crazy like a diamond',
             author='Lucy Sky',
             publish_date='yesterday',
             rating=5.0)

crazy.last_minute_note = db.Text('Get a train to the station.')
Google App Engine currently supports both Python and the Django framework. Might be worth looking into if this is the best way to express your models.
Traditional relational database models don't have this kind of column-addition flexibility. If your datatypes are simple enough you could break from traditional RDBMS philosophy and hack values into a single column via serialization as #Ned Batchelder proposes; however, if you have to use an RDBMS, Django model inheritance is probably the way to go. Notably, it will create a one-to-one foreign key relation for each level of derivation.
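A sketch of what that model inheritance looks like, with hypothetical models; Django creates the one-to-one link for you:
from django.db import models

class Media(models.Model):
    title = models.CharField(max_length=100)

class Song(Media):
    # Multi-table inheritance: an implicit OneToOneField back to Media is created
    rating = models.FloatField(null=True, blank=True)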
This question is old, but I was having the same problem, ended here and the chosen answer couldn't solve my problem anymore.
If you want to store dictionaries in Django or a REST API, either to be used as objects in your front end or because your data won't necessarily have the same structure, the solution I used can help you.
When saving the data in your API, use the json.dumps() method to store it in proper JSON format, as described in this question.
If you use this structure, your data will already be in the appropriate JSON format to be parsed in the front end with JSON.parse() in your AJAX (or whatever) call.
I use a TextField and json.loads()/json.dumps().
models.py
import json
from django.db import models
class Item(models.Model):
    data = models.TextField(blank=True, null=True, default='{}')

    def save(self, *args, **kwargs):
        ## load the current string and
        ## convert the string to a python dictionary
        data_dict = json.loads(self.data)
        ## do something with the dictionary (placeholder logic)
        for something in somethings:
            data_dict[something] = some_function(something)
        ## if it is empty, save it back as the '{}' string;
        ## if it is not empty, convert the dictionary back to a json string
        if not data_dict:
            self.data = '{}'
        else:
            self.data = json.dumps(data_dict)
        super(Item, self).save(*args, **kwargs)
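Reading the field back is then just another json.loads() call, e.g. (hypothetical primary key):
item = Item.objects.get(pk=1)
data = json.loads(item.data)  # a plain Python dict again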
Django-Geo includes a "DictionaryField" you might find helpful:
http://code.google.com/p/django-geo/source/browse/trunk/fields.py?r=13#49
In general, if you don't need to query across the data use a denormalized approach to avoid extra queries. User settings are a pretty good example!
I agree that you should refrain from stuffing otherwise structured data into a single column. But if you must do that, Django has an XMLField built in.
There's also a JSONField at Django snippets.
Being "not equal to all instances of the model" sounds to me like a good match for a "Schema-free database". CouchDB is the poster child for that approach and you might consider that.
In a project I moved several tables which never played very nicely with the Django ORM over to CouchDB, and I'm quite happy with that. I use couchdb-python without any of the Django-specific CouchDB modules. A description of the data model can be found here. The move from five "models" in Django to three "models" in Django plus one CouchDB "database" actually slightly reduced the total lines of code in my application.
I know this is an old question, but today (2021) the cleanest alternative is to use the native JSONField (available since Django 3.1).
docs: https://docs.djangoproject.com/en/3.2/ref/models/fields/#django.db.models.JSONField
You just declare a models.JSONField on your model class and voilà.
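A minimal sketch (the model and field names are arbitrary):
from django.db import models

class Item(models.Model):
    data = models.JSONField(default=dict)

# Lookups into the JSON structure work out of the box on supported databases
Item.objects.filter(data__theme='dark')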
Think it over, and find the commonalities of each data set... then define your model. It may or may not require the use of subclasses. Foreign keys representing commonalities aren't to be avoided, but encouraged when they make sense.
Stuffing random data into a SQL table is not smart, unless it's truly non-relational data. If that's the case, define your problem and we may be able to help.
If you are using Postgres, you can use an hstore field: https://docs.djangoproject.com/en/1.10/ref/contrib/postgres/fields/#hstorefield.
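A sketch, assuming 'django.contrib.postgres' is in INSTALLED_APPS and the hstore extension has been enabled (there is an HStoreExtension migration operation for that); the model is hypothetical:
from django.contrib.postgres.fields import HStoreField
from django.db import models

class Profile(models.Model):
    # Maps string keys to string values
    settings = HStoreField(default=dict)

# Key lookups are supported directly in filters
Profile.objects.filter(settings__theme='dark')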