I have searched a lot and did not find what I am looking for.
What would be the best design for a model class in Django?
To extend User, would it be better to have one class with several attributes, or to break it into several classes with a few attributes each? I am using the Django ORM.
Say I have a class called Person that extends User. Would it be better to do this:
class Person(models.Model):
    user = models.ForeignKey(User)
    attribute1 =
    ...
    attributeN =
Or, would it be better to do this:
class PersonContact(models.Model):
    user = models.ForeignKey(User)
    attribute1 =
    ...
    attribute3 =

class PersonAddress(models.Model):
    user = models.ForeignKey(User)
    attribute1 =
    ...
    attribute3 =

class PersonHobby(models.Model):
    user = models.ForeignKey(User)
    attribute1 =
    ...
    attribute3 =
Each of my views would probably use data from only one of the smaller classes.
Over time, the number of attributes can grow.
What I want is to design this once and then touch it as little as possible.
Many of the attributes may be left blank by the user; they are not required.
The number of users is unbounded (it could be large).
I am concerned about long-term performance and maintainability.
Could someone explain which option would be better for my code, and why?
And what is better in general (fewer classes with more attributes, or more classes with fewer attributes) when using the Django ORM?
Is it better if each view uses data from only one model class, or does it make little or no difference?
Edit:
In my rush to write this I chose poor class names. None of these attributes are many-to-many fields; the User will have at most one value for each attribute, or leave it blank.
The number of attributes can grow over time, but not by much.
Put any data that is specific to only one User directly in the model. This would probably be things like "Name", "Birthday", etc.
Some things might be better served by a separate model, though. For example, multiple people might have the same Hobby, or one User might have multiple Hobbies. Make this a separate class and use a ForeignKey or ManyToManyField as necessary.
Whatever you choose, the real trick is to optimize the number of database queries. The django-debug-toolbar is helpful here.
Splitting up your models will, by default, result in multiple database queries, so make sure to read up on select_related to condense that down to one.
Also take a look at the defer method when retrieving a queryset. You can exclude some of those fields that aren't necessary if you know you won't use them in a particular view.
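For illustration, a rough sketch using the hypothetical PersonContact model from the question (the attribute names are placeholders):

# One query instead of two: select_related JOINs the user table in.
contacts = PersonContact.objects.select_related('user').filter(user__is_active=True)

# defer() leaves rarely-used columns out of the SELECT until they are accessed.
contacts = contacts.defer('attribute2', 'attribute3')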
I think it's all up to your interface.
If you have to expose ALL the data for a user on a single page and you have a single, large model, you will end up with a single SQL join instead of one join per smaller table.
Conversely, if you just need a few of these attributes, you might get a small gain in memory usage by joining the user table with a smaller one, because you don't have to load a lot of attributes that aren't going to be used (though this can also be mitigated with values(); documentation here).
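As a sketch of the values() idea, using the hypothetical Person model from the question (attribute1 is a placeholder field name):

# Pull only the columns you actually need, as plain dicts, instead of full model instances.
rows = Person.objects.filter(user__is_active=True).values('user__username', 'attribute1')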
Also, if your attributes are not mandatory, you should at least have an idea of how many of them will actually be filled in. A large table of almost-empty records could be a waste of space. Maybe that's a problem, maybe not; it depends on your hardware resources.
Lastly, if you really think your attributes may expand a lot, you could try the EAV approach.
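For what it's worth, a very rough EAV sketch (all names hypothetical): each optional attribute becomes a row instead of a column.

from django.contrib.auth.models import User
from django.db import models

class PersonAttribute(models.Model):
    # One row per filled-in attribute; attributes the user leaves blank simply have no row.
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    name = models.CharField(max_length=63)      # e.g. "favourite_colour"
    value = models.CharField(max_length=255, blank=True)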
Related
I have an abstract base model ('Base') from which two models inherit: 'Movie' and 'Cartoon'. I display a list of both movies and cartoons to the user (with the help of itertools.chain). Then I want to give the user the opportunity to delete any of these items, without knowing in advance whether it is a movie or a cartoon. I am trying to do it like this:
...
movies = Movie.objects.filter(user_created=userlist).order_by('title')
cartoons = Cartoon.objects.filter(user_created=userlist).order_by('title')
all_items = list(chain(movies, cartoons))
item = all_items.get(id=item_id)
item.delete()
But then PyCharm states,
Unresolved attribute reference 'get' for class 'list'
I understand why this happens but I don't know how to avoid it. Is there any way to merge two querysets from different models and apply get or filter without removing the abstract base model and creating a physical parent model?
You could use the ContentTypes framework for a generic and reusable solution to this for an arbitrary number of different models. But I also wonder why Cartoon and Movie must be different types to begin with; it may be worth spending a little time thinking about whether you can use a single model for both types of media - deletion of an arbitrary instance is just one of many cases where a single model will be more straightforward than relying on something like ContentTypes.
EDIT: some more detail on the ContentTypes option. You could either create a base model with a generic relation (you said you didn't want to do this), or, for the deletion, include the app label and model name in the request data alongside the item id, enabling lookups like:
from django.contrib.contenttypes.models import ContentType

media_type = ContentType.objects.get(app_label=app_label, model=model_name)
instance = media_type.get_object_for_this_type(id=item_id)
instance.delete()
What's nice about this approach is that you'd barely have to change your model structure.
You can also just search the merged Python list yourself: find the item's position (with the list's index() method or a simple loop) and then call all_items[given_index].delete().
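A minimal sketch of that plain-Python search, assuming all_items comes from the chain() call in the question:

# Walk the merged list and delete the first object whose id matches.
item = next(obj for obj in all_items if obj.id == item_id)
item.delete()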
I'm using Django and have a few models. They correlate to each other without any foreign keys, but I want to be able to select them in a centralized place. Here are the models (stripped of inheritance and fields to keep the example simple):
class ItemTypeOne:
pass
class ItemOneExtra:
pass
# -----------------------------
class ItemTypeTwo:
pass
class ItemTwoExtra:
pass
# ... and so on
What I have thought of so far is a dict to map them, like so:
correlated_extra_model = {ItemTypeOne: ItemOneExtra, ItemTypeTwo: ItemTwoExtra}[ItemTypeOne]
This works, but I'm not sure whether it's an acceptable approach.
Firstly, in a relational setting, your tables are related only if you have some sort of foreign key or one-to-one field pointing to another table. Without one, there is no relation at the database level; you may merely be enforcing that behaviour through your application logic.
Secondly, if you have to create an auxiliary model for each of your base models, I think that's an indicator that the current design is flawed and you need to rethink your models. Of course, this is based on my assumptions and they might not apply to your use case, so if you can, please share some more details about the problem statement.
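As a sketch of what a database-level relation could look like here, assuming the stub classes above become real Django models:

from django.db import models

class ItemOneExtra(models.Model):
    # A real relation: exactly one extra record per ItemTypeOne, navigable in
    # both directions as item.extra and extra.item.
    item = models.OneToOneField(ItemTypeOne, on_delete=models.CASCADE, related_name='extra')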
I've got a Django view that I'm trying to optimise. It shows a list of parent objects on a page, along with their children. The child model has the foreign key back to the parent, so select_related doesn't seem to apply.
class Parent(models.Model):
    name = models.CharField(max_length=31)

class Child(models.Model):
    name = models.CharField(max_length=31)
    parent = models.ForeignKey(Parent)
A naive implementation uses n+1 queries, where n is the number of parent objects, i.e. one query to fetch the parent list, then one query to fetch the children of each parent.
I've written a view that does the job in two queries - one to fetch the parent objects, another to fetch the related children, then some Python (that I'm far too embarrassed to post here) to put it all back together again.
Once I found myself importing the standard library's collections module I realised that I was probably doing it wrong. There is probably a much easier way, but I lack the Django experience to find it. Any pointers would be much appreciated!
Add a related_name to the foreign key, then use the prefetch_related method, which was added in Django 1.4:
Returns a QuerySet that will automatically retrieve, in a single
batch, related objects for each of the specified lookups.
This has a similar purpose to select_related, in that both are
designed to stop the deluge of database queries that is caused by
accessing related objects, but the strategy is quite different:
select_related works by creating a SQL join and including the fields
of the related object in the SELECT statement. For this reason,
select_related gets the related objects in the same database query.
However, to avoid the much larger result set that would result from
joining across a 'many' relationship, select_related is limited to
single-valued relationships - foreign key and one-to-one.
prefetch_related, on the other hand, does a separate lookup for each
relationship, and does the 'joining' in Python. This allows it to
prefetch many-to-many and many-to-one objects, which cannot be done
using select_related, in addition to the foreign key and one-to-one
relationships that are supported by select_related. It also supports
prefetching of GenericRelation and GenericForeignKey.
class Parent(models.Model):
    name = models.CharField(max_length=31)

class Child(models.Model):
    name = models.CharField(max_length=31)
    parent = models.ForeignKey(Parent, related_name='children')
>>> Parent.objects.all().prefetch_related('children')
All the relevant children will be fetched in a single query, and used
to make QuerySets that have a pre-filled cache of the relevant
results. These QuerySets are then used in the self.children.all()
calls.
Note 1: as always with QuerySets, any subsequent chained method that implies a different database query will ignore previously cached results and retrieve the data using a fresh database query.
Note 2: if you use iterator() to run the query, prefetch_related() calls will be ignored, since these two optimizations do not make sense together.
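A minimal sketch of how this might look in the view, given the models above (two queries total, regardless of how many parents there are):

parents = Parent.objects.prefetch_related('children')
for parent in parents:
    # children.all() is served from the prefetch cache, not the database.
    names = [child.name for child in parent.children.all()]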
If you ever need to work with more than 2 levels at once, you can consider a different approach to storing trees in the database, such as MPTT.
In a nutshell, it adds extra data to your model that is kept up to date on writes and allows much more efficient retrieval.
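A minimal sketch with the django-mptt package, using a hypothetical self-referential Category model (MPTT is for trees within one model, not for the Parent/Child pair above):

from django.db import models
from mptt.models import MPTTModel, TreeForeignKey

class Category(MPTTModel):
    # django-mptt maintains extra tree columns (lft, rght, level, tree_id)
    # so whole subtrees can be fetched in a single query.
    name = models.CharField(max_length=31)
    parent = TreeForeignKey('self', null=True, blank=True, on_delete=models.CASCADE, related_name='children')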
Actually, select_related is what you are looking for. select_related creates a SQL JOIN so that all the data you need is fetched in one statement, while prefetch_related runs a separate query per relation and stitches the results together in Python.
The trick here is to "join in" only what you absolutely need to in order to reduce the performance penalty of the join. "What you absolutely need to" is the long way of saying that you should pre-select only the fields that you will read later in your view or template. There is good documentation here: https://docs.djangoproject.com/en/1.4/ref/models/querysets/#select-related
This is a snippet from one of my models where I faced a similar problem:
return QuantitativeResult.objects.select_related(
    'enrollment__subscription__configuration__analyte',
    'enrollment__subscription__unit',
    'enrollment__subscription__configuration__analyte__unit',
    'enrollment__subscription__lab',
    'enrollment__subscription__instrument_model',
    'enrollment__subscription__instrument',
    'enrollment__subscription__configuration__method',
    'enrollment__subscription__configuration__reagent',
    'enrollment__subscription__configuration__reagent__manufacturer',
    'enrollment__subscription__instrument_model__instrument__manufacturer'
).filter(<snip, snip - stuff edited out>)
In this pathological case, I went down from 700+ queries to just one. The django debug toolbar is your friend when it comes to this sort of issue.
I have a Django project with 5 different models in it. All of them have a date field. Let's say I want to get all entries from all models with today's date. Of course, I could just filter every model and put the results in one big list, but I believe that's bad. What would be an efficient way to do this?
I don't think it's a bad idea to query each model separately - indeed, from a database perspective, I can't see how you could do otherwise, as each model needs a separate SQL query. Even if, as #Nagaraj suggests, you set up a common Date model that every other model references, you'd still need to query each model separately. You are probably correct, however, that putting the results into a list is bad practice, unless you actually need to load every object into memory, as explained here:
Be warned, though, that [evaluating a QuerySet as a list] could have a large memory overhead, because Django will load each element of the list into memory. In contrast, iterating over a QuerySet will take advantage of your database to load data and instantiate objects only as you need them.
It's hard to suggest other options without knowing more about your use case. However, I think I'd probably approach this by making a list or dictionary of QuerySets, which I could then use in my view, e.g.:
querysets = [cls.objects.filter(date=now) for cls in [Model1, Model2, Model3]]
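For example, a minimal sketch of a view built this way (the model names and template name are placeholders):

from django.shortcuts import render
from django.utils import timezone

def todays_entries(request):
    today = timezone.now().date()
    querysets = [cls.objects.filter(date=today) for cls in [Model1, Model2, Model3]]
    # Each QuerySet stays lazy; the template can iterate them one at a time.
    return render(request, 'todays_entries.html', {'querysets': querysets})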
Take a look at using model inheritance (docs here) to define that date field in a base class that you subclass in the models you want to return from the query.
For example:
class DateStuff(models.Model):
    date = models.DateField()

class MyClass1(DateStuff):
    ...

class MyClass2(DateStuff):
    ...
I believe Django will let you query over the DateStuff class, and it will return objects from both MyClass1 and MyClass2.
Thanks to #nrabinowitz for pointing out my previous error.
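If DateStuff is a concrete (non-abstract) base, a query over it might look like this sketch:

import datetime

# Returns DateStuff rows whether the underlying object is a MyClass1 or a MyClass2.
todays = DateStuff.objects.filter(date=datetime.date.today())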
In a Google App Engine solution (Python), I've used the db.ListProperty as a way to describe a many-to-many relation, like so:
class Department(db.Model):
    name = db.StringProperty()

    @property
    def employees(self):
        return Employee.all().filter('departments', self.key())

class Employee(db.Model):
    name = db.StringProperty()
    departments = db.ListProperty(db.Key)
I create many-to-many relations by simply appending the Department key to the db.ListProperty like so:
employee.departments.append(department.key())
The problem is that I don't know how to actually remove this relationship again, when it is no longer needed.
I've tried Googling it, but I can't seem to find any documentation that describes the db.ListProperty in details.
Any ideas or references?
The ListProperty is just a Python list with some helper methods to make it work with GAE, so anything that applies to a list applies to a ListProperty.
employee.departments.remove(department.key())
employee.put()
Keep in mind that the data must be deserialized and reserialized every time a change is made, so if you need speed when adding or removing single values you may want to model the relationship differently, such as with the approach in the Relationship Model section of this page.
The ListProperty method also has the disadvantage of sometimes producing very large indexes if you want to search through the lists in a datastore request.
This may not be a problem for you since your lists should be relatively small, but it's something to keep in mind for future projects.
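A rough sketch of that relationship-model alternative (names are hypothetical), where each membership is its own entity:

class DepartmentMembership(db.Model):
    # Adding or removing a relation becomes a single put()/delete() on a small
    # entity instead of rewriting the employee's whole departments list.
    employee = db.ReferenceProperty(Employee, collection_name='department_memberships')
    department = db.ReferenceProperty(Department, collection_name='employee_memberships')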
Found it via trial and error:
employee.departments.remove(department.key())