In a Google App Engine solution (Python), I've used the db.ListProperty as a way to describe a many-to-many relation, like so:
class Department(db.Model):
name = db.StringProperty()
#property
def employees(self):
return Employee.all().filter('departments', self.key())
class Employee(db.Model):
name = db.StringProperty()
departments = db.ListProperty(db.Key)
I create many-to-many relations by simply appending the Department key to the db.ListProperty like so:
employee.departments.append(department.key())
The problem is that I don't know how to actually remove this relationship again, when it is no longer needed.
I've tried Googling it, but I can't seem to find any documentation that describes the db.ListProperty in details.
Any ideas or references?
The ListProperty is just a Python list with some helper methods to make it work with GAE, so anything that applies to a list applies to a ListProperty.
employee.departments.remove(department.key())
employee.put()
Keep in mind that the data must be deserialized/reserialized every time a change is made, so if you are looking for speed when adding or removing single values you may want to go with another method of modelling the relationship like the one in the Relationship Model section of this page.
The ListProperty method also has the disadvantage of sometimes producing very large indexes if you want to search through the lists in a datastore request.
This may not not be a problem for you since your Lists should be relatively small, but it's something to keep in mind for future projects.
Found it via trial and error:
employee.departments.remove(department.key())
Related
I searched a lot and did not find what I´am looking for.
What would be the best concept for a model class in django?
To extend User, would be better to have a class with several attributes, or break this class into several classes with few attributes? I´m using the django ORM now.
Say I have a class called Person that extends User, would be better:
class Person(models.Model):
user = foreingkey(User)
attribute1 =
...
attributeN =
Or, would it be better to do this:
class PersonContac(models.Model):
user = foreingkey(User)
attribute1 =
...
attribute3 =
class PersonAddress(models.Model):
user = foreingkey(User)
attribute1 =
...
attribute3 =
class PersonHobby(models.Model):
user = foreingkey(User)
attribute1 =
...
attribute3 =
My each of my views would use the data from the smaller classes (probably).
Over time, the atrribute number can expand.
I want to do is do it once, and touch the minimum possible.
Various attributes can be unfilled by the user, they are not required.
The number of user is indefinite (can be a lot).
I´m concerned in terms of long term performance and maintaining.
If someone can explain me, what would be better for my code, and why.
And what would be better in general (less classes/more attributes, or more classes/less attributes), using the Django ORM.
It is better if my views use the data of only one model class, or it makes no (or little) difference?
Edit:
On the rush for writing I used bad names on class. None of these attributes are many-to-many fields, the User will have only one value for each attribute, or blank.
The number of atributes can expand over time, but not in a great number.
Put any data that is specific to only one User directly in the model. This would probably be things like "Name", "Birthday", etc.
Some things might be better served by a separate model, though. For example multiple people might have the same Hobby or one User might have multiple Hobby(s). Make this a separate class and use a ForeignKeyField or ManyToManyField as necessary.
Whatever you choose, the real trick is to optimize the number of database queries. The django-debug-toolbar is helpful here.
Splitting up your models would by default result in multiple database queries, so make sure to read up on select related to condense that down to one.
Also take a look at the defer method when retrieving a queryset. You can exclude some of those fields that aren't necessary if you know you won't use them in a particular view.
I think it's all up to your interface.
If you have to expose ALL data for a user in a single page and you have a single, large model you will end up with a single sql join instead of one for each smaller table.
Conversely, if you just need a few of these attributes, you might obtain a small performance gain in memory usage if you join the user table with a smaller one because you don't have to load a lot of attributes that aren't going to be used (though this might be mitigated through values (documentation here)
Also, if your attributes are not mandatory, you should at least have an idea of how many attributes are going to be filled. Having a large table of almost empty records could be a waste of space. Maybe a problem, maybe not. It depends on your hw resources.
Lastly, if you really think that your attributes can expand a lot, you could try the EAV approach.
I've got a Django view that I'm trying to optimise. It shows a list of parent objects on a page, along with their children. The child model has the foreign key back to the parent, so select_related doesn't seem to apply.
class Parent(models.Model):
name = models.CharField(max_length=31)
class Child(models.Model):
name = models.CharField(max_length=31)
parent = models.ForeignKey(Parent)
A naive implementation uses n+1 queries, where n is the number of parent objects, ie. one query to fetch the parent list, then one query to fetch the children of each parent.
I've written a view that does the job in two queries - one to fetch the parent objects, another to fetch the related children, then some Python (that I'm far too embarrassed to post here) to put it all back together again.
Once I found myself importing the standard library's collections module I realised that I was probably doing it wrong. There is probably a much easier way, but I lack the Django experience to find it. Any pointers would be much appreciated!
Add a related_name to the foreign key, then use the prefetch_related method which added to Django 1.4:
Returns a QuerySet that will automatically retrieve, in a single
batch, related objects for each of the specified lookups.
This has a similar purpose to select_related, in that both are
designed to stop the deluge of database queries that is caused by
accessing related objects, but the strategy is quite different:
select_related works by creating a SQL join and including the fields
of the related object in the SELECT statement. For this reason,
select_related gets the related objects in the same database query.
However, to avoid the much larger result set that would result from
joining across a 'many' relationship, select_related is limited to
single-valued relationships - foreign key and one-to-one.
prefetch_related, on the other hand, does a separate lookup for each
relationship, and does the 'joining' in Python. This allows it to
prefetch many-to-many and many-to-one objects, which cannot be done
using select_related, in addition to the foreign key and one-to-one
relationships that are supported by select_related. It also supports
prefetching of GenericRelation and GenericForeignKey.
class Parent(models.Model):
name = models.CharField(max_length=31)
class Child(models.Model):
name = models.CharField(max_length=31)
parent = models.ForeignKey(Parent, related_name='children')
>>> Parent.objects.all().prefetch_related('children')
All the relevant children will be fetched in a single query, and used
to make QuerySets that have a pre-filled cache of the relevant
results. These QuerySets are then used in the self.children.all()
calls.
Note 1 that, as always with QuerySets, any subsequent chained methods which imply a different database query will ignore previously
cached results, and retrieve data using a fresh database query.
Note 2 that if you use iterator() to run the query, prefetch_related() calls will be ignored since these two
optimizations do not make sense together.
If you ever need to work with more than 2 levels at once, you can consider a different approach to storing trees in db using MPTT
In a nutshell, it adds data to your model which are updated during updates and allow a much more efficient retrieval.
Actually, select_related is what you are looking for. select_related creates a JOIN so that all the data that you need is fetched in one statement. prefetch_related runs all the queries at once then caches them.
The trick here is to "join in" only what you absolutely need to in order to reduce the performance penalty of the join. "What you absolutely need to" is the long way of saying that you should pre-select only the fields that you will read later in your view or template. There is good documentation here: https://docs.djangoproject.com/en/1.4/ref/models/querysets/#select-related
This is a snippet from one of my models where I faced a similar problem:
return QuantitativeResult.objects.select_related(
'enrollment__subscription__configuration__analyte',
'enrollment__subscription__unit',
'enrollment__subscription__configuration__analyte__unit',
'enrollment__subscription__lab',
'enrollment__subscription__instrument_model'
'enrollment__subscription__instrument',
'enrollment__subscription__configuration__method',
'enrollment__subscription__configuration__reagent',
'enrollment__subscription__configuration__reagent__manufacturer',
'enrollment__subscription__instrument_model__instrument__manufacturer'
).filter(<snip, snip - stuff edited out>)
In this pathological case, I went down from 700+ queries to just one. The django debug toolbar is your friend when it comes to this sort of issue.
I'm in the process of writing my first RESTful web service atop GAE and the Python 2.7 runtime; I've started out using Guido's shiny new ndb API.
However, I'm unsure how to solve a particular case without the implicit back-reference feature of the original db API. If the user-agent requests a particular resource and those resources 1 degree removed:
host/api/kind/id?depth=2
What's the best way to discover a related collection of entities from the "one" in a one-to-many relationship, given that the kind of the related entity is unknown at development time?
I'm unable to use a replacement query as described in a previous SO inquiry due to the latter restriction. The fact that my model is definable at runtime (and therefore isn't hardcoded) prevents me from using a query to filter properties for matching keys.
Ancestor and other kindless queries are also out due to the datastore limitation that prevents me from filtering on a property without the kind specified.
Thus far, the only idea I've had (beyond reverting to the db api) is to use a cross-group transaction to write my own reference on the "one", either by updating an ndb.StringProperty(repeat=True) containing all the related kinds when an entity of a new kind is introduced or by simply maintaining a list of keys on the "one" ndb.KeyProperty(repeat=True) every time a related "many" entity is written to the datastore.
I'm hoping someone more experienced than myself can suggest a better approach.
Given jmort253's suggestion, I'll try to augment my question with a concrete example adapted from the docs:
class Contact(ndb.Expando):
""" The One """
# basic info
name = ndb.StringProperty()
birth_day = ndb.DateProperty()
# If I were using db, a collection called 'phone_numbers' would be implicitly
# created here. I could use this property to retrieve related phone numbers
# when this entity was queried. Since NDB lacks this feature, the service
# will neither have a reference to query nor the means to know the
# relationship exists in the first place since it cannot be hard-coded. The
# data model is extensible and user-defined at runtime; most relationships
# will be described only in the data, and must be discoverable by the server.
# In this case, when Contact is queried, I need a way to retrieve the
# collection of phone numbers.
# Company info.
company_title = ndb.StringProperty()
company_name = ndb.StringProperty()
company_description = ndb.StringProperty()
company_address = ndb.PostalAddressProperty()
class PhoneNumber(ndb.Expando):
""" The Many """
# no collection_name='phone_numbers' equivalent exists for the key property
contact = ndb.KeyProperty(kind='Contact')
number = ndb.PhoneNumberProperty()
Interesting question! So basically you want to look at the Contact class and find out if there is some other model class that has a KeyProperty referencing it; in this example PhoneNumber (but there could be many).
I think the solution is to ask your users to explicitly add this link when the PhoneNumber class is created.
You can make this easy for your users by giving them a subclass of KeyProperty that takes care of this; e.g.
class LinkedKeyProperty(ndb.KeyProperty):
def _fix_up(self, cls, code_name):
super(LinkedKeyProperty, self)._fix_up(cls, code_name)
modelclass = ndb.Model._kind_map[self._kind]
collection_name = '%s_ref_%s_to_%s' % (cls.__name__,
code_name,
modelclass.__name__)
setattr(modelclass, collection_name, (cls, self))
Exactly how you pick the name for the collection and the value to store there is up to you; just put something there that makes it easy for you to follow the link back. The example would create a new attribute on Contact:
Contact.PhoneNumber_ref_contact_to_Contact == (PhoneNumber, PhoneNumber.contact)
[edited to make the code working and to add an example. :-) ]
Sound like a good use case for ndb.StructuredProperty.
I have a django project with 5 different models in it. All of them has date field. Let's say i want to get all entries from all models with today date. Of course, i could just filter every model, and put results in one big list, but i believe it's bad. What would be efficient way to do that?
I don't think that it's a bad idea to query each model separately - indeed, from a database perspective, I can't see how you'd be able to do otherwise, as each model will need a separate SQL query. Even if, as #Nagaraj suggests, you set up a common Date model every other model references, you'd still need to query each model separately. You are probably correct, however, that putting the results into a list is bad practice, unless you actually need to load every object into memory, as explained here:
Be warned, though, that [evaluating a QuerySet as a list] could have a large memory overhead, because Django will load each element of the list into memory. In contrast, iterating over a QuerySet will take advantage of your database to load data and instantiate objects only as you need them.
It's hard to suggest other options without knowing more about your use case. However, I think I'd probably approach this by making a list or dictionary of QuerySets, which I could then use in my view, e.g.:
querysets = [cls.objects.filter(date=now) for cls in [Model1, Model2, Model3]]
Take a look at using Multiple Inheritance (docs here) to define those date fields in a class that you can subclass in the classes you want to return in the query.
For example:
class DateStuff(db.Model):
date = db.DateProperty()
class MyClass1(DateStuff):
...
class MyClass2(DateStuff):
...
I believe Django will let you query over the DateStuff class, and it'll return objects from MyClass1 and MyClass2.
Thank #nrabinowitz for pointing out my previous error.
I need to store some data in a Django model. These data are not equal to all instances of the model.
At first I thought about subclassing the model, but I’m trying to keep the application flexible. If I use subclasses, I’ll need to create a whole class each time I need a new kind of object, and that’s no good. I’ll also end up with a lot of subclasses only to store a pair of extra fields.
I really feel that a dictionary would be the best approach, but there’s nothing in the Django documentation about storing a dictionary in a Django model (or I can’t find it).
Any clues?
If it's really dictionary like arbitrary data you're looking for you can probably use a two-level setup with one model that's a container and another model that's key-value pairs. You'd create an instance of the container, create each of the key-value instances, and associate the set of key-value instances with the container instance. Something like:
class Dicty(models.Model):
name = models.CharField(max_length=50)
class KeyVal(models.Model):
container = models.ForeignKey(Dicty, db_index=True)
key = models.CharField(max_length=240, db_index=True)
value = models.CharField(max_length=240, db_index=True)
It's not pretty, but it'll let you access/search the innards of the dictionary using the DB whereas a pickle/serialize solution will not.
Another clean and fast solution can be found here: https://github.com/bradjasper/django-jsonfield
For convenience I copied the simple instructions.
Install
pip install jsonfield
Usage
from django.db import models
from jsonfield import JSONField
class MyModel(models.Model):
json = JSONField()
If you don't need to query by any of this extra data, then you can store it as a serialized dictionary. Use repr to turn the dictionary into a string, and eval to turn the string back into a dictionary. Take care with eval that there's no user data in the dictionary, or use a safe_eval implementation.
For example, in the create and update methods of your views, you can add:
if isinstance(request.data, dict) == False:
req_data = request.data.dict().copy()
else:
req_data = request.data.copy()
dict_key = 'request_parameter_that_has_a_dict_inside'
if dict_key in req_data.keys() and isinstance(req_data[dict_key], dict):
req_data[dict_key] = repr(req_data[dict_key])
I came to this post by google's 4rth result to "django store object"
A little bit late, but django-picklefield looks like good solution to me.
Example from doc:
To use, just define a field in your model:
>>> from picklefield.fields import PickledObjectField
>>> class SomeObject(models.Model):
>>> args = PickledObjectField()
and assign whatever you like (as long as it's picklable) to the field:
>>> obj = SomeObject()
>>> obj.args = ['fancy', {'objects': 'inside'}]
>>> obj.save()
As Ned answered, you won't be able to query "some data" if you use the dictionary approach.
If you still need to store dictionaries then the best approach, by far, is the PickleField class documented in Marty Alchin's new book Pro Django. This method uses Python class properties to pickle/unpickle a python object, only on demand, that is stored in a model field.
The basics of this approach is to use django's contibute_to_class method to dynamically add a new field to your model and uses getattr/setattr to do the serializing on demand.
One of the few online examples I could find that is similar is this definition of a JSONField.
I'm not sure exactly sure of the nature of the problem you're trying to solve, but it sounds curiously similar to Google App Engine's BigTable Expando.
Expandos allow you to specify and store additional fields on an database-backed object instance at runtime. To quote from the docs:
import datetime
from google.appengine.ext import db
class Song(db.Expando):
title = db.StringProperty()
crazy = Song(title='Crazy like a diamond',
author='Lucy Sky',
publish_date='yesterday',
rating=5.0)
crazy.last_minute_note=db.Text('Get a train to the station.')
Google App Engine currently supports both Python and the Django framework. Might be worth looking into if this is the best way to express your models.
Traditional relational database models don't have this kind of column-addition flexibility. If your datatypes are simple enough you could break from traditional RDBMS philosophy and hack values into a single column via serialization as #Ned Batchelder proposes; however, if you have to use an RDBMS, Django model inheritance is probably the way to go. Notably, it will create a one-to-one foreign key relation for each level of derivation.
This question is old, but I was having the same problem, ended here and the chosen answer couldn't solve my problem anymore.
If you want to store dictionaries in Django or REST Api, either to be used as objects in your front end, or because your data won't necessarily have the same structure, the solution I used can help you.
When saving the data in your API, use json.dump() method to be able to store it in a proper json format, as described in this question.
If you use this structure, your data will already be in the appropriate json format to be called in the front end with JSON.parse() in your ajax (or whatever) call.
I use a textfield and json.loads()/json.dumps()
models.py
import json
from django.db import models
class Item(models.Model):
data = models.TextField(blank=True, null=True, default='{}')
def save(self, *args, **kwargs):
## load the current string and
## convert string to python dictionary
data_dict = json.loads(self.data)
## do something with the dictionary
for something in somethings:
data_dict[something] = some_function(something)
## if it is empty, save it back to a '{}' string,
## if it is not empty, convert the dictionary back to a json string
if not data_dict:
self.data = '{}'
else:
self.data = json.dumps(data_dict)
super(Item, self).save(*args, **kwargs)
Django-Geo includes a "DictionaryField" you might find helpful:
http://code.google.com/p/django-geo/source/browse/trunk/fields.py?r=13#49
In general, if you don't need to query across the data use a denormalized approach to avoid extra queries. User settings are a pretty good example!
I agree that you need to refrain stuffing otherwise structured data into a single column. But if you must do that, Django has an XMLField build-in.
There's also JSONField at Django snipplets.
Being "not equal to all instances of the model" sounds to me like a good match for a "Schema-free database". CouchDB is the poster child for that approach and you might consider that.
In a project I moved several tables which never played very nice with the Django ORM over to CouchDB and I'm quite happy with that. I use couchdb-python without any of the Django-specific CouchDB modules. A description of the data model can be found here. The movement from five "models" in Django to 3 "models" in Django and one CouchDB "database" actually slightly reduced the total lines of code in my application.
I know this is an old question, but today (2021) the cleanest alternative is to use the native JSONfield (since django 3.1)
docs: https://docs.djangoproject.com/en/3.2/ref/models/fields/#django.db.models.JSONField
you just create a field in the model called jsonfield inside the class model and voilá
Think it over, and find the commonalities of each data set... then define your model. It may require the use of subclasses or not. Foreign keys representing commonalities aren't to be avoided, but encouraged when they make sense.
Stuffing random data into a SQL table is not smart, unless it's truly non-relational data. If that's the case, define your problem and we may be able to help.
If you are using Postgres, you can use an hstore field: https://docs.djangoproject.com/en/1.10/ref/contrib/postgres/fields/#hstorefield.