When we define a model in Django we write something like:
class Student(models.Model):
    name = models.CharField(max_length=64)
    age = models.IntegerField()
    ...
where name = models.CharField() implies that name would be an object of models.CharField. When we have to make an object of Student, we simply do:
my_name = "John Doe"
my_age = 18
s = Student.objects.create(name=my_name, age=my_age)
where my_name and my_age are string and integer data types respectively, and not objects of models.CharField/models.IntegerField. Even so, the respective validations are performed on those plain values (like checking max_length for CharField).
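For illustration, in plain Django that check actually fires when the instance is validated (e.g. via full_clean()) or written to the database, not on attribute assignment:

s = Student(name="x" * 100, age=18)
s.full_clean()  # raises ValidationError because name exceeds max_length=64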
I'm trying to build similar models for an abstraction of Neo4j over Django, but I am not able to get this workflow. How can I implement this?
Found a similar question but didn't find it helpful enough.
How things work
The first thing we need to understand is that each field on your model has its own validation; for CharField this is _check_max_length_attribute, and it also calls check from the parent Field class to validate some basic common things.
With that in mind, we can move to the create method, which is much more complicated and a totally different thing. The basic operations for a specific object are (roughly sketched right after this list):
Create a python object
Call save()
Using a lot of getattr calls, save() does tons of validation
Commit to the DB; if anything goes wrong on the DB side, raise it to the user
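A simplified sketch of what QuerySet.create boils down to (not the exact Django source, which also handles database routing):

def create(self, **kwargs):
    obj = self.model(**kwargs)                  # step 1: build a plain Python object
    obj.save(force_insert=True, using=self.db)  # steps 2-4: validate and commit to the DB
    return obj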
A third thing you need to understand is that when you query an object, Django first gets the raw data from the DB and only then (after a long process) sets that data on a new object.
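Very roughly (a sketch only; the real path goes through the SQL compiler before reaching Model.from_db):

row = (1, "John Doe", 18)                                          # what the database driver hands back
student = Student.from_db('default', ['id', 'name', 'age'], row)   # only now is the data set on an object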
Simple Example
class BasicCharField:
    def __init__(self, max_len):
        self.max_len = max_len

    def validate(self, value):
        if value > self.max_len:
            raise ValueError('the value must be lower than {}'.format(self.max_len))


class BasicModel:
    score = BasicCharField(max_len=4)

    @staticmethod
    def create(**kwargs):
        obj = BasicModel()
        obj.score = kwargs['score']
        obj.save()
        return obj

    def save(self):
        # Lots of validations here
        BasicModel.score.validate(self.score)
        # DB commit here


BasicModel.create(score=5)
And as we were expecting:
>>> ValueError: the value must be lower than 4
Obviously I had to simplify things to fit them into a few lines of code; you can improve this a lot, for example by iterating over the attributes instead of hardcoding obj.score = ..., as in the sketch below.
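A sketch of that improvement (still toy code, and it assumes every declared field gets a value in create()):

class BasicModel:
    score = BasicCharField(max_len=4)
    # more fields can be declared here; create() and save() pick them up automatically

    @classmethod
    def create(cls, **kwargs):
        obj = cls()
        for name, value in kwargs.items():
            setattr(obj, name, value)
        obj.save()
        return obj

    def save(self):
        # validate every class attribute that is a "field"
        for name, field in vars(type(self)).items():
            if isinstance(field, BasicCharField):
                field.validate(getattr(self, name))
        # DB commit here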
Related
I have a model A and want to make subclasses of it.
class A(models.Model):
    type = models.ForeignKey(Type, on_delete=models.CASCADE)
    data = models.JSONField()

    def compute(self):
        pass


class B(A):
    def compute(self):
        df = self.go_get_data()
        self.data = self.process(df)


class C(A):
    def compute(self):
        df = self.go_get_other_data()
        self.data = self.process_another_way(df)

# ... other subclasses of A
B and C should not have their own tables, so I decided to use the proxy attribute of Meta. However, I want there to be a table of all the implemented proxies.
In particular, I want to keep a record of the name and description of each subclass.
For example, for B, the name would be "B" and the description would be the docstring for B.
So I made another model:
class Type(models.Model):
    # The name of the class
    name = models.CharField(max_length=255)
    # The docstring of the class
    desc = models.TextField()
    # A unique identifier, different from the Django ID,
    # that allows for smoothly changing the name of the class
    identifier = models.IntegerField()
Now, I want it so when I create an A, I can only choose between the different subclasses of A.
Hence the Type table should always be up-to-date.
For example, if I want to unit-test the behavior of B, I'll need to use the corresponding Type instance to create an instance of B, so that Type instance already needs to be in the database.
Looking over on the Django website, I see two ways to achieve this: fixtures and data migrations.
Fixtures aren't dynamic enough for my use case, since the attributes literally come from the code. That leaves me with data migrations.
I tried writing one, that goes something like this:
def update_results(apps, schema_editor):
    A = apps.get_model("app", "A")
    Type = apps.get_model("app", "Type")
    subclasses = get_all_subclasses(A)
    for cls in subclasses:
        id = cls.get_identifier()
        Type.objects.update_or_create(
            identifier=id,
            defaults=dict(name=cls.__name__, desc=cls.__doc__),
        )


class Migration(migrations.Migration):
    operations = [
        migrations.RunPython(update_results),
    ]
    # ... other stuff
The problem is, I don't see how to store the identifier within the class, so that the Django Model instance can recover it.
So far, here is what I have tried:
I have tried using the fairly new __init_subclass__ construct of Python. So my code now looks like:
class A:
    def __init_subclass__(cls, identifier=None, **kwargs):
        super().__init_subclass__(**kwargs)
        if identifier is None:
            raise ValueError()
        cls.identifier = identifier
        Type.objects.update_or_create(
            identifier=identifier,
            defaults=dict(name=cls.__name__, desc=cls.__doc__),
        )
    # ... the rest of A


# The identifier should never change, so that even if the
# name of the class changes, we still know which subclass is referred to
class B(A, identifier=3):
    pass  # ... the rest of B
But this update_or_create fails when the database is new (e.g. during unit tests), because the Type table does not exist.
When I have this problem in development (we're still in early stages so deleting the DB is still sensible), I have to go
comment out the update_or_create in __init_subclass__. I can then migrate and put it back in.
Of course, this solution is also not great because __init_subclass__ is run way more than necessary. Ideally this machinery would only happen at migration.
So there you have it! I hope the problem statement makes sense.
Thanks for reading this far and I look forward to hearing from you; even if you have other things to do, I wish you a good rest of your day :)
With a little help from Django-expert friends, I solved this with the post_migrate signal.
I removed the update_or_create in __init_subclass__, and in project/app/apps.py I added:
from django.apps import AppConfig
from django.db.models.signals import post_migrate


def get_all_subclasses(cls):
    """Get all subclasses of a class, recursively.

    Used to get a list of all the implemented As.
    """
    all_subclasses = []
    for subclass in cls.__subclasses__():
        all_subclasses.append(subclass)
        all_subclasses.extend(get_all_subclasses(subclass))
    return all_subclasses


def update_As(sender=None, **kwargs):
    """Get a list of all implemented As and write them in the database.

    More precisely, each model is used to instantiate a Type, which will be
    used to identify As.
    """
    from app.models import A, Type

    subclasses = get_all_subclasses(A)
    for cls in subclasses:
        id = cls.identifier
        Type.objects.update_or_create(identifier=id, defaults=dict(name=cls.__name__, desc=cls.__doc__))


class MyAppConfig(AppConfig):
    default_auto_field = "django.db.models.BigAutoField"
    name = "app"

    def ready(self):
        post_migrate.connect(update_As, sender=self)
Hope this is helpful for future Django coders in need!
I just came across a scenario that I don't know how to resolve with the existing structure of my documents. As shown below, I can obviously resolve this problem with some refactoring, but I am curious about how it could be resolved as efficiently as possible while keeping the same structure.
Please note that this question is different from How to Do An Atomic Update on an EmbeddedDocument in a ListField in MongoEngine?
Let's suppose the following models:
class Scans(mongoengine.EmbeddedDocument):
    peer = mongoengine.ReferenceField(Peers, required=True)
    site = mongoengine.ReferenceField(Sites, required=True)
    process_name = mongoengine.StringField(default=None)
    documents = mongoengine.ListField(mongoengine.ReferenceField('Documents'))
    is_complete = mongoengine.BooleanField(default=False)
    to_start_at = mongoengine.DateTimeField()
    started = mongoengine.DateTimeField()
    finished = mongoengine.DateTimeField()


class ScanSettings(mongoengine.Document):
    site = mongoengine.ReferenceField(Sites, required=True)
    max_links = mongoengine.IntField(default=100)
    max_size = mongoengine.IntField(default=1024)
    mime_types = mongoengine.ListField(default=['text/html'])
    is_active = mongoengine.BooleanField(default=True)
    created = mongoengine.DateTimeField(default=datetime.datetime.now)
    repeat = mongoengine.StringField(choices=REPEAT_PATTERN)
    scans = mongoengine.EmbeddedDocumentListField(Scans)
What I would like to do is insert a ScanSettings object if and only if all elements of the scans field - a list of Scans embedded documents - have, in turn, unique document lists. By unique I mean unique across all such lists at the database level, not just within the single list being inserted - that'd be easy.
In plain English: if, at the time of inserting a ScanSettings, any element of its scans list references documents that are already referenced elsewhere, then the insertion should not happen. I mean uniqueness at the database level, taking existing records (if any) into account.
Given that Mongo does not support uniqueness across all elements of a list within the same document, I see the following options:
Option A
I refactor my "schema" and make the Scans collection inherit from Document rather than EmbeddedDocument, and change the scans field of ScanSettings to be a ListField of ReferenceFields to Scans documents. Then it is easy: I just need to save the Scans first using an update with the add_to_set operator and upsert=True, and once that has succeeded, save the ScanSettings. I will need (number of Scans instances to insert) + 1 queries.
Option B
I keep the same "schema" but somehow generate unique IDs for the Scans embedded documents. Then, before any insertion of a ScanSettings with a non-empty scans field, I fetch the already existing records to see whether there are duplicated document ObjectIds between the just-retrieved records and the ones to be inserted.
In other words, I would check uniqueness in Python rather than via MongoEngine/MongoDB. I will need 2 x (number of Scans instances to insert) queries (read + update with the add_to_set operator) + 1 ScanSettings save.
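A rough sketch of what that Python-side check could look like (a hypothetical helper; it dereferences every stored document and is not atomic, which is part of the drawback):

def documents_are_unique(new_scans):
    """Return True if no document referenced by new_scans is already referenced
    by a stored ScanSettings, and none is referenced twice in new_scans."""
    seen = set()
    for settings in ScanSettings.objects.only('scans'):
        for scan in settings.scans:
            seen.update(doc.pk for doc in scan.documents)
    for scan in new_scans:
        for doc in scan.documents:
            if doc.pk in seen:
                return False
            seen.add(doc.pk)
    return True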
Option C
Ignore uniqueness. Given how my model will be structured, I am pretty sure there will be no duplicates, or if there are any they will be negligible, so I could deal with duplicates at read time. For those like me coming from relational databases, this solution doesn't sit right.
I am a novice in Mongo so I appreciate any comments. Thanks.
PS: I am using latest MongoEngine and free Mongodb.
Thanks a lot in advance.
I finally went for Option A, so I refactored my model to:
a) Create a Mixin class, inheriting from Document, that adds two methods: an overridden save, which only allows saving when the list of unique documents is empty, and save_with_uniqueness, which allows saves and/or updates when the list of documents is non-empty. The idea is to enforce uniqueness.
b) Refactor both Scans and ScanSettings so that the former inherits from Document rather than EmbeddedDocument, and the latter redefines the scans field as a ListField of references to Scans.
c) In practice, Scans and ScanSettings now both inherit from the Mixin class, as both need to guarantee uniqueness of their attributes documents and scans, respectively. Hence the Mixin class.
With a) and b) I can guarantee uniqueness and save each Scans instance first, so it can later be added to ScanSettings.scans in the usual way.
A few points for novices like me:
Note that I am using inheritance. For it to work, you need to add an attribute to the meta dictionary to allow inheritance, as shown in the model below.
In my case I wanted Scans and ScanSettings to live in different collections, so I had to make the Mixin 'abstract', as shown in its meta dictionary too.
For save_with_uniqueness I used upsert=True so that a record is created if it does not exist yet. The idea is to use save_with_uniqueness the same way as save: creating the document if it does not exist, or updating it if it does.
I also used the full_result flag, as I need the ObjectId of the latest record inserted.
Document._fields is a dictionary containing the fields that compose the document. I wanted a general-purpose save_with_uniqueness, so I did not want to hardcode the document's fields or duplicate unnecessary code - hence the Mixin.
Finally, the code. It's not fully tested, but it's enough to get the main idea right for what I need.
class UniquenessMixin(mongoengine.Document):
    def save(self, *args, **kwargs):
        try:
            many_unique = kwargs['many_unique']
        except KeyError:
            pass
        else:
            attribute = getattr(self, many_unique)
            self_name = self.__class__.__name__
            if len(attribute):
                raise errors.DbModelOperationError(
                    f"It looks like you are trying to save a {self.__class__.__name__} "
                    f"object with a non-empty list of {many_unique}. "
                    f"Please use '{self_name.lower()}.save_with_uniqueness()' instead")
        return super().save(*args, **kwargs)

    def save_with_uniqueness(self, many_unique):
        attribute = getattr(self, many_unique)
        self_name = self.__class__.__name__
        if not len(attribute):
            raise errors.DbModelOperationError(
                f"It looks like you are trying to save a {self_name} object with an "
                f"empty list {many_unique}. Please use '{self_name.lower()}.save()' "
                f"instead")

        updates, removals = self._delta()
        if not updates:
            raise errors.DbModelOperationError(
                f"It looks like you are trying to update '{self.__class__.__name__}' "
                f"but no fields were modified since this object was created")

        kwargs = {(key if key != many_unique else 'add_to_set__' + key): value
                  for key, value in updates.items()}
        pk = bson.ObjectId() if not self.id else self.id
        result = self.__class__.objects(id=pk).update(upsert=True, full_result=True, **kwargs)
        try:
            self.id = result['upserted']
        except KeyError:
            pass
        finally:
            return self.id

    meta = {'allow_inheritance': True, 'abstract': True}
class Scans(UniquenessMixin):
    peer = mongoengine.ReferenceField(Peers, required=True)
    site = mongoengine.ReferenceField(Sites, required=True)
    process_name = mongoengine.StringField(default=None)
    documents = mongoengine.ListField(mongoengine.ReferenceField('Documents'))
    is_complete = mongoengine.BooleanField(default=False)
    to_start_at = mongoengine.DateTimeField()
    started = mongoengine.DateTimeField()
    finished = mongoengine.DateTimeField()
    meta = {'collection': 'Scans'}


class ScanSettings(UniquenessMixin):
    site = mongoengine.ReferenceField(Sites, required=True)
    max_links = mongoengine.IntField(default=100)
    max_size = mongoengine.IntField(default=1024)
    mime_types = mongoengine.ListField(default=['text/html'])
    is_active = mongoengine.BooleanField(default=True)
    created = mongoengine.DateTimeField(default=datetime.datetime.now)
    repeat = mongoengine.StringField(choices=REPEAT_PATTERN)
    scans = mongoengine.ListField(mongoengine.ReferenceField(Scans))
    meta = {'collection': 'ScanSettings'}
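For completeness, a hypothetical usage sketch (peer, site and document_list are placeholders, not objects from my real code):

scan = Scans(peer=peer, site=site, documents=document_list)
scan.save_with_uniqueness('documents')   # upsert; add_to_set only adds document ids not already present

settings = ScanSettings(site=site, scans=[scan])
settings.save_with_uniqueness('scans')   # same idea for the list of Scans references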
I am using Django's dumpdata to save data and loaddata to reload it. I am also using natural keys. My model looks similar to this:
class LinkManager(models.Manager):
    def get_by_natural_key(self, url):
        return self.get(url=url)


class Link(models.Model):
    objects = LinkManager()

    title = models.CharField(max_length=200)
    url = models.URLField()

    def natural_key(self):
        return (self.url,)
If I export and reimport the data, Django recognizes that the objects already exist and doesn't create duplicates. If I change the title, it correctly updates the objects. However, if I change the URL, it correctly treats it as a new object - although I forgot to mark url unique! How does it guess my intent?
How does Django know that my url field is the natural key? There is no get_natural_fields function. Django could call natural_key on the class instead of an instance to get the fields, but that seems really brittle:
>>> [f.field_name for f in Link.natural_key(Link)]
['url']
The reason I want to know this is that I am writing my own special importer (to replace my use of loaddata), and I would like to take advantage of natural keys without hardcoding the natural key (or the "identifying" fields) for each model. Currently, I "identify" an object by its unique fields - I do:
obj, created = Model.objects.update_or_create(**identifying, defaults=other)
but Django seems to be choosing its "identifying" fields differently.
I think I've found it out. Django does not just call get_by_natural_key; it first calls natural_key. How does it do that if it doesn't have an instance of the model?
It simply creates an instance, not backed by the database, from the constructor (d'oh!): Model(**data). See build_instance in django.core.serializers.base. Then it calls natural_key on the newly created object, and immediately afterwards get_by_natural_key to retrieve the pk that belongs to the object, if it is present in the database. This way, Django does not need to know which fields the natural key depends on; it just needs to know how to get it from the data. You can then call save() on the retrieved instance: if it is in the database it will have a pk and will be updated, if not a new row will be created.
Source of the build_instance function (Django 1.11.2):
def build_instance(Model, data, db):
    """
    Build a model instance.

    If the model instance doesn't have a primary key and the model supports
    natural keys, try to retrieve it from the database.
    """
    obj = Model(**data)
    if (obj.pk is None and hasattr(Model, 'natural_key') and
            hasattr(Model._default_manager, 'get_by_natural_key')):
        natural_key = obj.natural_key()
        try:
            obj.pk = Model._default_manager.db_manager(db).get_by_natural_key(*natural_key).pk
        except Model.DoesNotExist:
            pass
    return obj
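For the custom importer, a minimal sketch of how the same helper can be reused (import_rows is a hypothetical function; each data item is assumed to be a plain dict of field values):

from django.core.serializers.base import build_instance

def import_rows(Model, rows, db='default'):
    for data in rows:
        obj = build_instance(Model, data, db)  # sets obj.pk via the natural key if the row already exists
        obj.save(using=db)                     # updates that row, or inserts a new one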
Let's say that I have two teams, "red" and "black". And let's say that I have a Story class, which presents similar information in two very different ways, depending on your team:
class Story(models.Model):
    red_title = models.CharField()
    black_title = models.CharField()
    red_prologue = models.TextField()
    black_prologue = models.TextField()
    # ... and so on ...

    def get_field(self, genericName, team):
        """Return the field with suffix genericName belonging to the given team.

        >>> self.get_field("prologue", "red") is self.red_prologue
        True
        >>> self.get_field("title", "black") is self.black_title
        True
        """
        assert team in ["red", "black"]
        specificName = "{}_{}".format(team, genericName)
        return self.__dict__[specificName]
I'm happy with the getter function, but I feel like I should be able to refactor the code which created the fields in the first place. I'd like a function that looks something like this:
def make_fields(self, genericName, fieldType, **kwargs):
    """Create two fields with suffix genericName.

    One will be 'red_{genericName}' and one will be 'black_{genericName}'.
    """
    for team in ["red", "black"]:
        specificName = "{}_{}".format(team, genericName)
        self.__dict__[specificName] = fieldType(**kwargs)
But self and __dict__ are meaningless while the class is first defined, and I think Django requires that database fields be class variables rather than instance variables.
So... is there some way to create this make_fields function within Django, or am I out of luck?
Not sure why you're even doing this. A much more sane model would be:
TEAMS = (
    ("r", "red"),
    ("b", "black"),
)

class Story(models.Model):
    team = models.CharField(max_length=1, choices=TEAMS)
    title = models.CharField()
    prologue = models.TextField()
Your current model creates lots of duplicate columns (one for red, one for black) for what should really be captured by a single column. Using the model above, your queries would look like Story.objects.filter(team="r").
You then wouldn't need your get_field function at all.
No. A Django model shouldn't be treated as something that can be dynamically constructed; it's a Python representation of a database table. For instance, what would be the semantics of changing the format of specificName after you had already run syncdb? There's no definitive, obvious answer - so Django doesn't try to answer it. Your columns are defined at the class level, and that's that.
(At some level, you can always drill into the internal ORM data structures and set up these fields - but all you're doing is opening yourself up to a world of ambiguity and not-well-defined problems. Don't do it.)
Here's the situation. I've got an app with multiple users, and each user has a group/company they belong to. There is a company field on all models, meaning there's a corresponding company_id column in every table in the DB. I want to transparently enforce that, when a user tries to access any object, they are always restricted to objects within their "domain", e.g. their group/company. I could go through every query and add a filter that says .filter(company=user.company), but I'm hoping there's a better way to do it at a lower level so it's transparent to whoever is coding the higher-level logic.
Does anyone have experience with this and/or can point we to a good resource on how to approach this? I'm assuming this is a fairly common requirement.
You could do something like this:
from django.db import models
from django.db.models.query import QuerySet


class DomainQuerySet(QuerySet):
    def applicable(self, user=None):
        if user is None:
            return self
        else:
            return self.filter(company=user.company)


class DomainManager(models.Manager):
    def get_query_set(self):
        return DomainQuerySet(self.model)

    def __getattr__(self, name):
        return getattr(self.get_query_set(), name)


class MyUser(models.Model):
    company = models.ForeignKey('Company')

    objects = DomainManager()
MyUser.objects.applicable(user)
Since we are using querysets, the query is chainable so you could also do:
MyUser.objects.applicable().filter(**kwargs)
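For reference, on Django 1.6+ get_query_set was renamed to get_queryset, and from 1.7 the same idea can be written more compactly with QuerySet.as_manager(); a sketch of that variant (not part of the original answer):

from django.db import models

class DomainQuerySet(models.QuerySet):
    def applicable(self, user=None):
        if user is None:
            return self
        return self.filter(company=user.company)

class MyUser(models.Model):
    company = models.ForeignKey('Company', on_delete=models.CASCADE)

    objects = DomainQuerySet.as_manager()   # exposes applicable() and keeps it chainable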