I am trying to ensure name uniqueness in a MongoAlchemy-backed model, and am uncertain how to go about it.
My first attempt was a wrap validator that queried the database for existing entries with the same name (to ensure there were either 0 or 1 matches), but this failed because the validator only receives the name string, not the entire object, so comparing mongo_ids was impossible.
What's the best way to ensure that objects of a single class all have unique names?
You should use a unique index.
http://www.mongoalchemy.org/api/schema/document.html#mongoalchemy.document.Index
>>> class Person(Document):
...     name = StringField()
...     name_index = Index().ascending('name').unique()
The database will enforce the constraint for you. MongoAlchemy just wraps the unique-index support MongoDB already provides:
http://docs.mongodb.org/manual/tutorial/create-a-unique-index/
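For reference, a minimal sketch of the equivalent raw PyMongo call (the people collection name and test database are assumptions for illustration):

import pymongo

client = pymongo.MongoClient()
db = client.test

# A unique ascending index on 'name'; inserting a duplicate name
# afterwards raises DuplicateKeyError.
db.people.create_index([('name', pymongo.ASCENDING)], unique=True)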
Related
I just came across a scenario that I don't know how to resolve with the existing structure of my documents. As shown below, I can obviously resolve this problem with some refactoring, but I am curious how it could be resolved as efficiently as possible while respecting the same structure.
Please note that this question is different from How to Do An Atomic Update on an EmbeddedDocument in a ListField in MongoEngine?
Let's suppose the following models:
class Scans(mongoengine.EmbeddedDocument):
    peer = mongoengine.ReferenceField(Peers, required=True)
    site = mongoengine.ReferenceField(Sites, required=True)
    process_name = mongoengine.StringField(default=None)
    documents = mongoengine.ListField(mongoengine.ReferenceField('Documents'))
    is_complete = mongoengine.BooleanField(default=False)
    to_start_at = mongoengine.DateTimeField()
    started = mongoengine.DateTimeField()
    finished = mongoengine.DateTimeField()
class ScanSettings(mongoengine.Document):
    site = mongoengine.ReferenceField(Sites, required=True)
    max_links = mongoengine.IntField(default=100)
    max_size = mongoengine.IntField(default=1024)
    mime_types = mongoengine.ListField(default=['text/html'])
    is_active = mongoengine.BooleanField(default=True)
    created = mongoengine.DateTimeField(default=datetime.datetime.now)
    repeat = mongoengine.StringField(choices=REPEAT_PATTERN)
    scans = mongoengine.EmbeddedDocumentListField(Scans)
What I would like to do is insert a ScanSettings object if and only if all elements of the scans field - a list of Scans embedded documents - have unique documents lists in turn. By unique I mean unique across all such lists at the database level, not just within a single list - that would be easy.
In plain English: if, at the time of inserting a ScanSettings object, any element of its scans list contains a documents list whose ObjectIds are already present elsewhere, the insertion should not happen. I mean uniqueness at the database level, taking into account existing records, if any.
Given that MongoDB does not enforce uniqueness across all elements of a list within the same document, I see the following options:
Option A
I refactor my "schema" so that Scans inherits from Document rather than EmbeddedDocument, and change the scans field of ScanSettings to a ListField of ReferenceFields to Scans documents. Then it is easy: I first save the Scans using updates with the add_to_set operator and upsert=True, and once that has succeeded, save the ScanSettings (see the sketch below). I will need (number of Scans instances to insert) + 1 queries.
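A minimal sketch of that Option A update, assuming Scans is already a top-level Document and that scan_id and new_doc exist:

# add_to_set appends new_doc only if it is not already in the list;
# upsert=True creates the Scans document if it does not exist yet.
Scans.objects(id=scan_id).update(
    add_to_set__documents=new_doc,
    upsert=True,
)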
Option B
I keep the same "schema" but somehow generate unique IDs for the Scans embedded documents. Then, before any insertion of a ScanSettings with a non-empty scans field, I fetch the already existing records and check whether any document ObjectIds are duplicated between the retrieved records and the ones to be inserted.
In other words, I would check uniqueness in Python rather than relying on MongoEngine/MongoDB. I will need 2 x (number of Scans instances to insert) queries (read + update with add_to_set) + 1 ScanSettings save.
Option C
Ignore uniqueness. Given how my model will be structured, I am pretty sure there will be no duplicates, or if there are any, they will be negligible; I would then deal with duplicates at read time. For those like me coming from relational databases, this solution feels uncomfortable.
I am a novice with MongoDB, so I appreciate any comments. Thanks a lot in advance.
PS: I am using the latest MongoEngine and the free edition of MongoDB.
I finally went for Option A, so I refactored my model as follows:
a) Create a Mixin class that inherits from Document and adds two methods: an overridden 'save', which only allows saves while the list of unique documents is empty, and 'save_with_uniqueness', which handles saves and/or updates once the list of documents is non-empty. The idea is to enforce uniqueness.
b) Refactor both Scans and ScanSettings so that the former inherits from Document rather than EmbeddedDocument, and the latter redefines its 'scans' field as a ListField of references to Scans.
c) Scans and ScanSettings now both inherit from the Mixin class, as both need to guarantee uniqueness for their attributes 'documents' and 'scans', respectively. Hence the Mixin.
With a) and b) I can guarantee uniqueness: each Scans instance is saved first and later added to ScanSettings.scans in the usual way.
A few points for novices like me:
Note that I am using inheritance. For it to work, you need to add an attribute to the meta dictionary to allow inheritance, as shown in the model below.
In my case I wanted Scans and ScanSettings in different collections, so I had to make the Mixin 'abstract', as shown in its meta dictionary.
For save_with_uniqueness I used upsert=True so that a record is created if it does not exist yet. The idea is to use 'save_with_uniqueness' the same way as 'save', creating or updating a document depending on whether it already exists.
I also used the 'full_result' flag, as I need the ObjectId of the latest record inserted.
Document._fields is a dictionary that contains the fields that compose the document. I wanted a general-purpose save_with_uniqueness method, so I did not want to manually type in the fields of each Document or duplicate unnecessary code - hence the Mixin.
Finally, the code. It's not fully tested, but it's enough to get the main idea right for what I need.
import bson
import mongoengine

class UniquenessMixin(mongoengine.Document):
    def save(self, *args, **kwargs):
        try:
            # pop so the flag is not forwarded to mongoengine's save()
            many_unique = kwargs.pop('many_unique')
        except KeyError:
            pass
        else:
            attribute = getattr(self, many_unique)
            self_name = self.__class__.__name__
            if len(attribute):
                # errors is the author's own exceptions module
                raise errors.DbModelOperationError(
                    f"It looks like you are trying to save a {self_name} "
                    f"object with a non-empty list of {many_unique}. "
                    f"Please use '{self_name.lower()}.save_with_uniqueness()' instead")
        return super().save(*args, **kwargs)

    def save_with_uniqueness(self, many_unique):
        attribute = getattr(self, many_unique)
        self_name = self.__class__.__name__
        if not len(attribute):
            raise errors.DbModelOperationError(
                f"It looks like you are trying to save a {self_name} object with an "
                f"empty list {many_unique}. Please use '{self_name.lower()}.save()' "
                f"instead")
        # _delta() yields the fields modified since the object was created/loaded.
        updates, removals = self._delta()
        if not updates:
            raise errors.DbModelOperationError(
                f"It looks like you are trying to update '{self_name}' "
                f"but no fields were modified since this object was created")
        # Rewrite the unique list field into an atomic add_to_set update.
        kwargs = {(key if key != many_unique else 'add_to_set__' + key): value
                  for key, value in updates.items()}
        pk = bson.ObjectId() if not self.id else self.id
        result = self.__class__.objects(id=pk).update(upsert=True, full_result=True, **kwargs)
        try:
            self.id = result['upserted']
        except KeyError:
            pass
        finally:
            return self.id

    meta = {'allow_inheritance': True, 'abstract': True}
class Scans(UniquenessMixin):
    peer = mongoengine.ReferenceField(Peers, required=True)
    site = mongoengine.ReferenceField(Sites, required=True)
    process_name = mongoengine.StringField(default=None)
    documents = mongoengine.ListField(mongoengine.ReferenceField('Documents'))
    is_complete = mongoengine.BooleanField(default=False)
    to_start_at = mongoengine.DateTimeField()
    started = mongoengine.DateTimeField()
    finished = mongoengine.DateTimeField()
    meta = {'collection': 'Scans'}
class ScanSettings(UniquenessMixin):
    site = mongoengine.ReferenceField(Sites, required=True)
    max_links = mongoengine.IntField(default=100)
    max_size = mongoengine.IntField(default=1024)
    mime_types = mongoengine.ListField(default=['text/html'])
    is_active = mongoengine.BooleanField(default=True)
    created = mongoengine.DateTimeField(default=datetime.datetime.now)
    repeat = mongoengine.StringField(choices=REPEAT_PATTERN)
    scans = mongoengine.ListField(mongoengine.ReferenceField(Scans))
    meta = {'collection': 'ScanSettings'}
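A rough usage sketch (peer, site, and the doc1/doc2 references are assumed to be previously saved documents):

# A plain save() is allowed while 'documents' is still empty.
scan = Scans(peer=peer, site=site)
scan.save()

# Once 'documents' has entries, save_with_uniqueness() performs the
# atomic add_to_set update and returns the ObjectId of the record.
scan.documents = [doc1, doc2]
scan_id = scan.save_with_uniqueness('documents')

settings = ScanSettings(site=site, scans=[scan])
settings.save_with_uniqueness('scans')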
I want to duplicate (copy) an object mapped by SQLAlchemy. It should only copy the data created by me, not all the underlying bookkeeping, and it shouldn't copy primary keys or unique values.
This is useful when creating new data entries that differ only a little from the last one, so the user doesn't have to enter all the data again.
An important requirement is that this needs to work when the column name in the table (e.g. name) and the member name in the Python class (e.g. _name) are not the same.
This (simplified) code works for all declarative_base()-derived classes, BUT ONLY when the column name and the member name are the same.
import sqlalchemy as sa

def DuplicateObject(oldObj):
    mapper = sa.inspect(type(oldObj))
    newObj = type(oldObj)()
    for col in mapper.columns:
        # skip primary-key and unique columns
        if not col.primary_key and not col.unique:
            setattr(newObj, col.key, getattr(oldObj, col.key))
    return newObj
col.key is the name of the column in the table. When the member name in the Python class is different, this doesn't work. I don't know how SQLAlchemy connects the column name with the member name. How does SQLAlchemy know this connection, and how can I take care of it?
import sqlalchemy as sa

def duplicate_object(old_obj):
    # Is this an SQLAlchemy-mapped data class?
    if not isinstance(old_obj, _Base):
        raise TypeError('The given parameter with type {} is not '
                        'mapped by SQLAlchemy.'.format(type(old_obj)))
    mapper = sa.inspect(type(old_obj))
    new_obj = type(old_obj)()
    for name, col in mapper.columns.items():
        # skip primary-key and unique columns
        if not col.primary_key and not col.unique:
            setattr(new_obj, name, getattr(old_obj, name))
    return new_obj
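A small usage sketch (the User model and its _name attribute are made up for illustration); mapper.columns behaves like a dictionary of Column objects keyed by the mapped attribute name, so items() yields (attribute-name, Column) pairs:

from sqlalchemy import Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base

_Base = declarative_base()

class User(_Base):
    __tablename__ = 'user'
    id = Column(Integer, primary_key=True)
    _name = Column('name', String)  # member '_name', table column 'name'

original = User(_name='Alice')
copy = duplicate_object(original)
assert copy._name == 'Alice'  # copied via the attribute name
assert copy.id is None        # primary key deliberately not copied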
It looks like this works, even when the member names begin with double underscores (__name).
But someone on the SQLAlchemy mailing list mentioned:
It’s not a generalized solution for the whole world, though. It doesn’t take into account columns that are part of unique Index objects or columns that are mentioned in standalone UniqueConstraint objects.
But because the SQLAlchemy documentation is (for me!) quite hard to read and understand, I am not really sure what happens in that code - especially in the for-construct. What is behind items(), and why are there two variables (name, col)?
I want to override the save() method of my model and check changes to some of the fields:
def save(self):
    if self.counter != self.original_counter:  # that's what I want
        ...
I saw this question was asked before and the answer was to get the object from the db and compare the db value with the current value:
def save(self):
    original = MyModel.objects.get(pk=self.pk)
    if self.counter != original.counter:
        ...
but that's a waste of a DB query. It would be easy to get what I want if, on every instance initialization, the __init__ method initialized two attributes for each field - obj.<attr> and also obj.original_<attr>. Do I need to implement this myself, or is there a Django package that can do it for me?
I don't think there is a way to get the original values like that. Even if you implement the pseudo original_* fields yourself, you'd end up doing a MyModel.objects.get(...) anyway.
The issue is that inside the save() method the object has already been saved, so you see the new values. There is no way to see the original values without querying the database.
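For reference, a minimal sketch of the caching approach the asker describes. Note the caveat above: this only remembers the values the instance had when it was loaded in the current process, not concurrent changes in the database.

from django.db import models

class MyModel(models.Model):
    counter = models.IntegerField(default=0)

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Remember the value as it was when this instance was created/loaded.
        self._original_counter = self.counter

    def save(self, *args, **kwargs):
        if self.counter != self._original_counter:
            pass  # react to the change here
        super().save(*args, **kwargs)
        self._original_counter = self.counter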
I need some properties to be unique. How can I achieve this?
Is there something like unique=True?
I'm using Google App Engine for Python.
Google provides a function for that:
http://code.google.com/appengine/docs/python/datastore/modelclass.html#Model_get_or_insert
Model.get_or_insert(key_name, **kwds)
Attempts to get the entity of the model's kind with the given key name. If it exists, get_or_insert() simply returns it. If it doesn't exist, a new entity with the given kind, name, and parameters in kwds is created, stored, and returned.
The get and subsequent (possible) put are wrapped in a transaction to ensure atomicity. This means that get_or_insert() will never overwrite an existing entity, and will insert a new entity if and only if no entity with the given kind and name exists.
In other words, get_or_insert() is equivalent to this Python code:
def txn():
    entity = MyModel.get_by_key_name(key_name, parent=kwds.get('parent'))
    if entity is None:
        entity = MyModel(key_name=key_name, **kwds)
        entity.put()
    return entity
return db.run_in_transaction(txn)
Arguments:
key_name
The name for the key of the entity
**kwds
Keyword arguments to pass to the model class's constructor if an instance with the specified key name doesn't exist. The parent argument is required if the desired entity has a parent.
Note: get_or_insert() does not accept an RPC object.
The method returns an instance of the model class that represents the requested entity, whether it existed or was created by the method. As with all datastore operations, this method can raise a TransactionFailedError if the transaction could not be completed.
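So, for uniqueness, one common trick is to encode the unique value into the key name; a hypothetical sketch (the UserAccount model is made up for illustration):

from google.appengine.ext import db

class UserAccount(db.Model):
    username = db.StringProperty(required=True)

# The key name is derived from the unique value, so two concurrent
# calls can never create two entities for the same username.
account = UserAccount.get_or_insert('username:alice', username='alice')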
There's no built-in constraint for making sure a value is unique. You can do this however:
query = MyModel.all(keys_only=True).filter('unique_property', value_to_be_used)
entity = query.get()
if entity:
    raise Exception('unique_property must have a unique value!')
I use keys_only=True because it'll improve the performance slightly by not fetching the data for the entity.
A more efficient method would be to use a separate model with no fields whose key name is made up of property name + value. Then you could use get_by_key_name to fetch one or more of these composite key names; if you get one or more non-None values back, you know there are duplicate values (and by checking which values were not None, you'll know which ones were not unique).
As onebyone mentioned in the comments, these approaches, by their get-first, put-later nature, run the risk of concurrency issues. Theoretically, an entity could be created just after the check for an existing value, and the code after the check would still execute, leading to duplicate values. To prevent this, you will have to use transactions: Transactions - Google App Engine
If you want to check uniqueness across all entities with transactions using the first method, you'd have to put all entities in the same entity group, which would be very inefficient. For transactions, use the second method, like this:
class UniqueConstraint(db.Model):
    @classmethod
    def check(cls, model, **values):
        # Create a pseudo-key for use as an entity group.
        parent = db.Key.from_path(model.kind(), 'unique-values')
        # Build a list of key names to test.
        key_names = []
        for key in values:
            key_names.append('%s:%s' % (key, values[key]))
        def txn():
            result = cls.get_by_key_name(key_names, parent)
            for test in result:
                if test:
                    return False
            for key_name in key_names:
                uc = cls(key_name=key_name, parent=parent)
                uc.put()
            return True
        return db.run_in_transaction(txn)
UniqueConstraint.check(...) assumes that every single key/value pair must be unique in order to return success. The transaction uses a single entity group per model kind, so it is reliable for several different fields at once (for only one field, this would be much simpler). Also, even if you've got fields with the same name in one or more models, they will not conflict with each other.
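A hypothetical call site (the model and values are made up for illustration):

if UniqueConstraint.check(UserAccount, username='alice', email='alice@example.com'):
    UserAccount(username='alice', email='alice@example.com').put()
else:
    pass  # at least one of the values is already taken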
I have a simple "Invoices" class with a "Number" attribute that has to be assigned by the application when the user saves an invoice. There are some constraints:
1) the application is a (thin) client-server one, so whatever assigns the number must look out for collisions
2) Invoices has a "version" attribute too, so I can't use a simple DBMS-level autoincrementing field
I'm trying to build this using a custom Type that would kick in every time an invoice gets saved. Whenever process_bind_param is called with a None value, it will call a singleton of some sort to determine the number and avoid collisions. Is this a decent solution?
Anyway, I'm having a problem. Here's my custom Type:
class AutoIncrement(types.TypeDecorator):
    impl = types.Unicode

    def copy(self):
        return AutoIncrement()

    def process_bind_param(self, value, dialect):
        if not value:
            # Must find next autoincrement value
            value = "1"  # Test value :)
        return value
My problem right now is that when I save an Invoice and AutoIncrement sets "1" as the value for its number, the Invoice instance doesn't get updated with the new number. Is this expected? Am I missing something?
Many thanks for your time!
(SQLA 0.5.3 on Python 2.6, using PostgreSQL 8.3)
Edit: Michael Bayer told me that this behaviour is expected, since TypeDecorators don't deal with default values.
Is there any particular reason you don't just use a default= parameter in your column definition? (This can be an arbitrary Python callable).
def generate_invoice_number():
    ...  # special logic to generate a unique invoice number

class Invoice(DeclarativeBase):
    __tablename__ = 'invoice'
    number = Column(Integer, unique=True, default=generate_invoice_number)
    ...
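One possible implementation of that callable, as a sketch only: draw the next value from a dedicated PostgreSQL sequence, so the database serializes concurrent clients (the sequence name invoice_number_seq and the connection details are assumptions):

from sqlalchemy import create_engine

engine = create_engine('postgresql://user:password@localhost/mydb')  # hypothetical DSN

def generate_invoice_number():
    # nextval() is atomic, so two clients can never receive the same
    # number; this addresses the client-server collision constraint.
    return engine.execute("SELECT nextval('invoice_number_seq')").scalar()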