I have a Book model which has a many-to-many field Author. If I merge two books, I want to make sure that the merged book has only unique authors, no doubles. I thought the best way to achieve this was to override the save() method, and so far I've come to this model:
class Book(models.Model):
    authors = models.ManyToManyField(Author)

    def save(self):
        authors_new = []
        authors = self.authors.all()
        for author in authors:
            if author not in authors_new:
                authors_new.append(author)
        self.authors = authors_new  # This won't work
        super(Book, self).save()
The penultimate line obviously doesn't work, but I just can't seem to get the syntax right. I think what I want to achieve is pretty obvious, though. Anyone any idea what the right syntax is?
Edit 1:
To explain the merge: I must say I don’t fully understand the code (it was written by someone else, and the merge takes place over several functions), so showing it here won’t help me. What it does is this: say there are two books in the database that are obviously the same. The first book has
title= “Harry Potter and the Philosopher’s Stone”
author=“JK Rowling”
year=1999
and the other book has
title=“Harry Potter (book 1)”
author=“JK Rowling”
pages=320
When you merge, you need to choose which book is the primary book. If I chose the first one to be primary, the merge should end up as
title=“Harry Potter and the Philosopher’s Stone”
author=“JK Rowling”
year=1999
pages=320
The problem is that the merge ends up with author=“JK Rowling” twice. I thought that I could take out the duplicates in the save function.
Edit 2:
The merge takes place in this function, which I haven't written:
def merge_model_objects(primary_object, *alias_objects):
    # Use this function to merge model objects and migrate all of
    # the related fields from the alias objects to the primary object.
    for alias_object in alias_objects:
        meta = alias_object._meta
        for related_object in meta.get_all_related_objects():
            alias_varname = related_object.get_accessor_name()
            obj_varname = related_object.field.name
            related_objects = getattr(alias_object, alias_varname)
            for obj in related_objects.all():
                if getattr(obj, 'do_not_merge', False):
                    continue
                setattr(obj, obj_varname, primary_object)
                obj.save()
        related_objects = meta.get_all_related_many_to_many_objects()
        for related_many_object in related_objects:
            alias_varname = related_many_object.get_accessor_name()
            obj_varname = related_many_object.field.name
            if alias_varname is not None:
                # standard case
                related_many_objects = getattr(
                    alias_object, alias_varname).all()
            else:
                # special case, symmetrical relation, no reverse accessor
                related_many_objects = getattr(alias_object, obj_varname).all()
            for obj in related_many_objects.all():
                getattr(obj, obj_varname).remove(alias_object)
                getattr(obj, obj_varname).add(primary_object)
        alias_object.delete()
    primary_object.save()
    return primary_object
This is quite a general function and can merge more than two objects, but if I merge Book 1 (= primary_object) and Book 2 (= alias_object) it will save it as a book with author="JK Rowling" twice.
Django represents many to many fields as sets. So you would probably want to do something like
self.authors = self.authors.union(other_model.authors)
Python sets are collections of unique, unordered objects.
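As a rough illustration of the set idea, here is a minimal plain-Python sketch. It uses strings as stand-ins for Author objects, and the names `merge_unique`, `primary_authors` and `alias_authors` are made up for this example. With real models, the authors would first have to be read out of the relation (for example with `.all()`).

```python
def merge_unique(primary_authors, alias_authors):
    """Return the authors of both books without duplicates, primary first."""
    # dict.fromkeys keeps the first occurrence of each key and preserves order
    return list(dict.fromkeys([*primary_authors, *alias_authors]))


# Hypothetical sample data: both books list the same author
primary_authors = ["JK Rowling"]
alias_authors = ["JK Rowling", "Another Author"]

merged = merge_unique(primary_authors, alias_authors)
print(merged)

# A set union has the same members, but no guaranteed order
assert set(merged) == set(primary_authors) | set(alias_authors)
```

The ordered variant is only worth it if the order of authors matters; otherwise a plain set union is enough.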
I have a class A which is used as a Foreign Key in many other classes.
class A(models.Model):
    pass

class B(models.Model):
    a: A = ForeignKey(A)

class C(models.Model):
    other_name: A = ForeignKey(A)
Now I have a database with a huge table of A objects and many classes like B and C who reference A (say potentially dozens). In this table, there are many objects (100k+) and I want to clean up all objects that are not actively referenced by other objects with a Foreign Key. For example, object 1 of class A is not referenced by class B and C.
How would I do this? I already came up with the following code:
a_list: list = list()
classes: list[tuple] = [(B, "a"), (C, "other_name")]
for cl, field in classes:
    field_object: Field = cl._meta.get_field(field)
    for obj in cl.objects.all():
        # value_from_object() returns the stored foreign-key value, i.e. the pk of A
        a_pk = field_object.value_from_object(obj)
        a_list.append(a_pk)
to_remove: list[A] = [a for a in A.objects.all() if a.pk not in a_list]
for a in to_remove:
    a.delete()
This leaves me with a few questions:
What if I don't know the full list of classes and fields (the case since it is a large group)?
Is this the most efficient way to do this for a large table with many unrelated objects (say 95%)? I guess I can optimize this a lot.
You can filter with:
A.objects.filter(b=None, c=None).delete()
This will make proper JOINs and thus determine the items in a single query, without having to fetch all the other model records from the database.
It will still be expensive, however, since Django handles the triggers (cascades, signals) itself and will thus "collect" all the A objects that are about to be deleted.
If you do not know what is referencing A, you can work with the meta of the model, so:
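from django.db.models.fields.reverse_related import ManyToOneRel

fields = {
    f.name: None  # the name used in queries for this reverse relation
    for f in A._meta.get_fields()
    # OneToOneRel subclasses ManyToOneRel, so both kinds of relation match
    if isinstance(f, ManyToOneRel)
}
A.objects.filter(**fields).delete()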
This will look for all ForeignKeys and OneToOneFields from other models that target (directly) the A model, then make LEFT OUTER JOINs and filter on NULL, and then delete those.
I would advise to first inspect A.objects.filter(**fields) however, and make sure you do not remove any items that are still necessary.
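To make the "LEFT OUTER JOIN and filter on NULL" idea concrete, here is a self-contained sketch using the standard-library sqlite3 module instead of Django. The table and column names (`a`, `b`, `c`, `a_id`, `other_name_id`) are assumptions that mirror the models above, and the sample rows are made up. The SQL that Django actually generates will differ in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE a (id INTEGER PRIMARY KEY);
    CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER REFERENCES a(id));
    CREATE TABLE c (id INTEGER PRIMARY KEY, other_name_id INTEGER REFERENCES a(id));
    INSERT INTO a (id) VALUES (1), (2), (3), (4);
    INSERT INTO b (a_id) VALUES (2);
    INSERT INTO c (other_name_id) VALUES (3);
    """
)

# Rows of a without a matching row in b or c: the outer joins yield NULLs for them
orphan_sql = """
    SELECT a.id FROM a
    LEFT OUTER JOIN b ON b.a_id = a.id
    LEFT OUTER JOIN c ON c.other_name_id = a.id
    WHERE b.id IS NULL AND c.id IS NULL
    ORDER BY a.id
"""
orphans = [row[0] for row in conn.execute(orphan_sql)]
print(orphans)  # [1, 4]

# Inspect first, then delete only the unreferenced rows
placeholders = ",".join("?" * len(orphans))
conn.execute("DELETE FROM a WHERE id IN (%s)" % placeholders, orphans)
remaining = [row[0] for row in conn.execute("SELECT id FROM a ORDER BY id")]
print(remaining)  # [2, 3]
```

Selecting before deleting mirrors the advice above: the list of orphans can be checked before anything is removed.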
I have the following models:
class Member(models.Model):
    ref = models.CharField(max_length=200)
    # some other stuff

    def __str__(self):
        return self.ref

class Feature(models.Model):
    feature_id = models.BigIntegerField(default=0)
    members = models.ManyToManyField(Member)
    # some other stuff
A Member is basically just a pointer to a Feature. So let's say I have Features:
feature_id = 2, members = 1, 2
feature_id = 4
feature_id = 3
Then the members would be:
id = 1, ref = 4
id = 2, ref = 3
I want to find all of the Features which contain one or more Members from a list of "ok members." Currently my query looks like this:
# ndtmp is a query set of member-less Features which Members can point to
sids = [str(i) for i in list(ndtmp.values('feature_id'))]
# now make a query set that contains all rels and ways with at least one member with an id in sids
okmems = Member.objects.filter(ref__in=sids)
relsways = Feature.geoobjects.filter(members__in=okmems)
# now combine with nodes
op = relsways | ndtmp
This is enormously slow, and I'm not even sure if it's working. I've tried using print statements to debug, just to make sure anything is actually being parsed, and I get the following:
print(ndtmp.count())
>>> 12747
print(len(sids))
>>> 12747
print(okmems.count())
... and then the code just hangs for minutes, and eventually I quit it. I think that I just overcomplicated the query, but I'm not sure how best to simplify it. Should I:
Migrate Feature to use a CharField instead of a BigIntegerField? There is no real reason for me to use a BigIntegerField, I just did so because I was following a tutorial when I began this project. I tried a simple migration by just changing it in models.py and I got a "numeric" value in the column in PostgreSQL with format 'Decimal:( the id )', but there's probably some way around that that would force it to just shove the id into a string.
Use some feature of many-to-many fields which I don't know about to check for matches more efficiently?
Calculate the bounding box of each Feature and store it in another column so that I don't have to do this calculation every time I query the database (so just the single fixed cost of calculation upon Migration + the cost of calculating whenever I add a new Feature or modify an existing one)?
Or something else? In case it helps, this is for a server-side script for an ongoing OpenStreetMap related project of mine, and you can see the work in progress here.
EDIT - I think a much faster way to get ndids is like this:
ndids = ndtmp.values_list('feature_id', flat=True)
This works, producing a non-empty set of ids.
Unfortunately, I am still at a loss as to how to get okmems. I tried:
okmems = Member.objects.filter(ref__in=str(ndids))
But it returns an empty query set. And I can confirm that the ref points are correct, via the following test:
Member.objects.values('ref')[:1]
>>> [{'ref': '2286047272'}]
Feature.objects.filter(feature_id='2286047272').values('feature_id')[:1]
>>> [{'feature_id': '2286047272'}]
You should take a look at annotate:
okmems = Member.objects.annotate(
feat_count=models.Count('feature')).filter(feat_count__gte=1)
relsways = Feature.geoobjects.filter(members__in=okmems)
Ultimately, I was wrong to set up the database using a numeric id in one table and a text-type id in the other. I am not very familiar with migrations yet, but at some point I'll have to take a deep dive into that world and figure out how to migrate my database to use numerics on both. For now, this works:
# ndtmp is a query set of member-less Features which Members can point to
# get the unique ids from ndtmp as strings
strids = (
    ndtmp.extra({'feature_id_str': "CAST(feature_id AS VARCHAR)"})
    .order_by('-feature_id_str')
    .values_list('feature_id_str', flat=True)
    .distinct()
)
# find all members whose ref values can be found in strids
okmems = Member.objects.filter(ref__in=strids)
# find all features containing one or more members in the accepted members list
relsways = Feature.geoobjects.filter(members__in=okmems)
# combine that with my existing list of allowed member-less features
op = relsways | ndtmp
# prove that this set is not empty
op.count()
# takes about 10 seconds
>>> 8997148 # looks like it worked!
Basically, I am making a query set of feature_ids (numerics) and casting it to be a query set of text-type (varchar) field values. I am then using values_list to make it only contain these string id values, and then I am finding all of the members whose ref ids are in that list of allowed Features. Now I know which members are allowed, so I can filter out all the Features which contain one or more members in that allowed list. Finally, I combine this query set of allowed Features which contain members with ndtmp, my original query set of allowed Features which do not contain members.
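The earlier attempts with `str()` presumably failed for a reason that can be reproduced without a database. The sketch below uses plain lists as stand-ins for the query sets, with made-up sample ids. Calling `str()` on a whole collection gives one string whose "members" are single characters, and `values()` yields dicts rather than bare ids. Only converting each id individually gives strings that can match `ref`.

```python
# Stand-ins for ndtmp.values_list('feature_id', flat=True) and ndtmp.values('feature_id')
ndids = [2286047272, 31, 4]
value_rows = [{"feature_id": i} for i in ndids]

# Member.ref values are strings (hypothetical sample)
refs = ["2286047272", "31"]

# str() on the whole collection: iterating the result yields single characters
as_one_string = str(ndids)
matches_wrong = [r for r in refs if r in list(as_one_string)]

# str() on each dict from values(): the strings look like "{'feature_id': 4}"
sids = [str(row) for row in value_rows]
matches_dicts = [r for r in refs if r in sids]

# Converting each id on its own gives comparable strings
strids = [str(i) for i in ndids]
matches_right = [r for r in refs if r in strids]

print(matches_wrong, matches_dicts, matches_right)
```

The CAST in the query above does the same per-value conversion, but inside the database, so the ids never have to be loaded into Python.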
Judging by the title this would be the exact same question, but I can't see how any of the answers are applicable to my use case:
I have two classes and a relationship between them:
treatment_association = Table('tr_association', Base.metadata,
    Column('chronic_treatments_id', Integer, ForeignKey('chronic_treatments.code')),
    Column('animals_id', Integer, ForeignKey('animals.id'))
)

class ChronicTreatment(Base):
    __tablename__ = "chronic_treatments"
    code = Column(String, primary_key=True)

class Animal(Base):
    __tablename__ = "animals"
    treatment = relationship("ChronicTreatment", secondary=treatment_association, backref="animals")
I would like to be able to select only the animals which have undergone a treatment which has the code "X". I tried quite a few approaches.
This one fails with an AttributeError:
sql_query = session.query(Animal.treatment).filter(Animal.treatment.code == "chrFlu")
for item in sql_query:
    pass
mystring = str(session.query(Animal))
And this one happily returns a list of unfiltered animals:
sql_query = session.query(Animal.treatment).filter(ChronicTreatment.code == "chrFlu")
for item in sql_query:
    pass
mystring = str(session.query(Animal))
The closest thing to the example from the aforementioned thread I could put together:
subq = session.query(Animal.id).subquery()
sql_query = session.query(ChronicTreatment).join((subq, subq.c.treatment_id=="chrFlu"))
for item in sql_query:
    pass
mystring = str(session.query(Animal))
mydf = pd.read_sql_query(mystring,engine)
Also fails with an AttributeError.
Can you help me sort this out?
First, there are two issues with table definitions:
1) In the treatment_association you have Integer column pointing to chronic_treatments.code while the code is String column.
I think it's just better to have an integer id in the chronic_treatments, so you don't duplicate the string code in another table and also have a chance to add more fields to chronic_treatments later.
Update: not exactly correct, you still can add more fields, but it will be more complex to change your 'code' if you decide to rename it.
2) In the Animal model you have a relation named treatment. This is confusing because you have many-to-many relation, it should be plural - treatments.
After fixing the above two, it should be clearer why your queries did not work.
This one (I replaced treatment with treatments):
sql_query = session.query(Animal.treatments).filter(
Animal.treatments.code == "chrFlu")
Animal.treatments represents a many-to-many relationship; it is not a SQLAlchemy model, so you can't pass it to the query or use it in a filter like this.
The next one can't work for the same reason (you pass Animal.treatments into the query).
The last one is closer, you actually need join to get your results.
I think it is easier to understand the query as SQL (and you anyway need to know SQL to be able to use sqlalchemy):
animals = session.query(Animal).from_statement(text(
"""
select distinct animals.* from animals
left join tr_association assoc on assoc.animals_id = animals.id
left join chronic_treatments on chronic_treatments.id = assoc.chronic_treatments_id
where chronic_treatments.code = :code
""")
).params(code='chrFlu')
It will select animals and join chronic_treatments through the tr_association and filter the result by code.
Having this it is easy to rewrite it using SQL-less syntax:
sql_query = session.query(Animal).join(Animal.treatments).filter(
ChronicTreatment.code == "chrFlu")
That will return what you want - a list of animals who have related chronic treatment with given code.
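To see the join at work without setting up SQLAlchemy, here is a small sketch with the standard-library sqlite3 module. It assumes the schema suggested above (an integer id on chronic_treatments next to the string code), and the sample animals and the `name` column are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE animals (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE chronic_treatments (id INTEGER PRIMARY KEY, code TEXT UNIQUE);
    CREATE TABLE tr_association (
        chronic_treatments_id INTEGER REFERENCES chronic_treatments(id),
        animals_id INTEGER REFERENCES animals(id)
    );
    INSERT INTO animals (id, name) VALUES (1, 'rex'), (2, 'tom'), (3, 'bea');
    INSERT INTO chronic_treatments (id, code) VALUES (1, 'chrFlu'), (2, 'chrOther');
    -- rex has both treatments, tom only chrOther, bea only chrFlu
    INSERT INTO tr_association VALUES (1, 1), (2, 1), (2, 2), (1, 3);
    """
)

# Join animals to treatments through the association table, then filter by code
rows = conn.execute(
    """
    SELECT DISTINCT animals.id, animals.name FROM animals
    JOIN tr_association assoc ON assoc.animals_id = animals.id
    JOIN chronic_treatments ON chronic_treatments.id = assoc.chronic_treatments_id
    WHERE chronic_treatments.code = :code
    ORDER BY animals.id
    """,
    {"code": "chrFlu"},
).fetchall()
print(rows)  # [(1, 'rex'), (3, 'bea')]
```

Because the WHERE clause filters on the treatment table, an inner join gives the same rows here as the left joins in the SQL above.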
Okay here is the problem. I have this code
list_categories = [None,"mathematics","engineering","science","other"]
class Books(db.Model):
    title = db.StringProperty(required=True)
    author = db.StringProperty()
    isbn = db.StringProperty()
    categories = db.StringListProperty(default=None, choices=set(list_categories))
What I want to do here is have my book.categories be a SUBSET of list_categories. For example,
I have a book whose categories should be 'engineering' and 'mathematics', but when I set
book.categories = ['engineering','mathematics']
webapp2 gives me an error:
BadValueError: Property categories is ['engineering','mathematics']; must be one of set([None,"mathematics","engineering","science","other"])
My initial guess here is that I must set my choices to be the POWERSET of [None,"mathematics","engineering","science","other"], but this is too inefficient.
Does anyone know a workaround for this?
The reason for the error (as I'm sure you've guessed) is that StringListProperty does not do any special handling of the choices keyword argument - it simply passes it along to the ListProperty constructor, which in turn passes it to the Property constructor, where it is evaluated:
if self.empty(value):
    if self.required:
        raise BadValueError('Property %s is required' % self.name)
else:
    if self.choices:
        match = False
        for choice in self.choices:
            if choice == value:
                match = True
        if not match:
            raise BadValueError('Property %s is %r; must be one of %r' %
                                (self.name, value, self.choices))
The issue is that it iterates through each choice individually, but it is comparing it to your entire list (value), which will never result in a match since a string won't equal a list (again, you know this :) ).
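The comparison can be reproduced outside App Engine. The sketch below is a simplified stand-in for the check quoted above, not the library code itself: `BadValueError` and `validate_choices` are defined here only for illustration. It shows that a single string passes, a list never does, and what a per-item (subset) check would look like instead.

```python
class BadValueError(Exception):
    """Stand-in for the datastore exception of the same name."""


def validate_choices(name, value, choices):
    # Same comparison as in the excerpt: each choice is compared with the whole value
    if not any(choice == value for choice in choices):
        raise BadValueError(
            "Property %s is %r; must be one of %r" % (name, value, choices)
        )
    return value


choices = {None, "mathematics", "engineering", "science", "other"}

# A single string equals one of the choices, so it passes
single_ok = validate_choices("categories", "engineering", choices)

# A list is never equal to a string (or to None), so the whole list is rejected
try:
    validate_choices("categories", ["engineering", "mathematics"], choices)
    list_rejected = False
except BadValueError:
    list_rejected = True

# Checking item by item is what a subset test would look like
subset_ok = set(["engineering", "mathematics"]) <= choices

print(single_ok, list_rejected, subset_ok)  # engineering True True
```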
My suggestion would be to modify how you assign the list to the property. For instance, instead of:
book.categories = ['engineering','mathematics']
Try something like this:
for category in ['engineering','mathematics']:
    book.categories.append(category)
Since the ListProperty contains a list, you can append each item individually so that it passes the test in the aforementioned code. Note that in order to get this to work in my tests, I had to set up the model in a slightly different way - however if you can get to the error you mentioned above, then the append method should work fine.
It makes it a little less straightforward, I agree, but it should circumvent the issue above and hopefully work.
Create a many to many relationship using list of keys. Use the categories property in class Book as a list of keys of class Category.
class Book(db.Model):
    title = db.StringProperty(required=True)
    author = db.StringProperty()
    isbn = db.StringProperty()
    # List of keys
    categories = db.ListProperty(db.Key)

class Category(db.Model):
    name = db.StringProperty(choices=('science', 'engineering', 'math'))
For more info and code samples about modeling check out: https://developers.google.com/appengine/articles/modeling
I have a Django project that has two models: Group and Person. Groups can contain either Person objects or other Group objects. Groups cannot form a cycle (i.e. Group A containing Group B containing Group A), which results in a tree structure where Person objects are leaves.
My question is - how can I count all the contained Group objects and Person objects within a high level Group (like the root Group) with as few SQL queries as possible?
A naive approach with O(N) (where N is # of subgroups) SQL queries would be:
import operator

class Group(models.Model):
    name = models.CharField(max_length=150)
    parent_group = models.ForeignKey('self', related_name='child_groups', null=True, blank=True)

    # returns tuple (# of subgroups, # of person objects)
    def count_objects(self):
        count = (self.child_groups.count(), self.people.count())
        for child_group in self.child_groups.all():
            # this adds tuples together ( e.g: (1,2) and (1,2) make (2,4) )
            count = tuple(map(operator.add, count, child_group.count_objects()))
        return count

class Person(models.Model):
    user = models.ForeignKey(User)
    picture = models.ImageSpecField(...)
    group = models.ForeignKey('Group', related_name="people")
Is there a way to improve this or should I just store these values within the Group object?
So this is an existing problem that many others have tackled. If you're using Django, check this out:
http://django-mptt.github.com/django-mptt/index.html
Within Postgres you could use recursive queries, although there is no direct support for this in Django.
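As a rough illustration of the recursive-query idea, the sketch below uses SQLite through the standard library rather than Postgres, since SQLite also supports `WITH RECURSIVE`. The table and column names (`app_group`, `app_person`, `parent_group_id`, `group_id`) and the sample tree are assumptions made up for this example. The function returns the same tuple as `count_objects` above, from a single SQL statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE app_group (id INTEGER PRIMARY KEY, name TEXT, parent_group_id INTEGER);
    CREATE TABLE app_person (id INTEGER PRIMARY KEY, group_id INTEGER);
    -- tree: root(1) -> a(2) -> c(4), and root(1) -> b(3)
    INSERT INTO app_group VALUES (1, 'root', NULL), (2, 'a', 1), (3, 'b', 1), (4, 'c', 2);
    INSERT INTO app_person (group_id) VALUES (1), (2), (2), (4), (3);
    """
)

COUNT_SQL = """
    WITH RECURSIVE subtree(id) AS (
        -- start at the requested group, then repeatedly add child groups
        SELECT id FROM app_group WHERE id = :root
        UNION ALL
        SELECT g.id FROM app_group g JOIN subtree s ON g.parent_group_id = s.id
    )
    SELECT
        (SELECT COUNT(*) FROM subtree) - 1,
        (SELECT COUNT(*) FROM app_person WHERE group_id IN (SELECT id FROM subtree))
"""


def count_objects(connection, root_id):
    """Return (# of subgroups, # of person objects) below and including root_id."""
    return tuple(connection.execute(COUNT_SQL, {"root": root_id}).fetchone())


print(count_objects(conn, 1))  # (3, 5)
print(count_objects(conn, 2))  # (1, 3)
```

The `- 1` excludes the starting group itself from the subgroup count, while its own people are still counted, matching the recursive Python version.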
Alternatively, you could consider denormalising the count; there may be libraries that do this. A quick Google search gave me: http://pypi.python.org/pypi/django-composition/
If you have to select the same values quite often and they don't change that much, you could try caching them.