How to query Non Unique Together values in Django or SQL?

Let's say I have this model:
class Contact(BaseModel):
    order = models.ForeignKey(Order, related_name='contacts', blank=True, null=True)
    type = models.CharField(max_length=15, choices=TYPES, blank=True)
I want to find all orders where order and type are not unique together.
For example, there is order A and there are related contacts:
Contact(order=orderA, type='broker')
Contact(order=orderA, type='broker')
Contact(order=orderA, type='delivery')
I want to find this orderA because this order and type='broker' are not unique together in Contact model.
And then there is orderB and these related contacts:
Contact(order=orderB, type='broker')
Contact(order=orderB, type='delivery')
I don't want this orderB because it and the field type are unique in Contact model.
I tried using Django's annotate() but failed to relate these two fields.
Is it possible to do this with Django queries?
If not, a slight hint of how I could do it in SQL would be greatly appreciated.
Thank you very much.

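You could use a SQL query like:
select distinct order_id
from (
    select order_id, type
    from Contact
    group by order_id, type
    having count(*) > 1
) as duplicated;
The "order" column is shown as "order_id" because that is how Django names foreign key columns. Replace Contact with the real table name; by default Django names it <app_label>_contact. The subquery is given an alias because some databases (PostgreSQL, MySQL) require one.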
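To see the GROUP BY / HAVING query in action without a Django project, here is a minimal sketch that runs it against an in-memory SQLite database. The table name contact, the integer order ids, and the helper name orders_with_duplicate_types are assumptions made for illustration only; they are not taken from the question.

```python
import sqlite3


def orders_with_duplicate_types(rows):
    """Return the sorted order ids that have a repeated (order_id, type) pair.

    rows is an iterable of (order_id, type) tuples; the table layout is a
    simplified stand-in for the Contact model, not the real schema.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE contact (id INTEGER PRIMARY KEY, order_id INTEGER, type TEXT)"
    )
    conn.executemany("INSERT INTO contact (order_id, type) VALUES (?, ?)", rows)
    cursor = conn.execute(
        """
        SELECT DISTINCT order_id
        FROM (
            SELECT order_id, type
            FROM contact
            GROUP BY order_id, type
            HAVING COUNT(*) > 1
        ) AS duplicated
        ORDER BY order_id
        """
    )
    result = [row[0] for row in cursor]
    conn.close()
    return result


# Order 1 plays the role of orderA (two 'broker' contacts); order 2 is orderB.
sample_rows = [
    (1, "broker"),
    (1, "broker"),
    (1, "delivery"),
    (2, "broker"),
    (2, "delivery"),
]
duplicated_orders = orders_with_duplicate_types(sample_rows)
print(duplicated_orders)
```

Only order 1 is reported, because it is the only order in which a type occurs more than once.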

This should do the trick:
qs = (Contact.objects.values('order', 'type')
      .annotate(cnt=models.Count('pk'))
      .filter(cnt__gt=1))

You could write a couple of methods to solve this problem. I might have gone overboard writing these for you, but regardless, here is an explanation.
equal_to takes self and another contact and returns True if that contact has the same order and type, otherwise False. all_not_unique returns a list of every contact that shares its order and type with at least one other contact, with no contact listed twice. It should be called like this: not_unique = Contact.all_not_unique().
def equal_to(self, other):
    # The two contacts must be different rows.
    assert self.id != other.id
    # Compare the raw foreign key values; this also works when order is NULL.
    return self.order_id == other.order_id and self.type == other.type
@classmethod
def all_not_unique(cls):
    query_set = cls.objects.all()
    not_unique_query_set = []
    for contact_one in query_set:
        for contact_two in query_set:
            if contact_one.id != contact_two.id and contact_one.equal_to(contact_two):
                not_unique_query_set.append(contact_one)
                break  # one match is enough, so each contact is added at most once
    return not_unique_query_set
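The nested loops above compare every pair of contacts. If the check has to be done in Python rather than in the database, a single counting pass gives the same result. The following is only a sketch on plain dictionaries; the keys id, order_id and type and the function name find_non_unique are illustrative assumptions, not part of the model.

```python
from collections import Counter


def find_non_unique(contacts):
    """Return the contacts whose (order_id, type) pair occurs more than once."""
    # First pass: count how often each (order_id, type) pair appears.
    pair_counts = Counter((c["order_id"], c["type"]) for c in contacts)
    # Second pass: keep the contacts whose pair was seen at least twice.
    return [c for c in contacts if pair_counts[(c["order_id"], c["type"])] > 1]


contacts = [
    {"id": 1, "order_id": "A", "type": "broker"},
    {"id": 2, "order_id": "A", "type": "broker"},
    {"id": 3, "order_id": "A", "type": "delivery"},
    {"id": 4, "order_id": "B", "type": "broker"},
    {"id": 5, "order_id": "B", "type": "delivery"},
]
non_unique_ids = [c["id"] for c in find_non_unique(contacts)]
non_unique_orders = sorted({c["order_id"] for c in find_non_unique(contacts)})
print(non_unique_ids, non_unique_orders)
```

For large tables the database-side GROUP BY is still likely to be the better choice, since it avoids loading every row into memory.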

Related

Filter queryset on a foreignkey

This may not be new to everyone, but I need your expert guidance. I am trying to filter a column in a django-tables2 table. Note that I have not used django-filter here.
class Group(models.Model):
    title = models.CharField(blank=True)
class Control(models.Model):
    published = models.DateTimeField(auto_now=False)
    group = models.ForeignKey(Group, on_delete=models.CASCADE)
Now I am trying to filter in views.py as below:
table = ControlTable(Control.objects.all().order_by('-published').filter(
    group__in=form['contact'].value(),
))
This works fine, but when I select '-----' from the dropdown it shows a blank table instead of all the values.
If I change the query filter as below:
table = ControlTable(Control.objects.all().order_by('-published').filter(
    Group__title__iexact=form['contact'].value(),
))
then it throws the error Cannot resolve keyword 'Group' into field.
Could you please guide me on this?

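That makes sense: if you select ----, then it uses None, so you are filtering for None. You should check for that and apply the filter only when a group was selected:
items = Control.objects.all().order_by('-published')
group = form['contact'].value()
if group:
    items = items.filter(
        group__in=group
    )
table = ControlTable(items)
As for the second error, lookups use the field name, which is lowercase here: group__title__iexact, not Group__title__iexact.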

Filter Django model on reverse relationship list

I have two Django models as follows:
class Event(models.Model):
    name = models.CharField()
class EventPerson(models.Model):
    event = models.ForeignKey('Event', on_delete=models.CASCADE, related_name='event_persons')
    person_name = models.CharField()
If an Event exists in the database, it will have exactly two EventPerson objects that are related to it.
What I want to do is determine whether there exists an Event with a given name that also has a given set of two people (EventPersons). Is this possible in a single Django query?
I know I could write python code like this to check, but I'm hoping for something more efficient:
def event_exists(eventname, person1name, person2name):
    foundit = False
    for evt in Event.objects.filter(name=eventname):
        evtperson_names = [obj.person_name for obj in evt.event_persons.all()]
        if len(evtperson_names) == 2 and person1name in evtperson_names and person2name in evtperson_names:
            foundit = True
            break
    return foundit
Or would it be better to refactor the models so that Event has person1name and person2name as its own fields like this:
class Event(models.Model):
    name = models.CharField()
    person1name = models.CharField()
    person2name = models.CharField()
The problem with this is that there is no natural ordering for person1 and person2, i.e. if the persons are "Bob" and "Sally" then we could have person1name="Bob" and person2name="Sally", or person1name="Sally" and person2name="Bob".
Suggestions?

You can instead query for EventPerson objects whose event has the given name, use values_list to extract the person_name field, and convert the returned values to a set for an unordered comparison:
def event_exists(eventname, person1name, person2name):
    return set(EventPerson.objects.filter(event__name=eventname).values_list(
        'person_name', flat=True)) == {person1name, person2name}

I modified @blhsing's answer slightly, adding a filter on names.
def event_exists(eventname, person1name, person2name):
    event_people = EventPerson.objects.select_related('event').filter(person_name__in=[person1name, person2name], event__name=eventname)
    return set(event_people.values_list('person_name', flat=True)) == {person1name, person2name}
I would suggest passing EventPerson objects or their ids to this function instead of just names. That would make filtering easier (you would not need a set and could filter directly by ids) and more efficient (by using database indices; otherwise you would have to index person_name as well).
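Both set-based answers pool the person names of every event that has the given name. That is fine if event names are unique, but if several events can share a name, the pooled set can produce false results. A hedged sketch of a per-event comparison on plain tuples follows; the (event_id, event_name, person_name) row layout and the sample data are assumptions made for illustration, not the real models.

```python
from collections import defaultdict


def event_exists(rows, eventname, person1name, person2name):
    """Check whether a single event named eventname has exactly these two people.

    rows is an iterable of (event_id, event_name, person_name) tuples.
    """
    names_by_event = defaultdict(set)
    for event_id, name, person_name in rows:
        if name == eventname:
            names_by_event[event_id].add(person_name)
    wanted = {person1name, person2name}
    # Compare event by event instead of pooling all names together.
    return any(names == wanted for names in names_by_event.values())


rows = [
    (1, "dinner", "Bob"),
    (1, "dinner", "Alice"),
    (2, "dinner", "Sally"),
    (2, "dinner", "Carol"),
]
print(event_exists(rows, "dinner", "Alice", "Bob"))
print(event_exists(rows, "dinner", "Sally", "Bob"))
```

Bob and Alice share event 1, so the first call succeeds. Sally and Bob attend different events that happen to share a name, so the second call fails, whereas a pooled set filtered on those two names would report a match.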

django - prefetch only the newest record?

I am trying to prefetch only the latest record against the parent record.
My models are as follows:
class LinkTargets(models.Model):
    device_circuit_subnet = models.ForeignKey(DeviceCircuitSubnets, verbose_name="Device", on_delete=models.PROTECT)
    interface_index = models.CharField(max_length=100, verbose_name='Interface index (SNMP)', blank=True, null=True)
    get_bgp = models.BooleanField(default=False, verbose_name="get BGP Data?")
    dashboard = models.BooleanField(default=False, verbose_name="Display on monitoring dashboard?")
class LinkData(models.Model):
    link_target = models.ForeignKey(LinkTargets, verbose_name="Link Target", on_delete=models.PROTECT)
    interface_description = models.CharField(max_length=200, verbose_name='Interface Description', blank=True, null=True)
    ...
The query below fails with the error
AttributeError: 'LinkData' object has no attribute '_iterable_class'
Query:
link_data = LinkTargets.objects.filter(dashboard=True) \
    .prefetch_related(
        Prefetch(
            'linkdata_set',
            queryset=LinkData.objects.all().order_by('-id')[0]
        )
    )
I thought about querying LinkData instead and using select_related, but I have no idea how to get only one record for each link_target_id:
link_data = LinkData.objects.filter(link_target__dashboard=True) \
    .select_related('link_target')..?
EDIT:
Using rtindru's solution, the prefetched set seems to be empty. There are currently 6 records in there, at least 1 record for each of the 3 LinkTargets:
>>> link_data[0]
<LinkTargets: LinkTargets object>
>>> link_data[0].linkdata_set.all()
<QuerySet []>
>>>

The reason is that Prefetch expects a Django QuerySet as the queryset parameter, and you are giving it a model instance.
Change your query as follows:
link_data = LinkTargets.objects.filter(dashboard=True) \
    .prefetch_related(
        Prefetch(
            'linkdata_set',
            queryset=LinkData.objects.filter(pk=LinkData.objects.latest('id').pk)
        )
    )
This does have the unfortunate effect of undoing the purpose of Prefetch to a large degree.
Update
The query above prefetches exactly one record globally, not the latest LinkData record per LinkTarget.
To prefetch the latest LinkData for each LinkTarget you should start at LinkData. You can achieve this as follows:
LinkData.objects.filter(link_target__dashboard=True).values('link_target').annotate(max_id=Max('id'))
This will return a queryset of dictionaries such as {'link_target': 12, 'max_id': 3223}.
You can then use this to return the right set of objects; perhaps filter LinkData based on the values of max_id.
That will look something like this:
from django.db.models import Max, Prefetch
latest_link_data_pks = LinkData.objects.filter(link_target__dashboard=True).values('link_target').annotate(max_id=Max('id')).values_list('max_id', flat=True)
link_data = LinkTargets.objects.filter(dashboard=True) \
    .prefetch_related(
        Prefetch(
            'linkdata_set',
            queryset=LinkData.objects.filter(pk__in=latest_link_data_pks)
        )
    )
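The "maximum id per parent" idea can be tried outside Django. The sketch below runs the equivalent SQL against an in-memory SQLite table; the table name linkdata, its three columns, and the sample descriptions are simplified assumptions for illustration and do not reproduce the real schema.

```python
import sqlite3


def latest_row_per_target(rows):
    """Return {link_target_id: (id, interface_description)} for the newest row per target.

    rows is an iterable of (id, link_target_id, interface_description) tuples.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE linkdata ("
        "id INTEGER PRIMARY KEY, link_target_id INTEGER, interface_description TEXT)"
    )
    conn.executemany("INSERT INTO linkdata VALUES (?, ?, ?)", rows)
    cursor = conn.execute(
        """
        SELECT link_target_id, id, interface_description
        FROM linkdata
        WHERE id IN (
            -- one id per link target: the highest, i.e. the most recent
            SELECT MAX(id) FROM linkdata GROUP BY link_target_id
        )
        """
    )
    result = {target: (row_id, desc) for target, row_id, desc in cursor}
    conn.close()
    return result


sample = [
    (1, 10, "a-old"),
    (2, 20, "b-old"),
    (3, 10, "a-new"),
    (4, 30, "c-old"),
    (5, 20, "b-new"),
    (6, 30, "c-new"),
]
latest = latest_row_per_target(sample)
print(latest)
```

Each of the three link targets keeps only its highest-id row, which mirrors what the pk__in filter does in the Prefetch queryset.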

The following works on PostgreSQL. I understand it won't help the OP, but it might be useful to somebody else.
from django.db.models import Prefetch
from .models import LinkTargets, LinkData
link_data_qs = LinkData.objects.order_by(
    'link_target__id',
    '-id',
).distinct(
    'link_target__id',
)
qs = LinkTargets.objects.prefetch_related(
    Prefetch(
        'linkdata_set',
        queryset=link_data_qs,
    )
).all()

LinkData.objects.all().order_by('-id')[0] is not a queryset; it is a model object, hence your error.
You could try LinkData.objects.all().order_by('-id')[0:1] which is indeed a QuerySet, but it's not going to work. Given how prefetch_related works, the queryset argument must return a queryset that contains all the LinkData records you need (this is then further filtered, and the items in it joined up with the LinkTarget objects). This queryset only contains one item, so that's no good. (And Django will complain "Cannot filter a query once a slice has been taken" and raise an exception, as it should).
Let's back up. Essentially you are asking an aggregation/annotation question - for each LinkTarget, you want to know the most recent LinkData object, or the 'max' of an 'id' column. The easiest way is to just annotate with the id, and then do a separate query to get all the objects.
So, it would look like this (I've checked with a similar model in my project, so it should work, but the code below may have some typos):
from django.db.models import Max
# Reverse lookups in annotations use the model name ('linkdata'), not the 'linkdata_set' accessor.
linktargets = (LinkTargets.objects
               .filter(dashboard=True)
               .annotate(most_recent_linkdata_id=Max('linkdata__id')))
# Now, if we need them, let's collect the ids and get the actual objects
linkdata_ids = [t.most_recent_linkdata_id for t in linktargets]
linkdata_objects = LinkData.objects.filter(id__in=linkdata_ids)
# And we can decorate the LinkTarget objects as well if we want:
linkdata_d = {l.id: l for l in linkdata_objects}
for t in linktargets:
    if t.most_recent_linkdata_id is not None:
        t.most_recent_linkdata = linkdata_d[t.most_recent_linkdata_id]
I have deliberately not made this into a prefetch that masks linkdata_set, because the result is that you have objects that lie to you - the linkdata_set attribute is now missing results. Do you really want to be bitten by that somewhere down the line? Best to make a new attribute that has just the thing you want.

Tricky, but it seems to work:
class ForeignKeyAsOneToOneField(models.OneToOneField):
    def __init__(self, to, on_delete, to_field=None, **kwargs):
        super().__init__(to, on_delete, to_field=to_field, **kwargs)
        self._unique = False
class LinkData(models.Model):
    # link_target = models.ForeignKey(LinkTargets, verbose_name="Link Target", on_delete=models.PROTECT)
    link_target = ForeignKeyAsOneToOneField(LinkTargets, verbose_name="Link Target", on_delete=models.PROTECT, related_name='linkdata_helper')
    interface_description = models.CharField(max_length=200, verbose_name='Interface Description', blank=True, null=True)
link_data = LinkTargets.objects.filter(dashboard=True) \
    .prefetch_related(
        Prefetch(
            'linkdata_helper',
            queryset=LinkData.objects.all().order_by('-id'),
            to_attr='linkdata'
        )
    )
# Now you can access linkdata:
link_data[0].linkdata
Of course, with this approach you can't use linkdata_helper to get related objects.

This is not a direct answer to your question, but it solves the same problem. It is possible to annotate the newest object with a subquery, which I think is clearer. You also don't have to do things like Max("id") to limit the prefetch query.
It makes use of django.db.models.functions.JSONObject (added in Django 3.2) to combine multiple fields:
from django.db.models import OuterRef
from django.db.models.functions import JSONObject
MainModel.objects.annotate(
    last_object=RelatedModel.objects.filter(mainmodel=OuterRef("pk"))
    .order_by("-date_created")
    .values(
        data=JSONObject(
            id="id", body="body", date_created="date_created"
        )
    )[:1]
)

Django queries with complex filter

I have the following model:
...
from django.contrib.auth.models import User
class TaxonomyNode(models.Model):
    node_id = models.CharField(max_length=20)
    name = models.CharField(max_length=100)
    ...
class Annotation(models.Model):
    ...
    taxonomy_node = models.ForeignKey(TaxonomyNode, blank=True, null=True)
class Vote(models.Model):
    created_by = models.ForeignKey(User, related_name='votes', null=True, on_delete=models.SET_NULL)
    vote = models.FloatField()
    annotation = models.ForeignKey(Annotation, related_name='votes')
    ...
In the app, a User can produce Vote for an Annotation instance.
A User can vote only once for an Annotation instance.
I want to get a query set of the TaxonomyNodes for which a User can still annotate at least one Annotation. For now, I do it this way:
def user_can_annotate(node_id, user):
    # Annotation has no node_id field, so go through the taxonomy_node relation.
    if Annotation.objects.filter(taxonomy_node__node_id=node_id).exclude(votes__created_by=user).count() == 0:
        return False
    else:
        return True
def get_categories_to_validate(user):
    """
    Returns a query set with the TaxonomyNode which have Annotation that can be validated by a user
    """
    nodes = TaxonomyNode.objects.all()
    nodes_to_keep = [node.node_id for node in nodes if user_can_annotate(node.node_id, user)]
    return nodes.filter(node_id__in=nodes_to_keep)
categories_to_validate = get_categories_to_validate(<user instance>)
I guess there is a way to do it in one query, that would speed up the process quite a lot. In brief, I want to exclude from the TaxonomyNode set, all the nodes that have all their annotations already voted once by the user.
Any idea of how I could do it? With django ORM or in SQL?
I have Django version 1.10.6

Try this:
# SQL query
unvoted_annotations = Annotation.objects.exclude(votes__created_by=user).select_related('taxonomy_node')
# remove duplicates
taxonomy_nodes = []
for annotation in unvoted_annotations:
    if annotation.taxonomy_node not in taxonomy_nodes:
        taxonomy_nodes.append(annotation.taxonomy_node)
There will be only one SQL query, because select_related fetches the related taxonomy_node in the same query. There might also be a better way to remove duplicates, e.g. by using .distinct().

What I have done so far:
taxonomy_node_pk = [a[0] for a in Annotation.objects.exclude(votes__created_by=user)
                    .select_related('taxonomy_node').values_list('taxonomy_node').distinct()]
nodes = TaxonomyNode.objects.filter(pk__in=taxonomy_node_pk)
I am doing two queries, but the second one is not very costly.
It is considerably faster than my original version.
Still, what I do is not really beautiful. Is there no way to get a query set of TaxonomyNode from the Annotation set directly, and then apply distinct() to it?
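On the SQL side, the whole selection fits in one statement with NOT EXISTS. The following sketch runs against an in-memory SQLite database; the table names annotation and vote, their columns, and the integer user ids are simplified assumptions for illustration and are not the tables Django would actually generate for these models.

```python
import sqlite3


def nodes_to_validate(annotations, votes, user_id):
    """Return the sorted node ids that still have an annotation user_id has not voted on.

    annotations: iterable of (annotation_id, taxonomy_node_id)
    votes: iterable of (annotation_id, created_by_id)
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE annotation (id INTEGER PRIMARY KEY, taxonomy_node_id INTEGER)"
    )
    conn.execute(
        "CREATE TABLE vote ("
        "id INTEGER PRIMARY KEY, annotation_id INTEGER, created_by_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO annotation (id, taxonomy_node_id) VALUES (?, ?)", annotations
    )
    conn.executemany(
        "INSERT INTO vote (annotation_id, created_by_id) VALUES (?, ?)", votes
    )
    cursor = conn.execute(
        """
        SELECT DISTINCT a.taxonomy_node_id
        FROM annotation AS a
        WHERE a.taxonomy_node_id IS NOT NULL
          AND NOT EXISTS (
              -- a vote by this user on this annotation
              SELECT 1 FROM vote AS v
              WHERE v.annotation_id = a.id AND v.created_by_id = ?
          )
        ORDER BY a.taxonomy_node_id
        """,
        (user_id,),
    )
    result = [row[0] for row in cursor]
    conn.close()
    return result


# Node 1 has annotations 1 and 2; node 2 has annotation 3.
sample_annotations = [(1, 1), (2, 1), (3, 2)]
# User 7 voted on annotations 1 and 3; user 8 voted on annotation 2.
sample_votes = [(1, 7), (3, 7), (2, 8)]
print(nodes_to_validate(sample_annotations, sample_votes, 7))
print(nodes_to_validate(sample_annotations, sample_votes, 8))
```

User 7 has already voted on the only annotation of node 2, so only node 1 remains; user 8 still has open annotations on both nodes.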

Django - Checking for two models if their primary keys match

I have 2 models (sett, data_parsed), and data_parsed has a foreign key to sett.
class sett(models.Model):
    setid = models.IntegerField(primary_key=True)
    block = models.ForeignKey(mapt, related_name='sett_block')
    username = models.ForeignKey(mapt, related_name='sett_username')
    ts = models.IntegerField()
    def __unicode__(self):
        return str(self.setid)
class data_parsed(models.Model):
    setid = models.ForeignKey(sett, related_name='data_parsed_setid', primary_key=True)
    block = models.CharField(max_length=2000)
    username = models.CharField(max_length=2000)
    time = models.IntegerField()
    def __unicode__(self):
        return str(self.setid)
The data_parsed model should have the same number of rows as sett, but there is a possibility that they are not in sync.
To detect this, I basically do these two steps:
Check whether sett.objects.all().count() == data_parsed.objects.all().count()
This works great as a fast check, and it takes literally seconds on 1 million rows.
If they are not the same, I look at all the sett primary keys and exclude the ones already found in data_parsed.
sett.objects.select_related().exclude(
    setid__in=data_parsed.objects.all().values_list('setid', flat=True)).iterator()
Basically, this selects all the objects in sett except those whose setid is already in data_parsed. This method "works", but it takes around 4 hours for 1 million rows.
Is there a faster way to do this?

You can find the sett rows that have no data_parsed row by using the reverse relation:
sett.objects.filter(data_parsed_setid__isnull=True)

If I understand correctly, you are trying to keep a list of processed objects in another model by setting a foreign key.
You have only one data_parsed object for every sett object, so a many-to-one relationship is not needed. You could use a one-to-one relationship and then check which objects have that field empty.
With a foreign key you could try to filter using the reverse query, but that is at the object level, so I doubt that works.
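Looking for parent rows that have no child row is an anti-join, which databases usually handle far better than a large NOT IN list. Here is a minimal sketch of the SQL shape, run against in-memory SQLite; the two tables are reduced to their key columns, and the column name setid_id is an assumption about how the foreign key column would be named.

```python
import sqlite3


def setts_missing_parsed_data(sett_ids, parsed_ids):
    """Return the sorted sett ids that have no matching data_parsed row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sett (setid INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE data_parsed (setid_id INTEGER PRIMARY KEY REFERENCES sett(setid))"
    )
    conn.executemany("INSERT INTO sett (setid) VALUES (?)", [(i,) for i in sett_ids])
    conn.executemany(
        "INSERT INTO data_parsed (setid_id) VALUES (?)", [(i,) for i in parsed_ids]
    )
    cursor = conn.execute(
        """
        SELECT s.setid
        FROM sett AS s
        LEFT JOIN data_parsed AS d ON d.setid_id = s.setid
        WHERE d.setid_id IS NULL  -- no matching child row was joined
        ORDER BY s.setid
        """
    )
    result = [row[0] for row in cursor]
    conn.close()
    return result


missing = setts_missing_parsed_data(range(1, 6), [1, 2, 4])
print(missing)
```

The rows with ids 3 and 5 have no counterpart in data_parsed, so they are the ones that are out of sync.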
