GQL - Select all parents where specific child is not in children


I'd like to have a parent class (Group) that any number of Users may join. I want to display all Groups that the User is not already in. How do I model this data, and how do I query it? Sorry for not providing any code, but I simply have no idea.
Edit:
In SQL, this would be done with a User table, a Group table and a GroupUser cross-reference table. The query would go:
select *
from Group
where Group.ID not in (
    select GroupID
    from GroupUser
    where UserID = #userid
)

Are there going to be thousands of groups, with each user a member of tens of them?
Then you are going to have a user-experience issue: how do you make a user choose from thousands of groups? A paginated table with 50+ pages?
As for your actual problem: once you have solved the one above, you can simply mark the groups the user is already a member of.
Keep all of the user's group IDs in memory (it's only ten or so IDs, right?) and filter the groups as you display them.

As Wooble says, there's no way to construct a query like this for the App Engine datastore. If your number of groups greatly outnumbers the number of groups a user is actually in, your best option is just to select all groups, then filter out the ones the user is already in - which is exactly what an SQL database would do given that query.
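Here is a minimal sketch of that in-memory filtering. It is plain Python so it runs without App Engine. The function name `groups_to_offer` and the integer IDs are hypothetical stand-ins for the group keys fetched from the datastore and for the user's own (small) list of group keys.

```python
def groups_to_offer(all_group_ids, user_group_ids):
    """Return the group IDs the user has not joined, in their original order.

    all_group_ids: every group ID fetched for display (hypothetical input).
    user_group_ids: the few group IDs the user already belongs to.
    """
    joined = set(user_group_ids)  # tens of IDs at most, cheap to keep in memory
    return [gid for gid in all_group_ids if gid not in joined]


print(groups_to_offer([10, 11, 12, 13], [11, 13]))  # [10, 12]
```

The set makes each membership check constant-time. The overall cost is therefore dominated by fetching the groups, which a database would have to scan anyway.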

Maybe I phrased my question unclearly; I am obviously new to GAE and my terminology may be wrong. Anyway, here is my solution:
from google.appengine.ext import db

class User(db.Model):
    username = db.StringProperty()

class Group(db.Model):
    users = db.ListProperty(db.Key)
To find a group and join it (somewhat simplified):
groups = db.GqlQuery("SELECT * FROM Group")
group = None
for g in groups:
    # Pick the first group this user is not already a member of
    if user.key() not in g.users:
        group = g
        break
if group is not None:
    group.users.append(user.key())
    group.put()  # persist the updated membership list

Related

Is it possible to use queryset in the FROM clause

I have a model for collecting users' points:
from django.contrib.auth.models import User
from django.db import models

class Rating(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE, related_name='rating')
    points = models.IntegerField()
Each user can have multiple records in this model. I need to rank each user by the sum of their collected points. For the listing it's easy:
from django.db.models import Sum

Rating.objects.values('user__username').annotate(
    total_points=Sum('points')
).order_by('-total_points')
But how do I get the rank of a single user by their user_id? I added an annotation with row numbers:
from django.db.models import F, Sum, Window
from django.db.models.functions import RowNumber

Rating.objects.values('user__username').annotate(
    total_points=Sum('points')
).annotate(
    rank=Window(
        expression=RowNumber(),
        order_by=[F('total_points').desc()]
    )
)
It does add correct ranking numbers, but when I try to get a single user by user_id, the returned row has rank=1. That is because the filter condition goes into the WHERE clause, which is applied before the window function, so only one row gets numbered and it gets 1. I mean this:
Rating.objects.values('user__username').annotate(
    total_points=Sum('points')
).annotate(
    rank=Window(
        expression=RowNumber(),
        order_by=[F('total_points').desc()]
    )
).filter(user_id=1)
I got the SQL of this queryset (qs.query), which looks like
SELECT ... FROM rating_rating WHERE ...
and used it as a derived table named rank_table inside another SQL query, with the condition in the outer WHERE clause:
SELECT * FROM (SELECT ... FROM rating_rating WHERE ...) AS rank_table WHERE user_id = 1;
I ran this in the MySQL console and it works exactly as I need. The question is: how do I implement the same thing with the Django ORM?

I have one workaround that gets what I need. I can add another field that marks each row as belonging to the requested user or not, sort the result by this field, and take the first row:
from django.db.models import Case, IntegerField, When

qs.annotate(
    required_user=Case(
        When(user_id=1, then=1),
        default=0,
        output_field=IntegerField(),
    )
).order_by('-required_user').first()
This works. But a SELECT within another SELECT seems more elegant, and I would like to know whether it is possible with Django.

Someone recently asked something similar about filtering on window functions. What you want is basically a subquery (a SELECT inside a SELECT), but filtering on an annotation that uses a window function is not supported:
https://code.djangoproject.com/ticket/28333 because the filter conditions end up inside the inner query instead of wrapping it :'(. One option is to write the raw SQL yourself and pass the parameters separately (for example, starting from qs.query.sql_with_params()), but that is not really elegant.
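The following self-contained sketch shows why filtering outside the ranked query matters. It uses Python's built-in sqlite3 module as a stand-in for MySQL; window functions need SQLite 3.25 or later. The table name rating_rating comes from the question. The rows and the alias user_rank are made up for illustration. The same SQL string and parameters could be run through Django's `django.db.connection.cursor()`, which is the raw-SQL route mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rating_rating (user_id INTEGER, points INTEGER)")
conn.executemany(
    "INSERT INTO rating_rating (user_id, points) VALUES (?, ?)",
    [(1, 5), (1, 5), (2, 30), (3, 20), (3, 1)],  # made-up data: totals 10, 30, 21
)

# Inner query: total points per user plus a row number ordered by that total.
RANKED = """
    SELECT user_id,
           SUM(points) AS total_points,
           ROW_NUMBER() OVER (ORDER BY SUM(points) DESC) AS user_rank
    FROM rating_rating
    GROUP BY user_id
"""

# Filter inside: WHERE runs before the window function, so the single
# remaining row is always numbered 1.
inner = conn.execute(
    "SELECT user_id, SUM(points), ROW_NUMBER() OVER (ORDER BY SUM(points) DESC) "
    "FROM rating_rating WHERE user_id = ? GROUP BY user_id",
    (1,),
).fetchone()

# Filter outside: rank every user in a derived table first, then pick one row.
outer = conn.execute(
    f"SELECT user_id, total_points, user_rank FROM ({RANKED}) AS rank_table "
    "WHERE user_id = ?",
    (1,),
).fetchone()

print(inner)  # (1, 10, 1)
print(outer)  # (1, 10, 3)
```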

Django Query where one field is duplicate and another is different

I want to know if I can create a query where one field is duplicated and another one is different.
Basically, I want to get all UserNames where the first name is the same and the user_id is different.
I did this:
from django.db.models import Count

UserNames.objects.values("first_name", "user_id").annotate(ct=Count("first_name")).filter(ct__gt=0)
This retrieves a list with all users.
After this, I do some post-processing and create another query where I filter only the users with first_name__in=['aaa'] and user_id__in=[1, 2], to get the users with the same first_name but a different user_id.
Can I do this in just one query, or in a better way?

You can work with a subquery here, though I don't think it will matter much in terms of performance:
from django.db.models import Exists, OuterRef, Q

UserNames.objects.filter(
    Exists(UserNames.objects.filter(
        ~Q(user_id=OuterRef('user_id')),
        first_name=OuterRef('first_name')
    ))
)
or, prior to Django 3.0:
from django.db.models import Exists, OuterRef, Q

UserNames.objects.annotate(
    has_other=Exists(UserNames.objects.filter(
        ~Q(user_id=OuterRef('user_id')),
        first_name=OuterRef('first_name')
    ))
).filter(has_other=True)
We thus retain UserNames objects for which there exists a UserNames object with the same first_name, and with a different user_id.
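The correlated EXISTS subquery that this queryset builds corresponds roughly to the SQL below. The sketch runs it against SQLite via Python's sqlite3 module, using a hypothetical usernames table with made-up rows, so the effect can be checked without a Django project. Rows that share both first_name and user_id (ccc below) are not treated as duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE usernames (id INTEGER PRIMARY KEY, first_name TEXT, user_id INTEGER)"
)
conn.executemany(
    "INSERT INTO usernames (first_name, user_id) VALUES (?, ?)",
    [("aaa", 1), ("aaa", 2), ("bbb", 3), ("ccc", 4), ("ccc", 4)],
)

# Keep a row only if some other row has the same first_name but a different user_id.
rows = conn.execute("""
    SELECT u.first_name, u.user_id
    FROM usernames AS u
    WHERE EXISTS (
        SELECT 1 FROM usernames AS o
        WHERE o.first_name = u.first_name
          AND o.user_id <> u.user_id
    )
    ORDER BY u.user_id
""").fetchall()

print(rows)  # [('aaa', 1), ('aaa', 2)]
```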

Django: remove duplicates (group by) from queryset by related model field

I have a QuerySet with a couple of records, and I want to remove duplicates based on a related model field. For example:
class User(models.Model):
    group = models.ForeignKey('Group', on_delete=models.CASCADE)
    ...

class Address(models.Model):
    ...
    user = models.ForeignKey('User', on_delete=models.CASCADE)
addresses = Address.objects.filter(user__group__id=1).order_by('-id')
This returns a QuerySet of Address records, and I want to group them by user ID.
I can't use .annotate() because I need all fields from Address, as well as the relationship between Address and User.
I can't use .distinct() because it doesn't help here: all addresses are distinct, and I want distinct addresses per user.
I could do:
addresses = Address.objects.filter(user__group__id=1).order_by('-id')
unique_users_ids = set()
unique_addresses = []
for address in addresses:
    # address.user_id avoids loading the related User for every address
    if address.user_id not in unique_users_ids:
        unique_addresses.append(address)
        unique_users_ids.add(address.user_id)
print(unique_addresses)  # TA-DA!
But it seems like too much for something as simple as a GROUP BY (damn you, Django).
Is there an easy way to achieve this?

By using .distinct() with a field name
Django also has a .distinct(..) function that takes as input the column names that should be unique. Alas, most database systems do not support this (only PostgreSQL, to the best of my knowledge). In PostgreSQL we can thus perform the following. PostgreSQL requires the ORDER BY to start with the DISTINCT ON fields, so we order by user_id first and then by -id, which keeps the newest address per user:
# Limited number of database systems support this
addresses = (Address.objects
             .filter(user__group__id=1)
             .order_by('user_id', '-id')
             .distinct('user_id'))
By using two queries
Another way to handle this is to first run a query over the users that obtains, for each user, the largest address id:
from django.db.models import Max

address_ids = (User.objects
               .filter(group__id=1)  # restrict to the group from the question
               .annotate(address_id=Max('address__id'))
               .filter(address_id__isnull=False)
               .values_list('address_id', flat=True))
So for every user we have calculated the largest corresponding address id, and we eliminate users that have no address. We then obtain the list of ids.
In a second step, we then fetch the addresses:
addresses = Address.objects.filter(pk__in=address_ids)
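The following sketch expresses the same two-step idea in plain SQL and runs it through Python's sqlite3 module. The tables app_user and app_address and their rows are hypothetical. The first step picks the largest address id per user in group 1; the second step fetches those rows. Treating the largest id as a user's current address mirrors order_by('-id') in the question, which assumes newer addresses get larger ids.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app_user (id INTEGER PRIMARY KEY, group_id INTEGER);
    CREATE TABLE app_address (id INTEGER PRIMARY KEY, user_id INTEGER, street TEXT);
    INSERT INTO app_user (id, group_id) VALUES (1, 1), (2, 1), (3, 2);
    INSERT INTO app_address (id, user_id, street) VALUES
        (10, 1, 'old street'), (11, 1, 'new street'),
        (12, 2, 'only street'), (13, 3, 'other group');
""")

# Step 1: the largest (newest) address id for each user in group 1.
latest_ids = [row[0] for row in conn.execute("""
    SELECT MAX(a.id)
    FROM app_address AS a
    JOIN app_user AS u ON u.id = a.user_id
    WHERE u.group_id = 1
    GROUP BY a.user_id
""")]

# Step 2: fetch the full address rows for those ids.
placeholders = ",".join("?" for _ in latest_ids)
addresses = conn.execute(
    f"SELECT id, user_id, street FROM app_address "
    f"WHERE id IN ({placeholders}) ORDER BY id DESC",
    latest_ids,
).fetchall()

print(addresses)  # [(12, 2, 'only street'), (11, 1, 'new street')]
```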

How to do a Django subquery

I have two examples of code that accomplish the same thing. One uses Python, the other SQL.
Exhibit A (Python):
surveys = Survey.objects.all()
consumer = Consumer.objects.get(pk=24)
consumer_ballot_list = []
consumer_survey_list = []
for ballot in consumer.ballot_set.all():
    consumer_ballot_list.append(ballot.question_id)
for survey in surveys:
    if survey.id not in consumer_ballot_list:
        consumer_survey_list.append(survey.id)
Exhibit B (SQL):
SELECT * FROM clients_survey WHERE id NOT IN (SELECT question_id FROM consumers_ballot WHERE consumer_id=24) ORDER BY id;
I want to know how I can make Exhibit A much cleaner and more efficient using Django's ORM and subqueries.
In this example:
I have ballots that contain a question_id referring to the survey the consumer has answered.
I want to find all the surveys the consumer hasn't answered. So I need to check each question_id (survey.id) in the consumer's set of ballots against the Survey model's ids, and make sure that only the surveys for which the consumer does NOT have a ballot are returned.

You more or less have the right idea. To replicate your SQL code using Django's ORM, you just have to break the SQL into its discrete parts:
1. Create a table of the question_ids consumer 24 has answered.
2. Filter the surveys for all ids not in the aforementioned table.
consumer = Consumer.objects.get(pk=24)
# step 1
answered_survey_ids = consumer.ballot_set.values_list('question_id', flat=True)
# step 2
unanswered_surveys_ids = Survey.objects.exclude(id__in=answered_survey_ids).values_list('id', flat=True)
This is basically what you did in your current python based approach except I just took advantage of a few of Django's nice ORM features.
.values_list() - this allows you to extract a specific field from all the objects in the given queryset.
.exclude() - this is the opposite of .filter() and returns all items in the queryset that don't match the condition.
__in - this is useful if we have a list of values and we want to filter/exclude all items that match those values.
Hope this helps!
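Because answered_survey_ids is a lazy queryset, passing it to id__in should let Django embed it as a subquery instead of running a separate query. The exclude() therefore ends up close to Exhibit B. The following quick check runs that SQL on made-up data through Python's sqlite3 module, using the table names from Exhibit B.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clients_survey (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE consumers_ballot (id INTEGER PRIMARY KEY, consumer_id INTEGER, question_id INTEGER);
    INSERT INTO clients_survey (id, title) VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
    INSERT INTO consumers_ballot (consumer_id, question_id) VALUES (24, 1), (24, 3), (25, 2);
""")

# Surveys consumer 24 has no ballot for (Exhibit B, parameterised).
unanswered = [row[0] for row in conn.execute(
    "SELECT id FROM clients_survey WHERE id NOT IN "
    "(SELECT question_id FROM consumers_ballot WHERE consumer_id = ?) ORDER BY id",
    (24,),
)]

print(unanswered)  # [2, 4]
```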

How to query multiple Django models describing denormalized tables

I'm trying to extract information from a number of denormalized tables, using Django models. The tables are pre-existing, part of a legacy MySQL database.
Schema description
Let's say that each table describes traits about a person, and each person has a name (this essentially identifies the person, but does not correspond to some unifying "Person" table). For example:
from django.db import models

class JobInfo(models.Model):
    name = models.CharField(primary_key=True, db_column='name')
    startdate = models.DateField(db_column='startdate')
    ...

class Hobbies(models.Model):
    name = models.CharField(primary_key=True, db_column='name')
    exercise = models.CharField(db_column='exercise')
    ...

class Clothing(models.Model):
    name = models.CharField(primary_key=True, db_column='name')
    shoes = models.CharField(db_column='shoes')
    ...

# Twenty more classes exist, all of the same format
Accessing via SQL
In raw SQL, when I want to access information across all tables, I do a series of ugly OUTER JOINs, refining it with a WHERE clause.
SELECT JobInfo.startdate, JobInfo.employer, JobInfo.salary,
       Hobbies.exercise, Hobbies.fun,
       Clothing.shoes, Clothing.shirt, Clothing.pants
       ...
FROM JobInfo
LEFT OUTER JOIN Hobbies ON Hobbies.name = JobInfo.name
LEFT OUTER JOIN Clothing ON Clothing.name = JobInfo.name
...
WHERE
    Clothing.shoes REGEXP "Nike" AND
    Hobbies.exercise REGEXP "out"
    ...;
Model-based approach
I'm trying to convert this to a Django-based approach, where I can easily get a QuerySet that pulls in information from all tables.
I've looked into using a OneToOneField (example), giving one table a field that ties it to each of the others. However, this would mean that one table has to be the "central" table, which all the others reference in reverse. This seems like a mess with twenty-odd fields, and doesn't really make schematic sense (is "job info" the core property? clothes?).
I feel like I'm going about this the wrong way. How should I be building a QuerySet on related tables, where each table has one primary key field common across all tables?

If your DB access allows this, I would probably do this by defining a Person model, then declaring the name DB column to be a foreign key to that model, with to_field set to the name field on the Person model. Then you can use the usual __ syntax in your queries.
Assuming Django doesn't complain about a ForeignKey field with primary_key=True, anyway.
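from django.db import models

class Person(models.Model):
    name = models.CharField(primary_key=True, max_length=...)

class JobInfo(models.Model):
    # on_delete is required since Django 2.0; DO_NOTHING is an assumed choice for a legacy table
    person = models.ForeignKey(Person, on_delete=models.DO_NOTHING, primary_key=True,
                               db_column='name', to_field='name')
    startdate = models.DateField(db_column='startdate')
    ...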
I don't think to_field is actually required as long as name is declared as the primary key on Person, but I think it's good for clarity. It would be needed if you don't declare name as the PK on Person.
I haven't tested this, though.
To use a view, you have two options. I think both would do best with an actual table containing all the known user names, maybe with a numeric PK as Django usually expects as well. Let's assume that table exists - call it person.
One option is to create a single large view to encompass all information about a user, similar to the big join you use above - something like:
create or replace view person_info as
select person.id, person.name,
jobinfo.startdate, jobinfo.employer, jobinfo.salary,
hobbies.exercise, hobbies.fun,
clothing.shoes, ...
from person
left outer join hobbies on hobbies.name = person.name
left outer join jobinfo on jobinfo.name = person.name
left outer join clothing on clothing.name = person.name
;
That might take a little debugging, but the idea should be clear.
Then declare your model with db_table = 'person_info' and managed = False in its Meta class.
A second option would be to declare a view for each subsidiary table that includes the person_id value matching the name, then just use Django FKs.
create or replace view jobinfo_by_person as
select person.id as person_id, jobinfo.*
from person inner join jobinfo on jobinfo.name = person.name;
create or replace view hobbies_by_person as
select person.id as person_id, hobbies.*
from person inner join hobbies on hobbies.name = person.name;
etc. Again, I'm not totally sure the .* syntax will work - if not, you'd have to list all the fields you're interested in. And check what the column names from the subsidiary tables are.
Then point your models at the by_person versions and use the standard FK setup.
This is a little inelegant and I make no claims for good performance, but it does let you avoid further denormalizing your database.
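The following small sketch sanity-checks both view options. It uses Python's sqlite3 module in place of MySQL, with trimmed-down hypothetical tables and rows, and it also confirms that the jobinfo.* syntax is accepted. SQLite has no CREATE OR REPLACE VIEW, so the sketch drops and recreates the views instead; MySQL accepts the statements above as written.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE jobinfo (name TEXT PRIMARY KEY, startdate TEXT, employer TEXT);
    CREATE TABLE hobbies (name TEXT PRIMARY KEY, exercise TEXT);
    INSERT INTO person (id, name) VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO jobinfo VALUES ('ann', '2020-01-01', 'Acme');
    INSERT INTO hobbies VALUES ('bob', 'outdoor running');

    -- Option 1: one wide view; missing traits come back as NULL.
    DROP VIEW IF EXISTS person_info;
    CREATE VIEW person_info AS
    SELECT person.id, person.name, jobinfo.startdate, jobinfo.employer, hobbies.exercise
    FROM person
    LEFT OUTER JOIN jobinfo ON jobinfo.name = person.name
    LEFT OUTER JOIN hobbies ON hobbies.name = person.name;

    -- Option 2: one view per table, adding person_id for a normal foreign key.
    DROP VIEW IF EXISTS jobinfo_by_person;
    CREATE VIEW jobinfo_by_person AS
    SELECT person.id AS person_id, jobinfo.*
    FROM person INNER JOIN jobinfo ON jobinfo.name = person.name;
""")

info = conn.execute("SELECT * FROM person_info ORDER BY id").fetchall()
by_person = conn.execute("SELECT * FROM jobinfo_by_person").fetchall()

print(info)
# [(1, 'ann', '2020-01-01', 'Acme', None), (2, 'bob', None, None, 'outdoor running')]
print(by_person)
# [(1, 'ann', '2020-01-01', 'Acme')]
```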

Develop Reference
