How can I rewrite the following raw SQL query (MySQL) as a Django ORM query?
mysql> select author_id,
count(*) c
from library_books
group by author_id
having c>2
limit 3;
+-----------+----+
| author_id | c  |
+-----------+----+
|         0 | 39 |
|      1552 | 17 |
|      1784 |  8 |
+-----------+----+
First, annotate an author queryset with the number of books.
from django.db.models import Count
authors = Author.objects.annotate(num_books=Count('librarybook'))
You haven't shown your Django models, so I've had to guess that 'librarybook' is the correct name for the reverse relationship.
Then filter on the num_books to find authors with more than two books.
authors = Author.objects.annotate(num_books=Count('librarybook')).filter(num_books__gt=2)
Finally, slice the queryset to limit it to 3 results.
authors = Author.objects.annotate(num_books=Count('librarybook')).filter(num_books__gt=2)[:3]
You can then loop through the resulting authors and access the number of books.
for author in authors:
    print(author.name, author.num_books)
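To see how the three steps line up with the original query, here is a minimal sketch using Python's built-in sqlite3 module instead of Django. The table names, columns and rows are made up for illustration, and the SQL is only roughly the shape of what the ORM would generate, not its exact output. The ORDER BY is an addition of mine so the result is deterministic.

```python
import sqlite3

# Hypothetical schema and data, invented only for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE library_author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE library_books (id INTEGER PRIMARY KEY, author_id INTEGER);
    INSERT INTO library_author (id, name) VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO library_books (author_id) VALUES (1), (1), (1), (2), (3), (3), (3), (3);
""")

rows = conn.execute("""
    SELECT a.name, COUNT(b.id) AS num_books   -- annotate(num_books=Count(...))
    FROM library_author a
    LEFT OUTER JOIN library_books b ON b.author_id = a.id
    GROUP BY a.id, a.name
    HAVING COUNT(b.id) > 2                    -- filter(num_books__gt=2)
    ORDER BY a.id                             -- added here for a stable order
    LIMIT 3                                   -- [:3]
""").fetchall()

for name, num_books in rows:
    print(name, num_books)
```

Only authors with more than two books survive the HAVING step, so Bob (one book) is dropped.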
I have two tables with relatively different data.
The photos table holds all the relevant metadata for photos, such as user_id, photo_id, datetime, name, etc.
I have another table, ratings, that holds liked/disliked data for each photo. Its columns are rater_id (the person rating the picture), photo_id, and rating (like/dislike).
The user would be presented a picture (at random) and then pick whether they liked it or not. Every time the image is loaded/presented it would have to be something that they have not yet rated.
What I'm trying to do is return a photo_id where the user has not yet rated it.
I've thought of using join or union, but I'm having difficulty understanding how to best use those (or any other solution) for this application. Where my confusion lies is how I can compare the ratings table against the photos table, to only return the photos that have not been rated by rater_id.
Sample data
photos table
id | photo_id
-------------------------
1 | photo_123
2 | photo_456
3 | photo_432
4 | photo_642
-------------------------
ratings table
id | photo_id | rater_id | rating
---------------------------------
1 | photo_123 | user2 | 1
2 | photo_456 | user2 | 1
3 | photo_123 | user1 | 1
4 | photo_642 | user2 | 1
--------------------------------
Sample Result: return photo_432 for user2 because it has not yet had a rating in ratings table
The canonical way would be not exists:
select p.*
from photos p
where not exists (select 1
                  from ratings r
                  where r.photo_id = p.photo_id and
                        r.rater_id = @rater
                 )
order by rand()
limit 1;
There are more efficient ways to get a random row back if the table is big.
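As a quick check of the not exists approach against the sample data in the question, here is a small sketch using Python's sqlite3 module. A few assumptions: the two tables are matched on the photo_id text column (that is what the sample rows share), SQLite spells the function random() rather than rand(), and random_unrated_photo is just a helper name I made up.

```python
import sqlite3

# Sample data copied from the question; column types are guesses.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE photos (id INTEGER PRIMARY KEY, photo_id TEXT);
    CREATE TABLE ratings (id INTEGER PRIMARY KEY, photo_id TEXT,
                          rater_id TEXT, rating INTEGER);
    INSERT INTO photos (photo_id) VALUES
        ('photo_123'), ('photo_456'), ('photo_432'), ('photo_642');
    INSERT INTO ratings (photo_id, rater_id, rating) VALUES
        ('photo_123', 'user2', 1), ('photo_456', 'user2', 1),
        ('photo_123', 'user1', 1), ('photo_642', 'user2', 1);
""")

def random_unrated_photo(conn, rater_id):
    """Return one random photo_id the rater has not rated, or None."""
    row = conn.execute("""
        SELECT p.photo_id
        FROM photos p
        WHERE NOT EXISTS (SELECT 1
                          FROM ratings r
                          WHERE r.photo_id = p.photo_id
                            AND r.rater_id = ?)
        ORDER BY random()
        LIMIT 1
    """, (rater_id,)).fetchone()
    return row[0] if row else None

print(random_unrated_photo(conn, 'user2'))
```

For user2 only photo_432 is left unrated, so that is the only possible result; for user1 any of the three photos they have not rated may come back.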
I'm having trouble understanding how to make a query that will show me 'the three most popular articles' in terms of views ('Status: 200 OK').
There are 2 tables I'm currently dealing with.
A Log table
An Articles table
The columns in these tables:
Table "public.log"
Column | Type | Modifiers
--------+--------------------------+--------------------------------------------------
path | text |
ip | inet |
method | text |
status | text |
time | timestamp with time zone | default now()
id | integer | not null default nextval('log_id_seq'::regclass)
Indexes:
and
Table "public.articles"
Column | Type | Modifiers
--------+--------------------------+-------------------------------------------------------
author | integer | not null
title | text | not null
slug | text | not null
lead | text |
body | text |
time | timestamp with time zone | default now()
id | integer | not null default nextval('articles_id_seq'::regclass)
Indexes:
.
So far, I've written this query based on my level and current understanding of SQL...
SELECT articles.title, log.status
FROM articles join log
WHERE articles.title = log.path
HAVING status = “200 OK”
GROUP BY title, status
Obviously, this is incorrect. I want to be able to pull the three most popular articles from the database and I know that 'matching' the 200 OK's with the "article title" will show or count in for me one "view" or hit. My thought process is like, I need to determine how many times that article.title=log.path (1 unique) shows up in the log database (with a status of 200 OK) by creating a query. My assignment is actually to write a program that will print the results with "[my code getting] the database to do the heavy lifting by using joins, aggregations, and the where clause.. doing minimal "post-processing" in the Python code itself."
Any explanation, idea, or tip is appreciated.
Perhaps the following is what you have in mind:
SELECT
    a.title,
    COUNT(*) AS cnt
FROM articles a
INNER JOIN log l
    ON a.title = l.path
WHERE
    l.status = '200 OK'
GROUP BY
    a.title
ORDER BY
    COUNT(*) DESC
LIMIT 3;
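This returns the three article titles with the highest number of status 200 hits. The table definitions in the question look like PostgreSQL; the query uses only syntax that both PostgreSQL and MySQL accept, so it should run on either.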
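Since the assignment asks for a Python program that leaves the heavy lifting to the database, here is a minimal sketch of that shape. It uses an in-memory SQLite database so it runs anywhere; the rows are invented, and it keeps the question's assumption that log.path matches articles.title exactly, which may not hold for the real data.

```python
import sqlite3

# Invented sample rows; only the columns the query needs are created.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE log (id INTEGER PRIMARY KEY, path TEXT, status TEXT);
    INSERT INTO articles (title) VALUES ('alpha'), ('beta'), ('gamma'), ('delta');
    INSERT INTO log (path, status) VALUES
        ('alpha', '200 OK'), ('alpha', '200 OK'), ('alpha', '200 OK'),
        ('beta', '200 OK'), ('beta', '200 OK'), ('beta', '404 NOT FOUND'),
        ('gamma', '200 OK'),
        ('delta', '404 NOT FOUND');
""")

# The join, filter, aggregation, ordering and limit all happen in SQL.
top_three = conn.execute("""
    SELECT a.title, COUNT(*) AS cnt
    FROM articles a
    INNER JOIN log l ON a.title = l.path
    WHERE l.status = '200 OK'
    GROUP BY a.title
    ORDER BY cnt DESC
    LIMIT 3
""").fetchall()

# The only post-processing left in Python is printing.
for title, cnt in top_three:
    print(title, cnt)
```

Note that the status filter sits in WHERE, before grouping, so failed requests never reach the count.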
Hi I have a table with the following structure.
Table Name: DOCUMENTS
Sample Table Structure:
ID | UIN | COMPANY_ID | DOCUMENT_NAME | MODIFIED_ON |
---|----------|------------|---------------|---------------------|
1 | UIN_TX_1 | 1 | txn_summary | 2016-09-02 16:02:42 |
2 | UIN_TX_2 | 1 | txn_summary | 2016-09-02 16:16:56 |
3 | UIN_AD_3 | 2 | some other doc| 2016-09-02 17:15:43 |
I want to fetch the latest modified record UIN for the company whose id is 1 and document_name is "txn_summary".
This is the postgresql query that works:
select distinct on (company_id)
uin
from documents
where company_id = 1
and document_name = 'txn_summary'
order by company_id, "modified_on" DESC;
This query fetches me UIN_TX_2 which is correct.
I am using web2py DAL to get this value. After some research I have been successful to do this:
fmax = db.documents.modified_on.max()
query = (db.documents.company_id==1) & (db.documents.document_name=='txn_summary')
rows = db(query).select(fmax)
Now "rows" contains only the value of the modified_on date which has maximum value. I want to fetch the record which has the maximum date inside "rows". Please suggest a way. Help is much appreciated.
My requirement also extends to fetching the latest such record for each combination of company_id and document_name.
Your approach does not return the complete row; it returns only the latest modified_on value.
To fetch the last modified record for the company whose id is 1 and whose document_name is "txn_summary", the query is:
query = (db.documents.company_id==1) & (db.documents.document_name=='txn_summary')
row = db(query).select(db.documents.ALL, orderby=~db.documents.modified_on, limitby=(0, 1)).first()
orderby=~db.documents.modified_on sorts the records by modified_on in descending order (so the last modified record comes first), limitby=(0, 1) restricts the result to one row, and first() returns that row, or None if nothing matches. The complete query therefore returns the last modified record with company_id 1 and document_name "txn_summary".
There may be other or better ways to achieve this. Hope this helps!
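For the extended requirement in the question (the latest record for each company_id and document_name), one possible approach is a correlated subquery on MAX(modified_on). The sketch below is plain SQL run through Python's sqlite3 module on the question's sample rows, not web2py DAL code; in web2py the same SQL could probably be passed to db.executesql, but I have not tried that here.

```python
import sqlite3

# Sample rows from the question; timestamps stored as ISO-style text,
# which compares correctly as strings.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE documents (
        id INTEGER PRIMARY KEY, uin TEXT, company_id INTEGER,
        document_name TEXT, modified_on TEXT);
    INSERT INTO documents (uin, company_id, document_name, modified_on) VALUES
        ('UIN_TX_1', 1, 'txn_summary',    '2016-09-02 16:02:42'),
        ('UIN_TX_2', 1, 'txn_summary',    '2016-09-02 16:16:56'),
        ('UIN_AD_3', 2, 'some other doc', '2016-09-02 17:15:43');
""")

# Keep a row only if its modified_on is the maximum within its
# (company_id, document_name) group.
latest = conn.execute("""
    SELECT d.company_id, d.document_name, d.uin
    FROM documents d
    WHERE d.modified_on = (SELECT MAX(d2.modified_on)
                           FROM documents d2
                           WHERE d2.company_id = d.company_id
                             AND d2.document_name = d.document_name)
    ORDER BY d.company_id, d.document_name
""").fetchall()

for company_id, document_name, uin in latest:
    print(company_id, document_name, uin)
```

If two rows in the same group share the same latest modified_on, both come back, so a tie-breaker (for example the highest id) would be needed if exactly one row per group is required.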
+-------------------+-------------------+----------+
| mac_src | mac_dst | bytes_in |
+-------------------+-------------------+----------+
| aa:aa:aa:aa:aa:aa | bb:bb:bb:bb:bb:bb | 10 |
| bb:bb:bb:bb:bb:bb | aa:aa:aa:aa:aa:aa | 20 |
| cc:cc:cc:cc:cc:cc | aa:aa:aa:aa:aa:aa | 30 |
+-------------------+-------------------+----------+
I have a table with fields mac_src, mac_dst and bytes_in.
I need to get all rows where each mac_src value that exists in the table is present in EITHER mac_src or mac_dst. I then need the sum of the fields bytes_in of all these rows.
I want to get the sum of field bytes_in of all rows where the field mac_src and mac_dst are equal, and then sort this sum from highest to lowest.
The Queryset returned should have just one entry per mac_src.
Thanks.
I don't think there's a simple way to do it with just the Django ORM. Just write an SQL query (warning: untested and probably slow SQL below):
from django.db import connection
with connection.cursor() as cursor:
    cursor.execute('''
        SELECT mac, SUM(total) FROM (
            (SELECT mac_src AS mac, SUM(bytes_in) AS total FROM your_table GROUP BY mac_src)
            UNION ALL (SELECT mac_dst AS mac, SUM(bytes_in) AS total FROM your_table WHERE mac_src != mac_dst GROUP BY mac_dst)
        ) AS combined_rows GROUP BY mac
    ''')
    counts = dict(cursor.fetchall())  # {mac1: total_bytes1, ...}
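Here is the same idea tried against the three sample rows from the question, as a sketch with Python's sqlite3 module rather than a Django connection. Two changes of mine: SQLite does not accept parentheses around the individual SELECTs of a UNION, so they are dropped, and an ORDER BY is added for the highest-to-lowest sort the question asks for (with mac as a tie-breaker).

```python
import sqlite3

# The three sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE your_table (mac_src TEXT, mac_dst TEXT, bytes_in INTEGER);
    INSERT INTO your_table VALUES
        ('aa:aa:aa:aa:aa:aa', 'bb:bb:bb:bb:bb:bb', 10),
        ('bb:bb:bb:bb:bb:bb', 'aa:aa:aa:aa:aa:aa', 20),
        ('cc:cc:cc:cc:cc:cc', 'aa:aa:aa:aa:aa:aa', 30);
""")

# Sum bytes_in once per row where the MAC is the source, once where it is
# the destination (skipping rows already counted because src == dst),
# then add the two partial sums per MAC.
totals = conn.execute("""
    SELECT mac, SUM(total) AS total_bytes FROM (
        SELECT mac_src AS mac, SUM(bytes_in) AS total
        FROM your_table GROUP BY mac_src
        UNION ALL
        SELECT mac_dst AS mac, SUM(bytes_in) AS total
        FROM your_table WHERE mac_src != mac_dst GROUP BY mac_dst
    ) AS combined_rows
    GROUP BY mac
    ORDER BY total_bytes DESC, mac
""").fetchall()

for mac, total_bytes in totals:
    print(mac, total_bytes)
```

With the sample data, aa:aa:aa:aa:aa:aa appears in all three rows and gets 60, while the other two addresses get 30 each.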
I've written the following Django ORM query (on SQLite) to retrieve a particular set of records:
from django.db.models.aggregates import Count
JobStatus.objects.filter(
status='PRF'
).values_list(
'job', flat=True
).order_by(
'job'
).aggregate(
Count(status)__gt=3
).distinct()
But it gives me an error and the sql equivalent for this syntax works fine for me.
This is my sql equivalent.
SELECT *
FROM tracker_jobstatus
WHERE status = 'PRF'
GROUP BY job_id
HAVING COUNT(status) > 3;
and I'm getting the result as follows
+----+--------+--------+---------+---------------------+---------+
| id | job_id | status | comment | date_and_time | user_id |
+----+--------+--------+---------+---------------------+---------+
| 13 | 3 | PRF | | 2012-11-12 13:16:00 | 1 |
| 31 | 4 | PRF | | 2012-11-12 13:48:00 | 1 |
+----+--------+--------+---------+---------------------+---------+
I'm unable to find the django sqlite equivalent for this.
I will be very grateful if anyone can help.
Finally I've managed to figure it out. The ORM syntax is something like this.
from django.db.models.aggregates import Count
JobStatus.objects.filter(
    status='PRF'
).values_list(
    'job', flat=True
).order_by(
    'job'
).annotate(
    count_status=Count('status')
).filter(
    count_status__gt=3
).distinct()
The general rule: create a new column with annotate(), then filter on that column. Because the filter refers to an aggregate, Django turns it into a HAVING clause.
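To make the WHERE versus HAVING split concrete, here is a small sketch with Python's sqlite3 module. The rows are invented, the threshold of 3 is taken from the SQL in the question, and the comments only show which ORM step each clause roughly corresponds to.

```python
import sqlite3

# Invented rows: jobs 3 and 4 have four 'PRF' statuses each, job 5 has one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tracker_jobstatus (id INTEGER PRIMARY KEY,
                                    job_id INTEGER, status TEXT);
    INSERT INTO tracker_jobstatus (job_id, status) VALUES
        (3, 'PRF'), (3, 'PRF'), (3, 'PRF'), (3, 'PRF'),
        (4, 'PRF'), (4, 'PRF'), (4, 'PRF'), (4, 'PRF'), (4, 'NEW'),
        (5, 'PRF'), (5, 'NEW'), (5, 'NEW'), (5, 'NEW');
""")

job_ids = [row[0] for row in conn.execute("""
    SELECT job_id
    FROM tracker_jobstatus
    WHERE status = 'PRF'        -- filter(status='PRF'): before grouping
    GROUP BY job_id             -- values_list('job') + annotate(Count(...))
    HAVING COUNT(status) > 3    -- filter(count_status__gt=3): after grouping
    ORDER BY job_id
""")]

print(job_ids)
```

Job 5 has four rows in total but only one with status 'PRF', so it is excluded: the WHERE filter removes the other three rows before they are counted.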