Django group by id then select max timestamp - python

It might be a redundant question, but I have tried previous answers from other related topics and still can't figure it out.
I have a table Board_status that looks like this (multiple statuses and timestamps for each board):
time | board_id | status
-------------------------------
2012-4-5 | 1 | good
2013-6-6 | 1 | not good
2013-6-7 | 1 | alright
2012-6-8 | 2 | good
2012-6-4 | 3 | good
2012-6-10 | 2 | good
Now I want to select all records from Board_status table, group all of them by board_id for distinct board_id, then select the latest status on each board. Basically end up with table like this (only latest status and timestamp for each board):
time | board_id | status
------------------------------
2013-6-7 | 1 | alright
2012-6-4 | 3 | good
2012-6-10 | 2 | good
I have tried:
b = Board_status.objects.values('board_id').annotate(max=Max('time')).values_list('board_id','max','status')
but it doesn't seem to work: it still gives me more than one record per board_id.
Which command should I use in Django to do this?

An update, this is the solution I use. Not the best, but it works for now:
b = []
a = Board_status.objects.values('board_id').distinct()
# One extra query per board: fetch its most recent status row.
for row in a:
    b.append(Board_status.objects.filter(board_id=row['board_id']).latest('time'))
So I got all board_id, store into list a. Then for each board_id, do another query to get the latest time. Any better answer is still welcomed.

How would that work? You have neither a filter nor a distinct to remove the duplicates. I am not sure this can easily be done in a single Django query. You should read more on:
https://docs.djangoproject.com/en/dev/ref/models/querysets/#django.db.models.query.QuerySet.distinct
https://docs.djangoproject.com/en/1.4/topics/db/aggregation/
If you can't do it in one raw SQL query, you can't do it with an ORM either, since the ORM is built on top of the database (MySQL in your case). Can you tell me how you would do this via raw SQL?
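One common way to express "latest row per board" in raw SQL is to join the table against a subquery that computes MAX(time) per board_id. The following is a minimal sketch using Python's built-in sqlite3 module rather than Django or MySQL; the table layout mirrors the question, but the dates are rewritten as zero-padded ISO strings (an assumption) so that comparing them as text orders them chronologically.

```python
import sqlite3

# Sample rows from the question, with dates zero-padded (assumption) so that
# MAX() on TEXT values picks the chronologically latest one.
ROWS = [
    ("2012-04-05", 1, "good"),
    ("2013-06-06", 1, "not good"),
    ("2013-06-07", 1, "alright"),
    ("2012-06-08", 2, "good"),
    ("2012-06-04", 3, "good"),
    ("2012-06-10", 2, "good"),
]

# Join every row against the per-board maximum timestamp.
LATEST_PER_BOARD = """
SELECT bs.time, bs.board_id, bs.status
FROM board_status AS bs
JOIN (
    SELECT board_id, MAX(time) AS max_time
    FROM board_status
    GROUP BY board_id
) AS latest
  ON latest.board_id = bs.board_id AND latest.max_time = bs.time
ORDER BY bs.board_id
"""


def latest_status_per_board(rows):
    """Load rows into an in-memory table and return the latest row per board."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE board_status (time TEXT, board_id INTEGER, status TEXT)"
    )
    conn.executemany("INSERT INTO board_status VALUES (?, ?, ?)", rows)
    result = conn.execute(LATEST_PER_BOARD).fetchall()
    conn.close()
    return result
```

If two rows of the same board share the maximum timestamp, this query returns both. On PostgreSQL, Django's distinct('board_id') together with order_by('board_id', '-time') is another option, but field-based distinct is not available on MySQL.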

Related

Get the intersection of two many-to-many relationship of specific values

N.B. I have tagged this with SQLAlchemy and Python because the whole point of the question was to develop a query to translate into SQLAlchemy. This is clear in the answer I have posted. It is also applicable to MySQL.
I have three interlinked tables I use to describe a book. (In the below table descriptions I have eliminated extraneous rows to the question at hand.)
MariaDB [icc]> describe edition;
+-----------+------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
+-----------+------------+------+-----+---------+----------------+
7 rows in set (0.001 sec)
MariaDB [icc]> describe line;
+------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| edition_id | int(11) | YES | MUL | NULL | |
| line | varchar(200) | YES | | NULL | |
+------------+--------------+------+-----+---------+----------------+
5 rows in set (0.001 sec)
MariaDB [icc]> describe line_attribute;
+------------+------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------+------------+------+-----+---------+-------+
| line_id | int(11) | NO | PRI | NULL | |
| num | int(11) | YES | | NULL | |
| precedence | int(11) | YES | MUL | NULL | |
| primary | tinyint(1) | NO | MUL | NULL | |
+------------+------------+------+-----+---------+-------+
5 rows in set (0.001 sec)
line_attribute.precedence is the hierarchical level of the given heading. So if War and Peace has Books > Chapters, all of the lines have an attribute that corresponds to the Book they're in (e.g., Book 1 has precedence=1 and num=1) and an attribute for the Chapter they're in (e.g., Chapter 2 has precedence=2 and num=2). This allows me to translate the hierarchical structure of books with volumes, books, parts, chapters, sections, or even acts and scenes. The primary column is a boolean, so that each and every line has one attribute that is primary. If it is a book heading, it is the Book attribute, if it is a chapter heading, it is the Chapter attribute. If it is a regular line in text, it is a line attribute, and the precedence is 0 since it is not a part of the hierarchical structure.
I need to be able to query for all lines with a particular edition_id and that also have the intersection of two line_attributes.
(This would allow me to get all lines from a particular edition that are in, say, Book 1 Chapter 2 of War and Peace).
I can get all lines that have Book 1 with
SELECT
line.*
FROM
line
INNER JOIN
line_attribute
ON
line_attribute.line_id=line.id
WHERE
line.edition_id=2 AND line_attribute.precedence=1 AND line_attribute.num=1;
and I can get all lines that have Chapter 2:
SELECT
line.*
FROM
line
INNER JOIN
line_attribute
ON
line_attribute.line_id=line.id
WHERE
line.edition_id=2 AND line_attribute.precedence=2 AND line_attribute.num=2;
Except the second query returns each chapter 2 from every book in War and Peace.
How do I get from these two queries to just the lines from book 1 chapter 2?
Warning from Raymond Nijland in the comments:
Note for future readers.. Because this question is tagged MySQL.. MySQL does not support INTERSECT keyword.. MariaDB is indeed a fork off the MySQL source code but supports extra features which MySQL does not support.. In MySQL you can simulate the INTERSECT keyword with a INNER JOIN or IN()
Trying to write up a question on SO helps me get my thoughts clear and eventually solve the problem before I have to ask the question. The queries above are much clearer than my initial queries and the question pretty much answers itself, but I never found a clear answer that talks about the intersect utility, so I'm posting this answer anyway.
The solution was the INTERSECT operator.
The solution is simply the intersection of those two queries:
SELECT
line.*
FROM
line
INNER JOIN
line_attribute
ON
line_attribute.line_id=line.id
WHERE
line.edition_id=2 AND line_attribute.precedence=1 AND line_attribute.num=1
INTERSECT /* it is literally this simple */
SELECT
line.*
FROM
line
INNER JOIN
line_attribute
ON
line_attribute.line_id=line.id
WHERE
line.edition_id=2 AND line_attribute.precedence=2 AND line_attribute.num=2;
This also means I could get all of the book and chapter headings for a particular book by simply adding an additional constraint (line_attribute.primary=1).
This solution seems broadly applicable to me. Assuming, for instance, you have questions in a StackOverflow clone, which are tagged, you can get the intersection of questions with two tags (e.g., all posts that have both the SQLAlchemy and Python tags). I am certainly going to use this method for that sort of query.
I coded this up in MySQL because it helps me get the query straight to translate it into SQLAlchemy.
The SQLAlchemy query is this simple:
q1 = Line.query.join(LineAttribute).filter(LineAttribute.precedence==1, LineAttribute.num==1)
q2 = Line.query.join(LineAttribute).filter(LineAttribute.precedence==2, LineAttribute.num==2)
q1.intersect(q2).all()
Hopefully the database structure in this question helps someone down the road. I didn't want to delete the question after I solved my own problem.
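To compare the INTERSECT query with the INNER JOIN alternative mentioned in the comment quoted above, here is a small sketch using Python's sqlite3 module (SQLite also supports INTERSECT). The table contents are invented for illustration, the primary column is left out, and the helper names are hypothetical; only the query shapes come from the post.

```python
import sqlite3


def build_book_db():
    """Create a tiny in-memory edition with two books (demo data, not real text)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE line (id INTEGER PRIMARY KEY, edition_id INTEGER, line TEXT);
        CREATE TABLE line_attribute (line_id INTEGER, num INTEGER, precedence INTEGER);
    """)
    conn.executemany("INSERT INTO line VALUES (?, ?, ?)", [
        (1, 2, "Book 1"), (2, 2, "Chapter 1"), (3, 2, "text in b1 c1"),
        (4, 2, "Chapter 2"), (5, 2, "text in b1 c2"),
        (6, 2, "Book 2"), (7, 2, "Chapter 2"), (8, 2, "text in b2 c2"),
    ])
    # (line_id, num, precedence): precedence 1 = Book, precedence 2 = Chapter.
    conn.executemany("INSERT INTO line_attribute VALUES (?, ?, ?)", [
        (1, 1, 1),
        (2, 1, 1), (2, 1, 2),
        (3, 1, 1), (3, 1, 2),
        (4, 1, 1), (4, 2, 2),
        (5, 1, 1), (5, 2, 2),
        (6, 2, 1),
        (7, 2, 1), (7, 2, 2),
        (8, 2, 1), (8, 2, 2),
    ])
    return conn


def lines_by_intersect(conn, edition_id, book, chapter):
    """Intersect 'lines in this book' with 'lines in a chapter with this number'."""
    sql = """
        SELECT line.* FROM line
        JOIN line_attribute ON line_attribute.line_id = line.id
        WHERE line.edition_id = ? AND line_attribute.precedence = 1
          AND line_attribute.num = ?
        INTERSECT
        SELECT line.* FROM line
        JOIN line_attribute ON line_attribute.line_id = line.id
        WHERE line.edition_id = ? AND line_attribute.precedence = 2
          AND line_attribute.num = ?
    """
    params = (edition_id, book, edition_id, chapter)
    return sorted(conn.execute(sql, params).fetchall())


def lines_by_join(conn, edition_id, book, chapter):
    """Same result without INTERSECT: join line_attribute twice under aliases."""
    sql = """
        SELECT line.* FROM line
        JOIN line_attribute AS book_attr ON book_attr.line_id = line.id
        JOIN line_attribute AS chap_attr ON chap_attr.line_id = line.id
        WHERE line.edition_id = ?
          AND book_attr.precedence = 1 AND book_attr.num = ?
          AND chap_attr.precedence = 2 AND chap_attr.num = ?
    """
    return sorted(conn.execute(sql, (edition_id, book, chapter)).fetchall())
```

With this demo data both functions return lines 4 and 5 for edition 2, Book 1, Chapter 2. INTERSECT removes duplicate rows by definition; the double join would only produce duplicates if a line carried the same attribute twice.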

Trouble with SQL join, where, having clause

I'm having trouble understanding how to make a query that will show me 'the three most popular articles' in terms of views ('Status: 200 OK').
There are 2 tables I'm currently dealing with.
A Log table
An Articles table
The columns in these tables:
Table "public.log"
Column | Type | Modifiers
--------+--------------------------+--------------------------------------------------
path | text |
ip | inet |
method | text |
status | text |
time | timestamp with time zone | default now()
id | integer | not null default nextval('log_id_seq'::regclass)
Indexes:
and
Table "public.articles"
Column | Type | Modifiers
--------+--------------------------+-------------------------------------------------------
author | integer | not null
title | text | not null
slug | text | not null
lead | text |
body | text |
time | timestamp with time zone | default now()
id | integer | not null default nextval('articles_id_seq'::regclass)
Indexes:
.
So far, I've written this query based on my level and current understanding of SQL...
SELECT articles.title, log.status
FROM articles join log
WHERE articles.title = log.path
HAVING status = “200 OK”
GROUP BY title, status
Obviously, this is incorrect. I want to be able to pull the three most popular articles from the database and I know that 'matching' the 200 OK's with the "article title" will show or count in for me one "view" or hit. My thought process is like, I need to determine how many times that article.title=log.path (1 unique) shows up in the log database (with a status of 200 OK) by creating a query. My assignment is actually to write a program that will print the results with "[my code getting] the database to do the heavy lifting by using joins, aggregations, and the where clause.. doing minimal "post-processing" in the Python code itself."
Any explanation, idea, a tip is appreciated all of StackOverflow...
Perhaps the following is what you have in mind:
SELECT
    a.title,
    COUNT(*) AS cnt
FROM articles a
INNER JOIN log l
    ON a.title = l.path
WHERE
    l.status = '200 OK'
GROUP BY
    a.title
ORDER BY
    COUNT(*) DESC
LIMIT 3;
This would return the three article titles with the highest counts of status '200 OK' hits. The query sticks to syntax that both MySQL and PostgreSQL accept (your table descriptions look like PostgreSQL output).
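Since the assignment asks for a Python program that lets the database do the heavy lifting, here is a minimal sketch of how such a query could be run and checked. It uses Python's sqlite3 module with made-up rows purely so that it is self-contained; the real database would need its own driver and connection. Like the answer, it assumes log.path matches articles.title exactly, so the ON clause would need adapting if paths are stored differently.

```python
import sqlite3

# Join, filter, aggregate, sort and limit all happen in SQL.
TOP_THREE = """
SELECT a.title, COUNT(*) AS cnt
FROM articles AS a
INNER JOIN log AS l ON a.title = l.path
WHERE l.status = '200 OK'
GROUP BY a.title
ORDER BY cnt DESC
LIMIT 3
"""


def top_three_articles(titles, hits):
    """titles: list of article titles; hits: list of (path, status) log rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE articles (title TEXT NOT NULL)")
    conn.execute("CREATE TABLE log (path TEXT, status TEXT)")
    conn.executemany("INSERT INTO articles VALUES (?)", [(t,) for t in titles])
    conn.executemany("INSERT INTO log VALUES (?, ?)", hits)
    rows = conn.execute(TOP_THREE).fetchall()
    conn.close()
    return rows


# Hypothetical demo data: only '200 OK' rows should count as views.
DEMO_TITLES = ["A", "B", "C", "D"]
DEMO_HITS = (
    [("A", "200 OK")] * 3
    + [("B", "200 OK")] * 2
    + [("B", "404 NOT FOUND")] * 5
    + [("C", "200 OK")]
    + [("D", "200 OK")] * 4
)
```

The Python side only has to loop over the returned (title, count) rows and print them, which keeps the post-processing minimal.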

Storing user permissions on rows of data

I have a table with rows of data for different experiments.
experiment_id data_1 data_2
------------- ------ -------
1
2
3
4
..
I have a user database on django, and I would like to store permissions indicating which users can access which rows and then return only the rows the user is authorized for.
What format should I use to store the permissions? Simply a table with a row for each user and a column for each experiment with Boolean? And in that case I would have to add a row to this table each time an experiment is added?
user experiment_1 experiment_2 experiment_3 ..
---- ------------ ------------ ------------ --
user_1 True False False ..
user_2 False True False ..
..
Any reference literature on the topic would also be great, preferably related to sqlite3 functionality since that is my current db backend.
I'm not 100% sure what will work best for you, but in the past I have found a solution like the following to be the easiest to query against and to maintain.
Table: Experiment
Experiment_Id | data_1 | data_2
-----------------------------------
1 | ... | ...
2 | ... | ...
Table: User
User_Name | Password | ...
----------------------------
User1 | ...
User2 | ...
Table: User_Experiment_Permissions
User_Name | Experiment | Can_Read | Can_Edit
--------------------------------------------
User1 | 1 | true | false
User2 | 1 | false | false
User1 | 2 | true | true
User2 | 2 | true | false
As you can see, in the new table we reference both the user and the experiment. This gives us fine-grained control over the relationship between the user and the experiment. Also, if a new permission arises for this relationship, such as can_delete, you can simply add it to the cross-reference table with a default value and the change will be retrofitted into your system :-)
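The cross-reference table above can be tried out directly with Python's sqlite3 module, which matches the asker's backend. This is only a sketch: the lowercase table and column names and the helper functions are assumptions for illustration, and in a Django project the same structure would normally be declared as models rather than raw SQL.

```python
import sqlite3


def build_permissions_db():
    """Create the three tables described above with the sample permission rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE experiment (experiment_id INTEGER PRIMARY KEY, data_1 TEXT, data_2 TEXT);
        CREATE TABLE user (user_name TEXT PRIMARY KEY);
        CREATE TABLE user_experiment_permissions (
            user_name  TEXT    REFERENCES user(user_name),
            experiment INTEGER REFERENCES experiment(experiment_id),
            can_read   INTEGER NOT NULL DEFAULT 0,  -- SQLite stores booleans as 0/1
            can_edit   INTEGER NOT NULL DEFAULT 0,
            PRIMARY KEY (user_name, experiment)
        );
    """)
    conn.executemany("INSERT INTO experiment VALUES (?, ?, ?)",
                     [(1, "a1", "b1"), (2, "a2", "b2")])
    conn.executemany("INSERT INTO user VALUES (?)", [("User1",), ("User2",)])
    conn.executemany(
        "INSERT INTO user_experiment_permissions VALUES (?, ?, ?, ?)",
        [("User1", 1, 1, 0), ("User2", 1, 0, 0),
         ("User1", 2, 1, 1), ("User2", 2, 1, 0)],
    )
    return conn


def readable_experiments(conn, user_name):
    """Return only the experiment rows this user is allowed to read."""
    sql = """
        SELECT e.experiment_id, e.data_1, e.data_2
        FROM experiment AS e
        JOIN user_experiment_permissions AS p ON p.experiment = e.experiment_id
        WHERE p.user_name = ? AND p.can_read = 1
        ORDER BY e.experiment_id
    """
    return conn.execute(sql, (user_name,)).fetchall()
```

Adding an experiment then means inserting rows, not altering the schema, which is the main advantage over one column per experiment.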
It depends on how you will use the permissions.
- If you will use these values inside a query (for example, to get the users with a specific permission), you have two options:
  - Create a bit-mask number field in which every bit represents one permission; you can then use AND/OR to get whatever combination of permissions you need.
    Advantage: small size, very efficient
    Disadvantage: complex to implement
  - Create a field for each permission (your solution).
    Advantage: easy to add
    Disadvantage: you have to edit the schema for each new permission
- If you will not use them in any query and will process them in code, you can just dump JSON into one column that includes all the permissions the user has, like:
{"experiment_1": 1, "experiment_2": 0, "experiment_3": 1}
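For the bit-mask option, a short sketch of how the encoding might look in Python. The flag names and values are hypothetical; the idea is only that each permission occupies one bit of a single integer column and combinations are tested with bitwise AND.

```python
from enum import IntFlag


class Permission(IntFlag):
    """Each permission is one bit (hypothetical names and values)."""
    CAN_READ = 1
    CAN_EDIT = 2
    CAN_DELETE = 4


def has_all(mask, required):
    """True if every bit in `required` is set in the stored integer `mask`."""
    return (mask & required) == required


def has_any(mask, candidates):
    """True if at least one bit in `candidates` is set in `mask`."""
    return (mask & candidates) != 0


# The value that would be stored in the integer column for a read+edit user.
stored = int(Permission.CAN_READ | Permission.CAN_EDIT)
```

Here stored is the plain integer 3, so it fits in an ordinary INTEGER column. The cost is readability: the meaning of each bit lives in the code, not in the schema.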

SQLite: Transposing results of a GROUP BY and filling in IDs with names

My question is rather specific, if you have a better title please suggest one. Also, formatting is bad - didn't know how to combine lists and codeblocks.
I have an SQLite3 database with the following (relevant parts of the) .schema:
CREATE TABLE users (id INTEGER PRIMARY KEY NOT NULL, user TEXT UNIQUE);
CREATE TABLE locations (id INTEGER PRIMARY KEY NOT NULL, name TEXT UNIQUE);
CREATE TABLE purchases (location_id INTEGER, user_id INTEGER);
CREATE TABLE sales (location_id integer, user_id INTEGER);
purchases has about 4.5mil entries, users about 300k, sales about 100k, and locations about 250 - just to gauge the data volume.
My desired use would be to generate a JSON object to be handed off to another application, very much condensed in volume by doing the following:
-GROUPing both purchases and sales into one common table BY location_id,user_id - IOW, getting the number of "actions" per user per location. That I can do, result is something like
loc | usid | loccount
-----------------------
1 | 1246 | 123
1 | 2345 | 1
13 | 1246 | 46
13 | 8732 | 4
27 | 2345 | 41
(At least it looks good, always hard to tell with such volumes; query:
select location_id, user_id, count(location_id) from
  (select location_id, user_id from purchases
   union all
   select location_id, user_id from sales)
group by location_id, user_id order by user_id
)
-Then, transposing that giant table such that I would get:
usid | loc1 | loc13 | loc27
---------------------------
1246 | 123 | 46 | 0
2345 | 1 | 0 | 41
8732 | 0 | 4 | 0
That I cannot do, and it's my absolutely crucial point for this question. I tried some things I found online, especially here, but I just started SQLite a little while ago and don't understand many queries.
-Lastly, translate the table into plain text in order to write it to JSON:
user | AAAA | BBBBB | CCCCC
---------------------------
zeta | 123 | 46 | 0
beta | 1 | 0 | 41
iota | 0 | 4 | 0
That I probably could do with quite a bit of experimentation and inner join, although I'm always very unsure what way is the best approach to handle such data volumes, hence I wouldn't mind a pointer.
The whole thing is written in Python's sqlite3 interface, if it matters. In the end, I'd love to have something I could just do a "for" loop per user over in order to generate the JSON, which would then of course be very simple. It doesn't matter if the query takes a long time (<10min would be nice), it's only run twice per day as a sort of backup. I've only got a tiny VPS available, but being limited to a single core the performance is as good as on my reasonably powerful desktop. (i5-3570k running Debian.)
The table headers are just examples because I wasn't quite sure if I could use integers for them (didn't discover the syntax if so), as long as I'm somehow able to look up the numeric part in the locations table I'm fine. Same for translating the user IDs into names. The number of columns is known beforehand - they're after all just INTEGER PRIMARY KEYs and I have a list() of them from some other operation. The number of rows can be determined reasonably quickly, ~3s, if need be.
Consider using subqueries to achieve your desired transposed output:
SELECT DISTINCT m.usid,
IFNULL((SELECT t1.loccount FROM tablename t1
WHERE t1.usid = m.usid AND t1.loc=1),0) AS Loc1,
IFNULL((SELECT t2.loccount FROM tablename t2
WHERE t2.usid = m.usid AND t2.loc=13),0) AS Loc13,
IFNULL((SELECT t3.loccount FROM tablename t3
WHERE t3.usid = m.usid AND t3.loc=27),0) AS Loc27
FROM tablename As m
Alternatively, you can use conditional expressions (nested IF in some databases, CASE/WHEN in SQLite) in a derived table:
SELECT temp.usid, Max(temp.loc1) As Loc1,
       Max(temp.loc13) As Loc13, Max(temp.loc27) As Loc27
FROM
  (SELECT tablename.usid,
          CASE WHEN loc=1 THEN loccount ELSE 0 END As Loc1,
          CASE WHEN loc=13 THEN loccount ELSE 0 END As Loc13,
          CASE WHEN loc=27 THEN loccount ELSE 0 END As Loc27
   FROM tablename) AS temp
GROUP BY temp.usid
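Because the asker already has the list of location ids in Python, the CASE/WHEN columns can be generated instead of typed by hand. The sketch below uses sqlite3 with the sample numbers from the question. The counts table is a hypothetical stand-in for the grouped UNION ALL result, and the function name is invented; the location ids are forced through int() before being placed into the SQL text, since column aliases cannot be bound as parameters.

```python
import sqlite3


def pivot_counts(conn, loc_ids):
    """Return one row per user: (user name, count at loc_ids[0], loc_ids[1], ...)."""
    columns = ", ".join(
        f"SUM(CASE WHEN c.loc = {int(loc)} THEN c.loccount ELSE 0 END) AS loc{int(loc)}"
        for loc in loc_ids
    )
    sql = f"""
        SELECT u.user, {columns}
        FROM counts AS c
        JOIN users AS u ON u.id = c.usid
        GROUP BY c.usid, u.user
        ORDER BY c.usid
    """
    return conn.execute(sql).fetchall()


def build_counts_db():
    """In-memory demo database holding the example rows from the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY NOT NULL, user TEXT UNIQUE);
        CREATE TABLE counts (loc INTEGER, usid INTEGER, loccount INTEGER);
    """)
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     [(1246, "zeta"), (2345, "beta"), (8732, "iota")])
    conn.executemany("INSERT INTO counts VALUES (?, ?, ?)",
                     [(1, 1246, 123), (1, 2345, 1), (13, 1246, 46),
                      (13, 8732, 4), (27, 2345, 41)])
    return conn
```

Each returned tuple can be zipped with the location names in a plain for loop to build the JSON object. The join to users already translates the user ids into names, so no separate lookup step is needed.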

Pivoting data and complex annotations in Django ORM

The ORM in Django lets us easily annotate (add fields to) querysets based on related data; however, I can't find a way to get multiple annotations for different filtered subsets of related data.
This is being asked in relation to django-helpdesk, an open-source Django-powered trouble-ticket tracker. I need to have data pivoted like this for charting and reporting purposes.
Consider these models:
from django.db import models

CHOICE_LIST = (
    ('open', 'Open'),
    ('closed', 'Closed'),
)

class Queue(models.Model):
    name = models.CharField(max_length=40)

class Issue(models.Model):
    subject = models.CharField(max_length=40)
    queue = models.ForeignKey(Queue, on_delete=models.CASCADE)
    status = models.CharField(max_length=10, choices=CHOICE_LIST)
And this dataset:
Queues:
ID | Name
---+------------------------------
1 | Product Information Requests
2 | Service Requests
Issues:
ID | Queue | Status
---+-------+---------
1 | 1 | open
2 | 1 | open
3 | 1 | closed
4 | 2 | open
5 | 2 | closed
6 | 2 | closed
7 | 2 | closed
I would like to see an annotation/aggregate look something like this:
Queue ID | Name | open | closed
---------+-------------------------------+------+--------
1 | Product Information Requests | 2 | 1
2 | Service Requests | 1 | 3
This is basically a crosstab or pivot table, in Excel parlance. I am currently building this output using some custom SQL queries, however if I can move to using the Django ORM I can more easily filter the data dynamically without doing dodgy insertion of WHERE clauses in my SQL.
For "bonus points": How would one do this where the pivot field (status in the example above) was a date, and we wanted the columns to be months / weeks / quarters / days?
You have Python, use it.
from collections import defaultdict
summary = defaultdict(int)
# Count issues per (queue, status) pair.
for issue in Issue.objects.all():
    summary[issue.queue, issue.status] += 1
Now your summary object has queue, status as a two-tuple key. You can display it directly, using various template techniques.
Or, you can regroup it into a table-like structure, if that's simpler.
table = []
# A set removes the duplicate queues that appear once per status.
queues = set(q for q, _ in summary.keys())
for q in sorted(queues, key=lambda q: q.id):
    table.append((q.id, q.name, summary[q, 'open'], summary[q, 'closed']))
You have lots and lots of Python techniques for doing pivot tables.
If you measure, you may find that a mostly-Python solution like this is actually faster than a pure SQL solution. Why? Mappings can be faster than SQL algorithms which require a sort as part of a GROUP-BY.
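The same dictionary-based approach extends to the "bonus" case where the pivot columns are time periods. The sketch below works on plain (queue name, date) pairs, not model instances, so that it runs on its own; the date values are hypothetical, since the models in the question have no date field, and the bucket function can be swapped for weeks, quarters or days.

```python
from collections import defaultdict
from datetime import date


def month_bucket(d):
    """Map a date to its month label, e.g. 2009-01."""
    return d.strftime("%Y-%m")


def pivot_by_period(records, bucket=month_bucket):
    """records: iterable of (queue_name, date). Returns (periods, table rows)."""
    summary = defaultdict(int)
    for queue_name, created in records:
        summary[queue_name, bucket(created)] += 1
    periods = sorted({period for _, period in summary})
    queues = sorted({queue for queue, _ in summary})
    # .get() avoids inserting empty keys into the defaultdict while reading.
    table = [
        [queue] + [summary.get((queue, period), 0) for period in periods]
        for queue in queues
    ]
    return periods, table


# Hypothetical sample data for illustration only.
DEMO_RECORDS = [
    ("Service Requests", date(2009, 1, 5)),
    ("Service Requests", date(2009, 1, 20)),
    ("Service Requests", date(2009, 2, 1)),
    ("Product Information Requests", date(2009, 2, 14)),
]
```

The returned periods list doubles as the column headers, so the template only needs to loop over it once for the header row and once per table row.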
Django has added a lot of functionality to the ORM since this question was originally asked. Since Django 1.8, the way to pivot data is to use the Case/When conditional expressions. There is also a third-party app that will do that for you (see PyPI and its documentation).
