How can I rank entries using sqlalchemy? - python

I have created a model User that has the columns score and rank. I would like to periodically update the rank of all users in User such that the user with the highest score has rank 1, the second highest rank 2, etc. Is there any way to achieve this efficiently in Flask-SQLAlchemy?
Thanks!
btw, here is the model:
from flask import Flask
from flask_sqlalchemy import SQLAlchemy
app = Flask(__name__)
db = SQLAlchemy(app)
class User(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    score = db.Column(db.Integer)
    rank = db.Column(db.Integer)

Well, as for why one might do this: it lets you query for "rank" without performing an aggregate query, which can be more performant, especially if you want to ask "what's the rank of user #456?" without hitting every row.
The most efficient way to do this is a single UPDATE. Using standard SQL, we can use a correlated subquery like this:
UPDATE user SET rank=(SELECT count(*) FROM user AS u1 WHERE u1.score > user.score) + 1
Some databases have extensions such as PostgreSQL's UPDATE..FROM, which I have less experience with. If you can UPDATE..FROM a SELECT that computes every rank at once with a window function, that might be more efficient, though I'm not totally sure.
Anyway, our standard SQL expressed with SQLAlchemy looks like:
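from sqlalchemy.orm import aliased
from sqlalchemy import func
u1 = aliased(User)
# correlated scalar subquery: how many users have a strictly higher score
# (as_scalar() is spelled scalar_subquery() in SQLAlchemy 1.4+)
subq = session.query(func.count(u1.id)).filter(u1.score > User.score).as_scalar()
session.query(User).update({"rank": subq + 1}, synchronize_session=False)
session.commit()  # with Flask-SQLAlchemy, session is db.session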
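To see what the correlated-subquery UPDATE computes, and to try the window-function variant speculated about above, here is a minimal sketch using Python's built-in sqlite3 module rather than SQLAlchemy. The table name users and the sample scores are made up for illustration, and the UPDATE ... FROM form assumes SQLite 3.33 or newer (PostgreSQL supports it too). Both versions give tied scores the same rank.

```python
import sqlite3

def rank_with_subquery(conn):
    # Standard SQL: rank = 1 + number of users with a strictly higher score.
    conn.execute(
        "UPDATE users SET rank = "
        "(SELECT count(*) FROM users AS u1 WHERE u1.score > users.score) + 1"
    )

def rank_with_window(conn):
    # UPDATE ... FROM a SELECT that ranks everyone at once with RANK() OVER.
    # Requires SQLite >= 3.33 (or PostgreSQL).
    conn.execute(
        "UPDATE users SET rank = r.rnk "
        "FROM (SELECT id, RANK() OVER (ORDER BY score DESC) AS rnk FROM users) AS r "
        "WHERE users.id = r.id"
    )

def ranks(conn):
    # Map user id -> stored rank.
    return dict(conn.execute("SELECT id, rank FROM users ORDER BY id"))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, score INTEGER, rank INTEGER)")
conn.executemany("INSERT INTO users (id, score) VALUES (?, ?)",
                 [(1, 50), (2, 90), (3, 70), (4, 90)])

rank_with_subquery(conn)
by_subquery = ranks(conn)
rank_with_window(conn)
by_window = ranks(conn)

print(by_subquery)               # {1: 4, 2: 1, 3: 3, 4: 1}
print(by_subquery == by_window)  # True
```

The two users tied at 90 both get rank 1 and the next user gets rank 3, because each rank is "1 + how many scores are strictly higher".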

Just loop over all your users:
users = User.query.order_by(User.score.desc()).all()  # fetch them all in one query
for rank, user in enumerate(users, start=1):  # start=1 so the top score gets rank 1
    user.rank = rank
db.session.commit()
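Unlike the correlated-subquery UPDATE, enumerate() gives tied scores different ranks. If ties should share a rank (the "1, 1, 3" style), the loop can remember the previous score. A minimal pure-Python sketch; competition_ranks is a hypothetical helper, not part of Flask-SQLAlchemy:

```python
def competition_ranks(scores):
    """Return ranks for scores already sorted in descending order.

    Equal scores share a rank and the next distinct score skips ahead,
    i.e. rank = 1 + (number of strictly higher scores).
    """
    ranks = []
    previous = object()  # sentinel that never equals a real score
    for position, score in enumerate(scores, start=1):
        if score != previous:
            current = position  # new distinct score: rank is its 1-based position
            previous = score
        ranks.append(current)
    return ranks

print(competition_ranks([90, 90, 70, 50]))  # [1, 1, 3, 4]
```

In the loop above this would be used as: for user, r in zip(users, competition_ranks([u.score for u in users])): user.rank = r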

Related

Confusing SQLAlchemy conversion of simple subquery

I've been wrestling with what should be a simple conversion of a straightforward SQL query into an SQLAlchemy expression, and I just cannot get things to line up the way I mean in the subquery. This is a single-table query of a "Comments" table; I want to find which users have made the most first comments:
SELECT user_id, count(*) AS count
FROM comments c
where c.date = (SELECT MIN(c2.date)
FROM comments c2
WHERE c2.post_id = c.post_id
)
GROUP BY user_id
ORDER BY count DESC
LIMIT 20;
I don't know how to write the subquery so that it refers to the outer query, and if I did, I wouldn't know how to assemble this into the outer query itself. (Using MySQL, which shouldn't matter.)
Well, after giving up for a while and then looking back at it, I came up with something that works. I'm sure there's a better way, but:
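from sqlalchemy import desc, func, select
from sqlalchemy.orm import aliased
c2 = aliased(Comment)
# correlated scalar subquery: earliest comment date on the same post as the outer row
# (in SQLA 1.4+/2.0 write select(func.min(c2.date)) without the list)
firstdate = select([func.min(c2.date)]).\
    where(c2.post_id == Comment.post_id).\
    as_scalar()  # or scalar_subquery(), in SQLA 1.4
users = session.query(
    Comment.user_id, func.count('*').label('count')).\
    filter(Comment.date == firstdate).\
    group_by(Comment.user_id).\
    order_by(desc('count')).\
    limit(20)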
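As a sanity check of what the correlated subquery does, here is the question's SQL run against a tiny in-memory table with Python's sqlite3 module. The table layout and rows are invented for illustration; date is stored as an ISO-8601 string so MIN() orders it correctly. The SQLAlchemy expression above should produce an equivalent statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER, "
             "user_id INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO comments (post_id, user_id, date) VALUES (?, ?, ?)",
    [
        (1, 10, "2020-01-01"),  # first comment on post 1 -> user 10
        (1, 20, "2020-01-02"),
        (2, 10, "2020-01-05"),  # first comment on post 2 -> user 10
        (2, 30, "2020-01-06"),
        (3, 20, "2020-01-03"),  # first comment on post 3 -> user 20
    ],
)

# The inner query is correlated: c2.post_id = c.post_id refers to the outer row.
first_commenters = conn.execute(
    """
    SELECT user_id, count(*) AS count
    FROM comments c
    WHERE c.date = (SELECT MIN(c2.date)
                    FROM comments c2
                    WHERE c2.post_id = c.post_id)
    GROUP BY user_id
    ORDER BY count DESC
    LIMIT 20
    """
).fetchall()

print(first_commenters)  # [(10, 2), (20, 1)]
```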

Django query with relations

I have a messy and old query that I'm trying to convert from SQL to Django ORM and I can't seem to figure it out.
As the original query is not something that should be public, here's something similar to what I'm working with:
Table 1: id
Table 2: Id, username, active, birthday, table_1_fk
Table 3: Id, amount, table_1_fk
I need to end up with a list of active users (username), sorted by date, displaying the amount. Table1 references within table 2 and 3 are not in order. The main issues I'm having are:
How do I retrieve these with just ORM (no looping/executing, or hardly any if I must)
If I can't use solely ORM and do decide to just loop over the parts I need to, how would I even create a single object to display in a table without looping over everything multiple times?
My thought processes:
Table 2 is active -> get table 1 -> find table 1 pk in table 3 -> add table 3 info to table 1?
Table 1 -> get table 2 Actives, Table1 -> get table 3 amounts -> loop to match according to table1_fks
You can follow the relations starting from Table1. If your models look something like this:
from django.db import models
from django.db.models import F
class Table1(models.Model):
    ...
class Table2(models.Model):
    username = models.CharField(max_length=100)
    active = models.BooleanField()
    birthday = models.DateField()  # Sorted by date
    # on_delete is required since Django 2.0; CASCADE is just an example choice
    table1 = models.ForeignKey(Table1, related_name="table2", on_delete=models.CASCADE)
class Table3(models.Model):
    amount = models.IntegerField()
    table1 = models.ForeignKey(Table1, related_name="table3", on_delete=models.CASCADE)
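You can then do:
>>> users = (
...     Table1.objects
...     .filter(table2__active=True)
...     .annotate(
...         username=F("table2__username"),
...         amount=F("table3__amount"),
...         birthday=F("table2__birthday"),
...     )
...     .order_by("-birthday")
...     .values("username", "amount", "birthday")
... )
>>> print(users)
The result looks roughly like this (values() yields one dict per row):
[
    {"username": "user1", "amount": 100, "birthday": "2020-01-13"},
    {"username": "user2", "amount": 890, "birthday": "2020-01-10"},
    {"username": "user3", "amount": None, "birthday": "2020-01-01"},
]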
It completely depends on how your models classes are implemented.
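For comparison with the legacy SQL, the ORM query above corresponds roughly to an INNER JOIN on Table2 (because of the filter) and a LEFT OUTER JOIN on Table3 (so users without an amount still appear with None). Below is a hand-written approximation run with Python's sqlite3 module; the table and column names are hypothetical, and this is not the exact SQL Django emits (print(users.query) shows that).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table1 (id INTEGER PRIMARY KEY);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, username TEXT, active INTEGER,
                         birthday TEXT, table1_id INTEGER REFERENCES table1(id));
    CREATE TABLE table3 (id INTEGER PRIMARY KEY, amount INTEGER,
                         table1_id INTEGER REFERENCES table1(id));
    INSERT INTO table1 (id) VALUES (1), (2), (3), (4);
    INSERT INTO table2 (username, active, birthday, table1_id) VALUES
        ('user1', 1, '2020-01-13', 1),
        ('user2', 1, '2020-01-10', 2),
        ('user3', 1, '2020-01-01', 3),
        ('inactive', 0, '2020-01-20', 4);
    INSERT INTO table3 (amount, table1_id) VALUES (100, 1), (890, 2), (5, 4);
    """
)

rows = conn.execute(
    """
    SELECT t2.username, t3.amount, t2.birthday
    FROM table1 t1
    INNER JOIN table2 t2 ON t2.table1_id = t1.id      -- filter(table2__active=True)
    LEFT OUTER JOIN table3 t3 ON t3.table1_id = t1.id -- amount may be missing
    WHERE t2.active = 1
    ORDER BY t2.birthday DESC
    """
).fetchall()

print(rows)
# [('user1', 100, '2020-01-13'), ('user2', 890, '2020-01-10'), ('user3', None, '2020-01-01')]
```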

sqlalchemy join and order by on multiple tables

I'm working with a database that has a relationship that looks like:
class Source(Model):
    id = Identifier()
class SourceA(Source):
    source_id = ForeignKey('source.id', nullable=False, primary_key=True)
    name = Text(nullable=False)
class SourceB(Source):
    source_id = ForeignKey('source.id', nullable=False, primary_key=True)
    name = Text(nullable=False)
class SourceC(Source, ServerOptions):
    source_id = ForeignKey('source.id', nullable=False, primary_key=True)
    name = Text(nullable=False)
What I want to do is join all tables Source, SourceA, SourceB, SourceC and then order_by name.
Sounds easy to me, but I've been banging my head on this for a while now and my head's starting to hurt. Also, I'm not very familiar with SQL or sqlalchemy, so there's been a lot of browsing the docs, but to no avail. Maybe I'm just not seeing it. This seems to be close, albeit related to a newer version than what I have available (see versions below).
I feel close, not that that means anything. Here's my latest attempt, which seems good up until the order_by call.
Sources = [SourceA, SourceB, SourceC]
# list of join on Source
joins = [session.query(Source).join(source) for source in Sources]
# union the list of joins
query = joins.pop(0).union_all(*joins)
query seems right at this point as far as I can tell, i.e. query.all() works. So now I try to apply order_by, which doesn't throw an error until .all() is called.
Attempt 1: I just use the attribute I want
query.order_by('name').all()
# throws sqlalchemy.exc.ProgrammingError: (ProgrammingError) column "name" does not exist
Attempt 2: I just use the defined column attribute I want
query.order_by(SourceA.name).all()
# throws sqlalchemy.exc.ProgrammingError: (ProgrammingError) missing FROM-clause entry for table "SourceA"
Is it obvious? What am I missing? Thanks!
versions:
sqlalchemy.version = '0.8.1'
(PostgreSQL) 9.1.3
EDIT
I'm dealing with a framework that wants a handle to a query object. I have a bare query that appears to accomplish what I want but I would still need to wrap it in a query object. Not sure if that's possible. Googling ...
select = """
select s.*, a.name from Source d inner join SourceA a on s.id = a.Source_id
union
select s.*, b.name from Source d inner join SourceB b on s.id = b.Source_id
union
select s.*, c.name from Source d inner join SourceC c on s.id = c.Source_id
ORDER BY "name";
"""
selectText = text(select)
result = session.execute(selectText)
# how to put result into a query. maybe Query(selectText)? googling...
result.fetchall():
Assuming the COALESCE function is good enough, the examples below should point you in the right direction. One option builds the list of child classes automatically, while the other lists them explicitly.
This is not the query you specified in your edit, but you are able to sort (your original request):
from sqlalchemy import func
from sqlalchemy.orm import class_mapper, with_polymorphic
def test_explicit():
    # specify all children tables to be queried
    Sources = [SourceA, SourceB, SourceC]
    AllSources = with_polymorphic(Source, Sources)
    name_col = func.coalesce(*(_s.name for _s in Sources)).label("name")
    query = session.query(AllSources).order_by(name_col)
    for x in query:
        print(x)
def test_implicit():
    # get all children tables in the query
    _map = class_mapper(Source)
    Sources = [_smap.class_
               for _smap in _map.self_and_descendants
               if _smap != _map  # note: exclude base class, it has no `name`
               ]
    AllSources = with_polymorphic(Source, Sources)
    name_col = func.coalesce(*(_s.name for _s in Sources)).label("name")
    query = session.query(AllSources).order_by(name_col)
    for x in query:
        print(x)
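Roughly speaking, with_polymorphic outer-joins every listed subclass table onto the source table, so for a given row only one of the name columns is non-NULL and COALESCE picks it. The sketch below reproduces that idea in plain SQL with Python's sqlite3 module; the lowercase table names and rows are assumptions for illustration, and the SQL SQLAlchemy actually generates will differ in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE source (id INTEGER PRIMARY KEY);
    CREATE TABLE source_a (source_id INTEGER PRIMARY KEY REFERENCES source(id), name TEXT NOT NULL);
    CREATE TABLE source_b (source_id INTEGER PRIMARY KEY REFERENCES source(id), name TEXT NOT NULL);
    CREATE TABLE source_c (source_id INTEGER PRIMARY KEY REFERENCES source(id), name TEXT NOT NULL);
    INSERT INTO source (id) VALUES (1), (2), (3);
    INSERT INTO source_a VALUES (1, 'zeta');
    INSERT INTO source_b VALUES (2, 'alpha');
    INSERT INTO source_c VALUES (3, 'mu');
    """
)

# Each source row matches exactly one subclass table; the other joins yield NULL,
# so COALESCE returns the one real name and ORDER BY can sort on it.
rows = conn.execute(
    """
    SELECT s.id, COALESCE(a.name, b.name, c.name) AS name
    FROM source s
    LEFT OUTER JOIN source_a a ON a.source_id = s.id
    LEFT OUTER JOIN source_b b ON b.source_id = s.id
    LEFT OUTER JOIN source_c c ON c.source_id = s.id
    ORDER BY name
    """
).fetchall()

print(rows)  # [(2, 'alpha'), (3, 'mu'), (1, 'zeta')]
```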
Your first attempt sounds like it isn't working because there is no name in Source, which is the root table of the query. In addition, there will be multiple name columns after your joins, so you will need to be more specific. Try
query.order_by('SourceA.name').all()
As for your second attempt, what is ServerA?
query.order_by(ServerA.name).all()
Probably a typo, but not sure if it's for SO or your code. Try:
query.order_by(SourceA.name).all()

SQLAlchemy: several counts in one query

I am having hard time optimizing my SQLAlchemy queries. My SQL knowledge is very basic, and I just can't get the stuff I need from the SQLAlchemy docs.
Suppose the following very basic one-to-many relationship:
class Parent(Base):
    __tablename__ = "parents"
    id = Column(Integer, primary_key=True)
    children = relationship("Child", backref="parent")
class Child(Base):
    __tablename__ = "children"
    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey("parents.id"))
    naughty = Column(Boolean)
How could I:
Query tuples of (Parent, count_of_naughty_children, count_of_all_children) for each parent?
After decent time spent googling, I found how to query those values separately:
# The following returns tuples of (Parent, count_of_all_children):
session.query(Parent, func.count(Child.id)).outerjoin(Child, Parent.children).\
    group_by(Parent.id)
# The following returns tuples of (Parent, count_of_naughty_children):
al = aliased(Child, session.query(Child).filter_by(naughty=True).\
    subquery())
session.query(Parent, func.count(al.id)).outerjoin(al, Parent.children).\
    group_by(Parent.id)
I tried to combine them in different ways, but didn't manage to get what I want.
Query all parents which have more than 80% naughty children? Edit: naughty could be NULL.
I guess this query is going to be based on the previous one, filtering by naughty/all ratio.
Any help is appreciated.
EDIT: Thanks to Antti Haapala's help, I found a solution to the second question:
avg = func.avg(func.coalesce(Child.naughty, 0))  # coalesce() treats NULLs as 0
# avg = func.avg(Child.naughty) - if you want to ignore NULLs
session.query(Parent).join(Child, Parent.children).group_by(Parent).\
    having(avg > 0.8)
It computes the average of the children's naughty values, treating False and NULL as 0 and True as 1. Tested with the MySQL backend, but it should work on others, too.
The count() SQL aggregate function is pretty simple: it gives you the number of non-NULL values in each group. With that in mind, we can adjust your query to give you the proper result:
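from sqlalchemy import case, func, literal_column
from sqlalchemy.orm import Query
# list-style case() is the pre-1.4 syntax; SQLAlchemy 2.0 takes the (condition, value) tuple positionally
print(Query([
    Parent,
    func.count(Child.id),
    func.count(case(
        [((Child.naughty == True), Child.id)], else_=literal_column("NULL"))).label("naughty")])
    .join(Parent.children).group_by(Parent)
)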
Which produces the following sql:
SELECT
parents.id AS parents_id,
count(children.id) AS count_1,
count(CASE WHEN (children.naughty = 1)
THEN children.id
ELSE NULL END) AS naughty
FROM parents
JOIN children ON parents.id = children.parent_id
GROUP BY parents.id
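The trick relies on COUNT(expr) skipping NULLs: the CASE yields the child id for naughty children and NULL otherwise. A quick check of that logic with Python's sqlite3 module (SQLite stores booleans as 0/1; the sample rows are made up and include one child whose naughty is NULL; the ORDER BY is added only to make the output deterministic):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE parents (id INTEGER PRIMARY KEY);
    CREATE TABLE children (id INTEGER PRIMARY KEY,
                           parent_id INTEGER REFERENCES parents(id),
                           naughty BOOLEAN);
    INSERT INTO parents (id) VALUES (1), (2);
    INSERT INTO children (parent_id, naughty) VALUES
        (1, 1), (1, 0), (1, NULL),
        (2, 1), (2, 1);
    """
)

rows = conn.execute(
    """
    SELECT parents.id,
           count(children.id) AS count_1,
           count(CASE WHEN (children.naughty = 1)
                      THEN children.id
                      ELSE NULL END) AS naughty
    FROM parents
    JOIN children ON parents.id = children.parent_id
    GROUP BY parents.id
    ORDER BY parents.id
    """
).fetchall()

print(rows)  # [(1, 3, 1), (2, 2, 2)]
```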
If your query only needs the parents with more than 80% naughty children, on most databases you can cast naughty to an integer, take its average, and use HAVING to keep the groups whose average is greater than 0.8.
Thus you get something like
from sqlalchemy import Integer, func
from sqlalchemy.sql.expression import cast
naughtyp = func.avg(cast(Child.naughty, Integer))
session.query(Parent, func.count(Child.id), naughtyp).join(Child)\
    .group_by(Parent.id).having(naughtyp > 0.8).all()
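The two ratio variants differ only in how NULL naughty values are treated: AVG() ignores NULLs, while wrapping the column in COALESCE(..., 0) counts them as well-behaved. A small sqlite3 sketch with made-up data and the same table layout as the question makes the difference visible; parent 1 has three naughty children and one NULL. The helper parents_over is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE parents (id INTEGER PRIMARY KEY);
    CREATE TABLE children (id INTEGER PRIMARY KEY,
                           parent_id INTEGER REFERENCES parents(id),
                           naughty BOOLEAN);
    INSERT INTO parents (id) VALUES (1), (2), (3);
    INSERT INTO children (parent_id, naughty) VALUES
        (1, 1), (1, 1), (1, 1), (1, NULL),
        (2, 1), (2, 0),
        (3, 1), (3, 1);
    """
)

def parents_over(ratio_expr):
    # ratio_expr is spliced into the SQL text; only pass trusted literals here.
    query = f"""
        SELECT parents.id
        FROM parents
        JOIN children ON parents.id = children.parent_id
        GROUP BY parents.id
        HAVING {ratio_expr} > 0.8
        ORDER BY parents.id
    """
    return [row[0] for row in conn.execute(query)]

ignore_nulls = parents_over("avg(CAST(children.naughty AS INTEGER))")  # NULL skipped
nulls_as_zero = parents_over("avg(coalesce(children.naughty, 0))")     # NULL counts as 0

print(ignore_nulls)   # [1, 3]
print(nulls_as_zero)  # [3]
```

Parent 1 averages 3/3 = 1.0 when the NULL is ignored but 3/4 = 0.75 when it is treated as 0, so it only passes the first filter.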

SQLalchemy: quantiles for all permutations of column value combinations

We have a SQL Server query in which we need to generate ntiles for increasingly large numbers of variables, such that the variables are combined with each other in their various permutations. Here's an excerpt exemplifying what I mean:
statement 1:
ntile(10) over (partition by MAUorALL, User_Type, fsi.Month_ID
order by Objects_Created) AS Ntile_Mon_Objects_Created,
statement 2:
ntile(10) over (partition by MAUorALL, User_Type, fsi.Month_ID, *Country*
order by Objects_Created) AS Ntile_Country_Objects_Created
statement 3:
ntile(10) over (partition by MAUorALL, User_Type, fsi.Month_ID, *User_Type*
order by Objects_Created) AS Ntile_UT_Objects_Created
You can see that the statements are the same except that in the second and third ones the italicized columns "country" and "user type" have been added. So we take ntiles for the same variable "Objects_Created" at different levels of specificity, and we also have to take ntiles for the various possible permutations of these variables, e.g.:
statement 4:
ntile(10) over (partition by MAUorALL, User_Type, fsi.Month_ID, *Country, User_Type*
order by Objects_Created) AS Ntile_Country_UT_Objects_Created
We can manually code these permutations to a point, but if we could use sqlalchemy to execute all the permutations of these variables it might make things easier. Does anyone have an example I could re-purpose?
Thanks for your help!
I have no idea how fsi is related to the other columns, but assuming all the data is in one model (which is easy to extend with a SQLAlchemy query) like below:
class User(Base):
    __tablename__ = 't_users'
    id = Column(Integer, primary_key=True)
    MAUorALL = Column(String)
    User_Type = Column(String)
    Country = Column(String)
    Month_ID = Column(Integer)
    Objects_Created = Column(Integer)
The task is accomplished by simple use of itertools.permutations (or itertools.combinations, depending on what you want to achieve) to build the query. The code below generates a query on the User table with various ntiles. I assume reading the code suffices to understand what is happening:
from itertools import permutations  # or combinations
from sqlalchemy import func, over
# configuration: {label: Column}
column_labels = {
    'Country': User.Country,
    'UT': User.User_Type,
}
def get_ntile(additional_columns=None):
    """ :return: sqlalchemy expression for selecting a given ntile() using
    predefined as well as *additional* columns.
    """
    partition_by = [
        User.MAUorALL,
        User.User_Type,
        User.Month_ID,
    ]
    label = "Ntile_Objects_Created"
    if additional_columns:
        lbls = []
        for col_name in additional_columns:
            col = column_labels[col_name]
            partition_by.append(col)
            lbls.append(col_name)
        label = "Ntile_{}_Objects_Created".format("_".join(lbls))
    xprs = over(
        func.ntile(10),
        partition_by=partition_by,
        order_by=User.Objects_Created,
    ).label(label)
    return xprs
def get_query(additional_columns=('UT', 'Country')):
    """ :return: a query object which selects a User with additional ntiles
    for predefined columns (fixed) and all possible permutations of
    *additional_columns*
    """
    tiles = [get_ntile(comb)
             for r in range(len(additional_columns) + 1)
             for comb in permutations(additional_columns, r)
             ]
    q = session.query(User, *tiles)
    return q
q = get_query()
print([_c["name"] for _c in q.column_descriptions])
# >>> ['User', 'Ntile_Objects_Created', 'Ntile_UT_Objects_Created', 'Ntile_Country_Objects_Created', 'Ntile_UT_Country_Objects_Created', 'Ntile_Country_UT_Objects_Created']
for tile in q.all():
    print(tile)
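One thing worth checking: the rows in a window partition do not depend on the order of the PARTITION BY columns, so Ntile_UT_Country_Objects_Created and Ntile_Country_UT_Objects_Created will always hold the same values, and itertools.combinations avoids those duplicates. Below is a hypothetical, SQLAlchemy-free sketch that builds the ntile expressions as SQL text and runs them with Python's sqlite3 module (SQLite supports ntile() since 3.25). The table and rows are invented, and the fixed columns here are MAUorALL and Month_ID (an assumption, so that User_Type can appear as an optional extra).

```python
import sqlite3
from itertools import combinations

FIXED = ["MAUorALL", "Month_ID"]
EXTRA = {"Country": "Country", "UT": "User_Type"}  # label -> column name

def ntile_columns(extra_labels):
    """Yield one ntile(10) expression per combination of optional columns."""
    for r in range(len(extra_labels) + 1):
        for combo in combinations(extra_labels, r):
            partition = ", ".join(FIXED + [EXTRA[label] for label in combo])
            name = "_".join(("Ntile",) + combo + ("Objects_Created",))
            yield (f"ntile(10) OVER (PARTITION BY {partition} "
                   f"ORDER BY Objects_Created) AS {name}")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_users (id INTEGER PRIMARY KEY, MAUorALL TEXT, "
             "User_Type TEXT, Country TEXT, Month_ID INTEGER, Objects_Created INTEGER)")
conn.executemany(
    "INSERT INTO t_users (MAUorALL, User_Type, Country, Month_ID, Objects_Created) "
    "VALUES (?, ?, ?, ?, ?)",
    [("MAU", "free", "US", 1, n) for n in range(20)]
    + [("MAU", "paid", "DE", 1, n) for n in range(5)],
)

select = "SELECT id, " + ",\n       ".join(ntile_columns(list(EXTRA))) + " FROM t_users"
cursor = conn.execute(select)
column_names = [d[0] for d in cursor.description]
rows = cursor.fetchall()

print(column_names)
# ['id', 'Ntile_Objects_Created', 'Ntile_Country_Objects_Created',
#  'Ntile_UT_Objects_Created', 'Ntile_Country_UT_Objects_Created']
```

With n optional columns this yields 2**n ntile columns instead of the larger sum of permutations.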
