How to make a query distinct via peewee in Python?

I have a table in SQLite with a column "telegram_id", and I want to get all unique values of that column:
def get_all_telegram_ids():
    '''Returns all telegram_id values from the DB'''
    # rows = cursor.execute("SELECT DISTINCT telegram_id FROM locations").fetchall()
    rows = Locations.select(Locations.telegram_id).distinct()
    print(rows)
    return rows
but this code does not work :-(

Seems to be working fine:
import random

from peewee import *

db = SqliteDatabase(':memory:')

class Location(Model):
    telegram_id = IntegerField()

    class Meta:
        database = db

db.create_tables([Location])

for i in range(100):
    Location.create(telegram_id=random.randint(1, 10))

q = Location.select(Location.telegram_id).distinct()
for l in q:
    print(l.telegram_id)
Prints 1-10 with some variation and no duplicates.
Alternative invocation that also works:
q = Location.select(fn.DISTINCT(Location.telegram_id))
for l in q:
    print(l.telegram_id)

but this code does not work :-(
Have you considered reading the documentation?
"By specifying a single value of True the query will use a simple SELECT DISTINCT. Specifying one or more columns will result in a SELECT DISTINCT ON."
So it doesn't seem like you can just call distinct() with no parameters; it's always at least distinct(True).
Also, when asking questions about things not working, it's usually a good idea to explain the ways in which it is not working, e.g. the error you get if there is one, or your expectations versus what you're observing.
Even more so when you don't come close to providing a minimal reproducible example.

Related

Django query based on FK — get all, not any

I need to find orders in which all order items have a completed status. It looks like this:
FINISHED_STATUSES = [17, 18, 19]

if active_tab == 'outstanding':
    orders = orders.exclude(items__status__in=FINISHED_STATUSES)
However, this query only gives me orders with any order item with a completed status. How would I do the query such that I retrieve only those orders where ALL order items have a completed status?
I think that you need to do a raw query here.
With your orders and items models as Orders and Items:
# raw query
sql = """\
select `orders`.* from `%(orders_table)s` as `orders`
join `%(items_table)s` as `items`
on `items`.`%(item_order_fk)s` = `orders`.`%(order_pk)s`
where `items`.`%(status_field)s` in (%(status_list)s)
group by `orders`.`%(order_pk)s`
having count(*) = %(status_count)s;
""" % {
    "orders_table": Orders._meta.db_table,
    "items_table": Items._meta.db_table,
    "order_pk": Orders._meta.pk.column,
    "item_order_fk": Items._meta.get_field("order").column,
    "status_field": Items._meta.get_field("status").column,
    "status_list": str(FINISHED_STATUSES)[1:-1],
    "status_count": len(FINISHED_STATUSES),
}

orders = Orders.objects.raw(sql)
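The underlying GROUP BY / HAVING pattern can be sanity-checked independently of Django with the stdlib sqlite3 module. A sketch with hypothetical orders/items tables; note this variant compares the total item count to the count of finished items, which expresses "all items finished" regardless of how many items an order has:

```python
import sqlite3

FINISHED_STATUSES = [17, 18, 19]

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
cur.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, order_id INTEGER, status INTEGER)")

# Order 1: all items finished; order 2: one item still open (status 5).
cur.executemany("INSERT INTO orders (id) VALUES (?)", [(1,), (2,)])
cur.executemany(
    "INSERT INTO items (order_id, status) VALUES (?, ?)",
    [(1, 17), (1, 19), (2, 18), (2, 5)],
)

placeholders = ", ".join("?" for _ in FINISHED_STATUSES)
sql = f"""
    SELECT o.id
    FROM orders AS o
    JOIN items AS i ON i.order_id = o.id
    GROUP BY o.id
    HAVING COUNT(*) = SUM(CASE WHEN i.status IN ({placeholders}) THEN 1 ELSE 0 END)
"""
finished_orders = [row[0] for row in cur.execute(sql, FINISHED_STATUSES)]
print(finished_orders)  # only the fully finished order
```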
I was able to get this done in a somewhat hackish way. First, I added an additional Boolean column, is_finished. Then, to find orders with at least one non-finished item:
orders = orders.filter(items__status__is_finished=False)
This gives me all un-finished orders.
Doing the opposite of that gets the finished orders:
orders = orders.exclude(items__status__is_finished=False)
Adding the boolean field is a good idea. That way you have your business rules clearly defined in the model.
Now, let's say that you still wanted to do it without resorting to adding fields. This may very well be a requirement given a different set of circumstances. Unfortunately, you can't really use subqueries or arbitrary joins in the Django ORM. You could, however, build up Q objects and make an implicit join in the having clause using filter() and annotate().
from django.db.models.aggregates import Count
from django.db.models import Q
from functools import reduce
from operator import or_

total_items_by_orders = Orders.objects.annotate(
    item_count=Count('items'))
finished_items_by_orders = Orders.objects.filter(
    items__status__in=FINISHED_STATUSES).annotate(
    item_count=Count('items'))
orders = total_items_by_orders.exclude(
    reduce(or_, (Q(id=o.id, item_count=o.item_count)
                 for o in finished_items_by_orders)))
Note that using raw SQL, while less elegant, would usually be more efficient.

How can I query rows with unique values on a joined column?

I'm trying to have my popular_query subquery remove duplicate Place.id rows, but it doesn't remove them. The code is below. I tried using distinct, but it does not respect the order_by rule.
SimilarPost = aliased(Post)
SimilarPostOption = aliased(PostOption)

popular_query = (db.session.query(Post, func.count(SimilarPost.id)).
                 join(Place, Place.id == Post.place_id).
                 join(PostOption, PostOption.post_id == Post.id).
                 outerjoin(SimilarPostOption, PostOption.val == SimilarPostOption.val).
                 join(SimilarPost, SimilarPost.id == SimilarPostOption.post_id).
                 filter(Place.id == Post.place_id).
                 filter(self.radius_cond()).
                 group_by(Post.id).
                 group_by(Place.id).
                 order_by(desc(func.count(SimilarPost.id))).
                 order_by(desc(Post.timestamp))
                 ).subquery().select()
all_posts = db.session.query(Post).select_from(filter.pick()).all()
I did a test printout with
print [x.place.name for x in all_posts]
[u'placeB', u'placeB', u'placeB', u'placeC', u'placeC', u'placeA']
How can I fix this?
Thanks!
This should get you what you want:
SimilarPost = aliased(Post)
SimilarPostOption = aliased(PostOption)

post_popularity = (db.session.query(func.count(SimilarPost.id))
                   .select_from(PostOption)
                   .filter(PostOption.post_id == Post.id)
                   .correlate(Post)
                   .outerjoin(SimilarPostOption, PostOption.val == SimilarPostOption.val)
                   .join(SimilarPost, sql.and_(
                       SimilarPost.id == SimilarPostOption.post_id,
                       SimilarPost.place_id == Post.place_id)
                   )
                   .as_scalar())

popular_post_id = (db.session.query(Post.id)
                   .filter(Post.place_id == Place.id)
                   .correlate(Place)
                   .order_by(post_popularity.desc())
                   .limit(1)
                   .as_scalar())

deduped_posts = (db.session.query(Post, post_popularity)
                 .join(Place)
                 .filter(Post.id == popular_post_id)
                 .order_by(post_popularity.desc(), Post.timestamp.desc())
                 .all())
I can't speak to the runtime performance with large data sets, and there may be a better solution, but that's what I managed to synthesize from quite a few sources (MySQL JOIN with LIMIT 1 on joined table, SQLAlchemy - subquery in a WHERE clause, SQLAlchemy Query documentation). The biggest complicating factor is that you apparently need to use as_scalar to nest the subqueries in the right places, and therefore can't return both the Post id and the count from the same subquery.
FWIW, this is kind of a behemoth and I concur with user1675804 that SQLAlchemy code this deep is hard to grok and not very maintainable. You should take a hard look at any more low-tech solutions available like adding columns to the db or doing more of the work in python code.
I don't want to sound like the bad guy here, but in my opinion your approach to the issue is far from optimal. If you're using PostgreSQL, you could simplify the whole thing using WITH ... ; but a better approach, factoring in my assumption that these posts will be read much more often than updated, would be to add some columns to your tables that are updated by triggers on insert/update to the other tables. If performance is ever likely to become an issue, this is the solution I'd go with.
I'm not very familiar with SQLAlchemy, so I can't write it out in clear code for you, but the only other solution I can come up with uses at least one subquery to select, for each of the columns in the group by, the value picked by the order by, and that will add significantly to your already slow query.

Filtering with joined tables

I'm trying to get some query performance improved, but the generated query does not look the way I expect it to.
The results are retrieved using:
query = (session.query(SomeModel).
         options(joinedload_all('foo.bar')).
         options(joinedload_all('foo.baz')).
         options(joinedload('quux.other')))
What I want to do is filter on the table joined via 'foo', but this way doesn't work:
query = query.filter(FooModel.address == '1.2.3.4')
It results in a clause like this attached to the query:
WHERE foos.address = '1.2.3.4'
Which doesn't do the filtering in a proper way, since the generated joins attach tables foos_1 and foos_2. If I try that query manually but change the filtering clause to:
WHERE foos_1.address = '1.2.3.4' AND foos_2.address = '1.2.3.4'
It works fine. The question is of course - how can I achieve this with sqlalchemy itself?
If you want to filter on joins, you use join():
session.query(SomeModel).join(SomeModel.foos).filter(Foo.something=='bar')
joinedload() and joinedload_all() are used only as a means to load related collections in one pass; they are not used for filtering or ordering! Please read:
http://docs.sqlalchemy.org/en/latest/orm/tutorial.html#joined-load - the note on "joinedload() is not a replacement for join()", as well as:
http://docs.sqlalchemy.org/en/latest/orm/loading.html#the-zen-of-eager-loading

How to filter by joinloaded table in SqlAlchemy?

Let's say I've got 2 models, Document and Person. Document has a relationship to Person via an "owner" property. Now:
session.query(Document)\
    .options(joinedload('owner'))\
    .filter(Person.is_deleted != True)
will join table Person twice. One Person table will be selected, and the doubled one will be filtered, which is not exactly what I want, because this way the document rows are not filtered.
What can I do to apply the filter to the joinedloaded table/model?
You are right, table Person will be used twice in the resulting SQL, but each occurrence serves a different purpose:
one is to filter on the condition: filter(Person.is_deleted != True)
the other is to eager-load the relationship: options(joinedload('owner'))
But the reason your query returns wrong results is that your filter condition is not complete. In order to make it produce the right results, you also need to JOIN the two models:
qry = (session.query(Document).
       join(Document.owner).  # THIS IS IMPORTANT
       options(joinedload(Document.owner)).
       filter(Person.is_deleted != True)
       )
This will return correct rows, even though it will still have 2 references (JOINs) to the Person table. The real solution to your query is to use contains_eager instead of joinedload:
qry = (session.query(Document).
       join(Document.owner).  # THIS IS STILL IMPORTANT
       options(contains_eager(Document.owner)).
       filter(Person.is_deleted != True)
       )

Table-column addressing in Python

Often I select data from a SQLite database into a list of dictionaries using something like:
conn.row_factory = sqlite3.Row
c = conn.cursor()
selection = c.execute('select * from myTable')
dataset = selection.fetchall()
dataset1 = [dict(row) for row in dataset]
However, given my database background (FoxPro, SQL Server, etc.) I am more used to the table.column format, which I can get using:
dataset2 = [RowObj(row) for row in dataset]
where
class RowObj(dict):
    def __getattr__(self, name):
        return self[name]
Question - What is preferable for column value addressing, table['column'] or table.column? In my opinion the latter looks neater. Is it just a matter of personal preference, or are there pros+cons of each approach?
I also need to bear in mind that one day the database might be changed from SQLite to something like MySQL, so I want minimal code changes if/when that happens.
I don't want to use an ORM package like SQLObject or SQLAlchemy at this stage - not until I am convinced they will benefit my applications.
Regards,
Alan
I fought the row['column'] syntax for a while, but in the end I prefer it. It has two distinct advantages:
row['class'] is correct syntax, but row.class is not; keywords cannot directly be used as property names.
And, more generally, if you ever craft a query whose column names are not valid property names (the above case included) the dictionary-style syntax will allow you to address that column. row.COUNT(*) is obviously not valid syntax, but row['COUNT(*)'] is, etc. (Yes, you could use AS in the query to alias, and that's fine of course. Still, it's a valid concern.)
Having said that, your RowObj class of course supports both means of addressing the columns. I'd still prefer consistency though, and if you have a class column, it's going to look weird if you address it differently: row.widget, row.dingus, row['class']. (One of these things is not like the other...)
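To make the trade-off concrete, here is a runnable sketch of both addressing styles, using the RowObj class from the question with the stdlib sqlite3 module (the table and its "class" column are made up to show the keyword problem):

```python
import sqlite3

class RowObj(dict):
    def __getattr__(self, name):
        return self[name]

conn = sqlite3.connect(':memory:')
conn.row_factory = sqlite3.Row
c = conn.cursor()
c.execute('CREATE TABLE myTable (id INTEGER, class TEXT)')
c.execute("INSERT INTO myTable VALUES (1, 'widget')")

dataset = [RowObj(dict(row)) for row in c.execute('select * from myTable').fetchall()]
row = dataset[0]

print(row['id'], row.id)   # both styles work for ordinary column names
print(row['class'])        # dictionary style works for keyword names...
# print(row.class)         # ...but attribute style is a SyntaxError here
```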
