Suppose I have two models:
class User(models.Model):
    name = models.CharField(max_length=42)
class Action(models.Model):
    user = models.ForeignKey(User)
    timestamp = models.DateTimeField(auto_now_add=True)
How can I find all users who have more actions on weekends (Saturday and Sunday) than on other days of the week, and vice versa?
Edit: I don't need to check this condition for one user, that would be easy. I need to select all users who have one of these conditions hold true.
This can be done in one query with the extra() method, passing a custom statement for the WHERE clause. This MySQL example selects all users where the number of actions during the weekend is less than or equal to the number of actions on other days:
f = {
    'user_table': User._meta.db_table,
    'action_table': Action._meta.db_table,
    'user_id': User._meta.pk.get_attname_column()[1],
    'user_fk': Action._meta.get_field('user').get_attname_column()[1],
    'timestamp': Action._meta.get_field('timestamp').get_attname_column()[1],
}
# In MySQL, DAYOFWEEK() returns 1 for Sunday and 7 for Saturday.
query = "(SELECT COUNT(*) FROM %(action_table)s \
WHERE %(action_table)s.%(user_fk)s = %(user_table)s.%(user_id)s \
AND DAYOFWEEK(%(action_table)s.%(timestamp)s) IN (1,7)) \
<= (SELECT COUNT(*) FROM %(action_table)s \
WHERE %(action_table)s.%(user_fk)s = %(user_table)s.%(user_id)s \
AND DAYOFWEEK(%(action_table)s.%(timestamp)s) NOT IN (1,7))" % f
users = User.objects.extra(where=[query])
The syntax might be slightly different for backends other than MySQL. You should of course adjust the table and column names to your situation.
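If you don't have MySQL at hand, the same comparison can be tried against an in-memory SQLite database. This is only a sketch under assumptions: the table names `app_user` and `app_action` and the helper `users_where` are made up for illustration, and SQLite's `strftime('%w', ...)` (0 = Sunday, 6 = Saturday) stands in for MySQL's `DAYOFWEEK`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app_user (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE app_action (
        id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES app_user(id),
        timestamp TEXT
    );
    INSERT INTO app_user (id, name) VALUES (1, 'alice'), (2, 'bob');
    -- 2022-01-01 is a Saturday, 2022-01-02 a Sunday, 2022-01-03 a Monday.
    INSERT INTO app_action (user_id, timestamp) VALUES
        (1, '2022-01-01 10:00:00'),
        (1, '2022-01-02 11:00:00'),
        (1, '2022-01-03 12:00:00'),
        (2, '2022-01-01 09:00:00'),
        (2, '2022-01-03 09:00:00'),
        (2, '2022-01-04 09:00:00');
""")


def users_where(comparison):
    """Return names of users whose weekend count <comparison> weekday count."""
    # The operator comes from a fixed whitelist, never from user input.
    if comparison not in (">", "<="):
        raise ValueError("unsupported comparison")
    sql = f"""
        SELECT u.name FROM app_user AS u
        WHERE (SELECT COUNT(*) FROM app_action AS a
               WHERE a.user_id = u.id
                 AND strftime('%w', a.timestamp) IN ('0', '6'))
            {comparison}
              (SELECT COUNT(*) FROM app_action AS a
               WHERE a.user_id = u.id
                 AND strftime('%w', a.timestamp) NOT IN ('0', '6'))
        ORDER BY u.name
    """
    return [row[0] for row in conn.execute(sql)]


weekend_heavy = users_where(">")   # more actions on weekends
weekday_heavy = users_where("<=")  # the "vice versa" case
print(weekend_heavy, weekday_heavy)
```

Both correlated subqueries are evaluated per user row, so a single statement answers the question for all users at once.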
In Django I built a model based on a database view, and in the admin I see values that are not present in the PostgreSQL database.
These are the ordered values in the db view:
but in the admin the values look lagged by one day:
The model in Django looks like this:
class VBalance(models.Model):
    id = models.IntegerField(primary_key=True)
    date = models.DateField()
    balance = models.DecimalField(max_digits=14, decimal_places=2)

    class Meta:
        managed = False  # Created from a view. Don't remove.
        db_table = 'v_balance'

    def __str__(self) -> str:
        # return "Balance: " + "{:.2f}".format(self.balance) + " on: " + self.date.strftime("%m/%d/%Y")
        return "Balance: " + str(self.balance) + " on: " + self.date.strftime("%m/%d/%Y")
Any ideas how to fix this?
EDIT - db screenshots with data:
The view v_balance is built like this:
CREATE OR REPLACE VIEW rap.v_balance
AS SELECT row_number() OVER () AS id,
balance_avg.date,
avg(balance_avg.balance) AS balance
FROM ( SELECT balance.date + 1 AS date,
sum(balance.amount) OVER (ORDER BY balance.date) AS balance
FROM ( SELECT COALESCE(b.date, dt.dt::date) AS date,
b.amount
FROM generate_series('2021-08-24 00:00:00+02'::timestamp with time zone, CURRENT_DATE::timestamp with time zone, '1 day'::interval) dt(dt)
LEFT JOIN import.balance b ON date(dt.dt) = b.date
UNION
SELECT transactions.date,
transactions.amount
FROM ( SELECT date(transactions_xtb.close_time) AS date,
sum(transactions_xtb.net_profit) AS amount
FROM import.transactions_xtb
GROUP BY (date(transactions_xtb.close_time))
ORDER BY (date(transactions_xtb.close_time))) transactions
ORDER BY 1) balance
ORDER BY (balance.date + 1)) balance_avg
GROUP BY balance_avg.date;
It's based on two tables: balance and transactions.
The balance table has only one row, and it's not null:
The transactions table also doesn't have null values:
What's more, it looks like Django lags the date, because in my view the most recent date is 2022-01-02 but in Django it is 2022-01-01.
So the dates are lagged but the values are not.
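One possible explanation, which nothing in this post confirms, is a session time zone difference. The view builds its dates by casting `timestamp with time zone` values (starting at `'2021-08-24 00:00:00+02'`) to `date`, and that cast depends on the session's time zone. If Django's connection runs in UTC (an assumption, for example when `USE_TZ` is enabled) while the database client uses +02, midnight local time falls on the previous calendar day. The sketch below only illustrates that effect with the standard library; the variable names are made up.

```python
from datetime import datetime, timedelta, timezone

# Midnight in a +02:00 zone, like the start literal used in the view.
local_midnight = datetime(2021, 8, 24, 0, 0, tzinfo=timezone(timedelta(hours=2)))

# The same instant as seen by a session whose time zone is UTC.
as_utc = local_midnight.astimezone(timezone.utc)

print(local_midnight.date())  # 2021-08-24
print(as_utc.date())          # 2021-08-23, one day earlier
```

If this is the cause, comparing `SHOW TimeZone;` in the database client and in Django's connection would show two different values.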
In a prototype application that uses Python and SQLAlchemy with a PostgreSQL database I have the following schema (excerpt):
class Guest(Base):
    __tablename__ = 'guest'
    id = Column(Integer, primary_key=True)
    name = Column(String(50))
    surname = Column(String(50))
    email = Column(String(255))
    [..]
    deleted = Column(Date, default=None)
I want to build a query, using SQLAlchemy, that retrieves the list of guests, to be displayed in the back-office.
To implement pagination I will be using LIMIT and OFFSET, and also COUNT(*) OVER() to get the total amount of records while executing the query (not with a different query).
An example of the SQL query could be:
SELECT id, name, surname, email,
COUNT(*) OVER() AS total
FROM guest
WHERE (deleted IS NULL)
ORDER BY id ASC
LIMIT 50
OFFSET 0
If I were to build the query using SQLAlchemy, I could do something like:
query = session.query(Guest)
query = query.filter(Guest.deleted == None)
query = query.order_by(Guest.id.asc())
query = query.offset(0)
query = query.limit(50)
result = query.all()
And if I wanted to count all the rows in the guests table, I could do something like this:
from sqlalchemy import func
query = session.query(func.count(Guest.id))
query = query.filter(Guest.deleted == None)
result = query.scalar()
Now the question I am asking is how to execute one single query, using SQLAlchemy, similar to the one above, that kills two birds with one stone (returns the first 50 rows and the count of the total rows to build the pagination links, all in one query).
The interesting bit is the use of window functions in PostgreSQL which allows the abovementioned behaviour, thus saving you from having to query twice but just once.
Is it possible?
Thanks in advance.
So I could not find any examples in the SQLAlchemy documentation, but I found these functions:
count()
over()
label()
And I managed to combine them to produce exactly the result I was looking for:
from sqlalchemy import func
query = session.query(Guest, func.count(Guest.id).over().label('total'))
query = query.filter(Guest.deleted == None)
query = query.order_by(Guest.id.asc())
query = query.offset(0)
query = query.limit(50)
result = query.all()
Cheers!
P.S. I also found this question on Stack Overflow, which was unanswered.
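To see what the window function does at the SQL level, here is a small sketch using the standard library's sqlite3 module instead of PostgreSQL (it needs SQLite 3.25 or newer for window functions). The sample data and the helper `fetch_page` are invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE guest (id INTEGER PRIMARY KEY, name TEXT, deleted TEXT)")
# 120 guests; every tenth one is marked as deleted.
conn.executemany(
    "INSERT INTO guest (name, deleted) VALUES (?, ?)",
    [(f"guest{i}", "2020-01-01" if i % 10 == 0 else None) for i in range(1, 121)],
)


def fetch_page(limit, offset):
    """Return one page of (id, name) rows plus the total number of matches."""
    rows = conn.execute(
        """
        SELECT id, name, COUNT(*) OVER () AS total
        FROM guest
        WHERE deleted IS NULL
        ORDER BY id ASC
        LIMIT ? OFFSET ?
        """,
        (limit, offset),
    ).fetchall()
    # The window is computed before LIMIT, so every row carries the same total.
    total = rows[0][2] if rows else 0
    return [(row[0], row[1]) for row in rows], total


guests, total = fetch_page(limit=50, offset=0)
print(len(guests), total)
```

One caveat of this approach: when the offset points past the last row, no rows come back and the total is unknown, so this sketch falls back to 0 in that case.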
I have a postgis database table called tasks, mapped to a python class Task using geoalchemy2/sqlalchemy - each entry has a MultiPolygon geometry and an integer state. Collectively, entries in my database cover a geographic region. I want to select a random entry of state=0 which is not geographically adjacent to any entry of state=1.
Here's code which selects a random entry of state=0:
class Task(Base):
    __tablename__ = "tasks"
    id = Column(Integer, primary_key=True, index=True)
    geometry = Column(Geometry('MultiPolygon', srid=4326))
    state = Column(Integer, default=0)
session = DBSession()
taskgetter = session.query(Task).filter_by(state=0)
count = taskgetter.count()
if count != 0:
    atask = taskgetter.offset(random.randint(0, count-1)).first()
So far so good. But now, how to make sure that they are not adjacent to another set of entries?
Geoalchemy has a function ST_Union which can unify geometries, and ST_Disjoint which detects if they intersect or not. So it seems I should be able to select items of state=1, union them into a single geometry, and then filter down my original query (above) to only keep the items that are disjoint from it. But I can't find a way to express this in geoalchemy. Here's one way I tried:
session = DBSession()
taskgetter = session.query(Task).filter_by(state=0) \
    .filter(Task.geometry.ST_Disjoint(session.query( \
        Task.geometry.ST_Union()).filter_by(state=1)))
count = taskgetter.count()
if count != 0:
    atask = taskgetter.offset(random.randint(0, count-1)).first()
and it yields an error like this:
ProgrammingError: (ProgrammingError) subquery in FROM must have an alias
LINE 3: FROM tasks, (SELECT ST_Union(tasks.geometry) AS "ST_Union_1"...
^
HINT: For example, FROM (SELECT ...) [AS] foo.
'SELECT count(*) AS count_1
FROM (SELECT tasks.id AS tasks_id
FROM tasks, (SELECT ST_Union(tasks.geometry) AS "ST_Union_1"
FROM tasks
WHERE tasks.state = %(state_1)s)
WHERE tasks.state = %(state_2)s AND ST_Disjoint(tasks.geometry, (SELECT ST_Union(tasks.geometry) AS "ST_Union_1"
FROM tasks
WHERE tasks.state = %(state_1)s))) AS anon_1' {'state_1': 1, 'state_2': 0}
A shot in the dark, as I don't have the setup to test it:
This seems to be related to SQLAlchemy's subqueries more than to GeoAlchemy. Try adding .subquery() at the end of your subquery to generate an alias (cf. http://docs.sqlalchemy.org/en/rel_0_9/orm/tutorial.html#using-subqueries)
Edit:
Still using info from the linked tutorial, I think this may work:
state1 = session.query(
    Task.geometry.ST_Union().label('taskunion')
).filter_by(state=1).subquery()
taskgetter = session.query(Task)\
    .filter_by(state=0)\
    .filter(Task.geometry.ST_Disjoint(state1.c.taskunion))
Add a label to the column you're creating on your subquery to reference it in your super-query.
How can I have a subquery in Django's queryset? For example, if I have:
select name, age from person, employee where person.id = employee.id and
employee.id in (select id from employee where employee.company = 'Private')
This is what I have done so far:
Person.objects.values('name', 'age')
Employee.objects.filter(company='Private')
but it's not working because it returns two outputs...
As mentioned by ypercube, your use case doesn't require a subquery.
But since many people land on this page to learn how to do a subquery, here is how it's done:
employee_query = Employee.objects.filter(company='Private').only('id').all()
Person.objects.values('name', 'age').filter(id__in=employee_query)
Source:
http://mattrobenolt.com/the-django-orm-and-subqueries/
ids = Employee.objects.filter(company='Private').values_list('id', flat=True)
Person.objects.filter(id__in=ids).values('name', 'age')
The correct answer to your question is here: https://docs.djangoproject.com/en/2.1/ref/models/expressions/#subquery-expressions
As an example:
>>> from django.db.models import OuterRef, Subquery
>>> newest = Comment.objects.filter(post=OuterRef('pk')).order_by('-created_at')
>>> Post.objects.annotate(newest_commenter_email=Subquery(newest.values('email')[:1]))
You can create subqueries in Django by using an unevaluated queryset to filter your main queryset. In your case, it would look something like this:
employee_query = Employee.objects.filter(company='Private')
people = Person.objects.filter(employee__in=employee_query)
I'm assuming that you have a reverse relationship from Person to Employee named employee. I found it helpful to look at the SQL query generated by a queryset when I was trying to understand how the filters work.
print(people.query)
As others have said, you don't really need a subquery for your example. You could just join to the employee table:
people2 = Person.objects.filter(employee__company='Private')
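To check that the subquery form and the join form really return the same rows, the two SQL shapes can be compared on toy data. This is a rough sketch using sqlite3 from the standard library; the schema mirrors the question's person/employee tables (sharing ids), and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    CREATE TABLE employee (id INTEGER PRIMARY KEY REFERENCES person(id), company TEXT);
    INSERT INTO person VALUES (1, 'Ann', 30), (2, 'Ben', 41), (3, 'Cy', 25);
    INSERT INTO employee VALUES (1, 'Private'), (2, 'Public'), (3, 'Private');
""")

# Shape produced by filtering with an unevaluated queryset (id__in=...).
with_subquery = conn.execute("""
    SELECT name, age FROM person
    WHERE id IN (SELECT id FROM employee WHERE company = 'Private')
    ORDER BY id
""").fetchall()

# Shape produced by filtering across the relation (a plain join).
with_join = conn.execute("""
    SELECT p.name, p.age FROM person AS p
    JOIN employee AS e ON e.id = p.id
    WHERE e.company = 'Private'
    ORDER BY p.id
""").fetchall()

print(with_subquery)
print(with_join)
```

Because each person has at most one employee row here, the join cannot duplicate people; with a one-to-many relation the join form may need DISTINCT to match the subquery form.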
hero_qs = Hero.objects.filter(category=OuterRef("pk")).order_by("-benevolence_factor")
Category.objects.all().annotate(most_benevolent_hero=Subquery(hero_qs.values('name')[:1]))
The generated SQL:
SELECT "entities_category"."id",
"entities_category"."name",
(SELECT U0."name"
FROM "entities_hero" U0
WHERE U0."category_id" = ("entities_category"."id")
ORDER BY U0."benevolence_factor" DESC
LIMIT 1) AS "most_benevolent_hero"
FROM "entities_category"
For more details, see this article.
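The generated SQL above is a correlated scalar subquery, and it can be run as-is on a small dataset to see how it behaves. The sketch below uses sqlite3 from the standard library; the table names follow the generated SQL, while the sample heroes and categories are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entities_category (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE entities_hero (
        id INTEGER PRIMARY KEY,
        name TEXT,
        category_id INTEGER REFERENCES entities_category(id),
        benevolence_factor INTEGER
    );
    INSERT INTO entities_category VALUES (1, 'Gods'), (2, 'Mortals'), (3, 'Empty');
    INSERT INTO entities_hero (name, category_id, benevolence_factor) VALUES
        ('Zeus', 1, 40), ('Athena', 1, 90), ('Achilles', 2, 55);
""")

rows = conn.execute("""
    SELECT c.id, c.name,
           (SELECT h.name FROM entities_hero AS h
            WHERE h.category_id = c.id          -- the OuterRef("pk") part
            ORDER BY h.benevolence_factor DESC
            LIMIT 1) AS most_benevolent_hero    -- the [:1] slice
    FROM entities_category AS c
    ORDER BY c.id
""").fetchall()

for row in rows:
    print(row)
```

A category without heroes gets NULL (None in Python) for the annotated column, which is worth keeping in mind when using the annotation in templates or further filters.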
Take good care with only() if your subqueries don't select the primary key.
Example:
class Customer:
    pass
class Order:
    customer: Customer
    pass
class OrderItem:
    order: Order
    is_recalled: bool
Customer has Orders
Order has OrderItems
Now we are trying to find all customers with at least one recalled order-item.(1)
This will not work properly
order_ids = OrderItem.objects \
    .filter(is_recalled=True) \
    .only("order_id")
customer_ids = Order.objects \
    .filter(id__in=order_ids) \
    .only('customer_id')
# BROKEN! BROKEN
customers = Customer.objects.filter(id__in=customer_ids)
The code above looks fine, but it produces the following query:
select * from customer where id in (
select id -- should be customer_id
from orders
where id in (
select id -- should be order_id
from order_items
where is_recalled = true))
Instead, one should use values(), which makes the subquery select the named column rather than the primary key:
order_ids = OrderItem.objects \
    .filter(is_recalled=True) \
    .values("order_id")
customer_ids = Order.objects \
    .filter(id__in=order_ids) \
    .values('customer_id')
customers = Customer.objects.filter(id__in=customer_ids)
(1) Note: in a real case we might consider 'WHERE EXISTS'
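The difference between the two generated queries is easy to reproduce with plain SQL. Here is a minimal sketch with sqlite3 from the standard library; the rows are invented and chosen so that the broken query does not just come back empty but returns the wrong customer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    CREATE TABLE order_items (id INTEGER PRIMARY KEY, order_id INTEGER, is_recalled INTEGER);
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Ben');
    INSERT INTO orders VALUES (1, 1), (2, 2);        -- order 1 is Ann's, order 2 is Ben's
    INSERT INTO order_items VALUES (1, 2, 0),        -- item 1: Ben's order, not recalled
                                   (2, 1, 1);        -- item 2: Ann's order, recalled
""")

# What the only() version generates: both subqueries select the primary key.
broken = [row[0] for row in conn.execute("""
    SELECT name FROM customer WHERE id IN (
        SELECT id FROM orders WHERE id IN (
            SELECT id FROM order_items WHERE is_recalled = 1))
""")]

# What was intended: each subquery selects the foreign key column.
fixed = [row[0] for row in conn.execute("""
    SELECT name FROM customer WHERE id IN (
        SELECT customer_id FROM orders WHERE id IN (
            SELECT order_id FROM order_items WHERE is_recalled = 1))
""")]

print(broken, fixed)
```

The broken query silently matches on ids that happen to coincide across tables, which is why the mistake can go unnoticed on small test data.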
I need to write a query that returns all objects less than or equal to a certain day of a certain month. The year is not important. It's easy enough to get an object by a particular day/month (assume now = datetime.datetime.now()):
posts = TodaysObject.objects.filter(publish_date__day=now.day, publish_date__month=now.month)
But I can't do this:
posts = TodaysObject.objects.filter(publish_date__day__lte=now.day, publish_date__month=now.month)
Seems that Django thinks I'm trying to do a join when combining multiple field lookups (publish_date__day__lte). What's the best way to do this in Django?
Try this:
Option 1:
from django.db.models import Q
# OR together one condition per day from 1 up to today's day of the month.
datafilter = Q()
for i in range(1, now.day+1):
    datafilter = datafilter | Q(publish_date__day=i)
datafilter = datafilter & Q(publish_date__month=now.month)
posts = TodaysObject.objects.filter(datafilter)
Option 2:
Perform raw sql query:
def query_dicts(query_string, *query_args):
    # Run a raw SQL query and yield each row as a dict keyed by column name.
    from django.db import connection
    cursor = connection.cursor()
    cursor.execute(query_string, query_args)
    col_names = [desc[0] for desc in cursor.description]
    while True:
        row = cursor.fetchone()
        if row is None:
            break
        yield dict(zip(col_names, row))
posts = query_dicts('SELECT * FROM tablename WHERE DAY(publish_date)<=%s AND MONTH(publish_date)=%s', now.day, now.month)
Using the extra() function:
posts = TodaysObject.objects.extra(where=['DAY(publish_date)<=%s AND MONTH(publish_date)=%s'], params=[now.day, now.month])
It's assumed that you are using MySQL. For PostgreSQL, you need to change DAY(publish_date) and MONTH(publish_date) to DATE_PART('DAY', publish_date) and DATE_PART('MONTH', publish_date) respectively.
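The raw SQL condition can be tried out without a MySQL or PostgreSQL server. The following is a rough sketch with sqlite3 from the standard library, where `strftime('%d', ...)` and `strftime('%m', ...)` stand in for `DAY()` and `MONTH()`; the `posts` table, its rows, and the helper `posts_up_to` are assumptions made for the example.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, publish_date TEXT)")
conn.executemany(
    "INSERT INTO posts (publish_date) VALUES (?)",
    [("2019-03-05",), ("2020-03-15",), ("2021-03-20",), ("2021-04-01",)],
)


def posts_up_to(reference):
    """Return publish dates in the reference month, on or before its day, any year."""
    rows = conn.execute(
        """
        SELECT publish_date FROM posts
        WHERE CAST(strftime('%d', publish_date) AS INTEGER) <= ?
          AND CAST(strftime('%m', publish_date) AS INTEGER) = ?
        ORDER BY publish_date
        """,
        (reference.day, reference.month),
    ).fetchall()
    return [row[0] for row in rows]


matches = posts_up_to(date(2022, 3, 15))
print(matches)
```

Passing the day and month as query parameters, rather than formatting them into the SQL string, keeps the statement safe if the values ever come from user input. Also note that Django 1.9 and later accept chained lookups such as publish_date__day__lte directly, so on those versions the filter from the question should work without raw SQL.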
It's not always portable from one database engine to another, but you may want to look into the extra() queryset method.
From the Django docs
This allows you to inject raw SQL to construct more complex queries than the Django queryset API allows.
If your application needs to be portable to different database engines, you can try restructuring so you have day, month, and year integer fields.
today = datetime.date.today()
post = TodaysObject.objects.raw("SELECT * FROM (app_name)_todaysobject WHERE DAY(publish_date) = %s AND MONTH(publish_date) = %s", [today.day, today.month])