Django queryset SUM positive and negative values - python

I have a model with an IntegerField named vote_threshold.
I need to get the total SUM of the vote_threshold values, treating negative values as positive.
vote_threshold
100
-200
-5
result = 305
Right now I am doing it like this:
earning = 0
result = Vote.objects.all().values('vote_threshold')
for v in result:
    if v['vote_threshold'] > 0:
        earning += v['vote_threshold']
    else:
        earning -= v['vote_threshold']
What is a faster and more proper way?

Use the Abs database function in Django:
from django.db.models import Sum
from django.db.models.functions import Abs

<YourModel>.objects.aggregate(s=Sum(Abs("vote_threshold")))

Try this:
objects = Vote.objects.extra(
    select={'abs_vote_threshold': 'abs(vote_threshold)'}
).values('abs_vote_threshold')
earning = sum(obj['abs_vote_threshold'] for obj in objects)

I don't think there is an easy way to do the calculation using the Django orm. Unless you have performance issues, there is nothing wrong with doing the calculation in python. You can simplify your code slightly by using sum() and abs().
votes = Vote.objects.all()
earning = sum(abs(v.vote_threshold) for v in votes)
If performance is an issue, you can use raw SQL.
from django.db import connection
cursor = connection.cursor()
cursor.execute("SELECT sum(abs(vote_threshold)) from vote")
row = cursor.fetchone()
earning = row[0]
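The raw SQL approach can be sanity-checked against an in-memory SQLite database (the table name and data below are illustrative stand-ins for the Vote model):

```python
import sqlite3

# Throwaway table mirroring the Vote model's vote_threshold column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vote (id INTEGER PRIMARY KEY, vote_threshold INTEGER)")
conn.executemany("INSERT INTO vote (vote_threshold) VALUES (?)",
                 [(100,), (-200,), (-5,)])

cursor = conn.cursor()
cursor.execute("SELECT sum(abs(vote_threshold)) FROM vote")
earning = cursor.fetchone()[0]
# earning == 305, matching the expected result from the question
```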

Here is an example if you want to sum the negative and positive values in one query (note that the if() function is MySQL-specific):
select = {'positive': 'sum(if(value>0, value, 0))',
          'negative': 'sum(if(value<0, value, 0))'}
summary = items.filter(query).extra(select=select).values('positive', 'negative')[0]
positive, negative = summary['positive'], summary['negative']
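The portable equivalent of MySQL's if() is a CASE expression; a quick sketch against in-memory SQLite (illustrative table and data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (value INTEGER)")
conn.executemany("INSERT INTO items (value) VALUES (?)",
                 [(100,), (-200,), (-5,)])

# CASE WHEN is the standard-SQL equivalent of MySQL's if().
positive, negative = conn.execute(
    "SELECT SUM(CASE WHEN value > 0 THEN value ELSE 0 END),"
    "       SUM(CASE WHEN value < 0 THEN value ELSE 0 END)"
    " FROM items").fetchone()
# positive == 100, negative == -205
```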


SQLAlchemy counting subquery result

I have an SQL query which I want to convert to use the ORM but I cannot get the ORM to count the results from the subquery.
So my working SQL is:
select FOO,
       BAR,
       TOTALCOUNT
from (
    select FOO,
           BAR,
           COUNT(BAR) OVER (PARTITION BY FOO) AS TOTALCOUNT
    from (
        SELECT DISTINCT [FOO],
                        [BAR]
        FROM [database].[dbo].[table]
    ) m
) m
WHERE TOTALCOUNT > 10
I have tried to create the equivalent code using the ORM, but my final result has just 1s for the final count. The code I have tried is below:
subs = session.query(table.FOO, table.BAR).filter(
    table.date > datetime.now() - timedelta(days=10),
).distinct().subquery()
result = pd.read_sql(
    session.query(subs.c.FOO, subs.c.BAR, func.count(subs.c.BAR).label('TOTALCOUNT'))
    .group_by(subs.c.FOO, subs.c.BAR).statement,
    session.bind)
I have also tried to do it in one query with:
result = pd.read_sql(
    session.query(table.FOO, table.BAR, func.count(table.BAR).label("TOTALCOUNT")).filter(
        and_(
            table.date > datetime.now() - timedelta(days=30),
        )
    ).group_by(table.FOO).order_by(table.FOO).distinct().statement,
    session.bind)
But that is counting the columns before applying the distinct operator, so the count is incorrect. I would really appreciate it if someone could assist me or tell me where I am going wrong; I have googled all morning and can't seem to find an answer.
Ah, I should pay more attention to what I am doing: I added the alias and then removed an additional column I was grouping by. Should anyone else ever struggle with something similar, here is the working code.
subs = session.query(table.FOO, table.BAR).filter(
    table.date > datetime.now() - timedelta(days=10),
).distinct().subquery().alias('subs')
result = pd.read_sql(
    session.query(subs.c.FOO, func.count(subs.c.BAR).label('TOTALCOUNT'))
    .group_by(subs.c.FOO).statement,
    session.bind)

SQLAlchemy - filtering func.count within query

Let's say that I have a table with a column, that has some integer values and I want to calculate the percentage of values that are over 200 for that column.
Here's the kicker, I would prefer if I could do it inside one query that I could use group_by on.
results = db.session.query(
    ClassA.some_variable,
    label('entries', func.count(ClassA.some_variable)),
    label('percent', *no clue*)
).filter(ClassA.value.isnot(None)).group_by(ClassA.some_variable)
Alternatively, it would be okay, though not preferred, to do the percentage calculation on the client side, something like this:
results = db.session.query(
    ClassA.some_variable,
    label('entries', func.count(ClassA.some_variable)),
    label('total_count', func.count(ClassA.value)),
    label('over_200_count', func.count(ClassA.value > 200)),
).filter(ClassA.value.isnot(None)).group_by(ClassA.some_variable)
But I obviously can't filter within the count statement, and I can't apply the filter at the end of the query, since if I apply the > 200 constraint at the end, total_count wouldn't work.
Using raw SQL is an option too; it doesn't have to be SQLAlchemy.
MariaDB unfortunately does not support the aggregate FILTER clause, but you can work around that using a CASE expression or NULLIF, since COUNT returns the count of non-null values of the given expression:
from sqlalchemy import case
...
func.count(case([(ClassA.value > 200, 1)])).label('over_200_count')
With that in mind you can calculate the percentage simply as
(func.count(case([(ClassA.value > 200, 1)])) * 1.0 /
func.count(ClassA.value)).label('percent')
though there's that one edge: what if func.count(ClassA.value) is 0? Depending on whether you'd consider 0 or NULL a valid return value you could either use yet another CASE expression or NULLIF:
dividend = func.count(case([(ClassA.value > 200, 1)])) * 1.0
divisor = func.count(ClassA.value)
# Zero
case([(divisor == 0, 0)],
     else_=dividend / divisor).label('percent')
# NULL
(dividend / func.nullif(divisor, 0)).label('percent')
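The two zero-divisor variants can be sketched in plain sqlite3 (illustrative table name; the table is left empty so both counts are 0):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No rows at all, so COUNT(value) is 0 -- the edge case under discussion.
conn.execute("CREATE TABLE class_a (value INTEGER)")

# NULLIF(0, 0) is NULL, so the division yields NULL instead of an error.
null_percent, = conn.execute(
    "SELECT COUNT(CASE WHEN value > 200 THEN 1 END) * 1.0"
    " / NULLIF(COUNT(value), 0) FROM class_a").fetchone()
# null_percent is None

# The CASE variant returns 0 for the empty case instead.
zero_percent, = conn.execute(
    "SELECT CASE WHEN COUNT(value) = 0 THEN 0"
    " ELSE COUNT(CASE WHEN value > 200 THEN 1 END) * 1.0 / COUNT(value) END"
    " FROM class_a").fetchone()
# zero_percent == 0
```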
Finally, you could create a compilation extension for mysql dialect that rewrites a FILTER clause to a suitable CASE expression:
from sqlalchemy.ext.compiler import compiles
from sqlalchemy.sql.expression import FunctionFilter
from sqlalchemy.sql.functions import Function
from sqlalchemy import case
@compiles(FunctionFilter, 'mysql')
def compile_functionfilter_mysql(element, compiler, **kwgs):
    # Support unary functions only
    arg0, = element.func.clauses
    new_func = Function(
        element.func.name,
        case([(element.criterion, arg0)]),
        packagenames=element.func.packagenames,
        type_=element.func.type,
        bind=element.func._bind)
    return new_func._compiler_dispatch(compiler, **kwgs)
With that in place you could express the dividend as
dividend = func.count(1).filter(ClassA.value > 200) * 1.0
which compiles to
In [28]: print(dividend.compile(dialect=mysql.dialect()))
count(CASE WHEN (class_a.value > %s) THEN %s END) * %s

Django Aggregate Max doesn't give correct max for CharField

I have this query to give me the next available key from the DB. It works just fine until it gets to 10, where it will say that 10 is available when it's not:
max_var = ShortUrl.objects.filter(is_custom=False).aggregate(max=Cast(Coalesce(Max('key'), 0),BigIntegerField()))['max'] + 1
The column is a CharField.
Any tips on how to fix this?
You need to annotate (cast string to int) first. Then you can find the aggregate of the casted value, i.e.:
from django.db.models import BigIntegerField
from django.db.models.functions import Cast
from django.db.models import Max
max_var = ShortUrl.objects.filter(is_custom=False) \
    .annotate(key_int=Cast('key', output_field=BigIntegerField())) \
    .aggregate(max=Max('key_int'))['max'] or 0
max_var += 1
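The underlying problem is that MAX on a CharField compares strings lexicographically, so '9' sorts above '10'. A quick demonstration in plain SQLite (illustrative table):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE short_url (key TEXT)")
conn.executemany("INSERT INTO short_url VALUES (?)", [("9",), ("10",)])

# String comparison: '9' > '10' lexicographically, so MAX picks the wrong row.
string_max, = conn.execute("SELECT MAX(key) FROM short_url").fetchone()
# string_max == '9'

# Casting first compares numerically, matching the annotate(Cast(...)) approach.
int_max, = conn.execute(
    "SELECT MAX(CAST(key AS INTEGER)) FROM short_url").fetchone()
# int_max == 10
```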

Cumulative (running) sum with django orm and postgresql

Is it possible to calculate the cumulative (running) sum using django's orm? Consider the following model:
class AModel(models.Model):
    a_number = models.IntegerField()
with a set of data where a_number = 1, such that I have several AModel instances in the database, all with a_number=1. I'd like to be able to return the following:
AModel.objects.annotate(cumsum=??).values('id', 'cumsum').order_by('id')
>>> ({id: 1, cumsum: 1}, {id: 2, cumsum: 2}, ... {id: N, cumsum: N})
Ideally I'd like to be able to limit/filter the cumulative sum. So in the above case I'd like to limit the result to cumsum <= 2
I believe that in postgresql one can achieve a cumulative sum using window functions. How is this translated to the ORM?
For reference, starting with Django 2.0 it is possible to use the Window function to achieve this result:
AModel.objects.annotate(cumsum=Window(Sum('a_number'), order_by=F('id').asc())) \
    .values('id', 'cumsum').order_by('id', 'cumsum')
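The SQL this Window annotation produces is a SUM(...) OVER (ORDER BY id) window call, which can be sketched directly in sqlite3 (requires SQLite 3.25+, bundled with recent Python builds; table and data are illustrative):

```python
import sqlite3  # window functions need SQLite >= 3.25

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE amodel (id INTEGER PRIMARY KEY, a_number INTEGER)")
conn.executemany("INSERT INTO amodel (a_number) VALUES (?)", [(1,), (1,), (1,)])

# Running sum ordered by id, mirroring Window(Sum('a_number'), order_by=...).
rows = conn.execute(
    "SELECT id, SUM(a_number) OVER (ORDER BY id) AS cumsum "
    "FROM amodel ORDER BY id").fetchall()
# rows == [(1, 1), (2, 2), (3, 3)]
```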
From Dima Kudosh's answer and based on https://stackoverflow.com/a/5700744/2240489 I had to do the following:
I removed the reference to PARTITION BY in the sql and replaced with ORDER BY resulting in.
AModel.objects.annotate(
    cumsum=Func(
        Sum('a_number'),
        template='%(expressions)s OVER (ORDER BY %(order_by)s)',
        order_by="id"
    )
).values('id', 'cumsum').order_by('id', 'cumsum')
This gives the following sql:
SELECT "amodel"."id",
SUM("amodel"."a_number")
OVER (ORDER BY id) AS "cumsum"
FROM "amodel"
GROUP BY "amodel"."id"
ORDER BY "amodel"."id" ASC, "cumsum" ASC
Dima Kudosh's answer was not summing the results but the above does.
For posterity, I found this to be a good solution for me. I didn't need the result to be a QuerySet, so I could afford to do this, since I was just going to plot the data using D3.js:
import numpy as np
import datetime

today = datetime.date.today()
raw_data = MyModel.objects.filter(date=today).values_list('a_number', flat=True)
cumsum = np.cumsum(raw_data)
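If pulling in numpy just for this feels heavy, the standard library's itertools.accumulate does the same running sum (the data below is a stand-in for the values_list() result):

```python
from itertools import accumulate

# accumulate yields the running sum lazily; list() materializes it.
raw_data = [1, 1, 1]  # stand-in for values_list('a_number', flat=True)
cumsum = list(accumulate(raw_data))
# cumsum == [1, 2, 3]
```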
You can try to do this with a Func expression.
from django.db.models import Func, Sum

AModel.objects.annotate(
    cumsum=Func(
        Sum('a_number'),
        template='%(expressions)s OVER (PARTITION BY %(partition_by)s)',
        partition_by='id'
    )
).values('id', 'cumsum').order_by('id')
Check this
AModel.objects.order_by("id").extra(
    select={"cumsum": 'SELECT SUM(m.a_number) FROM table_name m WHERE m.id <= table_name.id'}
).values('id', 'cumsum')
where table_name should be the name of the table in the database.
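That correlated-subquery SQL can be verified directly in sqlite3 (illustrative table named amodel, three rows of a_number=1 as in the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE amodel (id INTEGER PRIMARY KEY, a_number INTEGER)")
conn.executemany("INSERT INTO amodel (a_number) VALUES (?)", [(1,), (1,), (1,)])

# For each row, sum a_number over all rows with id <= this row's id.
rows = conn.execute(
    "SELECT id,"
    " (SELECT SUM(m.a_number) FROM amodel m WHERE m.id <= amodel.id) AS cumsum"
    " FROM amodel ORDER BY id").fetchall()
# rows == [(1, 1), (2, 2), (3, 3)]
```

Note that this correlated subquery runs once per row, so on large tables the window-function approach above will be much faster.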

how to make a rowcount in ponyorm? Python

I am using this Python ORM to manage my database in my application: ponyorm.com
I just changed an attribute in my table, and I want to do a row count that returns 1 for TRUE and 0 for FALSE.
Ex: Using sqlite3 only, I would do this:
user = conn.execute('SELECT * FROM users')
count = user.rowcount
if count == 1:
    print('Return %d lines' % count)
else:
    print('Bad....return %d lines' % count)
Using the rowcount attribute is usually not the right way to count the number of rows. According to the sqlite3 module documentation regarding the rowcount attribute:
Although the Cursor class of the sqlite3 module implements this attribute, the database engine’s own support for the determination of “rows affected”/”rows selected” is quirky.
If you use SQL, the standard way to get count of rows is to use COUNT(*) function. In SQLite it may be achieved in the following way:
cursor = conn.cursor()
cursor.execute('SELECT COUNT(*) FROM users')
rowcount = cursor.fetchone()[0]
With PonyORM you can do count in three alternative ways.
rowcount = select(u for u in User).count() # approach 1
rowcount = User.select().count() # approach 2
rowcount = count(u for u in User) # approach 3
All lines above should produce the same query. You can choose the line which looks the most intuitive to you.
If you want to count not all rows but only specific ones, you can add a condition to the query. For example, to count the number of products whose price is greater than 100, you can write any of the following lines:
rowcount = select(p for p in Product if p.price > 100).count() # approach 1
rowcount = Product.select(lambda p: p.price > 100).count() # approach 2
rowcount = count(p for p in Product if p.price > 100) # approach 3
Also you may want to count not the number of rows, but number of different values in specific column. For example, the number of distinct user countries. This may be done in the following way:
user_countries_count = select(count(u.country) for u in User).get()
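The distinct-count idea can be sanity-checked in plain sqlite3; a query like the PonyORM one above compiles to a COUNT(DISTINCT ...) (table and data here are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (country TEXT)")
conn.executemany("INSERT INTO user VALUES (?)", [("US",), ("US",), ("FR",)])

# Number of distinct user countries, not the number of rows.
user_countries_count, = conn.execute(
    "SELECT COUNT(DISTINCT country) FROM user").fetchone()
# user_countries_count == 2
```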
Hope I answered your question.
