I want to create the following query in Django:
select field1, count(field1), log(count(field1)) from object_table
where parent_id = 12345
group by field1;
I've implemented field1, count(field1) and group by field1 with the following:
from django.db.models import Count
Object.objects.filter(
parent = 12345
).values_list(
'field1'
).annotate(
count=Count('field1')
)
However if I add something like this
.extra(
select={'_log':'log(count)'}
)
it doesn't affect my results. Could you give me a clue as to what I am doing wrong? How can I implement log(count(field)) in Django?
PS, I'm using Django 1.9.
Thanks in advance!
Note that some databases (e.g. SQLite) don't natively support a logarithm function. This is probably an operation that should be done in your Python code instead of in the database query.
import math
for obj in object_list:
    # log10 matches PostgreSQL's log(); use math.log() for the natural logarithm
    obj._log = math.log10(obj.count)
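The query in the question uses values_list() followed by annotate(), which yields (field1, count) tuples rather than objects with attributes. A minimal sketch of the Python-side computation for that shape follows; the sample tuples are hypothetical stand-ins for the queryset results, and the helper name with_log is made up for illustration.

```python
import math

# Hypothetical stand-in for the (field1, count) tuples that
# values_list('field1').annotate(count=Count('field1')) would yield.
grouped_rows = [("a", 3), ("b", 2), ("c", 1)]


def with_log(rows):
    """Append the base-10 logarithm of the count to each (value, count) tuple."""
    return [(value, count, math.log10(count)) for value, count in rows]


result = with_log(grouped_rows)
for value, count, log_count in result:
    print(value, count, log_count)
```

Every count produced by a GROUP BY is at least 1, so math.log10() should not see a zero here.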
If you are certain you can rely on a database function and you want the database to perform the computation, you can use raw queries. For example, PostgreSQL implements a log function (base 10):
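from django.db import connection

# Manager.raw() expects the primary key among the selected columns, which a
# grouped query does not provide, so run the SQL through a cursor instead.
query = """\
select count(field1), log(count(field1)) as logvalue
from myapp_mymodel
group by field1"""
with connection.cursor() as cursor:
    cursor.execute(query)
    for count, logvalue in cursor.fetchall():
        print(logvalue)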
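On a backend that lacks a logarithm, the database-side computation may still be possible by registering a function on the connection. The sketch below uses the standard-library sqlite3 module directly rather than Django; the function name py_log10 and the sample rows are assumptions made for illustration, while the table and column names follow the question.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")

# Register a one-argument SQL function backed by Python's math.log10.
# "py_log10" is a hypothetical name chosen to avoid clashing with built-ins.
conn.create_function("py_log10", 1, math.log10)

conn.execute("create table object_table (parent_id integer, field1 text)")
conn.executemany(
    "insert into object_table values (?, ?)",
    [(12345, "a"), (12345, "a"), (12345, "b"), (99999, "a")],
)

# Same shape as the query in the question, with the registered function.
rows = conn.execute(
    """
    select field1, count(field1), py_log10(count(field1)) as logvalue
    from object_table
    where parent_id = ?
    group by field1
    order by field1
    """,
    (12345,),
).fetchall()

for field1, count, logvalue in rows:
    print(field1, count, logvalue)

conn.close()
```

The logarithm is still evaluated by Python here, only invoked from inside the SQL statement, so this is mainly useful when the value is needed within the query itself (for example in an ORDER BY).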
I need a little help. I come from working with relational data models and am now venturing into the Django framework. I need to build an API that returns something like this SQL query:
SELECT user_profile_userprofile.email,
user_profile_userprofile.name,
business_unity.active AS status,
profile.name AS profile,
business_unity.name AS unit
FROM user_profile_userprofile
JOIN profile ON user_profile_userprofile.id = profile.id
JOIN user_profile_unity ON user_profile_unity.user_id = user_profile_userprofile.id
JOIN business_unity ON user_profile_unity.id = business_unity.id;
The models are already created, but I don't know how to write a view in Python that meets the conditions of this query.
Basically, you need to "preload" the associations using select_related, and then you just use the normal methods to navigate them.
Assuming your models are something like
class UserProfile(Model):
    profile = ForeignKey("Profiles", ...)
    unity = ForeignKey("Unity", ...)

class Unity(Model):
    business_unity = ForeignKey("business.Unity", ...)

# select_related() is a manager/queryset method, so go through .objects
qs = UserProfile.objects.select_related("profile", "unity__business_unity").all()
up = qs[0]
Now you have all your data loaded (and more)
print(up.email)
print(up.name)
print(up.unity.business_unity.active)
print(up.profile.name)
print(up.unity.business_unity.name)
I am using sqlalchemy with a database that doesn't support subselects. What that means is that something like this wouldn't work (where Calendar is a model inheriting a declarative base):
Calendar.query.filter_by(uuid=uuid).count()
I am trying to override the count method with something like this:
def count(self):
    col = func.count(literal_column("'uuid'"))
    return self.from_self(col).scalar()
However, the from_self bit still does the subselect. I can't do something like this:
session.query(sql.func.count(Calendar.uuid)).scalar()
Because I want all the filter information from the Query. Is there a way I can get the filter arguments for the current Query without doing the subselect?
Thanks~
From the SQLAlchemy documentation:
For fine grained control over specific columns to count, to skip the usage of a subquery or otherwise control of the FROM clause, or to use other aggregate functions, use func expressions in conjunction with query(), i.e.:
from sqlalchemy import func
# count User records, without
# using a subquery.
session.query(func.count(User.id))
# return count of user "id" grouped
# by "name"
session.query(func.count(User.id)).\
group_by(User.name)
from sqlalchemy import distinct
# count distinct "name" values
session.query(func.count(distinct(User.name)))
Source: SQLAlchemy (sqlalchemy.orm.query.Query.count)
A short introduction to the problem...
PostgreSQL has very neat array fields (int array, string array) and functions for them like UNNEST and ANY.
These fields are supported by Django (I am using djorm_pgarray for that), but functions are not natively supported.
One could use .extra(), but Django 1.8 introduced a new concept of database functions.
Let me provide a very simple example of what I am basically doing with all of this. A Dealer has a list of makes that it supports. A Vehicle has a make and is linked to a dealer. Sometimes, inevitably, a Vehicle's make does not match its Dealer's make list.
MAKE_CHOICES = [('honda', 'Honda'), ...]
class Dealer(models.Model):
    make_list = TextArrayField(choices=MAKE_CHOICES)

class Vehicle(models.Model):
    dealer = models.ForeignKey(Dealer, null=True, blank=True)
    make = models.CharField(max_length=255, choices=MAKE_CHOICES, blank=True)
Having a database of dealers and makes, I want to count all vehicles for which the vehicle's make and its dealer's make list do match. That's how I do it avoiding .extra().
from django.db.models import functions
class SelectUnnest(functions.Func):
    function = 'SELECT UNNEST'
...
Vehicle.objects.filter(
make__in=SelectUnnest('dealer__make_list')
).count()
Resulting SQL:
SELECT COUNT(*) AS "__count" FROM "myapp_vehicle"
INNER JOIN "myapp_dealer"
ON ( "myapp_vehicle"."dealer_id" = "myapp_dealer"."id" )
WHERE "myapp_vehicle"."make"
IN (SELECT UNNEST("myapp_dealer"."make_list"))
And it works, and much faster than a traditional M2M approach we could use in Django. BUT, for this task, UNNEST is not a very good solution: ANY is much faster. Let's try it.
class Any(functions.Func):
    function = 'ANY'
...
Vehicle.objects.filter(
make=Any('dealer__make_list')
).count()
It generates the following SQL:
SELECT COUNT(*) AS "__count" FROM "myapp_vehicle"
INNER JOIN "myapp_dealer"
ON ( "myapp_vehicle"."dealer_id" = "myapp_dealer"."id" )
WHERE "myapp_vehicle"."make" =
(ANY("myapp_dealer"."make_list"))
And it fails, because the parentheses around ANY are bogus. If you remove them, it runs in the psql console with no problems, and fast.
So, my question:
Is there any way to remove these parentheses? I could not find anything about that in the Django documentation.
If not, are there other ways to rephrase this query?
P. S. I think that an extensive library of database functions for different backends would be very helpful for database-heavy Django apps.
Of course, most of these will not be portable. But you typically do not often migrate such a project from one database backend to another. In our example, using array fields and PostGIS we are stuck to PostgreSQL and do not intend to move.
Is anybody developing such a thing?
P. P. S. One might say that, in this case, we should be using a separate table for makes and intarray instead of string array, that is correct and will be done, but nature of problem does not change.
UPDATE.
TextArrayField is defined in djorm_pgarray; the linked source file shows how it works.
The value is a list of text strings. In Python, it is represented as a list. Example: ['honda', 'mazda', 'anything else'].
Here is how it looks in the database.
=# select id, make from appname_tablename limit 3;
id | make
---+----------------------
58 | {vw}
76 | {lexus,scion,toyota}
39 | {chevrolet}
And the underlying PostgreSQL field type is text[].
I've managed to get (more or less) what you need using the following:
from django.db.models import F
from django.db.models.lookups import BuiltinLookup
from django.db.models.fields import Field

class Any(BuiltinLookup):
    lookup_name = 'any'

    def get_rhs_op(self, connection, rhs):
        # Render the right-hand side as "= ANY(<rhs>)"
        return " = ANY(%s)" % (rhs,)

Field.register_lookup(Any)
and query:
Vehicle.objects.filter(make__any=F('dealer__make_list')).count()
as result:
SELECT COUNT(*) AS "__count" FROM "zz_vehicle"
INNER JOIN "zz_dealer" ON ("zz_vehicle"."dealer_id" = "zz_dealer"."id")
WHERE "zz_vehicle"."make" = ANY(("zz_dealer"."make_list"))
By the way, instead of djorm_pgarray and TextArrayField you can use Django's native ArrayField (from django.contrib.postgres.fields):
make_list = ArrayField(models.CharField(max_length=200), blank=True)
(to simplify your dependencies)
I have the following model:
class Ticket(models.Model):
    # ... other fields omitted
    active_at = models.DateTimeField()
    duration = models.DurationField()
Given now = datetime.now(), I'd like to retrieve all records for which now is between active_at and active_at + duration.
I'm using Django 1.8. Here are the DurationField docs.
As noted in the documentation, arithmetic with a DurationField will not always work as expected in databases other than PostgreSQL. I don't know to what extent this works or doesn't work in other databases, so you'll have to try it yourself.
If that works, you can use the following query:
from django.db.models import F
active_tickets = Ticket.objects.filter(active_at__lte=now,
active_at__gt=now-F('duration'))
The F object refers to a field in the database, duration in this case.
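The filter relies on rearranging the condition so that only active_at stands on the left-hand side. A plain-Python sketch of that rearrangement, with no database involved, may make it easier to check; both function names are hypothetical, and the upper bound is treated as exclusive to match the __gt lookup above.

```python
from datetime import datetime, timedelta


def is_active(active_at, duration, now):
    """Direct form of the requirement: active_at <= now < active_at + duration."""
    return active_at <= now < active_at + duration


def is_active_rearranged(active_at, duration, now):
    """Form used by the filter: active_at <= now and active_at > now - duration."""
    return active_at <= now and active_at > now - duration


now = datetime(2020, 1, 1, 12, 0)
cases = [
    (datetime(2020, 1, 1, 11, 0), timedelta(hours=2)),     # still running
    (datetime(2020, 1, 1, 11, 0), timedelta(minutes=30)),  # already expired
    (datetime(2020, 1, 1, 13, 0), timedelta(hours=1)),     # not started yet
    (datetime(2020, 1, 1, 11, 0), timedelta(hours=1)),     # ends exactly at now
]
results = [
    (is_active(a, d, now), is_active_rearranged(a, d, now)) for a, d in cases
]
for pair in results:
    print(pair)
```

Both forms agree on every case, which is why the query can compare the active_at column against now - F('duration') instead of adding the two columns.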
Assume that the file models.py in my Django application (webapp) looks like the following:
from django.db import models
from django.db import connection
class Foo(models.Model):
    name = models.CharField(...)
    surname = models.CharField(...)

def dictfetchall(cursor):
    "Returns all rows from a cursor as a dict"
    desc = cursor.description
    return [
        dict(zip([col[0] for col in desc], row))
        for row in cursor.fetchall()
    ]

def get_foo():
    cursor = connection.cursor()
    cursor.execute('SELECT * FROM foo_table')
    rows = dictfetchall(cursor)
    return rows
To access my database content, I have basically two options:
Option 1:
from webapp.models import Foo
bar = Foo.objects.raw('SELECT * FROM foo_table')
Option 2:
from webapp.models import get_foo
bar = get_foo()
Which option executes faster?
Is there a better way to do what I want to do?
There is no direct and clear answer on which approach is better.
Using Manager.raw() still keeps you within the ORM layer and while it returns Model instances you still have a nice database abstraction. But, while making a raw query, django does more than just cursor.execute in order to translate the results into Model instances (see what is happening in RawQuerySet and RawQuery classes).
But (quote from docs):
Sometimes even Manager.raw() isn’t quite enough: you might need to
perform queries that don’t map cleanly to models, or directly execute
UPDATE, INSERT, or DELETE queries.
So, generally speaking, which one to choose depends on what results you are going to get and what you are going to do with them.
See also:
Performing raw SQL queries
executing-custom-sql-directly
Raw sql queries in Django views
Using the connection cursor is certainly faster than using raw(), as it doesn't instantiate additional objects. But to really tell which solution is fastest, you should do some benchmarking!
And don't over-optimize unless you have a serious performance problem, because you give up some of Django's most useful features this way. If you do have performance problems, they will most likely not be caused by how you execute the query. Of course, you will be able to write better queries if you know your use case exactly and the ORM doesn't.
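The benchmarking suggested above could be set up along these lines. This is only a rough sketch using the standard-library sqlite3 and timeit modules: the plain Foo class is a hypothetical stand-in for per-row object creation and does not reproduce what Django's RawQuerySet actually does, so the numbers indicate the general shape of the comparison rather than real Django timings.

```python
import sqlite3
import timeit


class Foo:
    """Hypothetical plain class standing in for a per-row model instance."""

    def __init__(self, name, surname):
        self.name = name
        self.surname = surname


def dictfetchall(cursor):
    """Return all rows from a DB-API cursor as a list of dicts."""
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


conn = sqlite3.connect(":memory:")
conn.execute("create table foo_table (name text, surname text)")
conn.executemany(
    "insert into foo_table values (?, ?)",
    [("name%d" % i, "surname%d" % i) for i in range(1000)],
)


def fetch_dicts():
    # Cursor route: rows become plain dicts.
    cursor = conn.execute("select name, surname from foo_table")
    return dictfetchall(cursor)


def fetch_objects():
    # Object route: one instance is built per row.
    cursor = conn.execute("select name, surname from foo_table")
    return [Foo(*row) for row in cursor.fetchall()]


dict_time = timeit.timeit(fetch_dicts, number=50)
object_time = timeit.timeit(fetch_objects, number=50)
print("dicts:   %.4fs" % dict_time)
print("objects: %.4fs" % object_time)
```

For a meaningful answer, the same timeit pattern would need to wrap Foo.objects.raw(...) and get_foo() inside the real project, against the real database.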