How to make/use a custom database function in Django - python

Prologue:
This is a question that arises often on SO:
Equivalent of PostGIS ST_MakeValid in Django GEOS
Geodjango: How to Buffer From Point
Get random point from django PolygonField
Django custom for complex Func (sql function)
and can be applied to the above as well as in the following:
Django F expression on datetime objects
I wanted to compose an example on SO Documentation but since it got shut down on August 8, 2017, I will follow the suggestion of this widely upvoted and discussed meta answer and write my example as a self-answered post.
Of course, I would be more than happy to see any different approach as well!!
Question:
Django/GeoDjango has some database functions like Lower() or MakeValid() which can be used like this:
Author.objects.create(name='Margaret Smith')
author = Author.objects.annotate(name_lower=Lower('name')).get()
print(author.name_lower)
Is there any way to use and/or create my own custom database function based on existing database functions like:
Position() (MySQL)
TRIM() (SQLite)
ST_MakePoint() (PostgreSQL with PostGIS)
How can I apply/use those functions in Django/GeoDjango ORM?

Django provides the Func() expression to facilitate the calling of database functions in a queryset:
Func() expressions are the base type of all expressions that involve database functions like COALESCE and LOWER, or aggregates like SUM.
There are two options for using a database function in the Django/GeoDjango ORM:
For convenience, let us assume that the model is named MyModel and that the substring is stored in a variable named subst:
from django.db import models
from django.contrib.gis.db import models as gis_models

class MyModel(models.Model):
    name = models.CharField(max_length=100)  # max_length is required for CharField
    the_geom = gis_models.PolygonField()
Option 1: Use Func() to call the function directly:
We will also need the following to make our queries work:
Aggregation to add a field to each entry in our database.
F() which allows the execution of arithmetic operations on and between model fields.
Value() which will sanitize any given value (important to protect against SQL injection)
The query:
from django.db.models import F, Func, Value

MyModel.objects.aggregate(
    pos=Func(F('name'), Value(subst), function='POSITION')
)
Option 2: Create your own database function by extending Func:
We can extend Func class to create our own database functions:
class Position(Func):
    function = 'POSITION'
and use it in a query:
MyModel.objects.aggregate(pos=Position('name', Value(subst)))
GeoDjango Appendix:
In GeoDjango, in order to use a GIS-related function (like PostGIS's ST_Transform), the Func() expression must be replaced by GeoFunc(), but it is essentially used under the same principles:
from django.contrib.gis.db.models.functions import GeoFunc

class Transform(GeoFunc):
    function = 'ST_Transform'
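A hypothetical usage sketch (note that GeoDjango also ships its own built-in Transform function; the field name and target SRID here are just examples):

# Annotate each row with its polygon reprojected to SRID 3857:
MyModel.objects.annotate(geom_3857=Transform('the_geom', 3857))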
There are more complex cases of GeoFunc usage and an interesting use case has emerged here: How to calculate Frechet Distance in Django?
Generalized custom database function Appendix:
If you want to create a custom database function (Option 2) that can be used with any database backend without knowing it beforehand, you can use Func's as_<database-name> methods, provided that an equivalent function exists in each database:
class Position(Func):
    function = 'POSITION'  # MySQL method

    def as_sqlite(self, compiler, connection):
        # SQLite method
        return self.as_sql(compiler, connection, function='INSTR')

    def as_postgresql(self, compiler, connection):
        # PostgreSQL method
        return self.as_sql(compiler, connection, function='STRPOS')
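With these overrides in place, the same ORM call runs unchanged against all three backends, for example annotating every row instead of aggregating:

MyModel.objects.annotate(pos=Position('name', Value(subst)))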

Related

DatabaseSessionIsOver with Pony ORM due to lazy loading?

I am using Pony ORM for a Flask solution and I've come across the following issue. Consider this code:
@db_session
def get_orders_of_the_week(self, user, date):
    q = select(o for o in Order for s in o.supplier if o.user == user)
    q2 = q.filter(lambda o: o.date >= date and o.date <= date + timedelta(days=7))
    res = q2[:]
    # for r in res:
    #     print r.supplier.name
    return res
When I need the result in Jinja2, which looks like this:
{% for order in res %}
    Supplier: {{ order.supplier.name }}
{% endfor %}
I get a
DatabaseSessionIsOver: Cannot load attribute Supplier[3].name: the database session is over
If I uncomment the for r in res part, it works fine. I suspect there is some sort of lazy loading that doesn't get loaded with res = q2[:].
Am I completely missing a point or what's going on here?
I just added prefetch functionality that should solve your problem. You can take working code from the GitHub repository. This feature will be part of the upcoming release Pony ORM 0.5.4.
Now you can write:
q = q.prefetch(Supplier)
or
q = q.prefetch(Order.supplier)
and Pony will automatically load related supplier objects.
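Applied to the function from the question, a minimal sketch might look like this (assuming Pony ORM >= 0.5.4):

@db_session
def get_orders_of_the_week(self, user, date):
    q = select(o for o in Order if o.user == user)
    q = q.filter(lambda o: o.date >= date and o.date <= date + timedelta(days=7))
    q = q.prefetch(Order.supplier)  # suppliers are loaded within the session
    return q[:]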
Below I'll show several queries with prefetching, using the standard Pony example with Students, Groups and Departments.
from pony.orm.examples.presentation import *
Loading Student objects only, without any prefetching:
students = select(s for s in Student)[:]
Loading students together with groups and departments:
students = select(s for s in Student).prefetch(Group, Department)[:]
for s in students:  # no additional query to the DB is required
    print s.name, s.group.major, s.group.dept.name
The same as above, but specifying attributes instead of entities:
students = select(s for s in Student).prefetch(Student.group, Group.dept)[:]
for s in students:  # no additional query to the DB is required
    print s.name, s.group.major, s.group.dept.name
Loading students and their courses (a many-to-many relationship):
students = select(s for s in Student).prefetch(Student.courses)
for s in students:
    print s.name
    for c in s.courses:  # no additional query to the DB is required
        print c.name
As parameters of the prefetch() method you can specify entities and/or attributes. If you specify an entity, then all to-one attributes of that type will be prefetched. If you specify an attribute, then that specific attribute will be prefetched. To-many attributes are prefetched only when specified explicitly (as in the Student.courses example). Prefetching works recursively, so you can load a long chain of attributes, such as student.group.dept.
When an object is prefetched, all of its attributes are loaded by default, except lazy attributes and to-many attributes. You can prefetch lazy and to-many attributes explicitly if needed.
I hope this new method fully covers your use case. If something does not work as expected, please open a new issue on GitHub. You can also discuss functionality and make feature requests on the Pony ORM mailing list.
P.S. I'm not sure that the repository pattern you use gives you serious benefits. I think it actually increases coupling between template rendering and the repo implementation, because you may need to change the repo implementation (i.e. add new entities to the prefetching list) when the template code starts using new attributes. With the top-level @db_session decorator you can just send the query result to the template and everything happens automatically, without the need for explicit prefetching. But maybe I'm missing something, so I'd be interested to see additional comments about the benefits of using the repository pattern in your case.
This happens because you're trying to access a related object that was not loaded, and since you're accessing it outside of the database session (i.e. outside the function decorated with db_session), Pony raises this exception.
The recommended approach is to use the db_session decorator at the top level, in the same place where you put Flask's app.route decorator:
@app.route('/index')
@db_session
def index():
    ....
    return render_template(...)
This way all calls to the database will be wrapped with the database session, which will be finished after a web page is generated.
If there is a reason you want to narrow the database session down to a single function, then you need to iterate over the returned objects inside the function decorated with db_session and access all the necessary related objects. Pony will use the most effective way of loading the related objects from the database, avoiding the N+1 query problem. This way Pony will extract all the necessary objects within the db_session scope, while the connection to the database is still active.
Update:
Right now, for loading the related objects, you should iterate over the query result and call the related object attribute:
for r in res:
    r.supplier.name
It is similar to the code in your example; I just removed the print statement. When you 'touch' the r.supplier.name attribute, Pony loads all non-lazy attributes of the related supplier object. If you need lazy attributes loaded too, you have to touch each of them separately.
It seems that we need to introduce a way to specify which related objects should be loaded during query execution. We will add this feature in one of the future releases.

Mapping a table function to a model using SQLAlchemy

I have a CTE which resides in a table function that requires a parameter to be passed into it. The data I need is then retrieved with something like:
SELECT * FROM myThingFunction('e543149c-6589-49c6-b962-bf2503c0e278')
What I would like to do, if possible, is map a SQLAlchemy model to it so that I can apply filters, limits, etc. to the returned record set, for example:
qry = session.query(Thing).limit(100)
What I am struggling with is how to handle the parameter. I know that I am treating the function like a table, which feels a bit wrong since the function is more of a composite set of relations than a mapping to just one type of domain object, but I need to get this data into Python somehow.
Have you seen the recipe for mapping arbitrary selects? You can write a factory method which returns a class representing the query for a given parameter:
def myThingFunction(param):
    # build the select using `param`; to be mappable, the select generally
    # needs an alias() and a primary key
    tmpSelect = select(..)
    class tmpCls(Base):
        __table__ = tmpSelect
    return tmpCls
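Hypothetical usage of such a factory, matching the query from the question:

Thing = myThingFunction('e543149c-6589-49c6-b962-bf2503c0e278')
qry = session.query(Thing).limit(100)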
But there is a note below this recipe which states that creating a mapping is unnecessary. I haven't tried it, but in principle,
session.query(func.myThingFunction("bar")).all()
might work, too. (func.foo creates a GenericFunction on the fly, see the documentation for FunctionElement and below.)
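For completeness: newer SQLAlchemy releases (1.4+) can expose such a function as a table directly via FunctionElement.table_valued(). A hedged sketch, where the column names "id" and "name" are hypothetical:

from sqlalchemy import create_engine, func, select

engine = create_engine("postgresql://user:pass@localhost/mydb")  # hypothetical DSN

# Expose the function's result set as a selectable with named columns.
thing = func.myThingFunction(
    "e543149c-6589-49c6-b962-bf2503c0e278"
).table_valued("id", "name")

stmt = select(thing).where(thing.c.name == "foo").limit(100)

with engine.connect() as conn:
    for row in conn.execute(stmt):
        print(row)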

Python: Assign an object's variable from a function (OpenERP)

I'm working in an OpenERP environment, but maybe my issue can be answered from a pure Python perspective. What I'm trying to do is define a class whose _columns variable can be set from a function that returns the respective dictionary. So basically:
class repos_report(osv.osv):
    _name = "repos.report"
    _description = "Reposition"
    _auto = False

    def _get_dyna_cols(self):
        ret = {}
        cr = self.cr
        cr.execute('Select ... From ...')
        pass  # <- fill the dictionary here
        return ret

    _columns = _get_dyna_cols()

    def init(self, cr):
        pass  # other stuff here too, but I need to set my _columns first, as per OpenERP

repos_report()
I have tried many ways, but this code reflects my basic need. When I execute my module for installation I get the following error:
TypeError: _get_dyna_cols() takes exactly 1 argument (0 given)
When defining the _get_dyna_cols function I'm required to have self as its first parameter (even before executing). Also, I need a reference to OpenERP's cr cursor in order to query data to fill my _columns dictionary. So, how can I call this function so that its result can be assigned to _columns? What parameter could I pass to it?
From an OpenERP perspective, I guess I made my need quite clear. So any other approach suggested is also welcome.
From an OpenERP perspective, the right solution depends on what you're actually trying to do, and that's not quite clear from your description.
Usually the _columns definition of a model must be static, since it will be introspected by the ORM and (among other things) will result in the creation of corresponding database columns. You could set the _columns in the __init__ method (not init [1]) of your model, but that would not make much sense, because the result must not change over time (and it will only get called once, when the model registry is initialized, anyway).
Now there are a few exceptions to the "static columns" rules:
Function Fields
When you simply want to dynamically handle read/write operations on a virtual column, you can simply use a column of the fields.function type. It needs to emulate one of the other field types, but can do anything it wants with the data dynamically. Typical examples will store the data in other (real) columns after some pre-processing. There are hundreds of examples in the official OpenERP modules.
Dynamic columns set
When you are developing a wizard model (a subclass of TransientModel, formerly osv_memory), you don't usually care about the database storage, and simply want to obtain some input from the user and take corresponding actions.
It is not uncommon in that case to need a completely dynamic set of columns, where the number and types of the columns may change every time the model is used. This can be achieved by overriding a few key API methods to simulate dynamic columns:
fields_view_get is the API method that is called by the clients to obtain the definition of a view (form/tree/...) for the model.
fields_get is included in the result of fields_view_get but may be called separately, and returns a dict with the columns definition of the model.
search, read, write and create are called by the client in order to access and update record data, and should gracefully accept or return values for the columns that were defined in the result of fields_get
By overriding these methods properly, you can completely implement dynamic columns, but you will need to preserve the API behavior and handle the persistence of the data (if any) yourself, in real static columns or in other models; a rough sketch of the fields_get part follows.
There are a few examples of such dynamic column sets in the official addons, for example in the survey module, which needs to simulate survey forms based on the definition of the survey campaign.
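A hedged sketch of the fields_get override (OpenERP 6.x-era osv API; _get_dynamic_field_names is a hypothetical helper that would return the dynamic column names):

class my_wizard(osv.osv_memory):
    _name = 'my.wizard'

    def fields_get(self, cr, uid, allfields=None, context=None, write_access=True):
        res = super(my_wizard, self).fields_get(cr, uid, allfields, context, write_access)
        # add one dynamic char column per name from the (hypothetical) helper
        for name in self._get_dynamic_field_names(cr, uid, context=context):
            res[name] = {'type': 'char', 'string': name, 'size': 64}
        return res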
[1] The init() method is only called when the model's module is installed or updated, in order to set up or update the database backend for this model. It relies on _columns to do this.
When you write _columns = _get_dyna_cols() in the class body, that function call is made right there, in the class body, while Python is still parsing the class itself. At that point, your _get_dyna_cols method is just a function object in the local (class body) namespace, and it is called immediately.
The error message you get is due to the missing self parameter, which is inserted only when you access your function as a method. But this error message is not what is wrong here: what is wrong is that you are making an immediate function call while expecting special behavior, like late execution.
The way in Python to achieve what you want, i.e. to have the method called automatically when the _columns attribute is accessed, is to use the property built-in.
In this case, just do this: _columns = property(_get_dyna_cols)
This will create a class attribute named _columns which, through a mechanism called the "descriptor protocol", will call the desired method whenever the attribute is accessed from an instance.
To learn more about the property built-in, check the docs: http://docs.python.org/library/functions.html#property
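A minimal plain-Python sketch of that mechanism (note that property triggers only on instance access; whether OpenERP's registry, which introspects the class itself, will honor it is a separate question, as the other answer points out):

class ReposReport(object):
    def _get_dyna_cols(self):
        # hypothetical stand-in: build the columns dict, e.g. from a DB query
        return {'col_a': 'char', 'col_b': 'integer'}

    _columns = property(_get_dyna_cols)

r = ReposReport()
print(r._columns)  # _get_dyna_cols() runs on each attribute access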

Python, SQLAlchemy and Postgresql: understanding inheritance

Pretty recent (but not newborn) to Python, SQLAlchemy and Postgresql, and trying hard to understand inheritance.
As I am taking over another programmer's code, I need to understand what is necessary, and where, for the inheritance concept to work.
My questions are:
Is it possible to rely only on SQLAlchemy for inheritance? In other words, can SQLAlchemy apply inheritance to Postgresql database tables that were created without specifying INHERITS=?
Is the declarative_base technology (SQLAlchemy) necessary to use inheritance the proper way? If so, we'll have to rewrite everything, so please don't discourage me.
Assuming we can use Table instances, empty Entity classes and mapper(), could you give me a (very simple) example of how to go through the process properly (or a link to an easily understandable tutorial; I have not found an easy one yet)?
The real world we are working on is real estate objects. So we basically have
- one table immobject(id, createtime)
- one table objectattribute(id, immoobject_id, oatype)
- several attribute tables: oa_attributename(oa_id, attributevalue)
Thanks for your help in advance.
Vincent
Welcome to Stack Overflow: in the future, if you have more than one question, you should provide a separate post for each. Feel free to link them together if it might help provide context.
Table inheritance in postgres is a very different thing and solves a different set of problems from class inheritance in python, and sqlalchemy makes no attempt to combine them.
When you use table inheritance in postgres, you're doing some trickery at the schema level so that more elaborate constraints can be enforced than might be easy to express in other ways. Once you have designed your schema, applications aren't normally aware of the inheritance; if they insert a row, it just magically appears in the parent table (much like a view). This is useful, for instance, for making some kinds of bulk operations more efficient (you can just drop the table for the month of January).
This is a fundamentally different idea from inheritance as seen in OOP (in python or otherwise, with relational persistence or otherwise). In that case, the application is aware that two types are related, and that the subtype is a permissible substitute for the supertype. "A holding is an address, a contact has an address therefore a contact can have a holding."
Which of these (mostly orthogonal) tools you need depends on the application; you might need neither, you might need both.
Sqlalchemy's mechanisms for working with object inheritance are flexible and robust; you should use them in favor of a home-built solution if they are compatible with your particular needs (this should be true for almost all applications).
The declarative extension is a convenience: it allows you to describe the mapped table, the python class and the mapping between the two in one 'thing' instead of three. It makes your code more "DRY"; it is, however, only a convenience layered on top of "classic sqlalchemy", and it isn't necessary by any measure.
If you find that you need table inheritance that's visible from sqlalchemy, your mapped classes won't be any different from those not using these features; tables with inheritance are still normal relations (like tables or views) and can be mapped without any knowledge of the inheritance in the python code.
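For reference, here is a minimal sketch of SQLAlchemy's own joined-table inheritance (declarative form), which is independent of Postgres INHERITS; the names loosely follow the question's schema but are illustrative only:

from sqlalchemy import Column, ForeignKey, Integer, String
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class ObjectAttribute(Base):
    __tablename__ = 'objectattribute'
    id = Column(Integer, primary_key=True)
    oatype = Column(String(50))
    # SQLAlchemy uses this discriminator column to decide which subclass to load
    __mapper_args__ = {'polymorphic_on': oatype,
                       'polymorphic_identity': 'attribute'}

class SurfaceAttribute(ObjectAttribute):  # hypothetical subtype
    __tablename__ = 'oa_surface'
    oa_id = Column(Integer, ForeignKey('objectattribute.id'), primary_key=True)
    attributevalue = Column(Integer)
    __mapper_args__ = {'polymorphic_identity': 'surface'}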
For your #3, you don't necessarily have to declare empty entity classes to use mapper. If your application doesn't need fancy properties, you can just use introspection and metaclasses to model the existing tables without defining them. Here's what I did:
import sqlalchemy
import sqlalchemy.orm

mymetadata = sqlalchemy.MetaData()
myengine = sqlalchemy.create_engine(...)

def named_table(tablename):
    u"return a sqlalchemy.Table object given a SQL table name"
    return sqlalchemy.Table(tablename, mymetadata, autoload=True, autoload_with=myengine)

def new_bound_class(engine, table):
    u"returns a new ORM class (processed by sqlalchemy.orm.mapper) given a sqlalchemy.Table object"
    fieldnames = table.c.keys()  # the table's column names
    def format_attributes(obj, transform):
        attributes = [u'%s=%s' % (x, transform(getattr(obj, x))) for x in fieldnames]
        return u', '.join(attributes)
    class DynamicORMClass(object):
        def __init__(self, **kw):
            u"Keyword arguments may be used to initialize fields/columns"
            for key in kw:
                if key in fieldnames:
                    setattr(self, key, kw[key])
                else:
                    raise KeyError('%s is not a valid field/column' % (key,))
        def __repr__(self):
            return u'%s(%s)' % (self.__class__.__name__, format_attributes(self, repr))
        def __str__(self):
            return u'%s(%s)' % (str(self.__class__), format_attributes(self, str))
    DynamicORMClass.__doc__ = u"This is a dynamic class created using SQLAlchemy based on table %s" % (table,)
    sqlalchemy.orm.mapper(DynamicORMClass, table)
    return DynamicORMClass  # return the class itself, not the Mapper object

def named_orm_class(table):
    u"returns a new ORM class (processed by sqlalchemy.orm.mapper) given a table name or object"
    if not isinstance(table, sqlalchemy.Table):
        table = named_table(table)
    return new_bound_class(myengine, table)
Example of use:
>>> myclass = named_orm_class('mytable')
>>> session = Session()
>>> obj = myclass(name='Fred', age=25, ...)
>>> session.add(obj)
>>> session.commit()
>>> print str(obj) # will print all column=value pairs
I beefed up my versions of new_bound_class and named_orm_class a little more with decorators, etc. to provide extra capabilities, and you can too. Of course, under the covers, it is declaring an empty entity class. But you don't have to do it, except this one time.
This will tide you over until you decide that you're tired of doing all those joins yourself, and wonder why you can't just have an object attribute that runs a lazy select query against related classes whenever you use it. That's when you make the leap to declarative (or Elixir).

Django ORM: Selecting related set

Say I have 2 models:
class Poll(models.Model):
    category = models.CharField(u"Category", max_length=64)
    [...]

class Choice(models.Model):
    poll = models.ForeignKey(Poll)
    [...]
Given a Poll object, I can query its choices with:
poll.choice_set.all()
But, is there a utility function to query all choices from a set of Poll?
Actually, I'm looking for something like the following (which is not supported, and I don't see how it could be):
polls = Poll.objects.filter(category='foo').select_related('choice_set')
for poll in polls:
    print poll.choice_set.all()  # this shouldn't perform a SQL query at each iteration
I made an (ugly) function to help me achieve that:
def qbind(objects, target_name, model, field_name):
    objects = list(objects)
    objects_dict = dict([(object.id, object) for object in objects])
    for foreign in model.objects.filter(**{field_name + '__in': objects_dict.keys()}):
        id = getattr(foreign, field_name + '_id')
        if id in objects_dict:
            object = objects_dict[id]
            if hasattr(object, target_name):
                getattr(object, target_name).append(foreign)
            else:
                setattr(object, target_name, [foreign])
    return objects
which is used as follow:
polls = Poll.objects.filter(category='foo')
polls = qbind(polls, 'choices', Choice, 'poll')
# Now, each object in polls has a 'choices' member with the list of choices.
# This was achieved with only 2 SQL queries.
Is there something easier already provided by Django? Or at least, a snippet doing the same thing in a better way.
How do you handle this problem usually?
Time has passed and this functionality is now available in Django 1.4 with the introduction of the prefetch_related() QuerySet method. This method effectively does what the suggested qbind function does, i.e. two queries are performed and the join occurs in Python land, but now the ORM handles it.
The original query request would now become:
polls = Poll.objects.filter(category = 'foo').prefetch_related('choice_set')
As is shown in the following code sample, the polls QuerySet can be used to obtain all Choice objects per Poll without requiring any further database hits:
for poll in polls:
    for choice in poll.choice_set.all():
        print choice
Update: Since Django 1.4, this feature is built in: see prefetch_related.
First answer: don't waste time writing something like qbind until you've already written a working application, profiled it, and demonstrated that N queries is actually a performance problem for your database and load scenarios.
But maybe you've done that. So second answer: qbind() does what you'll need to do, but it would be more idiomatic if packaged in a custom QuerySet subclass, with an accompanying Manager subclass that returns instances of the custom QuerySet. Ideally you could even make them generic and reusable for any reverse relation. Then you could do something like:
Poll.objects.filter(category='foo').fetch_reverse_relations('choices_set')
For an example of the Manager/QuerySet technique, see this snippet, which solves a similar problem but for the case of Generic Foreign Keys, not reverse relations. It wouldn't be too hard to combine the guts of your qbind() function with the structure shown there to make a really nice solution to your problem.
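A hedged sketch of that packaging (all names are hypothetical, and get_query_set is the pre-Django-1.6 spelling used in this era):

from django.db import models
from django.db.models.query import QuerySet

class ReverseRelationQuerySet(QuerySet):
    def fetch_reverse_relations(self, target_name, model, field_name):
        # delegate to the qbind() helper from the question
        return qbind(self, target_name, model, field_name)

class PollManager(models.Manager):
    def get_query_set(self):
        return ReverseRelationQuerySet(self.model, using=self._db)

# Hypothetical usage, once PollManager is attached to Poll as `objects`:
# Poll.objects.filter(category='foo').fetch_reverse_relations('choices', Choice, 'poll')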
I think what you're saying is, "I want all Choices for a set of Polls." If so, try this:
polls = Poll.objects.filter(category='foo')
choices = Choice.objects.filter(poll__in=polls)
I think what you are trying to do is called "eager loading" of child data - meaning you load the child list (choice_set) for each Poll, but all in the first query to the DB, so that you don't have to make a bunch of queries later on.
If this is correct, then what you are looking for is 'select_related' - see https://docs.djangoproject.com/en/dev/ref/models/querysets/#select-related
I noticed you tried select_related but it didn't work. Can you try doing the select_related and then the filter? That might fix it.
UPDATE: This doesn't work, see comments below.
