I have a django application with the following model:
Object A is a simple model extending from Model with a few fields; in particular, a char field called "NAME" and an integer field called "ORDER". A is abstract, meaning there are no A objects in the database, but instead...
Objects B and C are specializations of A, meaning they inherit from A and they add some other fields.
Now suppose I need all the objects whose field NAME starts with the letter "Z", ordered by the ORDER field, but I also want all the B- and C-specific fields for those objects. I see two approaches:
a) Do the queries individually for B and C objects and fetch two lists, merge them, order manually and work with that.
b) Query A objects for names starting with "Z" ordered by "ORDER" and with the result query the B and C objects to bring all the remaining data.
Both approaches sound highly inefficient: in the first I have to order the results myself, and in the second I have to query the database multiple times.
Is there a magical way I'm missing to fetch all B and C objects, ordered, in a single query? Or at least a more efficient way than the two mentioned?
Thanks in Advance!
Bruno
If A can be concrete, you can do this all in one query using select_related.
from django.db import connection

q = A.objects.filter(NAME__istartswith='z').order_by('ORDER').select_related('b', 'c')
for obj in q:
    obj = obj.b or obj.c or obj
    print(repr(obj), obj.__dict__)  # to prove the subclass-specific attributes exist
print("query count:", len(connection.queries))
This has been answered before: use the InheritanceManager from the django-model-utils project.
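For reference, a minimal sketch of how that looks, assuming A is made concrete (InheritanceManager relies on multi-table inheritance) and with hypothetical b_field/c_field subclass fields:

from django.db import models
from model_utils.managers import InheritanceManager

class A(models.Model):
    NAME = models.CharField(max_length=100)
    ORDER = models.IntegerField()

    objects = InheritanceManager()

class B(A):
    b_field = models.CharField(max_length=100)

class C(A):
    c_field = models.CharField(max_length=100)

# select_subclasses() downcasts each row to its B or C instance,
# still in a single query and with the ordering applied in SQL
for obj in A.objects.filter(NAME__istartswith='z').order_by('ORDER').select_subclasses():
    print(type(obj).__name__, obj.NAME, obj.ORDER)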
Querying with your "b" approach will let you bring in all the remaining data without querying your B and C models separately: you can use the "dot lowercase model name" relation.
http://docs.djangoproject.com/en/dev/topics/db/models/#multi-table-inheritance
for obj in A.objects.filter(NAME__istartswith='z').order_by('ORDER'):
    if obj.b:
        # do something
        pass
    elif obj.c:
        # do something
        pass
You may need to try/except DoesNotExist exceptions. I'm a bit rusty with my Django. Good luck.
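For instance, a sketch of that try/except, assuming multi-table inheritance (the helper name as_subclass is made up; accessing obj.b raises B.DoesNotExist when there is no matching child row):

from django.core.exceptions import ObjectDoesNotExist

def as_subclass(obj):
    # try each child relation in turn; fall back to the plain A instance
    for attr in ('b', 'c'):
        try:
            return getattr(obj, attr)
        except ObjectDoesNotExist:
            continue
    return obj

for obj in A.objects.filter(NAME__istartswith='z').order_by('ORDER'):
    obj = as_subclass(obj)
    print(repr(obj))  # do something with the B- or C-specific fields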
So long as you order both the B and C queries, it is fairly easy to merge them without an expensive re-sort:
# first define a couple of helper functions
def next_or(iterable, other):
    '''advance iterable; on exhaustion return (None, other) as a sentinel'''
    try:
        return next(iterable), None
    except StopIteration:
        return None, other

def merge(x, y, func=lambda a, b: a <= b):
    '''merges a pair of iterables that are already sorted according to func'''
    xs = iter(x)
    ys = iter(y)
    a, ra = next_or(xs, ys)
    b, rb = next_or(ys, xs)
    while ra is None and rb is None:
        if func(a, b):
            yield a
            a, ra = next_or(xs, ys)
        else:
            yield b
            b, rb = next_or(ys, xs)
    # one side is exhausted: flush the pending item, then the remainder
    if ra is None:       # xs still has a pending item; ys is exhausted
        yield a
        rest = xs
    elif rb is None:     # ys still has a pending item; xs is exhausted
        yield b
        rest = ys
    else:                # both inputs were empty from the start
        return
    for o in rest:
        yield o
# now get your objects & then merge them
b_qs = B.objects.filter(NAME__startswith='Z').order_by('ORDER')
c_qs = C.objects.filter(NAME__startswith='Z').order_by('ORDER')
for obj in merge(b_qs, c_qs, lambda a, b: a.ORDER <= b.ORDER):
    print(repr(obj), obj.__dict__)
The advantage of this technique is that it works with an abstract base class.
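For what it's worth, the standard library can also do this k-way merge: heapq.merge accepts a key function on Python 3.5+, making the helpers above unnecessary:

from heapq import merge as heap_merge

b_qs = B.objects.filter(NAME__startswith='Z').order_by('ORDER')
c_qs = C.objects.filter(NAME__startswith='Z').order_by('ORDER')

# both inputs are already sorted by ORDER, so the merge is O(n) overall
for obj in heap_merge(b_qs, c_qs, key=lambda o: o.ORDER):
    print(repr(obj), obj.__dict__)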
In my Odoo instance I have several calculated fields on the analytic account object. These fields are calculated to ensure the viewer always has the most up-to-date overview.
Some of these fields depend on other fields that are themselves calculated fields. The calculations themselves are fairly simple (field A = field B + field C). Most of the fields also depend on the underlying child ids. For example, field A on the top object is a summary of all field A values of the child ids; field A on the children is calculated from their own fields B and C combined, as described above.
The situation I currently find myself in is that for some reason the fields seem to be calculated in a random order. I noticed this because when I refresh in rapid succession I get different values for the same record.
Example:
Fields B and C are both 10. I expect A to be 20 (B + C), but most of the time it's actually 0, because the field calculation for A happens before B and C. Sometimes it's 10, since either B or C snuck in before A could finish. On very rare occasions it's actually 20...
Note:
- I cannot make the fields stored, because they depend on account move lines, which are created at an incredible rate; the database would go absolutely nuts recalculating all records every minute or so.
- I already added the @api.depends decorator, but this only helps when stored fields are used to determine which fields should trigger recomputation, which is not applicable in my situation.
Does anyone know of a solution to this? Or have suggestions on alternative ways of calculating?
[EDIT] Added code
Example code:
@api.multi
@api.depends('child_ids', 'costs_allowed', 'total_cost')
def _compute_production_result(self):
    for rec in self:
        rec_prod_cost = 0.0
        if rec.usage_type in ['contract', 'project']:
            for child in rec.child_ids:
                rec_prod_cost += child.production_result
        elif rec.usage_type in ['cost_control', 'planning']:
            rec_prod_cost = rec.costs_allowed - rec.total_cost
        rec.production_result = rec_prod_cost
As you can see, if we are on a contract or project we need to look at the children (cost_control accounts) for their results and ADD them together. If we are actually on a cost_control account, then we can get the actual values by taking field B and C and (in this case) subtracting them.
The problem occurs when EITHER the contract records are handled before the cost_control OR the costs_allowed and total_cost fields are 0.0 when evaluating the cost_control accounts.
Mind you: costs_allowed and total_cost are both calculated fields in their own right!
You can do as they did in the Invoice model: many computed fields depend on many other fields, and one compute method sets a value for each computed field.
@api.one
@api.depends('X', 'Y', ...)
def _compute_amounts(self):
    self.A = ...
    self.B = ...
    self.C = self.A + self.B
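Applied to the fields from the question, a sketch might look like this (assuming Odoo 10+ imports and the account.analytic.account model; the 0.0 placeholders stand for the real per-field calculations and the depends trigger is a guess):

from odoo import api, fields, models

class AnalyticAccount(models.Model):
    _inherit = 'account.analytic.account'

    costs_allowed = fields.Float(compute='_compute_amounts')
    total_cost = fields.Float(compute='_compute_amounts')
    production_result = fields.Float(compute='_compute_amounts')

    @api.multi
    @api.depends('child_ids')  # assumed trigger; adjust to the real inputs
    def _compute_amounts(self):
        for rec in self:
            # compute the inputs first, then the field derived from them,
            # so the ordering problem cannot occur within one record
            rec.costs_allowed = 0.0  # placeholder for the real calculation
            rec.total_cost = 0.0     # placeholder for the real calculation
            rec.production_result = rec.costs_allowed - rec.total_cost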
You may find Python's @property helpful. Rather than just using plain fields, this allows you to define something that looks like a field but is lazily evaluated, i.e. calculated on demand when you 'get' it. This way we can guarantee it's up to date. An example:
import datetime

class Person(object):
    def __init__(self):
        self._born = datetime.datetime.now()

    @property
    def age(self):
        return datetime.datetime.now() - self._born

p = Person()
# do some stuff...
# We can 'get' age just like a field, but it is lazily evaluated,
# i.e. calculated on demand, so it is guaranteed to be up to date.
print(p.age)
So I managed to find a colleague and we figured it out together.
As it turns out, when you define a method that computes a field both from the record's own fields and from that same field on its child records, you need to mention the child dependency explicitly.
For example:
@api.multi
@api.depends('a', 'b', 'c')
def _compute_a(self):
    for rec in self:
        if condition:
            rec.a = sum(child.a for child in rec.child_ids)
        else:
            rec.a = rec.b + rec.c
In this example, the self object contains records (1,2,3,4).
If you include the dependency but otherwise let the code remain the same, like so:
@api.multi
@api.depends('a', 'b', 'c', 'child_ids.a')
def _compute_a(self):
    for rec in self:
        if condition:
            rec.a = sum(child.a for child in rec.child_ids)
        else:
            rec.a = rec.b + rec.c
then Odoo will run this method 4 times, starting with the lowest/deepest candidate; self in this case will be (4), then (3), etc.
Too bad this logic seems to be implied and not really described anywhere (as far as I could see).
First of all, I use the terms "container" and "collection" in a very general way, not linked to any Python terminology.
I have the following function in Python. It takes a list of ids idlist and returns a list of objects from objs corresponding to those ids. That works fine:
def findObj(idlist, objs):
    return [next(o for o in objs if id == o.id) for id in idlist]
The problem is: idlist and the return value should not necessarily be a Python list. I would like idlist to also accept one plain id, in which case the function would return a single object. Or idlist could be a Python set, and the function would return a set of objects.
How can I arrange for idlist to accept various "container" types (including a plain id) and have the function return the same "container" type?
I argue that what you have in mind is not a good API.
It's simpler, more robust and less error prone to have the function return a specific type and let the user handle the eventual conversion.
In particular I'd prefer making the function lazy using a generator:
def find_obj(ids, objs):
    try:
        for id in ids:
            yield next(o for o in objs if o.id == id)
    except TypeError:
        # ids not iterable? assume it is a plain id
        yield next(o for o in objs if o.id == ids)
Now a user can just do:
list(find_obj(...))
set(find_obj(...))
next(find_obj(...)) # 1 element
and obtain the type they want.
Added benefits:
Explicit is better than implicit: here the type conversion is explicit. Imagine code with calls of the kind find_obj(some_var_defined_elsewhere, objects): how do you know which type will be returned if the definition of the input is nowhere near?
You can pass a type X as input and convert to type Y without wasting intermediate space or doing an unnecessary conversion.
No special cases needed: the caller can provide an input that doesn't follow the usual way of constructing containers (note that there is no standard way to build a container).
Alternative that special cases the single id case:
def find_obj(ids, objs):
    try:
        return (next(o for o in objs if o.id == id) for id in ids)
    except TypeError:
        # ids is not iterable: treat it as a single plain id
        for o in objs:
            if o.id == ids:
                return o
The above returns a generator when given a sequence of ids, and returns a plain object (instead of a 1-element generator) when passed a single id.
Finally: most of the time (but not always) sequences have a constructor that accepts an iterable and builds the container with those elements. This means that:
type(some_iterable)(something for el in some_iterable)
will produce a container of the same type as some_iterable.
Note that some classes require a list instead of a generic iterable, so you'd have to use type(some_iterable)([<as-before>]), and other containers do not have such a constructor at all. In that last case only the caller can perform the conversion; the first solution handles this nicely without any special case.
You could generalize the function further by adding a parameter to perform the conversion:
def find_obj(ids, objs, converter=None):
    if converter is None:
        converter = type(ids)
    try:
        return converter(next(o for o in objs if o.id == id) for id in ids)
    except TypeError:
        # ids is not iterable: fall back to the single-id case
        return next(o for o in objs if o.id == ids)
In this way the caller can customize the conversion when dealing with unusual types.
An added note: in Python we use "duck typing", i.e. we just use the object as if it were of the correct type and, if that raises an exception, fall back to doing something else. In some cases it's simpler to first check for support of certain operations; in those cases you can use isinstance with the abstract base classes found in collections.abc to see whether an object is an Iterable, a Sequence, a Mapping, etc.
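For example:

from collections.abc import Iterable, Mapping, Sequence

print(isinstance([1, 2, 3], Sequence))  # True
print(isinstance({'a': 1}, Mapping))    # True
print(isinstance(42, Iterable))         # False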
This should work if your plain object's id is passed (for idlist) as an int (not a string):
def findObj(idlist, objs):
    t = type(idlist)
    try:
        iter(idlist)
        return t([next(o for o in objs if id == o.id) for id in idlist])
    except TypeError:
        # not iterable, so idlist is a single id
        return next((o for o in objs if o.id == idlist), False)
You can change your code to:
from collections.abc import Iterable

def findObj(idlist, objs):
    if isinstance(idlist, Iterable):
        return type(idlist)([next(o for o in objs if id == o.id) for id in idlist])
    else:
        # idlist is just a plain id
        return next(o for o in objs if idlist == o.id)
I'm looking for a SQL-relational-table-like data structure in python, or some hints for implementing one if none already exist. Conceptually, the data structure is a set of objects (any objects), which supports efficient lookups/filtering (possibly using SQL-like indexing).
For example, let's say my objects all have properties A, B, and C, which I need to filter by; hence I define that the data should be indexed by them. The objects may contain lots of other members, which are not used for filtering. The data structure should support operations equivalent to SELECT <obj> FROM <DATASTRUCTURE> WHERE A=100 (same for B and C). It should also be possible to filter by more than one field (WHERE A=100 AND B='bar').
The requirements are:
Should support a large number of items (~200K). The items must be the objects themselves, and not some flattened version of them (which rules out sqlite and likely pandas).
Insertion should be fast, should avoid reallocation of memory (which pretty much rules out pandas)
Should support simple filtering (like the example above), which must be more efficient than O(len(DATA)), i.e. avoid "full table scans".
Does such data structure exist?
Please don't suggest using sqlite. I'd need to repeatedly convert object->row and row->object, which is time consuming and cumbersome since my objects are not necessarily flat-ish.
Also, please don't suggest using pandas because repeated insertions of rows is too slow as it may requires frequent reallocation.
So long as you don't have any duplicates on (a, b, c), you could subclass dict, enter your objects indexed by the tuple (a, b, c), and define your filter method (probably a generator) to return all entries that match your criteria.
class mydict(dict):
    def filter(self, a=None, b=None, c=None):
        # keys are (a, b, c) tuples; None means "don't filter on this field"
        for key, obj in self.items():
            if a is None or key[0] == a:
                if b is None or key[1] == b:
                    if c is None or key[2] == c:
                        yield obj
That is an ugly and very inefficient example, but you get the idea. I'm sure there is a better implementation in itertools or something.
Edit:
I kept thinking about this, toyed around with it last night, and came up with storing the objects in a list and keeping dictionaries of list indexes keyed by the desired fields. Objects are retrieved by taking the intersection of the index entries for all specified criteria. Like this:
objs = []
aindex = {}
bindex = {}
cindex = {}

def insertobj(a, b, c, obj):
    # record the object's list position in each field index
    idx = len(objs)
    objs.append(obj)
    aindex.setdefault(a, []).append(idx)
    bindex.setdefault(b, []).append(idx)
    cindex.setdefault(c, []).append(idx)

def filterobjs(a=None, b=None, c=None):
    # intersect the index entries for every specified criterion
    result = set(range(len(objs)))
    if a is not None:
        result &= set(aindex.get(a, ()))
    if b is not None:
        result &= set(bindex.get(b, ()))
    if c is not None:
        result &= set(cindex.get(c, ()))
    for idx in result:
        yield objs[idx]
class testobj(object):
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

    def show(self):
        print('a=%i\tb=%i\tc=%s' % (self.a, self.b, self.c))
if __name__ == '__main__':
    for a in range(20):
        for b in range(5):
            for c in ['one', 'two', 'three', 'four']:
                insertobj(a, b, c, testobj(a, b, c))
    for obj in filterobjs(a=5):
        obj.show()
    print()
    for obj in filterobjs(b=3):
        obj.show()
    print()
    for obj in filterobjs(a=8, c='one'):
        obj.show()
It should be reasonably quick: although the objects are in a list, they are accessed directly by index, and the "searching" is done on hashed dicts.
Limiting the set of results returned from a Django queryset is done via an array slice. For example, to get the first 5 people:
People.objects.all()[0:5]
Or, to get them ordered by name:
People.objects.order_by('name')[0:5]
Or ordered by name, but only those over 65:
People.objects.order_by('name').filter(age__gt=65)[0:5]
In fact the only activity I can think of on a query set that doesn't have a function is limiting.
What I'd like to know is, is there a method (internal, documented or otherwise) that can be called on a QuerySet that acts as a limit or slice?
If not, what is the best way about doing this?
Notes:
Yes, this is probably a bad idea, no I'm not super keen on implementing it, but if there were a good reason for it to be done, could it be?
Yes, I'm aware slices are executed lazily, that's not what I'm asking.
This is not a duplicate of this question, as the accepted answer says:
Do results[:max_count] in a view, after .order_by().
Reviewing the code of Django's QuerySet, it's not as black-boxy as it seemed:
def __getitem__(self, k):
    """
    Retrieves an item or slice from the set of results.
    """
    # ... trimmed ...
    if isinstance(k, slice):
        qs = self._clone()
        if k.start is not None:
            start = int(k.start)
        else:
            start = None
        if k.stop is not None:
            stop = int(k.stop)
        else:
            stop = None
        qs.query.set_limits(start, stop)
        return list(qs)[::k.step] if k.step else qs

    qs = self._clone()
    qs.query.set_limits(k, k + 1)
    return list(qs)[0]
The key line is here:
qs.query.set_limits(start, stop)
The reason the slice is lazy is because it just takes the start and stop values and passes them to another method.
Which corresponds to a call to the sql.Query object here:
def set_limits(self, low=None, high=None):
So it is possible (although probably not recommended) to slice a Queryset like so:
people = People.objects.order_by('name').filter(age__gt=65)  # unevaluated
people.query.set_limits(start, stop)  # still unevaluated
for person in people:  # now it's evaluated
    person.do_the_thing()
This seems like a simple question, but I feel like I'm missing something.
I have 2 objects: A and B. B has a ForeignKey to A called my_a, and for various reasons I need to have a ForeignKey on A to B too, i.e. A.the_b_used. In a view function I want to create an instance of A (a = A()) and an instance of B (b = B()), and then link them together. However, my objects (a and b) need to have ids before I can link them (right?), so I think I have to do this:
a = A()
b = B()
a.save()
b.save()
a.the_b_used = b
b.my_a = a
a.save()
b.save()
It looks like I have to do four .save()s, i.e. four database write operations. Is there a way to do this with fewer database operations? I might be missing something simple.
In most cases, you shouldn't need a foreign key from a parent object to a child object if there's already a foreign key from the child back to the parent. A one-to-one correspondence is achieved by making the foreign key column on the child object unique, so that only one child can link to a particular parent.
Supposing you did exactly this with 'A' as the child, having a foreign key column to parent 'B'. Since the link from b back to a is implicit from the link from a to b, you don't need to know a's id for b to be complete.
a = A()
b = B()
b.save()
b now has an id, which we can use for a:
a.the_b_used = b
a.save()
That's all you should need.
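As a minimal sketch, that layout with Django's OneToOneField (which is exactly a unique foreign key; model and field names taken from the question, on_delete assumed) looks like:

from django.db import models

class B(models.Model):
    pass

class A(models.Model):
    # a unique FK from the "child" A to the "parent" B; the reverse
    # accessor (b.a) comes for free, so no second column is needed
    the_b_used = models.OneToOneField(B, on_delete=models.CASCADE)

b = B()
b.save()
a = A(the_b_used=b)
a.save()  # two saves total instead of four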