I'm working on a very simple application as a use case for integrating MongoDB with web2py. In one section of the application, I'm interested in returning a list of products:
My database table:
db.define_table('products',
    Field('brand', label='Brand'),
    Field('photo', label='Photo'),
    ...
    Field('options', label='Options'))
My controller:
def products():
    qset = db(db['products'])
    grid = qset.select()
    return dict(grid=grid)
My view:
{{extend 'layout.html'}}
<h2>Product List</h2>
{{=grid}}
The products are returned without issue. However, the products._id field returns values in the form '26086541625969213357181461154'. If I switch to the shell (or Python) and attempt to query my database based on those _ids, I can't find any of the products.
As you would expect, the _ids in the database are ObjectIds, which look like this: '544a481b2ceb7c3093a173a2'. I'd like my view to return the ObjectIds, not the long strings. It seems simple, but I'm having trouble with it.
When constructing the DAL Row object for a given MongoDB record, the ObjectId is represented by converting it to a long integer via long(str(value), 16). To convert back to an ObjectId, you can use the object_id method of the MongoDB adapter:
object_id = db._adapter.object_id('26086541625969213357181461154')
Of course, if you use the DAL to query MongoDB, you don't have to worry about this, as it handles the conversion automatically.
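For illustration, a minimal sketch of the round trip, assuming pymongo's bson module is available and the long-integer representation described above:
from bson.objectid import ObjectId

dal_value = 26086541625969213357181461154          # value as shown in the grid
object_id = db._adapter.object_id(str(dal_value))  # back to a bson ObjectId
# Equivalent manual conversion: zero-padded 24-digit hex string.
assert object_id == ObjectId('%024x' % dal_value)

# The DAL applies the conversion automatically in ordinary queries:
product = db(db.products._id == object_id).select().first()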
Although it makes perfect sense, I wasn't able to make Anthony's answer work. So, I just hacked it:
hex(value).replace("0x","").replace("L","")
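Note that hex() does not zero-pad, so an ObjectId whose hex form has leading zeros would come back too short. A padded variant of the same hack, which also works on Python 3 (where there is no trailing 'L'):
hex_id = '%024x' % value  # zero-pads to the 24 hex digits an ObjectId string requires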
I have the following model for an Oracle database, which is not a part of my Django project:
class ResultsData(models.Model):
    RESULT_DATA_ID = models.IntegerField(primary_key=True, db_column="RESULT_DATA_ID")
    RESULT_XML = models.TextField(blank=True, null=True, db_column="RESULT_XML")

    class Meta:
        managed = False
        db_table = '"schema_name"."results_data"'
The RESULT_XML field in the database itself is declared as XMLType. I chose to represent it as a TextField in the Django model, since TextField has no character limit.
When I try to retrieve data with that model, I get the following error:
DatabaseError: ORA-19011: Character string buffer too small
I figure it is because of the volume of data stored in the RESULT_XML field, since pulling a record with just .values("RESULT_DATA_ID") works fine.
Any ideas on how I can work around this problem? Googling for answers did not yield anything so far.
UPDATED ANSWER
I have found a much better way of dealing with this issue: I wrote a custom field value Transform object, which generates the Oracle SQL query I was after:
OracleTransforms.py
from django.db.models import TextField
from django.db.models.lookups import Transform
class CLOBVAL(Transform):
    '''
    Oracle-specific transform for XMLType field, which returns string data exceeding
    buffer size (ORA-19011: Character string buffer too small) as a character LOB type.
    '''
    function = None
    lookup_name = 'clobval'

    def as_oracle(self, compiler, connection, **extra_context):
        return super().as_sql(
            compiler, connection,
            template='(%(expressions)s).GETCLOBVAL()',
            **extra_context
        )
# Needed for CLOBVAL to work as a .values('field_name__clobval') lookup in Django ORM queries
TextField.register_lookup(CLOBVAL)
With the above, I can now just write a query as follows:
from .OracleTransforms import CLOBVAL
ResultsData.objects.filter(RESULT_DATA_ID=some_id).values('RESULT_DATA_ID', 'RESULT_XML__clobval')
or
ResultsData.objects.filter(RESULT_DATA_ID=some_id).values('RESULT_DATA_ID', XML=CLOBVAL('RESULT_XML'))
This is the best solution for me, as I get to keep using QuerySet instead of RawQuerySet.
The only limitation I see with this solution for now is that I always need to specify the CLOBVAL transform in my .values() calls, or Oracle will report ORA-19011 again, but I guess this is still a good outcome.
OLD ANSWER
So, I found a way around the problem, thanks to Christopher Jones's suggestion.
ORA-19011 is the error Oracle raises when the data it would send back as a string exceeds the allowed buffer; the data needs to be sent back as a character LOB object instead.
Django does not have direct support for that Oracle-specific method (at least I did not find one), so the answer to the problem was a raw Django query:
query = 'select a.RESULT_DATA_ID, a.RESULT_XML.getClobVal() as RESULT_XML FROM SCHEMA_NAME.RESULTS_DATA a WHERE a.RESULT_DATA_ID=%s'
data = ResultsData.objects.raw(query, [id])
This way, you get back a RawQuerySet, which is the lesser known, less liked cousin of Django's QuerySet. You can iterate through the results, and RESULT_XML will contain a LOB field which, when read, converts to a string.
Handling string-encoded XML data is awkward, so I also employed the xmltodict Python package to get it into a bit more civilized shape.
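For example, a sketch of that step (assuming cx_Oracle hands back a LOB object with a read() method, as it does for CLOB columns):
import xmltodict

for report in data:
    xml_text = report.RESULT_XML.read()   # LOB -> str
    parsed = xmltodict.parse(xml_text)    # str -> nested dicts/lists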
Next, I should probably look for a way to modify Django's getter for the RESULT_XML field only, and have it generate a query to Oracle DB with .getClobVal() method in it, but I will touch on that in a different StackOverflow question: Django - custom getter for 1 field in model
Let's say I have following models:
class Invoice(models.Model):
    ...

class Note(models.Model):
    invoice = models.ForeignKey(Invoice, related_name='notes', on_delete=models.CASCADE)
    text = models.TextField()
and I want to select Invoices that have some notes. I would write it using annotate/Exists like this:
Invoice.objects.annotate(
    has_notes=Exists(Note.objects.filter(invoice_id=OuterRef('pk')))
).filter(has_notes=True)
This works well enough and filters only Invoices with notes. However, this method results in the field being present in the query result, which I don't need, and it means worse performance (the SQL has to execute the subquery twice).
I realize I could write this using extra(where=) like this:
Invoice.objects.extra(where=['EXISTS(SELECT 1 FROM note WHERE invoice_id=invoice.id)'])
which would result in the ideal SQL, but in general using extra() / raw SQL is discouraged.
Is there a better way to do this?
You can remove annotations from the SELECT clause using the .values() queryset method. The trouble with .values() is that you have to enumerate all the names you want to keep rather than the names you want to skip, and it returns dictionaries instead of model instances.
Django internally keeps track of masked annotations in QuerySet.query.annotation_select_mask, so you can use it to tell Django which annotations to skip, even without .values():
from django.db.models import QuerySet

class YourQuerySet(QuerySet):
    def mask_annotations(self, *names):
        if self.query.annotation_select_mask is None:
            # No mask set yet: select all annotations except the given names.
            self.query.set_annotation_mask(set(self.query.annotations.keys()) - set(names))
        else:
            # Narrow the existing mask further.
            self.query.set_annotation_mask(self.query.annotation_select_mask - set(names))
        return self
Then you can write:
invoices = (Invoice.objects
    .annotate(has_notes=Exists(Note.objects.filter(invoice_id=OuterRef('pk'))))
    .filter(has_notes=True)
    .mask_annotations('has_notes')
)
to skip has_notes from the SELECT clause and still get filtered Invoice instances. The resulting SQL query will be something like:
SELECT invoice.id, invoice.foo FROM invoice
WHERE EXISTS(SELECT note.id, note.bar FROM note WHERE note.invoice_id = invoice.id) = True
Just note that annotation_select_mask is an internal Django API that can change in future versions without warning.
OK, I've just noticed in the Django 3.0 docs that they've updated how Exists works, and it can now be used directly in filter():
Invoice.objects.filter(Exists(Note.objects.filter(invoice_id=OuterRef('pk'))))
This will ensure that the subquery is not added to the SELECT columns, which may result in better performance.
Changed in Django 3.0:
In previous versions of Django, it was necessary to first annotate and then filter against the annotation. This resulted in the annotated value always being present in the query result, and often resulted in a query that took more time to execute.
Still, if someone knows a better way for Django 1.11, I would appreciate it. We really need to upgrade :(
We can filter for Invoices that have, when we perform a LEFT OUTER JOIN, a non-NULL Note, and make the query distinct (to avoid returning the same Invoice twice):
Invoice.objects.filter(notes__isnull=False).distinct()
This is the most optimized code if you want to get data from another table whose primary key reference is stored in another table:
Invoice.objects.filter(note__invoice_id=OuterRef('pk'),)
We should be able to clear the annotated field using the method below (note that dict.clear() returns None, so keep a reference to the queryset):
invoices = Invoice.objects.annotate(
    has_notes=Exists(Note.objects.filter(invoice_id=OuterRef('pk')))
).filter(has_notes=True)
invoices.query.annotations.clear()
I am wondering if the DAL supports select with JSON, or if there is a hack to make it able to select JSON fields. I can do the following:
query = "SELECT count(id) FROM my_table WHERE my_json_colum::json->>'form_id' = '%s';" % dummy_string
my_count = db.executesql(query)
return my_count
However, the docs suggest this isn't reliable:
In this case, the return values are not parsed or transformed by the DAL, and the format depends on the specific database driver.
I couldn't find anything in the documentation suggesting support for this. More specifically, when I run the above code, it returns just the letter H. Is there a workaround (or, better yet, a legitimate way that I missed) to get the DAL working with JSON?
The DAL is able to save JSON data in individual fields, but it does not provide a mechanism for querying specific attributes of the JSON data, as that requires special functionality within the RDBMS itself, which is not supported by most databases.
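For completeness, a sketch of what the DAL does support, with the table and variable names taken from the question: a 'json' field type that serializes and deserializes the stored value, while attribute-level filtering still has to go through executesql, shown here parameterized with PostgreSQL syntax:
db.define_table('my_table', Field('my_json_colum', 'json'))

my_count = db.executesql(
    "SELECT count(id) FROM my_table WHERE my_json_colum::json->>'form_id' = %s;",
    placeholders=[dummy_string],
)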
I am moving code from Django 1.6 to 1.9.
In 1.6 I had this code
models.py
class MyReport(models.Model):
    group_id = models.PositiveIntegerField(blank=False, null=False)
views.py
query = MyReport.objects.filter(owner=request.user).query
query.group_by = ['group_id']
entries = QuerySet(query=query, model=MyReport)
The query would return one object for each 'group_id'; due to the way I use it, any table row with the group_id would do as a representative.
With 1.9 this code is broken. The query after the second line above is:
SELECT "reports_myreport"."group_id", ... etc FROM "reports_myreport" WHERE "reports_myreport"."owner_id" = 1 GROUP BY "reports_myreport"."group_id", "reports_report"."otherfield", ...
Basically it lists all the table fields in the group by clause, making the query return the whole table.
Even though in the debugger I see query.group_by = ['group_id'], it doesn't look like query.group_by is a documented attribute in 1.9, nor do the change logs for 1.7-1.9 suggest that anything changed.
Is there a better way - not depending on internal Django stuff - I can use for my query?
Any way to fix my current query?
You can use order_by() to get the results ordered; in that same query you can order by a second criterion.
If you want to get the groups, you will need to iterate over the collection to retrieve those values.
If you consume all of the results returned by the query, you can consider:
a) itertools.groupby, which does the grouping in memory instead; you should not use it for large data sets (see the sketch after option b).
b) Manager.raw(), which requires writing SQL inside Django, like this:
for report in MyReport.objects.raw('SELECT * FROM reporting_report GROUP BY group_id'):
    print(report)
This will work for large data sets, but you could lose compatibility with some database engines.
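A sketch of option a), using the models from the question; sorting first is required because itertools.groupby only groups consecutive rows:
from itertools import groupby
from operator import attrgetter

reports = MyReport.objects.filter(owner=request.user).order_by('group_id')
# One representative row per group_id, grouped in memory:
representatives = [next(rows) for _, rows in groupby(reports, key=attrgetter('group_id'))]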
Bonus: I recommend understanding exactly what the old code did before attempting a rewrite.
I have a couchdb view "record_by_date_product" with the following definition:
function(doc) {
    emit([doc.logtime, doc.product_id], doc);
}
I am trying to run a query which is something like:
(logtime > fromdate & logtime < todate) & product_id in (1,2,6)
Is this possible with this view?
I am also using couchdb python library to access couchdb. Here is a code snippet:
server = couchdb.Server()
db = server['mydb']
results = db.view('_design/record_by_date_product/_view/record_by_date_product')
This page http://packages.python.org/CouchDB/client.html#viewresults specifies that we can use a startkey and endkey. But I am not able to get it working.
Thanks
I think I just found the exact answer:
Design a view 'sampleview' which is like:
{
    "records_by_date_product": {
        "map": "function(doc) {\n  emit([doc.prod_id, doc.logtime], doc);\n}"
    }
}
Let us say that the query parameters are:
prod_id in [1,3]
from_date = '2010-01-01 00:00:00'
to_date = '2010-01-02 00:00:00'
Then you will have to run 2 separate queries on the same view:
http://localhost:5984/db/_design/sampleview/_view/records_by_date_product?startkey=[1,"2010-01-01%2000:00:00"]&endkey=[1,"2010-01-02%2000:00:00"]
http://localhost:5984/db/_design/sampleview/_view/records_by_date_product?startkey=[3,"2010-01-01%2000:00:00"]&endkey=[3,"2010-01-02%2000:00:00"]
Notice that the same query is run each time except that the prod_id is changed in the second query. The results have to be collated later. Hope this helps!
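The same two queries with the couchdb Python package would look something like this (a sketch; database and view names as above):
import couchdb

server = couchdb.Server()
db = server['mydb']

rows = []
for prod_id in [1, 3]:
    rows.extend(db.view(
        'sampleview/records_by_date_product',
        startkey=[prod_id, '2010-01-01 00:00:00'],
        endkey=[prod_id, '2010-01-02 00:00:00'],
    ))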
That exact query is not possible. As the documentation suggests, you can get everything in a view in a particular key range. Views are sorted data structures, so all CouchDB does to fulfill this request is locate the start key and begin returning items until you hit the end key.
The strategy you should use for this query depends on characteristics of the data itself. Most importantly, will you waste a lot of time weeding out items if you use only the first part of the key (logtime) and iterate through those in Python, weeding out items where product_id won't match? If so, you should consider writing another view that is primarily sorted by product_id. If not, go ahead and use the weed-out approach.
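For instance, a sketch of the weed-out approach with the couchdb Python package, using the view from the question ({} sorts after any product_id, so it serves as a high sentinel in the end key; fromdate and todate are the date strings from the question):
results = db.view(
    'record_by_date_product/record_by_date_product',
    startkey=[fromdate],
    endkey=[todate, {}],
)
wanted = {1, 2, 6}
rows = [row for row in results if row.key[1] in wanted]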
How about this solution:
1. Create a view for each product, with logtime as the index.
2. Access each view as required and filter the results using the range [fromdate, todate].
3. Do step 2 for each product in the input parameters and collate the results.
This has the drawback that we have to create a view for every product, which looks like a manual process.
Just a thought! Let me know your views.