Django with Oracle DB - ORA-19011: Character string buffer too small

Django with Oracle DB - ORA-19011: Character string buffer too small - python

I have the following model for an Oracle database, which is not a part of my Django project:
class ResultsData(models.Model):
RESULT_DATA_ID = models.IntegerField(primary_key=True, db_column="RESULT_DATA_ID")
RESULT_XML = models.TextField(blank=True, null=True, db_column="RESULT_XML")
class Meta:
managed = False
db_table = '"schema_name"."results_data"'
The RESULT_XML field in the database itself is declared as XMLField. I chose to represent it as TextField in Django model, due to no character limit.
When I do try to download some data with that model, I get the following error:
DatabaseError: ORA-19011: Character string buffer too small
I figure, it is because of the volume of data stored in RESULT_XML field, since when I try to just pull a record with .values("RESULT_DATA_ID"), it pulls fine.
Any ideas on how I can work around this problem? Googling for answers did not yield anything so far.

UPDATED ANSWER
I have found a much better way of dealing with that issue - I wrote a custom field value Transform object, which generates an Oracle SQL query I was after:
OracleTransforms.py
from django.db.models import TextField
from django.db.models.lookups import Transform
class CLOBVAL(Transform):
'''
Oracle-specific transform for XMLType field, which returns string data exceeding
buffer size (ORA-19011: Character string buffer too small) as a character LOB type.
'''
function = None
lookup_name = 'clobval'
def as_oracle(self, compiler, connection, **extra_context):
return super().as_sql(
compiler, connection,
template='(%(expressions)s).GETCLOBVAL()',
**extra_context
)
# Needed for CLOBVAL to work as a .values('field_name__clobval') lookup in Django ORM queries
TextField.register_lookup(CLOBVAL)
With the above, I can now just write a query as follows:
from .OracleTransforms import CLOBVAL
ResultsData.objects.filter(RESULT_DATA_ID=some_id).values('RESULT_DATA_ID', 'RESULT_XML__clobval')
or
ResultsData.objects.filter(RESULT_DATA_ID=some_id).values('RESULT_DATA_ID', XML = CLOBVAL('RESULT_XML'))
This is the best solution for me, as I do get to keep using QuerySet, instead of RawQuerySet.
The only limitation I see with this solution for now, is that I need to always specify .values(CLOBVAL('RESULT_XML')) in my ORM queries, or Oracle DB will report ORA-19011 again, but I guess this still is a good outcome.
OLD ANSWER
So, I have found a way around the problem, thanks to Christopher Jones suggestion.
ORA-19011 is an error which Oracle DB replies with, when the amount of data it would be sending back as a string exceeds allowed buffer. Therefore, it needs to be sent back as a character LOB object instead.
Django does not have a direct support for that Oracle-specific method (at least I did not find one), so an answer to the problem was a raw Django query:
query = 'select a.RESULT_DATA_ID, a.RESULT_XML.getClobVal() as RESULT_XML FROM SCHEMA_NAME.RESULTS_DATA a WHERE a.RESULT_DATA_ID=%s'
data = ResultsData.objects.raw(query, [id])
This way, you get back a RawQuerySet, which if this less known, less liked cousin of Django's QuerySet. You can iterate through the answer, and RESULT_XML will contain a LOB field, which when interrogated will convert to a String type.
Handling a String type-encoded XML data is problematic, so I also employed XMLTODICT Python package, to get it into a bit more civilized shape.
Next, I should probably look for a way to modify Django's getter for the RESULT_XML field only, and have it generate a query to Oracle DB with .getClobVal() method in it, but I will touch on that in a different StackOverflow question: Django - custom getter for 1 field in model

Related

Django querysets optimization - preventing selection of annotated fields

Let's say I have following models:
class Invoice(models.Model):
...
class Note(models.Model):
invoice = models.ForeignKey(Invoice, related_name='notes', on_delete=models.CASCADE)
text = models.TextField()
and I want to select Invoices that have some notes. I would write it using annotate/Exists like this:
Invoice.objects.annotate(
has_notes=Exists(Note.objects.filter(invoice_id=OuterRef('pk')))
).filter(has_notes=True)
This works well enough, filters only Invoices with notes. However, this method results in the field being present in the query result, which I don't need and means worse performance (SQL has to execute the subquery 2 times).
I realize I could write this using extra(where=) like this:
Invoice.objects.extra(where=['EXISTS(SELECT 1 FROM note WHERE invoice_id=invoice.id)'])
which would result in the ideal SQL, but in general it is discouraged to use extra / raw SQL.
Is there a better way to do this?

You can remove annotations from the SELECT clause using .values() query set method. The trouble with .values() is that you have to enumerate all names you want to keep instead of names you want to skip, and .values() returns dictionaries instead of model instances.
Django internaly keeps the track of removed annotations in
QuerySet.query.annotation_select_mask. So you can use it to tell Django, which annotations to skip even wihout .values():
class YourQuerySet(QuerySet):
def mask_annotations(self, *names):
if self.query.annotation_select_mask is None:
self.query.set_annotation_mask(set(self.query.annotations.keys()) - set(names))
else:
self.query.set_annotation_mask(self.query.annotation_select_mask - set(names))
return self
Then you can write:
invoices = (Invoice.objects
.annotate(has_notes=Exists(Note.objects.filter(invoice_id=OuterRef('pk'))))
.filter(has_notes=True)
.mask_annotations('has_notes')
)
to skip has_notes from the SELECT clause and still geting filtered invoice instances. The resulting SQL query will be something like:
SELECT invoice.id, invoice.foo FROM invoice
WHERE EXISTS(SELECT note.id, note.bar FROM notes WHERE note.invoice_id = invoice.id) = True
Just note that annotation_select_mask is internal Django API that can change in future versions without a warning.

Ok, I've just noticed in Django 3.0 docs, that they've updated how Exists works and can be used directly in filter:
Invoice.objects.filter(Exists(Note.objects.filter(invoice_id=OuterRef('pk'))))
This will ensure that the subquery will not be added to the SELECT columns, which may result in a better performance.
Changed in Django 3.0:
In previous versions of Django, it was necessary to first annotate and then filter against the annotation. This resulted in the annotated value always being present in the query result, and often resulted in a query that took more time to execute.
Still, if someone knows a better way for Django 1.11, I would appreciate it. We really need to upgrade :(

We can filter for Invoices that have, when we perform a LEFT OUTER JOIN, no NULL as Note, and make the query distinct (to avoid returning the same Invoice twice).
Invoice.objects.filter(notes__isnull=False).distinct()

This is best optimize code if you want to get data from another table which primary key reference stored in another table
Invoice.objects.filter(note__invoice_id=OuterRef('pk'),)

We should be able to clear the annotated field using the below method.
Invoice.objects.annotate(
has_notes=Exists(Note.objects.filter(invoice_id=OuterRef('pk')))
).filter(has_notes=True).query.annotations.clear()

Peewee execute_sql with escaped characters

I have wrote a query which has some string replacements. I am trying to update a url in a table but the url has % signs in which causes a tuple index out of range exception.
If I print the query and run in manually it works fine but through peewee causes an issue. How can I get round this? I'm guessing this is because the percentage signs?
query = """
update table
set url = '%s'
where id = 1
""" % 'www.example.com?colour=Black%26white'
db.execute_sql(query)

The code you are currently sharing is incredibly unsafe, probably for the same reason as is causing your bug. Please do not use it in production, or you will be hacked.
Generally: you practically never want to use normal string operations like %, +, or .format() to construct a SQL query. Rather, you should to use your SQL API/ORM's specific built-in methods for providing dynamic values for a query. In your case of SQLite in peewee, that looks like this:
query = """
update table
set url = ?
where id = 1
"""
values = ('www.example.com?colour=Black%26white',)
db.execute_sql(query, values)
The database engine will automatically take care of any special characters in your data, so you don't need to worry about them. If you ever find yourself encountering issues with special characters in your data, it is a very strong warning sign that some kind of security issue exists.
This is mentioned in the Security and SQL Injection section of peewee's docs.

Wtf are you doing? Peewee supports updates.
Table.update(url=new_url).where(Table.id == some_id).execute()

MongoDB & web2py: Working with ObjectIds

I'm working on a very simple application as a use case for integrating MongoDB with web2py. In one section of the application, I'm interested in returning a list of products:
My database table:
db.define_table('products',
Field('brand', label='Brand'),
Field('photo', label='Photo'),
...
Field('options', label='Options'))
My controller:
def products():
qset = db(db['products'])
grid = qset.select()
return dict(grid=grid)
My view:
{{extend 'layout.html'}}
<h2>Product List</h2>
{{=grid}}
The products are returned without issue. However, the products._id field returns values in the form '26086541625969213357181461154'. If I switch to the shell (or python) and attempt to query my database based on those _ids, I can't find any of the products.
As you would expect, the _ids in the database are ObjectIds that look like this '544a481b2ceb7c3093a173a2'. I'd like to my view to return the ObjectIds and not the long strings. Simple, but I'm having trouble with it.

When constructing the DAL Row object for a given MongoDB record, the ObjectId is represented by converting to a long integer via long(str(value), 16). To convert back to an ObjectId, you can use the object_id method of the MongoDB adapter:
object_id = db._adapter.object_id('26086541625969213357181461154')
Of course, if you use the DAL to query MongoDB, you don't have to worry about this, as it handles the conversion automatically.

Although it makes perfect sense, I wasn't able to make Anthony's answer work. So, I just hacked it:
hex(value).replace("0x","").replace("L","")

Django get_or_create raises Duplicate entry for key Primary with defaults

Help! Can't figure this out! I'm getting a Integrity error on get_or_create even with a defaults parameter set.
Here's how the model looks stripped down.
class Example(models.Model):model
user = models.ForeignKey(User)
text = models.TextField()
def __unicode__(self):
return "Example"
I run this in Django:
def create_example_model(user, textJson):
defaults = {text: textJson.get("text", "undefined")}
model, created = models.Example.objects.get_or_create(
user=user,
id=textJson.get("id", None),
defaults=defaults)
if not created:
model.text = textJson.get("text", "undefined")
model.save()
return model
I'm getting an error on the get_or_create line:
IntegrityError: (1062, "Duplicate entry '3020' for key 'PRIMARY'")
It's live so I can't really tell what the input is.
Help? There's actually a defaults set, so it's not like, this problem where they do not have a defaults. Plus it doesn't have together-unique. Django : get_or_create Raises duplicate entry with together_unique
I'm using python 2.6, and mysql.

You shouldn't be setting the id for objects in general, you have to be careful when doing that.
Have you checked to see the value for 'id' that you are putting into the database?
If that doesn't fix your issue then it may be a database issue, for PostgreSQL there is a special sequence used to increment the ID's and sometimes this does not get incremented. Something like the following:
SELECT setval('tablename_id_seq', (SELECT MAX(id) + 1 FROM
tablename_id_seq));

get_or_create() will try to create a new object if it can't find one that is an exact match to the arguments you pass in.
So is what I'm assuming is happening is that a different user has made an object with the id of 3020. Since there is no object with the user/id combo you're requesting, it tries to make a new object with that combo, but fails because a different user has already created an item with the id of 3020.
Hopefully that makes sense. See what the following returns. Might give a little insight as to what has gone on.
models.Example.objects.get(id=3020)
You might need to make 3020 a string in the lookup. I'm assuming a string is coming back from your textJson.get() method.

One common but little documented cause for get_or_create() fails is corrupted database indexes.
Django depends on the assumption that there is only one record for given identifier, and this is in turn enforced using UNIQUE index on this particular field in the database. But indexes are constantly being rewritten and they may get corrupted e.g. when the database crashes unexpectedly. In such case the index may no longer return information about an existing record, another record with the same field is added, and as result you'll be hitting the IntegrityError each time you try to get or create this particular record.
The solution is, at least in PostgreSQL, to REINDEX this particular index, but you first need to get rid of the duplicate rows programmatically.

What checks should be performed on user data from forms?

I'm writing an app engine app, that has some input fields.
Are there any concerns I need to take into account about something like this?

You should validate that any input from your users meets your requirements. For example, if you need an positive integer, then make sure that's what you got.
As far as strings, you don't have to worry about SQL (or GQL in this case) injection as long as you don't construct the queries by hand. Instead use the GqlQuery.bind() method, or the methods provided by Query to pass the values (e.g., Query.filter()). Then these classes will take care of formulating the query so you don't need to worry about the syntax (or injection).
Examples (adapted from the docs linked to previously):
# this basic string query is safe
query = Song.all()
query.filter('title =', self.request.get('title'))
# a GqlQuery version of the previous example
query = GqlQuery("SELECT x FROM Song WHERE title = :1",self.request.get('title'))
# sanitize/validate when you have requirements: e.g., year must be a number
query = Song.all()
try:
year = int(self.request.get('year')) # make sure we got a number
except:
show error msg
query.filter('year =', year)

There are a number of forms libraries that do most of the hard work for you - you should use one of them. Django's newforms library is included with App Engine.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Django with Oracle DB - ORA-19011: Character string buffer too small - python

Related

Django querysets optimization - preventing selection of annotated fields

Peewee execute_sql with escaped characters

MongoDB & web2py: Working with ObjectIds

Django get_or_create raises Duplicate entry for key Primary with defaults

What checks should be performed on user data from forms?

Categories

Resources