I am implementing a search feature for user names. Some names have accented characters, but I want them to be searchable by their nearest ASCII approximation. For example: Vû Trån would be searchable with Vu Tran.
I found a Python library called Unidecode to handle this conversion. It works as expected: it takes my Unicode string Vû Trån and returns Vu Tran. Perfect.
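For reference, a minimal demonstration of that conversion (assuming the unidecode package is installed):
from unidecode import unidecode

print(unidecode(u"Vû Trån"))  # prints "Vu Tran"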
The issue arises when I start querying my database – I use SQLAlchemy and Postgres.
Here's my Python query:
Person.query.filter(Person.ascii_name.ilike("%{q}%".format(q=query))).limit(25).all()
ascii_name is the getter for my name column, implemented as follows:
from unidecode import unidecode
from sqlalchemy import Column, BigInteger, ForeignKey, Unicode
from sqlalchemy.orm import column_property, synonym

class PersonUtil(object):
    def _get_ascii_name(self):
        return unidecode(unicode(self.name))

class Person(Base, PersonUtil):
    """
    My abbreviated Person class
    """
    __tablename__ = 'person'
    id = Column(BigInteger, ForeignKey('user.id'), primary_key=True)
    first_name = Column(Unicode, nullable=False)
    last_name = Column(Unicode, nullable=False)
    name = column_property(first_name + " " + last_name)
    ascii_name = synonym('name', descriptor=property(fget=PersonUtil._get_ascii_name))
My intent behind this code is that, because I store the Unicode versions of the first and last names in my database, I need a way to call unidecode(unicode(name)) when I retrieve a person's name. Hence I use descriptor=property(fget=...), so that whenever I access Person.ascii_name I get the "unidecoded" name attribute. That way I can simply write Person.ascii_name.ilike("%{my_query}%")... and match the nearest ASCII name against the search query, which also contains only ASCII characters.
This doesn't fully work. The ilike query on ascii_name works when the name contains no converted characters: it matches "Bob Smith", but it does not match "Bøb Smíth". It fails at the first converted character, which in the case of "Bøb Smíth" is the letter "ø".
I am not sure why this is happening. The ascii_name getter returns the expected string "Bob Smith" or "Vu Tran", but coupled with the ilike method it doesn't work.
Why is this happening? I've not been able to find anything about this issue.
How can I either fix my existing code to make this work, or is there a better way to do this that will work? I would prefer not to have to change my DB schema.
Thank you.
What you want to do simply won't work, because ilike only works on real columns in the database. column_property and synonym are just syntactic sugar provided by SQLAlchemy to make the Python side convenient; the Python property you attached runs only in Python and is never translated into SQL. If you want to leverage the backend to query with LIKE the way you intended, the actual values need to be there. I am afraid you have to generate and store the ASCII full name in the database, which means you do need to change your schema to include ascii_name as a real column, and make sure it is populated on insert. To verify this yourself, dump out the data in the table and see whether your manually constructed queries work.
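For illustration, a minimal sketch of that schema change (the event-listener wiring and the index are my assumptions, not part of the original answer):
from unidecode import unidecode
from sqlalchemy import Column, BigInteger, ForeignKey, Unicode, event

class Person(Base):
    __tablename__ = 'person'
    id = Column(BigInteger, ForeignKey('user.id'), primary_key=True)
    first_name = Column(Unicode, nullable=False)
    last_name = Column(Unicode, nullable=False)
    # A real column, so the database itself can evaluate ILIKE against it
    ascii_name = Column(Unicode, nullable=False, index=True)

@event.listens_for(Person, 'before_insert')
@event.listens_for(Person, 'before_update')
def _store_ascii_name(mapper, connection, target):
    # Keep the searchable column in sync with the real names on every write
    target.ascii_name = unidecode(u"%s %s" % (target.first_name, target.last_name))
The search would then unaccent the incoming query as well, e.g. Person.query.filter(Person.ascii_name.ilike(u"%" + unidecode(query) + u"%")).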
I have the following model for an Oracle database, which is not part of my Django project:
from django.db import models

class ResultsData(models.Model):
    RESULT_DATA_ID = models.IntegerField(primary_key=True, db_column="RESULT_DATA_ID")
    RESULT_XML = models.TextField(blank=True, null=True, db_column="RESULT_XML")

    class Meta:
        managed = False
        db_table = '"schema_name"."results_data"'
The RESULT_XML field is declared as an XMLField in the database itself. I chose to represent it as a TextField in the Django model, since that has no character limit.
When I try to fetch some data with that model, I get the following error:
DatabaseError: ORA-19011: Character string buffer too small
I figure it is because of the volume of data stored in the RESULT_XML field, since a query that pulls only .values("RESULT_DATA_ID") works fine.
Any ideas on how I can work around this problem? Googling for answers did not yield anything so far.
UPDATED ANSWER
I have found a much better way of dealing with this issue: I wrote a custom field Transform object, which generates the Oracle SQL query I was after:
OracleTransforms.py
from django.db.models import TextField
from django.db.models.lookups import Transform

class CLOBVAL(Transform):
    '''
    Oracle-specific transform for an XMLType field, which returns string data exceeding
    the buffer size (ORA-19011: Character string buffer too small) as a character LOB type.
    '''
    function = None
    lookup_name = 'clobval'

    def as_oracle(self, compiler, connection, **extra_context):
        return super().as_sql(
            compiler, connection,
            template='(%(expressions)s).GETCLOBVAL()',
            **extra_context
        )

# Needed for CLOBVAL to work as a .values('field_name__clobval') lookup in Django ORM queries
TextField.register_lookup(CLOBVAL)
With the above, I can now just write a query as follows:
from .OracleTransforms import CLOBVAL
ResultsData.objects.filter(RESULT_DATA_ID=some_id).values('RESULT_DATA_ID', 'RESULT_XML__clobval')
or
ResultsData.objects.filter(RESULT_DATA_ID=some_id).values('RESULT_DATA_ID', XML=CLOBVAL('RESULT_XML'))
This is the best solution for me, as I get to keep using a QuerySet instead of a RawQuerySet.
The only limitation I see so far is that I always need to request the field through the clobval transform in my ORM queries, or Oracle will report ORA-19011 again, but I guess this is still a good outcome.
OLD ANSWER
So, I found a way around the problem, thanks to Christopher Jones's suggestion.
ORA-19011 is the error Oracle DB replies with when the amount of data it would send back as a string exceeds the allowed buffer; the data therefore needs to be sent back as a character LOB (CLOB) object instead.
Django has no direct support for that Oracle-specific method (at least I did not find one), so the answer to the problem was a raw Django query:
query = 'select a.RESULT_DATA_ID, a.RESULT_XML.getClobVal() as RESULT_XML FROM SCHEMA_NAME.RESULTS_DATA a WHERE a.RESULT_DATA_ID=%s'
data = ResultsData.objects.raw(query, [id])
This way you get back a RawQuerySet, which is the lesser-known, less-liked cousin of Django's QuerySet. You can iterate through the result, and RESULT_XML will contain a LOB field, which converts to a string when read.
Handling XML data encoded in a string is awkward, so I also employed the xmltodict Python package to get it into a somewhat more civilized shape.
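A small sketch of that parsing step (assuming xmltodict is installed; treating RESULT_XML as either a LOB handle or a plain string is my assumption about the driver, not something from the original answer):
import xmltodict

for row in data:
    raw = row.RESULT_XML
    # Depending on the driver, the value may be a LOB handle or already a string
    xml_text = raw.read() if hasattr(raw, 'read') else raw
    doc = xmltodict.parse(xml_text)  # nested dicts keyed by tag names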
Next, I should probably look for a way to modify Django's getter for the RESULT_XML field only, and have it generate a query to Oracle DB with the .getClobVal() method in it, but I will touch on that in a different StackOverflow question: Django - custom getter for 1 field in model
I'm trying to build a system that allows users to make 'projects'. These projects have a fairly simple syntax: they just need an ID, a name, optionally a description, and the participants.
Because I want users to be able to add or remove other users from a project without having to input the entire list of users again, I want to use a list or array or some such structure instead of a string.
However, I'm stuck trying to input it. I initially tried a regular Python list, but SQLAlchemy didn't accept that. From a Google search, it appears that's not possible, unless I simply haven't come across the right way.
I am now trying to use an Array instead. My current code looks like this:
class DBProject(db.Model):
    __tablename__ = 'project'
    project_id = db.Column(db.Integer, primary_key=True)
    project_name = db.Column(db.String(128))
    project_description = db.Column(db.String(255), nullable=True)
    project_participants = db.Column(db.ARRAY(db.Integer))
But this gives the error:
in _compiler_dispatch raise exc.UnsupportedCompilationError(visitor, cls)
sqlalchemy.exc.CompileError: (in table 'project', column 'project_participants'): Compiler can't render element of type
Before that, I tried leaving (db.Integer) out, or replacing it with just Integer (because I had seen that with other people who had similar problems), like this:
project_participants = db.Column(db.ARRAY(Integer))
which gives the error that 'Integer' is not defined, or, in the case of leaving it out altogether, this error:
TypeError: __init__() missing 1 required positional argument: 'item_type'
I'm hoping to add the array to the database so that I can append or delete users without making the system's users input the whole list of allowed users all over again when they just want to add or remove one.
First, I strongly recommend saving your participant data in an additional table.
You can add an m:n relation between your DBProject table and a Participants table, as sketched below.
Anything else would go against database best practice.
Saving your participants as an array in your table makes it impossible, or at least very awkward, to filter by participant in a SQL query.
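A minimal sketch of that m:n setup (the association-table layout and the User model are illustrative assumptions, not from the original answer):
# Association table: one row per (project, participant) pair
project_participants = db.Table(
    'project_participants',
    db.Column('project_id', db.Integer, db.ForeignKey('project.project_id'), primary_key=True),
    db.Column('user_id', db.Integer, db.ForeignKey('user.id'), primary_key=True),
)

class DBProject(db.Model):
    __tablename__ = 'project'
    project_id = db.Column(db.Integer, primary_key=True)
    project_name = db.Column(db.String(128))
    # Adding or removing one participant is a single row insert/delete
    participants = db.relationship('User', secondary=project_participants)
Adding a participant is then just project.participants.append(user) followed by a commit.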
But if you have a good reason to ignore that recommendation, you can use pickling to make SQLAlchemy serialize your list when writing it into the database:
class DBProject(db.Model):
    __tablename__ = 'project'
    project_id = db.Column(db.Integer, primary_key=True)
    project_name = db.Column(db.String(128))
    project_description = db.Column(db.String(255), nullable=True)
    project_participants = db.Column(db.PickleType, nullable=True)
Using that construct you can put basically any object (as long as it does not exceed a database-specific maximum size) into a database field.
Save data:
dbproject_object = DBProject()
dbproject_object.project_name = "a_name"
dbproject_object.project_participants = ["you", "him", "another one"]
db.session.add(dbproject_object)
db.session.commit()
Read data:
participants_array = db.session.query(DBProject).filter(DBProject.project_name == "a_name").one().project_participants
Result:
participants_array: ["you", "him", "another one"]
I read about SQLAlchemy's joinedload, as mentioned here, and I'm a little confused about its benefits or special uses over simply joining two tables, as mentioned here.
I would like to know when to use each method. Currently I don't see any benefit to using joinedload. Can you please explain the difference, and the use cases where joinedload is preferable?
The SQLAlchemy docs say that joinedload() is not a replacement for join(), and that joinedload() does not affect the query result:
Query.join()
Query.options(joinedload())
Use joinedload() when you want to eagerly load data that is related to the rows you are querying; getting this related data does not change the result of the query, it is like an attachment. It's best to look at the SQLAlchemy docs on joinedload. Consider these models:
class User(db.Model):
    ...
    addresses = relationship('Address', backref='user')

class Address(db.Model):
    ...
    user_id = Column(Integer, ForeignKey('users.id'))
The query below filters on a user and returns that user; through the joinedload option you also get that user's addresses:
user = db.session.query(User).options(joinedload(User.addresses)).filter(User.id == 1).one()
Now let's look at join():
user = db.session.query(User).join(Address).filter(User.id == Address.user_id).one()
Conclusion
The query with joinedload() returns the user together with that user's addresses.
The other query queries both tables and matches on the user id, so the result depends on both tables: with joinedload(), if the user doesn't have any addresses you still get the user, just with no addresses; with join(), if the user doesn't have an address there is no result at all.
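To make the contrast concrete, a small usage sketch (assuming the models above; illustrative, not from the original answer):
from sqlalchemy.orm import joinedload

# Eager load: the user row comes back even if it has zero addresses
user = db.session.query(User).options(joinedload(User.addresses)).filter(User.id == 1).one()
addresses = user.addresses  # already loaded; touching it emits no extra SQL

# Inner join: rows must match in both tables, so address-less users disappear
users_with_addresses = db.session.query(User).join(Address).distinct().all()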
I am accessing a PostgreSQL database using SQLAlchemy models. In one of the models I have a column with the UUID type:
id = Column(UUID(as_uuid=True), default=uuid.uuid4, nullable=False, unique=True)
and it works when I try to insert new row (generates new id).
The problem is when I try to fetch a Person by id. I try:
person = session.query(Person).filter(Person.id.like(some_id)).first()
where some_id is a string received from the client, but then I get the error ProgrammingError: operator does not exist: uuid ~~ unknown.
How do I fetch/compare a UUID column in the database through SQLAlchemy?
Don't use like; use =, not == (in ISO-standard SQL, = means equality).
Keep in mind that UUIDs are stored in PostgreSQL as a binary type, not as text strings, so LIKE makes no sense. You could probably do uuid::text LIKE ?, but it would perform very poorly over large sets, because you would effectively be ensuring that indexes can't be used.
But = works, and is far preferable:
mydb=> select 'd796d940-687f-11e3-bbb6-88ae1de492b9'::uuid = 'd796d940-687f-11e3-bbb6-88ae1de492b9';
 ?column?
----------
 t
(1 row)
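In SQLAlchemy terms that means an == comparison; a sketch, assuming the Person model from the question and that some_id is a valid UUID string:
import uuid

# uuid.UUID() parses the client string (raises ValueError if malformed);
# with as_uuid=True the column compares naturally against uuid.UUID objects
person = session.query(Person).filter(Person.id == uuid.UUID(some_id)).first()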
Considering my users can save data as "café" or "cafe", I need to be able to search on those fields with an accent-insensitive query.
I've found https://github.com/djcoin/django-unaccent/, but I have no idea whether it is possible to implement something similar in SQLAlchemy.
I'm using PostgreSQL, so a solution specific to this database works for me, though a generic solution would be much better.
Thanks for your help.
First install the unaccent extension in PostgreSQL: create extension unaccent;
Next, declare the SQL function unaccent in Python:
from sqlalchemy.sql.functions import ReturnTypeFromArgs

class unaccent(ReturnTypeFromArgs):
    pass
and use it like this:
for place in session.query(Place).filter(unaccent(Place.name) == "cafe").all():
    print place.name
Make sure you have the correct indexes if you have a large table; otherwise this will result in a full table scan.
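One caveat worth knowing (my addition, not from the original answer): PostgreSQL's unaccent() is not marked IMMUTABLE, so it is rejected in a plain expression index. A common workaround is an immutable SQL wrapper, created here from Python; the names are illustrative:
from sqlalchemy import text

# Wrap unaccent() with an explicit dictionary in an IMMUTABLE function
# so PostgreSQL accepts it in an expression index
session.execute(text(
    "CREATE OR REPLACE FUNCTION immutable_unaccent(text) RETURNS text AS "
    "$$ SELECT public.unaccent('public.unaccent', $1) $$ LANGUAGE sql IMMUTABLE"
))
session.execute(text(
    "CREATE INDEX IF NOT EXISTS ix_place_name_unaccent "
    "ON place (immutable_unaccent(name))"
))
session.commit()
Queries then need to call immutable_unaccent(...) rather than unaccent(...) for the planner to use this index.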
A simple and database-agnostic solution is to write the field(s) that can have accents twice: once with and once without accents. You can then run your searches against the unaccented version.
To generate the unaccented version of a string you can use Unidecode.
To automatically assign the unaccented version when a record is inserted or updated, you can use the default and onupdate clauses of the Column definition. For example, using Flask-SQLAlchemy you could do something like this:
from unidecode import unidecode

def unaccent(context):
    return unidecode(context.current_parameters['some_string'])

class MyModel(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    some_string = db.Column(db.String(128))
    some_string_unaccented = db.Column(db.String(128), default=unaccent, onupdate=unaccent, index=True)
Note how I only indexed the unaccented field, because that is the one on which the searches will be made.
Of course, before you can search you also have to unaccent the value you are searching for. For example:

def search(text):
    # use unidecode() directly here; the unaccent() helper above expects an
    # insert/update context, not a plain string
    return MyModel.query.filter_by(some_string_unaccented=unidecode(text)).all()
You can apply the same technique to full text search, if necessary.