Django: Auto suggest using Elasticsearch DSL - python

I am using elasticsearch-dsl to index data in an Elasticsearch index. I am using the following class to create the Elasticsearch document:
from elasticsearch_dsl import Boolean, Document, Text
# my_analyzer (shown below) must be defined before this class body runs
class BooksDoc(Document):
    title = Text(analyzer=my_analyzer)
    author = Text(analyzer=my_analyzer)
    publisher = Text(analyzer=my_analyzer)
    image_url = Text(analyzer=my_analyzer)
    price = Text(analyzer=my_analyzer)
    category = Text(analyzer=my_analyzer)
    published = Boolean()
    upload_date = Text()
    class Index:
        name = 'books'
Here is my analyzer:
from elasticsearch_dsl import analyzer, tokenizer
my_analyzer = analyzer(
    'my_analyzer',
    tokenizer=tokenizer('trigram', 'edge_ngram', min_gram=1, max_gram=20),
    filter=['lowercase'],
)
I am using the following function to index a document:
import traceback
def indexing(self):
    doc = BooksDoc(
        meta={'id': self.id},
        title=self.book_title,
        author=self.book_author,
        publisher=self.book_publisher,
        image_url=self.front_image,
        price=self.book_price,
        category=__catagory,  # keyword must match the `category` field on BooksDoc
        published=self.isPublished,
        upload_date=self.dateadded,
    )
    try:
        doc.save()
        return doc.to_dict(include_meta=True)
    except Exception:
        print(traceback.format_exc())
        return None
I want to implement search such that when a user enters a search string, I can query Elasticsearch and fetch all the records that match. For example, if the user enters "Boo", then all the records containing the string "Boo" should be returned. Right now search works fine when the string matches exactly in Elasticsearch, but I also want to fetch a record if there is a partial match. How can I do this?

It really depends on what query you are using. Typically, if you are using edge_ngram, you want to specify a different analyzer for search (the search_analyzer) so that the edge_ngram is not applied to the input. Then you can use a standard match query.
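To see why the index-time and search-time analyzers should differ, here is a toy, pure-Python model of the two analysis chains; it does not call Elasticsearch. It assumes the text is split into lowercase words before the edge n-grams are built, and edge_ngrams, index_terms, search_terms and match are illustrative names, not elasticsearch-dsl APIs.

```python
# Toy model (not Elasticsearch): why edge_ngram belongs at index time only.
# Assumption: text is lowercased and split into words before n-gramming.

def edge_ngrams(word, min_gram=1, max_gram=20):
    """Prefixes of `word`, like the edge_ngram tokenizer emits."""
    return [word[:n] for n in range(min_gram, min(max_gram, len(word)) + 1)]

def index_terms(text):
    """Index-time analysis: lowercase, split, then edge n-grams."""
    terms = set()
    for word in text.lower().split():
        terms.update(edge_ngrams(word))
    return terms

def search_terms(text):
    """Search-time analysis (standard-like): lowercase and split only."""
    return set(text.lower().split())

def match(query_terms, doc_terms):
    """A match query with OR semantics hits if any query term was indexed."""
    return bool(query_terms & doc_terms)

titles = ["Book of Dust", "Boolean Logic", "Bird Watching"]
indexed = {t: index_terms(t) for t in titles}

# Plain search analyzer: "Boo" stays "boo", a prefix of indexed words.
good = [t for t in titles if match(search_terms("Boo"), indexed[t])]

# edge_ngram on the query too: "Boo" -> {"b", "bo", "boo"}, and "b" hits "Bird".
bad = [t for t in titles if match(index_terms("Boo"), indexed[t])]
```

With the plain search analyzer, "Boo" only matches titles containing a word that starts with "boo"; running edge_ngram on the query as well turns "Boo" into b, bo and boo, so the single letter b also matches "Bird Watching". In elasticsearch-dsl the field-level setting for this is the search_analyzer mapping parameter, for example Text(analyzer=my_analyzer, search_analyzer='standard').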
For auto-suggest, though, it is much better (more performant) to use a Completion field and the completion suggester (0, 1).
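For reference, here is a sketch of the raw request bodies involved, written as plain Python dicts you could send with the Elasticsearch client: an index mapping with a field of type completion (the field name suggest is an assumption), and a _search body asking the completion suggester for a prefix. completion_request and its parameters are hypothetical helpers; in elasticsearch-dsl the corresponding field class is Completion.

```python
# Body for creating the index: "suggest" is a hypothetical completion field.
BOOKS_MAPPING = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "suggest": {"type": "completion"},
        }
    }
}

def completion_request(prefix, field="suggest", name="title-suggest", size=5):
    """Build a _search body asking the completion suggester for `prefix`."""
    return {
        "suggest": {
            name: {
                "prefix": prefix,
                "completion": {"field": field, "size": size},
            }
        }
    }

body = completion_request("boo")
```

The suggestions come back under the response's suggest key, keyed by the suggester name (title-suggest here). The suggest field has to be populated at index time with the strings you want to offer as completions, for example each book title.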
0 - https://www.elastic.co/blog/you-complete-me
1 - http://elasticsearch-dsl.readthedocs.io/en/latest/search_dsl.html#suggestions

Related

Get column names where searched value was found in Django

I have a query that performs full-text search on several columns (including columns of models related via FK) in Django:
from django.contrib.postgres.search import SearchQuery, SearchRank, SearchVector
from django.db import models
class TaskManager(models.Manager):
    def search_by_text(self, text: str):
        search_vector = SearchVector(
            "task_type__name",
            "order__registration_number",
            "order__report_number",
            "car_owner_name",
            "task_number",
            "order__customer_order_number",
            "order__customer_owner",
            "order__report_type__value",
        )
        search_query = SearchQuery(text)
        return self.get_queryset().annotate(
            rank=SearchRank(search_vector, search_query)
        ).order_by("-rank")  # highest rank first
How can I get not only the found records but also, for each record, the column names where the searched value was found? Example:
>>> Entry.objects.search_by_text("some value")[0].columns_matched
["task_type__name", "task_number"]
I'm using Postgresql 10.12 and Django 2.2.10.
I solved this problem by creating a migration that creates a database view with a search_q column containing a concatenated string of the values from all searched columns.
CREATE VIEW app_taskdata_search_view AS
SELECT
    ROW_NUMBER() OVER (ORDER BY task.task_number) AS id,
    CONCAT(tasktype.name, '|', task.description, '|',
           usr.first_name, '|', usr.last_name) AS search_q
FROM app_taskdata AS task
INNER JOIN app_tasktype AS tasktype ON task.task_type_id = tasktype.id
-- "user" is a reserved word in PostgreSQL, so a different alias is used
INNER JOIN users_user AS usr ON task.user_id = usr.id
ORDER BY task.task_number;
Then in models.py:
class TaskDataSearchView(models.Model):
    """
    Read-only model backed by the app_taskdata_search_view database view.
    """
    id = models.BigIntegerField(primary_key=True)
    search_q = models.TextField()
    class Meta:
        db_table = "app_taskdata_search_view"
        managed = False  # Django does not create or migrate the view itself
Assuming I know the order of the concatenated column values, I can write Python code that loops through the results and checks whether the searched value was found in each column:
text = "Some text to search"
records = TaskDataSearchView.objects.filter(search_q__icontains=text)
field_mapping = {}
values = records[0].search_q.split("|")
# check if the task_type column (first value in the view) contains the searched text;
# compare case-insensitively to mirror icontains
if text.lower() in values[0].lower():
    field_mapping['task_type'] = True
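The single check above generalises to every column. A minimal sketch, assuming the column order matches the CONCAT() call in the view (tasktype.name, description, first_name, last_name) and that no value itself contains the | separator; columns_matched and SEARCH_COLUMNS are hypothetical names:

```python
# Hypothetical names for the view's columns, in the same order as CONCAT().
SEARCH_COLUMNS = ["task_type", "description", "first_name", "last_name"]

def columns_matched(search_q, text, columns=SEARCH_COLUMNS, sep="|"):
    """Return the columns whose value contains `text`.

    The comparison is case-insensitive to mirror search_q__icontains.
    """
    values = search_q.split(sep)
    needle = text.lower()
    return [col for col, value in zip(columns, values) if needle in value.lower()]
```

Usage would be along the lines of [columns_matched(r.search_q, text) for r in records]. A search string that spans a | separator can match search_q as a whole without matching any single column, in which case the list is empty.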
P.S: Useful link

Flask multiple parameters how to avoid multiple if statements when querying multiple columns of a database from one url

I am trying to build an accounting database using Flask as the front end. The main page is the ledger, with nine columns: "date", "description", "debit", "credit", "amount", "account", "reference", "journal" and "year". I need to be able to query each of them, and sometimes two at once. There are over 8000 entries, and growing. My code so far displays all the rows, 200 at a time with pagination. I have read PEP 8, which talks about readable code, and I have read this question on multiple parameters and this one on multiple parameters, and I like the idea of using
request.args.get
But I need to display all the rows until I query. I have also looked at this question on nested ifs, and I thought perhaps I could write a function for each query, with the "if" outside the view function, and then call each one in the view function, but I am not sure how. Or I could have a view function for each query, but I am not sure how that would work either. Here is my code so far:
@bp.route('/books', methods=['GET', 'POST'])
@bp.route('/books/<int:page_num>', methods=['GET', 'POST'])
@bp.route('/books/<int:page_num>/<int:id>', methods=['GET', 'POST'])
@bp.route('/books/<int:page_num>/<int:id>/<ref>', methods=['GET', 'POST'])
@login_required
def books(page_num=1, id=None, ref=None):  # default needed for the bare /books route
    if ref is not None:
        books = Book.query.order_by(Book.date.desc()).filter(Book.REF == ref).paginate(per_page=100, page=page_num, error_out=True)
    else:
        books = Book.query.order_by(Book.date.desc()).paginate(per_page=100, page=page_num, error_out=True)
    if id is not None:
        obj = Book.query.get(id) or Book()
        form = AddBookForm(request.form, obj=obj)
        if form.validate_on_submit():
            form.populate_obj(obj)
            db.session.add(obj)
            db.session.commit()
            return redirect(url_for('books.books'))
    else:
        form = AddBookForm()
        if form.validate_on_submit():
            obj = Book(id=form.id.data, date=form.date.data, description=form.description.data,
                       debit=form.debit.data, credit=form.credit.data, montant=form.montant.data,
                       AUX=form.AUX.data, TP=form.TP.data, REF=form.REF.data, JN=form.JN.data,
                       PID=form.PID.data, CT=form.CT.data)
            db.session.add(obj)
            db.session.commit()
            return redirect(url_for('books.books', page_num=1))
    return render_template('books/books.html', title='Books', books=books, form=form)
With this code there are no error messages. This question asks for advice on how to keep my code as readable and as simple as possible while being able to query nine columns of the database, displaying all the rows matched by a query, and all the rows when no query is active.
All help is greatly appreciated. Paul
I am running this on Debian 10 with Python 3.7.
Edit: I am used to working with LibreOffice Base.
My question is: how do I search one or two columns at a time in my database? Nine of its twelve columns need to be searchable, and I want to be able to search one or more of them at a time. For example, the column "reference" holds a document reference like "A32", and "account" holds the name of the supplier, like "FILCUI", and I may want to search both at the same time. I have done more research and found that most people advocate a full-text search engine such as Elasticsearch or Whoosh, but in my case I feel that if I search "A32" (a document number), I will get anything in the 12-column model with "A 1 2" in it. I have looked at Flask Tutorial, 101 search and Whoosh, all very good tutorials by excellent people. I thought about trying to use SQLAlchemy, but in the first "Flask Tutorial" the author says
but given the fact that SQLAlchemy does not support this functionality,
so I thought that SQLAlchemy-Integrations would not work either.
So, is there a way to search, query or filter multiple different columns of a model, possibly with a form for each search, without ending up with a "sack of knots" of code that is impossible to read or test? I would like to stick to SQLAlchemy if possible.
I just need a little pointer in the right direction, or a simple personal opinion that I can test.
Warm regards.
EDIT:
I have not answered my question, but I have made progress: I can run one query at a time and display all the results on the one page without a single "if" statement, and I think my code is clear and readable (?). I divided each query into its own view function returning to the same main page, and each function has its own submit button. This has enabled me to render the same page. Here is my routes code:
@bp.route('/search_aux', methods=['GET', 'POST'])
@login_required
def search_aux():
    page_num = request.args.get('page_num', default=1, type=int)
    books = Book.query.order_by(Book.date.desc()).paginate(per_page=100, page=page_num, error_out=True)
    add_form = AddBookForm()
    aux_form = SearchAuxForm()
    date_form = SearchDateForm()
    debit_form = SearchDebitForm()
    credit_form = SearchCreditForm()
    montant_form = SearchMontantForm()
    jn_form = SearchJNForm()
    pid_form = SearchPIDForm()
    ref_form = SearchREForm()
    tp_form = SearchTPForm()
    ct_form = SearchCTForm()
    des_form = SearchDescriptionForm()
    if request.method == 'POST':
        aux = aux_form.selectaux.data
        books = Book.query.order_by(Book.date.desc()).filter(Book.AUX == str(aux)).paginate(per_page=100, page=page_num, error_out=True)
    return render_template('books/books.html', books=books, add_form=add_form, aux_form=aux_form, date_form=date_form, debit_form=debit_form,
                           credit_form=credit_form, montant_form=montant_form, jn_form=jn_form, pid_form=pid_form, ref_form=ref_form,
                           tp_form=tp_form, ct_form=ct_form, des_form=des_form)
There is a simple form for each query, and it works a treat for each single query. Here is the form and HTML code:
def AUX():
    # query_factory for the select field; defined first because the class body below references it
    return Auxilliere.query
class SearchAuxForm(FlaskForm):
    selectaux = QuerySelectField('Aux', query_factory=AUX, get_label='id')
    submitaux = SubmitField('submit')
html:
<div class="AUX">
<form action="{{ url_for('books.search_aux') }}" method="post">
{{ aux_form.selectaux(class="input") }}{{ aux_form.submitaux(class="submit") }}
</form>
</div>
I tried to do this as a single function with one submit button, but it ended in disaster. I have not submitted this as an answer because it does not do everything I asked, but it is a start.
FINAL EDIT:
I would like to thank the person(s) who reopened this question, allowing Mr Lucas Scott to provide a fascinating and informative answer to help me and others.
There are many ways to achieve your desired result of being able to query/filter multiple columns in a table. I will give you an example of how I would approach creating an endpoint that will allow you to filter on one column, or multiple columns.
Here is our basic Books model and the /books endpoint as a stub:
import flask
from flask_sqlalchemy import SQLAlchemy
app = flask.Flask(__name__)
app.config["SQLALCHEMY_DATABASE_URI"] = "sqlite:///:memory:"  # in-memory SQLite database
db = SQLAlchemy(app)
class Books(db.Model):
    __tablename__ = "book"
    id = db.Column(db.Integer, primary_key=True, autoincrement=True)
    title = db.Column(db.String(255), nullable=False)
    author = db.Column(db.String(255), nullable=False)
    supplier = db.Column(db.String(255))
    published = db.Column(db.Date, nullable=False)
with app.app_context():  # Flask-SQLAlchemy 3.x requires an app context here
    db.create_all()
@app.route("/books", methods=["GET"])
def all_books():
    pass
The first step is to decide on a method of querying a collection using URL parameters. I will use the fact that multiple instances of the same key in a query string are collected into a list, which lets us filter on multiple columns.
For example, /books?filter=id&filter=author holds {"filter": ["id", "author"]}. In Flask, request.args is a MultiDict, so request.args.getlist("filter") returns that whole list, whereas request.args.get("filter") returns only the first value.
For our query syntax, each filter value is a comma-separated column,operator,value triple.
Example:
/books?filter=author,eq,jim&filter=supplier,eq,self published
This turns into {"filter": ["author,eq,jim", "supplier,eq,self published"]}. Notice the space in "self published": Flask handles the URL decoding for us and gives back a string with a space instead of %20.
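The standard library's urllib.parse.parse_qs shows the same repeated-key behaviour outside Flask, and splitting each value on commas gives the column, operator, value triples that the Filter class below is built from:

```python
from urllib.parse import parse_qs

# Repeated "filter" keys are collected into one list; %20 is decoded to a space.
params = parse_qs("filter=author,eq,jim&filter=supplier,eq,self%20published")

# Each value splits into (column, operator, value).
triples = [tuple(f.split(",")) for f in params["filter"]]
```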
Let's clean this up a bit by adding a Filter class to represent our filter query parameter.
class QueryValidationError(Exception):
    """ We can handle specific exceptions and
    return an HTTP response with Flask """
    pass
class Filter:
    supported_operators = ("eq", "ne", "lt", "gt", "le", "ge")
    def __init__(self, column, operator, value):
        self.column = column
        self.operator = operator
        self.value = value
        self.validate()
    def validate(self):
        if self.operator not in self.supported_operators:
            # We will deal with catching this later
            raise QueryValidationError(
                f"operator `{self.operator}` is not one of supported "
                f"operators `{self.supported_operators}`"
            )
Now we will create a function for processing our list of filters into a list of Filter objects.
def create_filters(filters):
    filters_processed = []
    if filters is None:
        # No filters given
        return filters_processed
    elif isinstance(filters, str):
        # if only one filter given
        filter_split = filters.split(",")
        filters_processed.append(
            Filter(*filter_split)
        )
    elif isinstance(filters, list):
        # if more than one filter given
        try:
            filters_processed = [Filter(*_filter.split(",")) for _filter in filters]
        except Exception:
            raise QueryValidationError("Filter query invalid")
    else:
        # Programmer error
        raise TypeError(
            f"filters expected to be `str` or list "
            f"but was of type `{type(filters)}`"
        )
    return filters_processed
And now we can add our helper functions to our endpoint:
@app.route("/books", methods=["GET"])
def all_books():
    args = flask.request.args
    filters = create_filters(args.getlist("filter"))  # getlist returns every filter value
SQLAlchemy allows us to filter using operator overloading, as in filter(Books.author == "some value"). The == here does not trigger the default == behaviour; instead, SQLAlchemy overloads this operator so that it builds the SQL expression that checks for equality, which is then added to the query. We can leverage this behaviour by using Python's operator module. For example:
import operator
from models import Books  # wherever the Books model is defined
authors = Books.query.filter(operator.eq(Books.author, "some author")).all()
This does not seem helpful by itself, but it gets us a step closer to a generic, dynamic filtering mechanism. The next important step towards making this more dynamic is the built-in getattr, which lets us look up attributes on a given object using strings. Example:
class Anything:
    def say_hi(self):
        print("hello")
# use getattr to say hello
getattr(Anything, "say_hi")      # returns the function `say_hi`
getattr(Anything(), "say_hi")()  # calls `say_hi` on an instance, printing "hello"
We can now tie this all together by creating a generic filtering function:
def filter_query(filters, query, model):
    for _filter in filters:
        # get our operator
        op = getattr(operator, _filter.operator)
        # get the column to filter on
        column = getattr(model, _filter.column)
        # value to filter for
        value = _filter.value
        # build up a query by adding multiple filters
        query = query.filter(op(column, value))
    return query
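The same getattr plus operator dispatch can be tried without a database. This stand-alone sketch applies (column, operator, value) triples to plain Python objects; each filter narrows the previous result, just as chained query.filter() calls are combined with AND. The Book dataclass and filter_items are hypothetical stand-ins, not part of the answer's code, and values are compared exactly as given, so they are typed here rather than taken from a URL:

```python
import operator
from dataclasses import dataclass

@dataclass
class Book:  # hypothetical stand-in for the SQLAlchemy model
    id: int
    author: str
    supplier: str

def filter_items(filters, items):
    """Apply (column, op_name, value) triples one after another (AND)."""
    for column, op_name, value in filters:
        op = getattr(operator, op_name)  # e.g. "eq" -> operator.eq
        items = [item for item in items if op(getattr(item, column), value)]
    return items

books = [
    Book(1, "jim", "self published"),
    Book(2, "jim", "FILCUI"),
    Book(3, "ann", "FILCUI"),
]
result = filter_items([("author", "eq", "jim"), ("supplier", "eq", "FILCUI")], books)
```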
We can filter any model with our implementation, and not just by one column.
@app.route("/books", methods=["GET"])
def all_books():
    args = flask.request.args
    filters = create_filters(args.getlist("filter"))
    query = Books.query
    query = filter_query(filters, query, Books)
    result = []
    for book in query.all():
        result.append(dict(
            id=book.id,
            title=book.title,
            author=book.author,
            supplier=book.supplier,
            published=str(book.published)
        ))
    return flask.jsonify(result), 200
Here is everything together, including the error handling for validation errors:
import operator
import flask
from flask_sqlalchemy import SQLAlchemy
app = flask.Flask(__name__)
app.config["SQLALCHEMY_DATABASE_URI"] = "sqlite:///:memory:"  # in-memory SQLite database
db = SQLAlchemy(app)
class Books(db.Model):
    __tablename__ = "book"
    id = db.Column(db.Integer, primary_key=True, autoincrement=True)
    title = db.Column(db.String(255), nullable=False)
    author = db.Column(db.String(255), nullable=False)
    supplier = db.Column(db.String(255))
    published = db.Column(db.Date, nullable=False)
with app.app_context():  # Flask-SQLAlchemy 3.x requires an app context here
    db.create_all()
class QueryValidationError(Exception):
    pass
class Filter:
    supported_operators = ("eq", "ne", "lt", "gt", "le", "ge")
    def __init__(self, column, operator, value):
        self.column = column
        self.operator = operator
        self.value = value
        self.validate()
    def validate(self):
        if self.operator not in self.supported_operators:
            raise QueryValidationError(
                f"operator `{self.operator}` is not one of supported "
                f"operators `{self.supported_operators}`"
            )
def create_filters(filters):
    filters_processed = []
    if filters is None:
        # No filters given
        return filters_processed
    elif isinstance(filters, str):
        # if only one filter given
        filter_split = filters.split(",")
        filters_processed.append(
            Filter(*filter_split)
        )
    elif isinstance(filters, list):
        # if more than one filter given
        try:
            filters_processed = [Filter(*_filter.split(",")) for _filter in filters]
        except Exception:
            raise QueryValidationError("Filter query invalid")
    else:
        # Programmer error
        raise TypeError(
            f"filters expected to be `str` or list "
            f"but was of type `{type(filters)}`"
        )
    return filters_processed
def filter_query(filters, query, model):
    allowed_columns = model.__table__.columns.keys()
    for _filter in filters:
        # only allow real columns, so arbitrary model attributes cannot be reached
        if _filter.column not in allowed_columns:
            raise QueryValidationError(f"unknown column `{_filter.column}`")
        # get our operator
        op = getattr(operator, _filter.operator)
        # get the column to filter on
        column = getattr(model, _filter.column)
        # value to filter for
        value = _filter.value
        # build up a query by adding multiple filters (combined with AND)
        query = query.filter(op(column, value))
    return query
@app.errorhandler(QueryValidationError)
def handle_query_validation_error(err):
    return flask.jsonify(dict(
        errors=[dict(
            title="Invalid filter",
            details=str(err),  # Exception has no .msg attribute
            status="400")
        ]
    )), 400
@app.route("/books", methods=["GET"])
def all_books():
    args = flask.request.args
    filters = create_filters(args.getlist("filter"))  # all ?filter= values
    query = Books.query
    query = filter_query(filters, query, Books)
    result = []
    for book in query.all():
        result.append(dict(
            id=book.id,
            title=book.title,
            author=book.author,
            supplier=book.supplier,
            published=str(book.published)
        ))
    return flask.jsonify(result), 200
I hope this answers your question, or gives you some ideas on how to tackle your problem.
I would also recommend looking at serialising and marshalling tools like marshmallow-sqlalchemy which will help you simplify turning models into json and back again. It is also helpful for nested object serialisation which can be a pain if you are returning relationships.

How to get a query in SQLAlchemy ORM as well

I have a few form fields on a search page. After performing the search, my page should display a list of possible matching results. If the user typed in only part of a title, ISBN, or author name, the search page should find matches for those as well. Also, if the user fills in only one or a few fields, the page should show all matches.
I don't know how to write the query. If I have one value from request.form and the other values are empty, the whole query returns nothing.
@app.route('/search', methods=("GET", "POST"))
def search_book():
    books = None
    if request.method == "POST":
        isbn = request.form['isbn']
        title = request.form['title']
        author = request.form['author']
        year = request.form['year']
        books = db.query(Books).filter_by(isbn=isbn, title=title, author=author, year=year).all()
    return render_template("search.html", books=books)
.filter_by() with multiple arguments defaults to AND, meaning the user has to enter data that matches all fields.
This also means that if the user leaves fields empty, the query will only return books that have (for example) title == "" (when the title field is empty), even though the user entered a valid year.
This is probably not intended from a user's point of view. One way to fix this is to (1) validate the input data, then (2) add a filter to a list for each field that is not empty, and combine them with or_(*filter_list). The query will then return all rows that match any of the fields specified in the form. To match partial input, wrap the value in % wildcards for ilike.
from sqlalchemy import or_
query = db.query(Books)
filter_list = []
if request.form['isbn']:
    filter_list.append(Books.isbn.ilike(f"%{request.form['isbn']}%"))
if request.form['author']:
    filter_list.append(Books.author.ilike(f"%{request.form['author']}%"))
# etc...
if filter_list:
    query = query.filter(or_(*filter_list))
books = query.all()
I didn't test this code, but *filter_list unpacks the list into separate arguments to or_(), and or_ changes the default AND to OR.
More here:
Using OR in SQLAlchemy
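The same idea, one LIKE condition per non-empty field, ORed together, can be sketched with the standard library's sqlite3 module instead of SQLAlchemy, which makes it easy to try in isolation. The books table, the SEARCHABLE field-to-column whitelist and search_books are assumptions for illustration; column names come only from the whitelist, while user input is passed as bound parameters:

```python
import sqlite3

# Hypothetical whitelist: form field name -> column name.
SEARCHABLE = {"isbn": "isbn", "title": "title", "author": "author", "year": "year"}

def search_books(conn, form):
    """OR together a LIKE condition for every non-empty form field."""
    clauses, params = [], []
    for field, column in SEARCHABLE.items():
        value = (form.get(field) or "").strip()
        if value:  # empty fields are skipped instead of matching ""
            clauses.append(f"{column} LIKE ?")
            params.append(f"%{value}%")  # wildcards give partial matches
    sql = "SELECT isbn, title, author, year FROM books"
    if clauses:  # no filled-in fields: return every row
        sql += " WHERE " + " OR ".join(clauses)
    return conn.execute(sql + " ORDER BY title", params).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (isbn TEXT, title TEXT, author TEXT, year TEXT)")
conn.executemany(
    "INSERT INTO books VALUES (?, ?, ?, ?)",
    [("111", "Learning Flask", "A. Author", "2019"),
     ("222", "Dark Matter", "B. Writer", "2016")],
)
```

SQLite's LIKE is case-insensitive for ASCII letters by default, so it roughly plays the role of ilike here; with SQLAlchemy the equivalent is the or_(*filter_list) query above with ilike(f"%{value}%").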
Use the or_() function. This returns all rows where any of the given columns match:
name = request.form.get('res')
result = Books.query.filter(db.or_(Books.author == name, Books.isbn == name, Books.title == name)).all()
'res' is the name of the search input in the form on your HTML page; as mentioned, the user can enter anything: an ISBN, a title, or an author's name.

Django/PostgreSQL Full Text Search - Different search results when using SearchVector versus SearchVectorField on AWS RDS PostgreSQL

I'm trying to use the Django SearchVectorField to support full text search. However, I'm getting different search results when I use the SearchVectorField on my model vs. instantiating a SearchVector class in my view. The problem is isolated to an AWS RDS PostgreSQL instance. Both perform the same on my laptop.
Let me try to explain it with some code:
# models.py
class Tweet(models.Model):
    def __str__(self):
        return self.tweet_id
    tweet_id = models.CharField(max_length=25, unique=True)
    text = models.CharField(max_length=1000)
    text_search_vector = SearchVectorField(null=True, editable=False)
    class Meta:
        indexes = [GinIndex(fields=['text_search_vector'])]
I've populated all rows with a search vector and have established a trigger on the database to keep the field up to date.
# views.py
query = SearchQuery('chance')
vector = SearchVector('text')
on_the_fly = Tweet.objects.annotate(
    rank=SearchRank(vector, query)
).filter(
    rank__gte=0.001
)
from_field = Tweet.objects.annotate(
    rank=SearchRank(F('text_search_vector'), query)
).filter(
    rank__gte=0.001
)
# len(on_the_fly) == 32
# len(from_field) == 0
The on_the_fly queryset, which uses a SearchVector instance, returns 32 results. The from_field queryset, which uses the SearchVectorField, returns 0 results.
The empty result prompted me to drop into the shell to debug. Here's some output from the command line in my python manage.py shell environment:
>>> qs = Tweet.objects.filter(
... tweet_id__in=[949763170863865857, 961432484620787712]
... ).annotate(
... vector=SearchVector('text')
... )
>>>
>>> for tweet in qs:
... print(f'Doc text: {tweet.text}')
... print(f'From db: {tweet.text_search_vector}')
... print(f'From qs: {tweet.vector}\n')
...
Doc text: #Espngreeny Run your 3rd and long play and compete for a chance on third down.
From db: '3rd':4 'chanc':12 'compet':9 'espngreeni':1 'long':6 'play':7 'run':2 'third':14
From qs: '3rd':4 'a':11 'and':5,8 'chance':12 'compete':9 'down':15 'espngreeny':1 'for':10 'long':6 'on':13 'play':7 'run':2 'third':14 'your':3
Doc text: No chance. It was me complaining about Girl Scout cookies. <url-removed-for-stack-overflow>
From db: '/aggcqwddbh':13 'chanc':2 'complain':6 'cooki':10 'girl':8 'scout':9 't.co':12 't.co/aggcqwddbh':11
From qs: '/aggcqwddbh':13 'about':7 'chance':2 'complaining':6 'cookies':10 'girl':8 'it':3 'me':5 'no':1 'scout':9 't.co':12 't.co/aggcqwddbh':11 'was':4
You can see that the search vector looks very different when comparing the value from the database to the value that's generated via Django.
Does anyone have any ideas as to why this would happen? Thanks!
SearchQuery translates the terms the user provides into a search query object that the database compares to a search vector. By default, all the words the user provides are passed through the stemming algorithms, and then it looks for matches for all of the resulting terms.
There are two issues to solve. First, give the stemming algorithm information about the language:
query = SearchQuery('chance', config="english")
Second, replace this line:
rank=SearchRank(F('text_search_vector'), query)
with:
rank=SearchRank('text_search_vector', query)
As for the words missing from text_search_vector: the English configuration stems words (chance becomes chanc) and also removes common words known as stop words (a, and, for, ...), which is standard behaviour. The "From qs" vector, by contrast, is neither stemmed nor stripped of stop words, which suggests that the default text search configuration on the RDS instance differs from the one on your laptop.

Django-tables2 - can't I use [A('argument')] inside the "text" parameter?

I'm trying to make this table with a clickable field which changes the boolean for the entry to its opposite value. It works, but I want alternative text, as "False" or "True" does not look nice, and the users are mainly Norwegian.
def bool_to_norwegian(boolean):
    if boolean:
        return "Ja"
    else:
        return "Nei"
class OrderTable(tables.Table):
    id = tables.LinkColumn('admin_detail', args=[A('id')])
    name = tables.Column()
    address = tables.Column()
    order = tables.Column()
    order_placed_at = tables.DateTimeColumn()
    order_delivery_at = tables.DateColumn()
    price = tables.Column()
    comment = tables.Column()
    sent = tables.LinkColumn('status_sent', args=[A('id')])
    paid = tables.LinkColumn('status_paid', args=[A('id')], text=[A('paid')])
    class Meta:
        attrs = {'class': 'order-table'}
If you look at the "paid" entry, which I am testing right now: why can't I access the data with the same accessor as I use in args? If I change the args to args=[A('paid')] and look at the link, it does indeed have the correct data on it. The model field names are the same as the ones in this table, and "paid" and "sent" are BooleanFields.
This is kind of what I ultimately want:
text=bool_to_norwegian([A('paid')])
Here is what I send to the table:
orders = Order.objects.order_by("-order_delivery_at")
orders = orders.values()
table = OrderTable(orders)
RequestConfig(request).configure(table)
The text argument expects a callable that accepts a record, and returns a text value. You are passing it a list (which it will just ignore), and your function is expecting a boolean instead of a record. There is also no need for using accessors here.
Something like this should work:
def bool_to_norwegian(record):
    if record.paid:
        return "Ja"
    else:
        return "Nei"
Then in your column (keeping args so the URL still gets the id):
paid = tables.LinkColumn('status_paid', args=[A('id')], text=bool_to_norwegian)
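(Note: it is not clear from your question where the data is coming from. Is paid a boolean? Since you pass orders.values(), each record is a dict, so you would need record["paid"] instead of record.paid. You may need to adjust this to fit.)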
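Because the question passes orders.values(), the records are dicts, so a text callable that works for both dicts and model instances may be handier. A minimal sketch; the field parameter and sent_text are hypothetical additions, and django-tables2 still calls the callable with just the record:

```python
from functools import partial

def bool_to_norwegian(record, field="paid"):
    """Return "Ja"/"Nei" for a record that may be a dict or a model instance."""
    value = record[field] if isinstance(record, dict) else getattr(record, field)
    return "Ja" if value else "Nei"

# Same helper bound to the "sent" column; still takes only the record.
sent_text = partial(bool_to_norwegian, field="sent")
```

In the table this would be used as paid = tables.LinkColumn('status_paid', args=[A('id')], text=bool_to_norwegian) and sent = tables.LinkColumn('status_sent', args=[A('id')], text=sent_text).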
As an aside, the way you are passing args to your columns is weird (it seems the documentation also recommends this, but I don't understand why - it's very confusing). A more standard approach would be:
id = tables.LinkColumn('admin_detail', A('id'))
or using named arguments:
id = tables.LinkColumn('admin_detail', accessor=A('id'))
