How to make Wagtail search case-insensitive


I use Wagtail search:
query = self.request.query_params
questions = models.Questions.objects.filter(
    answer__isnull=False,
    owner__isnull=False).exclude(answer__exact='')
s = get_search_backend()
results = s.search(query[u'question'], questions)
And this is how I set up the indexing of my Questions model:
search_fields = [
    index.SearchField('question', partial_match=True, boost=2),
    index.FilterField('answer'),
    index.FilterField('owner_id'),
]
But it is case-sensitive, so the queries how and How give different results.
I need my search to behave this way:
When I type either how or How, it should return:
how to...
How to...
The way how...
THE WAY HoW...
In other words, it should find all mentions of how in every possible case.
How do I make it work?
P.S.: I'm using the default backend, and I'm free to change it if needed.

With Wagtail's Elasticsearch backend, fields indexed with partial_match=True are tokenized in lowercase, so to get case-insensitive search all you need to do is lowercase the query string:
results = s.search(query[u'question'].lower(), questions)
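To see why lowercasing the query is enough, here is a toy inverted index in plain Python. It is not Wagtail's API and it leaves out the partial matching that partial_match=True adds; it only mimics the lowercase tokenization described above by storing every token in lowercase and lowercasing the query before the lookup:

```python
import re
from collections import defaultdict


def tokenize(text):
    # Split on runs of word characters and lowercase every token,
    # mimicking a lowercasing analyzer applied at index time.
    return [token.lower() for token in re.findall(r"\w+", text)]


def build_index(docs):
    # Map each lowercased token to the ids of the documents containing it.
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, docs, query):
    # Lowercase the query so it can match the lowercased index tokens.
    ids = index.get(query.lower(), set())
    return [docs[i] for i in sorted(ids)]


docs = ["how to...", "How to...", "The way how...", "THE WAY HoW..."]
index = build_index(docs)
print(search(index, docs, "How"))
```

Without the .lower() in search(), the query "How" would find nothing, because only lowercase tokens were ever stored.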

Related

filtering with a for or if loop

So I'm currently working on a project where I'm trying to improve the code.
Currently, I have this as my views.py:
def home1(request):
    if request.user.is_authenticated():
        location = request.GET.get('location', request.user.profile.location)
        users = User.objects.filter(profile__location=location)
        print users
        matchesper = Match.objects.get_matches_with_percent(request.user)
        print matchesper
        matches = [match for match in matchesper if match[0] in users][:20]
Currently, users gives me a list of users who have the same location as request.user, and matchesper gives me a match percentage with every user. matches then combines these two lists into a list of the users who share request.user's location, together with their match percentage.
This works, but won't it become very slow as the number of users on the website grows? I could add [:50] to the end of matchesper, for example, but then you would never match with older users who share request.user's location.
My question is: is there a way to compute matches with matchesper only for the users who have the same location? Could I use an if statement or a for loop before matchesper?
I didn't write this code, but I do understand it; however, when I try to improve it I get very stuck. I hope my explanation and question make sense.
Thank you in advance for any help!

(I'm assuming you're using the matchmaker project.)
In Django, you can chain QuerySet methods. You'll notice that the models.py file you're working from defines both a MatchQuerySet and a MatchManager. You might also notice that get_matches_with_percent is only defined on the Manager, not the QuerySet.
This is a problem, but not an insurmountable one. One way around it is to change which QuerySet our manager method actually works on. We can do this by creating a new method that is basically a copy of get_matches_with_percent, but with some additional filtering.
from django.db import models
from django.db.models import Q
class MatchManager(models.Manager):
    [...]
    def get_matches_with_percent_by_location(self, user, location=None):
        if location is None:
            location = user.profile.location
        # Keep only matches whose *other* user is in `location`. Filtering on
        # either side alone would let every match through, because by default
        # the requesting user's own profile already has that location.
        other_is_b = Q(user_a=user, user_b__profile__location=location)
        other_is_a = Q(user_b=user, user_a__profile__location=location)
        qs = self.get_queryset().filter(other_is_b | other_is_a).matches(user).order_by('-match_decimal')
        matches = []
        for match in qs:
            # Return the partner and the match percentage, as get_matches_with_percent does.
            if match.user_a == user:
                items_wanted = [match.user_b, match.get_percent]
                matches.append(items_wanted)
            elif match.user_b == user:
                items_wanted = [match.user_a, match.get_percent]
                matches.append(items_wanted)
        return matches
Note the repeated chaining in the line that builds qs (filter, then matches, then order_by)! That's the magic.
Other notes:
Q objects are a way of doing complex queries, like multiple "OR" conditions.
An even better solution would factor out the elements that are common to get_matches_with_percent and get_matches_with_percent_by_location to keep the code "DRY", but this is good enough for now ;)
Be mindful of the fact that get_matches_with_percent returns a vanilla list instead of a Django QuerySet; it's a "terminal" method. Thus, you can't use any other QuerySet methods (like filter) after invoking get_matches_with_percent.
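Here is the filter-by-partner-location idea in plain Python, with hypothetical User and Match dataclasses standing in for the Django models (in the real project the database does this work through the QuerySet, which is the point of the manager method). The partner's location is checked before a pair is kept, so users from other locations never reach the result list and no [:50] cutoff is needed:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    # Hypothetical stand-in for a Django User and its profile.location.
    name: str
    location: str


@dataclass(frozen=True)
class Match:
    # Hypothetical stand-in for the matchmaker project's Match model.
    user_a: User
    user_b: User
    percent: float


def matches_by_location(all_matches, user, location=None):
    """Return [other_user, percent] pairs for `user`, limited to partners in `location`."""
    if location is None:
        location = user.location
    pairs = []
    for m in all_matches:
        # Work out which side is the requesting user; skip unrelated matches.
        if m.user_a == user:
            other = m.user_b
        elif m.user_b == user:
            other = m.user_a
        else:
            continue
        # Filter on the partner's location before keeping the pair.
        if other.location == location:
            pairs.append([other, m.percent])
    # Best match first, like order_by('-match_decimal').
    pairs.sort(key=lambda pair: pair[1], reverse=True)
    return pairs


alice = User("alice", "Paris")
bob = User("bob", "Paris")
carol = User("carol", "London")
ms = [Match(alice, bob, 0.8), Match(carol, alice, 0.9), Match(bob, carol, 0.5)]
print(matches_by_location(ms, alice))
```

Passing an explicit location (for example the ?location= value from the view) works the same way: matches_by_location(ms, alice, "London") keeps only carol.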

Django Rest Framework: How do I order/sort a search/filter query?

I'm building out an API with Django Rest Framework, and I'd like to have a feature that allows users to search by a query. Currently, http://127.0.0.1:8000/api/v1/species/?name=human yields:
{
    count: 3,
    next: null,
    previous: null,
    results: [
        {
            id: 1,
            name: "Humanoid",
            characters: [
                {
                    id: 46,
                    name: "Doctor Princess"
                }
            ]
        },
        {
            id: 3,
            name: "Inhuman (overtime)",
            characters: [
            ]
        },
        {
            id: 4,
            name: "Human",
            characters: [
                {
                    id: 47,
                    name: "Abraham Lincoln"
                }
            ]
        }
    ]
}
It's pretty close to what I want, but not quite there. I'd like the first object inside results to be the one with id 4, since its name field is the most relevant to the search query (?name=human). (I don't really care how the rest are ordered.) It seems that the results are currently sorted by ascending id. Does anyone know a good way to handle this? Thanks!
Here is my api folder's views.py:
class SpeciesFilter(django_filters.FilterSet):
    name = django_filters.CharFilter(name="name", lookup_type=("icontains"))
    class Meta:
        model = Species
        fields = ['name']
class SpeciesViewSet(viewsets.ModelViewSet):
    queryset = Species.objects.all()
    serializer_class = SpeciesSerializer
    filter_backends = (filters.DjangoFilterBackend,)
    # search_fields = ('name',)
    filter_class = SpeciesFilter

You want to sort the search results by relevance; in your case name: "Human" should be the best result because it exactly matches the query word.
If you only need to solve this one problem, you could use a raw SQL query, something like:
# NOT TESTED; the SQL expression may vary depending on which database you are using.
# The table name is assumed: Django's default is <app_label>_species.
queryset = Species.objects.raw(
    "select * from species where lower(name) like %s order by char_length(name) asc limit 20",
    ['%human%'])
This query finds all records whose name contains "human" (ignoring case) and sorts them by the length of the name field, shortest first, so name: "Human" will be the first item to show up.
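A runnable check of that ordering, using Python's built-in sqlite3 with an in-memory table holding the three names from the question. SQLite spells char_length() as length(), and the pattern is passed as a bound parameter; the table layout is an assumption for the demo:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE species (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO species (id, name) VALUES (?, ?)",
    [(1, "Humanoid"), (3, "Inhuman (overtime)"), (4, "Human")],
)

# Case-insensitive substring match, shortest name first.
rows = conn.execute(
    "SELECT id, name FROM species WHERE lower(name) LIKE ? "
    "ORDER BY length(name) ASC LIMIT 20",
    ("%human%",),
).fetchall()
print(rows)
```

Sorting descending instead would put "Inhuman (overtime)" first, which is the opposite of what the question asks for.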
FYI, a database query is usually not the best approach for this kind of thing; take a look at the django-haystack project, which helps you build a search engine on top of a Django project quickly and simply.

I agree with @piglei on django-haystack, but I think sorting by field value length is a terrible idea, and there is also no need to resort to writing SQL for that. A better way would be something like:
Species.objects.all().extra(select={'relevance': 'char_length(name)'}, order_by=['relevance'])  # PostgreSQL
Still terrible, even as a quick fix.
If you really don't want to setup django-haystack, a slightly less terrible approach would be to sort your results using python:
from difflib import SequenceMatcher
needle = 'human'  # the search term, e.g. the ?name= query parameter
species = Species.objects.all()
species = sorted(species,
                 key=lambda s: SequenceMatcher(None, needle.lower(), s.name.lower()).quick_ratio(),
                 reverse=True)
I didn't test this code, so let me know if it doesn't work and also if you need help integrating it in DRF.
The reason this is still terrible is that difflib's matching algorithm differs from the one used to search the database, so the database may never return results that difflib would have ranked as more relevant than some of the ones __icontains finds. More on that here: Is there a way to filter a django queryset based on string similarity (a la python difflib)?
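A self-contained check of the difflib ranking, with plain strings standing in for Species objects (the three names are the ones from the question's response):

```python
from difflib import SequenceMatcher


def rank_by_similarity(needle, names):
    # Score each name with quick_ratio() (a fast upper bound on ratio())
    # against the lowercased needle, and sort best-first.
    return sorted(
        names,
        key=lambda name: SequenceMatcher(None, needle.lower(), name.lower()).quick_ratio(),
        reverse=True,
    )


names = ["Humanoid", "Inhuman (overtime)", "Human"]
print(rank_by_similarity("human", names))
```

quick_ratio() only compares character counts, so "Human" scores 1.0 while longer names that contain the same letters score lower.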
Edit:
While trying to come up with an example of why sorting by field value length is a terrible idea, I've actually managed to convince myself that it may be the less terrible idea when used with __icontains. I'm gonna leave the answer like this though as it might be useful or interesting to someone. Example:
needle = 'apple'
haystack = ['apple', 'apples', 'apple computers', 'apples are nice'] # Sorted by value length

django-haystack - filter based on query along with query for search term

I am able to search using ?q='search term'. But my requirement is that, among the search results, I should be able to order them by price, filter by another field, etc.
Will provide more information if necessary.

You should look into faceting, which lets you search on other fields of a model. Basically, it comes down to defining the facets and then letting the user search on them, in addition to the keyword text search you're doing now.
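Conceptually, a facet is a count of the search results grouped by a field's value, plus a way to narrow the results to one of those values. A framework-free sketch with hypothetical data and helper names; in Haystack the search backend computes the counts for you (SearchQuerySet has facet(), facet_counts() and narrow() for this):

```python
from collections import Counter


def facet_counts(results, field):
    # How many results carry each value of `field` (what a facet displays).
    return Counter(r[field] for r in results)


def narrow(results, field, value):
    # Keep only the results whose `field` equals the chosen facet value.
    return [r for r in results if r[field] == value]


results = [
    {"title": "Red lamp", "category": "lighting"},
    {"title": "Desk lamp", "category": "lighting"},
    {"title": "Lamp table", "category": "furniture"},
]
print(facet_counts(results, "category"))
print([r["title"] for r in narrow(results, "category", "lighting")])
```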

Assuming you are using a SearchView, override the get_results method to do the extra processing you need on the SearchQuerySet, like:
from haystack.views import SearchView
class MySearchView(SearchView):
    #...
    def get_results(self):
        results = super(MySearchView, self).get_results()
        order = self.request.GET.get('order')
        if order:
            results = results.order_by(order)
        return results
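Passing ?order= straight to order_by lets a client sort on any indexed field. A framework-free sketch of the same post-processing step with a whitelist; the whitelist, the field names and the dict-based results are hypothetical additions, not part of the answer above:

```python
ALLOWED_ORDERINGS = {"price", "-price", "title", "-title"}  # hypothetical whitelist


def apply_ordering(results, order):
    """Sort result dicts by ?order=<field> or ?order=-<field> (descending)."""
    if not order or order not in ALLOWED_ORDERINGS:
        return results  # ignore missing or unexpected values
    field = order.lstrip("-")
    return sorted(results, key=lambda r: r[field], reverse=order.startswith("-"))


results = [
    {"title": "Lamp", "price": 30},
    {"title": "Desk", "price": 120},
    {"title": "Chair", "price": 45},
]
print([r["title"] for r in apply_ordering(results, "-price")])
```

In the SearchView override, the equivalent would be checking order against such a set before calling results.order_by(order).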

Best practices method of implementing a django OR query from an iterable?

I am implementing a one-off data importer where I need to search for existing slugs. The slugs are in an array. What is the accepted best-practice way of converting an array into an OR query?
I came up with the following, which works, but feels like way too much code to accomplish something this simple.
# slug might be an array or just a string
# ex:
slug = [ "snakes", "snake-s" ] # in the real world this is generated from directory structure on disk
# build the query
query = MyModel.objects
if hasattr(slug, "__iter__"):
    q = Q()
    for s in slug:
        q = q.__or__(Q(slug=s))
    query = query.filter(q)
else:
    query = query.filter(slug=slug)

import operator
from functools import reduce  # a builtin on Python 2, but lives in functools on Python 3
from django.db.models import Q
slug = ["snakes", "snake-s"]  # in the real world this is generated from directory structure on disk
# build the query
query = MyModel.objects
if isinstance(slug, (list, tuple)):  # hasattr(slug, "__iter__") is also true for str on Python 3
    q_list = []
    for s in slug:
        q_list.append(Q(slug=s))
    query = query.filter(reduce(operator.or_, q_list))
else:
    query = query.filter(slug=slug)
q_list = []: create a list of Q clauses.
reduce(operator.or_, q_list): join the list together with OR operators.
Read this: http://www.michelepasin.org/techblog/2010/07/20/the-power-of-djangos-q-objects/
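The reduce-by-operator technique does not depend on Django. The sketch below uses a small hypothetical Cond class that, like Q, supports |, so you can watch reduce(operator.or_, ...) fold a list of conditions into a single OR; Cond and slug_is are illustrative names only:

```python
import operator
from functools import reduce


class Cond:
    """Hypothetical stand-in for Django's Q: wraps a row -> bool test and supports |."""

    def __init__(self, test):
        self.test = test

    def __or__(self, other):
        # OR-combine two conditions, the way Q(...) | Q(...) does.
        return Cond(lambda row: self.test(row) or other.test(row))


def slug_is(value):
    # Hypothetical equivalent of Q(slug=value).
    return Cond(lambda row: row["slug"] == value)


slugs = ["snakes", "snake-s"]
# Fold the list with |: slug_is("snakes") | slug_is("snake-s").
combined = reduce(operator.or_, [slug_is(s) for s in slugs])

rows = [{"slug": "snakes"}, {"slug": "lizards"}, {"slug": "snake-s"}]
matched = [row["slug"] for row in rows if combined.test(row)]
print(matched)
```

Note that reduce() raises TypeError on an empty list when no initial value is given, so an empty slug list needs its own branch.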
@MostafaR - sure, we could crush my entire code block down to one line if we wanted (see below). It's not very readable anymore at that level, though. Saying code isn't "Pythonic" just because it hasn't been reduced and obfuscated is silly. Readable code is king IMHO. It's also important to keep in mind that the purpose of my answer was to show the reduce-by-an-operator technique. The rest of my answer was fluff to show that technique in the context of the original question.
result = MyModel.objects.filter(reduce(operator.or_, [Q(slug=s) for s in slug])) if isinstance(slug, (list, tuple)) else MyModel.objects.filter(slug=slug)

result = MyModel.objects.filter(slug__in=slug).all() if isinstance(slug, list) else MyModel.objects.filter(slug=slug).all()

I believe in this case you should use Django's __in field lookup, like this:
slugs = [ "snakes", "snake-s" ]
objects = MyModel.objects.filter(slug__in=slugs)
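For intuition about why __in is the simpler choice: slug__in=slugs asks the database for a single WHERE slug IN (...) clause, while the reduce(operator.or_, ...) version asks for slug = ... OR slug = .... The sqlite3 sketch below writes those two SQL shapes by hand (the table is hypothetical, and this is not the exact SQL Django generates) to show that they select the same rows:

```python
import sqlite3

slugs = ["snakes", "snake-s"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mymodel (id INTEGER PRIMARY KEY, slug TEXT)")
conn.executemany(
    "INSERT INTO mymodel (slug) VALUES (?)",
    [("snakes",), ("lizards",), ("snake-s",)],
)

# The shape of slug__in=slugs: one IN clause with a placeholder per slug.
in_sql = "SELECT slug FROM mymodel WHERE slug IN ({}) ORDER BY id".format(
    ", ".join("?" for _ in slugs)
)
in_rows = conn.execute(in_sql, slugs).fetchall()

# The shape of the OR-ed Q objects: one equality test per slug.
or_sql = "SELECT slug FROM mymodel WHERE {} ORDER BY id".format(
    " OR ".join("slug = ?" for _ in slugs)
)
or_rows = conn.execute(or_sql, slugs).fetchall()

print(in_rows)
print(or_rows)
```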

The code that you posted will not work in many ways (though I'm not sure whether it's meant as pseudocode), but from what I understand, this might help:
MyModel.objects.filter(slug__in=slug)
should do the job.

How do you improve search?

I just got Haystack with Solr installed and created a custom view:
from haystack.query import SearchQuerySet
def post_search(request, template_name='search/search.html'):
    getdata = request.GET.copy()
    try:
        results = SearchQuerySet().filter(title=getdata['search'])[:10]
    except:
        results = None
    return render_to_response(template_name, locals(), context_instance=RequestContext(request))
This view only returns exact matches on the title field. How do I get at least something like SQL's LIKE '%string%' (at least I think that's what it is), so that if I search for 'i', 'IN', or 'index' I get the result 'index'?
Also, are most search improvements made in Haystack or in Solr?
What other good practices/search improvements do you suggest (please give implementation too)?
Thanks a bunch in advance!

When you use Haystack/Solr, the idea is that you have to tell Haystack/Solr what you want indexed for a particular object. Say you wanted to build a find-as-you-type index for a basic dictionary. If you wanted it to match only prefixes, then for the word Boston you'd need to tell it to index B, Bo, Bos, etc.; you'd then issue a query for whatever the current search expression is and return the results. If you wanted to search any part of the word, you'd need to build suffix trees, and Solr would take care of indexing them.
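A plain-Python sketch of the two indexing strategies described above, with hypothetical helper names (for real use, Haystack provides EdgeNgramField for prefix matching and NgramField for matching any part of a word). Lowercasing at both index and query time is what lets 'i', 'IN' and 'index' all find 'index'. For the any-part case the sketch simply stores every substring (n-grams) rather than building a suffix tree:

```python
from collections import defaultdict


def build_prefix_index(words):
    # Index every prefix of every word (b, bo, bos, ...), lowercased so
    # lookups are case-insensitive.
    index = defaultdict(set)
    for word in words:
        lowered = word.lower()
        for end in range(1, len(lowered) + 1):
            index[lowered[:end]].add(word)
    return index


def build_substring_index(words):
    # Index every substring so a query can match any part of a word;
    # this grows much faster than the prefix index.
    index = defaultdict(set)
    for word in words:
        lowered = word.lower()
        for start in range(len(lowered)):
            for end in range(start + 1, len(lowered) + 1):
                index[lowered[start:end]].add(word)
    return index


def lookup(index, query):
    # Lowercase the query to match the lowercased index keys.
    return sorted(index.get(query.lower(), set()))


words = ["Boston", "index", "Indiana", "window"]
prefixes = build_prefix_index(words)
substrings = build_substring_index(words)
print(lookup(prefixes, "IN"))
print(lookup(substrings, "ind"))
```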
Look at templates in Haystack for more info. http://docs.haystacksearch.org/dev/best_practices.html#well-constructed-templates
The question you're asking is fairly generic, it might help to give specifics about what people are searching for. Then it'll be easier to suggest how to index the data. Good luck.
