I am trying to write a small search engine with django-haystack and Whoosh.
I adapted their tutorial, created my index from a JSON file, and can query it successfully with QueryParser. Now I'm trying to use their view.
When I try to access the search URL at http://127.0.0.1:8000/my_search/ I get the error:
The index 'PaperIndex' must have one (and only one) SearchField with document=True.
If I remove search_indexes.py I can access the search page, but then of course it does not work, as it has no indexes to search.
From debugging, it seems it does not pick up any fields, but it does see the class.
I tried several things, but nothing worked.
My search_indexes.py:
from haystack import indexes
from my_search.models import Paper
class PaperIndex(indexes.SearchIndex, indexes.Indexable):
    """
    This is a search index
    """
    title = indexes.CharField(model_attr='title'),
    abstract = indexes.CharField(document=True,
                                 use_template=False, model_attr='abstract'),
    def get_model(self):
        return Paper
    def index_queryset(self, using=None):
        """Used when the entire index for model is updated."""
        return self.get_model().objects  # .filter(pub_date__lte=datetime.datetime.now())
my models.py:
from django.db import models
class Paper(models.Model):
    paper_url = models.CharField(max_length=200),
    title = models.CharField(max_length=200),
    abstract = models.TextField()
    authors = models.CharField(max_length=200),
    date = models.DateTimeField(max_length=200),
    def __unicode__(self):
        return self.title
thanks!
Haystack uses an extra field for the document=True field.
The text = indexes.CharField(document=True) field is not on the model; it holds the searchable text that Haystack indexes for each object.
You can populate this field by defining a prepare_text() method on the index. Alternatively, use a template (use_template=True), which is simply a .txt file written in Django template syntax that lists the model properties to include.
class PaperIndex(indexes.SearchIndex, indexes.Indexable):
    """
    This is a search index
    """
    text = indexes.CharField(document=True)
    # No trailing commas: a trailing comma would turn the field into a tuple.
    title = indexes.CharField(model_attr='title')
    abstract = indexes.CharField(model_attr='abstract')
    def get_model(self):
        return Paper
    def index_queryset(self, using=None):
        """Used when the entire index for model is updated."""
        return self.get_model().objects.all()  # .filter(pub_date__lte=datetime.datetime.now())
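One thing worth checking in the question's code is the trailing comma after each field definition. In Python, a trailing comma after an expression creates a one-element tuple, so the attribute is no longer a field object. That would be consistent with the index class being found while none of its fields are picked up. The sketch below shows the effect with plain Python only; FakeField and declared_fields are hypothetical stand-ins, not Haystack's real classes or its actual field-collection logic.

```python
class FakeField:
    """Hypothetical stand-in for a search field class (not the Haystack API)."""

    def __init__(self, document=False):
        self.document = document


class WithCommas:
    title = FakeField(),             # trailing comma: this is a 1-tuple
    text = FakeField(document=True)  # no comma: this is a FakeField


def declared_fields(cls):
    """Collect the class attributes that really are FakeField instances."""
    return {
        name: value
        for name, value in vars(cls).items()
        if isinstance(value, FakeField)
    }


fields = declared_fields(WithCommas)
print(type(WithCommas.title).__name__)  # the attribute with the comma is a tuple
print(sorted(fields))                   # only the attribute without the comma is found
```

If the same pattern applies to the real index, removing the commas from the field lines (and from the model fields in models.py) is the first thing to try.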
Related
I have a Django model Story which I can index successfully using templates. However, there is another model, Reviews, which has a static method that takes a Story object and returns its rating as an integer. How can I index Story on ratings as well?
{{ object.story_name }}
{{Reviews.ratings(object)}}
I tried to call this method in template story_text.txt, but that results in an error.
django.template.exceptions.TemplateSyntaxError: Could not parse the remainder: '(object)'....
Edit:
I tried using the line below in the template; it doesn't give any error while building the index. But how can I now refer to this field while searching with SearchQuerySet?
Reviews.average_start_rating( {{object}} )
I am confused. I don't think you can use syntax like {{ Reviews.rating object }} with the Django template engine. If that is possible, it is news to me.
Why don't you pass what you want to show in the template through the context in the first place?
{{ object }} can be rendered because the context contains object. For example, if you use UpdateView (a class-based view), it puts object in the context automatically.
class Example(UpdateView):
    model = yourClass
    form_class = yourFormClass
    template_name = yourhtml
    success_url = ...  # URL to redirect to after success
You can use {{ object }} in yourhtml.html because of UpdateView. You give the pk in the URL conf, like (?P<pk>[0-9]+).
You can do the same without UpdateView:
class anotherExample(View):
    def get(self, request, *args, **kwargs):
        # The view must return the rendered response.
        return render(request, 'yourhtml.html', {"object": Class.objects.get(id=self.kwargs['pk'])})
In a form view, you can use:
def get_context_data(self, **kwargs):
    context = super().get_context_data(**kwargs)
    context['object'] = Class.objects.get(id= ... )
    return context
My idea is to pass the story object and the review object, which has an FK to the story object, together in the context.
I was able to get it working using Haystack's advanced data preparation (see "Advanced Data Preparation" in the docs).
By adding an additional field, you can define a prepare method for it. The only issue is that I can order the results by this field but can't search on it.
class StoryIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    ratings = indexes.FloatField()
    def prepare_ratings(self, obj):
        return Reviews.ratings(obj)
    def get_model(self):
        return Story
Instead of using a template for the text field, here you can use the prepare or prepare_FOO methods:
class StoryIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True)
    # text = indexes.CharField(document=True, use_template=True)
    # ratings = indexes.FloatField()
    def prepare_text(self, obj):
        return "\n".join(f"{col}" for col in [obj.story_name, Reviews.ratings(obj)])
    def get_model(self):
        return Story
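To see what ends up in the document field with this approach, here is a minimal plain-Python sketch of the same join. SimpleNamespace stands in for a Story instance, the rating is passed in directly, and build_document is a hypothetical helper name; none of these are part of Haystack.

```python
from types import SimpleNamespace


def build_document(story, rating):
    """Join the searchable values into one newline-separated string."""
    return "\n".join(str(value) for value in [story.story_name, rating])


# Stand-in for a Story instance (assumption for illustration only).
story = SimpleNamespace(story_name="The Lighthouse")
document = build_document(story, 4)
print(repr(document))
```

Because the rating is converted to a string, it becomes ordinary text inside the document rather than a separate numeric field.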
I want to use Haystack, but all my models have "body" as their text-field name. It is the same on all models, though.
Now I get this error:
All 'SearchIndex' classes must use the same 'text' fieldname for the 'document=True' field. Offending index is '<qna.search_indexes.QuestionIndex object at 0x2435328>'.
That's the index file:
import datetime
from haystack import indexes
from qna.models import Question
class QuestionIndex(indexes.SearchIndex, indexes.Indexable):
    subject = indexes.CharField(document=False, use_template=False)
    body = indexes.CharField(document=True, use_template=True, model_attr='user')
    pub_date = indexes.DateTimeField(model_attr='pub_date')
    def get_model(self):
        return Question
    def index_queryset(self, using=None):
        """Used when the entire index for model is updated."""
        return self.get_model().objects.filter(pub_date__lte=datetime.datetime.now())
It is the ONLY one! What is it conflicting with? As far as I understand, the field name doesn't have to be "text"; it only has to be the same on every index. But it's the only index! Do I have to change some config? What might be the cause of this?
I saw your error in the haystack source. It looks like there is a setting for the name of this field (https://github.com/toastdriven/django-haystack/blob/master/haystack/utils/loading.py#L154):
self.document_field = getattr(settings, 'HAYSTACK_DOCUMENT_FIELD', 'text')
Later in that file (https://github.com/toastdriven/django-haystack/blob/master/haystack/utils/loading.py#L222) it checks that the name of your index's document field matches this setting, and raises the error you saw if they don't agree:
if field_object.index_fieldname != self.document_field:
    raise SearchFieldError("All 'SearchIndex' classes must use the same '%s' fieldname for the 'document=True' field. Offending index is '%s'." % (self.document_field, index))
If you set HAYSTACK_DOCUMENT_FIELD to "body" in your settings, it looks like that should do it.
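A minimal sketch of how that lookup behaves, using a plain namespace object in place of Django's settings module (an assumption made so the example runs on its own). The helper name document_field_name is hypothetical; only the getattr call mirrors the line quoted above.

```python
from types import SimpleNamespace


def document_field_name(settings):
    """Same lookup pattern as the quoted line: fall back to 'text'."""
    return getattr(settings, "HAYSTACK_DOCUMENT_FIELD", "text")


# Stand-ins for a settings module without and with the override.
default_settings = SimpleNamespace()
custom_settings = SimpleNamespace(HAYSTACK_DOCUMENT_FIELD="body")

print(document_field_name(default_settings))  # falls back to the default name
print(document_field_name(custom_settings))   # uses the configured name
```

In a real project this would correspond to adding HAYSTACK_DOCUMENT_FIELD = 'body' to settings.py, so that an index whose document=True field is called body passes the check.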
I am using Django Haystack for search.
I only want to target the title field of my model when searching for results.
At present however, it returns results if the search term is in any of the fields in my model.
For example: searching xyz gives results where xyz is in the bio field.
This should not happen. I only want to return results where xyz is in the title field, ignoring every field other than Artist.title when searching.
artists/models.py:
class Artist(models.Model):
    title = models.CharField(max_length=255)
    slug = models.SlugField(max_length=100)
    strapline = models.CharField(max_length=255)
    image = models.ImageField(upload_to=get_file_path, storage=s3, max_length=500)
    bio = models.TextField()
artists/search_indexes.py:
from haystack import indexes
from app.artists.models import Artist
class ArtistIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True, model_attr='title')
    def get_model(self):
        return Artist
I guess thinking of it like a SQL query:
SELECT * FROM artists WHERE title LIKE '%{search_term}%'
UPDATE
Following the suggestion to remove use_template=True, my search_indexes.py now looks like:
from haystack import indexes
from app.artists.models import Artist
class ArtistIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, model_attr='title')
    title = indexes.CharField(model_attr='title')
    def get_model(self):
        return Artist
But I am having the same problem. (I have tried python manage.py rebuild_index.)
These are my Haystack settings, if that makes any difference:
HAYSTACK_CONNECTIONS = {
'default': {
'ENGINE': 'haystack.backends.simple_backend.SimpleEngine',
},
}
model_attr and use_template don't work together. In this case, as you're querying for a single model attribute there's no need to use a template. Templates in search indexes are purely meant to group data.
Thus, you end up with:
class ArtistIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, model_attr='title')
    def get_model(self):
        return Artist
If you don't have any other use case for your index (i.e. searches that should match terms elsewhere), simply don't use a template at all: set the use_template param to False and ditch your search template, and you'll be done. FWIW, note that when passing True for use_template the model_attr param is ignored. Also, you may not need a full-text search engine in that case; you could possibly just use Django's standard QuerySet lookup API, i.e. Artist.objects.filter(title__icontains=searchterm).
Otherwise, if you still need a 'full' document index for other searches and only want to restrict this one search to the title, you can add another indexes.CharField (with document=False, model_attr='title') for the title and search only on this field. How to do so is fully documented in Haystack's SearchQuerySet API doc.
From the Docs
Additionally, we’re providing use_template=True on the text field. This allows us to use a data template (rather than error prone concatenation) to build the document the search engine will use in searching. You’ll need to create a new template inside your template directory called search/indexes/myapp/note_text.txt and place the following inside:
{{ object.title }}
{{ object.user.get_full_name }}
{{ object.body }}
So I guess in this template you can declare which fields should be indexed and searched.
Another way is to override the prepare(self, object) method of the index class and explicitly define the fields that need to be indexed and searched.
Or just use model_attr.
Basically, your search_indexes.py file is written incorrectly. It should look like this:
from haystack import indexes
from app.artists.models import Artist
class ArtistIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    title = indexes.CharField(model_attr='title', null=True)
    def get_model(self):
        return Artist
    def index_queryset(self, using=None):
        return self.get_model().objects.all()
Then you have to create a template in your app. The directory structure would be:
templates/search/indexes/artists/artist_text.txt
Add the following to artist_text.txt:
{{ object.title }}
Now run python manage.py rebuild_index.
It will now return results only for the title.
I have extracted PDF/DOCX content with Solr and have succeeded in running some search queries using the following Solr URL:
http://localhost:8983/solr/select?q=Lycee
I would like to run such a query with django-haystack. I found this link, which discusses the issue:
https://github.com/toastdriven/django-haystack/blob/master/docs/rich_content_extraction.rst
But there is no "FileIndex" class in django-haystack (2.0.0-beta). How can I integrate such a search with django-haystack?
The "FileIndex" referenced in the documentation is a hypothetical subclass of haystack.indexes.SearchIndex. Here is an example:
from django.template import Context, loader
from haystack import indexes
from myapp.models import MyFile
class FileIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    title = indexes.CharField(model_attr='title')
    owner = indexes.CharField(model_attr='owner__name')
    def get_model(self):
        return MyFile
    def index_queryset(self, using=None):
        return self.get_model().objects.all()
    def prepare(self, obj):
        data = super(FileIndex, self).prepare(obj)
        # This could also be a regular Python open() call, a StringIO instance
        # or the result of opening a URL. Note that due to a library limitation
        # file_obj must have a .name attribute even if you need to set one
        # manually before calling extract_file_contents:
        file_obj = obj.the_file.open()
        extracted_data = self.backend.extract_file_contents(file_obj)
        # Now we'll finally perform the template processing to render the
        # text field with *all* of our metadata visible for templating:
        t = loader.select_template(('search/indexes/myapp/myfile_text.txt', ))
        data['text'] = t.render(Context({'object': obj,
                                         'extracted': extracted_data}))
        return data
So extracted_data would be replaced with whatever process you came up with to extract the PDF/DOCX content. You would then update your template to include that data.
I'm writing a generic template tag that returns a model's field value dynamically, based on arguments given in the template files. The idea follows one described in the book "Practical Django Projects, 2nd Edition", but the book's version returns a list of objects, whereas I want a single object's field value. I want to get the site settings (title, tagline, etc.) and pass them into the template dynamically, so that I don't have to write the code again for each field.
Here is what I have done so far.
Define an admin.py for the model (not included here because it's not important)
Defined a model:
from django.db import models
from django.contrib.sites.models import Site
class Naming(models.Model):
    title = models.CharField(max_length=250)
    site_id = models.ForeignKey(Site)
    tagline = models.CharField(max_length=250)
    description = models.TextField(blank=True)
    def __unicode__(self):
        return self.title
Defined a template tag file, the commented line is where I get stuck
from django.db.models import get_model
from django.contrib.sites.models import Site
from django import template
def do_site_att(parser, token):
    bits = token.split_contents()
    if len(bits) != 5:
        raise template.TemplateSyntaxError("'get_site_att' tag takes exactly four arguments")
    model_args = bits[1].split('.')
    if len(model_args) != 2:
        raise template.TemplateSyntaxError("First argument to 'get_site_att' must be an 'application name'.'model name' string.")
    model = get_model(*model_args)
    if model is None:
        raise template.TemplateSyntaxError("'get_site_att' tag got an invalid model: %s." % bits[1])
    return ContentNode(model, bits[2], bits[4])
class ContentNode(template.Node):
    def __init__(self, model, field, varname):
        self.model = model
        self.field = field
        self.varname = varname
    def render(self, context):
        current_site = Site.objects.get_current()
        try:
            var = self.model.objects.get(site_id=current_site.id)
            context[self.varname] = var.title  # I get stuck here because it not accepts input like var.field (I have extract the field value above)
        except:
            context[self.varname] = "Value not found"
        return ''
register = template.Library()
register.tag('get_site_att', do_site_att)
The template query in base.html:
{% load general_tags %}
{% get_site_att general.Naming title as title %}
<h1 id="branding">{{title}}</h1>
{% get_site_att general.Naming tagline as tagline %}
<h2 id="tagline">{{tagline}}</h2>
I have tried every way I can think of, but I just can't get it to work. Any help is really appreciated. Thanks.
I found the solution for this.
On the commented line, use this code:
var = self.model.objects.get(site_id__exact=current_site.id)
context[self.varname] = var.__dict__[self.field]  # this will get the field's value dynamically, which is what I was looking for
The normal way in Python to get an attribute whose name you have in another variable is to use getattr.
context[self.varname] = getattr(var, self.field)
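A small standalone sketch of the difference between the two approaches. The Naming class here is a plain Python stand-in for the model, not the Django model from the question, and the heading property is a hypothetical addition used only to show where __dict__ falls short.

```python
class Naming:
    """Plain-Python stand-in for the model (assumption for illustration)."""

    def __init__(self, title, tagline):
        self.title = title
        self.tagline = tagline

    @property
    def heading(self):
        # Computed attribute: it lives on the class, not in the instance __dict__.
        return self.title.upper()


var = Naming("My Site", "Just a tagline")
field = "tagline"

value = getattr(var, field)                               # same as var.tagline
missing = getattr(var, "no_such_field", "Value not found")  # default instead of an exception
computed = getattr(var, "heading")                        # properties work too
in_dict = "heading" in var.__dict__                       # not visible via __dict__

print(value, missing, computed, in_dict)
```

The three-argument form of getattr could also replace the bare try/except around the attribute lookup, since it returns the fallback value when the name does not exist.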