Add CharField to the search index in haystack

I use Haystack (2.4.1) for search in my Django (1.8) app. I want to be able to search both with autocomplete (EdgeNgramField) and by typing only part of a name, for example 'zo-zo on' (this does not work with EdgeNgramField).
Below I tried adding text_sec = indexes.CharField(use_template=True), but that does not work for me.
Here is my code, which does not work:
class EventIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.EdgeNgramField(document=True, use_template=True)
    text_sec = indexes.CharField(use_template=True)
    id = indexes.CharField(model_attr='id')
    get_absolute_url = indexes.CharField(model_attr='get_absolute_url')
    description = indexes.CharField(model_attr='description', null=True)
    is_past = indexes.CharField(model_attr='is_past', default='false')
    date_start = indexes.DateTimeField(model_attr='date_start')

You will need to set up two different fields in your schema to power autocomplete and normal search.
In the following I have defined two such fields: text, and content_auto, which is populated with the title from your model.
class EventIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    content_auto = indexes.EdgeNgramField(model_attr='title')
    text_sec = indexes.CharField(use_template=True)
    id = indexes.CharField(model_attr='id')
    get_absolute_url = indexes.CharField(model_attr='get_absolute_url')
    description = indexes.CharField(model_attr='description', null=True)
    is_past = indexes.CharField(model_attr='is_past', default='false')
    date_start = indexes.DateTimeField(model_attr='date_start')
For normal search you should search on the text field; for autosuggest, search on content_auto.
You should read the docs at http://haystacksearch.org/ for more on this.
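To see why one field type cannot serve both purposes, here is a minimal pure-Python sketch of what edge n-gram tokenization does conceptually. The names edge_ngrams and index_tokens and the 2 to 15 character limits are assumptions for illustration only; the real tokenizer lives in the search backend and its settings may differ.

```python
def edge_ngrams(term, min_size=2, max_size=15):
    """Return the prefixes of `term` between min_size and max_size characters."""
    upper = min(len(term), max_size)
    return [term[:size] for size in range(min_size, upper + 1)]


def index_tokens(text):
    """Split on whitespace and collect the edge n-grams of every word."""
    tokens = []
    for word in text.lower().split():
        tokens.extend(edge_ngrams(word))
    return tokens


print(edge_ngrams("zone"))
print(index_tokens("zo-zo on"))
```

With both fields indexed, an autocomplete lookup in Haystack would typically go through SearchQuerySet().autocomplete(content_auto=...), while a normal lookup would filter on the document field, for example SearchQuerySet().filter(content=...).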

Related

Is there a way to merge 2 querysets in Django and order them by their respective field?

I'm trying to create a Twitter clone, and these are my user and tweet models (some irrelevant fields have been removed).
class TwitterUser(models.Model):
    user = models.OneToOneField(to=User, on_delete=models.CASCADE, primary_key=True)
    Bio = models.CharField(max_length=200, blank=True)
    Location = models.CharField(max_length=200, blank=True)
    Website = models.URLField(blank=True)
    ProfilePicture = models.ImageField(upload_to="Twitter", default="../static/twitter/images/default_profile.png")
    CreateDate = models.DateField(default=timezone.now)

class Tweet(models.Model):
    TweetBody = models.CharField(max_length=140, blank=False)
    TweetDate = models.DateTimeField(default=timezone.now)
    Owner = models.ForeignKey(to=TwitterUser, on_delete=models.CASCADE, related_name="Owner")
    RetweetedBy = models.ManyToManyField(to=TwitterUser, related_name="Retweeted", blank=True, through="RetweetIntermediate")
And this is the through table that my many-to-many retweet relationship uses:
class RetweetIntermediate(models.Model):
    twitteruser = models.ForeignKey(TwitterUser, on_delete=models.CASCADE)
    tweet = models.ForeignKey(Tweet, on_delete=models.CASCADE)
    retweetDate = models.DateTimeField(default=timezone.now)
In the profile view, all the tweets and retweets should be shown ordered by date.
What I'm doing right now (and it is working fine) is this:
def keymaker(a):
    return a.TweetDate

def ProfileView(request):
    tweets = list(Tweet.objects.filter(Owner=user.user_id, IsReplyToTweet__isnull=True).order_by("-TweetDate"))
    retweets = list(user.Retweeted.all().order_by("-id"))
    retweetInter = RetweetIntermediate.objects.all().order_by("-tweet_id")
    for i, j in zip(retweets, retweetInter):
        i.TweetDate = j.retweetDate
    tweets = (tweets + retweets)
    tweets.sort(key=keymaker, reverse=True)
I retrieve all the tweets ordered by date. Then I retrieve all of the retweets, make a list out of them, and change the date of each tweet to the date saved in the intermediate table.
Finally I merge both lists and sort them by date.
Is there a better or more standard way to do this?
Thanks in advance.

You can do it using union together with annotate.
from django.db.models import F
tweets_qs = Tweet.objects\
    .filter(Owner=user, IsReplyToTweet__isnull=True)\
    .annotate(date=F('TweetDate'))
retweets_qs = Tweet.objects\
    .filter(retweetintermediate__twitteruser=user)\
    .annotate(date=F('retweetintermediate__retweetDate'))
timeline_qs = tweets_qs.union(retweets_qs).order_by('-date')
Notice that both querysets have Tweet objects.
Edit: Sorry for not understanding the question correctly the first time.
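For comparison, the list-based approach from the question can be tightened with heapq.merge, which combines inputs that are already sorted instead of re-sorting everything. This is only a sketch with stand-in objects: TimelineItem and merge_timeline are hypothetical names, not part of the models above, and both inputs are assumed to be sorted newest first.

```python
import heapq
from dataclasses import dataclass
from datetime import datetime


@dataclass
class TimelineItem:
    body: str
    date: datetime  # TweetDate for own tweets, retweetDate for retweets


def merge_timeline(tweets, retweets):
    """Merge two lists that are each already sorted newest first."""
    return list(
        heapq.merge(tweets, retweets, key=lambda item: item.date, reverse=True)
    )


tweets = [
    TimelineItem("own tweet B", datetime(2020, 1, 3)),
    TimelineItem("own tweet A", datetime(2020, 1, 1)),
]
retweets = [
    TimelineItem("retweet of X", datetime(2020, 1, 2)),
]
timeline = merge_timeline(tweets, retweets)
print([item.body for item in timeline])
```

The database-side union is usually preferable when it is available, because it can be paginated without loading both lists into memory.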

Django full text search using indexes with PostgreSQL

After solving the problem I asked about in this question, I am trying to optimize performance of the FTS using indexes.
I ran this command on my database:
CREATE INDEX my_table_idx ON my_table USING gin(
    to_tsvector('italian', very_important_field),
    to_tsvector('italian', also_important_field),
    to_tsvector('italian', not_so_important_field),
    to_tsvector('italian', not_important_field),
    to_tsvector('italian', tags)
);
Then I edited my model's Meta class as follows:
class MyEntry(models.Model):
    very_important_field = models.TextField(blank=True, null=True)
    also_important_field = models.TextField(blank=True, null=True)
    not_so_important_field = models.TextField(blank=True, null=True)
    not_important_field = models.TextField(blank=True, null=True)
    tags = models.TextField(blank=True, null=True)

    class Meta:
        managed = False
        db_table = 'my_table'
        indexes = [
            GinIndex(
                fields=['very_important_field', 'also_important_field', 'not_so_important_field', 'not_important_field', 'tags'],
                name='my_table_idx'
            )
        ]
But nothing seems to have changed. The lookup takes exactly the same amount of time as before.
This is the lookup script:
from django.contrib.postgres.search import SearchQuery, SearchRank, SearchVector
# other unrelated stuff here
vector = SearchVector("very_important_field", weight="A") + \
    SearchVector("tags", weight="A") + \
    SearchVector("also_important_field", weight="B") + \
    SearchVector("not_so_important_field", weight="C") + \
    SearchVector("not_important_field", weight="D")
query = SearchQuery(search_string, config="italian")
rank = SearchRank(vector, query, weights=[0.4, 0.6, 0.8, 1.0])  # D, C, B, A
full_text_search_qs = MyEntry.objects.annotate(rank=rank).filter(rank__gte=0.4).order_by("-rank")
What am I doing wrong?
Edit:
The above lookup is wrapped in a function that I time with a decorator. The function actually returns a list, like this:
@timeit
def search(search_string):
    # the above code here
    qs = list(full_text_search_qs)
    return qs
Might this be the problem, maybe?

You need to add a SearchVectorField to your MyEntry, update it from your actual text fields and then perform the search on this field. However, the update can only be performed after the record has been saved to the database.
Essentially:
from django.contrib.postgres.indexes import GinIndex
from django.contrib.postgres.search import SearchVector, SearchVectorField
from django.db import models

class MyEntry(models.Model):
    # The fields that contain the raw data.
    very_important_field = models.TextField(blank=True, null=True)
    also_important_field = models.TextField(blank=True, null=True)
    not_so_important_field = models.TextField(blank=True, null=True)
    not_important_field = models.TextField(blank=True, null=True)
    tags = models.TextField(blank=True, null=True)
    # The field we are actually going to search.
    # Must be null=True because we cannot set it immediately during create().
    search_vector = SearchVectorField(editable=False, null=True)

    class Meta:
        # The search index pointing to our actual search field.
        indexes = [GinIndex(fields=["search_vector"])]
Then you can create the plain instance as usual, for example:
# Does not set MyEntry.search_vector yet.
my_entry = MyEntry.objects.create(
    very_important_field="something very important",  # Fake Italian text ;-)
    also_important_field="something different but equally important",
    not_so_important_field="this one matters less",
    not_important_field="we don't care about that one at all",
    tags="things, stuff, whatever",
)
Now that the entry exists in the database, you can update the search_vector field using all kinds of options, for example weight to specify the importance and config to use one of the default language configurations. You can also completely omit fields you don't want to search:
# Update search vector on existing database record.
my_entry.search_vector = (
    SearchVector("very_important_field", weight="A", config="italian")
    + SearchVector("also_important_field", weight="A", config="italian")
    + SearchVector("not_so_important_field", weight="C", config="italian")
    + SearchVector("tags", weight="B", config="italian")
)
my_entry.save()
Manually updating the search_vector field every time some of the text fields change can be error prone, so you might consider adding an SQL trigger to do that for you using a Django migration. For an example on how to do that see for instance a blog article on Full-text Search with Django and PostgreSQL.
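As a rough sketch of the trigger idea, the helper below builds SQL around PostgreSQL's built-in tsvector_update_trigger function. The helper name search_vector_trigger_sql and the table name myapp_myentry are assumptions for illustration; check the actual db_table of your model. Note that the built-in trigger function does not apply per-field weights, so weighted vectors would need a custom trigger function instead.

```python
def search_vector_trigger_sql(table, vector_column, config, text_columns):
    """Build forward and reverse SQL for a trigger that refreshes a tsvector column."""
    trigger = f"{table}_{vector_column}_update"
    columns = ", ".join(text_columns)
    forward = (
        f"CREATE TRIGGER {trigger} BEFORE INSERT OR UPDATE ON {table} "
        f"FOR EACH ROW EXECUTE PROCEDURE "
        f"tsvector_update_trigger({vector_column}, 'pg_catalog.{config}', {columns});"
    )
    reverse = f"DROP TRIGGER IF EXISTS {trigger} ON {table};"
    return forward, reverse


forward_sql, reverse_sql = search_vector_trigger_sql(
    table="myapp_myentry",  # assumed table name, not taken from the answer
    vector_column="search_vector",
    config="italian",
    text_columns=["very_important_field", "also_important_field", "tags"],
)
print(forward_sql)
print(reverse_sql)
```

In a migration, the two strings would be passed to migrations.RunSQL(forward_sql, reverse_sql). Rows that existed before the trigger was created still need a one-off update of search_vector.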
To actually search in MyEntry using the index, you need to filter and rank by your search_vector field. The config for the SearchQuery should match the one used for the SearchVector above, so that the same stop words and stemming rules apply.
For example:
from django.contrib.postgres.search import SearchQuery, SearchRank
from django.db.models import F

search_query = SearchQuery("important", search_type="websearch", config="italian")
search_rank = SearchRank(F("search_vector"), search_query)
my_entries_found = (
    MyEntry.objects.annotate(rank=search_rank)
    .filter(search_vector=search_query)  # Perform full text search on index.
    .order_by("-rank")  # Yield most relevant entries first.
)

I'm not sure, but according to the PostgreSQL documentation (https://www.postgresql.org/docs/9.5/static/textsearch-tables.html#TEXTSEARCH-TABLES-INDEX):
Because the two-argument version of to_tsvector was used in the index
above, only a query reference that uses the 2-argument version of
to_tsvector with the same configuration name will use that index. That
is, WHERE to_tsvector('english', body) @@ 'a & b' can use the index,
but WHERE to_tsvector(body) @@ 'a & b' cannot. This ensures that an
index will be used only with the same configuration used to create the
index entries.
I don't know what configuration Django uses, but you can try removing the first argument.

Django - cannot retrieve just one record in multi-part filter on model with multiple relations

I can't seem to isolate a single record from this query:
subcust = OwnerCustom.objects.get(carcustom=ncset, owner=sset)
This is the error:
OwnerCustom matching query does not exist
In the actual data, there is only actually one matching record in OwnerCustom for each record in CarCustom. It's supposed to be a kind of many-to-many where there are standard differences listed in CarCustom for each Car, and each owner may maintain their own customizations (overrides) or those default OwnerCustom entries.
Note, there are many different Owner of the same Car. And of course, I'm not actually doing cars, this is a renaming from the original purpose.
Here are the relevant models:
class Car(models.Model):
    car_name = models.CharField(max_length=50)

class CarCustom(models.Model):
    car = models.ForeignKey(Car, models.PROTECT)

class Owner(models.Model):
    car = models.ForeignKey(Car, models.PROTECT)

class OwnerCustom(models.Model):
    owner = models.ForeignKey(Owner, models.PROTECT)
    carcustom = models.ForeignKey(CarCustom, models.PROTECT)
    name = models.CharField(max_length=50)
And the code:
car_queryset = Car.objects.filter(car_name="fancy car")
for nset in car_queryset:
    owner_queryset = Owner.objects.filter(car=nset)
    for sset in owner_queryset:
        carcustom_queryset = CarCustom.objects.filter(car=nset)
        for ncset in carcustom_queryset:
            subcust = OwnerCustom.objects.get(carcustom=ncset, owner=sset)
I've tried stuff like:
subcust = OwnerCustom.objects.filter(carcustom=ncset, owner=sset).first()
Which gives me a NoneType, and then tried:
subcust = OwnerCustom.objects.filter(carcustom=ncset, owner=sset)[:1].get()
Which gives "matching query does not exist" and this:
subcust = OwnerCustom.objects.filter(carcustom=ncset, owner=sset)[0]
Gives "list index out of range"
UPDATE: I CAN get a working function by using code like this, but I would think since there is only one (guaranteed by application) matching record possible for OwnerCustom.objects.filter(carcustom=ncset, owner=sset) that I could find a better way to fetch it:
car_queryset = Car.objects.filter(car_name="fancy car")
for nset in car_queryset:
    owner_queryset = Owner.objects.filter(car=nset)
    for sset in owner_queryset:
        carcustom_queryset = CarCustom.objects.filter(car=nset)
        for ncset in carcustom_queryset:
            subcust_queryset = OwnerCustom.objects.filter(carcustom=ncset, owner=sset)
            for subcust in subcust_queryset:
                logger.info(subcust.name)
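All three failures above (DoesNotExist from get(), None from first(), and the index error from [0]) point the same way: for at least one owner and carcustom pair, the filter matched no rows. The sketch below uses the standard sqlite3 module with made-up tables and data that only mirror the models, to show how a LEFT JOIN can list the pairs that lack an OwnerCustom row. The table layout and sample rows are assumptions, not the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE owner (id INTEGER PRIMARY KEY, car_id INTEGER);
    CREATE TABLE carcustom (id INTEGER PRIMARY KEY, car_id INTEGER);
    CREATE TABLE ownercustom (
        id INTEGER PRIMARY KEY, owner_id INTEGER, carcustom_id INTEGER, name TEXT
    );
    INSERT INTO owner VALUES (1, 10), (2, 10);
    INSERT INTO carcustom VALUES (100, 10), (101, 10);
    -- Owner 2 has no override for carcustom 101.
    INSERT INTO ownercustom VALUES
        (1, 1, 100, 'a'), (2, 1, 101, 'b'), (3, 2, 100, 'c');
""")

# Every (owner, carcustom) pair for the same car, with the override if one exists.
rows = conn.execute("""
    SELECT o.id, c.id, oc.name
    FROM owner o
    JOIN carcustom c ON c.car_id = o.car_id
    LEFT JOIN ownercustom oc
        ON oc.owner_id = o.id AND oc.carcustom_id = c.id
    ORDER BY o.id, c.id
""").fetchall()

missing = [(owner_id, custom_id) for owner_id, custom_id, name in rows if name is None]
print(missing)
```

In the ORM, iterating over OwnerCustom.objects.filter(owner__car__car_name="fancy car") directly would yield only the rows that exist, which is effectively what the working nested-loop version does.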

How do I store a string in ArrayField? (Django and PostgreSQL)

I am unable to store a string in ArrayField. There are no exceptions thrown when I try to save something in it, but the array remains empty.
Here is some code from models.py :
# models.py
from django.db import models
import uuid
from django.contrib.auth.models import User
from django.contrib.postgres.fields import JSONField, ArrayField

# Create your models here.
class UserDetail(models.Model):
    user = models.OneToOneField(User, on_delete=models.CASCADE)
    key = models.CharField(max_length=50, default=False, primary_key=True)
    api_secret = models.CharField(max_length=50)
    user_categories = ArrayField(models.CharField(max_length=1000), default=list)

    def __str__(self):
        return self.key

class PreParentProduct(models.Model):
    product_user = models.ForeignKey(UserDetail, default=False, on_delete=models.CASCADE)
    product_url = models.URLField(max_length=1000)
    pre_product_title = models.CharField(max_length=600)
    pre_product_description = models.CharField(max_length=2000)
    pre_product_variants_data = JSONField(blank=True, null=True)
    id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)

    def __str__(self):
        return self.pre_product_title
I try to save it this way:
catlist = ast.literal_eval(res.text)
for jsonitem in catlist:
    key = jsonitem.get('name')
    id = jsonitem.get("id")
    dictionary = {}
    dictionary['name'] = key
    dictionary['id'] = id
    tba = json.dumps(dictionary)
    print("It works till here.")
    print(type(tba))
    usersearch[0].user_categories.append(tba)
    print(usersearch[0].user_categories)
usersearch[0].save()
print(usersearch[0].user_categories)
The output I get is:
It works till here.
<class 'str'>
[]
It works till here.
<class 'str'>
[]
[]
Is this the correct way to store a string inside ArrayField?
I cannot store JSONField inside an ArrayField, so I had to convert it to a string.
How do I fix this?

Solution to the append problem.
You haven't shown how usersearch is defined; I suspect it's something like this:
usersearch = UserDetail.objects.all()
If so, every usersearch[0] on an unevaluated queryset runs a new query and returns a fresh object, so changes made to one of those objects are lost. Try this and you will see that the primary key is unchanged too:
usersearch[0].pk = 1000
print(usersearch[0].pk)
But this works:
usersearch = list(UserDetail.objects.all())
and so does keeping a single reference, then modifying and saving that:
u = usersearch[0]
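The effect can be reproduced without a database. The class below is a hypothetical stand-in, not Django code: it hands out a new object on every index access, which is what indexing an unevaluated queryset does from the caller's point of view.

```python
import copy


class FreshRowsEachTime:
    """Mimics an unevaluated queryset: every index access builds a new object."""

    def __init__(self, rows):
        self._rows = rows

    def __getitem__(self, index):
        # A real queryset would run a query here; the copy stands in for that.
        return copy.deepcopy(self._rows[index])


stored = [{"user_categories": []}]
usersearch = FreshRowsEachTime(stored)

# The append lands on a throwaway copy, so the next access shows no change.
usersearch[0]["user_categories"].append("lost")
print(usersearch[0]["user_categories"])

# Holding on to one object keeps the change visible.
u = usersearch[0]
u["user_categories"].append("kept")
print(u["user_categories"])
```

In the real code the equivalent is to take u = usersearch[0] once, append to u.user_categories, and then call u.save().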
Solution to the real problem
user_categories = ArrayField(models.CharField(max_length=1000), default=list)
This is wrong. ArrayFields shouldn't be used in this manner. You will soon find that you need to search through them, and as the PostgreSQL documentation says:
Arrays are not sets; searching for specific array elements can be a
sign of database misdesign. Consider using a separate table with a row
for each item that would be an array element. This will be easier to
search, and is likely to scale better for a large number of elements
ref: https://www.postgresql.org/docs/9.5/static/arrays.html
You need to normalize your data. You need to have a category model and your UserDetail should be related to it through a foreign key.

Django - filtering in views

I am working on a basic application and I'm stuck at displaying some info.
Please take a look:
Models:
class Companies(models.Model):
    name = models.CharField()
    address = models.CharField()

    def __unicode__(self):
        return self.name

class Payments(models.Model):
    company = models.ForeignKey(Companies)
    year = models.CharField(choices=YEAR)
    month = models.CharField(choices=MONTHS)
    date = models.DateField(auto_now_add=True)
I want a view that displays ONLY the companies that did not pay the monthly fee.
So I've started like this:
def checks(request):
    i = datetime.datetime.now()
    an_c = i.strftime('%Y')
    comp = Companies.objects.all()
    pay1 = Payments.objects.filter(year=an_c, month='01')
But in the template I do not know how to filter the "comp" list.
I want to display in the template all the records from "comp" except the ones whose id/pk can be found in "pay1.company".

You wouldn't do that in the template. Do the whole thing in the view:
pay1 = Payments.objects.filter(year=an_c, month='01')
comp = Companies.objects.exclude(payments__in=pay1)
(Style note: Django model classes are usually named in the singular, not the plural.)
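Conceptually, the exclude() call asks for every company whose id is not among the companies that paid. A small sketch with the standard sqlite3 module shows the same idea; the tables and sample rows here are made up for illustration, and the SQL the ORM actually emits may look different.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE payments (
        id INTEGER PRIMARY KEY, company_id INTEGER, year TEXT, month TEXT
    );
    INSERT INTO companies VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech');
    INSERT INTO payments VALUES (1, 1, '2015', '01'), (2, 3, '2015', '02');
""")


def unpaid_companies(year, month):
    """Names of companies with no payment row for the given year and month."""
    rows = conn.execute(
        """
        SELECT name FROM companies
        WHERE id NOT IN (
            SELECT company_id FROM payments WHERE year = ? AND month = ?
        )
        ORDER BY name
        """,
        (year, month),
    ).fetchall()
    return [name for (name,) in rows]


print(unpaid_companies("2015", "01"))
```

The view can then pass the filtered queryset straight to the template, which only needs to loop over it.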
