Do you know the best full-text search option on GAE?
Thanks.
Read this blog post which details how to add full-text search to App Engine models.
It also details how to make only certain fields searchable, and turn on stemming.
Now we can use the experimental Search API:
The Search API allows your application to perform Google-like searches
over structured data. You can search across several different types of
data (plain text, HTML, atom, numbers, dates, and geographic
locations). Searches return a sorted list of matching text. You can
customize the sorting and presentation of results.
Documentation: https://developers.google.com/appengine/docs/python/search/overview
Early presentation: http://www.google.com/events/io/2011/sessions/full-text-search.html
Google App Engine - Full Text Search
I have written a Django app that uses MySQL. I'd like to implement full-text search, but from what I can see it is supported only for PostgreSQL. I checked many articles and answers, but most of them are outdated.
Is there currently any way to handle full-text search in MySQL across multiple models? I am currently using Q lookups, but I need something similar to trigram similarity or Levenshtein distance.
I am using App Engine with Python (version 2.7) for a web application that deals with job listings and job search.
The backend consists of a "Job" table with 20+ fields such as title, date, experience, etc. I have the necessary composite indexes defined for each permutation and combination of the filters. As you might have guessed, the number of indexes is high.
The front end provides an option for users to search for jobs and filter them by these columns.
This works as expected, but with the following drawbacks:
Slow Search Performance
The search is divided into two parts: built-in datastore filtering, and then custom filtering on top of the refined results. The custom filtering is required to apply the more complex filters that are not supported by App Engine.
Exploding composite indexes
Some columns (5, for instance) accept only a fixed set of values, so filtering on them is pretty straightforward. Other fields can have user-defined values, and hence filtering on them requires custom Python code.
Jinja is the templating engine, which then renders the data into HTML.
Advanced Search + Index References: https://cloud.google.com/appengine/articles/indexselection
Is there a better approach/algorithm for implementing the search and advanced search in App Engine?
You might want to consider using the Full Text Search API available in App Engine. In essence, when entities are created in Cloud Datastore, you would create a Document with the entity ID/Key and all searchable fields and send it to the Search API for indexing. Any updates to the Datastore entities would also need to update the corresponding Search document. Also, when entities are deleted, delete the corresponding Search document.
Modify your Application's search code to perform the Search on Indexed documents instead of Datastore queries. Retrieve a page (e.g. 50) of Document IDs. Fetch the data for the 50 entities using a Datastore Get and display the results.
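The workflow above can be sketched roughly as follows. This is not the Search API itself: `datastore` and `search_index` are plain in-memory stand-ins, and `put_job`, `delete_job`, `search_jobs` and `SEARCHABLE_FIELDS` are hypothetical names chosen for illustration. It only shows the shape of the pattern: every write or delete also updates the search document, and a search returns a page of IDs that are then fetched from the entity store.

```python
import re

# In-memory stand-ins (assumptions, not App Engine services).
datastore = {}      # entity key -> dict of fields
search_index = {}   # token -> set of entity keys

SEARCHABLE_FIELDS = ("title", "location")  # hypothetical field choice


def _tokens(entity):
    """Lower-cased word tokens taken from the searchable fields only."""
    text = " ".join(str(entity.get(f, "")) for f in SEARCHABLE_FIELDS)
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def _unindex(key):
    """Remove an entity key from every posting set."""
    for keys in search_index.values():
        keys.discard(key)


def put_job(key, entity):
    """Create or update an entity and refresh its search document."""
    datastore[key] = entity
    _unindex(key)
    for token in _tokens(entity):
        search_index.setdefault(token, set()).add(key)


def delete_job(key):
    """Delete an entity together with its search document."""
    datastore.pop(key, None)
    _unindex(key)


def search_jobs(query, page_size=50):
    """Find keys matching every query term, then fetch one page of entities."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    if not terms:
        return []
    keys = set.intersection(*(search_index.get(t, set()) for t in terms))
    page = sorted(keys)[:page_size]
    return [datastore[k] for k in page]  # stands in for a batch Datastore get
```

With the real services, the three functions would wrap the corresponding Datastore and Search API calls, but the bookkeeping (index on put, unindex on delete, fetch by ID after searching) stays the same.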
Per the documentation -
The Search API lets your application perform Google-like full-text
searches over structured data, and supports Geolocation-based queries.
It can be useful in any application domain that benefits from
full-text search, such as:
This would definitely give a better Search experience for your application users when compared with Datastore queries.
Once you implement this, you might be able to just get rid of the composite indexes from Datastore.
Google App Engine (GAE) provides a way to do Full Text Search (FTS) and to store and retrieve documents. The default document ranking is based on a time offset. Is there a way to do Lucene-style inverted-index lookup and ranking on GAE? If not, what are some other options?
Use case: FTS and intelligent ranking of results (at least based on search query frequency) for a bunch of HTML pages.
Both GAE Datastore and GAE Search API can do query-by-index:
Datastore is a NoSQL datastore with user-defined indexes and limited queries. It's a database: fast, distributed and has transactions. Queries are however quite restricted: They can only span one Entity kind, so no JOINs. Only one inequality filter per query, so no geo-point search is possible. Also, string search is exact, so no sub-string search, regex search or LIKE search is possible.
Search API is more like Lucene: you store documents and build indexes from parts of the documents. It supports full-text search and geo-point search (e.g. finding geo-points within a certain distance from a given geo-point).
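For readers unfamiliar with what "Lucene-style" means here, a toy inverted index with frequency-based ranking might look like the sketch below. It is a plain-Python illustration, not how the Search API or Lucene is implemented; `InvertedIndex` and `tokenize` are hypothetical names, and the scoring (summed term frequency) is an assumption chosen for simplicity. Input is assumed to be plain text, so HTML pages would need their tags stripped first.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Split text into lower-cased alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    def __init__(self):
        # term -> {doc_id: number of occurrences of the term in that doc}
        self.postings = defaultdict(dict)

    def add(self, doc_id, text):
        """Index a document by recording per-term occurrence counts."""
        for term, count in Counter(tokenize(text)).items():
            self.postings[term][doc_id] = count

    def search(self, query):
        """Return doc IDs ranked by summed term frequency, best first."""
        scores = Counter()
        for term in tokenize(query):
            for doc_id, count in self.postings.get(term, {}).items():
                scores[doc_id] += count
        # Sort by descending score, then by doc ID to keep ties stable.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return [doc_id for doc_id, _ in ranked]
```

A lookup only touches the posting lists of the query terms, which is what makes this structure cheaper than scanning every document.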
If you gave us a more specific use case, we might be able to help you decide which one to use.
Could you explain how search engines like Sphinx, Haystack, etc. fit into a web framework? If you could explain in a way that someone new to web development could understand, that would help.
One example use case I made up for this question is a book search feature. Let's say I have a NoSQL database that contains book objects, each containing author, title, ISBN, etc.; how does something like Sphinx/Haystack/another search engine fit in with my database to search for books with a given ISBN?
Firstly, Haystack isn't a search engine, it's a library that provides a Django API to existing search engines like Solr and Whoosh.
That said, your example isn't really a very good one. You wouldn't use a separate search engine to search by ISBN, because your database would already have an index on the Book table which would efficiently do that search. Where a search engine would come in could be in two places. Firstly, you could index some or all of the book's contents to search on: databases are not very good at full-text search, but this is an area where search engines shine. Secondly, you could provide a search against multiple fields - say, author, title, publisher and description - in one go.
Also, search engines provide useful functionality like suggestions, faceting and so on that you won't get from a database.
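To make the multi-field and faceting points concrete, here is a small plain-Python sketch. It is not Haystack, Solr or Whoosh code; the `books` records, `SEARCH_FIELDS`, `search_books` and `facet_counts` are all made up for illustration, and a real engine would use an index rather than a linear scan.

```python
import re
from collections import Counter

# Hypothetical sample records.
books = [
    {"isbn": "111", "title": "Dune", "author": "Frank Herbert", "publisher": "Ace"},
    {"isbn": "222", "title": "Herbert's Garden", "author": "Ann Lee", "publisher": "Orbit"},
    {"isbn": "333", "title": "Emma", "author": "Jane Austen", "publisher": "Ace"},
]

SEARCH_FIELDS = ("title", "author", "publisher")


def search_books(term):
    """Match one term against several fields at once (case-insensitive)."""
    term = term.lower()
    hits = []
    for book in books:
        combined = " ".join(book[f] for f in SEARCH_FIELDS).lower()
        if term in re.findall(r"[a-z0-9]+", combined):
            hits.append(book)
    return hits


def facet_counts(hits, field="publisher"):
    """Count how many hits fall under each value of a field (a facet)."""
    return Counter(book[field] for book in hits)
```

Searching for "herbert" finds one book by author and another by title, and the facet counts show how those hits split by publisher, which is the kind of breakdown a plain database lookup by ISBN would not give you.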
I was wondering if there is any way to search the datastore for an entry. I have a bunch of entries for songs (title, artist, rating), but I'm not sure how to search through them by both song title and artist. We take in a search term and are looking for all entries that "match." But we are lost :( Any help is much appreciated!
We are using Python.
edit1: the current code is useless, it's an exact search, but it might help you see the issue
query = song.gql("SELECT * FROM song WHERE title = searchTerm OR artist = searchTerm")
The song data you work with sounds like a rather static data set (primarily inserts, no or few updates). In that case there is a GAE technique called Relation Index Entity (RIE), which is an efficient way to implement keyword-based search.
Some preparation work is required, though. Briefly:
Build a special RIE entity where you place all searchable keywords from each song (one-to-one relationship).
RIE stores them in StringListProperty which supports searches like this:
keywords = 'SearchTerm'
(returns True if any of the values in the list keywords matches 'SearchTerm')
AND condition works immediately by adding multiple filters as above
OR condition needs more work by implementing in-memory merge from AND-only queries
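The AND and OR behaviour described above can be sketched in plain Python. This is a simulation only: `song_keywords` is an in-memory stand-in for the RIE entities, and `query_and` / `query_or` are hypothetical helpers that mimic what chained list-property filters and an in-memory merge would do, not Datastore calls.

```python
# In-memory stand-in for RIE entities: song key -> list of keywords.
song_keywords = {
    "song1": ["yesterday", "beatles"],
    "song2": ["help", "beatles"],
    "song3": ["yesterday", "once", "more", "carpenters"],
}


def query_and(*terms):
    """Mimic chained equality filters on a list property (AND semantics).

    A song matches only if every term equals some value in its keyword list.
    """
    wanted = [t.lower() for t in terms]
    return {
        key for key, keywords in song_keywords.items()
        if all(t in keywords for t in wanted)
    }


def query_or(*term_groups):
    """OR semantics: run one AND-only query per group and merge the results."""
    merged = set()
    for group in term_groups:
        merged |= query_and(*group)
    return merged
```

For example, `query_and("yesterday", "beatles")` narrows to a single song, while `query_or(["help"], ["carpenters"])` runs two separate queries and unions their keys in memory.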
You can find details on solution workflow and code samples in my blog Relation Index Entities with Python for Google Datastore.
http://www.billkatz.com/2009/6/Simple-Full-Text-Search-for-App-Engine