Advanced Search algorithm on Google Appengine datastore and indexes - python

I am using App Engine with Python 2.7 for a web application which deals with job listings and job search.
The backend consists of a "Job" table with 20+ fields such as title, date, experience, etc. I have the necessary composite indexes defined for every permutation and combination of the filters. As you might have guessed, the number of indexes is high.
The front end lets users search for jobs and filter them using these columns.
This works as expected, but with the following drawbacks:
Slow search performance: the search is divided into two parts, built-in datastore filtering followed by custom filtering on top of the refined results. The custom filtering is needed to apply the complex filters that App Engine does not support.
Exploding composite indexes: some columns (5, for instance) accept only a fixed set of values, so filtering on them is pretty straightforward. Other fields can hold user-defined values, so filtering on them requires custom Python code.
Jinja is the templating engine which then renders the data into the html.
Advanced Search + Index References: https://cloud.google.com/appengine/articles/indexselection
Is there a better approach/algorithm for implementing search and advanced search on App Engine?

You might want to consider using the Full Text Search API available in App Engine. In essence, when entities are created in Cloud Datastore, you would create a Document with the entity ID/Key and all searchable fields and send it to the Search API for indexing. Any updates to the Datastore entities would also need to update the corresponding Search document. Also, when entities are deleted, delete the corresponding Search document.
Modify your Application's search code to perform the Search on Indexed documents instead of Datastore queries. Retrieve a page (e.g. 50) of Document IDs. Fetch the data for the 50 entities using a Datastore Get and display the results.
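As a rough illustration of the search step, here is a minimal sketch that turns a set of user-selected filters into a Search API query string. The helper name build_query_string, the index name "jobs", and the field names are assumptions made for this example. The App Engine calls appear only in comments because they work only inside the App Engine runtime.

```python
def build_query_string(filters):
    """Turn a dict of user-selected filters into a Search API query string.

    `filters` maps a field name to either a plain value (text match) or an
    (operator, value) tuple for numeric comparisons. Hypothetical helper.
    """
    parts = []
    for field in sorted(filters):
        value = filters[field]
        if isinstance(value, tuple):
            operator, operand = value
            parts.append("%s %s %s" % (field, operator, operand))
        else:
            # Quote the value so a multi-word phrase is matched as a phrase.
            # Embedded double quotes are simply dropped (a simplification).
            phrase = str(value).replace('"', "")
            parts.append('%s:"%s"' % (field, phrase))
    return " AND ".join(parts)


query_string = build_query_string({
    "title": "python developer",
    "experience": (">=", 3),
    "location": "Berlin",
})
print(query_string)

# On App Engine (not executed here), the string could be used like this,
# assuming an index named "jobs" whose doc_ids are the Job entity ids:
#   from google.appengine.api import search
#   results = search.Index(name="jobs").search(query_string)
#   job_ids = [doc.doc_id for doc in results]
#   # ...then fetch the matching Job entities with a batch Datastore get.
```

Building the query from whichever filters the user picked means no per-combination composite index is needed on the Datastore side; the Datastore is used only for the final batch get.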
Per the documentation -
The Search API lets your application perform Google-like full-text
searches over structured data, and supports Geolocation-based queries.
It can be useful in any application domain that benefits from
full-text search, such as:
This would definitely give a better Search experience for your application users when compared with Datastore queries.
Once you implement this, you might be able to just get rid of the composite indexes from Datastore.

Related

Sorting and Filtering multiple queries of the same collection in Firestore

I'm new to Cloud Firestore and I'm trying to make my queries as efficient as possible, but I'm stuck on one specific case. I would greatly appreciate your help.
This is the situation:
I want to show a project list that I build from a user field and 2 queries on the project entity. The user field, let's call it "favorite projects", holds the IDs of projects, which reference those projects in their own entity. One query retrieves the public projects (==), and the other retrieves the private projects where the user is a contributor (array_contains).
I want to sort and filter the combined result of the two queries. Is there an option to merge both queries and then sort and filter, as we do with a collection reference?
Thank you for your time, have a nice day!
Based on this and this documentation, I do not believe there is an out of the box solution for joining the results of queries such as the ones described.
You'll need to achieve that within your own code.
For example you can run the first query and store all the data of the document in a map or array. Then use the reference of the other document within the document_reference to make the second query and the third.
Once you have all of them you can do as you please using Python. But getting them ready using a single query or auto-joining the queries seems to not be supported yet.
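A minimal sketch of that in-memory merge, assuming each query result has already been converted to a list of plain dicts with an "id" key. The field names and sample data are hypothetical, and no Firestore calls are made here.

```python
def merge_projects(*result_sets):
    """Merge several query results, keeping one copy of each project id."""
    merged = {}
    for result_set in result_sets:
        for project in result_set:
            # The first occurrence wins; later duplicates are ignored.
            merged.setdefault(project["id"], project)
    return list(merged.values())


# Stand-ins for the three result sets (favorites, public, contributor).
favorites = [{"id": "p1", "name": "Zeta", "public": True}]
public = [
    {"id": "p1", "name": "Zeta", "public": True},
    {"id": "p2", "name": "Alpha", "public": True},
]
contributor = [{"id": "p3", "name": "Beta", "public": False}]

projects = merge_projects(favorites, public, contributor)

# Sorting and filtering then happen in ordinary Python.
by_name = sorted(projects, key=lambda p: p["name"])
public_only = [p for p in by_name if p["public"]]
```

Deduplicating by id matters because a favorite project can also show up in the public or contributor results.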

App Engine social platform - Content interactions modeling strategy

I have a Python server running on Google App Engine that implements a social network. I am trying to find the best way (best = fast and cheap) to implement interactions on items.
Just like any other social network I have the stream items ("Content") and users can "like" these items.
As for queries, I want to be able to:
Get the list of users who liked the content
Get a total count of the likers.
Get an intersection of the likers with any other users list.
My Current implementation includes:
1. IntegerProperty on the content item which holds the total likers count
2. InteractionModel - an NdbModel with a key id equal to the content id (fast fetch) and a JsonProperty that holds the likers' usernames
Each time a user likes a content item I need to update the counter and the list of users. This requires me to run and pay for 4 datastore operations (2 reads, 2 writes).
On top of that, items with lots of likers result in an InteractionModel with a huge JSON that takes time to serialize and deserialize when reading/writing (still faster than a RepeatedProperty).
None of the updated fields are indexed (built-in index) nor included in combined index (index.yaml)
Looking for a more efficient and cost effective way to implement the same requirements.
I'm guessing you have two entities in your model: User and Content. Your queries seem to aggregate over multiple Content objects.
What about keeping these aggregated values on the User object? This way, you don't need to run any queries; you only look up the data stored in the User object.
At some point, though, you might consider moving off the datastore and looking at SQL storage instead. It has a higher constant cost, but I'm guessing that at some point (more content/users) it might be worth considering in terms of both cost and performance.

Inverted Indices Data Store on Google App Engine

Google App Engine (GAE) provides a way to do Full Text Search (FTS) and store and retrieve documents. The default document ranking is based on a time offset. Is there a way to do a Lucene style Inverted indices look-up and ranking on GAE? If not what are some other options to do this.
Use case: FTS and intelligent ranking of results (at least search query frequency based) for a bunch of HTML pages.
Both GAE Datastore and GAE Search API can do query-by-index:
Datastore is a NoSQL datastore with user-defined indexes and limited queries. It's a database: fast, distributed and has transactions. Queries are however quite restricted: They can only span one Entity kind, so no JOINs. Only one inequality filter per query, so no geo-point search is possible. Also, string search is exact, so no sub-string search, regex search or LIKE search is possible.
Search API is more like Lucene: you store documents and build indexes from parts of the documents. It supports full-text search and geo-point search (e.g. finding geo-points within certain distance from given geo-point).
If you gave us a more specific use case, we might be able to help you decide which one to use.
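For readers unfamiliar with the term in the question, here is a minimal pure-Python sketch of an inverted index with a naive term-frequency ranking. It is not a GAE API and not Lucene's scoring; the function names and sample pages are made up, and the page text is assumed to be already stripped of HTML.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each term to {page_id: number of occurrences in that page}."""
    index = defaultdict(dict)
    for page_id, text in pages.items():
        for term, count in Counter(tokenize(text)).items():
            index[term][page_id] = count
    return index


def search(index, query):
    """Rank pages by the summed term frequency of the query terms."""
    scores = Counter()
    for term in tokenize(query):
        for page_id, count in index.get(term, {}).items():
            scores[page_id] += count
    # Highest score first; ties are broken by page id for a stable order.
    return sorted(scores, key=lambda page_id: (-scores[page_id], page_id))


pages = {
    "a.html": "App Engine search and more search",
    "b.html": "Datastore indexes on App Engine",
    "c.html": "Cooking recipes",
}
index = build_index(pages)
ranked = search(index, "engine search")
```

The lookup touches only the posting lists of the query terms, which is the property that makes an inverted index suitable for full-text search.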

Google Apps Engine Datastore Search

I was wondering if there was any way to search the datastore for an entry. I have a bunch of entries for songs (title, artist, rating) but I'm not sure how to search through them by both song title and artist. We take in a search term and are looking for all entries that "match." But we are lost :( Any help is much appreciated!
We are using Python.
edit1: the current code is useless, it's an exact search, but it might help you see the issue
query = song.gql("SELECT * FROM song WHERE title = searchTerm OR artist = searchTerm")
The song data you work with sounds like a rather static data set (primarily inserts, few or no updates). In that case there is a GAE technique called Relation Index Entity (RIE), which is an efficient way to implement keyword-based search.
It requires some preparation work, which is briefly:
build a special RIE entity that holds all searchable keywords from each song (one-to-one relationship).
the RIE stores them in a StringListProperty, which supports filters like this:
keywords = 'SearchTerm'
(matches if any of the values in the list keywords equals 'SearchTerm')
an AND condition works immediately by adding multiple filters as above
an OR condition needs more work: an in-memory merge of AND-only queries
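The steps above can be sketched in plain Python, with a dict standing in for the RIE entities so the matching logic can run without the datastore. The helper names and sample songs are hypothetical; on App Engine the AND lookup would be a query with one equality filter on the keyword list per term.

```python
import re


def extract_keywords(title, artist):
    """Collect the lowercase words of title and artist as search keywords."""
    words = re.findall(r"[a-z0-9]+", (title + " " + artist).lower())
    return sorted(set(words))


# Stand-in for the RIE entities: song id -> keyword list.
song_index = {
    "s1": extract_keywords("Hey Jude", "The Beatles"),
    "s2": extract_keywords("Jude Street", "Nina"),
    "s3": extract_keywords("Yesterday", "The Beatles"),
}


def match_all(index, terms):
    """AND: every term must appear in the song's keyword list."""
    wanted = [term.lower() for term in terms]
    return sorted(
        song_id
        for song_id, keywords in index.items()
        if all(term in keywords for term in wanted)
    )


def match_any(index, terms):
    """OR: run one AND-only lookup per term and merge the results in memory."""
    found = set()
    for term in terms:
        found.update(match_all(index, [term]))
    return sorted(found)


both = match_all(song_index, ["jude", "beatles"])
either = match_any(song_index, ["yesterday", "nina"])
```

Because title and artist words land in the same keyword list, a single term matches either field, which covers the "title OR artist" case from the question.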
You can find details on solution workflow and code samples in my blog Relation Index Entities with Python for Google Datastore.
http://www.billkatz.com/2009/6/Simple-Full-Text-Search-for-App-Engine

Best full-text search for google-app-engine

Do you know the best full-text search for GAE?
thanks
Read this blog post which details how to add full-text search to App Engine models.
It also details how to make only certain fields searchable, and turn on stemming.
Now we can use the experimental Search API:
The Search API allows your application to perform Google-like searches
over structured data. You can search across several different types of
data (plain text, HTML, atom, numbers, dates, and geographic
locations). Searches return a sorted list of matching text. You can
customize the sorting and presentation of results.
Documentation: https://developers.google.com/appengine/docs/python/search/overview
Early presentation: http://www.google.com/events/io/2011/sessions/full-text-search.html
Google App Engine - Full Text Search
