How do you iterate over StringProperty as a string object? - python

My code is as follows:
class sample(ndb.Model):
text=ndb.StringProperty()
class sample2(webapp2.RequestHandler):
def get(self, item):
q = sample.query()
q1 = query.filter(item in sample.text)
I want to search for a specific word(item) in the text inside sample. How do I go about it? I tried this link, but it doesn't really answer my question- How do I get the value of a StringProperty in Python for Google App Engine?

Unfortunately, you can't do queries like that with the datastore (or likely any other database). Datastore queries are based on indices, and indices are not able to store complicated information, such as whether string has a certain substring.
App Engine also has a Search API that you can use to do more complicated text searching.

First of all, Python 2 is deprecated since 1st of January 2020 and the Google Cloud Platform strongly encourages the migration to newer runtimes. Also the ndb library is no longer recommended and the code relying on it would need to be updated before you could start using the Python 3 runtime. All these reasons makes me suggest you to directly switch before you spent much time on them.
Having said that, you could find the answer in the ndb library documentation about queries and properties. There are also a couple of examples in the Github code samples.
First you need to retrieve the entity with the query() method and then access it's data through python object property notation. More or less it will look like the following:
q = sample.query() # Retrieve sample entity.
substring = 'word' in q.text # Returns True if word is a substring of text

Related

todoist - Fetching the list of projects with python library

I am specifically trying to get the project id given the project name. I saw in the api the api.sync() is supposed to return to me all the projects as in array in a key which I was then planning to iterate through.
I tried using sync with the python library but my projects array is empty, is it some sort of promise mechanism if so how do I wait for success response in python language?
import todoist
api = todoist.TodoistAPI(token)
response = api.sync()
projects = response['projects']
for project in projects:
print(project['name'] + '-' + project['id'])
The python library automatically syncs so the sync method response rarely contains anything useful BUT that information is contained in the api class which has an overridden _find_object method. Therefore you can use the same notation as a dict to find elements - i.e. api['projects'].
I know this is an old question but I'd been having this problem and was struggling to find an answer so hopefully this will be useful to someone at some point!
The library takes care of the Sync for you. In case you already did a Sync in the past, it stores a hash into your $HOME/.todoist-sync/. I recommend you to try to clean this path and try again.

How to use full-text search in sqlite3 database in django?

I am working on a django application with sqlite3 database, that has a fixed database content. By fixed I mean the content of the db won't change over time. The model is something like this:
class QScript(models.Model):
ch_no = models.IntegerField()
v_no = models.IntegerField()
v = models.TextField()
There are around 6500 records in the table. Given a text that may have some words missing, or some words misspelled, I need to determine its ch_no and v_no. For example, if there is a v field in db with text "This is an example verse", a given text like "This an egsample verse" should give me the ch_no and v_no from db. This can be done using Full text search I believe.
My queries are:
can full-text search do this? My guess from what I have studied, it can, as said in sqlite3 page: full-text searches is "what Google, Yahoo, and Bing do with documents placed on the World Wide Web". Cited in SO, I read this article too, along with many others, but didn't find anything that closely matches my requirements.
How to use FTS in django models? I read this but it didn't help. It seems too outdated. Read here that: "...requires direct manipulation of the database to add the full-text index". Searching gives mostly MySQL related info, but I need to do it in sqlite3. So how to do that direct manipulation in sqlite3?
Edit:
Is my choice of sticking to sqlite3 correct? Or should I use something different (like haystack+elasticsearch as said by Alex Morozov)? My db will not grow any larger, and I have studied that for small sized db, sqlite is almost always better (my situation matches the fourth in sqlite's when to use checklist).
SQLite's FTS engine is based on tokens - keywords that the search engine tries to match.
A variety of tokenizers are available, but they are relatively simple. The "simple" tokenizer simply splits up each word and lowercases it: for example, in the string "The quick brown fox jumps over the lazy dog", the word "jumps" would match, but not "jump". The "porter" tokenizer is a bit more advanced, stripping the conjugations of words, so that "jumps" and "jumping" would match, but a typo like "jmups" would not.
In short, the SQLite FTS extension is fairly basic, and isn't meant to compete with, say, Google.
As for Django integration, I don't believe there is any. You will likely need to use Django's interface for raw SQL queries, for both creating and querying the FTS table.
I think that while sqlite is an amazing piece of software, its full-text search capabilities are quite limited. Instead you could index your database using Haystack Django app with some backend like Elasticsearch. Having this setup (and still being able to access your sqlite database) seems to me the most robust and flexible way in terms of FTS.
Elasticsearch has a fuzzy search based on the Levenshtein distance (in a nutshell, it would handle your "egsample" queries). So all you need is to make a right type of query:
from haystack.forms import SearchForm
from haystack.generic_views import SearchView
from haystack import indexes
class QScriptIndex(indexes.SearchIndex, indexes.Indexable):
v = indexes.CharField(document=True)
def get_model(self):
return QScript
class QScriptSearchForm(SearchForm):
text_fuzzy = forms.CharField(required=False)
def search(self):
sqs = super(QScriptSearchForm, self).search()
if not self.is_valid():
return self.no_query_found()
text_fuzzy = self.cleaned_data.get('text_fuzzy')
if text_fuzzy:
sqs = sqs.filter(text__fuzzy=text_fuzzy)
return sqs
class QScriptSearchView(SearchView):
form_class = QScriptSearchForm
Update: As long as PostgreSQL has the Levenshtein distance function, you could also leverage it as the Haystack backend as well as a standalone search engine. If you choose the second way, you'll have to implement a custom query expression, which is relatively easy if you're using a recent version of Django.

Updating a large number of entities in a datastore on Google App Engine

I would like to perform a small operation on all entities of a specific kind and rewrite them to the datastore. I currently have 20,000 entities of this kind but would like a solution that would scale to any amount.
What are my options?
Use a mapper - this is part of the MapReduce framework, but you only want the first component, map, as you don't need the shuffle/reduce step if you're simply mutating datastore entities.
Daniel is correct, but if you don't want to mess up with the mapper, that requires you to add another library to your app you can do it using Task Queues or even simpler using the deferred library that is included since SDK 1.2.3.
20.000 entities it's not that dramatic and I assume that this task is not going be performed in regular basis (but even if it does, it is feasible).
Here is an example using NDB and the deferred library (you can easily do that using DB, but consider switching to NDB anyway if you are not already using it). It's a pretty straight forward way, but without caring much about the timeouts:
def update_model(limit=1000):
more_cursor = None
more = True
while more:
model_dbs, more_cursor, more = Model.query().fetch_page(limit, start_cursor=more_cursor)
for model_db in model_dbs:
model_db.updated = True
ndb.put_multi(model_dbs)
logging.info('### %d entities were updated' % len(model_dbs))
class UpdateModelHandler(webapp2.RequestHandler):
def get(self):
deferred.defer(update_model, _queue='queue')
self.response.headers['Content-Type'] = 'text/html'
self.response.out.write('The task has been started!')

How to use High Replication Datastore

Okay, I have watched the video and read the articles in the App Engine documentation (including Using the High Replication Datastore). However I am still completely confused on the practical usage of it. I understand the benefits (from the video) and they sound great. But what I am lacking is a few practical examples. There are plenty of master/slave examples on the web, but very little illustrating (with proper documentation) the high replication datastore. The guestbook code example used in the Using the High Replication Datastore article illustrates the ancestor key by adding a new functionality that the previous guestbook code example does not have (seems you can change guestbook). This just adds to the confusion.
I often use djangoforms on GAE and I was wondering if someone can help me translate all these queries into high replication datastore compatible queries (let's forget for a moment the discussion that not all queries necessarily need to be high replication datastore compatible queries and focus on the example itself).
UPDATE: with high replication datastore compatible queries I refer to queries that always return the latest data and not potential stale data. Using entity groups seems to be the way to go here but as mentioned before, I don't have many practical code examples of how to do this, so that is what I am looking for!
So the queries in this article are:
The main recurring query in this article is:
query = db.GqlQuery("SELECT * FROM Item ORDER BY name")
which we will translate to:
query = Item.all().order('name') // datastore request
validating the form happens like:
data = ItemForm(data=self.request.POST)
if data.is_valid():
# Save the data, and redirect to the view page
entity = data.save(commit=False)
entity.added_by = users.get_current_user()
entity.put() // datastore request
and getting the latest entry from the datastore for populating a form happens like:
id = int(self.request.get('id'))
item = Item.get(db.Key.from_path('Item', id)) // datastore request
data = ItemForm(data=self.request.POST, instance=item)
So what do I/we need to do to make all these datastore requests compatible with the high replication datastore?
One last thing that is also not clear to me. Using ancestor keys, does this have any impact on the model in datastore. For example, in the guestbook code example they use:
def guestbook_key(guestbook_name=None):
return db.Key.from_path('Guestbook', guestbook_name or 'default_guestbook')
However 'Guestbook' does not exist in the model, so how can you use 'db.Key.from_path' on this and why would this work? Does this change how data is stored in the datastore which I need to keep into account when retrieving the data (e.g. does it add another field I should exclude from showing when using djangoforms)?
Like I said before, this is confusing me a lot and your help is greatly appreciated!
I'm not sure why you think you need to change your queries at all. The documentation that you link to clearly states:
The back end changes, but the datastore API does not change at all. You'll use the same programming interfaces no matter which datastore you're using.
The point of that page is just to say that queries may be out of sync if you don't use entity groups. Your final code snippet is just an example of that - the string 'Guestbook' is exactly an ancestor key. I don't understand why you think it needs to exist in the model. Once again, this is unchanged from the non-HR datastore - it has always been the case that keys are built up from paths, which can consist of arbitrary strings. You probably need to reread the documentation on entity groups and keys.
The changes to use the HRD are not in how queries are made, but in what guarantees are made about what data you get back. The example you give:
query = db.GqlQuery("SELECT * FROM Item ORDER BY name")
will work in the HRD as well. The catch (basically) is that this kind of query (using either this syntax, or the Item.all() form) can return objects slightly out-of-date. This is probably not a big deal with the guestbook.
Note that if you're getting an object by key directly, it will never be out-of-date. It's only for queries that you can see this issue. You can avoid this problem with queries by placing all the entities that need to be consistent in a single entity group. Note that this limits the rate at which you can write to the entity group.
In answer to your follow-up question, "Guestbook" is the name of the entity.

Google Apps Engine Datastore Search

I was wondering if there was any way to search the datastore for a entry. I have a bunch of entries for songs(title, artist,rating) but im not sure how to really search through them for both song title and artist. We take in a search term and are looking for all entries that "match." But we are lost :( any help is much appreciated!
We are using python
edit1: current code is useless, its an exact search but might help you see the issue
query = song.gql("SELECT * FROM song WHERE title = searchTerm OR artist = searchTerm")
The song data you work with sounds as a rather static data set (primarily inserts, no or few updates). In that case there is GAE technique called Relation Index Entity (RIE) which is an efficient way to implement keyword-based search.
But some preparation work required which is briefly:
build special RIE entity where you place all searchable keywords
from each song (one-to-one relationship).
RIE stores them in StringListProperty which supports searches like this:
keywords = 'SearchTerm'
(returns True if any of the values in the list keywords matches 'SearchTerm'`)
AND condition works immediately by adding multipe filters as above
OR condition needs more work by implementing in-memory merge from AND-only queries
You can find details on solution workflow and code samples in my blog Relation Index Entities with Python for Google Datastore.
http://www.billkatz.com/2009/6/Simple-Full-Text-Search-for-App-Engine

Categories

Resources