Google App Engine NDB: How to store document structure? - python

From App Engine NDB documentation:
The NDB API provides persistent storage in a schemaless object
datastore. It supports automatic caching, sophisticated queries, and
atomic transactions. NDB is well-suited to storing structured data
records.
I want to create a structure like the following using NDB, where each instance looks like this:
{
    city: 'SFO'
    date: '2013-01-27'
    data: {
        'keyword1': count1,
        'keyword2': count2,
        'keyword3': count3,
        'keyword4': count4,
        'keyword5': count5,
        ....
    }
}
How can I design such a schema-less entity in Google App Engine (GAE) using NDB?
I am new to GAE and not sure how to achieve this.
Thank you

If you don't need to query on the attributes in data, you can use one of the properties mentioned by @voscausa:
JsonProperty:
import datetime
from google.appengine.ext import ndb
class MyModel(ndb.Model):
    city = ndb.StringProperty()
    date = ndb.DateProperty()
    data = ndb.JsonProperty()
my_model = MyModel(city="somewhere",
                   date=datetime.date.today(),
                   data={'keyword1': 3,
                         'keyword2': 5,
                         'keyword3': 1})
StructuredProperty:
class Data(ndb.Model):
    keyword = ndb.StringProperty()
    count = ndb.IntegerProperty()
class MyModel(ndb.Model):
    city = ndb.StringProperty()
    date = ndb.DateProperty()
    data = ndb.StructuredProperty(Data, repeated=True)
my_model = MyModel(city="somewhere",
                   date=datetime.date.today(),
                   data=[Data(keyword="keyword1", count=3),
                         Data(keyword="keyword2", count=5),
                         Data(keyword="keyword3", count=1)])
my_model.put()
The problem here is filtering on structured properties. The properties of Data are viewed as parallel arrays, so each filter can be satisfied by a different sub-entity. A query such as:
q = MyModel.query(MyModel.data.keyword == 'keyword1',
                  MyModel.data.count > 4)
would incorrectly include my_model, even though keyword1 has a count of only 3.
https://developers.google.com/appengine/docs/python/ndb/queries#filtering_structured_properties
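To see why that query matches, here is a plain-Python simulation of the parallel-array behaviour described above. It is only a sketch and does not use NDB; the helper names matches_parallel and matches_same_subentity are made up for illustration.

```python
# Plain-Python stand-in for the repeated StructuredProperty values of my_model.
data = [
    {"keyword": "keyword1", "count": 3},
    {"keyword": "keyword2", "count": 5},
    {"keyword": "keyword3", "count": 1},
]


def matches_parallel(items):
    """Check each filter against its own column of values (parallel arrays)."""
    keywords = [item["keyword"] for item in items]
    counts = [item["count"] for item in items]
    return "keyword1" in keywords and any(c > 4 for c in counts)


def matches_same_subentity(items):
    """Require both conditions to hold for one and the same sub-entity."""
    return any(item["keyword"] == "keyword1" and item["count"] > 4
               for item in items)


print(matches_parallel(data))        # True: the filters hit different items
print(matches_same_subentity(data))  # False: no single item satisfies both
```

The first function mirrors what the two independent filters do; the second is what one would usually expect the query to mean.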
Using an Expando model would work and allow you to query for keywords:
class MyModel(ndb.Expando):
    city = ndb.StringProperty()
    date = ndb.DateProperty()
m = MyModel(city="Somewhere", date=datetime.date.today())
m.keyword1 = 3
m.keyword2 = 5
m.keyword3 = 1
m.put()
q = MyModel.query(ndb.GenericProperty('keyword1') > 2)
https://developers.google.com/appengine/docs/python/ndb/entities#expando

You can use ndb.JsonProperty to represent a list, a dictionary, or a string in your model. See the documentation for more information.
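As a rough illustration of what a JSON-serialized value looks like, the sketch below round-trips the question's data dictionary with the standard json module. It is plain Python rather than NDB code, and the counts are made-up sample values.

```python
import json

# Sample record shaped like the structure in the question (counts are invented).
record = {
    "city": "SFO",
    "date": "2013-01-27",
    "data": {"keyword1": 3, "keyword2": 5, "keyword3": 1},
}

# Serialize only the nested dictionary, as a JSON-backed property would hold it.
serialized = json.dumps(record["data"], sort_keys=True)
restored = json.loads(serialized)

print(serialized)                  # {"keyword1": 3, "keyword2": 5, "keyword3": 1}
print(restored == record["data"])  # True
```

Because the whole dictionary ends up as a single serialized value, individual keywords are not available as separate fields to filter on, which fits the caveat in the other answer.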

Related

Optimize Django ORM query to get object if a specific related object does not exist

I have the following table structures:
class Library:
    id = models.CharField(...)
    bookcase = models.ForeignKey(
        Bookcase,
        related_name="libraries"
    )
    location = models.ChoiceField(...)
    # Other attributes...
class Bookcase:
    # some attributes
    type = models.ChoiceField(..)
class Book:
    bookcase = models.ForeignKey(
        Bookcase,
        related_name="books"
    )
    title = models.CharField(...)
    status = models.ChoiceField(...)  # borrowed | missing | available
Say I want to get all Library objects that do not have a book with title "Foo" that is NOT missing. How can I optimize this query? I have the following:
libraries = Library.objects.select_related('bookcase').filter(location='NY', bookcase__type='wooden')
libraries_without_book = []
for library in libraries:
    has_non_missing_book = Book.objects.filter(
        bookcase=library.bookcase,
        title="Foo",
    ).exclude(status='missing').exists()
    if not has_non_missing_book:
        libraries_without_book.append(library.id)
Unfortunately, this performs an extra query for every Library object that matches the initial filtering condition. Is there a more optimized method I can use here that makes use of prefetch_related in some way?
Book.objects.filter(~Q(status='missing'), bookcase=library.bookcase, title='Foo')
This query should be sufficient (Q is imported from django.db.models).
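One way to avoid the per-library query is to first collect, in a single pass, the bookcases that do have a non-missing "Foo", and then keep the libraries whose bookcase is not in that set. The sketch below shows the idea on plain Python dictionaries rather than the ORM; the sample rows and the function name libraries_without_book are made up for illustration.

```python
def libraries_without_book(libraries, books, title):
    """Return ids of libraries whose bookcase has no non-missing copy of title."""
    # Single pass over the books: bookcases holding a non-missing copy.
    bookcases_with_book = {
        book["bookcase"]
        for book in books
        if book["title"] == title and book["status"] != "missing"
    }
    # Single pass over the libraries: keep those not in the set.
    return [lib["id"] for lib in libraries
            if lib["bookcase"] not in bookcases_with_book]


# Invented sample rows standing in for database records.
libraries = [
    {"id": "lib-a", "bookcase": 1},
    {"id": "lib-b", "bookcase": 2},
    {"id": "lib-c", "bookcase": 3},
]
books = [
    {"bookcase": 1, "title": "Foo", "status": "available"},
    {"bookcase": 2, "title": "Foo", "status": "missing"},
    {"bookcase": 3, "title": "Bar", "status": "borrowed"},
]

print(libraries_without_book(libraries, books, "Foo"))  # ['lib-b', 'lib-c']
```

In ORM terms this would correspond roughly to one query for the matching bookcases and one for the libraries, instead of one query per library.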

NDB query using filters on Structured property which is also repeated ?

I am creating a sample application that stores user details along with the user's class information.
The model classes being used are:
Model class for saving the user's class data
class MyData(ndb.Model):
    subject = ndb.StringProperty()
    teacher = ndb.StringProperty()
    strength = ndb.IntegerProperty()
    date = ndb.DateTimeProperty()
Model class for user
class MyUser(ndb.Model):
    user_name = ndb.StringProperty()
    email_id = ndb.StringProperty()
    my_data = ndb.StructuredProperty(MyData, repeated=True)
I am able to store data in the datastore and can also run simple queries on the MyUser entity using filters on email_id and user_name.
But when I query MyUser with a filter on a property of the model's structured property, my_data, it does not give the correct result.
I think I am querying incorrectly.
Here is my query function, which queries based on the repeated structured property:
def queryMyUserWithStructuredPropertyFilter():
    shail_users_query = MyUser.query(ndb.AND(MyUser.email_id == "napolean@gmail.com", MyUser.my_data.strength > 30))
    shail_users_list = shail_users_query.fetch(10)
    maindatalist = []
    for each_user in shail_users_list:
        logging.info('NEW QUERY :: The user details are : %s %s' % (each_user.user_name, each_user.email_id))
        # Class data
        myData = each_user.my_data
        for each_my_data in myData:
            templist = [each_my_data.strength, str(each_my_data.date)]
            maindatalist.append(templist)
            logging.info('NEW QUERY :: The class data is : %s %s %s %s' % (each_my_data.subject, each_my_data.teacher, str(each_my_data.strength), str(each_my_data.date)))
    return maindatalist
I want to fetch the entity so that its repeated structured property (my_data) is a list containing only the items with strength > 30.
Please help me understand what I am doing wrong.
Thanks.
Queries over StructuredProperties return entities for which at least one of the sub-entities satisfies the conditions. If you want to filter the sub-entities themselves, you'll have to do it afterwards.
Something like this should do the trick:
def queryMyUserWithStructuredPropertyFilter():
    shail_users_query = MyUser.query(MyUser.email_id == "napolean@gmail.com", MyUser.my_data.strength > 30)
    shail_users_list = shail_users_query.fetch(10)
    # Here, shail_users_list has at most 10 users with email being
    # 'napolean@gmail.com' and at least one element in my_data
    # with strength > 30
    maindatalist = [
        [[data.strength, str(data.date)] for data in user.my_data if data.strength > 30]
        for user in shail_users_list
    ]
    # Now in maindatalist you have ONLY those my_data with strength > 30
    return maindatalist
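Note that the comprehension above yields one inner list per user, while the loop in the question builds a single flat list. A small plain-Python sketch of both shapes, using namedtuple stand-ins (hypothetical, not NDB models) and invented sample values:

```python
from collections import namedtuple

# Stand-ins for the NDB entities; only the fields needed here.
MyData = namedtuple("MyData", "strength date")
MyUser = namedtuple("MyUser", "user_name my_data")

users = [
    MyUser("a", [MyData(20, "2013-01-01"), MyData(40, "2013-01-02")]),
    MyUser("b", [MyData(35, "2013-01-03")]),
]

# One inner list per user, as in the answer's comprehension.
nested = [[[d.strength, d.date] for d in u.my_data if d.strength > 30]
          for u in users]

# One flat list across all users, as in the question's loop.
flat = [[d.strength, d.date]
        for u in users for d in u.my_data if d.strength > 30]

print(nested)  # [[[40, '2013-01-02']], [[35, '2013-01-03']]]
print(flat)    # [[40, '2013-01-02'], [35, '2013-01-03']]
```

Which shape is wanted depends on whether the caller needs to know which user each row came from.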

How to fetch the latest data in GAE Python NDB

I am using GAE Python. I have two root entities:
class X(ndb.Model):
    subject = ndb.StringProperty()
    grade = ndb.StringProperty()
class Y(ndb.Model):
    identifier = ndb.StringProperty()
    name = ndb.StringProperty()
    school = ndb.StringProperty()
    year = ndb.StringProperty()
    result = ndb.StructuredProperty(X, repeated=True)
Since Google stores our data across several data centers, we might not get the most recent data when we run a query like the one below (in case some changes have just been "put"):
def post(self):
    identifier = self.request.get('identifier')
    name = self.request.get('name')
    school = self.request.get('school')
    year = self.request.get('year')
    qry = Y.query(ndb.AND(Y.name == name, Y.school == school, Y.year == year))
    record_list = qry.fetch()
My question: How should I modify the above fetch operation to always get the latest data?
I have gone through the related Google help doc but could not understand how to apply it here.
Based on hints from Isaac's answer, would the following be the solution (would "latest_record_data" contain the latest data of the entity)?
def post(self):
    identifier = self.request.get('identifier')
    name = self.request.get('name')
    school = self.request.get('school')
    year = self.request.get('year')
    qry = Y.query(ndb.AND(Y.name == name, Y.school == school, Y.year == year))
    record_list = qry.fetch()
    record = record_list[0]
    latest_record_data = record.key.get()
There are a couple of ways on App Engine to get strong consistency, most commonly using gets instead of queries, and using ancestor queries.
To use a get in your example, you could encode the name into the entity key:
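class Y(ndb.Model):
    result = ndb.StructuredProperty(X, repeated=True)
def put(name, result):
    # Use the name as the key ID so the entity can be fetched by key.
    Y(key=ndb.Key(Y, name), result=result).put()
def get_records(name):
    # A get by key returns a single entity (or None), not a list.
    record = ndb.Key(Y, name).get()
    return record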
An ancestor query uses similar concepts to do something more powerful. For example, fetching the latest record with a specific name:
import time
class Y(ndb.Model):
    result = ndb.StructuredProperty(X, repeated=True)
    @classmethod
    def put_result(cls, name, result):
        # Don't use integers for last field in key. (one weird trick)
        key = ndb.Key('name', name, cls, str(int(time.time())))
        cls(key=key, result=result).put()
    @classmethod
    def get_latest_result(cls, name):
        qry = cls.query(ancestor=ndb.Key('name', name)).order(-cls.key)
        latest = qry.fetch(1)
        if latest:
            return latest[0]
The "ancestor" is the first pair of the entity's key. As long as you can put a key with at least the first pair into the query, you'll get strong consistency.

Google App Engine: defining custom id and querying

I want to define a custom string as an ID so I created the following Model:
class WikiPage(ndb.Model):
    id = ndb.StringProperty(required=True, indexed=True)
    content = ndb.TextProperty(required=True)
    history = ndb.DateTimeProperty(repeated=True)
Based on this SO thread, I believe this is right.
Now I try to query by this id by:
entity = WikiPage.get_by_id(page) # page is an existing string id, passed in as an arg
This is based on the NDB API.
This however isn't returning anything -- entity is None.
It only works when I run the following query instead:
entity = WikiPage.query(WikiPage.id == page).get()
Am I defining my custom key incorrectly or misusing get_by_id() somehow?
The entity's ID is part of its key, not a model property, so pass it as id= to the constructor and give the property a different name. Example:
class WikiPage(ndb.Model):
    your_id = ndb.StringProperty(required=True)
    content = ndb.TextProperty(required=True)
    history = ndb.DateTimeProperty(repeated=True)
entity = WikiPage(id='hello', your_id='hello', content=...., history=.....)
entity.put()
entity = WikiPage.get_by_id('hello')
or
key = ndb.Key('WikiPage', 'hello')
entity = key.get()
entity = WikiPage.get_by_id(key.id())
and this still works:
entity = WikiPage.query(WikiPage.your_id == 'hello').get()

Accessing fields in model in post procedure in Google App Engine

I have a post(self) handler and I want to add some logic to it that stores lat and lng (computed from Google Maps) in the datastore, as defined in my db model. Should I add them to data, or should I do it some other way, such as with the original class? What is the best way to do this?
So...
class Company(db.Model):
    company_type = db.StringProperty(required=True, choices=["PLC", "LTD", "LLC", "Sole Trader", "Other"])
    company_lat = db.StringProperty(required=True)
    company_lng = db.StringProperty(required=True)
class CompanyForm(djangoforms.ModelForm):
    company_description = forms.CharField(widget=forms.Textarea(attrs={'rows':'2', 'cols':'20'}))
    company_address = forms.CharField(widget=forms.Textarea(attrs={'rows':'2', 'cols':'20'}))
    class Meta:
        model = Company
        # Each excluded field must be its own string in the list.
        exclude = ['company_lat', 'company_lng']
def post(self):
    data = CompanyForm(data=self.request.POST)
    map_url = ''
    address = self.request.get("company_postcode")
    ...
    lat = response['results'][0]['geometry']['location']['lat']
    lng = response['results'][0]['geometry']['location']['lng']
    ...
    # How do I add these fields lat and lng to my data store?
    # Should I add them to data? if this is possible?
    # Or shall I do it some other way?
Thanks
The djangoforms help page explains how to add data to your datastore entity. Call the save method with commit=False. It returns the datastore entity, and you can then set additional fields before saving it with put().
def post(self):
    ...
    # This code is after the code above
    if data.is_valid():
        entity = data.save(commit=False)
        entity.company_lat = lat
        entity.company_lng = lng
        entity.put()
It really depends on the types of queries you intend to do. If you want to perform geospatial queries, GeoModel is built for your use case.
