Where clause in Google App Engine Datastore - python

The model for my Resource class is as follows:
class Resource(ndb.Model):
    name = ndb.StringProperty()
    availability = ndb.StructuredProperty(Availability, repeated=True)
    tags = ndb.StringProperty(repeated=True)
    owner = ndb.StringProperty()
    id = ndb.StringProperty(indexed=True, required=True)
    lastReservedTime = ndb.DateTimeProperty(auto_now_add=False)
    startString = ndb.StringProperty()
    endString = ndb.StringProperty()
I want to extract records where the owner is equal to a certain string.
I have tried the below query. It does not give an error but does not return any result either.
Resource.query(Resource.owner== 'abc#xyz.com').fetch()
As per my understanding if a column has duplicate values it shouldn't be indexed and that is why owner is not indexed. Please correct me if I am wrong.
Can someone help me figure out how to achieve a where clause kind of functionality?
Any help is appreciated! Thanks!

Just tried this. It worked first time. Either you have no Resource entities with an owner of "abc#xyz.com", or the owner property was not indexed when the entities were put (which can happen if you had indexed=False at the time the entities were put).
My test:
Resource(id='1', owner='abc#xyz.com').put()
Resource(id='2', owner='abc#xyz.com').put()
resources = Resource.query(Resource.owner == 'abc#xyz.com').fetch()
assert len(resources) == 2
Also, your comment:
As per my understanding if a column has duplicate values it shouldn't
be indexed and that is why owner is not indexed. Please correct me if
I am wrong.
You're wrong!
Firstly, there is no concept of a 'column' in a datastore model, so I will assume you mean 'property'.
Next, to clarify what you mean by "if a column property has duplicate values":
I assume you mean 'multiple entities created from the same model with the same value for a specific property', in your case 'owner'. This has no effect on indexing, each entity will be indexed as expected.
Or maybe you mean 'a single entity with a property that allows multiple values (ie a list)', which also does not prevent indexing. In this case, the entity will be indexed multiple times, once for each item in the list.
To elaborate further, most properties (i.e. ones that accept primitive types such as string, int, float, etc.) are indexed automatically unless you pass indexed=False to the Property constructor. In fact, the only time you really need to worry about indexing is when you perform more complex queries that filter on more than one property or use inequality filters (and even then, by default, the App Engine dev server will auto-create the indexes for you in your local index.yaml file).
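As a rough mental model of the points above, an index can be pictured as a map from property values to entity keys. The following is a toy sketch in plain Python, not how the datastore is actually implemented; the names ToyIndex, put and query_eq are hypothetical and exist only for this illustration.

```python
from collections import defaultdict


class ToyIndex(object):
    """Toy stand-in for a single-property index (illustration only)."""

    def __init__(self):
        # property value -> set of keys of entities holding that value
        self._rows = defaultdict(set)

    def put(self, key, value, indexed=True):
        # An unindexed value gets no index row, so equality queries never see it.
        if not indexed:
            return
        # A repeated (list) property yields one index row per item.
        values = value if isinstance(value, list) else [value]
        for item in values:
            self._rows[item].add(key)

    def query_eq(self, value):
        # Return the keys of all entities indexed under this value.
        return sorted(self._rows.get(value, set()))


owner_index = ToyIndex()
owner_index.put('1', 'abc#xyz.com')                 # two entities share a value
owner_index.put('2', 'abc#xyz.com')
owner_index.put('3', 'abc#xyz.com', indexed=False)  # written while unindexed

tags_index = ToyIndex()
tags_index.put('1', ['red', 'blue'])                # one row per list item

both_owned = owner_index.query_eq('abc#xyz.com')    # entity '3' is absent
tagged_blue = tags_index.query_eq('blue')
```

In this model, duplicate values simply share an index row, while an entity written without an index row is invisible to the filter, which matches the two explanations offered for the empty result.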
Please read the docs for more detail.
Hope this helps!

Related

GAE python ndb - How to get_by_id with projection?

I'd like to do this.
Content.get_by_id(content_id, projection=['title'])
However, I got an error.
TypeError: Unknown configuration option ('projection')
I suppose I should do something like this, but how?
Content.query(key=Key('Content', content_id)).get(projection=['title'])
Why bother with a projection when getting a single entity? Because Content.body could be large, and I want to reduce datastore read time and instance hours.
If you are using ndb, the below query should work
Content.query(key=Key('Content', content_id)).get(projection=[Content.title])
Note: It gets this data from the query index. So, make sure that index is enabled for the column. Reference https://developers.google.com/appengine/docs/python/ndb/queries#projection
I figured out that the following code works.
Content.query(Content.key == ndb.Key('Content', content_id)).get(projection=['etag'])
I found a hint at https://developers.google.com/appengine/docs/python/ndb/properties
Don't name a property "key." This name is reserved for a special
property used to store the Model key. Though it may work locally, a
property named "key" will prevent deployment to App Engine.
There is a simpler method than the currently posted answers.
As previous answers have mentioned, projections are only for ndb.Queries.
Previous answers suggest running a projection query filtered on the entity's key, in the form of:
<Model>.query(<Model>.key == ndb.Key('<Model>', model_id)).get(projection=['property_1', 'property_2', ...])
However, you can just manipulate the model's _properties directly. (See: https://cloud.google.com/appengine/docs/standard/python/ndb/modelclass#intro_properties)
For example:
desired_properties = ['title', 'tags']
content = Content.get_by_id(content_id)
content._properties = {k: v for k, v in content._properties.iteritems()
                       if k in desired_properties}
print content
This would update the entity properties and only return those properties whose keys are in the desired_properties list.
Not sure if this is the intended functionality behind _properties but it works, and it also prevents the need of generating/maintaining additional indexes for the projection queries.
The only down-side is that this retrieves the entire entity in-memory first. If the entity has arbitrarily large metadata properties that will affect performance, it would be a better idea to use the projection query instead.
Projection is only for query, not get by id. You can put the content.body in a different db model and store only the ndb.Key of it in the Content.

Variable interpolation in python/django, django query filters [duplicate]

Given a class:
from django.db import models
class Person(models.Model):
    name = models.CharField(max_length=20)
Is it possible, and if so how, to have a QuerySet that filters based on dynamic arguments? For example:
# Instead of:
Person.objects.filter(name__startswith='B')
# ... and:
Person.objects.filter(name__endswith='B')
# ... is there some way, given:
filter_by = '{0}__{1}'.format('name', 'startswith')
filter_value = 'B'
# ... that you can run the equivalent of this?
Person.objects.filter(filter_by=filter_value)
# ... which will throw an exception, since `filter_by` is not
# an attribute of `Person`.
Python's argument expansion may be used to solve this problem:
kwargs = {
    '{0}__{1}'.format('name', 'startswith'): 'A',
    '{0}__{1}'.format('name', 'endswith'): 'Z'
}
Person.objects.filter(**kwargs)
This is a very common and useful Python idiom.
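The idiom can be tried without Django at all. In this minimal sketch, build_lookup and fake_filter are hypothetical names; fake_filter only stands in for Person.objects.filter and echoes the keyword arguments it receives.

```python
def build_lookup(field, operator, value):
    """Return a one-item dict such as {'name__startswith': 'B'}."""
    return {'{0}__{1}'.format(field, operator): value}


def fake_filter(**kwargs):
    """Hypothetical stand-in for Person.objects.filter; echoes its kwargs."""
    return kwargs


lookup = build_lookup('name', 'startswith', 'B')

# ** unpacks the dict, so this call is equivalent to
# fake_filter(name__startswith='B').
received = fake_filter(**lookup)
```

With a real model, the same dict would be passed as Person.objects.filter(**lookup).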
A simplified example:
In a Django survey app, I wanted an HTML select list showing registered users. But because we have 5000 registered users, I needed a way to filter that list based on query criteria (such as just people who completed a certain workshop). In order for the survey element to be re-usable, I needed for the person creating the survey question to be able to attach those criteria to that question (don't want to hard-code the query into the app).
The solution I came up with isn't 100% user friendly (requires help from a tech person to create the query) but it does solve the problem. When creating the question, the editor can enter a dictionary into a custom field, e.g.:
{'is_staff':True,'last_name__startswith':'A',}
That string is stored in the database. In the view code, it comes back in as self.question.custom_query . The value of that is a string that looks like a dictionary. We turn it back into a real dictionary with eval() and then stuff it into the queryset with **kwargs:
kwargs = eval(self.question.custom_query)
user_list = User.objects.filter(**kwargs).order_by("last_name")
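A note of caution on eval(): it executes whatever expression the string contains. Since the stored value is only ever a dictionary literal, the standard library's ast.literal_eval may be a safer drop-in. A minimal sketch, where stored_query stands in for self.question.custom_query:

```python
import ast

# Stand-in for the string stored in self.question.custom_query.
stored_query = "{'is_staff':True,'last_name__startswith':'A',}"

# literal_eval accepts only Python literals (dicts, lists, strings, numbers,
# booleans, None) and raises on anything else, such as function calls.
kwargs = ast.literal_eval(stored_query)

try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True
```

The resulting kwargs can then be passed to User.objects.filter(**kwargs) exactly as before.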
To extend the previous answer, and since there were some requests for further code, here is working code that I use with Q. Say a request may or may not filter on fields such as:
publisher_id
date_from
date_until
Those fields can appear in the query, but they may also be missing.
This is how I build filters from those fields for an aggregated query that cannot be filtered further after the initial queryset execution:
# prepare filters to apply to queryset
filters = {}
if publisher_id:
    filters['publisher_id'] = publisher_id
if date_from:
    filters['metric_date__gte'] = date_from
if date_until:
    filters['metric_date__lte'] = date_until
filter_q = Q(**filters)
queryset = Something.objects.filter(filter_q)...
Hope this helps since I've spent quite some time to dig this up.
Edit:
As an additional benefit, you can use lists too. For the previous example, if instead of publisher_id you have a list called publisher_ids, then you could use this piece of code:
if publisher_ids:
    filters['publisher_id__in'] = publisher_ids
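The filter-building step can be pulled into a small function that is testable without Django. This is only a sketch: build_filters is a hypothetical helper, and the date values are plain strings purely for illustration.

```python
def build_filters(publisher_ids=None, date_from=None, date_until=None):
    """Collect only the lookups whose values were actually supplied."""
    filters = {}
    if publisher_ids:
        filters['publisher_id__in'] = list(publisher_ids)
    if date_from:
        filters['metric_date__gte'] = date_from
    if date_until:
        filters['metric_date__lte'] = date_until
    return filters


only_dates = build_filters(date_from='2020-01-01', date_until='2020-01-31')
with_publishers = build_filters(publisher_ids=[3, 7])
nothing = build_filters()
```

The returned dict would then be used as Q(**filters), as in the code above; an empty dict adds no conditions.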
django.db.models.Q is exactly what you want, in a Django way.
This looks much more understandable to me:
kwargs = {
    'name__startswith': 'A',
    'name__endswith': 'Z',
    # (Add more filters here)
}
Person.objects.filter(**kwargs)
A really complex search form usually indicates that a simpler model is trying to dig its way out.
How, exactly, do you expect to get the values for the column name and operation?
Where do you get the values of 'name' and 'startswith'?
filter_by = '%s__%s' % ('name', 'startswith')
A "search" form? You're going to -- what? -- pick the name from a list of names? Pick the operation from a list of operations? While open-ended, most people find this confusing and hard-to-use.
How many columns have such filters? 6? 12? 18?
A few? A complex pick-list doesn't make sense. A few fields and a few if-statements make sense.
A large number? Your model doesn't sound right. It sounds like the "field" is actually a key to a row in another table, not a column.
Specific filter buttons. Wait... That's the way the Django admin works. Specific filters are turned into buttons. And the same analysis as above applies. A few filters make sense. A large number of filters usually means a kind of first normal form violation.
A lot of similar fields often means there should have been more rows and fewer fields.

Unexpected empty results of GQL Query

Surprisingly, after having run a lot of queries without problems, I've run into my first strange GQL problem.
The following are the properties of a model called Feedback:
content date title type votes_count written_by
and the following is configured in index.yaml:
- kind: Feedback
  properties:
  - name: type
  - name: date
    direction: desc
When I queried for all Feedback data, sorted by date, it returns me all the results:
query = GqlQuery("SELECT __key__ FROM Feedback ORDER BY date DESC")
The type property is stored in type = db.IntegerProperty(default=1, required=False, indexed=True), and there are 8 rows of Feedback data with type of integer 1.
However, when I queried:
query = GqlQuery("SELECT __key__ FROM Feedback WHERE type = :1 ORDER BY date DESC", type)
It kept returning me empty results. What has gone wrong ?
Update
def get_keys_by_feedback_type(type_type):
    if type_type == FeedbackCode.ALL:
        query = GqlQuery("SELECT __key__ FROM Feedback ORDER BY date DESC")
    else:
        query = GqlQuery("SELECT __key__ FROM Feedback WHERE type = :1 ORDER BY date DESC", type_type)
    return query
results = Feedback.get_keys_by_feedback_type(int(feedback_type_filter))
for feedback_key in results:
    # iterate the query results
    pass
The index is serving:
Feedback
type ▲ , date ▼ Serving
It was my bad for not describing the problem clearly in the first place. I'm sharing the solution in case somebody else faces the same problem.
The root cause was my insufficient knowledge of App Engine indexes. Earlier, the 'type' property was unindexed because I didn't plan to filter on it until recent requirement changes.
Hence, I made the 'type' property indexed in the model's property definition, as shown in the question. However, the 'type' property remained unindexed on existing records, for the reason explained in Indexing Formerly Unindexed Properties:
If you have existing records created with an unindexed property, that property continues to be unindexed for those records even after you change the entity (class) definition to make the property indexed again. Consequently, those records will not be returned by queries filtering on that property.
So, the solution would be:
To make a formerly unindexed property be indexed:
1. Set indexed=True in the Property constructor:
class Person(db.Model):
    name = db.StringProperty()
    age = db.IntegerProperty(indexed=True)
2. Fetch each record.
3. Put each record. (You may need to change something in the record, such as the timestamp, prior to the put in order for the write to occur.)
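The fetch-and-put steps can be sketched as a batch loop. Everything here is an assumption for illustration: resave_all is a hypothetical helper, and fetch_batch and put_batch are callables you would supply yourself (on App Engine they would presumably wrap a query fetch and a datastore put). In-memory stand-ins are used so that the sketch runs without the SDK.

```python
def resave_all(fetch_batch, put_batch, batch_size=100):
    """Re-put every record so that index rows get written for it.

    fetch_batch(offset, limit) -> list of records (hypothetical callable)
    put_batch(records)         -> writes the records back (hypothetical callable)
    """
    offset = 0
    total = 0
    while True:
        batch = fetch_batch(offset, batch_size)
        if not batch:
            break
        put_batch(batch)
        total += len(batch)
        offset += len(batch)
    return total


# In-memory stand-ins so the sketch runs without App Engine.
records = ['feedback-%d' % n for n in range(250)]
written = []
count = resave_all(lambda offset, limit: records[offset:offset + limit],
                   written.extend)
```

For a large dataset, a loop like this would likely need to run across several requests or in a background task rather than in one go.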
So, there was nothing wrong with the GQL in my question. It was all because the 'type' property remained unindexed on the existing records. Anyway, many thanks to @Adam Crossland for the insights and suggestions.

Designing a scalable product database on Google App Engine

I've built a product database that is divided into 3 parts, and each part has a "sub" part containing labels. But the more I work with it, the more unstable it feels, and each addition I make takes more and more code to get working.
A product is built of parts, and each part is of a type. Each product, part and type has a label. And there's a label for each language.
A product contains parts in two lists: one list of default parts (one of each type) and one of optional parts.
Now I want to add currency in the mix and have come to the decision to re-model the entire way I handle this.
The result I want to get is a list of all product objects that contains the name, description, price, all parts and all types that match the parts. And for these the correct language labels.
Like so:
product
- name
- description (by language)
- price (by currency)
- parts
- part (type name and part name by language)
- partPrice (by currency)
The problem with my current setup is that it is a wild mix of db.ReferenceProperty and db.ListProperty(db.Key).
Getting all the data is a hassle that requires multiple for-loops, dict matching and datastore calls. Well, it's a bit of a mess.
The re-model (untested) looks like this:
class Products(db.Model):
    name = db.StringProperty()
    imageUrl = db.StringProperty()
    optionalParts = db.ListProperty(db.Key)
    defaultParts = db.ListProperty(db.Key)
    active = db.BooleanProperty(default=True)
    @property
    def itemId(self):
        return self.key().id()
class ProductPartTypes(db.Model):
    name = db.StringProperty()
    @property
    def itemId(self):
        return self.key().id()
class ProductParts(db.Model):
    name = db.StringProperty()
    type = db.ReferenceProperty(ProductPartTypes)
    imageUrl = db.StringProperty()
    parts = db.ListProperty(db.Key)
    @property
    def itemId(self):
        return self.key().id()
class Labels(db.Model):
    key = db.StringProperty()  # want to store a key here
    language = db.StringProperty()
    label = db.StringProperty()
class Price(db.Model):
    key = db.StringProperty()  # want to store a key here
    language = db.StringProperty()
    price = db.IntegerProperty()
The major thing here is that I've split the Labels and Price out. So these can contain labels and prices for any products, parts or types.
So what I am curious about is: is this a solid solution from an architectural point of view? Will it hold even if there are thousands of entries in each model?
Also, any tips for retrieving data in a good manner are welcome. My current solution of getting all the data first, for-looping over it and sticking it in dicts works, but feels like it could fail any minute.
..fredrik
You need to keep in mind that App Engine's datastore requires you to rethink your usual way of designing databases. It goes against intuition at first but you must denormalize your data as much as possible if you want your application to be scalable. The datastore has been designed this way.
The approach I usually take is to consider first what kind of queries will need to be done in different use cases, eg. what data do I need to retrieve at the same time ? In what order ? What properties should be indexed ?
If I understand correctly, your main goal is to fetch a list of products with complete details. BTW, if you have other query scenarios - ie. filtering on price, type, etc - you should take them into account too.
In order to fetch all the data you need from only one query, I suggest you create one model which could look like this :
class ProductPart(db.Model):
    product_name = db.StringProperty()
    product_image_url = db.StringProperty()
    product_active = db.BooleanProperty(default=True)
    product_description = db.StringListProperty(indexed=False) # Contains product description in all languages
    part_name = db.StringProperty()
    part_image_url = db.StringProperty()
    part_type = db.StringListProperty(indexed=False) # Contains part type in all languages
    part_label = db.StringListProperty(indexed=False) # Contains part label in all languages
    part_price = db.ListProperty(float, indexed=False) # Contains part price in all currencies
    part_default = db.BooleanProperty()
    part_optional = db.BooleanProperty()
About this solution:
- ListProperties are set to indexed=False in order to avoid exploding indexes if you don't need to filter on them.
- In order to get the right description, label or type, you will have to set list values always in the same order. For example: part_label[0] is English, part_label[1] is Spanish, etc. Same idea for prices and currencies.
- After fetching entities from this model you will have to do some in-memory manipulation in order to get the data nicely structured the way you want, maybe in a new dictionary.
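The positional convention and the in-memory step could look roughly like this. It is a sketch under assumptions: the LANGUAGES and CURRENCIES orderings, the localize helper and the sample values are all made up for illustration, and a plain dict stands in for a fetched ProductPart entity.

```python
# Assumed fixed orderings; every list property must follow them.
LANGUAGES = ('en', 'es')
CURRENCIES = ('USD', 'EUR')


def localize(part, language, currency):
    """Pick the label, type and price for one language and currency."""
    lang = LANGUAGES.index(language)
    cur = CURRENCIES.index(currency)
    return {
        'label': part['part_label'][lang],
        'type': part['part_type'][lang],
        'price': part['part_price'][cur],
    }


# A dict standing in for one fetched ProductPart entity.
part = {
    'part_label': ['Front wheel', 'Rueda delantera'],
    'part_type': ['Wheel', 'Rueda'],
    'part_price': [49.0, 45.0],
}
spanish = localize(part, 'es', 'EUR')
```

Keeping the orderings in one place means a new language or currency only has to be appended to the tuple and to each stored list.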
Obviously, there will be a lot of redundancy in the datastore with such a design - but that's okay, since it allows you to query the datastore in a scalable fashion.
Besides, this is not meant as a replacement for the architecture that you had in mind, but rather an additional Model designed specifically for the user-facing kind of queries that you need to do, ie. retrieving lists of complete product/parts information.
These ProductPart entities could be populated by background tasks, replicating data located in your other normalized entities which would be the authoritative data source. Since you have plenty of data storage on App Engine, this should not be a problem.
IMO your design mostly makes sense. I came up with almost the same design after reading your problem statement, with a few differences:
I had prices on Product and ProductPart, not in a separate table.
The other difference was part types. If there are not many part types, you can simply keep them as a Python list/tuple.
part_types = ('wheel', 'brake', 'mirror')
It also depends on the kind of queries you anticipate. If there are many price-calculation queries (independent of the rest of the product and part info), then it might make sense to design it the way you have done.
You have mentioned that you will get all the data first. Isn't querying possible? If you get the whole data in your app and then sort/filter in python then it would be slow. Which database are you considering? For me mongodb looks like a good option here.
Finally why are you suspicious about even 1000 records? You can run a few tests on your db beforehand.
Bests

AppEngine: Query datastore for records with <missing> value

I created a new property for my db model in the Google App Engine Datastore.
Old:
class Logo(db.Model):
    name = db.StringProperty()
    image = db.BlobProperty()
New:
class Logo(db.Model):
    name = db.StringProperty()
    image = db.BlobProperty()
    is_approved = db.BooleanProperty(default=False)
How do I query for the Logo records which do not have the 'is_approved' value set?
I tried
logos.filter("is_approved = ", None)
but it didn't work.
In the Data Viewer the new field values are displayed as <missing>.
According to the App Engine documentation on Queries and Indexes, there is a distinction between entities that have no value for a property, and those that have a null value for it; and "Entities Without a Filtered Property Are Never Returned by a Query." So it is not possible to write a query for these old records.
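The distinction quoted above can be illustrated with a toy filter over plain dicts. This only models the documented rule and is not datastore code; filter_eq is a hypothetical helper.

```python
def filter_eq(entities, prop, value):
    # Entities that lack the property entirely are never matched,
    # not even when the filter value is None.
    return [e for e in entities if prop in e and e[prop] == value]


logos = [
    {'name': 'old'},                         # saved before is_approved existed
    {'name': 'null', 'is_approved': None},   # property present, value null
    {'name': 'new', 'is_approved': False},
]

null_matches = [e['name'] for e in filter_eq(logos, 'is_approved', None)]
false_matches = [e['name'] for e in filter_eq(logos, 'is_approved', False)]
```

Under this rule the 'old' record never appears in either result, which is why the old Logo entities cannot be selected by a filter on is_approved.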
A useful article is Updating Your Model's Schema, which says that the only currently-supported way to find entities missing some property is to examine all of them. The article has example code showing how to cycle through a large set of entities and update them.
A practice which helps us is to keep a "version" field on every Kind. The version is set to 1 on every record initially. If a need like this comes up (to populate a new or existing field in a large dataset), the version field allows iterating through all the records with "version = 1". For each one, set either a "null" or another initial value on the new field, bump the version to 2, and store the record; this populates the new or existing field with a default value.
The benefit of the "version" field is that the selection process can keep selecting against the lower version number (initially 1) over as many sessions or as much time as needed, until ALL records are updated with the new field's default value.
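The version-bump approach might be sketched as follows, with plain dicts standing in for stored records. The function name migrate_v1_to_v2 and the choice of False as the default are assumptions for illustration; in a real app the loop would run over a query for version = 1 and each record would be put back.

```python
def migrate_v1_to_v2(records, default=False):
    """Give every version-1 record the new field, then bump its version."""
    migrated = 0
    for record in records:
        if record.get('version') != 1:
            continue  # already migrated in an earlier session
        record.setdefault('is_approved', default)
        record['version'] = 2
        migrated += 1
    return migrated


records = [
    {'name': 'a', 'version': 1},
    {'name': 'b', 'version': 2, 'is_approved': True},
]
first_run = migrate_v1_to_v2(records)
second_run = migrate_v1_to_v2(records)  # nothing left at version 1
```

Because the work is selected by version number, the migration can stop and resume at any point without reprocessing records.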
Maybe this has changed, but I am able to filter records based on null fields.
When I try the GQL query SELECT * FROM Contact WHERE demo=NULL, it returns only records for which the demo field is missing.
According to the doc http://code.google.com/appengine/docs/python/datastore/gqlreference.html:
The right-hand side of a comparison can be one of the following (as
appropriate for the property's data type): [...] a Boolean literal, as TRUE or
FALSE; the NULL literal, which represents the null value (None in
Python).
I'm not sure that "null" is the same as "missing" though : in my case, these fields already existed in my model but were not populated on creation. Maybe Federico you could let us know if the NULL query works in your specific case?
