Unexpected empty results of GQL Query - python

Surprisingly, after running a lot of queries without any problem, I've run into my first strange GQL problem.
Following are the properties of a model called Feedback:
content date title type votes_count written_by
and the following index is configured in index.yaml:
- kind: Feedback
  properties:
  - name: type
  - name: date
    direction: desc
When I query for all Feedback data, sorted by date, it returns all the results:
query = GqlQuery("SELECT __key__ FROM Feedback ORDER BY date DESC")
The type property is defined as type = db.IntegerProperty(default=1, required=False, indexed=True), and there are 8 Feedback entities with type set to the integer 1.
However, when I queried:
query = GqlQuery("SELECT __key__ FROM Feedback WHERE type = :1 ORDER BY date DESC", type)
It kept returning empty results. What has gone wrong?
Update
def get_keys_by_feedback_type(type_type):
    if type_type == FeedbackCode.ALL:
        query = GqlQuery("SELECT __key__ FROM Feedback ORDER BY date DESC")
    else:
        query = GqlQuery("SELECT __key__ FROM Feedback WHERE type = :1 ORDER BY date DESC", type_type)
    return query

results = Feedback.get_keys_by_feedback_type(int(feedback_type_filter))
for feedback_key in results:
    # iterate the query results
    pass
The index is serving:
Feedback
type ▲ , date ▼ Serving

My apologies for not describing the problem clearly in the first place. I'm sharing the solution in case somebody else faces the same problem.
The root cause was my insufficient knowledge of App Engine indexes. Earlier, the 'type' property was unindexed because I didn't plan to filter on it until the requirements recently changed.
Hence, I set the 'type' property to indexed in the model's property definition, as shown in the question. However, the existing entities remained unindexed on 'type', for the reason explained in Indexing Formerly Unindexed Properties:
If you have existing records created with an unindexed property, that property continues to be unindexed for those records even after you change the entity (class) definition to make the property indexed again. Consequently, those records will not be returned by queries filtering on that property.
So, the solution is:
To make a formerly unindexed property be indexed
1. Set indexed=True in the Property constructor:
    class Person(db.Model):
        name = db.StringProperty()
        age = db.IntegerProperty(indexed=True)
2. Fetch each record.
3. Put each record. (You may need to change something in the record, such as the timestamp, prior to the put in order for the write to occur.)
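The fetch-and-put steps can be sketched as a small batching helper. This is only a minimal sketch, not the documented procedure: resave_in_batches and its parameters are hypothetical names, and the helper takes the entity iterable and the put function as arguments so that it can run outside App Engine.

```python
def resave_in_batches(entities, put_batch, batch_size=100):
    """Re-put every entity in batches so its index entries get rewritten.

    `entities` is any iterable of entities and `put_batch` is any callable
    that stores a list of them (both are assumptions of this sketch).
    Returns the number of entities written.
    """
    batch = []
    count = 0
    for entity in entities:
        batch.append(entity)
        if len(batch) == batch_size:
            put_batch(batch)
            count += len(batch)
            batch = []
    if batch:  # write the final, partially filled batch
        put_batch(batch)
        count += len(batch)
    return count


# Stand-in run with plain integers instead of datastore entities.
saved_batches = []
total = resave_in_batches(range(250), saved_batches.append, batch_size=100)
print(total, [len(b) for b in saved_batches])  # prints: 250 [100, 100, 50]

# With the legacy db API this might be wired up as (assumption, not run here):
#   from google.appengine.ext import db
#   resave_in_batches(Feedback.all(), db.put)
```

For a large kind this loop may not finish within a single request, so it would likely need to be run in chunks.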
So, there was nothing wrong with the GQL in my question. It was all because the 'type' property remained unindexed on the existing entities. Anyway, many thanks to @Adam Crossland for the insights and suggestions.

Related

I have a date_time field in a DynamoDB table. How can I query only the entries between two specific datetimes?

I'm using boto3. The table name is example_table. I want to get only the entries from a specific hour, according to the date_time field.
So far I've tried this without success:
import datetime
from datetime import timezone

import boto3
from boto3.dynamodb.conditions import Key

def read_from_dynamodb():
    # Epoch-second bounds for the last hour
    now = datetime.datetime.now()
    one_hour_ago = now - datetime.timedelta(hours=1)
    now = int(now.replace(tzinfo=timezone.utc).timestamp())
    one_hour_ago = int(one_hour_ago.replace(tzinfo=timezone.utc).timestamp())
    dynamodb = boto3.resource(
        "dynamodb",
        aws_access_key_id=RnD_Credentials.aws_access_key_id,
        aws_secret_access_key=RnD_Credentials.aws_secret_access_key,
        region_name=RnD_Credentials.region,
    )
    example_table = dynamodb.Table('example_table')
    # Query the 'date_time' index for items in the last hour
    response = example_table.query(
        IndexName='date_time',
        KeyConditionExpression=Key('date_time').between(one_hour_ago, now)
    )
    return response
I'm getting the error:
botocore.exceptions.ClientError: An error occurred (ValidationException) when calling the Query operation: The table does not have the specified index: date_time
Queries that rely on ranges - including between - need the attribute (in your case date_time) to be a sort key in an index (primary or otherwise), and you'll also need to supply the partition key as part of the KeyConditionExpression. You can't query the entire table by sort key unless all items in the table have the same partition key.
If date_time is an attribute outside of the primary key you can add a GSI where date_time is the sort key but you'll still need to supply a partition key too.
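As a rough illustration of a key condition that supplies both the partition key and a sort-key range, the sketch below builds the keyword arguments for a query against a GSI. The index name 'device-date_time-index' and the partition key 'device_id' are hypothetical; only date_time comes from the question. It uses the string form of the expression, which, as far as I know, the boto3 Table resource accepts alongside Key(...) condition objects.

```python
def build_range_query(partition_value, start_ts, end_ts):
    """Build keyword arguments for Table.query() against a GSI.

    Assumed (hypothetical) schema: a GSI named 'device-date_time-index'
    whose partition key is 'device_id' and whose sort key is 'date_time'.
    """
    return {
        "IndexName": "device-date_time-index",
        # Equality on the partition key plus a range on the sort key.
        "KeyConditionExpression": "#pk = :pk AND #ts BETWEEN :start AND :end",
        "ExpressionAttributeNames": {"#pk": "device_id", "#ts": "date_time"},
        "ExpressionAttributeValues": {
            ":pk": partition_value,
            ":start": start_ts,
            ":end": end_ts,
        },
    }


query_kwargs = build_range_query("sensor-1", 1600000000, 1600003600)
# With a boto3 Table resource this would be passed on as:
#   response = example_table.query(**query_kwargs)
```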
Another idea if you're writing the data yourself is to create a new attribute with the start time of the hour for that item's date_time i.e. quantize it, then create a GSI hash key on that new hour attribute. Then you can query that specific hour rather than look for ranges. If you're working with existing data you could scan and refactor your table to add this attribute - not ideal but might be a solution depending on the size of the table and your use case.
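Quantizing to the hour could look like the following minimal sketch, assuming date_time holds epoch seconds as in the question's code. The function name hour_bucket is hypothetical.

```python
from datetime import datetime, timezone

SECONDS_PER_HOUR = 3600


def hour_bucket(epoch_seconds):
    """Return the epoch second at which the UTC hour containing the timestamp starts."""
    return epoch_seconds - (epoch_seconds % SECONDS_PER_HOUR)


# 2021-03-04 15:42:10 UTC falls in the bucket that starts at 15:00:00 UTC.
ts = int(datetime(2021, 3, 4, 15, 42, 10, tzinfo=timezone.utc).timestamp())
bucket = hour_bucket(ts)
print(datetime.fromtimestamp(bucket, tz=timezone.utc).isoformat())
# prints: 2021-03-04T15:00:00+00:00
```

Storing this value in its own attribute at write time would let a query on that attribute fetch one specific hour with an equality condition instead of a range.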
If you're really stuck you could also scan and filter the table instead of query, but that is far less efficient as it will require reading every item every time you execute it.

Peewee alias not working and throwing AttributeError

I have an Order model and a Payment model. The Payment model has a jsonb column, data.
My Query:
orders = (
    Order
    .select(Order, Payment.data.alias('payment_data'))
    .join(Payment, JOIN_LEFT_OUTER, on=(Order.payment==Payment.id))
    .iterator()
)
When I am iterating over the above query, and accessing order.payment_data, I am getting an AttributeError
But if I write the query below, it gives me the payment_data key in the dict while iterating over the orders:
orders = (
    Order
    .select(Order, Payment.data.alias('payment_data'))
    .join(Payment, JOIN_LEFT_OUTER, on=(Order.payment==Payment.id))
    .dicts()
    .iterator()
)
Can someone please explain what I am doing wrong in the first query and how I can access order.payment_data?
Thanks
When I am iterating over the above query, and accessing order.payment_data, I am getting an AttributeError
The payment data is probably getting attached to the related payment instance. So instead of order.payment_data you would look up the value using:
order.payment.payment_data
If you want all attributes simply patched directly onto the order, use the objects() query method, which skips the model/relation graph:
orders = (Order
          .select(Order, Payment.data.alias('payment_data'))
          .join(Payment, JOIN_LEFT_OUTER, on=(Order.payment==Payment.id))
          .objects()  # Do not make object-graph
          .iterator())
for order in orders:
    print(order.id, order.payment_data)
This is all covered in the docs: http://docs.peewee-orm.com/en/latest/peewee/relationships.html#selecting-from-multiple-sources
This could be a result of NULL fields in the joined results. Some records are probably missing payment_data, and peewee doesn't handle this situation the way you expect.
Check whether your query results contain NULLs in place of payment_data. If so, you should probably check whether the order has a payment_data attribute on each iteration.
Here is more detailed explanation on Github: https://github.com/coleifer/peewee/issues/1756#issuecomment-430399189
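The per-iteration check suggested above can be sketched with getattr and a default. The objects below are plain SimpleNamespace stand-ins, not peewee model instances, and payment_data_or_none is a hypothetical helper name.

```python
from types import SimpleNamespace

# Stand-ins for rows yielded by the query. In this sketch the second order
# has no payment, so it never received a payment_data attribute.
orders = [
    SimpleNamespace(id=1, payment_data={"status": "paid"}),
    SimpleNamespace(id=2),
]


def payment_data_or_none(order):
    """Return order.payment_data, or None when the attribute is absent."""
    return getattr(order, "payment_data", None)


for order in orders:
    print(order.id, payment_data_or_none(order))
```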

Where clause in Google App Engine Datastore

The model for my Resource class is as follows:
class Resource(ndb.Model):
    name = ndb.StringProperty()
    availability = ndb.StructuredProperty(Availability, repeated=True)
    tags = ndb.StringProperty(repeated=True)
    owner = ndb.StringProperty()
    id = ndb.StringProperty(indexed=True, required=True)
    lastReservedTime = ndb.DateTimeProperty(auto_now_add=False)
    startString = ndb.StringProperty()
    endString = ndb.StringProperty()
I want to extract records where the owner is equal to a certain string.
I have tried the below query. It does not give an error but does not return any result either.
Resource.query(Resource.owner== 'abc#xyz.com').fetch()
As per my understanding if a column has duplicate values it shouldn't be indexed and that is why owner is not indexed. Please correct me if I am wrong.
Can someone help me figure out how to achieve a where clause kind of functionality?
Any help is appreciated! Thanks!
Just tried this. It worked first time. Either you have no Resource entities with an owner of "abc#xyz.com", or the owner property was not indexed when the entities were put (which can happen if you had indexed=False at the time the entities were put).
My test:
Resource(id='1', owner='abc#xyz.com').put()
Resource(id='2', owner='abc#xyz.com').put()
resources = Resource.query(Resource.owner == 'abc#xyz.com').fetch()
assert len(resources) == 2
Also, your comment:
As per my understanding if a column has duplicate values it shouldn't
be indexed and that is why owner is not indexed. Please correct me if
I am wrong.
You're wrong!
Firstly, there is no concept of a 'column' in a datastore model, so I will assume you mean 'property'.
Next, to clarify what you mean by "if a column has duplicate values":
I assume you mean 'multiple entities created from the same model with the same value for a specific property', in your case 'owner'. This has no effect on indexing; each entity will be indexed as expected.
Or maybe you mean 'a single entity with a property that allows multiple values (i.e. a list)', which also does not prevent indexing. In this case, the entity will be indexed multiple times, once for each item in the list.
To elaborate further, most properties (i.e. ones that accept primitive types such as string, int, float etc.) are indexed automatically, unless you pass indexed=False to the Property constructor. In fact, the only time you really need to worry about indexing is when you need to perform more complex queries, which involve querying against more than one property (and even then, by default, the App Engine dev server will auto-create the indexes for you in your local index.yaml file), or using inequality filters.
Please read the docs for more detail.
Hope this helps!

AppEngine: Query datastore for records with <missing> value

I created a new property for my db model in the Google App Engine Datastore.
Old:
class Logo(db.Model):
    name = db.StringProperty()
    image = db.BlobProperty()
New:
class Logo(db.Model):
    name = db.StringProperty()
    image = db.BlobProperty()
    is_approved = db.BooleanProperty(default=False)
How do I query for the Logo records that do not have the 'is_approved' value set?
I tried
logos.filter("is_approved = ", None)
but it didn't work.
In the Data Viewer the new field values are displayed as <missing>.
According to the App Engine documentation on Queries and Indexes, there is a distinction between entities that have no value for a property, and those that have a null value for it; and "Entities Without a Filtered Property Are Never Returned by a Query." So it is not possible to write a query for these old records.
A useful article is Updating Your Model's Schema, which says that the only currently-supported way to find entities missing some property is to examine all of them. The article has example code showing how to cycle through a large set of entities and update them.
A practice that helps us is to add a "version" field to every Kind. The version is initially set to 1 on every record. If a need like this comes up (populating a new or existing field in a large dataset), the version field lets you iterate over all the records with "version = 1". For each one, set the new field to "null" or another initial value, bump the version to 2, and store the record; this populates the new or existing field with a default value.
The benefit of the "version" field is that the selection process can keep selecting on the lower version number (initially 1) over as many sessions, or as much time, as is needed until ALL records are updated with the new field's default value.
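The version-bump pass described above might look like this minimal sketch. Plain dicts stand in for datastore entities, and migrate_batch and its parameters are hypothetical names. In a real app, the entities passed in would come from a query filtering on "version = 1".

```python
def migrate_batch(entities, save, from_version=1, to_version=2):
    """Give old entities a default for the new field and bump their version.

    `entities` are plain dicts standing in for datastore entities and
    `save` is any callable that stores one of them (assumptions of this sketch).
    Returns the number of entities migrated.
    """
    migrated = 0
    for entity in entities:
        if entity.get("version") != from_version:
            continue  # already migrated in an earlier session
        entity.setdefault("is_approved", False)  # default for the new field
        entity["version"] = to_version
        save(entity)
        migrated += 1
    return migrated


logos = [
    {"name": "old logo", "version": 1},
    {"name": "new logo", "version": 2, "is_approved": True},
]
saved = []
print(migrate_batch(logos, saved.append))  # prints: 1
```

Because entities that were already bumped are skipped, the pass can be interrupted and resumed without touching a record twice.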
Maybe this has changed, but I am able to filter records based on null fields.
When I try the GQL query SELECT * FROM Contact WHERE demo=NULL, it returns only records for which the demo field is missing.
According to the doc http://code.google.com/appengine/docs/python/datastore/gqlreference.html:
The right-hand side of a comparison can be one of the following (as
appropriate for the property's data type): [...] a Boolean literal, as TRUE or
FALSE; the NULL literal, which represents the null value (None in
Python).
I'm not sure that "null" is the same as "missing", though: in my case, these fields already existed in my model but were not populated on creation. Federico, maybe you could let us know whether the NULL query works in your specific case?

Django - SQL Query - Timestamp

Can anyone point me to a tutorial, code, or some other resource that will help me with the following problem?
I have a table in a MySQL database. It contains an ID, a timestamp, another ID and a value. I'm passing it the 'main' ID, which uniquely identifies a piece of data. However, I want to do a time search on this piece of data (using the timestamp field). Ideally I could say: between the hours of 12 and 1, show me all the values logged for ID = 1987.
How would I go about querying this in Django? I know in MySQL it'd be something like less than/greater than, but how would I do this in Django? I've been using objects.filter() for most of my database handling so far. Finally, I'd like to stress that I'm new to Django and I'm genuinely stumped!
If the table in question maps to a Django model MyModel, e.g.
class MyModel(models.Model):
    ...
    primaryid = ...
    timestamp = ...
    secondaryid = ...
    valuefield = ...
then you can use
MyModel.objects.filter(
    primaryid=1987
).exclude(
    timestamp__lt=<min_timestamp>
).exclude(
    timestamp__gt=<max_timestamp>
).values_list('valuefield', flat=True)
This selects entries with the primaryid 1987, with timestamp values between <min_timestamp> and <max_timestamp>, and returns the corresponding values in a list.
Update: Corrected bug in query (filter -> exclude).
I don't think Vinay Sajip's answer is correct. The closest correct variant based on his code is:
MyModel.objects.filter(
    primaryid=1987
).exclude(
    timestamp__lt=min_timestamp
).exclude(
    timestamp__gt=max_timestamp
).values_list('valuefield', flat=True)
That's "exclude the ones less than the minimum timestamp and exclude the ones greater than the maximum timestamp." Alternatively, you can do this:
MyModel.objects.filter(
    primaryid=1987
).filter(
    timestamp__gte=min_timestamp
).exclude(
    timestamp__gte=max_timestamp
).values_list('valuefield', flat=True)
exclude() and filter() are opposites: exclude() omits the identified rows and filter() includes them. You can use a combination of them to include/exclude whichever you prefer. In your case, you want to exclude() those below your minimum time stamp and to exclude() those above your maximum time stamp.
Here is the documentation on chaining QuerySet filters.
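For the "between the hours of 12 and 1" case, the two bounds used in the queries above could be computed as in this minimal sketch. hour_window is a hypothetical helper, the date is made up, and the datetimes are naive, which assumes the project does not use timezone-aware timestamps.

```python
from datetime import date, datetime, time, timedelta


def hour_window(day, hour):
    """Return (start, end) datetimes for the one-hour window starting at `hour`."""
    start = datetime.combine(day, time(hour))
    return start, start + timedelta(hours=1)


min_timestamp, max_timestamp = hour_window(date(2010, 5, 17), 12)
print(min_timestamp, max_timestamp)

# With the MyModel sketch from the answers, a half-open range could then be
# expressed in a single filter() call (not run here):
#   MyModel.objects.filter(primaryid=1987,
#                          timestamp__gte=min_timestamp,
#                          timestamp__lt=max_timestamp)
```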
