I'm trying to query the database with:
fields = "property_1, property_2, ... property_n"
query = "SELECT {0} FROM Table WHERE property_{n+1} = '{1}'".format(fields, property_{n+1})
all_objs = CacheDatastore.fetch(query, refresh=True)
The problem is that the returned list is empty, while a query like
"SELECT * FROM Table WHERE property_{n+1} = '{1}'" returns the full set.
I've created and deployed the necessary indexes, so that is not the cause.
The log says that a Blob key was not found, but none of the properties is anything other than a string, float or int...
It turned out to be a bug in the db library, which is no longer in development, so I'm leaving here the link to the ticket and the comments on it.
GAE allows indexing of static members of the db.Model class hierarchy, but returns 0 results for projection queries where the static member is included among the projected properties.
https://code.google.com/p/google-cloud-platform/issues/detail?id=119
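For illustration, a hedged sketch of one reading of that report, assuming "static member" means a property defined on a base class of the model hierarchy (the model and property names here are hypothetical):

from google.appengine.ext import db

class BaseTable(db.Model):
    property_1 = db.StringProperty()   # defined on a base class ("static member" of the hierarchy)

class Table(BaseTable):
    property_2 = db.StringProperty()

# With a composite index over (property_2, property_1) deployed, the projection query
# reportedly returns [], while the equivalent SELECT * query returns the full set:
projected = db.GqlQuery("SELECT property_1 FROM Table WHERE property_2 = :1", "x").fetch(100)
full = db.GqlQuery("SELECT * FROM Table WHERE property_2 = :1", "x").fetch(100)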
I have the following model for an Oracle database, which is not a part of my Django project:
class ResultsData(models.Model):
    RESULT_DATA_ID = models.IntegerField(primary_key=True, db_column="RESULT_DATA_ID")
    RESULT_XML = models.TextField(blank=True, null=True, db_column="RESULT_XML")

    class Meta:
        managed = False
        db_table = '"schema_name"."results_data"'
The RESULT_XML field in the database itself is declared as an XMLField. I chose to represent it as a TextField in the Django model, since it has no character limit.
When I try to download some data with that model, I get the following error:
DatabaseError: ORA-19011: Character string buffer too small
I figure it is because of the volume of data stored in the RESULT_XML field, since when I try to pull a record with just .values("RESULT_DATA_ID"), it pulls fine.
Any ideas on how I can work around this problem? Googling for answers did not yield anything so far.
UPDATED ANSWER
I have found a much better way of dealing with that issue - I wrote a custom field Transform object, which generates the Oracle SQL query I was after:
OracleTransforms.py
from django.db.models import TextField
from django.db.models.lookups import Transform


class CLOBVAL(Transform):
    '''
    Oracle-specific transform for XMLType field, which returns string data exceeding
    buffer size (ORA-19011: Character string buffer too small) as a character LOB type.
    '''
    function = None
    lookup_name = 'clobval'

    def as_oracle(self, compiler, connection, **extra_context):
        return super().as_sql(
            compiler, connection,
            template='(%(expressions)s).GETCLOBVAL()',
            **extra_context
        )


# Needed for CLOBVAL to work as a .values('field_name__clobval') lookup in Django ORM queries
TextField.register_lookup(CLOBVAL)
With the above, I can now just write a query as follows:
from .OracleTransforms import CLOBVAL
ResultsData.objects.filter(RESULT_DATA_ID=some_id).values('RESULT_DATA_ID', 'RESULT_XML__clobval')
or
ResultsData.objects.filter(RESULT_DATA_ID=some_id).values('RESULT_DATA_ID', XML = CLOBVAL('RESULT_XML'))
This is the best solution for me, as I do get to keep using QuerySet, instead of RawQuerySet.
The only limitation I see with this solution for now is that I always need to specify .values(CLOBVAL('RESULT_XML')) in my ORM queries, or Oracle will report ORA-19011 again, but I guess this is still a good outcome.
OLD ANSWER
So, I have found a way around the problem, thanks to Christopher Jones's suggestion.
ORA-19011 is the error Oracle replies with when the amount of data it would send back as a string exceeds the allowed buffer; it therefore needs to be sent back as a character LOB object instead.
Django does not have direct support for that Oracle-specific method (at least I did not find one), so the answer to the problem was a raw Django query:
query = 'select a.RESULT_DATA_ID, a.RESULT_XML.getClobVal() as RESULT_XML FROM SCHEMA_NAME.RESULTS_DATA a WHERE a.RESULT_DATA_ID=%s'
data = ResultsData.objects.raw(query, [id])
This way, you get back a RawQuerySet, which is the less known, less liked cousin of Django's QuerySet. You can iterate through the result, and RESULT_XML will contain a LOB field, which converts to a string when read.
Handling XML data encoded as a string is awkward, so I also used the xmltodict Python package to get it into a bit more civilized shape.
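As a rough sketch of that step (assuming the raw query above; the LOB-versus-string check is defensive, since the driver may hand back either):

import xmltodict

for row in data:
    raw_xml = row.RESULT_XML
    # cx_Oracle may return a LOB object; read() turns it into a string
    xml_text = raw_xml.read() if hasattr(raw_xml, 'read') else raw_xml
    parsed = xmltodict.parse(xml_text)  # nested dicts are easier to work with than raw XML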
Next, I should probably look for a way to modify Django's getter for the RESULT_XML field only, and have it generate a query to Oracle DB with .getClobVal() method in it, but I will touch on that in a different StackOverflow question: Django - custom getter for 1 field in model
schema.py:
class Test(Document):
    _id = StringField()
    classID = StringField(required=True, unique=True)
    status = StringField()
====================
database.py:
query = schema.Test(_id = id)
query.update(status = "confirm")
Critical error occured. attempt to update a document not yet saved
I can update the DB only if I specify _id = StringField(primary_key=True), but then when I insert new data the _id has to be provided by me instead of being created automatically by MongoDB.
Can anyone help me with a solution?
Thanks!
Inserts and updates are distinct operations in MongoDB:
Insert adds a document to the collection
Update finds a document in the collection given search criteria, then changes this document
If you haven't inserted a document, trying to update it won't do anything, since it will never be found by any search criteria. Your ODM points this out and prevents you from updating a document you haven't saved. Using the driver directly you could issue the update anyway, but it would have no effect.
If you want to add a new document to the database, use inserts. To change documents that are already saved, use updates. To change fields on document instances without saving them, consult your ODM documentation to figure out how to do that instead of attempting to save the documents.
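A minimal sketch based on the schema above, assuming the explicit _id = StringField() line is removed so MongoEngine falls back to its default auto-generated ObjectId primary key:

doc = Test(classID="class-001", status="pending")
doc.save()                     # insert: MongoDB generates the _id automatically
doc.update(status="confirm")   # the update now succeeds because the document exists
doc.reload()                   # refresh the in-memory copy with the stored values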
So I have several tables, one per product per year, with names like:
2020product5, 2019product5, 2018product6 and so on. I have added two custom parameters in Google Data Studio named year and product_id, but I could not use them in the table names themselves. I have used parameterized queries before, but only in conditions like where product_id = #product_id, and that setup only works if all of the data is in the same table, which is not the case here. In Python I would use a string formatter like f"{year}product{product_id}", but that obviously does not work in this case...
Using BigQuery's built-in CONCAT and FORMAT functions does not help, as both throw the following validation error: Table-valued function not found: CONCAT at [1:15]
So how do I get around querying BigQuery tables in Google Data Studio with Python-like string formatting in table names, based on custom parameters?
After much research I (kind of) sorted it out. It turns out that querying schema-level entities such as table names dynamically is a database-level feature, and BigQuery does not support formatting inside a table name, so tables named as in the question (e.g. 2020product5, 2019product5, 2018product6) cannot be queried directly. However, BigQuery does have a _TABLE_SUFFIX pseudo-column, which lets you address tables dynamically, provided the varying part of the name is at the end of the table name. (This is also what enables date-wise table sharding, and many tools that use BigQuery as a data sink rely on it, so if you are using BigQuery as a data sink there is a good chance your original data source is already doing this.) Thus, table names like product52020, product52019, product62018 can be accessed dynamically, and of course from Data Studio too, using the following:
SELECT * FROM `project_salsa_101.dashboards.product*` WHERE _table_Suffix = CONCAT(#product_id,#year)
P.S.: I used Python to write a quick-and-dirty script that loops through the products and years, copies each table into a new one with the suffix moved to the end, and drops the old one. I'm adding the script here (it uses formatted strings) in case it is useful for anyone in a similar situation:
import itertools

from google.cloud import bigquery
from google.oauth2 import service_account

credentials = service_account.Credentials.from_service_account_file(
    'project_salsa_101-bq-admin.json')
project_id = 'project_salsa_101'
schema = 'dashboards'

client = bigquery.Client(credentials=credentials, project=project_id)

# product_ids and years are assumed to be defined elsewhere, e.g. product_ids = [5, 6]; years = [2018, 2019, 2020]
for product_id, year in itertools.product(product_ids, years):
    # Read the old table into a dataframe
    df = client.query(f"""
        SELECT * FROM `{project_id}.{schema}.{year}product{product_id}`
    """).result().to_dataframe()
    # Write it back under a name with the varying part at the end
    df.to_gbq(project_id=project_id,
              destination_table=f'{schema}.product{product_id}{year}',
              credentials=service_account.Credentials.from_service_account_file(
                  'credentials.json'),
              if_exists='replace')
    # Drop the old table
    client.query(f"""
        DROP TABLE `{project_id}.{schema}.{year}product{product_id}`""").result()
I'm trying to use the document store of MySQL 8 in my Python project (Python 3.8). The version of MySQL Connector/Python is 8.0.20. According to the API reference and the X DevAPI User Guide, I tried to get the auto-increment document ID after adding a document into the DB. Each time, the data is inserted into the DB successfully, but '-1' is returned when get_autoincrement_value() is invoked.
My code is just like below:
try:
    schema = session.get_schema('my_schema')
    collection = schema.get_collection('my_collection')
    topic_dict = protobuf_to_dict(topic)
    doc_id = collection.add(topic_dict).execute().get_autoincrement_value()
    logger.debug('doc_id: {}', doc_id)
    return doc_id
except Exception as e:
    logger.exception("failed to add topic to db, topic: {}, err: {}", topic, e)
Is there anything wrong with my usage? Thank you all~
Seems like you are interested in the document id that has been auto-generated. If that is the case, you should instead use get_generated_ids:
doc_id = collection.add(topic_dict).execute().get_generated_ids()[0]
In this case, the method returns a list of all the ids that were generated in the scope of the add() operation.
The documentation is probably not clear enough, but get_autoincrement_value() only contains useful data if you are inserting a row with either session.sql() or table.insert() on a table containing an AUTO_INCREMENT column. It has no meaning in the scope of NoSQL collections, because in the end a collection is just a table created like this (condensed version):
CREATE TABLE collection (
  `doc` json DEFAULT NULL,
  `_id` varbinary(32),
  PRIMARY KEY (`_id`)
)
Which means there isn't anything to "auto increment".
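To make the contrast concrete, here is a hedged sketch (the connection details, schema and table names are placeholders): get_generated_ids() applies to collection adds, while get_autoincrement_value() only makes sense for SQL inserts into a table with an AUTO_INCREMENT column:

import mysqlx

session = mysqlx.get_session('mysqlx://user:password@localhost:33060')

# Document collection: the server generates the _id values
collection = session.get_schema('my_schema').get_collection('my_collection')
result = collection.add({'title': 'hello'}).execute()
print(result.get_generated_ids())          # list of generated _id strings

# SQL table with an AUTO_INCREMENT column: here get_autoincrement_value() is meaningful
sql_result = session.sql("INSERT INTO my_schema.my_table (name) VALUES ('hello')").execute()
print(sql_result.get_autoincrement_value())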
Disclaimer: I'm the lead developer of the MySQL X DevAPI Connector for Node.js
I created a new property for my db model in the Google App Engine Datastore.
Old:
class Logo(db.Model):
    name = db.StringProperty()
    image = db.BlobProperty()
New:
class Logo(db.Model):
    name = db.StringProperty()
    image = db.BlobProperty()
    is_approved = db.BooleanProperty(default=False)
How do I query for the Logo records which do not have the 'is_approved' value set?
I tried
logos.filter("is_approved = ", None)
but it didn't work.
In the Data Viewer the new field values are displayed as .
According to the App Engine documentation on Queries and Indexes, there is a distinction between entities that have no value for a property, and those that have a null value for it; and "Entities Without a Filtered Property Are Never Returned by a Query." So it is not possible to write a query for these old records.
A useful article is Updating Your Model's Schema, which says that the only currently-supported way to find entities missing some property is to examine all of them. The article has example code showing how to cycle through a large set of entities and update them.
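A minimal sketch of that approach with the old db API, with batching and cursor handling omitted for brevity:

for logo in Logo.all():
    logo.put()   # re-saving writes is_approved=False (the default) for entities that never had the property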
A practice which helps us is to assign a "version" field on every Kind. This version is initially set to 1 on every record. If a need like this comes up (to populate a new or existing field across a large dataset), the version field allows iterating through all the records with "version = 1": for each one, set the new field to "null" or another initial value, bump the version to 2, and store the record. This populates the new or existing field with a default value.
The benefit of the "version" field is that the selection process can keep selecting on that lower version number (initially 1) over as many sessions, or as much time, as is needed until ALL records are updated with the new field's default value.
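A hedged sketch of that pattern (it assumes the version property has been on the model since the records were first written; the names mirror the Logo example above):

class Logo(db.Model):
    name = db.StringProperty()
    image = db.BlobProperty()
    is_approved = db.BooleanProperty(default=False)
    version = db.IntegerProperty(default=1)

# Migrate every record still at version 1, then bump it so reruns skip it
for logo in Logo.all().filter("version =", 1):
    logo.is_approved = False
    logo.version = 2
    logo.put()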
Maybe this has changed, but I am able to filter records based on null fields.
When I try the GQL query SELECT * FROM Contact WHERE demo=NULL, it returns only records for which the demo field is missing.
According to the doc http://code.google.com/appengine/docs/python/datastore/gqlreference.html:
The right-hand side of a comparison can be one of the following (as appropriate for the property's data type): [...] a Boolean literal, as TRUE or FALSE; the NULL literal, which represents the null value (None in Python).
I'm not sure that "null" is the same as "missing", though: in my case, these fields already existed in my model but were not populated on creation. Maybe, Federico, you could let us know if the NULL query works in your specific case?