SQLAlchemy: lazy='raise' complaining about fields even though they are loaded

We use the async version of SQLAlchemy, so we need to eager load every relationship (lazy loading does not work with async). Accordingly, we have set lazy='raise' on every relationship in our models. The problem is that it seems overly aggressive about raising errors. Consider the following unit test:
async def test_user_group_self_allowed(self):
    privilege = await self.db.get(Privilege, 1, [joinedload(Privilege.role)])
    options = [joinedload(Item.privileges).joinedload(Privilege.role), joinedload(Item.item_group)]
    item = await self.db.get(Item, 1, options)
    item.privileges.append(privilege)
    await (self.db.commit())
    options = [joinedload(User.user_groups).joinedload(UserGroup.privileges), joinedload(User.privileges)]
    user = await self.db.get(User, 1, options)
    user.privileges = []
    item = await self.db.get(Item, 1, [joinedload(Item.privileges).joinedload(Privilege.role), joinedload(Item.item_group)])
    user_group = await self.db.get(UserGroup, 1, [joinedload(UserGroup.organization)])
    print('why?????', user_group.organization)
    self.assertTrue(await self.helper.is_authorized(self.db, user, 'edit', user_group))
Notice the print; it results in the following error:
Traceback (most recent call last):
File "/usr/lib/python3.9/unittest/async_case.py", line 65, in _callTestMethod
self._callMaybeAsync(method)
File "/usr/lib/python3.9/unittest/async_case.py", line 88, in _callMaybeAsync
return self._asyncioTestLoop.run_until_complete(fut)
File "/usr/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
return future.result()
File "/usr/lib/python3.9/unittest/async_case.py", line 102, in _asyncioLoopRunner
ret = await awaitable
File "/src/backend-core/backend_core/tests/authorization.py", line 206, in test_user_group_self_allowed
print('why?????', user_group.organization)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/attributes.py", line 481, in __get__
return self.impl.get(state, dict_)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/attributes.py", line 926, in get
value = self._fire_loader_callables(state, key, passive)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/attributes.py", line 962, in _fire_loader_callables
return self.callable_(state, passive)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/strategies.py", line 836, in _load_for_state
self._invoke_raise_load(state, passive, "raise")
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/strategies.py", line 795, in _invoke_raise_load
raise sa_exc.InvalidRequestError(
sqlalchemy.exc.InvalidRequestError: 'UserGroup.organization' is not available due to lazy='raise'
As you can see it complains about the organization not being eager loaded while I clearly include it in the options with a joinedload. Now we can make this error go away by changing the options for the user query to:
options = [joinedload(User.user_groups).joinedload(UserGroup.privileges), joinedload(User.privileges), joinedload(User.user_groups).joinedload(UserGroup.organization)]
(the same options as before, only we add a joinedload for User -> UserGroups -> Organization)
This makes the error go away and everything is fine again. Now my question is: why does it complain about this to start with? I access user_group.organization, not user.user_groups[x].organization. I don't know exactly how these queries work under the hood, but not only do I have to write far too many joinedloads this way, I think it also results in needless querying.

As it turns out, .get caches more than I expected: not only the main entity (in this case the UserGroup) but also anything loaded through a joinedload (user.user_groups and everything below it). Here the UserGroup with id 1 had already been loaded, without its organization, by the earlier User query via joinedload(User.user_groups). The later db.get(UserGroup, 1, ...) therefore returned that cached object and never applied the new joinedload(UserGroup.organization). In other words, a direct get does not overwrite an entity that is already cached. However, it is possible to do e.g. db.get(Organization, 1, populate_existing=True), which retrieves the entity again and updates the cache.
From the docs:
If the given primary key identifier is present in the local identity map, the object is returned directly from this collection and no SQL is emitted, unless the object has been marked fully expired....
...
populate_existing – causes the method to unconditionally emit a SQL query and refresh the object with the newly loaded data, regardless of whether or not the object is already present.
As the docs say, another way is to expire the object; the SQLAlchemy documentation on refreshing and expiring objects covers that in more detail.
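To make the caching behaviour concrete, here is a deliberately simplified toy model of an identity map in plain Python. It is not SQLAlchemy code: ToySession, its eager argument, and the dict-based "entities" are hypothetical stand-ins, and only the populate_existing name mirrors the real parameter quoted from the docs above. It only sketches why a second get with extra eager-load options can hand back an object that still lacks the relationship.

```python
class ToySession:
    """Toy stand-in for a session with an identity map (not SQLAlchemy code)."""

    def __init__(self, rows):
        self.rows = rows          # pretend database table: {pk: {column: value}}
        self.identity_map = {}    # pk -> object already loaded in this session
        self.queries = 0          # how many times the "database" was hit

    def get(self, pk, eager=(), populate_existing=False):
        cached = self.identity_map.get(pk)
        if cached is not None and not populate_existing:
            # Identity-map hit: no query is emitted, so the requested
            # eager loads are never applied to the cached object.
            return cached
        self.queries += 1
        obj = cached if cached is not None else {}
        obj["id"] = pk
        for name in eager:
            obj[name] = self.rows[pk][name]
        self.identity_map[pk] = obj
        return obj


session = ToySession({1: {"organization": "acme"}})

first = session.get(1)                            # loaded without organization
second = session.get(1, eager=["organization"])   # cache hit, nothing is loaded
missing_after_cache_hit = "organization" not in second

# Forcing a reload refreshes the same cached object in place.
third = session.get(1, eager=["organization"], populate_existing=True)
```

In the real test case the first, relationship-less load happened implicitly through joinedload(User.user_groups), which is why it was easy to miss.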

Related

Neptune InternalFailureException: Can not get the attachable from the host vertex

I am using Neptune's graph database with Gremlin queries through Python to store addresses. Most of the queries execute fine, but when I try the following query, Neptune returns an internal failure exception:
(g.V(address).outE('isPartOf').inV().
    dedup().as_('groupNode').
    inE('isPartOf').outV().dedup().as_('children').
    addE('isPartOf').to(group).
    select('groupNode').drop().
    fold().
    coalesce(__.unfold(),
             g.V(address).addE('isPartOf').to(group)).next())
Every address can belong to a group. When the address is already assigned to a group, I try to take all addresses assigned to that group and assign them to a new group, while deleting the old group. If the address is not yet assigned to a group, I simply want to assign it to the new group immediately.
If I run this query on its own, everything executes perfectly (although it is a bit slow). However, once I execute it in parallel for more addresses, it fails with the following error:
Traceback (most recent call last):
File "/usr/lib64/python2.7/threading.py", line 804, in __bootstrap_inner
self.run()
File "gremlinExample.py", line 30, in run
processTx(self.tx, self.g, self.parentBlock)
File "gremlinExample.py", line 152, in processTx
g.V(address).outE('isPartOf').inV().dedup().as_('groupNode').inE('isPartOf').outV().dedup().as_('children').select('children').addE('isPartOf').to(group).select('groupNode').drop().fold().coalesce(__.unfold(), g.V(address).addE('isPartOf').to(group)).next()
File "/home/ec2-user/.local/lib/python2.7/site-packages/gremlin_python/process/traversal.py", line 70, in next
return self.__next__()
File "/home/ec2-user/.local/lib/python2.7/site-packages/gremlin_python/process/traversal.py", line 43, in __next__
self.traversal_strategies.apply_strategies(self)
File "/home/ec2-user/.local/lib/python2.7/site-packages/gremlin_python/process/traversal.py", line 346, in apply_strategies
traversal_strategy.apply(traversal)
File "/home/ec2-user/.local/lib/python2.7/site-packages/gremlin_python/driver/remote_connection.py", line 143, in apply
remote_traversal = self.remote_connection.submit(traversal.bytecode)
File "/home/ec2-user/.local/lib/python2.7/site-packages/gremlin_python/driver/driver_remote_connection.py", line 54, in submit
results = result_set.all().result()
File "/usr/lib/python2.7/site-packages/concurrent/futures/_base.py", line 405, in result
return self.__get_result()
File "/usr/lib/python2.7/site-packages/concurrent/futures/_base.py", line 357, in __get_result
raise type(self._exception), self._exception, self._traceback
GremlinServerError: 500: {"requestId":"a42015b7-6b22-4bd1-9e7d-e3252e8f3ab6","code":"InternalFailureException","detailedMessage":"Can not get the attachable from the host vertex: v[64b32957-ef71-be47-c8d7-0109cfc4d9fd]-/->neptunegraph[org.apache.commons.configuration.PropertiesConfiguration#6db0f02]"}
To my knowledge, execution in parallel shouldn't be the problem, since every query simply gets queued at the database (exactly for this reason I tried to create a query which executes the whole task at once).
Apologies for any bad English; it is not my native language.

For anyone else looking for an update here: the OP was able to resolve the issue by replacing .next() with .iterate(). Some follow-ups were needed to understand the query and data better, but the OP has abandoned the project and moved to another solution.

ndb.get_multi returning AssertionError

Over the last 48 hours or so my small python GAE app has started getting AssertionErrors from ndb.get_multi calls.
Full traceback is appended, and the errors are being generated on the production server in _BaseValue's __init__ on line 734 of /base/data/.../ndb/model.py, and the failing assertion is b_val is not None with message "Cannot wrap None"
The error doesn't appear to be related to a particular entity or entities, but I've only seen it with one entity type so far (yet to test others).
The get_multi call is for only up to a dozen keys, and the error is intermittent so that repeating it will sometimes succeed. Or not...
I'm not seeing this error via remote shell, but I note that my local install is 1.9.23 while the log entry says the production server is 1.9.25 (GoogleAppEngineLauncher says my local install is up to date)
I'm adding a workaround to catch the exception and iterate through the keys to get them individually but I'm still seeing an upstream warning about a "suspended generator get" on line 744 of context.py.
The warning appears on the first get of this entity type from the list, for at least 2 different lists of keys (as well as preceding the AssertionError).
I don't want to have to wrap all get_multi calls in this way.
What's going on?
TRACEBACK:
Cannot wrap None
Traceback (most recent call last):
File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 570, in dispatch
return method(*args, **kwargs)
File "/base/data/home/apps/s~thegapnetball/115.386356111937586421/handlers/assess.py", line 50, in get
rs = ndb.get_multi(t.players)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 3905, in get_multi
for future in get_multi_async(keys, **ctx_options)]
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/tasklets.py", line 326, in get_result
self.check_success()
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/tasklets.py", line 372, in _help_tasklet_along
value = gen.send(val)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/context.py", line 751, in get
pbs = entity._to_pb(set_key=False).SerializePartialToString()
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 3147, in _to_pb
prop._serialize(self, pb, projection=self._projection)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 2379, in _serialize
projection=projection)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 1405, in _serialize
values = self._get_base_value_unwrapped_as_list(entity)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 1175, in _get_base_value_unwrapped_as_list
wrapped = self._get_base_value(entity)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 1163, in _get_base_value
return self._apply_to_values(entity, self._opt_call_to_base_type)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 1335, in _apply_to_values
value[:] = map(function, value)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 1217, in _opt_call_to_base_type
value = _BaseValue(self._call_to_base_type(value))
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 734, in __init__
assert b_val is not None, "Cannot wrap None"
AssertionError: Cannot wrap None

Tim Hoffman and Patrick Costello put me on the right track to solve this.
I incremented the app version to isolate some changes, but they took longer to finish than I expected.
One change added a repeated StructuredProperty to a model derived from ndb.Model, and I put several entities with the extra property (about 30 out of 1100 total).
The previous version without the extra property was still the default and was being lightly used, so the entities became just inconsistent enough to produce the intermittent AssertionError.
The main lesson is to take note of the recommendations in Google's schema update article, particularly changing the underlying parent to Expando and/or disabling datastore edits until any migration is complete.
https://cloud.google.com/appengine/articles/update_schema
The fix was to add the property to the previous version, get all the entities and then put them.
Thanks Tim and Patrick for the pointer!

Putting models in a callback function from ctypes library

I am trying to set up an application on Google App Engine using the Managed VM feature.
I am using a shared library written in C++, loaded through ctypes:
cdll.LoadLibrary('./mylib.so')
which registers a callback function:
CB_FUNC_TYPE = CFUNCTYPE(None, eSubscriptionType)
cbFuncType = CB_FUNC_TYPE(scrptCallbackHandler)
in which I want to save data to the ndb datastore:
def scrptCallbackHandler(arg):
    model = Model(name=str(arg.data))
    model.put()
I am registering a callback function in which I want to take the data from the C++ program and put it in the ndb datastore. This results in an error. The devserver behaves slightly differently, so here is the output from a production server:
suspended generator _put_tasklet(context.py:343) raised BadRequestError(Application Id (app) format is invalid: '_')
LOG 2 1429698464071045 suspended generator put(context.py:810) raised BadRequestError(Application Id (app) format is invalid: '_')
Traceback (most recent call last):
File "_ctypes/callbacks.c", line 314, in 'calling callback function'
File "/home/vmagent/app/isw_cloud_client.py", line 343, in scrptCallbackHandler
node.put()
File "/home/vmagent/python_vm_runtime/google/appengine/ext/ndb/model.py", line 3380, in _put
return self._put_async(**ctx_options).get_result()
File "/home/vmagent/python_vm_runtime/google/appengine/ext/ndb/tasklets.py", line 325, in get_result
self.check_success()
File "/home/vmagent/python_vm_runtime/google/appengine/ext/ndb/tasklets.py", line 368, in _help_tasklet_along
value = gen.throw(exc.__class__, exc, tb)
File "/home/vmagent/python_vm_runtime/google/appengine/ext/ndb/context.py", line 810, in put
key = yield self._put_batcher.add(entity, options)
File "/home/vmagent/python_vm_runtime/google/appengine/ext/ndb/tasklets.py", line 368, in _help_tasklet_along
value = gen.throw(exc.__class__, exc, tb)
File "/home/vmagent/python_vm_runtime/google/appengine/ext/ndb/context.py", line 343, in _put_tasklet
keys = yield self._conn.async_put(options, datastore_entities)
File "/home/vmagent/python_vm_runtime/google/appengine/ext/ndb/tasklets.py", line 454, in _on_rpc_completion
result = rpc.get_result()
File "/home/vmagent/python_vm_runtime/google/appengine/api/apiproxy_stub_map.py", line 613, in get_result
return self.__get_result_hook(self)
File "/home/vmagent/python_vm_runtime/google/appengine/datastore/datastore_rpc.py", line 1827, in __put_hook
self.check_rpc_success(rpc)
File "/home/vmagent/python_vm_runtime/google/appengine/datastore/datastore_rpc.py", line 1342, in check_rpc_success
raise _ToDatastoreError(err)
google.appengine.api.datastore_errors.BadRequestError: Application Id (app) format is invalid: '_'
The start of the C++ program is triggered by a call to a Request handler but runs in the background and accepts incoming data which should be processed in the callback.
Update: As Tim pointed out already it seems that the context of the wsgi handler is lost. Most likely the solution here would be to create the application context somehow.

I am only guessing at what my problem was, but I want to describe what I did to solve it.
The execution context of the callback functions is somewhat different from the rest of the Python application. Any asynchronous operation in the callback fails. I tried making an HTTP call and saving to the datastore; the operations never finish, and after 60s the application reports that they crashed. I guess this is because of how Python manages the execution and the corresponding memory allocation.
I was able to execute the callback in an object's context by wrapping it in a closure within a class. This wasn't really the problem, but the solution can be found in this answer: How can I get methods to work as callbacks with python ctypes?
For my solution I am now using a combination of cloud-endpoints on another module and background threads on the ctypes module.
Within the C callback I start a background thread, which is able to do asynchronous work:
# Start a background thread using the background thread service from GAE
background_thread.start_new_background_thread(putData, [name, value])
And here the simple task it executes:
# Here I call my cloud-endpoints
def putData(name, value):
    body = {
        'name': name,
        'value': int(value)
    }
    res = service.objects().create(body=body).execute()
Of course I still need error handling and some additional work, but for me this is a good solution.
Note: Adding models to the datastore in the background thread failed because the environment in the background thread differs from the application's and the app id was not set.
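The pattern described above (a callback bound to an object, which only hands work off to another thread) can be sketched with the standard library alone. This is a minimal illustration under assumptions, not the code from the answer: CallbackBridge and its attributes are hypothetical names, a plain threading.Thread stands in for App Engine's background thread service, and the callback is invoked directly from Python where the C++ library would normally call it.

```python
import ctypes
import queue
import threading

# Callback signature: void callback(int)
CB_FUNC_TYPE = ctypes.CFUNCTYPE(None, ctypes.c_int)


class CallbackBridge:
    """Hypothetical helper: exposes a bound method as a C callback."""

    def __init__(self):
        self.pending = queue.Queue()
        self.processed = []
        # Keep a reference on the instance so the callback object is not
        # garbage collected while native code may still call it.
        self.c_callback = CB_FUNC_TYPE(self._on_event)

    def _on_event(self, value):
        # Do as little as possible inside the callback: just hand off.
        self.pending.put(value)

    def drain(self):
        # Runs in a worker thread; the slow work (HTTP call, datastore
        # write, ...) would go here instead of the doubling below.
        while True:
            value = self.pending.get()
            if value is None:  # sentinel: stop the worker
                break
            self.processed.append(value * 2)


bridge = CallbackBridge()
worker = threading.Thread(target=bridge.drain)
worker.start()

for v in (1, 2, 3):
    bridge.c_callback(v)  # stands in for the C++ library firing the callback

bridge.pending.put(None)
worker.join()
```

In a real setup bridge.c_callback would be passed to the library's registration function instead of being called from Python.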

Error when fetch Python NDB with repeated integer property

I have an app engine Python NDB Model that looks like this:
class Car(ndb.Model):
    name = ndb.StringProperty()
    tags = ndb.IntegerProperty(repeated=True)
When I fetch a Car by key, I use:
car = ndb.Key('Car', long(6079586488025088)).get()
When I do this, I see:
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/key.py", line 532, in get
return self.get_async(**ctx_options).get_result()
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/tasklets.py", line 325, in get_result
self.check_success()
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/tasklets.py", line 371, in _help_tasklet_along
value = gen.send(val)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/context.py", line 689, in get
pbs = entity._to_pb(set_key=False).SerializePartialToString()
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 3052, in _to_pb
prop._serialize(self, pb, projection=self._projection)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 1365, in _serialize
values = self._get_base_value_unwrapped_as_list(entity)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 1135, in _get_base_value_unwrapped_as_list
wrapped = self._get_base_value(entity)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 1123, in _get_base_value
return self._apply_to_values(entity, self._opt_call_to_base_type)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 1295, in _apply_to_values
value[:] = map(function, value)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 1177, in _opt_call_to_base_type
value = _BaseValue(self._call_to_base_type(value))
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 1198, in _call_to_base_type
return call(value)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 1274, in call
newvalue = method(self, value)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 1536, in _validate
(value,))
BadValueError: Expected integer, got None
If I remove that property from the model definition, the get returns fine, so I know it's this property. In the datastore the field is listed as having a null value. Any idea why this is happening and how to deal with it? Thanks!

This generally occurs when you first have a single, non-repeated property and then convert it to a repeated property. When you initially do a put(), if you have not yet set the property it will fill the value with None. However, if you then turn it into a repeated property, ndb will read this and think you want [None]. Because None is not a valid IntegerProperty, trying to serialize and put() the data will fail.
In your example it fails on a get() because after doing a get() from the datastore it tries to serialize the data and put it in memcache.
Depending on your situation, you have a couple of options:
1. If you are only running in the dev server, clear your datastore by running dev_appserver.py --clear_datastore
2. Search for all objects with a None value and replace them with the empty list. This might look something like this:
    for c in Car.query(Car.tags == None):
        c.tags = []
        c.put()
   Note that you have to be careful about a few things here. First, make sure that c.tags is only [None] and not [a, b, c, None], just in case. Second, if you have a lot of Cars with no tags, you won't be able to fix them all in the same request. You'll either want to run on a backend, or pass the data on to Tasks for processing.
3. This is pretty similar to #2, but if you have very little data you could use the Datastore viewer and simply resave the entities with tags = None.
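The caution above about only resetting values that are exactly [None] can be captured in a small predicate. This is a sketch in plain Python, with no ndb calls; is_only_none and cleaned_tags are hypothetical helper names, and in a real migration the result would be assigned back to the entity before calling put().

```python
def is_only_none(tags):
    """Return True when a repeated value is exactly the legacy [None]."""
    return tags == [None]


def cleaned_tags(tags):
    """Return [] for the legacy [None] value; leave anything else untouched."""
    return [] if is_only_none(tags) else tags


# A list that merely contains None is left alone so it can be inspected by hand.
samples = [[None], [], [1, 2], [1, None]]
results = [cleaned_tags(tags) for tags in samples]
```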

changing django default model settings

I'm just starting with the Django "create your own app" tutorial (creating a Poll). I'm deviating slightly, as I want to create an app using my own database model that already exists.
And in the tutorial it says
Table names are automatically generated by combining the name of the app (polls) and the lowercase name of the model -- poll and choice. (You can override this behavior.)
Primary keys (IDs) are added automatically. (You can override this, too.)
By convention, Django appends "_id" to the foreign key field name. Yes, you can override this, as well.
But I can't see where it mentions how to override this behaviour. I've defined my model like so:
from django.db import models
# Create your models here.
class Channels(models.Model):
    channelid = models.IntegerField()
    channelid.primary_key = True
    channelname = models.CharField(max_length=50)
Now when I go into the shell, this is what I get:
>>> from tvlistings.models import *
>>> Channels.objects.all()
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "C:\Python26\lib\site-packages\django\db\models\query.py", line 67, in __repr__
data = list(self[:REPR_OUTPUT_SIZE + 1])
File "C:\Python26\lib\site-packages\django\db\models\query.py", line 82, in __len__
self._result_cache.extend(list(self._iter))
File "C:\Python26\lib\site-packages\django\db\models\query.py", line 271, in iterator
for row in compiler.results_iter():
File "C:\Python26\lib\site-packages\django\db\models\sql\compiler.py", line 677, in results_iter
for rows in self.execute_sql(MULTI):
File "C:\Python26\lib\site-packages\django\db\models\sql\compiler.py", line 732, in execute_sql
cursor.execute(sql, params)
File "C:\Python26\lib\site-packages\django\db\backends\util.py", line 15, in execute
return self.cursor.execute(sql, params)
File "C:\Python26\lib\site-packages\django\db\backends\mysql\base.py", line 86, in execute
return self.cursor.execute(query, args)
File "C:\Python26\lib\site-packages\MySQLdb\cursors.py", line 166, in execute
self.errorhandler(self, exc, value)
File "C:\Python26\lib\site-packages\MySQLdb\connections.py", line 35, in defaulterrorhandler
raise errorclass, errorvalue
DatabaseError: (1146, "Table 'tvlistings.tvlistings_channels' doesn't exist")
Obviously it can't find the table tvlistings_channels as it's actually called channels. So how do I change the default?

You can override model behavior in Django largely through the use of an inner Meta class:
db_table lets you set the table name.
Specifying another field as the primary key (primary_key=True on the field itself, not in the Meta class) makes Django use that field rather than a surrogate key.

You should try and work your way all through the tutorial before you try and customise things. All these things are covered in the actual documentation, but it's best to have a basic understanding of things first before diving into that.
FWIW, here are the docs on defining your own primary key and specifying a table name. But really, do the tutorial as written first.
