I have an app engine Python NDB Model that looks like this:
class Car(ndb.Model):
    name = ndb.StringProperty()
    tags = ndb.IntegerProperty(repeated=True)
when I go to fetch a Car by key I use:
car = ndb.Key('Car', long(6079586488025088)).get()
when I do this I am seeing:
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/key.py", line 532, in get
return self.get_async(**ctx_options).get_result()
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/tasklets.py", line 325, in get_result
self.check_success()
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/tasklets.py", line 371, in _help_tasklet_along
value = gen.send(val)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/context.py", line 689, in get
pbs = entity._to_pb(set_key=False).SerializePartialToString()
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 3052, in _to_pb
prop._serialize(self, pb, projection=self._projection)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 1365, in _serialize
values = self._get_base_value_unwrapped_as_list(entity)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 1135, in _get_base_value_unwrapped_as_list
wrapped = self._get_base_value(entity)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 1123, in _get_base_value
return self._apply_to_values(entity, self._opt_call_to_base_type)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 1295, in _apply_to_values
value[:] = map(function, value)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 1177, in _opt_call_to_base_type
value = _BaseValue(self._call_to_base_type(value))
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 1198, in _call_to_base_type
return call(value)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 1274, in call
newvalue = method(self, value)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 1536, in _validate
(value,))
BadValueError: Expected integer, got None
if I remove that property from the model definition it returns fine, so I know it's this property. In the datastore it is listed as having a null value for that field. Any idea why this is happening and how to deal with it? Thanks!
This generally occurs when you first have a single, non-repeated property and then convert it to a repeated property. When you initially do a put(), if you have not yet set the property it will fill the value with None. However, if you then turn it into a repeated property, ndb will read this back and think you want [None]. Because None is not a valid IntegerProperty value, trying to serialize and put() the data will fail.
In your example it fails on a get() because after doing a get() from the datastore it tries to serialize the data and put it in memcache.
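To make the failure mode concrete, here is a minimal sketch (my own illustration, not code from the question) of how the stored None turns into [None]:

# Old definition: a single, non-repeated property.
class Car(ndb.Model):
    name = ndb.StringProperty()
    tags = ndb.IntegerProperty()

Car(name='first').put()  # tags was never set, so None is stored for it

# New definition: the same property, now repeated.
class Car(ndb.Model):
    name = ndb.StringProperty()
    tags = ndb.IntegerProperty(repeated=True)

# A later get() reads the stored None back as [None]; when ndb re-serializes
# the entity (for example to write it to memcache), IntegerProperty rejects
# the None element and raises BadValueError.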
Depending on your situation, you have a couple of options:
1. If you are only running in the dev appserver, clear your datastore by running dev_appserver.py --clear_datastore
2. Do a search for all objects with a None value and replace them with the empty list. This might look something like this:
for c in Car.query(Car.tags == None):
    c.tags = []
    c.put()
Note that you have to be careful about a few things here. First, make sure that c.tags is only [None] and not [a, b, c, None], just in case. Second, if you have a lot of Cars with no tags, you won't be able to fix them all in a single request. You'll either want to run on a backend, or pass the data on to tasks for processing (see the sketch after this list).
3. This is pretty similar to #2, but if you have very little data you could use the Datastore viewer and simply re-save the entities with tags = None.
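If you do need to spread the fix-up over tasks, a rough sketch using the deferred library and query cursors might look like this (the batch size, function name and defensive check are my own choices, not part of the original answer):

from google.appengine.datastore.datastore_query import Cursor
from google.appengine.ext import deferred, ndb

def fix_tags(cursor_urlsafe=None, batch_size=100):
    query = Car.query(Car.tags == None)
    if cursor_urlsafe:
        cursor = Cursor(urlsafe=cursor_urlsafe)
        cars, next_cursor, more = query.fetch_page(batch_size, start_cursor=cursor)
    else:
        cars, next_cursor, more = query.fetch_page(batch_size)
    for car in cars:
        if car.tags == [None]:  # only clear values that are exactly [None]
            car.tags = []
    if cars:
        ndb.put_multi(cars)
    if more and next_cursor:
        # Hand the rest of the work to a new task so no single request
        # has to process every Car.
        deferred.defer(fix_tags, cursor_urlsafe=next_cursor.urlsafe())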
Related
We use the async version of SQLAlchemy and we need to eager-load every relationship (lazy loading does not work with async sessions). Accordingly, for every relationship in our models we have set lazy='raise'. The problem is that it seems overly aggressive in raising errors. Consider the following unit test:
async def test_user_group_self_allowed(self):
    privilege = await self.db.get(Privilege, 1, [joinedload(Privilege.role)])
    options = [joinedload(Item.privileges).joinedload(Privilege.role), joinedload(Item.item_group)]
    item = await self.db.get(Item, 1, options)
    item.privileges.append(privilege)
    await self.db.commit()
    options = [joinedload(User.user_groups).joinedload(UserGroup.privileges), joinedload(User.privileges)]
    user = await self.db.get(User, 1, options)
    user.privileges = []
    item = await self.db.get(Item, 1, [joinedload(Item.privileges).joinedload(Privilege.role), joinedload(Item.item_group)])
    user_group = await self.db.get(UserGroup, 1, [joinedload(UserGroup.organization)])
    print('why?????', user_group.organization)
    self.assertTrue(await self.helper.is_authorized(self.db, user, 'edit', user_group))
Notice the print, it results in the following error:
Traceback (most recent call last):
File "/usr/lib/python3.9/unittest/async_case.py", line 65, in _callTestMethod
self._callMaybeAsync(method)
File "/usr/lib/python3.9/unittest/async_case.py", line 88, in _callMaybeAsync
return self._asyncioTestLoop.run_until_complete(fut)
File "/usr/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
return future.result()
File "/usr/lib/python3.9/unittest/async_case.py", line 102, in _asyncioLoopRunner
ret = await awaitable
File "/src/backend-core/backend_core/tests/authorization.py", line 206, in test_user_group_self_allowed
print('why?????', user_group.organization)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/attributes.py", line 481, in __get__
return self.impl.get(state, dict_)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/attributes.py", line 926, in get
value = self._fire_loader_callables(state, key, passive)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/attributes.py", line 962, in _fire_loader_callables
return self.callable_(state, passive)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/strategies.py", line 836, in _load_for_state
self._invoke_raise_load(state, passive, "raise")
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/strategies.py", line 795, in _invoke_raise_load
raise sa_exc.InvalidRequestError(
sqlalchemy.exc.InvalidRequestError: 'UserGroup.organization' is not available due to lazy='raise'
As you can see, it complains about the organization not being eager-loaded even though I clearly include it in the options with a joinedload. We can make this error go away by changing the options for the user query to:
options = [joinedload(User.user_groups).joinedload(UserGroup.privileges), joinedload(User.privileges), joinedload(User.user_groups).joinedload(UserGroup.organization)]
(the same options as before, only we add a joinedload for User -> UserGroups -> Organization)
This makes the error go away and everything is fine again. Now my question is: why does it complain about this to start with? I access user_group.organization, not user.user_groups[x].organization. I don't know exactly how these queries work under the hood, but not only do I have to write far too many joinedloads this way, I think it also results in needless querying.
As it turns out, .get() caches more than I would expect: not only the main entity (in this case the UserGroup) but also everything loaded along the way through a joinedload (user.user_groups[x].organization). That means a direct get on the Organization does not overwrite the one already cached via user.user_groups[x].organization or user_group.organization. However, it is possible to do e.g. db.get(Organization, 1, populate_existing=True), which will retrieve the entity again and update the cache.
From the docs:
If the given primary key identifier is present in the local identity map, the object is returned directly from this collection and no SQL is emitted, unless the object has been marked fully expired....
...
populate_existing – causes the method to unconditionally emit a SQL query and refresh the object with the newly loaded data, regardless of whether or not the object is already present.
As this suggests, another way is to expire the object; the SQLAlchemy documentation on refreshing and expiring covers that in more detail.
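For completeness, a minimal sketch of the difference, assuming a plain SQLAlchemy 1.4+ AsyncSession (the self.db wrapper in the question presumably forwards these arguments to session.get()):

from sqlalchemy.orm import joinedload

# If UserGroup 1 is already in the identity map, this returns the cached
# instance without emitting SQL, so the joinedload option never runs and
# .organization stays unloaded (hence lazy='raise' fires).
user_group = await session.get(
    UserGroup, 1, options=[joinedload(UserGroup.organization)])

# populate_existing=True forces a fresh SELECT and refreshes the cached
# instance, so the joinedload is actually applied.
user_group = await session.get(
    UserGroup, 1,
    options=[joinedload(UserGroup.organization)],
    populate_existing=True)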
I'm new to using Gremlin + Neptune. I'm writing a small Python program to clean up some edges + nodes in a Neptune DB. To start, I want to delete all nodes + edges rooted at a particular node.
After looking around, I came across How to delete all child nodes when parent node deleted using the gremlin query?, and I tried the following:
g.V().has("id", <insert id>).union(__(), __.repeat(__.out()).emit()).drop()
However, the node with "id" still exists in the DB. When I try running the above command straight on the python3 console, I get the following output:
>>> g.V().has("id", <insert id>).union(__(), __.repeat(__.out()).emit()).drop()
[['V'], ['has', 'id', '5c4266a3a44649ddb24ce6cf4f831300'], ['union', <gremlin_python.process.graph_traversal.__ object at 0x7fa51c967fd0>, [['repeat', [['out']]], ['emit']]], ['drop']]
What's going on here? It almost looks like it's listing out the steps it plans to take, but not executing them.
Edit: from the answer below, I added .iterate() to the end. I now get this error:
Traceback (most recent call last):
...
tg.g.V().has("id", <id>).union(__(), __.repeat(__.out()).emit()).drop().iterate()
File "/home/vagrant/.cache/bazel/_bazel_vagrant/f9910d98673307a31f928c448bd4acd0/execroot/project/bazel-out/k8-fastbuild/bin/path/to/directory/python_code.runfiles/pip_deps/pypi__gremlinpython/gremlin_python/process/traversal.py", line 66, in iterate
try: self.nextTraverser()
File "/home/vagrant/.cache/bazel/_bazel_vagrant/f9910d98673307a31f928c448bd4acd0/execroot/project/bazel-out/k8-fastbuild/bin/path/to/directory/python_code.runfiles/pip_deps/pypi__gremlinpython/gremlin_python/process/traversal.py", line 71, in nextTraverser
self.traversal_strategies.apply_strategies(self)
File "/home/vagrant/.cache/bazel/_bazel_vagrant/f9910d98673307a31f928c448bd4acd0/execroot/project/bazel-out/k8-fastbuild/bin/path/to/directory/python_code.runfiles/pip_deps/pypi__gremlinpython/gremlin_python/process/traversal.py", line 573, in apply_strategies
traversal_strategy.apply(traversal)
File "/home/vagrant/.cache/bazel/_bazel_vagrant/f9910d98673307a31f928c448bd4acd0/execroot/project/bazel-out/k8-fastbuild/bin/path/to/directory/python_code.runfiles/pip_deps/pypi__gremlinpython/gremlin_python/driver/remote_connection.py", line 149, in apply
remote_traversal = self.remote_connection.submit(traversal.bytecode)
File "/home/vagrant/.cache/bazel/_bazel_vagrant/f9910d98673307a31f928c448bd4acd0/execroot/project/bazel-out/k8-fastbuild/bin/path/to/directory/python_code.runfiles/pip_deps/pypi__gremlinpython/gremlin_python/driver/driver_remote_connection.py", line 55, in submit
result_set = self._client.submit(bytecode)
File "/home/vagrant/.cache/bazel/_bazel_vagrant/f9910d98673307a31f928c448bd4acd0/execroot/project/bazel-out/k8-fastbuild/bin/path/to/directory/python_code.runfiles/pip_deps/pypi__gremlinpython/gremlin_python/driver/client.py", line 111, in submit
return self.submitAsync(message, bindings=bindings).result()
File "/usr/lib/python3.7/concurrent/futures/_base.py", line 428, in result
return self.__get_result()
File "/usr/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
File "/home/vagrant/.cache/bazel/_bazel_vagrant/f9910d98673307a31f928c448bd4acd0/execroot/project/bazel-out/k8-fastbuild/bin/path/to/directory/python_code.runfiles/pip_deps/pypi__gremlinpython/gremlin_python/driver/connection.py", line 66, in cb
f.result()
File "/usr/lib/python3.7/concurrent/futures/_base.py", line 428, in result
return self.__get_result()
File "/usr/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
File "/usr/lib/python3.7/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/home/vagrant/.cache/bazel/_bazel_vagrant/f9910d98673307a31f928c448bd4acd0/execroot/project/bazel-out/k8-fastbuild/bin/path/to/directory/python_code.runfiles/pip_deps/pypi__gremlinpython/gremlin_python/driver/protocol.py", line 74, in write
request_id, request_message)
File "/home/vagrant/.cache/bazel/_bazel_vagrant/f9910d98673307a31f928c448bd4acd0/execroot/project/bazel-out/k8-fastbuild/bin/path/to/directory/python_code.runfiles/pip_deps/pypi__gremlinpython/gremlin_python/driver/serializer.py", line 132, in serialize_message
message = self.build_message(request_id, processor, op, args)
File "/home/vagrant/.cache/bazel/_bazel_vagrant/f9910d98673307a31f928c448bd4acd0/execroot/project/bazel-out/k8-fastbuild/bin/path/to/directory/python_code.runfiles/pip_deps/pypi__gremlinpython/gremlin_python/driver/serializer.py", line 142, in build_message
return self.finalize_message(message, b"\x21", self.version)
File "/home/vagrant/.cache/bazel/_bazel_vagrant/f9910d98673307a31f928c448bd4acd0/execroot/project/bazel-out/k8-fastbuild/bin/path/to/directory/python_code.runfiles/pip_deps/pypi__gremlinpython/gremlin_python/driver/serializer.py", line 145, in finalize_message
message = json.dumps(message)
File "/usr/lib/python3.7/json/__init__.py", line 231, in dumps
return _default_encoder.encode(obj)
File "/usr/lib/python3.7/json/encoder.py", line 199, in encode
chunks = self.iterencode(o, _one_shot=True)
File "/usr/lib/python3.7/json/encoder.py", line 257, in iterencode
return _iterencode(o, 0)
File "/usr/lib/python3.7/json/encoder.py", line 179, in default
raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type __ is not JSON serializable
Unlike the Gremlin Console, the Python console doesn't automatically iterate your traversals for you. As in your application code, you must iterate them yourself. When you do
g.V().has("id", <insert id>).union(__.identity(), __.repeat(__.out()).emit()).drop()
that simply spawns a Traversal object but does not execute it. You must therefore iterate it in some way to exhaust the elements within it; in your case, the appropriate terminator to use is iterate(). Note also that the anonymous start of the union() is written as __.identity() rather than __(): calling __() just creates an instance of the __ class, which gremlinpython cannot serialize into a request, and that is what produces the "Object of type __ is not JSON serializable" error in your edit. The corrected traversal is:
g.V().has("id", <insert id>).union(__.identity(), __.repeat(__.out()).emit()).drop().iterate()
The semantics around drops in TinkerPop aren't always consistent, unfortunately. TinkerPop tried to preserve flexibility for providers in how they implement drops, but that sometimes causes confusion because a query will work fine in TinkerGraph and then behave slightly differently when executed on a different provider. If the above approach only drops the root, you could try to realize the results before the drop:
g.V().has("id", <insert id>).
union(__.identity(),
__.repeat(__.out()).emit()).
fold().unfold().
drop()
It looks a bit clumsy, but it forces all the vertices you wish to drop to be gathered into a list before they are dropped. That way you won't kill the repeat() by dropping the parent and its edges first, leaving the child paths untraversed.
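For reference, an end-to-end gremlinpython version might look roughly like this (the endpoint is a placeholder, and this assumes the value really lives in a property named "id" rather than the vertex T.id):

from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.process.graph_traversal import __

# Placeholder Neptune endpoint.
conn = DriverRemoteConnection('wss://your-neptune-endpoint:8182/gremlin', 'g')
g = traversal().withRemote(conn)

# Drop the root vertex plus everything reachable from it via outgoing edges,
# realizing the set first so dropping the root doesn't cut the repeat() short.
(g.V().has('id', '5c4266a3a44649ddb24ce6cf4f831300')
  .union(__.identity(), __.repeat(__.out()).emit())
  .fold().unfold()
  .drop()
  .iterate())

conn.close()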
I want my deferred task to only try once more after failing.
After reading this related question: Specifying retry limit for tasks queued using GAE deferred library, it was obvious that I needed to follow the accepted answer, so I modified my code to look like this:
from google.appengine.ext import deferred
deferred.defer(MyFunction, DATA, _retry_options={'task_retry_limit': 1})
Now I get this error:
File "/usr/local/google-cloud-sdk/platform/google_appengine/google/appengine/ext/deferred/deferred.py", line 269, in defer
return task.add(queue, transactional=transactional)
File "/usr/local/google-cloud-sdk/platform/google_appengine/google/appengine/api/taskqueue/taskqueue.py", line 1143, in add
return self.add_async(queue_name, transactional).get_result()
File "/usr/local/google-cloud-sdk/platform/google_appengine/google/appengine/api/taskqueue/taskqueue.py", line 1139, in add_async
return Queue(queue_name).add_async(self, transactional, rpc)
File "/usr/local/google-cloud-sdk/platform/google_appengine/google/appengine/api/taskqueue/taskqueue.py", line 1889, in add_async
rpc)
File "/usr/local/google-cloud-sdk/platform/google_appengine/google/appengine/api/taskqueue/taskqueue.py", line 2008, in __AddTasks
fill_request(task, request.add_add_request(), transactional)
File "/usr/local/google-cloud-sdk/platform/google_appengine/google/appengine/api/taskqueue/taskqueue.py", line 2093, in __FillAddPushTasksRequest
task.retry_options, task_request.mutable_retry_parameters())
File "/usr/local/google-cloud-sdk/platform/google_appengine/google/appengine/api/taskqueue/taskqueue.py", line 2033, in __FillTaskQueueRetryParameters
if retry_options.min_backoff_seconds is not None:
AttributeError: 'dict' object has no attribute 'min_backoff_seconds'
Obviously I'm making a silly mistake; I just can't figure out what it is.
You need to pass a TaskRetryOptions instance, not just a dict:
from google.appengine.ext import deferred
from google.appengine.api.taskqueue import TaskRetryOptions
options = TaskRetryOptions(task_retry_limit=1)
deferred.defer(MyFunction, DATA, _retry_options=options)
I get this error when trying to use only() together with select_related().
FieldError: Invalid field name(s) given in select_related: 'userinfo'. Choices are: userinfo
It's a little strange that it reports the field I'm trying to select as an error. Here is my query:
users_with_schools = User.objects.select_related('userinfo').only(
"id",
"date_joined",
"userinfo__last_coordinates_id",
"userinfo__school_id"
).filter(
userinfo__school_id__isnull=False,
date_joined__gte=start_date
)
I've been able to use select_related with only in other places in my code so I am not sure why this is happening.
Edit: Here is the full traceback
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "env/lib/python2.7/site-packages/django/db/models/query.py", line 138, in __repr__
data = list(self[:REPR_OUTPUT_SIZE + 1])
File "env/lib/python2.7/site-packages/django/db/models/query.py", line 162, in __iter__
self._fetch_all()
File "env/lib/python2.7/site-packages/django/db/models/query.py", line 965, in _fetch_all
self._result_cache = list(self.iterator())
File "env/lib/python2.7/site-packages/django/db/models/query.py", line 238, in iterator
results = compiler.execute_sql()
File "env/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 818, in execute_sql
sql, params = self.as_sql()
File "env/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 367, in as_sql
extra_select, order_by, group_by = self.pre_sql_setup()
File "env/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 48, in pre_sql_setup
self.setup_query()
File "env/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 39, in setup_query
self.select, self.klass_info, self.annotation_col_map = self.get_select()
File "env/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 203, in get_select
related_klass_infos = self.get_related_selections(select)
File "env/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 743, in get_related_selections
', '.join(_get_field_choices()) or '(none)',
FieldError: Invalid field name(s) given in select_related: 'userinfo'. Choices are: userinfo
From the documentation:
All of the cautions in the note for the defer() documentation apply to only() as well. Use it cautiously and only after exhausting your other options.
...
Using only() and omitting a field requested using select_related() is an error as well.
select_related will try to fetch all the columns for userinfo. As described above, trying to restrict the set of columns to specific ones will result in an error - the combination of select_related and only does not support that.
It's probably worth noting the comment that goes with these methods:
The defer() method (and its cousin, only(), below) are only for advanced use-cases. They provide an optimization for when you have analyzed your queries closely and understand exactly what information you need and have measured that the difference between returning the fields you need and the full set of fields for the model will be significant.
Edit: you mention in a comment below that the following query, used elsewhere in your code, seems to work fine:
ChatUser.objects.select_related("user__userinfo").only(
    "id", "chat_id", "user__id", "user__username", "user__userinfo__id"
)
My best guess is that you are hitting this bug in Django (fixed in 1.10).
I suppose the easiest way to verify is to check the SQL query generated by the queryset that seems to work. My guess is that you will find that it isn't actually querying everything in one go and that there are additional queries when you try to access the related model that you asked to be fetched in select_related.
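A quick, hedged way to check this (Django only records connection.queries when DEBUG=True; the queryset below is the one from your comment):

from django.db import connection, reset_queries

qs = ChatUser.objects.select_related("user__userinfo").only(
    "id", "chat_id", "user__id", "user__username", "user__userinfo__id"
)
print(qs.query)  # the SELECT Django plans to run

reset_queries()
chat_user = qs.first()
_ = chat_user.user.userinfo  # touch the related objects
print(len(connection.queries))  # more than 1 means extra queries were issued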
Over the last 48 hours or so my small Python GAE app has started getting AssertionErrors from ndb.get_multi calls.
The full traceback is appended below. The errors are being generated on the production server in _BaseValue's __init__ on line 734 of /base/data/.../ndb/model.py, and the failing assertion is b_val is not None, with the message "Cannot wrap None".
The error doesn't appear to be related to a particular entity or entities, but I've only seen it with one entity type so far (yet to test others).
The get_multi call is for only up to a dozen keys, and the error is intermittent so that repeating it will sometimes succeed. Or not...
I'm not seeing this error via the remote shell, but I note that my local install is 1.9.23 while the log entry says the production server is 1.9.25 (GoogleAppEngineLauncher says my local install is up to date).
I'm adding a workaround to catch the exception and iterate through the keys to get them individually, but I'm still seeing an upstream warning about a "suspended generator get" on line 744 of context.py.
The warning appears on the first get of this entity type from the list, for at least 2 different lists of keys (as well as preceding the AssertionError).
I don't want to have to wrap all get_multi calls in this way.
What's going on?
TRACEBACK:
Cannot wrap None
Traceback (most recent call last):
File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 570, in dispatch
return method(*args, **kwargs)
File "/base/data/home/apps/s~thegapnetball/115.386356111937586421/handlers/assess.py", line 50, in get
rs = ndb.get_multi(t.players)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 3905, in get_multi
for future in get_multi_async(keys, **ctx_options)]
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/tasklets.py", line 326, in get_result
self.check_success()
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/tasklets.py", line 372, in _help_tasklet_along
value = gen.send(val)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/context.py", line 751, in get
pbs = entity._to_pb(set_key=False).SerializePartialToString()
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 3147, in _to_pb
prop._serialize(self, pb, projection=self._projection)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 2379, in _serialize
projection=projection)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 1405, in _serialize
values = self._get_base_value_unwrapped_as_list(entity)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 1175, in _get_base_value_unwrapped_as_list
wrapped = self._get_base_value(entity)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 1163, in _get_base_value
return self._apply_to_values(entity, self._opt_call_to_base_type)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 1335, in _apply_to_values
value[:] = map(function, value)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 1217, in _opt_call_to_base_type
value = _BaseValue(self._call_to_base_type(value))
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 734, in \__init__
assert b_val is not None, "Cannot wrap None"
AssertionError: Cannot wrap None
Tim Hoffman and Patrick Costello put me on the right track to solve this.
I incremented the app version to protect some changes, but the work took longer to finish than I expected.
One change added a repeated StructuredProperty to a model derived from ndb.Model, and I put several entities with the extra property (about 30 out of 1100 total).
The previous version without the extra property was still the default and was being lightly used, so the entities became just inconsistent enough to produce the intermittent AssertionError.
The main lesson is to take note of the recommendations in Google's schema update article, particularly changing the underlying parent to Expando and/or disabling datastore edits until any migration is complete.
https://cloud.google.com/appengine/articles/update_schema
The fix was to add the property to the previous version's model definition, then get all the entities and put them again (a sketch of that pattern is below).
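A rough sketch of that re-save pass (Player is a stand-in name for the affected kind, and the batch size is arbitrary):

from google.appengine.ext import ndb

def resave_all(batch_size=100):
    cursor = None
    more = True
    while more:
        if cursor:
            entities, cursor, more = Player.query().fetch_page(
                batch_size, start_cursor=cursor)
        else:
            entities, cursor, more = Player.query().fetch_page(batch_size)
        if entities:
            ndb.put_multi(entities)  # rewrites each entity under the current model definition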
Thanks Tim and Patrick for the pointer!