Syntax error while running mongo-connector between MongoDB and Neo4J

Syntax error while running mongo-connector between MongoDB and Neo4J - python

I am using mongo-connector to do the initial bulk_upsert operation between MongoDB and Neo4J. At some point while querying with py2neo, the InvalidSyntax exception is occurring due to which nothing is being inserted into graph database. I believe the issue lies somewhere in the DocManager during syntax translations. I am running py2neo v2.0.8 and Neo4J v2.3.1.
Here is the detailed stack trace:
Exception in thread Thread-2:
Traceback (most recent call last):
File "//anaconda/lib/python2.7/threading.py", line 810, in __bootstrap_inner
self.run()
File "//anaconda/lib/python2.7/site-packages/mongo_connector/util.py", line 85, in wrapped
func(*args, **kwargs)
File "//anaconda/lib/python2.7/site-packages/mongo_connector/oplog_manager.py", line 256, in run
docman.upsert(doc, ns, timestamp)
File "//anaconda/lib/python2.7/site-packages/mongo_connector/doc_managers/neo4j_doc_manager.py", line 66, in upsert
tx.commit()
File "//anaconda/lib/python2.7/site-packages/py2neo/cypher/core.py", line 333, in commit
return self.post(self.__commit or self.__begin_commit)
File "//anaconda/lib/python2.7/site-packages/py2neo/cypher/core.py", line 288, in post
raise self.error_class.hydrate(error)
InvalidSyntax: Invalid input '{': expected whitespace, comment or a label name (line 1, column 20 (offset: 19))
"MERGE (d:Document: { _id: {parameters}._id})"
What could be happening here?

Thanks for reporting this.
Neo4j Doc Manager uses a key naming convention of xxx_id to identify relationships, where the value of a property with key xxx_id is assumed to be an id referencing a document in collection xxx. This convention allows us to define relationships from the document data model. I'm assuming that the error here is caused by Neo4j Doc Manager treating the nested document's _id field as a relationship, but not checking for a null collection name (since nothing appears before "_id" in the key).
This is a bug and we'll add a check for this to avoid the Cypher syntax error. Those interested can track the issue here: https://github.com/neo4j-contrib/neo4j_doc_manager/issues/56

Related

SQLAlchemy: lazy='raise' complaining about fields even though they are loaded

We use the async version of sqlalchemy and we need to eager load every relationship (lazy loading does not work for async). Accordingly for every relationship in our models we have set lazy='raise'. The problem is that it seems overly aggressive on raising errors. Consider the following unit test:
async def test_user_group_self_allowed(self):
privilege = await self.db.get(Privilege, 1, [joinedload(Privilege.role)])
options = [joinedload(Item.privileges).joinedload(Privilege.role), joinedload(Item.item_group)]
item = await self.db.get(Item, 1, options)
item.privileges.append(privilege)
await (self.db.commit())
options = [joinedload(User.user_groups).joinedload(UserGroup.privileges), joinedload(User.privileges)]
user = await self.db.get(User, 1, options)
user.privileges = []
item = await self.db.get(Item, 1, [joinedload(Item.privileges).joinedload(Privilege.role), joinedload(Item.item_group)])
user_group = await self.db.get(UserGroup, 1, [joinedload(UserGroup.organization)])
print('why?????', user_group.organization)
self.assertTrue(await self.helper.is_authorized(self.db, user, 'edit', user_group))
Notice the print, it results in the following error:
Traceback (most recent call last):
File "/usr/lib/python3.9/unittest/async_case.py", line 65, in _callTestMethod
self._callMaybeAsync(method)
File "/usr/lib/python3.9/unittest/async_case.py", line 88, in _callMaybeAsync
return self._asyncioTestLoop.run_until_complete(fut)
File "/usr/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
return future.result()
File "/usr/lib/python3.9/unittest/async_case.py", line 102, in _asyncioLoopRunner
ret = await awaitable
File "/src/backend-core/backend_core/tests/authorization.py", line 206, in test_user_group_self_allowed
print('why?????', user_group.organization)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/attributes.py", line 481, in __get__
return self.impl.get(state, dict_)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/attributes.py", line 926, in get
value = self._fire_loader_callables(state, key, passive)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/attributes.py", line 962, in _fire_loader_callables
return self.callable_(state, passive)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/strategies.py", line 836, in _load_for_state
self._invoke_raise_load(state, passive, "raise")
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/strategies.py", line 795, in _invoke_raise_load
raise sa_exc.InvalidRequestError(
sqlalchemy.exc.InvalidRequestError: 'UserGroup.organization' is not available due to lazy='raise'
As you can see it complains about the organization not being eager loaded while I clearly include it in the options with a joinedload. Now we can make this error go away by changing the options for the user query to:
options = [joinedload(User.user_groups).joinedload(UserGroup.privileges), joinedload(User.privileges), joinedload(User.user_groups).joinedload(UserGroup.organization)]
(the same options as before, only we add a joinedload for User -> UserGroups -> Organization)
This makes the error go away and everything is fine again. Now my question is, why does it complain about this to start with? I access user_group.organization not user.user_groups[x].organization.. I don't know how these queries work under the hood exactly, but not only do I have to write way too many joinedloads this way, I think it also results in needless querying.

As it turns out, .get caches more than I would expect. Not only the main entity (in this case usergroup) but also stuff loaded through a joinedload (user.user_group.organization). So that means a direct get on the organization does not overwrite the cached one from user.user_group.organization or usergroup.organization. However it is possible to do e.g. db.get(Organization, 1, populate_existing=True) which will retrieve the entity again and update the cache.
From the docs:
If the given primary key identifier is present in the local identity map, the object is returned directly from this collection and no SQL is emitted, unless the object has been marked fully expired....
...
populate_existing – causes the method to unconditionally emit a SQL query and refresh the object with the newly loaded data, regardless of whether or not the object is already present.
As it tells, another way is to expire an object, read more about that here

Neptune InternalFailureException: Can not get the attachable from the host vertex

I am using neptune's graph database with gremlin queries through python, to store addresses in a database. Most of the queries execute fine, but once i try the following query neptune returns a internal failure exception:
g.V(address).outE('isPartOf').inV().
dedup().as_('groupNode').
inE('isPartOf').outV().dedup().as_('children').
addE('isPartOf').to(group).
select('groupNode').drop().
fold().
coalesce(__.unfold(),
g.V(address).addE('isPartOf').to(group)).next()
Every address has the possibility to belong to a group. when the address is already assigned to a group, i try to take all addresses assigned to that group and assign them to a new group, while deleting the old group. If the address is not yet assigned to a group i simply want to assign the address to the new group immediately.
If i try this query on it's own everything executes perfectly (although it is a bit of a slow query). However once i try to execute this query in parallel on more addresse this query fails with the following error:
Traceback (most recent call last):
File "/usr/lib64/python2.7/threading.py", line 804, in __bootstrap_inner
self.run()
File "gremlinExample.py", line 30, in run
processTx(self.tx, self.g, self.parentBlock)
File "gremlinExample.py", line 152, in processTx
g.V(address).outE('isPartOf').inV().dedup().as_('groupNode').inE('isPartOf').outV().dedup().as_('children').select('children').addE('isPartOf').to(group).select('groupNode').drop().fold().coalesce(__.unfold(), g.V(address).addE('isPartOf').to(group)).next()
File "/home/ec2-user/.local/lib/python2.7/site-packages/gremlin_python/process/traversal.py", line 70, in next
return self.__next__()
File "/home/ec2-user/.local/lib/python2.7/site-packages/gremlin_python/process/traversal.py", line 43, in __next__
self.traversal_strategies.apply_strategies(self)
File "/home/ec2-user/.local/lib/python2.7/site-packages/gremlin_python/process/traversal.py", line 346, in apply_strategies
traversal_strategy.apply(traversal)
File "/home/ec2-user/.local/lib/python2.7/site-packages/gremlin_python/driver/remote_connection.py", line 143, in apply
remote_traversal = self.remote_connection.submit(traversal.bytecode)
File "/home/ec2-user/.local/lib/python2.7/site-packages/gremlin_python/driver/driver_remote_connection.py", line 54, in submit
results = result_set.all().result()
File "/usr/lib/python2.7/site-packages/concurrent/futures/_base.py", line 405, in result
return self.__get_result()
File "/usr/lib/python2.7/site-packages/concurrent/futures/_base.py", line 357, in __get_result
raise type(self._exception), self._exception, self._traceback
GremlinServerError: 500: {"requestId":"a42015b7-6b22-4bd1-9e7d-e3252e8f3ab6","code":"InternalFailureException","detailedMessage":"Can not get the attachable from the host vertex: v[64b32957-ef71-be47-c8d7-0109cfc4d9fd]-/->neptunegraph[org.apache.commons.configuration.PropertiesConfiguration#6db0f02]"}
To my knowledge execution in parallel shouldn't be the problem, since every query simply get's queued at the database (exactly for this reason i tried to create a query which executes the whole task at once).
Excuses for any bad English, it is not my native language

For anyone else who's looking for an update here - the OP was able to resolve the issue by replacing .next() with a .iterate(). Some followups were needed to understand the query and data better, but the OP has abandoned the project and moved to another solution.

OrientDB Gremlin server not working in python

I am using the orientdb and gremlin server in python, Gremlin server is started successfully, but when I am trying to add one vertex to the orientdb through gremlin code it's giving me an error.
query = """graph.addVertex(label, "Test", "title", "abc", "title", "abc")"""
following is the Traceback
/usr/bin/python3.6 /home/admin-12/Documents/bitbucket/ecodrone/ecodrone/test/test1.py
Traceback (most recent call last):
File "/home/admin-12/Documents/bitbucket/ecodrone/ecodrone/test/test1.py", line 27, in <module>
result = execute_query("""graph.addVertex(label, "Test", "title", "abc", "title", "abc")""")
File "/home/admin-12/Documents/bitbucket/ecodrone/ecodrone/GremlinConnector.py", line 21, in execute_query
results = future_results.result()
File "/usr/lib/python3.6/concurrent/futures/_base.py", line 432, in result
return self.__get_result()
File "/usr/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
File "/home/admin-12/.local/lib/python3.6/site-packages/gremlin_python/driver/resultset.py", line 81, in cb
f.result()
File "/usr/lib/python3.6/concurrent/futures/_base.py", line 425, in result
return self.__get_result()
File "/usr/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
File "/usr/lib/python3.6/concurrent/futures/thread.py", line 56, in run
result = self.fn(*self.args, **self.kwargs)
File "/home/admin-12/.local/lib/python3.6/site-packages/gremlin_python/driver/connection.py", line 77, in _receive
self._protocol.data_received(data, self._results)
File "/home/admin-12/.local/lib/python3.6/site-packages/gremlin_python/driver/protocol.py", line 106, in data_received
"{0}: {1}".format(status_code, data["status"]["message"]))
gremlin_python.driver.protocol.GremlinServerError: 599: Error during serialization: Infinite recursion (StackOverflowError) (through reference chain: com.orientechnologies.orient.core.id.ORecordId["record"]->com.orientechnologies.orient.core.record.impl.ODocument["schemaClass"]->com.orientechnologies.orient.core.metadata.schema.OClassImpl["document"]->com.orientechnologies.orient.core.record.impl.ODocument["owners"]->com.orientechnologies.orient.core.record.impl.ODocument["owners"]->com.orientechnologies.orient.core.record.impl.ODocument["owners"]->com.orientechnologies.orient.core.record.impl.ODocument["owners"]->com.orientechnologies.orient.core.record.impl.ODocument["owners"]->com.orientechnologies.orient.core.record.impl.ODocument["owners"]->com.orientechnologies.orient.core.record.impl.ODocument["owners"]->com.orientechnologies.orient.core.record.impl.ODocument["owners"]->com.orientechnologies.orient.core.record.impl.ODocument["owners"]->com.orientechnologies.orient.core.record.impl.ODocument["owners"]->com.orientechnologies.orient.core.record.impl.ODocument["owners"]->com.orientechnologies.orient.core.record.impl.ODocument["owners"]->com.orientechnologies.orient.core.record.impl.ODocument["owners"]->com.orientechnologies.orient.core.record.impl.ODocument["owners"]->com.orientechnologies.orient.core.record.impl.ODocument["owners"]->com.orientechnologies.orient.core.record.impl.ODocument["owners"]->com.orientechnologies.orient.core.record.impl.ODocument["owners"]->com.orientechnologies.orient.core.record.impl.ODocument["owners"]->com.orientechnologies.orient.core.record.impl.ODocument["owners"]->com.orientechnologies.orient.core.record.impl.ODocument["owners"]->com.orientechnologies.orient.core.record.impl.ODocument["owners"]->com.orientechnologies.orient.core.record.impl.ODocument["owners"]->com.orientechnologies.orient.core.record.impl.ODocument["owners"]->com.orientechnologies.orient.core.record.impl.ODocument["owners"]->com.orientechnologies.orient.core.record.impl.ODocument["owners"]->com.orientechnologies.orient.core.record.impl.ODocument["owners"]->com.orientechnologies.orient.core.record.impl.ODocument["owners"]->com.orientechnologies.orient.core.record.impl.ODocument["owners"]->com.orientechnologies.orient.core.record.impl.ODocument["owners"]->com.orientechnologies.orient.core.record.impl.ODocument["owners"]->com.orientechnologies.orient.core.record.impl.ODocument["owners"]->com.orientechnologies.orient.core.record.impl.ODocument["owners"]->com.orientechnologies.orient.core.record.impl.ODocument["owners"]->com.orientechnologies.orient.core.record.impl.ODocument["owners"]->com.orientechnologies.orient.core.record.impl.ODocument["owners"]->com.orientechnologies.orient.core.record.impl.ODocument["owners"])
Process finished with exit code 1

First of all, I very much recommend that you do not use the Graph API to make mutation. Prefer the Traversal API for that and do:
g.addV('Test').
property('title1', 'abc').
property('title2', 'abc')
Second, I think that your error is occurring because you are returning a Vertex which contains an ORecordId which is the vertex identifier and Gremlin Server doesn't know how to handle that. I don't know if OrientDB has serializers built to handle that, but if they do then you would want to add them to Gremlin Server configurations which is described in a bit more detail here - basically, you would want to know if OrientDB exposes an TinkerPop IORegistry for all their custom classes that might be sent back over the wire.
If they do not, then you would want to avoid returning those or convert them yourself. TinkerPop already recommends that you not return full Vertex objects and only return data that you need. So, rather than g.V() you would want to convert that Vertex into a Map with g.V().valueMap('title') or something similar (perhaps use project() step). If you definitely need the vertex identifier then you would need to convert that to something TinkerPop serializers understand. That might mean something as simple as:
g.V().has("title1","abc").id().next().toString()

ndb.get_multi returning AssertionError

Over the last 48 hours or so my small python GAE app has started getting AssertionErrors from ndb.get_multi calls.
Full traceback is appended, and the errors are being generated on the production server in _BaseValue's __init__ on line 734 of /base/data/.../ndb/model.py, and the failing assertion is b_val is not None with message "Cannot wrap None"
The error doesn't appear to be related to a particular entity or entities, but I've only seen it with one entity type so far (yet to test others).
The get_multi call is for only up to a dozen keys, and the error is intermittent so that repeating it will sometimes succeed. Or not...
I'm not seeing this error via remote shell, but I note that my local install is 1.9.23 while the log entry says the production server is 1.9.25 (GoogleAppEngineLauncher says my local install is up to date)
I'm adding a workaround to catch the exception and iterate through the keys to get them individually but I'm still seeing an upstream warning about a "suspended generator get" on line 744 of context.py.
The warning appears on the first get of this entity type from the list, for at least 2 different lists of keys (as well as preceding the AssertionError).
I don't want to have to wrap all get_multi calls in this way.
What's going on?
TRACEBACK:
Cannot wrap None
Traceback (most recent call last):
File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 570, in dispatch
return method(*args, **kwargs)
File "/base/data/home/apps/s~thegapnetball/115.386356111937586421/handlers/assess.py", line 50, in get
rs = ndb.get_multi(t.players)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 3905, in get_multi
for future in get_multi_async(keys, **ctx_options)]
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/tasklets.py", line 326, in get_result
self.check_success()
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/tasklets.py", line 372, in _help_tasklet_along
value = gen.send(val)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/context.py", line 751, in get
pbs = entity._to_pb(set_key=False).SerializePartialToString()
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 3147, in _to_pb
prop._serialize(self, pb, projection=self._projection)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 2379, in _serialize
projection=projection)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 1405, in _serialize
values = self._get_base_value_unwrapped_as_list(entity)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 1175, in _get_base_value_unwrapped_as_list
wrapped = self._get_base_value(entity)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 1163, in _get_base_value
return self._apply_to_values(entity, self._opt_call_to_base_type)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 1335, in _apply_to_values
value[:] = map(function, value)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 1217, in _opt_call_to_base_type
value = _BaseValue(self._call_to_base_type(value))
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 734, in \__init__
assert b_val is not None, "Cannot wrap None"
AssertionError: Cannot wrap None

Tim Hoffman and Patrick Costello put me on the right track to solve this.
I incremented version to protect some changes but took longer to finish than I expected.
One change added a repeated StructuredProperty to a model derived from ndb.Model, and I put several entities with the extra property (about 30 out of 1100 total).
The previous version without the extra property was still the default and was being lightly used, so the entities became just inconsistent enough to produce the intermittent AssertionError.
The main lesson is to take note of the recommendations in Google's schema update article, particularly changing the underlying parent to Expando and/or disabling datastore edits until any migration is complete.
https://cloud.google.com/appengine/articles/update_schema
The fix was to add the property to the previous version, get all the entities and then put them.
Thanks Tim and Patrick for the pointer!

Web2py DAL has no attribute?

My web2py application returned me an error today, which is quite odd.
Traceback (most recent call last):
File "/var/www/web2py/gluon/restricted.py", line 212, in restricted
exec ccode in environment
File "/var/www/web2py/applications/1MedCloud/controllers/default.py", line 475, in <module>
File "/var/www/web2py/gluon/globals.py", line 194, in <lambda>
self._caller = lambda f: f()
File "/var/www/web2py/applications/1MedCloud/controllers/default.py", line 63, in patient_register
rows = db(db.patientaccount.email==email).select()
File "/var/www/web2py/gluon/dal.py", line 7837, in __getattr__
return ogetattr(self, key)
AttributeError: 'DAL' object has no attribute 'patientaccount'
I am using Mysql as the database, and the table 'patientaccount' does exist. There is no connection issue as I can create tables but not fetch them from the server.
I have been using the very same code to do the db thing, here is my code
db = DAL('mysql://###:$$$#^^^^^^:3306/account_info', pool_size=0)
rows = db(db.patientaccount.email==email).select()
I did not change any code in my default.py file, but accidentally deleted some files inside "database" folder in my application. But I doubt if that could result the error, since the module is fetching tables on the server rather than using local files.
Please help! Thanks in advance!

The DAL does not inspect the MySQL database to discover its tables and fields. You must define the data models explicitly. So, somewhere in your code, you must do:
db.define_table('patientaccount',
Field('email'),
...)
That will define the db.patientaccount table so the DAL knows it exists and what fields it includes.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Syntax error while running mongo-connector between MongoDB and Neo4J - python

Related

SQLAlchemy: lazy='raise' complaining about fields even though they are loaded

Neptune InternalFailureException: Can not get the attachable from the host vertex

OrientDB Gremlin server not working in python

ndb.get_multi returning AssertionError

Web2py DAL has no attribute?

Categories

Resources