py2neo Cypher transactions failing - python

I'm trying to batch import millions of nodes through Py2Neo.
I don't know what's faster, the BatchWrite or the cipher.Transaction, but the latter seemed the best option as I need to split my batches.
However, when I try to execute a simple transaction, I receive a weird error.
The python code:
session = cypher.Session("http://127.0.0.1:7474/db/data/") #error also w/o /db/data/
def init():
tx = session.create_transaction()
for ngram, one_grams in data.items():
tx.append("CREATE "+str(n)+":WORD {'word': "+ngram+", 'rank': "+str(ngram_rank)+", 'prob': "+str(ngram_prob)+", 'gram': '0gram'}")
tx.execute() # line 69 in the error below
The error:
Traceback (most recent call last):
File "Ngram_neo4j.py", line 176, in <module>
init(rNgram_file="dataset_id.json")
File "Ngram_neo4j.py", line 43, in init
data = probability_items(data)
File "Ngram_neo4j.py", line 69, in probability_items
tx.execute()
File "D:\datasets\GOOGLE~1\virtenv\lib\site-packages\py2neo\cypher.py", line 224, in execute
return self._post(self._execute or self._begin)
File "D:\datasets\GOOGLE~1\virtenv\lib\site-packages\py2neo\cypher.py", line 209, in _post
raise TransactionError(error["code"], error["status"], error["message"])
KeyError: 'status'
I tried catching the exception:
except cypher.TransactionError as e:
print("--------------------------------------------------------------------------------------------")
print(e.status)
print(e.message)
But never gets called. (maybe an error on my part?)
Regular insert using graph_db.create({"node:" node}) do work, but are incredibly slow (36hrs for 2.5M nodes)
Note that the dataset consists of a series of JSON files, each with a structure to 5 levels deep.
I'd like to batch the last 2 levels (around 100 to 20.000 nodes per batch)
--- EDIT ---
I'm using Py2Neo 1.6.1, Neo4j 2.0.0. Currently on Windows 7 (but also OSX Mav., CentOS 6)

The problem you're seeing is due to a last minute alteration in the way that Cypher transaction errors are reported by the Neo4j server. Py2neo 1.6 was built against M05/M06 and when a few features changed in RC1/GA, Py2neo broke in a few places.
This has been fixed for Py2neo 1.6.2 (https://github.com/nigelsmall/py2neo/issues/224) but I do not yet know when I will get a chance to finish and release this version.

What neo4j and py2neo versions are you using?
You should use parameters for your create statements.
Can you check the server logs in data/logs and data/graph.db/messages.log for errors?
If you have so much data to insert then perhaps direct batch-insertion would make more sense?
See: http://neo4j.org/develop/import
Two tools I wrote for this:
https://github.com/jexp/neo4j-shell-tools/tree/20
https://github.com/jexp/batch-import/tree/20#binary-download

Related

Semantic versioning in dask repository

Why didn't the the commit 7138f470f0e55f2ebdb7638ddc4dfe2e78671403 trigger a new major version of dask since the function read_metadata is incompatible with older versions? The commit introduced the return of 4 values, but the old version only returned 3. According to semantic versioning this would have been the correct behavior.
cudf got broken, because of that commit.
Code from the issue:
>>> import cudf
>>> import dask_cudf
>>> dask_cudf.from_cudf(cudf.DataFrame({'a':[1,2,3]}),npartitions=1).to_parquet('test_parquet')
>>> dask_cudf.read_parquet('test_parquet')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/nvme/0/vjawa/conda/envs/cudf_15_june_25/lib/python3.7/site-packages/dask_cudf/io/parquet.py", line 213, in read_parquet
**kwargs,
File "/nvme/0/vjawa/conda/envs/cudf_15_june_25/lib/python3.7/site-packages/dask/dataframe/io/parquet/core.py", line 234, in read_parquet
**kwargs
File "/nvme/0/vjawa/conda/envs/cudf_15_june_25/lib/python3.7/site-packages/dask_cudf/io/parquet.py", line 17, in read_metadata
meta, stats, parts, index = ArrowEngine.read_metadata(*args, **kwargs)
ValueError: not enough values to unpack (expected 4, got 3)
dask_cudf==0.14 is only compatible with dask<=0.19. In dask_cudf==0.16 the issue is fixed.
Edit: Link to the issue
Whilst Dask does not have a concrete policy around the value of the version string, one could argue that in this particular case, the IO code is non-core, and largely being pushed by upstream (pyarrow) development rather than our own initiative.
We are sorry that your code broke, but of course picking the correct versions of packages and expecting downstream packages to catch up is part of the open source ecosystem.
You may want to raise this as a github issue, if you'd like to get input from more of the dask maintenance team. (There isn't really much to "answer" here, from a stackoverflow perspective)

Updating to the latest Yocto version sanity.bbclass issues

I have taken over a project which was using Yocto Fido version from 2015. I need to update this to the latest stable version Thud.
I have cloned the Poky-Thud repository and cloned the latest layers which are required by our customized layer such as meta-openembedded etc. and added our customized layer back to it.
Now, I wasn't expecting this to build straight away without issues by any means, but the below errors I'm getting with the new layers, I just don't understand. There are more errors like this relating to not enough values, but posted below is one.
There is an interface issue in meta/classes/sanity.bbclass. I can't just revert to the older version of meta to solve this, nor I don't think it makes sense to modify the code myself? Any ideas why this is and how to solve it?
ERROR: Execution of event handler 'config_reparse_eventhandler' failed
Traceback (most recent call last):
File "/home/ubuntu/new-repo/poky-thud/build-
bbgw/../meta/classes/sanity.bbclass", line 971, in
config_reparse_eventhandler(e=<bb.event.ConfigParsed object at
0x7ff4103bf3c8>):
python config_reparse_eventhandler() {
> sanity_check_conffiles(e.data)
}
File "/home/ubuntu/new-repo/poky-thud/build-
bbgw/../meta/classes/sanity.bbclass", line 572, in sanity_check_conffiles(d=
<bb.data_smart.DataSmart object at 0x7ff4108d35c0>):
for func in funcs:
> conffile, current_version, required_version, func =
func.split(":")
if check_conf_exists(conffile, d) and d.getVar(current_version)
is not None and \
ValueError: not enough values to unpack (expected 4, got 1)

Why does Spark-XML on AWS Glue fail with AbstractMethodError?

I have an AWS Glue job written in Python that pulls in the spark-xml library (through the Dependent jars path). I'm using spark-xml_2.11-0.2.0.jar. When I try to output my DataFrame to XML I get an error. The code I'm using is:
applymapping1.toDF().repartition(1).write.format("com.databricks.xml").save("s3://glue.xml.output/Test.xml");
The error I get is:
"/mnt/yarn/usercache/root/appcache/application_1517883778506_0016/container_1517883778506_0016_02_000001/pyspark.zip/pyspark/sql/readwriter.py",
line 550, in save File
"/mnt/yarn/usercache/root/appcache/application_1517883778506_0016/container_1517883778506_0016_02_000001/py4j-0.10.4-src.zip/py4j/java_gateway.py",
line 1133, in call File
"/mnt/yarn/usercache/root/appcache/application_1517883778506_0016/container_1517883778506_0016_02_000001/pyspark.zip/pyspark/sql/utils.py",
line 63, in deco File
"/mnt/yarn/usercache/root/appcache/application_1517883778506_0016/container_1517883778506_0016_02_000001/py4j-0.10.4-src.zip/py4j/protocol.py",
line 319, in get_return_value py4j.protocol.Py4JJavaError: An error
occurred while calling o75.save. : java.lang.AbstractMethodError:
com.databricks.spark.xml.DefaultSource15.createRelation(Lorg/apache/spark/sql/SQLContext;Lorg/apache/spark/sql/SaveMode;Lscala/collection/immutable/Map;Lorg/apache/spark/sql/Dataset;)Lorg/apache/spark/sql/sources/BaseRelation;
at
org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:426)
at
org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:215)
at
If I change it to CSV, it works fine:
applymapping1.toDF().repartition(1).write.format("com.databricks.csv").save("s3://glue.xml.output/Test.xml");
Note: When using CSV I don't have to import spark-xml. I think spark-csv is included in AWS Glue's Spark environment.
Any suggestions to what to try?
I've tried various versions of spark-xml:
spark-xml_2.11-0.2.0
spark-xml_2.11-0.3.1
spark-xml_2.10-0.2.0
That question is very similar to (but not an exact duplicate of) Why does elasticsearch-spark 5.5.0 give AbstractMethodError when submitting to YARN cluster? that also deals with AbstractMethodError.
Quoting the javadoc of java.lang.AbstractMethodError:
Thrown when an application tries to call an abstract method. Normally, this error is caught by the compiler; this error can only occur at run time if the definition of some class has incompatibly changed since the currently executing method was last compiled.
That pretty much explains what you experience (note the part that starts with "this error can only occur at run time").
I think it's a Spark version mismatch in play here.
Given com.databricks.spark.xml.DefaultSource15 in the stack trace and the change that does the following:
Remove the separated DefaultSource15 due to compatibility in Spark 1.5+
This removes DefaultSource15 and merge it into DefaultSource. This was separated for compatibility in Spark 1.5+ . In master and spark-xml 0.4.x, it dropped 1.x support.
You should make sure that the version of Spark in AWS Glue's Spark environment matches the spark-xml. The latest version of spark-xml 0.4.1 was released on 6 Nov 2016.

Code for gensim Word2vec as an HTTP service 'KeyedVectors' Attribute error

I am using the w2v_server_googlenews code from the word2vec HTTP server running at https://rare-technologies.com/word2vec-tutorial/#bonus_app. I changed the loaded file to a file of vectors trained with the original C version of word2vec. I load the file with
gensim.models.KeyedVectors.load_word2vec_format(fname, binary=True)
and it seems to load without problems. But when I test the HTTP service with, let's say
curl 'http://127.0.0.1/most_similar?positive%5B%5D=woman&positive%5B%5D=king&negative%5B%5D=man'
I got an empty result with only the execution time.
{"taken": 0.0003361701965332031, "similars": [], "success": 1}
I put a traceback.print_exc() on the except part of the related method, which is in this case def most_similar(self, *args, **kwargs): and I got:
Traceback (most recent call last):
File "./w2v_server.py", line 114, in most_similar
topn=5)
File "/usr/local/lib/python2.7/dist-packages/gensim/models/keyedvectors.py", line 304, in most_similar
self.init_sims()
File "/usr/local/lib/python2.7/dist-packages/gensim/models/keyedvectors.py", line 817, in init_sims
self.syn0norm = (self.syn0 / sqrt((self.syn0 ** 2).sum(-1))[..., newaxis]).astype(REAL)
AttributeError: 'KeyedVectors' object has no attribute 'syn0'
Any idea on why this might happens?
Note: I use python 2.7 and I installed gensim using pip, which gave me gensim 2.1.0.
FYI that demo code was baed on gensim 0.12.3 (from 2015, as listed in its requirements.txt), and would need updating to work with the latest gensim.
It might be sufficient to add a line to w2v_server.py at line 70 (just after the load_word2vec_format()), to force the creation of the needed syn0norm property (which in older gensims was auto-created on load), before deleting the raw syn0 values. Specifically:
self.model.init_sims(replace=True)
(You would leave out the replace=True if you were going to be doing operations other than most_similar(), that might require raw vectors.)
If this works to fix the problem for you, a pull-request to the w2v_server_googlenews repo would be favorably received!

Error calling Python module function in MySQL Workbench

I'm kind of at my wits end here, and so far have had no feedback from the MySQL Workbench bug reporting site, so I thought I'd throw this question/problem out to more sites.
I'm attempting to migrate from a MSSQL server on a Windows Server 2003 machine to MySQL server running on a Centos 6.5 VM. I can connect to the source and target databases, select a schemata, and runs through a pass through once for retrieving tables. After this the process fails and throws the following errors:
Traceback (most recent call last):
File "/usr/lib64/mysql-workbench/modules/db_mssql_grt.py", line 409, in reverseEngineer
reverseEngineerProcedures(connection, schema)
File "/usr/lib64/mysql-workbench/modules/db_mssql_grt.py", line 1016, in reverseEngineerProcedures
for idx, (proc_count, proc_name, proc_definition) in enumerate(cursor):
MemoryError
Traceback (most recent call last):
File "/usr/share/mysql-workbench/libraries/workbench/wizard_progress_page_widget.py", line 192, in thread_work
self.func()
File "/usr/lib64/mysql-workbench/modules/migration_schema_selection.py", line 160, in task_reveng
self.main.plan.migrationSource.reverseEngineer()
File "/usr/lib64/mysql-workbench/modules/migration.py", line 353, in reverseEngineer
self.state.sourceCatalog = self._rev_eng_module.reverseEngineer(self.connection, self.selectedCatalogName, self.selectedSchemataNames, self.state.applicationData)
SystemError: MemoryError(""): error calling Python module function DbMssqlRE.reverseEngineer
ERROR: Reverse engineer selected schemata: MemoryError(""): error calling Python module function DbMssqlRE.reverseEngineer
Failed
I thought this was initally a memory error, so I've upped the memory on the box to 16 GiB. This error also occurs on any size DBs, as I've tried very minimal sized ones with hardly any tables.
Any thoughts? Thanks for looking
Just in case anyone else runs into this. I had the same problem and fixed it by getting rid of non-ASCII characters in schemas, tables....basically all MSSQL objects. This was confounded by the fact that I had SQL# (www.sqlsharp.com) installed, which adds a number of functions and stored procs with a schema called SQL#. You can remove that with this command:
EXEC SQL#.SQLsharp_Uninstall
Once you get rid of non-ASCII chars, the migration works.
The OP (I assume) closed their bug report with this message:
[...] I figured out a work around, or the flaw in the system perhaps. Turns out that Null values were not allowed inside the Datetime fields when doing a migration. I turned every Datetime field in my database to a default value and migrated it successfully after that.

Categories

Resources