The most appropriate way to use Neo4j from Python in 2015

The most appropriate way to use Neo4j from Python in 2015 - python

I'm using latest community Neo4j (2.2.0-M03) for storing my graphs. I'm interested in accessing it from Python. According to the official Neo4j documentation, there are several alternatives.
From what I have understood by checking the docs, playing around a bit, and checking this post, py2neo is the only one supporting Neo4j 2 (and labels). However, if I'd like to write and run specific algorithms on Neo4j, I should use Gremlin, through Bulbs, that however does not seem to support Neo4j 2.
Now, I would like to use some custom algorithms not currently in Neo4j, like Spreading Activation.
Is writing algorithms directly in Neo4j in Java and running them from Python using cypher commands through py2neo the only alternative? Am I missing something?
Cheers
PS. I wanted to post links to all the software I cited but unfortunately I need at least 10 reputation to post more than 2 links...

This is a very tough question, it seems you need design guidance not a quick neo4j question. Depending on how you're using spreading activation, it might be better not to modify the server, but I can't tell because your use case is probably involved. Keep in mind that you can always use neo4j as a graph store, and then put higher-level concepts like spreading activation in your application code, not in the server.
The question presumes I think you want to put it in the server. So what are the options? Broadly, you could write a server plugin and extend the RESTful API (which wouldn't help you with py2neo) On the other hand, I don't think defining your own custom cypher function is supported right now, so you can't necessarily modify the cypher language itself, then use py2neo bindings to exploit a fancy new cypher function. Advice given elsewhere suggests you might want to consider an unmanaged extension to implemented spreading activation. If you did this, once again, I don't see how py2neo would help you.
Short term, I think you should consider NOT modifying neo4j itself, but rather putting your spreading activation in python code that maybe uses py2neo. Long-term, if neo4j comes up with a way of doing cypher user-defined functions (UDFs) which I understand is on the development roadmap (maybe?) then that might be a better option, but I wouldn't recommend it without many more requirements and details.

Related

Repository pattern on Python

I'm new to python and I'm comming from the c# world.
Over there it seemed like the repository pattern was the way to go, but I am having trouble finding any tutorials of how to best do this on Python.
edit I understand that it can be implemented, I'm just wondering if there is any reason why I am finding close to nothing for how to go about doing this.
Thanks!

I wasn't immediately familiar with the "repository pattern", so I looked it up. It appears to be the idea of putting a more general API, like a dictionary-like key/value lookup, in front of a database or other data store. It seems that the idea is to add an additional layer of abstraction that can allow multiple types of data sources (like both a relational database and a CVS file) to be accessed transparently via a common API.
Given this definition, I can think of no reason why this design pattern wouldn't be equally applicable to a problem addressed with Python vs any other programming language.

Searching with the pyramid framework

I'm trying to implement a search function into my website, which is running on pyramid, and I was wondering what is the most efficient way of approaching this problem. I am currently looking into Whoosh and MySQL full text searching with SqlAlchemy. I need a fast and simple implementation, and wondering which one would be the best choice.

I tried using fulltext with the native database for a while and it just was too much work to keep things working across sqlite, mysql, and pgsql. I ported all the search code over to whoosh and have been really happy ever since. It performs well for small workloads, is pure python, and no server to setup.
You just implement it almost like writing and updating a file on disk. From what I've read it does well in the single millions of documents. I'm using it with some 18k documents with an index size of around 100MB. There's a lot of flexibility to implement various tokenizing and other config with it. I really suggest people start there and if they out grow the whoosh, then look at starting up extra processes with elasticsearch, lucene/solr, and the like.
You can see how I've got it implemented here:
https://github.com/mitechie/Bookie/blob/develop/bookie/models/fulltext.py
and I update it using SqlAlchemy event hooks:
https://github.com/mitechie/Bookie/blob/develop/bookie/models/__init__.py#L663
and you can judge a basic implementation of it by searching:
https://bmark.us/search

I'm a huge fan of ElasticSearch. It's the easiest to set up, maintain, and work with.
I generally use requests.
to index:
requests.put("http://localhost:9200/myindex/category/",data=json.dumps(document))
to search:
requests.get("http://localhost:9200/myindex/category/_search?q="+somequery)
you can get way more in depth in searching using the DSL:
http://www.elasticsearch.org/guide/reference/query-dsl/

Full text searching and Python

Can someone help me out with some suggestion for a full-text searching engine that supports Python?
Right now we have a MySQL database in place and I'd like to add the ability to have a full-text search engine index some of the text in some of the tables in this database. This text data would be used by a web application to search for the corresponding records in the database. For instance, index the customer name information in our customer table, full text search that with the web application to get the MySQL record for the customer.
I've looked (briefly) at Lucene, Swish-E and MongoDB, and few others, but I'm not sure what would be a good choice for me considering a couple of things:
I'm not a Java guy (though I've been programming for a long time),
we only want to search a relatively small set of data,
we're looking to index text in a MySQL database,
and would like that index to be updated in semi-realtime.
Any hints, tips or pointers would be greatly appreciated!

Have a look at Whoosh. I've heard it doesn't scale up terribly well (maybe that's fixed now) but for small collections, it might be useful.
For a scalable solution, consider using Lucene with PyLucene or Jython.

Building pylucene a few months ago was one of the most painful experiences I had. The project won't get any traction IMHO if it's so hard to build.
With a few other folks having the same itch to scratch, we started https://code.google.com/a/apache-extras.org/p/pylucene-extra/ to gather prebuilt pylucene and jcc eggs on several operating systems, Python versions and Java runtimes combos. It is not very active lately, though.
Whoosh might be a good fit, or you may want to have a look at Sphinx, ElasticSearch or HaystackSearch (CAVEAT: I did not work on any of these).
Or maybe try to access Solr via python (there are a few APIs), which might be much easier than using pylucene. Consider that lucene will still need a JVM to run, of course.
Since you don't have huge scalability needs, I would focus on simple usage and community support rather than performance and scale. Hope it helps.

Solr is a great wrapper to Lucene, it greatly simplifies things. It doesn't require any Java tinkering for most things, you just need to configure some XML files. It does run as another process, so this may complicate your deployment.
I have had great results with pysolr, but really, you could write your own python communication library since Solr uses REST, so it is really simple to send and retrieve data in either xml or json.

ORM with Graph-Databases like Neo4j in Python

i wonder wether there is a solution (or a need for) an ORM with Graph-Database (f.e. Neo4j). I'm tracking relationships (A is related to B which is related to A via C etc., thus constructing a large graph) of entities (including additional attributes for those entities) and need to store them in a DB, and i think a graph database would fit this task perfectly.
Now, with sql-like DBs, i use sqlalchemyś ORM to store my objects, especially because of the fact that i can retrieve objects from the db and work with them in a pythonic style (use their methods etc.).
Is there any object-mapping solution for Neo4j or other Graph-DB, so that i can store and retrieve python objects into and from the Graph-DB and work with them easily?
Or would you write some functions or adapters like in the python sqlite documentation (http://docs.python.org/library/sqlite3.html#letting-your-object-adapt-itself) to retrieve and store objects?

Shameless plug... there is also my own ORM which you may also want to checkout: https://github.com/robinedwards/neomodel
It's built on top of py2neo, using cypher and rest API calls under hood, i.e no dependency on gremlin.

There are a couple choices in Python out there right now, based on databases' REST interfaces.
As I mentioned in the link #Peter provided, we're working on neo4django, which updates the old Neo4j/Django integration. It's a good choice if you need complex queries and want an ORM that will manage node indexing as well- or if you're already using Django. It works very similarly to the native Django ORM. Find it on PyPi or GitHub.
There's also a more general solution called Bulbflow that is supposed to work with any graph database supported by Blueprints. I haven't used it, but from what I've seen it focuses on domain modeling - Bulbflow already has working relationship models, for example, which we're still working on- but doesn't much support complex querying (as we do with Django querysets + index use). It also lets you work a bit closer to the graph.

Maybe you could take a look on Bulbflow, that allows to create models in Django, Flask or Pyramid. However, it works over a REST client instead of the python-binding provided by Neo4j, so perhaps it's not as fast as the native binding is.

Graph databases and RDF triplestores: storage of graph data in python

I need to develop a graph database in python (I would enjoy if anybody can join me in the development. I already have a bit of code, but I would gladly discuss about it).
I did my research on the internet. in Java, neo4j is a candidate, but I was not able to find anything about actual disk storage. In python, there are many graph data models (see this pre-PEP proposal, but none of them satisfy my need to store and retrieve from disk.
I do know about triplestores, however. triplestores are basically RDF databases, so a graph data model could be mapped in RDF and stored, but I am generally uneasy (mainly due to lack of experience) about this solution. One example is Sesame. Fact is that, in any case, you have to convert from in-memory graph representation to RDF representation and viceversa in any case, unless the client code wants to hack on the RDF document directly, which is mostly unlikely. It would be like handling DB tuples directly, instead of creating an object.
What is the state-of-the-art for storage and retrieval (a la DBMS) of graph data in python, at the moment? Would it make sense to start developing an implementation, hopefully with the help of someone interested in it, and in collaboration with the proposers for the Graph API PEP ? Please note that this is going to be part of my job for the next months, so my contribution to this eventual project is pretty damn serious ;)
Edit: Found also directededge, but it appears to be a commercial product

I have used both Jena, which is a Java framework, and Allegrograph (Lisp, Java, Python bindings). Jena has sister projects for storing graph data and has been around a long, long time. Allegrograph is quite good and has a free edition, I think I would suggest this cause it is easy to install, free, fast and you could be up and going in no time. The power you would get from learning a little RDF and SPARQL may very well be worth your while. If you know SQL already then you are off to a great start. Being able to query your graph using SPARQL would yield some great benefits to you. Serializing to RDF triples would be easy, and some of the file formats are super easy ( NT for instance ). I'll give an example. Lets say you have the following graph node-edge-node ids:
1 <- 2 -> 3
3 <- 4 -> 5
these are already subject predicate object form so just slap some URI notation on it, load it in the triple store and query at-will via SPARQL. Here it is in NT format:
<http://mycompany.com#1> <http://mycompany.com#2> <http://mycompany.com#3> .
<http://mycompany.com#3> <http://mycompany.com#4> <http://mycompany.com#5> .
Now query for all nodes two hops from node 1:
SELECT ?node
WHERE {
<http://mycompany.com#1> ?p1 ?o1 .
?o1 ?p2 ?node .
}
This would of course yield <http://mycompany.com#5>.
Another candidate would be Mulgara, written in pure Java. Since you seem more interested in Python though I think you should take a look at Allegrograph first.

I think the solution really depends on exactly what it is you want to do with the graph once you have managed to store it on disk/in database, and this is a little unclear in your question. However, a couple of things you might wish to consider are:
if you just want to persist the graph without using any of the features or properties you might expect from an rdbms solution (such as ACID), then how about just pickling the objects into a flat file? Very rudimentary, but like I say, depends on exactly what you want to achieve.
ZODB is an object database for Python (a spin off from the Zope project I think). I can't say I've had much experience of it in a high performance environment, but bar a few restrictions does allow you to store Python objects natively.
if you wish to pursue RDF, there is an RDF Alchemy project which might help to alleviate some of your concerns about converting from your graph to RDF structures and I think has Sesame as part of it's stack.
There are some other persistence tools detailed on the python site which may be of interest, however I spent quite a while looking into this area last year, and ultimately I found there wasn't a native Python solution that met my requirements.
The most success I had was using MySQL with a custom ORM and I posted a couple of relevant links in an answer to this question. Additionally, if you want to contribute to an RDBMS project, when I spoke to someone from Open Query about a Graph storage engine for MySQL them seemed interested in getting active participation in their project.
Sorry I can't give a more definitive answer, but I don't think there is one... If you do start developing your own implementation, I'd be interested to keep up-to-date with how you get on.

Greetings from your Serius Cybernetics Intelligent Agent!
Some useful links...
Programming the Semantic Web
SEMANTIC PROGRAMMING
RDFLib Python Library for RDF

Hmm, maybe you should take a look at CubicWeb

Regarding Neo4j, did you notice the existing Python bindings? As for the disk storage, take a look at this thread on the mailing list.
For graphdbs in Python, the Hypergraph Database Management System project was recently started on SourceForge by Maurice Ling.

Redland (http://librdf.org) is probably the solution you're looking for. It has Python bindings too.

RDFLib is a python library that you can use. Using harschware's example:
Create a test.nt file like below:
<http://mycompany.com#1> <http://mycompany.com#2> <http://mycompany.com#3> .
<http://mycompany.com#3> <http://mycompany.com#4> <http://mycompany.com#5> .
To query for all nodes two hops from node 1 in RDFLib:
from rdflib import Graph
g = Graph()
g.parse("test.nt", format="nt")
qres = g.query(
"""SELECT ?node
WHERE {
<http://mycompany.com#1> ?p1 ?o1 .
?o1 ?p2 ?node .
}"""
)
for row in qres:
print(node)
Should return the answer <http://mycompany.com#5>.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.