How to create a relation between two existing nodes in neo4j - python

I'm getting started with neo4j, using Python 3.5 and py2neo.
I built two graph nodes with the following code, and they were created successfully.
>>> u1 = Node("Person",name='Tom',id=1)
>>> u2 = Node('Person', name='Jerry', id=2)
>>> graph.create(u1,u2)
After that, I want to make a relation between 'Tom' and 'Jerry'.
Tom's id property is 1, Jerry's id property is 2.
So I think I have to point to the two existing nodes using the id property,
and then I tried to create the relation like below.
>>> u1 = Node("Person",id=1)
>>> u2 = Node("Person",id=2)
>>> u1_knows_u2=Relationship(u1, 'KKNOWS', u2)
>>> graph.create(u1_knows_u2)
The above performed successfully, but the graph looks strange.
I don't know why unknown graph nodes were created, or why the relation was created between those two unknown nodes.

You can have two nodes with the same label and the same properties. The second node you get with u1 = Node("Person", id=1) is not the same one you created before; it's a new node with the same label/property.
When you define two nodes (i.e. your new u1 and u2) and create a relationship between them, the whole pattern will be created.
To get the two nodes and create a relationship between them you would do:
# create Tom and Jerry as before
u1 = Node("Person",name='Tom',id=1)
u2 = Node('Person', name='Jerry', id=2)
graph.create(u1,u2)
# either use u1 and u2 directly
u1_knows_u2 = Relationship(u1, 'KKNOWS', u2)
graph.create(u1_knows_u2)
# or find existing nodes and create a relationship between them
existing_u1 = graph.find_one('Person', property_key='id', property_value=1)
existing_u2 = graph.find_one('Person', property_key='id', property_value=2)
existing_u1_knows_u2 = Relationship(existing_u1, 'KKNOWS', existing_u2)
graph.create(existing_u1_knows_u2)
find_one() assumes that your id properties are unique.
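If you want the database to enforce that uniqueness, py2neo 2.x exposes a schema API for constraints; a minimal sketch, assuming that API:
graph.schema.create_uniqueness_constraint('Person', 'id')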

Note also that you can use the Cypher query language with Py2neo:
graph.cypher.execute('''
MERGE (tom:Person {name: "Tom"})
MERGE (jerry:Person {name: "Jerry"})
CREATE UNIQUE (tom)-[:KNOWS]->(jerry)
''')
The MERGE statement in Cypher is similar to "get or create". If a Person node with the given name "Tom" already exists, it will be bound to the variable tom; if not, the node will be created and then bound to tom. This, combined with uniqueness constraints, allows you to avoid unwanted duplicate nodes.
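For example, such a constraint on the name property used by the MERGE above can be added in Cypher (a sketch, Neo4j 2.x syntax):
graph.cypher.execute('CREATE CONSTRAINT ON (p:Person) ASSERT p.name IS UNIQUE')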

Check this query:
MATCH (a), (b) WHERE id(a) = 1 AND id(b) = 2 CREATE (a)-[r:KKNOWS]->(b) RETURN a, b
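Note that id() in that query is Neo4j's internal node id, which is not the same as the id property set in the question. To match on the property instead, an equivalent sketch would be:
MATCH (a:Person), (b:Person) WHERE a.id = 1 AND b.id = 2 CREATE (a)-[r:KKNOWS]->(b) RETURN a, b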

Related

use custom comparator for a specific set

I am storing a number of objects in a set. Is there a way to override the comparator function used just for that set? I know I can override __eq__ and friends but I don't want to do so as I am also storing those objects in other sets.
Illustration:
# suppose Person class has name and address fields
p1 = Person("Alice", "addr1")
p2 = Person("Alice", "addr2")
s1 = set([p1, p2])  # if equality were based on name only, this should contain just one of p1 or p2
s2 = set([p1, p2])  # with normal equality, this should contain both p1 and p2
set doesn't provide a way to determine whether two objects are equivalent; it leaves that up to the class.
However, you can group the objects by an arbitrary predicate, then construct an appropriate sequence to pass to set.
Here's a solution using itertools.groupby:
from itertools import groupby
def get_name(p):
    return p.name  # or however you get the name of a Person instance
s1 = set(next(v) for _, v in groupby(sorted([p1, p2], key=get_name), get_name))
After sorting by name, groupby will put all Persons with the same name in a single group. Iterating over the resulting sequence yields tuples like ("Alice", <sequence of Person...>). You can ignore the key, and just call next on the sequence to get an object with the name Alice.
Note that depending on how you do the grouping, "equal" elements can still end up in different groups, and set will discard the duplicates as usual.
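For a quick self-contained check of that approach (a sketch; the namedtuple stands in for the question's Person class, and the data is hypothetical):
from itertools import groupby
from collections import namedtuple

Person = namedtuple('Person', ('name', 'address'))
p1 = Person("Alice", "addr1")
p2 = Person("Alice", "addr2")

def get_name(p):
    return p.name

s1 = set(next(v) for _, v in groupby(sorted([p1, p2], key=get_name), get_name))
print(s1)  # {Person(name='Alice', address='addr1')} -- one Person per name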
You can do this using a dictionary (whose keys behave like a set under the hood):
# Just to simulate a class, not really necessary
import collections
Person = collections.namedtuple('Person', ('name', 'address'))
people = [Person("Alice", "addr1"), Person("Alice", "addr2"), Person("Bob", "addr1")]
s = set({person.name: person for person in people}.values())
print(s)
# Output: {Person(name='Bob', address='addr1'), Person(name='Alice', address='addr2')}

Querying objects using attribute of member of many-to-many

I have the following models:
class Member(models.Model):
    ref = models.CharField(max_length=200)
    # some other stuff

    def __str__(self):
        return self.ref

class Feature(models.Model):
    feature_id = models.BigIntegerField(default=0)
    members = models.ManyToManyField(Member)
    # some other stuff
A Member is basically just a pointer to a Feature. So let's say I have Features:
feature_id = 2, members = 1, 2
feature_id = 4
feature_id = 3
Then the members would be:
id = 1, ref = 4
id = 2, ref = 3
I want to find all of the Features which contain one or more Members from a list of "ok members." Currently my query looks like this:
# ndtmp is a query set of member-less Features which Members can point to
sids = [str(i) for i in list(ndtmp.values('feature_id'))]
# now make a query set that contains all rels and ways with at least one member with an id in sids
okmems = Member.objects.filter(ref__in=sids)
relsways = Feature.geoobjects.filter(members__in=okmems)
# now combine with nodes
op = relsways | ndtmp
This is enormously slow, and I'm not even sure if it's working. I've tried using print statements to debug, just to make sure anything is actually being parsed, and I get the following:
print(ndtmp.count())
>>> 12747
print(len(sids))
>>> 12747
print(okmems.count())
... and then the code just hangs for minutes, and eventually I quit it. I think that I just overcomplicated the query, but I'm not sure how best to simplify it. Should I:
1. Migrate Feature to use a CharField instead of a BigIntegerField? There is no real reason for me to use a BigIntegerField; I just did so because I was following a tutorial when I began this project. I tried a simple migration by just changing it in models.py and I got a "numeric" value in the column in PostgreSQL with the format 'Decimal:( the id )', but there's probably some way around that which would force it to just shove the id into a string.
2. Use some feature of many-to-many fields which I don't know about to check for matches more efficiently?
3. Calculate the bounding box of each Feature and store it in another column so that I don't have to do this calculation every time I query the database (so just the single fixed cost of calculation upon migration + the cost of recalculating whenever I add a new Feature or modify an existing one)?
Or something else? In case it helps, this is for a server-side script for an ongoing OpenStreetMap-related project of mine, and you can see the work in progress here.
EDIT - I think a much faster way to get ndids is like this:
ndids = ndtmp.values_list('feature_id', flat=True)
This works, producing a non-empty set of ids.
Unfortunately, I am still at a loss as to how to get okmems. I tried:
okmems = Member.objects.filter(ref__in=str(ndids))
But it returns an empty query set, presumably because str(ndids) turns the whole queryset into a single string, and ref__in then iterates over its characters. And I can confirm that the ref values are correct, via the following test:
Member.objects.values('ref')[:1]
>>> [{'ref': '2286047272'}]
Feature.objects.filter(feature_id='2286047272').values('feature_id')[:1]
>>> [{'feature_id': '2286047272'}]
You should take a look at annotate:
okmems = Member.objects.annotate(
    feat_count=models.Count('feature')).filter(feat_count__gte=1)
relsways = Feature.geoobjects.filter(members__in=okmems)
Ultimately, I was wrong to set up the database using a numeric id in one table and a text-type id in the other. I am not very familiar with migrations yet, but at some point I'll have to take a deep dive into that world and figure out how to migrate my database to use numerics in both. For now, this works:
# ndtmp is a query set of member-less Features which Members can point to
# get the unique ids from ndtmp as strings
strids = (ndtmp.extra({'feature_id_str': "CAST(feature_id AS VARCHAR)"})
               .order_by('-feature_id_str')
               .values_list('feature_id_str', flat=True)
               .distinct())
# find all members whose ref values can be found in strids
okmems = Member.objects.filter(ref__in=strids)
# find all features containing one or more members in the accepted members list
relsways = Feature.geoobjects.filter(members__in=okmems)
# combine that with my existing list of allowed member-less features
op = relsways | ndtmp
# prove that this set is not empty
op.count()
# takes about 10 seconds
>>> 8997148 # looks like it worked!
Basically, I am making a query set of feature_ids (numerics) and casting it to be a query set of text-type (varchar) field values. I am then using values_list to make it only contain these string id values, and then I am finding all of the members whose ref ids are in that list of allowed Features. Now I know which members are allowed, so I can filter out all the Features which contain one or more members in that allowed list. Finally, I combine this query set of allowed Features which contain members with ndtmp, my original query set of allowed Features which do not contain members.
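Incidentally, if your Django version is 1.10 or newer, the same cast can be expressed without extra() via django.db.models.functions.Cast; a sketch under that assumption:
from django.db.models import CharField
from django.db.models.functions import Cast

# Annotate each Feature with its feature_id cast to text, then pull the
# distinct string ids -- equivalent to the extra()/CAST chain above.
strids = (ndtmp.annotate(feature_id_str=Cast('feature_id', CharField()))
               .values_list('feature_id_str', flat=True)
               .distinct())
okmems = Member.objects.filter(ref__in=strids)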

py2neo - How can I use merge_one function along with multiple attributes for my node?

I have overcome the problem of duplicate nodes being created in my DB by using the merge_one function, which works like this:
t=graph.merge_one("User","ID","someID")
which creates the node with a unique ID. My problem is that I can't find a way to add multiple attributes/properties to my node along with the ID, which is added automatically (a date, for example).
I managed to achieve this the old "duplicate" way, but it doesn't work now since merge_one can't accept more arguments! Any ideas?
Graph.merge_one only allows you to specify one key-value pair because it's meant to be used with a uniqueness constraint on a node label and property. Is there anything wrong with finding the node by its unique id with merge_one and then setting the properties?
t = graph.merge_one("User", "ID", "someID")
t['name'] = 'Nicole'
t['age'] = 23
t.push()
I know I am a bit late... but still useful I think
Using py2neo==2.0.7 and the docs (about Node.properties):
... and the latter is an instance of PropertySet which extends dict.
So the following worked for me:
m = graph.merge_one("Model", "mid", MID_SR)
m.properties.update({
    'vendor': "XX",
    'model': "XYZ",
    'software': "OS",
    'modelVersion': "",
    'hardware': "",
    'softwareVersion': "12.06"
})
graph.push(m)
This hacky function iterates through the labels and the property key/value pairs, gradually eliminating all nodes that don't match each criterion submitted. The final result will be a list of all (if any) nodes that match all the properties and labels supplied.
def find_multiProp(graph, *labels, **properties):
    results = None
    # generator of nodes matching one label and one property key/value pair
    genNodes = lambda l, k, v: graph.find(l, property_key=k, property_value=v)
    for l in labels:
        for k, v in properties.items():
            if results is None:
                results = [r for r in genNodes(l, k, v)]
                continue
            prevResults = results
            results = [n for n in genNodes(l, k, v) if n in prevResults]
    return results
The final result can be used to assess uniqueness and (if empty) create a new node, by combining the two functions together...
def merge_one_multiProp(graph, *labels, **properties):
    r = find_multiProp(graph, *labels, **properties)
    if not r:
        # remove tuple association
        node, = graph.create(Node(*labels, **properties))
    else:
        node = r[0]
    return node
example...
from py2neo import Node, Graph
graph = Graph()
properties = {'p1':'v1', 'p2':'v2'}
labels = ('label1', 'label2')
graph.create(Node(*labels, **properties))
for l in labels:
    graph.create(Node(l, **properties))
graph.create(Node(*labels, p1='v1'))
node = merge_one_multiProp(graph, *labels, **properties)

Indexing nodes in neo4j in python

I'm building a database with tag nodes and url nodes, where the url nodes are connected to tag nodes. If the same url is inserted into the database, it should link to the existing tag node rather than creating duplicate url nodes. I think indexing would solve this problem. How is it possible to do indexing and traversal with neo4jrestclient? A link to a tutorial would be fine. I'm currently using versae's neo4jrestclient.
Thanks
The neo4jrestclient supports both indexing and traversing the graph, but I think just using indexing could be enough for your use case. However, I don't know if I understood your problem properly. Anyway, something like this could work:
>>> from neo4jrestclient.client import GraphDatabase
>>> gdb = GraphDatabase("http://localhost:7474/db/data/")
>>> idx = gdb.nodes.indexes.create("urltags")
>>> url_node = gdb.nodes.create(url="http://foo.bar", type="URL")
>>> tag_node = gdb.nodes.create(tag="foobar", type="TAG")
We add the property count to the relationship to keep track of the number of times the URL "http://foo.bar" has been tagged with the tag foobar.
>>> url_node.relationships.create(tag_node["tag"], tag_node, count=1)
And after that, we index the url node according to the value of the URL.
>>> idx["url"][url_node["url"]] = url_node
Then, when we need to create a new URL node tagged with a TAG node, we first query the index to check whether it is already indexed. If not, we create the node and index it.
>>> new_url = "http://foo.bar2"
>>> nodes = idx["url"][new_url]
>>> if len(nodes):
... rel = nodes[0].relationships.all(types=[tag_node["tag"]])[0]
... rel["count"] += 1
... else:
... new_url_node = gdb.nodes.create(url=new_url, type="URL")
... new_url_node.relationships.create(tag_node["tag"], tag_node, count=1)
... idx["url"][new_url_node["url"]] = new_url_node
An important concept is that the indexes are key/value/object triplets where the object is either a node or a relationship you want to index.
Steps to create and use the index:
Create an instance of the graph database rest client.
from neo4jrestclient.client import GraphDatabase
gdb = GraphDatabase("http://localhost:7474/db/data/")
Create a node or relationship index (Creating a node index here)
index = gdb.nodes.indexes.create('latin_genre')
Add nodes to the index
nelly = gdb.nodes.create(name='Nelly Furtado')
shakira = gdb.nodes.create(name='Shakira')
index['latin_genre'][nelly.get('name')] = nelly
index['latin_genre'][shakira.get('name')] = shakira
Fetch nodes based on the index and do further processing:
for artist in index['latin_genre']['Shakira']:
    print(artist.get('name'))
More details can be found in the notes in the webadmin:
Neo4j has two types of indexes, node and relationship indexes. With
node indexes you index and find nodes, and with relationship indexes
you do the same for relationships.
Each index has a provider, which is the underlying implementation
handling that index. The default provider is lucene, but you can
create your own index providers if you like.
Neo4j indexes take key/value/object triplets ("object" being a node or
a relationship), it will index the key/value pair, and associate this
with the object provided. After you have indexed a set of
key/value/object triplets, you can query the index and get back
objects that were indexed with key/value pairs matching your query.
For instance, if you have "User" nodes in your database, and want to
rapidly find them by username or email, you could create a node index
named "Users", and for each user index username and email. With the
default lucene configuration, you can then search the "Users" index
with a query like: "username:bob OR email:bob@gmail.com".
You can use the data browser to query your indexes this way; the
syntax for the above query is "node:index:Users:username:bob OR
email:bob@gmail.com".

Storing a directed, weighted, complete graph in the GAE datastore

I have a directed, weighted, complete graph with 100 vertices. The vertices represent movies, and the edges represent preferences between two movies. Each time a user visits my site, I query a set of 5 vertices to show to the user (the set changes frequently). Let's call these vertices A, B, C, D, E. The user orders them (i.e. ranks these movies from most to least favorite). For example, he might order them D, B, A, C, E. I then need to update the graph as follows:
Graph[D][B] += 1
Graph[B][A] += 1
Graph[A][C] += 1
Graph[C][E] += 1
So the count Graph[V1][V2] ends up representing how many users ranked (movie) V1 directly above (movie) V2. When the data is collected, I can do all kinds of offline graph analysis, e.g. find the sinks and sources of the graph to identify the most and least favorite movies.
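For instance, the per-visit update is just a loop over adjacent pairs of the ranked list (a sketch; graph here is a plain in-memory dict of dicts, not the datastore model discussed below):
# ranking is the user's order for one visit, e.g. ['D', 'B', 'A', 'C', 'E']
for v1, v2 in zip(ranking, ranking[1:]):
    graph[v1][v2] += 1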
The problem is: how do I store a directed, weighted, complete graph in the datastore? The obvious answer is this:
class Vertex(db.Model):
    name = db.StringProperty()

class Edge(db.Model):
    better = db.ReferenceProperty(Vertex, collection_name='better_set')
    worse = db.ReferenceProperty(Vertex, collection_name='worse_set')
    count = db.IntegerProperty()
But the problem I see with this is that I have to make 4 separate ugly queries along the lines of:
edge = Edge.all().filter('better =', vertex1).filter('worse =', vertex2).get()
Then I need to update and put() the new edges in a fifth query.
A more efficient (fewer queries) but hacky implementation would be this one, which uses pairs of lists to simulate a dict:
class Vertex(db.Model):
    name = db.StringProperty()
    better_keys = db.ListProperty(db.Key)
    better_values = db.ListProperty(int)
So to add a score saying that A is better than B, I would do:
index = vertexA.better_keys.index(vertexB.key())
vertexA.better_values[index] += 1
Is there a more efficient way to model this?
I solved my own problem with a minor modification to the first design I suggested in my question.
I learned about the key_name argument that lets me set my own key names. So every time I create a new edge, I pass in the following argument to the constructor:
key_name = vertex1.name + ' > ' + vertex2.name
Then, instead of running this query multiple times:
edge = Edge.all().filter('better =', vertex1).filter('worse =', vertex2).get()
I can retrieve the edges easily since I know how to construct their keys. Using the Key.from_path() method, I construct a list of keys that refer to edges. Each key is obtained by doing this:
db.Key.from_path('Edge', vertex1.name + ' > ' + vertex2.name)
I then pass that list of keys to db.get() to fetch all the edges in one query.
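Putting the pieces together, a minimal sketch of that batched pattern (old GAE db API; ranking is a hypothetical ordered list of Vertex entities for one visit, e.g. D, B, A, C, E):
pairs = list(zip(ranking, ranking[1:]))
keys = [db.Key.from_path('Edge', v1.name + ' > ' + v2.name)
        for v1, v2 in pairs]
edges = db.get(keys)  # one batched fetch; missing edges come back as None
updated = []
for edge, (v1, v2) in zip(edges, pairs):
    if edge is None:  # first time this adjacent pair has been ranked
        edge = Edge(key_name=v1.name + ' > ' + v2.name,
                    better=v1, worse=v2, count=0)
    edge.count += 1
    updated.append(edge)
db.put(updated)  # one batched write instead of five queries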
