Adding new nodes to a Tree by dendroPy - python

I would like to create a tree by dynamically adding nodes to an already
existing tree in DendroPy. Here is how I am proceeding:
>>> t1 = dendropy.Tree(stream=StringIO("(8,3)"),schema="newick")
That creates a small tree with two children having taxon labels 8 and 3. Now
I want to add a new leaf to the node with taxon label 3. To do that I need the node
object.
>>> cp = t1.find_node_with_taxon_label('3')
At that point I want to use the add_child function, which is a method of a node.
>>> n = dendropy.Node(taxon='5',label='5')
>>> cp.add_child(n)
But even after adding the node, when I print all the node objects in t1, it
returns only the children 8 and 3 that it was initialized with.
Please help me understand how to add nodes to an existing tree in DendroPy.
If we print t1 we see the tree, but even after adding the elements
I cannot find the objects that were added. For example, if we do
>>> cp1 = t1.find_node_with_taxon_label('5')
it does not return the object related to 5.

To add a taxon you have to explicitly create it and add it to the tree's taxon set:
import dendropy
from StringIO import StringIO  # Python 2, matching the print statement below
t1 = dendropy.Tree(stream=StringIO("(8,3)"),schema="newick")
# Explicitly create and add the taxon to the taxon set
taxon_1 = dendropy.Taxon(label="5")
t1.taxon_set.add_taxon(taxon_1)
# Create a new node and assign a taxon OBJECT to it (not a label)
n = dendropy.Node(taxon=taxon_1, label='5')
# Attach the node to the tree, as in the question
cp = t1.find_node_with_taxon_label('3')
cp.add_child(n)
# Now this works
print t1.find_node_with_taxon_label("5")
The key is that find_node_with_taxon_label searches the t1.taxon_set list of taxa.
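The lookup behaviour described above can be pictured with a small toy model. This is only a sketch: the Taxon, Node and Tree classes below are simplified stand-ins written for illustration, not DendroPy's actual implementation, and they assume that the lookup first resolves the label in the tree's taxon set and then searches for a node carrying that taxon object.

```python
# Toy stand-ins only; these classes are NOT DendroPy's implementation.
class Taxon:
    def __init__(self, label):
        self.label = label

class Node:
    def __init__(self, taxon=None):
        self.taxon = taxon
        self.children = []

    def add_child(self, node):
        self.children.append(node)
        return node

class Tree:
    def __init__(self):
        self.taxon_set = []   # registry of Taxon objects known to the tree
        self.root = Node()

    def nodes(self):
        # Depth-first walk over every node in the tree
        stack = [self.root]
        while stack:
            node = stack.pop()
            yield node
            stack.extend(node.children)

    def find_node_with_taxon_label(self, label):
        # Step 1: resolve the label to a registered Taxon object
        taxon = next((t for t in self.taxon_set if t.label == label), None)
        if taxon is None:
            return None
        # Step 2: find the node carrying that exact Taxon object
        return next((n for n in self.nodes() if n.taxon is taxon), None)

tree = Tree()
# A bare string as the taxon, never registered: the lookup cannot find it
tree.root.add_child(Node(taxon="5"))
missing = tree.find_node_with_taxon_label("5")

# A registered Taxon object on an attached node: the lookup succeeds
taxon_5 = Taxon("5")
tree.taxon_set.append(taxon_5)
attached = tree.root.add_child(Node(taxon=taxon_5))
found = tree.find_node_with_taxon_label("5")
print(missing, found is attached)
```

In this model both conditions matter: the taxon has to be registered, and the node holding it has to be attached to the tree.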

Related

Build tree-hierarchy from two-dimensional list

I have a 2D-list that looks something like this:
[
["elem1","elem2"],
["elem1","elem3"],
["elem4","elem7"],
...
]
And I want to create a nested dictionary that then looks something like this:
[{"elem1":["elem2","elem3"]},{"elem4":"elem7"}]
So the higher the index within one of the initial sublists, the higher the hierarchical position in the generated tree will be. How would you go about this in Python? What do you call that, "treeification"? I feel like there has to be a package out there that does exactly that.
I don't imagine there is something in a library for this considering it is fairly simple and not that useful for most people. It is better to write the code manually.
First of all, the output format in the question cannot fully represent a tree: for example the data
[
["elem1", "elem2"],
["elem1", "elem3"],
["elem4", "elem7"],
["elem3", "elem5"],
]
...would need to be similar to [{"elem1":["elem2","elem3"]},{"elem4":"elem7"}] but add elem5 as a child of elem3; however, elem3 is a string, with no place for children to be stored. Thus, I suggest the following output format:
{'elem4': {'elem7': {}}, 'elem1': {'elem2': {}, 'elem3': {'elem5': {}}}}
Here every node is represented as a dictionary from child node names to child node values, so a tree containing only a root node looks like {}, and a tree with 3 nodes (the root + 2 children) looks like {'child1': {}, 'child2': {}}.
To turn a list of parent-child associations into such a tree you can use this code:
def treeify(data):
    # result dictionary
    map_list = {}
    # initially all nodes with a child, will have items removed later
    root_nodes = {parent for parent, child in data}
    for parent, child in data:
        # get the dictionary that this node maps to (empty dictionary by default)
        children = map_list.setdefault(parent, {})
        # add this connection
        children[child] = map_list.setdefault(child, {})
        # remove node with a parent from the set of root_nodes
        if child in root_nodes:
            root_nodes.remove(child)
    # return the dictionary with only root nodes at the root
    return dict((root_node, map_list[root_node]) for root_node in root_nodes)
print(treeify([
    ["elem1", "elem2"],
    ["elem1", "elem3"],
    ["elem4", "elem7"],
    ["elem3", "elem5"],
]))
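Once the tree is in the nested-dictionary format, walking it is a short recursion. The following is a minimal sketch; the helper names iter_paths and depth are made up for this example and are not part of any library.

```python
def iter_paths(tree, prefix=()):
    """Yield every root-to-node path in a nested-dict tree as a tuple of names."""
    for name, children in tree.items():
        path = prefix + (name,)
        yield path
        # Recurse into this node's own children dictionary
        for sub_path in iter_paths(children, path):
            yield sub_path

def depth(tree):
    """Number of levels in the tree; an empty dictionary has depth 0."""
    return 1 + max(map(depth, tree.values())) if tree else 0

tree = {'elem4': {'elem7': {}}, 'elem1': {'elem2': {}, 'elem3': {'elem5': {}}}}
for path in iter_paths(tree):
    # Indent each name by its level in the tree
    print("  " * (len(path) - 1) + path[-1])
```

Because every node is itself a dictionary, the same function handles the whole tree and any subtree.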
Here is code that produces output close to what you asked for, as a dictionary mapping each parent to a list of its children:
data = [
["elem1","elem2"],
["elem1","elem3"],
["elem4","elem7"],
]
maplist = {}
for a in data:
    if a[0] in maplist:
        maplist[a[0]].append(a[1])
    else:
        maplist[a[0]] = [a[1]]
print(maplist)
To sort the entries by the number of children, largest first, you can use the code below:
sorted_items = sorted(maplist.items(), key=lambda item: len(item[1]), reverse=True)
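The same grouping can be written a little more compactly with collections.defaultdict from the standard library. This is just a sketch of an alternative, using the sample data from the question; the name children_by_parent is chosen for this example.

```python
from collections import defaultdict

data = [
    ["elem1", "elem2"],
    ["elem1", "elem3"],
    ["elem4", "elem7"],
]

# defaultdict(list) creates the empty list the first time a parent is seen
children_by_parent = defaultdict(list)
for parent, child in data:
    children_by_parent[parent].append(child)

# Parents with the most children first
sorted_items = sorted(children_by_parent.items(),
                      key=lambda item: len(item[1]),
                      reverse=True)
print(sorted_items)
```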

Construct a Tree with list of Objects in Python

I created a class that is formatted as follows:
class PathStructure(object):
    def __init__(self, Description, ID, Parent):
        self.Description = Description
        self.ID = ID
        self.Parent = Parent
        self.Children = []
Where Description, ID and Parent are strings, and Children is a list of PathStructure objects; thus, I know all the parent-child relationships. I want to be able to construct a graphical representation of this tree, so each PathStructure object becomes a node with the parent-child relationships linking the nodes. Creating the nodes is easy, I think:
nodes = {}
for item in pathstructure_list:
    name = item.Description
    nodes[name] = item
I am having trouble thinking of a way to link these nodes to create a tree structure. I have looked at examples, but I am kind of new to using dicts, so I don't really understand the solutions, especially since I will be constructing a dict of objects.
EDIT:
To clarify, I initialize each PathStructure object from a spreadsheet of information, and then I determine the parent-child relationships. For example:
first = PathStructure('Master','1-234-5',None)
second = PathStructure('Sub One','2-345-6',first.ID)
third = PathStructure('Sub Two','3-456-7',first.ID)
fourth = PathStructure('Sub Three','4-597-8',second.ID)
pathstructs = [first, second, third, fourth]
And then I determine each object's children through a function, so I know each object's parent and children.
I was able to get close to where I want to be with the following:
import collections
full = collections.defaultdict(dict)
for item in pathstructs:
    name = item.Description
    ident = item.ID
    parent = item.Parent
    final = full[ident]
    if parent:
        full[parent][ident] = final
    else:
        root = final
But this method gets rid of the PathStructure objects, so I am stuck with a tree of strings instead of objects.
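One possible way to keep the objects is to index them by ID and append each object to its parent's Children list. The sketch below assumes that Parent holds the parent's ID string (or None for the root) and that Children starts out empty; link_tree and render are hypothetical helper names introduced for this example, and render only produces an indented text outline, not a drawing.

```python
class PathStructure(object):
    def __init__(self, Description, ID, Parent):
        self.Description = Description
        self.ID = ID
        self.Parent = Parent      # parent's ID string, or None for the root
        self.Children = []        # filled in by link_tree below

def link_tree(pathstructs):
    """Attach each object to its parent's Children list; return the root objects."""
    # Index the objects themselves by ID so lookups return PathStructure instances
    by_id = {item.ID: item for item in pathstructs}
    roots = []
    for item in pathstructs:
        if item.Parent is None:
            roots.append(item)
        else:
            by_id[item.Parent].Children.append(item)
    return roots

def render(node, indent=0):
    """Return the subtree under node as indented lines of text."""
    lines = ["  " * indent + "%s (%s)" % (node.Description, node.ID)]
    for child in node.Children:
        lines.extend(render(child, indent + 1))
    return lines

first = PathStructure('Master', '1-234-5', None)
second = PathStructure('Sub One', '2-345-6', first.ID)
third = PathStructure('Sub Two', '3-456-7', first.ID)
fourth = PathStructure('Sub Three', '4-597-8', second.ID)
pathstructs = [first, second, third, fourth]

roots = link_tree(pathstructs)
print("\n".join(render(roots[0])))
```

The tree is then made of the PathStructure objects themselves, so any attribute (Description, ID, and so on) stays available while traversing it.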

How to create a relationship between two existing nodes? I'm using neo4j.

I'm just starting with neo4j, using Python 3.5 and py2neo.
I built two graph nodes with the following code, and they were created successfully.
>>> u1 = Node("Person",name='Tom',id=1)
>>> u2 = Node('Person', name='Jerry', id=2)
>>> graph.create(u1,u2)
After that, I want to create a relationship between 'Tom' and 'Jerry'.
Tom's id property is 1 and Jerry's id property is 2,
so I think I have to refer to the two existing nodes by their id property.
I then tried to create the relationship like this:
>>> u1 = Node("Person",id=1)
>>> u2 = Node("Person",id=2)
>>> u1_knows_u2=Relationship(u1, 'KKNOWS', u2)
>>> graph.create(u1_knows_u2)
The above ran successfully, but the resulting graph is strange.
I don't know why unknown graph nodes were created, or why the relationship was created between two unknown nodes.
You can have two nodes with the same label and same properties. The second node you get with u1 = Node("Person",id=1) is not the same one you created before. It's a new node with the same label/property.
When you define two nodes (i.e. your new u1 and u2) and create a relationship between them, the whole pattern will be created.
To get the two nodes and create a relationship between them you would do:
# create Tom and Jerry as before
u1 = Node("Person",name='Tom',id=1)
u2 = Node('Person', name='Jerry', id=2)
graph.create(u1,u2)
# either use u1 and u2 directly
u1_knows_u2 = Relationship(u1, 'KKNOWS', u2)
graph.create(u1_knows_u2)
# or find existing nodes and create a relationship between them
existing_u1 = graph.find_one('Person', property_key='id', property_value=1)
existing_u2 = graph.find_one('Person', property_key='id', property_value=2)
existing_u1_knows_u2 = Relationship(existing_u1, 'KKNOWS', existing_u2)
graph.create(existing_u1_knows_u2)
find_one() assumes that your id properties are unique.
Note also that you can use the Cypher query language with Py2neo:
graph.cypher.execute('''
MERGE (tom:Person {name: "Tom"})
MERGE (jerry:Person {name: "Jerry"})
CREATE UNIQUE (tom)-[:KNOWS]->(jerry)
''')
The MERGE statement in Cypher is similar to "get or create". If a Person node with the given name "Tom" already exists, it will be bound to the variable tom; if not, the node will be created and then bound to tom. Combined with uniqueness constraints, this lets you avoid unwanted duplicate nodes.
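The difference between "always create" and "get or create" can be illustrated without a database. The sketch below uses plain dictionaries in a list as stand-ins for nodes; create_person and merge_person are hypothetical helpers for illustration only and are not py2neo or Neo4j calls.

```python
# Plain dicts stand in for nodes; this is not py2neo or Neo4j code.
nodes = []

def create_person(name):
    """Always add a new node, even if one with the same name already exists."""
    node = {"label": "Person", "name": name}
    nodes.append(node)
    return node

def merge_person(name):
    """Return the existing node with this name, or create it if none exists."""
    for node in nodes:
        if node["label"] == "Person" and node["name"] == name:
            return node
    return create_person(name)

tom = create_person("Tom")
duplicate_tom = create_person("Tom")   # a second, distinct node
merged_tom = merge_person("Tom")       # reuses the first matching node
jerry = merge_person("Jerry")          # no match, so a node is created
print(len(nodes), merged_tom is tom)
```

The duplicate created by the second create_person call mirrors what happened in the question, where a fresh Node object with the same properties became a new node in the graph.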
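Check this query. Note that it matches on the id property; id(a) would instead compare Neo4j's internal node id:
MATCH (a:Person),(b:Person) WHERE a.id = 1 AND b.id = 2 CREATE (a)-[r:KKNOWS]->(b) RETURN a, b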

py2neo - How can I use merge_one function along with multiple attributes for my node?

I have solved the problem of duplicate nodes being created in my DB by using the merge_one function, which works like this:
t=graph.merge_one("User","ID","someID")
which creates the node with a unique ID. My problem is that I can't find a way to add multiple attributes/properties (a date, for example) to my node along with the ID, which is added automatically.
I managed to achieve this the old "duplicate" way, but that doesn't work now since merge_one can't accept more arguments. Any ideas?
Graph.merge_one only allows you to specify one key-value pair because it's meant to be used with a uniqueness constraint on a node label and property. Is there anything wrong with finding the node by its unique id with merge_one and then setting the properties?
t = graph.merge_one("User", "ID", "someID")
t['name'] = 'Nicole'
t['age'] = 23
t.push()
I know I am a bit late... but still useful I think
Using py2neo==2.0.7 and the docs (about Node.properties):
... and the latter is an instance of PropertySet which extends dict.
So the following worked for me:
m = graph.merge_one("Model", "mid", MID_SR)
m.properties.update({
    'vendor':"XX",
    'model':"XYZ",
    'software':"OS",
    'modelVersion':"",
    'hardware':"",
    'softwareVersion':"12.06"
})
graph.push(m)
This hacky function iterates through the labels, properties and values, gradually eliminating every node that doesn't match each criterion submitted. The final result is a list of all (if any) nodes that match all the properties and labels supplied.
def find_multiProp(graph, *labels, **properties):
    results = None
    for l in labels:
        for k,v in properties.items():
            if results is None:
                # first criterion: fetch the initial candidate list
                genNodes = lambda l,k,v: graph.find(l, property_key=k, property_value=v)
                results = [r for r in genNodes(l,k,v)]
                continue
            # later criteria: keep only nodes that also matched earlier ones
            prevResults = results
            results = [n for n in genNodes(l,k,v) if n in prevResults]
    return results
The final result can be used to assess uniqueness and (if empty) create a new node, by combining the two functions together...
def merge_one_multiProp(graph, *labels, **properties):
    r = find_multiProp(graph, *labels, **properties)
    if not r:
        # remove tuple association
        node,= graph.create(Node(*labels, **properties))
    else:
        node = r[0]
    return node
example...
from py2neo import Node, Graph
graph = Graph()
properties = {'p1':'v1', 'p2':'v2'}
labels = ('label1', 'label2')
graph.create(Node(*labels, **properties))
for l in labels:
    graph.create(Node(l, **properties))
graph.create(Node(*labels, p1='v1'))
node = merge_one_multiProp(graph, *labels, **properties)
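The elimination idea itself can be tried out without a running database. The following sketch applies the same narrowing to an in-memory list, where plain dictionaries stand in for nodes; the data layout ("labels" and "props" keys) and the name find_multi_prop are assumptions made for this example, not part of py2neo.

```python
# Plain dicts stand in for nodes here; this is not py2neo code.
nodes = [
    {"labels": {"label1", "label2"}, "props": {"p1": "v1", "p2": "v2"}},
    {"labels": {"label1"}, "props": {"p1": "v1", "p2": "v2"}},
    {"labels": {"label2"}, "props": {"p1": "v1", "p2": "v2"}},
    {"labels": {"label1", "label2"}, "props": {"p1": "v1"}},
]

def find_multi_prop(nodes, *labels, **properties):
    """Return the nodes that carry every given label and every given property value."""
    results = list(nodes)
    # Narrow the candidates one criterion at a time
    for label in labels:
        results = [n for n in results if label in n["labels"]]
    for key, value in properties.items():
        results = [n for n in results if n["props"].get(key) == value]
    return results

matches = find_multi_prop(nodes, "label1", "label2", p1="v1", p2="v2")
print(len(matches))
```

The four sample nodes mirror the ones created in the example above: only the first carries both labels and both properties.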

test for node membership in pydot graph

pydot has a huge number of bound methods for getting and setting every little thing in a dot graph, reading and writing, you-name-it, but I can't seem to find a simple membership test.
>>> d = pydot.Dot()
>>> n = pydot.Node('foobar')
>>> d.add_node(n)
>>> n in d.get_nodes()
False
is just one of many things that didn't work. It appears that nodes, once added to a graph, acquire a new identity
>>> d.get_nodes()[0]
<pydot.Node object at 0x171d6b0>
>>> n
<pydot.Node object at 0x1534650>
Can anyone suggest a way to create a node and test to see if it's in a graph before adding it so you could do something like this:
d = pydot.Dot()
n = pydot.Node('foobar')
if n not in d:
    d.add_node(n)
Looking through the source code, http://code.google.com/p/pydot/source/browse/trunk/pydot.py, it seems that node names are unique values, used as the keys to locate the nodes within a graph's node dictionary (though, interestingly, rather than return an error for an existing node, it simply adds the attributes of the new node to those of the existing one).
So unless you want to add an implementation of __contains__() that performs this check to one of the classes in pydot.py, you can just do the following in your code:
if n.get_name() not in d.obj_dict['nodes'].keys():
    d.add_node(n)
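For illustration, here is roughly what a name-based __contains__() could look like. MiniNode and MiniGraph are hypothetical stand-ins written for this sketch, not pydot classes; they only mimic the detail described above, namely that nodes are stored in obj_dict['nodes'] under their names.

```python
# MiniNode and MiniGraph are hypothetical stand-ins, not pydot classes.
class MiniNode:
    def __init__(self, name):
        self.name = name

    def get_name(self):
        return self.name

class MiniGraph:
    def __init__(self):
        # Nodes are stored under their names, so names act as unique keys
        self.obj_dict = {"nodes": {}}

    def add_node(self, node):
        self.obj_dict["nodes"][node.get_name()] = node

    def __contains__(self, node):
        # Membership is decided by name, not by object identity
        return node.get_name() in self.obj_dict["nodes"]

d = MiniGraph()
n = MiniNode("foobar")
if n not in d:
    d.add_node(n)
print(n in d, MiniNode("foobar") in d)
```

With membership decided by name, a second node object with the same name also counts as already present, which matches the behaviour the name-keyed dictionary implies.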
