I created a class that is formatted as follows:
class PathStructure(object):
    def __init__(self, Description, ID, Parent):
        self.Description = Description
        self.ID = ID
        self.Parent = Parent
        self.Children = []
Here Description, ID and Parent are strings, and Children is a list of PathStructure objects, so I know all the parent-child relationships. I want to construct a graphical representation of this tree, so that each PathStructure object becomes a node and the parent-child relationships link the nodes. Creating the nodes is easy, I think:
nodes = {}
for item in pathstructure_list:
    name = item.Description
    nodes[name] = item
I am having trouble thinking of a way to link these nodes to create a tree structure. I have looked at examples, but I am fairly new to using dicts, so I don't really understand the solutions, especially since I will be constructing a dict of objects.
EDIT:
To clarify, I initialize each PathStructure object from a spreadsheet of information, and then I determine the parent-child relationships. For example:
first = PathStructure('Master','1-234-5',None)
second = PathStructure('Sub One','2-345-6',first.ID)
third = PathStructure('Sub Two','3-456-7',first.ID)
fourth = PathStructure('Sub Three','4-597-8',second.ID)
pathstructs = [first, second, third, fourth]
And then I determine each object's children through a function, so I know each object's parent and child.
I was able to get close to where I want to be with the following:
import collections
full = collections.defaultdict(dict)
for item in pathstructs:
    name = item.Description
    ident = item.ID
    parent = item.Parent
    final = full[ident]
    if parent:
        full[parent][ident] = final
    else:
        root = final
But this method discards the PathStructure objects, so I end up with a tree of strings instead of a tree of objects.
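A minimal sketch of one way to keep the objects themselves, assuming (as in the EDIT) that every Parent value is either None or the ID of another object in pathstructs, and that each Children list starts empty. The dict maps each ID to its PathStructure object, so linking appends the object rather than a string; link_tree and show are hypothetical helper names.

```python
class PathStructure(object):
    def __init__(self, Description, ID, Parent):
        self.Description = Description
        self.ID = ID
        self.Parent = Parent
        self.Children = []


def link_tree(pathstructs):
    """Attach each object to its parent's Children list; return the roots."""
    by_id = {item.ID: item for item in pathstructs}  # ID -> PathStructure object
    roots = []
    for item in pathstructs:
        if item.Parent is None:
            roots.append(item)  # no parent: top of a tree
        else:
            by_id[item.Parent].Children.append(item)  # store the object, not a string
    return roots


def show(node, depth=0):
    """Print the tree below node, indenting two spaces per level."""
    print("  " * depth + node.Description)
    for child in node.Children:
        show(child, depth + 1)


first = PathStructure('Master', '1-234-5', None)
second = PathStructure('Sub One', '2-345-6', first.ID)
third = PathStructure('Sub Two', '3-456-7', first.ID)
fourth = PathStructure('Sub Three', '4-597-8', second.ID)
pathstructs = [first, second, third, fourth]

roots = link_tree(pathstructs)
for root in roots:
    show(root)
```

If the Children lists are already filled by another function, link_tree would append duplicates; in that case only the by_id lookup and the roots list are needed.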
Related
I'm trying to optimize my maze generation algorithm. At the moment I have a list of sets of nodes and a list of the nodes themselves. Nodes are stored as (x,y) tuples. At the beginning each set contains only one node. I pick a random border between two nodes and check whether they are in the same set. Here's the problem: I have to iterate through the list of sets and look at every single item until I find the set that contains the given node(s). I want to be able to access sets as a property of the Node class, but I also want my sets to contain objects of the Node class, and I run into this:
class Node:
    def __init__(self, xy:tuple, group:set):
        self.xy = xy
        self.group = group
node = Node((10, 10),{Node(10, 10),{Node(10, 10),{... and so on }}})
How do I create such a relation, so that I can access sets as node.group while the group property points to the set containing the other Node objects, without recursion?
Is this what you wanted?
class Node:
    def __init__(self, xy:tuple):
        self.xy = xy
        self.group = None
    def set_group(self, group:set):
        if self.group is not None:
            self.group.remove(self)
        group.add(self)
        self.group = group
node1 = Node((1,1))
node2 = Node((2,2))
group1 = set()
group2 = set()
node1.set_group(group1)
node2.set_group(group2)
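With set_group in place, joining two cells when a wall is removed can move every node of one group into the other, and the "same set?" check becomes a single identity comparison. A minimal sketch, reusing the Node class above; merge_groups is a hypothetical helper, and moving the smaller group into the larger keeps the number of moves down. For large mazes a dedicated disjoint-set (union-find) structure is the usual alternative.

```python
class Node:
    def __init__(self, xy: tuple):
        self.xy = xy
        self.group = None

    def set_group(self, group: set):
        # Leave the old group (if any) and join the new one
        if self.group is not None:
            self.group.remove(self)
        group.add(self)
        self.group = group


def merge_groups(a, b):
    """Merge the groups of a and b; return False if they were already joined."""
    if a.group is b.group:
        return False  # same set: removing this wall would create a loop
    # Move every member of the smaller group into the larger one
    small, large = sorted((a.group, b.group), key=len)
    for node in list(small):  # copy, because set_group removes from small
        node.set_group(large)
    return True


a = Node((0, 0))
b = Node((0, 1))
c = Node((1, 0))
for n in (a, b, c):
    n.set_group(set())  # every cell starts in its own set

print(merge_groups(a, b))  # True
print(merge_groups(a, b))  # False
print(merge_groups(b, c))  # True
```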
I have a 2D-list that looks something like this:
[
["elem1","elem2"],
["elem1","elem3"],
["elem4","elem7"],
...
]
And I want to create a nested dictionary that then looks something like this:
[{"elem1":["elem2","elem3"]},{"elem4":"elem7"}]
So the higher the index in one of the initial sublists, the higher the hierarchical position in the generated tree. How would you go about this in Python? What do you call that, "treeification"? I feel like there has to be a package out there that does exactly that.
I don't imagine there is something in a library for this considering it is fairly simple and not that useful for most people. It is better to write the code manually.
First of all, the output format in the question cannot fully represent a tree: for example the data
[
["elem1", "elem2"],
["elem1", "elem3"],
["elem4", "elem7"],
["elem3", "elem5"],
]
...would need to be similar to [{"elem1":["elem2","elem3"]},{"elem4":"elem7"}] but add elem5 as a child of elem3; however, elem3 is a string, with no place to store children. Thus, I suggest the following output format:
{'elem4': {'elem7': {}}, 'elem1': {'elem2': {}, 'elem3': {'elem5': {}}}}
Here every node is represented as a dictionary from child node names to child node values, so a tree containing only a root node looks like {}, and a tree with 3 nodes (the root + 2 children) looks like {'child1': {}, 'child2': {}}.
To turn a list of parent-child associations into such a tree, you can use this code:
def treeify(data):
    # result dictionary
    map_list = {}
    # initially all nodes with a child, will have items removed later
    root_nodes = {parent for parent, child in data}
    for parent, child in data:
        # get the dictionary that this node maps to (empty dictionary by default)
        children = map_list.setdefault(parent, {})
        # add this connection
        children[child] = map_list.setdefault(child, {})
        # remove node with a parent from the set of root_nodes
        if child in root_nodes:
            root_nodes.remove(child)
    # return the dictionary with only root nodes at the root
    return dict((root_node, map_list[root_node]) for root_node in root_nodes)
print(treeify([
    ["elem1", "elem2"],
    ["elem1", "elem3"],
    ["elem4", "elem7"],
    ["elem3", "elem5"],
]))
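To inspect a tree in this nested-dict format, a short recursive walk can list one node per line, indented by depth. A minimal sketch; render is a hypothetical name, and the tree literal is the example output shown earlier. Sorting the keys gives a stable order, since the root nodes come out of a set in treeify.

```python
def render(tree, depth=0):
    """Return one indented line per node of a nested-dict tree."""
    lines = []
    for name in sorted(tree):  # sorted for a stable, readable order
        lines.append("  " * depth + name)
        lines.extend(render(tree[name], depth + 1))  # recurse into the children
    return lines


tree = {'elem4': {'elem7': {}}, 'elem1': {'elem2': {}, 'elem3': {'elem5': {}}}}
print("\n".join(render(tree)))
```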
Here is code that produces output close to what you asked for:
data = [
    ["elem1","elem2"],
    ["elem1","elem3"],
    ["elem4","elem7"],
]
maplist = {}
for a in data:
    if a[0] in maplist:
        maplist[a[0]].append(a[1])
    else:
        maplist[a[0]] = [a[1]]
print(maplist)
To sort the entries by their number of children (largest first), you can use:
sorted_items = sorted(maplist.items(), key=lambda item: len(item[1]), reverse=True)
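The same grouping can be written with collections.defaultdict(list), which creates the empty list the first time a parent is seen and removes the if/else. A minimal sketch using the same data:

```python
from collections import defaultdict

data = [
    ["elem1", "elem2"],
    ["elem1", "elem3"],
    ["elem4", "elem7"],
]

maplist = defaultdict(list)  # missing keys start as an empty list
for parent, child in data:
    maplist[parent].append(child)

print(dict(maplist))  # {'elem1': ['elem2', 'elem3'], 'elem4': ['elem7']}
```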
I would like to create a tree by dynamically adding nodes to an already existing tree in DendroPy. Here is how I am proceeding:
>>> t1 = dendropy.Tree(stream=StringIO("(8,3)"),schema="newick")
That creates a small tree with two children having taxon labels 8 and 3. Now I want to add a new leaf to the node with taxon label 3. To do that, I need the node object:
>>> cp = t1.find_node_with_taxon_label('3')
At that point I want to use add_child, which is a method of the node:
>>> n = dendropy.Node(taxon='5',label='5')
>>> cp.add_child(n)
But even after adding the node, when I print all the node objects in t1, it returns only the children 8 and 3 that it was initialized with.
Please help me understand how to add nodes to an existing tree in DendroPy.
If we print t1 we see the tree, but even after adding the elements I cannot find the objects that were added. For example, if we do
>>> cp1 = t1.find_node_with_taxon_label('5')
it does not return the object related to 5.
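To add a taxon, you have to explicitly create it and add it to the tree's taxon set:
import dendropy
from StringIO import StringIO  # Python 2; on Python 3 use "from io import StringIO"
t1 = dendropy.Tree(stream=StringIO("(8,3)"),schema="newick")
# Explicitly create and add the taxon to the taxon set
taxon_1 = dendropy.Taxon(label="5")
t1.taxon_set.add_taxon(taxon_1)
# Create a new node and assign a taxon OBJECT to it (not a label)
n = dendropy.Node(taxon=taxon_1, label='5')
# Attach the new node under the node labelled 3, as in the question
cp = t1.find_node_with_taxon_label('3')
cp.add_child(n)
# Now this works
print(t1.find_node_with_taxon_label("5"))
The key is that find_node_with_taxon_label looks the label up in t1.taxon_set, the tree's list of taxa, so a taxon that was never added there cannot be found.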
I have a couple of functions like this:
object obj.getChild(childIndex)
int obj.numChildren()
So I am using these to create this function:
collection obj.getChildren()
I am flexible about the return type, but I will be doing a lot of "subtraction" and "multiplication" with another list. So something like this:
children = obj.getChildren()
children = [5,10,15,20,25]
globalChildren = [1,2,3,4,5,6,7,8,9,10,12,14,16,18,20]
difference = children - globalChildren
difference = [15,25]
shared = children * globalChildren
shared = [5,10,20]
Is there a fast and elegant way to do these or do I have to go through each element one by one and gather the elements manually?
You're looking for sets:
children = {5,10,15,20,25}
globalChildren = {1,2,3,4,5,6,7,8,9,10,12,14,16,18,20}
difference = children - globalChildren
shared = children & globalChildren
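A sketch of getChildren built on the two accessors from the question, returning a set so that - and & work directly. Obj is a hypothetical stand-in for the real object, and get_children is a hypothetical name. Sets discard order and duplicates; if the order of the children matters, a list comprehension that tests membership against a set keeps it while staying fast.

```python
class Obj:
    """Hypothetical stand-in exposing the question's two accessors."""
    def __init__(self, children):
        self._children = list(children)

    def numChildren(self):
        return len(self._children)

    def getChild(self, childIndex):
        return self._children[childIndex]


def get_children(obj):
    """Collect every child into a set using the index-based accessors."""
    return {obj.getChild(i) for i in range(obj.numChildren())}


obj = Obj([5, 10, 15, 20, 25])
children = get_children(obj)
global_children = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 16, 18, 20}

difference = children - global_children  # {15, 25}
shared = children & global_children       # {5, 10, 20}

# If the original order matters, keep a list and test membership against the set
children_list = [obj.getChild(i) for i in range(obj.numChildren())]
ordered_difference = [c for c in children_list if c not in global_children]  # [15, 25]
```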
I'm using Python and I have some data that I want to put into a tree format and assign codes to. Here's some example data:
Africa North Africa Algeria
Africa North Africa Morocco
Africa West Africa Ghana
Africa West Africa Sierra Leone
What would be an appropriate tree structure for this data?
Also, is there a way that I can then retrieve numerical codes from this tree structure, so that I could query the data and get codes like the following example?
def get_code(place_name):
    # Python magic query to my tree structure
    return code
get_code("Africa") # returns 1
get_code("North Africa") # returns 1.1
get_code("Morocco") # returns 1.1.2
Thank you for your help - I still have much to learn about Python :)
I would recommend, assuming you can count on there being no duplication among the names, something like:
class Node(object):
    byname = {}
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.byname[name] = self
        if parent is None:  # root pseudo-node
            self.code = 0
        else:  # all normal nodes
            self.parent.children.append(self)
            self.code = len(self.parent.children)
    def get_codes(self, codelist):
        if self.code:
            codelist.append(str(self.code))
            self.parent.get_codes(codelist)
root = Node('')
def get_code(nodename):
    node = Node.byname.get(nodename)
    if node is None: return ''
    codes = []
    node.get_codes(codes)
    codes.reverse()
    return '.'.join(codes)
Do you also want to see the Python code for how to add a node given a hierarchical sequence of names, such as ['Africa', 'North Africa', 'Morocco']? I hope it would be pretty clear given the above structure, so you might want to do it yourself as an exercise, but of course do ask if you'd rather see a solution instead;-).
Getting the hierarchical sequence of names from a text line (string) depends on what the separators are. In your example it looks like it's just a bunch of spaces added for purely aesthetic reasons, to line up the columns (if that's the case, I'd recommend a simple re-based approach that splits on sequences of two or more spaces), but if the separators are actually (e.g.) tab characters, the csv module from Python's standard library would serve you better. I just can't tell from the short example you posted in your Q!-)
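A minimal sketch of the re-based split, assuming columns are separated by at least two whitespace characters while names contain only single spaces; split_names is a hypothetical helper. For tab-separated input, csv.reader(f, delimiter="\t") would be the alternative.

```python
import re


def split_names(line):
    # Split on runs of two or more whitespace characters, so the single
    # spaces inside names such as "North Africa" are preserved
    return re.split(r"\s{2,}", line.strip())


print(split_names("Africa      North Africa      Morocco"))
# ['Africa', 'North Africa', 'Morocco']
```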
Edit: the OP says they can get the sequence of names just fine but would like to see the code to add the relevant nodes from those -- so, here goes!-)
def addnodes(names):
    parent = root
    for name in names:
        newnode = Node.byname.get(name)
        if newnode is None:
            newnode = Node(name, parent)
        parent = newnode
See why it's important that node names are unique, to make the above class work? Since Node.byname is a single per-class dict, it can record only one "corresponding node" for each given name -- thus, a name that's duplicated in two or more places in the hierarchy would "clash" and only one of the two or more nodes would be properly recorded.
But then again, the function get_code which the OP says is the main reason for this whole apparatus couldn't work as desired if a name could be ambiguous, since the OP's specs mandate it returning only one string. So, some geographical list like
America United States Georgia
Europe Eastern Europe Georgia
(where two completely unrelated areas just happen to be both named 'Georgia' -- just the kind of thing that unfortunately often happens in real-world geography, as the above example shows!-) would destroy the whole scheme (depending on how the specs for get_code happen to be altered to deal with an ambiguous-name argument, of course, the class structure could surely be altered accordingly to accommodate the new, drastically different specs!).
The nice thing about encapsulating these design decisions in a class (albeit in this case with a couple of accompanying functions -- they could elegantly be made into class methods, of course, but the OP's specs rigidly demand that get_code be a function, so I decided that, in that case, addnodes might as well also be one!-) is that the specific design decisions are mostly hidden from the rest of the code and thus can easily be altered (as long as specs never change, of course -- that's why it's so crucial to spend time and attention defining one's API specs, much more than on any other part of design and coding!-) to refactor the internal behavior (e.g. for optimization, ease of debugging/testing, and so on) while keeping API-specified semantics intact, and thus leaving all other parts of the application pristine (not even needing re-testing, actually, as long of course as the parts that implement the API are very thoroughly unit-tested -- not hard to do, since they're nicely isolated and stand-alone!-).
An ad-hoc POD ("plain old data") class representing a tree would do fine, something like:
class Location(object):
    def __init__(self, data, parent):
        self.data = data
        self.parent = parent
        self.children = []
Now assign/read the data attribute, or add/remove children, perhaps with helper methods:
    def add_child(self, child):
        self.children.append(child)
Now, to actually divide your data into tree levels, a simple algorithm would be to look at all the places with common level data (such as Africa) and assign them a location, then do the same recursively for the next level of data.
So, for Africa you create a location with data = Africa. Then, it will have a Location child for North Africa, West Africa and so on.
For "get the code", keep a dictionary mapping each country to its location node, and use the parent links in the nodes. Traverse from the node up to the root (until parent is None), at each level taking the node's index in its parent's children list (plus one, for 1-based codes) as that part of the code.
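A minimal sketch of this approach, assuming names are unique (the lookup dict is keyed by name) and building the tree row by row instead of level by level, which gives the same children order for this data. The unnamed root, by_name and get_code are illustrative choices; list.index is linear in the number of siblings, which is fine for lists of countries.

```python
class Location(object):
    def __init__(self, data, parent=None):
        self.data = data
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)  # keep insertion order for the codes


rows = [
    ("Africa", "North Africa", "Algeria"),
    ("Africa", "North Africa", "Morocco"),
    ("Africa", "West Africa", "Ghana"),
    ("Africa", "West Africa", "Sierra Leone"),
]

root = Location(None)  # unnamed root above the continents
by_name = {}           # place name -> Location node

for row in rows:
    parent = root
    for name in row:
        if name not in by_name:
            by_name[name] = Location(name, parent)
        parent = by_name[name]


def get_code(place_name):
    node = by_name[place_name]
    parts = []
    while node.parent is not None:
        # 1-based position of this node among its parent's children
        parts.append(str(node.parent.children.index(node) + 1))
        node = node.parent
    return ".".join(reversed(parts))


print(get_code("Africa"))        # 1
print(get_code("North Africa"))  # 1.1
print(get_code("Morocco"))       # 1.1.2
```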
I am not sure I have got this right. If we keep every object in a global dict, that defeats the purpose of using a tree, which is then only used to construct the numbering scheme.
But the tree based representation looks something like this:
class Location(object):
    allLocation = {}
    def __init__(self, name):
        self.name = name
        self.parent = None
        self.number = "0"
        self.children = {}
    def putChild(self, childLocation):
        if childLocation.name not in self.allLocation:
            # Now adjust the number scheme
            #if self.number is "0":
            #    this is root
            numScheme = str(len(self.children) + 1)
            childLocation.setNumber(numScheme)
            # Add the child
            self.children[childLocation.number] = childLocation
            self.allLocation[childLocation.name] = childLocation
            childLocation.parent = self
            return 0
        else:
            return 1  # Location already a child of the current location
    def setNumber(self, num):
        if self.number != "0":
            # raise an exception, number already adjusted
            pass
        else:
            # set the number
            self.number = num
    def locateChild(self, numScheme):
        # Break up the numScheme and pass the tokens on successively, top level first
        numSchemeList = []
        if isinstance(numScheme, str):
            numSchemeList = numScheme.split(".")
        else:
            numSchemeList = numScheme
        if len(numSchemeList) >= 1:
            k = numSchemeList.pop(0)
            # if the child is available
            if k in self.children:
                childReferenced = self.children[k]
                # Is child of child required
                if len(numSchemeList) >= 1:
                    return childReferenced.locateChild(numSchemeList)
                else:
                    return childReferenced
            else:
                # No such child
                return None
        else:
            # The list is empty, search ends here
            return None
    def getScheme(self, name):
        if name in self.allLocation:
            locObj = self.allLocation[name]
            return locObj.getNumScheme(name, "")
        else:
            return None
    def getNumScheme(self, name, numScheme="0"):
        if not self.parent:
            return numScheme
        if numScheme != "":
            return self.parent.getNumScheme(name, self.number + "." + numScheme)
        else:
            return self.parent.getNumScheme(name, self.number)
root = Location("root")
africa = Location("Africa")
asia = Location("Asia")
america = Location("America")
root.putChild(africa)
root.putChild(asia)
root.putChild(america)
nafrica = Location("North Africa")
africa.putChild(nafrica)
nafrica.putChild(Location("Morocco"))
obj = root.locateChild("1.1.1")
print(obj.name)
print(root.getScheme("Morocco"))
This code may be hideous, but I wanted to post it because I put some time into it :)
tree = file_to_list_of_tuples(thefile)  # helper (not shown) returning (continent, region, country) tuples
d = {}
i = 1
for continent, region, country in tree:
    if continent not in d:
        d[continent] = [i, 0, 0]
        i += 1
    cont_code = d[continent][0]
    if region not in d:
        max_reg_code = max([y for x, y, z in d.values() if x == cont_code])
        d[region] = [cont_code, max_reg_code + 1, 0]
    reg_code = d[region][1]
    if country not in d:
        max_country_code = max([z for x, y, z in d.values() if x == cont_code and y == reg_code])
        d[country] = [cont_code, reg_code, max_country_code + 1]
def get_code(x):
    print(d[x])
get_code will print lists, but you can easily make them print in the format you want.
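For example, a hypothetical format_code helper can turn the stored lists into the dotted form from the question by dropping the zero placeholders of unused levels (real codes start at 1, so only placeholders are 0). The d below is written out by hand with the values the loop above produces for part of the sample data.

```python
def format_code(code):
    """Join the non-zero levels of a [continent, region, country] list with dots."""
    return ".".join(str(n) for n in code if n)


# Values produced by the loop above for part of the sample data
d = {
    "Africa": [1, 0, 0],
    "North Africa": [1, 1, 0],
    "Morocco": [1, 1, 2],
    "Sierra Leone": [1, 2, 2],
}

for name in ("Africa", "North Africa", "Morocco", "Sierra Leone"):
    print(name, format_code(d[name]))
```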
You might use the itertree package (I'm the author):
from itertree import *
#1. create the tree:
root2=iTree('root')
root2.append(iTree('Africa'))
root2[0].append(iTree('North Africa'))
root2[0].append(iTree('West Africa'))
root2[0][0].append(iTree('Algeria'))
root2[0][0].append(iTree('Morocco'))
item=iTree('Ghana') # keep the item for easier access
root2[0][1].append(item)
root2[0][1].append(iTree('Sierra Leone'))
# get the index path information of an item:
print('"Ghana" item index path:',item.idx_path)
# you can also search for items:
result = root2.find(['**', 'Morocco'])
print('"Morocco" item index path:', result.idx_path)
Executing the script prints:
"Ghana" item index path: [0, 1, 0]
"Morocco" item index path: [0, 0, 1]
Besides this, you can add additional data to each item and do filtered searches.