Hey everybody,
I've been looking for a built-in function that extracts the root of a tree in Python.
I haven't found one, and my attempts to write my own have not produced anything generic enough to fit all my needs.
Does anyone have something ready-made, or know how to get this information from a tree structure in Python?
Thanks
You have to roll your own:
class Node(object):
    def __init__(self, p=None):
        self.parent = p
        self.children = []

n1 = Node()
n2 = Node()
n1.children.append(n2)
n2.parent = n1
Of course you would want to have methods like addChild that would manage the .children and .parent attributes of the involved objects automatically.
Then you could write a method
def findRoot(node):
    # Follow the parent links upward until a node without a parent is reached.
    p = node
    while p.parent is not None:
        p = p.parent
    return p
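A minimal sketch of the addChild idea mentioned above, assuming a method named add_child that keeps .parent and .children in sync. The method name, the find_root spelling and the name attribute are illustrative choices, not part of any library:

```python
class Node:
    """Tree node whose add_child keeps both link directions consistent."""

    def __init__(self, name=None):
        self.name = name          # illustrative label, not in the original
        self.parent = None
        self.children = []

    def add_child(self, child):
        # Detach from a previous parent first so the child is never listed twice.
        if child.parent is not None:
            child.parent.children.remove(child)
        child.parent = self
        self.children.append(child)
        return child


def find_root(node):
    # Follow parent links upward until a node without a parent is reached.
    while node.parent is not None:
        node = node.parent
    return node


root = Node("root")
mid = root.add_child(Node("mid"))
leaf = mid.add_child(Node("leaf"))
print(find_root(leaf).name)  # root
```

Because add_child is the only place that touches the links, find_root can trust that every parent chain ends at a node whose parent is None.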
I want to create a Python script that prints out a directory tree.
I'm aware there is plenty of information on the topic, and many ways to achieve it.
Still, my problem is really about recursion.
To tackle the problem I chose an OOP approach:
Create a class Treenode
Store some properties and methods
Call the os.walk function (yes, I know I can use pathlib or other libs)
Recursively create the parent-child relationship of folders/files
First, the class Treenode:
properties: data, children, parent
methods: add_child(),
get_level(), to get the level of the parent/child relation in order to print it later
print_tree(), to actually print the tree (desired result shown below the code)
class Treenode:
    def __init__(self, data):
        self.data = data
        self.children = []
        self.parent = None

    def add_child(self,child):
        child.parent = self
        self.children.append(child)

    def get_level(self):
        level = 0
        p = self.parent
        while p:
            level += 1
            p = p.parent
        return level

    def print_tree(self):
        spaces = " " * self.get_level() * 3
        prefix = spaces + "|__" if self.parent else ""
        print(prefix + self.data)
        for child in self.children:
            child.print_tree()
Second, the problem: the function that creates the tree.
def build_tree(dir_path):
    for root,dirs,files in os.walk(dir_path):
        if dir_path == root:
            for d in dirs:
                directory = Treenode(d)
                tree.add_child(directory)
            for f in files:
                file = Treenode(f)
                tree.add_child(file)
            working_directories = dirs
        else:
            for w in working_directories:
                build_tree(os.path.join(dir_path,w))
    return tree
Finally, the main method:
if __name__ == '__main__':
    tree = Treenode("C:/Level0")
    tree = build_tree("C:/Level0")
    tree.print_tree()
    pass
The output of this code would be:
C:/Level0
   |__Level1
   |__0file.txt
   |__Level2
   |__Level2b
   |__1file1.txt
   |__1file2.txt
   |__Level3
   |__2file1.txt
   |__LEvel4
   |__3file1.txt
   |__4file1.txt
   |__2bfile1.txt
The desired output should be:
C:/Level0
   |__Level1
      |__Level2
         |__Level3
            |__LEvel4
               |__4file1.txt
            |__3file1.txt
         |__2file1.txt
      |__Level2b
         |__2bfile1.txt
      |__1file1.txt
      |__1file2.txt
   |__0file.txt
The problem lies in tree.add_child(directory): every time the code gets there, it adds the new directory (or file) as a child of the same root tree, not of tree.children.children and so on.
So here's the question: how do I get that? The if/else statement in the build_tree() function is probably unnecessary; I was trying to work my way around the problem, but no luck.
I know it's a dumb problem, coming from a lack of proper study of algorithms and data structures.
If you're willing to help, though, I'm here to learn ^^
This will do what you want:
import os

def build_tree(parent, dir_path):
    # List only this directory level; the recursion handles the levels below.
    child_list = os.listdir(dir_path)
    child_list.sort()
    for child in child_list:
        node = Treenode(child)
        parent.add_child(node)
        child_path = os.path.join(dir_path, child)
        if os.path.isdir(child_path):
            build_tree(node, child_path)
Then, for your main code, use:
if __name__ == '__main__':
    root_path = "C:/Level0"
    tree = Treenode(root_path)
    build_tree(tree, root_path)
    tree.print_tree()
The main change was to use os.listdir rather than os.walk. The problem with os.walk is that it recursively walks the entire directory tree, which doesn't work well with the recursive build_tree, which wants to operate on a single level at a time.
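To make the difference concrete, here is a small sketch that builds a throwaway directory tree with tempfile and compares the two calls. The helper name walk_summary and the file names are made up for illustration:

```python
import os
import tempfile


def walk_summary(top):
    """Return (relative_dir, dirs, files) for every directory os.walk visits."""
    rows = []
    for root, dirs, files in os.walk(top):
        dirs.sort()  # in-place sort also fixes the descent order (top-down mode)
        rows.append((os.path.relpath(root, top), list(dirs), sorted(files)))
    return rows


with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "Level1", "Level2"))
    open(os.path.join(top, "0file.txt"), "w").close()
    open(os.path.join(top, "Level1", "1file.txt"), "w").close()

    rows = walk_summary(top)             # one call, every level
    one_level = sorted(os.listdir(top))  # one call, one level

for row in rows:
    print(row)
print(one_level)
```

A single os.walk call yields one entry per directory in the whole tree, while os.listdir returns only the names directly inside the directory it is given, which is the shape a recursive function needs.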
You can use os.walk, but then don't use recursion, as you don't want to repeat the call to os.walk: one call gives all the data you need already. Instead use a dictionary to keep track of the hierarchy:
import os

def build_tree(dir_path):
    # Map each full path to its node so children can find their parent.
    helper = { dir_path: Treenode(dir_path) }
    for root, dirs, files in os.walk(dir_path, topdown=True):
        for item in dirs + files:
            node = helper[os.path.join(root, item)] = Treenode(item)
            helper[root].add_child(node)
    return helper[dir_path]

if __name__ == "__main__":
    tree = build_tree("C:/Level0")
    tree.print_tree()
I need help with a problem that I'm pretty sure dask can solve, but I don't know how to tackle it.
I need to construct a tree recursively.
For each node, if a criterion is met, a computation (compute_val) is done; otherwise 2 new children are created. The same treatment is applied to the children (build).
Then, once all the children of a node have performed a computation, we can proceed to a merge (merge). The merge either fuses the children (if they both meet a criterion) or does nothing.
So far I have only been able to parallelize the first level, and I don't know which dask tools I should use to be more effective.
This is a simplified sequential MRE of what I want to achieve:
import numpy as np
import time

class Node:
    def __init__(self, level):
        self.level = level
        self.val = None

def merge(node, childs):
    values = [child.val for child in childs]
    if all(values) and sum(values)<0.1:
        node.val = np.mean(values)
    else:
        node.childs = childs
    return node

def compute_val():
    time.sleep(0.1)
    return np.random.rand(1)

def build(node):
    print(node.level)
    if (np.random.rand(1) < 0.1 and node.level>1) or node.level>5:
        node.val = compute_val()
    else:
        childs = [build(Node(level=node.level+1)) for _ in range(2)]
        node = merge(node, childs)
    return node

tree = build(Node(level=0))
As I understand, the way you tackle recursion (or any dynamic computation) is to create tasks within a task.
I was experimenting with something similar, so below is my 5 minute illustrative solution. You'd have to optimise it according to characteristics of the algorithm.
Keep in mind that tasks add overhead, so you'd want to chunk the computations for optimal results.
Relevant doc:
https://distributed.dask.org/en/latest/task-launch.html
API reference:
https://distributed.dask.org/en/latest/api.html#distributed.worker_client
https://distributed.dask.org/en/latest/api.html#distributed.Client.gather
https://distributed.dask.org/en/latest/api.html#distributed.Client.submit
import numpy as np
import time
from dask.distributed import Client, worker_client

# Create a dask client
# For convenience, I'm creating a localcluster.
client = Client(threads_per_worker=1, n_workers=8)
client

class Node:
    def __init__(self, level):
        self.level = level
        self.val = None
        self.childs = None # This was missing

def merge(node, childs):
    values = [child.val for child in childs]
    if all(values) and sum(values)<0.1:
        node.val = np.mean(values)
    else:
        node.childs = childs
    return node

def compute_val():
    time.sleep(0.1) # Is this required.
    return np.random.rand(1)

def build(node):
    print(node.level)
    if (np.random.rand(1) < 0.1 and node.level>1) or node.level>5:
        node.val = compute_val()
    else:
        with worker_client() as client:
            child_futures = [client.submit(build, Node(level=node.level+1)) for _ in range(2)]
            childs = client.gather(child_futures)
        node = merge(node, childs)
    return node

tree_future = client.submit(build, Node(level=0))
tree = tree_future.result()
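On the point about task overhead and chunking: one possible shape is to create tasks only for the top few levels and build everything below with plain recursion. The sketch below illustrates that idea only. It swaps dask for the standard library's ThreadPoolExecutor so it runs without a cluster, uses a fixed depth instead of the random criterion, and leaves out the merge step. MAX_LEVEL, build_chunked, cutoff and count_leaves are hypothetical names:

```python
from concurrent.futures import ThreadPoolExecutor

MAX_LEVEL = 4  # hypothetical depth at which a node becomes a leaf


class Node:
    def __init__(self, level):
        self.level = level
        self.val = None
        self.childs = None


def build_sequential(node):
    # Plain recursion with no task per node: this is the "chunk".
    if node.level >= MAX_LEVEL:
        node.val = 1.0  # stand-in for compute_val()
    else:
        node.childs = [build_sequential(Node(node.level + 1)) for _ in range(2)]
    return node


def build_chunked(root, cutoff, max_workers=4):
    # Expand the first `cutoff` levels in the calling thread ...
    frontier = [root]
    for _ in range(cutoff):
        next_frontier = []
        for node in frontier:
            node.childs = [Node(node.level + 1) for _ in range(2)]
            next_frontier.extend(node.childs)
        frontier = next_frontier
    # ... then hand each remaining subtree to the pool as one task.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(build_sequential, frontier))  # list() waits and re-raises errors
    return root


def count_leaves(node):
    if node.childs is None:
        return 1
    return sum(count_leaves(child) for child in node.childs)


tree = build_chunked(Node(level=0), cutoff=2)
print(count_leaves(tree))  # 16
```

With cutoff=2 only four tasks are submitted, one per level-2 subtree, instead of one task per node. In a dask version the same cutoff test could decide between client.submit and a direct call to build, though that is an untested assumption here.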
I would need the simplest possible implementation for a data structure, that can be traversed in both parent->children and children->parent direction; so ideally the child should hold a reference to the parent as well.
Was thinking about a dictionary, where the children would simply hold a reference to their parent, similar to this:
# define the root node
a = {'name': 'trunk', 'value': 0, 'parent': None, 'children': []}
# add child
a['children'].append({'name': 'branch-1', 'value': 1,
'parent': a, 'children': []})
# and so on...
Is this safe to do? (Circular reference might impact garbage collection?) Does it make sense to do this? What would be simpler?
A simple Tree (Node) class, that can be traversed both ways:
class Tree(object):
    def __init__(self, data, children=None, parent=None):
        self.data = data
        self.children = children or []
        self.parent = parent

    def add_child(self, data):
        new_child = Tree(data, parent=self)
        self.children.append(new_child)
        return new_child

    def is_root(self):
        return self.parent is None

    def is_leaf(self):
        return not self.children

    def __str__(self):
        if self.is_leaf():
            return str(self.data)
        return '{data} [{children}]'.format(data=self.data, children=', '.join(map(str, self.children)))

> t = Tree('foo')
> bar = t.add_child('bar')
> baz = t.add_child('baz')
> print(t)
foo [bar, baz]
> print(bar.parent)
foo [bar, baz]
You would make a Node class.
The basic structure would look something like this, though honestly you could probably do it with dicts too. I personally feel classes look cleaner.
class Node(object):
    def __init__(self):
        self.parent = None # Single object
        self.child = [] # Array of objects
        self.name = None
        self.data = None
The rest depends on your needs. Some functions you may want built into your class (or, if you use hashes, written as functions in your script):
Update: takes a specific node and updates its values/name/what have you.
Delete: takes a specific node and removes it from the tree. If you do this, make sure to connect the deleted node's children to the deleted node's parent.
Insert: takes a specific point in the tree and adds a new node there. This should update the parent and children around the node.
Update children: appends children to the node.child array. Should be called from update parent, as the two processes are self-referential.
Update parent: deletes self from the parent.child array and adds self to the new_parent.child array.
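As a rough sketch of the "Update parent" and "Delete" operations described above, assuming the same parent/child attribute names as the class above (the method names set_parent and delete are made up for this example):

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.parent = None
        self.child = []   # same attribute name as in the class above

    def set_parent(self, new_parent):
        # "Update parent": leave the old parent's child list, join the new one.
        if self.parent is not None:
            self.parent.child.remove(self)
        self.parent = new_parent
        if new_parent is not None:
            new_parent.child.append(self)

    def delete(self):
        # "Delete": hand this node's children to its parent, then detach it.
        for c in list(self.child):   # copy, because set_parent mutates self.child
            c.set_parent(self.parent)
        self.set_parent(None)


root, mid, leaf = Node("root"), Node("mid"), Node("leaf")
mid.set_parent(root)
leaf.set_parent(mid)
mid.delete()
print([c.name for c in root.child])  # ['leaf']
```

Routing every change through set_parent keeps the two directions of the link from drifting apart, which is the self-referential bookkeeping the list warns about.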
If you want to easily reference specific parts of a node, you can make a hash_map as a sort of table of contents
node_tree_map = {}
node_tree_map[node.name] = node
# node_tree_map['name'] would allow you quick access to bob's
# parent/children/value
# just by knowing the name but without having to traverse
# the whole tree to find it
The above will allow you to easily dive into specific nodes if necessary.
Btw, removing a node from being referenced in the tree or hash map would make garbage collection a non issue.
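On the garbage-collection question itself: CPython's cyclic garbage collector can reclaim parent/child reference cycles, so the two-way links are safe. If you would rather avoid the cycle altogether, one option is to hold the parent through a weak reference. This is a sketch of that alternative, not something the answers above prescribe, and the class layout is illustrative:

```python
import gc
import weakref


class Node:
    def __init__(self, name, parent=None):
        self.name = name
        self.children = []
        # Weak reference: the child does not keep its parent alive.
        self._parent = weakref.ref(parent) if parent is not None else None
        if parent is not None:
            parent.children.append(self)

    @property
    def parent(self):
        # Returns None for the root or once the parent has been collected.
        return self._parent() if self._parent is not None else None


def demo():
    trunk = Node("trunk")
    branch = Node("branch-1", parent=trunk)
    name_before = branch.parent.name
    del trunk          # drop the only strong reference to the root
    gc.collect()       # not needed on CPython, but harmless elsewhere
    return name_before, branch.parent


print(demo())  # ('trunk', None)
```

The trade-off is that something else must hold a strong reference to the root (or to each parent) for as long as the tree is in use, otherwise parent lookups start returning None.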
In a python application we have a tree made up of TreeNode objects, and we had to add a property on the TreeNode class that returns the path from the tree root to that node as a list. We have implemented this in a simple recursive way, however the code looks a little verbose for python (we suspect there is a terser way of expressing a simple algorithm like this in python). Does anyone know a more pythonic way of expressing this?
Here is a simplified version of our code - it is the definition of path_from_root we are looking to improve:
class TreeNode(object):
    def __init__(self, value, parent=None):
        self.value = value
        self.parent = parent

    @property
    def path_from_root(self):
        path = []
        _build_path_from_root(self, path)
        return path

def _build_path_from_root(node, path):
    if node.parent:
        _build_path_from_root(node.parent, path)
    path.append(node)
Below are some unit tests that show how path_from_root works:
import unittest

class TreePathAsListTests(unittest.TestCase):
    def setUp(self):
        self.root = TreeNode(value="root")
        self.child_1 = TreeNode(value="child 1", parent=self.root)
        self.child_2 = TreeNode(value="child 2", parent=self.root)
        self.leaf_1a = TreeNode(value="leaf 1a", parent=self.child_1)

    def test_path_from_root(self):
        # assertEqual replaces the deprecated assertEquals alias.
        self.assertEqual([self.root, self.child_1, self.leaf_1a], self.leaf_1a.path_from_root)
        self.assertEqual([self.root, self.child_2], self.child_2.path_from_root)
        self.assertEqual([self.root], self.root.path_from_root)
Update: Accepted an answer that was a clear improvement, but definitely still interested in any other ways of expressing this.
I would do it this way:
@property
def path_from_root(self):
    if self.parent:
        return self.parent.path_from_root + [self]
    return [self]
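Since other ways of expressing this were asked for, here is an iterative sketch that walks up the parent links and reverses the result. It assumes the same TreeNode constructor as in the question:

```python
class TreeNode:
    def __init__(self, value, parent=None):
        self.value = value
        self.parent = parent

    @property
    def path_from_root(self):
        path = []
        node = self
        while node is not None:   # walk upward: self, parent, ..., root
            path.append(node)
            node = node.parent
        path.reverse()            # root first
        return path


root = TreeNode("root")
child_1 = TreeNode("child 1", parent=root)
leaf_1a = TreeNode("leaf 1a", parent=child_1)
print([n.value for n in leaf_1a.path_from_root])  # ['root', 'child 1', 'leaf 1a']
```

Compared with the recursive version, this builds a single list instead of a new one at every level, and it is not bounded by the recursion limit on very deep trees.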
I've got a piece of code which contains a for loop to draw things from an XML file;
for evoNode in node.getElementsByTagName('evolution'):
    evoName = getText(evoNode.getElementsByTagName("type")[0].childNodes)
    evoId = getText(evoNode.getElementsByTagName("typeid")[0].childNodes)
    evoLevel = getText(evoNode.getElementsByTagName("level")[0].childNodes)
    evoCost = getText(evoNode.getElementsByTagName("costperlevel")[0].childNodes)
    evolutions.append("%s x %s" % (evoLevel, evoName))
Currently it appends to a list called evolutions, as the last line of that code shows. For this loop, and several other for loops with very similar functionality, I need the output to go into a class instead.
class evolutions:
    def __init__(self, evoName, evoId, evoLevel, evoCost):
        self.evoName = evoName
        self.evoId = evoId
        self.evoLevel = evoLevel
        self.evoCost = evoCost
How do I create a series of instances of this class, one for each iteration of that for loop? Or what would be a more practical solution? This loop doesn't really need the class, but one of the others really does.
A list comprehension might be a little cleaner. I'd also move the parsing logic to the constructor to clean up the implementation:
class Evolution:
    def __init__(self, node):
        self.node = node
        # Call the method via self; a bare property(...) would hit the builtin.
        self.type = self.property("type")
        self.typeid = self.property("typeid")
        self.level = self.property("level")
        self.costperlevel = self.property("costperlevel")

    def property(self, prop):
        return getText(self.node.getElementsByTagName(prop)[0].childNodes)

evolutionList = [Evolution(evoNode) for evoNode in node.getElementsByTagName('evolution')]
Alternatively, you could use map (in Python 3 map returns an iterator, so wrap it in list() if you need a list):
evolutionList = list(map(Evolution, node.getElementsByTagName('evolution')))
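For a version that can be run end to end, here is a sketch using xml.dom.minidom. The sample XML is invented, and get_text is a hypothetical stand-in for the getText helper, whose definition is not shown in the question:

```python
from xml.dom.minidom import parseString


def get_text(nodelist):
    # Concatenate the text nodes in a node list (hypothetical stand-in for getText).
    return "".join(n.data for n in nodelist if n.nodeType == n.TEXT_NODE)


class Evolution:
    def __init__(self, node):
        self.type = self._field(node, "type")
        self.typeid = self._field(node, "typeid")
        self.level = self._field(node, "level")
        self.costperlevel = self._field(node, "costperlevel")

    @staticmethod
    def _field(node, tag):
        # Text of the first <tag> element below this node.
        return get_text(node.getElementsByTagName(tag)[0].childNodes)


# Made-up sample document; the real file's layout is an assumption.
SAMPLE = """<unit>
  <evolution><type>Armor</type><typeid>7</typeid><level>2</level><costperlevel>50</costperlevel></evolution>
  <evolution><type>Speed</type><typeid>9</typeid><level>1</level><costperlevel>30</costperlevel></evolution>
</unit>"""

doc = parseString(SAMPLE)
evolution_list = [Evolution(n) for n in doc.getElementsByTagName("evolution")]
print(["%s x %s" % (e.level, e.type) for e in evolution_list])  # ['2 x Armor', '1 x Speed']
```

Each element becomes one object, and the formatted strings from the original loop can still be produced from the objects when needed.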
evolutionList = []
for evoNode in node.getElementsByTagName('evolution'):
    evoName = getText(evoNode.getElementsByTagName("type")[0].childNodes)
    evoId = getText(evoNode.getElementsByTagName("typeid")[0].childNodes)
    evoLevel = getText(evoNode.getElementsByTagName("level")[0].childNodes)
    evoCost = getText(evoNode.getElementsByTagName("costperlevel")[0].childNodes)
    temporaryEvo = Evolutions(evoName, evoId, evoLevel, evoCost)
    evolutionList.append(temporaryEvo)
    # Or you can go with the 1 liner instead of the two lines above:
    # evolutionList.append(Evolutions(evoName, evoId, evoLevel, evoCost))
I renamed your list because it shared the same name as your class and was confusing.