Representing a tree as a list in Python

I'm learning python and I'm curious about how people choose to store (binary) trees in python.
Is there something wrong with storing the nodes of the tree as a list in Python? Something like:
[0,1,2,3,4,5,6,7,8]
where the 0'th position is 0 by default, 1 is the root, and for each position (i), the 2i and 2i+1 positions are the children. When no child is present, we just have a 'None' in that position.
I've read a couple of books/notes where they represent a tree using a list of lists, or something more complicated than just a simple list like this, and I was wondering if there's something inherently wrong in how I'm looking at it?

You certainly COULD do this. I'd define it as a class deriving from list with a get_children method. However, this is fairly ugly, since either (A) you'd have to call list.index for every lookup, which is O(n) per call and O(n log n) to traverse the tree, or (B) you'd have to preprocess the whole list in O(n) time to pair values with their indices.
class WeirdBinaryTreeA(list):
    def get_children(self, value):
        """Calls list.index on value to derive the children."""
        idx = self.index(value)  # O(n) once, O(n log n) to traverse
        return self[idx * 2], self[idx * 2 + 1]
class WeirdBinaryTreeB(list):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.__mapping = self.processtree()

    def processtree(self):
        """Builds a value -> index mapping in one O(n) pass."""
        mapping = {}
        for idx, val in enumerate(self):
            mapping[val] = idx
        return mapping

    def get_children(self, value):
        """Queries the mapping on value to derive the children."""
        idx = self.__mapping[value]  # O(1) once, O(n) to traverse
        return self[idx * 2], self[idx * 2 + 1]
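For illustration, a quick usage sketch of the mapped variant, assuming the tree is stored level by level exactly as the question describes (index 0 unused, root at index 1, children of i at 2i and 2i+1); the sample values are just for demonstration:
tree = WeirdBinaryTreeB([None, 'A', 'B', 'C', 'D', 'E', 'F', 'G'])
print(tree.get_children('A'))  # ('B', 'C')
print(tree.get_children('B'))  # ('D', 'E')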
However the bigger question is why would you do this? What makes it better than a list of lists or a dict of dicts? What happens when you have:
A
 \
  B
   \
    C
     \
      D
       \
        E
         \
          F
And your list looks like:
[0, 'A', None, 'B', None, None, None, 'C', None, None, None, None, None, None, None, 'D', ...]
Instead of:
{"A": {"B": {"C": {"D": {"E": {"F": None}}}}}}

There's nothing wrong with storing a binary tree as a list the way you're doing - it's the same idea as storing it as a flat array in a language like C or Java. Accessing the parent of a given node is very fast, and finding the children is also pretty efficient.
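To make that concrete, minimal index-arithmetic helpers for the layout in the question (root at index 1, children of i at 2i and 2i+1) might look like this; the function names are just illustrative, not from any library:
def parent(i):
    # Integer division recovers the parent's index; the root (index 1) has none.
    return i // 2 if i > 1 else None

def children(tree, i):
    # Children live at 2*i and 2*i + 1; slots past the end count as missing.
    left = tree[2 * i] if 2 * i < len(tree) else None
    right = tree[2 * i + 1] if 2 * i + 1 < len(tree) else None
    return left, right

tree = [0, 1, 2, 3, 4, 5, 6, 7, 8]  # the example list from the question
print(children(tree, 1))  # (2, 3)
print(parent(4))          # 2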
I suppose a lot of examples and tutorials will prefer to use a representation that's 'really tree shaped' (list of lists or objects) - it might be a bit more intuitive to explain.

I've seen representations like this (your flat list/array) used in C code, and they can be acceptable in Python too, but it depends on the nature of the data you're handling. In C code, a balanced tree in this list representation can be very fast to access (much quicker than navigating a series of pointers), though the performance benefit in Python may be less noticeable due to all the other overhead.
For reasonably balanced dense trees, this flat list approach is reasonable. However, as Adam Smith commented, this type of flat list tree would become extremely wasteful for unbalanced sparse trees. Suppose you have one branch with single children going down a hundred levels, and the rest of the tree had nothing. You would need 2^100 + 2^99 + 2^98 + ... + 2^1 + 1 spots in the flat list tree. For such a case, you would use up a huge amount of memory for something that could be represented much more efficiently with nested lists.
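To put a number on that waste (just arithmetic, not code from either answer): a single chain 100 levels deep forces the flat list to reserve 2^101 - 1 slots, while a nested representation needs only one node per level.
flat_slots = sum(2**k for k in range(101))  # 2^100 + 2^99 + ... + 2 + 1
print(flat_slots == 2**101 - 1)             # True, roughly 2.5e30 slots
nested_nodes = 100                          # one node per level in a nested structure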
So in essence, the choice between flat list trees vs. nested list trees is similar to the choice between flat array trees and pointer based trees in a C like language.

Pros and cons of different implementations of graph adjacency list

I have seen multiple representations of adjacency list of a graph and I do not know which one to use.
I am thinking of the following representation of a Node object and Graph object (as below)
class Node(object):
    def __init__(self, val):
        self.val = val
        self.connections_distance = {}
        # key = node : val = distance

    def add(self, neighborNode, distance):
        if neighborNode not in self.connections_distance:
            self.connections_distance[neighborNode] = distance

class Graph(object):
    def __init__(self):
        self.nodes = {}
        # key = node.val : val = node object

    # multiple methods
The second way is that nodes are labelled 0 to n - 1 (where n is the number of nodes). The graph stores the adjacency as an array of linked lists (where the index is the node value and the linked list stores all of that node's neighbors).
ex. graph:
0 connected to 1 and 2
1 connected to 0 and 2
2 connected to 0 and 1
Or, if [a, b, c] is an array containing a, b, and c and [x -> y -> z] is a linked list containing x, y, and z:
representation: [[1->2], [0->2], [0->1]]
Question : What are the pros and cons of each representation and which is more widely used?
Note: It's a bit odd that one representation includes distances and the other doesn't. It's pretty easy to change them so that both include distances or both omit them, though, so I'll omit that detail (you might be interested in set() rather than {}).
It looks like both representations are variants of an Adjacency List (explained further in https://stackoverflow.com/a/62684297/3798897). Conceptually there isn't much difference between the two representations -- you have a collection of nodes, and each node has a reference to a collection of neighbors. Your question is really two separate problems:
(1) Should you use a dictionary or an array to hold the collection of nodes?
They're nearly equivalent; a dictionary isn't much more than an array behind the scenes. If you don't have a strong reason to do otherwise, relying on the built-in dictionary rather than re-implementing one with your own hash function and a dense array will probably be the right choice.
A dictionary will use a bit more space
Deletions from a dictionary will be much faster (and so will insertions, if you actually mean an array and not Python's list)
If you have a fast way to generate the number 1-n for each node then that might work better than the hash function a dictionary uses behind the scenes, so you might want to use an array.
(2) Should you use a set or a linked list to hold the collection of adjacent nodes?
Almost certainly you want a set. It's at least as good asymptotically as a list for anything you want to do with a collection of neighbors, it's more cache friendly, it has less object overhead, and so on.
As always, your particular problem can sway the choice one way or another. E.g., I mentioned that an array has worse insertion/deletion performance than a dictionary, but if you hardly ever insert/delete then that won't matter, and the slightly reduced memory would start to look attractive.
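As a concrete sketch of the combination recommended above (a dictionary of nodes, each holding a set of neighbors), ignoring distances as noted earlier; the class and method names here are just illustrative:
class Node(object):
    def __init__(self, val):
        self.val = val
        self.neighbors = set()  # set of neighboring Node objects

class Graph(object):
    def __init__(self):
        self.nodes = {}  # key = node.val : val = Node object

    def add_node(self, val):
        if val not in self.nodes:
            self.nodes[val] = Node(val)
        return self.nodes[val]

    def add_edge(self, val_a, val_b):
        a, b = self.add_node(val_a), self.add_node(val_b)
        a.neighbors.add(b)
        b.neighbors.add(a)

g = Graph()
g.add_edge(0, 1)
g.add_edge(0, 2)
g.add_edge(1, 2)
print(sorted(n.val for n in g.nodes[0].neighbors))  # [1, 2]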

Increment numbers in list from a certain point

I have a list of numbers, e.g. [50,100,150,200,250]. I need to increment (or decrement) each number from a specified index and by a specified amount. I have been able to do this in two ways:
from itertools import islice
l = [50,100,150,200,250]
start_increment_index = 3
l[start_increment_index:] = [e+100 for e in l[start_increment_index:]]
print (l)
l = [50,100,150,200,250]
l[start_increment_index:] = [e+100 for e in islice(l,start_increment_index,len(l))]
print (l)
Both print: [50, 100, 150, 300, 350].
However, my real list contains millions of numbers and this operation is performed repeatedly with different indexes and different increments/decrements. Would there be a faster way of doing this using a Python list? I have been considering writing my own C/C++ extension to deal with this.
Edit: Would this be a useful module for Python in general? Having a function written in C which can take parameters (python_list_object, increment_amount, start_index, end_index)?
The main problem with your solution is that it creates (allocates memory for and copies) two lists: the list comprehension itself, and the slice l[start_increment_index:] inside it.
If your data source is a Python list, you can do the operation in O(n) without those copies:
for i in range(start_increment_index, len(l)):
    l[i] += increment
NB: define increment first.
It depends on your goals. I suppose you could use a segment tree for this case. For more information see https://en.m.wikipedia.org/wiki/Segment_tree.
Briefly: this structure represents an array on which range operations will be performed (like adding or subtracting a number over a subarray), and it is optimized for the case where you have a very large number of such range queries.
Note: if you want to use only the Python list structure, you can implement a sparse table (another view of a segment tree, with the tree stored implicitly in arrays).
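Since the question's updates always run from start_index to the end of the list, here is a minimal sketch of that range-update idea using a Fenwick (binary indexed) tree, a compact relative of the segment tree mentioned above. The class and method names are my own, not from any library; each update and each element read costs O(log n) rather than O(n).
class RangeAddList:
    """Wraps a list and supports 'add delta to every element from index start onward'
    in O(log n) per update, with O(log n) reads, by keeping pending increments
    in a Fenwick tree over the difference array."""

    def __init__(self, data):
        self.data = list(data)
        self.n = len(self.data)
        self.tree = [0] * (self.n + 1)  # 1-indexed Fenwick tree of increment deltas

    def add_from(self, start, delta):
        """Add delta to every element from index start to the end of the list."""
        i = start + 1  # point-update the difference at 'start'
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i

    def __getitem__(self, idx):
        """Current value at idx: the original value plus all increments covering it."""
        total = self.data[idx]
        i = idx + 1  # prefix sum of the differences up to idx
        while i > 0:
            total += self.tree[i]
            i -= i & -i
        return total

l = RangeAddList([50, 100, 150, 200, 250])
l.add_from(3, 100)
print([l[i] for i in range(5)])  # [50, 100, 150, 300, 350]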

Python Dictionary of Pointers (how to track roots when merging trees)

I am attempting to implement an algorithm (in Python) which involves a growing forest. The number of nodes is fixed, and at each step an edge is added. Throughout the course of the algorithm I need to keep track of the roots of the trees. This is a fairly common problem, e.g. Kruskal's Algorithm. Naively one might compute the root on the fly, but my forest is too large to make this feasible. A second attempt might be to keep a dictionary keyed by the nodes and whose values are the roots of the tree containing the node. This seems more promising, but I need to avoid updating the dictionary value of every node in two trees to be merged (the trees eventually get very deep and this is too computationally expensive). I was hopeful when I found the topic:
Simulating Pointers in Python
The notion was to keep a pointer to the root of each tree and simply update the roots when trees were merged. However, I quickly ran into the following (undesirable) behavior:
class ref:
    def __init__(self, obj): self.obj = obj
    def get(self): return self.obj
    def set(self, obj): self.obj = obj

a = ref(1)
b = ref(2)
c = ref(3)
a = b
b = c
print(a.get(), b.get(), c.get())  # => 2, 3, 3
Of course the desired output would be 3, 3, 3. If I check the addresses at each step I find that a and b are indeed pointing to the same thing (after a = b), but that a is not updated when I set b = c.
a = ref(1)
b = ref(2)
c = ref(3)
print(id(a),id(b),id(c)) # => 140512500114712 140512500114768 140512500114824
a = b
print(id(a),id(b),id(c)) # => 140512500114768 140512500114768 140512500114824
b = c
print(id(a),id(b),id(c)) # => 140512500114768 140512500114824 140512500114824
My primary concern is to be able to track the roots of trees when they are merged without a costly update; I would take any reasonable solution on this front whether or not it relates to the ref class. My secondary concern is to understand why Python is behaving this way with the ref class and how to modify the class to get the desired behavior. Any help or insight with regard to these problems is greatly appreciated.
When a = b is executed, the name a is simply rebound to the same ref object that b currently refers to; no method of the class is called. When b = c runs afterwards, only the name b is rebound, so a still refers to the ref holding 2, which is why you see 2, 3, 3.
If you used a.set(b) instead, a would hold a reference to the ref object b, and mutations made through that shared object would be visible through a.
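A minimal sketch of that difference, reusing the ref class from the question (the values are just for illustration):
a = ref(1)
b = a           # a and b now name the same ref object
b.set(3)        # mutating through one name...
print(a.get(), b.get())  # 3 3  ...is visible through the other

b = ref(2)      # rebinding the name b does not affect a
print(a.get(), b.get())  # 3 2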
Let me know if this works and if anything needs more clarification.

What are practical use cases for "recursive" lists in python? [duplicate]

I was playing around in python. I used the following code in IDLE:
p = [1, 2]
p[1:1] = [p]
print p
The output was:
[1, [...], 2]
What is this […]? Interestingly I could now use this as a list of list of list up to infinity i.e.
p[1][1][1]....
I could write the above as long as I wanted and it would still work.
EDIT:
How is it represented in memory?
What's its use? Examples of some cases where it is useful would be helpful.
Any link to official documentation would be really useful.
This is what your code created: a list whose first and last elements point to the numbers 1 and 2, and whose middle element points to the list itself.
In Common Lisp when printing circular structures is enabled such an object would be printed as
#1=#(1 #1# 2)
meaning that there is an object (labelled 1 with #1=) that is a vector with three elements, the second being the object itself (back-referenced with #1#).
In Python instead you just get the information that the structure is circular with [...].
In this specific case the description is not ambiguous (it's a backward reference to a list, and there is only one list, so it must be that one). In other cases it may, however, be ambiguous... for example in
[1, [2, [...], 3]]
the backward reference could either point to the outer or to the inner list.
These two different structures printed in the same way can be created with
x = [1, [2, 3]]
x[1][1:1] = [x[1]]
y = [1, [2, 3]]
y[1][1:1] = [y]
print(x)
print(y)
In memory they differ: in x the back-reference points to the inner list, while in y it points to the outer list.
It means that you created an infinite list nested inside itself, which cannot be printed in full. p contains p, which contains p... and so on. The [...] notation is a way to let you know this, and to inform you that it can't be fully represented! Take a look at #6502's answer to see a nice picture showing what's happening.
Now, regarding the three new items after your edit:
This answer seems to cover it
Ignacio's link describes some possible uses
This is more a topic of data structure design than programming languages, so it's unlikely that any reference is found in Python's official documentation
To the question "What's its use", here is a concrete example.
Graph reduction is an evaluation strategy sometimes used to interpret a computer language. It is a common strategy for lazy evaluation, notably of functional languages.
The starting point is to build a graph representing the sequence of "steps" the program will take. Depending on the control structures used in that program, this might lead to a cyclic graph (because the program contains some kind of "forever" loop, or uses recursion whose "depth" will be known at evaluation time, but not at graph-creation time)...
In order to represent such a graph, you need infinite "data structures" (sometimes called recursive data structures), like the one you noticed. Usually a little bit more complex, though.
If you are interested in that topic, here is (among many others) a lecture on that subject: http://undergraduate.csse.uwa.edu.au/units/CITS3211/lectureNotes/14.pdf
We do this all the time in object-oriented programming. If any two objects refer to each other, directly or indirectly, they are both infinitely recursive structures (or both part of the same infinitely recursive structure, depending on how you look at it). That's why you don't see this much in something as primitive as a list -- because we're usually better off describing the concept as interconnected "objects" than an "infinite list".
You can also get ... with an infinitely recursive dictionary. Let's say you want a dictionary of the corners of a triangle, where each value is a dictionary of the other corners connected to that corner. You could set it up like this:
a = {}
b = {}
c = {}
triangle = {"a": a, "b": b, "c": c}
a["b"] = b
a["c"] = c
b["a"] = a
b["c"] = c
c["a"] = a
c["b"] = b
Now if you print triangle (or a or b or c for that matter), you'll see it's full of {...} because any two corners are referring back to each other.
As I understand it, this is an example of a fixed point:
p = [1, 2]
p[1:1] = [p]
f = lambda x: x[1]
print(f(p) == p)     # True
print(f(f(p)) == p)  # True

Recursive generation + filtering. Better non-recursive?

I have the following need (in python):
generate all possible tuples of length 12 (could be more) containing either 0, 1 or 2 (basically, a ternary number with 12 digits)
filter these tuples according to specific criteria, culling those not good, and keeping the ones I need.
As I had to deal with small lengths until now, the functional approach was neat and simple: a recursive function generates all possible tuples, then I cull them with a filter function. Now that I have a larger set, the generation step is taking too much time, much longer than needed as most of the paths in the solution tree will be culled later on, so I could skip their creation.
I have two solutions to solve this:
derecurse the generation into a loop, and apply the filter criteria on each new 12-digits entity
integrate the filtering in the recursive algorithm, so to prevent it stepping into paths that are already doomed.
My preference goes to 1 (seems easier) but I would like to hear your opinion, in particular with an eye towards how a functional programming style deals with such cases.
How about
import itertools
results = []
for x in itertools.product(range(3), repeat=12):
    if myfilter(x):
        results.append(x)
where myfilter does the selection. Here, for example, only allowing results with 10 or more 1s:
def myfilter(x):  # example filter, only take tuples with 10 or more 1s
    return x.count(1) >= 10
That is, my suggestion is your option 1. For some cases it may be slower because (depending on your criteria) you may generate many tuples that you don't need, but it's much more general and very easy to code.
Edit: This approach also has a one-liner form, as suggested in the comments by hughdbrown:
results = [x for x in itertools.product(range(3), repeat=12) if myfilter(x)]
itertools has functionality for dealing with this. However, here is a (hardcoded) way of handling it with a generator:
T = (0, 1, 2)
GEN = ((a, b, c, d, e, f, g, h, i, j, k, l)
       for a in T for b in T for c in T for d in T
       for e in T for f in T for g in T for h in T
       for i in T for j in T for k in T for l in T)
for VAL in GEN:
    # Filter VAL
    print(VAL)
I'd implement an iterative binary adder or hamming code and run that way.
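That suggestion is terse, so here is a minimal sketch of what such an iterative "adder" might look like for base-3 digits (my own naming; it yields the same tuples as itertools.product(range(3), repeat=12), and the filtering or pruning would go inside the loop over it, which is essentially option 1 from the question):
def ternary_tuples(length=12):
    """Iteratively count in base 3, yielding each digit tuple once."""
    digits = [0] * length
    while True:
        yield tuple(digits)
        # Increment the base-3 counter, rightmost digit first.
        i = length - 1
        while i >= 0 and digits[i] == 2:
            digits[i] = 0
            i -= 1
        if i < 0:
            return  # counter rolled over: all tuples have been produced
        digits[i] += 1

count = sum(1 for t in ternary_tuples(4))  # small length just to demonstrate
print(count)  # 81 == 3**4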
