Sort nodes based on inputs / outputs - python

I have a node system where every node only stores its inputs and outputs, but not its index. Here is a simplified example:
class Node1:
    requiredInputs = []
class Node2:
    requiredInputs = ["Node1"]
class Node3:
    requiredInputs = ["Node2"]
class Node4:
    requiredInputs = ["Node3", "Node2"]
Now I want to order these nodes so that all of a node's inputs have already been processed when that node is processed. For this simple example, a possible order would be [Node1, Node2, Node3, Node4].
My first idea would be to brute-force it and check every possible ordering. However, this will be very slow for a larger number of nodes.
What would be a more efficient way to do this? I don't need an implementation, just a basic idea or algorithm.

What you want is to topologically sort the nodes.
http://en.wikipedia.org/wiki/Topological_sorting
The very basic idea is to assign each node an integer that initially equals the number of inputs it has. Add all the nodes with the value 0 (that is, those that have no inputs) to the list that will represent the order. For each node that is appended to the list, subtract one from the values of the nodes that take that node as an input. If any of those nodes now has a value of zero, add it to the list as well. Repeat until no more nodes can be added. The process is guaranteed to terminate, and as long as you don't have cycles every node ends up in the list, sorted in such a way that inputs always go before outputs.
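A minimal sketch of this counter-based idea (commonly known as Kahn's algorithm), counting each node's unprocessed inputs so that the list comes out with inputs first. The function name `topological_order` and the dict-of-lists input are assumptions made for illustration; the sketch also assumes every listed input is itself a key of the dict and that no input is listed twice.

```python
from collections import deque

def topological_order(required_inputs):
    """required_inputs maps each node name to the list of names it needs."""
    # Number of not-yet-processed inputs for every node.
    remaining = {node: len(inputs) for node, inputs in required_inputs.items()}
    # Reverse mapping: for each node, which nodes consume it as an input.
    consumers = {node: [] for node in required_inputs}
    for node, inputs in required_inputs.items():
        for source in inputs:
            consumers[source].append(node)

    # Nodes without inputs can be processed right away.
    ready = deque(node for node, count in remaining.items() if count == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for consumer in consumers[node]:
            remaining[consumer] -= 1
            if remaining[consumer] == 0:
                ready.append(consumer)

    # Nodes on a cycle never reach zero, so they never enter the list.
    if len(order) != len(required_inputs):
        raise ValueError("cycle detected: no valid order exists")
    return order

graph = {
    "Node4": ["Node3", "Node2"],
    "Node3": ["Node2"],
    "Node2": ["Node1"],
    "Node1": [],
}
print(topological_order(graph))  # ['Node1', 'Node2', 'Node3', 'Node4']
```

Each node and each input relation is visited once, so the running time is linear in the size of the graph.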

Algorithm
Topological sort is indeed the way to go. Per your request, I will not write the full implementation.
Outline & notes
Types
First, you could store the requiredInputs as classes, not as strings. This will make the comparison way more elegant:
class Node1:
    requiredInputs = []
class Node2:
    requiredInputs = [Node1]
class Node3:
    requiredInputs = [Node2]
class Node4:
    requiredInputs = [Node3, Node2]
Input and output data structures
Then, you can place your nodes in two arrays, for input and output. This can be done in-place (using a single array), but it's rarely worth the trouble.
unordered_nodes = [Node4, Node3, Node2, Node1]
ordered_nodes = []
Here's the algorithm outline:
while there are unordered_nodes:
    for each node N in unordered_nodes:
        if the requiredInputs of N are already in ordered_nodes:
            add N to ordered_nodes
            remove N from unordered_nodes
            break
Expected result
When implemented, it should give:
print ordered_nodes
[<class __main__.Node1 at 0x10a7a8bb0>,
<class __main__.Node2 at 0x10a7a83f8>,
<class __main__.Node3 at 0x10a7a80b8>,
<class __main__.Node4 at 0x10a7a8600>]
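For readers who do want to see the outline in code, here is one possible Python 3 sketch of it, not a definitive implementation. The function name `order_nodes` and the `ValueError` raised when no node can be placed (a cycle or a missing input) are additions for illustration; the outline itself would loop forever in that situation.

```python
class Node1:
    requiredInputs = []

class Node2:
    requiredInputs = [Node1]

class Node3:
    requiredInputs = [Node2]

class Node4:
    requiredInputs = [Node3, Node2]

def order_nodes(nodes):
    unordered_nodes = list(nodes)  # work on a copy of the input
    ordered_nodes = []
    while unordered_nodes:
        for node in unordered_nodes:
            # A node is ready once all of its inputs have been placed.
            if all(dep in ordered_nodes for dep in node.requiredInputs):
                ordered_nodes.append(node)
                unordered_nodes.remove(node)
                break
        else:
            # The scan placed nothing: a cycle or an unknown input.
            raise ValueError("nodes cannot be ordered")
    return ordered_nodes

result = order_nodes([Node4, Node3, Node2, Node1])
print([node.__name__ for node in result])  # ['Node1', 'Node2', 'Node3', 'Node4']
```

This version rescans the unordered list after every placement, so it is quadratic or worse in the number of nodes, which is what the optimizations are meant to address.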
Optimizations
There are quite a few ways to optimize or otherwise improve a topological sort. As before, I'll hint a few without disclosing any implementation.
Pre-sorting the input array by some property
Sorting in-place, with a single array
Using a different data structure to represent the relations between nodes
Adding more than one node to ordered_nodes at any iteration

Related

How to sort N elements given a list of tuples stating their known order?

Sorry for the complicated / confusing title.
Basically I'm trying to implement a system that helps with the sorting of documents with no known date of writing.
It consists of two inputs:
Input 1: A tuple, where the first element is the number of documents, N. The second element is the number of pairs of documents with a known writing order, L. (First document was written before the second document).
Input 2: A list of L tuples. Each tuple contains two elements (documents). The first document was written before the second document. Eg: (1 2), (3 4) means that document1 was written before document2 and document3 was written before document4.
Next, the software must determine if there is a way of sorting all documents chronologically, there can be three outputs:
Inconclusive - Means that the temporal organization of the documents is inconsistent and there is no way of sorting them.
Incomplete - Means that information is lacking and the system can't find a way of sorting.
In case the information is enough, the system should output the order in which the documents have been written.
So far, I have managed to take both inputs, but I do not know where to start in terms of sorting the documents. Any suggestions?
Here's my code so far (Python3):
LN = tuple(int(x.strip()) for x in input("Number of docs. / Number of known pairs").split(' '))
print(LN)
if (LN[1]) > (LN[0]**2):
    print("Invalid number of pairs")
else:
    # LN[1] is L, the number of pairs to read
    A = ['none'] * LN[1]
    for i in range(LN[1]):
        t = tuple(int(x.strip()) for x in input("Pair:").split(' '))
        A[i] = t
    print(A)
I appreciate all suggestions :)
Build a directed graph. The inputs are the edges. Check for cycles which would indicate inconsistent input. Find the "leftmost" node, that is the node that doesn't have any edge to it, meaning nothing to its left. Multiple that are leftmost? Incomplete. Then, for each node in the graph, assign the index that equals the length of the longest path from the leftmost node. As there are no (directed) cycles, you could probably just do BFS starting at the leftmost node and at each step assign to the node the maximum of its current value and its value given from its parent. Then iterate through all nodes, and put the numbers in their corresponding indices. Two nodes have the same index assigned? Incomplete.
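A related way to get the three outcomes, sketched here under stated assumptions, is Kahn's topological sort rather than the longest-path labelling described above: a cycle leaves some documents unplaced (Inconclusive), and the order is fully determined only if exactly one document is available at every step (otherwise Incomplete). The function name `sort_documents` and the convention that documents are numbered 1 to N are assumptions for illustration.

```python
def sort_documents(n, pairs):
    """Documents are numbered 1..n; a pair (a, b) means a was written before b."""
    successors = {doc: set() for doc in range(1, n + 1)}
    indegree = {doc: 0 for doc in range(1, n + 1)}
    for before, after in pairs:
        if after not in successors[before]:  # ignore repeated pairs
            successors[before].add(after)
            indegree[after] += 1

    # Documents with nothing known to precede them.
    ready = [doc for doc in indegree if indegree[doc] == 0]
    order = []
    unique = True
    while ready:
        if len(ready) > 1:
            unique = False  # more than one document could come next
        doc = ready.pop()
        order.append(doc)
        for later in successors[doc]:
            indegree[later] -= 1
            if indegree[later] == 0:
                ready.append(later)

    if len(order) < n:
        return "Inconclusive"  # a cycle: the pairs contradict each other
    if not unique:
        return "Incomplete"    # consistent, but the order is not determined
    return order

print(sort_documents(3, [(1, 2), (2, 3)]))  # [1, 2, 3]
print(sort_documents(4, [(1, 2), (3, 4)]))  # Incomplete
print(sort_documents(2, [(1, 2), (2, 1)]))  # Inconclusive
```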

Pros and cons of different implementations of graph adjacency list

I have seen multiple representations of adjacency list of a graph and I do not know which one to use.
I am thinking of the following representation of a Node object and Graph object (as below)
class Node(object):
    def __init__(self, val):
        self.val = val
        self.connections_distance = {}
        # key = node: val = distance
    def add(self, neighborNode, distance):
        if neighborNode not in self.connections_distance:
            self.connections_distance[neighborNode] = distance
class Graph(object):
    def __init__(self):
        self.nodes = {}
        # key = node.val : val = node object
    # multiple methods
The second way is to label the nodes 0 to n - 1 (n is the number of nodes). The graph stores its adjacency as an array of linked lists (where the index is the node value and the linked list stores all of that node's neighbors)
ex. graph:
0 connected to 1 and 2
1 connected to 0 and 2
2 connected to 0 and 1
Or if [a, b, c] is an array containing a, b, and c and [x -> y -> z] is a linked list containing x, y, and z:
representation: [[1->2], [0->2], [0->1]]
Question : What are the pros and cons of each representation and which is more widely used?
Note: It's a bit odd that one representation includes distances and the other doesn't. It's pretty easy to make them both include distances or both omit them though, so I'll omit that detail (you might be interested in set() rather than {}).
It looks like both representations are variants of an Adjacency List (explained further in https://stackoverflow.com/a/62684297/3798897). Conceptually there isn't much difference between the two representations -- you have a collection of nodes, and each node has a reference to a collection of neighbors. Your question is really two separate problems:
(1) Should you use a dictionary or an array to hold the collection of nodes?
They're nearly equivalent; a dictionary isn't much more than an array behind the scenes. If you don't have a strong reason to do otherwise, relying on the built-in dictionary rather than re-implementing one with your own hash function and a dense array will probably be the right choice.
A dictionary will use a bit more space
Deletions from a dictionary will be much faster (and so will insertions if you actually mean an array and not python's list)
If you have a fast way to generate the number 1-n for each node then that might work better than the hash function a dictionary uses behind the scenes, so you might want to use an array.
(2) Should you use a set or a linked list to hold the collection of adjacent nodes?
Almost certainly you want a set. It's at least as good asymptotically as a list for anything you want to do with a collection of neighbors, it's more cache friendly, it has less object overhead, and so on.
As always, your particular problem can sway the choice one way or another. E.g., I mentioned that an array has worse insertion/deletion performance than a dictionary, but if you hardly ever insert/delete then that won't matter, and the slightly reduced memory would start to look attractive.
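To make the comparison concrete, here is a small sketch of the same three-node graph in both forms, with sets holding the neighbors as suggested above. The variable and function names are made up for illustration, and distances are omitted.

```python
# Representation 1: a dictionary keyed by node label.
graph_by_label = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b"},
}

# Representation 2: nodes labelled 0..n-1, so the list index acts as the key.
graph_by_index = [
    {1, 2},  # neighbors of node 0
    {0, 2},  # neighbors of node 1
    {0, 1},  # neighbors of node 2
]

def has_edge(graph, u, v):
    # Works for both forms: graph[u] is the neighbor set of u.
    return v in graph[u]

def add_edge(graph, u, v):
    # Undirected edge between two nodes that already exist.
    graph[u].add(v)
    graph[v].add(u)

# Adding a node differs: any new key for the dict, the next index for the list.
graph_by_label["d"] = set()
graph_by_index.append(set())
add_edge(graph_by_label, "a", "d")
add_edge(graph_by_index, 0, 3)

print(has_edge(graph_by_label, "d", "a"), has_edge(graph_by_index, 3, 0))  # True True
```

Removing a node from the middle shows the trade-off: the dictionary just drops a key, while the list form has to renumber the nodes or leave a hole.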

Data Structure for fast insertion and random access in already sorted data

p = random_point(a,b)
#random_point() returns a tuple/named-tuple (x,y)
#0<x<a 0<y<b
if centers.validates(p):
    centers.insert(p)
#centers is the data structure to store points
In the centers data structure all x and y coordinates are stored in two separate sorted(ascending) lists, one for x and other for y. Each node in x points to the corresponding y, and vice versa, so that they can be separately sorted and still hold the pair property: centers.get_x_of(y) and centers.get_y_of(x)
Properties that I require in data structure:
Fast Insertion, in already sorted data (preferably log n)
Random access
Sort x and y separately, without losing pair property
Initially I thought of using simple Lists, and using Binary search to get the index for inserting any new element. But I found, that, it can be improved using self balancing trees like AVL or B-trees. I could make two trees each for x and y, with each node having an additional pointer that could point from x-tree node to y-tree node.
But I don't know how to build random access functionality in these trees. The function centers.validate() tries to insert x & y, and runs some checks with the neighboring elements, which requires random access:
def validate(p):
    indices = get_index(p)
    #returns a named tuple of indices to insert x and y, Eg: (3,7)
    condition1 = func(x_list[indices.x-1], p.x) and func(x_list[indices.x+1], p.x)
    condition2 = func(y_list[indices.y-1], p.y) and func(y_list[indices.y+1], p.y)
    #func is some mathematical condition on neighboring elements of x and y
    return condition1 and condition2
In the above function I need to access neighboring elements of the x & y data structure. I think implementing this in trees would complicate it. Is there any combination of data structures that can achieve this? I am writing this in Python (if that helps).
Use a class with two dicts, one for x values and one for y values, where each dict maps a value to its related value in the other dict. The class would also need to maintain an order list per dict that records the current sorted order of that dict's entries. For insertion you would run a binary search (or another efficient search) over each order list: take the entry at each midpoint and compare the new value against it.
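One way to read this suggestion, as a rough sketch using the standard `bisect` module for the binary search: two sorted lists give random access and neighbor lookup, and two dicts keep the pairing. The class layout and the `neighbors_x` helper are assumptions for illustration, and the sketch assumes all x values and all y values are distinct. Note that `bisect.insort` finds the position in O(log n) but the list insertion itself is still O(n).

```python
import bisect

class Centers:
    def __init__(self):
        self.xs = []      # x coordinates in ascending order
        self.ys = []      # y coordinates in ascending order
        self.y_of_x = {}  # pairing: x -> y
        self.x_of_y = {}  # pairing: y -> x

    def insert(self, p):
        x, y = p
        bisect.insort(self.xs, x)  # binary search, then list insert
        bisect.insort(self.ys, y)
        self.y_of_x[x] = y
        self.x_of_y[y] = x

    def get_y_of(self, x):
        return self.y_of_x[x]

    def get_x_of(self, y):
        return self.x_of_y[y]

    def neighbors_x(self, x):
        """Stored x values just below and above a not-yet-inserted x."""
        i = bisect.bisect_left(self.xs, x)
        left = self.xs[i - 1] if i > 0 else None
        right = self.xs[i] if i < len(self.xs) else None
        return left, right

centers = Centers()
for point in [(5, 1), (1, 9), (3, 4)]:
    centers.insert(point)
print(centers.xs, centers.ys)   # [1, 3, 5] [1, 4, 9]
print(centers.get_y_of(3))      # 4
print(centers.neighbors_x(4))   # (3, 5)
```

If the O(n) list insertion becomes the bottleneck, a balanced tree or a sorted-container library would be the next thing to try.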

Python Dictionary of Pointers (how to track roots when merging trees)

I am attempting to implement an algorithm (in Python) which involves a growing forest. The number of nodes is fixed, and at each step an edge is added. Throughout the course of the algorithm I need to keep track of the roots of the trees. This is a fairly common problem, e.g. Kruskal's Algorithm. Naively one might compute the root on the fly, but my forest is too large to make this feasible. A second attempt might be to keep a dictionary keyed by the nodes and whose values are the roots of the tree containing the node. This seems more promising, but I need to avoid updating the dictionary value of every node in two trees to be merged (the trees eventually get very deep and this is too computationally expensive). I was hopeful when I found the topic:
Simulating Pointers in Python
The notion was to keep a pointer to the root of each tree and simply update the roots when trees were merged. However, I quickly ran into the following (undesirable) behavior:
class ref:
    def __init__(self,obj): self.obj = obj
    def get(self): return self.obj
    def set(self,obj): self.obj=obj
a = ref(1)
b = ref(2)
c = ref(3)
a = b
b = c
print(a.get(),b.get(),c.get()) # => 2 3 3
Of course the desired output would be 3,3,3. If I check the addresses at each step I find that a and b are indeed pointing to the same thing (after a=b), but that a is not updated when I set b=c.
a = ref(1)
b = ref(2)
c = ref(3)
print(id(a),id(b),id(c)) # => 140512500114712 140512500114768 140512500114824
a = b
print(id(a),id(b),id(c)) # => 140512500114768 140512500114768 140512500114824
b = c
print(id(a),id(b),id(c)) # => 140512500114768 140512500114824 140512500114824
My primary concern is to be able to track the roots of trees when they are merged without a costly update. I would take any reasonable solution on this front, whether or not it relates to the ref class. My secondary concern is to understand why Python is behaving this way with the ref class and how to modify the class to get the desired behavior. Any help or insight with regards to these problems is greatly appreciated.
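When a=b is executed, Python does not copy anything or call b.get(); it simply rebinds the name a to the object that b currently refers to (the ref holding 2). Rebinding b afterwards with b=c changes only what the name b refers to, so a still points to the ref holding 2.
If you used a.set(b) instead, then a would hold a reference to b, and a later b.set(c) would be visible by following the chain from a.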
Let me know if this works and if anything needs more clarification.
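For the primary concern (tracking roots cheaply while trees are merged), the usual tool is a disjoint-set (union-find) structure, which is also what Kruskal's algorithm typically relies on. The sketch below is one possible version with path compression; the class name `Forest` and the rule that `union(a, b)` makes b's root the root of the merged tree are assumptions for illustration, not something taken from the question.

```python
class Forest:
    def __init__(self, nodes):
        # Every node starts out as the root of its own one-node tree.
        self.parent = {node: node for node in nodes}

    def find(self, node):
        """Return the root of the tree containing node."""
        root = node
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the walked path at the root.
        while self.parent[node] != root:
            self.parent[node], node = root, self.parent[node]
        return root

    def union(self, a, b):
        """Merge the trees containing a and b; b's root becomes the root."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_a] = root_b  # one pointer update per merge
        return root_b

forest = Forest([1, 2, 3])
forest.union(1, 2)
forest.union(2, 3)
print(forest.find(1), forest.find(2), forest.find(3))  # 3 3 3
```

Only the old root is rewritten when two trees merge, which avoids the per-node dictionary update, and path compression keeps later lookups short even when the trees would otherwise grow deep.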

List implemented using an inorder binary tree

For the new computer science assignment we are to implement a list/array using an inorder binary tree. I would just like a suggestion rather than a solution.
The idea is having a binary tree that has its nodes accessible via indexes, e.g.
t = ListTree()
t.insert(2,0) # 1st argument is the value, 2nd the index to insert at
t.get(0) # returns 2
The Node class that the values are stored in is not modifiable but has a property size which contains the total number of children below, along with left, right and value that point to children and store the value accordingly.
My chief problem at the moment is keeping track of the index. As we're not allowed to store the index of the node in the node itself, I must rely on traversing to track it. As I always start with the left node when traversing, I haven't yet thought of a way to recursively figure out what index we are currently at.
Any suggestions would be welcome.
Thanks!
You really wouldn't want to store it on the node itself, because then the index would have to be updated on inserts for all nodes with index less than insert index. I think the real question is how to do an in-order traversal. Try having your recursive function return the number of nodes to its left.
I don't think you want to store the index, rather just the size of each subtree. For instance, if you wanted to look up the 10th element in the list, and the left and right subtrees had 7 elements each, you would know that the root is the eighth element (since it's in-order binary), and the first element of the right subtree is the 9th. Armed with this knowledge, you would recurse into the right subtree, looking for the 2nd element.
HTH
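The subtree-size lookup described above can be sketched as follows. The `Node` class here is a hypothetical stand-in for the one in the assignment, and it assumes that `size` counts every node in the subtree including the node itself; if the real `size` counts only the nodes below, the arithmetic shifts by one.

```python
def size_of(node):
    # An empty subtree holds no elements.
    return node.size if node is not None else 0

class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right
        # Assumption: size includes this node as well as both subtrees.
        self.size = 1 + size_of(left) + size_of(right)

def get(node, index):
    """Return the value at a 0-based in-order position."""
    while node is not None:
        left_size = size_of(node.left)
        if index < left_size:
            node = node.left          # target lies in the left subtree
        elif index == left_size:
            return node.value         # this node is the target
        else:
            index -= left_size + 1    # skip the left subtree and this node
            node = node.right
    raise IndexError("index out of range")

tree = Node("b", Node("a"), Node("d", Node("c"), Node("e")))
print([get(tree, i) for i in range(5)])  # ['a', 'b', 'c', 'd', 'e']
```

Each step descends one level, so a lookup costs time proportional to the height of the tree rather than to the index.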
Well, a node in a binary tree cannot have a value and an index. They can have multiple pieces of data but the tree can only be keyed/built on one.
Maybe your assignment wants you to use the index value as the key to the tree and attach the value to the node for quick retrieval of the value given an index.
Does the tree have to be balanced? Does the algorithm need to be efficient?
If not, then the simplest thing to do is make a tree in which all the left children are null, i.e., a tree that devolves to a linked list.
To insert, recursively go to the right child, and then update the size of the node on the way back out. Something like (pseudocode)
function insert(node, value, index, depth)
    if depth < index
        create the rightchild if necessary
        insert( rightchild, value, index, depth + 1)
    if depth == size
        node.value = value
    node.size = rightchild.size + 1
After you have this working, you can modify it to be more balanced. When increasing the length of the list, add nodes to the left or right child nodes depending on which currently has the least, and update the size on the way out of the recursion.
To generalize to be more efficient, you need to work on the index in terms of its binary representation.
For example, an empty list has one node, without children, with value null and size 0.
Say you want to insert "hello" at index 1034. Then you want to end up with a tree whose root has two children, with sizes 1024 and 10. The left child has no actual children, but the right node has a right child of its own of size 2. (The left of size 8 is implied.) That node in turn, has one right child of size 0, with the value "hello". (This list has a 1-based index, but a 0-based index is similar.)
So you need to recursively break down the index into its binary parts, and add nodes as necessary. When searching the list, you need to take care when traversing a node with null children.
A very easy solution is to do GetFirst() to get the first node of the tree (this is simply finding the leftmost node of the tree). If your index N is 0, return the first node. Otherwise, call GetNodeNext() N times to get the appropriate node. This isn't super efficient though since accessing a node by index takes O(N Lg N) time.
Node *Tree::GetFirstNode()
{
    Node *pN,*child;
    pN=GetRootNode();
    while(NOTNIL(child=GetNodeLeft(pN)))
    {
        pN=child;
    }
    return(pN);
}
Node *Tree::GetNodeNext(Node *pNode)
{
    Node *temp;
    temp=GetNodeRight(pNode);
    if(NOTNIL(temp))
    {
        pNode=temp;
        temp=GetNodeLeft(pNode);
        while(NOTNIL(temp))
        {
            pNode=temp;
            temp=GetNodeLeft(pNode);
        }
        return(pNode);
    }
    else
    {
        temp=GetNodeParent(pNode);
        while( (NOTNIL(temp)) && (GetNodeRight(temp)==pNode) )
        {
            pNode=temp;
            temp=GetNodeParent(pNode);
        }
        return(temp);
    }
}
