Disjoint-Set forests in Python alternate implementation - python

I'm implementing a disjoint set system in Python, but I've hit a wall. I'm using a tree implementation for the system and am implementing Find(), Merge() and Create() functions for the system.
I am implementing a rank system and path compression for efficiency.
The catch is that these functions must take the set of disjoint sets as a parameter, making traversing hard.
    class Node(object):
        def __init__(self, value):
            self.parent = self
            self.value = value
            self.rank = 0

    def Create(values):
        l = [Node(value) for value in values]
        return l
The Create function takes in a list of values and returns a list of singular Nodes containing the appropriate data.
I'm thinking the Merge function would look similar to this,
    def Merge(set, value1, value2):
        value1Root = Find(set, value1)
        value2Root = Find(set, value2)
        if value1Root == value2Root:
            return
        if value1Root.rank < value2Root.rank:
            value1Root.parent = value2Root
        elif value1Root.rank > value2Root.rank:
            value2Root.parent = value1Root
        else:
            value2Root.parent = value1Root
            value1Root.rank += 1
but I'm not sure how to implement the Find() function since it is required to take the list of Nodes and a value (not just a node) as the parameters. Find(set, value) would be the prototype.
I understand how to implement path compression when a Node is taken as a parameter for Find(x), but this method is throwing me off.
Any help would be greatly appreciated. Thank you.
Edited for clarification.

The implementation of this data structure becomes simpler when you realize that the operations union and find can also be implemented as methods of a disjoint set forest class, rather than on the individual disjoint sets.
If you can read C++, then have a look at my take on the data structure; it hides the actual sets from the outside world, representing them only as numeric indices in the API. In Python, it would be something like
    class DisjSets(object):
        def __init__(self, n):
            # list(...) so the parents can be reassigned (range objects are immutable in Python 3)
            self._parent = list(range(n))
            self._rank = [0] * n

        def find(self, i):
            if self._parent[i] == i:
                return i
            else:
                # Path compression: point i directly at its root
                self._parent[i] = self.find(self._parent[i])
                return self._parent[i]

        def union(self, i, j):
            root_i = self.find(i)
            root_j = self.find(j)
            if root_i != root_j:
                if self._rank[root_i] < self._rank[root_j]:
                    self._parent[root_i] = root_j
                elif self._rank[root_i] > self._rank[root_j]:
                    self._parent[root_j] = root_i
                else:
                    self._parent[root_i] = root_j
                    self._rank[root_j] += 1
(Not tested.)
If you choose not to follow this path, the client of your code will indeed have to have knowledge of Nodes and Find must take a Node argument.
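For readers who want to keep arbitrary values rather than integer indices, here is a minimal sketch along the same lines. The class name `ValueDisjointSets` and its dict-based layout are assumptions made for illustration, not part of the answer above. `find` is written iteratively, in two passes, so that long parent chains cannot hit the recursion limit.

```python
class ValueDisjointSets:
    """Hypothetical sketch: disjoint sets keyed by hashable values."""

    def __init__(self, values):
        # Every value starts as the root of its own one-element tree.
        self._parent = {v: v for v in values}
        self._rank = {v: 0 for v in values}

    def find(self, v):
        # First pass: walk up to the root.
        root = v
        while self._parent[root] != root:
            root = self._parent[root]
        # Second pass: point every node on the path straight at the root.
        while self._parent[v] != root:
            self._parent[v], v = root, self._parent[v]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Attach the lower-rank root under the higher-rank one.
        if self._rank[ra] < self._rank[rb]:
            ra, rb = rb, ra
        self._parent[rb] = ra
        if self._rank[ra] == self._rank[rb]:
            self._rank[ra] += 1


sets = ValueDisjointSets("abcde")
sets.union("a", "b")
sets.union("c", "d")
sets.union("b", "d")
same = sets.find("a") == sets.find("c")      # a, b, c, d are now one set
separate = sets.find("e") != sets.find("a")  # e was never merged
```

The client only ever passes values in and gets a representative value back, so the tree structure stays hidden inside the class.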

Clearly, the merge function should be applied to a pair of nodes.
So the find function should take a single node parameter and look like this:
    def find(node):
        if node.parent != node:
            node.parent = find(node.parent)
        return node.parent
Also, Wikipedia has pseudocode that is easily translatable to Python.

Find is always done on an item. Find(item) is defined as returning the set to which the item belongs. Merge, as such, does not have to take nodes; it always takes two items (or sets). Merge (or union) of item1 and item2 must first call Find(item1) and Find(item2), which return the sets to which each of them belongs. After that, the shorter up-tree must be attached under the root of the taller one. Whenever a find is issued, retrace the path and compress it.
A tested implementation with path compression is here.
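One way to satisfy the `Find(set, value)` prototype from the question is to look the value up in the list first and then run the usual node-based find. The following is a minimal sketch, assuming values are unique within the list. The helpers `_lookup` and `_find_root` are hypothetical names; the linear scan in `_lookup` makes every call O(n), so a dict from value to node would be the natural next step.

```python
class Node(object):
    def __init__(self, value):
        self.parent = self
        self.value = value
        self.rank = 0


def Create(values):
    return [Node(value) for value in values]


def _lookup(nodes, value):
    # Hypothetical helper: linear scan for the node holding `value`.
    for node in nodes:
        if node.value == value:
            return node
    raise KeyError(value)


def _find_root(node):
    # Node-based find with path compression.
    if node.parent is not node:
        node.parent = _find_root(node.parent)
    return node.parent


def Find(nodes, value):
    return _find_root(_lookup(nodes, value))


def Merge(nodes, value1, value2):
    root1, root2 = Find(nodes, value1), Find(nodes, value2)
    if root1 is root2:
        return
    # Union by rank: the shallower tree goes under the deeper one.
    if root1.rank < root2.rank:
        root1, root2 = root2, root1
    root2.parent = root1
    if root1.rank == root2.rank:
        root1.rank += 1


forest = Create([1, 2, 3, 4])
Merge(forest, 1, 2)
Merge(forest, 3, 4)
Merge(forest, 2, 4)
```

After these three merges, `Find(forest, 1)` and `Find(forest, 3)` return the same root node.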

Related

Implementation of Shortest Path Graph in Python Class

Hi! I am new to Python and I am struggling for many hours so far with some problem regarding Shortest Path Algorithm implementation in Python.
I am expected to solve some task about finding shortest paths (graph problem) among given persons, and at the end find a common person who connects all of them.
I've made something like this so far:
    import itertools

    class centralperson():
        def initialization(self,N,data,a,b,c):
            self.data = data
            self.N = N
            self.a = a
            self.b = b
            self.c = c
            self.list_of_values = [self.a,self.b,self.c]
            self.list_of_paths = []
            self.common_person = []

        def makeGraph(self):
            # Create dict with empty list for each key
            graph = {key: [] for key in range(self.N)}
            self.graph = graph
            for key in self.graph:
                for x in self.data:
                    if key in x:
                        x = x.copy()
                        x.remove(key)
                        self.graph[key].extend(x)

        def find_path(self,start, end):
            path = []
            path = path + [start]
            if start == end:
                return path
            if start not in self.graph.keys():
                raise ValueError('No such key in graph!')
            for node in self.graph[start]:
                if node not in path:
                    newpath = self.find_path(self.graph, node, end, path)
                    if newpath: return newpath
            return self.list_of_paths.append(path)

        def findPaths(self):
            for pair in itertools.combinations(self.list_of_values, r=5):
                self.find_path(*pair)

        def commonperson(self):
            list_of_lens = {}
            commonalities = set(self.list_of_paths[0]) & set(self.list_of_paths[1]) & set(self.list_of_paths[2])
            for i in self.list_of_values:
                list_of_lens[i] = (len(self.graph[i]))
            if len(commonalities)>1 or len(commonalities)<1:
                for k,v in list_of_lens.items():
                    if v==1 and self.graph[k][0] in commonalities:
                        output = self.graph[k]
                        self.common_person.append(output)
            else:
                output = list(commonalities)
                self.common_person.append(output)
            return

        def printo(self):
            #return(self.common_person[0])
            return(self.list_of_paths,self.list_of_values)
Description of each function and inputs:
N -> number of unique nodes
a,b,c -> some arbitrarily chosen nodes to find common one among them
initialization -> just initialize our global variables used in other methods, and store the list of outputs
makeGraph -> makes an Adjacency List out of an input.
find_path -> find path between two given nodes (backtracking recursion)
findPaths -> it was expected to call find_path here for every combination of A,B,C i.e A->B, A->C, B->C
commonperson -> expected to find common person from the output of list_of_paths list
printo -> print this common person
Generally it works (I think) when I run each function separately. But when I try to combine everything into one big class, it doesn't work :(
I think the problem is with the recursive function find_path. It is supposed to find a path between two given persons and append the resulting path to the list of all paths. However, I have 3 different persons, and find_path only takes 2 of them at a time (a start and an end).
Hence, I need to find all the paths that connect them (3 paths) and append them to a bigger list, list_of_paths. I've created findPaths to use itertools.combinations and, in a for loop, call find_path for every combination of start and end arguments, but it doesn't seem to work...
Can you help me with this? Also, I don't know how to run all the methods at once, because honestly I wouldn't like to call each method of the class separately... My desired version is to:
Provide input to the class, i.e. N, data, a, b, c, where N is the number of unique nodes, data is a list of lists describing the networks, and a, b, c are my persons.
Get output: the common person for all 3 persons (I planned to store it in the common_person list).
The code inside your class should be indented, i.e.:
    class centralperson:
        def __init__(self, ...):
            ...
        def makeGraph(self, ...):
            ...
instead of
    class centralperson:
    def __init__(self, ...):
        ...
    def makeGraph(self, ...):
        ...
Try googling for 'python class examples'. I hope this helps!
It might also be useful to experiment with simpler classes before working on this problem.
    itertools.combinations(self.list_of_values, r=5)
yields no combinations at all, since self.list_of_values only has 3 elements, from which you cannot pick 5. The for loop in findPaths therefore never runs.
Perhaps you meant:
    itertools.combinations(self.list_of_values, r=2)
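To make the point concrete, here is a small sketch of what `r=2` produces for three people, paired with a breadth-first search that returns a shortest path for each pair. The BFS is swapped in as an assumption about what "shortest path" should mean (fewest edges); it is not the recursive `find_path` from the question, and the sample graph is made up.

```python
from collections import deque
from itertools import combinations


def shortest_path(graph, start, end):
    """Breadth-first search; returns a list of nodes or None."""
    if start not in graph:
        raise ValueError('No such key in graph!')
    previous = {start: None}          # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == end:
            path = []
            while node is not None:   # walk back to the start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph[node]:
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


# Made-up adjacency list with edges 0-1, 1-2, 1-3, 3-4
graph = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1, 4], 4: [3]}
people = [0, 2, 4]

pairs = list(combinations(people, r=2))       # three pairs from three people
paths = [shortest_path(graph, a, b) for a, b in pairs]
common = set.intersection(*map(set, paths))   # nodes shared by every path
too_many = list(combinations(people, r=5))    # cannot pick 5 from 3: empty
```

In this sample graph node 1 lies on all three paths, so it is the only element of `common`.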

How does the "in" keyword in python work ? Is there a faster way?

I am a programming newbie and I was playing with some big data (the Yelp dataset, 6M+ reviews). As a simple example, I wanted to find a certain word in the review text, so I basically tried to for-loop through all this data looking for that word. Before anyone gets triggered: I know that this is the worst way of doing it. Then I used nltk to preprocess the data, put the reviews in a list, and checked for the word inside this list using the "in" keyword, and it was much faster. So my question is: what makes the "in" keyword faster? And is there an even faster way, other than improving the preprocessing part?
Edit 1 (here is some example code):
First I tokenize the review, e.g. "This place is good" becomes ["This","place","is","good"]
    contents = word_tokenize(data[u'text'])
Then I check if a certain string is in this list
    if(contents[i] in list_of_targeted_words): return 1
This appeared to be faster than using a for loop
    if(contents[i] == list_of_targeted_words[j]): return 1
in itself is not faster; it's just the public face of the __contains__ method that a class can define. list.__contains__ is an O(n) operation because, in the worst case, it has to walk through the entire list looking for a match.
    # a in my_list
    for x in my_list:
        if x == a:
            return True
    return False
dict.__contains__ is fast because it only needs an O(1) (on average) hash lookup of the key.
    # a in my_dict
    try:
        my_dict[a]
        return True
    except KeyError:
        return False
Other classes can define __contains__ as needed. Consider a class representing a binary search tree:
    class BST:
        # You should be able to infer enough of the structure of the tree
        # from this definition.
        def __contains__(self, x):
            node = self.root
            while node is not None:
                if node.value == x:
                    return True
                elif node.value < x:
                    node = node.right
                else:
                    node = node.left
            return False
Most importantly, it doesn't do an O(n) traversal of the entire tree looking for x; instead, it does an O(lg n) walk down from the root (assuming the tree is balanced).
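For the word-matching case in the question, the practical consequence is that converting the target list to a `set` once turns each `in` test into a hash lookup instead of a scan. A minimal sketch with made-up words; the names `targets` and `contains_target` are hypothetical.

```python
list_of_targeted_words = ["good", "great", "excellent"]
targets = set(list_of_targeted_words)   # built once, reused for every review


def contains_target(tokens, targets):
    # any() stops at the first token that is found in the set.
    return any(token in targets for token in tokens)


hit = contains_target(["This", "place", "is", "good"], targets)
miss = contains_target(["This", "place", "is", "closed"], targets)
```

The gain only shows when the target collection is large or the check is repeated many times; for a handful of words, a list scan is already cheap.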

Binary Searching and lists

    class BinaryStringList():
        def __init_(self):
            self.item = []
        def strAdd(self,item):
            self.items.append(item)
        def finditem(self, item):
            if len(self)==0:
                print("List is empty!")
            else:
                midpoint = len(self)//2
                if self[midpoint]==item:
                    print("Item Found ", item)
                else:
                    if item<self[midpoint]:
                        return finditem(self[:midpoint], item)
                    else:
                        return finditem(self[midpoint+1:], item)
Where I am having an issue is when trying to add items to the list. If I do something like:
    alist = BinaryStringList()
    alist.strAdd("test1")
my code fails, stating that the object has no attribute. I'm not sure why it is failing, since I have almost exactly the same code in another program, except that its find uses a sequential search whereas this one uses a binary search.
You have multiple errors in your code. Also, recursion doesn't work that way: you need a base condition that returns. This solution will work, but I strongly suggest you solve simpler problems using recursion first to understand how it works.
    class BinaryStringList:
        def __init__(self): # You had 1 _ after init
            self.items = [] # Typo, should have been items.
        def strAdd(self,item):
            self.items.append(item)
        def finditem(self, item):
            return self.binser(self.items, item)
        def binser(self, items, item):
            if len(items)==0:
                return
            midpoint = len(items)//2 # len(self) means nothing, it should be len(self.items); // keeps the index an integer
            if items[midpoint]==item:
                return item
            else:
                if item<items[midpoint]:
                    return self.binser(items[:midpoint], item) #self[:midpoint] means nothing, you needed self.items[:midpoint]
                else:
                    return self.binser(items[midpoint+1:], item)
    binser = BinaryStringList()
    binser.strAdd(1) # You added a string here. Your logic won't work with string.
    binser.strAdd(2)
    binser.strAdd(3)
    binser.strAdd(5)
    binser.strAdd(8)
    binser.strAdd(9)
    binser.strAdd(10)
    print(binser.finditem(1))
    print(binser.finditem(10))
    print(binser.finditem(5))
    print(binser.finditem(11))
(There are other ways of solving binary search too, e.g. an iterative approach, or passing low/high index values rather than slicing the input list.) Try to solve binary search using those two approaches.
For binary search that passes low/high index values, the signature of binser will look like: def binser(self, low, high, item):
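A sketch of the low/high variant, written iteratively and as a plain function rather than a method. It assumes `items` is already sorted and returns the index of the match, or `None` when the item is absent; the name `binser` is borrowed from the answer above.

```python
def binser(items, item):
    """Binary search on a sorted list without slicing."""
    low, high = 0, len(items) - 1
    while low <= high:
        midpoint = (low + high) // 2
        if items[midpoint] == item:
            return midpoint
        if item < items[midpoint]:
            high = midpoint - 1      # keep searching the left half
        else:
            low = midpoint + 1       # keep searching the right half
    return None


numbers = [1, 2, 3, 5, 8, 9, 10]
```

Because only two indices move, no copies of the list are made, unlike the slicing version, which copies half of the list at every step.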
Your code is failing because you misspelled __init__. You need two underscores on each side, or it's just a weirdly named method. Since you lack a __init__, the default __init__ (which sets no attributes) is used, and you don't have an item or items attribute. You need to fix the __init__, and use a consistent name for items:
    class BinaryStringList():
        def __init__(self): # <-- Added extra trailing underscore
            self.items = [] # Fixed name to be items, not item
You have many other problems here (you're not maintaining sorted order, so binary search won't work, you haven't implemented __getitem__ so self[midpoint] won't work so you'd need self.items[midpoint], lack of __len__ means len(self) won't work either, etc.), but the two issues above are what specifically makes you get the AttributeError.
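Pulling those remaining points together, here is one possible shape for the class, using the standard-library `bisect` module to keep the list sorted on insert and to search it. This is a sketch under assumptions, not the asker's intended design: `__len__` and `__getitem__` are added so that `len(alist)` and `alist[i]` work, and `finditem` returns a boolean instead of printing.

```python
import bisect


class BinaryStringList:
    def __init__(self):
        self.items = []

    def __len__(self):
        return len(self.items)

    def __getitem__(self, index):
        return self.items[index]

    def strAdd(self, item):
        # insort keeps self.items sorted, which binary search requires.
        bisect.insort(self.items, item)

    def finditem(self, item):
        # bisect_left gives the leftmost position where item could go.
        i = bisect.bisect_left(self.items, item)
        return i < len(self.items) and self.items[i] == item


alist = BinaryStringList()
for word in ["test3", "test1", "test2"]:
    alist.strAdd(word)
```

Strings compare lexicographically, so the same class works for words as long as every item added is of a mutually comparable type.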

How to write pop(item) method for unsorted list

I'm implementing some basic data structures in preparation for an exam and have come across the following issue. I want to implement an unsorted linked list, and have already implemented a pop() method, however I don't know, either syntactically or conceptually, how to make a function sometimes take an argument, sometimes not take an argument. I hope that makes sense.
    def pop(self):
        current = self.head
        found = False
        endOfList = None
        while current != None and not found:
            if current.getNext() == None:
                found = True
                endOfList = current.getData()
                self.remove(endOfList)
                self.count = self.count - 1
            else:
                current = current.getNext()
        return endOfList
I want to know how to make the statement unsortedList.pop(3) valid, 3 being just an example and unsortedList being a new instance of the class.
The basic syntax (and a common use case) for using a parameter with a default value looks like this:
    def pop(self, index=None):
        if index is None:
            # Do whatever your default behaviour should be
You then just have to identify how you want your behaviour to change based on the argument. I am just guessing that the argument should specify the index of the element that should be popped from the list.
If that is the case, you can directly use a valid default value instead of None, e.g. 0:
    def pop(self, index=0):
First, add a parameter with a default value to the function:
    def pop(self, item=None):
Now, in the code, if item is None:, you can do the "no param" thing; otherwise, use item. Whether you want to switch at the top, or lower down in the logic, depends on your logic. In this case, item is None probably means "match the first item", so you probably want a single loop that checks item is None or current.data == item:.
Sometimes you'll want to do this for a parameter that can legitimately be None, in which case you need to pick a different sentinel. There are a few questions around here (and blog posts elsewhere) on the pros and cons of different choices. But here's one way:
    class LinkedList(object):
        _sentinel = object()
        def pop(self, item=_sentinel):
Unless it's valid for someone to use the private _sentinel class member of LinkedList as a list item, this works. (If that is valid—e.g., because you're building a debugger out of these things—you have to get even trickier.)
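A compact sketch of the sentinel approach on a toy singly linked list. The `_Node` class, the `push` method and the exact semantics are assumptions for illustration: with no argument `pop` removes the head, and with an argument it removes the first node whose data equals it, `None` included.

```python
class _Node(object):
    def __init__(self, data, next=None):
        self.data = data
        self.next = next


class LinkedList(object):
    _sentinel = object()   # unique marker meaning "no argument given"

    def __init__(self):
        self.head = None

    def push(self, data):
        # Insert at the head of the list.
        self.head = _Node(data, self.head)

    def pop(self, item=_sentinel):
        prev, current = None, self.head
        while current is not None:
            # With no argument, the very first node matches.
            if item is LinkedList._sentinel or current.data == item:
                if prev is None:
                    self.head = current.next
                else:
                    prev.next = current.next
                return current.data
            prev, current = current, current.next
        raise ValueError("list is empty or item not found")


lst = LinkedList()
for value in [3, None, 1]:
    lst.push(value)        # list is now 1 -> None -> 3
first = lst.pop()          # no argument: removes the head
none_item = lst.pop(None)  # None is a real item here, not "no argument"
last = lst.pop(3)
```

The identity check `item is LinkedList._sentinel` is what lets `pop(None)` be told apart from `pop()`.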
The terminology on this is a bit tricky. Quoting the docs:
When one or more top-level parameters have the form parameter = expression, the function is said to have “default parameter values.”
To understand this: "Parameters" (or "formal parameters") are the things the function is defined to take; "arguments" are things passed to the function in a call expression; "parameter values" (or "actual parameters", but this just makes things more confusing) are the values the function body receives. So, it's technically incorrect to refer to either "default parameters" or "parameters with default arguments", but both are quite common, because even experts find this stuff confusing. (If you're curious, or just not confused yet, see function definitions and calls in the reference documentation for full details.)
Is your exam using Python specifically? If not, you may want to look into function overloading. Python doesn't support this feature, but many other languages do, and it is a very common approach to solving this kind of problem.
In Python, you can get a lot of mileage out of using parameters with default values (as Michael Mauderer's example points out).
    def pop(self, index=None):
        prev = None
        current = self.head
        if current is None:
            raise IndexError("can't pop from empty list")
        if index is None:
            index = 0 # the first item by default (counting from head)
        if index < 0:
            index += self.count
        if not (0 <= index < self.count):
            raise IndexError("index out of range")
        i = 0
        while i != index:
            i += 1
            prev = current
            current = current.getNext()
            assert current is not None # never happens if list is self-consistent
        assert i == index
        value = current.getData()
        self.remove(current, prev)
        ##self.count -= 1 # this should be in self.remove()
        return value

How to walk up a linked-list using a list comprehension?

I've been trying to think of a way to traverse a hierarchical structure, like a linked list, using a list expression, but haven't come up with anything that seems to work.
Basically, I want to convert this code:
    p = self.parent
    names = []
    while p:
        names.append(p.name)
        p = p.parent
    print(".".join(names))
into a one-liner like:
    print(".".join( [o.name for o in <???>] ))
I'm not sure how to do the traversal in the ??? part, though, in a generic way (if it's even possible). I have several structures with similar .parent type attributes, and don't want to have to write a yielding function for each.
Edit:
I can't use the __iter__ method of the object itself because it's already used for iterating over the values contained within the object. Most other answers, except for liori's, hardcode the attribute name, which is what I want to avoid.
Here's my adaptation based upon liori's answer:
    import operator

    def walk(attr, start):
        if callable(attr):
            getter = attr
        else:
            getter = operator.attrgetter(attr)
        o = getter(start)
        while o:
            yield o
            o = getter(o)
The closest thing I can think of is to create a parent generator:
    # Generate a node's parents, heading towards ancestors
    def gen_parents(node):
        node = node.parent
        while node:
            yield node
            node = node.parent

    # Now you can do this
    parents = [x.name for x in gen_parents(node)]
    print('.'.join(parents))
If you want your solution to be general, use a general technique. This is a fixed-point-like generator:
    def fixedpoint(f, start, stop):
        while start != stop:
            yield start
            start = f(start)
It will return a generator yielding start, f(start), f(f(start)), f(f(f(start))), ..., stopping as soon as one of these values equals stop.
Usage:
    print(".".join(x.name for x in fixedpoint(lambda p:p.parent, self, None)))
My personal helpers library has had a similar fixedpoint-like function for years... it is pretty useful for quick hacks.
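A self-contained sketch of that usage, with `operator.attrgetter` standing in for the lambda so that the attribute name is passed as data. The `Item` class and its names are made up for the demonstration. Note that the walk begins at `start` itself, so pass the object's parent to skip the starting object.

```python
import operator


def fixedpoint(f, start, stop):
    while start != stop:
        yield start
        start = f(start)


class Item(object):
    # Hypothetical stand-in for any object with .name and .parent.
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent


root = Item("root")
branch = Item("branch", root)
leaf = Item("leaf", branch)

get_parent = operator.attrgetter("parent")
# Starting at leaf includes leaf itself in the result.
with_self = ".".join(x.name for x in fixedpoint(get_parent, leaf, None))
# Starting at leaf.parent yields ancestors only, like the original while loop.
ancestors = ".".join(x.name for x in fixedpoint(get_parent, leaf.parent, None))
```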
List comprehensions work with any iterable, i.e. an object whose __iter__() method returns an iterator. You need to make your structure iterable in order to be able to iterate over it this way.
Your LinkedList needs to be iterable for it to work properly.
Here's a good resource on it. (PDF warning) It is very in depth on both iterators and generators.
Once you do that, you'll be able to just do this:
    print(".".join( [o.name for o in self] ))
