I am taking Algorithms, Part I via Coursera and am looking to test the run time of the Quick Find, Quick Union, and Weighted Quick Union algorithms. The course is in Java, which I am unfamiliar with, so I have gone through and attempted to recreate the algorithms in Python, which I am more familiar with.
Now that I have everything implemented, I aim to test each function to verify run time/complexity. I had been thinking of using the timeit library, but that seems to be throwing incorrect results, e.g., Weighted Quick Union takes longer to complete than QuickUnion.
How can I verify that a Weighted Quick Union is in fact O(log n) and is faster than Quick Union? Here is what I have created and tried so far:
O(N**2) - Slow
class QuickFind_Eager:
    def __init__(self, nodes):
        self.array = [num for num in range(nodes)]
    # Joins two nodes into a component
    def union(self, first_node, second_node):
        for pos, val in enumerate(self.array):
            if self.array[pos] == self.array[first_node]:
                self.array[pos] = self.array[second_node]
    # Checks if two nodes are in the same component
    def connected(self, first_node, second_node):
        return self.array[first_node] == self.array[second_node]
O(N) - Still too slow - Avoid
class QuickUnion_Lazy:
    def __init__(self, nodes):
        self.array = [num for num in range(nodes)]
    # Follows parent pointers to actual root
    def root(self, parent):
        while parent != self.array[parent]:
            parent = self.array[parent]
        return parent
    # Joins two nodes into a component
    def union(self, first_node, second_node):
        self.array[first_node] = self.array[second_node]
    # Checks if two nodes are in the same component
    def connected(self, first_node, second_node):
        return self.root(first_node) == self.root(second_node)
O(log N) - Pretty darn fast
class WeightedQuickUnion:
    def __init__(self, nodes):
        self.array = [num for num in range(nodes)]
        self.weight = [num for num in range(nodes)]
    # Follows parent pointers to actual root
    def root(self, parent):
        while parent != self.array[parent]:
            parent = self.array[parent]
        return parent
    # Joins two nodes into a component
    def union(self, first_node, second_node):
        if self.root(first_node) == self.root(second_node):
            return
        if (self.weight[first_node] < self.weight[second_node]):
            self.array[first_node] = self.root(second_node)
            self.weight[second_node] += self.weight[first_node]
        else:
            self.array[second_node] = self.root(first_node)
            self.weight[first_node] += self.weight[second_node]
    # Checks if two nodes are in the same component
    def connected(self, first_node, second_node):
        return self.root(first_node) == self.root(second_node)
O(N + M lg* N) - wicked fast
class WeightedQuickUnion_PathCompression:
    def __init__(self, nodes):
        self.array = [num for num in range(nodes)]
        self.weight = [num for num in range(nodes)]
    # Follows parent pointers to actual root
    def root(self, parent):
        while parent != self.array[parent]:
            self.array[parent] = self.array[self.array[parent]]
            parent = self.array[parent]
        return parent
    # Joins two nodes into a component
    def union(self, first_node, second_node):
        if self.root(first_node) == self.root(second_node):
            return
        if self.weight[first_node] < self.weight[second_node]:
            self.array[first_node] = self.root(second_node)
            self.weight[second_node] += self.weight[first_node]
        else:
            self.array[second_node] = self.root(first_node)
            self.weight[first_node] += self.weight[second_node]
    # Checks if two nodes are in the same component
    def connected(self, first_node, second_node):
        return self.root(first_node) == self.root(second_node)
Test run time
def test_quickfind(quickfind):
    t = quickfind(100)
    t.union(1,2)
    t.connected(1,2)
    t.union(4,2)
    t.union(3,4)
    t.connected(0,2)
    t.connected(1,4)
    t.union(0,3)
    t.connected(0,4)
import timeit
t = timeit.timeit(stmt="test_quickfind(QuickFind_Eager)", setup="from __main__ import QuickFind_Eager; from __main__ import test_quickfind", number=100000)
print(t)
# 11.4380569069981
t = timeit.timeit(stmt="test_quickfind(QuickUnion_Lazy)", setup="from __main__ import QuickUnion_Lazy; from __main__ import test_quickfind", number=100000)
print(t)
# 1.4744456350017572
t = timeit.timeit(stmt="test_quickfind(WeightedQuickUnion)", setup="from __main__ import WeightedQuickUnion; from __main__ import test_quickfind", number=100000)
print(t)
# 2.738758583996969
t = timeit.timeit(stmt="test_quickfind(WeightedQuickUnion_PathCompression)", setup="from __main__ import WeightedQuickUnion_PathCompression; from __main__ import test_quickfind", number=100000)
print(t)
# 3.0113827050008695
Update
Added results from timeit.
You need to tabulate the algorithms' running times as a function of problem size, i.e., call quickfind for several problem sizes (say 100, 200, 300, 400, 500; beware, expect the largest to run at least 3 minutes for the naive O(n^2) algorithm).
Even then, you have no guarantee that you will observe the asymptotic running times. O notation describes a family of functions: O(f) roughly covers every g_i with g_i(n) = a_i * f(n) + b_i for constants a_i, b_i (somewhat abusing the notation). In addition, some of your implementations may run into resource exhaustion (read: no more RAM), which causes significant performance hits that are outside the control of your implementations.
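The tabulation described above could be sketched roughly as follows. This is only an illustration under assumptions: the time_unions helper, the problem sizes, and the random pairs are made up for the example. The weighted quick-union used here keeps tree sizes that start at 1 and compares them at the roots, which differs from the class in the question (there, weights start at the node index and are compared at the nodes passed in).

```python
import random
import time


class WeightedQuickUnion:
    """Weighted quick-union sketch: the smaller tree is linked below the larger one."""

    def __init__(self, nodes):
        self.parent = list(range(nodes))
        self.size = [1] * nodes  # every node starts as a tree of size 1

    def root(self, node):
        # Follow parent pointers until a node points at itself
        while node != self.parent[node]:
            node = self.parent[node]
        return node

    def union(self, first_node, second_node):
        first_root = self.root(first_node)
        second_root = self.root(second_node)
        if first_root == second_root:
            return
        if self.size[first_root] < self.size[second_root]:
            first_root, second_root = second_root, first_root
        # first_root is now the larger tree; hang the smaller one below it
        self.parent[second_root] = first_root
        self.size[first_root] += self.size[second_root]

    def connected(self, first_node, second_node):
        return self.root(first_node) == self.root(second_node)


def time_unions(cls, n, seed=0):
    """Hypothetical helper: seconds taken for n random unions on n nodes."""
    rng = random.Random(seed)
    pairs = [(rng.randrange(n), rng.randrange(n)) for _ in range(n)]
    uf = cls(n)
    start = time.perf_counter()
    for a, b in pairs:
        uf.union(a, b)
    return time.perf_counter() - start


# Tabulate running time against problem size (sizes chosen arbitrarily)
for n in (1000, 2000, 4000, 8000):
    print(n, time_unions(WeightedQuickUnion, n))
```

If the time roughly doubles each time n doubles, the whole batch grows about linearly in n; a factor near four would point to quadratic growth. With only 100 nodes and a handful of calls, as in the test in the question, constant overhead dominates the measurement.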
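The implementation of the union function in the QuickFind_Eager class is not correct. The loop overwrites self.array[first_node] partway through, so later entries are compared against the new id instead of the old one.
self.array[first_node] and self.array[second_node] should be stored in variables before the loop, and the loop should then compare and assign using those variables: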
def union(self, first_node, second_node):
    pid = self.array[first_node]
    qid = self.array[second_node]
    for pos, val in enumerate(self.array):
        if self.array[pos] == pid:
            self.array[pos] = qid
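To see why reading the ids once matters, here is a small check. The names union_buggy and union_fixed are just labels for this sketch, and a plain list stands in for self.array.

```python
def union_buggy(ids, first_node, second_node):
    # Same logic as the question: re-reads ids[first_node] on every iteration
    for pos in range(len(ids)):
        if ids[pos] == ids[first_node]:
            ids[pos] = ids[second_node]


def union_fixed(ids, first_node, second_node):
    # Same logic as the correction: read both ids once before the loop
    pid = ids[first_node]
    qid = ids[second_node]
    for pos in range(len(ids)):
        if ids[pos] == pid:
            ids[pos] = qid


buggy = [0, 1, 2]
union_buggy(buggy, 0, 1)
union_buggy(buggy, 0, 2)
print(buggy)  # [2, 1, 2]

fixed = [0, 1, 2]
union_fixed(fixed, 0, 1)
union_fixed(fixed, 0, 2)
print(fixed)  # [2, 2, 2]
```

In the buggy run, node 1 is left behind with its old id even though it was joined to node 0 earlier, so connected(1, 2) would wrongly report False.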
Related
I am trying to develop a 15-puzzle program in Python. It's supposed to sort everything into numerical order using the A* search algorithm, with the 0 at the end.
Here is the A* algorithm I've developed so far:
"""Search the nodes with the lowest f scores first.
You specify the function f(node) that you want to minimize; for example,
if f is a heuristic estimate to the goal, then we have greedy best
first search; if f is node.depth then we have breadth-first search.
There is a subtlety: the line "f = memoize(f, 'f')" means that the f
values will be cached on the nodes as they are computed. So after doing
a best first search you can examine the f values of the path returned."""
def best_first_graph_search_manhattan(root_node):
    start_time = time.time()
    f = manhattan(root_node)
    node = root_node
    frontier = []
    # how do we create this association?
    heapq.heappush(frontier, node)
    explored = set()
    z = 0
    while len(frontier) > 0:
        node = heapq.heappop(frontier)
        print(node.state.tiles)
        explored.add(node)
        if (goal_test(node.state.tiles)):
            #print('In if statement')
            path = find_path(node)
            end_time = time.time()
            z = z + f
            return path, len(explored), z, (end_time - start_time)
        for child in get_children(node):
            # calculate total cost
            f_0 = manhattan(child)
            z = z + f_0
            print(z)
            if child not in explored and child not in frontier:
                #print('Pushing frontier and child')
                heapq.heappush(frontier, child)
        print('end of for loop')
    return None
"""
Return the heuristic value for a given state using manhattan function
"""
def manhattan(node):
    # Manhattan Heuristic Function
    # x1, y1 = node.state.get_location()
    # x2, y2 = self.goal
    zero_location = node.state.tiles.index('0')
    x1 = math.floor(zero_location / 4)
    y1 = zero_location % 4
    x2 = 3
    y2 = 3
    return abs(x2 - x1) + abs(y2 - y1)
"""
astar_search() is a best-first graph searching algorithm using equation f(n) = g(n) + h(n)
h is specified as...
"""
def astar_search_manhattan(root_node):
    """A* search is best-first graph search with f(n) = g(n)+h(n).
    You need to specify the h function when you call astar_search, or
    else in your Problem subclass."""
    return best_first_graph_search_manhattan(root_node)
Here is the rest of my program. Assume that everything is working correctly in the following:
import random
import math
import time
import psutil
import heapq
#import utils.py
import os
import sys
from collections import deque
# This class defines the state of the problem in terms of board configuration
class Board:
    def __init__(self,tiles):
        self.size = int(math.sqrt(len(tiles))) # defining length/width of the board
        self.tiles = tiles
    #This function returns the resulting state from taking particular action from current state
    def execute_action(self,action):
        new_tiles = self.tiles[:]
        empty_index = new_tiles.index('0')
        if action=='l':
            if empty_index%self.size>0:
                new_tiles[empty_index-1],new_tiles[empty_index] = new_tiles[empty_index],new_tiles[empty_index-1]
        if action=='r':
            if empty_index%self.size<(self.size-1):
                new_tiles[empty_index+1],new_tiles[empty_index] = new_tiles[empty_index],new_tiles[empty_index+1]
        if action=='u':
            if empty_index-self.size>=0:
                new_tiles[empty_index-self.size],new_tiles[empty_index] = new_tiles[empty_index],new_tiles[empty_index-self.size]
        if action=='d':
            if empty_index+self.size < self.size*self.size:
                new_tiles[empty_index+self.size],new_tiles[empty_index] = new_tiles[empty_index],new_tiles[empty_index+self.size]
        return Board(new_tiles)
# This class defines the node on the search tree, consisting of state, parent and previous action
class Node:
    def __init__(self,state,parent,action):
        self.state = state
        self.parent = parent
        self.action = action
        #self.initial = initial
    #Returns string representation of the state
    def __repr__(self):
        return str(self.state.tiles)
    #Comparing current node with other node. They are equal if states are equal
    def __eq__(self,other):
        return self.state.tiles == other.state.tiles
    def __hash__(self):
        return hash(self.state)
    def __lt__(self, other):
        return manhattan(self) < manhattan(other)
# Utility function to randomly generate 15-puzzle
def generate_puzzle(size):
    numbers = list(range(size*size))
    random.shuffle(numbers)
    return Node(Board(numbers),None,None)
# This function returns the list of children obtained after simulating the actions on current node
def get_children(parent_node):
    children = []
    actions = ['l','r','u','d'] # left,right, up , down ; actions define direction of movement of empty tile
    for action in actions:
        child_state = parent_node.state.execute_action(action)
        child_node = Node(child_state,parent_node,action)
        children.append(child_node)
    return children
# This function backtracks from current node to reach initial configuration. The list of actions would constitute a solution path
def find_path(node):
    path = []
    while(node.parent is not None):
        path.append(node.action)
        node = node.parent
    path.reverse()
    return path
# Main function accepting input from console , running iterative_deepening_search and showing output
def main():
    global nodes_expanded
    global path
    global start_time
    global cur_time
    global end_time
    nodes_expanded = 0
    process = psutil.Process(os.getpid())
    initial_memory = process.memory_info().rss / 1024.0
    initial = str(input("initial configuration: "))
    initial_list = initial.split(" ")
    root = Node(Board(initial_list),None,None)
    print(astar_search_manhattan(root))
    final_memory = process.memory_info().rss / 1024.0
    print('Directions: ', path)
    print('Total Time: ', (end_time-start_time), ' seconds')
    print('Total Memory: ',str(final_memory-initial_memory)+" KB")
    print('Total Nodes Expanded: ', nodes_expanded)
# Utility function checking if current state is goal state or not
def goal_test(cur_tiles):
    return cur_tiles == ['1','2','3','4','5','6','7','8','9','10','11','12','13','14','15','0']
if __name__=="__main__":main()
I've managed to narrow the problem down to the for loop in my best_first_graph_search_manhattan function. The infinite loop appears to be caused by the if statement that checks whether child is not in explored and not in frontier. I'm unsure whether the problem is the way I'm calling my child function or the way I'm pushing children onto the frontier priority queue. I have imported heapq, which, as far as I can tell from my research, is what lets you use a priority queue in Python. Please don't mind the other variables that are not used in my A* search.
Here is a test case: 1 0 3 4 5 2 6 8 9 10 7 11 13 14 15 12 | DRDRD
Thank you all very much for your help!
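One thing that may be worth checking, offered as a sketch rather than a confirmed diagnosis: the test "child not in explored" relies on __hash__, and Node.__hash__ returns hash(self.state). Board defines neither __eq__ nor __hash__, so two Board objects holding identical tiles hash differently and the set lookup does not find the node. The classes below are stripped-down stand-ins for Board and Node, and HashableNode is a hypothetical variant that hashes a tuple of the tiles instead.

```python
class Board:
    def __init__(self, tiles):
        self.tiles = tiles


class Node:
    """Mirrors the question's Node: hashes the Board object itself."""

    def __init__(self, state):
        self.state = state

    def __eq__(self, other):
        return self.state.tiles == other.state.tiles

    def __hash__(self):
        return hash(self.state)  # identity-based: differs per Board object


class HashableNode(Node):
    """Hypothetical variant: the hash is derived from the tile values."""

    def __hash__(self):
        return hash(tuple(self.state.tiles))


tiles = ['1', '0', '3', '2']

explored = {Node(Board(tiles[:]))}
found_original = Node(Board(tiles[:])) in explored
print(found_original)  # False: equal tiles, but different hashes

explored = {HashableNode(Board(tiles[:]))}
found_variant = HashableNode(Board(tiles[:])) in explored
print(found_variant)  # True
```

With the identity-based hash, an already visited state is never recognised in explored, so the search can keep revisiting the same boards.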
One of the exercises (namely, #6) asks us to compare the performance of queue implementations (with the head at the beginning or at the end of a list). That sounds like there could be some difference, so I tried to figure it out. Here's my code:
import timeit
class QueueStart(object):
    '''Queue implementation with head in the beginning of a list'''
    def __init__(self):
        self.items = []
    def enqueue(self, i):
        self.items.append(i)
    def dequeue(self):
        return self.items.pop(0)
    def isEmpty(self):
        return len(self.items) == 0
    def size(self):
        return len(self.items)
class QueueEnd(object):
    '''Queue implementation with head at the end of a list'''
    def __init__(self):
        self.items = []
    def enqueue(self, item):
        self.items.insert(0, item)
    def dequeue(self):
        return self.items.pop()
    def isEmpty(self):
        return len(self.items) == 0
    def size(self):
        return len(self.items)
# store results for further plotting
start_add_list = [] # QueueStart.enqueue(item) runtimes for inputs of different sizes
start_pop_list = [] # the same for QueueStart.dequeue(item)
end_add_list = [] # the same for QueueEnd.enqueue(item)
end_pop_list = [] # the same for QueueEnd.dequeue(item)
for i in range(100000, 500000, 10000):
    qs = QueueStart()
    qs.items = list(range(i))
    qe = QueueEnd()
    qe.items = list(range(i))
    start_add = timeit.Timer('qs.enqueue(1)', 'from __main__ import qs')
    start_pop = timeit.Timer('qs.dequeue()', 'from __main__ import qs')
    end_add = timeit.Timer('qe.enqueue(1)', 'from __main__ import qe')
    end_pop = timeit.Timer('qe.dequeue()', 'from __main__ import qe')
    start_add_list.append(start_add.timeit(number=1000))
    start_pop_list.append(start_pop.timeit(number=1000))
    end_add_list.append(end_add.timeit(number=1000))
    end_pop_list.append(end_pop.timeit(number=1000))
And here are plots that reflect results of my experiment
It's known that insert and pop(index) are O(n). The interesting thing, though, is that the graphs show insert(0, item) taking twice as long as pop(0). That made me wonder why this is the case. On the surface, the two methods look very similar, but apparently something interesting is going on under the hood. So, the question is: could you help me figure it out?
Some reading on CPython's list implementation: http://www.laurentluce.com/posts/python-list-implementation/
Basically, lists are designed to hold somewhat more memory than they need at any given time (CPython over-allocates by roughly one eighth of the list's size), so they can change length slightly without needing to reallocate memory. When lists need more memory, they sometimes need to move the whole list to another location in memory that has enough space. When they shrink, they can just free memory at the end of the list.
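The spare capacity can be observed indirectly. The snippet below is a small sketch that assumes CPython, where sys.getsizeof on a list includes the reserved slots; the exact numbers depend on the Python version and platform, so none are quoted here.

```python
import sys

items = []
sizes = []
for i in range(64):
    items.append(i)
    sizes.append(sys.getsizeof(items))

# The reported size only changes at some appends: in between, the list
# is filling slots it had already reserved.
print(sorted(set(sizes)))
print(len(set(sizes)), "distinct sizes over", len(sizes), "appends")
```

Each jump in the reported size marks a point where the list had to obtain a larger block of memory.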
I have two iterators in Python and both should follow the same "random" distribution (both should run in parallel). For instance:
class Iter1(object):
    def __iter__(self):
        for i in random_generator():
            yield i
class Iter2(object):
    def __iter__(self):
        for i in random_generator():
            yield i
for el1, el2 in zip(Iter1(), Iter2()):
    print('{} {}'.format(el1, el2))
The output should be something like:
0.53534 0.53534
0.12312 0.12312
0.19238 0.19238
How can I define random_generator() so that it creates the same random sequence in parallel for both iterators?
Note:
They should run in parallel
I can't generate the sequence in advance (it is a streaming, so I don't know the size of the sequence)
Thanks.
Specify the same seed to each call of random_generator:
import random
def random_generator(l, seed=None):
    r = random.Random(seed)
    for i in range(l):
        yield r.random()
class Iter1(object):
    def __init__(self, seed):
        self.seed = seed
    def __iter__(self):
        for i in random_generator(10, self.seed):
            yield i
class Iter2(object):
    def __init__(self, seed):
        self.seed = seed
    def __iter__(self):
        for i in random_generator(10, self.seed):
            yield i
# Use the same fixed seed (for example an int) for both iterators, but
# don't use None; that tells random.seed() to pick a nondeterministic
# seed (the current time or an OS randomness source). Recent Python
# versions no longer accept arbitrary objects such as object() as a seed.
common_seed = 12345
for el1, el2 in zip(Iter1(common_seed), Iter2(common_seed)):
    print('{} {}'.format(el1, el2))
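The answer above passes a fixed length to random_generator, while the question says the stream length is unknown. A minimal sketch of an unbounded variant follows; random_stream is a hypothetical name, the seed 12345 is arbitrary, and the code assumes Python 3, where zip is lazy (on Python 2, itertools.izip would be needed instead).

```python
import random
from itertools import islice


def random_stream(seed):
    """Hypothetical unbounded variant of random_generator."""
    rng = random.Random(seed)  # private generator, independent of the global one
    while True:
        yield rng.random()


first = random_stream(12345)
second = random_stream(12345)

# Take only the first five pairs here; a real consumer would keep iterating
pairs = list(islice(zip(first, second), 5))
for el1, el2 in pairs:
    print('{} {}'.format(el1, el2))
```

Because each stream owns its own random.Random instance, other code that calls the module-level random functions does not disturb either sequence.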
There is no way to control the random number generation in this way. If you want to do that, you should write your own random function. A simpler and more Pythonic approach is to create a single object and use itertools.tee to copy the iterator, so that both copies yield the same random sequence:
In [28]: class Iter1(object):
             def __init__(self, number):
                 self.number = number
             def __iter__(self):
                 for _ in range(self.number):
                     yield random.random()
   ....:
In [29]:
In [29]: num = Iter1(5)
In [30]: from itertools import tee
In [31]: num, num2 = tee(num)
In [32]: list(zip(num, num2))
Out[32]:
[(0.485400998727448, 0.485400998727448),
(0.8801649381536764, 0.8801649381536764),
(0.9684025615967844, 0.9684025615967844),
(0.9980073706742334, 0.9980073706742334),
(0.1963579685642387, 0.1963579685642387)]
I've been told to write a simple program that generates coupon codes. It should offer more than two algorithms (any two), and both the algorithm and the number of codes to generate should be read from a config file. I've also been told that the solution involves a known design pattern and that I should find out which pattern it is.
I've come up with two solutions, but I don't think I've found a proper OOP design pattern that fits the problem. Objects are data with methods that operate on that data, and in this problem there is little data to operate on; to my naive eyes it looks more like a functional problem. Here are the two solutions: the first executes the static method that matches the algorithm named in the config file, and the second returns a reference to a function. Both generate the numbers and print them to the screen.
First method:
import json
class CouponGenerator:
    SEQUENTIAL_NUMBERS = "sequentialNumbers"
    FIBONACCI_NUMBERS = "fibonacciNumbers"
    ALPHANUMERIC_SEQUENCE = "alphanumericSequence"
    quantity = 0
    algorithm = ""
    def __init__(self, quantity, algorithm):
        self.quantity = quantity
        self.algorithm = algorithm
    def generateCouponList(self):
        numbers = list()
        if self.algorithm == self.SEQUENTIAL_NUMBERS:
            numbers = CouponGenerator.generateSequentialNumbers(self.quantity)
        elif self.algorithm == self.FIBONACCI_NUMBERS:
            numbers = CouponGenerator.generateFibonacciSequence(self.quantity)
        for number in numbers:
            print(number)
    @staticmethod
    def getCouponGenerator(configFile):
        cfile = open(configFile)
        config = cfile.read()
        jsonconfig = json.loads(config)
        cg = CouponGenerator(jsonconfig['quantity'], jsonconfig['algorithm'])
        return cg
    @staticmethod
    def generateSequentialNumbers(quantity):
        numbers = list()
        for n in range(1, quantity+1):
            zeroes = 6-len(str(n))
            numbers.append(zeroes*"0"+str(n))
        return numbers
    @staticmethod
    def generateFibonacciSequence(quantity):
        def fib(n):
            a, b = 0, 1
            for _ in range(n):
                a, b = b, a + b
            return a
        numbers = list()
        for n in range(1, quantity+1):
            number = fib(n)
            zeros = 6-len(str(number))
            numbers.append(zeros*"0"+str(number))
        return numbers
if __name__ == "__main__":
    generator = CouponGenerator.getCouponGenerator("config")
    generator.generateCouponList()
Second solution:
import json
class CouponGenerator:
    @staticmethod
    def getCouponGenerator(algorithm):
        def generateSequentialNumbers(quantity):
            numbers = list()
            for n in range(1, quantity+1):
                zeroes = 6-len(str(n))
                numbers.append(zeroes*"0"+str(n))
            return numbers
        def generateFibonacciSequence(quantity):
            def fib(n):
                a, b = 0, 1
                for _ in range(n):
                    a, b = b, a + b
                return a
            numbers = list()
            for n in range(1, quantity+1):
                number = fib(n)
                zeros = 6-len(str(number))
                numbers.append(zeros*"0"+str(number))
            return numbers
        generators = {"sequentialNumbers": generateSequentialNumbers,
                      "fibonacciNumbers": generateFibonacciSequence}
        return generators[algorithm]
class CouponGeneratorApp:
    configFile = "config"
    def __init__(self):
        cfile = open(self.configFile)
        config = cfile.read()
        self.jsonconfig = json.loads(config)
        self.generateCouponCodes()
    def generateCouponCodes(self):
        generator = CouponGenerator.getCouponGenerator(self.jsonconfig["algorithm"])
        numbers = generator(self.jsonconfig["quantity"])
        for n in numbers:
            print(n)
if __name__ == "__main__":
    app = CouponGeneratorApp()
If you want to make it a little more object-oriented, I suggest you use some kind of strategy pattern. That means using one class per generation algorithm (all sharing a common interface) and having CouponGenerator use an object that implements this interface to do whatever it has to do. This is the theory; defining an interface and everything else might be a little too much in your case.
http://en.wikipedia.org/wiki/Strategy_pattern
You could try something like:
class SequentialGenerator(object):
    def implementation(self):
        ...
class FibonacciGenerator(object):
    def implementation(self):
        ...
class CouponGenerator(object):
    def set_generator(self, generator):
        # set self.generator to either an instance
        # of FibonacciGenerator or SequentialGenerator
        self.generator = generator
    def generate_coupon_code(self):
        # at some point calls self.generator.implementation()
        return self.generator.implementation()
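Filled in for the two algorithms from the question, the outline above might look like the following. This is a sketch under assumptions, not the definitive design: the method name generate, the STRATEGIES lookup table, and the inline JSON string standing in for the config file are all invented for the example; only the algorithm names and the six-character zero padding come from the question.

```python
import json


class SequentialGenerator(object):
    def generate(self, quantity):
        # 1, 2, 3, ... zero-padded to six characters
        return [str(n).zfill(6) for n in range(1, quantity + 1)]


class FibonacciGenerator(object):
    def generate(self, quantity):
        codes = []
        a, b = 0, 1
        for _ in range(quantity):
            a, b = b, a + b
            codes.append(str(a).zfill(6))
        return codes


# Maps the algorithm name from the config to a strategy class
STRATEGIES = {
    "sequentialNumbers": SequentialGenerator,
    "fibonacciNumbers": FibonacciGenerator,
}


class CouponGenerator(object):
    def __init__(self, strategy):
        self.strategy = strategy

    def generate_coupon_codes(self, quantity):
        # Delegates to whichever strategy object was supplied
        return self.strategy.generate(quantity)


# Stand-in for the contents of the config file (assumed JSON layout)
config = json.loads('{"algorithm": "fibonacciNumbers", "quantity": 5}')
generator = CouponGenerator(STRATEGIES[config["algorithm"]]())
codes = generator.generate_coupon_codes(config["quantity"])
print(codes)  # ['000001', '000001', '000002', '000003', '000005']
```

Adding a third algorithm then means writing one more class and one more entry in the lookup table, without touching CouponGenerator.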
I have implemented the following class to generate either 'p' or 'q' based on an input frequency of 'p'. However, this implementation breaks if the frequency gets smaller than the resolution of the list used to store the options (1/1000). Is there a way to implement this so that it works for any value of p?
from random import random
class AlleleGenerator(object):
    """
    allele generator - will break if p < 0.001
    """
    def __init__(self, p):
        """construct class and creates list to select from"""
        self.values = list()
        for i in xrange(int(1000*p)):
            self.values.append('p')
        while len(self.values) <= 1000:
            self.values.append('q')
    def next(self):
        """Returns p or q based on allele frequency"""
        rnd = int(random() * 1000)
        return self.values[rnd]
    def __call__(self):
        return self.next()
Don't use self.values. In next, just generate a random number between 0 and 1, and return 'p' if the random number is less than p:
from random import random
class AlleleGenerator(object):
    def __init__(self, p):
        """construct class and store the allele frequency p"""
        self.p = p
    def next(self):
        """Returns p or q based on allele frequency"""
        return 'p' if random() < self.p else 'q'
    def __call__(self):
        return self.next()
Also, be careful not to use classes when a function suffices.
For example, you might consider using a generator function:
from random import random
def allele_generator(p):
    while True:
        yield 'p' if random() < p else 'q'
agen = allele_generator(0.001)
for i in range(3):
    print(next(agen))
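A quick way to convince yourself that the comparison against p behaves as intended for small frequencies is to count outcomes over many draws. In this sketch the rng parameter is a hypothetical addition so that the run is repeatable, and the sample size is arbitrary; the code assumes Python 3 division.

```python
import random
from itertools import islice


def allele_generator(p, rng=random):
    # rng is a hypothetical extra parameter so the check below is repeatable
    while True:
        yield 'p' if rng.random() < p else 'q'


p = 0.001
n = 200000
draws = islice(allele_generator(p, random.Random(0)), n)
count_p = sum(1 for allele in draws if allele == 'p')
print(count_p / n)  # should land near 0.001
```

Unlike the list-based version, nothing here depends on a fixed resolution, so any p between 0 and 1 is handled the same way.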