One of the exercises (namely, #6) asks us to compare the performance of queue implementations (with the head at the beginning / at the end of a list). That sounds like there could be some difference, so I tried to figure it out. Here's my code:
import timeit

class QueueStart(object):
    '''Queue implementation with head in the beginning of a list'''
    def __init__(self):
        self.items = []
    def enqueue(self, i):
        self.items.append(i)
    def dequeue(self):
        return self.items.pop(0)
    def isEmpty(self):
        return len(self.items) == 0
    def size(self):
        return len(self.items)

class QueueEnd(object):
    '''Queue implementation with head at the end of a list'''
    def __init__(self):
        self.items = []
    def enqueue(self, item):
        self.items.insert(0, item)
    def dequeue(self):
        return self.items.pop()
    def isEmpty(self):
        return len(self.items) == 0
    def size(self):
        return len(self.items)

# store results for further plotting
start_add_list = [] # QueueStart.enqueue(item) runtimes for inputs of different sizes
start_pop_list = [] # the same for QueueStart.dequeue(item)
end_add_list = [] # the same for QueueEnd.enqueue(item)
end_pop_list = [] # the same for QueueEnd.dequeue(item)
for i in range(100000, 500000, 10000):
    qs = QueueStart()
    qs.items = list(range(i))
    qe = QueueEnd()
    qe.items = list(range(i))
    start_add = timeit.Timer('qs.enqueue(1)', 'from __main__ import qs')
    start_pop = timeit.Timer('qs.dequeue()', 'from __main__ import qs')
    end_add = timeit.Timer('qe.enqueue(1)', 'from __main__ import qe')
    end_pop = timeit.Timer('qe.dequeue()', 'from __main__ import qe')
    start_add_list.append(start_add.timeit(number=1000))
    start_pop_list.append(start_pop.timeit(number=1000))
    end_add_list.append(end_add.timeit(number=1000))
    end_pop_list.append(end_pop.timeit(number=1000))
And here are plots that reflect results of my experiment
It's known that insert and pop(index) are O(n). The interesting thing, though, is that the graphs show insert(0, item) taking about twice as long as pop(0). That made me wonder why this is the case. On the surface the two methods look very similar, but apparently something interesting is going on under the hood. So, the question is: could you help me figure it out?
Some reading on CPython's list implementation: http://www.laurentluce.com/posts/python-list-implementation/
Basically, lists over-allocate: they reserve somewhat more memory than they need at any given time, so they can change length slightly without needing to reallocate memory. When lists need more memory, they sometimes need to move the whole list to another location in memory that has enough space. When they shrink, they can just free memory at the end of the list.
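To see the spare capacity in action, here is a minimal sketch that records how the reported size of a list changes while items are appended. The helper name allocation_steps is made up for this illustration, and the exact byte sizes are a CPython implementation detail that differs between versions.

```python
import sys

def allocation_steps(n):
    """Return the distinct byte sizes a list reports while growing to n items."""
    items = []
    sizes = [sys.getsizeof(items)]
    for i in range(n):
        items.append(i)
        size = sys.getsizeof(items)
        # The size only changes when the list had to reserve more memory.
        if size != sizes[-1]:
            sizes.append(size)
    return sizes

steps = allocation_steps(1000)
print(f"{len(steps) - 1} size changes during 1000 appends")
```

The number of size changes should come out far smaller than the number of appends, which is why append is cheap on average. If you need cheap operations at both ends, collections.deque provides O(1) appends and pops on either side.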
I am trying to keep a list sorted at all times, so whenever new data comes in, I insert it into the sorted list.
My question: why is bisect.insort much faster than my linked list implementation? I know the binary search takes O(log n), but because of the insertion into the list, the whole operation is really O(n). Inserting a new value into the sorted linked list should also be O(n). So why is the linked list so much slower? Is my linked list implementation not optimized?
Here's my sample code:
import timeit
import bisect
import random

# Testing parameters
NUM_ITERATION_TEST = 10
TOTAL_NUM_DATA = 10000
DATA = [random.randint(0, 1000) for x in range(TOTAL_NUM_DATA)]

class Node():
    def __init__(self, val):
        self.val = val
        self.next = None

class LinkedListIterator():
    def __init__(self, head):
        self.current = head
    def __iter__(self):
        return self
    def __next__(self):
        if not self.current:
            raise StopIteration
        else:
            val = self.current.val
            self.current = self.current.next
            return val

class LinkedList():
    def __init__(self):
        self.head = None
    def __iter__(self):
        return LinkedListIterator(self.head)
    def insert(self, val):
        new_node = Node(val)
        if self.head is None:
            self.head = new_node
            return
        curr = self.head
        if curr.val > val:
            new_node.next = curr
            self.head = new_node
            return
        while curr.next:
            if curr.next.val > val:
                break
            curr = curr.next
        new_node.next = curr.next
        curr.next = new_node

def method1(DATA):
    sorted_list = []
    for num in DATA:
        bisect.insort_right(sorted_list, num)

def method2(DATA):
    sorted_list = LinkedList()
    for num in DATA:
        sorted_list.insert(num)

if __name__ == "__main__":
    # METHOD 1
    print("Method 1 Execution Time:")
    print(timeit.timeit("test_timeit.method1(test_timeit.DATA)",
                        number=NUM_ITERATION_TEST,
                        setup="import test_timeit"))
    # METHOD 2
    print("Method 2 Execution Time:")
    print(timeit.timeit("test_timeit.method2(test_timeit.DATA)",
                        number=NUM_ITERATION_TEST,
                        setup="import test_timeit"))
The execution times are:
Method 1 Execution Time:
0.11593010000000001
Method 2 Execution Time:
33.0651346
I also tried other implementations, like sorted dicts, but nothing really beats the bisect implementation. Is there a more efficient implementation? Basically, I want an always-sorted list of data to which I constantly add/insert new data.
Your own implementation is executed by the Python interpreter, creating and linking dynamic run-time objects with a lot of superfluous validity checks, one object creation or deletion at a time.
The built-in function is optimized in C and already compiled. It can allocate memory in larger chunks, manufacture new objects in a single struct mapping, avoid many of the validity checks, ...
A C-based built-in will (almost) always beat anything you can program in Python.
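Part of the gap can also be made visible by counting how many element comparisons each approach performs. The sketch below is only an illustration: Counted is a hypothetical wrapper class, and the front-to-back scan over a plain list stands in for walking a sorted linked list. Binary search needs roughly log2(n) comparisons per insertion, while a scan from the head needs a number proportional to n.

```python
import bisect
import random

class Counted:
    """Hypothetical wrapper that counts every '<' comparison made on it."""
    comparisons = 0

    def __init__(self, val):
        self.val = val

    def __lt__(self, other):
        Counted.comparisons += 1
        return self.val < other.val

def insort_comparisons(data):
    """Comparisons used when every item is inserted with bisect.insort_right."""
    Counted.comparisons = 0
    sorted_list = []
    for num in data:
        bisect.insort_right(sorted_list, Counted(num))
    return Counted.comparisons

def linear_scan_comparisons(data):
    """Comparisons used when the position is found by scanning from the front."""
    Counted.comparisons = 0
    sorted_list = []
    for num in data:
        item = Counted(num)
        pos = 0
        # Walk forward until the first element greater than the new item.
        while pos < len(sorted_list) and not (item < sorted_list[pos]):
            pos += 1
        sorted_list.insert(pos, item)
    return Counted.comparisons

random.seed(0)
data = [random.randint(0, 1000) for _ in range(1000)]
fast = insort_comparisons(data)
slow = linear_scan_comparisons(data)
print("insort comparisons:     ", fast)
print("linear scan comparisons:", slow)
```

In the linked list every one of those comparisons is also a step through interpreted Python code, whereas the element shifting done by list.insert happens inside C.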
I've started learning data structures using Python. I've been stuck on this since yesterday and haven't thought of a solution yet. The tutorials I see on the internet link the values of a linked list manually. I've been searching and wondering whether there is a way to do it automatically.
Here is a sample code from tutorialspoint.com
class Node:
    def __init__(self, dataval=None):
        self.dataval = dataval
        self.nextval = None

class SLinkedList:
    def __init__(self):
        self.headval = None
    def listprint(self):
        printval = self.headval
        while printval is not None:
            print (printval.dataval)
            printval = printval.nextval

list = SLinkedList()
list.headval = Node("Mon")
e2 = Node("Tue")
e3 = Node("Wed")
# Link first Node to second node
list.headval.nextval = e2
# Link second Node to third node
e2.nextval = e3
list.listprint()
Sure, here's one way you might do it, without disrupting the current set-up of
your classes too much:
class Node:
    def __init__(self, data=None):
        self.data = data
        self.next_node = None

    @classmethod
    def from_iterable(cls, iterable):
        node = cls()
        iterable = iter(iterable)
        try:
            node.data = next(iterable)
        except StopIteration:
            return None
        node.next_node = cls.from_iterable(iterable)
        return node

class SLinkedList:
    def __init__(self, head_node=None):
        self.head_node = head_node

    @classmethod
    def from_iterable(cls, iterable):
        return cls(Node.from_iterable(iterable))

    def listprint(self):
        print_node = self.head_node
        while print_node is not None:
            print(print_node.data)
            print_node = print_node.next_node

my_list = SLinkedList.from_iterable(["Mon", "Tue", "Wed"])
my_list.listprint()
with output
Mon
Tue
Wed
I've renamed a couple of bits here (remember list is a Python builtin!) and
there and changed some function signatures to be a little more natural, but
mostly what I've just done is added the two from_iterable methods.
This is a very recursive implementation. If you need lists bigger than about
1000 elements you'll have to write it all in terms of while loops (although a
linked list implementation like this in pure Python won't ever be very fast)
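For reference, a loop-based variant of from_iterable might look like the sketch below. It keeps the Node attribute names used above; the to_list helper is added only so the result is easy to inspect and is not part of the original classes.

```python
class Node:
    def __init__(self, data=None):
        self.data = data
        self.next_node = None

    @classmethod
    def from_iterable(cls, iterable):
        """Build a chain of nodes with a loop instead of recursion."""
        head = None
        tail = None
        for item in iterable:
            node = cls(item)
            if head is None:
                head = node            # first element becomes the head
            else:
                tail.next_node = node  # append after the current tail
            tail = node
        return head

def to_list(head):
    """Collect the node values into a plain Python list (illustration only)."""
    values = []
    while head is not None:
        values.append(head.data)
        head = head.next_node
    return values

head = Node.from_iterable(range(5000))
print(len(to_list(head)))
```

Because nothing here recurses, the length of the input is not limited by the interpreter's recursion limit.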
Of course it is possible to add an element into a linked list. You can insert an element either in first position (trivial) or after an existing node (slightly more complex because you have to specify how to designate that node).
...
class SLinkedList:
    def __init__(self):
        self.headval = None
    def listprint(self):
        printval = self.headval
        while printval is not None:
            print (printval.dataval)
            printval = printval.nextval
    def insertfirst(self, val):
        node = Node(val)
        node.nextval = self.headval
        self.headval = node
>>> lst = SLinkedList() # do not use list which is the standard library list
>>> lst.insertfirst("Wed")
>>> lst.insertfirst("Tue")
>>> lst.insertfirst("Mon")
>>> lst.listprint()
Mon
Tue
Wed
As @Reut Sharabani said in the comment, right now the Node constructor takes only the dataval and sets the next one to None. To get the behavior you want, you would have to make a second constructor that takes both dataval and next. In your SLinkedList class you would then construct the linked list in reverse, starting at the tail node and moving towards the head.
That is, if this is what you meant by automatic.
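One way this could look is sketched below. Python has no overloaded constructors, so the sketch uses an optional nextval argument instead; that choice, the values parameter and the to_list helper are assumptions made for illustration, not part of the tutorial code.

```python
class Node:
    def __init__(self, dataval=None, nextval=None):
        self.dataval = dataval
        self.nextval = nextval  # optional link to the following node

class SLinkedList:
    def __init__(self, values=()):
        self.headval = None
        # Build from the tail towards the head: each new node points at the
        # node created just before it.
        for val in reversed(list(values)):
            self.headval = Node(val, self.headval)

    def to_list(self):
        """Return the stored values in order (helper added for illustration)."""
        result = []
        node = self.headval
        while node is not None:
            result.append(node.dataval)
            node = node.nextval
        return result

days = SLinkedList(["Mon", "Tue", "Wed"])
print(days.to_list())
```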
I am a bit confused as to how to attack the StackList and Node portion of this homework. I barely have any programming experience, and after a full day of struggling I finally understood StackArray. But I also need to understand how to implement the same functions with a linked list. Here are the instructions, in case I am not being clear:
The goal of this project is to implement the Stack Abstract Data Type using the built in List construct in
Python and the Stack ADT using the simple linked data structure covered in class.
As discussed in class you are to allocate a list of size stack_capacity and use this to store the items in the
stack. Since Lists in python expand when more storage is needed you will have to provide a mechanism to
prevent the list inside your stack from growing. This really prevents the stack from growing if the user only
accesses it through the given interface but that is fine for the purposes of this exercise. (This prevents the stack
from using more space than a user might want. Think of this as a requirement for an application on a small
device that has very limited storage.) In the case when a user attempts to push an item on to a full stack, your
push function (method) should raise an IndexError. Similarly if a user tries to pop an empty stack, your pop function
(method) should raise an IndexError.
class StackArray:
    """Implements an efficient last-in first-out Abstract Data Type using a Python List"""
    def __init__(self, capacity):
        """Creates and empty stack with a capacity"""
        self.capacity = capacity # this is example for list implementation
        self.items = [None]*capacity # this is example for list implementation
        self.num_items = 0 # this is example for list implementation
    def is_empty(self):
        """Returns true if the stack self is empty and false otherwise"""
    def is_full(self):
        """Returns true if the stack self is full and false otherwise"""
    def push(self, item):
    def pop(self):
    def peek(self):
    def size(self):
        """Returns the number of elements currently in the stack, not the capacity"""
Submit to two files:
• stacks.py containing a list based implementation of stack and a linked implementation of stack. The
classes must be called: StackArray and StackLinked . Both implementations should follow the above
specification and be thoroughly tested.
• test_stacks.py contains your set of tests to ensure your classes work correctly
So far this is what I have come up with
class StackArray:
    """Implements an efficient last-in first-out Abstract Data Type using a Python List"""
    def __init__(self, capacity):
        """Creates and empty stack with a capacity"""
        self.capacity = capacity # this is example for list implementation
        self.items = [None]*capacity # this is example for list implementation
        self.num_items = 0 # this is example for list implementation
    def is_empty(self):
        """Returns true if the stack self is empty and false otherwise"""
        return self.num_items == 0
    def is_full(self):
        """Returns true if the stack self is full and false otherwise"""
        return self.num_items == self.capacity
    def push(self, item):
        self.num_items += 1
        self.items[self.num_items - 1] = item
    def pop(self):
        return self.items.pop(0)
    def peek(self):
        return self.items[0]
    def size(self):
        """Returns the number of elements currently in the stack, not the capacity"""
        return self.num_items

class Node:
    # ? ?
    # nodes have 2 parts to them data and pointer. Pointer always points to the data of previous

class StackLinkList:
    def __init__(self, capacity):
        """Creates and empty stack with a capacity"""
        self.capacity = capacity # this is example for list implementation
        self.items = [None] * capacity # this is example for list implementation
        self.num_items = 0 # this is example for list implementation
    def is_empty(self):
        """Returns true if the stack self is empty and false otherwise"""
    def is_full(self):
        """Returns true if the stack self is full and false otherwise"""
    def push(self, item):
    def pop(self):
    def peek(self):
    def size(self):
        """Returns the number of elements currently in the stack, not the capacity"""
And these are my testcases
import unittest
from stacks import *

class TestCase(unittest.TestCase):
    # testing an empty Array
    def test_if_empty(self):
        s1 = StackArray(3) #[none,none,none]
        self.assertTrue(s1.is_empty()) # Should be True
    def test_if_not_empty(self):
        s1 = StackArray(4) #[none,none,none,none]
        s1.push(2) #[2,none,none,none]
        s1.push(3) #[2,3,none,none]
        s1.push(4) #[2,3,4,none]
        self.assertFalse(s1.is_empty()) # this is saying that the statement that the string is empty is false
    def test_if_full(self):
        s1 = StackArray(3) #[none,none,none]
        s1.push(5) #[5,none,none]
        s1.push(4) #[5,4,none]
        s1.push(3) #[5,4,3]
        self.assertTrue(s1.is_full()) #this is checking to see if its true that the stack is full
    def test_if_not_full(self):
        s1 = StackArray(2) # [none,none]
        s1.push(2) # [2,none]
        self.assertFalse(s1.is_full()) # it is false that the stack is full
    def test_IfWeCanPush(self):
        s1 = StackArray(3) #[none,none,none]
        s1.push(2) #[2,none]
        s1.push(3) #[2,3]
        self.assertFalse(s1.is_full()) # it is false that the stack is full
    def test_IfWeCantPush(self):
        s1 = StackArray(3) #[none,none,none]
        s1.push(2) #[2,none,none]
        s1.push(3) #[2,3,none]
        s1.push(3) #[2,3,3]
        self.assertTrue(s1.is_full())
    def test_ifWeCanPop(self):
        s1 = StackArray(3) #[none,none,none]
        s1.push(3) #[3,none,none]
        s1.push(3) #[3,3,none]
        s1.push(5) #[3,3,5]
        # This gives us an array [3,3,5]
        s1.pop() # now we have [3,5]
        s1.pop() # now we have [5]
        self.assertFalse(s1.is_empty())
        self.assertEqual(s1.pop(),5)
    def test_ifWeCantPop(self):
        s1 = StackArray(2) # this gives us [none,none]
        self.assertFalse(s1.pop())
    def test_ifWeCanPeek(self):
        s1 = StackArray(4)
        s1.push(4)
        s1.push(3)
        s1.push(5)
        s1.push(54)
        self.assertEqual(s1.peek(),4) #this statement says that if we peek at the top of the stack it should be 4
    def test_ifWeCanSeeTheSize(self):
        s1 = StackArray(5)
        s1.push(3)
        self.assertEqual(s1.size(),1) # this statement checks that the size or number of items in the
        # stack is 1 ... [ 3,none,none,none,none] only has one item which is 3

if __name__ == '__main__':
    unittest.main()
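For the linked variant the assignment describes, one possible shape is sketched below. It assumes each node stores an item plus a reference to the node beneath it, and that the stack only needs a reference to the top node and a counter. The attribute names top and next_node, and the choice to raise IndexError from peek on an empty stack, are assumptions; the instructions quoted above only specify the behaviour of push and pop.

```python
class Node:
    """One stack entry: the stored item and a link to the node below it."""
    def __init__(self, data, next_node=None):
        self.data = data
        self.next_node = next_node

class StackLinked:
    """Last-in first-out stack built from linked nodes, limited to capacity items."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.top = None        # most recently pushed node, or None when empty
        self.num_items = 0

    def is_empty(self):
        return self.num_items == 0

    def is_full(self):
        return self.num_items == self.capacity

    def push(self, item):
        if self.is_full():
            raise IndexError("push onto a full stack")
        # The new node points at the old top and becomes the new top.
        self.top = Node(item, self.top)
        self.num_items += 1

    def pop(self):
        if self.is_empty():
            raise IndexError("pop from an empty stack")
        item = self.top.data
        self.top = self.top.next_node
        self.num_items -= 1
        return item

    def peek(self):
        if self.is_empty():
            raise IndexError("peek at an empty stack")  # assumed behaviour
        return self.top.data

    def size(self):
        return self.num_items

s = StackLinked(3)
for value in (1, 2, 3):
    s.push(value)
print(s.pop(), s.peek(), s.size())
```

Both push and pop touch only the top node, so no items ever need to be shifted.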
The code below imports a linked list from LinkedQfile and creates a list object with some node objects.
If I run this code the output from check_something() becomes CD .
I thought linked_list in check_something() would become a local object inside the function and since I'm not assigning whatever I'm returning to anything it wouldn't change, that is I would expect the output ABCD. This is obviously not the case so I'm wondering if someone could explain to me what is going on here?
If linked_list was a global variable I would expect this outcome, my guess is that the return statements in each function returns some information to the object but I have no idea how and why! (I got the code from a lecture note and it works just like I want it to, I just want to know why!)
from LinkedQFile import LinkedQ

def check_something(linked_list):
    check_first_element(linked_list)
    check_second_element(linked_list)
    print(linked_list)

def check_first_element(linked_list):
    word = linked_list.dequeue()
    if word == "A":
        return

def check_second_element(linked_list):
    word = linked_list.dequeue()
    if word == "B":
        return

def main():
    list = LinkedQ()
    list.enqueue("A")
    list.enqueue("B")
    list.enqueue("C")
    list.enqueue("D")
    check_something(list)

main()
And if needed, the LinkedQFile:
class Node:
    def __init__(self, x, next= None):
        self._data = x
        self._next = next
    def getNext(self):
        return self._next
    def setNext(self, next):
        self._next = next
    def getValue(self):
        return self._data
    def setValue(self, data):
        self._data = data

class LinkedQ:
    def __init__(self):
        self._first = None
        self._last = None
        self._length = 0
    def __str__(self):
        s = ""
        p = self._first
        while p != None:
            s = s + str(p.getValue())
            p = p.getNext()
        return s
    def enqueue(self, kort):
        ny = Node(kort)
        if self._first == None:
            self._first = ny
        else:
            self._last = self._first
            while self._last.getNext():
                self._last = self._last.getNext()
            self._last.setNext(ny)
        self._length += 1
    def dequeue(self):
        data = self._first.getValue()
        self._first = self._first.getNext()
        self._length = self._length - 1
        return data
You're right about linked_list being a local variable, but just because a variable is local doesn't mean it can't reference something that isn't. In order for it to do what you expected, it would need to copy your entire linked list every time you pass it to a function, which wouldn't make sense.
Here's a simple example that illustrates the idea of a shared object. In this example, an empty list is created and assigned to a. Then a is assigned to b. This does not copy the list. Instead, there is a single list, referenced by both a and b. When it is modified, through either a or b, both a and b reflect the change:
>>> a = []
>>> b = a
>>> a.append("x")
>>> a
['x']
>>> b
['x']
>>>
The same thing is happening with your class objects. In fact, your linked lists wouldn't work at all if it didn't.
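The same effect shows up across a function call, which is closer to the situation in the question. In the sketch below (the function names are made up for illustration), the first function mutates the object it was handed, while the second makes its own copy first and so leaves the caller's list untouched.

```python
def drop_first(items):
    # 'items' is a local name, but it refers to the caller's list object.
    items.pop(0)

def drop_first_copy(items):
    # Rebinding the local name to a new list leaves the caller's list alone.
    items = list(items)
    items.pop(0)
    return items

letters = ["A", "B", "C", "D"]
drop_first(letters)
print(letters)      # the caller sees the change

original = ["A", "B", "C", "D"]
shortened = drop_first_copy(original)
print(original)     # unchanged
print(shortened)
```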
I am taking Algorithms, Part I via Coursera and am looking to test the run time of the Quick Find, Quick Union, and Weighted Quick Union algorithms. The course is in Java, which I am unfamiliar with, so I have gone through and attempted to recreate the algorithms in Python, which I am more familiar with.
Now that I have everything implemented, I aim to test each function to verify run time/complexity. I had been thinking of using the timeit library, but that seems to be throwing incorrect results, e.g., Weighted Quick Union takes longer to complete than QuickUnion.
How can I verify that a Weighted Quick Union is in fact O(log n) and is faster than Quick Union? Here is what I have created and tried so far:
O(N**2) - Slow
class QuickFind_Eager:
    def __init__(self, nodes):
        self.array = [num for num in range(nodes)]

    # Joins two nodes into a component
    def union(self, first_node, second_node):
        for pos, val in enumerate(self.array):
            if self.array[pos] == self.array[first_node]:
                self.array[pos] = self.array[second_node]

    # Checks if two nodes are in the same component
    def connected(self, first_node, second_node):
        return self.array[first_node] == self.array[second_node]
O(N) - Still too slow - Avoid
class QuickUnion_Lazy:
    def __init__(self, nodes):
        self.array = [num for num in range(nodes)]

    # Follows parent pointers to actual root
    def root(self, parent):
        while parent != self.array[parent]:
            parent = self.array[parent]
        return parent

    # Joins two nodes into a component
    def union(self, first_node, second_node):
        self.array[first_node] = self.array[second_node]

    # Checks if two nodes are in the same component
    def connected(self, first_node, second_node):
        return self.root(first_node) == self.root(second_node)
O(log N) - Pretty darn fast
class WeightedQuickUnion:
    def __init__(self, nodes):
        self.array = [num for num in range(nodes)]
        self.weight = [num for num in range(nodes)]

    # Follows parent pointers to actual root
    def root(self, parent):
        while parent != self.array[parent]:
            parent = self.array[parent]
        return parent

    # Joins two nodes into a component
    def union(self, first_node, second_node):
        if self.root(first_node) == self.root(second_node):
            return
        if (self.weight[first_node] < self.weight[second_node]):
            self.array[first_node] = self.root(second_node)
            self.weight[second_node] += self.weight[first_node]
        else:
            self.array[second_node] = self.root(first_node)
            self.weight[first_node] += self.weight[second_node]

    # Checks if two nodes are in the same component
    def connected(self, first_node, second_node):
        return self.root(first_node) == self.root(second_node)
O(N + M lg* N) - wicked fast
class WeightedQuickUnion_PathCompression:
    def __init__(self, nodes):
        self.array = [num for num in range(nodes)]
        self.weight = [num for num in range(nodes)]

    # Follows parent pointers to actual root
    def root(self, parent):
        while parent != self.array[parent]:
            self.array[parent] = self.array[self.array[parent]]
            parent = self.array[parent]
        return parent

    # Joins two nodes into a component
    def union(self, first_node, second_node):
        if self.root(first_node) == self.root(second_node):
            return
        if self.weight[first_node] < self.weight[second_node]:
            self.array[first_node] = self.root(second_node)
            self.weight[second_node] += self.weight[first_node]
        else:
            self.array[second_node] = self.root(first_node)
            self.weight[first_node] += self.weight[second_node]

    # Checks if two nodes are in the same component
    def connected(self, first_node, second_node):
        return self.root(first_node) == self.root(second_node)
Test run time
def test_quickfind(quickfind):
    t = quickfind(100)
    t.union(1,2)
    t.connected(1,2)
    t.union(4,2)
    t.union(3,4)
    t.connected(0,2)
    t.connected(1,4)
    t.union(0,3)
    t.connected(0,4)

import timeit
t = timeit.timeit(stmt="test_quickfind(QuickFind_Eager)", setup="from __main__ import QuickFind_Eager; from __main__ import test_quickfind", number=100000)
print(t)
# 11.4380569069981
t = timeit.timeit(stmt="test_quickfind(QuickUnion_Lazy)", setup="from __main__ import QuickUnion_Lazy; from __main__ import test_quickfind", number=100000)
print(t)
# 1.4744456350017572
t = timeit.timeit(stmt="test_quickfind(WeightedQuickUnion)", setup="from __main__ import WeightedQuickUnion; from __main__ import test_quickfind", number=100000)
print(t)
# 2.738758583996969
t = timeit.timeit(stmt="test_quickfind(WeightedQuickUnion_PathCompression)", setup="from __main__ import WeightedQuickUnion_PathCompression; from __main__ import test_quickfind", number=100000)
print(t)
# 3.0113827050008695
Update
Added results from timeit.
You need to tabulate the algorithms' running times as a function of problem size, i.e. call quickfind for different problem sizes (say 100, 200, 300, 400, 500; beware, expect the latter to run at least 3 minutes for the naive O(n^2) algorithm).
Even then you have no guarantee that you will observe the asymptotic run-time behaviour. That is what the O notation is about: O(f) describes a family of functions g_i with g_i(n) = a_i * f(n) + b_i for constants a_i, b_i (abusing the notation somewhat). Some of your implementations may also run into resource exhaustion (read: no more RAM), which causes significant performance hits that have nothing to do with the algorithms themselves.
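A minimal sketch of such a tabulation follows. The helper time_for_sizes and the stand-in workload quadratic_work are hypothetical names; in practice you would pass a function that builds one of your union-find classes for n nodes and performs a number of union/connected calls that grows with n.

```python
import timeit

def quadratic_work(n):
    """Stand-in workload that performs n * n steps (replace with your own)."""
    total = 0
    for i in range(n):
        for j in range(n):
            total += 1
    return total

def time_for_sizes(func, sizes, repeat=3):
    """Return {n: best observed time in seconds for one call of func(n)}."""
    results = {}
    for n in sizes:
        timer = timeit.Timer(lambda: func(n))
        # Taking the minimum of several runs reduces noise from other processes.
        results[n] = min(timer.repeat(repeat=repeat, number=1))
    return results

timings = time_for_sizes(quadratic_work, [100, 200, 400])
for n, seconds in timings.items():
    print(f"n={n:5d}  {seconds:.6f}s")
```

If the time roughly quadruples each time n doubles, that is consistent with quadratic growth; roughly doubling suggests linear growth. Small inputs are dominated by constant overhead, so the trend usually only becomes visible at larger sizes.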
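The implementation of the union function in the QuickFind_Eager class is not correct.
self.array[first_node] and self.array[second_node] should be stored in variables before the loop, and the loop should then compare and assign using those variables. Otherwise the loop can overwrite self.array[first_node] partway through, and the remaining entries are compared against the new value: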
    def union(self, first_node, second_node):
        pid = self.array[first_node]
        qid = self.array[second_node]
        for pos, val in enumerate(self.array):
            if self.array[pos] == pid:
                self.array[pos] = qid
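A small self-contained sketch of the difference is shown below. The class names BuggyQuickFind and FixedQuickFind are made up for this comparison; the buggy union repeats the loop from the question and the fixed one uses the stored ids.

```python
class BuggyQuickFind:
    def __init__(self, nodes):
        self.array = list(range(nodes))

    def union(self, first_node, second_node):
        # Re-reads self.array[first_node] on every iteration, so the value
        # being matched changes as soon as that entry is overwritten.
        for pos, val in enumerate(self.array):
            if self.array[pos] == self.array[first_node]:
                self.array[pos] = self.array[second_node]

    def connected(self, first_node, second_node):
        return self.array[first_node] == self.array[second_node]

class FixedQuickFind(BuggyQuickFind):
    def union(self, first_node, second_node):
        pid = self.array[first_node]   # component id to replace
        qid = self.array[second_node]  # component id to assign
        for pos, val in enumerate(self.array):
            if val == pid:
                self.array[pos] = qid

def run(cls):
    uf = cls(5)
    uf.union(1, 0)   # nodes 0 and 1 now share a component
    uf.union(0, 2)   # the whole component should be joined with node 2
    return uf.connected(1, 2)

print("buggy:", run(BuggyQuickFind))
print("fixed:", run(FixedQuickFind))
```

In the buggy version, node 1 is left behind in the second union because the entry for node 0 is rewritten before node 1 is examined.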