Why is this a bad bubble sort algorithm? - python

I started studying Data Structures and algorithms, and tried to implement Bubble sort:
def BubbleSort(list):
    for a in range(len(list)):
        for b in range(len(list)):  # I could start this loop from 1
            if list[a] < list[b]:   # to avoid comparing the first element twice
                temp = list[a]
                list[a] = list[b]
                list[b] = temp
    return list
I browsed the net and books - but found no Python implementation of bubble sort.
What's wrong with the above?

Several things:
the algorithm will not always sort correctly;
syntactically it seems to sort the opposite way;
it takes twice the time necessary to perform bubble sort;
it is not bubble sort; and
you should never name Python variables list, dict, etc., since that shadows the built-ins.
Bubble sort works by comparing two adjacent elements: the so-called "bubble". It checks whether the left item is indeed less than the right one. If this is not the case, it swaps the elements. The algorithm iterates at most n times over the list, after which the list is guaranteed to be sorted.
So a very basic implementation would be:
def BubbleSort(data):
    for _ in range(len(data)):          # iterate n times
        for i in range(len(data) - 1):  # i is the left index of the bubble
            if data[i] > data[i+1]:     # if the left item is greater
                # perform a swap
                temp = data[i]
                data[i] = data[i+1]
                data[i+1] = temp
    return data
Now we can improve the algorithm (roughly halving the running time) by stopping the inner loop at len(data)-1-j, since after each iteration the rightmost element over which the bubble has moved is guaranteed to be the maximum:
def BubbleSort(data):
    for j in range(len(data)):              # iterate n times
        for i in range(len(data) - 1 - j):  # i is the left index of the bubble
            if data[i] > data[i+1]:         # if the left item is greater
                # perform a swap
                temp = data[i]
                data[i] = data[i+1]
                data[i+1] = temp
    return data
But using bubble sort is, except for some very rare cases, inefficient. It is better to use faster algorithms like quicksort, mergesort, and Timsort (the built-in sorting algorithm of Python).
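For reference, reaching for Timsort in Python is just a call to the built-ins; a minimal usage sketch:

data = [5, 2, 9, 1]
data.sort()               # sorts in place using Timsort
ordered = sorted(data)    # alternatively, returns a new sorted list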

Here's a short list of books which implement bubble sort:
Fundamentals of Python: Data Structures, 1st ed.
Introduction to Numerical Programming: A Practical Guide for Scientists and Engineers Using Python and C/C++ (Series in Computational Physics)
Fundamentals of Python: First Programs
Python 3 for Absolute Beginners (Expert's Voice in Open Source)
Python Data Structures and Algorithms

You can start the inner for loop from index a + 1 instead of 0, which avoids comparing each element with itself (and re-comparing earlier pairs). That makes it a bit faster.
Use tuple assignment to swap the values of list[a] and list[b], rather than a temporary variable (Python has no dedicated swap function).
Check whether the list is already sorted, for example by comparing it against sorted(list), and return it immediately if it is. A sketch applying all three suggestions follows.
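Here is a minimal sketch with these suggestions applied (the function name is illustrative; note that the sorted-based early exit itself costs O(n log n), and that this is really an exchange/selection sort rather than a true bubble sort, as discussed above):

def exchange_sort(data):
    if data == sorted(data):                      # early exit if already sorted
        return data
    for a in range(len(data)):
        for b in range(a + 1, len(data)):         # start the inner loop at a + 1
            if data[b] < data[a]:
                data[a], data[b] = data[b], data[a]  # tuple-assignment swap
    return data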


Coding exercise for practicing adjacency list and BFS

I have a coding exercise with a row of trampolines, each with a minimum and maximum "bounciness". I have the index of a starting trampoline and an end trampoline, and with this, I need to find the minimum number of jumps required to reach the end trampoline from the start trampoline.
I have tried creating an adjacency list, in which I list all possible jumps from a trampoline. This is fine until I reach a large number of trampolines. The problem is that it takes O(n^2) time.
This is how I create the Adjacency List:
def createAL(a, b, l):
    al = [list() for _ in range(l)]
    for i in range(l):
        for j in range(a[i], b[i]+1):
            if (i+j) <= l-1:
                al[i].append(i+j)
            if (i-j) >= 0:
                al[i].append(i-j)
    for i in range(len(al)):
        al[i] = list(set(al[i]))
    return al
"a" is the min. bounciness, "b" the max bounciness and "l" is the length of the two lists.
As you can see, the problem is that I have 2 nested loops. Does anyone have an idea for a more efficient way of doing this? (preferably without the loops)
Assuming "bounciness" is strictly positive, you can omit this part:
for i in range(len(al)):
    al[i] = list(set(al[i]))
...as there is no way you could have duplicates in those lists.
(If however bounciness could be 0 or negative, then first replace any values below 1 by 1 in a)
The building of al can be made a bit faster by:
making the ranges run over the actual target indexes (so you don't need i+j in every loop),
clipping those ranges using min() and max(), avoiding if statements in the loop,
avoiding individual append calls, using a list comprehension.
Result:
al = [
    [*range(max(0, i-b[i]), i-a[i]+1), *range(i+a[i], min(l, i+b[i]+1))]
    for i in range(l)
]
Finally, as this adjacency list presumably serves a BFS algorithm, you could also consider that building the adjacency list may not be necessary, as finding the adjacent nodes during BFS is a piece of cake using a and b on-the-spot. I wonder if you really gain time by creating the adjacency list.
In your BFS code, you probably have something like this (where i is the "current" node):
for neighbor in al[i]:
This could be replaced with:
for neighbor in (*range(max(0, i-b[i]), i-a[i]+1), *range(i+a[i], min(l, i+b[i]+1))):
We should also realise that if the target trampoline is found in a number of steps that is much smaller than the number of trampolines, then there is a probability that not all trampolines are visited during the BFS search. And in that case it would have been a waste of time to have created the complete adjacency list...
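A minimal sketch of that idea, generating neighbors on the fly during the BFS instead of materializing the adjacency list (the function name and the start/end parameters are illustrative):

from collections import deque

def min_jumps(a, b, l, start, end):
    dist = {start: 0}                 # number of jumps from the start trampoline
    queue = deque([start])
    while queue:
        i = queue.popleft()
        if i == end:
            return dist[i]
        # neighbors computed on the spot from a and b, as above
        for neighbor in (*range(max(0, i - b[i]), i - a[i] + 1),
                         *range(i + a[i], min(l, i + b[i] + 1))):
            if neighbor not in dist:
                dist[neighbor] = dist[i] + 1
                queue.append(neighbor)
    return -1                         # end trampoline is unreachable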

Under what circumstances is bidirectional bubble sort better than standard bubble sort?

I have implemented the bidirectional bubble sort algorithm, but I can't think of a scenario where bidirectional bubble sort is better than standard bubble sort. Can someone give me a clue?
My implementation in Python:
def bubbleSort_v(my_list):
    s = 0
    e = 0
    right = True
    for index in range(len(my_list)-1, 0, -1):
        if right:
            right = False
            for idx in range(s, index+e, 1):
                if my_list[idx] > my_list[idx+1]:
                    my_list[idx], my_list[idx+1] = my_list[idx+1], my_list[idx]
            s += 1
        else:
            right = True
            for idx in range(index-1+s, e, -1):
                if my_list[idx] < my_list[idx-1]:
                    my_list[idx], my_list[idx-1] = my_list[idx-1], my_list[idx]
            e += 1
    return my_list
Thanks!
Consider an element near the right end of the list (for instance at the last index) that should end up at the left side (for instance at the first index). This takes a long time with single-directional bubble sort: each pass moves it only one step.
With bidirectional bubble sort, however, the element is moved all the way to the left in the first right-to-left pass.
So in general, bidirectional bubble sort is better when one or more elements have to be moved over a large number of places in the direction opposite to the one in which the single-directional bubble sort bubbles.
For your implementation of bubble sort it will, however, not make much difference: usually bubble sort tests while it sorts, and in case it can do a full pass without swaps, it simply stops.
For example a single-directional bubblesort that moves to the right:
def single_bubble(data):
    for i in range(len(data)):
        can_exit = True
        for j in range(len(data)-i-1):
            if data[j] > data[j+1]:
                data[j], data[j+1] = data[j+1], data[j]
                can_exit = False
        if can_exit:
            return
So if you want to move an element a large number of places to the left, a full pass is needed for each such step. We can optimize the above method a bit more, but this behavior cannot be eliminated.
Bi-directional bubble sort can be implemented like:
def bidirectional_bubble(data):
    for i in range(len(data)):
        can_exit = True
        for j in range(len(data)-i-1):
            if data[j] > data[j+1]:
                data[j], data[j+1] = data[j+1], data[j]
                can_exit = False
        if can_exit:
            return
        can_exit = True  # reset: a swap-free backward pass also means we are done
        for j in range(len(data)-i-1, i, -1):
            if data[j-1] > data[j]:
                data[j-1], data[j] = data[j], data[j-1]
                can_exit = False
        if can_exit:
            return
That being said, bubble sort is not a good sorting algorithm in general. There exist way better algorithms like quicksort, mergesort, timsort, radixsort (for numerical data), etc.
Bubble sort is actually a quite bad algorithm even among O(n^2) algorithms, since it moves an element only one place at a time. Insertion sort, by contrast, first determines where an element has to go and then shifts that part of the list in one go, saving a lot of useless moves. These algorithms can however serve an educational purpose when learning to design, implement and analyze algorithms, since they perform significantly worse than more advanced ones.
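For comparison, a minimal insertion sort sketch showing that shift-then-place behavior:

def insertion_sort(data):
    for i in range(1, len(data)):
        key = data[i]
        j = i - 1
        while j >= 0 and data[j] > key:   # shift larger elements one slot right
            data[j + 1] = data[j]
            j -= 1
        data[j + 1] = key                 # place the element once
    return data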
Implementing a (general-purpose) sorting function yourself is probably not beneficial: good algorithms have already been implemented for all popular programming languages, and these implementations are fast, consume little memory, etc.

Memoized to DP solution - Making Change

Recently I read a problem to practice DP. I wasn't able to come up with a DP solution, so I tried a recursive solution, which I later modified to use memoization. The problem statement is as follows:
Making Change. You are given n types of coin denominations of values
v(1) < v(2) < ... < v(n) (all integers). Assume v(1) = 1, so you can
always make change for any amount of money C. Give an algorithm which
makes change for an amount of money C with as few coins as possible.
[on problem set 4]
I got the question from here
My solution was as follows :-
def memoized_make_change(L, index, cost, d):
    if index == 0:
        return cost
    if (index, cost) in d:
        return d[(index, cost)]
    count = cost // L[index]  # integer division (was cost / L[index] in Python 2)
    val1 = memoized_make_change(L, index-1, cost % L[index], d) + count
    val2 = memoized_make_change(L, index-1, cost, d)
    x = min(val1, val2)
    d[(index, cost)] = x
    return x
This is how I've understood my solution to the problem. Assume that the denominations are stored in L in ascending order. As I iterate from the end to the beginning, I have a choice to either choose a denomination or not choose it. If I choose it, I then recurse to satisfy the remaining amount with lower denominations. If I do not choose it, I recurse to satisfy the current amount with lower denominations.
Either way, at a given function call, I find the best(lowest count) to satisfy a given amount.
Could I have some help in bridging the thought process from here onward to reach a DP solution? I'm not doing this as any HW, this is just for fun and practice. I don't really need any code either, just some help in explaining the thought process would be perfect.
[EDIT]
I recall reading that function calls are expensive, and that this is why bottom-up (iteration-based) solutions might be preferred. Is that possible for this problem?
Here is a general approach for converting memoized recursive solutions to "traditional" bottom-up DP ones, in cases where this is possible.
First, let's express our general "memoized recursive solution". Here, x represents all the parameters that change on each recursive call. We want this to be a tuple of positive integers - in your case, (index, cost). I omit anything that's constant across the recursion (in your case, L), and I suppose that I have a global cache. (But FWIW, in Python you should just use the lru_cache decorator from the standard library functools module rather than managing the cache yourself.)
To solve for(x):
    If x in cache: return cache[x]
    Handle base cases, i.e. where one or more components of x is zero
    Otherwise:
        Make one or more recursive calls
        Combine those results into `result`
        cache[x] = result
        return result
The basic idea in dynamic programming is simply to evaluate the base cases first and work upward:
To solve for(x):
    For y starting at (0, 0, ...) and increasing towards x:
        Do all the stuff from above
However, two neat things happen when we arrange the code this way:
As long as the order of y values is chosen properly (this is trivial when there's only one vector component, of course), we can arrange that the results for the recursive call are always in cache (i.e. we already calculated them earlier, because y had that value on a previous iteration of the loop). So instead of actually making the recursive call, we replace it directly with a cache lookup.
Since every component of y will use consecutively increasing values, and will be placed in the cache in order, we can use a multidimensional array (nested lists, or else a Numpy array) to store the values instead of a dictionary.
So we get something like:
To solve for(x):
cache = multidimensional array sized according to x
for i in range(first component of x):
for j in ...:
(as many loops as needed; better yet use `itertools.product`)
If this is a base case, write the appropriate value to cache
Otherwise, compute "recursive" index values to use, look up
the values, perform the computation and store the result
return the appropriate ("last") value from cache
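As an aside, here is a minimal sketch of the lru_cache approach mentioned above, applied to the recurrence from the question (the recurrence is kept as-is from the post; the wrapper name is illustrative):

from functools import lru_cache

def make_change(L, C):
    @lru_cache(maxsize=None)   # memoizes on (index, cost) automatically
    def solve(index, cost):
        if index == 0:
            return cost        # only the unit coin v(1) = 1 remains
        count = cost // L[index]
        return min(solve(index - 1, cost % L[index]) + count,
                   solve(index - 1, cost))
    return solve(len(L) - 1, C)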
I suggest considering the relationship between the value you are constructing and the values you need for it.
In this case you are constructing a value for index, cost based on:
index-1 and cost
index-1 and cost%L[index]
What you are searching for is a way of iterating over the choices such that you will always have precalculated everything you need.
In this case you can simply change the code to the iterative approach:
for each choice of index 0 upwards:
    for each choice of cost:
        compute value corresponding to index, cost
In practice, I find that the iterative approach can be significantly faster (perhaps 4x) for simple problems, as it avoids the overhead of function calls and of checking the cache for preexisting values.
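Concretely for this problem, a bottom-up sketch might look as follows, using the standard coin-change recurrence dp[c] = min(dp[c - coin] + 1) over all usable coins (note this differs slightly from the exact recurrence in the post):

def make_change_dp(L, C):
    # dp[c] = minimum number of coins needed to make amount c
    dp = [0] + [float('inf')] * C
    for c in range(1, C + 1):
        dp[c] = min(dp[c - coin] + 1 for coin in L if coin <= c)
    return dp[C]

For example, make_change_dp([1, 5, 10, 25], 63) returns 6 (two 25s, one 10, three 1s).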

Is there a non-recursive way of separating each list elements into their own list?

I was looking at Wikipedia's pseudo-code (and other webpages like sortvis.org and sorting-algorithm.com) on merge sort and saw that the preparation of a merge uses recursion.
I was curious to see if there is a non-recursive way to do it.
Perhaps something like: for each element in the list, element = [element].
I am under the impression that recursion should be kept to a minimum because it's undesirable, which is why I thought of this question.
The following is a pseudo-code sample of the recursive part of the merge-sort from Wikipedia:
function merge_sort(list m)
    // if list size is 1, consider it sorted and return it
    if length(m) <= 1
        return m
    // else list size is > 1, so split the list into two sublists
    var list left, right
    var integer middle = length(m) / 2
    for each x in m up to middle
        add x to left
    for each x in m after or equal middle
        add x to right
    // recursively call merge_sort() to further split each sublist
    // until sublist size is 1
    left = merge_sort(left)
    right = merge_sort(right)
Bottom-up merge sort is a non-recursive variant of merge sort.
See also this wikipedia page for a more detailed pseudocode implementation.
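A minimal Python sketch of the bottom-up variant, merging adjacent runs of doubling width with no recursion (the merge helper is illustrative):

def merge(left, right):
    # merge two sorted lists into one sorted list
    result, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    return result + left[i:] + right[j:]

def bottom_up_merge_sort(lst):
    width = 1
    while width < len(lst):
        merged = []
        for k in range(0, len(lst), 2 * width):  # merge adjacent runs of size width
            merged += merge(lst[k:k + width], lst[k + width:k + 2 * width])
        lst = merged
        width *= 2
    return lst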
middle = len(lst) // 2  # integer division (len(lst) / 2 yields a float in Python 3)
left = lst[:middle]
right = lst[middle:]
List slicing works fine.
As an aside - recursion is not undesirable per se.
Recursion is undesirable if you have limited stack space (are you afraid of stackoverflow? ;-) ), or in some cases where the time overhead of function calls is of great concern.
For much of the time these conditions do not hold; readability and maintainability of your code will be more relevant. Algorithms like merge sort make more sense when expressed recursively in my opinion.

Finding Nth item of unsorted list without sorting the list

Hey. I have a very large array and I want to find the Nth largest value. Trivially I can sort the array and then take the Nth element but I'm only interested in one element so there's probably a better way than sorting the entire array...
A heap is the best data structure for this operation and Python has an excellent built-in library to do just this, called heapq.
import heapq
def nth_largest(n, iter):
    return heapq.nlargest(n, iter)[-1]
Example Usage:
>>> import random
>>> iter = [random.randint(0,1000) for i in range(100)]
>>> n = 10
>>> nth_largest(n, iter)
920
Confirm result by sorting:
>>> list(sorted(iter))[-10]
920
Sorting would require at least O(n log n) runtime; there are very efficient selection algorithms which can solve your problem in linear time.
Partition-based selection (also known as quickselect), which is based on the idea of quicksort (recursive partitioning), is a good solution (see link for pseudocode + another example).
A simple modified quicksort works very well in practice. It has average running time proportional to N (though worst case bad luck running time is O(N^2)).
Proceed like a quicksort. Pick a pivot value randomly, then stream through your values and see if they are above or below that pivot value and put them into two bins based on that comparison.
In quicksort you'd then recursively sort each of those two bins. But for the N-th highest value computation, you only need to sort ONE of the bins.. the population of each bin tells you which bin holds your n-th highest value. So for example if you want the 125th highest value, and you sort into two bins which have 75 in the "high" bin and 150 in the "low" bin, you can ignore the high bin and just proceed to finding the 125-75=50th highest value in the low bin alone.
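A minimal quickselect sketch along those lines, with a random pivot and recursion into only one partition (this version finds the n-th highest value, 1-indexed):

import random

def quickselect_nth_largest(data, n):
    pivot = random.choice(data)
    high = [x for x in data if x > pivot]   # the "high" bin
    equal = [x for x in data if x == pivot]
    low = [x for x in data if x < pivot]    # the "low" bin
    if n <= len(high):
        return quickselect_nth_largest(high, n)
    if n <= len(high) + len(equal):
        return pivot
    return quickselect_nth_largest(low, n - len(high) - len(equal))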
You can iterate the entire sequence maintaining a list of the 5 largest values you find (this will be O(n)). That being said I think it would just be simpler to sort the list.
You could try the Median of Medians method; its speed is O(N).
Use heapsort. It only partially orders the list until you draw the elements out.
You essentially want to produce a "top-N" list and select the one at the end of that list.
So you can scan the array once and insert into an empty list when the largeArray item is greater than the last item of your top-N list, then drop the last item.
After you finish scanning, pick the last item in your top-N list.
An example for ints and N = 5:
int[] top5 = new int[5];
top5[0] = top5[1] = top5[2] = top5[3] = top5[4] = 0x80000000; // or your min value
for (int i = 0; i < largeArray.length; i++) {
    if (largeArray[i] > top5[4]) {
        // insert into top5:
        top5[4] = largeArray[i];
        // resort (descending, so top5[4] stays the smallest kept value):
        quickSort(top5);
    }
}
As people have said, you can walk the list once keeping track of the K largest values. If K is large this algorithm will be close to O(n^2).
However, you can store your K largest values in a binary tree and the operation becomes O(n log k).
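A sketch of that idea using a size-K min-heap (via Python's heapq) instead of a binary tree; the heap root is always the smallest of the K largest values seen so far:

import heapq

def kth_largest(data, k):
    heap = []
    for value in data:
        if len(heap) < k:
            heapq.heappush(heap, value)
        elif value > heap[0]:               # value beats the smallest of the top k
            heapq.heapreplace(heap, value)  # pop the root, push value: O(log k)
    return heap[0]                          # the k-th largest overall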
According to Wikipedia, this is the best selection algorithm:
function findFirstK(list, left, right, k)
    if right > left
        select pivotIndex between left and right
        pivotNewIndex := partition(list, left, right, pivotIndex)
        if pivotNewIndex > k // new condition
            findFirstK(list, left, pivotNewIndex-1, k)
        if pivotNewIndex < k
            findFirstK(list, pivotNewIndex+1, right, k)
Its average complexity is O(n).
One thing you should do if this is in production code is test with samples of your data.
For example, you might consider arrays of 1,000 or 10,000 elements 'large', and code up a quickselect method from a recipe.
The compiled nature of sorted, and its somewhat hidden and constantly evolving optimizations, make it faster than a Python-written quickselect method on small to medium sized datasets (< 1,000,000 elements). Also, you might find that as you increase the size of the array beyond that amount, memory is more efficiently handled in native code, and the benefit continues.
So, even if quickselect is O(n) versus sorted's O(n log n), that doesn't take into account how many actual machine code instructions processing each of the n elements will take, any impacts on pipelining, use of processor caches, and other things the creators and maintainers of sorted bake into the implementation.
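One way to run such a measurement, here comparing the built-in sorted against heapq.nlargest on random data (the sizes and n below are arbitrary choices for illustration):

import heapq
import random
import timeit

data = [random.random() for _ in range(100_000)]
n = 50

t_sorted = timeit.timeit(lambda: sorted(data)[-n], number=10)
t_heap = timeit.timeit(lambda: heapq.nlargest(n, data)[-1], number=10)
print(f"sorted: {t_sorted:.3f}s  heapq.nlargest: {t_heap:.3f}s")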
You can keep two counts for each element: the number of elements bigger than the element, and the number of elements smaller than the element.
Then check whether N == the number of elements bigger than the element; the element satisfying this condition is your answer. Check the solution below:
def NthHighest(l, n):
    if len(l) < n:
        return 0
    for i in range(len(l)):
        low_count = 0
        up_count = 0
        for j in range(len(l)):
            if l[j] > l[i]:
                up_count = up_count + 1
            else:
                low_count = low_count + 1
        # print(l[i], low_count, up_count)
        if up_count == n-1:
            # print(l[i])
            return l[i]

# find the 4th largest number
l = [1, 3, 4, 9, 5, 15, 5, 13, 19, 27, 22]
print(NthHighest(l, 4))
Using the above solution you can find both the Nth highest and the Nth lowest.
If you do not mind using pandas then:
import pandas as pd
N = 10
column_name = 0
pd.DataFrame(your_array).nlargest(N, column_name)
The above code will show you the N largest values along with the index position of each value.
Hope it helps. :-)
Pandas Nlargest Documentation
