Coding exercise for practicing adjacency lists and BFS - Python

I have a coding exercise with a row of trampolines, each with a minimum and maximum "bounciness". I have the index of a starting trampoline and an end trampoline, and with this, I need to find the minimum number of jumps required to reach the end trampoline from the start trampoline.
I have tried creating an adjacency list that records all possible jumps from each trampoline. This works fine until I reach a large number of trampolines: the problem is that building the list takes O(n^2) time.
This is how I create the adjacency list:
def createAL(a, b, l):
    al = [list() for _ in range(l)]
    for i in range(l):
        for j in range(a[i], b[i]+1):
            if (i+j) <= l-1:
                al[i].append(i+j)
            if (i-j) >= 0:
                al[i].append(i-j)
    for i in range(len(al)):
        al[i] = list(set(al[i]))
    return al
"a" is the min. bounciness, "b" the max bounciness and "l" is the length of the two lists.
As you can see, the problem is that I have 2 nested loops. Does anyone have an idea for a more efficient way of doing this? (preferably without the nested loops)

Assuming "bounciness" is strictly positive, you can omit this part:
for i in range(len(al)):
    al[i] = list(set(al[i]))
...as there is no way you could have duplicates in those lists.
(If, however, bounciness could be 0 or negative, then first replace any value below 1 in a with 1.)
The building of al can be made a bit faster by:
making the ranges run over the actual target indexes (so you don't need i+j in every iteration),
clipping those ranges using min() and max(), avoiding if statements in the loop, and
avoiding individual append calls, using a list comprehension.
Result:
al = [
    [*range(max(0, i-b[i]), i-a[i]+1), *range(i+a[i], min(l, i+b[i]+1))]
    for i in range(l)
]
Finally, as this adjacency list presumably serves a BFS algorithm, you could also consider that building the adjacency list may not be necessary, as finding the adjacent nodes during BFS is a piece of cake using a and b on-the-spot. I wonder if you really gain time by creating the adjacency list.
In your BFS code, you probably have something like this (where i is the "current" node):
for neighbor in al[i]:
This could be replaced with:
for neighbor in (*range(max(0, i-b[i]), i-a[i]+1), *range(i+a[i], min(l, i+b[i]+1))):
We should also realise that if the target trampoline is found in a number of steps that is much smaller than the number of trampolines, then it is likely that not all trampolines are visited during the BFS. In that case it would have been a waste of time to create the complete adjacency list...
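For illustration, a minimal BFS along those lines, generating neighbors on the fly (the function name, argument order and the -1 return for an unreachable end are my own assumptions):

from collections import deque

def min_jumps(a, b, l, start, end):
    # BFS over trampoline indexes; neighbors come straight from the
    # bounciness lists, so no adjacency list is ever materialised.
    dist = [-1] * l              # -1 marks "not visited yet"
    dist[start] = 0
    queue = deque([start])
    while queue:
        i = queue.popleft()
        if i == end:
            return dist[i]       # minimum number of jumps
        for neighbor in (*range(max(0, i - b[i]), i - a[i] + 1),
                         *range(i + a[i], min(l, i + b[i] + 1))):
            if dist[neighbor] == -1:
                dist[neighbor] = dist[i] + 1
                queue.append(neighbor)
    return -1                    # the end trampoline is unreachable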

Related

What's the Big-O complexity of this loop?

def cups(a, b):
    i = 0
    j = 0
    done = False
    while not done:
        if a[i] == b[j]:
            print("A[" + str(i) + "] with B[" + str(j) + "]")
            i += 1
            j = 0
            if i == len(a):
                i = 0
                done = True
        if a[i] != b[j]:
            j += 1
I'm trying to compare two lists and print the indices of two values that are the same in the two lists.
I'm curious whether the complexity is O(1) or O(n)?
I suspect that your code might not be working the way you want it to, but this question seems to be more about algorithmic complexity and less about your particular implementation, so I'll focus on that.
In general, two sorted lists can be compared in the way you describe in linear time by advancing pointers. We can use two streets of numbered houses as a concrete example: if you and I want to find out what house numbers are duplicated on Main Street and Elm street, we can do the following:
you start at the bottom of Main Street, I start at the bottom of Elm Street.
we each report the number that we see
if the numbers match, we record that number
if not, one of us is seeing a number that is lower than the other's. That one walks up the street until they find a number equal to or greater than the last one reported by the other, and we repeat from step 3
In this case, neither of us ever goes back to the bottom of the street.
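In code, the sorted case of that walk could look like this (a sketch; when values repeat, it reports one pairing per advance rather than every combination):

def sorted_matches(a, b):
    # Two pointers over sorted lists: advance whichever side is smaller.
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            print("A[" + str(i) + "] with B[" + str(j) + "]")
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1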
However, if the lists are not sorted, then we have to use a different approach. Assuming that Main Street and Elm street are numbered in random order, we have to do the following:
I start at the bottom of Elm Street.
I report the number that I see.
you start at the bottom of Main street and walk up until you find a house that has that number, or until you reach the end of Main. If you find a match, we record it.
I advance to the next house and we repeat from step 3.
This is an O(n**2) algorithm, as you can see (you walk up Main St. once for each house on Elm; if there are n houses on each street, then we're looking at n * n operations).
This is the state you're in: as stated, the problem cannot be solved in O(n) using comparisons alone.
However, I will point out that sorting a list is an O(n log n) operation, which would allow you to reduce the problem to the linear case, for a final complexity of O(n log n).
Putting aside any bugs you have in the code - and concentrating on the runtime complexity question:
O(1) means the code runs in constant time regardless of the size of the input, and that is clearly not the case here, since different sizes of a and b will result in different numbers of loop iterations.
In your case, your while loop can iterate at most len(a) * len(b) times (roughly).
So you can say that the code has a run-time complexity of the order of O(n), where n = len(a) * len(b).
You're scanning all elements of at least one list, so it cannot be constant
The best case, since you're setting j=0, is when every element of a equals the first element of b (which makes all elements of a the same); that is linear time, but big-O does not measure the best case.
In the worst case, you're scanning all elements of both lists: for each non-equal element you scan all of b until a match, and when you find a match you reset j back to the start of b. So really it is O(N*M), and for equal-length lists this is O(N^2).
Note: a more generic algorithm for this is:
for i, a_elem in enumerate(a):
    for j, b_elem in enumerate(b):
        if a_elem == b_elem:
            print(f"a[{i}] with b[{j}]")

How to calculate the minimum unfairness sum of a list

I have tried to summarize the problem statement something like this:
Given n, k and an array (a list) arr, where n = len(arr) and k is an integer in the range [1, n] inclusive.
For an array (or list) myList, the unfairness sum is defined as the sum of the absolute differences between all possible pairs (combinations with 2 elements each) in myList.
To explain: if myList = [1, 2, 5, 5, 6], then its unfairness sum is as below. Please note that elements are considered unique by their index in the list, not by their values.
unfairness sum = |1-2| + |1-5| + |1-5| + |1-6| + |2-5| + |2-5| + |2-6| + |5-5| + |5-6| + |5-6|
If you actually need to look at the problem statement, it's HERE
My Objective
Given n, k, arr (as described above), find the minimum unfairness sum out of all the unfairness sums of the possible subarrays, with the constraint that each len(subarray) = k [which is a good thing to make our lives easy, I believe :) ]
What I have tried
Well, there is a lot to add in here, so I'll try to be as short as I can.
My first approach was this, where I used itertools.combinations to get all the possible combinations and statistics.variance to check the spread of the data (yeah, I know I'm a mess).
Before you see the code below: do you think variance and unfairness sum are perfectly related (I know they are strongly related), i.e. does the subarray with minimum variance have to be the subarray with the MUS?
You only have to check the LetMeDoIt(n, k, arr) function. If you need an MCVE, check the second code snippet below.
from itertools import combinations as cmb
from statistics import variance as varn

def LetMeDoIt(n, k, arr):
    v = []
    s = []
    subs = [list(x) for x in list(cmb(arr, k))]  # getting all sub arrays from arr in a list
    i = 0
    for sub in subs:
        if i != 0:
            var = varn(sub)  # the variance thingy
            if float(var) < float(min(v)):
                v.remove(v[0])
                v.append(var)
                s.remove(s[0])
                s.append(sub)
            else:
                pass
        elif i == 0:
            var = varn(sub)
            v.append(var)
            s.append(sub)
            i = 1
    final = []
    f = list(cmb(s[0], 2))  # getting list of all pairs (after determining sub array with least MUS)
    for r in f:
        final.append(abs(r[0]-r[1]))  # calculating the MUS in my messy way
    return sum(final)
The above code works fine for n<30 but raised a MemoryError beyond that.
In Python chat, Kevin suggested I try a generator, which is memory-efficient (it really is), but as a generator also produces those combinations on the fly as we iterate over them, it was estimated to take over 140 hours (:/) for n=50, k=8.
I posted the same as a question on SO HERE (you might want to have a look to understand me properly - it has discussions and an answer by fusion which takes me to my second approach - a better one (I should say fusion's approach xD)).
Second Approach
from itertools import combinations as cmb

def myvar(arr):  # a function to calculate variance
    l = len(arr)
    m = sum(arr)/l
    return sum((i-m)**2 for i in arr)/l

def LetMeDoIt(n, k, arr):
    sorted_list = sorted(arr)  # i think sorting the array makes it easy to get the sub array with MUS quickly
    variance = None
    min_variance_sub = None
    for i in range(n - k + 1):
        sub = sorted_list[i:i+k]
        var = myvar(sub)
        if variance is None or var < variance:
            variance = var
            min_variance_sub = sub
    final = []
    f = list(cmb(min_variance_sub, 2))  # again getting all possible pairs in my messy way
    for r in f:
        final.append(abs(r[0] - r[1]))
    return sum(final)

def MainApp():
    n = int(input())
    k = int(input())
    arr = list(int(input()) for _ in range(n))
    result = LetMeDoIt(n, k, arr)
    print(result)

if __name__ == '__main__':
    MainApp()
This code works perfectly for n up to 1000 (maybe more), but times out (5 seconds is the limit on the online judge :/ ) for n beyond 10000 (the biggest test case has n=100000).
=====
How would you approach this problem to handle all the test cases within the given time limit (5 sec)? (The problem was listed under algorithms & dynamic programming.)
(For your reference, you can have a look at:
successful submissions (py3, py2, C++, Java) on this problem by other candidates - so that you can explain that approach for me and future visitors,
an editorial by the problem setter explaining how to approach the question,
a solution code by the problem setter himself (py2, C++),
input data (test cases) and expected output.)
Edit 1:
For future visitors of this question, the conclusions I have till now are:
that variance and unfairness sum are not perfectly related (though they are strongly related), which implies that among many lists of integers, the list with minimum variance doesn't always have to be the list with the minimum unfairness sum. If you want to know why, I actually asked that as a separate question on Math Stack Exchange HERE, where one of the mathematicians proved it for me xD (and it's worth taking a look, 'cause it was unexpected).
As far as the question overall is concerned, you can read the answers by archer & Attersson below (I'm still trying to figure out a naive approach to carry this out - it shouldn't be far off by now, though).
Thank you for any help or suggestions :)
You must work on your list SORTED and check only sublists with consecutive elements. This is because, by construction, any sublist that includes at least one non-consecutive element will have a higher unfairness sum.
For example if the list is
[1,3,7,10,20,35,100,250,2000,5000] and you want to check sublists of length 3, then the solution must be one of [1,3,7], [3,7,10], [7,10,20], etc.
Any other sublist, e.g. [1,3,10], will have a higher unfairness sum, because 10 > 7 and therefore all of 10's differences with the rest of the elements will be larger than 7's.
The same holds for [1,7,10] (non-consecutive on the left side), as 1 < 3.
Given that, you only have to check consecutive sublists of length k, which reduces the execution time significantly.
Regarding coding, something like this should work:
import itertools

def myvar(array):
    return sum(abs(i[0] - i[1]) for i in itertools.combinations(array, 2))

def minsum(n, k, arr):
    arr = sorted(arr)              # work on the sorted list, as explained above
    res = myvar(arr[0:k])          # start with the first subarray's sum
    for i in range(1, n - k + 1):  # n-k+1 windows, so the last one is included
        res = min(res, myvar(arr[i:i+k]))
    return res
I see this question still has no complete answer, so I will sketch a correct algorithm that will pass the judge. I will not write the full code, in order to respect the purpose of the HackerRank challenge, since we already have working solutions.
The original array must be sorted. This has a complexity of O(NlogN).
At this point you can check consecutive subarrays, as non-consecutive ones will result in a worse (or equal, but not better) "unfairness sum". This is also explained in archer's answer.
The last pass, to find the minimum "unfairness sum", can be done in O(N). You need to calculate the US for every consecutive k-long subarray. The mistake is recalculating this from scratch at every step, in O(k), which brings the complexity of this pass to O(k*N). It can instead be done in O(1) per step, as the editorial you posted shows, including the mathematical formulae. It requires a prior initialization of a cumulative (prefix-sum) array after step 1 (done in O(N), with space complexity O(N) too).
It works but terminates due to time out for n<=10000.
(from a comment on archer's answer)
To explain step 3, think about k = 100. You are scrolling the N-long array, and on the first iteration you must calculate the US for the subarray from element 0 to 99 as usual, requiring 100 passes. The next step needs you to calculate the same for a subarray that differs from the previous one by only 1 element: 1 to 100. Then 2 to 101, etc.
If it helps, think of it like a snake. One block is removed and one is added.
There is no need to perform the whole O(k) scroll. Just work out the maths as explained in the editorial and you will do it in O(1).
So the final complexity will asymptotically be O(NlogN), due to the initial sort.
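That said, for readers who have already solved the challenge, the sliding-window arithmetic of step 3 can be sketched compactly (names are mine; the prefix-sum array plays the role of the cumulative array mentioned above):

from itertools import accumulate

def min_unfairness_sum(n, k, arr):
    a = sorted(arr)                          # step 1: O(N log N)
    prefix = [0] + list(accumulate(a))       # prefix[i] = a[0] + ... + a[i-1]
    # US of the first window: in a sorted window, a[i] is added i times
    # and subtracted (k-1-i) times, i.e. its coefficient is 2*i - (k-1).
    us = sum(a[i] * (2 * i - (k - 1)) for i in range(k))
    best = us
    for i in range(1, n - k + 1):            # slide the window: O(1) per step
        out_elem = a[i - 1]                  # smallest element, leaves the window
        in_elem = a[i + k - 1]               # largest element, enters the window
        shared = prefix[i + k - 1] - prefix[i]   # sum of the k-1 kept elements
        us -= shared - (k - 1) * out_elem    # drop the old element's differences
        us += (k - 1) * in_elem - shared     # add the new element's differences
        best = min(best, us)
    return best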

Why is this a bad bubble sort algorithm?

I started studying Data Structures and algorithms, and tried to implement Bubble sort:
def BubbleSort(list):
    for a in range(len(list)):
        for b in range(len(list)):  # I could start this loop from 1
            if list[a] < list[b]:   # to avoid comparing the first element twice
                temp = list[a]
                list[a] = list[b]
                list[b] = temp
    return list
I browsed the net and books, but found no Python implementation of bubble sort.
What's wrong with the above?
Several things:
the algorithm will not always sort correctly;
syntactically it seems to sort the opposite way;
it takes twice the amount of time necessary to perform bubble sort;
it is not bubblesort; and
you had better never use variables in Python named list, dict, etc.
Bubble sort works by comparing two adjacent elements: the so-called "bubble". It checks whether the left item is indeed less than the right one. If this is not the case, it swaps the elements. The algorithm iterates at most n times over the list, after which the list is guaranteed to be sorted.
So a very basic implementation would be:
def BubbleSort(data):
    for _ in range(len(data)):        # iterate n times
        for i in range(len(data)-1):  # i is the left index of the bubble
            if data[i] > data[i+1]:   # if the left item is greater
                # perform a swap
                temp = data[i]
                data[i] = data[i+1]
                data[i+1] = temp
    return data
Now we can improve the algorithm (making it run in approximately half the time) by stopping the inner loop at len(data)-1-j, since after each iteration the rightmost element over which the bubble has moved is guaranteed to be the maximum of that range:
def BubbleSort(data):
    for j in range(len(data)):          # iterate n times
        for i in range(len(data)-1-j):  # i is the left index of the bubble
            if data[i] > data[i+1]:     # if the left item is greater
                # perform a swap
                temp = data[i]
                data[i] = data[i+1]
                data[i+1] = temp
    return data
But using bubble sort is - except for some very rare cases - inefficient. It is better to use faster algorithms like quicksort, merge sort, or Timsort (the built-in sorting algorithm of Python).
Here's a short list of books which implement bubble sort:
Fundamentals of Python: Data Structures, 1st ed.: Data Structures
Introduction to Numerical Programming: A Practical Guide for Scientists and Engineers Using Python and C/C++ (Series in Computational Physics)
Fundamentals of Python: First Programs
Python 3 for Absolute Beginners (Expert's Voice in Open Source)
Python Data Structures and Algorithms
You can start the inner for loop from index a + 1 instead of 0, which avoids comparing an element to itself (and re-checking pairs you have already compared). Makes it a bit faster.
Swap the values of list[a] and list[b] with tuple assignment (list[a], list[b] = list[b], list[a]), rather than using a temporary variable.
Check whether the list is already sorted (e.g. list == sorted(list)), and return the list immediately if it is. A sketch combining these ideas follows below.
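A sketch in the spirit of those suggestions, applied to the improved bubble sort above (I use a swapped flag for the early exit, since calling sorted() just to check would itself cost O(n log n)):

def bubble_sort(data):
    n = len(data)
    for j in range(n):
        swapped = False
        for i in range(n - 1 - j):
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]  # tuple swap, no temp
                swapped = True
        if not swapped:          # no swaps: the list is already sorted
            return data
    return data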

Given a list L labeled 1 to N, and a process that "removes" a random element from consideration, how can one efficiently keep track of min(L)?

The question is pretty much in the title, but say I have a list L
L = [1,2,3,4,5]
min(L) = 1 here. Now I remove 4. The min is still 1. Then I remove 2. The min is still 1. Then I remove 1. The min is now 3. Then I remove 3. The min is now 5, and so on.
I am wondering if there is a good way to keep track of the min of the list at all times, without needing to call min(L) or scan through the entire list each time.
There is an efficiency cost to actually removing the items from the list because it has to move everything else over. Re-sorting the list each time is expensive, too. Is there a way around this?
To remove a random element you need to know what elements have not been removed yet.
To know the minimum element, you need to sort or scan the items.
A min heap implemented as an array neatly solves both problems. The cost to remove an item is O(log N) and the cost to find the min is O(1). The items are stored contiguously in an array, so choosing one at random is very easy, O(1).
The min heap is described on this Wikipedia page
BTW, if the data are large, you can leave them in place and store pointers or indexes in the min heap and adjust the comparison operator accordingly.
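In Python, heapq implements exactly such an array-backed min heap. It has no "remove an arbitrary element" operation, so a common workaround is lazy deletion, sketched below (class and method names are mine):

import heapq
from collections import Counter

class MinTracker:
    def __init__(self, items):
        self.heap = list(items)
        heapq.heapify(self.heap)       # O(N)
        self.removed = Counter()       # removals not yet applied to the heap

    def remove(self, value):           # O(1); cleanup is deferred
        self.removed[value] += 1

    def current_min(self):             # amortised O(log N)
        while self.heap and self.removed[self.heap[0]]:
            self.removed[self.heap[0]] -= 1
            heapq.heappop(self.heap)   # discard stale entries lazily
        return self.heap[0] if self.heap else None

With the example from the question: after MinTracker([1,2,3,4,5]), removing 4 and 2 leaves current_min() == 1; removing 1 then gives 3.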
Google for self-balancing binary search trees. Building one from the initial list takes O(n lg n) time, and finding and removing an arbitrary item will take O(lg n) (instead of O(n) for finding/removing from a simple list). The smallest item is always in the leftmost node of the tree.
This question may be useful. It provides links to several implementations of various balanced binary search trees. The advice to use a hash table does not apply well to your case, since it does not address maintaining a minimum item.
Here's a solution that needs O(N lg N) preprocessing time, O(lg N) per deletion, and O(lg N * lg N) per findMin query.
Preprocessing:
Step 1: sort L.
Step 2: for each item L[i], record the mapping L[i] -> i.
Step 3: build a Binary Indexed Tree (or segment tree) where, for every 1 <= i <= len(L), BIT[i] = 1, and keep the sums of the ranges.
Query type delete:
Step 1: if an item x is to be removed, find its index either with a binary search on array L (where L is sorted) or from the mapping. Set BIT[index[x]] = 0 and update all the ranges. Runtime: O(lg N).
Query type findMin:
Step 1: do a binary search over array L. For every mid, find the prefix sum on the BIT from 1 to mid. If that sum > 0, then we know some value <= L[mid] is still alive, so we set hi = mid - 1; otherwise we set lo = mid + 1. Runtime: O(lg**2 N).
Same can be done with Segment tree.
Edit: if I'm not mistaken, each query can be processed in O(1) with a linked list.
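A sketch of that BIT design (assuming the values in L are distinct; names are mine):

class BITMinTracker:
    def __init__(self, items):
        self.vals = sorted(items)                               # step 1
        self.pos = {v: i + 1 for i, v in enumerate(self.vals)}  # step 2, 1-indexed
        self.n = len(self.vals)
        self.tree = [0] * (self.n + 1)
        for i in range(1, self.n + 1):                          # step 3: BIT[i] = 1
            self._add(i, 1)

    def _add(self, i, delta):
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i

    def _prefix(self, i):              # how many alive values are <= vals[i-1]
        s = 0
        while i > 0:
            s += self.tree[i]
            i -= i & -i
        return s

    def remove(self, value):           # O(lg N)
        self._add(self.pos[value], -1)

    def current_min(self):             # O(lg**2 N) binary search on prefix sums
        if self._prefix(self.n) == 0:
            return None                # everything has been removed
        lo, hi = 1, self.n
        while lo < hi:
            mid = (lo + hi) // 2
            if self._prefix(mid) > 0:  # some alive value at or below mid
                hi = mid
            else:
                lo = mid + 1
        return self.vals[lo - 1]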
If sorting isn't in your best interest, I would suggest only doing comparisons where you need to do them. If you remove elements that are not the old minimum, and you aren't inserting any new elements, no rescan for a minimum value is necessary.
Can you give us some more information about the processing going on that you are trying to do?
Comment answer: you don't have to recompute min(L) every time. Just keep track of its index, and only re-run the scan for min(L) when you remove the element at (or below) the old index (and make sure you update the index accordingly).
Your current approach of rescanning when the minimum is removed is O(1) time in expectation per removal (assuming every item is equally likely to be removed).
Given a list of n items, a rescan is necessary with probability 1/n, so the expected work at each step is n * (1/n) = O(1).
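A tiny simulation of that rescan-on-demand strategy (the generator is mine; removals are drawn uniformly at random):

import random

def removal_mins(L):
    alive = list(L)
    current_min = min(alive)
    while len(alive) > 1:
        victim = alive.pop(random.randrange(len(alive)))
        if victim == current_min:
            current_min = min(alive)   # O(n) rescan, needed with probability ~1/n
        yield current_min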

Finding Nth item of unsorted list without sorting the list

Hey. I have a very large array and I want to find the Nth largest value. Trivially I can sort the array and then take the Nth element but I'm only interested in one element so there's probably a better way than sorting the entire array...
A heap is the best data structure for this operation and Python has an excellent built-in library to do just this, called heapq.
import heapq

def nth_largest(n, iter):
    return heapq.nlargest(n, iter)[-1]
Example Usage:
>>> import random
>>> iter = [random.randint(0,1000) for i in range(100)]
>>> n = 10
>>> nth_largest(n, iter)
920
Confirm result by sorting:
>>> list(sorted(iter))[-10]
920
Sorting would require O(n log n) runtime at minimum - there are very efficient selection algorithms which can solve your problem in linear time.
Partition-based selection (sometimes called quickselect), which is based on the idea of quicksort (recursive partitioning), is a good solution (see link for pseudocode, plus another example).
A simple modified quicksort works very well in practice. It has average running time proportional to N (though worst case bad luck running time is O(N^2)).
Proceed like a quicksort. Pick a pivot value randomly, then stream through your values and see if they are above or below that pivot value and put them into two bins based on that comparison.
In quicksort you'd then recursively sort each of those two bins. But for the N-th highest value computation, you only need to sort ONE of the bins: the population of each bin tells you which bin holds your N-th highest value. For example, if you want the 125th highest value, and you partition into a "high" bin with 75 values and a "low" bin with 150, you can ignore the high bin and just proceed to finding the (125-75) = 50th highest value in the low bin alone.
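A sketch of that bin-splitting idea in Python (names are mine; it copies the bins for clarity instead of partitioning in place):

import random

def nth_highest(values, n):
    # Returns the n-th highest value (n = 1 is the maximum).
    # Average O(len(values)); unlucky pivots give O(len(values)**2).
    pool = list(values)
    target = n - 1                     # index in descending order
    while True:
        pivot = random.choice(pool)
        high = [x for x in pool if x > pivot]
        equal = [x for x in pool if x == pivot]
        if target < len(high):
            pool = high                # the answer sits in the high bin
        elif target < len(high) + len(equal):
            return pivot               # the pivot itself is the answer
        else:
            target -= len(high) + len(equal)
            pool = [x for x in pool if x < pivot]  # descend into the low bin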
You can iterate over the entire sequence while maintaining a list of the 5 largest values you find (this will be O(n)). That being said, I think it would just be simpler to sort the list.
You could try the median-of-medians method - its speed is O(N).
Use heapsort. It only partially orders the list until you draw the elements out.
You essentially want to produce a "top-N" list and select the one at the end of that list.
So you can scan the array once, inserting into your (initially empty) top-N list whenever the largeArray item is greater than the smallest item in it, then dropping that smallest item.
After you finish scanning, the N-th largest value is the smallest item in your top-N list.
An example for ints and N = 5:
int[] top5 = new int[5];
top5[0] = top5[1] = top5[2] = top5[3] = top5[4] = 0x80000000; // or your min value

for (int i = 0; i < largeArray.length; i++) {
    if (largeArray[i] > top5[0]) { // top5 is sorted ascending, so top5[0] is its smallest
        // replace the smallest entry and re-sort:
        top5[0] = largeArray[i];
        quickSort(top5);
    }
}
As people have said, you can walk the list once, keeping track of the K largest values. If K is large, this algorithm will be close to O(n^2).
However, you can store your K largest values in a binary tree, and the operation becomes O(n log k).
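heapq can play the role of that tree: keep a min heap holding the k largest values seen so far (a sketch under that idea, assuming len(values) >= k):

import heapq

def kth_largest(values, k):
    heap = []                          # min heap of the k largest seen so far
    for v in values:
        if len(heap) < k:
            heapq.heappush(heap, v)    # O(log k)
        elif v > heap[0]:
            heapq.heapreplace(heap, v) # evict the smallest of the current top k
    return heap[0]                     # the k-th largest overall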
According to Wikipedia, this is the best selection algorithm:
function findFirstK(list, left, right, k)
    if right > left
        select pivotIndex between left and right
        pivotNewIndex := partition(list, left, right, pivotIndex)
        if pivotNewIndex > k // new condition
            findFirstK(list, left, pivotNewIndex-1, k)
        if pivotNewIndex < k
            findFirstK(list, pivotNewIndex+1, right, k)
Its complexity is O(n) on average.
One thing you should do if this is in production code is test with samples of your data.
For example, you might consider 1000 or 10000 elements 'large' arrays, and code up a quickselect method from a recipe.
The compiled nature of sorted, and its somewhat hidden and constantly evolving optimizations, make it faster than a Python-written quickselect method on small to medium-sized datasets (< 1,000,000 elements). Also, you might find that as you increase the size of the array beyond that amount, memory is more efficiently handled in native code, and the benefit continues.
So, even if quickselect is O(n) vs sorted's O(nlogn), that doesn't take into account how many actual machine-code instructions processing each of the n elements will take, any impacts on pipelining, use of processor caches, and other things the creators and maintainers of sorted will bake into the Python code.
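For example, a quick comparison on synthetic data might look like this (the sizes are arbitrary, and heapq.nlargest stands in for whatever selection method you are testing):

import heapq
import random
import timeit

data = [random.random() for _ in range(100_000)]
n = 10

t_sorted = timeit.timeit(lambda: sorted(data)[-n], number=20)
t_select = timeit.timeit(lambda: heapq.nlargest(n, data)[-1], number=20)
print(f"sorted: {t_sorted:.3f}s   heapq.nlargest: {t_select:.3f}s")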
You can keep two counts for each element: the number of elements bigger than it, and the number of elements smaller than it.
Then check, for each element, whether the number of elements bigger than it equals N - 1; the element satisfying that condition is your output.
Check the solution below:
def NthHighest(l, n):
    if len(l) < n:
        return 0
    for i in range(len(l)):
        low_count = 0
        up_count = 0
        for j in range(len(l)):
            if l[j] > l[i]:
                up_count = up_count + 1
            else:
                low_count = low_count + 1
        # print(l[i], low_count, up_count)
        if up_count == n-1:
            # print(l[i])
            return l[i]

# find the 4th largest number
l = [1, 3, 4, 9, 5, 15, 5, 13, 19, 27, 22]
print(NthHighest(l, 4))
Using the above solution you can find both the Nth highest as well as the Nth lowest.
If you do not mind using pandas then:
import pandas as pd

N = 10
column_name = 0
pd.DataFrame(your_array).nlargest(N, column_name)
The above code will show you the N largest values along with the index position of each value.
Hope it helps. :-)
Pandas Nlargest Documentation
