Implementing the following algorithm for sorting - python

I am writing this algorithm for a sort, and I fail to see how it is different from insertion sort. I was wondering if someone can help me understand the difference. The current sort is written as insertion sort because I don't see the difference yet. This is homework, so I don't want an answer; I want to understand the difference. The algorithm is here
def file_open(perkList,fileName):
    with open(fileName, 'r') as f:
        for line in f.readlines():
            perkList.append(int(line))

def perkSort(perkList):
    for marker in range(len(perkList)):
        save = perkList[marker]
        i = marker
        # shift larger elements of the sorted prefix one slot to the right
        while i > 0 and perkList[i-1] > save:
            perkList[i] = perkList[i-1]
            i = i - 1
        perkList[i] = save
    print("New list",perkList)

def main():
    perkList = []
    file_open(perkList,'integers')
    file_open(perkList,'integers2')
    print("initial list",perkList)
    perkSort(perkList)

main()
Apologies that this question is not that clean. Edits are appreciated.

The Perksort algorithm mentioned in your homework is essentially the Bubble sort algorithm. What you have implemented is the Insertion Sort algorithm. The difference is as follows:
Insertion Sort
It works by inserting each element of the input list into its correct position within the part of the list that is already sorted. That is, it builds the sorted array one item at a time.
## Unsorted List ##
7 6 1 3 2
# First Pass
7 6 1 3 2
# Second Pass
6 7 1 3 2
# Third Pass
1 6 7 3 2
# Fourth Pass
1 3 6 7 2
# Fifth Pass
1 2 3 6 7
Note that after i iterations the first i elements are ordered.
The ith step needs at most i inner iterations.
Pseudocode:
for i ← 1 to length(A) - 1
    j ← i
    while j > 0 and A[j-1] > A[j]
        swap A[j] and A[j-1]
        j ← j - 1
This is what you did in your python implementation.
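As a minimal sketch, the insertion sort pseudocode above can be written in Python as follows. The function name `insertion_sort` and the swap counter are illustrative additions, not part of the homework; the counter only makes it easy to see how much work each input costs.

```python
def insertion_sort(values):
    """Return a sorted copy of values and the number of adjacent swaps made."""
    a = list(values)          # work on a copy so the caller's list is untouched
    swaps = 0
    for i in range(1, len(a)):
        j = i
        # walk the new element left until the prefix a[0..i] is ordered
        while j > 0 and a[j - 1] > a[j]:
            a[j], a[j - 1] = a[j - 1], a[j]
            swaps += 1
            j -= 1
    return a, swaps


if __name__ == "__main__":
    print(insertion_sort([7, 6, 1, 3, 2]))
```

For the example list 7 6 1 3 2 this performs one swap per inversion in the input.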
Some complexity analysis:
Worst case performance O(n^2)
Best case performance O(n)
Average case performance O(n^2)
Worst case space complexity O(n) total, O(1) auxiliary
Bubble Sort
This is the PerkSort algorithm you were given to implement in your homework.
It works by repeatedly scanning through the list to be sorted, comparing each pair of adjacent elements and swapping them if they are in the wrong order.
## Unsorted List ##
7 6 1 3 2
# First Pass
6 1 3 2 7
# Second Pass
1 3 2 6 7
# Third Pass
1 2 3 6 7
# Fourth Pass
1 2 3 6 7
# No Fifth Pass as there were no swaps in Fourth Pass
Note that after i iterations the last i elements are the biggest, and ordered.
The ith pass needs at most n-i-1 iterations.
I am not giving pseudocode here, as this is your homework assignment.
Hint: Start from the marker and move forward through the list, so that larger elements shift toward the end, just like bubbles rising.
Some complexity analysis:
Worst case performance O(n^2)
Best case performance O(n)
Average case performance O(n^2)
Worst case space complexity O(n) total, O(1) auxiliary
Similarities
Both have the same worst case, average case and best case time complexities
Both have the same space complexities
Both are in-place algorithms (i.e. they change the original data)
Both are Comparison Sorts
Differences (apart from the algorithm, of course)
Even though both algorithms have the same time and space complexities on average, in practice Insertion sort is better than Bubble sort. This is because, on average, Bubble sort performs more comparisons and element writes than Insertion sort. Insertion sort performs better on a list with a small number of inversions.

The program that you have written does implement insertion sort.
Let's take an example and see what your program would do. For input 5 8 2 7
After first iteration
5 8 2 7
After second iteration
2 5 8 7
After third iteration
2 5 7 8
But the algorithm that is given in your link works differently. It takes the largest element and puts it in the end. For our example
After first iteration
5 2 7 8
After second iteration
2 5 7 8

Related

Google Jam test cases passes but submission shows "Wrong Answer"

Note: The main parts of the statements of the problems "Reversort" and
"Reversort Engineering" are identical, except for the last paragraph.
The problems can otherwise be solved independently.
Reversort is an algorithm to sort a list of distinct integers in
increasing order. The algorithm is based on the "Reverse" operation.
Each application of this operation reverses the order of some
contiguous part of the list.
After i−1 iterations, the positions 1,2,…,i−1 of the list contain the
i−1 smallest elements of L, in increasing order. During the i-th
iteration, the process reverses the sublist going from the i-th
position to the current position of the i-th minimum element. That
makes the i-th minimum element end up in the i-th position.
For example, for a list with 4 elements, the algorithm would perform 3
iterations. Here is how it would process L=[4,2,1,3]:
i=1, j=3 ⟶ L=[1,2,4,3]
i=2, j=2 ⟶ L=[1,2,4,3]
i=3, j=4 ⟶ L=[1,2,3,4]
The most expensive part of executing the algorithm on our architecture is
the Reverse operation. Therefore, our measure for the cost of each
iteration is simply the length of the sublist passed to Reverse, that
is, the value j−i+1. The cost of the whole algorithm is the sum of the
costs of each iteration.
In the example above, the iterations cost 3, 1, and 2, in that order,
for a total of 6.
Given the initial list, compute the cost of executing Reversort on it.
Input
The first line of the input gives the number of test cases, T. T
test cases follow. Each test case consists of 2 lines. The first line
contains a single integer N, representing the number of elements in
the input list. The second line contains N distinct integers L1, L2,
..., LN, representing the elements of the input list L, in order.
Output
For each test case, output one line containing Case #x: y,
where x is the test case number (starting from 1) and y is the total
cost of executing Reversort on the list given as input.
Limits
Time limit: 10 seconds.
Memory limit: 1 GB.
Test Set 1 (Visible Verdict)
1≤T≤100.
2≤N≤100.
1≤Li≤N, for all i.
Li≠Lj, for all i≠j.
Sample
Sample Input
3
4
4 2 1 3
2
1 2
7
7 6 5 4 3 2 1
Sample Output
Case #1: 6
Case #2: 1
Case #3: 12
Sample Case #1 is described in the statement above.
In Sample Case #2, there is a single iteration, in which Reverse is
applied to a sublist of size 1. Therefore, the total cost is 1.
In Sample Case #3, the first iteration reverses the full list, for a
cost of 7. After that, the list is already sorted, but there are 5
more iterations, each of which contributes a cost of 1.
def Reversort(L):
    sort = 0
    for i in range(len(L)-1):
        small = L[i]
        x = L[i]
        y = L[i]
        for j in range(i, len(L)):
            if L[j] < small :
                small = L[j]
        sort = sort + (L.index(small) - L.index(y) + 1)
        L[L.index(small)] = x
        L[L.index(y)] = small
        print(L) #For debugging purpose
    return sort

T = int(input())
for i in range(T):
    N = int(input())
    L = list(map(int, input().rstrip().split()))
    s = Reversort(L)
    print(f"Case #{i+1}: {s}")
Your code fails for the test case 7 6 5 4 3 2 1. The code gives the answer as 18, whereas the answer should be 12.
You swap the two elements, but you have forgotten to reverse the list between i and j.
The algorithm says:
During the i-th iteration, the process reverses the sublist going from the i-th position to the current position of the i-th minimum element.
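A minimal sketch of the cost computation with the reversal step in place might look like this. The function name `reversort_cost` is an illustrative choice, and indices are 0-based here, whereas the problem statement counts positions from 1.

```python
def reversort_cost(values):
    """Return the total Reversort cost for a list of distinct integers."""
    L = list(values)                      # copy, so the input is not modified
    cost = 0
    for i in range(len(L) - 1):
        # position of the minimum element in L[i:]
        j = L.index(min(L[i:]), i)
        # reverse the sublist from i to j inclusive
        L[i:j + 1] = L[i:j + 1][::-1]
        cost += j - i + 1
    return cost


if __name__ == "__main__":
    for sample in ([4, 2, 1, 3], [1, 2], [7, 6, 5, 4, 3, 2, 1]):
        print(sample, reversort_cost(sample))
```

On the three sample cases from the statement this yields 6, 1 and 12.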

how to find combinations of numbers in python list that sum up to a given number in n combinations?

Question I am attempting
Function Description
Complete the bonetrousle function in the editor below. It should return an array of integers.
bonetrousle has the following parameter(s):
n: the integer number of sticks to buy
k: the integer number of box sizes the store carries
b: the integer number of boxes to buy
Edit:
n is the total number of spaghetti sticks, summed over all the boxes bought by Papyrus.
b is the number of boxes that Papyrus will purchase; the spaghetti in them must add up to n.
k is the number of boxes in the store, each containing an increasing number of spaghetti sticks. For instance, with 8 boxes, box 1 holds 1 stick, box 2 holds 2 sticks, and so on up to box 8.
If what Papyrus requires cannot be met from the boxes in stock, then you return -1.
I hope this clarifies the question some more.
This question can also be better understood via HackerRank:
https://www.hackerrank.com/challenges/bonetrousle/problem
If there is a solution, print a single line of distinct space-separated integers where each integer denotes the numbers of noodles in each box that Papyrus must purchase.
If there are multiple possible solutions, you can print any one of them. Do not print any leading or trailing spaces or extra newlines.
Sample Input
4
12 8 3
10 3 3
9 10 2
9 10 2
Sample Output
2 3 7
-1
5 4
1 8
This is my solution
It should work on the first test case, but I know I haven't put b (the number of boxes the skeleton wants to buy) to good use, because I am confused about how to go about it.
import random

def bonetrousle(n, k, b):
    # list the range for the store boxes {done}
    # find if the number of boxes required by skeleton are accessible from the store
    # find a combination that gives me the number of spaghetti required by skeleton
    inventory=[item for item in range(1,k+1)]
    list_of_outcome=[]
    # if sum of the items in the store is greater than the number of
    # individual spaghetti sticks wanted, this means it is possible to get
    # what skeleton came looking for
    if sum(inventory)>n:
        if b==2:
            for i in inventory:
                for j in inventory:
                    if i+j==n:
                        list_of_outcome.append("{}{}".format(
                            i,j
                        ))
        else:
            for i in inventory:
                for j in inventory:
                    for k in inventory:
                        if i+j+k==n:
                            list_of_outcome.append("{}{}{}".format(
                                i,j,k
                            ))
    else:
        return ("-1")
    if list_of_outcome!=[]:
        return ((random.choice(list_of_outcome)).strip())
Compiler Message
Wrong Answer
Input (stdin)
4
12 8 3
10 3 3
9 10 2
9 10 2
Your Output (stdout)
3 5 4
- 1
8 1
8 1
Expected Output
2 3 7
-1
5 4
1 8
So, I really need your help with automating the for loops or any alternative solution because I know many for loops consume so much space and time.
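One possible way to avoid nested loops entirely is a greedy construction. This is only a sketch and does not come from the thread. It assumes that the smallest reachable total is 1 + 2 + ... + b, that the largest is the sum of the b biggest boxes, and that every total in between can be reached by raising boxes one at a time, starting with the largest. Returning `[-1]` for the impossible case is an illustrative convention.

```python
def bonetrousle(n, k, b):
    """Pick b distinct box sizes from 1..k that sum to n, or return [-1]."""
    lowest = b * (b + 1) // 2              # 1 + 2 + ... + b
    highest = b * (2 * k - b + 1) // 2     # k + (k-1) + ... + (k-b+1)
    if n < lowest or n > highest:
        return [-1]

    boxes = list(range(1, b + 1))          # start from the smallest choice
    remaining = n - lowest
    # raise boxes from the largest down; index i may rise to k - (b - 1 - i)
    for i in range(b - 1, -1, -1):
        cap = k - (b - 1 - i)
        step = min(remaining, cap - boxes[i])
        boxes[i] += step
        remaining -= step
        if remaining == 0:
            break
    return boxes


if __name__ == "__main__":
    for n, k, b in [(12, 8, 3), (10, 3, 3), (9, 10, 2)]:
        print(*bonetrousle(n, k, b))
```

Each box is touched at most once, so the work grows with b rather than with k raised to the power b. Any valid combination is accepted by the problem, so the output need not match the sample output exactly.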

Order bias in wrong implementation of Fisher Yates Shuffle

I implemented the shuffling algorithm as:
import random
a = list(range(1, n+1))  # a contains the elements 1 to n (n is assumed to be defined)
for i in range(n):
    j = random.randint(0, n-1)
    a[i], a[j] = a[j], a[i]
This algorithm is biased. I just wanted to know whether, for any n (n ≤ 17), it is possible to find which permutation has the highest probability of occurring and which permutation has the lowest probability, out of all n! possible permutations. If yes, then what are those permutations?
For example n=3:
a = [1,2,3]
There are 3^3 = 27 possible shuffle sequences
Number of occurrences of each permutation:
1 2 3 = 4
3 1 2 = 4
3 2 1 = 4
1 3 2 = 5
2 1 3 = 5
2 3 1 = 5
P.S. I am not so good with maths.
This is not a proof by any means, but you can quickly come up with the distribution of placement probabilities by running the biased algorithm a million times. It will look like this picture from Wikipedia:
An unbiased distribution would have 14.3% in every field.
To get the most likely distribution, I think it's safe to just pick the highest percentage for each index. This means it's most likely that the entire array is moved down by one and the first element will become the last.
Edit: I ran some simulations and this result is most likely wrong. I'll leave this answer up until I can come up with something better.
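For small n, the exact counts can be obtained without simulation by enumerating every sequence of random choices the biased loop could make. The sketch below does that; the function name `naive_shuffle_counts` is illustrative. It visits n^n sequences, so it is only practical for small n and does not by itself settle the question for n up to 17.

```python
from collections import Counter
from itertools import product


def naive_shuffle_counts(n):
    """Count how often each permutation results from the biased shuffle."""
    counts = Counter()
    # each tuple holds the value of j chosen at steps i = 0 .. n-1
    for choices in product(range(n), repeat=n):
        a = list(range(1, n + 1))
        for i, j in enumerate(choices):
            a[i], a[j] = a[j], a[i]
        counts[tuple(a)] += 1
    return counts


if __name__ == "__main__":
    counts = naive_shuffle_counts(3)
    for perm, count in sorted(counts.items()):
        print(perm, count)
```

For n = 3 this reproduces the counts listed in the question, and `counts.most_common()` then shows the most and least likely permutations for any n small enough to enumerate.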

Python Find Largest number in past k items

Given an array of integers and an integer value K, my task is to write a function that prints to the standard output, for each value in the array, the highest number among that value and the entries before it, looking at no more than the K most recent entries in total.
Example Input:
tps: 6, 9, 4, 7, 4, 1
k: 3
Example Output:
6
9
9
9
7
7
I have been told that the code I have written could be made much more efficient for large data sets. How can I make this code most efficient?
def tweets_per_second(tps, k):
    past = []
    for t in tps:
        past.append(t)
        # keep only the k most recent entries
        if len(past) > k:
            past = past[-k:]
        print(max(past))
You can achieve linear time complexity using a monotonic queue (O(n) for any value of K). The idea is the following:
Let's maintain a deque of pairs (value, position). Initially, it is empty.
When a new element arrives at position i, do the following: while the position of the front element is out of range (i - K or less), pop it from the front. While the value of the back element is less than the new one, pop it from the back. Finally, push the pair (current element, its position) to the back of the deque.
The answer for the current position is the value of the front element of the deque.
Each element is added to the deque only once and removed at most once. Thus, the time complexity is linear and does not depend on K. This solution is optimal because just reading the input is O(n).
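The monotonic queue described above could be sketched with `collections.deque` as follows. The function name `sliding_window_max` is illustrative, it returns the maxima as a list instead of printing them, and it assumes the window covers the K most recent entries including the current one, as in the example output.

```python
from collections import deque


def sliding_window_max(values, k):
    """Return, for each position, the max of the last k values seen."""
    window = deque()      # (value, position) pairs, values decreasing front to back
    result = []
    for i, v in enumerate(values):
        # drop the front element once it falls outside the window
        while window and window[0][1] <= i - k:
            window.popleft()
        # drop back elements that can never be the maximum again
        while window and window[-1][0] <= v:
            window.pop()
        window.append((v, i))
        result.append(window[0][0])
    return result


if __name__ == "__main__":
    for m in sliding_window_max([6, 9, 4, 7, 4, 1], 3):
        print(m)
```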
Try using a heap to reduce the cost of the max operation from O(K) to O(log K) time.
First push -tps[i]* for i in range(0, k), and output -heap[0] each time.
For each of the next N-k numbers, push -tps[i] onto the heap, remove -tps[i-k], and print -heap[0].
Overall you get an O(N log K) algorithm, while what you use now is O(N*K). This will be very helpful if K is not small.
*Since the heap implementation keeps min(heap) at heap[0] as an invariant, if you push -value then -heap[0] will be the max you want.
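Python's `heapq` module has no direct operation for removing an arbitrary element, so one way to sketch the heap idea is lazy deletion: store (-value, position) pairs and discard stale entries only when they reach the top. This is a variation on the description above, not the answerer's exact procedure, and the name `sliding_window_max_heap` is illustrative. With lazy deletion the heap can hold more than K entries, so the worst case is O(N log N) rather than O(N log K).

```python
import heapq


def sliding_window_max_heap(values, k):
    """Return, for each position, the max of the last k values seen."""
    heap = []             # (-value, position); the smallest tuple is the largest value
    result = []
    for i, v in enumerate(values):
        heapq.heappush(heap, (-v, i))
        # discard entries that have left the window before reading the top
        while heap[0][1] <= i - k:
            heapq.heappop(heap)
        result.append(-heap[0][0])
    return result


if __name__ == "__main__":
    print(sliding_window_max_heap([6, 9, 4, 7, 4, 1], 3))
```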
pandas can do this pretty well:
import pandas as pd
df = pd.DataFrame(dict(data=[6, 9, 4, 7, 4, 1]))
# Older pandas releases exposed these as pd.expanding_max and pd.rolling_max.
df['running_max'] = df['data'].expanding().max().astype(int)
df['rolling_max'] = df['data'].rolling(3, min_periods=1).max().astype(int)
print(df)
   data  running_max  rolling_max
0     6            6            6
1     9            9            9
2     4            9            9
3     7            9            9
4     4            9            7
5     1            9            7

How can you compute percentiles and ranks with a generator on a single pass?

Building off an earlier question: Computing stats on generators in single pass. Python
As I mentioned before, computing statistics from a generator in a single pass is extremely fast and memory efficient. Complex statistics and rank attributes like the 90th percentile and the nth smallest often need more complex work than standard deviation and averages (solved in the above). These approaches become very important when working with map/reduce jobs and large datasets, where putting the data into a list or making multiple passes becomes very slow.
The following is an O(n) quicksort-style algorithm for looking up data based on rank order. It is useful for finding medians, percentiles, quartiles, and deciles, and is equivalent to data[n] when the data is already sorted. But it needs all the data in a list that can be split/pivoted.
How can you compute medians, percentiles, quartiles, and deciles with a generator on a single pass?
The Quicksort style algorithm that needs a complete list
import random

def select(data, n):
    "Find the nth rank ordered element (the least value has rank 0)."
    data = list(data)
    if not 0 <= n < len(data):
        raise ValueError('not enough elements for the given rank')
    while True:
        pivot = random.choice(data)
        pcount = 0
        under, over = [], []
        uappend, oappend = under.append, over.append
        for elem in data:
            if elem < pivot:
                uappend(elem)
            elif elem > pivot:
                oappend(elem)
            else:
                pcount += 1
        if n < len(under):
            data = under
        elif n < len(under) + pcount:
            return pivot
        else:
            data = over
            n -= len(under) + pcount
You will need to store large parts of the data, up to the point where it may just pay off to store it completely, unless you are willing to accept an approximate algorithm (which may be very reasonable when you know your data is independent).
Consider that you need to find the median of the following data set:
0 1 2 3 4 5 6 7 8 9 -1 -2 -3 -4 -5 -6 -7 -8 -9
The median is obviously 0. However, if you have seen only the first 10 elements, 0 is your worst guess at that time! So in order to find the median of an n-element stream, you need to keep at least n/2 candidate elements in memory. And if you do not know the total size n, you need to keep them all!
Here are the medians for every odd-sized prefix:
0 _ 1 _ 2 _ 3 _ 4 _ 4 _ 3 _ 2 _ 1 _ 0
While they were never candidates, you also need to remember the elements 5 to 9, because the series
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
yields the median 9. For every element in a series of size n, I can find a continued series of size O(2*n) that has this element as median. But obviously, these series are not random / independent.
See "On-line" (iterator) algorithms for estimating statistical median, mode, skewness, kurtosis? for an overview of related methods.
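If an approximate answer is acceptable, one common single-pass option is to keep a fixed-size uniform random sample of the stream (reservoir sampling) and read the percentile off the sorted sample. The sketch below is an illustration of that idea, not a method proposed in the thread. The name `approx_percentile`, its parameters and the default sample size are all assumptions, and the estimate is only as good as the sample is large.

```python
import random


def approx_percentile(stream, fraction, sample_size=1000, seed=None):
    """Estimate the given percentile (fraction in [0, 1]) in one pass."""
    rng = random.Random(seed)
    reservoir = []
    for count, item in enumerate(stream, start=1):
        if len(reservoir) < sample_size:
            reservoir.append(item)
        else:
            # keep the new item with probability sample_size / count
            slot = rng.randrange(count)
            if slot < sample_size:
                reservoir[slot] = item
    if not reservoir:
        raise ValueError("cannot take a percentile of an empty stream")
    reservoir.sort()
    index = min(int(fraction * len(reservoir)), len(reservoir) - 1)
    return reservoir[index]


if __name__ == "__main__":
    gen = (x * x for x in range(100000))
    print(approx_percentile(gen, 0.9, sample_size=2000, seed=1))
```

When the stream is shorter than `sample_size`, the reservoir holds every element and the result is exact. Otherwise memory stays bounded by `sample_size` regardless of the stream length.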
