Given an array of integers and an integer value K, my task is to write a function that prints to standard output, for each entry of the array, the maximum of the last K entries seen so far (that entry and the up to K−1 entries before it).
Example Input:
tps: 6, 9, 4, 7, 4, 1
k: 3
Example Output:
6
9
9
9
7
7
I have been told that the code I have written could be made much more efficient for large data sets. How can I make this code most efficient?
def tweets_per_second(tps, k):
    past = []
    for t in tps:
        past.append(t)
        if len(past) > k:
            past = past[-k:]
        print(max(past))
You can achieve linear time complexity using a monotonic queue (O(n) for any value of k). The idea is the following:
Let's maintain a deque of pairs (value, position). Initially, it is empty.
When a new element arrives, do the following: while the position of the front element is out of the window (position ≤ i − K), pop it from the front. While the value of the back element is less than the new one, pop it from the back. Finally, push the pair (current element, its position) onto the back of the deque.
The answer for the current position is the front element of the deque.
Each element is added to the deque only once and removed at most once. Thus, the time complexity is linear and it does not depend on K. This solution is optimal because just reading the input is O(n).
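The deque idea above can be sketched in Python with collections.deque (a sketch; it returns the per-position maxima instead of printing them, so the result is easy to check):

```python
from collections import deque

def sliding_max(tps, k):
    """Maximum of the last k entries, up to and including the current one."""
    window = deque()  # pairs (value, position), values decreasing front to back
    result = []
    for i, t in enumerate(tps):
        # Drop front entries that fell out of the window of the last k values.
        while window and window[0][1] <= i - k:
            window.popleft()
        # Drop back entries not larger than the new value; they can never be the max again.
        while window and window[-1][0] <= t:
            window.pop()
        window.append((t, i))
        result.append(window[0][0])  # front of the deque is the window maximum
    return result
```

For the example input, sliding_max([6, 9, 4, 7, 4, 1], 3) gives [6, 9, 9, 9, 7, 7], matching the expected output.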
Try using a heap to reduce the complexity of the max operation from O(K) to O(log K) time.
First add (-tps[i]) for i in range(0, k), outputting (-heap[0]) each time*.
For the next N−k numbers, add (-tps[i]) to the heap, remove (-tps[i-k]), and print (-heap[0]).
Overall you get an O(N log K) algorithm, while what you use now is O(N·K). This will be very helpful if K is not small.
*Since the heap implementation keeps min(heap) at heap[0] as an invariant, if you add -value then -heap[0] will be max(heap), as you want.
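One caveat: heapq has no operation for removing an arbitrary element, so a common way to realize the removal step (a sketch, not the only way) is lazy deletion: push (value, index) pairs and discard stale entries only when they surface at the top. This keeps the heap-based idea, but the heap can temporarily hold more than K entries:

```python
import heapq

def sliding_max_heap(tps, k):
    heap = []    # entries (-value, index); negation turns heapq's min-heap into a max-heap
    result = []
    for i, t in enumerate(tps):
        heapq.heappush(heap, (-t, i))
        # Lazily discard top entries whose index fell out of the window.
        while heap[0][1] <= i - k:
            heapq.heappop(heap)
        result.append(-heap[0][0])
    return result
```

Here sliding_max_heap([6, 9, 4, 7, 4, 1], 3) gives [6, 9, 9, 9, 7, 7]. With lazy deletion the bound is O(N log N) in the worst case rather than a strict O(N log K), but it is still far better than O(N·K) for large K.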
pandas can do this pretty well:
import pandas as pd

df = pd.DataFrame(dict(data=[6, 9, 4, 7, 4, 1]))
df['running_max'] = df.data.expanding().max().astype(int)
df['rolling_max'] = df.data.rolling(3, min_periods=1).max().astype(int)
print(df)
data running_max rolling_max
0 6 6 6
1 9 9 9
2 4 9 9
3 7 9 9
4 4 9 7
5 1 9 7
Note: The main parts of the statements of the problems "Reversort" and
"Reversort Engineering" are identical, except for the last paragraph.
The problems can otherwise be solved independently.
Reversort is an algorithm to sort a list of distinct integers in
increasing order. The algorithm is based on the "Reverse" operation.
Each application of this operation reverses the order of some
contiguous part of the list.
After i−1 iterations, the positions 1,2,…,i−1 of the list contain the
i−1 smallest elements of L, in increasing order. During the i-th
iteration, the process reverses the sublist going from the i-th
position to the current position of the i-th minimum element. That
makes the i-th minimum element end up in the i-th position.
For example, for a list with 4 elements, the algorithm would perform 3
iterations. Here is how it would process L=[4,2,1,3]:
i=1, j=3 ⟶ L=[1,2,4,3]
i=2, j=2 ⟶ L=[1,2,4,3]
i=3, j=4 ⟶ L=[1,2,3,4]
The
most expensive part of executing the algorithm on our architecture is
the Reverse operation. Therefore, our measure for the cost of each
iteration is simply the length of the sublist passed to Reverse, that
is, the value j−i+1. The cost of the whole algorithm is the sum of the
costs of each iteration.
In the example above, the iterations cost 3, 1, and 2, in that order,
for a total of 6.
Given the initial list, compute the cost of executing Reversort on it.
Input
The first line of the input gives the number of test cases, T. T test cases follow. Each test case consists of 2 lines. The first line contains a single integer N, representing the number of elements in the input list. The second line contains N distinct integers L1, L2, ..., LN, representing the elements of the input list L, in order.
Output
For each test case, output one line containing Case #x: y, where x is the test case number (starting from 1) and y is the total cost of executing Reversort on the list given as input.
Limits
Time limit: 10 seconds. Memory limit: 1 GB.
Test Set 1 (Visible Verdict)
1 ≤ T ≤ 100. 2 ≤ N ≤ 100. 1 ≤ Li ≤ N, for all i. Li ≠ Lj, for all i ≠ j.
Sample Input
3
4
4 2 1 3
2
1 2
7
7 6 5 4 3 2 1
Sample Output
Case #1: 6
Case #2: 1
Case #3: 12
Sample Case #1 is described in the statement above.
In Sample Case #2, there is a single iteration, in which Reverse is
applied to a sublist of size 1. Therefore, the total cost is 1.
In Sample Case #3, the first iteration reverses the full list, for a
cost of 7. After that, the list is already sorted, but there are 5
more iterations, each of which contributes a cost of 1.
def Reversort(L):
    sort = 0
    for i in range(len(L)-1):
        small = L[i]
        x = L[i]
        y = L[i]
        for j in range(i, len(L)):
            if L[j] < small:
                small = L[j]
        sort = sort + (L.index(small) - L.index(y) + 1)
        L[L.index(small)] = x
        L[L.index(y)] = small
        print(L)  # For debugging purposes
    return sort

T = int(input())
for i in range(T):
    N = int(input())
    L = list(map(int, input().rstrip().split()))
    s = Reversort(L)
    print(f"Case #{i+1}: {s}")
Your code fails for the test case 7 6 5 4 3 2 1. The code gives the answer as 18 whereas the answer should be 12.
You have forgotten to reverse the list between i and j.
The algorithm says:
During the i-th iteration, the process reverses the sublist going from the i-th position to the current position of the i-th minimum element.
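Putting that fix in, a minimal corrected version (a sketch; it reverses the sublist in place instead of swapping) could look like this:

```python
def reversort_cost(L):
    """Run Reversort on L in place and return the total cost."""
    cost = 0
    for i in range(len(L) - 1):
        j = L.index(min(L[i:]), i)       # position of the i-th minimum
        cost += j - i + 1                # cost of this iteration is the sublist length
        L[i:j + 1] = L[i:j + 1][::-1]    # reverse the sublist from i to j
    return cost
```

For the sample cases this yields 6 for [4, 2, 1, 3], 1 for [1, 2], and 12 for [7, 6, 5, 4, 3, 2, 1].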
I'm trying to get some guidance on how to implement this algorithm in python. I'm currently trying to solve adventofcode.com/2017/day/10.
I understand the logic of what needs to be done, but where I'm stuck is implementing this in code specifically taking a selection/slice of a list, reversing it, then "putting it back" into said list.
The example they give is:
Suppose we instead only had a circular list containing five elements, [0, 1, 2, 3, 4] and were given input lengths of [3, 4, 1, 5].
The list begins as [0] 1 2 3 4 (where square brackets indicate the current position).
The first length, 3, selects [0] 1 2) 3 4 (where parentheses indicate the sublist to be reversed).
After reversing that section (0 1 2 into 2 1 0), we get ([2] 1 0) 3 4.
Then, the current position moves forward by the length, 3, plus the skip size, 0: 2 1 0 [3] 4. Finally, the skip size increases to 1.
The second length, 4, selects a section which wraps: 2 1) 0 ([3] 4.
The sublist 3 4 2 1 is reversed to form 1 2 4 3: 4 3) 0 ([1] 2.
The current position moves forward by the length plus the skip size, a total of 5, causing it not to move because it wraps around: 4 3 0 [1] 2. The skip size increases to 2.
I know I will use a for loop for each length, then based on the current "cursor" position, create a slice that starts at the cursor position, and contains the amount of elements that matches the current iteration (in the second example it would be 4). I feel like this needs to be solved using the %, list slicing, and the length of the list but can't figure it out. Any guidance or help would be much appreciated even if it's resources to learn how to solve this versus just the answer. But I'll take anything. Thanks friends.
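One way to take a wrapping slice, reverse it, and put it back is to work with modulo indices rather than Python slices. A sketch (the helper name is mine; the cursor and skip-size bookkeeping, (pos + length + skip) % n, stays outside it):

```python
def reverse_section(lst, start, length):
    """Reverse `length` elements of the circular list `lst`, beginning at `start`."""
    n = len(lst)
    idx = [(start + i) % n for i in range(length)]  # absolute positions, wrapping around
    vals = [lst[i] for i in idx][::-1]              # the selected section, reversed
    for i, v in zip(idx, vals):
        lst[i] = v
    return lst
```

Walking through the example: reverse_section([0, 1, 2, 3, 4], 0, 3) gives [2, 1, 0, 3, 4], and applying the second length with the cursor at 3, reverse_section([2, 1, 0, 3, 4], 3, 4) gives [4, 3, 0, 1, 2], matching the worked example.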
I am writing this algorithm for a sort. I fail to see how it is different from insertion sort. I was wondering if someone can help me understand the difference. The current sort is written as insertion because I don't see the difference yet. This is homework, so I don't want an answer I want to understand the difference. The algorithm is here
def file_open(perkList, fileName):
    with open(fileName, 'r') as f:
        for line in f.readlines():
            perkList.append(int(line))

def perkSort(perkList):
    for marker in range(len(perkList)):
        save = perkList[marker]
        i = marker
        while i > 0 and perkList[i-1] > save:
            perkList[i] = perkList[i-1]
            i = i - 1
        perkList[i] = save
    print("New list", perkList)

def main():
    perkList = []
    file_open(perkList, 'integers')
    file_open(perkList, 'integers2')
    print("initial list", perkList)
    perkSort(perkList)

main()
Apologies that this question is not that clean. Edits are appreciated.
PerkSort, the algorithm mentioned in your homework, is essentially the Bubble sort algorithm. What you have implemented is the Insertion sort algorithm. The difference is as follows:
Insertion Sort
It works by inserting each element of the input list into its correct position within the part of the list that is already sorted. That is, it builds the sorted array one item at a time.
## Unsorted List ##
7 6 1 3 2
# First Pass
7 6 1 3 2
# Second Pass
6 7 1 3 2
# Third Pass
1 6 7 3 2
# Fourth Pass
1 3 6 7 2
# Fifth Pass
1 2 3 6 7
Note that after i iterations the first i elements are in order.
You do at most i comparisons in the i-th step.
Pseudocode:
for i ← 1 to length(A)
    j ← i
    while j > 0 and A[j-1] > A[j]
        swap A[j] and A[j-1]
        j ← j - 1
This is what you did in your Python implementation.
Some complexity analysis:
Worst case performance: O(n²)
Best case performance: O(n)
Average case performance: O(n²)
Worst case space complexity: O(n) total, O(1) auxiliary
Bubble Sort
This is PerkSort Algorithm given to implement in your homework.
It works by repeatedly scanning through the list to be sorted, comparing pairs of adjacent elements and swapping them if required.
## Unsorted List ##
7 6 1 3 2
# First Pass
6 1 3 2 7
# Second Pass
1 3 2 6 7
# Third Pass
1 2 3 6 7
# Fourth Pass
1 2 3 6 7
# No Fifth Pass as there were no swaps in Fourth Pass
Note that after i iterations the last i elements are the biggest, and ordered.
You do at most n−i−1 comparisons in the i-th pass.
I am not giving pseudocode here, as this is your homework assignment.
Hint: You will move forward from marker, shifting the larger elements towards the end, just like bubbles rising.
Some complexity analysis:
Worst case performance: O(n²)
Best case performance: O(n)
Average case performance: O(n²)
Worst case space complexity: O(n) total, O(1) auxiliary
Similarities
Both have same worst case, average case and best case time
complexities
Both have same space complexities
Both are in-place algorithms (i.e. they change the original data)
Both are comparison sorts
Differences (apart from the algorithm itself, of course)
Even though both algorithms have the same time and space complexities on average, in practice Insertion sort is better than Bubble sort. This is because, on average, Bubble sort needs more swaps than Insertion sort. Insertion sort also performs better on a list with a small number of inversions.
The program that you have written does implement insertion sort.
Let's take an example and see what your program would do. For input 5 8 2 7:
After first iteration
5 8 2 7
After second iteration
2 5 8 7
After third iteration
2 5 7 8
But the algorithm that is given in your link works differently. It takes the largest element and puts it in the end. For our example
After first iteration
5 2 7 8
After second iteration
2 5 7 8
I have a triangle with two-hundred rows, where I have to find the maximum distance to get from the top to the bottom of the triangle.
5
9 8
5 4 6
9 7 3 4
Here, the shortest distance would be 5+8+4+3=20. The maximum distance would be 5+9+5+9=28.
I have a good idea of the algorithm I want to implement but I am struggling to turn it into code.
My plan is: start at the 2nd to last row, add the maximum of the possible paths from the bottom row, and iterate to the top.
For instance, the above triangle would turn into:
28
23 19
14 11 10
9 7 3 4
This is vastly more efficient than brute-forcing, but I have two general questions:
1. Using brute-force, how do I list all the possible paths from top to bottom (you can only move to adjacent points)? I tried using this (triangle is the list of lists containing the triangle):
points = list(itertools.product(*triangle))
but this contains all possible combinations from each row, not just adjacent members.
Project Euler #18 - how to brute force all possible paths in tree-like structure using Python?
This somewhat explains a possible approach, but I'd like to use itertools and any other modules (as pythonic as possible).
2. How would I go about iterating the strategy of adding each maximum from the previous row and iterating to the top? I know I have to implement a nested loop:
for x in triangle:
    for i in x:
        i += ?  # <- Not sure if this would even increment it
edit:
what I was thinking was:
triangle[y][x] = max([triangle[y+1][x],triangle[y+1][x+1]])
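That recurrence can be run bottom-up with a plain loop over the rows (a sketch; it keeps a running row of best distances instead of modifying the triangle):

```python
def max_path(triangle):
    """Maximum top-to-bottom distance in a triangle given as a list of lists."""
    best = list(triangle[-1])            # best distances starting from the bottom row
    for row in reversed(triangle[:-1]):  # fold each row into the one below it
        best = [x + max(l, r) for x, l, r in zip(row, best, best[1:])]
    return best[0]
```

For the example triangle [[5], [9, 8], [5, 4, 6], [9, 7, 3, 4]] this returns 28.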
It does not use itertools and it is recursive, but I memoize the results, so it's still fast...
def memoize(function):
    memo = {}
    def wrapper(*args):
        if args in memo:
            return memo[args]
        else:
            rv = function(*args)
            memo[args] = rv
            return rv
    return wrapper

@memoize
def getmaxofsub(x, y):
    if y == len(triangle) or x > y: return 0
    # print(x, y)
    return triangle[y][x] + max(getmaxofsub(x, y+1), getmaxofsub(x+1, y+1))

getmaxofsub(0, 0)
I read your algorithm suggestion a few more times; your "cumulative triangle" is stored in `memo` of the memoize decorator, so in the end it is very similar. If you want to prevent a big call stack during the recursive "down calling" through the triangle, you can fill the memoize cache by calling getmaxofsub() bottom-up:
for i in reversed(range(len(triangle))):
    getmaxofsub(0, i), getmaxofsub(i//2, i), getmaxofsub(i, i)
print(getmaxofsub(0, 0))
Edit
getmaxofsub: How does this function work? First you have to know that you can divide your triangle into sub-triangles. I take your triangle as an example:
5
9 8
5 4 6
9 7 3 4
That's the complete one. The "coordinates" of the peak are x=0, y=0.
Now I extract the sub triangle of the peak x=0, y=1:
9
5 4
9 7 3
or x=1, y=2
4
7 3
So this is how my algorithm works: The peak of the whole triangle (x=0, y=0) asks its sub triangles (x=0, y=1) and (x=1, y=1), "What is your maximum distance to the ground?" And each of them will ask their sub-triangles and so on…
This goes on until the function reaches the ground (y == len(triangle)): the ground entries want to ask their sub-triangles, but since there are none, they get the answer 0.
After each triangle has asked its sub-triangles, it decides which one is the greater, adds its own value, and returns this sum.
So now you see the principle of this algorithm. Such algorithms are called recursive algorithms. A function calling itself is pretty standard… and it works…
If you think about this whole algorithm, you will see that a lot of sub-triangles are called several times, and they in turn ask their sub-triangles, and so on… But each time they return the same value. That is why I used the memoize decorator: if the function is called with the same arguments x and y, the decorator returns the previously calculated value for those arguments and avoids the time-consuming recalculation… It is a simple cache…
That is why this function is as easy to implement as a recursive algorithm and as fast as an iterative one...
To answer your first question (how to brute-force iterate over all paths): If you start at the top of the triangle and move down along some random path, you have to make a decision to go left or right for every level that you go down. The number of different paths is thus 2^(nrows-1). For your problem with 200 rows, there are thus about 8e59 different paths, way too many to check in a brute-force way.
For a small triangle, you can still iterate over all possible paths in a brute-force way, for example like this:
In [10]: from itertools import product
In [11]: triangle = [[5], [9,8], [5,4,6], [9,7,3,4]]
In [12]: for decisions in product((0,1), repeat=len(triangle)-1):
    ...:     pos = 0
    ...:     path = [triangle[0][0]]
    ...:     for lr, row in zip(decisions, triangle[1:]):
    ...:         pos += lr  # cumulative sum of left-right decisions
    ...:         path.append(row[pos])
    ...:     print(path)
[5, 9, 5, 9]
[5, 9, 5, 7]
[5, 9, 4, 7]
[5, 9, 4, 3]
[5, 8, 4, 7]
[5, 8, 4, 3]
[5, 8, 6, 3]
[5, 8, 6, 4]
The way this works is to use itertools.product to iterate over all possible combinations of nrows-1 left/right decisions, where a 0 means go left and a 1 means go right (so you are more or less generating the bits of all binary numbers up to 2^(nrows-1)). If you store the triangle as a list of lists, going left means staying at the same index in the next row, while going right means adding 1. To keep track of the position in the row, you thus simply calculate the cumulative sum of all left/right decisions.
To answer your second question: First of all, your algorithm seems pretty good, you only need to iterate once backwards over all rows and you do not have the exponential number of cases to check as in the brute-force solution. The only thing I would add to that is to build a new triangle, which indicates at every step whether the maximum was found to the left or to the right. This is useful to reconstruct the optimal path afterwards. All this can be implemented like this:
mx = triangle[-1]  # maximum distances so far, start with last row
directions = []    # upside-down triangle with left/right direction towards max
for row in reversed(triangle[:-1]):  # iterate from penultimate row backwards
    directions.append([l < r for l, r in zip(mx[:-1], mx[1:])])
    mx = [x + max(l, r) for x, l, r in zip(row, mx[:-1], mx[1:])]
    print('Maximum so far:', mx)
print('The maximum distance is', mx[0])

directions.reverse()
pos = 0
path = [triangle[0][0]]
for direction, row in zip(directions, triangle[1:]):
    pos += direction[pos]
    path.append(row[pos])
print('The optimal path is', path)
As before, I used the trick that False = 0 and True = 1 to indicate going left and right. Using the same triangle as before, the result:
Maximum so far: [14, 11, 10]
Maximum so far: [23, 19]
Maximum so far: [28]
The maximum distance is 28
The optimal path is [5, 9, 5, 9]
I have already solved the problem using mergesort; now I am wondering whether it is possible to compute the inversion count using quicksort. I have also coded quicksort, but I don't know how to do the counting. Here is my mergesort code:
def Merge_and_Count(AL, AR):
    count = 0
    i = 0
    j = 0
    A = []
    for index in range(0, len(AL) + len(AR)):
        if i < len(AL) and j < len(AR):
            if AL[i] > AR[j]:
                A.append(AR[j])
                j = j + 1
                count = count + len(AL) - i
            else:
                A.append(AL[i])
                i = i + 1
        elif i < len(AL):
            A.append(AL[i])
            i = i + 1
        elif j < len(AR):
            A.append(AR[j])
            j = j + 1
    return (count, A)

def Sort_and_Count(Arrays):
    if len(Arrays) == 1:
        return (0, Arrays)
    list1 = Arrays[:len(Arrays) // 2]
    list2 = Arrays[len(Arrays) // 2:]
    (LN, list1) = Sort_and_Count(list1)
    (RN, list2) = Sort_and_Count(list2)
    (M, Arrays) = Merge_and_Count(list1, list2)
    return (LN + RN + M, Arrays)
Generally no, because during the partitioning, when you move a value to its correct side of the pivot, you don't know how many of the values you're moving it past are smaller than it and how many are larger. So, as soon as you do that you've lost information about the number of inversions in the original input.
I have come across this problem a few times. On the whole, I think it should still be possible to use quicksort to compute the inversion count, as long as we modify the original quicksort algorithm a little. (I have not verified this yet, sorry for that.)
Consider the array 3, 6, 2, 5, 4, 1. Suppose we use 3 as the pivot. The most-voted answer is right that the exchanges might mess up the order of the other numbers. However, we can do it differently by introducing a temporary array:
Iterate over the array for the first time. During the iteration, move all the numbers less than 3 to the temporary array. For each such number, also record how many numbers larger than 3 appear before it. In this case, the number 2 has one such number (6) before it, and the number 1 has three (6, 5, 4) before it. This can be done with simple counting.
Then we copy 3 into the temporary array.
Then we iterate over the array again and move the numbers larger than 3 into the temporary array. In the end we get 2 1 3 6 5 4.
The question is: how many inversion pairs does this process account for? The number is the sum of all the counts recorded in the first step, plus the count of numbers less than the pivot (those are the pivot's own inversions). Together these cover every inversion in which one element is >= the pivot and the other is < the pivot. We can then recursively deal with the left part and the right part.
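The steps above can be sketched as a quicksort-style count (a sketch; I merge the two passes into one stable partition and take the first element as the pivot):

```python
def count_inversions_qs(a):
    """Return (inversion count, sorted copy) using a stable quicksort partition."""
    if len(a) <= 1:
        return 0, list(a)
    pivot = a[0]
    smaller, larger = [], []
    cross = 0
    bigger_seen = 1                  # the pivot itself is already "seen"
    for x in a[1:]:
        if x < pivot:
            cross += bigger_seen     # pivot and every earlier >pivot element invert with x
            smaller.append(x)
        else:
            bigger_seen += 1
            larger.append(x)
    left, smaller = count_inversions_qs(smaller)
    right, larger = count_inversions_qs(larger)
    return cross + left + right, smaller + [pivot] + larger
```

For 3, 6, 2, 5, 4, 1 the cross count at the top level is 6 (the four counts from the first pass plus the two elements smaller than the pivot), and the total is 10. Note that, unlike in-place quicksort, this uses O(n) extra space for the stable partition, which is exactly what makes the counting possible.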