Measuring the time elapsed gives incorrect values

I am trying to measure the time elapsed (in milliseconds) for sorting an array of variable size using the mergesort algorithm, but the code gives inconsistent values. For example, with 60 elements the time is 16.407999999999998 ms, and with 70 elements it is 0.988 ms.
import datetime

import numpy as np

def mergeSort(arr):
    if len(arr) > 1:
        # Finding the mid of the array
        mid = len(arr)//2
        # Dividing the array elements into 2 halves
        L = arr[:mid]
        R = arr[mid:]
        # Sorting the first half
        mergeSort(L)
        # Sorting the second half
        mergeSort(R)
        i = j = k = 0
        # Merge the sorted halves L and R back into arr
        while i < len(L) and j < len(R):
            if L[i] < R[j]:
                arr[k] = L[i]
                i += 1
            else:
                arr[k] = R[j]
                j += 1
            k += 1
        # Checking if any element was left
        while i < len(L):
            arr[k] = L[i]
            i += 1
            k += 1
        while j < len(R):
            arr[k] = R[j]
            j += 1
            k += 1

# random number array generator
def arrGen(num):
    myArr = list(np.random.randint(0,100, size = num))
    return myArr

def printList(arr):
    for i in range(len(arr)):
        print(arr[i], end=" ")
    print()

# Driver Code
if __name__ == '__main__':
    for i in range(10,100,10):
        arr = arrGen(i)
        print(f"Arr length is {len(arr)}\n")
        print("Given array is", end="\n")
        printList(arr)
        start_time = datetime.datetime.now()
        mergeSort(arr)
        end_time = datetime.datetime.now()
        time_diff = (end_time - start_time)
        execution_time = time_diff.total_seconds() * 1000.0
        print("Sorted array is: ", end="\n")
        printList(arr)
        print(f"\nTotal time is {execution_time}")
        print("\n\n")

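Use a benchmarking library (e.g. timeit).
import timeit
elapsed_secs = timeit.timeit(
    'mergeSort(data.copy())',
    setup='data = arrGen(100)',
    globals=globals())
Note this makes a fresh copy of the unsorted data on each pass. Otherwise the input would be sorted after the first pass. Also note that timeit.timeit runs the statement 1,000,000 times by default and returns the total time in seconds, so divide by the number of runs (or pass a smaller number=...) to get the time per sort.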
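One possible explanation for the jump between 16.4 ms and 0.988 ms is that datetime.datetime.now() reads the wall clock, which on some platforms advances in coarse steps, so a single short run can land on either side of a tick. The following is a minimal sketch of an alternative, assuming time.perf_counter and repeated runs are acceptable. The names merge_sort and time_sort_ms and the repeat count of 200 are hypothetical choices for illustration, and the random module stands in for NumPy to keep the example self-contained.

```python
import random
import time


def merge_sort(arr):
    """Sort arr in place with top-down merge sort."""
    if len(arr) <= 1:
        return
    mid = len(arr) // 2
    left, right = arr[:mid], arr[mid:]
    merge_sort(left)
    merge_sort(right)
    i = j = k = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            arr[k] = left[i]
            i += 1
        else:
            arr[k] = right[j]
            j += 1
        k += 1
    # Copy whatever remains in either half (one of the two slices is empty)
    arr[k:] = left[i:] + right[j:]


def time_sort_ms(data, repeats=200):
    """Return the best per-run time in milliseconds over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        work = list(data)              # fresh unsorted copy for every run
        start = time.perf_counter()    # monotonic, high-resolution clock
        merge_sort(work)
        best = min(best, time.perf_counter() - start)
    return best * 1000.0


if __name__ == "__main__":
    random.seed(0)
    for n in range(10, 100, 10):
        data = [random.randint(0, 99) for _ in range(n)]
        print(f"n = {n:3d}: {time_sort_ms(data):.4f} ms")
```

Taking the minimum over many runs is one common way to reduce noise from other processes; the mean or median are reasonable alternatives.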

Related

Error messages while implementing mergesort related to timeit and the index range

I need to implement mergesort in Python and compare the execution time for a few lists of different lengths consisting of random numbers. More precisely, I have to determine from a plot whether the execution time is linear, quadratic, or something else. We know that mergesort has a running time of $O(N \log N)$. Unfortunately, I get error messages related to the execution time part, and I also get the following:
File "/home/myname/file.py", line 58, in merge
S[k] = S2[j]
IndexError: list assignment index out of range
Here is the code. Thanks for any suggestions.
import numpy as np
import random
import timeit
import matplotlib.pyplot as plt

def mergesort(S):
    n = S.size
    if n == 1:
        return S
    else:
        S1, S2 = split(S)
        S1sorted = mergesort(S1)
        S2sorted = mergesort(S2)
        Ssorted = merge(S1sorted, S2sorted)
        return Ssorted

def split(S):
    l = len(S)//2
    S1 = S[:l]
    S2 = S[l:]
    return S1, S2

def merge(S1, S2):
    i = 0
    j = 0
    k = 0
    S = []
    while i < len(S1) and j < len(S2):
        if S1[i] < S2[j]:
            S[k] = S1[i]
            i += 1
        else:
            S[k] = S2[j]
            j += 1
        k += 1
    while i < len(S1):
        S[k] = S1[i]
        i += 1
        k += 1
    while j < len(S2):
        S[k] = S2[j]
        j += 1
        k += 1
    return S

if __name__ == '__main__':
    random.seed(5)
    d = [np.random.rand(10**i) for i in range(10)]
    print("List of lists:\n",d)
    time_list = []
    length_list = []
    for s in d:
        execution_time = timeit.timeit(stmt = 'mergesort(s)', setup='from __main__ import mergesort,s')
        time_list.append(execution_time)
        length_list.append(len(s))
        print("The list s:\n", s)
        print("The execution time for the list",s,"is:\n", timeit.timeit(stmt = 'mergesort(s)', setup='from __main__ import mergesort,s'))
    plt.scatter(length_list, time_list)
    plt.xlabel("N")
    plt.ylabel("Execution time for a list of length N")
    plt.show()

On the line S[k] = S1[i] you are trying to assign a value to the k-th element of your list. However, S is empty at that point, so this element does not exist. Try using S.append(S1[i]) instead.
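The append-based approach can be sketched as follows. This is a minimal illustration under stated assumptions, not the asker's exact code: it works on plain Python lists instead of NumPy arrays, handles empty input with len(s) <= 1, and times each size with number=5 because timeit's default of 1,000,000 runs would be impractical for large inputs. The sizes and the repeat count are arbitrary choices for illustration.

```python
import random
import timeit


def merge(s1, s2):
    """Merge two sorted sequences into a new sorted list using append."""
    merged = []
    i = j = 0
    while i < len(s1) and j < len(s2):
        if s1[i] < s2[j]:
            merged.append(s1[i])
            i += 1
        else:
            merged.append(s2[j])
            j += 1
    # At most one of these two slices is non-empty
    merged.extend(s1[i:])
    merged.extend(s2[j:])
    return merged


def mergesort(s):
    """Return a new sorted list; the input is left unchanged."""
    if len(s) <= 1:
        return list(s)
    mid = len(s) // 2
    return merge(mergesort(s[:mid]), mergesort(s[mid:]))


if __name__ == "__main__":
    random.seed(5)
    runs = 5
    for n in (10, 100, 1000, 10000):
        data = [random.random() for _ in range(n)]
        total = timeit.timeit(lambda: mergesort(data), number=runs)
        print(f"N = {n:5d}: {total / runs:.6f} s per run")
```

Because this mergesort returns a new list and never mutates its input, the same data can be reused across timing runs without copying.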

The issue is in the function merge: you define S as an empty list (S = []) and then assign to index k further down, which means you are assigning to positions that do not exist yet.
All you need to do is give the list S a length of len(S1) + len(S2).
Below is the corrected merge function:
def merge(S1, S2):
    i = 0
    j = 0
    k = 0
    S = [None for _ in range(len(S1)+len(S2))] # <-- the change is here
    while i < len(S1) and j < len(S2):
        if S1[i] < S2[j]:
            S[k] = S1[i]
            i += 1
        else:
            S[k] = S2[j]
            j += 1
        k += 1
    while i < len(S1):
        S[k] = S1[i]
        i += 1
        k += 1
    while j < len(S2):
        S[k] = S2[j]
        j += 1
        k += 1
    return S
EDIT:
What do None and _ mean in the line below?
S = [None for _ in range(len(S1)+len(S2))]
_ is just a throwaway variable: its value is never used, it only drives the list comprehension.
None is a placeholder default value for each slot of the final sorted list S when it is initialised.
This is the same as S = [None] * n, where n = len(S1) + len(S2).

Need to optimize my mathematical py code with lists

I'm not sure I am translating the assignment correctly, but the bottom line is that I need to compute the function f(i) = min dist(i, S), where S is a set of k indices and dist(i, S) is the sum over j in S of |a[i] - a[j]| (the code below uses the absolute difference), and a is an integer array.
I wrote the following code to accomplish this task:
n, k = (map(int, input().split()))
arr = list(map(int, input().split()))
sorted_arr = arr.copy()
sorted_arr.sort()
dists = []
returned = []
ss = 0
indexed = []
pop1 = None
pop2 = None
for i in arr:
    index = sorted_arr.index(i)
    index += indexed.count(i)
    indexed.append(i)
    dists = []
    if (index == 0):
        ss = sorted_arr[1:k+1]
    elif (index == len(arr) - 1):
        sorted_arr.reverse()
        ss = sorted_arr[1:k+1]
    else:
        if index - k < 0:
            pop1 = 0
        elif index + k > n - 1:
            pop2 = None
        else:
            pop1 = index - k
            pop2 = index + k + 1
        ss = sorted_arr[pop1:index] + sorted_arr[index + 1: pop2]
    for ind in ss:
        dists.append(int(abs(i - ind)))
    dists.sort()
    returned.append(str(sum(dists[:k])))
print(" ".join(returned))
But I need to speed up its execution time significantly.

Heap Sort Algorithm number of comparisons

I'm trying to count the number of comparisons in this heap sort algorithm:
import random
import time

#HeapSort Algorithm
def heapify(arr, n, i):
    count = 0
    largest = i
    l = 2 * i + 1
    r = 2 * i + 2
    if l < n and arr[i] < arr[l]:
        largest = l
    if r < n and arr[largest] < arr[r]:
        largest = r
    if largest != i:
        count += 1
        arr[i],arr[largest] = arr[largest],arr[i]
        heapify(arr, n, largest)
    return count

def heapSort(arr):
    n = len(arr)
    count = 0
    for i in range(n, -1, -1):
        heapify(arr, n, i)
        count += heapify(arr, i, 0)
    for i in range(n-1, 0, -1):
        arr[i], arr[0] = arr[0], arr[i]
        heapify(arr, i, 0)
    return count

print("For n = 1000:")
print("a) Random Generation:")
arr = [x for x in range(1000)]
random.shuffle(arr)
print("Before Sort:")
print (arr)
print("After Sort:")
start_time = time.time()
heapSort(arr)
time = time.time() - start_time
print(arr)
print("Comparisions")
print(heapSort(arr))
print("Time:")
print(time)
I expect the result to be 8421 for n = 1000 integers and 117681 for n = 10000.
However, each time it shows either 0, or 2001 when I put count += 1 around the loops instead of around the comparisons.

You seem to be forgetting the comparisons your recursive solution makes while solving the smaller subproblems. In other words, you are only counting at the topmost level of each call. Instead, you should add the return value to the count variable in the relevant scope whenever you call your heapify function. Notice the updates below, where I increase the local count variables by the return values of the calls to heapify.
def heapify(arr, n, i):
    count = 0
    largest = i
    l = 2 * i + 1
    r = 2 * i + 2
    if l < n and arr[i] < arr[l]:
        largest = l
    if r < n and arr[largest] < arr[r]:
        largest = r
    if largest != i:
        count += 1
        arr[i],arr[largest] = arr[largest],arr[i]
        count += heapify(arr, n, largest)
    return count

def heapSort(arr):
    n = len(arr)
    count = 0
    for i in range(n, -1, -1):
        heapify(arr, n, i)
        count += heapify(arr, i, 0)
    for i in range(n-1, 0, -1):
        arr[i], arr[0] = arr[0], arr[i]
        count += heapify(arr, i, 0)
    return count
With the fix above applied to your code, I understand that the output is still slightly different from the exact number of comparisons you are expecting, but it is in the ballpark. The relatively small difference is due to the fact that you are randomizing the initial state of the array.
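Note that in the code above count is incremented once per swap (inside if largest != i), not once per comparison. If the goal is to count every element-to-element comparison, a minimal sketch could look like the following. The names sift_down and heap_sort are hypothetical, the heap is built from index n // 2 - 1 downward, and this counting convention is an assumption: it may or may not be the one behind the expected values 8421 and 117681.

```python
import random


def sift_down(arr, n, i):
    """Restore the max-heap property below index i; return comparisons made."""
    comparisons = 0
    while True:
        largest = i
        left, right = 2 * i + 1, 2 * i + 2
        if left < n:
            comparisons += 1              # arr[largest] vs. left child
            if arr[largest] < arr[left]:
                largest = left
        if right < n:
            comparisons += 1              # current largest vs. right child
            if arr[largest] < arr[right]:
                largest = right
        if largest == i:
            return comparisons
        arr[i], arr[largest] = arr[largest], arr[i]
        i = largest


def heap_sort(arr):
    """Sort arr in place; return the total number of element comparisons."""
    n = len(arr)
    comparisons = 0
    for i in range(n // 2 - 1, -1, -1):   # build the max-heap
        comparisons += sift_down(arr, n, i)
    for end in range(n - 1, 0, -1):       # move the max to the end, shrink heap
        arr[0], arr[end] = arr[end], arr[0]
        comparisons += sift_down(arr, end, 0)
    return comparisons


if __name__ == "__main__":
    random.seed(1)
    data = list(range(1000))
    random.shuffle(data)
    print("comparisons:", heap_sort(data))
    print("sorted:", data == list(range(1000)))
```

Index bounds checks such as left < n are not counted here, only comparisons between array elements.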

How do you sort a list with a while loop in Python?

How do you sort a list with a while loop? Having a bit of a problem, thanks very much in advance.
a = [12,0,39,50,1]
first = a[0]
i = 0
j = 1
while i < len(a):
    if a[i] < first:
        tmp = a[i]
        a[i] = a[j]
        a[j] = tmp
    i += 1
print(a)

You can create an empty list to store your sorted numbers:
a = [12,0,39,50,1]
kk = len(a)
new_a = []
i = 0
while i < kk:
    xx = min(a)       ## Retrieve the minimum value from the list a
    new_a.append(xx)  ## Store this minimum in the new list new_a
    a.remove(xx)      ## Delete that minimum from the list a
    i += 1            ## Move on to the next iteration
print(new_a)
Note that I used the original length of the list a (kk) in the while condition. The length of a decreases as we delete the minimum numbers, so testing against len(a) would stop the loop early.

The following is an implementation of basic sorting (selection sort) using two while loops.
In every iteration, the minimum element (for ascending order) of the unsorted subarray is picked and moved to the end of the sorted subarray.
a=[12,0,39,50,1]
i=0
while i<len(a):
    key=i
    j=i+1
    while j<len(a):
        if a[key]>a[j]:
            key=j
        j+=1
    a[i],a[key]=a[key],a[i]
    i+=1
print(a)

# By using For loop
def ascending_array(arr):
    print(f"Original array is {arr}")
    arr_length = len(arr)
    if arr_length <= 1:
        return arr
    for i in range(len(arr)):
        for j in range(i+1, len(arr)):
            if arr[i] >= arr[j]:
                arr[i], arr[j] = arr[j], arr[i]
    print(f"The result array is {arr}") # [0,0,0,1,10,20,59,63,88]

arr = [1,10,20,0,59,63,0,88,0]
ascending_array(arr)

# By using While loop
def ascending_array(arr):
    print(f"Original array is {arr}")
    arr_length = len(arr)
    if arr_length <= 1:
        return arr
    i = 0
    length_arr = len(arr)
    while (i<length_arr):
        j = i+1
        while (j<length_arr):
            if arr[i] > arr[j]:
                arr[i], arr[j] = arr[j], arr[i]
            j+=1
        i+=1
    print(f"The result array is {arr}") # [0,0,0,1,10,20,59,63,88]

arr = [1,10,20,0,59,63,0,88,0]
ascending_array(arr)
The for-loop version is usually faster in CPython: range advances the counter in C, whereas the while-loop evaluates its condition and increments the counter in Python code on every iteration. Both versions perform the same O(n^2) number of comparisons.

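You can also concatenate two lists and sort the result in decreasing order using this example (to get increasing order, pick the minimum instead of the maximum):
x = [2,9,4,6]
y = [7,8,3,5]
z = []
print('x: '+str(x))
print('y: '+str(y))
# Append every element of y to x
for i in range(len(y)):
    x.append(y[i])
# Repeatedly move the largest remaining element of x to z
for j in range(len(x)-1):
    maxi = x[0]
    pos = 0  # reset, otherwise a stale index is deleted when x[0] is the maximum
    for i in range(len(x)):
        if maxi < x[i]:
            maxi = x[i]
            pos = i
    z.append(maxi)
    del x[pos]
z.append(x[0])
print('z: '+str(z))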

Reducing time complexity of contiguous subarray

I was wondering how I could reduce the time complexity of this algorithm.
It calculates the length of the longest contiguous subarray whose elements sum to at most the integer k.
a = an array of integers
k = the maximum allowed sum
ex: a = [1,2,3], k= 3
possible subarrays = [1],[1,2]
length of the max subarray = 2
import sys

sys.setrecursionlimit(20000)

def maxLength(a, k):
    #a = [1,2,3]
    #k = 4
    current_highest = 0
    no_bigger = len(a)-1
    for i in range(len(a)): #0 in [0,1,2]
        current_sum = a[i]
        sub_total = 1
        for j in range(len(a)):
            if current_sum <= k and ((i+sub_total)<=no_bigger) and (k>=(current_sum + a[i+sub_total])):
                current_sum += a[i+sub_total]
                sub_total += 1
            else:
                break
        if sub_total > current_highest:
            current_highest = sub_total
    return current_highest

You can use the sliding window algorithm for this.
Start at index 0 and accumulate the sum of the subarray as you move forward. When the sum exceeds k, drop elements from the front of the window until the sum is at most k again, then continue extending the window. This assumes the elements are non-negative.
The Python code is below:
def max_length(a,k):
    s = 0           # sum of the current window a[i:j]
    m_len = 0       # longest valid window seen so far
    i,j=0,0
    l = len(a)
    while i<l:
        if s<=k and m_len<(j-i):
            m_len = j-i
        if s<=k and j<l:
            s+=a[j]     # extend the window to the right
            j+=1
        else:
            s-=a[i]     # shrink the window from the left
            i+=1
    return m_len

a = [1,2,3]
k = 3
print(max_length(a,k))
OUTPUT:
2
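One way to gain confidence in a sliding-window solution is to compare it against a brute-force reference on random inputs. The sketch below is a hedged illustration, not the answer's exact code: max_length_window is a for-loop variant of the same idea, max_length_brute is a hypothetical O(n^2) checker, and both assume non-negative elements and k >= 0.

```python
import random


def max_length_window(a, k):
    """Longest contiguous run with sum <= k, assuming non-negative elements."""
    best = 0
    window_sum = 0
    start = 0
    for end, value in enumerate(a):
        window_sum += value
        # Shrink from the left until the window fits again
        while window_sum > k and start <= end:
            window_sum -= a[start]
            start += 1
        best = max(best, end - start + 1)
    return best


def max_length_brute(a, k):
    """Reference answer: try every (start, end) pair, O(n^2)."""
    best = 0
    for start in range(len(a)):
        total = 0
        for end in range(start, len(a)):
            total += a[end]
            if total <= k:
                best = max(best, end - start + 1)
    return best


if __name__ == "__main__":
    random.seed(0)
    for _ in range(1000):
        a = [random.randint(0, 9) for _ in range(random.randint(0, 12))]
        k = random.randint(0, 30)
        assert max_length_window(a, k) == max_length_brute(a, k)
    print(max_length_window([1, 2, 3], 3))  # prints 2
```

Each element enters and leaves the window at most once, so the window version runs in O(n) time, compared with O(n^2) for the nested-loop approach.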
