BIG O time complexity of TSP algorithms

I've written two nearest-neighbor algorithms in Python and I have to analyze their runtime complexity in O and Θ notation.
I've tried several samples and I don't understand why one of my algorithms is faster than the other.
Here is my code for the repeated nearest neighbor (RNN) algorithm:
def repeated_nn_tsp(cities):
    return shortest_tour(nn_tsp(cities, start) for start in cities)
def shortest_tour(tours):
    # Plain function (no self), matching how repeated_nn_tsp calls it
    return min(tours, key=tour_length)
nn_tsp has a runtime complexity of O(n^2) and every startpoint will create a new NN Tour. Through all NN tours I have to find the best tour.
That's why I think the time complexity of the RNN has to be T(n)=O(n^3) and T(n)=Θ(n^3).
Here is my code for the altered nearest neighbor (ANN) algorithm:
def alter_tour(tour):
    original_length = tour_length(tour)
    for (start, end) in all_segments(len(tour)):
        reverse_segment_if_better(tour, start, end)
    if tour_length(tour) < original_length:
        return alter_tour(tour)
    return tour
def all_segments(N):
    return [(start, start + length)
            for length in range(N, 2-1, -1)
            for start in range(N - length + 1)]
def reverse_segment_if_better(tour, i, j):
    A, B, C, D = tour[i-1], tour[i], tour[j-1], tour[j % len(tour)]
    if distance(A, B) + distance(C, D) > distance(A, C) + distance(B, D):
        tour[i:j] = reversed(tour[i:j])
The time complexity of all_segments should be T(n) = O(1/2 * n^2 - 0.5n) -> O(n^2), and it creates about n^2/2 elements.
Inside the loop over all_segments (about n^2/2 elements) I call the function reverse_segment_if_better. It uses Python's reversed on a slice, which costs O(n).
That's why I think the time complexity of the loop has to be O(n^3). When there's a better tour, the function calls itself recursively. I think the altered NN therefore has a time complexity of O(n^4). Is that right?
But here we come to my problem: my evaluation, which runs the code 100 times on 100 cities, shows that ANN is faster than RNN on average, which is the opposite of what I expected from the runtime complexity. (RNN needs 4.829 s and ANN only 0.877 s for one 100-city instance.)
So where did I make a mistake?
Thanks in advance!

First, time complexity and big-O notation do not always predict measured running time. An algorithm with a 'better' running-time function can still run slower than expected, or slower than another algorithm with a worse running-time function. In your case it is very hard to determine the worst-case input for each algorithm, and we cannot be sure your tests contained it. Maybe the inputs were 'pleasant' for the ANN algorithm while the other one got stuck somewhere. This is why it is not always correct to rely only on the running-time function we calculate.
What I am trying to say is that you most probably did not make a mistake in your calculations; these are hard functions to analyze on the fly, and it is hard to tell what kind of input would be the worst.
As for the 'why?':
When talking about actual measured running time (such as your example of 0.877 seconds), it boils down to the machine: each computer has different hardware, so absolute timings are not comparable across machines.
Secondly, when we talk about running time complexity, we drop the lower-order terms, as you did with the all_segments function. You even dropped a negative term, which in practice reduces the number of operations.
There are also many cases where a less efficient piece of code is executed only if a specific criterion is met, which reduces the actual running time.
Last and most importantly, when we classify algorithms into sets such as O(n) or O(n log n) we are talking about asymptotic behavior. We need to look at the bigger picture and see what happens when we feed the algorithm a very large amount of data, which I assume you didn't check because, as you wrote, you ran only 100 cities. The picture may change if we look at, say, millions of cities.
For your code, I can see several parts that could reasonably cause this 'weird' difference in running time. The first is that in the ANN code, more specifically in the reverse_segment_if_better function, we do not always reverse the list, only when the condition evaluates to true. We cannot be sure what kind of input you gave the algorithm, so I can only guess that it happened to suit the second algorithm.
Moreover, I may be missing something (we cannot see the functions tour_length or distance), but I don't see how you came up with O(n^4) at the end; it seems like it is doing O(n^3):
all_segments is O(n^2): it returns n(n-1)/2 (start, end) pairs.
The tricky part is analyzing reverse_segment_if_better and alter_tour. Reversing only touches tour[i:j], so it is not strictly correct to say every call costs O(n); we do not reverse the whole tour (at least, not for every value of start and end).
It is safe to say that one of these applies: you did not test with inputs large enough to show the asymptotic behavior, the input you gave happened to be kind to this specific algorithm, or the final bound on T(n) was not tight enough.
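As a quick sanity check on the segment count, here is a minimal sketch that reuses all_segments from the question. The helper name count_segments and the chosen sizes are made up for illustration. It compares the number of (start, end) pairs against n(n-1)/2:

```python
def all_segments(N):
    """Return (start, end) pairs for every segment of length 2..N, longest first."""
    return [(start, start + length)
            for length in range(N, 2 - 1, -1)
            for start in range(N - length + 1)]


def count_segments(N):
    # Hypothetical helper, not part of the original code.
    return len(all_segments(N))


for n in (10, 100, 200):
    # Second and third columns should agree if the count is n(n-1)/2.
    print(n, count_segments(n), n * (n - 1) // 2)
```

Each call to reverse_segment_if_better costs at most O(n) for the slice reversal, so one pass over all segments is bounded by O(n^3), before accounting for how many times alter_tour recurses.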

Related

Unsure about the time complexity in the code

I have a question about time complexity
import math
def power_iter(x, n):
    for i in range(math.floor(math.log2(n))):
        x = x*x
        print(x)
    return math.pow(2, (n - math.pow(2, math.floor(math.log2(n))))) * x
print(power_iter(2, 10))
Q1. Is the time complexity of math.floor(math.log2(n)) and n-math.pow(2,math.floor(math.log2(n))) O(1)?
Q2. I think that this code's time complexity is O(log2(n)). Is this right?

Q1. Are "math.floor(math.log2(n))" and "n-math.pow(2,math.floor(math.log2(n)))" each O(1), or are they not included in the time complexity?
Correct, these operations are ultimately irrelevant in the simplified time complexity. Big O notation describes the rate of increase, in this case with respect to n. The iteration over the range object is what you're after here; you can effectively treat the individual math calls within each iteration as you would basic operators on integers with O(1) time.
Q2. I think that this code's time complexity is O(log2(n)). Is this right?
Yes.

Answer to Q1: It depends on the scale of n, but normally, you can suppose that the time complexity of the floor and log functions are in Theta(1).
Answer to Q2: As you found, we have only one loop with the size of log(n). So, if we assume the answer to the first question is right, you can say the time complexity is in O(log(n)) and, also Theta(log(n)).
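To see the logarithmic growth directly, a small sketch can count the loop iterations instead of timing them. The helper loop_iterations is hypothetical; it only mirrors the range expression from power_iter:

```python
import math


def loop_iterations(n):
    """Count how many times the loop in power_iter would run for a given n."""
    count = 0
    for _ in range(math.floor(math.log2(n))):
        count += 1
    return count


for n in (10, 1000, 10**6):
    # The count grows with log2(n), not with n itself.
    print(n, loop_iterations(n))
```

Multiplying n by 1000 adds only about ten iterations, which is what O(log n) predicts.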

What is the worst-case big-O time complexity for this code?

I had a quiz in my class and didn't do so well on it. I'm looking to find out if someone can explain to me what I did wrong here - our professor is overwhelmed with office hours as we moved online so I thought I'd post here.
def functionB(n):
    for i in range(1,6):
        for j in range(i,6):
            n = n // 2
    return n
I gave the following answer:
The above function is O(n^2) because of the nested for-loops. Although
the value of n is being cut in half upon each iteration, it does not
have an impact on the actual run time of the code.
I was given 3/10 for it but unfortunately there is no explanation so I'm unsure of what I got wrong and why. Is there anyone here who can explain the correct answer to me?

If you're considering n to be the argument passed in, note how you say
it (n) does not have an impact on the actual run time of the code.
If n has no impact on the runtime, it wouldn't be O(n^2), since that indicates that the runtime scales (quadratically) with n.
This function looks like it's O(1). The function will always run exactly the same, regardless of input. It will always run exactly 15 times, because n has no bearing on how many times the loop will run. The runtime of the program is decided entirely by the hardcoded arguments given to range, which never change.
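A short sketch makes the constant iteration count visible. functionB_counted is a hypothetical variant of functionB that also reports how often the loop body ran:

```python
def functionB_counted(n):
    """Same loops as functionB, but also return how often the body ran."""
    iterations = 0
    for i in range(1, 6):
        for j in range(i, 6):
            n = n // 2
            iterations += 1
    return n, iterations


for n in (10, 10**6, 10**12):
    # The iteration count is the same for every n.
    print(n, functionB_counted(n))
```

The second element of the result stays at 15 no matter how large n is, which is what O(1) means here.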

The approach suggested by @Carcigenicate is correct. Here I will add something to it.
The time complexity of the code snippet is O(1), i.e. constant time. Python's range excludes its upper bound, so the inner statement runs exactly 15 times (5 + 4 + 3 + 2 + 1); if both range bounds were inclusive it would run 21 times (6 + 5 + 4 + 3 + 2 + 1). The value returned is therefore n // 2^15. In bitwise terms, for an integer n, the number has been right-shifted 15 times, discarding the remainders.

When is the CPython set `in` operator O(n)?

I was reading about the time complexity of set operations in CPython and learned that the in operator for sets has the average time complexity of O(1) and worst case time complexity of O(n). I also learned that the worst case wouldn't occur in CPython unless the set's hash table's load factor is too high.
This made me wonder, when such a case would occur in the CPython implementation? Is there a simple demo code, which shows a set with clearly observable O(n) time complexity of the in operator?

Load factor is a red herring. In CPython sets (and dicts) automatically resize to keep the load factor under 2/3. There's nothing you can do in Python code to stop that.
O(N) behavior can occur when a great many elements have exactly the same hash code. Then they map to the same hash bucket, and set lookup degenerates to a slow form of linear search.
The easiest way to contrive such bad elements is to create a class with a horrible hash function. Like, e.g., and untested:
class C:
    def __init__(self, val):
        self.val = val
    def __eq__(a, b):
        return a.val == b.val
    def __hash__(self):
        return 3
Then hash(C(i)) == 3 regardless of the value of i.
To do the same with builtin types requires deep knowledge of their CPython implementation details. For example, here's a way to create an arbitrarily large number of distinct ints with the same hash code:
>>> import sys
>>> M = sys.hash_info.modulus
>>> set(hash(1 + i*M) for i in range(10000))
{1}
which shows that the ten thousand distinct ints created all have hash code 1.
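To make the degradation observable without relying on timings, here is a small sketch that counts equality comparisons instead. SameHash and comparisons_for_missing_lookup are hypothetical names; the class is a variation of C above with a call counter added. The exact counts depend on CPython's probing details, so treat the printed numbers as indicative only:

```python
class SameHash:
    """Hypothetical key type whose instances all share one hash value."""
    eq_calls = 0  # class-wide counter of __eq__ invocations

    def __init__(self, val):
        self.val = val

    def __eq__(self, other):
        SameHash.eq_calls += 1
        return isinstance(other, SameHash) and self.val == other.val

    def __hash__(self):
        return 3  # every instance collides with every other


def comparisons_for_missing_lookup(n):
    """Build a set of n colliding keys, then look up a key that is absent."""
    s = {SameHash(i) for i in range(n)}
    SameHash.eq_calls = 0          # ignore comparisons made while building
    found = SameHash(-1) in s      # must be compared against every stored key
    return found, SameHash.eq_calls


for n in (100, 200, 400):
    print(n, comparisons_for_missing_lookup(n))
```

With a well-distributed hash the same lookup would need only a handful of comparisons; here the count grows with the size of the set.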

You can view the set source here which can help: https://github.com/python/cpython/blob/723f71abf7ab0a7be394f9f7b2daa9ecdf6fb1eb/Objects/setobject.c#L429-L441
It's difficult to devise a specific example but the theory is fairly simple luckily :)
The set stores the keys using a hash of the value, as long as that hash is unique enough you'll end up with the O(1) performance as expected.
If for some weird reason all of your items have different data but the same hash, it collides and it will have to check all of them separately.
To illustrate, you can see the set as a dict like this:
import collections
your_set = collections.defaultdict(list)
def add(value):
    your_set[hash(value)].append(value)
def contains(value):
    # This is where your O(n) can occur, all values the same hash()
    values = your_set.get(hash(value), [])
    for v in values:
        if v == value:
            return True
    return False

This is sometimes called the 'amortization' of a set or dictionary. It shows up now and then as an interview question. As @TimPeters says, resizing happens automatically at 2/3 capacity, so you'll only hit O(n) if you force the hash collisions yourself.
In computer science, amortized analysis is a method for analyzing a given algorithm's complexity, or how much of a resource, especially time or memory, it takes to execute. The motivation for amortized analysis is that looking at the worst-case run time per operation, rather than per algorithm, can be too pessimistic.
`/* GROWTH_RATE. Growth rate upon hitting maximum load.
* Currently set to used*3.
* This means that dicts double in size when growing without deletions,
* but have more head room when the number of deletions is on a par with the
* number of insertions. See also bpo-17563 and bpo-33205.
*
* GROWTH_RATE was set to used*4 up to version 3.2.
* GROWTH_RATE was set to used*2 in version 3.3.0
* GROWTH_RATE was set to used*2 + capacity/2 in 3.4.0-3.6.0.
*/
#define GROWTH_RATE(d) ((d)->ma_used*3)`
More to the efficiency point: why 2/3? The Wikipedia article on hash tables has a nice graph accompanying it:
https://upload.wikimedia.org/wikipedia/commons/1/1c/Hash_table_average_insertion_time.png
(The linear probing curve corresponds to the O(1) to O(n) transition for our purposes; chaining is a more complicated hashing approach.)
See https://en.wikipedia.org/wiki/Hash_table for the complete article.
Say you have a set or dictionary which is stable and sits just below 2/3 of its underlying capacity. Do you really want sluggish performance forever? You may wish to force resizing it upwards.
"if the keys are always known in advance, you can store them in a set and build your dictionaries from the set using dict.fromkeys()." plus some other useful if dated observations. Improving performance of very large dictionary in Python
For a good read on dictresize(): (dict was in Python before set)
https://github.com/python/cpython/blob/master/Objects/dictobject.c#L415

Complexity does not match actual growth in running time? (python)

I ran two pieces of code in Python and measured the time each took to complete. The code is quite simple, just recursive maximums. Here it is:
1.
def max22(L, left, right):
    if left >= right:
        return L[int(left)]
    k = max22(L, left, (left+right-1)//2)
    p = max22(L, (right+left+1)//2, right)
    return max(k, p)
def max_list22(L):
    return max22(L, 0, len(L)-1)
2.
def max2(L):
    if len(L) == 1:
        return L[0]
    l = max2(L[:len(L)//2])
    r = max2(L[len(L)//2:])
    return max(l, r)
The first one should run (imo) in O(logn), and the second one in O(n*logn).
However, I measured the running time for n=1000 , n=2000 and n=4000,
And somehow the growth for both of the algorithms seems to be linear! How is this possible? Did I get the complexity wrong, or is it okay?
Thanks.

The first algorithm is not O(log n), because it checks the value of each element. It can be shown that it is O(n).
As for the second, you possibly just couldn't notice the difference between n and n log n at such small scales.

Just because a function is splitting the search space by 2 and then recursively looking at each half does not mean that it has a log(n) factor in the complexity.
In your first solution, you are splitting the search space by 2, but then ultimately inspecting every element in each half. Unlike binary search which discards one half of the search space, you are inspecting both halves. This means nothing is discarded from the search and you ultimately end up looking at every element, making your complexity O(n). The same holds true for your second implementation.

Your first algorithm is O(n) on a normal machine, so it is not surprising that your testing indicated this. Your second algorithm is O(n*log n) because each slice copies its half of the list, but it would be O(n) if you were using proper arrays instead of lists. Since Python's builtin list operations are pretty fast, you may not have hit the logarithmic slowdown yet; try it with values more like n=4000000 and see what you get.
Note that, if you could run both recursive calls in parallel (with O(1) slicing), both algorithms could run in O(log n) time. Of course, you would need O(n) processors to do this, but if you were designing a chip, instead of writing a program, that kind of scaling would be straightforward...
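The difference between the two versions can be checked by counting work rather than timing it. The sketch below is an instrumented rewrite under stated assumptions: max22_counted and max2_counted are hypothetical names, and the one-element list used as a counter is just a simple way to share state across recursive calls.

```python
def max22_counted(L, left, right, leaves):
    """Index-based version; leaves[0] counts how many elements are inspected."""
    if left >= right:
        leaves[0] += 1
        return L[left]
    k = max22_counted(L, left, (left + right - 1) // 2, leaves)
    p = max22_counted(L, (right + left + 1) // 2, right, leaves)
    return max(k, p)


def max2_counted(L, copies):
    """Slicing version; copies[0] counts elements copied by the slices."""
    if len(L) == 1:
        return L[0]
    mid = len(L) // 2
    left, right = L[:mid], L[mid:]
    copies[0] += len(left) + len(right)
    return max(max2_counted(left, copies), max2_counted(right, copies))


for n in (1024, 2048, 4096):
    data = list(range(n))
    leaves, copies = [0], [0]
    max22_counted(data, 0, n - 1, leaves)
    max2_counted(data, copies)
    print(n, leaves[0], copies[0])
```

The first count equals n, while the second grows like n*log2(n), because every level of the recursion copies all n elements once.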

Calculating Time and Space Complexity of xrange(), random.randint() and sort() function

What is the time and space complexity of xrange(), random.randint(1,100) and sort() in Python?
import random
a = [random.randint(1,100) for i in xrange(1000000)]
print a
a.sort()
print a

Without further information on the problem, your actual task, and what you have tried, an answer can only be approximate, but I will try to at least give you some input.
a = [random.randint(1,100) for i in xrange(1000000)]
The assignment a = ... itself is normally considered O(1) in time. Space complexity depends on how detailed you want the analysis to be. For a fixed length, a list of 1,000,000 random ints takes constant space; as a function of the list length n (1,000,000, 2,000,000, ...) it is O(n).
[random.randint(1,100) for i in xrange(1000000)] is a loop with 1,000,000 iterations, each generating a random integer. Assuming each randint call takes constant time, this is O(n) as well.
a.sort() depends on the sorting algorithm used. CPython's list.sort() uses Timsort, a merge-sort hybrid, which is O(n * log(n)) in the worst case and O(n) on already-sorted input.

I got the answer on Facebook. Thanks to Shashank Gupta.
I'm assuming you know the basics of asymptotic notation and stuff.
Now, forget the a.sort() function for a moment and concentrate on your list comprehension:
a = [random.randint(1,100) for i in xrange(1000000)]
1000000 is pretty big so let's reduce it to 10 for now.
a = [random.randint(1,100) for i in xrange(10)]
You're building a new list here with 10 elements. Each element is generated via the randint function. Let's assume the time complexity of this function is O(1). For 10 elements, this function will be called 10 times, right?
Now, let's generalize this. For integer 'n'
a = [random.randint(1,100) for i in xrange(n)]
You will be calling the randint function 'n' times.
All of this can also be written as:
a = []
for i in xrange(n):
    a.append(random.randint(1, 100))
This is O(n).
Following the code, you've a simple print statement. This is O(n) again (internally, python interpreter iterates over the complete list). Now comes the sorting part. You've used the sort function. How much time does it take? There are many sorting algorithms out there, and without going into the exact algo used, I can safely assume the time complexity will be O(n log n)
Hence, the actual time complexity of your code is T(n) = O(n log n) + O(n) which is O(n log n) (the lower term is ignored for large n)
What about space? Your code initialized a new list of size 'n'. Hence space complexity is O(n).
There you go.
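For readers on Python 3, where range replaces xrange, the cost of the sort step can be made visible by counting comparisons. This is only a sketch: count_sort_comparisons is a hypothetical helper, the list is shortened to 1000 elements, and a fixed seed is used so runs are repeatable.

```python
import functools
import random


def count_sort_comparisons(values):
    """Sort a copy of values and return (sorted_list, number_of_comparisons)."""
    calls = 0

    def compare(a, b):
        nonlocal calls
        calls += 1
        return (a > b) - (a < b)

    result = sorted(values, key=functools.cmp_to_key(compare))
    return result, calls


n = 1000
rng = random.Random(0)                        # fixed seed, illustration only
a = [rng.randint(1, 100) for _ in range(n)]   # O(n) time and O(n) space

sorted_a, random_calls = count_sort_comparisons(a)
_, presorted_calls = count_sort_comparisons(sorted_a)
print(random_calls, presorted_calls)
```

Sorting the random list needs far more comparisons than sorting the already-sorted one, which reflects the gap between the general O(n log n) bound and the best case of CPython's sort.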
