Python "in" operator speed - python

Is the in operator's speed in Python proportional to the length of the iterable?
So, given:

len(x)  # 10
if a in x:  # let's say this takes time A
    pass

len(y)  # 10000
if a in y:  # let's say this takes time B
    pass

Is A > B?

A summary for in:
list - Average: O(n)
set/dict - Average: O(1), Worst: O(n)
See this for more details.

There's no general answer to this: it depends on the type of a and, especially, on the type of the container you're testing against. If, for example, the container is a list, then yes, in takes worst-case time O(len(container)). But if it is, for example, a dict or a set, then in takes expected-case time O(1) (i.e., constant time).
As for "Is A > B?": without knowing the types of x and y, there's no general answer to which of your in statements will run faster.

Assuming x and y are lists:
Is the in operator's speed in Python proportional to the length of the iterable?
Yes.
The time for in to run on a list of length n is O(n). Note that x in s is O(1) on average when s is a set or a dict, since those containers are hash-based.
Is A > B?
No.
Time A < time B, since 10 < 10000.
Docs: https://wiki.python.org/moin/TimeComplexity
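As a rough illustration (my own sketch, not part of the original answers; the sizes and the probe value -1 are arbitrary choices), the list lookup slows down as the container grows while the set lookup stays roughly constant:

import timeit

# -1 is never present, so the list test always scans every element (its worst case).
for n in (10, 10000):
    data_list = list(range(n))
    data_set = set(data_list)
    t_list = timeit.timeit(lambda: -1 in data_list, number=1000)
    t_set = timeit.timeit(lambda: -1 in data_set, number=1000)
    print(n, t_list, t_set)

On a typical machine the list timings grow roughly with n, while the set timings barely change.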

Related

What is the Run Time Complexity and Space Complexity of this solution in terms of Big O?

I believe the space complexity would just be O(n), since the set is the only thing stored throughout the program and the list is recalculated each time. I'm not sure whether the time complexity would be O(n^2), because there is a while loop with a for loop inside it, or whether it is something different, because the while loop can just keep running if n is never 1 and never in the set.
def isHappy(self, n):
    seen = set()
    while True:
        if n not in seen:
            seen.add(n)
            n = sum([int(x) * int(x) for x in str(n)])
            if n == 1:
                return True
        else:
            return False
EDIT:
The previous statement about average time complexity was incorrect, as it did not take the complexity of the summing of squares of n's decimal digits into account.
Forgive me in advance for the lack of mathematical formatting. There's no easy way to do that in StackOverflow posts.
The short answer is that your solution will not enter an infinite loop, and it does indeed have O(n) space complexity and O(n**2) time complexity.
Here's the long answer:
Let f(n) denote the result of summing the squares of n's decimal digits, as is being done inside the while loop. If n has four or more digits, then f(n) is guaranteed to have fewer digits than n, since f(9999) == 4 * 9**2 == 324, and the difference between 10**k - 1 and f(10**k - 1) increases as k increases. So for an n with four or more digits it takes at most about log10(n) iterations of the loop to get down to a three-digit number. And since f(999) == 3 * 9**2 == 243, no matter how many times you apply n = f(n) to an n with three or fewer digits, the result will also have three or fewer digits. There are only 1000 nonnegative integers with three or fewer digits, so by the Pigeonhole Principle, f(n) will either equal one or already be contained in the set after at most 1001 iterations. In total, that's no greater than log10(n) + 1001 iterations of the loop, where in this case n refers to the original value of the function argument.
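To make that concrete, here is a small sketch of f (my own helper, not from the original post), with the two values quoted above:

def f(n):
    # Sum of the squares of n's decimal digits.
    return sum(int(d) * int(d) for d in str(n))

print(f(9999))  # 324 -- a four-digit input already drops to three digits
print(f(999))   # 243 -- a three-digit input can never grow past three digits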
For a set s, insertion and membership testing are both O(len(s)) in the worst case. Since the set can contain only as many elements as there are past iterations,
len(s) <= log10(n) + 1001.
And log10(n) + 1001 is O(n) but not O(log n), because complexity is expressed in terms of the size of the input (its number of digits), not the value of the input itself. And since, during a given iteration, n either has fewer than its original number of digits or fewer than four digits, the summing of squares is also O(n) in the number of digits. In total, that's O(n) iterations that are O(n) each, for a total worst-case time complexity of O(n**2).
As explained above, you're guaranteed to reach a three-digit number eventually no matter how large n is, so you can actually replace the set with a list of 1000 bools. Then the solution would have O(1) space complexity.
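As a rough sketch of that last idea (my own code, not from the original answer): shrink n below 1000 first, then track the at most 1000 reachable values in a fixed-size boolean list, which gives O(1) extra space.

def is_happy(n):
    def f(m):
        # Sum of the squares of m's decimal digits, as in the original solution.
        return sum(int(d) * int(d) for d in str(m))
    while n >= 1000:           # at most ~log10(n) steps to get below 4 digits
        n = f(n)
    seen = [False] * 1000      # fixed size, so O(1) extra space
    while not seen[n]:
        if n == 1:
            return True
        seen[n] = True
        n = f(n)
    return False               # revisiting a value means we are stuck in a cycle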
The best-case scenario is O(1), if the program happens to get the answer on the first attempt, and the worst-case scenario might be O(n^2), since the loop iterates over itself again and again. If you want a more precise answer, you can introduce a constant for the cost of the inner summation
sum([int(x) * int(x) for x in str(n)])
If we call that cost r, the worst-case complexity becomes O(n^2 + r).

Is heapq insertion faster than bisect insertion?

I have a question about bisect and heapq.
First I will show you two versions of the code and then ask a question about them.
Version using bisect:
while len(scoville) > 1:
    a = scoville.pop(0)  # pops out smallest unit
    if a >= K:
        break
    b = scoville.pop(0)  # pops out smallest unit
    c = a + b * 2
    bisect.insort(scoville, c)
Version using heapq:
while len(scoville) > 1:
    a = heapq.heappop(scoville)  # pops out smallest unit
    if a >= K:
        break
    b = heapq.heappop(scoville)  # pops out smallest unit
    c = a + b * 2
    heapq.heappush(scoville, c)
Both algorithms use 2 pops and 1 insert.
As far as I know, in the bisect version the list's pop operation is O(1) and bisect's insertion is O(log n).
And in the heapq version, the heap's pop operation is O(1) and its insertion is O(log n) on average.
So both versions should have roughly the same time efficiency. However, the bisect version keeps failing the time-efficiency test on a code challenge site.
Does anybody have a good guess?
*scoville is a list of integers
Your assumptions are wrong. Neither is pop(0) O(1), nor is bisect.insort O(log n).
The problem is that in both cases, all the elements after the one you pop or insert have to be shifted one position to the left or right, making both operations O(n).
From the bisect.insort documentation:
bisect.insort_left(a, x, lo=0, hi=len(a))
Insert x in a in sorted order. This is equivalent to a.insert(bisect.bisect_left(a, x, lo, hi), x) assuming that a is already sorted. Keep in mind that the O(log n) search is dominated by the slow O(n) insertion step.
You can test this by creating a really long list, say l = list(range(10**8)), and then comparing l.pop(0) against l.pop(), and bisect.insort(l, 0) against bisect.insort(l, 10**9). The operations that pop and insert at the end should be instantaneous, while the others have a short but noticeable delay.
You can also use %timeit to test it repeatedly on shorter lists, if you alternatingly pop and insert so that the length of the list remains constant over many thousands of runs:
>>> l = list(range(10**6))
>>> %timeit l.pop(); bisect.insort(l, 10**6)
100000 loops, best of 3: 2.21 us per loop
>>> %timeit l.pop(0); bisect.insort(l, 0)
100 loops, best of 3: 14.2 ms per loop
Thus, each iteration of the version using bisect is O(n), while each iteration of the one using heapq is O(log n).
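For comparison, here is a rough sketch of my own (not from the original answer) doing the same pop-and-reinsert cycle with heapq on a list of the same length; each pop/push pair only sifts through O(log n) elements, so it stays fast even with a million entries:

import heapq
import timeit

l = list(range(10**6))  # an ascending list is already a valid min-heap

def step():
    heapq.heappop(l)
    heapq.heappush(l, 10**6)

# Each pair only sifts about log2(10**6) ~ 20 levels of the heap.
print(timeit.timeit(step, number=100000))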

Computational Complexity

Suppose we have the function below:
def func(x, value_):
    assert 0 < x < value_
    while x < value_:
        x *= 2
Although value_ can be arbitrarily large, the while loop is not infinite and the number of comparisons is bounded above by value_. Consequently, is it correct that this function has computational complexity of O(N)?
It's O(log n), since x doubles toward value_ on every iteration. Try drawing a graph of the two values and you will see it.
The time complexity will be O(log2(m/n)), where m = value_ and n = x, and log2(i) denotes the logarithm of i in base 2.
Consider that if x is doubled, for a fixed value_, one fewer iteration is performed.
Conversely, if value_ is doubled, for a fixed x, one extra iteration is performed.
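A quick way to check this (a small sketch of my own, not from the question) is to count the iterations directly:

def iterations(x, value_):
    assert 0 < x < value_
    count = 0
    while x < value_:
        x *= 2
        count += 1
    return count

print(iterations(1, 1000))  # 10, roughly log2(1000 / 1)
print(iterations(1, 2000))  # 11 -- doubling value_ adds one iteration
print(iterations(2, 2000))  # 10 -- doubling x removes one iteration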

Time Complexity - Codility - Ladder - Python

The question is available here. My Python code is
def solution(A, B):
    if len(A) == 1:
        return [1]
    ways = [0] * (len(A) + 1)
    ways[1], ways[2] = 1, 2
    for i in xrange(3, len(ways)):
        ways[i] = ways[i-1] + ways[i-2]
    result = [1] * len(A)
    for i in xrange(len(A)):
        result[i] = ways[A[i]] & ((1 << B[i]) - 1)
    return result
The time complexity detected by the system is O(L^2), and I can't see why. Thank you in advance.
First, let's show that the runtime genuinely is O(L^2). I copied a section of your code, and ran it with increasing values of L:
import time
import matplotlib.pyplot as plt

def solution(L):
    if L == 0:
        return
    ways = [0] * (L + 5)
    ways[1], ways[2] = 1, 2
    for i in xrange(3, len(ways)):
        ways[i] = ways[i-1] + ways[i-2]

points = []
for L in xrange(0, 100001, 10000):
    start = time.time()
    solution(L)
    points.append(time.time() - start)

plt.plot(points)
plt.show()
The resulting graph (image not reproduced here) curves upward rather than growing linearly in L.
To understand why this is O(L^2) when the obvious "time complexity" calculation suggests O(L), note that "time complexity" is not a well-defined concept on its own, since it depends on which basic operations you're counting. Normally the basic operations are taken for granted, but in some cases you need to be more careful. Here, if you count additions as a basic operation, then the code is O(L). However, if you count bit (or byte) operations, then the code is O(L^2). Here's the reason:
You're building an array of the first L Fibonacci numbers. The length (in digits) of the i'th Fibonacci number is Theta(i). So ways[i] = ways[i-1] + ways[i-2] adds two numbers with approximately i digits, which takes O(i) time if you count bit or byte operations.
This observation gives you an O(L^2) bit operation count for this loop:
for i in xrange(3, len(ways)):
    ways[i] = ways[i-1] + ways[i-2]
In the case of this program, it's quite reasonable to count bit operations: your numbers are unboundedly huge as L increases and addition of huge numbers is linear in clock time rather than O(1).
You can fix the complexity of your code by computing the Fibonacci numbers mod 2^32 -- since 2^32 is a multiple of 2^B[i]. That will keep a finite bound on the numbers you're dealing with:
for i in xrange(3, len(ways)):
    ways[i] = (ways[i-1] + ways[i-2]) & ((1 << 32) - 1)
There are some other issues with the code, but this will fix the slowness.
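Folding that fix back into the question's function gives roughly the sketch below (my own edit of the original code, assuming, as on Codility, that each B[i] is small enough that 2**32 is a multiple of 2**B[i]; written with range so it also runs on Python 3):

def solution(A, B):
    # ways[i] = number of ways to climb i rungs, kept modulo 2**32 so the
    # additions stay constant-time instead of growing with the number's size.
    mask = (1 << 32) - 1
    ways = [0] * (len(A) + 1)
    ways[1] = 1
    if len(ways) > 2:
        ways[2] = 2
    for i in range(3, len(ways)):
        ways[i] = (ways[i-1] + ways[i-2]) & mask
    return [ways[A[i]] & ((1 << B[i]) - 1) for i in range(len(A))]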
I've taken the relevant parts of the function:
def solution(A, B):
    for i in xrange(3, len(A) + 1):  # replaced ways for clarity
        # ...
        pass
    for i in xrange(len(A)):
        # ...
        pass
    return result
Observations:
A is an iterable object (e.g. a list)
You're iterating over the elements of A in sequence
The behavior of your function depends on the number of elements in A, making it O(A)
You're iterating over A twice, meaning 2 O(A) -> O(A)
On point 4, since 2 is a constant factor, 2 O(A) is still in O(A).
I think the page is not correct in its measurement. Had the loops been nested, then it would've been O(A²), but the loops are not nested.
This short sample is O(N²):
def process_list(my_list):
    for i in range(0, len(my_list)):
        for j in range(0, len(my_list)):
            # do something with my_list[i] and my_list[j]
            pass
I've not seen the code the page is using to 'detect' the time complexity of the code, but my guess is that the page is counting the number of loops you're using without understanding much of the actual structure of the code.
EDIT1:
Note that, based on this answer, the time complexity of the len function is actually O(1), not O(N), so the page cannot be incorrectly counting its use toward the time complexity. If it were, it would have claimed an even larger order of growth, since len is used 4 separate times.
EDIT2:
As #PaulHankin notes, asymptotic analysis also depends on what's considered a "basic operation". In my analysis, I've counted additions and assignments as "basic operations" by using the uniform cost method, not the logarithmic cost method, which I did not mention at first.
Most of the time, simple arithmetic operations are treated as basic operations. This is what I most commonly see done, unless the algorithm being analysed is itself such a basic operation (e.g. the time complexity of a multiplication function), which is not the case here.
The only reason why we have different results appears to be this distinction. I think we're both correct.
EDIT3:
While an algorithm in O(N) is also in O(N²), I think it's reasonable to state that the code is still in O(N), because, at the level of abstraction we're using, the computational steps that seem more relevant (i.e. are more influential) grow with the size of the input iterable A, not with the number of bits used to represent each value.
Consider the following algorithm to compute a**n:

def function(a, n):
    r = 1
    for i in range(0, n):
        r *= a
    return r
Under the uniform cost method this is in O(N), because the loop is executed n times; but under the logarithmic cost method, the algorithm above turns out to be in O(N²) instead, because the multiplication in the line r *= a is itself O(N): the number of bits needed to represent r depends on the size of the number itself.
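To see why the cost model matters in practice, here is a rough sketch of my own (not part of the answer): multiplying Python integers gets slower as they grow, which is exactly what the logarithmic cost model accounts for.

import timeit

# Time multiplications by a small constant for integers of increasing bit
# length; the cost grows with the number of bits rather than staying constant.
for bits in (1000, 100000, 10000000):
    r = 1 << bits
    t = timeit.timeit(lambda: r * 12345, number=1000)
    print(bits, t)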
The Codility Ladder problem is best solved as follows; it is super tricky.
We first compute the Fibonacci sequence for the first L+2 numbers. The first two numbers are used only as fillers, so we have to index the sequence as A[idx] + 1 instead of A[idx] - 1. The second step is to replace the modulo operation by removing all but the n lowest bits.

Trying to understand the differences between Θ(n²) and Θ(n) with regard to run-time

I am trying to learn about the differences between Θ(n²) and Θ(n). I started with the wiki and my textbook, read here and here, and then tried to create the following simple example for my beginner understanding. I compared the run-times of this snippet of code in Python 2.7, commenting out option #1 when timing option #2, and commenting out option #2 when timing option #1:
import timeit

start = timeit.default_timer()

a = []
a.extend(range(1, 10))
b = []
b.extend(range(1, 10))
c = []
c.extend(range(1, 100))

# option 1
for x in a:
    for y in b:
        print("-")

# option 2
# for x in c:
#     print("-")

stop = timeit.default_timer()
print stop - start
Each line below is the output of one run, and I have prepended #1 - or #2 - for clarity:
#1 - 0.000207901000977
#1 - 0.000203132629395
#1 - 0.000202178955078
#1 - 0.000203847885132
#1 - 0.000203847885132
#2 - 0.000240087509155
#2 - 0.000240087509155
#2 - 0.0142140388489
#2 - 0.000237941741943
#2 - 0.000246047973633
Both options print - about 100 times. I had assumed Θ(n²) is slower than Θ(n), even in a trivial case, yet option #1, with Θ(n²), outperformed option #2 with Θ(n).
You are not comparing an O(N²) algorithm with an O(N) algorithm.
Both algorithms are essentially O(N): you execute print N times in total. But in the first option, you loop sqrt(N) times in an outer loop and sqrt(N) times in the inner loop, creating an O(sqrt(N) * sqrt(N)) == O(N) algorithm.
You can't compare the timings of two algorithms that way. Normally you look at the inputs to an algorithm: sorting, for example, is measured by the number of elements to sort, and the best sorting algorithms take N log N steps to sort such a list. You were comparing the output instead; print() is executed N times in total.
Putting it differently: if option one takes N as its input and runs in O(N²) time (prints N² times), then option two takes an input M and runs in O(M) time, but M = N * N. As a result, the second option runs in O(N²) time as well; you merely produced the N² repetitions differently.
The timing differences are otherwise too close to call and can be down to I/O waits (you are writing to a terminal).
The main issue is that you actually have three inputs to consider rather than one. Option 1 is O(A * B), while option 2 is O(C), where A is the length of a, B is the length of b, and C is the length of c. That means you can't really compare the two algorithms directly.
Your choice of input values means it just so happens that O(A) = O(B) and O(C) = O(A * B), resulting in the same performance and complexity in this particular case.
You could convert this into a single-input problem, where option 1 is O((sqrt N)^2) = O(N) and option 2 is O(N), by adding the constraints C = N and A = B = sqrt(N).
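If you want to see the asymptotic gap experimentally, here is a sketch of my own (not from the answers): drive both loop shapes from a single size N and grow it, and count iterations instead of printing so that I/O doesn't dominate the measurement.

import timeit

def quadratic(n):
    count = 0
    for i in range(n):
        for j in range(n):
            count += 1  # O(n**2) basic operations
    return count

def linear(n):
    count = 0
    for i in range(n):
        count += 1      # O(n) basic operations
    return count

for n in (100, 1000, 3000):
    tq = timeit.timeit(lambda: quadratic(n), number=1)
    tl = timeit.timeit(lambda: linear(n), number=1)
    print(n, tq, tl)    # the ratio tq / tl grows roughly in proportion to n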
