What is the time complexity of my single number code - python

The question is "Given an array of integers, every element appears twice except for one. Find that single one.
Note:
Your algorithm should have a linear runtime complexity. Could you implement it without using extra memory?"
My code is below:
def singleNumber(nums):
    for i in range(len(nums)):
        if nums.count(nums[i]) == 1:
            return nums[i]
Why is my code not O(N)? Isn't the complexity determined by the for loop, which takes n rounds?
Thanks.

We can do this using the XOR operation:
a = [0,0,1,1,2,2,3,3,4,5,5,6,6,7,7,8,8,9,9]
ans = 0
for i in a:
    ans = i ^ ans
ans
4
This works because the XOR is effectively computing (0^0)^(1^1)^(2^2)^(3^3)^4^(5^5)^...^(9^9), and all the pairs cancel out, leaving only the single number.
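For reference, the same fold can be written as a function using functools.reduce and operator.xor, both from the standard library:
from functools import reduce
from operator import xor

def singleNumber(nums):
    # XOR of all elements: paired values cancel, leaving the single one
    return reduce(xor, nums, 0)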

Your code runs in O(N^2) because the count method runs in O(N) and it is executed inside a for loop, which in total gives you O(N^2).
If you want to make it run in O(N), you can do the following:
Loop through the array and set its values as keys of a dict, with the number of times each value appears in the list as the corresponding value.
Then iterate through the key-value pairs of the dict and look for the key whose value is 1. This gives O(2*N), which is O(N) time complexity.
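A minimal sketch of that two-pass approach (using collections.defaultdict for the counting; a plain dict works too):
from collections import defaultdict

def singleNumber(nums):
    counts = defaultdict(int)
    for n in nums:                  # first pass: count occurrences, O(N)
        counts[n] += 1
    for n, c in counts.items():    # second pass: find the value seen once, O(N)
        if c == 1:
            return n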

What is the time complexity of the *in* operation on arrays in python

This code returns the first two numbers in the array that sum up to the targetSum. So for example
print(twoNumberSum([3, 5, -4, 8, 11, 1, -1, 6],10)) should return [11,-1]
def twoNumberSum(array, targetSum):
    for i in array:
        remainder = targetSum - i
        if (remainder != i) and (remainder in array):
            return [i, remainder]
    return []
The code works but it is said to execute in O(n) time. My intuition is this - we first loop through the array and choose a number. For each number, we find the remainder. Using each remainder, we again loop through the entire array. Shouldn't this be an O(n^2) operation? Is the in operation in python not an O(n) operation?
The in operation has different complexities depending on the type of container it is applied to. Here i in array becomes array.__contains__(i), and array is a list-type container.
(list, tuple) as you guessed are O(n).
Trees would be average O(log n).
set/dict - Average: O(1), Worst: O(n).
See this Document if you have any further queries.
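A quick way to see the difference in practice is the standard timeit module (a sketch; the container sizes are arbitrary):
import timeit

# Membership test for a value near the end of a million-element container.
print(timeit.timeit('999999 in c', setup='c = list(range(10**6))', number=100))  # list: O(n) scan
print(timeit.timeit('999999 in c', setup='c = set(range(10**6))', number=100))   # set: O(1) on average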
Take a look at this. For the case of a list, it makes no sense that this will take less than O(n^2) time: the outer loop takes O(n) time, and each iteration takes O(n) time to check whether the element is present or not.
If instead of a list you use a dict (or a set), then the in operation is O(1) on average, and the whole of this code takes linear time.
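A sketch of that linear-time variant of the question's function (one set built up front; the remainder != i check is kept from the original):
def twoNumberSum(array, targetSum):
    seen = set(array)              # O(n) to build; membership tests are O(1) on average
    for i in array:
        remainder = targetSum - i
        if remainder != i and remainder in seen:
            return [i, remainder]
    return []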

Filtering a list to take its highest values

I have a program which works with a for loop, but it's too slow, and I need to speed it up.
I have a reverse-sorted list of probabilities whose sum is 1. There are over 5 million items.
I want to take the highest probabilities, i.e. the first n items whose collective sum is 0.9999.
This was my code:
for b in sorted_list:
    new_list.append(b)
    if sum(new_list) > 0.9999:
        break
Can anyone suggest a quicker method?
Thank you
Edit: I found that this question was asked before - stackexchange link
however, the suggestions all make use of loops, so I don't think they will be any quicker. Someone at the end suggested a list comprehension, so I am going to google that and see what it means! Thank you
Keep a running sum instead of recomputing it every step for the whole list. I.e.
running_sum = 0
for b in sorted_list:
    new_list.append(b)
    running_sum += b
    if running_sum > 0.9999:
        break
sum(iterable) has to visit all elements to calculate the sum. That is unnecessary as you can reuse the sum from the previous iteration.
The built-in tool to accumulate such a sum is, well, itertools.accumulate. Moreover, you don't have to append repeatedly. Instead, you can take a single slice at the end:
from itertools import accumulate

for i, s in enumerate(accumulate(sorted_list)):
    if s > 0.9999:
        break
new_list = sorted_list[:i+1]
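With over 5 million items, a vectorized version may be faster still in practice. This is a sketch assuming NumPy is available (it is not part of the original answer):
import numpy as np

cumulative = np.cumsum(sorted_list)  # all running sums in one vectorized pass
# Index of the first running sum that exceeds 0.9999.
cutoff = int(np.searchsorted(cumulative, 0.9999, side='right'))
new_list = sorted_list[:cutoff + 1]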

Reasoning behind Quadratic Complexity in this particular code

I am asking in reference to this code:
array = [-37,-36,-19,-99,29,20,3,-7,-64,84,36,62,26,-76,55,-24,84,49,-65,41]
print sum(i for i in array if array.index(i) % 2 == 0)*array[-1] if array != [] else 0
You could see it here: Python code for sum with condition
Is this quadratic because of the for loop followed by an if statement inside the brackets?
Is the code proposed by another person on the same page - array[-1] * sum(array[::2]) - free of quadratic behaviour?
I think it's again quadratic, as it still has to do a traversal, albeit one that skips alternate elements.
Thanks in advance.
Yes, it's the array.index that makes it quadratic.
Let's first cut away all the irrelevant stuff. The conditionals are irrelevant for the complexity reasoning (we will have array != [], and that check takes O(1) time). The same goes for the multiplication by array[-1]. So you're left with:
sum(i for i in array if array.index(i) % 2 == 0)
Now the inner part is a generator: it expands to an anonymous function that loops through array and yields values, at most one per iteration. The sum function receives these values and adds them up.
The confusing thing may be how a generator actually works: it runs intermixed with code from its consumer. As a result, the total complexity is the sum of the complexity of the generator and that of the consumer (i.e. sum). Now sum itself has linear complexity.
So for the generator, it loops through the array, but for each element in the array it calls array.index which is of O(N) complexity.
To fix this you might use enumerate to avoid calling array.index(i). This may or may not be what you want, since array.index(i) returns the first index at which the element equals i, which might not be the index where you actually found i:
sum(i for idx, i in enumerate(array) if idx % 2 == 0)
To see the difference, consider the list array = [0, 1, 2, 2]. The first solution sums this to 4, since array.index(2) == 2 means the second 2 is also added. The latter solution adds up to only 2, since enumerate yields the pairs (0,0), (1,1), (2,2), (3,2) - the first component being the actual index and the second the element - so the second 2 is omitted because it actually comes from index 3.
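A quick check of that difference:
array = [0, 1, 2, 2]
print(sum(i for i in array if array.index(i) % 2 == 0))       # 4: array.index(2) == 2, so both 2s are added
print(sum(i for idx, i in enumerate(array) if idx % 2 == 0))  # 2: only the elements at indexes 0 and 2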
The first solution is indeed quadratic: each call to array.index iterates over array again, so it behaves like a nested loop.
The second solution traverses the list only once, skipping the odd indexes.

Fastest way to return duplicate element in list and also find missing element in list?

So my code is as shown below. The input is a list with exactly one duplicate item and one missing item. The answer is a list of two elements, the first of which is the duplicate element in the list and the second the missing element in the range 1 to n.
Example: [1,4,2,5,1], answer = [1,3]
The code below works.
Am I wrong about the complexity being O(n), and is there any faster way of achieving this in Python?
Also, is there any way I can do this without using extra space?
Note: the elements may be of the order 10^5 or larger.
n = max(A)
answer = []
seen = set()
for i in A:
    if i in seen:
        answer.append(i)
    else:
        seen.add(i)
for i in xrange(1, n):
    if i not in A:
        answer.append(i)
print answer
You are indeed correct that the hashing part of this algorithm is O(n), which is the best you can achieve. (One caveat about the code as written: in the second loop, i not in A scans the whole list, costing O(n) per check and making that loop O(n^2); test i not in seen instead to keep everything O(n).) You can try to optimize it by aborting the search as soon as you find the duplicate value, but in the worst case your duplicate is at the back of the list and you still need to traverse it completely.
The use of hashing (your use of a set) is a good solution. There are a lot of other approaches, for instance the use of collections.Counter, but this won't change the asymptotic complexity of the algorithm.
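A sketch of the Counter variant (same O(n) complexity; the helper name is made up, and like the original code it assumes the missing value is smaller than max(A)):
from collections import Counter

def find_duplicate_and_missing(A):
    counts = Counter(A)  # O(n) counting pass
    duplicate = next(x for x, c in counts.items() if c == 2)
    missing = next(i for i in range(1, max(A) + 1) if i not in counts)
    return [duplicate, missing]

# find_duplicate_and_missing([1, 4, 2, 5, 1]) -> [1, 3]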
As @Emisor advises, you can leverage the information that you have a list with 1 duplicate and 1 missing value. As you might know, if you had a list with no duplicate and no missing value, summing up all elements of the list would give 1+2+3+...+n, which can be rewritten as the mathematical equivalent n*(n+1)/2.
When you've discovered the duplicate value, you can calculate the missing value without having to perform:
for i in xrange(1, n):
    if i not in A:
        answer.append(i)
Since you know the sum if all values were present: total = n*(n+1)/2 = 15, and you know which value is duplicated. Taking the sum of the array A = [1,4,2,5,1], which is 13, and removing the duplicated value 1 results in 12.
Subtracting that 12 from the calculated total gives 15 - 12 = 3, the missing value.
This all can be written in a single line (note that n == len(A) here, since the duplicate occupies the slot of the missing value):
(len(A) * (len(A) + 1)) // 2 - sum(A) + duplicate
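A quick check of the corrected one-liner on the example from the question:
A = [1, 4, 2, 5, 1]
duplicate = 1
missing = (len(A) * (len(A) + 1)) // 2 - sum(A) + duplicate
print(missing)  # 3: total 15, array sum 13, duplicate 1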
A slight optimization (I think):
def lalala2(A):
    _max = 0
    _sum = 0
    seen = set()
    duplicate = None
    for i in A:
        _sum += i
        if _max < i:
            _max = i
        if i in seen:
            duplicate = i
        elif duplicate is None:
            seen.add(i)
    missing = -_sum + duplicate + (_max * (_max + 1) / 2)  # last term: the sum of every number from 1 to N
    return [duplicate, missing]
It looks a bit uglier, and I'm doing things like sum() and max() by hand instead of relying on Python's tools, but this way we only visit every element once. It also stops adding items to the set once it has found the duplicate, since the missing element can be calculated from the duplicate once the max is known.

Python lists, dictionary optimization

I have been attending a couple of hackathons. I am beginning to understand that writing code is not enough. The code has to be optimized. That brings me to my question. Here are two questions that I faced.
def pairsum(numbers, k):
    """Write a function that returns two values in numbers whose sum is K"""
    for i, j in numbers:
        if i != j:
            if i + j == k:
                return i, j
I wrote this function. And I was kind of stuck with optimization.
Next problem.
string = "ksjdkajsdkajksjdalsdjaksda"
def dedup(string):
""" write a function to remove duplicates in the variable string"""
output = []
for i in string:
if i not in output:
output.append(i)
These are two very simple programs that I wrote, but I am stuck with optimization after this. More to the point: when we optimize code, how does the complexity reduce? Any pointers will help. Thanks in advance.
Knowing the most efficient Python idioms and also designing code that can reduce iterations and bail out early with an answer is a major part of optimization. Here are a few examples:
List comprehensions and generators are usually fastest:
With a straightforward nested approach, a generator is faster than a for loop:
def pairsum(numbers, k):
    """Returns two unique values in numbers whose sum is k"""
    return next((i, j) for i in numbers for j in numbers if i + j == k and i != j)
This is probably faster on average, since it only goes through one iteration at most and does not check whether a possible result is in numbers unless k-i != i:
def pairsum(numbers, k):
    """Returns two unique values in numbers whose sum is k"""
    return next((k - i, i) for i in numbers if k - i != i and k - i in numbers)
Output:
>>> pairsum([1,2,3,4,5,6], 8)
(6, 2)
Note: I assumed numbers was a flat list since the doc string did not mention tuples and it makes the problem more difficult which is what I would expect in a competition.
For the second problem, if you are to create your own function as opposed to just using ''.join(set(s)) you were close:
def dedup(s):
    """Returns a string with duplicate characters removed from string s"""
    output = ''
    for c in s:
        if c not in output:
            output += c
    return output
Tip: Do not use string as a name; it shadows the standard library string module.
You can also do:
def dedup(s):
    for c in s:
        s = c + s.replace(c, '')
    return s
or a much faster recursive version:
def dedup(s, out=''):
    s0, s = s[0], s.replace(s[0], '')
    return dedup(s, out + s0) if s else out + s0
but not as fast as set for strings without lots of duplicates:
def dedup(s):
    return ''.join(set(s))
Note: set() will not preserve the order of the remaining characters while the other approaches will preserve the order based on first occurrence.
Your first program is a little vague. I assume numbers is a list of tuples or something, like [(1,2), (3,4), (5,6)]? If so, your program is pretty good from a complexity standpoint - it's O(n). Perhaps you want a slightly more Pythonic solution? The neatest way to clean this up would be to join your conditions:
if i != j and i + j == k:
But this simply increases readability. I think it may also add an additional boolean operation, so it might not be an optimization.
I am not sure if you intended for your program to return the first pair of numbers which sum to k, but if you wanted all pairs which meet this requirement, you could write a comprehension:
def pairsum(numbers, k):
    return list((i, j) for i, j in numbers if i != j and i + j == k)
In that example, I used a generator expression instead of a list comprehension so as to conserve resources - generators are functions which act like iterators, meaning that they can save memory by only giving you data when you need it. This is called lazy iteration. (The list() call above materializes the results at the end; drop it if you want to keep the laziness.)
You can also use a filter, which is a function which returns only the elements from a set for which a predicate returns True. (That is, the elements which meet a certain requirement.)
import itertools

def pairsum(numbers, k):
    return list(itertools.ifilter(lambda t: t[0] != t[1] and t[0] + t[1] == k, ((i, j) for i, j in numbers)))
But this is less readable in my opinion.
Your second program can be optimized using a set. If you recall from any discrete mathematics you may have learned in grade school or university, a set is a collection of unique elements - in other words, a set has no duplicate elements.
def dedup(mystring):
    return set(mystring)
The algorithm to find the unique elements of a collection is generally going to be O(n^2) in time if it is O(1) in space. If you allow yourself to allocate more memory, you can use a binary search tree to reduce the time complexity to O(n log n); Python sets actually go further, using hash tables with O(1) average look-up.
Your solution took O(n^2) time but also O(n) space, because you created a new list which could, if the input already consisted of unique elements, take up the same amount of space - and for every character in the string, you iterated over the output. That's essentially O(n^2) (although I think it's actually O(n*m), where m is the number of unique characters, but whatever). I hope you see why this is. Read the Binary Search Tree article to see how it improves your code. I don't want to re-implement one again... freshman year was so grueling!
The key to optimization is basically to figure out a way to make the code do less work, in terms of the total number of primitive steps that need to be performed. Code that employs control structures like nested loops quickly inflates the number of primitive steps needed. Optimization is therefore often about replacing loops iterating over a full list with something more clever.
I had to change the unoptimized pairsum() method slightly to make it usable:
def pairsum(numbers, k):
    """
    Write a function that returns two values in numbers whose sum is K
    """
    for i in numbers:
        for j in numbers:
            if i != j:
                if i + j == k:
                    return i, j
Here we see two loops, one nested inside the other. When describing the time complexity of a method like this, we often say that it is O(n²): when the length of the numbers array grows proportionally to n, the number of primitive steps grows proportionally to n². Specifically, the i+j == k conditional is evaluated exactly len(numbers)**2 times.
The clever thing we can do here is to presort the array at the cost of O(n log(n)) which allows us to hone in on the right answer by evaluating each element of the sorted array at most one time.
def fast_pairsum(numbers, k):
    sortedints = sorted(numbers)
    low = 0
    high = len(numbers) - 1
    i = sortedints[0]
    j = sortedints[-1]
    while low < high:
        diff = i + j - k
        if diff > 0:
            # Too high, let's lower.
            high -= 1
            j = sortedints[high]
        elif diff < 0:
            # Too low, let's increase.
            low += 1
            i = sortedints[low]
        else:
            # Just right.
            return i, j
    raise Exception('No solution')
These kinds of optimization only begin to really matter when the size of the problem becomes large. On my machine the break-even point between pairsum() and fast_pairsum() is with a numbers array containing 13 integers. For smaller arrays pairsum() is faster, and for larger arrays fast_pairsum() is faster. As the size grows fast_pairsum() becomes drastically faster than the unoptimized pairsum().
The clever thing to do for dedup() is to avoid having to linearly scan through the output list to find out whether you've already seen a character. This can be done by storing the characters you've seen in a set, which has O(1) average look-up cost rather than the O(n) look-up cost of a regular list.
With the outer loop, the total cost becomes O(n) rather than O(n²).
def fast_dedup(string):
    # If we didn't care about the order of the characters in the
    # returned string we could simply do:
    #     return set(string)
    seen = set()
    output = []
    seen_add = seen.add
    output_append = output.append
    for i in string:
        if i not in seen:
            seen_add(i)
            output_append(i)
    return output
On my machine the break-even point between dedup() and fast_dedup() is with a string of length 30.
The fast_dedup() method also shows another simple optimization trick: moving as much of the code out of the loop bodies as possible. Since looking up the add() and append() members in the seen and output objects takes time, it is cheaper to do it once outside the loop bodies and store references to those members in variables that are used repeatedly inside the loop bodies.
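To reproduce this kind of break-even measurement, the standard timeit module works well. A sketch, assuming dedup and fast_dedup are defined in the same script:
import timeit

setup = 'from __main__ import dedup, fast_dedup; s = "abcdefghij" * 5'
print(timeit.timeit('dedup(s)', setup=setup, number=10000))
print(timeit.timeit('fast_dedup(s)', setup=setup, number=10000))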
To properly optimize Python, one needs to find a good algorithm for the problem and a Python idiom close to that algorithm. Your pairsum example is a good case. First, your implementation appears wrong: numbers is most likely a sequence of numbers, not a sequence of pairs of numbers. Thus a naive implementation would look like this:
def pairsum(numbers, k):
    """Write a function that returns two values in numbers whose sum is K"""
    for i in numbers:
        for j in numbers:
            if i != j and i + j == k:
                return i, j
This will perform n^2 iterations, n being the length of numbers. For small ns this is not a problem, but once n gets into hundreds, the nested loops will become visibly slow, and once n gets into thousands, they will become unusable.
An optimization would be to recognize the difference between the inner and the outer loops: the outer loop traverses over numbers exactly once, and is unavoidable. The inner loop, however, is only used to verify that the other number (which has to be k - i) is actually present. This is a mere lookup, which can be made extremely fast by using a dict, or even better, a set:
def pairsum(numbers, k):
    """Write a function that returns two values in numbers whose sum is K"""
    numset = set(numbers)
    for i in numbers:
        # k - i != i mirrors the earlier answers' guard against pairing a value with itself
        if k - i != i and k - i in numset:
            return i, k - i
This is not only faster by a constant factor because we're using a built-in operation (set lookup) instead of a Python-coded loop; it actually does less work, because set uses a smarter lookup algorithm that runs in constant time on average.
Optimizing dedup in the analogous fashion is left as an exercise for the reader.
Your string one, preserving order, is most easily (and fairly efficiently) written as:
from collections import OrderedDict
new_string = ''.join(OrderedDict.fromkeys(old_string))
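On Python 3.7 and later, plain dicts also preserve insertion order, so ''.join(dict.fromkeys(old_string)) achieves the same result without the import.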
