Solving the "firstDuplicate" question in Python


I'm trying to solve the following challenge from codesignal.com:
Given an array a that contains only numbers in the range from 1 to a.length, find the first duplicate number for which the second occurrence has the minimal index. In other words, if there are more than 1 duplicated numbers, return the number for which the second occurrence has a smaller index than the second occurrence of the other number does. If there are no such elements, return -1.
Example
For a = [2, 1, 3, 5, 3, 2], the output should be
firstDuplicate(a) = 3.
There are 2 duplicates: numbers 2 and 3. The second occurrence of 3 has a smaller index than the second occurrence of 2 does, so the answer is 3.
For a = [2, 4, 3, 5, 1], the output should be
firstDuplicate(a) = -1.
The execution time limit is 4 seconds.
The guaranteed constraints were:
1 ≤ a.length ≤ 10^5, and
1 ≤ a[i] ≤ a.length
So my code was:
def firstDuplicate(a):
    b = a
    if len(list(set(a))) == len(a):
        return -1
    n = 0
    answer = -1
    starting_distance = float("inf")
    while n!=len(a):
        value = a[n]
        if a.count(value) > 1:
            place_of_first_number = a.index(value)
            a[place_of_first_number] = 'string'
            place_of_second_number = a.index(value)
            if place_of_second_number < starting_distance:
                starting_distance = place_of_second_number
                answer = value
            a=b
        n+=1
        if n == len(a)-1:
            return answer
    return answer
Out of the 22 tests the site had, I passed all of them up to #21; after that, the test list was large and the execution time exceeded 4 seconds. What are some tips for reducing the execution time while keeping the code more or less the same?

As @erip pointed out in the comments, you can iterate through the list and add each item to a set. If an item is already in the set, it is the duplicate whose second occurrence has the lowest index, so you can return it immediately. If the loop ends without finding a duplicate, return -1:
def firstDuplicate(a):
    seen = set()
    for i in a:
        if i in seen:
            return i
        seen.add(i)
    return -1
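Separately from speed, it may be worth noting that `b = a` in the question's code binds a second name to the same list rather than copying it, so the later `a = b` does not undo the `'string'` overwrite. A small sketch of that behaviour (the names `original`, `alias`, and `copy` are illustrative, not from the question):

```python
original = [2, 1, 3, 5, 3, 2]

# Plain assignment creates a second name for the same list object.
alias = original
alias[0] = 'string'      # also visible through `original`

# list(...) creates an independent shallow copy.
copy = list(original)
copy[1] = 'changed'      # does not affect `original`
```

After this runs, `original[0]` is `'string'` while `original[1]` is still `1`.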

Create a new set and check whether each element is already in it; if it is, return that element:
def firstDuplicate(a):
    dup = set()
    for i in range(len(a)):
        if a[i] in dup:
            return a[i]
        else:
            dup.add(a[i])
    return -1

This is just an idea; I didn't verify it, but it should work. There seems to be no memory limit, only a time limit, so trading space for time is probably a practical approach. The computational complexity is O(n). This algorithm also depends on the condition that the numbers are in the range from 1 to len(a).
def first_duplicate(a):
    len_a = len(a)
    b = [len_a + 1] * len_a
    for i, n in enumerate(a):
        n0 = n - 1
        if b[n0] == len_a + 1:
            b[n0] = len_a
        elif b[n0] == len_a:
            b[n0] = i
    min_i = len_a
    min_n = -1
    for n0, i in enumerate(b):
        if i < min_i:
            min_i = i
            min_n = n0 + 1
    return min_n
Update:
This solution is not as fast as the set() solution by @blhsing. However, the comparison might come out differently if it were implemented in C. It's kind of unfair, since set is a built-in type that is implemented in C, like other core parts of CPython.
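A possible simplification of the same space-for-time idea, sketched under the assumption that every value really is an integer in the range 1 to len(a): because the list is scanned left to right, the first value found already marked is the answer, so a list of boolean flags is enough. The function name `first_duplicate_flags` is hypothetical.

```python
def first_duplicate_flags(a):
    """Return the value whose second occurrence comes first, or -1."""
    # Assumption: every value is an integer in the range 1..len(a),
    # so it can be used directly as an index into `seen`.
    seen = [False] * (len(a) + 1)  # index 0 is unused
    for value in a:
        if seen[value]:
            return value           # first repeat met while scanning
        seen[value] = True
    return -1
```

Whether this is faster than the set version would need to be measured; it is shown only to illustrate how the range constraint can be used.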

Related

Python implementation of the mergeSort algorithm

I came across the following implementation of the mergeSort algorithm:
def merge_sort(x):
    merge_sort2(x,0,len(x)-1)
def merge_sort2(x,first,last):
    if first < last:
        middle = (first + last) // 2
        merge_sort2(x,first,middle)
        merge_sort2(x,middle+1,last)
        merge(x,first,middle,last)
def merge(x,first,middle,last):
    L = x[first:middle+1]
    R = x[middle+1:last+1]
    L.append(999999999)
    R.append(999999999)
    i=j=0
    for k in range(first,last+1):
        if L[i] <= R[j]:
            x[k] = L[i]
            i += 1
        else:
            x[k] = R[j]
            j += 1
x = [17, 87, 6, 22, 41, 3, 13, 54]
x_sorted = merge_sort(x)
print(x)
I get most of it. However, what I don't understand are the following four lines of the merge function:
L = x[first:middle+1]
R = x[middle+1:last+1]
L.append(999999999)
R.append(999999999)
First of all: why does the slicing end with middle+1 ? Slicing an array in Python includes the last element, right? So, shouldn't it be sufficient to slice from first:middle ? So, what is the +1 there for?
Secondly: Why do I have to append the huge number to the lists? Why doesn't it work without? It doesn't, I checked that. But I just don't know why.

Q1: Slicing an array in Python includes the last element, right?
No. Like the range function, Python slicing doesn't include the end index.
>>> a = [1, 2, 3, 4, 5]
>>> a[1:4]
[2, 3, 4]
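As a quick check of the half-open rule against the question's own slices, the sketch below shows that x[first:middle+1] and x[middle+1:last+1] split the range with no overlap and no gap, while dropping the +1 would lose the element at index middle. The names `left`, `right`, and `left_without_plus_one` are illustrative.

```python
x = [17, 87, 6, 22, 41, 3, 13, 54]
first, last = 0, len(x) - 1
middle = (first + last) // 2          # 3 for this list

left = x[first:middle + 1]            # indexes first..middle inclusive
right = x[middle + 1:last + 1]        # indexes middle+1..last inclusive

# Without the +1 the slice stops one element early.
left_without_plus_one = x[first:middle]
```

Here `left` is `[17, 87, 6, 22]`, `right` is `[41, 3, 13, 54]`, and `left_without_plus_one` is `[17, 87, 6]`.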
Q2: Regarding the below snippet.
L = x[first:middle+1]
R = x[middle+1:last+1]
L.append(999999999)
R.append(999999999)
Without appending those large numbers to the lists, your merge code would have to look something like this:
# Copy data to temp arrays L[] and R[]
L = x[first:middle+1]
R = x[middle+1:last+1]
i = j = 0
k = first
while i < len(L) and j < len(R):
    if L[i] <= R[j]:
        x[k] = L[i]
        i += 1
    else:
        x[k] = R[j]
        j += 1
    k += 1
# Copy any elements left over in L
while i < len(L):
    x[k] = L[i]
    i += 1
    k += 1
# Copy any elements left over in R
while j < len(R):
    x[k] = R[j]
    j += 1
    k += 1
As @Cedced_Bro pointed out in the comment section, those large numbers are sentinels that signal that the end of one of the sides has been reached.
In the snippet above, when one list runs out of numbers we leave the first while loop, and the remaining elements of the other list, if any, are copied into x.
Appending those large numbers is a clever way to avoid those two extra while loops, but it costs some unnecessary comparisons of 999999999 with the remaining elements in the other list.
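The sentinel trick only works while every real element is smaller than the sentinel, so 999999999 would break on larger inputs. A minimal sketch of the same in-place algorithm using float("inf") as the sentinel instead, assuming the list holds finite numbers only (the default arguments on `merge_sort` are my own convenience, not part of the original code):

```python
def merge(x, first, middle, last):
    # Assumption: x contains only finite numbers, so infinity compares
    # greater than every real element and is never copied into x.
    left = x[first:middle + 1] + [float("inf")]
    right = x[middle + 1:last + 1] + [float("inf")]
    i = j = 0
    for k in range(first, last + 1):
        if left[i] <= right[j]:
            x[k] = left[i]
            i += 1
        else:
            x[k] = right[j]
            j += 1


def merge_sort(x, first=0, last=None):
    """Sort x in place; like list.sort(), this returns None."""
    if last is None:
        last = len(x) - 1
    if first < last:
        middle = (first + last) // 2
        merge_sort(x, first, middle)
        merge_sort(x, middle + 1, last)
        merge(x, first, middle, last)


data = [17, 87, 6, 22, 41, 3, 13, 54]
merge_sort(data)
```

Because the sort happens in place and nothing is returned, assigning the call's result (as `x_sorted = merge_sort(x)` does in the question) stores None; the sorted values are in the original list.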

You don't really need the spaghetti-style helper function; simple recursion will do. From https://rosettacode.org/wiki/Sorting_algorithms/Merge_sort#Python:
from heapq import merge
def merge_sort(m):
    if len(m) <= 1:
        return m
    middle = len(m) // 2
    left = m[:middle]
    right = m[middle:]
    left = merge_sort(left)
    right = merge_sort(right)
    return list(merge(left, right))
The indexing here doesn't need +1, since two Python slices that share the same boundary index don't overlap, i.e.
>>> x = [1,2,3,4,5,6]
>>> middle = 4
>>> x[:middle]
[1, 2, 3, 4]
>>> x[middle:]
[5, 6]
Moreover the heapq implementation of merge would have been more optimal than what you can write =)

Count all pairs with given XOR

Given a list of size N. Find the number of pairs (i, j) such that A[i] XOR A[j] = x, and 1 <= i < j <= N.
Input : list = [3, 6, 8, 10, 15, 50], x = 5
Output : 2
Explanation : (3 ^ 6) = 5 and (10 ^ 15) = 5
This is my code (brute force):
import itertools
n=int(input())
pairs=0
l=list(map(int,raw_input().split()))
q=[x for x in l if x%2==0]
p=[y for y in l if y%2!=0]
for a, b in itertools.combinations(q, 2):
    if (a^b!=2) and ((a^b)%2==0) and (a!=b):
        pairs+=1
for a, b in itertools.combinations(p, 2):
    if (a^b!=2) and ((a^b)%2==0) and (a!=b):
        pairs+=1
print pairs
How can I do this more efficiently, in O(n) time, in Python?

Observe that if A[i]^A[j] == x, this implies that A[i]^x == A[j] and A[j]^x == A[i].
So, an O(n) solution would be to build an associative map (dict) where each key is an item from A and each value is the count of that item, and then iterate through it. For each item, calculate A[i]^x and check whether A[i]^x is in the map. If it is, then A[i]^A[j] == x for some j. Since the map holds the count of all items that equal A[j], the number of pairs contributed is num_Ai * num_Aj. Note that each pair will be counted twice since XOR is commutative (i.e. A[i]^A[j] == A[j]^A[i]), so we have to divide the final count by 2.
def create_count_map(lst):
    result = {}
    for item in lst:
        if item in result:
            result[item] += 1
        else:
            result[item] = 1
    return result
def get_count(lst, x):
    count_map = create_count_map(lst)
    total_pairs = 0
    for item in count_map:
        xor_res = item ^ x
        if xor_res in count_map:
            total_pairs += count_map[xor_res] * count_map[item]
    return total_pairs // 2
print(get_count([3, 6, 8, 10, 15, 50], 5))
print(get_count([1, 3, 1, 3, 1], 2))
outputs
2
6
as desired.
Why is this O(n)?
Converting a list to a dict s.t. the dict contains the count of each item in the list is O(n) time.
Calculating item ^ x is O(1) time, and calculating whether this result is in a dict is also O(1) time. dict key access is also O(1), and so is multiplication of two elements. We do all this n times, hence O(n) time for the loop.
O(n) + O(n) reduces to O(n) time.
Edited to handle duplicates correctly.

The accepted answer does not give the correct result for X=0. This code handles that case by counting the pairs of equal elements. You can modify it to get answers for other values as well.
def calculate(a):
    # Finding the maximum of the array
    maximum = max(a)
    # Creating frequency array
    # With initial value 0
    frequency = [0 for x in range(maximum + 1)]
    # Traversing through the array
    for i in a:
        # Counting frequency
        frequency[i] += 1
    answer = 0
    # Traversing through the frequency array
    for i in frequency:
        # Calculating answer
        answer = answer + i * (i - 1) // 2
    return answer
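The two ideas could be combined into a single function, sketched here under the assumption that the inputs are non-negative integers. For x == 0 each value pairs only with its own copies, giving c * (c - 1) / 2 pairs per value; for any other x, the count-map product is halved because each pair is seen from both sides. The name `count_xor_pairs` is hypothetical.

```python
from collections import Counter


def count_xor_pairs(values, x):
    """Count index pairs (i, j), i < j, with values[i] ^ values[j] == x."""
    counts = Counter(values)
    if x == 0:
        # a ^ b == 0 only when a == b: choose 2 out of each group of equals.
        return sum(c * (c - 1) // 2 for c in counts.values())
    total = 0
    for item, c in counts.items():
        partner = item ^ x
        if partner in counts:
            total += c * counts[partner]
    # For x != 0, item and partner differ, so every pair was counted twice.
    return total // 2
```

For the two inputs used in the accepted answer this gives 2 and 6, and for `[1, 1, 1]` with x = 0 it gives 3.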

Find Triplets smaller than a given number

I am trying to solve a problem where:
Given an array of n integers nums and a target, find the number of
index triplets i, j, k with 0 <= i < j < k < n that satisfy the
condition nums[i] + nums[j] + nums[k] < target.
For example, given nums = [-2, 0, 1, 3], and target = 2.
Return 2. Because there are two triplets which sums are less than 2:
[-2, 0, 1] [-2, 0, 3]
My algorithm: Remove a single element (number_1) from the list, set target = target - number_1, and search for doublets such that number_2 + number_3 < target - number_1. Problem solved.
The problem link is https://leetcode.com/problems/3sum-smaller/description/ .
My solution is:
def threeSumSmaller(nums, target):
    """
    :type nums: List[int]
    :type target: int
    :rtype: int
    """
    nums = sorted(nums)
    smaller = 0
    for i in range(len(nums)):
        # Create temp array excluding a number
        if i!=len(nums)-1:
            temp = nums[:i] + nums[i+1:]
        else:
            temp = nums[:len(nums)-1]
        # Sort the temp array and set new target to target - the excluded number
        l, r = 0, len(temp) -1
        t = target - nums[i]
        while(l<r):
            if temp[l] + temp[r] >= t:
                r = r - 1
            else:
                smaller += 1
                l = l + 1
    return smaller
My solution fails:
Input:
[1,1,-2]
1
Output:
3
Expected:
1
I don't understand why the error occurs, since my solution passes more than 30 test cases.
Thanks for your help.

One main point is that when you sort the elements in the first line, you also lose the indexes. This means that, despite having found a triplet, you can never be sure whether your (i, j, k) satisfies the condition 0 <= i < j < k < n, because those (i, j, k) come from the new list, not from the original one.
Additionally: every time you pluck an element from the middle of the array, the remaining part of the array is iterated again (although in an irregular way, it still starts from the first of the remaining elements in temp). This should not be the case! Expanding on the details:
The example iterates 3 times over the list (which is, again, sorted, so you lose the true i, j, and k indexes):
First iteration (i = 0, temp = [1, -2], t = 0).
When you sum temp[l] + temp[r] (l, r are 0, 1) it will be -1.
That is lower than t, so smaller will increase.
The second iteration will be like the first, but with i = 1.
Again smaller will increase.
The third one will increase it as well, because t = 3 and the sum will be 2 now.
So you count the triplet three times (even though only one tuple can be formed in index order) because you are iterating through permutations of the indexes instead of combinations of them. So there are two things you did not take care of:
Preserving indexes while sorting.
Ensuring you iterate over the indexes in a forward fashion only.
Better to try it like this:
def find(elements, upper_bound):
    result = 0
    for i in range(0, len(elements) - 2):
        upper_bound2 = upper_bound - elements[i]
        for j in range(i+1, len(elements) - 1):
            upper_bound3 = upper_bound2 - elements[j]
            for k in range(j+1, len(elements)):
                upper_bound4 = upper_bound3 - elements[k]
                if upper_bound4 > 0:
                    result += 1
    return result
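The triple loop above is O(n^3). If that turns out to be too slow, a two-pointer variant is one possible O(n^2) alternative; this is a sketch, not part of the answer above. It relies on the observation that only the count is asked for, and the count depends on which three positions are chosen rather than on their order, so sorting a copy is safe as long as each combination of positions is visited exactly once. The name `three_sum_smaller` is hypothetical.

```python
def three_sum_smaller(nums, target):
    """Count triplets of positions whose values sum to less than target."""
    ordered = sorted(nums)  # sorting a copy; the count is unaffected
    count = 0
    for i in range(len(ordered) - 2):
        lo, hi = i + 1, len(ordered) - 1
        while lo < hi:
            if ordered[i] + ordered[lo] + ordered[hi] < target:
                # Every position in lo+1..hi also works with this lo,
                # because those values are no larger than ordered[hi].
                count += hi - lo
                lo += 1
            else:
                hi -= 1
    return count
```

The first element is fixed at position i and the other two are always taken from positions after i, which is what prevents the same triplet from being counted more than once.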

Seems like you're counting the same triplet more than once...
In the first iteration of the loop, you omit the first 1 in the list, and then increase smaller by 1. Then you omit the second 1 in the list and increase smaller again by 1. And finally you omit the third element in the list, -2, and of course increase smaller by 1, because -- well -- in all these three cases you were in fact considering the same triplet {1,1,-2}.
p.s. It seems like you care more about correctness than performance. In that case, consider maintaining a set of the solution triplets, to ensure you're not counting the same triplet twice.

There are already good answers. Apart from that, if you want to check the result of your algorithm, you can use the standard-library function itertools.combinations:
import itertools
def find_(vector_,target):
    result=[]
    for i in itertools.combinations(vector_, r=3):
        if sum(i)<target:
            result.append(i)
    return result
Usage:
print(find_([-2, 0, 1, 3],2))
output:
[(-2, 0, 1), (-2, 0, 3)]
If you only want the count:
print(len(find_([-2, 0, 1, 3],2)))
output:
2

Code challenge: finding the divisible in a list

I am playing a code challenge. Simply speaking, the problem is:
Given a list L (max length is of the order of 1000) containing positive integers.
Find the number of "Lucky Triples", i.e. triples where L[i] divides L[j], and L[j] divides L[k].
For example, [1,2,3,4,5,6] should give the answer 3 because of [1,2,4], [1,2,6], and [1,3,6].
My attempt:
Sort the list (let's say there are n elements).
Three for loops: i, j, k (i from 1 to n-2), (j from i+1 to n-1), (k from j+1 to n).
The k loop is executed only if L[j] % L[i] == 0.
The algorithm seems to give the correct answer. But the challenge said that my code exceeded the time limit. I tried it on my computer with the list [1,2,3,...,2000]: count = 40888 (I guess it is correct). The time is around 5 seconds.
Is there any faster way to do that?
This is the code I have written in python.
def answer(l):
    l.sort()
    cnt = 0
    if len(l) == 2:
        return cnt
    for i in range(len(l)-2):
        for j in range(1,len(l)-1-i):
            if (l[i+j]%l[i] == 0):
                for k in range(1,len(l)-j-i):
                    if (l[i+j+k]%l[i+j] == 0):
                        cnt += 1
    return cnt

You can use additional space to help yourself. After you sort the input list, build a map/dict where each key is an element of the list and each value is the list of elements that are divisible by that key. For example, if the sorted list is [1,2,3,4,5,6], your map would be
1 -> [2,3,4,5,6]
2 -> [4,6]
3 -> [6]
4 -> []
5 -> []
6 -> []
Now, for every key in the map, you find what it divides, and then what each of those divides. For example, you know that 1 divides 2, and 2 divides 4 and 6; similarly, 1 divides 3, and 3 divides 6.
The complexity of sorting should be O(n log n), and that of constructing the map should be better than O(n^2) (but I am not sure about this part). I am also not sure about the complexity of actually checking for multiples, but I think this should be much, much faster than a brute-force O(n^3).
If someone could help me figure out the time complexity of this, I would really appreciate it.
EDIT:
You can make the map creation faster by stepping by X (and not 1), where X is the number in the list you are currently on, since the list is sorted.

Thank you all for your suggestions. They are brilliant. But it seems that I still can't pass the speed test, or I can't handle duplicated elements.
After discussing with my friend, I have just come up with another solution. It should be O(n^2), and it passed the speed test. Thanks all!!
def answer(lst):
    lst.sort()
    count = 0
    if len(lst) == 2:
        return count
    #for each middle element, count the divisors at the front and the multiples at the back. Then multiply them.
    for i, middle in enumerate(lst[1:len(lst)-1], start = 1):
        countfirst = 0
        countthird = 0
        for first in (lst[0:i]):
            if middle % first == 0:
                countfirst += 1
        for third in (lst[i+1:]):
            if third % middle == 0:
                countthird += 1
        count += countfirst*countthird
    return count

I guess sorting the list is pretty inefficient. I would rather try to iteratively reduce the number of candidates. You could do that in two steps.
First, filter out all numbers that do not have a divisor.
from itertools import combinations
candidates = [max(pair) for pair in combinations(l, 2) if max(pair)%min(pair) == 0]
After that, count the remaining candidates that do have a divisor.
result = sum(max(pair)%min(pair) == 0 for pair in combinations(candidates, 2))

Your original code, for reference.
def answer(l):
    l.sort()
    cnt = 0
    if len(l) == 2:
        return cnt
    for i in range(len(l)-2):
        for j in range(1,len(l)-1-i):
            if (l[i+j]%l[i] == 0):
                for k in range(1,len(l)-j-i):
                    if (l[i+j+k]%l[i+j] == 0):
                        cnt += 1
    return cnt
There are a number of misimplementations here, and with just a few tweaks we can probably get this running much faster. Let's start:
def answer(lst):  # I prefer not to use `l` because it looks like `1`
    lst.sort()
    count = 0  # use whole words here. No reason not to.
    if len(lst) == 2:
        return count
    for i, first in enumerate(lst):
        # using `enumerate` here means you can avoid ugly ranges and
        # saves you from a look up on the list afterwards. Not really a
        # performance hit, but definitely looks and feels nicer.
        for j, second in enumerate(lst[i+1:], start=i+1):
            # this is the big savings. You know since you sorted the list that
            # lst[1] can't divide lst[n] if n>1, but your code still starts
            # searching from lst[1] every time! Enumerating over `l[i+1:]`
            # cuts out a lot of unnecessary burden.
            if second % first == 0:
                # see how using enumerate makes that look nicer?
                for third in lst[j+1:]:
                    if third % second == 0:
                        count += 1
    return count
I bet that on its own will pass your speed test, but if not, you can check for membership instead. In fact, using a set here is probably a great idea!
def answer2(lst):
    s = set(lst)
    limit = max(s)  # we'll never have a valid product higher than this
    multiples = {}  # accumulator for our mapping
    for n in sorted(s):
        max_prod = limit // n  # n * (max_prod+1) > limit
        multiples[n] = [n*k for k in range(2, max_prod+1) if n*k in s]
    # in [1,2,3,4,5,6]:
    # multiples = {1: [2, 3, 4, 5, 6],
    #              2: [4, 6],
    #              3: [6],
    #              4: [],
    #              5: [],
    #              6: []}
    # multiples is now a mapping you can use a Depth- or Breadth-first-search on
    triples = sum(1 for j in multiples
                  for k in multiples.get(j, [])
                  for l in multiples.get(k, []))
    # This basically just looks up each starting value as j, then grabs
    # each valid multiple and assigns it to k, then grabs each valid
    # multiple of k and assigns it to l. For every possible combination there,
    # it adds 1 more to the result of `triples`
    return triples

I'll give you just an idea; the implementation is up to you:
Initialize the global counter to zero.
Sort the list, starting with the smallest number.
Create a list of integers (one entry per number, with the same index).
Iterate through each number (index i), and do the following:
Check for divisors at positions 0 to i-1.
Store the number of divisors in the list at position i.
Fetch the number of divisors from the list for each divisor, and add each number to the global counter.
Unless you are finished, continue with the next number.
Your result should be in the global counter.
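One way the steps above could be turned into code is sketched below. It is my reading of the idea rather than the answerer's own implementation, and it assumes that triples are counted by position in the sorted list, so duplicates count as separate elements. The names `count_lucky_triples` and `divisor_counts` are hypothetical.

```python
def count_lucky_triples(values):
    """Count triples (a, b, c) in sorted order with a | b and b | c."""
    nums = sorted(values)
    # divisor_counts[i] = how many earlier positions hold a divisor of nums[i]
    divisor_counts = [0] * len(nums)
    total = 0
    for i, current in enumerate(nums):
        for j in range(i):
            if current % nums[j] == 0:
                divisor_counts[i] += 1
                # Each divisor of nums[j] found earlier completes a triple
                # (that divisor, nums[j], current).
                total += divisor_counts[j]
    return total
```

The two nested loops make this O(n^2) after sorting, which for a list of about 1000 elements is roughly half a million modulo operations.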

python - checking if an array consisting of N integers is a permutation

I am analyzing the routine which checks if an array of N integers is a permutation (sequence containing each element from 1 to N).
I am new to Python. I can't grasp how this routine gets the correct answer. Could anybody explain the logic behind the loop, especially the use of counter[element-1]?
Is counter a built-in function that works on every element of A? Does counter[element-1] reference the position/value of elements of A by default because the loop is defined on an array?
A=[4,1,3,2]
def solution(A):
    counter = [0]*len(A)
    limit = len(A)
    for element in A:
        if not 1 <= element <= limit:
            return 0
        else:
            if counter[element-1] != 0:
                return 0
            else:
                counter[element-1] = 1
    return 1
Update:
I modified the code to see the values used within the loop, for example
def solution(A):
    counter = [0]*len(A)
    limit = len(A)
    for element in A:
        if not 1 <= element <= limit:
            print element
            print 'outside'
            return 0
        else:
            if counter[element-1] != 0:
                print 'element %d' % element
                print [element-1]
                print counter[element-1]
                return 0
            else:
                counter[element-1] = 1
                print 'element %d' % element
                print [element-1]
                print counter[element-1]
    return 1
gives me
element 4
[3]
1
element 1
[0]
1
element 3
[2]
1
element 2
[1]
1
1
I still don't get the logic. For example, for the first element, why does [3] give 1?

The idea behind the code is twofold. A permutation of the list [1, 2, ..., N] has two properties: it contains only elements between 1 and N, and each element appears exactly once in the list.
I will try to explain how the code implements this idea, part by part.
def solution(A):
    counter = [0]*len(A)
    limit = len(A)
Take the list [1, 3, 2] as an example.
counter is initialized as a list of zeros of size len(A) = 3. Each 0 corresponds to one of the values 1 to N: counter[element-1] records whether the value element has been seen yet.
    for element in A:
        if not 1 <= element <= limit:
            return 0
This condition is the easiest one. If the element is not in this range, the list cannot be a permutation of [1, 2, ..., N]. For instance, [1, 3, 2] is a permutation of [1, 2, 3] but [1, 6, 2] is not.
        else:
            if counter[element-1] != 0:
                return 0
            else:
                counter[element-1] = 1
This next part is related to the uniqueness of each term. The if checks whether a number equal to element has already passed through this loop. The second else makes sure that this number is marked, so if a repeated number is found in a later iteration, the if will be true and the function will return 0.
For instance, take the list [1, 2, 2]. The first 2 would not trigger the if, while the second 2 would trigger it, returning 0. On the other hand, [1, 3, 2] would never trigger the if.
If all the numbers pass these conditions, both properties hold and the list is a permutation.

Quite a cunning algorithm actually.
The input is a sequence of length N.
Each element of input is presumed to be an integer (if not, either comparison or indexing will throw an exception).
counter is an array of flags - of length N, too.
No integers outside of [1,N] range are allowed
No duplicates are allowed (see how it's done)
Can you now prove that the only way for both conditions to stay true is for the sequence to be a permutation?
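A proof is the real answer to that question, but for small N the claim can at least be checked exhaustively. The sketch below restates the routine from the question and compares it with a straightforward reference check on every sequence of length n drawn from 0..n+1. The helper names `is_permutation_reference` and `agree_for_length` are hypothetical, and the sketch assumes integer inputs.

```python
from itertools import product


def solution(A):
    # Same logic as the routine in the question.
    counter = [0] * len(A)
    limit = len(A)
    for element in A:
        if not 1 <= element <= limit:
            return 0
        if counter[element - 1] != 0:
            return 0
        counter[element - 1] = 1
    return 1


def is_permutation_reference(A):
    # Direct definition: sorted, the values are exactly 1, 2, ..., N.
    return sorted(A) == list(range(1, len(A) + 1))


def agree_for_length(n):
    """Compare both checks on every length-n sequence over 0..n+1."""
    for candidate in product(range(0, n + 2), repeat=n):
        expected = 1 if is_permutation_reference(candidate) else 0
        if solution(list(candidate)) != expected:
            return False
    return True
```

The underlying argument is a pigeonhole one: N values, all inside 1..N and with no repeats, must cover every value from 1 to N exactly once.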
