Efficiently find number of pairs of duplicates

I am trying to find an algorithm that returns the number of pairs of duplicates in a list.
Example:
Input: [13,4,8,4,13,7,13,9,13]
Output: 7
(Four 13s give 6 pairs and two 4s give 1 pair.)
Can my algorithm be made more efficient? I would like it to be faster than Theta(n^2).
Here is what I have:
my_List = [13,3,8,3,13,7,13,9,13]
pairs = 0
alreadySeen = []
for element in my_List:
    howMany = 0
    if element in alreadySeen:
        False
    else:
        howMany = my_List.count(element)
        pairs = pairs + ((howMany*(howMany-1))/2)
        howMany = 0
        alreadySeen.append(element)
print(pairs)

Here is an algorithm that runs in O(N).
Iterate over the elements once to create a dict of each element and its count.
The output of this step for your example is {13: 4, 4:2, 8:1, ...}
Iterate over that dict and calculate the number of pairs for each element. For an element with count c, the number of pairs is the number of ways to select 2 items from c, i.e. combinations without repetition: (c * (c-1)) / 2. So for an element that appears 4 times, there are (4 * 3) / 2 = 6 pairs.
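The two steps above can be sketched in Python as follows. This is only an illustration under a few assumptions: the function name `count_duplicate_pairs` is made up for the example, and `math.comb` (available from Python 3.8) stands in for the pairs formula.

```python
from math import comb  # Python 3.8+


def count_duplicate_pairs(items):
    """Count pairs of equal elements (hypothetical helper name)."""
    # Step 1: one pass to build {element: count}.
    counts = {}
    for item in items:
        counts[item] = counts.get(item, 0) + 1
    # Step 2: an element seen c times contributes C(c, 2) = c*(c-1)/2 pairs.
    return sum(comb(c, 2) for c in counts.values())


print(count_duplicate_pairs([13, 4, 8, 4, 13, 7, 13, 9, 13]))  # 7
```

Both passes touch each element or each distinct value once, so the total work is linear in the length of the list, assuming average O(1) dict operations.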

@Hesham Attia already provided the correct algorithm; here's a simple implementation with Counter:
>>> from collections import Counter
>>> l = [13,4,8,4,13,7,13,9,13]
>>> sum(x * (x - 1) // 2 for x in Counter(l).values())
7

Here is some JavaScript code that you can convert to Python; the complexity is linear, ~O(n).
var arr = [13,3,8,3,13,7,13,9,13];
var count = {};
var totalpairs = 0;
// First pass: count the occurrences of each value.
for (var i = 0; i < arr.length; i++) {
    if (count[arr[i]]) {
        count[arr[i]]++;
    } else {
        count[arr[i]] = 1;
    }
}
// Second pass: a value seen c times contributes c*(c-1)/2 pairs.
for (var key in count) {
    totalpairs = totalpairs + count[key] * (count[key] - 1) / 2;
}
console.log(' total pairs are ' + totalpairs);

Here is a simple way of counting disjoint duplicate pairs in a list (each element is used in at most one pair), with time complexity ~ O(N) as long as the seen elements are kept in a set. Note that this differs from the count in the question, which includes every pair of equal elements.
l = [13,3,8,3,13,7,13,9,13]
# Two pairs of 13 and one pair of 3
# Sum equals three
alreadySeen = set()
total_no_of_pairs = 0
for i in range(len(l)):
    if l[i] not in alreadySeen:
        alreadySeen.add(l[i])
    else:
        # l[i] is already in alreadySeen, so it completes a pair.
        # Remove it so the next occurrence starts a new pair.
        total_no_of_pairs += 1
        alreadySeen.remove(l[i])
print(total_no_of_pairs)
Output:
3

Time Complexity: O(N)
arr = list(map(int, input().split()))
d = {}
for i in range(len(arr)):
    if arr[i] in d.keys():
        d[arr[i]] += 1
    else:
        d[arr[i]] = 1
ans = 0
for val in d.values():
    if val > 1:
        ans += val*(val-1)//2
print(ans)

Related

Count pairs of elements in an array whose sum equals a given sum (but) do it in a single iteration(!)

Given an array of integers, and a number ‘sum’, find the number of pairs of integers in the array whose sum is equal to given ‘sum’ in a SINGLE iteration. (O(n) Time complexity is not enough!).
Usually, I would iterate through the array twice, once to create a hashmap of frequencies and again to find the number of pairs, as shown below:
from collections import defaultdict

def getPairsCount(arr, n, sum):
    m = defaultdict(int)
    for i in range(0, n):  # iteration NO. 1
        m[arr[i]] += 1
    twice_count = 0
    for i in range(0, n):  # iteration NO. 2
        twice_count += m[sum - arr[i]]
        if (sum - arr[i] == arr[i]):
            twice_count -= 1
    return int(twice_count / 2)
An interviewer asked me to do the same in a single iteration instead of two. I am at a loss how to do it without breaking it at edge cases like {2,2,1,1} where the required sum is 3.

A way is to build the hash map at the same time as you are consuming it (thereby only looping the list once). Thus, for each value in the array, check if you have seen the complement (the value needed for the sum) before. If so, you know you have a new pair, and you remove the complement from the seen values. Otherwise you do not have a sum and you add the value you have just seen.
In code, this looks as follows (note that each element is used in at most one pair, so {2,2,1,1} with sum 3 yields 2):
from collections import defaultdict

def get_pairs_count(array, sum):
    pairs_count = 0
    seen_values = defaultdict(int)
    for value in array:
        complement = sum - value
        if seen_values[complement] > 0:
            pairs_count += 1
            seen_values[complement] -= 1
        else:
            seen_values[value] += 1
    return pairs_count
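If the goal is to reproduce the result of the two-pass function from the question, which counts every index pair, a single-pass variant is possible that does not remove the complement. The sketch below is an assumption about what the interviewer wanted, not the answerer's method; the name `count_pairs_with_sum` is made up for the example.

```python
def count_pairs_with_sum(array, target):
    """Count index pairs (i, j), i < j, with array[i] + array[j] == target."""
    pairs_count = 0
    seen = {}  # value -> how many times it has appeared so far
    for value in array:
        # Every earlier occurrence of the complement forms a pair with this value.
        pairs_count += seen.get(target - value, 0)
        seen[value] = seen.get(value, 0) + 1
    return pairs_count


print(count_pairs_with_sum([2, 2, 1, 1], 3))  # 4
print(count_pairs_with_sum([1, 1, 1], 2))     # 3
```

Because a value is only paired with occurrences that came before it, no pair is counted twice and the case where the complement equals the value itself needs no special handling.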

Another way:
def pair_sum2(arr, k):
    if len(arr) < 2:
        return
    seen = set()
    output = set()
    for num in arr:
        target = k - num
        print("target", target)
        if target not in seen:
            print("seen before add", seen)
            seen.add(num)
            print("seen", seen)
        else:
            output.add((min(num, target), max(num, target)))
            print("op:", output)
    print('\n'.join(map(str, list(output))))

Count all pairs with given XOR

Given a list of size N. Find the number of pairs (i, j) such that A[i] XOR A[j] = x, and 1 <= i < j <= N.
Input : list = [3, 6, 8, 10, 15, 50], x = 5
Output : 2
Explanation : (3 ^ 6) = 5 and (10 ^ 15) = 5
This is my code (brute force):
import itertools
n = int(input())
pairs = 0
l = list(map(int, input().split()))
q = [x for x in l if x % 2 == 0]
p = [y for y in l if y % 2 != 0]
for a, b in itertools.combinations(q, 2):
    if (a ^ b != 2) and ((a ^ b) % 2 == 0) and (a != b):
        pairs += 1
for a, b in itertools.combinations(p, 2):
    if (a ^ b != 2) and ((a ^ b) % 2 == 0) and (a != b):
        pairs += 1
print(pairs)
How can I do this more efficiently, with O(n) complexity, in Python?

Observe that if A[i]^A[j] == x, this implies that A[i]^x == A[j] and A[j]^x == A[i].
So, an O(n) solution would be to iterate through an associative map (dict) where each key is an item from A and each value is the respective count of the item. Then, for each item, calculate A[i]^x, and see if A[i]^x is in the map. If it is in the map, this implies that A[i]^A[j] == x for some j. Since we have a map with the count of all items that equal A[j], the total number of pairs will be num_Ai * num_Aj. Note that each pair will be counted twice since XOR is commutative (i.e. A[i]^A[j] == A[j]^A[i]), so we have to divide the final count by 2.
def create_count_map(lst):
    result = {}
    for item in lst:
        if item in result:
            result[item] += 1
        else:
            result[item] = 1
    return result

def get_count(lst, x):
    count_map = create_count_map(lst)
    total_pairs = 0
    for item in count_map:
        xor_res = item ^ x
        if xor_res in count_map:
            total_pairs += count_map[xor_res] * count_map[item]
    return total_pairs // 2

print(get_count([3, 6, 8, 10, 15, 50], 5))
print(get_count([1, 3, 1, 3, 1], 2))
outputs
2
6
as desired.
Why is this O(n)?
Converting a list to a dict s.t. the dict contains the count of each item in the list is O(n) time.
Calculating item ^ x is O(1) time, and calculating whether this result is in a dict is also O(1) time. dict key access is also O(1), and so is multiplication of two elements. We do all this n times, hence O(n) time for the loop.
O(n) + O(n) reduces to O(n) time.
Edited to handle duplicates correctly.

The accepted answer does not give the correct result for X=0. This code handles that edge case: for X=0 the valid pairs are exactly the pairs of equal elements. It assumes non-negative integers, since the values are used as indices into the frequency array. You can modify it to get answers for other values as well.
def calculate(a):
    # Finding the maximum of the array
    maximum = max(a)
    # Creating frequency array
    # With initial value 0
    frequency = [0 for x in range(maximum + 1)]
    # Traversing through the array
    for i in a:
        # Counting frequency
        frequency[i] += 1
    answer = 0
    # Traversing through the frequency array
    for i in frequency:
        # Calculating answer
        answer = answer + i * (i - 1) // 2
    return answer
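One way to cover X=0 and non-zero X with the same code is to count while scanning, pairing each element only with elements seen earlier. This is a sketch under that assumption rather than either answerer's code; the name `count_xor_pairs` is made up for the example.

```python
def count_xor_pairs(values, x):
    """Count index pairs (i, j), i < j, with values[i] ^ values[j] == x."""
    seen = {}  # value -> number of earlier occurrences
    total = 0
    for value in values:
        # value ^ partner == x  <=>  partner == value ^ x
        total += seen.get(value ^ x, 0)
        seen[value] = seen.get(value, 0) + 1
    return total


print(count_xor_pairs([3, 6, 8, 10, 15, 50], 5))  # 2
print(count_xor_pairs([1, 3, 1, 3, 1], 2))        # 6
print(count_xor_pairs([1, 1, 1], 0))              # 3
```

Since each pair is counted only when its later element is visited, there is no double counting to divide out, and an element is never paired with itself when x is 0.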

Code challenge: finding the divisible in a list

I am playing a code challenge. Simply speaking, the problem is:
Given a list L (max length is of the order of 1000) containing positive integers.
Find the number of "Lucky Triples", i.e. triples where L[i] divides L[j] and L[j] divides L[k].
For example, [1,2,3,4,5,6] should give the answer 3 because of [1,2,4], [1,2,6], [1,3,6].
My attempt:
Sort the list (let's say there are n elements).
Use 3 for loops: i, j, k (i from 1 to n-2), (j from i+1 to n-1), (k from j+1 to n).
The k loop is executed only if L[j] % L[i] == 0.
The algorithm seems to give the correct answer, but the challenge said that my code exceeded the time limit. I tried it on my computer for the list [1,2,3,...,2000] and got count = 40888 (I guess it is correct). The time is around 5 seconds.
Is there any faster way to do that?
This is the code I have written in Python.
def answer(l):
    l.sort()
    cnt = 0
    if len(l) == 2:
        return cnt
    for i in range(len(l)-2):
        for j in range(1,len(l)-1-i):
            if (l[i+j]%l[i] == 0):
                for k in range(1,len(l)-j-i):
                    if (l[i+j+k]%l[i+j] == 0):
                        cnt += 1
    return cnt

You can use additional space to help yourself. After you sort the input list, build a map/dict where each key is an element of the list and each value is the list of elements that it divides, so you would have something like this.
Assuming the sorted list is [1,2,3,4,5,6], your map would be:
1 -> [2,3,4,5,6]
2 -> [4,6]
3 -> [6]
4 -> []
5 -> []
6 -> []
Now, for every key in the map, you find what it divides, and then what each of those divides. For example, you know that
1 divides 2 and 2 divides 4 and 6; similarly, 1 divides 3 and 3 divides 6.
Sorting is O(n log n), and constructing the map should be better than O(n^2) (but I am not sure about this part). I am also not sure about the complexity of actually checking for multiples, but I think this should be much faster than a brute-force O(n^3).
If someone could help me figure out the time complexity of this I would really appreciate it.
EDIT:
You can make the map creation faster by stepping by X (and not 1), where X is the number in the list you are currently on, since the list is sorted.

Thank you guys for all your suggestions. They are brilliant. But it seems that I still can't pass the speed test, or I cannot handle duplicated elements.
After discussing with my friend, I have just come up with another solution. It should be O(n^2), and I passed the speed test. Thanks all!!
def answer(lst):
    lst.sort()
    count = 0
    if len(lst) == 2:
        return count
    # for each middle element, count the divisors at the front and the multiples at the back. Then multiply them.
    for i, middle in enumerate(lst[1:len(lst)-1], start = 1):
        countfirst = 0
        countthird = 0
        for first in (lst[0:i]):
            if middle % first == 0:
                countfirst += 1
        for third in (lst[i+1:]):
            if third % middle == 0:
                countthird += 1
        count += countfirst*countthird
    return count

I guess sorting the list is pretty inefficient. I would rather try to iteratively reduce the number of candidates. You could do that in two steps.
First, filter out all numbers that do not have a divisor.
from itertools import combinations
candidates = [max(pair) for pair in combinations(l, 2) if max(pair)%min(pair) == 0]
After that, count the remaining candidates that do have a divisor.
result = sum(max(pair)%min(pair) == 0 for pair in combinations(candidates, 2))

Your original code, for reference.
def answer(l):
    l.sort()
    cnt = 0
    if len(l) == 2:
        return cnt
    for i in range(len(l)-2):
        for j in range(1,len(l)-1-i):
            if (l[i+j]%l[i] == 0):
                for k in range(1,len(l)-j-i):
                    if (l[i+j+k]%l[i+j] == 0):
                        cnt += 1
    return cnt
There are a number of misimplementations here, and with just a few tweaks we can probably get this running much faster. Let's start:
def answer(lst):  # I prefer not to use `l` because it looks like `1`
    lst.sort()
    count = 0  # use whole words here. No reason not to.
    if len(lst) == 2:
        return count
    for i, first in enumerate(lst):
        # using `enumerate` here means you can avoid ugly ranges and
        # saves you from a look up on the list afterwards. Not really a
        # performance hit, but definitely looks and feels nicer.
        for j, second in enumerate(lst[i+1:], start=i+1):
            # this is the big savings. You know since you sorted the list that
            # lst[1] can't divide lst[n] if n>1, but your code still starts
            # searching from lst[1] every time! Enumerating over `l[i+1:]`
            # cuts out a lot of unnecessary burden.
            if second % first == 0:
                # see how using enumerate makes that look nicer?
                for third in lst[j+1:]:
                    if third % second == 0:
                        count += 1
    return count
I bet that on its own will pass your speed test, but if not, you can check for membership instead. In fact, using a set here is probably a great idea!
def answer2(lst):
    s = set(lst)
    limit = max(s)  # we'll never have a valid product higher than this
    multiples = {}  # accumulator for our mapping
    for n in sorted(s):
        max_prod = limit // n  # n * (max_prod+1) > limit
        multiples[n] = [n*k for k in range(2, max_prod+1) if n*k in s]
    # in [1,2,3,4,5,6]:
    # multiples = {1: [2, 3, 4, 5, 6],
    #              2: [4, 6],
    #              3: [6],
    #              4: [],
    #              5: [],
    #              6: []}
    # multiples is now a mapping you can use a Depth- or Breadth-first-search on
    triples = sum(1 for j in multiples
                  for k in multiples.get(j, [])
                  for l in multiples.get(k, []))
    # This basically just looks up each starting value as j, then grabs
    # each valid multiple and assigns it to k, then grabs each valid
    # multiple of k and assigns it to l. For every possible combination there,
    # it adds 1 more to the result of `triples`
    return triples

I'll give you just an idea, the implementation should be up to you:
Initialize the global counter to zero.
Sort the list, starting with the smallest number.
Create a list of integers (one entry per number, at the same index).
Iterate through each number (index i), and do the following:
Check for divisors at positions 0 to i-1.
Store the number of divisors in the list at position i.
For each divisor, fetch its stored number of divisors from the list and add it to the global counter.
Repeat until you have processed every number.
Your result should be in the global counter.
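The idea above can be sketched as follows. This is one possible reading of the steps, assuming all inputs are positive integers; the names `count_lucky_triples` and `divisor_counts` are made up for the example.

```python
def count_lucky_triples(numbers):
    """Count triples of positions i < j < k in the sorted list
    where lst[i] divides lst[j] and lst[j] divides lst[k]."""
    lst = sorted(numbers)
    # divisor_counts[i]: how many earlier entries divide lst[i]
    divisor_counts = [0] * len(lst)
    total = 0
    for i, value in enumerate(lst):
        for j in range(i):
            if value % lst[j] == 0:
                divisor_counts[i] += 1
                # lst[j] is the middle element; each of its own earlier
                # divisors completes one triple ending at lst[i].
                total += divisor_counts[j]
    return total


print(count_lucky_triples([1, 2, 3, 4, 5, 6]))  # 3
```

There are only two nested loops, so the work is O(n^2) after sorting, and duplicated values are handled because the counts are kept per position rather than per value.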

Counting number of list entries that occur 1 time

I'm trying to write a Python function that counts the number of entries in a list that occur exactly once.
For example, given the list [17], this function would return 1. Or given [3,3,-22,1,-22,1,3,0], it would return 1.
** Restriction: I cannot import anything into my program.
The incorrect code that I've written so far: I'm going the double-loop route, but the index math is getting over-complicated.
def count_unique(x):
    if len(x) == 1:
        return 1
    i = 0
    j = 1
    for i in range(len(x)):
        for j in range(j,len(x)):
            if x[i] == x[j]:
                del x[j]
                j+1
        j = 0
    return len(x)

Since you can't use collections.Counter or sorted/itertools.groupby apparently (one of which would usually be my go to solution, depending on whether the inputs are hashable or sortable), just simulate roughly the same behavior as a Counter, counting all elements and then counting the number of elements that appeared only once at the end:
def count_unique(x):
    if len(x) <= 1:
        return len(x)
    counts = {}
    for val in x:
        counts[val] = counts.get(val, 0) + 1
    return sum(1 for count in counts.values() if count == 1)

lst = [3,3,-22,1,-22,1,3,0]
len(list(filter(lambda z: z[0] == 1,
                map(lambda x: (len(list(filter(lambda y: y == x, lst))), x), lst))))
sorry :)
Your solution doesn't work because you are doing something weird: deleting things from a list while iterating through it, a bare j+1 that has no effect, etc. Try adding elements that are found to be unique to a new list and then counting the number of things in it. Then figure out what my solution does.
Here is the O(n) solution btw:
lst = [3,3,-22,1,-22,1,3,0,37]
cnts = {}
for n in lst:
    if n in cnts:
        cnts[n] = cnts[n] + 1
    else:
        cnts[n] = 1
count = 0
for k, v in cnts.items():
    if v == 1:
        count += 1
print(count)

A simpler and more understandable solution:
l = [3, 3, -22, 1, -22, 1, 3, 0]
counter = 0
for el in l:
    if l.count(el) == 1:
        counter += 1
It's pretty simple. You iterate over the items of the list, check whether each element occurs exactly once in the list, and if so add 1 to the counter. You can improve the code (use list comprehensions, lambda expressions and so on), but this is the idea behind it all and the most understandable, imo.

You are making this overly complicated. Try using a dictionary whose keys are the elements of your list, so that each distinct element gets exactly one entry holding its count.
To add to this: it is probably the best method in terms of complexity. An `in` lookup on a dictionary is considered O(1) and the for loop is O(n), so the total time complexity is O(n), which is desirable. Using count() on a list element searches the whole list for every element, which is O(n^2) overall. That's bad.
from collections import defaultdict
count_hash_table = defaultdict(int)  # a dictionary whose missing keys default to int(), i.e. 0
elements = [3,3,-22,1,-22,1,3,0]
for element in elements:
    count_hash_table[element] += 1  # the default of 0 lets us add 1 without checking for the key first
print(sum(c for c in count_hash_table.values() if c == 1))

There is a method on lists called count; from this you can go further, I guess.
For example:
for el in l:
    if l.count(el) > 1:
        continue
    else:
        print("found {0}".format(el))

Fastest algorithm possible to pick number pairs

The question:
Given N integers [N<=10^5], count the total pairs of integers that have a difference of K. [K>0 and K<1e9]. Each of the N integers will be greater than 0 and at least K away from 2^31-1 (Everything can be done with 32 bit integers).
1st line contains N & K (integers).
2nd line contains N numbers of the set. All the N numbers are assured to be distinct.
The question is from HackerRank. I have a solution, but it doesn't satisfy the time limit for all the sample test cases. I'm not sure whether it's possible to use another algorithm, but I'm out of ideas. I would really appreciate it if someone took a bit of time to check my code and give a tip or two.
temp = input()
temp = temp.split(" ")
N = int(temp[0])
K = int(temp[1])
num_array = input()
num_array = num_array.split(" ")
diff = 0
pairs = 0
i = 0
while(i < N):
    num_array[i] = int(num_array[i])
    i += 1
while(num_array != []):
    j = 0
    while(j < (len(num_array)-1)):
        diff = abs(num_array[j+1] - num_array[0])
        if(diff == K):
            pairs += 1
        j += 1
    del num_array[0]
    if(len(num_array) == 1):
        break
print(pairs)

You can do this in approximately linear time with the following procedure.
O(n) solution:
For each number x, add it to a hash set H.
For each number x, check whether x-k is in H; if yes, add 1 to the answer.
Or use some balanced structure (like a tree-based set) for O(n lg n).
This solution is based on the assumption that the integers are distinct. If they are not, you need to store the number of times each element has been "added to the set" and, instead of adding 1 to the answer, add the product H[x]*H[x+k].
So in general you take some HashMap H with "default value 0":
For each number x, update the map: H[x] = H[x]+1
For each distinct number x, add H[x]*H[x-k] to the answer (you don't have to check whether x-k is in the map, because if it is not, H[x-k]=0).
Again, the solution using a hash map is O(n) and the one using a tree map is O(n lg n).
So, given a set of numbers A and a number k (solution for distinct numbers):
H = set()
ans = 0
for a in A:
    H.add(a)
for a in A:
    if a-k in H:
        ans += 1
print(ans)
or shorter
H = set(A)
ans = sum(1 for a in A if a-k in H)
print(ans)
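For the general case described above, where the numbers are not guaranteed to be distinct, a count map can replace the set. The sketch below is an illustration only, assuming k > 0 as in the problem statement; the name `count_pairs_with_difference` is made up for the example.

```python
from collections import Counter


def count_pairs_with_difference(numbers, k):
    """Count pairs of positions whose values differ by exactly k (k > 0)."""
    counts = Counter(numbers)  # value -> number of occurrences
    # Each distinct x pairs with every occurrence of x - k.
    # Counter returns 0 for missing keys, so no membership test is needed.
    return sum(c * counts[x - k] for x, c in counts.items())


print(count_pairs_with_difference([1, 5, 3, 4, 2], 2))  # 3
print(count_pairs_with_difference([1, 1, 3], 2))        # 2
```

Iterating over the distinct values rather than over the raw list keeps each pair from being counted more than once when a value repeats.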

Use a hash-based set (or a dictionary).
Step 1: Fill the set with all entries from the array.
Step 2: Count the occurrences of all A[i] + k in the set.
HashSet<int> set = new HashSet<int>();
foreach (int n in num_array) set.Add(n);
int solutions = 0;
foreach (int n in num_array)
    if (set.Contains(n + k))
        solutions += 1;
Inserting into a hash set is O(1), and searching is O(1) as well. Doing it for each element in the array is O(n). This is as fast as it can get.
Sorry, you have to translate it to Python, though.
EDIT: Same idea as the previous one. Sorry to post a duplicate. It's too late to remove my duplicate I guess.
