Duplicate pairs in an array

Given a zero-indexed array A of N integers, find equal elements at different positions in the array, i.e. count the pairs of indices (P, Q) with 0 <= P < Q < N such that A[P] = A[Q].
My idea:
def function(arr, n):
    count = 0
    arr.sort()
    i = 0
    while i < (n-1):
        if (arr[i] == arr[i + 1]):
            count += 1
            i = i + 2
        else:
            i += 1
    return count
Two questions:
How do I avoid counting elements whose first indices are not smaller than the second indices?
How do I build a function where the input is only the array? (So not (arr, n))

You can do something like this.
This is the naive approach:
def function(arr):
    count = 0
    n = len(arr)
    # Compare every index pair (i, j) with i < j
    for i in range(n):
        for j in range(i+1, n):
            if arr[i] == arr[j]:
                count += 1
    return count
This is a more optimized approach you can try:
def function(arr):
    # Count how many times each value occurs
    mp = dict()
    n = len(arr)
    for i in range(n):
        if arr[i] in mp:
            mp[arr[i]] += 1
        else:
            mp[arr[i]] = 1
    ans = 0
    # A value that occurs count times contributes count*(count-1)/2 pairs
    for it in mp:
        count = mp[it]
        ans += (count * (count - 1)) // 2
    return ans

You can use collections.Counter to count the number of occurrences of every integer,
then use math.comb with n=count and k=2 to get the number of such pairs for every integer, and simply sum them:
from collections import Counter
from math import comb
def function(arr):
    return sum(comb(count, 2) for num,count in Counter(arr).items())
print(function([1,2,3,6,3,6,3,2]))
The reason math.comb(count,2) is exactly the number of pairs is that any 2 elements out of the count you choose, regardless of their order, are a single pair: the former one is P and the latter is Q.
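As a quick sanity check of that argument, the sketch below compares the Counter/comb count against a direct enumeration of index pairs with P < Q. The function names and the sample list are only illustrative, and math.comb needs Python 3.8 or later.

```python
from collections import Counter
from itertools import combinations
from math import comb

def count_pairs_bruteforce(arr):
    # Enumerate every index pair (P, Q) with P < Q and compare the values
    return sum(1 for p, q in combinations(range(len(arr)), 2) if arr[p] == arr[q])

def count_pairs_comb(arr):
    # A value occurring c times contributes comb(c, 2) pairs
    return sum(comb(c, 2) for c in Counter(arr).values())

sample = [1, 2, 3, 6, 3, 6, 3, 2]
print(count_pairs_bruteforce(sample), count_pairs_comb(sample))
```

Because combinations() over the index range only ever yields P < Q, neither version can count a pair in the wrong order, and neither needs n passed in separately.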
EDIT: Added timeit benchmarks:
Here's a full example you can test to compare the performance of both methods:
from timeit import timeit
from random import randint
from collections import Counter
from math import comb
def with_comb(arr):
    return sum(comb(count, 2) for num,count in Counter(arr).items())
def with_loops(arr):
    mp = dict()
    n = len(arr)
    for i in range(n):
        if arr[i] in mp.keys():
            mp[arr[i]] += 1
        else:
            mp[arr[i]] = 1
    ans = 0
    for it in mp:
        count = mp[it]
        ans += (count * (count - 1)) // 2
    return ans
a = [randint(1,1000) for _ in range(10000)]
time1 = timeit('with_loops(a)', globals=globals(), number=1000)
time2 = timeit('with_comb(a)', globals=globals(), number=1000)
print(time1)
print(time2)
print(time1/time2)
Output (on my laptop):
2.9549962
0.8175686999999998
3.6143705110041524

Related

Is there a more efficient way to compare two lists in python than O(m*n)?

I am trying to find a way to compare two lists in Python that is more efficient than my current approach, which I believe runs in O(m*n). Right now I use brute force, iterating over each item in m and comparing it to every item in n. Is anything else possible? I have tried sorting the lists first in the hope of something faster, but I am stuck on whether anything else could work here.
In my function I take each item in m, compare it to each item in n, and count the number of times the item in m is greater than or equal to the item in n.
n = [1,3,7]
m = [2,9]
def comparison(n,m):
    counter = 0
    for i in m:
        for j in n:
            if i >= j:
                counter += 1
    return counter

Here's how you could use a binary search approach after sorting the target list:
from bisect import bisect_right
n = [1,3,7,2]
m = [2,9]
n.sort()
counter = sum(bisect_right(n,value) for value in m)
print(counter) # 6
This should correspond to O((n+m) x log(n)) if n is not known to be sorted. If n is always provided in sorted order, then you don't need your function to sort it and you will get O(m x log(n)) time complexity.
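A minimal sketch of the same idea wrapped in a function, with a brute-force version alongside for cross-checking. The names comparison_sorted and comparison_bruteforce are made up for illustration; sorted() is used so the caller's list is left unchanged.

```python
from bisect import bisect_right

def comparison_sorted(n, m):
    # Sort a copy of n once: O(len(n) * log(len(n)))
    n_sorted = sorted(n)
    # bisect_right gives the number of items in n_sorted that are <= value
    return sum(bisect_right(n_sorted, value) for value in m)

def comparison_bruteforce(n, m):
    # Reference O(m*n) version, same counting rule as the question
    return sum(1 for i in m for j in n if i >= j)

print(comparison_sorted([1, 3, 7], [2, 9]))
print(comparison_bruteforce([1, 3, 7], [2, 9]))
```

Both calls print 4 for the lists from the question: 2 is >= one item and 9 is >= all three.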

Here is some code that uses the built-in timeit module to test which approach runs faster. You can benchmark the other suggestions with the same structure:
import timeit
import numpy as np
n = [1,3,7]
m = [9,2]
def comparison(n,m):
    # Brute force: compare every item in m with every item in n
    counter = 0
    for i in m:
        for j in n:
            if i >= j:
                counter += 1
    return counter
def comparison_with_numpy(n,m):
    x = np.array(n)
    y = np.array(m)
    # Broadcast to a len(m) x len(n) boolean table and count the True entries
    return int((y[:, None] >= x).sum())
def sort_first(n,m):
    # Sort copies so the inputs are not modified, then sweep with two pointers
    n_sorted = sorted(n)
    m_sorted = sorted(m)
    count = 0
    j = 0
    for value in m_sorted:
        # Advance j past every item in n that is <= the current item of m
        while j < len(n_sorted) and n_sorted[j] <= value:
            j += 1
        count += j
    return count
def main():
    # Time actual calls to each function rather than the function definitions
    print('comparison /w sort\t\t',timeit.timeit(lambda: sort_first(n, m),number=10000))
    print('comparison\t\t',timeit.timeit(lambda: comparison(n, m),number=10000))
    print('comparison with numpy\t\t',timeit.timeit(lambda: comparison_with_numpy(n, m),number=10000))
if __name__ == "__main__":
    main()

Given an array of N integers, and an integer K, find the number of pairs of elements in the array whose sum is equal to K

Problem statement: Given an array of N integers and an integer K, find the number of pairs of elements in the array whose sum is equal to K.
def countpairs(x,length,sum):
    count = 0
    for i in range(0,length):
        for j in range(i+1,length):
            print(x[i],x[j])
            if(x[i]+x[j]==sum):
                count+=1
    print(count)
x = [1, 1, 1, 1]
sum = 2
length=len(x)
countpairs(x,length,sum)
Output: 6
This is my solution, which I ran in VS Code.
My question: whenever I run the same code on GfG, it is not accepted and gives me the error below. I have even tried the same code in an online compiler, and it runs correctly there too.
This is the GfG code I have written:
class Solution:
    def getPairsCount(self, arr, K, N):
        count = 0
        for i in range(0,N):
            for j in range(i+1,N):
                if(arr[i]+arr[j]==K):
                    count+=1
        return count
#Initial Template for Python 3
if __name__ == '__main__':
    tc = int(input())
    while tc > 0:
        n, k = list(map(int, input().strip().split()))
        arr = list(map(int, input().strip().split()))
        ob = Solution()
        ans = ob.getPairsCount(arr, n, k)
        print(ans)
        tc -= 1
Error:
    if(arr[i]+arr[j]==K):
IndexError: list index out of range

There's no added value in using a class for this. You just need:
def getPairsCount(arr, K):
    count = 0
    for i in range(len(arr)-1):
        if arr[i] + arr[i+1] == K:
            count += 1
    return count
EDIT:
The previous answer assumed that only adjacent elements were to be considered. If that's not the case, then try this:
import itertools
def getPairsCount(arr, K):
    count = 0
    # combinations() yields every unordered pair of elements exactly once
    for c in itertools.combinations(sorted(arr), 2):
        if c[0] + c[1] == K:
            count += 1
    return count
data = [1, 2, 1, 4, -1]
print(getPairsCount(data, 3))

We do not need two loops for this question. Here is something that runs in O(n):
def countpairs(list_,K):
    count = 0
    set_ = set(list_)
    pairs_ = []
    for val in list_:
        if K - val in set_:
            # we ensure that pairs are unordered by using min and max
            pairs_.append ( (min(val, K-val), max(val, K-val)) )
            count+=1
    set_pairs = set(pairs_)
    print ("Pairs which sum up to ",K," are: ", set_pairs)
    return len(set_pairs)
x = [1,4,5,8,2,0,24,7,6]
sum_ = 13
print ("Total count of pairs summing up to ", sum_, " = ", countpairs(x, sum_))
Output:
Pairs which sum up to 13 are: {(6, 7), (5, 8)}
Total count of pairs summing up to 13 = 2
The idea is that if two values should sum to K, we can iterate through the array and check whether there is another element in the array which, when paired with the current element, sums to K. The inner loop in your solution can be replaced with a membership test using the in operator. We need this search to be fast (O(1) per element), so we create a set out of the input array (set_ in my example).
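Note that the set-based function above reports distinct value pairs, so for the question's [1, 1, 1, 1] with K = 2 it gives 1 rather than 6. If index pairs are what is wanted, one possible variation keeps a running count of the values seen so far. This is only a sketch; count_index_pairs is a made-up name.

```python
from collections import Counter

def count_index_pairs(arr, K):
    seen = Counter()   # how many times each value has appeared so far
    count = 0
    for val in arr:
        # Every earlier occurrence of K - val pairs with the current element
        count += seen[K - val]
        seen[val] += 1
    return count

print(count_index_pairs([1, 1, 1, 1], 2))
print(count_index_pairs([1, 4, 5, 8, 2, 0, 24, 7, 6], 13))
```

This prints 6 and then 2. Looking up a missing key in a Counter returns 0, so no explicit membership test is needed, and updating seen only after the lookup means an element is never paired with itself.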

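def solve(a,K):
    # Count how often each value occurs
    freq = {}
    for v in a:
        if v in freq:
            freq[v] += 1
        else:
            freq[v] = 1
    res = 0
    for v in freq:
        if 2 * v == K:
            # Pairs formed by two occurrences of the same value
            res += freq[v] * (freq[v] - 1) // 2
        elif v < K - v:
            # Count each pair of distinct values once; a missing partner counts as 0
            res += freq[v] * freq.get(K - v, 0)
    return res
a = [int(v) for v in input().split()]
K = int(input())
print(solve(a,K))
# Time Complexity : O(N)
# Space Complexity : O(N)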

Mean, Median, and Mode in Python

I'm doing a statistics problem set in Python on HackerRank. When I input a list of values to calculate the mode, it shows me a runtime error.
# Enter your code here. Read input from STDIN. Print output to STDOUT
N = int(input())
X = list(map(int, input().split()))
X.sort()
# Find the mean
mean = sum(X) / N
print(mean)
# Find the median
if N % 2 == 0:
    median = (X[N//2] + X[N//2 - 1]) / 2
else:
    median = X[N//2]
print(median)
# Find the mode
occurrence = list([1 for _ in range(N)])
for i in range(N):
for j in range(i+1, N):
if X[i] == X[j]:
occurrence += 1
if max(occurrence) == 1:
    mode = min(X)
else:
    mode = X[occurrence[max(occurrence)]]
print(mode)
When X has 2500 values, it just shows me a runtime error.

I use this when looking for mean, median, and mode
import numpy as np
from scipy import stats
n = int(input())
arr = list(map(int, input().split()))
print(np.mean(arr))
print(np.median(arr))
print(stats.mode(arr)[0][0])

You are trying to add 1 to occurrence, which is a list.
Also, this may just be a copying mistake, but the indentation of your loop is incorrect:
for i in range(N):
for j in range(i+1, N):
if X[i] == X[j]:
occurrence += 1
# It should be
for i in range(N):
    for j in range(i+1, N):
        if X[i] == X[j]:
            occurrence += 1
Then you might want to change the update to something like:
occurrence[i] += 1
# instead of
occurrence += 1
Hope this helps

I have run your code; here is the problem:
for i in range(N):
for j in range(i+1, N):
if X[i] == X[j]:
occurrence += 1
I think you meant the if to be inside the two for loops, like:
for i in range(N):
    for j in range(i + 1, N):
        if X[i] == X[j]:
            occurrence += 1
But occurrence is a list here, so you can't add one to it. I think you mean to count the occurrences of each integer and output the most frequent one? You can use defaultdict or Counter here; the defaultdict version needs only a single loop.
import collections
import operator
# Find the mode
occurrence = collections.Counter(X)
# occurrence = collections.defaultdict(int)
#
# for i in range(N):
#     occurrence[X[i]] += 1
mode = max(occurrence.items(), key=operator.itemgetter(1))[0]
print(mode)
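One detail the Counter approach leaves open is what to do when several values share the highest count. The sketch below assumes that ties should go to the smallest value, which is what the min(X) fallback in the question suggests; smallest_mode is a made-up name.

```python
from collections import Counter

def smallest_mode(values):
    counts = Counter(values)
    highest = max(counts.values())
    # Among all values with the highest count, pick the smallest one
    return min(v for v, c in counts.items() if c == highest)

print(smallest_mode([4, 1, 2, 2, 1, 3]))
```

Here 1 and 2 both occur twice, so the function prints 1. It makes a single pass to count and another over the distinct values, instead of the nested loops from the question.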

Here are mean, median, and mode functions.
from collections import Counter
def median(values):
    n = len(values)
    s = sorted(values)
    # Even length: average the two middle values; odd length: take the middle one
    return (sum(s[n//2-1:n//2+1])/2.0, s[n//2])[n % 2] if n else None
def mean(values):
    if len(values) == 0:
        return 0
    total = 0
    for number in values:
        total += number
    return total / len(values)
def mode(values):
    counter = Counter(values)
    if len(counter) == 1:
        # Only one distinct value, so it is the mode
        return values[0]
    if len(counter) > 1:
        possible_mode, next_highest = counter.most_common(2)
        if possible_mode[1] > next_highest[1]:
            return possible_mode[0]
    # Empty input or a tie for the highest count: no unique mode
    return None

Heap Sort Algorithm number of comparisons

I'm trying to count the number of comparisons in this heap sort algorithm:
import random
import time
#HeapSort Algorithm
def heapify(arr, n, i):
    count = 0
    largest = i
    l = 2 * i + 1
    r = 2 * i + 2
    if l < n and arr[i] < arr[l]:
        largest = l
    if r < n and arr[largest] < arr[r]:
        largest = r
    if largest != i:
        count += 1
        arr[i],arr[largest] = arr[largest],arr[i]
        heapify(arr, n, largest)
    return count
def heapSort(arr):
    n = len(arr)
    count = 0
    for i in range(n, -1, -1):
        heapify(arr, n, i)
        count += heapify(arr, i, 0)
    for i in range(n-1, 0, -1):
        arr[i], arr[0] = arr[0], arr[i]
        heapify(arr, i, 0)
    return count
print("For n = 1000:")
print("a) Random Generation:")
arr = [x for x in range(1000)]
random.shuffle(arr)
print("Before Sort:")
print (arr)
print("After Sort:")
start_time = time.time()
heapSort(arr)
time = time.time() - start_time
print(arr)
print("Comparisions")
print(heapSort(arr))
print("Time:")
print(time)
I expect the result when n = 1000 integers to be 8421 and when n = 10000 to be 117681
However, each time it either shows 0 or 2001 when I try to count += 1 around the loops and not comparisons.

You seem to be forgetting to take into account the comparisons your recursive solution makes while solving the smaller subproblems. In other words, you are only finding the comparisons made in the topmost level of your solution. Instead, you should update the count variable in the relevant scope whenever you make a call to your heapify function. Notice the updates below where I increased local count variables by the return value of calls to heapify.
def heapify(arr, n, i):
    count = 0
    largest = i
    l = 2 * i + 1
    r = 2 * i + 2
    if l < n and arr[i] < arr[l]:
        largest = l
    if r < n and arr[largest] < arr[r]:
        largest = r
    if largest != i:
        count += 1
        arr[i],arr[largest] = arr[largest],arr[i]
        # Add the count from the recursive call instead of discarding it
        count += heapify(arr, n, largest)
    return count
def heapSort(arr):
    n = len(arr)
    count = 0
    for i in range(n, -1, -1):
        heapify(arr, n, i)
        count += heapify(arr, i, 0)
    for i in range(n-1, 0, -1):
        arr[i], arr[0] = arr[0], arr[i]
        # Accumulate the count returned by each heapify call
        count += heapify(arr, i, 0)
    return count
Here is a working example of your code including the fix given above. I understand that the output is still slightly different from the exact number of comparisons you are expecting, but it is in the ballpark. The relatively small difference is due to the fact that you are randomizing the initial state of the array.
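Note that count += 1 in the code above runs once per swap, not once per comparison. If the goal is to count the element comparisons themselves, one option is to route every arr[...] < arr[...] test through a small helper that increments a counter. The sketch below is a separate iterative heap sort written for illustration (heap_sort_count, less and sift_down are made-up names), and it is not claimed to reproduce the exact figures expected in the question.

```python
import random

def heap_sort_count(arr):
    """Sort arr in place and return the number of element comparisons made."""
    comparisons = 0

    def less(a, b):
        # Every comparison between two elements goes through here
        nonlocal comparisons
        comparisons += 1
        return a < b

    def sift_down(n, i):
        # Move arr[i] down until the max-heap property holds within arr[:n]
        while True:
            largest = i
            l, r = 2 * i + 1, 2 * i + 2
            if l < n and less(arr[largest], arr[l]):
                largest = l
            if r < n and less(arr[largest], arr[r]):
                largest = r
            if largest == i:
                return
            arr[i], arr[largest] = arr[largest], arr[i]
            i = largest

    n = len(arr)
    # Build the heap, starting from the last node that has a child
    for i in range(n // 2 - 1, -1, -1):
        sift_down(n, i)
    # Repeatedly move the maximum to the end and restore the heap
    for end in range(n - 1, 0, -1):
        arr[0], arr[end] = arr[end], arr[0]
        sift_down(end, 0)
    return comparisons

data = list(range(1000))
random.shuffle(data)
count = heap_sort_count(data)
print(data == list(range(1000)), count)
```

Index checks such as l < n are not counted, only comparisons between array elements. Because the input is shuffled, the count varies from run to run.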

How to efficiently get all combinations where the sum is 10 or below in Python

Imagine you're trying to allocate some fixed resources (e.g. n=10) over some number of territories (e.g. t=5). I am trying to find out efficiently how to get all the combinations where the sum is n or below.
E.g. 10,0,0,0,0 is good, as well as 0,0,5,5,0 etc., while 3,3,3,3,3,3 is obviously wrong.
I got this far:
import itertools
t = 5
n = 10
r = [range(n+1)] * t
for x in itertools.product(*r):
    if sum(x) <= n:
        print(x)
This brute force approach is incredibly slow though; there must be a better way?
Timings (1000 iterations):
Default (itertools.product) --- time: 40.90 s
falsetru recursion --- time: 3.63 s
Aaron Williams Algorithm (impl, Tony) --- time: 0.37 s

Possible approach follows. Definitely would use with caution (hardly tested at all, but the results on n=10 and t=5 look reasonable).
The approach involves no recursion. The algorithm to generate partitions of a number n (10 in your example) having m elements (5 in your example) comes from Knuth's 4th volume. Each partition is then zero-extended if necessary, and all the distinct permutations are generated using an algorithm from Aaron Williams which I have seen referred to elsewhere. Both algorithms had to be translated to Python, and that increases the chance that errors have crept in. The Williams algorithm wanted a linked list, which I had to fake with a 2D array to avoid writing a linked-list class.
There goes an afternoon!
Code (note your n is my maxn and your t is my p):
import itertools
def visit(a, m):
    """ Utility function to add partition to the list"""
    x.append(a[1:m+1])
def parts(a, n, m):
    """ Knuth Algorithm H, Combinatorial Algorithms, Pre-Fascicle 3B
        Finds all partitions of n having exactly m elements.
        An upper bound on running time is (3 x number of
        partitions found) + m. Not recursive!
    """
    while (1):
        visit(a, m)
        while a[2] < a[1]-1:
            a[1] -= 1
            a[2] += 1
            visit(a, m)
        j=3
        s = a[1]+a[2]-1
        while a[j] >= a[1]-1:
            s += a[j]
            j += 1
        if j > m:
            break
        x = a[j] + 1
        a[j] = x
        j -= 1
        while j>1:
            a[j] = x
            s -= x
            j -= 1
        a[1] = s
def distinct_perms(partition):
    """ Aaron Williams Algorithm 1, "Loopless Generation of Multiset
        Permutations by Prefix Shifts". Finds all distinct permutations
        of a list with repeated items. I don't follow the paper all that
        well, but it _possibly_ has a running time which is proportional
        to the number of permutations (with 3 shift operations for each
        permutation on average). Not recursive!
    """
    perms = []
    val = 0
    nxt = 1
    l1 = [[partition[i],i+1] for i in range(len(partition))]
    l1[-1][nxt] = None
    #print(l1)
    head = 0
    i = len(l1)-2
    afteri = i+1
    tmp = []
    tmp += [l1[head][val]]
    c = head
    while l1[c][nxt] != None:
        tmp += [l1[l1[c][nxt]][val]]
        c = l1[c][nxt]
    perms.extend([tmp])
    while (l1[afteri][nxt] != None) or (l1[afteri][val] < l1[head][val]):
        if (l1[afteri][nxt] != None) and (l1[i][val]>=l1[l1[afteri][nxt]][val]):
            beforek = afteri
        else:
            beforek = i
        k = l1[beforek][nxt]
        l1[beforek][nxt] = l1[k][nxt]
        l1[k][nxt] = head
        if l1[k][val] < l1[head][val]:
            i = k
        afteri = l1[i][nxt]
        head = k
        tmp = []
        tmp += [l1[head][val]]
        c = head
        while l1[c][nxt] != None:
            tmp += [l1[l1[c][nxt]][val]]
            c = l1[c][nxt]
        perms.extend([tmp])
    return perms
maxn = 10 # max integer to find partitions of
p = 5 # max number of items in each partition
# Find all partitions of length p or less adding up
# to maxn or less
# Special cases (Knuth's algorithm requires n and m >= 2)
x = [[i] for i in range(maxn+1)]
# Main cases: runs parts fn (maxn^2+maxn)/2 times
for i in range(2, maxn+1):
    for j in range(2, min(p+1, i+1)):
        m = j
        n = i
        a = [0, n-m+1] + [1] * (m-1) + [-1] + [0] * (n-m-1)
        parts(a, n, m)
y = []
# For each partition, add zeros if necessary and then find
# distinct permutations. Runs distinct_perms function once
# for each partition.
for part in x:
    if len(part) < p:
        y += distinct_perms(part + [0] * (p - len(part)))
    else:
        y += distinct_perms(part)
print(y)
print(len(y))

Write your own recursive function that does not recurse into an element unless it's still possible to make a sum <= 10.
def f(r, n, t, acc=[]):
    if t == 0:
        if n >= 0:
            yield acc
        return
    for x in r:
        if x > n: # <---- do not recurse if sum is larger than `n`
            break
        for lst in f(r, n-x, t-1, acc + [x]):
            yield lst
t = 5
n = 10
for xs in f(range(n+1), n, t):
    print(xs)
You can create the full Cartesian product with itertools and filter the results with numpy.
>>> import numpy as np
>>> from itertools import product
>>> t = 5
>>> n = 10
>>> r = range(n+1)
# Create the product numpy array
>>> prod = np.fromiter(product(r, repeat=t), np.dtype('u1,' * t))
>>> prod = prod.view('u1').reshape(-1, t)
# Extract only the rows that satisfy the condition (sum <= n)
>>> prod[prod.sum(axis=1) <= n]
Timeit:
>>> %%timeit
prod = np.fromiter(product(r, repeat=t), np.dtype('u1,' * t))
prod = prod.view('u1').reshape(-1, t)
prod[prod.sum(axis=1) <= n]
10 loops, best of 3: 41.6 ms per loop
You could even speed up the product computation by populating combinations directly in numpy.

You could optimize the algorithm using Dynamic Programming.
Basically, keep an array a, where a[i][j] means "Can I get a sum of j with the elements up to the i-th element (and using the i-th element)?", assuming you have your elements in an array t (not the number t you mentioned).
Then you can fill the array like this:
a = [[False] * (n + 1) for _ in t]
a[0][t[0]] = True
for i in range(1, len(t)):
    a[i][t[i]] = True
    for j in range(t[i]+1, n+1):
        for k in range(0, i):
            if a[k][j-t[i]]:
                a[i][j] = True
Then, using this info, you could backtrack the solution :)
def backtrack(j = len(t)-1, goal = n):
    print(j, goal)
    all_solutions = []
    if j == -1:
        return []
    if goal == t[j]:
        all_solutions.append([j])
    for i in range(j-1, -1, -1):
        if a[i][goal-t[j]]:
            r = backtrack(i, goal - t[j])
            for l in r:
                print(l)
                l.append(j)
                all_solutions.append(l)
    all_solutions.extend(backtrack(j-1, goal))
    return all_solutions
backtrack() # is the answer
