Count all pairs with given XOR - python

Given a list A of size N, find the number of pairs (i, j) such that A[i] XOR A[j] = x and 1 <= i < j <= N.
Input : list = [3, 6, 8, 10, 15, 50], x = 5
Output : 2
Explanation : (3 ^ 6) = 5 and (10 ^ 15) = 5
This is my code (brute force):
import itertools
n = int(input())
pairs = 0
l = list(map(int, raw_input().split()))
q = [x for x in l if x % 2 == 0]
p = [y for y in l if y % 2 != 0]
for a, b in itertools.combinations(q, 2):
    if (a ^ b != 2) and ((a ^ b) % 2 == 0) and (a != b):
        pairs += 1
for a, b in itertools.combinations(p, 2):
    if (a ^ b != 2) and ((a ^ b) % 2 == 0) and (a != b):
        pairs += 1
print pairs
How can I do this more efficiently, in O(n), in Python?

Observe that if A[i]^A[j] == x, this implies that A[i]^x == A[j] and A[j]^x == A[i].
So an O(n) solution is to build an associative map (dict) where each key is an item from A and each value is that item's count. Then, for each item, calculate A[i]^x and check whether A[i]^x is in the map. If it is, then A[i]^A[j] == x for some j, and since the map stores the count of every value equal to A[j], that item contributes num_Ai * num_Aj pairs. Note that each pair is counted twice because XOR is commutative (i.e. A[i]^A[j] == A[j]^A[i]), so we have to divide the final count by 2.
def create_count_map(lst):
    result = {}
    for item in lst:
        if item in result:
            result[item] += 1
        else:
            result[item] = 1
    return result

def get_count(lst, x):
    count_map = create_count_map(lst)
    total_pairs = 0
    for item in count_map:
        xor_res = item ^ x
        if xor_res in count_map:
            total_pairs += count_map[xor_res] * count_map[item]
    return total_pairs // 2

print(get_count([3, 6, 8, 10, 15, 50], 5))
print(get_count([1, 3, 1, 3, 1], 2))
outputs
2
6
as desired.
Why is this O(n)?
Converting a list to a dict s.t. the dict contains the count of each item in the list is O(n) time.
Calculating item ^ x is O(1) time, and checking whether this result is in a dict is also O(1) time. dict key access is O(1), and so is the multiplication of two elements. We do all of this n times, hence O(n) time for the loop.
O(n) + O(n) reduces to O(n) time.
Edited to handle duplicates correctly.
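For reference, the same function can be written more compactly with collections.Counter; this is an equivalent sketch of the code above, not a change of approach:
from collections import Counter

def get_count(lst, x):
    count_map = Counter(lst)
    # a key missing from a Counter simply counts as 0
    return sum(count_map[item] * count_map[item ^ x] for item in count_map) // 2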

The accepted answer does not give the correct result for x = 0. This code handles that corner case; you can modify it to get answers for other values of x as well.
def calculate(a):
    # Finding the maximum of the array
    maximum = max(a)
    # Creating the frequency array with initial value 0
    frequency = [0 for x in range(maximum + 1)]
    # Traversing through the array, counting frequencies
    for i in a:
        frequency[i] += 1
    answer = 0
    # Traversing through the frequency array
    for i in frequency:
        # Calculating the answer
        answer = answer + i * (i - 1) // 2
    return answer
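A brief usage sketch with an illustrative list: pairs with XOR 0 are pairs of equal elements, so 1 occurring three times gives 3 pairs and 3 occurring twice gives 1 pair.
print(calculate([1, 3, 1, 3, 1]))  # 4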

Related

Count pairs of elements in an array whose sum equals a given sum (but) do it in a single iteration(!)

Given an array of integers and a number ‘sum’, find the number of pairs of integers in the array whose sum is equal to the given ‘sum’ in a SINGLE iteration. (O(n) time complexity is not enough!)
Usually I would iterate through the array twice: once to create a hashmap of frequencies and again to find the number of pairs, as shown below:
from collections import defaultdict

def getPairsCount(arr, n, sum):
    m = defaultdict(int)
    for i in range(0, n):  # iteration NO. 1
        m[arr[i]] += 1
    twice_count = 0
    for i in range(0, n):  # iteration NO. 2
        twice_count += m[sum - arr[i]]
        if sum - arr[i] == arr[i]:
            twice_count -= 1
    return int(twice_count / 2)
I was asked by an interviewer to do the same in a single iteration instead of two. I am at a loss how to do it without breaking on edge cases like {2, 2, 1, 1} where the required sum is 3.
A way is to build the hash map at the same time as you are consuming it (thereby only looping the list once). Thus, for each value in the array, check if you have seen the complement (the value needed for the sum) before. If so, you know you have a new pair, and you remove the complement from the seen values. Otherwise you do not have a sum and you add the value you have just seen.
In code this looks as follows:
from collections import defaultdict

def get_pairs_count(array, sum):
    pairs_count = 0
    seen_values = defaultdict(int)
    for value in array:
        complement = sum - value
        if seen_values[complement] > 0:
            pairs_count += 1
            seen_values[complement] -= 1
        else:
            seen_values[value] += 1
    return pairs_count
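A quick check against the edge case from the question. Note that this greedy approach consumes the complement once a pair is found, so each element ends up in at most one pair:
print(get_pairs_count([2, 2, 1, 1], 3))  # 2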
Another way:
def pair_sum2(arr, k):
    if len(arr) < 2:
        return
    seen = set()
    output = set()
    for num in arr:
        target = k - num
        print("target", target)
        if target not in seen:
            print("seen before add", seen)
            seen.add(num)
            print("seen", seen)
        else:
            output.add((min(num, target), max(num, target)))
            print("op:", output)
    print('\n'.join(map(str, list(output))))

Maximum items with given k tokens

I have N elements in an array. I can select the first item at most N times, the second item at most N-1 times, and so on.
I have K tokens to use and need to spend them so that I get the maximum number of items.
arr = [3, 4, 8], where each element is the number of tokens required for the i'th item
n = 10, the number of tokens I have
Output:
3
Explanation:
We have 2 options here:
1. Option 1: the 1st item 2 times for 6 tokens (3*2) and the 2nd item once for 4 tokens (4*1)
2. Option 2: the 1st item 3 times for 9 tokens (3*3)
So the maximum number of items we can get is 3.
Code:
def process(arr, n):
    count = 0
    sum = 0
    size = len(arr) + 1
    for i in range(0, len(arr), 1):
        size1 = size - 1
        size -= 1
        while (sum + arr[i] <= n) and (size1 > 0):
            size1 = size1 - 1
            sum = sum + arr[i]
            count += 1
    return count
But it works for only a few test cases and fails some hidden test cases. I am not sure where I made a mistake. Can anybody help me?
Your greedy approach will fail for test cases like this:
[8,2,1,1] 10
Your code will return 2 but the maximum will be 6.
I will use a heap of tuples, i.e. entries of the form (cost_of_ride, max_no_rides).
See the code below:
from heapq import *

def process(arr, n):
    count = 0
    heap = []
    for i in range(len(arr)):
        # Construct a min-heap keyed by cost; the second field is the negative of the maximum number of rides
        heappush(heap, (arr[i], -(len(arr) - i)))
    while n > 0 and heap:
        cost, no_of_rides = heappop(heap)
        no_of_rides = -1 * no_of_rides  # Change maximum no_of_rides from negative back to positive
        div = n // cost
        # If the remaining tokens are not sufficient for all rides of this item, take as many as we can afford
        if div < no_of_rides:
            count += div
            break
        # Else take all rides of this item and decrement the tokens by cost * maximum no_of_rides
        else:
            count += no_of_rides
            n -= no_of_rides * cost
    return count
Time Complexity for the solution is: O(len(arr)*lg(len(arr))) or O(N*lg(N)).
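For instance, a quick check against the examples discussed above:
print(process([3, 4, 8], 10))     # 3
print(process([8, 2, 1, 1], 10))  # 6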
Try:
def process(arr, n, res=[]):
    l = len(arr)
    for j in range(len(arr) + 1):
        r = [arr[0]] * j
        if (sum(r) == n) or (sum(r) < n) and (l == 1):
            yield len(res + r)
        elif sum(r) < n:
            yield from process(arr[1:], n - sum(r), res + r)
        else:
            break
The idea is to iterate over all possible combinations of taken tokens; more precisely, the options for an individual token are just that token taken between 0 and N times, where N is determined by the token's position, per your logic.
Combinations that exceed n are discarded along the way. The result is a generator that yields the lengths of all valid selections, so to answer your question you take max(...) of it.
Outputs:
>>> print(max(process([3,4,8],10)))
3
>>> print(max(process([8,2,1,1],10)))
6
>>> print(max(process([10, 8, 6, 4, 2], 30)))
6
#learner your logic doesn't seem to be working properly.
Please try these inputs: arr = [10, 8, 6, 4, 2], n = 30.
As per your description the answer should be 6 rides, but your code produces 3.
Use a modified form of quickselect, where you select the next pivot based on the sum of the products cost * max_times, but still partition based on just cost. This is worst-case O(n^2), but expected O(n).

Efficiently find number of pairs of duplicates

I am trying to find an algorithm that returns the number of pairs of duplicates in a list.
Example:
Input: [13,4,8,4,13,7,13,9,13]
Output: 7
(four 13's come out to 6 pairs and two 4's come out to 1 pair)
Can my algorithm become more efficient? I would like it to be faster than Theta(n^2)
Here is what I have:
my_List = [13, 3, 8, 3, 13, 7, 13, 9, 13]
pairs = 0
alreadySeen = []
for element in my_List:
    howMany = 0
    if element in alreadySeen:
        pass
    else:
        howMany = my_List.count(element)
        pairs = pairs + ((howMany * (howMany - 1)) // 2)
        howMany = 0
        alreadySeen.append(element)
print(pairs)
Here is an algorithm that runs in O(N).
Iterate over the elements once to create a dict of each element and its count.
The output of this step for your example is {13: 4, 4:2, 8:1, ...}
Iterate over that dict and calculate the number of pairs for each element. The number of pairs for each element can be thought of as selecting 2 items from a list of N elements. This could be done by calculating the combinations without repetitions using the formula (N * (N-1)) / 2. So for 4 elements, there are (4 * 3) / 2 = 6 pairs.
#Hesham Attia already provided the correct algorithm; here's a simple implementation with Counter:
>>> from collections import Counter
>>> l = [13,4,8,4,13,7,13,9,13]
>>> sum(x * (x - 1) // 2 for x in Counter(l).values())
7
Here is JavaScript code, which you can convert to Python; the complexity is linear, ~O(n):
var arr = [13, 3, 8, 3, 13, 7, 13, 9, 13];
var count = {};
var totalpairs = 0;
for (var i = 0; i < arr.length; i++) {
    if (count[arr[i]]) {
        count[arr[i]]++;
    } else {
        count[arr[i]] = 1;
    }
}
for (key in count) {
    // n occurrences of a value form n * (n - 1) / 2 pairs
    totalpairs = totalpairs + (count[key] * (count[key] - 1)) / 2;
}
console.log('total pairs are ' + totalpairs);
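Since the answer invites converting it to Python, here is a rough equivalent sketch of the same counting idea:
arr = [13, 3, 8, 3, 13, 7, 13, 9, 13]
count = {}
for v in arr:
    count[v] = count.get(v, 0) + 1
total_pairs = 0
for c in count.values():
    # c occurrences of a value form c * (c - 1) // 2 pairs
    total_pairs += c * (c - 1) // 2
print('total pairs are', total_pairs)  # total pairs are 7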
Here is a simple and efficient way of finding all possible duplicate pairs in a list with time complexity ~ O(N).
l = [13, 3, 8, 3, 13, 7, 13, 9, 13]
# Two pairs of 13 and one pair of 3
# Sum equals three
alreadySeen = set()
total_no_of_pairs = 0
for i in range(len(l)):
    if l[i] not in alreadySeen:
        alreadySeen.add(l[i])
    else:
        # Element l[i] is already in alreadySeen:
        # that indicates a pair, so increment the count
        # and remove the element so it can start a new pair
        total_no_of_pairs += 1
        alreadySeen.remove(l[i])
print(total_no_of_pairs)
Output:
3
Time Complexity : O(N)
arr = list(map(int, input().split()))
d = {}
for i in range(len(arr)):
    if arr[i] in d:
        d[arr[i]] += 1
    else:
        d[arr[i]] = 1
ans = 0
for val in d.values():
    if val > 1:
        ans += val * (val - 1) // 2
print(ans)

Finding the Kth Largest element in a Python List using recursion

Given an input list that contains some random unsorted numbers, I am trying to write a program that outputs the kth largest distinct element in that list. For example:
Input:
el = [10,10, 20,30,40, 40]
k = 2
Output: 30 #Since 30 is the second largest distinct element in the list
The following function takes as input a list, the pivot index, and k. It populates the list "lesser" with all elements less than the pivot and another list "greater" with all elements greater than the pivot.
Looking at the lengths of the two lists, I can determine whether the kth largest element is in the lesser list or the greater list, and I recursively call the same function. However, my program's output is wrong for certain values of k.
def kthLargest(el, pivotIndex, k):
    pivot = el[pivotIndex]
    lesser = []   # List to store all elements lesser than pivot
    greater = []  # List to store all elements greater than pivot
    equals = []   # List to store all elements equal to pivot
    for x in el:
        if x > pivot:
            greater.append(x)
        elif x < pivot:
            lesser.append(x)
        else:
            equals.append(x)
    g = len(greater)  # Length of greater list
    l = len(lesser)
    if g == k - 1:  # If greater list has k-1 elements, that makes the pivot the kth largest element
        return pivot
    elif g < k:
        return kthLargest(lesser, l - 1, k)  # If greater list is smaller than k, kth largest element is in lesser list
    else:
        return kthLargest(greater, g - 1, k)  # Else kth largest element is in greater list
Is there any reason you want to use recursion? To find the kth largest element of a list you have to look through the entire list, so the problem is essentially O(n) complexity anyway.
You could do this without recursion like this:
el = [10, 10, 53, 20, 30, 40, 59, 40]
k = 2

def kth_largest(input_list, k):
    # initialize the top_k list to the first k elements and sort descending
    top_k = input_list[0:k]
    top_k.sort(reverse=True)
    for i in input_list[k:]:
        if i > top_k[-1]:
            top_k.pop()               # remove the lowest of the top k elements
            top_k.append(i)           # add the new element
            top_k.sort(reverse=True)  # re-sort the list
    return top_k[-1]                  # return the kth largest
kth_largest(el, k)
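For comparison, the standard library's heapq.nlargest maintains the k largest items in the same spirit; a brief sketch of that alternative, not the code above:
import heapq

el = [10, 10, 53, 20, 30, 40, 59, 40]
k = 2
# nlargest keeps a heap of the k largest elements seen so far, roughly O(n log k)
print(heapq.nlargest(k, el)[-1])  # 53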
Here is a simple solution:
def kthmax(k, lst):
    if k == 1:
        return max(lst)
    else:
        m = max(lst)
        return kthmax(k - 1, [x for x in lst if x != m])

kthmax(3, [4, 6, 2, 7, 3, 2, 6, 6])
Output: 4
There's an easy way to do this problem using recursion. I'm just not sure why you need the pivot in the problem description... For example:
def find_kth(k, arr):
    if k == 1:
        return max(arr)
    m = max(arr)
    new_arr = list(filter(lambda a: a != m, arr))
    return find_kth(k - 1, new_arr)
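For example, with the list from the original question (the filter drops every copy of the current maximum, so duplicates are treated as one distinct value):
print(find_kth(2, [10, 10, 20, 30, 40, 40]))  # 30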
If we can pass in a list or Series that is already sorted in descending order, e.g.
el.sort_values(ascending=False, inplace=True)
then you can easily find the kth largest (index, value) tuple using simple slicing of the sorted DataFrame column and/or list:
import numpy as np
import pandas as pd

def kth_largest(input_series, k):
    new_series = input_series[k - 1:len(input_series)]
    return (np.argmax(new_series), np.max(new_series))

el = pd.Series([10, 10, 53, 20, 30, 40, 59, 40])
print(el)
k = 2
el.sort_values(ascending=False, inplace=True)
print(kth_largest(el, 2))
Output:
0 10
1 10
2 53
3 20
4 30
5 40
6 59
7 40
dtype: int64
(2, 53)
Algorithm: take the index of the max value and convert it to zero.
def high(arr, n):
    # Zero out the n - 1 largest values; the max of what remains is the nth largest
    for i in range(n - 1):
        arr[arr.index(max(arr))] = 0
    return max(arr)

high([1, 2, 3, 4, 5], 2)
My way of finding the Kth largest element is...
lst = [6, 2, 3, 4, 1, 5]
k = 2  # for example
print(sorted(lst, reverse=True)[k - 1])
Building on S Rohith Kumar's answer, if the input has duplicate values, then the answer can be:
print(sorted(set(lst),reverse=True)[k-1])
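For example, with the duplicate-containing list from the question above:
lst = [10, 10, 20, 30, 40, 40]
k = 2
print(sorted(set(lst), reverse=True)[k - 1])  # 30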

Increment first n list elements given a condition

I have a list for example
l = [10, 20, 30, 40, 50, 60]
I need to increment the first n elements of the list given a condition. The condition is independent of the list. For example, if n = 3, the list l should become:
l = [11, 21, 31, 40, 50, 60]
I understand that I can do it with a for loop over the elements of the list, but I need to do such an operation around 150 million times, so I am looking for a faster method to do this. Any help is highly appreciated. Thanks in advance.
Here's an operation-aggregating implementation in NumPy:
initial_array = # whatever your l is, but as a NumPy array
increments = numpy.zeros_like(initial_array)
...
# every time you want to increment the first n elements
if n:
    increments[n-1] += 1
...
# to apply the increments
initial_array += increments[::-1].cumsum()[::-1]
This is O(ops + len(initial_array)), where ops is the number of increment operations. Unless you're only doing a small number of increments over a very small portion of the list, this should be much faster. Unlike the naive implementation, it doesn't let you retrieve element values until the increments are applied; if you need to do that, you might need a solution based on a BST or BST-like structure to track increments.
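As a concrete, runnable sketch of the aggregation above (the particular values of n below are made up for illustration):
import numpy as np

l = np.array([10, 20, 30, 40, 50, 60])
increments = np.zeros_like(l)
for n in [3, 1, 3]:  # three "increment the first n elements" operations
    if n:
        increments[n - 1] += 1
# apply all increments at once via a reversed cumulative sum
l += increments[::-1].cumsum()[::-1]
print(l)  # [13 22 32 40 50 60]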
m = number of queries, n = length of the list to increment; O(n + m) algorithm idea:
Since you only ever increment from the start up to some k-th element, the increments form ranges. Let each increment be a pair (up to position, increment by). Example:
(1, 2) - increment positions 0 and 1 by 2
If we are trying to calculate the value at position k, we should add to it all increments whose positions are greater than or equal to k. How can we quickly calculate the sum of those increments? We can calculate the values starting from the back of the list while keeping a running sum of increments.
Proof of concept:
# list to increment
a = [1, 2, 5, 1, 6]
# (up to and including k-th index, increment by value)
queries = [(1, 2), (0, 10), (3, 11), (4, 3)]

# described algorithm implementation
increments = [0] * len(a)
for position, inc in queries:
    increments[position] += inc

got = list(a)
increments_sum = 0
for i in range(len(increments) - 1, -1, -1):
    increments_sum += increments[i]
    got[i] += increments_sum

# verify that the solution is correct using a slow but correct algorithm
expected = list(a)
for position, inc in queries:
    for i in range(position + 1):
        expected[i] += inc

print('Expected:', expected)
print('Got:     ', got)
output:
Expected: [27, 18, 19, 15, 9]
Got: [27, 18, 19, 15, 9]
You can create a simple data structure on top of your list which stores the start and end range of each increment operation. The start would be 0 in your case so you can just store the end.
This way you don't have to actually traverse the list to increment the elements, but you only retain that you performed increments on ranges for example {0 to 2} and {0 to 3}. Furthermore, you can also collate some operations, so that if multiple operations increment until the same index, you only need to store one entry.
The worst-case complexity of this solution is O(q + g*q*log q + n), where g is the number of get operations, q is the number of updates and n is the length of the list. Since we can have at most n distinct endings for the intervals, this reduces to O(q + n*log n + n) = O(q + n*log n). A naive solution using an update for each query would be O(q * l), where l (the length of a query) could be up to the size of n, giving O(q * n). So we can expect this solution to be better when q > log n.
A working Python example below:
import collections

class RangeStructure(object):
    def __init__(self, l):
        self.ranges = collections.defaultdict(int)
        self.l = l

    def incToPosition(self, k):
        self.ranges[k] += 1

    def get(self):
        res = self.l
        sorted_keys = sorted(self.ranges)
        last = len(sorted_keys) - 1
        to_add = 0
        while last >= 0:
            start = 0 if last < 1 else sorted_keys[last - 1]
            end = sorted_keys[last]
            to_add += self.ranges[end]
            for i in range(start, end):
                res[i] += to_add
            last -= 1
        return res

rs = RangeStructure([10, 20, 30, 40, 50, 60])
rs.incToPosition(2)
rs.incToPosition(2)
rs.incToPosition(3)
rs.incToPosition(4)
print(rs.get())
And an explanation:
after the inc operations, ranges will contain (start, end, inc) tuples of the form (0, 2, 2), (0, 3, 1), (0, 4, 1); these are represented in the dict as { 2:2, 3:1, 4:1 } since the start is always 0 and can be omitted
during the get operation, we ensure that we only operate on any list element once; we sort the ranges in increasing order of their end point, and traverse them in reverse order updating the contained list elements and the sum (to_add) to be added to subsequent ranges
This prints, as expected:
[14, 24, 32, 41, 50, 60]
You can use a list comprehension and append the remaining part of the list:
[x + 1 for x in a[:n]] + a[n:]
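For example, with the list and n from the question:
a = [10, 20, 30, 40, 50, 60]
n = 3
print([x + 1 for x in a[:n]] + a[n:])  # [11, 21, 31, 40, 50, 60]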
