Number of distinct contiguous subarrays - Python

import math

n = 7  # length of the list
k = 2  # target sum
arr = [1, 1, 1, 1, 4, 5, 1]
l = n

def segmentedtree(segmentedtreearr, arr, low, high, pos):  # build the segment tree
    if low == high:
        segmentedtreearr[pos] = arr[high]
        return
    mid = (low + high) // 2
    segmentedtree(segmentedtreearr, arr, low, mid, (2 * pos) + 1)
    segmentedtree(segmentedtreearr, arr, mid + 1, high, (2 * pos) + 2)
    segmentedtreearr[pos] = segmentedtreearr[(2 * pos) + 1] + segmentedtreearr[(2 * pos) + 2]

flag = int(math.ceil(math.log2(n)))  # height of the segment tree
size = 2 * int(math.pow(2, flag)) - 1  # size of the segment tree array
segmentedtreearr = [0] * size
low = 0
high = l - 1
pos = 0
segmentedtree(segmentedtreearr, arr, low, high, pos)
if n % 2 == 0:
    print(segmentedtreearr.count(k) + 1)
else:
    print(segmentedtreearr.count(k))
Now arr = [1,1,1,1,4,5,1], so the possible contiguous subarrays with sum equal to k = 2 are [1,1] at indices (0,1), [1,1] at indices (1,2) and [1,1] at indices (2,3), but I am getting 2 as the output even though I believe my implementation is correct.

Segment trees are good for looking up ranges when you have absolute positions, but in your case you are searching for a relative measure (a sum).
Your code misses pairs of ones that sit in two different branches of the tree, such as the pair at indices (1, 2): the tree only stores the sums of the aligned ranges it is built from.
As you can imagine, larger sums could span several branches (like for sum = 7). There is no trivial way to make use of this tree to answer the question.
It is much easier with a simple iteration through the list, using two indexes (left and right of a range), incrementing the left index when the sum is too large and incrementing the right index when it is too small. This assumes that all values in the input list are positive, which is stated in your reference to hackerrank:
def count_segments_with_sum(lst, total):
    i = 0
    count = 0
    for j, v in enumerate(lst):
        total -= v
        while total < 0:
            total += lst[i]
            i += 1
        count += not total
    return count

print(count_segments_with_sum([1,1,1,1,4,5,1], 2))  # -> 3

Here is an O(n) solution discarding the tree approach. It is not very optimized; my focus was on demonstrating the principle and using vectorizable components. It uses accumulate and groupby from itertools and merge from heapq:
import itertools as it, operator as op, heapq as hq

arr = [1, 1, 1, 1, 4, 5, 1]
k = 2
N = len(arr)

# compute the cumulative sum (starting at zero) and again shifted by `-k`
ps = list(it.chain(*(it.accumulate(it.chain((i,), arr), op.add) for i in (0, -k))))

# merge the cumsum and the shifted cumsum, indirectly (index based); any
# eligible subsequence shows up as a repeated number in the merge
idx = hq.merge(range(N + 1), range(N + 1, 2 * N + 2), key=ps.__getitem__)

# use groupby to find the repeats
grps = (list(grp) for _, grp in it.groupby(idx, key=ps.__getitem__))
grps = (grp for grp in grps if len(grp) > 1)
grps = [(i, j - N - 1) for i, j in grps]
Result:
[(0, 2), (1, 3), (2, 4)]
Some more detailed explanation:
1) We build the sequence ps = {0, arr_0, arr_0 + arr_1, arr_0 + arr_1 + arr_2, ...} of cumulative sums of arr. This is useful because any sum of a stretch of elements can be written as the difference between two terms of ps.
2) In particular, a contiguous subsequence that sums to k corresponds to a pair of elements of ps whose difference is k. To find those we make a copy of ps and subtract k from each element; we then need to find the numbers that occur both in ps and in the shifted ps.
3) Because ps and the shifted ps are sorted (assuming the terms of arr are positive), the numbers common to both can be found in O(n) using merge, which puts such pairs next to each other. The merge is stable, so we can rely on the element from ps coming first in any such pair.
4) It remains to find the pairs, which we do using groupby.
5) But wait a minute: if we do this directly, all we get in the end are pairs of equal values. If you just want to count them, that's fine; but if we want the actual sublists, we have to do the merge indirectly, using the key keyword argument, which works the same way as in sorted.
6) So we create two ranges of indices and use ps.__getitem__ as the key function. Because we have two lists but can only pass one key, we concatenate the lists first; as a consequence, the indices into the first and the second list are unique.
7) The result is a sequence of indices idx such that ps[idx[0]], ps[idx[1]], ... is sorted (ps in the program is ps with ps - k already glued to it). Using the same key function as before, we can do the groupby indirectly, on idx.
8) We then discard all groups that have only a single element, and for the remaining pairs we shift back the second index.
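For what it's worth, the same prefix-sum observation also gives a very short counting-only version using a dictionary of previously seen cumulative sums. This is a sketch, not part of the answer above, and it works even when the values are not all positive:

from itertools import accumulate
from collections import defaultdict

def count_subarrays_with_sum(arr, k):
    seen = defaultdict(int)  # seen[s]: how many prefix sums equal s so far
    seen[0] = 1              # the empty prefix
    count = 0
    for s in accumulate(arr):
        count += seen[s - k]  # each earlier prefix equal to s - k closes a subarray summing to k
        seen[s] += 1
    return count

print(count_subarrays_with_sum([1, 1, 1, 1, 4, 5, 1], 2))  # -> 3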


Find pairs of elements in order in terms of distance

Suppose we have data a₁, ..., aₙ, where n is an even integer and each aᵢ ∈ ℝ. Also define the distance between two elements as dis(aᵢ, aⱼ) = | aᵢ − aⱼ |. Now the program should output a list of pairs of elements sorted by distance in ascending order. The program should also pack the input data into pairs, so each element aᵢ appears only once in the output.
For example, given the input [1, 0.4, 3, 1.1] the output should be [(1, 1.1), (0.4, 3)].
A naive brute-force method is to calculate all C(n,2) pairs and sort them by distance.
def not_in_list_of_pair(i, ls):
    return i not in [p[0] for p in ls] + [p[1] for p in ls]

def calc(ls):
    ls = sorted(ls)
    d = {}
    for idx1, i in enumerate(ls[:-1]):
        for idx2, j in enumerate(ls[idx1+1:], idx1 + 1):
            d[(i, j)] = j - i
    # 2nd part
    res = []
    for pair in sorted(d, key=lambda k: d[k]):
        i, j = pair
        if not_in_list_of_pair(i, res) and not_in_list_of_pair(j, res):
            res.append(pair)
    return res

# another example
ls = [1, 0.1, 2, 2.4, 3, 4, 1.5]
assert calc(ls) == [(2, 2.4), (1, 1.5), (3, 4)]
But this naive method takes O(n²) time, and the 2nd part (extracting the minimum distance) is also slow. Therefore I am looking for a more efficient method to solve this problem. Thanks!
I have to say that your description of the problem is not clear, and the complexity in the description is not correct: you have to calculate the distance of all pairs of integers (which is O(n²)), and after that you sort all the distances (which is O(n² log(n²))).
For this problem, you are basically finding the two integers with the smallest distance, picking them out, and repeating the same process on the remaining integers.
One naive solution: suppose the integers are sorted and we only want the single pair with the smallest distance. Then we just need to calculate the distance between each two adjacent integers (e.g., between ls[0] and ls[1], between ls[1] and ls[2], ..., between ls[n - 2] and ls[n - 1]) and find out which pair is the smallest. After we find one, we remove the two selected integers, and the remaining integers are still sorted; finding the next pair with the smallest distance is then the same problem. A direct transcription of this idea is sketched below.
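Here is a minimal sketch of that naive procedure (my own transcription; O(n²) overall because each round rescans the gaps):

def calc_naive(ls):
    ls = sorted(ls)
    result = []
    while len(ls) >= 2:
        # find the adjacent pair with the smallest gap
        i = min(range(len(ls) - 1), key=lambda i: ls[i + 1] - ls[i])
        result.append((ls[i], ls[i + 1]))
        del ls[i:i + 2]  # deleting adjacent items keeps the list sorted
    return result

print(calc_naive([1, 0.1, 2, 2.4, 3, 4, 1.5]))  # -> [(2, 2.4), (1, 1.5), (3, 4)]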
The naive solution is still expensive in two aspects: (1) we need to recalculate the distance between each two adjacent integers every time; (2) we need to remove two integers from a sorted array and keep the array sorted.
To solve (1), we don't actually have to recalculate all the distances each time. E.g., suppose we have 6 integers and we have calculated dist(0, 1), dist(1, 2), dist(2, 3), dist(3, 4), dist(4, 5). If we find that the 2nd and the 3rd integers are the closest ones, we output and remove them. For the next round we need dist(0, 1), dist(1, 4), dist(4, 5): we only have to discard dist(1, 2) and dist(3, 4), which are now useless, and add one new distance dist(1, 4), while dist(0, 1) and dist(4, 5) are unchanged. We can maintain a btree (an ordered structure) for this purpose.
To solve (2), the best data structure for removing items from the middle in O(1) is a doubly linked list. But we are working with an array, and we may not want to change the array into a linked list. One trick is to use index arrays to mimic a doubly linked list.
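In isolation, the index-array trick looks like this (a hypothetical helper; the full answer below inlines the same updates):

def unlink(left, right, k):
    # remove element k from the implicit doubly linked list in O(1);
    # left[k]/right[k] hold the indices of k's neighbours, with the
    # sentinels -1 (before the start) and len(left) (past the end)
    if left[k] != -1:
        right[left[k]] = right[k]
    if right[k] != len(left):
        left[right[k]] = left[k]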
Here is an example.
Update 1: I found that OrderedDict does not pop the minimal item each time, and I can't find any data structure in Python that works as a btree. I have to use a heap, where I cannot delete the useless distances, but I can identify and ignore them. Sorry for the mistake.
Update 2: Added an else branch in the while loop, i.e., we should not touch the doubly linked list when we see a useless item.
Update 3: Just realized that the heap holds no more than n items in each iteration of the while loop, so the complexity is roughly O(n log n), with n being the number of integers.
from heapq import heappush, heappop

def calc(ls):
    ls = sorted(ls)  # O(n log n)
    n = len(ls)
    # mimic a doubly linked list
    left = [i - 1 for i in range(n)]
    right = [i + 1 for i in range(n)]
    appeared = [False for i in range(n)]
    btree = []
    for i in range(0, n - 1):
        # distance of adjacent integers, and their indices
        heappush(btree, (ls[i + 1] - ls[i], i, i + 1))
    # roughly O(n log n), because the heap has at most `n` items in each iteration
    result = []
    while len(btree) != 0:
        minimal = heappop(btree)
        a, b = minimal[1:3]
        # skip if either a or b already appeared
        if not appeared[a] and not appeared[b]:
            result.append((ls[a], ls[b]))
            appeared[a] = True
            appeared[b] = True
        else:
            continue  # this is important
        if left[a] != -1:
            right[left[a]] = right[b]
        if right[b] != n:
            left[right[b]] = left[a]
        if left[a] != -1 and right[b] != n:
            heappush(btree, (ls[right[b]] - ls[left[a]], left[a], right[b]))
    return result

ls = [1, 0.1, 2, 2.4, 3, 4, 1.5]
print(calc(ls))
With the following output:
[(2, 2.4), (1, 1.5), (3, 4)]
Note: The number of input integers is 7, which is NOT even.
I am not very familiar with Python, so I may not be using the best data structure in the above code snippet.

Find pairs of numbers that add to a certain value?

I have a function match that takes in a list of numbers and a target number, and I want it to find two numbers in the list that add up to that target.
Here is my approach:
def match(values, target=3):
    for i in values:
        for j in values:
            if j != i:
                if i + j == target:
                    return print(f'{i} and {j}')
    return print('no matching pair')
Is this solution valid? Can it be improved?
The best approach here gives an O(N log N) solution.
You sort the list; this costs O(N log N).
Once the list is sorted, you keep two indices: one points at the first element, the other at the last. Check whether the sum of the two elements matches your target: if the sum is above the target, move the upper index down; if the sum is below the target, move the lower index up. Finish when the two indices meet. This pass is linear and can be done in O(N) time.
All in all, you have O(N log N) for the sorting and O(N) for the sweep, bringing the complexity of the whole solution to O(N log N).
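A minimal sketch of that two-pointer scheme (the function name match_sorted is mine):

def match_sorted(values, target):
    values = sorted(values)          # O(N log N)
    lo, hi = 0, len(values) - 1
    while lo < hi:                   # O(N) sweep
        s = values[lo] + values[hi]
        if s == target:
            return values[lo], values[hi]
        if s > target:
            hi -= 1  # sum too large: move the upper index down
        else:
            lo += 1  # sum too small: move the lower index up
    return None  # no matching pair

print(match_sorted([1, 2, 3], 3))  # -> (1, 2)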
There is room for improvement. Right now you have a nested loop. Also, returning print(...) is pointless, since print always returns None.
As you iterate over values, you are getting the following:
values = [1, 2, 3]
target = 3
first_value = 1
difference: 3 - 1 = 2
We can see that in order for 1 to add up to 3, a 2 is required. Rather than iterating over the values, we can simply ask 2 in values.
def match(values, target):
    values = set(values)
    for value in values:
        summand = target - value
        if summand in values:
            break
    else:
        print('No matching pair')
        return
    print(f'{value} and {summand}')
Edit: Converted values to a set, since membership tests with in are much quicker on a set than on a list. If you require the indices of these pairs, such as in the LeetCode problem, you should not convert the list to a set, since you would lose the order; you should also use enumerate in the for loop to get the indices.
Edit: summand == value edge case
def match(values, target):
    for i, value in enumerate(values):
        summand = target - value
        if summand in values[i + 1:]:
            break
    else:
        print('No matching pair')
        return
    print(f'{value} and {summand}')
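For the index-returning variant mentioned above, a common dict-based sketch (hypothetical, not part of the original answer) runs in a single pass:

def match_indices(values, target):
    seen = {}  # maps each value seen so far to its index
    for i, value in enumerate(values):
        summand = target - value
        if summand in seen:
            return seen[summand], i
        seen[value] = i
    return None

print(match_indices([1, 2, 3], 3))  # -> (0, 1)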

Find the maximum Index difference given constraints

In my quest to learn algorithm design, I started practicing questions, and there is this particular question that I have trouble finding an efficient solution for.
Given an array A of integers, find the maximum of j - i subject to the constraint A[i] <= A[j].
A: [3, 5, 4, 2], Output: 2, for the pair (3, 4)
def maxIndex(arr):
    max_val = float("-inf")
    for i in range(0, len(arr)):
        for j in range(i + 1, len(arr)):
            if arr[i] <= arr[j]:
                diff_i = j - i
                if diff_i > max_val:
                    max_val = diff_i
    return max_val

A = [3, 5, 4, 2]
print("Result :", maxIndex(A))
My naive approach above works, but the time complexity is O(n²) with a space complexity of O(1).
Here both the values and the indices are important. If I sort the list out of place and store the indices in a dictionary, I will still have to use a nested for loop to check the j - i constraint.
How can I improve the time complexity?
You can create two auxiliary arrays, such that the min array stores at index i the minimum value up to index i, and the max array stores at index i the maximum value from index i onwards (filled by traversing the array in reverse). A single forward sweep over the two arrays then yields the answer, as sketched below.
You can find a full explanation here: https://www.geeksforgeeks.org/given-an-array-arr-find-the-maximum-j-i-such-that-arrj-arri/
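A sketch of that O(n) approach (my own transcription of the linked idea):

def max_index_diff(arr):
    n = len(arr)
    if n == 0:
        return 0
    # left_min[i]: smallest value in arr[0..i]
    left_min = [arr[0]] * n
    for i in range(1, n):
        left_min[i] = min(arr[i], left_min[i - 1])
    # right_max[j]: largest value in arr[j..n-1]
    right_max = [arr[-1]] * n
    for j in range(n - 2, -1, -1):
        right_max[j] = max(arr[j], right_max[j + 1])
    # both pointers only ever move forward, so the sweep is O(n)
    i = j = best = 0
    while i < n and j < n:
        if left_min[i] <= right_max[j]:
            best = max(best, j - i)
            j += 1
        else:
            i += 1
    return best

print(max_index_diff([3, 5, 4, 2]))  # -> 2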
As has been mentioned, there is an O(n) solution, which is the most efficient. I will add another way of solving it in O(n log n):
We can think of this problem as: for each index i, find the furthest index j > i where a[i] <= a[j]. If we had this, we would only need to take the difference of the indices and keep the maximum over them. So, how do we calculate this information?
Add all elements to an ordered set as pairs (element, index), so that it sorts first by element and then by index.
Now iterate the array backwards, starting from the last element. For every pair in the set whose element is lower than or equal to the current element, set its furthest index to the current index and remove it from the set.
After all is done, take the furthest index j recorded for each i; the answer is the maximum of those differences.
Note that for each element we search the set for all values that are lower. Each search is O(log n), and although a single step may visit several entries, every element is removed from the set right after it is visited, so each element is only touched once; the overall complexity is O(n log n).
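Python's standard library has no ordered set, but a min-heap gives the same pop-the-smallest behaviour this algorithm needs; here is a sketch with that substitution:

import heapq

def max_index_diff(a):
    heap = [(v, i) for i, v in enumerate(a)]  # (value, index) pairs
    heapq.heapify(heap)
    best = 0
    for j in range(len(a) - 1, -1, -1):
        # every remaining element with value <= a[j] has j as its
        # furthest valid partner, so pop it and record the difference
        while heap and heap[0][0] <= a[j]:
            _, i = heapq.heappop(heap)
            if i < j:
                best = max(best, j - i)
    return best

print(max_index_diff([3, 5, 4, 2]))  # -> 2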
A possible solution off the top of my head: build a list of pairs that preserves the indices along with the values, i.e. the list of (A_i, i) for all elements of the list.
Sort this list of pairs in ascending order and iterate from left to right, maintaining a variable min_index that holds the minimum index encountered so far. At every step i, update the answer as ans = max(ans, index_i - min_index) if index_i > min_index, and update min_index = min(index_i, min_index). Since the list is sorted, it is guaranteed that A[i] >= A[min_index]. A sketch follows below.
Since we need to sort the array initially, the overall complexity of the solution is O(n log n).
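A minimal sketch of this sort-and-sweep idea:

def max_index_diff(arr):
    if not arr:
        return 0
    # pair each value with its index and sort by value (ties by index)
    pairs = sorted((v, i) for i, v in enumerate(arr))
    best = 0
    min_index = pairs[0][1]
    for _, idx in pairs[1:]:
        if idx > min_index:
            best = max(best, idx - min_index)
        min_index = min(min_index, idx)
    return best

print(max_index_diff([3, 5, 4, 2]))  # -> 2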
There is an approach with O(n log n) time (while I have a feeling that a linear algorithm should exist).
Make a list min-candidates.
Walk through the source list.
If the current item is less than the current minimum, add its index to min-candidates; the corresponding values are therefore sorted in descending order.
If the current item is larger than the current minimum, search min-candidates for the first item less than or equal to it, using binary search. Take the index difference and compare it with the current best result. (A sketch follows this list.)
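A sketch of this candidates-plus-binary-search idea (my own transcription; the search is written out by hand because the candidate values are in descending order):

def max_index_diff(arr):
    if not arr:
        return 0
    # indices of the strictly decreasing prefix minima; the values at
    # these indices form a descending sequence
    candidates = [0]
    for i in range(1, len(arr)):
        if arr[i] < arr[candidates[-1]]:
            candidates.append(i)
    best = 0
    for j in range(len(arr)):
        # binary search for the leftmost candidate whose value is <= arr[j]
        lo, hi = 0, len(candidates)
        while lo < hi:
            mid = (lo + hi) // 2
            if arr[candidates[mid]] <= arr[j]:
                hi = mid
            else:
                lo = mid + 1
        if lo < len(candidates):
            best = max(best, j - candidates[lo])
    return best

print(max_index_diff([3, 5, 4, 2]))  # -> 2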
This could be solved in O(nlogn) time and O(n) space.
Create a list of tuples of [value, index]
Sort them by value
Initialize min_index to some max value (list.length + 1)
Initialize a max value for the difference of indices
Initialize a tuple to capture the indices that have the max difference.
Now go through the following steps (pseudo code):
min_index = list.length + 1
max = 0
max_tuple = []
for tuple t in list:
    min_index = minimum(t.index, min_index)
    if t.index != min_index:
        if t.index - min_index >= max:
            max = t.index - min_index
            max_tuple = [min_index, t.index]
In other words, you keep track of minimum index and because your list is sorted, as you go through the list in increasing value order, you will get a difference between the min index and your current index which you need to maximize.

How to create a list of all possible lists satisfying a certain condition?

I'm currently trying to do Project Euler problem 18 (https://projecteuler.net/problem=18) using the 'brute force' method to check all possible paths. So far I've just been trying the smaller 'model' triangle.
I was using list comprehension to create a list of lists where the inner lists would contain the indices for that line, for example:
lst = [[a,b,c,d] for a in [0] for b in [0,1] for c in [0,1,2] for d in [0,1,2,3]
       if b == a or b == a + 1
       if c == b or c == b + 1
       if d == c or d == c + 1]
This gives me the list of lists I want, namely:
[[0,0,0,0], [0,0,0,1], [0,0,1,1], [0,0,1,2], [0,1,1,1], [0,1,1,2], [0,1,2,2], [0,1,2,3]]
Note: the if conditions ensure that the path only moves to adjacent numbers in the next row of the triangle, i.e. each index is either equal to the index chosen in the previous row or one greater.
After I got to this point, I intended that for each of the inner lists I would take the numbers associated with those indices (so [0,0,0,0] would be 3, 7, 2, 8), sum them, and this way get all of the possible path sums, then take the maximum of those.
The problem is that if I were to scale this up to the big triangle, I'd have fifteen 'for's and 'if's in my list comprehension. It seems like there must be an easier way! I'm pretty new to Python, so hopefully there's some obvious feature I can make use of that I've missed so far!
What an interesting question! Here is a simple brute-force approach; note the use of itertools to generate all the combinations, and the subsequent step ruling out all the cases where successive row indices don't stay the same or increase by one.
import itertools
import numpy as np

# Here is the input triangle (kept as a plain list of lists, since the rows have different lengths)
tri = [[3], [7, 4], [2, 4, 6], [8, 5, 9, 3]]
indices = [range(len(row)) for row in tri]

# Generate all the possible combinations of row indices
indexCombs = list(itertools.product(*indices))

# Generate the differences between indices in successive rows for each combination
diffCombs = [np.array(i[1:]) - np.array(i[:-1]) for i in indexCombs]

# The only valid combinations are those where each successive row index stays the same or grows by 1
validCombs = [indexCombs[i] for i in range(len(indexCombs))
              if np.all((diffCombs[i] >= 0) & (diffCombs[i] <= 1))]

# Now get the actual values from the triangle for each row combination
valueCombs = [[tri[i][j[i]] for i in range(len(tri))] for j in validCombs]

# Find the sum for each combination
sums = np.sum(valueCombs, axis=1)

# Print the information pertaining to the largest sum
print('Highest sum: {0}'.format(sums.max()))
print('Combination: {0}'.format(valueCombs[sums.argmax()]))
print('Row indices: {0}'.format(validCombs[sums.argmax()]))
The output is:
Highest sum: 23
Combination: [3, 7, 4, 9]
Row indices: (0, 0, 1, 2)
Unfortunately this is hugely intensive computationally, so it won't work with the large triangle - but there are definitely some concepts and tools that you could extend to try to get it to work!
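For reference (this goes beyond the answer above), the standard way to make the large triangle tractable is to fold it bottom-up, so each cell absorbs the better of its two children; this visits every cell once instead of enumerating paths:

def max_path_total(tri):
    best = list(tri[-1])
    for row in reversed(tri[:-1]):
        # each position keeps its value plus the better of the two sums below it
        best = [v + max(best[i], best[i + 1]) for i, v in enumerate(row)]
    return best[0]

tri = [[3], [7, 4], [2, 4, 6], [8, 5, 9, 3]]
print(max_path_total(tri))  # -> 23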

Improving the execution time of matrix calculations in Python

I work with a large amount of data, and the execution time of this piece of code is very important. The results in each iteration are interdependent, so it's hard to parallelize. It would be awesome if there is a faster way to implement some parts of this code, like:
finding the max element in the matrix and its indices
changing the values in a row/column with the max from another row/column
removing a specific row and column
Filling the weights matrix is pretty fast.
The code does the following:
it contains a list of lists of words word_list, with count elements in it. At the beginning each word is a separate list.
it contains a two-dimensional list (count x count) of float values weights (a lower triangular matrix: the values for which i <= j are zeros)
in each iteration it does the following:
it finds the two words with the most similar value (the max element in the matrix and its indices)
it merges their row and column, saving the larger value from the two in each cell
it merges the corresponding word lists in word_list. It saves both lists in the one with the smaller index (max_j) and it removes the one with the larger index (max_i).
it stops when the largest value is less than a given THRESHOLD
I might think of a different algorithm to do this task, but I have no ideas for now and it would be great if there is at least a small performance improvement.
I tried using NumPy but it performed worse.
weights = fill_matrix(count, N, word_list)

while 1:
    # find the max element in the matrix and its indices
    max_element = 0
    for i in range(count):
        max_e = max(weights[i])
        if max_e > max_element:
            max_element = max_e
            max_i = i
            max_j = weights[i].index(max_e)
    if max_element < THRESHOLD:
        break
    # reset the value of the max element
    weights[max_i][max_j] = 0
    # here it is important that max_j is always less than max_i (since it's a lower triangular matrix)
    for j in range(count):
        weights[max_j][j] = max(weights[max_i][j], weights[max_j][j])
    for i in range(count):
        weights[i][max_j] = max(weights[i][max_j], weights[i][max_i])
    # compare the symmetrical elements, set the ones above the diagonal to 0
    for i in range(count):
        for j in range(count):
            if i <= j:
                if weights[i][j] > weights[j][i]:
                    weights[j][i] = weights[i][j]
                weights[i][j] = 0
    # remove the max_i-th column
    for i in range(len(weights)):
        weights[i].pop(max_i)
    # remove the max_i-th row
    weights.pop(max_i)
    new_list = word_list[max_j]
    new_list += word_list[max_i]
    word_list[max_j] = new_list
    # remove the element that was just merged into a cluster
    word_list.pop(max_i)
    count -= 1
This might help:
def max_ij(A):
    # (column index, value) of the maximum in each row
    t1 = [max(enumerate(row), key=lambda r: r[1]) for row in A]
    # pick the row whose maximum is largest overall
    t2 = max(enumerate(t1), key=lambda r: r[1][1])
    i, (j, max_) = t2
    return max_, i, j
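You mention that NumPy performed worse overall, but for the max-lookup step in isolation a vectorized version can be much faster. A sketch, assuming the weights are stored as a 2-D NumPy array:

import numpy as np

def max_ij_np(W):
    # flat argmax, then convert back to (row, col) coordinates
    i, j = np.unravel_index(np.argmax(W), W.shape)
    return W[i, j], i, j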
It depends on how much work you want to put into it but if you're really concerned about speed you should look into Cython. The quick start tutorial gives a few examples ranging from a 35% speedup to an amazing 150x speedup (with some added effort on your part).
