Fast way to check consecutive subsequences for total

I have a list (up to 10,000 long) of numbers 0, 1, or 2.
I need to count how many consecutive subsequences have a total which is NOT 1. My current method, for each list, is:
cons = 0
for i in range(seqlen+1):
    for j in range(i+1, seqlen+1):
        if sum(twos[i:j]) != 1:
            cons += 1
So an example input would be:
[0, 1, 2, 0]
and the output would be
cons = 8
as the 8 working subsequences are:
[0] [2] [0] [1,2] [2, 0] [0, 1, 2] [1, 2, 0] [0, 1, 2, 0]
The issue is that simply going through all these subsequences (the i in range, j in range) takes almost more time than is allowed, and when the if statement is added, the code takes far too long to run on the server. (To be clear, this is only a small part of a larger problem, I'm not just asking for the solution to an entire problem). Anyway, is there any other way to check faster? I can't think of anything that wouldn't result in more operations needing to happen every time.

A note on terminology first: in general a subsequence does not have to be consecutive; the consecutive runs you are counting are usually called substrings or subarrays.
Do not sum every candidate. Instead, identify every candidate whose sum is 1, and then subtract that total from the computed quantity of all sub-sequences (simple algebra).
All of the 1-sum candidates are of the regular expression form 0*10*: a 1 surrounded by any quantity of 0s on either or both sides.
Identify all such maximal-length strings. For instance, in
210002020001002011
you will pick out 1000, 000100, 01, and 1. For each string, compute the number of substrings that contain the 1: with a zeros on its left and b zeros on its right, that is (a+1)(b+1). Add up those quantities and subtract the sum from the total number of substrings of the entire input. There's your answer.
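A minimal sketch of this counting idea, assuming the input is a Python list holding only 0, 1 and 2. The name count_not_one and the two helper lists are illustrative and do not come from the question. For every 1, it multiplies the number of ways to extend left over zeros by the number of ways to extend right over zeros, so each 1-sum substring is counted exactly once.

```python
def count_not_one(seq):
    """Count contiguous runs of seq (values 0, 1, 2) whose sum is not 1."""
    n = len(seq)
    total = n * (n + 1) // 2  # number of non-empty contiguous runs

    # zeros_before[i]: length of the run of 0s ending just before index i
    zeros_before = [0] * n
    run = 0
    for i, value in enumerate(seq):
        zeros_before[i] = run
        run = run + 1 if value == 0 else 0

    # zeros_after[i]: length of the run of 0s starting just after index i
    zeros_after = [0] * n
    run = 0
    for i in range(n - 1, -1, -1):
        zeros_after[i] = run
        run = run + 1 if seq[i] == 0 else 0

    # Each 1 contributes (a + 1) * (b + 1) runs whose sum is exactly 1.
    sum_is_one = sum(
        (zeros_before[i] + 1) * (zeros_after[i] + 1)
        for i, value in enumerate(seq)
        if value == 1
    )
    return total - sum_is_one


print(count_not_one([0, 1, 2, 0]))  # 8, as in the question's example
```

This does a constant amount of work per element, so a list of 10,000 numbers needs three linear passes instead of roughly fifty million slices.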

Use the sliding window technique to solve this type of problem. Keep two variables, first and last, to track the bounds of the window, and start with sum equal to the first element. If the sum is larger than the required value, subtract the element at 'first' from sum and increment first by 1. If the sum is smaller than required, add the next element after the 'last' pointer and increment last by 1. Every time sum is equal to the required value, increment a counter.
As for NOT, count the number of sub-sequences having a sum of 1 and then subtract it from the total number of sub-sequences possible, i.e. n * (n + 1) / 2.
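With zeros in the list, several different windows can share the same sum, so the plain two-pointer walk needs care. One common variant, swapped in here rather than taken from the answer, counts windows whose sum is at most a limit and takes a difference to get "exactly 1". This is only a sketch; it assumes all values are non-negative, and the function names are illustrative.

```python
def count_at_most(seq, limit):
    """Number of contiguous runs of a non-negative seq whose sum is <= limit."""
    if limit < 0:
        return 0
    count = 0
    window_sum = 0
    first = 0
    for last, value in enumerate(seq):
        window_sum += value
        # Shrink from the left until the window fits under the limit again.
        while window_sum > limit:
            window_sum -= seq[first]
            first += 1
        # Every start position in first..last gives a valid run ending at last.
        count += last - first + 1
    return count


def count_sum_not_equal(seq, target=1):
    """Runs whose sum differs from target = all runs - runs with sum == target."""
    n = len(seq)
    exactly = count_at_most(seq, target) - count_at_most(seq, target - 1)
    return n * (n + 1) // 2 - exactly


print(count_sum_not_equal([0, 1, 2, 0]))  # 8
```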

Related

Non-tail recursion within a for loop

Given an array of numbers, find the length of the longest increasing subsequence in the array. The subsequence does not necessarily have to be contiguous.
For example, given the array [0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15], the longest increasing subsequence has length 6: it is 0, 2, 6, 9, 11, 15.
One of the solutions to the above problem uses non-tail recursion within a for loop, and I am having trouble making sense of it. I don't understand when the code after the recursive call in the for loop is executed, and I can't visualize the entire execution process of the whole solution.
def longest_increasing_subsequence(arr):
    if not arr:
        return 0
    if len(arr) == 1:
        return 1
    max_ending_here = 0
    for i in range(len(arr)):
        ending_at_i = longest_increasing_subsequence(arr[:i])
        if arr[-1] > arr[i - 1] and ending_at_i + 1 > max_ending_here:
            max_ending_here = ending_at_i + 1
    return max_ending_here
The description of the solution is as follows:
Assume that we already have a function that gives us the length of the longest increasing subsequence. Then we’ll try to feed some part of our input array back to it and try to extend the result. Our base cases are: the empty list, returning 0, and an array with one element, returning 1.
Then,
For every index i up until the second to last element, calculate longest_increasing_subsequence up to there.
We can only extend the result with the last element if our last element is greater than arr[i] (since otherwise, it’s not increasing).
Keep track of the largest result.
Source: https://www.dailycodingproblem.com/blog/longest-increasing-subsequence/
**EDITS**:
To clarify what I mean by "I don't understand when the code after the recursive call in the for loop is executed", here is my understanding:
Some code calls lis([0, 8, 4, 12, 2]).
arr = [0, 8, 4, 12, 2] doesn't meet either of the two base cases.
The for loop makes the first call when i = 0 in the line ending_at_i = lis([]). This is the first base case, so it returns 0. I can't understand why control doesn't return to the for loop so that ending_at_i is set to 0, and the if condition is executed (because it surely isn't checked else [][-1] would throw an error), after which we can move on to the for loop making the second call when i = 1, third call when i = 2 which would branch into two calls, and so on.

Here's how this function works. First, it handles the degenerate cases where the list length is 0 or 1.
It then looks for the solution when the list length is >= 2. There are two possibilities for the longest sequence: (1) It may contain the last number in the list, or (2) It may not contain the last number in the list.
For case (1), if the last number in the list is in the longest sequence, then the number before it in the longest sequence must be one of the earlier numbers. Suppose the number before it in the sequence is at position x. Then the longest sequence is the longest sequence taken from the numbers in the list up to and including x, plus the last number in the list. So it recurses on all of the possible positions of x, which are 0 through the list length minus 2. It iterates i over range(len(arr)), which is 0 through len(arr)-1. But it then uses i as the upper bound in the slice, so the last element in the slice corresponds to indices -1 through len(arr)-2. In the case of -1, this is an empty slice, which handles the case where all values in the list before the last are >= the last element.
This handles case (1). For case (2), we just need to find the largest sequence from the sublist that excludes the last element. However, this check is missing from the posted code, which is why the wrong answer is given for a list like [1, 2, 3, 0]:
>>> longest_increasing_subsequence([1, 2, 3, 0])
0
>>>
Obviously the correct answer in this case is 3, not 0. This is fairly easy to fix, but somehow was left out of the posted version.
Also, as others have pointed out, creating a new slice each time it recurses is unnecessary and inefficient. All that's needed is to pass the length of the sublist to achieve the same result.
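To watch the order of execution, one option is to add print calls to the posted function. The sketch below leaves the posted logic unchanged (including the missing case (2)); the name traced_lis and the depth parameter are additions for illustration only.

```python
def traced_lis(arr, depth=0):
    """The posted function with print calls added; the logic is unchanged."""
    pad = "  " * depth
    print(f"{pad}call   {arr}")
    if not arr:
        print(f"{pad}return 0 (empty list)")
        return 0
    if len(arr) == 1:
        print(f"{pad}return 1 (single element)")
        return 1
    max_ending_here = 0
    for i in range(len(arr)):
        ending_at_i = traced_lis(arr[:i], depth + 1)
        # Control is back in this frame; the test below indexes this frame's arr,
        # not the slice that was passed to the recursive call.
        print(f"{pad}i={i}: ending_at_i={ending_at_i}, "
              f"arr[-1]={arr[-1]}, arr[i-1]={arr[i - 1]}")
        if arr[-1] > arr[i - 1] and ending_at_i + 1 > max_ending_here:
            max_ending_here = ending_at_i + 1
    print(f"{pad}return {max_ending_here}")
    return max_ending_here


traced_lis([0, 8, 4])
```

Each "i=..." line is printed only after the nested call has printed its own return line, so control does come back to the loop every time. For i = 0 the comparison is arr[-1] > arr[-1] on the caller's list, which is simply False, so no IndexError is raised.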

Here is a (hopefully good enough) explanation:
ending_at_i = the length of the LIS when you clip arr at the i-th index (that is, considering elements arr[0], arr[1], ..., arr[i-1]).
if arr[-1] > arr[i - 1] and ending_at_i + 1 > max_ending_here
if arr[-1] > arr[i - 1] = if the last element of arr is greater than the last element of the part of arr corresponding to ending_at_i
if ending_at_i + 1 > max_ending_here = if appending the last element of arr to the LIS found during computing ending_at_i is larger than the current best LIS
The recursive step is then:
1. Let an oracle tell you the length of the LIS in arr[:i] (= arr[0], arr[1], ..., arr[i-1]).
2. Realize that, if the last element of arr, that is, arr[-1], is larger than the last element of arr[:i], then whatever the LIS inside arr[:i] was, if you take it and append arr[-1], it will still be an LIS, except that it will be one element longer.
3. Check whether arr[-1] is actually larger than arr[i-1] (= arr[:i][-1]).
4. Check whether appending arr[-1] to the LIS of arr[:i] creates the new optimal solution.
Repeat the steps above for i in range(len(arr)).
The result will be the knowledge of the length of the LIS inside arr.
All that being said, since every call makes O(n) further recursive calls (so the total work grows exponentially with the length of the list), there are very few worse feasible solutions to the problem.
You tagged dynamic programming, however, this is precisely the anti-example of such. Dynamic programming lets you reuse the solutions to subproblems, which is precisely what this algorithm doesn't do, hence wasting time. Check out a DP solution instead.
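As a rough illustration of what a DP solution can look like, here is the standard quadratic-time table approach. It is not the code from the linked blog post; the names lis_length and best are chosen for this sketch. Each table entry is computed once and then reused, which is exactly the reuse the recursive version lacks.

```python
def lis_length(arr):
    """Length of the longest strictly increasing subsequence, in O(n^2) time."""
    # best[j] = length of the longest increasing subsequence ending exactly at arr[j]
    best = []
    for j, value in enumerate(arr):
        longest_before = 0
        for i in range(j):
            # arr[j] may extend any earlier subsequence that ends on a smaller value.
            if arr[i] < value and best[i] > longest_before:
                longest_before = best[i]
        best.append(longest_before + 1)
    return max(best, default=0)


print(lis_length([0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15]))  # 6
print(lis_length([1, 2, 3, 0]))  # 3
```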

Sample Online Data Algorithm Analysis [closed]

I am going through a book called "Elements of Programming Interviews" and have gotten stuck at the following problem:
Implement an algorithm that takes as input an array of distinct
elements and a size, and returns a subset of the given size of the
array elements. All subsets should be equally likely. Return the
result in input array itself.
The solution they provide below is:
import random
def random_sampling(k, A):
    for i in range(k):
        # Generate a random index in [i, len(A) - 1].
        r = random.randint(i, len(A) - 1)
        A[i], A[r] = A[r], A[i]
A = [3, 7, 5, 11]
k = 3
print(random_sampling(k, A))
I do not understand what the authors are trying to do intuitively. Their explanation is below:
Another approach is to enumerate all subsets of size k and then select
one at random from these. Since there are (n choose k) subsets of size k,
the time and space complexity are huge. The key to efficiently
building a random subset of size exactly k is to first build one of
size k - 1 and then adding one more element, selected randomly from
the rest. The problem is trivial when k = 1. We make one call to the
random number generator, take the returned value mod n (call it r),
and swap A[0] with A[r]. The entry A[0] now holds the result.
For k > 1, we begin by choosing one element at random as above and we
now repeat the same process with n - 1 element sub-array A[1, n -1].
Eventually, the random subset occupies the slots A[0, k - 1] and the
remaining elements are in the last n - k slots.
Intuitively, if all subsets of size k are equally likely, then the
construction process ensures that the subset of size k + 1 are also
equally likely. A formal proof for this uses mathematical induction -
the induction hypothesis is that every permutation of every size k
subset of A is equally likely to be in A[0, k -1].
As a concrete example, let the input be A = <3, 7, 5, 11> and the size
be 3. In the first iteration, we use the random number generator to
pick a random integer in the interval [0,3]. Let the returned random
number be 2. We swap A[0] with A[2] - now the array is <5, 7, 3, 11>.
Now we pick a random integer in the interval [1, 3]. Let the returned
random number be 3. We swap A[1] with A[3] - now the resulting array
is <5, 11, 3, 7>. Now we pick a random integer in the interval [2,3].
Let the returned random number be 2. When we swap A[2] with itself the
resulting array is unchanged. The random subset consists of the first
three entries, i.e., {5, 11, 3}.
Sorry for the long text; my questions are these:
1. What is the key to efficiency they are referring to? It's not clicking in my head.
2. What did they mean by "eventually, the random subset occupies the slots A[0, k-1] and the remaining elements are in the last n - k slots"?
3. Is there a clear reason why "every permutation of every size k subset of A is equally likely to be in A[0, k - 1]"?
4. Can you explain the theory behind the algorithm in clearer terms?
5. What is the return of the algorithm supposed to be?
Thanks

An intuitive solution might be
import random
def random_sampling(k, A):
    subset = []
    selected = set()
    for i in range(k):
        # Redraw until we hit an index that has not been used yet.
        index = random.randint(0, len(A) - 1)
        while index in selected:
            index = random.randint(0, len(A) - 1)
        selected.add(index)
        subset.append(A[index])
    return subset
but it's not clear that every k-subset has equal probability (because for the same k you may use a different number of random draws on different ranges),
so a solution that fits the probability condition would be
import itertools as it
import random
def random_sampling(k, A):
    # Enumerate every k-subset of index positions (a very expensive step).
    index_possibilities = list(it.combinations(range(len(A)), k))
    # Pick one of the equally likely index tuples.
    chosen = random.choice(index_possibilities)
    selected = []
    for i in chosen:
        selected.append(A[i])
    return selected
1. The solution they gave makes sure you use the same procedure of random draws for every set of k elements, without the brute force above.
2. After the function runs, the first k elements of the list are the ones we selected, and the rest of the list holds the remaining items.
3. This is the induction assumption: assume that every set of length k-1 has the same probability, and prove it for sets of length k.
4. An efficient way to ensure the same probability for every subset of size k is to perform exactly the same steps to produce each of them.
5. There is no return value, because the list is modified in place: the change made inside the function is also visible in the caller. The subset is the first k elements of the list after the function has been called.
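One way to get a feel for the "equally likely" claim is to run the book's function many times and tally the subsets. The sketch below reuses random_sampling as posted; the wrapper sample_subset and the trial count of 40000 are assumptions made for this experiment, and the tallies will vary from run to run.

```python
import random
from collections import Counter


def random_sampling(k, A):
    """The book's in-place routine: afterwards A[:k] holds the sample."""
    for i in range(k):
        r = random.randint(i, len(A) - 1)  # randint is inclusive on both ends
        A[i], A[r] = A[r], A[i]


def sample_subset(k, items):
    """Hypothetical wrapper: copy the input, sample in place, return the first k."""
    work = list(items)
    random_sampling(k, work)
    return work[:k]


# Tally how often each 3-element subset of [3, 7, 5, 11] is produced.
counts = Counter(frozenset(sample_subset(3, [3, 7, 5, 11])) for _ in range(40000))
for subset, seen in sorted(counts.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(subset), seen)
```

There are four possible subsets of size 3, so each tally should land close to a quarter of the trials. That is evidence rather than a proof; the induction argument in the book is what establishes the claim.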

Evenly distribute within a list (Google Foobar: Maximum Equality)

This question comes from Google Foobar, and my code passes all but the last test, with the input/output hidden.
The prompt
In other words, choose two elements of the array, x[i] and x[j]
(i distinct from j) and simultaneously increment x[i] by 1 and decrement
x[j] by 1. Your goal is to get as many elements of the array to have
equal value as you can.
For example, if the array was [1,4,1] you could perform the operations
as follows:
Send a rabbit from the 1st car to the 0th: increment x[0], decrement
x[1], resulting in [2,3,1].
Send a rabbit from the 1st car to the 2nd: increment x[2], decrement
x[1], resulting in [2,2,2].
All the elements of the array are equal now, and you've got a
strategy to report back to Beta Rabbit!
Note that if the array was [1,2], the maximum possible number of equal
elements we could get is 1, as the cars could never have the same
number of rabbits in them.
Write a function answer(x), which takes the array of integers x and
returns the maximum number of equal array elements that we can get, by
doing the above described command as many times as needed.
The number of cars in the train (elements in x) will be at least 2,
and no more than 100. The number of rabbits that want to share a car
(each element of x) will be an integer in the range [0, 1000000].
My code
from collections import Counter
def most_common(lst):
    data = Counter(lst)
    return data.most_common(1)[0][1]
def answer(x):
    """The goal is to take all of the rabbits in list x and distribute
    them equally across the original list elements."""
    total = sum(x)
    length = len(x)
    # Find out how many are left over when distributing naively.
    div, mod = divmod(total, length)
    # Because of the variable size of the list, the remainder
    # might be greater than the length of the list.
    # I just realized this is unnecessary.
    while mod > length:
        div += length
        mod -= length
    # Create a new list the size of x with the base number of rabbits.
    result = [div] * length
    # Distribute the leftovers from earlier across the list.
    for i in xrange(mod):
        result[i] += 1
    # Return the most common element.
    return most_common(result)
It runs well under my own testing purposes, handling one million tries in ten or so seconds. But it fails under an unknown input.
Have I missed something obvious, or did I make an assumption I shouldn't have?

Sorry, but your code doesn't work in my testing. I fed it [0, 0, 0, 0, 22] and got back a list of [5, 5, 4, 4, 4] for an answer of 3; the maximum would be 4 identical cars, with the original input being one such example. [4, 4, 4, 4, 6] would be another. I suspect that's your problem, and that there are quite a few other such examples in the data base.
For N cars, the maximum would be either N (if the rabbit population is divisible by the number of cars) or N-1. This seems so simple that I fear I'm missing a restriction in the problem. It didn't ask for a balanced population, just as many car populations as possible should be equal. In short:
def answer(census):
    size = len(census)
    return size if sum(census) % size == 0 else (size-1)
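To make the "N or N-1" claim more concrete, the sketch below builds one arrangement that reaches the claimed maximum. The helper witness_layout is hypothetical and only for illustration; it assumes that any arrangement with the same total and no negative car can be reached by repeated single-rabbit moves.

```python
from collections import Counter


def answer(census):
    size = len(census)
    return size if sum(census) % size == 0 else size - 1


def witness_layout(census):
    """Hypothetical helper: one arrangement that attains answer(census)."""
    size, total = len(census), sum(census)
    base = total // size
    # size - 1 cars hold `base` rabbits; the last car absorbs whatever is left,
    # which is at least `base`, so no car ends up negative.
    return [base] * (size - 1) + [total - base * (size - 1)]


for cars in ([1, 4, 1], [1, 2], [0, 0, 0, 0, 22]):
    layout = witness_layout(cars)
    most_equal = Counter(layout).most_common(1)[0][1]
    print(cars, "->", layout, "equal cars:", most_equal, "answer:", answer(cars))
```

For [0, 0, 0, 0, 22] this produces [4, 4, 4, 4, 6], the same example given above.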

Random contiguous slice of list in Python based on a single random integer

Using a single random number and a list, how would you return a random slice of that list?
For example, given the list [0,1,2] there are seven possibilities of random contiguous slices:
[ ]
[ 0 ]
[ 0, 1 ]
[ 0, 1, 2 ]
[ 1 ]
[ 1, 2]
[ 2 ]
Rather than getting a random starting index and a random end index, there must be a way to generate a single random number and use that one value to figure out both starting index and end/length.
I need it that way, to ensure these 7 possibilities have equal probability.

Simply fix one order in which you would sort all possible slices, then work out a way to turn an index in that list of all slices back into the slice endpoints. For example, the order you used could be described by
The empty slice is before all other slices
Non-empty slices are ordered by their starting point
Slices with the same starting point are ordered by their endpoint
So the index 0 should return the empty list. Indices 1 through n should return [0:1] through [0:n]. Indices n+1 through n+(n-1)=2n-1 would be [1:2] through [1:n]; 2n through n+(n-1)+(n-2)=3n-3 would be [2:3] through [2:n] and so on. You see a pattern here: the last index for a given starting point is of the form n+(n-1)+(n-2)+(n-3)+…+(n-k), where k is the starting index of the sequence. That's an arithmetic series, so that sum is (k+1)(2n-k)/2=(2n+(2n-1)k-k²)/2. If you set that term equal to a given index, and solve that for k, you get some formula involving square roots. You could then use the ceiling function to turn that into an integral value for k corresponding to the last index for that starting point. And once you know k, computing the end point is rather easy.
But the quadratic equation in the solution above makes things really ugly. So you might be better off using some other order. Right now I can't think of a way which would avoid such a quadratic term. The order Douglas used in his answer doesn't avoid square roots, but at least his square root is a bit simpler due to the fact that he sorts by end point first. The order in your question and my answer is called lexicographical order, his would be called reverse lexicographical and is often easier to handle since it doesn't depend on n. But since most people think about normal (forward) lexicographical order first, this answer might be more intuitive to many and might even be the required way for some applications.
Here is a bit of Python code which lists all sequence elements in order, and does the conversion from index i to endpoints [k:m] the way I described above:
from math import ceil, sqrt
n = 3
print("{:3} []".format(0))
for i in range(1, n*(n+1)//2 + 1):
    b = 1 - 2*n
    c = 2*(i - n) - 1
    # solve k^2 + b*k + c = 0
    k = int(ceil((- b - sqrt(b*b - 4*c))/2.))
    m = k + i - k*(2*n-k+1)//2
    print("{:3} [{}:{}]".format(i, k, m))
The - 1 term in c doesn't come from the mathematical formula I presented above. It's more like subtracting 0.5 from each value of i. This ensures that even if the result of sqrt is slightly too large, you won't end up with a k which is too large. So that term accounts for numeric imprecision and should make the whole thing pretty robust.
The term k*(2*n-k+1)//2 is the last index belonging to starting point k-1, so i minus that term is the length of the subsequence under consideration.
You can simplify things further. You can perform some computation outside the loop, which might be important if you have to choose random sequences repeatedly. You can divide b by a factor of 2 and then get rid of that factor in a number of other places. The result could look like this:
from math import ceil, sqrt
n = 3
b = n - 0.5
bbc = b*b + 2*n + 1
print("{:3} []".format(0))
for i in range(1, n*(n+1)//2 + 1):
    k = int(ceil(b - sqrt(bbc - 2*i)))
    m = k + i - k*(2*n-k+1)//2
    print("{:3} [{}:{}]".format(i, k, m))
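Wrapped up as functions, the simplified formula can be used to draw a slice from one random integer. This is a sketch under the same ordering as above (empty slice first, then by start, then by end); the names slice_from_index and random_slice are made up for this example.

```python
import random
from math import ceil, sqrt


def slice_from_index(n, i):
    """Map i in [0, n*(n+1)//2] to slice endpoints (k, m) of a length-n list."""
    if i == 0:
        return (0, 0)  # index 0 is reserved for the empty slice
    b = n - 0.5
    k = int(ceil(b - sqrt(b * b + 2 * n + 1 - 2 * i)))
    m = k + i - k * (2 * n - k + 1) // 2
    return (k, m)


def random_slice(items):
    """Draw one integer and turn it into a contiguous slice of items."""
    n = len(items)
    i = random.randint(0, n * (n + 1) // 2)  # inclusive on both ends
    k, m = slice_from_index(n, i)
    return items[k:m]


print(random_slice([0, 1, 2]))
```

Because every index from 0 to n*(n+1)/2 maps to a different slice, drawing the index uniformly makes all slices equally likely.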

It is a little strange to give the empty list equal weight with the others. It is more natural for the empty list to be given weight 0 or n+1 times the others, if there are n elements on the list. But if you want it to have equal weight, you can do that.
There are n*(n+1)/2 nonempty contiguous sublists. You can specify these by the end point, from 0 to n-1, and the starting point, from 0 to the endpoint.
Generate a random integer x from 0 to n*(n+1)/2.
If x=0, return the empty list. Otherwise, x is uniformly distributed from 1 through n(n+1)/2.
Compute e = floor(sqrt(2*x)-1/2). This takes the values 0, 1, 1, 2, 2, 2, 3, 3, 3, 3, etc.
Compute s = (x-1) - e*(e+1)/2. This takes the values 0, 0, 1, 0, 1, 2, 0, 1, 2, 3, ...
Return the interval starting at index s and ending at index e.
(s,e) takes the values (0,0),(0,1),(1,1),(0,2),(1,2),(2,2),...
import random
import math
n = 10
# Integer division so that randint receives an int.
x = random.randint(0, n*(n+1)//2)
if x == 0:
    print(list(range(n))[0:0])  # empty slice
else:
    e = int(math.floor(math.sqrt(2*x) - 0.5))
    s = x - 1 - e*(e+1)//2
    print(list(range(n))[s:e+1])  # starting at s, ending at e, inclusive
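If floating-point rounding is a concern for very long lists, the same mapping can be done with integer arithmetic only. This variant swaps the sqrt formula for math.isqrt (available from Python 3.8) and is an alternative sketch rather than the method described above; slice_bounds and random_slice are names chosen for this example.

```python
import math
import random


def slice_bounds(x):
    """Map x >= 1 to (s, e), an inclusive start/end pair, ordered by end point."""
    y = x - 1
    # Largest e with e*(e+1)//2 <= y, computed without floating point.
    e = (math.isqrt(8 * y + 1) - 1) // 2
    s = y - e * (e + 1) // 2
    return s, e


def random_slice(items):
    """Draw one integer; 0 means the empty slice, anything else maps to (s, e)."""
    n = len(items)
    x = random.randint(0, n * (n + 1) // 2)
    if x == 0:
        return items[0:0]
    s, e = slice_bounds(x)
    return items[s:e + 1]


print([slice_bounds(x) for x in range(1, 7)])
# [(0, 0), (0, 1), (1, 1), (0, 2), (1, 2), (2, 2)]
```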

First create all possible slice indexes.
[0:0], [1:1], etc are equivalent, so we include only one of those.
Finally you pick a random index couple, and apply it.
import random
l = [0, 1, 2]
combination_couples = [(0, 0)]
length = len(l)
# Creates all index couples.
for j in range(1, length+1):
    for i in range(j):
        combination_couples.append((i, j))
print(combination_couples)
rand_tuple = random.sample(combination_couples, 1)[0]
final_slice = l[rand_tuple[0]:rand_tuple[1]]
print(final_slice)
To ensure we got them all:
for i in combination_couples:
    print(l[i[0]:i[1]])
Alternatively, with some math...
For a length-3 list there are 4 possible index values (0 to 3), that is n=4. You choose 2 of them, that is k=2. The first index has to be smaller than the second, therefore we need to count the combinations of n items taken k at a time.
from math import factorial as f
def total_combinations(n, k=2):
    result = 1
    for i in range(1, k+1):
        result *= n - k + i
    # Integer division keeps the result an int (the product is always divisible by k!).
    result //= f(k)
    # We add plus 1 since we included [0:0] as well.
    return result + 1
print(total_combinations(n=4)) # Prints 7 as expected.

there must be a way to generate a single random number and use that one value to figure out both starting index and end/length.
It is difficult to say which method is best, but if you're only interested in binding a single random number to your contiguous slice, you can use modulo.
Given a list l and a single random number r, you can get your contiguous slice like this:
l[r % len(l) : some_sparkling_transformation(r) % len(l)]
where some_sparkling_transformation(r) is essential. It depends on your needs, but since I don't see any special requirements in your question, it could be for example:
l[r % len(l) : (2 * r) % len(l)]
The most important thing here is that both the left and right edges of the slice are correlated with r. This makes it hard to define contiguous slices that won't follow any observable pattern. The above example (with 2 * r) produces slices that are always empty lists or follow a pattern of [a : 2 * a].
Let's use some intuition. We know that we want to find a good random representation of the number r in the form of a contiguous slice. It turns out that we need to find two numbers, a and b, that are respectively the left and right edges of the slice. Assuming that r is a good random number (we like it in some way), we can say that a = r % len(l) is a good approach.
Let's now try to find b. The best way to generate another nice random number is to use a random number generator that supports seeding (both random and numpy do). Example with the random module:
import random
def contiguous_slice(l, r):
    random.seed(r)
    a = int(random.uniform(0, len(l)+1))
    b = int(random.uniform(0, len(l)+1))
    a, b = sorted([a, b])
    return l[a:b]
Good luck and have fun!

Sorting Technique Python

I'm trying to create a sorting technique that sorts a list of numbers. It compares two numbers at a time: the first is the first number in the list, and the other is the number at an index offset of 2^k - 1.
2^k - 1 = [1,3,7, 15, 31, 63...]
For example, if I had a list [1, 4, 3, 6, 2, 10, 8, 19]
The length of this list is 8. So the program should find the largest number in the 2^k - 1 list that is less than 8; in this case it will be 7.
So now it will compare the first number in the list (1) with the number 7 positions after it in the same list (19). If the first is greater than the second, it will swap their positions.
After this step, it will continue on to 4 and the number 7 positions after that, but that doesn't exist, so now it should compare with the number 3 positions after 4, because 3 is the next number in the 2^k - 1 list.
So it should compare 4 with 2 and swap them if they are not in the right order. This should go on and on until I reach 1 in the 2^k - 1 list, at which point the list will finally be sorted.
I need help getting started on this code.
So far, I've written a small piece of code that makes the 2^k - 1 list, but that's as far as I've gotten.
a = []
for i in range(10):
    a.append(2**(i+1) -1)
print(a)
EXAMPLE:
Consider sorting the sequence V = 17,4,8,2,11,5,14,9,18,12,7,1. The skipping
sequence 1, 3, 7, 15, … yields r=7 as the biggest value which fits, so looking at V, the first sparse subsequence =
17,9, so as we pass along V we produce 9,4,8,2,11,5,14,17,18,12,7,1 after the first swap, and
9,4,8,2,1,5,14,17,18,12,7,11 after using r=7 completely. Using a=3 (the next smaller term in the skipping
sequence), the first sparse subsequence = 9,2,14,12, which when applied to V gives 2,4,8,9,1,5,12,17,18,14,7,11, and the remaining a = 3 sorts give 2,1,8,9,4,5,12,7,18,14,17,11, and then 2,1,5,9,4,8,12,7,11,14,17,18. Finally, with a = 1, we get 1,2,4,5,7,8,9,11,12,14,17,18.
You might wonder, given that at the end we do a sort with no skips, why
this might be any faster than simply doing that final step as the only step at the beginning. Think of it as a comb
going through the sequence -- notice that in the earlier steps we’re using coarse combs to get distant things in the
right order, using progressively finer combs until at the end our fine-tuning is dealing with a nearly-sorted sequence
needing little adjustment.
p = 0
x = len(V) #finding out the length of V to find indexer in a
for j in a: #for every element in a (1,3,7....)
    if x >= j: #if the length is greater than or equal to current checking value
        p = j #sets j as p
So that finds the distance at which it should compare the first number in the list, but now I need to write something that keeps doing that until the distance is out of range, so that it switches from 3 to 1 and then just checks the smaller distances until the list is sorted.

The sorting algorithm you're describing actually is called Combsort. In fact, the simpler bubblesort is a special case of combsort where the gap is always 1 and doesn't change.
Since you're stuck on how to start this, here's what I recommend:
Implement the bubblesort algorithm first. The logic is simpler and makes it much easier to reason about as you write it.
Once you've done that, you have the important algorithmic structure in place, and from there it's just a matter of adding the gap length calculation into the mix. This means computing the gap length with your particular formula. You'll then modify the loop control index and the inner comparison index to use the calculated gap length.
After each iteration of the loop, you decrease the gap length (in effect making the comb shorter) by some scaling amount.
The last step would be to experiment with different gap lengths and formulas to see how they affect the algorithm's efficiency.
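As a starting point, here is a rough sketch that follows the worked example in the question, where each sparse subsequence is fully sorted before the gap shrinks. That reading behaves like Shellsort with gaps of the form 2^k - 1 rather than a single comparison pass per gap, so treat it as one possible interpretation. The names gap_pass and skip_sort are made up for this example.

```python
def gap_pass(values, gap):
    """Insertion-sort every sparse subsequence values[start::gap], in place."""
    for i in range(gap, len(values)):
        current = values[i]
        j = i
        # Shift larger elements of the same sparse subsequence to the right.
        while j >= gap and values[j - gap] > current:
            values[j] = values[j - gap]
            j -= gap
        values[j] = current


def skip_sort(values):
    """Sort in place using the gaps ..., 7, 3, 1 that are smaller than len(values)."""
    gaps = []
    gap = 1
    while gap < len(values):
        gaps.append(gap)
        gap = 2 * gap + 1  # 1, 3, 7, 15, ... (each is 2^k - 1)
    for gap in reversed(gaps):
        gap_pass(values, gap)
    return values


V = [17, 4, 8, 2, 11, 5, 14, 9, 18, 12, 7, 1]
print(skip_sort(V))  # [1, 2, 4, 5, 7, 8, 9, 11, 12, 14, 17, 18]
```

Calling gap_pass on the original V with gap 7 and then gap 3 reproduces the intermediate sequences listed in the question's example.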
