Select a random subset of indices, with minimum consecutive count - python

I would like to select a random subset of indices from a numpy array with the caveat that I need each randomly selected index to be part of a consecutive "cluster" of at least three indices in a row.
For example, if I have an array that contains 25 items
a = np.arange(0,25)
I want to make sure that no index is selected without including at least two neighboring indices. So, for example, if I was looking for a subset of length 12, the following two options both fulfill this.
# this has 3 consecutive, followed by 5 consecutive, followed by 4 consecutive
rand_subset_1 = [0,1,2,9,10,11,12,13,18,19,20,21]
# this has 6 consecutive, followed by 3 consecutive, followed by 3 consecutive
rand_subset_2 = [3,4,5,6,7,8,14,15,16,22,23,24]
Attempted Answer
I tried to figure this out initially by dividing a into lists of three.
a_mod = np.array([0,1,2],[3,4,5],[6,7,8],...[21,22,23])
and then using np.random.choice(a_mod, subset_length/3, replace=False)
However this doesn't solve my problem, for two reasons.
I want to be able to input arrays with lengths that don't have to be divisible by three.
I don't mind if the subset indices are in cluster sizes that also aren't divisible by three. I just need the cluster to have at least three consecutive indices.
Clarification Edit:
Is there a method that ensures every number in the subset of indices is part of a "cluster" of consecutive numbers? Ideally this wouldn't limit the cluster size to multiples of a particular integer (which is where I got stuck on my attempted solution above), but would allow clusters of random lengths with a specified minimum cluster size.
Thanks in advance for any help with this problem!

Use the following function.
It selects an index at random and adds its two neighbouring indices, which gives one run of three consecutive indices.
After that, it selects the remaining m - n indices at random from the indices that have not been selected yet.
import numpy as np

def select_consequtive_index(a, m, n = 3):
    # a: array of indices (assumed to start at 0)
    # m: number of indices to be selected
    # n: minimum consecutive count
    output = []
    x = np.random.choice(a)
    # Build one run of three indices around x, staying inside the array
    if x == 0:
        output += [x, x+1, x+2]
    elif x == a[-1]:
        output += [x-2, x-1, x]
    else:
        output += [x-1, x, x+1]
    # Fill up with m - n further indices drawn from the ones not used yet
    output += np.random.choice(list(set(a) - set(output)), m - n, replace = False).tolist()
    output = np.array(output)
    output.sort()
    return output
Code sample:
a = np.arange(0, 25)
print(select_consequtive_index(a, m = 12, n = 3))
The result is as follows.
[ 3 4 7 8 9 10 11 12 17 21 22 24]
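Note that in the sample output above, 17 and 24 have no selected neighbours: the function only guarantees the first run of three. Below is a minimal sketch of one way to guarantee that every selected index sits in a run of at least min_run consecutive indices. It is not the method of the answer above: it first picks how many runs to use, then randomly splits the m selected indices into run lengths and the unselected indices into gaps. The names random_split, random_clustered_indices and run_lengths are made up for this illustration, and the sampling is not uniform over all valid subsets.

```python
import random

def random_split(amount, parts, rng):
    """Split a non-negative integer into `parts` non-negative integers that sum to it."""
    cuts = sorted(rng.randint(0, amount) for _ in range(parts - 1))
    edges = [0] + cuts + [amount]
    return [hi - lo for lo, hi in zip(edges, edges[1:])]

def random_clustered_indices(total, m, min_run=3, rng=random):
    """Pick m indices from range(total) so that each lies in a run of >= min_run."""
    if not (min_run <= m <= total):
        raise ValueError("need min_run <= m <= total")
    # Neighbouring runs need at least one unselected index between them,
    # so the number of runs is limited by the number of unselected indices too.
    max_runs = min(m // min_run, total - m + 1)
    k = rng.randint(1, max_runs)
    # Run lengths: each at least min_run, summing to m.
    runs = [min_run + extra for extra in random_split(m - k * min_run, k, rng)]
    # Gaps: k + 1 of them (before, between and after the runs), summing to total - m.
    gaps = random_split(total - m - (k - 1), k + 1, rng)
    for i in range(1, k):
        gaps[i] += 1  # inner gaps must be non-empty so runs stay separate
    chosen, pos = [], 0
    for gap, run in zip(gaps, runs):  # the trailing gap is simply left unused
        pos += gap
        chosen.extend(range(pos, pos + run))
        pos += run
    return chosen

def run_lengths(indices):
    """Lengths of the maximal runs of consecutive integers in a sorted list."""
    lengths = []
    for i, value in enumerate(indices):
        if i and value == indices[i - 1] + 1:
            lengths[-1] += 1
        else:
            lengths.append(1)
    return lengths
```

With a = np.arange(0, 25), the subset could then be taken as a[random_clustered_indices(len(a), 12)], and run_lengths can be used to check the result.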

Related

How do I fix code that calculates the amount of combinations in the partitions of a set?

I am working on Python 2 code that partitions a set of 13 elements using integer partitions and then evaluates the different combinations they can have (order does not matter). I have seen the ways people do this by using recursive functions to calculate every partition of a set, but for what I'm working on I'm taking a different approach.
I'm working with the logic that the different ways a set can be partitioned is determined by the integer partitions of a set. For a set of 4 elements, it can be partitioned in these ways:
[1,1,1,1]
[1,1,2]
[2,2]
[1,3]
[4]
Every number stands for the length of a subset in the partition. Using this info, I can then calculate all of the combinations that can be used with these different integer partitions. If I add the number of combinations from each partition together, I should get the Bell number (the number of possible partitions of a set). For a set of 4 elements, the Bell number should be 15.
My code runs through the subset lengths in each partition, sets the length of the set to n and the subset length to r, then calculates the combinations in the specific subset. When it goes to the next subset, it subtracts the previous r from n to account for it lessening the amount of combinations available, as n gets smaller when a subset is already defined.
My code, however, is lackluster. When inputting 4 as the length of the set, it outputs 16 (instead of 15). When inputting 5, it outputs 48 (instead of 52). When inputting 13, it outputs 102,513 (instead of 27,644,437). I need it to be exact rather than an estimate.
This is in part because of if elem != 1: not properly accounting for a list of all ones or a list of one subset. It's also in part because it doesn't account for repeats of a combination when appearing in a subset. In [2,2] for a list of 4 elements, it considers the subset to contain 6 combinations when in reality it contains 3.
I'm stuck on how to solve this issue, as I only know enough Python to get by. The way the code currently outputs is how I prefer it to output, obviously without the errors.
The recursive function that calculates the integer partitions is from Nicolas Blanc, and the rest was coded by myself. Important links: Bell number, Partition of a set
import math
in_par = []
stack = []
bell = 0
def partitions(remainder, start_number = 1):
    if remainder == 0:
        in_par.append(list(stack))
        #print stack
    else:
        for nb_to_add in range(start_number, remainder+1):
            stack.append(nb_to_add)
            partitions(remainder - nb_to_add, nb_to_add)
            stack.pop()
x = partitions(13) # <------- input element count here
for part in in_par:
    part.reverse()
    combinations = 0
    n = 13 # <------- input element count here
    for i,elem in enumerate(part):
        r = elem
        combo = 0
        if elem != 1:
            if i != (len(part) - 1):
                combo = math.factorial(n) / (math.factorial(r) * math.factorial(n-r))
        n = n - elem
        combinations = combinations + combo
    bell = bell + combinations
    part.append([combinations])
    print part
#print str(bell)
print "Bell Number: " + str(bell)
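One way to address both issues described above (partitions made only of ones or of a single block, and repeated block sizes such as [2,2]) is to count the set partitions for each integer partition with the standard formula n! / (r1! * r2! * ... * m1! * m2! * ...), where the r values are the block sizes and the m values are how many times each block size repeats. The sketch below is written for Python 3 rather than Python 2, and the function names are made up for this illustration.

```python
from collections import Counter
from math import factorial

def integer_partitions(n, largest=None):
    """Yield the integer partitions of n as non-increasing tuples."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, largest), 0, -1):
        for rest in integer_partitions(n - first, first):
            yield (first,) + rest

def set_partitions_of_shape(shape):
    """Number of ways to split a set into blocks with exactly these sizes."""
    denominator = 1
    for size in shape:
        denominator *= factorial(size)      # order inside a block is irrelevant
    for repeats in Counter(shape).values():
        denominator *= factorial(repeats)   # blocks of equal size are interchangeable
    return factorial(sum(shape)) // denominator

def bell_number(n):
    """Sum the counts over every integer partition of n."""
    return sum(set_partitions_of_shape(p) for p in integer_partitions(n))

for shape in integer_partitions(4):
    print(shape, set_partitions_of_shape(shape))
print("Bell Number:", bell_number(4))
```

For the shape (2, 2) this gives 24 / (2 * 2 * 2) = 3, and the shapes of 4 sum to 15.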

Finding minimum number of points to cover all segments

Hi I have a problem as below:
Given a set of n segments {[a0, b0], [a1, b1], . . . , [an-1, bn-1]} with integer coordinates on a line, find the minimum number m of points such that each segment contains at least one point. That is, find a set of integers X of the minimum size such that for any segment [ai,bi] there is a point x ∈ X such that ai ≤ x ≤ bi.
Input Format: The first line of the input contains the number n of segments. Each of the following n lines contains two integers ai and bi (separated by a space) defining the coordinates of endpoints of the i-th segment.
Output Format: Output the minimum number m of points on the first line and the integer coordinates of m points (separated by spaces) on the second line. You can output the points in any order. If there are many such sets of points, you can output any set. (It is not difficult to see that there always exist a set of points of the minimum size such that all the coordinates of the points are integers.)
Sample 1:
Input: 3
1 3
2 5
3 6
Output: 1
3
Explanation:
In this sample, we have three segments: [1,3], [2,5], [3,6] (of length 2, 3, 3 respectively). All of them contain the point with coordinate 3: 1 ≤ 3 ≤ 3, 2 ≤ 3 ≤ 5, 3 ≤ 3 ≤ 6.
Sample 2:
Input: 4
4 7
1 3
2 5
5 6
Output: 2
3 6
Explanation:
The second and the third segments contain the point with coordinate 3 while the first and the fourth segments contain the point with coordinate 6. All the four segments cannot be covered by a single point, since the segments [1, 3] and [5, 6] are disjoint.
Solution:
The greedy choice is selecting the minimum right endpoint. Then remove all segments that contain that endpoint. Keep choosing the minimum right endpoint and removing segments.
I followed the solution. I found the minimum right endpoint and removed all segments that contain that endpoint in my code. Then I call the function again with the new list of segments (keep choosing the minimum right endpoint and removing segments, recursively), but I'm stuck on how to order my code and can't make it work.
list_time = [[4,7],[1,3],[2,5],[5,6]]
def check_inside_range(n, lst): #Function to check if a number is inside the range of start and end of a list
    #for example 3 is in [3,5], 4 is not in [5,6], return False if in
    if lst[1]-n>=0 and n-lst[0]>=0:
        return False
    else:
        return True
def lay_chu_ki(list_time):
    list_time.sort(key = lambda x: x[1]) #Sort according to the end of each segments [1,3],[2,5],[5,6],[4,7]
    first_end = list_time[0][1] #Selecting the minimum right endpoint
    list_after_remove = list(filter(lambda x: check_inside_range(first_end, x),list_time))
    #Remove all segments that contains that endpoint
    lay_chu_ki(list_after_remove) #Keep doing the function again with new segments list
    #(Keep choosing minimum right endpoint and removing segments)
    return first_end #I don't know where to put this line.
print(lay_chu_ki(list_time))
As you can see, I've already done 3 steps: selecting the minimum right endpoint; removing all segments that contain that endpoint; and repeating both steps on the remaining segments. But it doesn't work. I tried to print the two numbers 3 and 6 first (the return value of each recursive call). I also tried to create a count variable to count the recursive calls (count += 1), but that didn't work either, since count is reset to 0 on each call.
I think recursion overcomplicates the implementation. While it's still feasible, you have to pass in a bunch of extra parameters, which could be difficult to track. In my opinion, it's much simpler to implement this approach iteratively.
Also, your approach repeatedly uses filter() and list(), which take linear time every time you call them (to clarify, "linear" means linear in the size of the input list). In the worst case, you would perform that operation for every element in the list, which means that the runtime of your original implementation is quadratic (assuming you fix the existing issues with your code). The approach below avoids that by making a single pass through the list after sorting it:
def lay_chu_ki(list_time):
    list_time.sort(key=lambda x: x[1])
    idx = 0
    selected_points = []
    while idx != len(list_time):
        selected_point = list_time[idx][1]
        while idx != len(list_time) and list_time[idx][0] <= selected_point:
            idx += 1
        selected_points.append(selected_point)
    return selected_points
result = lay_chu_ki(list_time)
print(len(result))
print(' '.join(map(str, result)))
With the given list, this outputs:
2
3 6
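For comparison, the recursive formulation attempted in the question can also be made to work. What it was missing is a base case for the empty list and a way to combine the point chosen at each level with the points returned by the deeper calls. The following is a minimal sketch under those assumptions; cover_points_recursive is a name made up for this illustration, and it keeps the quadratic worst case mentioned above.

```python
def cover_points_recursive(segments):
    """Greedy cover: pick the smallest right endpoint, drop covered segments, recurse."""
    if not segments:
        # Base case: nothing left to cover, so no more points are needed.
        return []
    point = min(end for _, end in segments)
    # Keep only the segments that do NOT contain the chosen point.
    remaining = [(start, end) for start, end in segments
                 if not (start <= point <= end)]
    # The answer is this level's point followed by the points for the rest.
    return [point] + cover_points_recursive(remaining)

points = cover_points_recursive([[4, 7], [1, 3], [2, 5], [5, 6]])
print(len(points))
print(' '.join(map(str, points)))
```

Returning the list of points, rather than a single endpoint, also removes the need for a separate counter: the count is just the length of the result.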

Indexing across multiple intervals

I am trying to extract the n-th element from a set of multiple intervals. I am currently dealing with genome sequences. Assume we have a gene with a gap in the middle. The position of this gene within the whole DNA is:
gene = [100,110], [130,140]
# representing the lists [100,101,...,109] and [130, 131,...,139]
# the gene spans over these entries of the DNA, so it looks like -gene-gap-gene-
Now, for a position within the gene (e.g. 10th position), I want to find the corresponding position on the whole DNA (which would be 109 in this example).
The function should do the following:
function(gene, 9)
> 109
function(gene, 10)
> 130
My approach is to explicitly generate the two sequences, concatenate them and take the n-th element of this list. However, for large lists (as they happen to occur), this is very inefficient.
Can anyone think of a simple way?
Thanks in advance!
A generic solution that should work for as many gaps in the gene as you want (note that n is counted from 1 here, so position 10 maps to 109 and position 11 maps to 130):
gene = [[100,110], [130,140]]
def function(gene, n):
    for span in gene:
        span_len = span[1] - span[0]
        if n <= span_len:
            return n + span[0] - 1
        else:
            n -= span_len
print(function(gene,10))
print(function(gene,11))
Your function can be given both lists, and you can work out which list to index, and where, from the sizes of the lists.
So if you call function(gene, 10) and function(gene, 11):
10 <= len(list1) but 11 > len(list1), so you know you need to access the second list in the case of 11, and the right element is at 11 - len(list1) - 1, which is index 0 of the second list.
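If many positions have to be looked up in a gene with many spans, the same idea can be combined with a binary search over the cumulative span lengths. The sketch below is one possible variant, not part of the answers above: it counts n from 0, as in the question's examples, treats each span as [start, end) with the end excluded, and uses the made-up names build_offsets and position_in_dna.

```python
from bisect import bisect_right
from itertools import accumulate

def build_offsets(spans):
    """Cumulative lengths of the spans, e.g. [10, 20] for two spans of length 10."""
    return list(accumulate(end - start for start, end in spans))

def position_in_dna(spans, offsets, n):
    """Map a 0-based position n within the gene to its position on the DNA."""
    if not 0 <= n < offsets[-1]:
        raise IndexError("position outside the gene")
    i = bisect_right(offsets, n)             # index of the span that contains n
    before = offsets[i - 1] if i else 0      # gene positions in earlier spans
    return spans[i][0] + (n - before)

gene = [[100, 110], [130, 140]]
offsets = build_offsets(gene)
print(position_in_dna(gene, offsets, 9))
print(position_in_dna(gene, offsets, 10))
```

The offsets are computed once, so each lookup costs a binary search over the number of spans instead of a walk over all of them.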

Algorithm for understanding behavior of arrays

You are given a list of size N, initialized with zeroes. You have to perform M operations on the list and output the maximum of the final values of all N elements in the list. For every operation, you are given three integers a, b and k, and you have to add the value k to all the elements ranging from index a to b (both inclusive).
Input Format
First line will contain two integers N and M separated by a single space.
Next M lines will contain three integers a,b and k separated by a single space.
Numbers in list are numbered from 1 to N .
Output Format
A single line containing maximum value in the updated list.
Sample Input
5 3
1 2 100
2 5 100
3 4 100
Sample Output
200
Explanation
After first update list will be 100 100 0 0 0.
After second update list will be 100 200 100 100 100.
After third update list will be 100 200 200 200 100.
So the required answer will be 200.
One of the Solutions with less time complexity
n, inputs = [int(n) for n in input().split(" ")]
list = [0]*(n+1)
for _ in range(inputs):
    x, y, incr = [int(n) for n in input().split(" ")]
    list[x-1] += incr
    if((y)<=len(list)):
        list[y] -= incr
max = x = 0
for i in list:
    x=x+i;
    if(max<x):max=x
print(max)
Can someone explain the above solution?
Basically it stores deltas rather than the final list; that means each operation only takes 2 reads and writes rather than (b - a + 1). Then the final max scan adds the deltas as it goes along, which is still an O(n) operation which you would have had to do anyway.
n, inputs = [int(n) for n in input().split(" ")]
Get the list size (n) and number of operations (m), ie 5 and 3
list = [0]*(n+1)
Create an empty 0-filled list. Should be lst = [0] * n (do not use list as a variable name, it shadows the built-in type) (we do not need an extra end cell, except as a checksum on our algorithm - if it works properly the final checksum should be 0).
for _ in range(inputs):
    x, y, incr = [int(n) for n in input().split(" ")]
Get an operation (a, b, k) ie 1, 2, 100.
list[x-1] += incr
Add delta to the starting cell
if((y)<=len(list)):
    list[y] -= incr
Subtract the delta from the cell just after the end of the range (should be if y < n: lst[y] -= incr)
The algorithm may be easier to understand if you add a print(lst) here (after the if but inside the for loop).
Now process the deltas to find the maximum item:
max = x = 0
for i in list:
    x=x+i;
x is now the actual value of the current list cell. Also max is a terrible variable name because it shadows the built-in max() function.
    if(max<x):max=x
Keep a running max tally
print(max)
Show the result.
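Putting the walkthrough together, the same delta technique can be written as a function that avoids the shadowed names and does not read from standard input. This is a minimal sketch: the name max_after_range_updates and the operations parameter (a list of (a, b, k) tuples with 1-based, inclusive a and b) are made up for this illustration.

```python
from itertools import accumulate

def max_after_range_updates(n, operations):
    """Return the largest value after adding k to cells a..b for each (a, b, k)."""
    # One extra cell at the end so that deltas[b] is always a valid index.
    deltas = [0] * (n + 1)
    for a, b, k in operations:
        deltas[a - 1] += k   # the increase starts at cell a (0-based index a - 1)
        deltas[b] -= k       # and stops after cell b
    # The running sum of the deltas reproduces the actual cell values.
    return max(accumulate(deltas[:n]))

print(max_after_range_updates(5, [(1, 2, 100), (2, 5, 100), (3, 4, 100)]))
```

For the sample input the deltas end up as [100, 100, 0, 0, -100, -100], whose running sums over the first five cells are 100 200 200 200 100, matching the explanation in the question.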

How to create a list of all possible lists satisfying a certain condition?

I'm currently trying to do project Euler problem 18 (https://projecteuler.net/problem=18), using the 'brute force' method to check all possible paths. I've just been trying the smaller, 'model' triangle so far.
I was using list comprehension to create a list of lists where the inner lists would contain the indices for that line, for example:
lst = [[a,b,c,d] for a in [0] for b in [0,1] for c in [0,1,2] for d in
[0,1,2,3] if b == a or b == a + 1 if c == b or c == b + 1 if d == c or d ==
c + 1]
This gives me the list of lists I want, namely:
[[0,0,0,0],[0,0,0,1],[0,0,1,1],[0,0,1,2],[0,1,1,1],[0,1,1,2],[0,1,2,2],
[0,1,2,3]]
Note: the if conditions ensure that it only moves to adjacent numbers in the next row of the triangle, so that
lst[i][j] == lst[i][j-1] or lst[i][j] == lst[i][j-1] + 1
After I got to this point, I intended that for each of the inner lists, I would take the numbers associated with those indices (so [0,0,0,0] would be 3,7,2,8) and sum over them, and this way get all of the possible sums, then take the maximum of those.
The problem is that if I were to scale this up to the big triangle I'd have fifteen 'for's and 'if's in my list comprehension. It seems like there must be an easier way! I'm pretty new to Python so hopefully there's some obvious feature I can make use of that I've missed so far!
What an interesting question! Here is a simple brute force approach. Note the use of itertools to generate all the combinations, which are then filtered to rule out all the cases where the index does not stay the same or increase by one from one row to the next.
import itertools
import numpy as np
# Here is the input triangle (a plain list of rows, since the rows have different lengths)
tri = [[3],[7,4],[2,4,6],[8,5,9,3]]
indices = [range(len(row)) for row in tri]
# Generate all the possible combinations
indexCombs = list(itertools.product(*indices))
# Generate the difference between indices in successive rows for each combination
diffCombs = [np.array(i[1:]) - np.array(i[:-1]) for i in indexCombs]
# The only combinations that are valid are when successive row indices differ by 1 or 0
validCombs = [indexCombs[i] for i in range(len(indexCombs)) if np.all((diffCombs[i] >= 0) & (diffCombs[i] <= 1))]
# Now get the actual values from the triangle for each row combination
valueCombs = [[tri[i][j[i]] for i in range(len(tri))] for j in validCombs]
# Find the sum for each combination
sums = np.sum(valueCombs, axis=1)
# Print the information pertaining to the largest sum
print('Highest sum: {0}'.format(sums.max()))
print('Combination: {0}'.format(valueCombs[sums.argmax()]))
# Look the indices up in validCombs, which lines up with valueCombs and sums
print('Row indices: {0}'.format(validCombs[sums.argmax()]))
The output is:
Highest sum: 23
Combination: [3, 7, 4, 9]
Row indices: (0, 0, 1, 2)
Unfortunately this is hugely intensive computationally, so it won't work with the large triangle - but there are definitely some concepts and tools that you could extend to try get it to work!
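One way to avoid both the fifteen nested for clauses and the filtering of every index combination is to build only the valid paths, row by row: each path is extended with either the same index or the index plus one. The sketch below is still brute force, but a triangle with 15 rows then has only 2**14 = 16384 paths to check. The names all_paths and best_path are made up for this illustration.

```python
def all_paths(rows):
    """All index paths through a triangle with `rows` rows, as tuples."""
    paths = [(0,)]
    for _ in range(rows - 1):
        # Each path can continue with the same index or the next one to the right.
        paths = [p + (nxt,) for p in paths for nxt in (p[-1], p[-1] + 1)]
    return paths

def best_path(triangle):
    """Return (largest sum, index path) over every path through the triangle."""
    return max(
        (sum(row[i] for row, i in zip(triangle, path)), path)
        for path in all_paths(len(triangle))
    )

tri = [[3], [7, 4], [2, 4, 6], [8, 5, 9, 3]]
print(all_paths(4))
print(best_path(tri))
```

For four rows, all_paths returns the same eight index lists as the list comprehension in the question, as tuples.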
