Generate ranges from sequences based on missing values


Given a range, range(101), I have known missing inputs from the sequence,
{ 8, 23, 56 }
In this range, the only significant numbers are the start and end, 0 and 100. Here's what my initialization looks like:
r = tuple(range(101))
init_start, init_end = r[0], r[-1]
missing = { 8, 23, 56 }
r = tuple(filter(lambda n: n not in missing, r))
Now here is where I get stuck: I don't know how to approach generating the sub-ranges between the holes. The expected output here is 0, 7, 9, 22, 24, 55, 57, 100, i.e. the pairs (0, 7), (9, 22), (24, 55) and (57, 100). I could brute-force it with the known missing values, but then it doesn't handle edge cases (what if I only have one value in the range?).
Edit
Someone posted a working solution for the "happy path" of the problem, but it misses these edge cases:
How should consecutive missing values be handled?
What if the start or end value is itself missing?
r = tuple(range(101))
init_start, init_end = r[0], r[-1]
missing = [8, 23, 56]
r = tuple(filter(lambda n: n not in missing, r))
def gen_ranges():
    start = init_start
    end = 0
    for n in sorted(missing):
        yield start
        start = n + 1
        end = n - 1
        yield end
    yield start
    yield init_end
>>> list(gen_ranges())
[0, 7, 9, 22, 24, 55, 57, 100]

One solution using itertools.groupby():
from itertools import groupby
holes = {8, 23, 56}
def generate(holes, r=range(0, 101)):
    for v, g in groupby(r, lambda k: k in holes):
        if v is False:  # a run of values that are not holes
            l = [*g]
            yield from (l[0], l[-1])  # first and last value of the run
print(list(generate(holes)))
Prints:
[0, 7, 9, 22, 24, 55, 57, 100]
Other inputs:
holes = {8, 10, 56} # [0, 7, 9, 9, 11, 55, 57, 100]
holes = {8, 9, 56} # [0, 7, 10, 55, 57, 100]
EDIT (some explanation):
With itertools.groupby I'm making groups from range() using a key function. Here the key function is k in holes (k is a value from the range()). Every time the value returned by the key function changes, a new group of consecutive values starts. For holes = {8, 9, 56} I basically do this:
False [0, 1, 2, ... 5, 6, 7] # group 1 (Take first, last)
True [8, 9] # group 2
False [10, 11, ... 54, 55] # group 3 (Take first, last)
True [56] # group 4
False [57, 58, ... 99, 100] # group 5 (Take first, last)
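To see the groups described above for yourself, the groupby call can be run on its own and each group summarised as (key, first value, last value). A small sketch using the same holes as the listing; `summary` and `in_hole` are just illustrative names:

```python
from itertools import groupby

holes = {8, 9, 56}
summary = []
for in_hole, g in groupby(range(101), lambda k: k in holes):
    values = list(g)  # consume the group iterator before groupby moves on
    summary.append((in_hole, values[0], values[-1]))
print(summary)
# [(False, 0, 7), (True, 8, 9), (False, 10, 55), (True, 56, 56), (False, 57, 100)]
```

Only the False groups are kept by generate(), which is why the holes never appear in its output.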

I see a couple of issues:
You need to ensure the values in missing are sorted;
You're not outputting the last pair of values after you finish processing the values in missing
This should give the results you want (note I've added missing as a parameter to ease testing):
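def gen_ranges(missing):
    # init_start and init_end come from the question's code
    start = init_start
    end = 0
    for n in sorted(missing):
        if n == start:  # hole at the current start: skip past it
            start = n + 1
            continue
        yield start
        start = n + 1
        end = n - 1
        yield end
    if start <= init_end:  # trailing run after the last hole
        yield start
        yield init_end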
print(list(gen_ranges({ 8, 9, 56 })))
print(list(gen_ranges({ 0 })))
print(list(gen_ranges({ 100 })))
print(list(gen_ranges({ 0, 100 })))
print(list(gen_ranges({ 1, 100 })))
print(list(gen_ranges({ 0, 99 })))
print(list(gen_ranges({ 0, 50, 100 })))
print(list(gen_ranges({ 0, 1, 50, 99, 100 })))
Output:
[0, 7, 10, 55, 57, 100]
[1, 100]
[0, 99]
[1, 99]
[0, 0, 2, 99]
[1, 98, 100, 100]
[1, 49, 51, 99]
[2, 49, 51, 98]
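The function above reads init_start and init_end from the question's globals. Below is a sketch of the same logic with the bounds passed in as parameters (the parameter names and the 0/100 defaults are my own choice). As an addition not in the answer, it also deduplicates missing and ignores holes outside the range, which would otherwise produce bogus pairs:

```python
def gen_ranges(missing, init_start=0, init_end=100):
    """Yield start, end of each run of values in [init_start, init_end] not in missing."""
    start = init_start
    # dedupe and drop holes outside the range before walking them in order
    for n in sorted(set(m for m in missing if init_start <= m <= init_end)):
        if n == start:  # hole at the current start: just skip past it
            start = n + 1
            continue
        yield start
        yield n - 1
        start = n + 1
    if start <= init_end:  # trailing run after the last hole
        yield start
        yield init_end

print(list(gen_ranges({0, 1, 50, 99, 100})))  # [2, 49, 51, 98]
print(list(gen_ranges([3, 3, 200], 0, 10)))   # [0, 2, 4, 10]
```

This also covers the question's single-value case: gen_ranges(set(), 5, 5) yields 5, 5, and gen_ranges({5}, 5, 5) yields nothing.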

There's already a good accepted answer, but this was an interesting problem.
Here is my solution, which works with edge cases:
def gen_ranges(start, end, missing):
    missing = sorted(missing + [start - 1, end + 1])
    for num, value in enumerate(missing):
        if value - missing[num - 1] > 1:
            yield missing[num - 1] + 1
            yield value - 1
print(list(gen_ranges(0, 100, [8, 23, 56]))) # [0, 7, 9, 22, 24, 55, 57, 100]
print(list(gen_ranges(0, 100, [5, 40, 50, 52, 93]))) # [0, 4, 6, 39, 41, 49, 51, 51, 53, 92, 94, 100]
print(list(gen_ranges(0, 100, [0, 50, 100]))) # [1, 49, 51, 99]
print(list(gen_ranges(0, 100, [0, 50, 51, 100]))) # [1, 49, 52, 99]
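The sentinel idea above (padding missing with start - 1 and end + 1) can also be written with zip over neighbouring values, which avoids relying on missing[num - 1] wrapping around to the last element when num is 0, and returns (first, last) tuples. A sketch; the name gap_ranges, the tuple output and the filtering of out-of-range holes are my own choices:

```python
def gap_ranges(start, end, missing):
    """Return (first, last) tuples for each run of values in [start, end] not in missing."""
    # Sentinels just outside the range make the first and last runs ordinary gaps.
    bounds = [start - 1] + sorted({m for m in missing if start <= m <= end}) + [end + 1]
    # Neighbouring bounds more than 1 apart enclose a non-empty run of present values.
    return [(lo + 1, hi - 1) for lo, hi in zip(bounds, bounds[1:]) if hi - lo > 1]

print(gap_ranges(0, 100, [8, 23, 56]))  # [(0, 7), (9, 22), (24, 55), (57, 100)]
```

If the flat list is needed, flatten the pairs with [v for pair in gap_ranges(0, 100, [8, 23, 56]) for v in pair].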

Related

How to generate sequential subsets of integers?

I have the following start and end values:
start = 0
end = 54
I need to generate subsets of 4 sequential integers starting from start until end with a space of 20 between each subset. The result should be this one:
0, 1, 2, 3, 24, 25, 26, 27, 48, 49, 50, 51
In this example, we obtained 3 subsets:
0, 1, 2, 3
24, 25, 26, 27
48, 49, 50, 51
How can I do it using numpy or pandas?
If I do r = [i for i in range(0,54,4)], I get [0, 4, 8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48, 52].

This should get you what you want:
j = 20  # gap between subsets
k = 4   # length of each subset
result = [split for i in range(0, 55, j + k) for split in range(i, i + k)]
print(result)
Output:
[0, 1, 2, 3, 24, 25, 26, 27, 48, 49, 50, 51]
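The same comprehension can be wrapped in a small function so the start, end, subset length and gap are parameters. A sketch; sequential_subsets is a hypothetical name, end is treated as inclusive, and the last subset is clipped so no value exceeds end (an assumption, since the question does not say what should happen there):

```python
def sequential_subsets(start, end, size, gap):
    """Runs of `size` consecutive integers from start to end (inclusive), `gap` apart."""
    result = []
    for first in range(start, end + 1, size + gap):
        # clip the last run so no value exceeds end
        result.extend(range(first, min(first + size, end + 1)))
    return result

print(sequential_subsets(0, 54, 4, 20))  # [0, 1, 2, 3, 24, 25, 26, 27, 48, 49, 50, 51]
```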

Maybe something like this:
r = [j for i in range(0, 54, 24) for j in range(i, i + 4)]
print(r)
[0, 1, 2, 3, 24, 25, 26, 27, 48, 49, 50, 51]

You can use numpy.arange, which returns an ndarray containing evenly spaced values within a given range:
import numpy as np
r = np.arange(0, 54, 4)
print(r)
Result
[ 0  4  8 12 16 20 24 28 32 36 40 44 48 52]

Numpy approach
You can use np.arange to generate the starting points with a step of 20 + 4, where 20 is the gap between subsets and 4 is the length of each sequential subset.
import numpy as np
start = 0
end = 54
out = np.arange(start, end, 24)  # array([ 0, 24, 48]): the starting point of each subarray
step = np.tile(np.arange(4), (len(out), 1))
# [[0 1 2 3]
#  [0 1 2 3]
#  [0 1 2 3]]
res = out[:, None] + step
# array([[ 0,  1,  2,  3],
#        [24, 25, 26, 27],
#        [48, 49, 50, 51]])
flat = res.ravel().tolist()  # [0, 1, 2, 3, 24, 25, 26, 27, 48, 49, 50, 51]
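Because of NumPy broadcasting, the np.tile step is not strictly needed: adding a column of starting points to a row of offsets already produces the 2-D grid. A sketch with the subset length and gap pulled out into variables (size and gap are my own names):

```python
import numpy as np

start, end = 0, 54
size, gap = 4, 20  # subset length and space between subsets

starts = np.arange(start, end, size + gap)  # array([ 0, 24, 48])
# A (3, 1) column plus a length-4 row broadcasts to a (3, 4) grid,
# so no explicit tiling is required.
res = (starts[:, None] + np.arange(size)).ravel()
print(res.tolist())  # [0, 1, 2, 3, 24, 25, 26, 27, 48, 49, 50, 51]
```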

This can be done with plain Python:
rangeStart = 0
rangeStop = 54
setLen = 4
step = 20
stepTot = step + setLen
a = list(list(i + s for s in range(setLen)) for i in range(rangeStart, rangeStop, stepTot))
In this case you get each subset as a sublist of the resulting list.

I don't think you need numpy or pandas to do what you want. I achieved it with a simple while loop:
num = 0
end = 54
sequence = []
while num <= end:
    sequence.append(num)
    num += 1
    if num % 4 == 0:  # four numbers have been added
        num += 20
print(sequence)  # [0, 1, 2, 3, 24, 25, 26, 27, 48, 49, 50, 51]

How can I evenly sample an array in Python, in order, according to a sample rate?

I have array_large and array_small. I need to evenly sample from array_large so that I end up with an array the same size as array_small. (Or in other words, I need a representative, downsized version of array_large to match up with array_small.)
As a super-trivial example, if array_small = [0, 1] and array_large = [0, 1, 2, 3] I would expect sample = [0, 2] or sample = [1, 3].

Let's imagine array_small is 30 items and array_large is 100.
array_small = [i for i in range(30)]
array_large = [i for i in range(100)]
sample_rate = len(array_large) / len(array_small)
In that case our sample_rate is 3.333... which means we want about every 3rd item, but sometimes every 4th item. Since the sample_rate is a float we can account for that with math.floor() and use the mod operator on the array index:
import math
array_large_sample = [
    num for i, num in enumerate(array_large)
    if math.floor(i % sample_rate) == 0
]
print(array_large_sample)
print(len(array_large_sample))
OUTPUT:
[0, 4, 7, 11, 14, 17, 21, 24, 27, 31, 34, 37, 41, 44, 47, 51, 54, 57, 61, 64, 67, 71, 74, 77, 81, 84, 87, 91, 94, 97]
30
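An alternative that avoids floating-point modulo is to take index i * len(array_large) // len(array_small) for each i. With integer arithmetic it always returns exactly len(array_small) items, in order. A sketch; even_sample is a hypothetical helper name, and it takes the first item of each block, so for the trivial example it gives [0, 2] rather than [1, 3]:

```python
def even_sample(array_large, n):
    """Pick n items from array_large, evenly spaced and in order (assumes 0 < n <= len)."""
    size = len(array_large)
    # i * size // n is the start of the i-th of n equal-width blocks (integer maths)
    return [array_large[i * size // n] for i in range(n)]

print(even_sample([0, 1, 2, 3], 2))  # [0, 2]
sample = even_sample(list(range(100)), 30)
print(len(sample))  # 30
```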

How to swap the new added number into the correct position in binary heap?

In this homework question I'm given a list where index 1 is the new node and also the root. I then have to check whether its children are smaller than it and, if so, swap it with the smaller child. I've written some code, but it's not working:
def perc_down(data):
    count = 0
    index = 1
    l, r = 2 * index, 2 * index + 1
    while index < len(data):
        if data[index] > data[l] and data[index] > data[r]:
            min_i = data.index(min(data[l], data[r]))
            data[index], data[min_i] = data[min_i], data[index]
            count += 1
            index = min_i
    return count
values = [0, 100, 7, 8, 9, 22, 45, 12, 16, 27, 36]
swaps = perc_down(values)
print('Binary heap =', values)  # should be [0, 7, 9, 8, 16, 22, 45, 12, 100, 27, 36]
print('Swaps =', swaps)  # should be 3

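Compute l and r inside the while loop:
def perc_down(data):
    count = 0
    index = 1
    while 2 * index < len(data):  # loop only while the node has a left child
        l, r = 2 * index, 2 * index + 1
        if r >= len(data):  # no right child: compare against the node itself
            r = index
        if data[index] > data[l] or data[index] > data[r]:
            min_i = l if data[l] <= data[r] else r  # index of the smaller child
            data[index], data[min_i] = data[min_i], data[index]
            count += 1
            index = min_i
            print(data)  # added for easy debugging
        else:
            break  # heap property holds, stop sifting down
    return count
Only run the loop while the current node has at least a left child, i.e. over roughly the first half of the list, because the remaining nodes of a binary min heap are leaves.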
Output:
[0, 7, 100, 8, 9, 22, 45, 12, 16, 27, 36]
[0, 7, 9, 8, 100, 22, 45, 12, 16, 27, 36]
[0, 7, 9, 8, 16, 22, 45, 12, 100, 27, 36]
Binary heap = [0, 7, 9, 8, 16, 22, 45, 12, 100, 27, 36]
Swaps = 3
I revised the algorithm for nodes whose right child does not exist.
For values = [0, 100, 7, 11, 9, 8, 45, 12, 16, 27, 36], after 2 swaps 100 ends up at index 5, which has no right child (index 11 is past the end of the list), so we set r back to the current index.
Heapified list: Binary heap = [0, 7, 8, 11, 9, 36, 45, 12, 16, 27, 100]
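To check the result of perc_down, a small helper (hypothetical, not from the question) can verify the min-heap property for a 1-based heap, where the parent of index i is i // 2 and data[0] is unused:

```python
def is_min_heap(data):
    """True if data[1:] satisfies the min-heap property (1-based, data[0] unused)."""
    # In a 1-based heap the parent of index i is i // 2.
    return all(data[i // 2] <= data[i] for i in range(2, len(data)))

print(is_min_heap([0, 100, 7, 8, 9, 22, 45, 12, 16, 27, 36]))  # False
print(is_min_heap([0, 7, 9, 8, 16, 22, 45, 12, 100, 27, 36]))  # True
```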

Find the longest arithmetic progression inside a sequence

Suppose I have a sequence of increasing numbers, and I want to find the length of longest arithmetic progression within the sequence. Longest arithmetic progression means an increasing sequence with common difference, such as [2, 4, 6, 8] or [3, 6, 9, 12].
For example,
for [5, 10, 14, 15, 17], [5, 10, 15] is the longest arithmetic progression, with length 3;
for [10, 12, 13, 20, 22, 23, 30], [10, 20, 30] is the longest arithmetic progression with length 3;
for [7, 10, 12, 13, 15, 20, 21], [10, 15, 20] or [7, 10, 13] are the longest arithmetic progressions with length 3.
This site
https://prismoskills.appspot.com/lessons/Dynamic_Programming/Chapter_22_-_Longest_arithmetic_progression.jsp
offers some insight into the problem, namely looping over the middle index j and considering every triple of elements L[i], L[j], L[k] around it.
I intend to use this algorithm in Python, and my code is as follows:
def length_of_AP(L):
    n = len(L)
    Table = [[0 for _ in range(n)] for _ in range(n)]
    length_of_AP = 2
    # initialise the last column of the table as all i and (n-1) pairs have length 2
    for i in range(n):
        Table[i][n-1] = 2
    # loop around the list and find i, k such that L[i] + L[k] = 2 * L[j]
    for j in range(n - 2, 0, -1):
        i = j - 1
        k = j + 1
        while i >= 0 and k < n:
            difference = (L[i] + L[k]) - 2 * L[j]
            if difference < 0:
                k = k + 1
            else:
                if difference > 0:
                    i = i - 1
                else:
                    Table[i][j] = Table[j][k] + 1
                    length_of_AP = max(length_of_AP, Table[i][j])
                    k = k + 1
                    i = i - 1
    return length_of_AP
This function works fine with [1, 3, 4, 5, 7, 8, 9], but not with [5, 10, 14, 15, 20, 25, 26, 27, 28, 30, 31], where I should get 6 but get 4. I suspect the run 25, 26, 27, 28 inside the list is throwing my function off. How do I change my function so that it gives the desired result?
Any help would be appreciated.

Following your link and running the second sample, it looks like the code actually finds the proper LAP
5, 10, 15, 20, 25, 30,
but fails to report the proper length. I didn't spend much time analyzing the code, but this piece
// Any 2-letter series is an AP
// Here we initialize only for the last column of lookup because
// all i and (n-1) pairs form an AP of size 2
for (int i=0; i<n; i++)
lookup[i][n-1] = 2;
looks suspicious to me. It seems that you need to initialize the whole lookup table with 2 instead of just the last column, and if I do so, it starts to get the correct length on your sample as well.
So get rid of the "initialise" loop and change your 3rd line to the following code:
# initialise whole table with 2 as all (i, j) pairs have length 2
Table = [[2 for _ in range(n)] for _ in range(n)]
Moreover, their
Sample Execution:
Max AP length = 6
3, 5, 7, 9, 11, 13, 15, 17,
contains this bug as well (the printed sequence has 8 items, not 6) and prints the correct sequence only by sheer luck. If I modify the sortedArr to
int sortedArr[] = new int[] {3, 4, 5, 7, 8, 9, 11, 13, 14, 15, 16, 17, 18, 112, 113, 114, 115, 116, 117, 118};
I get the following output
Max AP length = 7
112, 113, 114, 115, 116, 117, 118,
which is obviously wrong, as the original 8-item sequence 3, 5, 7, 9, 11, 13, 15, 17, is still there.
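Putting the suggested fix together, here is a sketch of the question's function with the whole table initialised to 2 instead of only the last column. The guard for lists shorter than 3 items is my own addition. Like the original, it assumes L is sorted and contains no duplicates:

```python
def length_of_AP(L):
    """Length of the longest arithmetic progression in a sorted list of distinct numbers."""
    n = len(L)
    if n <= 2:
        return n  # added guard: 0, 1 or 2 items are trivially an AP
    # Table[i][j]: length of the longest AP starting with L[i], L[j] (i < j).
    # Every pair is an AP of length 2, so the whole table starts at 2.
    Table = [[2] * n for _ in range(n)]
    longest = 2
    # Fix the middle element j and search for i < j < k with L[i] + L[k] == 2 * L[j].
    for j in range(n - 2, 0, -1):
        i, k = j - 1, j + 1
        while i >= 0 and k < n:
            difference = (L[i] + L[k]) - 2 * L[j]
            if difference < 0:
                k += 1
            elif difference > 0:
                i -= 1
            else:
                # L[i], L[j] extends the best AP that starts with L[j], L[k]
                Table[i][j] = Table[j][k] + 1
                longest = max(longest, Table[i][j])
                k += 1
                i -= 1
    return longest

print(length_of_AP([5, 10, 14, 15, 20, 25, 26, 27, 28, 30, 31]))  # 6
```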

Did you try it?
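Here's a quick brute-force implementation; for small datasets it should run fast enough:
import itertools as it
def gen(seq):
    # every pair (a, b) fixes a start a and a common difference d = b - a
    diff = ((b - a, a) for a, b in it.combinations(sorted(seq), 2))
    for d, n in diff:
        k = []
        # follow the progression while its terms are in seq
        # (assumes distinct values: d == 0 would loop forever)
        while n in seq:
            k.append(n)
            n += d
        yield (d, k)
def arith(seq):
    return max(gen(seq), key=lambda x: len(x[1]))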
In [1]: arith([7, 10, 12, 13, 15, 20, 21])
Out[1]: (3, [7, 10, 13])
In [2]: %timeit arith([7, 10, 12, 13, 15, 20, 21])
10000 loops, best of 3: 23.6 µs per loop
In [3]: seq = {random.randrange(1000) for _ in range(100)}
In [4]: arith(seq)
Out[4]: (171, [229, 400, 571, 742, 913])
In [5]: %timeit arith(seq)
100 loops, best of 3: 3.79 ms per loop
In [6]: seq = {random.randrange(1000000) for _ in range(1000)}
In [7]: arith(seq)
Out[7]: (81261, [821349, 902610, 983871])
In [8]: %timeit arith(seq)
1 loop, best of 3: 434 ms per loop

Grouping tuple columns so their sum is less than 1

I need to create a list of groups of items, grouped so that the sum of the negative logarithms of the probabilities is roughly 1.
So far I've come up with
probs = np.random.dirichlet(np.ones(50)*100.,size=1).tolist()
logs = [-1 * math.log(1-x,2) for x in probs[0]]
zipped = zip(range(0,50), logs)
for key, igroup in iter.groupby(zipped, lambda x: x[1] < 1):
    print(list(igroup))
I.e. I create a list of random numbers, take their negative logarithms, then zip these probabilities together with the item number.
I then want to create groups by adding together the numbers in the second column of the tuple until the sum is 1 (or slightly above it).
I've tried:
for key, igroup in iter.groupby(zipped, lambda x: x[1]):
    for thing in igroup:
        print(list(iter.takewhile(lambda x: x < 1, iter.accumulate(igroup))))
and various other variations on using itertools.accumulate, but I can't get it to work.
Does anyone have an idea of what could be going wrong? (I think I'm doing too much work.)
Ideally, the output should be something like
groups = [[1,2,3], [4,5], [6,7,8,9]]
etc., i.e. these are the groups which satisfy this property.

Using numpy.ufunc.accumulate and simple loop:
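import numpy as np
def group(xs, start=1):
    last_sum = 0
    # stop is the 1-based item number of the current element
    for stop, acc in enumerate(np.add.accumulate(xs), start):
        if acc - last_sum >= 1:
            # close the group before the item that pushed the sum to 1 or more
            yield list(range(start, stop))
            last_sum = acc
            start = stop
    if start <= stop:
        # remaining items, including the last one
        yield list(range(start, stop + 1))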
probs = np.random.dirichlet(np.ones(50) * 100, size=1)
logs = -np.log2(1 - probs[0])
print(list(group(logs)))
Sample output:
[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35],
[36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50]]
ALTERNATIVE
Using numpy.searchsorted:
def group(xs, idx_start=1):
    xs = np.add.accumulate(xs)
    idxs = np.searchsorted(xs, np.arange(xs[-1]) + 1, side='left').tolist()
    return [list(range(i + idx_start, j + idx_start)) for i, j in zip([0] + idxs, idxs)]
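Both versions above start a new group at the item that pushes the running total to 1 or more. If, as the question describes, each group should instead keep adding items until its own sum reaches 1 (or slightly above it), a plain-Python greedy loop is enough. A sketch; group_until and threshold are hypothetical names, and items are numbered from 1 as in the sample output:

```python
def group_until(values, threshold=1.0, first_index=1):
    """Greedily group item numbers until each group's sum reaches threshold."""
    groups, current, total = [], [], 0.0
    for idx, v in enumerate(values, first_index):
        current.append(idx)
        total += v
        if total >= threshold:  # this item brings the group to threshold or slightly above
            groups.append(current)
            current, total = [], 0.0
    if current:  # leftover items whose sum stayed below threshold
        groups.append(current)
    return groups

print(group_until([0.5, 0.6, 0.3, 0.3, 0.5]))  # [[1, 2], [3, 4, 5]]
```

It works directly on the logs array from the question as well, e.g. group_until(logs).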
