Linearize optimization of non-overlapping items along a sequence

This is a follow-up to my previous question here. I have an optimization model that tries to find the highest coverage of a sequence by a set of probes. I approached it by creating an overlap matrix, as shown below.
import pyomo
import pyomo.environ as pe
import pyomo.opt as po
import numpy as np
import matplotlib.pyplot as plt
# Initialise all sequences and probes
sequence = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
probes = ["a", "b", "c", "d", "e", "f", "g", "h"]
probe_starts = {"a": 0, "b": 1, "c": 4, "d": 5, "e": 6, "f": 8, "g": 13, "h": 12}
probe_ends = {"a": 2, "b": 2, "c": 6, "d": 6, "e": 8, "f": 11, "g": 15, "h": 14}
probe_lengths = {
    p: e - s + 1 for (p, s), e in zip(probe_starts.items(), probe_ends.values())
}
# Create a matrix of probes against probes to check for overlap
def is_overlapping(x, y):
    x_start, x_end = x
    y_start, y_end = y
    return (
        (x_start >= y_start and x_start <= y_end)
        or (x_end >= y_start and x_end <= y_end)
        or (y_start >= x_start and y_start <= x_end)
        or (y_end >= x_start and y_end <= x_end)
    )
overlap = {}
matrix = np.zeros((len(probes), len(probes)))
for row, x in enumerate(zip(probe_starts.values(), probe_ends.values())):
    for col, y in enumerate(zip(probe_starts.values(), probe_ends.values())):
        matrix[row, col] = is_overlapping(x, y)
    overlap[probes[row]] = list(matrix[row].astype(int))
I now build up my model as normal, adding a constraint that if one probe is assigned, then none of the probes overlapping it can be assigned.
# Model definition
model = pe.ConcreteModel()
model.probes = pe.Set(initialize=probes)
model.lengths = pe.Param(model.probes, initialize=probe_lengths)
model.overlap = pe.Param(model.probes, initialize=overlap, domain=pe.Any)
model.assign = pe.Var(model.probes, domain=pe.Boolean)
# Objective - highest coverage
obj = sum(model.assign[p] * probe_lengths[p] for p in model.probes)
model.objective = pe.Objective(expr=obj, sense=pe.maximize)
# Constraints
model.no_overlaps = pe.ConstraintList()
for query in model.probes:
    model.no_overlaps.add(
        sum(
            [
                model.assign[query] * model.assign[p]
                for idx, p in enumerate(model.probes)
                if model.overlap[query][idx]
            ]
        )
        <= 1
    )
This works when solving with BONMIN, which can handle the quadratic terms, as shown below. However, when scaling up to a few thousand probes with significantly more overlap, it becomes prohibitively slow.
solver = po.SolverFactory("BONMIN")
results = solver.solve(model)
visualize = np.zeros((len(probes), len(sequence)))
for idx, (start, end, val) in enumerate(
    zip(probe_starts.values(), probe_ends.values(), model.assign.get_values().values())
):
    visualize[idx, start : end + 1] = val + 1
plt.imshow(visualize)
plt.yticks(ticks=range(len(probes)), labels=probes)
plt.xticks(range(len(sequence)))
plt.colorbar()
plt.show()
Any suggestions regarding how to convert this into a linear problem would be appreciated. Thanks in advance!

You can attack this as an Integer Program (IP). You need two sets of variables: one to indicate whether a probe has been "assigned", and another to indicate whether a spot s in the sequence is covered by probe p, which does the accounting.
It also helps to chop up the sequence into subsets (shown) that are indexed by the probes which could cover them, if assigned.
There is probably a dynamic programming approach to this as well, which somebody might chip in with. The following works:
Code:
# model to make non-contiguous connections across a sequence
# with objective to "cover" as many points in sequence as possible
import pyomo.environ as pe
sequence = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
probes = ["a", "b", "c", "d", "e", "f", "g", "h"]
probe_starts = {"a": 0, "b": 1, "c": 4, "d": 5, "e": 6, "f": 8, "g": 13, "h": 12}
probe_ends = {"a": 2, "b": 2, "c": 6, "d": 6, "e": 8, "f": 11, "g": 15, "h": 14}
# sequence = [0, 1, 2, 3, 4, 5]
# probes = ["a", "b", "c"]
# probe_starts = {"a": 0, "b": 2, "c": 3}
# probe_ends = {"a": 2, "b": 4, "c": 5}
coverages = {p:[t for t in sequence if t>=probe_starts[p] and t<=probe_ends[p]] for p in probes}
# Model definition
model = pe.ConcreteModel()
model.sequence = pe.Set(initialize=sequence)
model.probes = pe.Set(initialize=probes)
# make an indexed set as convenience of probes:coverage ...
model.covers = pe.Set(model.probes, within=model.sequence, initialize=coverages)
model.covers_flat_set = pe.Set(initialize=[(p,s) for p in probes for s in model.covers[p]])
model.assign = pe.Var(model.probes, domain=pe.Binary) # 1 if probe p is used...
model.covered = pe.Var(model.covers_flat_set, domain=pe.Binary) # s is covered by p
# model.pprint()
# Objective
obj = sum(model.covered[p, s] for (p, s) in model.covers_flat_set)
model.objective = pe.Objective(expr=obj, sense=pe.maximize)
# Constraints
# selected probe must cover the associated points between start and end, if assigned
def cover(model, p):
    return sum(model.covered[p, s] for s in model.covers[p]) == len(model.covers[p])*model.assign[p]
model.C1 = pe.Constraint(model.probes, rule=cover)
# cannot cover any point by more than 1 probe
def over_cover(model, s):
    cov_options = [(p,s) for p in model.probes if (p, s) in model.covers_flat_set]
    if not cov_options:
        return pe.Constraint.Skip  # no possible coverages
    return sum(model.covered[p, s] for (p, s) in cov_options) <= 1
model.C2 = pe.Constraint(model.sequence, rule=over_cover)
solver = pe.SolverFactory('glpk')
result = solver.solve(model)
print(result)
#model.display()
# el-cheapo visualization...
for s in model.sequence:
    probe = None
    print(f'{s:3d}', end='')
    for p in model.probes:
        if (p, s) in model.covers_flat_set and model.assign[p].value:
            probe = p
    if probe:
        print(f' {probe}')
    else:
        print()
Yields:
Problem:
- Name: unknown
  Lower bound: 13.0
  Upper bound: 13.0
  Number of objectives: 1
  Number of constraints: 24
  Number of variables: 32
  Number of nonzeros: 55
  Sense: maximize
Solver:
- Status: ok
  Termination condition: optimal
  Statistics:
    Branch and bound:
      Number of bounded subproblems: 5
      Number of created subproblems: 5
  Error rc: 0
  Time: 0.007474184036254883
Solution:
- number of solutions: 0
  number of solutions displayed: 0
  0 a
  1 a
  2 a
  3
  4 c
  5 c
  6 c
  7
  8 f
  9 f
 10 f
 11 f
 12 h
 13 h
 14 h
 15
 16
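On the dynamic programming idea: because the probes are intervals on a line and the objective is the total covered length, the problem has the shape of weighted interval scheduling, which needs no solver. The following is only a minimal sketch, not part of the answer above. The function name `best_probe_set` is made up for illustration, and it assumes closed integer intervals exactly as given in `probe_starts` and `probe_ends`.

```python
from bisect import bisect_left


def best_probe_set(starts, ends):
    """Pick non-overlapping probes with maximum total length.

    starts/ends map probe name -> inclusive start/end position.
    Returns (total_covered_length, list_of_chosen_probes).
    """
    # Process probes in order of increasing end position.
    order = sorted(starts, key=lambda p: ends[p])
    end_positions = [ends[p] for p in order]

    best = [0] * (len(order) + 1)      # best[i]: optimum over the first i probes
    take = [False] * (len(order) + 1)  # take[i]: probe i is part of best[i]
    prev = [0] * (len(order) + 1)      # prev[i]: prefix that best[i] builds on

    for i, p in enumerate(order, start=1):
        length = ends[p] - starts[p] + 1
        # Number of probes that end strictly before p starts, i.e. the
        # prefix of the sorted list that is compatible with p.
        j = bisect_left(end_positions, starts[p])
        if best[j] + length > best[i - 1]:
            best[i], take[i], prev[i] = best[j] + length, True, j
        else:
            best[i] = best[i - 1]

    # Walk back through the table to recover the chosen probes.
    chosen, i = [], len(order)
    while i > 0:
        if take[i]:
            chosen.append(order[i - 1])
            i = prev[i]
        else:
            i -= 1
    return best[-1], chosen[::-1]


probe_starts = {"a": 0, "b": 1, "c": 4, "d": 5, "e": 6, "f": 8, "g": 13, "h": 12}
probe_ends = {"a": 2, "b": 2, "c": 6, "d": 6, "e": 8, "f": 11, "g": 15, "h": 14}
total, chosen = best_probe_set(probe_starts, probe_ends)
print(total, chosen)
```

For the example data this picks a, c, f and h for a total coverage of 13, the same objective value as the IP. Sorting dominates the cost, so it should scale to a few thousand probes without trouble, but unlike the IP it cannot easily absorb extra side constraints.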

Related

Pandas groupby perform computation that uses multiple rows and columns per group

Consider the following data:
   Animal  Color  Rank  X
0  c       b      1     9
1  c       b      2     8
2  c       b      3     7
3  c       r      1     6
4  c       r      2     5
5  c       r      3     4
6  d       g      1     3
7  d       g      2     2
8  d       g      3     1
I now want to group by ["Animal", "Color"] and, for every group, I want to subtract the X value that corresponds to Rank equal to 1 from every other X value in that group.
Currently I am looping like this:
dfs = []
for _, tmp in df.groupby(["Animal","Color"]):
    baseline = tmp.loc[tmp["Rank"]==1,"X"].to_numpy()
    tmp["Y"] = tmp["X"]-baseline
    dfs.append(tmp)
dfs = pd.concat(dfs)
This yields the right result, i.e., Y is 0, -1, -2 within every group.
The whole process is however really slow and I would prefer to use apply or transform instead.
My problem is that I am unable to find a way to use the whole grouped data within apply or transform.
Is there a way to accelerate my computation?
For completeness, here's my MWE:
df = pd.DataFrame(
    {
        "Animal": {0: "c", 1: "c", 2: "c", 3: "c", 4: "c", 5: "c", 6: "d", 7: "d", 8: "d"},
        "Color": {0: "b", 1: "b", 2: "b", 3: "r", 4: "r", 5: "r", 6: "g", 7: "g", 8: "g"},
        "Rank": {0: 1, 1: 2, 2: 3, 3: 1, 4: 2, 5: 3, 6: 1, 7: 2, 8: 3},
        "X": {0: 9, 1: 8, 2: 7, 3: 6, 4: 5, 5: 4, 6: 3, 7: 2, 8: 1},
    }
)

Maybe it's the same as OP's solution performance-wise, but a little bit shorter:
# Sort so that the Rank == 1 row is the first row of every group
df.sort_values(['Animal', 'Color', 'Rank'], inplace=True)
# Select X before transforming so only that column is computed
df['Y'] = df['X'] - df.groupby(['Animal', 'Color'])['X'].transform('first')

I think I found a solution that is faster (at least in my use case):
# get a unique identifier for every group
df["_group"] = df.groupby(["Animal", "Color"]).ngroup()
# for every group, get that identifier and the value X to be subtracted
baseline = df.loc[df["Rank"] == 1, ["_group", "X"]]
# merge the original data and the baseline data on the group
# this gives a new column with the Rank==1 value of X
df = pd.merge(df, baseline, on="_group", suffixes=("", "_baseline"))
# perform arithmetic
df["Y"] = df["X"] - df["X_baseline"]
# drop intermediate columns
df.drop(columns=["_group", "X_baseline"], inplace=True)
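A variant that needs neither the sort nor the merge is to blank out every X that is not on a Rank == 1 row and then broadcast the first non-null value per group. This is a sketch under the assumption that each group has exactly one Rank == 1 row; the helper column name `_baseline` is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Animal": list("cccccc") + list("ddd"),
        "Color": list("bbbrrrggg"),
        "Rank": [1, 2, 3] * 3,
        "X": [9, 8, 7, 6, 5, 4, 3, 2, 1],
    }
)

# Keep X only on the Rank == 1 rows (NaN everywhere else) ...
masked = df.assign(_baseline=df["X"].where(df["Rank"] == 1))
# ... then spread the first non-null value of each group over the whole group.
baseline = masked.groupby(["Animal", "Color"])["_baseline"].transform("first")

# The cast back to int would fail if a group had no Rank == 1 row (NaN baseline).
df["Y"] = (df["X"] - baseline).astype(int)
print(df)
```

Whether this beats the merge depends on the data, so it is worth timing both on the real frame.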

Add padding based on partial sum

I have four given variables:
group size
total of groups
partial sum
1-D tensor
and I want to pad a group with zeros once the sum within it reaches the partial sum. For example:
groupsize = 4
totalgroups = 3
partialsum = 15
d1tensor = torch.tensor([ 3, 12, 5, 5, 5, 4, 11])
The expected result is:
[ 3, 12, 0, 0, 5, 5, 5, 0, 4, 11, 0, 0]
I have no clue how I can achieve that in pure PyTorch. In Python it would be something like this:
target = [0]*(groupsize*totalgroups)
cursor = 0
current_count = 0
d1tensor = [ 3, 12, 5, 5, 5, 4, 11]
for idx, ele in enumerate(target):
    subgroup_start = (idx//groupsize) *groupsize
    subgroup_end = subgroup_start + groupsize
    if sum(target[subgroup_start:subgroup_end]) < partialsum:
        target[idx] = d1tensor[cursor]
        cursor +=1
Can anyone help me with that? I have already googled it but couldn't find anything.

Some logic, NumPy and list comprehensions are sufficient here.
I will break it down step by step; you can make it slimmer and prettier afterwards:
import numpy as np
my_val = 15
block_size = 4
total_groups = 3
d1 = [3, 12, 5, 5, 5, 4, 11]
d2 = np.cumsum(d1)
d3 = d2 % my_val == 0  # True where the running sum is 15 or a multiple of it
split_points = [i+1 for i, x in enumerate(d3) if x]  # indices just after each completed group
#### Option 1
split_array = np.split(d1, split_points, axis=0)
padded_arrays = [np.pad(array, (0, block_size - len(array)), mode='constant') for array in split_array] #pad arrays
padded_d1 = np.concatenate(padded_arrays[:total_groups]) #put them together, discard extra group if present
#### Option 2
split_points = [el for el in split_points if el <len(d1)] #make sure we are not splitting on the last element of d1
split_array = np.split(d1, split_points, axis=0)
padded_arrays = [np.pad(array, (0, block_size - len(array)), mode='constant') for array in split_array] #pad arrays
padded_d1 = np.concatenate(padded_arrays)

How to find the max array from both sides

Given an integer array A, I need to pick B elements from either left or right end of the array A to get maximum sum. If B = 4, then you can pick the first four elements or the last four elements or one from front and three from back etc.
Example input:
A = [5, -2, 3, 1, 2]
B = 3
The correct answer is 8 (by picking 5 from the left, and 1 and 2 from the right).
My code:
def solve(A, B):
    n = len(A)
    # track left most index and right most index i,j
    i = 0
    j = n-1
    Sum = 0
    B2 = B # B for looping and B2 for reference it
    # Add element from front
    for k in range(B):
        Sum += A[k]
    ans = Sum
    # Add element from last
    for _ in range(B2):
        # Remove element from front
        Sum -= A[i]
        # Add element from last
        Sum += A[j]
        ans = max(ans, Sum)
    return ans
But the answer I get is 6.

Solution
def max_bookend_sum(x, n):
    bookends = x[-n:] + x[:n]
    return max(sum(bookends[i : i + n]) for i in range(n + 1))
Explanation
Let n = 3 and take x,
>>> x = [4, 9, -7, 4, 0, 4, -9, -8, -6, 9]
Grab the "right" n elements, concatenate with the "left" n:
>>> bookends = x[-n:] + x[:n]
>>> bookends # last three elements from x, then first three
[-8, -6, 9, 4, 9, -7]
Take "sliding window" groups of n elements:
>>> [bookends[i : i + n] for i in range(n + 1)]
[[-8, -6, 9], [-6, 9, 4], [9, 4, 9], [4, 9, -7]]
Now, instead of producing the sublists, sum each one and take the max:
>>> max(sum(bookends[i : i + n]) for i in range(n + 1))
22
For your large array A from the comments:
>>> max(sum(bookends[i : i + n]) for i in range(n + 1))
6253
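The one-liner re-sums every window, so it does O(n) work for each of the n + 1 windows. A running sum avoids that: start with all n elements taken from the left, then repeatedly swap the innermost left element for the next element from the right. This is a minimal sketch of that idea; the name `max_end_sum` is made up for illustration.

```python
def max_end_sum(a, b):
    """Maximum sum of b elements taken from the two ends of list a."""
    if not 0 <= b <= len(a):
        raise ValueError("b must be between 0 and len(a)")
    window = sum(a[:b])  # all b elements taken from the left
    best = window
    for k in range(1, b + 1):
        # Take the k-th element from the right, give back the k-th
        # element from the inner edge of the left block.
        window += a[-k] - a[b - k]
        best = max(best, window)
    return best


print(max_end_sum([5, -2, 3, 1, 2], 3))
print(max_end_sum([4, 9, -7, 4, 0, 4, -9, -8, -6, 9], 3))
```

The two calls print 8 and 22, matching the expected answer from the question and the worked example above.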

Solution based on sum of the left and right slices:
Data = [-533, -666, -500, 169, 724, 478, 358, -38, -536, 705, -855, 281, -173, 961, -509, -5, 942, -173, 436, -609,
-396, 902, -847, -708, -618, 421, -284, 718, 895, 447, 726, -229, 538, 869, 912, 667, -701, 35, 894, -297, 811,
322, -667, 673, -336, 141, 711, -747, -132, 547, 644, -338, -243, -963, -141, -277, 741, 529, -222, -684,
35] # to avoid var shadowing
def solve(A, B):
    m = None
    for i in range(B + 1):  # i = number of elements taken from the left
        r = B - i  # r = number of elements taken from the right
        # parentheses matter: only the right-hand sum is conditional
        tmp = sum(A[:i]) + (sum(A[-r:]) if r > 0 else 0)
        m = tmp if m is None else max(m, tmp)
    return m
print(solve(Data, 48)) # 6253

A recursive approach with comments.
def solve(A, B, start_i=0, end_i=None):
    # set end_i to the index of last element
    if end_i is None:
        end_i = len(A) - 1
    # base case 1: we have no more moves
    if B == 0:
        return 0
    # base case 2: no elements are left to pick (only happens if B > len(A))
    if start_i > end_i:
        return 0
    # next, we need to choose whether to use one of our moves on
    # the left side of the array or the right side. We compute both,
    # then check which one is better.
    # pick the left side to sum
    sum_left = A[start_i] + solve(A, B - 1, start_i + 1, end_i)
    # pick the right side to sum
    sum_right = A[end_i] + solve(A, B - 1, start_i, end_i - 1)
    # return the max of both options
    return max(sum_left, sum_right)
arr = [5, -2, 3, 1, 2]
print(solve(arr, 3)) # prints 8

The idea is if we have this list:
[5, 1, 1, 8, 2, 10, -2]
Then the possible numbers for B=3 would be:
lhs = [5, 1, 1] # namely L[+0], L[+1], L[+2]
rhs = [2, 10, -2] # namely R[-3], R[-2], R[-1]
The possible combinations would be:
[5, 1, 1] # L[+0], L[+1], L[+2]
[5, 1, -2] # L[+0], L[+1], R[-1]
[5, 10, -2] # L[+0], R[-2], R[-1]
[2, 10, -2] # R[-3], R[-2], R[-1]
As you can see, we can easily perform forward and backward iterations which will start from all L (L[+0], L[+1], L[+2]), and then iteratively replacing the last element with an R (R[-1], then R[-2], then R[-3]) up until all are R (R[-3], then R[-2], then R[-1]).
def solve(A, B):
    n = len(A)
    max_sum = None
    for lhs, rhs in zip(range(B, -1, -1), range(0, -(B+1), -1)):
        combined = A[0:lhs] + (A[rhs:] if rhs < 0 else [])
        combined_sum = sum(combined)
        max_sum = combined_sum if max_sum is None else max(max_sum, combined_sum)
    return max_sum
for A in [
    [5, 1, 1, 8, 2, 10, -2],
    [5, 6, 1, 8, 2, 10, -2],
    [5, 6, 3, 8, 2, 10, -2],
]:
    print(A)
    print("\t1 =", solve(A, 1))
    print("\t2 =", solve(A, 2))
    print("\t3 =", solve(A, 3))
    print("\t4 =", solve(A, 4))
Output
[5, 1, 1, 8, 2, 10, -2]
	1 = 5
	2 = 8
	3 = 13
	4 = 18
[5, 6, 1, 8, 2, 10, -2]
	1 = 5
	2 = 11
	3 = 13
	4 = 20
[5, 6, 3, 8, 2, 10, -2]
	1 = 5
	2 = 11
	3 = 14
	4 = 22

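public class Solution {
    public int solve(int[] A, int B) {
        int sum = 0;
        int n = A.length - 1;
        // start with all B elements taken from the left
        for (int k = 0; k < B; k++) {
            sum += A[k];
        }
        int ans = sum;
        int B2 = B - 1;
        // swap the innermost left element for the next one from the right
        for (int j = n; j > n - B; j--) {
            sum -= A[B2];
            sum += A[j];
            ans = Math.max(ans, sum);
            B2--;
        }
        return ans;
    }
}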

Checking if n elements in an array are increasing

I have written code for SPC (statistical process control) and I am attempting to highlight certain out-of-control runs.
So I was wondering if there is a way to pick out runs of n (in my case 7) increasing elements in an array, so that I can colour them red when I go to plot them.
This is what I attempted but I obviously get an indexing error.
import numpy as np
import matplotlib.pyplot as plt
y = np.linspace(0,10,15)
x = np.array([1,2,3,4,5,6,7,8,9,1,4,6,4,6,8])
col =[]
for i in range(len(x)):
    if x[i]<x[i+1] and x[i+1]<x[i+2] and x[i+2]<x[i+3] and x[i+3]<x[i+4] and x[i+4]<x[i+5] and x[i+5]<x[i+6] and x[i+6]<x[i+7]:
        col.append('red')
    elif x[i]>x[i+1] and x[i+1]>x[i+2] and x[i+2]>x[i+3] and x[i+3]>x[i+4] and x[i+4]>x[i+5] and x[i+5]>x[i+6] and x[i+6]>x[i+7]:
        col.append('red')
    else:
        col.append('blue')
for i in range(len(x)):
    # plotting the corresponding x with y
    # and respective color
    plt.scatter(y[i], x[i], c = col[i], s = 10,
                linewidth = 0)
Any help would be greatly appreciated!

As Andy said in his comment, you get the index error because at i=8 the term x[i+7] asks for index 15, which is the length of x and therefore out of range.
Either you only loop over len(x)-7 and just repeat the last entry in col 7 times or you could do something like this:
import numpy as np
import matplotlib.pyplot as plt
x = np.array([1,2,3,4,5,6,1,2,3,1,0,-1,-2,-3,-4,-5,-6,4,5])
y = np.linspace(0, 10, len(x))  # one y value per x value
col = []
diff = np.diff(x)  # positive where x increases, negative where it decreases; length len(x)-1
diff_sign = np.diff(np.sign(diff))  # non-zero where the direction changes; length len(x)-2
zero_crossings = np.where(diff_sign)[0] + 2  # indices in x where a new run starts (+2 undoes the two diffs)
diff_zero_crossings = np.diff(np.concatenate([[0], zero_crossings, [len(x)]]))  # length of each run
for i in diff_zero_crossings:
    if i >= 6:
        for _ in range(i):
            col.append("r")
    else:
        for _ in range(i):
            col.append("b")
for i in range(len(x)):
    # plotting the corresponding x with y
    # and respective color
    plt.scatter(y[i], x[i], c = col[i], s = 10,
                linewidth = 0)
plt.show()

To determine if all integer elements of a list are ascending, you could do this:
def ascending(arr):
    _rv = True
    for i in range(len(arr) - 1):
        if arr[i + 1] <= arr[i]:
            _rv = False
            break
    return _rv
a1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 8, 10, 11, 12, 13, 14, 16]
a2 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16]
print(ascending(a1))
print(ascending(a2))
If you want to look for an ascending run of a given length, you could just use nested loops. It may look inelegant, but it's surprisingly efficient and much simpler than bringing dataframes into the mix:
def ascending(arr, seq):
    for i in range(len(arr) - seq + 1):
        state = True
        for j in range(i, i + seq - 1):
            if arr[j] >= arr[j + 1]:
                state = False
                break
        if state:
            return True
    return False
a1 = [100, 99, 98, 6, 7, 8, 10, 11, 12, 13, 14, 13]
a2 = [9, 8, 7, 6, 5, 4, 3, 2, 1]
print(ascending(a1, 7))
print(ascending(a2, 7))
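The check above only answers whether such a run exists. For the plot, one colour per point is needed. The following is a minimal sketch in plain Python that marks every point belonging to a strictly increasing or strictly decreasing run of at least `min_run` points. The function name `run_colors` and its parameters are made up for illustration.

```python
def run_colors(values, min_run=7, hot="red", cold="blue"):
    """Return one colour per point; points inside a strictly increasing or
    strictly decreasing run of at least min_run points get `hot`."""
    colors = [cold] * len(values)
    start = 0      # index of the first point of the current run
    direction = 0  # +1 rising, -1 falling, 0 not yet known

    for i in range(1, len(values) + 1):
        if i < len(values):
            # Sign of the step from point i-1 to point i: +1, -1 or 0.
            step = (values[i] > values[i - 1]) - (values[i] < values[i - 1])
        else:
            step = 0  # past the end: forces the last run to be closed
        if step != 0 and direction in (0, step):
            direction = step  # the run continues through point i
            continue
        # The run values[start:i] has ended; colour it if it is long enough.
        if i - start >= min_run:
            colors[start:i] = [hot] * (i - start)
        # A turning point also starts the next run; a flat step does not.
        start = i - 1 if step != 0 else i
        direction = step
    return colors


x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 4, 6, 4, 6, 8]
print(run_colors(x))
```

For the `x` from the question, the first nine points come back red and the remaining six blue. The list can be passed to a single `plt.scatter(y, x, c=run_colors(x), s=10)` call instead of plotting point by point, provided `y` has the same length as `x`.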

Print 2d array on diagonals

I'm trying to print the diagonals for a 2d array starting with the bottom left corner and moving towards the top right corner. I've managed to print the first half of the matrix but I got stuck when I have to print the second part of it and I'm hoping somebody can give me a clue how to continue. Here is what I have:
matrix = [["A", "B", "C", "D"],
          ["E", "F", "G", "H"],
          ["I", "J", "K", "L"],
          ["M", "N", "O", "P"],
          ["Q", "R", "S", "T"]]
and the partial function that prints the diagonals up to a point:
def diagonal_matrix_print(input_matrix):
    width = len(input_matrix[0])
    height = len(input_matrix)
    start_row = height - 1
    first_row = 0
    for start_row in reversed(range(0, height)):
        i = start_row
        for column in range(0, width):
            if i == height:
                start_row = start_row - 1
                break
            print(input_matrix[i][column])
            i = i + 1
        print()
The issue I'm facing is printing the diagonals that start with the second half of the matrix - B G L, C H, D
I tried using another 2 for loops for it like:
for row in range(0, height - 1):
    i = row
    for start_column in range(1, width):
        print(input_matrix[i][start_column])
        i = i + 1
but when the row value changes to 1, it no longer prints the diagonal...

Suppose we have the list of lists L:
>>> L = [[ 0,  1,  2],
...      [ 3,  4,  5],
...      [ 6,  7,  8],
...      [ 9, 10, 11]]
We want a function diagonals such that
>>> diagonals(L)
[[9], [6, 10], [3, 7, 11], [0, 4, 8], [1, 5], [2]]
We can think about items in L with respect to 2 different coordinate
systems. There is the usual (x, y) coordinate system where (x, y) corresponds to
the location with value L[x][y].
And then there is also the (p, q) coordinate system where p represents the
pth diagonal, with p=0 being the diag at the lower-left corner. And q
represents the qth item along the pth diag, with q=0 starting at the left
edge. Thus (p, q) = (0,0) corresponds to the location with value L[3][0] = 9
in the example above.
Let h,w equal the height and width of L respectively.
Then p ranges from 0 to h + w - 2 (there are h + w - 1 diagonals).
We want a formula for translating from (p, q) coordinates to (x, y) coordinates.
x decreases linearly as p increases.
x increases linearly as q increases.
When (p, q) = (0, 0), x equals h - 1.
Therefore: x = h - 1 - p + q.
y does not change with p (if q is fixed).
y increases linearly as q increases.
When (p, q) = (0, 0), y equals 0.
Therefore, y = q.
Now the extent of valid values for x and y requires that:
(0 <= x = h - 1 - p + q < h) and (0 <= y = q < w)
which is equivalent to
(p - h + 1 <= q < p + 1) and (0 <= q < w)
which is equivalent to
max(p - h + 1, 0) <= q < min(p + 1, w)
Therefore we can loop through the items of L using
for p in range(h + w - 1):
    for q in range(max(p-h+1, 0), min(p+1, w)):
        L[h - p + q - 1][q]
def diagonals(L):
    h, w = len(L), len(L[0])
    return [[L[h - p + q - 1][q]
             for q in range(max(p-h+1, 0), min(p+1, w))]
            for p in range(h + w - 1)]
matrix = [ ["A", "B", "C", "D"], ["E","F","G","H"], ["I","J","K","L"], ["M","N","O","P"], ["Q", "R", "S","T"]]
for diag in diagonals(matrix):
print(diag)
yields
['Q']
['M', 'R']
['I', 'N', 'S']
['E', 'J', 'O', 'T']
['A', 'F', 'K', 'P']
['B', 'G', 'L']
['C', 'H']
['D']
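A different route to the same result, offered here only as a sketch and not as part of the derivation above, is to notice that every cell on one diagonal shares the same value of column minus row. Grouping cells by that offset and sorting the offsets gives the diagonals from the lower-left corner to the upper-right. The name `diagonals_by_offset` is made up for illustration.

```python
from collections import defaultdict


def diagonals_by_offset(rows):
    """Group cells by (column - row); each group is one diagonal."""
    groups = defaultdict(list)
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            # Rows are visited top to bottom, so each diagonal is
            # collected from its upper-left end to its lower-right end.
            groups[c - r].append(value)
    # The most negative offset is the lower-left corner.
    return [groups[k] for k in sorted(groups)]


L = [[0, 1, 2],
     [3, 4, 5],
     [6, 7, 8],
     [9, 10, 11]]
print(diagonals_by_offset(L))
```

For the 4x3 example this prints `[[9], [6, 10], [3, 7, 11], [0, 4, 8], [1, 5], [2]]`, the same list the `diagonals` function is specified to return. It trades the index arithmetic for a dictionary, and also tolerates rows of unequal length.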
