Optimization Problem in Python (using binary array) possible solutions

Optimization Problem in Python (using binary array) possible solutions - python

I have a problem where I have a large binary numpy array (1000,2000). The general idea is that the array's columns represent time from 0 to 2000 and each row represents a task. Each 0 in the array represents a failure and each 1 represents success.
What I need to do is select 150 tasks(row axis) out of 1000 available and maximize the total successes (1s) over unique columns. It does not have to be consecutive and we are just looking to maximize success per time period (just need 1 success any additional is extraneous). I would like to select the best "Basket" of 150 tasks. The subarray rows can be taken anywhere from the 1000 initial rows. I want the optimal "Basket" of 150 tasks that lead to the most success across time (columns). (Edited for Additional Clarity)
A real basic example of what the array looks like :
array([[0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0],
[0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1],
[0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0],
[1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0],
[1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0],
[1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0]])
I have successfully created a Monte Carlo simulation using randomly generated baskets of tasks in NumPy and then going through the array and summing. As you can imagine this takes a while and given the large number of potential combinations it is inefficient. Can someone point me to a algorithm or way to set this problem up in a solver like PuLP?

Try this:
n = 150
row_sums = np.sum(x, axis=1)
top_n_row_sums = np.argsort(row_sums)[-n:]
max_successes = x[top_n_row_sums]
This takes the sum of every row, grabs the indices of the highest n sums, and indexes into x with those row indices.
Note that the rows will end up sorted in ascending order of their sums across the columns. If you want the rows in normal order (ascending order by index), use this instead:
max_successes = x[sorted(top_n_row_sums)]

Why not just calculate the sum of successes for each row and then you can easily pick the top 150 values.

Related

Python - Plotting a series of 1s after n periods of 1s in another series

This is a follow up to my question from earlier, here.
Pretend I have any iterable that consists of 0s and 1s, and would like a new series that plots 1s for m indices, n indices after the first 1 in the original series. How would I craft this function?
Here is an example:
arr = np.array([0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0])
>> function(arr, m=3, n=2)
np.array([0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0])
After each first 1 in every series of 1s within the original array, the function waits 2 periods or indices before plotting 3 consecutive 1s in a new array. The array length is preserved, and the function checks every index in the first array for a 1 even if there is a new series of 1s starting where a 1 is input in the returned array.
I apologize if that makes things more confusing but I think the point I'm making is that the process does not resume after 3 ones are plotted in the returned array, rather the process is continuous.

Scheduling optimization to minimize the number of timeslots (with constraints)

I'm working on a scheduling optimization problem where we have a set of tasks that need to be completed within a certain timeframe.
Each task has a schedule that specifies a list of time slots when it can be performed. The schedule for each task can be different depending on the weekday.
Here is small sample (reduced number of tasks and time slots):
task_availability_map = {
"T1" : [0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
"T2" : [0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
"T3" : [0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
"T4" : [0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
"T5" : [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
"T6" : [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0],
"T7" : [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0],
"T8" : [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0],
"T9" : [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
"T10": [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
}
The constraint is that only up to N tasks can be performed in parallel within the same time slot (if they overlap). The group of parallel tasks always takes the same amount of time regardless of whether 1 or N are being done.
The objective is to minimize the number of time slots.
I've tried a brute force approach that generates all time slot index permutations. For each index in a given permutation, get all tasks that can be scheduled and add them to a list of tasks to be excluded in the next iteration. Once all iterations for a given permutation are completed, add the number of time slots and the combination of indices to a list.
def get_tasks_by_timeslot(timeslot, tasks_to_exclude):
for task in task_availability_map.keys():
if task in tasks_to_exclude:
continue
if task_availability_map[task][timeslot] == 1:
yield task
total_timeslot_count = len(task_availability_map.values()[0]) # 17
timeslot_indices = range(total_timeslot_count)
timeslot_index_permutations = list(itertools.permutations(timeslot_indices))
possible_schedules = []
for timeslot_variation in timeslot_index_permutations:
tasks_already_scheduled = []
current_schedule = []
for t in timeslot_variation:
tasks = list(get_tasks_by_timeslot(t, tasks_already_scheduled))
if len(tasks) == 0:
continue
elif len(tasks) > MAX_PARALLEL_TASKS:
break
tasks_already_scheduled += tasks
current_schedule.append(tasks)
time_slot_count = np.sum([len(t) for t in current_schedule])
possible_schedules.append([time_slot_count, timeslot_variation])
...
Sort possible schedules by number of time slots, and that's the solution. However, this algorithm grows in complexity exponentially with the number of time slots. Given there are hundreds of tasks and hundreds of time slots, I need a different approach.
Someone suggested LP MIP (such as Google OR Tools), but I'm not very familiar with it and am having a hard time formulating the constraints in code. Any help with either LP or some other solution that can help me get started in the right direction is much appreciated (doesn't have to be Python, can even be Excel).

My proposal for a MIP model:
Introduce binary variables:
x(i,t) = 1 if task i is assigned to slot t
0 otherwise
y(t) = 1 if slot t has at least one task assigned to it
0 otherwise
Furthermore let:
N = max number of tasks per slot
ok(i,t) = 1 if we are allowed to assign task i to slot t
0 otherwise
Then the model can look like:
minimize sum(t,y(t)) (minimize used slots)
sum(t, ok(i,t)*x(i,t)) = 1 for all i (each task is assigned to exactly one slot)
sum(i, ok(i,t)*x(i,t)) <= N for all t (capacity constraint for each slot)
y(t) >= x(i,t) for all (i,t) such that ok(i,t)=1
x(i,t),y(t) in {0,1} (binary variables)
Using N=3, I get a solution like:
---- 45 VARIABLE x.L assignment
s5 s6 s7 s13
task1 1.000
task2 1.000
task3 1.000
task4 1.000
task5 1.000
task6 1.000
task7 1.000
task8 1.000
task9 1.000
task10 1.000
The model is fairly simple and it should not be very difficult to code and solve it using your favorite MIP solver. The one thing you want to make sure is that only variables x(i,t) exist when ok(i,t)=1. In other words, make sure that variables do not appear in the model when ok(i,t)=0. It can help to interpret the assignment constraints as:
sum(t | ok(i,t)=1, x(i,t)) = 1 for all i (each task is assigned to exactly one slot)
sum(i | ok(i,t)=1, x(i,t)) <= N for all t (capacity constraint for each slot)
where | means 'such that' or 'where'. If you do this right, your model should have 50 variables x(i,t) instead of 10 x 17 = 170. Furthermore we can relax y(t) to be continuous between 0 and 1. It will be either 0 or 1 automatically. Depending on the solver that may affect performance.
I have no reason to believe this is easier to model as a constraint programming model or that it is easier to solve that way. My rule of thumb is, if it is easy to model as a MIP stick to a MIP. If we need to go through lots of hoops to make it a proper MIP, and a CP formulation makes life easier, then use CP. In many cases this simple rule works quite well.

Efficient way to subset and combine arrays of different lengths

Given a 3 dimensional boolean data:
np.random.seed(13)
bool_data = np.random.randint(2, size=(2,3,6))
>> bool_data
array([[[0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 1]],
[[1, 0, 1, 1, 0, 0],
[0, 1, 1, 1, 1, 0],
[1, 1, 1, 0, 0, 0]]])
I wish to count the number of consecutive 1's bounded by two 0's in each row (along axis=1) and return a single array with the tally. For bool_data, this would give array([1, 1, 2, 4]).
Due to the 3D structure of bool_data and the variable tallies for each row, I had to clumsily convert the tallies into nested lists, flatten them using itertools.chain, then back-convert the list into an array:
# count consecutive 1's bounded by two 0's
def count_consect_ones(input):
return np.diff(np.where(input==0)[0])-1
# run tallies across all rows in bool_data
consect_ones = []
for i in range(len(bool_data)):
for j in range(len(bool_data[i])):
res = count_consect_ones(bool_data[i, j])
consect_ones.append(list(res[res!=0]))
>> consect_ones
[[], [1, 1], [], [2], [4], []]
# combines nested lists
from itertools import chain
consect_ones_output = np.array(list(chain.from_iterable(consect_ones)))
>> consect_ones_output
array([1, 1, 2, 4])
Is there a more efficient or clever way for doing this?

consect_ones.append(list(res[res!=0]))
If you use .extend instead, the content of the sequence is appended directly. That saves the step to combine the nested lists afterwards:
consect_ones.extend(res[res!=0])
Furthermore, you could skip the indexing, and iterate over the dimensions directly:
consect_ones = []
for i in bool_data:
for j in i:
res = count_consect_ones(j)
consect_ones.extend(res[res!=0])

We could use a trick to pad the columns with zeros and then look for ramp-up and ramp-down indices on a flattened version and finally filter out the indices corresponding to the border ones to give ourselves a vectorized solution, like so -
# Input 3D array : a
b = np.pad(a, ((0,0),(0,0),(1,1)), 'constant', constant_values=(0,0))
# Get ramp-up and ramp-down indices/ start-end indices of 1s islands
s0 = np.flatnonzero(b[...,1:]>b[...,:-1])
s1 = np.flatnonzero(b[...,1:]<b[...,:-1])
# Filter only valid ones that are not at borders
n = b.shape[2]
valid_mask = (s0%(n-1)!=0) & (s1%(n-1)!=a.shape[2])
out = (s1-s0)[valid_mask]
Explanation -
The idea with padding zeros at either ends of each row as "sentients" is that when we get one-off sliced array versions and compare, we could detect the ramp-up and ramp-down places with b[...,1:]>b[...,:-1] and b[...,1:]<b[...,:-1] respectively. Thus, we get s0 and s1 as the start and end indices for each of the islands of 1s. Now, we don't want the border ones, so we need to get their column indices traced back to the original un-padded input array, hence that bit : s0%(n-1) and s1%(n-1). We need to remove all cases where the start of each island of 1s are at the left border and end of each island of 1s at the right side border. The starts and ends are s0 and s1. So, we use those to check if s0 is 0 and s1 is a.shape[2]. These give us the valid ones. The island lengths are obtained with s1-s0, so mask it with valid-mask to get our desired output.
Sample input, output -
In [151]: a
Out[151]:
array([[[0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 1]],
[[1, 0, 1, 1, 0, 0],
[0, 1, 1, 1, 1, 0],
[1, 1, 1, 0, 0, 0]]])
In [152]: out
Out[152]: array([1, 1, 2, 4])

sympy - find conflicting row in matrix

I have a matrix A = Matrix([[1, 0, 0, 20], [-1, 1, 0, 0], [-2, 1, 0, 0], [0, -1, 1, 0]]), a sympy object.
I want to know if there is a conflicting row - meaning a row that after i reduce the matrix, all the terms in the row are zero, apart from the rightmost one.
This seems easy to do on paper, but I think I misunderstand sympy.
Basically the output from rref method is not what I expected.
Notice that if we row reduce A with pen and paper, we should get Matrix([[1, 0, 0, 20], [0, 1, 0, 20], [0, 0, 0, 20], [0, 0, 1, 20]]) at a certain point.
So row number 2 is a conflicting row.
However when I use A.rref() I get something else entirely. I get Matrix([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]) and list <class 'list'>: [0, 1, 2, 3]
I don't understand how they reached this result and how to interpet the list. How can I find the conflicting rows using sympy?

The answer by sympy is correct. The matrix you reached in reducing manually is not the end of the row-reduction process, which explains the difference between your answer and sympy's.
To continue the row-reduction from your matrix, swap rows 2 and 3 (the third and fourth rows), and you get
matrix([
[ 1, 0, 0, 20],
[ 0, 1, 0, 20],
[ 0, 0, 1, 20],
[ 0, 0, 0, 20]])
Now subtract row 3 (the last row) from each of the other rows, then divide that last row by 20, and we get
matrix([
[ 1, 0, 0, 0],
[ 0, 1, 0, 0],
[ 0, 0, 1, 0],
[ 0, 0, 0, 1]])
which is sympy's answer.
There are multiple ways to interpret this result. One way is to think of a system of 4 linear equations in 3 variables--the last column of the matrix hold the constants on the right side of the equations while the other columns are the variable coefficients. Your original matrix represents the equations
x = 20
- x + y = 0
- 2x + y = 0
- y + z = 0
and sympy's row reduction shows this system has the same solutions as
x = 0
y = 0
z = 0
0 = 1
which, of course, has no solutions at all, thanks to the last equation.
Also, you seem to have a misunderstanding of what row-reduction can do. You ask, "How can I find the conflicting rows using sympy?" and "if there is a conflicting row." Row reduction does not find which row conflicts, it finds if the rows together conflict. The rref process cannot show a conflicting row since it swaps rows if needed to get a non-zero pivot value in proper place, so the rows of the starting and the ending matrix do not correspond. Also, it is not true that one row conflicts with the others, just that all the rows together conflict. In your matrix, you could remove any one of the first 3 rows and the result will be non-conflicting. (Removing the last row still has a conflicting matrix.) So which row can you say conflicts? There usually is not one conflicting row, so rref() or any other method cannot possibly find one.

Find consecutive ones in numpy array

How can I find the amount of consecutive 1s (or any other value) in each row for of the following numpy array? I need a pure numpy solution.
array([[0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0],
[0, 0, 1, 0, 0, 1, 2, 0, 0, 1, 1, 1],
[0, 0, 0, 4, 1, 0, 0, 0, 0, 1, 1, 0]])
There are two parts to my question, first: what is the maximum number of 1s in a row? Should be
array([2,3,2])
in the example case.
And second, what is the index of the start of the first set of multiple consecutive 1s in a row? For the example case this would be
array([3,9,9])
In this example I put 2 consecutive 1s in a row. But it should be possible to change that to 5 consecutive 1s in a row, this is important.
A similar question was answered using np.unique, but it only works for one row and not an array with multiple rows as the result would have different lengths.

Here's a vectorized approach based on differentiation -
import numpy as np
import pandas as pd
# Append zeros columns at either sides of counts
append1 = np.zeros((counts.shape[0],1),dtype=int)
counts_ext = np.column_stack((append1,counts,append1))
# Get start and stop indices with 1s as triggers
diffs = np.diff((counts_ext==1).astype(int),axis=1)
starts = np.argwhere(diffs == 1)
stops = np.argwhere(diffs == -1)
# Get intervals using differences between start and stop indices
start_stop = np.column_stack((starts[:,0], stops[:,1] - starts[:,1]))
# Get indices corresponding to max. interval lens and thus lens themselves
SS_df = pd.DataFrame(start_stop)
out = start_stop[SS_df.groupby([0],sort=False)[1].idxmax(),1]
Sample input, output -
Original sample case :
In [574]: counts
Out[574]:
array([[0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0],
[0, 0, 1, 0, 0, 1, 2, 0, 0, 1, 1, 1],
[0, 0, 0, 4, 1, 0, 0, 0, 0, 1, 1, 0]])
In [575]: out
Out[575]: array([2, 3, 2], dtype=int64)
Modified case :
In [577]: counts
Out[577]:
array([[0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0],
[0, 0, 1, 0, 0, 1, 2, 0, 1, 1, 1, 1],
[0, 0, 0, 4, 1, 1, 1, 1, 1, 0, 1, 0]])
In [578]: out
Out[578]: array([2, 4, 5], dtype=int64)
Here's a Pure NumPy version that is identical to the previous until we have start, stop. Here's the full implementation -
# Append zeros columns at either sides of counts
append1 = np.zeros((counts.shape[0],1),dtype=int)
counts_ext = np.column_stack((append1,counts,append1))
# Get start and stop indices with 1s as triggers
diffs = np.diff((counts_ext==1).astype(int),axis=1)
starts = np.argwhere(diffs == 1)
stops = np.argwhere(diffs == -1)
# Get intervals using differences between start and stop indices
intvs = stops[:,1] - starts[:,1]
# Store intervals as a 2D array for further vectorized ops to make.
c = np.bincount(starts[:,0])
mask = np.arange(c.max()) < c[:,None]
intvs2D = mask.astype(float)
intvs2D[mask] = intvs
# Get max along each row as final output
out = intvs2D.max(1)

I think one problem that is very similar is to check if between the sorted rows the element wise difference is a certain amount. Here if there is a difference of 1 between 5 consecutive would be as follows. It can also be done for difference of 0 for two cards:
cardAmount=cards[0,:].size
has4=cards[:,np.arange(0,cardAmount-4)]-cards[:,np.arange(cardAmount-3,cardAmount)]
isStraight=np.any(has4 == 4, axis=1)

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Optimization Problem in Python (using binary array) possible solutions - python

Why not just calculate the sum of successes for each row and then you can easily pick the top 150 values.

Related

Python - Plotting a series of 1s after n periods of 1s in another series

Scheduling optimization to minimize the number of timeslots (with constraints)

Efficient way to subset and combine arrays of different lengths

sympy - find conflicting row in matrix

Find consecutive ones in numpy array

Categories

Resources