How can I formulate my optimization problem with Gekko? - python

I want to formulate the following objective function (a minimization problem): sum[sum[Ri*{Pi² + (Qi - Qcj*Xij)²} for j in range(Nc)] for i in range(N)], where P and Q are constants, Qc is a list of proposed solutions, and X is the decision variable (binary). I'm trying to find the vector X that minimizes the objective function.
Here is my attempt:
from gekko import GEKKO
import numpy as np
P=[13.10511598922975,11.2611396806742,10.103920431906348,8.199519500182628,6.411296067052755,4.753519719147589,3.8977762462825973,2.6593092284662734,1.6399999999854893]
Q=[5.06643685386732,4.4344047044589585,3.8082608015186405,3.2626022579039584,1.2568869621197523,0.6152693459109657,0.46237064874523776,0.35226399840832523,0.20000000001140983]
R=[0.1233, 0.014, 0.7463, 0.6984, 1.9831, 0.9053, 2.0552, 4.7953, 5.3434]
Qc=[150, 300, 450, 600,750, 900,1050, 1200,1350,1500,1650,1800,1950,2100,2250,2400,2550,2700,2850,3000,3150,3300,3450,3600,3750,3900,4050]
N=len(Q)
Nc=len(Qc)
m = GEKKO(remote=False)
X = m.Array(m.Var,(N,Nc),integer=True,lb=0,ub=1,value=0)
# convert P and Q to kW
for i in range(N):
    Q[i] = Q[i]*1000
    P[i] = P[i]*1000
# constraint: at most one selection per row
for i in range(N):
    m.Equation(m.sum([X[i][j] for j in range(Nc)]) <= 1)
b = m.sum([m.sum([R[i]*((P[i]**2)+((Q[i])-Qc[j]*X[i][j])**2) for j in range(Nc)]) for i in range(N)])
m.Minimize(b)
I tried 3 methods:
Method 1:
m.options.SOLVER = 1
m.solve()
Method 2:
bv = np.array([[0, 0, 0, 0, 0, 0, 0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
[0, 0, 0, 0, 0, 0, 0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1],
[0, 0, 0, 0, 0, 0, 0, 0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0],
[0, 0, 0, 0, 0, 0, 0, 0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
[0, 0, 0, 0, 0, 0, 1, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
[0, 0, 0, 0, 0, 0, 0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
[0, 0, 0, 0, 0, 0, 0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
[0, 0, 0, 0, 1, 0, 0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
[0, 0, 0, 0, 1, 0, 0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]])
for i in range(N):
    for j in range(Nc):
        X[i,j].value = bv[i,j]
m.options.SOLVER = 1
m.solve()
Method 3:
m.options.SOLVER = 3
m.solve(debug=0, disp=True)
m.options.SOLVER = 1
m.solve(debug=0, disp=True)
The 3 methods don't give me the optimal solution.

Use the solver options to get a better solution by not terminating when the gap_tol is met at 1e-3 (the default). The gap_tol is an early-termination criterion that helps obtain MINLP solutions faster, at the cost of a possibly less optimal solution. Setting gap_tol to zero and minlp_max_iter_with_int_sol to a large number will iterate through all remaining potential solutions. Because the computational time increases, I recommend a small but nonzero gap_tol such as 1e-5 and minlp_max_iter_with_int_sol 2000.
m.solver_options = ['minlp_gap_tol 1e-5',
                    'minlp_maximum_iterations 10000',
                    'minlp_max_iter_with_int_sol 2000']
m.options.SOLVER=1
m.solve(disp=True)
This gives a solution with an objective of 9.535e9.
---------------------------------------------------
Solver : APOPT (v1.0)
Solution time : 170.673799999990 sec
Objective : 9535331689.96189
Successful solution
---------------------------------------------------
The objective with the initial guess fixed is 9.541e9.
bv = np.array([[0, 0, 0, 0, 0, 0, 0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
[0, 0, 0, 0, 0, 0, 0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1],
[0, 0, 0, 0, 0, 0, 0, 0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0],
[0, 0, 0, 0, 0, 0, 0, 0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
[0, 0, 0, 0, 0, 0, 1, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
[0, 0, 0, 0, 0, 0, 0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
[0, 0, 0, 0, 0, 0, 0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
[0, 0, 0, 0, 1, 0, 0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
[0, 0, 0, 0, 1, 0, 0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]])
for i in range(N):
    for j in range(Nc):
        X[i,j].value = bv[i,j]
        m.Equation(X[i,j]==bv[i,j])
---------------------------------------------------
Solver : APOPT (v1.0)
Solution time : 2.180000001681037E-002 sec
Objective : 9540896947.56266
Successful solution
---------------------------------------------------
A couple of other suggestions that didn't help the solution accuracy, but may be worth considering in the future:
The speed of solution can be improved with this alternative for expressing the objective function.
b = [sum([R[i]*((P[i]**2)+((Q[i])-Qc[j]*X[i][j])**2)
          for j in range(Nc)]) for i in range(N)]
[m.Minimize(bi) for bi in b]
The objective function is quite high >1e9 at the solution. You could consider increasing the solver tolerance and keeping the problem in MW versus kW by removing these lines.
# convert P and Q to kW
#for i in range(N):
#    Q[i]=Q[i]*1000
#    P[i]=P[i]*1000

Related

An elegant/faster way to find the end points of a line on an image?

I've been working to improve the speed of my code by replacing for loops over arrays with appropriate NumPy functions.
The function aims to get the end points of a line, i.e. the only two points that have exactly one neighbouring pixel with value 255.
Is there a way I could get the two points from np.where with conditions, or will some NumPy function I'm not familiar with do the job?
def get_end_points(image):
    x1 = -1
    y1 = -1
    x2 = -1
    y2 = -1
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            if image[i][j] == 255 and neighbours_sum(i, j, image) == 255:
                if x1 == -1:
                    x1 = j
                    y1 = i
                else:
                    x2 = j
                    y2 = i
    return x1, y1, x2, y2
Here is a solution with convolution:
import numpy as np
import scipy.signal

def find_endpoints(img):
    # Kernel to sum the neighbours
    kernel = [[1, 1, 1],
              [1, 0, 1],
              [1, 1, 1]]
    # 2D convolution (cast image to int32 to avoid overflow)
    img_conv = scipy.signal.convolve2d(img.astype(np.int32), kernel, mode='same')
    # Pick points where the pixel is 255 and the neighbours sum to 255
    endpoints = np.stack(np.where((img == 255) & (img_conv == 255)), axis=1)
    return endpoints

# Test
img = np.zeros((1000, 1000), dtype=np.uint8)
# Draw a line from (200, 130) to (800, 370)
for i in range(200, 801):
    j = round(i * 0.4 + 50)
    img[i, j] = 255
print(find_endpoints(img))
# [[200 130]
#  [800 370]]
EDIT:
You may also consider using Numba for this. The code would be pretty much what you already have, so maybe not particularly "elegant", but much faster. For example, something like this:
import numpy as np
import numba as nb

@nb.njit
def find_endpoints_nb(img):
    endpoints = []
    # Iterate through every row and column
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            # Check current pixel is white
            if img[i, j] != 255:
                continue
            # Sum neighbours
            s = 0
            for ii in range(max(i - 1, 0), min(i + 2, img.shape[0])):
                for jj in range(max(j - 1, 0), min(j + 2, img.shape[1])):
                    s += img[ii, jj]
            # Sum includes the pixel itself for simplicity, so check for two white pixels
            if s == 255 * 2:
                endpoints.append((i, j))
            if len(endpoints) >= 2:
                break
        if len(endpoints) >= 2:
            break
    return np.array(endpoints)

print(find_endpoints_nb(img))
# [[200 130]
#  [800 370]]
This runs considerably faster on my computer:
%timeit find_endpoints(img)
# 34.4 ms ± 64.4 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit find_endpoints_nb(img)
# 552 µs ± 4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Also, it should use less memory. The code above assumes there will be only two endpoints. You may be able to make it even faster if you add parallelization (although you would have to make some changes, because you would not be able to modify the list endpoints from parallel threads).
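One possible shape for such a parallel version (my sketch, not the original answer's code): have each thread write into a preallocated per-pixel mask instead of appending to a shared list, then collect the indices outside the jitted function. Note this variant scans the whole image rather than stopping after two hits:
import numpy as np
import numba as nb

@nb.njit(parallel=True)
def endpoint_mask_nb(img):
    # One flag per pixel avoids mutating a shared list across threads
    mask = np.zeros(img.shape, dtype=np.uint8)
    for i in nb.prange(img.shape[0]):
        for j in range(img.shape[1]):
            if img[i, j] != 255:
                continue
            s = 0
            for ii in range(max(i - 1, 0), min(i + 2, img.shape[0])):
                for jj in range(max(j - 1, 0), min(j + 2, img.shape[1])):
                    s += img[ii, jj]
            if s == 255 * 2:  # pixel itself plus exactly one white neighbour
                mask[i, j] = 1
    return mask

endpoints = np.argwhere(endpoint_mask_nb(img) == 1)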
Edit: I didn't notice you have a grayscale image, but as far as the idea is concerned, nothing changes.
I cannot give you an exact solution, but I can give you a faster way to find what you want.
1) a) Find the indices (pixels) that are white [255,255,255]:
indices = np.where(np.all(image==255, axis=2))
1) b) Do your loops around these points.
This is faster because you are not doing useless loops.
2) This solution should be very, very fast, but it will be hard to program.
a) Find the indices like in 1):
indices = np.where(np.all(image==255, axis=2))
b) Shift the index array by +1 along the x axis and add the two images:
indices = np.where(np.all(image==255, axis=2))
indices_up = # somehow add +1 to all indices in the x dimension (simply move it up)
add_up = image[indices] + image[indices_up]
# if a row of add_up (RGB channels) is [510,510,510], i.e. 255+255, the pixel has a neighbour at x+1
# Note that you can't do this with an image of dtype uint8, because 255 is the max and the addition will overflow
You have to do this for all neighbours though: x+1, x-1, y+1, y-1, x+1,y+1, ...
It will be extra fast though.
EDIT2: I was able to make a script that should do it, but you should test it first:
import numpy as np

image = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0],
                  [0, 0, 255, 0, 0, 0, 0, 0, 0],
                  [0, 0, 255, 0, 255, 0, 0, 0, 0],
                  [0, 0, 0, 255, 0, 255, 0, 0, 0],
                  [0, 0, 0, 0, 0, 255, 0, 0, 0],
                  [0, 0, 0, 0, 0, 0, 0, 0, 0],
                  [0, 0, 0, 0, 0, 0, 0, 0, 0]])
image_f = image[1:-1, 1:-1]   # cut the image
i = np.where(image_f == 255)  # find 255 in the cut image
x = i[0] + 1  # calibrate x indexes for the original image
y = i[1] + 1  # calibrate y indexes for the original image
# this is done so you don't search outside the image in get_indexes()

def get_indexes(xx, yy, image):
    for i in np.where(image[xx, yy] == 255):
        for a in i:
            yield xx[a], yy[a]

# Search for horizontal and vertical duplicates (neighbours)
for neighbours_index in get_indexes(x + 1, y, image):
    print(neighbours_index)
for neighbours_index in get_indexes(x - 1, y, image):
    print(neighbours_index)
for neighbours_index in get_indexes(x, y + 1, image):
    print(neighbours_index)
for neighbours_index in get_indexes(x, y - 1, image):
    print(neighbours_index)
I think I can at least provide an elegant solution using convolutions.
We can count the neighbouring pixels by convolving the original image with a 3x3 ring kernel. A pixel is then a line end if it is itself white and the convolution shows exactly one white neighbour.
>>> import numpy as np
>>> from scipy.signal import convolve2d
>>> a = np.array([[0, 0, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0, 0], [0, 0, 1, 0, 0, 0, 0], [0, 0, 0, 1, 0, 0, 0], [0, 0, 0, 0, 1, 0, 0], [0, 0, 0, 0, 1, 1, 0]])
>>> a
array([[0, 0, 0, 0, 0, 0, 0],
       [0, 1, 0, 0, 0, 0, 0],
       [0, 1, 0, 0, 0, 0, 0],
       [0, 0, 1, 0, 0, 0, 0],
       [0, 0, 0, 1, 0, 0, 0],
       [0, 0, 0, 0, 1, 0, 0],
       [0, 0, 0, 0, 1, 1, 0]])
>>> c = np.full((3, 3), 1)
>>> c[1, 1] = 0
>>> c
array([[1, 1, 1],
       [1, 0, 1],
       [1, 1, 1]])
>>> np.logical_and(convolve2d(a, c, mode='same') == 1, a == 1).astype(int)
array([[0, 0, 0, 0, 0, 0, 0],
       [0, 1, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0]])
Feel free to see what the individual components produce, but for the sake of brevity I didn't include them here. And as you might have noticed, it does correctly reject cases where the line ends with two neighbouring pixels.
You can of course convert this to an arbitrary number of line-ending indices with np.where:
np.array(np.where(result))

Scheduling optimization to minimize the number of timeslots (with constraints)

I'm working on a scheduling optimization problem where we have a set of tasks that need to be completed within a certain timeframe.
Each task has a schedule that specifies a list of time slots when it can be performed. The schedule for each task can be different depending on the weekday.
Here is a small sample (reduced number of tasks and time slots):
task_availability_map = {
"T1" : [0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
"T2" : [0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
"T3" : [0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
"T4" : [0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
"T5" : [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
"T6" : [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0],
"T7" : [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0],
"T8" : [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0],
"T9" : [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
"T10": [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
}
The constraint is that only up to N tasks can be performed in parallel within the same time slot (if they overlap). The group of parallel tasks always takes the same amount of time regardless of whether 1 or N are being done.
The objective is to minimize the number of time slots.
I've tried a brute force approach that generates all time slot index permutations. For each index in a given permutation, get all tasks that can be scheduled and add them to a list of tasks to be excluded in the next iteration. Once all iterations for a given permutation are completed, add the number of time slots and the combination of indices to a list.
import itertools
import numpy as np

def get_tasks_by_timeslot(timeslot, tasks_to_exclude):
    for task in task_availability_map.keys():
        if task in tasks_to_exclude:
            continue
        if task_availability_map[task][timeslot] == 1:
            yield task

total_timeslot_count = len(list(task_availability_map.values())[0]) # 17
timeslot_indices = range(total_timeslot_count)
timeslot_index_permutations = list(itertools.permutations(timeslot_indices))
possible_schedules = []
for timeslot_variation in timeslot_index_permutations:
    tasks_already_scheduled = []
    current_schedule = []
    for t in timeslot_variation:
        tasks = list(get_tasks_by_timeslot(t, tasks_already_scheduled))
        if len(tasks) == 0:
            continue
        elif len(tasks) > MAX_PARALLEL_TASKS:
            break
        tasks_already_scheduled += tasks
        current_schedule.append(tasks)
    time_slot_count = np.sum([len(t) for t in current_schedule])
    possible_schedules.append([time_slot_count, timeslot_variation])
...
Sort possible schedules by number of time slots, and that's the solution. However, this algorithm grows in complexity exponentially with the number of time slots. Given there are hundreds of tasks and hundreds of time slots, I need a different approach.
Someone suggested LP MIP (such as Google OR Tools), but I'm not very familiar with it and am having a hard time formulating the constraints in code. Any help with either LP or some other solution that can help me get started in the right direction is much appreciated (doesn't have to be Python, can even be Excel).
My proposal for a MIP model:
Introduce binary variables:
x(i,t) = 1 if task i is assigned to slot t
         0 otherwise
y(t)   = 1 if slot t has at least one task assigned to it
         0 otherwise
Furthermore let:
N       = max number of tasks per slot
ok(i,t) = 1 if we are allowed to assign task i to slot t
          0 otherwise
Then the model can look like:
minimize sum(t, y(t))                         (minimize used slots)
sum(t, ok(i,t)*x(i,t)) = 1   for all i        (each task is assigned to exactly one slot)
sum(i, ok(i,t)*x(i,t)) <= N  for all t        (capacity constraint for each slot)
y(t) >= x(i,t)               for all (i,t) such that ok(i,t)=1
x(i,t), y(t) in {0,1}                         (binary variables)
Using N=3, I get a solution like:
---- 45 VARIABLE x.L assignment
s5 s6 s7 s13
task1 1.000
task2 1.000
task3 1.000
task4 1.000
task5 1.000
task6 1.000
task7 1.000
task8 1.000
task9 1.000
task10 1.000
The model is fairly simple and it should not be very difficult to code and solve it using your favorite MIP solver. The one thing you want to make sure of is that variables x(i,t) exist only when ok(i,t)=1. In other words, make sure that variables do not appear in the model when ok(i,t)=0. It can help to interpret the assignment constraints as:
sum(t | ok(i,t)=1, x(i,t)) = 1   for all i   (each task is assigned to exactly one slot)
sum(i | ok(i,t)=1, x(i,t)) <= N  for all t   (capacity constraint for each slot)
where | means 'such that' or 'where'. If you do this right, your model should have 50 variables x(i,t) instead of 10 x 17 = 170. Furthermore, we can relax y(t) to be continuous between 0 and 1; it will come out 0 or 1 automatically. Depending on the solver, that may affect performance. A sketch in OR-Tools follows below.
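To help with the OR-Tools route mentioned in the question, here is a rough sketch of the model above using the pywraplp MIP wrapper (my illustration, not part of the original answer; it assumes the task_availability_map from the question, N=3, and the CBC backend):
from ortools.linear_solver import pywraplp

N = 3  # max number of tasks per slot
tasks = list(task_availability_map.keys())
n_slots = len(task_availability_map["T1"])

solver = pywraplp.Solver.CreateSolver("CBC")

# Create x(i,t) only where ok(i,t)=1, as recommended above
x = {(i, t): solver.BoolVar("x_%s_%d" % (i, t))
     for i in tasks for t in range(n_slots)
     if task_availability_map[i][t] == 1}
# y(t) relaxed to continuous in [0,1]; it comes out 0/1 automatically
y = [solver.NumVar(0, 1, "y_%d" % t) for t in range(n_slots)]

# Each task is assigned to exactly one allowed slot
for i in tasks:
    solver.Add(solver.Sum([x[i, t] for t in range(n_slots) if (i, t) in x]) == 1)
for t in range(n_slots):
    # Capacity per slot, and y(t) must cover every assignment in slot t
    solver.Add(solver.Sum([x[i, t] for i in tasks if (i, t) in x]) <= N)
    for i in tasks:
        if (i, t) in x:
            solver.Add(y[t] >= x[i, t])

solver.Minimize(solver.Sum(y))
if solver.Solve() == pywraplp.Solver.OPTIMAL:
    print("slots used:", solver.Objective().Value())
    for (i, t), var in sorted(x.items()):
        if var.solution_value() > 0.5:
            print(i, "-> slot", t)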
I have no reason to believe this is easier to model as a constraint programming model or that it is easier to solve that way. My rule of thumb is, if it is easy to model as a MIP stick to a MIP. If we need to go through lots of hoops to make it a proper MIP, and a CP formulation makes life easier, then use CP. In many cases this simple rule works quite well.

Find and flat repeated values in numpy array

I want to find values in an np array that are repeated more than x times and set them to 0.
Let's say this is my array:
[255,0,0,255,255,255,0,0,255,255,255,255,255,0,0]
I want to set to 0 all parts that are repeated more than x times.
Let's say x = 3; the output array will be:
[255,0,0,255,255,255,0,0,0,0,0,0,0,0,0]
If x = 2:
[255,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
Of course, I can loop over the indexes, count them and set them to 0, but there's got to be a faster and more efficient way (the purpose is to remove horizontal grid lines from an image).
Using pandas, label consecutive runs of equal values with (s != s.shift()).cumsum(), then zero out the groups that are too long:
import pandas as pd
s = pd.Series(x)
n = 5
s.groupby((s != s.shift()).cumsum()).apply(lambda z: z if z.size < n else pd.Series([0]*z.size)).values
array([255, 0, 0, 255, 255, 255, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int64)
With n = 2:
array([255, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int64)
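For readers unfamiliar with the groupby trick, here is the same idea unrolled (my addition; n here is the smallest run length that gets zeroed, so n = 4 matches "more than x = 3 times"):
import pandas as pd

x = [255, 0, 0, 255, 255, 255, 0, 0, 255, 255, 255, 255, 255, 0, 0]
s = pd.Series(x)
n = 4
run_id = (s != s.shift()).cumsum()             # label each run of consecutive equal values
run_len = s.groupby(run_id).transform("size")  # broadcast each run's length to its members
s[run_len >= n] = 0                            # zero out every element of a long run
print(s.values)
# [255   0   0 255 255 255   0   0   0   0   0   0   0   0   0]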
You may be able to solve this by viewing your data through a rolling window of length x+1 and hop size 1: if all values in a window are equal, set them all to zero. Rolling windows can easily be made with SciKit image's view_as_windows():
import numpy
import skimage.util
x = 3
data = numpy.asarray([255,0,0,255,255,255,0,0,255,255,255,255,255,0,0])
data_view = skimage.util.view_as_windows(data, window_shape=(x + 1,))
mask = numpy.all(numpy.isclose(data_view, data_view[..., 0, None]), axis=1)
data_view[mask, :] = 0
data
# array([255, 0, 0, 255, 255, 255, 0, 0, 0, 0, 0, 0, 0, 0, 0])

Replace specific values in a matrix using Python

I have a m x n matrix where each row is a sample and each column is a class. Each row contains the soft-max probabilities of each class. I want to replace the maximum value in each row with 1 and others with 0. How can I do it efficiently in Python?
Some made up data:
>>> a = np.random.rand(5, 5)
>>> a
array([[ 0.06922196, 0.66444783, 0.2582146 , 0.03886282, 0.75403153],
       [ 0.74530361, 0.36357237, 0.3689877 , 0.71927017, 0.55944165],
       [ 0.84674582, 0.2834574 , 0.11472191, 0.29572721, 0.03846353],
       [ 0.10322931, 0.90932896, 0.03913152, 0.50660894, 0.45083403],
       [ 0.55196367, 0.92418942, 0.38171512, 0.01016748, 0.04845774]])
In one line:
>>> (a == a.max(axis=1)[:, None]).astype(int)
array([[0, 0, 0, 0, 1],
       [1, 0, 0, 0, 0],
       [1, 0, 0, 0, 0],
       [0, 1, 0, 0, 0],
       [0, 1, 0, 0, 0]])
A more efficient (and verbose) approach:
>>> b = np.zeros_like(a, dtype=int)
>>> b[np.arange(a.shape[0]), np.argmax(a, axis=1)] = 1
>>> b
array([[0, 0, 0, 0, 1],
       [1, 0, 0, 0, 0],
       [1, 0, 0, 0, 0],
       [0, 1, 0, 0, 0],
       [0, 1, 0, 0, 0]])
I think the best answer to your particular question is to use a matrix type object.
A sparse matrix should be the most performant way to store large numbers of these matrices of large sizes in a memory-friendly way, given that most of each matrix is zeroes. This should be superior to using numpy arrays directly, especially for matrices that are very large in both dimensions: if not in speed of computation, then in memory.
import numpy as np
import scipy.sparse

matrix = np.matrix(np.random.randn(10, 5))
maxes = matrix.argmax(axis=1).A1
# was .A[:,0], slightly faster, but .A1 seems more readable
n_rows = len(matrix)  # could do matrix.shape[0], but that's slower
data = np.ones(n_rows)
row = np.arange(n_rows)
sparse_matrix = scipy.sparse.coo_matrix((data, (row, maxes)),
                                        shape=matrix.shape,
                                        dtype=np.int8)
This sparse_matrix object should be very lightweight relative to a regular matrix object, which would needlessly track each and every zero in it. To materialize it as a normal matrix:
sparse_matrix.todense()
returns:
matrix([[0, 0, 0, 0, 1],
        [0, 0, 1, 0, 0],
        [0, 0, 1, 0, 0],
        [0, 0, 0, 0, 1],
        [1, 0, 0, 0, 0],
        [0, 0, 1, 0, 0],
        [0, 0, 0, 1, 0],
        [0, 1, 0, 0, 0],
        [1, 0, 0, 0, 0],
        [0, 0, 0, 1, 0]], dtype=int8)
Which we can compare to matrix:
matrix([[ 1.41049496, 0.24737968, -0.70849012, 0.24794031, 1.9231408 ],
        [-0.08323096, -0.32134873, 2.14154425, -1.30430663, 0.64934781],
        [ 0.56249379, 0.07851507, 0.63024234, -0.38683508, -1.75887624],
        [-0.41063182, 0.15657594, 0.11175805, 0.37646245, 1.58261556],
        [ 1.10421356, -0.26151637, 0.64442885, -1.23544526, -0.91119517],
        [ 0.51384883, 1.5901419 , 1.92496778, -1.23541699, 1.00231508],
        [-2.42759787, -0.23592018, -0.33534536, 0.17577329, -1.14793293],
        [-0.06051458, 1.24004714, 1.23588228, -0.11727146, -0.02627196],
        [ 1.66071534, -0.07734444, 1.40305686, -1.02098911, -1.10752638],
        [ 0.12466003, -1.60874191, 1.81127175, 2.26257234, -1.26008476]])
This approach using basic numpy and list comprehensions works, but is the least performant. I'm leaving this answer here as it may be somewhat instructive. First we create a numpy matrix:
matrix = np.matrix(np.random.randn(2,2))
matrix is, e.g.:
matrix([[-0.84558168, 0.08836042],
        [-0.01963479, 0.35331933]])
Now map 1 to a new matrix if the element is max, else 0:
newmatrix = np.matrix([[1 if i == row.max() else 0 for i in row]
                       for row in np.array(matrix)])
newmatrix is now:
matrix([[0, 1],
        [0, 1]])
# Write each element of X into Y at the given (row, column) offset
Y = np.random.rand(10, 10)
X = np.zeros((5, 5))
y_insert = 2
x_insert = 3
offset = (1, 2)
for index_x, row in enumerate(X):
    for index_y, e in enumerate(row):
        Y[index_x + offset[0]][index_y + offset[1]] = e

Python - creating a list with 2 characteristics bug

The goal is to create a list of 99 elements. All elements must be 1s or 0s. The first element must be a 1. There must be 7 1s in total.
import random
import math
import time
# constants determined through testing
generation_constant = 0.96
def generate_candidate():
    coin_vector = []
    coin_vector.append(1)
    for i in range(0, 99):
        random_value = random.random()
        if (random_value > generation_constant):
            coin_vector.append(1)
        else:
            coin_vector.append(0)
    return coin_vector

def validate_candidate(vector):
    vector_sum = sum(vector)
    sum_test = False
    if (vector_sum == 7):
        sum_test = True
    first_slot = vector[0]
    first_test = False
    if (first_slot == 1):
        first_test = True
    return (sum_test and first_test)

vector1 = generate_candidate()
while (validate_candidate(vector1) == False):
    vector1 = generate_candidate()
print vector1, sum(vector1), validate_candidate(vector1)
Most of the time, the output is correct, saying something like
[1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 1, 0, 0, 0] 7 True
but sometimes, the output is:
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] 2 False
What exactly am I doing wrong?
I'm not certain I understand your requirements, but here's what it sounds like you need:
#!/usr/bin/python3
import random
ones = [ 1 for i in range(6) ]
zeros = [ 0 for i in range(99 - 7) ]  # 92 zeros, so 1 + 6 + 92 = 99 elements
list_ = ones + zeros
random.shuffle(list_)
list_.insert(0, 1)
print(list_)
print(list_.count(1))
print(list_.count(0))
HTH
The algorithm you gave works, though it's slow. Note that the ideal generation_constant can actually be calculated using the binomial distribution. The optimum is ≈0.928571429, which will fit the conditions 1.104% of the time. If you set the first element to 1 manually, then the optimum generation_constant is ≈0.93877551, which will fit the conditions 16.58% of the time.
The above is based on the binomial distribution, which says that the probability of having exactly k "success" events out of N total tries, where each try has probability p, is P(k | N, p) = N! * p^k * (1 - p)^(N - k) / (k! * (N - k)!). Just stick that into Excel, Mathematica, or a graphing calculator and maximize P.
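As a quick numerical check (my addition, not part of the original answer): for fixed k and N, that pmf is maximized at p = k/N, and the two quoted optima correspond to 1 - 7/98 and 1 - 6/98, i.e. N = 98 random draws; scipy can evaluate the resulting probability:
from scipy.stats import binom

N = 98                           # number of random draws implied by the quoted optima
print(1 - 7.0 / N)               # ~0.928571429: all 7 ones must come from the random draws
print(1 - 6.0 / N)               # ~0.938775510: first element set to 1 manually, 6 left
print(binom.pmf(6, N, 6.0 / N))  # ~0.166: chance of exactly 6 ones, the quoted 16.58%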
Alternatively:
To generate a list of 99 numbers where the first and 6 additional items are 1 and the remaining elements are 0, you don't need to call random.random so much. Generating pseudo-random numbers is very expensive.
There are two ways to avoid calling random so much.
The most processor efficient way is to only call random 6 times, for the 6 ones you need to insert:
import random
# create vector of 99 0's
vector = [0 for i in range(99)]
# set first element to 1
vector[0] = 1
# list of locations of all 0's
indexes = list(range(1, 99))
# only need to loop 6 times for the remaining 6 ones
for i in range(6):
    # select one of the 0 locations at random,
    # "pop" it from the list so it can't be selected again,
    # and set its corresponding element in vector to 1
    vector[indexes.pop(random.randint(0, len(indexes) - 1))] = 1
Alternatively, to save on memory, you can just test each new index to make sure it will actually set something:
import random
# create vector of 99 0's
vector = [0 for i in range(99)]
# only need to loop 7 times
for i in range(7):
    index = 0  # first element is set to 1 first
    while vector[index] == 1:  # keep calling random until a 0 is found
        index = random.randint(0, 98)  # random index to check/set
    vector[index] = 1  # set the random (or first) element to 1
The second one will always set the first element to 1 first, because index = random.randint(0, 98) only ever gets called if vector[0] == 1.
With genetic programming you want to control your domain so that invalid configurations are eliminated as much as possible. The fitness is supposed to rate valid configurations, not eliminate invalid ones. Honestly, this problem doesn't really seem to be a good fit for genetic programming. You have outlined the domain, but I don't see a fitness description anywhere.
Anyway, that being said, the way I would populate the domain would be: since the first element is always 1, ignore it; since the remaining 98 elements only have 6 ones, shuffle 6 ones into 92 zeros (see the sketch below). Or even enumerate the possibilities, as your domain isn't very large.
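A minimal sketch of that idea (my addition, not the answerer's code): fix the leading 1, then shuffle in the remaining six 1s by sampling positions without replacement:
import random

def generate_valid():
    vector = [1] + [0] * 98                     # first element fixed to 1
    for idx in random.sample(range(1, 99), 6):  # six more 1s at distinct positions
        vector[idx] = 1
    return vector

v = generate_valid()
assert len(v) == 99 and v[0] == 1 and sum(v) == 7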
I have a feeling it is your use of sum(). I believe this modifies the list in place:
>>> mylist = [1,2,3,4]
>>> sum(mylist)
10
>>> mylist
[]
Here's a (somewhat) pythonic recursive version
def generate_vector():
    generation_constant = .96
    myvector = [1] + [1 if random.random() > generation_constant else 0 for i in range(0, 99)]
    mysum = 0
    for a in myvector:
        mysum = mysum + a
    if mysum == 7 and myvector[0] == 1:
        return myvector
    return generate_vector()
and for good measure
def generate_test():
    for i in range(0, 10000):
        vector = generate_vector()
        sum = 0
        for a in vector:
            sum = sum + a
        if sum != 7 or vector[0] != 1:
            print vector
output:
>>> generate_test()
>>>
