Numpy array index out of range with Genetic Algorithm

I wrote a script that uses a genetic algorithm to generate an image from a source image out of randomized ellipses. After running it, I keep receiving this error (the length of seeds is different every time; this is just an example):
Output:
[[ 42 166 88 21]
[ 25 201 321 227]
[ 21 78 153 53]
[ 5 74 231 20]
[ 3 96 394 15]
[ 20 239 28 244]
[ 33 6 94 27]
[ 4 253 193 113]
[ 10 139 323 16]
[ 31 9 97 117]
[ 23 273 181 214]
[ 24 286 361 231]
[ 33 2 187 47]
[ 35 98 133 177]
[ 10 307 136 76]
[ 35 132 269 161]
[ 25 147 11 2]
[ 36 141 338 100]
[ 23 163 430 37]
[ 17 285 216 53]
[ 18 2 181 119]
[ 43 199 117 253]] 22
Traceback (most recent call last):
  File "E:/genetic image/genetic_image.py", line 106, in <module>
    generate()
  File "E:/genetic image/genetic_image.py", line 93, in generate
    params, test_image = seed_test(seeds[:random.randint(0, reproduce)])
  File "E:/genetic image/genetic_image.py", line 41, in seed_test
    r = int(seeds[i, 0] + random.random() - 0.5)
IndexError: index (22) out of range (0<=index<22) in dimension 0
Here is the script:
import random
import copy
import numpy
from PIL import Image, ImageDraw
optimal = Image.open("charles-darwin_large.jpg")
optimal = optimal.convert("RGB")
size = width, height = optimal.size
population = 2
generations = 5000
elements = int(1e3)
reproduce = height / 10
max_radius = height / 10
diff_max = height / 10
def random_test():
    test_elements = []
    test_image = Image.new("RGB", (width, height), "white")
    draw = ImageDraw.Draw(test_image)
    for i in range(elements):
        r = int(max_radius * random.random())
        x, y = random.randint(0, width), random.randint(0, height)
        color_value = random.randint(0, 255)
        color = (color_value, color_value, color_value)
        test_elements.append([r, x, y, color_value])
        draw.ellipse((x - r, y - r, x + r, y + r), fill = color)
    return test_elements, test_image
def seed_test(seeds):
    test_elements = []
    test_image = Image.new("RGB", (width, height), "white")
    draw = ImageDraw.Draw(test_image)
    print seeds, len(seeds)
    for i in range(elements):
        r = int(seeds[i, 0] + random.random() - 0.5)
        x, y = seeds[i, 1] + random.randint(-5, 5), seeds[i, 2] + random.randint(-5, 5)
        color_value = seeds[i, 3] + random.randint(-5, 5)
        color = (color_value, color_value, color_value)
        test_elements.append([r, x, y, color_value])
        draw.ellipse((x - r, y - r, x + r, y + r), fill = color)
    return test_elements, test_image
def grayscale(image):
    return image.convert("LA")
def fitness(source, generated):
    fitness = 0
    for i in range(height - 1):
        for j in range(width - 1):
            r1, g1, b1 = source.getpixel((j, i))
            r2, g2, b2 = generated.getpixel((j, i))
            deltaRed = r1 - r2
            deltaGreen = g1 - g2
            deltaBlue = b1 - b2
            pixelFitness = deltaRed ** 2 + deltaGreen ** 2 + deltaBlue ** 2
            fitness += pixelFitness
    return fitness
def generate():
    samples = []
    scores = [0] * reproduce
    for i in range(population):
        params, test_image = random_test()
        fitness_score = fitness(optimal, test_image)
        if fitness_score > scores[-1]:
            scores[-1] = fitness_score
            scores = sorted(scores)
            samples.append(params)
    for generation in range(generations):
        seeds = numpy.array(copy.deepcopy(samples))[0]
        samples = []
        scores = [0] * reproduce
        for i in range(population):
            params, test_image = seed_test(seeds[:random.randint(0, reproduce)])
            fitness_score = fitness(optimal, test_image)
            if fitness_score > scores[-1]:
                scores[-1] = fitness_score
                scores = sorted(scores)
                samples.append(params)
    for each in samples:
        print each
if __name__ == "__main__":
    generate()
The source image can be found here.
What does the error mean?

You have 1000 elements (1e3) but only 22 seeds (indices 0 to 21), so when the following loop tries to read seeds[22, 0], the index is out of range:
for i in range(elements):
    r = int(seeds[i, 0] ...
I suspect that what you need to do is:
for i in range(len(seeds)):
    ...
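A minimal sketch of that fix, assuming seeds is a sequence of [r, x, y, color] rows. The helper name perturb_seeds and the plain-list input are illustrative assumptions, not part of the original script; the point is only that the loop runs over the seeds actually passed in, so a short or empty slice cannot overrun.

```python
import random

def perturb_seeds(seeds, rng=random):
    """Return one jittered copy of every seed row [r, x, y, color].

    Hypothetical helper: it iterates over the rows that were passed in
    instead of a global element count, so it cannot index past the end.
    """
    perturbed = []
    for r, x, y, color in seeds:
        perturbed.append([
            int(r + rng.random() - 0.5),   # radius nudged by at most 0.5
            x + rng.randint(-5, 5),        # position jitter
            y + rng.randint(-5, 5),
            color + rng.randint(-5, 5),    # gray-level jitter
        ])
    return perturbed

seeds = [[42, 166, 88, 21], [25, 201, 321, 227], [21, 78, 153, 53]]
children = perturb_seeds(seeds[:2])
print(len(children))  # one child per seed that was passed in
```

Iterating over the rows directly also works when seeds is a two-dimensional NumPy array, since each row unpacks into four values.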

In your code you are setting the global elements to 1000; why not loop over len(seeds) instead? At present, if there are fewer than 1000 seed values, the algorithm is guaranteed to fail in the way you describe.
A problem with your current solution attempt is that much of the "coupling" between the various functions takes place through global variables. In Python we like to say "explicit is better than implicit", and best software engineering practice would be to pass the data explicitly to the functions that use it.

Related

Cumulative result with specific number in pandas

This is my DataFrame:
index, value
10, 109
11, 110
12, 111
13, 110
14, 108
15, 106
16, 100
I want to build another column by adding a cumulative multiple of 0,05 to the first value:
index, value, result
10, 109, 109
11, 110, 109 + (0,05 * 1) = 109,05
12, 111, 109 + (0,05 * 2) = 109,1
13, 110, 109 + (0,05 * 3) = 109,15
14, 108, 109 + (0,05 * 4) = 109,2
15, 106, 109 + (0,05 * 5) = 109,25
16, 100, 109 + (0,05 * 6) = 109,3
I tried to experiment with shift and cumsum, but nothing works. Can you give me some advice on how to do it?
Now I do something like:
counter = 1
result = {}
speed = 0.05
for item in range(index + 1, last_row_index + 1):
    result[item] = result[first_index] + speed * counter
    counter += 1
P.S. During your answers I've edited column result. Please don't blame me. I am really silly person, but I try to grow.
Thank you all for your answers!

Use numpy:
df['result'] = df['value'].iloc[0]*1.05**np.arange(len(df))
Output:
index value result
0 10 109 109.000000
1 11 110 114.450000
2 12 111 120.172500
3 13 110 126.181125
4 14 108 132.490181
5 15 106 139.114690
6 16 100 146.070425
After you edited the question:
df['result'] = df['value'].iloc[0]+0.05*np.arange(len(df))
output:
index value result
0 10 109 109.00
1 11 110 109.05
2 12 111 109.10
3 13 110 109.15
4 14 108 109.20
5 15 106 109.25
6 16 100 109.30

if indices are consecutive
df['result'] = (df['index'] - df['index'][0]) * 0.05 + df['value'][0]
or not:
df['result'] = df.value.reset_index().index * 0.05 + df.value[0]

df['result'] = np.arange(len(df)) * 0.05
df['result'] = df['value'].add(df['result'])
print(df)
Output:
value result
0 109 109.00
1 110 110.05
2 111 111.10
3 110 110.15
4 108 108.20
5 106 106.25
6 100 100.30

Python: how to compute the distance between cells?

Let's suppose I want to compute the distance between cells in a 5x5 square grid. The distance between two adjacent cells is 100 m.
Each cell of the grid is numbered from 0 to 24:
0 1 2 3 4
5 6 7 8 9
10 11 12 13 14
15 16 17 18 19
20 21 22 23 24
For instance:
distance between cell 0 and 3 is 300
distance between cell 2 and 7 is 100
distance between cell 11 and 19 is 400
I have to compute the distance as the sum of the differences between the x and y locations of the cells.
gs = 5 ## Cells per side
S = gs*gs ## Grid Size
r0 = 100 ## distance between two cells
for i in range(0, S):
    for j in range(0, S):
        if i == j: continue
        x = int(floor(i/gs))
        y = int(floor(j/gs))
        dist = x*r0 + abs(j-i)*r0
but it is not the right solution

# n1, n2 = cell numbers
cellsize = 100.0
x1,x2 = n1%gs, n2%gs
y1,y2 = n1//gs, n2//gs
dist = sqrt( float(x1-x2)**2 + float(y1-y2)**2) # pythagoras theorem
dist *= cellsize

you should consider co-ordinate and not the cell number
gs = 5 ## Cells per side
S = gs*gs ## Grid Size
r0 = 100 ## distance between two cells
for i in range(0, S):
    for j in range(0, S):
        if i == j: continue
        xi = int(i/gs)
        yi = i % gs
        xj = int(j/gs)
        yj = j % gs
        dist = r0 * (abs(xi-xj) + abs(yi-yj))

this is a way to do that:
r = 100
grid = ((0, 1, 2, 3, 4),
        (5, 6, 7, 8, 9),
        (10, 11, 12, 13, 14),
        (15, 16, 17, 18, 19),
        (20, 21, 22, 23, 24))
def coord(n):
    for x, line in enumerate(grid):
        if n not in line:
            continue
        y = line.index(n)
        return x, y
def dist(n, m):
    xn, yn = coord(n)
    xm, ym = coord(m)
    return r * (abs(xn - xm) + abs(yn - ym))
print(dist(0, 3)) # 300
print(dist(2, 7)) # 100
print(dist(11, 19)) # 400
the idea is to get the coordinates of your numbers first and then calculating the 'distance'.

This should work for you
n = 5 # row length in array
def distance(a, b):
    distance = (abs(a // n - b // n) + abs(a % n - b % n)) * 100
    return "distance between cell %s and %s is %s" % (a, b, distance)
print(distance(0, 3))
print(distance(2, 7))
print(distance(11, 19))
Output:
distance between cell 0 and 3 is 300
distance between cell 2 and 7 is 100
distance between cell 11 and 19 is 400
Here a and b are your cell numbers, and n is the length of a row in the grid, which is 5 in your example.
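The same idea can be sketched with divmod, which returns the row (quotient) and column (remainder) of a cell number in one call. The function name cell_distance and its keyword parameters are illustrative assumptions; the distance is the Manhattan distance the question asks for.

```python
def cell_distance(a, b, cells_per_side=5, spacing=100):
    """Manhattan distance between two cells of a row-major square grid."""
    row_a, col_a = divmod(a, cells_per_side)  # row = a // n, column = a % n
    row_b, col_b = divmod(b, cells_per_side)
    return spacing * (abs(row_a - row_b) + abs(col_a - col_b))

print(cell_distance(0, 3))    # 300
print(cell_distance(2, 7))    # 100
print(cell_distance(11, 19))  # 400
```

Returning a number rather than a formatted string keeps the function usable inside the double loop over all cell pairs from the question.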

We just need the row and column number of each cell. The row difference plus the column difference, multiplied by 100, gives the answer (grid is the tuple of rows defined in the answer above):
def get_row_col(num):
    for i, g in enumerate(grid):
        if num in g:
            col = g.index(num)
            row = i
            return row, col
num1 = get_row_col(11)
num2 = get_row_col(19)
print((abs(num1[0] - num2[0]) * 100) + (abs(num1[1] - num2[1]) * 100))
One can enhance this code to check whether the number is present in the grid.

python pil library call and method that returns repeated data

I am working on some python code that opens a jpg and goes to a certain part of the image and extracts that part. The code is meant to take a rectangle of a certain size and compress it down to 28 by 28. With my code now I always get the output below. The output shows a row of data repeated 28 times. I expect the output to vary but it doesn't. I was hoping this was something that someone could spot easily. Any help would be appreciated. Thanks.
def start_here(self):
    ... # x,y,w,h are all valid
    filename = "some valid filename"
    img = self.look_at_img(filename,x,y,w,h)
    self.print_block(img)
def look_at_img(self, filename, x = 0, y = 0, width = 28, height = 28):
    img = Image.open(open(filename))
    size = 28, 28
    img2 = [[0] * 28] * 28
    oneimg = []
    mnist_dim = 28
    multx = width / float(mnist_dim)
    multy = height / float(mnist_dim)
    xy_list = []
    dimx, dimy = img.size
    #img = np.asarray(img, dtype='float64')
    ''' Put in shrunk form. '''
    if not len (img.shape) < 3 :
        if not (x + width > dimx and y + height > dimy) :
            for aa in range(28) :
                for bb in range(28) :
                    astart = x + aa * multx
                    bstart = y + bb * multy
                    #print astart, bstart
                    if True :
                        item = [ aa, bb, list(img.getpixel((int(astart) ,int(bstart))))]
                        xy_list.append(item)
    ''' Put list in 28 x 28 array. '''
    if len(xy_list) == 0:
        xy_list = [[0, 0,[0,0,0]]]
    for i in range(len(xy_list)):
        q = xy_list[i]
        if i < 10: print(q)
        if (q[0] < 28) and (q[1] < 28) and (q[0] >= 0) and (q[1] >= 0) :
            img2[int(q[0])] [ int(q[1])] = q[2][0]
    ''' Then add entire array to oneimg variable and flatten.'''
    for yz in range(28):
        for xz in range(28):
            oneimg.append(img2[yz][xz])
    return oneimg
def print_block(self, img):
    #print (img)
    for x in range(28):
        for y in range(28):
            out = str(img[x *28 + y]) +" "
            sys.stdout.write(out)
        print "|"
    print "---------------"
some of the output is included below:
[0, 0, [90, 75, 70]]
[0, 1, [85, 77, 66]]
[0, 2, [87, 73, 70]]
[0, 3, [88, 74, 73]]
[0, 4, [86, 73, 64]]
[0, 5, [91, 77, 68]]
[0, 6, [89, 74, 69]]
[0, 7, [86, 73, 65]]
[0, 8, [87, 72, 65]]
[0, 9, [86, 72, 63]]
45 35 48 61 62 61 95 91 94 88 92 93 87 98 178 194 116 98 90 91 92 85 84 88 88 90 91 92 |
45 35 48 61 62 61 95 91 94 88 92 93 87 98 178 194 116 98 90 91 92 85 84 88 88 90 91 92 |
45 35 48 61 62 61 95 91 94 88 92 93 87 98 178 194 116 98 90 91 92 85 84 88 88 90 91 92 |
... more repeated values ...
EDIT: these are some of my imports
import numpy as np
from PIL import Image
import os
import sys
EDIT: I changed the code and updated the output some. I cannot figure why the list from the print (q) line doesn't match the numbers in the table.

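Something like this works for me. A NumPy array needs to be used for img2: the expression [[0] * 28] * 28 creates 28 references to the same inner list, so writing to one row writes to every row, which is why the printed rows are all identical.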
def look_at_img(self, filename, x = 0, y = 0, width = 28, height = 28):
    img = Image.open(filename)
    img2 = [[0] * 28] * 28
    img2 = np.asarray(img2, dtype="float32") ## 'img2' MUST BE A NUMPY ARRAY!!
    oneimg = []
    mnist_dim = 28
    multx = width / float(mnist_dim)
    multy = height / float(mnist_dim)
    xy_list = []
    dimx, dimy = img.size
    counter = 0
    ''' Put in shrunk form. '''
    if not len (img.getbands()) < 3 :
        if not (x + width > dimx and y + height > dimy) :
            for aa in range(28) :
                for bb in range(28) :
                    astart = x + aa * multx
                    bstart = y + bb * multy
                    if astart >= 0 and astart < dimx and bstart >= 0 and bstart < dimy :
                        item = [ aa, bb, list(img.getpixel((int(astart) ,int(bstart))))]
                        xy_list.append(item)
                        counter = counter + 1
    ''' Put list in 28 x 28 array. '''
    if len(xy_list) == 0:
        xy_list = [[0, 0,[0,0,0]]]
    ''' just one color '''
    high = img.getextrema()[0][1] /2
    for i in range(len(xy_list)):
        q = xy_list[i]
        color = q[2][0]
        if color > high : img2[int(q[0]), int(q[1])] = color
    ''' Then add entire array to oneimg variable and flatten.'''
    for yz in range(28):
        for xz in range(28):
            oneimg.append(img2[yz][xz])
    return oneimg
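The repeated rows in the question can be reproduced without PIL at all. This small sketch (the variable names shared and independent are just for illustration) shows how list repetition aliases one row, and how a list comprehension gives independent rows if you would rather not use NumPy:

```python
# Repeating the outer list copies the *reference* to one inner list.
shared = [[0] * 3] * 3
shared[0][1] = 7
print(shared)       # every row shows the 7

# A comprehension builds a new inner list on each iteration.
independent = [[0] * 3 for _ in range(3)]
independent[0][1] = 7
print(independent)  # only the first row shows the 7

# All rows of `shared` are the very same object.
print(shared[0] is shared[1], independent[0] is independent[1])
```

Converting the aliased list with np.asarray, as in the answer above, also works because NumPy copies the values into a fresh two-dimensional buffer.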

Python: My while loop keeps appending the first color in the list

My while loop does not execute through correctly. It will go through and increment i how it is supposed to, but it does not increment i outside of the loop. This means it keeps appending the same rgb pixel color pairs ~4000 times. Any thoughts?
Input file example (I skip the first three rows because they hold the file type, the photo dimensions, and the maximum color value. The rest is r,g,b pixel data: every 3 rows are one pixel, in the order r, g, b):
P3
200 200
255
192
48
64
216
52
180
252
8
176
212
96
4
152
108
108
20
248
64
80
140
132
My Code:
import math
with open('Ocean.ppm','r') as f:
    output = f.read().split("\n")
i = 0
r_point = 3 + i
g_point = 4 + i
b_point = 5 + i
resolution = []
resolution.append(output[1].split(" "))
file_size = resolution[0]
file_size = int(file_size[0]) * int(file_size[1])
file_size = int(file_size*3)
print(file_size)
pixel_list = []
pixel_list.append(str(output[0]))
pixel_list.append(str(output[1]))
pixel_list.append(str(output[2]))
while file_size >= i:
    red = math.sqrt((int(output[r_point])-255)**2 + (int(output[g_point]) - 0)**2 + (int(output[b_point])-0)**2)
    green = math.sqrt((int(output[r_point])-0)**2 + (int(output[g_point]) - 255)**2 + (int(output[b_point])-0)**2)
    blue = math.sqrt((int(output[r_point])-0)**2 + (int(output[g_point]) - 0)**2 + (int(output[b_point])-255)**2)
    white = math.sqrt((int(output[r_point])-0)**2 + (int(output[g_point]) - 0)**2 + (int(output[b_point])-0)**2)
    black = math.sqrt((int(output[r_point])-255)**2 + (int(output[g_point]) - 255)**2 + (int(output[b_point])-255)**2)
    L = [red, green, blue, white, black]
    idx = min(range(len(L)), key=L.__getitem__)
    if idx == 0:
        # red
        pixel_list.append('255')
        pixel_list.append('0')
        pixel_list.append('0')
        i += 3
    elif idx == 1:
        # green
        pixel_list.append('0')
        pixel_list.append('255')
        pixel_list.append('0')
        i += 3
    elif idx == 2:
        # blue
        pixel_list.append('0')
        pixel_list.append('0')
        pixel_list.append('255')
        i += 3
    elif idx == 3:
        # white
        pixel_list.append('0')
        pixel_list.append('0')
        pixel_list.append('0')
        i += 3
    elif idx == 4:
        # black
        pixel_list.append('255')
        pixel_list.append('255')
        pixel_list.append('255')
        i += 3
f = open('myfile.ppm','w')
for line in pixel_list:
    f.write(line + "\n")

I think you just need to move the following to inside the loop:
r_point = 3 + i
g_point = 4 + i
b_point = 5 + i
With that it will update the index to your lists.
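A minimal sketch of the corrected loop, assuming the pixel data has already been parsed into a flat list of integers. The names PALETTE and quantize are hypothetical, and math.dist needs Python 3.8 or later. The key change is that the three channel values are read from position i on every pass, so each iteration looks at a new pixel.

```python
import math

# Target colors: red, green, blue, black, white.
PALETTE = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (0, 0, 0), (255, 255, 255)]

def quantize(values):
    """Map each (r, g, b) triple in a flat list to its nearest palette color."""
    out = []
    i = 0
    while i + 2 < len(values):
        # Recompute the channel positions from i inside the loop.
        pixel = (values[i], values[i + 1], values[i + 2])
        nearest = min(PALETTE, key=lambda color: math.dist(color, pixel))
        out.extend(nearest)
        i += 3
    return out

print(quantize([250, 5, 5, 5, 250, 5, 10, 10, 240]))
```

Stopping when fewer than three values remain also avoids reading past the end of the list, which the condition file_size >= i does not guard against.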

Hotelling's T^2 scores in python

I applied PCA to a data set using matplotlib in Python. However, matplotlib does not provide t-squared scores like Matlab does. Is there a way to compute Hotelling's T^2 scores like Matlab?
Thanks.

matplotlib's PCA class doesn't include the Hotelling T2 calculation, but it can be done with just a couple lines of code. The following code includes a function to compute the T2 values for each point. The __main__ script applies PCA to the same example as used in Matlab's pca documentation, so you can verify that the function generates the same values as Matlab.
from __future__ import print_function, division
import numpy as np
from matplotlib.mlab import PCA
def hotelling_tsquared(pc):
    """`pc` should be the object returned by matplotlib.mlab.PCA()."""
    x = pc.a.T
    cov = pc.Wt.T.dot(np.diag(pc.s)).dot(pc.Wt) / (x.shape[1] - 1)
    w = np.linalg.solve(cov, x)
    t2 = (x * w).sum(axis=0)
    return t2
if __name__ == "__main__":
    hald_text = """Y X1 X2 X3 X4
    78.5 7 26 6 60
    74.3 1 29 15 52
    104.3 11 56 8 20
    87.6 11 31 8 47
    95.9 7 52 6 33
    109.2 11 55 9 22
    102.7 3 71 17 6
    72.5 1 31 22 44
    93.1 2 54 18 22
    115.9 21 47 4 26
    83.8 1 40 23 34
    113.3 11 66 9 12
    109.4 10 68 8 12
    """
    hald = np.loadtxt(hald_text.splitlines(), skiprows=1)
    ingredients = hald[:, 1:]
    pc = PCA(ingredients, standardize=False)
    coeff = pc.Wt
    np.set_printoptions(precision=4)
    # For coeff and latent, compare to
    # http://www.mathworks.com/help/stats/pca.html#btjpztu-1
    print("coeff:")
    print(coeff)
    print()
    latent = pc.s / (ingredients.shape[0] - 1)
    print("latent:" + (" %9.4f"*len(latent)) % tuple(latent))
    print()
    # For tsquared, compare to
    # http://www.mathworks.com/help/stats/pca.html#bti6r0c-1
    tsquared = hotelling_tsquared(pc)
    print("tsquared:")
    print(tsquared)
Output:
coeff:
[[ 0.0678 0.6785 -0.029 -0.7309]
[ 0.646 0.02 -0.7553 0.1085]
[-0.5673 0.544 -0.4036 0.4684]
[ 0.5062 0.4933 0.5156 0.4844]]
latent: 517.7969 67.4964 12.4054 0.2372
tsquared:
[ 5.6803 3.0758 6.0002 2.6198 3.3681 0.5668 3.4818 3.9794 2.6086
7.4818 4.183 2.2327 2.7216]
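matplotlib.mlab.PCA is no longer available in recent Matplotlib releases, so here is a NumPy-only sketch of the same calculation. It assumes all principal components are kept, in which case T^2 for each observation equals its squared Mahalanobis distance from the mean under the sample covariance. The function signature (taking the raw data instead of a PCA object) is my own variation, not the answer's original API.

```python
import numpy as np

def hotelling_tsquared(data):
    """T^2 for each row of `data` (observations in rows, variables in columns)."""
    centered = data - data.mean(axis=0)
    cov = np.cov(centered, rowvar=False)   # sample covariance, divisor n - 1
    w = np.linalg.solve(cov, centered.T)   # cov^-1 applied to every observation
    return (centered.T * w).sum(axis=0)

# The four ingredient columns (X1..X4) of the Hald data used above.
ingredients = np.array([
    [7, 26, 6, 60], [1, 29, 15, 52], [11, 56, 8, 20], [11, 31, 8, 47],
    [7, 52, 6, 33], [11, 55, 9, 22], [3, 71, 17, 6], [1, 31, 22, 44],
    [2, 54, 18, 22], [21, 47, 4, 26], [1, 40, 23, 34], [11, 66, 9, 12],
    [10, 68, 8, 12],
], dtype=float)

tsquared = hotelling_tsquared(ingredients)
print(np.round(tsquared, 4))
```

A quick sanity check is that the values sum to (n - 1) * p, here 12 * 4 = 48, which also holds for the tsquared output listed above.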

Even though this is an old question, I am posting the code as it may help someone.
Here is the code, as a bonus this does multiple hotelling tests at once
import numpy as np
from scipy.stats import f as f_distrib
def hotelling_t2(X, Y):
    # X and Y are 3D arrays
    # dim 0: number of features
    # dim 1: number of subjects
    # dim 2: number of mesh nodes or voxels (number of tests)
    nx = X.shape[1]
    ny = Y.shape[1]
    p = X.shape[0]
    Xbar = X.mean(1)
    Ybar = Y.mean(1)
    Xbar = Xbar.reshape(Xbar.shape[0], 1, Xbar.shape[1])
    Ybar = Ybar.reshape(Ybar.shape[0], 1, Ybar.shape[1])
    X_Xbar = X - Xbar
    Y_Ybar = Y - Ybar
    Wx = np.einsum('ijk,ljk->ilk', X_Xbar, X_Xbar)
    Wy = np.einsum('ijk,ljk->ilk', Y_Ybar, Y_Ybar)
    W = (Wx + Wy) / float(nx + ny - 2)
    Xbar_minus_Ybar = Xbar - Ybar
    x = np.linalg.solve(W.transpose(2, 0, 1),
                        Xbar_minus_Ybar.transpose(2, 0, 1))
    x = x.transpose(1, 2, 0)
    t2 = np.sum(Xbar_minus_Ybar * x, 0)
    t2 = t2 * float(nx * ny) / float(nx + ny)
    stat = (t2 * float(nx + ny - 1 - p) / (float(nx + ny - 2) * p))
    pval = 1 - np.squeeze(f_distrib.cdf(stat, p, nx + ny - 1 - p))
    return pval, t2
