I have been trying to implement the YOLO cost function, shown below. This is the first time I have written my own cost function in TensorFlow and I am unsure whether I am approaching it correctly. For one, my cost function uses a number of intermediate steps: I am not sure whether this complicates the computational graph in some meaningfully destructive way. I am also using an absolute-value step and am unsure whether it could have a negative effect on backprop. Any assistance on whether I am approaching this problem correctly would be helpful.
I can answer any questions about my implementation.
Note: Z13 is the prediction and y holds the true values. There are 49 cells in my model (7x7), each represented by a 7x1 vector: [prob of anything in cell, x midpoint, y midpoint, box width, box height, prob dog, prob cat]. Referenced paper: https://arxiv.org/pdf/1506.02640.pdf, which explains the cost function in depth.
I believe that there is either an issue with my forward prop or my cost function as my model is not learning meaningful representations.
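For reference, this is my transcription of the loss from the paper (with one predicted box per cell, matching my encoding; $\lambda_{\text{coord}}$ = coord = 5 and $\lambda_{\text{noobj}}$ = noobj = 0.5). Parts 1-5 in the code below correspond to these five terms in order:
$$
\begin{aligned}
\text{cost} ={}& \lambda_{\text{coord}} \sum_{i=0}^{S^2} \mathbb{1}_i^{\text{obj}} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
&+ \lambda_{\text{coord}} \sum_{i=0}^{S^2} \mathbb{1}_i^{\text{obj}} \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
&+ \sum_{i=0}^{S^2} \mathbb{1}_i^{\text{obj}} \left( C_i - \hat{C}_i \right)^2
 + \lambda_{\text{noobj}} \sum_{i=0}^{S^2} \mathbb{1}_i^{\text{noobj}} \left( C_i - \hat{C}_i \right)^2 \\
&+ \sum_{i=0}^{S^2} \mathbb{1}_i^{\text{obj}} \sum_{c \in \text{classes}} \left( p_i(c) - \hat{p}_i(c) \right)^2
\end{aligned}
$$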
import tensorflow as tf

def cost_function(Z13, y, coord=5, noobj=0.5):
    """
    Z13: shape (None,7,7,7)
    y: shape (None,7,7,7)
    """
    # Masks are used because the coordinate and class terms only apply to cells where an actual bounding box is
    c_mask_true = y[:,:,:,0:1] > 0    # Mask which determines which cells have a bounding box
    c_mask_false = y[:,:,:,0:1] < 1   # Mask for cells w/o bounding boxes
    # Confidence scores
    ci_guess_t = tf.boolean_mask(Z13[:,:,:,0:1], c_mask_true)
    ci_guess_f = tf.boolean_mask(Z13[:,:,:,0:1], c_mask_false)
    ci_act_t = tf.boolean_mask(y[:,:,:,0:1], c_mask_true)
    ci_act_f = tf.boolean_mask(y[:,:,:,0:1], c_mask_false)
    # Bounding box coordinates for the ground truth box prediction
    xi_guess = tf.boolean_mask(Z13[:,:,:,1:2], c_mask_true)   # Midpoint x position
    xi_act = tf.boolean_mask(y[:,:,:,1:2], c_mask_true)
    yi_guess = tf.boolean_mask(Z13[:,:,:,2:3], c_mask_true)   # Midpoint y position
    yi_act = tf.boolean_mask(y[:,:,:,2:3], c_mask_true)
    # Width:
    wi_guess = tf.boolean_mask(Z13[:,:,:,3:4], c_mask_true)   # Box width
    wi_guess = tf.minimum(tf.sqrt(tf.abs(wi_guess)), wi_guess) # prevent sqrt(neg) and increase cost for neg prediction
    wi_act = tf.sqrt(tf.boolean_mask(y[:,:,:,3:4], c_mask_true))
    # Height:
    hi_guess = tf.boolean_mask(Z13[:,:,:,4:5], c_mask_true)   # Box height
    hi_guess = tf.minimum(tf.sqrt(tf.abs(hi_guess)), hi_guess) # prevent sqrt(neg) and increase cost for neg prediction
    hi_act = tf.sqrt(tf.boolean_mask(y[:,:,:,4:5], c_mask_true))
    # Predicted classes:
    class_g_dog = tf.boolean_mask(Z13[:,:,:,5:6], c_mask_true)
    class_t_dog = tf.boolean_mask(y[:,:,:,5:6], c_mask_true)
    class_g_cat = tf.boolean_mask(Z13[:,:,:,6:7], c_mask_true)
    class_t_cat = tf.boolean_mask(y[:,:,:,6:7], c_mask_true)
    # Parts correspond with the cost function equations above
    part1 = coord * tf.reduce_sum(tf.square(xi_act - xi_guess) + tf.square(yi_act - yi_guess))
    part2 = coord * tf.reduce_sum(tf.square(wi_act - wi_guess) + tf.square(hi_act - hi_guess))
    part3 = tf.reduce_sum(tf.square(ci_act_t - ci_guess_t))
    part4 = noobj * tf.reduce_sum(tf.square(ci_act_f - ci_guess_f))
    part5 = tf.reduce_sum(tf.square(class_t_dog - class_g_dog) + tf.square(class_t_cat - class_g_cat))
    total_cost = part1 + part2 + part3 + part4 + part5
    return total_cost
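A minimal sketch of how this could be wired into a TF1-style training step (the placeholder and optimizer names here are illustrative assumptions, not from my actual graph; Z13 would come from the forward pass):

y_true = tf.placeholder(tf.float32, shape=(None, 7, 7, 7))
cost = cost_function(Z13, y_true)
train_op = tf.train.AdamOptimizer(learning_rate=1e-4).minimize(cost)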
I'm implementing YOLOv3 in PyTorch.
Imagine we have an image with a large object in it. When building targets, I have to assign the anchor box with the highest IoU with this object for each scale (so each scale gets an anchor box assigned to this object, as I interpret it). The IoU for the lowest-resolution grid, 13x13, may be good, but suppose the IoU for the 52x52 resolution is too low: then the offsets the network needs to learn for that anchor box to fit the target at this scale are huge. The opposite case (a small object) is also true. So, does this not harm the performance for detecting objects? I mean, if I plot the mean IoU over all scales it will not be high, because the 13x13 grid will not detect small objects and the 52x52 grid will not detect large objects. Is this the case? And what metric should I use to see training performance if this mean does not look like a good option?
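To put rough numbers on what I mean (my own toy example, using the same width_cell = width * S scaling as the code below):

# a large object covering 60% of the image width/height
width = height = 0.6
for S in (13, 26, 52):
    print(S, round(width * S, 1))   # target box size in grid cells at each scale
# prints 13 7.8, 26 15.6, 52 31.2 -- a 31-cell-wide target is a huge offset for a single 52x52 anchor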
Would it make sense to assign only one anchor box, in the scale where it has the best IoU, and not assign an anchor box in the other scales? I mean, drop the box for the rest and let only the 13x13 scale predict large objects, 26x26 medium ones, and 52x52 small ones?
Here is my implementation of the build targets function:
(I followed Aladdin Persson's implementation)
In summary, my questions are:
Is this correct?
What metric should I use to see if training is going well?
And why would this not harm performance?
In green the ground truth and in red the anchor box assigned to the object.
# boxes for an image -> [[Class,xCenter,yCenter,width,height],[Class,xCenter,yCenter,width,height]]
# S = [13,26,52]
# C = 20 # num of classes
# B = 3 # num of anchors per grid cell.
def create_targets(self, boxes, S, C):
    target_matrix = [torch.zeros((torch.div(self.total_num_of_anchors, 3, rounding_mode='floor'), S, S, 1+4+1)) for S in self.S]
    # target_matrix
    # torch.Size([3, 13, 13, 6])
    # torch.Size([3, 26, 26, 6])
    # torch.Size([3, 52, 52, 6])
    for box in boxes:
        class_ = int(box[0])
        xCenter = float(box[1])
        yCenter = float(box[2])
        width = float(box[3])
        height = float(box[4])
        # IoU of the box's width/height against every anchor
        iou_anchors = iou_width_height(torch.tensor((width, height)), self.anchor_boxes)
        # Highest IoU first
        anchor_indices = iou_anchors.argsort(descending=True, dim=0)
        has_anchor = [False, False, False]
        for anchor_index in anchor_indices:
            # 0/3 = 0 , 1/3 = 0, 2/3 = 0 i.e. indices 0,1,2 -> scale 1.
            # 3/3 = 1 , 4/3 = 1, 5/3 = 1 i.e. indices 3,4,5 -> scale 2.
            # 6/3 = 2 , 7/3 = 2, 8/3 = 2 i.e. indices 6,7,8 -> scale 3.
            # find the scale of the anchor.
            scale_index = torch.div(anchor_index, self.num_anchors_per_scale, rounding_mode='floor') # 0, 1 or 2 ; anchor_index // self.num_anchors_per_scale
            # 0%3 = 0 -> 1st anchor
            # 1%3 = 1 -> 2nd anchor
            # 2%3 = 2 -> 3rd anchor
            # etc.
            # Find the index of the anchor (1st, 2nd or 3rd anchor box) in that scale.
            anchor_on_scale = anchor_index % self.num_anchors_per_scale
            S_scale = self.S[scale_index]
            # grid cell where the object is.
            i, j = int(S_scale*yCenter), int(S_scale*xCenter)
            # is there already an anchor in this position?
            anchor_taken = target_matrix[scale_index][anchor_on_scale, i, j, 0]
            if not anchor_taken and not has_anchor[scale_index]:
                # confidence score set to 1.
                target_matrix[scale_index][anchor_on_scale, i, j, 0] = 1
                # converting coordinates to be cell-relative.
                x_cell, y_cell = S_scale*xCenter - j, S_scale*yCenter - i
                width_cell, height_cell = width*S_scale, height*S_scale
                # store coordinates and class
                target_matrix[scale_index][anchor_on_scale, i, j, 1:5] = torch.tensor([x_cell, y_cell, width_cell, height_cell])
                target_matrix[scale_index][anchor_on_scale, i, j, 5] = class_
                has_anchor[scale_index] = True
            # to be honest I did not understand this part very well
            # what is the difference between -1 and 0? It will not be punished in the loss function if there is no object assigned to this position.
            elif not anchor_taken and iou_anchors[anchor_index] > self.ignore_iou_thresh:
                target_matrix[scale_index][anchor_on_scale, i, j, 0] = -1
    return tuple(target_matrix)
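As a quick sanity check of the cell indexing with made-up numbers:

# a box centred at (xCenter, yCenter) = (0.4, 0.7) on the S = 13 grid
S_scale, xCenter, yCenter = 13, 0.4, 0.7
i, j = int(S_scale * yCenter), int(S_scale * xCenter)
print(i, j, S_scale * xCenter - j, S_scale * yCenter - i)   # 9 5 ~0.2 ~0.1 (row, column, cell-relative offsets)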
I'm trying to wrap my head around this but struggling to understand how I can compute the f1-score in an object detection task.
Ideally, I would like to know false positives, true positives, false negatives and true negatives for every target in the image (it's a binary problem with an object in the image as one class and the background as the other class).
Eventually I would also like to extract the false positive bounding boxes from the image. I'm not sure if this is efficient but I'd save the image names and bbox predictions and whether they are false positives etc. into a numpy file.
I currently have this set up with a batch size of 1 so I can apply a non-maximum suppression algorithm per image:
def apply_nms(orig_prediction, iou_thresh=0.3):
    # torchvision returns the indices of the bboxes to keep
    keep = torchvision.ops.nms(orig_prediction['boxes'], orig_prediction['scores'], iou_thresh)
    final_prediction = orig_prediction
    final_prediction['boxes'] = final_prediction['boxes'][keep]
    final_prediction['scores'] = final_prediction['scores'][keep]
    final_prediction['labels'] = final_prediction['labels'][keep]
    return final_prediction
cpu_device = torch.device("cpu")
model.eval()
with torch.no_grad():
    for images, targets in valid_data_loader:
        images = list(img.to(device) for img in images)
        outputs = model(images)
        outputs = [{k: v.to(cpu_device) for k, v in t.items()} for t in outputs]
        predictions = apply_nms(outputs[0], iou_thresh=0.3)
Any idea on how I can determine the aforementioned classification metrics and f1-score?
I've come across this line in evaluation code provided by torchvision and I'm wondering whether it would help me going forward:
res = {target["image_id"].item(): output for target, output in zip(targets, outputs)}
The use of the terms precision, recall, and F1 score in object detection is slightly confusing because these metrics were originally used for binary evaluation tasks (e.g. classification). In any case, in object detection they have slightly different meanings:
let:
TP - set of predicted objects that are successfully matched to a ground truth object (above IOU threshold for whatever dataset you're using, generally 0.5 or 0.7)
FP - set of predicted objects that were not successfully matched to a ground truth object
FN - set of ground truth objects that were not successfully matched to a predicted object
Precision: TP / (TP + FP)
Recall: TP / (TP + FN)
F1: 2*Precision*Recall /(Precision + Recall)
You can find many implementations of the matching step (matching ground truth and predicted objects), generally provided with a dataset for evaluation, or you can implement it yourself. I'd suggest the py-motmetrics repository.
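As a rough sketch of that matching step (greedy, one ground truth per prediction; iou_fn is any function taking two [x1, y1, x2, y2] boxes, e.g. the helper below with its self argument dropped):

import numpy as np

def match_detections(pred_boxes, gt_boxes, iou_fn, iou_thresh=0.5):
    # pred_boxes should already be sorted by descending confidence;
    # each ground truth box can be matched to at most one prediction
    matched_gt = set()
    tp = 0
    for pred in pred_boxes:
        ious = [iou_fn(pred, gt) if i not in matched_gt else 0.0
                for i, gt in enumerate(gt_boxes)]
        best = int(np.argmax(ious)) if ious else -1
        if best >= 0 and ious[best] >= iou_thresh:
            matched_gt.add(best)
            tp += 1
    fp = len(pred_boxes) - tp
    fn = len(gt_boxes) - len(matched_gt)
    return tp, fp, fn

# e.g. tp, fp, fn = 8, 2, 4 gives precision 0.80, recall ~0.67, F1 ~0.73
tp, fp, fn = 8, 2, 4
precision = tp / max(tp + fp, 1)
recall = tp / max(tp + fn, 1)
f1 = 2 * precision * recall / max(precision + recall, 1e-9)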
A simple implementation of the IOU calculation might look like:
def iou(self, a, b):
    """
    Description
    -----------
    Calculates intersection over union for a single pair of boxes a and b
    Parameters
    ----------
    a : box as [x1, y1, x2, y2]
    b : box as [x1, y1, x2, y2]
    Returns
    -------
    iou - float in [0,1]
        iou of a and b
    """
    area_a = (a[2]-a[0]) * (a[3]-a[1])
    area_b = (b[2]-b[0]) * (b[3]-b[1])
    minx = max(a[0], b[0])
    maxx = min(a[2], b[2])
    miny = max(a[1], b[1])
    maxy = min(a[3], b[3])
    intersection = max(0, maxx-minx) * max(0, maxy-miny)
    union = area_a + area_b - intersection
    iou = intersection/union
    return iou
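A quick sanity check of the helper (self is unused, so passing None works when calling it standalone):

print(iou(None, [0, 0, 10, 10], [5, 5, 15, 15]))   # 25 / 175 ≈ 0.1429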
So I've implemented the F1 score to be calculated globally, that is, for the entire dataset.
The implementation below gives an example of determining the f1-score for a validation set.
The outputs of the model are in a dictionary format, and so we need to place them into tensors like this:
predicted_boxes (list): [[train_index, class_prediction, prob_score, x1, y1, x2, y2],[],...[]]
train_index: index of image that the specific bbox comes from
class_prediction: integer value representing class prediction
prob_score: output objectness score for a bbox
x1,y1,x2,y2: (x1, y1) and (x2,y2) bbox coordinates
gt_boxes (list): [[train_index, class_prediction, prob_score, x1, y1, x2, y2],[],...[]]
Where prob_score is just 1 for the ground truth inputs (it could be anything really as long as that dimension is specified and filled in).
IoU is also implemented in torchvision which makes everything a lot easier.
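For example, torchvision.ops.box_iou takes an Nx4 and an Mx4 tensor of (x1, y1, x2, y2) boxes and returns the NxM pairwise IoU matrix:

import torch
import torchvision

boxes_a = torch.tensor([[0., 0., 10., 10.]])
boxes_b = torch.tensor([[5., 5., 15., 15.]])
print(torchvision.ops.box_iou(boxes_a, boxes_b))   # tensor([[0.1429]])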
I hope this helps others as I couldn't find another implementation of f1 score in object detection anywhere else.
from collections import Counter
import math
import torch
import torchvision

# model_test, device, cpu_device, valid_data_loader, num_classes, iou_threshold,
# get_unique and valid_df are assumed to be defined earlier in the script.
model_test.eval()
with torch.no_grad():
    global_tp = []
    global_fp = []
    global_gt = []
    valid_df_unique = get_unique(valid_df['image_id'])
    for images, targets in valid_data_loader:
        images = list(img.to(device) for img in images)
        outputs = model_test(images)
        outputs = [{k: v.to(cpu_device) for k, v in t.items()} for t in outputs]
        predictions = apply_nms(outputs[0], iou_thresh=0.1)
        # looping through each class
        for c in range(num_classes):
            # detections (list): predicted_boxes that are class c
            detections = []
            # ground_truths (list): gt_boxes that are class c
            ground_truths = []
            for b, la, s in zip(predictions['boxes'], predictions['labels'], predictions['scores']):
                updated_detection_array = [targets[0]['image_id'].item(), la.item(), s.item(),
                                           b[0].item(), b[1].item(), b[2].item(), b[3].item()]
                if la.item() == c:
                    detections.append(updated_detection_array)
            for b, la in zip(targets[0]['boxes'], targets[0]['labels']):
                updated_gt_array = [targets[0]['image_id'].item(), la.item(), 1,
                                    b[0].item(), b[1].item(), b[2].item(), b[3].item()]
                if la.item() == c:
                    ground_truths.append(updated_gt_array)
                    global_gt.append(updated_gt_array)
            # use Counter to create a dictionary where key is image # and value
            # is the # of bboxes in the given image
            amount_bboxes = Counter([gt[0] for gt in ground_truths])
            # goal: keep track of the gt bboxes we have already "detected" with prior predicted bboxes
            # key: image #
            # value: tensor of 0's (size is equal to # of bboxes in the given image)
            for key, value in amount_bboxes.items():
                amount_bboxes[key] = torch.zeros(value)
            # sort over the probability scores of the detections
            detections.sort(key=lambda x: x[2], reverse=True)
            true_Positives = torch.zeros(len(detections))
            false_Positives = torch.zeros(len(detections))
            total_gt_bboxes = len(ground_truths)
            false_positives_frame = []
            true_positives_frame = []
            # iterate through all detections in given class c
            for detection_index, detection in enumerate(detections):
                # detection[0] indicates image #
                # ground_truth_image: the gt bboxes that are in the same image as the detection
                ground_truth_image = [bbox for bbox in ground_truths if bbox[0] == detection[0]]
                # num_gt_boxes: number of ground truth boxes in given image
                num_gt_boxes = len(ground_truth_image)
                best_iou = 0
                best_gt_index = 0
                for index, gt in enumerate(ground_truth_image):
                    iou = torchvision.ops.box_iou(torch.tensor(detection[3:]).unsqueeze(0),
                                                  torch.tensor(gt[3:]).unsqueeze(0))
                    if iou > best_iou:
                        best_iou = iou
                        best_gt_index = index
                if best_iou > iou_threshold:
                    # check if gt_bbox with best_iou was already covered by a previous detection with a higher confidence score
                    # amount_bboxes[detection[0]][best_gt_index] == 0 if not discovered yet, 1 otherwise
                    if amount_bboxes[detection[0]][best_gt_index] == 0:
                        true_Positives[detection_index] = 1
                        amount_bboxes[detection[0]][best_gt_index] = 1  # mark this gt box as matched (was '==', a no-op, in the original snippet)
                        true_positives_frame.append(detection)
                        global_tp.append(detection)
                    else:
                        false_Positives[detection_index] = 1
                        false_positives_frame.append(detection)
                        global_fp.append(detection)
                else:
                    false_Positives[detection_index] = 1
                    false_positives_frame.append(detection)
                    global_fp.append(detection)
# remove nan values from ground truth list as the list contains every mitosis image row entry (including images with no targets)
global_gt_updated = []
for gt in global_gt:
    if math.isnan(gt[3]) == False:
        global_gt_updated.append(gt)
global_fn = len(global_gt_updated) - len(global_tp)
precision = len(global_tp) / (len(global_tp) + len(global_fp))
recall = len(global_tp) / (len(global_tp) + global_fn)
f1_score = 2 * (precision * recall) / (precision + recall)
print(len(global_tp))
print(recall)
print(precision)
print(f1_score)
I am aiming to perform a color correction based on a reference image, using color charts. As a personal goal, I'm trying to correct the colors of an image I previously modified. The chart has been affected by the same layers, of course:
Originals:
Manually modified:
I'm using the following function that I've written myself to get the matrix:
def _get_matrix_transformation(self,
                               observed_colors: np.ndarray,
                               reference_colors: np.ndarray):
    """
    Args:
        observed_colors: colors found in target chart
        reference_colors: colors found on source/reference image
    Returns:
        Nothing.
    """
    # case 1
    observed_m = [observed_colors[..., i].mean() for i in range(observed_colors.shape[-1])]
    observed_colors = (observed_colors - observed_m).astype(np.float32)
    reference_m = [reference_colors[..., i].mean() for i in range(reference_colors.shape[-1])]
    reference_colors = (reference_colors - reference_m).astype(np.float32)
    # XYZ color conversion
    observed_XYZ = cv.cvtColor(observed_colors, cv.COLOR_BGR2XYZ)
    observed_XYZ = np.reshape(observed_colors, (observed_XYZ.shape[0] * observed_XYZ.shape[1],
                                                observed_XYZ.shape[2]))
    reference_XYZ = cv.cvtColor(reference_colors, cv.COLOR_BGR2XYZ)
    reference_XYZ = np.reshape(reference_colors, (reference_XYZ.shape[0] * reference_XYZ.shape[1],
                                                  reference_XYZ.shape[2]))
    # case 2
    # mean subtraction in order to use the covariance matrix
    # observed_m = [observed_XYZ[..., i].mean() for i in range(observed_XYZ.shape[-1])]
    # observed_XYZ = observed_XYZ - observed_m
    # reference_m = [reference_XYZ[..., i].mean() for i in range(reference_XYZ.shape[-1])]
    # reference_XYZ = reference_XYZ - reference_m
    # apply SVD
    H = np.dot(reference_XYZ.T, observed_XYZ)
    U, S, Vt = np.linalg.svd(H)
    # get transformation
    self._M = Vt.T * U.T
    # consider reflection case
    if np.linalg.det(self._M) < 0:
        Vt[2, :] *= -1
        self._M = Vt.T * U.T
    return
I'm applying the correction like this:
def _apply_profile(self, img: np.ndarray) -> np.ndarray:
    """
    Args:
        img: image to be corrected.
    Returns:
        Corrected image.
    """
    # Revert gamma compression
    img = adjust_gamma(img, gamma=1/2.2)
    # Apply color correction
    corrected_img = cv.cvtColor(img.astype(np.float32), cv.COLOR_BGR2XYZ)
    corrected_img = corrected_img.reshape((corrected_img.shape[0]*corrected_img.shape[1], corrected_img.shape[2]))
    corrected_img = np.dot(self._M, corrected_img.T).T.reshape(img.shape)
    corrected_img = cv.cvtColor(corrected_img.astype(np.float32), cv.COLOR_XYZ2BGR)
    corrected_img = np.clip(corrected_img, 0, 255)
    # Apply gamma
    corrected_img = adjust_gamma(corrected_img.astype(np.uint8), gamma=2.2)
    return corrected_img
The result I'm currently getting if the transformation is done in BGR (with the color conversion calls simply commented out):
In XYZ (don't pay attention to the resizing, that's because of me):
Now, I'm asking these questions:
Is inverting gamma necessary in this case? If so, am I doing it correctly? Should I implement a LUT that works with other data types such as np.float32?
Should the mean subtraction be done in XYZ or in BGR color space (case 1 vs case 2)?
Is considering the reflection case (as in a rigid body rotation problem) necessary?
Is clipping necessary? And if so, are those the correct values and data types?
What I am supposed to do: I have a black and white image (100x100 px):
I am supposed to train a backpropagation neural network with this image. The inputs are the x, y coordinates of a pixel (from 0 to 99) and the output is either 1 (white) or 0 (black).
Once the network has learned, I would like it to reproduce the image based on its weights and get the closest possible image to the original.
Here is my backprop implementation:
import os
import math
import Image
import random
from random import sample
#------------------------------ class definitions
class Weight:
    def __init__(self, fromNeuron, toNeuron):
        self.value = random.uniform(-0.5, 0.5)
        self.fromNeuron = fromNeuron
        self.toNeuron = toNeuron
        fromNeuron.outputWeights.append(self)
        toNeuron.inputWeights.append(self)
        self.delta = 0.0 # delta value, this will accumulate and after each training cycle used to adjust the weight value
    def calculateDelta(self, network):
        self.delta += self.fromNeuron.value * self.toNeuron.error
class Neuron:
    def __init__(self):
        self.value = 0.0        # the output
        self.idealValue = 0.0   # the ideal output
        self.error = 0.0        # error between output and ideal output
        self.inputWeights = []
        self.outputWeights = []
    def activate(self, network):
        x = 0.0
        for weight in self.inputWeights:
            x += weight.value * weight.fromNeuron.value
        # sigmoid function
        if x < -320:
            self.value = 0
        elif x > 320:
            self.value = 1
        else:
            self.value = 1 / (1 + math.exp(-x))
class Layer:
    def __init__(self, neurons):
        self.neurons = neurons
    def activate(self, network):
        for neuron in self.neurons:
            neuron.activate(network)
class Network:
    def __init__(self, layers, learningRate):
        self.layers = layers
        self.learningRate = learningRate # the rate at which the network learns
        self.weights = []
        for hiddenNeuron in self.layers[1].neurons:
            for inputNeuron in self.layers[0].neurons:
                self.weights.append(Weight(inputNeuron, hiddenNeuron))
            for outputNeuron in self.layers[2].neurons:
                self.weights.append(Weight(hiddenNeuron, outputNeuron))
    def setInputs(self, inputs):
        self.layers[0].neurons[0].value = float(inputs[0])
        self.layers[0].neurons[1].value = float(inputs[1])
    def setExpectedOutputs(self, expectedOutputs):
        self.layers[2].neurons[0].idealValue = expectedOutputs[0]
    def calculateOutputs(self, expectedOutputs):
        self.setExpectedOutputs(expectedOutputs)
        self.layers[1].activate(self) # activation function for hidden layer
        self.layers[2].activate(self) # activation function for output layer
    def calculateOutputErrors(self):
        for neuron in self.layers[2].neurons:
            neuron.error = (neuron.idealValue - neuron.value) * neuron.value * (1 - neuron.value)
    def calculateHiddenErrors(self):
        for neuron in self.layers[1].neurons:
            error = 0.0
            for weight in neuron.outputWeights:
                error += weight.toNeuron.error * weight.value
            neuron.error = error * neuron.value * (1 - neuron.value)
    def calculateDeltas(self):
        for weight in self.weights:
            weight.calculateDelta(self)
    def train(self, inputs, expectedOutputs):
        self.setInputs(inputs)
        self.calculateOutputs(expectedOutputs)
        self.calculateOutputErrors()
        self.calculateHiddenErrors()
        self.calculateDeltas()
    def learn(self):
        for weight in self.weights:
            weight.value += self.learningRate * weight.delta
    def calculateSingleOutput(self, inputs):
        self.setInputs(inputs)
        self.layers[1].activate(self)
        self.layers[2].activate(self)
        #return round(self.layers[2].neurons[0].value, 0)
        return self.layers[2].neurons[0].value
#------------------------------ initialize objects etc
inputLayer = Layer([Neuron() for n in range(2)])
hiddenLayer = Layer([Neuron() for n in range(10)])
outputLayer = Layer([Neuron() for n in range(1)])
learningRate = 0.4
network = Network([inputLayer, hiddenLayer, outputLayer], learningRate)
# let's get the training set
os.chdir("D:/stuff")
image = Image.open("backprop-input.gif")
pixels = image.load()
bbox = image.getbbox()
width = 5#bbox[2] # image width
height = 5#bbox[3] # image height
trainingInputs = []
trainingOutputs = []
b = w = 0
for x in range(0, width):
    for y in range(0, height):
        if (0, 0, 0, 255) == pixels[x, y]:
            color = 0
            b += 1
        elif (255, 255, 255, 255) == pixels[x, y]:
            color = 1
            w += 1
        trainingInputs.append([float(x), float(y)])
        trainingOutputs.append([float(color)])
print "\nOriginal image ... Black:"+str(b)+" White:"+str(w)+"\n"
#------------------------------ let's train
for i in range(500):
    for j in range(len(trainingOutputs)):
        network.train(trainingInputs[j], trainingOutputs[j])
    network.learn()
    for w in network.weights:
        w.delta = 0.0
#------------------------------ let's check
b = w = 0
for x in range(0, width):
    for y in range(0, height):
        out = network.calculateSingleOutput([float(x), float(y)])
        if 0.0 == round(out):
            color = (0, 0, 0, 255)
            b += 1
        elif 1.0 == round(out):
            color = (255, 255, 255, 255)
            w += 1
        pixels[x, y] = color
        #print out
print "\nAfter learning the network thinks ... Black:"+str(b)+" White:"+str(w)+"\n"
Obviously, there is some issue with my implementation. The above code returns:
Original image ... Black:21 White:4
After learning the network thinks ...
Black:25 White:0
It does the same thing if I try to use a larger training set (for testing purposes I'm using just 25 pixels from the image above). It says that all pixels should be black after learning.
Now, if I use a manual training set like this instead:
trainingInputs = [
    [0.0, 0.0],
    [1.0, 0.0],
    [2.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
    [2.0, 1.0],
    [0.0, 2.0],
    [1.0, 2.0],
    [2.0, 2.0]
]
trainingOutputs = [
    [0.0],
    [1.0],
    [1.0],
    [0.0],
    [1.0],
    [0.0],
    [0.0],
    [0.0],
    [1.0]
]
#------------------------------ let's train
for i in range(500):
    for j in range(len(trainingOutputs)):
        network.train(trainingInputs[j], trainingOutputs[j])
    network.learn()
    for w in network.weights:
        w.delta = 0.0
#------------------------------ let's check
for inputs in trainingInputs:
    print network.calculateSingleOutput(inputs)
The output is for example:
0.0330125791296 # this should be 0, OK
0.953539182136 # this should be 1, OK
0.971854575477 # this should be 1, OK
0.00046146137467 # this should be 0, OK
0.896699762781 # this should be 1, OK
0.112909223162 # this should be 0, OK
0.00034058462280 # this should be 0, OK
0.0929886299643 # this should be 0, OK
0.940489647869 # this should be 1, OK
In other words, the network guessed all pixels right (both black and white). Why does it say all pixels should be black when I use actual pixels from the image instead of a hard-coded training set like the one above?
I tried changing the number of neurons in the hidden layer (up to 100 neurons) with no success.
This is a homework assignment.
This is also a continuation of my previous question about backprop.
It's been a while, but I did get my degree in this stuff, so I think hopefully some of it has stuck.
From what I can tell, you're too deeply overloading your middle layer neurons with the input set. That is, your input set consists of 10,000 discrete input values (100 pix x 100 pix); you're attempting to encode those 10,000 values into 10 neurons. This level of encoding is hard (I suspect it's possible, but certainly hard); at the least, you'd need a LOT of training (more than 500 runs) to get it to reproduce reasonably. Even with 100 neurons for the middle layer, you're looking at a relatively dense compression level going on (100 pixels to 1 neuron).
As to what to do about these problems; well, that's tricky. You can increase your number of middle neurons dramatically, and you'll get a reasonable effect, but of course it'll take a long time to train. However, I think there might be a different solution; if possible, you might consider using polar coordinates instead of cartesian coordinates for the input; quick eyeballing of the input pattern indicates a high level of symmetry, and effectively you'd be looking at a linear pattern with a repeated predictable deformation along the angular coordinate, which it seems would encode nicely in a small number of middle layer neurons.
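A minimal sketch of that idea (my own illustration, assuming the 100x100 image from the question; the helper name is made up):

import math

def to_polar(x, y, width=100, height=100):
    # map a pixel (x, y) to polar coordinates around the image centre,
    # scaling both radius and angle into [0, 1]
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    dx, dy = x - cx, y - cy
    r = math.hypot(dx, dy) / math.hypot(cx, cy)
    theta = (math.atan2(dy, dx) + math.pi) / (2 * math.pi)
    return [r, theta]

# feed these two values to the network instead of the raw (x, y) coordinates, e.g.:
# trainingInputs.append(to_polar(x, y))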
This stuff is tricky; going for a general solution for pattern encoding (as your original solution does) is very complex, and can usually (even with large numbers of middle layer neurons) require a lot of training passes; on the other hand, some advance heuristic task breakdown and a little bit of problem redefinition (i.e. advance converting from cartesian to polar coordinates) can give good solutions for well defined problem sets. Therein, of course, is the perpetual rub; general solutions are hard to come by, but slightly more specified solutions can be quite nice indeed.
Interesting stuff, in any event!
I have some problems with combining image channels into one RGB image. I use skimage and numpy to solve this. The input is a photo like this: http://rghost.ru/8gYDcq2T6.
With numpy array slicing, I slice the image into 3 parts by its height, then I crop the edges of each part (5% of the height and width). Now I am ready to calculate the mean squared error for two pairs of parts (part1 vs part2, then part1 vs part3) to find the best alignment. I shift one part relative to another by up to 15 pixels (left, right, up, down) and take the minimum over all shifts, which gives me two shifts (x, y coordinates): one for part1 vs part2 and one for part1 vs part3. Then I use numpy.dstack to combine the channels, and that's all.
But the quality at the end is not good, so what is my mistake? I think the problem is in combining the images at the end using dstack, but I can't solve it because I don't understand how to do it properly. Here is my code:
from skimage import data, io
from numpy import *

def metrics(first, second, x, y):
    reshaped_second = roll(second, x, 0)
    reshaped_second = roll(reshaped_second, y, 1)
    reshaped_first = first
    mse = (((reshaped_first - reshaped_second) ** 2).sum())/(reshaped_first.size)
    ncc = None  # normalized cross-correlation is not computed in this snippet; kept so the tuple indices below stay the same
    return (mse, ncc, x, y)

def align(path):
    image = data.imread(path)
    size = image.shape
    part1 = image[0 : size[0]/3, :]
    part2 = image[size[0]/3 : 2*size[0]/3, :]
    part3 = image[2*size[0]/3 : size[0], :]
    new_size = [min(part1.shape[0], part2.shape[0], part3.shape[0]), min(part1.shape[1], part2.shape[1], part3.shape[1])]
    part1 = part1[new_size[0]/100*5 : new_size[0] - new_size[0]/100*5, new_size[1]/100*5 : new_size[1] - new_size[1]/100*5]
    part2 = part2[new_size[0]/100*5 : new_size[0] - new_size[0]/100*5, new_size[1]/100*5 : new_size[1] - new_size[1]/100*5]
    part3 = part3[new_size[0]/100*5 : new_size[0] - new_size[0]/100*5, new_size[1]/100*5 : new_size[1] - new_size[1]/100*5]
    min_mse = 1000000000
    xx_1 = None
    yy_1 = None
    for x in range(-15, 16):
        for y in range(-15, 16):
            mse = metrics(part1, part2, x, y)
            if mse[0] <= min_mse:
                xx_1 = mse[2]
                yy_1 = mse[3]
                min_mse = mse[0]
    min_mse = 1000000000
    xx_2 = None
    yy_2 = None
    for x in range(-15, 16):
        for y in range(-15, 16):
            mse = metrics(part1, part3, x, y)
            if mse[0] <= min_mse:
                xx_2 = mse[2]
                yy_2 = mse[3]
                min_mse = mse[0]
    part2 = roll(part2, xx_1, 0) # numpy.roll()
    part2 = roll(part2, yy_1, 1)
    part3 = roll(part3, xx_2, 0)
    part3 = roll(part3, yy_2, 1)
    photo = dstack((part3, part2, part1))
    io.imshow(photo)
    io.show()
After executing the program I get this photo: http://rghost.ru/6fqqmFCnM. That is the first test image; on the others the result is worse, and I would like better quality.
What can I do? Thank you for your help.
PROBLEM FOUND: the problem was with the dtype of reshaped_first and reshaped_second in the metrics function. They were 'uint8', so when I calculated reshaped_first - reshaped_second the values wrapped around and I got an unrepresentative metric. Now 5 of 6 test photos combine well, but the last one is still off by about 15-20 pixels. So the new question is: what metric should I choose for this problem? I tried normalized cross-correlation, but it's worse than the mean squared error I use now.
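To illustrate the wrap-around (a minimal example of the dtype issue described above):

import numpy as np

a = np.array([10], dtype=np.uint8)
b = np.array([20], dtype=np.uint8)
print(a - b)                                     # [246] -- wraps around instead of giving -10
print(a.astype(np.int32) - b.astype(np.int32))   # [-10] -- cast to a signed/wider type before subtracting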