Single-layer perceptron - Python

I'm struggling to implement a single-layer perceptron: http://en.wikipedia.org/wiki/Perceptron. Depending on the initial weights, my program either gets stuck in the learning loop or finds the wrong weights. As a test case I use logical AND. Could you please give a hint as to why my perceptron does not converge? This is for my own learning. Thanks.
# learning rate
rate = 0.1

# Test data
# logical AND
# vector = (bias, coordinate1, coordinate2, targetedresult)
testdata = [[1, 0, 0, 0], [1, 0, 1, 0], [1, 1, 0, 0], [1, 1, 1, 1]]

# initial weights
import random
w = [random.random(), random.random(), random.random()]
print('initial weights = ', w)

def test(w, vector):
    if diff(w, vector) <= 0.1:
        return True
    else:
        return False

def diff(w, vector):
    from copy import deepcopy
    we = deepcopy(w)
    return dirac(sum(we[i]*vector[i] for i in range(3))) - vector[3]

def improve(w, vector):
    for i in range(3):
        w[i] += rate*diff(w, vector)*vector[i]
    return w

def dirac(z):
    if z > 0:
        return 1
    else:
        return 0

error = True
while error == True:
    discrepancy = 0
    for x in testdata:
        if not test(w, x):
            w = improve(w, x)
            discrepancy += 1
    if discrepancy == 0:
        print('improved weights = ', w)
        error = False

It looks like you need an extra loop surrounding your for loop that iterates the improvement until the solution converges (step 3 on the Wikipedia page you linked).
As it stands now, you give each training case exactly one chance to update the weights, so it has no chance to converge.

One glitch I can see is in the activation function: increase the cut-off value (z > 0.5).
Also, since there are only 4 input cases in each epoch, it is very difficult to work with 0 and 1 as the only outputs. Try removing the dirac function and increasing the test threshold to 0.2. It might take longer to learn, but it will be much more precise. Of course, for AND you don't really need that precision, but it helps in understanding.
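For reference, here is a minimal sketch (mine, not from either answer) of a training loop on the same AND data that does converge. It uses the standard perceptron update w += rate * (target - prediction) * x; note that diff() in the question returns prediction minus target, so the update there has the opposite sign:
import random

rate = 0.1
# (bias, x1, x2, target) for logical AND
testdata = [[1, 0, 0, 0], [1, 0, 1, 0], [1, 1, 0, 0], [1, 1, 1, 1]]
w = [random.random() for _ in range(3)]

def step(z):
    return 1 if z > 0 else 0

errors = 1
while errors > 0:
    errors = 0
    for row in testdata:
        prediction = step(sum(w[i] * row[i] for i in range(3)))
        error = row[3] - prediction          # target minus prediction
        if error != 0:
            errors += 1
            for i in range(3):
                w[i] += rate * error * row[i]  # standard perceptron rule

print('learned weights =', w)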

Iterate over tensor in a custom loss function

I need to use this loss function for a CNN. list_distances and list_residual are output tensors from hidden layers that are needed to compute the loss, but when I execute the code it gives me back this error:
TypeError: Tensor objects are only iterable when eager execution is enabled. To iterate over this tensor use tf.map_fn.
Is there another way to iterate over tensors without using the construct x in X, converting them to a numpy array, or using the Keras backend functions?
import numpy as np
import scipy.fftpack as fp          # assumed: fp.fft2 is used below
import scipy.stats as scistats      # assumed: scistats.gmean is used below
from keras import backend as K      # assumed: K.shape is used below

def DBL(y_true, y_pred, list_distances, list_residual, l=0.65):
    prob_dist = []
    Li = []
    # mean of the images' power spectrum
    S = np.sum([np.power(np.abs(fp.fft2(residual)), 2)
                for residual in list_residual], axis=0) / K.shape(list_residual)[0]
    # log-ratio between the geometric and arithmetic mean of S
    R = np.log10((scistats.gmean(S) / np.mean(S)))
    for c_i, dis_i in enumerate(list_distances):
        prob_dist.append([
            np.exp(-dis_i) / sum([np.exp(-dis_j) if c_j != c_i else 0
                                  for c_j, dis_j in enumerate(list_distances)])
        ])
    for count, _ in enumerate(prob_dist):
        Li.append(
            -1 * np.log10(sum([p_j for c_j, p_j in enumerate(prob_dist[count])
                               if y_pred[count] == 1 and count != c_j])))
    L0 = np.sum(Li)
    return L0 - l * R
You need to define a custom function to feed into tf.map_fn() - see the TensorFlow docs.
Mapper functions map (funnily enough) the existing object (tensor) into a new one using a function you define.
They apply the custom function to every element in the object, without all the mucking about with for loops.
For instance (non tested code, may not run - on my phone atm):
import numpy as np
import tensorflow as tf

def custom(a):
    b = a + 1
    return b

original = np.array([2, 2, 2])
mapped = tf.map_fn(custom, original)
# mapped == [3, 3, 3] ... hopefully
TensorFlow examples all use lambda functions, so you might need to define your functions like that if the above doesn't work. TensorFlow example:
elems = np.array([1, 2, 3, 4, 5, 6])
squares = tf.map_fn(lambda x: x * x, elems)
# squares == [1, 4, 9, 16, 25, 36]
Edit:
As an aside, map functions are much easier to parallelise than for loops - it is assumed that each element of an object is processed uniquely - so you can see a performance uplift by using them.
Edit 2:
For the "reduce sum, but not on this index" part, I would heavily recommend you start looking back at matrix operations... As mentioned, map functions work element-wise - they are not aware of other elements. A reduce function is what you want, but even they are finiky when you try and do "not this index" sums... also tensorflow is built around matrix ops... Not the MapReduce paradigm.
Something along these lines might help:
sess = tf.Session()

var = np.ones([3, 3, 3]) * 5

zero_identity = tf.linalg.set_diag(
    var, tf.zeros(var.shape[0:-1], dtype=tf.float64)
)

exp_one = tf.exp(var)
exp_two = tf.exp(zero_identity)

summed = tf.reduce_sum(exp_two, axis=[0, 1])
final = exp_one / summed

print("input matrix: \n", var, "\n")
print("Identities of the matrix to Zero: \n", zero_identity.eval(session=sess), "\n")
print("Exponential Values numerator: \n", exp_one.eval(session=sess), "\n")
print("Exponential Values to Sum: \n", exp_two.eval(session=sess), "\n")
print("Summed values for zero identity matrix\n ... along axis [0,1]: \n", summed.eval(session=sess), "\n")
print("Output:\n", final.eval(session=sess), "\n")

Vectorizing scipy norm.pdf

import numpy as np
from scipy.stats import norm

def predictDigit(img):
    prob = [0] * 10
    for digit in range(10):
        for pix in range(len(img)):
            std = pix_std[digit][pix]
            mean = pix_means[digit][pix]
            if std == 0:
                continue
            else:
                prob[digit] += np.log(norm.pdf(img[pix], mean, std))
        prob[digit] += np.log(digit_prob[digit])
    return np.argmax(prob)
I wrote this function to use it for implementing a Naive Bayes Classifier for digit classification. The idea is to go through all pixels of an input image and add the np.log(norm.pdf(img[pix], mean, std)) to the prob and return the argmax of it at the end to label the digit of the input image.
However, this takes too long. I successfully vectorized getting the mean and std using:
pix_means[digit] = np.mean(image_cluster[digit], axis = 0)
pix_std[digit] = np.std(image_cluster[digit], axis = 0)
But, I am not sure if vectorization is possible with norm.pdf.
Please help.
EDIT
Digit Prob
digit_count = {}
for digit in y_train:
    if digit not in digit_count:
        digit_count[digit] = 1
    else:
        digit_count[digit] += 1

digit_prob = {}
for digit in range(10):
    digit_prob[digit] = digit_count[digit] / len(y_train)
image_cluster
image_cluster = {}
for image, digit in zip(x_train, y_train):
    if digit not in image_cluster:
        image_cluster[digit] = [image]
    else:
        image_cluster[digit].append(image)

pix_means = {}
pix_std = {}
# get mean and sd
for digit in range(10):
    pix_means[digit] = np.mean(image_cluster[digit], axis=0)
    pix_std[digit] = np.std(image_cluster[digit], axis=0)
norm.pdf is vectorized right out of the box!
To compute the cdf at a number of points, we can pass a list or a numpy array:
>>> from scipy.stats import norm
>>> norm.cdf([-1., 0, 1])
array([ 0.15865525, 0.5, 0.84134475])
>>> import numpy as np
>>> norm.cdf(np.array([-1., 0, 1]))
array([ 0.15865525, 0.5, 0.84134475])
Thus, the basic methods, such as pdf, cdf, and so on, are vectorized.
https://docs.scipy.org/doc/scipy/reference/tutorial/stats.html
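Applied to the function in the question, that means each inner pixel loop can be replaced by a single norm.logpdf call over the whole image. Here is a sketch that assumes pix_means, pix_std and digit_prob are the dicts built in the question's EDIT; pixels with zero standard deviation are masked out, mirroring the original continue:
import numpy as np
from scipy.stats import norm

def predictDigit(img):
    img = np.asarray(img, dtype=float)
    prob = np.zeros(10)
    for digit in range(10):
        mean = np.asarray(pix_means[digit], dtype=float)
        std = np.asarray(pix_std[digit], dtype=float)
        mask = std > 0                         # skip zero-variance pixels
        # logpdf avoids computing pdf and then taking the log separately
        prob[digit] = np.sum(norm.logpdf(img[mask], mean[mask], std[mask]))
        prob[digit] += np.log(digit_prob[digit])
    return np.argmax(prob)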

roc_auc_score - Only one class present in y_true

I am doing k-fold cross-validation on an existing dataframe, and I need to get the AUC score.
The problem is - sometimes the test data only contains 0s, and not 1s!
I tried using this example, but with different numbers:
import numpy as np
from sklearn.metrics import roc_auc_score
y_true = np.array([0, 0, 0, 0])
y_scores = np.array([1, 0, 0, 0])
roc_auc_score(y_true, y_scores)
And I get this exception:
ValueError: Only one class present in y_true. ROC AUC score is not
defined in that case.
Is there any workaround that can make it work in such cases?
You could use try-except to prevent the error:
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 0, 0, 0])
y_scores = np.array([1, 0, 0, 0])

try:
    roc_auc_score(y_true, y_scores)
except ValueError:
    pass
You could also set roc_auc_score to zero when only one class is present, but I wouldn't do this. Your test data is presumably highly imbalanced, so I would suggest using stratified K-fold instead, so that you at least have both classes present in every fold.
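A minimal sketch of that stratified setup (the classifier and the X, y arrays here are placeholders, not from the question):
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

X = np.random.rand(100, 5)
y = np.random.randint(0, 2, size=100)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in skf.split(X, y):
    clf = LogisticRegression().fit(X[train_idx], y[train_idx])
    scores = clf.predict_proba(X[test_idx])[:, 1]
    # each fold preserves the overall class ratio, so both classes show up
    print(roc_auc_score(y[test_idx], scores))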
As the error notes, if a class is not present in the ground truth of a batch,
ROC AUC score is not defined in that case.
I'm against either throwing an exception (about what? This is the expected behaviour) or returning another metric (e.g. accuracy). The metric is not broken per se.
I don't feel like solving a data imbalance "issue" with a metric "fix". It would probably be better to use another sampling strategy, if possible, or just join multiple batches that satisfy the class population requirement.
I am facing the same problem now, and using try-catch does not solve my issue. I developed the code below in order to deal with that.
import pandas as pd
import numpy as np

class KFold(object):
    def __init__(self, folds, random_state=None):
        self.folds = folds
        self.random_state = random_state

    def split(self, x, y):
        assert len(x) == len(y), 'x and y should have the same length'
        x_, y_ = pd.DataFrame(x), pd.DataFrame(y)
        y_ = y_.sample(frac=1, random_state=self.random_state)
        x_ = x_.loc[y_.index]
        event_index, non_event_index = list(y_[y == 1].index), list(y_[y == 0].index)
        assert len(event_index) >= self.folds, 'number of folds should be less than the number of rows in x'
        assert len(non_event_index) >= self.folds, 'number of folds should be less than number of rows in y'
        indexes = []
        #
        # split the non-event (y == 0) indexes into folds
        #
        step = int(np.ceil(len(non_event_index) / self.folds))
        start, end = 0, step
        while start < len(non_event_index):
            train_fold = set(non_event_index[start:end])
            valid_fold = set([k for k in non_event_index if k not in train_fold])
            indexes.append([train_fold, valid_fold])
            start, end = end, min(step + end, len(non_event_index))
        #
        # split the event (y == 1) indexes and merge them into the folds above
        #
        step = int(np.ceil(len(event_index) / self.folds))
        start, end, i = 0, step, 0
        while start < len(event_index):
            train_fold = set(event_index[start:end])
            valid_fold = set([k for k in event_index if k not in train_fold])
            indexes[i][0] = list(indexes[i][0].union(train_fold))
            indexes[i][1] = list(indexes[i][1].union(valid_fold))
            indexes[i] = tuple(indexes[i])
            start, end, i = end, min(step + end, len(event_index)), i + 1
        return indexes
I just wrote this code and have not tested it exhaustively; it was tested only for binary categories. I hope it is still useful.
You can increase the batch size, e.g. from 32 to 64, and you can use StratifiedKFold or StratifiedShuffleSplit. If the error still occurs, try shuffling your data, e.g. in your DataLoader.
Simply changing one of the 0s in y_true to a 1 makes it work:
import numpy as np
from sklearn.metrics import roc_auc_score
y_true = np.array([0, 1, 0, 0])
y_scores = np.array([1, 0, 0, 0])
roc_auc_score(y_true, y_scores)
I believe the error message has already suggested it: there is only one class in y_true (all zeros); you need to have two classes present in y_true.

Convergence of backpropagation algorithm : how many iterations?

I'm new to the world of neural networks, which is very interesting. I wrote a basic backpropagation algorithm for a multi-layer NN to solve small problems.
I use the sigmoid activation function (like most of you, I think): x -> 1/(1+exp(-x)).
I tried my program on several problems:
The first one is the XOR problem. I took a 3-layer network of size [2, 2, 1] with one bias neuron in the first two layers (so the size is really more like [3, 3, 1]).
I tried it with 1000 training examples (i.e. a pair (0/1, 0/1) and its XOR as output), and the algorithm seemed to converge at an error of 0.5 :( I found that weird, so I raised the number to 10000 and, as that didn't change anything, to 100000 (in despair :p), and it WORKS! The error fell to less than 0.02 on average. Does anybody have an idea why it needs so much data to work?
The second one is summing two numbers (like 4 + 8 = ?). I randomly initialized a [2, 5, 5, 1] network with one bias neuron in the first three layers (so the size is really more like [3, 6, 6, 1]). I used a training set of 100000 pairs of numbers below 100 and their sum. This time, the error does not converge at all, and the output of the network always returns the number 1. Have you seen such situations before? Is it a bug in my code? (I have checked the code many times, but perhaps.)
import random
import math

class Network:
    def initdata(self):
        # weights initialization
        self.weights.append([])
        self.threshold.append([])
        for l in range(1, len(self.layers)):
            n = self.layers[l]
            thresholdl = []
            weightsl = []
            for i in range(n):
                thresholdl.append(-random.random())
                weightsli = []
                for j in range(self.layers[l-1]):
                    weightsli.append(random.random()*2-1)
                # adding bias neurons
                weightsli.append(thresholdl[-1])
                weightsl.append(weightsli)
            self.weights.append(weightsl)
            self.threshold.append(thresholdl)

    def __init__(self, layers):
        self.layers = layers
        self.weights = []
        self.threshold = []
        self.initdata()

    def activation_function(self, x):
        return 1/(1+math.exp(-x))

    def outputlayer(self, input, l):
        if l == 0:
            return [input]
        output = []
        prevoutput = self.outputlayer(input, l-1)
        for i in range(self.layers[l]):
            f = 0
            for k in range(len(prevoutput[-1])):
                f += self.weights[l][i][k]*prevoutput[-1][k]
            f += self.weights[l][i][-1]  # bias weight!
            output.append(self.activation_function(f))
        return prevoutput+[output]

    def layersoutput(self, input):
        return self.outputlayer(input, len(self.layers)-1)

    def finaloutput(self, input):
        return self.layersoutput(input)[-1]

    def train(self, data, nu):
        for (input, finaloutput) in data:
            output = self.layersoutput(input)
            err = self.errorvector(finaloutput, output[-1])
            self.changeweights(err, output, nu)

    def changeweights(self, err, output, nu):
        deltas = []
        for i in range(len(self.layers)):
            deltas.append([])
        tempweights = self.weights.copy()

        def changeweightslayer(layer):
            if layer != len(self.layers)-1:
                changeweightslayer(layer+1)
            for i in range(self.layers[layer]):
                delta = 0
                if layer != len(self.layers)-1:
                    delta = output[layer][i]*(1-output[layer][i])*sum([deltas[layer+1][l]*self.weights[layer+1][l][i] for l in range(self.layers[layer+1])])
                else:
                    delta = output[layer][i]*(1-output[layer][i])*err[i]
                deltas[layer].append(delta)
                for k in range(len(self.weights[layer][i])-1):
                    tempweights[layer][i][k] += nu*output[layer-1][k]*delta
                tempweights[layer][i][-1] += nu*delta

        changeweightslayer(1)
        self.weights = tempweights

    def quadraticerror(self, a, b):
        return sum([(a[i]-b[i])**2 for i in range(len(a))])

    def errorvector(self, a, b):
        return [a[i]-b[i] for i in range(len(a))]


network = Network([2, 5, 5, 1])
print(network.weights)
data = []
for i in range(1000000):
    bit1 = random.randrange(100)
    bit2 = random.randrange(100)
    data.append(([float(bit1), float(bit2)], [float(bit1+bit2)]))
network.train(data, 0.1)
print(network.weights)
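One more observation about the sum experiment, not from the original post but a general property of the network above: the output neuron goes through the sigmoid as well, so it can only produce values in (0, 1), while the targets go up to 198; that alone pushes the output toward 1. A minimal sketch of rescaling the data into that range before training (and back afterwards), reusing the Network class defined above:
import random

# Inputs are below 100, so the sum lies in [0, 198]; scale everything into [0, 1].
SCALE = 200.0

data = []
for i in range(100000):
    a = random.randrange(100)
    b = random.randrange(100)
    data.append(([a / 100.0, b / 100.0], [(a + b) / SCALE]))

network = Network([2, 5, 5, 1])   # the class defined above
network.train(data, 0.1)

# Undo the scaling when reading a prediction back out, e.g. for 4 + 8:
print(network.finaloutput([4 / 100.0, 8 / 100.0])[0] * SCALE)  # should be close to 12 if training succeeded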

Bayes' rule - how to calculate likelihood

I am given some data, data, which corresponds to a binary sequence of coin flips, where heads are 1's and tails are 0's. Theta is a value between 0 and 1 representing the probability that the coin produces heads when flipped.
How does one go about calculating the likelihood? I faintly remember a formula where:
likelihood = (theta)^(h)*(1-theta)^(1-h)
where h is 1 if heads, and 0 if tails. I implemented the following code:
import numpy as np
(np.prod([theta*1 for i in data if i==1]) * np.prod([1-theta for i in data if i==0]))
This code works for some cases but not for some hidden cases (so I'm not sure what's wrong with it).
There are a couple of ways to interpret what you are trying to calculate:
1. The probability of exactly that sequence, including the order in which the heads occur (which is how your question is posed here)
2. The probability of the number of heads (let's call this X) occurring in your sequence, regardless of the order (which is what I think you were asking for)
option 1:
import numpy as np

theta = 0.2  # Probability of H is 0.2, hence NOT a fair coin
data = [0, 1, 0, 1, 1, 1, 0, 0, 1, 1]  # T, H, T, H, H, ....

def likelihood(theta, h):
    return (theta)**(h)*(1-theta)**(1-h)

likelihood(theta, 1)  # 0.2
likelihood(theta, 0)  # 0.8

singlethrow = [likelihood(theta, x) for x in data]
prob1 = np.prod(singlethrow)  # 2.6214400000000015e-05
prob1 will converge to zero pretty quickly, because every additional coin toss will multiply the existing probability with a number smaller than 1 (either 0.2 if heads, 0.8 if tails)
option 2:
This is a binomial distribution. It adds up the probabilities of all possible outcomes that result in a total of, say, 6 heads when tossing a coin 10 times. One particular sequence that results in 6 heads out of 10 tosses we already evaluated in option 1 above; there are 210 such sequences ( = 10! / (6!*(10-6)!) ).
The scipy.stats.binom.pmf() functionality calculates this probability for you:
import scipy, scipy.stats
prob2 = scipy.stats.binom.pmf(6, 10, theta)
Or, more generally, if you rely on data in the form I defined above:
X = sum([toss == 1 for toss in data])
N = len(data)
prob3 = scipy.stats.binom.pmf(X, N, theta)
prob2 == prob3 # True
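To see how the two options connect: the binomial pmf from option 2 is just the single-sequence probability prob1 from option 1 multiplied by the number of orderings that contain exactly 6 heads. A quick check reusing prob1 and prob2 from above (math.comb needs Python 3.8+):
from math import comb

print(comb(10, 6))                             # 210 orderings with exactly 6 heads
print(np.isclose(comb(10, 6) * prob1, prob2))  # True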
If you're interested in the Bayesian approach, you might want to have a look at the conjugate_prior package
from conjugate_prior import BetaBinomial
heads = sum(data)            # number of 1s in the data defined above
tails = len(data) - heads
prior_model = BetaBinomial(1, 1)  # Uninformative prior
updated_model = prior_model.update(heads, tails)
credible_interval = updated_model.posterior(0.45, 0.55)
print("There's {p:.2f}% chance that the coin is fair".format(p=credible_interval*100))
