Neural network from scratch using one hidden layer and sigmoid activation


I am writing a neural network from scratch for practice, and I am not a very experienced Python programmer.
I know most of the math behind neural networks, but my model is not behaving well. The derivative of the sigmoid function is h(x)*(1-h(x)), but I am not sure my line of code for it is correct; I searched on Google and everyone seems to use tanh activation. I am also not sure about delta2, and I have no idea where my code is going wrong. I also have doubts about how to subtract the labels from the predictions (prediction - label). This is a three-class classifier.
delta3[range(m1), y] -= 1
This line of code is also not clear to me. I copied it from an online example and just substituted my m1 (the total number of examples). My y matrix (labels in the form 0, 1, 2) is a vector of shape (150, 1), while the prediction matrix has shape (150, 21), so how do we subtract them?
#labels or classes
#1=iris-setosa
#2=iris-versicolor
#0=iris-virginica
#features
#sepallength
#sepalwidth
#petallengthcm
#petalwidth
import pandas as pd
import matplotlib.pyplot as plt
import csv
import numpy as np
df=pd.read_csv('Iris.csv')
df.convert_objects(convert_numeric=True)
df.fillna(0,inplace=True)
df.drop(['Id'],1,inplace=True)
#function to convert three labels into values 0,1,2
def handle_non_numericaldata(df):
    columns=df.columns.values
    for column in columns:
        text_digit_vals={}
        def convert_to_int(val):
            return text_digit_vals[val]
        if df[column].dtype!=np.int64 and df[column].dtype!=np.float:
            column_contents=df[column].values.tolist()
            unique_elements=set(column_contents)
            x=0
            for unique in unique_elements:
                if unique not in text_digit_vals:
                    text_digit_vals[unique]=x
                    x+=1
            df[column]=list(map(convert_to_int,df[column]))
    return(df)
handle_non_numericaldata(df)
x=np.array(df.drop(['Species'],1).astype(float))
y=np.array(df['Species'])
m1=np.size(y)
theta=np.ones(shape=(4,1))
theta2=np.ones(shape=(1,21))
#no of examples "m"
#learning rate alpha
alpha=0.01
#regularization parameter
lamda=0.01
for i in range(1,2):
    z1=np.dot(x,theta)
    sigma=1/(1+np.exp(-z1))
    #activation layer 2.
    a2=sigma
    z2=np.dot(a2,theta2)
    probs=np.exp(z2)
    softmax=probs/np.sum(probs,axis=1,keepdims=True)
    delta3=softmax
    print(softmax)
    delta3[range(m1), y] -= 1
    A2=np.transpose(a2)
    dw2 = (A2).dot(delta3)
    W2=np.transpose(theta2)
    delta2=delta3.dot(W2)*sigma*(1-sigma)
    X2=np.transpose(x)
    dw1=np.dot(X2,delta2)
    dw2=dw2-lamda*theta2
    dw1=dw1-lamda*theta
    theta =theta -alpha* dw1
    theta2= theta2-alpha * dw2
    correct_logprobs=-np.log(probs[range(m1),y])
    data_loss=np.sum(correct_logprobs)
    data_loss+=lamda/2*(np.sum(np.square(theta))+ np.square(theta2))
    ( 1./m1*data_loss)
My output for theta (the weights) is:
[[ 1.22833047]
[ 1.22591229]
[ 1.22341726]
[ 1.22162091]]
which obviously cannot be correct.
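The indexing line asked about above can be read as "subtract 1 from the predicted probability of the true class in every row", which is the same as subtracting a one-hot label matrix. The following is a minimal sketch under stated assumptions: random stand-in data replaces Iris.csv, 21 is taken to be the intended hidden-layer size, the weights are drawn randomly rather than set to ones, and y is a 1-D integer vector. None of these choices come from the original post.

```python
import numpy as np

rng = np.random.default_rng(0)
# Assumed sizes: 150 examples, 4 features, 21 hidden units, 3 classes
m, n_features, n_hidden, n_classes = 150, 4, 21, 3

x = rng.normal(size=(m, n_features))       # stand-in for the Iris features
y = rng.integers(0, n_classes, size=m)     # stand-in integer labels, shape (150,)

theta = rng.normal(scale=0.1, size=(n_features, n_hidden))   # (4, 21)
theta2 = rng.normal(scale=0.1, size=(n_hidden, n_classes))   # (21, 3)

a2 = 1 / (1 + np.exp(-x.dot(theta)))       # hidden activations, (150, 21)
z2 = a2.dot(theta2)                        # class scores, (150, 3)
exp_scores = np.exp(z2 - z2.max(axis=1, keepdims=True))  # shift for stability
softmax = exp_scores / exp_scores.sum(axis=1, keepdims=True)

# Row i, column y[i] holds the predicted probability of the true class.
# Subtracting 1 there gives the same result as softmax - one_hot(y).
delta3 = softmax.copy()
delta3[np.arange(m), y] -= 1

one_hot = np.eye(n_classes)[y]
print(delta3.shape, np.allclose(delta3, softmax - one_hot))
```

With these shapes the softmax output is (150, 3), so it lines up with the three classes. In the posted code theta has shape (4, 1) and theta2 has shape (1, 21), which is what produces a (150, 21) output.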

Related

Should features that correlate be deleted from ML models?

I've seen that it's common practice to delete input features that demonstrate co-linearity (and leave only one of them).
However, I've just completed a course on how a linear regression model will give different weights to different features, and I've thought that maybe the model will do better than us giving a low weight to less necessary features instead of completely deleting them.
To try to solve this doubt myself, I've created a small dataset resembling a x_squared function and applied two linear regression models using Python:
A model that keeps only the x_squared feature
A model that keeps both the x and x_squared features
The results suggest that we shouldn't delete features, and let the model decide the best weights instead. However, I would like to ask the community if the rationale of my exercise is right, and whether you've found this doubt in other places.
Here's my code to generate the dataset:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
# Generate the data
all_Y = [10, 3, 1.5, 0.5, 1, 5, 8]
all_X = range(-3, 4)
all_X_2 = np.square(all_X)
# Store the data into a dictionary
data_dic = {"x": all_X, "x_2": all_X_2, "y": all_Y}
# Generate a dataframe
df = pd.DataFrame(data=data_dic)
# Display the dataframe
display(df)
which produces this:
and this is the code to generate the ML models:
# Create the lists to iterate over
ids = [1, 2]
features = [["x_2"], ["x", "x_2"]]
titles = ["$x^{2}$", "$x$ and $x^{2}$"]
colors = ["blue", "green"]
# Initiate figure
fig = plt.figure(figsize=(15,5))
# Iterate over the necessary lists to plot results
for i, model, title, color in zip(ids, features, titles, colors):
    # Initiate model, fit and make predictions
    lr = LinearRegression()
    lr.fit(df[model], df["y"])
    predicted = lr.predict(df[model])
    # Calculate mean squared error of the model
    mse = mean_squared_error(all_Y, predicted)
    # Create a subplot for each model
    plt.subplot(1, 2, i)
    plt.plot(df["x"], predicted, c=color, label="f(" + title + ")")
    plt.scatter(df["x"], df["y"], c="red", label="y")
    plt.title("Linear regression using " + title + " --- MSE: " + str(round(mse, 3)))
    plt.legend()
# Display results
plt.show()
which generates this:
What do you think about this issue? This difference in the Mean Squared Error can be of high importance in certain contexts.

Because x and x^2 are not linearly related, deleting one of them does not help the model. The general rule for regression is to delete features that are highly collinear (and therefore also highly correlated).

So x2 and y are highly correlated and you are trying to predict y with x2? A high correlation between a predictor variable and the response variable is usually a good thing. Since x and y are practically uncorrelated, adding x is likely to "dilute" your model and give worse model performance.
(Multi-)collinearity between the predictor variables themselves would be more problematic.
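The claims in both answers can be checked numerically on the question's own seven data points. This is a small sketch using np.corrcoef; the variable names r_features, r_x_y and r_x2_y are illustrative and not from the original posts.

```python
import numpy as np

x = np.arange(-3, 4)                       # same range as the question: -3 ... 3
x_2 = np.square(x)
y = np.array([10, 3, 1.5, 0.5, 1, 5, 8])

# Pearson correlation between the two candidate features
r_features = np.corrcoef(x, x_2)[0, 1]
# Correlation of each feature with the target
r_x_y = np.corrcoef(x, y)[0, 1]
r_x2_y = np.corrcoef(x_2, y)[0, 1]

print(f"corr(x, x^2) = {r_features:.3f}")
print(f"corr(x, y)   = {r_x_y:.3f}")
print(f"corr(x^2, y) = {r_x2_y:.3f}")
```

On this symmetric range x and x^2 are uncorrelated, so the pair is not an example of collinear features, while x^2 is strongly correlated with y and x is only weakly so.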

Outlier detection with Local Outlier Factor (LOF)

I am working with healthcare insurance claims data and would like to identify fraudulent claims. I have been reading online to find a better method and came across the following code on scikit-learn.org.
Does anyone know how to select the outliers? The code plots them in a graph, but I would like to select those outliers if possible.
I have tried appending the y_predictions to the X dataframe, but that has not worked.
print(__doc__)
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import LocalOutlierFactor
np.random.seed(42)
# Generate train data
X = 0.3 * np.random.randn(100, 2)
# Generate some abnormal novel observations
X_outliers = np.random.uniform(low=-4, high=4, size=(20, 2))
X = np.r_[X + 2, X - 2, X_outliers]
# fit the model
clf = LocalOutlierFactor(n_neighbors=20)
y_pred = clf.fit_predict(X)
y_pred_outliers = y_pred[200:]
Below is the code I tried.
X['outliers'] = y_pred

The first 200 data points are inliers while the last 20 are outliers. When you call fit_predict on X, you get either outlier (-1) or inlier (1) in y_pred. So to get the predicted outliers, select the entries where y_pred == -1 and take the corresponding rows of X. The script below gives you the outliers in X.
X_pred_outliers = [each[1] for each in list(zip(y_pred, X.tolist())) if each[0] == -1]
I pair up y_pred and X and check whether the label is -1; if so, I collect the X values.
However, there are eight errors in the predictions (8 out of 220). These errors are -1 values in y_pred[:200] and 1 values in y_pred[200:220]. Please be aware of these errors as well.
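The same selection can also be written with a NumPy boolean mask. The sketch below uses a small hand-written array in place of the real data, and a hand-written label vector standing in for y_pred = clf.fit_predict(X); both are assumptions made only so the example runs without scikit-learn.

```python
import numpy as np

# Stand-in data: five 2-D points and made-up labels (-1 = outlier, 1 = inlier)
X = np.array([[2.0, 2.1], [1.9, 2.0], [-3.5, 3.8], [-2.0, -2.1], [3.9, -3.7]])
y_pred = np.array([1, 1, -1, 1, -1])

mask = y_pred == -1
X_pred_outliers = X[mask]                # rows of X flagged as outliers
outlier_rows = np.flatnonzero(mask)      # their row positions in X

print(X_pred_outliers)
print(outlier_rows)
```

The attempt X['outliers'] = y_pred in the question fails because X is a NumPy array there, and NumPy arrays cannot be indexed by a column name. If the claims data lives in a pandas DataFrame, assigning the labels as a new column should work as long as the lengths match.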

Is there a way to call a Numpy function inside a TensorFlow session?

I am trying to implement a Expectation Maximization algorithm using TensorFlow and TensorFlow Probability. It worked very well until I tried to implement Missing Data (data can contain NaN values in some random dimensions).
The problem is that with Missing Data I can no longer do all the operations as vector operations, I have to work with indexing and for-loops, like this:
# Here we iterate through all the data samples
for i in range(n):
    # x_i is the sample i
    x_i = tf.expand_dims(x[:, i], 1)
    gamma.append(estimate_gamma(x_i, pi, norm, ber))
    est_x_n_i = []
    est_xx_n_i = []
    est_x_b_i = []
    for j in range(k):
        mu_k = norm.mean()[j, :]
        sigma_k = norm.covariance()[j, :, :]
        rho_k = ber.mean()[j, :]
        est_x_n_i.append(estimate_x_norm(x_i[:d, :], mu_k, sigma_k))
        est_xx_n_i.append(estimate_xx_norm(x_i[:d, :], mu_k, sigma_k))
        est_x_b_i.append(estimate_x_ber(x_i[d:, :], rho_k))
    est_x_n.append(tf.convert_to_tensor(est_x_n_i))
    est_xx_n.append(tf.convert_to_tensor(est_xx_n_i))
    est_x_b.append(tf.convert_to_tensor(est_x_b_i))
What I found out was that these operations are not very efficient. While the first samples took less than 1 second per sample, after 50 samples it took about 3 seconds per sample. I guess this was happening because I was creating new tensors inside the session and that was affecting memory in some way.
I am quite new to TensorFlow, and since many people only use TensorFlow for deep learning and neural networks, I couldn't find a solution for this.
Then I tried to implement the previous for-loop and the functions called inside that loop using only numpy arrays and numpy operations. But this returned the following error:
You must feed a value for placeholder tensor 'Placeholder_4' with
dtype double and shape [8,18]
This error happens because when it tries to execute the numpy functions inside the loop, the placeholder has not been fed yet.
pi_k, mu_k, sigma_k, rho_k, gamma_ik, exp_loglik = exp_max_iter(x, pi, dist_norm, dist_ber)
pi, mu, sigma, rho, responsability, NLL[i + 1] = sess.run([pi_k, mu_k, sigma_k, rho_k, gamma_ik, exp_loglik],{x: samples})
Is there any way to solve this? Thanks.

To answer your title question, "Is there a way to call a Numpy function inside a TensorFlow session?": below is some sample code that executes a "numpy function" (sklearn.mixture.GaussianMixture) on data with missing values, either by calling the function directly or via TensorFlow's py_function. I sense this may not be exactly what you are looking for. If you are just trying to implement EM, the existing implementation of the Gaussian Mixture Model in TensorFlow may be of some help:
documentation on tf.contrib.factorization.gmm:
https://www.tensorflow.org/api_docs/python/tf/contrib/factorization/gmm
implementation:
https://github.com/tensorflow/tensorflow/blob/r1.14/tensorflow/contrib/factorization/python/ops/gmm_ops.py#L462-L506
Sample code to call a 'numpy function' directly and within Tensorflow graph:
import numpy as np
np.set_printoptions(2)
import tensorflow as tf
from sklearn.mixture import GaussianMixture as GMM
def myfunc(x,istf=True):
    #strip nans
    if istf:
        mask = ~tf.is_nan(x)
        x = tf.boolean_mask(x,mask)
    else:
        ind=np.where(~np.isnan(x))
        x = x[ind]
    x = np.expand_dims(x,axis=-1)
    gmm = GMM(n_components=2)
    gmm.fit(x)
    m0,m1 = gmm.means_[:,0]
    return np.array([m0,m1])
# create data with nans
np.random.seed(42)
x = np.random.rand(5,28,1)
c = 5
x.ravel()[np.random.choice(x.size, c, replace=False)] = np.nan
# directly call "numpy function"
for ind in range(x.shape[0]):
    val = myfunc(x[ind,:],istf=False)
    print(val)
[0.7 0.26]
[0.15 0.72]
[0.77 0.2 ]
[0.65 0.23]
[0.35 0.87]
# initialization
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
# create graph
X = tf.placeholder(tf.float32, [28,1])
Y = tf.py_function(myfunc,[X],[tf.float32],name='myfunc')
# call "numpy function" in tensorflow graph
for ind in range(x.shape[0]):
    val = sess.run(Y, feed_dict={X: x[ind,:],})
    print(val)
[array([0.29, 0.76], dtype=float32)]
[array([0.72, 0.15], dtype=float32)]
[array([0.77, 0.2 ], dtype=float32)]
[array([0.23, 0.65], dtype=float32)]
[array([0.35, 0.87], dtype=float32)]

You can wrap your NumPy function as a TensorFlow op, so that calling it inside a session does not cause problems. A simple example follows: write an IOU function in NumPy and then call it via tf.numpy_function.
def IOU(Pred, GT, NumClasses, ClassNames):
    ClassIOU=np.zeros(NumClasses)#Vector that Contain IOU per class
    ClassWeight=np.zeros(NumClasses)#Vector that Contain Number of pixel per class Predicted U Ground true (Union for this class)
    for i in range(NumClasses): # Go over all classes
        Intersection=np.float32(np.sum((Pred==GT)*(GT==i)))# Calculate class intersection
        Union=np.sum(GT==i)+np.sum(Pred==i)-Intersection # Calculate class Union
        if Union>0:
            ClassIOU[i]=Intersection/Union# Calculate intesection over union
            ClassWeight[i]=Union
    # b/c we will only take the mean over classes that are actually present in the GT
    present_classes = np.unique(GT)
    mean_IOU = np.mean(ClassIOU[present_classes])
    # append it in final results
    ClassNames = np.append(ClassNames, 'Mean')
    ClassIOU = np.append(ClassIOU, mean_IOU)
    ClassWeight = np.append(ClassWeight, np.sum(ClassWeight))
    return mean_IOU
# and now call as
NumClasses=6
ClassNames=['Background', 'Class_1', 'Class_1',
'Class_1 ', 'Class_1', 'Class_1 ']
x = tf.numpy_function(IOU, [y_pred, y_true, NumClasses, ClassNames],
tf.float64, name=None)
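Before wrapping a function like this in TensorFlow, it can help to check the arithmetic in plain NumPy on a toy input. The sketch below uses a hypothetical trimmed-down helper, mean_iou, that keeps only the intersection and union logic of the function above; the toy label arrays are made up.

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Mean intersection-over-union over the classes present in gt."""
    class_iou = np.zeros(num_classes)
    for i in range(num_classes):
        # Pixels where the prediction is correct and the true class is i
        intersection = np.sum((pred == gt) & (gt == i))
        union = np.sum(gt == i) + np.sum(pred == i) - intersection
        if union > 0:
            class_iou[i] = intersection / union
    # Average only over classes that actually occur in the ground truth
    return class_iou[np.unique(gt)].mean()

pred = np.array([0, 0, 1, 1])
gt = np.array([0, 1, 1, 1])
# class 0: intersection 1, union 2 -> 1/2; class 1: intersection 2, union 3 -> 2/3
print(mean_iou(pred, gt, num_classes=2))
```

The hand-computed per-class values in the comment average to 7/12, which is what the function should return for this input.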

Multi-Layer Perceptron not working

I am trying to implement a multi-layer perceptron using simple numpy. But, I am running into a road-block. There's a fair chance that I am not implementing it correctly as in the past I have always used a library for this purpose. I would really appreciate some help in debugging the code. It is less than 100 lines of code and hopefully, should not take up too much time. Thanks!
The details of my perceptron are as follows:
Inputs = 2
Output = 1
Hidden Layer Count = 5
Loss = Squared-Error
The following is my code:
(I have commented in all the places that seemed necessary)
import numpy as np
import matplotlib.pyplot as plt
#Sampling 100 random values uniformly distributed b/w 0 and 5.
x=np.random.uniform(low=0, high=5, size=(100,))
y=np.multiply(x,x)
#Storing the random values and their squares in x and y
x=np.reshape(x,(-1,1))
y=np.reshape(y,(-1,1))
# plt.plot(x,y, 'ro')
# plt.show()
#Network Initialisation
hSize=5
inputSize=1
outputSize=1
Wxh=np.random.rand(hSize, inputSize+1)
Woh=np.random.rand(outputSize, hSize+1)
#+++++++++++++Back-propagation++++++++++++++
iterations=100
WohGrad=np.zeros(Woh.shape)
WxhGrad=np.zeros(Wxh.shape)
for i in range(0, iterations):
    #+++++++++++++Forward Pass++++++++++++++
    #Input Layer
    z1=x[i]
    a1=z1
    h1=np.append([1], a1)
    #Hidden Layer-1
    z2=np.dot(Wxh, h1)
    a2=1/(1+np.exp(-z2))
    h2=np.append([1], a2)
    #Output Layer
    z3=np.dot(Woh, h2)
    a3=z3
    #+++++++++++++Backward Pass++++++++++++++
    #Squared Error
    pred=a3
    expected=y[i]
    loss=np.square(pred-expected)/2
    #Delta Values
    delta_3=(pred-expected)
    delta_2=np.multiply(np.dot(np.transpose(Woh), delta_3)[1:], 1/(1+np.exp(-z2) ))
    #Parameter Gradients and Update
    WohGrad=WohGrad+np.dot(delta_3,(h2.reshape(1,-1)))
    WxhGrad=WxhGrad+np.dot(delta_2.reshape(hSize,-1),(h1.reshape(1,-1)))
#Parameter Update
learningRate=0.01
L2_regularizer=0.01
WohGrad=WohGrad/iterations+L2_regularizer*Woh
WxhGrad=WxhGrad/iterations+L2_regularizer*Wxh
Wxh=Wxh-learningRate*WxhGrad
Woh=Woh-learningRate*WohGrad
#++++++++Testing++++++++++
#Forward Pass
#Input Layer
z1=np.array([2.5])
a1=z1
h1=np.append([1], a1)
#Hidden Layer-1
z2=np.dot(Wxh, h1)
a2=1/(1+np.exp(-z2))
h2=np.append([1], a2)
#Output Layer
z3=np.dot(Woh, h2)
a3=z3
print(a3)
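One detail in the backward pass above may be worth checking: the delta_2 line multiplies by 1/(1+np.exp(-z2)), which is the sigmoid itself, whereas backpropagation through a sigmoid layer normally multiplies by its derivative, a2*(1-a2). The following minimal sketch compares both against a finite-difference estimate. The helper names sigmoid and sigmoid_prime are illustrative and not from the post.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    # Analytic derivative: s(z) * (1 - s(z))
    s = sigmoid(z)
    return s * (1.0 - s)

z = np.linspace(-4.0, 4.0, 9)
eps = 1e-6
# Central finite difference as an independent estimate of the derivative
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)

matches_derivative = np.allclose(sigmoid_prime(z), numeric, atol=1e-8)
matches_sigmoid = np.allclose(sigmoid(z), numeric, atol=1e-8)
print(matches_derivative, matches_sigmoid)
```

Only the s*(1-s) form agrees with the numerical estimate, so using the plain sigmoid as the multiplier would scale the hidden-layer gradient incorrectly.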

neural network from scratch in python using sigmoid activation

I am new to Python and trying to learn machine learning in Python. I have tried to write a neural network from scratch with one hidden layer on the famous Iris dataset. This is a three-class classifier with outputs as one-hot vectors. I have also borrowed from existing implementations for help. For instance, I used the same training set as my testing set.
It is a lot of code to go through, so my main question is: how do we subtract the output 'y' (which is one-hot, with dimensions (150, 3)) from my softmax output, which has dimensions (150, 21)? This is my biggest problem. I looked online and everyone uses this method, but since I am weak in Python I don't understand it. This is the line of code: delta3[range(m1), y] -= 1
If m1 is of size 150, I get the error: arrays used as indices must be of integer (or boolean) type
and if I give m1 the size (150, 3), then
delta3[range(m1), y] -= 1
TypeError: range() integer end argument expected, got tuple.
Remember: m1 = 150
my y vector = (150, 3)
softmax = (150, 21)
my code is
#labels or classes
#1=iris-setosa
#2=iris-versicolor
#0=iris-virginica
#features
#sepallength
#sepalwidth
#petallengthcm
#petalwidth
import pandas as pd
import matplotlib.pyplot as plt
import csv
import numpy as np
df=pd.read_csv('Iris.csv')
df.convert_objects(convert_numeric=True)
df.fillna(0,inplace=True)
df.drop(['Id'],1,inplace=True)
#function to convert three labels into values 0,1,2
def handle_non_numericaldata(df):
    columns=df.columns.values
    for column in columns:
        text_digit_vals={}
        def convert_to_int(val):
            return text_digit_vals[val]
        if df[column].dtype!=np.int64 and df[column].dtype!=np.float:
            column_contents=df[column].values.tolist()
            unique_elements=set(column_contents)
            x=0
            for unique in unique_elements:
                if unique not in text_digit_vals:
                    text_digit_vals[unique]=x
                    x+=1
            df[column]=list(map(convert_to_int,df[column]))
    return(df)
handle_non_numericaldata(df)
x=np.array(df.drop(['Species'],1).astype(float))
c=np.array(df['Species'])
n_values=(np.max(c)+1)
y=(np.eye(n_values)[c])
m1=np.size(c)
theta=np.ones(shape=(4,1))
theta2=np.ones(shape=(1,21))
#no of examples "m"
#learning rate alpha
alpha=0.01
#regularization parameter
lamda=0.01
for i in range(1,1000):
    z1=np.dot(x,theta)
    sigma=1/(1+np.exp(-z1))
    #activation layer 2.
    a2=sigma
    z2=np.dot(a2,theta2)
    probs=np.exp(z2)
    softmax=probs/np.sum(probs,axis=1,keepdims=True)
    delta3=softmax
    delta3[range(m1), y] -= 1
    A2=np.transpose(a2)
    dw2 = (A2).dot(delta3)
    W2=np.transpose(theta2)
    delta2=delta3.dot(W2)*sigma*(1-sigma)
    X2=np.transpose(x)
    dw1=np.dot(X2,delta2)
    dw2=dw2-lamda*theta2
    dw1=dw1-lamda*theta
    theta =theta -alpha* dw1
    theta2= theta2-alpha * dw2
    correct_logprobs=0
    correct_logprobs=correct_logprobs-np.log(probs[range(m1),y])
    data_loss=np.sum(correct_logprobs)
    data_loss+=lamda/2*(np.sum(np.square(theta))+ np.square(theta2))
    loss=1./m1*data_loss
    if 1000%i==0:
        print("loss after iteration%i:%f",loss)
final1=x.dot(theta)
sigma=1/(1+np.exp(-final1))
z2=sigma.dot(theta2)
exp_scores=np.exp(z2)
probs=exp_scores/np.sum(exp_scores,axis=1,keepdims=True)
print(np.argmax(probs,axis=1))
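The "arrays used as indices must be of integer (or boolean) type" error is consistent with y being a float one-hot matrix (the result of np.eye(n_values)[c]) that is then used as an index. A small sketch of two possible ways around it, using toy stand-in values rather than the Iris data (the arrays c, y and probs below are made up for illustration):

```python
import numpy as np

c = np.array([2, 0, 1, 2])        # integer class labels, shape (4,)
y = np.eye(3)[c]                  # one-hot labels, float64, shape (4, 3)
probs = np.full((4, 3), 1 / 3)    # placeholder softmax output, shape (4, 3)

# Indexing with the float one-hot matrix reproduces the IndexError
try:
    probs[np.arange(4), y]
    raised = False
except IndexError:
    raised = True

# Way 1: one-hot targets can be subtracted directly when the shapes match
delta3 = probs - y

# Way 2: recover integer labels and use them as column indices
labels = y.argmax(axis=1)
correct_logprobs = -np.log(probs[np.arange(4), labels])

print(raised, labels, delta3.shape, correct_logprobs.shape)
```

Either way, the softmax output needs one column per class, so a (150, 21) output cannot be matched against (150, 3) one-hot labels.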

In Python, range(x, y) generates the sequence of integers from x up to, but not including, y (a list in Python 2, a range object in Python 3). For example, range(10) contains the same values as (0, 1, 2, 3, 4, 5, 6, 7, 8, 9). Lists in Python need a single integer index such as list[0] or list[4], not list[0, 4]. However, Python has built-in slice syntax for accessing index x up to index y in a list: list[0:4]. This returns every value from index 0 to 3 in the list. For example, if a list is list = [0,10,3,4,12,5,3], then list[0:4] returns [0,10,3,4].
Try taking a look at list data structures in the Python docs, as well as Understanding Generators in Python.
I think what you're looking for is something like this: delta3 = [[z-1 for z in delta3[x:y]] for x in range(m1)]. This list comprehension combines two comprehensions: [x-1 for x in l], which subtracts one from every element in the list, and [l[x:y] for x in range(m)], which generates a list of lists with values from x to y over a range of m. Though I'm not sure I fully understand what your end goal is.

What is a Neural Network?
The term 'neural' originates from the neuron (nerve cell), the basic functional unit of the human (animal) nervous system, present in the brain and other parts of the body. A neural network is a group of algorithms that tries to identify the underlying relationships in a set of data, in a way similar to the human brain. A neural network can adapt to changing input, so that it gives the best result without redesigning the output procedure.
Now a code example:
import numpy as np
#assign input values
input_value=np.array([[0.26,0.77,0.25],[0.42,0.8,0.25],[0.56,0.53,0.25],[0.29,0.79,0.25]])
input_value.shape
#assign output values
output=np.array([0.644045,0.651730,0.707523,0.644395])
output=output.reshape(4,1)
output
#assign weights
weights=np.array([[0.1],[0.1],[0.1]])
weights.shape
weights
#add bias
bias=0.3
#activation function
def sigmoid_func(x):
    return 1/(1+np.exp(-x))
#derivative of sigmoid function (x is the pre-activation value)
def der(x):
    return sigmoid_func(x)*(1-sigmoid_func(x))
#updating weights
for epochs in range(10000):
    input_arr=input_value
    #print(input_arr)
    weighted_sum=np.dot(input_arr,weights)+bias
    ### CALCULATION OF PRE ACTIVATION FUNCTION
    first_output=sigmoid_func(weighted_sum)
    #print(first_output)
    error=first_output - output
    #print(error)
    total_error=np.square(np.subtract(first_output,output)).mean()
    #print total error
    first_der=error
    #der() applies the sigmoid itself, so pass the pre-activation, not first_output
    second_der=der(weighted_sum)
    derivative=first_der*second_der
    t_input=input_value.T
    final_derivative=np.dot(t_input,derivative)
    #update Weigths
    weights=weights-0.05*final_derivative
    #update bias
    for i in derivative:
        bias=bias-0.05*i
print(weights)
print(bias)
#prediction for 1st item
pred=np.array([0.26,0.77,0.25])
result=np.dot(pred,weights)+bias
res=sigmoid_func(result)
print(res)
#prediction for 2nd item
pred=np.array([0.42,0.8,0.25])
result=np.dot(pred,weights)+bias
res=sigmoid_func(result)
print(res)
#prediction for 3rd item
pred=np.array([0.56,0.53,0.25])
result=np.dot(pred,weights)+bias
res=sigmoid_func(result)
print(res)
#prediction for 4th item
pred=np.array([0.29,0.79,0.25])
result=np.dot(pred,weights)+bias
res=sigmoid_func(result)
print(res)
