I attempted to implement an OR perceptron neural network from a book by following the code given. However, I received the error "The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()". I then tried changing the def step(self,x) function to use x.any(), which resulted in every prediction coming out as 1. My question is whether changing def step(self,x) was the right approach, since the original was faulty.
This is the code:
from nn import Perceptron
import numpy as np
X = np.array([[0,0],[0,1],[1,0],[1,1]])
print(X[1])
y = np.array([0],[1],[1],[0])
print("[INFO] training perceptron...")
p = Perceptron(X.shape[1],alpha = 0.1)
p.fit(X,y,epochs=20)
print("[INFO] testing perceptron...")
for (x, target) in zip(X, y):
    pred = p.predict(X)
    print("[INFO] data={}, ground-truth={}, pred={}".format(x, target[0], pred))
The package that I imported was:
import numpy as np

class Perceptron:
    def __init__(self, N, alpha=0.1):
        self.W = np.random.randn(N + 1) / np.sqrt(N)
        self.alpha = alpha

    def step(self, x):
        if x > 0:
            return 1
        else:
            return 0

    def fit(self, X, y, epochs=10):
        X = np.c_[X, np.ones((X.shape[0]))]
        for epoch in np.arange(0, epochs):
            for (x, target) in zip(X, y):
                p = self.step(np.dot(x, self.W))
                if p != target:
                    error = p - target
                    self.W += -self.alpha * error * x

    def predict(self, X, addBias=True):
        X = np.atleast_2d(X)
        if addBias:
            X = np.c_[X, np.ones((X.shape[0]))]
        return self.step(np.dot(X, self.W))
My apologies if it's a silly question; I spent the whole day thinking about it to no avail.
Thanks in advance!
The error you are facing occurs because step() is written to evaluate one element at a time, but when predict() passes it a whole array it ends up having to evaluate something like this:

[0.266, 1.272, -1.282, 0.889] > 0

The interpreter doesn't know which element's truth value to use, since the comparison yields a boolean array, and hence raises the error. any() and all() collapse that array into a single truth value ("is any element True?" / "are all elements True?"), which is why you get an array of 1s when you write x.any().
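A minimal sketch of what is happening (the numbers are illustrative, not from an actual run):

import numpy as np

scores = np.array([0.266, 1.272, -1.282, 0.889])
print(scores > 0)          # [ True  True False  True] -- a boolean *array*

# `if scores > 0:` therefore raises:
# ValueError: The truth value of an array with more than one element is
# ambiguous. Use a.any() or a.all()

print((scores > 0).any())  # True -- one value for the whole array, hence the constant 1s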
Another thing that bothered me about the code you imported is that the forward pass is done in a loop, which is neither efficient nor pythonic; a vectorized implementation is much better. I have changed the step and fit functions in that imported code to be vectorized, and it runs fine for me:
import numpy as np

class Perceptron:
    def __init__(self, N, alpha=0.1):
        self.W = np.random.randn(N + 1) / np.sqrt(N)
        self.alpha = alpha

    def step(self, x):
        return 1. * (x > 0)

    def fit(self, X, y, epochs=10):
        X = np.c_[X, np.ones((X.shape[0]))]
        for epoch in np.arange(0, epochs):
            Z = np.dot(X, self.W)
            p = self.step(Z)
            if np.any(p != y):
                error = (p - y)
                self.W += -self.alpha * np.dot(X.T, error)

    def predict(self, X, addBias=True):
        X = np.atleast_2d(X)
        if addBias:
            X = np.c_[X, np.ones((X.shape[0]))]
        return self.step(np.dot(X, self.W))
Now the step function returns a binary array whose entries are 1 where the input is greater than 0 and 0 elsewhere. For example, an array such as:

X = [0.266, 1.272, -1.282, 0.889]

would be converted to:

[1, 1, 0, 1]
I also changed the fit function so that it does everything vectorized.
One other thing that I did to my code was this: instead of

y = np.array([0],[1],[1],[0])

I did

y = np.array([0, 1, 1, 0])

to get it working. (The original call passes [1], [1], [0] as extra positional arguments to np.array, which raises a TypeError; the labels belong in a single list.) I hope this helps. Be sure to ask anything if you don't understand.
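For reference, a quick sanity check of the vectorized version (a sketch that assumes the class above is saved as nn.py, matching the original import):

import numpy as np
from nn import Perceptron  # the vectorized class shown above

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])  # flat 1D labels, as discussed above

p = Perceptron(X.shape[1], alpha=0.1)
p.fit(X, y, epochs=20)

for x, target in zip(X, y):
    pred = p.predict(x)  # one sample at a time; predict(X) also works now
    print("data={}, ground-truth={}, pred={}".format(x, target, pred))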
I’m trying to apply multiclass logistic regression from scratch. The dataset is MNIST.
I built some functions such as hypothesis, sigmoid, cost function, cost function derivative, and gradient descent. My code is below.
I’m struggling with the following: all images are labeled with the digit they represent, so there are a total of 10 classes. Inside the gradient descent function, I need to loop through each class, but I do not know how to apply the One vs All method there.
In other words, what I need to do is:
How to filter each class inside gradient descent.
After that, how to build a function to predict on the test set.
Here is my code.
import numpy as np
import pandas as pd
# Only training data set
# the test data will be load later.
url='https://drive.google.com/file/d/1-MO8oCfq4KU361QeeL4DdafVBhZePUNT/view?usp=sharing'
url='https://drive.google.com/uc?id=' + url.split('/')[-2]
df = pd.read_csv(url,header = None)
X = df.values[:, 0:-1]
y = df.values[:, -1]
m = np.size(X, 0)
y = np.array(y).reshape(m, 1)
X = np.c_[ np.ones(m), X ] # Bias
def hypothesis(X, thetas):
    return sigmoid(X.dot(thetas))  # - 0.0000001

def sigmoid(z):
    return 1/(1+np.exp(-z))

def losscost(X, y, m, thetas):
    h = hypothesis(X, thetas)
    return -(1/m) * (y.dot(np.log(h)) + (1-y).dot(np.log(1-h)))

def derivativelosscost(X, y, m, thetas):
    h = hypothesis(X, thetas)
    return (h-y).dot(X)/m

def descendinggradient(X, y, m, epoch, alpha, thetas):
    n = np.size(X, 1)
    J_historico = []
    for i in range(epoch):
        for j in range(0, 10):  # 10 classes
            # How to filter each class in here (inside descendinggradient)?
            # The 2 lines below are wrong:
            # thetas = thetas - alpha * derivativelosscost(X, y, m, thetas)
            # J_historico = J_historico + [losscost(X, y, m, thetas)]
    return [thetas, J_historico]
alpha = 0.01
epoch = 50
(thetas, J_historico) = descendinggradient(X, y, m, epoch, alpha)
# After that, how to build a function to predict the test set.
Let me explain this problem step by step:
First, since your code doesn't provide the actual data or a link to it, I've created a random dataset, followed by the same commands you used to create X and y:
batch_size = 20
num_classes = 10
rng = np.random.default_rng(seed=42)
df = pd.DataFrame(
    4 * rng.random((batch_size, num_classes + 1)) - 2,  # random values between -2 and 2
    columns=['X0', 'X1', 'X2', 'X3', 'X4', 'X5', 'X6', 'X7', 'X8', 'X9', 'Y']
)
X = df.values[:, 0:-1]
y = df.values[:, -1]
m = np.size(X, 0)
y = np.array(y).reshape(m, 1)
X = np.c_[ np.ones(m), X ] # Bias
Next, let's take a look at your hypothesis function. If we just run hypothesis and look at the first sample, we get a vector of size (10,). I also needed to provide the initial thetas for this case:
thetas = rng.random((X.shape[1],num_classes))
h = hypothesis(X, thetas)
print(h[0])
>>>[0.89701729 0.90050806 0.98358408 0.81786334 0.96636732 0.97819512
0.89118488 0.87238045 0.70612173 0.30256924]
Basically, the function calculates a "probability"[1] for each class.
At this point we hit the first issue in your code. The sigmoid function returns "probabilities" that are not "connected" to each other; to put those "probabilities" in relation we need another function: softmax. You will find plenty of implementations of it. In short: it rescales the sigmoid outputs so that the sum over all class "probabilities" is 1.
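A minimal softmax sketch (one of many possible implementations), working row-wise so that each sample's class values sum to 1:

import numpy as np

def softmax(z):
    # Subtract the per-row maximum first; softmax is invariant to this
    # shift, and it keeps np.exp from overflowing.
    z = z - np.max(z, axis=1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=1, keepdims=True)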
So for your second question, "how to implement a predict after training", we only need the argmax value to determine the class:
h = hypothesis(X, thetas)
p = softmax(h) # needs to be implemented
prediction = np.argmax(p, axis=1)
print(prediction)
>>>[2 5 5 8 3 5 2 1 3 5 2 3 8 3 3 9 5 1 1 8]
Now that we know how to predict a class, we also need to know where to set up the training. We want to do this directly after the softmax function, but instead of using the argmax to pick the winning class, we use the cost function and its derivative. The problem in your code: you used the cross-entropy loss for a binary problem. (The binary case also doesn't need the softmax function, because the sigmoid already ties the two classes together.) And since we are not interested in the value of the multiclass cross-entropy loss itself, only in its derivative, we can compute the derivative directly.
The conversion from binary cross-entropy to multiclass is somewhat unintuitive at first glance; I recommend reading a bit about it before implementing. After that, you basically use your line:
thetas = thetas - alpha * derivativelosscost(X, y, m, thetas)
for updating the thetas, as in the sketch below.
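A sketch of what the whole update could then look like. The names here are assumptions (it reuses the softmax sketched above and applies it directly to the linear scores, which is the usual multiclass formulation), not the poster's code:

import numpy as np

def fit_multiclass(X, y, epoch, alpha, num_classes=10):
    m, n = X.shape                      # X already contains the bias column
    thetas = np.zeros((n, num_classes))
    # One-hot encode the labels: Y[i, c] == 1 iff sample i has class c.
    Y = np.zeros((m, num_classes))
    Y[np.arange(m), y.ravel().astype(int)] = 1
    for _ in range(epoch):
        p = softmax(X.dot(thetas))      # (m, num_classes) class "probabilities"
        grad = X.T.dot(p - Y) / m       # derivative of the multiclass cross-entropy
        thetas -= alpha * grad
    return thetas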
[1] These are not actual probabilities, but that is a completely different topic.
I am attempting to visualize a neural network; the class is below. The code containing this segment can successfully return a scatter plot and a curve (the trained NN) with randomly generated input. I then fed in data from a .csv file and encountered overflow errors, apparently due to very large or very small data values.
class NeuralNetwork:
    def __init__(self, width, activation_function):
        self.width = width
        self.activation_function = activation_function
        self.depth = len(width) - 2
        self.W = np.array([np.random.randn(self.width[i+1], self.width[i]) for i in range(self.depth+1)], dtype=object)
        self.b = np.array([np.random.randn(self.width[i+1]) for i in range(self.depth+1)], dtype=object)
        self.alpha = 0.1

    def sigma(self, x):
        if self.activation_function == 'Sigmoid':
            return 1/(1+np.exp(-x))
        else:
            if x >= 0:
                return x
            else:
                return 0

    def d_sigma(self, x):
        if self.activation_function == 'Sigmoid':
            return self.sigma(x)*self.sigma(1-x)
        else:
            if x >= 0:
                return 1
            else:
                return 0

    # vectorize the function
    def sigma_vec(self, X):
        return np.array([self.sigma(x) for x in X], dtype=np.float128)

    def d_sigma_vec(self, X):
        return np.array([self.d_sigma(x) for x in X], dtype=np.float128)

    def h(self, x):
        a = np.copy(self.b)
        z = np.copy(self.b)
        z[0] = np.dot(self.W[0], x) + self.b[0]
        for i in range(self.depth):
            a[i] = self.sigma_vec(z[i])
            z[i+1] = np.dot(self.W[i+1], a[i]) + self.b[i+1]
        return z[-1]

    def plot(self, plt, col):
        dx = np.linspace(0, 1, 100)
        h_dx = np.array([self.h([x]) for x in dx], dtype=np.float128)
        plt.plot(dx, h_dx, color=col)

    def gradient_of_loss_function(self, h, y):
        return h - y

    def gradient(self, x, y):
        a = np.copy(self.b)
        z = np.copy(self.b)
        z[0] = np.dot(self.W[0], x) + self.b[0]
        for i in range(self.depth):
            a[i] = self.sigma_vec(z[i])
            z[i+1] = np.dot(self.W[i+1], a[i]) + self.b[i+1]
        gradient_b = np.copy(self.b)
        gradient_W = np.copy(self.W)
        gradient_b[-1] = self.gradient_of_loss_function(z[-1], y)
        for i in reversed(range(self.depth)):
            gradient_b[i] = np.dot(np.transpose(self.W[i+1]), gradient_b[i+1]) * self.d_sigma_vec(z[i])
            gradient_W[i+1] = np.outer(gradient_b[i+1], a[i])
        gradient_W[0] = np.outer(gradient_b[0], x)
        return gradient_b, gradient_W

    def batch_gradient(self, X, Y):
        batch_gradient_W = 0
        batch_gradient_b = 0
        for i in range(len(X)):
            gradient_b, gradient_W = self.gradient(X[i], Y[i])
            batch_gradient_W += gradient_W
            batch_gradient_b += gradient_b
        return batch_gradient_b, batch_gradient_W

    def train(self, X, Y, iterate):
        for i in range(iterate):
            alpha = self.alpha/len(X)
            batch_gradient_b, batch_gradient_W = self.batch_gradient(X, Y)
            self.W -= alpha*batch_gradient_W
            self.b -= alpha*batch_gradient_b
Below is the code I am using to plot the data. The data is extracted from a csv file; I have made sure there are no NaN or 0 values within. data_axis and data are exported dates and stock market prices respectively.
X= np.array(data, dtype=np.float128).reshape(-1,1)
noise = np.random.randn(n)
Y= np.array(data_axis, dtype=np.float128).reshape(-1,1)
width=[len(X[0]),5,3,len(Y[0])]
nn=NeuralNetwork(width,'Sigmoid')
plt.scatter(X,Y,color='r')
nn.plot(plt,'b') #linear
nn.train(X,Y,10)
nn.plot(plt,'g') #curve plotted according to training
plt.show()
How can I edit the code so that a curve is plotted and no overflow errors are raised? I did ask a similar question yesterday, and I believe the source of the overflow to be the input rather than the code. However, NumPy's polyfit is able to return regression curves for the dataset, so I would imagine there is a way to eliminate the overflow through the code. Thank you for reading.
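A minimal sketch of one possible remedy, assuming the overflow comes from large raw values reaching np.exp inside sigma: rescale both arrays into [0, 1] before building the network (which also matches the [0, 1] range that plot() samples). The helper name is hypothetical:

import numpy as np

def minmax_scale(a):
    # Map the values linearly onto [0, 1]; assumes max > min. Once scaled,
    # plain float64 is enough and np.exp(-x) stays well-conditioned.
    a = np.asarray(a, dtype=np.float64)
    return (a - a.min()) / (a.max() - a.min())

X = minmax_scale(data).reshape(-1, 1)
Y = minmax_scale(data_axis).reshape(-1, 1)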
I'm trying to create a network that would help predict stock prices for the following day. My input data are: open, high, low and close stock values, volume, index values, a few technical indicators, and an exchange rate; the output is the closing price of the next day. I'm using data uploaded from an Excel file.
I wrote a program, which I will paste below, but it doesn't seem to be working correctly. The network always returns 1, 0, or some other constant value (between 0 and 1).
I took the following steps so far:
tried to normalise the data like so: X_norm = X/(10 ** d), where d is the smallest number for which this condition is met: abs(X_norm) < 1 (see the sketch after this list). I did that for the whole set in Excel before dividing it into training and test sets.
shuffled the data before dividing it into training/test, so that learning examples are not from consecutive days
running the network on a smaller data set and on an example data set (I generated random numbers, derived an output from them with some simple math, and tried running the network on that)
changing the number of hidden neurons
changing the number of iterations (up to 1000, which was a lot for my computer considering the data set, so I didn't try more because it would take too much time)
changing the learning rate.
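A sketch of that decimal-scaling step in code, as referenced in the first list item; this is a reconstruction of the Excel step, not the poster's code:

import numpy as np

def decimal_scale(X):
    # Smallest integer d such that abs(X / 10**d) < 1 for every element;
    # assumes X holds at least one non-zero value.
    d = int(np.floor(np.log10(np.max(np.abs(X))))) + 1
    return X / (10 ** d), d

For example, an array whose largest magnitude is 999 gets divided by 10**3.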
No matter what steps I took, the outcome was always the same. I think my problem could be that I don't have a bias, but perhaps I also have other mistakes in my code contributing to this error (a sketch of adding bias terms follows the program below).
My program:
import numpy as np
import pandas as pd

df = pd.read_excel(r"path", sheet_name="DATA", index_col=0, header=0)
df = df.to_numpy()
np.random.shuffle(df)

X_data = df[:, 0:15]
X_data = X_data.reshape(1000, 1, 15)
print(f"X_data: {X_data}")
Y_data = df[:, 15]
Y_data = Y_data.reshape(1000, 1, 1)
print(f"Y_data: {Y_data}")

X = X_data[0:801]
x_test = X_data[801:]
y = Y_data[0:801]
y_test = Y_data[801:]
print(f"X_train: {X}")
print(f"x_test: {x_test}")
print(f"Y_train: {y}")
print(f"y_test: {y_test}")

rate = 0.2

class NeuralNetwork:
    def __init__(self):
        self.input_neurons = 15
        self.hidden1_neurons = 10
        self.hidden2_neurons = 5
        self.output_neuron = 1
        self.input_to_hidden1_w = np.random.random((self.input_neurons, self.hidden1_neurons))      # 15x10
        self.hidden1_to_hidden2_w = np.random.random((self.hidden1_neurons, self.hidden2_neurons))  # 10x5
        self.hidden2_to_output_w = np.random.random((self.hidden2_neurons, self.output_neuron))     # 5x1

    def activation(self, x):
        sigmoid = 1/(1+np.exp(-x))
        return sigmoid

    def activation_d(self, x):
        derivative = x * (1 - x)
        return derivative

    def feed_forward(self, X):
        self.z1 = np.dot(X, self.input_to_hidden1_w)
        self.z1_a = self.activation(self.z1)
        self.z2 = np.dot(self.z1_a, self.hidden1_to_hidden2_w)
        self.z2_a = self.activation(self.z2)
        self.z3 = np.dot(self.z2_a, self.hidden2_to_output_w)
        output = self.activation(self.z3)
        return output

    def backward(self, X, y, rate, output):
        error = y - output
        z3_error_delta = error * self.activation_d(output)
        z2_error = np.dot(z3_error_delta, np.transpose(self.hidden2_to_output_w))
        z2_error_delta = z2_error * self.activation_d(self.z2)
        z1_error = np.dot(z2_error_delta, np.transpose(self.hidden1_to_hidden2_w))
        z1_error_delta = z1_error * self.activation_d(self.z1)
        self.input_to_hidden1_w += rate * np.dot(np.transpose(X), z1_error_delta)
        self.hidden1_to_hidden2_w += rate * np.dot(np.transpose(self.z1), z2_error_delta)
        self.hidden2_to_output_w += rate * np.dot(np.transpose(self.z2), z3_error_delta)

    def train(self, X, y):
        output = self.feed_forward(X)
        self.backward(X, y, rate, output)

    def save_weights(self):
        np.savetxt("w1.txt", self.input_to_hidden1_w, fmt="%s")
        np.savetxt("w2.txt", self.hidden1_to_hidden2_w, fmt="%s")
        np.savetxt("w3.txt", self.hidden2_to_output_w, fmt="%s")

    def check(self, x_test, y_test):
        self.feed_forward(x_test)
        np.mean(np.square((y_test - self.feed_forward(x_test))))

Net = NeuralNetwork()
for l in range(100):
    for i, pattern in enumerate(X):
        for j, outcome in enumerate(y):
            print(f"#: {l}")
            print(f'''
# {str(l)}
# {str(X[i])}
# {str(y[j])}''')
            print(f"Predicted output: {Net.feed_forward(X[i])}")
            Net.train(X[i], y[j])
print(f"Error training: {(np.mean(np.square(y - Net.feed_forward(X))))}")
Net.save_weights()
for i, pattern in enumerate(x_test):
    for j, outcome in enumerate(y_test):
        Net.check(x_test[i], y_test[j])
print(f"Error test: {(np.mean(np.square(y_test - Net.feed_forward(x_test))))}")
I can't seem to get Theano to reshape my tensors the way I want. The reshaping in the code below is supposed to keep the first keep_dims dimensions and flatten all remaining ones into a single axis.
The code fails with IndexError: index out of bounds on the reshape line if I run it with a test value. Otherwise, the function seems to compile, but fails on the first real input with ValueError: total size of new array must be unchanged.
When I tried equivalent code in plain numpy, it worked normally. Is there anything I am doing wrong? Or is there an easy way to see the resulting dimensions that are used for the reshaping (ipdb does not help, since everything is a Theano variable)?
import random as rnd  # used by get_init_weights; missing from the original listing

import theano
import theano.tensor as T
import numpy as np

theano.config.compute_test_value = 'warn'
theano.config.optimizer = 'None'


class Layer(object):
    def __init__(self, name):
        self.name = name
        self.inputs = []
        self.outputs = []

    def get_init_weights(self, shape):
        rows, cols = shape
        w_init = np.reshape(np.asarray([rnd.uniform(-0.05, 0.05)
                                        for _ in xrange(rows * cols)]),
                            newshape=(rows, cols))
        return w_init


class Embedding(Layer):
    def __init__(self, name, dict_size, width, init='uniform_005'):
        super(Embedding, self).__init__(name)
        self.width = width
        self.dict_size = dict_size
        e_init = self.get_init_weights((dict_size, width))
        self.e = theano.shared(value=e_init, name=self.name)

    def connect(self, inputs):
        output = self.e[inputs]
        self.inputs.append(inputs)
        self.outputs.append(output)
        return output


class Flatten(Layer):
    def __init__(self, name, keep_dims=1):
        super(Flatten, self).__init__(name)
        self.params = []
        self.keep_dims = keep_dims

    def connect(self, inputs):
        keep_dims = self.keep_dims
        # this line fails
        output = inputs.reshape(inputs.shape[0:keep_dims] +
                                (T.prod(inputs.shape[keep_dims:]),),
                                ndim=(keep_dims + 1))
        return output


if __name__ == '__main__':
    x = T.itensor3('x')  # batch size * embedding size * number of different embeddings
    x.tag.test_value = np.random.randint(0, 50, (5, 20, 3)).astype('int32')
    emb_layer = Embedding('e', dict_size=50, width=10)
    y = emb_layer.connect(x)
    flat_layer = Flatten('f')
    y = flat_layer.connect(y)
    func = theano.function([x], y, allow_input_downcast=True)
The problem relates to how you're combining the two components of the new shape: the reshape command requires an lvector for the new shape, but inputs.shape[0:keep_dims] + (T.prod(inputs.shape[keep_dims:]),) mixes a symbolic shape vector with a Python tuple, so the + is interpreted as elementwise addition rather than concatenation.
Since you're using the test-value mechanism, you can debug this by simply printing the test values of the bits and pieces. For example, I used:
print inputs.shape.tag.test_value
print inputs.shape[0:keep_dims].tag.test_value
print inputs.shape[keep_dims:].tag.test_value
print T.prod(inputs.shape[keep_dims:]).tag.test_value
print (inputs.shape[0:keep_dims] + (T.prod(inputs.shape[keep_dims:]),)).tag.test_value
print T.concatenate([inputs.shape[0:keep_dims], [T.prod(inputs.shape[keep_dims:])]]).tag.test_value
This points to the fix: use T.concatenate to combine the kept dimensions with the product of the remaining dimensions.
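Applied to the Flatten layer, the fixed connect could look like this sketch:

def connect(self, inputs):
    keep_dims = self.keep_dims
    # Build the new shape as a single symbolic vector instead of adding a
    # Python tuple to a symbolic subtensor.
    new_shape = T.concatenate([inputs.shape[0:keep_dims],
                               [T.prod(inputs.shape[keep_dims:])]])
    return inputs.reshape(new_shape, ndim=keep_dims + 1)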
I'm trying to fit a 2D Gaussian to some greyscale image data, which is given as one 2D array.
The lmfit library implements an easy-to-use Model class that should be capable of doing this.
Unfortunately the documentation (http://lmfit.github.io/lmfit-py/model.html) only provides examples for 1D fitting. For my case I simply construct the lmfit Model with 2 independent variables.
The following code looks valid to me, but causes scipy to throw a "minpack.error: Result from function call is not a proper array of floats."
To sum it up: how do I feed 2D (x1, x2) -> (y) data to an lmfit Model?
Here is my approach:
Everything is packed in a GaussianFit2D class, but here are the important parts:
That's the Gaussian function. The documentation says this about user-defined functions:
Of course, the model function will have to return an array that will be the same size as the data being modeled. Generally this is handled by also specifying one or more independent variables.
I don't really get what this should mean, since for given values x1,x2 the only reasonable result is a scalar value.
def _function(self, x1, x2, amp, wid, cen1, cen2):
    val = (amp/(np.sqrt(2*np.pi)*wid)) * np.exp(-((x1-cen1)**2 + (x2-cen2)**2)/(2*wid**2))
    return val
Here the model is generated:
def _buildModel(self, **kwargs):
    model = lmfit.Model(self._function, independent_vars=["x1", "x2"],
                        param_names=["amp", "wid", "cen1", "cen2"])
    return model
That's the function that takes the data, builds the model and params, and calls lmfit's fit():
def fit(self, data, freeX, **kwargs):
    freeX = np.asarray(freeX, float)
    model = self._buildModel(**kwargs)
    params = self._generateModelParams(model, **kwargs)
    model.fit(data, x1=freeX[0], x2=freeX[1], params=params)
And finally, here is where this fit function gets called:
data = np.asarray(img, float)
gaussFit = GaussianFit2D()
x1 = np.arange(len(img[0, :]))
x2 = np.arange(len(img[:, 0]))
fit = gaussFit.fit(data, [x1, x2])
OK, I wrote to the devs and got the answer from them (thanks to Matt here).
The basic idea is to flatten all the input to 1D data, hiding the higher-dimensional structure from lmfit.
Here's how you do it.
Modify your function:
def function(self, x1, x2):
    return (x1 + x2).flatten()
Flatten the 2D input array you want to fit to:
...
data = data.flatten()
...
Modify the two 1D x-variables so that you have every combination of them:
...
x1n = []
x2n = []
for i in x1:
    for j in x2:
        x1n.append(i)
        x2n.append(j)
x1n = np.asarray(x1n)
x2n = np.asarray(x2n)
...
And throw everything into the fitter:
model.fit(data, x1=x1n, x2=x2n, params=params)
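As a side note, the same pairing can be produced without the nested loops; a sketch using numpy's meshgrid:

import numpy as np

# g1[i, j] == x1[i] and g2[i, j] == x2[j]; flattening both grids row-major
# reproduces exactly the ordering of the loops above.
g1, g2 = np.meshgrid(x1, x2, indexing='ij')
x1n = g1.flatten()
x2n = g2.flatten()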
Here is an example for your reference; I hope it helps.
import numpy
from lmfit import Model

def gaussian(x, cenu, cenv, wid):
    u = x[:, 0]
    v = x[:, 1]
    return (1/(2*numpy.pi*wid**2)) * numpy.exp(-(u-cenu)**2/(2*wid**2) - (v-cenv)**2/(2*wid**2))

data = numpy.empty((25, 3))
x = numpy.arange(-2, 3, 1)
y = numpy.arange(-2, 3, 1)
xx, yy = numpy.meshgrid(x, y)
data[:, 0] = xx.flatten()
data[:, 1] = yy.flatten()
data[:, 2] = gaussian(data[:, 0:2], 0, 0, 0.5)
print 'xx\n', xx
print 'yy\n', yy
print 'data to be fit\n', data[:, 2]

cu = 0.9
cv = 0.5
wid = 1
gmod = Model(gaussian)
gmod.set_param_hint('cenu', value=cu, min=cu-2, max=cu+2)
gmod.set_param_hint('cenv', value=cv, min=cv-2, max=cv+2)
gmod.set_param_hint('wid', value=wid, min=0.1, max=5)
params = gmod.make_params()
result = gmod.fit(data[:, 2], x=data[:, 0:2], params=params)
print result.fit_report(min_correl=0.25)
print result.best_values
print result.best_fit