Machine learning size of input and output

Machine learning size of input and output - python

At the moment I'm playing around with machine learning in python based on this website (part two is about image recognition) . I would like to train a network to recognize 4 specific points in am image but My problem is:
The neural network is created by simply multiplying matrices together, calculate the delta between the given output and the recognized output and recalculate the weights in the matrix. Now let' say I have a 600x800 pixel image as input. If I multiply this with my layer matrices I can't get a 4x2 matrix as output (x,y for each point).
My second problem is how much hidden layers should I have for this problem? Are more layers always better but need longer to calculate? Can we guess how much hidden layers we need or should we test some values and use the best of it?
My current neural network code:
from os.path import isfile
import numpy as np
class NeuralNetwork:
def __init__(self):
np.random.seed(1)
self.syn0 = 2 * np.random.random((480000,8)) - 1
#staticmethod
def relu(x, deriv=False):
if(deriv):
res = np.maximum(x, 0)
return np.minimum(res, 1)
return np.maximum(x, 0)
def train(self, imgIn, out):
l1 = NeuralNetwork.relu(np.dot(imgIn, self.syn0))
l1_error = out - l1
exp = NeuralNetwork.relu(l1,True)
l1_delta = l1_error * exp
self.syn0 += np.dot(imgIn.T,l1_delta)
return l1 #np.abs(out - l1)
def identify(self, img):
return NeuralNetwork.relu(np.dot(imgIn, self.syn0))

Problem 1. Input data.
You must serialize the input. For example, if you have one 600*800 pixel image, input must be 1*480000(rows, cols).
Row means the number of data and column means the dimension of data.
Problem 2. Classification.
If you want to classify 4 different type of classes, you should use (1,4) vector for output. For example, there are 4 classes ('Fish', 'Cat', 'Tiger', 'Car'). Then vector (1,0,0,0) means Fish.
Problem 3. Fully connected network.
I think the example in this homepage uses fully connected network. It uses whole image for classifying once. If you want to classify with subset of image. You should use convolution neural network or other approach. I don't know well about this.
Problem 4. Hyperparameter
It depends on data. you must test with various hyper parameter. then choose best hyper parameter.

Related

Finding patterns in time series with PyTorch

I started PyTorch with image recognition. Now I want to test (very basically) with pure NumPy arrays. I struggle with getting the setup to work, so basically I have vectors with values between 0 and 1 (normalized curves). Those vectors are always of length 1500 and I want to find e.g. "high values at the beginning" or "sine wave-like function", "convex", "concave" etc. stuff like that, so just shapes of those curves.
My training set consists of many vectors with their classes; I have chosen 7 classes. The net should be trained to classify a vector into one or more of those 7 classes (not one hot).
I'm struggling with multiple issues, but first my very basic Net
class Net(nn.Module):
def __init__(self, input_dim, hidden_dim, layer_dim, output_dim):
super(Net, self).__init__()
self.hidden_dim = hidden_dim
self.layer_dim = layer_dim
self.rnn = nn.RNN(input_dim, hidden_dim, layer_dim)
self.fc = nn.Linear(self.hidden_dim, output_dim)
def forward(self, x):
h0 = torch.zeros(self.layer_dim, x.size(1), self.hidden_dim).requires_grad_()
out, h0 = self.rnn(x, h0.detach())
out = out[:, -1, :]
out = self.fc(out)
return out
network = Net(1500, 70, 20, 7)
optimizer = optim.SGD(network.parameters(), lr=learning_rate, momentum=momentum)
This is just a copy-paste from an RNN demo. Here is my first issue. Is an RNN the right choice? It is a time series, but then again it is an image recognition problem when plotting the curve.
Now, this here is an attempt to batch the data. The data object contains all training curves together with the correct classifiers.
def train(epoch):
network.train()
network.float()
batching = True
index = 0
# monitor the cummulative loss for an epoch
cummloss = []
# start batching some curves
while batching:
optimizer.zero_grad()
# here I start clustering come curves to a batch and normalize the curves
_input = []
batch_size = min(len(data)-1, index+batch_size_train) - index
for d in data[index:min(len(data)-1, index+batch_size_train)]:
y = np.array(d['data']['y'], dtype='d')
y = np.multiply(y, y.max())
y = y[0:1500]
y = np.pad(y, (0, max(1500-len(y), 0)), 'edge')
if len(_input) == 0:
_input = y
else:
_input = np.vstack((_input, y))
input = torch.from_numpy(_input).float()
input = torch.reshape(input, (1, batch_size, len(y)))
target = np.zeros((1,7))
# the correct classes have indizes, to I create a vector with 1 at the correct locations
for _index in np.array(d['classifier']):
target[0,_index-1] = 1
target = torch.from_numpy(target)
# get the result form the network
output = network(input)
# is this a good loss function?
loss = F.l1_loss(output, target)
loss.backward()
cummloss.append(loss.item())
optimizer.step()
index = index + batch_size_train
if index > len(data):
print(np.mean(cummloss))
batching = False
for e in range(1, n_epochs):
print('Epoch: ' + str(e))
train(0)
The problem I'm facing right now is, the loss doesn't change very little, even with hundreds of epochs.
Are there existing examples of this kind of problem? I didn't find any, just pure png/jpg image recognition. When I convert the curves to png then I have a little issue to train a net, I took densenet and it worked just fine but it seems to be super overkill for this simple task.

This is just a copy-paste from an RNN demo. Here is my first issue. Is an RNN the right choice?
In theory what model you choose does not matter as much as "How" you formulate your problem.
But in your case the most obvious limitation you're going to face is your sequence length: 1500. RNN store information across steps and typically runs into trouble over long sequence with vanishing or exploding gradient.
LSTM net have been developed to circumvent this limitations with memory cell, but even then in the case of long sequence it will still be limited by the amount of information stored in the cell.
You could try using a CNN network as well and think of it as an image.
Are there existing examples of this kind of problem?
I don't know but I might have some suggestions : If I understood your problem correctly, you're going from a (1500, 1) input to a (7,1) output, where 6 of the 7 positions are 0 except for the corresponding class where it's 1.
I don't see any activation function, usually when dealing with multi class you don't use the output of the dense layer to compute the loss you apply a normalizing function like softmax and then you can compute the loss.

From your description of features you have in the form of sin like structures, the closes thing that comes to mind is frequency domain. As such, if you have and input image, just transform it to the frequency domain by a Fourier transform and use that as your feature input.
Might be best to look for such projects on the internet, one such project that you might want to read the research paper or video from this group (they have some jupyter notebooks for you to try) or any similar works. They use the furrier features, that go though a multi layer perceptron (MLP).
I am not sure what exactly you want to do, but seems like a classification task, you would use RNN if you want your neural network to work with a sequence. To me it seems like the 1500 dimensions are independent, and as such can be just treated as input.
Regarding the last layer, for a classification problem it usually is a probability distribution obtained by applying softmax (if only the classification is distinct - i.e. probability sums up to 1), in which, given an input, the net gives a probability of it being from each class. If we are predicting multiple classes we are going to use sigmoid as the last layer of the neural network.
Regarding your loss, there are many losses you can try and see if they are better. Once again, for different features you have to know what exactly is the measurement of distance (a.k.a. how different 2 things are). Check out this website, or just any loss function explanations on the net.
So you should try a simple MLP on top of fourier features as a starting point, assuming that is your feature vector.

Image Recognition is different from Time-Series data. In the imaging domain your data-set might have more similarity with problems like Activity-Recognition, Video-Recognition which have temporal component. So, I'd recommend looking into some models for those.
As for the current model, I'd recommend using LSTM instead of RNN. And also for classification you need to use an activation function in your final layer. This should softmax with cross entropy based loss or sigmoid with MSE loss.
Keras has a Timedistributed model which makes it easy to handle time components. You can use a similar approach with Pytorch by applying linear layers followed by LSTM.
Look into these for better undertsanding ::
Activity Recognition : https://www.narayanacharya.com/vision/2019-12-30-Action-Recognition-Using-LSTM
https://discuss.pytorch.org/t/any-pytorch-function-can-work-as-keras-timedistributed/1346
How to implement time-distributed dense (TDD) layer in PyTorch
Activation Function ::
https://pytorch.org/docs/stable/generated/torch.nn.Softmax.html

How to randomly set inputs to zero in keras during training autoencoder (callback)?

I am training 2 autoencoders with 2 separate input paths jointly and I would like to randomly set one of the input paths to zero.
I use tensorflow with keras backend (functional API).
I am computing a joint loss (sum of two losses) for backpropagation.
A -> A' & B ->B'
loss => l2(A,A')+l2(B,B')
networks taking A and B are connected in latent space.
I would like to randomly set A or B to zero and compute the loss only on the corresponding path, meaning if input path A is set to zero loss be computed only by using outputs of only path B and vice versa; e.g.:
0 -> A' & B ->B'
loss: l2(B,B')
How do I randomly set input path to zero? How do I write a callback which does this?

Maybe try the following:
import random
def decision(probability):
return random.random() < probability
Define a method that makes a random decision based on a certain probability x and make your loss calculation depend on this decision.
if current_epoch == random.choice(epochs):
keep_mask = tf.ones_like(A.input, dtype=float32)
throw_mask = tf.zeros_like(A.input, dtype=float32)
if decision(probability=0.5):
total_loss = tf.reduce_sum(reconstruction_loss_a * keep_mask
+ reconstruction_loss_b * throw_mask)
else:
total_loss = tf.reduce_sum(reconstruction_loss_a * throw_mask
+ reconstruction_loss_b * keep_mask)
else:
total_loss = tf.reduce_sum(reconstruction_loss_a + reconstruction_loss_b)
I assume that you do not want to set one of the paths to zero every time you update your model parameters, as then there is a risk that one or even both models will not be sufficiently trained. Also note that I use the input of A to create zero_like and one_like tensors as I assume that both inputs have the same shape; if this is not the case, it can easily be adjusted.
Depending on what your goal is, you may also consider replacing your input of A or B with a random tensor e.g. tf.random.normal based on a random decision. This creates noise in your model, which may be desirable, as your model would be forced to look into the latent space to try reconstruct your original input. This means precisely that you still calculate your reconstruction loss with A.input and A.output, but in reality your model never received the A.input, but rather the random tensor.
Note that this answer serves as a simple conceptual example. A working example with Tensorflow can be found here.

You can set an input to 0 simply:
A = A*random.choice([0,1])
This code can be used inside a loss function

Batch inference of softmax does not sum to 1

I am working with REINFORCE algorithm with PyTorch. I noticed that the batch inference/predictions of my simple network with Softmax doesn’t sum to 1 (not even close to 1). I am attaching a minimum working code so that you can reproduce it. What am I missing here?
import numpy as np
import torch
obs_size = 9
HIDDEN_SIZE = 9
n_actions = 2
np.random.seed(0)
model = torch.nn.Sequential(
torch.nn.Linear(obs_size, HIDDEN_SIZE),
torch.nn.ReLU(),
torch.nn.Linear(HIDDEN_SIZE, n_actions),
torch.nn.Softmax(dim=0)
)
state_transitions = np.random.rand(3, obs_size)
state_batch = torch.Tensor(state_transitions)
pred_batch = model(state_batch) # WRONG PREDICTIONS!
print('wrong predictions:\n', *pred_batch.detach().numpy())
# [0.34072137 0.34721774] [0.30972624 0.30191955] [0.3495524 0.3508627]
# DOES NOT SUM TO 1 !!!
pred_batch = [model(s).detach().numpy() for s in state_batch] # CORRECT PREDICTIONS
print('correct predictions:\n', *pred_batch)
# [0.5955179 0.40448207] [0.6574412 0.34255883] [0.624833 0.37516695]
# DOES SUM TO 1 AS EXPECTED

Although PyTorch lets us get away with it, we don’t actually provide an input with the right dimensionality. We have a model that takes one input and produces one output, but PyTorch nn.Module and its subclasses are designed to do so on multiple samples at the same time. To accommodate multiple samples, modules expect the zeroth dimension of the input to be the number of samples in the batch.
Deep Learning with PyTorch
That your model works on each individual sample is an implementation nicety. You have incorrectly specified the dimension for the softmax (across batches instead of across the variables), and hence when given a batch dimension it is computing the softmax across samples instead of within samples:
nn.Softmax requires us to specify the dimension along which the softmax function is applied:
softmax = nn.Softmax(dim=1)
In this case, we have two input vectors in two rows (just like when we work with
batches), so we initialize nn.Softmax to operate along dimension 1.
Change torch.nn.Softmax(dim=0) to torch.nn.Softmax(dim=1) to get appropriate results.

Should I transpose features or weights in Neural network?

I am learning Neural network.
Here is the complete piece of code:
https://github.com/udacity/deep-learning-v2-pytorch/blob/master/intro-to-pytorch/Part%201%20-%20Tensors%20in%20PyTorch%20(Exercises).ipynb
When I transpose features, I get the following output:
import torch
def activation(x):
return 1/(1+torch.exp(-x))
### Generate some data
torch.manual_seed(7) # Set the random seed so things are predictable
# Features are 5 random normal variables
features = torch.randn((1, 5))
# True weights for our data, random normal variables again
weights = torch.randn_like(features)
# and a true bias term
bias = torch.randn((1, 1))
product = features.t() * weights + bias
output = activation(product.sum())
tensor(0.9897)
However, if I transpose weights, I get a different output:
weights_prime = weights.view(5,1)
prod = torch.mm(features, weights_prime) + bias
y_hat = activation(prod.sum())
tensor(0.1595)
Why does this happen?
Update
I took a look at the solution:
https://github.com/udacity/deep-learning-v2-pytorch/blob/master/intro-to-pytorch/Part%201%20-%20Tensors%20in%20PyTorch%20(Solution).ipynb
And I saw this:
y = activation((features * weights).sum() + bias)
why can a matrix features(1,5) multiply another matrix weights(1,5) without transposing weights first?
Update 2
After read several posts, I realized that
matrixA * matrixB is different from torch.mm(matrixA,matrixB) and torch.matmul(matrixA,matrixB).
Could someone confirm my three understandings between?
So the * means element-wise multiplication, whereas torch.mm() and torch.matmul() are matrix-wise multiplication.
differences between torch.mm() and torch.matmul(): mm() is used specifically for 2 dimensions matrix, whereas matmul() can be used for more complicated cases.
In Neutral Network for this Udacity coding exercise mentioned in my above link, it needs element-wise multiplication.
Update 3
Just to bring in the Video screenshot for someone who has the same confusion:
And here is the video link: https://www.youtube.com/watch?time_continue=98&v=6Z7WntXays8&feature=emb_logo

Looking at https://pytorch.org/docs/master/generated/torch.nn.Linear.html
The typical linear (fully connected) layer in torch uses input features of shape (N,∗,in_features) and weights of shape (out_features,in_features) to produce an output of shape (N,*,out_features). Here N is the batch size, and * is any number of other dimensions (may be none).
The implementation is:
output = input.matmul(weight.t())
So, the answer is that neither of your formulas is correct according to convention; the standard formula is the one above.
You may use a non-standard shape since you're implementing things from scratch; as long as it's consistent it may work, but I don't recommend it for learning. It's unclear what 1 and 5 is in your code, but presumably you want 5 input features and one output feature, with a batch size of 1 as well. In which case the standard shapes should be input = torch.randn((1, 5)) for batch size=1 and in_features=5, and weights = torch.randn((5, 1)) for in_features=5 and out_features=1.
There is no reason why weights should ever be the same shape as features; thus weights = torch.randn_like(features) doesn't make sense.
Lastly, for your actual questions:
"Should I transpose features or weights in Neural network?" - in torch convention, you should transpose weights, but use matmul with the features first. Other frameworks may have a different convention; as long as in_features dimension of the weights is multiplied by the num_features dimension of the input, it would work.
"Why does this happen?" - these are two completely different calculations; there is no reason to think they would produce the same result.
"So the * means element-wise multiplication, whereas torch.mm() and torch.matmul() are matrix-wise multiplication." - Yes; mm is matrix-matrix only, matmul is vector-matrix or matrix-matrix, including batched versions of same - check the docs for everything matmul can do (which is kinda a lot).
"differences between torch.mm() and torch.matmul(): mm() is used specifically for 2 dimensions matrix, whereas matmul() can be used for more complicated cases." - Yes; the big difference is that matmul can broadcast. Use it when you specifically intend that; use mm to prevent unintentional broadcasting.
"In Neutral Network for this Udacity coding exercise mentioned in my above link, it needs element-wise multiplication." - I doubt it; it's probably an error in the Udacity code. This bit of code weights = torch.randn_like(features) looks like an error in any case; the dimensions of weights have a meaning different from the dimensions of features.

This line is taking an outer product between the two vectors.
product = features.t() * weights + bias
The resulting shape is 5x5.
If you change this to a dot product, then output will match y_hat.
product = torch.mm(weights, features.t()) + bias

How much freedom does TensorFlow take away from pure Python?

I'm trying to get into ML and Deep Learning. It's been educational for me to start out with this in Python rather than a different language where I would may lose focus of exactly what is really going on. I've looked across the internet for tutorials on Neural Networks in Python and TensorFlow seems to be dominating the field, is this true? I enjoy doing things on my own (pure language) but I haven't seen a lot of tutorials that teach this that don't involve TensorFlow or some other library (Keras, Scikit-learn, etc.); so now I've decided to look into these.
The question I have is: does TensorFlow take away from pure Python?
For example, this code is from a tutorial and it creates a simple neural net that predicts what the output will be based on three numbers (NOTE: I haven't checked how this code trains and could probably make this better if I did I were to make it again myself):
import numpy as np
class NeuralNetwork():
def __init__(self):
# seeding for random number generation
np.random.seed(1)
#converting weights to a 3 by 1 matrix with values from -1 to 1 and mean of 0
self.synaptic_weights = 2 * np.random.random((3, 1)) - 1
def sigmoid(self, x):
#applying the sigmoid function
return 1 / (1 + np.exp(-x))
def sigmoid_derivative(self, x):
#computing derivative to the Sigmoid function
return x * (1 - x)
def train(self, training_inputs, training_outputs, training_iterations):
#training the model to make accurate predictions while adjusting weights continually
for iteration in range(training_iterations):
#siphon the training data via the neuron
output = self.think(training_inputs)
#computing error rate for back-propagation
error = training_outputs - output
#performing weight adjustments
adjustments = np.dot(training_inputs.T, error * self.sigmoid_derivative(output))
self.synaptic_weights += adjustments
def think(self, inputs):
#passing the inputs via the neuron to get output
#converting values to floats
inputs = inputs.astype(float)
output = self.sigmoid(np.dot(inputs, self.synaptic_weights))
return output
if __name__ == "__main__":
#initializing the neuron class
neural_network = NeuralNetwork()
print("Beginning Randomly Generated Weights: ")
print(neural_network.synaptic_weights)
#training data consisting of 4 examples--3 input values and 1 output
training_inputs = np.array([[0,0,1],
[1,1,1],
[1,0,1],
[0,1,1]])
training_outputs = np.array([[0,1,1,0]]).T
#training taking place
neural_network.train(training_inputs, training_outputs, 15000)
print("Ending Weights After Training: ")
print(neural_network.synaptic_weights)
user_input_one = str(input("User Input One: "))
user_input_two = str(input("User Input Two: "))
user_input_three = str(input("User Input Three: "))
print("Considering New Situation: ", user_input_one, user_input_two, user_input_three)
print("New Output data: ")
print(neural_network.think(np.array([user_input_one, user_input_two, user_input_three])))
print("Wow, we did it!")
What does TensorFlow take away from this?
Thanks!

Hopefully this answers your question. Personally I like to think of TensorFlow as the OpenCV of Machine Learning. It contains large number of libraries that help Data Scientists/Developers who want to get things done more quickly. For example, TensorFlow has Keras library which faciliaties that addition and modification of neural network layers, as well as functions such as ImageDataGenerator which helps loading the trining/verification data and categorizing it. In your code, you had to manually implement sigmoid_derivative() function. However, in Tensorflow the function is already written for you
model = tf.keras.models.Sequential([
tf.keras.layers.Conv2D(8, (3,3), activation=tf.nn.relu, input_shape=(150,150,3)),
tf.keras.layers.MaxPooling2D(3,3),
tf.keras.layers.Conv2D(16, (3,3), activation=tf.nn.relu),
tf.keras.layers.MaxPooling2D(3,3),
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(16,activation=tf.nn.relu),
tf.keras.layers.Dense(1, activation='sigmoid')
])
The code above shows a simple neural network to identify dogs and cats (so original..). Notice activation='sigmoid'? That's the sigmoid function that without TensorFlow I would've had to type manually, like you did.

For anyone reading this I am going to share my thoughts. I chose to take the route of pure python with nothing like TensorFlow. I heard from other people that TensorFlow could take away an understanding that beginners need. At first, it was really hard for me to find anything that could help, but then I found everything all at once. The following links are the resources I used.
Theory:
The first 2-4 videos of this series by 3blue1brown: https://www.youtube.com/watch?v=WUvTyaaNkzM&list=PLZHQObOWTQDMsr9K-rj53DwVRMYO3t5Yr
All of the videos in this series by 3blue1brown: https://www.youtube.com/watch?v=aircAruvnKk&list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi
Implementation:
This video by the coding train: https://www.youtube.com/watch?v=jc2IthslyzM&list=PLRqwX-V7Uu6bCN8LKrcMa6zF4FPtXyXYj&index=8
Additional links for the calculus are: https://www.symbolab.com/cheat-sheets/Derivatives#, https://www.mathsisfun.com/calculus/derivatives-partial.html
And this entire series by the coding train: https://www.youtube.com/watch?v=XJ7HLz9VYz0&list=PLRqwX-V7Uu6Y7MdSCaIfsxc561QI0U0Tb
Summary
And that is it, that is all I really needed to get started it is awesome! One more thing I should note is that I eventually ended up using Processing.py, if you don't know what it is, it is not a library for neural networks, but instead allows for really easy graphics in python, there are functions like ellipse which just make a circle, this is the python implementation of what the coding train uses.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.