I'm going through the "Make Your Own Neural Network" book and working through the examples to implement my first NN. I understood the basic concepts, in particular the equation where the output is calculated by taking the matrix dot product of the weights and the inputs:
X = W * I
Where X is the output before applying the Sigmoid, W the link weights and I the inputs.
Now in the book, they have a function that takes this input as an array and then converts that array to a two-dimensional one. My understanding is that the value of X is calculated like this, based on:
W = [0.1, 0.2, 0.3
     0.4, 0.5, 0.6
     0.7, 0.8, 0.9]

I = [1
     2
     3]
So if I now pass in an array for my inputs like [1, 2, 3], why do I need to do the following to convert it to a 2-D array, as is done in the book:
inputs = numpy.array(inputs, ndmin=2).T
Any ideas?
Your input here is a one-dimensional list (or a one-dimensional array):
I = [1, 2, 3]
The idea behind this one-dimensional array is the following: if these numbers represent the width of a flower petal in centimetres, its length, and its weight in grams, then your flower petal has a width of 1 cm, a length of 2 cm, and a weight of 3 g.
Converting your input I to a 2-D array is necessary here for two things:
first, by default, converting this list to a NumPy array using numpy.array(inputs) yields an array of shape (3,), with no second dimension at all. Setting ndmin=2 forces the result to be two-dimensional with shape (1, 3), and the .T then transposes it to (3, 1). Having an explicit second dimension avoids NumPy-related surprises later, for instance when using matrix multiplication.
secondly, and perhaps more importantly, as I said in my comment, data in neural networks are conventionally stored this way, the idea being that each row in your array represents a different feature (so there is a separate row for each feature). In other words, it's just a conventional way of making sure you're not confusing apples and pears (in this case, length and weight).
So when you do inputs = numpy.array(inputs, ndmin=2).T, you end up with:
array([[1],  # width
       [2],  # length
       [3]]) # weight
and not:
array([1, 2, 3])
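If you want to convince yourself, here is a quick shape check you can run (using the 3x3 weight matrix from the question; the exact values don't matter):

import numpy

W = numpy.array([[0.1, 0.2, 0.3],
                 [0.4, 0.5, 0.6],
                 [0.7, 0.8, 0.9]])

inputs = [1, 2, 3]
print(numpy.array(inputs).shape)           # (3,)   -- plain 1-D array
print(numpy.array(inputs, ndmin=2).shape)  # (1, 3) -- forced to 2-D, as a single row
I = numpy.array(inputs, ndmin=2).T
print(I.shape)                             # (3, 1) -- transposed into a column

X = numpy.dot(W, I)                        # (3, 3) x (3, 1) -> (3, 1)
print(X.shape)                             # (3, 1) -- one output per node, as expected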
Hope it made things a bit clearer!
I know this may be a question of semantics, but I always see different articles explain the forward pass slightly differently. For example, sometimes they represent a forward pass to a hidden layer in a standard neural network as np.dot(x, W), and sometimes I see it as np.dot(W.T, x) or np.dot(W, x).
Take this image for example. They represent the input data as a matrix of [NxD] and weight data as [DxH] where H is the number of neurons in the hidden layer. This seems the most natural since input data will often be in tabular format with rows as samples and columns as features.
Now an example from the CS231n course notes. They talk about the example below and cite the code used to compute it as:
f = lambda x: 1.0/(1.0 + np.exp(-x)) # activation function (use sigmoid)
x = np.random.randn(3, 1) # random input vector of three numbers (3x1)
h1 = f(np.dot(W1, x) + b1) # calculate first hidden layer activations (4x1)
h2 = f(np.dot(W2, h1) + b2) # calculate second hidden layer activations (4x1)
out = np.dot(W3, h2) + b3 # output neuron (1x1)
Here W1 is [4x3] and x is [3x1]. I would expect the weight matrix to have dimensions [n_features, n_hidden_neurons], but in this example it just seems like they transposed it before it was used.
I guess I am just confused about the general conventions for how data should be shaped and used consistently when computing neural network forward passes. Sometimes I see a transpose, sometimes I don't. Is there a standard, preferred way to represent data in accordance with diagrams like these? This question may be silly, but I just wanted to discuss it a bit. Thank you.
TLDR;
NumPy computes the dot product of two arrays without worrying too much about the order in which they are passed IF one or both of them are 1-D arrays. For two 2-D arrays, the dot product requires the last axis of the first array to match the first axis of the second.
Detailed explanation;
From a mathematical point of view, a dot product of an (n x m) matrix and an (m x 1) vector requires the shared dimension m, and the resulting array has shape (n x 1). From an implementation perspective, however, NumPy adapts the shapes of the inputs if one or both of them are 1-D arrays. Per the official documentation:
If both a and b are 1-D arrays, it is an inner product of vectors (without complex conjugation).
If both a and b are 2-D arrays, it is matrix multiplication, but using matmul or a @ b is preferred.
If either a or b is 0-D (scalar), it is equivalent to multiply, and using numpy.multiply(a, b) or a * b is preferred.
If a is an N-D array and b is a 1-D array, it is a sum product over the last axis of a and b.
If a is an N-D array and b is an M-D array (where M >= 2), it is a sum product over the last axis of a and the second-to-last axis of b.
Therefore, consider the toy example -
x = np.array([4,5,6]) #shape - (3,)
W = np.array([[1,0,0],[0,1,0],[0,0,1]]) #shape - (3,3)
np.dot(x, W)
#Output - array([4, 5, 6])
np.dot(W.T, x)
#Output - array([4, 5, 6])
np.dot(W, x)
#Output - array([4, 5, 6])
All of the above run without shape errors because x is 1-D: NumPy treats it as a row or a column vector as needed, and the result is a 1-D array of shape (3,) rather than a (1,3) or (3,1) matrix. For a 1-D x, np.dot(x, W) and np.dot(W.T, x) are always equal; np.dot(W, x) gives the same values here only because W is symmetric (it is the identity), so for a general weight matrix the order still matters numerically.
If a is an N-D array and b is a 1-D array, it is a sum-product over the last axis of a and b.
This doesn't work as flexibly when we take the dot product of two 2-D arrays. Here you have to make sure that the last axis of the first array matches the first axis of the second array. For example:
W2 = np.array([[1,0,0],[0,1,0]]) #shape - (2,3)
np.dot(W, W2)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-913-839ef1feb8c0> in <module>
----> 1 np.dot(W, W2)
ValueError: shapes (3,3) and (2,3) not aligned: 3 (dim 1) != 2 (dim 0)
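As an aside (my own illustration, not part of the quoted error), transposing W2 so that the shared axis of length 3 comes first makes the shapes align:

np.dot(W, W2.T)
#Output - array([[1, 0],
#                [0, 1],
#                [0, 0]])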
I'm new to numpy, and found some behavior that seems strange to me.
I'm implementing the logistic regression cost function. Here I have 2 column vectors with the same dimensions and the same dtype (float). y contains a bunch of zeros and ones, and a contains float numbers in the range (-1, 1).
At some point I need their dot product, so I transpose one and multiply them:
x = y.T @ a
But when I use
x = y @ a.T
performance occasionally decreases by about a factor of 3, while the results are the same.
Why is this so? Aren't the operations the same?
Thanks.
The performance decreases, and you get a very different answer!
For vector multiplication (unlike number multiplication) a @ b != b @ a. In your case (assuming column vectors), a.T @ b is a number, but a @ b.T is a full-blown matrix! So, if your vectors are both of shape (y, 1), the last operation will result in a (y, y) matrix, which may be pretty huge. Of course, it takes way more time to compute such a matrix (i.e. add a whole lot of numbers and produce a whole lot of numbers) than to add a bunch of numbers and produce one single number.
That's how matrix (or vector) multiplication works.
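A quick illustration with made-up column vectors (not the asker's actual data):

import numpy as np

y = np.random.rand(10000, 1)   # column vector, shape (10000, 1)
a = np.random.rand(10000, 1)   # column vector, shape (10000, 1)

(y.T @ a).shape   # (1, 1)         -- essentially a single number
(y @ a.T).shape   # (10000, 10000) -- a huge matrix of all pairwise products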
I have a batch of N integer sequences of length L, which is embedded into an N*L*d tensor. These sequences are auto-encoded by my network architecture. So, I have:
from theano import tensor as T
X = T.imatrix('X') # N*L elements in [0,C]
EMB = T.tensor3('Embedding') # N*L*d
... # some code goes here :-)
PY = T.tensor3('PY') # N*L*C probabilities of the predicted classes in [0,C]
cost = -T.log(PY[X])
As far as I can tell, this kind of indexing only works along the first dimension of the tensor, so I would have to use theano.scan. Is there a way to index the tensor directly?
Sounds like you want a 3 dimensional version of theano.tensor.nnet.categorical_crossentropy?
If so, then I think you could simply flatten the matrix of true class label indexes into a vector and the 3D tensor of predicted class probabilities into a matrix and then use the built in function.
cost = T.nnet.categorical_crossentropy(
    Y.reshape((Y.shape[0] * Y.shape[1], Y.shape[2])),  # (N*L, C) matrix of predicted probabilities
    X.flatten())                                        # (N*L,) vector of true class indexes
The order of entries in Y may need to be adjusted first (e.g. via a dimshuffle) to make sure the entries in the matrix and vector being compared correspond to each other.
Here we assume, as the question suggests, that the sequences are not padded -- they are all exactly L elements in length. If the sequences are actually padded then you may need to do something much more complicated to avoid computing cost elements inside the padding regions.
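To see why the flattening lines up, here is a small plain-NumPy sketch with made-up shapes (N=2, L=3, C=4; none of these names come from the original post):

import numpy as np

N, L, C = 2, 3, 4                          # batch size, sequence length, number of classes
PY = np.random.rand(N, L, C)               # predicted class probabilities, N*L*C
X = np.random.randint(0, C, size=(N, L))   # true class indexes, N*L

PY_flat = PY.reshape(N * L, C)             # (N*L, C) matrix of probabilities
X_flat = X.flatten()                       # (N*L,) vector of labels

# with C-ordered reshaping, row i*L + j of PY_flat still corresponds to label X[i, j]
assert np.allclose(PY_flat[1 * L + 2], PY[1, 2])
assert X_flat[1 * L + 2] == X[1, 2]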
I'm trying to use numpy and I couldn't figure out how to properly define an n by n matrix in numpy.
I've used numpy.zeros(n,n)... but I'm not really sure if it is ok.
Is it correct to use numpy like this?
I'm trying to get (matrix^T * vector) - vector.
matrix = np.zeros((n,n))
start = [(1/float(n)) for _ in range(n)]
vector = np.array(start)
newvector = np.dot(np.transpose(matrix) , vector)
ans= np.subtract(newvector , vector)
I'm asking this because I'm getting the wrong results and I'm not sure where my problem is.
To define a matrix in numpy, you have several choices:
numpy.zeros defines a matrix filled with zeros.
numpy.ones defines a matrix filled with ones.
numpy.array defines a matrix based on something else (a list, for example)
numpy.empty defines a matrix without assigning values to it (so it contains whatever happens to be in memory at the place it was allocated).
All those functions take as their first argument a tuple with the dimensions of the matrix. This is why the parentheses are doubled.
With numpy, you can use any of the usual operators (+, -, *, /, **), which are performed elementwise.
To perform matrix multiplication, you need to use the numpy.dot function.
You can then write your function as:
n = 10
matrix = numpy.zeros((n,n))
vector = numpy.ones(n) / n
newvector = numpy.dot(matrix.T, vector)
ans = newvector - vector
But I suppose that matrix should be something other than a matrix of zeros, or the transpose operation isn't needed.
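To make the elementwise-versus-matrix-product point above concrete, here is a tiny example (the arrays are just made up for illustration):

import numpy

A = numpy.array([[1, 2], [3, 4]])
B = numpy.array([[10, 20], [30, 40]])

A * B            # elementwise:    array([[ 10,  40], [ 90, 160]])
numpy.dot(A, B)  # matrix product: array([[ 70, 100], [150, 220]])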
Basically, you are using numpy the right way. To make things easier, I would write the start vector in a different way and use array methods to calculate the desired values.
n = 10
matrix = np.zeros((n, n))
vector = np.ones((n,)) * 1.0/n
new_vector = matrix.T.dot(vector)
ans = new_vector - vector
print ans
# Output: [-0.1 -0.1 -0.1 -0.1 -0.1 -0.1 -0.1 -0.1 -0.1 -0.1]
The output should be correct (the matrix times the vector is a vector of zeros, and subtracting the vector gives minus one divided by ten in each entry, voila). I'm not quite sure about the general form of an NxM matrix and the usage of transpose (that would need another minute to think about ;-) )
In addition to the answer by @CharlesBrunet, there is a specialized class for mathematical matrices where A*B is the standard matrix multiplication (as opposed to element-wise).
numpy.matrix
Returns a matrix from an array-like object, or from a string of data. A matrix is a specialized 2-D array that retains its 2-D nature through operations. It has certain special operators, such as * (matrix multiplication) and ** (matrix power).
Creation examples from the docs:
>>> a = numpy.matrix('1 2; 3 4')
>>> print a
[[1 2]
[3 4]]
>>> numpy.matrix([[1, 2], [3, 4]])
matrix([[1, 2],
[3, 4]])
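And to show the special operators mentioned above in action (a quick example of my own, not from the docs):

>>> a = numpy.matrix('1 2; 3 4')
>>> a * a            # matrix multiplication, not elementwise
matrix([[ 7, 10],
        [15, 22]])
>>> a ** 2           # matrix power, same as a * a
matrix([[ 7, 10],
        [15, 22]])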
I'm looking to calculate a least squares linear regression from an N by M matrix and a set of known, ground-truth solutions in an N by 1 matrix. From there, I'd like to get the slope, intercept, and residual value of each regression. The basic idea is that I know the actual value that should be predicted for each sample in a row of N, and I'd like to determine which set of predicted values in a column of M is most accurate using the residuals.
I don't describe matrices well, so here's a drawing:
(N,M) matrix with predicted values for each row N
in each column of M...
##NOTE: Values of M and N are not actually 4 and 3, just examples
4 columns in "M"
[1, 1.1, 0.8, 1.3]
[2, 1.9, 2.2, 1.7] 3 rows in "N"
[3, 3.1, 2.8, 3.3]
(1,N) matrix with actual values of N
[1]
[2] Actual value of each sample N, in a single column
[3]
So again, for clarity's sake, I'm looking to calculate the lstsq regression between each column of the (N,M) matrix and the (1,N) matrix.
For instance, the regression between
[1] and [1]
[2] [2]
[3] [3]
then the regression between
[1] and [1.1]
[2] [1.9]
[3] [3.1]
and so on, outputting the slope, intercept, and standard error (average residual) for each regression calculated.
So far in the numpy/scipy documentation and around the 'net, I've only found examples computing one column at a time. I had thought numpy had the capability to compute regressions on each column in a set with the standard
np.linalg.lstsq(arrayA,arrayB)
But that returns the error
ValueError: array dimensions must agree except for d_0
Do I need to split the columns into their own arrays, then compute one at a time?
Is there a parameter or matrix operation I need to use to have numpy calculate the regressions on each column independently?
I feel like it should be simpler? I've looked it all over, and I can't seem to find anyone doing something similar.
Maybe you switched A and b?
The following works for me:
A = np.random.rand(4) + np.arange(3)[:, None]
# A is now a (3,4) array
b = np.arange(3)
np.linalg.lstsq(A, b)
The 0th dimension of arrayB must be the same as the 0th dimension of arrayA (ref: the official documentation of np.linalg.lstsq). You need matrices with dimensions (N, M) and (N, 1) or (N, M) and (N) instead of the (N,M) and (1,N) matrices you're using now.
Note that (N, 1) and (N,) shaped arrays for b will give identical results -- but the shapes of the returned arrays will be different.
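A quick check of that claim (made-up data, on a recent NumPy; rcond=None just silences the default-rcond warning on newer versions):

import numpy as np

A = np.random.rand(5, 3)              # (N, M)
b_1d = np.random.rand(5)              # shape (5,)
b_2d = b_1d.reshape(5, 1)             # shape (5, 1)

x_1d = np.linalg.lstsq(A, b_1d, rcond=None)[0]   # solution has shape (3,)
x_2d = np.linalg.lstsq(A, b_2d, rcond=None)[0]   # solution has shape (3, 1)

assert np.allclose(x_1d, x_2d.ravel())           # same values, different shapes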
I get a slightly different exception from you, but that may be due to different versions (I am using Python 2.7, Numpy 1.6 on Windows):
>>> A = np.arange(12).reshape(3, 4)
>>> b = np.arange(3).reshape(1, 3)
>>> np.linalg.lstsq(A,b)
# This gives "LinAlgError: Incompatible dimensions" exception
>>> np.linalg.lstsq(A,b.T)
# This works, note that I am using the transpose of b here