Debug array in jax vmap function - python

Dear JAX experts, I need your kind help.
Here is a working example (I have followed the advice to simplify my code, although I am not expert enough in JAX, or in Python, to guess at the heart of the mechanism involved in vmap):
def jax_kernel(rng_key, logpdf, position, log_prob):
    key, subkey = jax.random.split(rng_key)
    move_proposals = jax.random.normal(key, shape=position.shape) * 0.1
    proposal = position + move_proposals
    proposal_log_prob = logpdf(proposal)
    return proposal, proposal_log_prob
def jax_sampler(rng_key, n_samples, logpdf, initial_position):
    def mh_update(i, state):
        key, positions, log_prob = state
        _, key = jax.random.split(key)
        print(f"mh_update: positions[{i-1}]:", jnp.asarray(positions[i-1]))
        new_position, new_log_prob = jax_kernel(key, logpdf, positions[i-1], log_prob)
        positions = positions.at[i].set(new_position)
        return (key, positions, new_log_prob)

    # all positions structure should be set before lax.fori_loop
    print("initial_position shape:", initial_position.shape)
    all_positions = jnp.zeros((n_samples,) + initial_position.shape)
    all_positions = all_positions.at[0, 0].set(1.)
    all_positions = all_positions.at[0, 1].set(2.)
    all_positions = all_positions.at[0, 2].set(2.)
    print("all_positions init:", all_positions.shape)
    logp = logpdf(all_positions[0])
    # use a for-loop instead of jax.lax.fori_loop to be able to debug mh_update
    initial_state = (rng_key, all_positions, logp)
    val = initial_state
    for i in range(1, n_samples):
        val = mh_update(i, val)
    rng_key, all_positions, log_prob = val
    # return all the positions of the parameters (n_chains, n_samples, n_dim)
    return all_positions
def func(par):
    # yi (the observed data) is defined elsewhere in the original script
    xi = jnp.asarray(sci_stats.uniform.rvs(size=10))
    val = xi * par[1] + par[0]
    return jnp.sum(jax.scipy.stats.norm.logpdf(x=val, loc=yi, scale=par[2]))
n_dim = 3      # number of parameters, i.e. (a, b, s)
n_samples = 5  # number of samples per chain
n_chains = 4   # number of MCMC chains
rng_key = jax.random.PRNGKey(42)
rng_keys = jax.random.split(rng_key, n_chains)
initial_position = jnp.ones((n_dim, n_chains))
print("main initial_position shape", initial_position.shape)
run = jax.vmap(jax_sampler, in_axes=(0, None, None, 1), out_axes=0)
all_positions = run(rng_keys, n_samples, lambda p: func(p), initial_position)
print("all_positions:", all_positions)
My question concerns the evolution of the dimensions printed by print(f"mh_update: positions[{i-1}]:", jnp.asarray(positions[i-1])). I do not understand why positions[i-1] starts with dimension n_dim and then switches to n_chains x n_dim.
Thanks in advance for your comments.
Here is the complete output:
main initial_position shape (3, 4)
initial_position shape: (3,)
all_positions init: (5, 3)
mh_update: positions[0]: [1. 2. 2.]
mh_update: positions[1]: Traced<ShapedArray(float32[3])>with<BatchTrace(level=1/0)>
with val = DeviceArray([[0.9354116 , 1.7876872 , 1.8443539 ],
[0.9844745 , 2.073029 , 1.9511036 ],
[0.98202926, 2.0109322 , 2.094176 ],
[0.9536771 , 1.9731759 , 2.093319 ]], dtype=float32)
batch_dim = 0
mh_update: positions[2]: Traced<ShapedArray(float32[3])>with<BatchTrace(level=1/0)>
with val = DeviceArray([[1.0606856, 1.6707807, 1.8377957],
[1.0465866, 1.9754674, 1.7009288],
[1.1107644, 2.0142047, 2.190575 ],
[1.0089972, 1.9953227, 1.996874 ]], dtype=float32)
batch_dim = 0
mh_update: positions[3]: Traced<ShapedArray(float32[3])>with<BatchTrace(level=1/0)>
with val = DeviceArray([[1.0731456, 1.644405 , 2.1343162],
[1.0599504, 2.0121546, 1.6867112],
[1.0585173, 1.9661485, 2.1573594],
[1.1213307, 1.9335203, 1.9683584]], dtype=float32)
batch_dim = 0
all_positions: [[[1. 2. 2. ]
[0.9354116 1.7876872 1.8443539 ]
[1.0606856 1.6707807 1.8377957 ]
[1.0731456 1.644405 2.1343162 ]
[1.0921828 1.5742197 2.058759 ]]
[[1. 2. 2. ]
[0.9844745 2.073029 1.9511036 ]
[1.0465866 1.9754674 1.7009288 ]
[1.0599504 2.0121546 1.6867112 ]
[1.0835105 2.0051234 1.4766487 ]]
[[1. 2. 2. ]
[0.98202926 2.0109322 2.094176 ]
[1.1107644 2.0142047 2.190575 ]
[1.0585173 1.9661485 2.1573594 ]
[1.1728328 1.981367 2.180744 ]]
[[1. 2. 2. ]
[0.9536771 1.9731759 2.093319 ]
[1.0089972 1.9953227 1.996874 ]
[1.1213307 1.9335203 1.9683584 ]
[1.1148386 1.9598911 2.1721165 ]]]

In the first iteration, you print a concrete array that you have constructed within a vmapped function. It is a float32 array of shape (3,).
After the first iteration, you've constructed a new array via operations on a vmapped input. When you vmap an input like this, JAX replaces your input array with a tracer that is an abstract representation of your input; the printed value looks like this:
Traced<ShapedArray(float32[3])>with<BatchTrace(level=1/0)>
with val = DeviceArray([[1.0731456, 1.644405 , 2.1343162],
[1.0599504, 2.0121546, 1.6867112],
[1.0585173, 1.9661485, 2.1573594],
[1.1213307, 1.9335203, 1.9683584]], dtype=float32)
The float32[3] indicates that this tracer represents an array of float32 values of shape (3,): that is, it still has the same type and shape as in the first iteration. But in this case it is not a concrete array with three elements; it is a batched tracer representing the value at each iteration of the vmapped input. The power of the vmap transform is that JAX tracks all implied iterations of the vmapped computation in one pass: in the tracer representation, the rows of val show you the intermediate values for all the vmapped iterations.
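To make this concrete, here is a minimal sketch (separate from the question's code) that reproduces the behavior: printing inside a vmapped function shows a batched tracer whose val stacks the values from all mapped iterations.
import jax
import jax.numpy as jnp

def f(x):
    y = x + 1       # y is a batched tracer under vmap
    print("y:", y)  # prints Traced<ShapedArray(float32[3])> with a (4, 3) val
    return y

batched = jnp.zeros((4, 3))  # 4 mapped iterations over arrays of shape (3,)
jax.vmap(f)(batched)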
To better understand how JAX tracing works, a good read is How to Think in JAX in the JAX documentation.

Related

tf.contrib.signal.stft returns an empty matrix

This is the piece of code I run:
import tensorflow as tf
sess = tf.InteractiveSession()
filename = 'song.mp3' # 30 second mp3 file
SAMPLES_PER_SEC = 44100
audio_binary = tf.read_file(filename)
pcm = tf.contrib.ffmpeg.decode_audio(audio_binary, file_format='mp3', samples_per_second=SAMPLES_PER_SEC, channel_count = 1)
stft = tf.contrib.signal.stft(pcm, frame_length=1024, frame_step=512, fft_length=1024)
sess.close()
The mp3 file is properly decoded because print(pcm.eval().shape) returns:
(1323119, 1)
And there are even some actual non-zero values when I print them with print(pcm.eval()[1000:1010]):
[[ 0.18793298]
[ 0.16214484]
[ 0.16022217]
[ 0.15918455]
[ 0.16428113]
[ 0.19858395]
[ 0.22861415]
[ 0.2347789 ]
[ 0.22684409]
[ 0.20728172]]
But for some reason print(stft.eval().shape) evaluates to:
(1323119, 0, 513) # why the zero dimension?
And therefore print(stft.eval()) is:
[]
According to this, the second dimension of the tf.contrib.signal.stft output is equal to the number of frames. Why are there no frames though?
tf.contrib.ffmpeg.decode_audio returns a tensor of shape (?, 1), i.e. one signal of ? samples, whereas tf.contrib.signal.stft expects a (signal_count, samples) tensor as input, so the pcm tensor has to be transposed beforehand. With the untransposed input, each of the 1323119 rows is treated as a separate signal of length 1; a length-1 signal is shorter than frame_length=1024, so it yields zero frames, which is where the zero dimension comes from.
Modifying the call like this does the trick:
stft = tf.contrib.signal.stft(tf.transpose(pcm), frame_length=1024, frame_step=512, fft_length=1024)
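As a sanity check, here is a quick sketch (assuming the default pad_end=False, for which the frame count is 1 + (samples - frame_length) // frame_step) of what the transposed call should produce:
samples, frame_length, frame_step = 1323119, 1024, 512
num_frames = 1 + (samples - frame_length) // frame_step if samples >= frame_length else 0
print(num_frames)  # 2583, so stft should have shape (1, 2583, 513)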

Tensorflow Grab Predictions and Indices for values above thresholds

What is the easiest way to grab the corresponding prediction values and indices based on those above a certain threshold?
Consider this problem:
sess = tf.InteractiveSession()
predictions = tf.constant([[ 0.32957435, 0.82079124, 0.54503286, 0.51966476, 0.63359714,
0.92034972, 0.13774526, 0.45154464, 0.18284607, 0.14604568],
[ 0.78612137, 0.98291659, 0.4841609 , 0.63260579, 0.21568334,
0.82978213, 0.05054879, 0.09517837, 0.28309393, 0.01788473],
[ 0.05706763, 0.24366784, 0.04608512, 0.32987678, 0.2342416 ,
0.91725373, 0.60084391, 0.51787591, 0.74161232, 0.30830121],
[ 0.67310858, 0.6250236 , 0.42477703, 0.37107778, 0.65123832,
0.97282803, 0.59533679, 0.49564457, 0.54935825, 0.63008392],
[ 0.70233917, 0.48129809, 0.59114349, 0.63535333, 0.71188867,
0.4799161 , 0.90896237, 0.86089945, 0.47896886, 0.83451629],
[ 0.82923532, 0.8950938 , 0.99231505, 0.05526769, 0.98151541,
0.18153167, 0.63851702, 0.07426929, 0.91846335, 0.81246626],
[ 0.12850153, 0.23018432, 0.29871917, 0.71228445, 0.13235569,
0.41061044, 0.98215759, 0.90024149, 0.53385031, 0.92247963],
[ 0.87011361, 0.44218826, 0.01772344, 0.87317121, 0.52231467,
0.86476815, 0.25352192, 0.31709731, 0.38249743, 0.74694788],
[ 0.15262914, 0.49544573, 0.49644637, 0.07461977, 0.13706958,
0.18619633, 0.86163998, 0.03700352, 0.51173556, 0.40018845]])
score_idx = tf.where(predictions > 0.8)
scores = tf.SparseTensor(score_idx, tf.gather_nd(predictions, score_idx), dense_shape=tf.shape(predictions, out_type=tf.int64))
dense_scores = tf.sparse_tensor_to_dense(scores)
print(sess.run([scores, dense_scores]))
I can easily get a sparse tensor that has all of the predictions above 0.8, but ultimately I am looking to return two separate 1D tensors:
Predicted Indices = list of indexes above threshold (0.8 in example)
Scores = the scores for the corresponding examples
So for the first row which is:
[ 0.32957435, 0.82079124, 0.54503286, 0.51966476, 0.63359714,
0.92034972, 0.13774526, 0.45154464, 0.18284607, 0.14604568]
I am looking to return:
predicted_indices = [1,5]
scores = [0.821, 0.920]
Is there a simple solution that I am missing?
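One possible approach (a sketch, not an answer taken from the thread; it reuses only the ops already shown in the question plus tf.boolean_mask): score_idx from tf.where already holds the [row, column] pairs above the threshold, and tf.gather_nd pulls out the matching scores, so per-row results can be selected with a boolean mask on the row index.
score_idx = tf.where(predictions > 0.8)        # shape (k, 2): [row, col] pairs
scores = tf.gather_nd(predictions, score_idx)  # shape (k,): the values above 0.8
row0_mask = tf.equal(score_idx[:, 0], 0)       # entries belonging to row 0
predicted_indices = tf.boolean_mask(score_idx[:, 1], row0_mask)
row0_scores = tf.boolean_mask(scores, row0_mask)
print(sess.run([predicted_indices, row0_scores]))
# roughly: [array([1, 5]), array([0.8208, 0.9203], ...)]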

Python numpy not multiplying matrix correctly.

I am attempting to pull the y values out of a neural network. The current problem seems to be that numpy is not multiplying the matrices as I expected. I have included the code and output for your review. Thank you in advance for any insights you can provide.
def columnToRow(column):
    newarray = np.array([column])
    return newarray

def calcIndividualOutput(indivInputs, weights, biases):
    # finds the resulting y values for one set of input data
    I_transposed = columnToRow(indivInputs)
    output = np.multiply(I_transposed, weights) + biases
    return output

def getOutputs(inputs, weights, biases):
    # iterates over each set of inputs to find corresponding outputs
    # returns output matrix
    i_len = len(inputs)-1
    outputs = []
    for i in range(0, i_len):
        result = calcIndividualOutput(inputs[i], weights, biases)
        outputs.append(np.tanh(result))
        if (i==i_len):
            print("Final Input reached:", i)
    return outputs
# Test Single line of Outputs should
#print("Resulting Outputs0:\n\n",resultingOutputs[0,0:])
# Testing
currI=data[0]
Itrans=columnToRow(currI)
print(" THE CURRENT I0\n\n",currI,"\n\n")
print("transposed I:\n\n",Itrans,"\n\n")
print("Itrans shape:\n\n",Itrans.shape,"\n\n")
print("Current biases:\n\n",model_l1_b,"\n\n")
print("Current biases shape:\n\n",model_l1_b.shape,"\n\n")
print("B trans:",b_trans,"\n\n")
print("B trans shape:",b_trans.shape,"\n\n")
print("Current weights:\n\n",model_l1_W,"\n\n")
print("Transposed weights\n\n",w_transposed,"\n\n")
print("wtrans shape:\n\n",w_transposed.shape,"\n\n")
#Test calcIndividualOutput
testOutput= calcIndividualOutput(currI,w_transposed,b_trans)
print("Test calcIndividualOutput:\n\n",testOutput,"\n\n")
print("Test calcIndividualOutput Shape:\n\n",testOutput.shape,"\n\n")
# Transpose weights to match dimensions of input
b_trans=columnToRow(model_l1_b)
w_transposed=np.transpose(model_l1_W)
resultingOutputs = getOutputs(data,w_transposed,b_trans)
Output:
THE CURRENT I0
[-0.66399151 -0.59143853 0.5230611 -0.52583802 -0.31089544 0.47396523
-0.7301591 -0.21042131 0.92044264 -0.48792791 -1.54127669]
transposed I:
[[-0.66399151 -0.59143853 0.5230611 -0.52583802 -0.31089544 0.47396523
-0.7301591 -0.21042131 0.92044264 -0.48792791 -1.54127669]]
Itrans shape:
(1, 11)
Current biases:
[ 0.04497563 -0.01878226 0.03285328 0.00443657 -0.10408497 0.03982726
-0.07724283]
Current biases shape:
(7,)
B trans: [[ 0.04497563 -0.01878226 0.03285328 0.00443657 -0.10408497 0.03982726
-0.07724283]]
B trans shape: (1, 7)
Current weights:
[[ 0.02534341 0.01163373 -0.20102289 0.23845847 0.20859972 -0.09515963
0.00744185 -0.06694793 -0.03806938 0.02241485 0.34134269]
[ 0.0828636 -0.14711063 0.44623381 0.0095899 0.41908434 -0.25378567
0.35789928 0.21531652 -0.05924326 -0.18556432 0.23026766]
[-0.23547475 -0.18090464 -0.15210266 0.10483326 -0.0182989 0.52936584
0.15671678 -0.64570689 -0.27296376 0.28720504 0.21922119]
[-0.17561196 -0.42502806 -0.34866759 -0.07662395 -0.02361901 -0.10330012
-0.2626377 0.19807351 0.20543958 -0.34499851 0.29347673]
[-0.04404973 -0.31600055 -0.22984107 0.21733086 -0.15065287 0.18301299
0.13399698 0.11884601 0.04380761 -0.03720044 0.0146924 ]
[ 0.25086868 0.15678053 0.30350113 0.13065964 -0.30319506 0.47015968
0.00549904 0.32486886 -0.00331726 0.22858304 0.16789439]
[-0.10196115 -0.03687141 -0.28674102 0.01066647 0.2475083 0.15808311
-0.1452509 0.09170815 -0.14578934 -0.07375327 -0.16524883]]
Transposed weights
[[ 0.02534341 0.0828636 -0.23547475 -0.17561196 -0.04404973 0.25086868
-0.10196115]
[ 0.01163373 -0.14711063 -0.18090464 -0.42502806 -0.31600055 0.15678053
-0.03687141]
[-0.20102289 0.44623381 -0.15210266 -0.34866759 -0.22984107 0.30350113
-0.28674102]
[ 0.23845847 0.0095899 0.10483326 -0.07662395 0.21733086 0.13065964
0.01066647]
[ 0.20859972 0.41908434 -0.0182989 -0.02361901 -0.15065287 -0.30319506
0.2475083 ]
[-0.09515963 -0.25378567 0.52936584 -0.10330012 0.18301299 0.47015968
0.15808311]
[ 0.00744185 0.35789928 0.15671678 -0.2626377 0.13399698 0.00549904
-0.1452509 ]
[-0.06694793 0.21531652 -0.64570689 0.19807351 0.11884601 0.32486886
0.09170815]
[-0.03806938 -0.05924326 -0.27296376 0.20543958 0.04380761 -0.00331726
-0.14578934]
[ 0.02241485 -0.18556432 0.28720504 -0.34499851 -0.03720044 0.22858304
-0.07375327]
[ 0.34134269 0.23026766 0.21922119 0.29347673 0.0146924 0.16789439
-0.16524883]]
wtrans shape:
(11, 7)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-162-7e8be1d52690> in <module>()
48 #Test calcIndividualOutput
49
---> 50 testOutput= calcIndividualOutput(currI,w_transposed,b_trans)
51 print("Test calcIndividualOutput:\n\n",testOutput,"\n\n")
52 print("Test calcIndividualOutput Shape:\n\n",testOutput.shape,"\n\n")
<ipython-input-162-7e8be1d52690> in calcIndividualOutput(indivInputs, weights, biases)
7 # finds the resulting y values for one set of input data
8 I_transposed= columnToRow(indivInputs)
----> 9 output = np.multiply(I_transposed, weights) + biases
10 return output
11
ValueError: operands could not be broadcast together with shapes (1,11) (11,7)
np.multiply multiplies arrays element-wise, but from the dimensions of your data I guess that you are looking for matrix multiplication. To get that, use np.dot.
(For vectors, the dot product maps R^n x R^n -> R; for 2-D arrays, np.dot performs matrix multiplication, which is probably what you want.)
If you're coming from Matlab, that's the same difference as between A * B and A .* B.
I think you are looking for np.matmul(a, b).
This is the row-by-column multiplication we actually do in math: if a has shape (A, B) and b has shape (B, C), then res = np.matmul(a, b) will have shape (A, C).
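For illustration, a minimal sketch of the fix applied to the shapes from the question (the names follow the question; the data here is random):
import numpy as np

I_transposed = np.random.random((1, 11))  # one input, as a row
w_transposed = np.random.random((11, 7))  # weights transposed to (11, 7)
b_trans = np.random.random((1, 7))        # biases as a (1, 7) row

output = np.dot(I_transposed, w_transposed) + b_trans  # (1, 11) x (11, 7) -> (1, 7)
print(output.shape)  # (1, 7)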

Creating a generator from list of sequences for RNN

I need to create a generator for my data to pass into my RNN training function. I have a list of patient samples, where each sample is a time series of length ni (which varies, annoyingly) in three dimensions, and I want to create batches of data where each sample in a batch belongs to only a single patient but each batch may contain samples from multiple patients. Doing it this way should maximise the number of samples I can train with, at no cost since my RNN is not stateful. At first I had the following function:
def dataIterator(rawDataList, config):
    batchSize, nSteps = config.batchSize, config.nSteps
    for rawData in rawDataList:
        dataLen, dataWidth = rawData.shape
        batchLen = dataLen // batchSize
        data = np.zeros([batchSize, batchLen, dataWidth], dtype=np.float32)
        for i in xrange(batchSize):
            data[i] = rawData[batchLen*i:batchLen*(i+1), :]
        epochSize = (batchLen - 1) // nSteps
        if epochSize == 0:
            raise ValueError('epoch_size == 0')
        for i in xrange(epochSize):
            x = data[:, i*nSteps:(i+1)*nSteps, :]
            y = data[:, i*nSteps+1:(i+1)*nSteps+1, :]
            yield (x, y)
However this trims each of the patient samples to fit the batch size. I want something that creates all possible batches, including the undersized one at the end. My unfamiliarity with generators has left me pretty confused: so far I've worked out that it's going to involve modulo arithmetic, but exactly how I'm not sure, so I've only got to this point:
def dataIterator(data, batchSize=batchSize, nSteps=nSteps, nDimensions=3):
    nTimePoints = sum([len(x) for x in data])
    totalBatchLen = 1 + (nTimePoints-1)//batchSize
    newData = np.zeros([batchSize, totalBatchLen, nDimensions])
    for i in xrange(batchSize):
        ...
EDIT
Here's a short example to show how I would solve the problem without using generators
import numpy as np
np.random.seed(42)
nPatients = 3
tsLength = 5
nDimensions = 3
rnnTSLength = 3
batchSize = 3
inputData = np.random.random((nPatients, tsLength, nDimensions))
inputData[1, :, :] *= 10
inputData[2, :, :] *= 100
outputData = []
for i in xrange(tsLength-rnnTSLength):
    outputData.append(inputData[0, i:i+rnnTSLength, :])
for i in xrange(tsLength-rnnTSLength):
    outputData.append(inputData[1, i:i+rnnTSLength, :])
for i in xrange(tsLength-rnnTSLength):
    outputData.append(inputData[2, i:i+rnnTSLength, :])
temp1 = np.array(outputData[:3])
temp2 = np.array(outputData[3:])
npOutput = np.array((temp1, temp2))
print npOutput
Which produces:
[[[[ 3.74540119e-01 9.50714306e-01 7.31993942e-01]
[ 5.98658484e-01 1.56018640e-01 1.55994520e-01]
[ 5.80836122e-02 8.66176146e-01 6.01115012e-01]]
[[ 5.98658484e-01 1.56018640e-01 1.55994520e-01]
[ 5.80836122e-02 8.66176146e-01 6.01115012e-01]
[ 7.08072578e-01 2.05844943e-02 9.69909852e-01]]
[[ 1.83404510e+00 3.04242243e+00 5.24756432e+00]
[ 4.31945019e+00 2.91229140e+00 6.11852895e+00]
[ 1.39493861e+00 2.92144649e+00 3.66361843e+00]]]
[[[ 4.31945019e+00 2.91229140e+00 6.11852895e+00]
[ 1.39493861e+00 2.92144649e+00 3.66361843e+00]
[ 4.56069984e+00 7.85175961e+00 1.99673782e+00]]
[[ 6.07544852e+01 1.70524124e+01 6.50515930e+00]
[ 9.48885537e+01 9.65632033e+01 8.08397348e+01]
[ 3.04613769e+01 9.76721140e+00 6.84233027e+01]]
[[ 9.48885537e+01 9.65632033e+01 8.08397348e+01]
[ 3.04613769e+01 9.76721140e+00 6.84233027e+01]
[ 4.40152494e+01 1.22038235e+01 4.95176910e+01]]]]
Which as you can see has two batches of size three, both of which contain two different 'patients' in them, but the time series for each 'patient' do not overlap.
It's not exactly clear what you are looking for; a small sample of input and desired output would help. Nevertheless, I'll take a stab at what I think you are asking:
def dataIterator(data, batchSize=batchSize):
    for patient_data in data:
        for n in range(0, len(patient_data), batchSize):
            yield patient_data[n:n+batchSize]
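Given the EDIT above, a variant closer to the expected output might look like this (a sketch assuming Python 3 and numpy: each window comes from a single patient, windows from different patients may share a batch, and the final batch may be undersized):
import numpy as np

def windowedBatches(data, rnnTSLength, batchSize):
    windows = []
    for patient in data:                          # patient: (tsLength, nDimensions)
        for i in range(len(patient) - rnnTSLength):
            windows.append(patient[i:i + rnnTSLength])
    for n in range(0, len(windows), batchSize):
        yield np.array(windows[n:n + batchSize])  # last batch may be smaller

# With the EDIT's inputData (3 patients, tsLength=5, rnnTSLength=3, batchSize=3)
# this yields two (3, 3, 3) batches matching npOutput.
for batch in windowedBatches(inputData, 3, 3):
    print(batch.shape)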

Different results of cost function in theano and numpy

I'm getting different results when calculating a negative log likelihood of a simple two layer neural net in theano and numpy.
This is the numpy code:
W1,b1,W2,b2 = model['W1'], model['b1'], model['W2'], model['b2']
N, D = X.shape
where model is a function that generates initial parameters and X is the input array.
z_1 = X
z_2 = np.dot(X,W1) + b1
z_3 = np.maximum(0, z_2)
z_4 = np.dot(z_3,W2)+b2
scores = z_4
exp_scores = np.exp(scores)
exp_sum = np.sum(exp_scores, axis = 1)
exp_sum.shape = (exp_scores.shape[0],1)
y_hat = exp_scores / exp_sum
loss = np.sum(np.log(y_hat[np.arange(y.shape[0]),y]))
loss = -1/float(y_hat.shape[0])*loss + reg/2.0*np.sum(np.multiply(W1,W1))+ reg/2.0*np.sum(np.multiply(W2,W2))
I'm getting a result of 1.3819194609246772, which is the correct value for the loss function. However my Theano code yields a value of 1.3715655944645178.
t_z_1 = T.dmatrix('z_1')
t_W1 = theano.shared(value = W1, name = 'W1', borrow = True)
t_b1 = theano.shared(value = b1, name = 'b1',borrow = True)
t_W2 = theano.shared(value = W2, name = 'W2')
t_b2 = theano.shared(value = b2, name = 'b2')
t_y = T.lvector('y')
t_reg = T.dscalar('reg')
first_layer = T.dot(t_z_1,W1) + t_b1
t_hidden = T.switch(first_layer > 0, 0, first_layer)
t_out = T.nnet.softmax(T.dot(t_hidden, W2)+t_b2)
t_cost = -T.mean(T.log(t_out)[T.arange(t_y.shape[0]),t_y],dtype = theano.config.floatX, acc_dtype = theano.config.floatX)+t_reg/2.0*T.sum(T.sqr(t_W1))+t_reg/2.0*T.sum(T.sqr(t_W2))
cost_func = theano.function([t_z_1,t_y,t_reg],t_cost)
loss = cost_func(z_1,y,reg)
I'm already getting wrong results when calculating the values in the output layer, and I'm not really sure what the problem could be. Does the shared function keep the dtype of the numpy array passed as the value argument, or is it converted to float32? Can anybody tell me what I'm doing wrong in the theano code?
EDIT: The problem seems to occur in the hidden layer, after applying the ReLU function. Here's a comparison of the theano and numpy results in each layer:
theano results of first layer
[[-0.3245614 -0.22532614 -0.12609087 -0.0268556 0.07237967 0.17161493
0.2708502 0.37008547 0.46932074 0.56855601]
[-0.26107962 -0.14975259 -0.03842555 0.07290148 0.18422852 0.29555556
0.40688259 0.51820963 0.62953666 0.7408637 ]
[-0.19759784 -0.07417904 0.04923977 0.17265857 0.29607737 0.41949618
0.54291498 0.66633378 0.78975259 0.91317139]
[-0.13411606 0.00139451 0.13690508 0.27241565 0.40792623 0.5434368
0.67894737 0.81445794 0.94996851 1.08547908]
[-0.07063428 0.07696806 0.2245704 0.37217274 0.51977508 0.66737742
0.81497976 0.9625821 1.11018444 1.25778677]]
numpy results of first layer
[[-0.3245614 -0.22532614 -0.12609087 -0.0268556 0.07237967 0.17161493
0.2708502 0.37008547 0.46932074 0.56855601]
[-0.26107962 -0.14975259 -0.03842555 0.07290148 0.18422852 0.29555556
0.40688259 0.51820963 0.62953666 0.7408637 ]
[-0.19759784 -0.07417904 0.04923977 0.17265857 0.29607737 0.41949618
0.54291498 0.66633378 0.78975259 0.91317139]
[-0.13411606 0.00139451 0.13690508 0.27241565 0.40792623 0.5434368
0.67894737 0.81445794 0.94996851 1.08547908]
[-0.07063428 0.07696806 0.2245704 0.37217274 0.51977508 0.66737742
0.81497976 0.9625821 1.11018444 1.25778677]]
theano results of hidden layer
[[-0.3245614 -0.22532614 -0.12609087 -0.0268556 0. 0. 0.
0. 0. 0. ]
[-0.26107962 -0.14975259 -0.03842555 0. 0. 0. 0.
0. 0. 0. ]
[-0.19759784 -0.07417904 0. 0. 0. 0. 0.
0. 0. 0. ]
[-0.13411606 0. 0. 0. 0. 0. 0.
0. 0. 0. ]
[-0.07063428 0. 0. 0. 0. 0. 0.
0. 0. 0. ]]
numpy results of hidden layer
[[ 0. 0. 0. 0. 0.07237967 0.17161493
0.2708502 0.37008547 0.46932074 0.56855601]
[ 0. 0. 0. 0.07290148 0.18422852 0.29555556
0.40688259 0.51820963 0.62953666 0.7408637 ]
[ 0. 0. 0.04923977 0.17265857 0.29607737 0.41949618
0.54291498 0.66633378 0.78975259 0.91317139]
[ 0. 0.00139451 0.13690508 0.27241565 0.40792623 0.5434368
0.67894737 0.81445794 0.94996851 1.08547908]
[ 0. 0.07696806 0.2245704 0.37217274 0.51977508 0.66737742
0.81497976 0.9625821 1.11018444 1.25778677]]
theano results of output
[[ 0.14393463 0.2863576 0.56970777]
[ 0.14303947 0.28582359 0.57113693]
[ 0.1424154 0.28544871 0.57213589]
[ 0.14193274 0.28515729 0.57290997]
[ 0.14171057 0.28502272 0.57326671]]
numpy results of output
[[-0.5328368 0.20031504 0.93346689]
[-0.59412164 0.15498488 0.9040914 ]
[-0.67658362 0.08978957 0.85616275]
[-0.77092643 0.01339997 0.79772637]
[-0.89110401 -0.08754544 0.71601312]]
I got the idea of using the switch() function for the ReLU layer from this post: Theano HiddenLayer Activation Function, and I don't really see how that function differs from the equivalent numpy code z_3 = np.maximum(0, z_2)?!
Solution to the first problem: T.switch(first_layer > 0, 0, first_layer) sets all the values greater than 0 to 0; it should be T.switch(first_layer < 0, 0, first_layer).
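A minimal check of that fix (a sketch; in later Theano versions T.nnet.relu would do the same job):
import numpy as np
import theano
import theano.tensor as T

x = T.dmatrix('x')
wrong = theano.function([x], T.switch(x > 0, 0, x))  # zeroes out the positives
right = theano.function([x], T.switch(x < 0, 0, x))  # a true ReLU

a = np.array([[-1.0, 2.0]])
print(wrong(a))  # [[-1.  0.]]
print(right(a))  # [[ 0.  2.]]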
EDIT2: The gradients that theano calculates differ significantly from the numerical gradients I was given. This is my implementation:
g_w1, g_b1, g_w2, g_b2 = T.grad(t_cost,[t_W1,t_b1,t_W2,t_b2])
grads = {}
grads['W1'] = g_w1.eval({t_z_1 : z_1, t_y : y,t_reg : reg})
grads['b1'] = g_b1.eval({t_z_1 : z_1, t_y : y,t_reg : reg})
grads['W2'] = g_w2.eval({t_z_1 : z_1, t_y : y,t_reg : reg})
grads['b2'] = g_b2.eval({t_z_1 : z_1, t_y : y,t_reg : reg})
This is an assignment for the Convolutional Neural Networks class that was offered by Stanford earlier this year and I think it's safe to say that their numerical gradients are probably correct. I could post the code to their numerical implementation though if required.
Using a relative error computed the following way:
def relative_error(num, ana):
    numerator = np.sum(np.abs(num-ana))
    denom = np.sum(np.abs(num))+np.sum(np.abs(ana))
    return numerator/denom
Calculating the numerical gradients with the eval_numerical_gradient method provided by the course gives the following relative errors for the gradients:
param_grad_num = {}
rel_error = {}
for param_name in grads:
    param_grad_num[param_name] = eval_numerical_gradient(lambda W: two_layer_net(X, model, y, reg)[0], model[param_name], verbose=False)
    rel_error[param_name] = relative_error(param_grad_num[param_name], grads[param_name])
{'W1': 0.010069468997284833,
'W2': 0.6628490408291472,
'b1': 1.9498867941113963e-09,
'b2': 1.7223972753120753e-11}
These are far too large for W1 and W2; the relative error should be less than 1e-8. Can anybody explain this or help in any way?
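A hedged observation, not from the original thread: in the theano graph above, T.dot uses the raw numpy arrays W1 and W2 rather than the shared variables t_W1 and t_W2 (e.g. first_layer = T.dot(t_z_1,W1) + t_b1), so T.grad can only see t_W1 and t_W2 through the regularization terms. That would leave the b1/b2 gradients correct while making the W1/W2 gradients wrong, which matches the relative errors reported above. If that is the cause, the forward pass would need to use the shared variables:
# use the shared variables in the forward pass, not the numpy arrays
first_layer = T.dot(t_z_1, t_W1) + t_b1
t_hidden = T.switch(first_layer < 0, 0, first_layer)
t_out = T.nnet.softmax(T.dot(t_hidden, t_W2) + t_b2)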
