Dear JAX experts, I need your kind help.
Here is a working example (I have followed the advice to simplify my code, although I am not expert enough in JAX or Python to guess what is at the heart of the mechanism involved in vmap):
def jax_kernel(rng_key, logpdf, position, log_prob):
    key, subkey = jax.random.split(rng_key)
    move_proposals = jax.random.normal(key, shape=position.shape) * 0.1
    proposal = position + move_proposals
    proposal_log_prob = logpdf(proposal)
    return proposal, proposal_log_prob
def jax_sampler(rng_key, n_samples, logpdf, initial_position):
    def mh_update(i, state):
        key, positions, log_prob = state
        _, key = jax.random.split(key)
        print(f"mh_update: positions[{i-1}]:", jnp.asarray(positions[i-1]))
        new_position, new_log_prob = jax_kernel(key, logpdf, positions[i-1], log_prob)
        positions = positions.at[i].set(new_position)
        return (key, positions, new_log_prob)

    # the full all_positions structure must be allocated before lax.fori_loop
    print("initial_position shape:", initial_position.shape)
    all_positions = jnp.zeros((n_samples,) + initial_position.shape)
    all_positions = all_positions.at[0, 0].set(1.)
    all_positions = all_positions.at[0, 1].set(2.)
    all_positions = all_positions.at[0, 2].set(2.)
    print("all_positions init:", all_positions.shape)
    logp = logpdf(all_positions[0])
    # use a plain Python for-loop instead of jax.lax.fori_loop to be able to debug mh_update
    initial_state = (rng_key, all_positions, logp)
    val = initial_state
    for i in range(1, n_samples):
        val = mh_update(i, val)
    rng_key, all_positions, log_prob = val
    # return all the positions of the parameters (n_chains, n_samples, n_dim)
    return all_positions
def func(par):
    xi = jnp.asarray(sci_stats.uniform.rvs(size=10))
    val = xi * par[1] + par[0]
    return jnp.sum(jax.scipy.stats.norm.logpdf(x=val, loc=yi, scale=par[2]))
n_dim = 3 # number of parameters ie. (a,b,s)
n_samples = 5 # number of samples per chain
n_chains = 4 # number of MCMC chains
rng_key = jax.random.PRNGKey(42)
rng_keys = jax.random.split(rng_key, n_chains)
initial_position = jnp.ones((n_dim, n_chains))
print("main initial_position shape",initial_position.shape)
run = jax.vmap(jax_sampler, in_axes=(0, None, None, 1), out_axes=0)
all_positions = run(rng_keys,n_samples,lambda p: func(p),initial_position)
print("all_positions:",all_positions)
My question concerns the dimension evolution shown by print(f"mh_update: positions[{i-1}]:", jnp.asarray(positions[i-1])). I do not understand why positions[i-1] starts with dimension n_dim and then switches to n_chains x n_dim.
Thanks in advance for your comments.
Here is the complete output:
main initial_position shape (3, 4)
initial_position shape: (3,)
all_positions init: (5, 3)
mh_update: positions[0]: [1. 2. 2.]
mh_update: positions[1]: Traced<ShapedArray(float32[3])>with<BatchTrace(level=1/0)>
with val = DeviceArray([[0.9354116 , 1.7876872 , 1.8443539 ],
[0.9844745 , 2.073029 , 1.9511036 ],
[0.98202926, 2.0109322 , 2.094176 ],
[0.9536771 , 1.9731759 , 2.093319 ]], dtype=float32)
batch_dim = 0
mh_update: positions[2]: Traced<ShapedArray(float32[3])>with<BatchTrace(level=1/0)>
with val = DeviceArray([[1.0606856, 1.6707807, 1.8377957],
[1.0465866, 1.9754674, 1.7009288],
[1.1107644, 2.0142047, 2.190575 ],
[1.0089972, 1.9953227, 1.996874 ]], dtype=float32)
batch_dim = 0
mh_update: positions[3]: Traced<ShapedArray(float32[3])>with<BatchTrace(level=1/0)>
with val = DeviceArray([[1.0731456, 1.644405 , 2.1343162],
[1.0599504, 2.0121546, 1.6867112],
[1.0585173, 1.9661485, 2.1573594],
[1.1213307, 1.9335203, 1.9683584]], dtype=float32)
batch_dim = 0
all_positions: [[[1. 2. 2. ]
[0.9354116 1.7876872 1.8443539 ]
[1.0606856 1.6707807 1.8377957 ]
[1.0731456 1.644405 2.1343162 ]
[1.0921828 1.5742197 2.058759 ]]
[[1. 2. 2. ]
[0.9844745 2.073029 1.9511036 ]
[1.0465866 1.9754674 1.7009288 ]
[1.0599504 2.0121546 1.6867112 ]
[1.0835105 2.0051234 1.4766487 ]]
[[1. 2. 2. ]
[0.98202926 2.0109322 2.094176 ]
[1.1107644 2.0142047 2.190575 ]
[1.0585173 1.9661485 2.1573594 ]
[1.1728328 1.981367 2.180744 ]]
[[1. 2. 2. ]
[0.9536771 1.9731759 2.093319 ]
[1.0089972 1.9953227 1.996874 ]
[1.1213307 1.9335203 1.9683584 ]
[1.1148386 1.9598911 2.1721165 ]]]
In the first iteration, you print a concrete array that you have constructed within a vmapped function. It is a float32 array of shape (3,).
After the first iteration, you've constructed a new array via operations on a vmapped input. When you vmap an input like this, JAX replaces your input array with a tracer that is an abstract representation of your input; the printed value looks like this:
Traced<ShapedArray(float32[3])>with<BatchTrace(level=1/0)>
with val = DeviceArray([[1.0731456, 1.644405 , 2.1343162],
[1.0599504, 2.0121546, 1.6867112],
[1.0585173, 1.9661485, 2.1573594],
[1.1213307, 1.9335203, 1.9683584]], dtype=float32)
The float32[3] indicates that this tracer represents an array of float32 values of shape (3,): that is, it still has the same type and shape as in the first iteration. But in this case it is not a concrete array with three elements; it is a batched tracer representing each iteration of the vmapped input. The power of the vmap transform is that JAX tracks all implied iterations of the vmapped computation in one pass: in the tracer representation, the rows of val show you the intermediate values for all the vmapped iterations.
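To make this concrete, here is a minimal, self-contained sketch (a toy function f and array x_batch of my own, not part of your sampler) showing the same effect: a value printed inside a vmapped function appears as a batched tracer whose val carries one row per mapped iteration.
import jax
import jax.numpy as jnp

def f(x):
    # inside f, x has logical shape (3,), but under vmap it is a batched tracer
    y = x + 1.0
    print("inside f:", y)  # prints Traced<ShapedArray(float32[3])> whose val has shape (4, 3)
    return y

x_batch = jnp.arange(12.0).reshape(4, 3)  # 4 "chains", 3 parameters
out = jax.vmap(f, in_axes=0)(x_batch)
print(out.shape)  # (4, 3): one row per vmapped iteration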
For more understanding of how JAX tracing works, a good read is How To Think In JAX in the JAX documentation.
My question is the following:
I have an array of feature vectors that correspond to several audio files. So if, for example, there are 10 audio files, then this array has length 10.
One of the features is itself a list (this list carries the information of a specific feature of the audio file), so for a given audio file the feature vector looks like this:
array([0.03861840871664194, 187.72393405210002, 62.59881268743305,
0.2911392405063291,
array([4963.40332031, 3229.98046875, 2691.65039062, 3208.44726562,
4338.94042969, 4220.5078125 , 4166.67480469, 4801.90429688,
5555.56640625, 5910.86425781, 6115.4296875 , 5706.29882812,
4984.93652344, 2756.25 , 1991.82128906, 2551.68457031,
2734.71679688, 2906.98242188, 3143.84765625, 3219.21386719,
3186.9140625 , 3165.38085938, 3068.48144531, 2465.55175781,
2110.25390625, 2508.61816406, 2993.11523438, 3843.67675781,
4715.77148438, 5652.46582031, 5480.20019531, 5792.43164062,
5932.39746094, 6244.62890625, 6072.36328125, 6201.5625 ,
6158.49609375, 6201.5625 , 6233.86230469, 6061.59667969])],
dtype=object)
Now when I try to feed this data into the svm model:
from sklearn import svm
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt
X_train, X_val, y_train, y_val = train_test_split(X,y,test_size=0.3)
model = svm.SVC()
model.fit(X_train,y_train)
yt_p = model.predict(X_train)
yv_p = model.predict(X_val)
I get this error ValueError: setting an array element with a sequence.
How can I structure my feature vector in order to be able to feed it to the svm?
EDIT:
Here I provide an example of X.
If we have 5 audio files, then X will be:
array([[0.017455393927437918, 227.66237105624407, 32.42076654734572,
0.3867924528301887,
array([1851.85546875, 2433.25195312, 3057.71484375, 3079.24804688,
3079.24804688, 3068.48144531, 3046.94824219, 3359.1796875 ,
3908.27636719, 4618.87207031, 4618.87207031, 4521.97265625,
4091.30859375, 3111.54785156, 3100.78125 , 2863.91601562,
1561.15722656, 1119.7265625 , 1065.89355469, 947.4609375 ,
979.76074219, 990.52734375, 990.52734375, 1356.59179688,
2077.95410156, 2993.11523438, 3025.41503906, 3068.48144531,
3079.24804688, 3090.01464844, 3100.78125 , 3111.54785156,
2993.11523438, 3100.78125 , 3079.24804688, 2853.14941406,
1205.859375 , 1281.22558594, 1614.99023438, 2131.78710938,
2325.5859375 , 2034.88769531, 1916.45507812, 1744.18945312,
1851.85546875, 2357.88574219, 2368.65234375, 1916.45507812,
1959.52148438, 1959.52148438, 1754.95605469, 1787.25585938,
2207.15332031])],
[0.03861840871664194, 187.72393405210002, 62.59881268743305,
0.2911392405063291,
array([4963.40332031, 3229.98046875, 2691.65039062, 3208.44726562,
4338.94042969, 4220.5078125 , 4166.67480469, 4801.90429688,
5555.56640625, 5910.86425781, 6115.4296875 , 5706.29882812,
4984.93652344, 2756.25 , 1991.82128906, 2551.68457031,
2734.71679688, 2906.98242188, 3143.84765625, 3219.21386719,
3186.9140625 , 3165.38085938, 3068.48144531, 2465.55175781,
2110.25390625, 2508.61816406, 2993.11523438, 3843.67675781,
4715.77148438, 5652.46582031, 5480.20019531, 5792.43164062,
5932.39746094, 6244.62890625, 6072.36328125, 6201.5625 ,
6158.49609375, 6201.5625 , 6233.86230469, 6061.59667969])],
[0.042435441297643324, 128.81225073038124, 20.912528554426807,
0.313953488372093,
array([4349.70703125, 4242.04101562, 4274.34082031, 4123.60839844,
4457.37304688, 4834.20410156, 4661.93847656, 4306.640625 ,
4231.27441406, 4543.50585938, 4435.83984375, 6201.5625 ,
8817.84667969, 8817.84667969, 742.89550781, 721.36230469,
732.12890625, 732.12890625, 710.59570312, 721.36230469,
925.92773438, 1119.7265625 , 1141.25976562, 1431.95800781,
7762.71972656, 7934.98535156, 7891.91894531, 7332.05566406,
3789.84375 , 2799.31640625, 2831.61621094, 2217.91992188,
581.39648438, 602.9296875 , 2217.91992188, 2228.68652344,
2368.65234375, 2519.38476562, 2863.91601562, 3682.17773438,
3649.87792969, 4188.20800781, 4112.84179688])],
[0.006295381642571726, 130.28309914454434, 5.193614287487564,
0.2411764705882353,
array([7978.05175781, 8010.3515625 , 8118.01757812, 8430.24902344,
8257.98339844, 8451.78222656, 8591.74804688, 8677.88085938,
8796.31347656, 8850.14648438, 8796.31347656, 8925.51269531,
6244.62890625, 344.53125 , 344.53125 , 1614.99023438,
2325.5859375 , 2971.58203125, 3316.11328125, 3617.578125 ,
3294.58007812, 2788.54980469, 2637.81738281, 2702.41699219,
2723.95019531, 3133.08105469, 3413.01269531, 5663.23242188,
5770.8984375 , 5577.09960938, 2228.68652344, 1604.22363281,
1690.35644531, 4123.60839844, 5566.33300781, 5803.19824219,
5749.36523438, 5846.26464844, 6772.19238281, 7073.65722656,
7622.75390625, 7859.61914062, 8236.45019531, 8441.015625 ,
8699.4140625 , 8807.08007812, 8742.48046875, 8667.11425781,
8710.18066406, 8947.04589844, 9140.84472656, 9130.078125 ,
8936.27929688, 8925.51269531, 8947.04589844, 8925.51269531,
9097.77832031, 9205.44433594, 9194.67773438, 9140.84472656,
9162.37792969, 9043.9453125 , 9162.37792969, 9108.54492188,
9183.91113281, 9280.81054688, 9270.04394531, 9108.54492188,
9076.24511719, 9356.17675781, 9226.97753906, 9216.2109375 ,
9248.51074219, 9140.84472656, 9237.74414062, 9334.64355469,
9259.27734375, 9226.97753906, 9216.2109375 , 9108.54492188,
9183.91113281, 9216.2109375 , 9248.51074219, 9259.27734375,
9183.91113281])],
[0.017070271599460656, 171.91660927761163, 26.854424936811768,
0.11188811188811189,
array([4715.77148438, 4629.63867188, 4898.80371094, 5275.63476562,
4941.87011719, 4532.73925781, 4618.87207031, 4995.703125 ,
4705.00488281, 4500.43945312, 4188.20800781, 4371.24023438,
4457.37304688, 4188.20800781, 4909.5703125 , 4877.27050781,
6761.42578125, 7708.88671875, 7719.65332031, 7956.51855469,
8484.08203125, 9033.17871094, 9043.9453125 , 9000.87890625,
9011.64550781, 9011.64550781, 9000.87890625, 9108.54492188,
8817.84667969, 6686.05957031, 1808.7890625 , 1830.32226562,
1851.85546875, 1636.5234375 , 1022.82714844, 1281.22558594,
1927.22167969, 1948.75488281, 1302.75878906, 1399.65820312,
1873.38867188, 1959.52148438, 7245.92285156, 9011.64550781,
9420.77636719, 9549.97558594, 9453.07617188, 9431.54296875,
9410.00976562, 9248.51074219, 9151.61132812, 9194.67773438,
8968.57910156, 8634.81445312, 8268.75 , 7439.72167969,
5501.73339844, 5232.56835938, 5103.36914062, 7052.12402344,
7299.75585938, 7127.49023438, 7192.08984375, 5673.99902344,
5523.26660156, 5986.23046875, 6729.12597656, 6309.22851562,
5135.66894531, 5081.8359375 , 5329.46777344, 5404.83398438])]],
dtype=object)
You can feed the feature with the lists inside to your model in two ways:
1. Treat each list element as an additional feature.
2. Map all of its elements to a single number with a function you deem appropriate (min, median, mean, max, sum, etc.).
To try the first option:
import pandas as pd

# Convert `X` to a data frame
X = pd.DataFrame(X)
# Rename columns
X.columns = ['feature_' + str(i + 1) for i in range(X.shape[1])]
# Convert the feature with lists inside to long format
x = X['feature_5'].explode().to_frame()
# Create counter by observation so we can pivot
x['observation_id'] = x.groupby(level=0).cumcount()
# Pivot to wide format so each list element becomes its own column, then rename them
x = x.pivot(columns='observation_id', values='feature_5').fillna(0)
x = x.add_prefix('list_element_')
# Drop `feature_5` from X
X.drop(columns='feature_5', axis=1, inplace=True)
# Concatenate X and x together
X = pd.concat([X, x], axis=1)
# Carry on as before
X_train, X_val, y_train, y_val = train_test_split(X,y,test_size=0.3)
model = svm.SVC()
model.fit(X_train,y_train)
There's no right answer to the second option and only you can decide how to do this because only you know what the lists mean. However, if you want to get the mean (for example) of each list and use that as a feature:
import numpy as np

# Get the mean of each list (X here is still the original object-dtype array)
means = [np.mean(array) for array in X[:, 4]]
# Replace the lists with `means`
X[:, 4] = means
And then carry on with the splitting and fitting.
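For completeness, here is a minimal sketch of that last step, assuming X is still the NumPy object array from the question with its list column replaced by means and y is the label vector; the astype(float) cast is my addition, needed because the array otherwise keeps dtype=object:
# Assumption: X is the object-dtype array with its list column replaced by `means`,
# and y holds the class labels. Casting to float gives sklearn a plain numeric matrix.
X = X.astype(float)

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3)
model = svm.SVC()
model.fit(X_train, y_train)
yt_p = model.predict(X_train)
yv_p = model.predict(X_val)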
I'm getting different results when calculating a negative log likelihood of a simple two layer neural net in theano and numpy.
This is the numpy code:
W1,b1,W2,b2 = model['W1'], model['b1'], model['W2'], model['b2']
N, D = X.shape
where model contains the initial parameters and X is the input array.
z_1 = X
z_2 = np.dot(X,W1) + b1
z_3 = np.maximum(0, z_2)
z_4 = np.dot(z_3,W2)+b2
scores = z_4
exp_scores = np.exp(scores)
exp_sum = np.sum(exp_scores, axis = 1)
exp_sum.shape = (exp_scores.shape[0],1)
y_hat = exp_scores / exp_sum
loss = np.sum(np.log(y_hat[np.arange(y.shape[0]),y]))
loss = -1/float(y_hat.shape[0])*loss + reg/2.0*np.sum(np.multiply(W1,W1))+ reg/2.0*np.sum(np.multiply(W2,W2))
I'm getting a result of 1.3819194609246772, which is the correct value for the loss function. However my Theano code yields a value of 1.3715655944645178.
t_z_1 = T.dmatrix('z_1')
t_W1 = theano.shared(value = W1, name = 'W1', borrow = True)
t_b1 = theano.shared(value = b1, name = 'b1',borrow = True)
t_W2 = theano.shared(value = W2, name = 'W2')
t_b2 = theano.shared(value = b2, name = 'b2')
t_y = T.lvector('y')
t_reg = T.dscalar('reg')
first_layer = T.dot(t_z_1,W1) + t_b1
t_hidden = T.switch(first_layer > 0, 0, first_layer)
t_out = T.nnet.softmax(T.dot(t_hidden, W2)+t_b2)
t_cost = -T.mean(T.log(t_out)[T.arange(t_y.shape[0]),t_y],dtype = theano.config.floatX, acc_dtype = theano.config.floatX)+t_reg/2.0*T.sum(T.sqr(t_W1))+t_reg/2.0*T.sum(T.sqr(t_W2))
cost_func = theano.function([t_z_1,t_y,t_reg],t_cost)
loss = cost_func(z_1,y,reg)
I'm already getting wrong results when calculating the values in the output layer. I'm not really sure what the problem could be. Does the shared function keep the dtype of the numpy array that is passed as the value argument, or is it converted to float32? Can anybody tell me what I'm doing wrong in the Theano code?
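As a quick diagnostic (a small sketch, not part of the original code): as far as I know, theano.shared infers its dtype from the numpy value you pass in rather than casting it to float32, and you can check this directly:
# Diagnostic only: inspect the dtypes actually in play
print(t_W1.dtype)               # dtype of the shared variable, inferred from W1
print(t_W1.get_value().dtype)   # dtype of the numpy array stored inside it
print(theano.config.floatX)     # the default float dtype set in the Theano config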
EDIT: The problem seems to occur in the hidden layer, after applying the ReLU function. Here's a comparison of the Theano and NumPy results at each layer:
theano results of first layer
[[-0.3245614 -0.22532614 -0.12609087 -0.0268556 0.07237967 0.17161493
0.2708502 0.37008547 0.46932074 0.56855601]
[-0.26107962 -0.14975259 -0.03842555 0.07290148 0.18422852 0.29555556
0.40688259 0.51820963 0.62953666 0.7408637 ]
[-0.19759784 -0.07417904 0.04923977 0.17265857 0.29607737 0.41949618
0.54291498 0.66633378 0.78975259 0.91317139]
[-0.13411606 0.00139451 0.13690508 0.27241565 0.40792623 0.5434368
0.67894737 0.81445794 0.94996851 1.08547908]
[-0.07063428 0.07696806 0.2245704 0.37217274 0.51977508 0.66737742
0.81497976 0.9625821 1.11018444 1.25778677]]
numpy results of first layer
[[-0.3245614 -0.22532614 -0.12609087 -0.0268556 0.07237967 0.17161493
0.2708502 0.37008547 0.46932074 0.56855601]
[-0.26107962 -0.14975259 -0.03842555 0.07290148 0.18422852 0.29555556
0.40688259 0.51820963 0.62953666 0.7408637 ]
[-0.19759784 -0.07417904 0.04923977 0.17265857 0.29607737 0.41949618
0.54291498 0.66633378 0.78975259 0.91317139]
[-0.13411606 0.00139451 0.13690508 0.27241565 0.40792623 0.5434368
0.67894737 0.81445794 0.94996851 1.08547908]
[-0.07063428 0.07696806 0.2245704 0.37217274 0.51977508 0.66737742
0.81497976 0.9625821 1.11018444 1.25778677]]
theano results of hidden layer
[[-0.3245614 -0.22532614 -0.12609087 -0.0268556 0. 0. 0.
0. 0. 0. ]
[-0.26107962 -0.14975259 -0.03842555 0. 0. 0. 0.
0. 0. 0. ]
[-0.19759784 -0.07417904 0. 0. 0. 0. 0.
0. 0. 0. ]
[-0.13411606 0. 0. 0. 0. 0. 0.
0. 0. 0. ]
[-0.07063428 0. 0. 0. 0. 0. 0.
0. 0. 0. ]]
numpy results of hidden layer
[[ 0. 0. 0. 0. 0.07237967 0.17161493
0.2708502 0.37008547 0.46932074 0.56855601]
[ 0. 0. 0. 0.07290148 0.18422852 0.29555556
0.40688259 0.51820963 0.62953666 0.7408637 ]
[ 0. 0. 0.04923977 0.17265857 0.29607737 0.41949618
0.54291498 0.66633378 0.78975259 0.91317139]
[ 0. 0.00139451 0.13690508 0.27241565 0.40792623 0.5434368
0.67894737 0.81445794 0.94996851 1.08547908]
[ 0. 0.07696806 0.2245704 0.37217274 0.51977508 0.66737742
0.81497976 0.9625821 1.11018444 1.25778677]]
theano results of output
[[ 0.14393463 0.2863576 0.56970777]
[ 0.14303947 0.28582359 0.57113693]
[ 0.1424154 0.28544871 0.57213589]
[ 0.14193274 0.28515729 0.57290997]
[ 0.14171057 0.28502272 0.57326671]]
numpy results of output
[[-0.5328368 0.20031504 0.93346689]
[-0.59412164 0.15498488 0.9040914 ]
[-0.67658362 0.08978957 0.85616275]
[-0.77092643 0.01339997 0.79772637]
[-0.89110401 -0.08754544 0.71601312]]
I got the idea of using the switch() function for the ReLU layer from this post: Theano HiddenLayer Activation Function, and I don't really see how that call differs from the equivalent numpy code z_3 = np.maximum(0, z_2).
Solution to the first problem: T.switch(first_layer > 0, 0, first_layer) sets all values greater than 0 to 0, which is the opposite of a ReLU; it should be T.switch(first_layer < 0, 0, first_layer).
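In code, the corrected hidden layer looks like this (T.maximum, a standard Theano elementwise op, is shown as an equivalent alternative that mirrors the NumPy expression):
# Corrected ReLU: zero out only the negative pre-activations
t_hidden = T.switch(first_layer < 0, 0, first_layer)

# Equivalent form, closer to the NumPy version z_3 = np.maximum(0, z_2):
# t_hidden = T.maximum(0, first_layer)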
EDIT 2: The gradients that Theano calculates differ significantly from the numerical gradients I was given; this is my implementation:
g_w1, g_b1, g_w2, g_b2 = T.grad(t_cost,[t_W1,t_b1,t_W2,t_b2])
grads = {}
grads['W1'] = g_w1.eval({t_z_1 : z_1, t_y : y,t_reg : reg})
grads['b1'] = g_b1.eval({t_z_1 : z_1, t_y : y,t_reg : reg})
grads['W2'] = g_w2.eval({t_z_1 : z_1, t_y : y,t_reg : reg})
grads['b2'] = g_b2.eval({t_z_1 : z_1, t_y : y,t_reg : reg})
This is an assignment for the Convolutional Neural Networks class that was offered by Stanford earlier this year and I think it's safe to say that their numerical gradients are probably correct. I could post the code to their numerical implementation though if required.
Using a relative error computed the following way:
def relative_error(num, ana):
    numerator = np.sum(np.abs(num - ana))
    denom = np.sum(np.abs(num)) + np.sum(np.abs(ana))
    return numerator / denom
Calculating the numerical gradients using the eval_numerical_gradient method provided by the course gives the following relative errors for the gradients:
param_grad_num = {}
rel_error = {}
for param_name in grads:
    param_grad_num[param_name] = eval_numerical_gradient(
        lambda W: two_layer_net(X, model, y, reg)[0], model[param_name], verbose=False)
    rel_error[param_name] = relative_error(param_grad_num[param_name], grads[param_name])
{'W1': 0.010069468997284833,
'W2': 0.6628490408291472,
'b1': 1.9498867941113963e-09,
'b2': 1.7223972753120753e-11}
These are too large for W1 and W2; the relative error should be less than 1e-8. Can anybody explain this or help in any way?