Dear JAX experts, I need your kind help.
Here is a working example (I have followed the advice to simplify my code, although I am not expert enough in JAX or Python to guess what is at the heart of the mechanism involved in vmap):
def jax_kernel(rng_key, logpdf, position, log_prob):
    key, subkey = jax.random.split(rng_key)
    move_proposals = jax.random.normal(key, shape=position.shape) * 0.1
    proposal = position + move_proposals
    proposal_log_prob = logpdf(proposal)
    return proposal, proposal_log_prob
def jax_sampler(rng_key, n_samples, logpdf, initial_position):
    def mh_update(i, state):
        key, positions, log_prob = state
        _, key = jax.random.split(key)
        print(f"mh_update: positions[{i-1}]:", jnp.asarray(positions[i-1]))
        new_position, new_log_prob = jax_kernel(key, logpdf, positions[i-1], log_prob)
        positions = positions.at[i].set(new_position)
        return (key, positions, new_log_prob)

    # the full all_positions structure must be allocated before lax.fori_loop
    print("initial_position shape:", initial_position.shape)
    all_positions = jnp.zeros((n_samples,) + initial_position.shape)
    all_positions = all_positions.at[0, 0].set(1.)
    all_positions = all_positions.at[0, 1].set(2.)
    all_positions = all_positions.at[0, 2].set(2.)
    print("all_positions init:", all_positions.shape)
    logp = logpdf(all_positions[0])
    # use a plain Python for-loop instead of jax.lax.fori_loop to be able to debug mh_update
    initial_state = (rng_key, all_positions, logp)
    val = initial_state
    for i in range(1, n_samples):
        val = mh_update(i, val)
    rng_key, all_positions, log_prob = val
    # return all the positions of the parameters (n_chains, n_samples, n_dim)
    return all_positions
def func(par):
    xi = jnp.asarray(sci_stats.uniform.rvs(size=10))
    val = xi * par[1] + par[0]
    return jnp.sum(jax.scipy.stats.norm.logpdf(x=val, loc=yi, scale=par[2]))
n_dim = 3 # number of parameters ie. (a,b,s)
n_samples = 5 # number of samples per chain
n_chains = 4 # number of MCMC chains
rng_key = jax.random.PRNGKey(42)
rng_keys = jax.random.split(rng_key, n_chains)
initial_position = jnp.ones((n_dim, n_chains))
print("main initial_position shape",initial_position.shape)
run = jax.vmap(jax_sampler, in_axes=(0, None, None, 1), out_axes=0)
all_positions = run(rng_keys,n_samples,lambda p: func(p),initial_position)
print("all_positions:",all_positions)
My question concerns the dimension evolution shown by print(f"mh_update: positions[{i-1}]:", jnp.asarray(positions[i-1])). I do not understand why positions[i-1] starts with dimension n_dim and then switches to n_chains x n_dim.
Thanks in advance for your comments.
Here is the complete output:
main initial_position shape (3, 4)
initial_position shape: (3,)
all_positions init: (5, 3)
mh_update: positions[0]: [1. 2. 2.]
mh_update: positions[1]: Traced<ShapedArray(float32[3])>with<BatchTrace(level=1/0)>
with val = DeviceArray([[0.9354116 , 1.7876872 , 1.8443539 ],
[0.9844745 , 2.073029 , 1.9511036 ],
[0.98202926, 2.0109322 , 2.094176 ],
[0.9536771 , 1.9731759 , 2.093319 ]], dtype=float32)
batch_dim = 0
mh_update: positions[2]: Traced<ShapedArray(float32[3])>with<BatchTrace(level=1/0)>
with val = DeviceArray([[1.0606856, 1.6707807, 1.8377957],
[1.0465866, 1.9754674, 1.7009288],
[1.1107644, 2.0142047, 2.190575 ],
[1.0089972, 1.9953227, 1.996874 ]], dtype=float32)
batch_dim = 0
mh_update: positions[3]: Traced<ShapedArray(float32[3])>with<BatchTrace(level=1/0)>
with val = DeviceArray([[1.0731456, 1.644405 , 2.1343162],
[1.0599504, 2.0121546, 1.6867112],
[1.0585173, 1.9661485, 2.1573594],
[1.1213307, 1.9335203, 1.9683584]], dtype=float32)
batch_dim = 0
all_positions: [[[1. 2. 2. ]
[0.9354116 1.7876872 1.8443539 ]
[1.0606856 1.6707807 1.8377957 ]
[1.0731456 1.644405 2.1343162 ]
[1.0921828 1.5742197 2.058759 ]]
[[1. 2. 2. ]
[0.9844745 2.073029 1.9511036 ]
[1.0465866 1.9754674 1.7009288 ]
[1.0599504 2.0121546 1.6867112 ]
[1.0835105 2.0051234 1.4766487 ]]
[[1. 2. 2. ]
[0.98202926 2.0109322 2.094176 ]
[1.1107644 2.0142047 2.190575 ]
[1.0585173 1.9661485 2.1573594 ]
[1.1728328 1.981367 2.180744 ]]
[[1. 2. 2. ]
[0.9536771 1.9731759 2.093319 ]
[1.0089972 1.9953227 1.996874 ]
[1.1213307 1.9335203 1.9683584 ]
[1.1148386 1.9598911 2.1721165 ]]]
In the first iteration, you print a concrete array that you have constructed within a vmapped function. It is a float32 array of shape (3,).
After the first iteration, you've constructed a new array via operations on a vmapped input. When you vmap an input like this, JAX replaces your input array with a tracer that is an abstract representation of your input; the printed value looks like this:
Traced<ShapedArray(float32[3])>with<BatchTrace(level=1/0)>
with val = DeviceArray([[1.0731456, 1.644405 , 2.1343162],
[1.0599504, 2.0121546, 1.6867112],
[1.0585173, 1.9661485, 2.1573594],
[1.1213307, 1.9335203, 1.9683584]], dtype=float32)
The float32[3] indicates that this tracer represents an array of float32 values of shape (3,): that is, it still has the same type and shape as in the first iteration. But in this case it is not a concrete array with three elements; it is a batched tracer representing each iteration of the vmapped input. The power of the vmap transform is that JAX tracks all implied iterations of the vmapped computation in one pass: in the tracer representation, the rows of val show you the intermediate values for all the vmapped iterations.
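To make this concrete, here is a minimal, self-contained sketch (a toy function f and array x_batch of my own, not part of your sampler) showing the same effect: a value printed inside a vmapped function appears as a batched tracer whose val carries one row per mapped iteration.
import jax
import jax.numpy as jnp

def f(x):
    # inside f, x has logical shape (3,), but under vmap it is a batched tracer
    y = x + 1.0
    print("inside f:", y)  # prints Traced<ShapedArray(float32[3])> whose val has shape (4, 3)
    return y

x_batch = jnp.arange(12.0).reshape(4, 3)  # 4 "chains", 3 parameters
out = jax.vmap(f, in_axes=0)(x_batch)
print(out.shape)  # (4, 3): one row per vmapped iteration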
For more understanding of how JAX tracing works, a good read is How To Think In JAX in the JAX documentation.
My question is the following:
I have an array of feature vectors that correspond to several audio files. So if, for example, there are 10 audio files, then this array has length 10.
One of the features is itself a list (this list carries the information of a specific feature of the audio file), so for a given audio file the feature vector looks like this:
array([0.03861840871664194, 187.72393405210002, 62.59881268743305,
0.2911392405063291,
array([4963.40332031, 3229.98046875, 2691.65039062, 3208.44726562,
4338.94042969, 4220.5078125 , 4166.67480469, 4801.90429688,
5555.56640625, 5910.86425781, 6115.4296875 , 5706.29882812,
4984.93652344, 2756.25 , 1991.82128906, 2551.68457031,
2734.71679688, 2906.98242188, 3143.84765625, 3219.21386719,
3186.9140625 , 3165.38085938, 3068.48144531, 2465.55175781,
2110.25390625, 2508.61816406, 2993.11523438, 3843.67675781,
4715.77148438, 5652.46582031, 5480.20019531, 5792.43164062,
5932.39746094, 6244.62890625, 6072.36328125, 6201.5625 ,
6158.49609375, 6201.5625 , 6233.86230469, 6061.59667969])],
dtype=object)
Now when I try to feed this data into the svm model:
from sklearn import svm
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt
X_train, X_val, y_train, y_val = train_test_split(X,y,test_size=0.3)
model = svm.SVC()
model.fit(X_train,y_train)
yt_p = model.predict(X_train)
yv_p = model.predict(X_val)
I get this error ValueError: setting an array element with a sequence.
How can I structure my feature vector in order to be able to feed it to the svm?
EDIT:
Here I provide an example of X.
If we have 5 audio files, then X will be:
array([[0.017455393927437918, 227.66237105624407, 32.42076654734572,
0.3867924528301887,
array([1851.85546875, 2433.25195312, 3057.71484375, 3079.24804688,
3079.24804688, 3068.48144531, 3046.94824219, 3359.1796875 ,
3908.27636719, 4618.87207031, 4618.87207031, 4521.97265625,
4091.30859375, 3111.54785156, 3100.78125 , 2863.91601562,
1561.15722656, 1119.7265625 , 1065.89355469, 947.4609375 ,
979.76074219, 990.52734375, 990.52734375, 1356.59179688,
2077.95410156, 2993.11523438, 3025.41503906, 3068.48144531,
3079.24804688, 3090.01464844, 3100.78125 , 3111.54785156,
2993.11523438, 3100.78125 , 3079.24804688, 2853.14941406,
1205.859375 , 1281.22558594, 1614.99023438, 2131.78710938,
2325.5859375 , 2034.88769531, 1916.45507812, 1744.18945312,
1851.85546875, 2357.88574219, 2368.65234375, 1916.45507812,
1959.52148438, 1959.52148438, 1754.95605469, 1787.25585938,
2207.15332031])],
[0.03861840871664194, 187.72393405210002, 62.59881268743305,
0.2911392405063291,
array([4963.40332031, 3229.98046875, 2691.65039062, 3208.44726562,
4338.94042969, 4220.5078125 , 4166.67480469, 4801.90429688,
5555.56640625, 5910.86425781, 6115.4296875 , 5706.29882812,
4984.93652344, 2756.25 , 1991.82128906, 2551.68457031,
2734.71679688, 2906.98242188, 3143.84765625, 3219.21386719,
3186.9140625 , 3165.38085938, 3068.48144531, 2465.55175781,
2110.25390625, 2508.61816406, 2993.11523438, 3843.67675781,
4715.77148438, 5652.46582031, 5480.20019531, 5792.43164062,
5932.39746094, 6244.62890625, 6072.36328125, 6201.5625 ,
6158.49609375, 6201.5625 , 6233.86230469, 6061.59667969])],
[0.042435441297643324, 128.81225073038124, 20.912528554426807,
0.313953488372093,
array([4349.70703125, 4242.04101562, 4274.34082031, 4123.60839844,
4457.37304688, 4834.20410156, 4661.93847656, 4306.640625 ,
4231.27441406, 4543.50585938, 4435.83984375, 6201.5625 ,
8817.84667969, 8817.84667969, 742.89550781, 721.36230469,
732.12890625, 732.12890625, 710.59570312, 721.36230469,
925.92773438, 1119.7265625 , 1141.25976562, 1431.95800781,
7762.71972656, 7934.98535156, 7891.91894531, 7332.05566406,
3789.84375 , 2799.31640625, 2831.61621094, 2217.91992188,
581.39648438, 602.9296875 , 2217.91992188, 2228.68652344,
2368.65234375, 2519.38476562, 2863.91601562, 3682.17773438,
3649.87792969, 4188.20800781, 4112.84179688])],
[0.006295381642571726, 130.28309914454434, 5.193614287487564,
0.2411764705882353,
array([7978.05175781, 8010.3515625 , 8118.01757812, 8430.24902344,
8257.98339844, 8451.78222656, 8591.74804688, 8677.88085938,
8796.31347656, 8850.14648438, 8796.31347656, 8925.51269531,
6244.62890625, 344.53125 , 344.53125 , 1614.99023438,
2325.5859375 , 2971.58203125, 3316.11328125, 3617.578125 ,
3294.58007812, 2788.54980469, 2637.81738281, 2702.41699219,
2723.95019531, 3133.08105469, 3413.01269531, 5663.23242188,
5770.8984375 , 5577.09960938, 2228.68652344, 1604.22363281,
1690.35644531, 4123.60839844, 5566.33300781, 5803.19824219,
5749.36523438, 5846.26464844, 6772.19238281, 7073.65722656,
7622.75390625, 7859.61914062, 8236.45019531, 8441.015625 ,
8699.4140625 , 8807.08007812, 8742.48046875, 8667.11425781,
8710.18066406, 8947.04589844, 9140.84472656, 9130.078125 ,
8936.27929688, 8925.51269531, 8947.04589844, 8925.51269531,
9097.77832031, 9205.44433594, 9194.67773438, 9140.84472656,
9162.37792969, 9043.9453125 , 9162.37792969, 9108.54492188,
9183.91113281, 9280.81054688, 9270.04394531, 9108.54492188,
9076.24511719, 9356.17675781, 9226.97753906, 9216.2109375 ,
9248.51074219, 9140.84472656, 9237.74414062, 9334.64355469,
9259.27734375, 9226.97753906, 9216.2109375 , 9108.54492188,
9183.91113281, 9216.2109375 , 9248.51074219, 9259.27734375,
9183.91113281])],
[0.017070271599460656, 171.91660927761163, 26.854424936811768,
0.11188811188811189,
array([4715.77148438, 4629.63867188, 4898.80371094, 5275.63476562,
4941.87011719, 4532.73925781, 4618.87207031, 4995.703125 ,
4705.00488281, 4500.43945312, 4188.20800781, 4371.24023438,
4457.37304688, 4188.20800781, 4909.5703125 , 4877.27050781,
6761.42578125, 7708.88671875, 7719.65332031, 7956.51855469,
8484.08203125, 9033.17871094, 9043.9453125 , 9000.87890625,
9011.64550781, 9011.64550781, 9000.87890625, 9108.54492188,
8817.84667969, 6686.05957031, 1808.7890625 , 1830.32226562,
1851.85546875, 1636.5234375 , 1022.82714844, 1281.22558594,
1927.22167969, 1948.75488281, 1302.75878906, 1399.65820312,
1873.38867188, 1959.52148438, 7245.92285156, 9011.64550781,
9420.77636719, 9549.97558594, 9453.07617188, 9431.54296875,
9410.00976562, 9248.51074219, 9151.61132812, 9194.67773438,
8968.57910156, 8634.81445312, 8268.75 , 7439.72167969,
5501.73339844, 5232.56835938, 5103.36914062, 7052.12402344,
7299.75585938, 7127.49023438, 7192.08984375, 5673.99902344,
5523.26660156, 5986.23046875, 6729.12597656, 6309.22851562,
5135.66894531, 5081.8359375 , 5329.46777344, 5404.83398438])]],
dtype=object)
You can feed the feature with the lists inside to your model in two ways:
1. Treat each list element as an additional feature.
2. Map all of its elements to a single number with a function you deem appropriate (min, median, mean, max, sum, etc.).
To try the first option:
import pandas as pd

# Convert `X` to a data frame
X = pd.DataFrame(X)
# Rename columns
X.columns = ['feature_' + str(i + 1) for i in range(X.shape[1])]
# Convert the feature with lists inside to long format
x = X['feature_5'].explode().to_frame()
# Create counter by observation so we can pivot
x['observation_id'] = x.groupby(level=0).cumcount()
# Pivot to wide format so each list element becomes its own column, then rename them
x = x.pivot(columns='observation_id', values='feature_5').fillna(0)
x = x.add_prefix('list_element_')
# Drop `feature_5` from X
X.drop(columns='feature_5', axis=1, inplace=True)
# Concatenate X and x together
X = pd.concat([X, x], axis=1)
# Carry on as before
X_train, X_val, y_train, y_val = train_test_split(X,y,test_size=0.3)
model = svm.SVC()
model.fit(X_train,y_train)
There's no right answer to the second option and only you can decide how to do this because only you know what the lists mean. However, if you want to get the mean (for example) of each list and use that as a feature:
import numpy as np

# Get the mean of each list (X here is still the original object-dtype array)
means = [np.mean(array) for array in X[:, 4]]
# Replace the lists with `means`
X[:, 4] = means
And then carry on with the splitting and fitting.
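For completeness, here is a minimal sketch of that last step, assuming X is still the NumPy object array from the question with its list column replaced by means and y is the label vector; the astype(float) cast is my addition, needed because the array otherwise keeps dtype=object:
# Assumption: X is the object-dtype array with its list column replaced by `means`,
# and y holds the class labels. Casting to float gives sklearn a plain numeric matrix.
X = X.astype(float)

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3)
model = svm.SVC()
model.fit(X_train, y_train)
yt_p = model.predict(X_train)
yv_p = model.predict(X_val)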
I'm getting different results when calculating a negative log likelihood of a simple two layer neural net in theano and numpy.
This is the numpy code:
W1,b1,W2,b2 = model['W1'], model['b1'], model['W2'], model['b2']
N, D = X.shape
where model contains the initial parameters and X is the input array.
z_1 = X
z_2 = np.dot(X,W1) + b1
z_3 = np.maximum(0, z_2)
z_4 = np.dot(z_3,W2)+b2
scores = z_4
exp_scores = np.exp(scores)
exp_sum = np.sum(exp_scores, axis = 1)
exp_sum.shape = (exp_scores.shape[0],1)
y_hat = exp_scores / exp_sum
loss = np.sum(np.log(y_hat[np.arange(y.shape[0]),y]))
loss = -1/float(y_hat.shape[0])*loss + reg/2.0*np.sum(np.multiply(W1,W1))+ reg/2.0*np.sum(np.multiply(W2,W2))
I'm getting a result of 1.3819194609246772, which is the correct value for the loss function. However my Theano code yields a value of 1.3715655944645178.
t_z_1 = T.dmatrix('z_1')
t_W1 = theano.shared(value = W1, name = 'W1', borrow = True)
t_b1 = theano.shared(value = b1, name = 'b1',borrow = True)
t_W2 = theano.shared(value = W2, name = 'W2')
t_b2 = theano.shared(value = b2, name = 'b2')
t_y = T.lvector('y')
t_reg = T.dscalar('reg')
first_layer = T.dot(t_z_1,W1) + t_b1
t_hidden = T.switch(first_layer > 0, 0, first_layer)
t_out = T.nnet.softmax(T.dot(t_hidden, W2)+t_b2)
t_cost = -T.mean(T.log(t_out)[T.arange(t_y.shape[0]),t_y],dtype = theano.config.floatX, acc_dtype = theano.config.floatX)+t_reg/2.0*T.sum(T.sqr(t_W1))+t_reg/2.0*T.sum(T.sqr(t_W2))
cost_func = theano.function([t_z_1,t_y,t_reg],t_cost)
loss = cost_func(z_1,y,reg)
I'm already getting wrong results when calculating the values in the output layer. I'm not really sure what the problem could be. Does the shared function keep the dtype of the numpy array that is passed as the value argument, or is it converted to float32? Can anybody tell me what I'm doing wrong in the Theano code?
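As a quick diagnostic (a small sketch, not part of the original code): as far as I know, theano.shared infers its dtype from the numpy value you pass in rather than casting it to float32, and you can check this directly:
# Diagnostic only: inspect the dtypes actually in play
print(t_W1.dtype)               # dtype of the shared variable, inferred from W1
print(t_W1.get_value().dtype)   # dtype of the numpy array stored inside it
print(theano.config.floatX)     # the default float dtype set in the Theano config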
EDIT: The problem seems to occur in the hidden layer, after applying the ReLU function. Here's a comparison of the Theano and NumPy results at each layer:
theano results of first layer
[[-0.3245614 -0.22532614 -0.12609087 -0.0268556 0.07237967 0.17161493
0.2708502 0.37008547 0.46932074 0.56855601]
[-0.26107962 -0.14975259 -0.03842555 0.07290148 0.18422852 0.29555556
0.40688259 0.51820963 0.62953666 0.7408637 ]
[-0.19759784 -0.07417904 0.04923977 0.17265857 0.29607737 0.41949618
0.54291498 0.66633378 0.78975259 0.91317139]
[-0.13411606 0.00139451 0.13690508 0.27241565 0.40792623 0.5434368
0.67894737 0.81445794 0.94996851 1.08547908]
[-0.07063428 0.07696806 0.2245704 0.37217274 0.51977508 0.66737742
0.81497976 0.9625821 1.11018444 1.25778677]]
numpy results of first layer
[[-0.3245614 -0.22532614 -0.12609087 -0.0268556 0.07237967 0.17161493
0.2708502 0.37008547 0.46932074 0.56855601]
[-0.26107962 -0.14975259 -0.03842555 0.07290148 0.18422852 0.29555556
0.40688259 0.51820963 0.62953666 0.7408637 ]
[-0.19759784 -0.07417904 0.04923977 0.17265857 0.29607737 0.41949618
0.54291498 0.66633378 0.78975259 0.91317139]
[-0.13411606 0.00139451 0.13690508 0.27241565 0.40792623 0.5434368
0.67894737 0.81445794 0.94996851 1.08547908]
[-0.07063428 0.07696806 0.2245704 0.37217274 0.51977508 0.66737742
0.81497976 0.9625821 1.11018444 1.25778677]]
theano results of hidden layer
[[-0.3245614 -0.22532614 -0.12609087 -0.0268556 0. 0. 0.
0. 0. 0. ]
[-0.26107962 -0.14975259 -0.03842555 0. 0. 0. 0.
0. 0. 0. ]
[-0.19759784 -0.07417904 0. 0. 0. 0. 0.
0. 0. 0. ]
[-0.13411606 0. 0. 0. 0. 0. 0.
0. 0. 0. ]
[-0.07063428 0. 0. 0. 0. 0. 0.
0. 0. 0. ]]
numpy results of hidden layer
[[ 0. 0. 0. 0. 0.07237967 0.17161493
0.2708502 0.37008547 0.46932074 0.56855601]
[ 0. 0. 0. 0.07290148 0.18422852 0.29555556
0.40688259 0.51820963 0.62953666 0.7408637 ]
[ 0. 0. 0.04923977 0.17265857 0.29607737 0.41949618
0.54291498 0.66633378 0.78975259 0.91317139]
[ 0. 0.00139451 0.13690508 0.27241565 0.40792623 0.5434368
0.67894737 0.81445794 0.94996851 1.08547908]
[ 0. 0.07696806 0.2245704 0.37217274 0.51977508 0.66737742
0.81497976 0.9625821 1.11018444 1.25778677]]
theano results of output
[[ 0.14393463 0.2863576 0.56970777]
[ 0.14303947 0.28582359 0.57113693]
[ 0.1424154 0.28544871 0.57213589]
[ 0.14193274 0.28515729 0.57290997]
[ 0.14171057 0.28502272 0.57326671]]
numpy results of output
[[-0.5328368 0.20031504 0.93346689]
[-0.59412164 0.15498488 0.9040914 ]
[-0.67658362 0.08978957 0.85616275]
[-0.77092643 0.01339997 0.79772637]
[-0.89110401 -0.08754544 0.71601312]]
I got the idea of using the switch() function for the ReLU layer from this post: Theano HiddenLayer Activation Function, and I don't really see how that call differs from the equivalent numpy code z_3 = np.maximum(0, z_2).
Solution to the first problem: T.switch(first_layer > 0, 0, first_layer) sets all values greater than 0 to 0, which is the opposite of a ReLU; it should be T.switch(first_layer < 0, 0, first_layer).
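In code, the corrected hidden layer looks like this (T.maximum, a standard Theano elementwise op, is shown as an equivalent alternative that mirrors the NumPy expression):
# Corrected ReLU: zero out only the negative pre-activations
t_hidden = T.switch(first_layer < 0, 0, first_layer)

# Equivalent form, closer to the NumPy version z_3 = np.maximum(0, z_2):
# t_hidden = T.maximum(0, first_layer)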
EDIT 2: The gradients that Theano calculates differ significantly from the numerical gradients I was given; this is my implementation:
g_w1, g_b1, g_w2, g_b2 = T.grad(t_cost,[t_W1,t_b1,t_W2,t_b2])
grads = {}
grads['W1'] = g_w1.eval({t_z_1 : z_1, t_y : y,t_reg : reg})
grads['b1'] = g_b1.eval({t_z_1 : z_1, t_y : y,t_reg : reg})
grads['W2'] = g_w2.eval({t_z_1 : z_1, t_y : y,t_reg : reg})
grads['b2'] = g_b2.eval({t_z_1 : z_1, t_y : y,t_reg : reg})
This is an assignment for the Convolutional Neural Networks class that was offered by Stanford earlier this year and I think it's safe to say that their numerical gradients are probably correct. I could post the code to their numerical implementation though if required.
Using a relative error computed the following way:
def relative_error(num, ana):
    numerator = np.sum(np.abs(num - ana))
    denom = np.sum(np.abs(num)) + np.sum(np.abs(ana))
    return numerator / denom
Calculating the numerical gradients using the eval_numerical_gradient method provided by the course gives the following relative errors for the gradients:
param_grad_num = {}
rel_error = {}
for param_name in grads:
    param_grad_num[param_name] = eval_numerical_gradient(
        lambda W: two_layer_net(X, model, y, reg)[0], model[param_name], verbose=False)
    rel_error[param_name] = relative_error(param_grad_num[param_name], grads[param_name])
{'W1': 0.010069468997284833,
'W2': 0.6628490408291472,
'b1': 1.9498867941113963e-09,
'b2': 1.7223972753120753e-11}
These are too large for W1 and W2; the relative error should be less than 1e-8. Can anybody explain this or help in any way?