I have a tensor probs with probs.shape = (max_time, num_batches, num_labels).
And I have a tensor targets with targets.shape = (max_seq_len, num_batches) where the values are label indices, i.e. indices into the third dimension of probs.
Now I want to get a tensor probs_y with probs_y.shape = (max_time, num_batches, max_seq_len) where the third dimension is the index in targets. Basically
probs_y[:,i,:] = probs[:,i,targets[:,i]]
for all 0 <= i < num_batches.
How can I achieve this?
A similar problem with a solution was posted here.
The solution there, if I understand correctly, would be:
probs_y = probs[:,T.arange(targets.shape[1])[None,:],targets]
But that doesn't seem to work. I get:
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices.
Also, isn't the creation of the temporary T.arange a bit costly? Especially when I try to work around it by really making it a full dense integer array. There should be a better way.
Maybe theano.map? But as far as I understand, that doesn't parallelize the code, so this is also not a solution.
This works for me:
import numpy as np
import theano
import theano.tensor as T
max_time, num_batches, num_labels = 3, 4, 6
max_seq_len = 5
probs_ = np.arange(max_time * num_batches * num_labels).reshape(
    max_time, num_batches, num_labels)
targets_ = np.arange(num_batches * max_seq_len).reshape(
    max_seq_len, num_batches) % (num_batches - 1)  # mix stuff up
probs, targets = map(theano.shared, (probs_, targets_))
print(probs_)
print(targets_)
probs_y = probs[:, T.arange(targets.shape[1])[:, np.newaxis], targets.T]
print(probs_y.eval())
The above uses a transposed version of your indices. Your exact proposal also works:
probs_y2 = probs[:, T.arange(targets.shape[1])[np.newaxis, :], targets]
print(probs_y2.eval())
print((probs_y2.dimshuffle(0, 2, 1) - probs_y).eval())
So maybe your problem is somewhere else.
As for speed, I am at a loss as to what could be faster than this. map, which is a specialization of scan, almost certainly is not. I do not know to what extent the arange is actually built in memory rather than simply iterated over.
I have found myself needing to add features to existing numpy arrays, which has led to a question about what the last portion of the following code is actually doing:
np.ones(shape=feature_set.shape)[...,None]
Set-up
As an example, let's say I wish to solve for linear regression parameter estimates using numpy, by solving the normal equations (X^T X) beta = X^T y for beta.
Assume I have a feature set of shape (50,1) and a target variable of shape (50,), and I wish to use the shape of my target variable to add a column of ones for intercept values.
It would look something like this:
import numpy as np
# Create random target & feature set
y_train = np.random.randint(0, 100, size=(50,))
feature_set = np.random.randint(0, 100, size=(50, 1))
# Build a column of 1s shaped after the target variable
int_train = np.ones(shape=y_train.shape)[..., None]
# Able to then add int_train to the feature set
X = np.concatenate((int_train, feature_set), 1)
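For completeness, a minimal sketch of the actual solve, continuing from X and y_train built above (my addition, not part of the original question; np.linalg.lstsq would work just as well and is more robust numerically):
# Solve the normal equations (X^T X) beta = X^T y for beta
beta_hat = np.linalg.solve(X.T.dot(X), X.T.dot(y_train))
print(beta_hat)  # [intercept, slope]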
What I Think I Know
I can see the difference in output when I include [...,None] versus when I leave it off: without it, np.concatenate raises an error about the input arrays needing the same number of dimensions, and eventually I stumbled on [...,None] as the solution.
Main Question
While I see the output of [...,None] gives me what I want, I am struggling to find any information on what it is actually supposed to do. Can anybody walk me through what this code actually means, what the None argument is doing, etc?
Thank you!
The indexing expression [..., None] consists of two "shortcuts":
The ellipsis literal component:
The dots (...) represent as many colons as needed to produce a complete indexing tuple. For example, if x is a rank 5 array (i.e., it has 5 axes), then
x[1,2,...] is equivalent to x[1,2,:,:,:],
x[...,3] to x[:,:,:,:,3] and
x[4,...,5,:] to x[4,:,:,5,:].
(Source)
The None component:
numpy.newaxis
The newaxis object can be used in all slicing operations to create an axis of length one. newaxis is an alias for ‘None’, and ‘None’ can be used in place of this with the same result.
(Source)
So, arr[..., None] takes an array of dimension N and "adds" a dimension "at the end" for a resulting array of dimension N+1.
Example:
import numpy as np
x = np.array([[1,2,3],[4,5,6]])
print(x.shape) # (2, 3)
y = x[...,None]
print(y.shape) # (2, 3, 1)
z = x[:,:,np.newaxis]
print(z.shape) # (2, 3, 1)
a = np.expand_dims(x, axis=-1)
print(a.shape) # (2, 3, 1)
print((y == z).all()) # True
print((y == a).all()) # True
Consider this code:
np.ones(shape=(2,3))[...,None].shape
As you can see, the None changes the (2,3) matrix into a (2,3,1) tensor; the new length-one axis is added at the LAST position.
If you use
np.ones(shape=(2,3))[None, ...].shape
it puts the new axis at the FIRST position, giving a (1,2,3) tensor.
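A quick check of both expressions:
import numpy as np
print(np.ones(shape=(2, 3))[..., None].shape)   # (2, 3, 1)
print(np.ones(shape=(2, 3))[None, ...].shape)   # (1, 2, 3)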
I want to compute the sum product along one dimension of two multidimensional arrays, using Theano.
I'll describe precisely what I want to do using numpy first. numpy.tensordot and numpy.dot seem to always do a matrix product, whereas I'm in essence looking for a batched equivalent of a vector product. Given x and y, I want to compute z like so:
import numpy as np
x = np.random.normal(size=(200, 2, 2, 1000))
y = np.random.normal(size=(200, 2, 2))
# this is how I now approach it:
z = np.sum(y[:, :, :, np.newaxis] * x, axis=1)
# z is of shape (200, 2, 1000)
Now I know that numpy.einsum would probably be able to help me here, but again, I want to do this particular computation in Theano, which does not have an einsum equivalent. I will need to use dot, tensordot, or Theano's specialized batched functions batched_dot or batched_tensordot, which cover only a subset of what einsum can do.
The reason I'm looking to change my approach to this is performance; I suspect that using builtin (CUDA) dot products will be faster than relying on broadcasting, element-wise product, and sum.
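For reference, the einsum form of the same computation in numpy, continuing from the snippet above (just to pin down the operation; as noted, einsum itself is not available in Theano), would be:
# sum over axis i; keep batch b, axis j, and the trailing axis t
z_einsum = np.einsum('bij,bijt->bjt', y, x)
print(np.allclose(z, z_einsum))  # True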
In Theano, none of the dimensions of three- and four-dimensional tensors are broadcastable by default. You have to set them explicitly; then the numpy broadcasting principles work just fine. One way to do this is to use T.patternbroadcast. To read more about broadcasting, refer to this.
One of your tensors has three dimensions, so you first need to append a singleton dimension at the end and then make that dimension broadcastable. Both can be achieved with a single command, T.shape_padaxis. The entire code is as follows:
import theano
from theano import tensor as T
import numpy as np
X = T.ftensor4('X')
Y = T.ftensor3('Y')
Y_broadcast = T.shape_padaxis(Y, axis=-1)  # append an extra dimension and make it broadcastable
Z = T.sum((X*Y_broadcast), axis=1) # element-wise multiplication
f = theano.function([X, Y], Z, allow_input_downcast=True)
# Making sure that it works and gives correct results
x = np.random.normal(size=(3, 2, 2, 4))
y = np.random.normal(size=(3, 2, 2))
theano_result = f(x,y)
numpy_result = np.sum(y[:,:,:,np.newaxis] * x, axis=1)
print(np.amax(theano_result - numpy_result))  # prints 2.7e-7 on my system, close enough!
I hope this helps.
I am preparing the input tensor for the tensorflow RNN.
Currently I am doing the following
rnn_format = list()
for each in range(batch_size):
    rnn_format.append(tf.slice(input2Dpadded, [each, 0], [max_steps, 10]))
lstm_input = tf.stack(rnn_format)
Would it be possible to do this at once, without loop, with some tensorflow function?
As suggested by Peter Hawkins, you can use gather_nd with the appropriate indices to get there.
Your uniform cropping on the inner dimension can simply be done before the call to gather_nd.
Example:
import tensorflow as tf
import numpy as np
sess = tf.InteractiveSession()
# integer image simply because it is more readable to me
im0 = np.random.randint(10, size=(20,20))
im = tf.constant(im0)
max_steps = 3
batch_size = 10
# create the appropriate indices here
indices = (np.arange(max_steps) +
           np.arange(batch_size)[:, np.newaxis])[..., np.newaxis]
# crop then call gather_nd
res = tf.gather_nd(im[:,:10], indices).eval()
# check that the resulting tensors are equal to what you had previously
for each in range(batch_size):
    assert np.all(tf.slice(im, [each, 0], [max_steps, 10]).eval() == res[each])
EDIT
If your slice indices are in a tensor, you simply replace numpy's operations with tensorflow's operations when creating indices:
# indices stored in a 1D array
my_indices = tf.constant([1, 8, 3, 0, 0])
indices = (np.arange(max_steps) +
           my_indices[:, tf.newaxis])[..., tf.newaxis]
Further remarks:
indices is created by taking advantage of broadcasting during the addition: arrays are virtually tiled so that their dimensions match. Broadcasting is supported by numpy and by tensorflow in a similar fashion.
Ellipsis ... is part of the standard numpy slicing notation, it basically fills all remaining dimensions left by the other slicing indices. So [..., newaxis] is basically equivalent to expand_dims(·, -1).
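As a quick illustration of that last point, a sketch in the TF 1.x style used above (both forms should give the same shape):
import tensorflow as tf
import numpy as np
t = tf.constant(np.arange(6).reshape(2, 3))
print(t[..., tf.newaxis].shape)     # (2, 3, 1)
print(tf.expand_dims(t, -1).shape)  # (2, 3, 1)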
Try tf.split or tf.split_v. See here:
https://www.tensorflow.org/api_docs/python/tf/split
Does that help?
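For completeness, a minimal sketch of tf.split usage; note that split produces contiguous, non-overlapping pieces along an axis, so it only matches the loop above when the slices do not overlap:
import tensorflow as tf
x = tf.reshape(tf.range(12), [6, 2])
# split the 6 rows into three non-overlapping (2, 2) tensors
parts = tf.split(x, num_or_size_splits=3, axis=0)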
Good afternoon.
I continue to have issues with updating random elements in tensorflow by index.
I want to randomly choose indices (half of all of them, for instance), and then set the elements corresponding to those indices to zero.
Here's the problematic part:
with tf.variable_scope("foo", reuse=True):
    temp_var = tf.get_variable("W")
    size_2a = tf.get_variable("b")
s1 = tf.shape(temp_var).eval()[0]
s2 = tf.shape(size_2a).eval()[0]
row_indices = tf.random_uniform(dtype=tf.int32, minval=0, maxval = s1 - 1, shape=[s1]).eval()
col_indices = tf.random_uniform(dtype=tf.int32, minval=0, maxval = s2 - 1, shape=[s2]).eval()
ones_mask = tf.ones([s1,s2])
# turn 'ones_mask' into 1d variable since "scatter_update" supports linear indexing only
ones_flat = tf.Variable(tf.reshape(ones_mask, [-1]))
# no automatic promotion, so make updates float32 to match ones_mask
updates = tf.zeros(shape=(s1,), dtype=tf.float32)
# get linear indices
linear_indices = row_indices*s2 + tf.reshape(col_indices,s1*s2)
ones_flat = tf.scatter_update(ones_flat, linear_indices/2, updates)
#I want to set to zero only half of all elements,that's why linear_indices/2
# convert back into original shape
ones_mask = tf.reshape(ones_flat, ones_mask.get_shape())
It gives me ValueError: Cannot reshape a tensor with 10 elements to shape [784,10] (7840 elements) for 'foo_1/Reshape_1' (op: 'Reshape') with input shapes: [10], [2]., but I don't know how to be here without reshaping (I tried to reshape to both s1 and s2, no use)
I have already read these topics: Update values of a matrix variable in tensorflow, advanced indexing (feed_dict doesn't seem to work in my case), python numpy ValueError: operands could not be broadcast together with shapes, and practically everything on the subject on stackoverflow =(
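Not a full answer, but here is a minimal sketch of the flatten / scatter_update / reshape pattern with a consistent number of indices and updates, written against the TF 1.x API used above; the shapes 784 and 10 are made-up stand-ins for the shapes of W and b:
import tensorflow as tf

s1, s2 = 784, 10  # assumed shapes, for illustration only
ones_mask = tf.Variable(tf.ones([s1 * s2]))
# choose half of all positions at random, as linear indices into the flat mask
num_zeros = (s1 * s2) // 2
linear_indices = tf.random_shuffle(tf.range(s1 * s2))[:num_zeros]
# scatter_update needs one update value per index
updates = tf.zeros([num_zeros], dtype=tf.float32)
masked_flat = tf.scatter_update(ones_mask, linear_indices, updates)
mask_2d = tf.reshape(masked_flat, [s1, s2])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(tf.reduce_sum(mask_2d)))  # exactly half of the entries remain 1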
I am trying to vectorize an operation using numpy, which I use in a python script that I have profiled and found this operation to be the bottleneck; it needs to be optimized since I will run it many times.
The operation is on a data set of two parts. First, a large set (n) of 1D vectors of different lengths (with maximum length Lmax) whose elements are integers from 1 to maxvalue. The set of vectors is arranged in a 2D array, data, of size (num_samples, Lmax), with trailing elements in each row zeroed. The second part is a set of scalar floats, one associated with each vector, that I have computed and which depend on its length and the integer value at each position. The set of scalars is made into a 1D array, Y, of size num_samples.
The desired operation is to form the average of Y over the n samples, as a function of (value, position along length, length).
This entire operation can be vectorized in MATLAB with the accumarray function, by using three 2D arrays of the same size as data whose elements are the corresponding value, position, and length indices of the desired final array:
sz_Y   = num_samples;
sz_len = Lmax;
sz_pos = Lmax;
sz_val = maxvalue;
ind_len = repmat(1:sz_len, 1, sz_samples);
ind_pos = repmat(1:sz_pos, sz_samples, 1);
ind_val = data;
ind_Y   = repmat((1:sz_Y)', 1, Lmax);
copiedY = Y(ind_Y);
mask = data > 0;
finalarr = accumarray({ind_val(mask), ind_pos(mask), ind_len(mask)}, copiedY(mask), [sz_val sz_pos sz_len]) / sz_val;
I was hoping to emulate this implementation with np.bincount. However, np.bincount differs from accumarray in two relevant ways:
both arguments must be of same 1D size, and
there is no option to choose the shape of the output array.
In the above usage of accumarray, the list of indices, {ind_val(mask),ind_pos(mask),ind_len(mask)}, is a cell array of index arrays used as index tuples, while np.bincount takes a single 1D array of integer bins as far as I understand. I expect np.ravel may be useful, but I am not sure how to use it here to do what I want. I am coming to python from MATLAB and some things do not translate directly, e.g. the colon operator, which ravels in column-major order, the opposite of numpy's ravel. So my question is how I might use np.bincount or any other numpy method to achieve an efficient python implementation of this operation.
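For readers coming from accumarray, a toy sketch of what np.bincount does with the weights and minlength keywords (it accumulates weights into integer bins):
import numpy as np
bins = np.array([0, 1, 1, 3])
w = np.array([0.5, 1.0, 2.0, 4.0])
# sums w[i] into position bins[i]; minlength pads the output with zeros
print(np.bincount(bins, weights=w, minlength=6))  # [0.5 3.  0.  4.  0.  0. ]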
EDIT: To avoid wasting time: for these multi-dimensional index problems with complicated index manipulation, is the recommended route to just use cython to implement the loops explicitly?
EDIT2: Alternative Python implementation I just came up with.
Here is a RAM-heavy solution:
First precalculate:
Using index units for length (i.e., length 1 = index 0), make a 4D bool array of size (num_samples, Lmax+1, Lmax+1, maxvalue+1) holding where the conditions are satisfied for each value in Y.
ALLcond = np.zeros((num_samples, Lmax+1, Lmax+1, maxvalue+1), dtype='bool')
for l in range(Lmax+1):
    for i in range(Lmax+1):
        for v in range(maxvalue+1):
            ALLcond[:, l, i, v] = (data[:, i] == v) & (Lvec == l)
Where Lvec=[len(row) for row in data]. Then get the indices for these using np.where and initialize a 4D float array into which you will assign the values of Y:
ind_Y, ind_len, ind_pos, ind_val = np.where(ALLcond)
Yval = np.zeros(np.shape(ALLcond), dtype='float')
Now in the loop in which I have to perform the operation, I compute it with the two lines:
Yval[ind_Y,ind_len,ind_pos,ind_val]=Y[ind_Y]
Y_avg = Yval.sum(axis=0) / num_samples
This gives a factor of 4 or so speed up over the direct loop implementation. I was expecting more. Perhaps, this is a more tangible implementation for Python heads to digest. Any faster suggestions are welcome :)
One way is to convert the 3 "indices" to a linear index and then apply bincount. Numpy's ravel_multi_index is essentially the same as MATLAB's sub2ind. So the ported code could be something like:
shape = (Lmax+1, Lmax+1, maxvalue+1)
posvec = np.arange(1, Lmax+1)
ind_len = np.tile(Lvec[:,None], [1, Lmax])
ind_pos = np.tile(posvec, [n, 1])
ind_val = data
Y_copied = np.tile(Y[:,None], [1, Lmax])
mask = posvec <= Lvec[:,None] # fill-value independent
lin_idx = np.ravel_multi_index((ind_len[mask], ind_pos[mask], ind_val[mask]), shape)
Y_avg = np.bincount(lin_idx, weights=Y_copied[mask], minlength=np.prod(shape)) / n
Y_avg.shape = shape
This is assuming data has shape (n, Lmax), Lvec is Numpy array, etc. You may need to adapt the code a little to get rid of off-by-one errors.
One could argue that the tile operations are not very efficient and not very "numpythonic". Something with broadcast_arrays could be nice, but I think I prefer this way:
shape = (Lmax+1, Lmax+1, maxvalue+1)
posvec = np.arange(1, Lmax+1)
mask = posvec <= Lvec[:,None]  # fill-value independent
len_idx = np.repeat(Lvec, Lvec)
pos_idx = np.broadcast_to(posvec, data.shape)[mask]
val_idx = data[mask]
Y_copied = np.repeat(Y, Lvec)
lin_idx = np.ravel_multi_index((len_idx, pos_idx, val_idx), shape)
Y_avg = np.bincount(lin_idx, weights=Y_copied, minlength=np.prod(shape)) / n
Y_avg.shape = shape
Note broadcast_to was added in Numpy 1.10.0.
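A quick illustration of broadcast_to, which returns a read-only view rather than a copy:
import numpy as np
posvec = np.arange(1, 4)                   # shape (3,)
tiled = np.broadcast_to(posvec, (5, 3))    # shape (5, 3), no data copied
print(tiled.shape, tiled.flags.writeable)  # (5, 3) False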