Good afternoon.
I continue to have issues with updating random elements in TensorFlow by index.
I want to randomly choose indices (half of them, for instance) and then set the elements at those indices to zero.
Here's the problematic part:
with tf.variable_scope("foo", reuse=True):
    temp_var = tf.get_variable("W")
    size_2a = tf.get_variable("b")
s1 = tf.shape(temp_var).eval()[0]
s2 = tf.shape(size_2a).eval()[0]
row_indices = tf.random_uniform(dtype=tf.int32, minval=0, maxval = s1 - 1, shape=[s1]).eval()
col_indices = tf.random_uniform(dtype=tf.int32, minval=0, maxval = s2 - 1, shape=[s2]).eval()
ones_mask = tf.ones([s1,s2])
# turn 'ones_mask' into 1d variable since "scatter_update" supports linear indexing only
ones_flat = tf.Variable(tf.reshape(ones_mask, [-1]))
# no automatic promotion, so make updates float32 to match ones_mask
updates = tf.zeros(shape=(s1,), dtype=tf.float32)
# get linear indices
linear_indices = row_indices*s2 + tf.reshape(col_indices,s1*s2)
ones_flat = tf.scatter_update(ones_flat, linear_indices/2, updates)
#I want to set to zero only half of all elements,that's why linear_indices/2
# convert back into original shape
ones_mask = tf.reshape(ones_flat, ones_mask.get_shape())
It gives me ValueError: Cannot reshape a tensor with 10 elements to shape [784,10] (7840 elements) for 'foo_1/Reshape_1' (op: 'Reshape') with input shapes: [10], [2]., but I don't know how to proceed without reshaping (I tried reshaping to both s1 and s2, to no avail).
I have already read these topics: Update values of a matrix variable in tensorflow, advanced indexing (feed_dict doesn't seem to work in my case), python numpy ValueError: operands could not be broadcast together with shapes, and practically everything on the subject on Stack Overflow =(
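For reference, the goal described above (zero out a random half of the entries through flat indices) can be sketched in NumPy rather than TensorFlow. This is only an illustration of the indexing arithmetic, not a fix for the TensorFlow snippet. The sizes 784 and 10 are taken from the error message, and drawing the flat indices without replacement is an assumption about what "half of all" should mean.

```python
import numpy as np

rng = np.random.default_rng(0)
s1, s2 = 784, 10  # shapes taken from the error message above

# Draw half of all flat positions, without repetition (an assumption)
n_zero = (s1 * s2) // 2
linear_indices = rng.choice(s1 * s2, size=n_zero, replace=False)

ones_flat = np.ones(s1 * s2, dtype=np.float32)
ones_flat[linear_indices] = 0.0          # one update per chosen index
ones_mask = ones_flat.reshape(s1, s2)    # back to the original shape

# A (row, col) pair maps to the flat position row * s2 + col
rows, cols = np.unravel_index(linear_indices, (s1, s2))
assert np.array_equal(rows * s2 + cols, linear_indices)

print(ones_mask.shape, int(ones_mask.sum()))  # (784, 10) 3920
```

Note that the index array and the update values have the same length here, which is the kind of agreement a scatter-style update needs.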
I have the following tensor:
X = torch.randn(30,1,2) # [batch_size, dim_1, dim_2]
t = torch.Tensor([0])
I am trying to concatenate the t tensor onto the X tensor so that the result is a [30,1,3] tensor. I tried a couple of methods, including torch.stack, but I still have not figured out how to do this properly. Both attempts below gave errors.
result = torch.cat((X,t), dim = -1) # first try
result = torch.stack([X,t], dim = -1) # second try.
Is there a way I can concatenate these tensors?
You can't concatenate the two tensors as described. The shape of tensor X is [30, 1, 2], which means it has 30 positions in the first dimension, 1 position in the second dimension, and 2 positions in the last dimension, totalling 30*1*2 = 60 elements. A tensor of shape [30,1,3] has 90 elements, so you need to add 30 elements to get the desired result, while t holds only one.
You can do this by changing the code to:
>>> X = torch.randn(30,1,2)
>>> t = torch.zeros(30,1,1)
>>> r = torch.cat((X,t), dim=-1)
>>> r.shape
torch.Size([30, 1, 3])
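The same shape rule can be checked in NumPy, whose concatenate also requires every axis except the joined one to match. The sketch below is a NumPy analogue and not PyTorch code; it keeps the original one-element t and stretches it to the needed shape first. In PyTorch, t.expand(30, 1, 1) should play the role that np.broadcast_to plays here.

```python
import numpy as np

X = np.random.randn(30, 1, 2)
t = np.array([0.0])  # shape (1,), standing in for torch.Tensor([0])

# Concatenating along the last axis needs all other axes to match,
# so stretch t to (30, 1, 1) first; broadcast_to returns a view.
t_expanded = np.broadcast_to(t, (30, 1, 1))
r = np.concatenate((X, t_expanded), axis=-1)

print(r.shape)  # (30, 1, 3)
```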
I have found myself needing to add features to existing numpy arrays which has led to a question around what the last portion of the following code is actually doing:
np.ones(shape=feature_set.shape)[...,None]
Set-up
As an example, let's say I wish to solve for linear regression parameter estimates by using numpy and solving:
Assume I have a feature set of shape (50,1), a target variable of shape (50,), and I wish to use the shape of my target variable to add a column for intercept values.
It would look something like this:
# Create random target & feature set
y_train = np.random.randint(0,100, size = (50,))
feature_set = np.random.randint(0,100,size=(50,1))
# Build a set of 1s after shape of target variable
int_train = np.ones(shape=y_train.shape)[...,None]
# Able to then add int_train to feature set
X = np.concatenate((int_train, feature_set),1)
What I Think I Know
I see the difference in output when I include [...,None] vs when I leave it off. Here it is:
The second version returns an error around input arrays needing the same number of dimensions, and eventually I stumbled on the solution to use [...,None].
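A small sketch of the difference being described, using the arrays from the set-up above. The names flat, column and failed are introduced here for illustration only. The exact error text depends on the NumPy version, so the sketch only checks that a ValueError is raised.

```python
import numpy as np

y_train = np.random.randint(0, 100, size=(50,))
feature_set = np.random.randint(0, 100, size=(50, 1))

flat = np.ones(shape=y_train.shape)                # 1-D array
column = np.ones(shape=y_train.shape)[..., None]   # 2-D array with one column
print(flat.shape, column.shape)  # (50,) (50, 1)

# With the extra axis the two arrays line up along axis 1
X = np.concatenate((column, feature_set), 1)
print(X.shape)  # (50, 2)

# Without it, the 1-D and 2-D inputs cannot be joined along axis 1
try:
    np.concatenate((flat, feature_set), 1)
except ValueError as err:
    failed = True
    print("concatenate failed:", err)
else:
    failed = False
```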
Main Question
While I see the output of [...,None] gives me what I want, I am struggling to find any information on what it is actually supposed to do. Can anybody walk me through what this code actually means, what the None argument is doing, etc?
Thank you!
The slice of [..., None] consists of two "shortcuts":
The ellipsis literal component:
The dots (...) represent as many colons as needed to produce a complete indexing tuple. For example, if x is a rank 5 array (i.e., it has 5 axes), then
x[1,2,...] is equivalent to x[1,2,:,:,:],
x[...,3] to x[:,:,:,:,3] and
x[4,...,5,:] to x[4,:,:,5,:].
(Source)
The None component:
numpy.newaxis
The newaxis object can be used in all slicing operations to create an axis of length one. newaxis is an alias for ‘None’, and ‘None’ can be used in place of this with the same result.
(Source)
So, arr[..., None] takes an array of dimension N and "adds" a dimension "at the end" for a resulting array of dimension N+1.
Example:
import numpy as np
x = np.array([[1,2,3],[4,5,6]])
print(x.shape) # (2, 3)
y = x[...,None]
print(y.shape) # (2, 3, 1)
z = x[:,:,np.newaxis]
print(z.shape) # (2, 3, 1)
a = np.expand_dims(x, axis=-1)
print(a.shape) # (2, 3, 1)
print((y == z).all()) # True
print((y == a).all()) # True
Consider this code:
np.ones(shape=(2,3))[...,None].shape
As you can see, the None index changes the (2,3) matrix into a (2,3,1) tensor: it adds a new axis of length one at the LAST position.
If you use
np.ones(shape=(2,3))[None, ...].shape
it adds the new axis at the FIRST position instead, giving shape (1,2,3).
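The two placements can be compared side by side. The third line, with None in the middle, is an extra case added here to show that the new axis lands wherever None appears in the index.

```python
import numpy as np

m = np.ones(shape=(2, 3))

print(m[..., None].shape)   # (2, 3, 1)  new axis last
print(m[None, ...].shape)   # (1, 2, 3)  new axis first
print(m[:, None, :].shape)  # (2, 1, 3)  new axis in the middle
```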
tl;dr what is the most efficient way to dynamically choose some entries of a tensor.
I am trying to implement syntactic GCN in TensorFlow. Basically, I need a different weight matrix for every label (let's ignore biases for this question) and need to choose, at each run, the relevant entries to use. Those would be chosen by a sparse matrix (for each entry there is at most one label in one direction, and mostly there is no edge, so not even that).
More concretely, when I have a sparse zero-one matrix of labeled edges, is it better to use it as a mask, in a sparse-dense tensor multiplication, or maybe just in a normal multiplication? (I guess not the latter, but for simplicity I use it in the example.)
example:
units = 6 # output size
x = ops.convert_to_tensor(inputs[0], dtype=self.dtype)
labeled_edges = ops.convert_to_tensor(inputs[1], dtype=self.dtype)
edges_shape = labeled_edges.get_shape().as_list()
labeled_edges = expand_dims(labeled_edges, -2)
labeled_edges = tile(
    labeled_edges, [1] * (len(edges_shape) - 1) + [units, 1])
graph_kernel = math_ops.multiply(self.kernel, labeled_edges) # here is the question basically
outputs = standard_ops.tensordot(x, graph_kernel, [[1], [0]])
outputs = math_ops.reduce_sum(outputs, [-1])
To answer your tl;dr question, you can try using either of the following:
tf.nn.embedding_lookup: typical usage is tf.nn.embedding_lookup(params, ids). It returns a Tensor whose axis-0 entries are a subset of those of the Tensor params. The indices of the kept entries are given by the Tensor ids.
tf.nn.embedding_lookup_sparse: similar to tf.nn.embedding_lookup, but it takes the ids as a SparseTensor and combines the looked-up entries for each row of that SparseTensor (see its combiner argument).
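To make the gather-versus-mask choice concrete, here is a NumPy analogue and not TensorFlow code. It assumes a hypothetical kernel with one row of weights per label and one label id per edge; the names kernel, ids and one_hot are illustrative. Indexing along axis 0 is what the lookup does, and it gives the same rows as multiplying by a dense zero-one mask without building that mask.

```python
import numpy as np

num_labels, units = 4, 6
# Hypothetical per-label weights: one row of `units` values per label
kernel = np.arange(num_labels * units, dtype=np.float32).reshape(num_labels, units)
ids = np.array([2, 0, 2])  # label chosen for each of three edges

# Gather along axis 0, the NumPy counterpart of an embedding lookup
gathered = kernel[ids]  # shape (3, units)

# Same rows obtained through a dense zero-one mask and a matrix product
one_hot = np.eye(num_labels, dtype=np.float32)[ids]  # shape (3, num_labels)
via_mask = one_hot @ kernel

print(gathered.shape)                      # (3, 6)
print(np.array_equal(gathered, via_mask))  # True
```

The gather touches only the selected rows, whereas the masked product does work proportional to the full number of labels, which is the usual argument for a lookup when the selection is sparse.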
I would like to use a generic filter to calculate the mean of values within a given window (or kernel), for values that fulfill a couple of conditions. I expected the following code to produce a mean filter of the first array in a 3-layer window, using the other two arrays to mask values from the mean calculation.
from scipy import ndimage
import numpy as np
#some test data
tstArr = np.random.rand(3,7,7)
tstArr = tstArr*10
tstArr = np.int_(tstArr)
tstArr[1] = tstArr[1]*100
tstArr[2] = tstArr[2] *1000
#mean function
def testFun(tstData,processLayer,nLayers,kernelSize):
    funData= tstData.reshape((nLayers,kernelSize,kernelSize))
    meanLayer = funData[processLayer]
    maskedData = meanLayer[(funData[1]>1)&(funData[2]<9000)]
    returnMean = np.mean(maskedData)
    return returnMean
#number of layers in the array
nLayers = np.shape(tstArr)[0]
#window size
kernelSize = 5
#create a sampling window of 5x5 elements from each array
footprnt = np.ones((nLayers,kernelSize,kernelSize),dtype = int)
# calculate the mean of the first layer in the array (other two are for masking)
processLayer = 0
tstOut = ndimage.generic_filter(tstArr, testFun, footprint=footprnt, extra_arguments = (processLayer,nLayers,kernelSize))
I thought this would yield a 7x7 array of masked mean values from the first layer in the input array. The output is a 3x7x7 array, and I don't understand what the values represent. I'm not sure how to produce the "masked" mean-filtered array, or how to interpret the output as given.
Your code does produce a mean filter of the first array in a 3-layer window, using the other two arrays to mask values from the mean calculation. You will find the result in tstOut[1].
What is going on? When you call ndimage.generic_filter with tstArr of shape (3, 7, 7) and footprint=np.ones((3, 5, 5)), then for all i from 0 to 2, for all j from 0 to 6 and for all k from 0 to 6, testFun is called with the subarray of tstArr of shape (3, 5, 5) centered at (i, j, k) (the array is reflected at the boundary to supply missing values).
In the end:
tstOut[0] is the mean filter of tstArr[0] with tstArr[0] and tstArr[1] as masks
tstOut[1] is the mean filter of tstArr[0] with tstArr[1] and tstArr[2] as masks
tstOut[2] is the mean filter of tstArr[1] with tstArr[2] and tstArr[2] as masks
Again, the wanted result is in tstOut[1].
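If only the 7x7 result is needed, the same windowed masked mean can be written directly. The following is a NumPy-only sketch, not the scipy call from the question: masked_mean_filter is a hypothetical helper, the thresholds are copied from testFun, and np.pad with mode="symmetric" is used because it mirrors edge values the same way as the default reflect mode of scipy.ndimage. Windows where the mask keeps nothing are left as NaN, and the result is float.

```python
import numpy as np

def masked_mean_filter(arr, kernel_size=5):
    """Mean of layer 0 over each kernel_size x kernel_size window,
    keeping only pixels where layer 1 > 1 and layer 2 < 9000."""
    half = kernel_size // 2
    # Pad only the two spatial axes; all three layers stay as they are
    padded = np.pad(arr, ((0, 0), (half, half), (half, half)), mode="symmetric")
    rows, cols = arr.shape[1:]
    out = np.full((rows, cols), np.nan)
    for j in range(rows):
        for k in range(cols):
            window = padded[:, j:j + kernel_size, k:k + kernel_size]
            keep = (window[1] > 1) & (window[2] < 9000)
            if keep.any():
                out[j, k] = window[0][keep].mean()
    return out

# Test data built the same way as in the question
rng = np.random.default_rng(0)
tst = (rng.random((3, 7, 7)) * 10).astype(int)
tst[1] *= 100
tst[2] *= 1000

result = masked_mean_filter(tst)
print(result.shape)  # (7, 7)
```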
I hope this will help you.
I have a tensor probs with probs.shape = (max_time, num_batches, num_labels).
And I have a tensor targets with targets.shape = (max_seq_len, num_batches) where the values are label indices, i.e. for the third dimension in probs.
Now I want to get a tensor probs_y with probs_y.shape = (max_time, num_batches, max_seq_len) where the third dimension is the index in targets. Basically
probs_y[:,i,:] = probs[:,i,targets[:,i]]
for all 0 <= i < num_batches.
How can I achieve this?
A similar problem with solution was posted here.
The solution there, if I understand correctly, would be:
probs_y = probs[:,T.arange(targets.shape[1])[None,:],targets]
But that doesn't seem to work. I get:
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices.
Also, isn't the creation of the temporary T.arange a bit costly? Especially when I try to work around it by really making it a full dense integer array. There should be a better way.
Maybe theano.map? But as far as I understand, that doesn't parallelize the code, so this is also not a solution.
This works for me:
import numpy as np
import theano
import theano.tensor as T
max_time, num_batches, num_labels = 3, 4, 6
max_seq_len = 5
probs_ = np.arange(max_time * num_batches * num_labels).reshape(
    max_time, num_batches, num_labels)
targets_ = np.arange(num_batches * max_seq_len).reshape(
    max_seq_len, num_batches) % (num_batches - 1)  # mix stuff up
probs, targets = map(theano.shared, (probs_, targets_))
print(probs_)
print(targets_)
# pair batch index i with targets[:, i]; result is (max_time, num_batches, max_seq_len)
probs_y = probs[:, T.arange(targets.shape[1])[:, np.newaxis], targets.T]
print(probs_y.eval())
The above used a transposed version of your indices. Your exact proposition also works:
probs_y2 = probs[:, T.arange(targets.shape[1])[np.newaxis, :], targets]
print(probs_y2.eval())
print((probs_y2.dimshuffle(0, 2, 1) - probs_y).eval())
So maybe your problem is somewhere else.
As for speed, I am at a loss as to what could be faster than this. map, which is a specialization of scan, almost certainly is not. I do not know to what extent the arange is actually built rather than simply iterated over.
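The indexing expression can also be checked without Theano. The sketch below uses plain NumPy with the same toy shapes and compares the result against the per-batch loop from the question. It assumes Theano follows NumPy's advanced-indexing rules here, which the output of the code above suggests; the names batch and expected are illustrative.

```python
import numpy as np

max_time, num_batches, num_labels = 3, 4, 6
max_seq_len = 5
probs = np.arange(max_time * num_batches * num_labels).reshape(
    max_time, num_batches, num_labels)
targets = np.arange(num_batches * max_seq_len).reshape(
    max_seq_len, num_batches) % (num_batches - 1)

batch = np.arange(num_batches)
# Index shapes (num_batches, 1) and (num_batches, max_seq_len) broadcast
# together, so the result is (max_time, num_batches, max_seq_len)
probs_y = probs[:, batch[:, np.newaxis], targets.T]

# Reference: the per-batch assignment written out as a loop
expected = np.empty_like(probs_y)
for i in range(num_batches):
    expected[:, i, :] = probs[:, i, targets[:, i]]

print(probs_y.shape)                      # (3, 4, 5)
print(np.array_equal(probs_y, expected))  # True
```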