Ragged arange in tensorflow - python

I have an arbitrarily nested ragged tensor x I need to perform masking on. Something like:
x = tf.ragged.constant([
[[12, 9], [5]],
[[10], [6, 8], [42]],
])
The easiest way for me to mask will be by index of an element along the 1st axis. Is there a way to get a ragged arange with the same row lengths/splits like:
x = tf.ragged.constant([
[[0, 1], [2]],
[[0], [1, 2], [3]],
])

Try this code:
import tensorflow as tf
x = tf.ragged.constant([
[[12, 9], [5]],
[[10], [6, 8], [42]],
])
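# Offsets (into the flat values) at which each outer row after the first begins.
starts = tf.gather(x.nested_row_splits[1], x.nested_row_splits[0])[1:-1]
starts = tf.cast(starts, tf.int32)
n = tf.shape(x.flat_values)[0]
# Scatter the step between consecutive row starts, so that the running sum
# equals the start offset of the current outer row even with more than two rows.
steps = tf.concat([starts[:1], starts[1:] - starts[:-1]], axis=0)
offsets = tf.cumsum(tf.scatter_nd(starts[:, tf.newaxis], steps, [n]))
output = tf.range(n) - offsets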
x = tf.RaggedTensor.from_nested_row_splits(output, x.nested_row_splits)
print(x)

I managed to solve this by merging the n-d tensor down to rank 2, calculating a ragged range over that, and then essentially reshaping it using the original nested row lengths:
x = tf.ragged.constant([
[[12, 9], [5]],
[[10], [6, 8], [42]],
])
x_2d = x.merge_dims(inner_axis=-1, outer_axis=1)
arange_2d = tf.ragged.range(x_2d.row_lengths())
arange_nd = tf.RaggedTensor.from_nested_row_lengths(
arange_2d.flat_values,
x.nested_row_lengths(),
)
>>> arange_nd
<tf.RaggedTensor [[[0, 1], [2]], [[0], [1, 2], [3]]]>
See this issue for an alternative solution from one of the TensorFlow maintainers.
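Since the goal in the question was masking, here is a minimal sketch of how the ragged arange could be used for that. The helper name `ragged_arange` and the threshold of 2 are illustrative assumptions, not part of the original answer:

```python
import tensorflow as tf

def ragged_arange(x):
    """Per-element index along axis 1, with the same nested row splits as x (hypothetical helper)."""
    # Collapse all inner dimensions so each outer row becomes one flat row.
    x_2d = x.merge_dims(outer_axis=1, inner_axis=-1)
    arange_2d = tf.ragged.range(x_2d.row_lengths())
    # Re-apply the original nesting to the flat index values.
    return tf.RaggedTensor.from_nested_row_lengths(
        arange_2d.flat_values, x.nested_row_lengths())

x = tf.ragged.constant([
    [[12, 9], [5]],
    [[10], [6, 8], [42]],
])
idx = ragged_arange(x)
# Keep only the elements whose index within the outer row is below 2 (assumed threshold).
masked = tf.ragged.boolean_mask(x, idx < 2)
print(masked)
```

The mask has the same ragged shape as `x`, so the nesting is preserved and rows that lose all their elements become empty lists.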

Related

Given a (5,2) tensor, delete rows that have duplicates in the second column

So, let's assume I have a tensor like this:
[[0,18],
[1,19],
[2, 3],
[3,19],
[4, 18]]
I need to delete rows that contain duplicates in the second column, using only TensorFlow. The final output should be this:
[[0,18],
[1,19],
[2, 3]]
You should be able to solve this with tf.math.unsorted_segment_min and tf.gather:
import tensorflow as tf
x = tf.constant([[0,18],
[1,19],
[2, 3],
[3,19],
[4, 18]])
y, idx = tf.unique(x[:, 1])
indices = tf.math.unsorted_segment_min(tf.range(tf.shape(x)[0]), idx, tf.shape(y)[0])
result = tf.gather(x, indices)
print(result)
tf.Tensor(
[[ 0 18]
[ 1 19]
[ 2 3]], shape=(3, 2), dtype=int32)
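Here is a simple explanation of what happens after calling tf.unique: it returns the distinct values y = [18, 19, 3] in order of first appearance, together with idx = [0, 1, 2, 1, 0], which maps each row to its entry in y. tf.math.unsorted_segment_min then takes the smallest row index within each group, giving [0, 1, 2], and tf.gather selects those rows.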
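The intermediate values can be inspected with a small sketch that reuses the example tensor from the answer (the variable name `first_rows` is only illustrative):

```python
import tensorflow as tf

x = tf.constant([[0, 18],
                 [1, 19],
                 [2, 3],
                 [3, 19],
                 [4, 18]])

# y: distinct second-column values; idx: for each row, the position of its value in y.
y, idx = tf.unique(x[:, 1])
print(y.numpy())    # distinct values in order of first appearance
print(idx.numpy())  # group id of every row

# For every group, keep the smallest row index, i.e. the first occurrence.
first_rows = tf.math.unsorted_segment_min(
    tf.range(tf.shape(x)[0]), idx, tf.shape(y)[0])
print(first_rows.numpy())
```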

How to delete particular array in 2 dimensional NumPy array by value?

Let the 2-dimensional array be as below:
In [1]: a = [[1, 2], [3, 4], [5, 6], [1, 2], [7, 8]]
a = np.array(a)
a, type(a)
Out [1]: (array([[1, 2],
[3, 4],
[5, 6],
[1, 2],
[7, 8]]),
numpy.ndarray)
I have tried to do this procedure:
In [2]: a = a[a != [1, 2]]
a = np.reshape(a, (int(a.size/2), 2)) # I have to do this because the first line of In [2] flattens the result to 1 dimension, [3, 4, 5, 6, 7, 8] (the initial array is 2-dimensional)
a
Out[2]: array([[3, 4],
[5, 6],
[7, 8]])
My question is, is there any function in NumPy that can directly do that?
Updated Question
Here's the semi-full source code that I've been working on:
import numpy as np
import pandas as pd
from sklearn import datasets
data = datasets.load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['Target'] = pd.DataFrame(data.target)
bucket = df[df['Target'] == 0]
bucket = bucket.iloc[:,[0,1]].values
lp, rp = leftestRightest(bucket)
bucket = np.array([x for x in bucket if list(x) != lp])
bucket = np.array([x for x in bucket if list(x) != rp])
Notes:
leftestRightest(arg) is a function that takes a 2-dimensional NumPy array and returns two one-dimensional NumPy arrays of size 2 (lp and rp). For instance, lp = [1, 3], rp = [2, 4].
There should be a more elegant approach, but here is what I have come up with:
np.array([x for x in a if list(x) != [1,2]])
Output
[[3, 4], [5, 6], [7, 8]]
Note that I wouldn't recommend list comprehensions for large arrays, since they would be highly time-consuming.
Your approach is close, but the mask needs to be one-dimensional (one boolean per row). Keep a row if any of its elements differs from the target:
a[(a != [1, 2]).any(-1)]
Output:
array([[3, 4],
[5, 6],
[7, 8]])
Alternatively, you can collect the elements and infer the dimension with -1:
a[a != [1, 2]].reshape(-1, 2)
The boolean condition creates a 2D array of True/False values. You have to reduce it across the columns so that a partial match is not treated as a full match. Consider a row [5, 2] in your array: the script you wrote would keep 5 and drop 2 in the resulting 1D array. A row should be removed only when every element matches, which can be done as follows:
a[~np.all(a == [1, 2], axis=1)]
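To make the partial-match point concrete, here is a small sketch. The extra row [5, 2] is hypothetical test data that is not in the question's array:

```python
import numpy as np

# Hypothetical data: the row [5, 2] shares only its second element with the target.
a = np.array([[1, 2], [3, 4], [5, 2], [1, 2], [7, 8]])
target = np.array([1, 2])

# A row is a full match only if every element equals the target.
full_match = (a == target).all(axis=1)
kept = a[~full_match]
print(kept)

# Requiring every element to differ is stricter: it also drops the partial match [5, 2].
too_strict = a[(a != target).all(axis=1)]
print(too_strict)
```

On the question's original array both masks give the same result, because no row there matches [1, 2] only partially.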

Get indices of element of one array using indices in another array

Suppose I have an array a of shape (2, 2, 2):
a = np.array([[[7, 9],
[19, 18]],
[[24, 5],
[18, 11]]])
and an array b that is the max of a: b=a.max(-1) (row-wise):
b = np.array([[9, 19],
[24, 18]])
I'd like to obtain the index of elements in b using index in flattened a, i.e. a.reshape(-1):
array([ 7, 9, 19, 18, 24, 5, 18, 11])
The result should be an array that is the same shape with b with indices of b in flattened a:
array([[1, 2],
[4, 6]])
Basically this is the result of maxpool2d when return_indices=True in PyTorch, but I'm looking for an implementation in NumPy. I've tried where, but it doesn't seem to work. Also, is it possible to combine finding the max and the indices in one go, to be more efficient? Thanks for any help!
I have a solution similar to that of Andras based on np.argmax and np.arange. Instead of "indexing the index" I propose to add a piecewise offset to the result of np.argmax:
import numpy as np
a = np.array([[[7, 9],
[19, 18]],
[[24, 5],
[18, 11]]])
off = np.arange(0, a.size, a.shape[2]).reshape(a.shape[0], a.shape[1])
>>> off
array([[0, 2],
[4, 6]])
This results in:
>>> a.argmax(-1) + off
array([[1, 2],
[4, 6]])
Or as a one-liner:
>>> a.argmax(-1) + np.arange(0, a.size, a.shape[2]).reshape(a.shape[0], a.shape[1])
array([[1, 2],
[4, 6]])
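The offset idea is not tied to 3-d input. A minimal sketch for any number of leading dimensions, assuming a non-empty last axis (the function name `flat_argmax_last_axis` is made up for illustration):

```python
import numpy as np

def flat_argmax_last_axis(a):
    """Indices into a.reshape(-1) of the maxima along the last axis (hypothetical helper)."""
    # Position of the first element of each last-axis row in the flattened array.
    offsets = np.arange(0, a.size, a.shape[-1]).reshape(a.shape[:-1])
    return a.argmax(-1) + offsets

a = np.array([[[7, 9],
               [19, 18]],
              [[24, 5],
               [18, 11]]])
flat_idx = flat_argmax_last_axis(a)
print(flat_idx)
# Indexing the flattened array with these indices recovers the maxima.
print(a.reshape(-1)[flat_idx])
```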
The only solution I could think of right now is generating a 2d (or 3d, see below) range that indexes your flat array, and indexing into that with the maximum indices that define b (i.e. a.argmax(-1)):
import numpy as np
a = np.array([[[ 7, 9],
[19, 18]],
[[24, 5],
[18, 11]]])
multi_inds = a.argmax(-1)
b_shape = a.shape[:-1]
b_size = np.prod(b_shape)
flat_inds = np.arange(a.size).reshape(b_size, -1)
flat_max_inds = flat_inds[range(b_size), multi_inds.ravel()]
max_inds = flat_max_inds.reshape(b_shape)
I separated the steps with some meaningful variable names, which should hopefully explain what's going on.
multi_inds tells you which "column" to choose in each "row" in a to get the maximum:
>>> multi_inds
array([[1, 0],
[0, 0]])
flat_inds is a list of indices, from which one value is to be chosen in each row:
>>> flat_inds
array([[0, 1],
[2, 3],
[4, 5],
[6, 7]])
This is indexed into exactly according to the maximum indices in each row. flat_max_inds are the values you're looking for, but in a flat array:
>>> flat_max_inds
array([1, 2, 4, 6])
So we need to reshape that back to match b.shape:
>>> max_inds
array([[1, 2],
[4, 6]])
A slightly more obscure but also more elegant solution is to use a 3d index array and use broadcasted indexing into it:
import numpy as np
a = np.array([[[ 7, 9],
[19, 18]],
[[24, 5],
[18, 11]]])
multi_inds = a.argmax(-1)
i, j = np.indices(a.shape[:-1])
max_inds = np.arange(a.size).reshape(a.shape)[i, j, multi_inds]
This does the same thing without an intermediate flattening into 2d.
The last part is also how you can get b from multi_inds, i.e. without having to call a *max function a second time:
b = a[i, j, multi_inds]
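As an alternative to building i and j by hand, np.take_along_axis can also recover the maxima from the argmax result. This is a sketch of that variant rather than part of the answer above:

```python
import numpy as np

a = np.array([[[7, 9],
               [19, 18]],
              [[24, 5],
               [18, 11]]])
multi_inds = a.argmax(-1)

# take_along_axis needs the index array to have the same number of dimensions as a,
# so a trailing axis is added for the lookup and removed again afterwards.
b = np.take_along_axis(a, multi_inds[..., np.newaxis], axis=-1).squeeze(-1)
print(b)
```

This works for any number of leading dimensions, since no per-axis index arrays have to be written out.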
This is a long one-liner
new = np.array([np.where(a.reshape(-1)==x)[0][0] for x in a.max(-1).reshape(-1)]).reshape(2,2)
print(new)
array([[1, 2],
[4, 3]])
However, the value 18 appears twice in a, so it is ambiguous which index is the target; this one-liner always returns the first occurrence.

Tensorflow - Grouping placeholders by batch index

Given a network with two or more placeholders of varying dimensionality e.g.
x1 = tf.placeholder(tf.int32, [None, seq_len])
x2 = tf.placeholder(tf.int32, [None, seq_len])
xn = tf.placeholder(tf.int32, [None, None, seq_len])
The first dimension in each placeholder corresponds to the minibatch size. seq_len is the length of the inputs. The second dimension is like a list of inputs that I need to process together with x1 and x2 for each index in the minibatch. How can I group these tensors to operate on them by batch index?
For example
x1 = [[1, 2, 3], [4, 5, 6]]
x2 = [[7, 8, 9], [8, 7, 6]]
xn = [[[1, 5, 2], [7, 2, 8], [3, 2, 5]], [[8, 9, 8]]]
I need to keep x1[0] i.e. [1, 2, 3], x2[0] i.e. [7, 8, 9], and xn[0] i.e. [[1, 5, 2], [7, 2, 8], [3, 2, 5]] together, because I need to perform matrix operations between x1[i] and each element in xn[i] for all i.
Notice that the dimensionality of xn is jagged.
I'm still not sure I understand your question. If I understand correctly, your challenge comes from the jagged dimensionality of xn. Below is a way of "unrolling" along the batch index. The result is a list of length batch_size; each element in the list is a Tensor. Of course, you can perform other operations on these individual tensors before evaluating them.
I have to use tf.scan to perform the operation for each element of xn[i] because its first dimension is dynamic. Better solutions might exist, though.
import numpy as np
import tensorflow as tf  # TF 1.x API (tf.Session)

x1 = np.array([[1, 2, 3]])
xn = np.array([[[1, 5, 2], [7, 2, 8], [3, 2, 5]]])
batch_size = x1.shape[0]
result = []
for batch_idx in range(batch_size):
    x1_i = x1[batch_idx]
    xn_i = xn[batch_idx]
    # Multiply each element of xn[i] by x1[i]; the accumulator `a` is unused.
    result.append(tf.scan(fn=lambda a, x: x * x1_i, elems=xn_i, initializer=x1_i))
with tf.Session() as sess:
    print(sess.run([result[0]]))
# Result: x1[0] multiplied element-wise with each element in xn[0].
# Feel free to plug in your own matrix operations in the `fn` arg of `tf.scan`.
[array([[ 1, 10, 6],
[ 7, 4, 24],
[ 3, 4, 15]])]
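The grouping itself can be sketched without TensorFlow. The version below uses the two-item batch from the question and assumes xn is held as a Python list with one array per batch item, because its second dimension is jagged. The element-wise product is only a placeholder operation:

```python
import numpy as np

x1 = np.array([[1, 2, 3], [4, 5, 6]])
x2 = np.array([[7, 8, 9], [8, 7, 6]])
# One (k_i, seq_len) array per batch item, since k_i differs between items.
xn = [np.array([[1, 5, 2], [7, 2, 8], [3, 2, 5]]),
      np.array([[8, 9, 8]])]

results = []
# zip keeps x1[i], x2[i] and xn[i] together for every batch index i.
for x1_i, x2_i, xn_i in zip(x1, x2, xn):
    # Broadcasting applies x1_i to every row of xn_i; x2_i is available here
    # for whatever operation is actually needed.
    results.append(xn_i * x1_i)

for r in results:
    print(r)
```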

Can numpy strides stride only within subarrays?

I have a really big numpy array(145000 rows * 550 cols). And I wanted to create rolling slices within subarrays. I tried to implement it with a function. The function lagged_vals behaves as expected but np.lib.stride_tricks does not behave the way I want it to -
def lagged_vals(series, l):
    # Garbage implementation but still right
    return np.concatenate(
        [[x[i:i+l] for i in range(x.shape[0]) if i+l <= x.shape[0]] for x in series],
        axis=0)
# Sample 2D numpy array
something = np.array([[1,2,2,3],[2,2,3,3]])
lagged_vals(something,2) # Works as expected
# array([[1, 2],
# [2, 2],
# [2, 3],
# [2, 2],
# [2, 3],
# [3, 3]])
np.lib.stride_tricks.as_strided(something,
(something.shape[0]*something.shape[1],2),
(8,8))
# array([[1, 2],
# [2, 2],
# [2, 3],
# [3, 2], <--- across subarray stride, which I do not want
# [2, 2],
# [2, 3],
# [3, 3]])
How do I remove that particular row in the np.lib.stride_tricks implementation? And how can I scale this cross array stride removal for a big numpy array ?
Sure, that's possible with np.lib.stride_tricks.as_strided. Here's one way -
from numpy.lib.stride_tricks import as_strided
L = 2 # window length
shp = a.shape
strd = a.strides
out_shp = shp[0],shp[1]-L+1,L
out_strd = strd + (strd[1],)
out = as_strided(a, out_shp, out_strd).reshape(-1,L)
Sample input, output -
In [177]: a
Out[177]:
array([[0, 1, 2, 3],
[4, 5, 6, 7]])
In [178]: out
Out[178]:
array([[0, 1],
[1, 2],
[2, 3],
[4, 5],
[5, 6],
[6, 7]])
Note that the final reshape forces a copy. That can't be avoided if the final output has to be 2D. If a 3D output is acceptable, skip the reshape and get a view instead, as shown with the sample case -
In [181]: np.shares_memory(a, out)
Out[181]: False
In [182]: as_strided(a, out_shp, out_strd)
Out[182]:
array([[[0, 1],
[1, 2],
[2, 3]],
[[4, 5],
[5, 6],
[6, 7]]])
In [183]: np.shares_memory(a, as_strided(a, out_shp, out_strd) )
Out[183]: True
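On NumPy 1.20 or later, `np.lib.stride_tricks.sliding_window_view` is a safer way to build the same windows without computing strides by hand. A minimal sketch using the sample array from above:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

L = 2  # window length
a = np.arange(8).reshape(2, 4)

# Slide only along axis 1, so no window crosses from one row into the next.
windows = sliding_window_view(a, L, axis=1)  # shape (rows, cols - L + 1, L)
out = windows.reshape(-1, L)                 # flattening to 2D makes a copy

print(windows.shape)
print(out)
print(np.shares_memory(a, windows))
```

The 3D result is a read-only view of `a`; as with `as_strided`, only the reshape to 2D copies data.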
