Create Numpy array without enumerating array - python

Starting with this:
x = range(30,60,2)[::-1];
x = np.asarray(x); x
array([58, 56, 54, 52, 50, 48, 46, 44, 42, 40, 38, 36, 34, 32, 30])
Create an array like this (notice that the first item repeats). If I can get this faster without the first item repeating, I can np.hstack the first item on myself.
[[58 58 56 54 52]
 [56 56 54 52 50]
 [54 54 52 50 48]
 [52 52 50 48 46]
 [50 50 48 46 44]
 [48 48 46 44 42]
 [46 46 44 42 40]
 [44 44 42 40 38]
 [42 42 40 38 36]
 [40 40 38 36 34]
 [38 38 36 34 32]
 [36 36 34 32 30]
 [34 34 32 30 None]
 [32 32 30 None None]
 [30 30 None None None]]
The code below works, but I want it faster, without the for loop and enumerate.
arr = np.empty((0,5), int)
for i, e in enumerate(x):
    arr2 = np.hstack((x[i], x[i:i+4], np.asarray([None]*5)))[:5]
    arr = np.vstack((arr, arr2))

Approach #1
Here's a vectorized approach using NumPy broadcasting -
N = 4 # width factor
x_ext = np.concatenate((x,[None]*(N-1)))
arr2D = x_ext[np.arange(N) + np.arange(x_ext.size-N+1)[:,None]]
out = np.column_stack((x,arr2D))
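For intuition, here is the index matrix that broadcasting builds, shown on a tiny input (a sketch; the shapes are illustrative only):
np.arange(3) + np.arange(4)[:, None]
# array([[0, 1, 2],
#        [1, 2, 3],
#        [2, 3, 4],
#        [3, 4, 5]])
Each row starts one element later than the previous one, which is exactly the sliding-window pattern that gets indexed out of x_ext.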
Approach #2
Here's another one using hankel -
from scipy.linalg import hankel
N = 4 # width factor
x_ext = np.concatenate((x,[None]*(N-1)))
out = np.column_stack((x,hankel(x_ext[:4], x_ext[3:]).T))
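If you haven't used hankel before, here is what it produces on a small input (a sketch for intuition only):
from scipy.linalg import hankel
hankel([1, 2, 3], [3, 4, 5])
# array([[1, 2, 3],
#        [2, 3, 4],
#        [3, 4, 5]])
The first argument gives the first column, the second gives the last row, and each anti-diagonal is constant, so the transpose yields the same overlapping windows as Approach #1.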
Runtime test
Here's a modified version of @Aaron's benchmarking script, using the same input format for both posts for a fair comparison and focusing on just these two approaches -
import numpy as np
from time import time

upper_limit = 58 # We will edit this to vary the dataset sizes
print "Timings are : "

t = time()
for _ in range(1000):  # 1000 iterations of @Aaron's soln.
    width = 3
    x = np.array(range(upper_limit,28,-2) + [float('nan')]*width)
    arr = np.empty([len(x)-width, width+2])
    arr[:,0] = x[:len(x)-width]
    for i in xrange(len(x)-width):
        arr[i,1:] = x[i:i+width+1]
print(time()-t)

t = time()
for _ in range(1000):  # 1000 iterations of this post's soln.
    N = 4 # width factor
    x_ext = np.array(range(upper_limit,28,-2) + [float('nan')]*(N-1))
    arr2D = x_ext[np.arange(N) + np.arange(x_ext.size-N+1)[:,None]]
    out = np.column_stack((x_ext[:len(x_ext)-N+1],arr2D))
print(time()-t)
Case #1 (upper_limit = 58):
Timings are :
0.0316879749298
0.0322730541229
Case #2 (upper_limit = 1058):
Timings are :
0.680443048477
0.124517917633
Case #3 (upper_limit = 5058):
Timings are :
3.28129291534
0.47504901886

I got about an order of magnitude faster by avoiding _stack() and only using floats...
edit: added @Divakar's post to the time trial...
import numpy as np
from time import time

t = time()
for _ in range(1000):  # 1000 iterations of my soln.
    width = 3
    x = np.array(range(58,28,-2) + [float('nan')]*width)
    arr = np.empty([len(x)-width, width+2])
    arr[:,0] = x[:len(x)-width]
    for i in xrange(len(x)-width):
        arr[i,1:] = x[i:i+width+1]
print(time()-t)

t = time()
for _ in range(1000):  # 1000 iterations of OP's code
    x = range(30,60,2)[::-1]
    x = np.asarray(x)
    arr = np.empty((0,5), int)
    for i, e in enumerate(x):
        arr2 = np.hstack((x[i], x[i:i+4], np.asarray([None]*5)))[:5]
        arr = np.vstack((arr,arr2))
print(time()-t)

t = time()
for _ in range(1000):  # 1000 iterations of @Divakar's soln.
    x = np.array(range(58,28,-2))
    N = 4 # width factor
    x_ext = np.hstack((x,[None]*(N-1)))
    arr2D = x_ext[np.arange(N) + np.arange(x_ext.size-N+1)[:,None]]
    out = np.column_stack((x,arr2D))
print(time()-t)
prints out:
>>> runfile('...temp.py', wdir='...')
0.0160000324249
0.374000072479
0.0319998264313
>>>

Starting with @Divakar's padded x
N = 4 # width factor
x_ext = np.concatenate((x,[None]*(N-1)))
Since we aren't doing math on it, padding with None (which makes an object array) or np.nan (which makes a float) shouldn't make much difference.
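To see the dtype difference for yourself (a quick check, assuming x is an integer array as above):
np.concatenate((x, [None]*3)).dtype    # object
np.concatenate((x, [np.nan]*3)).dtype  # float64
Object arrays are slower for most operations, so np.nan padding is usually preferable if you do need math on the result later.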
The column stack could be eliminated with a little change to the indexing:
idx = np.r_[0,np.arange(N)] + np.arange(x_ext.size-N+1)[:,None]
this produces
array([[ 0,  0,  1,  2,  3],
       [ 1,  1,  2,  3,  4],
       [ 2,  2,  3,  4,  5],
       [ 3,  3,  4,  5,  6],
       [ 4,  4,  5,  6,  7],
       ...
so the full result is
x_ext[idx]
================
A different approach is to use striding to create a kind of rolling window.
as_strided = np.lib.stride_tricks.as_strided
arr2D = as_strided(x_ext, shape=(15,4), strides=(4,4))
This is one of the easier applications of as_strided. shape is straightforward - the shape of the desired result (without the repeat column): (x.shape[0], N).
In [177]: x_ext.strides
Out[177]: (4,)
For a 1d array of this type, the step to the next item is 4 bytes. If I reshape the array to 2d with 3 columns, the stride to the next row is 12 bytes - 3*4 (an offset of 3 elements).
In [181]: x_ext.reshape(6,3).strides
Out[181]: (12, 4)
Using strides=(4,4) means that the step to the next row is just 4 bytes - one element in the original.
as_strided(x_ext,shape=(8,4),strides=(8,4))
produces a 2 item overlap
array([[58, 56, 54, 52],
       [54, 52, 50, 48],
       [50, 48, 46, 44],
       [46, 44, 42, 40],
       ....
The potentially dangerous part of as_strided is that it is possible to create an array that samples memory outside of the original data buffer. Usually that appears as large random numbers where None appears in this example. It's the same sort of error that you would encounter in C code if you were careless with array pointers and indexing.
The as_strided array is a view (the repeated values are not copied). So writing to that array could be dangerous. The column_stack with x will make a copy, replicating the repeated values as needed.
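On NumPy 1.20+ the same rolling window can be had without manual stride arithmetic via sliding_window_view, which wraps as_strided safely (a sketch; note it returns a read-only view):
from numpy.lib.stride_tricks import sliding_window_view
arr2D = sliding_window_view(x_ext, N)   # shape (x_ext.size - N + 1, N)
out = np.column_stack((x, arr2D))
Its bounds checking means you cannot accidentally read outside the buffer the way a hand-written strides tuple allows.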

I suggest constructing an initial matrix whose rows are all equal and then using np.roll() to rotate them:
import numpy as np
import numpy.matlib
import timeit as ti

x = range(30,60,2)[::-1]
x = np.asarray(x)

def sol1():
    # Your solution, for comparison
    arr = np.empty((0,5), int)
    for i, e in enumerate(x):
        arr2 = np.hstack((x[i], x[i:i+4], np.asarray([None]*5)))[:5]
        arr = np.vstack((arr, arr2))
    return arr

def sol2():
    # My proposal
    x2 = np.hstack((x, [None]*3))
    mat = np.matlib.repmat(x2, 5, 1)
    for i in range(3):
        mat[i+2, :] = np.roll(mat[i+2, :], -(i+1))
    return mat[:,:-3].T

print(ti.timeit(sol1, number=100))
print(ti.timeit(sol2, number=100))
which gives:
0.026760146000015084
0.0038611710006080102
It uses a for loop, but the loop only iterates over the shorter axis. Also, it should not be hard to adapt this code to other configurations instead of using hardcoded numbers, as sketched below.
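A sketch of such a generalization (the function and parameter names are my own; width is the window size, and the first column still repeats):
import numpy as np

def rolled_windows(x, width=4, pad=None):
    # Build width+1 identical rows, roll each subsequent row one step
    # further left, then crop the padding and transpose.
    x2 = np.hstack((x, [pad] * (width - 1)))
    mat = np.tile(x2, (width + 1, 1))
    for i in range(width - 1):
        mat[i + 2, :] = np.roll(mat[i + 2, :], -(i + 1))
    return mat[:, :len(x)].T
This uses np.tile instead of np.matlib.repmat, since numpy.matlib has since been deprecated.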

Related

find index of n consecutive values greater than zero with the largest sum from a numpy array (or pandas Series)

So here is my problem: I have an array like this:
arr = array([0, 0, 1, 8, 10, 20, 26, 32, 37, 52, 0, 0, 46, 42, 30, 19, 8, 2, 0, 0, 0])
In this array I want to find n consecutive values greater than zero with the biggest sum. In this example with n = 5 this would be array([20, 26, 32, 37, 52]) and the index would be 5.
What I tried is of course a loop:
n = 5
max_sum = 0
max_loc = 0
for i in range(arr.size - n):
    if all(arr[i:i + n] > 0) and arr[i:i + n].sum() > max_sum:
        max_sum = arr[i:i + n].sum()
        max_loc = i
print(max_loc)
This is fine for a few short arrays, but of course I need to use this on many arrays that are not so short.
I was experimenting with numpy so I would only have to iterate non-zero value groups:
diffs = np.concatenate((np.array([False]), np.diff(arr > 0)))
groups = np.split(arr, np.where(diffs)[0])
for group in groups:
    if group.sum() > 0 and group.size >= n:
        ...
but I believe this, while nice, is not the right direction. I am looking for a simpler and faster numpy / pandas solution that really uses the powers of these packages.
Using cross-correlation, numpy.correlate, is a possible, concise and fast solution:
n = 5
arr[arr <= 0] = np.iinfo(arr.dtype).min  # the smallest (most negative) integer possible
# Thanks for the np.iinfo suggestion, @Corralien
idx = np.argmax(np.correlate(arr, np.ones(n), 'valid'))
idx, arr[idx:(idx + n)]
Another possible solution:
n, l = 5, arr.size
arr[arr <= 0] = np.iinfo(arr.dtype).min  # the smallest (most negative) integer possible
# Thanks for the np.iinfo suggestion, @Corralien
idx = np.argmax([np.sum(np.roll(arr, -x)[:n]) for x in range(l - n + 1)])
idx, arr[idx:(idx + n)]
Output:
(5, array([20, 26, 32, 37, 52]))
You can use sliding_window_view:
from numpy.lib.stride_tricks import sliding_window_view
N = 5
win = sliding_window_view(arr, N)
idx = ((win.sum(axis=1)) * ((win>0).all(axis=1))).argmax()
print(idx, arr[idx:idx+N])
# Output
5 [20 26 32 37 52]
Answer greatly enhanced by chrslg to save memory and keep win as a view.
Update
A nice bonus is this should work with Pandas Series just fine.
import pandas as pd

N = 5
idx = pd.Series(arr).where(lambda x: x > 0).rolling(N).sum().shift(-N + 1).idxmax()
print(idx, arr[idx:idx+N])
# Output
5 [20 26 32 37 52]

select random indices from 2d array

I want to generate a 2d random array and select m random indices, then alter their values with m predefined values.
For example here, I want to generate a 4 by 4 matrix, then select 4 random indices and replace their values with the values [105,110,115,120].
random_matrix = np.random.randint(0, 100, (4, 4))
# array([[27, 20,  2,  8],
#        [43, 88, 14, 63],
#        [ 5, 55,  4, 72],
#        [59, 49, 84, 96]])
Now, I want to randomly select 4 indices and alter their values from predefined p_array = [105,110,115,120]
I try to generate all the indices like this:
[
    (i, j)
    for i in range(len(random_matrix))
    for j in range(len(random_matrix[i]))
]
But how do I select 4 random indices from this and alter their values with the predefined p_array? I couldn't think of a solution because I have to ensure 4 unique random indices, which is where I got badly stuck, since randomness alone doesn't guarantee uniqueness.
Can we generate the random matrix and select the indices in a single shot? I need that because as m gets larger and larger, the current implementation gets slower. I have to ensure performance as well.
Do the following:
import numpy as np
# for reproducibility
np.random.seed(42)
rows, cols = 4, 4
p_array = np.array([105, 110, 115, 120])
# generate random matrix that will always include all the values from p_array
k = rows * cols - len(p_array)
random_matrix = np.concatenate((p_array, np.random.randint(0, 100, k)))
np.random.shuffle(random_matrix)
random_matrix = random_matrix.reshape((rows, cols))
print(random_matrix)
Output
[[115  33  54  27]
 [  3  27  16  69]
 [ 33  24  81 105]
 [ 62 110  94 120]]
UPDATE
Assuming the same setup as before, you could do the following, to generate a random matrix knowing the indices of the p_array values:
positions = np.random.permutation(np.arange(rows * cols))
random_matrix = random_matrix[positions].reshape((rows, cols))
print("random-matrix")
print("-------------")
print(random_matrix)
print("-------------")
# get indices in flat array
flat_indices = np.argwhere(np.isin(positions, np.arange(4))).flatten()
# get indices in matrix
matrix_indices = np.unravel_index(flat_indices, (rows, cols))
print("p_array-indices")
print("-------------")
print(matrix_indices)
# verify that indeed those are the values
print(random_matrix[matrix_indices])
Output
random-matrix
-------------
[[ 60  74  20  14]
 [105  86 120  82]
 [ 74  87 110  51]
 [ 92 115  99  71]]
-------------
p_array-indices
-------------
(array([1, 1, 2, 3]), array([0, 2, 2, 1]))
[105 120 110 115]
You can do the following, using your suggested cross-product and random.sample:
import random
from itertools import product
pool = [*product(range(len(random_matrix)), range(len(random_matrix[0])))]
random_indices = random.sample(pool, 4)
# [(3, 1), (1, 2), (2, 0), (2, 3)]
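A NumPy-only variant (a sketch using the Generator API; the names are illustrative) draws unique flat positions and converts them to 2-D indices, avoiding the Python-level product entirely:
import numpy as np

rng = np.random.default_rng()
random_matrix = rng.integers(0, 100, (4, 4))
p_array = np.array([105, 110, 115, 120])

# choice with replace=False guarantees the positions are unique
flat = rng.choice(random_matrix.size, size=len(p_array), replace=False)
random_matrix[np.unravel_index(flat, random_matrix.shape)] = p_array
This should scale well as m grows, since both the draw and the assignment stay vectorized.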

NumPy random shuffle rows independently

I have the following array:
import numpy as np
a = np.array([[1, 2, 3],
              [1, 2, 3],
              [1, 2, 3]])
I understand that np.random.shuffle(a.T) will shuffle the array along the rows, but what I need is for it to shuffle each row independently. How can this be done in numpy? Speed is critical as there will be several million rows.
For this specific problem, each row will contain the same starting population.
import numpy as np
np.random.seed(2018)

def scramble(a, axis=-1):
    """
    Return an array with the values of `a` independently shuffled along the
    given axis
    """
    b = a.swapaxes(axis, -1)
    n = a.shape[axis]
    idx = np.random.choice(n, n, replace=False)
    b = b[..., idx]
    return b.swapaxes(axis, -1)

a = np.arange(4*9).reshape(4, 9)
# array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8],
#        [ 9, 10, 11, 12, 13, 14, 15, 16, 17],
#        [18, 19, 20, 21, 22, 23, 24, 25, 26],
#        [27, 28, 29, 30, 31, 32, 33, 34, 35]])
print(scramble(a, axis=1))
yields
[[ 3  8  7  0  4  5  1  2  6]
 [12 17 16  9 13 14 10 11 15]
 [21 26 25 18 22 23 19 20 24]
 [30 35 34 27 31 32 28 29 33]]
while scrambling along the 0-axis:
print(scramble(a, axis=0))
yields
[[18 19 20 21 22 23 24 25 26]
 [ 0  1  2  3  4  5  6  7  8]
 [27 28 29 30 31 32 33 34 35]
 [ 9 10 11 12 13 14 15 16 17]]
This works by first swapping the target axis with the last axis:
b = a.swapaxes(axis, -1)
This is a common trick used to standardize code which deals with one axis.
It reduces the general case to the specific case of dealing with the last axis.
Since in NumPy version 1.10 or higher swapaxes returns a view, there is no copying involved and so calling swapaxes is very quick.
Now we can generate a new index order for the last axis:
n = a.shape[axis]
idx = np.random.choice(n, n, replace=False)
Now we can shuffle b (independently along the last axis):
b = b[..., idx]
and then reverse the swapaxes to return an a-shaped result:
return b.swapaxes(axis, -1)
If you don't want a return value and want to operate on the array directly, you can specify the indices to shuffle.
>>> import numpy as np
>>>
>>>
>>> a = np.array([[1,2,3], [1,2,3], [1,2,3]])
>>>
>>> # Shuffle row `2` independently
>>> np.random.shuffle(a[2])
>>> a
array([[1, 2, 3],
       [1, 2, 3],
       [3, 2, 1]])
>>>
>>> # Shuffle column `0` independently
>>> np.random.shuffle(a[:,0])
>>> a
array([[3, 2, 3],
       [1, 2, 3],
       [1, 2, 1]])
If you want a return value as well, you can use numpy.random.permutation, in which case replace np.random.shuffle(a[n]) with a[n] = np.random.permutation(a[n]).
Warning, do not do a[n] = np.random.shuffle(a[n]). shuffle does not return anything, so the row/column you end up "shuffling" will be filled with nan instead.
Good answer above. But I will throw in a quick and dirty way:
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
ignore_list_output = [np.random.shuffle(x) for x in a]
Then, a can be something like this
array([[2, 1, 3],
       [4, 6, 5],
       [9, 7, 8]])
Not very elegant but you can get this job done with just one short line.
Building on my comment to @Hun's answer, here's the fastest way to do this:
def shuffle_along(X):
    """Minimal in place independent-row shuffler."""
    [np.random.shuffle(x) for x in X]
This works in-place and can only shuffle rows. If you need more options:
def shuffle_along(X, axis=0, inline=False):
    """More elaborate version of the above."""
    if not inline:
        X = X.copy()
    if axis == 0:
        [np.random.shuffle(x) for x in X]
    if axis == 1:
        [np.random.shuffle(x) for x in X.T]
    if not inline:
        return X
This, however, has the limitation of only working on 2d-arrays. For higher dimensional tensors, I would use:
def shuffle_along(X, axis=0, inline=True):
    """Shuffle along any axis of a tensor."""
    if not inline:
        X = X.copy()
    np.apply_along_axis(np.random.shuffle, axis, X)  # <-- I just changed this
    if not inline:
        return X
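A quick usage sketch for the tensor version (the array values are arbitrary):
X = np.arange(24).reshape(2, 3, 4)
shuffled = shuffle_along(X, axis=2, inline=False)  # copy, last axis shuffled
shuffle_along(X, axis=0)                           # in place along the first axis
Note that apply_along_axis calls np.random.shuffle on each 1-d slice along the chosen axis, so every slice is shuffled independently, which is what was asked for.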
You can do it with numpy without any loop or extra function, and much faster. E.g., we have an array of size (2, 6) and we want a sub-array (2,2) with independent random indices for each column.
import numpy as np

test = np.array([[1, 1],
                 [2, 2],
                 [0.5, 0.5],
                 [0.3, 0.3],
                 [4, 4],
                 [7, 7]])
id_rnd = np.random.randint(6, size=(2, 2))  # random indices; use choice over a range if you don't want replacement
new = np.take_along_axis(test, id_rnd, axis=0)
Out:
array([[2. , 2. ],
       [0.5, 2. ]])
It works for any number of dimensions.
As of NumPy 1.20.0 released in January 2021 we have a permuted() method on the new Generator type (introduced with the new random API in NumPy 1.17.0, released in July 2019). This does exactly what you need:
import numpy as np
rng = np.random.default_rng()
a = np.array([
    [1, 2, 3],
    [1, 2, 3],
    [1, 2, 3],
])
shuffled = rng.permuted(a, axis=1)
This gives you something like
>>> print(shuffled)
[[2 3 1]
 [1 3 2]
 [2 1 3]]
As you can see, the rows are permuted independently. This is in sharp contrast with both rng.permutation() and rng.shuffle().
If you want an in-place update you can pass the original array as the out keyword argument. And you can use the axis keyword argument to choose the direction along which to shuffle your array.
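For completeness, a couple of usage notes as a sketch (same rng and a as above):
rng.permuted(a, axis=1, out=a)  # shuffle each row in place
cols = rng.permuted(a, axis=0)  # each column permuted independently
Passing the array itself as out performs the update without allocating a new array.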

How to bin a 2D array in numpy?

I'm new to numpy and I have a 2D array of objects that I need to bin into a smaller matrix and then get a count of the number of objects in each bin to make a heatmap. I followed the answer on this thread to create the bins and do the counts for a simple array but I'm not sure how to extend it to 2 dimensions. Here's what I have so far:
data_matrix = numpy.ndarray((500,500), dtype=float)
# fill array with values.
bins = numpy.linspace(0, 50, 50)
digitized = numpy.digitize(data_matrix, bins)
binned_data = numpy.ndarray((50,50))
for i in range(0, len(bins)):
    for j in range(0, len(bins)):
        k = len(data_matrix[digitized == i:digitized == j])  # <- does not work
        binned_data[i:j] = k
P.S. the [digitized == i] notation on an array returns an array of boolean values. I cannot find documentation on this notation anywhere. A link would be appreciated.
You can reshape the array to a four dimensional array that reflects the desired block structure, and then sum along both axes within each block. Example:
>>> a = np.arange(24).reshape(4, 6)
>>> a
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23]])
>>> a.reshape(2, 2, 2, 3).sum(3).sum(1)
array([[ 24,  42],
       [ 96, 114]])
If a has the shape m, n, the reshape should have the form
a.reshape(m_bins, m // m_bins, n_bins, n // n_bins)
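As a small convenience, the recipe can be wrapped up like this (a sketch; it assumes the shape divides evenly into the requested bins):
import numpy as np

def bin2d(a, m_bins, n_bins):
    # Sum over both within-block axes in a single call.
    m, n = a.shape
    return a.reshape(m_bins, m // m_bins, n_bins, n // n_bins).sum(axis=(1, 3))

bin2d(np.arange(24).reshape(4, 6), 2, 2)
# array([[ 24,  42],
#        [ 96, 114]])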
At first I was also going to suggest np.histogram2d rather than reinventing the wheel, but then I realized that it would be overkill and would still need some hacking.
If I understand correctly, you just want to sum over submatrices of your input. That's pretty easy to brute force: going over your output submatrix and summing up each subblock of your input:
import numpy as np

def submatsum(data, n, m):
    # return a matrix of shape (n, m)
    bs = data.shape[0]//n, data.shape[1]//m  # blocksize averaged over
    return np.reshape(
        np.array([np.sum(data[k1*bs[0]:(k1+1)*bs[0], k2*bs[1]:(k2+1)*bs[1]])
                  for k1 in range(n) for k2 in range(m)]),
        (n, m))

# set up dummy data
N, M = 4, 6
data_matrix = np.reshape(np.arange(N*M), (N, M))

# set up size of 2x3-reduced matrix, assume congruity
n, m = N//2, M//3
reduced_matrix = submatsum(data_matrix, n, m)

# check output
print(data_matrix)
print(reduced_matrix)
This prints
print(data_matrix)
[[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]
 [12 13 14 15 16 17]
 [18 19 20 21 22 23]]
print(reduced_matrix)
[[ 24  42]
 [ 96 114]]
which is indeed the result for summing up submatrices of shape (2,3).
Note that I'm using // for integer division to make sure it's python3-compatible, but in case of python2 you can just use / for division (due to the numbers involved being integers).
Another solution is to have a look at the binArray function in the comments here:
Binning a numpy array
To use your example :
data_matrix = numpy.ndarray((500,500),dtype=float)
binned_data = binArray(data_matrix, 0, 10, 10, np.sum)
binned_data = binArray(binned_data, 1, 10, 10, np.sum)
The result sums each square of size 10x10 in data_matrix (of size 500x500) to obtain a single value per square in binned_data (of size 50x50).
Hope this helps!

In Tensorflow, how to use tf.gather() for the last dimension?

I am trying to gather slices of a tensor in terms of the last dimension for partial connection between layers. Because the output tensor's shape is [batch_size, h, w, depth], I want to select slices based on the last dimension, such as
# L is intermediate tensor
partL = L[:, :, :, [0,2,3,8]]
However, tf.gather(L, [0, 2, 3, 8]) seems to only work for the first dimension (right?). Can anyone tell me how to do it?
As of TensorFlow 1.3 tf.gather has an axis parameter, so the various workarounds here are no longer necessary.
https://www.tensorflow.org/versions/r1.3/api_docs/python/tf/gather
https://github.com/tensorflow/tensorflow/issues/11223
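A minimal sketch of the axis argument in action (modern TensorFlow with eager execution assumed; the shape is the one from the question):
import tensorflow as tf

L = tf.reshape(tf.range(2 * 2 * 2 * 10), [2, 2, 2, 10])
partL = tf.gather(L, [0, 2, 3, 8], axis=-1)  # select slices of the last dimension
print(partL.shape)  # (2, 2, 2, 4)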
There's a tracking bug to support this use-case here: https://github.com/tensorflow/tensorflow/issues/206
For now you can:
1. transpose your matrix so that the dimension to gather is first (transpose is expensive)
2. reshape your tensor into 1d (reshape is cheap) and turn your gather column indices into a list of individual element indices at linear indexing, then reshape back
3. use gather_nd. You will still need to turn your column indices into a list of individual element indices.
With gather_nd you can now do this as follows:
cat_idx = tf.concat([tf.range(0, tf.shape(x)[0]), indices_for_dim1], axis=0)
result = tf.gather_nd(matrix, cat_idx)
Also, as reported by user Nova in a thread referenced by @Yaroslav Bulatov's answer:
x = tf.constant([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])
idx = tf.constant([1, 0, 2])
idx_flattened = tf.range(0, x.shape[0]) * x.shape[1] + idx
y = tf.gather(tf.reshape(x, [-1]),  # flatten input
              idx_flattened)        # use flattened indices
with tf.Session(''):
    print y.eval()  # [2 4 9]
The gist is flatten the tensor and use strided 1D addressing with tf.gather(...).
Yet another solution using tf.unstack(...), tf.gather(...) and tf.stack(..)
Code:
import tensorflow as tf
import numpy as np

shape = [2, 2, 2, 10]
L = np.arange(np.prod(shape))
L = np.reshape(L, shape)

indices = [0, 2, 3, 8]
axis = -1  # last dimension

def gather_axis(params, indices, axis=0):
    return tf.stack(tf.unstack(tf.gather(tf.unstack(params, axis=axis), indices)), axis=axis)

print(L)
with tf.Session() as sess:
    partL = sess.run(gather_axis(L, indices, axis))
    print(partL)
Result:
L =
[[[[ 0  1  2  3  4  5  6  7  8  9]
   [10 11 12 13 14 15 16 17 18 19]]

  [[20 21 22 23 24 25 26 27 28 29]
   [30 31 32 33 34 35 36 37 38 39]]]


 [[[40 41 42 43 44 45 46 47 48 49]
   [50 51 52 53 54 55 56 57 58 59]]

  [[60 61 62 63 64 65 66 67 68 69]
   [70 71 72 73 74 75 76 77 78 79]]]]

partL =
[[[[ 0  2  3  8]
   [10 12 13 18]]

  [[20 22 23 28]
   [30 32 33 38]]]


 [[[40 42 43 48]
   [50 52 53 58]]

  [[60 62 63 68]
   [70 72 73 78]]]]
A correct version of @Andrei's answer would read
cat_idx = tf.stack([tf.range(0, tf.shape(x)[0]), indices_for_dim1], axis=1)
result = tf.gather_nd(matrix, cat_idx)
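To make the fix concrete, a runnable sketch on the 3x3 example used elsewhere in this thread (eager TF2 assumed):
import tensorflow as tf

x = tf.constant([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])
indices_for_dim1 = tf.constant([1, 0, 2])
cat_idx = tf.stack([tf.range(tf.shape(x)[0]), indices_for_dim1], axis=1)
print(tf.gather_nd(x, cat_idx))  # [2 4 9]
Stacking with axis=1 pairs each row number with its column index, producing the (row, col) coordinate pairs that gather_nd expects.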
You can try it this way; for instance, in most NLP cases at least:
The parameter is of shape [batch_size, depth] and the indices are [i, j, k, n, m], whose length is batch_size. Then gather_nd can be helpful.
parameters = tf.constant([
    [11, 12, 13],
    [21, 22, 23],
    [31, 32, 33],
    [41, 42, 43]])
targets = tf.constant([2, 1, 0, 1])
batch_nums = tf.range(0, limit=parameters.get_shape().as_list()[0])
indices = tf.stack((batch_nums, targets), axis=1)  # the axis is the dimension number
items = tf.gather_nd(parameters, indices)
# which is what we want: [13, 22, 31, 42]
This snippet first finds the first dimension through batch_nums and then fetches the item along that dimension by the target number.
A Tensor doesn't have a shape attribute, but a get_shape() method. The code below is runnable with Python 2.7:
import tensorflow as tf
import numpy as np

x = tf.constant([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])
idx = tf.constant([1, 0, 2])
idx_flattened = tf.range(0, x.get_shape()[0]) * x.get_shape()[1] + idx
y = tf.gather(tf.reshape(x, [-1]),  # flatten input
              idx_flattened)        # use flattened indices
with tf.Session(''):
    print y.eval()  # [2 4 9]
Implementing option 2 from @Yaroslav Bulatov's answer:
# Your indices
indices = [0, 2, 3, 8]

# Remember for final reshaping
n_indices = tf.shape(indices)[0]

flattened_L = tf.reshape(L, [-1])

# Walk strided over the flattened array
offset = tf.expand_dims(tf.range(0, tf.reduce_prod(tf.shape(L)), tf.shape(L)[-1]), 1)
flattened_indices = tf.reshape(tf.reshape(indices, [-1]) + offset, [-1])
selected_rows = tf.gather(flattened_L, flattened_indices)

# Final reshape
partL = tf.reshape(selected_rows, tf.concat(0, [tf.shape(L)[:-1], [n_indices]]))
Credit to How to select rows from a 3-D Tensor in TensorFlow?
