How can I slice a dataset in TensorFlow? - python

I want to slice a dataset in tf.data. My data looks like this:
dataset = tf.data.Dataset.from_tensor_slices([[0, 1, 2, 3, 4],
[1, 2, 3, 4, 5],
[2, 3, 4, 5, 6],
[3, 4, 5, 6, 7],
[4, 5, 6, 7, 8]])
Then the main data is:
[0 1 2 3 4]
[1 2 3 4 5]
[2 3 4 5 6]
[3 4 5 6 7]
[4 5 6 7 8]
I want to create another dataset that contains data like this:
[[1, 2],
[2, 3],
[3, 4],
[4, 5],
[5, 6]]
In NumPy, this would be:
dataset[:,1:3]
How can I do this in TensorFlow?
Update:
I got it working with this:
dataset2 = dataset.map(lambda data: data[1:3])
for val in dataset2:
    print(val.numpy())
But I suspect there is a better solution.

In my opinion, your solution is the best one. For the benefit of the community, here is the same approach with a small syntax change: the slicing is still done by map(), and the as_numpy_iterator() method of tf.data.Dataset is used to iterate over the result as NumPy arrays.
Please refer to the code below:
import tensorflow as tf
dataset = tf.data.Dataset.from_tensor_slices([[0, 1, 2, 3, 4],
[1, 2, 3, 4, 5],
[2, 3, 4, 5, 6],
[3, 4, 5, 6, 7],
[4, 5, 6, 7, 8]])
dataset2 = dataset.map(lambda data: data[1:3])
for val in dataset2.as_numpy_iterator():
    print(val)
Output:
[1 2]
[2 3]
[3 4]
[4 5]
[5 6]
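As a rough cross-check of the answer above, the sketch below compares two ways of getting the same columns: slicing each element inside the pipeline with map(), and slicing the in-memory array before building the dataset. The variable names (data, mapped, presliced) are illustrative and not from the original post, and the second option assumes the whole array fits in memory.

```python
import numpy as np
import tensorflow as tf

data = np.array([[0, 1, 2, 3, 4],
                 [1, 2, 3, 4, 5],
                 [2, 3, 4, 5, 6],
                 [3, 4, 5, 6, 7],
                 [4, 5, 6, 7, 8]])

# Option 1: slice every element inside the tf.data pipeline.
mapped = tf.data.Dataset.from_tensor_slices(data).map(lambda row: row[1:3])

# Option 2 (assumes the array is already in memory): slice with NumPy first.
presliced = tf.data.Dataset.from_tensor_slices(data[:, 1:3])

# Collect both datasets back into arrays so they can be compared.
mapped_rows = np.stack(list(mapped.as_numpy_iterator()))
presliced_rows = np.stack(list(presliced.as_numpy_iterator()))
print(np.array_equal(mapped_rows, presliced_rows))
```

Option 1 is the form to reach for when the dataset does not come from an in-memory array (for example, when it is read from files), since the slice then has to happen per element.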

Related

Convert one-dimensional array to two-dimensional array so that each element is a row in the result

I want to know how to convert this: array([0, 1, 2, 3, 4, 5]) to this:
array([[0, 0, 0],
[1, 1, 1],
[2, 2, 2],
[3, 3, 3],
[4, 4, 4],
[5, 5, 5]])
In short, given a flat array, repeat each element inside the array n times, so that each element creates a sub-array of n of the same element, and concatenate these sub-arrays into one, so that each row contains an element from the original array repeated n times.
I can do this:
def repeat(lst, n):
    return [[e]*n for e in lst]
>>> repeat(range(10), 4)
[[0, 0, 0, 0],
[1, 1, 1, 1],
[2, 2, 2, 2],
[3, 3, 3, 3],
[4, 4, 4, 4],
[5, 5, 5, 5],
[6, 6, 6, 6],
[7, 7, 7, 7],
[8, 8, 8, 8],
[9, 9, 9, 9]]
How can I do this in NumPy?
You can use numpy's repeat like this:
np.repeat(range(10), 4).reshape(10,4)
which gives:
[[0 0 0 0]
[1 1 1 1]
[2 2 2 2]
[3 3 3 3]
[4 4 4 4]
[5 5 5 5]
[6 6 6 6]
[7 7 7 7]
[8 8 8 8]
[9 9 9 9]]
You can also use tile, which handles the dimensions:
a = np.array([0, 1, 2, 3, 4, 5])
N = 4
np.tile(a[:,None], (1, N))
# or
np.tile(a, (N, 1)).T
or broadcast_to:
np.broadcast_to(a, (N, a.shape[0])).T
# or
np.broadcast_to(a[:,None], (a.shape[0], N))
Or multiply by an array of ones:
a[:,None]*np.ones(N, dtype=a.dtype)
output:
array([[0, 0, 0, 0],
[1, 1, 1, 1],
[2, 2, 2, 2],
[3, 3, 3, 3],
[4, 4, 4, 4],
[5, 5, 5, 5]])
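The variants above can be checked against each other with a short sketch like the one below. The names by_repeat, by_tile, by_broadcast and by_ones are illustrative only. One difference worth knowing: np.broadcast_to returns a read-only view rather than a new array, so it needs a copy() before it can be modified.

```python
import numpy as np

a = np.array([0, 1, 2, 3, 4, 5])
N = 4

by_repeat = np.repeat(a, N).reshape(-1, N)          # -1 infers the row count
by_tile = np.tile(a[:, None], (1, N))
by_broadcast = np.broadcast_to(a[:, None], (a.shape[0], N))
by_ones = a[:, None] * np.ones(N, dtype=a.dtype)

# All four approaches should produce the same (6, 4) array.
all_equal = all(np.array_equal(by_repeat, other)
                for other in (by_tile, by_broadcast, by_ones))
print(all_equal)

# broadcast_to gives a read-only view; copy it before writing to it.
print(by_broadcast.flags.writeable)
writable_copy = by_broadcast.copy()
writable_copy[0, 0] = 99
```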

How do I get the diagonal of a tensor of rank higher than 2 along selected axis in tensorflow

I have a tensor with tf.shape(input)=(Batch_Size,Channels,N,N). My goal is to compute an output that contains all diagonal elements along axes 2 and 3, so that tf.shape(output)=(Batch_Size,Channels,N).
There is the function tf.diag_part(input), but it doesn't let me select the axes I want to consider. How can I define a function that does this for me?
Would the following code work?
Batches=[]
for batch in input:
    diagonalpart=tf.diag_part(batch)
    Batches.append(diagonalpart)
output=tf.stack(Batches)
tf.linalg.diag_part should do exactly what you want, e.g.:
import tensorflow as tf
import numpy as np
# Input shape: (2, 2, 4, 4)
input = np.array([
[ [[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 8, 7, 6],
[9, 8, 7, 6]],
[[5, 4, 3, 2],
[1, 2, 3, 4],
[5, 6, 7, 8],
[1, 2, 3, 4]] ],
[ [[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 8, 7, 6],
[1, 2, 3, 4]],
[[5, 4, 3, 2],
[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 8, 7, 6]] ]
])
print(tf.linalg.diag_part(input))
which outputs:
tf.Tensor(
[[[1 6 7 6]
[5 2 7 4]]
[[1 6 7 4]
[5 2 7 6]]], shape=(2, 2, 4), dtype=int32)
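For readers who want to check the "diagonal over the last two axes" behaviour without TensorFlow, here is a NumPy sketch of the same operation. This is a NumPy analogue, not the TensorFlow API from the answer, and the test array x is made up for illustration.

```python
import numpy as np

# Illustrative input of shape (Batch_Size, Channels, N, N) = (2, 3, 4, 4).
x = np.arange(2 * 3 * 4 * 4).reshape(2, 3, 4, 4)

# Take the diagonal over axes 2 and 3; the result has shape (2, 3, 4).
diag = np.diagonal(x, axis1=2, axis2=3)

# The same thing written as an einsum: repeat index i on both matrix axes.
diag_einsum = np.einsum('bcii->bci', x)

# Reference result built with an explicit loop over batch and channel.
expected = np.array([[np.diag(x[b, c]) for c in range(x.shape[1])]
                     for b in range(x.shape[0])])

print(diag.shape)
print(np.array_equal(diag, expected), np.array_equal(diag_einsum, expected))
```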

Tile rows of a 2D numpy array based on values in separate numpy vector

I have a source array:
a = array([[1, 1, 2, 2],
[3, 4, 5, 6],
[7, 7, 7, 8]])
And a vector that indicates how many times I want to tile each row of the array:
count = array([3, 1, 2])
I want to get:
results =array([[1, 1, 2, 2],
[1, 1, 2, 2],
[1, 1, 2, 2],
[3, 4, 5, 6],
[7, 7, 7, 8],
[7, 7, 7, 8]]
Is there a vectorized/numpy way to achieve this?
Currently I'm using an iterative loop approach and it's horribly slow when len(a) and/or count contains high values.
numpy.repeat() is what you are after:
Code:
np.repeat(a, count, axis=0)
Test Code:
import numpy as np
a = np.array([[1, 1, 2, 2],
[3, 4, 5, 6],
[7, 7, 7, 8]])
count = np.array([3, 1, 2])
print(np.repeat(a, count, axis=0))
Results:
[[1 1 2 2]
[1 1 2 2]
[1 1 2 2]
[3 4 5 6]
[7 7 7 8]
[7 7 7 8]]
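One way to see what np.repeat does here is to repeat the row numbers instead of the rows and then index with them. The sketch below shows that this gives the same result; row_ids and tiled are illustrative names. The index form can be handy if the same repetition pattern has to be applied to several arrays.

```python
import numpy as np

a = np.array([[1, 1, 2, 2],
              [3, 4, 5, 6],
              [7, 7, 7, 8]])
count = np.array([3, 1, 2])

# Repeat each row number count[i] times, then use fancy indexing.
row_ids = np.repeat(np.arange(len(a)), count)
tiled = a[row_ids]

print(row_ids)
print(np.array_equal(tiled, np.repeat(a, count, axis=0)))
```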

How to gather data in my case using gather_nd in tensorflow?

I need to gather some data from a tensor, and I used gather_nd. My code is below:
import tensorflow as tf
indices = [[[0, 4], [0, 1], [0, 6], [0, 2]],
[[1, 1], [1, 4], [1, 0], [1, 9]],
[[2, 5], [2, 1], [2, 9], [2, 6]]]
params = [[4,6,3,6,7,8,4,5,3,8], [9,5,6,2,6,5,1,9,6,4], [4,6,6,1,3,2,6,7,1,8]]
output = tf.gather_nd(params, indices)
sess = tf.Session()
print(sess.run(output))
The output is
[[7 6 4 3]
[5 6 9 4]
[2 6 8 6]]
Yes, that's what I want: the values at positions 4, 1, 6, 2 in params[0]. They are 7, 6, 4, 3 because params[0][4] = 7, params[0][1] = 6, params[0][6] = 4, params[0][2] = 3.
However, tf.gather_nd only accepts indices in the form above. My raw_indices look like this:
[[4, 1, 6, 2],
[1, 4, 0, 9],
[5, 1, 9, 6]]
How can I convert raw_indices to indices in TensorFlow? I have to do this step inside the graph, since raw_indices is generated in the middle of the graph.
A mixture of tf.range() and some tiling seems to work:
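def index_matrix_to_pairs(index_matrix):
    # Column of row numbers, tiled to the same shape as index_matrix.
    replicated_first_indices = tf.tile(
        tf.expand_dims(tf.range(tf.shape(index_matrix)[0]), axis=1),
        [1, tf.shape(index_matrix)[1]])
    # tf.stack is the current name for tf.pack (renamed in TensorFlow 1.0).
    return tf.stack([replicated_first_indices, index_matrix], axis=2)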
start = [[4, 1, 6, 2],
[1, 4, 0, 9],
[5, 1, 9, 6]]
with tf.Session():
    print(index_matrix_to_pairs(start).eval())
Gives:
[[[0 4]
[0 1]
[0 6]
[0 2]]
[[1 1]
[1 4]
[1 0]
[1 9]]
[[2 5]
[2 1]
[2 9]
[2 6]]]
It's just generating the first part of each pair with a tiled tf.range() op, then packing that with the specified indices.
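The session-based code above targets TensorFlow 1.x. Below is a sketch of the same idea under TensorFlow 2.x eager execution, assuming a version where tf.gather accepts batch_dims. It builds the index pairs with tf.stack as in the answer, and also shows that tf.gather with batch_dims=1 gives the same values directly from raw_indices. The names via_pairs and via_batch_dims are illustrative.

```python
import tensorflow as tf

params = tf.constant([[4, 6, 3, 6, 7, 8, 4, 5, 3, 8],
                      [9, 5, 6, 2, 6, 5, 1, 9, 6, 4],
                      [4, 6, 6, 1, 3, 2, 6, 7, 1, 8]])
raw_indices = tf.constant([[4, 1, 6, 2],
                           [1, 4, 0, 9],
                           [5, 1, 9, 6]])

def index_matrix_to_pairs(index_matrix):
    # Row numbers tiled to the shape of index_matrix, then paired with it.
    rows = tf.range(tf.shape(index_matrix)[0])
    row_ids = tf.tile(tf.expand_dims(rows, axis=1),
                      [1, tf.shape(index_matrix)[1]])
    return tf.stack([row_ids, index_matrix], axis=2)

# Approach from the answer: build (row, column) pairs for gather_nd.
via_pairs = tf.gather_nd(params, index_matrix_to_pairs(raw_indices))

# Alternative: treat axis 0 as a batch axis and gather within each row.
via_batch_dims = tf.gather(params, raw_indices, batch_dims=1)

print(via_pairs.numpy())
print(via_batch_dims.numpy())
```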

Grouping column indices of an array based on max value of another list in python

Suppose I have an array
k= array([[1, 2, 3, 4, 5],
[5, 6, 7, 8, 9],
[2, 5, 4, 7, 3],
[4, 7, 6, 8, 2],
[1, 2, 4, 3, 6],
[7, 8, 9, 5, 4]])
Suppose that running a computation on each column of the array gives array([0.6,0.4,0.75,0.2,0.75]), such that:
computation on column1 i.e. computation on array([1,5,2,4,1,7]) results in 0.6,
computation on column2 i.e. computation on array([2,6,5,7,2,8]) results in 0.4,
computation on column3 i.e. computation on array([3,7,4,6,4,9]) results in 0.75, and so on.
Let the computed array be m, such that
m=array([0.6,0.4,0.75,0.2,0.75])
So far I have computed a value for each single column of k. Now I would like to group columns based on the largest floating-point value in m and compute again on k. For example:
m[2]=m[4]=0.75 (the largest number in the array), so columns 2 and 4 of k have the largest value. Keeping each of those columns fixed, I would like to pair k[:,2] with k[:,0], k[:,1], and k[:,3], and likewise k[:,4] with k[:,0], k[:,1], and k[:,3], and then compute again on each pair,
so that, for example, grouping k[:,2] with k[:,0] gives k0_2 below:
k0_2=array([[1,3],    k1_2=array([[2,3],    k3_2=array([[4,3],
            [5,7],                [6,7],                [8,7],
            [2,4],                [5,4],                [7,4],
            [4,6],                [7,6],                [8,6],
            [1,4],                [2,4],                [3,4],
            [7,9]])               [8,9]])               [5,9]])
k0_4=array([[1,5],    k1_4=array([[2,5],    k3_4=array([[4,5],
            [5,9],                [6,9],                [8,9],
            [2,9],                [5,3],                [7,3],
            [4,2],                [7,2],                [8,2],
            [1,6],                [2,6],                [3,6],
            [7,4]])               [8,4]])               [5,4]])
Can anyone give me a clue about how to group the column indices of array k based on the max value of m, as shown above?
Does this help?
import numpy as np
k= np.array(
[[1, 2, 3, 4, 5],
[5, 6, 7, 8, 9],
[2, 5, 4, 7, 3],
[4, 7, 6, 8, 2],
[1, 2, 4, 3, 6],
[7, 8, 9, 5, 4]])
cols=[2,4]
others = [c for c in range(k.shape[1]) if c not in cols]
groups = [k[:,[o, c]] for c in cols for o in others]
for g in groups:
    print(g)
    print('')
This gives me
[[1 3]
[5 7]
[2 4]
[4 6]
[1 4]
[7 9]]
[[2 3]
[6 7]
[5 4]
[7 6]
[2 4]
[8 9]]
[[4 3]
[8 7]
[7 4]
[8 6]
[3 4]
[5 9]]
[[1 5]
[5 9]
[2 3]
[4 2]
[1 6]
[7 4]]
[[2 5]
[6 9]
[5 3]
[7 2]
[2 6]
[8 4]]
[[4 5]
[8 9]
[7 3]
[8 2]
[3 6]
[5 4]]
I think there's a typo in your k0_4 array: k0_4[2,1] should be 3, not 9, shouldn't it?
Here is another solution that a friend and I worked out:
import numpy as np
k= np.array(
[[1, 2, 3, 4, 5],
[5, 6, 7, 8, 9],
[2, 5, 4, 7, 3],
[4, 7, 6, 8, 2],
[1, 2, 4, 3, 6],
[7, 8, 9, 5, 4]])
cols=[2,4]
kl = np.delete(k, np.s_[cols],1)
for i in range(len(cols)):
    for j in range(len(kl[0])):
        print(np.column_stack((k[:,cols[i]],kl[:,j])))
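Both answers hard-code cols=[2,4]. A minimal sketch of deriving those indices from m, as the question asks, might look like the following. The dictionary layout and the names others and groups are assumptions made for illustration; the exact comparison m == m.max() is safe here because the maximum is itself an element of m.

```python
import numpy as np

k = np.array([[1, 2, 3, 4, 5],
              [5, 6, 7, 8, 9],
              [2, 5, 4, 7, 3],
              [4, 7, 6, 8, 2],
              [1, 2, 4, 3, 6],
              [7, 8, 9, 5, 4]])
m = np.array([0.6, 0.4, 0.75, 0.2, 0.75])

# Indices of every column whose score equals the maximum of m.
cols = np.flatnonzero(m == m.max())

# All remaining column indices.
others = np.setdiff1d(np.arange(k.shape[1]), cols)

# Pair each remaining column o with each max column c, keyed by (o, c).
groups = {(int(o), int(c)): k[:, [o, c]] for c in cols for o in others}

for (o, c), g in groups.items():
    print('k%d_%d' % (o, c))
    print(g)
```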
