Window Multidimensional Tensorflow Dataset

I have 2-dimensional data with shape m by n that I want to window with size w along the first axis into a dataset of m-w+1 two-dimensional arrays, each of size w by n. For instance, if the data is:
[[0, 1, 2 ],
[3, 4, 5 ],
[6, 7, 8 ],
[9, 10, 11]]
then I want to window it into
[[[0, 1 , 2 ],
[3, 4 , 5 ],
[6, 7 , 8 ]],
[[3, 4 , 5 ],
[6, 7 , 8 ],
[9, 10, 11]]]
I can window the data together into the right sets:
dataset = tf.data.Dataset.from_tensor_slices(np.arange(5*3).reshape(5,3))
dataset = dataset.window(size=3,shift=1,drop_remainder=True)
for window in dataset : print(list(window.as_numpy_iterator()))
>>>[array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8])]
>>>[array([3, 4, 5]), array([6, 7, 8]), array([ 9, 10, 11])]
>>>[array([6, 7, 8]), array([ 9, 10, 11]), array([12, 13, 14])]
but I can't figure out how to get the data back into the stacked shape again. I thought maybe tf.stack, but no dice on that. Does anybody know how to finish this?

I found the answer here actually. I don't know why it works, but it does:
dataset = tf.data.Dataset.from_tensor_slices(np.arange(5*3).reshape(5,3))
dataset = dataset.window(size=3,shift=1)
dataset = dataset.flat_map(lambda x : x.batch(3))
for d in dataset : print(d)
which makes
tf.Tensor(
[[0 1 2]
[3 4 5]
[6 7 8]], shape=(3, 3), dtype=int64)
tf.Tensor(
[[ 3 4 5]
[ 6 7 8]
[ 9 10 11]], shape=(3, 3), dtype=int64)
tf.Tensor(
[[ 6 7 8]
[ 9 10 11]
[12 13 14]], shape=(3, 3), dtype=int64)
tf.Tensor(
[[ 9 10 11]
[12 13 14]], shape=(2, 3), dtype=int64)
tf.Tensor([[12 13 14]], shape=(1, 3), dtype=int64)
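
The trailing (2, 3) and (1, 3) tensors appear because window was called without drop_remainder=True, so the last windows are shorter than 3 rows. As for why flat_map works: window yields one small dataset per window, batch collapses each of those into a single tensor, and flat_map emits the results as one flat dataset. A minimal sketch, assuming TensorFlow 2.x with eager execution; the names data, w and windows are illustrative only:

```python
import numpy as np
import tensorflow as tf

data = np.arange(5 * 3).reshape(5, 3)  # m = 5 rows, n = 3 columns
w = 3                                  # window size

dataset = tf.data.Dataset.from_tensor_slices(data)
# drop_remainder=True discards the short windows at the end
dataset = dataset.window(size=w, shift=1, drop_remainder=True)
# each window is itself a dataset of w rows; batch(w) stacks them into one (w, n) tensor
dataset = dataset.flat_map(lambda window: window.batch(w, drop_remainder=True))

# collect everything into one array to inspect the overall shape
windows = np.stack(list(dataset.as_numpy_iterator()))
print(windows.shape)  # (3, 3, 3), i.e. m - w + 1 windows of shape (w, n)
```

With drop_remainder=True only the three full windows remain, which matches the stacked shape asked for in the question.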

Related

Most efficient way to use Column major when reshape a 1D array in PyTorch

import torch
p = torch.arange(0, 12, requires_grad=False, dtype=torch.int32)
pr = torch.reshape(p, (4, 3))
what I want is
pr = [0 4 8
1 5 9
2 6 10
3 7 11]
but it actually becomes
pr = [0 1 2
3 4 5
6 7 8
9 10 11]
I searched online and read that permute can do it, but that it makes a copy of the array. What is the most efficient way to reshape it in PyTorch?

You are close but to get what you want you have to rearrange it a little bit.
import torch
p = torch.arange(0, 12, requires_grad=False, dtype=torch.int32)
pr = torch.reshape(p, (3, 4)).t()
pr
Out[49]:
tensor([[ 0, 4, 8],
[ 1, 5, 9],
[ 2, 6, 10],
[ 3, 7, 11]], dtype=torch.int32)
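
On the copy concern: as far as I understand, reshape on a contiguous tensor returns a view, and .t() only swaps the strides, so no data should be copied here. A small check, as a sketch; shares_memory and pr_copy are names made up for this example:

```python
import torch

p = torch.arange(0, 12, dtype=torch.int32)
pr = torch.reshape(p, (3, 4)).t()  # shape (4, 3), column-major view of p

# Same underlying storage pointer: nothing was copied.
shares_memory = pr.data_ptr() == p.data_ptr()
print(shares_memory)       # True
print(pr.stride())         # (1, 4)
print(pr.is_contiguous())  # False

# Only needed if a later op requires row-major memory; this call does copy.
pr_copy = pr.contiguous()
```

The view is free to create, but some operations may require contiguous memory, in which case the copy happens at that point instead.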

Slicing a 2D tensor similar to numpy np.ix_

I have learned how to slice a tensor on one dimension here.
I have learned how to slice a 2D tensor giving a 1D tensor of specific values here.
Both use tf.gather() but I'm pretty sure I need tf.gather_nd() though I'm obviously using it wrong.
In numpy, I have a 5x5 2D array, and I can slice a 2x2 array by using np.ix_() with row and column indices (I always need the same indices for rows and columns, resulting in a square matrix):
import numpy as np
a = np.array([[1,2,3,4,5],[2,1,6,7,8],[3,6,1,9,10],[4,7,9,1,11],[5,8,10,11,1]])
a
array([[ 1, 2, 3, 4, 5],
[ 2, 1, 6, 7, 8],
[ 3, 6, 1, 9, 10],
[ 4, 7, 9, 1, 11],
[ 5, 8, 10, 11, 1]])
a[np.ix_([1,3], [1,3])]
array([[1, 7],
[7, 1]])
Reading over the tf.gather_nd() docs I assumed this is the way to do it in TF, but I'm using it wrong:
import tensorflow as tf
a = tf.constant([[1,2,3,4,5],[2,1,6,7,8],[3,6,1,9,10],[4,7,9,1,11],[5,8,10,11,1]])
tf.gather_nd(a, [[1,3], [1,3]])
<tf.Tensor: shape=(2,), dtype=int32, numpy=array([7, 7])>
I would have to do something like:
tf.gather_nd(a, [[[1,1], [1,3]],[[3,1],[3,3]]])
<tf.Tensor: shape=(2, 2), dtype=int32, numpy=
array([[1, 7],
[7, 1]])>
Which leads me down another rabbit hole I'm not keen on. My indices vector is a lot longer of course.
My indices, BTW, are 1D integer tensors themselves. So, bottom line, I want to slice a with the same indices for rows and columns as I do with np.ix_(), and my indices are something like:
idx = tf.constant([1, 3])
# tf.gather_nd(a, indices = "something with idx")

To slice an n x n 2D array with a 1D tensor of length d so that the result is a d x d 2D array at the specified indices, you can use tf.repeat, tf.tile and then tf.stack:
n = 5
a = tf.constant(np.arange(n * n).reshape(n, n)) # 2D nxn array
idx = [1,2,4] # 1D tensor with length d
d = tf.shape(idx)[0]
ix_ = tf.reshape(tf.stack([tf.repeat(idx,d),tf.tile(idx,[d])],1),[d,d,2])
target = tf.gather_nd(a,ix_) # 2D dxd array
print(a)
print(target)
Expected outputs:
tf.Tensor(
[[ 0 1 2 3 4]
[ 5 6 7 8 9]
[10 11 12 13 14]
[15 16 17 18 19]
[20 21 22 23 24]], shape=(5, 5), dtype=int64)
tf.Tensor(
[[ 6 7 9]
[11 12 14]
[21 22 24]], shape=(3, 3), dtype=int64)
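
A different route, not the one used in the answer above, is to call tf.gather twice, once per axis. This avoids building the index grid by hand. A sketch assuming TensorFlow 2.x; a_np, rows and expected are illustrative names, and the NumPy result is only there for comparison:

```python
import numpy as np
import tensorflow as tf

a_np = np.arange(25).reshape(5, 5)
a = tf.constant(a_np)
idx = tf.constant([1, 2, 4])

rows = tf.gather(a, idx, axis=0)       # pick rows 1, 2, 4 -> shape (3, 5)
target = tf.gather(rows, idx, axis=1)  # pick columns 1, 2, 4 -> shape (3, 3)

# Reference result from NumPy's np.ix_ for the same indices
expected = a_np[np.ix_([1, 2, 4], [1, 2, 4])]
print(np.array_equal(target.numpy(), expected))  # True
```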

Flatten all subarrays in an array

I have a numpy array X with shape (2, 2, 3) like this:
X = [[[1 , 2 , 3 ],
[4 , 5 , 6 ]],
[[7 , 8 , 9 ],
[10, 11, 12 ]]]
I want to flatten all subarrays and turn X to the shape of (2 , 6) which is represented like this:
X = [[ 1 , 2 , 3 , 4, 5 , 6 ] ,
[ 7 , 8 , 9 , 10, 11 , 12 ] ]
But when I used X.flatten(), it just turned out to be like this:
X = [ 1 , 2, 3, 4 , 5, ... , 12]
Is there a function that transforms the array the way I want?

Just reshape....
import numpy as np
arr = np.array([[[1 , 2 , 3 ],
[4 , 5 , 6]],
[[7 , 8 , 9 ],
[10, 11, 12 ]]])
arr.reshape(2, 6)
result:
array([[ 1, 2, 3, 4, 5, 6],
[ 7, 8, 9, 10, 11, 12]])

Iterate over the array and flatten the sublists
arr = np.array([x.flatten() for x in X])
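Or, for a pure numpy solution, reshape and let numpy infer the second dimension. (np.hstack(X) does not work here: it joins the subarrays side by side, giving [[1 2 3 7 8 9], [4 5 6 10 11 12]].)
arr = X.reshape(X.shape[0], -1)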
Output
print(arr)
#[[ 1 2 3 4 5 6]
# [ 7 8 9 10 11 12]]
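
A quick check of the variants, as a sketch with made-up variable names: the loop and a reshape with -1 agree, while np.hstack treats X as a sequence of 2D arrays and concatenates them along the column axis, which changes the order.

```python
import numpy as np

X = np.array([[[1, 2, 3], [4, 5, 6]],
              [[7, 8, 9], [10, 11, 12]]])

flat_loop = np.array([x.flatten() for x in X])
flat_reshape = X.reshape(X.shape[0], -1)  # -1 lets NumPy infer 2 * 3 = 6
stacked = np.hstack(X)                    # joins X[0] and X[1] column-wise

print(np.array_equal(flat_loop, flat_reshape))  # True
print(stacked)
# [[ 1  2  3  7  8  9]
#  [ 4  5  6 10 11 12]]
```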

Loop through the array and flatten the subcomponents:
x = np.array([[[1,2],[3,4]], [[5,6],[7,8]]])
y = np.array([i.flatten() for i in x])
print(x)
print(y)
[[[1 2]
[3 4]]
[[5 6]
[7 8]]]
[[1 2 3 4]
[5 6 7 8]]

If it's a numpy array, you can use reshape:
x.reshape(2,6)
Input:
x = np.array([[[1,2,3],[4,5,6]],[[7,8,9],[10,11,12]]])
x.reshape(2,6)
Output:
array([[ 1, 2, 3, 4, 5, 6],
[ 7, 8, 9, 10, 11, 12]])

Why is the shape of multidimensional arrays handled differently in numpy when using axis parameter

I am a rookie in the python language and have a question regarding the shape of arrays.
So far as I understand, if a 3 dimensional numpy array is created like this
temp = numpy.asarray([[[0, 0, 0], [1, 1, 1], [2, 2, 2]], [[3, 3, 3], [4, 4, 4], [5, 5, 5]], [[6, 6, 6], [7, 7, 7], [8, 8, 8]]])
the shape is laid out as in the following figure:
[Figure: shape of 3 dimensional array]
To calculate the sum, median etc. an axis can be defined to calculate the values e.g.
>>> print(numpy.median(temp, axis=0))
[[3. 3. 3.] [4. 4. 4.] [5. 5. 5.]]
>>> print(numpy.median(temp, axis=1))
[[1. 1. 1.] [4. 4. 4.] [7. 7. 7.]]
>>> print(numpy.median(temp, axis=2))
[[0. 1. 2.] [3. 4. 5.] [6. 7. 8.]]
which implies to me a shape like this:
[Figure: shape of 3 dimensional array using axis parameter]
Why is the shape handled differently when calculating the sum, median etc. with the axis parameter?

Your numpy array temp = numpy.asarray([[[0, 0, 0], [1, 1, 1], [2, 2, 2]], [[3, 3, 3], [4, 4, 4], [5, 5, 5]], [[6, 6, 6], [7, 7, 7], [8, 8, 8]]]) looks actually like this:
axis=2
|
v
[[[0 0 0] <-axis=1
[1 1 1]
[2 2 2]] <- axis=0
[[3 3 3]
[4 4 4]
[5 5 5]]
[[6 6 6]
[7 7 7]
[8 8 8]]]
Therefore, when you take the median over a specific axis, numpy keeps the remaining axes as they are and finds the median along the specified axis. To make this clearer, I am going to use the array suggested in the comments by @hpaulj:
temp:
axis=2
|
v
[[[ 0 1 2 3] <-axis=1
[ 4 5 6 7]
[ 8 9 10 11]] <- axis=0
[[12 13 14 15]
[16 17 18 19]
[20 21 22 23]]]
We then have:
numpy.median(temp, axis=0):
#The first element is median of [0,12], second one median of [1,13] and so on.
[[ 6. 7. 8. 9.]
[10. 11. 12. 13.]
[14. 15. 16. 17.]]
np.median(temp, axis=1)
#The first element is median of [0,4,8], second one median of [1,5,9] and so on.
[[ 4. 5. 6. 7.]
[16. 17. 18. 19.]]
np.median(temp, axis=2)
#The first element is median of [0,1,2,3], second one median of [4,5,6,7] and so on.
[[ 1.5 5.5 9.5]
[13.5 17.5 21.5]]
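
Another way to see this is through the result shapes: reducing along an axis removes that axis and leaves the others untouched. A short sketch using the same (2, 3, 4) array; the names reduced and kept are only for illustration:

```python
import numpy as np

temp = np.arange(24).reshape(2, 3, 4)

for axis in range(temp.ndim):
    reduced = np.median(temp, axis=axis)
    print(axis, reduced.shape)
# 0 (3, 4)
# 1 (2, 4)
# 2 (2, 3)

# keepdims=True keeps the reduced axis with length 1 instead of dropping it
kept = np.median(temp, axis=1, keepdims=True)
print(kept.shape)  # (2, 1, 4)
```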

How can I make a padded numpy array using the first/last row/column as the pad?

I am in need of efficiently padding a numpy array on all 4 sides, using the first and last row/column as the padding data. For example, given the following:
A=np.array([[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 10, 11, 12]])
I am trying to end up with:
B=np.array([[1, 1, 2, 3, 4, 4],
[1, 1, 2, 3, 4, 4],
[5, 5, 6, 7, 8, 8],
[9, 9, 10, 11, 12, 12],
[9, 9, 10, 11, 12, 12]])
Notice the original array A is located at B[1:-1,1:-1]. I assume I could pad in one direction first (horizontal or vertical), then the other, to get the duplicated corner values. However, my vectorization/numpification is failing me. (Note: the array I am doing this with is quite large, and I need to perform this operation many times, so doing it efficiently is key. I can do it with a loop, but it is quite slow.)

With np.pad, you can specify the width of padding and the padding mode to apply to an array. For your example array, the edge padding mode gives the desired result:
>>> np.pad(A, 1, 'edge')
array([[ 1, 1, 2, 3, 4, 4],
[ 1, 1, 2, 3, 4, 4],
[ 5, 5, 6, 7, 8, 8],
[ 9, 9, 10, 11, 12, 12],
[ 9, 9, 10, 11, 12, 12]])
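
As a sanity check, and to show that the pad width can differ per side, here is a small sketch. The widths in the second call are arbitrary values chosen for illustration:

```python
import numpy as np

A = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12]])

# One cell of edge padding on every side
B = np.pad(A, 1, mode='edge')
print(np.array_equal(B[1:-1, 1:-1], A))  # True

# Per-side widths: ((top, bottom), (left, right))
C = np.pad(A, ((2, 0), (0, 1)), mode='edge')
print(C.shape)  # (5, 5)
```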
